
Lecture Notes

on

Time Series Analysis


STA-403-S3(ELECTIVE II)

Unit-II
Time Series Models for Linear Stationary Processes

Dr. Suresh, R
Assistant Professor
Department of Statistics
Bangalore University, Bengaluru-560 056

Contents
1 Time-series (t.s.) as Discrete Parameter Stochastic Process, Definition of Strict and Weak Stationarity of a t.s., Gaussian t.s., Ergodicity
  1.1 Introduction
    1.1.1 Describing (Properties of) Stochastic Processes
  1.2 Stationary Processes
    1.2.1 Examples of Some Processes
    1.2.2 Relationship between Strong and Weak Stationarity
  1.3 Ergodicity
2 Autocovariance Function (ACVF) and Autocorrelation Function (ACF) and their properties, Partial Autocorrelation Function (PACF)
  2.1 Properties of Autocovariance Function (ACVF) and Autocorrelation Function (ACF)
    2.1.1 Conditions Satisfied by the Autocorrelations of a Stationary Process
  2.2 Partial Autocorrelation Function (PACF)
    2.2.1 Calculating PACF
3 General Linear Processes (G.L.P), Autocovariance generating function, Stationarity and Invertibility conditions of a G.L.P
  3.1 Introduction
  3.2 Alternative Form of GLP
  3.3 Properties of GLP
  3.4 Stationarity and Invertibility Conditions of a General Linear Process
4 Autoregressive Processes (AR(p)), Stationarity Condition, ACF, PACF, Yule-Walker Equations, AR(1) and AR(2) Processes
  4.1 Introduction
  4.2 Stationarity Conditions of AR(p) Process
  4.3 ACVF and ACF of the AR(p) Process
    4.3.1 Autoregressive Parameters in Terms of the Autocorrelations: Yule-Walker Equations
  4.4 Variance of the AR(p) Process
  4.5 PACF of the AR(p) Process
  4.6 AR(1) Process
    4.6.1 Introduction
    4.6.2 Stationarity and Invertibility of AR(1) Process
    4.6.3 ACVF and ACF of AR(1) Process
    4.6.4 Variance of AR(1) Process
    4.6.5 PACF of AR(1) Process
  4.7 AR(2) Process
    4.7.1 Introduction
    4.7.2 Stationarity and Invertibility of AR(2) Process
    4.7.3 ACVF and ACF of AR(2) Process
    4.7.4 Variance of AR(2) Process
    4.7.5 PACF of AR(2) Process
5 Moving Average (MA(q)) Processes, Invertibility Condition, ACF, PACF, MA(1), MA(2) Processes, Duality between AR(p) and MA(q) Processes
  5.1 Introduction
  5.2 Invertibility Conditions of MA(q) Process
  5.3 ACVF and ACF of the MA(q) Process
    5.3.1 Moving Average Parameters in Terms of the Autocorrelations
  5.4 PACF of the MA(q) Process
  5.5 MA(1) Process
    5.5.1 Introduction
    5.5.2 Invertibility Condition of MA(1) Process
    5.5.3 ACVF and ACF of the MA(1) Process
    5.5.4 Non-uniqueness of MA Models and Invertibility
    5.5.5 PACF of the MA(1) Process
  5.6 MA(2) Process
    5.6.1 Introduction
    5.6.2 Invertibility Condition of MA(2) Process
    5.6.3 ACVF and ACF of the MA(2) Process
    5.6.4 PACF of the MA(2) Process
  5.7 Duality between AR(p) and MA(q) Processes
    5.7.1 Consequences of the Duality
6 ARMA(p, q) Processes, Stationarity, Invertibility, ACF, PACF, ARMA(1,1) Processes
  6.1 Introduction
  6.2 Stationarity Condition of ARMA(p, q) Process
  6.3 Invertibility Condition of ARMA(p, q) Process
  6.4 Relationship between ψj, πj, φi and θi of ARMA(p, q) Process
  6.5 ACVF and ACF of the ARMA(p, q) Process
  6.6 Variance of the ARMA(p, q) Process
  6.7 PACF of the ARMA(p, q) Process
  6.8 ARMA(1, 1) Process
    6.8.1 Introduction
    6.8.2 Stationarity and Invertibility of ARMA(1, 1) Process
    6.8.3 ACVF and ACF of ARMA(1, 1) Process
    6.8.4 PACF of ARMA(1,1) Process
  6.9 Summary of Properties of AR, MA, and ARMA Processes

Syllabus

Unit-II: 12 hrs - 18 hrs - Time Series Models for Linear Stationary Processes

1. Time-series (t.s.) as Discrete Parameter Stochastic Process, Definition of Strict and Weak Stationarity of a t.s., Gaussian t.s., Ergodicity
2. Autocovariance and Autocorrelation Functions (ACVF and ACF) and their properties, Partial Autocorrelation Function (PACF)
3. General Linear Processes (G.L.P), Autocovariance generating function, Stationarity and Invertibility conditions of a G.L.P
4. Autoregressive Processes (AR(p)), Stationarity Condition, ACF, PACF, Yule-Walker Equations, AR(1) and AR(2) Processes
5. Moving Average (MA(q)) Processes, Invertibility Condition, ACF, PACF, MA(1), MA(2) Processes, Duality between AR(p) and MA(q) Processes
6. ARMA(p, q) Processes, Stationarity, Invertibility, ACF, PACF, ARMA(1,1) Processes

References
[1] Anderson, T. W., The Statistical Analysis of Time Series, Wiley, New York, 1971.
[2] Box, G. E. P., Jenkins, G. M., Reinsel, G. C. and Ljung, G. M., Time Series Analysis: Forecasting and Control, 5/e, Wiley, 2016.
[3] Brockwell, P. J. and Davis, R. A., Introduction to Time Series and Forecasting, 3/e, Springer, Switzerland, 2016.
[4] Brockwell, P. J. and Davis, R. A., Time Series: Theory and Methods, 2/e, Springer, New York, 2009.
[5] Chatfield, C., The Analysis of Time Series: Theory and Practice, 5/e, Chapman and Hall, London, 1996.
[6] Chatfield, C. and Xing, H., The Analysis of Time Series: An Introduction with R, 7/e, CRC Press, 2019.
[7] Cryer, J. D. and Chan, K. S., Time Series Analysis with Applications in R, 2/e, Springer, New York, 2008.
[8] Kirchgassner, G., Wolters, J. and Hassler, U., Introduction to Modern Time Series Analysis, 2/e, Springer, Berlin, 2013.
[9] Kendall, M. G. and Ord, J. K., Time Series, 3/e, Edward Arnold, New York, 1990.
[10] Nachane, D. M., Econometrics: Theoretical Foundations and Empirical Perspectives, Oxford University Press, London, 2006.
[11] Montgomery, D. C., Jennings, C. L. and Kulahci, M., An Introduction to Time Series Analysis and Forecasting, 2/e, Wiley, New Jersey, 2015.
[12] Suresh Chandra, K. (Former Professor, University of Madras), Lecture Notes on Inference for Time Series, Unpublished, 2019.

1 Time-series (t.s.) as Discrete Parameter Stochastic Process, Definition of Strict and Weak Stationarity of a t.s., Gaussian t.s., Ergodicity
1.1 Introduction
Definition 1.1 (Time-series (t.s.) as Discrete Parameter Stochastic Process). A time series is a discrete time stochastic process X = {Xt; t ∈ T}, where t is a time parameter with domain T = {0, ±1, ±2, . . .}.
Note: Let (Ω, A, P) be a given probability space. Let T = {0, ±1, ±2, . . .}. A sequence X = {Xt; t ∈ T} is said to be a discrete time stochastic process if it is a mapping from Ω to R^T (which is equal to ∏_{t∈T} Rt, where for each t, Rt is a copy of the real line R) such that for any integer m ≥ 1, t1, t2, . . . , tm ∈ T, and for all x1, x2, . . . , xm ∈ R, the event {(Xt1 ≤ x1) ∩ (Xt2 ≤ x2) ∩ (Xt3 ≤ x3) ∩ · · · ∩ (Xtm ≤ xm)} is a member of the sigma field A.
Remark: In the case of a stochastic process we have a large (countable or uncountable) number of distribution functions, and in most situations we have no option of repeating the experiment; much less do we have complete information on even a single realization!

1.1.1 Describing (Properties of ) Stochastic Processes

A simpler, more useful way of describing a stochastic process is to


give the moments of the process, particularly the first and second
moments that are called the mean and autocovariance function
(ACVF), respectively. The variance function is a special case of
the ACVF.
Definition 1.1 (Mean of a Stochastic Process). The mean func-
tion µt is defined for all t by
µt = E(Xt ). (1)
Definition 1.2 (Variance of a Stochastic Process). The vari-
ance function σt2 is defined for all t by
σt2 = var(Xt ) = E(Xt − µt )2 . (2)
Note: The variance function alone is not enough to specify the
second moments of a sequence of random variables. More generally,
we define Autocovariance Function.

Definition 1.3 (Autocovariance Function of a Stochastic Pro-
cess (ACVF)). The Autocovariance Function (ACVF) γt1 ,t2 is de-
fined to be the covariance of Xt1 and Xt2 ,(covariances between ran-
dom variables of the same stochastic process hence known as autoco-
variance) namely
γt1 ,t2 = E{(Xt1 − µt1 )(Xt2 − µt2 )}. (3)
Clearly, the variance function is a special case of the ACVF when
t1 = t2 = t(say), which implies Cov(Xt , Xt ) = V ar(Xt ) = γ0
Remark: Higher moments of a stochastic process may be defined
in an obvious way, but are rarely used in practice.
Note: The size of an autocovariance coefficient depends on the units in which Xt is measured. Thus, for interpretative purposes, it is helpful to standardize the ACVF to produce a function called the autocorrelation function (ACF). It is also called the serial correlation.
Definition 1.4 (Autocorrelation Function (ACF)). The autocorrelation function (ACF) ρt1,t2 is defined to be the correlation between Xt1 and Xt2, namely

ρt1,t2 = γt1,t2 / √(σ²t1 σ²t2) = E{(Xt1 − µt1)(Xt2 − µt2)} / √(σ²t1 σ²t2).   (4)
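As a small illustration (not part of the original notes), the following Python/NumPy sketch computes the sample analogues of the mean, ACVF and ACF from an observed series; it assumes a single estimate per lag is meaningful, which is the stationary case discussed in the next section, and the function names are ours.

import numpy as np

def sample_acvf(x, max_lag):
    # Sample autocovariances gamma_hat_k for k = 0, 1, ..., max_lag.
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    return np.array([np.sum((x[:n - k] - xbar) * (x[k:] - xbar)) / n
                     for k in range(max_lag + 1)])

def sample_acf(x, max_lag):
    # Sample autocorrelations rho_hat_k = gamma_hat_k / gamma_hat_0.
    g = sample_acvf(x, max_lag)
    return g / g[0]

# For an uncorrelated series the sample ACF should be near 0 at all nonzero lags.
rng = np.random.default_rng(0)
print(np.round(sample_acf(rng.normal(size=500), 5), 3))

Dividing by n (rather than n − k) is the usual convention because it keeps the estimated autocovariance sequence nonnegative definite, in line with property 5 of Section 2.1.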

1.2 Stationary Processes


To make statistical inferences about the structure of a stochastic
process on the basis of an observed record of that process, we must
usually make some simplifying (and presumably reasonable) assump-
tions about that structure. In particular, if we wish to make predictions, then clearly we must assume that something does not vary with time. In time series analysis our goal is to predict a series that typically is not deterministic but contains a random component; if we assume that this random component does not change with time, then we can develop powerful techniques to forecast its future values.
The most important such assumption is that of stationarity.
An important class of stochastic processes are those that are sta-
tionary. Broadly speaking a time series is said to be stationary if
there is no systematic change in mean (no trend), if there is no sys-
tematic change in variance and if strictly periodic variations have

been removed. In other words, the properties of one section of the
data are much like those of any other section.
Loosely speaking, a time series {Xt , t = 0, ±1, ±2, . . .} is said to
be stationary if it has statistical properties similar to those of the
“time-shifted” series {Xt+k , t = 0, ±1, ±2, . . .}, for each integer k.
Strictly speaking, time series data very often violate the stationarity property. However, the phrase is often used for time series data to mean that they exhibit characteristics suggesting a stationary model can sensibly be fitted.
Much of the probability theory of time series is concerned with
stationary time series, and for this reason time series analysis often
requires one to transform a non-stationary series into a stationary
one so as to use this theory. For example, it may be of interest to
remove the trend and seasonal variation from a set of data and then
try to model the variation in the residuals by means of a stationary
stochastic process. However, it is also worth stressing that the non-
stationary components, such as the trend, may be of more interest
than the stationary residuals.
Stationarity is an essential concept in time series analysis. Generally speaking, we can distinguish two definitions of stationarity.


The first definition is focused on the joint distribution of the process,
while the second focuses on the second order structure of the time
series model.
Definition 1.2 (Strict Stationarity of a Time Series). : A
time series is said to be strictly stationary if the joint distri-
bution of Xt1 , Xt2 , . . . , Xth is the same as the joint distribution of
Xt1+k, Xt2+k, . . . , Xth+k for all t1, t2, . . . , th and k (here h ≥ 1 is any integer, t1, t2, . . . , th ∈ T, and k is any positive integer, called the lag).

In other words, shifting the time origin by an amount k has no effect


on the joint distributions, which must therefore depend only on the
intervals between t1 , t2 , . . . , th .
Definition 1.3 (Strict Stationarity of a Time Series). : A
stochastic process X = {Xt ; t ∈ T } is said to be strictly stationary if
for any positive integer h ≥ 1, t1 , t2 , . . . , th ∈ T , and for any positive
integer k, the two vectors of Xt1 , Xt2 , . . . , Xth and Xt1 +k , Xt2 +k , . . . , Xth +k
have the same distribution.

Notes:
1. Strict Stationarity is also known as Strong Stationarity or Sta-
tionarity in Distribution.
2. In verbal terms a stationary process is that process whose prob-
abilistic laws remain unchanged over shifts of time. In other
words the utility of the observations x1 , x2 , . . . , xn in drawing
inference on the time series is the same as one gets by using
a data collected after a time shift say, x1+k , x2+k , . . . , xn+k , (
k being any positive integer). Interestingly a good number of
time series in several areas of research, like biology, ecology and
economics, exhibit this feature.
3. Moments, (even mean) of a strictly stationary time series need
not exist.
Example: A process consisting of a sequence of i.i.d. Cauchy
random variables.
4. An iid sequence is strictly stationary. The converse need not be true.
Example: The sequence Xt = X (the same random variable X for all t) is strictly stationary, but it is not iid: being a constant sequence, it exhibits maximal dependence (the process is identically distributed but not independent).
5. For a strictly stationary process, since the distribution func-
tion is same for all t, the mean function µt = µ is a constant,
provided E(|Xt |) < ∞. Likewise, if E(Xt2 ) < ∞, then the vari-
ance function σt2 = σ 2 for all t and hence is a constant.
6. Moreover, since FXt1 ,Xt2 (x1 , x2 ) = FXt1 +k ,Xt2 +k (x1 , x2 ) (when h =
2) for any integers t1 , t2 and k, we have

γt1 ,t2 = γt1 +k,t2 +k (5)

and
ρt1 ,t2 = ρt1 +k,t2 +k . (6)
Letting t1 = t − k and t2 = t, we get the autocovariance coeffi-
cient at lag k,

γt1 ,t2 = γt−k,t = γt,t+k = γk , (7)

and the autocorrelation coefficient at lag k,

ρt1,t2 = ρt−k,t = ρt,t+k = ρk = γk / γ0.   (8)
Thus, for a strictly stationary process with first two moments
finite, the covariance and the correlation between Xt and Xt+k
depend only on the time difference k.
Remark: Strict stationarity is too stringent for verification in
practical applications. It has to be weakened to a level that will be
useful for statistical inference. That is, in practice it is often use-
ful to define stationarity in a less restricted way than that described
above. Such a concept is known as Weak/Second-Order/Covariance
Stationarity, which is defined in terms of the moments of the process.
Definition 1.5 (Weak Stationarity of a Time Series). A Stochas-
tic process is called second-order stationary (or weakly stationary or
Covariance stationary) if its mean is constant and its ACVF depends
only on the lag, so that

E(Xt ) = µ ∀ t ∈ T (9)

and
Cov(Xt , Xt+k ) = γk ∀ k, and t ∈ T, (10)
or
Cov(Xt+s , Xt+r ) = γ|r−s| ∀ r, s, and t ∈ T. (11)
Note:
1. By letting k = 0, we note that the form of a stationary ACVF
implies that the variance, as well as the mean, is constant. Fur-
ther, the autocovariance and autocorrelation being functions of
the time differences alone.
2. The definition also implies that both the variance and the mean
must be finite.
3. Sometimes, the terms stationary in the wide sense or covariance stationary are also used to describe a second-order weakly stationary process.

4. It follows from the above definitions (strict and weak) that a strictly stationary process with the first two moments finite is also a second-order weakly or covariance stationary process. Yet, a strictly stationary process may not have finite moments and hence may not be covariance stationary (iid Cauchy random variables: strictly stationary but not weakly stationary, because no moments exist).
5. No requirements are placed on the higher order moments.
Remark:
1. This weaker definition of stationarity will generally be used from
now on, as many of the properties of stationary processes depend
only on the structure of the process as specified by its first and
second moments.(For instance, Normal Process).
2. The assumption of weak stationarity can be checked empirically provided that a sufficient number of historical data (time series) are available. For example, one can divide the data into subsamples and check the consistency of the results obtained across the subsamples.
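A rough empirical check of this kind can be sketched in Python as follows (an informal illustration only; the block count is an arbitrary choice, not a formal test):

import numpy as np

def subsample_check(x, n_blocks=4):
    # Compare sample mean, variance and lag-1 autocorrelation across blocks.
    blocks = np.array_split(np.asarray(x, dtype=float), n_blocks)
    for i, b in enumerate(blocks, 1):
        r1 = np.corrcoef(b[:-1], b[1:])[0, 1]
        print(f"block {i}: mean={b.mean():7.3f}  var={b.var():7.3f}  r1={r1:6.3f}")

rng = np.random.default_rng(1)
subsample_check(rng.normal(size=400))             # white noise: stable statistics
subsample_check(np.cumsum(rng.normal(size=400)))  # random walk: statistics drift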

Definition 1.4 (Normal or Gaussian Process). : A Stochas-


tic process is said to be a normal or Gaussian process if its joint
probability distribution is normal.
Remark: Because a normal distribution is uniquely characterised by its first two moments, strict and weak stationarity are equivalent for a Gaussian process.

1.2.1 Examples of Some Processes

Example 1 : IID Noise - Perhaps the simplest model for a time series
is one in which there is no trend or seasonal component and
in which the observations are simply independent and identi-
cally distributed (iid) random variables with zero mean. Such
a sequence of random variables X1, X2, . . . is known as an iid sequence, indicated by {Xt} ∼ IID(0, σ²).
Note : In this model there is no dependence between observa-
tions, i.e., knowledge of X1 , X2 , . . . , Xn is of no value in pre-
dicting the behaviour of the future observation Xn+h (say). Al-
though this means that iid noise is a rather uninteresting process

for forecasters, it plays an important role as a building block for
more complicated time series models.
Example 2 : Random Walk (RW) - Suppose that {Xt} is a discrete-time, purely random process with mean µ and variance σ²X. A process {St} is said to be a random walk if

St = St−1 + Xt.   (12)

The RW St, t = 0, 1, 2, . . . (the process is customarily started at zero when t = 0) is obtained by cumulatively summing (or “integrating”) iid random variables. Thus a random walk is obtained by defining S0 = 0 and St = X1 + X2 + · · · + Xt = ∑_{i=1}^{t} Xi for t = 1, 2, . . ..

Note: In the above example, E(St) = tµ and Var(St) = tσ²X. As the mean and variance change with t, the process is nonstationary. However, it is interesting to note that the first difference of a RW, given by

∇St = St − St−1 = Xt,   (13)

forms a purely random process {∇St}, which is therefore stationary.
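A brief simulation illustrates these moments (an illustrative sketch, not from the notes; the ensemble mean and variance are estimated across many independently simulated paths):

import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 0.5, 1.0
n_paths, n_steps = 5000, 200

# Many independent random-walk paths S_t = X_1 + ... + X_t.
x = rng.normal(mu, sigma, size=(n_paths, n_steps))
s = np.cumsum(x, axis=1)

for t in (10, 50, 200):
    col = s[:, t - 1]
    print(f"t={t:3d}  mean={col.mean():8.2f} (t*mu={t * mu})"
          f"  var={col.var():8.2f} (t*sigma^2={t * sigma ** 2})")

# The first difference recovers the stationary purely random input.
d = np.diff(s, axis=1)
print("diff mean =", round(float(d.mean()), 3), " diff var =", round(float(d.var()), 3))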
Example 3 : White Noise - If {Xt } is a sequence of uncorrelated random
variables, each with zero mean and variance σ 2 , then clearly
{Xt } is stationary with the same covariance function as the iid
noise. Such a sequence is referred to as White Noise(with mean
0 and variance σ 2 ), indicated by {Xt } ∼ W N (0, σ 2 )
Note: Clearly every IID(0, σ²) sequence is WN(0, σ²), but not conversely. The iid sequence is a special case of the WN process; for this reason the iid sequence is known as an independent WN process.
The term white noise (Note that a purely random process is
sometimes called white noise, particularly by engineers.) arises
from the fact that a frequency analysis of the model shows that,
in analogy with white light, all frequencies enter equally.
Example 4 : Consider the time sequence Xt = Asin(ωt + θ), where A is a
random variable with a zero mean and a unit variance and θ is
a random variable with a uniform distribution on the interval
[−π, π] independent of A. The process is stationary (Verify).

Example 5 : Let {Xt} be a sequence of independent random variables alternately following a standard normal distribution N(0, 1) and a two-sided discrete uniform distribution with equal probability 1/2 of being −1 or 1. Clearly, the process {Xt} is covariance stationary (verify). The process, however, is not strictly stationary; in fact, it is not stationary in distribution for any order h.

1.2.2 Relationship between Strong and Weak Stationarity

1. First note that finite second moments are not assumed in the
definition of strong stationarity, therefore strong stationarity
does not necessarily imply weak stationarity.
2. If the process {Xt ; t ∈ Z} is strongly stationary and has finite
second moment, then {Xt ; t ∈ Z} is weakly stationary.
3. Of course, a weakly stationary process is not necessarily strongly stationary, i.e., weak stationarity does not imply strong stationarity.
4. There is one important case, however, in which weak stationarity implies strong stationarity: if the process {Xt; t ∈ Z} is a weakly stationary Gaussian (normal) process, then {Xt; t ∈ Z} is strongly stationary.

1.3 Ergodicity
In a strictly stationary or covariance stationary stochastic process no
assumption is made about the strength of dependence between random
variables in the sequence. For example, in a covariance stationary
stochastic process it is possible that ρ1 = Cor(Xt , Xt−1 ) = ρ100 =
Cor(Xt , Xt−100 ) = 0.5 say. However, in many contexts it is rea-
sonable to assume that the strength of dependence between random
variables in a stochastic process diminishes the farther apart they be-
come. That is, ρ1 > ρ2 > · · · and that eventually ρj = 0 for j large
enough. This diminishing dependence assumption is captured by the
concept of ergodicity.
Definition 1.5 ( Ergodicity). : Intuitively, a stochastic process
{Xt } is ergodic if any two collections of random variables partitioned
far apart in the sequence are essentially independent.
Note: The stochastic process {Xt } is ergodic if Xt and Xt−j are
essentially independent if j is large enough.

Definition 1.6 (Ergodicity): A covariance stationary time series {Xt} is said to be ergodic if for every realization x1, x2, . . . , xn, the sample mean and sample autocovariance converge in mean square to their true (ensemble) statistical quantities, i.e.,

1. lim_{n→∞} E[((1/n) ∑_{t=1}^{n} xt − µ)²] = 0, and   (14)

2. lim_{n→∞} E[((1/n) ∑_{t=1}^{n} (xt − x̄)(xt−k − x̄) − γk)²] = 0, 0 ≤ k ≤ n.   (15)
In other words, a time average like (1/n) ∑_{t=1}^{n} xt converges to a population quantity like E(Xt) as n → ∞, which can be regarded as the counterpart, for a stationary series, of the law of large numbers in the iid case. A sufficient condition for this to happen is that ρk → 0 as k → ∞, and the process is then called ‘ergodic in the mean’. If the autocovariances of a covariance-stationary process satisfy ∑_{j=0}^{∞} |γj| < ∞, then {Xt; t ∈ T} is ergodic for the mean. If {Xt; t ∈ T} is a stationary Gaussian process, absolute summability of the autocovariances is sufficient to ensure ergodicity for all moments.
Note: Ergodicity helps to study the important properties of the
process from a (fairly large) single (partial) realization of a time
series, without going for independent trials of the experiment that
generates the time series.
In other words, ergodicity implies that the statistical properties of
a process can be deduced given a single and sufficiently long sample
path.
It helps us to construct asymptotically unbiased and consistent estimates of the first (e.g. mean) and second moments (e.g. autocovariance function and hence autocorrelation function) of the time series without the need for independent trials as required in inference under the iid setup. There are several theorems, called ergodic theorems for stationary processes, that support the validity of the inference.
We will not pursue the topic here but rather simply assume that
appropriate ergodic properties are satisfied when estimating the properties of stationary processes. More details may be found, for exam-
ple, in Hamilton (1994, p. 46).
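As an informal illustration of ergodicity in the mean (a sketch of ours, not taken from the references), the time average along one long simulated AR(1) path settles down to the ensemble mean:

import numpy as np

rng = np.random.default_rng(3)
phi, mu, n = 0.6, 5.0, 100_000

# One long realization of a stationary AR(1) process with mean mu.
x = np.empty(n)
x[0] = mu
for t in range(1, n):
    x[t] = mu + phi * (x[t - 1] - mu) + rng.normal()

# Time averages over longer and longer stretches approach mu.
for m in (100, 1_000, 10_000, 100_000):
    print(m, round(x[:m].mean(), 3))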

Remarks:
1. The notions of stationarity and ergodicity play a central role in the estimation of the statistical quantities of a time series process.

2 Autocovariance Function (ACVF) and Autocorrelation Function (ACF) and their properties, Partial Autocorrelation Function (PACF)
In subsection 1.1.1, we have defined ACVF and ACF which are seen
as tools to describe a Stochastic Process. The ACVF and ACF pro-
vide a useful measure of the degree of dependence among the values of
a time series at different times and for this reason play an important
role when we consider the prediction of future values of the series in
terms of the past and present values. For a stationary time series,
the functions γk and ρk defined earlier are better known as the autocovariance function (ACVF) and autocorrelation function (ACF) at


lag k of the time series. They can be estimated from observations of
X1 , X2 , . . . , Xn by computing the sample ACVF and ACF. Here, we
shall investigate some of their general properties.

2.1 Properties of Autocovariance Function (ACVF) and Autocorrelation Function (ACF)
It is easy to see that the Autocovariance Function (ACVF) γk and
Autocorrelation Function (ACF) ρk have the following properties:
1. γ0 = V ar(Xt ); ρ0 = 1
2. |γk | ≤ γ0 ; |ρk | ≤ 1
3. γk = γ−k; ρk = ρ−k ∀ k, i.e., γk and ρk are even functions and hence symmetric about the lag k = 0. This property follows from the fact that the time difference between Xt and Xt+k is the same as that between Xt and Xt−k. Therefore, the autocorrelation function is often plotted only for the nonnegative lags. This plot is sometimes known as the Correlogram.

4. The acf does not uniquely identify the underlying model.
5. The acvf and acf are positive semidefinite (nonnegative definite) in the sense that

∑_{i=1}^{n} ∑_{j=1}^{n} αi αj γ|ti−tj| ≥ 0   (16)

and

∑_{i=1}^{n} ∑_{j=1}^{n} αi αj ρ|ti−tj| ≥ 0   (17)

for any set of time points t1, t2, . . . , tn and any real numbers α1, α2, . . . , αn.
Note:
1. Although a given stochastic process has a unique covariance
structure, the converse is not in general true. It is usually
possible to find many normal and non-normal processes with
the same ACF, and this creates further difficulty in interpret-
ing sample ACFs. Even for stationary normal processes, which
are completely determined by the mean, variance and ACF, we will see later in due course that a requirement, called the invertibility condition, is needed to ensure uniqueness.
2. It is pertinent to know that not every arbitrary function satis-
fying properties 1 to 3 can be an acvf or acf for a process. A
necessary condition for a function to be the acvf or acf of some
process is that it be positive definite.
3. The proofs of properties 1, 2, 3 and 5 can be found in standard
text books on Time Series Analysis. (Kindly refer them and
prepare the respective proofs)

2.1.1 Conditions Satisfied by the Autocorrelations of a Stationary Pro-


cess

The covariance matrix associated with a stationary process for observations (X1, X2, . . . , Xn) made at n successive times is the symmetric n × n matrix with (i, j)th element γ|i−j| = σ²x ρ|i−j| (see Box and Jenkins, p. 26). Note that both the autocovariance matrix and the autocorrelation matrix are positive definite for any stationary process. The positive definiteness of the autocorrelation matrix implies, in particular, that |ρk| < 1 for k ≥ 1 and, more generally, that the autocorrelations of a stationary process cannot take arbitrary values: they must satisfy the determinantal inequalities that positive definiteness imposes.

2.2 Partial Autocorrelation Function (PACF)
Apart from the ACVF and ACF tools, Partial Autocorrelation Func-
tion (PACF) is another important tool to analyse Time Series. While
developing a time series model, correlograms will be used extensively.
The interpretation of correlograms is one of the hardest aspects of
time series analysis and practical experience is a ‘must’. Inspection
of the partial autocorrelation function can provide additional help.
Here is an instance which sheds light on the above aspect: Due
to the stability conditions, autocorrelation functions of stationary fi-
nite order autoregressive processes (to be discussed later) are always
sequences that converge to zero but do not break off. This makes
it difficult to distinguish between processes of different orders when
using the autocorrelation function. To cope with this problem, we
introduce a new concept, the partial autocorrelation function.
The partial correlation between two random variables is the cor-
relation that remains if the possible impact of all other random vari-
ables has been eliminated.
Definition 2.1 (Partial Autocorrelation Function (PACF) of order (lag) k): The Partial Autocorrelation Function (PACF) of order (lag) k, denoted by φkk, is the correlation coefficient between Xt and Xt+k after their mutual linear dependency on the intervening variables Xt+1, Xt+2, . . . , Xt+k−1 has been removed, i.e., the conditional correlation

φkk = Corr(Xt, Xt+k | Xt+1, Xt+2, . . . , Xt+k−1).   (18)

2.2.1 Calculating PACF

Consider the regression model, where the dependent variable Xt+k


from a zero mean stationary process is regressed on k lagged variables
Xt+k−1 , Xt+k−2 , . . . , Xt+1 , Xt , i.e.,

Xt+k = φk1 Xt+k−1 + φk2 Xt+k−2 + · · · + φk(k−1) Xt+1 + φkk Xt + εt+k,   (19)

where φki denotes the ith regression parameter and εt+k is an error term with mean 0 and uncorrelated with Xt+k−j for j = 1(1)k. Multiplying both sides of the above regression equation by Xt+k−j and taking expectations, we get

γj = φk1 γj−1 + φk2 γj−2 + · · · + φk(k−1) γj−(k−1) + φkk γj−k ; (20)

and

ρj = φk1 ρj−1 + φk2 ρj−2 + · · · + φk(k−1) ρj−(k−1) + φkk ρj−k . (21)

For j = 1(1)k, we have the following system of equations:

ρ1 = φk1 ρ0 + φk2 ρ1 + · · · + φk(k−1) ρk−2 + φkk ρk−1


ρ2 = φk1 ρ1 + φk2 ρ0 + · · · + φk(k−1) ρk−3 + φkk ρk−2
..
.
ρk−1 = φk1 ρk−2 + φk2 ρk−3 + · · · + φk(k−1) ρ2 + φkk ρ1
ρk = φk1 ρk−1 + φk2 ρk−2 + · · · + φk(k−1) ρ1 + φkk ρ0 .

Using Cramer’s rule successively for k = 1, 2, . . ., we have

φ11 = ρ1,

φ22 = det[1, ρ1; ρ1, ρ2] / det[1, ρ1; ρ1, 1],

φ33 = det[1, ρ1, ρ1; ρ1, 1, ρ2; ρ2, ρ1, ρ3] / det[1, ρ1, ρ2; ρ1, 1, ρ1; ρ2, ρ1, 1],

and so on; in general,

φkk = det(Pk*) / det(Pk),   (22)

where det[·] denotes the determinant of the matrix written row by row (rows separated by semicolons), Pk is the k × k autocorrelation matrix with (i, j)th element ρ|i−j|, and Pk* is Pk with its last column replaced by (ρ1, ρ2, . . . , ρk)′.

Thus, the partial autocorrelation between Xt and Xt+k can be ob-
tained as the regression coefficient associated with Xt when regress-
ing Xt+k on its lagged variables Xt+k−1 , Xt+k−2 , . . . , Xt+1 , Xt as in
equation (19).

Remark: It is pertinent to note that the PACF is computed using the ACF.
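A small Python sketch of this computation (illustrative only; it solves the linear system preceding equation (22) with numpy.linalg.solve rather than evaluating the determinants, and the function name is ours):

import numpy as np

def pacf_from_acf(rho):
    # rho = [rho_1, ..., rho_K]; returns [phi_11, phi_22, ..., phi_KK].
    rho = np.asarray(rho, dtype=float)
    full = np.concatenate(([1.0], rho))       # rho_0 = 1 prepended
    pacf = []
    for k in range(1, len(rho) + 1):
        # P_k: k x k autocorrelation matrix with (i, j) element rho_|i-j|.
        P_k = np.array([[full[abs(i - j)] for j in range(k)] for i in range(k)])
        phi_k = np.linalg.solve(P_k, rho[:k])
        pacf.append(phi_k[-1])                 # phi_kk is the last coefficient
    return np.array(pacf)

# ACF of an AR(1) process with phi_1 = 0.7 is rho_k = 0.7**k;
# the resulting PACF cuts off after lag 1, as discussed later in Section 4.
print(np.round(pacf_from_acf(0.7 ** np.arange(1, 6)), 4))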

Note:
1. As a function of k, φkk is usually referred to as the Partial Autocorrelation Function (PACF).
2. PACF is a useful tool for determining the order p of an Autore-
gressive (AR) model, which will be discussed in the sequel.
3. The plots of ACF and PACF play a crucial role in modeling a
time series.
4. There is an alternative way of obtaining PACF, refer Box et al.
(5th edition, 2016, p. 64) for the same.
3 General Linear Processes (G.L.P), Autocovariance generating function, Stationarity and Invertibility conditions of a G.L.P
3.1 Introduction
The class of linear time series models, which includes the class of
Autoregressive Moving-average (ARMA) models, provides a general
framework for studying stationary processes. Consider the process
discussed in Example-3 of section-1.2.1, White Noise Process. Let us
denote this process as {at ; t ∈ Z}, which is a sequence of uncorrelated
random variables such that:
1. E(at ) = 0 and V ar(at ) = σa2 , for all t,
2. the ACVF is

γk = σ²a for k = 0, and γk = 0 for k ≠ 0,   (23)

3. hence, the ACF is

ρk = 1 for k = 0, and ρk = 0 for k ≠ 0,   (24)

and

4. the PACF is

φkk = 1 for k = 0, and φkk = 0 for k ≠ 0.   (25)

Using this model, we can construct several useful stochastic models


for time series. Although the white noise process has very basic prop-
erties(discussed above), this process plays an important role in the
building of processes with much more interesting and more compli-
cated properties. Here, we describe a general linear stochastic model
(General Linear Process(GLP)) that assumes that the time series is
generated by a linear aggregation of such random shocks {at ; t ∈ Z}.
Definition 3.1 (General Linear Process). A General Linear Pro-
cess, {Xt }, is the one that can be represented as a weighted linear
SR

combination of present and past white noise terms as


Xt = at + ψ1 at−1 + ψ2 at−2 + · · · .
The above can be written compactly as

X
Xt = at + ∑_{j=1}^{∞} ψj at−j.   (26)
If the right-hand side of this expression (26) is truly an infinite se-


ries, then certain conditions must be placed on the ψ-weights for the
right-hand side to be meaningful mathematically. For our purposes,
it suffices to assume that

∑_{j=1}^{∞} ψ²j < ∞.   (27)

We should also note that since {at } is unobservable, there is no loss


in the generality of Equation (26) if we assume that the coefficient
on at is 1; effectively, ψ0 = 1.

Note:

1. The general linear process (26) allows us to represent Xt as a
weighted sum of present and past values of the “white noise”
process at .
2. The white noise process at may be regarded as a series of shocks
(innovations) that drive the system.
3. For Xt to represent a valid stationary process, it is necessary for the coefficients ψj to be absolutely summable, that is, ∑_{j=0}^{∞} |ψj| < ∞, in which case we also have ∑_{j=0}^{∞} ψ²j < ∞ so that Var(Xt) is finite.
4. A process with a nonzero mean µ may be obtained by adding µ to the right-hand side of Equation (26). Since the mean does not affect the covariance properties of a process, we assume a zero mean (taking E(Xt) = 0) for convenience until we begin fitting models to data. If E(Xt) = µ, then the general linear process (26) can be rewritten as Xt = µ + ∑_{j=0}^{∞} ψj at−j, or X̃t = Xt − µ = ∑_{j=0}^{∞} ψj at−j. Here X̃t, which is the deviation from its level µ, follows a GLP.

3.2 Alternative Form of GLP


The GLP given in (26) can be expressed in an alternative form, where
in, it may be thought of as being “regressed” on past Xt−1 , Xt−2 , . . .,
of the process. That is, Xt is a weighted sum of past Xt ’s and an
added shock at , that is,

Xt = π1 Xt−1 + π2 Xt−2 + · · · + at .

The above can be written compactly as



Xt = ∑_{j=1}^{∞} πj Xt−j + at.   (28)

Relationships between the ψ Weights and π Weights


The relationships between the ψ weights and the π weights may be
obtained by using the previously defined backward shift operator B,
such that

BXt = Xt−1 and hence Bj Xt = Xt−j .

Later, we will also need to use the forward shift operator F = B −1 ,
such that
FXt = Xt+1 and hence Fj Xt = Xt+j .
Using the backshift operator B, the model (26) can be written as

Xt = (1 + ∑_{j=1}^{∞} ψj B^j) at

or

Xt = ψ(B) at,   (29)

where

ψ(B) = 1 + ∑_{j=1}^{∞} ψj B^j = ∑_{j=0}^{∞} ψj B^j

with ψ0 = 1. Here ψ (B) is called the transfer function of the linear


filter relating Xt to at . The operator ψ (B) can be thought of as
a linear filter, which when applied to the white noise “input” series
{at } produces the “output” {Xt }. It can be regarded as the generating
function of the ψ weights, with B now treated simply as a variable
SR

whose jth power is the coefficient of ψj .


Similarly, (28) may be written as

(1 − ∑_{j=1}^{∞} πj B^j) Xt = at

or

π(B) Xt = at.   (30)

Thus,

π(B) = 1 − ∑_{j=1}^{∞} πj B^j
is the generating function of the π weights. After operating on both
sides of this expression (30) by ψ(B), we obtain

ψ(B) π(B) Xt = ψ(B) at = Xt.

Hence, ψ(B) π(B) = 1, so that

π(B) = ψ^{−1}(B).   (31)
This relationship may be used to derive the π weights, knowing the
ψ weights, and vice versa.
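For example, the π weights can be recovered numerically from a (truncated) set of ψ weights by equating coefficients in ψ(B) π(B) = 1, as in the following sketch (illustrative only; the truncation length is an assumption and the helper name is ours):

import numpy as np

def pi_from_psi(psi_tail, n_terms=10):
    # psi_tail = [psi_1, psi_2, ...] with psi_0 = 1 understood.
    psi = np.concatenate(([1.0], np.asarray(psi_tail, dtype=float),
                          np.zeros(n_terms)))          # zero-pad the tail
    c = np.zeros(n_terms + 1)                          # pi(B) = sum_j c_j B^j
    c[0] = 1.0
    for n in range(1, n_terms + 1):
        c[n] = -np.dot(psi[1:n + 1], c[n - 1::-1])     # coefficient of B^n must vanish
    return -c[1:]                                      # pi_j = -c_j for j >= 1

# psi_j = 0.5**j, i.e. psi(B) = 1/(1 - 0.5B); the inverse gives pi_1 = 0.5 and the rest 0.
print(np.round(pi_from_psi([0.5 ** j for j in range(1, 15)], 6), 4))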

3.3 Properties of GLP
For a general linear process, as given in expression (26), note that:
1. The mean:

E(Xt) = E(∑_{j=0}^{∞} ψj at−j) = ∑_{j=0}^{∞} ψj E(at−j) = 0, since E(at) = 0 ∀ t.   (32)

Note that, if Xt has mean µ, then using note 4 above, E(Xt) = E(µ + ∑_{j=0}^{∞} ψj at−j) = µ + E(∑_{j=0}^{∞} ψj at−j) = µ.
2. Since the autocorrelation function is a basic data analysis tool for identifying models, it is important to know the autocorrelation function of a linear process. The autocovariance at lag k of the linear process Xt = ∑_{j=0}^{∞} ψj at−j, with ψ0 = 1, is clearly

γk = E(Xt Xt+k) = E(∑_{j=0}^{∞} ∑_{h=0}^{∞} ψj ψh at−j at+k−h) = σ²a ∑_{j=0}^{∞} ψj ψj+k, using equation (23).   (33)

In particular, by setting k = 0, we find that its variance (γ0) is

γ0 = σ²X = σ²a ∑_{j=0}^{∞} ψ²j.   (34)

Hence, the required autocorrelation function of a linear process is

ρk = γk / γ0 = (σ²a ∑_{j=0}^{∞} ψj ψj+k) / (σ²a ∑_{j=0}^{∞} ψ²j) = (∑_{j=0}^{∞} ψj ψj+k) / (∑_{j=0}^{∞} ψ²j).   (35)
3. The Autocovariance Generating Function of a GLP is defined as

γ(B) = ∑_{k=−∞}^{∞} γk B^k.   (36)

Note: The result (33) may be substituted in the autocovariance generating function (36) to give

γ(B) = σ²a ∑_{k=−∞}^{∞} ∑_{j=0}^{∞} ψj ψj+k B^k
     = σ²a ∑_{j=0}^{∞} ∑_{k=−j}^{∞} ψj ψj+k B^k   (since ψh = 0 for h < 0)
     = σ²a ∑_{j=0}^{∞} ∑_{h=0}^{∞} ψj ψh B^{h−j}   (writing j + k = h, so k = h − j)
     = σ²a (∑_{h=0}^{∞} ψh B^h)(∑_{j=0}^{∞} ψj B^{−j})
     = σ²a ψ(B) ψ(B^{−1})
     = σ²a ψ(B) ψ(F).   (37)

Note: This method is a convenient way of calculating the autoco-


variances for some linear processes. The corresponding autocorrela-
tion generating function will be

ρ(B) = ∑_{k=−∞}^{∞} ρk B^k = γ(B) / γ0.   (38)
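The following sketch evaluates (33) and (35) for a truncated set of ψ weights (illustrative only; a long truncation is assumed to approximate the infinite sums):

import numpy as np

def glp_acvf(psi, sigma_a2=1.0, max_lag=5):
    # gamma_k = sigma_a^2 * sum_j psi_j psi_{j+k}; psi = [psi_0, psi_1, ...] truncated.
    psi = np.asarray(psi, dtype=float)
    return np.array([sigma_a2 * np.sum(psi[:len(psi) - k] * psi[k:])
                     for k in range(max_lag + 1)])

psi = 0.6 ** np.arange(50)           # psi_j = 0.6**j, an AR(1)-type weight sequence
g = glp_acvf(psi, sigma_a2=2.0, max_lag=4)
print(np.round(g, 3))                # autocovariances gamma_k
print(np.round(g / g[0], 3))         # autocorrelations rho_k from (35); about 0.6**k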

3.4 Stationarity and Invertibility Conditions of a General Linear Process
Here, we shall study concepts like Stationarity, Invertibility and Causal-
ity. Further, we learn the conditions under which the GLP will be
stationary, invertible and causal, respectively.

Stationarity Conditions of a General Linear Process: The


convergence of the series (34) ensures that the process has a finite variance. Also, we have seen in Section 2.1.1 that the autocovariances and autocorrelations must satisfy a set of conditions to ensure stationarity. For a linear process (26), these conditions are guaranteed by the single condition that ∑_{j=0}^{∞} |ψj| < ∞. This condition can also be embodied in the condition that the series ψ(B), which is the generating function of the ψ weights, must converge for |B| ≤ 1, that is, on or within the unit circle.

Invertibility Conditions of a General Linear Process: We


have seen that the ψ weights of a linear process must satisfy the con-
dition that ψ (B) converges on or within the unit circle if the process
is to be stationary. We now consider a similar restriction applied to
the π weights to ensure what is called invertibility. The invertibil-
ity requirement is needed to associate present events with past values
in a sensible manner. This invertibility condition is independent of
the stationarity condition and is also applicable to the nonstationary
linear models, which will be discussed later.
For a linear process (30), the invertibility condition is guaranteed by the single condition that ∑_{j=0}^{∞} |πj| < ∞. This condition can also be embodied in the condition that the series π(B), which is the generating function of the π weights, must converge for |B| ≤ 1, that is, on or within the unit circle.


Definition 3.2 (Invertibility). A linear process {Xt} is invertible (strictly, an invertible function of {at}) if there is a π(B) = π0 − π1 B − π2 B² − · · · , with ∑_{j=0}^{∞} |πj| < ∞ and at = π(B) Xt.

Remark: Thus, to summarize, a linear process given in expression (26/29/30) is stationary if ∑_{j=0}^{∞} |ψj| < ∞ and invertible if ∑_{j=0}^{∞} |πj| < ∞, where π(B) = ψ^{−1}(B) = 1 − ∑_{j=1}^{∞} πj B^j.

Causality Conditions of a General Linear Process: We say that {Xt} is causal, or a causal function of {at}, when Xt can be expressed in terms of the current and past values as, s ≤ t.
Definition 3.3 (Causality). A linear process {Xt} is causal (strictly, a causal function of {at}) if there is a ψ(B) = ψ0 + ψ1 B + ψ2 B² + · · · , with ∑_{j=0}^{∞} |ψj| < ∞ and Xt = ψ(B) at.

Remark: Just as causality means that Xt is expressible in terms


of as , s ≤ t, the dual concept of invertibility means that at is express-
ible in terms of Xs , s ≤ t.

Note: Having studied the General Linear Process, we now focus
our attention on more realistic models. For practical representation,
it is desirable to employ models that use parameters parsimoniously.
Parsimony may often be achieved by representation of the linear pro-
cess in terms of a small number of autoregressive–moving average
(ARMA) terms. The GLP contains an infinite number of parameters that are impossible to estimate from a finite number of available observations. Instead, in modeling a phenomenon, we construct models with only a finite number of parameters (this is what is known as the principle of parsimony).

4 Autoregressive Processes (AR(p)), Stationarity Condition, ACF, PACF, Yule-Walker Equations, AR(1) and AR(2) Processes
4.1 Introduction
Consider the alternative form of the GLP given in expression (30), i.e., π(B) Xt = at, where π(B) = 1 − ∑_{j=1}^{∞} πj B^j and 1 + ∑_{j=1}^{∞} |πj| < ∞, and {at} ∼ WN(0, σ²a). In this model, if only a finite number of π weights are nonzero, i.e., π1 = φ1, π2 = φ2, . . . , πp = φp and πk = 0 ∀ k > p, then the resulting process (a special case of (30)) is said to be an Autoregressive process of order p; we abbreviate the name to AR(p) process.
Definition 4.1 (Autoregressive Process of order p (AR(p)).
Suppose that {at } is a purely random process with mean zero and
variance σa2 . Then a process {Xt } is said to be an autoregressive
process of order p (abbreviated to an AR(p) process) if

Xt = φ1 Xt−1 + φ2 Xt−2 + · · · + φp Xt−p + at . (39)

The AR(p) model can be written in the equivalent form using the
backshift operator B

(1 − φ1 B − φ2 B2 − · · · − φp Bp )Xt = at

or
φ (B)Xt = at . (40)

This implies that

Xt = at / φ(B) = φ^{−1}(B) at ≡ Ψ(B) at.
Hence, the autoregressive process can be thought of as the output Xt
from a linear filter with transfer function φ −1 (B) = Ψ(B) when the
input is white noise at .

Note :
1. The AR processes are useful in describing situations in which
the present value of a time series depends linearly on its immediate pre-
vious values together with a random shock.
2. The current value of the series Xt is a linear combination of the
p most recent past values of itself plus an “innovation” term at
that incorporates everything new in the series at time t that is
not explained by the past values. Thus, for every t, we assume
that at is independent of Xt−1 , Xt−2 , Xt−3 , . . . , Xt−p .
3. An AR model is rather like a multiple regression model, but Xt is regressed on past values of Xt rather than on separate predictor


variables. This explains the prefix ‘auto’ (regressions on them-
selves).
4. The symbols φ1 , φ2 , . . . , φp are the finite set of weight param-
eters, known as Autoregressive parameters. These weight pa-
rameters are unknown and hence will be estimated from the
observed Xt series (data).
5. Recall that we are assuming that Xt has zero mean. We can
always introduce a nonzero mean by replacing Xt by Xt − µ, ∀ t
throughout our equations.
6. The autoregressive operator is defined to be φ(B) = (1 − φ1 B −
φ2 B2 − · · · − φp Bp ).
7. Yule (1926) carried out the original work on autoregressive pro-
cesses. He used this process to describe the phenomena of sunspot
numbers and the behaviour of a simple pendulum.
Definition 4.2 (Autoregressive Process of Order p). {Xt } is an
AR(p) process if {Xt } is stationary and if for every t,
Xt − φ1 Xt−1 − φ2 Xt−2 − · · · − φp Xt−p = at , (41)

where {at } ∼ W N (0, σa2 ) and (1 − φ1 z − φ2 z 2 − · · · − φp z p ) is the pth
degree polynomial.

4.2 Stationarity Conditions of AR(p) Process


The general AR(p) process φ(B) Xt = at can be written as

Xt = φ^{−1}(B) at ≡ ψ(B) at = ∑_{j=0}^{∞} ψj at−j   (42)

provided that the right-side expression is convergent. Using the factorization

φ(B) = (1 − G1 B)(1 − G2 B) · · · (1 − Gp B),

where G1^{−1}, G2^{−1}, . . . , Gp^{−1} are the roots of φ(B) = 0, expanding φ^{−1}(B) in partial fractions (partial fraction decomposition) yields

Xt = φ^{−1}(B) at = ∑_{i=1}^{p} [Ki / (1 − Gi B)] at.

Hence, ψ(B) = φ^{−1}(B) must be a convergent series for |B| ≤ 1, i.e. the weights ψj = ∑_{i=1}^{p} Ki Gi^j (as can be shown) must be absolutely summable, for the AR(p) process to be stationary. That is,

∑_{j=0}^{∞} |ψj| < ∞, i.e. ∑_{j=0}^{∞} |∑_{i=1}^{p} Ki Gi^j| < ∞
⟹ |Gi| < 1, ∀ i = 1(1)p
⟹ |1/Gi| > 1, i.e. |Gi^{−1}| > 1, ∀ i = 1(1)p.

This is nothing but the requirement that the roots of the characteristic equation φ(B) = 0 must lie outside the unit circle. Thus, for stationarity, the roots (zeros) of the polynomial φ(B), i.e. the roots of the characteristic equation of the process φ(B) = 0, must lie outside the unit circle, i.e. be greater than 1 in absolute value.
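In practice this root condition is easy to check numerically, for example with the following sketch (illustrative only; numpy.roots is used to find the zeros of φ(B), and the function name is ours):

import numpy as np

def ar_is_stationary(phi):
    # phi = [phi_1, ..., phi_p]; stationary iff all roots of
    # phi(B) = 1 - phi_1 B - ... - phi_p B^p lie outside the unit circle.
    phi = np.asarray(phi, dtype=float)
    coeffs = np.concatenate((-phi[::-1], [1.0]))   # highest power of B first
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0)), roots

print(ar_is_stationary([0.5])[0])        # AR(1), |phi_1| < 1       -> True
print(ar_is_stationary([1.2])[0])        # AR(1), |phi_1| > 1       -> False
print(ar_is_stationary([0.5, 0.3])[0])   # AR(2), inside the region -> True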

Remark: Other relationships between polynomial roots and coef-
ficients may be used to show that the following two inequalities are
necessary for stationarity. That is, for the roots to be greater than 1
in modulus, it is necessary, but not sufficient, that both
φ1 + φ2 + · · · + φp < 1  and  |φp| < 1.
Note:
1. The ψ weights are calculated for the AR(p) process by using the
difference equation ψj = φ1 ψj−1 + φ2 ψj−2 + · · · + φp ψj−p , j > 0
with ψ0 = 1 and ψj = 0, ∀ j < 0, from which the weights ψj
can easily be computed recursively in terms of the φi .
2. Since the series π(B) = φ(B) = (1 − φ1 B − φ2 B² − · · · − φp B^p) is finite, no restrictions are required on the parameters of an autoregressive process to ensure invertibility, because ∑_{j=1}^{∞} |πj| = ∑_{j=1}^{p} |φj| < ∞.

4.3 ACVF and ACF of the AR(p) Process



We know that the pth order autoregressive process (AR(p)) is


Xt = φ1 Xt−1 + φ2 Xt−2 + · · · + φp Xt−p + at . (43)
An important recurrence relation for the autocorrelation function of
a stationary autoregressive process is found by multiplying throughout
in (43) by Xt−k , for k > 0, to obtain
Xt−k Xt = φ1 Xt−k Xt−1 + φ2 Xt−k Xt−2 + · · · + φp Xt−k Xt−p + Xt−k at .
(44)
Now, on taking expected values, we obtain the difference equation
γk = φ1 γk−1 + φ2 γk−2 + · · · + φp γk−p , k > 0. (45)
Note that the expectation E(Xt−k at ) = 0, ∀ k > 0. On dividing
throughout in (45) by γ0 , we see that the autocorrelation function
satisfies the same form of difference equation
ρk = φ1 ρk−1 + φ2 ρk−2 + · · · + φp ρk−p , k > 0. (46)
Note that this (recursive relationship) is analogous to the difference
equation satisfied by the process Xt itself, but without the random
shock input at . Now suppose that this equation is written as
φ(B) ρk = 0,

where φ (B) = (1 − φ1 B − φ2 B2 − · · · − φp Bp ) and B now operates
on k not t. Then, writing
φ(B) = ∏_{i=1}^{p} (1 − Gi B),

the general solution for ρk in (46) is

ρk = A1 Gk1 + A2 Gk2 + · · · + Ap Gkp , k > 0. (47)

where Gi^{−1}, i = 1(1)p, are the roots of the characteristic equation (1 − φ1 B − φ2 B² − · · · − φp B^p) = 0. For a stationary process |Gi^{−1}| > 1 and |Gi| < 1.
Note: In general, the autocorrelation function of a stationary
autoregressive process will consist of a mixture of damped exponen-
tials (if the roots are real then ACF decays to zero geometrically as
k increases. We often refer to this as a damped exponential.) and
damped sine waves (if some of the roots are complex).

Remark: We can in principle find the ACF of the general-order AR process using the expression given in (35), but the {ψj} weights (which, for the AR(p) process, will be in terms of the {φj}) may be algebraically hard to find. We have an alternative way, which is discussed below.

4.3.1 Autoregressive Parameters in Terms of the Autocorrelations: Yule-Walker Equations

If we substitute k = 1(1)p in (46), we obtain a set of linear equations


for φ1 , φ2 , . . . , φp in terms of ρ1 , ρ2 , . . . , ρp , i.e.

ρ1 = φ1 + φ2 ρ1 + · · · + φ(p−1) ρp−2 + φp ρp−1


ρ2 = φ1 ρ1 + φ2 + · · · + φ(p−1) ρp−3 + φp ρp−2
..
. (48)
ρp−1 = φ1 ρp−2 + φ2 ρp−3 + · · · + φ(p−1) ρ2 + φp ρ1
ρp = φ1 ρp−1 + φ2 ρp−2 + · · · + φ(p−1) ρ1 + φp .

These are the well-known Yule–Walker equations (G. U. Yule, 1927;


Sir Gilbert Walker, 1931). We obtain Yule–Walker estimates of the

parameters by replacing the theoretical autocorrelations ρk by the es-
timated autocorrelations rk. Note that, if we write

φ = (φ1, φ2, . . . , φp)′, ρp = (ρ1, ρ2, . . . , ρp)′, and Pp the p × p autocorrelation matrix with (i, j)th element ρ|i−j| (its first row being 1, ρ1, ρ2, . . . , ρp−1),

the solution of (48) for the parameters φ in terms of the autocorre-


lations may be written as

φ = Pp^{−1} ρp.   (49)
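Numerically, (49) is just a linear solve, as in the sketch below (illustrative only; in practice the ρk would be replaced by the sample autocorrelations rk, and the function name is ours):

import numpy as np

def yule_walker(rho):
    # Solve P_p phi = rho_p, equation (49), given rho = [rho_1, ..., rho_p].
    rho = np.asarray(rho, dtype=float)
    p = len(rho)
    full = np.concatenate(([1.0], rho))
    P_p = np.array([[full[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(P_p, rho)

# AR(2) with phi_1 = 0.5, phi_2 = 0.3; rho_1 and rho_2 follow from
# rho_1 = phi_1/(1 - phi_2) and rho_2 = phi_1*rho_1 + phi_2 (see Section 4.7.3).
rho1 = 0.5 / (1 - 0.3)
rho2 = 0.5 * rho1 + 0.3
print(np.round(yule_walker([rho1, rho2]), 3))   # recovers [0.5, 0.3]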

4.4 Variance of the AR(p) Process


Consider the pth order autoregressive process (AR(p))

Xt = φ1 Xt−1 + φ2 Xt−2 + · · · + φp Xt−p + at . (50)

Multiply the above expression (50) throughout by Xt to obtain



Xt2 = φ1 Xt Xt−1 + φ2 Xt Xt−2 + · · · + φp Xt Xt−p + Xt at . (51)

Now, on taking expectations, the contribution from the term E(Xt at) is E(a²t) = σ²a. Hence, when k = 0, we obtain

γ0 = φ1 γ−1 + φ2 γ−2 + · · · + φp γ−p + σa2 .

On substituting γk = γ−k and writing γk = γ0 ρk , the variance γ0 = σx2


may be written as

σ²x = σ²a / (1 − φ1 ρ1 − φ2 ρ2 − · · · − φp ρp).   (52)

4.5 PACF of the AR(p) Process


In practice, we typically do not know the order of the autoregres-
sive process initially, and the order has to be specified from the data.
The problem is analogous to deciding on the number of independent
variables to be included in a multiple regression. The partial auto-
correlation function is a tool that exploits the fact that, whereas an
AR(p) process has an autocorrelation function that is infinite in ex-
tent, the partial autocorrelations are zero beyond lag p (Section 2.2, Note 2).
The partial autocorrelations can be described in terms of p nonzero
functions of the autocorrelations. Denote by φkj the jth coefficient
in an autoregressive representation of order k, so that φkk is the last
coefficient. From ρk = φ1 ρk−1 + φ2 ρk−2 + · · · + φp ρk−p , the φkj satisfy
the set of equations

ρj = φk1 ρj−1 + φk2 ρj−2 + · · · + φk(k−1) ρj−k+1 + φkk ρj−k , j = 1(1)k


(53)
leading to the Yule-Walker equations (48), which may be written in matrix form as

Pk φk = ρk,   (54)

where Pk is the k × k autocorrelation matrix with (i, j)th element ρ|i−j|, φk = (φk1, φk2, . . . , φkk)′ and ρk = (ρ1, ρ2, . . . , ρk)′. Solving these equations successively for k = 1, 2, 3, . . ., we obtain

φ11 = ρ1,

φ22 = det[1, ρ1; ρ1, ρ2] / det[1, ρ1; ρ1, 1],

φ33 = det[1, ρ1, ρ1; ρ1, 1, ρ2; ρ2, ρ1, ρ3] / det[1, ρ1, ρ2; ρ1, 1, ρ1; ρ2, ρ1, 1],

and so on. In general, φkk = det(Pk*) / det(Pk), where Pk* is Pk with its last column replaced by ρk = (ρ1, ρ2, . . . , ρk)′.
The quantity φkk , regarded as a function of the lag k, is called the
partial autocorrelation function.
Note: For an AR(p) process, the partial autocorrelations φkk will
be nonzero for k ≤ p and zero for k > p. In other words, the partial
autocorrelation function of the AR(p) process has a cutoff after lag
p, i.e., for an AR(p) process, φkk ≠ 0 for k ≤ p and φkk = 0 for k > p.   (55)

4.6 AR(1) Process


4.6.1 Introduction

Definition 4.1 (AR(1) Process). : {Xt } is an AR(1) process if


{Xt } is stationary and if for every t,
Xt − φ1 Xt−1 = at , (56)
where {at } ∼ W N (0, σa2 ) and (1 − φ1 z) is the 1st degree polynomial.
Note: The AR(1) process is sometimes called the Markov process,
after the Russian mathematician A. A. Markov, because the distribution of Xt given Xt−1, Xt−2, . . . is exactly the same as the distribution of Xt given Xt−1.

4.6.2 Stationarity and Invertibility of AR(1) Process

Consider the equation (56), which may be written as


(1 − φ1 B)Xt = at ,
i.e.

Xt = (1 − φ1 B)^{−1} at = ∑_{j=0}^{∞} φ1^j at−j,

provided that the infinite series on the right converges in an appro-
priate sense. Hence

ψ(B) = (1 − φ1 B)^{−1} = ∑_{j=0}^{∞} φ1^j B^j.   (57)

We know that for stationarity, ψ(B) must converge for |B| ≤ 1, or equivalently that ∑_{j=0}^{∞} |φ1|^j < ∞. This implies that the parameter φ1 of an AR(1) process must satisfy the condition |φ1| < 1 to ensure stationarity. Since the root of (1 − φ1 B) = 0 is B = φ1^{−1}, this condition is equivalent to saying that the root of (1 − φ1 B) = 0 must lie outside the unit circle.
In summary, the necessary and sufficient condition for the AR(1)
model in (56) to be weakly stationary is |φ1 | < 1.

1. The ψ weights are calculated for the AR(1) process by using


the difference equation ψj = φ1 ψj−1 , j > 0 with ψ0 = 1 and
ψj = 0 ∀ j < 0, from which the weights ψj = φ1^j (as can be shown) can easily be computed recursively in terms of φ1.
2. Since the series π (B) = φ (B) = (1 − φ1 B) is finite, no restric-


tions are required on the parameter of an autoregressive process
to ensure invertibility.

4.6.3 ACVF and ACF of AR(1) Process

Consider the AR(1) model, Xt = φ1 Xt−1 + at . Multiply this equation


by Xt−k , take expectations to get

E(Xt−k Xt ) = φ1 E(Xt−k Xt−1 ) + E(Xt−k at ).

Since the series is assumed to be stationary with zero mean, and


since at is independent of Xt−k , we obtain E(Xt−k at ) = 0 and so

γk = φ1 γk−1 , k ≥ 1, (58)

and the autocorrelation becomes

ρk = φ1 ρk−1 = φk1 , k ≥ 1, (59)

where we use that ρ0 = 1. Hence, when |φ1| < 1 and the process is
stationary, the ACF exponentially decays in one of the two forms
depending on the sign of φ1 . If 0 < φ1 < 1, then all autocorrelations

are positive; if −1 < φ1 < 0, then the sign of the autocorrelations
shows an alternating pattern beginning with a negative sign. The
magnitude of these autocorrelations decreases exponentially in both
cases.
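A quick simulation confirms this behaviour (an illustrative sketch of ours; the sample ACF of a simulated AR(1) path is compared with φ1^k):

import numpy as np

rng = np.random.default_rng(4)
phi1, n = -0.7, 5000

x = np.zeros(n)
for t in range(1, n):
    x[t] = phi1 * x[t - 1] + rng.normal()      # simulate the AR(1) recursion

def sample_acf(x, max_lag):
    x = x - x.mean()
    c0 = np.dot(x, x) / len(x)
    return [np.dot(x[:-k], x[k:]) / len(x) / c0 for k in range(1, max_lag + 1)]

for k, r in enumerate(sample_acf(x, 5), start=1):
    print(f"k={k}: sample acf={r:7.3f}   phi1**k={phi1 ** k:7.3f}")

With phi1 = -0.7 the signs alternate from lag to lag, matching the pattern described above.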

4.6.4 Variance of AR(1) Process

Consider the AR(1) model, Xt = φ1 Xt−1 + at , take variances of both


sides to get
γ0 = φ21 γ0 + σa2 , (60)
where we use that V ar(Xt ) = γ0 = σx2 , ∀ t and V ar(at ) = σa2 .
Solving for γ0 yields
σ²x = γ0 = σ²a / (1 − φ²1).   (61)
Note: Notice the immediate implication that φ21 < 1 or that |φ1 | < 1.

4.6.5 PACF of AR(1) Process

For an AR(1) process, regressing Xt on Xt−1 gives Xt = φ11 Xt−1 + at, so that φ11 = ρ1 = φ1. Thus the PACF is

φkk = ρ1 = φ1 for k = 1, and φkk = 0 for k ≥ 2.   (62)

Hence, the PACF of the AR(1) process shows a positive or negative


spike at lag 1 depending on the sign of φ1 and cuts off.

4.7 AR(2) Process


4.7.1 Introduction

Definition 4.2 (AR(2) Process). : {Xt } is an AR(2) process if


{Xt } is stationary and if for every t,

Xt − φ1 Xt−1 − φ2 Xt−2 = at , (63)

where {at } ∼ W N (0, σa2 ) and (1 − φ1 z − φ2 z 2 ) is the 2nd degree


polynomial.
Note: The AR(2) process was originally used by G. U. Yule in
1921 to describe the behaviour of a simple pendulum. Hence, the
process is sometimes called Yule Process.

4.7.2 Stationarity and Invertibility of AR(2) Process

For the second-order AR process, we have

(1 − φ1 B − φ2 B2 )Xt = at
i.e. Xt = φ1 Xt−1 + φ2 Xt−2 + at .

The AR(2) process, as a finite autoregressive model, is always in-


vertible. To be stationary, the roots of φ(B) = (1 − φ1 B − φ2 B2 ) = 0
must lie outside the unit circle.
The stationarity condition of the AR(2) model can also be expressed
in terms of its parameter values. We will show that this will be true
if and only if three conditions are satisfied:
φ1 + φ2 < 1, φ2 − φ1 < 1, and |φ2 | < 1.
Let G1^{-1} and G2^{-1} be the two roots of (1 − φ1 B − φ2 B^2 ) = 0. We have

        G1^{-1} = [φ1 + √(φ1^2 + 4φ2 )] / (−2φ2 )   and   G2^{-1} = [φ1 − √(φ1^2 + 4φ2 )] / (−2φ2 ).
Let the reciprocals of the roots be denoted G1 and G2 . Then
        G1 = 2φ2 / [−φ1 − √(φ1^2 + 4φ2 )]
           = { 2φ2 / [−φ1 − √(φ1^2 + 4φ2 )] } · { [−φ1 + √(φ1^2 + 4φ2 )] / [−φ1 + √(φ1^2 + 4φ2 )] }
           = 2φ2 [−φ1 + √(φ1^2 + 4φ2 )] / [φ1^2 − (φ1^2 + 4φ2 )]
           = [φ1 − √(φ1^2 + 4φ2 )] / 2.
Similarly,
        G2 = [φ1 + √(φ1^2 + 4φ2 )] / 2.
The required condition |Gi^{-1}| > 1 implies that |Gi | < 1 for i = 1, 2.
Hence,

|G1 G2 | = |φ2 | < 1 and |G1 + G2 | = |φ1 | < 2.

Thus, we have the following necessary conditions for stationarity


regardless of whether the roots are real or complex:

        −1 < φ2 < 1   and   −2 < φ1 < 2.        (64)

For real roots: Here we have φ21 + 4φ2 ≥ 0. Note that |Gi | < 1 for
i = 1, 2 if and only if

−1 < G1 ≤ G2 < 1
        −1 < [φ1 − √(φ1^2 + 4φ2 )]/2 ≤ [φ1 + √(φ1^2 + 4φ2 )]/2 < 1

or      −2 < φ1 − √(φ1^2 + 4φ2 ) ≤ φ1 + √(φ1^2 + 4φ2 ) < 2.

Consider just the first inequality


        −2 < φ1 − √(φ1^2 + 4φ2 )
        ⟹ √(φ1^2 + 4φ2 ) < φ1 + 2
        (on squaring both sides) ⟹ φ1^2 + 4φ2 < φ1^2 + 4φ1 + 4
        ⟹ φ2 < φ1 + 1
        ⟹ φ2 − φ1 < 1.
p
The inequality φ1 + √(φ1^2 + 4φ2 ) < 2 is treated similarly and leads to
φ1 + φ2 < 1. Hence,

        φ2 + φ1 < 1   and   φ2 − φ1 < 1.        (65)
These equations together with φ21 + 4φ2 ≥ 0 define the stationarity
region for the real root case.
For complex roots: Here we have φ1^2 + 4φ2 < 0. The derivation is more
involved and is not pursued here.

Thus, in terms of the parameter values, the stationarity condition
of the AR(2) model is given by the following inequalities:

        φ2 + φ1 < 1
        φ2 − φ1 < 1        (66)
        −1 < φ2 < 1.

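As a numerical aside (not part of the notes), the two equivalent forms of the
stationarity condition — roots of φ (B) = 0 outside the unit circle, and the
inequalities in (66) — can be checked against each other. The parameter values
below are arbitrary, and φ2 ≠ 0 is assumed so that the polynomial is genuinely
of degree 2.

```python
# Check AR(2) stationarity via the roots of 1 - phi1*B - phi2*B^2 = 0
# and via the inequalities (66).
import numpy as np

def stationary_by_roots(phi1, phi2):
    # np.roots expects coefficients from the highest power down: -phi2*B^2 - phi1*B + 1
    roots = np.roots([-phi2, -phi1, 1.0])
    return bool(np.all(np.abs(roots) > 1.0))

def stationary_by_inequalities(phi1, phi2):
    return (phi1 + phi2 < 1) and (phi2 - phi1 < 1) and (abs(phi2) < 1)

for phi1, phi2 in [(0.5, 0.3), (1.2, -0.3), (0.8, 0.5)]:
    print(phi1, phi2,
          stationary_by_roots(phi1, phi2),
          stationary_by_inequalities(phi1, phi2))
```
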
4.7.3 ACVF and ACF of AR(2) Process

Consider the AR(2) model, Xt = φ1 Xt−1 + φ2 Xt−2 + at . Multiply this


equation by Xt−k , take expectations to get

E(Xt−k Xt ) = φ1 E(Xt−k Xt−1 ) + φ2 E(Xt−k Xt−2 ) + E(Xt−k at ).

Since the series is assumed to be stationary with zero mean, and
since at is independent of Xt−k , we obtain E(Xt−k at ) = 0 and so
γk = φ1 γk−1 + φ2 γk−2 , k ≥ 1, (67)
and the autocorrelation becomes
ρk = φ1 ρk−1 + φ2 ρk−2 , k ≥ 1. (68)
Specifically, when k = 1 and 2,
ρ1 = φ1 + φ2 ρ1
ρ2 = φ1 ρ1 + φ2 ,
where we use that ρ0 = 1. Hence,
        ρ1 = φ1 / (1 − φ2 )

        ρ2 = φ1^2 / (1 − φ2 ) + φ2 = (φ1^2 + φ2 − φ2^2 ) / (1 − φ2 ),        (69)
and ρk for k ≥ 3 is calculated recursively through (68).
The pattern of the ACF is governed by the difference equation

given by (68), namely (1 − φ1 B − φ2 B2 )ρk = 0. The ACF will


exponentially decay if the roots of (1 − φ1 B − φ2 B2 ) = 0 are real and
a damped sine wave if the roots of (1−φ1 B−φ2 B2 ) = 0 are complex.
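
A minimal sketch (not from the notes; φ1, φ2 are arbitrary values satisfying
(66)) of the ACF recursion (68) with starting values (69):

```python
# rho_k = phi1*rho_{k-1} + phi2*rho_{k-2}, started from rho_0 = 1 and (69).
phi1, phi2 = 0.5, 0.3

rho = [1.0, phi1 / (1 - phi2)]                 # rho_0, rho_1
for k in range(2, 11):
    rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])

print([round(r, 4) for r in rho])              # rho_2 agrees with (69)
```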

4.7.4 Variance of AR(2) Process

Consider the AR(2) model, Xt = φ1 Xt−1 +φ2 Xt−2 +at , take variances
of both sides to get
γ0 = φ21 γ0 + φ22 γ0 + 2φ1 φ2 γ1 + σa2 , (70)
where we use that V ar(Xt ) = γ0 = σx2 , ∀ t and V ar(at ) = σa2 . Using
γ1 = φ1 γ0 + φ2 γ1 and, hence solving for γ0 yields
        σx^2 = γ0 = (1 − φ2 )σa^2 / [(1 − φ2 )(1 − φ1^2 − φ2^2 ) − 2φ2 φ1^2 ].        (71)
Note:
1. Notice the immediate implication of (66).
2. We can also find the variance of the AR(2) process using
   σx^2 = σa^2 / (1 − φ1 ρ1 − φ2 ρ2 ), where ρ1 and ρ2 are as given in equation (69).
4.7.5 PACF of AR(2) Process

Using the fact that,

ρk = φ1 ρk−1 + φ2 ρk−2 , k ≥ 1, (72)

we have, from (22),


        φ11 = ρ1 = φ1 / (1 − φ2 )        (73)

              | 1    ρ1 |     | 1    ρ1 |
        φ22 = | ρ1   ρ2 |  /  | ρ1   1  |

            = (ρ2 − ρ1^2 ) / (1 − ρ1^2 )

            = [ (φ1^2 + φ2 − φ2^2 )/(1 − φ2 ) − (φ1 /(1 − φ2 ))^2 ] / [ 1 − (φ1 /(1 − φ2 ))^2 ]

            = φ2 [(1 − φ2 )^2 − φ1^2 ] / [(1 − φ2 )^2 − φ1^2 ]

        φ22 = φ2        (74)

              | 1    ρ1   ρ1 |     | 1    ρ1   ρ2 |
        φ33 = | ρ1   1    ρ2 |  /  | ρ1   1    ρ1 |
              | ρ2   ρ1   ρ3 |     | ρ2   ρ1   1  |

              | 1    ρ1   φ1 + φ2 ρ1    |     | 1    ρ1   ρ2 |
            = | ρ1   1    φ1 ρ1 + φ2    |  /  | ρ1   1    ρ1 |      (due to equation (72))
              | ρ2   ρ1   φ1 ρ2 + φ2 ρ1 |     | ρ2   ρ1   1  |

        φ33 = 0,        (75)

because the last column of the numerator is a linear combination of


the first two columns. Similarly, we can show that φkk = 0, ∀ k ≥ 3.

Hence, the PACF of the AR(2) process cuts off after lag 2.
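
The cutoff can also be verified numerically: φkk is the last coefficient of the
order-k Yule–Walker system built from the autocorrelations. The sketch below
(Python/NumPy; not from the notes, and the parameter values are arbitrary)
shows φ11 and φ22 nonzero and φkk ≈ 0 for k ≥ 3 for an AR(2).

```python
# phi_kk as the last coefficient of the order-k Yule-Walker system R*phi = rho.
import numpy as np

phi1, phi2, max_lag = 0.5, 0.3, 6

# theoretical ACF of the AR(2), from (68) and (69)
rho = np.zeros(max_lag + 1)
rho[0], rho[1] = 1.0, phi1 / (1 - phi2)
for k in range(2, max_lag + 1):
    rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]

for k in range(1, max_lag + 1):
    R = np.array([[rho[abs(i - j)] for j in range(k)] for i in range(k)])
    phi_k = np.linalg.solve(R, rho[1:k + 1])
    print(k, round(phi_k[-1], 6))     # phi_11, phi_22 = phi2, then ~0
```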

5 Moving Average (MA(q)) Processes, Invert-


ibility Condition, ACF, PACF, MA(1), MA(2)
Processes, Duality between AR(p) and MA(q)
Processes
5.1 Introduction
Consider the alternative form of the GLP given in expression (29), i.e.,
Xt = ψ (B)at , where ψ (B) = Σ_{j=0}^{∞} ψj B^j , Σ_{j=0}^{∞} |ψj | < ∞ and {at } ∼
W N (0, σa2 ). In this model, if only a finite number of ψ weights are
nonzero, i.e., ψ1 = −θ1 , ψ2 = −θ2 , . . . , ψq = −θq and ψk = 0, ∀ k >
q, then the resulting process (special case of (29) ) is said to be a
Moving average process of order q and abbreviate the name to MA(q)
process.
Definition 5.1 (Moving Average Process of Order q (MA(q))). : Suppose
that {at } is a purely random process with mean zero and variance

σa2 . Then a process {Xt } is said to be a moving average process of
order q (abbreviated to an MA(q) process) if

Xt = at − θ1 at−1 − θ2 at−2 − · · · − θq at−q . (76)

The MA(q) model can be written in the equivalent form using the
backshift operator B

Xt = (1 − θ1 B − θ2 B2 − · · · − θq Bq )at

or
Xt = θ (B)at . (77)
This implies that
Xt = θ (B)at ≡ Ψ(B)at .
Hence, the moving average process can be thought of as the output
Xt from a linear filter with transfer function θ (B) = Ψ(B) when the
input is white noise at .

Note :

1. The MA processes are useful in describing phenomena in which


events produce an immediate effect that only lasts for short pe-
riods of time. MA processes have been used in many areas, par-
ticularly econometrics. For example, economic indicators are
affected by a variety of ‘random’ events such as strikes, gov-
ernment decisions, shortages of key materials and so on. Such
events will not only have an immediate effect but may also af-
fect economic indicators to a lesser extent in several subsequent
periods, and so it is at least plausible that an MA process may
be appropriate.
2. The current value of the series {Xt } is a linear combination of
the q most recent past values of “innovation” terms {at }. Thus,
the observed series {Xt } is a weighted moving average of the
unobserved {at } series. This process is a related idea of the
weighted moving average, that was discussed in Unit-1. An im-
portant difference between this moving average and those con-
sidered previously(in Unit-1) is that here the moving average se-
ries is directly observed, and the coefficients θ1 , θ2 , . . . , θq must
be estimated from the data.

3. The terminology moving average arises from the fact that Xt
is obtained by applying the weights, 1, −θ1 , −θ2 , . . . , −θq to the
variables at , at−1 , at−2 , . . . , at−q and then moving the weights and
applying them to at+1 , at , at−1 , . . . , at−q+1 to obtain Xt+1 and so
on.
4. The symbols θ1 , θ2 , . . . , θq are the finite set of weight parameters,
known as Moving average parameters.
5. The moving average operator is defined to be θ(B) = (1 − θ1 B −
θ2 B2 − · · · − θq Bq ).
6. Recall that we are assuming that Xt has zero mean. We can
always introduce a nonzero mean by replacing Xt by Xt − µ, ∀ t
throughout our equations.
7. Slutzky (1927) and Wold(1938) carried out the original work on
moving average processes. The process arose as a result of the
study by Slutzky on the effect of the moving average on random
events.

Remark: Some statistical books (viz. Hamilton)/software (for ex-


ample R), write the MA model with positive coefficients; that is, with plus
signs before the θ’s.

Definition 5.2 (Moving average Process of Order q). {Xt } is an


MA(q) process if for every t,

Xt = (1 − θ1 B − θ2 B2 − · · · − θq Bq )at (78)

where {at } ∼ W N (0, σa2 ) and (1 − θ1 z − θ2 z 2 − · · · − θq z q ) is the qth


degree polynomial.

5.2 Invertibility Conditions of MA(q) Process


Stationarity: Moving-average models are always weakly stationary
because they are finite linear combinations of a white noise sequence
for which the first two moments are time invariant. Here, no restric-
tions on the {θi } are required for a (finite-order) MA process to be
stationary, but it is generally desirable to impose restrictions on {θi }
to ensure that the process satisfies a condition called invertibility.

Invertibility: We now derive the conditions that the parameters
θ1 , θ2 , . . . , θq must satisfy to ensure the invertibility of the MA(q)
process:
The general MA(q) process Xt = θ (B)at can be written as

        at = θ^{-1}(B)Xt ≡ π (B)Xt = Σ_{j=0}^{∞} πj Xt−j        (79)

provided that the right-side expression is convergent. Using the fac-


torization

θ (B) = (1 − H1 B)(1 − H2 B) · · · (1 − Hq B)

where H1^{-1} , H2^{-1} , . . . , Hq^{-1} are the roots of θ (B) = 0, expanding θ^{-1}(B)
in partial fractions yields

        at = θ^{-1}(B)Xt = Σ_{i=1}^{q} [ Mi / (1 − Hi B) ] Xt .

Hence, if π (B) = θ^{-1}(B) is to be a convergent series for |B| ≤ 1, i.e.,
if the weights πj = − Σ_{i=1}^{q} Mi Hi^j (as can be shown) are to be absolutely
summable so that the MA(q) process is invertible, we require

        Σ_{j=0}^{∞} |πj | < ∞

i.e.,   Σ_{j=0}^{∞} | Σ_{i=1}^{q} Mi Hi^j | < ∞

        ⟹ |Hi | < 1, ∀ i = 1(1)q

        ⟹ |1/Hi | > 1, ∀ i = 1(1)q

i.e.    |Hi^{-1}| > 1, ∀ i = 1(1)q.

This is nothing but the requirement that the roots of the characteristic
equation θ (B) = 0 must lie outside the unit circle. Thus, for invertibility,
the roots (zeros) of the polynomial θ (B) must lie outside the unit circle,
i.e. the roots of the characteristic equation of the process (θ (B) = 0)
must lie outside the unit circle, i.e. be greater than 1 in absolute value.
Remark: Other relationships between polynomial roots and coeffi-
cients may be used to show that the following two inequalities are

necessary for invertibility. That is, for the roots to be greater than 1
in modulus, it is necessary, but not sufficient, that both
θ1 + θ2 + · · · + θq < 1
and θq < 1.
Note:
1. The π weights are calculated for the MA(q) process by using the
difference equation πj = θ1 πj−1 + θ2 πj−2 + · · · + θq πj−q , j > 0
with π0 = −1 and πj = 0, ∀ j < 0, from which the weights πj
can easily be computed recursively in terms of the θi (see the
sketch after this list).
2. Since the series ψ (B) = θ (B) = (1 − θ1 B − θ2 B2 − · · · − θq Bq )
is finite, and 1 + θ12 + θ22 + · · · + θq2 < ∞, no restrictions are re-
quired on the parameters of a moving average process to ensure
stationarity.
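
A minimal sketch of the recursion in note 1 above (Python; not from the notes,
and the θ values are arbitrary but correspond to an invertible MA(2)):

```python
# pi_j = theta_1*pi_{j-1} + ... + theta_q*pi_{j-q}, with pi_0 = -1 and pi_j = 0 for j < 0.
theta = [0.5, 0.3]              # theta_1, theta_2 of an invertible MA(2)
q = len(theta)

pi = [-1.0]                     # pi_0
for j in range(1, 11):
    pi.append(sum(theta[i] * pi[j - 1 - i]
                  for i in range(q) if j - 1 - i >= 0))

print([round(p, 4) for p in pi])   # pi_1 = -theta_1, pi_2 = -(theta_1^2 + theta_2), ...
```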

5.3 ACVF and ACF of the MA(q) Process


We know that the qth order moving average process (MA(q)) is

Xt = at − θ1 at−1 − θ2 at−2 − · · · − θq at−q . (80)


Multiplying throughout in (80) by Xt−k , for k > 0, we obtain
Xt−k Xt = (at−k − θ1 at−k−1 − · · · − θq at−k−q )(at − θ1 at−1 − · · · − θq at−q )
Now, on taking expected values, we obtain the autocovariance func-
tion of an MA(q) process as
γk = E((at−k − θ1 at−k−1 − · · · − θq at−k−q )(at − θ1 at−1 − · · · − θq at−q ))
= −θk E(a2t−k ) + θ1 θk+1 E(a2t−k−1 ) + · · · + θq−k θq E(a2t−q )
since the at are uncorrelated, and γk = 0, ∀ k > q. Hence,
        γk = (−θk + θ1 θk+1 + θ2 θk+2 + · · · + θq−k θq )σa^2 ,  k = 1(1)q        (81)
        γk = 0,  k > q.
Variance of the MA(q) Process: Taking Variance on both sides
of (76), we obtain
V ar(Xt ) = V ar(at − θ1 at−1 − θ2 at−2 − · · · − θq at−q )
Since the at ’s are uncorrelated, we obtain
= V ar(at ) + θ12 V ar(at−1 ) + θ22 V ar(at−2) + · · · + θq2 V ar(at−q )
Using V ar(at ) = σa2 , ∀ t, we get
γ0 = σx2 = σa2 (1 + θ12 + θ22 + · · · + θq2 ) (82)

Using (81) and (82), the autocorrelation function is thus

        ρk = (−θk + θ1 θk+1 + θ2 θk+2 + · · · + θq−k θq ) / (1 + θ1^2 + θ2^2 + · · · + θq^2 ),  k = 1(1)q        (83)
        ρk = 0,  k > q.

We see that the autocorrelation function of an MA(q) process is zero,


beyond the order q of the process. In other words, the autocorrelation
function of a moving average process has a cutoff after lag q. This
important property enables us to identify whether a given time series
is generated by a moving average process.
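
Equation (83) is straightforward to evaluate; equivalently, writing ψ0 = 1 and
ψk = −θk , one has ρk = Σj ψj ψj+k / Σj ψj^2 . The sketch below (Python/NumPy;
not from the notes, θ values arbitrary) computes the theoretical ACF of an MA(q)
and exhibits the cutoff after lag q.

```python
# Theoretical ACF of X_t = a_t - theta_1 a_{t-1} - ... - theta_q a_{t-q}.
import numpy as np

def ma_acf(theta, max_lag):
    q = len(theta)
    psi = np.concatenate(([1.0], -np.asarray(theta, dtype=float)))   # psi_0 = 1, psi_k = -theta_k
    denom = np.dot(psi, psi)                                         # 1 + theta_1^2 + ... + theta_q^2
    return np.array([np.dot(psi[:q + 1 - k], psi[k:]) / denom if k <= q else 0.0
                     for k in range(max_lag + 1)])

print(np.round(ma_acf([0.5, 0.3], 5), 4))    # cuts off after lag q = 2
```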

5.3.1 Moving Average Parameters in Terms of the Autocorrelations

If ρ1 , ρ2 , . . . , ρq are known, the q equations may be solved for the


parameters θ1 , θ2 , . . . , θq . However, unlike the Yule–Walker equa-
tions (48) for an autoregressive process, the equations are nonlinear.
Hence, these equations have to be solved iteratively.

5.4 PACF of the MA(q) Process



The PACF for MA models behaves much like the ACF for AR models.
We can easily see that the partial autocorrelation function of the
general MA(q) process tails off as a mixture of exponential decays
and/or damped sine waves depending on the nature of the roots of
(1 − θ1 B − θ2 B^2 − · · · − θq B^q ) = 0. The PACF will contain damped sine
waves if some of the roots are complex.

5.5 MA(1) Process


5.5.1 Introduction

We now consider in detail the simple but nevertheless important mov-


ing average process of order 1, that is, the MA(1) series.
Definition 5.1 (MA(1) Process). : {Xt } is a MA(1) process if
for every t,
Xt = at − θ1 at−1 = (1 − θ1 B)at , (84)
where {at } ∼ W N (0, σa2 ) and (1 − θ1 z) is the 1st degree polynomial.

5.5.2 Invertibility Condition of MA(1) Process

Consider the equation (84), which may be written as

Xt = (1 − θ1 B)at ,

i.e.

        at = (1 − θ1 B)^{-1} Xt = Σ_{j=0}^{∞} θ1^j Xt−j

provided that the infinite series on the right converges in an appro-


priate sense (this happens if |θ1 | < 1). Hence

        π (B) = (1 − θ1 B)^{-1} = Σ_{j=0}^{∞} θ1^j B^j .        (85)

We know that for invertibility, π (B) must converge for |B| ≤ 1, or
equivalently that Σ_{j=0}^{∞} |πj | = Σ_{j=0}^{∞} |θ1 |^j < ∞. This implies that the
parameter θ1 of an MA(1) process must satisfy the condition |θ1 | < 1
to ensure invertibility. Since the root of (1 − θ1 B) = 0 is B = θ1^{-1} ,
this condition is equivalent to saying that the root of (1 − θ1 B) = 0

must lie outside the unit circle.


In summary, the necessary and sufficient condition for the MA(1)
model in (84) to be invertible is |θ1 | < 1, which is parallel to the sta-
tionarity condition of the AR(1) model.

Note: When |θ| > 1,the weights increase as lags increase, so the
more distant the observations the greater their influence on the cur-
rent error. When |θ| = 1, the weights are constant in size, and the
distant observations have the same influence as the recent observa-
tions. As neither of these situations make much sense, we require
|θ| < 1, so the most recent observations have higher weight than ob-
servations from the more distant past. Thus, the process is invertible
when |θ| < 1.

5.5.3 ACVF and ACF of the MA(1) Process

The autocovariance generating function of an MA(1) process is obtained
by using (37):

        γ(B) = σa^2 ψ (B)ψ (F),  where F = B^{-1} .        (86)

For MA(1) process, ψ (B) = (1 − θ1 B). Therefore

        γ(B) = σa^2 (1 − θ1 B)(1 − θ1 B^{-1})
             = σa^2 (1 − θ1 B^{-1} − θ1 B + θ1^2 )
        γ(B) = σa^2 (−θ1 B^{-1} + (1 + θ1^2 ) − θ1 B).        (87)

Equating the powers of B, we get

        γk = (1 + θ1^2 )σa^2 ,  k = 0
        γk = −θ1 σa^2 ,         k = 1        (88)
        γk = 0,                 k > 1.

Variance of the MA(1) Process:


It is clear from (88), that

γ0 = σx2 = (1 + θ12 )σa2 . (89)

Using (88) and (89), the autocorrelation function becomes

        ρk = −θ1 / (1 + θ1^2 ),  k = 1        (90)
        ρk = 0,                  k ≥ 2,

which cuts off after lag 1; that is, the process has no correlation
beyond lag 1. This fact will be important later when we need to
choose suitable models for real data.
Remark: We can see that the first lag autocorrelation in MA(1) is
bounded as |ρ1 | = |θ1 | / (1 + θ1^2 ) ≤ 1/2.

5.5.4 Non-uniqueness of MA Models and Invertibility

Let
        Model A : Xt = at − θ1 at−1    and    Model B : Xt = at − (1/θ1 ) at−1 .
It can easily be shown that these two different processes have exactly
the same ACF. Thus we cannot identify a MA process uniquely from
a given ACF. Now, if we ‘invert’ models A and B by expressing at
in terms of Xt , Xt−1 , . . ., we find by successive substitution that

        Model A : at = Xt + θ1 Xt−1 + θ1^2 Xt−2 + · · · .

and
        Model B : at = Xt + (1/θ1 ) Xt−1 + (1/θ1^2 ) Xt−2 + · · · .
If |θ1 | < 1, the series of coefficients of Xt−j for model A converges
whereas that of B does not. Thus model B cannot be ‘inverted’ in
this way. The imposition of the invertibility condition ensures that
there is a unique invertible first-order MA process for a given ACF.

Remark: Thus for uniqueness and meaningful implication for fore-


casting(to be discussed later), we restrict ourselves to an invertible
process in the model selection.
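
A quick numerical illustration (not from the notes; the value θ1 = 0.4 is
arbitrary): θ1 and 1/θ1 produce the same lag-1 autocorrelation, and only the
model with |θ1 | < 1 is invertible.

```python
# Models A and B above share the same ACF.
theta1 = 0.4

rho1_A = -theta1 / (1 + theta1 ** 2)               # Model A
rho1_B = -(1 / theta1) / (1 + (1 / theta1) ** 2)   # Model B
print(rho1_A, rho1_B)                              # identical values
```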

5.5.5 PACF of the MA(1) Process

Using (90) and (22), the PACF of a MA(1) process can be easily
seen as
        φ11 = ρ1 = −θ1 / (1 + θ1^2 ) = −θ1 (1 − θ1^2 ) / (1 − θ1^4 )

        φ22 = −ρ1^2 / (1 − ρ1^2 ) = −θ1^2 / (1 + θ1^2 + θ1^4 ) = −θ1^2 (1 − θ1^2 ) / (1 − θ1^6 )

        φ33 = ρ1^3 / (1 − 2ρ1^2 ) = −θ1^3 / (1 + θ1^2 + θ1^4 + θ1^6 ) = −θ1^3 (1 − θ1^2 ) / (1 − θ1^8 )

In general,

        φkk = −θ1^k (1 − θ1^2 ) / (1 − θ1^{2(k+1)} ),  k ≥ 1.        (91)
Contrary to its ACF, which cuts off after lag 1, the PACF of an
MA(1) process tails off exponentially (because |φkk | < |θ1 |k ) in one of
two forms depending on the sign of θ1 (hence the sign of ρ1 ). If ρ1 is
positive, so that θ1 is negative, the partial autocorrelations alternate
in sign. If, however ρ1 is negative, so that θ1 is positive, the partial
autocorrelations are negative.

5.6 MA(2) Process


5.6.1 Introduction

Definition 5.2 (MA(2) Process). : {Xt } is an MA(2) process if


for every t,
Xt = at − θ1 at−1 − θ2 at−2 = (1 − θ1 B − θ2 B2 )at , (92)

where {at } ∼ W N (0, σa2 ) and (1 − θ1 z − θ2 z 2 ) is the 2nd degree
polynomial.

5.6.2 Invertibility Condition of MA(2) Process

As a finite-order moving average model is stationary, the MA(2) pro-
cess is always stationary. For invertibility, the roots of (1 − θ1 B −
θ2 B^2 ) = 0 must lie outside the unit circle. Hence, it can be shown
that the invertibility condition for the MA(2) process is:

        θ2 + θ1 < 1
        θ2 − θ1 < 1        (93)
        −1 < θ2 < 1,

which is parallel to the stationarity condition of the AR(2) model.

5.6.3 ACVF and ACF of the MA(2) Process

The autocovariance generating function of an MA(2) process is obtained
by using (37):

        γ(B) = σa^2 ψ (B)ψ (F),  where F = B^{-1} .        (94)

For MA(2) process, ψ (B) = (1 − θ1 B − θ2 B2 ). Therefore

        γ(B) = σa^2 (1 − θ1 B − θ2 B^2 )(1 − θ1 B^{-1} − θ2 B^{-2} )
             = σa^2 (1 − θ1 B − θ2 B^2 − θ1 B^{-1} + θ1^2 + θ1 θ2 B − θ2 B^{-2} + θ1 θ2 B^{-1} + θ2^2 )
        γ(B) = σa^2 (−θ2 B^{-2} − θ1 (1 − θ2 )B^{-1} + (1 + θ1^2 + θ2^2 ) − θ1 (1 − θ2 )B − θ2 B^2 ).

Equating the powers of B, we get

        γk = (1 + θ1^2 + θ2^2 )σa^2 ,   k = 0
        γk = −θ1 (1 − θ2 )σa^2 ,        k = 1
        γk = −θ2 σa^2 ,                 k = 2        (95)
        γk = 0,                         k ≥ 3.

Variance of the MA(2) Process:


It is clear from (95), that

γ0 = σx2 = (1 + θ12 + θ22 )σa2 . (96)

Using (95) and (96), the autocorrelation function becomes

        ρk = −θ1 (1 − θ2 ) / (1 + θ1^2 + θ2^2 ),  k = 1
        ρk = −θ2 / (1 + θ1^2 + θ2^2 ),            k = 2        (97)
        ρk = 0,                                   k ≥ 3,
which cuts off after lag 2.

5.6.4 PACF of the MA(2) Process

Using (97), where ρk = 0, ∀ k ≥ 3 and (22), the PACF of a MA(2)


process is

        φ11 = ρ1

        φ22 = (ρ2 − ρ1^2 ) / (1 − ρ1^2 )

        φ33 = [ρ1^3 − ρ1 ρ2 (2 − ρ2 )] / [1 − ρ2^2 − 2ρ1^2 (1 − ρ2 )]        (98)

        ...
Note: The exact expression for the partial autocorrelation function
of an MA(2) process is complicated, but it is dominated by the sum
of two exponentials (tails off exponentially) if the roots of the char-
acteristic equation (1 − θ1 B − θ2 B^2 ) = 0 are real, and by a damped
sine wave if the roots are complex. Its behaviour depends also on the
signs and magnitudes of θ1 and θ2 . Thus, it behaves like the auto-
correlation function of an AR(2) process (this aspect illustrates the
duality between the MA(2) and the AR(2) processes).

5.7 Duality between AR(p) and MA(q) Processes


MA(∞) representation of AR(p):
For a given AR(p) process,
        φ (B)Xt = at ,        (99)

where φ (B) = (1 − φ1 B − φ2 B^2 − · · · − φp B^p ), we can write

        Xt = at / φ (B) = ψ (B)at ,        (100)
with ψ(B) = (1 + ψ1 B + ψ2 B2 + · · · ), such that

        φ (B)ψ (B) = 1.        (101)

The ψ weights can be derived by equating the coefficients of Bj on


both sides of the above expression. For example, we can write the
AR(2) process as
        Xt = at / (1 − φ1 B − φ2 B^2 ) = (1 + ψ1 B + ψ2 B^2 + · · · )at ,        (102)
which implies that

(1 − φ1 B − φ2 B2 )(1 + ψ1 B + ψ2 B2 + · · · ) = 1,

i.e.
1 + ψ1 B + ψ2 B2 + ψ3 B3 + · · ·
− φ1 B − ψ1 φ1 B2 − ψ2 φ1 B3 − · · · (103)
− φ2 B2 − ψ1 φ2 B3 − · · · = 1.
Thus, we obtain the ψj ’s as follows:

        B^1 : ψ1 − φ1 = 0             →  ψ1 = φ1
        B^2 : ψ2 − ψ1 φ1 − φ2 = 0     →  ψ2 = ψ1 φ1 + φ2 = φ1^2 + φ2        (104)
        B^3 : ψ3 − ψ2 φ1 − ψ1 φ2 = 0  →  ψ3 = ψ2 φ1 + ψ1 φ2
        ...
Actually, for j ≥ 2, we have

ψj = ψj−1 φ1 + ψj−2 φ2 , (105)

where ψ0 = 1. In a special case when φ2 = 0, we have ψj = φ1^j , ∀ j ≥ 0.
Therefore,

        Xt = at / (1 − φ1 B) = (1 + φ1 B + φ1^2 B^2 + · · · )at .        (106)
This equation implies that a finite-order stationary AR process is
equivalent to an infinite-order MA process.

AR(∞) representation of MA(q):


Consider the MA(q) process,

        Xt = θ (B)at ,        (107)

with θ(B) = (1 − θ1 B − θ2 B2 − · · · − θq Bq ), we can rewrite it as
        π (B)Xt = Xt / θ (B) = at ,        (108)

where π (B) = (1 − π1 B − π2 B^2 − · · · ) = 1/ θ (B), such that

        θ (B)π (B) = 1.        (109)
The π weights can be derived by equating the coefficients of Bj on
both sides of the above expression. For example, we can write the
MA(2) process as
        (1 − π1 B − π2 B^2 − π3 B^3 − · · · )Xt = Xt / (1 − θ1 B − θ2 B^2 ) = at ,        (110)
which implies that
(1 − θ1 B − θ2 B2 )(1 − π1 B − π2 B2 − π3 B3 − · · · ) = 1,
or
1 − π1 B − π2 B2 − π3 B3 − · · ·

− θ1 B + π1 θ1 B2 + π2 θ1 B3 + · · · (111)
− θ2 B2 + π1 θ2 B3 + · · · = 1.
Thus, the πj weights can be derived by equating the coefficients of B^j
as follows:

        B^1 : −π1 − θ1 = 0             →  π1 = −θ1
        B^2 : −π2 + π1 θ1 − θ2 = 0     →  π2 = π1 θ1 − θ2 = −θ1^2 − θ2        (112)
        B^3 : −π3 + π2 θ1 + π1 θ2 = 0  →  π3 = π2 θ1 + π1 θ2
        ...
In general, for j ≥ 3, we have
πj = πj−1 θ1 + πj−2 θ2 , (113)
where π0 = −1. In a special case when θ2 = 0, we have πj =
−θ1^j , ∀ j ≥ 1. Therefore,

        (1 + θ1 B + θ1^2 B^2 + · · · )Xt = Xt / (1 − θ1 B) = at .        (114)
This equation implies that a finite-order invertible MA process is
equivalent to an infinite-order AR process.

In summary, a finite-order stationary AR(p) process corresponds
to an infinite-order MA(q) process, and a finite-order invertible MA(q)
process corresponds to an infinite-order AR(p) process. This dual re-
lationship between the AR(p) and MA(q) processes also exists in the
ACF and PACF. The AR(p) process has its autocorrelations tailing
off and partial autocorrelations cutting off, but the MA(q) process
has its autocorrelations cutting off and partial autocorrelations tail-
ing off.
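
A short sketch (Python; not from the notes, the φ values are arbitrary but
stationary) of the ψ-weight recursion (105), reproducing the first few weights
in (104):

```python
# MA(inf) weights of an AR(2): psi_j = psi_{j-1}*phi1 + psi_{j-2}*phi2, with psi_0 = 1.
phi1, phi2 = 0.5, 0.3

psi = [1.0, phi1]                  # psi_0 = 1, psi_1 = phi_1
for j in range(2, 9):
    psi.append(psi[j - 1] * phi1 + psi[j - 2] * phi2)

print([round(w, 4) for w in psi])  # 1, phi1, phi1^2 + phi2, ...
```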

5.7.1 Consequences of the Duality

The above discussed duality has the following consequences:


1. In a stationary autoregressive process of order p, at can be rep-
resented as a finite weighted sum of previous X’s, or Xt as an
infinite weighted sum

        Xt = φ^{-1}(B) at

of the previous a’s. Conversely, in an invertible moving average
process of order q, Xt can be represented as a finite weighted
sum of previous a’s, or at as an infinite weighted sum

        θ^{-1}(B) Xt = at

of the previous X’s.
2. The finite MA process has an autocorrelation function that is
zero beyond a certain point, but since it is equivalent to an in-
finite AR process, its partial autocorrelation function is infi-
nite in extent and is dominated by damped exponentials and/or
damped sine waves. Conversely, the AR process has a partial
autocorrelation function that is zero beyond a certain point, but
its autocorrelation function is infinite in extent and consists of
a mixture of damped exponentials and/or damped sine waves.
3. For an autoregressive process of finite order p, the parameters
are not required to satisfy any conditions to ensure invertibility.
However, for stationarity, the roots of φ(B) = 0 must lie outside
the unit circle. Conversely, the parameters of the MA process
are not required to satisfy any conditions to ensure stationarity.
However, for invertibility, the roots of θ(B) = 0 must lie outside
the unit circle.

Note: The table below gives the characteristics of the Autocor-
relation and the Partial Autocorrelation Functions of AR and MA
Processes, which can be used for model identification.

Table 1: Characteristics of the Autocorrelation and the Partial Autocorrelation
Functions of AR and MA Processes

                 ACF                        PACF
        AR(p)    does not break off         breaks off after lag p
        MA(q)    breaks off after lag q     does not break off

6 ARMA(p, q) Processes, Stationarity, Invert-


ibility, ACF, PACF, ARMA(1,1) Processes
6.1 Introduction
We now proceed with the general development of autoregressive and
moving average processes, called mixed Autoregressive Moving Av-
erage (ARMA) models, for stationary time series.

A useful class of models for time series is formed by combining


MA and AR processes. A natural extension of the pure AR and MA
processes is the mixed Autoregressive Moving Average (ARMA) process,
which includes the AR and MA processes as special cases. The pro-
cess contains a large class of parsimonious time series models that
are useful in describing a wide variety of time series encountered in
practice.
As we have shown, a stationary and invertible process can be rep-
resented either in a moving average form or in an autogressive form.
A problem with either representation, though, is that it may contain
too many parameters, even for a finite-order moving average and a
finite-order autoregressive model because a higher-order model is of-
ten needed for good approximation. In general, a large number of
parameters reduces efficiency in estimation. Thus, in model build-
ing, it may be necessary to include both autoregressive and moving
average terms in a model, which leads to the following useful mixed
Autoregressive Moving Average (ARMA) process:
Definition 6.1 (Autoregressive Moving Average Process of order (p, q)
(ARMA(p, q))). : Suppose that {at } is a purely random process with
mean zero and variance σa2 . Then a process {Xt } is said to be an

autoregressive moving average process of order (p,q) (abbreviated to
an ARMA(p,q) process) if

Xt −φ1 Xt−1 −· · ·−φp Xt−p = at −θ1 at−1 −θ2 at−2 −· · ·−θq at−q . (115)

The ARMA(p, q) model can be written in the equivalent form using


the backshift operator B

(1 − φ1 B − · · · − φp Bp )Xt = (1 − θ1 B − · · · − θq Bq )at

or in concise form
φ (B)Xt = θ (B)at . (116)
where φ (B) = (1 − φ1 B − φ2 B2 − · · · − φp Bp ) is the pth degree poly-
nomial and θ (B) = (1 − θ1 B − θ2 B^2 − · · · − θq B^q ) is the qth degree
polynomial.

Note: The above ARMA(p, q) process may be thought of in two


ways:

1. As a pth-order autoregressive process



φ(B)Xt = et , (117)

where et follows the qth-order moving average process et = θ (B)at .
2. As a qth-order moving average process

        Xt = θ (B)bt ,        (118)

where bt follows the pth-order autoregressive process φ (B)bt = at ,
so that φ (B)Xt = θ (B)φ (B)bt = θ (B)at .

Definition 6.2 (Autoregressive Moving Average Process of Order


(p, q)). : {Xt } is an ARMA(p, q) process if {Xt } is stationary and
if for every t,

Xt − φ1 Xt−1 − · · · − φp Xt−p = at − θ1 at−1 − · · · − θq at−q , (119)

where {at } ∼ W N (0, σa2 ), (1 − φ1 z − φ2 z 2 − · · · − φp z p ) is the pth


degree polynomial and (1 − θ1 z − θ2 z^2 − · · · − θq z^q ) is the qth degree
polynomial, where these polynomials will not have common roots.
Remark:

1. These models enable us to describe processes in which neither
the autocorrelation nor the partial autocorrelation function breaks
off after a finite number of lags.
2. Here we express Xt as a linear combination of past observations
Xt−j and white noise at−j .
3. In the statistical analysis of time series, autoregressive–moving-
average (ARMA) models provide a parsimonious description of
a (weakly) stationary stochastic process in terms of two poly-
nomials, one for the autoregression (AR) and the second for
the moving average (MA). The general ARMA model was de-
scribed in the 1951 thesis of Peter Whittle, Hypothesis testing
in time series analysis, and it was popularized in the 1970 book
by George E. P. Box and Gwilym Jenkins.
4. The importance of ARMA processes lies in the fact that a sta-
tionary time series may often be adequately modelled by an
ARMA model involving fewer parameters than a pure MA or
AR process by itself. This is an early example of what is often

called the Principle of Parsimony. This says that we want


to find a model with as few parameters as possible, but which
gives an adequate representation of the data at hand.
Note :
1. ARMA models are fundamental tools for analyzing short-memory
time series.
2. The current value of the series Xt is a linear combination of the
p most recent past values of itself plus q most recent past values
of “innovation” terms at .
3. The symbols φ1 , φ2 , . . . , φp are the finite set of weight parame-
ters, known as Autoregressive parameters and symbols θ1 , θ2 , . . . , θq
are the finite set of weight parameters, known as Moving average
parameters.
4. The autoregressive operator is defined to be φ(B) = (1 − φ1 B −
φ2 B2 − · · · − φp Bp ) and the moving average operator is defined
to be θ(B) = (1 − θ1 B − θ2 B2 − · · · − θq Bq ).

5. It is to be noted that, when q = 0 (θ (B) ≡ 1), the model is
called an autoregressive model of order p, AR(p), and when p =
0 (φ (B) ≡ 1), the model is called a moving average model of
order q, MA(q).
6. Recall that we are assuming that Xt has zero mean. We can
always introduce a nonzero mean by replacing Xt by Xt − µ, ∀ t
throughout our equations.
7. In the stationary ARMA(p, q) process, if the autoregressive op-
erator φ(B) = (1 − φ1 B − φ2 B2 − · · · − φp Bp ) and the moving
average operator θ(B) = (1 − θ1 B − θ2 B2 − · · · − θq Bq ) have
any roots in common say, λi = ηj for some i and j, then the
stationary ARMA(p, q) process is clearly identical to the sta-
tionary ARMA(p−1, q−1) process. (Here, the orders p and q get
reduced by as many as the number of common roots, i.e., if there
are say j where j = min(p, q) common roots, then the actual
order of the ARMA process is (p − j, q − j) instead of (p,q)).
This is the reason why in the above definition (119 ) we state
that the polynomials will not have common roots.

6.2 Stationarity Condition of ARMA(p, q) Process


The conditions on the model parameters to make the process station-
ary are the same as for a pure AR, namely, that the values of {φi },
which make the process stationary, are such that the roots of

φ (B) = 0

lie outside the unit circle.

Remark: The stationarity of an ARMA process depends entirely on


the autoregressive parameters (φ1 , φ2 , . . . , φp ) and not on the moving
average parameters (θ1 , θ2 , . . . , θq ).

6.3 Invertibility Condition of ARMA(p, q) Process


The conditions on the model parameters to make the process invert-
ible are the same as for a pure MA process, namely, that the values
of {θi }, which make the process invertible, are such that the roots of

θ (B) = 0

lie outside the unit circle.

Remark: The invertibility of an ARMA process depends entirely on


the moving average parameters (θ1 , θ2 , . . . , θq ) and not on the autore-
gressive parameters (φ1 , φ2 , . . . , φp ).

6.4 Relationship between ψj , πj , φi and θi of ARMA(p, q)


Process
The stationary and invertible ARMA(p, q) process (119) has both
the infinite moving average representation

        Xt = ψ (B)at = Σ_{j=0}^{∞} ψj at−j ,

where ψ (B) = φ^{-1}(B) θ (B), and the infinite autoregressive represen-
tation

        π (B)Xt = Xt − Σ_{j=1}^{∞} πj Xt−j = at ,

where π (B) = θ^{-1}(B) φ (B), with both the ψj weights and πj weights
being absolutely summable. The weights ψj are determined from the
relation φ (B)ψ (B) = θ (B) to satisfy

ψj = φ1 ψj−1 + φ2 ψj−2 + · · · + φp ψj−p − θj , j > 0

with ψ0 = 1, ψj = 0 for j < 0, and θj = 0 for j > q, while from the


relation θ (B)π (B) = φ (B) the πj are determined to satisfy

πj = θ1 πj−1 + θ2 πj−2 + · · · + θq πj−q + φj , j > 0

with π0 = −1, πj = 0 for j < 0, and φj = 0 for j > p. From these


relations, the ψj and πj weights can readily be computed recursively
in terms of the φi and θi coefficients.
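
The two recursions translate directly into code. The sketch below (Python; not
from the notes, and the ARMA(1, 1) values used at the end are arbitrary)
computes the ψj and πj weights of a general ARMA(p, q) from the φi and θi :

```python
# psi_j = phi_1 psi_{j-1} + ... + phi_p psi_{j-p} - theta_j   (psi_0 = 1)
# pi_j  = theta_1 pi_{j-1} + ... + theta_q pi_{j-q} + phi_j   (pi_0 = -1)
def arma_weights(phi, theta, n):
    p, q = len(phi), len(theta)
    psi, pi = [1.0], [-1.0]
    for j in range(1, n + 1):
        theta_j = theta[j - 1] if j <= q else 0.0
        phi_j = phi[j - 1] if j <= p else 0.0
        psi.append(sum(phi[i] * psi[j - 1 - i] for i in range(p) if j - 1 - i >= 0) - theta_j)
        pi.append(sum(theta[i] * pi[j - 1 - i] for i in range(q) if j - 1 - i >= 0) + phi_j)
    return psi, pi

psi, pi = arma_weights(phi=[0.5], theta=[0.3], n=6)   # an ARMA(1,1) example
print([round(w, 4) for w in psi])     # 1, 0.2, 0.1, 0.05, ...
print([round(w, 4) for w in pi])      # -1, 0.2, 0.06, 0.018, ...
```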

6.5 ACVF and ACF of the ARMA(p, q) Process


The autocorrelation function of the mixed process may be derived by
a method similar to that used for autoregressive processes earlier.
Consider the ARMA(p, q) process

Xt = φ1 Xt−1 +φ2 Xt−2 +· · ·+φp Xt−p +at −θ1 at−1 −θ2 at−2 −· · ·−θq at−q ,

multiply it by Xt−k on both sides

Xt−k Xt = φ1 Xt−k Xt−1 +· · ·+φp Xt−k Xt−p +Xt−k at −θ1 Xt−k at−1 −· · ·−θq Xt−k at−q .

We now take expected value to obtain

γk = φ1 γk−1 +· · ·+φp γk−p +E[Xt−k at ]−θ1 E[Xt−k at−1 ]−· · ·−θq E[Xt−k at−q ].

i.e.,

γk = φ1 γk−1 + · · · + φp γk−p + γxa (k) − θ1 γxa (k − 1) · · · − θq γxa (k − q).

where γxa (k) is the cross-covariance function between X and a and


is defined by γxa (k) = E[Xt−k at ]. Since Xt−k depends only on shocks
that have occurred up to time t − k through the infinite moving average
representation Xt−k = ψ (B)at−k = Σ_{j=0}^{∞} ψj at−k−j , it follows that

        γxa (k) = 0,            k > 0
        γxa (k) = ψ−k σa^2 ,    k ≤ 0.        (120)
Hence, the preceding equation for γk may be expressed as

γk = φ1 γk−1 + · · · + φp γk−p − σa2 (θk ψ0 + θk+1 ψ1 + · · · + θq ψq−k ) (121)

with the convention that θ0 = −1. We see that this implies

γk = φ1 γk−1 + · · · + φp γk−p , k = q + 1, q + 2, . . . ,

and hence,

ρk = φ1 ρk−1 + · · · + φp ρk−p , k = q + 1, q + 2, . . . , (122)

or concisely,

φ (B)ρk = 0, ∀ k = q + 1, q + 2, . . . . (123)

Thus, after q lags the autocovariance function γk (and the autocorre-


lation function ρk ) follow the pth-order difference equation governed
by the autoregressive parameters. Hence, the ACF of an ARMA(p,
q) model tails off after lag q just like an AR(p) process, which de-
pends only on the autoregressive parameters in the model. The first
q autocorrelations ρq , ρq−1 , . . . , ρ1 , however, depend on both autore-
gressive and moving average parameters in the model and serve as
initial values for the pattern. This distinction is useful in model
identification.

Note that (123) does not hold for k ≤ q, owing to correlation
between θk at−k and Xt−k . Hence, an ARMA(p, q) process will have
more complicated autocovariances for lags 1 through q than would
the corresponding AR(p) process.
The ACF always decays exponentially. If φ1 > 0, the ACF damps out
smoothly, whereas if φ1 < 0 the ACF alternates in sign. Remark that
the sign of ρ1 is determined by φ1 and θ1 : if φ1 > θ1 then ρ1 > 0,
else if φ1 < θ1 then ρ1 < 0.

6.6 Variance of the ARMA(p, q) Process


Consider equation (121); putting k = 0, we have

γ0 = φ1 γ1 + · · · + φp γp + σa2 (1 − θ1 ψ1 − · · · − θq ψq ) (124)

which has to be solved along with the p equations (121) for k =


1, 2, . . . , p to obtain γ0 , γ1 , γ2 , . . . , γp .

6.7 PACF of the ARMA(p, q) Process



The partial autocorrelation function of a mixed process is in-


finite in extent. It behaves eventually like the partial autocorrelation
function of a pure moving average process, being dominated by a mix-
ture of damped exponentials and/or damped sine waves, depending
on the order of the moving average and the values of the parameters
it contains.

6.8 ARMA(1, 1) Process


6.8.1 Introduction

A mixed ARMA process of considerable practical importance is the


ARMA(1, 1)process, which is defined as below.
Definition 6.1 (ARMA(1, 1) Process). : {Xt } is an ARMA(1,
1) process if {Xt } is stationary and if for every t,

Xt − φ1 Xt−1 = at − θ1 at−1 , (125)

where {at } ∼ W N (0, σa2 ) and (1 − φ1 z) and (1 − θ1 z) are the 1st


degree polynomials.
Note:

1. If we are using an ARMA(1, 1) model in which θ1 is close to
φ1 then the data might better be modeled as simple white noise.
2. When φ1 = 0 (125) reduces to MA(1) process, and when θ1 = 0
it reduces to AR(1) process. Thus, we can regard the AR(1) and
MA(1) processes as special cases of the ARMA(1, 1) process.

6.8.2 Stationarity and Invertibility of ARMA(1, 1) Process

Consider the equation (125), which may be written as

(1 − φ1 B)Xt = (1 − θ1 B)at .

Stationarity: The above process is stationary, if the root of the


equation (1 − φ1 B) = 0, lies outside the unit circle, which leads to
the condition that −1 < φ1 < 1. That is, ARMA(1, 1) model defined
in (125) is stationary, if −1 < φ1 < 1.

Invertibility: The above process is said to be invertible, if the root


of the equation (1 − θ1 B) = 0, lies outside the unit circle, which
leads to the condition that −1 < θ1 < 1. That is, ARMA(1, 1)

model defined in (125) is invertible, if −1 < θ1 < 1.

6.8.3 ACVF and ACF of ARMA(1, 1) Process

Consider the ARMA(1,1) model, Xt = φ1 Xt−1 +at −θ1 at−1 . Multiply


this equation by Xt−k , take expectations to get

E(Xt−k Xt ) = φ1 E(Xt−k Xt−1 ) + E(Xt−k at ) − θ1 E(Xt−k at−1 ).

γk = φ1 γk−1 + E(Xt−k at ) − θ1 E(Xt−k at−1 ). (126)


Variance of the ARMA(1, 1) Process:
More specifically, when k = 0,

γ0 = φ1 γ1 + E(Xt at ) − θ1 E(Xt at−1 )

Recall that E(Xt at ) = σa2 . For the term E(Xt at−1 ), we note that

E(Xt at−1 ) = φ1 E(Xt−1 at−1 ) + E(at at−1 ) − θ1 E(a2t−1 )


= (φ1 − θ1 )σa2 .

Hence,
γ0 = φ1 γ1 + σa2 − θ1 (φ1 − θ1 )σa2 . (127)

When k = 1, we have from (126)
γ1 = φ1 γ0 − θ1 σa2 . (128)
Substituting (128) in (127), we have
γ0 = φ21 γ0 − φ1 θ1 σa2 + σa2 − φ1 θ1 σa2 + θ12 σa2 ,
i.e., the variance of the ARMA(1, 1) process is

        γ0 = [(1 + θ1^2 − 2φ1 θ1 ) / (1 − φ1^2 )] σa^2 .        (129)

Thus,

        γ1 = φ1 γ0 − θ1 σa^2
           = [ φ1 (1 + θ1^2 − 2φ1 θ1 ) / (1 − φ1^2 ) − θ1 ] σa^2
        γ1 = [ (φ1 − θ1 )(1 − φ1 θ1 ) / (1 − φ1^2 ) ] σa^2 .        (130)
For k ≥ 2, we have from (126)
γk = φ1 γk−1 , k ≥ 2. (131)
Hence, the ARMA(1, 1) model has the following autocorrelation
function:

        ρk = 1,                                                 k = 0
        ρk = (φ1 − θ1 )(1 − φ1 θ1 ) / (1 + θ1^2 − 2φ1 θ1 ),     k = 1        (132)
        ρk = φ1 ρk−1 = φ1^{k−1} ρ1 ,                            k ≥ 2.

Note that the autocorrelation function of ARMA(1, 1) model com-


bines characteristics of both AR(1) and MA(1) processes. The mov-
ing average parameter θ1 enters into the calculation of ρ1 . Beyond
ρ1 , the autocorrelation function of ARMA(1, 1) model follows the
same pattern as the autocorrelation function of an AR(1) process.
The autocorrelation function decays exponentially from the starting
value ρ1 , which depends on θ1 and φ1 . This exponential decay is
smooth if φ1 is positive and alternates in sign if φ1 is negative. Furthermore, the
sign of ρ1 is determined by the sign of (φ1 − θ1 ) and dictates from
which side of zero the exponential decay takes place.
Hence, it is unlikely that we will be able to tell the difference
between an ARMA(1, 1) and an AR(1) based solely on an ACF es-
timated from a sample. This consideration will lead us to the partial
autocorrelation function.
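
A simulation check of (132) (Python/NumPy; not from the notes, and the
parameter values are arbitrary but stationary and invertible):

```python
# Simulate X_t = phi1 X_{t-1} + a_t - theta1 a_{t-1} and compare ACFs.
import numpy as np

phi1, theta1, n = 0.7, 0.4, 10000
rng = np.random.default_rng(1)

a = rng.normal(size=n + 1)
x = np.zeros(n + 1)
for t in range(1, n + 1):
    x[t] = phi1 * x[t - 1] + a[t] - theta1 * a[t - 1]
x = x[1:]

rho1 = (phi1 - theta1) * (1 - phi1 * theta1) / (1 + theta1 ** 2 - 2 * phi1 * theta1)
theory = [1.0] + [rho1 * phi1 ** (k - 1) for k in range(1, 6)]   # from (132)

xc = x - x.mean()
c0 = np.dot(xc, xc) / n
sample = [np.dot(xc[:n - k], xc[k:]) / n / c0 for k in range(6)]

print(np.round(theory, 3))
print(np.round(sample, 3))
```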

Alternative Way of Deriving ACF of ARMA(1, 1) Process
Consider the model

(1 − φ1 B)Xt = (1 − θ1 B)at
φ (B)Xt = θ (B)at
        Xt = [1/ φ (B)] θ (B)at        (133)
For |φ1 | < 1, we have
        [1/ φ (B)] θ (B) = (1 + φ1 B + φ1^2 B^2 + φ1^3 B^3 + · · · )(1 − θ1 B)
                        = 1 + φ1 B + φ1^2 B^2 + φ1^3 B^3 + · · · − (θ1 B + φ1 θ1 B^2 + φ1^2 θ1 B^3 + · · · )
                        = 1 + (φ1 − θ1 )B + (φ1^2 − φ1 θ1 )B^2 + (φ1^3 − φ1^2 θ1 )B^3 + · · ·
                        = 1 + (φ1 − θ1 )B + (φ1 − θ1 )φ1 B^2 + (φ1 − θ1 )φ1^2 B^3 + · · ·

        [1/ φ (B)] θ (B) = Σ_{j=0}^{∞} ψj B^j        (134)

where ψ0 = 1 and ψj = (φ1 − θ1 )φ1^{j−1} , j ≥ 1. Therefore equation
(133) can be expressed as a linear process of the form

        Xt = Σ_{j=0}^{∞} ψj at−j ,   at ∼ W N (0, σa^2 ),   ψj = (φ1 − θ1 )φ1^{j−1} , j ≥ 1,   ψ0 = 1.        (135)
Using the above form we can easily derive expressions for γ0 and γ1 .

        γ0 = σa^2 Σ_{j=0}^{∞} ψj^2
           = σa^2 [ 1 + (φ1 − θ1 )^2 Σ_{j=1}^{∞} φ1^{2(j−1)} ]
           = σa^2 [ 1 + (φ1 − θ1 )^2 Σ_{j=0}^{∞} φ1^{2j} ]
        γ0 = σa^2 [ 1 + (φ1 − θ1 )^2 / (1 − φ1^2 ) ]        (136)
and

        γ1 = σa^2 Σ_{j=0}^{∞} ψj ψj+1
           = σa^2 [ 1·(φ1 − θ1 ) + (φ1 − θ1 )(φ1 − θ1 )φ1 + (φ1 − θ1 )φ1 ·(φ1 − θ1 )φ1^2
                    + (φ1 − θ1 )φ1^2 ·(φ1 − θ1 )φ1^3 + · · · ]
           = σa^2 [ (φ1 − θ1 ) + (φ1 − θ1 )^2 φ1 (1 + φ1^2 + φ1^4 + · · · ) ]
           = σa^2 [ (φ1 − θ1 ) + (φ1 − θ1 )^2 φ1 Σ_{j=0}^{∞} φ1^{2j} ]
        γ1 = σa^2 [ (φ1 − θ1 ) + (φ1 − θ1 )^2 φ1 / (1 − φ1^2 ) ]        (137)
Similar derivations for k ≥ 2 give

        γk = φ1 γk−1 = φ1^{k−1} γ1 ,  k ≥ 2.        (138)

Hence, we can calculate the autocorrelation function ρk . For k = 1



we obtain
        ρ1 = γ1 / γ0 = (φ1 − θ1 )(1 − φ1 θ1 ) / (1 + θ1^2 − 2φ1 θ1 ),        (139)
and for k ≥ 2 we have

        ρk = φ1 ρk−1 = φ1^{k−1} ρ1 .        (140)

From these formulae we can see that when φ1 = θ1 , the ACF ρk =


0, ∀ k = 1, 2, . . . and the process is just a white noise.

6.8.4 PACF of ARMA(1,1) Process

The partial autocorrelation function of the mixed ARMA(1,1) process


consists of a single initial value φ11 = ρ1 . Thereafter, it behaves like
the partial autocorrelation function of a pure MA(1) process and is
dominated by a damped exponential. Thus, when θ1 is positive, it
is dominated by a smoothly damped exponential that decays from a
value of ρ1 , with sign determined by the sign of (φ1 − θ1 ). Similarly,
when θ1 is negative, it is dominated by an exponential that oscillates
as it decays from a value of ρ1 , with sign determined by the sign of
(φ1 − θ1 ).

6.9 Summary of Properties of AR, MA, and ARMA Pro-
cesses

Table 2: Summary of Properties of AR, MA, and ARMA Processes


Model in terms of previous Xt ’s:
        AR(p): φ (B)Xt = at        MA(q): θ^{-1}(B)Xt = at        ARMA(p, q): θ^{-1}(B)φ (B)Xt = at
Model in terms of previous at ’s:
        AR(p): Xt = φ^{-1}(B)at    MA(q): Xt = θ (B)at            ARMA(p, q): Xt = φ^{-1}(B)θ (B)at
π weights:
        AR(p): finite series       MA(q): infinite series         ARMA(p, q): infinite series
ψ weights:
        AR(p): infinite series     MA(q): finite series           ARMA(p, q): infinite series
Stationarity condition:
        AR(p): roots of φ (B) = 0 must lie outside the unit circle
        MA(q): always stationary
        ARMA(p, q): roots of φ (B) = 0 must lie outside the unit circle
Invertibility condition:
        AR(p): always invertible
        MA(q): roots of θ (B) = 0 must lie outside the unit circle
        ARMA(p, q): roots of θ (B) = 0 must lie outside the unit circle
ACF:
        AR(p): infinite (damped exponentials and/or damped sine waves); tails off
        MA(q): finite; cuts off after lag q
        ARMA(p, q): infinite (damped exponentials and/or damped sine waves after first q − p lags); tails off
PACF:
        AR(p): finite; cuts off after lag p
        MA(q): infinite (dominated by damped exponentials and/or damped sine waves); tails off
        ARMA(p, q): infinite (dominated by damped exponentials and/or damped sine waves after first p − q lags); tails off

***END***