4 Time Series
This chapter reviews some basic time series concepts that are essential for describing and modeling financial time series. Section 4.1 defines univariate time series processes and introduces the important concepts of stationarity and ergodicity. Covariance stationary time series processes are defined, which gives meaning to measuring linear time dependence using autocorrelation. The benchmark Gaussian White Noise process and related processes are introduced and illustrated using R. Some common non-stationary time series processes are also discussed, including the famous random walk model. Section 4.2 introduces covariance stationary multivariate time series processes. Such processes allow for dynamic interactions among groups of time series variables. Section 4.3 discusses time series model building and introduces the class of univariate autoregressive moving average time series models and multivariate vector autoregression models. The properties of some simple models are derived and it is shown how to simulate observations from these models using R. The chapter concludes with a brief discussion of forecasting from time series models.
A stochastic process
$$\{\ldots, Y_1, Y_2, \ldots, Y_t, Y_{t+1}, \ldots\} = \{Y_t\}_{t=-\infty}^{\infty},$$
is a sequence of random variables indexed by time $t$ (see footnote 17). In most applications, the time index is a regularly
spaced index
representing calendar time (e.g., days, months, years, etc.) but it
can also be irregularly
spaced representing event time (e.g., intra-day
transaction times). In modeling time series data, the
ordering imposed
by the time index is important because we often would like to capture
the temporal
relationships, if any, between the random variables in
the stochastic process. In random sampling from a
population, the
ordering of the random variables representing the sample does not
matter because they
are independent.
An observed sample of size $T$ is represented as
$$\{Y_1 = y_1, Y_2 = y_2, \ldots, Y_T = y_T\} = \{y_t\}_{t=1}^{T}.$$
the joint distribution of $(Y_{t-t_1}, Y_{t-t_2}, \ldots, Y_{t-t_r})$ is the same for any value of $t$. Strict stationarity therefore allows for general temporal dependence between the random variables in the stochastic process.
A useful property of strict stationarity is that it is preserved under
general transformations, as summarized
in the following proposition.
For example, if $\{Y_t\}$ is strictly stationary then $\{Y_t^2\}$ and $\{Y_t Y_{t-1}\}$ are also strictly stationary.
$$\rho_j = \frac{\mathrm{cov}(Y_t, Y_{t-j})}{\sqrt{\mathrm{var}(Y_t)\,\mathrm{var}(Y_{t-j})}} = \frac{\gamma_j}{\sigma^2}.$$
rho = 0.9                                       # value of rho assumed for illustration
rhoVec = rho^(1:10)
plot(1:10, rhoVec, type="h", lwd=2, col="blue", # plot call assumed
     xlab="lag j", ylab=expression(rho[j]))
For this process the strength of linear time dependence decays toward
zero geometrically fast as j
increases.
set.seed(123)
y = rnorm(250)
The simulated iid N(0,1) values are generated using the rnorm()
function. The command
set.seed(123) initializes R’s internal
random number generator using the seed value 123. Every time
the random
number generator seed is set to a particular value, the random number
generator produces
the same set of random numbers. This allows different
people to create the same set of random numbers
so that results are
reproducible. The simulated data is illustrated in Figure 4.2
created using:
ts.plot(y, col="blue", lwd=2, ylab="y")   # plot call assumed
abline(h=0)
set.seed(123)
y = rnorm(250, mean=0.01, sd=0.05)        # sample size and GWN(0.01, 0.05^2) parameters assumed
ts.plot(y, col="blue", lwd=2, ylab="y")   # plot call assumed
abline(h=c(0, 0.01, -0.04, 0.06), lwd=2,
       lty=c("solid","solid","dotted","dotted"))
set.seed(123)
y = (1/sqrt(3))*rt(250, df=3)
ts.plot(y, col="blue", lwd=2)             # plot call assumed
abline(h=0)
Figure 4.4: Simulation of IWN(0,1) process: $Y_t \sim \frac{1}{\sqrt{3}} \times t_3$.
◼
4.1.2 Non-Stationary processes
$$Y_t = \beta_0 + \beta_1 t + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{GWN}(0, \sigma_\varepsilon^2), \quad t = 0, 1, 2, \ldots$$
The mean of this process depends on time,
$$E[Y_t] = \beta_0 + \beta_1 t,$$
so the process is not covariance stationary.
Figure 4.5 shows a realization of this process with $\beta_0 = 0$, $\beta_1 = 0.1$, and $\sigma_\varepsilon^2 = 1$ created using the R commands:
set.seed(123)
e = rnorm(250)
y.dt = 0.1*seq(1,250) + e
ts.plot(y.dt, col="blue", lwd=2, ylab="y")   # plot call assumed
abline(a=0, b=0.1)
$$Y_t = Y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{GWN}(0, \sigma_\varepsilon^2),$$
where $Y_0$ is fixed (non-random).
$$\begin{aligned}
Y_1 &= Y_0 + \varepsilon_1,\\
Y_2 &= Y_1 + \varepsilon_2 = Y_0 + \varepsilon_1 + \varepsilon_2,\\
&\;\;\vdots\\
Y_t &= Y_{t-1} + \varepsilon_t = Y_0 + \varepsilon_1 + \cdots + \varepsilon_t = Y_0 + \sum_{j=1}^{t}\varepsilon_j.
\end{aligned}$$
$$\mathrm{var}(Y_t) = \mathrm{var}\left(\sum_{j=1}^{t}\varepsilon_j\right) = \sum_{j=1}^{t}\sigma_\varepsilon^2 = \sigma_\varepsilon^2 \times t,$$
Figure 4.6 shows a realization of the RW process with $Y_0 = 0$ and $\sigma_\varepsilon^2 = 1$ created using the R commands:
set.seed(321)
e = rnorm(250)
y.rw = cumsum(e)
ts.plot(y.rw, col="blue", lwd=2, ylab="y")   # plot call assumed
abline(h=0)
Figure 4.6: Random walk process: $Y_t = Y_{t-1} + \varepsilon_t$, $\varepsilon_t \sim \mathrm{GWN}(0, 1)$.
The RW process looks much different from the GWN process in Figure 4.2. Because the variance of the process increases linearly with time, the uncertainty about where the process will be at a given date grows with the horizon.
Although the RW process is non-stationary, its first difference is covariance stationary:
$$X_t = Y_t - Y_{t-1} = \varepsilon_t \sim \mathrm{GWN}(0, \sigma_\varepsilon^2).$$
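As a quick check, differencing the simulated random walk from above should recover a series that behaves like GWN. A minimal sketch reusing y.rw (the plotting choices are assumptions):

dy = diff(y.rw)                            # first difference of the simulated random walk
ts.plot(dy, col="blue", lwd=2, ylab="dy")  # looks like GWN
abline(h=0)
acf(dy)                                    # sample autocorrelations close to zero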
Example 4.9 (Random walk with drift model for log stock prices)

Let $r_t$ denote the continuously compounded monthly return on Microsoft stock and assume that $r_t \sim \mathrm{GWN}(\mu, \sigma^2)$. Since $r_t = \ln(P_t/P_{t-1})$ it follows that $\ln P_t = \ln P_{t-1} + r_t$. Now, re-express $r_t$ as $r_t = \mu + \varepsilon_t$ where $\varepsilon_t \sim \mathrm{GWN}(0, \sigma^2)$. Then $\ln P_t = \ln P_{t-1} + \mu + \varepsilon_t$. By recursive substitution we have
$$\ln P_t = \ln P_0 + \mu t + \sum_{j=1}^{t}\varepsilon_j,$$
so $\ln P_t$ follows a random walk process with drift value $\mu$. Here, $E[\ln P_t] = \ln P_0 + \mu t$ and $\mathrm{var}(\ln P_t) = \sigma^2 t$, so $\ln P_t$ is non-stationary because both the mean and the variance depend on $t$. In this model, however, prices do not follow a random walk since $P_t = e^{\ln P_t} = P_{t-1}e^{r_t}$.
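A short simulation illustrates the model. The drift, volatility, and initial price below are illustrative assumptions (not estimates for Microsoft):

mu = 0.01; sigma = 0.05                  # assumed monthly drift and volatility
set.seed(123)
r = rnorm(120, mean=mu, sd=sigma)        # simulated cc monthly returns
lnP = log(100) + cumsum(r)               # log prices with assumed lnP0 = log(100)
P = exp(lnP)                             # price levels
ts.plot(P, col="blue", lwd=2, ylab="price")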
4.1.3 Ergodicity
Example 4.11 (Covariance stationary but not ergodic process (White 1984, page 41))
Let Y t ∼ GWN(0, 1) and let X ∼ N(0, 1) independent
of {Y t}. Define Z t = Y t + X. Then {Z t} is covariance
stationary but not ergodic. To see why {Z t} is not ergodic,
note that for all j > 0:
$$\mathrm{var}(Z_t) = \mathrm{var}(Y_t + X) = 1 + 1 = 2,$$
$$\begin{aligned}
\gamma_j &= \mathrm{cov}(Y_t + X, Y_{t-j} + X) = \mathrm{cov}(Y_t, Y_{t-j}) + \mathrm{cov}(Y_t, X) + \mathrm{cov}(Y_{t-j}, X) + \mathrm{cov}(X, X)\\
&= \mathrm{cov}(X, X) = \mathrm{var}(X) = 1,
\end{aligned}$$
$$\rho_j = \frac{1}{2} \text{ for all } j.$$
Because $\rho_j$ does not decay to zero as $j \to \infty$, $\{Z_t\}$ is not ergodic.
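A small simulation shows the practical consequence of the lack of ergodicity: the time average of a single realization of $Z_t$ converges to the realized value of $X$ rather than to $E[Z_t] = 0$. A sketch (the sample size is an arbitrary choice):

set.seed(123)
x = rnorm(1)              # one draw of the common random level X
z = rnorm(10000) + x      # one realization of Z_t = Y_t + X
mean(z)                   # time average stays near x ...
x                         # ... not near E[Z_t] = 0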
$$E[\mathbf{Y}_t] = (\mu_1, \ldots, \mu_n)' = \boldsymbol{\mu},$$
$$\mathrm{var}(\mathbf{Y}_t) = \Gamma_0 = \begin{pmatrix}
\mathrm{var}(Y_{1t}) & \mathrm{cov}(Y_{1t}, Y_{2t}) & \cdots & \mathrm{cov}(Y_{1t}, Y_{nt})\\
\mathrm{cov}(Y_{2t}, Y_{1t}) & \mathrm{var}(Y_{2t}) & \cdots & \mathrm{cov}(Y_{2t}, Y_{nt})\\
\vdots & \vdots & \ddots & \vdots\\
\mathrm{cov}(Y_{nt}, Y_{1t}) & \mathrm{cov}(Y_{nt}, Y_{2t}) & \cdots & \mathrm{var}(Y_{nt})
\end{pmatrix}.$$
The matrix of correlations is
$$\mathrm{cor}(\mathbf{Y}_t) = C_0 = D^{-1}\Gamma_0 D^{-1},$$
where $D$ is a diagonal matrix with the standard deviations of the elements of $\mathbf{Y}_t$ along the diagonal.
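In R this standardization can be done directly, or with cov2cor(). A sketch using the 2 x 2 covariance matrix that appears later in this section:

Gamma0 = matrix(c(4, 1, 1, 1), 2, 2)      # example covariance matrix
D = diag(sqrt(diag(Gamma0)))              # diagonal matrix of standard deviations
solve(D) %*% Gamma0 %*% solve(D)          # C0 = D^{-1} Gamma0 D^{-1}
cov2cor(Gamma0)                           # same result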
The lag-$k$ autocovariances and lag-$k$ autocorrelations of $Y_{jt}$ are
$$\gamma_{jj}^{k} = \mathrm{cov}(Y_{jt}, Y_{jt-k}), \qquad \rho_{jj}^{k} = \mathrm{corr}(Y_{jt}, Y_{jt-k}) = \frac{\gamma_{jj}^{k}}{\sigma_j^2},$$
and these are symmetric in $k$: $\gamma_{jj}^{k} = \gamma_{jj}^{-k}$, $\rho_{jj}^{k} = \rho_{jj}^{-k}$. The cross lag-$k$ covariances and cross lag-$k$ correlations between $Y_{it}$ and $Y_{jt}$ are defined as
$$\gamma_{ij}^{k} = \mathrm{cov}(Y_{it}, Y_{jt-k}), \qquad \rho_{ij}^{k} = \mathrm{corr}(Y_{it}, Y_{jt-k}) = \frac{\gamma_{ij}^{k}}{\sigma_i\sigma_j},$$
and in general $\gamma_{ij}^{k} = \mathrm{cov}(Y_{it}, Y_{jt-k}) \neq \mathrm{cov}(Y_{jt}, Y_{it-k}) = \gamma_{ji}^{k}$.
If $\gamma_{ij}^{k} \neq 0$ for some $k > 0$ then $Y_{jt}$ is said to lead $Y_{it}$. This implies that past values of $Y_{jt}$ are useful for predicting future values of $Y_{it}$. Similarly, if $\gamma_{ji}^{k} \neq 0$ for some $k > 0$ then $Y_{it}$ is said to lead $Y_{jt}$. It is possible that $Y_{it}$ leads $Y_{jt}$ and vice-versa. In this case, there is said to be dynamic feedback between the two series.
$$\Gamma_k = E[(\mathbf{Y}_t - \boldsymbol{\mu})(\mathbf{Y}_{t-k} - \boldsymbol{\mu})'] = \begin{pmatrix}
\mathrm{cov}(Y_{1t}, Y_{1t-k}) & \mathrm{cov}(Y_{1t}, Y_{2t-k}) & \cdots & \mathrm{cov}(Y_{1t}, Y_{nt-k})\\
\mathrm{cov}(Y_{2t}, Y_{1t-k}) & \mathrm{cov}(Y_{2t}, Y_{2t-k}) & \cdots & \mathrm{cov}(Y_{2t}, Y_{nt-k})\\
\vdots & \vdots & \ddots & \vdots\\
\mathrm{cov}(Y_{nt}, Y_{1t-k}) & \mathrm{cov}(Y_{nt}, Y_{2t-k}) & \cdots & \mathrm{cov}(Y_{nt}, Y_{nt-k})
\end{pmatrix},$$
$$C_k = D^{-1}\Gamma_k D^{-1}.$$
The matrices $\Gamma_k$ and $C_k$ are not symmetric in $k$, but it is easy to show that $\Gamma_{-k} = \Gamma_k'$ and $C_{-k} = C_k'$.
$$E[\mathbf{Y}_t] = \mathbf{0}, \qquad \mathrm{var}(\mathbf{Y}_t) = \Sigma,$$
$$\mathrm{cov}(Y_{jt}, Y_{jt-k}) = \gamma_{jj}^{k} = 0 \ (\text{for } k > 0),$$
$$\mathrm{cov}(Y_{it}, Y_{jt-k}) = \gamma_{ij}^{k} = 0 \ (\text{for } k > 0).$$
$$\Sigma = \begin{pmatrix} 4 & 1 \\ 1 & 1 \end{pmatrix} \Rightarrow C = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}.$$
To simulate 250 observations from this bivariate Gaussian white noise process use:
library(mvtnorm)
Sigma = matrix(c(4, 1, 1, 1), 2, 2)        # covariance matrix from above
set.seed(123)
Y = rmvnorm(250, sigma=Sigma)
ts.plot(Y, col=c("black", "blue"), lwd=2)  # plot call assumed
abline(h=0)
The simulated values are shown on the same plot in Figure 4.7.
Both series fluctuate randomly about
zero, and the first series (black
line) has larger fluctuations (volatility) than the second series
(blue line).
The two series are contemporaneously correlated ($\rho_{12} = 0.5$) but are both uncorrelated over time ($\rho_{11}^{k} = \rho_{22}^{k} = 0$, $k > 0$) and are not cross-lag correlated ($\rho_{12}^{k} = \rho_{21}^{k} = 0$, $k > 0$).
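These properties can be checked on the simulated data using the sample autocorrelation and cross-correlation functions, for example:

acf(Y[,1])            # own autocorrelations of series 1 (approximately zero)
acf(Y[,2])            # own autocorrelations of series 2 (approximately zero)
ccf(Y[,1], Y[,2])     # cross correlations: only the lag-0 value is near 0.5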
Time series models are probability models that are used to describe
the behavior of a stochastic process.
In many cases of interest, it
is assumed that the stochastic process to be modeled is covariance
stationary and ergodic. Then, the main feature of the process to be
modeled is the time dependence
between the random variables. In this
section, we illustrate some simple models for covariance stationary
and ergodic time series that exhibit particular types of linear time
dependence captured by
autocorrelations. The univariate models, made popular originally by Box and Jenkins (1976), are called autoregressive moving average (ARMA) models. The multivariate model, made popular by Sims (1980), is called the vector autoregressive (VAR) model.
These models are used extensively in economics and
finance for modeling
univariate and multivariate time series.
$$Y_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}, \quad -1 < \theta < 1, \quad \varepsilon_t \sim \mathrm{GWN}(0, \sigma_\varepsilon^2).$$
The MA(1) model is a simple linear function of the GWN random variables
ε t and ε t − 1. This linear
structure
allows for easy analysis of the model. The moving average parameter
θ determines the sign and
magnitude of the correlation between
Y t and Y t − 1. Clearly, if θ = 0 then Y t = μ + ε t
so that {Y t} is GWN with
non-zero mean μ and exhibits
no time dependence. As will be shown below, the MA(1) model produces
a
covariance stationary and ergodic process for any (finite) value
of θ. The restriction − 1 < θ < 1 is called
the invertibility restriction and will be explained below.
$$\rho_1 = \frac{\gamma_1}{\sigma^2} = \frac{\theta\sigma_\varepsilon^2}{\sigma_\varepsilon^2(1+\theta^2)} = \frac{\theta}{1+\theta^2}.$$
Clearly, $\rho_1 = 0$ if $\theta = 0$; $\rho_1 > 0$ if $\theta > 0$; and $\rho_1 < 0$ if $\theta < 0$. Also, the largest value for $|\rho_1|$ is 0.5, which occurs when $|\theta| = 1$. Hence, an MA(1) model cannot describe a stochastic process that has $|\rho_1| > 0.5$. Also, note that there is more than one value of $\theta$ that produces the same value of $\rho_1$. For example, $\theta$ and $1/\theta$ give the same value for $\rho_1$. The invertibility restriction $-1 < \theta < 1$ provides a unique mapping between $\theta$ and $\rho_1$.
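A quick numerical check of the identification problem: $\theta = 0.5$ and $\theta = 1/0.5 = 2$ imply the same first-order autocorrelation.

rho1 = function(theta) theta/(1 + theta^2)
rho1(0.5)     # 0.4
rho1(2)       # 0.4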
$$y_0 = \mu + \varepsilon_0 + \theta\varepsilon_{-1}, \qquad y_1 = \mu + \varepsilon_1 + \theta\varepsilon_0.$$
n.obs = 250
mu = 1
theta = 0.9
sigma.e = 1
set.seed(123)
e = rnorm(n.obs, sd = sigma.e)
y = rep(0, n.obs)
y[1] = mu + e[1]
for (i in 2:n.obs) {
  y[i] = mu + e[i] + theta*e[i-1]
}
head(y, n=3)
set.seed(123)
e = rnorm(n.obs, sd = sigma.e)
em1 = c(0, e[1:(n.obs-1)])      # lagged errors, with the pre-sample error set to 0
y = mu + e + theta*em1
head(y, n=3)
args(arima.sim)
## function (model, n, rand.gen = rnorm, innov = rand.gen(n, ...), 
##     n.start = NA, start.innov = rand.gen(n.start, ...), ...) 
## NULL
ma1.model = list(ma=0.9)
set.seed(123)
y = mu + arima.sim(model=ma1.model, n=250,
n.start=1, start.innov=0)
head(y, n=3)
ma1.acf = ARMAacf(ma=0.9, lag.max=10)    # theoretical ACF; call assumed
ma1.acf
## 0 1 2 3 4 5 6 7 8 9
## 1.0000 0.4972 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 10
## 0.0000
par(mfrow=c(2,1))
ts.plot(y, col="blue", lwd=2, main="MA(1) values")   # plot call assumed
abline(h=c(0,1))
plot(0:10, ma1.acf, type="h", col="blue", lwd=2,     # plot call assumed
     main="theoretical ACF")
abline(h=0)
Figure 4.8: Simulated values and theoretical ACF from MA(1) process with $\mu = 1$, $\theta = 0.9$ and $\sigma_\varepsilon^2 = 1$.
par(mfrow=c(1,1))
$$R_t \sim \mathrm{GWN}(\mu, \sigma^2).$$
$$R_t(2) = R_t + R_{t-1}.$$
$$\begin{aligned}
R_t(2) &= R_t + R_{t-1},\\
R_{t-1}(2) &= R_{t-1} + R_{t-2},\\
R_{t-2}(2) &= R_{t-2} + R_{t-3},\\
&\;\;\vdots
\end{aligned}$$
The one-month overlap in the two-month returns implies that {R t(2)}
follows an MA(1) process. To show
this, we need to show that the autocovariances
of {R t(2)} behave like the autocovariances of an MA(1)
process.
First, we have
$$E[R_t(2)] = E[R_t] + E[R_{t-1}] = 2\mu.$$
Next, we have
$$\mathrm{var}(R_t(2)) = \mathrm{var}(R_t + R_{t-1}) = 2\sigma^2,$$
and
$$\gamma_1 = \mathrm{cov}(R_t(2), R_{t-1}(2)) = \mathrm{cov}(R_t + R_{t-1}, R_{t-1} + R_{t-2}) = \mathrm{cov}(R_{t-1}, R_{t-1}) = \sigma^2,$$
while $\gamma_j = 0$ for $j > 1$ because the returns no longer overlap. Notice that
$$\rho_1 = \frac{\gamma_1}{\gamma_0} = \frac{\sigma^2}{2\sigma^2} = \frac{1}{2}.$$
What MA(1) process describes $\{R_t(2)\}$? Because $\rho_1 = \frac{\theta}{1+\theta^2} = 0.5$, it follows that $\theta = 1$. Hence, the MA(1) process has mean $2\mu$ and $\theta = 1$ and can be expressed as the MA(1) model
$$R_t(2) = 2\mu + \varepsilon_t + \varepsilon_{t-1}, \quad \varepsilon_t \sim \mathrm{GWN}(0, \sigma^2).$$
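A simulation sketch confirms this result; the sample ACF of the overlapping two-month returns is close to 0.5 at lag 1 and close to zero at higher lags (the values of mu, sigma, and the sample size are illustrative assumptions):

set.seed(123)
r = rnorm(1000, mean=0.01, sd=0.05)      # monthly cc returns, assumed parameters
r2 = r[2:1000] + r[1:999]                # overlapping two-month returns R_t(2)
acf(r2, lag.max=5, plot=FALSE)           # lag-1 autocorrelation near 0.5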
$$Y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}, \quad \text{where } \varepsilon_t \sim \mathrm{GWN}(0, \sigma_\varepsilon^2).$$
$$E[Y_t] = \mu, \qquad \gamma_0 = \sigma^2(1 + \theta_1^2 + \cdots + \theta_q^2),$$
$$\gamma_j = \begin{cases}
(\theta_j + \theta_{j+1}\theta_1 + \theta_{j+2}\theta_2 + \cdots + \theta_q\theta_{q-j})\sigma^2 & \text{for } j = 1, 2, \ldots, q,\\
0 & \text{for } j > q.
\end{cases}$$
$$\lim_{j\to\infty} \rho_j = \lim_{j\to\infty} \phi^{j} = 0,$$
Here, we use the first trick. Given that {Y t} is covariance stationary it follows that E[Y t] = E[Y t − 1].
Substituting E[Y t] = E[Y t − 1] into the above and solving for
E[Y t] gives (4.5).
$$\mathrm{var}(Y_t) = \phi^2\,\mathrm{var}(Y_{t-1}) + \mathrm{var}(\varepsilon_t) = \phi^2\,\mathrm{var}(Y_t) + \sigma_\varepsilon^2,$$
which can be solved for $\mathrm{var}(Y_t) = \sigma_\varepsilon^2/(1 - \phi^2)$.
To determine (4.7), we use another trick. Multiply both sides of (4.4) by $Y_{t-j} - \mu$ and take expectations to give
$$\gamma_j = \phi\gamma_{j-1},$$
which uses the fact that $Y_{t-j}$ is independent of $\varepsilon_t$, and $E[(Y_{t-1} - \mu)(Y_{t-j} - \mu)] = \gamma_{j-1}$ provided $\{Y_t\}$ is covariance stationary. Using recursive substitution and $\gamma_0 = \sigma^2$ gives (4.9).
The AR(1) model (4.4) is written in what is called the mean-adjusted form. The
mean-adjusted form can
be re-expressed in the form of a linear regression
model as follows:
$$\begin{aligned}
Y_t - \mu &= \phi(Y_{t-1} - \mu) + \varepsilon_t \Rightarrow\\
Y_t &= \mu - \phi\mu + \phi Y_{t-1} + \varepsilon_t\\
&= c + \phi Y_{t-1} + \varepsilon_t,
\end{aligned}$$
phi = 0.9
mu = 1
sigma.e = 1
n.obs = 250
y = rep(0, n.obs)
set.seed(123)
e = rnorm(n.obs, sd=sigma.e)
y[1] = mu + e[1]
for (i in 2:n.obs) {
  y[i] = mu + phi*(y[i-1] - mu) + e[i]
}
head(y, 3)
ar1.model = list(ar=0.9)
mu = 1
set.seed(123)
y = mu + arima.sim(model=ar1.model,n=250,n.start=1, start.innov=0)
head(y, 3)
The simulated AR(1) values and the ACF are shown in Figure 4.9.
Compared to the MA(1) process in
Figure 4.8,
the realizations from the AR(1) process are much smoother. That is,
when Y t wanders high
above its mean it tends to stay above the
mean for a while and when it wanders low below the mean it
tends to
stay below for a while.
◼
Figure 4.9: Simulated values and ACF from AR(1) model with $\mu = 1$, $\phi = 0.9$ and $\sigma_\varepsilon^2 = 1$.
$$Y_t - \mu = \phi_1(Y_{t-1} - \mu) + \cdots + \phi_p(Y_{t-p} - \mu) + \varepsilon_t, \quad \varepsilon_t \sim \mathrm{GWN}(0, \sigma_\varepsilon^2),$$
The regression form of the AR(p) model is used very often in practice because of its simple
linear
structure and because it can capture a wide variety of autocorrelation
patterns such as exponential decay,
damped cyclical patterns, and
oscillating damped cyclical patterns. Unfortunately, the mathematical
derivation of the autocorrelations in the AR(p) model is complicated
and tedious (and beyond the scope of
this book). The exercises at the end of the chapter illustrate some of the calculations for the AR(2) model.
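The theoretical ACF of an AR(p) model can be computed with ARMAacf(). For example, an AR(2) with coefficients chosen (for illustration only) to give complex roots produces a damped cyclical autocorrelation pattern:

ar2.acf = ARMAacf(ar=c(0.5, -0.8), lag.max=20)   # illustrative stationary AR(2)
plot(0:20, ar2.acf, type="h", lwd=2, col="blue",
     xlab="lag", ylab=expression(rho[j]))
abline(h=0)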
$$Y_t - \mu = \phi_1(Y_{t-1} - \mu) + \cdots + \phi_p(Y_{t-p} - \mu) + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}, \quad \varepsilon_t \sim \mathrm{GWN}(0, \sigma^2),$$
which can be written in regression form as
$$Y_t = c + \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q},$$
where $c = \mu(1 - \phi_1 - \cdots - \phi_p) = \mu(1 - \phi)$ and $\phi = \phi_1 + \cdots + \phi_p$.
This model combines aspects of the pure
moving average models and
the pure autoregressive models and can capture many types of
autocorrelation
patterns. For modeling typical non-seasonal economic and financial
data, it is seldom
necessary to consider models in which p > 2 and
q > 2. For example, the simple ARMA(1,1) model
$$Y_t - \mu = \phi_1(Y_{t-1} - \mu) + \varepsilon_t + \theta_1\varepsilon_{t-1}.$$
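Observations from an ARMA(1,1) can be simulated with arima.sim() in the same way as the pure AR and MA models; a sketch with illustrative parameter values:

arma11.model = list(ar=0.5, ma=0.4)      # illustrative ARMA(1,1) parameters
set.seed(123)
y = 1 + arima.sim(model=arma11.model, n=250)
acf(y)                                   # sample ACF mixes AR and MA behavior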
The most popular multivariate time series model is the vector autoregressive (VAR) model. The VAR model is a multivariate extension of the univariate autoregressive model (4.12). For example, a bivariate VAR(1) model for $\mathbf{Y}_t = (Y_{1t}, Y_{2t})'$ has the form
$$\begin{pmatrix} Y_{1t} \\ Y_{2t} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} a_{11}^{1} & a_{12}^{1} \\ a_{21}^{1} & a_{22}^{1} \end{pmatrix}\begin{pmatrix} Y_{1t-1} \\ Y_{2t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix},$$
or
$$\begin{aligned}
Y_{1t} &= c_1 + a_{11}^{1}Y_{1t-1} + a_{12}^{1}Y_{2t-1} + \varepsilon_{1t},\\
Y_{2t} &= c_2 + a_{21}^{1}Y_{1t-1} + a_{22}^{1}Y_{2t-1} + \varepsilon_{2t},
\end{aligned}$$
where
$$\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix} \sim \text{iid } N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}\right).$$
$$\mathbf{Y}_t = A\mathbf{Y}_{t-1} + \boldsymbol{\varepsilon}_t, \qquad \boldsymbol{\varepsilon}_t \sim N(\mathbf{0}, \Sigma),$$
where
$$A = \begin{pmatrix} a_{11}^{1} & a_{12}^{1} \\ a_{21}^{1} & a_{22}^{1} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}.$$
$$\mathbf{Y}_t = \mathbf{c} + A_1\mathbf{Y}_{t-1} + A_2\mathbf{Y}_{t-2} + \cdots + A_p\mathbf{Y}_{t-p} + \boldsymbol{\varepsilon}_t,$$
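Simulating from the bivariate VAR(1) above only requires a loop and draws from rmvnorm(); a minimal sketch with illustrative coefficient values (not taken from the text):

library(mvtnorm)
A = matrix(c(0.7, 0.2, 0.2, 0.7), 2, 2, byrow=TRUE)  # illustrative VAR(1) coefficient matrix
Sigma = matrix(c(1, 0.5, 0.5, 1), 2, 2)              # error covariance matrix
set.seed(123)
n.obs = 250
Y = matrix(0, n.obs, 2)
e = rmvnorm(n.obs, sigma=Sigma)
for (i in 2:n.obs) {
  Y[i, ] = A %*% Y[i-1, ] + e[i, ]
}
ts.plot(Y, col=c("black", "blue"), lwd=2)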
4.4 Forecasting
One of the main practical uses of time series models is forecasting future observations. The existence of time dependence in a covariance stationary time series means that we can exploit this time dependence to obtain forecasts of future observations that are superior to the unconditional mean.

- Describe the basic forecasting problem.
- Show the result that the conditional mean is the optimal MSE forecast.
- Show the chain rule of forecasting for AR(1) processes (see the sketch below).
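For a covariance stationary AR(1), the chain rule of forecasting gives the $h$-step-ahead forecast $\hat{Y}_{T+h} = \mu + \phi^h(Y_T - \mu)$, which reverts toward the unconditional mean $\mu$ as $h$ grows. A minimal R sketch of this formula (the parameter values and last observation below are illustrative assumptions, not taken from the text):

mu = 1; phi = 0.9; yT = 3       # assumed AR(1) parameters and last observed value
h = 1:12
forecasts = mu + phi^h * (yT - mu)
forecasts                       # forecasts decay toward mu as the horizon h grows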
1. Based on the sample autocorrelations, which time series process is most appropriate for describing
the series: MA(1) or AR(1)? Justify
your answer.
2. If you think the process is an AR(1) process, what do you think is the value of the autoregressive parameter? If you think the process is an MA(1) process, what do you think is the value of the moving average parameter?
Realizations from four stochastic processes are given in the Figures below:
Which processes appear to be covariance stationary and which processes appear to be non-stationary?
For those processes that you think are non-stationary, explain why the process is non-stationary.
$$Y_t = 10 - 0.67Y_{t-1} + \epsilon_t, \quad \epsilon_t \sim N(0, 1)$$
Exercise 4.6 Let Y t represent a stochastic process. Under what conditions is Y t covariance stationary?
Realizations from four stochastic processes are given in the Figures below:
2. Which processes appear to be covariance stationary and which processes appear to be non-
stationary?
The only process which appears to be covariance stationary is Process 1 (constant mean, volatility, etc.).
For those processes that you think are non-stationary, explain why the process is non-stationary.
Process 2 has an obvious time trend so the mean is not independent of $t$. Process 3 has a level shift around observation 75 (the mean shifts up) so that the mean before $t = 75$ is different from the mean after $t = 75$. Process 4 shows an increase in variance/volatility after observation 75: the variance/volatility before $t = 75$ is different from the variance/volatility after $t = 75$.
$$Y_t = 10 - 0.67Y_{t-1} + \epsilon_t, \quad \epsilon_t \sim N(0, 1)$$
Stationary. The absolute value of the coefficient of Y t − 1 is 0.67, which is smaller than 1.
Box, G., and G. M. Jenkins. 1976. Time Series Analysis : Forecasting and Control. San Francisco:
Holden-Day.
Ruppert, D., and D. S. Matteson. 2015. Statistics and Data Analysis for Financial Engineering with R
Examples. New York: Springer.
Zivot, E. 2016. Modeling Financial Time Series with R. New York: Springer.
17. To conserve on notation, we will represent the stochastic process $\{Y_t\}_{t=-\infty}^{\infty}$ simply as $\{Y_t\}$. ↩︎
18. This is also called a Cauchy distribution. For this distribution $E[Y_t] = \mathrm{var}(Y_t) = \mathrm{cov}(Y_t, Y_{t-j}) = \infty$. ↩︎