Lecture Notes Part 1
Evžen Kočenda & Alexandr Černý
CERGE-EI, Prague
September 13, 2005
Contents
1 Introduction
3 Difference Equations
3.1 Linear Difference Equations
3.2 Lag Operator
3.3 Solution of Difference Equations
3.3.1 Particular Solution and Lag Operators
3.3.2 Solution by Iteration
3.3.3 Homogeneous Solution
3.3.4 Particular Solution
3.4 Stability Conditions
3.5 Stability and Stationarity
4.2.1 Deterministic Trend
4.2.2 Stochastic Trend
4.2.3 Stochastic plus Deterministic Trend
4.2.4 Final Notes on Trends in Time Series
4.3 Seasonality in Time Series
4.3.1 Removing Seasonal Pattern
4.3.2 Estimating Seasonal Pattern
4.3.3 Detecting Seasonal Pattern
4.4 Unit Roots
4.4.1 Dickey-Fuller Test
4.4.2 Augmented Dickey-Fuller Test
4.4.3 Shortcomings of the Dickey-Fuller Test
4.4.4 KPSS Test
4.5 Structural Change and Unit Roots
4.5.1 Perron's Test
4.5.2 Zivot and Andrews' Test
4.6 Detecting a Structural Change
4.6.1 Vogelsang's Test
4.7 Conditional Heteroskedasticity
4.7.1 Conditional and Unconditional Expectations
4.7.2 ARCH Models
4.7.3 GARCH Models
4.7.4 Detecting Conditional Heteroskedasticity
4.7.5 How to Identify and Estimate a GARCH Model
4.7.6 Extensions of ARCH Models
B Statistical Tables
1 Introduction
The following chapters aim to present the basic tools of econometric analysis of
time series. The text is based on the material presented in a semester course
of time series econometrics given at CERGE-EI. The major stress of the course
is on practical applications of theoretical tools. Therefore we usually abstract
from the rigorous style of theorems and proofs. Rather, we try to present the
material in a way that is naturally easy to understand. In many cases we rely
only on an intuitive explanation and understanding of the studied phenomena.
Readers interested in a more formal approach are referred to the appropriate
references. Useful references for time series econometrics are [1] and [2].
The classical reference for general econometric issues is [3]. Many chapters of
this text are based on and refer to influential papers, where a more detailed
presentation of the topic is also available.
The text is divided into four major sections: Nature of time series, Difference
equations, Univariate time series, and Multiple time series. The first section
gives an introduction into time series analysis. The second section describes
in short the theory of difference equations, with emphasis on those results
that are important for time series econometrics. The third section presents the
methods commonly used in univariate time series analysis, that is, in the analysis of
time series of one single variable. The fourth section deals with time series
models of several interrelated variables.
2 Nature of Time Series
In general, there are two types of data sets studied by econometrics: cross-
sectional data sets and time series. Cross-sectional data sets are data collected
at one point in time across several entities such as countries, industries, companies, etc. A
time series is any set of data ordered by time. Our lives pass in time; therefore
it is common for any variable to become a time series. Any variable that
is registered periodically forms a time series. For example, yearly gross
domestic product (GDP) recorded over several years is a time series. Similarly,
the price level, unemployment, exchange rates of a currency, or profits of a firm can
form a time series if recorded periodically over a certain time span. The
combination of cross-sectional data and time series creates what economists call a
panel data set. Panel data sets can be studied by tools typical for panel data
econometrics or by tools characteristic for multiple time series analysis.
The fact that time series data are ordered by time implies their special properties
and some special ways of analyzing them. It enables estimation of models
containing only one variable, the so-called univariate time series estimation. In
such a case the value of a variable is explained by its past values and possibly
by time as well. Because of the time ordering of the data, issues of autocorrelation
gain a large importance in time series econometrics.
Example 1 The stochastic process that generates the time series can be, for
example, described as y_t = 0.5y_{t−1} + ε_t (an AR(1) process), where the ε_t are
i.i.d. normal with mean 0 and variance σ². With the initial condition y_0 = 0, the sequence of
random variables generated by this process is ε_1, 0.5ε_1 + ε_2, 0.25ε_1 + 0.5ε_2 + ε_3,
etc. Finally, the concrete realizations of these random variables can be numbers such as
0.13882, 0.034936, −1.69767, etc. When we say that we estimate a time series, it
means that based on the data (the concrete realizations) we estimate the underlying
process that generated the time series. The specification of the process is also
called the model.
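The distinction between the generating process and one concrete realization is easy to reproduce numerically. The following sketch (not part of the original notes; the seed, sample size, and noise variance are arbitrary choices) simulates the AR(1) process of Example 1 in Python:

    import numpy as np

    # simulate the AR(1) process y_t = 0.5*y_{t-1} + eps_t of Example 1
    rng = np.random.default_rng(seed=0)   # arbitrary seed
    T = 10                                # arbitrary sample size
    sigma = 1.0                           # standard deviation of the white noise

    eps = rng.normal(0.0, sigma, size=T)
    y = np.zeros(T)
    y_prev = 0.0                          # initial condition y_0 = 0
    for t in range(T):
        y[t] = 0.5 * y_prev + eps[t]
        y_prev = y[t]

    print(y)                              # one concrete realization of the series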
For a basic description of time series the following properties are defined: frequency, time span, mean, variance, and covariance.
1. The frequency is related to the time difference between y_t and y_{t+1}. The
data can be collected with yearly, quarterly, daily, or even higher frequency.
For example, stock prices are recorded after any change (tick by tick).
2. The time span is the period of time over which the data were collected. If
there are no gaps in the data, the time span is equivalent to the number
of observations T multiplied by the time interval between two successive observations.
Throughout the text T is reserved to indicate the sample size unless stated otherwise.
3. The mean μ_t is defined as μ_t = E(y_t). The mean is thus defined
for each element of the time series, so that there are T such means.

4. The variance is var(y_t) = E[(y_t − μ_t)²]. As with the mean, the
variance is defined for each element of the time series.

5. The covariance is cov(y_t, y_{t−s}) = E[(y_t − μ_t)(y_{t−s} − μ_{t−s})]. The covariance
is defined for each time t and each time difference s, so that in the
general case there are T² − T covariances; however, because of symmetry
only half of them can be different.
2.3 Stationarity
Stationarity is a crucial property of time series. Intuitively, a time series must
be stationary for us to be able to make some predictions of its future behavior.
Non-stationary time series are unpredictable in this sense, because they tend
to "explode". If a time series is stationary, then any shock that occurs at time
t has a diminishing effect over time and finally disappears at time t + s as
s → ∞. This feature is called mean reversion. With a non-stationary time
series this is not the case and the effect of the shock "explodes" over time. A
special case of a non-stationary process is the so-called unit root process. With
a unit root process, the shock that occurred at time t does not "explode", but
remains present in the same magnitude at all future dates. For more details on
stationarity, non-stationarity, and unit root processes see Sections 3.4, 3.5, and 4.4.
The most useful stationarity concept in econometrics is the concept of covariance
stationarity. Throughout the text we will usually, for simplicity, use
only the term stationarity to mean covariance stationarity. We say that a
time series {y_t}_{t=1}^T is covariance stationary if and only if the following formal
conditions are satisfied:

1. E(y_t) = μ < ∞ for all t,
2. var(y_t) = γ_0 < ∞ for all t,
3. cov(y_t, y_{t−s}) = γ_s for all t and all s.

It means that a time series is covariance stationary if its mean and variance
are constant and finite over time and if the covariance depends only on the time
distance s between the two elements of the time series but not on the time t
itself.
Note that the white noise introduced in the previous section is obviously
stationary. However, a stationary time series is not automatically a white noise,
because for a white noise we would need the additional conditions that the mean
and all autocovariances are zero, that is, μ = 0 and γ_s = 0 for all s > 0.
Most economic time series are not stationary and some transformations are
needed in order to achieve stationarity. The most popular transformations are
described in the next section.
[Figure: simulated stationary, non-stationary, and unit-root time series.]
If |a_1| = 1, then the time series contains a unit root. The formal necessary and
sufficient conditions for time series stationarity will be described in Sections
3.4 and 3.5. The three time series were generated by the following processes:

stationary: y_t = 0.6y_{t−1} + ε_t, a_1 = 0.6 < 1
non-stationary: y_t = 1.1y_{t−1} + ε_t, a_1 = 1.1 > 1
unit root: y_t = y_{t−1} + ε_t, a_1 = 1
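The three processes are easy to simulate. A minimal sketch (not from the original notes; the sample size and seed are arbitrary) that generates one realization of each series for plotting might look as follows:

    import numpy as np

    # generate one realization of each of the three processes listed above
    rng = np.random.default_rng(4)        # arbitrary seed
    T = 200                               # arbitrary sample size
    eps = rng.normal(size=(3, T))
    coeffs = {"stationary": 0.6, "non-stationary": 1.1, "unit root": 1.0}

    series = {}
    for row, (name, a1) in enumerate(coeffs.items()):
        y = np.zeros(T)
        for t in range(1, T):
            y[t] = a1 * y[t - 1] + eps[row, t]
        series[name] = y

    # the non-stationary series ends far from its starting value
    print({name: y[-1] for name, y in series.items()})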
From the figure we can distinguish clear visual differences between the time
series. The stationary time series tends to return often to its initial value. The
non-stationary time series explodes after a while. Finally, a time series containing
a unit root can resemble a stationary time series, but it does not return to its
initial value as often. These differences can be more or less pronounced in a
visual plot. Nevertheless, a visual plot of the time series cannot replace the formal
tests of stationarity described in Sections 4.4 and 4.5.
To create second differences Δ²y_t we apply the identical transformation to the first
differences: Δ²y_t = Δy_t − Δy_{t−1}. In this way we can create differences of
even higher orders. Although any time series becomes stationary after a
sufficient order of differencing, differencing of higher than second order is
almost never used in econometrics. The reason is that with each differencing
we lose one observation and, more importantly, with each differencing we lose
part of the information contained in the data. Moreover, higher order differences
have no clear interpretation. Second differences are already linear
growth rates of the linear growth rates obtained by first differencing. Such
variables are obviously not very interesting for applied economic research.
3. Detrending is a procedure that removes a linear or even higher order trend
from the data. To detrend a time series, we run a regression of the series
on time t (or on its higher powers as well) and then subtract the estimated
values from the original time series. The degree of the time polynomial
included in the regression can be formally tested by the F-test prior to
detrending. Trending time series are never stationary, because their mean
is not constant. Therefore detrending also helps to make such time series
stationary. However, differencing is generally more successful in achieving
this goal and therefore it is also more commonly used. More details about
trends in time series will be given in the next section and in Section 4.2.
Example 3 Economic data usually grow exponentially. It means that for
a variable X we have the growth equation X_t = (1 + g_t)X_{t−1} in the discrete case,
or X_t = X_{t−1}e^{g_t} in the continuous case, where g_t is the growth rate between two
successive periods, or the rate of return, depending on the nature of X. By
logarithmic differencing we get ln X_t − ln X_{t−1} = ln(1 + g_t) ≈ g_t in the discrete case
or ln X_t − ln X_{t−1} = g_t in the continuous case.
Specifically, let us consider a time series of price levels {P_t}_{t=1}^T. By logarithmic
differencing we obtain the series of inflation rates i_t = ln P_t − ln P_{t−1}.
The differences mentioned above were always just differences between two
successive periods. If the data exhibit some seasonal pattern, it is more fruitful
to apply differences between the seasonal periods. For example, with monthly
data we can apply the 12th seasonal logarithmic difference ln X_t − ln X_{t−12}. Such a
procedure removes the seasonal pattern from the data and also decreases the
variance of the series (if the seasonal pattern has a period of 12 months). More
about seasonal patterns will be mentioned in the next section and in Section
4.3.
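As an illustration, the following sketch (not from the original notes) applies first differences, logarithmic differences, and 12th seasonal logarithmic differences to an artificial, exponentially growing series; the series itself is invented purely for the example:

    import numpy as np

    # an artificial, exponentially growing monthly price-level series P_t
    rng = np.random.default_rng(2)
    T = 60
    P = 100 * np.exp(np.cumsum(0.01 + 0.005 * rng.normal(size=T)))

    log_P = np.log(P)
    first_diff = np.diff(P)                    # first differences P_t - P_{t-1}
    inflation = np.diff(log_P)                 # logarithmic differences ln P_t - ln P_{t-1}
    seasonal_diff = log_P[12:] - log_P[:-12]   # 12th seasonal logarithmic differences

    print(inflation[:3])
    print(seasonal_diff[:3])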
y_t = T_t + S_t + I_t.  (1)
Example 4 In Figure 2 the decomposition of a time series into a trend, a seasonal,
and an irregular pattern is shown. The time series in the figure consists of

the trend T_t = 2 + 0.3t,
the seasonal pattern S_t = 4 sin(2πt/6), and
the irregular pattern I_t = 0.7I_{t−1} + ε_t, where the ε_t are normal i.i.d.
with mean 0 and variance σ² = 9.
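A sketch of how such a series could be generated (not from the original notes; it simply reproduces the components of Example 4 for a sample of arbitrary length) is:

    import numpy as np

    # generate the components of Example 4 and their sum y_t = T_t + S_t + I_t
    rng = np.random.default_rng(5)            # arbitrary seed
    T = 120                                   # arbitrary length
    t = np.arange(1, T + 1)

    trend = 2 + 0.3 * t                       # T_t = 2 + 0.3t
    seasonal = 4 * np.sin(2 * np.pi * t / 6)  # S_t = 4 sin(2*pi*t/6)
    irregular = np.zeros(T)                   # I_t = 0.7*I_{t-1} + eps_t, eps ~ N(0, 9)
    for i in range(1, T):
        irregular[i] = 0.7 * irregular[i - 1] + rng.normal(0, 3)

    y = trend + seasonal + irregular          # the observed series of equation (1)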
[Figure 2: the series and its trend, seasonal, and irregular components.]
2. Moving average process of the order q, MA(q), is described with the equation

y_t = Σ_{i=0}^q β_i ε_{t−i}.  (3)
If the series was differenced n times in order to make it stationary, and only
then estimated by an ARMA(p, q) model, then we sometimes say that it was
estimated by an ARIMA(p, n, q) model. The I inserted in the abbreviation
ARMA and the n in the parentheses stand for integrated of order n, as was
pointed out earlier.
1. The trend
Economic time series very often contain a clear trend. In fact, their growth
is usually not only linear but exponential, as was mentioned in Example 3.
It means that even after we take the natural logarithm of the data,
a clear linear trend still persists. Such behavior is typical for variables
that naturally grow over time. Typical examples are GDP, the price level,
consumption, prices of stocks, etc. When we estimate such series, we usually
apply logarithmic differencing, which yields the growth rate between the two desired
periods, or the rate of return. First logarithmic differencing
may yield, for example, a yearly growth rate of aggregate output or a daily
rate of return on a financial instrument or product. If such growth rates or
rates of return are stationary, then we can estimate them with an ARMA
model. If they are not stationary, then we can make them stationary
by further differencing and only after that estimate them by an ARMA
model. The most common transformations of time series that are applied
in order to remove the trend and to achieve stationarity were described in
Section 2.4. The trend component of a time series was already mentioned
in Section 2.5. More about trends will be given in Section 4.2.
2. Trend breaks and structural changes
To make things more complicated, the trend mentioned in the previous
point is usually not constant over time. For example, GDP can grow
roughly by 4% for 10 years and roughly by 2% for the next 10 years.
The same can happen with inflation or with stock prices. We say
that such a time series incurs a structural change or contains a structural
break. Moreover, a structural change can involve not only a change in the
trend coefficient (in the growth rate) but also in the intercept. Structural
changes in time series are usually caused by some real historical or economic
events. For example, the oil shocks in the 1970s were followed by a
significant slowdown of economic growth in most industrialized countries.
The issue of trend breaks and structural changes will be studied in
Sections 4.5 and 4.6.
3. The mean running up and down
Some time series, like for example exchange rates, do not show any persistent
tendency to increase or decrease. On the other hand, they do not
return to their initial value very often either. Rather, they alternate between
relatively long periods of increases and decreases. As was already
suggested in Example 2, such behavior is typical for series that contain
a unit root. The simplest process containing a unit root is the so-called
"random walk". It is defined by the equation y_t = y_{t−1} + ε_t, where ε_t is
white noise. Our suspicion that exchange rates and rates of return
behave as a random walk has a noble counterpart in economic theory,
namely in the information efficiency hypothesis. This hypothesis is described
in Example 5. More about unit roots and random walks will
be given in Sections 4.2 and 4.4.
4. High persistence of shocks
This observation is based on the fact that a shock that occurs at time t
typically has a long persistence in economic time series. Again, this is related
to the fact that the underlying data generating processes are either non-
stationary or close to unit root processes. In such a case the coefficients
a_i and β_i in the ARMA models are relatively high. Therefore any past
shock is transferred to future dates in relatively large magnitude.
This point also has a close relation to point two on trend breaks and
structural changes. If a shock has high persistence, it can appear as a
structural change in the data.
5. Volatility is not constant
Especially in the case of data generated on financial markets (e.g. stock
prices), we can observe periods of high and low volatility. Nevertheless,
this behavior can also be detected in GDP or price levels. Time series
with such properties are said to be conditionally heteroskedastic and
are usually estimated by the so-called ARCH (autoregressive conditional
heteroskedasticity) and GARCH (generalized ARCH) models, which will
be described in Section 4.7.
6. Non-stationarity
All the previous five points have one common consequence. Time series
of economic data are in most cases non-stationary. Therefore some transformations
are usually needed in order to make them stationary. These
transformations were already described in Section 2.4. The formal
tests of stationarity will be given in Sections 4.4 and 4.5.
7. Comovements in multiple time series
Some time series can share comovements with other time series. This occurs,
for example, when shocks to one series are correlated with shocks to
another series. In today's open world, where national economies are closely
linked by many channels (e.g. international trade, international investment,
foreign exchange markets, etc.), such behavior is not surprising. If
the comovement is a result of some long-term equilibrium towards which
the two series tend to return after each shock, then such series are said to
be cointegrated. More on cointegration and multiple time series comovements
will be given in Section 5.
3 Difference Equations
All the equations used to describe the time series data generating processes in the
previous sections are, in mathematical terminology, called difference equations.
Therefore the theory of difference equations constitutes the basic mathematical
background for time series econometrics. In this section we will briefly introduce
the major results of this theory. Mainly we will focus on those results that are
important for econometric time series analysis. From this point of view, the
stability conditions and the relation between the stability of a difference equation
and the stationarity of a time series are crucial (see Sections 3.4 and 3.5).
A much more detailed presentation of the topic can be found in [1].
A general nth-order linear difference equation can be written as

y_t = a_0 + Σ_{i=1}^n a_i y_{t−i} + x_t,  (5)

where x_t is the so-called forcing process, which can be any function of time t,
current and lagged values of variables other than y, and, if the linear difference
equation should deserve the attribute stochastic, it should also be a function of
stochastic variables (e.g. of stochastic disturbances).
A(L)y_t = a_0 + B(L)ε_t,  (6)

where A(L) and B(L) are the following polynomials of L:
A(L) = 1 − a_1L − a_2L² − ... − a_nL^n and B(L) = β_0 + β_1L + β_2L² + ... + β_qL^q.

3.3 Solution of Difference Equations

The general solution of the linear difference equation (5) is found in the following steps:

1. All solutions of the homogeneous part of the equation must be found. The
homogeneous equation is

y_t = Σ_{i=1}^n a_i y_{t−i},  or rewritten,  y_t − Σ_{i=1}^n a_i y_{t−i} = 0,  (7)

which means that the constant a_0 and the forcing process x_t from the
original equation are left out. Such a homogeneous equation has n linearly
independent solutions that we will denote as H_i(t). The homogeneous
solutions are functions of time t only. Any linear combination of the
homogeneous solutions H_i(t) is also a solution to equation (7).
2. One solution of the whole linear difference equation (5) must be found.
Such a solution is called the particular solution. Let us denote it as P({x_t}, t).
The particular solution can be a function of time t and the elements of
the forcing process {x_t}.
3. The general solution of the linear difference equation is any linear combination
of the n homogeneous solutions H_i(t) plus the particular solution
P({x_t}, t). Let us denote the general solution as G({x_t}, t). It can be
written in the following way:

G({x_t}, t) = Σ_{i=1}^n C_i H_i(t) + P({x_t}, t),  (8)
3.3.1 Particular Solution and Lag Operators

Using lag operators, a particular solution of equation (6) can be written as

y_t = a_0/A(L) + (B(L)/A(L)) ε_t,  (9)

keeping in mind that a_0/A(L) and B(L)/A(L) are not numbers but operators. To get the coefficients of the particular
solution we must solve the polynomials A(L) and B(L) with respect to L,
which means to find their characteristic roots. Such a procedure is shown for the
simplest case of an AR(1) process in the following Example 6. Moreover,
equation (9) is only a particular solution of the ARMA(p, q) equation, and it
is not unique. Lag operators cannot be used to express homogeneous solutions,
and so neither can they be used to express general solutions.
Example 6 Consider the AR(1) process y_t = a_0 + a_1y_{t−1} + ε_t, which in lag-operator
form reads (1 − a_1L)y_t = a_0 + ε_t. The polynomial A(L) = 1 − a_1L is of the first order
and so has just one characteristic root, which is moreover equal to 1/a_1. The
solution is y_t = a_0/(1 − a_1L) + ε_t/(1 − a_1L).
Now consider only the case of |a_1| < 1. Using the properties of the lag
operator, the solution can be written as y_t = (1 + a_1L + a_1²L² + ...)a_0 + (1 +
a_1L + a_1²L² + ...)ε_t. Finally, application of the lag operator yields y_t = (a_0 +
a_1a_0 + a_1²a_0 + ...) + (ε_t + a_1ε_{t−1} + a_1²ε_{t−2} + ...), which can be simplified as

y_t = a_0/(1 − a_1) + Σ_{i=0}^∞ a_1^i ε_{t−i}.  (10)
Imposing the initial condition y_0 on the general solution gives

y_t = (y_0 − a_0/(1 − a_1) − Σ_{i=0}^∞ a_1^i ε_{−i}) a_1^t + a_0/(1 − a_1) + Σ_{i=0}^∞ a_1^i ε_{t−i},

which can be further simplified as

y_t = (y_0 − a_0/(1 − a_1)) a_1^t + a_0/(1 − a_1) + Σ_{i=0}^{t−1} a_1^i ε_{t−i}.  (12)
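A quick numerical check of equation (12) can be made by comparing it with the solution obtained by direct iteration. The following sketch (not from the original notes; the parameter values are arbitrary) does exactly that:

    import numpy as np

    # check numerically that the closed-form solution (12) reproduces the
    # AR(1) recursion y_t = a0 + a1*y_{t-1} + eps_t
    a0, a1, y0, T = 1.0, 0.6, 2.0, 50          # arbitrary illustrative values
    rng = np.random.default_rng(1)
    eps = rng.normal(size=T + 1)               # eps[1], ..., eps[T] are used

    # solution by iteration
    y_iter = np.empty(T + 1)
    y_iter[0] = y0
    for t in range(1, T + 1):
        y_iter[t] = a0 + a1 * y_iter[t - 1] + eps[t]

    # closed-form solution (12)
    c = a0 / (1.0 - a1)
    y_closed = np.empty(T + 1)
    y_closed[0] = y0
    for t in range(1, T + 1):
        shocks = sum(a1 ** i * eps[t - i] for i in range(t))
        y_closed[t] = (y0 - c) * a1 ** t + c + shocks

    print(np.allclose(y_iter, y_closed))       # True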
3.3.3 Homogeneous Solution
Here we will describe the procedure that yields homogeneous solutions of a
general nth-order linear homogeneous difference equation (7):

y_t = Σ_{i=1}^n a_i y_{t−i},  or rewritten as  y_t − Σ_{i=1}^n a_i y_{t−i} = 0,
where A(L) is the polynomial of the lag operator from equation (13). Equation
(15) is called the inverse characteristic equation, and the L's that solve it
are the inverse values of the α's that solve the characteristic equation (14).
Now we search for the α's that solve the characteristic equation (14). It is a
polynomial equation that will have n characteristic roots {α_i}_{i=1}^n. In general
some of the roots can be multiple and some can be complex. We will distinguish
these cases in the following list and assign one homogeneous solution to
each of the roots, so that we get n linearly independent homogeneous solutions
{H_i(t)}_{i=1}^n.
2. The root α_j is real and multiple.
So we have k identical roots α_j, where k is the multiplicity. In this case the
corresponding k linearly independent homogeneous solutions are H_j(t) =
α_j^t, H_{j+1}(t) = tα_j^t, H_{j+2}(t) = t²α_j^t, ..., H_{j+k−1}(t) = t^{k−1}α_j^t.

3. The root α_j is complex.
Such a root will necessarily come in a conjugate complex pair, which can
be written as γ_j ± iθ_j. The two corresponding homogeneous solutions
will also be complex and will take the form H_j(t) = (γ_j + iθ_j)^t and
H_{j+1}(t) = (γ_j − iθ_j)^t.
3.3.4 Particular Solution

Consider first the simplest case, in which the forcing process is absent, so that the
equation is y_t = a_0 + Σ_{i=1}^n a_i y_{t−i}, and it seems reasonable to try to find the
particular solution in the form of a constant, y_t = c. Indeed, substituting this in the
equation leads to c = a_0 + Σ_{i=1}^n a_i c, which can be solved as c = a_0/(1 − Σ_{i=1}^n a_i).
So we easily obtained the particular solution y_t = a_0/(1 − Σ_{i=1}^n a_i). This is possible
only if 1 − Σ_{i=1}^n a_i ≠ 0. If 1 − Σ_{i=1}^n a_i = 0, then the particular solution takes the
form y_t = ct. It can be shown analogously that in this case c = a_0/(Σ_{i=1}^n i·a_i).
The method that probably most resembles a general cookbook style
is called the method of undetermined coefficients. Using this method we profit
from the fact that the particular solution of a linear difference equation must
also be linear. We therefore suppose the particular solution to be a linear combination
of a constant c, time t, and the elements of the forcing process {x_t}, because we
know that there is hardly anything else it could depend on. Then we substitute
this so-called challenge solution into the difference equation and solve for the
constants of the linear combination. Even though it sounds simple, the practical
application may get cumbersome.
3.4 Stability Conditions
Stability of homogeneous linear difference equations is closely linked to the con-
cept of stationarity of time series. In this sense, stability conditions represent a
result of the theory of difference equations that has the greatest importance for
econometric analysis of time series. If a linear homogeneous difference equation
is stable, then its solution converges to zero as t → ∞. If it is unstable, then its
solution diverges.
Stability of the homogeneous part of a linear difference equation that describes
the time series generating process is a necessary condition for the
time series' stationarity. In Section 3.3.3 we have already mentioned that a
general nth-order linear homogeneous difference equation (7) forms the homogeneous
part of a difference equation describing any general ARMA(n, q) process.
That is why the stability conditions can be applied as necessary conditions for
stationarity of a wide range of time series, namely of any time series that was
generated by a general ARMA(n, q) process.
In Section 3.3.2 we have seen that the homogeneous solution of the homogeneous
part of an AR(1) equation y_t = a_1y_{t−1} (note that it is also the
homogeneous part of any ARMA(1, q) equation) takes the form y_t = Ca_1^t,
because a_1 is the only characteristic root of the corresponding characteristic
equation. Obviously such a solution is stable and converges to zero if |a_1| < 1 and
is unstable and diverges if |a_1| > 1. If a_1 = 1, then the solution remains y_t = C
forever and we say, similarly as in the time series context, that the equation
contains a unit root.
For a first-order homogeneous linear difference equation y_t = a_1y_{t−1} we can
summarize that

1. If |a_1| < 1, then the equation and its solution are stable.
2. If |a_1| > 1, then the equation and its solution are unstable.
3. If |a_1| = 1, then the equation is unstable and contains a unit root.
For a general nth-order homogeneous linear difference equation with characteristic
roots α_i the summary is analogous:

1. If all characteristic roots α_i lie inside the unit circle, that is |α_i| < 1 for all i,
then the equation and its solution are stable.

2. If at least one characteristic root α_i lies outside the unit circle, that is
|α_i| > 1, then the equation and its solution are unstable.

3. If at least one characteristic root α_i lies on the unit circle, that is |α_i| = 1,
then the equation is unstable and contains a unit root.
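In practice the characteristic roots can be computed numerically. A small sketch (not from the original notes) checks the stability of a homogeneous AR(2) equation y_t = a_1y_{t−1} + a_2y_{t−2} by finding the roots of α² − a_1α − a_2 = 0:

    import numpy as np

    # stability of the homogeneous AR(2) equation y_t = a1*y_{t-1} + a2*y_{t-2}:
    # compute the roots of the characteristic equation alpha^2 - a1*alpha - a2 = 0
    def ar2_is_stable(a1, a2):
        roots = np.roots([1.0, -a1, -a2])     # the characteristic roots alpha_i
        return bool(np.all(np.abs(roots) < 1.0))

    print(ar2_is_stable(0.5, 0.3))   # True:  both roots inside the unit circle
    print(ar2_is_stable(0.5, 0.5))   # False: alpha = 1 is a root (unit root)
    print(ar2_is_stable(1.2, 0.1))   # False: one root outside the unit circle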
The particular solution associated with this forcing process can cause the
time series to be non-stationary even if the homogeneous solution is stable.
This is not the case for any AR process or any finite MA process, where
the stability of the corresponding homogeneous equation is not only a necessary
but also a sufficient condition for stationarity. However, it can be the case for an
infinite MA process. So in the case of an MA process some additional conditions
for stationarity are needed. We summarize the results as follows:
1. AR(n) process y_t = a_0 + Σ_{i=1}^n a_i y_{t−i} + ε_t
Stability of the corresponding nth-order linear homogeneous difference
equation is a necessary and sufficient condition for stationarity of the generated
time series.

2. MA(q) process y_t = Σ_{i=0}^q β_i ε_{t−i}
Here the corresponding homogeneous difference equation is y_t = 0, which
is obviously stable. However, the forcing process and the particular solution
associated with it can cause the generated time series to be non-stationary.
Using the above conditions we have related the econometric concept of stationarity
to the mathematical concept of stability of linear homogeneous difference
equations. However, one more condition for the stationarity of the generated
time series is needed. Stationarity of a time series requires that the mean,
variance, and covariances are constant. Even if the homogeneous part of the
difference equation generating the time series is stable, its homogeneous solution
is not constant; it only converges to zero. It means that it will be constant
(equal to zero) only after a sufficiently long time t. Therefore the necessary and
sufficient conditions listed above hold only for sufficiently high time periods,
that is, for time periods that are sufficiently distant from the initial period t = 0.
4 Univariate Time Series
4.1 Estimation of an ARMA Model
As mentioned earlier, an ARMA model is a standard building block of
time series econometrics. In this section we will describe how to estimate the
time series' true data generating process with an ARMA model. We will follow
the so-called Box-Jenkins methodology [4]. The aim of the methodology is to
find the most parsimonious model of the data generating process. By the most
parsimonious we mean a model that fits the data well and at the same
time uses a minimum number of parameters, thus leaving a high number of
degrees of freedom. The search for parsimony is common to all branches of
econometrics, not only to time series analysis. In fact, it is the general principle
of estimation. In estimation, our aim is not to achieve a perfect fit, but to achieve
a reasonable fit with few parameters. A perfect fit can always be achieved
trivially if we use as many parameters as data points. However, such an extremely
overparametrized model tells us nothing about the nature of the events and
decisions that generated the data.
The Box-Jenkins methodology can be divided into three main stages. The
first is to identify the data generating process, the second is to estimate the
parameters of this process, and the third is to diagnose the residuals from the
estimated model. If the process was identified and estimated correctly, then
the residuals should be diagnosed as a white noise.
Most of the tools and procedures of the Box-Jenkins methodology require
the time series to be stationary and the estimated process to be invertible. The
concept of stationarity was already explained in Section 2.3. The concept
of invertibility will be explained later in Subsection 4.1.2. In short, it means
that the process can be represented by a finite-order or convergent autoregressive
process.
In the following subsections we will first describe the tools needed for the
application of the Box-Jenkins methodology, and after that we will clarify the
sequence and logic in which they should be applied.
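As a preview of the whole procedure, the following sketch (not from the original notes) runs the three Box-Jenkins stages on a simulated AR(1) series using the Python statsmodels library; the library, the simulated data, and the chosen orders are illustrative assumptions rather than part of the notes:

    import numpy as np
    from statsmodels.tsa.stattools import acf, pacf
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.stats.diagnostic import acorr_ljungbox

    # a series simulated from an AR(1) process with a1 = 0.6 stands in for the data
    rng = np.random.default_rng(0)
    T = 500
    eps = rng.normal(size=T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = 0.6 * y[t - 1] + eps[t]

    # 1. identification: inspect the sample ACF and PACF
    print(acf(y, nlags=5))
    print(pacf(y, nlags=5))

    # 2. estimation: fit the candidate ARMA(1, 0) model
    fit = ARIMA(y, order=(1, 0, 0)).fit()
    print(fit.params)

    # 3. diagnostics: Ljung-Box Q-test on the residuals (they should look like white noise)
    print(acorr_ljungbox(fit.resid, lags=[10]))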
The autocorrelation function (ACF) at lag s is defined as

ρ_s = cov(y_t, y_{t−s}) / var(y_t).  (16)
Because we consider stationary time series, the ACF expressed by equation
(16) is only a function of the time difference s and not of the time t itself.
Remember that one of the stationarity conditions is the independence of var(y_t)
and cov(y_t, y_{t−s}) of the time t. Independence of the ACF of time then follows
from its construction. Stationarity of {y_t} also ensures that the ACF is equal
to corr(y_t, y_{t−s}). Because var(y_t) = var(y_{t−s}), we can write corr(y_t, y_{t−s}) =
cov(y_t, y_{t−s})/√(var(y_t)var(y_{t−s})) = cov(y_t, y_{t−s})/var(y_t) = ρ_s.
In Example 7 we computed, for illustration, the theoretical ACF of AR(1)
and MA(1) processes and we got the following results:

1. AR(1) process y_t = a_0 + a_1y_{t−1} + ε_t, where |a_1| < 1 (a sufficient and necessary
condition for stationarity of such a process). Then the ACF is
ρ_s = a_1^s.

2. MA(1) process y_t = ε_t + β_1ε_{t−1} (such a process is always stationary). Then the ACF is
ρ_0 = 1,
ρ_1 = β_1/(1 + β_1²),
ρ_s = 0 for any s > 1.
The ACF is different from zero for s ≤ 1 and is zero for s > 1. It can be
shown that for any MA(q) process the autocorrelation function is different
from zero for s ≤ q and is zero for s > q.
In practice the theoretical autocorrelations are unknown and are replaced by their
sample counterparts. The sample ACF is computed as

ρ̂_s = Σ_{t=s+1}^T (y_t − ȳ)(y_{t−s} − ȳ) / Σ_{t=1}^T (y_t − ȳ)²,  (17)

where

ȳ = (1/T) Σ_{t=1}^T y_t

is the sample mean.
Having observations {y_t}_{t=1}^T, equation (17) enables us to compute the sample
ACF and compare it with the theoretical autocorrelation functions computed
for different ARMA(p, q) processes according to equation (16). If the
sample ACF resembles an oscillatory or direct decay, we can assume that the
data were generated by some AR(p) process. If the sample ACF is different
from zero up to the lag s = q and goes almost to zero for lags s > q, then we
can assume that the true data generating process was an MA(q). A more general
algorithm that enables us to assign an appropriate ARMA(p, q) process to a
certain behavior of the sample ACF (and also the sample PACF) is offered in
Table 1 in Subsection 4.1.6.
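A sketch of this comparison (not from the original notes) computes the sample ACF of a simulated AR(1) series directly from equation (17) and sets it against the theoretical ACF ρ_s = a_1^s derived in Example 7; the simulated series and its parameters are invented for the illustration:

    import numpy as np

    # sample ACF computed from equation (17)
    def sample_acf(y, max_lag):
        y = np.asarray(y, dtype=float)
        ybar = y.mean()
        denom = np.sum((y - ybar) ** 2)
        return np.array([np.sum((y[s:] - ybar) * (y[:len(y) - s] - ybar)) / denom
                         for s in range(max_lag + 1)])

    # simulate an AR(1) series with a1 = 0.6 and compare with rho_s = a1**s
    rng = np.random.default_rng(42)
    a1, T = 0.6, 2000
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = a1 * y[t - 1] + rng.normal()

    print(sample_acf(y, 4))                   # roughly 1, 0.6, 0.36, 0.216, 0.13
    print([a1 ** s for s in range(5)])        # the theoretical ACF of Example 7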
Example 7 First consider an AR(1) process y_t = a_0 + a_1y_{t−1} + ε_t.
To compute the ACF we need to get var(y_t) and cov(y_t, y_{t−s}). Let us denote
var(ε_t) by σ²; then, because y_{t−1} and ε_t are uncorrelated, we can write

var(y_t) = var(a_0 + a_1y_{t−1} + ε_t) = a_1² var(y_{t−1}) + σ².

We assume the time series {y_t} to be stationary, thus var(y_t) = var(y_{t−1}).
Substitution of this identity yields the equation var(y_t) = a_1² var(y_t) + σ². Solving
this equation we get

var(y_t) = σ² / (1 − a_1²).

For the covariances we can write similarly

cov(y_t, y_{t−1}) = cov(a_0 + a_1y_{t−1} + ε_t, y_{t−1}) = a_1 var(y_{t−1}) = a_1 var(y_t),
cov(y_t, y_{t−2}) = cov(a_0 + a_1y_{t−1} + ε_t, y_{t−2}) = a_1 cov(y_{t−1}, y_{t−2})
= a_1 cov(y_t, y_{t−1}) = a_1² var(y_t),

and for the general case of cov(y_t, y_{t−s}) we get through iteration

cov(y_t, y_{t−s}) = cov(a_0 + a_1y_{t−1} + ε_t, y_{t−s}) = a_1 cov(y_{t−1}, y_{t−s})
= a_1² cov(y_{t−2}, y_{t−s}) = ... = a_1^{s−1} cov(y_{t−s+1}, y_{t−s})
= a_1^{s−1} cov(y_t, y_{t−1}) = a_1^s var(y_t).

Now we can compute the ACF for an AR(1) process. By substituting the expression
for cov(y_t, y_{t−s}) into equation (16) we get

ρ_s = cov(y_t, y_{t−s}) / var(y_t) = a_1^s var(y_t) / var(y_t) = a_1^s.
Second, consider an MA(1) process y_t = ε_t + β_1ε_{t−1}.

var(y_t) = var(ε_t + β_1ε_{t−1}) = (1 + β_1²)σ²,
cov(y_t, y_{t−1}) = cov(ε_t + β_1ε_{t−1}, ε_{t−1} + β_1ε_{t−2}) = β_1σ²,
cov(y_t, y_{t−2}) = cov(ε_t + β_1ε_{t−1}, ε_{t−2} + β_1ε_{t−3}) = 0,
cov(y_t, y_{t−3}) = cov(ε_t + β_1ε_{t−1}, ε_{t−3} + β_1ε_{t−4}) = 0,
etc.

So the ACF is
ρ_0 = 1,
ρ_1 = β_1/(1 + β_1²),
ρ_s = 0 for any s > 1.
To compute the PACF for the first lag, φ_11, we suppose that the time series was
generated by an AR(1) process described as

y_t = a_0 + φ_11y_{t−1} + ε_t,

where the autocorrelation coefficient at the first lag φ_11 equals the desired
PACF. Knowing the ACF for the first lag, ρ_1, it must hold that

ρ_1 = corr(y_t, y_{t−1}) = corr(a_0 + φ_11y_{t−1} + ε_t, y_{t−1}) = φ_11.

Then as a result

φ_11 = ρ_1,

which is not surprising, because for the first lag there are no intervening lags in
between, so that the PACF φ_11 equals the ACF ρ_1. To compute the PACF
for the second lag, φ_22, we must suppose that the time series was generated by
an AR(2) process described as

y_t = a_0 + φ_21y_{t−1} + φ_22y_{t−2} + ε_t,

where the autocorrelation coefficient at the second lag φ_22 equals the desired
PACF. Knowing the ACF for the first and second lags, ρ_1 and ρ_2, it must
hold that

ρ_1 = corr(y_t, y_{t−1}) = corr(a_0 + φ_21y_{t−1} + φ_22y_{t−2} + ε_t, y_{t−1}) = φ_21 + φ_22ρ_1,
ρ_2 = corr(y_t, y_{t−2}) = corr(a_0 + φ_21y_{t−1} + φ_22y_{t−2} + ε_t, y_{t−2}) = φ_21ρ_1 + φ_22.
These two equations can be solved for φ_22, so that we have

φ_22 = (ρ_2 − ρ_1²)/(1 − ρ_1²).

We can continue the procedure until we get the PACF for any general lag
s, denoted as φ_ss. In such a case we must suppose that the time series was
generated by an AR(s) process and that we know the ACF for all lags up to s.
In this manner we obtain the following expression for the PACF of any general
lag s:

φ_11 = ρ_1,  (18)
φ_22 = (ρ_2 − ρ_1²)/(1 − ρ_1²),
φ_ss = (ρ_s − Σ_{j=1}^{s−1} φ_{s−1,j} ρ_{s−j}) / (1 − Σ_{j=1}^{s−1} φ_{s−1,j} ρ_j)  for s > 2,

where φ_{s,j} = φ_{s−1,j} − φ_ss φ_{s−1,s−j} for j = 1, 2, ..., s − 1. For the AR(1) and
MA(1) processes considered earlier the PACF behaves as follows:
1. AR(1) process y_t = a_0 + a_1y_{t−1} + ε_t, where |a_1| < 1 (a sufficient and necessary
condition for stationarity of such a process).
The PACF is different from zero at lag s = 1 and equals zero for
all lags s > 1. Similarly, if the data were generated by an AR(p) process,
then there is no direct correlation between y_t and y_{t−s} for s > p. The
PACF is thus different from zero up to the lag s = p and equals zero
for any lag s > p.
2. MA(1) process y_t = ε_t + β_1ε_{t−1} (such a process is always stationary).
If β_1 ≠ −1, we can rewrite the MA(1) equation using the lag operator
as y_t/(1 + β_1L) = ε_t. We suppose the MA(1) process to be not only
stationary but also invertible; thus it must have a convergent infinite-order
AR representation y_t = β_1y_{t−1} − β_1²y_{t−2} + β_1³y_{t−3} − ... + ε_t. Note that
the necessary and sufficient condition for convergence of this infinite-order
AR equation, and therefore also for invertibility of the MA(1) equation, is
that |β_1| < 1. Because the MA(1) process has such a convergent infinite-order
AR representation, the PACF will never go directly to zero. Rather,
it will decay exponentially to zero, while the decay will be direct if β_1 < 0
and oscillatory if β_1 > 0. It can be shown that for any invertible MA(q)
process the PACF will decay to zero either in a direct or an oscillatory way.
We have just shown that the MA(1) process must be invertible if we want
to get a meaningful PACF. As was already mentioned in the beginning of
Section 4.1, invertibility of any ARMA(p, q) process means that it can be
represented by a finite-order or convergent autoregressive process. In general
we need any ARMA(p, q) process to be invertible for its PACF to make sense.
More exactly, an ARMA(p, q) process
In a more exact way an ARM A(p; q) process
n
X q
X
yt = a0 + ai yt−i + ¯ i "t−i
i=1 i=0
P
is invertible if all the roots Li of the polynomial qi=0 ¯ i Li of the lag operatorL
lie outside the unite circle (|Li | > 1). In such case the polynomial can be
Yq Y q
rewritten as (L − Li ) = − Li (1 − ri L), where ri stands for L1i , thus
i=0 i=0
|ri | < 1. The ARMA(p; q) process can be then rewritten as
P
−yt −a0 − n i=1 ai yt−i
q = q + q + "t .
Y Y Y
Li (1 − ri L); Li (1 − ri L); Li (1 − ri L);
i=0 i=0 i=0
Because all |ri | < 1, all elements in the equation above can be extended step by
step into convergent sums, which yields a convergent inÞnite AR representation
of the original ARM A(p; q) process.
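The root condition just stated is easy to check numerically. The following sketch (not from the original notes) locates the roots of an MA polynomial with numpy and tests whether they all lie outside the unit circle; the coefficient values are arbitrary examples:

    import numpy as np

    # invertibility of an MA(q) polynomial beta_0 + beta_1*L + ... + beta_q*L^q:
    # the process is invertible when every root L_i lies outside the unit circle
    def ma_is_invertible(betas):
        # np.roots expects coefficients ordered from the highest power down
        roots = np.roots(list(betas)[::-1])
        return bool(np.all(np.abs(roots) > 1.0))

    print(ma_is_invertible([1.0, 0.5]))   # True:  MA(1) with beta_1 = 0.5, root L = -2
    print(ma_is_invertible([1.0, 2.0]))   # False: MA(1) with beta_1 = 2.0, root L = -0.5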
The PACF given by equation (18) is a theoretical function computed
from the theoretical ACF (ρ_s). In practice, when we want to estimate a time series,
we know neither the theoretical ACF nor the theoretical PACF. Similarly
as in the case of the ACF, we use the sample partial autocorrelation function (sample
PACF) instead of the theoretical one. We get the sample PACF simply by
replacing the theoretical ACF (ρ_s) by its sample counterpart ρ̂_s in equation
(18). Another way to get the sample PACF is to apply the OLS method
to estimate the equations y_t = a_0 + φ_i1y_{t−1} + φ_i2y_{t−2} + ... + φ_iiy_{t−i} + ε_t. The
coefficient estimates will then equal the appropriate elements of the sample
PACF.
Having the observations {y_t}_{t=1}^T, we can compute the sample PACF and
compare it with theoretical partial autocorrelation functions computed for different
ARMA(p, q) processes. If the sample PACF resembles an oscillating
or direct decay, we can assume that the data were generated by some MA(q)
process. If the sample PACF is different from zero up to the lag s = p and goes
almost to zero for lags s > p, then we can assume that the true data generating
process was an AR(p). A more general algorithm that enables us to assign the
appropriate ARMA(p, q) process to a certain behavior of the sample PACF
(and also the sample ACF) is offered in Table 1 in Subsection 4.1.6.
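The OLS route to the sample PACF described above can be sketched as follows (not from the original notes; the simulated AR(1) series is only an illustration):

    import numpy as np

    # sample PACF through successive OLS regressions: for each lag i, regress y_t
    # on y_{t-1}, ..., y_{t-i} and keep the coefficient on y_{t-i} as phi_hat_{ii}
    def sample_pacf_ols(y, max_lag):
        y = np.asarray(y, dtype=float)
        pacf = []
        for i in range(1, max_lag + 1):
            Y = y[i:]                                        # dependent variable y_t
            X = np.column_stack([np.ones(len(Y))] +
                                [y[i - j:len(y) - j] for j in range(1, i + 1)])
            coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)
            pacf.append(coefs[-1])                           # coefficient on y_{t-i}
        return np.array(pacf)

    rng = np.random.default_rng(7)
    T, a1 = 1000, 0.6
    y = np.zeros(T)
    for t in range(1, T):                                    # an AR(1) with a1 = 0.6
        y[t] = a1 * y[t - 1] + rng.normal()

    print(sample_pacf_ols(y, 4))    # roughly 0.6 at lag 1 and near zero afterwards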
4.1.3 Q-Tests
In the previous section we described the autocorrelation and partial autocorrelation
functions as useful tools that help us to guess the numbers of lags p
and q of the true ARMA(p, q) data generating process. Application of these
functions is rather intuitive. The sample counterparts of these functions are compared
to the theoretical functions for different ARMA models. If the pattern of
the sample ACF and PACF resembles the theoretical ACF and PACF of an
ARMA(p, q), then p and q might represent the true numbers of lags. However, in
practice this procedure is rarely so straightforward and easy. The sample ACF
and PACF can often appear to be ambiguous. Then it becomes very difficult to
discover some clear pattern. Indeed, the sample ACF and PACF are random
variables and as such can deviate from the expected pattern by pure chance. As
a result, the guess of the correct number of lags based on the sample ACF and
PACF is by and large a matter of experience.
To increase the chance that our guess is correct, we can use another tool: the
Q-tests. The Q-tests, based on Q-statistics, offer a statistically more formal way
to assess the correct number of lags. They test whether a group of autocorrelations
ρ_s (elements of the ACF) is statistically different from zero. Theoretically,
the sample variance of the sample ACF and PACF can be computed, as well
as t-tests of these functions formulated for each lag s separately. However, such
tests have low power, because they are always based on only one value of the
sample ACF and PACF, for one particular s. The Q-statistics are based on a
group of sample autocorrelations ρ̂_s, and therefore their power is higher.
In practical applications two well known types of Q-tests are used: the Box-
Pierce Q-test [5] and the Ljung-Box Q-test [6]. The first test performs well only
in very large samples, while the second test uses a Q-statistic that is adjusted
in order to perform better in small samples. That is why the Ljung-Box Q-test
is usually the preferred one. For the sake of completeness both of them are
introduced, though.
1. Box-Pierce Q-test
The Box-Pierce Q-test is based on the Box-Pierce Q-statistic defined as

Q = T Σ_{i=1}^k ρ̂_i²,  (19)

where ρ̂_i are the elements of the sample ACF defined by equation
(17). Under the null hypothesis of all autocorrelations up to the lag k
being zero, the Q-statistic is asymptotically χ² distributed with k degrees
of freedom. (This holds only in the case that the time series was generated
by a stationary ARMA process.)
2. Ljung-Box Q-test
The Ljung-Box Q-test is based on the Ljung-Box Q-statistic defined as

Q = T(T + 2) Σ_{i=1}^k ρ̂_i² / (T − i),  (20)

where ρ̂_i are, as in the previous case, the elements of the sample ACF
defined by equation (17). Under the null hypothesis of all autocorrelations
up to the lag k being zero, the Q-statistic is χ² distributed with k
degrees of freedom. (This holds only in the case that the time series was
generated by a stationary ARMA process.)
When we search for the appropriate numbers of lags of an ARMA(p, q) model,
we should compute the Q-statistics for k starting at 1 and continue until a
reasonably high k, which should not be higher than T/4 (a reasonable
upper bound verified by practice). The choice of the upper k is a matter of
experience and also of the nature of the data whose generating process we
want to model. The testing procedure is standard: if the critical value of
χ²_k is exceeded by the appropriate Q-statistic, we can say that at least one
autocorrelation from the set {ρ_i}_{i=1}^k is significantly different from zero.
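A sketch of the computation (not from the original notes) evaluates the Ljung-Box Q-statistic of equation (20) for a simulated white-noise series and compares it with the χ² critical value; scipy is assumed only for the critical value:

    import numpy as np
    from scipy.stats import chi2

    # Ljung-Box Q-statistic of equation (20) computed from the sample ACF
    def ljung_box_q(y, k):
        y = np.asarray(y, dtype=float)
        T = len(y)
        ybar = y.mean()
        denom = np.sum((y - ybar) ** 2)
        rho_hat = np.array([np.sum((y[i:] - ybar) * (y[:T - i] - ybar)) / denom
                            for i in range(1, k + 1)])
        return T * (T + 2) * np.sum(rho_hat ** 2 / (T - np.arange(1, k + 1)))

    rng = np.random.default_rng(3)
    white_noise = rng.normal(size=500)
    Q = ljung_box_q(white_noise, k=10)
    print(Q, chi2.ppf(0.95, df=10))   # Q should stay below the 5% critical value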
Unlike the analysis of the ACF and PACF, there is no straightforward algorithm
that assigns the most appropriate ARMA(p, q) model to various patterns
of Q-test results. Both methodologies should be applied together and their results
should not contradict each other. In practice, Q-tests are usually applied
to diagnose the residuals from the estimated model rather than to discover the
correct numbers of lags p and q. The diagnostics of residuals is described in the
next section.
4.1.5 Information Criteria
It can happen that several different ARMA(p, q) models seem to be appropriate
for our data. This occurs if the pattern of the sample ACF and PACF can
be interpreted in several different ways and if the residuals from the several
different models are all diagnosed to be a white noise. In such a case we can use
information criteria to select the model that is the best. By the best we mean
the most parsimonious model that satisfactorily captures the dynamics of the
data.
The most common measure of goodness of fit is the R². The problem with
the R² is that it enables us to compare only models of the same form and, moreover,
with the same number of explanatory variables. The R² never decreases
if we add one more variable to the model, and in most cases it increases.
So, for example, we can use the R² only to compare linear models that use the
same number of explanatory variables, whereas these explanatory variables can
differ across the models. This is not exactly what we need in univariate time series
econometrics. Here the explanatory variables are given, and so they cannot differ to
such an extent as in cross-sectional econometrics. They are the lags of y in AR
models and the lags of ε in MA models. In time series econometrics we usually
want to compare different ARMA(p, q) models, where the difference resides in
the numbers of lags p and q, thus in the number of explanatory variables. For
this comparison we must use the information criteria instead of the R².
Most frequently, two information criteria are used: the Akaike information
criterion (AIC) [7] and the Schwartz Bayesian criterion (SBC).
As the number of observations goes to infinity, the SBC will enable us to select the
true model. In plain words, the SBC imposes a heavier penalty on overparametrized
models. The Hannan-Quinn information criterion is another commonly used
tool.
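For completeness, a sketch of how the two criteria can be computed and compared is given below. It is not from the original notes, and since the notes' own definitions are not reproduced above, the expressions used here (AIC = T·ln(SSR) + 2n and SBC = T·ln(SSR) + n·ln T, with SSR the sum of squared residuals and n the number of estimated parameters) are one common convention and should be treated as an assumption:

    import numpy as np

    # AIC = T*ln(SSR) + 2n and SBC = T*ln(SSR) + n*ln(T) -- a common convention,
    # assumed here because the notes' own definitions are not reproduced above
    def aic(ssr, T, n):
        return T * np.log(ssr) + 2 * n

    def sbc(ssr, T, n):
        return T * np.log(ssr) + n * np.log(T)

    # comparing two hypothetical fits of the same series of length T = 200
    T = 200
    print(aic(150.0, T, 2), sbc(150.0, T, 2))   # say, an ARMA(1, 0) fit
    print(aic(148.0, T, 4), sbc(148.0, T, 4))   # say, an ARMA(2, 1) fit
    # the model with the lower value of the criterion is preferred; the SBC
    # penalizes the two extra parameters more heavily than the AIC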
Table 1: The ACF and PACF of different ARMA models.