
Econometric Analysis of Time Series

Evžen Kočenda & Alexandr Černý
CERGE-EI, Prague
September 13, 2005

Contents

1 Introduction

2 Nature of Time Series
2.1 Description of Time Series
2.2 White Noise
2.3 Stationarity
2.4 Transformations of Time Series
2.5 Trend, Seasonal, and Irregular Pattern
2.6 ARMA Models of Time Series
2.7 Stylized Facts about Time Series

3 Difference Equations
3.1 Linear Difference Equations
3.2 Lag Operator
3.3 Solution of Difference Equations
3.3.1 Particular Solution and Lag Operators
3.3.2 Solution by Iteration
3.3.3 Homogeneous Solution
3.3.4 Particular Solution
3.4 Stability Conditions
3.5 Stability and Stationarity

4 Univariate Time Series
4.1 Estimation of an ARMA Model
4.1.1 Autocorrelation Function — ACF
4.1.2 Partial Autocorrelation Function — PACF
4.1.3 Q-Tests
4.1.4 Residuals' Diagnostics
4.1.5 Information Criteria
4.1.6 Box-Jenkins Methodology
4.2 Trend in Time Series
4.2.1 Deterministic Trend
4.2.2 Stochastic Trend
4.2.3 Stochastic plus Deterministic Trend
4.2.4 Final Notes on Trends in Time Series
4.3 Seasonality in Time Series
4.3.1 Removing Seasonal Pattern
4.3.2 Estimating Seasonal Pattern
4.3.3 Detecting Seasonal Pattern
4.4 Unit Roots
4.4.1 Dickey-Fuller Test
4.4.2 Augmented Dickey-Fuller Test
4.4.3 Shortcomings of the Dickey-Fuller Test
4.4.4 KPSS Test
4.5 Structural Change and Unit Roots
4.5.1 Perron's Test
4.5.2 Zivot and Andrews' Test
4.6 Detecting a Structural Change
4.6.1 Vogelsang's Test
4.7 Conditional Heteroskedasticity
4.7.1 Conditional and Unconditional Expectations
4.7.2 ARCH Models
4.7.3 GARCH Models
4.7.4 Detecting Conditional Heteroskedasticity
4.7.5 How to Identify and Estimate a GARCH Model
4.7.6 Extensions of ARCH Models

5 Multiple Time Series
5.1 Granger Causality
5.2 Cointegration
5.3 Error Correction Model
5.4 Unit Root Tests in Panel Data
5.5 VAR Models

A Monte Carlo Simulations

B Statistical Tables

1 Introduction
The following chapters aim to present the basic tools of econometric analysis of
time series. The text is based on the material presented in a semester course
in time series econometrics given at CERGE-EI. The emphasis of the course is
on practical applications of theoretical tools; therefore we usually abstract from
the rigorous style of theorems and proofs and instead try to present the material
in a way that is easy to understand. In many cases we rely only on an intuitive
explanation and understanding of the studied phenomena. Readers interested
in a more formal approach are referred to the appropriate references. Useful
references for time series econometrics are [1] and [2]. The classical reference
for general econometric issues is [3]. Many chapters of this text are based on
and refer to influential papers, where a more detailed presentation of the topic
is available.
The text is divided into four major sections: Nature of time series, Difference
equations, Univariate time series, and Multiple time series. The first section
gives an introduction to time series analysis. The second section briefly describes
the theory of difference equations with emphasis on those results that are
important for time series econometrics. The third section presents the methods
commonly used in univariate time series analysis, that is, in the analysis of a
time series of a single variable. The fourth section deals with time series models
of several interrelated variables.

2 Nature of Time Series
In general there are two types of data sets studied in econometrics: cross-
sectional data sets and time series. Cross-sectional data are collected at one
point in time across several entities such as countries, industries, or companies.
A time series is any set of data ordered by time. Because our lives pass in time,
it is natural for almost any variable to form a time series. Any variable that is
recorded periodically forms a time series. For example, yearly gross domestic
product (GDP) recorded over several years is a time series. Similarly, the price
level, unemployment, the exchange rate of a currency, or the profits of a firm
can form a time series if recorded periodically over a certain time span. The
combination of cross-sectional data and time series creates what economists call
a panel data set. Panel data sets can be studied by tools typical of panel data
econometrics or by tools characteristic of multiple time series analysis.
The fact that time series data are ordered by time implies special properties
and some special ways of analyzing them. It enables the estimation of models
containing only one variable, the so-called univariate time series estimation. In
such a case the value of a variable is explained by its own past values and
possibly by time as well. Because of the time ordering of the data, issues of
autocorrelation gain considerable importance in time series econometrics.

2.1 Description of Time Series


A set of data ordered by time, $\{y_t\}_{t=1}^{T}$, forms a time series. We use the term
time series for three distinct but closely related objects: a series of random
variables, a series of data that are concrete realizations of these variables, and
also the stochastic process that generates these data or random variables.

Example 1 The stochastic process that generates a time series can be, for
example, described as $y_t = 0.5y_{t-1} + \varepsilon_t$ (an AR(1) process), where the $\varepsilon_t$ are
normal iid with mean 0 and variance $\sigma^2$. With the initial condition $y_0 = 0$, the
sequence of random variables generated by this process is $\varepsilon_1$, $0.5\varepsilon_1 + \varepsilon_2$,
$0.25\varepsilon_1 + 0.5\varepsilon_2 + \varepsilon_3$, etc. Finally, the concrete realizations of these random
variables can be numbers such as 0.13882, 0.034936, −1.69767, etc. When we
say that we estimate a time series, it means that based on the data (the concrete
realizations) we estimate the underlying process that generated the time series.
The specification of the process is also called the model.
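As an illustration of the distinction between the process, the random variables, and their concrete realizations, the following short Python sketch (an addition to the text, not part of the original notes; names and the seed are arbitrary choices) simulates one realization of the AR(1) process from Example 1.

```python
import numpy as np

# AR(1) process y_t = 0.5 * y_{t-1} + e_t with y_0 = 0,
# where e_t are iid normal shocks with mean 0 and variance sigma^2.
rng = np.random.default_rng(seed=1)   # fixed seed so the realization is reproducible
T = 10
sigma = 1.0
eps = rng.normal(0.0, sigma, size=T)  # one concrete draw of the shocks e_1, ..., e_T

y = np.zeros(T + 1)                   # y[0] is the initial condition y_0 = 0
for t in range(1, T + 1):
    y[t] = 0.5 * y[t - 1] + eps[t - 1]

print(y[1:])                          # one concrete realization of the time series
```

Running the script again with a different seed would give a different realization of the same underlying process.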

For a basic description of time series the following properties are defined: fre-
quency, time span, mean, variance, and covariance.

1. The frequency is related to the time difference between $y_t$ and $y_{t+1}$. The
data can be collected with yearly, quarterly, daily, or even greater fre-
quency. For example, stock prices are recorded after any change (tick by
tick).

2. The time span is the period of time over which the data were collected. If
there are no gaps in the data, the time span is equal to the number of
observations T times the length of the sampling interval. Throughout the
text, T is reserved to indicate the sample size unless stated otherwise.

3. The mean $\mu_t$ is defined as $\mu_t = E(y_t)$. The mean is thus defined for each
element of the time series, so that there are T such means.

4. The variance is $\mathrm{var}(y_t) = E\left[(y_t - \mu_t)^2\right]$. Similarly as with the mean, the
variance is defined for each element of the time series.

5. The covariance is $\mathrm{cov}(y_t, y_{t-s}) = E\left[(y_t - \mu_t)(y_{t-s} - \mu_{t-s})\right]$. The covari-
ance is defined for each time t and for each time difference s, so that in the
general case there are $T^2 - T$ covariances; however, because of symmetry
only half of them can be different.

2.2 White Noise


White noise is a term frequently used in time series econometrics. As the name
suggests, white noise is a time series that does not contain any further infor-
mation that would help in estimation, except, of course, its variance. Residuals
from a correctly specified, "true" model that fully captures the generating process
are white noise. In what follows, an error process that is white noise will usually
be denoted $\varepsilon_t$. For example, a series of independent identically distributed
random variables with zero mean is itself white noise, because apart from the
variance nothing else can be estimated from such a time series.
If we estimate a time series using the correct ARMA model as described in
sections 2.6 and 4.1, then the remaining unestimated part of the time series
(the errors) must be white noise. Procedures used to test whether a time series
is white noise are described in section 4.1.4.

2.3 Stationarity
Stationarity is a crucial property of time series. Intuitively, a time series must
be stationary for us to be able to make predictions about its future behavior.
Non-stationary time series are unpredictable in this sense, because they tend
to "explode". If a time series is stationary, then any shock that occurs at time t
has a diminishing effect over time and finally disappears at time t + s as
s → ∞. This feature is called mean reversion. With a non-stationary time
series this is not the case and the effect of the shock "explodes" over time. A
special case of a non-stationary process is the so-called unit root process. With
a unit root process, a shock that occurred at time t does not "explode", but
remains present in the same magnitude at all future dates. For more details on
stationarity, non-stationarity, and unit root processes see sections 3.4, 3.5,
and 4.4.
The most useful stationarity concept in econometrics is the concept of co-
variance stationarity. Throughout the text, for simplicity, we will usually use
only the term stationarity and mean covariance stationarity. We say that a
time series $\{y_t\}_{t=1}^{T}$ is covariance stationary if and only if the following formal
conditions are satisfied:

i) $\mu_t = \mu_{t-s} = \mu$ for all t, s.

ii) $\mathrm{var}(y_t) = \mathrm{var}(y_{t-s}) = \sigma^2$ for all t, s.

iii) $\mathrm{cov}(y_t, y_{t-s}) = \mathrm{cov}(y_{t-j}, y_{t-j-s}) = \gamma_s$ for all t, j, and s.

It means that a time series is covariance stationary if its mean and variance
are constant and finite over time and if the covariance depends only on the time
distance s between the two elements of the time series but not on the time t
itself.
Note that the white noise introduced in the previous section is obviously
stationary. However, a stationary time series is not automatically white noise,
because white noise requires the additional conditions that the mean and all
covariances are 0, which means that $\mu = 0$ and $\gamma_s = 0$ for all $s > 0$.
Most economic time series are not stationary and some transformations are
needed in order to achieve stationarity. The most popular transformations are
described in the next section.

Figure 1: Stationary, non-stationary, and unit root time series.

Example 2 Figure 1 shows examples of stationary, non-stationary, and unit
root time series. All three series were generated by AR(1) processes defined as
$y_t = a_1 y_{t-1} + \varepsilon_t$, where the $\varepsilon_t$ are normal iid with mean 0 and variance
$\sigma^2 = 9$. For such AR(1) processes, the necessary and sufficient condition for
stationarity is $|a_1| < 1$. If $|a_1| > 1$, then the time series is non-stationary, and if
$|a_1| = 1$, then the time series contains a unit root. The formal necessary and
sufficient conditions for time series stationarity will be described in sections
3.4 and 3.5. The three time series were generated by the following processes:

stationary: $y_t = 0.6y_{t-1} + \varepsilon_t$, with $a_1 = 0.6 < 1$
non-stationary: $y_t = 1.1y_{t-1} + \varepsilon_t$, with $a_1 = 1.1 > 1$
unit root: $y_t = y_{t-1} + \varepsilon_t$, with $a_1 = 1$

From the figure we can distinguish clear visual differences between the time
series. The stationary time series tends to return often to its initial value. The
non-stationary time series explodes after a while. Finally, a time series containing
a unit root can resemble a stationary time series, but it does not return to its
initial value as often. These differences can be more or less pronounced in a
visual plot. Nevertheless, a visual plot of the time series cannot replace the formal
tests of stationarity described in sections 4.4 and 4.5.
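A sketch along the following lines (illustrative code, not from the original text; the sample size and seed are arbitrary) reproduces the three generating processes of Example 2 so that the reader can plot them and compare their behavior.

```python
import numpy as np

def simulate_ar1(a1, T=200, sigma2=9.0, seed=0):
    """Simulate y_t = a1 * y_{t-1} + e_t with y_0 = 0 and var(e_t) = sigma2."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, np.sqrt(sigma2), size=T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = a1 * y[t - 1] + eps[t]
    return y

stationary    = simulate_ar1(0.6)  # |a1| < 1: shocks die out, series reverts to its mean
nonstationary = simulate_ar1(1.1)  # |a1| > 1: shocks are amplified, series explodes
unit_root     = simulate_ar1(1.0)  # a1 = 1: random walk, shocks persist forever
```

Plotting the three arrays (for example with matplotlib) reproduces the qualitative differences described above.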

2.4 Transformations of Time Series


In most cases some transformations of time series of economic data are neces-
sary before we can proceed with estimation. Usually we apply transformations
in order to achieve stationarity. However, sometimes it is natural to apply a
transformation because the transformed variable corresponds to what we are
actually interested in. A typical example is a macroeconomic variable in levels
versus in growth rates (e.g. prices versus inflation).
Depending on the type of transformation we must apply in order to make
a time series stationary, we say the time series is difference stationary,
trend stationary, or broken trend stationary. A difference stationary time series
becomes stationary after differencing, a trend stationary series after detrending,
and a broken trend stationary series becomes stationary after detrending with a
structural change incorporated (more on the last topic will be introduced in
section 4.5).
If a series must be differenced n times to become stationary, then the series
is said to be integrated of order n, which we denote as I(n). Thus a series that
is stationary without any differencing can also be denoted as I(0).
Prior to differencing and detrending, the most common transformation is to
take the natural logarithm of the data to remove non-linearity. For example, if
we are interested in growth rates, it is natural to apply logarithmic differencing,
which means that we first take natural logarithms of the data and then difference
them.
1. Taking a natural logarithm is applied when the data exhibit exponential
growth, which is a common case in economics. For example, if the GDP of a
country grows by roughly 3% each year, then the time series of yearly GDP
grows exponentially. In such a case, taking the natural logarithm yields
data that grow linearly.
2. Differencing is a general procedure that is usually applied in order to
achieve stationarity. To difference a time series, we apply the transforma-
tion $\Delta y_t = y_t - y_{t-1}$, where $\Delta y_t$ is the so-called first difference. To obtain
second differences $\Delta^2 y_t$ we apply the identical transformation to the first
differences: $\Delta^2 y_t = \Delta y_t - \Delta y_{t-1}$. In this way we can create differences of
even higher orders. Although any time series becomes stationary after a
sufficient order of differencing, differencing of higher than second order is
almost never used in econometrics. The reason is that with each differencing
we lose one observation and, more importantly, we lose part of the information
contained in the data. Moreover, higher order differences have no clear
interpretation. Second differences are already linear growth rates of the linear
growth rates obtained by first differencing. Such variables are obviously not
very interesting for applied economic research.
3. Detrending is a procedure that removes a linear or even higher order trend
from the data. To detrend a time series, we run a regression of the series
on time t (or its higher powers as well) and then subtract the estimated
values from the original time series. The degree of the time polynomial
included in the regression can be formally tested by an F-test prior to
detrending. Trending time series are never stationary, because their mean
is not constant. Therefore detrending also helps to make such time series
stationary. However, differencing is generally more successful in achieving
this goal and therefore it is also more commonly used. More details about
trends in time series will be given in the next section and in section 4.2.

Example 3 Economic data usually grow exponentially. It means that for
a variable X we have a growth equation $X_t = (1 + g_t)X_{t-1}$ in the discrete case,
or $X_t = X_{t-1}e^{g_t}$ in the continuous case, where $g_t$ is the growth rate (or the
rate of return, depending on the nature of X) between two successive periods.
By logarithmic differencing we get $\ln X_t - \ln X_{t-1} = \ln(1 + g_t) \approx g_t$ in the
discrete case or $\ln X_t - \ln X_{t-1} = g_t$ in the continuous case.
Specifically, let us consider a time series of price levels $\{P_t\}_{t=1}^{T}$. By loga-
rithmic differencing we obtain the series of inflation rates $i_t = \ln P_t - \ln P_{t-1}$.
The differences mentioned above were always differences between two
successive periods. If the data exhibit a seasonal pattern, it is often more fruitful
to apply differences between the seasonal periods. For example, with monthly
data we can apply 12th seasonal logarithmic differencing, $\ln X_t - \ln X_{t-12}$. Such
a procedure removes the seasonal pattern from the data and also decreases the
variance of the series (if the seasonal pattern has a period of 12 months). More
about seasonal patterns will be mentioned in the next section and in section
4.3.
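The transformations above are straightforward to apply directly with numpy. The following sketch (an illustrative addition with made-up data, not part of the original notes) shows first differences, logarithmic differences, and 12th seasonal logarithmic differences on an artificial exponentially growing monthly series.

```python
import numpy as np

# Made-up exponentially growing monthly series with a small seasonal component.
t = np.arange(120)
rng = np.random.default_rng(42)
x = 100.0 * np.exp(0.003 * t
                   + 0.02 * np.sin(2 * np.pi * t / 12)
                   + 0.01 * rng.normal(size=t.size))

first_diff     = np.diff(x)                        # x_t - x_{t-1}
log_diff       = np.diff(np.log(x))                # ln x_t - ln x_{t-1}, approx. the monthly growth rate
seasonal_ldiff = np.log(x[12:]) - np.log(x[:-12])  # ln x_t - ln x_{t-12}: removes the 12-month pattern
```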

2.5 Trend, Seasonal, and Irregular Pattern


A general time series can consist of three basic components: the deterministic
trend, the seasonal pattern, and the irregular pattern. Our task in estimation
and forecasting is to decompose the series into these three components. The
series can then be written as:

$$y_t = T_t + S_t + I_t. \tag{1}$$

1. The deterministic trend $T_t$ can be generally described with a trend poly-
nomial $T_t = \sum_{i=0}^{n} a_i t^i$. Usually we will deal only with linear or quadratic
trends (n = 1 or 2). If the series grows exponentially, then we must take
a natural logarithm in order to transform the exponential growth into linear
growth. How to estimate and remove the trend from a time series is de-
scribed in section 4.2. Aside from deterministic trends, section 4.2 also
deals with the so-called stochastic trends. However, stochastic trends can
rather be viewed as a part of the irregular pattern.
2. The seasonal pattern $S_t$ can be described as $S_t = c\sin(2\pi t/d)$, where d is
the period of the seasonal pattern. For example, if we look at the monthly
number of visitors to some sea resort, then the period of the seasonal
pattern of such a series would very likely be 12 months. Another way to
describe the seasonal pattern is to incorporate it into the irregular pattern.
The issue of seasonality will be treated further in section 4.3.
3. The irregular pattern $I_t$ can be expressed by a general ARMA model,
as described in the following section 2.6. In fact, most of the following
sections, as well as most of univariate time series econometrics, deal par-
ticularly with the estimation of irregular patterns.

Example 4 Figure 2 shows the decomposition of a time series into trend, sea-
sonal, and irregular patterns. The time series in the figure consists of
the trend $T_t = 2 + 0.3t$,
the seasonal pattern $S_t = 4\sin(2\pi t/6)$, and
the irregular pattern $I_t = 0.7I_{t-1} + \varepsilon_t$, where the $\varepsilon_t$ are normal i.i.d.
with mean 0 and variance $\sigma^2 = 9$.
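The series of Example 4 can be reproduced with a few lines of Python (an illustrative sketch; the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
T = 100
t = np.arange(1, T + 1)

trend    = 2 + 0.3 * t                    # deterministic trend T_t = 2 + 0.3 t
seasonal = 4 * np.sin(2 * np.pi * t / 6)  # seasonal pattern S_t with period 6

irregular = np.zeros(T)                   # AR(1) irregular pattern I_t = 0.7 I_{t-1} + e_t
eps = rng.normal(0.0, 3.0, size=T)        # var(e_t) = 9, so the standard deviation is 3
for i in range(1, T):
    irregular[i] = 0.7 * irregular[i - 1] + eps[i]

y = trend + seasonal + irregular          # y_t = T_t + S_t + I_t, cf. equation (1)
```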

2.6 ARMA Models of Time Series


ARMA models are the most common processes used to estimate stationary ir-
regular (or possibly also seasonal) patterns of time series. The abbreviation
ARMA stands for autoregressive moving average, which is, not surprisingly, a
combination of autoregressive and moving average models. The individual mod-
els and their combination are described in the following list.

1. An autoregressive process of order p, AR(p), is described by the equation
$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \varepsilon_t. \tag{2}$$
Figure 2: Decomposition of a time series into trend, seasonal, and irregular
pattern.

2. A moving average process of order q, MA(q), is described by the equation
$$y_t = \sum_{i=0}^{q} \beta_i \varepsilon_{t-i}. \tag{3}$$

3. An autoregressive moving average process of orders p and q, ARMA(p, q),
is described by the equation
$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \sum_{i=0}^{q} \beta_i \varepsilon_{t-i}. \tag{4}$$

The coefficient $\beta_0$ in the equations (3) and (4) is typically normalized to 1.

Note that there is no trend in the equations above. This is because we
estimate only the irregular (or possibly seasonal) patterns. Therefore, we use
time series that have already been detrended, differenced, or logarithmically
differenced.
The estimation of a time series with an ARMA model makes sense only if
the series is stationary. However, we can specify an ARMA process that gen-
erates a non-stationary time series, as we did in example 2. In that example
the non-stationary and unit root containing time series were generated by AR(1)
processes with $a_1 = 1.1$ and $a_1 = 1.0$, respectively. In fact, there are necessary
and sufficient conditions on the coefficients $a_i$ and $\beta_i$ ensuring that the gener-
ated series is stationary. These conditions will be described in section 3,
which deals with difference equations. These conditions are also crucial for the
construction of stationarity tests, which are described in sections 4.4 and
4.5.

10
If a series had to be differenced n times in order to make it stationary, and
only then estimated by an ARMA(p, q) model, then we sometimes say that it
was estimated by an ARIMA(p, n, q) model. The I inserted in the abbreviation
ARMA and the n in the parentheses stand for integrated of order n, as was
pointed out earlier.
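To make equation (4) concrete, the following sketch (illustrative only; the parameter values are arbitrary choices, not taken from the text) simulates an ARMA(1,1) process with $\beta_0$ normalized to 1.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500
a0, a1, b1 = 0.5, 0.7, 0.4           # y_t = a0 + a1*y_{t-1} + e_t + b1*e_{t-1}

eps = rng.normal(0.0, 1.0, size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = a0 + a1 * y[t - 1] + eps[t] + b1 * eps[t - 1]

# Since |a1| < 1 the generated series is stationary and could be estimated by an
# ARMA(1,1) model; if a level series had to be differenced first, the model for
# the original series would be an ARIMA model in the sense described above.
```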

2.7 Stylized Facts about Time Series


We have already mentioned that time series of economic variables usually have
properties that are typical of economic data. In the following list we enumer-
ate these properties systematically and make references to the sections where
the related topics will be studied in more detail.

1. The trend
Economic time series very often contain a clear trend. In fact, their growth
is usually not only linear but exponential, as was mentioned in example 3.
It means that even after we take the natural logarithm of the data, a clear
linear trend still persists. Such behavior is typical for variables that
naturally grow over time. Typical examples are GDP, the price level,
consumption, prices of stocks, etc. When we estimate such series, we usu-
ally apply logarithmic differencing, which yields the growth rate, or the rate
of return, between the two desired periods. First logarithmic differencing
may yield, for example, a yearly growth rate of aggregate output or a daily
rate of return on a financial instrument or product. If such growth rates or
rates of return are stationary, then we can estimate them with an ARMA
model. If they are not stationary, then we can make them stationary
by further differencing and only after that estimate them with an ARMA
model. The most common transformations of time series that are applied
in order to remove the trend and to achieve stationarity were described in
section 2.4. The trend component of time series was already mentioned
in section 2.5. More about trends will be given in section 4.2.
2. Trend breaks and structural changes
To make things more complicated, the trend mentioned in the previous
point is usually not constant over time. For example, GDP can grow
roughly by 4% for 10 years and roughly by 2% for the next 10 years.
The same can happen with inflation or with stock prices. We say that
such a time series undergoes a structural change or contains a structural
break. Moreover, a structural change can involve not only a change in the
trend coefficient (in the growth rate) but also in the intercept. Structural
changes in time series are usually caused by some real historical or eco-
nomic events. For example, the oil shocks in the 1970s were followed by a
significant slowdown of economic growth in most industrialized countries.
The issue of trend breaks and structural changes will be studied in
sections 4.5 and 4.6.

3. The mean running up and down
Some time series, like for example exchange rates, do not show any per-
sistent tendency to increase or decrease. On the other hand, they do not
return to their initial value very often either. Rather, they alternate be-
tween relatively long periods of increases and decreases. As was already
suggested in example 2, such behavior is typical for series that contain
a unit root. The simplest process containing a unit root is the so-called
"random walk". It is defined by the equation $y_t = y_{t-1} + \varepsilon_t$, where $\varepsilon_t$ is
white noise. Our suspicion that exchange rates and rates of return behave
as random walks has a counterpart in economic theory, namely the
information efficiency hypothesis. This hypothesis is described in
example 5. More about unit roots and random walks will be given in
sections 4.2 and 4.4.
4. High persistence of shocks
This observation is based on the fact that any shock that occurs at time t
typically has a long persistence in economic time series. Again, it is related
to the fact that the underlying data generating processes are either non-
stationary or close to unit root processes. In such cases the coefficients
$a_i$ and $\beta_i$ in the ARMA models are relatively high. Therefore any past
shock is transmitted to future dates with relatively large magnitude.
This point is also closely related to point two on trend breaks and
structural changes. If a shock has high persistence, it can appear as a
structural change in the data.
5. Volatility is not constant
Especially in the case of data generated on financial markets (e.g. stock
prices), we can observe periods of high and low volatility. Nevertheless,
this behavior can also be detected in GDP or price levels. Time series
with such properties are said to be conditionally heteroskedastic and
are usually estimated by the so-called ARCH (autoregressive conditional
heteroskedasticity) and GARCH (generalized ARCH) models, which will
be described in section 4.7.
6. Non-stationarity
All the previous five points have one common consequence. Time series
of economic data are in most cases non-stationary. Therefore some trans-
formations are usually needed in order to make them stationary. These
transformations were already described in section 2.4. The formal tests
of stationarity will be given in sections 4.4 and 4.5.
7. Comovements in multiple time series
Some time series can share comovements with other time series. This oc-
curs, for example, when shocks to one series are correlated with shocks to
other series. In today's open world, where national economies are closely
linked by many channels (e.g. international trade, international invest-
ment, foreign exchange markets, etc.), such behavior is not surprising. If
the comovement is a result of some long term equilibrium towards which
the two series tend to return after each shock, then such series are said to
be cointegrated. More on cointegration and multiple time series comove-
ments will be given in section 5.

Example 5 Hypothesis of the information efficiency of foreign exchange markets
The information efficiency hypothesis assumes that prices on the markets reflect
all available information. To prevent any arbitrage opportunities under such con-
ditions, it must hold that today's expectation of tomorrow's exchange rate equals
today's exchange rate. Such a statement can be written as $E_t(y_{t+1} \mid \Omega_t) = y_t$,
where $\Omega_t$ stands for the information available at time t and $y_t$ for the exchange
rate at time t. The reader can easily verify that a random walk fits this condi-
tion. However, there exist more complicated processes, such as some ARCH and
GARCH processes, that can also fit this hypothesis. In general, such processes
are called martingales.

3 Difference Equations
All the equations used to describe the time series data generating processes in the
previous sections are, in mathematical terminology, called difference equations.
Therefore the theory of difference equations constitutes the basic mathematical
background for time series econometrics. In this section we will briefly introduce
the major results of this theory. We will mainly focus on those results that are
important for econometric time series analysis. From this point of view, the
stability conditions and the relation between the stability of a difference equation
and the stationarity of a time series are crucial (see sections 3.4 and 3.5).
A much more detailed presentation of the topic can be found in [1].

3.1 Linear Difference Equations


It was just stated that all the equations used to model time series data generat-
ing processes in the previous sections are so-called difference equations. To
be more precise, we should say that they all belong to the subset of difference
equations called linear difference equations. This is, for example, the case with
the equations (2), (3), and (4) modeling the AR, MA, and ARMA processes,
respectively. They are difference equations because they contain current and
lagged values of variables. They are linear difference equations because these
variables are raised only to the first power, which means that a certain linear
combination of present and lagged values is contained in the equation. More-
over, they are linear stochastic difference equations because they contain a
stochastic component, i.e. some random variables.
Formally, an nth-order linear difference equation can be written as
$$y_t = a_0 + \sum_{i=1}^{n} a_i y_{t-i} + x_t, \tag{5}$$
where $x_t$ is the so-called forcing process, which can be any function of time t
and of current and lagged values of variables other than y; if the linear difference
equation is to deserve the attribute stochastic, it should also be a function of
stochastic variables (e.g. of stochastic disturbances).

3.2 Lag Operator


The lag operator is a useful tool that helps to write down difference equa-
tions, to manipulate them, and to express their solutions in a simple and compact
way. The lag operator L is a linear operator that, applied to a variable y, returns
its lagged value:
$$Ly_t = y_{t-1}, \quad \text{or for higher lags} \quad L^i y_t = y_{t-i}.$$
The equation (4) describing an ARMA(p, q) process can be written with the
use of lag operators as

$$A(L)y_t = a_0 + B(L)\varepsilon_t, \tag{6}$$
where A(L) and B(L) are the following polynomials in L:
$$A(L) = 1 - a_1 L - a_2 L^2 - \dots - a_p L^p \quad \text{and} \quad B(L) = \beta_0 + \beta_1 L + \beta_2 L^2 + \dots + \beta_q L^q.$$
Although L is an operator, most of its algebraic properties are analogous
to those of a simple variable. The major properties of the lag operator L are
listed below:

1. L applied to a constant returns the constant: $Lc = c$.

2. The distributive law holds: $(L^i + L^j)y_t = L^i y_t + L^j y_t = y_{t-i} + y_{t-j}$.

3. The associative law for multiplication holds: $L^i L^j y_t = L^{i+j} y_t = y_{t-i-j}$.

4. L raised to a negative power is a lead operator: $L^{-i} y_t = y_{t+i}$.

5. From 2 and 3 it follows that for $|a| < 1$, $(1 + aL + a^2L^2 + a^3L^3 + \dots)y_t = y_t/(1 - aL)$.
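Property 5 can be checked numerically: applying the truncated polynomial $1 + aL + a^2L^2 + \dots$ to a series gives the same values as recursively solving $(1 - aL)x_t = y_t$, i.e. $x_t = y_t + a x_{t-1}$. A minimal sketch (illustrative, not from the original text; the series and the value of a are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
a = 0.8
y = rng.normal(size=200)

# Left-hand side: the truncated polynomial (1 + aL + a^2 L^2 + ...) applied to y_t,
# summing over the lags that are available in the sample.
lhs = np.array([sum(a**i * y[t - i] for i in range(t + 1)) for t in range(y.size)])

# Right-hand side: x_t = y_t / (1 - aL), computed as the recursion x_t = y_t + a * x_{t-1}.
rhs = np.zeros_like(y)
rhs[0] = y[0]
for t in range(1, y.size):
    rhs[t] = y[t] + a * rhs[t - 1]

print(np.allclose(lhs, rhs))  # True: the two representations coincide
```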

3.3 Solution of Difference Equations


To solve the general linear difference equation (5) means to express the value
of $y_t$ as a function of the elements of the forcing process sequence $\{x_t\}$, time
t, and possibly given elements of the sequence $\{y_t\}$ called initial conditions.
The solution process is quite similar to the process of solving linear differential
equations. In general, finding the solution involves the following steps:

1. All the so-called homogeneous solutions of the homogeneous part of the
linear difference equation must be found. The homogeneous part of the
equation (5) is defined as
$$y_t = \sum_{i=1}^{n} a_i y_{t-i} \quad \text{or rewritten} \quad y_t - \sum_{i=1}^{n} a_i y_{t-i} = 0, \tag{7}$$
which means that the constant $a_0$ and the forcing process $x_t$ from the
original equation are left out. Such a homogeneous equation has n linearly
independent solutions that we will denote as $H_i(t)$. The homogeneous
solutions are functions of time t only. Any linear combination of the
homogeneous solutions $H_i(t)$ is also a solution to the equation (7).
2. One solution of the whole linear difference equation (5) must be found.
Such a solution is called the particular solution. Let us denote it as
$P(\{x_t\}, t)$. The particular solution can be a function of time t and the
elements of the forcing process $\{x_t\}$.

3. The general solution of the linear difference equation is any linear combi-
nation of the n homogeneous solutions $H_i(t)$ plus the particular solution
$P(\{x_t\}, t)$. Let us denote the general solution as $G(\{x_t\}, t)$. It can be
written in the following way:
$$G(\{x_t\}, t) = \sum_{i=1}^{n} C_i H_i(t) + P(\{x_t\}, t), \tag{8}$$
where the $C_i$ are arbitrary constants of the linear combination of the homo-
geneous solutions. Clearly there are infinitely many general solutions, as
there is an infinity of such constants.

4. If initial conditions for the values of y at the initial periods are specified,
then the arbitrary constants can be eliminated by imposing these condi-
tions on the general solution. Ideally, if the initial values are known for n
initial time periods $(y_0, y_1, \dots, y_{n-1})$, then all the arbitrary constants can
be eliminated and we get one unique solution.

Before we describe the general process of finding homogeneous, particular,
and unique solutions, we will introduce some more intuitive simple methods of
expressing particular solutions in terms of lag operators and of finding particular
and unique solutions by iteration.

3.3.1 Particular Solution and Lag Operators


It was already mentioned that the lag operator, with its simple algebraic prop-
erties, can be used to express particular solutions of difference equations in a
simple way. We have just shown that the general ARMA(p, q) model $y_t =
a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \sum_{i=0}^{q} \beta_i \varepsilon_{t-i}$ can be rewritten as the equation (6), $A(L)y_t =
a_0 + B(L)\varepsilon_t$, where $A(L) = 1 - a_1 L - a_2 L^2 - \dots - a_p L^p$ and $B(L) = \beta_0 +
\beta_1 L + \beta_2 L^2 + \dots + \beta_q L^q$. The particular solution to this equation can be
simply expressed as
$$y_t = \frac{a_0}{A(L)} + \frac{B(L)}{A(L)}\varepsilon_t. \tag{9}$$
However, such a solution does not tell us anything about the concrete coeffi-
cients in front of $a_0$ and in front of the sequence of shocks $\{\varepsilon_t\}$, because $\frac{1}{A(L)}$
and $\frac{B(L)}{A(L)}$ are not numbers but operators. To get the coefficients of the partic-
ular solution we must solve the polynomials A(L) and B(L) with respect to L,
which means to find their characteristic roots. Such a procedure is shown for the
simplest case of an AR(1) process in the following example 6. Moreover, the
equation (9) is only a particular solution of the ARMA(p, q) equation, which
is not unique. Lag operators cannot be used to express homogeneous solutions,
and so neither can they be used to express general solutions.

Example 6 The AR(1) process $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$ can be written as
$A(L)y_t = a_0 + \varepsilon_t$, where $A(L) = 1 - a_1 L$. The polynomial A(L) is of the
first order only and so has just one characteristic root, which is moreover equal
to $1/a_1$. The solution is $y_t = a_0/(1 - a_1 L) + \varepsilon_t/(1 - a_1 L)$.
Now consider only the case of $|a_1| < 1$. Using the properties of the lag
operator, the solution can be written as $y_t = (1 + a_1 L + a_1^2 L^2 + \dots)a_0 +
(1 + a_1 L + a_1^2 L^2 + \dots)\varepsilon_t$. Finally, application of the lag operator yields
$y_t = (a_0 + a_1 a_0 + a_1^2 a_0 + \dots) + (\varepsilon_t + a_1\varepsilon_{t-1} + a_1^2\varepsilon_{t-2} + \dots)$, which
can be simplified as
$$y_t = a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-i}. \tag{10}$$

3.3.2 Solution by Iteration


Simple difference equations can be solved by iteration. This methodology again
enables us to find only particular solutions. However, if we know the initial con-
ditions (initial values of y), then we can iterate from these to get the unique
solution.
Again we will demonstrate the solution procedure on the simple case of
an AR(1) process $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$. The equation must hold also for
$y_{t-1}$, $y_{t-2}$, etc. Therefore we can substitute into the AR(1) equation $y_{t-1} =
a_0 + a_1 y_{t-2} + \varepsilon_{t-1}$ and $y_{t-2} = a_0 + a_1 y_{t-3} + \varepsilon_{t-2}$, and continue until infinity,
by which we get
$y_t = a_0 + a_1(a_0 + a_1(a_0 + a_1(\dots) + \varepsilon_{t-2}) + \varepsilon_{t-1}) + \varepsilon_t = (a_0 + a_0 a_1 + a_0 a_1^2 + \dots) + (\varepsilon_t + a_1\varepsilon_{t-1} + a_1^2\varepsilon_{t-2} + \dots)$, which can be written as
$$y_t = a_0\sum_{i=0}^{\infty} a_1^i + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-i}.$$
If $|a_1| \geq 1$, then the solution diverges.


If $|a_1| < 1$, then we can compute the first sum and get the result expressed
by the equation (10), thus the same result we got using lag operators in
example 6. Again we have only a particular solution.
The iteration methodology does not enable us to get homogeneous solutions of
the homogeneous part of the AR(1) equation $y_t = a_1 y_{t-1}$. For the moment, you
can verify that the homogeneous solution is $y_t = C a_1^t$, where C is an arbitrary
constant. If we combine the particular solution described by the equation (10)
with this homogeneous solution, then we get the general solution of the AR(1)
equation for $|a_1| < 1$:
$$y_t = C a_1^t + a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-i}. \tag{11}$$
Knowing the initial condition $y_0$, we can substitute it into the solution and
eliminate the arbitrary constant C:
$$y_0 = C + a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \varepsilon_{-i}, \quad \text{thus} \quad C = y_0 - a_0/(1 - a_1) - \sum_{i=0}^{\infty} a_1^i \varepsilon_{-i}.$$
Now we can insert the constant C into the general solution and get the unique
solution:
$$y_t = \left(y_0 - a_0/(1 - a_1) - \sum_{i=0}^{\infty} a_1^i \varepsilon_{-i}\right)a_1^t + a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-i},$$
which can be further simplified as
$$y_t = \left(y_0 - a_0/(1 - a_1)\right)a_1^t + a_0/(1 - a_1) + \sum_{i=0}^{t-1} a_1^i \varepsilon_{t-i}. \tag{12}$$

This procedure, which led us from a particular solution obtained by iteration
to the general and unique solutions, could obviously also be used in the case of
the particular solution obtained by application of lag operators. Nevertheless,
the iteration methodology is a bit more powerful in the sense that we can get
the unique solution directly by iteration from the initial condition $y_0$. If we have
an AR(1) process $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$, it must hold that
$y_1 = a_0 + a_1 y_0 + \varepsilon_1$,
$y_2 = a_0 + a_1 y_1 + \varepsilon_2 = a_0 + a_1(a_0 + a_1 y_0 + \varepsilon_1) + \varepsilon_2$,
$y_3 = a_0 + a_1 y_2 + \varepsilon_3 = a_0 + a_1(a_0 + a_1(a_0 + a_1 y_0 + \varepsilon_1) + \varepsilon_2) + \varepsilon_3$, etc.,
until we get
$$y_t = a_0\sum_{i=0}^{t-1} a_1^i + a_1^t y_0 + \sum_{i=0}^{t-1} a_1^i \varepsilon_{t-i}.$$
In the case of $|a_1| < 1$ we can compute the first sum and get
$$y_t = a_0/(1 - a_1) - a_1^t a_0/(1 - a_1) + a_1^t y_0 + \sum_{i=0}^{t-1} a_1^i \varepsilon_{t-i},$$
which is identical to the solution expressed by the equation (12).
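The unique solution (12) can also be checked against a direct simulation of the AR(1) recursion. The sketch below (illustrative code with arbitrary parameter values, not part of the original text) compares the iterated series with the closed-form expression.

```python
import numpy as np

rng = np.random.default_rng(5)
a0, a1, y0 = 1.0, 0.6, 2.0
T = 50
eps = rng.normal(size=T + 1)          # eps[1], ..., eps[T] are used; eps[0] is unused

# Direct iteration of y_t = a0 + a1*y_{t-1} + e_t starting from the initial condition y_0.
y = np.zeros(T + 1)
y[0] = y0
for t in range(1, T + 1):
    y[t] = a0 + a1 * y[t - 1] + eps[t]

# Closed-form unique solution (12):
# y_t = (y0 - a0/(1-a1)) * a1^t + a0/(1-a1) + sum_{i=0}^{t-1} a1^i * e_{t-i}
mu = a0 / (1 - a1)
y_formula = np.array([
    (y0 - mu) * a1**t + mu + sum(a1**i * eps[t - i] for i in range(t))
    for t in range(T + 1)
])

print(np.allclose(y, y_formula))      # True: iteration and formula (12) agree
```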


Note that in the case of the particular solution obtained by iteration, we
iterated backward. In this way we got the so-called backward looking solution.
An alternative way is to iterate forward and to get forward looking solution.
Forward looking solutions are functions of future realizations of the shock "t ,
which is not very useful or applicable in time series econometrics. However,
such solutions are often used in rational expectations models.
We have shown how to solve difference equations using the iteration method
and the method of lag operators. It is straightforward that these methods can
be applied only in the case of simplest difference equations. In more complicated
cases, like is a general ARM A(p; q) equation, a general solution methodology as
described in the following two sections must be applied. Also, the two methods
did not lead us to homogeneous solutions. They enabled to get only the partic-
ular solution and in the case of the iteration method combined with the initial
condition also the unique solution. If we want to get homogeneous solutions
and hence also the general solution, again we must follow the general procedure
described in the following two sections.

3.3.3 Homogeneous Solution
Here we will describe the procedure that yields homogeneous solutions of a
general nth-order linear homogeneous difference equation (7):
$$y_t = \sum_{i=1}^{n} a_i y_{t-i} \quad \text{or rewritten as} \quad y_t - \sum_{i=1}^{n} a_i y_{t-i} = 0,$$
which can also be written in terms of lag operators as
$$A(L)y_t = 0, \quad \text{where} \quad A(L) = 1 - a_1 L - a_2 L^2 - \dots - a_n L^n = 1 - \sum_{i=1}^{n} a_i L^i. \tag{13}$$

Notice that this homogeneous equation is the homogeneous part of an AR(n)
equation, but also of an ARMA(n, q) equation and, in general, of any nth-order
linear difference equation.
In the previous section we have just seen that the homogeneous solution to
the homogeneous part of an AR(1) equation takes the form $a_1^t$. Knowing
this, we will try to find solutions to the general homogeneous equation (7) in
the form $\alpha^t$. Notice that there will be n independent homogeneous solutions.
By substituting $\alpha^t$ into the homogeneous equation (7) and dividing the whole
equation by $\alpha^{t-n}$ we get
$$\alpha^n - a_1\alpha^{n-1} - a_2\alpha^{n-2} - \dots - a_n = \alpha^n - \sum_{i=1}^{n} a_i\alpha^{n-i} = 0, \tag{14}$$

which is the so-called characteristic equation.

Notice that if we divide the characteristic equation by $\alpha^n$ and substitute L
for $1/\alpha$, we will get the equation
$$1 - a_1 L - a_2 L^2 - \dots - a_n L^n = A(L) = 0, \tag{15}$$
where A(L) is the polynomial of the lag operator from the equation (13). The
equation (15) is called the inverse characteristic equation, and the L's that solve
it are the inverse values of the $\alpha$'s that solve the characteristic equation (14).
Now we search for the $\alpha$'s that solve the characteristic equation (14). It is a
polynomial equation that will have n characteristic roots $\{\alpha_i\}_{i=1}^{n}$. In general,
some of the roots can be multiple and some can be complex. We will distin-
guish these cases in the following list and assign one homogeneous solution to
each of the roots, so that we get n linearly independent homogeneous solutions
$\{H_i(t)\}_{i=1}^{n}$.

1. The root $\alpha_j$ is real and unique.
In this simplest case the corresponding homogeneous solution is simply
$H_j(t) = \alpha_j^t$.

2. The root $\alpha_j$ is real and multiple.
So we have k identical roots $\alpha_j$, where k is the multiplicity. In this case
the corresponding k linearly independent homogeneous solutions are
$H_j(t) = \alpha_j^t$, $H_{j+1}(t) = t\alpha_j^t$, $H_{j+2}(t) = t^2\alpha_j^t$, ..., $H_{j+k-1}(t) = t^{k-1}\alpha_j^t$.

3. The root $\alpha_j$ is complex.
Such a root will necessarily come in a conjugate complex pair, which can
be written as $\gamma_j \pm i\theta_j$. The two corresponding homogeneous solutions
will also be complex and will take the form $H_j(t) = (\gamma_j + i\theta_j)^t$ and
$H_{j+1}(t) = (\gamma_j - i\theta_j)^t$.

So for each of the n roots $\alpha_i$ solving the characteristic equation (14) we
got a homogeneous solution $H_i(t)$ solving the linear homogeneous difference
equation (7). Moreover, these solutions are linearly independent, and their
linear combinations are also solutions to the homogeneous difference equation (7).
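Characteristic roots of a higher-order equation are easily obtained numerically. The sketch below (illustrative only; the coefficient values are arbitrary) computes the roots of the characteristic equation of a second-order homogeneous equation $y_t = a_1 y_{t-1} + a_2 y_{t-2}$ and reports their moduli.

```python
import numpy as np

a1, a2 = 1.2, -0.5   # example coefficients of y_t = a1*y_{t-1} + a2*y_{t-2}

# Characteristic equation (14): alpha^2 - a1*alpha - a2 = 0.
roots = np.roots([1.0, -a1, -a2])

for r in roots:
    print(r, abs(r))  # here: a conjugate complex pair with |alpha| < 1 for both roots

# The homogeneous solutions are H_1(t) = roots[0]**t and H_2(t) = roots[1]**t;
# any linear combination of them also solves the homogeneous equation.
```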

3.3.4 Particular Solution


We have already shown how to Þnd particular solutions by the iteration method
and how to express such solution easily by means of lag operators. Unfortu-
nately, there is no general cookbook style procedure leading to the particular
solution. The choice of the right way is often led by intuition and depends on
the nature of the forcing process {xt }.
For example, if all the elements of {xt } equal to 0, then the nth-order linear
difference equation (5) reduces to
n
X
yt = ao + ai yt−i
i=1

and it seems reasonable to try to find the particular solution in the form of a
constant, $y_t = c$. Indeed, substituting this into the equation leads to $c = a_0 +
\sum_{i=1}^{n} a_i c$, which can be solved as $c = a_0/(1 - \sum_{i=1}^{n} a_i)$. So we easily obtained
the particular solution $y_t = a_0/(1 - \sum_{i=1}^{n} a_i)$. This is possible only if
$1 - \sum_{i=1}^{n} a_i \neq 0$. If $1 - \sum_{i=1}^{n} a_i = 0$, then the particular solution takes the
form $y_t = ct$. It can be shown analogously that in this case $c = a_0/(\sum_{i=1}^{n} i a_i)$.
The method that probably most resembles a general cookbook style is called
the method of undetermined coefficients. Using this method we profit from
the fact that the particular solution of a linear difference equation must also
be linear. We then suppose the particular solution to be a linear combination
of a constant c, time t, and the elements of the forcing process $\{x_t\}$, because we
know that there is hardly anything else it could depend on. Then we substitute
this so-called challenge solution into the difference equation and solve for the
constants of the linear combination. Even though it sounds simple, the practical
application may get cumbersome.

3.4 Stability Conditions
Stability of homogeneous linear difference equations is closely linked to the con-
cept of stationarity of time series. In this sense, stability conditions represent a
result of the theory of difference equations that has the greatest importance for
econometric analysis of time series. If a linear homogeneous difference equation
is stable, then its solution converges to zero as t → ∞. If it is unstable, then its
solution diverges.
Stability of the homogeneous part of a linear difference equation that de-
scribes the time series generating process is a necessary condition for the
stationarity of the time series. In section 3.3.3 we have already mentioned that a
general nth-order linear homogeneous difference equation (7) forms the homoge-
neous part of a difference equation describing any general ARMA(n, q) process.
That is why the stability conditions can be applied as necessary conditions for
stationarity of a wide range of time series, namely of any time series that was
generated by a general ARMA(n, q) process.
In section 3.3.2 we have seen that the homogeneous solution of the ho-
mogeneous part of an AR(1) equation $y_t = a_1 y_{t-1}$ (note that it is also the
homogeneous part of any ARMA(1, q) equation) takes the form $y_t = C a_1^t$,
because $a_1$ is the only characteristic root of the corresponding characteristic
equation. Obviously such a solution is stable and converges to zero if $|a_1| < 1$
and is unstable and diverges if $|a_1| > 1$. If $a_1 = 1$, then the solution remains
$y_t = C$ forever and we say, similarly as in the time series context, that the
equation contains a unit root.
For a first order homogeneous linear difference equation $y_t = a_1 y_{t-1}$ we can
summarize that

1. If $|a_1| < 1$, then the equation and its solution are stable.
2. If $|a_1| > 1$, then the equation and its solution are unstable.
3. If $|a_1| = 1$, then the equation is unstable and contains a unit root.

Similarly, we can describe the stability conditions for a general nth-order
linear homogeneous difference equation (7):
$$y_t = \sum_{i=1}^{n} a_i y_{t-i},$$
whose corresponding characteristic equation is the equation (14):
$$\alpha^n - a_1\alpha^{n-1} - a_2\alpha^{n-2} - \dots - a_n = 0.$$

Here we conclude that

1. If all characteristic roots $\{\alpha_i\}_{i=1}^{n}$ lie within the unit circle, that is $|\alpha_i| < 1$
for all i, then the equation and its solution are stable.

2. If at least one characteristic root $\alpha_i$ lies outside the unit circle, that is
$|\alpha_i| > 1$, then the equation and its solution are unstable.

3. If at least one characteristic root $\alpha_i$ lies on the unit circle, that is $|\alpha_i| = 1$,
then the equation is unstable and contains a unit root.

Sometimes the stability conditions are described in terms of the roots of
the inverse characteristic equation (15). The roots of the inverse characteristic
equation are the inverse values of the roots of the characteristic equation.
Therefore the stability conditions require the opposite: all the inverse charac-
teristic roots must lie outside the unit circle.
It is often difficult to compute all the characteristic roots. Fortunately there
are some necessary and some sufficient conditions for stability that are expressed
directly in terms of the coefficients $a_i$.
1. $\sum_{i=1}^{n} a_i < 1$ is a necessary condition for stability.

2. $\sum_{i=1}^{n} |a_i| < 1$ is a sufficient condition for stability.

3. If $\sum_{i=1}^{n} a_i = 1$, then the difference equation contains a unit root, which
means that at least one of the characteristic roots equals unity. This
condition is used in testing for unit roots in time series. Such tests are
described in sections 4.4 and 4.5.
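A sketch of how these conditions might be checked in practice for a given set of coefficients (illustrative code, not part of the original text): it combines the exact root-based criterion with the quick coefficient-based checks listed above.

```python
import numpy as np

def stability_check(a):
    """a = [a_1, ..., a_n], coefficients of y_t = a_1 y_{t-1} + ... + a_n y_{t-n}."""
    a = np.asarray(a, dtype=float)
    # Roots of the characteristic equation alpha^n - a_1 alpha^(n-1) - ... - a_n = 0.
    roots = np.roots(np.concatenate(([1.0], -a)))
    moduli = np.abs(roots)
    print("characteristic root moduli:", moduli)
    print("stable (all |alpha| < 1):  ", bool(np.all(moduli < 1)))
    print("sum a_i < 1 (necessary):   ", a.sum() < 1)
    print("sum |a_i| < 1 (sufficient):", np.abs(a).sum() < 1)
    print("sum a_i = 1 (unit root):   ", np.isclose(a.sum(), 1.0))

stability_check([0.6])        # stable AR(1)
stability_check([0.7, 0.3])   # a_1 + a_2 = 1: the equation contains a unit root
```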

3.5 Stability and Stationarity


In the previous section we stated that the necessary condition for stationarity
of a time series generated by a general ARMA(n, q) process is the stability of
the corresponding nth-order linear homogeneous difference equation (7).
However, it is only a necessary condition. Besides its homogeneous part, the
general ARMA(n, q) equation also contains the forcing process $x_t$, which is
described as

— $x_t = a_0 + \varepsilon_t$ in the case of an AR(n) process,

— $x_t = \sum_{i=0}^{q} \beta_i \varepsilon_{t-i}$ in the case of an MA(q) process (here the forcing process
is the whole right-hand side of the MA(q) equation),

— $x_t = a_0 + \sum_{i=0}^{q} \beta_i \varepsilon_{t-i}$ in the case of a general ARMA(n, q) process.

The particular solution associated with this forcing process can cause the
time series to be non-stationary, even if the homogeneous solution is stable.
This is not the case for any AR process or any finite MA process, where the
stability of the corresponding homogeneous equation is not only a necessary
but also a sufficient condition for stationarity. However, it can be the case for
an infinite MA process. So in the case of an MA process some additional
conditions for stationarity are needed. We summarize the results as follows:

1. AR(n) process $y_t = a_0 + \sum_{i=1}^{n} a_i y_{t-i} + \varepsilon_t$
Stability of the corresponding nth-order linear homogeneous difference
equation is a necessary and sufficient condition for stationarity of the gen-
erated time series.

2. MA(q) process $y_t = \sum_{i=0}^{q} \beta_i \varepsilon_{t-i}$
Here the corresponding homogeneous difference equation is $y_t = 0$, which
is obviously stable. However, the forcing process and the particular solution
associated with it can cause the generated time series to be non-stationary.

(a) If q is finite, then the generated time series will be stationary.

(b) If q is infinite, then the necessary and sufficient condition for station-
arity of the generated time series is that the sums $(\beta_s + \beta_1\beta_{s+1} +
\beta_2\beta_{s+2} + \dots)$ are finite for all s. Note that in the case of a finite MA
process these sums are obviously finite, because they contain a finite
number of summands.

3. ARMA(n, q) process $y_t = a_0 + \sum_{i=1}^{n} a_i y_{t-i} + \sum_{i=0}^{q} \beta_i \varepsilon_{t-i}$
Here we can combine the two previous cases and get two necessary and
sufficient conditions for stationarity of the generated time series:

(a) The corresponding nth-order linear homogeneous difference equation
must be stable.

(b) The sums $(\beta_s + \beta_1\beta_{s+1} + \beta_2\beta_{s+2} + \dots)$ must be finite for all s.

Using the above conditions we have related the econometric stationarity con-
cept with the mathematical concept of stability of linear homogeneous difference
equations. However, one more condition for the stationarity of the generated
time series is needed. Stationarity of a time series requires that the mean,
variance, and covariances are constant. Even if the homogeneous part of the
difference equation generating the time series is stable, its homogeneous solu-
tion is not constant. It only converges to zero. It means that it will be constant
(equal to zero) only after a sufficiently long time t. Therefore the above listed
necessary and sufficient conditions hold only for sufficiently high time periods,
that is for time periods that are sufficiently distant from the initial period t = 0.

4 Univariate Time Series
4.1 Estimation of an ARMA Model
As mentioned earlier, an ARMA model is a standard building block of time
series econometrics. In this section we will describe how to estimate a time
series' true data generating process with an ARMA model. We will follow
the so-called Box-Jenkins methodology [4]. The aim of the methodology is to
find the most parsimonious model of the data generating process. By the most
parsimonious we mean a model that fits the data well and at the same time
uses a minimum number of parameters, thus leaving a high number of degrees
of freedom. The search for parsimony is common to all branches of
econometrics, not only time series analysis. In fact, it is the general principle
of estimation. In estimation, our aim is not to achieve a perfect fit, but to achieve
a reasonable fit with few parameters. A perfect fit can always be achieved
trivially if we use as many parameters as data points. However, such an extremely
overparameterized model tells us nothing about the nature of the events and
decisions that generated the data.
The Box-Jenkins methodology can be divided into three main stages. The
first is to identify the data generating process, the second is to estimate the
parameters of this process, and the third is to diagnose the residuals from the
estimated model. If the process was identified and estimated correctly, then
the residuals should be diagnosed as white noise.
Most of the tools and procedures of the Box-Jenkins methodology require
the time series to be stationary and the estimated process to be invertible. The
concept of stationarity was already explained in section 2.3. The concept of
invertibility will be explained later in subsection 4.1.2; in short, it means that
the process can be represented by a finite-order or convergent autoregressive
process.
In the following subsections we will first describe the tools needed for the
application of the Box-Jenkins methodology and after that we will clarify the
sequence and logic in which they should be applied.

4.1.1 Autocorrelation Function — ACF


The basic tools of the Box-Jenkins methodology are the autocorrelation and
partial autocorrelation functions (ACF and PACF). These functions help to
identify the parameters p and q of the ARMA(p, q) data generating process,
that is, the number of AR and MA lags that should be included in the model.
We will start with the description of the ACF.
We will denote the ACF of a time series $\{y_t\}$ as $\rho_s$, where s is the appropriate
lag. The ACF is defined simply as
$$\rho_s = \frac{\mathrm{cov}(y_t, y_{t-s})}{\mathrm{var}(y_t)}. \tag{16}$$
Because we consider stationary time series, the ACF expressed by the equation
(16) is only a function of the time difference s and not of the time t itself.
Remember that one of the stationarity conditions is the independence of
$\mathrm{var}(y_t)$ and $\mathrm{cov}(y_t, y_{t-s})$ of the time t. Independence of the ACF of time
then follows from its construction. Stationarity of $\{y_t\}$ also ensures that the
ACF is equal to $\mathrm{corr}(y_t, y_{t-s})$. Because $\mathrm{var}(y_t) = \mathrm{var}(y_{t-s})$, we can write
$\mathrm{corr}(y_t, y_{t-s}) = \mathrm{cov}(y_t, y_{t-s})/\sqrt{\mathrm{var}(y_t)\mathrm{var}(y_{t-s})} = \mathrm{cov}(y_t, y_{t-s})/\mathrm{var}(y_t) = \rho_s$.
In example 7 we compute, for illustration, the theoretical ACF of AR(1)
and MA(1) processes and obtain the following results:

1. AR(1) process $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$, where $|a_1| < 1$ (the necessary and
sufficient condition for stationarity of such a process). Then the ACF is
$$\rho_s = a_1^s.$$
The ACF exhibits a direct exponential or oscillating decay, because
$|a_1| < 1$. The decay is direct if $1 > a_1 > 0$ and oscillating if $-1 < a_1 < 0$.
It can be shown that the same behavior holds for any stationary AR(p)
process: the ACF decays, and the decay may be direct or oscillating.

2. MA(1) process $y_t = \varepsilon_t + \beta_1\varepsilon_{t-1}$ (such a process is always stationary).
Then the ACF is
$\rho_0 = 1$,
$\rho_1 = \beta_1/(1 + \beta_1^2)$,
$\rho_s = 0$ for any $s > 1$.
The ACF is different from zero for $s \leq 1$ and is zero for $s > 1$. It can be
shown that for any MA(q) process the autocorrelation function is different
from zero for $s \leq q$ and is zero for $s > q$.

The autocorrelation functions mentioned above are theoretical functions.
They are computed based on the true data generating processes. However, in
estimation we do not know the true data generating process. In fact, our task is
almost the opposite. We have the data, that is, the time series, and we want to
estimate the data generating process with an ARMA(p, q) model. Fortunately,
we can use the sample counterparts of the theoretical autocorrelation functions
mentioned above. The sample autocorrelation function (the sample ACF) is
defined as
$$\hat{\rho}_s = \frac{\sum_{t=s+1}^{T}(y_t - \bar{y})(y_{t-s} - \bar{y})}{\sum_{t=1}^{T}(y_t - \bar{y})^2}, \tag{17}$$

where
T
1X
y= yt
T t=1

25
is the sample mean.
Having observations {y_t}_{t=1}^{T}, equation (17) enables us to compute the sample
ACF and compare it with the theoretical autocorrelation functions computed
for different ARMA(p, q) processes according to equation (16). If the
sample ACF resembles an oscillating or direct decay, we can assume that the
data were generated by some AR(p) process. If the sample ACF is different
from zero up to the lag s = q and goes almost to zero for lags s > q, then we
can assume that the true data generating process was an MA(q). A more general
algorithm that assigns an appropriate ARMA(p, q) process to a
certain behavior of the sample ACF (and also the sample PACF) is offered in
Table 1 in subsection 4.1.6.
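To make equation (17) concrete, here is a minimal sketch in Python (an illustrative choice; the helper name sample_acf and the simulated series are ours, not part of the original text) that computes the sample ACF and can be used to inspect its decay pattern:

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample ACF as in equation (17): rho_hat_s for s = 1, ..., max_lag."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    denom = np.sum((y - ybar) ** 2)                      # sum_{t=1}^{T} (y_t - ybar)^2
    acf = np.empty(max_lag)
    for s in range(1, max_lag + 1):
        num = np.sum((y[s:] - ybar) * (y[:-s] - ybar))   # sum_{t=s+1}^{T} (y_t - ybar)(y_{t-s} - ybar)
        acf[s - 1] = num / denom
    return acf

# Example usage on a simulated AR(1) series with a_1 = 0.7
rng = np.random.default_rng(0)
eps = rng.standard_normal(500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.7 * y[t - 1] + eps[t]
print(sample_acf(y, 10))   # should decay roughly like 0.7**s
```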
Example 7 First consider an AR(1) process y_t = a_0 + a_1 y_{t−1} + ε_t.
To compute the ACF we need var(y_t) and cov(y_t, y_{t−s}). Let us denote
var(ε_t) by σ²; then, because y_{t−1} and ε_t are uncorrelated, we can write

   var(y_t) = var(a_0 + a_1 y_{t−1} + ε_t) = a_1² var(y_{t−1}) + σ².

We assume the time series {y_t} to be stationary, thus var(y_t) = var(y_{t−1}).
Substituting this identity yields the equation var(y_t) = a_1² var(y_t) + σ². Solving
this equation we get

   var(y_t) = σ² / (1 − a_1²).

For covariances we can write similarly

   cov(y_t, y_{t−1}) = cov(a_0 + a_1 y_{t−1} + ε_t, y_{t−1}) = a_1 var(y_{t−1}) = a_1 var(y_t),
   cov(y_t, y_{t−2}) = cov(a_0 + a_1 y_{t−1} + ε_t, y_{t−2}) = a_1 cov(y_{t−1}, y_{t−2})
                     = a_1 cov(y_t, y_{t−1}) = a_1² var(y_t),

and for the general case of cov(y_t, y_{t−s}) we get through iteration

   cov(y_t, y_{t−s}) = cov(a_0 + a_1 y_{t−1} + ε_t, y_{t−s}) = a_1 cov(y_{t−1}, y_{t−s})
                     = a_1² cov(y_{t−2}, y_{t−s}) = ... = a_1^{s−1} cov(y_{t−s+1}, y_{t−s})
                     = a_1^{s−1} cov(y_t, y_{t−1}) = a_1^s var(y_t).

Now we can compute the ACF for an AR(1) process. By substituting the expression
for cov(y_t, y_{t−s}) into equation (16) we get

   ρ_s = cov(y_t, y_{t−s}) / var(y_t) = a_1^s var(y_t) / var(y_t) = a_1^s.

Second, consider an MA(1) process y_t = ε_t + β_1 ε_{t−1}. Then

   var(y_t) = var(ε_t + β_1 ε_{t−1}) = (1 + β_1²) σ²,
   cov(y_t, y_{t−1}) = cov(ε_t + β_1 ε_{t−1}, ε_{t−1} + β_1 ε_{t−2}) = β_1 σ²,
   cov(y_t, y_{t−2}) = cov(ε_t + β_1 ε_{t−1}, ε_{t−2} + β_1 ε_{t−3}) = 0,
   cov(y_t, y_{t−3}) = cov(ε_t + β_1 ε_{t−1}, ε_{t−3} + β_1 ε_{t−4}) = 0,
   etc.

So the ACF is

   ρ_0 = 1,
   ρ_1 = β_1 / (1 + β_1²),
   ρ_s = 0 for any s > 1.
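These theoretical values can be checked by a small simulation. The sketch below is illustrative only; it reuses the sample_acf helper sketched above and compares the sample ACF of long simulated AR(1) and MA(1) series with a_1^s and β_1/(1 + β_1²):

```python
import numpy as np

rng = np.random.default_rng(1)
T, a1, b1 = 100_000, 0.6, 0.5

# AR(1): y_t = a1 * y_{t-1} + eps_t
eps = rng.standard_normal(T)
y_ar = np.zeros(T)
for t in range(1, T):
    y_ar[t] = a1 * y_ar[t - 1] + eps[t]

# MA(1): y_t = eps_t + b1 * eps_{t-1}
y_ma = eps[1:] + b1 * eps[:-1]

print(sample_acf(y_ar, 3))   # close to [0.6, 0.36, 0.216], i.e. a1**s
print(sample_acf(y_ma, 3))   # close to [0.4, 0.0, 0.0], since 0.5 / (1 + 0.25) = 0.4
```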

4.1.2 Partial Autocorrelation Function — PACF


Another tool that helps to identify the correct numbers of lags p, q of the true
ARMA(p, q) data generating process is the partial autocorrelation function (PACF).
Like the ACF, the PACF gives us correlations between the elements of the
time series {y_t}_{t=1}^{T}. However, in this case the correlation between y_t and y_{t−s}
is netted out of the effects of y_{t−1}, y_{t−2}, ..., y_{t−s+1}. Notice that in the case
of an AR(1) process y_t = a_0 + a_1 y_{t−1} + ε_t we have correlation between y_t and
y_{t−2} even though y_{t−2} does not directly appear in the AR(1) equation. This
correlation is intermediated through y_{t−1}. Because y_t is correlated with y_{t−1}
and y_{t−1} is correlated with y_{t−2}, we get the correlation between y_t and y_{t−2}

   ρ_2 = corr(y_t, y_{t−2}) = corr(y_t, y_{t−1}) corr(y_{t−1}, y_{t−2}) = ρ_1².

The aim of the PACF is to eliminate such intermediated correlations in
AR(p) processes. To derive the theoretical PACF we assume that we know
the autocorrelation function ρ_s of our time series {y_t}. If we want to get the
PACF for the first lag, we suppose for the moment that the time series was
generated by an AR(1) process, that is

   y_t = a_0 + φ_{11} y_{t−1} + ε_t,

where the coefficient on the first lag, φ_{11}, equals the desired
PACF. Knowing the ACF for the first lag, ρ_1, it must hold that

   ρ_1 = corr(y_t, y_{t−1}) = corr(a_0 + φ_{11} y_{t−1} + ε_t, y_{t−1}) = φ_{11}.

As a result,

   φ_{11} = ρ_1,

which is not surprising, because for the first lag there are no intervening lags in
between, so that the PACF φ_{11} equals the ACF ρ_1. To compute the PACF
for the second lag, φ_{22}, we must suppose that the time series was generated by
an AR(2) process described as

   y_t = a_0 + φ_{21} y_{t−1} + φ_{22} y_{t−2} + ε_t,

where the coefficient on the second lag, φ_{22}, equals the desired
PACF. Knowing the ACF for the first and second lags, ρ_1 and ρ_2, it must
hold that

   ρ_1 = corr(y_t, y_{t−1}) = corr(a_0 + φ_{21} y_{t−1} + φ_{22} y_{t−2} + ε_t, y_{t−1}) = φ_{21} + φ_{22} ρ_1,
   ρ_2 = corr(y_t, y_{t−2}) = corr(a_0 + φ_{21} y_{t−1} + φ_{22} y_{t−2} + ε_t, y_{t−2}) = φ_{21} ρ_1 + φ_{22}.

These two equations can be solved for φ_{22}, so that we have

   φ_{22} = (ρ_2 − ρ_1²) / (1 − ρ_1²).

We can continue this procedure until we get the PACF for any general lag
s, denoted φ_{ss}. In such a case we must suppose that the time series was
generated by an AR(s) process and that we know the ACF for all lags up to s.
In this manner we obtain the following expression for the PACF at any general
lag s:

   φ_{11} = ρ_1,    (18)
   φ_{22} = (ρ_2 − ρ_1²) / (1 − ρ_1²),
   φ_{ss} = [ ρ_s − Σ_{j=1}^{s−1} φ_{s−1,j} ρ_{s−j} ] / [ 1 − Σ_{j=1}^{s−1} φ_{s−1,j} ρ_j ]    for s > 2,

where φ_{s,j} = φ_{s−1,j} − φ_{ss} φ_{s−1,s−j}.
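The recursion in (18) translates directly into code. The following sketch (the helper name pacf_from_acf is ours, not part of the original text) computes φ_{ss} from a given autocorrelation sequence; feeding it the sample ACF ρ̂_s yields the sample PACF:

```python
import numpy as np

def pacf_from_acf(rho, max_lag):
    """PACF phi_ss via the recursion (18); rho[s-1] holds rho_s."""
    phi = np.zeros((max_lag + 1, max_lag + 1))   # phi[s, j] stores phi_{s,j}
    pacf = np.empty(max_lag)
    phi[1, 1] = rho[0]                           # phi_11 = rho_1
    pacf[0] = rho[0]
    for s in range(2, max_lag + 1):
        num = rho[s - 1] - sum(phi[s - 1, j] * rho[s - j - 1] for j in range(1, s))
        den = 1.0 - sum(phi[s - 1, j] * rho[j - 1] for j in range(1, s))
        phi[s, s] = num / den                    # phi_ss
        for j in range(1, s):                    # phi_{s,j} = phi_{s-1,j} - phi_ss * phi_{s-1,s-j}
            phi[s, j] = phi[s - 1, j] - phi[s, s] * phi[s - 1, s - j]
        pacf[s - 1] = phi[s, s]
    return pacf
```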


The theoretical PACF derived above has specific behavior for time
series generated by different ARMA(p, q) processes. As in the previous subsection,
we will first, for simplicity, consider the AR(1) and MA(1) processes.

1. AR(1) process y_t = a_0 + a_1 y_{t−1} + ε_t, where |a_1| < 1 (a necessary and
   sufficient condition for stationarity of such a process).
   The PACF is different from zero at lag s = 1 and equals zero for
   all lags s > 1. Similarly, if the data were generated by an AR(p) process,
   then there is no direct correlation between y_t and y_{t−s} for s > p. The
   PACF is thus different from zero up to the lag s = p and equals zero
   for any lag s > p.
2. MA(1) process y_t = ε_t + β_1 ε_{t−1} (such a process is always stationary).
   If β_1 ≠ −1, we can rewrite the MA(1) equation using the lag operator
   as y_t / (1 + β_1 L) = ε_t. We suppose the MA(1) process to be not only
   stationary but also invertible; thus it must have a convergent infinite-order
   AR representation y_t = β_1 y_{t−1} − β_1² y_{t−2} + β_1³ y_{t−3} − ... + ε_t. Note that
   the necessary and sufficient condition for convergence of this infinite-order
   AR equation, and therefore also for invertibility of the MA(1) process, is
   that |β_1| < 1. Because the MA(1) process has such a convergent infinite-order
   AR representation, the PACF never drops directly to zero. Rather,
   it decays exponentially to zero, and the decay is direct if β_1 < 0
   and oscillating if β_1 > 0. It can be shown that for any invertible MA(q)
   process the PACF decays to zero either in a direct or an oscillating way.

We have just shown that the MA(1) process must be invertible if we want
to get a meaningful PACF. As was already mentioned at the beginning of
section 4.1, invertibility of any ARMA(p, q) process means that it can be
represented by a finite-order or convergent autoregressive process. In general
we need any ARMA(p, q) process to be invertible for its PACF to make sense.
In more exact terms, an ARMA(p, q) process

   y_t = a_0 + Σ_{i=1}^{p} a_i y_{t−i} + Σ_{i=0}^{q} β_i ε_{t−i}

is invertible if all the roots L_i of the polynomial Σ_{i=0}^{q} β_i L^i of the lag operator L
lie outside the unit circle (|L_i| > 1). In that case the polynomial can be
factored into terms of the form (1 − r_i L), where r_i stands for 1/L_i and thus
|r_i| < 1, and the ARMA(p, q) process can be rewritten (up to a constant scale
factor) as

   y_t / ∏_{i=1}^{q} (1 − r_i L) = ( a_0 + Σ_{i=1}^{p} a_i y_{t−i} ) / ∏_{i=1}^{q} (1 − r_i L) + ε_t.

Because all |r_i| < 1, each factor 1/(1 − r_i L) can be expanded into a convergent
geometric series, which yields a convergent infinite-order AR representation
of the original ARMA(p, q) process.
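As an illustration, the invertibility condition can be verified numerically by computing the roots of the MA lag polynomial and checking that they lie outside the unit circle. This is a minimal sketch under that interpretation (the helper name is ours), using numpy's root finder:

```python
import numpy as np

def is_invertible(beta):
    """beta = [beta_0, beta_1, ..., beta_q]; all roots of sum_i beta_i L^i must have |L_i| > 1."""
    roots = np.roots(beta[::-1])      # np.roots expects coefficients from the highest power down
    return bool(np.all(np.abs(roots) > 1.0))

print(is_invertible([1.0, 0.5]))   # MA(1) with beta_1 = 0.5 -> True  (root L = -2)
print(is_invertible([1.0, 2.0]))   # MA(1) with beta_1 = 2.0 -> False (root L = -0.5)
```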
The PACF given by equation (18) is a theoretical function computed
from the theoretical ACF (ρ_s). In practice, when we want to estimate a time
series, we know neither the theoretical ACF nor the theoretical PACF. As
in the case of the ACF, we use the sample partial autocorrelation function (the
sample PACF) instead of the theoretical one. We get the sample PACF simply by
replacing the theoretical ACF (ρ_s) by its sample counterpart ρ̂_s in equation
(18). Another way to get the sample PACF is to apply OLS
to estimate the equations y_t = a_0 + φ_{i1} y_{t−1} + φ_{i2} y_{t−2} + ... + φ_{ii} y_{t−i} + ε_t. The
coefficient estimates φ̂_{ii} then equal the corresponding elements of the sample
PACF.
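A minimal sketch of this OLS route (the helper name and implementation details are ours): for each i, regress y_t on a constant and its first i lags; the estimated coefficient on y_{t−i} is the sample PACF at lag i.

```python
import numpy as np

def sample_pacf_ols(y, max_lag):
    """Sample PACF: last OLS coefficient of y_t on a constant and y_{t-1}, ..., y_{t-i}."""
    y = np.asarray(y, dtype=float)
    pacf = np.empty(max_lag)
    for i in range(1, max_lag + 1):
        Y = y[i:]                                       # dependent variable y_t, t = i+1, ..., T
        X = np.column_stack([np.ones(len(Y))] +
                            [y[i - j:-j] for j in range(1, i + 1)])   # lags 1, ..., i
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
        pacf[i - 1] = coef[-1]                          # coefficient on y_{t-i}
    return pacf
```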
Having the observations {y_t}_{t=1}^{T}, we can compute the sample PACF and
compare it with the theoretical partial autocorrelation functions computed for
different ARMA(p, q) processes. If the sample PACF resembles an oscillating
or direct decay, we can assume that the data were generated by some MA(q)
process. If the sample PACF is different from zero up to the lag s = p and goes
almost to zero for lags s > p, then we can assume that the true data generating
process was an AR(p). A more general algorithm that assigns the
appropriate ARMA(p, q) process to a certain behavior of the sample PACF
(and also the sample ACF) is offered in Table 1 in subsection 4.1.6.

4.1.3 Q-Tests
In the previous subsections, we described the autocorrelation and partial autocorrelation
functions as useful tools that help us to guess the numbers of lags p
and q of the true ARMA(p, q) data generating process. Application of these
functions is rather intuitive. The sample counterparts of these functions are compared
to the theoretical functions of different ARMA models. If the pattern of
the sample ACF and PACF resembles the theoretical ACF and PACF of an
ARMA(p, q), then p and q might represent the true numbers of lags. However, in
practice this procedure is rarely so straightforward and easy. The sample ACF
and PACF can often appear ambiguous, and then it becomes very difficult to
discover a clear pattern. Indeed, the sample ACF and PACF are random
variables and as such can deviate from the expected pattern by pure chance. As
a result, guessing the correct number of lags based on the sample ACF and
PACF is by and large a matter of experience.
To increase the chance that our guess is correct, we can use another tool, the
Q-test. Q-tests, based on Q-statistics, offer a statistically more formal way
to assess the correct number of lags. They test whether a group of autocorrelations
ρ_s (elements of the ACF) is statistically different from zero. In theory
the sample variance of the sample ACF and PACF can be computed and
t-tests of these functions can be formulated for each lag s separately. However, such
tests have low power, because they are always based on only one value of the
sample ACF or PACF, for one particular s. The Q-statistics are based on a
group of sample autocorrelations ρ̂_s, and therefore their power is higher.
In practical applications two well-known types of Q-tests are used: the Box-Pierce
Q-test [5] and the Ljung-Box Q-test [6]. The first test performs well only
in very large samples, while the second uses a Q-statistic that is adjusted
to perform better in small samples. That is why the Ljung-Box Q-test
is usually preferred. For the sake of completeness both are introduced here.

1. Box-Pierce Q-test
   The Box-Pierce Q-test is based on the Box-Pierce Q-statistic defined as

      Q = T Σ_{i=1}^{k} ρ̂_i²,    (19)

   where ρ̂_i are the elements of the sample ACF defined by equation
   (17). Under the null hypothesis that all autocorrelations up to lag k
   are zero, the Q-statistic is asymptotically χ² distributed with k degrees
   of freedom. (This holds only if the time series was generated
   by a stationary ARMA process.)
2. Ljung-Box Q-test
   The Ljung-Box Q-test is based on the Ljung-Box Q-statistic defined as

      Q = T(T + 2) Σ_{i=1}^{k} ρ̂_i² / (T − i),    (20)

   where ρ̂_i are, as in the previous case, the elements of the sample ACF
   defined by equation (17). Under the null hypothesis that all autocorrelations
   up to lag k are zero, the Q-statistic is χ² distributed with k
   degrees of freedom. (This holds only if the time series was
   generated by a stationary ARMA process.)

When we search for the appropriate number of lags of an ARMA(p, q) model,
we should compute the Q-statistics for k starting at 1 and continue up to a
reasonably high k, which should not exceed T/4, a reasonable
upper bound verified by practice. The choice of the upper k is a matter of
experience and of the nature of the data whose generating process we
want to model. The testing procedure is standard: if the critical value of
χ²_k is exceeded by the corresponding Q-statistic, we can say that at least one
autocorrelation from the set {ρ_i}_{i=1}^{k} is significantly different from zero.
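For illustration, the two statistics from equations (19) and (20) can be computed as follows. The sketch assumes the sample_acf helper from subsection 4.1.1 and uses scipy only for the χ² critical values:

```python
import numpy as np
from scipy.stats import chi2

def q_tests(y, max_k, alpha=0.05):
    """Box-Pierce and Ljung-Box Q-statistics, equations (19) and (20), for k = 1, ..., max_k."""
    T = len(y)
    rho = sample_acf(y, max_k)                  # rho[i-1] holds rho_hat_i
    for k in range(1, max_k + 1):
        lags = np.arange(1, k + 1)
        q_bp = T * np.sum(rho[:k] ** 2)                              # equation (19)
        q_lb = T * (T + 2) * np.sum(rho[:k] ** 2 / (T - lags))       # equation (20)
        crit = chi2.ppf(1 - alpha, df=k)                             # chi-square critical value
        print(k, round(q_bp, 2), round(q_lb, 2), round(crit, 2), q_lb > crit)
```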
Unlike the analysis of the ACF and PACF, there is no straightforward algorithm
that assigns the most appropriate ARMA(p, q) model to various patterns
of Q-test results. Both methodologies should be applied together, and their
results should not contradict each other. In practice, Q-tests are usually applied
to diagnose the residuals from the estimated model rather than to discover the
correct numbers of lags p and q. The diagnostics of residuals is described in the
next subsection.

4.1.4 Residuals’ Diagnostics


Residuals from a correctly specified model that captures the data generating
process should be white noise. This means that they should contain no further
information that could help with estimation. White noise has all autocorrelations
equal to zero. That is why Q-tests represent a suitable, though limited, tool
to test whether a time series is white noise.
After we choose the lags p and q and estimate the ARMA(p, q) model, we
should test whether the residuals of that model are white noise. For this purpose
we use the Q-tests described in the previous subsection. The testing procedure
is almost the same. The only difference is that the Q-statistics defined by the
equations (19) and (20) have χ² distributions with fewer degrees of freedom. The
degrees of freedom are decreased by the number of parameters included in the
estimated ARMA(p, q) model. This means that the χ² distribution has k − p − q − 1
degrees of freedom, if a constant is included in the model, or k − p − q degrees
of freedom, if the model was estimated without a constant. Therefore we can
compute the Q-statistics and perform the Q-tests starting only at k = p + q + 2
or k = p + q + 1, respectively, and continue at most until k = T/4. If the residuals
are white noise, then all autocorrelations should be zero; therefore we should
accept the null hypothesis for each k. That means the Q-statistics should not
exceed the appropriate χ² critical values for any k.
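A brief sketch of this residual check (illustrative only, reusing the sample_acf helper and scipy): the Ljung-Box statistic is computed on the residuals and compared with χ² critical values whose degrees of freedom are reduced by the number of estimated parameters.

```python
import numpy as np
from scipy.stats import chi2

def ljung_box_on_residuals(resid, p, q, alpha=0.05, constant=True):
    """Ljung-Box tests on ARMA(p, q) residuals with df = k - p - q (- 1 if a constant was estimated)."""
    T = len(resid)
    n_params = p + q + (1 if constant else 0)
    max_k = T // 4
    rho = sample_acf(resid, max_k)
    for k in range(n_params + 1, max_k + 1):     # start where the degrees of freedom are at least 1
        lags = np.arange(1, k + 1)
        q_lb = T * (T + 2) * np.sum(rho[:k] ** 2 / (T - lags))
        crit = chi2.ppf(1 - alpha, df=k - n_params)
        if q_lb > crit:
            print(f"k={k}: Q={q_lb:.2f} > {crit:.2f} -> residuals do not look like white noise")
```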

4.1.5 Information Criteria
It can happen that several different ARMA(p, q) models seem to be appropriate
for our data. This occurs if the pattern of the sample ACF and PACF can
be interpreted in several different ways and if the residuals from several
different models are diagnosed as white noise. In such a case we can use
information criteria to select the best model. By the best we mean
the most parsimonious model that satisfactorily captures the dynamics of the
data.
The most common measure of goodness of fit is the R². The problem with
the R² is that it allows us to compare only models of the same form and, moreover,
with the same number of explanatory variables. The R² never decreases
if we add one more variable to the model, and in most cases it increases.
So, for example, we can use the R² only to compare linear models that use the
same number of explanatory variables, although these explanatory variables can
differ across the models. This is not exactly what we need in univariate time series
econometrics. Here the explanatory variables are given and so cannot differ to
the same extent as in cross-sectional econometrics: they are the lags of y in AR
models and the lags of ε in MA models. In time series econometrics we usually
want to compare different ARMA(p, q) models, where the difference resides in
the numbers of lags p and q, and thus in the number of explanatory variables. For
this comparison we must use information criteria instead of the R².
Most frequently two information criteria are used: the Akaike information
criterion (AIC) [7], defined as

   AIC = T ln(SSR) + 2n,    (21)

and the Schwarz Bayesian criterion (SBC) [8], defined as

   SBC = T ln(SSR) + n ln(T),    (22)

where SSR is the sum of squared residuals; n is the number of estimated
parameters (n = p + q + 1 if a constant term is included, otherwise n = p + q); and
T is, as usual, the number of usable observations. The number of usable
observations T should be the same for all the compared models. If we add lagged
variables to the model, we lose some observations; the same happens with
differencing. As a result, if we compare several models, we should estimate all
of them using only those observations that are usable in the model with the highest
number of lagged variables. For example, if we compare AR(1), AR(2), and AR(3)
models with 100 data points, then we should use only 97 observations for the
purpose of the comparison.
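For illustration, the comparison by information criteria might look as follows; the SSR values and candidate models below are hypothetical, and each model is assumed to have been estimated on the same usable observations:

```python
import numpy as np

def aic(ssr, n, T):
    """AIC = T ln(SSR) + 2n, equation (21)."""
    return T * np.log(ssr) + 2 * n

def sbc(ssr, n, T):
    """SBC = T ln(SSR) + n ln(T), equation (22)."""
    return T * np.log(ssr) + n * np.log(T)

# Hypothetical candidates: (label, SSR, number of estimated parameters), T usable observations
T = 97
candidates = [("AR(1)", 410.2, 2), ("AR(2)", 395.8, 3), ("AR(3)", 395.1, 4)]
for label, ssr, n in candidates:
    print(label, round(aic(ssr, n, T), 1), round(sbc(ssr, n, T), 1))
# choose the model with the smallest criterion value
```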
To select the best model, the value of the information criterion is to be minimized.
The problem is that the information criteria will usually suggest choosing
a model with more parameters (more explanatory variables) than the true model
of the data generating process has. This is a bigger problem in the case of the AIC,
which is biased towards choosing an overparametrized model. The SBC is at least
asymptotically consistent, meaning that as the number of observations goes to
infinity the SBC will lead us to select the true model. In plain words, the SBC
imposes a heavier penalty on overparametrized models. The Hannan-Quinn
information criterion is another commonly used tool [reference to be added].

4.1.6 Box-Jenkins Methodology


In the previous subsections we introduced the tools necessary for the application
of the Box-Jenkins methodology. Now we will describe the sequence
and logic in which they should be applied: the algorithm of the Box-Jenkins
methodology. In essence, the Box-Jenkins methodology consists of the following
three steps:
1. Identification
2. Estimation
3. Diagnostic checking
Of course, such an algorithm represents a simplification, as any generalization
does. In reality we must also rely on our intuition and experience. Our starting
point is the data. We have the time series {y_t}_{t=1}^{T} and we want to estimate its
data generating process by some ARMA(p, q) model. In order to achieve this
goal we should proceed in the following steps:
1. Plot the sample ACF and PACF for lags from s = 1 to s = T/4 and
   compare the pattern of these functions to the patterns of the theoretical ACF
   and PACF of ARMA(p, q) models with different numbers of lags p and q.
   These theoretical patterns are summarized in Table 1 below. Based on
   this comparison we choose the most appropriate numbers of lags p and q
   for our ARMA model.
2. Estimate the ARMA(p, q) model with the lags p and q chosen in the
   previous step and save the residuals.
3. Plot the sample ACF and PACF for the series of residuals, compute the
   Q-statistics, and perform the Q-tests. The sample ACF and PACF can be
   computed, as for the original time series, for lags from s = 1 to T/4. The
   Q-tests can be computed for k starting at p + q + 2, if we use a constant
   term in our model, or at p + q + 1, if our model is without a constant
   term, and ending at most at T/4. If all the sample autocorrelations
   and partial autocorrelations are close to zero and if all the Q-tests allow
   us to accept the null hypothesis of no autocorrelation, then the estimated
   model might be the correct one. If that is not the case, then we have to go
   back to step 1 and change the numbers of lags p and q.
4. If we arrive at this point with several possibly correct ARMA(p, q) models,
   then we can choose from these the one that minimizes the information
   criteria. Such a model has a high chance of being the best approximation of
   the true data generating process. (A code sketch of this whole procedure is
   given at the end of this subsection.)

Table 1: The ACF and PACF of different ARMA models.

ARMA process    ACF pattern
White noise     ρ_s = 0 for all s
AR(1)           Direct or oscillating decay, ρ_s = a_1^s
AR(p)           Direct or oscillating decay
MA(1)           ρ_1 ≠ 0; ρ_s = 0 for s > 1
MA(q)           ρ_s ≠ 0 for s ≤ q; ρ_s = 0 for s > q
ARMA(p, q)      Direct or oscillating decay beginning at lag q

ARMA process    PACF pattern
White noise     φ_ss = 0 for all s
AR(1)           φ_11 = ρ_1; φ_ss = 0 for s > 1
AR(p)           φ_ss ≠ 0 for s ≤ p; φ_ss = 0 for s > p
MA(1)           Direct or oscillating decay
MA(q)           Direct or oscillating decay
ARMA(p, q)      Direct or oscillating decay beginning at lag p

Throughout this section we required the time series to be stationary and
the estimated process to be invertible. So if the Box-Jenkins methodology leads us to
the choice of a model that is close to being non-stationary or non-invertible, we
should be suspicious. If, for example, we estimated the data generating process
as an AR(2) process y_t = a_0 + a_1 y_{t−1} + a_2 y_{t−2} + ε_t and obtained coefficients
a_1 and a_2 such that a_1 + a_2 is close to one, then we should review the whole
estimation procedure and possibly also test the time series for the presence of a
unit root (as presented in section 4.4).
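To make the three steps concrete, the following sketch runs one identification-estimation-diagnostics pass with the statsmodels package. This is our illustrative choice, not part of the original text; the file name is hypothetical, and the exact function signatures and defaults should be checked against the installed statsmodels documentation.

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

y = np.loadtxt("series.txt")          # hypothetical data file with one observation per line
T = len(y)
max_lag = T // 4

# Step 1: identification - inspect the sample ACF/PACF against Table 1
print(acf(y, nlags=max_lag))
print(pacf(y, nlags=max_lag))

# Step 2: estimation - candidate orders suggested by step 1 (d = 0, so pure ARMA)
results = {}
for order in [(1, 0, 0), (2, 0, 0), (1, 0, 1)]:
    results[order] = ARIMA(y, order=order).fit()

# Step 3: diagnostic checking - Ljung-Box tests on the residuals,
# starting at k = p + q + 2 because a constant is included by default
for order, res in results.items():
    p, _, q = order
    lb = acorr_ljungbox(res.resid, lags=list(range(p + q + 2, max_lag + 1)))
    print(order, res.aic, res.bic, lb["lb_pvalue"].min())
# keep the models whose residuals pass the tests; among those, pick the lowest AIC/BIC
```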

4.2 Trend in Time Series


In the previous sections we described how to estimate the irregular
pattern of a time series with an ARMA model. However, it was already pointed
out that many time series of economic data contain a time trend, and other time
series can even grow exponentially. Exponentially growing time series are typically
transformed by taking the natural logarithm, which produces a time series containing
a linear time trend. Any trending time series is not stationary. Therefore we
must first remove the trend from the analyzed data before we can proceed to
the estimation of the irregular pattern. Explaining the trend is another matter;
in time series econometrics, explanation of a trend is not a priority.
Understandably, a more fundamental as well as theoretical background should be
used as a tool to explain the source of a specific trend in the data. Nevertheless, time
series econometrics is capable of dealing with trends, and Mills (2003) serves as a
qualified reference covering this topic.
In the following subsections we will introduce various types of trends that
can be present in the data. We will also suggest the appropriate transformations
