Time Series Analysis Notes May 2021
13.1 INTRODUCTION
In the previous block, we have discussed simple and multiple linear
regression analysis where we have dealt with bivariate as well as
multivariate data. While studying that block, you have learnt how useful
regression analysis is in decision making. If we carefully analyse most
decisions and actions of the Government, an institution, an industrial
organisation or an individual, we find that, to a large extent, these depend on
the situations expected to arise in future. For example, suppose the Delhi
Government wishes to frame a housing development policy for providing
houses to all families of the central government employees in Delhi over the
next five years. Then the Government would like to know: What would the
number of families of government employees in Delhi be in the next five
years? A similar assessment is required while formulating the employment
policy, and so on.
Planning for the future is an essential aspect of managing an organisation. This
requires that we should be able to forecast the future requirements of that
organisation. For example, suppose we are asked to provide quarterly
forecasts of the sales volume for a particular product during the coming
one-year period. Our forecasts for sales would also affect production
schedules, raw material requirements, inventory policies, and sales quotas.
A good
forecast of the future requirements will result in good planning. A poor
forecast results in poor planning and may result in increased cost. In order to
provide such forecasts, we use historical data of the past few years to assess
the average requirement, trend (if any) over the years and seasonal
variations. Based on these features observed from the past data, we try to
understand their role in causing variability and use them for forecasting
requirements.
This exercise is done with the help of time series analysis. A time series
is a collection of observations made sequentially over a period of time. The
main objectives of time series analysis are description, explanation and
forecasting. It has applications in many fields including economics,
engineering, meteorology, etc.
In this unit, we discuss the concept of time series and explain different types
of time series in Sec. 13.2. In Secs. 13.3 and 13.4, we describe different
components and basic models of time series. We explain the methods of
smoothing and filtering the time series data along with the estimation of
trend by the curve fitting and curvilinear methods in Secs. 13.5 and 13.6.
Finally, in Sec. 13.7, we describe the methods of measurement of trend and
cyclic effect in time series data.
In the next unit, we shall discuss some methods for estimating the seasonal
component (S). We shall also discuss the method of estimating the trend
component from deseasonalised time series data.
Objectives
After studying this unit, you should be able to:
explain the concept of time series;
describe the components of time series;
explain the basic models of time series;
decompose a time series into its different components for further
analysis;
describe the trend component of the time series;
describe different types of trends;
explain various methods for smoothing time series and estimation of
trends; and
describe the centred moving average method of measuring the trend
effect.
A set of observations recorded with respect to time over a span of time is
called a time series. Normally, we
assume that observations are available at equal intervals of time, e.g., on an
hourly, daily, monthly or yearly basis. Some time series cover a period of
several years.
The methods of analysing time series constitute an important area of study
in statistics. But before we discuss time series analysis, we would like to
show you the plots of some time series from different fields. In the next
three sub-sections, we look at the plots of three time series, namely, time
series with trend effect, time series with seasonal effect and time series
with cyclic effect. These plots are called time plots.
[Time plots omitted: one showing profit (in lakhs) over about 30 periods, and one showing the mortality rate for the years 1991-2009.]
Fig. 13.5: Number of employees in software industries for the last 25 years.
So far, you have learnt about different types of time series plots which
exhibit different trends in data. These trends arise due to the effect of
various factors on the variations in data. The variations in the values or data
are also described in terms of components of time series. Let us learn about
them.
Note that some time series may not show any trend at all. The shifting in level is
usually the result of changes in the population, demographic characteristic
of the population, technology, consumer preferences, purchasing power of
the population, and so on.
You should clearly understand that a trend is a general, smooth, long term
and average tendency of a time series data. The increase or decrease may
not necessarily be in the same direction throughout the given period. A time
series may show a linear or a nonlinear (curvilinear) trend. If the time series
data are plotted on a graph and the points on the graph cluster more or less
around a straight line, the tendency shown by the data is called linear trend
in time series. But if the points plotted on the graph do not cluster more or
less around a straight line, the tendency shown by the data is called
nonlinear or curvilinear trend. Trend need not always be a straight line. It
can be quadratic, exponential or may not be present at all.
Trend is also known as long term variation. However, do understand that
long term or long period of time is a relative term which cannot be defined
uniformly. In some situations, a period of one week may be fairly long
while in others, a period of 2 years may not be long enough.
Cyclic variations typically last from three to ten years (see Fig. 13.5). The
cycles could be caused by a period of moderate inflation followed by a
period of high inflation. However, the
existence of such business cycles leads to some confusion about cyclic,
trend and seasonal effects. To avoid this confusion, we shall term a pattern
in the time series as cyclic component only when its duration is longer
than one year.
The cyclic variations in a time series are usually called "business cycles"
and comprise four phases of a business, namely, prosperity (boom),
recession, depression and recovery. These normally span seven to eleven
years. Thus, the oscillatory variations with a period of oscillation of
more than one year are called cyclic variations or the cyclic component
in a time series. One oscillation period is called one cycle.
Fig. 13.6 shows the plot of this quarterly data. Note that there are 28
quarters from the year 2001 to 2007 and so 28 values are plotted in
Fig. 13.6. These have been numbered from 1 onwards on the horizontal axis.
We have connected the data points by a dotted curve to obtain a time series
plot of the data. From the plot, we note that the values exhibit an upward
linear trend over the long term. We show this trend by a thin straight line in
Fig. 13.6. So the thin straight line in Fig. 13.6 reflects the presence of a
long-term linear trend. We also notice a seasonal variation in the data, which
we show by a smooth thick free-hand curve. This thick curve shows the
approximate movement around the straight trend line.
[Figure omitted: quarterly sales values (vertical axis 0-900) with the fitted linear trend line and seasonal indices, plotted against quarters 1-28.]
Fig. 13.6: Trend and seasonal variation in quarterly sales of washing machine.
In Fig. 13.6, in most years, the first quarter is a low point and then there is a
rise in the second quarter, a decline in the third quarter and a rise in the
fourth quarter. This could be due to changes in seasons and festival offers.
For example, annual production data of a crop does not have seasonal variations. Similarly, a
time series for the annual rainfall does not contain cyclic variations.
In the additive model, we have assumed that the time series is the sum of the
trend, cyclic, seasonal and irregular components. Generally, the additive
model is appropriate when seasonal variations do not depend on the trend of
the time series. However, there are a number of situations where the
seasonal variations exhibit an increasing or decreasing trend over time. In
such cases, we use the multiplicative model.
Solution: In this case, m = 3 years. The first value of the moving averages
for m = 3 years is the average of 17, 22 and 18, which is 19. The second
value of moving averages is obtained by discarding the first value, i.e., 17
and taking the average of the next 3 values in the time series, i.e., 22, 18 and
26. So we take the average of 22, 18 and 26, which is 22. Again, we discard
the first value, i.e., 22 and take the average of the next 3 values in the time
series, i.e., 18, 26 and 16. It is 20. We repeat this procedure for calculating
the remaining 3-year moving averages. Table 3 gives the 3-year simple
moving averages for the data given in Table 2. Note that each moving
average is tabulated at the average (centre) of the time period for which it is
computed. This method is, therefore, called the centred moving averages.
Table 3: Centred simple moving averages for time series data of Table 2
Remember that moving averages vary less than the data values from which
they are calculated, as they smooth (or filter) out the effect of the irregular
component. This helps us in appreciating the effect of trend more clearly.
For the given data the original time series varies between 16 and 27 whereas
the moving averages vary between 19 and 23, which is much smoother than
the original series. Fig. 13.7 shows the plot of the original time series and
the 3-year moving averages series. In this figure, you can clearly see the
smoothing property of moving averages.
[Fig. 13.7: plot of the original output series (in thousands) and its 3-year moving averages, 1976-1981.]
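The computation above is easy to express in code. Below is a minimal Python sketch of the 3-year simple moving average, applied to the output data (in thousands) of Table 2; the function name is ours, not from any library.

```python
def moving_average(values, m):
    """Return the m-period simple moving averages of a list of values."""
    return [sum(values[i:i + m]) / m for i in range(len(values) - m + 1)]

output = [17, 22, 18, 26, 16, 27]            # output (in thousands), 1976-1981

# Each 3-year average is centred on the middle year of its window (1977-1980).
for year, value in zip(range(1977, 1981), moving_average(output, 3)):
    print(year, value)                        # 19.0, 22.0, 20.0, 23.0
```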
You may ask: What should the value of m be? If m is increased, the series
becomes much smoother and it may also smooth out the effect of cyclical
and seasonal components, which are our main interest of study. Sometimes
3-year, 5-year or 7-year moving averages are used to expose the combined
trend and cyclical movement of time series. But as we shall see in Sec. 13.6,
four quarter or 12-months moving averages are more useful for estimating
trend, seasonal and cyclical movement of the series.
The reason for giving larger weight to the latest observation rather than the
earlier observations is that the latest observation is a better predictor of the
trend than the earlier one. However, this may not be true for the cases in
which the data contains a very large irregular component.
[Figure omitted: output (in thousands) together with the weighted moving average series, 1976-1981.]
where 0 < w < 1. The value of w is chosen as per the requirements.
Equation (3) gives a weighted average of y1, y2, …, yt for calculating the
smoothed series y′t. You can see that the latest observation yt gets the
maximum weight and then the weights decrease exponentially. Note that the
sum of the weights is equal to one. This is a very popular technique for
forecasting purposes. Let us take up an example to explain this method.
Example 3: Find the forecast for the time series given in Example 1.
Solution: On plotting the given data, it appears that there is no trend in the
time series.
This is the same as saying that if T = a + bt is the trend then b = 0.
Therefore, if we fit a straight line to the given data with b = 0, the least
squares estimate of a would be the simple average of the time series
values, i.e.,
â = (Σ yt)/6 = 126/6 = 21.0
In this way, the forecast for output for all t would be â as the model is
T = a + (forecast error)
[Figure omitted: plot of the output series (in thousands), 1976-1981, showing no apparent trend.]
We need an initial value for the computation of the smoothed values y′1, y′2, y′3, ….
According to this technique, we also select a single weight w, which lies between
0 and 1. To start with, let us take w = 0.01. Then from equation (3), we get

y′1 = 0.01 y1 + 0.99 y1 = 0.01 × 17 + 0.99 × 17 = 17

Our forecast for y1 at t = 0 is y′1. Therefore, the forecast error is

e1 = y1 − y′1 = 17 − 17 = 0.00

We get the second forecast value as follows:

y′2 = 0.01 y2 + 0.99 y′1 = 0.01 × 22 + 0.99 × 17 = 17.05

The forecast error at t = 2 is

e2 = y2 − y′2 = 22 − 17.05 = 4.95
Proceeding in this way, we calculate the forecast value y′t and forecast error
et for all t. These are calculated and given in the following table:
Table 5: Forecast values and forecast errors for time series given in Table 2
Year    Output (in thousands) yt    Forecast y′t    Forecast error et
1976 17 17.00 +0.00
1977 22 17.05 +4.95
1978 18 17.06 +0.94
1979 26 17.15 +8.85
1980 16 17.14 −1.14
1981 27 17.24 +9.76
After calculating the forecasts y′t and errors et for all t, we plot the forecast
values with the original time series values (Fig. 13.10). The graph of time
series shows that there are no peaks in the time series after smoothing.
Fig. 13.10: Original and smoothed time series values of output (in thousands).
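The recursion used above can be sketched in Python as follows. The initialisation y′1 = y1 and the weight w = 0.01 follow the worked example; the function and variable names are ours.

```python
def exponential_smoothing(values, w):
    """Smooth a series with y't = w*yt + (1 - w)*y't-1, starting at y'1 = y1."""
    smoothed = [values[0]]
    errors = [0.0]                            # e1 = y1 - y'1 = 0 by construction
    for y in values[1:]:
        s = w * y + (1 - w) * smoothed[-1]    # new smoothed value
        smoothed.append(s)
        errors.append(y - s)                  # forecast error, as in Table 5
    return smoothed, errors

output = [17, 22, 18, 26, 16, 27]             # output (in thousands), 1976-1981
smoothed, errors = exponential_smoothing(output, w=0.01)
for y, s, e in zip(output, smoothed, errors):
    print(y, round(s, 2), round(e, 2))        # reproduces Table 5
```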
You should try to solve a few problems to check your understanding of the
concepts discussed so far.
E1) Calculate the last two values of 3-year moving average for the data
given in Example 1.
E2) Use exponential smoothing to obtain filtered values for the data given
in Example 1 taking w = 0.5 and compare them with simple moving
average values obtained in the same example.
E3) Obtain filtered values for the following data using exponential smoothing:
Year Rainfall Year Rainfall Year Rainfall Year Rainfall
(in cm) (in cm) (in cm) (in cm)
1970 664 1981 548 1991 624 2001 468
1971 728 1982 417 1992 473 2002 554
1972 447 1983 387 1993 750 2003 744
1973 663 1984 590 1994 343 2004 943
1974 630 1985 556 1995 484 2005 582
1975 451 1986 292 1996 545 2006 581
1976 617 1987 327 1997 419 2007 437
1977 734 1988 494 1998 798 2008 417
1978 491 1989 448 1999 334 2009 617
1979 520 1990 704 2000 465 2010 571
1980 280
We shall not discuss it here in detail as you have already studied the method
of least squares and curve fitting in Unit 5 of MST-002. In the next section,
we shall discuss the case when p = 1, i.e., the case of a linear trend equation.
For the linear trend yt = a1 + b1 xt, with xt = t − t̄, the normal equations are

Σ yt = n a1 + b1 Σ xt
Σ yt xt = a1 Σ xt + b1 Σ xt²   … (9a)

On solving the above equations, the least squares estimates of a1 and b1 are
given by â1 and b̂1 as:

â1 = ȳ and b̂1 = (Σ yt xt) / (Σ xt²)   … (9b)

In terms of the original time variable t, the intercept is

b̂0 = ȳ − b̂1 t̄   … (10)
Example 6: Fit a straight line trend for the data of annual profit of a
company given below:
Year 2003 2004 2005 2006 2007 2008 2009 2010 2011
Profit (in crores) 93 102.8 126.7 103.5 105.7 133.2 156.7 175.7 161.6
Solution: From the given data, we obtain linear trend values for the annual
profit as follows:
Table 6: Trend values for the given time series
From the above table, we calculate the least squares estimates as follows:

â1 = ȳ = 1158.9/9 = 128.76 and b̂1 = (Σ yt xt) / (Σ xt²) = 582.8/60 = 9.71

The trend values are then obtained from

ŷt = b̂0 + b̂1 t = ȳ + b̂1 (t − t̄)
These points are used to plot the trend line along with the data and projected
value for 2013 (Fig. 13.11).
Fig. 13.11: Computed trend line for the data of profit of a company.
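The estimates of Example 6 can be reproduced with a short sketch like the one below; it implements equations (9b) and (10) with xt = t − t̄, in plain Python.

```python
profit = [93, 102.8, 126.7, 103.5, 105.7, 133.2, 156.7, 175.7, 161.6]
years = list(range(2003, 2012))

n = len(profit)
t_bar = sum(years) / n                        # 2007
y_bar = sum(profit) / n                       # 128.77 (the text truncates to 128.76)
x = [t - t_bar for t in years]                # -4, -3, ..., 4

b1 = sum(xi * yi for xi, yi in zip(x, profit)) / sum(xi ** 2 for xi in x)
print(round(b1, 2))                           # 582.8 / 60 = 9.71

# Trend value and projection: y_hat(t) = y_bar + b1 * (t - t_bar)
print(round(y_bar + b1 * (2013 - t_bar), 1))  # projected profit for 2013, ~187.0
```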
The projections are forecasts of future trend values but they do not take into
account the cyclic effect. Sometimes cyclic effects are confused with
higher-degree polynomial trend curves.
For the quadratic trend, the normal equations are

Σ xt yt = b0 Σ xt + b1 Σ xt² + b2 Σ xt³
Σ xt² yt = b0 Σ xt² + b1 Σ xt³ + b2 Σ xt⁴   … (13)
Proceeding in the same way as in Sec. 13.6.1, the normal equations for
estimating b0, b1 and b2 are given by

Σ yt = n b0 + b1 Σ xt + b2 Σ xt²
Σ xt yt = b0 Σ xt + b1 Σ xt² + b2 Σ xt³   … (ii)
Σ xt² yt = b0 Σ xt² + b1 Σ xt³ + b2 Σ xt⁴
The values of Σ yt, Σ xt yt, Σ xt² yt, Σ xt², Σ xt³ and Σ xt⁴ are
obtained from the given data as follows:
Table 7: Calculations for fitting of quadratic trend
Year    yt     xt = t − 1999    xt yt    xt² yt    xt²    xt³    xt⁴
1991    240    −8    −1920    15360    64    −512    4096
1992    167    −7    −1169     8183    49    −343    2401
1993    140    −6     −840     5040    36    −216    1296
Putting the values from the above table in the normal equations given in
equation (ii), we get
17 b 0 + 0 b1 + 408 b2 = 5469
0 b0 + 408 b1+ 0 b2 = 15126
408 b0 + 0 b1+ 17544 b2 = 165144
Solving these equations, we get b0 = 216.79, b1 = 37.07 and b2 = 4.3715.
With these values, we get the desired quadratic trend (written in terms of
t = xt + 9, i.e., t = 1, 2, …, 17):

ŷt = 237.22 − 41.614 t + 4.3715 t²
Fig. 13.12: Plot of the fitted quadratic trend model for the data of Example 7.
Yt = β0 + β1 t

where Yt = log yt, β0 = log α0 and β1 = log α1 (logarithms to base 10 here).
Now, we can fit the model with Yt and t as described in Sec. 13.6.1. Once
we know the estimates of β0 and β1, i.e., β̂0 and β̂1, we can obtain the
estimates of α0 and α1, i.e., the values of α̂0 and α̂1, by taking the
antilogarithm of β̂0 and β̂1, respectively. When we fit the exponential model
to the data given in Example 7, we get the values β̂0 = 2.57 and β̂1 = 0.0494,
respectively, and the estimated exponential trend equation is obtained as:

yt = 371.535 (1.1205)^t
The fitted and raw data are plotted in Fig. 13.13.
Fig. 13.13: Plot of fitted exponential trend model to the data of Example 7.
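Both fits can be sketched with numpy's polyfit, which solves the same normal equations. The series below is illustrative only; the full data of Example 7 is not reproduced in these notes.

```python
import numpy as np

y = np.array([240.0, 250.0, 268.0, 291.0, 320.0, 356.0, 400.0])  # illustrative
t = np.arange(1, len(y) + 1)

# Quadratic trend y = b0 + b1*t + b2*t^2.
b2, b1, b0 = np.polyfit(t, y, deg=2)          # coefficients, highest power first
print(round(b0, 2), round(b1, 2), round(b2, 2))

# Exponential trend y = a0 * a1**t, fitted as log10(y) = beta0 + beta1 * t.
beta1, beta0 = np.polyfit(t, np.log10(y), deg=1)
a0, a1 = 10 ** beta0, 10 ** beta1             # anti-logarithms, as in the text
print(round(a0, 3), round(a1, 4))
```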
E4) Use the data given in Example 1 to fit a linear trend line and obtain
the projection for the year 2012.
Example 8: Compute the trend values for the given data for quarterly sales
of washing machines by an appliance manufacturer for the period
2001-2009.
Year Quarter 1 Quarter 2 Quarter 3 Quarter 4
2001 935 1215 1045 1455
2002 990 1315 1350 1485
2003 1370 1815 1470 1680
2004 1160 1365 1205 1445
2005 1030 1475 1195 1585
2006 1185 1330 1500 2145
2007 1410 2120 1915 2390
2008 1875 2145 1965 2800
2009 1865 2115 1935 2165
Solution: We shall use the data of four years 2001-2004 from the given data
to explain the calculations of moving averages, which are estimates of T and
C. The given time series data is shown in Fig. 13.14.
Fig. 13.14: Time series plot for quarterly sales of washing machines from 2001-2009.
We see from the plot (Fig. 13.14) that the series has a very strong 12-monthly
seasonal effect. Hence, to remove the seasonal effect, we have to take
moving averages with m = 4.
Table 8: Calculation of centred moving averages
Year    Quarter    Sales (in Hundreds)    MA(1)      Centred MA(2)
2001       1          935                   —             —
           2         1215                   —             —
                                          1162.5
           3         1045                              1169.375
                                          1176.25
           4         1455                              1188.75
                                          1201.25
2002       1          990                              1239.375
                                          1277.5
           2         1315                              1281.25
                                          1285.0
           3         1350                              1332.5
                                          1380.0
           4         1485                              1442.5
The values in MA(1) are the moving averages for m = 4 but they do not
correspond to any of the given four quarters as the average of 1, 2, 3 and 4 is
2.5. Hence, to make it correspond to a quarter, we usually calculate moving
averages with m = 2 on MA(1) so that the centred values correspond to one
of the four quarters. MA(2) gives the moving average of MA(1) series with
m = 2 so that the values correspond to quarters 3, 4, 1, 2,.. etc.
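The double averaging of Table 8 can be sketched as below; the printed values match the MA(1) and centred MA(2) columns for 2001-2002.

```python
def moving_average(values, m):
    """Return the m-period simple moving averages of a list of values."""
    return [sum(values[i:i + m]) / m for i in range(len(values) - m + 1)]

sales = [935, 1215, 1045, 1455, 990, 1315, 1350, 1485]   # 2001 Q1 - 2002 Q4

ma1 = moving_average(sales, 4)     # falls midway between two quarters
ma2 = moving_average(ma1, 2)       # centred on quarters 3, 4, 1, ...

print(ma1)                         # 1162.5, 1176.25, 1201.25, 1277.5, 1285.0
print(ma2)                         # 1169.375, 1188.75, 1239.375, 1281.25
```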
You should try to solve the following exercises.
E5) Compute MA(1), the moving average values for m = 4 and MA(2),
the moving average values for m = 2 of MA(1) for the remaining
years of the period 2005-2009 for the data given in Example 8.
E6) Compute the moving average (MA) values for m = 3 for time series
for the period 2001-2009 for the data given in Example 8.
Let us now summarise the concepts that we have discussed in this unit.
13.8 SUMMARY
1. A good forecast of the future requirements will result in good planning.
A poor forecast results in poor planning and may lead to increased cost.
In order to provide such forecasts, we use historical data of the past few
years to assess the average requirement, trend (if any) over the years and
seasonal variations. Based on these features observed from the past data,
we try to understand their role in causing variability and use them for
forecasting requirements.
2. A time series is a collection of observations made sequentially over a
period of time. The main objectives of time series analysis are
description, explanation and forecasting. It has applications in many
fields including economics, engineering, meteorology, etc.
3. A trend is a long term smooth variation (increase or decrease) in the
time series. When values in a time series are plotted in a graph and, on
an average, these values show an increasing or decreasing trend over a
long period of time, the time series is called the time series with trend
effect.
4. If values in a time series reflect seasonal variation with respect to a
given period of time such as a quarter, a month or a year, the time series
is called a time series with seasonal effect. If the time plot of data in a
time series exhibits a cyclic trend, it is called the time series with cyclic
effect.
26
5. The long term variations, i.e., the trend component, and short term
variations, i.e., the seasonal and cyclic component, are known as
regular variations. Apart from these regular variations, random or
irregular variations, which are not accounted for by trend, seasonal or
cyclic variations, exist in almost all time series.
6. The additive model is one of the most widely used models. It is based
on the assumption that at any time t, the time series value Yt is the sum
of all the components. According to the additive model, a time series can
be expressed as
Yt = Tt + Ct + St + It
where Tt, Ct, St and It are the trend, cyclic, seasonal and irregular
variations, respectively, at time t.
7. The multiplicative model is based on the assumption that the time series
value Yt at time t is the product of the trend, cyclic, seasonal and
irregular components of the series:

Yt = Tt × Ct × St × It
where Tt, Ct, St and It denote the trend, cyclic, seasonal and irregular
variations, respectively. The multiplicative model is found to be
appropriate for many business and economic data.
8. There are two methods of moving averages: the equal weight or
simple moving averages method and the weighted (unequal)
moving average method. The methods of moving averages and
exponential smoothing are used for smoothing or filtering the time series
data.
9. In the exponential smoothing technique, the weights decrease
exponentially. The smoothed value is given by

y′t = w yt + (1 − w) y′t−1

where 0 < w < 1. The value of w is chosen as per the requirements.
10. An alternative approach to smoothing is to fit a polynomial to the data.
This treats smoothing as a regression problem in which yt is the trend
value and integral powers of time t are the explanatory variables. The
resulting smooth function is a polynomial

yt = b0 + b1 t + b2 t² + … + bp t^p
13.9 SOLUTIONS/ANSWERS
E1) Year MA
1979 (18 + 26 + 16)/3 = 20
1980 (26 + 16 + 27)/3 = 23
E2) Using equation (1), we obtain the exponential smoothing values as
follows:
y2 = 0.5x2 +0.5x1 = 0.5 × 22 + 0.5 × 17 = 19.50
y3 = 0.5x3 + 0.25x2 + 0.25x1
= 0.5 × 18 + 0.25 × 22 + 0.25 × 17 = 18.75
y4 = 0.5x4 + 0.25x3 + 0.125x2 + 0.125x1
= 0.5 × 26 + 0.25 × 18 + 0.125 × 22 + 0.125 × 17 = 22.375
y5 = 0.5x5 + 0.25x4 + 0.125x3 + 0.0625x2 + 0.0625x1
= 0.5 × 16 + 0.25 × 26 + 0.125 × 18 + 0.0625 × 22
+ 0.0625 × 17 = 19.1875
y6 = 0.5x6 + 0.25x5 + 0.125x4 + 0.0625x3 +0.03125 x2 + 0.03125x1
= 0.5 × 27 + 0.25 × 16 + 0.125 × 26 + 0.0625 × 18
+ 0.03125 × 22 + 0.03125 × 17 = 23.09375
E3) If we plot the given data in a graph, there seems to be no trend in the
time series.
If we fit a linear regression to this data with the equation T = a + bt,
it seems that b = 0. Therefore, the least squares estimate of the trend
would be the same for all 41 time series values. Also, our forecast for
rainfall is â for all t because our model is
T = a + (forecast error)
This result of forecast seems somewhat unreasonable because a
particular place cannot have a constant amount of rainfall every year.
From Fig. 13.15, you can see that the values of yt seem to be higher
during the period 1970 to 1980 than during the period 1980 to 1990.
Therefore, it is logical to assume that the value of a is gradually
changing over time, and to estimate it by the smoothed value y′t rather
than by a constant a.
Fig. 13.15: Time series plot for rainfall (in cm) from 1970 to 2009.
After calculating the forecast y′t and the error et for all t, we plot the
time series values along with the forecasted values, which are the
outcomes after smoothing the time series values and observe the
change in the time series values before and after the smoothing
(Fig. 13.16). We can also see that the peaks barely exist in the graph
of the time series after smoothing.
14.1 INTRODUCTION
In Unit 13, you have learnt that time series can be decomposed into four
components: Trend (T), Cyclic (C ), Seasonal (S) and Irregular
Component (I). Our aim is to estimate T, C and S components and use them
for forecasting. We have already described some methods for smoothing or
filtering the time series, namely, the simple moving average method,
weighted moving average method and exponential smoothing method in
Unit 13. We have also described some methods for estimating Trend and
Cyclic components, i.e., the method of least squares and the moving average
method in Unit 13.
When time series data do not contain any trend and cyclic components but
reflect seasonal variation, we have to estimate the seasonal component S by
removing irregular components. In Sec. 14.2 of this unit, we discuss some
methods for estimating the seasonal component (S), namely, the simple
average method, the ratio to moving average method and the ratio to trend
method. If the effect of seasonal variation is not removed from the time
series data, the trend estimates are also affected. Therefore, we have to
deseasonalise the data by dividing it by the corresponding seasonal
indices. Once the data is free from seasonal effects, we estimate the trend
equation using the method of least squares as explained in Sec. 14.3. In
Sec. 14.4, we explain how to use data for forecasting purposes once we
have estimated the trend, cyclic and seasonal components of the time
series.
In the next unit, we shall discuss the stationary time series and explain the
stationary processes, i.e., weak and strict stationary processes. We shall also
discuss the autocovariance, autocorrelation function and correlogram of a
stationary process.
Objectives
After studying this unit, you should be able to:
explain the effect of seasonal variation in time series;
apply the simple average method for estimating seasonal indices;
apply the ratio to moving average method and ratio to trend method for
estimation of seasonal indices;
describe the method of estimation of the trend component from
deseasonalised time series data; and
use trend (T), cyclical (C) and seasonal (S) components for forecasting
purposes.
In the additive model, the seasonal indices St are normalised so that their
sum over the months in a year is zero. In the multiplicative model, they are
normalised so that their average is 100. In both cases, the forecast of the
yearly output is not affected by
the seasonal effect St and we can work on yearly totals to estimate the trend.
In such cases, we can estimate St by working on the annual data. However,
in many cases, we may be interested in forecasting monthly (or quarterly)
figures. This requires monthly (or quarterly) estimate of the seasonal
index St.
In many cases it has been found that seasonal effects increase with increase
in the mean level of time series. Under these circumstances, it may be more
appropriate to use the multiplicative model. If seasonal effects remain
constant, the additive model is more appropriate. The classical approach is
to consider the multiplicative model and estimate seasonal effect (St) for
forecasting purposes. In this unit, we use the classical multiplicative model.
We describe two methods for estimating seasonal indices based on
the ratios of time series observation (Y) and estimated trend and cycle
effects:
Yt / (Tt Ct) = (Tt Ct St It) / (Tt Ct) = St It   … (3)
Step 3: After the monthly averages are calculated, we calculate the average
of the monthly averages, that is,

ȳ = (ȳ1 + ȳ2 + … + ȳ12) / 12
Step 4: After calculating the average ȳ, we express the monthly averages ȳi
as percentages of the average ȳ. These percentages are known as
seasonal indices. Thus,

Seasonal index for the ith month = (ȳi / ȳ) × 100, for i = 1, 2, …, 12.   … (4)
Let us consider an example to explain this method.
Example 1: Determine the monthly seasonal indices for the following data
of production of a commodity for the years 2010, 2011 and 2012 using the
method of simple averages.
Years Production in Tonnes
Months 2010 2011 2012
January 120 150 160
February 110 140 150
March 100 130 140
April 140 160 160
May 150 160 150
June 150 150 170
July 160 170 160
August 130 120 130
September 110 130 100
October 100 120 100
November 120 130 110
December 150 140 150
Solution: We are given the monthly production of a commodity for 3 years. For
calculating the monthly seasonal indices, we first calculate the month-wise
total production for the 3 years. Then we calculate the monthly averages for
all 12 months. Note from Table 1 that for January, it is 143.3, for February,
it is 133.3, …, etc. Next, we calculate the average of all monthly averages,
i.e.,

ȳ = (1/12) (143.3 + 133.3 + … + 146.6) = 136.625
Now we calculate the seasonal indices by taking the percentage of the
monthly averages ȳi to the combined average ȳ, one at a time, for
i = 1, 2, …, 12:

Seasonal index for January = (143.3/136.625) × 100 = 104.886
Seasonal index for February = (133.3/136.625) × 100 = 97.566
Seasonal index for March = (123.3/136.625) × 100 = 90.247
Seasonal index for April = (153.3/136.625) × 100 = 112.205
Seasonal index for May = (153.3/136.625) × 100 = 112.205
Seasonal index for June = (156.6/136.625) × 100 = 114.620
Seasonal index for July = (163.3/136.625) × 100 = 119.524
Seasonal index for August = (126.6/136.625) × 100 = 92.662
Seasonal index for September = (113.3/136.625) × 100 = 82.928
Seasonal index for October = (106.6/136.625) × 100 = 78.024
Seasonal index for November = (120/136.625) × 100 = 87.832
Seasonal index for December = (146.6/136.625) × 100 = 107.301
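A sketch of the simple average method applied to the data of Example 1 is given below. The unit rounds the monthly averages to one decimal place before computing the indices, so the values printed here differ slightly in the last decimals.

```python
production = {   # production in tonnes for 2010, 2011, 2012 (Example 1)
    "Jan": [120, 150, 160], "Feb": [110, 140, 150], "Mar": [100, 130, 140],
    "Apr": [140, 160, 160], "May": [150, 160, 150], "Jun": [150, 150, 170],
    "Jul": [160, 170, 160], "Aug": [130, 120, 130], "Sep": [110, 130, 100],
    "Oct": [100, 120, 100], "Nov": [120, 130, 110], "Dec": [150, 140, 150],
}

monthly_avg = {m: sum(v) / len(v) for m, v in production.items()}
grand_avg = sum(monthly_avg.values()) / 12

for month, avg in monthly_avg.items():
    index = avg / grand_avg * 100      # equation (4)
    print(month, round(index, 3))      # Jan ~ 104.88, Feb ~ 97.56, ...
```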
14.2.2 Ratio to Moving Average Method
In Sec. 14.2.1, we have discussed the simple average method for calculating
seasonal indices. Now we discuss the most widely used method known as
the ratio to moving average method. It is better because of its accuracy.
Also the seasonal indices calculated using this method are free from all the
three components, namely, trend (T), cyclic (C) and Irregular variations (I).
As you have learnt in Unit 13, the moving average eliminates periodic
variations if the span of period of moving average is equal to the period of
the oscillatory variation sought to be eliminated. Therefore, we have to
choose the span of time for the moving average to be equal to one cycle. For
example, if a cycle is completed in 3 months, we calculate the moving
average for 3 months. You may note that for some quarters, three values are
available and for some quarters, four values are available. By taking the
average over three or four values, we get seasonal indices Si for i = 1, 2, 3, 4.
We usually normalise them so that their mean is 100 by dividing them by
the mean of Si and multiplying by 100. These normalised Si have mean 100.
Usually, not much difference exists between normalised and non-normalised
seasonal indices Si . When data are monthly, the same procedure will yield
twelve monthly seasonal indices Si . This method of estimating seasonal
indices is known as the ratio to moving average method.
We have explained the ratio to moving average method for monthly time
series data. The same method may be applied for any other periodic data
such as quarterly, weekly data, etc. The steps for obtaining seasonal indices
using this method are as follows:
Step 1: We arrange the data chronologically.
Step 2: If the cycle of oscillation is 1 year, we take the 12-month moving
average of the first year, which will give estimates of the combined
effects of trend and cyclic fluctuation. We enter the average value
against the middle position, i.e., between the months of June and
July.
Step 3: We discard the value for the month of January of the first year and
include the value for the month of January of the subsequent year.
Then we calculate the average of these 12 values and enter it
against the middle position, i.e., between July and August. We
repeat the process of taking moving averages MA (1) and entering
the value in the middle position, till all the monthly data are
exhausted.
Step 4: We calculate the centred moving average, i.e., MA(2), of the two
values of the moving averages MA(1) and enter it against the first
value, i.e., the month of July in the first year and subsequent values
against the month of August, September, etc.
Step 5: After calculating the MA(1) and MA(2) values, we treat the
original values (except the first 6 months in the beginning and the
last 6 months at the end) as the percentage of the centred moving
average values. For this we divide the original monthly values by
the corresponding centred moving average, i.e., MA (2) values,
and multiply the result by 100. We have now succeeded in
eliminating the trend and cyclic variations from the original data.
We now have to get rid of the irregular variations.
Step 6: We prepare another two-way table consisting of the month-wise
percentage values calculated in Step 5, for all years. The purpose
of this step is to average the percentages and to eliminate the
irregular variations in the process of averaging.
Step 7: We find the median of the percentages or preliminary seasonal
indices calculated in Step 5 month-wise and take the average of the
month-wise median. Then we divide the median of each month by
the average value and multiply it by 100. Generally, the sum of all
medians is not 1200. Therefore, the average of all medians is not
equal to 100. Hence, the seasonal indices are subjected to the same
operation. We multiply the medians by the ratio of the expected total
of indices, i.e., 1200, to the actual total.
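A compact sketch of the procedure for quarterly data follows, using the first three years of Example 8 (Unit 13) for illustration. For brevity it averages the ratios with the mean rather than the median of Step 7, and normalises the four indices to a total of 400.

```python
import numpy as np

y = np.array([935, 1215, 1045, 1455, 990, 1315, 1350, 1485,
              1370, 1815, 1470, 1680], dtype=float)      # 2001-2003, quarterly

ma1 = np.convolve(y, np.ones(4) / 4, mode="valid")       # 4-quarter MA(1)
ma2 = (ma1[:-1] + ma1[1:]) / 2                           # centred MA(2)

# Ratio of data to centred moving average estimates S*I (as a percentage).
# MA(2) starts at the third quarter, so align y accordingly.
ratios = y[2:2 + len(ma2)] / ma2 * 100

quarter = np.array([(i + 2) % 4 for i in range(len(ratios))])  # 0 = Q1
indices = np.array([ratios[quarter == q].mean() for q in range(4)])
indices *= 400 / indices.sum()                           # normalise total to 400
print(np.round(indices, 2))                              # indices for Q1-Q4
```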
Once we have obtained S × I, we can take the average to eliminate the effect
of irregularity I. This gives the seasonal indices, Si. Now we prepare a two-way
table which will include the percentage value of column (6) of Table 2
month-wise for every year as follows:
Table 3: Seasonal indices for the given time series
Solution: We are given the time series data for 5 years of quarterly
sales of a commodity. To compute the seasonal indices, we first determine
the trend value for the yearly averages (Y) by fitting a linear trend by the
method of least squares. The following table is constructed for fitting the
straight line:
Y = a + bU
where U = X − X̄
Table 4: Trend values for the yearly averages
For the straight line Y = a + bU, the normal equations for estimating a and
b are:

Σ Y = n a + b Σ U
Σ UY = a Σ U + b Σ U²

Now we put the values of Σ Y, Σ U, Σ UY and Σ U² in the normal
equations:

5 a = 2800 ⇒ a = 560
10 b = −1200 ⇒ b = −120
On putting these values of a and b in the equation of the straight line
Y = a + bU, we get the trend line for the given time series data as:
Y = 560 – 120 U
Thus, the trend values for each value of U are obtained as follows:
U = –2, Y = 560 – 120 (–2) = 800
U = –1, Y = 560 – 120 (–1) = 680
U = 0, Y = 560 – 120 ( 0 ) = 560
U = 1, Y = 560 – 120 ( 1 ) = 440
U = 2, Y = 560 – 120 ( 2 ) = 320
Since the yearly decline in the trend value is −120, the quarterly increment
would be

Quarterly increment = −120/4 = −30
For 2008, the trend value for the middle quarter, i.e., half of the second
quarter and half of the third quarter is 800. Since the quarterly increment is
−30, we obtain the trend value for the 2nd quarter as 800 − (−15) and for the
3rd quarter as 800 + (−15). Thus, these are 815 and 785, respectively.
Consequently, the trend value for the first quarter is 815 − (− 30) = 845 and
for the 4 th quarter, it is 785 + (−30) = 755. Similarly, we can get the trend
values for other years as we have obtained for all the quarters of the year
2008. After calculating the trend values, we also calculate the seasonal
indices for each quarter of every year; dividing by the trend values
eliminates the trend component from the data. This is shown in Table 5.
Table 5: Trend values and seasonal indices for each quarter

         Trend values                       Seasonal indices = (Y/T) × 100
Year   1st Qtr 2nd Qtr 3rd Qtr 4th Qtr    1st Qtr 2nd Qtr 3rd Qtr 4th Qtr   Total
2008 845 815 785 755 94.67 112.88 112.1 108.61 428.26
2009 725 695 665 635 74.48 109.35 102.25 97.64 383.72
2010 605 575 545 515 66.11 100.87 99.08 93.2 359.26
2011 485 455 425 395 70.1 114.28 117.64 111.39 413.41
2012 365 335 305 275 82.19 119.4 118.03 123.63 443.25
The average quarterly seasonal indices obtained above are adjusted to a total
of 400 because the total of the average indices over the four quarters is
405.578, which is greater than 400. So we multiply each quarterly index by
the ratio

K = 400 / Total of Indices = 400 / 405.578 = 0.986
The adjusted seasonal indices for each quarter are given in the last row of
the table.
You may now like to solve the following problems to check your
understanding of the three methods explained so far.
E1) Determine the seasonal indices for the data given below for the
average quarterly prices of a commodity for four years:
Years Quarter I Quarter II Quarter III Quarter IV
2009 554 590 616 653
2010 472 501 521 552
2011 501 531 553 595
2012 403 448 460 480
E2) Calculate the seasonal indices for the following data of production of
a commodity (in hundred tons) of a firm using the ratio to trend
method:
Years Quarter I Quarter II Quarter III Quarter IV
2001 470 531 500 480
2002 340 450 410 380
2003 270 360 340 310
2004 240 330 320 290
2005 220 270 250 240
E3) Apply the ratio to moving average method for calculating the
seasonal indices for the time series data given in Example 8 of
Unit 13.
Year/Quarter    Yt     St       Deseasonalised Zt = (Yt/St) × 100    (Xt − X̄t)    (Xt − X̄t)²    Zt (Xt − X̄t)
2003 Q1        289    113.13    255.46                               1.875        3.515625       478.9875
2003 Q2        241     94.82    254.17                               1.625        2.640625       413.02625
2003 Q3        273    107.43    254.12                               1.375        1.890625       349.415
2003 Q4        232     84.62    274.17                               1.125        1.265625       308.44125
Once the data are deseasonalised, we apply the method of least squares to estimate
the trend equation. The following values are calculated (as given in the above table):
X̄t = 2004.5,  Σ (Xt − X̄t)² = 21.25,  Σ Zt (Xt − X̄t) = 298.995

â = Z̄ = 266.10 and b̂ = Σ Zt (Xt − X̄t) / Σ (Xt − X̄t)² = 298.995/21.25 = 14.07
14.4 FORECASTING
Forecasting is one of the main purposes of time series analysis. It is always
a risky task as one has to assume that the process will remain the same in
the future as in the past, at least for the period for which we are forecasting.
This assumption is not very realistic; however, for short-term forecasting,
it is usually reasonable to assume that the process remains the same as in
the past.
If a time series plot shows that there is no seasonal effect, or on theoretical
basis there is no reason for having a seasonal component (St), we can
estimate the trend component (Tt) and use the trend equation for forecasting.
If it is empirically observed that there is a significant seasonal effect (S) and
on theoretical ground also there is a valid reason for the presence of such a
component, we have to take the seasonal effect into account while
estimating and forecasting. If data are collected on monthly basis and the
period of seasonality is one year, we estimate twelve seasonal indices, one
for each month. If data is quarterly, we estimate four seasonal indices. After
deseasonalising the data, we fit the trend equation. Then we project the trend
for the period for which we have to forecast. Next we adjust it for seasonal
effect by multiplying it by the corresponding seasonal index. This gives the
final forecast value which has been corrected for the seasonal effect. These
steps can be described as follows:
Step 1: Calculate the moving average of suitable order. The order of
moving average is taken as the period of the seasonal effect.
Step 2: Calculate the ratio of data to moving average values so that this ratio
contains the seasonal component (St) and the irregular component (It),
i.e., St × It.
Step 3: Determine the seasonal indices by averaging St × It for the respective
seasons. This gives the estimates of the seasonal component (St).
Step 4: Obtain deseasonalised values by dividing data values by the
corresponding seasonal component (St).
Step 5: Fit the trend equation to deseasonalised data. Compute
deseasonalised forecast value from the trend equation.
Step 6: Adjust the forecast value for seasonal effect by multiplying it by the
corresponding St. If the additive model is used, instead of
multiplying or dividing we add or subtract the corresponding values.
Seasonal effects are normalised so that their sum is equal to zero.
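These steps can be sketched as one small function. The quarterly values are the first two years of Example 8; the seasonal indices are illustrative placeholders, not values computed in this unit.

```python
import numpy as np

def forecast(y, indices, steps_ahead):
    """Forecast a quarterly series given four seasonal indices (mean 100)."""
    n = len(y)
    seasonal = np.tile(indices, n // 4 + 2)[:n + steps_ahead]
    z = y / seasonal[:n] * 100                 # Step 4: deseasonalised values
    b, a = np.polyfit(np.arange(n), z, deg=1)  # Step 5: trend z = a + b*t
    trend = a + b * (n + steps_ahead - 1)      # project the trend forward
    return trend * seasonal[n + steps_ahead - 1] / 100   # Step 6: re-seasonalise

y = np.array([935, 1215, 1045, 1455, 990, 1315, 1350, 1485], dtype=float)
indices = np.array([80.0, 105.0, 95.0, 120.0])           # illustrative only
print(round(forecast(y, indices, steps_ahead=1), 1))     # next quarter (Q1)
```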
Example 5: Using estimated seasonal indices and the fitted trend equation
of Example 4, forecast the value for the first quarter (Q1) of 2007.
Solution: We have fitted the trend equation in Example 4 as:
Ẑt = 266.10 + 14.07 (Xt − X̄t), where X̄t = 2004.5

The projected trend value for Q1 of 2007 is:

Ẑt = 266.10 + 14.07 × (2007.25 − 2004.5) = 266.10 + 38.69 = 304.79

Multiplying by the seasonal index of the first quarter, St = 113.13, the
forecast value is 304.79 × 113.13/100 ≈ 344.8.
E6) Given below are the data of production of a company (in lakhs of
units) for the years 1973 to 1979:
Year 1973 1974 1975 1976 1977 1978 1979
Production 15 14 18 20 17 24 27
i) Compute the linear trend by the method of least squares.
ii) Compute the trend values for each year.
Let us now summarise the concepts that we have discussed in this unit.
14.5 SUMMARY
1. When time series data do not contain any trend and cyclic components
but reflect only seasonal variation, we have to estimate the seasonal
component S by removing irregular components.
2. If the effect of seasonal variation is not removed from the time series
data, the trend estimates are also affected. Therefore, we have to
deseasonalise the data by dividing it by the corresponding seasonal
indices. Once the data is free from the seasonal effects, we estimate the
trend equation using the method of least squares.
3. If a cycle is completed in 3 months, we calculate the moving average for
3 months. By taking the average over all available values, we get
seasonal indices Si for i =1, 2, 3, 4.
4. We usually normalise the seasonal indices so that their mean is 100 by
dividing them by mean of Si for all i and multiplying by 100. The
normalised seasonal indices have mean 100. Usually, there is not
much difference between normalised and non-normalised seasonal
indices.
5. The ratio to trend method provides seasonal indices free from trend
and is an improved version of the simple average method as it assumes
that seasonal variation for a given period is a constant fraction of the
trend.
6. When substantial seasonal component is present in the data, it is
advisable to first remove the seasonal effect from the data.
Otherwise, the trend estimates are also affected by seasonal effects,
which make the estimation unreliable. Hence, after estimating the
seasonal indices, we deseasonalise the data values by dividing it by
corresponding seasonal indices (St). Thus, the deseasonalised values are
given by
Deseasonalised value Zt = yt / St = Tt Ct It
7. A cycle in the time series means a business cycle, which normally
exceeds one year in length. Note that hardly any time series possesses
exactly regular cycles.
15.1 INTRODUCTION
In Units 13 and 14, you have learnt that a time series can be decomposed
into four components, i.e., Trend (T), Cyclic (C ), Seasonal (S) and Irregular
(I) components. We have discussed methods for smoothing or filtering the
time series and for estimating Trend, Seasonal and Cyclic components. We
have also explained how to use them for forecasting.
In this unit, we describe a very important class of time series, called the
stationary time series. In Sec. 15.2, we explain the concept of stationary
process and define weak and strict stationary processes. We discuss
autocovariance, autocorrelation function and correlogram of a stationary
process in Secs. 15.3 and 15.4. If a time series is stationary, we can model it
and draw further inferences and make forecasts. If a time series is not
stationary, we cannot do any further analysis and hence cannot make
reliable forecasts. If a time series shows a particular type of non-stationarity
and some simple transformations can make it stationary, then we can
model it.
In the next unit, we shall discuss certain stationary linear models such as
Auto Regressive (AR), Moving Average (MA) and mixed Autoregressive
Moving Average (ARMA) processes. We shall also discuss how to deal
with models with trend by considering an integrated model called
Autoregressive Integrated Moving Average (ARIMA) model.
Objectives
After studying this unit, you should be able to:
describe stationary processes;
define weak and strict stationary processes;
define autocovariance and autocorrelation coefficients;
estimate autocovariance and autocorrelation coefficients;
plot the correlogram and interpret it; and
make proper choice of probability models for further studies.
Thus, the covariance term depends only on the lag τ = t2 − t1; for such a
process, E[yt] = µ for all t.
In the course MST-002, you have learnt how to calculate the covariance and
correlation between two variables for given N pairs of observations on two
variables X and Y, say {(x1, y1), (x2, y2), …, (xN, yN)}. Recall that the
formula for the covariance is:

Cov(X, Y) = E[(X − µX)(Y − µY)]

Analogously, the autocorrelation coefficient of a stationary process at lag k is

ρk = Cov(Yt, Yt+k) / σY²   … (2)

From equation (1), we note that

σY² = γ0   … (3)

Therefore, ρk = γk / γ0 and ρ0 = 1   … (4)
15.3.2 Estimation of Autocovariance and Autocorrelation
Coefficients
So far, we have defined the autocovariance and autocorrelation coefficients
for a random process. You would now like to estimate them for a finite time
series for which N observations y1, y2, ..., yN are available. We shall denote
a realisation of the random process Y1, Y2, …, YN by small letters y1, y2, ...,
yN. The mean µ can be estimated by
ȳ = (Σ_{i=1}^{N} yi) / N   … (5)
ȳ = (Σ yi) / 10 = 510/10 = 51.0
From equation (6), for k = 0, the autocovariance coefficient is

c0 = (1/N) Σ_{t=1}^{N} (yt − ȳ)² = (Σ yt² − N ȳ²)/N = (27906 − 26010)/10 = 189.6
For k = 1,

c1 = (1/9) Σ_{t=1}^{9} (yt − ȳ)(yt+1 − ȳ) = −166.33

From equation (7),

r1 = c1/c0 = −166.33/189.6 = −0.88
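These estimates can be written as a small function, following the unit's convention of dividing c0 by N and ck by (N − k). The series below is illustrative: only its first three values are taken from the example, the rest are made up, so the printed r1 will not equal −0.88.

```python
def autocorrelation(y, k):
    """Sample autocorrelation r_k = c_k / c_0 for a series y at lag k."""
    n = len(y)
    mean = sum(y) / n
    c0 = sum((v - mean) ** 2 for v in y) / n
    ck = sum((y[t] - mean) * (y[t + k] - mean) for t in range(n - k)) / (n - k)
    return ck / c0

series = [47, 64, 23, 56, 60, 51, 44, 50, 65, 50]   # illustrative data
print(round(autocorrelation(series, 1), 3))
```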
You may now like to solve a problem to assess your understanding.
E1) Ten successive observations on a stationary time series are as
follows:
1.6, 0.8, 1.2, 0.5, 0.6, 1.5, 0.8, 1.2, 0.5, 1.3.
Plot the observations and calculate r1.
E2) Fifteen successive observations on a stationary time series are as
follows:
34, 24, 23, 31, 38, 34, 35, 31, 29,
28, 25, 27, 32, 33, 30.
Plot the observations and calculate r1.
In most time series, it is noticed that the absolute value of rk, i.e., |rk|,
decreases as k increases. This is because observations which are located far
away are not related to each other, whereas observations lying closer to each
other may be positively (or negatively) correlated.
Fig. 15.1: a) Plot of a time series for N = 200; b) correlogram for lag k = 1, 2, .., 17.
Such a correlogram is an indication of a stationary time series with most of
the non-zero autocorrelations being either positive or negative.
Alternating Series
If a time series behaves in a very rough and zig-zag manner, alternating
between values above and below mean, it is indicated by a negative r1 and
positive r2. An alternating time series with its correlogram is shown in
Fig.15.2.
Fig. 15.2: a) Plot of alternating time series; b) correlogram for an alternating series
with lag up to 15.
Non-Stationary Time Series
If a time series contains trend, it is said to be non-stationary. Such a series is
usually very smooth in nature and, since the observations are dominated by
the trend, its autocorrelations move towards zero very slowly (see Fig. 15.3).
One should remove the trend from such a time series before doing any
further analysis.
Fig. 15.3: a) Plot of a non-stationary time series; b) correlogram of non-stationary series.
If a time series has a dominant seasonal pattern, the time plot will show a
cyclical behaviour with a periodicity of the seasonal effect. If data have
been recorded on monthly basis and the seasonal effect is of twelve months,
i.e., s = 12, we would expect a highly negative autocorrelation at lag 6 (r6)
and highly positive correlation at lag 12 (r12). In case of quarterly data, we
expect to find a large negative r2 and large positive r4. This behaviour will
be repeated at r6, r8 and so on. This pattern of cyclical behaviour of
correlogram will be similar to the time plot of the data.
Fig. 15.4: Time plot of the average rainfall at a certain place, in successive months
from 2010 to 2012.
Therefore, in this case the correlogram may not contain any more
information than what is given by the time plot of the time series.
Fig. 15.5: a) Smoothed plot of the average rainfall at a certain place, in successive
months from 2010 to 2012; b) correlogram of monthly observations of
seasonal time series.
Fig. 15.5a shows a time plot of monthly rainfall and Fig. 15.5b shows the
correlogram. Both show a cyclical pattern and the presence of a strong
12-monthly seasonal effect. However, it is doubtful that in such cases the
correlogram gives any more information about the presence of the seasonal
effect as compared to the time plot shown in Fig. 15.4.
In general, the interpretation of correlogram is not easy and requires a lot of
experience and insight. Estimated autocorrelations (rk) are subject to
sampling fluctuations and if N is small, their variances are large. We shall
discuss this in more detail when we consider a particular process. When all
the population autocorrelations ρk (k ≠ 0) are zero, as happens in a random
series, then the values of rk are approximately distributed as N(0,1/N). This
is a very good guide for testing whether the population correlations are all
zeros or not, i.e., the process is completely random or not.
Example 2: For the time series given in Example 1, calculate r1, r2, r3, r4 and
r5 and plot a correlogram.
Solution: From Example 1 and its results we have the following:
ȳ = 51.0, c0 = 189.6, c1 = −166.33 and r1 = −0.88
Now we form the table for the calculations as follows:
S. No.   Yt    Yt²    Yt − Ȳ    (Yt − Ȳ)(Yt+2 − Ȳ)   …
1        47    2209    −4
2        64    4096    13
3        23     529    −28       112
…
For k = 2, we get

c2 = (1/8) Σ_{t=1}^{8} (yt − ȳ)(yt+2 − ȳ) = 876/8 = 109.5

r2 = c2/c0 = 109.5/189.6 = 0.58
For k = 3, we get

c3 = (1/7) Σ_{t=1}^{7} (yt − ȳ)(yt+3 − ȳ) = −44.43

r3 = c3/c0 = −44.43/189.6 = −0.2343
For k = 4, we get

c4 = (1/6) Σ_{t=1}^{6} (yt − ȳ)(yt+4 − ȳ) = −234/6 = −39

r4 = c4/c0 = −39/189.6 = −0.2057
For k = 5, we get

c5 = (1/5) Σ_{t=1}^{5} (yt − ȳ)(yt+5 − ȳ) = −479/5 = −95.8

r5 = c5/c0 = −95.8/189.6 = −0.5052
Thus, we have obtained the autocorrelation coefficients as r1 = −0.88,
r2 = 0.58, r3 = −0.2343, r4 = −0.2057 and r5 = −0.5052.
Now we plot the correlogram for the given time series by plotting the values
of the autocorrelation coefficients versus the lag k for k = 1, 2, …, 5. The
correlogram is shown in Fig. 15.6.
Solution: The correlogram for the given values of autocorrelation
coefficients is shown in Fig. 15.7.
Fig. 15.7: Correlogram for 10 sample autocorrelation coefficients of the series of 200
observations.
Example 4: A random walk (St, t = 0, 1, 2, …) starting at zero is obtained
by cumulative sum of independently and identically distributed (i.i.d)
random variables. Check whether the series is stationary or non-stationary.
Solution: Since we have a random walk (St, t = 0, 1, 2, …) starting at zero
obtained from cumulative sum of independently and identically distributed
(i.i.d) random variables, a random walk with zero mean is obtained by
defining S0 = 0 and
St = Y1 + Y2 +….+Yt, for t = 1, 2, …
where {Yt} is i.i.d. noise with mean zero and variance σ2. Then we have
E(St) = 0, E(St²) = tσ² < ∞ for all t

Since the variance of St grows with t, the process does not have a constant
variance over time. Hence, the random walk is non-stationary.
E3) Calculate r2, r3, r4 and r5 for the time series given in Exercise 1 and plot
a correlogram.
E4) Calculate r2, r3, r4 and r5 for the time series given in Exercise 2 and plot
a correlogram.
E5) A computer generates a series of 500 observations that are supposed to
be random. The first 10 sample autocorrelation coefficients of the
series are:
r1 = 0.09, r2 = –0.08, r3 = 0.07, r4 = –0.06, r5 = –0.05, r6 = 0.04,
r7 = –0.3, r8 = 0.02, r9= –0.02, r10 = –0.01
Plot the correlogram.
Let us now summarise the concepts that we have discussed in this unit.
15.5 SUMMARY
1. A time series is said to be stationary if there is no systematic change in
mean, variance and covariance of the observations over a period of time.
If a time series is stationary, we can model it and draw further inferences
and make forecasts. If a time series is not stationary, we cannot do any
further analysis and make reliable forecasts. If a time series shows a
particular type of non-stationarity and some simple transformations can
make it stationary, then we can model it.
16.1 INTRODUCTION
In Unit 15, you have learnt that there are two types of stationary processes:
strict stationary and weak stationary processes. You have also learnt how to
determine the values of autocovariance and autocorrelation coefficients, and
to plot a correlogram for a stationary process. In this unit, we discuss
various time series models.
In Sec. 16.2 of this unit, we introduce an important class of linear stationary
processes, known as Moving Average (MA) and Autoregressive (AR)
processes and describe their key properties. We discuss Autoregressive
Moving Average (ARMA) models in Sec. 16.3. We also discuss their
properties in the form of autocorrelations and the fitting of suitable models
to the given data. We discuss how to deal with models with trend by
considering integrated models, called the Autoregressive Integrated Moving
Average (ARIMA) models in Sec. 16.4.
Objectives
After studying this unit, you should be able to:
describe a linear stationary process;
explain autoregressive and moving average processes;
fit autoregressive moving average models;
describe and use the ARIMA models; and
explore the properties of AR, MA, ARMA and ARIMA models.
The autocorrelation function (acf) of the MA(q) process is given by

ρk = (βk + β1 βk+1 + … + βq−k βq) / (1 + β1² + … + βq²), k = 1, 2, …, q   … (12)
Note that the autocorrelation function (acf) becomes zero, if lag k is greater
than the order of the process, i.e., q. This is a very important feature of
moving average (MA) processes.
First and Second Order Moving Average (MA) Processes
For the first order moving average {MA (1)} process, we have
Xt = at + β1 at−1   … (13)

The mean and variance are obtained for q = 1 as

E(Xt) = 0, V(Xt) = σa² (1 + β1²)   … (14)

and the autocorrelation coefficient is obtained for q = 1 as

ρ1 = β1 / (1 + β1²)   … (15)
Similarly, for the second order Moving Average MA(2) process we have
Xt = at + β1 at−1 + β2 at−2   … (16)

For q = 2, the mean and variance are given as

E(Xt) = 0, V(Xt) = σa² (1 + β1² + β2²)   … (17)

The autocorrelation coefficients are given as

ρ1 = β1 (1 + β2) / (1 + β1² + β2²) and ρ2 = β2 / (1 + β1² + β2²)   … (18)
There is no requirement on the constants β1 and β2 for stationarity.
However, for unique representation of the model, the autocorrelation
coefficients should satisfy the condition of invertibility, which is satisfied
when the roots of

β(B) = 1 + β1 B + β2 B² + … + βq B^q = 0   … (19a)

lie outside the unit circle, i.e., |B| > 1.
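A hedged simulation sketch follows: it generates an MA(1) realisation and compares the sample lag-1 autocorrelation with the theoretical value β1/(1 + β1²) from equation (15). The seed, sample size and β1 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
beta1 = 0.6
a = rng.standard_normal(5000)            # purely random (white noise) a_t
x = a[1:] + beta1 * a[:-1]               # MA(1): X_t = a_t + beta1 * a_(t-1)

x = x - x.mean()
r1 = (x[:-1] * x[1:]).sum() / (x * x).sum()          # sample lag-1 acf
print(round(r1, 3), round(beta1 / (1 + beta1 ** 2), 3))   # both ~ 0.44
```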
Example: Check whether a moving average MA(1) process can be fitted to the
data and obtain preliminary estimates of the parameters.
Solution: We are given the sample mean of 60 observations as 4.0 and the
estimate of σa², i.e., σ̂a² = 3415.72.

The MA(1) model is written as

Yt = µ + at + β1 at−1, i.e., Xt = Yt − µ = at + β1 at−1
If the process is purely random, all the autocorrelations (rk) should lie in the
range ± 2/√N.

In this case, 2/√N = 2/√60 = ± 0.258

Here we see that of the given autocorrelations, only r1 lies outside the range
given by ± 0.258. This suggests that a moving average MA(1) model could
be a suitable model, since only ρ1 is significantly different from zero and
ρk, k > 1, lie within the range ± 0.258.
or Φ(B) Xt = at

where B is the backward shift operator, defined as

B Xt = Xt−1, B² Xt = Xt−2, …, B^p Xt = Xt−p   … (24)
which gives

σx² = σa² / (1 − α1²)   … (30)
The autocorrelations of the AR(2) process are obtained from the Yule-Walker
equations as

ρ1 = α1 / (1 − α2) and ρ2 = α2 + α1² / (1 − α2)   … (35)

For the process with α1 = 0.80 and α2 = −0.60, we have

ρ0 = 1, ρ1 = 0.80/(1 + 0.60) = 0.50 and ρ2 = −0.60 + 0.80 × 0.50 = −0.20

We can obtain the values of the autocorrelations ρ3, ρ4 and ρ5 similarly,
using the recurrence in equation (37).
b) Xt = Xt−1 − (1/2) Xt−2 + at
For the given rk, we have to calculate parameters α1, α 2, …, αp of the model:
Xt = α1 Xt−1 + α2 Xt−2 + … + αp Xt−p + at   … (38b)
The estimates are obtained by minimising the residual sum of squares
with respect to α1, …, αp, and equating the result to zero. This method
provides good estimates.
If Yt, t = 1, 2, …, N is the observed series, the deviations Xt = Yt − Ȳ are used in
equation (39). This looks very similar to multiple regression estimates and
by differentiating S with respect to α1, α 2, …, αp and equating the result to
zero, we get a set of k equations
R α̂ = r   … (40)

where R is the matrix of autocorrelations given by

      | 1      r1     …   rp−1 |
      | r1     1      …   rp−2 |
R =   | …      …      …   …    |   … (41)
      | rp−1   rp−2   …   1    |

and r′ = (r1, r2, …, rp) is the row matrix corresponding to the column matrix r.

Thus, α̂ is obtained by solving the simultaneous equations (40) using the
inverse of the R matrix, denoted by R⁻¹, as

α̂ = R⁻¹ r   … (42)
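Equations (40)-(42) amount to solving a small linear system, as in the sketch below. For r1 = 0.806 and r2 = 0.428 (the sample values used later in this section), it gives α̂1 ≈ 1.316 and α̂2 ≈ −0.633; the latter agrees with pacf(2) computed below.

```python
import numpy as np

def yule_walker(r):
    """Solve R * alpha = r for AR(p) coefficients, r = [r1, ..., rp]."""
    p = len(r)
    acf = np.concatenate(([1.0], np.asarray(r)))      # r0 = 1
    R = np.array([[acf[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, np.asarray(r))          # alpha_hat = R^(-1) r

print(np.round(yule_walker([0.806, 0.428]), 3))       # [ 1.316 -0.633]
```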
16.2.4 Determining the Order of an Autoregressive Model
For fitting the model, we have to estimate the order of the autoregressive
model for the data at hand. For the first order autoregressive model, the
autocorrelation function (acf) reduces exponentially as follows:
ρk = α1^k, which decays since |α1| < 1.
Hence, for an autoregressive process AR (1), the exponential reduction of
autocorrelation function (acf) gives a good indication that the autoregressive
process is of order 1. However, this is not true for correlogram of higher
orders. For two and higher order autoregressive models, the autocorrelation
function (acf) can be a combination of damped exponential or cyclical
functions and may be difficult to identify.
One way is to start fitting the model by taking p = 1 and then p = 2, and so
on. As soon as the contribution of the last α p fitted is not significant, which
can be judged from the reduction in the value of residual sum of squares, we
should stop and take the order as p −1. An alternative method is to calculate
what is called partial autocorrelation function.
16.2.5 Partial Autocorrelation Function (pacf)
For an autoregressive AR (p) process, the partial autocorrelation function
(pacf) is defined as the value of the last coefficient α p. We start with p =1
and calculate pacf. Hence, for the AR (1) process, pacf (1) is
α1 = ρ1 … (43a)
For AR (2), the pacf is given by
pacf(2) = (ρ2 − ρ1²) / (1 − ρ1²)   … (43b)
substituting the estimated autocorrelations rk in place of ρk and then testing the
significance. When the partial autocorrelation function (pacf) is zero, its
asymptotic standard error is 1/√N. Hence, we calculate partial
autocorrelation functions (pacf) by increasing the order by one every time.
As soon as this lies within range of ± 2/√N, we stop and take the order as
the last significant partial autocorrelation function (pacf). This is indicated
when pacf lies outside the range of 2/√N. In the following steps, we give
partial autocorrelation functions (pacf) up to autoregressive AR(3) process:
            | 1    r1   r1 |         | 1    r1   r2 |
pacf(3) =   | r1   1    r2 |    ÷    | r1   1    r1 |   … (44b)
            | r2   r1   r3 |         | r2   r1   1  |
For the sample values r1 = 0.806, r2 = 0.428 and r3 = 0.070, we get

pacf(2) = (r2 − r1²) / (1 − r1²) = (0.428 − 0.650) / 0.350 = −0.634

            | 1      0.806   0.806 |         | 1      0.806   0.428 |
pacf(3) =   | 0.806  1       0.428 |    ÷    | 0.806  1       0.806 |   = 0.077
            | 0.428  0.806   0.070 |         | 0.428  0.806   1     |
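The determinant ratio can be checked numerically, as in the sketch below; it returns about 0.079, which the text rounds to 0.077.

```python
import numpy as np

r1, r2, r3 = 0.806, 0.428, 0.070
num = np.array([[1,  r1, r1],
                [r1, 1,  r2],
                [r2, r1, r3]])
den = np.array([[1,  r1, r2],
                [r1, 1,  r1],
                [r2, r1, 1]])
print(round(np.linalg.det(num) / np.linalg.det(den), 3))   # pacf(3) ~ 0.079
```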
For the ARMA(1,1) process, the autocovariances are

γ0 = σa² (1 + β² + 2αβ) / (1 − α²)   … (49a)
γ1 = α γ0 + β σa²   … (49b)
γk = α γk−1, k ≥ 2   … (49c)

We also obtain

ρ1 = (1 + αβ)(α + β) / (1 + β² + 2αβ)   … (49d)
ρk = α ρk−1, k ≥ 2   … (49e)

Thus, the autocorrelation function decays exponentially from the starting
value ρ1, which depends on α and β.
Let us take up an example of the ARMA model.
Example 6: Write the following ARMA(1,1) model

Xt = 0.5 Xt−1 + at − 0.3 at−1

using the backward operator B. Is the process stationary and invertible?
Calculate ρ1 and ρ2 for the process.

Solution: Since α = 0.5 and β = −0.3, from equation (48b), the model is
written using the backward operator B as:

(1 − 0.5 B) Xt = (1 − 0.3 B) at

In this case, from equations (47a and b), we have

Φ(B) = 1 − 0.5 B and θ(B) = 1 − 0.3 B

Therefore, for stationarity and invertibility, from equation (47c), the roots of
1 − 0.5 B = 0 and 1 − 0.3 B = 0 must lie outside the unit circle. The roots of
these equations are:

B = 1/0.5 = 2.0 and B = 1/0.3 = 3.33

Since both roots lie outside the unit circle, the process is stationary and
invertible. From equations (49 d and e),

ρ1 = (1 + αβ)(α + β) / (1 + β² + 2αβ) = 0.215, and
ρ2 = α ρ1 = 0.107
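Equations (49d) and (49e) can be written as a small function and checked against Example 6; note that the text truncates ρ2 to 0.107.

```python
def arma11_acf(alpha, beta, lags):
    """Autocorrelations rho_1, ..., rho_lags of an ARMA(1,1) process."""
    rho = [(1 + alpha * beta) * (alpha + beta)
           / (1 + beta ** 2 + 2 * alpha * beta)]   # rho_1, equation (49d)
    for _ in range(lags - 1):
        rho.append(alpha * rho[-1])                # rho_k = alpha * rho_(k-1)
    return rho

print([round(r, 3) for r in arma11_acf(0.5, -0.3, 3)])   # [0.215, 0.108, 0.054]
```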
You may now like to try out an exercise.
E8) Show that the ARMA (1, 1) model

Xt = 0.5 Xt−1 + at − 0.5 at−1

reduces to a purely random process, i.e., Xt = at.
Wt = (Xt − Xt−1) − (Xt−1 − Xt−2) = Xt − 2 Xt−1 + Xt−2   … (52)
In general, the ARIMA model can be written as:
Wt = α1 Wt−1 + … + αp Wt−p + at + β1 at−1 + … + βq at−q   … (53a)

or, using the backward operator B, it can be written as:

Φ(B) Wt = θ(B) at   … (53b)

or Φ(B) (1 − B)^d Xt = θ(B) at   … (53c)

It is denoted by ARIMA (p, d, q). The operator Φ(B) (1 − B)^d has d roots of
B equal to 1. For d = 0, the series is an ARMA process. In practice, the first
or second difference makes the process stationary. A random walk model is
an example of the ARIMA model.
Consider the time series
Xt = Xt−1 + at   … (54a)
which can be written as
(1 − B) Xt = at   … (54b)
It is clearly non-stationary as one root of
Φ (B) = 1− B = 0 … (54c)
lies on the unit circle. To make it stationary, we take one difference of Xt, as
Wt = Xt − Xt−1 = at
So the time series can be written as ARIMA (0,1,0). Wt is a white noise
process and stationary.
A plot of the first difference looks like a plot of a stationary process without
any trend. The plot of autocorrelations and partial autocorrelations provide
the idea of the process.
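The differencing idea can be illustrated with a short simulation: a random walk is non-stationary, but its first difference is white noise. The seed and series length are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(1000)
x = np.cumsum(a)                 # random walk, an ARIMA(0,1,0) process
w = np.diff(x)                   # first difference recovers a_t (stationary)

def r1(s):
    """Sample lag-1 autocorrelation of a series."""
    s = s - s.mean()
    return (s[:-1] * s[1:]).sum() / (s * s).sum()

print(round(r1(x), 3), round(r1(w), 3))   # near 1 for x, near 0 for w
```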
Example 7: For the model

(1 − 0.2 B)(1 − B) Xt = (1 − 0.5 B) at

find p, d, q and express it as ARIMA (p, d, q). Determine whether the
process is stationary and invertible.

Solution: We are given the model

(1 − 0.2 B)(1 − B) Xt = (1 − 0.5 B) at

a) In this case, from equations (53 b and c), we can write the given model as

(1 − 0.2 B)(1 − B)¹ Xt = (1 − 0.5 B) at

which implies that Wt = (1 − B) Xt, i.e., d = 1, and from equation (53a),

Wt = 0.2 Wt−1 + at − 0.5 at−1

This implies that p = 1 and q = 1. Hence, the process is ARIMA (1,1,1).

b) Φ(B) = (1 − B)(1 − 0.2 B) = 0 ⇒ B = 1 and B = 5. Since one root lies on
the unit circle, the process is non-stationary. The root of θ(B) = 1 − 0.5 B = 0
is B = 2, which lies outside the unit circle, so the process is invertible.
16.5 SUMMARY
1. The sequences of random variables {Yi} are mutually independent and
identically distributed. If a discrete stationary process consists of such
sequences of i.i.d. variables, it is called a purely random process.
Sometimes it is called white noise.
2. The moving average processes are used successfully to model stationary
time series in econometrics. The MA(q) process of order q is given as
Xt = β0 at + β1 at−1 + … + βq at−q
where βi, (i = 0, 1, 2, …, q) are constant.
3. The autocorrelation function (acf) of the MA(q) process (taking β0 = 1) is
given by

ρk = (βk + β1 βk+1 + … + βq−k βq) / (1 + β1² + … + βq²), k = 1, 2, …, q
It becomes zero if lag k is greater than the order of the process, i.e., q.
This is a very important feature of moving average (MA) processes.
4. A linear stationary process can always be expressed as an
autoregressive process of suitable order. Unlike moving average (MA)
process, which puts no restrictions on parameters for stationarity,
autoregressive (AR) process requires certain restrictions on the
parameter α for stationarity.