STA2604 Study Guide
STA2604: Forecasting II
2021
Contents
1 An Introduction to Forecasting 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.3 Components of a time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 Forecasting Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.1 Qualitative methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.2 Quantitative methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Errors in forecasting and forecast accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.1 Absolute deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3.2 Mean absolute deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3.3 Squared error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3.4 Mean squared error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3.5 Absolute percentage error (APE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3.6 Mean absolute percentage error (MAPE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3.7 Forecasting accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.4 Choosing a forecasting technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.1 Factors to consider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.2 Strike the balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5 An overview of quantitative forecasting techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.5.1 Leverage values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5.2 Residual magnitude . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5.3 Studentised residuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.5.4 Cook’s distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.5.5 Dealing with outliers and influential observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.5.1 Additive Holt-Winters method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.5.2 Multiplicative Holt-Winters method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.6 Damped trend exponential smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.6.1 Damped trend method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.6.2 Additive Holt-Winters with damped trend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
5.6.3 Multiplicative Holt-Winters with damped trend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
Welcome message
Dear Student,
Welcome to the module Forecasting II (STA2604), which is offered in the Department of Statistics. I am Prof G Kabera and I
will be your lecturer for this module. I trust that this module will deepen your understanding of Statistics and help you further
your studies in general. The module will enable you to explore different aspects of time series forecasting methods. It enables the student to explore and analyse a wide spectrum of problems on basic time series. More specifically, on completing the module you should be able to describe the components of a time series model, and to estimate and interpret them.
The study material for this module is available online only. You will find more details on how to study this module in Tutorial
Letter 101. The different options that are available on this site are shown on the left-hand side of the screen. You will find the study
material for the module in the folder Additional Resources. Your tutorial letters and past examination papers are stored under
Official Study Material. You may be requested to post your answers to certain activities in the Discussion forums tool, and you
may also use this tool to raise issues with me or your fellow students. After reading this page, you should read Tutorial Letter 101
(if you have not done so already). Then you should proceed to your study material. If you have any queries about the module, you
are welcome to contact me by email or telephone. I wish you all the best with your studies.
Prof G Kabera
Tel: +27 11 670 9062
Email: [email protected]
Office: GJ Gerwel Building 607, Science Campus, Florida
About this module
Prologue
Forecasting is the process of making statements about events whose actual outcomes have not yet been observed. A commonplace
example might be the estimation of the expected value for some variable of interest at some specified future date. Prediction
is similar, but is a more general term. Both might refer to formal statistical methods employing time series, cross sectional or
longitudinal data, or alternatively to less formal judgemental methods. More will be seen in various parts of this module.
The module is about Forecasting, which deals with the methods used to predict the future, i.e. to forecast. Can you think of a
situation where predictions of the future are needed or cases where forecasting is done? By its nature it is a quantitative method that
uses numeric data. There are various forecasting methods, some of them being qualitative because they are based on non-numeric
data. Even though qualitative methods feature in some of our discussions, they are not dealt with in depth in this module.
This module presents fundamental aspects of Time Series analysis used in forecasting. The prescribed textbook for this module
is Bowerman, O’Connell and Koehler (2005). We will not study all the chapters in the book for this module, but will focus on
Chapters 1, 5, 6, 7 and 8. It is assumed that you are very familiar with the material on simple and multiple linear regression covered
in Chapters 3 and 4. Please revise these two chapters since Chapters 5 to 8 make strong use of linear regression techniques.
References
The prescribed book must be purchased. Refer to the study guide regularly. We shall also refer to a number of user-friendly
textbooks on Time Series that are available in the Unisa library.
Prescribed book
Bowerman, B. L., O’Connell, R. T. & Koehler, A. B. (2005) Forecasting, time series and regression: an applied approach, 4th
edition. Singapore: Thomson Brooks/Cole.
The presentation of the module
This study guide summarises the five chapters of interest in the prescribed textbook.
Prior knowledge
It is important that you are familiar with a section before moving to the next one. This will serve as a foundation for the forthcoming
work. Leaving out work without understanding can only lead to an accumulation of problems when responding to assignments and
writing the examination. This is also true about the prerequisites from first-year statistics and the knowledge you have acquired
through the years. Sensible or smart application is based on the use of the accumulated techniques, experiences and knowledge.
Plotting of graphs, fitting a linear model, and so on, are needed in some instances. You are urged, therefore, to incorporate all
the useful techniques in the solutions to exercises. We advise you to revisit these topics in your first-year modules and in some
second-year modules.
It is necessary to realise that numbers alone do not provide all the answers. It should be clear to you that aspects of a qualitative
nature add value to the predictions made so that the data context is clear.
The exercises selected for assignments are important in reinforcing what you need to understand in this module. Take time to
understand the aspects that go with them. Analyse the postulates in the given statements and thereafter the requirements so that it
becomes easy to recall what is necessary in compiling a solution. In that way you do not only solve the problem, you understand
it and enjoy solving it. At the end of the year there is a two-hour examination. The choice between a closed- and an open-book examination will depend on the global health situation, e.g. Covid-19 or other epidemic/pandemic constraints. The discussions in the study guide and the textbook will assist you in mastering the module and therefore prepare you for the examination.
This study guide has been prepared to guide you through the prescribed book, but it does not replace the textbook. Therefore,
we will always use it together with the prescribed book. Read them together. The textbook presents the concepts while the study
guide attempts to bring the concepts closer to you. Hence, the prescribed textbook is more important than the study guide.
Each study unit starts with the outcomes in order to show you what you need to know and to evaluate yourself. The table of
outcomes also gives each outcome together with the way the outcome will be assessed, the content needed for that outcome, the
activities that will be used to support the understanding of the content and the way feedback will be given. Your input in the form of
positive criticism to improve the presentation will be of importance in the review of this study guide. You are therefore encouraged
to suggest ways that you believe can improve the presentation of this module.
This module is part of the whole Statistics curriculum at Unisa. Its position on the curriculum structure is as follows:
1st year: STA1501, STA1502, STA1503
2nd year: STA2601, STA2602, STA2603, STA2604 (Forecasting II - we are here), STA2610
3rd year: STA3701, STA3702, STA3703, STA3704, STA3705, STA3710
You should already be familiar with some of the modules mentioned above. Knowledge from STA2604 will help you in
STA3704 (Forecasting III).
Assignments
There are three assignments for this module, which are intended to help you learn through various activities. They also serve as
tests to prepare you for the examination. As you do the assignments, study the reading texts, consult other resources, discuss the
work with fellow students or tutors or do research, you are actively engaged in learning. Looking at the assessment criteria given
for each assignment will help you to better understand what is required of you. The three assignments form part of the learning
process. The typical assignment question is a reflection of a typical examination question.
There are fixed submission dates for the assignments and each assignment is based on specific chapters (or sections) in the
prescribed book. You have to adhere to these dates as assignments are only marked if they are received on or before the due dates.
The three assignments are compulsory because
● they are the sole contributors towards your year mark, and
● they form an integral part of the learning process and indicate the form and nature of the questions you can expect in the examination.
Please note that the submission of Assignment 01 is currently the guarantee for examination entry. If you do not submit Assignment 01, UNISA (not the Department of Statistics) will deny you examination entry. UNISA may also require a sub-minimum of 40% for the year mark for examination entry. Once that decision is implemented, it will no longer be enough to submit Assignment 01 to be guaranteed examination entry.
You are urged to communicate with your lecturer(s) whenever you encounter difficulties in this module. Do not wait until
the assignment due date or the examination to make contact with lecturers. It is helpful to be ready long in advance. You are
also encouraged to work with your own peers, colleagues, friends, etc. However, you must work on your own when compiling
assignment answers. Sharing answers is pure plagiarism, and UNISA, like all educational institutions, does not tolerate plagiarism or cheating. General details about the assignments will be given in Tutorial Letter 101. However, the assignment questions will be given to you gradually throughout the year.
Time series has its own useful terminology that should be understood. In order to familiarise yourself with it, let us start with
an easy activity. Activities help in the creation of a mind map of the module. The more you attempt these activities, the better you
will understand the work.
Glossary terms
ACTIVITY 0.1
(a) Make a list of all the concepts that are printed in bold type in Chapters 1, 5, 6, 7 and 8 of the prescribed book. They serve as
your glossary. Of course this is a cumbersome task since there are several such concepts in the prescribed textbook.
(b) Attempt to explain the meanings of these concepts before you deal with the various sections so that you have an idea before
we get there.
DISCUSSION OF ACTIVITY 0.1
(a) There is a missing concept/term among the ones you listed, which is absolutely fundamental. It appears with other terms
or phrases. The term is “data”. You came across the term many times when you studied other modules and in some other
contexts. It is emphasised that it is a useful aspect in forecasting. If you do not have data, you will not be able to make
forecasts.
(b) Do not worry if the meanings you gave do not match the content in the tutorial letter or textbook. The intention was to make
you aware of aspects on which to focus in your learning. What is required from you is a step-by-step journey through the
prescribed material.
ACTIVITY 0.2
There is a general misconception that data and information are the same concepts. This is not necessarily the case. Data are records of occurrences from which we obtain information. Data are not necessarily information on their own, but may sometimes be. The truth is that data contain information which becomes visible after some analysis. Data are often the raw answers we receive from an investigation.
Prerequisites
● The ability to use a scientific calculator.
● Access to a computer package and the ability to use it are highly recommended. The minimum requirement is the ability to
use Excel.
● First-year statistics. The following topics are of great importance in this module:
When you draw plots required for statistical analysis, these plots should be accurate. Hence, use a ruler and a lead pencil (not
a pen) to construct plots. If you have access to a computer, you are also encouraged to practise using any statistical package of
your choice. Assignments may also be prepared by means of a computer. Currently, we only recommend using Excel for some
assignment questions. Just make sure that you use the correct notation. Avoid using a computer if you cannot write the correct
notation. Remember that you are always welcome to contact the lecturers whenever you have problems with any aspect of the
module.
Outcomes
At the end of the module you should be able to do the following:
● Apply important concepts and methods in forecasting and detect forecasting errors.
● Model the trend of time series data and detect and handle first-order correlation.
The assessment, content, activities and feedback for these outcomes are presented in the table on the next page.
Table of outcomes
Outcome 1 - At the end of the module you should be able to explain and explore time series components.
- Assessment: analyse data; plot graphs
- Content: trend, seasonality, cycles, irregularity
- Activities: examine data visually; plot graphs
- Feedback: discuss likely errors

Outcome 2 - At the end of the module you should be able to select a model.
- Assessment: balance factors
- Content: choose a statistical technique
- Activities: analyse errors; plot graphs
- Feedback: scrutinise models
You will know that you understand this module once you are able to define, describe and apply the concepts in the above outcomes.
Feedback is not just a follow-up of the preceding concepts. It provides you with an opportunity to reinforce some concepts
and revise others. Make use of this opportunity. Feedback is given after every activity, sometimes with some discussion after the
activity, but in many instances, it follows immediately after the activity.
This widely accepted view of the past might not be correct. Historians often interject their own beliefs and biases when they
write about the past. Facts become distorted and altered over time. It may be that the past is a reflection of our current conceptual
reference. In the most extreme viewpoint, the concept of time itself comes into question.
The future, on the other hand, is filled with uncertainty. Facts give way to opinions. The facts of the past provide the raw
materials from which the mind makes estimates of the future. All forecasts are opinions of the future (some more carefully
formulated than others). The act of making a forecast is the expression of an opinion. The future consists of a range of possible
future phenomena or events.
One of the first rules is to consider how the forecast results will be used. It is important to consider who the readers of the final
report will be during the initial planning stages of a project. It is wasteful to spend resources on an analysis that has little or no
use. The same rule applies to forecasting. We must strive to develop forecasts that are of maximum usefulness to planners. This
means that each situation must be evaluated individually as to the methodology and type of forecasts that are most appropriate to
the particular application.
Forecasting can, and often does, contribute to the creation of the future, but it is clear that other factors are also operating. A
holographic theory would stress the interconnectedness of all elements in the system. At some level, everything contributes to the
creation of the future. The degree to which a forecast can shape the future (or our perception of the future) has yet to be determined
experimentally and experientially.
Sometimes forecasts become part of a creative process, and sometimes they do not. When two people make mutually exclusive
forecasts, both of them cannot be true. At least one forecast is wrong. Does one person’s forecast create the future, and the other
does not? The mechanisms involved in the construction of the future are not well understood on an individual or social level.
Ethics in forecasting
Are predictions of the future a form of propaganda, designed to evoke a particular set of behaviours? Note that the desire for control
is implicit in all forecasts. Decisions made today are based on forecasts, which may or may not come to pass. The forecast is a
way to control today’s decisions.
The purpose of forecasting is to control the present. In fact, one of the assumptions of forecasting is that the forecasts will be
used by policy-makers to make decisions. It is therefore important to discuss the ethics of forecasting. Since forecasts can and
often do take on a creative role, no one has the absolute right to make forecasts that involve other people’s futures.
Nearly everyone would agree that we have the right to create our own future. Goal setting is a form of personal forecasting. It
is one way to organise and invent our personal future. Each person has the right to create his/her own future. On the other hand, a
social forecast might alter the course of an entire society. Such power can only be accompanied by equivalent responsibility.
There are no clear rules involving the ethics of forecasting. Value impact is important in forecasting, i.e. the idea that social
forecasting must involve physical, cultural and societal values. However, forecasters cannot leave their own personal biases out of
the forecasting process. Even the most mathematically rigorous techniques involve judgmental inputs that can dramatically alter
the forecast.
Many futurists have pointed out our obligation to create socially desirable futures. Unfortunately, a socially desirable future for
one person might be another person’s nightmare. For example, modern ecological theory says that we should think of our planet in
terms of sustainable futures. The finite supply of natural resources forces us to reconsider the desirability of unlimited growth. An
optimistic forecast is that we achieve and maintain an ecologically balanced future. That same forecast, the idea of zero growth, is
a catastrophic nightmare for the corporate and financial institutions of the free world. The system of profit depends on continual
growth for the well-being of individuals, groups, and institutions.
‘Desirable futures’ is a subjective concept. It can only be understood relative to other information. The ethics of forecasting
certainly involves the obligation to create desirable futures for the person(s) that might be affected by the forecast. If a goal of
forecasting is to create desirable futures, then the forecaster must ask the ethical question of “desirable for whom?”.
To embrace the idea of liberty is to recognise that each person has the right to create his/her own future. Forecasters can
promote libertarian beliefs by empowering people that might be affected by the forecast. Involving these people in the forecasting
process, gives them the power to become co-creators in their futures.
Now that you have some background on forecasting, let’s start exploring the topic in detail in Unit 1.
Unit 1
An Introduction to Forecasting
1.1 Introduction
The aim of this unit is to define important concepts and methods in forecasting and detect forecasting errors. The outcomes of the
unit are:
• Select a forecasting method that is appropriate for particular requirements and that is based on relevant time series data.
Further details on this unit's outcomes are given in the following table.
Outcome - At the end of the unit you should be able to define time series terms.
- Assessment: data plots and measures
- Content: time series word list
- Activities: experiment with data
- Feedback: discuss each activity
If you understand the above activities, it will be an indication that you understand this study unit.
Forecasting is the scientific process of estimating some aspects of the future in usually unknown situations. Prediction is similar, but is a more general term. Both can refer to estimation of time series, cross-sectional or longitudinal data. Usage can
differ between areas of application: for example, in hydrology the terms “forecast” and “forecasting” are sometimes reserved for estimates of values at certain specific future times, while the term “prediction” is used for more general estimates, such as the number of times floods will occur over a long period.
It is essential to note that in this module the emphasis is on scientific forecasting. This is to ensure that we do not consider subjective predictions and spiritual prophecies as part of our scope for this forecasting module. Risk and uncertainty are central to
forecasting and prediction. Forecasting is used in the practice of Customer Demand Planning in everyday business forecasting for
manufacturing companies. The discipline of demand planning, also sometimes referred to as supply chain forecasting, embraces
both statistical forecasting and a consensus process.
Forecasting is commonly used in discussion of time-series data. The terms relating to forecasting used in this module are fairly
straightforward and are explained in the prescribed book.
● Supply chain management - Forecasting can be used in Supply Chain Management to make sure that the right product is at
the right place at the right time. Accurate forecasting will help retailers reduce excess inventory and therefore increase profit
margins. Accurate forecasting will also help them meet consumer demand.
● Economic forecasting
● Technology forecasting
● Earthquake forecasting
● Product forecasting
● Telecommunications forecasting
● Political forecasting
● Sales forecasting
ACTIVITY 1.1
Consider the terms “forecasting”, “cross-sectional data” and “time series”, which are the main focus of this study unit.
(a) Attempt to define these terms without consulting the prescribed textbook or any source such as Google.
(b) Check the definitions in the book and compare your answers in (a).
Before we discuss the above activity, start by reading slowly through the following discussion. Make sure you follow the
discussion.
1.1.1 Forecasting
Many people, when asked about the term “forecasting”, make reference to the weather forecast that is presented on radio, television and the internet. From this we can infer that the general public does not have a clear understanding of the meaning of forecasting.
Historical evidence shows that at every point in time when people lived, they were always interested in the future. There
are stories from history that inform us that when people dreamed, there were experts to explain the meanings of these dreams in
terms of the future. When signs of future drought arose, the implications of the drought were noted and plans were made to offset
the impacts that were anticipated. Drought led to hunger. Thus, when predictions were made that a drought was coming, preparations were made so that, at the time of the drought, there would be enough food for every member of the community for the
duration of the drought. A good example of such an expert is the prisoner Joseph, who according to the Biblical story interpreted
to Pharaoh a dream about a seven-year famine in Egypt and surrounding regions, including Israel where Joseph came from as a
slave. That interpretation resulted in him being promoted from prisoner to prime minister. Predicting the future even as it was done
during the old days can be referred to as forecasting. The predicted future was then used to plan for the future as explained above.
The modern scientific approach has encouraged a more formalised conception of the practice of “anticipating the future”. This practice was then formally termed “forecasting”. The current approaches are scientific in order to ensure
that forecasting is practised systematically. The predictions made are now called forecasts. In other words, forecasts are future
expectations based on scientific guidelines.
Forecasting is applied in many organisational areas, for example:
• Marketing department
• Finance
• Personnel management
• Production scheduling
• Process control
• Strategic management
Please read the details regarding the examples. Several examples can be mentioned including our own context at UNISA. The
number of student enrolments at Unisa is the starting point. The trend pattern will give an indication of whether there has been a
decline or growth in the student numbers over the years. If you are observant, you will realise that there has been an increase in
student numbers over the past few years. Our “forecast” for next year (2022) is that there will be more students than in 2021.
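The trend reasoning above can be sketched in code. This is a minimal illustration only: the enrolment figures below are invented for the example (Unisa's actual figures differ), and a simple least-squares straight line stands in for whatever trend model would actually be used.

```python
# Hypothetical enrolment figures (invented for illustration only).
# A least-squares straight line is fitted and extrapolated to 2022.
years = [2017, 2018, 2019, 2020, 2021]
enrolments = [300_000, 310_000, 325_000, 340_000, 355_000]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(enrolments) / n
# Slope and intercept of the least-squares trend line.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, enrolments))
         / sum((x - mean_x) ** 2 for x in years))
intercept = mean_y - slope * mean_x

# Extrapolate the fitted trend one year beyond the data.
forecast_2022 = intercept + slope * 2022
print(round(forecast_2022))  # 368000: higher than 2021, consistent with growth
```

An upward slope in the fitted line corresponds to the growth pattern described above, and the extrapolated value exceeds the last observed year, matching the informal “more students than in 2021” forecast.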
ACTIVITY 1.2
Weather forecasting was mentioned as a known example where forecasting is used abundantly. There are many others.
DISCUSSION OF ACTIVITY 1.2
We discussed the Unisa example. If you are interested in African politics and elections you will be interested in making
predictions about political parties that are going to be in the forefront in the next election. You might anticipate extreme growth
of some political parties and the decline of other parties in a given country, based on the trends in the previous elections and
developments that prevail. Therefore,
(a) one can for example predict how the political parties will perform in the next election; and
(b) recent performance of the various parties in previous elections may be revisited and analysed, the current activities of the
parties may be analysed closely and one may interact with people to determine their impressions about various parties.
N.B.: Here it is assumed that normal election conditions hold, where no intimidation, harassment or fraud takes place.
1.1.2 Data
Data are important for forecasting. Quality data, which loosely refers to reliable and valid data, are the ones needed for forecasting.
It may be misleading if poor-quality data are used, because the results are likely to be poor as well, even if the best methods are used by a proficient analyst. The term data refers to groups of information that represent the qualitative or quantitative attributes of a variable or set of variables. Data (plural of “datum”, which is seldom used) are typically the results of measurements and can be
the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from
which information and knowledge are derived. Raw data refers to a collection of numbers, characters, images or other outputs
from devices that collect information to convert physical quantities into symbols that are unprocessed.
Without data there can be no forecasting. However, it is important that data be correct (reliable, valid, realistic, etc.). Data need to be both valid for the exercise and reliable. If either of these does not hold, be warned that your forecasts may mislead you or any user. Also, the collection of data may be inadequate to support the reasoning behind some findings.
Experience shows that when data are collected under certain contexts, explanations and contexts become clearer when findings are
associated with those contexts. Thus, if you assist in data collection of time series or any statistical data, whenever possible, advise
on the inclusion of details of the occurrences of the data. Giving details around events assists in reducing the extent of making
assumptions which may sometimes be incorrect.
The type of information used in forecasting determines the quality of the forecasts. Not all of us like boxing, but let us discuss
the next scenario. Imagine that two boxers were going to fight on the next Saturday. We were required to make a prediction in
order to win a million rand competition. Many participants looked at the past records of these boxers. They were informed that
in the previous seven years boxer Kangaroo Gumbu had won 25 out of 27 fights while boxer Boetie Blood had won 22 of the 30
fights he had in the same period. Gumbu was known for winning well while Blood had lost dismally in a recent fight. Let us pause
and enjoy the predictions (forecasts) made, just to make a good point.
ACTIVITY 1.3
Either as a person interested in boxing or someone hoping to win the money, you may be tempted to take a chance at the answer.
Make a prediction of the outcome of the fight based on the explanation given.
DISCUSSION OF ACTIVITY 1.3
Let us determine the probabilities as statisticians. Using relative frequencies, Gumbu had a probability of 0.93 of winning the fight while Blood had a probability of 0.73 of winning the fight. On the basis of these probabilities, many participants predicted that
Gumbu was going to win.
Do you know how the probabilities 0.93 and 0.73 have been obtained? If it is not clear, divide the number of successes (wins)
of each boxer by the total number of fights that each boxer had fought.
The data given were based on certain assumptions. Among others, there was the impression that the opponents of the two boxers were of the same quality; if they were not, then the prediction would carry some inaccuracies. Among other omissions, we were not told that the bout was to be held at a catchweight: the boxers came from different weight divisions and could not both fall within a single previously defined division. Blood had fought only world-class opponents and came from a division two weight classes heavier than Gumbu's. That is, there was a difference between the natural weights of the two boxers. Gumbu, on the other hand, was a boxer who talked too much. He had fought some mediocre opponents and wanted to pretend he was an excellent boxer. He had asked for the fight and, in insisting on it, had called Blood a coward until the bout was sanctioned. At the time he was preparing for an elimination bout in his own weight division, after which he would fight for a world title if he won. That elimination bout was probably going to be the first real test for Gumbu as a professional fighter. It was going to come "after I am done with Blood," boasted Gumbu.
In the street some people were predicting that Gumbu would lose, but they did not enter the competition, as money was required. None of those who paid to enter predicted correctly. The fight ended in a first-round knockout: Blood was the winner, and Gumbu was no match.
In the scenario given, details were missing, such as the fact that the two boxers were of different weights. Had we known this, it would have helped our analysis. When predicting the outcome of forthcoming contests, one also needs to know the quality of the opposition that the two opponents met in accumulating their records; this, too, was missing from the example. We will insist on the use of valid assumptions because, as we saw, wrong or invalid assumptions are likely to give inaccurate predictions.
Two types of data are common in real life: cross-sectional data and time series data. Cross-sectional data are collected by observing many subjects (such as individuals, firms or countries/regions) at the same point in time, or without regard to differences in time. Analysis of cross-sectional data usually consists of comparing the differences among the subjects. For example, suppose we want to measure current obesity levels in a population. We could draw a sample of 1 000 people randomly from that population (also known as a cross-section of that population), measure their weight and height, and calculate what percentage of that sample is categorised as obese. Even though cross-sectional data may be analysed for quality forecasts, in this module we use time series data.
A time series is a chronological sequence of observations on a particular variable. In general, time series data are observed at equally spaced time points. For instance, time series data can be collected hourly, daily, weekly, monthly, annually, every five years, every ten years, and so on. However, equally spaced time points are not a strict requirement.
We have to be careful when discussing time series data. If the data are listed without any time specification, then we should not consider them to be time series.
SCENARIO
Read the following scenario carefully and make notes as we will keep on referring back to it.
Suppose that Jabulani is a milk salesperson during the week, serving the Florida, Muckleneuk and VUDEC UNISA campuses. Very fortunately for Jabulani, his herd of milk cows increased and his market on these campuses also grew from year to year. Jabulani's business runs from Monday to Sunday. In a time series analysis a typical question would be: what can we say about the trend of the sales? Asked differently: should we believe that the sales have a decreasing or an increasing trend? It will become clear later on that the sales levels differ according to the day of the week, high on some days and low on others. The pattern of low or high sales on different days has an important connotation in time series analysis. This will be discussed.
ACTIVITY 1.4
You have done some first-year statistics modules/courses and some of you did mathematics modules as well. Let us consider
the following data sets and look at them quite closely.
(a) The two data sets have exactly the same numbers. There is something strange about their appearances though. Compare the
two data sets.
(b) Can these two data sets be classified as time series data sets? Explain.
Discussion
The data above do not necessarily represent time series data, but they can be presented in another way to form time series data, provided they were collected chronologically over regular time intervals. Suppose data set 1.1 represents the sales of milk sold by Jabulani from Monday to Sunday for four weeks, and let 1 = Monday, 2 = Tuesday, ..., 7 = Sunday as given in data set 1.3. The data sets should then be presented as follows:
          Day
Week       1    2    3    4    5    6    7
   1      16   14   19   26   11   24   10
   2      18   15   21   24   12   21    9
   3      21   15   20   27   13   25   11
   4      24   17   24   31   14   27   13
We emphasise that in the initial presentation there was simply no information to explain or demonstrate the chronological
sequence with respect to time and that the data were therefore not time series data.
ACTIVITY 1.5
It is required to use graphs in addition to other methods to detect patterns in time series data. Graphical plots reveal information
visually, but this cannot always be done with ease. The example that follows is one of the cases where we can easily draw graphical
plots. Analyse the data about Jabulani’s business by answering the following questions. Make any comments that you believe are
relevant.
(b) Plot the data to reveal the pattern using the following approaches:
(b) Graphs of the activity
(iii) In terms of the pattern, the graphs reveal that milk sales were highest on Thursdays, Saturdays and Wednesdays (in
order from highest to lowest). The lowest sales were revealed for Sundays, Fridays, Tuesdays and Mondays (in the
order from lowest to highest).
(c) The graphs can be difficult to compare when they are on separate systems of axes. The last graph makes comparison very
easy, revealing that the patterns for all four weeks are similar.
The patterns of the highest activity and lowest activity about a phenomenon are important in time series. Jabulani will easily
know when he does more business, when he does least business and he can plan to find better ways to improve business. Let us
start formalising these patterns.
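As a check on this pattern, a short sketch (in Python, using the milk data tabulated above) averages the sales for each day across the four weeks:

```python
# Average milk sales per day of the week across the four weeks, using the
# table from Activity 1.4 (rows are weeks, columns are days 1 = Monday to
# 7 = Sunday).

weeks = [
    [16, 14, 19, 26, 11, 24, 10],
    [18, 15, 21, 24, 12, 21,  9],
    [21, 15, 20, 27, 13, 25, 11],
    [24, 17, 24, 31, 14, 27, 13],
]

day_avg = [sum(week[d] for week in weeks) / len(weeks) for d in range(7)]
# day_avg == [19.75, 15.25, 21.0, 27.0, 12.5, 24.25, 10.75]
```

The averages confirm the ordering described in the activity: Day 4 (Thursday) is highest, followed by Day 6 (Saturday) and Day 3 (Wednesday), while Day 7 (Sunday) is lowest.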
Components are important because they enable us to see the salient features of a structure. Through them we can describe what we need to analyse, and when we can describe something, we are better able to know what is required to deal with it. Time series also have components that need to be considered and accounted for in their analysis.
Trend
The first component we discuss is trend. The term "trend" refers to the long-term decline or growth of an activity. It is defined formally as the upward or downward movement that characterises a time series over a period of time.
Time series data may show an upward or a downward trend over a period of years. This may be due to factors such as population growth, technological progress, large-scale shifts in consumer demand, and so on. For example, population increases over time, prices increase over a period of years, and the production of goods on the country's capital market increases over a period of years; these are examples of an upward trend. The sales of a commodity may decrease over time because better products come onto the market; this is an example of a declining or downward trend. This long-term increase or decrease in the movement of a time series is called trend.
Usually one cannot determine, merely by looking at the data, whether there is a decreasing or an increasing trend; there are times (but rarely) when we can see the pattern by inspection. Often a graphical plot clearly shows the trend. The trend may take shapes such as linear, exponential, logarithmic, polynomial, power-function, quadratic and other forms. In general, we use graphical displays to find out whether there is a decline or an increase in the activity. Some examples of trend applications are the following:
- Market growth
In Gauteng, the market for umbrellas decreases in the period April to July. During the rainy season, which in Gauteng
happens to be the summer season, the sales of umbrellas increase.
November to January, maize is in abundance and the prices drop. As the production level declines, the prices start increasing
again.
Cycle
The next component of time series that we discuss is the cycle. Once trends have been identified, some recurring up-and-down movements may be visible around the trend levels. These movements are called cycles. Cycles occur over the medium and long term. Note that natural occurrences have generally shown cyclical patterns over the years. Examples are pandemic diseases that recur after a certain number of years (the Black Plague, the Spanish flu, Covid-19, etc.), producing observable spikes in the number of deaths.
The impact of cycles on a time series is either to stimulate or depress its activity, but in general, their causes are difficult to
identify and explain. Certain actions by institutions such as government, trade unions, world organisations, and so on, can induce
levels of pessimism and optimism into the economy which are reflected in changes in the time series levels. Economic indices are
usually used to describe cyclical fluctuations.
Cyclical variations are recurrent upward or downward movements in a time series, but the period of a cycle is greater than a year; this restriction distinguishes cyclical variation from seasonal variation. Cyclical variations are also not as regular as seasonal variations. There are different types of cycles, varying in length and size. The ups and downs in business activity are the effects of cyclical variation. A business cycle showing these oscillatory movements passes through four phases: prosperity, recession, depression and recovery. In a business, these four phases are completed by passing from one to the next in this order; together, they form a cycle.
Cycles are useful in long-term forecasting, which can span many decades or even centuries. Our capabilities and interests in this module do not require us to look beyond a decade. Hence, methods for developing forecasts that include cycles (or cyclical components) are outside the scope of this module. However, you still need to recognise when cycles are discussed or implied in a forecasting situation.
Seasonality
The milk example above dealt with weekly periods. Conventionally, seasonal variations are periodic patterns in a time series that complete themselves within a calendar year and are then repeated on a yearly basis. The impression this gives is that the observations being investigated must run over a year. This is simply not the case: even values occurring within a single day can be seasonal, as you will soon see. We therefore provide a more useful and realistic definition of seasonality, which will be used in this module; the yearly definition above applies when the periods span years. Let us define the concept as follows:
Seasonal variations are systematic variations that occur within a period and that are tied to some properties of that period. They are repeated within the period. They are, in other words, periodic patterns in a time series that complete themselves within a calendar period and are repeated on the basis of that period.
Seasonal variations are short-term fluctuations in a time series which occur periodically within a period, such as a year; in that case the pattern continues to repeat year after year. The major factors responsible for the repetitive pattern of seasonal variations are weather conditions and the customs of people. More woollen clothes are sold in winter than in summer. Regardless of the trend, we can observe that in each year more ice cream is sold in summer and very little in winter. Sales in department stores are higher during festive seasons than on normal days.
Irregular fluctuations
We have not mentioned whether Jabulani was ever robbed of his revenue or stock for his business. Robbery is not a regular or
seasonal event, but can suddenly happen.
Irregular fluctuations are variations in time series that are short in duration, erratic in nature and follow no regularity in the
occurrence pattern. These variations are also referred to as residual variations since by definition they represent what is left out in a
time series after trend, cyclical and seasonal variations have been accounted for. Irregular fluctuations result due to the occurrence
of unforeseen events like floods, earthquakes, wars, famines, and so on.
Remember that Jabulani was a smart entrepreneur who would estimate his revenue each morning before leaving for work. One Tuesday afternoon, after he had counted what he thought was his revenue for the day, he was robbed by two thugs. Fortunately he was neither hurt nor discouraged from continuing with his business. This was the first time it had happened. Could he have anticipated being robbed on that day? We, too, could not have predicted that event.
The point is, that irregular event changed what could have been the revenue and/or profit for that day. In time series, irregular
fluctuations, which are also called irregular variations, refer to random fluctuations that are attributed to unpredictable occurrences.
The presentation about this concept simply implies that these patterns cannot be accounted for. They are once-off events. Examples
are natural disasters (such as fires, droughts, floods) or man-made disasters (strikes, boycotts, accidents, acts of violence and so
on).
Note that all the components of a time series influence the time series and can occur in any combination. The most important
problem to be solved in forecasting is trying to match the appropriate model to the pattern of the time series data.
ACTIVITY 1.6
Discuss what a time series is, and discuss the meaning of trend effects, seasonal variations, cyclical variations, and irregular
effects.
1.2 Forecasting Methods
There are several forecasting methods, but there is no single best forecasting method; there are, however, methods appropriate to any given time series situation. The prescribed textbook describes forecasting methods along the same lines as the types of data you dealt with in your first-year Statistics courses/modules: they are either qualitative or quantitative in nature.
Common examples of qualitative forecasting methods are judgemental methods, which incorporate intuitive judgements, opinions and subjective probability estimates. Popular qualitative forecasting methods include the following:
● Composite forecasts
● Surveys
● Delphi method
● Scenario building
● Technology forecasting
● Forecast by analogy
You do not need to learn more about these for the requirements of this module. However, you may come across them in applications, so this brief encounter may be of help in future.
Causal forecasting models start by identifying variables that are related to the one to be predicted. This is followed by forming a statistical model that describes the relationship between these variables and the variable to be forecast. The most common such models are regression models and ordinary polynomials.
In the causal forecasting method, the variable of interest, the one whose forecasts are required, depends on other variables. It is thus the dependent variable. The variables on which the variable of interest depends are known as the independent variables.
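As an illustrative sketch of the idea, the following Python snippet fits a least-squares line relating a dependent variable to a single independent variable. The data (disposable income versus litres of milk bought) are hypothetical and serve only to show the mechanics:

```python
# A minimal sketch of a causal forecasting model: simple linear regression of
# a dependent variable on one independent variable, fitted by least squares.
# The data below are hypothetical and illustrate only the mechanics.

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

# Hypothetical data: weekly disposable income (x, hundreds of rand) of a
# group of buyers and the litres of milk they bought (y).
income = [10, 12, 15, 18, 20]
litres = [20, 23, 28, 33, 37]

a, b = fit_line(income, litres)
forecast = a + b * 16   # point forecast of litres when income is 16
```

Here milk sales play the role of the dependent variable and income the role of the independent variable; the fitted equation is then used to predict sales at a new income level.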
income for the milk purchases. Fortunately for Jabulani, he has in the past four weeks, managed to deliver milk before item P was
delivered. However, most of the buyers who are paid on Saturday tend to meet the P seller before their milk purchases on Sunday
morning.
It is necessary to understand dependencies and correlations when dealing with forecasting. If you fail to understand them, you may fall into the trap of making wrong assumptions: influences that affect your forecasts, and constraints that come with correlated variables, may lead you to develop inaccurate models and hence wrong forecasts.
Useful common examples are time series and causal methods. There are others as well, but the following may be of help in your development.
• Rolling forecast - a projection into the future based on past performance, routinely updated on a regular schedule to incorporate new data
• Moving average
• Exponential smoothing
• Extrapolation
• Linear prediction
• Trend estimation
• Growth curve
Other methods
• Simulation
• Prediction market
• Probabilistic forecasting and ensemble forecasting
• Reference class forecasting
These methods are listed so that, when you consult other forecasting sources, you will be able to see where they fit into your module. However, they are not required to the extent to which they are presented in those sources.
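As a minimal sketch of two of the quantitative methods listed above, the following Python snippet implements a simple moving average and simple exponential smoothing (these are illustrative implementations, not the textbook's exact notation):

```python
# Minimal sketches of two smoothing methods: a simple moving average and
# simple exponential smoothing.

def moving_average(series, window=3):
    """Average of each consecutive block of `window` observations."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def exponential_smoothing(series, alpha=0.5):
    """S_1 = y_1; thereafter S_t = alpha*y_t + (1 - alpha)*S_{t-1}."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [16, 14, 19, 26, 11, 24, 10]   # week 1 of Jabulani's milk sales
ma = moving_average(sales, window=3)
es = exponential_smoothing(sales, alpha=0.5)
```

The moving average irons out day-to-day fluctuation, while the smoothing parameter alpha controls how strongly recent observations dominate the smoothed value.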
ACTIVITY 1.7
● Do you see any dependence of the variables in the example of Jabulani’s milk-selling business above?
ACTIVITY 1.8
(a) Classify the milk sales in the latest scenario as a dependent or independent variable.
(b) Explain your choice in (a) above. Here confine your response to milk purchases and disposable income.
Note that if the forecasts we prepare are not accurate, they may be useless, since they are probably going to mislead the user. We insist on a scientific method in forecasting to ensure that we can monitor the methods and test the models, so that the inaccuracies in them are reduced or, ideally, eliminated.
It is important to know the likely errors when you attempt to make predictions or develop forecasts. If you know them, you can avoid or minimise them. An error is as simple as thinking that Jabulani was going to sell 500 litres in a specific week when he ends up selling 520 litres. (Note that you could equally make an error by overestimating the litres of milk.)
The next sections require your learned skill of drawing graphs and interpreting them. The most common ones to expect (to draw and interpret) are the scatter diagram (or scatterplot) and the time plot. Revise them if you have forgotten how they are drawn.
Further, you will soon engage in a number of calculations, so ensure that you are ready to perform them and that you remember the descriptive statistics you learnt in your early years of Statistics. It is also very important to know why the calculations are necessary in any exercise of building a forecast model.
There are two types of forecasts, the point forecast and the prediction interval. A point forecast is a single number that estimates
the actual observation. A prediction interval is a range of values that gives us some confidence that the actual value is contained in
the interval.
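A sketch of how a point forecast can be widened into an approximate 95% prediction interval, assuming roughly normally distributed forecast errors; the error values reuse the week-3 forecast errors discussed later in this unit, and the point forecast of 20 litres is an illustrative assumption:

```python
# Turning a point forecast into an approximate 95% prediction interval,
# assuming the past forecast errors are roughly normally distributed.
import statistics

errors = [-6, 4, 0, 1, -1, 3, 2]     # past forecast errors (litres)
s = statistics.stdev(errors)          # sample standard deviation of errors
point_forecast = 20                   # assumed point forecast (litres)

lower = point_forecast - 1.96 * s
upper = point_forecast + 1.96 * s
# (lower, upper) should contain the actual value about 95% of the time,
# under the normality assumption.
```

Wider intervals reflect noisier past errors; a method with small, stable errors yields a tight interval around its point forecast.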
The forecast error requires that the estimate be found and be “paired” with the actual observation.
In statistics, a forecast error is the difference between the actual (realised) value and the predicted (forecast) value of a time series or any other phenomenon of interest. In simple cases, a forecast is compared with an outcome at a single time point, and a summary of forecast errors is constructed over a collection of such time points. Here the forecast may be assessed using the difference or using a proportional error. By convention, the error is defined as the value of the outcome minus the value of the forecast. In other cases, a forecast may consist of predicted values over a number of lead times; in this case an assessment of forecast error may need to consider more general ways of assessing the match between the time profiles of the forecast and the outcome. If a main application of the forecast is to predict when certain thresholds will be crossed, one possible way of assessing the forecast is the timing error: the difference in time between when the outcome crosses the threshold and when the forecast does so. When there is interest in the maximum value being reached, forecasts can be assessed using the difference between the peak value of the outcome and the value forecast for that time point.
A forecast error can be a calendar forecast error or a cross-sectional forecast error, depending on how it is summarised over a group of units. If we observe the average forecast error for a time series of forecasts of the same product or phenomenon, we call this a calendar (or time-series) forecast error. If we observe it for multiple products over the same period, it is a cross-sectional forecast error.
To calculate a forecast error we subtract the estimate (ŷi) from the actual observation (yi); the difference is the forecast error. Can you tell what the values of the forecast errors imply? For example, some may be smaller than others, some negative and others positive!
When Jabulani plans his sales, he makes some estimation of litres of milk that he hopes to sell. In Week 3 prior to getting to
the market, he had made the following estimations (ŷi ):
Remember to refer to the appropriate week of the table of Data set 1.4 for observed values (yi ).
ACTIVITY 1.9
(d) Identify the day on which the milk sales were most disappointing! Explain.
DISCUSSION OF ACTIVITY 1.9
We have not formally defined the terms overestimation and underestimation. They have been defined in other modules, but we wish to give a reminder. If you make a prediction and the actual observation turns out to be smaller, you have overestimated. What is the sign of the forecast error in that case? Can you now define the term "underestimation"? And what is the sign of the forecast error then?
Let us get into the questions of the activity. The setup of week 3 is as follows:
(a) Overestimations are visible after pairing by observing the pairs in which the actual observations are lower than the estimates.
These were on Day 1 and Day 5.
(c) The forecast errors are −6, 4, 0, 1, −1, 3 and 2 for the seven days, respectively.
(d) Day 1 was the most disappointing. This is because Jabulani expected to sell 27 litres but only sold 21 litres. It is the day he
made the biggest loss, that is with the largest negative error.
(e) He made the best prediction on Day 3, where the sales were equal to the estimates.
If there had been no day on which the sales and estimates were equal, then the day with the smallest forecast error in absolute value would have been the one with the best prediction. By that measure, Day 4 and Day 5 were also days on which good predictions were made. We note, however, that Day 5 was not a happy day for the seller, because some stock was left unsold, whereas on Day 4 all stock was sold and one customer did not get milk.
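The week-3 calculations above can be reproduced in a few lines of Python; the estimates below are those implied by the actual sales and the stated forecast errors (e_i = y_i − ŷ_i):

```python
# Reproducing the week-3 error calculations. The actual sales are week 3 of
# the milk data; the estimates are implied by the stated forecast errors.

actual    = [21, 15, 20, 27, 13, 25, 11]   # y_i, week 3
estimates = [27, 11, 20, 26, 14, 22,  9]   # yhat_i (implied)

errors = [y - yhat for y, yhat in zip(actual, estimates)]
# errors == [-6, 4, 0, 1, -1, 3, 2]

# Negative errors are overestimations (Days 1 and 5); the most negative
# error marks the most disappointing day (Day 1).
overestimated_days = [i + 1 for i, e in enumerate(errors) if e < 0]
worst_day = min(range(7), key=lambda i: errors[i]) + 1
```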
Examining the forecast errors over time provides some information on the accuracy of the estimates.
- Randomly scattered forecast errors indicate that the patterns in the data were accounted for when the estimates were made.
- If there is an increasing (or decreasing) trend, and in making an estimation this trend was not taken care of, then the scatter
plot of forecast errors would reveal an increasing (or decreasing) trend.
- If estimates of seasonal data did not account for seasonality, the scatter plot of forecast errors would reveal the seasonal
pattern that was not taken care of.
ACTIVITY 1.10
(b) Do the data reveal any pattern that was not accounted for?
(a) The plot is not difficult to draw. The forecast errors to be used were calculated in Activity 1.9: they are −6, 4, 0, 1, −1, 3 and 2.
Plot of forecast errors of Activity 1.9
(b) The plot looks almost random. This means that the forecasting technique provides a good fit to the data.
ACTIVITY 1.11
Calculate the absolute deviations for the estimates in Activity 1.9.
The absolute deviations are the absolute values of the forecast errors, a concept we can recall from our high-school days. The absolute deviations are thus 6, 4, 0, 1, 1, 3 and 2.
ACTIVITY 1.12
The MAD is therefore

MAD = (∑_{i=1}^{7} |e_i|) / n = 17/7 = 2.42857.
ACTIVITY 1.13
ACTIVITY 1.14
Calculate the MSE for the estimates in Activity 1.9.
DISCUSSION OF ACTIVITY 1.14
To calculate the MSE we need the squared errors, which are 36, 16, 0, 1, 1, 9 and 4. Therefore

MSE = (∑_{i=1}^{7} e_i²) / n = 67/7 = 9.5714.
Now, let us pause a little. We have done a few useful calculations. We have also answered a few questions about errors.
Do you recall the value of the forecast error on the day that the estimate was perfect? Do you also see what is meant by a poor
estimate? Now can you say what is meant by a good estimate? You will recall that the errors need to be as small as possible. So
far it is not absolutely clear what “small” entails.
The MAD and the MSE are the measures we will use to determine whether the errors are small, which indicates a good model. The objective is to select a good forecast model: one that produces forecasts close to the actual observations. The MAD and the MSE will serve as our tools for selecting a forecast model.
We need to understand the MAD and the MSE as they relate to the forecast model.
MAD is not in any way "mad": it is an objective route to good forecasting. The MSE serves the same purpose.
Sometimes the effectiveness of a model is measured in percentages. Such measures are the absolute percentage error (APE) and the mean absolute percentage error (MAPE).
ACTIVITY 1.15
To calculate the APE we need the absolute errors and the actual observations. For observation i, APE_i = |e_i|/y_i × 100, and the mean absolute percentage error is

MAPE = (∑_{i=1}^{n} APE_i) / n.
ACTIVITY 1.16
We obtain

∑_{i=1}^{7} APE_i = 96.8159,

so that

MAPE = 96.8159 / 7 = 13.8308.
The intention in measuring the error is to monitor and control it, reducing it so as to increase the accuracy of these forecasting methods. The forecast error at period t is defined as
et = yt − Ft (1.1)
where et is the forecast error at period t, yt is the actual value at period t, and Ft is the forecast for period t. The summary of
the statistics is given in the next table.
Measures of aggregate error (in each summation the index runs over t = 1, . . . , n):

Mean absolute deviation (MAD):          MAD = ∑|e_t| / n
Mean absolute percentage error (MAPE):  MAPE = (∑|e_t / y_t|) / n  (× 100 when expressed as a percentage)
Mean squared error (MSE):               MSE = ∑ e_t² / n
Root mean squared error (RMSE):         RMSE = √(∑ e_t² / n)
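The measures in the table can be computed directly; the sketch below uses the week-3 milk data and forecast errors from the earlier activities:

```python
# The four aggregate error measures, computed for the week-3 milk data
# (actual sales and the forecast errors used in the earlier activities).
import math

actual = [21, 15, 20, 27, 13, 25, 11]
errors = [-6, 4, 0, 1, -1, 3, 2]
n = len(errors)

mad  = sum(abs(e) for e in errors) / n                            # 17/7
mape = sum(abs(e / y) for e, y in zip(errors, actual)) / n * 100  # percent
mse  = sum(e ** 2 for e in errors) / n                            # 67/7
rmse = math.sqrt(mse)
```

These values reproduce the MAD and MAPE worked out above; smaller values across all four measures indicate a forecast model that tracks the actual observations more closely.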
Please note that business forecasters and practitioners sometimes use different terminology in industry. They may refer to the PMAD as the MAPE, although they compute it as a volume-weighted MAPE. Please stick to the textbook notation.
The following factors should be considered when choosing a forecasting technique:
- Time frame
A forecasting method may take a long or a short time to develop. The time frames, or time horizons, are short, medium or long.
- Data patterns
The patterns we identified in the earlier discussion are trend, cycle and seasonality. If a forecasting situation exhibits a pattern that the method does not take into account, then the method is inappropriate.
- Forecasting cost
Costs include the money and skills needed to develop a forecasting method. If the cost of developing forecasts is higher than the benefits, a cheaper method must be used or forecasts should not be developed at all. Note also that more complex forecasting methods are more expensive to develop, while simple ones are usually less expensive.
- Desired accuracy
Obviously, it is ideal that forecasts be perfectly accurate. Some situations require the best possible accuracy level because of
their high sensitivity. As an example, life-threatening situations such as HIV/AIDS, typhoid, cholera and others, due to risk
of loss of life, require the best possible forecasts with superior accuracy.
- Data availability
When there are no numeric data and no detail, we cannot develop quantitative forecasts. Some situations, though, may offer limited data, or data in a form other than what is required. The forecaster will then have to accommodate the data and choose an appropriate method that suits them, even if it is not ideal for the problem. Be warned that forecasting methods give inaccurate forecasts if inaccurate, outdated or irrelevant data are used to develop the forecasts.
- Convenience
Convenience in this case means the ease of use of the method by forecasters, as well as their understanding of it. If forecasters do not understand the method they use, then not much confidence can be placed in the forecasts.
ACTIVITY 1.17
Suppose that you are to develop forecasts for the number of tourists using the services of a tourism organisation in the country.
You are given data of the number of tourists using these services for the last five years, and they have been increasing annually.
You also realise from the graphs provided that in the months of January, March, June and December the tourists used this company
even more.
(a) As a time series specialist you are requested to develop forecasts and the marketing manager insists on a specific method.
How would you react?
(a) One should not hesitate to differ from the marketing manager, even to the point of refusing to use the prescribed method. Whoever prescribes a method must be able to explain the rationale for it, so the marketing manager must give reasons for the choice, and these reasons must be consistent with time series methodology. In particular, the method must be able to account for the high tourist numbers in January, March, June and December, and it must be able to reflect the increasing annual numbers.
(b) The patterns are clear. The four months with high tourist numbers indicate seasonality while the increasing numbers indicate
an increasing trend.
ACTIVITY 1.18
Develop a forecast model to predict the milk sales of Jabulani’s business (Data set 1.4).
(a) Explain the patterns that exist from the record presented.
Hint: Take note of the seasonality pattern.
(b) If we assume that the display in the past four weeks will recur, can we expect growth in this business? Explain.
DISCUSSION OF ACTIVITY 1.18
As per the explanations given, the data period is too short to warrant consideration of cycles, and the irregular component, by definition, cannot be accounted for. Hence (a) requires an examination of trend and seasonality.
Plots of data set 1.4
Here the data for the different weeks were combined so that the trend can be examined. There is an increasing trend that is
demonstrated by the trend line.
Can we determine the rate of increase? Here, the rate of increase is given by the equation of the trend line. You must be able to
show that the equation of the trend line is
y = 0.1571x + 16.365.
The milk sales are clearly high on Day 4 and Day 6 for all the weeks and low on Day 2, Day 5 and Day 7. Therefore, from the
graph, the seasonal pattern is very evident in the data set.
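The trend-line coefficients can be verified with a short least-squares calculation over the 28 daily observations (weeks 1 to 4 combined, numbered x = 1, ..., 28):

```python
# Fitting the least-squares trend line to the 28 daily milk-sales
# observations (weeks 1-4 of data set 1.4, in chronological order).

sales = [16, 14, 19, 26, 11, 24, 10,   # week 1
         18, 15, 21, 24, 12, 21,  9,   # week 2
         21, 15, 20, 27, 13, 25, 11,   # week 3
         24, 17, 24, 31, 14, 27, 13]   # week 4
x = list(range(1, 29))

n = len(x)
xbar, ybar = sum(x) / n, sum(sales) / n
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, sales))
         / sum((xi - xbar) ** 2 for xi in x))
intercept = ybar - slope * xbar
# slope rounds to 0.1571 and intercept to 16.365, matching the stated
# trend-line equation y = 0.1571x + 16.365.
```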
Regression analysis relates variables through the use of linear equations. It is a statistical methodology with a wide range of applications. The variable of interest, denoted by Y, is made the subject of the formula. It is always the dependent variable because it is a function of the variables on which it depends. It is also called the response variable because, when anything is done to the other variables, the variable of interest responds in a certain way.
The variables that are related to the response variable are the independent variables, which are allowed to vary within their feasible values. These are the predictor variables, often denoted by X1, X2, ..., Xk.
The objective of regression analysis is to build a regression model: a prediction equation that relates Y to X1, X2, ..., Xk. This model is used to describe, predict and control Y on the basis of the predictor variables.
Depending on the application needed to address a problem, a regression model can use quantitative independent variables (that
assume numbers) or qualitative independent variables (that assume non-numerical values).
This module requires you to work fully with simple linear regression models and with some applications of multiple regression models. In addition to the regression models, the scope of the module covers time series, decomposition methods and exponential smoothing.
1.6 Conclusion
We have acquired useful introductory knowledge for the module. We defined forecasting, explained why it is necessary, and described qualitative and quantitative forecasting methods. Time series data were discussed and their components explained; errors in forecasting were defined, together with measures to detect them. Factors for choosing a forecasting technique were discussed, and the use of regression analysis in forecasting was introduced briefly. The exercises that follow are intended to make you
“fit” for the tasks ahead.
Self-evaluation exercises
Unit 2
2.1 Introduction
In order to do a good job, we need to be well equipped for it. In simple terms we need the knowledge to do the job as well as
the facilities or tools to use. To build a house, we need a good foundation and we need to be able to construct walls. The walls
would normally require bricks, which are laid solidly against one another. They are glued together by cement, which is mixed with specific proportions of water and sand; specific skills are required for an effective mix. A mistake in any of these may lead to bad results, which may reveal themselves only years after construction. Developing a forecast likewise requires a mix of knowledge. Fortunately for us, when forecasting is done, there are also tests and measures to indicate whether the forecasts can be trusted.
Good forecasts represent the actual truth well, with no or minor deviations. On the other hand, bad forecasts would mislead the forecaster completely.
We need to know the future so that we can plan for it. If you remember the example of milk sales in unit 1, Thursdays were good days for business, and there was almost always more stock of milk to cater for the increased demand. If the predictions were inaccurate, there could be too little stock when demand was high.
This study unit focuses on model building and some important aspects of residual analysis. The main purpose of the unit is to learn to build forecasting models, while residual analysis measures the accuracy of the model. In particular, the unit focuses on how to:
• build a regression model by hand or using Excel, and interpret the results
• assess goodness-of-fit, and identify outliers and influential points using residual analysis
Outcomes - At the end of the unit you should be able to solve problems on the following topics: multicollinearity; covariance matrix.
Assessment: analyse. Content: correlations; variance inflation. Activities: calculate; test hypotheses. Feedback: discuss each activity.
Where concepts are needed before we can learn a skill, we will look for them wherever they appear in the book. For example, R2 appears in chapters earlier than Chapter 5. Many of these concepts were dealt with in first-year Statistics. Fortunately, they are all brought together in the fifth chapter of the prescribed book:
● R2
● adjusted R2
● the C-statistic
● stepwise regression and backward elimination – read for interest’s sake only, not for examination purposes
● residual plots
● Cook’s distance
Some explanations
Time series data in this study unit shall consist predominantly of numeric data collected over regular intervals. Similar to
building a house on a good solid foundation, with intact walls and roof, in forecasting you also need an appropriate framework to
use your data wisely and then develop useful (and not misleading) forecasts.
2.2 Multicollinearity
We learnt about the correlation coefficient in first-year Statistics. When more than two variables are considered, the correlation
coefficient is generalised to the correlation matrix. We also came across the coefficient of determination when we studied regression.
The correlation coefficient and the coefficient of determination are useful in measuring multicollinearity.
We know from regression analysis that we may express a variable of interest (dependent variable) as a function of other variables
(independent variables). When two independent variables are related, there is collinearity. If more than two independent variables
are related, there is multicollinearity. An extreme case of multicollinearity is singularity, in which an independent variable is
perfectly predicted by another independent variable (or more than one). Do you recall the value of the correlation measure under
perfect correlation? Justify your answer.
ACTIVITY 2.1
Provide an example of a real-life case where multicollinearity can exist.
Consider, for example, a model in which staff performance (Y) depends on staff motivation (X1 ) and staff training (X2 ). Motivation and training of staff are obviously related variables, so one can believe that X1 and X2 are correlated. Do you have any counter-argument to this assertion? Think of other examples. Your examples need not be in the form of mathematical equations; they should just get you thinking.
2.2.1 The variance inflation factor (VIF)
We studied variances in the first year, which gave us an idea of variation. Another place we hear the word “inflation” is economics, but in the current discussion the term has nothing to do with economics.
The variance inflation factor (VIF) is a measure we will use to determine the extent of multicollinearity. The variance inflation
factor is defined as follows:
Consider a regression model relating a response variable Y to a set of predictor variables X1 , X2 , . . . , Xj−1 , Xj , Xj+1 , . . . ,
Xk . The variance inflation factor VIFj for the predictor variable Xj in this set is defined by
VIFj = 1 / (1 − Rj²)
where Rj² is the multiple coefficient of determination for the regression model that relates Xj to all the other predictor variables
X1 , X2 , . . . , Xj−1 , Xj+1 , . . . , Xk .
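As a small sketch in Python, the definition can be wrapped in a helper function; the Rj² values below are hypothetical, chosen only to illustrate how the VIF grows as Rj² approaches 1.

```python
# Variance inflation factor: VIF_j = 1 / (1 - R_j^2),
# where R_j^2 comes from regressing X_j on the other predictors.

def vif(r_squared_j):
    """Return the VIF for a predictor given its R_j^2 value."""
    if r_squared_j >= 1.0:
        raise ValueError("R_j^2 = 1 means singularity: the VIF is undefined.")
    return 1.0 / (1.0 - r_squared_j)

# Hypothetical R_j^2 values for three predictors (illustration only):
r2_values = {"X1": 0.0, "X2": 0.50, "X3": 0.90}
vifs = {name: vif(r2) for name, r2 in r2_values.items()}
print(vifs)  # X1 -> 1.0 (unrelated), X2 -> 2.0, X3 -> 10.0
```

Note that Rj² = 0 gives a VIF of exactly 1, the smallest possible value.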
ACTIVITY 2.2
Calculate the VIF for the Wednesday in the data set relating to milk sales in unit 1.
Recall that in Data set 1.4 in Unit 1 we had the following data for Week 3:
Day 1 2 3 4 5 6 7
yi 21 15 20 27 13 25 11
ŷi 27 11 20 26 14 22 9
ACTIVITY 2.3
Suppose that you are given the following data together with the corresponding estimates.
y 39 41 33 45 29 42 21
ŷ-estimates 36.1 33.9 37.3 40.2 31.7 38.9 34.8
We use Excel to perform the calculations. If you have access to a statistical package, you are welcome to use it.
These values are given:
yi 39 41 33 45 29 42 21
ŷi 36.1 33.9 37.3 40.2 31.7 38.9 34.8
i 1 2 3 4 5 6 7 Sum
(yi − ȳ)² 10.7958 27.9386 7.3674 86.2242 45.0818 39.5100 216.5106 433.4286
(ŷi − ȳ)² 0.1488 3.2917 2.5144 20.1215 16.1146 10.1487 0.8359 53.1756
R² = Σ(ŷi − ȳ)² / Σ(yi − ȳ)² = 53.1756 / 433.4286 = 0.1227.
This is how we calculate the coefficient of determination. The value of R² is needed for the VIF. In calculating a VIF, though, only the independent variables are used: we take each of them in turn and regress it on the others.
NB: Rj² = 0 implies that Xj is not related to the other independent variables.
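The calculation above can be reproduced with a few lines of Python, using the y and ŷ values of Activity 2.3:

```python
# Reproduce the R^2 calculation for the data of Activity 2.3.
y     = [39, 41, 33, 45, 29, 42, 21]
y_hat = [36.1, 33.9, 37.3, 40.2, 31.7, 38.9, 34.8]

y_bar = sum(y) / len(y)

explained = sum((yh - y_bar) ** 2 for yh in y_hat)  # sum of (y_hat_i - y_bar)^2
total     = sum((yi - y_bar) ** 2 for yi in y)      # sum of (y_i - y_bar)^2

r_squared = explained / total
print(round(explained, 4), round(total, 4), round(r_squared, 4))
# approximately 53.1757, 433.4286 and 0.1227, as in the worked example
```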
ACTIVITY 2.4
The last case is used to explain the extent of multicollinearity. If the coefficient of determination of one independent variable regressed on the others is very large (i.e., close to 1), the corresponding VIF is very large.
These two situations lead us to the guidelines for interpreting multicollinearity. To decide about the severity of multicollinearity, we focus on the maximum VIF and the average of the VIFs. In general, multicollinearity between predictor variables is said to be severe if
● the largest VIF is greater than 10, or
● the average of the VIFs is substantially greater than 1.
This means that if one of the above conditions is met, we can conclude that there is severe multicollinearity between the independent variable that was regressed on the others and the remaining ones. However, it is not easy to say what “substantially greater than 1” means.
We have to make it definite for the sake of this module.
We rephrase the rule to be:
Consider multicollinearity as severe if one of the following is true:
ACTIVITY 2.5
Consider the “sales territory performance data” presented in the prescribed textbook. The VIFs were found to be:
Determine if we can conclude that there is severe multicollinearity among the independent variables.
● The multiple coefficient of determination R2 .
This measure was dealt with to some extent earlier. It is explored further in this section. When we add an independent
variable to a regression model, it decreases the unexplained variation and increases the explained variation, thus increasing
the R2 . This is true even when it is an unimportant independent variable. R2 is calculated as follows:
R² = Explained Variation / Total Variation = Σ(ŷi − ȳ)² / Σ(yi − ȳ)²,
where the sums run over i = 1, . . . , n.
ACTIVITY 2.6
In Activity 2.3 we found that R² = 0.1227. That is, the predictor variables explain only about 12% of the total variation of the response variable. This means that there are other important predictor variables that were not included in the model, or that the functional form of the model is incorrect, so it fails to fit the data well.
The adjusted coefficient of determination is
R̄² = (R² − k/(n − 1)) · ((n − 1)/(n − (k + 1)))
where R² is the multiple coefficient of determination, n is the number of observations and k is the number of predictor variables.
ACTIVITY 2.7
How does this measure behave when an additional independent variable is included in the regression model?
For fixed n and k, this measure increases with R². However, it also depends on k and includes a penalty for adding predictors, although the penalty may be too small to stop R̄² from increasing when an unimportant variable is added. Since these two measures do not seem to provide adequate assistance, let us try s, the standard error.
Recall that SSE = Σ(yi − ŷi)².
One criterion considered better than R2 and adjusted R2 for measuring the value of including an additional independent
variable is the standard error given by
s = √(SSE/(n − k − 1)).
The guideline is that if s increases when we add another independent variable, then that independent variable should not be added. It is desirable to have a small s; a large s is equivalent to a wide confidence interval. Equivalently, if we compare models by the length of their prediction intervals, short intervals indicate a desirable model. We will only use s in this module, but note that in practice you may be required to use confidence intervals. Note the equivalence of the two criteria.
The next measure for comparing regression models that will be discussed is the C-statistic.
● The C-statistic
The C-statistic, also called the Cp-statistic, is another valuable measure for comparing regression models. Let s²p denote the mean square error based on a model using all p potential independent variables. If SSE denotes the unexplained variation for another particular model that has k independent variables, then the C-statistic for this model is
C = SSE/s²p − [n − 2(k + 1)].
ACTIVITY 2.8
C = SSE/s²p + 2k + 2 − n.
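The two displayed forms of the C-statistic are algebraically equivalent, since −[n − 2(k + 1)] = 2k + 2 − n. A quick numerical sketch, with hypothetical values of SSE, s²p, n and k, confirms this:

```python
# Two algebraically equivalent forms of the C-statistic.

def c_form1(sse, sp2, n, k):
    return sse / sp2 - (n - 2 * (k + 1))

def c_form2(sse, sp2, n, k):
    return sse / sp2 + 2 * k + 2 - n

# Hypothetical inputs, chosen only to illustrate the algebra:
sse, sp2, n, k = 120.0, 4.0, 25, 3
print(c_form1(sse, sp2, n, k), c_form2(sse, sp2, n, k))  # both give 13.0
```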
ACTIVITY 2.9
It says in the description of SSE that we want SSE to be small. Explain why we want this measure to be small.
s² = SSE/(n − k − 1) = Σ(yi − ŷi)²/(n − k − 1).
In isolation we analyse
SSE = Σ(yi − ŷi)².
This is the sum of the squared differences between the actual values and the estimates. Ideally, if the estimates are perfect
predictions, they will replicate the actual values. Then the differences will be zero. This will therefore result in SSE = 0, the
smallest possible value of SSE. Therefore, if the model used predicts the actual values satisfactorily, then the differences will be
small and SSE will be small.
Look at Example 5.1 (Bowerman et al. 2005: 228).
The output from MINITAB and SAS that appears on page 229 resulted from calculating R², R̄², s and the Cp-statistic.
The MINITAB output gives the two best models of each size in terms of s, R2 and the C-statistic. Thus, we find the two best
one-variable models, the two best two-variable models, . . ., the two best eight-variable models. Note that the adjusted R2 increases
considerably when a second variable is added. There is no problem with the inclusion of ACCTS because it is a good predictor of
the dependent variable.
ACTIVITY 2.10
(a) If a model with only two variables is to be used, which variables would you use?
(b) A model using five variables is the best. Do you agree? Justify your answer.
(a) The model using ACCTS and ADVERT as predictors explains 77.5% of the variation, R2 = 0.775, more than the model
including MKTPOTEN and MKTSHARE.
(b) The models using five predictors have the smallest C-statistics (4.4) and C is closer to the number of parameters k + 1 = 6.
Discussion
We know that most of the time series models we will develop in future as forecasters will not be 100% accurate.
The error, e = y − ŷ, is the deviation between the actual value and the estimate. In Statistics we use a specific term: we speak of a residual when we mean an estimate of the error.
There are methods to deal with these deviations in Statistics so that our predictions remain useful regardless of the presence of
the errors. We refer to them as residual analysis.
ACTIVITY 2.11
Indicate if the following measures use residuals or not. You may explain in the space provided:
This is very interesting. There are links among these measures. Do you see the links? This activity also ensures that we revise
previous work. Can you see how much we have learnt so far?
If you answered “yes”, it is an indication of the importance of residuals. The vehicle we will use in this module to show this importance is residual analysis.
Residual analysis assists us in the prediction task. It helps us to detect errors in the models we develop, and gives us an
indication of whether we are on the right track.
For this we use graphical plots of residuals. We call them residual plots.
ACTIVITY 2.12
From Unit 1, Data set 1.4 Week 3 was as follows:
Day 1 2 3 4 5 6 7
yi 21 15 20 27 13 25 11
ŷi 27 11 20 26 14 22 9
yi 21 15 20 27 13 25 11
ŷi 27 11 20 26 14 22 9
yi − ŷi −6 4 0 1 −1 3 2
Remember that we are using residual plots to test the assumption that e = y − ŷ has a normal distribution with mean 0 and
variance σ 2 . We use the above plot to test the constant variance assumption. If the residuals are randomly distributed around the
zero mean we can assume constant error variance. If, however, the residual plot “fans out” or “funnels in” we have an increasing
or decreasing error variance which implies that the assumption of constant error variance is violated.
Let us share something with you about the residual plot for the milk data.
If you visually place the residual plot in the box below and use lines to explain its shape, it cannot be appropriately explained
by a parallel band of the following form:
Also, it does not look like it can be appropriately explained by a fan shape of the form
Instead, it looks very much like it can be appropriately explained by a funnel shape of the form
Thus the residuals for the milk data violate the assumption of constant variance. The prescribed textbook provides further
illustrations on page 238.
2.4.3 Correct functional form assumption
The model specified from the given data may be correct or incorrect. Using a residual plot, we can determine whether this functional
form is correct or not. If the functional form is incorrect, a correct one can be found from the residual plot constructed from the
derived model by displaying the pattern of the appropriate model. For example, if we use a simple linear regression model when
the true relationship between Y and X is curved, the residual plot would appear as a curve.
The above plot shows no evidence of a bell shape. The normality assumption is violated.
We can also employ a normal plot of the residuals to determine normality. The procedure is as follows:
• Calculate the ordered residuals e(1) , e(2) , . . . , e(n) , then place them on the horizontal axis.
• Calculate a(i) = (3i − 1)/(3n + 1) and find z(i) , the value on the horizontal axis of the standard normal curve such that the area under the curve to the left of z(i) is a(i) . Then place the z(i) on the vertical axis.
• The scatter plot generated by (e(i) , z(i) ), i = 1, . . . , n is the normal probability plot.
• The assumption of normality is not violated when the normal probability plot approximately follows a straight line.
ACTIVITY 2.13
Use a normal plot for the data of Activity 2.12 to determine whether the data come from a normal distribution or not.
DISCUSSION OF ACTIVITY 2.13
ei −6 4 0 1 −1 3 2
e(i) −6 −1 0 1 2 3 4
The a(i) = (3i − 1)/(3n + 1), i = 1, 2, . . . , 7, are 0.0909, 0.2273, 0.3636, 0.5000, 0.6364, 0.7727 and 0.9091.
For illustration, z(1) = −1.335 was found as follows. The area to the left of z(1) under the standard normal curve is a(1) = 0.0909. Obviously z(1) is negative since this area is less than 0.5. Thus, the area under the standard normal curve between z(1) and 0 is 0.5 − 0.0909 = 0.4091.
By symmetry of the standard normal curve, let us find z∗(1) = −z(1) such that P (0 < Z < z∗(1) ) = 0.4091. The value 0.4091 is not in Table A1, but it lies between 0.4082 and 0.4099, which correspond to z = 1.33 and z = 1.34, respectively. The mid-value is thus z∗ = (1.33 + 1.34)/2 = 1.335. Hence z∗(1) = 1.335, which implies that z(1) = −1.335.
The other z-scores are found in a similar way. Please try to find at least two of them. This manual calculation is time consuming,
but one can use Excel with the command =NORM.S.INV(). For example, =NORM.S.INV(0.0909)=-1.335233304, thus when
rounded to four digits after the decimal place, it gives the same result as the one found manually.
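The same computation can be scripted with the Python standard library, whose NormalDist().inv_cdf plays the role of Excel’s NORM.S.INV:

```python
# Compute the normal-probability-plot points (e_(i), z_(i)) for the
# ordered residuals of Activity 2.12, using a_(i) = (3i - 1)/(3n + 1).
from statistics import NormalDist

e_ordered = [-6, -1, 0, 1, 2, 3, 4]   # ordered residuals e_(1), ..., e_(7)
n = len(e_ordered)

a = [(3 * (i + 1) - 1) / (3 * n + 1) for i in range(n)]
z = [NormalDist().inv_cdf(ai) for ai in a]

for ei, zi in zip(e_ordered, z):
    print(ei, round(zi, 4))
# The first point is (-6, -1.3352), matching the manual calculation.
```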
Therefore, we plot
e(i) −6 −1 0 1 2 3 4
z(i) −1.3352 −0.7479 −0.3488 0.0000 0.3488 0.7479 1.3352
On the basis of the above, how would you define negative autocorrelation? Since these are time series data, the resulting error terms are also ordered in time.
In the case where the time-ordered errors do not display a cyclic or an alternating pattern, we say that the error terms are statistically independent.
2.5 Outliers and influential observations
Observations that lie far away from the bulk of your data, are called outliers. Some outliers influence the measures derived from
your data. They are called influential observations.
Influential observations have a serious effect on the analysis. To test for the effects caused by a suspected data point, we perform
calculations and estimations, e.g. leverage values, studentised residuals and Cook’s measure. Then we could remove the suspected
data point and perform the same calculations to observe the change in the findings.
Outliers are not necessarily errors, as we may be led to believe. They are often very high or very low values that occur because of conditions that existed at the time they were observed. Some may indicate a fortune while others may indicate hardship. When high successes are experienced, analysts may examine the factors that contributed to them. It is better to take note of those conditions than simply to eliminate the outlier!
Be warned also that sometimes low values and high values may occur due to seasonality, not because they are just outliers. Out
of the time series context they may be judged as ‘bad’ or ‘good’ while under the time series scope they may be normal values with
a useful implication.
ACTIVITY 2.14
Are there outliers in the following data set? If so, please identify them.
x 40 36 49 1207 23 38 27 44 45 30
y 90 77 87 46 290 79 58 66 87 66
In the x data set, most values lie in the region of the twenties to the forties. The outlier is therefore x = . The y-values
lie in the forties to the nineties so that the outlier is y = .
ACTIVITY 2.15
Calculate the means and the standard deviations of the data in Activity 2.14.
DISCUSSION OF ACTIVITY 2.15
x y
Mean 153.9 94.6
Standard deviation 370.11 70.09
ACTIVITY 2.16
Remove the values which you said were outliers in Activity 2.14. Calculate the means and standard deviations. Were these data
points influential?
x 40 36 49 38 27 44 45 30
y 90 77 87 79 58 66 87 66
If you did not get the correct answers in Activity 2.14, this is the time to update your answers to that question.
x y
Mean 38.63 76.25
Standard deviation 7.520 11.781
Are these measures substantially different from those based on the original data? Clearly they are. What do you conclude?
ACTIVITY 2.17
This is a question given to remove a possible misconception that if a value lies far away from the others, it will also influence
measures calculated from the data set. There are some statistical measures that are easily influenced by outliers, such as the mean
and the standard deviation. But the median and the mode are not influenced that easily. Do you see why?
A leverage value is considered large if it is greater than twice the average of all the leverage values, which can be calculated as 2 (k + 1) /n, where k is the number of predictors and n the sample size. Leverages are usually computed using statistical software packages.
ACTIVITY 2.18
This activity is included to give you a feeling for the calculations done when analysing residuals. These are unrealistic data,
just to prove the point. In real life this analysis will be done by a computer, but here we use a simple dataset.
Consider the following data:
x 40 36 49 1207 23 38 27 44 45 30
y 90 77 87 46 290 79 58 66 87 66
(a) Find the fitted regression equation ŷ = β̂0 + β̂1 x using the method of least squares.
(a) The method of least squares provides the values of β̂0 and β̂1 as follows:
β̂1 = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) = −593954 / 12328569 = −0.048
and
β̂0 = (Σy − β̂1 Σx) / n = 102.
The fitted equation is therefore
ŷ = 102 − 0.048x.
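As a check, the least squares formulas above can be evaluated in a short Python sketch:

```python
# Reproduce the least squares estimates for the data of Activity 2.18.
x = [40, 36, 49, 1207, 23, 38, 27, 44, 45, 30]
y = [90, 77, 87, 46, 290, 79, 58, 66, 87, 66]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = (sum_y - b1 * sum_x) / n

print(round(b1, 3), round(b0))  # -0.048 and 102, as in the text
```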
(b) To calculate the residuals, we estimate the y-values using the equation above at the given x-values:
x 40 36 49 1207 23 38 27 44 45 30
y 90 77 87 46 290 79 58 66 87 66
ŷ 100.087 100.280 99.654 43.865 100.906 100.184 100.714 99.895 99.846 100.569
e −10.087 −23.280 −12.654 2.135 189.094 −21.184 −42.714 −33.895 −12.846 −34.569
(c) The suspect residuals are the fourth and the fifth ones, namely e4 = 2.135 and e5 = 189.094.
ACTIVITY 2.19
SSE = Σ(yi − ŷi)² = 41346.9053.
Then
s = √(SSE/(n − 2)) = √(41346.9053/8) = 71.8913.
Now
Di = 1/n + (xi − x̄)²/SSxx
where
SSxx = Σ(xi − x̄)² = 1232856.9, with x̄ = Σxi /n = 1539/10 = 153.9.
i 1 2 3 4 5
Di 0.1105 0.1113 0.1089 0.9995 0.1139
i 6 7 8 9 10
Di 0.1109 0.1131 0.1098 0.1096 0.1125
Now we want
si = s√(1 + Di ).
They are
i 1 2 3 4 5
si 75.7601 75.7857 75.7055 101.6566 75.8751
i 6 7 8 9 10
si 75.7727 75.8465 75.7353 75.7292 75.8258
Dividing each residual
e −10.087 −23.280 −12.654 2.135 189.094 −21.184 −42.714 −33.895 −12.846 −34.569
by the corresponding si gives the studentised residuals
i 1 2 3 4 5
ei(stud) −0.1331 −0.3072 −0.1671 0.0210 2.4922
i 6 7 8 9 10
ei(stud) −0.2796 −0.5632 −0.4475 −0.1696 −0.4559
As expected, the fifth observation is an outlier with respect to y since the corresponding studentised residual 2.4922 is greater than 2.
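The whole studentised-residual calculation can also be scripted, following the unit’s recipe. The values below are recomputed from first principles, so they may differ in the last decimals from the printed tables:

```python
# Studentised residuals for the data of Activity 2.18, using
#   D_i = 1/n + (x_i - x_bar)^2 / SS_xx,  s_i = s * sqrt(1 + D_i),
#   e_i(stud) = e_i / s_i.
from math import sqrt

x = [40, 36, 49, 1207, 23, 38, 27, 44, 45, 30]
y = [90, 77, 87, 46, 290, 79, 58, 66, 87, 66]
n = len(x)

b1, b0 = -0.048177, 102.0144            # least squares estimates from (a)
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sse = sum(ei ** 2 for ei in e)
s = sqrt(sse / (n - 2))                  # about 71.89

x_bar = sum(x) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)

d = [1 / n + (xi - x_bar) ** 2 / ss_xx for xi in x]
e_stud = [ei / (s * sqrt(1 + di)) for ei, di in zip(e, d)]

print([round(es, 4) for es in e_stud])
# The fifth value is about 2.49, flagging observation 5 as an outlier.
```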
Sometimes an “obvious” outlier cannot be detected using studentised residuals. Studentised deleted residuals may also be used.
Thereafter we will also use Cook’s distance.
Deleted residuals
The deleted residual for observation i is calculated by subtracting, from yi , the point estimate computed from the least squares fit based on all n observations except observation i. This is done because if yi is an outlier with respect to its y-value, using this observation to compute the usual least squares point estimates might draw the usual point prediction ŷi towards yi and thus cause the resulting usual residual to be small. This would falsely imply that observation i is not an outlier with respect to its y-value. Studentised deleted residuals are usually computed using statistical software packages.
ACTIVITY 2.20
Inspect the output on p. 256 of the textbook.
Di = [e²i / ((k + 1)s²)] · [hi / (1 − hi)²],
where ei = yi − ŷi is the ith residual, s2 is the model standard error and hi the ith diagonal element of the hat matrix. It is
known that a multiple linear regression model is expressed, in matrix form, as
y = Xβ + ϵ
where
y is the vector of responses, X is the design matrix, β is the vector of model parameters and ϵ is the vector of error terms. The hat
matrix is then defined by:
H = X(X′X)⁻¹X′.
Cook’s distance is compared with an F critical value, say F0.05 (k + 1, n − (k + 1)), to see if it is significant. To guide us further we shall also use the following rule of thumb:
● A value of Di > 1.0 would generally be considered large.
ACTIVITY 2.21
The table below is about the need for labour in 17 US Navy hospitals, as described in the textbook on page 254. The response variable, Y , is the number of monthly hours required, and the independent variables are
X1 : monthly X-ray exposure
X2 : monthly occupied bed days - a hospital has one occupied bed day if one bed is occupied for an entire day
X3 : average length of patients’ stay, in days.
Hospital Hours (Y ) Xray (X1 ) BedDays (X2 ) Length (X3 )
1 566.52 2463 472.92 4.45
2 696.82 2048 1339.75 6.92
3 1033.15 3940 620.25 4.28
4 1603.62 6505 568.33 3.9
5 1611.37 5723 1497.6 5.5
6 1613.27 11520 1365.83 4.6
7 1854.17 5779 1687 5.62
8 2160.55 5969 1639.92 5.15
9 2305.58 8461 2872.33 6.18
10 3503.93 20106 3655.08 6.15
11 3571.89 13313 2912 5.88
12 3741.4 10771 3921 4.88
13 4026.52 15543 3865.76 5.5
14 10343.81 36194 7684.1 7
15 11732.17 34703 12446.33 10.78
16 15414.94 39204 14098.4 7.05
17 18854.45 86533 15524 6.35
reg y x1 x2 x3
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .0529876 .020092 2.64 0.021 .0095813 .0963938
x2 | .9784801 .1051542 9.31 0.000 .7513084 1.205652
x3 | -320.9485 153.193 -2.10 0.056 -651.9019 10.00497
_cons | 1523.372 786.9016 1.94 0.075 -176.6255 3223.37
------------------------------------------------------------------------------
(a) Write down the equation of the fitted model.
(b) The following table presents STATA output for the diagnostic statistics discussed above. Indicate outlying observations in terms of (1) standardised residuals, (2) studentised residuals, (3) leverages and (4) Cook’s distance. Show all steps.
(b) (1) and (2) The observation with standardised and studentised residuals greater than 2 corresponds to hospital 14. Therefore, the outlying observation in its y-value is the one corresponding to hospital 14.
(3) To decide which observations have high leverage (outliers in the x-values), we must find the observations with leverages greater than 2(k + 1)/n, where k = 3 and n = 17. We find
2(k + 1)/n = 2(3 + 1)/17 = 8/17 ≈ 0.4706.
Therefore, outliers in the x-values are the observations corresponding to hospitals 15, 16 and 17 since the corresponding leverages
0.6818, 0.7855 and 0.8632, respectively, are greater than 0.4706.
(4) For n = 17, k = 3 and α = 0.05, the F critical value is F0.05 (3 + 1, 17 − (3 + 1)) = F0.05 (4, 13) = 3.18. Hospital 17 corresponds to an influential observation since its Cook’s distance value 5.0329 is greater than the F critical value F0.05 (4, 13) = 3.18.
Obviously, the value 5.0329 is also greater than 1.
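The decision rules in (3) and (4) can be sketched in Python. Only the leverages and the Cook’s distance quoted in the text are used here, since the full STATA table is not reproduced:

```python
# Flag high-leverage and influential observations for Activity 2.21.
k, n = 3, 17

leverage_cutoff = 2 * (k + 1) / n                 # 8/17, about 0.4706
leverages = {15: 0.6818, 16: 0.7855, 17: 0.8632}  # quoted in the text
high_leverage = [h for h, lev in leverages.items() if lev > leverage_cutoff]

cooks_d_17 = 5.0329                               # quoted in the text
f_critical = 3.18                                 # F_0.05(4, 13) from tables
influential = cooks_d_17 > f_critical and cooks_d_17 > 1.0

print(round(leverage_cutoff, 4), high_leverage, influential)
# 0.4706, hospitals [15, 16, 17] flagged, hospital 17 influential
```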
In practical situations outliers could have important implications. The patterns of time series, such as seasonality, could be the
result of outlying elements in the data. To identify outliers we inspect leverage points and residuals using the techniques studied
above.
2.6 Conclusion
The study unit explained model building and checking a model for usefulness by examining how far it deviates from the real observations. Some useful statistics were introduced, and experiments were performed to appreciate them. These statistics are important, but you are not required to memorise them. You are also not expected to derive them. However, you need to be able to interpret computer output on these statistics.
EXERCISES
i X Y
1 2 18
2 15 129
3 11 90
4 100 805
5 25 210
6 9 88
Calculate
(a) SSxx
(b) SSxy
where SSxy = Σxi yi − (Σxi)(Σyi)/n
(e) Which Di ’s are larger than the value for 2(k + 1)/n?
(f) Can you conclude that there are outliers in the data? Explain.
Open questions
(a) Why do we, as forecasters, have to study residuals, outliers, influential observations and the underlying measures?
(b) What is the role of residuals and of deleted residuals? Clarify your answer. Do residuals also explain deleted residuals?
Textbook exercises
Exercise 5.4
Exercise 5.5
Exercise 5.7
Exercise 5.16
If you are unsure whether your answers are correct, discuss them with your fellow students in the Discussion forum on the
module website.
Unit 3
• Fit the trend of time series data using polynomials, with emphasis on first order.
• Determine the types of seasonal variation and use dummy variables for assessing seasonal effects.
Outcomes - At the end of the unit you should be able to:
• use polynomial functions in modelling trend (Assessment: model trend; Content: data plots, parameter estimation and measures; Activities: plot graphs, experiment with data and interpret data; Feedback: discuss the activity)
• detect autocorrelation (Assessment: Durbin-Watson test, graphs; Content: autocorrelation detection, DW statistic; Activities: perform exercises with DW; Feedback: discuss the activities)
• model seasonality using dummy variables (Assessment: regression with dummy variables; Content: modelling seasonality using dummy variables; Activities: find lengths of seasonality, develop forecasts; Feedback: discuss the activities)
3.1 Introduction
This unit is based on Chapter 6 in the prescribed textbook, which is Time Series Regression. It does not require full fluency in regression; your basic knowledge of polynomials will suffice. We discussed regression models briefly in the previous study units. There we stated that the variable of interest (Y), which is the dependent variable, is regressed on the variables (factors) on which it
depends. In the past two units we plotted and interpreted some graphs. Did you find them useful? Quadratic equations were also
dealt with at school. Do you remember the parabola? This is the graph of a quadratic equation. You are welcome to refer to school
textbooks for these graphs.
These topics, together with the ones we learnt in study units 1 and 2 such as the components of time series, will be integrated
in this study unit. Do you still remember the components of time series? Attempt to name them.
We defined trend, seasonality, cyclic and irregular patterns in the earlier study units. We will treat trend as it may occur in
a linear pattern, a quadratic pattern and where there is no trend. The linear and quadratic patterns will include decreasing and
increasing trends.
One of the elements we dealt with in the previous study units is independence. Residuals are useful for detecting whether the data are independent or not. Time series data are observations of the same phenomenon recorded over consecutive time periods; hence, they cannot be fully independent. The usual relationship in time series data is autocorrelation: when adjacent residuals are correlated with each other, we say that they are autocorrelated.
Autocorrelation can be negative or positive. Positive autocorrelation exists when over time, a positive error term is followed
by another positive error term and if over time, a negative error term is followed by another negative error term. On the other
hand, negative autocorrelation exists when over time, a positive error term is followed by a negative error term and if over time,
a negative error term is followed by a positive error term. We will explore this idea further. Residual plots and the Durbin-Watson
statistic will be involved.
Do you remember that some data do not have a seasonal pattern? Analysing data will reveal the presence or absence of
seasonality and when present, we should be able to determine the pattern.
We will show how dummy variables and trigonometric functions may be used to deal with seasonality. Growth curve models will also be studied. The unit will also show how to deal with autocorrelated errors using a first-order autocorrelated process.
yt = TRt + εt

where TRt is the trend (the average level) in time period t and εt is the error term in time period t. The value yt can thus be represented by an average level µt, which changes over time according to the equation µt = TRt, and by the error term εt. Since random fluctuations often occur in a process, the error term represents the random fluctuations that cause the yt values to deviate from the average level µt. The three trends that we are going to study in this module are no trend, linear trend, and quadratic trend.
ACTIVITY 3.1
What do you think “no trend” means?
3.2.1 No trend
In qualitative terms one may describe the condition as stable. This is a case of no deterioration and no improvement, therefore a
case of no trend. There is a general constant pattern displayed with no long-run growth or decline over time. In this case the trend
takes some constant value β0 , and is modeled as T Rt = β0 . Generally the case of “no trend” is undesirable, but it may happen.
Who would not want to see change?
Note that the case of “no trend” does not necessarily mean absolutely no change. If the changes are shown by fluctuations (the
ups and downs) in such a way that the average seems constant in the long run, then we have no trend.
When there is a linear trend, the model becomes

yt = TRt + εt = β0 + β1t + εt

The values β0 and β1 of the above equation provide us with the shape of the line graph. Try to recall the values that lead to various shapes.
ACTIVITY 3.2
Discuss the implications of the parameters β0 and β1 on the shape of the linear graph.
More generally, the pth-order polynomial trend model is

yt = TRt + εt = β0 + β1t + β2t² + ⋯ + βptᵖ + εt
ACTIVITY 3.3
Write down the equation for the 3rd -order polynomial trend model.
The estimation of the regression parameters β0 , β1 , β2 and β3 is done using the method of least squares. The assumptions in
the model are that the error term εt satisfies the constant variance, independence, and normality assumptions.
ACTIVITY 3.4
How would you identify the violations of the assumptions?
ACTIVITY 3.5
The Cod Catch data described in Example 6.1 of the prescribed textbook are reproduced below.
The company wanted to forecast its minimum and maximum possible revenues from cod catch sales and to plan the operations of its fish processing plant by making point and interval forecasts of its monthly cod catch (in tons).
(b) Which type of trend does the plot indicate? Explain your answer.
(c) Determine the point estimate and the 95% prediction interval for the monthly cod catches.
(a) We must first combine in long format the data of the two years, and thus have 24 data points as in the table below.
The plot of Cod Catch versus Time is displayed below. The plot was done using STATA, but it is also easy to plot in Excel.
Try it. You can also use another statistical package.
(b) The plot of the data reveals a random fluctuation around a constant average level. The maximum is slightly above 400 while
the minimum seems to be half way between 250 and 300, that is about 275.
Now, since we assume a random fluctuation around a constant level, it makes sense to believe that a model with no trend describes the data. Hence, we conclude that the regression model with no trend is to be used in forecasting the cod catch in future months. Therefore, we use the following model:
yt = T Rt + εt = β0 + εt
(c) The parameter β0 is a constant. How is it estimated? It is well known that in this case least squares estimation gives β̂0 = ȳ, that is:

β̂0 = ȳ = (y1 + y2 + ⋯ + y24)/24 = (362 + 381 + ⋯ + 365)/24 = 351.2917.
When minimum and maximum values are mentioned, we must remember from first year that confidence intervals are being implied. Now, do you recall interval estimation? In forecasting we speak of point forecasts when point estimates of future values are of interest, and of prediction interval forecasts when interval estimates of future values are required.
Also,

s = √( Σ (yt − ȳ)² / (n − 1) )

where the sum runs over t = 1, ..., n.
The results in the following table will be used to calculate s.
t yt (yt − ȳ)²
1 362 114.6677
2 381 882.5831
3 317 1175.9207
4 297 2947.5887
5 399 2276.0819
6 402 2571.3317
7 375 562.0835
8 349 5.2519
9 386 1204.6661
10 328 542.5033
11 389 1421.9159
12 343 68.7523
13 276 5668.8401
14 334 299.0029
15 394 1823.9989
16 334 299.0029
17 384 1069.8329
18 314 1390.6709
19 344 53.1689
20 337 204.2527
21 345 39.5855
22 362 114.6677
23 314 1390.6709
24 365 187.9175
Mean 351.2917; Total 26314.9583
Hence,

s = √( Σ (yt − ȳ)² / (n − 1) ) = √( 26314.96 / 23 ) = 33.82497
The 95% prediction interval is

( ȳ − t0.025[23] s √(1 + 1/n) ; ȳ + t0.025[23] s √(1 + 1/n) )

= ( 351.2917 − 2.069(33.82497) √(1 + 1/24) ; 351.2917 + 2.069(33.82497) √(1 + 1/24) )

= (279.8647; 422.7187)
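As a check, the point estimate and prediction interval above can be reproduced with a short script. This is a sketch in Python (not part of the prescribed textbook); the data and the table value t0.025[23] = 2.069 are taken from the discussion above.

```python
import math

# Cod catch (in tons) for the 24 months, from the table above
y = [362, 381, 317, 297, 399, 402, 375, 349, 386, 328, 389, 343,
     276, 334, 394, 334, 384, 314, 344, 337, 345, 362, 314, 365]
n = len(y)

# No-trend model: least squares gives beta0-hat = ybar
b0 = sum(y) / n

# s = sqrt( sum((y_t - ybar)^2) / (n - 1) )
s = math.sqrt(sum((v - b0) ** 2 for v in y) / (n - 1))

# 95% prediction interval: ybar +/- t * s * sqrt(1 + 1/n)
t_crit = 2.069  # t table, 23 degrees of freedom
half = t_crit * s * math.sqrt(1 + 1 / n)
lower, upper = b0 - half, b0 + half
print(round(b0, 4), round(s, 4), round(lower, 4), round(upper, 4))
```

The printed values agree with the hand calculation above up to rounding.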
ACTIVITY 3.6
The demand for a new type of calculator, called Bismark X-12, has been increasing over the last two years in Smith's Department Stores, Inc., as stated in Example 6.2 of the prescribed textbook. Smith's uses an inventory policy to meet customers' demand without ordering calculators that may greatly exceed the demand. In order to implement this policy in future months, Smith's requires both point forecasts and prediction intervals for total monthly Bismark X-12 demand. The monthly calculator demand data for the past two years are given in the following table:
(b) Which type of trend does the plot indicate? Explain your answer.
(c) Determine the point estimate and the 95% prediction interval of calculator demand for January of the third year.
DISCUSSION OF ACTIVITY 3.6
(a) We must first combine in long format the data of the two years, and thus have 24 data points as in the table below.
The plot of Demand versus Time is displayed below. The plot was done using STATA, but it is also easy to plot in Excel.
Try it. You can also use another statistical package.
(b) The figure gives an indication of an increasing trend. Therefore, we shall employ the regression equation of the form:
yt = T Rt + εt = β0 + β1 t + εt
(c) The method of least squares can be used to estimate the parameters β1 and β0. Hence,

β̂1 = [ nΣtyt − (Σt)(Σyt) ] / [ nΣt² − (Σt)² ] = [ 24(98973) − (300)(7175) ] / [ 24(4900) − 300² ] = 8.0743

and

β̂0 = (Σyt − β̂1Σt) / n = [ 7175 − (8.0743)(300) ] / 24 = 198.0290.
The fitted trend line is ŷt = 198.0290 + 8.0743t, which can be used to calculate the point forecast of a future demand yt. January of the third year corresponds to t = 25.
Hence, the point forecast of the demand in January of the third year is:

ŷ25 = 198.0290 + 8.0743(25) = 399.8865.

Furthermore, the 95% prediction interval for y25 is:
ŷ25 ± t0.025(24−2) s √( 1 + 1/24 + (25 − t̄)² / Σ(t − t̄)² )

where t0.025(24−2) = t0.025(22) = 2.074 and

s = √( Σ(yt − ŷt)² / (n − 2) ) = √( 22066.6 / 22 ) = 31.6706
The 95% prediction interval of y25 is:

ŷ25 ± t0.025(24−2) s √( 1 + 1/24 + (25 − t̄)² / Σ(t − t̄)² ) = 399.8865 ± 2.074(31.6706) √( 1 + 1/24 + (25 − 12.5)²/1150 ).
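A sketch in Python of the same calculation, using the summary sums quoted above (the raw demand data are in the prescribed textbook):

```python
import math

# Summary statistics from the discussion above: n = 24 months, t = 1..24
n, sum_t, sum_t2 = 24, 300, 4900
sum_y, sum_ty = 7175, 98973

# Least squares slope and intercept for y_t = b0 + b1*t
b1 = (n * sum_ty - sum_t * sum_y) / (n * sum_t2 - sum_t ** 2)
b0 = (sum_y - b1 * sum_t) / n

# Point forecast for January of year 3 (t = 25)
y25 = b0 + b1 * 25

# 95% prediction interval; s = 31.6706 and t = 2.074 as quoted above
s, t_crit = 31.6706, 2.074
t_bar = sum_t / n                # 12.5
ss_t = sum_t2 - n * t_bar ** 2   # sum of (t - t_bar)^2 = 1150
half = t_crit * s * math.sqrt(1 + 1 / n + (25 - t_bar) ** 2 / ss_t)
print(round(y25, 4), round(y25 - half, 2), round(y25 + half, 2))
```

Small differences from the hand calculation arise because the script carries the coefficients at full precision.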
ACTIVITY 3.7
Activities 3.5 and 3.6 focused on time series data with no trend and with a linear trend, respectively. In this activity we focus on a time series with a quadratic trend, using Example 6.3 in the prescribed textbook. The example is based on data collected in two consecutive years for monthly loan requests by staff members at State University Credit Union. The credit union requires both point forecasts and prediction intervals of monthly loan requests to be made by staff members in future months. The collected data, in thousands of dollars, are reported in the following table:
(b) Which type of trend does the plot indicate? Explain your answer.
(c) Determine the point estimate and the 95% prediction interval of loan request for January of the third year.
DISCUSSION OF ACTIVITY 3.7
(a) We must first combine in long format the data of the two years, and thus have 24 data points as in the table below.
The plot of Loan request versus Time is displayed below. The plot was done using STATA, but it is also easy to plot in Excel.
Try it. You can also use another statistical package.
(b) The above graph indicates an increasing trend with a decreasing rate. Thus the following quadratic model may be used to
model the data:
yt = TRt + εt = β0 + β1t + β2t² + εt.
(c) Parameter estimation can be done using any statistical package, but one can also use Excel, provided the values of t² are first created as in the following table:
yt t t2
297 1 1
249 2 4
340 3 9
406 4 16
464 5 25
481 6 36
549 7 49
553 8 64
556 9 81
642 10 100
670 11 121
712 12 144
808 13 169
809 14 196
867 15 225
855 16 256
965 17 289
921 18 324
956 19 361
990 20 400
1019 21 441
1021 22 484
1033 23 529
1127 24 576
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.9935
R Square 0.9871
Adjusted R Square 0.9859
Standard Error 31.2469
Observations 24
ANOVA
df SS MS F Significance F
Regression 2 1566730.1527 783365.0764 802.3275 0.0000
Residual 21 20503.6806 976.3657
Total 23 1587233.8333
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 199.6196 20.8480 9.5750 0.0000 156.2639 242.9753
X Variable 1 50.9366 3.8424 13.2564 0.0000 42.9459 58.9274
X Variable 2 -0.5677 0.1492 -3.8048 0.0010 -0.8780 -0.2574
ACTIVITY 3.8
Determine forecasts for the loan requests for April of year 7.
We note that December of year 6 is t = 72. Can you see why? It then becomes easy to realise that April of year 7 is t = 76. Thus, we are required to determine the value of ŷ76.
ACTIVITY 3.9
Consider the following residual plots of time series data. State in each case if the error terms are negatively autocorrelated,
positively autocorrelated or there is no autocorrelation. The space in “Verdict” below the following graphs allows you to fill in the
answer.
Residual plot (a)
Residual plot (b)
Residual plot (c)
You have made the verdicts by deciding the appropriate pattern for each graph given. Are you happy with your answers?
Residual plot (a) is fully characterised by the “Positive autocorrelation clarification phrase”. Residual plot (b) cannot be related
to any of the two phrases; hence it is an example of a case where there is no autocorrelation. Lastly, Residual plot (c) is fully
characterised by the “Negative autocorrelation clarification phrase”. To convince ourselves even more, we read off the values from
these graphs. The three residual data sets used are:
residuals (a) −2 −7 −4 3 9 14 4 −1 −5 −3 −1 2 5 3
residuals (b) −2 −7 4 −3 9 0 −4 −1 5 −3 1 −1 5 3
residuals (c) −2 7 −4 3 −9 14 −4 1 −5 3 −1 2 −5 3
These residuals confirm the verdicts in the above discussion. An alternative way is to use runs. A run is simply a maximal set of identical signs following each other. If the signs of consecutive residuals appear in long runs, we have positive autocorrelation. If the signs alternate, we have negative autocorrelation. Where neither of these patterns appears, there is a random pattern. This is the case where the assumption of independent errors is confirmed. The two cases of autocorrelation are undesirable since they violate this assumption. In the examples above, we have the following runs:
residuals (a) − − − + + + + − − − − + + +
residuals (b) − − + − + + − − + − + − + +
residuals (c) − + − + − + − + − + − + − +
In the following subsections, we will give a formula to calculate the degree of autocorrelation so that you do not rely only on visual inspection.
3.3.2 First-order autocorrelation
One type of positive or negative autocorrelation is the first-order autocorrelation, denoted as AR(1) where AR stands for
autoregressive and 1 represents the lag. In this case, the residuals are related to their immediate predecessors. That is the error term
in period t (εt ) is related to the error term in period t − 1, namely; εt−1 . The first-order autocorrelation AR(1) is represented by
the equation εt = ϕ1 εt−1 + at . Here we assume that:
● ϕ1 is the correlation coefficient between error terms separated by one time period; and
● a1 , a2 , ... are values randomly and independently selected from a normal distribution having mean zero and a variance
independent of time.
We promised to show how to determine negative or positive autocorrelation. The Durbin-Watson test will assist in achieving
this. This test can be one-sided (one-tailed) or two-sided (two-tailed). It is important to note the meaning given by each version
of a one-sided test. The Durbin-Watson (DW) statistic is used for all three versions, but it should not be used if there are fewer than 15 or more than 100 residuals. We also need the number of predictor variables (k). Here the number of predictor variables is the power (k) of the polynomial from which the residuals were derived. In simple linear regression, k = 1.
Let e1, e2, ..., en be the time-ordered residuals. The DW statistic is:

d = Σi=2..n (ei − ei−1)² / Σi=1..n ei²
Positive autocorrelation is the first of the three versions that we look at in the use of the DW statistic.
This version is a one-sided test for positive autocorrelation. It is formulated in clearer detail as follows:
ACTIVITY 3.10
Use the DW test to determine if the following residuals are positively AR(1). Assume that the model for the residuals was of
the fourth power.
Error terms: −2 − 7 − 4 3 9 14 4 − 1 − 5 − 3 − 1 2 5 3 − 1 − 4
ei | ei² | ei−1 | (ei − ei−1)²
−2 4 − −
−7 49 −2 25
−4 16 −7 9
3 9 −4 49
9 81 3 36
14 196 9 25
4 16 14 100
−1 1 4 25
−5 25 −1 16
−3 9 −5 4
−1 1 −3 4
2 4 −1 9
5 25 2 9
3 9 5 4
−1 1 3 16
−4 16 −1 9
Total 462 340
2. d = Σ(ei − ei−1)² / Σei² = 340/462 = 0.7359
3. We choose α = 0.01.
4. Since we assume that the model used was of the fourth power, and using α = 0.01, we read from the DW table the values corresponding to row n = 16 and column k = 4. These values are dL,0.01 = 0.53 and dU,0.01 = 1.66.
5. Since dL,0.01 = 0.53 < d = 0.7359 < dU,0.01 = 1.66, the test is inconclusive at the 1% level of significance.
While discussing this activity, one realises that the choice of α can be an important factor. To illustrate this point, suppose in the above activity we chose α = 0.05. We would then have dL,0.05 = 0.74 and dU,0.05 = 1.93. The decision is to reject H0 since d < dL,α. Interesting! What do you think?
In order to address the activity fully, the decision reached implies that at the 5% level of significance we conclude that the error
terms are positively autocorrelated.
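The hand calculation above can be checked with a short helper; a sketch in Python:

```python
def durbin_watson(e):
    """Durbin-Watson statistic d = sum (e_i - e_{i-1})^2 / sum e_i^2."""
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    return num / sum(v ** 2 for v in e)

# Residuals from Activity 3.10
e = [-2, -7, -4, 3, 9, 14, 4, -1, -5, -3, -1, 2, 5, 3, -1, -4]
d = durbin_watson(e)
print(round(d, 4))  # a small d points towards positive autocorrelation
```

The same function can be reused for the residuals in the activities that follow.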
Durbin-Watson test for negative autocorrelation
This version is a one-sided test to test for negative autocorrelation. It is formulated in similar details as for positive correlation as
follows:
ACTIVITY 3.11
Use the DW test to determine if the following residuals are negatively AR(1). For argument's sake, assume that the model from which these residuals were derived was quadratic. Use α = 0.05.
Error terms: − 2 − 7 4 − 3 9 0 − 4 − 1 5 − 3 1 − 1 5 3 − 4 9 − 4
2. d = Σ(ei − ei−1)² / Σei² = 992/359 = 2.7632.
When you are required to test any hypothesis, show the steps you follow. This is the reason we formulated the steps formally
for this test to make it easy. There is a tendency for students to start with the statistics, then read the table values and make a
decision about a hypothesis that they did not state. When this happens, note that it is meaningless. No marks should be awarded
for it. You have been advised.
Durbin-Watson test for autocorrelation
Many problems do not explicitly state that we have to test for positive or negative autocorrelation. In that case, the alternative
hypothesis changes and the decision rules for both the positive and negative autocorrelation must be examined. This version is a
two-sided test for autocorrelation.
We note that the steps are the same for all three statistical hypothesis tests. There is one possibility where the test gives no verdict: we then say that the test is inconclusive, or that the test fails.
ACTIVITY 3.12
Use the DW test to determine if the following residuals are positively or negatively AR(1). For argument's sake, assume that the model from which these residuals were derived was of the fifth power. Use α = 0.10.
Error terms: − 2 7 − 4 3 − 9 14 − 4 1 − 5 3 − 1 2 − 5 3 − 9 5 − 2 7 − 1
DISCUSSION OF ACTIVITY 3.12
2. d = Σ(ei − ei−1)² / Σei² = 2045/605 = 3.3802
3. α = 0.10 ⟹ α/2 = 0.05.
Conclusion: We reject H0 and conclude that the error terms are negatively autocorrelated.
Before we get to the real stuff, recall the patterns of fluctuations in waves. Waves have peaks and troughs, like the sine and
cosine curves. The magnitude of the fluctuation in these patterns is indicated by the minimum and maximum levels that peaks and
troughs can reach. A swing is a fluctuation shown by peaks and troughs.
Seasonal variation is a component of a time series which is defined as the repetitive and predictable movement around the trend line in one year or less. It is detected by measuring time intervals in small units, such as days, weeks, months or quarters.
Organizations facing seasonal variations, like the motor vehicle industry, are often interested in knowing their performance relative to the normal seasonal variation. The same is true of the Department of Labour in South Africa, which expects unemployment to increase in December (maybe even January to March) because recent graduates are just arriving in the market and schools have also closed for the summer vacation. The main point is whether the increase is more or less than expected. Organizations affected by seasonal variation need to identify and measure this seasonality to help with planning for temporary increases or decreases in labour requirements, inventory, training, periodic maintenance, and so forth. Apart from this, the organizations need to know if the seasonal variation they experience is more or less than the average rate.
1. The description of the seasonal effect provides a better understanding of the impact this component has upon a particular
series.
2. After establishing the seasonal pattern, methods can be implemented to eliminate it from the time-series to study the effect
of other components such as cyclical and irregular variations. This elimination of the seasonal effect is referred to as
deseasonalising or seasonal adjustment of data.
3. To project past patterns into the future, knowledge of the seasonal variations is a must.
Assumptions
A decision maker or analyst must select one of the following assumptions when treating the seasonal component:
Seasonal Index
Seasonal variation is measured in terms of an index, called a seasonal index. It is an average that indicates the percentage of an actual observation relative to what it would be if no seasonal variation were present in that period. It is attached to each period
of the time series within a year. This implies that if monthly data are considered there are 12 separate seasonal indices, one for each
month, and 4 separate indices for quarterly data. The following methods are used to calculate seasonal indices to measure seasonal
variations of a time-series data.
In this module you will be required to develop forecasts by focusing on only two of these methods, namely the method of simple averages and the ratio-to-moving-average method.
An example
Now let us try to understand the measurement of seasonal variation by using the Ratio-to-Moving Average method. This
technique provides an index to measure the degree of the seasonal variation in a time series. The index is based on a mean of
100, with the degree of seasonality measured by variations away from the base. For example, if we observe the hotel rentals in a winter resort, we may find that the winter-quarter index is 124. The value 124 indicates that 124 percent of the average quarterly rental occurs in winter. If the hotel management records 1436 rentals for the whole of last year, then the average quarterly rental would be 359 (that is, 1436/4). As the winter-quarter index is 124, we estimate the number of winter rentals as follows:

359 × (124/100) = 445.16 ≈ 445

In this example, 359 is the average quarterly rental, 124 is the winter-quarter index, and 445 is the seasonalised winter-quarter rental.
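The arithmetic can be sketched in Python as follows (values taken from the example above):

```python
total_rentals = 1436               # rentals for the whole of last year
avg_quarterly = total_rentals / 4  # average quarterly rental
winter_index = 124                 # winter-quarter seasonal index (percent)
winter_estimate = avg_quarterly * winter_index / 100
print(avg_quarterly, winter_estimate)
```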
This method is also called the percentage moving average method. In this method, the original data values in the time-series
are expressed as percentages of moving averages. The steps and the tabulations are given below.
Steps
1. Find the centered 12 monthly (or 4 quarterly) moving averages of the original data values in the time-series.
2. Express each original data value of the time-series as a percentage of the corresponding centered moving average value obtained in step (1). In other words, in a multiplicative time-series model we get

(yt / CMAt) × 100 = (St × It) × 100

This implies that the ratio-to-moving average represents the seasonal and irregular components.
3. Arrange these percentages according to months or quarter of given years. Find the averages over all months or quarters of
the given years.
4. If the sum of these indices is not 1200 (or 400 for quarterly figures), multiply them by a correction factor = 1200/(sum of monthly indices) or 400/(sum of quarterly indices). Otherwise, the 12 monthly averages or 4 quarterly averages are taken as the seasonal indices.
Let us calculate the seasonal indices by the ratio-to-moving average method from the following data:
Year/Quarter I II III IV
2006 75 60 53 59
2007 86 65 53 59
2008 90 72 66 85
2009 100 78 72 93
Now the calculations of the 4-quarter moving averages and the ratio-to-moving averages are shown in the table below. Let Q = Quarter, MA = Moving Average and CMA = Centered Moving Average; then we complete the following table.
Year Q y    | 4-quarter total | 4 MA  | 4 CMA (T) | (y/T) × 100
2006 1 75   |                 |       |           |
2006 2 60   |                 |       |           |
2006 3 53   | 247             | 61.75 | 63.125    | 83.96
2006 4 59   | 258             | 64.50 | 65.125    | 90.60
2007 1 86   | 263             | 65.75 | 65.750    | 130.80
2007 2 65   | 263             | 65.75 | 65.750    | 98.86
2007 3 53   | 263             | 65.75 | 66.250    | 80.00
2007 4 59   | 267             | 66.75 | 67.625    | 87.25
2008 1 90   | 274             | 68.50 | 70.125    | 128.34
2008 2 72   | 287             | 71.75 | 75.000    | 96.00
2008 3 66   | 313             | 78.25 | 79.500    | 83.02
2008 4 85   | 323             | 80.75 | 81.500    | 104.29
2009 1 100  | 329             | 82.25 | 83.000    | 120.48
2009 2 78   | 335             | 83.75 | 84.750    | 92.04
2009 3 72   | 343             | 85.75 |           |
2009 4 93   |                 |       |           |

For example, for quarter 3 of 2006: the 4-quarter total is 75 + 60 + 53 + 59 = 247, so the 4 MA is 247/4 = 61.75; the centered moving average is (61.75 + 64.50)/2 = 63.125; and the ratio-to-moving average is (53/63.125) × 100 = 83.96.
Averaging the ratios per quarter gives seasonal averages of 126.54 (I), 95.63 (II), 82.33 (III) and 94.05 (IV). The total for the seasonal index is 126.54 + 95.63 + 82.33 + 94.05 = 398.55.
Adjusted seasonal index:

Quarter | Adjusted index
I       | (400/398.55) × 126.54 = 127.00
II      | (400/398.55) × 95.63 = 95.98
III     | (400/398.55) × 82.33 = 82.63
IV      | (400/398.55) × 94.05 = 94.39
The total of the seasonal averages was found to be 398.55, so the corresponding correction factor is 400/398.55 = 1.0036. Each seasonal average is multiplied by this correction factor to obtain the adjusted seasonal indices shown in the above table.
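The four steps can be automated. A sketch in Python, using the quarterly data from the table above (variable names are my own):

```python
# Quarterly series, 2006 Q1 .. 2009 Q4, from the table above
y = [75, 60, 53, 59, 86, 65, 53, 59, 90, 72, 66, 85, 100, 78, 72, 93]

# Step 1: 4-quarter moving averages, then centre them
ma = [sum(y[i:i + 4]) / 4 for i in range(len(y) - 3)]
cma = [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

# Step 2: ratio-to-moving average (percent); cma[i] is centred on y[i + 2]
ratios = [y[i + 2] / cma[i] * 100 for i in range(len(cma))]

# Step 3: average the ratios by quarter (position p is quarter p % 4 + 1)
by_quarter = {q: [] for q in range(4)}
for i, r in enumerate(ratios):
    by_quarter[(i + 2) % 4].append(r)
averages = [sum(by_quarter[q]) / len(by_quarter[q]) for q in range(4)]

# Step 4: adjust so the four indices sum to 400
factor = 400 / sum(averages)
indices = [a * factor for a in averages]
print([round(a, 2) for a in averages])
print([round(v, 2) for v in indices])
```

The printed averages and adjusted indices match the hand calculation above.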
Remarks
1. In an additive time-series model, the seasonal component is estimated as S = Y − (T + C + I) where S is for Seasonal values,
Y is for observed data values of the time-series, T is for trend values, C is for cyclical values and I is for irregular values.
3. The deseasonalised time-series data will have only trend (T ), cyclical (C) and irregular (I) components and is expressed
as:
ACTIVITY 3.13
(b) described by y = ax² + bx + c with a > 0, where the y-intercept is y = −1 and one root is x = 1.
(a) If there is any seasonality, it will be constant seasonal variation. The graph will look much like Figure 6.1(b) in the prescribed textbook.
(b) Taking x ≥ 0 (generally true for time series) and a > 0, the graph is an increasing convex parabola with minimum at point
(0, −1) and passing through point (1, 0). Please draw it. If seasonality exists, this is an example of an increasing seasonal
variation.
3.5.1 Time series with constant seasonal variation
Every time series has a trend of some kind, increasing, decreasing or none. If in addition there is seasonality, we determine if it is
constant or increasing seasonal variation. For presenting a time series with constant seasonal variation we use a model of the form:
yt = TRt + SNt + εt

where TRt denotes the trend and SNt the seasonal factor in time period t.
ACTIVITY 3.14
What is the value of T Rt , the trend for a time series with no trend? Write down the above equation when there is no trend.
Assume that TRt is a linear function of the form TRt = β0 + β1t.
We recall that if a linear trend is increasing, then its slope β1 is positive while a decreasing trend has a negative slope. No trend
means that β1 = 0. Hence, the above model collapses to yt = β0 + SNt + εt .
The average level of the time series in period t is then µt = TRt + SNt.
Now, the error term εt is a random variable. The assumption made about the error term is that it satisfies the usual regression
assumptions. Thus, we assume that the error terms have a constant variance, are identically and independently distributed (IID)
with a normal distribution. There is also a further implication that the magnitude of the seasonal swing is independent of the trend.
Let trt and snt be estimates of TRt and SNt, respectively. Then the point estimate of yt is ŷt = trt + snt.
Seasonality is a somewhat complex part in a time series. In the next section we use dummy variables to model seasonality.
3.5.2 Use of dummy variables
The seasonality of time series defines the seasons to be used. It is possible that a time series can be studied from observations that
are collected at different times of the day. An example is a pancake vendor who confirms that sales are very high in the morning,
low during the day and slightly higher in the afternoon. Here the seasons are the times of the day, and they are three in this case.
If we study a time series collecting data over a five-day week, the number of seasons is five. For some activities we may use a
seven-day week, allowing the number of seasons to be seven. If we use quarters of a year, there are four seasons. A common
tendency is to use months, in which case there will be twelve seasons. This simply means that the number of seasons will differ
from situation to situation, mainly depending on data collection pattern. In order to define dummy variables, we denote the number
of seasons by L.
Study the second rectangular box on p.299 of the textbook. We consider the seasonal factor SNt. We express this factor using dummy variables as:

SNt = βs1 xs1,t + βs2 xs2,t + ⋯ + βs(L−1) xs(L−1),t

where the constants βs1, βs2, ..., βs(L−1) are called the seasonal parameters and xs1,t, xs2,t, ..., xs(L−1),t are dummy variables defined as:
xs1,t = 1 if time period t is season 1, and 0 otherwise
xs2,t = 1 if time period t is season 2, and 0 otherwise
⋮
xs(L−1),t = 1 if time period t is season (L − 1), and 0 otherwise
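To make the definitions concrete, here is a small sketch in Python for a hypothetical setting with L = 4 quarterly seasons, taking season L as the reference:

```python
L = 4  # number of seasons; season L is the reference (all dummies zero)

def dummies(t):
    """Return the dummy variables (x_s1, ..., x_s(L-1)) for period t = 1, 2, ..."""
    season = (t - 1) % L + 1
    return tuple(1 if season == s else 0 for s in range(1, L))

print(dummies(1))  # season 1: (1, 0, 0)
print(dummies(4))  # season 4 (reference): (0, 0, 0)
print(dummies(6))  # season 2: (0, 1, 0)
```

These tuples form the seasonal columns of the regression design matrix.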
NOTEWORTHY POINTS
● One of the seasonal parameters has to be set to 0, and not necessarily the last one. However, it is often more convenient to set the last one, as we did. Note that some statistical packages take the first season as the reference instead of the last season. If we fail to set the dummy variable of one of the seasons to zero, least squares estimation may prove to be complex or require an unusual approach.
● The dummy variable model is based on the time series that display constant seasonal variation. It is also common to refer to
constant seasonal variation as additive seasonal variation. We often apply transformation methods to a time series that shows
increasing seasonal variation to equalise the seasonal variation before using dummy variables.
Let us briefly discuss Example 6.7 in the prescribed textbook about short-term forecast (up to one year) of the number of
occupied rooms in four hotels in a city. The collected data, reproduced below, were for 14 years.
t yt t yt t yt t yt t yt t yt t yt
1 501 25 555 49 585 73 645 97 665 121 723 145 748
2 488 26 523 50 553 74 593 98 626 122 655 146 731
3 504 27 532 51 576 75 617 99 649 123 658 147 748
4 578 28 623 52 665 76 686 100 740 124 761 148 827
5 545 29 598 53 656 77 679 101 729 125 768 149 788
6 632 30 683 54 720 78 773 102 824 126 885 150 937
7 728 31 774 55 826 79 906 103 937 127 1067 151 1076
8 725 32 780 56 836 80 934 104 994 128 1038 152 1125
9 585 33 609 57 652 81 713 105 781 129 812 153 840
10 542 34 604 58 661 82 710 106 759 130 790 154 864
11 480 35 531 59 584 83 600 107 643 131 692 155 717
12 530 36 592 60 644 84 676 108 728 132 782 156 813
13 518 37 578 61 623 85 645 109 691 133 758 157 811
14 489 38 543 62 553 86 602 110 649 134 709 158 732
15 528 39 565 63 599 87 601 111 656 135 715 159 745
16 599 40 648 64 657 88 709 112 735 136 788 160 844
17 572 41 615 65 680 89 706 113 748 137 794 161 833
18 659 42 697 66 759 90 817 114 837 138 893 162 935
19 739 43 785 67 878 91 930 115 995 139 1046 163 1110
20 758 44 830 68 881 92 983 116 1040 140 1075 164 1124
21 602 45 645 69 705 93 745 117 809 141 812 165 868
22 587 46 643 70 684 94 735 118 793 142 822 166 860
23 497 47 551 71 577 95 620 119 692 143 714 167 762
24 558 48 606 72 656 96 698 120 763 144 802 168 877
This is a case of seasonal data with 12 seasons. The values of t are related to the various months. For example, when September of the first year is mentioned, we set t = 9. The dummy variable M9 will then take the value 1 for September and 0 for all the other months. The same procedure is applied for all years, with all dummy variables set to zero for December, since December is taken as the reference level in the definitions of the 11 dummy variables. For simplicity, the dummy variables were written as M1 for xs1,t, M2 for xs2,t, ..., M11 for xs11,t, since the seasons s1, s2, ..., sL−1 are the months. The logarithm transformation yt* = ln yt was used to obtain a relatively constant variation. The plots of the
original data, as well as the square and quartic roots of the data, are reported from page 297 to 298 in the prescribed textbook,
but please plot the graphs yourself. A graphical representation of yt∗ versus time is given below. Clearly, the seasonal variation is
constant since the size of the seasonal swing remains the same as the level of the time series increases.
The example required a forecast for January of the fifteenth year. The graph indicates that a linear trend is suitable for the data. Therefore, the following model can be used for prediction:

yt* = β0 + β1t + β2M1 + β3M2 + ⋯ + β12M11 + εt

This is a multiple linear regression model with 12 predictor variables, and hence 13 parameters (one per predictor variable plus the intercept) have to be estimated. It is very difficult, even impossible, to fit this model by hand; the use of a statistical package is required. Excel can also be used for some statistical analyses such as regression. The following is the Excel output of the analysis.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.9943
R Square 0.9886
Adjusted R Square 0.9878
Standard Error 0.0212
Observations 168
ANOVA
df SS MS F Significance F
Regression 12 6.0674 0.5056 1124.79015 4.975E-144
Residual 155 0.0697 0.0004
Total 167 6.1371
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 6.2875 0.0064 977.5404 0.0000 6.2748 6.3002
Time 0.0027 0.0000 80.5988 0.0000 0.0027 0.0028
M1 -0.0416 0.0080 -5.1862 0.0000 -0.0575 -0.0258
M2 -0.1121 0.0080 -13.9736 0.0000 -0.1279 -0.0962
M3 -0.0845 0.0080 -10.5317 0.0000 -0.1003 -0.0686
M4 0.0398 0.0080 4.9681 0.0000 0.0240 0.0557
M5 0.0204 0.0080 2.5441 0.0119 0.0046 0.0362
M6 0.1469 0.0080 18.3269 0.0000 0.1311 0.1627
M7 0.2890 0.0080 36.0588 0.0000 0.2732 0.3049
M8 0.3110 0.0080 38.8068 0.0000 0.2952 0.3269
M9 0.0560 0.0080 6.9861 0.0000 0.0402 0.0718
M10 0.0395 0.0080 4.9345 0.0000 0.0237 0.0554
M11 -0.1122 0.0080 -14.0030 0.0000 -0.1280 -0.0964
For January, all the dummy variables except M1 are zero, so the model reduces to

yt* = β0 + β1t + β2 + εt

With the estimates completed, the model we will use to forecast values for January is:

ŷt* = β̂0 + β̂1t + β̂2
Now for January of the 15th year we note that 14 years include January of year 1 up to December of year 14, which makes
t = 168 months (14 × 12 months). Thus, for January of year 15 we have t = 169. The forecast required is therefore:
ŷ*169 = 6.2875 + 0.0027(169) − 0.0416 = 6.7022.
Since these were transformed data, the required value is ŷ169 = e^6.7022 = 814.1951. It is as simple as this.
The computation of the 95% prediction interval of y169 requires more calculations, but is not difficult. The average time is:
t̄ = (1/168) Σt=1..168 t = (1 + 2 + ⋯ + 168)/168 = 84.5.
The sum of squared differences of times minus the mean time is:
∑_{t=1}^{168} (t − t̄)² = (1 − 84.5)² + (2 − 84.5)² + ⋯ + (168 − 84.5)² = 395122.
Then, (169 − t̄)² = (169 − 84.5)² = 7140.25. Therefore, the 95% prediction interval of y∗169 is:

ŷ∗169 ± t^(168−13)_{0.025} s √(1 + 1/168 + (169 − t̄)²/∑_{t=1}^{168}(t − t̄)²) = 6.7022 ± 1.96(0.0212) √(1 + 1/168 + 7140.25/395122).
Hence, the 95% prediction interval of y∗169 is: [6.6602; 6.7442]. The 95% prediction interval of y169, obtained by exponentiation, is the following:
[e6.6602 ; e6.7442 ] = [780.7071; 849.1196].
This prediction interval states that the owner of the hotel can be 95% confident that in period 169, that is, January of year 15, the average number of rooms occupied per day will be no fewer than 780 and no more than 850.
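The interval arithmetic above can be checked with a short script. This is only a sketch using the summary numbers quoted in the text (point forecast 6.7022, s = 0.0212, t̄ = 84.5), with the t quantile approximated by 1.96 exactly as done above.

```python
import math

# Quantities taken from the worked example above
y_hat_star = 6.7022   # point forecast on the log scale
s = 0.0212            # standard error of the regression
n = 168               # number of observations
t_future = 169
t_bar = 84.5
ss_t = 395122         # sum of (t - t_bar)^2

# Half-width of the 95% prediction interval (t quantile approximated by 1.96)
half = 1.96 * s * math.sqrt(1 + 1/n + (t_future - t_bar)**2 / ss_t)
lo, hi = y_hat_star - half, y_hat_star + half
print(round(lo, 4), round(hi, 4))                      # approximately 6.6602 and 6.7442
print(round(math.exp(lo), 2), round(math.exp(hi), 2))  # approximately 780.7 and 849.1
```

The exponentiated limits agree with the interval [780.7071; 849.1196] reported above, up to rounding of the intermediate values.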
ACTIVITY 3.15
(c) For simplicity and practicality, let us assume that the above model is based on a six-day week. Prepare forecasts for:
DISCUSSION OF ACTIVITY 3.15
(a) (i) We see five seasons being displayed in the model. This means that the sixth seasonal parameter has been set to 0. Thus, six seasons are involved.
(ii) The trend term consists only of the constant 5. Therefore there is no trend.
(iii) The first and the fourth seasons are low seasons because they have negative seasonal parameters.
(iv) The second, third and fifth seasons are high seasons because they have positive seasonal parameters.
The model reduces season by season to:

yt = 5 − 2 + εt = 3 + εt (season 1)
yt = 5 + 4 + εt = 9 + εt (season 2)
yt = 5 + 3 + εt = 8 + εt (season 3)
yt = 5 − 14 + εt = −9 + εt (season 4)
yt = 5 + 1 + εt = 6 + εt (season 5)
yt = 5 + εt (season 6)

The point forecasts are:

ŷ6 = 5
ŷ28 = 5 − 14 = −9
ŷ17 = 5 + 1 = 6
ŷ55 = 5 − 2 = 3
Note that the trend was given by T Rt = 5. This is a constant, which effectively implies that there is no trend. We next look at
the use of trigonometric functions.
3.5.4 Use of trigonometry in a model with a linear trend
Trigonometric terms are commonly incorporated into a time series regression model that exhibits either constant or increasing seasonal variation. The general form of such a model is:

yt = TRt + f(t) + εt

where f(t) denotes a sum of trigonometric (sine and cosine) terms that captures the seasonal pattern. Assuming constant seasonal variation and a linear trend, the two most commonly used trigonometric models are:
yt = β0 + β1 t + β2 sin(2πt/L) + β3 cos(2πt/L) + εt

and

yt = β0 + β1 t + β2 sin(2πt/L) + β3 cos(2πt/L) + β4 sin(4πt/L) + β5 cos(4πt/L) + εt
ACTIVITY 3.16
Simplify the two models when: (i) t = L; (ii) t = L/2.
DISCUSSION OF ACTIVITY 3.16
(i) If t = L, then the first model simplifies to:

yL = β0 + β1 L + β2 sin(2πL/L) + β3 cos(2πL/L) + εL
= β0 + β1 L + β2 sin(2π) + β3 cos(2π) + εL
= β0 + β1 L + β3 + εL.
(ii) If t = L/2, then the first model simplifies to:

y_{L/2} = β0 + β1 (L/2) + β2 sin(2π(L/2)/L) + β3 cos(2π(L/2)/L) + ε_{L/2}
= β0 + β1 (L/2) + β2 sin(π) + β3 cos(π) + ε_{L/2}
= β0 + β1 (L/2) − β3 + ε_{L/2}.

Similarly, the second model simplifies to:

y_{L/2} = β0 + β1 (L/2) − β3 + β5 + ε_{L/2}.
ACTIVITY 3.17
Simplify the two models when t = L/4.
Let us look again at the hotel data presented above and analysed using dummy variables. Since the transformed data have a linear trend and constant seasonal variation, and there are 12 seasons, we can also analyse the data using the following trigonometric regression model:
y∗t = β0 + β1 t + β2 sin(2πt/12) + β3 cos(2πt/12) + β4 sin(4πt/12) + β5 cos(4πt/12) + ϵt
where yt∗ = ln yt .
Fitting the model in Excel first requires the trigonometric transformations of the time points. For instance, if t = 1 is in cell AJ2 and we need to put sin(2πt/12) in cell AK2, we type the expression =SIN(2*PI()*AJ2/12) in cell AK2 and then copy it down column AK for the remaining time points. The same procedure must be repeated for all the other trigonometric terms. The Excel output of the trigonometric regression model is the following:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.9577
R Square 0.9172
Adjusted R Square 0.9146
Standard Error 0.0560
Observations 168
ANOVA
df SS MS F Significance F
Regression 5 5.6289 1.1258 358.8645 1.14159E-85
Residual 162 0.5082 0.0031
Total 167 6.1371
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 6.3327 0.0087 728.3737 0.0000 6.3156 6.3499
t 0.0027 0.0001 30.6389 0.0000 0.0026 0.0029
X1 -0.1009 0.0061 -16.4862 0.0000 -0.1130 -0.0888
X2 -0.1266 0.0061 -20.7208 0.0000 -0.1387 -0.1146
X3 0.0662 0.0061 10.8359 0.0000 0.0542 0.0783
X4 0.0190 0.0061 3.1076 0.0022 0.0069 0.0311
The point forecast on the log scale is ŷ∗169 = 6.6957, so the point forecast for January of the 15th year, obtained by exponentiation, is ŷ169 = e^6.6957 = 808.920. Noting that the model has six parameters and that the standard error is s = 0.0560, the 95% prediction interval of y∗169 is:
ŷ∗169 ± t^(168−6)_{0.025} s √(1 + 1/168 + (169 − t̄)²/∑_{t=1}^{168}(t − t̄)²) = 6.6957 ± 1.96(0.0560) √(1 + 1/168 + 7140.25/395122) = [6.5846; 6.8068].
Hence, the 95% prediction interval for y169, obtained by exponentiating the above limits, is:

[e^6.5846; e^6.8068] = [723.8614; 903.9735].

Note that this prediction interval is wider, that is, less precise, than the prediction interval [780.7071; 849.1196] obtained using dummy variables. The widths of the two prediction intervals are 903.9735 − 723.8614 = 180.1121 and 849.1196 − 780.7071 = 68.4125, respectively.
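The column construction and the forecast step of the trigonometric model can be mirrored in code. This sketch uses the fitted coefficients quoted in the Excel output above, so it reproduces only the forecast for t = 169, not the fitting itself.

```python
import math

# Fitted coefficients from the Excel output above
b0, b1 = 6.3327, 0.0027
b2, b3 = -0.1009, -0.1266   # coefficients of sin(2*pi*t/12), cos(2*pi*t/12)
b4, b5 = 0.0662, 0.0190     # coefficients of sin(4*pi*t/12), cos(4*pi*t/12)

def trig_forecast(t, L=12):
    """Point forecast on the log scale for the trigonometric model."""
    return (b0 + b1 * t
            + b2 * math.sin(2 * math.pi * t / L) + b3 * math.cos(2 * math.pi * t / L)
            + b4 * math.sin(4 * math.pi * t / L) + b5 * math.cos(4 * math.pi * t / L))

y_star = trig_forecast(169)
print(round(y_star, 4))            # approximately 6.6957
print(round(math.exp(y_star), 2))  # approximately 808.92
```

Since 169 = 14 × 12 + 1, the sine and cosine terms take the same values as in month t = 1, which is why the forecast lands on a January value.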
We now consider the growth curve model:

yt = β0 β1^t εt
This is a complicated model that we are not going to strive to unravel. However, since the decomposition of a time series may be multiplicative, we may end up with a time series that resembles a growth curve. The question then is how to handle it. Since linear forms are easier to handle than nonlinear ones, we transform the model to a linear form. If we assume that the parameters are positive, we can take natural logarithms on both sides of the model equation to obtain the following linear form:
ln yt = ln β0 + (ln β1 ) t + ln εt
This is a familiar form once you understand how the transformation is done. We are allowed to work on the transformed data and reverse the answers using the inverse of the transformation used.
DISCUSSION OF EXAMPLE 6.9 IN THE PRESCRIBED TEXTBOOK.
The example is about steakhouses opened over the last 15 years as reported in the following table:
Year(t) yt ln yt (t − t̄)2
1 11 2.3979 49
2 14 2.6391 36
3 16 2.7726 25
4 22 3.0910 16
5 28 3.3322 9
6 36 3.5835 4
7 46 3.8286 1
8 67 4.2047 0
9 82 4.4067 1
10 99 4.5951 4
11 119 4.7791 9
12 156 5.0499 16
13 257 5.5491 25
14 284 5.6490 36
15 403 5.9989 49
Total 280
An analyst wanted to predict the number of steakhouses that will be operating next year.
The plot of the number of steakhouses (yt ) versus Year (t) is the following:
Clearly, this graph indicates that the data do not exhibit a linear trend. The plot of ln yt versus t is the following:
This graph indicates that a model with linear trend is suitable for the log-transformed data. Excel output for fitting the model
is given below.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.9980
R Square 0.9960
Adjusted R Square 0.9957
Standard Error 0.0755
Observations 15
ANOVA
df SS MS F Significance F
Regression 1 18.4765 18.4765 3239.9689 0.0000
Residual 13 0.0741 0.0057
Total 14 18.5506
Hence, the fitted model is: ŷ∗t = 2.0701 + 0.2569t. To obtain the corresponding nonlinear model, note that β̂0 = exp(2.0701) = 7.9256 and β̂1 = exp(0.2569) = 1.2929. Therefore, the fitted nonlinear model is:

ŷt = 7.9256(1.2929)^t.

The point prediction of y∗16 is: ŷ∗16 = 2.0701 + 0.2569(16) = 6.1805, which implies that the point prediction of y16 is: ŷ16 = e^6.1805 = 483.2335. Therefore, the number of operating steakhouses in year 16 is predicted to be approximately 483.
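The log-linear fit can be reproduced from the table of steakhouse counts given above, using only the plain least-squares formulas (no statistics package assumed):

```python
import math

# Steakhouse counts for years 1..15, from the table above
y = [11, 14, 16, 22, 28, 36, 46, 67, 82, 99, 119, 156, 257, 284, 403]
t = list(range(1, 16))
ln_y = [math.log(v) for v in y]

n = len(y)
t_bar = sum(t) / n            # 8
y_bar = sum(ln_y) / n

# Least-squares slope and intercept for ln(y) on t
b1 = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, ln_y)) / \
     sum((ti - t_bar) ** 2 for ti in t)   # about 0.2569
b0 = y_bar - b1 * t_bar                   # about 2.0701

y16_star = b0 + b1 * 16                   # log-scale prediction, about 6.18
y16 = math.exp(y16_star)                  # about 483 steakhouses
print(round(b1, 4), round(b0, 4))
```

The tiny differences from the hand computation come only from rounding the coefficients to four decimals in the text.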
Now, let us calculate the 95% prediction intervals for y∗16 and y16. The average time is:

t̄ = (1/15) ∑_{t=1}^{15} t = (1 + 2 + ⋯ + 15)/15 = 8.

The squared differences (t − t̄)² and their sum have been included in the table containing the original and logged data. In addition, (16 − t̄)² = 8² = 64, the standard error is s = 0.0755 and t^(n−2)_{α/2} = t^(13)_{0.025} = 2.160. Hence, the 95% prediction interval for y∗16 is:
ŷ∗16 ± t^(13)_{0.025} s √(1 + 1/15 + (16 − t̄)²/∑_{t=1}^{15}(t − t̄)²) = 6.1805 ± 2.160(0.0755) √(1 + 1/15 + 64/280) = [5.9949; 6.3661]
The 95% prediction interval of y16, obtained by exponentiating the prediction limits of y∗16, is the following:

[e^5.9949; e^6.3661] = [401.38; 581.78].

Thus, the analyst is 95% confident that, on average, the number of operating steakhouses in year 16 will not be less than 401 and will not be more than 582.
The AR(1) process is a special case of the general autoregressive process of order p given by:

εt = ϕ1 εt−1 + ϕ2 εt−2 + ⋯ + ϕp εt−p + at.

ACTIVITY 3.18
To make sure that you have the right period, your polynomial as a function of t should suit the given time period. As an example,
starting from January of the current year, to evaluate ŷ20 when time is given as monthly would imply August of the following year.
This means that the equation used should be suitable for August months. On the other hand, if quarters are used, starting from the
current year, ŷ20 would mean the fourth quarter of the fifth year. If you are dealing with months, and your equation is given as a function of t, any future time should be converted into months. For example, if you are required to predict a value for February of the fourth year from the current year, you should be able to set t = 3 × 12 + 2 = 38. If you are to find a prediction for the third quarter of the seventh year on the quarterly model, you should be able to set t = 6 × 4 + 3 = 27.
3.9 Conclusion
This unit introduced important aspects of time series. It used graphical plots to demonstrate some patterns, then incorporated some
applications of estimation. Trend and seasonality were discussed. The AR(1) process was also introduced, and the DW statistic
was used to detect positive and negative first-order autocorrelations. Two types of seasonal variation, constant and increasing, were discussed. Modelling using dummy variables and trigonometric ratios was discussed in great detail.
Growth models were also discussed. Real life examples were used to illustrate the theoretical components.
We are ready for the next important unit.
Unit 4
Decomposition of time series
4.1 Introduction
This is a continuation of concepts introduced in earlier chapters. Components of a time series should now be at your fingertips.
Are they? This unit deals with the decomposition of a time series, which aims to isolate the influence of each of the components
on the actual time series. It is presented as Chapter 7 in the prescribed textbook.
Decomposition of time series is an important technique for all types of time series, especially for seasonal adjustment. It seeks to
construct, from an observed time series, a number of component series (that could be used to reconstruct the original time series
by additions or multiplications) where each of these has a certain characteristic or type of behaviour.
ACTIVITY 4.1
The components into which a time series can be decomposed are:
● the Trend Component Tt that reflects the long-term progression of the series
● the Seasonal Component St that reflects seasonality, i.e. fluctuations that repeat over a fixed period such as a year
● the Cyclical Component Ct that describes repeated but non-periodic fluctuations, for instance caused by the economic cycle
● the Irregular Component It (or “noise”) that describes random, irregular influences; compared to the other components it represents the residuals of the time series.
Multiplicative decomposition of a time series assumes that the actual values of a time series yt can be represented as the product of the trend component TRt, a seasonal index SNt, a cyclical index CLt and an irregular measure IRt. The trend component is measured in actual units, the cyclical index is expressed relative to the trend, and the seasonal index is expressed relative to the trend and the cyclical index. Thus, the multiplicative decomposition model is:

yt = TRt × SNt × CLt × IRt.
When a time series exhibits increasing seasonal variation, it is represented in this form. Statistical analysis is useful for effective
isolation and analysis of the trend and the seasonal components. Hence, we will examine statistical approaches to quantify trend
and seasonal variations. These are the components that usually account for a significant proportion of the actual values in a time
series. Isolating them is an opportunity to explain the actual time series values.
4.2.1 Trend analysis
This discussion uses moving averages to analyse trend. Averaging out the short-term fluctuations in a time series reveals the trend: either a smooth curve or a straight line emerges. Earlier we discussed time series regression. It is one method
used to isolate trend. The other is by use of moving averages (MAs). In this section we will discuss the MA.
The term ‘trend’ may be seen as a tendency or resulting behaviour of occurrence of something observed over a long term. In a
nutshell, “trend analysis” is a term referring to the concept of collecting information and attempting to spot a pattern, or trend, in
the information. In some fields of study, the term “trend analysis” has more formally-defined meanings.
For example, in project management, trend analysis is a mathematical technique that uses historical results to predict future
outcome. This is achieved by tracking variances in cost and schedule performance. In this context, it is a project management
quality control tool.
Although trend analysis is often used to predict future events, it can also be used to estimate uncertain events in the past, such as how many ancient kings probably ruled between two dates, based on data such as the average number of years for which other known kings reigned.
The steps for computing a 3-period moving average (MA) are the following:
● Add observations for the first three periods and find their average. Place the answer opposite the middle time period, i.e.
opposite the second measurement.
● Remove the observation for the earliest period and replace it by the fourth measurement. Obtain the new average and place
it opposite the third measurement.
● Repeat the process until you do not have enough observations to produce a MA of three periods.
Note that the above illustration used a case where the MA can be placed next to a middle observation. The same is easy when a 5-period or a 7-period MA is needed; that is, for an odd-order MA we will not struggle to place the MA in the middle. There will be practical cases where we need to use a 2-period MA, a 4-period MA, and so on. The prescribed textbook provides several examples, and we will see some examples in the activities of this unit.
ACTIVITY 4.2
Observations were collected for three days over the regular time periods 8–12 noon, 12–4 p.m. and 4–8 p.m. Calculate appropriate moving averages and explain the trend of the data.
DISCUSSION OF ACTIVITY 4.2
We can call these times morning, afternoon and evening for convenience. It seems obvious to use 3-period MAs. According to the guideline given, we start with the averages per day:
Average
Day 1 Morning 170
Afternoon 140 540/3 = 180
Evening 230
The average for each day has been placed opposite the midpoint of that day, i.e. the afternoon period. We need a trend figure for every period, not just for the afternoons, and a single average per day is not yet a moving average. We make the averages “move” by dropping the oldest observation and replacing it with the newest one. The table becomes:
Now, we answer the question about the trend. We note that the MAs are clearly increasing. This simply informs us that on
average, the above observations are increasing. Hence, we have an increasing trend.
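The moving-average procedure just described can be sketched in a few lines; the nine readings (morning, afternoon and evening for the three days) are those of Activity 4.2.

```python
def moving_average(data, k):
    """k-period moving averages; for odd k each average sits opposite
    the middle observation of its window."""
    return [sum(data[i:i + k]) / k for i in range(len(data) - k + 1)]

# Morning, afternoon and evening readings for the three days
y = [170, 140, 230, 176, 152, 233, 182, 161, 242]
ma = moving_average(y, 3)
print(ma)  # [180.0, 182.0, 186.0, 187.0, 189.0, 192.0, 195.0]
```

The averages increase steadily from 180 to 195, which is the increasing trend noted above.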
The steps for quantifying the seasonal variation are:
● Compute the ratio yt/MA of each observation to its moving average.
● Average these ratios per season to obtain unadjusted seasonal indices.
● Adjust the indices so that they sum to the number of seasons L.
ACTIVITY 4.3
The ratios yt/MA (Actual/MA) are:

Day 1 Afternoon: 140/180 = 0.7778
Evening: 230/182 = 1.2637
Day 2 Morning: 176/186 = 0.9462
Afternoon: 152/187 = 0.8128
Evening: 233/189 = 1.2328
Day 3 Morning: 182/192 = 0.9479
Afternoon: 161/195 = 0.8256
Evening: -
Due to random influences, values for the same periods differ, but there is clearly a common pattern. For example, the afternoon values of Actual/MA are similar in size (0.78, 0.81, 0.83). The same is true of the evening figures 1.26 and 1.23, and the morning figures are both approximately 0.95. The seasonal indices are found by computing the averages of Actual/MA per season, as in the following table.
The numbers s̄n1 = 0.9471, s̄n2 = 0.8054 and s̄n3 = 1.2483 are unadjusted seasonal indices, since their sum 3.0008 is not equal to the number of seasons L = 3. Adjusted seasonal indices are calculated by multiplying each one by the correction factor:

L / ∑_{t=1}^{L} s̄nt
where in this case L = 3. Therefore, the seasonal indices in this example are:
sn1 = s̄n1 × (3/3.0008) = 0.9471 × (3/3.0008) = 0.9468.
sn2 = s̄n2 × (3/3.0008) = 0.8054 × (3/3.0008) = 0.8052.
sn3 = s̄n3 × (3/3.0008) = 1.2483 × (3/3.0008) = 1.2480.
Note that sn1 + sn2 + sn3 = L = 3 as required for seasonal indices.
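The adjustment step can be written as a small helper that normalises unadjusted indices so that they sum to the number of seasons L:

```python
def adjust_indices(unadjusted):
    """Scale unadjusted seasonal indices so that they sum to L,
    the number of seasons, and round to 4 decimals."""
    L = len(unadjusted)
    factor = L / sum(unadjusted)
    return [round(s * factor, 4) for s in unadjusted]

raw = [0.9471, 0.8054, 1.2483]   # unadjusted indices from above
sn = adjust_indices(raw)
print(sn)   # [0.9468, 0.8052, 1.248]
```

The adjusted indices sum to 3, as required for three seasons.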
We have now isolated the trend and seasonal effects present in the time series. Knowledge of seasonal effects is important for
forecasting as well as for removing strong seasonal effects that may conceal other important features or movements in a data set.
Random variations themselves cannot be forecast but, being essentially unpredictable, their size serves as a guide to the reliability of a forecast. When random influences are very small, a process is likely to produce reliable forecasts, while large fluctuations may completely upset even the most carefully calculated forecasts.
The deseasonalised observations are:

dt = yt / snt.
For our example, see the deseasonalised observations in column 5 of the following table (later referred to as the main table).
Time (t) yt MA snt dt trt trt × snt clt × irt clt irt
1 170 - 0.9468 179.5522 177.0841 167.6632 1.0139 - -
2 140 180 0.8052 173.8698 179.6232 144.6326 0.9680 0.9979 0.9700
3 230 182 1.248 184.2949 182.1623 227.3386 1.0117 0.9954 1.0164
4 176 186 0.9468 185.8893 184.7014 174.8753 1.0064 1.0088 0.9977
5 152 187 0.8052 188.773 187.2405 150.7661 1.0082 0.9995 1.0087
6 233 189 1.248 186.6987 189.7796 236.8449 0.9838 0.9972 0.9866
7 182 192 0.9468 192.2264 192.3187 182.0873 0.9995 1.0031 0.9964
8 161 195 0.8052 199.9503 194.8578 156.8995 1.0261 1.0027 1.0234
9 242 - 1.248 193.9103 197.3969 246.3513 0.9823 - -
Now, assume that the trend is linear and thus can be modelled as
T Rt = β0 + β1 t + ϵt
where ϵt are error terms with mean zero. The estimation of the parameters β0 and β1 can be done using least squares.
The following table contains information to be used in the estimation.
Time (t) dt tdt t2
1 179.5522 179.5522 1
2 173.8698 347.7397 4
3 184.2949 552.8846 9
4 185.8893 743.5572 16
5 188.773 943.8649 25
6 186.6987 1120.192 36
7 192.2264 1345.585 49
8 199.9503 1599.603 64
9 193.9103 1745.192 81
Total 45 1685.165 8578.171 285
Hence,
β̂1 = (9 ∑_{t=1}^{9} t·dt − ∑_{t=1}^{9} t · ∑_{t=1}^{9} dt) / (9 ∑_{t=1}^{9} t² − (∑_{t=1}^{9} t)²) = (9 × 8578.171 − 45 × 1685.165)/(9 × 285 − 45²) = 2.5391

and

β̂0 = (∑_{t=1}^{9} dt − β̂1 ∑_{t=1}^{9} t)/9 = (1685.165 − 2.5391 × 45)/9 = 174.5450.
Hence, the fitted equation of the trend is: trt = 174.5450 + 2.5391t. The trend values in our example are reported in the sixth
column of the main table. For instance, the first value is tr1 = 174.5450 + 2.5391(1) = 177.0841.
The estimates of clt × irt = yt/(trt × snt) are reported in the 8th column of the main table. This relation does not allow separate estimation of CLt but, as stated in the prescribed textbook, CLt can be estimated by a three-period moving average of the clt × irt values; these estimates clt appear in the next column. The irregular component is then estimated by:

irt = (clt × irt) / clt.

For our example, the last column of the main table gives the estimates of the irregular components.
If there is no pattern in the irregular component, we can predict that all values irt are equal to one. In that case, the point forecasts are given by:

ŷt = trt × snt × clt

if a well-defined cycle exists in the time series. If there is no well-defined cycle, that is, when the clt values are close to one, the point forecasts are given by:

ŷt = trt × snt.
ACTIVITY 4.4
(a) Does the time series exhibit a well-defined cycle? Explain your answer.
(a) The time series does not exhibit a well-defined cycle since all the cyclical values are close to one.
(b) Since all cyclical values and irregular values are close to one, point forecasts are determined by the equation ŷt = trt × snt. The point forecasts for the 9 observations are displayed in the trt × snt column of the main table.
ACTIVITY 4.5
The following data represent sales of pies in thousands in the various quarters of the years 2001 to 2003. Using a multiplicative
decomposition of a time series, determine the estimates of the four components.
Quarter 1 2 3 4 1 2 3 4 1 2 3 4
Value 142 54 162 206 130 50 174 198 126 42 162 186
All the procedures for analysing time series with seasonal patterns were explained. The appropriate number of periods for the
moving average is clearly four, so the first moving average is:
142 + 54 + 162 + 206
= 141
4
This figure obviously belongs to the middle of the first year, which comes halfway in between the second and third quarters. The
table follows at the bottom of this explanation. This will be true with other moving averages too. Therefore we need an additional
step called centering the moving averages. The results of this centering give rise to the centred moving averages (CMAs).
The moving average 141 applies to a point halfway between the second and the third quarters of year 2001, while the figure
138 applies midway between the third and fourth quarters, we can obtain the moving averages directly comparable with the fourth
quarter by taking the average of 141 and 138, which is 139.5. Doing the same for all moving averages, we obtain the values in the
fifth and sixth columns of the following table.
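The centring step can be sketched directly, using the quarterly pie sales from Activity 4.5; for an even period the 4-quarter moving averages are averaged in adjacent pairs.

```python
def centred_moving_average(data, k=4):
    """k-period moving averages, centred by averaging adjacent pairs
    (needed when k is even, e.g. quarterly data)."""
    ma = [sum(data[i:i + k]) / k for i in range(len(data) - k + 1)]
    cma = [(a + b) / 2 for a, b in zip(ma, ma[1:])]
    return ma, cma

sales = [142, 54, 162, 206, 130, 50, 174, 198, 126, 42, 162, 186]
ma, cma = centred_moving_average(sales)
print(ma)   # [141.0, 138.0, 137.0, 140.0, 138.0, 137.0, 135.0, 132.0, 129.0]
print(cma)  # [139.5, 137.5, 138.5, 139.0, 137.5, 136.0, 133.5, 130.5]
```

The first centred value, 139.5, is aligned with the third quarter of 2001, exactly as in the table.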
To calculate the seasonal indices, we first calculate the ratios yt/CMA, as displayed in the seventh column of the table. The seasonal indices are calculated in a similar manner to that in Activity 4.3. The following table gives all the values.
Year Quarter 1 Quarter 2 Quarter 3 Quarter 4 Total
1 1.1613 1.4982
2 0.9386 0.3597 1.2655 1.4559
3 0.9438 0.3218
Total 1.8824 0.6815 2.4268 2.9541
Mean (unadjusted) 0.9412 0.34075 1.2134 1.47705 3.9724
Adjusted snt 0.9477 0.3431 1.2218 1.4873 4
These indices were copied into the snt column of the main table, and the deseasonalised values dt = yt/snt were then inserted in the next column. The deseasonalised values were used as response values in the linear model
d t = β0 + β1 t + ϵt .
The following table provides useful information for parameter estimation and its last two columns will be used later for interval
prediction.
Hence,
β̂1 = (12 ∑_{t=1}^{12} t·dt − ∑_{t=1}^{12} t · ∑_{t=1}^{12} dt) / (12 ∑_{t=1}^{12} t² − (∑_{t=1}^{12} t)²) = (12 × 10418.49 − 78 × 1649.783)/(12 × 650 − 78²) = −2.1336

and

β̂0 = (∑_{t=1}^{12} dt − β̂1 ∑_{t=1}^{12} t)/12 = (1649.783 + 2.1336 × 78)/12 = 151.3503.
Hence, the fitted equation of the trend is: trt = 151.3503 − 2.1336t. The trend values in our example are reported in the tenth
column of the main table.
For instance, the first value is tr1 = 151.3503 − 2.1336(1) = 149.2167.
Yr Q t yt MA CMA yt/CMA snt dt trt trt×snt clt×irt clt irt
1 1 1 142 - - - 0.95 149.84 149.22 141.41 1.00 - -
1 2 2 54 - - - 0.34 157.39 147.08 50.46 1.07 1.00 1.07
1 3 3 162 141 139.5 1.16 1.22 132.59 144.95 177.10 0.91 0.98 0.93
1 4 4 206 138 137.5 1.50 1.49 138.51 142.82 212.41 0.97 0.95 1.02
2 1 5 130 137 138.5 0.94 0.95 137.17 140.68 133.32 0.98 1.00 0.98
2 2 6 50 140 139 0.36 0.34 145.73 138.55 47.54 1.05 1.02 1.03
2 3 7 174 138 137.5 1.27 1.22 142.41 136.42 166.67 1.04 1.03 1.01
2 4 8 198 137 136 1.46 1.49 133.13 134.28 199.72 0.99 1.01 0.98
3 1 9 126 135 133.5 0.94 0.95 132.95 132.15 125.24 1.01 0.98 1.03
3 2 10 42 132 130.5 0.32 0.34 122.41 130.01 44.61 0.94 0.99 0.95
3 3 11 162 129 - - 1.22 132.59 127.88 156.24 1.04 0.99 1.05
3 4 12 186 - - - 1.49 125.06 125.75 187.02 0.99 - -
The point forecast ŷt = trt × snt are reported in the 11th column of the table.
The ratios clt × irt = yt/(trt × snt) are reported in the 12th column of the table.
Clearly, the time series in this example does not have a well-defined cycle since all the cyclical values are close to one.
The error bound for a future time point t is:

Bt[100(1 − α)] = t^(n−2)_{α/2} s √(1 + 1/n + (t − t̄)²/∑_{t=1}^{n}(t − t̄)²)

where α is the level of significance, and Bt[100(1 − α)] is the error bound in a 100(1 − α)% prediction interval [trt ± Bt[100(1 − α)]] for the deseasonalised observation dt = TRt + ϵt = β0 + β1 t + ϵt.
ACTIVITY 4.6
Consider the data given in Activity 4.5. Determine the 95% prediction interval for the pie sales in the first quarter of year 2004.
The estimate of the trend was found to be trt = 151.3503 − 2.1336t. The first quarter of year 2004 corresponds to t = 13 and
snt = 0.95. The point prediction of the trend at time t = 13 is tr13 = 151.3503 − 2.1336(13) = 123.6135. Hence, the point forecast
(prediction) is
ŷ13 = tr13 × sn13 = [151.3503 − 2.1336(13)] × 0.95 = 117.4328.
The standard error is:

s = √(∑_{t=1}^{n}(dt − trt)²/(n − 2)) = √(∑_{t=1}^{12}(dt − trt)²/(12 − 2)) = √(460.1589/10) = 6.7835.
Also,

t^(n−2)_{α/2} = t^(10)_{0.025} = 2.228

and the average time is

t̄ = (1/12) ∑_{t=1}^{12} t = 6.5,

which implies that (13 − t̄)² = 42.25. Finally, note that from the table used for parameter estimation, we have:

∑_{t=1}^{12} (t − t̄)² = 143.
Hence, the 95% prediction interval for the deseasonalised observation d13 is:

tr13 ± t^(10)_{0.025} s √(1 + 1/12 + (13 − t̄)²/∑_{t=1}^{12}(t − t̄)²) = 123.6135 ± 2.228(6.7835) √(1 + 1/12 + 42.25/143) = [105.8668; 141.3602],

so the error bound for predicting the 13th observation at the 95% confidence level is:

B13[95] = (141.3602 − 105.8668)/2 = 17.7467.
It follows that an approximate 95% prediction interval for y13 is:

[ŷ13 − 17.7467; ŷ13 + 17.7467] = [117.4328 − 17.7467; 117.4328 + 17.7467] = [99.6861; 135.1795].
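The computation can be verified numerically; the t quantile 2.228 and the sums are taken from the working above.

```python
import math

tr13 = 151.3503 - 2.1336 * 13   # trend prediction, 123.6135
sn1 = 0.95                      # quarter-1 seasonal index (rounded)
s, t_val = 6.7835, 2.228        # standard error and t(10) quantile
n, t_bar, ss_t = 12, 6.5, 143

# Error bound and point forecast, as in the working above
bound = t_val * s * math.sqrt(1 + 1/n + (13 - t_bar)**2 / ss_t)  # about 17.7467
y13_hat = tr13 * sn1                                             # about 117.4328
print(round(bound, 4))
print(round(y13_hat - bound, 4), round(y13_hat + bound, 4))  # about 99.69 and 135.18
```

Note that, as in the text, the error bound computed for the deseasonalised series is applied directly around the seasonalised point forecast, so the resulting interval is approximate.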
EXERCISE
Consider the average number of calls received per day at a Computer Club Warehouse (CCW) call centre for the past three
years. You will also realise that the pattern of the call volumes can be of help in the analysis.
(b) What type of trend and seasonal variation do you observe? Explain your answer.
(c) Perform a time series analysis using the multiplicative decomposition, that is estimate the four time series components.
(d) Calculate the point forecast and the 95% prediction interval for the call volume in the first quarter of the fourth year. The
next table presents the data.
If you are unsure whether your answers are correct, discuss them with your fellow students in the Discussion forum on the
module website.
4.3 Additive decomposition
A time series that exhibits constant seasonal variation is modelled using an additive decomposition model given by:

yt = TRt + SNt + CLt + IRt

with the known notation. The estimation of the four components follows the steps used for the multiplicative decomposition model. The only differences are the following:
(1) Detrended values are computed by subtraction, yt − CMAt, instead of by the ratio yt/CMAt.
(2) Observations are deseasonalised by subtraction, dt = yt − snt, instead of by the ratio yt/snt.
(3) The sum of the adjusted seasonal indices is zero, not the number of seasons.
ACTIVITY 4.7
Consider the following time series data (not real life data, but fictional).
(b) What type of trend and seasonal variation do you observe? Explain your answer.
(c) Perform a time series analysis using the additive decomposition, that is estimate the four time series components.
(d) Calculate the point forecast and the 95% prediction interval for the sales in Spring 2022. The next table presents the data.
Attempt the activity as an exercise following the same procedure as in Activities 4.5 and 4.6.
4.4 Conclusion
This unit dealt with two methods for the decomposition of time series. The estimation of the four components was done using
moving averages. Two examples were used to illustrate component estimation, point and interval prediction for the multiplicative
decomposition. The additive decomposition was discussed briefly since the component estimation and predictions follow the same
steps as those used in the multiplicative decomposition. We are now ready for the last unit of the module.
Unit 5
Exponential smoothing
• Apply simple exponential smoothing, tracking signals, Holt’s trend-corrected exponential smoothing, the Holt-Winters
model and the damped trend model.
• Fit each model to available data and describe its appropriateness for a given situation.
Outcomes (at the end of the unit you should be able to), with the corresponding assessment, content, activities and feedback:

● Explain methods of smoothing. Assessment: analyse data. Content: exponential smoothing; smoothing constants; damped trend. Activities: perform appropriate calculations. Feedback: discuss likely errors.
● Perform simple exponential smoothing. Assessment: explore data with various smoothing constants. Content: simple exponential smoothing. Activities: perform calculations; interpret data. Feedback: explain alternative methods.
● Monitor the forecasting system. Assessment: measure the strength of forecasts. Content: tracking signals. Activities: calculate the statistics; interpret them. Feedback: discuss the solutions.
● Describe and apply various smoothing approaches. Assessment: determine the aptness of the various methods. Content: Holt's trend-corrected smoothing; Holt-Winters method; damped trend method. Activities: perform apt calculations for each method. Feedback: discuss the solutions.
● Forecast future values of a time series. Assessment: develop forecast values. Content: Holt's, Holt-Winters and damped trend methods. Activities: perform calculations. Feedback: discuss the solutions.
5.1 Introduction
Changing trend and seasonality of a time series over time makes forecasting difficult to undertake. This is when exponential
smoothing becomes useful. Smoothing constants are used to smooth a rough time series. In this unit we study various smoothing methods, and a tracking method to monitor the process. The methods are simple exponential, Holt's trend-corrected exponential, Holt-Winters, and damped trend exponential smoothing.
A common way to characterise exponential smoothing is to consider it as a technique that can be applied to time series data,
either to produce smoothed data that are to be presented, or to develop forecasts. The observed phenomenon may be an essentially
random process, or it may be an orderly, but noisy, process. Different smoothing techniques are available as presented in this unit,
and each one for a specific purpose. For example, a simple moving average is one in which the past observations are weighted equally, while exponential smoothing assigns higher weights to recent observations and lower weights to remote ones.
Exponential smoothing is commonly applied to financial market and economic data, but it can be used with any discrete set of
repeated measurements.
5.2 Simple exponential smoothing
This method is used when data pattern is horizontal (i.e., there is neither cyclic variation nor trend in the historical data). Let us
first explore the following model.
The model

yt = β0 + εt

is used for forecasting when there is no trend or seasonal pattern and the mean of the time series remains constant.
Do you remember this formula? Equal weights are given to each observation, as

ȳ = (1/n) ∑_{t=1}^{n} yt = ∑_{t=1}^{n} (1/n) yt,

so each observation receives weight 1/n.
Under these conditions, we require a model that would describe the data more suitably, and estimates for the mean that may
change from one time period to the next. Simple exponential smoothing (SES) is one such method; it does not use equal weights.
Instead, more recent observations are given more weight.
ACTIVITY 5.1
Indicate “True” or “False” for each of the following statements about SES. In case of “False”, correct the statement. Justify the
correct statements.
(3) The oldest observations receive the most weight.
False. The oldest observations receive the least weight.
We release you from suspense and define SES formally. Let y1 , y2 , ..., yn be a time series with a mean that is changing slowly
over time but having neither a trend nor seasonal pattern. Then the estimate for the level (or mean) of the time series in period T is:
ℓT = αyT + (1 − α) ℓT −1
where ℓT−1 is the level estimate from the previous period and α is a smoothing constant between 0 and 1.
The value of α determines the degree of smoothing and how responsive the model is to fluctuation in the time-series data. This
value is arbitrary and is determined both by the nature of the data and the sensitivity of the forecaster as to what constitutes a good
response rate. A smoothing constant close to zero leads to a stable model while a constant close to one is highly reactive. Typically,
constant values between 0.01 and 0.3 are used.
Let us illustrate with the data we have seen before in order to feel comfortable at the early stage of SES exploration.
ACTIVITY 5.2
Consider the cod catch data that were discussed in Activity 3.5 of Unit 3. Find the smoothing levels at all the time points, then
calculate the forecasts made in last period, the forecast errors and the squared forecast errors.
If you recall, the plots for these data showed that the data had no trend or seasonal pattern. To start the recursion ℓT = αyT + (1 − α)ℓT−1, we need an initial smoothing level ℓ0. To our knowledge, there is no consensus in the statistical literature about the initial value, and different statistical packages use different values. In Excel, ℓ0 = y1, that is, the first observation.
For the cod catch data, the author of the prescribed textbook chose the mean of the observations in the first year as the value of l0 .
That is:
ℓ0 = (1/12) ∑_{t=1}^{12} yt = (1/12)(362 + 381 + 317 + ... + 343) = (1/12)(4328) = 360.6667.
In order to illustrate, we use α = 0.1. Does it satisfy the given restriction? We want to explore by determining levels from these
data.
ℓ1 = αy1 + (1 − α)ℓ0 = 0.1(362) + 0.9(360.6667) = 360.8000
ℓ2 = αy2 + (1 − α)ℓ1 = 0.1(381) + 0.9(360.8000) = 362.8200
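The recursion is easy to automate. Below is a minimal Python sketch (the function name and structure are ours, not part of any package); with α = 0.1 and ℓ0 = 360.6667 it reproduces ℓ1 and ℓ2 above:

```python
def ses_levels(y, alpha, l0):
    """Return the SES smoothed levels l_1, ..., l_T for a series y."""
    levels = []
    level = l0
    for obs in y:
        # l_T = alpha * y_T + (1 - alpha) * l_{T-1}
        level = alpha * obs + (1 - alpha) * level
        levels.append(level)
    return levels

# First two cod catch observations, with l0 = 360.6667 and alpha = 0.1
levels = ses_levels([362, 381], alpha=0.1, l0=360.6667)
```

Extending the list of observations to all 24 months produces the full column of smoothed levels.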
These calculations can be continued up to ℓ24. Forecast errors can then be calculated for all these levels. Do you remember the forecast
errors? Detailed results, including the forecasts made in the previous period, are in the following table.
In SES, a point forecast at time T of any future value yT +τ is the last estimate ℓT for the mean of the time series since there is
no trend or seasonal pattern to observe. That is:
ŷT +τ = ℓT (τ = 1, 2, 3, ...)
ACTIVITY 5.3
Write down the point forecast made in time period t − 1 of the value yt.
We dealt with the standard error(s) and the sum of squares for error (SSE) in the earlier chapters. The current version is that the
standard error at time T is:
s = √(SSE/(T − 1)) = √( ∑_{t=1}^{T} (yt − ℓt−1)² / (T − 1) )
For any τ, a 95% prediction interval computed in time period T for yT+τ is:
[ℓT − z[0.025] s √(1 + (τ − 1)α²) ; ℓT + z[0.025] s √(1 + (τ − 1)α²)]
ACTIVITY 5.4
Write down the formula for a 95% prediction interval computed in time period T for yT +τ when:
(i) τ = 1
(ii) τ = 2.
(i) For τ = 1, since 1 + (τ − 1)α² = 1, the prediction interval for yT+1 is: [ℓT − z[0.025] s ; ℓT + z[0.025] s].
(ii) For τ = 2, the prediction interval for yT+2 is: [ℓT − z[0.025] s √(1 + α²) ; ℓT + z[0.025] s √(1 + α²)].
We can go on and experiment with different values of τ, but that is more a theoretical exercise than an application. For this
reason, let us consider a practical example by revisiting the cod catch data discussed in Unit 3, Activity 3.5.
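The interval formula lends itself to a small helper function. A Python sketch follows (the function name is ours, and z is fixed at 1.96 for a 95% interval); it uses values that appear in Activity 5.5 below, namely ℓ24 = 354.59, s = 34.9465 and α = 0.034:

```python
import math

def ses_interval(level, s, alpha, tau, z=1.96):
    """95% prediction interval for y_{T+tau} computed at time T under SES."""
    half = z * s * math.sqrt(1 + (tau - 1) * alpha ** 2)
    return level - half, level + half

# Values taken from Activity 5.5: l24 = 354.59, s = 34.9465, alpha = 0.034
lo, hi = ses_interval(level=354.59, s=34.9465, alpha=0.034, tau=1)
```

For τ = 1 the half-width reduces to 1.96 s, as established in Activity 5.4.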
ACTIVITY 5.5
Consider cod catch data that were discussed in Unit 3. Find the point prediction and the 95% prediction intervals in months
made in month 24 for the months 25, 26 and 27.
In Activity 5.2, we arbitrarily used α = 0.1 as the smoothing constant. However, using Solver in Excel gives 0.034 as the optimal
value of α that minimises the error sum of squares (SSE). With α = 0.034, the values of ℓt, Ft, et = yt − ℓt−1, e²t and SSE
are given in the following table.
are given in the following table.
For prediction intervals we need the value of the standard error. The standard error is:
s = √(SSE/(T − 1)) = √( ∑_{t=1}^{T} (yt − ℓt−1)² / (T − 1) ) = √(28089.1756/23) = 34.9465.
The information obtained from this activity is that when α is small, the limits (and therefore the lengths) of the prediction intervals
for future values are practically the same.
ACTIVITY 5.6
Write down the model ℓT = αyT + (1 − α) ℓT −1 in terms of ℓT −1 and yT − ℓT −1 , then give an interpretation.
ℓT = αyT + (1 − α) ℓT −1
= αyT + ℓT −1 − αℓT −1
= ℓT −1 + αyT − αℓT −1
= ℓT −1 + α (yT − ℓT −1 )
This form is called the error correction form. It means that the smoothing level lT at time T is the sum of the smoothing level
lT −1 at the previous time, T − 1, plus a fraction α of the one-period-ahead forecast error yT − lT −1 .
A useful diagnostic is the sum of the one-period-ahead forecast errors up to time T:
Y(α, T) = ∑_{t=1}^{T} et(α).
ACTIVITY 5.7
Determine the sum of forecast errors for T = 24 using the cod catch data.
By definition, a one-period-ahead forecast error is given by eT(α) = yT − ℓT−1. Hence, the forecast errors indicated in the table
under Activity 5.4 are in the column with heading “Forecast errors” (column E). Their sum is −120.28190.
ACTIVITY 5.8
Show that:
Y(α, T) = Y(α, T − 1) + eT(α)
One of the tracking signal instruments is the simple cusum (cumulative sum) tracking signal C(α, T), defined by
C(α, T) = ∣ Y(α, T) / MAD(α, T) ∣
where MAD(α, T) = α∣eT(α)∣ + (1 − α)MAD(α, T − 1) is the smoothed mean absolute deviation (MAD).
Remember that:
MAD = ( ∑_{t=1}^{n} ∣et∣ ) / n
If C(α, T) is large, then the sum of forecast errors Y(α, T) is large relative to the mean absolute deviation MAD(α, T).
In other words, a large C(α, T) value shows that the forecasting system produces forecasts that are consistently smaller or
consistently larger than the actual time series values. An accurate forecasting system should produce (at least approximately)
equal numbers of negative and positive errors, so a large C(α, T) indicates that the forecasting system does not perform
accurately. Note that we have still not quantified what a large value of C(α, T) means. There are no hard and fast rules; a
threshold will be given with each situation. However, there is a rule of thumb based on a predefined control limit K. If C(α, T)
exceeds K for two or more consecutive periods, this is an indication that the forecast errors are larger than those expected for an
accurate forecasting system. The following table gives the control limit K for selected smoothing constants, for a 5% and a 1%
chance of obtaining a larger-than-normal value of C(α, T).
5% 1%
α 0.1 0.2 0.3 0.1 0.2 0.3
K 5.6 4.1 3.5 7.5 5.6 4.9
ACTIVITY 5.9
Consider the Cod Catch data discussed in Unit 3 and in the above activities.
(a) Calculate the simple cusum tracking signal for all the 24 time points.
(b) Is the forecasting appropriate for the data? Explain your answer. Use both 5% and 1% as the significance levels.
The important quantities needed to calculate values of the simple cusum tracking signal are given in the following table:
t yt ℓt Ft et ∣et∣ Y(α, T) ∑_{t=1}^{T}∣et∣ MADT MAD(α, T) C(α, T)
0 360.67
1 362 360.71 360.67 1.33 1.33 1.33 1.33 1.33 1.33 1.00
2 381 361.40 360.71 20.29 20.29 21.62 21.62 10.81 3.23 6.70
3 317 359.89 361.40 -44.40 44.40 -22.78 66.02 22.01 7.35 3.10
4 297 357.75 359.89 -62.89 62.89 -85.67 128.92 32.23 12.90 6.64
5 399 359.16 357.75 41.25 41.25 -44.43 170.16 34.03 15.74 2.82
6 402 360.61 359.16 42.84 42.84 -1.58 213.01 35.50 18.45 0.09
7 375 361.10 360.61 14.39 14.39 12.80 227.39 32.48 18.04 0.71
8 349 360.69 361.10 -12.10 12.10 0.70 239.49 29.94 17.45 0.04
9 386 361.55 360.69 25.31 25.31 26.01 264.80 29.42 18.23 1.43
10 328 360.41 361.55 -33.55 33.55 -7.54 298.35 29.84 19.76 0.38
11 389 361.38 360.41 28.59 28.59 21.05 326.94 29.72 20.65 1.02
12 343 360.76 361.38 -18.38 18.38 2.67 345.33 28.78 20.42 0.13
13 276 357.88 360.76 -84.76 84.76 -82.09 430.08 33.08 26.85 3.06
14 334 357.06 357.88 -23.88 23.88 -105.97 453.96 32.43 26.56 3.99
15 394 358.32 357.06 36.94 36.94 -69.03 490.90 32.73 27.59 2.50
16 334 357.49 358.32 -24.32 24.32 -93.35 515.22 32.20 27.27 3.42
17 384 358.39 357.49 26.51 26.51 -66.84 541.72 31.87 27.19 2.46
18 314 356.88 358.39 -44.39 44.39 -111.24 586.12 32.56 28.91 3.85
19 344 356.45 356.88 -12.88 12.88 -124.12 599.00 31.53 27.31 4.55
20 337 355.79 356.45 -19.45 19.45 -143.57 618.45 30.92 26.52 5.41
21 345 355.42 355.79 -10.79 10.79 -154.35 629.23 29.96 24.95 6.19
22 362 355.64 355.42 6.58 6.58 -147.77 635.82 28.90 23.11 6.39
23 314 354.23 355.64 -41.64 41.64 -189.41 677.46 29.45 24.97 7.59
24 365 354.59 354.23 10.77 10.77 -178.64 688.23 28.68 23.55 7.59
where MADT = ( ∑_{t=1}^{T} ∣et∣ ) / T for T = 1, 2, . . . , 24, and C(α, T) = ∣ Y(α, T) / MAD(α, T) ∣.
For a smoothing constant α = 0.1, the control limits at the 5% and 1% levels of significance are respectively K = 5.6 and K = 7.5.
The simple cusum tracking signals greater than K = 5.6 occur at times t = 2, 4, 21, 22, 23, 24; that is, six observations. The
simple cusum tracking signals greater than K = 7.5 occur at times t = 23, 24, although they exceed the limit only slightly. Since
C(α, T) > K in two or more consecutive time periods, we conclude that the forecasting process is not accurate at both the 5% and
the 1% levels of significance. However, the deviation from accuracy is less severe at the 1% level of significance.
Another tracking signal that has been used extensively in practice is the smoothed error tracking signal, defined as follows.
First define the smoothed error E of the one-period-ahead forecast errors as:
E(α, T) = αeT(α) + (1 − α)E(α, T − 1).
The smoothed error tracking signal is then:
S(α, T) = ∣ E(α, T) / MAD(α, T) ∣.
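Both tracking signals are simple running updates. A Python sketch follows (the function name is ours; the running quantities Y, MAD and E are initialised at the first error, as in the tables in this section):

```python
def tracking_signals(errors, alpha):
    """Cusum (C) and smoothed-error (S) tracking signals for forecast errors."""
    C, S = [], []
    Y = mad = E = None
    for e in errors:
        if mad is None:
            # initialise the running quantities at the first error
            Y, mad, E = e, abs(e), e
        else:
            Y += e                                   # Y(a,T) = Y(a,T-1) + e_T
            mad = alpha * abs(e) + (1 - alpha) * mad
            E = alpha * e + (1 - alpha) * E
        C.append(abs(Y / mad))
        S.append(abs(E / mad))
    return C, S

# Two illustrative (hypothetical) forecast errors with alpha = 0.1
C, S = tracking_signals([2.0, -1.0], alpha=0.1)
```

Feeding in the 24 cod catch forecast errors reproduces the C and S columns of the tables below.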
ACTIVITY 5.10
Consider the Cod Catch data discussed in Unit 3 and in the above activities.
(a) Calculate the smoothed error tracking signal for all the 24 time points.
(b) Is the forecasting appropriate for the data? Explain your answer. Use both 5% and 1% as the significance levels.
The important quantities needed to calculate values of the smoothed error tracking signal are given in the following table:
t yt ℓt Ft et ∣et∣ Y(α, T) ∑_{t=1}^{T}∣et∣ MADT MAD C E S
0 360.67
1 362 360.71 360.67 1.33 1.33 1.33 1.33 1.33 1.33 1.00 1.33 1.00
2 381 361.40 360.71 20.29 20.29 21.62 21.62 10.81 3.23 6.70 3.23 1.00
3 317 359.89 361.40 -44.40 44.40 -22.78 66.02 22.01 7.35 3.10 -1.53 0.21
4 297 357.75 359.89 -62.89 62.89 -85.67 128.92 32.23 12.90 6.64 -7.67 0.59
5 399 359.16 357.75 41.25 41.25 -44.43 170.16 34.03 15.74 2.82 -2.78 0.18
6 402 360.61 359.16 42.84 42.84 -1.58 213.01 35.50 18.45 0.09 1.78 0.10
7 375 361.10 360.61 14.39 14.39 12.80 227.39 32.48 18.04 0.71 3.04 0.17
8 349 360.69 361.10 -12.10 12.10 0.70 239.49 29.94 17.45 0.04 1.53 0.09
9 386 361.55 360.69 25.31 25.31 26.01 264.80 29.42 18.23 1.43 3.91 0.21
10 328 360.41 361.55 -33.55 33.55 -7.54 298.35 29.84 19.76 0.38 0.16 0.01
11 389 361.38 360.41 28.59 28.59 21.05 326.94 29.72 20.65 1.02 3.00 0.15
12 343 360.76 361.38 -18.38 18.38 2.67 345.33 28.78 20.42 0.13 0.87 0.04
13 276 357.88 360.76 -84.76 84.76 -82.09 430.08 33.08 26.85 3.06 -7.70 0.29
14 334 357.06 357.88 -23.88 23.88 -105.97 453.96 32.43 26.56 3.99 -9.31 0.35
15 394 358.32 357.06 36.94 36.94 -69.03 490.90 32.73 27.59 2.50 -4.69 0.17
16 334 357.49 358.32 -24.32 24.32 -93.35 515.22 32.20 27.27 3.42 -6.65 0.24
17 384 358.39 357.49 26.51 26.51 -66.84 541.72 31.87 27.19 2.46 -3.34 0.12
18 314 356.88 358.39 -44.39 44.39 -111.24 586.12 32.56 28.91 3.85 -7.44 0.26
19 344 356.45 356.88 -12.88 12.88 -124.12 599.00 31.53 27.31 4.55 -7.99 0.29
20 337 355.79 356.45 -19.45 19.45 -143.57 618.45 30.92 26.52 5.41 -9.13 0.34
21 345 355.42 355.79 -10.79 10.79 -154.35 629.23 29.96 24.95 6.19 -9.30 0.37
22 362 355.64 355.42 6.58 6.58 -147.77 635.82 28.90 23.11 6.39 -7.71 0.33
23 314 354.23 355.64 -41.64 41.64 -189.41 677.46 29.45 24.97 7.59 -11.10 0.44
24 365 354.59 354.23 10.77 10.77 -178.64 688.23 28.68 23.55 7.59 -8.92 0.38
5.4 Holt’s trend corrected exponential smoothing
SES cannot handle a time series that displays a trend. If the time series is increasing or decreasing at a fixed rate, it may be described
by the linear trend model:
yt = β0 + β1 t + εt .
ACTIVITY 5.11
Show that the change in level of the time series from time period T − 1 to time period T is β1 .
DISCUSSION OF ACTIVITY 5.11
The change is simply the difference between the trend value at time T and the trend value at time T − 1. That is:
(β0 + β1 T) − (β0 + β1 (T − 1)) = β1.
Growth rate
Now, regardless of whether the change β1 leads to an increase or a decrease, it is called the growth rate.
Holt’s trend corrected exponential smoothing is appropriate when both the level and the growth rate are changing. For
Holt’s trend corrected exponential smoothing, let ℓT−1 be the estimate of the level of the time series in time period T − 1 and bT−1
be the corresponding estimate of the growth rate. When we observe a new time series value yT in time period T, these two estimates
are updated using two smoothing equations.
The estimate of the level of the time series in time period T uses the smoothing constant α and is:
ℓT = αyT + (1 − α) (ℓT −1 + bT −1 )
The estimate of the growth rate of the time series in time period T uses the smoothing constant γ and is:
bT = γ (ℓT − ℓT −1 ) + (1 − γ) (bT −1 )
A point forecast made in time period T for yT+τ is:
ŷT+τ = ℓT + τ bT (τ = 1, 2, 3, ...)
In general, for τ ≥ 2, a 95% prediction interval computed in time period T for yT+τ is:
[(ℓT + τ bT) − z0.025 s √(1 + ∑_{j=1}^{τ−1} α²(1 + jγ)²) ; (ℓT + τ bT) + z0.025 s √(1 + ∑_{j=1}^{τ−1} α²(1 + jγ)²)]
ACTIVITY 5.12
Write down the formula for a 95% prediction interval computed in time period T for yT +τ when:
(i) τ = 2
(ii) τ = 3.
(i) For τ = 2, the above general formula for the 95% prediction interval for yT+2 can be written as:
lT + 2bT ± z0.025 s √(1 + α²(1 + γ)²) = [lT + 2bT − z0.025 s √(1 + α²(1 + γ)²) ; lT + 2bT + z0.025 s √(1 + α²(1 + γ)²)].
(ii) For τ = 3, the general formula for the 95% prediction interval for yT+3 can be written as:
lT + 3bT ± z0.025 s √(1 + α²(1 + γ)² + α²(1 + 2γ)²).
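The factor √(1 + Σ_{j=1}^{τ−1} α²(1 + jγ)²) is easy to compute for any τ. A Python sketch follows (the helper name is ours); it is checked against the τ = 2 thermostat interval computed later in this unit, where α = 0.247, γ = 0.095 and s = 27.8870:

```python
import math

def holt_halfwidth(s, alpha, gamma, tau, z=1.96):
    """Half-width of the 95% Holt prediction interval for y_{T+tau}."""
    c = 1 + sum((alpha * (1 + j * gamma)) ** 2 for j in range(1, tau))
    return z * s * math.sqrt(c)

# tau = 2 with the thermostat values alpha = 0.247, gamma = 0.095, s = 27.8870
hw = holt_halfwidth(s=27.8870, alpha=0.247, gamma=0.095, tau=2)
```

For τ = 1 the sum is empty and the half-width reduces to 1.96 s.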
ACTIVITY 5.13
Show that Holt’s trend corrected exponential smoothing equations can be written in the following error correction forms:
lT = lT−1 + bT−1 + α[yT − (lT−1 + bT−1)]
bT = bT−1 + αγ[yT − (lT−1 + bT−1)].
lT = αyT + (1 − α)[lT −1 + bT −1 ]
= lT −1 + bT −1 + αyT − α[lT −1 + bT −1 ]
= lT −1 + bT −1 + α[yT − (lT −1 + bT −1 )].
The establishment of the second equation is left to you as an exercise.
ACTIVITY 5.14
The following data can be found in Example 8.3 in the prescribed textbook. The data are the numbers of thermostats sold over a
period of 52 weeks.
206 189 172 255
245 244 210 303
185 209 205 282
169 207 244 291
162 211 218 280
177 210 182 255
207 173 206 312
216 194 211 296
193 234 273 307
230 156 248 281
212 206 262 308
192 188 258 280
162 162 233 345
The objective of the study was to find the point forecasts and the 95% prediction intervals of the number of thermostats to be
sold in weeks 53, 54 and 55.
The graph indicates an overall increasing trend, mainly towards the end, but the growth rate has been changing over the 52 weeks.
Therefore, Holt’s trend corrected exponential smoothing can be used to analyse the data. The author of the textbook suggested
using α = 0.2 and γ = 0.1 as smoothing constants. We must first find l0 = β̂0 and b0 = β̂1 by fitting, on the first
26 observations, the model yt = β0 + β1 t + ϵt, where yt represents sales at time t, t = 1, 2, . . . , 26, and ϵt is the error term at time t,
assumed to have mean 0.
Least squares estimation using Excel gives the following output:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.1118
R Square 0.0125
Adjusted R Square -0.0287
Standard Error 25.5552
Observations 26
ANOVA
df SS MS F Significance F
Regression 1 198.2785 198.2785 0.3036 0.5867
Residual 24 15673.6062 653.0669
Total 25 15871.8846
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 202.6246 10.3199 19.6344 0.0000 181.3254 223.9238
t -0.3682 0.6682 -0.5510 0.5867 -1.7474 1.0110
The fitted model is: ŷt = 202.6246 − 0.3682t. This model shows that, in general, the trend is decreasing. It follows that
l0 = β̂0 = 202.6246 and b0 = β̂1 = −0.3682. We can now calculate the smoothing levels lt and the growth rates bt using the
equations
ℓT = αyT + (1 − α) (ℓT −1 + bT −1 )
and
bT = γ (ℓT − ℓT −1 ) + (1 − γ) (bT −1 )
where α = 0.2, γ = 0.1, l0 = 202.6246 and b0 = −0.3682. For instance, l1, b1, l2 and b2 are calculated as follows:
l1 = 0.2(206) + 0.8(202.6246 − 0.3682) = 203.0051
b1 = 0.1(203.0051 − 202.6246) + 0.9(−0.3682) = −0.2933
l2 = 0.2(245) + 0.8(203.0051 − 0.2933) = 211.1694
b2 = 0.1(211.1694 − 203.0051) + 0.9(−0.2933) = 0.5524
Continuing the process until t = 52 gives the values in columns 3 and 4 in the following table:
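The two Holt updates can be coded directly. A Python sketch (the function name is ours) that reproduces l1 and b1 from the table, using α = 0.2, γ = 0.1, l0 = 202.6246, b0 = −0.3682 and y1 = 206:

```python
def holt_update(y, level, growth, alpha, gamma):
    """One step of Holt's trend corrected exponential smoothing."""
    new_level = alpha * y + (1 - alpha) * (level + growth)
    new_growth = gamma * (new_level - level) + (1 - gamma) * growth
    return new_level, new_growth

# Week 1 of the thermostat data
l1, b1 = holt_update(206, 202.6246, -0.3682, alpha=0.2, gamma=0.1)
```

Iterating this update over all 52 sales values fills in the level and growth-rate columns of the table.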
Week Sales Level Growth rate Forecast made last period Forecast error Squared forecast error
0 - 202.6246 -0.3682 - - -
1 206 203.0051 -0.2933 202.2564 3.7436 14.0145
2 245 211.1694 0.5524 202.7118 42.2882 1788.2925
3 185 206.3775 0.0180 211.7219 -26.7219 714.0583
4 169 198.9164 -0.7299 206.3955 -37.3955 1398.4230
5 162 190.9492 -1.4536 198.1865 -36.1865 1309.4617
6 177 186.9964 -1.7036 189.4955 -12.4955 156.1387
7 207 189.6343 -1.2694 185.2929 21.7071 471.1988
8 216 193.8919 -0.7167 188.3649 27.6351 763.6988
9 193 193.1402 -0.7202 193.1752 -0.1752 0.0307
10 230 199.9360 0.0314 192.4200 37.5800 1412.2596
11 212 202.3739 0.2720 199.9674 12.0326 144.7845
12 192 200.5167 0.0591 202.6459 -10.6459 113.3357
13 162 192.8607 -0.7124 200.5759 -38.5759 1488.0973
14 189 191.5186 -0.7754 192.1483 -3.1483 9.9118
15 244 201.3946 0.2898 190.7433 53.2567 2836.2784
16 209 203.1475 0.4361 201.6844 7.3156 53.5180
17 207 204.2669 0.5044 203.5836 3.4164 11.6718
18 211 206.0170 0.6290 204.7713 6.2287 38.7967
19 210 207.3168 0.6961 206.6460 3.3540 11.2491
20 173 201.0103 -0.0042 208.0129 -35.0129 1225.9025
21 194 199.6049 -0.1443 201.0061 -7.0061 49.0858
22 234 206.3685 0.5465 199.4606 34.5394 1192.9711
23 156 196.7320 -0.4718 206.9149 -50.9149 2592.3316
24 206 198.2081 -0.2770 196.2601 9.7399 94.8650
25 188 195.9449 -0.4756 197.9311 -9.9311 98.6264
26 162 188.7754 -1.1450 195.4692 -33.4692 1120.1885
27 172 184.5043 -1.4576 187.6303 -15.6303 244.3076
28 210 188.4373 -0.9186 183.0466 26.9534 726.4838
29 205 191.0150 -0.5689 187.5187 17.4813 305.5945
30 244 201.1568 0.5021 190.4460 53.5540 2868.0261
31 218 204.9272 0.8290 201.6590 16.3410 267.0293
32 182 201.0049 0.3538 205.7561 -23.7561 564.3537
33 206 202.2870 0.4467 201.3587 4.6413 21.5413
34 211 204.3869 0.6120 202.7336 8.2664 68.3326
35 273 218.5991 1.9720 204.9989 68.0011 4624.1497
36 248 226.0569 2.5206 220.5711 27.4289 752.3432
37 262 235.2620 3.1890 228.5775 33.4225 1117.0646
38 258 242.3608 3.5800 238.4510 19.5490 382.1626
39 233 243.3527 3.3212 245.9408 -12.9408 167.4651
40 255 248.3391 3.4877 246.6739 8.3261 69.3246
41 303 262.0614 4.5112 251.8268 51.1732 2618.6956
42 282 269.6581 4.8197 266.5726 15.4274 238.0038
43 291 277.7823 5.1502 274.4778 16.5222 272.9820
44 280 282.3460 5.0915 282.9324 -2.9324 8.5992
45 255 280.9500 4.4428 287.4375 -32.4375 1052.1900
46 312 290.7142 4.9749 285.3928 26.6072 707.9453
47 296 295.7513 4.9811 295.6891 0.3109 0.0966
48 307 301.9860 5.1065 300.7324 6.2676 39.2823
49 281 301.8740 4.5846 307.0924 -26.0924 680.8155
50 308 306.7669 4.6155 306.4586 1.5414 2.3759
51 280 305.1059 3.9878 311.3823 -31.3823 984.8514
52 345 316.2750 4.7059 309.0937 35.9063 1289.2627
Total 39182.4700
Solver in Excel has been used to find optimal values of α and γ that minimise SSE. The optimal values are α = 0.247 and
γ = 0.095. These new smoothing constants lead to the following table:
Week Sales Level Growth rate Forecast made last period Forecast Error Squared forecast error
0 202.6246 -0.3682
1 206 203.1811 -0.2804 202.2564 3.7436 14.0145
2 245 213.2992 0.7075 202.9007 42.0993 1772.3500
3 185 206.8421 0.0269 214.0067 -29.0067 841.3910
4 169 197.5153 -0.8617 206.8689 -37.8689 1434.0563
5 162 188.0941 -1.6749 196.6536 -34.6536 1200.8702
6 177 184.0927 -1.8959 186.4193 -9.4193 88.7225
7 207 188.3232 -1.3139 182.1968 24.8032 615.1987
8 216 194.1700 -0.6336 187.0093 28.9907 840.4610
9 193 193.4039 -0.6462 193.5364 -0.5364 0.2877
10 230 201.9565 0.2277 192.7577 37.2423 1386.9911
11 212 204.6087 0.4580 202.1842 9.8158 96.3499
12 192 201.8392 0.1514 205.0667 -13.0667 170.7388
13 162 192.1129 -0.7870 201.9906 -39.9906 1599.2500
14 189 190.7514 -0.8416 191.3260 -2.3260 5.4101
15 244 203.2701 0.4277 189.9099 54.0901 2925.7413
16 209 205.0074 0.5521 203.6978 5.3022 28.1134
17 207 205.9153 0.5859 205.5595 1.4405 2.0750
18 211 207.6124 0.6914 206.5012 4.4988 20.2393
19 210 208.7228 0.7312 208.3038 1.6962 2.8770
20 173 200.4499 -0.1242 209.4540 -36.4540 1328.8965
21 194 198.7633 -0.2726 200.3257 -6.3257 40.0149
22 234 207.2615 0.5606 198.4907 35.5093 1260.9109
23 156 195.0221 -0.6554 207.8221 -51.8221 2685.5333
24 206 197.2401 -0.3824 194.3667 11.6333 135.3337
25 188 194.6699 -0.5902 196.8577 -8.8577 78.4594
26 162 186.1560 -1.3430 194.0796 -32.0796 1029.1031
27 172 181.6482 -1.6436 184.8130 -12.8130 164.1725
28 210 187.4134 -0.9398 180.0045 29.9955 899.7281
29 205 191.0496 -0.5051 186.4736 18.5264 343.2270
30 244 203.7480 0.7493 190.5446 53.4554 2857.4848
31 218 207.8325 1.0661 204.4973 13.5027 182.3228
32 182 202.2546 0.4349 208.8986 -26.8986 723.5328
33 206 203.5072 0.5126 202.6895 3.3105 10.9591
34 211 205.7439 0.6764 204.0198 6.9802 48.7229
35 273 222.8655 2.2387 206.4203 66.5797 4432.8540
36 248 230.7594 2.7759 225.1042 22.8958 524.2185
37 262 240.5661 3.4439 233.5354 28.4646 810.2345
38 258 247.4655 3.7721 244.0100 13.9900 195.7201
39 233 246.7330 3.3442 251.2377 -18.2377 332.6122
40 255 251.2931 3.4597 250.0771 4.9229 24.2345
41 303 266.6698 4.5918 254.7528 48.2472 2327.7936
42 282 273.9140 4.8438 271.2617 10.7383 115.3118
43 291 281.7816 5.1311 278.7578 12.2422 149.8707
44 280 285.2053 4.9689 286.9127 -6.9127 47.7855
45 255 281.4861 4.1435 290.1741 -35.1741 1237.2185
46 312 292.1431 4.7623 285.6296 26.3704 695.3980
47 296 296.6817 4.7410 296.9054 -0.9054 0.8197
48 307 302.8003 4.8719 301.4228 5.5772 31.1056
49 281 301.0842 4.2460 307.6722 -26.6722 711.4083
50 308 305.9897 4.3087 305.3302 2.6698 7.1277
51 280 302.8147 3.5977 310.2983 -30.2983 917.9895
52 345 315.9435 4.5032 306.4124 38.5876 1489.0045
Total 38884.2466
The point forecasts in week 52 for weeks 53, 54 and 55, obtained using
ŷT+τ = ℓT + τ bT (τ = 1, 2, 3, ...),
are respectively:
ŷ53(52) = 315.9435 + 4.5032 = 320.4467
ŷ54(52) = 315.9435 + 2(4.5032) = 324.9499
ŷ55(52) = 315.9435 + 3(4.5032) = 329.4531
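These point forecasts are simply ℓ52 + τ b52; a quick numerical check in Python, using the week-52 level and growth rate from the table:

```python
l52, b52 = 315.9435, 4.5032  # level and growth rate in week 52, from the table
forecasts = [l52 + tau * b52 for tau in (1, 2, 3)]  # weeks 53, 54 and 55
```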
To calculate the 95% prediction intervals, we first have to calculate the standard error; that is:
s = √(SSE/(T − 2)) = √( ∑_{t=1}^{T} [yt − (ℓt−1 + bt−1)]² / (T − 2) ) = √(38884.2466/50) = 27.8870.
The 95% prediction interval in week 52 for y53 is:
ŷ53(52) ± z0.025 s = 320.4467 ± 1.96(27.8870) = [265.7882; 375.1052].
The 95% prediction interval in week 52 for y54 is:
ŷ54(52) ± z0.025 s √(1 + α²(1 + γ)²) = 324.9499 ± 1.96(27.8870)√(1 + (0.247)²(1 + 0.095)²) = [268.3275; 381.5723].
√
Finally, the 95% prediction interval in week 52 for y55 is: ŷ55(52) ± z0.025 s √(1 + α²(1 + γ)² + α²(1 + 2γ)²).
That is:
329.4531 ± 1.96(27.8870)√(1 + (0.247)²(1 + 0.095)² + (0.247)²(1 + 2(0.095))²) = 329.4531 ± 58.8575 = [270.5956; 388.3106].
5.5 Holt-Winters methods
Holt-Winters methods are designed for time series that show a linear trend together with a seasonal pattern. The trend could be
local or hold over the entire time series. In this section, two methods are presented. One is the additive Holt-Winters method and
the other is the multiplicative Holt-Winters method.
The additive Holt-Winters method applies to the model:
yt = (β0 + β1 t) + SNt + εt
In order to handle this model, it is easier to analyse the trend and the seasonal component separately. The seasonal component
can also be handled using dummy variables if necessary. This method is appropriate when a time series has a linear trend
with an additive seasonal pattern for which the level, the growth rate and the seasonal pattern may be changing. Implementation
of the additive Holt-Winters method starts with estimates of the level, the growth rate and the seasonal factor. Let ℓT−1 denote
the estimate of the level in time period T − 1, and bT−1 the estimate of the growth rate in time period T − 1. Suppose that we
observe a new observation yT in time period T, and let snT−L be the latest estimate of the seasonal factor for the season
corresponding to time period T. As before, L is the number of seasons. The subscript T − L of snT−L reflects that the time series
value in time period T − L is the most recent value observed in the season being analysed; this most recent value was used in
determining snT−L.
The estimate of the level of the time series in time period T uses the smoothing constant α and is:
ℓT = α(yT − snT−L) + (1 − α)(ℓT−1 + bT−1)
where (yT − snT−L) is the deseasonalised observation in time period T. The estimate of the growth rate of the time series in
time period T uses the smoothing constant γ and is:
bT = γ(ℓT − ℓT−1) + (1 − γ)bT−1
The new estimate for the seasonal factor SNT in time period T uses the smoothing constant δ and is:
snT = δ(yT − ℓT) + (1 − δ)snT−L
A point forecast made in time period T for yT+τ is:
ŷT+τ(T) = ℓT + τ bT + snT+τ−L
where snT+τ−L is the “most recent” estimate of the seasonal factor for the season corresponding to time period T + τ.
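One full additive update can be sketched in Python (the function name is ours); it is checked against the first smoothing step of Activity 5.16 later in this unit, where α = 0.2, γ = 0.1, δ = 0.1, l0 = 22.2, b0 = 0.6529, sn−3 = −14.520 and y1 = 10:

```python
def additive_hw_update(y, level, growth, sn_old, alpha, gamma, delta):
    """One step of the additive Holt-Winters smoothing equations."""
    new_level = alpha * (y - sn_old) + (1 - alpha) * (level + growth)
    new_growth = gamma * (new_level - level) + (1 - gamma) * growth
    new_sn = delta * (y - new_level) + (1 - delta) * sn_old
    return new_level, new_growth, new_sn

# First step of Activity 5.16 (quarterly data, L = 4)
l1, b1, sn1 = additive_hw_update(10, 22.2, 0.6529, -14.520,
                                 alpha=0.2, gamma=0.1, delta=0.1)
```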
A 95% prediction interval computed in time period T for yT+τ is:
[ŷT+τ(T) − z0.025 s √cτ ; ŷT+τ(T) + z0.025 s √cτ]
where
cτ = 1 for τ = 1
cτ = 1 + ∑_{j=1}^{τ−1} α²(1 + jγ)² for τ = 2, 3, ..., L
cτ = 1 + ∑_{j=1}^{τ−1} [α(1 + jγ) + dj,L(1 − α)δ]² for τ = L + 1, L + 2, ...
where
dj,L = 1 if j is a multiple of L
dj,L = 0 otherwise
The error correction forms for the smoothing equations in the additive Holt-Winters method are:
ℓT = ℓT−1 + bT−1 + α[yT − (ℓT−1 + bT−1 + snT−L)]
bT = bT−1 + αγ[yT − (ℓT−1 + bT−1 + snT−L)]
snT = snT−L + (1 − α)δ[yT − (ℓT−1 + bT−1 + snT−L)]
ACTIVITY 5.15
Suppose that there is a well-known commodity, transported by the largest international shipping and transportation company
from a foreign country, whose sales are seasonal over the quarters of a year.
(i) j = 2
(ii) j = 12
(a) The number of quarters of a year is L = 4; that is the number of seasons. Hence:
cτ = 1 for τ = 1
cτ = 1 + ∑_{j=1}^{τ−1} α²(1 + jγ)² for τ = 2, 3, 4
cτ = 1 + ∑_{j=1}^{τ−1} [α(1 + jγ) + dj,4(1 − α)δ]² for τ = 5, 6, 7, ...
(b) Using the results in part (a), and noting that d1,4 = d2,4 = d3,4 = 0 while d4,4 = 1 (since 4 is a multiple of 4), we have:
c1 = 1
c2 = 1 + ∑_{j=1}^{1} α²(1 + jγ)² = 1 + α²(1 + γ)²
c3 = 1 + ∑_{j=1}^{2} α²(1 + jγ)² = 1 + α²(1 + γ)² + α²(1 + 2γ)²
c4 = 1 + ∑_{j=1}^{3} α²(1 + jγ)² = 1 + α²(1 + γ)² + α²(1 + 2γ)² + α²(1 + 3γ)²
c5 = 1 + ∑_{j=1}^{4} [α(1 + jγ) + dj,4(1 − α)δ]²
= 1 + α²(1 + γ)² + α²(1 + 2γ)² + α²(1 + 3γ)² + [α(1 + 4γ) + (1 − α)δ]²
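The piecewise definition of cτ can be collapsed into one function. A Python sketch (assuming, as above, that dj,L = 1 exactly when j is a multiple of L):

```python
def c_tau(alpha, gamma, delta, L, tau):
    """Coefficient c_tau for the additive Holt-Winters prediction interval."""
    if tau == 1:
        return 1.0
    if tau <= L:
        # no seasonal-factor revision enters the interval yet
        return 1 + sum((alpha * (1 + j * gamma)) ** 2 for j in range(1, tau))
    d = lambda j: 1 if j % L == 0 else 0  # the dummy d_{j,L}
    return 1 + sum((alpha * (1 + j * gamma) + d(j) * (1 - alpha) * delta) ** 2
                   for j in range(1, tau))

# Quarterly example (L = 4) with alpha = 0.2, gamma = 0.1, delta = 0.1
c5 = c_tau(alpha=0.2, gamma=0.1, delta=0.1, L=4, tau=5)
```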
ACTIVITY 5.16
The following data about four-year quarterly sales of the TRK-50 mountain bike in Switzerland were presented in Exercise 6.4 in
the prescribed textbook, where the use of dummy variables was suggested as an analysis method.
(a) Plot sales versus time where here the time variable varies from 1 to 16.
(b) Explain why the additive Holt-Winters method is appropriate for the data.
(c) Calculate the estimates of the smoothing levels, the growth rates, the seasonal factors, the forecasts made last periods, the
forecast errors, and the squared forecast errors. Use α = 0.2, γ = 0.1 and δ = 0.1 as smoothing constants.
(d) The optimal values of the smoothing constants that minimise SSE were found to be α = 0.561, γ = 0 and δ = 0. Repeat the
questions in part (c) using these new constants.
(e) Calculate the point forecasts and the 95% prediction intervals of sales for the first 3 quarters of year 5.
(b) The plot indicates a linear trend and a constant (additive) seasonal variation. Hence, the data can be analysed using the additive
Holt-Winters method.
(c) Fitting the linear trend model yt = β0 + β1 t + ϵt in Excel gives the following output:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.2200
R Square 0.0484
Adjusted R Square -0.0196
Standard Error 14.2680
Observations 16
ANOVA
df SS MS F Significance F
Regression 1 144.9529 144.9529 0.7120 0.4130
Residual 14 2850.0471 203.5748
Total 15 2995
The fitted model is: ŷt = 22.2 + 0.6529t. Therefore, l0 = β̂0 = 22.2 and b0 = β̂1 = 0.6529.
We first calculate the regression estimates using the above fitted model. For instance:
ŷ1 = 22.2 + 0.6529(1) = 22.8529. This value and the other regression estimates, rounded to three digits after the decimal
point, are in the fifth column of the following table. Next, we calculate the detrended values yt − ŷt for t = 1, 2, . . . , 16. For
instance, y1 − ŷ1 = 10 − 22.853 = −12.853. The detrended values are in the sixth column of the following table.
The initial seasonal factors are the averages of the detrended values for the corresponding seasons. In the table they are
denoted at, and they represent sn1−L = sn−3, sn2−L = sn−2, sn3−L = sn−1 and sn4−L = sn0, since L = 4. As a consequence, the
time index does not start from t = 1 but runs from t = −3 up to t = 16. As an illustration, sn−3 is the average of the detrended
values for quarter 1:
sn−3 = (−12.853 − 14.465 − 15.076 − 15.688)/4 = −14.520.
Yr Q t yt ŷt yt − ŷt at lt bt snt ŷt+1 (t) et e2t
-3 -14.520
-2 6.327
-1 18.674
0 22.2 0.653 -10.479
1 1 1 10 22.853 -12.853 -14.520 23.186 0.686 -14.387 8.333 1.667 2.780
1 2 2 31 23.506 7.494 6.327 24.033 0.702 6.391 30.199 0.801 0.641
1 3 3 43 24.159 18.841 18.674 24.653 0.694 18.641 43.409 -0.409 0.167
1 4 4 16 24.812 -8.812 -10.479 25.574 0.717 -10.388 14.868 1.132 1.281
2 1 5 11 25.465 -14.465 26.110 0.699 -14.459 11.903 -0.903 0.816
2 2 6 33 26.117 6.883 26.768 0.695 6.375 33.199 -0.199 0.040
2 3 7 45 26.770 18.230 27.242 0.673 18.553 46.104 -1.104 1.220
2 4 8 17 27.423 -10.423 27.810 0.662 -10.431 17.526 -0.526 0.277
3 1 9 13 28.076 -15.076 28.269 0.642 -14.540 14.012 -1.012 1.025
3 2 10 34 28.729 5.271 28.654 0.616 6.272 35.286 -1.286 1.653
3 3 11 48 29.382 18.618 29.305 0.620 18.567 47.823 0.177 0.031
3 4 12 19 30.035 -11.035 29.826 0.610 -10.470 19.494 -0.494 0.244
4 1 13 15 30.688 -15.688 30.257 0.592 -14.612 15.896 -0.896 0.802
4 2 14 37 31.341 5.659 30.824 0.589 6.262 37.121 -0.121 0.015
4 3 15 51 31.994 19.007 31.618 0.610 18.649 49.981 1.019 1.039
4 4 16 21 32.646 -11.646 32.076 0.595 -10.531 21.757 -0.757 0.574
Now that we have the initial values of the smoothing levels, the growth rates, the seasonal factors and the smoothing constants,
we are ready to calculate all the other estimates.
The point forecast of y1 made at t = 0 is:
ŷ1(0) = l0 + b0 + sn−3 = 22.2 + 0.6529 − 14.520 = 8.333.
The seasonal factor at time t = 1 is:
sn1 = sn−3 + (1 − α)δ[y1 − (l0 + b0 + sn−3 )]
= sn−3 + (1 − α)δ[y1 − ŷ1 (0)]
= −14.520 + 0.8(0.1)(10 − 8.333)
= −14.520 + 0.8(0.1)(1.667)
= −14.38664 ≈ −14.387.
The process continues for t = 2, 3, . . . , 16. The values of lt, bt, snt and Ft = ŷt+1(t) are in the eighth, ninth, tenth and
eleventh columns of the table, respectively.
The forecast errors et = yt − ŷt(t − 1) were used in the above calculations and are also reported in the twelfth column of the table.
The squared forecast errors are in the last column of the table.
(d) The results found in part (c) served to illustrate the computation, but they are not useful for prediction since the optimal values
of the smoothing constants that minimise SSE were not used. We now use the optimal values α = 0.561, γ = 0 and δ = 0.
The regression estimates, the detrended values and the initial seasonal factors remain as in part (c). However, the smoothing
levels, the growth rates, the seasonal factors and the point forecasts from time t = 1 to t = 16 will change since the smoothing
constants have changed.
The forecast error at time t = 1 is: y1 − ŷ1(0) = 10 − 8.333 = 1.667, and thus the squared forecast error is 2.780. The process
continues for t = 2, 3, . . . , 16. The results are presented in the following table.
Yr Q t yt ŷt yt − ŷt at lt bt snt ŷt+1 (t) et e2t
- - -3 - - - - - - -14.520 - - -
- - -2 - - - - - - 6.327 - - -
- - -1 - - - - - - 18.674 - - -
- - 0 - - - - 22.2 0.653 -10.479 - - -
1 1 1 10 22.853 -12.853 -14.520 23.788 0.653 -14.520 8.333 1.667 2.780
1 2 2 31 23.506 7.494 6.327 24.571 0.653 6.327 30.768 0.232 0.054
1 3 3 43 24.159 18.841 18.674 24.720 0.653 18.674 43.898 -0.898 0.807
1 4 4 16 24.812 -8.812 -10.479 25.994 0.653 -10.479 14.894 1.106 1.223
2 1 5 11 25.465 -14.465 26.015 0.653 -14.520 12.126 -1.126 1.268
2 2 6 33 26.117 6.883 26.671 0.653 6.327 32.994 0.006 0.000
2 3 7 45 26.770 18.230 26.764 0.653 18.674 45.998 -0.998 0.995
2 4 8 17 27.423 -10.423 27.452 0.653 -10.479 16.938 0.062 0.004
3 1 9 13 28.076 -15.076 27.777 0.653 -14.520 13.584 -0.584 0.341
3 2 10 34 28.729 5.271 28.005 0.653 6.327 34.757 -0.757 0.572
3 3 11 48 29.382 18.618 29.033 0.653 18.674 47.332 0.668 0.446
3 4 12 19 30.035 -11.035 29.570 0.653 -10.479 19.207 -0.207 0.043
4 1 13 15 30.688 -15.688 29.829 0.653 -14.520 15.702 -0.702 0.493
4 2 14 37 31.341 5.659 30.589 0.653 6.327 36.808 0.192 0.037
4 3 15 51 31.994 19.007 31.850 0.653 18.674 49.916 1.084 1.175
4 4 16 21 32.646 -11.646 31.929 0.653 -10.479 22.024 -1.024 1.049
Tot - - - - - - - - - - - 11.287
(e) The first three quarters of year 5 correspond to t = 17, 18, 19. The point forecast for y17 at t = 16 is:
ŷ17 (16) = l16 + b16 + sn17−L = l16 + b16 + sn13 = 31.929 + 0.653 − 14.520 = 18.062.
ŷ18 (16) = l16 + 2b16 + sn18−L = l16 + 2b16 + sn14 = 31.929 + 2(0.653) + 6.327 = 39.562.
ŷ19 (16) = l16 + 3b16 + sn19−L = l16 + 3b16 + sn15 = 31.929 + 3(0.653) + 18.674 = 52.562.
We need the standard error before calculating the prediction intervals. In this case, we have:
s = √(SSE/(T − 3)) = √(11.287/(16 − 3)) = √(11.287/13) = 0.9318.
The 95% prediction interval for y17 at t = 16 is:
ŷ17(16) ± z0.025 s √c1 = 18.062 ± 1.96(0.9318)√1
= 18.062 ± 1.8263
= [16.2357; 19.8883].
The 95% prediction interval for y18 at t = 16 is:
ŷ18(16) ± z0.025 s √c2 = 39.562 ± 1.96(0.9318)√(1 + α²(1 + γ)²)
= 39.562 ± 1.96(0.9318)√(1 + (0.561)²(1 + 0)²)
= 39.562 ± 2.0941
= [37.4679; 41.6561].
The 95% prediction interval for y19 at t = 16 is:
ŷ19(16) ± z0.025 s √c3 = 52.562 ± 1.96(0.9318)√(1 + α²(1 + γ)² + α²(1 + 2γ)²)
= 52.562 ± 1.96(0.9318)√(1 + (0.561)²(1 + 0)² + (0.561)²(1 + 2(0))²)
= 52.562 ± 1.96(0.9318)√(1 + 2(0.561)²)
= 52.562 ± 2.3313
= [50.2307; 54.8933].
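These interval computations can be verified numerically. A short Python check of the y18 interval, using s = 0.9318, α = 0.561 and γ = 0 from the activity:

```python
import math

s, alpha, gamma, z = 0.9318, 0.561, 0.0, 1.96
c2 = 1 + (alpha * (1 + gamma)) ** 2        # c2 = 1 + a^2 (1 + g)^2
half = z * s * math.sqrt(c2)               # half-width of the interval
lo, hi = 39.562 - half, 39.562 + half      # point forecast for y18 is 39.562
```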
In Unit 4 we showed how to estimate the fixed seasonal factors, SNt, by using centred moving averages. The multiplicative
Holt-Winters method applies to the model yt = (β0 + β1 t) SNt + εt. The level at time period T − 1 for this model is given by
β0 + β1 (T − 1), and the level at time period T is given by β0 + β1 T. This shows that the growth rate for the level is β1.
The implementation of the multiplicative Holt-Winters method needs to estimate the smoothed level, the growth rate and the
seasonal factor. Let ℓT −1 denote the estimate of the level in time T − 1, and bT −1 denote the growth rate in time T − 1. Then,
suppose that we observe a new observation yT in time period T , and let snT −L be the latest estimate of the seasonal factor in time
period T . As before, L is the number of seasons. The subscript T − L of snT −L is to reflect that the time series value in time period
T − L is the most recent time series value observed in the season being analysed. Thus, this most recent time series value is used
in determining snT −L .
The estimate of the level of the time series in time period T uses the smoothing constant α and is:

ℓT = α(yT / snT−L) + (1 − α)(ℓT−1 + bT−1)

where yT / snT−L is the deseasonalised observation in time period T. The estimate of the growth rate of the time series in time period T uses the smoothing constant γ and is:

bT = γ(ℓT − ℓT−1) + (1 − γ)bT−1

The new estimate for the seasonal factor SNT in time period T uses the smoothing constant δ and is:

snT = δ(yT / ℓT) + (1 − δ)snT−L

where yT / ℓT is an estimate of the newly observed seasonal variation.
A point forecast made in time period T for yT+τ is:

ŷT+τ(T) = (ℓT + τbT)snT+τ−L

where snT+τ−L is the most recent estimate of the seasonal factor for the season corresponding to time period T + τ.
An approximate 95% prediction interval computed in time period T for yT+τ is:

ŷT+τ(T) ± z0.025 sr √cτ snT+τ−L

where sr is the relative standard error and

c1 = (ℓT + bT)²
c2 = α²(1 + γ)²(ℓT + bT)² + (ℓT + 2bT)²
c3 = α²(1 + 2γ)²(ℓT + bT)² + α²(1 + γ)²(ℓT + 2bT)² + (ℓT + 3bT)²
cτ = ∑_{j=1}^{τ−1} α²(1 + [τ − j]γ)²(ℓT + jbT)² + (ℓT + τbT)², if 2 ≤ τ ≤ L.
The error correction forms of the smoothing equations for the multiplicative Holt-Winters method are:

ℓT = ℓT−1 + bT−1 + α [ (yT − (ℓT−1 + bT−1)snT−L) / snT−L ]

bT = bT−1 + αγ [ (yT − (ℓT−1 + bT−1)snT−L) / snT−L ]

snT = snT−L + (1 − α)δ [ (yT − (ℓT−1 + bT−1)snT−L) / (ℓT−1 + bT−1) ]
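The level and growth-rate recursions can be checked numerically: the error-correction form must reproduce the original smoothing equations exactly. A small sketch with illustrative values (not taken from any example in this guide):

```python
# Numerical check: the error-correction updates for the level and growth rate
# agree exactly with the original multiplicative Holt-Winters equations.
alpha, gamma = 0.2, 0.1            # illustrative smoothing constants
l_prev, b_prev, sn_old = 95.25, 2.47, 0.71
y = 72.0

# Original form of the updates
l_new = alpha * (y / sn_old) + (1 - alpha) * (l_prev + b_prev)
b_new = gamma * (l_new - l_prev) + (1 - gamma) * b_prev

# Error-correction form, with e = one-step-ahead forecast error
e = y - (l_prev + b_prev) * sn_old
l_ec = l_prev + b_prev + alpha * e / sn_old
b_ec = b_prev + alpha * gamma * e / sn_old

assert abs(l_new - l_ec) < 1e-10 and abs(b_new - b_ec) < 1e-10
print("error-correction form matches the original equations")
```

The agreement is exact (up to floating-point error) because the error-correction forms for ℓT and bT follow from the original equations by pure algebra; only the seasonal-factor form involves an approximation.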
ACTIVITY 5.17
The following data, quarterly sales (in thousands of cases) of Tiger Sports Drink for eight consecutive years, can be found on page 377 of the prescribed textbook. The investigator wants to calculate point forecasts and 95% prediction intervals for the number of cases sold in the four quarters of year 9.
Year
Quarter 1 2 3 4 5 6 7 8
1 72 77 81 87 94 102 106 115
2 116 123 131 140 147 162 170 177
3 136 146 158 167 177 191 200 218
4 96 101 109 120 128 134 142 149
(a) Plot sales versus time where here the time variable varies from 1 to 32.
(b) Explain why the multiplicative Holt-Winters method is appropriate for the data.
(c) Calculate the estimates of the smoothed levels, the growth rates, the seasonal factors, the one-period-ahead forecasts, the forecast errors and the squared forecast errors. Use α = 0.2, γ = 0.1 and δ = 0.1 as smoothing constants.
(d) The optimal values of the smoothing constants that minimise the SSE were found to be α = 0.336, γ = 0.046 and δ = 0.134. Repeat part (c) using these constants.
(e) Calculate the point forecasts and the 95% prediction interval of sales at the 4 quarters of year 9.
(b) The graph indicates a linearly increasing trend, and the seasonal variation increases with time. Thus the multiplicative Holt-Winters method may be appropriate for analysing the data.
(c) Fitting the simple linear regression model yt = β0 + β1 t + ϵt to the first 16 observations gives the following output.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.4038
R Square 0.1631
Adjusted R Square 0.1033
Standard Error 27.5833
Observations 16
ANOVA
df SS MS F Significance F
Regression 1 2075.2941 2075.2941 2.7276 0.1209
Residual 14 10651.7059 760.8361
Total 15 12727
Step 1: Use the fitted model to calculate the regression estimates for the first 16 observations. The regression estimates are reported in the fifth column of the following table.
Step 2: We calculate the average per season (here, quarter) of the detrended values. For quarter 1, the average of the detrended values is S̄1 = 0.7065.
Then, S̄2 , S̄3 and S̄4 are calculated in a similar way. The results are reported on the bottom of the table in the columns
corresponding to the quarters.
Step 3: The averages S̄i, i = 1, . . . , 4, are not likely to sum to L = 4. Therefore, we multiply each average by the correction factor

CF = L / ∑_{i=1}^{L} S̄i.

In the present case, the sum of the averages is ∑ S̄i = 0.7065 + 1.1119 + 1.2942 + 0.8890 = 4.0016, so the initial seasonal factors are:

sn1−L = sn−3 = 0.7065 × 4/4.0016 = 0.7062
sn2−L = sn−2 = 1.1119 × 4/4.0016 = 1.1115
sn3−L = sn−1 = 1.2942 × 4/4.0016 = 1.2937
sn4−L = sn0 = 0.8890 × 4/4.0016 = 0.8886
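This normalisation step can be reproduced in a few lines. A minimal Python sketch, using the quarterly averages S̄i read from the table:

```python
# Normalise the average seasonal indices so that they sum to L = 4
s_bar = [0.7065, 1.1119, 1.2942, 0.8890]   # quarterly averages from the table
L = 4
cf = L / sum(s_bar)                        # correction factor, 4 / 4.0016
sn_init = [round(cf * s, 4) for s in s_bar]
print(sn_init)        # initial factors sn_-3, sn_-2, sn_-1, sn_0
print(sum(sn_init))   # approximately 4, as required
```

This reproduces the four initial factors 0.7062, 1.1115, 1.2937 and 0.8886 computed above.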
These results are reported in the last column of the table. As expected they sum to 4. Now, we are ready to estimate the
components of the multiplicative Holt-Winters model.
The initial values from the regression fit are ℓ0 = 95.25 and b0 = 2.4706. The point forecast of y1 made in time period 0 is:

ŷ1(0) = (ℓ0 + b0)sn1−4 = (ℓ0 + b0)sn−3 = (95.25 + 2.4706)(0.7062) = 69.0103.

The estimate of the level in time period 1 is:

ℓ1 = α(y1/sn1−4) + (1 − α)(ℓ0 + b0)
= 0.2(72/0.7062) + (0.8)(95.25 + 2.4706)
= 98.5673.
The estimate of the growth rate in time period 1 is:

b1 = γ(ℓ1 − ℓ0) + (1 − γ)b0
= 0.1(98.5673 − 95.25) + (0.9)(2.4706)
= 2.5553.

The estimate of the seasonal factor in time period 1 is:
sn1 = δ(y1/ℓ1) + (1 − δ)sn1−4
= 0.1(72/98.5673) + (0.9)(0.7062)
= 0.7086.
Similarly, in time period 2 the estimate of the level is:

ℓ2 = α(y2/sn2−4) + (1 − α)(ℓ1 + b1)
= 0.2(116/1.1115) + (0.8)(98.5673 + 2.5553)
= 101.7708.
sn2 = δ(y2/ℓ2) + (1 − δ)sn2−4
= 0.1(116/101.7708) + (0.9)(1.1115)
= 1.1143.

The process continues in the same way for t = 3, 4, . . . , 16.
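These hand computations can be carried through all 16 periods with a short program. A minimal Python sketch of the recursion, using the initial values ℓ0 = 95.25, b0 = 2.4706 and the initial seasonal factors computed earlier:

```python
# One full pass of the multiplicative Holt-Winters recursion for part (c),
# using alpha = 0.2, gamma = 0.1, delta = 0.1 and the initial values above.
alpha, gamma, delta = 0.2, 0.1, 0.1
l, b = 95.25, 2.4706                      # from the regression fit
sn = [0.7062, 1.1115, 1.2937, 0.8886]     # sn_-3, sn_-2, sn_-1, sn_0
y = [72, 116, 136, 96, 77, 123, 146, 101, 81, 131, 158, 109, 87, 140, 167, 120]

sse = 0.0                                 # running SSE, useful when comparing
for t, yt in enumerate(y, start=1):       # smoothing constants
    season = (t - 1) % 4                  # season (quarter) of period t
    fcast = (l + b) * sn[season]          # y-hat_t(t-1)
    sse += (yt - fcast) ** 2
    l_new = alpha * yt / sn[season] + (1 - alpha) * (l + b)
    b = gamma * (l_new - l) + (1 - gamma) * b
    sn[season] = delta * yt / l_new + (1 - delta) * sn[season]
    l = l_new
    if t <= 2:
        print(t, round(l, 4), round(b, 4), round(sn[season], 4))
```

The first iteration reproduces ℓ1 = 98.5673, b1 = 2.5553 and sn1 = 0.7086 from the hand computation above, and the second reproduces ℓ2 = 101.7708 and sn2 = 1.1143.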
(d) The results found in part (c) served to illustrate the computations, but they are not useful for prediction since the optimal values of the smoothing constants that minimise the SSE were not used. We now use the optimal values α = 0.336, γ = 0.046 and δ = 0.134.
The regression estimates, the detrended values and the initial seasonal factors remain as in part (c). However, the smoothing
levels, the growth rates, the seasonal factors and the point forecast from time t = 1 to t = 32 will change since the smoothing
constants have changed. We obtain the following:
ŷ1(0) = (ℓ0 + b0)sn1−4 = (ℓ0 + b0)sn−3 = (95.25 + 2.4706)(0.7062) = 69.0103.

ℓ1 = α(y1/sn1−4) + (1 − α)(ℓ0 + b0)
= 0.336(72/0.7062) + (0.664)(95.25 + 2.4706)
= 99.1431.

b1 = γ(ℓ1 − ℓ0) + (1 − γ)b0
= 0.046(99.1431 − 95.25) + (0.954)(2.4706)
= 2.5360.

sn1 = δ(y1/ℓ1) + (1 − δ)sn1−4
= 0.134(72/99.1431) + (0.866)(0.7062)
= 0.7089.
The process continues for t = 2, 3, . . . , 32. The results are presented in the following table.
Y Q T yt lt bt snt ŷt (t − 1) et e2t srt2
- - -3 - - - 0.7062 - - - -
- - -2 - - - 1.1115 - - - -
- - -1 - - - 1.2937 - - - -
- - 0 - 95.25 2.4706 0.8886 - - - -
1 1 1 72 99.1431 2.5360 0.7089 69.0103 2.9897 8.9384 0.00187686
1 2 2 116 102.5810 2.5775 1.1141 113.0163 2.9837 8.9024 0.00069699
1 3 3 136 105.1472 2.5770 1.2937 136.0436 -0.0436 0.0019 0.00000010
1 4 4 96 107.8287 2.5818 0.8888 95.7238 0.2762 0.0763 0.00000833
2 1 5 77 109.8094 2.5542 0.7079 78.2681 -1.2681 1.6082 0.00026252
2 2 6 123 111.7052 2.5239 1.1123 125.1829 -2.1829 4.7651 0.00030407
2 3 7 146 113.7684 2.5027 1.2923 147.7740 -1.7740 3.1470 0.00014411
2 4 8 101 115.3846 2.4619 0.8870 103.3449 -2.3449 5.4988 0.00051486
3 1 9 81 116.6986 2.4091 0.7060 83.4183 -2.4183 5.8481 0.00084042
3 2 10 131 118.6578 2.3884 1.1112 132.4893 -1.4893 2.2181 0.00012636
3 3 11 158 121.4557 2.4072 1.2934 156.4251 1.5749 2.4804 0.00010137
3 4 12 109 123.5338 2.3921 0.8864 109.8689 -0.8689 0.7549 0.00006254
4 1 13 87 125.0192 2.3504 0.7047 88.9052 -1.9052 3.6297 0.00045922
4 2 14 140 126.9048 2.3290 1.1102 141.5372 -1.5372 2.3631 0.00011796
4 3 15 167 129.1936 2.3272 1.2933 167.1548 -0.1548 0.0240 0.00000086
4 4 16 120 132.8175 2.3868 0.8887 116.5792 3.4208 11.7019 0.00086102
5 1 17 94 134.5975 2.3589 0.7038 95.2725 -1.2725 1.6192 0.00017839
5 2 18 147 135.4302 2.2887 1.1068 152.0428 -5.0428 25.4298 0.00110005
5 3 19 177 137.4292 2.2754 1.2926 178.1149 -1.1149 1.2431 0.00003918
5 4 20 128 141.1589 2.3423 0.8911 124.1534 3.8466 14.7962 0.00095991
6 1 21 102 143.9794 2.3643 0.7044 100.9982 1.0018 1.0035 0.00009838
6 2 22 162 146.3500 2.3646 1.1069 161.9793 0.0207 0.0004 0.00000002
6 3 23 191 148.3952 2.3499 1.2919 192.2285 -1.2285 1.5093 0.00004085
6 4 24 134 150.6205 2.3441 0.8909 134.3304 -0.3304 0.1092 0.00000605
7 1 25 106 152.1282 2.3057 0.7034 107.7534 -1.7534 3.0745 0.00026479
7 2 26 170 154.1498 2.2926 1.1063 170.9358 -0.9358 0.8757 0.00002997
7 3 27 200 155.8956 2.2674 1.2907 202.1024 -2.1024 4.4200 0.00010821
7 4 28 142 158.5742 2.2864 0.8915 140.9098 1.0902 1.1885 0.00005986
8 1 29 115 161.7440 2.3270 0.7044 113.1506 1.8494 3.4201 0.00026713
8 2 30 177 162.7000 2.2639 1.1038 181.5140 -4.5140 20.3760 0.00061844
8 3 31 218 166.2882 2.3248 1.2934 212.9131 5.0869 25.8769 0.00057083
8 4 32 149 168.1144 2.3019 0.8908 150.3230 -1.3230 1.7504 0.00007746
- - - - - - - - - - 0.01079712
The last column contains the squared relative errors:

srt² = [ (yt − ŷt(t − 1)) / ŷt(t − 1) ]².
(e) The point forecasts made in time period 32 of y33, y34, y35 and y36 are the following:

ŷ33(32) = (ℓ32 + b32)sn29 = (168.1144 + 2.3019)(0.7044) = 120.0412
ŷ34(32) = (ℓ32 + 2b32)sn30 = (168.1144 + 2(2.3019))(1.1038) = 190.6463
ŷ35(32) = (ℓ32 + 3b32)sn31 = (168.1144 + 3(2.3019))(1.2934) = 226.3710
ŷ36(32) = (ℓ32 + 4b32)sn32 = (168.1144 + 4(2.3019))(0.8908) = 157.9584
To calculate the 95% prediction intervals, we first compute the relative standard error as follows:

sr = √( ∑_{t=1}^{T} [ (yt − ŷt(t − 1)) / ŷt(t − 1) ]² / (T − 3) )
= √(0.01079712 / (32 − 3))
= √(0.01079712 / 29)
= 0.0193.
The 95% prediction interval for y33 is:

ŷ33(32) ± z0.025 sr √c1 sn33−L = ŷ33(32) ± z0.025 sr √((ℓ32 + b32)²) sn29
= 120.0412 ± 1.96(0.0193)(168.1144 + 2.3019)(0.7044)
= 120.0412 ± 4.5409
= [115.5003; 124.5821].
The 95% prediction interval for y34 is:

ŷ34(32) ± z0.025 sr √c2 sn34−L = ŷ34(32) ± z0.025 sr √c2 sn30

where

c2 = α²(1 + γ)²(ℓ32 + b32)² + (ℓ32 + 2b32)²
= (0.336)²(1 + 0.046)²(168.1144 + 2.3019)² + (168.1144 + 2(2.3019))²
= 33418.8476.

Hence:

ŷ34(32) ± z0.025 sr √c2 sn30 = 190.6463 ± 1.96(0.0193)√33418.8476 (1.1038)
= 190.6463 ± 7.6331
= [183.0132; 198.2794].
The 95% prediction interval for y35 is:

ŷ35(32) ± z0.025 sr √c3 sn35−L = ŷ35(32) ± z0.025 sr √c3 sn31

where

c3 = α²(1 + 2γ)²(ℓ32 + b32)² + α²(1 + γ)²(ℓ32 + 2b32)² + (ℓ32 + 3b32)²
= (0.336)²(1 + 2(0.046))²(168.1144 + 2.3019)²
+ (0.336)²(1 + 0.046)²(168.1144 + 2(2.3019))² + (168.1144 + 3(2.3019))²
= 38226.5951.
Hence, the 95% prediction interval of y35 is:

ŷ35(32) ± z0.025 sr √c3 sn31 = 226.3710 ± 1.96(0.0193)√38226.5951 (1.2934)
= 226.3710 ± 9.5660
= [216.8050; 235.9370].
The 95% prediction interval for y36 is:

ŷ36(32) ± z0.025 sr √c4 sn36−L = ŷ36(32) ± z0.025 sr √c4 sn32

where

c4 = ∑_{j=1}^{3} α²(1 + [4 − j]γ)²(ℓ32 + jb32)² + (ℓ32 + 4b32)²
= 43488.912.

Hence:

ŷ36(32) ± z0.025 sr √c4 sn32 = 157.9584 ± 1.96(0.0193)√43488.912 (0.8908)
= 157.9584 ± 7.0272
= [150.9312; 164.9856].
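The year-9 forecasts and intervals can be reproduced programmatically. A sketch using the fitted values from the table (small differences from the hand computation arise because sr was rounded to 0.0193 there):

```python
import math

# Year-9 forecasts and 95% prediction intervals for the multiplicative
# Holt-Winters fit of part (d); fitted values are taken from the table above.
l, b = 168.1144, 2.3019
sn = [0.7044, 1.1038, 1.2934, 0.8908]     # sn_29, sn_30, sn_31, sn_32
alpha, gamma = 0.336, 0.046
sr = math.sqrt(0.01079712 / (32 - 3))     # relative standard error

for tau in range(1, 5):
    point = (l + tau * b) * sn[tau - 1]
    # c_tau = sum_{j=1}^{tau-1} alpha^2 (1 + [tau - j] gamma)^2 (l + j b)^2
    #         + (l + tau b)^2
    c = sum(alpha**2 * (1 + (tau - j) * gamma)**2 * (l + j * b)**2
            for j in range(1, tau)) + (l + tau * b)**2
    half = 1.96 * sr * math.sqrt(c) * sn[tau - 1]
    print(f"y_{32 + tau}: {point:.4f} +/- {half:.4f}")
```

The point forecasts match the hand-computed values 120.0412, 190.6463, 226.3710 and 157.9584, and the cτ values agree with 33418.8476 (τ = 2), 38226.5951 (τ = 3) and 43488.912 (τ = 4).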
5.6 Damped trend exponential smoothing

The damped trend exponential smoothing method replaces the growth term of Holt's method with a damped growth term. The estimates of the level and the growth rate are:

ℓT = αyT + (1 − α)(ℓT−1 + ϕbT−1)
bT = γ(ℓT − ℓT−1) + (1 − γ)ϕbT−1

where α and γ are smoothing constants between 0 and 1, and ϕ is a damping factor between 0 and 1. A point forecast made in time period T for yT+τ is:

ŷT+τ(T) = ℓT + (ϕ + ϕ² + ⋯ + ϕ^τ)bT.
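To see the effect of the damping factor, the following sketch (with illustrative values, not from this guide's examples) evaluates the point forecast at increasing lead times; because ϕ + ϕ² + ⋯ + ϕ^τ converges, the forecasts level off near ℓT + bT ϕ/(1 − ϕ):

```python
# Damped trend point forecasts: the growth contribution phi + phi^2 + ... + phi^tau
# flattens out as tau grows, so long-run forecasts level off.
def damped_forecast(l, b, phi, tau):
    phi_sum = sum(phi**j for j in range(1, tau + 1))
    return l + phi_sum * b

# Illustrative values
l, b, phi = 100.0, 2.0, 0.8
print([round(damped_forecast(l, b, phi, tau), 2) for tau in (1, 4, 12, 48)])
# levels off near l + b * phi / (1 - phi) = 108
```

With ϕ = 1 the forecasts revert to Holt's undamped straight line ℓT + τbT, and with ϕ near 0 they stay close to the level ℓT.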
ACTIVITY 5.18
Which values of the damping factor ϕ will effect: (a) weak damping? (b) strong damping?
DISCUSSION OF ACTIVITY 5.18
The value of the damping factor ϕ lies between 0 and 1. Values near 1 have less damping effect than values near 0, since the forecast ŷT+τ(T) = ℓT + (ϕ + ϕ² + ⋯ + ϕ^τ)bT retains more of the growth rate when ϕ is close to 1. Hence:
(a) Weak damping is effected with values of ϕ near 1.
(b) Strong damping is effected with values of ϕ near 0.
ACTIVITY 5.19
What do the damped trend exponential smoothing equations reduce to when ϕ is equal:
(a) to 0?
(b) to 1?
DISCUSSION OF ACTIVITY 5.19
(a) When ϕ = 0, then ℓT = αyT + (1 − α)ℓT−1, which is the estimate of the smoothed level at time T for simple exponential smoothing.
(b) When ϕ = 1, then ℓT = αyT + (1 − α)(ℓT−1 + bT−1) and bT = γ(ℓT − ℓT−1) + (1 − γ)bT−1, which are the estimates of the smoothed level and the growth rate at time T for Holt's trend corrected exponential smoothing.
Once the point forecast is determined, one can also determine the prediction interval. A 95% prediction interval computed at time T for yT+1 is:

ŷT+1(T) ± z0.025 s

where

s = √(SSE/(T − 2)) = √( ∑_{t=1}^{T} [yt − (ℓt−1 + ϕbt−1)]² / (T − 2) ).

A 95% prediction interval computed at time T for yT+2 is:

ŷT+2(T) ± z0.025 s √(1 + α²(1 + ϕγ)²)

and for yT+3:

ŷT+3(T) ± z0.025 s √(1 + α²(1 + ϕγ)² + α²(1 + ϕγ + ϕ²γ)²).
If τ ≥ 4, then a 95% prediction interval computed at time T for yT+τ is:

ŷT+τ(T) ± z0.025 s √(1 + ∑_{j=1}^{τ−1} α²(1 + ϕjγ)²)

where ϕj = ϕ + ϕ² + ⋯ + ϕ^j.
ACTIVITY 5.20
Show that the error correction forms of the damped trend exponential smoothing equations are:

ℓT = ℓT−1 + ϕbT−1 + α[yT − (ℓT−1 + ϕbT−1)]
bT = ϕbT−1 + γ[ℓT − (ℓT−1 + ϕbT−1)]

DISCUSSION OF ACTIVITY 5.20

ℓT = αyT + (1 − α)(ℓT−1 + ϕbT−1)
= ℓT−1 + ϕbT−1 + α[yT − (ℓT−1 + ϕbT−1)]

and

bT = γ(ℓT − ℓT−1) + (1 − γ)ϕbT−1
= γℓT − γℓT−1 + ϕbT−1 − γϕbT−1
= ϕbT−1 + γ(ℓT − ℓT−1 − ϕbT−1)
= ϕbT−1 + γ[ℓT − (ℓT−1 + ϕbT−1)].
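The derivation can be confirmed numerically; a small sketch with illustrative values:

```python
# Numerical check that the damped-trend error-correction forms reproduce the
# original smoothing equations (illustrative values, not from the guide).
alpha, gamma, phi = 0.3, 0.1, 0.9
l_prev, b_prev, y = 100.0, 2.0, 104.5

# Original form of the updates
l_new = alpha * y + (1 - alpha) * (l_prev + phi * b_prev)
b_new = gamma * (l_new - l_prev) + (1 - gamma) * phi * b_prev

# Error-correction form
e = y - (l_prev + phi * b_prev)          # one-step-ahead forecast error
l_ec = l_prev + phi * b_prev + alpha * e
b_ec = phi * b_prev + gamma * (l_new - (l_prev + phi * b_prev))

assert abs(l_new - l_ec) < 1e-12 and abs(b_new - b_ec) < 1e-12
print("damped-trend error-correction form verified")
```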
Remember that the additive Holt-Winters method is appropriate for time series with a linear trend (at least locally) and constant seasonal variation. The results in Section 5.5.1 imply that the estimates of the smoothed level, the growth rate and the seasonal component for the additive Holt-Winters method with damped trend are the following:

ℓT = α(yT − snT−L) + (1 − α)(ℓT−1 + ϕbT−1)
bT = γ(ℓT − ℓT−1) + (1 − γ)ϕbT−1
snT = δ(yT − ℓT) + (1 − δ)snT−L
A point forecast made in time period T for yT+τ is:

ŷT+τ(T) = ℓT + (ϕ + ϕ² + ⋯ + ϕ^τ)bT + snT+τ−L

and a 95% prediction interval is:

ŷT+τ(T) ± z0.025 s √cτ
ACTIVITY 5.21
Show that the error correction form equations of the additive Holt-Winters with damped trend exponential smoothing are:

ℓT = ℓT−1 + ϕbT−1 + α[yT − (ℓT−1 + ϕbT−1) − snT−L]
bT = ϕbT−1 + γ[ℓT − (ℓT−1 + ϕbT−1)]
snT = snT−L + δ(1 − α)[yT − (ℓT−1 + ϕbT−1) − snT−L]

DISCUSSION OF ACTIVITY 5.21
For the level:

ℓT = α(yT − snT−L) + (1 − α)(ℓT−1 + ϕbT−1)
= ℓT−1 + ϕbT−1 + α[yT − snT−L − (ℓT−1 + ϕbT−1)].

The other two equations are derived in a similar manner. Try to establish them!
Remember that the multiplicative Holt-Winters method is appropriate for time series with a linear trend (at least locally) and increasing seasonal variation. The results in Section 5.5.2 imply that the estimates of the smoothed level, the growth rate and the seasonal component for the multiplicative Holt-Winters method with damped trend are the following:

ℓT = α(yT/snT−L) + (1 − α)(ℓT−1 + ϕbT−1)
bT = γ(ℓT − ℓT−1) + (1 − γ)ϕbT−1
snT = δ(yT/ℓT) + (1 − δ)snT−L

A point forecast made in time period T for yT+τ is:

ŷT+τ(T) = (ℓT + (ϕ + ϕ² + ⋯ + ϕ^τ)bT)snT+τ−L

and a 95% prediction interval is:

ŷT+τ(T) ± z0.025 sr √cτ snT+τ−L
where sr is as defined for the multiplicative Holt-Winters method. If 2 ≤ τ ≤ L, then

cτ = ∑_{j=1}^{τ−1} α²(1 + [τ − j]γ)²(ℓT + ϕjbT)² + (ℓT + ϕτbT)²

where ϕj = ϕ + ϕ² + ⋯ + ϕ^j.
ACTIVITY 5.22
Show that the error correction form equations of the multiplicative Holt-Winters with damped trend exponential smoothing are:

ℓT = ℓT−1 + ϕbT−1 + α[(yT − (ℓT−1 + ϕbT−1)snT−L)/snT−L]
bT = ϕbT−1 + γ[ℓT − (ℓT−1 + ϕbT−1)]
snT = snT−L + δ(1 − α)[(yT − (ℓT−1 + ϕbT−1)snT−L)/(ℓT−1 + ϕbT−1)]
ACTIVITY 5.23
Show that the no trend multiplicative Holt-Winters method is characterised by the following results:
(1) The estimates of the smoothed level and seasonal component are:

ℓT = α(yT/snT−L) + (1 − α)ℓT−1

and

snT = δ(yT/ℓT) + (1 − δ)snT−L.

(2) A point forecast made in time period T for yT+τ is:

ŷT+τ(T) = ℓT snT+τ−L.

(3) An approximate 95% prediction interval computed in time period T for yT+τ when 1 ≤ τ ≤ L is:

ŷT+τ(T) ± z0.025 sr √(1 + (τ − 1)α²) ℓT snT+τ−L.
DISCUSSION OF ACTIVITY 5.23
If there is no trend, then bT = 0 for all values of T. It follows from the results in Section 5.6.3 that:

ℓT = α(yT/snT−L) + (1 − α)(ℓT−1 + ϕbT−1)
= α(yT/snT−L) + (1 − α)(ℓT−1 + ϕ(0))
= α(yT/snT−L) + (1 − α)ℓT−1.

The other results can be derived in a similar manner. Try to establish them!
5.7 Conclusion
This unit discussed various forecasting models that are based on exponential smoothing. Simple exponential smoothing was found to be appropriate for time series with no trend and no seasonal variation. When a time series exhibits a linear trend, at least locally, but no seasonal variation, Holt's trend corrected exponential smoothing method was found to be appropriate. Time series exhibiting a linear trend and constant seasonal variation were found to be better analysed using the additive Holt-Winters method, while time series that have a linear trend with changing seasonal variation were found to be better analysed using the multiplicative Holt-Winters method. Finally, the damped trend exponential smoothing method was found to be appropriate for time series whose growth rate is not sustained into the future, so that the growth rate has to be multiplied by a damping factor.