A Basic Time Series Forecasting Course with Python
https://doi.org/10.1007/s43069-022-00179-z
TUTORIAL
Alain Zemkoho1
Received: 29 October 2021 / Accepted: 16 November 2022 / Published online: 23 December 2022
© The Author(s) 2022
Abstract
The aim of this paper is to present a set of Python-based tools to develop forecasts
using time series data sets. The material is based on a 4-week course that the author
has taught for 7 years to students on operations research, management science, ana-
lytics, and statistics 1-year MSc programmes. However, it can easily be adapted to
various other audiences, including executive management or some undergraduate
programmes. No particular knowledge of Python is required to use this material.
Nevertheless, we assume a good level of familiarity with standard statistical forecast-
ing methods such as exponential smoothing, autoregressive integrated moving aver-
age (ARIMA), and regression-based techniques, which is required to deliver such a
course. Access to relevant data, codes, and lecture notes, which serve as the basis for
this material, is made available (see https://github.com/abzemkoho/forecasting) for
anyone interested in teaching such a course or developing some familiarity with the
mathematical background of relevant methods and tools.
1 Introduction
This article is part of the Topical Collection on Model Development for the Operations Research
Classroom.
* Alain Zemkoho
[email protected]
1 School of Mathematical Sciences & Centre for Operational Research, Management Sciences and Information Systems (CORMSIS), University of Southampton, Building 54, Highfield Campus, Southampton SO17 1BJ, England
Forecasting methods can broadly be split into two categories: qualitative and quantitative forecasting methods, and
we can even add a third one that we label as semi-qualitative, where a combination
of both qualitative and quantitative methods can be employed to generate forecasts.
Qualitative forecasting methods are often used in situations where historical data is
not available. For more details on these concepts, interested readers are referred to
the books [1, 2] and references therein.
Our focus in this paper is on quantitative methods, as we assume that historical
time series data (i.e. data from a unit (or a group of units) observed in several successive periods) is available for the variables of interest. Within quantitative meth-
ods, we also have a number of subcategories that can be broadly labelled as statisti-
cal methods, which are at the foundation of the subject, and machine learning ones,
which have been developing rapidly in recent years; see, e.g. [3–9] for a sample of
applications and surveys on the subject.
The material to be presented in this paper is based on statistical forecasting
methods; see, e.g. [1, 2, 10–12] for related details. Despite the fast development of
machine learning techniques, they have been consistently shown through the last two
M competitions [13, 14] to generally be outperformed by statistical methods in terms
of accuracy and computational requirements; these comparisons (see relevant details
in the papers [13, 14]) are done on more than 100 thousand practical data sets, related
to a wide range of industries, based on the ForeDeCk database (http://fsudataset.com/). Note that the M competition series (with M referring to Spyros Makridakis,
one of the world leaders in the field) is a famous open competition, which can also be
seen as a benchmarking exercise, where competitors evaluate and compare the per-
formance of a wide range of forecasting methods on thousands of practical data sets.
The aim of this paper is to introduce the reader to existing Python tools that can be
used to deliver a practical course on basic statistical forecasting methods; namely, we will
focus on the exponential smoothing, autoregressive integrated moving average (ARIMA),
and regression-based methods, which (alone or in combination) are among the core techniques shown to have the best performance in the M competitions mentioned above.
1.1 Background
The material presented in this paper is based on a course named Forecasting, which
the author has taught for the past 7 years within the School of Mathematical Sciences at the University of Southampton, based in the UK. This is an optional but very popular course, taken by students from the eight MSc programmes
listed in Table 1, spanning both the School of Mathematical Sciences and the South-
ampton Business School.
The course is very practical and hands-on, designed to run for 16 h across 4 weeks,
with 2 h of weekly lectures and the remaining 2 h dedicated to a workshop/tutorial/
computer lab, where the students are supported to go through the Python material
to test and apply the methods on some practical data sets. The lectures focus on tak-
ing the students through the mathematical background of the methods that will be
covered here [15]. During the computer labs, students are taken through the Python
codes covered in this paper, which implement the methods that form the content
Table 1 List of MSc programmes of origin of the students that usually take the forecasting course, which is the source of the material presented in this paper (columns: School of Mathematical Sciences; Southampton Business School)
of the lectures, and support them in using these methods to develop forecasts on
practical data sets. Note that this course can easily be expanded to cover a few more
weeks, as necessary, and the material can also be adapted to an undergraduate level
for programmes around operations research, statistics, business analytics, and man-
agement science.
It is important to mention that before the start of the course, brief material with
a basic introduction to Python is made available to the students, in order to bring
them up to speed with some basic elements of Python, in case they have had no prior
exposure to the language. This brief material essentially covers the relevant Python
ecosystem discussed in Section 2 and an overview of the basic steps needed to get
Python up and running on their personal computers or the university machines.
Additionally, note that each of the weekly computer labs, which take place during
the course, is an opportunity for the instructors to guide the students on how to use
the different libraries needed to implement the mathematical concepts covered in the
lecture of that week.
The author has taught the course over the last 7 years, first using Excel and relevant Visual Basic for Applications (VBA) code to enhance some of the techniques.
The transition to Python was done more recently, considering the demand both from
industry and students, and also to keep up with the pace of developments in data
science more broadly. The motivation to prepare this paper came as a result of the
transition from Excel to Python, as the author was unable to find a single book or
resource relevant to prepare for a complete delivery of this course using Python.
The paper will mostly focus on the use of existing Python tools to generate forecasts,
although a bit of the background on the mathematical concepts will be provided
as necessary. Also, although prior knowledge of Python is not necessary, it will be
assumed that the reader has some level of familiarity with methods involved in the
corresponding mathematical material, as it would be required for anyone teaching
such a course. The lecture notes [15] that form the material of the course discussed
here are based on the books [1, 2].
As for the Python material, we only found the book [16] when preparing, in 2019, the first draft of the computing material to be presented here.
While preparing this paper, we came across the two new books [17, 18] on the
use of Python to generate forecasts on time series data. There are two common
denominators to these three books: the first is that they are mostly geared
towards machine learning–based techniques for time series forecasting, with the
exception of ARIMA models, which are covered in detail. Secondly, they essentially focus on the use of Python tools to generate forecasts, and hence do not
specifically pay attention to the mathematical background of the methods on which the corresponding Python forecasting tools are based.
Clearly, there are two differences between the content of this paper and what
is covered in the books [16–18]. First, considering the page limitations of an
article such as this one, we also mostly focus on the coding side of the
methods; however, our presentation is essentially organized along the lines of
the corresponding lecture notes [15], which provide the necessary mathemati-
cal background to develop a deep understanding of all the methods covered in
this paper. Secondly, unlike these books, we focus our attention on statistical
methods, which are at the heart of the most successful practical implementations in the context of the M competition
series, as discussed at the beginning of this introduction.
It is also important to mention that our philosophy in the preparation and
delivery of the course discussed in this paper is inspired in part by the book [2];
that is, giving the reader a balanced mathematical background of the forecast-
ing methods, while accompanying them with relevant practical software tools to
use these methods on practical data sets. However, the fundamental difference is
that [2] uses R while we use Python.
The lecture notes on which this course is based (i.e. [15]), as well as all the cor-
responding codes presented here, can be accessed online via the following link:
https://github.com/abzemkoho/forecasting.
We start the next section with an overview of the main Python packages needed
to work with the tools that we will go through in this paper. Subsequently, we
present tools that can be used for a basic data analysis (i.e. time, seasonal,
and scatter plots, as well as correlation analysis, just to mention a few) before
the start of any forecasting task based on the methods covered in this paper.
Section 3 is devoted to exponential smoothing methods, which are very effi-
cient on time series that involve trends and/or seasonality. Section 4 covers
ARIMA methods; and finally, Section 5 presents tools for regression analysis
and how they can be used for forecasting. Note that the exponential smoothing and ARIMA methods are blackbox techniques, as they are built under
the assumption that historical patterns in the time series will keep repeating
themselves in the future. However, regression-based approaches assume that
the behaviour of the time series of interest (dependent variable) is influenced
by other variables (independent variables), and this is explored through linear
regression to possibly build more accurate forecasts.
2 Relevant Python Ecosystem
No prior knowledge of Python is required to use the material in this paper. How-
ever, we assume that the reader/instructor who wants to use the tools presented
here has Python up and running on their device (desktop, laptop, etc.). The codes
and corresponding results are based on the use of Python under Anaconda 3,
with Spyder 3.6 as the editor, all running on Windows 10 Enterprise (processor: Intel(R) Core(TM) i5-6300U CPU @ 2.40 GHz). The advantage of using
Anaconda is that it installs Python with many important packages that are use-
ful for time series analysis of the type covered in this paper. This therefore helps
in part to reduce dependency issues between the various packages used, and hence
ensures that key packages are set to work nicely together. Nevertheless, all the
codes presented here should be able to work smoothly on most platforms running
a version 3 of Python (see https://www.python.org/). The main packages needed
are as follows:
– SciPy;
– NumPy;
– Matplotlib;
– Pandas;
– Statsmodels.
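As a quick check before proceeding, the following minimal sketch (not one of the original listings) verifies that these packages are installed and prints their versions:

import scipy
import numpy
import matplotlib
import pandas
import statsmodels

# Print the name and installed version of each required package.
for pkg in (scipy, numpy, matplotlib, pandas, statsmodels):
    print(pkg.__name__, pkg.__version__)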
2.2 Basic Data Analysis Tools
In this subsection, we discuss the following five key topics, which are crucial in
the preliminary analysis of time series data sets:
– Time plots;
– Adjustments;
– Decompositions;
– Correlation analysis;
– Autocorrelation function.
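To give a flavour of the kind of code involved, the following is a hedged sketch of a basic time plot; the file name series.csv and the column name value are hypothetical placeholders for whichever data set is at hand:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical monthly time series; replace the file and column names
# with those of the data set being analysed.
data = pd.read_csv('series.csv', index_col=0, parse_dates=True)
data['value'].plot()          # pandas wraps matplotlib for quick time plots
plt.xlabel('Time')
plt.ylabel('Observed value')
plt.show()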
There are various ways to check for seasonality, including zooming in on specific chunks of the corresponding time plots.
Also, a time plot can sometimes already give an initial indication of the presence of
seasonality in a time series; for example, intuitively, Fig. 1(b) already suggests that
we might have peaks and troughs occurring at regular intervals. But some further steps need to be taken to check this.
In this paper, we are going to mainly use the seasonal plots and the concept of
autocorrelation function (ACF) to decide whether a time series is seasonal or not.
The ACF will be defined at the end of this section. Before that, we start with the
seasonal plots, which correspond to a superposition of time plots over a succes-
sion of limited time periods (e.g. 12 months in the context of monthly observations,
which is what we have for most of the data sets used in our illustrations). Listing A.2
provides code that can be used to build seasonal plots after organizing our data by month over a number of years.
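In the spirit of Listing A.2, the following sketch builds a seasonal plot by grouping a monthly series by year and superposing one line per year; the data loading step is again a hypothetical placeholder:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical monthly series indexed by date.
series = pd.read_csv('series.csv', index_col=0, parse_dates=True).squeeze()

# One line per year, with the months on the horizontal axis.
frame = pd.DataFrame({'year': series.index.year,
                      'month': series.index.month,
                      'value': series.values})
for year, grp in frame.groupby('year'):
    plt.plot(grp['month'], grp['value'], label=str(year))
plt.xlabel('Month')
plt.legend(fontsize='small')
plt.show()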
Clearly, there is an indication from Fig. 2 that the clay bricks and electricity data
may have seasonality, while it is unlikely to be the case for the treasury bills data.
From the time plots in Fig. 1, an initial guess could have already been made about
the electricity data, but maybe not necessarily for the clay bricks data. At the end of
this section, we will see how the ACF plots can help to further confirm seasonality
identified here.
Besides the different patterns that can be assessed using time plots, they can also
enable an assessment of the need for adjustments (e.g. mathematical transforma-
tions or calendar adjustments). Ideally, the role of a mathematical transformation is
to attempt to stabilize variance in a time series, where rapid changes in some parts of
a time plot can affect the ability of a forecasting method to generate accurate results.
For instance, the power (including the square root, as a special case) and log transformations are the most commonly used transformations in the literature; the square root
can help, in the case where the time series grows like a quadratic
function, to promote a “linear” shape, which can improve the predictive capacity
of some forecasting methods. On the other hand, the log (of course, applicable only
for positive time series) has an additional advantage, in terms of its interpretability.
For more details on these transformations and many other adjustments, which can
positively impact the forecasting ability of some methods, see [2, Chapter 3]. List-
ings A.3, A.4, and A.5 provide appropriate codes to generate a log, square root, and
calendar adjustments, respectively. The code in Listing A.5 runs on a special data
set, where a calendar adjustment can be useful, as in the milk production of a cow,
the difference in the observations from one month to the next can essentially be due
to the number of days in each month. Hence, a calendar adjustment can help to remove
such a calendar effect before any further analysis of this time series.
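The following is a minimal sketch of these adjustments, in the spirit of Listings A.3–A.5, assuming a positive monthly series with a proper datetime index:

import numpy as np
import pandas as pd

# Hypothetical positive monthly series.
series = pd.read_csv('series.csv', index_col=0, parse_dates=True).squeeze()

log_series = np.log(series)       # log transformation (positive series only)
sqrt_series = np.sqrt(series)     # square root transformation

# Calendar adjustment: rescale each observation to a common 30-day month,
# removing the effect of unequal month lengths.
adjusted = series / series.index.days_in_month * 30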
For a given time series {Y_t}, it is sometimes important to look for ways to split it
by means of a decomposition function f in such a way that
Y_t = f(T_t, S_t, E_t),  (1)
where for a given t, T_t and S_t denote the trend-cycle and seasonal components,
respectively, and E_t corresponds to the error that results from such a decomposition. Decompositions are useful in developing a better understanding of the constituent patterns in a time series, but not necessarily for generating forecasts. Standard selections for a decomposition function are f(T_t, S_t, E_t) := T_t + S_t + E_t (additive
decomposition) and f(T_t, S_t, E_t) := T_t × S_t × E_t (multiplicative decomposition).
The statsmodels function seasonal_decompose can be used to generate
these decompositions, with the option “model” suitable for indicating the nature of
the decomposition (i.e. additive or multiplicative); see Listing A.6 for an additive
decomposition code (used to generate Fig. 3, for illustrative purpose) and Listing
A.7 for a multiplicative one.
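A minimal sketch of the additive case follows (cf. Listing A.6); period=12 assumes monthly data, and in older statsmodels versions this argument was called freq:

from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

# Additive decomposition of the monthly series loaded above.
result = seasonal_decompose(series, model='additive', period=12)
result.plot()                  # observed, trend, seasonal, and residual panels
plt.show()

# model='multiplicative' gives the multiplicative decomposition instead.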
It is important to note that in terms of the background algorithm on how a decompo-
sition is computed, one usually starts with the trend estimation, and then, depending on
the nature of f in (1), the seasonal component is estimated; interested readers are referred
to the lecture notes associated with this material [15, Section 2] and references therein.
Correlation analysis comes into play when we want to explore relationships
between variables in cross-sectional data. There are at least two possible tools to
assess correlation between variables, namely scatter plots and correlation values.
The two concepts are strongly related in the sense that the scatter plot provides a graphical representation that can demonstrate how strong the relationship between two variables is, while the correlation is a numerical value materializing the strength of
such a relationship. As an example to illustrate these two concepts, consider a data set
made of a variety of used cars and their price (based on their mileage). For instance,
we might want to forecast one variable (price, here) against a possible explanatory variable (mileage). Running the code in Listing A.8 clearly shows that the price of a car decreases
as the mileage increases. Each point on the graph represents one specific vehicle.
Fig. 3 Additive decomposition graphs for the clay bricks sale time series
A scatter plot helps us to visualize the relationship and suggests that if one wants
to forecast the price of used car, a suitable model should include mileage as an
explanatory variable. In Listing A.8, the scatter plot function scatter function
from matplotlib is applied with arguments being the mileage and price as sepa-
rate entries. Note that pandas also has the function scatter_matrix, which
can generate scatter plots for many variables in one go; this could be particularly
important in Section 5 when studying the regression approach to forecasting. Fig-
ure 4, for example, generated by the code Listing A.9, shows scatter plots in a matrix
form for four time series.
The correlation is a statistic corresponding to a number between −1 and 1 to
measure the level of the linear relationship for bivariate data (i.e. when there are two
variables). The corrcoef function from numpy, see Listing A.8, calculates the
correlation between the mileage and prices of the cars, as discussed above. Note that
corrcoef returns a symmetric matrix, hence the use of correlval[1,0] to extract the necessary value. In a situation where one is interested
in evaluating the relationships between various pairs of variables, the correlation
matrix enables the calculation of these values in one go, as discussed above in the
context of scatter plots, as illustrated in the left-hand-side of Fig. 4; the correspond-
ing correlation values are generated with the function corr from pandas; see the
table in the right-hand-side of Fig. 4 for an illustration with four time series.
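The following sketch illustrates both tools on a small, entirely hypothetical used-car data set (the actual data set used in Listing A.8 may differ):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical used-car data: price tends to decrease as mileage increases.
cars = pd.DataFrame({'mileage': [9300, 10565, 15000, 15000, 17764, 57000],
                     'price':   [8000, 7500, 6000, 6000, 5500, 2500]})

plt.scatter(cars['mileage'], cars['price'])   # each point is one vehicle
plt.xlabel('Mileage')
plt.ylabel('Price')
plt.show()

# Correlation between the two variables.
correlval = np.corrcoef(cars['mileage'], cars['price'])
print(correlval[1, 0])         # off-diagonal entry of the symmetric matrix

# For several variables at once: matrix of scatter plots and correlation matrix.
pd.plotting.scatter_matrix(cars)
print(cars.corr())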
For a given time series Yt , the concept of correlation can be extended to the time
lags Yt and Yt−k of this same series. Hence, such a correlation is called autocorrela-
tion. The autocorrelation is used to measure the degree of correlation between differ-
ent time lags in a time series. The autocorrelation function (ACF) is crucial in assess-
ing many properties in statistics, including seasonality, white noise, and stationarity.
In this section, we limit ourselves to the use of the ACF in assessing seasonality. For
its use in assessing white noise and stationarity, see Sections 3 and 4, respectively.
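For a monthly series, seasonality typically shows up in the ACF as spikes around lags 12, 24, 36, and so on; a minimal sketch, with the 60 lags used for the plots in Fig. 5:

from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt

# Spikes at multiples of lag 12 indicate monthly seasonality.
plot_acf(series, lags=60)
plt.show()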
Fig. 4 Left, we have the matrix of scatter plots for four time series labelled as DEOM, AAA, Tto4, and
D3to4. On the right, we have the correlation matrix, which gives the correlation value that reflects the
relationship in each pair of these four data sets. As can be seen in the scatter plots, the strongest correlation is between AAA and Tto4, as confirmed by the correlation value, which is strictly larger than 0.50
Fig. 5 Left, we have the seasonal plots for most of the years involved in the time series. On the right-hand side, we have the ACF plot over 60 time lags
3 Exponential Smoothing Methods
3.1 Accuracy Measures
As accuracy is the first main concern when forecasting, we start here by discussing how
some standard error measures, i.e. the mean error (ME), mean absolute error (MAE),
mean square error (MSE), percentage error (PE), mean percentage error (MPE), and
the mean absolute percentage error (MAPE), can be computed using Python. To pro-
ceed, it is crucial to recall that an error measure on its own does not mean much, but
rather, it can only make sense in a comparison setting of 2 or more methods. Hence,
we introduce two naïve forecasting methods to illustrate how these error measures can
be used in practice. We begin with a naïve forecasting method, labelled NF1, which assumes
that for a time series {Y_t}, the forecast at time point t + 1 is obtained as F_{t+1} = Y_t.
Next, we consider a second naïve forecasting method labelled as NF2:
F_{t+1} = Y_t − S_t + S_{(t−12)+1}, with S_t = (1/(m+1))(m S_{t−12} + Y_t),
where S_t = Y_t for t = 1, …, 12 and m is the number of complete years of data
available; for the initialization of the method, we set F_{t+1} = Y_t for t = 1, …, 12.
Fig. 6 The results from NF1 and NF2 can be seen in the first and second graphs, respectively. As for the corresponding error measures, see the table on the right-hand side
The code in Listing B.1 generates the results in Fig. 6, which show both the
NF1 and NF2 forecast plots, as well as the corresponding error measures stated
above. Note that the ME and MPE are not to be taken very seriously as their
values essentially reflect the fact that positive and negative values just cancel
each other throughout the range. Clearly, NF2 outperforms NF1 on almost all
the measures, especially, on the positive ones (MAE, MSE, and MAPE), which
are more meaningful. This is not surprising, considering the fact that NF2 contains
more structure, capturing the nature of the data set much better than NF1, which is
essentially a one-step translation of the original data set. Similar comparisons can
be done for any two or more forecasting methods.
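As an illustration, the following sketch computes the error measures for NF1 (Listing B.1 covers both methods); the errors are taken here as forecast minus actual, so only the signs of ME and MPE depend on this convention:

import numpy as np

def nf1_errors(y):
    # NF1 forecast: F_{t+1} = Y_t, i.e. the series shifted by one step.
    forecast, actual = y[:-1], y[1:]
    e = forecast - actual            # forecast errors
    pe = 100 * e / actual            # percentage errors
    return {'ME': e.mean(), 'MAE': np.abs(e).mean(), 'MSE': (e ** 2).mean(),
            'MPE': pe.mean(), 'MAPE': np.abs(pe).mean()}

# `series` is the time series loaded earlier.
print(nf1_errors(np.asarray(series, dtype=float)))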
Another tool to assess the accuracy of a forecast method is the ACF of the
errors. Basically, the expectation is that if the results of a forecasting method are
reasonably accurate, the time plot of the errors, seen as a time series, should be
purely random. Therefore, no patterns from the original data should be preserved
in the errors/residuals. Using the corresponding code in Listing B.2 on the data
used for Fig. 2, we get the graphs in Fig. 7, which clearly show that the forecasts
from NF1 preserve seasonality from the original time series, with the large spikes
appearing after every 12th time lag. Such a pattern is not clearly obvious for NF2.
Finally, providing the confidence interval for a forecast can help decision-
makers in building their management perspectives. Let F_{t+1} be the forecast from a
given method; then, the corresponding lower and upper bounds can be obtained as
LF_{t+1} := F_{t+1} − z√MSE and UF_{t+1} := F_{t+1} + z√MSE,
respectively, where MSE represents the mean square error over a suitable range of
the data, while z is a quantile of the normal distribution, which is a conventional
number that determines the level of confidence of the corresponding interval. Stand-
ard values commonly used in practice for z can be seen in Section 2 of [15]. Fig-
ure 8, generated with the code in Listing B.3, provides the confidence intervals for
the data and corresponding NF1 and NF2-based results.
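A sketch of the interval computation, using NF1 forecasts and z = 1.645 for a 90% confidence level:

import numpy as np

y = np.asarray(series, dtype=float)
forecast, actual = y[:-1], y[1:]     # NF1 forecasts and matching observations

z = 1.645                            # normal quantile for a 90% confidence level
mse = ((forecast - actual) ** 2).mean()
lower = forecast - z * np.sqrt(mse)  # LF_{t+1}
upper = forecast + z * np.sqrt(mse)  # UF_{t+1}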
There are four main types of exponential smoothing methods, which can be
applied based on characteristics of our time series and sometimes also consider-
ing our intended purpose. Before diving into these methods, it is important to
mention that all the related Python tools that we are going to describe here are
from the statsmodels library. The first and simplest such method is the so-
called single exponential smoothing (SES) method. SES is usually applied
only on time series that do not exhibit any specific pattern and can only produce
a one-step-ahead forecast.
To set the stage for the general process of all the forecasting methods that we
are going to present in this paper, we are going to provide a brief overview of the
mathematical background of the SES method. To proceed, let us assume that we
are given a time series Y1, ..., Yt , where data is available from time point T = 1 up
to T = t . Then, the forecast for this time series at time point T = t + 1 using the
SES method can be calculated as
F_{t+1} = (1 − 𝛼)^t F_1 + 𝛼 ∑_{j=0}^{t−1} (1 − 𝛼)^j Y_{t−j},  (2)
where the parameter 𝛼 ∈ [0, 1]. There are various ways to initialize the method; one
possibility is to select F1 = Y1. The first key observation that can be made on the
formula (2), and which justifies the name of this class of methods, is that the factor
(1 − 𝛼)^j decays exponentially as the power j increases. Since larger values of j
correspond to older observations Y_{t−j}, this means that
the value of F_{t+1} relies most heavily on the more recent values of the time series Y_1, ..., Y_t.
This is one of the particular characteristics of any exponential smoothing method.
Fig. 8 The confidence intervals here are obtained with the formula F_t ± z√MSE, with z being the parameter ensuring that there is a 90% chance that the forecasts lie between the lower and upper bounds provided
Additionally, being able to optimally select the value of the parameter 𝛼 is critical
for the performance of the method. The strategy commonly used in this case is the
least squares optimization approach to select its best value, which corresponds to minimizing the MSE:
min (1/t) ∑_{j=1}^{t} e_j^2 := (1/t) ∑_{j=1}^{t} (F_j − Y_j)^2  s.t. 𝛼 ∈ [0, 1],  (3)
Fig. 9 On the left, we have the forecast plots for different values of the parameter 𝛼, with the 3rd being the optimal one. The table on the right provides values of the MSE for each value of the parameter
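A minimal SES sketch with statsmodels, showing both a manually fixed 𝛼 and the 𝛼 optimized by minimizing the MSE as in (3):

from statsmodels.tsa.api import SimpleExpSmoothing

# Fixed smoothing parameter alpha = 0.2.
fit1 = SimpleExpSmoothing(series).fit(smoothing_level=0.2, optimized=False)

# Alpha chosen by minimizing the MSE, as in (3).
fit2 = SimpleExpSmoothing(series).fit()

print(fit1.params['smoothing_level'], fit2.params['smoothing_level'])
print(fit2.forecast(1))        # SES produces a one-step-ahead forecast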
Holt's linear method is suitable for time series involving a trend without the presence of seasonality. Hence, this method involves an
estimate of the level and linear trend of the time series at a given time point. As
a consequence, the Holt linear method involves level and slope parameters 𝛼 and
𝛽, respectively. These parameters can be optimized by minimizing the
MSE, similarly to what is done in (3). Similarly to SES, Holt's linear method is
applied by simply calling the function named Holt from statsmodels.tsa.
api. In the case where we want to set the parameters 𝛼 and 𝛽 manually, we can
use the options smoothing_level and smoothing_slope, respectively. To
improve the forecasting performance of the Holt linear method, the Holt function
provides options to select the nature of the trend, via the exponential or
damped arguments, as can be seen in the following excerpt of the Holt forecasting
code in Listing B.5:
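A hedged sketch of what this excerpt may look like (the parameter values are illustrative; newer statsmodels releases rename smoothing_slope to smoothing_trend and damped to damped_trend):

from statsmodels.tsa.api import Holt

# Model 1: default (linear) trend.
fit1 = Holt(series).fit(smoothing_level=0.8, smoothing_slope=0.2)

# Model 2: exponential trend.
fit2 = Holt(series, exponential=True).fit(smoothing_level=0.8, smoothing_slope=0.2)

# Model 3: damped linear trend.
fit3 = Holt(series, damped=True).fit(smoothing_level=0.8, smoothing_slope=0.2)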
Obviously, the default selection of the trend in the first model (see the first line in
this excerpt) is the linear trend. For more details on the different types of trends and
the corresponding mathematical adjustments, see https://www.statsmodels.org/stable/generated/statsmodels.tsa.holtwinters.Holt.html.
Finally, we now present the Holt-Winters forecasting method, which is suitable for
time series involving both trend and seasonality. Hence, in addition to the level and
trend components needed in the Holt linear method (designed only for the case where
a trend is present in our time series), a seasonal component is needed. The seasonal
component also comes with its parameter, generally denoted by 𝛾. As is the case for the previous two methods, all the parameters are required to be real
numbers from the interval [0, 1]. Since the Holt-Winters method is more general than
SES and Holt's linear method, the corresponding function from statsmodels.tsa.api is
labelled as ExponentialSmoothing.
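A hedged sketch of what the corresponding excerpt may look like:

from statsmodels.tsa.api import ExponentialSmoothing

# Additive trend and seasonality, with the three parameters fixed manually.
fit1 = ExponentialSmoothing(series, trend='add', seasonal='add',
                            seasonal_periods=12).fit(smoothing_level=0.3,
                                                     smoothing_slope=0.1,
                                                     smoothing_seasonal=0.1)

# Multiplicative trend and seasonality, with the parameters optimized.
fit2 = ExponentialSmoothing(series, trend='mul', seasonal='mul',
                            seasonal_periods=12).fit()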
As we can see from this excerpt of the corresponding code in Listing B.6, besides
the parameters 𝛼 , 𝛽 , and 𝛾 , represented here by smoothing_level, smooth-
ing_slope, and smoothing_seasonal, which can be fixed or optimized as
in the previous two exponential smoothing methods, we have the nature of the trend
and seasonality, which can be additive or multiplicative. Clearly, the term add (resp.
mul) is used for additive (resp. multiplicative) trend or seasonality. More details on
these concepts can be found in [15, Section 2].
We use the code in Listing B.6 to generate the results in Fig. 10, which clearly
show that the optimized models 3 and 4 are the best, with the 3rd one with addi-
tive trend and seasonality being slightly better. The ACF of the residuals from each
method is also included in the code, to further evaluate the performance of each
method. It is clear that the residuals for models 1 and 2 retain the seasonality present
in the original data set. On the other hand, Fig. 10(b), (f), and (g) just confirm that
residuals seem relatively random.
4 ARIMA Methods
4.1 Preliminary Tools
As we have seen so far, the ACF plot can play an important role in showing that a
time series is seasonal and also in assessing the accuracy of a forecasting method
(mainly via the white noise concept). In this section, we are going to see how the
ACF can also be helpful in assessing a few other properties relevant to the ARIMA
method, namely, in assessing stationarity and the identification of an ARIMA model.
However, to strengthen the capacity of the ACF in this role, we now introduce the
concept of partial autocorrelation function (PACF), which is used to measure the
degree of association between observations at time lags t and t − k (i.e. Yt and Yt−k ,
respectively) when the effects of other time lags, 1, … , k − 1, are removed. Hence,
partial autocorrelations calculate true correlations between Y_t, Y_{t−1}, ..., Y_{t−k} and can
therefore be obtained using a regression formula on these terms, while proceeding
as in the least square approach in (3) or the concept of maximum likelihood estima-
tion, which is more common in this case [2].
To get a good flavour of how the PACF can be applied, let us use it to further
illustrate white noise in combination with ACF. Similarly to the ACF, as shown in
Subsection 2.2, the PACF can be plotted by simply applying the function plot_
pacf from statsmodels.graphics.tsaplots. The code in Listing C.1
generates the ACF and PACF for an example of a white noise model. The important
thing to note when this code is run is how the ACF and PACF of a typical white
noise model look; recall that for a model to be statistically white noise, about
95% of the values of the ACF and PACF should be within the range ±1.96/√n, where n is
the total number of observations. This range is represented by the shaded band that
appears in the graphs of both the ACF and PACF.
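In the spirit of Listing C.1, white noise can be simulated and inspected as follows:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(0)
noise = rng.normal(size=200)     # artificial white noise, n = 200

# About 95% of the spikes should fall inside the band ±1.96/sqrt(n),
# shown as the shaded region in both plots.
plot_acf(noise, lags=40)
plot_pacf(noise, lags=40)
plt.show()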
We now turn our attention to the concept of stationarity, which is at the heart of
the development of ARIMA methods. Recall that a time series is stationary if the
distribution of the fluctuations is not time dependent. This is easy to say, but it can
be tricky to actually show that a time series is stationary. We try now to provide a
few tools that can be helpful in identifying stationarity in a time series. To proceed,
we start by stating the following scenarios or specific tools that we are going to rely
on to identify whether a time series is stationary or not:
– checking whether the series is white noise (which is stationary);
– checking the time plot for trend, seasonality, or cyclical patterns;
– inspecting how quickly the ACF drops to zero;
– checking for a large spike at lag 1 in the PACF;
– applying a unit root test.
We have just seen how to determine whether a time series is white noise, using the
ACF and PACF, which can be plotted with Python using plot_acf and plot_
pacf, respectively. As for the second item, we already know, see Subsection 2.2,
how to identify trend and seasonality, as well as cyclical patterns, using time plots.
There is an interesting way to show that a time series is non-stationary by means of
its ACF and PACF plots. Basically, the autocorrelations of a stationary time series
drop to zero quite quickly, while those of a non-stationary one can take a significant
Fig. 11 Example of a non-stationary time series (Dow Jones data from January 1956 to April 1980)
number of time lags to become zero. On the other hand, the PACF of a non-stationary
time series will typically have a large spike, possibly close to 1, at lag 1. This can
clearly be observed in Fig. 11 generated with the code in Listing C.2.
Ultimately, if the first four points above cannot help to make a definite decision
on the stationarity or non-stationarity of a time series, then we can proceed with
a unit root test. It is important to say beforehand that this is not a magic solution
to demonstrate stationarity, as there are various types of unit root tests, which can
sometimes provide contradictory results. The version of the unit root test that we
consider here is the augmented Dickey-Fuller (ADF) test [19], which assesses the
null hypothesis that a unit root is present in a time series sample.
A simple understanding of the ADF test that is relevant to us is that it generates
a number of statistics that we are going to present next. To generate these statistics,
the function adfuller from statsmodels.tsa.stattools can be applied
to our data set. This function simply takes in the values of the time series, as can be
seen in the code in Listing C.3, which is used to generate
the results in Fig. 12 for three different scenarios. Considering some building mate-
rial production data from Australia, the first row of Fig. 12 presents the time, ACF,
and PACF plots, respectively, as well as the statistics generated by the ADF test.
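A minimal sketch of the ADF test on a series and its first difference (cf. Listing C.3):

from statsmodels.tsa.stattools import adfuller

for name, y in [('original', series),
                ('first difference', series.diff().dropna())]:
    adf, pvalue, usedlag, nobs, critical, icbest = adfuller(y)
    print(name, '| ADF statistic:', adf, '| P-value:', pvalue)
    print('critical values:', critical)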
The ADF test (see the last column of Fig. 12) generates three key categories of statistics. First, we have the ADF statistic itself, which needs to be negative and, to confirm strong evidence of stationarity, should be less than the 1% critical value, with, additionally, a P-value below the threshold of
0.05. We can clearly see from Fig. 12 how the ADF test helps to confirm that we go
from a series whose original and first differenced versions are non-stationary to a
stationary time series once first and seasonal differencing are applied.
The process of building an ARIMA model can be compared with that of fitting a polynomial
f(x) := a_0 + a_1x + a_2x^2 + … + a_px^p,
where p is the order of the polynomial and a_0, a_1, ..., a_p are its coefficients. To get a
complete description of this polynomial, we need to start by identifying the order p,
which determines the number of coefficients a_0, a_1, ..., a_p, which can then subsequently be calculated. This is approximately what is done to build an ARIMA model.
To make things a bit more precise, let us consider a non-seasonal ARIMA(p, d, q) model, which can be written in backshift form as
(1 − 𝜙_1B − … − 𝜙_pB^p)(1 − B)^d Y_t = c + (1 + 𝜃_1B + … + 𝜃_qB^q)e_t,  (4)
where B^k Y_t := Y_{t−k} corresponds to the backshift notation. Here, the vector (p, d, q)
represents the order of the model, and 𝜙_i, i = 1, …, p and 𝜃_j, j = 1, …, q are the parameters/coefficients of the model. Algorithm 1 summarizes the building process of an
ARIMA model, including the forecasting step.
Fig. 13 The first row presents the time, ACF, and PACF plots of an artificially generated autoregressive
model of order 1. The second row presents analogous graphs for an artificially generated moving average
of order 1
Note that the model identification in Step 1 of Algorithm 1 can only be made on the ACF and PACF of “sufficiently differenced” (in the sense of leading to
stationarity) data. The graphs in Fig. 13 show an AR(1) and an MA(1) model in the first and
second rows, as generated by Listings C.4 and C.5, respectively.
Considering the fact that the approach in Step 1 can only enable the estimation
of pure AR and MA models, we need a way to check whether our series exhibits
a more general ARIMA(p, d, q) model with p > 0 and q > 0 simultaneously. The
AIC, which is a function of p and q, can help us to check whether there is a model
better than the one obtained from Step 1. The smaller the AIC, the better the model
is. To proceed, we can use the code in Listing C.6, which runs through combinations of values of p, d, and q from the set {0, 1, 2} to identify the order (p, d, q) with the
best AIC. For the selection of d, it is straightforward to use the process described
above, repeating the differencing as necessary to get the best statistics from the ADF
test based on the code in Listing C.3.
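A hedged sketch of such a grid search (cf. Listing C.6), using the ARIMA implementation from statsmodels.tsa.arima.model; orders that fail to estimate are simply skipped:

import itertools
from statsmodels.tsa.arima.model import ARIMA

best_aic, best_order = float('inf'), None
for p, d, q in itertools.product(range(3), repeat=3):
    try:
        res = ARIMA(series, order=(p, d, q)).fit()
    except Exception:
        continue                   # skip orders that fail to estimate
    print((p, d, q), 'AIC:', res.aic)
    if res.aic < best_aic:
        best_aic, best_order = res.aic, (p, d, q)
print('best order:', best_order)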
In terms of the content of the code in Listing C.6, its main feature is the ARIMA
function from statsmodels. This function is also going to be used for Step 4
of Algorithm 1, but one of its most interesting features is that it also generates
other important information such as the AIC of the corresponding model. How-
ever, in the context of Listing C.6, its main role is to print and compare the AIC
to identify the best model. When the most suitable values of the order (p, d, q)
have been identified, the ARIMA function can then be applied, using this order,
to generate the forecasts, as it is done for the example in Listing C.7. Running the
code generates forecast plots and some important statistics, including the AIC of
the model and the corresponding coefficients/parameters 𝜙i , i = 1, … , p and 𝜃j ,
j = 1, … , q as described in the equation in (4).
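Once the best order has been identified, a sketch of the fitting and forecasting step (cf. Listing C.7), reusing best_order from the previous sketch:

from statsmodels.tsa.arima.model import ARIMA

res = ARIMA(series, order=best_order).fit()
print(res.summary())               # AIC and the phi/theta coefficients of (4)
print(res.forecast(steps=12))      # forecasts over a 12-step horizon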
So far, we have considered only time series that are not necessarily seasonal.
In the seasonal case, the process is the same, except that the seasonal order
(P, D, Q) and periodicity s have to be provided, as indicated in the general model
Fig. 14 These graphs generated from Listing C.8 present the changes in the electricity demand time series data in Fig. 1(b), going from the original data and its ACF and PACF plots (first row), through the first difference (second row), to the graphs resulting from first and seasonal differencing (third row)
Once a seasonal order, together with the corresponding number of time periods per season (s), has been identified,
the seasonal ARIMA function (SARIMAX), also from statsmodels (see Listing C.10), can be used to generate the forecasts. Running SARIMAX with the code
available in Listing C.10 applied on building material time series from 1986 to 2008
in Australia, we get the graphs in Fig. 15 together with a number of statistics assess-
ing the quality of the model and the results.
Fig. 15 Summary of graphical results obtained by running the SARIMAX(1, 1, 1)(0, 1, 1)12 model using
the code from Listing C.10 on the building material time series from 1986 to 2008 in Australia. The first four
graphs assess the accuracy of the method, with (1) the residual plot, (2) the distribution of the error (close
to a normal distribution), (3) the normal Q–Q plot, which compares randomly generated and independent
standard normal data on the vertical axis to a standard normal population on the horizontal axis (the closer the data points are to a line, the stronger the suggestion that the data are normally distributed), and (4) the correlogram for
checking randomness in the residuals. The last row shows the one-step forecasts on a section of the data for
some visual assessment of accuracy, as well as the out-of-sample future forecasts over a 20-step horizon
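A sketch of the model behind Fig. 15 (cf. Listing C.10), with series again a placeholder for the building material data:

from statsmodels.tsa.statespace.sarimax import SARIMAX
import matplotlib.pyplot as plt

model = SARIMAX(series, order=(1, 1, 1), seasonal_order=(0, 1, 1, 12))
res = model.fit(disp=False)

res.plot_diagnostics()             # residuals, histogram, Q-Q plot, correlogram
plt.show()

forecast = res.get_forecast(steps=20)
print(forecast.predicted_mean)     # out-of-sample forecasts over 20 steps
print(forecast.conf_int())         # corresponding confidence intervals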
5 Regression Methods
The particularity of the method that we are going to discuss here is that it is explan-
atory, in comparison to the previous ones, which are blackbox methods. A regres-
sion model exploits potential relationships between the main (dependent) variable
and other (independent) variables. We focus our attention here on the simplest and
most commonly used relationship, which is the linear regression:
Y = b_0 + b_1X_1 + … + b_kX_k + e,  (8)
where Y is the dependent variable, X_1, ..., X_k the independent variables, and b_0, b_1,
..., b_k the coefficients/parameters, where b_0 specifically is often called the intercept. It
is important to start by recalling that a regression model such as (8) is not a forecasting
method by itself; there is a large number of applications of regression models in statistics and econometrics; see, e.g. [20] for a detailed analysis of regression models
and some flavour of a sample of applications.
To apply the regression model (8) to develop a forecast for a time series {Yt }, we
assume that it is influenced by other time series {X_it} for i = 1, …, n. To have some
flavour of this, we consider the mutual savings bank case study from [1], where a regression model can be built to forecast EOM while considering AAA and Tto4 as independent variables. For some technical reasons (see [1]), our Y is the first-order difference of EOM (denoted by DEOM), and X_1, X_2, and X_3 are the AAA, Tto4, and D3to4
(first-order difference of Tto4) series, respectively. Note that historical time series data sets
are available for the variables DEOM, AAA, Tto4, and D3to4, and there is some
level of relationship between these variables, as can be seen from the scatter plots
and correlation matrix in Fig. 4. However, this is not enough to guarantee that the
regression model resulting from this relation would be significant. The analysis of a
regression model starts with the evaluation of its overall significance.
For the overall significance of a model, key statistics are the R2 (known as
the coefficient of determination) and the P-value, which gives the probability of
obtaining an F statistic as large as the one calculated for the data set being studied,
if in fact the true slope is zero. As the R2 is a number between 0 and 1, model (8)
would be considered to be significant if it is at least greater than 0.50. Hence, the
overall significance of the model increases as R2 grows closer to the upper bound
1. Furthermore, from the perspective of the P-value, a regression model will be
said to be significant if the P-value is smaller than the conventionally set value of
0.05; and the significance improves as the P-value decreases below this threshold.
Before we expand this discussion further, let us show how the aforementioned
statistics can be obtained with Python. Our analysis of a regression model here
is based on the ols function from statsmodels, which means ordinary least
squares, given that the parameters in (8) are computed by the same least square
approach introduced for the SES model in (3). As you can see in the demonstra-
tion code in Listing D.1, it is incredibly easy to use ols. For example, to build
the basic model for our above bank case study, what is needed is to start by writ-
ing the regression equation
formula = 'DEOM ~ AAA + Tto4 + D3to4',
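and then to pass it to ols together with a pandas data frame containing the four series. A hedged sketch of this demonstration (cf. Listing D.1), with the data loading step a hypothetical placeholder:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data frame with one column per variable:
# DEOM, AAA, Tto4, and D3to4.
bank = pd.read_csv('bank.csv')

model = smf.ols(formula='DEOM ~ AAA + Tto4 + D3to4', data=bank)
results = model.fit()
print(results.summary())     # R-squared, F statistic and its P-value,
                             # coefficients and individual P-values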
Fig. 16 Key statistics to assess the overall and individual significance of a regression model
Clearly, the significance of AAA, Tto4, and D3to4 is relatively good, as it is less
than the threshold value of 0.05, although that of the latter variable is weaker.
Interestingly, the green box in the table in Fig. 16 also provides the coefficients of
this example (cf. second column). After we have seen how the function ols can help
to generate the key statistics to assess the overall and individual significance of the
model, it remains to see how the forecast can actually be derived. To be able to do this,
we need the forecasts
G_i = (G_{i1}, …, G_{ik}) of X_i = (X_{i1}, …, X_{ik}) for i = t + 1, …, t + m.
We can then use each of these forecasts of the independent variables in the
expected value that determines the regression-based forecast for the dependent variable Y using Eq. (8):
F_i = Ŷ_i = G_i b̂ for i = t + 1, …, t + m,  (9)
where the forecasts Gi of each independent variable can be obtained by any method
that is most suitable. Applying (9) to our example above (see Listing D.2), we obtain
the results in Fig. 17.
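A sketch of this forecasting step (cf. Listing D.2), reusing the fitted results from the previous sketch; the forecasted values of the independent variables below are hypothetical placeholders:

import pandas as pd

# Hypothetical forecasts G_i of the independent variables (e.g. obtained
# with Holt's linear method or an ARIMA model, as discussed in Fig. 17).
future = pd.DataFrame({'AAA':   [7.1, 7.2, 7.3],
                       'Tto4':  [5.0, 5.1, 5.2],
                       'D3to4': [0.1, 0.1, 0.0]})

# F_i = G_i b-hat, evaluated with the coefficients of the fitted model.
print(results.predict(future))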
To conclude this section, some quick comments are in order. First, one of the typical preliminary steps when building a regression model is to conduct a correlation
Fig. 17 Generating forecasts for the time series involved in this model, i.e. AAA, Tto4, and D3to4 for the
independent variables and DEOM for the dependent variable, is quite challenging as none of the data sets
exhibits a clear pattern. Hence, of the exponential smoothing methods covered in Section 3, only Holt's
linear method is suitable, as it enables the calculation of out-of-sample forecasts over a number of
time points ahead. An ARIMA method could also be used to generate forecasts for AAA, Tto4, and D3to4
analysis (e.g. scatter plots, correlation matrix), which can be done using tools that
we have discussed in Subsection 2.2. This can be done here with matrix scatter plots
and correlation tables; see Fig. 4. Also, to improve an initial model as in (8) or the
resulting forecasting accuracy in (9), a careful selection process of the variables or features
of the data sets can often be carried out. Finally, the term prediction is often confused with
forecast. Prediction is much broader, as it includes tasks such as predicting the
result of a soccer game or an election, where only characteristics of the players of each
team (soccer) or surveys of voters (election), not necessarily historical data, can be
used. Further details on these topics can be found in [1, 2, 15] and references therein.
6 Conclusion
This paper puts together a set of mostly off-the-shelf Python-based tools to develop
forecasts for time series data using basic statistical forecasting methods, namely,
exponential smoothing, ARIMA, and regression methods. It is important to mention
that for each forecasting method and analysis tool described in this paper, there could
be multiple Python approaches available to undertake them, across different Python-based platforms. Secondly, within many packages, there could also be various ways
to do the same thing. So, when using the material presented here, it will be useful to
have a look at the most recent updates on the corresponding packages' websites (see
the corresponding links provided in Section 2) for other possible ways to conduct specific analyses or for the most recent updates on possible improvements to these tools.
Appendix
The code listings referenced throughout the text are available online at https://github.com/abzemkoho/forecasting.
Acknowledgements The lecture notes [15] (based on the textbooks [1, 2]), which have served as the basis for
the mathematical background of the data analysis and forecasting tools discussed in this paper, have been
developed and refined over the years thanks to contributions from many colleagues from the Southamp-
ton OR Group, in particular, I would like to mention Russell Cheng and Honora Smith for preparing and
delivering the Forecasting course for many years, until the 2013–2014 academic year. The author would
like to thank the referee and the guest editor for their constructive feedback, which led to improvements in
the presentation of the paper.
Funding This work is supported by the EPSRC grant with reference EP/V049038/1 and the Alan Turing
Institute under the EPSRC grant EP/N510129/1.
Data Availability All the data sets used for the illustrations in this paper are based on the book [1]; all
the data sets related to this book are available online: https://cloud.r-project.org/web/packages/fma/index.
html. As for the specific time series from this database used in this paper, they are available via the following link, together with all the py files associated with the codes in the appendix: https://github.com/abzemkoho/forecasting.
Declarations
Conflict of Interest The author declares no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License,
which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source, provide a link to the Creative
Commons licence, and indicate if changes were made. The images or other third party material in this
article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line
to the material. If material is not included in the article’s Creative Commons licence and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permis-
sion directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/
licenses/by/4.0/.
References
1. Makridakis S, Wheelwright SC, Hyndman RJ (2008) Forecasting methods and applications. J Wiley
& Sons
2. Hyndman RJ, Athanasopoulos G (2018) Forecasting: principles and practice. OTexts
3. Deng L (2014) A tutorial survey of architectures, algorithms, and applications for deep learning.
APSIPA Transactions on Signal and Information Processing 3
4. Hamzaçebi C, Akay D, Kutay F (2009) Comparison of direct and iterative artificial neural network
forecast approaches in multi-periodic time series forecasting. Expert Systems with Applications
36(Part 2):3839–3844
5. Robinson C, Dilkina B, Hubbs J, Zhang W, Guhathakurta S, Brown MA et al (2017) Machine learn-
ing approaches for estimating commercial building energy consumption. Appl Energy 208(Supple-
ment C):889–904
6. Salaken SM, Khosravi A, Nguyen T, Nahavandi S (2017) Extreme learning machine based transfer
learning algorithms: a survey. Neurocomputing 267:516–524
7. Voyant C, Notton G, Kalogirou S, Nivet ML, Paoli C, Motte F et al (2017) Machine learning meth-
ods for solar radiation forecasting: a review. Renew Energy 105(Supplement C):569–582
8. Zhang G, Eddy Patuwo B, Hu YM (1998) Forecasting with artificial neural networks: the state of
the art. Int J Forecast 14(1):35–62
9. Zhang L, Suganthan PN (2016) A survey of randomized algorithms for training neural networks. Inf
Sci 364–365(Supplement C):146–155
10. Adya M, Collopy F (1998) How effective are neural networks at forecasting and prediction? A
review and evaluation. J Forecast 17(56):481–495
11. Chatfield C (1993) Neural networks: forecasting breakthrough or passing fad? Int J Forecast
9(1):1–3
12. Sharda R, Patil RB (1992) Connectionist approach to time series prediction: an empirical test. J
Intell Manuf 3(1):317–323
13. Makridakis S, Spiliotis E, Assimakopoulos V (2018) Statistical and machine learning forecasting
methods: concerns and ways forward. PLoS ONE 13(3):e0194889
14. Makridakis S, Spiliotis E, Assimakopoulos V (2018) The M4 Competition: results, findings, con-
clusion and way forward. Int J Forecast 34(4):802–808
15. Zemkoho A (2021) Forecasting. School of Mathematical Sciences, University of Southampton, Lec-
ture Notes
16. Brownlee J (2018) Introduction to time series forecasting with Python. Ebook available at https://machinelearningmastery.com/introduction-to-time-series-forecasting-with-Python/ (Accessed 15 Nov 2019)
17. Korstanje J (2021) Advanced forecasting with Python. Apress
18. Lazzeri F (2021) Machine learning for time series forecasting with Python. J Wiley & Sons
19. Dickey DA, Fuller WA (1979) Distribution of the estimators for autoregressive time series with a
unit root. J Am Stat Assoc 74:427–431
20. Montgomery DC, Peck EA, Vining GG (2021) Introduction to linear regression analysis. J Wiley &
Sons
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.