08 ASAP Time Series Forecasting - Day 8-11
Business Analytics
MODULE II-Unit:3
Time Series Forecasting-Forecasting Data
Prof. Dr. George Mathew
B.Sc., B.Tech, PGDCA, PGDM, MBA, PhD
Time Series Forecasting
2.3.1 What is Time-Series analysis
2.3.1.1 Decomposition
2.3.2 Correlation and Causation
2.3.3 Autocorrelation
2.3.4 Forecasting Data
2.3.5 Autoregressive Modeling, Moving averages and ARIMA
2.3.6 Neural networks for Time series data
Introduction to Autoregressive and Automated
Methods for Time Series Forecasting
Building forecasts is an integral part of any business, whether it is
about revenue, inventory, online sales, or customer demand
forecasting. Time series forecasting remains fundamental because
many real-world problems, and the data that describe them, have a
time dimension.
Applying machine learning models to accelerate forecasts enables
scalability, performance, and accuracy of intelligent solutions that can
improve business operations. However, building machine learning
models is often time consuming and complex with many factors to
consider, such as iterating through algorithms, tuning machine
learning hyperparameters, and applying feature engineering
techniques. These options multiply with time series data as data
scientists need to consider additional factors, such as trends,
seasonality, holidays, and external economic variables.
Classical methods for time series forecasting
We will look at the following classical methods:
Autoregression – This time series technique assumes that future observations at the next time
stamp are related to the observations at prior time stamps through a linear relationship.
Moving Average – This time series technique leverages previous forecast errors in a
regression approach to forecast future observations at the next time stamps. Together with the
autoregression technique, the moving average approach is a crucial element of the more
general autoregressive moving average and autoregressive integrated moving average
models, which present a more sophisticated stochastic configuration.
Autoregressive Moving Average – This time series technique assumes that future
observations at the next time stamp can be represented as a linear function of the observations
and residual errors at prior time stamps.
Autoregressive Integrated Moving Average – This time series technique assumes that
future observations at the next time stamp can be represented as a linear function of the
differenced observations and residual errors at prior time stamps. We will also look at an
extension of the autoregressive integrated moving average approach, called seasonal
autoregressive integrated moving average with exogenous regressors.
Automated Machine Learning (Automated ML) – This time series technique iterates
through a portfolio of different machine learning algorithms for time series forecasting,
while performing best model selection, hyperparameter tuning, and feature engineering for
your scenario.
Autoregressive Modeling,
Moving averages and
ARIMA
(Day-8-10)
Autoregression
Autoregression is a time series forecasting approach that depends only on
the previous outputs of a time series: this technique assumes that future
observations at the next time stamp are related to the observations at prior
time stamps through a linear relationship. In other words, in an
autoregression, a value from a time series is regressed on previous values
from that same time series.
In an autoregression, the output value in the previous time stamp becomes
the input value to predict the next time stamp value, and the errors follow
the usual assumptions about errors in a simple linear regression model. In
autoregressions, the number of preceding input values in the time series
that are used to predict the next time stamp value is called the order (often we
refer to the order with the letter p). This order value determines how many
previous data points will be used.
Usually, data scientists estimate the p value by testing different values and
observing which model results in the minimum Akaike information
criterion (AIC).
Data scientists refer to autoregressions in which the current predicted value
(output) is based on the immediately preceding value (input) as first order
autoregression, as illustrated in Figure 1.
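As a minimal sketch (not part of the original slides), a first-order autoregression can be fitted in Python with statsmodels; ts_data is an illustrative pandas Series name:

# A first-order autoregression AR(1): the value at time t is regressed on the value at t-1.
from statsmodels.tsa.ar_model import AutoReg

ar1_model = AutoReg(ts_data, lags=1).fit()
print(ar1_model.params)                                          # intercept and coefficient on the first lag
print(ar1_model.predict(start=len(ts_data), end=len(ts_data)))   # one-step-ahead forecast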
Notice that we have good positive correlation with the lags up to lag number 6; this is the
point where the ACF plot cuts the upper confidence threshold. Although we have good
correlation up to the sixth lag, we cannot use all of them as it will create a
multicollinearity problem; that's why we turn to the PACF plot to get only the most
relevant lags.
ACF and PACF plots
In the ACF plot (Figure 4.9), we can see that lags up to 6 have good correlation before the plot
first cuts the upper confidence interval, so the autoregression process could in principle be
modeled as a linear combination of the first 6 lags; however, as noted above, using all of them
risks multicollinearity. In the PACF plot, only lag 1 remains significant before the plot first cuts
the upper confidence interval. This is our p value, the order of our autoregression process, so
we can then model this autoregression process using the first lag.
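A minimal sketch of how such ACF and PACF plots can be produced with the statsmodels plotting functions, assuming ts_data is a pandas Series (an illustrative name):

# Plot the autocorrelation and partial autocorrelation functions with confidence bands.
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(ts_data, lags=24, ax=axes[0])    # autocorrelations up to lag 24
plot_pacf(ts_data, lags=24, ax=axes[1])   # partial autocorrelations up to lag 24
plt.tight_layout()
plt.show()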
The more lag variables you include in your model, the better the model will fit
the data; however, this can also represent a risk of overfitting your data. The
information criteria adjust the goodness-of-fit of a model by imposing a penalty
based on the number of parameters used. There are two popular adjusted
goodness-of-fit measures:
• AIC
• BIC
In order to get the information from these two measures, you can use the
summary() function, the params attribute, or the aic and bic attributes in
Python. In practice, these information criteria are used by fitting several models,
each with a different number of parameters, and choosing the one with the
lowest AIC or BIC. For example, if the data come from an AR(5) process, the
model fitted with 5 lags should produce the lowest information criterion.
Beginning in version 0.11, statsmodels has introduced the AutoReg class for fitting autoregressive models; its fitted results expose the aic and bic attributes directly.
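A minimal sketch of this selection procedure, assuming statsmodels 0.11 or later and a pandas Series named ts_data (an illustrative name):

# Fit AutoReg models with different orders and compare their information criteria.
from statsmodels.tsa.ar_model import AutoReg

results = {}
for p in range(1, 8):                                  # candidate orders
    fitted = AutoReg(ts_data, lags=p).fit()
    results[p] = (fitted.aic, fitted.bic)              # adjusted goodness-of-fit measures

best_p = min(results, key=lambda p: results[p][0])     # order with the lowest AIC
print(results)
print("order selected by AIC:", best_p)
# AutoReg(ts_data, lags=best_p).fit().summary() prints a full report, including AIC and BIC.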
Autocorrelation Plot
Autocorrelation plots are a commonly-used tool for
checking randomness in a data set. This
randomness is ascertained by computing
autocorrelations for data values at varying time lags.
If random, such autocorrelations should be near zero
for any and all time-lag separations. If non-random,
then one or more of the autocorrelations will be
significantly non-zero. In addition, autocorrelation
plots are used in the model identification stage
for autoregressive and moving average time series
models.
Self-Correlation/ autocorrelation
At its most fundamental, self-correlation of a time
series is the idea that a value in a time series at one
given point in time may have a correlation to the
value at another point in time. Note that “self-
correlation” is being used here informally to describe
a general idea rather than a technical one.
In particular, autocorrelation asks the more
general question of whether there is a
correlation between any two points in a specific
time series with a specific fixed distance
between them. We’ll look at this in more detail
next, as well as a final elaboration of partial autocorrelation.
The autocorrelation function
Autocorrelation, also known as serial correlation, is
the correlation of a signal with a delayed copy of itself
as a function of the delay. Informally, it is the
similarity between observations as a function of the
time lag between them.
Autocorrelation gives you an idea of how
data points at different points in time are
linearly related to one another as a function
of their time difference.
The autocorrelation function
There are a few important facts about the ACF, mathematically speaking:
• The ACF of a periodic function has the same periodicity as the original
process.
• The autocorrelation of the sum of periodic functions is the sum of the
autocorrelations of each function separately.
• All time series have an autocorrelation of 1 at lag 0.
• The autocorrelation of a sample of white noise will have a value of
approximately 0 at all lags other than 0.
• The ACF is symmetric with respect to negative and positive lags, so only
positive lags need to be considered explicitly. You can try plotting a
manually calculated ACF to prove this.
• A statistical rule for determining a significant nonzero ACF estimate is
given by a “critical region” with bounds at ±1.96/√n (see the sketch after this
list). This rule relies on a sufficiently large sample size and a finite variance
for the process.
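A minimal sketch of such a manually calculated ACF, assuming only NumPy; the array x and the function acf_manual are illustrative names:

# Manually compute the sample autocorrelation at a given lag.
import numpy as np

def acf_manual(x, lag):
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    lag = abs(lag)                                  # symmetry: rho(-k) equals rho(k)
    return np.sum(x[lag:] * x[:n - lag]) / np.sum(x * x)

x = np.random.default_rng(0).normal(size=200)       # a sample of white noise
bound = 1.96 / np.sqrt(len(x))                      # critical region for a nonzero ACF estimate
print(acf_manual(x, 0))                             # always 1 at lag 0
print(acf_manual(x, 5), acf_manual(x, -5))          # equal, by symmetry
print(abs(acf_manual(x, 5)) > bound)                # usually False for white noise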
Autocorrelation Plot
Note that uncorrelated does not necessarily mean random.
Data that has significant autocorrelation is not random.
However, data that does not show significant autocorrelation
can still exhibit non-randomness in other ways.
Autocorrelation is just one measure of randomness. In the
context of model validation, checking for autocorrelation is
typically a sufficient test of randomness, since the residuals
from poorly fitting models tend to display non-subtle
non-randomness. However, some applications require a more
rigorous determination of randomness. In these cases, a
battery of tests, which might include checking for
autocorrelation, is applied, since data can be non-random in
many different and often subtle ways. An example of where a
more rigorous check for randomness is needed would be in
testing random number generators.
The partial autocorrelation function
The partial autocorrelation function (PACF) can be trickier to
understand than the ACF. The partial autocorrelation of a time
series for a given lag is the partial correlation of the time
series with itself at that lag given all the information between
the two points in time.
As we can see, our ACF plot is consistent with the
aforementioned property; the ACF of the sum of two periodic
series is the sum of the individual ACFs. You can see this
most clearly by noticing the positive → negative → positive →
negative sections of the ACF corresponding to the ACF of the more slowly
oscillating series. Within these waves, you can see
the faster fluctuation of the higher-frequency ACF.
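A minimal sketch that reproduces this property with synthetic data (all names are illustrative): two sine waves of different frequencies are generated, and the ACF of each, and of their sum, is plotted:

# The ACF of the sum of two periodic series combines the two individual ACFs.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

t = np.arange(300)
slow = np.sin(2 * np.pi * t / 100)   # low-frequency component
fast = np.sin(2 * np.pi * t / 10)    # high-frequency component

fig, axes = plt.subplots(3, 1, figsize=(8, 8))
plot_acf(slow, lags=120, ax=axes[0], title="ACF of the slow sine")
plot_acf(fast, lags=120, ax=axes[1], title="ACF of the fast sine")
plot_acf(slow + fast, lags=120, ax=axes[2], title="ACF of the sum")
plt.tight_layout()
plt.show()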
Autocorrelation plot results from ts_data_load_subset with the plot_pacf() function
from the statsmodels library. We can see that lags up to 6 have good correlation
before the plot first cuts the upper confidence interval. This is our p value, which is
the order of our autoregression process.
White noise
For white noise series, we expect each autocorrelation to be
close to zero. Of course, they will not be exactly equal to zero
as there is some random variation.
For a white noise series, we expect 95% of the spikes in the
ACF to lie within ±2/√T, where T is the length of the time series.
It is common to plot these bounds on a graph of the ACF (the
blue dashed lines above). If one or more large spikes are
outside these bounds, or if substantially more than 5% of
spikes are outside these bounds, then the series is
probably not white noise.
In this example, T = 50 and so the bounds are at ±2/√50 ≈ ±0.28.
All of the autocorrelation coefficients lie within these limits,
confirming that the data are white noise.
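A minimal sketch of this white-noise check, assuming NumPy and statsmodels; the series here is simulated rather than taken from the example above:

# Simulate T = 50 white-noise points and compare the sample ACF against the +/- 2/sqrt(T) bounds.
import numpy as np
from statsmodels.tsa.stattools import acf

T = 50
noise = np.random.default_rng(42).normal(size=T)
rho = acf(noise, nlags=20)                  # sample autocorrelations for lags 0..20
bound = 2 / np.sqrt(T)                      # approximately 0.28 for T = 50
outside = np.abs(rho[1:]) > bound           # ignore lag 0, which is always 1
print(f"bound = {bound:.2f}, spikes outside the bounds: {outside.sum()} of {len(outside)}")
# For white noise, we expect roughly 95% of the spikes to fall inside these bounds.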
Selecting the best model
Healthcare time series
Time series: the backbone of bespoke medicine
Time series datasets such as electronic health records (EHR) and
registries represent valuable (but imperfect) sources of
information spanning a patient’s entire lifetime of care. Whether
intentionally or not, they capture genetic and lifestyle risks,
signal the onset of diseases, show the advent of new morbidities
and comorbidities, indicate the time and stage of diagnosis, and
document the development of treatment plans, as well as their
efficacy.
Using patient data generated at each of these points in the
pathway of care, we can develop machine learning models that
give us a much deeper and more interconnected understanding
of individual trajectories of health and disease—including the
number of states needed to provide an accurate representation
of a disease, how to infer a patient’s current state, what triggers
transitions from one state to another, and much more.
ARIMA: nonseasonal models
ARIMA(p,d,q) forecasting equation: ARIMA models are, in theory, the
most general class of models for forecasting a time series which can be
made to be “stationary” by differencing (if necessary), perhaps in
conjunction with nonlinear transformations such as logging or deflating (if
necessary). A random variable that is a time series is stationary if its
statistical properties are all constant over time. A stationary series has
no trend, its variations around its mean have a constant amplitude, and it
wiggles in a consistent fashion, i.e., its short-term random time patterns
always look the same in a statistical sense. The latter condition means
that its autocorrelations (correlations with its own prior deviations from the
mean) remain constant over time, or equivalently, that its power
spectrum remains constant over time. A random variable of this form can
be viewed (as usual) as a combination of signal and noise, and the signal
(if one is apparent) could be a pattern of fast or slow mean reversion, or
sinusoidal oscillation, or rapid alternation in sign, and it could also have a
seasonal component. An ARIMA model can be viewed as a “filter” that
tries to separate the signal from the noise, and the signal is then
extrapolated into the future to obtain forecasts.
ARIMA: nonseasonal models
The ARIMA forecasting equation for a stationary time series is a linear (i.e.,
regression-type) equation in which the predictors consist of lags of the
dependent variable and/or lags of the forecast errors. That is:
Predicted value of Y = a constant and/or a weighted sum of one or more
recent values of Y and/or a weighted sum of one or more recent values of
the errors.
If the predictors consist only of lagged values of Y, it is a pure autoregressive
(“self-regressed”) model, which is just a special case of a regression model and
which could be fitted with standard regression software. For example, a first-
order autoregressive (“AR(1)”) model for Y is a simple regression model in which
the independent variable is just Y lagged by one period (LAG(Y,1)). If some of the
predictors are lags of the errors, an ARIMA model is NOT a linear regression
model, because there is no way to specify “last period’s error” as an independent
variable: the errors must be computed on a period-to-period basis when the
model is fitted to the data. From a technical standpoint, the problem with using
lagged errors as predictors is that the model’s predictions are not linear functions
of the coefficients, even though they are linear functions of the past data. So,
coefficients in ARIMA models that include lagged errors must be estimated
by nonlinear optimization methods (“hill-climbing”) rather than by just solving a
system of equations.
ARIMA: nonseasonal models
The acronym ARIMA stands for Auto-Regressive Integrated Moving Average.
Lags of the stationarized series in the forecasting equation are called
"autoregressive" terms, lags of the forecast errors are called "moving average"
terms, and a time series which needs to be differenced to be made stationary is
said to be an "integrated" version of a stationary series. Random-walk and
random-trend models, autoregressive models, and exponential smoothing
models are all special cases of ARIMA models.
A nonseasonal ARIMA model is classified as an "ARIMA(p,d,q)" model
p is the number of autoregressive terms
d is the number of nonseasonal differences needed for stationarity, and
q is the number of lagged forecast errors in the prediction equation
The forecasting equation is constructed as follows. First, let y denote the
dth difference of Y, which means:
If d=0: yt = Yt
If d=1: yt = Yt - Yt-1
If d=2: yt = (Yt - Yt-1) - (Yt-1 - Yt-2) = Yt - 2Yt-1 + Yt-2
Note that the second difference of Y (the d=2 case) is not the difference from 2
periods ago. Rather, it is the first-difference-of-the-first difference, which is the
discrete analog of a second derivative, i.e., the local acceleration of the series
rather than its local trend.
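In terms of the differenced series y, the general ARIMA(p,d,q) forecasting equation (written with the common Box-Jenkins convention that the moving average coefficients carry negative signs) is:
ŷt = μ + ϕ1 yt-1 + … + ϕp yt-p − θ1 et-1 − … − θq et-q
where et denotes the forecast error at time t, the ϕ's are the autoregressive coefficients, and the θ's are the moving average coefficients.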
ARIMA: nonseasonal models
ARIMA(1,1,0) = differenced first-order
autoregressive model: If the errors of a random walk
model are autocorrelated, perhaps the problem can be
fixed by adding one lag of the dependent variable to
the prediction equation--i.e., by regressing the first
difference of Y on itself lagged by one period. This
would yield the following prediction equation:
Ŷt - Yt-1 = μ + ϕ1(Yt-1 - Yt-2)
which can be rearranged to
Ŷt = μ + Yt-1 + ϕ1 (Yt-1 - Yt-2)
This is a first-order autoregressive model with one
order of nonseasonal differencing and a constant
term--i.e., an ARIMA(1,1,0) model.
ARIMA: nonseasonal models
ARIMA(0,1,1) without constant = simple
exponential smoothing
ARIMA(0,1,1) with constant = simple exponential
smoothing with growth
ARIMA(0,2,1) or (0,2,2) without constant = linear
exponential smoothing
The ARIMA(0,2,2) model without constant predicts that the
second difference of the series equals a linear function of the
last two forecast errors
ARIMA(1,1,2) without constant = damped-trend linear
exponential smoothing.
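As an illustrative sketch (not from the original slides), the first of these equivalences can be checked numerically in statsmodels by fitting both models to the same series; ts_data is an illustrative pandas Series name, and the forecasts should agree closely rather than exactly because the two routines are initialized and estimated differently:

# ARIMA(0,1,1) without constant versus simple exponential smoothing.
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

arima_011 = ARIMA(ts_data, order=(0, 1, 1)).fit()   # no constant is included by default when d > 0
ses = SimpleExpSmoothing(ts_data).fit()

print(arima_011.forecast(1))                        # one-step-ahead forecasts from the two models
print(ses.forecast(1))                              # should be very close to each other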
Selecting parameters
Parameters can be selected automatically, for example with the auto.arima() function from the R forecast package, or by manually fitting and comparing candidate models.
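In Python, a comparable automatic search is available in the third-party pmdarima package (the Python counterpart to R's forecast::auto.arima); the sketch below assumes it is installed and that ts_data is a pandas Series (an illustrative name):

# Let auto_arima search over candidate (p, d, q) orders and pick the best by AIC.
import pmdarima as pm

auto_model = pm.auto_arima(ts_data,
                           seasonal=False,                  # set True (with m=...) for seasonal data
                           stepwise=True,                   # stepwise search instead of a full grid
                           information_criterion="aic",
                           trace=True)                      # print the models that were tried
print(auto_model.summary())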
Autoregressive Integrated Moving Average
(ML for Time Series Forecasting)
Autoregressive integrated moving average (ARIMA) models are considered a
development of the simpler autoregressive moving average (ARMA) models
and include the notion of integration.
Indeed, autoregressive moving average (ARMA) and autoregressive integrated
moving average (ARIMA) present many similar characteristics: their elements
are identical, in the sense that both of them leverage a general autoregression
AR(p) and general moving average model MA(q). As you previously learned,
the AR(p) model makes predictions using previous values in the time series,
while MA(q) makes predictions using the series mean and previous errors.
The main differences between ARMA and ARIMA methods are the notions of
integration and differencing. An ARMA model is a stationary model, and it
works very well with stationary time series (whose statistical properties, such as
mean, autocorrelation, and seasonality, do not depend on the time at which
the series has been observed).
Autoregressive Integrated Moving Average
It is possible to stationarize a time series through differencing techniques (for
example, by subtracting a value observed at time t from a value observed at time
t−1). The process of estimating how many nonseasonal differences are needed to
make a time series stationary is called integration (I) or the integrated method.
ARIMA models have three main components, denoted as p, d, q; in Python you
can assign integer values to each of these components to indicate the specific
ARIMA model you need to apply.
These parameters are defined as follows:
p stands for the number of lag variables included in the ARIMA model, also
called the lag order.
d stands for the number of times that the raw values in a time series data set
are differenced, also called the degree of differencing.
q denotes the magnitude of the moving average window, also called the
order of moving average.
In the case that one of the parameters above does not need to be used, a value of
0 can be assigned to that specific parameter, which indicates to not use that
element of the model.
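A minimal sketch of assigning these components in Python with the statsmodels ARIMA class, assuming ts_data is a pandas Series and that the ARIMA(1, 1, 1) specification is purely illustrative:

# Fit an ARIMA model with explicit p, d, q components and produce forecasts.
from statsmodels.tsa.arima.model import ARIMA

p, d, q = 1, 1, 1                        # lag order, degree of differencing, order of moving average
arima_model = ARIMA(ts_data, order=(p, d, q)).fit()
print(arima_model.summary())             # coefficients, AIC/BIC, and diagnostics
print(arima_model.forecast(steps=5))     # forecasts for the next five time stamps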
SARIMAX
Seasonal autoregressive integrated moving average
with exogenous factors in statsmodels
Let's now take a look at an extension of the ARIMA model in
Python, called SARIMAX, which stands for seasonal autoregressive
integrated moving average with exogenous factors. Data scientists
usually apply SARIMAX when they have to deal with time series
data sets that have seasonal cycles. Moreover, SARIMAX models
support seasonality and exogenous factors and, as a consequence,
they require not only the p, d, and q arguments that ARIMA
requires, but also another set of p, d, and q arguments for the
seasonality aspect as well as a parameter called s, which is the
periodicity of the seasonal cycle in your time series data set.
Python supports the SARIMAX() class with the statsmodels library.
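A minimal sketch of the SARIMAX() class, assuming monthly data in a pandas Series ts_data with a yearly seasonal cycle (s = 12) and an optional DataFrame exog_data of exogenous regressors; all names and orders here are illustrative:

# Seasonal ARIMA with exogenous regressors: nonseasonal (p, d, q) plus seasonal (P, D, Q, s).
from statsmodels.tsa.statespace.sarimax import SARIMAX

sarimax_model = SARIMAX(ts_data,
                        exog=exog_data,                # exogenous factors (can be omitted)
                        order=(1, 1, 1),               # nonseasonal p, d, q
                        seasonal_order=(1, 1, 1, 12)   # seasonal p, d, q and periodicity s
                        ).fit(disp=False)
print(sarimax_model.summary())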