
FOUNDATION TO DATA SCIENCE

Business Analytics

MODULE II-Unit:3
Time Series Forecasting-Forecasting Data
Prof. Dr. George Mathew
B.Sc., B.Tech, PGDCA, PGDM, MBA, PhD
Time Series Forecasting
2.3.1 What is Time-Series analysis
2.3.1.1 Decomposition
2.3.2 Correlation and Causation
2.3.3 Autocorrelation
2.3.4 Forecasting Data
2.3.5 Autoregressive Modeling, Moving averages and ARIMA
2.3.6 Neural networks for Time series data
Introduction to Autoregressive and Automated
Methods for Time Series Forecasting
Building forecasts is an integral part of any business, whether it is
about revenue, inventory, online sales, or customer demand
forecasting. Time series forecasting remains fundamental because many
real-world problems and data sets have a time dimension.
Applying machine learning models to accelerate forecasts enables
scalability, performance, and accuracy of intelligent solutions that can
improve business operations. However, building machine learning
models is often time consuming and complex with many factors to
consider, such as iterating through algorithms, tuning machine
learning hyperparameters, and applying feature engineering
techniques. These options multiply with time series data as data
scientists need to consider additional factors, such as trends,
seasonality, holidays, and external economic variables.
Classical methods for time series forecasting
We will look at the following classical methods:
Autoregression – This time series technique assumes that future observations at the next time
stamp are related to the observations at prior time stamps through a linear relationship.
Moving Average – This time series technique leverages previous forecast errors in a
regression approach to forecast future observations at the next time stamp. Together with the
autoregression technique, the moving average approach is a crucial element of the more
general autoregressive moving average and autoregressive integrated moving average
models, which present a more sophisticated stochastic configuration.
Autoregressive Moving Average – This time series technique assumes that future
observations at the next time stamp can be represented as a linear function of the observations
and residual errors at prior time stamps.
Autoregressive Integrated Moving Average – This time series technique assumes that
future observations at the next time stamp can be represented as a linear function of the
differenced observations and residual errors at prior time stamps. We will also look at an
extension of the autoregressive integrated moving average approach, called seasonal
autoregressive integrated moving average with exogenous regressors.
Automated Machine Learning (Automated ML) – This time series technique iterates
through a portfolio of different machine learning algorithms for time series forecasting,
while performing best model selection, hyperparameter tuning, and feature engineering for
your scenario.
Autoregressive Modeling,
Moving averages and
ARIMA
(Day-8-10)
Autoregression
Autoregression is a time series forecasting approach that depends only on
the previous outputs of a time series: this technique assumes that future
observations at the next time stamp are related to the observations at prior
time stamps through a linear relationship. In other words, in an
autoregression, a value from a time series is regressed on previous values
from that same time series.
In an autoregression, the output value at the previous time stamp becomes
the input value used to predict the next time stamp value, and the errors follow
the usual assumptions about errors in a simple linear regression model. In
autoregressions, the number of preceding input values in the time series
that are used to predict the next time stamp value is called the order (often we
refer to the order with the letter p). This order value determines how many
previous data points will be used:
usually, data scientists estimate the p value by testing different values and
observing which model results in the minimum Akaike information
criterion (AIC).
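As a minimal sketch of this AIC-based order selection (assuming Python with the statsmodels library, and using a synthetic series rather than any data set from these slides):

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg, ar_select_order

# Synthetic AR(2) series, for illustration only
rng = np.random.default_rng(42)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

# Let statsmodels compare candidate orders and keep the one with minimum AIC
selection = ar_select_order(y, maxlag=10, ic="aic")
print("lags chosen by AIC:", selection.ar_lags)        # e.g. [1, 2]

res = AutoReg(y, lags=selection.ar_lags).fit()
print("AIC of the selected model:", round(res.aic, 2))
```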
Data scientists refer to autoregressions in which the current predicted value
(output) is based on the immediately preceding value (input) as first order
autoregression, as illustrated in Figure 1.

Figure 1: First-order autoregression approach
If you need to predict the next time stamp value using the previous two values
instead, the approach is called a second-order autoregression: the next
value is predicted from the two previous values used as inputs, as
illustrated in Figure 2.

Figure 2: Second-order autoregression approach
Autoregressive Models
More generally, an nth-order autoregression is a multiple linear regression in which the
value of the series at any time t is a linear function of the previous values in that same time
series. Because of this serial dependence, another important aspect of autoregressions is
autocorrelation: autocorrelation is a statistical property that occurs when a time series is
linearly related to a previous or lagged version of itself.
Autocorrelation is a crucial concept for autoregressions as the stronger the correlation
between the output (that is, the target variable that we need to predict) and a specific
lagged variable (that is, a group of values at a prior time stamp used as input), the more
weight that autoregression can put on that specific variable. So that variable is considered
to have a strong predictive power.
Moreover, some regression methods, such as linear regression and ordinary least squares
regression, rely on the implicit assumption that there is no autocorrelation in
the training data set used to feed the model. These methods are defined as parametric
methodologies: they assume the data follow a normal distribution, and their
regression function is defined in terms of a finite number of unknown parameters that are
estimated from the data.
For all these reasons, autocorrelation can help data scientists select the most appropriate
method for their time series forecasting solutions. Furthermore, autocorrelation can be
very useful to gain additional insights from your data and between your variables, and
identify hidden patterns, such as seasonality and trend in time series data.
ARIMA (Autoregressive Integrated Moving Average
Models)
ARIMA models provide another approach to time series
forecasting. Exponential smoothing and ARIMA models are
the two most widely used approaches to time series
forecasting, and provide complementary approaches to the
problem. While exponential smoothing models are based on
a description of the trend and seasonality in the data,
ARIMA models aim to describe the autocorrelations in the
data.
Before we introduce ARIMA models, we must first discuss
the concept of stationarity and the technique of differencing
time series.
Handbook: page 36
Stationarity and differencing
A stationary time series is one whose properties do not depend on the time
at which the series is observed. Thus, time series with trends, or with
seasonality, are not stationary — the trend and seasonality will affect the
value of the time series at different times. On the other hand, a white noise
series is stationary — it does not matter when you observe it, it should look
much the same at any point in time.
Some cases can be confusing — a time series with cyclic behaviour (but
with no trend or seasonality) is stationary. This is because the cycles are not
of a fixed length, so before we observe the series we cannot be sure where
the peaks and troughs of the cycles will be.
In general, a stationary time series will have no predictable patterns in the
long term.
Time plots will show the series to be roughly horizontal (although some
cyclic behaviour is possible), with constant variance.
Stationarity and differencing
Consider the nine series listed below (plotted in the source figure). Which of these series
are stationary?
(a) Google stock price for 200 consecutive days;
(b) Daily change in the Google stock price for 200 consecutive days;
(c) Annual number of strikes in the US;
(d) Monthly sales of new one-family houses sold in the US;
(e) Annual price of a dozen eggs in the US (constant dollars);
(f) Monthly total of pigs slaughtered in Victoria, Australia;
(g) Annual total of lynx trapped in the McKenzie River district of north-west
Canada;
(h) Monthly Australian beer production;
(i) Monthly Australian electricity production.

Obvious seasonality rules out series (d), (h) and (i).
Trends and changing levels rule out series (a), (c), (e), (f) and (i).
Increasing variance also rules out (i).
That leaves only (b) and (g) as stationary series.
Autoregressive Models
We can see that the value of the PACF crosses the 5%
significance threshold at lag 3. This is consistent with the
results from the ar() function available in R’s stats package.
ar() automatically chooses the order of an autoregressive
model if one is not specified.
If we look at the documentation for the ar() function, we can
see that the order selected is determined (with the default
parameters we left undisturbed) based on the Akaike
information criterion (AIC). This is helpful to know because it
shows that the visual selection we made by examining the
PACF is consistent with the selection that would be made by
minimizing an information criterion. These are two different
ways of selecting the order of the model, but in this case
they are consistent.
Akaike Information Criterion
The AIC of a model is equal to AIC = 2k − 2ln(L), where k is the number of
parameters of the model and L is the maximized value of the model's likelihood
function. In general, we want to reduce the complexity of the model (i.e.,
reduce k) while increasing the likelihood/goodness-of-fit of the model (i.e.,
L). So we will favor models with smaller AIC values over those with greater
AIC values.
A likelihood function is a measure of how likely a particular set of
parameters for a function is relative to other parameters for that function,
given the data. So imagine you're fitting a linear regression of y on x on the
following data:
x   y
1   1
2   2
3   3
If you were fitting this to the model y = b × x, your likelihood function would
tell you that an estimate of b = 1 was far more likely than an estimate of b =
0. You can think of likelihood functions as a tool for helping you identify the
most likely true parameters of a model given a set of data.
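A hedged sketch of how this AIC could be computed by hand for the toy data above, assuming a Gaussian likelihood for the residuals of y = b × x; the parameter count k = 2 (slope plus error variance) is my own simplifying convention:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def aic_for_slope(b, k=2):
    """AIC = 2k - 2 ln L, using a Gaussian likelihood for the residuals of y = b*x."""
    resid = y - b * x
    n = len(y)
    sigma2 = max(np.mean(resid ** 2), 1e-12)   # guard against log(0) on a perfect fit
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * k - 2 * log_lik

print(aic_for_slope(1.0))   # near-perfect fit -> strongly negative AIC (preferred)
print(aic_for_slope(0.0))   # poor fit -> much larger AIC
```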
Forecasting with an AR(p) process
AR(p) Models Are Moving Window Functions
Forecasting many steps into the future
So far we have done a single-step-ahead forecast.
However, we may want to predict even further into the future. Suppose
we wanted to produce a two-step-ahead forecast instead of a
one-step-ahead forecast.
What we would do is first produce the one-step-ahead
forecast, and then use that predicted value as the input we need to
forecast one step further ahead.
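A minimal numpy sketch of this recursive idea; the intercept c and the weights phi1 and phi2 below are hypothetical values, not coefficients estimated from any data in these slides:

```python
import numpy as np

# Hypothetical fitted AR(2): y_t = c + phi1*y_{t-1} + phi2*y_{t-2} + error
c, phi1, phi2 = 0.5, 0.6, -0.2
history = [1.8, 2.1]            # the two most recent observed values (older first)

# One-step-ahead forecast uses only observed values
y_hat_1 = c + phi1 * history[-1] + phi2 * history[-2]

# Two-step-ahead forecast plugs the one-step forecast in as if it were observed
y_hat_2 = c + phi1 * y_hat_1 + phi2 * history[-1]

print(round(y_hat_1, 3), round(y_hat_2, 3))
```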
Moving Average Models
A moving average (MA) model relies on a picture of a process in which the
value at each point in time is a function of the recent past “error”
terms, each of which is independent from the others. We will review this
model in the same series of steps we used to study AR models.
In many cases, an MA process can be expressed as an infinite order AR
process. Likewise, in many cases an AR process can be expressed as an
infinite order MA process
The model
A moving average model can be expressed similarly to an autoregressive
model except that the terms included in the linear equation refer to present
and past error terms rather than present and past values of the process
itself. So an MA model of order q is expressed as:
yt = μ + et + θ1·et−1 + θ2·et−2 + … + θq·et−q
Do not confuse the MA model with a moving average. They are
not the same thing. Once you know how to fit a moving average
process, you can even compare the fit of an MA model to a moving
average of the underlying time series.
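A short sketch of fitting an MA(q) model in Python, assuming the statsmodels library and a synthetic series; an MA model is simply an ARIMA model with p = 0 and d = 0:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic MA(2) series: y_t = e_t + 0.7*e_{t-1} + 0.3*e_{t-2}
rng = np.random.default_rng(0)
e = rng.normal(size=500)
y = e[2:] + 0.7 * e[1:-1] + 0.3 * e[:-2]

# order=(p, d, q) = (0, 0, 2) gives a pure MA(2) model
res = ARIMA(y, order=(0, 0, 2)).fit()
print(res.params)   # estimated mean, theta coefficients, and error variance
```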
Autoregressive Integrated Moving Average Models
Now that we have examined AR and MA models individually, we look to
the Autoregressive Integrated Moving Average (ARIMA) model, which
combines these, recognizing that the same time series can have both
underlying AR and MA model dynamics. This alone would lead us to an
ARMA model, but we extend to the ARIMA model, which accounts for
differencing, a way of removing trends and rendering a time series
stationary.
What Is Differencing?
As discussed earlier in the book, differencing is converting a time series of
values into a time series of changes in values over time. Most often this is
done by calculating pairwise differences of adjacent points in time, so that
the value of the differenced series at a time t is the value at time t minus the
value at time t – 1. However, differencing can also be performed on
different lag windows, as convenient.
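A small sketch of both kinds of differencing with pandas (assuming pandas is available; the series values are arbitrary):

```python
import pandas as pd

y = pd.Series([10.0, 12.0, 15.0, 14.0, 18.0])

# First difference: value at time t minus value at time t-1
first_diff = y.diff()

# Differencing over a different lag window, e.g. lag 2
lag2_diff = y.diff(periods=2)

print(first_diff.tolist())   # [nan, 2.0, 3.0, -1.0, 4.0]
print(lag2_diff.tolist())    # [nan, nan, 5.0, 2.0, 3.0]
```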
ARIMA models continue to deliver near state-of-the-art performance,
particularly in cases of small data sets where more sophisticated machine
learning or deep learning models are not at their best. However, even
ARIMA models pose the danger of overfitting despite their relative
simplicity.
Lag Plot
A lag plot checks whether a data set or time
series is random or not. Random data should
not exhibit any identifiable structure in the lag
plot, while non-random structure in the lag plot
indicates that the underlying data are not
random.
Lag Plot

Lag plot results from the ts_data_load set: we can see a large
concentration of energy values along a diagonal line of the
plot. It clearly shows a relationship, or some correlation,
between those observations of the data set.
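A minimal lag-plot sketch with pandas and matplotlib, using a synthetic autocorrelated series rather than the ts_data_load set shown in the original figure:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import lag_plot

# Autocorrelated synthetic series: structure should appear along the diagonal
rng = np.random.default_rng(1)
y = pd.Series(np.cumsum(rng.normal(size=300)))

lag_plot(y, lag=1)        # scatter of y_t against y_{t+1}
plt.title("Lag plot (lag=1)")
plt.show()
```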
Autocorrelation plot results from ts_data_load_subset:
The autocorrelation plot shows the value of the autocorrelation function on the vertical
axis. It can range from −1 to 1. The horizontal lines displayed in the plot correspond to the
95 percent and 99 percent confidence bands; the dashed line is the 99 percent confidence
band. The autocorrelation plot is intended to reveal whether the data points of a time series
are positively correlated, negatively correlated, or independent of each other.
A plot of the autocorrelation of a time series by lag is also called the autocorrelation
function (ACF).
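A short sketch of drawing such a plot with statsmodels (synthetic data; the alpha argument sets the confidence band, e.g. 0.05 for 95 percent):

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(size=300))      # trending series -> slowly decaying ACF

plot_acf(y, lags=40, alpha=0.05)         # 95 percent confidence band
plt.show()
```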
ACF and PACF plots
The concepts and respective plots of the ACF and PACF functions become
particularly important when data scientists need to understand and determine the
order of autoregressive and moving average time series methods. There are two
methods that you can leverage to identify the order of an AR(p) model:
➢ The ACF and PACF functions
➢ The information criteria
The ACF is an autocorrelation function that provides you
with information about how much a series is autocorrelated with its lagged
values. In simple terms, it describes how well the present value of the series is
related to its past values.
A time series data set can have components like trend, seasonality, and cyclic
patterns. The ACF considers all these components while finding correlations.
ACF and PACF plots
On the other hand, the PACF is another important function that, instead
of finding correlations of present values with lags like the ACF, finds the
correlation of the residuals with the next lag. It is a function that
measures the incremental benefit of adding another lag. So if,
through the PACF function, we discover that there is hidden
information in the residual that can be modeled by the next lag, we
might get a good correlation, and we will keep that next lag as a
feature while modeling.
Now let's see why these two functions are important when building
an autoregression model. As mentioned at the beginning of this
chapter, an autoregression is a model based on the assumption that
present values of a time series can be obtained using previous
values of the same time series: the present value is a weighted
average of its past values.
ACF and PACF plots
In order to avoid multicollinear features for time series models, it is necessary to
find the optimum features, or order, of the autoregression process using the PACF
plot, as it removes variations explained by earlier lags, so we keep only the
relevant features.

Notice that we have good positive correlation with the lags up to lag number 6; this is the
point where the ACF plot cuts the upper confidence threshold. Although we have good
correlation up to the sixth lag, we cannot use all of them as it will create a
multicollinearity problem; that's why we turn to the PACF plot to get only the most
relevant lags.
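A matching PACF sketch with statsmodels, again on a synthetic AR series rather than the data set behind the figures:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf

# Synthetic AR(1) series: the PACF should cut off after the first lag
rng = np.random.default_rng(3)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.8 * y[t - 1] + rng.normal()

plot_pacf(y, lags=20)      # only lag 1 should stand out above the confidence band
plt.show()
```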
ACF and PACF plots

In Figure 4.9, we can see that lags up to 6 have good correlation before the plot
first cuts the upper confidence interval. This is our p value, which is the order of
our autoregression process, and we can model the given autoregression process
using a linear combination of the first 6 lags. In another example, only lags
up to 1 have good correlation before the plot first cuts the upper confidence
interval; there the p value, the order of that autoregression process, is 1, and we would
model that autoregression process using just the first lag.
ACF and PACF plots
The more lag variables you include in your model, the better the model will fit
the data; however, this can also represent a risk of overfitting your data. The
information criteria adjust the goodness-of-fit of a model by imposing a penalty
based on the number of parameters used. There are two popular adjusted
goodness-of-fit measures:
• AIC
• BIC
In order to get the information from these two measures, you can use the
summary() function, the params attribute, or the aic and bic attributes in
Python. These information criteria are applied by fitting several models, each with a
different number of parameters, and choosing the one with the lowest
information criterion. For example, if the underlying process is AR(5), the lowest
information criterion should be obtained at an order of 5.
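A brief sketch of reading these attributes in Python, assuming statsmodels and comparing a few candidate AR orders on a synthetic series:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(4)
y = np.zeros(400)
for t in range(2, 400):
    y[t] = 0.5 * y[t - 1] + 0.25 * y[t - 2] + rng.normal()

for p in range(1, 6):
    res = AutoReg(y, lags=p).fit()
    print(p, round(res.aic, 1), round(res.bic, 1))   # pick the order with the lowest value

# res.params and res.summary() expose the fitted coefficients and a full report
```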
Beginning in version 0.11, statsmodels has introduced a
Autocorrelation Plot
Autocorrelation plots are a commonly-used tool for
checking randomness in a data set. This
randomness is ascertained by computing
autocorrelations for data values at varying time lags.
If random, such autocorrelations should be near zero
for any and all time-lag separations. If non-random,
then one or more of the autocorrelations will be
significantly non-zero. In addition, autocorrelation
plots are used in the model identification stage
for autoregressive and moving average time series
models.
Self-Correlation/ autocorrelation
At its most fundamental, self-correlation of a time
series is the idea that a value in a time series at one
given point in time may have a correlation to the
value at another point in time. Note that “self-
correlation” is being used here informally to describe
a general idea rather than a technical one.
In particular, autocorrelation asks the more
general question of whether there is a
correlation between any two points in a specific
time series with a specific fixed distance
between them. We’ll look at this in more detail
next, as well as a related refinement, the partial autocorrelation.
The autocorrelation function
Autocorrelation, also known as serial correlation, is
the correlation of a signal with a delayed copy of itself
as a function of the delay. Informally, it is the
similarity between observations as a function of the
time lag between them.
Autocorrelation gives you an idea of how
data points at different points in time are
linearly related to one another as a function
of their time difference.
The autocorrelation function
There are a few important facts about the ACF, mathematically speaking:
• The ACF of a periodic function has the same periodicity as the original
process.
• The autocorrelation of the sum of periodic functions is the sum of the
autocorrelations of each function separately.
• All time series have an autocorrelation of 1 at lag 0.
• The autocorrelation of a sample of white noise will have a value of
approximately 0 at all lags other than 0.
• The ACF is symmetric with respect to negative and positive lags, so only
positive lags need to be considered explicitly. You can try plotting a
manually calculated ACF to prove this.
• A statistical rule for determining a significant nonzero ACF estimate is
given by a “critical region” with bounds at ±1.96/sqrt(n). This rule relies
on a sufficiently large sample size and a finite variance for the process.
Autocorrelation Plot
Note that uncorrelated does not necessarily mean random.
Data that has significant autocorrelation is not random.
However, data that does not show significant autocorrelation
can still exhibit non-randomness in other ways.
Autocorrelation is just one measure of randomness. In the
context of model validation, checking for autocorrelation is
typically a sufficient test of randomness, since the residuals
from poorly fitting models tend to display non-subtle
non-randomness. However, some applications require a more
rigorous determination of randomness. In these cases, a
battery of tests, which might include checking for
autocorrelation, are applied since data can be non-random in
many different and often subtle ways. An example of where a
more rigorous check for randomness is needed would be in
testing random number generators.
The partial autocorrelation function
The partial autocorrelation function (PACF) can be trickier to
understand than the ACF. The partial autocorrelation of a time
series for a given lag is the partial correlation of the time
series with itself at that lag given all the information between
the two points in time.
As we can see, our ACF plot is consistent with the
aforementioned property; the ACF of the sum of two periodic
series is the sum of the individual ACFs. You can see this
most clearly by noticing the positive → negative → positive →
negative sections of the ACF correlating to the more slowly
oscillating ACF. Within these waves, you can see
the faster fluctuation of the higher-frequency ACF.
Partial autocorrelation plot results from ts_data_load_subset with the plot_pacf() function
from the statsmodels library: we can see that lags up to 6 have good correlation
before the plot first cuts the upper confidence interval. This is our p value, which is
the order of our autoregression process.
White noise
For white noise series, we expect each autocorrelation to be
close to zero. Of course, they will not be exactly equal to zero,
as there is some random variation.
For a white noise series, we expect 95% of the spikes in the
ACF to lie within ±2/√T, where T is the length of the time series.
It is common to plot these bounds on a graph of the ACF (the
blue dashed lines in such plots). If one or more large spikes are
outside these bounds, or if substantially more than 5% of
spikes are outside these bounds, then the series is
probably not white noise.
In this example, T = 50 and so the bounds are at ±2/√50 = ±0.28.
All of the autocorrelation coefficients lie within these limits,
confirming that the data are white noise.
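A quick sketch of this check in Python, assuming statsmodels' acf() function and simulated white noise of length 50:

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(5)
y = rng.normal(size=50)                 # white noise, T = 50

r = acf(y, nlags=15)
bound = 2 / np.sqrt(len(y))             # about 0.28 for T = 50
outside = np.abs(r[1:]) > bound         # ignore lag 0, which is always 1
print("bound:", round(bound, 2), "spikes outside:", int(outside.sum()))
```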
Selecting the best model
Healthcare time series
Time series: the backbone of bespoke medicine
Time series datasets such as electronic health records (EHR) and
registries represent valuable (but imperfect) sources of
information spanning a patient’s entire lifetime of care. Whether
intentionally or not, they capture genetic and lifestyle risks,
signal the onset of diseases, show the advent of new morbidities
and comorbidities, indicate the time and stage of diagnosis, and
document the development of treatment plans, as well as their
efficacy.
Using patient data generated at each of these points in the
pathway of care, we can develop machine learning models that
give us a much deeper and more interconnected understanding
of individual trajectories of health and disease—including the
number of states needed to provide an accurate representation
of a disease, how to infer a patient’s current state, what triggers
transitions from one state to another, and much more.
ARIMA: nonseasonal models
ARIMA(p,d,q) forecasting equation: ARIMA models are, in theory, the
most general class of models for forecasting a time series which can be
made to be “stationary” by differencing (if necessary), perhaps in
conjunction with nonlinear transformations such as logging or deflating (if
necessary). A random variable that is a time series is stationary if its
statistical properties are all constant over time. A stationary series has
no trend, its variations around its mean have a constant amplitude, and it
wiggles in a consistent fashion, i.e., its short-term random time patterns
always look the same in a statistical sense. The latter condition means
that its autocorrelations (correlations with its own prior deviations from the
mean) remain constant over time, or equivalently, that its power
spectrum remains constant over time. A random variable of this form can
be viewed (as usual) as a combination of signal and noise, and the signal
(if one is apparent) could be a pattern of fast or slow mean reversion, or
sinusoidal oscillation, or rapid alternation in sign, and it could also have a
seasonal component. An ARIMA model can be viewed as a “filter” that
tries to separate the signal from the noise, and the signal is then
extrapolated into the future to obtain forecasts.
ARIMA: nonseasonal models
The ARIMA forecasting equation for a stationary time series is a linear (i.e.,
regression-type) equation in which the predictors consist of lags of the
dependent variable and/or lags of the forecast errors. That is:
Predicted value of Y = a constant and/or a weighted sum of one or more
recent values of Y and/or a weighted sum of one or more recent values of
the errors.
If the predictors consist only of lagged values of Y, it is a pure autoregressive
(“self-regressed”) model, which is just a special case of a regression model and
which could be fitted with standard regression software. For example, a first-
order autoregressive (“AR(1)”) model for Y is a simple regression model in which
the independent variable is just Y lagged by one period (LAG(Y,1)). If some of the
predictors are lags of the errors, an ARIMA model is NOT a linear regression
model, because there is no way to specify “last period’s error” as an independent
variable: the errors must be computed on a period-to-period basis when the
variable: the errors must be computed on a period-to-period basis when the
model is fitted to the data. From a technical standpoint, the problem with using
lagged errors as predictors is that the model’s predictions are not linear functions
of the coefficients, even though they are linear functions of the past data. So,
coefficients in ARIMA models that include lagged errors must be estimated
by nonlinear optimization methods (“hill-climbing”) rather than by just solving a
system of equations.
ARIMA: nonseasonal models
The acronym ARIMA stands for Auto-Regressive Integrated Moving Average.
Lags of the stationarized series in the forecasting equation are called
"autoregressive" terms, lags of the forecast errors are called "moving average"
terms, and a time series which needs to be differenced to be made stationary is
said to be an "integrated" version of a stationary series. Random-walk and
random-trend models, autoregressive models, and exponential smoothing
models are all special cases of ARIMA models.
A nonseasonal ARIMA model is classified as an "ARIMA(p,d,q)" model
p is the number of autoregressive terms
d is the number of nonseasonal differences needed for stationarity, and
q is the number of lagged forecast errors in the prediction equation
The forecasting equation is constructed as follows. First, let y denote the
dth difference of Y, which means:
If d=0: yt = Yt
If d=1: yt = Yt - Yt-1
If d=2: yt = (Yt - Yt-1) - (Yt-1 - Yt-2) = Yt - 2Yt-1 + Yt-2
Note that the second difference of Y (the d=2 case) is not the difference from 2
periods ago. Rather, it is the first-difference-of-the-first difference, which is the
discrete analog of a second derivative, i.e., the local acceleration of the series
rather than its local trend.
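A tiny pandas check of the d = 2 identity above (arbitrary values; the two columns should agree from the third row on):

```python
import pandas as pd

Y = pd.Series([3.0, 5.0, 9.0, 10.0, 14.0])

second_diff = Y.diff().diff()                     # difference of the first difference
direct      = Y - 2 * Y.shift(1) + Y.shift(2)     # Yt - 2*Yt-1 + Yt-2

print(pd.DataFrame({"diff_of_diff": second_diff, "direct": direct}))
```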
ARIMA: nonseasonal models
ARIMA(1,1,0) = differenced first-order
autoregressive model: If the errors of a random walk
model are autocorrelated, perhaps the problem can be
fixed by adding one lag of the dependent variable to
the prediction equation--i.e., by regressing the first
difference of Y on itself lagged by one period. This
would yield the following prediction equation:
Ŷt - Yt-1 = μ + ϕ1(Yt-1 - Yt-2)
which can be rearranged to
Ŷt = μ + Yt-1 + ϕ1(Yt-1 - Yt-2)
This is a first-order autoregressive model with one
order of nonseasonal differencing and a constant
term--i.e., an ARIMA(1,1,0) model.
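A minimal sketch of fitting this model with statsmodels on a synthetic series; note that, by default, the statsmodels ARIMA class drops the constant term when d > 0, so this sketch estimates only ϕ1 and the error variance:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic random-walk-like series with autocorrelated increments
rng = np.random.default_rng(6)
steps = np.zeros(400)
for t in range(1, 400):
    steps[t] = 0.5 * steps[t - 1] + rng.normal()
y = np.cumsum(steps)

# ARIMA(1,1,0): one AR term applied to the first-differenced series
res = ARIMA(y, order=(1, 1, 0)).fit()
print(res.params)            # estimated phi_1 and error variance
print(res.forecast(steps=3)) # three-step-ahead forecasts
```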
ARIMA: nonseasonal models
ARIMA(0,1,1) without constant = simple
exponential smoothing
ARIMA(0,1,1) with constant = simple exponential
smoothing with growth
ARIMA(0,2,1) or (0,2,2) without constant = linear
exponential smoothing
The ARIMA(0,2,2) model without constant predicts that the
second difference of the series equals a linear function of the
last two forecast errors.
ARIMA(1,1,2) without constant = damped-trend linear
exponential smoothing.
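As a hedged illustration of the first equivalence above, the following sketch fits an ARIMA(0,1,1) without constant and simple exponential smoothing to the same synthetic series; the forecasts should be close, though not exactly identical, since the two routines are estimated differently:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

rng = np.random.default_rng(7)
y = np.cumsum(rng.normal(size=300)) + 50.0   # synthetic level-shifting series

arima_res = ARIMA(y, order=(0, 1, 1)).fit()   # no constant by default when d=1
ses_res = SimpleExpSmoothing(y).fit()         # simple exponential smoothing

print(arima_res.forecast(steps=3))
print(ses_res.forecast(3))
```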

Selecting parameters
Parameters can be selected automatically, for example with the auto.arima() function
from R's forecast package, or by fitting and comparing candidate models manually.
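For Python users, a rough analogue of R's forecast::auto.arima() is the auto_arima() function from the pmdarima package (an assumption on my part; pmdarima is not referenced in these slides and must be installed separately):

```python
import numpy as np
import pmdarima as pm

rng = np.random.default_rng(8)
y = np.cumsum(rng.normal(size=300))

# Stepwise search over p, d, q; seasonal search is switched off here
model = pm.auto_arima(y, seasonal=False, stepwise=True, information_criterion="aic")
print(model.order)      # the (p, d, q) that auto_arima settled on
```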
Autoregressive Integrated Moving Average
(ML for Time Series Forecasting)
Autoregressive integrated moving average (ARIMA) models are considered a
development of the simpler autoregressive moving average (ARMA) models
and include the notion of integration.
Indeed, autoregressive moving average (ARMA) and autoregressive integrated
moving average (ARIMA) present many similar characteristics: their elements
are identical, in the sense that both of them leverage a general autoregression
AR(p) and general moving average model MA(q). As you previously learned,
the AR(p) model makes predictions using previous values in the time series,
while MA(q) makes predictions using the series mean and previous errors.
The main differences between ARMA and ARIMA methods are the notions of
integration and differencing. An ARMA model is a stationary model, and it
works very well with stationary time series (whose statistical properties, such as
mean, autocorrelation, and seasonality, do not depend on the time at which
the series has been observed).
Autoregressive Integrated Moving Average
It is possible to stationarize a time series through differencing techniques (for
example, by subtracting the value observed at time t−1 from the value observed at time
t). The process of estimating how many nonseasonal differences are needed to
make a time series stationary is called integration (I) or the integrated method.
ARIMA models have three main components, denoted as p, d, q; in Python you
can assign integer values to each of these components to indicate the specific
ARIMA model you need to apply.
These parameters are defined as follows:
p stands for the number of lag variables included in the ARIMA model, also
called the lag order.
d stands for the number of times that the raw values in a time series data set
are differenced, also called the degree of differencing.
q denotes the magnitude of the moving average window, also called the
order of moving average.
In the case that one of the parameters above does not need to be used, a value of
0 can be assigned to that specific parameter, which indicates to not use that
element of the model.
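A brief sketch of this with the statsmodels ARIMA class on a synthetic series, where setting a component of order=(p, d, q) to 0 drops that part of the model:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(9)
y = np.cumsum(rng.normal(size=300))

ar_only = ARIMA(y, order=(2, 1, 0)).fit()   # p=2, d=1, q=0: no moving average part
ma_only = ARIMA(y, order=(0, 1, 2)).fit()   # p=0, d=1, q=2: no autoregressive part
print(ar_only.aic, ma_only.aic)
```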
SARIMAX
Seasonal autoregressive integrated moving average
with exogenous factors in statsmodels
Let's now take a look at an extension of the ARIMA model in
Python, called SARIMAX, which stands for seasonal autoregressive
integrated moving average with exogenous factors. Data scientists
usually apply SARIMAX when they have to deal with time series
data sets that have seasonal cycles. Moreover, SARIMAX models
support seasonality and exogenous factors and, as a consequence,
they require not only the p, d, and q arguments that ARIMA
requires, but also another set of p, d, and q arguments for the
seasonality aspect as well as a parameter called s, which is the
periodicity of the seasonal cycle in your time series data set.
Python supports the SARIMAX() class with the statsmodels library.
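A minimal SARIMAX sketch, assuming statsmodels, monthly data with a yearly cycle (s = 12), and a made-up exogenous regressor:

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(10)
n = 240                                           # 20 years of monthly observations
season = 10 * np.sin(2 * np.pi * np.arange(n) / 12)
exog = rng.normal(size=(n, 1))                    # hypothetical external variable
y = 100 + season + 2 * exog[:, 0] + np.cumsum(rng.normal(size=n))

# order=(p, d, q), seasonal_order=(P, D, Q, s)
model = SARIMAX(y, exog=exog, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
res = model.fit(disp=False)
print(res.summary().tables[1])                               # coefficient table
print(res.forecast(steps=6, exog=rng.normal(size=(6, 1))))   # needs future exog values
```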
