Time Series Forecasting Monograph
[email protected]
Y21IHWS8GO
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited. 1
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited. 2
Figure 20: Plot Actual vs. Forecasted sales using Triple Exponential Smoothing method for 2015-2016 years
Figure 21: Plot Actual vs. Forecasted sales using SARIMA model for 2015-2016 years
Figure 22: PM2.5 Pollution data
Figure 23: PM2.5 Data (split into Training and testing purpose)
Figure 24: PM2.5 Data ACF Plot
Figure 25: Plot of first difference series, ACF and PACF of PM2.5
Figure 26: Residual Diagnostics of the (0,1,1) ARIMA Model
Figure 27: PM2.5 data forecast using ARIMA (0,1,1)
Figure 28: Plot Actual vs. Forecasted PM2.5 data using ARIMA (0,1,1) for 2016-2017 years
Figure 29: Residual Diagnostics of the (0,1,3) ARIMAX Model
Figure 30: PM2.5 data forecast using ARIMAX (0,1,3)
Figure 31: Plot Actual vs. Forecasted PM2.5 data using ARIMAX (0,1,3) for 2016-2017 years
Figure 32: Flow chart of Time Series
Data can be classified into three major groups based on its temporal nature:
I. Cross-sectional data: Data is collected at a single point in time on one or more variables. Here, the data is not sequential and the data points are usually independent of one another. Regression, random forest and neural network methods have been applied widely to such data.
II. Time series data: Univariate or multivariate data is observed across time in a sequential manner
at pre-determined and equally-spaced time intervals (such as yearly, monthly, quarterly or
hourly). Ordering among data points is important and cannot be destroyed.
III. Combination of cross-sectional and time series data: This is a complex study design where information on the same variables is collected at various points in time. Many survey samplings make use of such panel data.
[email protected]
Y21IHWS8GO
In this monograph we have focused only on the forecasting methods appropriate for time series data
observed at regular intervals.
Prediction for cross-sectional data is the topic of other predictive models such as linear regression, logistic regression, CART, random forest (RF), ANN, etc.
Formal definition of time series: A collection of observations that has been observed at regular time
intervals for a certain variable over a given duration is called a time series.
The regular time intervals can be daily (stock market prices), weekly (products manufactured), monthly (unemployment rate), quarterly (sales of a product), annual (GDP), quinquennial, that is, every 5 years (Census of Manufactures), or decennial (Census of Population).
Time series can be applied in various fields such as economic forecasting, sales forecasting, budgetary
analysis, stock market analysis, yield projection, inventory studies, workload projections, utility studies,
census analysis, process and quality control and many more.
Time series data has several characteristics that make it unique:
All observations are dependent: In time series data, each observation is expected to depend on
the past observations.
Missing data must be imputed: Because all data points are sequential in a time series, if any data point is missing, it must be imputed before the actual analysis commences; otherwise the proper ordering is not preserved.
Two different types of intervals cannot be mixed: Time series data is observed on the same
variable over a given period of time with fixed and regular time intervals. Though data can be
collected at various intervals such as yearly, monthly, weekly, daily, hourly (e.g. temperature)
and/or any specific time-interval, the interval must remain the same throughout the entire range;
e.g. yearly series cannot be combined with quarterly or monthly series.
Objective of Time Series Forecasting: Time series forecasting is applied to extract information from
historical series and is used to predict future behaviour of the same series based on past pattern.
Approaches used for Time Series Forecasting: There are two major approaches to time series forecasting. The two approaches are quite different; both are discussed with illustrations.
By no means are these the only two approaches to time series forecasting. A few other methods are referred to in Section 7.
Trend (Tt): When the series increases (or decreases) over the entire length of time. For example, the price of a share may increase or decrease linearly over a period of time, while the sales of a new product may increase exponentially (or non-linearly). Figure 1 shows the increasing linear trend of US GDP growth over a period of time.
[email protected]
Y21IHWS8GO
Seasonality (St): When a series is observed more frequently than once a year (quarterly or monthly, for example), the series may be subject to rhythmic fluctuations which are stable and repeat each year. For example, sales of umbrellas increase in the rainy season, whereas sales of ACs increase in summer and sales of woollen clothes increase in winter. This intra-year fluctuation is known as seasonal fluctuation. Figure 2 contains monthly average temperature that oscillates in a regular pattern over the given period of time.
Trend and seasonal components are part of systematic components of time series.
Additive Model: Yt = Tt + St + It is considered when the resultant series is the sum of the components.
Multiplicative Model: Yt = Tt * St * It is considered when the resultant time series is the product of the
components.
A series may be considered multiplicative series when the seasonal fluctuations increase as trend
increases. A multiplicative time series can be transformed into an additive series by taking log
transformation i.e.
log(Yt) = log(Tt) + log(St) + log(It)
Decomposition of a time series leads to identification and extraction of the individual components.
Primary objective of decomposition is to study the components of the time series, NOT forecasting.
However, forecasting models can be built on top of the decomposed series.
Case Study 1
A company ABC selling tractors has to forecast its sales for the next 24 months. It has 12 years of past sales data on a monthly basis. The data may contain trend, seasonality or both. The objective is to provide a reasonable forecast for future sales.
Solution:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from pylab import rcParams
from statsmodels.graphics.tsaplots import month_plot, plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose, STL
from statsmodels.tsa.api import ExponentialSmoothing
from sklearn.metrics import mean_squared_error
# mean_absolute_percentage_error is used later for model evaluation; it is not
# imported in the original listing and is available in scikit-learn 0.24 and later
from sklearn.metrics import mean_absolute_percentage_error
from statsmodels.tsa.stattools import adfuller
# the legacy ARIMA class is imported in the original, although the SARIMAX class
# from statsmodels.api is what is actually used throughout
from statsmodels.tsa.arima_model import ARIMA
import statsmodels.api as sm
df = pd.read_csv('Tractor-Sales.csv')
timestamp = pd.date_range(start='2003-01-01',end='2015-01-01',freq ='M')
df['Time_Stamp'] = timestamp
df.drop(labels='Month-Year',axis=1,inplace=True)
df.set_index(keys='Time_Stamp',drop=True,inplace=True)
rcParams['figure.figsize'] = 15,8
df.plot(grid=True);
[email protected]
Y21IHWS8GO
I. Data values are stored in correct time order and no data is missing.
II. The sales are increasing in numbers, implying presence of trend component.
III. Intra-year stable fluctuations are indicative of a seasonal component. As the trend increases, the fluctuations also increase, which is indicative of multiplicative seasonality.
Note:
All the versions of the libraries used are given in the appendix of the monograph.
# Yearly aggregate of monthly sales; the construction of this object is not
# shown in the original listing, presumably something like:
yearly_sales_across_years = df.resample('Y').sum()
yearly_sales_across_years.plot()
plt.grid()
plt.legend(loc='best');
[email protected]
Y21IHWS8GO
month_plot(df,ylabel='TractorSalesTS')
plt.grid();
Figure 4 shows that the sales of tractors are increasing every year.
Note in Figure 5 that the vertical lines represent monthly sales and the horizontal lines represent the average sales of the given month. Here, it can be observed that average sales are higher in July and August compared to other months.
In all the above plots, the lines representing sales show seasonal fluctuations along with a trend, and the fluctuations grow with the trend. Thus, we can confirm that the seasonality is multiplicative.
Part II: To identify the components of the given Tractor sales data
Now the decomposition method is applied to identify and separate out the three components (i.e. trend, seasonality and irregular components) from the given series, so that their individual properties can be observed.
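The decomposition object used in the next code block is produced with statsmodels' moving-average based seasonal_decompose. The call itself is not reproduced in the listing above; a minimal sketch, assuming the multiplicative model identified earlier:

# Classical (moving average) decomposition of the tractor sales series; the
# multiplicative model follows from the growing seasonal swings observed above
decomposition = seasonal_decompose(df, model='multiplicative')
decomposition.plot()
plt.show()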
Figure 6: Decomposed tractor time series into components using moving average decomposition
# Seasonal indices for the 12 months, taken from the first year of the
# decomposed seasonal component
seasonal_values = round(decomposition.seasonal.head(12), 2).values.flatten()
Seasonal_Ind = pd.DataFrame([seasonal_values],
                            columns=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                                     'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
                            index=range(1, 2))
Seasonal_Ind
Figure 6 indicates that the trend is increasing linearly. Since this is monthly data, there are 12 seasonal indices; for a multiplicative decomposition their sum must be 12 (they average to 1). July has the highest value of the seasonal component, so tractor sales are highest in July among all months of the same year, whereas November has the lowest value of the seasonal component, so sales are lowest in November.
[email protected]
3.2.2.
Y21IHWS8GO Using Seasonal and Trend decomposition by Loess
Owing to some limitations of the moving average decomposition, Loess-based decomposition (STL) has been proposed. STL is more versatile but does not admit multiplicative seasonality; hence a log transformation is used to convert multiplicative seasonality into additive seasonality.
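A minimal sketch of an STL decomposition on the log-transformed series follows; the exact call used in the monograph is not shown, and the column name is taken from the later code listings.

# STL needs an additive structure, so the multiplicative tractor series is
# log-transformed first; with a monthly DatetimeIndex the period is inferred
stl_result = STL(np.log10(df['Number of Tractor Sold'])).fit()
stl_result.plot()
plt.show()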
Before a forecast method is proposed, the method needs to be validated. For that purpose, the data has to be split into two sets, i.e. training and testing. Training data helps in identifying and fitting the right model(s), and test data is used to validate them.
In case of time series data, the test data is the most recent part of the series so that the ordering in the
data is preserved.
Part III: To propose the best model for the Tractor sales data
Forecasting accuracy measures compare the predicted values against the observed values to quantify the predictive power of the proposed model. With the forecast error at time $t$ defined as $e_t = Y_t - \hat{Y}_t$ (actual minus forecast), the commonly used measures are:
Mean Absolute Deviation (MAD): $\mathrm{MAD} = \frac{1}{n}\sum_{t=1}^{n} |e_t|$

Mean Absolute Percentage Error (MAPE): This measure is used extensively in time series because it is unit-free, so the performance of forecasted values can be compared easily across series. $\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n} \frac{|e_t|}{Y_t} \times 100$. MAPE is usually expressed as a percentage.

Mean Square Error (MSE): $\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n} e_t^2$

Root Mean Square Error (RMSE): $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} e_t^2}$
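As an illustration (not part of the original listing), these measures can be computed directly from the actual and forecasted values with a small helper; the function below is a sketch and its name is our own.

def forecast_accuracy(actual, forecast):
    """Compute MAD, MAPE (%), MSE and RMSE for two equal-length arrays."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    e = actual - forecast                      # forecast errors e_t
    mad = np.mean(np.abs(e))                   # Mean Absolute Deviation
    mape = np.mean(np.abs(e) / actual) * 100   # Mean Absolute Percentage Error
    mse = np.mean(e ** 2)                      # Mean Square Error
    rmse = np.sqrt(mse)                        # Root Mean Square Error
    return {'MAD': mad, 'MAPE': mape, 'MSE': mse, 'RMSE': rmse}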
For the Tractor Sales series, the first 10 years of data are used for training and the last 2 years for testing.
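The split code itself is not shown at this point; a minimal sketch, assuming the month-end index constructed earlier (2003 through 2014):

# First 10 years (120 monthly observations) for training,
# last 2 years (24 observations) for testing
TS_Train = df[:'2012-12-31']
TS_Test = df['2013-01-01':]
print(TS_Train.shape, TS_Test.shape)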
This is an extension of the moving (rolling) average method where more recent observations get higher weight. The simple exponential smoothing forecast is
$\hat{Y}_{t+1} = \alpha Y_t + (1-\alpha)\,\hat{Y}_t$, $0 < \alpha < 1$,
where $\alpha$ is the smoothing parameter for the level. The method is suited to series with no trend or seasonality; in reality such a series is hard to find. This is a one-step-ahead forecast where all the forecast values beyond the training period are identical.
Forecast equation: $\hat{Y}_{t+h} = l_t + h\,b_t + s_{t+h-m(k+1)}$, where $m$ is the seasonal period and $k$ is the integer part of $(h-1)/m$
Level equation: $l_t = \alpha\,(Y_t - s_{t-m}) + (1-\alpha)(l_{t-1} + b_{t-1})$, $0 < \alpha < 1$
Trend equation: $b_t = \beta\,(l_t - l_{t-1}) + (1-\beta)\,b_{t-1}$, $0 < \beta < 1$
Seasonal equation: $s_t = \gamma\,(Y_t - l_{t-1} - b_{t-1}) + (1-\gamma)\,s_{t-m}$, $0 < \gamma < 1$
This is also known as three-parameter exponential smoothing or triple exponential smoothing because of the three smoothing parameters $\alpha$, $\beta$ and $\gamma$. It is a general method and a true multi-step-ahead forecast.
TS_Train_HW = ExponentialSmoothing(TS_Train, seasonal='multiplicative',
                                   trend='additive', freq='M')
TS_Train_HW_autofit = TS_Train_HW.fit(optimized=True)
TS_Train_HW_autofit.params_formatted
[email protected]
Y21IHWS8GO
A user may also choose the values of $\alpha$, $\beta$ and $\gamma$ manually and observe the differences in the fitted model.
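For instance, the same model can be refitted with user-specified smoothing parameters. The sketch below uses parameter names from recent statsmodels versions, and the values are purely illustrative:

# Illustrative manual choice of the three smoothing parameters
TS_Train_HW_manual = TS_Train_HW.fit(smoothing_level=0.3,
                                     smoothing_trend=0.1,
                                     smoothing_seasonal=0.2,
                                     optimized=False)
TS_Train_HW_manual.params_formatted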
Figure 10: Plot Actual vs. Forecasted sales using HW’s method for 2013-2014 years
RMSE = mean_squared_error(TS_Test, TES_pred, squared=False)
MAPE = mean_absolute_percentage_error(TS_Test['Number of Tractor Sold'], TES_pred)
The following parameters are important for generating time stamps according to frequency (seasonality) using the date_range function in Pandas.
'start' – This defines the time stamp of the first instance of the data.
'periods' – This specifies the number of date-time observations to generate.
'freq' – This can be used to change the frequency (seasonality) of the time stamps.
'end' – Instead of the 'periods' parameter, 'end' can be used to specify the last instance of the observation.
After the time stamps are generated, they may be used to index data to make it an appropriate time
series.
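For example, a short illustrative call generating 24 month-end time stamps for the forecast horizon used later:

# 24 month-end time stamps starting from January 2015
future_index = pd.date_range(start='2015-01-01', periods=24, freq='M')
print(future_index[:3])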
The following links of the Pandas library are useful for various other custom date ranges.
1. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html
2. https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases
3. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.CustomBusinessDay.html#pandas.tseries.offsets.CustomBusinessDay
Auto Regressive Integrated Moving Average (ARIMA) models are applied on time series data
when the current value is assumed to be correlated to past values and past prediction errors.
Therefore, these models are used in defining current value as a linear combination of past
values and past prediction errors.
Here, we have defined a few terms that would be useful in understanding ARIMA models in
detail.
ARMA models can be applied only on stationary time series data.
Mean: $E(Y_t) = \mu$
Variance: $\mathrm{Var}(Y_t) = E(Y_t - \mu)^2 = \sigma^2$
Correlation: $\rho_k = \dfrac{E[(Y_t - \mu)(Y_{t+k} - \mu)]}{\sigma_t\,\sigma_{t+k}}$
[email protected]
Y21IHWS8GO
Where 𝜌𝑘 is the correlation (or auto-correlation) at lag 𝑘 between the values of 𝑌𝑡 and 𝑌𝑡+𝑘
So, if the mean, variance and correlation (auto-correlation at different lags) of a time series are constant no matter at what point of time they are measured, i.e. if they are time invariant, the series is called a stationary time series. A series not possessing these properties is termed a non-stationary time series.
Now the log-transformed data is subjected to the Augmented Dickey-Fuller test to check for stationarity.
TS_Train_log = np.log10(TS_Train)
dftest = adfuller(TS_Train_log,regression='ct',autolag=None,maxlag=24)
print('DF test statistic is %3.3f' %dftest[0])
print('DF test p-value is' ,dftest[1])
print('Number of lags used' ,dftest[2])
[email protected]
Y21IHWS8GO
Neither the original nor the log-transformed series is stationary; hence stationarization is necessary. Often, differencing a non-stationary time series leads to a stationary series.
The first difference of a series is defined as $D_t = Y_t - Y_{t-1}$.
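As a quick sketch (this re-check is not shown explicitly in the original listing), the ADF test can be rerun on the first difference of the log-transformed training series:

# ADF test on the first difference of the log-transformed series;
# the differenced series is expected to be stationary
dftest_diff = adfuller(TS_Train_log.squeeze().diff().dropna(), regression='ct')
print('DF test statistic is %3.3f' % dftest_diff[0])
print('DF test p-value is', dftest_diff[1])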
Autocorrelation Function (ACF): The autocorrelation of order $k$ is the correlation between $Y_t$ and $Y_{t+k}$ for $k = 0, 1, \ldots$; $-1 \le \mathrm{ACF}(k) \le 1$ and $\mathrm{ACF}(0) = 1$. ACF measures the strength of dependency of the current observation on past observations.
Partial Autocorrelation Function (PACF): The PACF of order $k$ is the autocorrelation between $Y_t$ and $Y_{t+k}$ adjusting for all the intervening periods, i.e. it provides the correlation between the current and the $k$-lagged series after removing the influence of all other observations that lie in between.
ACF and PACF are used together to identify the order of an ARMA model. Seasonal ACF and PACF examine correlations for seasonal data.
f, a = plt.subplots(1, 2, sharex=True, sharey=False, squeeze=False)
# ACF and PACF of the log-transformed series (the plotting calls are not shown in the original):
plot_acf(TS_Train_log.squeeze(), ax=a[0][0])
plot_pacf(TS_Train_log.squeeze(), ax=a[0][1], zero=False);
[email protected]
Y21IHWS8GO
Figure 11: ACF and PACF of Tractor Sales after log transformation
$$Y_t = \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \beta_3 Y_{t-3} + \cdots + \beta_p Y_{t-p} + \varepsilon_t + \alpha_1 \varepsilon_{t-1} + \alpha_2 \varepsilon_{t-2} + \cdots + \alpha_q \varepsilon_{t-q}$$
## Here we have taken the range of p, q, P and Q to be within 0 to 2. We can
## change this if need be.
import itertools
p = range(0, 3)
q = range(0, 3)
d = range(1, 2)
pdq = list(itertools.product(p, d, q))
model_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]

## Defining an empty data frame to store the parameter values along with the model AIC
SARIMA_AIC = pd.DataFrame(columns=['param', 'seasonal', 'AIC'])

# The full looping structure is not shown in the original listing; the
# reconstruction below follows the pattern used later for the ARIMA/ARIMAX
# search and fits each model on the log-transformed training series.
for param in pdq:
    for param_seasonal in model_pdq:
        try:
            SARIMA_model = sm.tsa.statespace.SARIMAX(TS_Train_log,
                                                     order=param,
                                                     seasonal_order=param_seasonal)
            results_SARIMA = SARIMA_model.fit()
        except Exception:
            continue
        print('SARIMA{}x{}12 - AIC:{}'.format(param, param_seasonal, results_SARIMA.aic))
        SARIMA_AIC = SARIMA_AIC.append({'param': param, 'seasonal': param_seasonal,
                                        'AIC': results_SARIMA.aic}, ignore_index=True)
SARIMA_AIC.sort_values(by=['AIC']).head()
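The fitted object TS_AutoARIMA, whose residual diagnostics are plotted next, is not constructed in the listing above. A plausible sketch, assuming the lowest-AIC automated specification SARIMA(0,1,1)(0,1,1)[12] fitted on the log-transformed training series:

# Automated (lowest-AIC) SARIMA specification; a sketch, since the original
# fitting code is not shown
TS_AutoARIMA = sm.tsa.statespace.SARIMAX(TS_Train_log,
                                         order=(0, 1, 1),
                                         seasonal_order=(0, 1, 1, 12)).fit()
print(TS_AutoARIMA.summary())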
TS_AutoARIMA.plot_diagnostics();
[email protected]
Y21IHWS8GO
Alternatively, one might investigate other suitable model(s) for a time series using ACF and
PACF for the differenced series.
# Plot the first difference series, ACF and PACF of Log(Tractor Sales)
TS_Train_log.diff().plot()
plt.grid()
# ACF and PACF after taking the differenced logarithmic transformation
f, a = plt.subplots(1, 2, sharex=True, sharey=False, squeeze=False)
# ACF and PACF of the first-differenced log series (the plotting calls are not shown in the original):
plot_acf(TS_Train_log.diff().dropna().squeeze(), ax=a[0][0])
plot_pacf(TS_Train_log.diff().dropna().squeeze(), ax=a[0][1], zero=False);
Figure 13: Plot of first difference series, ACF and PACF of Log(Tractor Sales)
The plots of ACF and PACF indicate possible values for p to be 1 and q to be 0.
## Plot of first difference and seasonal first difference series, ACF and PACF of Log(Tractor Sales)
TS_Train_log.diff(12).diff().plot()
plt.grid()
# ACF and PACF after taking the differenced logarithmic transformation --> Seasonal Series Plot
f, a = plt.subplots(1, 2, sharex=True, sharey=False, squeeze=False)
# Plotting calls not shown in the original; presumably the seasonally and regularly differenced log series:
plot_acf(TS_Train_log.diff(12).diff().dropna().squeeze(), ax=a[0][0])
plot_pacf(TS_Train_log.diff(12).diff().dropna().squeeze(), ax=a[0][1], zero=False);
Figure 14: Plot of first difference and seasonal first difference series, ACF and PACF of Log(Tractor Sales)
# Forecast for the test set duration using the automated SARIMA model with
# 95% confidence intervals
pred_AutoARIMA = TS_AutoARIMA.get_forecast(steps=len(TS_Test))
axis = TS_Train_log.plot()
pred_AutoARIMA.summary_frame(alpha=0.05)['mean'].plot(ax=axis, label='Forecast', alpha=0.7)
axis.fill_between(pred_AutoARIMA.summary_frame(alpha=0.05).index,
                  pred_AutoARIMA.summary_frame(alpha=0.05)['mean_ci_lower'],
                  pred_AutoARIMA.summary_frame(alpha=0.05)['mean_ci_upper'],
                  color='k', alpha=.15)
axis.set_xlabel('Year-Months')
axis.set_ylabel('Number of Tractor Sold')
plt.legend(loc='best')
plt.title('Tractor sales data forecast using SARIMA (0,1,1)(0,1,1)[12]')
plt.grid();
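The manually specified model TS_f used below is SARIMA(0,1,1)(1,1,1)[12]; its fitting code is not reproduced above, but a plausible sketch on the log-transformed training series is:

# Manually specified SARIMA(0,1,1)(1,1,1)[12] on the log-transformed training series
TS_f = sm.tsa.statespace.SARIMAX(TS_Train_log,
                                 order=(0, 1, 1),
                                 seasonal_order=(1, 1, 1, 12)).fit()
print(TS_f.summary())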
# Forecast for the test set duration using the manual SARIMA model with 95%
# confidence intervals
pred_f = TS_f.get_forecast(steps=len(TS_Test))
axis = TS_Train_log.plot()
pred_f.summary_frame(alpha=0.05)['mean'].plot(ax=axis, label='Forecast', alpha=0.7)
axis.fill_between(pred_f.summary_frame(alpha=0.05).index,
                  pred_f.summary_frame(alpha=0.05)['mean_ci_lower'],
                  pred_f.summary_frame(alpha=0.05)['mean_ci_upper'],
                  color='k', alpha=.15)
axis.set_xlabel('Year-Months')
axis.set_ylabel('Number of Tractor Sold')
plt.legend(loc='best')
plt.title('Tractor sales data forecast using SARIMA (0,1,1)(1,1,1)[12]')
plt.grid();
## Plot Actual vs. Forecasted sales using SARIMA(0,1,1)*(0,1,1)[12] for 2013-2014 years
TS_Test.plot()
np.power(10, pred_AutoARIMA.summary_frame(alpha=0.05)['mean']).plot()
plt.legend(['Actual Data', 'Forecasted Data']);
plt.title('Plot of Actual vs. Forecasted sales using SARIMA(0,1,1)*(0,1,1)[12] for 2013-2014 years')
plt.grid();
TS_Test.plot()
np.power(10, pred_f.summary_frame(alpha=0.05)['mean']).plot()
plt.legend(['Actual Data', 'Forecasted Data']);
plt.title('Plot of Actual vs. Forecasted sales using SARIMA(0,1,1)*(1,1,1)[12] for 2013-2014 years')
plt.grid();
RMSE2 = mean_squared_error(TS_Test.values,
                           np.power(10, pred_f.summary_frame()['mean']).values,
                           squared=False)
MAPE2 = mean_absolute_percentage_error(TS_Test.values,
                                       np.power(10, pred_f.summary_frame()['mean']).values)
It appears that SARIMA(0,1,1)(1,1,1)[12] not only has smaller AIC and BIC values than the SARIMA(0,1,1)(0,1,1)[12] recommended by the automated search, it also provides a smaller RMSE, albeit marginally.
A flow chart for understanding ARIMA/SARIMA models is given below.
Final Forecasts
Once a model is chosen and validated, forecasts into the future are to be generated. Tractor sales are to be forecasted for 24 months: Jan 2015 – Dec 2016.
Going strictly by MAPE, the recommended model is Triple Exponential Smoothing (Holt-Winters' model).
# Plot Actual vs. Forecasted sales using HW method for 2015-2016 years
# TS_df_HW_autofit is the Holt-Winters model refitted on the full series; the
# refitting call is not shown in the original, presumably along these lines:
TS_df_HW_autofit = ExponentialSmoothing(df, seasonal='multiplicative',
                                        trend='additive', freq='M').fit(optimized=True)
df.plot()
TS_df_HW_autofit.forecast(steps=24).plot()
plt.legend(['Actual', 'Forecast'])
plt.title('Forecast from the Holt-Winters Multiplicative Method')
plt.grid();
Figure 20: Plot Actual vs. Forecasted sales using Triple Exponential Smoothing method for 2015-2016 years
TS_final_arima = sm.tsa.statespace.SARIMAX(np.log10(df),
                                           order=(0, 1, 1),
                                           seasonal_order=(1, 1, 1, 12))
TS_final_arima = TS_final_arima.fit()
pred_final_arima = TS_final_arima.get_forecast(steps=24)
axis = np.log10(df).plot()
pred_final_arima.summary_frame(alpha=0.05)['mean'].plot(ax=axis, label='Forecast', alpha=0.7)
axis.fill_between(pred_final_arima.summary_frame(alpha=0.05).index,
                  pred_final_arima.summary_frame(alpha=0.05)['mean_ci_lower'],
                  pred_final_arima.summary_frame(alpha=0.05)['mean_ci_upper'],
                  color='k', alpha=.15)
axis.set_xlabel('Year-Months')
axis.set_ylabel('Number of Tractor Sold')
plt.legend(loc='best')
plt.title('Forecast from SARIMA(0,1,1)(1,1,1)[12]')
plt.grid();
[email protected]
Y21IHWS8GO
Figure 21: Plot Actual vs. Forecasted sales using SARIMA Model for 2015-2016 years
Often a time series is influenced by one or more exogenous variables, i.e. a time series may have two independent components. One component is directly influenced by the past observations, while the other component works like a multiple linear regression on the exogenous variables. Such a time series model is known as an ARIMAX model and may be treated as an extension of the ARIMA model. The 'X' in ARIMAX (or SARIMAX) stands for the independent predictors in the model.

$$Y_t = \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \beta_3 Y_{t-3} + \cdots + \beta_p Y_{t-p} + \varepsilon_t + \alpha_1 \varepsilon_{t-1} + \alpha_2 \varepsilon_{t-2} + \cdots + \alpha_q \varepsilon_{t-q} + \gamma X_t$$

It is possible that X is a vector, like a set of typical multiple linear regression predictors. X may or may not vary over time. For example, if the price of a commodity is modelled as a time series, X may be a price index, which is a time-dependent variable. Alternatively, X may be a categorical variable, such as geographic location.
[email protected]
There is one major
Y21IHWS8GO difference with multiple linear regression and inclusion of a set of predictors in
an ARIMA model. Interpretation of the regression coefficients are not straight forward. The
estimated coefficient of X is not an estimate of the increase in 𝒀𝒕 for a unit increase in X, because 𝒀𝒕
includes the lag variables.
The dataset Pollution_Data.csv contains the average weekly values of several polluting particles at one pollution monitoring station. The main parameter for monitoring ambient air quality is PM2.5. The data points are weekly averages from 2013 to 2017.
Figure 22 shows the data pattern. Note that the frequency of the time stamps is weekly.
df = pd.read_csv('Pollution_Data.csv')
daterange = pd.date_range(start='2013-03-03',periods=len(df),freq='W')
df['Time_Stamp'] = daterange
df.set_index(keys='Time_Stamp',inplace=True)
rcParams['figure.figsize'] = 15,8
df['PM2.5'].plot(grid=True);
[email protected]
Y21IHWS8GO
Since we have already discussed the theory behind building ARIMA models using the lowest Akaike Information Criterion (AIC), we will directly apply those concepts here. But before that we need to split the data into training and test sets and then check the training data for stationarity.
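The split code is not reproduced here; a plausible sketch, assuming roughly the 2016-2017 portion of the weekly series (matching the forecast horizon in the later figures) is held out for testing:

# Hold out the most recent weekly observations for testing; the exact split date
# is an assumption, since the original split code is not shown
TS_Train = df[:'2015-12-31']
TS_Test = df['2016-01-01':]
TS_Train['PM2.5'].plot(label='Training')
TS_Test['PM2.5'].plot(label='Test')
plt.legend();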
Figure 23: PM2.5 Data (split into Training and testing purpose)
Let us check the ACF plot to understand the exact nature of the seasonality in the data.
plot_acf(TS_Train['PM2.5'],lags=60);
Now, we will check whether the Training data is stationary using the Augmented Dickey-Fuller
Test.
dftest = adfuller(TS_Train['PM2.5'],regression='ct')
print('DF test statistic is %9.9f' %dftest[0])
print('DF test p-value is' ,dftest[1])
print('Number of lags used' ,dftest[2])
The data is not stationary at the 95% confidence level. Now, we will take a first order difference of the data and then check for stationarity again.
dftest = adfuller(TS_Train['PM2.5'].diff().dropna(), regression='ct')
print('DF test statistic is %9.9f' % dftest[0])
print('DF test p-value is', dftest[1])
print('Number of lags used', dftest[2])
After taking a first order differencing we see that the data has indeed become stationary at 95%
confidence level.
Let us plot the differenced Time Series once and check the plots of the ACF and the PACF.
f,a = plt.subplots(1,2,sharex=True,sharey=False,squeeze=False)
plot_0 = plot_acf(TS_Train['PM2.5'].diff(),ax=a[0][0],missing='drop')
plot_1 = plot_pacf(TS_Train['PM2.5'].diff().dropna(),ax=a[0][1],zero=False);
[email protected]
Y21IHWS8GO
Figure 25: Plot of first difference series, ACF and PACF of PM2.5
ARIMA(p, d, q):
## The following loop helps us in getting a combination of different parameters
## of p and q in the range of 0 and 3
## We have kept the value of d as 1 as we need to take a difference of the
## series to make it stationary.
import itertools
p = q = range(0, 4)
d = range(1, 2)
pdq = list(itertools.product(p, d, q))

# Empty data frame to store the parameter values along with the model AIC
# (this initialization is not shown in the original listing)
ARIMA_AIC = pd.DataFrame(columns=['param', 'AIC'])

for param in pdq:  # running a loop within the pdq parameters defined by itertools
    # fitting the ARIMA model using the parameters from the loop
    ARIMA_model = sm.tsa.statespace.SARIMAX(TS_Train['PM2.5'].values, order=param).fit()
    # appending the AIC values and the model parameters to the previously created
    # data frame for easier understanding and sorting of the AIC values
    ARIMA_AIC = ARIMA_AIC.append({'param': param, 'AIC': ARIMA_model.aic},
                                 ignore_index=True)

## Sort the above AIC values in the ascending order to get the parameters
## for the minimum AIC value
ARIMA_AIC.sort_values(by='AIC', ascending=True)
TS_AutoARIMA = sm.tsa.statespace.SARIMAX(endog=TS_Train['PM2.5'], order=(0, 1, 1))
TS_AutoARIMA = TS_AutoARIMA.fit()
print(TS_AutoARIMA.summary())
[email protected]
Y21IHWS8GO
According to the result, ARIMA(0,1,1) is the indicated model for the PM2.5 data, with AIC = 1500.213 and BIC = 1506.194.
TS_AutoARIMA.plot_diagnostics();
# Forecast for the test set duration using the automated ARIMA model with 95%
# confidence intervals
pred_AutoARIMA = TS_AutoARIMA.get_forecast(steps=len(TS_Test))
[email protected]
Y21IHWS8GO
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited. 47
TS_Test['PM2.5'].plot()
pred_AutoARIMA.summary_frame()['mean'].plot()
plt.grid()
plt.title('PM2.5: Actual vs Forecast - SARIMA Model')
plt.xlabel('Time')
plt.legend(['Actual Data','Forecasted Data']);
[email protected]
Y21IHWS8GO
Figure 28: Plot Actual vs. Forecasted PM2.5 data using ARIMA(0,1,1) for 2016-2017 years
Here, we have used the AIC to select the best p and q values for the ARIMA models. We can also decide the p and q values based on the lags at which the ACF and the PACF cut off at a particular confidence level (usually the 95% confidence level is taken).
It is clear that the ARIMA model does not provide a good approximation to the data. An effort will be made to fit an ARIMAX model using a set of covariates (temperature, dew point, rainfall amount and wind speed) to improve the accuracy of the model.
## Here we have taken the range of p and q to be within 0 to 3. We can change
## this if need be. The range of p and q has been defined in the last loop and
## we will be using the same parameter values.

# Empty data frame to store the parameter values along with the model AIC
# (this initialization is not shown in the original listing)
ARIMAX_AIC = pd.DataFrame(columns=['param', 'AIC'])

for param in pdq:  # running a loop within the pdq parameters defined by itertools
    # fitting the ARIMAX model using the parameters from the loop
    ARIMAX_model = sm.tsa.statespace.SARIMAX(endog=TS_Train['PM2.5'].values,
                                             order=param,
                                             exog=TS_Train[['TEMP', 'DEWP', 'RAIN', 'WSPM']]).fit()
    # appending the AIC values and the model parameters to the previously created
    # data frame for easier understanding and sorting of the AIC values
    ARIMAX_AIC = ARIMAX_AIC.append({'param': param, 'AIC': ARIMAX_model.aic},
                                   ignore_index=True)

## Sorting the parameters of the ARIMAX models to get the parameters which give
## us the lowest AIC value
ARIMAX_AIC.sort_values(by=['AIC']).head()
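The construction of the TS_ARIMAX object fitted below is not shown; a plausible sketch, assuming the lowest-AIC specification ARIMAX(0,1,3) with the four exogenous regressors:

# ARIMAX(0,1,3) with temperature, dew point, rainfall and wind speed as exogenous
# regressors (a sketch; the original model construction is not reproduced)
TS_ARIMAX = sm.tsa.statespace.SARIMAX(endog=TS_Train['PM2.5'],
                                      exog=TS_Train[['TEMP', 'DEWP', 'RAIN', 'WSPM']],
                                      order=(0, 1, 3))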
TS_ARIMAX = TS_ARIMAX.fit()
print(TS_ARIMAX.summary())
[email protected]
Y21IHWS8GO
According to the result, ARIMAX(0,1,3) is the indicated model for the PM2.5 data, with AIC = 1443.447 and BIC = 1467.371.
TS_ARIMAX.plot_diagnostics();
pred = TS_ARIMAX.get_forecast(steps=len(TS_Test),
                              exog=TS_Test[['TEMP', 'DEWP', 'RAIN', 'WSPM']])
axis = TS_Train['PM2.5'].plot()
pred.summary_frame(alpha=0.05)['mean'].plot(ax=axis, label='Forecast', alpha=0.7)
axis.fill_between(pred.summary_frame(alpha=0.05).index,
                  pred.summary_frame(alpha=0.05)['mean_ci_lower'],
                  pred.summary_frame(alpha=0.05)['mean_ci_upper'],
                  color='k', alpha=.15)
axis.set_xlabel('Time')
axis.set_ylabel('PM2.5')
plt.legend(loc='best')
plt.title('PM2.5 data forecast using ARIMAX (0,1,3)')
plt.grid();
TS_Test['PM2.5'].plot()
pred.summary_frame()['mean'].plot()
plt.grid()
plt.title('PM2.5: Actual vs Forecast - ARIMAX Model (0,1,3)')
plt.xlabel('Time')
plt.legend(['Actual Data', 'Forecasted Data']);
Figure 31: Plot Actual vs. Forecasted PM2.5 data using ARIMAX (0,1,3) for 2016-2017 years
RMSE = mean_squared_error(TS_Test['PM2.5'],pred.predicted_mean,squared=False)
MAPE = mean_absolute_percentage_error(TS_Test['PM2.5'],pred.predicted_mean)
Here, we have chosen the model parameters (p and q) using the lowest AIC. But we can also go back and investigate the lags at which the ACF and the PACF cut off and take the p and q values accordingly for the ARIMAX model.
[email protected]
Y21IHWS8GO
Note that none of the above models shows a small MAPE. However, it is to be noted that with the addition of the exogenous variables, there is a reduction of 40% in the MAPE.
Since no future data for exogenous variables is available, forecast into the future for ARIMAX
models (beyond the time stamps of the test set) is not possible. The seasonal parameters (with the
appropriate seasonal frequency) may be added to the ARIMAX model to make it a SARIMAX
model.
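A minimal sketch of such a SARIMAX specification is given below; the seasonal order and the 52-week period are illustrative assumptions, not fitted recommendations:

# Illustrative SARIMAX: ARIMAX(0,1,3) plus a seasonal (1,1,1) component with a
# 52-week period (the seasonal order here is an assumption)
TS_SARIMAX = sm.tsa.statespace.SARIMAX(endog=TS_Train['PM2.5'],
                                       exog=TS_Train[['TEMP', 'DEWP', 'RAIN', 'WSPM']],
                                       order=(0, 1, 3),
                                       seasonal_order=(1, 1, 1, 52)).fit()
print(TS_SARIMAX.summary())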
Appendix: Libraries and Versions

Library      Version
Pandas       1.0.5
Numpy        1.19.0
Matplotlib   3.2.1
Seaborn      0.10.1
Statsmodels  0.12.0
Sklearn      0.23.1
[email protected]
Y21IHWS8GO
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited. 57
References

Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice. OTexts. Available at: https://otexts.org/fpp2/
Tsay, R. S. (2005). Analysis of Financial Time Series (Vol. 543). John Wiley & Sons.
Mills, T. C., & Patterson, K. D. (2015). Modelling the trend: the historical origins of some modern methods and ideas. Journal of Economic Surveys, 29(3), 527-548.
Klein, J. L., & Klein, D. (1997). Statistical Visions in Time: A History of Time Series Analysis, 1662-1938. Cambridge University Press.
Nau, R. (2018). Statistical Forecasting: Notes on Regression and Time Series Analysis. Fuqua School of Business, Duke University. Available at: http://people.duke.edu/~rnau/411home.htm
Hyndman, R., Koehler, A. B., Ord, J. K., & Snyder, R. D. (2008). Forecasting with Exponential Smoothing: The State Space Approach. Springer Science & Business Media.
Coghlan, A. (2015). A Little Book of R for Time Series. Available at: https://media.readthedocs.org/pdf/a-little-book-of-r-for-time-series/latest/a-little-book-of-r-for-time-series.pdf
Tsay, R. S. (2014). An Introduction to Analysis of Financial Data with R. John Wiley & Sons.
Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time Series Analysis: Forecasting and Control (5th ed.). Hoboken, New Jersey: John Wiley & Sons.