ARIMA Model – Complete Guide to Time Series Forecasting in Python
Using an ARIMA model, you can forecast a time series using the series' past values. In this post, we build an optimal ARIMA model from scratch and extend it to Seasonal ARIMA (SARIMA) and SARIMAX models. You will also see how to build autoarima models in python.

Contents
1. Introduction to Time Series Forecasting
2. Introduction to ARIMA Models
3. What does the p, d and q in ARIMA model mean?
4. What are AR and MA models
5. How to find the order of differencing (d) in ARIMA model
6. How to find the order of the AR term (p)
7. How to find the order of the MA term (q)
8. How to handle if a time series is slightly under or over differenced
9. How to build the ARIMA Model
10. How to find the optimal ARIMA model manually using Out-of-Time Cross validation
11. Accuracy Metrics for Time Series Forecast
12. How to do Auto Arima Forecast in Python
13. How to interpret the residual plots in ARIMA model
14. How to automatically build SARIMA model in python
15. How to build SARIMAX Model with exogenous variable
ARIMA Modeling
Any 'non-seasonal' time series that exhibits patterns and is not a random white noise can be modeled with ARIMA models.

Augmented Dickey-Fuller Test (ADF Test)
The null hypothesis of the ADF test is that the time series is non-stationary. So, if the p-value of the test is less than the significance level (0.05), you reject the null hypothesis and infer that the time series is indeed stationary.
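For instance, you can run the ADF test with statsmodels' adfuller. A minimal sketch, assuming df.value holds the series (loaded as shown further below in this post):

from statsmodels.tsa.stattools import adfuller

# ADF test: result[0] is the test statistic, result[1] the p-value
result = adfuller(df.value.dropna())
print('ADF Statistic:', result[0])
print('p-value:', result[1])  # p < 0.05 -> reject H0, series is stationary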
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Original Series
fig, axes = plt.subplots(3, 2, sharex=True)
axes[0, 0].plot(df.value); axes[0, 0].set_title('Original Series')
plot_acf(df.value, ax=axes[0, 1])

# 1st Differencing
axes[1, 0].plot(df.value.diff()); axes[1, 0].set_title('1st Order Differencing')
plot_acf(df.value.diff().dropna(), ax=axes[1, 1])

# 2nd Differencing
axes[2, 0].plot(df.value.diff().diff()); axes[2, 0].set_title('2nd Order Differencing')
plot_acf(df.value.diff().diff().dropna(), ax=axes[2, 1])

plt.show()
Order of Differencing
from pmdarima.arima.utils import ndiffs
y = df.value

## ADF Test
ndiffs(y, test='adf')   # 2

# KPSS test
ndiffs(y, test='kpss')  # 0

# PP test
ndiffs(y, test='pp')    # 2
Order of AR Term
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
plt.rcParams.update({'figure.figsize': (9, 3), 'figure.dpi': 120})

# Import data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/austa.csv')

# PACF plot of the 1st differenced series to read off the AR order (p)
plot_pacf(df.value.diff().dropna())
plt.show()
Order of MA Term
A couple of lags are well above the significance line. So, let's tentatively fix q as 2. When in doubt, go with the simpler model that sufficiently explains the Y.
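Just like the PACF for p, you can count the significant lags in the ACF of the differenced series to pick q. A minimal sketch, reusing df from above:

# ACF plot of the 1st differenced series; significant lags suggest q
fig, axes = plt.subplots(1, 2, sharex=True)
axes[0].plot(df.value.diff()); axes[0].set_title('1st Differencing')
plot_acf(df.value.diff().dropna(), ax=axes[1])
plt.show()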
                             ARIMA Model Results
==============================================================================
Dep. Variable:                D.value   No. Observations:                   99
Model:                 ARIMA(1, 1, 2)   Log Likelihood                -253.790
Method:                       css-mle   S.D. of innovations              3.119
Date:                Wed, 06 Feb 2019   AIC                            517.579
Time:                        23:32:56   BIC                            530.555
Sample:                             1   HQIC                           522.829

=================================================================================
                    coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------
const             1.1202      1.290      0.868      0.387      -1.409       3.649
ar.L1.D.value     0.6351      0.257      2.469      0.015       0.131       1.139
ma.L1.D.value     0.5287      0.355      1.489      0.140      -0.167       1.224
ma.L2.D.value    -0.0010      0.321     -0.003      0.998      -0.631       0.629
                                    Roots
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.5746           +0.0000j            1.5746            0.0000
MA.1           -1.8850           +0.0000j            1.8850            0.5000
MA.2          545.3515           +0.0000j          545.3515            0.0000
-----------------------------------------------------------------------------
                             ARIMA Model Results
==============================================================================
Dep. Variable:                D.value   No. Observations:                   99
Model:                 ARIMA(1, 1, 1)   Log Likelihood                -253.790
Method:                       css-mle   S.D. of innovations              3.119
Date:                Sat, 09 Feb 2019   AIC                            515.579
Time:                        12:16:06   BIC                            525.960
Sample:                             1   HQIC                           519.779

=================================================================================
                    coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------
const             1.1205      1.286      0.871      0.386      -1.400       3.641
ar.L1.D.value     0.6344      0.087      7.317      0.000       0.464       0.804
ma.L1.D.value     0.5297      0.089      5.932      0.000       0.355       0.705
                                    Roots
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.5764           +0.0000j            1.5764            0.0000
MA.1           -1.8879           +0.0000j            1.8879            0.5000
-----------------------------------------------------------------------------
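Let's review the residual errors next. A minimal sketch of the residual and density plots, assuming model_fit holds the fitted ARIMA(1, 1, 1) result from the summary above:

# Plot residual errors and their density (assumes `model_fit` is the
# fitted ARIMA(1, 1, 1) results object)
residuals = pd.DataFrame(model_fit.resid)
fig, ax = plt.subplots(1, 2)
residuals.plot(title="Residuals", ax=ax[0])
residuals.plot(kind='kde', title='Density', ax=ax[1])
plt.show()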
Residuals Density
The residual errors seem fine, with near-zero mean and uniform variance. Let's plot the actuals against the fitted values using plot_predict().
# Actual vs Fitted
model_fit.plot_predict(dynamic=False)
plt.show()
Actual vs Fitted
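The next block assumes the series has been split into training and test sets. A minimal sketch of such a split (the 85/15 cut-off is illustrative), using the legacy statsmodels ARIMA API that matches the fit/forecast calls below:

from statsmodels.tsa.arima_model import ARIMA  # legacy API (statsmodels < 0.13)

# Create training and test sets (85/15 split is illustrative)
train = df.value[:85]
test = df.value[85:]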
# Build Model
# model = ARIMA(train, order=(3,2,1))
model = ARIMA(train, order=(1, 1, 1))
fitted = model.fit(disp=-1)

# Forecast
fc, se, conf = fitted.forecast(15, alpha=0.05)  # 95% conf

# Make as pandas series for plotting
fc_series = pd.Series(fc, index=test.index)
lower_series = pd.Series(conf[:, 0], index=test.index)
upper_series = pd.Series(conf[:, 1], index=test.index)

# Plot
plt.figure(figsize=(12, 5), dpi=100)
plt.plot(train, label='training')
plt.plot(test, label='actual')
plt.plot(fc_series, label='forecast')
plt.fill_between(lower_series.index, lower_series, upper_series,
                 color='k', alpha=.15)
plt.title('Forecast vs Actuals')
plt.legend(loc='upper left', fontsize=8)
plt.show()
Forecast vs Actuals
# Build Model
model = ARIMA(train, order=(3, 2, 1))
fitted = model.fit(disp=-1)
print(fitted.summary())

# Forecast
fc, se, conf = fitted.forecast(15, alpha=0.05)  # 95% conf

# Make as pandas series for plotting
fc_series = pd.Series(fc, index=test.index)
lower_series = pd.Series(conf[:, 0], index=test.index)
upper_series = pd.Series(conf[:, 1], index=test.index)

# Plot
plt.figure(figsize=(12, 5), dpi=100)
plt.plot(train, label='training')
plt.plot(test, label='actual')
plt.plot(fc_series, label='forecast')
plt.fill_between(lower_series.index, lower_series, upper_series,
                 color='k', alpha=.15)
plt.title('Forecast vs Actuals')
plt.legend(loc='upper left', fontsize=8)
plt.show()
                             ARIMA Model Results
==============================================================================
Dep. Variable:               D2.value   No. Observations:                   83
Model:                 ARIMA(3, 2, 1)   Log Likelihood                -214.248
Method:                       css-mle   S.D. of innovations              3.153
Date:                Sat, 09 Feb 2019   AIC                            440.497
Time:                        12:49:01   BIC                            455.010
Sample:                             2   HQIC                           446.327

==================================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
const              0.0483      0.084      0.577      0.565      -0.116       0.212
ar.L1.D2.value     1.1386      0.109     10.399      0.000       0.924       1.353
ar.L2.D2.value    -0.5923      0.155     -3.827      0.000      -0.896      -0.289
ar.L3.D2.value     0.3079      0.111      2.778      0.007       0.091       0.525
ma.L1.D2.value    -1.0000      0.035    -28.799      0.000      -1.068      -0.932
                                    Roots
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.1557           -0.0000j            1.1557           -0.0000
AR.2            0.3839           -1.6318j            1.6763           -0.2132
AR.3            0.3839           +1.6318j            1.6763            0.2132
MA.1            1.0000           +0.0000j            1.0000            0.0000
-----------------------------------------------------------------------------
import numpy as np
from statsmodels.tsa.stattools import acf

# Accuracy metrics
def forecast_accuracy(forecast, actual):
    mape = np.mean(np.abs(forecast - actual)/np.abs(actual))  # MAPE
    me = np.mean(forecast - actual)                           # ME
    mae = np.mean(np.abs(forecast - actual))                  # MAE
    mpe = np.mean((forecast - actual)/actual)                 # MPE
    rmse = np.mean((forecast - actual)**2)**.5                # RMSE
    corr = np.corrcoef(forecast, actual)[0, 1]                # corr
    mins = np.amin(np.hstack([forecast[:, None], actual[:, None]]), axis=1)
    maxs = np.amax(np.hstack([forecast[:, None], actual[:, None]]), axis=1)
    minmax = 1 - np.mean(mins/maxs)                           # minmax
    acf1 = acf(forecast - actual)[1]                          # ACF1 of the errors
    return({'mape': mape, 'me': me, 'mae': mae,
            'mpe': mpe, 'rmse': rmse, 'acf1': acf1,
            'corr': corr, 'minmax': minmax})

forecast_accuracy(fc, test.values)
import pmdarima as pm

model = pm.auto_arima(df.value, start_p=1, start_q=1,
                      test='adf',        # use adftest to find optimal 'd'
                      max_p=3, max_q=3,  # maximum p and q
                      m=1,               # frequency of series
                      d=None,            # let model determine 'd'
                      seasonal=False,    # No Seasonality
                      start_P=0,
                      D=0,
                      trace=True,
                      error_action='ignore',
                      suppress_warnings=True,
                      stepwise=True)

print(model.summary())

model.plot_diagnostics(figsize=(7, 5))
plt.show()
Residuals Chart
Bottom left: All the dots should fall in line with the red line. Any significant deviations would imply the distribution is skewed.

Bottom right: The Correlogram, aka the ACF plot, shows the residual errors are not autocorrelated. Any autocorrelation would imply that there is some pattern in the residual errors which is not explained by the model. So you will need to look for more X's (predictors) for the model.
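To back that visual check numerically, you could run a Ljung-Box test on the residuals. A sketch, assuming model is the fitted auto_arima object from above:

from statsmodels.stats.diagnostic import acorr_ljungbox

# Ljung-Box test on the residuals of the fitted auto_arima model;
# large p-values mean no evidence of leftover autocorrelation
resid = pd.Series(model.resid())
print(acorr_ljungbox(resid, lags=[10]))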
# Forecast
n_periods = 24
fc, confint = model.predict(n_periods=n_periods, return_conf_int=True)
index_of_fc = np.arange(len(df.value), len(df.value) + n_periods)

# Make series for plotting purpose
fc_series = pd.Series(fc, index=index_of_fc)
lower_series = pd.Series(confint[:, 0], index=index_of_fc)
upper_series = pd.Series(confint[:, 1], index=index_of_fc)

# Plot
plt.plot(df.value)
plt.plot(fc_series, color='darkgreen')
plt.fill_between(lower_series.index,
                 lower_series,
                 upper_series,
                 color='k', alpha=.15)
plt.show()
# Import
data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                   parse_dates=['date'], index_col='date')

# Plot
fig, axes = plt.subplots(2, 1, figsize=(10, 5), dpi=100, sharex=True)

# Usual Differencing
axes[0].plot(data[:], label='Original Series')
axes[0].plot(data[:].diff(1), label='Usual Differencing')
axes[0].set_title('Usual Differencing')
axes[0].legend(loc='upper left', fontsize=10)

# Seasonal Differencing
axes[1].plot(data[:], label='Original Series')
axes[1].plot(data[:].diff(12), label='Seasonal Differencing', color='green')
axes[1].set_title('Seasonal Differencing')
plt.legend(loc='upper left', fontsize=10)
plt.suptitle('a10 - Drug Sales', fontsize=16)
plt.show()
Seasonal Differencing
# Seasonal - fit stepwise auto-ARIMA (the leading arguments here are an
# assumption, mirroring the SARIMAX call further below)
smodel = pm.auto_arima(data, start_p=1, start_q=1,
                       test='adf',
                       max_p=3, max_q=3, m=12,
                       start_P=0, seasonal=True,
                       d=None, D=1, trace=True,
                       error_action='ignore',
                       suppress_warnings=True,
                       stepwise=True)

smodel.summary()
# Forecast
n_periods = 24
fitted, confint = smodel.predict(n_periods=n_periods, return_conf_int=True)
index_of_fc = pd.date_range(data.index[-1], periods=n_periods, freq='MS')

# Make series for plotting purpose
fitted_series = pd.Series(fitted, index=index_of_fc)
lower_series = pd.Series(confint[:, 0], index=index_of_fc)
upper_series = pd.Series(confint[:, 1], index=index_of_fc)

# Plot
plt.plot(data)
plt.plot(fitted_series, color='darkgreen')
plt.fill_between(lower_series.index,
                 lower_series,
                 upper_series,
                 color='k', alpha=.15)
plt.show()
So, you will always know what values the seasonal index will hold for future forecasts.
from statsmodels.tsa.seasonal import seasonal_decompose

# Import Data
data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                   parse_dates=['date'], index_col='date')

# Multiplicative seasonal decomposition
# (assumption: decompose the full 'value' series)
result_mul = seasonal_decompose(data['value'],
                                model='multiplicative',
                                extrapolate_trend='freq')

seasonal_index = result_mul.seasonal[-12:].to_frame()
seasonal_index['month'] = pd.to_datetime(seasonal_index.index).month

# merge with the base data
data['month'] = data.index.month
df = pd.merge(data, seasonal_index, how='left', on='month')
df.columns = ['value', 'month', 'seasonal_index']
df.index = data.index  # reassign the index
import pmdarima as pm

# SARIMAX Model
sxmodel = pm.auto_arima(df[['value']],
                        exogenous=df[['seasonal_index']],
                        start_p=1, start_q=1,
                        test='adf',
                        max_p=3, max_q=3, m=12,
                        start_P=0, seasonal=True,
                        d=None, D=1, trace=True,
                        error_action='ignore',
                        suppress_warnings=True,
                        stepwise=True)

sxmodel.summary()
So, we have the model with the exogenous term. But the coefficient of x1 is very small, so the contribution from that variable will be negligible. Let's forecast it anyway.
# Forecast (assumption: 24 months ahead, tiling the 12-month seasonal
# index twice to supply the exogenous regressor for the forecast horizon)
n_periods = 24
fitted, confint = sxmodel.predict(n_periods=n_periods,
                                  exogenous=np.tile(seasonal_index.value, 2).reshape(-1, 1),
                                  return_conf_int=True)
index_of_fc = pd.date_range(data.index[-1], periods=n_periods, freq='MS')

# Make series for plotting purpose
fitted_series = pd.Series(fitted, index=index_of_fc)
lower_series = pd.Series(confint[:, 0], index=index_of_fc)
upper_series = pd.Series(confint[:, 1], index=index_of_fc)

# Plot
plt.plot(data['value'])
plt.plot(fitted_series, color='darkgreen')
plt.fill_between(lower_series.index,
                 lower_series,
                 upper_series,
                 color='k', alpha=.15)
plt.show()
17. Conclusion
Congrats if you reached this point. Give yourself a BIG
hug if you were able to solve the practice exercises.
Happy Learning!