Guide To Time Series Analysis With Python - 4 - ARIMA and SARIMA - by Buse Köseoğlu - Medium
https://buse-koseoglu13.medium.com/guide-to-time-series-analysis-with-python-4-arima-and-sarima-d62bcdcfb64a 1/27
11/30/24, 8:15 PM Guide to Time Series Analysis with Python — 4: ARIMA and SARIMA | by Buse Köseoğlu | Medium
You can find the full code for this article on GitHub. If you are ready, let's get
started.
What is ARIMA(p,d,q)?
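As the name suggests, ARIMA (AutoRegressive Integrated Moving Average) combines an
autoregressive (AR) model and a moving average (MA) model with an order of integration (I).
AR: an autoregressive process says that past values of the time series
affect the present value.
MA: a moving average process says that the current value depends on
the current and past error terms.
p: the order of the AR part (the lag order), i.e., how many past values we look back at.
d: the order of integration, i.e., how many times the series is differenced to make it stationary.
q: the order of the MA part, i.e., how many past error terms the model includes.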
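For reference, the textbook ARIMA(p, d, q) model can be written with the lag operator $L$, where $L y_t = y_{t-1}$ (any constant term is omitted here):

$$
\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d \, y_t = \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t
$$

Here $\phi_i$ are the AR coefficients, $\theta_j$ the MA coefficients and $\varepsilon_t$ is white noise. The factor $(1 - L)^d$ applies d rounds of differencing, so with $d = 0$ the model reduces to an ARMA(p, q) model.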
In fact, compared with the AR and MA models, the only new parameter here is d.
The diagram below shows how the data should be examined once we have it.
First of all, we need to check whether the data is stationary.
After applying the transformations needed to make the data stationary (you can find
them in the first article), we determine the d parameter of ARIMA: it is the number
of times we had to difference the series to make it stationary. Next, to find good
values for the p and q parameters, we can create a list of candidate values for each
and fit a model for every (p, q) combination. The model with the smallest AIC
(Akaike Information Criterion) can be selected as the best model. There is an important
point here: if we want to compare models by AIC, d must be kept constant, i.e., the
same d must be used in every model. Differencing changes the series on which the
likelihood is computed, so AIC values of models with different d are not comparable.
Choosing the model with the lowest AIC is not the end of the process, though. We also
need to perform a residual analysis on the chosen model, which helps assess whether it
adequately captures the underlying patterns in the time series data.
What is SARIMA(p,d,q)(P,D,Q)m?
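The SARIMA model adds four parameters to ARIMA: P, D and Q are the seasonal
counterparts of p, d and q, and m is the number of observations per seasonal cycle
(for example, m = 12 for monthly data with a yearly pattern). These parameters
allow the model to capture seasonality.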
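In lag-operator notation ($L y_t = y_{t-1}$), the standard multiplicative SARIMA(p,d,q)(P,D,Q)m model multiplies the non-seasonal polynomials by seasonal polynomials in $L^m$ (no constant term shown):

$$
\Phi_P(L^m)\,\phi_p(L)\,(1 - L)^d\,(1 - L^m)^D\,y_t = \Theta_Q(L^m)\,\theta_q(L)\,\varepsilon_t
$$

where

$$
\phi_p(L) = 1 - \sum_{i=1}^{p} \phi_i L^i,\quad
\theta_q(L) = 1 + \sum_{j=1}^{q} \theta_j L^j,\quad
\Phi_P(L^m) = 1 - \sum_{i=1}^{P} \Phi_i L^{im},\quad
\Theta_Q(L^m) = 1 + \sum_{j=1}^{Q} \Theta_j L^{jm}.
$$

With P = D = Q = 0 the seasonal factors equal 1 and the model is an ordinary ARIMA(p, d, q). The factor $(1 - L^m)^D$ is seasonal differencing: for monthly data with m = 12 and D = 1, it subtracts the value from the same month one year earlier.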
CODE PRACTICE
Now let’s do all this in practice. We will use auto_arima from the pmdarima
library to determine the optimal model.
For the examples we will use the Air Passengers dataset available on Kaggle.
It contains the monthly number of airline passengers from 1949 to 1960:
144 rows and 2 columns (the month and the number of passengers).
When we plot the number of passengers over time, we can clearly see an
upward trend and a repeating yearly (seasonal) pattern.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("AirPassengers.csv")
df.rename(columns={"Month": "month", "#Passengers": "passengers"}, inplace=True)
df.head()

plt.figure(figsize=(15, 4))
plt.plot(df["month"], df["passengers"]);
plt.xlabel('Timesteps');
plt.ylabel('Value');
plt.xticks(df['month'][::10]);
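# Decomposition (assumed here: STL with a 12-month seasonal period)
from statsmodels.tsa.seasonal import STL

advanced_decomposition = STL(df["passengers"], period=12).fit()

# One panel per component: observed, trend, seasonal, residuals
fig, (ax1, ax2, ax3, ax4) = plt.subplots(nrows=4, ncols=1, sharex=True, figsize=(10, 8))
ax1.plot(advanced_decomposition.observed)
ax1.set_ylabel('Observed')
ax2.plot(advanced_decomposition.trend)
ax2.set_ylabel('Trend')
ax3.plot(advanced_decomposition.seasonal)
ax3.set_ylabel('Seasonal')
ax4.plot(advanced_decomposition.resid)
ax4.set_ylabel('Residuals')
fig.autofmt_xdate()
plt.tight_layout()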
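We apply the Augmented Dickey-Fuller (ADF) test to decide whether the data is stationary.
Its null hypothesis is that the series has a unit root, i.e., that it is not stationary.
from statsmodels.tsa.stattools import adfuller
def adfuller_test(y):
    adf_result = adfuller(y)  # (ADF statistic, p-value, ...)
    print(f"ADF Statistic: {adf_result[0]:.4f}, p-value: {adf_result[1]:.4g}")

adfuller_test(df.passengers)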
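When we apply the test to the passengers column, the p-value is greater than 0.05,
so we cannot reject the null hypothesis: the series is not stationary. We can apply
differencing to make the data stationary.
adfuller_test(np.diff(df["passengers"], n=1))  # d = 1
print("*"*50)
# d = 2: np.diff with n=2 differences the series twice
adfuller_test(np.diff(df["passengers"], n=2))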
First-order differencing did not make the data stationary: the p-value is still
greater than 0.05, so we difference again. After the second difference the data
is stationary, so we set d = 2.
train = df[:-12]
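test = df[-12:].copy()  # copy so that prediction columns can be added later without a SettingWithCopyWarning
We split the data into a training set and a test set, keeping the last 12 months
for testing.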
from pmdarima import auto_arima

ARIMA_model = auto_arima(train['passengers'],
                         start_p=1,
                         start_q=1,
                         test='adf',          # unit-root test used to pick 'd' (not used here because d is given)
                         max_p=13, max_q=13,  # maximum p and q
                         m=1,                 # frequency of the series (if m == 1, seasonal is set to False)
                         d=2,                 # order of integration found with the ADF test above
                         seasonal=False,      # no seasonality for a standard ARIMA
                         trace=True,          # log every model that is tried
                         error_action='warn', # warn when a model fails to fit ('ignore' silences these)
                         suppress_warnings=True,
                         stepwise=True)       # stepwise search instead of an exhaustive grid
ARIMA_model.summary()
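Model parameters:
start_p: the value of p (the order of the AR(p) part) at which the search starts.
start_q: the value of q (the order of the MA(q) part) at which the search starts.
max_p, max_q: the largest p and q values the search may try.
d: the order of integration, fixed here to the value found with the ADF test.
seasonal, m: whether to include seasonal terms and the seasonal period; with
seasonal=False the search only fits plain ARIMA models.
stepwise: use a stepwise search instead of trying every combination, which is much faster.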
When we run the search, auto_arima reports the best model as
SARIMAX(4, 2, 0). It is labeled SARIMAX because pmdarima fits every model,
seasonal or not, through the SARIMAX class of statsmodels. Since this model has no
seasonal terms and no exogenous variables, SARIMAX(4, 2, 0) is simply
ARIMA(4, 2, 0).
To check the residuals, we look at the diagnostic plots:
Histogram plus estimated density: the KDE curve should be very close to the normal
distribution (labeled N(0,1) in the plot).
Normal Q-Q plot: most of the data points should lie on the straight line.
ARIMA_model.plot_diagnostics(figsize=(10,7))
plt.show()
The next step is to run the Ljung-Box test on the residuals to check whether
they are uncorrelated. Its null hypothesis is that there is no autocorrelation
up to the tested lag.
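from statsmodels.stats.diagnostic import acorr_ljungbox
ARIMA_model_fit = ARIMA_model.arima_res_  # assumed: the statsmodels results object inside the fitted pmdarima model
residuals = ARIMA_model_fit.resid
acorr_ljungbox(residuals, np.arange(1, 11, 1))  # test lags 1 to 10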
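Here we want every p-value to be greater than 0.05, but that is not the case for the
values we obtained: the ARIMA residuals are still autocorrelated, so the model has not
captured all of the structure in the data.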
To compare the ARIMA and SARIMA models, we add the predictions of
both, together with a naive seasonal baseline, to the test data.
test['naive_seasonal'] = df['passengers'].iloc[120:132].values
ARIMA_pred = ARIMA_model_fit.get_prediction(132, 143).predicted_mean
test['ARIMA_pred'] = ARIMA_pred
“naive_seasonal” is a simple baseline: it uses the last 12 months of the training
data as the forecast for the 12 months in the test set, i.e., it assumes that each
month repeats its value from one year earlier. ARIMA_pred contains the ARIMA
model's predictions for those 12 months.
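The naive seasonal baseline generalizes to any season length m and horizon: repeat the last m observations as many times as needed. A minimal sketch with a hypothetical helper name, `naive_seasonal_forecast`:

```python
import numpy as np


def naive_seasonal_forecast(history, m, horizon):
    """Repeat the last m observations (one full season) to cover `horizon` steps."""
    last_season = np.asarray(history, dtype=float)[-m:]
    # np.resize tiles last_season cyclically until it has `horizon` elements
    return np.resize(last_season, horizon)


# With 132 training months, m = 12 and a 12-month horizon, this is exactly
# the slice used for test['naive_seasonal'] (positions 120 to 131)
history = np.arange(132)
print(naive_seasonal_forecast(history, m=12, horizon=12))
```

Despite its simplicity, this baseline is worth keeping: a seasonal model that cannot beat it is not adding much.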
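adfuller_test(np.diff(df["passengers"], n=1))  # d = 1
adfuller_test(df["passengers"].diff(1).diff(12).dropna())  # d = 1, D = 1: seasonal difference at lag 12
Here we first take a first-order difference. Since the data is still not stationary, we
then take a seasonal difference at lag 12 (the frequency of the data, i.e., the m parameter),
subtracting from each value the value 12 months earlier. Note that this is not the same as
passing n=12 to np.diff, which would apply first-order differencing 12 times. After these
two steps the data is stationary, and the parameters are:
d=1
D=1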
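The d = 1, D = 1 transformation is easy to express with pandas, where `Series.diff(k)` computes y_t - y_{t-k}. A minimal sketch (the helper name `difference` is hypothetical), checked on a synthetic series made of a linear trend plus a pattern that repeats every 12 steps; both should be removed completely:

```python
import numpy as np
import pandas as pd


def difference(y, d=1, D=1, m=12):
    """Apply d first differences, then D seasonal differences at lag m."""
    series = pd.Series(y, dtype=float)
    for _ in range(d):
        series = series.diff(1)   # y_t - y_{t-1}
    for _ in range(D):
        series = series.diff(m)   # y_t - y_{t-m}
    return series.dropna()        # the first d + D*m values are undefined


# Synthetic check: linear trend + a 12-step repeating pattern
pattern = np.array([5, 3, 8, 1, 0, 2, 7, 9, 4, 6, 2, 1], dtype=float)
t = np.arange(48)
y = 2.0 * t + np.tile(pattern, 4)

stationary = difference(y, d=1, D=1, m=12)
print(len(stationary), stationary.abs().max())
```

The first difference turns the trend into a constant, and the lag-12 difference then cancels both that constant and the repeating pattern, leaving only zeros here; on real data, what remains is the irregular part the SARIMA terms have to model.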
Q-Q plot: most of the data points should lie on the straight line, and here they
form a fairly straight line.
residuals = SARIMA_model_fit.resid
acorr_ljungbox(residuals, np.arange(1, 11, 1))
The returned p-values are all greater than 0.05. Therefore, we do not reject the
null hypothesis of no autocorrelation, and we conclude that the residuals are
uncorrelated, just like white noise.
Our model has passed all the tests from the residuals analysis, and we are
ready to use it for forecasting.
We can visualize the results of both models and observe which one gives
results closer to reality.
fig, ax = plt.subplots()
ax.plot(df['month'], df['passengers'])
ax.plot(test['passengers'], 'b-', label='actual')
ax.plot(test['naive_seasonal'], 'r:', label='naive seasonal')
ax.plot(test['ARIMA_pred'], 'k--', label='ARIMA')
ax.plot(test['SARIMA_pred'], 'g-.', label='SARIMA')
ax.set_xlabel('Date')
ax.set_ylabel('Number of air passengers')
ax.axvspan(132, 143, color='#808080', alpha=0.2)
ax.legend(loc=2)
plt.xticks(np.arange(0, 145, 12), np.arange(1949, 1962, 1))
ax.set_xlim(110, 143)
fig.autofmt_xdate()
plt.tight_layout()
The gray area in the graph above marks the test period, and the blue line shows the
actual values. ARIMA's forecasts are far from the actual values, while SARIMA's
predictions follow them closely: adding seasonality clearly improves the model.
We can also evaluate the models with MAPE (Mean Absolute Percentage Error).
MAPE measures the average percentage deviation from the actual values, so the lower it is, the better.
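MAPE is the mean of |actual - predicted| / |actual|, expressed in percent. A minimal NumPy sketch (the function name `mape` is our own choice; it is undefined when an actual value is 0):

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent; undefined when y_true contains 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


# Both forecasts below are off by 10% of the actual value
print(mape([100, 200], [110, 180]))
```

On the article's test set, the same function would be applied as, for example, `mape(test['passengers'], test['SARIMA_pred'])`, and likewise for the naive seasonal and ARIMA columns.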
When we examine the metric, we can see that the best model is the SARIMA
model.