TSA Project Python Code

The document discusses using a SARIMA model to forecast monthly stock prices. It shows the steps taken, which include differencing to make the data stationary, identifying model parameters using ACF and PACF plots, fitting a SARIMA(1,1,1)(1,0,1)12 model, making predictions for 4 years ahead, and performing diagnostic tests on the residuals.

In [2]: # Import Required Libraries


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox
from scipy.stats import ttest_ind
import statsmodels.api as sm

In [3]: # Read The Data From CSV File


df = pd.read_csv("C:/Jupyter Lab/data/SPX Monthly Data 2000 To 2019.csv")

# set date column as index


df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Extract Monthly Close Prices


close_monthly = df['Close'].resample('M').last()

In [4]: # Plot The Monthly Close Prices


plt.plot(close_monthly)
plt.xlabel('Year')
plt.ylabel('Close Price')
plt.title('Monthly Close Prices from 2000 to 2019')
plt.show()

Since the plot shows a clear upward trend, the data are not stationary. Because the SARIMA model assumes stationarity, we apply differencing to make the data stationary.

In [5]: # Perform Differencing To Make The Data Stationary


diff_monthly = close_monthly.diff().dropna()

# Plot The Differenced Data


plt.plot(diff_monthly)
plt.xlabel('Year')
plt.ylabel('Differenced Close Price')
plt.title('Differenced Monthly Close Prices from 2000 to 2019')
plt.show()
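
A formal stationarity check is not shown in the original notebook; as a complementary sketch, an Augmented Dickey-Fuller test from statsmodels could be applied to both the original and differenced series. The snippet below assumes close_monthly and diff_monthly from the cells above.

from statsmodels.tsa.stattools import adfuller

# ADF null hypothesis: the series has a unit root (non-stationary).
# A small p-value (< 0.05) supports stationarity.
for name, series in [('Original', close_monthly), ('Differenced', diff_monthly)]:
    adf_stat, p_value = adfuller(series.dropna())[:2]
    print(f'{name}: ADF statistic = {adf_stat:.3f}, p-value = {p_value:.4f}')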

In [6]: # Plot ACF And PACF To Determine SARIMA Parameters


fig, ax = plt.subplots(2, figsize=(12,8))
sm.graphics.tsa.plot_acf(diff_monthly, lags=30, ax=ax[0])
sm.graphics.tsa.plot_pacf(diff_monthly, lags=30, method='ywm', ax=ax[1])
plt.show()

# Print ACF and PACF values


acf_values = sm.tsa.stattools.acf(diff_monthly, nlags=30)
pacf_values = sm.tsa.stattools.pacf(diff_monthly, nlags=30, method='ywm')

print('ACF values:', acf_values)


print('PACF values:', pacf_values)

ACF values: [ 1. -0.05424451 0.01423008 0.01758467 -0.03799791 0.1462674
-0.07793367 0.06328135 0.13414535 -0.01721492 0.05255444 -0.02164625
-0.02944189 -0.02383239 -0.01902415 0.06196196 0.0384558 -0.00116111
-0.02437942 0.1108885 -0.05274602 -0.01206332 -0.07061918 0.01159541
0.03963217 -0.03796366 0.00800788 -0.02832515 -0.02506188 -0.02671741
0.00515232]
PACF values: [ 1. -0.05424451 0.01132093 0.01902033 -0.03631952 0.14248593
-0.06425536 0.05628265 0.13865735 0.00467306 0.02479729 0.00333099
-0.04712943 -0.0601706 -0.00357006 0.03472574 0.03285513 0.00629871
-0.02994631 0.12247964 -0.04395441 -0.00999909 -0.07376671 0.0032677
-0.00915684 -0.01225068 -0.01271558 -0.03274584 -0.01351006 -0.02857296
0.03796791]

In [7]: # Fit A SARIMA Model


model = SARIMAX(diff_monthly, order=(1,1,1), seasonal_order=(1,0,1,12))
results = model.fit()

Based on the ACF and PACF plots, there was no clear evidence of strong seasonality, and the spikes were not strong enough to infer a definitive seasonal component. The PACF plot showed a small spike at lag 6, but there were no other strong or consistent spikes suggesting a clear pattern in the autocorrelations. Thus, the SARIMA(1,1,1)(1,0,1)12 model parameters were chosen as a starting point.
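
If one wanted to validate this starting point, a small AIC-based search over nearby orders could be run. The sketch below is not part of the original analysis; it simply refits SARIMAX on the same differenced series for a few candidate (p, q) pairs and reports the lowest AIC.

import itertools

# Compare AIC across candidate non-seasonal (p, q) orders,
# keeping d=1 and the seasonal order (1, 0, 1, 12) fixed.
best_aic, best_order = float('inf'), None
for p, q in itertools.product(range(3), range(3)):
    try:
        candidate = SARIMAX(diff_monthly, order=(p, 1, q),
                            seasonal_order=(1, 0, 1, 12)).fit(disp=False)
    except Exception:
        continue
    if candidate.aic < best_aic:
        best_aic, best_order = candidate.aic, (p, 1, q)

print('Lowest AIC:', round(best_aic, 2), 'for non-seasonal order', best_order)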

The chosen model has the following components:

Non-seasonal component:

Autoregressive term (p=1): This accounts for the direct relationship between the current
value and the previous value in the time series.

Differencing term (d=1): This makes the time series stationary by taking the first difference of
the series.

Moving average term (q=1): This captures the relationship between the current value and
the residual error from the previous value.

Seasonal component:

Seasonal autoregressive term (P=1): This accounts for the direct relationship between the
current seasonal value and the seasonal value from the previous cycle.

Seasonal differencing term (D=0): No seasonal differencing is applied, as there is no clear
evidence of strong seasonality in the ACF and PACF plots.

Seasonal moving average term (Q=1): This captures the relationship between the current
seasonal value and the residual error from the previous seasonal value.

Seasonal period (s=12): This sets the seasonal period to 12 months, which is typical for
monthly data with potential yearly seasonality.
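
To see the estimated coefficient for each of these terms (ar.L1, ma.L1, ar.S.L12, ma.S.L12, sigma2), the fitted results object from cell In [7] can be summarized; a short sketch, not in the original notebook:

# Full coefficient table with standard errors and information criteria
print(results.summary())

# The estimates are also available programmatically
print(results.params)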

In [8]: # Make Predictions For The Next 4 Years


start_date = '2020-01-31'
end_date = '2023-12-31'
pred_monthly = results.predict(start=start_date, end=end_date)

# Plot The Predicted Values


plt.plot(pred_monthly)
plt.xlabel('Year')
plt.ylabel('Predicted Differenced Close Price')
plt.title('Predicted Monthly Returns from 2020 to 2023')
plt.show()
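
Because the model was fit on the differenced series, these forecasts are month-over-month changes rather than price levels. One way to visualize approximate price levels (a sketch, not part of the original code) is to cumulatively sum the predicted differences starting from the last observed close.

# Reconstruct approximate price levels from the predicted differences
pred_levels = close_monthly.iloc[-1] + pred_monthly.cumsum()

plt.plot(close_monthly, label='Observed')
plt.plot(pred_levels, label='Forecast (reconstructed levels)')
plt.xlabel('Year')
plt.ylabel('Close Price')
plt.title('Observed Prices and Reconstructed Forecast Levels')
plt.legend()
plt.show()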

In [13]: # Perform t-test On January Returns


jan_returns = pred_monthly[pred_monthly.index.month == 1]
other_returns = pred_monthly[pred_monthly.index.month != 1]
t_stat, p_value = ttest_ind(jan_returns, other_returns, equal_var=False)
print('\nNull hypothesis (H0): There is no significant difference between the mean returns of January and the mean returns of other months. In other words, the January Effect does not exist.')
print('\nAlternative hypothesis (H1): There is a significant difference between the mean returns of January and the mean returns of other months. This suggests that the January Effect exists.')
print('\nt-statistic:', t_stat)
print('\np-value:', p_value)

if p_value < 0.05:
    print('\nThe January effect exists')
else:
    print('\nThe January effect does not exist for the predicted values')

Null hypothesis (H0): There is no significant difference between the mean returns of January and the mean returns of other months. In other words, the January Effect does not exist.

Alternative hypothesis (H1): There is a significant difference between the mean returns of January and the mean returns of other months. This suggests that the January Effect exists.

t-statistic: -0.1618069189156481

p-value: 0.8804849944025712

The January effect does not exist for the predicted values

In [10]: # Perform Ljung-Box Test For Autocorrelations


lb_stat, lb_p_value = acorr_ljungbox(results.resid, lags=[12, 24, 36], return_df=False)
print('\nLjung-Box statistic (lag 12):', lb_stat[0])
print('\np-value (lag 12):', lb_p_value[0])
print('\nLjung-Box statistic (lag 24):', lb_stat[1])
print('\np-value (lag 24):', lb_p_value[1])
print('\nLjung-Box statistic (lag 36):', lb_stat[2])
print('\np-value (lag 36):', lb_p_value[2])

# Check For Significant Autocorrelation


if any(lb_p_value < 0.05):
    print('\nThere is significant autocorrelation in the residuals')
else:
    print('\nThere is no significant autocorrelation in the residuals')

Ljung-Box statistic (lag 12): 10.388090875990883

p-value (lag 12): 0.5819538907237682

Ljung-Box statistic (lag 24): 17.20902033750296

p-value (lag 24): 0.8396114070768761

Ljung-Box statistic (lag 36): 23.87893364865142

p-value (lag 36): 0.9393244429177144

There is no significant autocorrelation in the residuals

Since the Ljung-Box test shows no significant autocorrelation in the residuals at any of the
tested lags, we can conclude that the SARIMA(1,1,1)(1,0,1)12 model provides a good fit for the
data. The model captures the underlying patterns of the time series, and the residuals do not
contain significant autocorrelation that would need to be accounted for. Therefore, the model
can be used for forecasting and making predictions with reasonable accuracy.
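
As an additional visual check (not shown in the original notebook), the fitted statsmodels results object provides a built-in diagnostics figure with standardized residuals, a histogram, a Q-Q plot, and a correlogram.

# Visual residual diagnostics for the fitted SARIMA model
results.plot_diagnostics(figsize=(12, 8))
plt.show()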
