Week 1 Time Series PDF
Collecting pmdarima
Downloading pmdarima-2.0.4-cp310-cp310-
manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl.metadata
(7.8 kB)
Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.10/site-
packages (from pmdarima) (1.4.2)
Requirement already satisfied: Cython!=0.29.18,!=0.29.31,>=0.29 in
/opt/conda/lib/python3.10/site-packages (from pmdarima) (3.0.10)
Requirement already satisfied: numpy>=1.21.2 in /opt/conda/lib/python3.10/site-
packages (from pmdarima) (1.26.4)
Requirement already satisfied: pandas>=0.19 in /opt/conda/lib/python3.10/site-
packages (from pmdarima) (2.2.3)
Requirement already satisfied: scikit-learn>=0.22 in
/opt/conda/lib/python3.10/site-packages (from pmdarima) (1.2.2)
Requirement already satisfied: scipy>=1.3.2 in /opt/conda/lib/python3.10/site-
packages (from pmdarima) (1.14.1)
Requirement already satisfied: statsmodels>=0.13.2 in
/opt/conda/lib/python3.10/site-packages (from pmdarima) (0.14.2)
Requirement already satisfied: urllib3 in /opt/conda/lib/python3.10/site-
packages (from pmdarima) (1.26.18)
Requirement already satisfied: setuptools!=50.0.0,>=38.6.0 in
/opt/conda/lib/python3.10/site-packages (from pmdarima) (70.0.0)
Requirement already satisfied: packaging>=17.1 in
/opt/conda/lib/python3.10/site-packages (from pmdarima) (21.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in
/opt/conda/lib/python3.10/site-packages (from packaging>=17.1->pmdarima) (3.1.2)
Requirement already satisfied: python-dateutil>=2.8.2 in
/opt/conda/lib/python3.10/site-packages (from pandas>=0.19->pmdarima)
(2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.10/site-
packages (from pandas>=0.19->pmdarima) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /opt/conda/lib/python3.10/site-
packages (from pandas>=0.19->pmdarima) (2024.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in
/opt/conda/lib/python3.10/site-packages (from scikit-learn>=0.22->pmdarima)
(3.5.0)
Requirement already satisfied: patsy>=0.5.6 in /opt/conda/lib/python3.10/site-
packages (from statsmodels>=0.13.2->pmdarima) (0.5.6)
Requirement already satisfied: six in /opt/conda/lib/python3.10/site-packages
(from patsy>=0.5.6->statsmodels>=0.13.2->pmdarima) (1.16.0)
Downloading pmdarima-2.0.4-cp310-cp310-
manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (2.1 MB)
Installing collected packages: pmdarima
Successfully installed pmdarima-2.0.4
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import fftpack
from scipy.stats import boxcox
from sklearn.metrics import mean_absolute_error, mean_squared_error
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from pmdarima import auto_arima
from prophet import Prophet

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")
[4]: data.head()
[5]: (1511, 6)
[6]: data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1511 entries, 0 to 1510
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 1511 non-null datetime64[ns]
1 Open 1511 non-null float64
2 High 1511 non-null float64
3 Low 1511 non-null float64
4 Close 1511 non-null float64
5 Volume 1511 non-null int64
dtypes: datetime64[ns](1), float64(4), int64(1)
memory usage: 71.0 KB
             Close        Volume
count  1511.000000  1.511000e+03
mean    107.422091  3.019863e+07
min      40.290000  1.016120e+05
25%      57.855000  2.136213e+07
50%      93.860000  2.662962e+07
75%     138.965000  3.431962e+07
max     244.990000  1.352271e+08
std      56.702299  1.425266e+07
Visualize the trend, seasonality, and residual components using appropriate plots.
Analyze the components and interpret the results.
[8]: Date 0
Open 0
High 0
Low 0
Close 0
Volume 0
dtype: int64
    # The ADF null hypothesis is a unit root, so a small p-value indicates stationarity
    if adf[1] < 0.05:
        print(f"The '{col}' column series is stationary")
    else:
        print(f"The '{col}' column series is non-stationary")
    print('-'*100)
print('p-value:', kpss_test[1])
print(f"Lags used: {kpss_test[2]}")
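ADF and KPSS test opposite null hypotheses (unit root versus stationarity), so their p-values are read in opposite directions. A minimal sketch of one common way to combine the two verdicts is shown below; the function name and the wording of the labels are assumptions, not part of the notebook.

```python
def combine_adf_kpss(adf_p, kpss_p, alpha=0.05):
    """Combine ADF and KPSS p-values into a single verdict.

    ADF null: the series has a unit root (non-stationary).
    KPSS null: the series is stationary.
    """
    adf_rejects_unit_root = adf_p < alpha
    kpss_rejects_stationarity = kpss_p < alpha
    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "non-stationary"
    if not adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "trend-stationary (tests disagree; consider detrending)"
    return "difference-stationary (tests disagree; consider differencing)"

# Illustrative p-values, not taken from the notebook's output
print(combine_adf_kpss(adf_p=0.60, kpss_p=0.01))
print(combine_adf_kpss(adf_p=0.001, kpss_p=0.10))
```

Note that statsmodels reports KPSS p-values from an interpolated table, so they are bounded to the range 0.01 to 0.10.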
1.0.6 Perform time series decomposition to extract the trend, seasonality, and residual components of the stock prices.
# Print results
print(f"Time period for '{col}' column series")
print(f"Detected peak frequency: {peak_frequency:.4f} cycles per day")
print(f"Estimated period: {time_period:.2f} days")
print(f'Time period in data set: {len(data)} days')
--------------------------------------------------------------------------------
--------------------
Time period for 'High' column series
Detected peak frequency: 0.0007 cycles per day
Estimated period: 1511.00 days
Time period in data set: 1511 days
--------------------------------------------------------------------------------
--------------------
Time period for 'Low' column series
Detected peak frequency: 0.0007 cycles per day
Estimated period: 1511.00 days
Time period in data set: 1511 days
--------------------------------------------------------------------------------
--------------------
Time period for 'Close' column series
Detected peak frequency: 0.0007 cycles per day
Estimated period: 1511.00 days
Time period in data set: 1511 days
--------------------------------------------------------------------------------
--------------------
Time period for 'Volume' column series
Detected peak frequency: 0.0007 cycles per day
Estimated period: 1511.00 days
Time period in data set: 1511 days
--------------------------------------------------------------------------------
--------------------
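The estimated period of 1511 days reported above equals the length of the series, which usually means the spectrum is dominated by the trend rather than by a seasonal cycle. The sketch below, on synthetic data with a known 30-day cycle, shows how removing a linear trend before the FFT changes the detected peak. The helper `dominant_period` and the choice of a linear detrend are assumptions for illustration.

```python
import numpy as np

def dominant_period(values, detrend=True):
    """Return (peak_frequency, period) of the strongest non-zero FFT component."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    if detrend:
        # Remove a least-squares straight line so the trend does not dominate
        t = np.arange(n)
        slope, intercept = np.polyfit(t, x, 1)
        x = x - (slope * t + intercept)
    else:
        x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per sample (per day here)
    peak = 1 + np.argmax(power[1:])    # skip the zero-frequency bin
    return freqs[peak], 1.0 / freqs[peak]

# Synthetic series: upward trend plus a 30-day cycle
t = np.arange(1500)
series = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 30)

freq_raw, period_raw = dominant_period(series, detrend=False)
freq_dt, period_dt = dominant_period(series, detrend=True)
print(f"Without detrending: period = {period_raw:.2f} days")
print(f"With detrending:    period = {period_dt:.2f} days")
```

Without detrending, the peak falls in the lowest frequency bin and the period comes out as the series length, mirroring the notebook output; after detrending, the 30-day cycle is recovered.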
diff_dataframe = pd.DataFrame()
        print(f"since series became stationary in first differencing, the value of, d={d}")
    else:
        print(f"The '{col + '_first_diff'}' column series is non-stationary")
    print('-'*100)
1.1 Additive Decomposition
[15]: period = 1  # period=1 means no seasonal cycle: the trend equals the series and the seasonal part is zero
figsize = (10,5)
plt.figure(figsize=figsize)
for col in data.columns[1:]:
    print(f'Additive decomposition plot of {col} column')
    decomposition_additive = seasonal_decompose(data[col], model='additive', period=period)
    decomposition_additive.plot()
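To see why the choice of `period` matters, here is a minimal NumPy sketch of a classical additive decomposition (centred moving-average trend plus per-phase seasonal means). It is a simplified stand-in written for illustration, not the statsmodels implementation, and the 5-day cycle is an assumed example for trading-day data.

```python
import numpy as np

def additive_decompose(x, period):
    """Classical additive decomposition: moving-average trend, seasonal means, remainder."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Centred moving-average weights (half weights at both ends for even periods)
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(x, weights, mode="valid")
    detrended = x - trend
    # Average the detrended values at each position within the cycle
    phase_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    phase_means -= phase_means.mean()
    seasonal = np.tile(phase_means, n // period + 1)[:n]
    return trend, seasonal, x - trend - seasonal

# Synthetic series: linear trend plus a repeating 5-step pattern
t = np.arange(100)
weekly = np.array([1.0, -0.5, 0.0, 1.5, -2.0])  # sums to zero
series = 50 + 0.3 * t + np.tile(weekly, 20)

trend1, seasonal1, resid1 = additive_decompose(series, period=1)
trend5, seasonal5, resid5 = additive_decompose(series, period=5)
print("period=1 seasonal range:", seasonal1.min(), seasonal1.max())
print("period=5 seasonal pattern:", np.round(seasonal5[:5], 3))
```

With `period=1` the trend simply reproduces the series and the seasonal and residual parts are all zero, so the plots carry no extra information; with a period that matches the cycle, the pattern is recovered.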
[17]: # # Checking stationarity of the residuals of Additive Decomposition using ADF test
# decomposition_additive = seasonal_decompose(data[col],model='additive',period=period)
# residuals = decomposition_additive.resid.dropna()
# adf = adfuller(residuals)
# print('ADF Statistic:', adf[0])
# print('p-value:', adf[1])
[18]: # De-Trending series after removing trend and seasonality
# Visualization
plt.figure(figsize=(15, 10))
1.2 Multiplicative Decomposition
[19]: # Multiplicative model
plt.figure(figsize=figsize)
for col in data.columns[1:]:
    print(f'Multiplicative decomposition plot of {col} column')
    decomposition_multiplicative = seasonal_decompose(data[col], model='multiplicative', period=1)
    decomposition_multiplicative.plot()
[20]: # # Checking stationarity of the residuals of Multiplicative Decomposition using ADF test
# decomposition_multiplicative = seasonal_decompose(data[col],model='multiplicative',period=period)
# residuals = decomposition_multiplicative.resid.dropna()
# adf = adfuller(residuals)
# print('ADF Statistic:', adf[0])
# print('p-value:', adf[1])
[21]: # Remove trend and seasonality from the series
# In a multiplicative model the components multiply, so they are divided out rather than subtracted
detrended_series = data[col] / (decomposition_multiplicative.trend * decomposition_multiplicative.seasonal)
# Visualization
plt.figure(figsize=(15, 10))
plt.subplot(3, 1, 3)
plt.plot(detrended_series, label='Detrended Series', color='green')
plt.title(f'Detrended Series after Multiplicative Decomposition of {col}')
plt.legend()
plt.tight_layout()
1.3 Time series forecasting:
Split the dataset into training and testing sets. Implement a forecasting model (e.g.,
ARIMA, SARIMA, Prophet) on the training set to forecast future stock prices.
Evaluate the performance of the forecasting model using appropriate evaluation metrics.
Visualize the actual vs. predicted stock prices on the testing set.
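For time series, the split has to be chronological rather than random so that the test set lies strictly after the training set. The sketch below uses a synthetic frame with the same number of rows as the dataset (1511); the 80/20 ratio is an inference from the training output ending at row 1207 and is not stated in the notebook, and the start date and prices are made up.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the stock data (dates and prices are illustrative)
n_rows = 1511
frame = pd.DataFrame({
    "Date": pd.date_range("2015-04-01", periods=n_rows, freq="B"),
    "Close": np.linspace(40.0, 245.0, n_rows),
})

# Chronological split: the first 80% of rows train, the remaining 20% test
split = int(len(frame) * 0.8)
train_data = frame.iloc[:split]
test_data = frame.iloc[split:]

print(len(train_data), len(test_data))
print(train_data["Date"].max(), "<", test_data["Date"].min())
```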
1207 2020-01-16 16:00:00 164.35 166.24 164.03 166.17 23865360
1.4 ARIMA
1.4.1 Box-Cox transformation to stabilize variance
[24]: # Applying Box-Cox transformation
lambda_values = {}
0 2.255381 2.283901 2.268258
1 2.255216 2.282255 2.264640
2 2.263638 2.282775 2.275102
3 2.264672 2.292404 2.274939
4 2.262920 2.290134 2.274042
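Because the model is fitted on Box-Cox transformed prices, every forecast has to be mapped back to the price scale with the same lambda. The following is a minimal NumPy sketch of the forward and inverse transforms with a round-trip check; the lambda values and prices are illustrative, not the ones estimated in the notebook.

```python
import numpy as np

def boxcox_forward(x, lam):
    """Box-Cox transform for strictly positive x with a fixed lambda."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (np.power(x, lam) - 1.0) / lam

def boxcox_inverse(y, lam):
    """Inverse Box-Cox transform; requires y * lam + 1 > 0 when lam != 0."""
    y = np.asarray(y, dtype=float)
    return np.exp(y) if lam == 0 else np.power(y * lam + 1.0, 1.0 / lam)

prices = np.array([45.0, 60.0, 95.0, 140.0, 245.0])  # illustrative values
for lam in (0, 0.5, -0.3):
    restored = boxcox_inverse(boxcox_forward(prices, lam), lam)
    print(f"lambda={lam}: max round-trip error = {np.max(np.abs(restored - prices)):.2e}")
```

With a negative lambda the transformed values are bounded above by -1/lambda, so a forecast that drifts past that bound cannot be inverted; this is worth checking before converting long-horizon forecasts.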
    else:
        print(f"The '{col + '_first_diff'}' column series is non-stationary")
    print('-'*100)
train_data
ADF Statistic: -14.321117210776181
p-value: 1.140835758236616e-26
The 'BoxCox_Close_first_diff' column series is stationary
since series became stationary in first differencing, the value of, d=1
--------------------------------------------------------------------------------
--------------------
BoxCox_Close_first_diff
0 NaN
1 -0.003617
2 0.010462
3 -0.000163
4 -0.000897
… …
1203 -0.001056
1204 0.002720
1205 -0.001607
1206 0.001468
1207 0.004114
plt.figure(figsize=(10, 5))
plot_pacf(train_data[col].dropna(), lags=40)
plt.title(f"Partial Autocorrelation Function (PACF) for train_data's {col}")
plt.xlabel('Lags')
plt.ylabel('PACF')
plt.show()
1.4.4 Determine parameter q
plt.figure(figsize=(10, 5))
plot_acf(train_data[col].dropna(), lags=40) # Plot ACF for the column
plt.title(f'Autocorrelation Function (ACF) for train_data {col}')
plt.xlabel('Lags')
plt.ylabel('ACF')
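When reading q from an ACF plot, the usual rule of thumb is to look for the last lag whose autocorrelation clearly exceeds the approximate 95% band of ±1.96/√n. The sketch below computes the sample ACF by hand on a simulated MA(1) series (coefficient 0.8, chosen for illustration) so the expected cut-off after lag 1 is known in advance.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / denom for k in range(max_lag + 1)])

# Simulated MA(1) process: y_t = e_t + 0.8 * e_{t-1}
rng = np.random.default_rng(42)
n = 5000
noise = rng.normal(size=n + 1)
y = noise[1:] + 0.8 * noise[:-1]

acf = sample_acf(y, max_lag=10)
bound = 1.96 / np.sqrt(n)  # approximate 95% band for white noise
significant_lags = [k for k in range(1, len(acf)) if abs(acf[k]) > bound]
print("ACF at lags 1-3:", np.round(acf[1:4], 3))
print("95% band: +/-", round(bound, 4))
print("Lags outside the band:", significant_lags)
```

For an MA(1) process with coefficient 0.8, the theoretical lag-1 autocorrelation is 0.8 / (1 + 0.8²) ≈ 0.49 and higher lags are zero, which is the sharp cut-off pattern that suggests q=1.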
1.4.5 Using Auto-Arima to determine p,d,q automatically and doing forecasting - ARIMA
[28]: def inverse_boxcox(y, lambda_value):
    if lambda_value == 0:
        return np.exp(y)  # If lambda is 0, use the exponential function
    else:
        return np.power((y * lambda_value + 1), 1 / lambda_value)
print(model.summary())
print('+*'*75)
print('\n\n')
forecast, conf_int = model.predict(n_periods=n_periods, return_conf_int=True)
# Convert the forecast back from the Box-Cox scale to the original price scale
forecast_original = inverse_boxcox(forecast, lambda_values['Close'])
# Evaluation of forecast
mae = mean_absolute_error(test_data['Close'], forecast_original)
mse = mean_squared_error(test_data['Close'], forecast_original)
rmse = np.sqrt(mse)
mape = np.mean(np.abs((test_data['Close'] - forecast_original) / test_data['Close'])) * 100
ma.L1         -0.0722      0.020     -3.645      0.000      -0.111      -0.033
ma.L2         -0.0565      0.022     -2.588      0.010      -0.099      -0.014
sigma2      1.717e-05   3.13e-07     54.830      0.000    1.66e-05    1.78e-05
===================================================================================
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):              3539.38
Prob(Q):                              0.97   Prob(JB):                         0.00
Heteroskedasticity (H):               0.50   Skew:                             0.46
Prob(H) (two-sided):                  0.00   Kurtosis:                        11.34
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*
+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*
1.4.6 Using Auto-Arima to determine p,d,q and P,D,Q & m automatically and doing forecasting - SARIMA
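[29]: def inverse_boxcox(y, lambda_value):
    if lambda_value == 0:
        return np.exp(y)
    else:
        return np.power((y * lambda_value + 1), 1 / lambda_value)
# Note: m=1 specifies no seasonal period, so this search selects the same
# non-seasonal model as the ARIMA fit above (the two summaries are identical)
model = auto_arima(train_data[col].dropna(), seasonal=True, m=1, stepwise=True, trace=True)
print(model.summary())
print('+*'*100)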
print('\n\n')
# Forecasting
n_periods = len(test_data)  # Number of periods to forecast
forecast, conf_int = model.predict(n_periods=n_periods, return_conf_int=True)
# Convert the forecast back from the Box-Cox scale before evaluating it
forecast_original = inverse_boxcox(forecast, lambda_values['Close'])
# Evaluation of forecast
mae = mean_absolute_error(test_data['Close'], forecast_original)
mse = mean_squared_error(test_data['Close'], forecast_original)
rmse = np.sqrt(mse)
mape = np.mean(np.abs((test_data['Close'] - forecast_original) / test_data['Close'])) * 100
# Evaluation metrics
print("Mean Absolute Error (MAE):", mae)
print("Mean Squared Error (MSE):", mse)
print("Root Mean Squared Error (RMSE):", rmse)
print("Mean Absolute Percentage Error (MAPE):", mape)
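The four metrics printed above can be checked by hand on a small example. The sketch below computes them with NumPy only; the helper name `forecast_metrics` and the numbers are illustrative. Converting both inputs to plain arrays also sidesteps pandas index alignment, which can silently produce NaN when the actual and predicted Series carry different indexes.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and MAPE (in percent) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    mse = np.mean(errors ** 2)
    return {
        "MAE": np.mean(np.abs(errors)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAPE": np.mean(np.abs(errors / actual)) * 100,  # actual values must be non-zero
    }

# Illustrative values: errors are -10, +10 and +20
metrics = forecast_metrics([100, 200, 400], [110, 190, 380])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Here MAE is 40/3 ≈ 13.33, MSE is 200, RMSE is √200 ≈ 14.14, and MAPE is (10% + 5% + 5%) / 3 ≈ 6.67%.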
- 1208
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept      0.0003      0.000      2.943      0.003       0.000       0.001
ma.L1         -0.0722      0.020     -3.645      0.000      -0.111      -0.033
ma.L2         -0.0565      0.022     -2.588      0.010      -0.099      -0.014
sigma2      1.717e-05   3.13e-07     54.830      0.000    1.66e-05    1.78e-05
===================================================================================
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):              3539.38
Prob(Q):                              0.97   Prob(JB):                         0.00
Heteroskedasticity (H):               0.50   Skew:                             0.46
Prob(H) (two-sided):                  0.00   Kurtosis:                        11.34
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*
+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*
+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*
future = model.make_future_dataframe(periods=len(test_data) + 365)
# Make predictions
forecast = model.predict(future)
# Reset index
forecast_test.reset_index(drop=True, inplace=True)
test_data.reset_index(drop=True, inplace=True)
# evaluation
mae = mean_absolute_error(test_data['Close'], forecast_test['yhat'])
mse = mean_squared_error(test_data['Close'], forecast_test['yhat'])
rmse = np.sqrt(mse)
mape = np.mean(np.abs((test_data['Close'] - forecast_test['yhat']) / test_data['Close'])) * 100
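The evaluation above compares `forecast_test` and `test_data` row by row after both indexes are reset. How `forecast_test` is built is not shown here, but if the future frame contains calendar days (the default daily frequency) while the test set contains only trading days, a positional comparison can pair up different dates. A minimal sketch of aligning on the date instead is given below, using small synthetic frames in place of the Prophet output and the test set.

```python
import numpy as np
import pandas as pd

# Synthetic test set: 10 business days (values are illustrative)
test = pd.DataFrame({
    "Date": pd.bdate_range("2020-01-17", periods=10),
    "Close": np.linspace(166.0, 170.5, 10),
})

# Hypothetical stand-in for a Prophet forecast: one row per calendar day
forecast = pd.DataFrame({"ds": pd.date_range("2020-01-17", periods=20, freq="D")})
forecast["yhat"] = np.linspace(165.0, 174.5, 20)

# Join on the date so each actual price meets the prediction for the same day
aligned = test.merge(forecast[["ds", "yhat"]], left_on="Date", right_on="ds", how="inner")
mae = np.mean(np.abs(aligned["Close"].to_numpy() - aligned["yhat"].to_numpy()))
print(aligned[["Date", "Close", "yhat"]].head())
print("Rows compared:", len(aligned), "MAE:", round(mae, 3))
```

If the dates carry a time component, as the 16:00:00 timestamps in this dataset do, both columns would need the same time of day (or be normalized to midnight) before merging.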
forecast['ds'] = pd.to_datetime(forecast['ds'])
1.5.1 Task
Further analysis:
Implement additional time series models or techniques to improve the forecasting performance, and
compare the performance of the different models and techniques.
Discuss the strengths and weaknesses of each model or technique in the context of the Microsoft
Stocks dataset.
### Performance Summary:
Prophet outperformed the ARIMA and seasonal ARIMA models across all metrics:
MAE (10.36) is lower than ARIMA's (11.69), indicating more accurate predictions on average.
MSE and RMSE are also lower, indicating that Prophet made fewer large errors.
MAPE (5.32%) is lower than the ARIMA models' (5.95%), meaning Prophet's errors were smaller relative to the actual prices.
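The size of the gap can be expressed as a relative improvement. The short sketch below uses only the MAE and MAPE figures quoted above; the helper name `relative_improvement` is hypothetical.

```python
def relative_improvement(baseline, candidate):
    """Percentage reduction of an error metric relative to the baseline."""
    return (baseline - candidate) / baseline * 100

# Figures quoted in the performance summary
arima = {"MAE": 11.69, "MAPE": 5.95}
prophet = {"MAE": 10.36, "MAPE": 5.32}

improvements = {name: relative_improvement(arima[name], prophet[name]) for name in arima}
for name, pct in improvements.items():
    print(f"{name}: Prophet is {pct:.1f}% lower than ARIMA")
```

This works out to roughly an 11% reduction in both MAE and MAPE, which is a modest rather than decisive advantage on a single test window.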
### ARIMA (AutoRegressive Integrated Moving Average)
Strengths:
Good for short-term forecasting.
Handles non-stationary data through differencing.
The AR, MA, and differencing terms can be tuned to fit the data well.
Widely understood and used.
Weaknesses:
ARIMA assumes a linear structure in the data and may not capture more complex, nonlinear patterns.
It is less flexible than models like Prophet for handling seasonal components.
Not suitable for long-term forecasting.
PROPHET
Strengths:
Handles seasonality well.
Robust to missing data.
Designed to be robust to outliers.
Prophet has a user-friendly API and does not require extensive
Weaknesses:
Less accurate for short-term predictions.
By default, Prophet treats trend and seasonal components as additive; multiplicative seasonality has to be enabled with seasonality_mode='multiplicative'.
It offers less flexibility for fine-tuning.
SARIMA
Weaknesses:
Finding the right parameters for SARIMA (p, d, q, P, D, Q) can be challenging and time-consuming.
Like ARIMA, SARIMA assumes a linear relationship between variables and struggles with capturing nonlinear patterns.
> Using Auto-Arima to determine p,d,q automatically and doing forecasting - ARIMA
> Using Auto-Arima to determine p,d,q and P,D,Q & m automatically and doing forecasting - SARIMA
> Forecasting using PROPHET and plotting the forecast
> Compare the performance of ARIMA/SARIMA and PROPHET. The performance of Prophet is slightly better.