Completed Time Series Analysis! ?

Uploaded by

capstoneecs
06/01/2025, 20:50 Time Series Analysis

In [25]: from IPython.display import Image, display

# Display an image
display(Image(filename=r"C:\Users\hi\OneDrive\Desktop\my file\documents\time.jpg

What is Time Series Analysis?

Time series analysis involves examining data points collected or recorded at specific time intervals to understand underlying patterns, trends, and relationships. It is widely used for forecasting and decision-making across various domains.

Time series analysis studies data that changes over time. The objective
is to identify meaningful characteristics, such as trends, seasonality, and
cyclical behavior, and use these insights for prediction or understanding
temporal dynamics.

Trend: The long-term direction of the data (upward, downward, or flat).

Seasonality: Regular, periodic fluctuations within a fixed period (e.g., daily, monthly, yearly).

Cyclical Component: Fluctuations without a fixed period, typically driven by business or economic cycles.

Noise: Random variation that remains after the other components are accounted for.
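To make the components concrete, the sketch below builds a synthetic monthly series from a trend, a seasonal term, and noise, then recovers the seasonal pattern with a classical moving-average decomposition. The series, its parameters, and all variable names are assumptions chosen for illustration; it does not use the airline data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, period = 120, 12                       # ten years of monthly data (assumed)
t = np.arange(n)

# Build a synthetic series from the components described above.
trend = 100 + 0.5 * t
seasonal = 10 * np.sin(2 * np.pi * t / period)
noise = rng.normal(0, 1, n)
y = trend + seasonal + noise

# Estimate the trend with a centered 2x12 moving average.
weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
trend_hat = np.convolve(y, weights, mode="valid")   # length n - period
offset = period // 2                                 # trend_hat[i] refers to time i + offset

# Average the detrended values by position in the cycle to estimate seasonality.
detrended = y[offset:offset + len(trend_hat)] - trend_hat
phase = t[offset:offset + len(trend_hat)] % period
seasonal_hat = np.array([detrended[phase == k].mean() for k in range(period)])
seasonal_hat -= seasonal_hat.mean()                  # seasonal effects sum to zero

print(np.round(seasonal_hat, 1))
```

What is left after subtracting the estimated trend and seasonal terms is the noise component.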

Types of Time Series Analysis

localhost:8888/doc/tree/ Time Series Analysis.ipynb 1/24



1. Descriptive Analysis: Summarizes the main features of the time series (e.g., mean, variance, autocorrelation).
2. Exploratory Analysis: Examines patterns and relationships within the data (e.g., trend analysis, seasonality detection).
3. Forecasting: Predicts future data points using historical data (e.g., ARIMA, SARIMA, LSTM).
4. Causal Analysis: Identifies cause-effect relationships within or between time series (e.g., Granger causality).
5. Frequency Analysis: Analyzes periodic patterns using techniques like Fourier Transform or Wavelet Transform.

1. Descriptive Analysis: Summarizes the main features of the time series (e.g., mean, variance, autocorrelation).

In [1]: import pandas as pd
import matplotlib.pyplot as plt

# Load dataset (e.g., airline passenger data)
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passe
data = pd.read_csv(url, parse_dates=['Month'], index_col='Month')

# Visualize the data
data.plot(title='Airline Passengers Data', figsize=(10, 5))
plt.show()

2. Decomposing Time Series


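In [2]: from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose the series into trend, seasonal, and residual parts.
# period=12 states the yearly cycle of the monthly data explicitly.
decomposition = seasonal_decompose(data['Passengers'], model='additive', period=12)
decomposition.plot()
plt.show()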


3. ARIMA Model

In [3]: from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

# Chronological train-test split: first 100 months for training, the rest for testing
train = data.iloc[:100]
test = data.iloc[100:].copy()  # copy so the new column is not written to a slice of `data`

# Fit ARIMA model
model = ARIMA(train, order=(5, 1, 0)) # (p=5, d=1, q=0)
fitted_model = model.fit()

# Forecast one value per test observation
forecast = fitted_model.forecast(steps=len(test))
test['Forecast'] = forecast

# Plot results
plt.figure(figsize=(10, 5))
plt.plot(train, label='Training Data')
plt.plot(test['Passengers'], label='Actual Data', color='blue')
plt.plot(test['Forecast'], label='Forecasted Data', color='orange')
plt.legend()
plt.show()

# Calculate error
error = mean_squared_error(test['Passengers'], forecast)
print(f"Mean Squared Error: {error}")

C:\Users\hi\anaconda3\Lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency MS will be used.
  self._init_dates(dates, freq)

Mean Squared Error: 14515.096811765876
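A squared error is hard to read on its own. The sketch below converts the MSE printed above into an RMSE, which is in the units of the Passengers column, and shows a seasonal-naive baseline that any forecast can be compared against. The helper names `rmse` and `seasonal_naive` and the stand-in series are assumptions for illustration; the stand-in is not the airline data.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def seasonal_naive(train, horizon, period=12):
    """Repeat the last observed seasonal cycle `horizon` steps ahead."""
    last_cycle = np.asarray(train, dtype=float)[-period:]
    reps = int(np.ceil(horizon / period))
    return np.tile(last_cycle, reps)[:horizon]

# The MSE reported by the ARIMA cell, converted to RMSE.
arima_rmse = float(np.sqrt(14515.096811765876))

# Hypothetical stand-in series: linear trend plus a yearly cycle.
t = np.arange(144)
series = 100 + 2 * t + 20 * np.sin(2 * np.pi * t / 12)
train, test = series[:100], series[100:]
baseline = seasonal_naive(train, len(test))

print(f"ARIMA RMSE from the notebook: {arima_rmse:.1f}")
print(f"Seasonal-naive RMSE on the stand-in series: {rmse(test, baseline):.1f}")
```

Running `seasonal_naive` on the notebook's own `train['Passengers']` and scoring it against `test['Passengers']` would show whether ARIMA(5, 1, 0) actually beats this simple baseline.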

4. Advanced Techniques
In [4]: import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM

# Prepare data for LSTM: chronological 80/20 split of the raw values
data_values = data.values
train_size = int(len(data_values) * 0.8)
train, test = data_values[:train_size], data_values[train_size:]

# Turn a series into (X, Y) pairs: `look_back` consecutive values predict the next one
def create_dataset(dataset, look_back=1):
    X, Y = [], []
    for i in range(len(dataset) - look_back - 1):
        X.append(dataset[i:(i + look_back), 0])
        Y.append(dataset[i + look_back, 0])
    return np.array(X), np.array(Y)

look_back = 3
X_train, y_train = create_dataset(train, look_back)
X_test, y_test = create_dataset(test, look_back)

# Reshape data into [samples, time steps, features]
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))

# Build LSTM model
model = Sequential([
    LSTM(50, input_shape=(look_back, 1)),
    Dense(1)
])
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs=20, batch_size=1, verbose=2)

# Forecast and compare with the targets that belong to each prediction
lstm_predictions = model.predict(X_test)
plt.figure(figsize=(10, 5))
plt.plot(y_test, label='Actual Data')  # y_test[i] is the value that follows X_test[i]
plt.plot(lstm_predictions, label='LSTM Predictions')
plt.legend()
plt.show()

C:\Users\hi\anaconda3\Lib\site-packages\keras\src\layers\rnn\rnn.py:204: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(**kwargs)


Epoch 1/20
111/111 - 6s - 53ms/step - loss: 64586.8008
Epoch 2/20
111/111 - 1s - 5ms/step - loss: 61537.0078
Epoch 3/20
111/111 - 1s - 5ms/step - loss: 58685.2891
Epoch 4/20
111/111 - 1s - 5ms/step - loss: 56694.9336
Epoch 5/20
111/111 - 1s - 5ms/step - loss: 54899.6445
Epoch 6/20
111/111 - 1s - 6ms/step - loss: 53251.5664
Epoch 7/20
111/111 - 1s - 5ms/step - loss: 51684.4922
Epoch 8/20
111/111 - 1s - 5ms/step - loss: 50181.5898
Epoch 9/20
111/111 - 1s - 6ms/step - loss: 48733.3164
Epoch 10/20
111/111 - 1s - 5ms/step - loss: 47326.9453
Epoch 11/20
111/111 - 1s - 5ms/step - loss: 45963.5234
Epoch 12/20
111/111 - 1s - 5ms/step - loss: 44643.4883
Epoch 13/20
111/111 - 1s - 6ms/step - loss: 43356.7266
Epoch 14/20
111/111 - 1s - 5ms/step - loss: 42111.0039
Epoch 15/20
111/111 - 1s - 5ms/step - loss: 40900.3594
Epoch 16/20
111/111 - 1s - 5ms/step - loss: 39707.4375
Epoch 17/20
111/111 - 1s - 5ms/step - loss: 38348.6875
Epoch 18/20
111/111 - 0s - 4ms/step - loss: 37123.3203
Epoch 19/20
111/111 - 0s - 4ms/step - loss: 35981.4023
Epoch 20/20
111/111 - 1s - 5ms/step - loss: 34882.6484
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 497ms/step
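The loss values above are in the tens of thousands because the raw passenger counts are fed to the network unscaled. A common remedy, sketched here under stated assumptions, is to min-max scale the series using the training portion only and then build the sliding windows. The helper `make_windows` and the toy series are hypothetical and not part of the notebook.

```python
import numpy as np

def make_windows(values, look_back):
    """Return (X, y): each row of X holds `look_back` consecutive values, y is the next one."""
    values = np.asarray(values, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(values, look_back + 1)
    X, y = windows[:, :-1], windows[:, -1]
    return X[..., np.newaxis], y          # X is shaped [samples, time steps, features]

series = np.arange(10, dtype=float) * 10  # hypothetical stand-in: 0, 10, ..., 90
train, test = series[:8], series[8:]

# Scale with the training minimum and maximum only, so no test information leaks in.
lo, hi = train.min(), train.max()
train_scaled = (train - lo) / (hi - lo)

X_train, y_train = make_windows(train_scaled, look_back=3)
print(X_train.shape, y_train.shape)
```

This helper yields `len(values) - look_back` samples, one more than the notebook's `create_dataset`, whose loop stops one window early because of the extra `- 1` in its range. Predictions made on scaled data need the inverse transform, `pred * (hi - lo) + lo`, before they are compared with passenger counts.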


Types of Time Series Analysis


Time series analysis can be categorized based on the methods used or the specific goals
of the analysis. Here are the primary types:

1. Descriptive Analysis

This involves summarizing and visualizing the main features of a time series.

Purpose: Identify overall trends, seasonality, and variability.
Examples: Plotting data to observe trends or fluctuations. Calculating summary statistics (mean, variance, etc.). Correlation analysis between time series.

In [5]: import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passe
data = pd.read_csv(url, parse_dates=['Month'], index_col='Month')

# Descriptive Analysis
print(data.describe()) # Summary statistics
data.plot(title='Airline Passengers Over Time', figsize=(10, 5))
plt.show()

Passengers
count 144.000000
mean 280.298611
std 119.966317
min 104.000000
25% 180.000000
50% 265.500000
75% 360.500000
max 622.000000


2. Exploratory Analysis

This focuses on understanding patterns and relationships within the data.

Purpose: Uncover hidden patterns (e.g., periodicity, seasonality, trends).
Examples: Identifying lags using autocorrelation (ACF, PACF). Decomposing time series into trend, seasonal, and residual components.

In [6]: from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Autocorrelation Plot
plot_acf(data, lags=20)
plt.show()

plot_pacf(data, lags=20)
plt.show()
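The quantity that `plot_acf` draws can be computed by hand. The sketch below implements the sample autocorrelation with NumPy and applies it to a synthetic series with a 12-step cycle; the function name `acf` and the test series are assumptions for illustration.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

t = np.arange(144)
seasonal_series = np.sin(2 * np.pi * t / 12)   # hypothetical series with a yearly cycle
r = acf(seasonal_series, 24)

print(np.round(r[[0, 6, 12]], 2))              # lags 0, 6, and 12
```

A strong positive value at lag 12 and a strong negative value at lag 6 are the signature of a 12-step seasonal cycle, which is the pattern to look for in the ACF plot of monthly data.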


3. Forecasting

Predicting future values using historical data.

Purpose: Generate accurate forecasts for decision-making.
Methods: Classical models: ARIMA, SARIMA, Exponential Smoothing. Machine Learning: Random Forest, XGBoost. Deep Learning: LSTM, GRU, Transformers.

In [7]: from statsmodels.tsa.arima.model import ARIMA

# Fit ARIMA model on the full series
model = ARIMA(data, order=(5, 1, 0)) # p=5, d=1, q=0
model_fit = model.fit()
forecast = model_fit.forecast(steps=12)  # next 12 months

# Plot the forecast on the same axes as the history
ax = data['Passengers'].plot(label='Historical Data')
forecast.plot(ax=ax, label='Forecast', color='orange')
plt.legend()
plt.show()


C:\Users\hi\anaconda3\Lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency MS will be used.
  self._init_dates(dates, freq)

4. Causal Analysis

Examines cause-effect relationships between variables.

Purpose: Determine whether one time series influences another.
Example Method: Granger Causality.

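In [8]: from statsmodels.tsa.stattools import grangercausalitytests

# Granger causality test (example with two time series).
# 'Another_Series' is a placeholder: the airline dataset has only the 'Passengers'
# column, so a second series must be added to `data` before this call can run.
# The test checks whether the second column helps predict the first.
grangercausalitytests(data[['Passengers', 'Another_Series']], maxlag=4)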


---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[8], line 4
1 from statsmodels.tsa.stattools import grangercausalitytests
3 # Granger causality test (example with two time series)
----> 4 grangercausalitytests(data[['Passengers', 'Another_Series']], maxlag=4)

File ~\anaconda3\Lib\site-packages\pandas\core\frame.py:4108, in DataFrame.__getitem__(self, key)
4106 if is_iterator(key):
4107 key = list(key)
-> 4108 indexer = self.columns._get_indexer_strict(key, "columns")[1]
4110 # take() does not accept boolean indexers
4111 if getattr(indexer, "dtype", None) == bool:

File ~\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:6200, in Index._get_indexer_strict(self, key, axis_name)
6197 else:
6198 keyarr, indexer, new_indexer = self._reindex_non_unique(keyarr)
-> 6200 self._raise_if_missing(keyarr, indexer, axis_name)
6202 keyarr = self.take(indexer)
6203 if isinstance(key, Index):
6204 # GH 42790 - Preserve name from an Index

File ~\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:6252, in Index._raise_if_missing(self, key, indexer, axis_name)
6249 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
6251 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
-> 6252 raise KeyError(f"{not_found} not in index")

KeyError: "['Another_Series'] not in index"

5. Frequency Analysis

Analyzes periodic patterns in time series data.

Purpose: Detect hidden frequencies or cycles.
Methods: Fourier Transform, Wavelet Transform.

In [9]: import numpy as np
from scipy.fft import fft

# Fourier Transform
fft_result = fft(data['Passengers'])
plt.plot(np.abs(fft_result))
plt.title('Frequency Spectrum')
plt.show()
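A raw FFT of a trending series is dominated by the zero-frequency term and carries no frequency axis. The sketch below, on a synthetic series, removes a linear trend first and uses `rfftfreq` to turn the strongest peak into a period. The series and variable names are assumptions for illustration.

```python
import numpy as np

n = 144
t = np.arange(n)
y = 100 + 2 * t + 30 * np.sin(2 * np.pi * t / 12)   # hypothetical trend plus yearly cycle

# Remove the linear trend so it does not swamp the low frequencies.
coeffs = np.polyfit(t, y, deg=1)
detrended = y - np.polyval(coeffs, t)

# One-sided spectrum for a real-valued signal, with frequencies in cycles per sample.
spectrum = np.abs(np.fft.rfft(detrended))
freqs = np.fft.rfftfreq(n, d=1.0)

# Skip the zero-frequency bin when looking for the dominant cycle.
peak = int(np.argmax(spectrum[1:])) + 1
dominant_period = 1.0 / freqs[peak]

print(f"dominant period: {dominant_period:.1f} samples")
```

With monthly data, a dominant period of 12 samples corresponds to a yearly cycle.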


6. Anomaly Detection

Detects unusual or unexpected events in the data.

Purpose: Identify outliers or anomalies in time series.
Methods: Statistical thresholds, Isolation Forest, Deep Autoencoders.

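In [10]: from sklearn.ensemble import IsolationForest

# Anomaly Detection: flag roughly 5% of the points as outliers
model = IsolationForest(contamination=0.05)
# Fit on the 'Passengers' column only, so re-running the cell does not feed the
# 'Anomaly' labels back in as a feature
data['Anomaly'] = model.fit_predict(data[['Passengers']])

# Plot anomalies (fit_predict returns -1 for anomalies and 1 for normal points)
data['Passengers'].plot(label='Data')
data[data['Anomaly'] == -1]['Passengers'].plot(style='ro', label='Anomalies')
plt.legend()
plt.show()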
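Of the methods listed, a statistical threshold is the simplest to reason about. The sketch below flags points whose distance from a rolling median is extreme on a robust z-score scale. The synthetic series, the injected spike, the window of 13, and the cutoff of 3.5 are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
t = np.arange(144)
values = 100 + t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 144)
values[80] += 80                       # inject one obvious anomaly
s = pd.Series(values)

# A centered rolling median follows the local level and is barely moved by a single spike.
baseline = s.rolling(window=13, center=True, min_periods=1).median()
residual = s - baseline

# Robust z-score: scale by the median absolute deviation instead of the standard deviation.
centered = residual - residual.median()
mad = centered.abs().median()
robust_z = 0.6745 * centered / mad

anomalies = s.index[robust_z.abs() > 3.5].tolist()
print(anomalies)
```

Unlike Isolation Forest applied to raw values, which tends to flag the highest and lowest levels of a trending series, this approach measures each point against its local neighbourhood.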


7. Structural Analysis

Examines how the underlying structure of the data evolves over time.

Purpose: Test for stationarity or structural breaks.
Methods: ADF (Augmented Dickey-Fuller) Test. KPSS (Kwiatkowski–Phillips–Schmidt–Shin) Test.

In [11]: from statsmodels.tsa.stattools import adfuller

# Stationarity Test
result = adfuller(data['Passengers'])
print("ADF Statistic:", result[0])
print("p-value:", result[1])

ADF Statistic: 0.8153688792060482
p-value: 0.991880243437641

8. Multivariate Time Series Analysis

Analyzes multiple time series together to understand their interactions.

Purpose: Forecast or infer relationships between variables.
Methods: Vector AutoRegression (VAR), VARIMA.

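In [12]: from statsmodels.tsa.api import VAR

# Fit VAR model. VAR needs at least two series; at this point `data` holds
# 'Passengers' and the 'Anomaly' labels added in In [10], so this cell only
# demonstrates the API and is not a meaningful multivariate model.
model = VAR(data)
results = model.fit()
# Forecast 10 steps ahead from the last k_ar observations (the fitted lag order)
forecast = results.forecast(data.values[-results.k_ar:], steps=10)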


C:\Users\hi\anaconda3\Lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency MS will be used.
  self._init_dates(dates, freq)

9. Change Point Detection

Identifies points where the statistical properties of the time series change.

Purpose: Detect regime shifts or structural breaks.
Methods: PELT, BOCPD (Bayesian Online Change Point Detection).


In [13]: import ruptures as rpt

# Change Point Detection
algo = rpt.Pelt(model="rbf").fit(data['Passengers'].values)
breakpoints = algo.predict(pen=10)
rpt.display(data['Passengers'].values, breakpoints)
plt.show()

---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[13], line 1
----> 1 import ruptures as rpt
3 # Change Point Detection
4 algo = rpt.Pelt(model="rbf").fit(data['Passengers'].values)

ModuleNotFoundError: No module named 'ruptures'

In [14]: !pip install ruptures
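The idea behind change point detection can be shown without any extra package. The sketch below is not PELT: it is a brute-force search for a single change in mean under a least-squares cost, which is the simplest case of the problem that PELT solves for many change points with a penalty. The function name `best_single_split` and the synthetic signal are assumptions for illustration.

```python
import numpy as np

def best_single_split(x, min_size=5):
    """Index that best splits x into two constant-mean segments (least squares)."""
    x = np.asarray(x, dtype=float)
    best_idx, best_cost = None, np.inf
    for k in range(min_size, len(x) - min_size + 1):
        left, right = x[:k], x[k:]
        # Cost of a split: squared deviation of each segment from its own mean.
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_idx, best_cost = k, cost
    return best_idx

rng = np.random.default_rng(3)
signal = np.r_[rng.normal(0, 1, 60), rng.normal(5, 1, 40)]   # mean shifts at index 60
change_point = best_single_split(signal)

print(f"estimated change point: {change_point}")
```

The search costs O(n^2) here, which is fine for short series. PELT prunes candidate positions to handle multiple change points efficiently.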

Time Series Models


Time series models are mathematical and statistical methods used to
analyze, predict, and understand patterns in time series data. These
models account for temporal dependencies and are broadly classified
into classical statistical models, machine learning models, and deep
learning models.

Classical Statistical Models

These models are based on statistical properties of the data and assume linear relationships.

1.1 Autoregressive (AR) Model

Definition: Predicts the current value of the series using past values (lags).

Formula: $Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \epsilon_t$

Key Parameter: p (number of lags).
Use Case: Suitable for stationary data with autocorrelation.
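The AR formula can be checked numerically: simulate a series from known coefficients, then recover them by regressing each value on its own lags. The coefficients, sample size, and variable names below are assumptions for illustration, and the least-squares fit is a sketch of the idea rather than the estimator used by `AutoReg`.

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.array([0.6, -0.3])        # assumed coefficients of a stationary AR(2)
n = 5000
eps = rng.normal(size=n)

# Simulate Y_t = phi_1 * Y_{t-1} + phi_2 * Y_{t-2} + eps_t
y = np.zeros(n)
for t in range(2, n):
    y[t] = phi[0] * y[t - 1] + phi[1] * y[t - 2] + eps[t]

# Regress Y_t on its first p lags (conditional least squares).
p = 2
X = np.column_stack([y[p - k : n - k] for k in range(1, p + 1)])   # columns: Y_{t-1}, Y_{t-2}
target = y[p:]
phi_hat, *_ = np.linalg.lstsq(X, target, rcond=None)

print(np.round(phi_hat, 2))
```

The estimates approach the true coefficients as the sample grows, which is the sense in which p lags carry the predictable part of a stationary AR(p) series.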

In [15]: from statsmodels.tsa.ar_model import AutoReg

# Fit AR model
model = AutoReg(data['Passengers'], lags=5)
ar_model = model.fit()
print(ar_model.summary())

AutoReg Model Results


==============================================================================
Dep. Variable: Passengers No. Observations: 144
Model: AutoReg(5) Log Likelihood -670.045
Method: Conditional MLE S.D. of innovations 30.010
Date: Mon, 06 Jan 2025 AIC 1354.089
Time: 20:29:55 BIC 1374.631
Sample: 06-01-1949 HQIC 1362.437
- 12-01-1960
=================================================================================
coef std err z P>|z| [0.025 0.975]
---------------------------------------------------------------------------------
const 9.8632 6.922 1.425 0.154 -3.703 23.430
Passengers.L1 1.2990 0.083 15.621 0.000 1.136 1.462
Passengers.L2 -0.5321 0.140 -3.794 0.000 -0.807 -0.257
Passengers.L3 0.1433 0.148 0.965 0.334 -0.148 0.434
Passengers.L4 -0.1920 0.143 -1.340 0.180 -0.473 0.089
Passengers.L5 0.2585 0.089 2.905 0.004 0.084 0.433
Roots
=============================================================================
Real Imaginary Modulus Frequency
-----------------------------------------------------------------------------
AR.1 1.0192 -0.0000j 1.0192 -0.0000
AR.2 0.8256 -0.9141j 1.2318 -0.1331
AR.3 0.8256 +0.9141j 1.2318 0.1331
AR.4 -0.9638 -1.2541j 1.5817 -0.3543
AR.5 -0.9638 +1.2541j 1.5817 0.3543
-----------------------------------------------------------------------------
C:\Users\hi\anaconda3\Lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency MS will be used.
  self._init_dates(dates, freq)

Moving Average (MA) Model

Definition: Models the current value as a linear combination of past error terms.

Formula: $Y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}$

Key Parameter: q (number of error lags).
Use Case: Suitable for data with short-term noise.
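A defining property of an MA(q) process is that its autocorrelation vanishes beyond lag q. For MA(1), the lag-1 autocorrelation is $\theta_1 / (1 + \theta_1^2)$. The sketch below simulates an MA(1) series and compares the sample values with that result. The parameter values and the helper `sample_acf` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, mu, n = 0.7, 5.0, 20000
eps = rng.normal(size=n + 1)

# Y_t = mu + eps_t + theta * eps_{t-1}
y = mu + eps[1:] + theta * eps[:-1]

def sample_acf(x, lag):
    """Sample autocorrelation of x at a single positive lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

theory_lag1 = theta / (1 + theta ** 2)
print(f"lag 1: sample={sample_acf(y, 1):.3f}, theory={theory_lag1:.3f}")
print(f"lag 3: sample={sample_acf(y, 3):.3f}, theory=0.000")
```

This cutoff is why the ACF plot is the usual guide for choosing q, while the PACF plays the same role for the AR order p.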

In [16]: from statsmodels.tsa.arima.model import ARIMA

# Fit MA model (ARIMA with p=0, d=0)
ma_model = ARIMA(data['Passengers'], order=(0, 0, 5)).fit()
print(ma_model.summary())

C:\Users\hi\anaconda3\Lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency MS will be used.
  self._init_dates(dates, freq)
C:\Users\hi\anaconda3\Lib\site-packages\statsmodels\tsa\statespace\sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
  warn('Non-invertible starting MA parameters found.'
C:\Users\hi\anaconda3\Lib\site-packages\statsmodels\base\model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "


SARIMAX Results
==============================================================================
Dep. Variable: Passengers No. Observations: 144
Model: ARIMA(0, 0, 5) Log Likelihood -733.783
Date: Mon, 06 Jan 2025 AIC 1481.566
Time: 20:31:26 BIC 1502.355
Sample: 01-01-1949 HQIC 1490.013
- 12-01-1960
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 280.3154 16.302 17.195 0.000 248.364 312.266
ma.L1 1.1217 78.179 0.014 0.989 -152.106 154.349
ma.L2 0.3926 9.582 0.041 0.967 -18.387 19.172
ma.L3 0.3872 21.146 0.018 0.985 -41.059 41.833
ma.L4 1.1083 9.207 0.120 0.904 -16.937 19.154
ma.L5 0.9920 77.646 0.013 0.990 -151.191 153.175
sigma2 1386.4913 1.08e+05 0.013 0.990 -2.11e+05 2.14e+05
===================================================================================
Ljung-Box (L1) (Q): 26.15 Jarque-Bera (JB): 12.36
Prob(Q): 0.00 Prob(JB): 0.00
Heteroskedasticity (H): 2.79 Skew: 0.71
Prob(H) (two-sided): 0.00 Kurtosis: 3.16
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

ARMA Model

Definition: Combines AR and MA models.

Formula: $Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}$

Use Case: Stationary data with both autocorrelation and noise.

In [17]: arma_model = ARIMA(data['Passengers'], order=(2, 0, 2)).fit()
print(arma_model.summary())

C:\Users\hi\anaconda3\Lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency MS will be used.
  self._init_dates(dates, freq)
C:\Users\hi\anaconda3\Lib\site-packages\statsmodels\tsa\statespace\sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
  warn('Non-stationary starting autoregressive parameters'
C:\Users\hi\anaconda3\Lib\site-packages\statsmodels\tsa\statespace\sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
  warn('Non-invertible starting MA parameters found.'


SARIMAX Results
==============================================================================
Dep. Variable: Passengers No. Observations: 144
Model: ARIMA(2, 0, 2) Log Likelihood -698.172
Date: Mon, 06 Jan 2025 AIC 1408.344
Time: 20:33:11 BIC 1426.162
Sample: 01-01-1949 HQIC 1415.584
- 12-01-1960
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 280.3016 60.094 4.664 0.000 162.519 398.084
ar.L1 0.2540 0.223 1.137 0.256 -0.184 0.692
ar.L2 0.6510 0.192 3.397 0.001 0.275 1.027
ma.L1 1.1366 0.237 4.794 0.000 0.672 1.601
ma.L2 0.2127 0.173 1.232 0.218 -0.126 0.551
sigma2 930.1711 107.379 8.663 0.000 719.713 1140.630
===================================================================================
Ljung-Box (L1) (Q): 0.01 Jarque-Bera (JB): 1.35
Prob(Q): 0.94 Prob(JB): 0.51
Heteroskedasticity (H): 6.38 Skew: 0.20
Prob(H) (two-sided): 0.00 Kurtosis: 3.24
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

ARIMA Model

Definition: Extends ARMA by including differencing to handle non-stationary data.

Formula: with $W_t = \Delta^d Y_t$ denoting the series differenced $d$ times, the ARMA model is applied to $W_t$:

$W_t = \phi_1 W_{t-1} + \cdots + \phi_p W_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}$

Key Parameters: p: AR order. d: Degree of differencing. q: MA order.
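The differencing step and its inverse are simple enough to show directly. In the sketch below, the short series and the forecast values for the differences are made-up numbers for illustration; only the mechanics are the point.

```python
import numpy as np

y = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0])   # short illustrative series

# d = 1: model the changes W_t = Y_t - Y_{t-1} instead of the levels.
w = np.diff(y, n=1)

# Suppose a fitted ARMA model forecasts the next two changes (hypothetical values).
w_forecast = np.array([4.0, -2.0])

# Integrate back to levels: cumulative sum anchored at the last observed value.
y_forecast = y[-1] + np.cumsum(w_forecast)

# Differencing loses nothing: the levels can be rebuilt from the first value and the changes.
y_rebuilt = np.r_[y[0], y[0] + np.cumsum(w)]

print(w)
print(y_forecast)
```

`ARIMA(..., order=(p, 1, q))` performs both steps internally, which is why its forecasts are returned on the original scale.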

In [18]: # Fit ARIMA model
arima_model = ARIMA(data['Passengers'], order=(2, 1, 2)).fit()
print(arima_model.summary())

C:\Users\hi\anaconda3\Lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency MS will be used.
  self._init_dates(dates, freq)
SARIMAX Results
==============================================================================
Dep. Variable: Passengers No. Observations: 144
Model: ARIMA(2, 1, 2) Log Likelihood -671.673
Date: Mon, 06 Jan 2025 AIC 1353.347
Time: 20:34:17 BIC 1368.161
Sample: 01-01-1949 HQIC 1359.366
- 12-01-1960
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ar.L1 1.6850 0.020 83.059 0.000 1.645 1.725
ar.L2 -0.9548 0.017 -55.420 0.000 -0.989 -0.921
ma.L1 -1.8432 0.125 -14.795 0.000 -2.087 -1.599
ma.L2 0.9953 0.135 7.373 0.000 0.731 1.260
sigma2 665.9568 114.115 5.836 0.000 442.295 889.619
===================================================================================
Ljung-Box (L1) (Q): 0.30 Jarque-Bera (JB): 1.84
Prob(Q): 0.59 Prob(JB): 0.40
Heteroskedasticity (H): 7.38 Skew: 0.27
Prob(H) (two-sided): 0.00 Kurtosis: 3.14
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).


C:\Users\hi\anaconda3\Lib\site-packages\statsmodels\base\model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "

SARIMA Model

Definition: Extends ARIMA to handle seasonal data by adding seasonal terms.

Formula: $\mathrm{SARIMA}(p, d, q)(P, D, Q, s)$, where $s$ is the seasonal period.
Use Case: Seasonal data.

In [19]: from statsmodels.tsa.statespace.sarimax import SARIMAX

# Fit SARIMA model
sarima_model = SARIMAX(data['Passengers'], order=(2, 1, 2), seasonal_order=(1, 1
print(sarima_model.summary())

C:\Users\hi\anaconda3\Lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency MS will be used.
  self._init_dates(dates, freq)


SARIMAX Results
============================================================================================
Dep. Variable: Passengers No. Observations: 144
Model: SARIMAX(2, 1, 2)x(1, 1, [1], 12) Log Likelihood -503.024
Date: Mon, 06 Jan 2025 AIC 1020.048
Time: 20:35:24 BIC 1040.174
Sample: 01-01-1949 HQIC 1028.226
- 12-01-1960
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ar.L1 0.4441 0.388 1.145 0.252 -0.316 1.204
ar.L2 0.3287 0.303 1.086 0.278 -0.265 0.922
ma.L1 -0.8352 0.402 -2.079 0.038 -1.623 -0.048
ma.L2 -0.1385 0.385 -0.359 0.719 -0.894 0.617
ar.S.L12 -0.8799 0.274 -3.213 0.001 -1.417 -0.343
ma.S.L12 0.7843 0.359 2.183 0.029 0.080 1.489
sigma2 124.5105 14.050 8.862 0.000 96.974 152.047
===================================================================================
Ljung-Box (L1) (Q): 0.03 Jarque-Bera (JB): 12.42
Prob(Q): 0.86 Prob(JB): 0.00
Heteroskedasticity (H): 2.62 Skew: 0.14
Prob(H) (two-sided): 0.00 Kurtosis: 4.48
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

Machine Learning Models

These models use supervised learning techniques and do not require assumptions about the data's distribution.

Random Forest: Uses historical lags and external features to predict future values.

In [20]: from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Prepare lagged features: the previous two observations predict the current one
data['Lag1'] = data['Passengers'].shift(1)
data['Lag2'] = data['Passengers'].shift(2)
data = data.dropna()  # the first two rows have no complete lags

X = data[['Lag1', 'Lag2']]
y = data['Passengers']
# Note: train_test_split shuffles rows by default; for time series, passing
# shuffle=False keeps the test set after the training set in time
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_

# Fit Random Forest model
rf_model = RandomForestRegressor().fit(X_train, y_train)
predictions = rf_model.predict(X_test)
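The lag-feature approach can be sketched end to end with a chronological split, so that the model is always evaluated on data that comes after its training period. The synthetic series, the extra `lag12` feature, and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
t = np.arange(144)
s = pd.Series(100 + 2 * t + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, 144))

# Lagged copies of the series become the feature columns.
frame = pd.DataFrame({
    "y": s,
    "lag1": s.shift(1),
    "lag2": s.shift(2),
    "lag12": s.shift(12),     # same month one year earlier
}).dropna()

# Chronological split: the last 20% of rows form the test set, with no shuffling.
split = int(len(frame) * 0.8)
train, test = frame.iloc[:split], frame.iloc[split:]
features = ["lag1", "lag2", "lag12"]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(train[features], train["y"])
pred = model.predict(test[features])

rmse = float(np.sqrt(np.mean((test["y"].to_numpy() - pred) ** 2)))
print(f"test RMSE: {rmse:.1f}")
print(f"largest prediction: {pred.max():.1f}, largest training value: {train['y'].max():.1f}")
```

The second print line exposes a known limitation: a random forest averages training targets, so it cannot predict above the largest value it has seen. For a trending series it is common to model differences or growth rates rather than raw levels.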

Deep Learning Models

These models capture complex, non-linear relationships in time series.

3.1 Long Short-Term Memory (LSTM): Recurrent neural network architecture designed for sequential data.

In [21]: import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Prepare data for LSTM: one feature column, chronological 80/20 split
data_values = data['Passengers'].values.reshape(-1, 1)
train_size = int(len(data_values) * 0.8)
train, test = data_values[:train_size], data_values[train_size:]

# Create sequences: each sample holds `seq_length` consecutive values,
# and the target is the value that follows them
def create_sequences(values, seq_length):
    X, y = [], []
    for i in range(len(values) - seq_length):
        X.append(values[i:i+seq_length])
        y.append(values[i+seq_length])
    return np.array(X), np.array(y)

seq_length = 3
X_train, y_train = create_sequences(train, seq_length)
X_test, y_test = create_sequences(test, seq_length)

# Build LSTM model
model = Sequential([
    LSTM(50, activation='relu', input_shape=(seq_length, 1)),
    Dense(1)
])
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=20, batch_size=16)

# Predict
lstm_predictions = model.predict(X_test)

C:\Users\hi\anaconda3\Lib\site-packages\keras\src\layers\rnn\rnn.py:204: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(**kwargs)


Epoch 1/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 6s 11ms/step - loss: 53723.1602
Epoch 2/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - loss: 49023.90627
Epoch 3/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 41025.78121
Epoch 4/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 34404.5312
Epoch 5/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 24417.7988
Epoch 6/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 12097.3975
Epoch 7/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 2751.4854
Epoch 8/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 2470.1809
Epoch 9/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 1776.7296
Epoch 10/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 1331.9918
Epoch 11/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 1232.1517
Epoch 12/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 1411.8165
Epoch 13/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 1300.9708
Epoch 14/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 1088.6345
Epoch 15/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 1309.1160
Epoch 16/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - loss: 1091.2700
Epoch 17/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 1050.2249
Epoch 18/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 1128.1163
Epoch 19/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 851.8344
Epoch 20/20
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 740.9769
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 587ms/step
