Ise487 - HW#1
[This sheet must be completed and attached to the first page of your homework]
ISE 487
Predictive Analytics Techniques
Term 242
Homework #1
Bander Moafa 202024760
Homework Guidelines
To receive full credit, make sure you follow these guidelines.
Homework Presentation:
• Every main problem should be answered on a DIFFERENT page.
• All pages of your homework should be in CHRONOLOGICAL order. Your solution should not include any
crossed out lines, and question numbers should be very clear.
• Your NAME, ID, and the homework number should be clearly indicated.
• Submit the entire HW as ONE single Word or PDF document, and one notebook file.
• If you scan or take a photo of your HW for submission, make sure the scan/photo is LEGIBLE. Use
a pen or a dark pencil to avoid an unreadable scan.
Statistical Background Anis Elgabli HW1
Problem A. Consider the time series $y_t$ given in the HW1a.csv file. Do the following using Python:
A1. Apply a simple moving average filter of window 5 to the given time series. Plot the
transformed data, $M_t$.
A2. Apply a rolling median filter of window 5 to the given time series. Plot the
transformed data, $M_t^{[\mathrm{med}]}$.
A3. Plot the sample autocorrelation function of the given time series, i.e., $r_k$ vs. $k$.
A4. Plot the sample variogram for the given time series, i.e., $G_k$ vs. $k$.
A5. Using A3 and A4 can you conclude that the time series is (weakly) stationary?
A6. Apply a power transform to the given time series using $\lambda = 0$. For the transformed data,
$y_t^{(\lambda)}$, repeat A3, A4 & A5.
A7. Apply the first difference (differencing of lag 1) to the given time series. For the
transformed data, $\nabla y_t$, repeat A3, A4 & A5.
A8. For the transformed data in A7, $\nabla y_t$, is there seasonality? If yes, apply a
differencing-based operation to remove the seasonality. For the transformed data,
$\nabla_d \nabla y_t$, repeat A3, A4 & A5.
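For the hand calculations, the differencing operators expand into past observations as follows (shown with a seasonal lag of 12, matching the `diff(12)` used in the A-8 code), so each value needs only a handful of raw observations:

```latex
\begin{aligned}
\nabla y_t &= y_t - y_{t-1},\\
\nabla_{12}\nabla y_t &= \nabla y_t - \nabla y_{t-12}
  = y_t - y_{t-1} - y_{t-12} + y_{t-13}.
\end{aligned}
```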
Hand Calculations: For the given time series, calculate the values for:
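$M_1$, $M_2$, $M_1^{[\mathrm{med}]}$, $M_2^{[\mathrm{med}]}$, $r_1$, $r_2$, $G_2$, $G_3$, $y_1^{(\lambda)}$, $y_2^{(\lambda)}$, $\nabla y_2$, $\nabla y_3$, $\nabla_{12}\nabla y_{13}$, $\nabla_{12}\nabla y_{14}$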
Problem B. Consider the one-step-ahead time series forecast errors $e_t(1)$ given in the HW1b.csv file.
Do the following using python:
B1. Calculate the mean error (ME), the mean squared error (MSE), the mean absolute
deviation (MAD), the mean percentage error (MPE), and the mean absolute percentage error
(MAPE).
B2. Plot the forecast errors, sample ACF and histogram of the given forecast errors.
B3. Is it likely that the forecasting technique (that produced the above errors) produces
unbiased forecasts?
B4. From the above plots, what can you infer about the distribution, trend, and
seasonality of the errors?
Hand Calculations: For the given forecast errors, calculate all the values of B1 for the first 5 errors.
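For reference, the standard definitions of the B1 measures are sketched below (take $n = 5$ for the hand calculations). This assumes the usual textbook convention that percentage errors are relative to the actual observation $y_t$, so MPE and MAPE need the actual values as well as the errors:

```latex
\begin{aligned}
\text{ME} &= \frac{1}{n}\sum_{t=1}^{n} e_t(1), &
\text{MSE} &= \frac{1}{n}\sum_{t=1}^{n} e_t(1)^2, &
\text{MAD} &= \frac{1}{n}\sum_{t=1}^{n} \lvert e_t(1)\rvert,\\
re_t(1) &= \frac{e_t(1)}{y_t}\times 100, &
\text{MPE} &= \frac{1}{n}\sum_{t=1}^{n} re_t(1), &
\text{MAPE} &= \frac{1}{n}\sum_{t=1}^{n} \lvert re_t(1)\rvert.
\end{aligned}
```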
C1. Let us say a simple moving average of window 7 is applied to uncorrelated i.i.d. data
with mean $\mu$ and variance $\sigma^2$. Show that the variance of the transformed data, $M_t$, is $\sigma^2/7$.
C2. Let us say a simple moving average and a Hanning filter (both of window 3) are applied
to uncorrelated i.i.d. data with mean $\mu$ and variance $\sigma^2$. Is the variance of the transformed data,
$M_t^{H}$, smaller than the variance of $M_t$?
C3. Suppose that a simple moving average of window 7 is used to forecast a time series
that varies randomly around a constant mean, that is, $y_t = \mu + \varepsilon_t$. At the start of period $T_1$, the
mean of the time series shifts to a new level, say, $\mu + \delta$. Show that the expected value of the
transformed data, $M_t$, is:
Problem A
A-1:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("HW1a.csv", index_col="Months")
df.index = pd.to_datetime(df.index, format="%Y-%B")
df["SMA_5"] = df["Values"].rolling(window=5).mean()
plt.figure(figsize=(10,5))
plt.plot(df.index, df["Values"], label="Original Data",
linestyle='dashed', marker='o')
plt.plot(df.index, df["SMA_5"], label="SMA (5)", linewidth=2,
color='red')
plt.xlabel("Months")
plt.ylabel("Values")
plt.title("Simple Moving Average (Window = 5)")
plt.legend()
plt.xticks(rotation=45)
plt.grid()
plt.show()
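As a cross-check for the hand calculations, the trailing moving average $M_t = \frac{1}{5}(y_t + y_{t-1} + \dots + y_{t-4})$, which is what `rolling(window=5).mean()` computes by default, can be written directly with NumPy. The helper name `simple_moving_average` and the toy values are illustrative, not part of the assignment:

```python
import numpy as np
import pandas as pd

def simple_moving_average(y, window=5):
    """Trailing SMA: M_t = mean(y_{t-window+1}, ..., y_t); NaN until a full window exists."""
    y = np.asarray(y, dtype=float)
    out = np.full(y.shape, np.nan)
    # mode="valid" returns exactly one average per full window
    out[window - 1:] = np.convolve(y, np.ones(window) / window, mode="valid")
    return out

toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 10.0]  # illustrative values only
manual = simple_moving_average(toy, window=5)
via_pandas = pd.Series(toy).rolling(window=5).mean().to_numpy()
print(manual)  # first 4 entries are NaN, then 3.0, 4.0, 5.6
print(np.allclose(manual, via_pandas, equal_nan=True))
```

The first four entries are undefined because a window of 5 needs five observations, which is also why the SMA curve in the A-1 plot starts later than the original data.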
A-2:
import pandas as pd
import matplotlib.pyplot as plt
# Parse the month labels once, with an explicit format (avoids the "Could not infer format" warning)
df = pd.read_csv("HW1a.csv", index_col="Months")
df.index = pd.to_datetime(df.index, format="%Y-%B")
df["Rolling_Median_5"] = df["Values"].rolling(window=5).median()
plt.figure(figsize=(10,5))
plt.plot(df.index, df["Values"], label="Original Data",
linestyle='dashed', marker='o')
plt.plot(df.index, df["Rolling_Median_5"], label="Rolling Median (5)",
linewidth=2, color='green')
plt.xlabel("Months")
plt.ylabel("Values")
plt.title("Rolling Median Filter (Window = 5)")
plt.legend()
plt.xticks(rotation=45)
plt.grid()
plt.show()
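The median filter's advantage shows up with outliers: a single extreme value drags every 5-point mean that contains it, while the 5-point median ignores it. The made-up series below (not the HW1a data) has one spike at index 4:

```python
import numpy as np
import pandas as pd

# Illustrative series with one outlier at index 4
toy = pd.Series([5.0, 6.0, 5.0, 6.0, 50.0, 5.0, 6.0])

sma = toy.rolling(window=5).mean()
rolling_median = toy.rolling(window=5).median()

# Hand check of the first full window: median of the first 5 observations
first_window_median = np.median(toy.iloc[:5])

print(sma.dropna().round(2).tolist())
print(rolling_median.dropna().tolist())
print(first_window_median)
```

Every full window here contains the spike, so the SMA is 14.4 in all three windows while the rolling median stays at 6.0.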
A-3:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
df = pd.read_csv("HW1a.csv", index_col="Months")
df.index = pd.to_datetime(df.index, format="%Y-%B")
# plot_acf draws on the axes it is given; calling plt.figure() first would leave an empty figure
fig, ax = plt.subplots(figsize=(8,5))
plot_acf(df["Values"], lags=20, ax=ax)
plt.title("Autocorrelation Function (ACF) of Time Series")
plt.grid()
plt.show()
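For hand-computing $r_1$ and $r_2$, the sample autocorrelation is $r_k = \sum_{t=1}^{n-k}(y_t-\bar y)(y_{t+k}-\bar y) \big/ \sum_{t=1}^{n}(y_t-\bar y)^2$. The NumPy sketch below implements that formula directly; it should agree with the default estimator behind statsmodels' `plot_acf`, but that is worth confirming against the plot. The helper name `sample_acf` and the toy series are illustrative:

```python
import numpy as np

def sample_acf(y, max_lag):
    """r_k = sum_{t=1}^{n-k} (y_t - ybar)(y_{t+k} - ybar) / sum_{t=1}^{n} (y_t - ybar)^2"""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    denom = np.sum(dev ** 2)
    # pair each deviation with the one k steps later
    return np.array([np.sum(dev[: len(y) - k] * dev[k:]) / denom
                     for k in range(max_lag + 1)])

r = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
print(r)  # r_0 = 1.0, r_1 = 0.4, r_2 = -0.1
```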
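A-4:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def sample_variogram(series, max_lag=20):
    # G_k = Var(y_{t+k} - y_t) / Var(y_{t+1} - y_t), estimated for k = 1, ..., max_lag
    y = np.asarray(series, dtype=float)
    s1 = np.var(y[1:] - y[:-1], ddof=1)
    lags = np.arange(1, max_lag + 1)
    values = np.array([np.var(y[k:] - y[:-k], ddof=1) / s1 for k in lags])
    return lags, values
df = pd.read_csv("HW1a.csv", index_col="Months")
df.index = pd.to_datetime(df.index, format="%Y-%B")
lags, variogram_values = sample_variogram(df["Values"], max_lag=20)
plt.figure(figsize=(8,5))
plt.plot(lags, variogram_values, marker='o', linestyle='-',
color='blue')
plt.xlabel("Lag (k)")
plt.ylabel("Sample Variogram G(k)")
plt.title("Sample Variogram of Time Series")
plt.grid()
plt.show()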
A-5:
The time series is NOT weakly stationary because:
A-6:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
df = pd.read_csv("HW1a.csv", index_col="Months")
df.index = pd.to_datetime(df.index, format="%Y-%B")
# A power transform with lambda = 0 is the natural log
df["Log_Values"] = np.log(df["Values"])
fig, ax = plt.subplots(figsize=(8,5))
plot_acf(df["Log_Values"], lags=20, ax=ax)
plt.title("Autocorrelation Function (ACF) - Log Transformed Data")
plt.grid()
plt.show()
# sample_variogram() is defined in an earlier notebook cell
lags, variogram_values = sample_variogram(df["Log_Values"], max_lag=20)
plt.figure(figsize=(8,5))
plt.plot(lags, variogram_values, marker='o', linestyle='-',
color='purple')
plt.xlabel("Lag (k)")
plt.ylabel("Sample Variogram G(k)")
plt.title("Sample Variogram - Log Transformed Data")
plt.grid()
plt.show()
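For reference, a basic Box-Cox power transform is $(y^\lambda - 1)/\lambda$ for $\lambda \neq 0$ and $\ln y$ for $\lambda = 0$, which is why A-6 uses `np.log`. Some textbooks additionally scale by the geometric mean of the data; that only multiplies the transformed series by a constant, so the ACF and variogram in A-6 would be unaffected. The helper name `power_transform` is hypothetical and the values are made up:

```python
import numpy as np

def power_transform(y, lam):
    """Basic Box-Cox: (y**lam - 1) / lam for lam != 0, log(y) for lam == 0. Needs y > 0."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("the power transform needs strictly positive data")
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

y = np.array([1.0, np.e, np.e ** 2])
print(power_transform(y, 0))     # log scale: [0, 1, 2]
print(power_transform(y, 1e-8))  # close to the lam = 0 result
```

The second call shows the $\lambda \to 0$ limit numerically: a tiny nonzero $\lambda$ gives essentially the log.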
A-7:
# Lag-1 difference: nabla y_t = y_t - y_{t-1} (the first value is NaN)
df["Diff_Values"] = df["Values"].diff()
fig, ax = plt.subplots(figsize=(8,5))
plot_acf(df["Diff_Values"].dropna(), lags=20, ax=ax)
plt.title("Autocorrelation Function (ACF) - First Differenced Data")
plt.grid()
plt.show()
lags, variogram_values = sample_variogram(df["Diff_Values"].dropna(), max_lag=20)
plt.figure(figsize=(8,5))
plt.plot(lags, variogram_values, marker='o', linestyle='-',
color='orange')
plt.xlabel("Lag (k)")
plt.ylabel("Sample Variogram G(k)")
plt.title("Sample Variogram - First Differenced Data")
plt.grid()
plt.show()
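Why first differencing helps with a trend: for a linear trend $y_t = a + bt$, $\nabla y_t = y_t - y_{t-1} = b$ is constant. The made-up series below illustrates this with NumPy's `np.diff`, which computes the same lag-1 difference as pandas `Series.diff()` but without the leading NaN:

```python
import numpy as np

# Illustrative linear-trend series y_t = 3 + 0.5 t (not the HW1a data)
t = np.arange(1, 11)
y = 3.0 + 0.5 * t

first_diff = np.diff(y)  # nabla y_t = y_t - y_{t-1}, defined from t = 2 onward
print(first_diff)        # constant 0.5: the linear trend is removed
```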
A-8:
# Seasonal (lag-12) difference of the first-differenced series, i.e. nabla_12 nabla y_t
df["Seasonal_Diff_Values"] = df["Values"].diff().diff(12)
fig, ax = plt.subplots(figsize=(8,5))
plot_acf(df["Seasonal_Diff_Values"].dropna(), lags=20, ax=ax)
plt.title("Autocorrelation Function (ACF) - Seasonally Differenced Data")
plt.grid()
plt.show()
lags, variogram_values = sample_variogram(
df["Seasonal_Diff_Values"].dropna(), max_lag=20)
plt.figure(figsize=(8,5))
plt.plot(lags, variogram_values, marker='o', linestyle='-',
color='red')
plt.xlabel("Lag (k)")
plt.ylabel("Sample Variogram G(k)")
plt.title("Sample Variogram - Seasonally Differenced Data")
plt.grid()
plt.show()
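A sketch of why $\nabla_{12}\nabla$ removes both a linear trend and a fixed monthly pattern: after first differencing, what remains of the seasonal pattern repeats every 12 periods, so subtracting the value from 12 periods earlier cancels it. The series is synthetic (a trend plus an invented period-12 pattern), and the slicing below is equivalent to pandas `Series.diff().diff(12)` without the leading NaNs:

```python
import numpy as np

# Illustrative monthly series: linear trend plus a fixed period-12 pattern (not the HW1a data)
n, period = 48, 12
t = np.arange(n)
pattern = np.array([0, 2, 5, 3, 1, -1, -4, -2, 0, 1, 3, 2], dtype=float)
y = 10.0 + 0.3 * t + pattern[t % period]

d1 = y[1:] - y[:-1]                  # nabla y_t
d12_d1 = d1[period:] - d1[:-period]  # nabla_12 nabla y_t = nabla y_t - nabla y_{t-12}
print(len(d12_d1), np.abs(d12_d1).max())  # 35 values, all zero up to rounding
```

Real data will not cancel exactly; the point is that the remaining ACF and variogram should no longer show the trend or the 12-month spikes.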
[Handwritten hand calculations for Problem A]
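B-1:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
df = pd.read_csv("HW1b.csv", parse_dates=["Year"], index_col="Year")
errors = df["Forecast Errors"]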
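A sketch of the B1 calculations. It assumes the actual observations $y_t$ are available alongside the errors, since MPE and MAPE need them; the layout of HW1b.csv beyond the `Forecast Errors` column is not shown here, so the helper name `forecast_error_measures` and the numbers in the example are made up:

```python
import numpy as np

def forecast_error_measures(errors, actuals):
    """ME, MSE, MAD, MPE and MAPE of one-step-ahead errors e_t(1).

    Percentage measures use relative errors re_t = 100 * e_t / y_t.
    """
    e = np.asarray(errors, dtype=float)
    y = np.asarray(actuals, dtype=float)
    re = 100.0 * e / y
    return {
        "ME": float(e.mean()),
        "MSE": float(np.mean(e ** 2)),
        "MAD": float(np.mean(np.abs(e))),
        "MPE": float(re.mean()),
        "MAPE": float(np.mean(np.abs(re))),
    }

# Made-up numbers for illustration only
measures = forecast_error_measures(errors=[1.0, -2.0, 0.5, 0.5, 0.0],
                                   actuals=[10.0, 20.0, 10.0, 25.0, 5.0])
print(measures)
```

With the real data this would be called with `errors` and a column of actual values (for example a hypothetical `df["Actual"]`), and with `errors.iloc[:5]` and the matching actuals to check the hand calculations.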
B-2:
plt.figure(figsize=(8,4))
plt.plot(df.index, errors, marker='o', linestyle='dashed',
color='red')
plt.axhline(y=0, color='black', linestyle='--')
plt.xlabel("Year")
plt.ylabel("Forecast Error")
plt.title("Forecast Errors Over Time")
plt.grid()
plt.show()
fig, ax = plt.subplots(figsize=(8,4))
plot_acf(errors, lags=15, ax=ax)
plt.title("ACF of Forecast Errors")
plt.grid()
plt.show()
plt.figure(figsize=(8,4))
plt.hist(errors, bins=10, edgecolor='black', alpha=0.7)
plt.xlabel("Forecast Errors")
plt.ylabel("Frequency")
plt.title("Histogram of Forecast Errors")
plt.grid()
plt.show()
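One rough way to approach B3 is to compare the mean error with its standard error, $t = \bar e / (s/\sqrt{n})$; a common rule of thumb treats $|t| > 2$ as a sign of bias. This is a sketch, not the course's prescribed method: it assumes roughly independent, normal errors, and if the ACF shows autocorrelation the standard error is understated. The helper name and the example values are hypothetical:

```python
import numpy as np

def mean_error_t_stat(errors):
    """t = ME / (s / sqrt(n)), with s the sample standard deviation of the errors."""
    e = np.asarray(errors, dtype=float)
    n = e.size
    return e.mean() / (e.std(ddof=1) / np.sqrt(n))

print(mean_error_t_stat([1.0, -1.0, 2.0, -2.0]))  # 0.0: mean error is exactly zero
print(mean_error_t_stat([1.0, 2.0, 1.5, 2.5]))    # well above 2: consistently positive errors
```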
B-4:
• There may be a seasonal component in the forecast errors, as the error pattern appears to repeat
over time.
• The distribution of the errors is nearly symmetric, which suggests that the forecasting model is not
heavily biased.
• However, the autocorrelation visible in the ACF (from B-2) suggests that the
model is missing some structure.
[Handwritten hand calculations of ME, MSE, MAD, MPE and MAPE for the first 5 errors]
Problem C
C1
[Handwritten derivation of the variance of $M_t$]
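A sketch of the C1 derivation, assuming the trailing form of the moving average (a centered window of 7 has the same variance) and using that the $y_t$ are uncorrelated, so all covariance terms vanish:

```latex
\begin{aligned}
M_t &= \frac{1}{7}\sum_{i=0}^{6} y_{t-i},\\
\operatorname{Var}(M_t) &= \frac{1}{7^2}\sum_{i=0}^{6}\operatorname{Var}(y_{t-i})
  = \frac{1}{49}\cdot 7\sigma^2 = \frac{\sigma^2}{7}.
\end{aligned}
```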
C2
[Handwritten calculation: variance of the SMA (window 3) compared with the Hanning filter]
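A quick check for C2, assuming the common textbook Hanning weights $M_t^H = 0.25\,y_{t+1} + 0.5\,y_t + 0.25\,y_{t-1}$ and the equal-weight SMA $M_t = (y_{t-2} + y_{t-1} + y_t)/3$. For uncorrelated data with common variance $\sigma^2$, the variance of a weighted sum is $\sigma^2\sum_i w_i^2$; the sketch computes that factor with exact fractions (the helper name is illustrative):

```python
from fractions import Fraction

def variance_factor(weights):
    """Var(sum w_i y_i) / sigma^2 for uncorrelated y_i with common variance sigma^2."""
    return sum(w * w for w in weights)

sma_weights = [Fraction(1, 3)] * 3
hanning_weights = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]  # assumed Hanning weights

print(variance_factor(sma_weights))      # 1/3
print(variance_factor(hanning_weights))  # 3/8
```

Under these assumed weights the factors are $1/3 \approx 0.333$ and $3/8 = 0.375$, so the Hanning output would have a larger, not smaller, variance: among weights that sum to 1, equal weights minimize $\sum_i w_i^2$.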
C3
[Handwritten derivation of $E(M_t)$ before and after the mean shift]
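For C3, assuming $M_t$ is the trailing average of $y_{t-6}, \dots, y_t$: when $T_1 \le t \le T_1 + 6$, exactly $t - T_1 + 1$ of the seven terms have mean $\mu + \delta$ and the rest have mean $\mu$, so $E(M_t) = \mu + (t - T_1 + 1)\delta/7$. Before $T_1$ it is $\mu$, and from $T_1 + 6$ on it is $\mu + \delta$. The sketch checks this numerically; the helper name and the values of $\mu$, $\delta$ and $T_1$ are made up:

```python
import numpy as np

def expected_sma_after_shift(mu, delta, T1, N, t_max):
    """E(M_t) for a trailing N-period SMA when E(y_t) = mu for t < T1 and mu + delta for t >= T1.

    Returns {t: E(M_t)} for t = N, ..., t_max (periods numbered from 1).
    """
    expected_y = {t: mu + (delta if t >= T1 else 0.0) for t in range(1, t_max + 1)}
    return {t: np.mean([expected_y[t - i] for i in range(N)])
            for t in range(N, t_max + 1)}

mu, delta, T1, N = 10.0, 7.0, 20, 7
e_m = expected_sma_after_shift(mu, delta, T1, N, t_max=40)
for t in range(T1, T1 + N):
    closed_form = mu + (t - T1 + 1) * delta / N
    assert np.isclose(e_m[t], closed_form)
print(e_m[T1 - 1], e_m[T1], e_m[T1 + N - 1])  # 10.0 11.0 17.0
```

The moving average therefore needs the full window of 7 periods to catch up with the new level, lagging the shift by a linear ramp.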