
Ise487 - HW#1

The document outlines the homework guidelines and statistical problems for ISE 487 at King Fahd University of Petroleum and Minerals. It includes tasks related to time series analysis, such as applying moving averages, calculating forecast errors, and assessing stationarity. The document specifies the use of Python for data analysis and requires hand calculations for certain statistical measures.

Uploaded by

Bander Moafa
Copyright © All Rights Reserved

King Fahd University of Petroleum and Minerals

Department of Industrial and Systems Engineering


Statistical Background Anis Elgabli HW1

[This sheet must be completed and attached to the first page of your homework]

ISE 487
Predictive Analytics Techniques
Term 242

Homework #1

Name: Bander Moafa    ID#: 202024760    Signature:

Homework Guidelines
To receive full credit, make sure you follow these guidelines.

Homework Presentation:
• Every main problem should be answered on a DIFFERENT page.
• All pages of your homework should be in CHRONOLOGICAL order. Your solution should not include any
crossed out lines, and question numbers should be very clear.
• Your NAME, ID, and the homework number should be clearly indicated.
• Submit the entire HW as ONE single Word or PDF document, plus one notebook file.
• If you scan or photograph your HW for submission, make sure the scan/photo is LEGIBLE. Use a
pen or a dark pencil to avoid an unreadable scan.

Problem A. Consider the time series $y_t$ given in the HW1a.csv file. Do the following using Python:

A1. Apply a simple moving average filter of window 5 to the given time series. Plot the
transformed data, $M_t$.

A2. Apply a rolling median filter of window 5 to the given time series. Plot the
transformed data, $M_t^{[5]}$.

A3. Plot the sample autocorrelation function of the given time series, i.e., $r_k$ vs $k$.

A4. Plot the sample variogram of the given time series, i.e., $\hat{G}_k$ vs $k$.

A5. Using A3 and A4, can you conclude that the time series is (weakly) stationary?

A6. Apply a power transform to the given time series using $\lambda = 0$. For the transformed data,
$y_t^{(\lambda)}$, repeat A3, A4 & A5.

A7. Apply the first difference (differencing of lag 1) to the given time series. For the
transformed data, $\nabla y_t$, repeat A3, A4 & A5.

A8. For the transformed data in A7, $\nabla y_t$, is there seasonality? If yes, apply a
differencing-based operation to remove it. For the transformed data,
$\nabla_d \nabla y_t$, repeat A3, A4 & A5.

Hand Calculations: For the given time series, calculate by hand two values each of $M_t$ and $M_t^{[5]}$, as well as
$r_1$, $r_2$, $\hat{G}_2$, $\hat{G}_3$, $y_1^{(0)}$, $y_2^{(0)}$, $\nabla y_2$, $\nabla y_3$, $\nabla_{12}\nabla y_{13}$ and $\nabla_{12}\nabla y_{14}$.

Problem B. Consider the one-step-ahead forecast errors $e_t(1)$ given in the HW1b.csv file.
Do the following using Python:

B1. Calculate the mean error (ME), the mean squared error (MSE), the mean absolute
deviation (MAD), the mean percentage error (MPE), and the mean absolute percentage error
(MAPE).

B2. Plot the forecast errors, sample ACF and histogram of the given forecast errors.

B3. Is it likely that the forecasting technique that produced the above errors gives
unbiased forecasts?

B4. From the above plots, what can you infer about the distribution, trend, and
seasonality of the errors?

Hand Calculations: For the given forecast errors, calculate all the values of B1 for the first 5 errors.

Problem C. Answer the following:

C1. Suppose a simple moving average of window $N$ is applied to uncorrelated i.i.d. data
with mean $\mu$ and variance $\sigma^2$. Show that the variance of the transformed data, $M_t$, is $\sigma^2/N$.

C2. Suppose a simple moving average and a Hanning filter (both of window 3) are applied
to uncorrelated i.i.d. data with mean $\mu$ and variance $\sigma^2$. Is the variance of the Hanning-filtered data,
$M_t^{H}$, smaller than the variance of $M_t$?

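C3. Suppose that a simple moving average of window $N$ is used to forecast a time series
that varies randomly around a constant mean, that is, $y_t = \mu + \varepsilon_t$. At the start of period $T_1$, the
mean of the time series shifts to a new level, say, $\mu + \delta$. Show that the expected value of the
transformed data, $M_T$, is:

$$E(M_T) = \begin{cases} \mu, & T < T_1,\\[4pt] \mu + \dfrac{(T - T_1 + 1)\,\delta}{N}, & T_1 \le T \le T_1 + N - 2,\\[4pt] \mu + \delta, & T \ge T_1 + N - 1. \end{cases}$$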
Problem A
A-1:
import pandas as pd
import matplotlib.pyplot as plt

# "Months" labels are year-month names; parse them with an explicit format (%B = full month name)
df = pd.read_csv("HW1a.csv", index_col="Months")
df.index = pd.to_datetime(df.index, format="%Y-%B")

# Trailing simple moving average: M_t = mean of the last 5 observations (NaN for the first 4)
df["SMA_5"] = df["Values"].rolling(window=5).mean()

plt.figure(figsize=(10, 5))
plt.plot(df.index, df["Values"], label="Original Data",
         linestyle='dashed', marker='o')
plt.plot(df.index, df["SMA_5"], label="SMA (5)", linewidth=2,
         color='red')
plt.xlabel("Months")
plt.ylabel("Values")
plt.title("Simple Moving Average (Window = 5)")
plt.legend()
plt.xticks(rotation=45)
plt.grid()
plt.show()
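As a cross-check on A-1, here is a minimal sketch (not part of the original notebook) showing that the trailing window-5 average from `rolling(window=5).mean()` equals a convolution of the series with five equal weights of 1/5. The helper name `trailing_sma` and the series `y` are made-up toy examples, not HW1a.csv.

```python
import numpy as np
import pandas as pd

def trailing_sma(y, window=5):
    """Hypothetical helper: M_t = (y_t + ... + y_{t-window+1}) / window."""
    y = np.asarray(y, dtype=float)
    weights = np.ones(window) / window
    # mode="valid" keeps only positions where the full window fits,
    # which matches the non-NaN part of pandas' trailing rolling mean
    return np.convolve(y, weights, mode="valid")

y = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0, 8.0]  # toy data
manual = trailing_sma(y, window=5)
pandas_version = pd.Series(y).rolling(window=5).mean().dropna().to_numpy()
print(manual)
print(np.allclose(manual, pandas_version))
```

The equal weights are what make this a simple moving average; any other weight vector passed to `np.convolve` gives a different linear filter.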
A-2:
import pandas as pd
import matplotlib.pyplot as plt

# Parse "Months" with an explicit format instead of letting parse_dates guess it
df = pd.read_csv("HW1a.csv", index_col="Months")
df.index = pd.to_datetime(df.index, format="%Y-%B")

# Trailing rolling median: middle value of the last 5 observations
df["Rolling_Median_5"] = df["Values"].rolling(window=5).median()

plt.figure(figsize=(10, 5))
plt.plot(df.index, df["Values"], label="Original Data",
         linestyle='dashed', marker='o')
plt.plot(df.index, df["Rolling_Median_5"], label="Rolling Median (5)",
         linewidth=2, color='green')
plt.xlabel("Months")
plt.ylabel("Values")
plt.title("Rolling Median Filter (Window = 5)")
plt.legend()
plt.xticks(rotation=45)
plt.grid()
plt.show()

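A minimal sketch of what the rolling median in A-2 computes, using hypothetical helpers `trailing_median` and `trailing_mean` on made-up data: at each time it takes the middle value of the last five observations. Because it selects a middle value instead of averaging, a single outlier in the window does not move it, while the moving average is pulled strongly toward the outlier.

```python
import statistics

def trailing_median(y, window=5):
    """Median of each full trailing window (y_{t-window+1}, ..., y_t)."""
    return [statistics.median(y[t - window + 1 : t + 1])
            for t in range(window - 1, len(y))]

def trailing_mean(y, window=5):
    """Mean of each full trailing window, for comparison."""
    return [sum(y[t - window + 1 : t + 1]) / window
            for t in range(window - 1, len(y))]

y = [10, 11, 10, 12, 100, 11, 10]   # toy series with one outlier (100)
print(trailing_median(y))           # [11, 11, 11]
print(trailing_mean(y))             # [28.6, 28.8, 28.6]
```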
A-3:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

df = pd.read_csv("HW1a.csv", index_col="Months")
df.index = pd.to_datetime(df.index, format="%Y-%B")

# Create the axes first and pass them to plot_acf; calling plt.figure() and then
# plot_acf() without ax leaves an extra empty figure behind
fig, ax = plt.subplots(figsize=(8, 5))
plot_acf(df["Values"], lags=20, ax=ax)   # sample ACF r_k for k = 0, ..., 20
ax.set_title("Autocorrelation Function (ACF) of Time Series")
ax.grid()
plt.show()

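To make the plotted quantity concrete, here is a sketch of the standard sample ACF estimator, $r_k = c_k / c_0$ with $c_k = \frac{1}{T}\sum_{t=1}^{T-k}(y_t - \bar{y})(y_{t+k} - \bar{y})$, written with NumPy only. The helper name `sample_acf` is hypothetical. When reading ACF plots, values outside roughly $\pm 2/\sqrt{T}$ are usually treated as significant.

```python
import numpy as np

def sample_acf(y, max_lag):
    """r_k = c_k / c_0, with c_k = (1/T) * sum_{t=1}^{T-k} (y_t - ybar)(y_{t+k} - ybar)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()                 # deviations from the sample mean
    c0 = np.dot(d, d) / T            # lag-0 autocovariance
    return np.array([np.dot(d[: T - k], d[k:]) / T / c0
                     for k in range(max_lag + 1)])

print(sample_acf([1, 2, 3, 4, 5], max_lag=2))
```

For the toy series 1, ..., 5 this gives $r_0 = 1$, $r_1 = 0.4$, $r_2 = -0.1$, which is also a convenient way to check hand-calculated values of $r_1$ and $r_2$.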


A-4:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv("HW1a.csv", index_col="Months")
df.index = pd.to_datetime(df.index, format="%Y-%B")

def sample_variogram(ts, max_lag=20):
    """Mean of the squared lag-k differences (y_t - y_{t+k})^2 for k = 1..max_lag."""
    lags = np.arange(1, max_lag + 1)
    variogram_values = [
        np.mean((ts.values[:-lag] - ts.values[lag:]) ** 2) for lag in lags
    ]
    return lags, variogram_values

lags, variogram_values = sample_variogram(df["Values"], max_lag=20)

plt.figure(figsize=(8, 5))
plt.plot(lags, variogram_values, marker='o', linestyle='-',
         color='blue')
plt.xlabel("Lag (k)")
plt.ylabel("Sample Variogram G(k)")
plt.title("Sample Variogram of Time Series")
plt.grid()
plt.show()

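The function in A-4 returns the raw mean of squared lag-$k$ differences. A normalized version found in some forecasting texts is $G_k = \operatorname{Var}(y_{t+k} - y_t) / \operatorname{Var}(y_{t+1} - y_t)$, estimated as $\hat{G}_k = s_k^2 / s_1^2$ where $s_k^2$ is the sample variance of $d_t^k = y_{t+k} - y_t$. It equals 1 at lag 1, levels off for a stationary series, and keeps growing for a nonstationary one. The sketch below, with the hypothetical helper `normalized_variogram`, is one way to compute it; whether the course expects this exact normalization is an assumption.

```python
import numpy as np

def normalized_variogram(y, max_lag):
    """G_k = s_k^2 / s_1^2, where s_k^2 is the sample variance of d_t^k = y_{t+k} - y_t."""
    y = np.asarray(y, dtype=float)
    # lag-k differences for k = 1..max_lag, each variance with denominator (n - 1)
    s2 = [np.var(y[k:] - y[:-k], ddof=1) for k in range(1, max_lag + 1)]
    return np.array(s2) / s2[0]

g = normalized_variogram([1, 3, 2, 5, 4, 6], max_lag=2)   # toy data
print(g)
```

On the HW data this would be called as `normalized_variogram(df["Values"], max_lag=20)`.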

A-5:
The time series is NOT weakly stationary because:

1. The sample ACF decays slowly, which indicates a trend.

2. The sample variogram cycles up and down instead of leveling off at a constant value, which indicates seasonality.
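Weak stationarity requires, among other things, a constant mean and variance over time. A rough numerical complement to reading the plots (a heuristic, not the method the assignment asks for) is to compare the mean and variance of the first and second halves of the series. In the sketch below, the helper `half_split_summary` and the trending toy series are illustrative assumptions.

```python
import numpy as np

def half_split_summary(y):
    """Return (mean, variance) for the first and second halves of y."""
    y = np.asarray(y, dtype=float)
    first, second = np.array_split(y, 2)
    return (first.mean(), first.var(ddof=1)), (second.mean(), second.var(ddof=1))

t = np.arange(48)
trend_series = 5 + 0.2 * t          # toy series with a linear trend
(m1, v1), (m2, v2) = half_split_summary(trend_series)
print(m1, m2)   # the mean shifts between halves, so the mean is not constant
print(v1, v2)   # the within-half variances happen to match here
```

A large shift in either summary is consistent with non-stationarity; similar values do not prove stationarity, which is why the ACF and variogram are still needed.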

A-6:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

df = pd.read_csv("HW1a.csv", index_col="Months")
df.index = pd.to_datetime(df.index, format="%Y-%B")

# The power transform with lambda = 0 is the natural log (requires Values > 0)
df["Log_Values"] = np.log(df["Values"])

fig, ax = plt.subplots(figsize=(8, 5))
plot_acf(df["Log_Values"], lags=20, ax=ax)
ax.set_title("Autocorrelation Function (ACF) - Log Transformed Data")
ax.grid()
plt.show()

def sample_variogram(ts, max_lag=20):
    """Mean of the squared lag-k differences (y_t - y_{t+k})^2 for k = 1..max_lag."""
    lags = np.arange(1, max_lag + 1)
    variogram_values = [
        np.mean((ts.values[:-lag] - ts.values[lag:]) ** 2) for lag in lags
    ]
    return lags, variogram_values

lags, variogram_values = sample_variogram(df["Log_Values"], max_lag=20)

plt.figure(figsize=(8, 5))
plt.plot(lags, variogram_values, marker='o', linestyle='-',
         color='purple')
plt.xlabel("Lag (k)")
plt.ylabel("Sample Variogram G(k)")
plt.title("Sample Variogram - Log Transformed Data")
plt.grid()
plt.show()

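A-6 uses `np.log` because $\lambda = 0$. The sketch below shows the classical Box-Cox family that this special case belongs to, $y^{(\lambda)} = (y^\lambda - 1)/\lambda$ for $\lambda \ne 0$ and $\ln y$ for $\lambda = 0$ (the limit as $\lambda \to 0$). The helper name `box_cox` is hypothetical. Some textbooks also rescale by the geometric mean of the data; rescaling by a positive constant does not change the sample ACF.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox power transform; needs strictly positive data."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox transform needs strictly positive data")
    if lam == 0:
        return np.log(y)              # lambda = 0: natural log
    return (y ** lam - 1.0) / lam     # lambda != 0: shifted and scaled power

print(np.round(box_cox([6.99, 7.13], 0), 2))   # log of the first two observations
print(box_cox([1.0, 4.0], 0.5))
```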


A-7:
# Reuses pd, np, plt and plot_acf imported in the earlier cells
df = pd.read_csv("HW1a.csv", index_col="Months")
df.index = pd.to_datetime(df.index, format="%Y-%B")

# First difference (lag 1): ∇y_t = y_t - y_{t-1}
df["Diff_Values"] = df["Values"].diff()

fig, ax = plt.subplots(figsize=(8, 5))
plot_acf(df["Diff_Values"].dropna(), lags=20, ax=ax)
ax.set_title("Autocorrelation Function (ACF) - First Differenced Data")
ax.grid()
plt.show()

def sample_variogram(ts, max_lag=20):
    """Mean of the squared lag-k differences (y_t - y_{t+k})^2 for k = 1..max_lag."""
    lags = np.arange(1, max_lag + 1)
    variogram_values = [
        np.mean((ts.values[:-lag] - ts.values[lag:]) ** 2) for lag in lags
    ]
    return lags, variogram_values

lags, variogram_values = sample_variogram(df["Diff_Values"].dropna(),
                                          max_lag=20)

plt.figure(figsize=(8, 5))
plt.plot(lags, variogram_values, marker='o', linestyle='-',
         color='orange')
plt.xlabel("Lag (k)")
plt.ylabel("Sample Variogram G(k)")
plt.title("Sample Variogram - First Differenced Data")
plt.grid()
plt.show()

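Why first differencing helps with a trend: applied to a pure linear trend $y_t = a + bt$, the operator $\nabla y_t = y_t - y_{t-1}$ leaves the constant slope $b$. A minimal sketch with a hypothetical `difference` helper and toy data:

```python
import numpy as np

def difference(y, lag=1):
    """Lag-`lag` difference y_t - y_{t-lag}; the first `lag` values are dropped."""
    y = np.asarray(y, dtype=float)
    return y[lag:] - y[:-lag]

trend = 3.0 + 2.0 * np.arange(6)   # y_t = 3 + 2t, a pure linear trend
print(difference(trend))           # every entry equals the slope, 2.0
```

This is the same operation as `df["Values"].diff()` in A-7, except that pandas keeps a leading NaN instead of dropping it.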


A-8:
df = pd.read_csv("HW1a.csv", index_col="Months")
df.index = pd.to_datetime(df.index, format="%Y-%B")

# A8 asks for the seasonal difference of the first-differenced series,
# ∇_12 ∇ y_t, assuming a 12-month season
df["Seasonal_Diff_Values"] = df["Values"].diff().diff(12)

fig, ax = plt.subplots(figsize=(8, 5))
plot_acf(df["Seasonal_Diff_Values"].dropna(), lags=20, ax=ax)
ax.set_title("Autocorrelation Function (ACF) - Seasonal + First Differenced Data")
ax.grid()
plt.show()

def sample_variogram(ts, max_lag=20):
    """Mean of the squared lag-k differences (y_t - y_{t+k})^2 for k = 1..max_lag."""
    lags = np.arange(1, max_lag + 1)
    variogram_values = [
        np.mean((ts.values[:-lag] - ts.values[lag:]) ** 2) for lag in lags
    ]
    return lags, variogram_values

lags, variogram_values = sample_variogram(
    df["Seasonal_Diff_Values"].dropna(), max_lag=20)

plt.figure(figsize=(8, 5))
plt.plot(lags, variogram_values, marker='o', linestyle='-',
         color='red')
plt.xlabel("Lag (k)")
plt.ylabel("Sample Variogram G(k)")
plt.title("Sample Variogram - Seasonal + First Differenced Data")
plt.grid()
plt.show()

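A sketch of why $\nabla_{12}\nabla y_t$ removes both a linear trend and a fixed 12-period pattern: $\nabla$ turns the trend into a constant, and $\nabla_{12}$ then cancels that constant together with anything that repeats every 12 periods. The two operators commute, so the order does not matter. The seasonal pattern and trend below are made-up toy values.

```python
import numpy as np

def difference(y, lag=1):
    """Lag-`lag` difference y_t - y_{t-lag}; the first `lag` values are dropped."""
    y = np.asarray(y, dtype=float)
    return y[lag:] - y[:-lag]

t = np.arange(60)                                   # five "years" of monthly toy data
pattern = [0, 1, 3, 2, 5, 4, 6, 5, 3, 2, 1, 0]      # arbitrary 12-month pattern
y = 10 + 0.5 * t + np.tile(pattern, 5)              # trend + seasonality
z = difference(difference(y, 1), 12)                # ∇_12 ∇ y_t
print(len(z), np.allclose(z, 0.0))                  # 13 values lost; the rest are ~0
```

On real data the result is not exactly zero; what should remain is the irregular component, which is what A8 asks you to examine with the ACF and variogram.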


[Handwritten hand calculations for Problem A. Legible results include: $y_1^{(0)} = \ln 6.99 \approx 1.94$, $y_2^{(0)} = \ln 7.13 \approx 1.96$, $\nabla y_2 = 7.13 - 6.99 = 0.14$, $\nabla y_3 = 7.40 - 7.13 = 0.27$.]
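The legible hand-calculated values can be checked in a few lines. This sketch uses only the first three observations that appear in the handwritten work (6.99, 7.13, 7.40); on the full data the same checks would use `df["Values"]`.

```python
import numpy as np

y = np.array([6.99, 7.13, 7.40])      # first three observations, from the handwritten work
log_y = np.log(y)                     # y_t^{(0)} = ln y_t
first_diff = np.diff(y)               # ∇y_t = y_t - y_{t-1} for t = 2, 3
print(np.round(log_y[:2], 2))         # [1.94 1.96]
print(np.round(first_diff, 2))        # [0.14 0.27]
```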
Problem B

B-1:
import numpy as np
import pandas as pd

df = pd.read_csv("HW1b.csv", parse_dates=["Year"], index_col="Year")
errors = df["Forecast Errors"]
actual = df["Actual Values"]

ME = errors.mean()                                # Mean Error
MSE = np.mean(errors**2)                          # Mean Squared Error
MAD = np.mean(np.abs(errors))                     # Mean Absolute Deviation
MPE = np.mean(errors / actual) * 100              # Mean Percentage Error
MAPE = np.mean(np.abs(errors / actual)) * 100     # Mean Absolute Percentage Error

print(f"Mean Error (ME): {ME:.4f}")
print(f"Mean Squared Error (MSE): {MSE:.4f}")
print(f"Mean Absolute Deviation (MAD): {MAD:.4f}")
print(f"Mean Percentage Error (MPE): {MPE:.2f}%")
print(f"Mean Absolute Percentage Error (MAPE): {MAPE:.2f}%")

Mean Error (ME): 0.0544


Mean Squared Error (MSE): 1.2775
Mean Absolute Deviation (MAD): 0.9056
Mean Percentage Error (MPE): 0.16%
Mean Absolute Percentage Error (MAPE): 1.77%
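For the B1 hand calculations on the first five errors, a small reusable function helps to check the arithmetic. This is a sketch with a hypothetical name, `accuracy_measures`, and it assumes the errors are defined as actual minus forecast, so the percentage errors are $100\,e_t(1)/y_t$.

```python
import numpy as np

def accuracy_measures(errors, actual):
    """ME, MSE, MAD, MPE (%) and MAPE (%) of one-step-ahead errors e_t(1)."""
    e = np.asarray(errors, dtype=float)
    y = np.asarray(actual, dtype=float)
    rel = 100.0 * e / y                  # relative (percentage) errors
    return {
        "ME": e.mean(),
        "MSE": np.mean(e ** 2),
        "MAD": np.mean(np.abs(e)),
        "MPE": rel.mean(),
        "MAPE": np.mean(np.abs(rel)),
    }

m = accuracy_measures([1.0, -1.0, 2.0], [10.0, 10.0, 20.0])   # toy numbers
print(m)
```

With the B-1 variables, `accuracy_measures(errors.head(5), actual.head(5))` gives the five measures for the first five errors, which is what the hand calculation asks for.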

B-2:
plt.figure(figsize=(8,4))
plt.plot(df.index, errors, marker='o', linestyle='dashed',
color='red')
plt.axhline(y=0, color='black', linestyle='--')
plt.xlabel("Year")
plt.ylabel("Forecast Error")
plt.title("Forecast Errors Over Time")
plt.grid()
plt.show()

# Pass explicit axes to plot_acf so no extra empty figure is created
fig, ax = plt.subplots(figsize=(8, 4))
plot_acf(errors, lags=15, ax=ax)
ax.set_title("ACF of Forecast Errors")
ax.grid()
plt.show()

plt.figure(figsize=(8,4))
plt.hist(errors, bins=10, edgecolor='black', alpha=0.7)
plt.xlabel("Forecast Errors")
plt.ylabel("Frequency")
plt.title("Histogram of Forecast Errors")
plt.grid()
plt.show()



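B-3:
The forecasting method is likely unbiased (or at most slightly biased), but it is not fully adequate:

• ME (0.0544) and MPE (0.16%) are close to zero relative to the size of the errors (MAD = 0.9056), so there is no evidence of a systematic bias.

• The ACF shows correlation at some lags, so the errors are not white noise; this points to structure the method misses rather than to bias.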
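One way to put "ME is close to zero" on a numerical footing is the one-sample t statistic for $H_0: E[e_t] = 0$, i.e. $t = \bar{e}/(s/\sqrt{n})$. In the sketch below, the helper `mean_error_t_stat` and the toy errors are assumptions; `scipy.stats.ttest_1samp` computes the same statistic together with a p-value. As a rule of thumb, $|t|$ below about 2 gives no evidence of bias. Autocorrelated errors make this standard error too optimistic, so the result should be read together with the ACF.

```python
import numpy as np

def mean_error_t_stat(errors):
    """t = mean(e) / (s / sqrt(n)) for testing whether the mean error is zero."""
    e = np.asarray(errors, dtype=float)
    n = e.size
    return e.mean() / (e.std(ddof=1) / np.sqrt(n))

t_stat = mean_error_t_stat([1.0, -1.0, 2.0, 0.0])   # toy errors
print(round(t_stat, 4))                             # 0.7746
```

With the B-1 data this would be `mean_error_t_stat(errors)`.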

B-4:
• There may be a seasonal component in forecast errors, as the error pattern repeats
over time.

• The distribution of errors is nearly symmetric, meaning the forecasting model is not
heavily biased.

• However, the presence of autocorrelation in ACF (from B-2) suggests that the
model is missing some structure.
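To summarize the ACF evidence in B-4 in a single number, a common choice is the Ljung-Box statistic $Q = T(T+2)\sum_{k=1}^{K} r_k^2/(T-k)$. For raw forecast errors it is compared with a chi-square distribution with $K$ degrees of freedom, and a large $Q$ indicates autocorrelation. This is a hand-rolled sketch with a hypothetical helper name; statsmodels also provides `acorr_ljungbox` in `statsmodels.stats.diagnostic`.

```python
import numpy as np

def ljung_box_q(errors, max_lag):
    """Ljung-Box Q = T(T+2) * sum_{k=1}^{K} r_k^2 / (T - k), using the sample ACF r_k."""
    e = np.asarray(errors, dtype=float)
    T = e.size
    d = e - e.mean()
    c0 = np.dot(d, d) / T
    r = [np.dot(d[: T - k], d[k:]) / T / c0 for k in range(1, max_lag + 1)]
    return T * (T + 2) * sum(rk ** 2 / (T - k) for k, rk in enumerate(r, start=1))

print(round(ljung_box_q([1, 2, 3, 4, 5], max_lag=1), 6))   # 1.4 (toy data)
```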
[Handwritten hand calculations of ME, MSE, MAD, MPE and MAPE for the first five forecast errors.]
Problem C
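C1
$$\operatorname{Var}(M_t) = \operatorname{Var}\left(\frac{1}{N}\sum_{i=0}^{N-1} y_{t-i}\right) = \frac{1}{N^2}\sum_{i=0}^{N-1}\operatorname{Var}(y_{t-i}) = \frac{N\sigma^2}{N^2} = \frac{\sigma^2}{N},$$
where the covariance terms vanish because the observations are uncorrelated.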
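A quick Monte Carlo check of C1. The values $N = 7$, $\mu = 5$ and $\sigma = 2$ are arbitrary simulation choices. Each moving-average value is computed from its own $N$ i.i.d. draws; overlapping windows would make successive $M_t$ correlated, but would not change the variance of each one.

```python
import numpy as np

rng = np.random.default_rng(0)
N, mu, sigma, n_windows = 7, 5.0, 2.0, 200_000
# each row holds one window of N i.i.d. draws with mean mu and variance sigma^2
y = rng.normal(loc=mu, scale=sigma, size=(n_windows, N))
M = y.mean(axis=1)                  # one moving-average value per window
print(M.var(), sigma ** 2 / N)      # both close to 4/7
```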
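C2
SMA (window 3): $M_t = \frac{1}{3}(y_t + y_{t-1} + y_{t-2})$, so $\operatorname{Var}(M_t) = \frac{3\sigma^2}{9} = \frac{\sigma^2}{3} \approx 0.333\sigma^2$.

Hanning (window 3): $M_t^{H} = 0.25\,y_{t+1} + 0.5\,y_t + 0.25\,y_{t-1}$, so $\operatorname{Var}(M_t^{H}) = (0.0625 + 0.25 + 0.0625)\sigma^2 = 0.375\sigma^2$.

Since $0.375\sigma^2 > 0.333\sigma^2$, $\operatorname{Var}(M_t^{H}) > \operatorname{Var}(M_t)$: the variance of the Hanning-filtered data is not smaller.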
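The C2 comparison generalizes: for any linear filter $\sum_i w_i y_i$ applied to uncorrelated data with common variance $\sigma^2$, the output variance is $\sigma^2 \sum_i w_i^2$. Among weights that sum to 1, equal weights minimize $\sum_i w_i^2$, which is why the simple moving average beats the Hanning filter here. A sketch with a hypothetical helper:

```python
import numpy as np

def filter_variance_factor(weights):
    """Var(sum w_i y_i) / sigma^2 = sum w_i^2 for uncorrelated y_i with common variance."""
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w ** 2))

sma = filter_variance_factor([1 / 3, 1 / 3, 1 / 3])     # 1/3
hanning = filter_variance_factor([0.25, 0.5, 0.25])     # 0.375
print(sma, hanning, hanning > sma)
```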
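C3
Write $M_T = \frac{1}{N}\sum_{i=0}^{N-1} y_{T-i}$, so that $E(M_T) = \frac{1}{N}\sum_{i=0}^{N-1} E(y_{T-i})$.

Before the shift ($t < T_1$): $E(y_t) = \mu$. After the shift ($t \ge T_1$): $E(y_t) = \mu + \delta$.

If $m$ of the $N$ averaged periods fall at or after $T_1$, then
$$E(M_T) = \frac{1}{N}\left[(N - m)\mu + m(\mu + \delta)\right] = \mu + \frac{m\,\delta}{N},$$
where $m = 0$ for $T < T_1$, $m = T - T_1 + 1$ for $T_1 \le T \le T_1 + N - 2$, and $m = N$ for $T \ge T_1 + N - 1$. This gives $E(M_T) = \mu$, $\mu + (T - T_1 + 1)\delta/N$, and $\mu + \delta$ in the three cases.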
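A sketch that checks the C3 result numerically: `expected_sma_after_shift` implements the piecewise formula through the count $m$, and `expected_by_averaging` averages the period-by-period expected values directly. Both helper names and the parameter values in the usage are hypothetical.

```python
def expected_sma_after_shift(T, T1, N, mu, delta):
    """E(M_T) for a trailing SMA of span N when the mean jumps from mu to mu + delta at T1."""
    # number of the N averaged periods (T-N+1, ..., T) that fall at or after T1
    m = min(max(T - T1 + 1, 0), N)
    return mu + m * delta / N

def expected_by_averaging(T, T1, N, mu, delta):
    """Average E(y_t) over the window t = T-N+1, ..., T directly."""
    return sum(mu + (delta if t >= T1 else 0.0) for t in range(T - N + 1, T + 1)) / N

print(expected_sma_after_shift(11, T1=10, N=5, mu=2.0, delta=3.0))   # mu + 2*delta/N
```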
