
MScFE 600 Financial Data GWP1_Grp_7982_Que2_

In [40]: import pandas as pd
import numpy as np
from scipy.optimize import curve_fit
from scipy.interpolate import CubicSpline
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

A.

Country Selection - Indonesia


*The bond yield data used in this analysis is for government securities from Indonesia,
as advised by the group. The dataset was downloaded, consolidated, and analysed to
study the yield curve.*

• *Why Indonesia Was Selected*

▪ Group Recommendation - Indonesia was chosen because it is one of the countries recommended by the group.
• *Dataset Overview*

▪ Source - Investing.com
▪ Content - Daily bond yields across different maturities
▪ Purpose - The dataset is well-suited for yield curve analysis because it offers
consistent and comprehensive data points for modeling.
• *Selected Maturities*

▪ The dataset includes a comprehensive range of maturities, ensuring a complete analysis of the yield curve. The selected maturities are as follows:
▪ Short-term: 1 month, 3 months, 6 months, 1 year.
▪ Medium-term: 3 years, 5 years.
▪ Long-term: 10 years, 15 years, 20 years, 25 years, 30 years.
• *Reason for Selection*

▪ These maturities cover the full spectrum of the yield curve, from short-term to long-term securities. This allows for accurate modeling of interest rate dynamics across different time horizons and provides sufficient data for fitting both the Nelson-Siegel and Cubic-Spline models; the Nelson-Siegel parameterization is written out below for reference.
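
For reference, the Nelson-Siegel model fitted in section C is the standard four-parameter curve (with m denoting maturity in years):

$$ y(m) = \beta_0 + \beta_1\,\frac{1 - e^{-m/\tau}}{m/\tau} + \beta_2\left(\frac{1 - e^{-m/\tau}}{m/\tau} - e^{-m/\tau}\right) $$

Here beta0 captures the long-run level, beta1 the slope, beta2 the curvature, and tau the decay rate of the exponential terms.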

In [66]: # Load the dataset
file_path = 'Dataset1812025Bond_Yields.csv'  # Update this path if necessary
data = pd.read_csv(file_path)

# Select columns for analysis
selected_columns = ['Date', '1M', '3M', '6M', '1Yr', '3Yr', '5Yr', '10Yr', '15Yr', '20Yr', '25Yr', '30Yr']
filtered_data = data[selected_columns].copy()
filtered_data['Date'] = pd.to_datetime(filtered_data['Date'])

print("Dataset loaded and filtered for maturities ranging from 1M to 30Yr.")


Dataset loaded and filtered for maturities ranging from 1M to 30Yr.

B.

In [68]: # Define maturities in years
maturities = np.array([1/12, 3/12, 6/12, 1, 3, 5, 10, 15, 20, 25, 30])

# Extract yields for the first date as an example
yields = filtered_data.iloc[0, 1:].values  # Using the first row of yields

print("\nMaturities (Years):", maturities)
print("Yields (Observed):", yields)

Maturities (Years): [ 0.08333333  0.25  0.5  1.  3.  5.  10.  15.  20.  25.  30. ]
Yields (Observed): [4.064 4.379 4.625 5.001 5.253 5.352 5.729 6.234 6.554 6.77  6.555]

C.

In [70]: # Define the Nelson-Siegel model
def nelson_siegel(maturity, beta0, beta1, beta2, tau):
    """
    Nelson-Siegel model for yield curve fitting.
    """
    term1 = beta0
    term2 = beta1 * ((1 - np.exp(-maturity / tau)) / (maturity / tau))
    term3 = beta2 * (((1 - np.exp(-maturity / tau)) / (maturity / tau)) - np.exp(-maturity / tau))
    return term1 + term2 + term3

# Initial guess for the parameters [beta0, beta1, beta2, tau]
initial_guess = [5, -1, 1, 1]

# Fit the Nelson-Siegel model
params_ns, _ = curve_fit(nelson_siegel, maturities, yields, p0=initial_guess)

# Extract fitted Nelson-Siegel yields
fitted_ns_yields = nelson_siegel(maturities, *params_ns)

# Display the parameters
print("\nNelson-Siegel Model Parameters:")
print(f"beta0 = {params_ns[0]:.4f}, beta1 = {params_ns[1]:.4f}, beta2 = {params_ns[2]:.4f}, tau = {params_ns[3]:.4f}")

# Plot the Nelson-Siegel fit
plt.figure(figsize=(10, 6))
plt.plot(maturities, yields, 'o', label='Observed Yields')
plt.plot(maturities, fitted_ns_yields, '-', label='Nelson-Siegel Fit')
plt.xlabel('Maturities (Years)')
plt.ylabel('Yields (%)')
plt.title('Nelson-Siegel Yield Curve Fit')
plt.legend()
plt.grid()
plt.show()

Nelson-Siegel Model Parameters:
beta0 = 7.0918, beta1 = -2.7091, beta2 = -0.0005, tau = 4.6961
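
With these estimates, beta0 ≈ 7.09 is the implied long-run level of yields, beta0 + beta1 ≈ 4.38 approximates the short end, and the near-zero beta2 indicates little hump in the curve. As a minimal usage sketch (assuming params_ns and nelson_siegel from the cell above), the fitted curve can also be evaluated at maturities that are not in the dataset; the 2, 7 and 12 year points below are illustrative only:

In [ ]: # Hypothetical example: evaluate the fitted Nelson-Siegel curve at unobserved maturities
unobserved = np.array([2.0, 7.0, 12.0])  # maturities in years, chosen for illustration
implied = nelson_siegel(unobserved, *params_ns)
for m, y in zip(unobserved, implied):
    print(f"Implied {m:.0f}-year yield: {y:.3f}%")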


D.

In [56]: # Fit the cubic spline model
cs = CubicSpline(maturities, yields)
fitted_cs_yields = cs(maturities)

print("\nCubic Spline model fitted successfully.")

# Plot the Cubic-Spline fit
plt.figure(figsize=(10, 6))
plt.plot(maturities, yields, 'o', label='Observed Yields')
plt.plot(maturities, fitted_cs_yields, '--', label='Cubic Spline Fit')
plt.xlabel('Maturities (Years)')
plt.ylabel('Yields (%)')
plt.title('Cubic Spline Yield Curve Fit')
plt.legend()
plt.grid()
plt.show()

Cubic Spline model fitted successfully.


E.

In [58]: # Calculate RMSE and R^2 for both models
rmse_ns = np.sqrt(mean_squared_error(yields, fitted_ns_yields))
rmse_cs = np.sqrt(mean_squared_error(yields, fitted_cs_yields))
r2_ns = r2_score(yields, fitted_ns_yields)
r2_cs = r2_score(yields, fitted_cs_yields)

# Print comparison results
print("\nModel Comparison:")
print(f"Nelson-Siegel Model: RMSE = {rmse_ns:.4f}, R^2 = {r2_ns:.4f}")
print(f"Cubic Spline Model: RMSE = {rmse_cs:.4f}, R^2 = {r2_cs:.4f}")

# Plot observed yields and model fits
plt.figure(figsize=(10, 6))
plt.plot(maturities, yields, 'o', label='Observed Yields')
plt.plot(maturities, fitted_ns_yields, '-', label='Nelson-Siegel Fit')
plt.plot(maturities, fitted_cs_yields, '--', label='Cubic Spline Fit')
plt.xlabel('Maturities (Years)')
plt.ylabel('Yields (%)')
plt.title('Comparison of Yield Curve Models')
plt.legend()
plt.grid()
plt.show()

Model Comparison:
Nelson-Siegel Model: RMSE = 0.1914, R^2 = 0.9544
Cubic Spline Model: RMSE = 0.0000, R^2 = 1.0000
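
The cubic spline interpolates the observed points exactly, so its in-sample RMSE of zero and R^2 of one hold by construction and do not by themselves indicate a better model. A fairer check is to hold maturities out of the fit; the leave-one-out sketch below is an illustrative addition (it reuses maturities, yields, initial_guess and the models defined above, keeping the two end points as knots), not part of the original comparison:

In [ ]: # Hypothetical leave-one-out comparison: refit each model with one interior
# maturity held out and measure the error at that held-out point.
errors_ns, errors_cs = [], []
for i in range(1, len(maturities) - 1):
    mask = np.ones(len(maturities), dtype=bool)
    mask[i] = False
    p_loo, _ = curve_fit(nelson_siegel, maturities[mask], yields[mask], p0=initial_guess)
    errors_ns.append(nelson_siegel(maturities[i], *p_loo) - yields[i])
    cs_loo = CubicSpline(maturities[mask], yields[mask])
    errors_cs.append(float(cs_loo(maturities[i])) - yields[i])
print(f"Leave-one-out RMSE (Nelson-Siegel): {np.sqrt(np.mean(np.square(errors_ns))):.4f}")
print(f"Leave-one-out RMSE (Cubic Spline):  {np.sqrt(np.mean(np.square(errors_cs))):.4f}")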


F.

In [60]: print("\nModel Parameters:")
print("Nelson-Siegel Parameters:")
print(f"beta0 = {params_ns[0]:.4f}, beta1 = {params_ns[1]:.4f}, beta2 = {params_ns[2]:.4f}, tau = {params_ns[3]:.4f}")
print("\nCubic Spline Model does not have interpretable parameters as it is a non-parametric fit.")

Model Parameters:
Nelson-Siegel Parameters:
beta0 = 7.0918, beta1 = -2.7091, beta2 = -0.0005, tau = 4.6961

Cubic Spline Model does not have interpretable parameters as it is a non-parametric fit.
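
Strictly speaking, the fitted spline does carry coefficients (one cubic polynomial per interval between adjacent knots), but unlike the Nelson-Siegel betas they have no direct economic reading. A minimal sketch inspecting them, assuming the cs object fitted in section D:

In [ ]: # Piecewise-polynomial coefficients of the fitted spline: four coefficients
# (cubic term down to constant) for each interval between adjacent knots.
print("Spline coefficient array shape:", cs.c.shape)
print("Knots (maturities in years):", cs.x)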

G.

Ethical Considerations
It is unethical to intentionally create misleading data. Smoothing, while useful for filtering noise in econometrics, becomes unethical when it misrepresents reality, for example by understating volatility or inflating Sharpe ratios. Holding back gains to offset later losses, as in profit-and-loss smoothing, creates an illusion of stability. Good ethics demands transparency, so such practices are unethical when used to mislead stakeholders about risk or performance (WQU, 2024).

Applied to this analysis, Nelson-Siegel smoothing is acceptable if it is used solely for analytical clarity. If, however, it distorts the yield curve to mislead stakeholders about risk or performance, it becomes unethical and breaches transparency. The use of smoothing techniques such as the Nelson-Siegel model therefore raises ethical concerns whenever it obscures critical market information. Key considerations are:

• Transparency

Smoothing improves interpretability but may conceal market risks if used without disclosure.

• Over-smoothing

During periods of volatility, excessive smoothing could mislead investors by hiding significant deviations in market conditions.

Ethical modeling practices require clear communication of assumptions, validation of results, and alignment with the intended use of the analysis. Smoothing should enhance, not distort, market insights (Module 2, Lesson 4, 2024).

In [ ]:

MScFE 600 Financial Data GWP1_Grp7982_Ques3

In [13]: import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

In [14]: # Load the dataset
file_path = 'Dataset1812025Bond_Yields.csv'
data = pd.read_csv(file_path)

*A.*

In [17]: # Generate 5 uncorrelated Gaussian random variables
np.random.seed(42)  # For reproducibility
n_samples, n_features = 100, 5
uncorrelated_data = np.random.normal(loc=0, scale=0.1, size=(n_samples, n_features))

# Convert to DataFrame
uncorrelated_df = pd.DataFrame(uncorrelated_data, columns=[f"Var{i+1}" for i in range(n_features)])
print("Uncorrelated Gaussian Random Variables:")
print(uncorrelated_df.head())

Uncorrelated Gaussian Random Variables:
Var1 Var2 Var3 Var4 Var5
0 0.049671 -0.013826 0.064769 0.152303 -0.023415
1 -0.023414 0.157921 0.076743 -0.046947 0.054256
2 -0.046342 -0.046573 0.024196 -0.191328 -0.172492
3 -0.056229 -0.101283 0.031425 -0.090802 -0.141230
4 0.146565 -0.022578 0.006753 -0.142475 -0.054438

*B.*

In [19]: # Compute correlation matrix
correlation_matrix_uncorrelated = np.corrcoef(uncorrelated_data.T)

# Perform PCA
pca_uncorrelated = PCA()
pca_uncorrelated.fit(correlation_matrix_uncorrelated)

# Extract explained variance ratio
explained_variance_ratio_uncorrelated = pca_uncorrelated.explained_variance_ratio_

# Prepare PCA results
pca_results_uncorrelated = pd.DataFrame({
    "Principal Component": [f"PC{i+1}" for i in range(len(explained_variance_ratio_uncorrelated))],
    "Explained Variance Ratio (%)": explained_variance_ratio_uncorrelated * 100
})
print("\nPCA Results (Uncorrelated Data):")
print(pca_results_uncorrelated)


PCA Results (Uncorrelated Data):
Principal Component Explained Variance Ratio (%)
0 PC1 3.655594e+01
1 PC2 2.462578e+01
2 PC3 2.242252e+01
3 PC4 1.639577e+01
4 PC5 6.319696e-35
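
The cell above feeds the 5x5 correlation matrix itself into sklearn's PCA. As a cross-check (an illustrative alternative, not the approach used for the results above), PCA can also be run on the simulated observations directly, or the eigenvalues of the correlation matrix can be taken; for truly uncorrelated variables each component should explain roughly 20% of the variance, up to sampling noise:

In [ ]: # Alternative check: PCA on the raw simulated observations, plus the eigenvalue
# shares of the correlation matrix itself.
pca_direct = PCA()
pca_direct.fit(uncorrelated_data)
print("Explained variance ratio, data (%):", np.round(pca_direct.explained_variance_ratio_ * 100, 2))

eigenvalues = np.linalg.eigvalsh(correlation_matrix_uncorrelated)[::-1]
print("Eigenvalue shares, correlation matrix (%):", np.round(eigenvalues / eigenvalues.sum() * 100, 2))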

*C.*

In [21]: # Display cumulative variance explained
cumulative_variance_uncorrelated = explained_variance_ratio_uncorrelated.cumsum() * 100
print("\nCumulative Variance Explained (Uncorrelated Data):")
for i, variance in enumerate(cumulative_variance_uncorrelated):
    print(f"Component {i+1}: {variance:.2f}%")

Cumulative Variance Explained (Uncorrelated Data):
Component 1: 36.56%
Component 2: 61.18%
Component 3: 83.60%
Component 4: 100.00%
Component 5: 100.00%

*D.*

In [314… plt.figure(figsize=(8, 5))
plt.plot(range(1, len(explained_variance_ratio_uncorrelated) + 1), explained_variance_ratio_uncorrelated * 100, marker='o')
plt.title("Scree Plot: Variance Explained by Principal Components (Uncorrelated Data)", fontsize=14)
plt.xlabel("Principal Component", fontsize=12)
plt.ylabel("Explained Variance Ratio (%)", fontsize=12)
plt.grid()
plt.show()


Now let’s work with real data:

*E.*

In [23]: file_path = 'Dataset1812025Bond_Yields.csv'  # Replace with your file path
real_data = pd.read_csv(file_path)

# Convert 'Date' column to datetime
real_data['Date'] = pd.to_datetime(real_data['Date'])

# Select the maturities for analysis
maturity_columns = ['1M', '3M', '1Yr', '5Yr', '10Yr']
yields_data = real_data[maturity_columns].dropna()

print("Government Yield Data (First Few Rows):")
print(yields_data.head())

Government Yield Data (First Few Rows):
1M 3M 1Yr 5Yr 10Yr
0 4.064 4.379 5.001 5.352 5.729
1 4.701 4.576 5.553 5.357 5.771
2 3.654 4.338 4.944 5.399 5.705
3 3.654 4.347 5.411 5.311 5.664
4 4.459 4.509 4.723 5.232 5.627

*F.*

In [25]: # Compute daily yield changes
daily_changes = yields_data.diff().dropna()

print("Daily Yield Changes (First Few Rows):")


print(daily_changes.head())

Daily Yield Changes (First Few Rows):
1M 3M 1Yr 5Yr 10Yr
1 0.637 0.197 0.552 0.005 0.042
2 -1.047 -0.238 -0.609 0.042 -0.066
3 0.000 0.009 0.467 -0.088 -0.041
4 0.805 0.162 -0.688 -0.079 -0.037
5 -0.285 -0.111 -0.021 -0.008 -0.018

G. Re-run the Principal Components using EITHER the correlation or covariance matrix.

In [27]: # Compute the correlation matrix for daily changes
correlation_matrix_gov = np.corrcoef(daily_changes.T)

# Perform PCA
pca_gov = PCA()
pca_gov.fit(correlation_matrix_gov)

# Extract explained variance ratio
explained_variance_ratio_gov = pca_gov.explained_variance_ratio_

# Prepare PCA results
pca_results_gov = pd.DataFrame({
    "Principal Component": [f"PC{i+1}" for i in range(len(explained_variance_ratio_gov))],
    "Explained Variance Ratio (%)": explained_variance_ratio_gov * 100
})
print("\nPCA Results (Government Data):")
print(pca_results_gov)

PCA Results (Government Data):
Principal Component Explained Variance Ratio (%)
0 PC1 6.289245e+01
1 PC2 2.124243e+01
2 PC3 1.400119e+01
3 PC4 1.863934e+00
4 PC5 8.072131e-32
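
Since the task allows either the correlation or the covariance matrix, a hedged sketch of the covariance route and of the component loadings follows (an illustrative addition using daily_changes from part F, not a rerun of the cell above); in yield-curve PCA the first three components are conventionally read as level, slope and curvature shifts:

In [ ]: # Alternative run on the covariance matrix of daily changes, plus the loadings
# of the first three components for a level / slope / curvature reading.
eigvals = np.linalg.eigvalsh(np.cov(daily_changes.T))[::-1]
print("Covariance-matrix eigenvalue shares (%):", np.round(eigvals / eigvals.sum() * 100, 2))

pca_changes = PCA()
pca_changes.fit(daily_changes)
loadings = pd.DataFrame(pca_changes.components_[:3].T,
                        index=daily_changes.columns,
                        columns=['PC1', 'PC2', 'PC3'])
print(loadings.round(3))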

*H.*

In [29]: # Cumulative variance explained
cumulative_variance_gov = explained_variance_ratio_gov.cumsum() * 100
print("\nCumulative Variance Explained (Government Data):")
for i, variance in enumerate(cumulative_variance_gov):
    print(f"Component {i+1}: {variance:.2f}%")

Cumulative Variance Explained (Government Data):
Component 1: 62.89%
Component 2: 84.13%
Component 3: 98.14%
Component 4: 100.00%
Component 5: 100.00%

*I.*


In [33]: # Scree plot
plt.figure(figsize=(8, 5))
plt.plot(range(1, len(explained_variance_ratio_gov) + 1), explained_variance_ratio_gov * 100, marker='o')
plt.title("Scree Plot: Variance Explained by Principal Components (Government Data)", fontsize=14)
plt.xlabel("Principal Component", fontsize=12)
plt.ylabel("Explained Variance Ratio (%)", fontsize=12)
plt.grid()
plt.show()

*J.*

In [35]: # Combined Scree Plot
plt.figure(figsize=(8, 5))
plt.plot(range(1, len(explained_variance_ratio_uncorrelated) + 1), explained_variance_ratio_uncorrelated * 100, marker='o', label='Uncorrelated Data')
plt.plot(range(1, len(explained_variance_ratio_gov) + 1), explained_variance_ratio_gov * 100, marker='s', label='Government Data')
plt.title("Scree Plot Comparison: Uncorrelated vs Government Data", fontsize=14)
plt.xlabel("Principal Component", fontsize=12)
plt.ylabel("Explained Variance Ratio (%)", fontsize=12)
plt.legend()
plt.grid()
plt.show()


In [ ]:

