Data Science Exercise Medium

The document contains a Python script for a data analysis exercise involving Principal Component Analysis (PCA) on a 2D dataset. It includes functions to center the data, compute principal components, and visualize the original and encoded data in PCA space. The script uses NumPy and Matplotlib for data manipulation and visualization.


HW4 Exercise 4

DATASCI 3ML3 Winter 2025

Mithun Manivannan: 400309374

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec

#load data
X_original = np.loadtxt('2d_span_data.csv', delimiter=',')

def center(X):
    X_means = np.mean(X, axis=1)[:, np.newaxis]
    X_centered = X - X_means
    return X_centered

X = center(X_original) # shape (2, P)

#compute pcs
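def compute_pcs(X, lam=1e-7):
    P = float(X.shape[1])  # number of data points
    # covariance of the centered data, with a small ridge term lam on the diagonal
    Cov = (1 / P) * np.dot(X, X.T) + lam * np.eye(X.shape[0])
    # eigenvalues in D, matching eigenvectors as the columns of V
    D, V = np.linalg.eigh(Cov)
    return D, V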

# Compute eigenvalues/eigenvectors
D, V = compute_pcs(X)
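# Keep both principal components (columns are PCs). np.linalg.eigh sorts
# eigenvalues in ascending order, so the last column is the direction of
# largest variance.
PCs = V[:, -2:] # shape (2, 2)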

# encode data
W = np.dot(PCs.T, X) # shape (2, P)

#visuals
fig = plt.figure(figsize=(12, 5))
gs = gridspec.GridSpec(1, 2)

## Left Panel: Original data + principal components


ax1 = plt.subplot(gs[0], aspect='equal')
ax1.set_title("original data")
ax1.set_xlabel(r'$x_1$', fontsize=12)
ax1.set_ylabel(r'$x_2$', fontsize=12)
ax1.scatter(X[0, :], X[1, :], color='black', s=40, edgecolor='white')

# Plot PCs as red arrows


origin = np.zeros((2,))
for i in range(2):
    vec = 2 * np.sqrt(D[-(i+1)]) * PCs[:, -(i+1)] # scale to 2 standard deviations along this PC
    ax1.arrow(*origin, *vec, color='red', width=0.05, head_width=0.2)

## Right Panel: Encoded data (in PC space)


ax2 = plt.subplot(gs[1], aspect='equal')
ax2.set_title("encoded data")
ax2.set_xlabel(r'$c_1$', fontsize=12)
ax2.set_ylabel(r'$c_2$', fontsize=12)
ax2.scatter(W[0, :], W[1, :], color='black', s=40, edgecolor='white')

# Draw red arrows to show basis (standard unit vectors)


ax2.arrow(0, 0, 2, 0, color='red', width=0.05, head_width=0.2)
ax2.arrow(0, 0, 0, 2, color='red', width=0.05, head_width=0.2)

plt.tight_layout()
plt.show()
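
The encoding step above can be sanity-checked numerically. The sketch below is not part of the original submission: it assumes a synthetic stand-in for 2d_span_data.csv (a rotated, shifted Gaussian cloud of shape (2, P)), omits the small lam term for simplicity, and uses its own variable names. It checks two properties one would expect from this PCA: the encoded coordinates are uncorrelated, with variances equal to the eigenvalues, and decoding with the full set of components recovers the centered data.

```python
import numpy as np

# Synthetic stand-in for 2d_span_data.csv (assumption: data has shape (2, P))
rng = np.random.default_rng(0)
P = 500
latent = rng.normal(size=(2, P)) * np.array([[3.0], [0.5]])  # unequal spread per axis
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X_raw = R @ latent + np.array([[4.0], [-1.0]])  # rotate, then shift off-origin

# Same steps as the script: center, form the covariance, eigendecompose
X = X_raw - X_raw.mean(axis=1, keepdims=True)
Cov = (X @ X.T) / P
D, V = np.linalg.eigh(Cov)  # eigenvalues ascending, eigenvectors as columns

# Encode into PC coordinates, then decode back
W = V.T @ X
X_rec = V @ W

# Covariance of the encoded data: expected to be diagonal with D on the diagonal
Cov_W = (W @ W.T) / P
recon_err = np.max(np.abs(X_rec - X))

# Fraction of total variance per component, largest first
explained = D[::-1] / D.sum()

print("off-diagonal of encoded covariance:", Cov_W[0, 1])
print("max reconstruction error:", recon_err)
print("explained variance ratio:", explained)
```

Because both components are kept, the reconstruction is exact up to floating-point error; dropping the smaller component would instead give the best one-dimensional approximation of the centered data.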
