The document outlines a program that implements Principal Component Analysis (PCA) to reduce the dimensionality of the Iris dataset from 4 features to 2. It includes steps for standardization, covariance matrix calculation, eigenvalue and eigenvector computation, and data transformation. The program also visualizes the PCA-transformed data using a scatter plot.

Program - 3

Develop a program to implement Principal Component Analysis (PCA) for reducing the
dimensionality of the Iris dataset from 4 features to 2.
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Create a DataFrame for better visualization
df = pd.DataFrame(X, columns=iris.feature_names)
# Convert the DataFrame back to a NumPy array
X = df.to_numpy()
np.set_printoptions(linewidth=np.inf)
print('original data top 3 rows')
print(X[:3])
mean = np.mean(X,axis=0)
print('mean value =',np.round(mean,2))
std_dev=np.std(X,axis=0)
print('Standard deviation = ',np.round(std_dev,2))
# Step 1: Standardize the X matrix (zero mean, unit variance per feature)
X_standardized = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
print('Standardized matrix top 3 rows \n', np.round(X_standardized[:3],2))
# Step 2: Compute the covariance matrix (transpose with .T so features are rows)
cov_matrix = np.cov(X_standardized.T)
print('covariance = \n', np.round(cov_matrix,2))
# Step 3: Compute the eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
print('eigen values =',np.round(eigenvalues,2))
print('eigen vector =\n',np.round(eigenvectors,2))
# Step 4: Sort eigenvalues and select principal components
sorted_index = np.argsort(eigenvalues)[::-1]
sorted_eigenvectors = eigenvectors[:, sorted_index]
print('sorted eigen values =',np.round(eigenvalues[sorted_index],2))
print('sorted eigen vectors =\n',np.round(sorted_eigenvectors,2))
# Select the top 2 eigenvectors
eigenvectors_subset = sorted_eigenvectors[:, :2]
print('top 2 eigen vectors =\n',np.round(eigenvectors_subset,2))
# Step 5: Transform the data
X_reduced = np.dot(X_standardized, eigenvectors_subset)
print("X_reduced \n",X_reduced[:3])

df_pca = pd.DataFrame(X_reduced, columns=['PCA1', 'PCA2'])
df_pca['target'] = y
print(df_pca.sample(5))
# Plot the PCA-transformed data
plt.figure(figsize=(10, 7))
colors = ['r', 'g', 'b']
for target, color in zip(df_pca['target'].unique(), colors):
    subset = df_pca[df_pca['target'] == target]
    plt.scatter(subset['PCA1'], subset['PCA2'], color=color, label=iris.target_names[target])
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA of Iris Dataset')
plt.legend()
plt.show()
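A natural follow-up is to check how much variance the two retained components capture. A minimal self-contained sketch, recomputing the eigenvalues exactly as in the program above (variable names here are illustrative, not from the original):

```python
import numpy as np
from sklearn import datasets

# Recompute the eigenvalues as in the program above
X = datasets.load_iris().data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
eigenvalues, _ = np.linalg.eig(np.cov(X_std.T))
eigenvalues = np.sort(eigenvalues)[::-1]

# Fraction of total variance captured by each principal component
explained_ratio = eigenvalues / eigenvalues.sum()
print('explained variance ratio =', np.round(explained_ratio, 3))
# The first two components together retain roughly 96% of the variance,
# which justifies reducing the Iris data from 4 features to 2
print('top-2 total =', round(float(explained_ratio[:2].sum()), 3))
```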
Output:
original data top 3 rows
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]]
mean value = [5.84 3.06 3.76 1.2 ]
Standard deviation = [0.83 0.43 1.76 0.76]
Standardized matrix top 3 rows
[[-0.9 1.02 -1.34 -1.32]
[-1.14 -0.13 -1.34 -1.32]
[-1.39 0.33 -1.4 -1.32]]

covariance =
[[ 1.01 -0.12 0.88 0.82]
[-0.12 1.01 -0.43 -0.37]
[ 0.88 -0.43 1.01 0.97]
[ 0.82 -0.37 0.97 1.01]]
eigen values = [2.94 0.92 0.15 0.02]
eigen vector =
[[ 0.52 -0.38 -0.72 0.26]
[-0.27 -0.92 0.24 -0.12]
[ 0.58 -0.02 0.14 -0.8 ]
[ 0.56 -0.07 0.63 0.52]]
sorted eigen values = [2.94 0.92 0.15 0.02]
sorted eigen vectors =
[[ 0.52 -0.38 -0.72 0.26]
[-0.27 -0.92 0.24 -0.12]
[ 0.58 -0.02 0.14 -0.8 ]
[ 0.56 -0.07 0.63 0.52]]
top 2 eigen vectors =
[[ 0.52 -0.38]
[-0.27 -0.92]
[ 0.58 -0.02]
[ 0.56 -0.07]]
X_reduced
[[-2.26470281 -0.4800266 ]
[-2.08096115 0.67413356]
[-2.36422905 0.34190802]]
PCA1 PCA2 target
70 0.737683 -0.396572 1
118 3.310696 -0.017781 2
9 -2.184328 0.469014 0
149 0.960656 0.024332 2
25 -1.951846 0.625619 0
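The script imports sklearn.decomposition.PCA but never calls it. As a sanity check, the manual eigendecomposition can be compared against sklearn's implementation; a minimal sketch (note that component signs may differ, since eigenvectors are only defined up to sign):

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

# Standardize the Iris data the same way as the manual program
X = datasets.load_iris().data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Project onto 2 components with sklearn's PCA
pca = PCA(n_components=2)
X_sklearn = pca.fit_transform(X_std)

# Magnitudes match the manual X_reduced above; signs may be flipped
print(np.round(X_sklearn[:3], 2))
```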
