Lab 3

The document outlines a Python program that implements Principal Component Analysis (PCA) to reduce the dimensionality of the Iris dataset from 4 features to 2. It includes steps for loading the dataset, standardizing the data, applying PCA, and visualizing the results with a scatter plot. The explained variance ratio indicates how much variance is captured by the two principal components.
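The pipeline summarized above (load, standardize, reduce to 2 components) can be condensed into a few lines of scikit-learn; this is a minimal sketch, and the full annotated program follows below.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data                       # 150 samples, 4 features
X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature
X_2d = PCA(n_components=2).fit_transform(X_std)
print(X_2d.shape)  # (150, 2)
```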


3. Develop a program to implement Principal Component Analysis (PCA) for reducing the dimensionality of the Iris dataset from 4 features to 2.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
iris = load_iris()
print(iris.feature_names) # Column names
print(iris.target_names) # Class names

df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
print(df.head())

# Standardize data before applying PCA
df_standardized = StandardScaler().fit_transform(df)

# Apply PCA with 2 components
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(df_standardized)

# Create a new DataFrame with the principal components
pdf = pd.DataFrame(data=principalComponents,
                   columns=['Principal Component 1', 'Principal Component 2'])

# Concatenate the principal components with the class labels
finalDf = pd.concat([pdf, pd.DataFrame(data=iris.target, columns=['target'])], axis=1)
print(finalDf.head())

# Visualize the data
fig, ax = plt.subplots(figsize=(8, 6))
ax.set_xlabel('Principal Component 1', fontsize=15)
ax.set_ylabel('Principal Component 2', fontsize=15)
explained_variance = sum(pca.explained_variance_ratio_)
ax.set_title(f'2 Component PCA (Explained Variance: {explained_variance:.2f})', fontsize=20)

targets = [0, 1, 2]
colors = ['r', 'g', 'b']
for target, color in zip(targets, colors):
    indicesToKeep = finalDf['target'] == target
    ax.scatter(finalDf.loc[indicesToKeep, 'Principal Component 1'],
               finalDf.loc[indicesToKeep, 'Principal Component 2'],
               c=color, label=iris.target_names[target], s=50, edgecolors='k')

ax.legend()
ax.grid()
plt.show()

# Print explained variance ratio
print('Explained variance ratio:', pca.explained_variance_ratio_)

Output
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
['setosa' 'versicolor' 'virginica']
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
Principal Component 1 Principal Component 2 target
0 -2.264703 0.480027 0
1 -2.080961 -0.674134 0
2 -2.364229 -0.341908 0
3 -2.299384 -0.597395 0
4 -2.389842 0.646835 0
Explained variance ratio: [0.72962445 0.22850762]
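The explained variance ratio printed above can be verified by hand: PCA's principal components are the eigenvectors of the covariance matrix of the standardized data, and each ratio is the corresponding eigenvalue divided by the sum of all eigenvalues. A minimal check with NumPy (a sketch for verification, not part of the lab program):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(load_iris().data)

# Covariance matrix of the standardized features (4 x 4)
cov = np.cov(X_std, rowvar=False)

# Eigenvalues of the symmetric covariance matrix, sorted descending
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

# Each ratio = eigenvalue / total variance
ratios = eigvals / eigvals.sum()
print(ratios[:2])  # matches pca.explained_variance_ratio_: ~[0.7296 0.2285]
```

The first two components together capture about 96% of the total variance, which is why a 2-D scatter plot separates the three Iris classes so well.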
