
Machine Learning 21BEC505

Experiment-3
Objective: Perform Principal Component Analysis (PCA)
Task #1
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
x = np.array([[1.4, 1.65], [1.6, 1.975], [-1.4, -1.775], [-2, -2.525],
              [-3, -3.95], [2.4, 3.075], [1.5, 2.025], [2.3, 2.75],
              [-3.2, -4.05], [-4.1, -4.85], [1.4, 1.65]])
sc = StandardScaler()
x = sc.fit_transform(x)
pca = PCA(n_components = 2)
x = pca.fit_transform(x)
explained_variance = pca.explained_variance_

print("Explained Variance:\n",explained_variance)
print('\n')
print("Explained Variance Ratio:\n", pca.explained_variance_ratio_)
print('\n')
print("Covariance Matrix:\n", pca.get_covariance())

Output:
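As a quick check, when n_components equals the number of features, the matrix returned by pca.get_covariance() should agree with the sample covariance of the standardized data computed directly with NumPy, since both use the unbiased n-1 denominator. A minimal self-contained sketch:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

x = np.array([[1.4, 1.65], [1.6, 1.975], [-1.4, -1.775], [-2, -2.525],
              [-3, -3.95], [2.4, 3.075], [1.5, 2.025], [2.3, 2.75],
              [-3.2, -4.05], [-4.1, -4.85], [1.4, 1.65]])

x_std = StandardScaler().fit_transform(x)     # standardize before PCA
pca = PCA(n_components=2).fit(x_std)          # all components retained

# Model covariance vs. direct estimate -> should print True
print(np.allclose(pca.get_covariance(), np.cov(x_std, rowvar=False)))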

Task #2
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
dataset = pd.read_csv('Wine.csv')
X = dataset.iloc[:, 0:13].values
sc = StandardScaler()
X = sc.fit_transform(X)

pca = PCA(n_components = 3)
X = pca.fit_transform(X)
explained_variance = pca.explained_variance_

print("Explained Variance:\n",explained_variance)
print('\n')
print("Explained Variance Ratio:\n", pca.explained_variance_ratio_)

Output:
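The explained-variance ratios also guide the choice of n_components. A minimal sketch, assuming the same Wine.csv file, that fits all 13 components and reports how many are needed to retain 95% of the variance:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = pd.read_csv('Wine.csv').iloc[:, 0:13].values
X = StandardScaler().fit_transform(X)

pca = PCA().fit(X)                             # keep every component
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("Components for 95% variance:", np.argmax(cumulative >= 0.95) + 1)

# Equivalently, passing a fraction lets PCA pick the count itself
pca_95 = PCA(n_components=0.95).fit(X)
print("Chosen automatically:", pca_95.n_components_)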

Task #3
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
dataset = pd.read_csv('iris.data', header=None)  # the UCI file has no header row
X = dataset.iloc[:, :-1].values
sc = StandardScaler()
X = sc.fit_transform(X)
pca = PCA(n_components = 2)
X = pca.fit_transform(X)
explained_variance = pca.explained_variance_
explained_variance_ratio = pca.explained_variance_ratio_

print("Explained Variance:\n",explained_variance)
print('\n')
print("Explained Variance Ratio:\n", pca.explained_variance_ratio_)
print('\n')
print(X[:10,:])

Output:
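Since matplotlib is already imported in the task, the two retained components can be plotted to see how well they separate the classes. A minimal sketch, assuming the same iris.data file with the species name in the last column:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

dataset = pd.read_csv('iris.data', header=None)   # UCI file has no header row
X = StandardScaler().fit_transform(dataset.iloc[:, :-1].values)
X2 = PCA(n_components=2).fit_transform(X)
labels = dataset.iloc[:, -1]

# One scatter per species in the plane of the first two components
for species in labels.unique():
    mask = (labels == species).values
    plt.scatter(X2[mask, 0], X2[mask, 1], label=species)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.show()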

Exercise:

1. Principal Component Analysis with Scikit-Learn. The dataset can be downloaded from:
https://www.kaggle.com/nirajvermafcb/principal-component-analysis-with-scikit-learn
from sklearn import preprocessing
from sklearn.decomposition import PCA
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('data.csv')
X_data = df.iloc[:, 0:10].drop(df.columns[[1]], axis=1)  # drop the second column
print(X_data.head())

#Scaling and preprocessing


sc = StandardScaler()
X_data = sc.fit_transform(X_data)

# standardization of dependent variables
standard = preprocessing.scale(X_data)
print(standard)


print("\n")
pca = PCA(n_components = 2)
X = pca.fit_transform(X_data)
explained_variance = pca.explained_variance_

print("Explained Variance:\n",explained_variance)
print('\n')
print("Explained Variance Ratio:\n", pca.explained_variance_ratio_)

Output:
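To see which original variables drive each component, the loadings in pca.components_ can be inspected. A minimal sketch, assuming the same data.csv file as the exercise; the column names are captured before scaling discards them:

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

df = pd.read_csv('data.csv')
X_data = df.iloc[:, 0:10].drop(df.columns[[1]], axis=1)
feature_names = list(X_data.columns)           # keep names before scaling

pca = PCA(n_components=2)
pca.fit(StandardScaler().fit_transform(X_data))

# Each row holds one component's weights on the original variables;
# large absolute values mark the variables dominating that component.
loadings = pd.DataFrame(pca.components_, columns=feature_names,
                        index=['PC1', 'PC2'])
print(loadings)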

Conclusion:
From this experiment, we see that Principal Component Analysis (PCA) is a widely used unsupervised learning method for reducing the dimensionality of high-dimensional datasets while retaining as much of the original variation as possible. We applied PCA to several datasets to identify the components that account for the majority of the variance, and used the principal component loadings and scores to interpret the relationships between the variables. Geometrically, PCA rotates the coordinate axes so that only a few of the new dimensions, the principal components, carry most of the variance. This is achieved with the eigenvectors of the covariance matrix, which are sorted by their eigenvalues so that the directions with the highest variance can be selected (see the sketch below). If the eigenvalues are all roughly equal, no direction dominates the variance and PCA brings little benefit.
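The eigenvector procedure described above can be reproduced directly in NumPy. The following is a minimal sketch on synthetic data (the random data and the choice of k = 2 are illustrative, not from the experiment); the projected scores agree with scikit-learn's PCA up to the sign of each component.

# PCA from scratch: eigendecomposition of the sample covariance matrix
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                # illustrative data
X = X - X.mean(axis=0)                       # center each feature

cov = np.cov(X, rowvar=False)                # sample covariance (n-1 denominator)
eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues

order = np.argsort(eigvals)[::-1]            # sort by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
X_pca = X @ eigvecs[:, :k]                   # project onto the top-k eigenvectors
print("Explained Variance:\n", eigvals[:k])
print("Explained Variance Ratio:\n", eigvals[:k] / eigvals.sum())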
