ML LAB - Principal Component Analysis

The document discusses principal component analysis (PCA) and its application to an arrhythmia dataset. PCA is a technique that transforms correlated variables into a set of uncorrelated variables called principal components. It performs an orthogonal transformation to maximize variance in the data. The code applies PCA to reduce the arrhythmia dataset to 7 principal components, then trains a k-nearest neighbors classifier on the transformed data, achieving a test accuracy of 68.4%.

ML LAB - Principal Component Analysis

-RAHUL NABERA M
-15BCE1101

DATASET: arrhythmia dataset

What is PCA?

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to
convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated
variables called principal components. The number of distinct principal components is equal to the
smaller of the number of original variables and the number of observations minus one. The
transformation is defined so that the first principal component has the largest possible variance
(that is, it accounts for as much of the variability in the data as possible), and each succeeding
component in turn has the highest variance possible under the constraint that it is orthogonal to the
preceding components. The resulting vectors form an uncorrelated orthogonal basis set. PCA is sensitive
to the relative scaling of the original variables.
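The orthogonal transformation described above can be sketched directly with NumPy: the principal axes are the eigenvectors of the covariance matrix, and projecting centred data onto them yields uncorrelated components. This is an illustrative sketch on toy data, not the arrhythmia dataset used below.

```python
import numpy as np

# Toy data: 100 samples of 3 variables, the first two deliberately correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([base,
               0.8 * base + 0.2 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 1))])

# Centre the data (real data should also be standardised, since PCA is
# sensitive to the relative scaling of the variables).
Xc = X - X.mean(axis=0)

# Eigendecomposition of the covariance matrix gives the principal axes.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Sort by decreasing eigenvalue so the first axis carries the most variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the axes: the columns of `components` are the principal
# components, and their sample correlation matrix is the identity.
components = Xc @ eigvecs
print(np.round(np.corrcoef(components, rowvar=False), 6))
```

`sklearn.decomposition.PCA`, used in the code below, performs this same decomposition (via SVD) and additionally truncates to the requested number of components.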

CODE:

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset

dataset = pd.read_csv('log.csv', header=None)

X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Optional: impute missing values (coded as 0) in columns 10-14 with the
# column mean. Note: sklearn.preprocessing.Imputer has been removed from
# scikit-learn; the modern equivalent is sklearn.impute.SimpleImputer.
#from sklearn.impute import SimpleImputer
#imputer = SimpleImputer(missing_values=0, strategy='mean')
#X[:, 10:15] = imputer.fit_transform(X[:, 10:15])

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# PCA: reduce the features to 7 principal components
from sklearn.decomposition import PCA
pca = PCA(n_components=7)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
explained = pca.explained_variance_ratio_  # variance captured by each component

# Fitting a k-NN classifier on the transformed features
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

print('accuracy train:{:.3f}'.format(classifier.score(X_train,y_train)))
print('accuracy test:{:.3f}'.format(classifier.score(X_test,y_test)))
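The choice of 7 components above is fixed by hand; the `explained_variance_ratio_` array the code stores can also drive that choice. The sketch below (using scikit-learn's bundled digits dataset as a stand-in, since the arrhythmia CSV is not shipped with the library) finds the smallest component count that retains 90% of the variance.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in dataset; substitute the scaled arrhythmia features in practice.
X, _ = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Fit PCA keeping every component, then accumulate the variance ratios.
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components retaining at least 90% of the variance.
k = int(np.searchsorted(cumulative, 0.90) + 1)
print(f'{k} components retain {cumulative[k - 1]:.1%} of the variance')
```

Equivalently, passing a float to the constructor, `PCA(n_components=0.90)`, makes scikit-learn select this count automatically.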

RESULTS:

PCA explained variance ratios (output not reproduced here)
ACCURACY:

accuracy train:0.791
accuracy test:0.684
