Exp 15
Dimension Reduction-
Dimension reduction is the process of reducing the number of features in a dataset while retaining as much of the useful information as possible.
Example-
In machine learning, a dataset may record the same underlying quantity through two strongly correlated features (for instance, a length measured in both centimetres and inches; the concrete example is missing from the original, so this one is illustrative).
Using both these dimensions conveys similar information, and the redundant dimension also introduces noise into the system.
So, it is better to use just one dimension.
PCA Algorithm-
Problem-01:
Given data = { 2, 3, 4, 5, 6, 7 ; 1, 5, 3, 6, 7, 8 }.
Compute the principal component using PCA Algorithm.
OR
Consider the two dimensional patterns (2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8).
Compute the principal component using PCA Algorithm.
Step-01:
Get data.
The given feature vectors are-
x1 = (2, 1)
x2 = (3, 5)
x3 = (4, 3)
x4 = (5, 6)
x5 = (6, 7)
x6 = (7, 8)
Step-02:
Calculate the mean vector (µ).
Mean vector (µ) = ( (2 + 3 + 4 + 5 + 6 + 7) / 6 , (1 + 5 + 3 + 6 + 7 + 8) / 6 )
Thus,
Mean vector (µ) = (4.5, 5)
Step-03:
Subtract the mean vector (µ) from each of the given feature vectors.
x1 – µ = (2 – 4.5, 1 – 5) = (–2.5, –4)
x2 – µ = (3 – 4.5, 5 – 5) = (–1.5, 0)
x3 – µ = (4 – 4.5, 3 – 5) = (–0.5, –2)
x4 – µ = (5 – 4.5, 6 – 5) = (0.5, 1)
x5 – µ = (6 – 4.5, 7 – 5) = (1.5, 2)
x6 – µ = (7 – 4.5, 8 – 5) = (2.5, 3)
Step-04:
Calculate the covariance matrix.
Each mi is the outer product (xi – µ)(xi – µ)^T.
Now,
m1 = [ 6.25 10 ; 10 16 ]
m2 = [ 2.25 0 ; 0 0 ]
m3 = [ 0.25 1 ; 1 4 ]
m4 = [ 0.25 0.5 ; 0.5 1 ]
m5 = [ 2.25 3 ; 3 4 ]
m6 = [ 6.25 7.5 ; 7.5 9 ]
Now,
Covariance matrix
= (m1 + m2 + m3 + m4 + m5 + m6) / 6
= [ 2.92 3.67 ; 3.67 5.67 ]
Calculate the eigen values and eigen vectors of the covariance matrix.
λ is an eigen value for a matrix M if it is a solution of the characteristic equation |M – λI| = 0.
So, we have-
| 2.92 – λ    3.67 ; 3.67    5.67 – λ | = 0
From here,
(2.92 – λ)(5.67 – λ) – (3.67 x 3.67) = 0
16.56 – 2.92λ – 5.67λ + λ² – 13.47 = 0
λ² – 8.59λ + 3.09 = 0
Solving this quadratic equation gives the two eigen values-
λ1 = 8.22 and λ2 = 0.38
Clearly, the second eigen value is very small compared to the first eigen value.
So, the second eigen vector can be left out.
Eigen vector corresponding to the greatest eigen value is the principal component for the given data
set.
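As a quick check on this step, the roots of the characteristic equation λ² – 8.59λ + 3.09 = 0 derived above can be computed directly, along with the fraction of total variance carried by the first component — a sketch using only the quadratic's coefficients from the hand calculation (slight differences from hand-rounded values are expected):

```python
import math

# Coefficients of the characteristic equation lambda^2 - 8.59*lambda + 3.09 = 0,
# taken from the hand calculation above.
a, b, c = 1.0, -8.59, 3.09
disc = math.sqrt(b * b - 4 * a * c)
lam1 = (-b + disc) / (2 * a)   # larger eigen value
lam2 = (-b - disc) / (2 * a)   # smaller eigen value

# Fraction of the total variance captured by the first principal component.
retained = lam1 / (lam1 + lam2)
print(round(lam1, 2), round(lam2, 2), round(retained, 3))  # → 8.21 0.38 0.956
```

Since the first component retains about 95.6% of the variance, discarding the second loses very little information.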
So, we find the eigen vector corresponding to the eigen value λ1.
Substituting λ1 = 8.22 into (M – λI)X = 0 gives-
(2.92 – 8.22)X1 + 3.67X2 = 0
3.67X1 + (5.67 – 8.22)X2 = 0
On simplification, we get-
5.3X1 = 3.67X2 ………(1)
3.67X1 = 2.55X2 ………(2)
Both equations are satisfied (up to rounding) by X1 = 2.55 and X2 = 3.67.
Thus, the principal component for the given data set is-
Eigen vector = (2.55, 3.67)
Lastly, we project the data points onto the new subspace: each point xi maps to the scalar e^T (xi – µ), where e is the unit vector along the principal component.
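The whole hand computation — mean vector, covariance matrix, eigen decomposition, and projection — can be sketched with NumPy; dividing by n rather than n – 1 matches the covariance formula used in this example, and exact eigen values differ slightly from the hand-rounded ones:

```python
import numpy as np

# The six 2-D patterns from the problem statement.
X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)

mu = X.mean(axis=0)           # mean vector (4.5, 5)
D = X - mu                    # mean-centred data
C = D.T @ D / len(X)          # covariance matrix (divide by n, as in the example)

eigvals, eigvecs = np.linalg.eigh(C)   # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]      # sort eigen values in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pc = eigvecs[:, 0]            # unit principal component, proportional to (2.55, 3.67) up to sign
projected = D @ pc            # 1-D projection of each centred pattern onto the component

print(np.round(C, 2))         # [[2.92 3.67] [3.67 5.67]]
print(np.round(eigvals, 2))   # ≈ [8.21 0.38] (8.22 by hand, after rounding the covariance entries)
```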
# -*- coding: utf-8 -*-
"""EXP15.ipynb"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dataset = pd.read_csv('/content/PCA.csv')
dataset
dataset.isna().sum()

X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# The cells defining the train/test split, feature scaling, PCA transform and
# classifier were missing from the original notebook; they are reconstructed
# here (LogisticRegression is an assumed choice of classifier).
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)

variance = pca.explained_variance_ratio_
variance

from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)
data_p = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
data_p

# find accuracy_score
from sklearn.metrics import accuracy_score
print("Accuracy of the model:", accuracy_score(y_test, y_pred) * 100)

# find precision_score
from sklearn.metrics import precision_score
precision = precision_score(y_test, y_pred, average='micro')
print('Precision:', precision * 100)

# calculate recall
from sklearn.metrics import recall_score
recall = recall_score(y_test, y_pred, average='micro')
print('Recall:', recall * 100)

# calculate f1 score
from sklearn.metrics import f1_score
f1score = f1_score(y_test, y_pred, average='micro')
print('F1 score:', f1score * 100)