0% found this document useful (0 votes)

24 views7 pages

PCA File

Principal Component Analysis (PCA) is a statistical technique used to reduce the dimensionality of large datasets. It converts correlated variables into uncorrelated variables called principal components. PCA is commonly used for exploratory data analysis and to visualize multivariate datasets. The document demonstrates how to perform PCA on the Iris dataset and visualize the results.

Uploaded by

shilpa

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

24 views7 pages

PCA File

Uploaded by

shilpa

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 7

Principal Component Analysis(PCA)

Principal Component Analysis (PCA) is a statistical procedure that uses an

orthogonal transformation which converts a set of correlated variables to a set of
uncorrelated variables. PCA is a most widely used tool in exploratory data analysis and
in machine learning for predictive models. Moreover, PCA is an unsupervised statistical
technique used to examine the interrelations among a set of variables. It is also known
as a general factor analysis where regression determines a line of best fit.

Module Needed:
Import sys
Import pandas as pd
Import numpy as np
Import matplotlib

print('Python : {}'.format(sys.version))

print('Pandas : {}'.format(pd. version))

print('Numpy : {}'.format(np. version))

print('Scikit_learn : {}'.format(sklearn. version))

print('Matplotlib : {}'.format(matplotlib. version))

output:

Python : 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
Pandas : 0.25.1
Numpy : 1.16.5
Scikit_learn : 0.22.1
Matplotlib : 3.1.1
In [6]:
from sklearn import datasets

iris = datasets.load_iris()

features = iris.data

target = iris.target

print(df.shape)

print(df.head(20))

(150, 4)
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
5 5.4 3.9 1.7 0.4
6 4.6 3.4 1.4 0.3
7 5.0 3.4 1.5 0.2
8 4.4 2.9 1.4 0.2
9 4.9 3.1 1.5 0.1
10 5.4 3.7 1.5 0.2
11 4.8 3.4 1.6 0.2
12 4.8 3.0 1.4 0.1
13 4.3 3.0 1.1 0.1
14 5.8 4.0 1.2 0.2
15 5.7 4.4 1.5 0.4
16 5.4 3.9 1.3 0.4
17 5.1 3.5 1.4 0.3
18 5.7 3.8 1.7 0.3
19 5.1 3.8 1.5 0.3
In [10]:
print(df.describe())

output:

sepal length (cm) sepal width (cm) petal length (cm) \

count 150.000000 150.000000 150.000000
mean 5.843333 3.057333 3.758000
std 0.828066 0.435866 1.765298
min 4.300000 2.000000 1.000000
25% 5.100000 2.800000 1.600000
50% 5.800000 3.000000 4.350000
75% 6.400000 3.300000 5.100000
max 7.900000 4.400000 6.900000

petal width (cm)

count 150.000000
mean 1.199333
std 0.762238
min 0.100000
25% 0.300000
50% 1.300000
75% 1.800000
max 2.500000
from pandas.plotting import scatter_matrix

import matplotlib.pyplot as plt

scatter_matrix(df)

plt.show()

<Figure size 640x480 with 16 Axes>

from sklearn.cluster import KMeans

X = []

Y = []

for i in range(1,31):

kmeans = KMeans(n_clusters = i)

kmeans.fit(df)

X.append(i)

awcss = kmeans.inertia_/df.shape[0]

Y.append(awcss)

import matplotlib.pyplot as plt

plt.plot(X,Y,'bo-')

plt.xlim((1,30))

plt.xlabel('Number of Clusters')

plt.ylabel('Average Within_Cluster Sum of Squares')

plt.title('K-Means Clustering Elbow Method')

plt.show()

output:
from sklearn.decomposition import PCA

from sklearn import preprocessing

pca = PCA(n_components = 2)

pc = pca.fit_transform(df)

print(pc.shape)

print(pc[:10])

output:

(150, 2)
[[-2.68412563 0.31939725]
[-2.71414169 -0.17700123]
[-2.88899057 -0.14494943]
[-2.74534286 -0.31829898]
[-2.72871654 0.32675451]
[-2.28085963 0.74133045]
[-2.82053775 -0.08946138]
[-2.62614497 0.16338496]
[-2.88638273 -0.57831175]
[-2.6727558 -0.11377425]]

kmeans = KMeans(n_clusters = 3)
kmeans.fit(pc)

output:

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,

n_clusters=3, n_init=10, n_jobs=None, precompute_distances='auto',
random_state=None, tol=0.0001, verbose=0)
h = 0.02
x_min, x_max = pc[:,0].min()-1,pc[:,1].max()+1
y_min, y_max = pc[:,1].min()-1,pc[:,1].max()+1
xx,yy = np.meshgrid(np.arange(x_min,x_max, h), np.arange(y_min,y_max, h))
Z = kmeans.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure(figsize=(12,12))
plt.clf()

plt.imshow(Z,interpolation='nearest',
extent = (xx.min(),xx.max(),yy.min(),yy.max()),
cmap=plt.cm.tab20c,
aspect = 'auto', origin = 'lower')
for i, point in enumerate(pc):
if target[i]==0:
plt.plot(point[0],point[0],'g.',markersize = 10)
if target[i]==1:
plt.plot(point[0],point[1],'b.',markersize = 10)
if target[i]==2:
plt.plot(point[0],point[1],'r.',markersize = 10)
centroids = kmeans.cluster_centers_
plt.scatter(centroids[:,0], centroids[:,1], marker = 'x', s = 250, linewidth = 4,
color = 'w',zorder = 10)
plt.title('K-Means Clustering on PCA-Reduced Iris Data Set')
plt.xlim(x_min,x_max)
plt.ylim(y_min,y_max)
plt.xlabel('pca1')
plt.ylabel('pca2')
plt.xticks(())
plt.yticks(())
plt.show()
output:

from sklearn import metrics

kmeans1 = KMeans(n_clusters=3)
kmeans1.fit(features)

kmeans2 = KMeans(n_clusters = 3)
kmeans2.fit(pc)
print('Non Reduced Data')
print('Homogeneity : {}'. format(metrics.homogeneity_score(target,kmeans1.labels_)))
print('Completeness : {}'. format(metrics.completeness_score(target,kmeans1.labels_)))
print('V-measure : {}'. format(metrics.homogeneity_score(target,kmeans1.labels_)))

print('Reduced Data')
print('Homogeneity : {}'. format(metrics.homogeneity_score(target,kmeans1.labels_)))
print('Completeness : {}'. format(metrics.completeness_score(target,kmeans1.labels_)))
print('V-measure: {}'. format(metrics.homogeneity_score(target,kmeans1.labels_)))
Output:

Non Reduced Data

Homogeneity : 0.7514854021988339
Completeness : 0.7649861514489816
V-measure : 0.7514854021988339
Reduced Data
Homogeneity : 0.7514854021988339
Completeness : 0.7649861514489816
V-measure: 0.7514854021988339

SIT30821 - SITHCCC037 - V1.0 - Student Assessment.v1.0
0% (1)
SIT30821 - SITHCCC037 - V1.0 - Student Assessment.v1.0
28 pages
SITHCCC038 - V1.0 - Student Assessment Tools.v1.0
No ratings yet
SITHCCC038 - V1.0 - Student Assessment Tools.v1.0
22 pages
Cofe2o4 Jcpds Data Card
100% (1)
Cofe2o4 Jcpds Data Card
3 pages
Image Recognition Using CNN
0% (1)
Image Recognition Using CNN
12 pages
K Fold
No ratings yet
K Fold
2 pages
Untitled2 - Jupyter Notebook
No ratings yet
Untitled2 - Jupyter Notebook
4 pages
Theory: Performs A Principal Components Analysis On The Given Numeric Data Matrix and Returns The Results As An Object of Class
No ratings yet
Theory: Performs A Principal Components Analysis On The Given Numeric Data Matrix and Returns The Results As An Object of Class
8 pages
Karisma 23011101119 Eda Rec
No ratings yet
Karisma 23011101119 Eda Rec
88 pages
Introduction To Neural Networks
No ratings yet
Introduction To Neural Networks
4 pages
Unit-2 CCS
No ratings yet
Unit-2 CCS
33 pages
Charmi Shah - 20BCP299 Assignment-4
No ratings yet
Charmi Shah - 20BCP299 Assignment-4
11 pages
ML Mini Project: Name: Sarvesh Muttepwar Class: BE COMP (A) Roll No: 21CEBEB11
No ratings yet
ML Mini Project: Name: Sarvesh Muttepwar Class: BE COMP (A) Roll No: 21CEBEB11
12 pages
Keeraiit 2
No ratings yet
Keeraiit 2
19 pages
Task 7
No ratings yet
Task 7
14 pages
BS SRR-3
No ratings yet
BS SRR-3
20 pages
ML Expt 2
No ratings yet
ML Expt 2
5 pages
Table of F
No ratings yet
Table of F
4 pages
UDESA ST Tabla de Series Temporales
No ratings yet
UDESA ST Tabla de Series Temporales
22 pages
4 Primer
No ratings yet
4 Primer
4 pages
Covid E4v4
No ratings yet
Covid E4v4
1 page
Appendix 4 - Prokon Slab Design
No ratings yet
Appendix 4 - Prokon Slab Design
8 pages
004 Notas - Grupo 1 Noviembre - 18 Moodle
No ratings yet
004 Notas - Grupo 1 Noviembre - 18 Moodle
6 pages
Data Science Practical No 03
No ratings yet
Data Science Practical No 03
5 pages
Unsupervised ML
No ratings yet
Unsupervised ML
17 pages
RC Slab Design - S1
No ratings yet
RC Slab Design - S1
8 pages
KNN Classification - Solved
No ratings yet
KNN Classification - Solved
13 pages
Import As Import As From Import Import As Import As From Import From Import From Import
No ratings yet
Import As Import As From Import Import As Import As From Import From Import From Import
6 pages
Capex
No ratings yet
Capex
3 pages
CPT 046'
No ratings yet
CPT 046'
2 pages
American Conductor Stranding
No ratings yet
American Conductor Stranding
1 page
Balok Office Typical TP5 16-19 Floor Plan: Panjang Lebar Tinggi Beton (m3) Bekisting (m2) Dimensi No. Tebal Slab (M)
No ratings yet
Balok Office Typical TP5 16-19 Floor Plan: Panjang Lebar Tinggi Beton (m3) Bekisting (m2) Dimensi No. Tebal Slab (M)
6 pages
Flores
No ratings yet
Flores
4 pages
Rectangular Slab Panel Design: 1S02-01: Input Data
No ratings yet
Rectangular Slab Panel Design: 1S02-01: Input Data
15 pages
Normalization
No ratings yet
Normalization
4 pages
Biostatistics-Haramaya University Full - Aug 25 2008
No ratings yet
Biostatistics-Haramaya University Full - Aug 25 2008
88 pages
Publicación Notas Finanzas Corporativas F
No ratings yet
Publicación Notas Finanzas Corporativas F
6 pages
Program Analisa Aliran Daya Pada Sistem 30 Bus Dengan Metode Newton Rapshon
No ratings yet
Program Analisa Aliran Daya Pada Sistem 30 Bus Dengan Metode Newton Rapshon
2 pages
12 Days of Xms
No ratings yet
12 Days of Xms
2 pages
Dimentionality Reduction Implementation
No ratings yet
Dimentionality Reduction Implementation
8 pages
Gain Vs Freq
No ratings yet
Gain Vs Freq
1 page
Database Fe2o3 Gamma
No ratings yet
Database Fe2o3 Gamma
3 pages
Pre-Processing Techniques - Ipynb - Colab
No ratings yet
Pre-Processing Techniques - Ipynb - Colab
3 pages
1-Data For Health Financing
No ratings yet
1-Data For Health Financing
19 pages
Bảng Trị Số Bonjean
No ratings yet
Bảng Trị Số Bonjean
7 pages
L-5 Tower Surge Impedance HEIDLER
No ratings yet
L-5 Tower Surge Impedance HEIDLER
18 pages
Structural Calculation of Concrete Pad For Cheerful Homes: Location: Mabalacat, Pampanga
No ratings yet
Structural Calculation of Concrete Pad For Cheerful Homes: Location: Mabalacat, Pampanga
19 pages
Gain Solid Angle Values
No ratings yet
Gain Solid Angle Values
1 page
Item No. Tap Size (Inch), NC Drill Size (In) Drill Size (MM) Tap Size (Inch), NF
No ratings yet
Item No. Tap Size (Inch), NC Drill Size (In) Drill Size (MM) Tap Size (Inch), NF
6 pages
11zon - Merged-Files (1) - Removed - Removed
No ratings yet
11zon - Merged-Files (1) - Removed - Removed
7 pages
BDA Jalan Maskat-DID Rainfall Record - Labang
No ratings yet
BDA Jalan Maskat-DID Rainfall Record - Labang
2 pages
For Assessment 1 - Case Study - Apple Computer's Supplier Hubs - A Tale of Three Cities
No ratings yet
For Assessment 1 - Case Study - Apple Computer's Supplier Hubs - A Tale of Three Cities
11 pages
Ibf Risk
No ratings yet
Ibf Risk
3 pages
Stata Analysis Questions
No ratings yet
Stata Analysis Questions
15 pages
Cric Zop
No ratings yet
Cric Zop
6 pages
Tugas Besar ASTL Lanjut&Softwarwe STL
No ratings yet
Tugas Besar ASTL Lanjut&Softwarwe STL
8 pages
Ploomber Notebook Conversion - 2
No ratings yet
Ploomber Notebook Conversion - 2
14 pages
Rectangular Slab Panel Design:: Input Data
No ratings yet
Rectangular Slab Panel Design:: Input Data
8 pages
Visualization 03
No ratings yet
Visualization 03
1 page
STD Deviation
No ratings yet
STD Deviation
8 pages
Contoh Cluster K-Means
No ratings yet
Contoh Cluster K-Means
4 pages
PROBLEMARIO
No ratings yet
PROBLEMARIO
2 pages
AVL Trees DSA Java
No ratings yet
AVL Trees DSA Java
15 pages
AI Evolution and Cloud Apps With AI
No ratings yet
AI Evolution and Cloud Apps With AI
15 pages
AVL Trees DSA Java Enhanced
No ratings yet
AVL Trees DSA Java Enhanced
15 pages
Efficient DNA Compression With Zero Loss Using Reed Solomon Codes
No ratings yet
Efficient DNA Compression With Zero Loss Using Reed Solomon Codes
11 pages
Sentimental Analysis
No ratings yet
Sentimental Analysis
10 pages
Watermarking
No ratings yet
Watermarking
9 pages
Image Watermarking
No ratings yet
Image Watermarking
10 pages
Sentimental Analysis of Web Scapping Data
No ratings yet
Sentimental Analysis of Web Scapping Data
9 pages
Ipl Cricket Score
No ratings yet
Ipl Cricket Score
8 pages
Uber Analysis
No ratings yet
Uber Analysis
11 pages
Chatbot
No ratings yet
Chatbot
3 pages
SIT30821 - SITHKOP015 - V1.0 - Standard Recipe Card SRC Template.v1.0
No ratings yet
SIT30821 - SITHKOP015 - V1.0 - Standard Recipe Card SRC Template.v1.0
2 pages
CSR Report
No ratings yet
CSR Report
2 pages
SITXWHS005 Assessment Tasks
100% (1)
SITXWHS005 Assessment Tasks
55 pages
A Survey of Named Entity Recognition Techniques
No ratings yet
A Survey of Named Entity Recognition Techniques
8 pages
A Design of Single Axis Sun Tracking System: July 2011
No ratings yet
A Design of Single Axis Sun Tracking System: July 2011
5 pages

PCA File

Uploaded by

PCA File

Uploaded by

Principal Component Analysis(PCA)

Principal Component Analysis (PCA) is a statistical procedure that uses an

print('Pandas : {}'.format(pd. __version__))

print('Numpy : {}'.format(np. __version__))

print('Scikit_learn : {}'.format(sklearn. __version__))

print('Matplotlib : {}'.format(matplotlib. __version__))

sepal length (cm) sepal width (cm) petal length (cm) \

petal width (cm)

import matplotlib.pyplot as plt

<Figure size 640x480 with 16 Axes>

from sklearn.cluster import KMeans

import matplotlib.pyplot as plt

plt.ylabel('Average Within_Cluster Sum of Squares')

plt.title('K-Means Clustering Elbow Method')

from sklearn import preprocessing

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,

from sklearn import metrics

Non Reduced Data

You might also like

print('Pandas : {}'.format(pd. version))

print('Numpy : {}'.format(np. version))

print('Scikit_learn : {}'.format(sklearn. version))

print('Matplotlib : {}'.format(matplotlib. version))