Implementing PCA in Python With Scikit

The document discusses implementing PCA in Python using scikit-learn to reduce the dimensionality of data by projecting it onto the few components that capture the most variance. It demonstrates loading the breast cancer data, standardizing it, running PCA to reduce the 30 dimensions to 3, and visualizing the results in 2D and 3D plots to show the separation between the two classes. It also shows how the PCA components explain the variance in the data, with the first component explaining 44% of the variance.

Uploaded by

Shobha Kumari Choudhary

Implementing PCA in Python with scikit-learn
In this article, we will learn about PCA (Principal Component Analysis) in Python
with scikit-learn. Let’s start our learning step by step.
Why PCA?
• When a dataset has many input attributes, it is difficult to visualize.
Machine learning has a well-known term for this problem: the "curse of
dimensionality".
• It refers to the fact that a large number of attributes in a dataset can
adversely affect the accuracy and training time of a machine learning
model.
• Principal Component Analysis (PCA) is one way to address this issue. It is
used for better data visualization and can improve accuracy.
How does PCA work?
• PCA is an unsupervised pre-processing step that is carried out before
applying an ML algorithm. It is based on an "orthogonal linear
transformation", a mathematical technique that projects the attributes
of a dataset onto a new coordinate system. Each new axis is a linear
combination of the original attributes. The axis along which the data
varies the most is called the first principal component and is placed at
the first coordinate.
• Similarly, the direction that explains the second-most variance, orthogonal
to the first, is called the second principal component, and so on. In short,
the complete dataset can be expressed in terms of principal components.
Often the first few components explain most of the variance, although the
exact share depends on the data.
• Principal component analysis, or PCA, thus converts data from a
high-dimensional space to a low-dimensional space by keeping only the
leading components, which capture the most information (variance) about
the dataset.
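The projection described above can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not how scikit-learn implements PCA internally: the function name pca_eig and the toy data are hypothetical, and the sketch uses an eigendecomposition of the covariance matrix.

```python
import numpy as np

def pca_eig(X, n_components):
    """Project X (n_samples, n_features) onto its leading principal components.

    Hypothetical helper for illustration only.
    """
    # Center each attribute so the covariance is computed around the mean
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the attributes (features are in columns)
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the direction with the largest variance comes first
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each row of `components` is one principal direction
    components = eigvecs[:, :n_components].T
    # Coordinates of the samples in the new coordinate system
    scores = X_centered @ components.T
    # Fraction of the total variance explained by each kept component
    ratio = eigvals[:n_components] / eigvals.sum()
    return scores, components, ratio

# Toy data: two correlated attributes (made up for this sketch)
rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])

scores, components, ratio = pca_eig(toy, n_components=2)
print(ratio)
```

Each row of components is a weighted combination of the original attributes, which is why a principal component is a new axis rather than one of the original columns.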
Python Implementation:
• Standardize the data before applying PCA in scikit-learn; otherwise,
attributes measured on larger scales dominate the components.
• PCA is imported from sklearn.decomposition. We need to select the
required number of principal components.
• n_components is usually set to 2 for easy visualization, but the right
value depends on the data.
• The attributes are passed to the fit and transform methods.
• The principal components can be inspected with components_, while the
share of variance explained by each principal component is available in
explained_variance_ratio_.
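As a compact alternative to calling the scaler and PCA separately, the same steps can be chained with scikit-learn's make_pipeline. This is a sketch of one possible arrangement, not the approach used in the rest of this article; the variable names are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load only the input attributes (569 samples, 30 features)
X = load_breast_cancer().data

# Standardize first, then project onto 2 principal components
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_reduced = pipeline.fit_transform(X)

# make_pipeline names each step after its lowercased class name
pca_step = pipeline.named_steps["pca"]

print(X_reduced.shape)
print(pca_step.components_.shape)
print(pca_step.explained_variance_ratio_)
```

Keeping both steps in one object means new data passed to pipeline.transform is scaled with the statistics learned during fitting before it is projected.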
1. Import all the libraries
# import all libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Jupyter/IPython magic; omit this line when running as a plain script
%matplotlib inline
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

2. Loading Data
Load the breast_cancer dataset from sklearn.datasets. The dataset has 569 data
items with 30 input attributes and two output classes, benign and malignant.
With 30 input features, the data cannot be visualized directly.

# import the breast_cancer dataset
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
data.keys()

# Check the output classes
print(data['target_names'])

# Check the input attributes
print(data['feature_names'])

Output:

3. Apply PCA
• Standardize the dataset prior to PCA.
• Import PCA from sklearn.decomposition.
• Choose the number of principal components.
Let us set it to 3. After executing this code, x has shape (569, 3), while the
original data has shape (569, 30). PCA has thus reduced the number of dimensions
from 30 to 3. With n_components=2, the data would be reduced to 2 dimensions.

# construct a dataframe using pandas
df1 = pd.DataFrame(data['data'], columns=data['feature_names'])

# Scale data before applying PCA
scaling = StandardScaler()

# Use fit and transform method
scaling.fit(df1)
Scaled_data = scaling.transform(df1)

# Set the n_components=3
principal = PCA(n_components=3)
principal.fit(Scaled_data)
x = principal.transform(Scaled_data)

# Check the dimensions of data after PCA
print(x.shape)

Output:
(569, 3)
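To see why the scaling step matters, it can help to compare the spread of the raw attributes with the standardized ones. The following is a small self-contained check, with variable names chosen for this sketch only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

X = load_breast_cancer().data

# Raw attributes live on very different scales
raw_std = X.std(axis=0)
print(raw_std.min(), raw_std.max())

# fit_transform is shorthand for fit followed by transform on the same data
X_scaled = StandardScaler().fit_transform(X)

# After standardization every attribute has mean 0 and standard deviation 1
centered = np.allclose(X_scaled.mean(axis=0), 0.0)
unit_std = np.allclose(X_scaled.std(axis=0), 1.0)
print(centered, unit_std)
```

Without this step, the attributes with the largest raw spread would dominate the leading components simply because of their units.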

4. Check Components
principal.components_ is an array whose number of rows equals the number of
principal components and whose number of columns equals the number of features
in the original data. Here it has three rows, because n_components was set to 3,
and each row has 30 columns, one per original feature.

# Check the values of the eigenvectors
# produced by PCA (one row per principal component)
principal.components_
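The raw array is hard to read, so one option is to label it with the feature names and look at the largest weights. This is a sketch under assumptions: the names loadings and top_pc1 and the PC1/PC2/PC3 row labels are introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(data.data)
pca = PCA(n_components=3).fit(X_scaled)

# One row per principal component, one column per original feature
loadings = pd.DataFrame(
    pca.components_,
    columns=data.feature_names,
    index=["PC1", "PC2", "PC3"],
)

# Each component is a unit-length direction in the 30-dimensional feature space
row_norms = np.linalg.norm(loadings.to_numpy(), axis=1)
print(row_norms)

# Features with the largest absolute weight in the first component
top_pc1 = loadings.loc["PC1"].abs().sort_values(ascending=False).head(5)
print(top_pc1)
```

A large absolute weight means the feature contributes strongly to that component; the sign only indicates direction along the axis.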

5. Plot the components (Visualization)

Plot the principal components for better data visualization. Although we set
n_components=3, we plot both a 2D graph using the first two principal components
and a 3D graph using all three. The colors show the two output classes of the
original dataset, benign and malignant. In these plots, the two classes appear
largely separated along the principal components.

plt.figure(figsize=(10,10))
plt.scatter(x[:,0],x[:,1],c=data['target'],cmap='plasma')
plt.xlabel('pc1')
plt.ylabel('pc2')

Output: [Figure: 2D scatter plot of pc1 against pc2, points colored by output class]

For three principal components, we need to plot a 3d graph. x[:,0] signifies the first
principal component. Similarly, x[:,1] and x[:,2] represent the second and the third
principal component.

# import relevant libraries for 3d graph
# (on recent Matplotlib versions this import is optional)
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(10,10))
# choose projection 3d for creating a 3d graph
axis = fig.add_subplot(111, projection='3d')

# x[:,0] is pc1, x[:,1] is pc2 while x[:,2] is pc3
axis.scatter(x[:,0], x[:,1], x[:,2], c=data['target'], cmap='plasma')
axis.set_xlabel("PC1", fontsize=10)
axis.set_ylabel("PC2", fontsize=10)
axis.set_zlabel("PC3", fontsize=10)

Output: [Figure: 3D scatter plot of PC1, PC2 and PC3, points colored by output class]

6. Calculate variance ratio

explained_variance_ratio_ gives the fraction of the total variance explained by
each principal component.

# check how much variance is explained by each principal component
print(principal.explained_variance_ratio_)

Output:
[0.44272026 0.18971182 0.09393163]

The first component explains about 44% of the variance, and the three components
together explain about 73%.
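The three ratios printed above sum to roughly 0.73, so three components do not capture all of the variance in this dataset. A common way to choose n_components is to look at the cumulative explained variance. The sketch below assumes a 95% target, which is an arbitrary threshold picked for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_breast_cancer().data)

# Fit with all components to see the full variance profile
pca_full = PCA().fit(X_scaled)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
print(cumulative[:3])

# Smallest number of components whose cumulative share reaches the target
target = 0.95  # assumed threshold for this sketch
n_needed = int(np.searchsorted(cumulative, target) + 1)
print(n_needed)

# A float n_components asks scikit-learn to pick that number itself
pca_95 = PCA(n_components=target, svd_solver="full").fit(X_scaled)
print(pca_95.n_components_)
```

Passing a float between 0 and 1 as n_components tells PCA to keep just enough components to explain that fraction of the variance, which avoids hard-coding the count.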
