Reduce Data Dimensionality Using PCA – Python
Introduction
The advancements in Data Science and Machine Learning have made it possible for
us to solve several complex regression and classification problems. However, the
performance of all these ML models depends on the data fed to them. Thus, it is
imperative that we provide our ML models with an optimal dataset. Now, one might
think that the more data we provide to our model, the better it becomes; however,
this is not the case. If we feed our model a dataset with an excessively large
number of features/columns, it gives rise to the problem of overfitting, wherein the
model starts getting influenced by outlier values and noise. This is called the Curse
of Dimensionality.
The following graph shows how model performance changes as the number of
dimensions of the dataset increases. It can be observed that model performance
peaks at an optimal number of dimensions, beyond which it starts decreasing.
Model performance vs. number of dimensions of the dataset
Step-2: Standardize the Dataset
Before applying PCA, we standardize the features of the Iris dataset so that each
feature has zero mean and unit variance. This is done with the StandardScaler class
from sklearn.preprocessing.
Python3
scalar = StandardScaler()
# 'data' is assumed to be the Iris feature DataFrame loaded in Step-1
scaled_data = pd.DataFrame(scalar.fit_transform(data), columns=data.columns)
scaled_data
Output:
Step-3: Check the Correlation between Features without PCA (Optional)
Now, we will check the correlation between the features of our scaled dataset using a
heatmap. For this, we have already imported the seaborn library in Step-1. The
correlation between the features is computed with the corr() function, and the heatmap
is then plotted with the heatmap() function. The colour scale beside the heatmap helps
determine the magnitude of the correlation. In our example, we can clearly see that a
darker shade represents less correlation while a lighter shade represents more
correlation. The diagonal of the heatmap represents the correlation of a feature with
itself, which is always 1.0; thus, the diagonal of the heatmap has the lightest shade.
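A minimal sketch of this step is given below; it assumes that seaborn and matplotlib.pyplot were imported in Step-1 as sns and plt, and it uses the scaled_data DataFrame from Step-2.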
Python3
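# Correlation heatmap of the scaled features (sketch)
# Assumes seaborn (sns) and matplotlib.pyplot (plt) were imported in Step-1
sns.heatmap(scaled_data.corr(), annot=True)
plt.show()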
Output:
Correlation Heatmap of the Iris dataset without PCA
We can observe from the above heatmap that sepal length & petal length, and petal
length & petal width, are highly correlated. Thus, we evidently need to apply
dimensionality reduction. If you are already aware that your dataset needs
dimensionality reduction, you can skip this step.
Step-4: Applying PCA
We will now apply PCA to the scaled dataset. For this, Python offers another built-in
class called PCA, which is present in sklearn.decomposition and which we have already
imported in Step-1. We need to create a PCA object and, while doing so, initialize
n_components, the number of principal components we want in our final dataset. Here,
we have taken n_components = 3, which means our final feature set will have 3 columns.
We fit our scaled data to the PCA object and then transform it, which gives us the
reduced dataset.
Python
# Applying PCA
# Taking the number of principal components as 3
pca = PCA(n_components=3)
pca.fit(scaled_data)
data_pca = pca.transform(scaled_data)
data_pca = pd.DataFrame(data_pca, columns=['PC1', 'PC2', 'PC3'])
data_pca.head()
Output:
PCA Dataset
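Step-5: Check the Correlation between Principal Components after PCA
Finally, we plot the correlation heatmap again, this time on the reduced dataset, to verify that the obtained principal components are uncorrelated. A minimal sketch is given below; as in Step-3, it assumes that seaborn and matplotlib.pyplot were imported in Step-1 as sns and plt.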
Python3
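# Correlation heatmap of the principal components in data_pca (sketch)
# Assumes seaborn (sns) and matplotlib.pyplot (plt) were imported in Step-1
sns.heatmap(data_pca.corr(), annot=True)
plt.show()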
Output:
The above heatmap clearly shows that there is no correlation between the obtained
principal components (PC1, PC2, and PC3). Thus, we have moved from a
higher-dimensional feature space to a lower-dimensional one while ensuring that the
resulting PCs are uncorrelated with each other. Hence, we have accomplished the
objectives of PCA.