
Dimensionality Reduction

Dimensionality Reduction
• Dimensionality reduction is the task of
reducing the number of features in a dataset.
• In machine learning tasks like regression or
classification, there are often too many
variables to work with.
• These variables are also called features.
Dimensionality Reduction
• Dimensionality reduction is the process of reducing the
number of features (or dimensions) in a dataset while
retaining as much information as possible.
• This can be done for a variety of reasons, such as to reduce
the complexity of a model, to improve the performance of a
learning algorithm, or to make it easier to visualize the data.
• There are several techniques for dimensionality reduction,
including principal component analysis (PCA), singular value
decomposition (SVD), and linear discriminant analysis (LDA).
• Each technique uses a different method to project the data
onto a lower-dimensional space while preserving important
information.
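As a minimal sketch of this idea (assuming NumPy and scikit-learn are available; the data here is synthetic and purely illustrative), a 10-feature dataset can be projected onto 3 principal components:

import numpy as np
from sklearn.decomposition import PCA

# Synthetic dataset: 200 samples, 10 features (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Project onto 3 principal components
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

print(X.shape)          # (200, 10)
print(X_reduced.shape)  # (200, 3)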
Dimensionality Reduction
• Dimensionality reduction is a technique used
to reduce the number of features in a dataset
while retaining as much of the important
information as possible.
• In other words, it is a process of transforming
high-dimensional data into a lower-
dimensional space that still preserves the
essence of the original data.
The Curse of Dimensionality
• In machine learning, high-dimensional data refers to data
with a large number of features or variables.
• The curse of dimensionality is a common problem in
machine learning, where the performance of the model
deteriorates as the number of features increases.
• This is because the complexity of the model increases with
the number of features, and it becomes more difficult to
find a good solution.
• In addition, high-dimensional data can also lead to
overfitting, where the model fits the training data too
closely and does not generalize well to new data.
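One way to see the curse of dimensionality empirically (a rough sketch, assuming NumPy) is to measure how pairwise distances between random points concentrate as the number of features grows: the gap between the nearest and farthest point shrinks relative to the distances themselves, which makes distance-based methods less discriminative.

import numpy as np

rng = np.random.default_rng(0)

for d in (2, 10, 100, 1000):
    # 500 random points in the d-dimensional unit cube
    X = rng.random((500, d))
    # Distances from the first point to all the others
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    # Relative contrast: how much farther the farthest point is than the nearest
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:4d}  relative contrast={contrast:.2f}")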
Dimensionality Reduction Methods
• Dimensionality reduction can help to mitigate
these problems by reducing the complexity of
the model and improving its generalization
performance.
• There are two main approaches to
dimensionality reduction:
– feature selection
– feature extraction.
Feature Selection
• Feature selection involves selecting a subset of the original
features that are most relevant to the problem at hand.
• The goal is to reduce the dimensionality of the dataset
while retaining the most important features.
• There are several methods for feature selection, including
filter methods, wrapper methods, and embedded methods.
• Filter methods rank the features based on their relevance
to the target variable, wrapper methods use the model
performance as the criteria for selecting features, and
embedded methods combine feature selection with the
model training process.
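The sketch below (assuming scikit-learn) shows one common representative of each family on a toy classification dataset: SelectKBest as a filter method, RFE as a wrapper method, and the coefficients of an L1-penalised model as an embedded method. The specific choices are illustrative, not the only options.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

# Filter: rank features by an ANOVA F-score, keep the top 5
X_filter = SelectKBest(f_classif, k=5).fit_transform(X, y)

# Wrapper: recursively eliminate features using model performance
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
X_wrapper = rfe.fit_transform(X, y)

# Embedded: the L1 penalty drives irrelevant coefficients to zero during training
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selected = (lasso.coef_ != 0).any(axis=0)

print(X_filter.shape, X_wrapper.shape, selected.sum())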
Feature Extraction
• Feature extraction involves creating new features by
combining or transforming the original features.
• The goal is to create a set of features that captures the
essence of the original data in a lower-dimensional space.
• There are several methods for feature extraction, including
principal component analysis (PCA), linear discriminant
analysis (LDA), and t-distributed stochastic neighbor
embedding (t-SNE).
• PCA is a popular technique that projects the original
features onto a lower-dimensional space while preserving
as much of the variance as possible.
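As a hedged sketch (assuming scikit-learn, with the Iris data used only as an example), the three extraction techniques named above can all be applied to the same labelled dataset; note that LDA is supervised (it uses the class labels), while PCA and t-SNE are unsupervised.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import TSNE

X, y = load_iris(return_X_y=True)   # 150 samples, 4 features

X_pca = PCA(n_components=2).fit_transform(X)                            # maximises retained variance
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # maximises class separation
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)          # preserves local neighbourhoods

print(X_pca.shape, X_lda.shape, X_tsne.shape)   # each (150, 2)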
Why is Dimensionality Reduction important in
Machine Learning and Predictive Modeling?
• An intuitive example of dimensionality reduction can be discussed through
a simple e-mail classification problem, where we need to classify whether
the e-mail is spam or not.
• This can involve a large number of features, such as whether or not the e-
mail has a generic title, the content of the e-mail, whether the e-mail uses
a template, etc.
• However, some of these features may overlap.
• Similarly, a classification problem that relies on both humidity
and rainfall can be collapsed into just one underlying feature, since the two
are highly correlated (a small numeric sketch follows this list).
• A 3-D classification problem can be hard to visualize, whereas a 2-D one
can be mapped to a simple 2-dimensional space, and a 1-D problem to a
simple line.
• The figure below illustrates this concept: a 3-D feature space is split
into two 2-D feature spaces, and if the features are found to be correlated, the
number of features can be reduced even further.
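The humidity-and-rainfall point can be made concrete with a short sketch (assuming NumPy and scikit-learn; the data is synthetic): when two features are strongly correlated, a single principal component already explains almost all of the variance, so the pair can be collapsed to one feature with little loss.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
humidity = rng.normal(60, 10, size=500)
rainfall = 2.0 * humidity + rng.normal(0, 3, size=500)   # strongly correlated with humidity
X = np.column_stack([humidity, rainfall])

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)   # roughly [0.997, 0.003]: one component captures nearly all the variance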
Components of Dimensionality Reduction
• There are two components of dimensionality reduction:
– Feature selection: In this, we try to find a subset of the original set of variables, or features, to get a smaller subset which can be used to model the problem.
– It usually involves three approaches:
• Filter
• Wrapper
• Embedded
– Feature extraction: This reduces the data in a high-dimensional space to a lower-dimensional space, i.e. a space with a smaller number of dimensions.
Methods of Dimensionality Reduction
• The various methods used for dimensionality
reduction include:
– Principal Component Analysis (PCA)
– Linear Discriminant Analysis (LDA)
– Generalized Discriminant Analysis (GDA)
• Dimensionality reduction may be both linear
and non-linear, depending upon the method
used.
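PCA and LDA are linear projections, while GDA is a kernel-based (non-linear) extension of discriminant analysis. As a hedged illustration of the linear versus non-linear distinction (assuming scikit-learn; KernelPCA is used here merely as a stand-in non-linear method, and the parameter values are illustrative):

from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: not separable by any linear projection
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

X_linear = PCA(n_components=1).fit_transform(X)                                    # linear
X_nonlinear = KernelPCA(n_components=1, kernel="rbf", gamma=10).fit_transform(X)   # non-linear

# The RBF-kernel projection can separate the two circles, which no linear projection can do
print(X_linear.shape, X_nonlinear.shape)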
Principal Component Analysis
• This method was introduced by Karl Pearson.
• It maps data from a higher-dimensional space to a
lower-dimensional space in such a way that the variance
of the data in the lower-dimensional space is maximized.
Principal Component Analysis
• It involves the following steps:
– Construct the covariance matrix of the data.
– Compute the eigenvectors of this matrix.
– Eigenvectors corresponding to the largest
eigenvalues are used to reconstruct a large
fraction of variance of the original data.
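These steps can be written directly in NumPy. The following is a minimal sketch, not a production implementation (the data is centred first, a step the list above leaves implicit):

import numpy as np

def pca(X, k):
    # Centre the data so the covariance matrix is meaningful
    X_centred = X - X.mean(axis=0)
    # Step 1: construct the covariance matrix
    cov = np.cov(X_centred, rowvar=False)
    # Step 2: compute eigenvalues and eigenvectors (eigh: the covariance matrix is symmetric)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Step 3: keep the eigenvectors with the largest eigenvalues
    order = np.argsort(eigenvalues)[::-1][:k]
    components = eigenvectors[:, order]
    # Project the data onto the retained components
    return X_centred @ components

X = np.random.default_rng(0).normal(size=(100, 5))
print(pca(X, 2).shape)   # (100, 2)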
Principal Component Analysis
• Hence, we are left with a smaller number of
eigenvectors, and some information may have been
lost in the process.
• However, the most important variance should be
retained by the remaining eigenvectors.
Eigenvalues
• In PCA, eigenvalues represent the amount of
variance (spread or variability) captured by each
principal component.
• Each principal component corresponds to an
eigenvalue, and the eigenvalues are arranged in
decreasing order. The higher the eigenvalue, the
more variance the corresponding principal
component explains.
• The sum of all eigenvalues equals the total variance
in the original data.
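A short sketch of these two facts (assuming NumPy; the data is synthetic): the eigenvalues of the covariance matrix, sorted in decreasing order, sum to the total variance, and each eigenvalue divided by that sum is the fraction of variance its component explains.

import numpy as np

X = np.random.default_rng(0).normal(size=(200, 4)) * [5.0, 2.0, 1.0, 0.5]
X_centred = X - X.mean(axis=0)

eigenvalues = np.sort(np.linalg.eigvalsh(np.cov(X_centred, rowvar=False)))[::-1]

total_variance = X_centred.var(axis=0, ddof=1).sum()
print(np.isclose(eigenvalues.sum(), total_variance))   # True: eigenvalues sum to the total variance
print(eigenvalues / eigenvalues.sum())                 # fraction of variance per component, in decreasing order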
Eigenvectors
• Eigenvectors are associated with the principal components,
and they indicate the direction of the spread or variability in
the data.
• Each eigenvector points in the direction of maximum variance,
and the magnitude of the eigenvalue associated with that
eigenvector indicates the importance or significance of that
direction.
• The first principal component (associated with the largest
eigenvalue) points in the direction of maximum variance, the
second principal component (associated with the second-
largest eigenvalue) points in the direction of the second-
highest variance, and so on.
• Eigenvectors are used to transform the original data into a new
coordinate system, defined by the principal components.
• Eigenvalues tell us how much variance is
captured by each principal component, and
eigenvectors tell us the direction of the spread
of the data in the new coordinate system.
• The goal of PCA is to reduce the
dimensionality of the data by keeping the
principal components with the highest
eigenvalues, as they represent the most
significant patterns or features in the dataset.
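In scikit-learn these quantities are exposed directly after fitting a PCA model, as the short sketch below shows (the Iris data is used only as an example): components_ holds the eigenvectors (directions) and explained_variance_ the corresponding eigenvalues.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2).fit(X)

print(pca.components_)          # eigenvectors: directions of maximum variance
print(pca.explained_variance_)  # eigenvalues: variance captured along each direction
print(pca.transform(X).shape)   # data expressed in the new coordinate system: (150, 2)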
Variance in PCA
• Variance is a key concept in Principal Component
Analysis (PCA), and its importance lies in
capturing and retaining the most significant
information present in the data.
• In the context of data, variance represents the
amount of information or spread in the dataset.
• High variance indicates that the data points are
more dispersed, covering a wider range of values.
Variance in PCA
• The primary goal of PCA is to reduce the dimensionality
of the data while retaining as much of its original
variability as possible.
• Principal components are derived in such a way that the
first principal component captures the maximum variance
in the data, the second principal component captures the
second-highest variance, and so on.
• By selecting a subset of principal components that
collectively explain a high percentage of the total
variance, you can represent the data in a lower-
dimensional space without losing significant information.
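scikit-learn supports this "keep enough components to explain a given share of the variance" strategy directly, as sketched below; the 0.95 threshold and the digits dataset are arbitrary illustrations.

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 1797 samples, 64 features

# Keep the smallest number of components that explains at least 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape[1])                    # number of components actually kept
print(pca.explained_variance_ratio_.sum())   # >= 0.95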
Variance in PCA
• Variance is a measure of the information
content in each direction (principal
component) of the data.
• By focusing on the directions with the highest
variance, PCA helps in discarding less
important directions and, consequently,
reduces the dimensionality of the data.
Advantages of Dimensionality Reduction
• It helps in data compression, and hence reduces the required storage space.
• It reduces computation time.
• It also helps remove redundant features, if any.
• Improved Visualization
– High-dimensional data is difficult to visualize; dimensionality
reduction techniques can help in visualizing the data in 2D or 3D,
which aids understanding and analysis (see the sketch after this list).
• Overfitting Prevention
– High dimensional data may lead to overfitting in machine learning
models, which can lead to poor generalization performance.
– Dimensionality reduction can help in reducing the complexity of
the data, and hence prevent overfitting.
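As a hedged example of the visualization point above (assuming scikit-learn and Matplotlib are available, with the Iris data used only as an example), a 4-feature dataset can be projected to 2D and plotted directly:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)            # 4 features per sample
X_2d = PCA(n_components=2).fit_transform(X)  # project to 2D for plotting

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()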
Advantages of Dimensionality Reduction
• Feature Extraction
– Dimensionality reduction can help in extracting important features
from high dimensional data, which can be useful in feature
selection for machine learning models.
• Data Preprocessing
– Dimensionality reduction can be used as a preprocessing step
before applying machine learning algorithms to reduce the
dimensionality of the data and hence improve the performance of
the model.
• Improved Performance
– Dimensionality reduction can help in improving the performance
of machine learning models by reducing the complexity of the
data, and hence reducing the noise and irrelevant information in
the data.
Disadvantages of Dimensionality
Reduction
• It may lead to some amount of data loss.
• PCA tends to find linear correlations between variables,
which is sometimes undesirable.
• PCA fails in cases where mean and covariance are not
enough to define datasets.
• We may not know how many principal components to
keep; in practice, some rules of thumb are applied.
• Interpretability:
– The reduced dimensions may not be easily interpretable, and
it may be difficult to understand the relationship between
the original features and the reduced dimensions.
Disadvantages of Dimensionality
Reduction
• Overfitting
– In some cases, dimensionality reduction may lead to
overfitting, especially when the number of components is
chosen based on the training data.
• Sensitivity to outliers
– Some dimensionality reduction techniques are sensitive to
outliers, which can result in a biased representation of the
data.
• Computational complexity
– Some dimensionality reduction techniques, such as manifold
learning, can be computationally intensive, especially when
dealing with large datasets.
Important points
• Dimensionality reduction is the process of reducing the number of
features in a dataset while retaining as much information as possible.
• This can be done to reduce the complexity of a model, improve the
performance of a learning algorithm, or make it easier to visualize
the data.
• Techniques for dimensionality reduction include: principal
component analysis (PCA), singular value decomposition (SVD), and
linear discriminant analysis (LDA).
• Each technique projects the data onto a lower-dimensional space
while preserving important information.
• Dimensionality reduction is performed during the pre-processing stage,
before building a model, to improve performance.
• It is important to note that dimensionality reduction can also discard
useful information, so care must be taken when applying these
techniques.
