3.1 Dimensionality Reduction

Dimensionality reduction techniques are used to reduce the number of features in datasets for better predictive modeling. Common techniques include feature selection, which selects important existing features, and feature extraction, which creates new features as combinations of existing ones. Popular algorithms for each technique are discussed.


What is Dimensionality Reduction?

 The number of input features, variables, or columns present in a given
dataset is known as its dimensionality.
 The process of reducing these features is called dimensionality reduction.
 In many cases a dataset contains a huge number of input features, which
makes the predictive modeling task more complicated.
 To obtain a better-fitting predictive model when solving classification
and regression problems, we use dimensionality reduction.
Benefits of applying Dimensionality Reduction

 By reducing the dimensions of the features, the space required to store
the dataset is also reduced.
 Less computation and training time is required with reduced feature
dimensions.
 Reduced feature dimensions make it easier to visualize the
data quickly.
 It removes redundant features (if present) by taking care of
multicollinearity.
Dimensionality Reduction Techniques

1. Feature Selection (Subset Selection)
2. Feature Extraction
1. Feature Selection
 A feature is an attribute that has an impact on the problem or is useful for
it, and choosing the important features for the model is known as
feature selection.

 Difference between the two techniques:
 Feature selection selects a subset of the original feature set.
 Feature extraction creates new features.

 Feature selection is a way of reducing the model's input variables by using
only relevant data, in order to reduce overfitting in the model.
What is Feature Selection?
 Feature selection is the process of selecting the subset of relevant
features and leaving out the irrelevant features present in a dataset, to build
a model of high accuracy.

 In other words, it is a way of selecting the optimal features from the input
dataset.
Definition:
 "It is a process of automatically or manually selecting the subset of most
appropriate and relevant features to be used in model building."
 Feature selection is performed by either including the important features or
excluding the irrelevant features in the dataset without changing them.
Need for Feature Selection

 It is necessary to provide a pre-processed, good input dataset in order to
get better outcomes.
 We collect a huge amount of data to train our model and help it learn
better.
 The dataset typically consists of noisy data, irrelevant data, and some
useful data.
 A huge amount of data also slows down the training process of the model, and
with noise and irrelevant data the model may not predict and perform well.
 So it is very necessary to remove such noise and less-important data from
the dataset, and feature selection techniques are used to do this.
 Selecting the best features helps the model to perform well.
Benefits

 It helps in avoiding the curse of dimensionality.
 It helps in simplifying the model so that it can be easily
interpreted by researchers.
 It reduces the training time.
 It reduces overfitting and hence enhances generalization.
Feature Selection Techniques
a. Filter Methods
 The dataset is filtered, and a subset that contains only the relevant
features is taken.
 Irrelevant features and redundant columns are filtered out of the
model by ranking features with different metrics.
 Some common filter-method techniques (a minimal code sketch follows
this list):
 Correlation
 Chi-Square Test
 ANOVA
 Information Gain, etc.
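
As a concrete illustration, here is a minimal sketch of a filter method using scikit-learn's SelectKBest with the chi-square test. The iris dataset and the choice of k=2 are illustrative assumptions, not part of the original material.

# Filter-method sketch: rank features by the chi-square statistic and keep
# the top k, independently of any downstream model.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)  # chi2 requires non-negative features

selector = SelectKBest(score_func=chi2, k=2)  # k=2 is an illustrative choice
X_selected = selector.fit_transform(X, y)

print("Chi-square scores per feature:", selector.scores_)
print("Reduced shape:", X_selected.shape)  # (150, 2)

Because the ranking never consults a predictive model, this runs fast and scales to wide datasets, which is the defining trade-off of filter methods.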
b. Wrapper Methods

 These have the same goal as filter methods, but they use a machine learning
model for their evaluation.

 In this method, a set of features is fed to the ML model and its
performance is evaluated.

 On the basis of the model's output, features are added or removed,
and the model is trained again with the new feature set.

 This method is more accurate than filtering but more complex to
work with.
 Some common wrapper-method techniques are:

i. Forward Selection
ii. Backward Selection
iii. Bi-directional Elimination
i. Forward selection -
 is an iterative process which begins with an empty set of
features.
 After each iteration, it adds one feature and evaluates
the performance to check whether it is improving or not.
 The process continues until adding a new
variable/feature no longer improves the performance of the model
(a code sketch follows).
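
A minimal sketch of forward selection with scikit-learn's SequentialFeatureSelector; the logistic-regression estimator and the stopping point of 2 features are illustrative assumptions (scikit-learn stops at a fixed feature count rather than when improvement plateaus, unless a tolerance is configured).

# Forward-selection sketch: start from an empty feature set and greedily add
# the feature that most improves the cross-validated score.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),  # illustrative estimator choice
    n_features_to_select=2,
    direction="forward",
)
sfs.fit(X, y)
print("Selected feature mask:", sfs.get_support())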
ii. Backward elimination -
 Also an iterative approach, and the opposite of forward selection.
 This technique begins by considering all the features
and removes the least significant feature.
 This elimination process continues until removing further features
no longer improves the performance of the model (sketched below).
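
The same scikit-learn class can sketch backward elimination by flipping the direction: it starts from all features and greedily drops the least significant one. The estimator and target feature count are again illustrative assumptions.

# Backward-elimination sketch: start from all features and repeatedly remove
# the feature whose removal hurts the cross-validated score the least.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

sbs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    direction="backward",
)
sbs.fit(X, y)
print("Selected feature mask:", sbs.get_support())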
c. Embedded Methods

 These examine the different training iterations of the machine learning
model and evaluate the importance of each feature.
 They combine the advantages of both filter and wrapper methods by
considering the interaction of features while keeping computational
cost low.
 They are fast to run, similar to filter methods, but
more accurate.
 Some common embedded-method techniques (a LASSO sketch follows this
list):

 LASSO
 Elastic Net
 Ridge Regression, etc.
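
A minimal sketch of an embedded method using LASSO: the L1 penalty drives uninformative coefficients to exactly zero during training, so selection falls out of the model fit itself. The diabetes dataset and the alpha value are illustrative assumptions.

# Embedded-method sketch: LASSO's L1 penalty zeroes out weak coefficients
# while the model trains; SelectFromModel keeps features with non-zero weights.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

lasso = Lasso(alpha=0.1).fit(X, y)  # alpha=0.1 is an illustrative choice
selector = SelectFromModel(lasso, prefit=True)
X_selected = selector.transform(X)

print("Kept", X_selected.shape[1], "of", X.shape[1], "features")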
How to choose a Feature Selection Method?
It is very important to understand which feature selection method will work
properly for a given model.
2. Feature Extraction

 Feature extraction is the process of transforming a space
with many dimensions into a space with fewer dimensions.
 This approach is useful when we want to keep the whole of the
information but use fewer resources while processing it.
Some common feature extraction techniques (a PCA sketch follows this list):

 Principal Component Analysis
 Linear Discriminant Analysis
 Kernel PCA
 Quadratic Discriminant Analysis
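
A minimal sketch of feature extraction with Principal Component Analysis; standardizing first and keeping 2 components are illustrative assumptions.

# Feature-extraction sketch with PCA: project the 4 original iris features
# onto 2 new features that are linear combinations of the originals.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

X_std = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale

pca = PCA(n_components=2)
X_new = pca.fit_transform(X_std)
print("Explained variance ratio:", pca.explained_variance_ratio_)

Note that each new feature mixes all of the originals, so unlike feature selection the result keeps most of the information but loses the original column meanings.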
Why is this Useful?

• It reduces the number of resources needed for processing without losing
important or relevant information.
• It also reduces the amount of redundant data in a given analysis.
• It gives us new features that are linear combinations of the existing
features.
• The new set of features will have different values from the original
feature values.
• The main aim is that fewer features are required to capture the
same information.
Common techniques of Dimensionality Reduction
a) Principal Component Analysis
b) Backward Elimination
c) Forward Selection
d) Score comparison
e) Missing Value Ratio (sketched after this list)
f) Low Variance Filter (sketched after this list)
g) High Correlation Filter
h) Random Forest
i) Factor Analysis
j) Auto-Encoder
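
To make two of the listed techniques concrete, here is a minimal sketch of the Missing Value Ratio and Low Variance Filter on a toy DataFrame; the 20% missing threshold and the 0.01 variance threshold are illustrative assumptions.

# Missing Value Ratio + Low Variance Filter sketch on a tiny toy DataFrame.
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, np.nan],  # 40% missing values
    "b": [1.0, 1.0, 1.0, 1.0, 1.01],       # near-constant (low variance)
    "c": [3.0, 1.0, 4.0, 1.0, 5.0],
})

# Missing Value Ratio: drop columns whose share of NaNs exceeds 20%.
df = df[df.columns[df.isna().mean() <= 0.2]]

# Low Variance Filter: drop columns whose variance falls below 0.01.
vt = VarianceThreshold(threshold=0.01)
vt.fit(df)
print("Remaining columns:", list(df.columns[vt.get_support()]))  # ['c']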
