3.1 Dimensionality Reduction

Dimensionality reduction techniques are used to reduce the number of features in datasets for better predictive modeling. Common techniques include feature selection, which selects important existing features, and feature extraction, which creates new features as combinations of existing ones. Popular algorithms for each technique are discussed.


What is Dimensionality Reduction?

 The number of input features, variables, or columns present in a given
dataset is known as its dimensionality.
 The process of reducing these features is called dimensionality reduction.
 In many cases a dataset contains a huge number of input features, which
makes the predictive modeling task more complicated.
 To obtain a better-fitting predictive model when solving classification
and regression problems, we use dimensionality reduction.
Benefits of applying Dimensionality Reduction

 By reducing the dimensions of the features, the space required to store
the dataset is also reduced.
 Less computation and training time is required with reduced feature
dimensions.
 Reduced feature dimensions make it easier to visualize the
data quickly.
 It removes redundant features (if present) by taking care of
multicollinearity.
Dimensionality Reduction Techniques

1. Feature Selection (Subset Selection)
2. Feature Extraction
1. Feature Selection
 A feature is an attribute that has an impact on the problem or is useful for
it, and choosing the important features for the model is known as
feature selection.

 Difference between the two techniques:
 Feature selection selects a subset of the original feature set.
 Feature extraction creates new features.

 Feature selection is a way of reducing the model's input variables by using
only relevant data, in order to reduce overfitting in the model.
What is Feature Selection?
 Feature selection is the process of selecting the subset of relevant
features and leaving out the irrelevant features present in a dataset, to build
a model of high accuracy.

 In other words, it is a way of selecting the optimal features from the input
dataset.
Definition:
 "It is a process of automatically or manually selecting the subset of most
appropriate and relevant features to be used in model building."
 Feature selection is performed by either including the important features or
excluding the irrelevant features in the dataset without changing them.
Need for Feature Selection

 It is necessary to provide a pre-processed, good input dataset in order to
get better outcomes.
 We collect a huge amount of data to train our model and help it learn
better.
 The dataset typically consists of noisy data, irrelevant data, and some
useful data.
 A huge amount of data also slows down the training process of the model, and
with noise and irrelevant data the model may not predict and perform well.
 So it is very necessary to remove such noise and less-important data from
the dataset, and feature selection techniques are used to do this.
 Selecting the best features helps the model to perform well.
Benefits

 It helps in avoiding the curse of dimensionality.
 It helps in simplifying the model so that it can be easily
interpreted by researchers.
 It reduces the training time.
 It reduces overfitting and hence enhances generalization.
Feature Selection Techniques
a. Filter Methods
 The dataset is filtered, and a subset that contains only the relevant
features is taken.
 Irrelevant features and redundant columns are filtered out of the
model by ranking features with different metrics.
 Some common filter-method techniques (a minimal code sketch follows
this list):
 Correlation
 Chi-Square Test
 ANOVA
 Information Gain, etc.
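
As a concrete illustration, here is a minimal sketch of a filter method using scikit-learn's SelectKBest with the chi-square test. The iris dataset and the choice of k=2 are illustrative assumptions, not part of the original material.

# Filter-method sketch: rank features by the chi-square statistic and keep
# the top k, independently of any downstream model.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)  # chi2 requires non-negative features

selector = SelectKBest(score_func=chi2, k=2)  # k=2 is an illustrative choice
X_selected = selector.fit_transform(X, y)

print("Chi-square scores per feature:", selector.scores_)
print("Reduced shape:", X_selected.shape)  # (150, 2)

Because the ranking never consults a predictive model, this runs fast and scales to wide datasets, which is the defining trade-off of filter methods.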
b. Wrapper Methods

 These have the same goal as filter methods, but they use a machine learning
model for their evaluation.

 In this method, a set of features is fed to the ML model and its
performance is evaluated.

 On the basis of the model's output, features are added or removed,
and the model is trained again with the new feature set.

 This method is more accurate than filtering but more complex to
work with.
 Some common wrapper-method techniques are:

i. Forward Selection
ii. Backward Selection
iii. Bi-directional Elimination
i. Forward selection -
 is an iterative process which begins with an empty set of
features.
 After each iteration, it adds one feature and evaluates
the performance to check whether it is improving or not.
 The process continues until adding a new
variable/feature no longer improves the performance of the model
(a code sketch follows).
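
A minimal sketch of forward selection with scikit-learn's SequentialFeatureSelector; the logistic-regression estimator and the stopping point of 2 features are illustrative assumptions (scikit-learn stops at a fixed feature count rather than when improvement plateaus, unless a tolerance is configured).

# Forward-selection sketch: start from an empty feature set and greedily add
# the feature that most improves the cross-validated score.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),  # illustrative estimator choice
    n_features_to_select=2,
    direction="forward",
)
sfs.fit(X, y)
print("Selected feature mask:", sfs.get_support())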
ii. Backward elimination -
 Also an iterative approach, and the opposite of forward selection.
 This technique begins by considering all the features
and removes the least significant feature.
 This elimination process continues until removing further features
no longer improves the performance of the model (sketched below).
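
The same scikit-learn class can sketch backward elimination by flipping the direction: it starts from all features and greedily drops the least significant one. The estimator and target feature count are again illustrative assumptions.

# Backward-elimination sketch: start from all features and repeatedly remove
# the feature whose removal hurts the cross-validated score the least.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

sbs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    direction="backward",
)
sbs.fit(X, y)
print("Selected feature mask:", sbs.get_support())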
c. Embedded Methods

 These examine the different training iterations of the machine learning
model and evaluate the importance of each feature.
 They combine the advantages of both filter and wrapper methods by
considering the interaction of features while keeping computational
cost low.
 They are fast to run, similar to filter methods, but
more accurate.
 Some common embedded-method techniques (a LASSO sketch follows this
list):

 LASSO
 Elastic Net
 Ridge Regression, etc.
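
A minimal sketch of an embedded method using LASSO: the L1 penalty drives uninformative coefficients to exactly zero during training, so selection falls out of the model fit itself. The diabetes dataset and the alpha value are illustrative assumptions.

# Embedded-method sketch: LASSO's L1 penalty zeroes out weak coefficients
# while the model trains; SelectFromModel keeps features with non-zero weights.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

lasso = Lasso(alpha=0.1).fit(X, y)  # alpha=0.1 is an illustrative choice
selector = SelectFromModel(lasso, prefit=True)
X_selected = selector.transform(X)

print("Kept", X_selected.shape[1], "of", X.shape[1], "features")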
How to choose a Feature Selection Method?
It is very important to understand which feature selection method will work
properly for a given model.
2. Feature Extraction

 Feature extraction is the process of transforming a space
with many dimensions into a space with fewer dimensions.
 This approach is useful when we want to keep the whole of the
information but use fewer resources while processing it.
Some common feature extraction techniques (a PCA sketch follows this list):

 Principal Component Analysis
 Linear Discriminant Analysis
 Kernel PCA
 Quadratic Discriminant Analysis
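
A minimal sketch of feature extraction with Principal Component Analysis; standardizing first and keeping 2 components are illustrative assumptions.

# Feature-extraction sketch with PCA: project the 4 original iris features
# onto 2 new features that are linear combinations of the originals.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

X_std = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale

pca = PCA(n_components=2)
X_new = pca.fit_transform(X_std)
print("Explained variance ratio:", pca.explained_variance_ratio_)

Note that each new feature mixes all of the originals, so unlike feature selection the result keeps most of the information but loses the original column meanings.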
Why is this Useful?

• It reduces the number of resources needed for processing without losing
important or relevant information.
• It also reduces the amount of redundant data in a given analysis.
• It gives us new features that are linear combinations of the existing
features.
• The new set of features will have different values from the original
feature values.
• The main aim is that fewer features are required to capture the
same information.
Common techniques of Dimensionality Reduction
a) Principal Component Analysis
b) Backward Elimination
c) Forward Selection
d) Score comparison
e) Missing Value Ratio (sketched after this list)
f) Low Variance Filter (sketched after this list)
g) High Correlation Filter
h) Random Forest
i) Factor Analysis
j) Auto-Encoder
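
To make two of the listed techniques concrete, here is a minimal sketch of the Missing Value Ratio and Low Variance Filter on a toy DataFrame; the 20% missing threshold and the 0.01 variance threshold are illustrative assumptions.

# Missing Value Ratio + Low Variance Filter sketch on a tiny toy DataFrame.
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, np.nan],  # 40% missing values
    "b": [1.0, 1.0, 1.0, 1.0, 1.01],       # near-constant (low variance)
    "c": [3.0, 1.0, 4.0, 1.0, 5.0],
})

# Missing Value Ratio: drop columns whose share of NaNs exceeds 20%.
df = df[df.columns[df.isna().mean() <= 0.2]]

# Low Variance Filter: drop columns whose variance falls below 0.01.
vt = VarianceThreshold(threshold=0.01)
vt.fit(df)
print("Remaining columns:", list(df.columns[vt.get_support()]))  # ['c']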
