Feature Selection Techniques in Machine Learning
"Feature selection is a way of selecting the subset of the most relevant features from the original feature set by removing the redundant, irrelevant, or noisy features."
While developing a machine learning model, typically only a few variables in the dataset are useful for building the model; the remaining features are either redundant or irrelevant. If we feed the model a dataset containing all these redundant and irrelevant features, it may negatively impact and reduce the model's overall performance and accuracy. Hence, it is very important to identify and select the most appropriate features from the data and remove the irrelevant or less important ones, which is done with the help of feature selection in machine learning.
Feature selection is one of the important concepts of machine learning, and it highly impacts the performance of the model. As machine learning works on the principle of "garbage in, garbage out", we always need to feed the most appropriate and relevant dataset to the model in order to get better results.
In this topic, we will discuss different feature selection techniques for machine learning. But
before that, let's first understand some basics of feature selection.
So, we can define feature selection as "the process of automatically or manually selecting the subset of the most appropriate and relevant features to be used in model building." Feature selection is performed by either including the important features or excluding the irrelevant features from the dataset, without changing them.
Selecting the best features helps the model perform well. For example, suppose we want to create a model that automatically decides which car should be crushed for spare parts, and to do this, we have a dataset containing each car's Model, Year, Owner's name, and Miles. In this dataset, the owner's name does not contribute to the model's performance, as it does not help decide whether the car should be crushed; so we can remove this column and select the remaining features (columns) for model building.
Feature selection also helps simplify the model so that it can be easily interpreted by researchers.
1. Wrapper Methods
In wrapper methods, feature selection is treated as a search problem: a model is trained on a candidate subset of features, features are added or removed on the basis of the model's output, and the model is then trained again on the new subset. A minimal sketch using forward selection follows.
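For illustration, here is a minimal sketch of a wrapper method using scikit-learn's SequentialFeatureSelector with forward selection; the breast cancer dataset, the logistic regression estimator, and the choice of five features are assumptions for the example, not part of the original article.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Forward selection: start with no features, repeatedly add the one that
# most improves cross-validated accuracy, then retrain on the new subset.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=5,
    direction="forward",
    cv=5,
)
selector.fit(X, y)
print("Selected feature indices:", selector.get_support(indices=True))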
Exhaustive Feature Selection- Exhaustive feature selection is one of the most thorough wrapper methods: it evaluates every possible feature subset by brute force and returns the best-performing one. Because the number of subsets grows exponentially with the number of features, it is practical only for small feature sets. A brute-force sketch follows.
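A brute-force sketch under the assumption of a small feature set; the iris dataset and the logistic regression scorer are illustrative choices. Every subset is scored by cross-validation and the best one is kept.

from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

best_score, best_subset = -np.inf, None
for k in range(1, n_features + 1):
    for subset in combinations(range(n_features), k):
        # Score this candidate subset with 5-fold cross-validation.
        score = cross_val_score(
            LogisticRegression(max_iter=1000), X[:, subset], y, cv=5
        ).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print(f"Best subset {best_subset} with CV accuracy {best_score:.3f}")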
2. Filter Methods
In filter methods, features are selected on the basis of statistical measures. These methods do not depend on the learning algorithm and choose the features as a pre-processing step.
Filter methods filter out irrelevant features and redundant columns from the model by ranking features with different metrics.
The advantage of filter methods is that they need little computational time and do not overfit the data.
Some common techniques of filter methods are:
Information Gain
Chi-square Test
Fisher's Score
Missing Value Ratio
Information Gain: Information gain measures the reduction in entropy obtained by splitting the dataset on a feature. It can be used as a feature selection technique by computing the information gain of each variable with respect to the target variable and keeping the highest-scoring ones, as sketched below.
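A short sketch using scikit-learn's mutual_info_classif, which estimates the mutual information (an information-gain-style score) between each feature and the target; the iris dataset is an illustrative assumption.

from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)

# Higher scores mean the feature removes more uncertainty about the target.
scores = mutual_info_classif(X, y, random_state=0)
for i, s in enumerate(scores):
    print(f"Feature {i}: information gain score {s:.3f}")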
Chi-square Test: The chi-square test is a technique to determine the relationship between categorical variables. The chi-square statistic is computed between each feature and the target variable, and the desired number of features with the highest chi-square values is selected, as sketched below.
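A minimal sketch with scikit-learn's SelectKBest and the chi2 score function; note that chi2 requires non-negative feature values, so the digits dataset (pixel intensities) is used as an illustrative example.

from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_digits(return_X_y=True)  # pixel intensities, all non-negative

# Keep the k features with the highest chi-square statistic.
selector = SelectKBest(score_func=chi2, k=10)
X_new = selector.fit_transform(X, y)
print("Reduced shape:", X_new.shape)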
Fisher's Score:
Fisher's score is one of the popular supervised techniques for feature selection. It ranks variables by Fisher's criterion, the ratio of between-class variance to within-class variance for each feature, in descending order; we can then select the variables with the largest Fisher's scores, as sketched below.
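A hand-rolled sketch assuming the standard definition of Fisher's score (class-size-weighted between-class variance divided by within-class variance, per feature); the dataset and the exact weighting are illustrative assumptions, not taken from the article.

import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

def fisher_score(X, y):
    overall_mean = X.mean(axis=0)
    numerator = np.zeros(X.shape[1])
    denominator = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        # Between-class spread (weighted by class size) over within-class variance.
        numerator += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        denominator += len(Xc) * Xc.var(axis=0)
    return numerator / denominator

scores = fisher_score(X, y)
# Rank features by Fisher's score, largest (most discriminative) first.
print("Ranking (best first):", np.argsort(scores)[::-1])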
Missing Value Ratio: The missing value ratio can be used to evaluate features against a threshold. It is computed as the number of missing values in a column divided by the total number of observations. Variables whose ratio exceeds the threshold can be dropped, as sketched below.
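A small sketch using pandas; the DataFrame contents and the 0.4 threshold are illustrative assumptions.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "model": ["A", "B", "C", "D", "E"],
    "year": [2015, np.nan, 2018, np.nan, np.nan],   # 60% missing
    "miles": [42000, 30500, np.nan, 81000, 56000],  # 20% missing
})

# Ratio = missing values in a column / total number of observations.
missing_ratio = df.isnull().mean()
threshold = 0.4
keep = missing_ratio[missing_ratio <= threshold].index
print(df[keep])  # "year" exceeds the threshold and is dropped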
3. Embedded Methods
Embedded methods combine the advantages of both filter and wrapper methods by taking feature interactions into account while keeping the computational cost low. They are fast like filter methods, but more accurate.
These methods are also iterative: each training iteration is evaluated, and the features that contribute most to the training in that iteration are identified. Common embedded techniques include regularization (for example, LASSO's L1 penalty, which shrinks the coefficients of irrelevant features to zero during training) and tree-based importance measures (for example, random forest feature importance), as sketched below.
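A minimal sketch of both approaches; the diabetes regression dataset and the hyperparameters (alpha, n_estimators) are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

# LASSO: features whose coefficients survive the L1 penalty are selected.
lasso = Lasso(alpha=0.5).fit(X, y)
print("Kept by LASSO:", np.flatnonzero(lasso.coef_))

# Random forest: importances are accumulated while the model is trained.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print("Importances:", np.round(forest.feature_importances_, 3))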
How to Choose a Feature Selection Method?
To choose the right statistical measure, we need to first identify the types of the input and output variables. In machine learning, variables are of mainly two types:
Numerical Variables: variables that take continuous or integer values, such as age or price.
Categorical Variables: variables that take a limited set of discrete values, such as boolean, ordinal, or nominal labels.
Below are some univariate statistical measures, which can be used for filter-based feature
selection:
1. Numerical Input, Numerical Output
Numerical input variables with a numerical output define predictive regression modelling. The most common measure for this case is the correlation coefficient, as sketched below.
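A quick sketch of correlation-based filtering using pandas; the diabetes dataset and its "target" column name are illustrative assumptions.

from sklearn.datasets import load_diabetes

data = load_diabetes(as_frame=True)
df = data.frame  # features plus the numerical "target" column

# Absolute Pearson correlation of each feature with the target, best first.
corr = df.corr()["target"].drop("target").abs().sort_values(ascending=False)
print(corr.head())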
2. Numerical Input, Categorical Output
Numerical input with a categorical output is the case for classification predictive modelling problems. Correlation-based techniques are used here as well, but adapted to a categorical output (for example, the ANOVA F-test, sketched below).
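A short sketch using scikit-learn's f_classif (ANOVA F-test) with SelectKBest; the iris dataset and k=2 are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score numerical features against the categorical target, keep the top 2.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print("ANOVA F-scores:", selector.scores_.round(1))
print("Selected:", selector.get_support(indices=True))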
3. Categorical Input, Numerical Output
This is the case of regression predictive modelling with categorical input. It is a less common regression scenario. We can use the same measures as in the case above, but with the roles of the input and output variables reversed.
4. Categorical Input, Categorical Output
The most commonly used technique for this case is the Chi-Squared test, as sketched below. We can also use information gain (mutual information) in this case.
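A tiny sketch of the chi-square independence test on a contingency table using scipy.stats.chi2_contingency; the table values are made up for illustration.

from scipy.stats import chi2_contingency

# Contingency table: rows = feature categories, columns = target classes.
table = [[30, 10],
         [15, 45]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p-value={p:.4f}")  # small p suggests dependence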
We can summarise the above cases with appropriate measures in the below table:
Input Variable    Output Variable    Feature Selection Measure
Numerical         Numerical          Pearson's correlation coefficient (linear correlation)
Numerical         Categorical        ANOVA correlation coefficient (linear)
Categorical       Numerical          Kendall's rank coefficient (linear)
Categorical       Categorical        Chi-Squared test (contingency tables); Mutual information
Conclusion
Feature selection is a complicated and vast field of machine learning, and many studies have already been carried out to discover the best methods. There is no fixed rule for choosing the best feature selection method; the choice depends on the machine learning engineer, who can combine and adapt these approaches to find the best method for a specific problem. One should try a variety of model fits on different subsets of features selected through different statistical measures.