Feature Selection Techniques in Machine Learning
"Feature selection is a way of selecting the subset of the most relevant features from the original feature set by removing the redundant, irrelevant, or noisy features."
While developing a machine learning model, typically only a few variables in the dataset are useful for building the model; the remaining features are either redundant or irrelevant. If we feed the model a dataset containing all these redundant and irrelevant features, it may negatively impact and reduce the model's overall performance and accuracy. Hence, it is very important to identify and select the most appropriate features from the data and remove the irrelevant or less important ones, which is done with the help of feature selection in machine learning.
Feature selection is one of the important concepts of machine learning, and it highly impacts the performance of the model. As machine learning works on the principle of "garbage in, garbage out", we always need to feed the most appropriate and relevant dataset to the model in order to get better results.
In this topic, we will discuss different feature selection techniques for machine learning. But
before that, let's first understand some basics of feature selection.
So, we can define feature selection as "the process of automatically or manually selecting the subset of the most appropriate and relevant features to be used in model building." Feature selection is performed by either including the important features or excluding the irrelevant features from the dataset, without changing them.
Selecting the best features helps the model perform well. For example, suppose we want to create a model that automatically decides which car should be crushed for spare parts, and to do this, we have a dataset containing each car's Model, Year, Owner's name, and Miles. In this dataset, the owner's name does not contribute to the model's performance, as it does not help decide whether the car should be crushed; so we can remove this column and select the remaining features (columns) for model building.
Feature selection also helps simplify the model so that it can be easily interpreted by researchers.
1. Wrapper Methods
In wrapper methods, feature selection is treated as a search problem: a model is trained on a candidate subset of features, features are added or removed on the basis of the model's output, and the model is then trained again on the new subset. A minimal sketch using forward selection follows.
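For illustration, here is a minimal sketch of a wrapper method using scikit-learn's SequentialFeatureSelector with forward selection; the breast cancer dataset, the logistic regression estimator, and the choice of five features are assumptions for the example, not part of the original article.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Forward selection: start with no features, repeatedly add the one that
# most improves cross-validated accuracy, then retrain on the new subset.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=5,
    direction="forward",
    cv=5,
)
selector.fit(X, y)
print("Selected feature indices:", selector.get_support(indices=True))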
Exhaustive Feature Selection- Exhaustive feature selection is one of the most thorough wrapper methods: it evaluates every possible feature subset by brute force and returns the best-performing one. Because the number of subsets grows exponentially with the number of features, it is practical only for small feature sets. A brute-force sketch follows.
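A brute-force sketch under the assumption of a small feature set; the iris dataset and the logistic regression scorer are illustrative choices. Every subset is scored by cross-validation and the best one is kept.

from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

best_score, best_subset = -np.inf, None
for k in range(1, n_features + 1):
    for subset in combinations(range(n_features), k):
        # Score this candidate subset with 5-fold cross-validation.
        score = cross_val_score(
            LogisticRegression(max_iter=1000), X[:, subset], y, cv=5
        ).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print(f"Best subset {best_subset} with CV accuracy {best_score:.3f}")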
2. Filter Methods
In filter methods, features are selected on the basis of statistical measures. These methods do not depend on the learning algorithm and choose the features as a pre-processing step.
Filter methods filter out irrelevant features and redundant columns from the model by ranking features with different metrics.
The advantage of filter methods is that they need little computational time and do not overfit the data.
Some common techniques of filter methods are:
Information Gain
Chi-square Test
Fisher's Score
Missing Value Ratio
Information Gain: Information gain measures the reduction in entropy obtained by splitting the dataset on a feature. It can be used as a feature selection technique by computing the information gain of each variable with respect to the target variable and keeping the highest-scoring ones, as sketched below.
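A short sketch using scikit-learn's mutual_info_classif, which estimates the mutual information (an information-gain-style score) between each feature and the target; the iris dataset is an illustrative assumption.

from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)

# Higher scores mean the feature removes more uncertainty about the target.
scores = mutual_info_classif(X, y, random_state=0)
for i, s in enumerate(scores):
    print(f"Feature {i}: information gain score {s:.3f}")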
Chi-square Test: The chi-square test is a technique to determine the relationship between categorical variables. The chi-square statistic is computed between each feature and the target variable, and the desired number of features with the highest chi-square values is selected, as sketched below.
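A minimal sketch with scikit-learn's SelectKBest and the chi2 score function; note that chi2 requires non-negative feature values, so the digits dataset (pixel intensities) is used as an illustrative example.

from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_digits(return_X_y=True)  # pixel intensities, all non-negative

# Keep the k features with the highest chi-square statistic.
selector = SelectKBest(score_func=chi2, k=10)
X_new = selector.fit_transform(X, y)
print("Reduced shape:", X_new.shape)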
Fisher's Score:
Fisher's score is one of the popular supervised techniques for feature selection. It ranks variables by Fisher's criterion, the ratio of between-class variance to within-class variance for each feature, in descending order; we can then select the variables with the largest Fisher's scores, as sketched below.
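A hand-rolled sketch assuming the standard definition of Fisher's score (class-size-weighted between-class variance divided by within-class variance, per feature); the dataset and the exact weighting are illustrative assumptions, not taken from the article.

import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

def fisher_score(X, y):
    overall_mean = X.mean(axis=0)
    numerator = np.zeros(X.shape[1])
    denominator = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        # Between-class spread (weighted by class size) over within-class variance.
        numerator += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        denominator += len(Xc) * Xc.var(axis=0)
    return numerator / denominator

scores = fisher_score(X, y)
# Rank features by Fisher's score, largest (most discriminative) first.
print("Ranking (best first):", np.argsort(scores)[::-1])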
Missing Value Ratio: The missing value ratio can be used to evaluate features against a threshold. It is computed as the number of missing values in a column divided by the total number of observations. Variables whose ratio exceeds the threshold can be dropped, as sketched below.
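A small sketch using pandas; the DataFrame contents and the 0.4 threshold are illustrative assumptions.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "model": ["A", "B", "C", "D", "E"],
    "year": [2015, np.nan, 2018, np.nan, np.nan],   # 60% missing
    "miles": [42000, 30500, np.nan, 81000, 56000],  # 20% missing
})

# Ratio = missing values in a column / total number of observations.
missing_ratio = df.isnull().mean()
threshold = 0.4
keep = missing_ratio[missing_ratio <= threshold].index
print(df[keep])  # "year" exceeds the threshold and is dropped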
3. Embedded Methods
Embedded methods combine the advantages of both filter and wrapper methods by taking feature interactions into account while keeping the computational cost low. They are fast like filter methods, but more accurate.
These methods are also iterative: each training iteration is evaluated, and the features that contribute most to the training in that iteration are identified. Common embedded techniques include regularization (for example, LASSO's L1 penalty, which shrinks the coefficients of irrelevant features to zero during training) and tree-based importance measures (for example, random forest feature importance), as sketched below.
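A minimal sketch of both approaches; the diabetes regression dataset and the hyperparameters (alpha, n_estimators) are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

# LASSO: features whose coefficients survive the L1 penalty are selected.
lasso = Lasso(alpha=0.5).fit(X, y)
print("Kept by LASSO:", np.flatnonzero(lasso.coef_))

# Random forest: importances are accumulated while the model is trained.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print("Importances:", np.round(forest.feature_importances_, 3))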
How to Choose a Feature Selection Method?
To choose the right statistical measure, we need to first identify the types of the input and output variables. In machine learning, variables are of mainly two types:
Numerical Variables: variables that take continuous or integer values, such as age or price.
Categorical Variables: variables that take a limited set of discrete values, such as boolean, ordinal, or nominal labels.
Below are some univariate statistical measures, which can be used for filter-based feature
selection:
1. Numerical Input, Numerical Output
Numerical input variables with a numerical output define predictive regression modelling. The most common measure for this case is the correlation coefficient, as sketched below.
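A quick sketch of correlation-based filtering using pandas; the diabetes dataset and its "target" column name are illustrative assumptions.

from sklearn.datasets import load_diabetes

data = load_diabetes(as_frame=True)
df = data.frame  # features plus the numerical "target" column

# Absolute Pearson correlation of each feature with the target, best first.
corr = df.corr()["target"].drop("target").abs().sort_values(ascending=False)
print(corr.head())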
2. Numerical Input, Categorical Output
Numerical input with a categorical output is the case for classification predictive modelling problems. Correlation-based techniques are used here as well, but adapted to a categorical output (for example, the ANOVA F-test, sketched below).
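A short sketch using scikit-learn's f_classif (ANOVA F-test) with SelectKBest; the iris dataset and k=2 are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score numerical features against the categorical target, keep the top 2.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print("ANOVA F-scores:", selector.scores_.round(1))
print("Selected:", selector.get_support(indices=True))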
3. Categorical Input, Numerical Output
This is the case of regression predictive modelling with categorical input. It is a less common regression scenario. We can use the same measures as in the case above, but with the roles of the input and output variables reversed.
4. Categorical Input, Categorical Output
The most commonly used technique for this case is the Chi-Squared test, as sketched below. We can also use information gain (mutual information) in this case.
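A tiny sketch of the chi-square independence test on a contingency table using scipy.stats.chi2_contingency; the table values are made up for illustration.

from scipy.stats import chi2_contingency

# Contingency table: rows = feature categories, columns = target classes.
table = [[30, 10],
         [15, 45]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p-value={p:.4f}")  # small p suggests dependence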
We can summarise the above cases with appropriate measures in the below table:
Input Variable    Output Variable    Feature Selection Measure
Numerical         Numerical          Pearson's correlation coefficient (linear correlation)
Numerical         Categorical        ANOVA correlation coefficient (linear)
Categorical       Numerical          Kendall's rank coefficient (linear)
Categorical       Categorical        Chi-Squared test (contingency tables); Mutual information
Conclusion
Feature selection is a complicated and vast field of machine learning, and many studies have already been carried out to discover the best methods. There is no fixed rule for choosing the best feature selection method; the choice depends on the machine learning engineer, who can combine and adapt these approaches to find the best method for a specific problem. One should try a variety of model fits on different subsets of features selected through different statistical measures.