
Feature Selection in Machine Learning

“Feature selection is a way of selecting the subset of the most relevant features from the
original feature set by removing the redundant, irrelevant, or noisy features.”
While developing a machine learning model, typically only a few variables in the dataset are
useful for building the model; the rest of the features are either redundant or irrelevant. If we
feed the dataset with all these redundant and irrelevant features to the model, it may negatively
impact the overall performance and accuracy of the model. Hence, it is very important to
identify and select the most appropriate features from the data and remove the irrelevant
or less important ones, which is done with the help of feature selection in machine
learning.

Feature selection is one of the important concepts of machine learning, and it strongly
affects the performance of the model. As machine learning works on the principle of
"Garbage In, Garbage Out", we always need to feed the most appropriate and relevant
dataset to the model in order to get a better result.

The main topics to be discussed:

o What is Feature Selection?
o Need for Feature Selection
o Feature Selection Methods/Techniques

What is Feature Selection?

A feature is an attribute that has an impact on a problem or is useful for the problem, and
choosing the important features for the model is known as feature selection. Every machine
learning process depends on feature engineering, which mainly comprises two processes:
Feature Selection and Feature Extraction. Although feature selection and feature extraction
may have the same objective, they are quite different from each other. The main difference
between them is that feature selection selects a subset of the original feature set, whereas
feature extraction creates new features. Feature selection reduces the number of input
variables for the model by using only the relevant data, in order to reduce overfitting in
the model.

So, we can define feature selection as: "It is a process of automatically or manually
selecting the subset of the most appropriate and relevant features to be used in model
building." Feature selection is performed either by including the important features or by
excluding the irrelevant features of the dataset, without changing the features themselves.
Need for Feature Selection

Before implementing any technique, it is important to understand why it is needed, and the
same holds for feature selection. As we know, in machine learning it is necessary to provide
a pre-processed, good-quality input dataset in order to get better outcomes. We collect a
huge amount of data to train our model and help it learn better. Generally, this dataset
contains noisy data, irrelevant data, and only some useful data. Moreover, the huge amount
of data also slows down the training process of the model, and with noisy and irrelevant
data the model may not predict or perform well. So, it is necessary to remove such noise
and less important data from the dataset, and feature selection techniques are used to do
this.

Selecting the best features helps the model perform well. For example, suppose we want
to create a model that automatically decides which car should be crushed for spare parts,
and to do this we have a dataset. This dataset contains the model of the car, the year, the
owner's name, and the miles driven. In this dataset, the name of the owner does not
contribute to the model's performance, as it does not help decide whether the car should be
crushed or not, so we can remove this column and select the rest of the features (columns)
for model building.
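A minimal sketch of this idea in Python, assuming pandas and purely hypothetical column names and values (not part of this material):

    import pandas as pd

    # Hypothetical car dataset; the column names below are illustrative only
    cars = pd.DataFrame({
        "model": ["Sedan A", "Hatch B", "SUV C"],
        "year": [2009, 2015, 2012],
        "owner_name": ["Alice", "Bob", "Carol"],  # carries no signal for the target
        "miles": [180000, 60000, 120000],
    })

    # Feature selection here is simply dropping the irrelevant column
    X = cars.drop(columns=["owner_name"])
    print(list(X.columns))  # ['model', 'year', 'miles']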

Below are some benefits of using feature selection in machine learning:

o It helps in avoiding the curse of dimensionality.
o It helps to simplify the model so that it can be easily interpreted by researchers.
o It reduces the training time.
o It reduces overfitting and hence enhances generalization.

Feature Selection Techniques

There are mainly two types of feature selection techniques, which are:

o Supervised Feature Selection techniques
  Supervised feature selection techniques consider the target variable and can be
  used for labelled datasets.
o Unsupervised Feature Selection techniques
  Unsupervised feature selection techniques ignore the target variable and can be
  used for unlabelled datasets.
1. Wrapper Methods

In the wrapper methodology, feature selection is treated as a search problem, in which
different combinations of features are made, evaluated, and compared with other
combinations. A learning algorithm is trained iteratively on each candidate subset of
features. Some common methods used are: Forward Selection, Backward Elimination,
Exhaustive Feature Selection, and Recursive Feature Elimination.
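As an illustrative sketch of one wrapper technique, Recursive Feature Elimination, assuming scikit-learn and a synthetic dataset rather than any particular data from this material:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression

    # Synthetic data: 10 features, of which only 3 are informative
    X, y = make_classification(n_samples=200, n_features=10,
                               n_informative=3, random_state=0)

    # Wrapper method: repeatedly fit the estimator and eliminate the weakest feature
    rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
    rfe.fit(X, y)

    print(rfe.support_)   # boolean mask of the selected features
    print(rfe.ranking_)   # 1 = kept; larger numbers were eliminated earlier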

2. Filter Methods

In filter methods, features are selected on the basis of statistical measures. These methods
do not depend on the learning algorithm and choose the features as a pre-processing step.

The filter method filters out irrelevant features and redundant columns from the model
by ranking them with different metrics.

The advantage of filter methods is that they need little computational time and do not
overfit the data.

Some common methods used are: Information Gain, Chi-square Test, Fisher's Score, and
Missing Value Ratio.
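A minimal sketch of a filter method, assuming scikit-learn and its built-in Iris dataset (the chi-square test requires non-negative feature values):

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, chi2

    X, y = load_iris(return_X_y=True)

    # Filter method: score each feature with a statistical test, independent of any model
    selector = SelectKBest(score_func=chi2, k=2)
    X_reduced = selector.fit_transform(X, y)

    print(selector.scores_)  # chi-square score per feature
    print(X_reduced.shape)   # (150, 2): only the two highest-scoring features remain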

3. Embedded Methods

Embedded methods combine the advantages of both filter and wrapper methods by
considering the interaction of features while keeping the computational cost low. They are
fast, like filter methods, but more accurate. These methods are also iterative: the model
evaluates each training iteration and finds the features that contribute most to that
iteration. Some techniques of embedded methods are: LASSO, Random Forest feature
importance, and regularization methods.
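A minimal sketch of an embedded method using LASSO, again assuming scikit-learn and synthetic data; the L1 penalty drives the coefficients of irrelevant features to exactly zero, so the surviving coefficients act as the selected features:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso

    # Synthetic regression data with only a few informative features
    X, y = make_regression(n_samples=200, n_features=10,
                           n_informative=3, noise=5.0, random_state=0)

    # Embedded method: selection happens during training via L1 regularization
    lasso = Lasso(alpha=1.0)
    lasso.fit(X, y)

    selected = np.flatnonzero(lasso.coef_)  # indices of features with non-zero weight
    print(selected)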

Overfitting and Underfitting in Machine Learning:

Overfitting and underfitting are the two main problems that occur in machine learning and
degrade the performance of machine learning models.

The main goal of every machine learning model is to generalize well.
Here, generalization is the ability of an ML model to produce suitable output for new,
unseen inputs. It means that after being trained on the dataset, the model can produce
reliable and accurate output on new data. Hence, underfitting and overfitting are the two
conditions that need to be checked to judge whether the model is generalizing well or not.

Key terms:

o Bias: Bias is a prediction error introduced in the model by oversimplifying the machine
  learning algorithm; it is the difference between the predicted values and the actual
  values.
o Variance: Variance occurs when the machine learning model performs well on the
  training dataset but does not perform well on the test dataset.
o Noise: Noise is unnecessary and irrelevant data that reduces the performance of the
  model.

Overfitting: Overfitting occurs when our machine learning model tries to cover all the data
points, or more data points than required, in the given dataset. Because of this, the model
starts capturing the noise and inaccurate values present in the dataset, and all these
factors reduce the efficiency and accuracy of the model. An overfitted model has low
bias and high variance.

The chance of overfitting increases the longer we train our model: the more we train, the
more likely the model is to become overfitted.

Overfitting is the main problem that occurs in supervised learning.

Example: The concept of overfitting can be understood from the graph of a linear regression
output in which the fitted curve tries to cover all the data points present in the scatter
plot. It may look efficient, but in reality it is not: the goal of the regression model is to
find the best-fit line, and if the model instead chases every point it will generate
prediction errors on new data.

Underfitting: Underfitting occurs when our machine learning model is not able to capture
the underlying trend of the data. To avoid overfitting, the feeding of training data can be
stopped at an early stage, due to which the model may not learn enough from the training
data. As a result, it may fail to find the best fit for the dominant trend in the data.

In the case of underfitting, the model is not able to learn enough from the training data;
hence its accuracy is reduced and it produces unreliable predictions.

An underfitted model has high bias and low variance.

Example: Underfitting can be understood from the output of a linear regression model in
which the fitted line is unable to capture the data points present in the plot.
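A minimal sketch, assuming scikit-learn and synthetic sine-wave data, contrasting an underfit (degree 1), a reasonable (degree 4), and an overfit (degree 15) polynomial regression; the overfit model scores well on training data but poorly on test data, while the underfit model scores poorly on both:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    rng = np.random.RandomState(0)
    X = np.sort(rng.uniform(0, 1, 60))[:, None]
    y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 60)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for degree in (1, 4, 15):  # underfit, balanced, overfit
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X_tr, y_tr)
        train_r2 = r2_score(y_tr, model.predict(X_tr))
        test_r2 = r2_score(y_te, model.predict(X_te))
        print(f"degree={degree}: train R^2={train_r2:.2f}, test R^2={test_r2:.2f}")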

What Is Mutual Information?

Mutual information is calculated between two variables and measures the reduction in
uncertainty about one variable given a known value of the other. In other words, it
measures the amount of information one random variable conveys about another.

The mutual information between two random variables X and Y can be stated formally as
follows:

 I(X ; Y) = H(X) − H(X | Y)

where I(X ; Y) is the mutual information for X and Y, H(X) is the entropy of X, and H(X | Y)
is the conditional entropy of X given Y.

Mutual information is a measure of dependence or "mutual dependence" between two
random variables. As such, the measure is symmetric, meaning that I(X ; Y) = I(Y ; X).

It measures the average reduction in uncertainty about X that results from learning the
value of Y, or, equivalently, the average amount of information that X conveys about Y.
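A minimal sketch of estimating mutual information between each feature and a class target, assuming scikit-learn and its built-in Iris dataset:

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import mutual_info_classif

    data = load_iris()
    X, y = data.data, data.target

    # Estimate I(feature ; target) per feature; larger values mean more shared information
    mi = mutual_info_classif(X, y, random_state=0)
    for name, score in zip(data.feature_names, mi):
        print(f"{name}: {score:.3f}")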

*Source: Internet and Data Mining: Concepts and Techniques by Han and Kamber.
