
Feature Selection in Machine Learning

“Feature selection is a way of selecting the subset of the most relevant features from the
original feature set by removing the redundant, irrelevant, or noisy features.”
While developing a machine learning model, typically only a few variables in the dataset are
useful for building the model; the rest of the features are either redundant or irrelevant. If we
feed the dataset with all these redundant and irrelevant features to the model, it may negatively
impact the overall performance and accuracy of the model. Hence, it is very important to
identify and select the most appropriate features from the data and remove the irrelevant
or less important ones, which is done with the help of feature selection in machine
learning.

Feature selection is one of the important concepts of machine learning, and it strongly
affects the performance of the model. As machine learning works on the principle of
"Garbage In, Garbage Out", we always need to feed the most appropriate and relevant
dataset to the model in order to get a better result.

The main topics to be discussed:

o What is Feature Selection?
o Need for Feature Selection
o Feature Selection Methods/Techniques

What is Feature Selection?

A feature is an attribute that has an impact on a problem or is useful for the problem, and
choosing the important features for the model is known as feature selection. Every machine
learning process depends on feature engineering, which mainly comprises two processes:
Feature Selection and Feature Extraction. Although feature selection and feature extraction
may have the same objective, they are quite different from each other. The main difference
between them is that feature selection selects a subset of the original feature set, whereas
feature extraction creates new features. Feature selection reduces the number of input
variables for the model by using only the relevant data, in order to reduce overfitting in
the model.

So, we can define feature selection as: "It is a process of automatically or manually
selecting the subset of the most appropriate and relevant features to be used in model
building." Feature selection is performed either by including the important features or by
excluding the irrelevant features of the dataset, without changing the features themselves.
Need for Feature Selection

Before implementing any technique, it is important to understand why it is needed, and the
same holds for feature selection. As we know, in machine learning it is necessary to provide
a pre-processed, good-quality input dataset in order to get better outcomes. We collect a
huge amount of data to train our model and help it learn better. Generally, this dataset
contains noisy data, irrelevant data, and only some useful data. Moreover, the huge amount
of data also slows down the training process of the model, and with noisy and irrelevant
data the model may not predict or perform well. So, it is necessary to remove such noise
and less important data from the dataset, and feature selection techniques are used to do
this.

Selecting the best features helps the model perform well. For example, suppose we want
to create a model that automatically decides which car should be crushed for spare parts,
and to do this we have a dataset. This dataset contains the model of the car, the year, the
owner's name, and the miles driven. In this dataset, the name of the owner does not
contribute to the model's performance, as it does not help decide whether the car should be
crushed or not, so we can remove this column and select the rest of the features (columns)
for model building.
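A minimal sketch of this idea in Python, assuming pandas and purely hypothetical column names and values (not part of this material):

    import pandas as pd

    # Hypothetical car dataset; the column names below are illustrative only
    cars = pd.DataFrame({
        "model": ["Sedan A", "Hatch B", "SUV C"],
        "year": [2009, 2015, 2012],
        "owner_name": ["Alice", "Bob", "Carol"],  # carries no signal for the target
        "miles": [180000, 60000, 120000],
    })

    # Feature selection here is simply dropping the irrelevant column
    X = cars.drop(columns=["owner_name"])
    print(list(X.columns))  # ['model', 'year', 'miles']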

Below are some benefits of using feature selection in machine learning:

o It helps in avoiding the curse of dimensionality.
o It helps to simplify the model so that it can be easily interpreted by researchers.
o It reduces the training time.
o It reduces overfitting and hence enhances generalization.

Feature Selection Techniques

There are mainly two types of feature selection techniques, which are:

o Supervised Feature Selection techniques
  Supervised feature selection techniques consider the target variable and can be
  used for labelled datasets.
o Unsupervised Feature Selection techniques
  Unsupervised feature selection techniques ignore the target variable and can be
  used for unlabelled datasets.
1. Wrapper Methods

In the wrapper methodology, feature selection is treated as a search problem, in which
different combinations of features are made, evaluated, and compared with other
combinations. A learning algorithm is trained iteratively on each candidate subset of
features. Some common methods used are: Forward Selection, Backward Elimination,
Exhaustive Feature Selection, and Recursive Feature Elimination.
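As an illustrative sketch of one wrapper technique, Recursive Feature Elimination, assuming scikit-learn and a synthetic dataset rather than any particular data from this material:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression

    # Synthetic data: 10 features, of which only 3 are informative
    X, y = make_classification(n_samples=200, n_features=10,
                               n_informative=3, random_state=0)

    # Wrapper method: repeatedly fit the estimator and eliminate the weakest feature
    rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
    rfe.fit(X, y)

    print(rfe.support_)   # boolean mask of the selected features
    print(rfe.ranking_)   # 1 = kept; larger numbers were eliminated earlier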

2. Filter Methods

In filter methods, features are selected on the basis of statistical measures. These methods
do not depend on the learning algorithm and choose the features as a pre-processing step.

The filter method filters out irrelevant features and redundant columns from the model
by ranking them with different metrics.

The advantage of filter methods is that they need little computational time and do not
overfit the data.

Some common methods used are: Information Gain, Chi-square Test, Fisher's Score, and
Missing Value Ratio.
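A minimal sketch of a filter method, assuming scikit-learn and its built-in Iris dataset (the chi-square test requires non-negative feature values):

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, chi2

    X, y = load_iris(return_X_y=True)

    # Filter method: score each feature with a statistical test, independent of any model
    selector = SelectKBest(score_func=chi2, k=2)
    X_reduced = selector.fit_transform(X, y)

    print(selector.scores_)  # chi-square score per feature
    print(X_reduced.shape)   # (150, 2): only the two highest-scoring features remain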

3. Embedded Methods

Embedded methods combine the advantages of both filter and wrapper methods by
considering the interaction of features while keeping the computational cost low. They are
fast, like filter methods, but more accurate. These methods are also iterative: the model
evaluates each training iteration and finds the features that contribute most to that
iteration. Some techniques of embedded methods are: LASSO, Random Forest feature
importance, and regularization methods.
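A minimal sketch of an embedded method using LASSO, again assuming scikit-learn and synthetic data; the L1 penalty drives the coefficients of irrelevant features to exactly zero, so the surviving coefficients act as the selected features:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso

    # Synthetic regression data with only a few informative features
    X, y = make_regression(n_samples=200, n_features=10,
                           n_informative=3, noise=5.0, random_state=0)

    # Embedded method: selection happens during training via L1 regularization
    lasso = Lasso(alpha=1.0)
    lasso.fit(X, y)

    selected = np.flatnonzero(lasso.coef_)  # indices of features with non-zero weight
    print(selected)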

Overfitting and Underfitting in Machine Learning:

Overfitting and underfitting are the two main problems that occur in machine learning and
degrade the performance of machine learning models.

The main goal of every machine learning model is to generalize well.
Here, generalization is the ability of an ML model to produce suitable output for new,
unseen inputs. It means that after being trained on the dataset, the model can produce
reliable and accurate output on new data. Hence, underfitting and overfitting are the two
conditions that need to be checked to judge whether the model is generalizing well or not.

Key terms:

o Bias: Bias is a prediction error introduced in the model by oversimplifying the machine
  learning algorithm; it is the difference between the predicted values and the actual
  values.
o Variance: Variance occurs when the machine learning model performs well on the
  training dataset but does not perform well on the test dataset.
o Noise: Noise is unnecessary and irrelevant data that reduces the performance of the
  model.

Overfitting: Overfitting occurs when our machine learning model tries to cover all the data
points, or more data points than required, in the given dataset. Because of this, the model
starts capturing the noise and inaccurate values present in the dataset, and all these
factors reduce the efficiency and accuracy of the model. An overfitted model has low
bias and high variance.

The chance of overfitting increases the longer we train our model: the more we train, the
more likely the model is to become overfitted.

Overfitting is the main problem that occurs in supervised learning.

Example: The concept of overfitting can be understood from the graph of a linear regression
output in which the fitted curve tries to cover all the data points present in the scatter
plot. It may look efficient, but in reality it is not: the goal of the regression model is to
find the best-fit line, and if the model instead chases every point it will generate
prediction errors on new data.

Underfitting: Underfitting occurs when our machine learning model is not able to capture
the underlying trend of the data. To avoid overfitting, the feeding of training data can be
stopped at an early stage, due to which the model may not learn enough from the training
data. As a result, it may fail to find the best fit for the dominant trend in the data.

In the case of underfitting, the model is not able to learn enough from the training data;
hence its accuracy is reduced and it produces unreliable predictions.

An underfitted model has high bias and low variance.

Example: Underfitting can be understood from the output of a linear regression model in
which the fitted line is unable to capture the data points present in the plot.
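A minimal sketch, assuming scikit-learn and synthetic sine-wave data, contrasting an underfit (degree 1), a reasonable (degree 4), and an overfit (degree 15) polynomial regression; the overfit model scores well on training data but poorly on test data, while the underfit model scores poorly on both:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    rng = np.random.RandomState(0)
    X = np.sort(rng.uniform(0, 1, 60))[:, None]
    y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 60)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for degree in (1, 4, 15):  # underfit, balanced, overfit
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X_tr, y_tr)
        train_r2 = r2_score(y_tr, model.predict(X_tr))
        test_r2 = r2_score(y_te, model.predict(X_te))
        print(f"degree={degree}: train R^2={train_r2:.2f}, test R^2={test_r2:.2f}")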

What Is Mutual Information?

Mutual information is calculated between two variables and measures the reduction in
uncertainty about one variable given a known value of the other. In other words, it
measures the amount of information one random variable conveys about another.

The mutual information between two random variables X and Y can be stated formally as
follows:

 I(X ; Y) = H(X) − H(X | Y)

where I(X ; Y) is the mutual information for X and Y, H(X) is the entropy of X, and H(X | Y)
is the conditional entropy of X given Y.

Mutual information is a measure of dependence or "mutual dependence" between two
random variables. As such, the measure is symmetric, meaning that I(X ; Y) = I(Y ; X).

It measures the average reduction in uncertainty about X that results from learning the
value of Y, or, equivalently, the average amount of information that X conveys about Y.
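A minimal sketch of estimating mutual information between each feature and a class target, assuming scikit-learn and its built-in Iris dataset:

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import mutual_info_classif

    data = load_iris()
    X, y = data.data, data.target

    # Estimate I(feature ; target) per feature; larger values mean more shared information
    mi = mutual_info_classif(X, y, random_state=0)
    for name, score in zip(data.feature_names, mi):
        print(f"{name}: {score:.3f}")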

*Source: Internet and Data Mining: Concepts and Techniques by Han and Kamber.
