
The 5 Feature Selection Algorithms every Data Scientist should know
Bonus: What makes a good footballer great?

Rahul Agarwal
Jul 27, 2019 · 7 min read

Data Science is the study of algorithms.

I grapple with many algorithms on a day-to-day basis, so I thought of listing some of the most common and most used ones in this new DS Algorithm series.

How many times has it happened that you create a lot of features and then need to come up with ways to reduce their number?

We sometimes end up using correlation or tree-based methods to find out the important features.

Can we add some structure to it?


This post is about some of the most common feature
selection techniques one can use while working with
data.

Why Feature Selection?


Before we proceed, we need to answer this question: why don't we give all the features to the ML algorithm and let it decide which features are important?

So there are three reasons why we don’t:

1. Curse of dimensionality — Overfitting



If we have more columns in the data than the number of rows, we will be able to fit our training data perfectly, but that won't generalize to new samples. And thus we learn absolutely nothing.

2. Occam’s Razor:

We want our models to be simple and explainable. We lose explainability when we have a lot of features.

3. Garbage In Garbage out:

Most of the time, we will have many non-informative features, for example Name or ID variables. Poor-quality input will produce poor-quality output.
Also, a large number of features makes a model bulky, time-consuming, and harder to implement in production.

So what do we do? We select only useful features.

Fortunately, Scikit-learn has made it pretty easy for us to do feature selection. There are a lot of ways in which we can think of feature selection, but most feature selection methods can be divided into three major buckets:

• Filter-based: We specify some metric and filter features based on it. Examples of such metrics are correlation and chi-square.

• Wrapper-based: Wrapper methods consider the selection of a set of features as a search problem. Example: Recursive Feature Elimination.

• Embedded: Embedded methods use algorithms that have built-in feature selection methods. For instance, Lasso and RF have their own feature selection methods.

So, enough of theory; let us start with our five feature selection methods.

We will try to do this using a dataset to understand it better.

I am going to be using a football player dataset to find out what makes a good player great.
Don't worry if you don't understand football terminology. I will try to keep it to a minimum.

Here is the Kaggle Kernel with the code to try out yourself.

Some simple Data Preprocessing


We have done some basic preprocessing, such as removing nulls and one-hot encoding, and converted the problem to a classification problem using:
y = traindf['Overall']>=87

Here we use High Overall as a proxy for a great player.
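Here is a minimal sketch of that preprocessing; the file name and the categorical columns below are just illustrative assumptions, and the exact steps in the Kaggle Kernel may differ:

import pandas as pd

# Hypothetical file and column names, for illustration only
player_df = pd.read_csv("fifa19_players.csv")

# Keep the numerical attributes plus a few categorical ones, and drop rows with nulls
numcols = player_df.select_dtypes(include="number").columns.tolist()
catcols = ["Preferred Foot", "Position", "Body Type"]   # assumed to exist in the data
player_df = player_df[numcols + catcols].dropna()

# One-hot encode the categorical columns
traindf = pd.get_dummies(player_df, columns=catcols)

# A "great" player is approximated by a high Overall rating
y = traindf["Overall"] >= 87
X = traindf.drop(columns=["Overall"])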

Our dataset (X) has 223 columns.

1. Pearson Correlation

This is a filter-based method.


We check the absolute value of Pearson's correlation between the target and the numerical features in our dataset and keep the top n features according to this criterion.
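Here is a small sketch of how this can be done with pandas and NumPy; the helper name cor_selector and the choice of keeping 30 features are illustrative rather than part of the method:

import numpy as np

def cor_selector(X, y, num_feats):
    # Absolute Pearson correlation of each feature with the target
    cor_list = [abs(np.corrcoef(X[col], y)[0, 1]) for col in X.columns]
    cor_list = np.nan_to_num(cor_list)            # constant columns give NaN
    # Boolean mask for the num_feats features with the highest |correlation|
    top_idx = np.argsort(cor_list)[-num_feats:]
    cor_support = np.zeros(len(X.columns), dtype=bool)
    cor_support[top_idx] = True
    cor_feature = X.columns[cor_support].tolist()
    return cor_support, cor_feature

cor_support, cor_feature = cor_selector(X, y, num_feats=30)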

2. Chi-Squared
This is another filter-based method.

In this method, we calculate the chi-squared statistic between the target and each numerical variable and only select the variables with the highest chi-squared values.

Let us create a small example of how we calculate the chi-squared statistic for a sample.

So let’s say we have 75 Right-Forwards in our dataset and 25


Non-Right-Forwards. We observe that 40 of the Right-Forwards
are good, and 35 are not good. Does this signify that the player
being right forward affects the overall performance?

Observed and Expected Counts

We calculate the chi-squared value:

To do this, we first find out the values we would expect to fall in each bucket if the two categorical variables were indeed independent.

This is simple. For each cell, we multiply the row sum by the column sum and divide by the total number of observations.
So the expected value for the Good and Not-Right-Forward bucket = 25 (row sum) * 60 (column sum) / 100 (total observations) = 15.

Why is this expected? Since 25% of the players in the data are Not-Right-Forwards, we would expect 25% of the 60 good players we observed to fall in that cell. Thus 15 players.

Then we could just use the below formula to sum over all four cells:
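The formula referred to is the standard chi-squared statistic, chi² = Σ (Observed − Expected)² / Expected, summed over the cells. Filling in the remaining cells from the totals above (20 good and 5 not-good Non-Right-Forwards observed; expected counts of 45, 30, 15 and 10), we get chi² = (40−45)²/45 + (35−30)²/30 + (20−15)²/15 + (5−10)²/10 ≈ 5.56, which, with one degree of freedom, exceeds the usual 5% critical value of 3.84, so position and rating do not look independent here.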
I won’t show it here, but the chi-squared statistic also works in a
hand-wavy way with non-negative numerical and categorical
features.

We can get chi-squared features from our dataset as:
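For example, with scikit-learn's SelectKBest. Since chi2 requires non-negative inputs, the features are min-max scaled first; k=30 is an illustrative choice:

from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

# chi2 needs non-negative values, so scale the features to [0, 1] first
X_norm = MinMaxScaler().fit_transform(X)

chi_selector = SelectKBest(chi2, k=30)
chi_selector.fit(X_norm, y)

chi_support = chi_selector.get_support()       # boolean mask of selected features
chi_feature = X.columns[chi_support].tolist()  # names of the selected columns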

3. Recursive Feature Elimination


This is a wrapper-based method. As I said before, wrapper methods consider the selection of a set of features as a search problem.

From sklearn Documentation:

The goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and the importance of each feature is obtained either through a coef_ attribute or through a feature_importances_ attribute. Then, the least important features are pruned from current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached.

As you would have guessed, we could use any estimator with this method. In this case, we use LogisticRegression, and the RFE observes the coef_ attribute of the LogisticRegression object.
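Here is a sketch of that setup; n_features_to_select=30, step=10 and the scaling are illustrative choices rather than part of RFE itself:

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Scale the features so the logistic regression coefficients are comparable
X_norm = MinMaxScaler().fit_transform(X)

rfe_selector = RFE(estimator=LogisticRegression(max_iter=1000),
                   n_features_to_select=30, step=10)
rfe_selector.fit(X_norm, y)

rfe_support = rfe_selector.get_support()
rfe_feature = X.columns[rfe_support].tolist()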

4. Lasso: SelectFromModel

This is an Embedded method. As said before, Embedded methods use algorithms that have built-in feature selection methods.

For example, Lasso and RF have their own feature selection methods. The Lasso regularizer forces a lot of feature weights to be zero.

Here we use Lasso to select variables.
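Because our target here is binary, a sketch of this step would use an L1-penalized (Lasso-style) logistic regression inside scikit-learn's SelectFromModel; the solver and C value below are illustrative assumptions:

from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

X_norm = MinMaxScaler().fit_transform(X)

# The L1 penalty drives many coefficients to exactly zero;
# SelectFromModel keeps the features with non-zero weights
embedded_lr_selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=1.0))
embedded_lr_selector.fit(X_norm, y)

embedded_lr_support = embedded_lr_selector.get_support()
embedded_lr_feature = X.columns[embedded_lr_support].tolist()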

5. Tree-based: SelectFromModel
This is an Embedded method. As said before, Embedded
methods use algorithms that have built-in feature selection
methods.

We can also use RandomForest to select features based on feature importance.

We calculate feature importance using node impurities in each decision tree. In a Random Forest, the final feature importance is the average of the feature importances across all decision trees.

We could also have used LightGBM or an XGBoost object, as long as it has a feature_importances_ attribute.
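Here is a sketch with a RandomForestClassifier; n_estimators=100 and the "median" threshold are illustrative choices:

from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestClassifier

# Keep features whose impurity-based importance is above the median importance
embedded_rf_selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median")
embedded_rf_selector.fit(X, y)

embedded_rf_support = embedded_rf_selector.get_support()
embedded_rf_feature = X.columns[embedded_rf_support].tolist()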

Bonus
Why use one, when we can have all?

The answer is that sometimes it won't be possible with a lot of data and a time crunch.

But whenever possible, why not do this?


We check whether a feature is selected by all the methods, as in the sketch below. In this case, as we can see, Reactions and LongPassing are excellent attributes to have in a highly rated player. And as expected, Ballcontrol and Finishing occupy the top spots too.
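Here is one way to run that check, assuming the boolean support masks from the sketches above (cor_support, chi_support, rfe_support, embedded_lr_support, embedded_rf_support) are available:

import pandas as pd

# Count, for every feature, how many of the methods selected it
feature_selection_df = pd.DataFrame({
    "Feature": X.columns,
    "Pearson": cor_support,
    "Chi-2": chi_support,
    "RFE": rfe_support,
    "Lasso (L1)": embedded_lr_support,
    "Random Forest": embedded_rf_support,
})
feature_selection_df["Total"] = feature_selection_df.iloc[:, 1:].sum(axis=1)
feature_selection_df = feature_selection_df.sort_values(
    ["Total", "Feature"], ascending=False)
print(feature_selection_df.head(10))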

Conclusion
Feature engineering and feature selection are critical parts of
any machine learning pipeline.

We strive for accuracy in our models, and one cannot get to good accuracy without revisiting these pieces again and again.

In this article, I tried to explain some of the most used feature selection techniques as well as my workflow when it comes to feature selection.

I also tried to provide some intuition into these methods, but you should probably look into them further and try to incorporate them into your work.

Do read my post on feature engineering too if you are interested.
If you want to learn more about Data Science, I would like to call out this excellent course by Andrew Ng. This was the one that got me started. Do check it out.

Thanks for the read. I am going to be writing more beginner-friendly posts in the future too. Follow me at Medium or subscribe to my blog to be informed about them. As always, I welcome feedback and constructive criticism and can be reached on Twitter @mlwhiz.
