The 5 Feature Selection Algorithms Every Data Scientist Should Know
Rahul Agarwal
Jul 27, 2019 · 7 min read
2. Occam’s Razor: We want our models to be simple and explainable. We lose explainability when we have a lot of features.
So what do we do?
We select only useful features.
1. Pearson Correlation
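Pearson correlation is a filter method: we check the absolute correlation between each numerical feature and the target and keep the top-n features. A minimal sketch, assuming a pandas DataFrame X and a numeric target y (the helper name pearson_selector and the num_feats parameter are my own):

```python
import numpy as np
import pandas as pd

def pearson_selector(X, y, num_feats):
    """Keep the num_feats features with the highest absolute
    Pearson correlation to the target."""
    cor_list = [np.corrcoef(X[col], y)[0, 1] for col in X.columns]
    cor_list = np.nan_to_num(cor_list)  # guard against constant columns
    # Indices of the top-k correlations by absolute value
    top_idx = np.argsort(np.abs(cor_list))[-num_feats:]
    return X.columns[top_idx].tolist()
```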
2. Chi-Squared
This is another filter-based method.
This is simple: we multiply the row sum and the column sum for each cell and divide by the total number of observations. So the expected value for the Good and NotRightforward bucket = 25 (row sum) × 60 (column sum) / 100 (total observations) = 15.
Then we could just use the below formula to sum over all 4 cells:

χ² = Σ (Observed − Expected)² / Expected
I won’t show it here, but the chi-squared statistic also works, in a somewhat hand-wavy way, with non-negative numerical and categorical features.
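In scikit-learn, this test is exposed through the chi2 scoring function, which plugs into SelectKBest. A minimal sketch, assuming non-negative features in a DataFrame X and a categorical target y (k=5 is an arbitrary choice of mine):

```python
from sklearn.feature_selection import SelectKBest, chi2

# Rank features by the chi-squared statistic between each
# non-negative feature and the class label; keep the top 5.
selector = SelectKBest(score_func=chi2, k=5)
X_new = selector.fit_transform(X, y)
selected = X.columns[selector.get_support()]  # assumes X is a DataFrame
```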
3. Recursive Feature Elimination
From the sklearn documentation:
“The goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and the importance of each feature is obtained either through a coef_ attribute or through a feature_importances_ attribute. Then, the least important features are pruned from the current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached.”
As you would have guessed, we could use any estimator with this method. In this case, we use LogisticRegression, and RFE observes the coef_ attribute of the LogisticRegression object.
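A minimal sketch of that setup (the feature count and max_iter value are arbitrary choices of mine, and X, y are assumed to be your training data):

```python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Recursively drop the weakest features (smallest |coef_|)
# until only 5 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=5)
rfe.fit(X, y)
print(rfe.support_)   # boolean mask of the selected features
print(rfe.ranking_)   # 1 = selected; higher = eliminated earlier
```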
4. Lasso: SelectFromModel
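The L1 penalty shrinks the coefficients of uninformative features exactly to zero, so wrapping an L1-regularized model in SelectFromModel keeps only the features whose coefficients survive. A minimal sketch with a Lasso regressor (the alpha value is my own choice and would normally be tuned; for classification the same pattern works with an L1-penalized LogisticRegression):

```python
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# L1 regularization zeroes out weak coefficients; SelectFromModel
# keeps the features with non-zero coefficients.
selector = SelectFromModel(Lasso(alpha=0.01))
selector.fit(X, y)
selected = X.columns[selector.get_support()]  # assumes X is a DataFrame
```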
5. Tree-based: SelectFromModel
This is an embedded method. As said before, embedded methods use algorithms that have feature selection built in.
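A minimal sketch using a random forest’s feature_importances_ as the selection signal (any tree ensemble with that attribute would work; the median threshold and hyperparameters are my own choices):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Drop features whose impurity-based importance falls below
# the median importance across all features.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median",
)
selector.fit(X, y)
selected = X.columns[selector.get_support()]  # assumes X is a DataFrame
```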
Bonus
Why use one when we can have all?
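One way to combine all five methods is a simple vote: run each selector, then rank features by how many methods picked them. A hedged sketch, assuming each boolean mask below was produced by the corresponding method above (the mask names are mine):

```python
import pandas as pd

# Each mask is a boolean array over X.columns from one method.
votes = pd.DataFrame({
    "Feature": X.columns,
    "Pearson": pearson_mask,
    "Chi-2": chi2_mask,
    "RFE": rfe_mask,
    "Lasso": lasso_mask,
    "Random Forest": rf_mask,
})
# Count how many methods selected each feature.
votes["Total"] = votes.iloc[:, 1:].sum(axis=1)
print(votes.sort_values("Total", ascending=False))
```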
Conclusion
Feature engineering and feature selection are critical parts of any machine learning pipeline.