Feature Selection
Reference:
https://machinelearningmastery.com/feature-selection-machine-learning-python/
https://www.datacamp.com/tutorial/feature-selection-python
https://towardsdatascience.com/feature-selection-methods-and-how-to-choose-them-1e7469100e7e
https://www.youtube.com/watch?v=za1aA9U4kbI
Feature selection is the process of automatically selecting the features that are most important to the problem. It is also known as variable selection or attribute selection.
Unsupervised methods
Unsupervised feature selection methods do not require any labels; they never look at the target variable. Typical techniques include (a sketch follows the list):
Discarding almost-constant (near zero variance) variables.
Dropping incomplete features (too many missing values).
Dropping highly multicollinear variables.
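Below is a minimal sketch of these three unsupervised steps using pandas and scikit-learn; the toy DataFrame and the thresholds (50% missing, variance 0.1, correlation 0.95) are illustrative assumptions, not fixed rules.

```python
# Unsupervised feature selection sketch: no target variable is used.
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

df = pd.DataFrame({
    "almost_constant": [3, 3, 3, 3, 3],
    "mostly_missing":  [None, None, None, 4.0, 5.0],
    "x1":              [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2":              [2.1, 4.0, 6.2, 7.9, 10.1],  # ~collinear with x1
})

# 1) Drop incomplete features (more than ~50% missing values).
df = df.loc[:, df.isna().mean() < 0.5]

# 2) Discard almost-constant features via a variance threshold.
selector = VarianceThreshold(threshold=0.1)
selector.fit(df.fillna(df.mean()))
df = df.loc[:, selector.get_support()]

# 3) Drop one of each pair of highly correlated (multicollinear) features.
corr = df.corr().abs()
to_drop = [c for i, c in enumerate(corr.columns)
           if any(corr.iloc[:i][c] > 0.95)]
df = df.drop(columns=to_drop)

print(df.columns.tolist())
```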
Supervised methods
Wrapper methods
![[Feature Selection-4.png]]
Wrapper methods use a model to evaluate the performance of different feature subsets, and the best-performing subset is selected. Because the subset is tuned to one model, there is a risk of overfitting, so it is recommended to validate the selected subset with another model. Another disadvantage is the large computational cost. Popular wrapper methods are:
Backward Selection
In backward selection, we start with the full model containing all features. In each iteration, the feature that contributes least to the performance is removed. The process is repeated until the desired number of features remains.
Forward Selection
Forward selection starts with a null (empty) model, and features are added one by one, each time choosing the feature that most improves the performance of the model. Both directions are sketched below.
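A minimal sketch of both directions using scikit-learn's SequentialFeatureSelector; the breast-cancer dataset, the logistic-regression estimator and n_features_to_select=5 are assumptions for illustration.

```python
# Forward and backward selection with scikit-learn's SequentialFeatureSelector.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scaled for quick convergence; in practice put scaling inside a Pipeline
model = LogisticRegression(max_iter=5000)

# Forward selection: start from an empty set and add features one by one.
forward = SequentialFeatureSelector(model, n_features_to_select=5,
                                    direction="forward", cv=5)
forward.fit(X, y)

# Backward selection: start from all features and remove one per iteration.
backward = SequentialFeatureSelector(model, n_features_to_select=5,
                                     direction="backward", cv=5)
backward.fit(X, y)

print("forward: ", forward.get_support(indices=True))
print("backward:", backward.get_support(indices=True))
```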
Recursive Feature Elimination
Recursive Feature Elimination (RFE) is similar to backward selection; the difference is how the features to discard are chosen. RFE uses the model's own importance scores to decide which features to drop: the coefficients (weights) in linear models, the impurity decrease in tree-based models, etc.
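A minimal RFE sketch with scikit-learn follows; the random-forest estimator and the number of features kept are assumptions.

```python
# Recursive Feature Elimination: repeatedly fit the model and drop the least
# important feature(s) according to the estimator's own importances
# (impurity decrease for the random forest used here).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = load_breast_cancer(return_X_y=True)

rfe = RFE(estimator=RandomForestClassifier(n_estimators=100, random_state=0),
          n_features_to_select=5, step=1)
rfe.fit(X, y)

print(rfe.get_support(indices=True))
```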
Filter Methods
![[Feature Selection-3.png]]
In filter methods, the statistical relationship of each feature with the target variable is analysed using measures like correlation or mutual information. This is simpler, faster and more model-agnostic than wrapper methods, and less prone to overfitting. The major drawback is that each feature is scored in isolation, so a feature that is a weak predictor on its own is discarded even if it would be useful in combination with other features.
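A minimal filter-method sketch, assuming the breast-cancer dataset, mutual information as the scoring measure and k=10 as the number of features to keep.

```python
# Filter method: score each feature against the target with mutual information
# and keep the top k, without fitting any predictive model.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)           # (n_samples, 10)
print(selector.scores_.round(3))  # per-feature mutual information with y
```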
Embedded Methods
The idea of embedded methods is to combine the benefits of filter methods and wrapper methods: the selection happens as part of model training, so it is faster than wrapper methods while still searching for a good subset. There are not many dedicated embedded algorithms available. One example is LASSO regression, where the L1 penalty gradually shrinks feature weights towards zero; features whose weights reach zero are removed, while the features with non-zero weights remain. LASSO-based selection is used in domains such as computer vision.
Implementation
![[Feature Selection-6.png]]
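As one possible implementation (not necessarily the one shown in the screenshot above), here is a hedged end-to-end sketch that combines a filter step with a classifier inside a scikit-learn Pipeline; the dataset, k=10 and the logistic-regression estimator are assumptions.

```python
# End-to-end sketch: feature selection inside a Pipeline so that the selection
# is refit on each cross-validation fold and does not leak information.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=5)
print("mean CV accuracy with 10 selected features:", scores.mean().round(3))
```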