Feature Selection
A feature is an attribute that has an impact on a problem or is useful for solving it, and
choosing the important features for the model is known as feature selection.
Every machine learning workflow depends on feature engineering, which mainly consists of two
processes: Feature Selection and Feature Extraction.
Although feature selection and feature extraction may share the same objective, the two are
quite different from each other.
The main difference between them is that feature selection selects a subset
of the original feature set, whereas feature extraction creates new features.
Feature selection is a way of reducing the number of input variables for the model by using only
relevant data, in order to reduce overfitting in the model.
"It is a process of automatically or manually selecting the subset of the most appropriate and
relevant features to be used in model building." Feature selection is performed by either
including the important features or excluding the irrelevant features from the dataset without
changing them.
Below are some benefits of using feature selection in machine learning:
o It helps in avoiding the curse of dimensionality.
o It helps in the simplification of the model so that it can be easily interpreted by
the researchers.
o It reduces the training time.
o It reduces overfitting and hence enhances generalization.
Feature Selection Techniques
There are mainly two types of Feature Selection techniques, which are:
Supervised Feature Selection technique
Supervised feature selection techniques consider the target variable and can be used with
labelled datasets.
Unsupervised Feature Selection technique
Unsupervised feature selection techniques ignore the target variable and can be used with
unlabelled datasets.
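As an illustrative sketch (not code from the article), the snippet below contrasts a supervised selector, which needs the target y, with an unsupervised one, which looks only at X. SelectKBest and VarianceThreshold are scikit-learn classes; the dataset and parameter values are arbitrary choices for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, VarianceThreshold, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# Supervised: scores each feature against the target variable y
supervised = SelectKBest(score_func=mutual_info_classif, k=10)
X_supervised = supervised.fit_transform(X, y)

# Unsupervised: ignores y and simply drops low-variance features
unsupervised = VarianceThreshold(threshold=0.05)
X_unsupervised = unsupervised.fit_transform(X)

print(X.shape, X_supervised.shape, X_unsupervised.shape)
```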
Under supervised feature selection, there are mainly three techniques: Wrapper, Filter, and
Embedded methods.
1. Wrapper Methods
In the wrapper methodology, the selection of features is treated as a search problem,
in which different combinations of features are made, evaluated, and compared with other combinations.
The algorithm is trained iteratively using a subset of features.
On the basis of the model's output, features are added or removed, and the model is trained
again with the new feature set.
Some techniques of wrapper methods are:
o Forward selection - Forward selection is an iterative process which begins with an
empty set of features. In each iteration, it adds one more feature and evaluates
the performance to check whether the addition improves it.
o The process continues until adding a new variable/feature no longer improves
the performance of the model (see the forward/backward selection sketch after this list).
o Backward elimination - Backward elimination is also an iterative approach, but it is
the opposite of forward selection. This technique begins with all
the features and removes the least significant feature in each iteration.
o This elimination process continues until removing further features no longer improves the
performance of the model.
o Exhaustive Feature Selection - Exhaustive feature selection is one of the most thorough feature
selection methods; it evaluates every candidate feature subset by brute force.
o This means the method tries every possible combination of features and returns
the best-performing feature set (a brute-force sketch follows this list).
o Recursive Feature Elimination -
Recursive feature elimination is a recursive greedy optimization approach, in which
features are selected by recursively considering smaller and smaller subsets of features.
o An estimator is trained on each set of features, and the importance of each
feature is determined using the coef_ attribute or the
feature_importances_ attribute (see the RFE sketch after this list).
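A minimal sketch of forward selection and backward elimination, assuming a recent version of scikit-learn: SequentialFeatureSelector switches between the two strategies via its direction parameter. The estimator, dataset, and number of features kept are arbitrary choices for illustration, not part of the original article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
estimator = LogisticRegression(max_iter=5000)

# Forward selection: start empty, add the most helpful feature each round
forward = SequentialFeatureSelector(estimator, n_features_to_select=5,
                                    direction="forward", cv=5)
forward.fit(X, y)

# Backward elimination: start with all features, drop the least useful each round
backward = SequentialFeatureSelector(estimator, n_features_to_select=5,
                                     direction="backward", cv=5)
backward.fit(X, y)

print("Forward keeps :", forward.get_support(indices=True))
print("Backward keeps:", backward.get_support(indices=True))
```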
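Exhaustive selection can be sketched directly with itertools and cross-validation; dedicated helpers exist elsewhere (for example in the mlxtend library), but the brute-force loop below makes the idea explicit. The synthetic dataset and the cap of three features per subset are assumptions to keep the run time small.

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

best_score, best_subset = -1.0, None
# Brute force: score every feature subset of size 1..3 with cross-validation
for k in range(1, 4):
    for subset in combinations(range(X.shape[1]), k):
        score = cross_val_score(LogisticRegression(max_iter=1000),
                                X[:, list(subset)], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print("Best subset:", best_subset, "score:", round(best_score, 3))
```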
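For recursive feature elimination, scikit-learn ships an RFE class that repeatedly fits an estimator, ranks features by coef_ or feature_importances_, and discards the weakest; the estimator and the target number of features below are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Recursively drop the weakest feature (by |coef_|) until 5 remain
rfe = RFE(estimator=LogisticRegression(max_iter=5000), n_features_to_select=5, step=1)
rfe.fit(X, y)

print("Selected features:", rfe.get_support(indices=True))
print("Feature ranking  :", rfe.ranking_)
```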
2. Filter Methods
In filter methods, features are selected on the basis of statistical measures. This method does
not depend on the learning algorithm and chooses the features as a pre-processing step.
The filter method removes irrelevant features and redundant columns from the model by
ranking the features with different metrics.
The advantage of filter methods is that they need little computational time and do not
overfit the data.
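As a small hedged example (not from the article), one common filter metric is the Pearson correlation between each feature and the target; the dataset and the choice to keep the ten strongest correlations are arbitrary.

```python
from sklearn.datasets import load_breast_cancer

# Load the features and target together as one DataFrame
data = load_breast_cancer(as_frame=True)
df = data.frame  # feature columns plus a 'target' column

# Rank features by absolute Pearson correlation with the target
correlations = df.corr()["target"].drop("target").abs().sort_values(ascending=False)
top_features = correlations.head(10).index.tolist()

print(top_features)
```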
3. Embedded Methods
Embedded methods combine the advantages of both filter and wrapper methods by
considering the interaction of features along with low computational cost. They are fast
like filter methods, but more accurate.
These methods are also iterative: each iteration of model training is evaluated, and the most
important features, those contributing the most in that iteration, are identified. Some
techniques of embedded methods are:
o Regularization - Regularization adds a penalty term to the parameters of the
machine learning model to avoid overfitting. When this penalty term is
applied to the coefficients, it can shrink some coefficients to exactly zero.
o The features with zero coefficients can then be removed from the dataset. Regularization
techniques of this kind include L1 regularization (Lasso) and Elastic Net
(a combination of L1 and L2 regularization); see the Lasso sketch after this list.
o Random Forest Importance - Tree-based methods provide feature
importances, which give a natural way of selecting features. Here, feature
importance specifies which features have more importance in model building or a
greater impact on the target variable.
o Random Forest is such a tree-based method: a bagging algorithm
that aggregates many decision trees. It automatically ranks features
by how much they decrease the impurity (e.g. Gini impurity) over all the trees.
Nodes are ranked by these impurity values, which makes it possible to prune the tree
below a chosen node; the remaining nodes correspond to a subset of the most important
features (see the feature-importance sketch after this list).
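A hedged sketch of regularization-based selection, assuming scikit-learn: a Lasso model combined with SelectFromModel keeps only the features whose coefficients remain non-zero. The diabetes dataset and the alpha value are arbitrary choices.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

# The L1 penalty drives some coefficients to exactly zero; keep the rest
lasso = Lasso(alpha=0.1)
selector = SelectFromModel(lasso)
X_selected = selector.fit_transform(X, y)

print("Original features:", X.shape[1])
print("Kept features    :", selector.get_support(indices=True))
```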
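And a minimal sketch of random-forest importance, again with an assumed dataset and hyperparameters: the fitted forest exposes feature_importances_, and SelectFromModel can keep, for example, the features whose importance is above the median.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)

# Fit a forest and inspect the impurity-based importance of each feature
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)
print("Importances:", forest.feature_importances_.round(3))

# Keep only the features whose importance is above the median importance
selector = SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0),
                           threshold="median")
X_selected = selector.fit_transform(X, y)
print("Shape after selection:", X_selected.shape)
```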
Feature selection is a very broad and complicated field of machine learning, and many studies
have already been carried out to discover the best methods. There is no fixed rule for the best
feature selection method; the choice depends on the machine learning engineer,
who can combine and adapt these approaches to find the best method for a specific problem.
One should try a variety of model fits on different subsets of features selected through
different statistical measures.