Parameters for Feature Selection
Last Updated: 08 Apr, 2025
Feature selection is the process of selecting a subset of relevant features that contribute the most to a model's predictions while discarding redundant, irrelevant or noisy features. This ensures that the model focuses on the important variables required for prediction.
In this article, we will discuss the various parameters used in feature selection, helping you understand how to choose the right features for your machine learning model.
Common Parameters Used in Feature Selection
Here are different parameters that can be used in feature selection.
1. Information Gain
Information Gain is the reduction in uncertainty about the target obtained by knowing the value of a feature. In classification tasks, features are scored by how much information they provide about the target variable; the same criterion drives splits in algorithms like Decision Trees.
IG(Y, X) = H(Y) - H(Y | X)
Where:
- H(Y) is the entropy of the target variable.
- H(Y|X) is the conditional entropy of the target variable given the feature X.
A higher information gain indicates that the feature provides more information and is a better choice for feature selection.
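As a minimal sketch of how this looks in practice, scikit-learn's mutual_info_classif estimates the mutual information I(Y; X) = H(Y) - H(Y | X) for each feature, which is exactly the information gain defined above; the iris dataset is used purely as an example.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

# Example dataset; any feature matrix X with class labels y works
data = load_iris()
X, y = data.data, data.target

# mutual_info_classif estimates I(Y; X) = H(Y) - H(Y | X) per feature
scores = mutual_info_classif(X, y, random_state=0)

# Higher score -> the feature tells us more about the target
for name, score in zip(data.feature_names, scores):
    print(f"{name}: {score:.3f}")
```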
2. Correlation Coefficient
The correlation coefficient quantifies the degree to which two variables are linearly related. In feature selection, it helps identify redundant features: highly correlated features tend to provide similar information, so one of them can often be removed without significantly affecting the model's performance.
r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}
Where:
- X_i and Y_i are individual data points for features X and Y
- \bar{X} and \bar{Y} are the mean values of features X and Y.
A high absolute value of the correlation coefficient (close to 1 or -1) indicates a strong relationship between features. Features with high correlation can often be merged or one can be removed to reduce redundancy.
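As a sketch of how this is applied, the pandas snippet below computes absolute pairwise Pearson correlations on a small synthetic DataFrame (the column names and the 0.9 cutoff are arbitrary illustrative choices) and flags one feature from each highly correlated pair:

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration: height_in is a rescaled copy of height_cm
rng = np.random.default_rng(0)
df = pd.DataFrame({"height_cm": rng.normal(170, 10, 200)})
df["height_in"] = df["height_cm"] / 2.54
df["weight_kg"] = 0.5 * df["height_cm"] + rng.normal(0, 5, 200)

# Absolute Pearson correlation for every pair of features
corr = df.corr().abs()

# Keep the upper triangle so each pair is checked only once
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))

# Flag one feature from every pair whose |r| exceeds the cutoff
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
print("Redundant features:", to_drop)
```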
3. p-Value
The p-value helps determine whether the relationship between a feature and the target variable is statistically significant. In feature selection, the p-value is used to identify features that significantly affect the model's output: a low p-value indicates that a feature is statistically significant.
- p-value < 0.05: the feature is statistically significant (at the conventional 5% level).
- p-value ≥ 0.05: the feature may not be statistically significant and can be considered for removal.
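One common way to obtain such p-values is a univariate ANOVA F-test, available in scikit-learn as f_classif; here is a minimal sketch, with iris again standing in as an example dataset:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif

# Example data; any numeric feature matrix with class labels works
data = load_iris()
X, y = data.data, data.target

# f_classif runs a one-way ANOVA F-test per feature, returning
# F-scores and the corresponding p-values
f_scores, p_values = f_classif(X, y)

for name, p in zip(data.feature_names, p_values):
    verdict = "significant" if p < 0.05 else "consider removing"
    print(f"{name}: p = {p:.3g} -> {verdict}")
```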
4. Recursive Feature Elimination
Recursive Feature Elimination (RFE) is a wrapper method for feature selection that recursively removes features from the dataset to improve model performance. The process involves fitting the model multiple times and eliminating the least important features based on model accuracy or feature importance scores.
Steps in RFE are:
- Train the model on the complete set of features.
- Rank the features based on their importance.
- Remove the least important features.
- Repeat the process until the desired number of features is selected.
RFE is effective for selecting the most relevant features when the model’s performance is the primary focus.
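The sketch below runs these steps with scikit-learn's RFE wrapper; the logistic-regression estimator, the breast-cancer dataset and the choice of 5 features are all just illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)  # scaling helps convergence
y = data.target

# Any estimator exposing coef_ or feature_importances_ can drive RFE
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=5, step=1)  # drop one feature per round
rfe.fit(X, y)

selected = data.feature_names[rfe.support_]
print("Selected features:", list(selected))
```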
5. Feature Importance
Embedded methods, such as those used in tree-based models like Random Forest and Gradient Boosting, provide a feature importance score based on how much each feature contributes to the prediction.
- Random Forest Feature Importance: Random Forest computes the importance of each feature based on the decrease in impurity (Gini index or entropy) caused by splits on that feature. Features that reduce impurity significantly are considered important.
- Gradient Boosting Feature Importance: In Gradient Boosting, features are ranked by their contribution to reducing the loss function.
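A minimal sketch of reading these scores from a Random Forest (the dataset and hyperparameters are chosen arbitrarily for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y = data.data, data.target

# feature_importances_ holds the mean impurity decrease per feature
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Rank features from most to least important and show the top five
ranked = sorted(zip(data.feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```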
6. Variance Threshold
Variance threshold is a simple filter-based method that selects features based on their variance. Features with low variance, i.e., features that take the same or nearly the same value across samples, are unlikely to contribute useful information to the model and can be removed.
\text{Variance}(X) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu_X)^2
Where:
- x_i is an individual sample,
- \mu_X is the mean of feature X,
- n is the number of samples.
Features whose variance is close to zero can be discarded, as they do not provide enough differentiation between data points.
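scikit-learn implements this directly as VarianceThreshold; here is a small sketch on a toy matrix (note that scikit-learn computes the population variance, dividing by n rather than n - 1, which rarely changes the outcome):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy matrix: the first column is almost constant, so it carries no signal
X = np.array([[0.0, 2.1, 5.0],
              [0.0, 1.9, 3.0],
              [0.0, 2.0, 4.5],
              [0.1, 2.2, 1.0]])

# Drop every feature whose variance falls below the (arbitrary) threshold
selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)

print("Kept columns:", selector.get_support(indices=True))  # [1 2]
print(X_reduced)
```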
7. Chi-Square Test
The Chi-Square test is used to determine whether there is a significant association between two categorical variables. It is commonly used in feature selection for classification problems involving categorical (or non-negative count) data.
\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}
Where:
- O_i is the observed frequency,
- E_i is the expected frequency.
A higher Chi-Square score indicates a stronger relationship between the feature and the target variable.
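A minimal sketch with scikit-learn's chi2 scorer and SelectKBest (chi2 requires non-negative feature values; iris and k=2 are arbitrary illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# chi2 expects non-negative features (ideally counts or frequencies)
data = load_iris()
X, y = data.data, data.target

# Score every feature and keep the two with the highest Chi-Square statistic
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

for name, score, p in zip(data.feature_names,
                          selector.scores_, selector.pvalues_):
    print(f"{name}: chi2 = {score:.2f}, p = {p:.3g}")
```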
By using methods like Information Gain, the correlation coefficient, p-values, RFE, feature importance, variance thresholding and the Chi-Square test, you can effectively choose the best features for your model and avoid overfitting, redundancy and computational inefficiency.