Feature Engineering
Feature Engineering is the process of creating new features or transforming existing features
to improve the performance of a machine-learning model. It involves selecting relevant
information from raw data and transforming it into a format that can be easily understood by
a model. The goal is to improve model accuracy by providing more meaningful and relevant
information.
What is a Feature?
In the context of machine learning, a feature (also known as a variable or attribute) is an
individual measurable property or characteristic of a data point that is used as input for a
machine learning algorithm. Features can be numerical, categorical, or text-based, and they
represent different aspects of the data that are relevant to the problem at hand.
For example, in a dataset of housing prices, features could include the number of
bedrooms, the square footage, the location, and the age of the property. In a dataset of
customer demographics, features could include age, gender, income level, and
occupation.
The choice and quality of features are critical in machine learning, as they can greatly
impact the accuracy and performance of the model.
1. Feature Creation
Feature Creation is the process of generating new features based on domain knowledge or by
observing patterns in the data. It is a form of feature engineering that can significantly
improve the performance of a machine-learning model.
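As a small illustration (the DataFrame and column names below are hypothetical), new features can be derived from existing columns using domain knowledge, for example price per square foot in a housing dataset:

    import pandas as pd

    # Hypothetical housing data; the column names are illustrative only.
    df = pd.DataFrame({
        "price": [250000, 340000, 180000],
        "square_footage": [1500, 2100, 1100],
        "bedrooms": [3, 4, 2],
    })

    # Domain-knowledge features created from existing columns.
    df["price_per_sqft"] = df["price"] / df["square_footage"]
    df["sqft_per_bedroom"] = df["square_footage"] / df["bedrooms"]
    print(df)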
2. Feature Transformation
Feature Transformation is the process of transforming the features into a more suitable
representation for the machine learning model. This is done to ensure that the model can
effectively learn from the data.
Types of Feature Transformation:
1. Normalization: Rescaling the features to have a similar range, such as between 0 and 1, to
prevent some features from dominating others.
2. Scaling: Rescaling numerical features to a common scale, for example to unit standard
deviation, so that features can be compared directly and no feature dominates simply
because of its magnitude (see the sketch after this list).
3. Encoding: Transforming categorical features into a numerical representation. Examples
are one-hot encoding and label encoding.
4. Transformation: Transforming the features using mathematical operations to change the
distribution or scale of the features. Examples are logarithmic, square root, and reciprocal
transformations.
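A minimal sketch of normalization, encoding, and a log transform using pandas, NumPy, and scikit-learn (the data and column names are illustrative, not from any particular dataset):

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

    # Illustrative data with one numerical and one categorical feature.
    df = pd.DataFrame({
        "income": [32000, 54000, 120000, 41000],
        "city": ["Delhi", "Mumbai", "Delhi", "Pune"],
    })

    # Normalization: rescale "income" into the [0, 1] range.
    df["income_norm"] = MinMaxScaler().fit_transform(df[["income"]]).ravel()

    # Encoding: one-hot encode the categorical "city" column.
    city_encoded = OneHotEncoder().fit_transform(df[["city"]]).toarray()

    # Transformation: log transform to reduce the skew of "income".
    df["income_log"] = np.log(df["income"])

    print(df)
    print(city_encoded)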
Why Feature Transformation?
1. Improves Model Performance: By transforming the features into a more suitable
representation, the model can learn more meaningful patterns in the data.
2. Increases Model Robustness: Transforming the features can make the model more robust
to outliers and other anomalies.
3. Improves Computational Efficiency: The transformed features often require fewer
computational resources.
4. Improves Model Interpretability: By transforming the features, it can be easier to
understand the model’s predictions.
3. Feature Extraction
Feature Extraction is the process of creating new features from existing ones to provide more
relevant information to the machine learning model. This is done by transforming,
combining, or aggregating existing features.
Types of Feature Extraction:
1. Dimensionality Reduction: Reducing the number of features by transforming the data into
a lower-dimensional space while retaining important information. Examples
are PCA and t-SNE.
2. Feature Combination: Combining two or more existing features to create a new one. For
example, the interaction between two features.
3. Feature Aggregation: Aggregating features to create a new one. For example, calculating
the mean, sum, or count of a set of features.
4. Feature Transformation: Transforming existing features into a new representation, for
example a log transformation of a feature with a skewed distribution (see the sketch after
this list).
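The sketch below illustrates feature combination, aggregation, and dimensionality reduction with pandas and scikit-learn (the columns, such as height, weight, and monthly spend, are hypothetical):

    import pandas as pd
    from sklearn.decomposition import PCA

    # Hypothetical data; column names are illustrative only.
    df = pd.DataFrame({
        "height_cm": [170, 160, 182, 175],
        "weight_kg": [68, 55, 90, 77],
        "spend_jan": [120, 300, 80, 210],
        "spend_feb": [100, 280, 95, 190],
    })

    # Feature combination: body mass index derived from two existing features.
    df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2

    # Feature aggregation: mean spend across several monthly columns.
    df["mean_spend"] = df[["spend_jan", "spend_feb"]].mean(axis=1)

    # Dimensionality reduction: project all numeric features onto 2 components.
    components = PCA(n_components=2).fit_transform(df)
    print(df)
    print(components.shape)  # (4, 2)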
Why Feature Extraction?
1. Improves Model Performance: By creating new and more relevant features, the model can
learn more meaningful patterns in the data.
2. Reduces Overfitting: By reducing the dimensionality of the data, the model is less likely
to overfit the training data.
3. Improves Computational Efficiency: The transformed features often require fewer
computational resources.
4. Improves Model Interpretability: By creating new features, it can be easier to understand
the model’s predictions.
4. Feature Selection
Feature Selection is the process of selecting a subset of relevant features from the dataset to
be used in a machine-learning model. It is an important step in the feature engineering
process as it can have a significant impact on the model’s performance.
Types of Feature Selection:
1. Filter Method: Features are ranked by a statistical measure of their relationship with the
target variable, such as correlation or mutual information, and the highest-scoring features
are selected (see the sketch after this list).
2. Wrapper Method: Based on the evaluation of the feature subset using a specific machine
learning algorithm. The feature subset that results in the best performance is selected.
3. Embedded Method: Based on the feature selection as part of the training process of the
machine learning algorithm.
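As a sketch of the filter and wrapper approaches, using scikit-learn's built-in breast cancer dataset purely for illustration:

    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import RFE, SelectKBest, f_classif
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features

    # Filter method: rank features by an ANOVA F-score against the target
    # and keep the 10 highest-scoring ones.
    X_filter = SelectKBest(score_func=f_classif, k=10).fit_transform(X, y)

    # Wrapper method: recursively drop the least useful features according
    # to a model trained on each candidate subset.
    rfe = RFE(estimator=DecisionTreeClassifier(random_state=0),
              n_features_to_select=10)
    X_wrapper = rfe.fit_transform(X, y)

    print(X_filter.shape, X_wrapper.shape)  # (569, 10) (569, 10)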
Why Feature Selection?
1. Reduces Overfitting: By using only the most relevant features, the model can generalize
better to new data.
2. Improves Model Performance: Selecting the right features can improve the accuracy,
precision, and recall of the model.
3. Decreases Computational Costs: A smaller number of features requires less computation
and storage resources.
4. Improves Interpretability: By reducing the number of features, it is easier to understand
and interpret the results of the model.
5. Feature Scaling
Feature Scaling is the process of transforming the features so that they have a similar scale.
This is important in machine learning because the scale of the features can affect the
performance of the model.
Types of Feature Scaling:
1. Min-Max Scaling: Rescaling the features to a specific range, such as between 0 and 1, by
subtracting the minimum value and dividing by the range.
2. Standard Scaling: Rescaling the features to have a mean of 0 and a standard deviation of 1
by subtracting the mean and dividing by the standard deviation.
3. Robust Scaling: Rescaling the features to be robust to outliers by subtracting the median
and dividing by the interquartile range (see the sketch after this list).
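A short sketch of the three scalers applied to a single illustrative feature containing one obvious outlier:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

    # One illustrative feature; the value 100.0 is a deliberate outlier.
    X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

    print(MinMaxScaler().fit_transform(X).ravel())    # rescaled into [0, 1]
    print(StandardScaler().fit_transform(X).ravel())  # mean 0, standard deviation 1
    print(RobustScaler().fit_transform(X).ravel())    # median/IQR based, less outlier-sensitive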
Why Feature Scaling?
1. Improves Model Performance: By transforming the features to have a similar scale, the
model can learn from all features equally and avoid being dominated by a few large
features.
2. Increases Model Robustness: By transforming the features to be robust to outliers, the
model can become more robust to anomalies.
3. Improves Computational Efficiency: Many machine learning algorithms, such as k-nearest
neighbors and gradient-descent-based models, are sensitive to the scale of the features and
converge faster or perform better when the features are scaled.
4. Improves Model Interpretability: By transforming the features to have a similar scale, it
can be easier to understand the model’s predictions.
Handling Outliers
Outliers are data points that lie unusually far from the rest of the data and can badly affect
the performance of a model. This feature engineering technique first identifies the outliers
and then removes them.
The standard deviation can be used to identify outliers: each value lies at some distance from
the mean, and if that distance exceeds a chosen number of standard deviations, the value can
be treated as an outlier. The Z-score, which expresses this distance in units of standard
deviation, is a common way to detect outliers, as sketched below.
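A minimal sketch of Z-score-based outlier removal with NumPy and pandas (the synthetic income values are invented for illustration):

    import numpy as np
    import pandas as pd

    # Synthetic data: 100 typical values plus two extreme ones.
    rng = np.random.default_rng(0)
    income = pd.Series(np.append(rng.normal(50, 5, size=100), [400, 500]),
                       name="income")

    # Z-score: distance of each value from the mean in standard deviations.
    z = (income - income.mean()) / income.std()

    # Keep only the values whose absolute Z-score is below a chosen threshold.
    income_clean = income[z.abs() < 3]
    print(len(income), len(income_clean))  # the two extreme values are dropped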
Log transform
Logarithm transformation, or log transform, is one of the most commonly used mathematical
techniques in machine learning. It helps handle skewed data, making the distribution closer to
normal after transformation. It also reduces the effect of outliers, because compressing large
magnitude differences makes the model more robust.
Note: Log transformation is defined only for positive values; applying it to zero or negative
values produces an error. For data containing zeros, a common workaround is to add 1 before
transforming (that is, compute log(1 + x)), which keeps the argument positive, as in the
sketch below.
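A brief sketch with NumPy, using log1p (the log of 1 + x) so that zero values are handled safely; the amounts are invented:

    import numpy as np

    # Illustrative right-skewed, non-negative values (e.g. transaction amounts).
    amounts = np.array([0, 3, 7, 15, 40, 2500])

    # log1p computes log(1 + x), so the zero does not cause an error
    # and the large value is compressed towards the rest of the data.
    print(np.log1p(amounts))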
One-Hot Encoding
One-hot encoding is a technique used to transform categorical variables into numerical values
that can be used by machine learning models. In this technique, each category is transformed
into a binary value indicating its presence or absence. For example, consider a categorical
variable “Colour” with three categories: Red, Green, and Blue. One-hot encoding would
transform this variable into three binary variables: Colour_Red, Colour_Green, and
Colour_Blue, where the value of each variable would be 1 if the corresponding category is
present and 0 otherwise.
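The “Colour” example above can be reproduced in a few lines with pandas:

    import pandas as pd

    df = pd.DataFrame({"Colour": ["Red", "Green", "Blue", "Green"]})

    # One-hot encode "Colour" into Colour_Blue, Colour_Green and Colour_Red,
    # each holding 1 (True) where that colour is present and 0 (False) otherwise.
    print(pd.get_dummies(df, columns=["Colour"]))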
Binning
Binning is a technique used to transform continuous variables into categorical variables. In
this technique, the range of values of the continuous variable is divided into several bins, and
each bin is assigned a categorical value. For example, consider a continuous variable “Age”
with values ranging from 18 to 80. Binning would divide this variable into several age groups
such as 18-25, 26-35, 36-50, and 51-80, and assign a categorical value to each age group.
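The “Age” example above, sketched with pandas (the specific ages are illustrative):

    import pandas as pd

    ages = pd.Series([19, 23, 31, 45, 62, 78], name="Age")

    # Bin the continuous ages into the groups described above.
    age_group = pd.cut(ages,
                       bins=[18, 25, 35, 50, 80],
                       labels=["18-25", "26-35", "36-50", "51-80"],
                       include_lowest=True)
    print(age_group)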
Scaling
The most common scaling techniques are standardization and normalization. Standardization
scales the variable so that it has zero mean and unit variance. Normalization scales the
variable so that it has a range of values between 0 and 1.
Feature Split
Feature splitting is a powerful technique used in feature engineering to improve the
performance of machine learning models. It involves dividing single features into multiple
sub-features or groups based on specific criteria. This process unlocks valuable insights and
enhances the model’s ability to capture complex relationships and patterns within the data.
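For example, a single timestamp column can be split into several sub-features; a sketch with pandas (the column name order_time is hypothetical):

    import pandas as pd

    # Hypothetical column holding full timestamps.
    df = pd.DataFrame({"order_time": pd.to_datetime([
        "2023-01-15 09:30", "2023-06-02 18:45", "2023-11-20 23:10",
    ])})

    # Split the single feature into sub-features the model can use directly.
    df["order_month"] = df["order_time"].dt.month
    df["order_day_of_week"] = df["order_time"].dt.dayofweek
    df["order_hour"] = df["order_time"].dt.hour
    print(df)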