What Is Feature Engineering
Feature engineering is the process of transforming raw data into features that are
suitable for machine learning models. In other words, it is the process of
selecting, extracting, and transforming the most relevant features from the
available data to build more accurate and efficient machine learning models.
What is a Feature?
In the context of machine learning, a feature (also known as a variable or
attribute) is an individual measurable property or characteristic of a data point
that is used as input for a machine learning algorithm. Features can be numerical,
categorical, or text-based, and they represent different aspects of the data that
are relevant to the problem at hand.
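As a small illustration, a table of customer records might hold a numerical, a categorical, and a text feature side by side. The sketch below uses pandas, with made-up column names chosen purely for illustration.

```python
import pandas as pd

# Hypothetical customer records; each column is one feature of a data point.
df = pd.DataFrame({
    "age": [34, 52, 29],                                      # numerical feature
    "plan": ["basic", "premium", "basic"],                    # categorical feature
    "last_review": ["great app", "too slow", "works fine"],   # text feature
})
print(df.dtypes)
```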
1. Feature Creation
Feature Creation is the process of generating new features based on domain
knowledge or by observing patterns in the data. It is a form of feature engineering
that can significantly improve the performance of a machine learning model.
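A minimal sketch, assuming a hypothetical housing dataset with total_rooms, bedrooms, and price columns: domain knowledge might suggest deriving a bedroom-to-room ratio as a new feature.

```python
import pandas as pd

# Hypothetical housing data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "total_rooms": [6, 8, 5],
    "bedrooms": [3, 4, 2],
    "price": [250_000, 320_000, 180_000],
})

# Domain knowledge: the share of rooms that are bedrooms may be more
# predictive of price than either raw count on its own.
df["bedroom_ratio"] = df["bedrooms"] / df["total_rooms"]
print(df)
```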
2. Feature Transformation
Feature Transformation is the process of converting features into a representation
that is better suited to the machine learning model, so that the model can learn
from the data effectively. A common example is rescaling features to a similar
scale, such as a standard deviation of 1, so that the model weighs all features
equally.
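A minimal sketch of such a transformation, assuming a single made-up, skewed income feature: a log transform reshapes the distribution, and scikit-learn's StandardScaler then standardizes it to a standard deviation of 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical, skewed income values (made up for illustration).
income = np.array([[25_000.0], [40_000.0], [55_000.0], [250_000.0]])

# A log transform compresses the long right tail into a shape most models handle better.
log_income = np.log1p(income)

# Standardize to zero mean and a standard deviation of 1 so this feature
# sits on the same scale as the others fed to the model.
scaled = StandardScaler().fit_transform(log_income)
print(scaled.ravel())
```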
3. Feature Extraction
Feature Extraction is the process of creating new features from existing ones to
provide more relevant information to the machine learning model. This is done by
transforming, combining, or aggregating existing features.
Feature Combination: Combining two or more existing features to create a new one,
for example an interaction term formed from two features.
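As an illustrative sketch, assuming hypothetical quantity and unit_price columns, a combined order_value feature can be extracted by multiplying the two.

```python
import pandas as pd

# Hypothetical order data; column names are assumptions for illustration.
df = pd.DataFrame({
    "quantity": [2, 1, 5],
    "unit_price": [9.99, 199.00, 3.50],
})

# Feature combination: multiply two existing features into a new,
# more informative one (an interaction term).
df["order_value"] = df["quantity"] * df["unit_price"]
print(df)
```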
4. Feature Selection
Feature Selection is the process of selecting a subset of relevant features from
the dataset to be used in a machine learning model. It is an important step in the
feature engineering process as it can have a significant impact on the model’s
performance.
Wrapper Method: Evaluates candidate feature subsets by training a specific machine
learning algorithm on each one; the subset that yields the best performance is
selected.
Embedded Method: Performs feature selection as part of the training process of the
machine learning algorithm itself, for example through regularization or tree-based
feature importances.
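A brief sketch of both ideas using scikit-learn on its bundled breast-cancer dataset (the dataset, estimators, and the number of features kept are arbitrary choices for illustration): recursive feature elimination (RFE) acts as a wrapper method, while a random forest's feature importances illustrate the embedded approach.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scale so logistic regression converges cleanly

# Wrapper method: RFE repeatedly trains the estimator and drops the
# weakest features until only 5 remain.
wrapper = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
print("Wrapper-selected feature indices:", wrapper.get_support(indices=True))

# Embedded method: a random forest ranks features as part of its own training.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("Top features by importance:", forest.feature_importances_.argsort()[-5:])
```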
5. Feature Scaling
Feature Scaling is the process of transforming the features so that they have a
similar scale. This is important in machine learning because the scale of the
features can affect the performance of the model.
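A minimal sketch, assuming two made-up features on very different scales (age in years and income in dollars), comparing min-max scaling with standardization via scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two made-up features on very different scales: age (years) and income (dollars).
X = np.array([[25, 40_000], [32, 95_000], [47, 150_000]], dtype=float)

# Min-max scaling maps each feature into the [0, 1] range.
print(MinMaxScaler().fit_transform(X))

# Standardization rescales each feature to zero mean and unit variance.
print(StandardScaler().fit_transform(X))
```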