Feature Engineering For Machine Learning

Feature engineering is the process of transforming raw data into features that can be used for machine learning models. It involves techniques like feature creation, transformation, extraction, selection, and scaling. The goal is to improve model performance by providing relevant input data. Key steps include data cleaning, transformation, extraction of important features, selection of important variables, feature iteration and splitting, and handling of missing data and categorical variables.

Feature Engineering for Machine Learning
• Feature engineering is the pre-processing step of machine learning in which raw data is transformed into features that can be used to create a predictive model.
• In other words, it is the process of selecting, extracting, and
transforming the most relevant features from the available
data to build more accurate and efficient machine learning
models.
• Feature engineering involves a set of techniques that enable us
to create new features by combining or transforming the
existing ones.
Need for Feature Engineering
• To improve the performance of machine learning models by providing them with relevant and informative input data.
• Feature engineering can also help in addressing issues such as
overfitting, underfitting, and high dimensionality.
• Feature engineering is a crucial step in preparing data for analysis and decision-making in various fields, such as finance, healthcare, marketing, and the social sciences.
Processes Involved in Feature Engineering
• Feature engineering in machine learning consists mainly of five processes:
1) Feature Creation,
2) Feature Transformation,
3) Feature Extraction,
4) Feature Selection, and
5) Feature Scaling.
• The success of a machine learning model largely depends on
the quality of the features used in the model.
Feature Creation
• Feature Creation is the process of generating new features
based on domain knowledge or by observing patterns in the
data.
• New features are created by combining existing features using operations such as addition, subtraction, and ratios, and these new features offer great flexibility.
Types of Feature Creation:
• Domain-Specific: Creating new features based on domain knowledge.
• Data-Driven: Creating new features by observing patterns in the data.
• Synthetic: Generating new features by combining existing features.
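As a minimal sketch of the three types, assuming a small hypothetical housing DataFrame (all column names here are illustrative, not from the original slides):

import pandas as pd

# hypothetical housing data; the column names are illustrative only
df = pd.DataFrame({'total_sqft': [1500, 2400, 850],
                   'num_rooms': [5, 8, 3],
                   'price': [300000, 540000, 180000],
                   'year_built': [1995, 2010, 1978]})

# domain-specific: price per square foot is a standard real-estate metric
df['price_per_sqft'] = df['price'] / df['total_sqft']

# synthetic: a ratio of two existing features
df['sqft_per_room'] = df['total_sqft'] / df['num_rooms']

# data-driven: derive the age of the building from the construction year
df['building_age'] = 2024 - df['year_built']

print(df.head())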
Benefits of Feature Creation
• Improves Model Performance
• Increases Model Robustness
• Improves Model Interpretability
• Increases Model Flexibility
2. Feature Transformation

• Feature Transformation is the process of transforming the features into a more suitable representation for the machine learning model.
Types of Feature Transformation:

• Normalization: Rescaling the features to have a similar range, such as between 0 and 1, to prevent some features from dominating others.
• Scaling: Rescaling the features to have a similar scale, such as
having a standard deviation of 1, to make sure the model
considers all features equally.
• Encoding: Transforming categorical features into a numerical
representation. Examples are one-hot encoding and label encoding.
• Transformation: Transforming the features using mathematical
operations to change the distribution or scale of the features.
Examples are logarithmic, square root, and reciprocal
transformations.
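A brief sketch of the four transformation types, using scikit-learn and pandas on hypothetical data (the income and city columns are invented for illustration):

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# hypothetical data: one skewed numeric feature, one categorical feature
df = pd.DataFrame({'income': [20000.0, 35000.0, 50000.0, 1200000.0],
                   'city': ['A', 'B', 'A', 'C']})

# normalization: rescale income into the [0, 1] range
df['income_norm'] = MinMaxScaler().fit_transform(df[['income']]).ravel()

# scaling: mean 0, standard deviation 1
df['income_std'] = StandardScaler().fit_transform(df[['income']]).ravel()

# transformation: log transform to reduce the skew
df['income_log'] = np.log(df['income'])

# encoding: one-hot encode the categorical feature
df = pd.get_dummies(df, columns=['city'])

print(df)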
3. Feature Extraction

• Feature Extraction is the process of creating new features from existing ones to provide more relevant information to the machine learning model.
• The main aim of this step is to reduce the volume of data so that it can be easily used and managed for data modelling.
• Feature extraction methods include cluster analysis, text analytics, edge detection algorithms, and principal component analysis (PCA).
Types of Feature Extraction
• Dimensionality Reduction: Reducing the number of features by transforming the data into a lower-dimensional space while retaining important information. Examples are PCA and t-SNE.
• Feature Combination: Combining two or more existing features to create a new one, for example an interaction between two features.
• Feature Aggregation: Aggregating features to create a new one, for example calculating the mean, sum, or count of a set of features.
• Feature Transformation: Transforming existing features into a new representation, for example a log transformation of a feature with a skewed distribution.
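A minimal PCA sketch with scikit-learn; the data is synthetic (ten noisy copies of two latent factors), invented to make the dimensionality reduction visible:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# synthetic data: 10 observed features driven by 2 latent factors
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(100, 10))

# standardize first, since PCA is sensitive to each feature's variance
X_scaled = StandardScaler().fit_transform(X)

# keep enough components to explain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                 # far fewer columns than the original 10
print(pca.explained_variance_ratio_)   # variance captured per component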
4. Feature Selection

• Feature selection is a way of selecting the subset of the most relevant features from the original feature set by removing redundant, irrelevant, or noisy features.
Types of Feature Selection
• Filter Method: Based on a statistical measure of the relationship between the feature and the target variable. Features with a high correlation are selected.
• Wrapper Method: Based on evaluating feature subsets with a specific machine learning algorithm. The feature subset that results in the best performance is selected.
• Embedded Method: Based on performing feature selection as part of the training process of the machine learning algorithm.
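A sketch of all three selection methods with scikit-learn, on a synthetic classification problem (the dataset and parameter choices are illustrative):

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# filter: keep the 5 features with the strongest univariate F-score
X_filter = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

# wrapper: recursive feature elimination around a logistic regression
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
X_wrapper = rfe.transform(X)

# embedded: L1 regularization zeroes out coefficients of irrelevant features
l1 = LogisticRegression(penalty='l1', solver='liblinear').fit(X, y)
kept = int((l1.coef_ != 0).sum())

print(X_filter.shape, X_wrapper.shape, kept)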
Feature Scaling

• Feature Scaling is the process of transforming the features so that they have a similar scale.
• Types of Feature Scaling:
• Min-Max Scaling: Subtracting the minimum value and dividing by the range, so that the features fall between 0 and 1.
• Standard Scaling: Rescaling the features to have a mean of 0 and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation.
• Robust Scaling: Rescaling the features to be robust to outliers by subtracting the median and dividing by the interquartile range.
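A minimal comparison of the three scalers, using a single feature with one outlier (the numbers are invented):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

print(MinMaxScaler().fit_transform(X).ravel())    # squeezed into [0, 1]
print(StandardScaler().fit_transform(X).ravel())  # mean 0, std 1
print(RobustScaler().fit_transform(X).ravel())    # median/IQR based, outlier-resistant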
Steps to Feature Engineering

• Data Cleaning (removing or correcting any errors or inconsistencies)
• Data Transformation (normalization, standardization, and log transformation)
• Feature Extraction (principal component analysis (PCA), text parsing, and image processing)
• Feature Selection (correlation analysis, mutual information, and stepwise regression)
• Feature Iteration (adding new features, removing redundant features, and transforming features in different ways; binning is the process of grouping continuous features into discrete bins)
• Feature Split (splitting a single variable into multiple variables; see the sketch below)
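A minimal sketch of binning and feature splitting with pandas (the age bins, labels, and names are invented for illustration):

import pandas as pd

df = pd.DataFrame({'age': [5, 23, 47, 71],
                   'full_name': ['Ana Silva', 'Bo Chen', 'Cara Diaz', 'Dev Rao']})

# binning: group the continuous age feature into discrete bins
df['age_group'] = pd.cut(df['age'], bins=[0, 18, 40, 65, 120],
                         labels=['child', 'young adult', 'middle-aged', 'senior'])

# feature split: break one variable into several
df[['first_name', 'last_name']] = df['full_name'].str.split(' ', n=1, expand=True)

print(df)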


Feature engineering techniques
• Missing data imputation
• Categorical encoding
• Variable transformation
• Outlier engineering
Missing data imputation

1. Complete case analysis
2. Mean / Median / Mode imputation
3. Missing Value Indicator

Complete Case Analysis for Missing Data Imputation
• Remove all the observations that contain missing values.
• This can only be used when just a few observations have missing values.
# check how many observations we would drop (data1 is the Titanic dataset as a pandas DataFrame)
print('total passengers with values in all variables: ', data1.dropna().shape[0])
print('total passengers in the Titanic: ', data1.shape[0])
# note: np.float was removed from NumPy; the built-in float gives the same result
print('percentage of data without missing values: ', data1.dropna().shape[0] / float(data1.shape[0]))
So, we have complete information for only 20% of our observations in the Titanic dataset. Thus, the Complete Case Analysis method would not be an option for this dataset.
Mean / Median / Mode for Missing Data Imputation

• Missing values can also be replaced with the mean, median, or mode of the variable (feature).
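A minimal sketch of median imputation, assuming data1 is the Titanic DataFrame used above:

# replace missing ages with the median of the observed ages
median_age = data1['Age'].median()
data1['Age'] = data1['Age'].fillna(median_age)

# confirm that no missing values remain
print(data1['Age'].isnull().sum())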

• Output: 0, meaning no null values remain in the Age feature.


Missing Value Indicator
• This technique involves adding a binary variable to indicate
whether the value is missing for a certain observation.
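A minimal sketch, assuming data1 still contains the original missing Age values (i.e. before any imputation):

import numpy as np

# binary flag: 1 where Age is missing, 0 otherwise
data1['Age_NA'] = np.where(data1['Age'].isnull(), 1, 0)

print(data1[['Age', 'Age_NA']].head())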
• Output: the Age_NA variable was created to capture the missingness of the original Age variable.
Categorical encoding in Feature Engineering

• There are multiple techniques to do so:
1. One-Hot encoding (OHE)
2. Ordinal encoding
3. Count and Frequency encoding
4. Target encoding / Mean encoding
One-Hot Encoding

• It is a commonly used technique for encoding categorical variables. It creates a binary variable for each category present in the categorical variable.
• Each binary variable takes the value 1 if the observation belongs to that category and 0 otherwise. Each new variable is called a dummy variable or binary variable.
• For a binary variable such as Sex, only one dummy variable is needed to represent the categorical variable.
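A minimal pandas sketch; drop_first=True keeps a single dummy, which is enough for a two-category variable such as Sex:

import pandas as pd

df = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male']})

dummies = pd.get_dummies(df['Sex'], drop_first=True)
print(dummies.head())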
Ordinal Encoding

• In this case, a simple way to encode is to replace the labels with some ordinal number. Look at the sample code below:
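The original sample code is not shown on the slide; a minimal sketch with a hypothetical education variable whose categories have a natural order:

import pandas as pd

df = pd.DataFrame({'education': ['primary', 'secondary', 'tertiary', 'secondary']})

# map each label to an ordinal number that reflects its rank
order = {'primary': 1, 'secondary': 2, 'tertiary': 3}
df['education_encoded'] = df['education'].map(order)

print(df)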
Count and Frequency Encoding

• In this encoding technique, categories are replaced by the count of the observations that show that category in the dataset.
• Replacement can also be done with the frequency, i.e. the proportion of observations in the dataset.
• For example, if 30 of 100 observations are male, we can replace male with 30 (count) or 0.3 (frequency).
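A minimal pandas sketch with a hypothetical city column:

import pandas as pd

df = pd.DataFrame({'city': ['A', 'B', 'A', 'A', 'C']})

# count encoding: replace each category by how often it appears
counts = df['city'].value_counts()
df['city_count'] = df['city'].map(counts)

# frequency encoding: the same counts expressed as proportions
freqs = df['city'].value_counts(normalize=True)
df['city_freq'] = df['city'].map(freqs)

print(df)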
Target / Mean Encoding

• Replace each category of a variable with the mean value of the target for the observations that show that category.
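A minimal sketch with an invented binary target; note that in practice the category means should be computed on the training set only, to avoid target leakage:

import pandas as pd

df = pd.DataFrame({'city': ['A', 'B', 'A', 'B', 'C'],
                   'target': [1, 0, 1, 1, 0]})

# mean of the target per category, mapped back onto the column
means = df.groupby('city')['target'].mean()
df['city_encoded'] = df['city'].map(means)

print(df)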
Variable Transformation

• Machine learning algorithms like linear and logistic regression assume that the variables are normally distributed.
• If a variable is not normally distributed, sometimes it is possible to find a mathematical transformation so that the transformed variable is Gaussian.
• Commonly used mathematical transformations are:
1. Logarithm transformation – log(x)
2. Square root transformation – sqrt(x)
3. Reciprocal transformation – 1 / x
4. Exponential transformation – exp(x)
• Loading the numerical features of the Titanic dataset.
• Now, to visualize the distribution of the Age variable, we will plot a histogram and a Q-Q plot.
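A minimal sketch of the diagnostic plots, assuming the Titanic Age column is available in the data1 DataFrame (the helper function name is ours, not from the slides):

import matplotlib.pyplot as plt
import scipy.stats as stats

def diagnostic_plots(df, variable):
    # histogram on the left, Q-Q plot against the normal distribution on the right
    plt.figure(figsize=(10, 4))
    plt.subplot(1, 2, 1)
    df[variable].hist(bins=30)
    plt.subplot(1, 2, 2)
    stats.probplot(df[variable].dropna(), dist='norm', plot=plt)
    plt.show()

diagnostic_plots(data1, 'Age')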
Now, let's apply each transformation and compare the transformed Age variable:
• Logarithmic transformation – log(x)
• Square root transformation – sqrt(x)
• Reciprocal transformation – 1 / x
• Exponential transformation – exp(x)
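A sketch applying each transformation and re-running the diagnostic helper from the previous snippet (assuming Age has already been imputed, and noting that Age is strictly positive, as the log and reciprocal transformations require):

import numpy as np

# logarithm transformation
data1['Age_log'] = np.log(data1['Age'])
diagnostic_plots(data1, 'Age_log')

# square root transformation
data1['Age_sqrt'] = np.sqrt(data1['Age'])
diagnostic_plots(data1, 'Age_sqrt')

# reciprocal transformation
data1['Age_reciprocal'] = 1 / data1['Age']
diagnostic_plots(data1, 'Age_reciprocal')

# exponential transformation
data1['Age_exp'] = np.exp(data1['Age'])
diagnostic_plots(data1, 'Age_exp')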
