04 - Feature Engineering
Feature Engineering
The importance of data collection and preprocessing
1. Imputation
The process of filling in missing values, one of the most
common problems when preparing data for machine learning.
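The slides do not show code; a minimal sketch of simple imputation with pandas (the column names and the choice of median vs. mean are illustrative):

```python
import pandas as pd

# Toy dataset with missing values (None becomes NaN)
df = pd.DataFrame({
    "age": [25, None, 31, 40],
    "income": [50_000, 62_000, None, 58_000],
})

# Numeric imputation: median is robust to outliers, mean preserves the average
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].mean())
```

After imputation, no NaN values remain, so downstream models that cannot handle missing data will accept the table.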
Feature Engineering Techniques
▪ Missing At Random
▪ Missing Not At Random
2. Handling Outliers
▪ Outliers are observations in a dataset that lie far from
the rest of the observations.
▪ Outliers occur due to natural variability in the data, or
due to experimental or human error.
▪ Z-score: standardize each value as z = (x − μ) / σ and flag
points whose |z| exceeds a chosen threshold.
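A minimal sketch of Z-score outlier detection with NumPy (the data and the threshold of 2 are illustrative; 2–3 is a common choice, not a fixed rule):

```python
import numpy as np

x = np.array([10., 11., 12., 11., 10., 95.])

# Standardize: subtract the mean, divide by the standard deviation
z = (x - x.mean()) / x.std()

# Flag values more than 2 standard deviations from the mean
outliers = x[np.abs(z) > 2]
```

Here only the value 95 is flagged; the regular observations all have |z| well below the threshold.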
▪ Box-Plot: flag points outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR],
where IQR = Q3 − Q1 (the interquartile range).
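The box-plot rule can be sketched with NumPy as follows (the data and the conventional 1.5 multiplier are illustrative):

```python
import numpy as np

x = np.array([10., 11., 12., 11., 10., 95.])

# Quartiles and interquartile range
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1

# Anything outside the whiskers [Q1 - 1.5*IQR, Q3 + 1.5*IQR] is an outlier
mask = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)
outliers = x[mask]
```

Unlike the Z-score, this rule is based on quartiles, so a single extreme value does not inflate the cutoff it is judged against.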
3. Log Transform
▪ A data transformation in which each variable x is replaced
by its logarithm, log(x), using base 10, base 2, or the
natural logarithm.
▪ Commonly used to compress the y-axis in histograms,
making visualization clearer and de-emphasizing outliers
in the data.
▪ A log transform does not guarantee a normal distribution,
although the result is often close to normal.
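A minimal sketch of the log transform with NumPy (the data is illustrative; `log1p` is a common variant, not part of the slide):

```python
import numpy as np

# Skewed, strictly positive data spanning several orders of magnitude
x = np.array([1., 10., 100., 1000., 10_000.])

# Base-10 log compresses the range: each power of 10 becomes one unit
log_x = np.log10(x)

# For data containing zeros, log1p computes log(1 + x) and avoids log(0)
safe_log = np.log1p(x)
```

After the transform the values are evenly spaced (0 through 4), which is why histograms of log-transformed data look far less dominated by the largest values.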
4. One-Hot Encoding
▪ Converts each category of a categorical variable into a new
binary column, with 1 marking the row's category and 0
elsewhere.
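The original slides for this technique were figures only; a minimal sketch using pandas `get_dummies` (the `color` column is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Replace the categorical column with one binary column per category
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
```

Each row now has exactly one 1 across the three `color_*` columns, so the categories carry no artificial ordering into the model.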
5. Scaling
▪ A data calibration technique that makes features measured
on different scales comparable.
▪ Useful for correcting the way the model weighs features
with small versus large numeric ranges.
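The two most common scaling approaches, min-max scaling and standardization, can be sketched with NumPy (the data is illustrative):

```python
import numpy as np

x = np.array([1., 5., 10., 4., 8.])

# Min-max scaling: rescale values into the range [0, 1]
min_max = (x - x.min()) / (x.max() - x.min())

# Standardization: shift to zero mean and unit variance
standardized = (x - x.mean()) / x.std()
```

Min-max scaling preserves the shape of the distribution inside a fixed range, while standardization centers the data, which many gradient-based models prefer.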
▪ Techniques
◼ Scaling data