1.3.2. Feature Engineering and Variable Transformation
Learning Goals
In this section, we will cover:
- Feature engineering and variable transformation
- Feature encoding
- Feature scaling
Transforming Data: Background
Models used in machine learning workflows often make assumptions
about the data.
Transformation of Data Distributions
Linear regression models assume that residuals are normally distributed.
In practice, features and target variables are often skewed (their distributions are distorted away from the center), which violates this assumption.
Log Transformation Example
Transformations: Log Features
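A minimal sketch (not from the slides) of a log feature transformation with NumPy; `np.log1p` computes log(1 + x), which also handles zero values safely. The sample data is illustrative:

```python
# Sketch: log-transforming a right-skewed feature (illustrative data)
import numpy as np

rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # right-skewed sample
log_feature = np.log1p(skewed)  # log(1 + x); safe for zeros

# for a right-skewed variable the mean sits well above the median;
# after the log transform the two are much closer together
print(skewed.mean() - np.median(skewed))
print(log_feature.mean() - np.median(log_feature))
```

For strictly positive data, `np.log` can be used directly instead of `np.log1p`.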
Transformations: Polynomial Features
Polynomial Features: Syntax
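A sketch of scikit-learn's `PolynomialFeatures` syntax; the input values are illustrative:

```python
# Sketch: generating polynomial features with scikit-learn
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0], [3.0]])  # one feature, two samples (illustrative)
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)  # columns: x, x^2
print(X_poly)  # [[2. 4.] [3. 9.]]
```

With multiple input features, degree-2 expansion also adds the pairwise interaction terms.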
Transformations: Polynomial Features
We can estimate higher-order relationships
in this data by adding polynomial features.
Variable Selection: Background
Variable selection involves choosing the set of features to include in the model.
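One common approach, sketched here with scikit-learn's `SelectKBest` (the dataset and the choice of k are illustrative, not from the slides):

```python
# Sketch: univariate variable selection, keeping the k best-scoring features
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# synthetic data: 10 features, of which only 3 are informative
X, y = make_regression(n_samples=100, n_features=10,
                       n_informative=3, random_state=0)
selector = SelectKBest(score_func=f_regression, k=3)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)  # (100, 3)
```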
Feature Encoding: Types of Features
Encoding is often applied to categorical features, which take non-numeric values.
Two primary types are ordinal features (ordered categories) and nominal features (unordered categories).
Feature Encoding: Approaches
There are several common approaches to encoding variables:
- One-hot encoding: converts a variable that takes multiple values into a set of
binary (0, 1) indicator variables, one for each category. This creates several new variables.
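A minimal sketch of one-hot encoding with pandas (the column name and values are illustrative):

```python
# Sketch: one-hot encoding a categorical column with pandas
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})
one_hot = pd.get_dummies(df["color"], prefix="color")
print(one_hot.columns.tolist())  # ['color_blue', 'color_green', 'color_red']
```

Each of the three categories becomes its own 0/1 column, so one original variable produces three new ones here.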
Feature Scaling: Background
Feature Scaling: Example
Feature Scaling: Approaches
There are many approaches to scaling features.
Some of the more common approaches include:
- Robust scaling: similar to min-max scaling, but maps the interquartile
range (the 75th percentile value minus the 25th percentile value) to the unit interval,
instead of the full range. Because outliers fall outside the interquartile range,
the scaled variable can take values outside of the (0, 1) interval.
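A sketch of robust scaling with scikit-learn's `RobustScaler`, which subtracts the median and divides by the interquartile range (the data values are illustrative):

```python
# Sketch: robust scaling is less sensitive to outliers than min-max scaling
import numpy as np
from sklearn.preprocessing import RobustScaler

x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # 100.0 is an outlier
x_scaled = RobustScaler().fit_transform(x)
# the median maps to 0, and the outlier lands far outside (0, 1)
print(x_scaled.ravel())
```

Compare with min-max scaling, where the single outlier would compress all the other values into a narrow band near 0.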
Common Variable Transformations
Example of Standard and Min-Max
""" Min-max scaling and standardisation """
import numpy as np
from sklearn import preprocessing

x = np.array([[100.0], [27.0], [56.0], [67.0]])  # illustrative sample values

# Scaled feature (min-max)
min_max_scaler = preprocessing.MinMaxScaler()
x_after_min_max_scaler = min_max_scaler.fit_transform(x)
print("\nAfter min max Scaling : \n",
      x_after_min_max_scaler)

# Scaled feature (standardisation)
standardisation = preprocessing.StandardScaler()
x_after_standardisation = standardisation.fit_transform(x)
print("\nAfter Standardisation : \n",
      x_after_standardisation)
Common Variable Transformations
● Categorical features are generally divided into 3 types:
○ A. Binary: either/or
Examples:
■ Yes, No
■ True, False
○ B. Ordinal: ordered groups, where the order of the categories carries information
Examples:
■ Low, Medium, High
○ C. Nominal: unordered groups, with no intrinsic ordering
Examples:
■ cat, dog, bird
The unique values of a categorical column can be inspected with:
df['species'].unique()
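A hypothetical sketch of how each categorical type above might be encoded with pandas (the column names and values are illustrative, not from the slides):

```python
# Sketch: encoding binary, ordinal, and nominal features with pandas
import pandas as pd

df = pd.DataFrame({
    "adopted": ["Yes", "No", "Yes"],          # binary
    "size":    ["Small", "Large", "Medium"],  # ordinal
    "species": ["cat", "dog", "bird"],        # nominal
})
df["adopted"] = df["adopted"].map({"No": 0, "Yes": 1})              # binary -> 0/1
df["size"] = df["size"].map({"Small": 0, "Medium": 1, "Large": 2})  # keep order
df = pd.get_dummies(df, columns=["species"])                        # one-hot nominal
print(df.columns.tolist())
```

Note that the ordinal mapping preserves the category order as increasing integers, while the nominal column gets one indicator column per category.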
Summary
● Feature Engineering and Variable Transformation
○ Transforming variables helps to meet the assumptions of statistical
models. A concrete example is linear regression, in which you may
transform a predictor variable so that it has a linear relationship with the
target variable.
○ Common variable transformations include log transformations and
polynomial features, encoding categorical variables, and scaling
variables.
Learning Recap
In this section, we discussed:
- Feature engineering and variable transformation
- Feature encoding
- Feature scaling