
Feature Engineering and Variable Transformation

1
Learning Goals
In this section, we will cover:
- Feature engineering and variable transformation
- Feature encoding
- Feature scaling

2
Transforming Data: Background
Models used in machine learning workflows often make assumptions about the data.

A common example is the linear regression model, which assumes a linear relationship between the features and the target (outcome) variable.
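For reference, the model can be written in standard notation (this equation is not from the original slide):

y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon

where the \beta_j are coefficients estimated from the data and \varepsilon is an error term.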

3
Transforming Data: Background

4
Transformation of Data Distributions
Inference from linear regression models assumes that residuals are normally distributed.

Feature and target data are often skewed (their distributions lean away from the center).

Data transformations can correct this issue.

5
Transformation of Data Distributions

6
Log Transformation Example
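The original example figure is not reproduced here. Below is a minimal sketch of a log transformation in Python, assuming a right-skewed feature generated for illustration:

import numpy as np
from scipy.stats import skew

# Right-skewed toy feature (log-normal draws); the slide's original
# data is not shown, so these values are assumed
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# log1p computes log(1 + x), which is safe when x contains zeros
x_log = np.log1p(x)

print("Skewness before:", skew(x))      # strongly positive (right-skewed)
print("Skewness after: ", skew(x_log))  # much closer to 0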

7
Transformations: Log Features

Log transformations can be useful for linear regression.

The linear regression model involves linear combinations of features, so taking the log of a skewed feature can make its relationship with the target closer to linear.

8
Transformations: Polynomial Features

We can estimate higher-order relationships in this data by adding polynomial features.

This allows us to use the same 'linear' model.

9
Polynomial Features: Syntax
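The original syntax figure is not reproduced here. A minimal sketch using scikit-learn's PolynomialFeatures, with an assumed toy feature matrix:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy single-feature matrix, assumed for illustration
X = np.array([[1.0], [2.0], [3.0]])

# degree=2 adds squared terms; include_bias=False drops the constant column
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

print(poly.get_feature_names_out())  # ['x0' 'x0^2']
print(X_poly)                        # original feature plus its square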

10
Transformations: Polynomial Features
We can estimate higher-order relationships
in this data by adding polynomial features.

This allows us to use the same 'linear' model.

Even with higher-order polynomials.

11
Variable Selection: Background
Variable selection involves choosing the set of features to include in the model.

Variables must often be transformed before they can be included in models.

In addition to log and polynomial transformations, this can involve:
- Encoding: converting non-numeric features to numeric features.
- Scaling: converting the scale of numeric data so that features are comparable.

The appropriate method of scaling or encoding depends on the type of feature.

12
Feature Encoding: Types of Features
Encoding is often applied to categorical features, which take non-numeric values.
Two primary types:

- Nominal: categorical variables that take values in unordered categories
  (e.g. Red, Blue, Green; True, False).

- Ordinal: categorical variables that take values in ordered categories
  (e.g. High, Medium, Low).

13
Feature Encoding: Approaches
There are several common approaches to encoding variables:

- Binary encoding: converts a variable to either 0 or 1 and is suitable for
  variables that take two possible values (e.g. True, False).

- One-hot encoding: converts a variable that takes multiple values into a set
  of binary (0, 1) variables, one for each category. This creates several new
  variables (see the sketch after this list).

- Ordinal encoding: converts ordered categories to numerical values, usually
  by creating one variable that takes integer values from 0 up to the number
  of categories minus one (e.g. 0, 1, 2, 3, ...).
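A minimal sketch of one-hot encoding with scikit-learn, using an assumed toy colour column:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy nominal feature, assumed for illustration
colours = np.array([["Red"], ["Blue"], ["Green"], ["Blue"]])

# sparse_output=False returns a dense array (scikit-learn >= 1.2)
encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform(colours)

print(encoder.categories_)  # categories are sorted: Blue, Green, Red
print(encoded)              # one 0/1 column per category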

14
Feature Scaling: Background

Feature scaling involves adjusting a variable's scale. This allows comparison of variables measured on different scales.

Different continuous (numeric) features often have different scales.
Why might this be an issue? Many models, such as distance-based methods and models fit by gradient descent, are sensitive to feature scale, so large-scale features can dominate smaller ones (see the sketch below).
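A minimal sketch of the issue, with assumed toy values: when features sit on very different scales, the Euclidean distance between two observations is dominated by the large-scale feature.

import numpy as np

# Two people described by (age in years, income in dollars);
# values are assumed for illustration
a = np.array([25.0, 50_000.0])
b = np.array([65.0, 51_000.0])

# The raw distance is driven almost entirely by the income difference
print(np.linalg.norm(a - b))  # ~1000.8

# After dividing by hypothetical per-feature scale factors,
# both features contribute to the distance
scale = np.array([40.0, 1_000.0])
print(np.linalg.norm(a / scale - b / scale))  # ~1.41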

15
Feature Scaling: Example

16
Feature Scaling: Example

17
Feature Scaling: Approaches
There are many approaches to scaling features.
Some of the more common approaches include:

- Standard scaling: rescales features to have zero mean and unit variance
  (by subtracting the mean and dividing by the standard deviation).

- Min-max scaling: converts variables to continuous variables in the (0, 1)
  interval by mapping minimum values to 0 and maximum values to 1.
  This type of scaling is sensitive to outliers.

- Robust scaling: is similar to min-max scaling, but instead maps the
  interquartile range (the 75th percentile value minus the 25th percentile
  value) to an interval of length 1. This means the variable itself can take
  values outside the (0, 1) interval, which makes this approach less sensitive
  to outliers (see the sketch below).
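Robust scaling does not appear in the later code example, so here is a minimal sketch, assuming a toy feature with one outlier:

import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy feature with an outlier, assumed for illustration
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# RobustScaler subtracts the median and divides by the IQR,
# so the outlier barely affects the scale of the other values
scaler = RobustScaler()
print(scaler.fit_transform(x))  # [[-1. ] [-0.5] [ 0. ] [ 0.5] [48.5]]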
18
Common Variable Transformations

Feature Type                 | Transformation
Continuous: numerical values | Standard, Min-Max, Robust Scaling

19
Example of Standard and Min-Max

20
Example of Standard and Min-Max
""" Scaling features with scikit-learn """
import numpy as np
from sklearn import preprocessing

# Sample feature matrix; the slide's original data is not shown,
# so these values are assumed for illustration
x = np.array([[100.0], [200.0], [300.0], [400.0], [500.0]])

""" MIN MAX SCALER """

min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))

# Scaled feature
x_after_min_max_scaler = min_max_scaler.fit_transform(x)
print("\nAfter min max Scaling : \n", x_after_min_max_scaler)

""" Standardisation """

standardisation = preprocessing.StandardScaler()

# Scaled feature
x_after_standardisation = standardisation.fit_transform(x)
print("\nAfter Standardisation : \n", x_after_standardisation)

21
Example of Standard and Min-Max
● Output (for the sample data assumed above):

After min max Scaling :
 [[0.  ]
 [0.25]
 [0.5 ]
 [0.75]
 [1.  ]]

22
Example of Standard and Min-Max
● Output (for the sample data assumed above):

After Standardisation :
 [[-1.41421356]
 [-0.70710678]
 [ 0.        ]
 [ 0.70710678]
 [ 1.41421356]]

23
Common Variable Transformations

Feature Type                                    | Transformation
Continuous: numerical values                    | Standard, Min-Max, Robust Scaling
Nominal: categorical, unordered (True or False) | Binary, One-hot Encoding (0, 1)

24
Common Variable Transformations

Feature Type                                    | Transformation
Continuous: numerical values                    | Standard, Min-Max, Robust Scaling
Nominal: categorical, unordered (True or False) | Binary, One-hot Encoding (0, 1)
Ordinal: categorical, ordered (movie ratings)   | Ordinal Encoding (0, 1, 2, 3)

25
Common Variable Transformations
● Categorical features are generally divided into 3 types:
○ A. Binary: either/or
  Examples:
  ■ Yes, No
  ■ True, False

○ B. Ordinal: specific ordered groups
  Examples:
  ■ low, medium, high
  ■ cold, hot, lava hot

○ C. Nominal: unordered groups
  Examples:
  ■ cat, dog, tiger
  ■ pizza, burger, coke

26
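Before the label-encoding example on the next slide, a minimal sketch of encoding an ordinal feature with scikit-learn's OrdinalEncoder, using assumed categories:

import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Toy ordinal feature, assumed for illustration
temps = np.array([["cold"], ["hot"], ["lava hot"], ["cold"]])

# Passing categories explicitly preserves the intended order
encoder = OrdinalEncoder(categories=[["cold", "hot", "lava hot"]])
print(encoder.fit_transform(temps))  # [[0.] [1.] [2.] [0.]]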
Example of Encoding
● Assume we apply label encoding to the Iris dataset's target column, Species.
  This column contains three species: Iris-setosa, Iris-versicolor, Iris-virginica.

# Import label encoder
from sklearn import preprocessing

# df is assumed to already hold the Iris data, e.g. loaded beforehand with
# df = pd.read_csv('Iris.csv') (hypothetical file name)

# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()

# Encode labels in column 'species'.
df['species'] = label_encoder.fit_transform(df['species'])

df['species'].unique()

Output (for the standard dataset ordering): array([0, 1, 2])

Note: scikit-learn intends LabelEncoder for target labels; for ordinal feature
columns, OrdinalEncoder is the usual choice.

27
Summary
● Feature Engineering and Variable Transformation
○ Transforming variables helps to meet the assumptions of statistical
  models. A concrete example is linear regression, in which you may
  transform a predictor variable so that it has a linear relationship with
  the target variable.
○ Common variable transformations include: log transformations and
  polynomial features, encoding categorical variables, and scaling
  variables.

28
Learning Recap
In this section, we discussed:
- Feature engineering and variable transformation
- Feature encoding
- Feature scaling

29
