
Feature Engineering and Variable Transformation

1
Learning Goals
In this section, we will cover:
- Feature engineering and variable transformation
- Feature encoding
- Feature scaling

2
Transforming Data: Background
Models used in machine learning workflows often make assumptions about the data.

A common example is the linear regression model, which assumes a linear relationship between the features and the target (outcome) variable.
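For reference, the model can be written in standard notation (this equation is not from the original slide):

y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon

where the \beta_j are coefficients estimated from the data and \varepsilon is an error term.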

3
Transforming Data: Background

4
Transformation of Data Distributions
Inference from linear regression models assumes that residuals are normally distributed.

Feature and target data are often skewed (their distributions lean away from the center).

Data transformations can correct this issue.

5
Transformation of Data Distributions

6
Log Transformation Example
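The original example figure is not reproduced here. Below is a minimal sketch of a log transformation in Python, assuming a right-skewed feature generated for illustration:

import numpy as np
from scipy.stats import skew

# Right-skewed toy feature (log-normal draws); the slide's original
# data is not shown, so these values are assumed
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# log1p computes log(1 + x), which is safe when x contains zeros
x_log = np.log1p(x)

print("Skewness before:", skew(x))      # strongly positive (right-skewed)
print("Skewness after: ", skew(x_log))  # much closer to 0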

7
Transformations: Log Features

Log transformations can be useful for linear regression.

The linear regression model involves linear combinations of features, so taking the log of a skewed feature can make its relationship with the target closer to linear.

8
Transformations: Polynomial Features

We can estimate higher-order relationships in this data by adding polynomial features.

This allows us to use the same 'linear' model.

9
Polynomial Features: Syntax
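The original syntax figure is not reproduced here. A minimal sketch using scikit-learn's PolynomialFeatures, with an assumed toy feature matrix:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy single-feature matrix, assumed for illustration
X = np.array([[1.0], [2.0], [3.0]])

# degree=2 adds squared terms; include_bias=False drops the constant column
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

print(poly.get_feature_names_out())  # ['x0' 'x0^2']
print(X_poly)                        # original feature plus its square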

10
Transformations: Polynomial Features
We can estimate higher-order relationships
in this data by adding polynomial features.

This allows us to use the same 'linear' model.

Even with higher-order polynomials.

11
Variable Selection: Background
Variable selection involves choosing the set of features to include in the model.

Variables must often be transformed before they can be included in models.

In addition to log and polynomial transformations, this can involve:
- Encoding: converting non-numeric features to numeric features.
- Scaling: converting the scale of numeric data so that features are comparable.

The appropriate method of scaling or encoding depends on the type of feature.

12
Feature Encoding: Types of Features
Encoding is often applied to categorical features, which take non-numeric values.
Two primary types:

- Nominal: categorical variables that take values in unordered categories
  (e.g. Red, Blue, Green; True, False).

- Ordinal: categorical variables that take values in ordered categories
  (e.g. High, Medium, Low).

13
Feature Encoding: Approaches
There are several common approaches to encoding variables:

- Binary encoding: converts a variable to either 0 or 1 and is suitable for
  variables that take two possible values (e.g. True, False).

- One-hot encoding: converts a variable that takes multiple values into a set
  of binary (0, 1) variables, one for each category. This creates several new
  variables (see the sketch after this list).

- Ordinal encoding: converts ordered categories to numerical values, usually
  by creating one variable that takes integer values from 0 up to the number
  of categories minus one (e.g. 0, 1, 2, 3, ...).
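A minimal sketch of one-hot encoding with scikit-learn, using an assumed toy colour column:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy nominal feature, assumed for illustration
colours = np.array([["Red"], ["Blue"], ["Green"], ["Blue"]])

# sparse_output=False returns a dense array (scikit-learn >= 1.2)
encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform(colours)

print(encoder.categories_)  # categories are sorted: Blue, Green, Red
print(encoded)              # one 0/1 column per category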

14
Feature Scaling: Background

Feature scaling involves adjusting a variable's scale. This allows comparison of variables measured on different scales.

Different continuous (numeric) features often have different scales.
Why might this be an issue? Many models, such as distance-based methods and models fit by gradient descent, are sensitive to feature scale, so large-scale features can dominate smaller ones (see the sketch below).
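A minimal sketch of the issue, with assumed toy values: when features sit on very different scales, the Euclidean distance between two observations is dominated by the large-scale feature.

import numpy as np

# Two people described by (age in years, income in dollars);
# values are assumed for illustration
a = np.array([25.0, 50_000.0])
b = np.array([65.0, 51_000.0])

# The raw distance is driven almost entirely by the income difference
print(np.linalg.norm(a - b))  # ~1000.8

# After dividing by hypothetical per-feature scale factors,
# both features contribute to the distance
scale = np.array([40.0, 1_000.0])
print(np.linalg.norm(a / scale - b / scale))  # ~1.41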

15
Feature Scaling: Example

16
Feature Scaling: Example

17
Feature Scaling: Approaches
There are many approaches to scaling features.
Some of the more common approaches include:

- Standard scaling: rescales features to have zero mean and unit variance
  (by subtracting the mean and dividing by the standard deviation).

- Min-max scaling: converts variables to continuous variables in the (0, 1)
  interval by mapping minimum values to 0 and maximum values to 1.
  This type of scaling is sensitive to outliers.

- Robust scaling: is similar to min-max scaling, but instead maps the
  interquartile range (the 75th percentile value minus the 25th percentile
  value) to an interval of length 1. This means the variable itself can take
  values outside the (0, 1) interval, which makes this approach less sensitive
  to outliers (see the sketch below).
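Robust scaling does not appear in the later code example, so here is a minimal sketch, assuming a toy feature with one outlier:

import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy feature with an outlier, assumed for illustration
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# RobustScaler subtracts the median and divides by the IQR,
# so the outlier barely affects the scale of the other values
scaler = RobustScaler()
print(scaler.fit_transform(x))  # [[-1. ] [-0.5] [ 0. ] [ 0.5] [48.5]]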
18
Common Variable Transformations

Feature Type                 | Transformation
Continuous: numerical values | Standard, Min-Max, Robust Scaling

19
Example of Standard and Min-Max

20
Example of Standard and Min-Max
""" Scaling features with scikit-learn """
import numpy as np
from sklearn import preprocessing

# Sample feature matrix; the slide's original data is not shown,
# so these values are assumed for illustration
x = np.array([[100.0], [200.0], [300.0], [400.0], [500.0]])

""" MIN MAX SCALER """

min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))

# Scaled feature
x_after_min_max_scaler = min_max_scaler.fit_transform(x)
print("\nAfter min max Scaling : \n", x_after_min_max_scaler)

""" Standardisation """

standardisation = preprocessing.StandardScaler()

# Scaled feature
x_after_standardisation = standardisation.fit_transform(x)
print("\nAfter Standardisation : \n", x_after_standardisation)

21
Example of Standard and Min-Max
● Output (for the sample data assumed above):

After min max Scaling :
 [[0.  ]
 [0.25]
 [0.5 ]
 [0.75]
 [1.  ]]

22
Example of Standard and Min-Max
● Output (for the sample data assumed above):

After Standardisation :
 [[-1.41421356]
 [-0.70710678]
 [ 0.        ]
 [ 0.70710678]
 [ 1.41421356]]

23
Common Variable Transformations

Feature Type                                    | Transformation
Continuous: numerical values                    | Standard, Min-Max, Robust Scaling
Nominal: categorical, unordered (True or False) | Binary, One-hot Encoding (0, 1)

24
Common Variable Transformations

Feature Type                                    | Transformation
Continuous: numerical values                    | Standard, Min-Max, Robust Scaling
Nominal: categorical, unordered (True or False) | Binary, One-hot Encoding (0, 1)
Ordinal: categorical, ordered (movie ratings)   | Ordinal Encoding (0, 1, 2, 3)

25
Common Variable Transformations
● Categorical features are generally divided into 3 types:
○ A. Binary: either/or
  Examples:
  ■ Yes, No
  ■ True, False

○ B. Ordinal: specific ordered groups
  Examples:
  ■ low, medium, high
  ■ cold, hot, lava hot

○ C. Nominal: unordered groups
  Examples:
  ■ cat, dog, tiger
  ■ pizza, burger, coke

26
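Before the label-encoding example on the next slide, a minimal sketch of encoding an ordinal feature with scikit-learn's OrdinalEncoder, using assumed categories:

import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Toy ordinal feature, assumed for illustration
temps = np.array([["cold"], ["hot"], ["lava hot"], ["cold"]])

# Passing categories explicitly preserves the intended order
encoder = OrdinalEncoder(categories=[["cold", "hot", "lava hot"]])
print(encoder.fit_transform(temps))  # [[0.] [1.] [2.] [0.]]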
Example of Encoding
● Assume we apply label encoding to the Iris dataset's target column, Species.
  This column contains three species: Iris-setosa, Iris-versicolor, Iris-virginica.

# Import label encoder
from sklearn import preprocessing

# df is assumed to already hold the Iris data, e.g. loaded beforehand with
# df = pd.read_csv('Iris.csv') (hypothetical file name)

# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()

# Encode labels in column 'species'.
df['species'] = label_encoder.fit_transform(df['species'])

df['species'].unique()

Output (for the standard dataset ordering): array([0, 1, 2])

Note: scikit-learn intends LabelEncoder for target labels; for ordinal feature
columns, OrdinalEncoder is the usual choice.

27
Summary
● Feature Engineering and Variable Transformation
○ Transforming variables helps to meet the assumptions of statistical
  models. A concrete example is linear regression, in which you may
  transform a predictor variable so that it has a linear relationship with
  the target variable.
○ Common variable transformations include: log transformations and
  polynomial features, encoding categorical variables, and scaling
  variables.

28
Learning Recap
In this section, we discussed:
- Feature engineering and variable transformation
- Feature encoding
- Feature scaling

29
