Feature Engineering For Machine Learning

Feature engineering is the process of transforming raw data into features that can be used for machine learning models. It involves techniques like feature creation, transformation, extraction, selection, and scaling. The goal is to improve model performance by providing relevant input data. Key steps include data cleaning, transformation, extraction of important features, selection of important variables, feature iteration and splitting, and handling of missing data and categorical variables.

Feature Engineering for Machine Learning
• Feature engineering is the pre-processing step of machine learning in which raw data is transformed into features that can be used to create a predictive model.
• In other words, it is the process of selecting, extracting, and
transforming the most relevant features from the available
data to build more accurate and efficient machine learning
models.
• Feature engineering involves a set of techniques that enable us
to create new features by combining or transforming the
existing ones.
Need for Feature Engineering
• To improve the performance of machine learning models by providing them with relevant and informative input data.
• Feature engineering can also help in addressing issues such as
overfitting, underfitting, and high dimensionality.
• Feature engineering is a crucial step in preparing data for analysis and decision-making in various fields, such as finance, healthcare, marketing, and the social sciences.
Processes Involved in Feature Engineering
• Feature engineering in machine learning consists mainly of five processes:
1) Feature Creation,
2) Feature Transformation,
3) Feature Extraction,
4) Feature Selection, and
5) Feature Scaling.
• The success of a machine learning model largely depends on
the quality of the features used in the model.
Feature Creation
• Feature Creation is the process of generating new features
based on domain knowledge or by observing patterns in the
data.
• New features are created by combining existing features using operations such as addition, subtraction, and ratios, and these new features offer great flexibility.
Types of Feature Creation:
• Domain-Specific: Creating new features based on domain knowledge.
• Data-Driven: Creating new features by observing patterns in the data.
• Synthetic: Generating new features by combining existing features.
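As a minimal sketch of the three types, assuming a small hypothetical housing DataFrame (all column names here are illustrative, not from the original slides):

import pandas as pd

# hypothetical housing data; the column names are illustrative only
df = pd.DataFrame({'total_sqft': [1500, 2400, 850],
                   'num_rooms': [5, 8, 3],
                   'price': [300000, 540000, 180000],
                   'year_built': [1995, 2010, 1978]})

# domain-specific: price per square foot is a standard real-estate metric
df['price_per_sqft'] = df['price'] / df['total_sqft']

# synthetic: a ratio of two existing features
df['sqft_per_room'] = df['total_sqft'] / df['num_rooms']

# data-driven: derive the age of the building from the construction year
df['building_age'] = 2024 - df['year_built']

print(df.head())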
Benefits of Feature Creation
• Improves Model Performance
• Increases Model Robustness
• Improves Model Interpretability
• Increases Model Flexibility
2. Feature Transformation

• Feature Transformation is the process of transforming the features into a more suitable representation for the machine learning model.
Types of Feature Transformation:

• Normalization: Rescaling the features to have a similar range, such as between 0 and 1, to prevent some features from dominating others.
• Scaling: Rescaling the features to have a similar scale, such as
having a standard deviation of 1, to make sure the model
considers all features equally.
• Encoding: Transforming categorical features into a numerical
representation. Examples are one-hot encoding and label encoding.
• Transformation: Transforming the features using mathematical
operations to change the distribution or scale of the features.
Examples are logarithmic, square root, and reciprocal
transformations.
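A brief sketch of the four transformation types, using scikit-learn and pandas on hypothetical data (the income and city columns are invented for illustration):

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# hypothetical data: one skewed numeric feature, one categorical feature
df = pd.DataFrame({'income': [20000.0, 35000.0, 50000.0, 1200000.0],
                   'city': ['A', 'B', 'A', 'C']})

# normalization: rescale income into the [0, 1] range
df['income_norm'] = MinMaxScaler().fit_transform(df[['income']]).ravel()

# scaling: mean 0, standard deviation 1
df['income_std'] = StandardScaler().fit_transform(df[['income']]).ravel()

# transformation: log transform to reduce the skew
df['income_log'] = np.log(df['income'])

# encoding: one-hot encode the categorical feature
df = pd.get_dummies(df, columns=['city'])

print(df)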
3. Feature Extraction

• Feature Extraction is the process of creating new features from existing ones to provide more relevant information to the machine learning model.
• The main aim of this step is to reduce the volume of data so that it can be easily used and managed for data modelling.
• Feature extraction methods include cluster analysis, text analytics, edge detection algorithms, and principal component analysis (PCA).
Types of Feature Extraction
• Dimensionality Reduction: Reducing the number of features by transforming the data into a lower-dimensional space while retaining important information. Examples are PCA and t-SNE.
• Feature Combination: Combining two or more existing features to create a new one, for example an interaction between two features.
• Feature Aggregation: Aggregating features to create a new one, for example calculating the mean, sum, or count of a set of features.
• Feature Transformation: Transforming existing features into a new representation, for example a log transformation of a feature with a skewed distribution.
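A minimal PCA sketch with scikit-learn; the data is synthetic (ten noisy copies of two latent factors), invented to make the dimensionality reduction visible:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# synthetic data: 10 observed features driven by 2 latent factors
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(100, 10))

# standardize first, since PCA is sensitive to each feature's variance
X_scaled = StandardScaler().fit_transform(X)

# keep enough components to explain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                 # far fewer columns than the original 10
print(pca.explained_variance_ratio_)   # variance captured per component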
4. Feature Selection

• Feature selection is a way of selecting the subset of the most relevant features from the original feature set by removing redundant, irrelevant, or noisy features.
Types of Feature Selection
• Filter Method: Based on a statistical measure of the relationship between the feature and the target variable. Features with a high correlation are selected.
• Wrapper Method: Based on evaluating feature subsets with a specific machine learning algorithm. The feature subset that results in the best performance is selected.
• Embedded Method: Based on performing feature selection as part of the training process of the machine learning algorithm.
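A sketch of all three selection methods with scikit-learn, on a synthetic classification problem (the dataset and parameter choices are illustrative):

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# filter: keep the 5 features with the strongest univariate F-score
X_filter = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

# wrapper: recursive feature elimination around a logistic regression
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
X_wrapper = rfe.transform(X)

# embedded: L1 regularization zeroes out coefficients of irrelevant features
l1 = LogisticRegression(penalty='l1', solver='liblinear').fit(X, y)
kept = int((l1.coef_ != 0).sum())

print(X_filter.shape, X_wrapper.shape, kept)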
Feature Scaling

• Feature Scaling is the process of transforming the features so that they have a similar scale.
• Types of Feature Scaling:
• Min-Max Scaling: Subtracting the minimum value and dividing by the range, so that the features fall between 0 and 1.
• Standard Scaling: Rescaling the features to have a mean of 0 and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation.
• Robust Scaling: Rescaling the features to be robust to outliers by subtracting the median and dividing by the interquartile range.
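A minimal comparison of the three scalers, using a single feature with one outlier (the numbers are invented):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

print(MinMaxScaler().fit_transform(X).ravel())    # squeezed into [0, 1]
print(StandardScaler().fit_transform(X).ravel())  # mean 0, std 1
print(RobustScaler().fit_transform(X).ravel())    # median/IQR based, outlier-resistant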
Steps to Feature Engineering

• Data Cleaning (removing or correcting any errors or inconsistencies)
• Data Transformation (normalization, standardization, and log transformation)
• Feature Extraction (principal component analysis (PCA), text parsing, and image processing)
• Feature Selection (correlation analysis, mutual information, and stepwise regression)
• Feature Iteration (adding new features, removing redundant features, and transforming features in different ways; binning is the process of grouping continuous features into discrete bins)
• Feature Split (splitting a single variable into multiple variables; see the sketch below)
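A minimal sketch of binning and feature splitting with pandas (the age bins, labels, and names are invented for illustration):

import pandas as pd

df = pd.DataFrame({'age': [5, 23, 47, 71],
                   'full_name': ['Ana Silva', 'Bo Chen', 'Cara Diaz', 'Dev Rao']})

# binning: group the continuous age feature into discrete bins
df['age_group'] = pd.cut(df['age'], bins=[0, 18, 40, 65, 120],
                         labels=['child', 'young adult', 'middle-aged', 'senior'])

# feature split: break one variable into several
df[['first_name', 'last_name']] = df['full_name'].str.split(' ', n=1, expand=True)

print(df)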


Feature engineering techniques
• Missing data imputation
• Categorical encoding
• Variable transformation
• Outlier engineering
Missing data imputation

1. Complete case analysis
2. Mean / Median / Mode imputation
3. Missing Value Indicator

Complete Case Analysis for Missing Data Imputation
• Remove all the observations that contain missing values.
• This can only be used when just a few observations have missing values.
# check how many observations we would drop (data1 is the Titanic dataset as a pandas DataFrame)
print('total passengers with values in all variables: ', data1.dropna().shape[0])
print('total passengers in the Titanic: ', data1.shape[0])
# note: np.float was removed from NumPy; the built-in float gives the same result
print('percentage of data without missing values: ', data1.dropna().shape[0] / float(data1.shape[0]))
So, we have complete information for only 20% of our observations in the Titanic dataset. Thus, the Complete Case Analysis method would not be an option for this dataset.
Mean / Median / Mode for Missing Data Imputation

• Missing values can also be replaced with the mean, median, or mode of the variable (feature).
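A minimal sketch of median imputation, assuming data1 is the Titanic DataFrame used above:

# replace missing ages with the median of the observed ages
median_age = data1['Age'].median()
data1['Age'] = data1['Age'].fillna(median_age)

# confirm that no missing values remain
print(data1['Age'].isnull().sum())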

• Output: 0, meaning no null values remain in the Age feature.


Missing Value Indicator
• This technique involves adding a binary variable to indicate
whether the value is missing for a certain observation.
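A minimal sketch, assuming data1 still contains the original missing Age values (i.e. before any imputation):

import numpy as np

# binary flag: 1 where Age is missing, 0 otherwise
data1['Age_NA'] = np.where(data1['Age'].isnull(), 1, 0)

print(data1[['Age', 'Age_NA']].head())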
• Output: the Age_NA variable was created to capture the missingness of the original Age variable.
Categorical encoding in Feature Engineering

• There are multiple techniques to do so:
1. One-Hot encoding (OHE)
2. Ordinal encoding
3. Count and Frequency encoding
4. Target encoding / Mean encoding
One-Hot Encoding

• It is a commonly used technique for encoding categorical variables. It creates a binary variable for each category present in the categorical variable.
• Each binary variable takes the value 1 if the observation belongs to that category and 0 otherwise. Each new variable is called a dummy variable or binary variable.
• For a binary variable such as Sex, only one dummy variable is needed to represent the categorical variable.
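A minimal pandas sketch; drop_first=True keeps a single dummy, which is enough for a two-category variable such as Sex:

import pandas as pd

df = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male']})

dummies = pd.get_dummies(df['Sex'], drop_first=True)
print(dummies.head())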
Ordinal Encoding

• In this case, a simple way to encode is to replace the labels with some ordinal number. Look at the sample code below:
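The original sample code is not shown on the slide; a minimal sketch with a hypothetical education variable whose categories have a natural order:

import pandas as pd

df = pd.DataFrame({'education': ['primary', 'secondary', 'tertiary', 'secondary']})

# map each label to an ordinal number that reflects its rank
order = {'primary': 1, 'secondary': 2, 'tertiary': 3}
df['education_encoded'] = df['education'].map(order)

print(df)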
Count and Frequency Encoding

• In this encoding technique, categories are replaced by the count of the observations that show that category in the dataset.
• Replacement can also be done with the frequency, i.e. the proportion of observations in the dataset.
• For example, if 30 of 100 observations are male, we can replace male with 30 (count) or 0.3 (frequency).
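A minimal pandas sketch with a hypothetical city column:

import pandas as pd

df = pd.DataFrame({'city': ['A', 'B', 'A', 'A', 'C']})

# count encoding: replace each category by how often it appears
counts = df['city'].value_counts()
df['city_count'] = df['city'].map(counts)

# frequency encoding: the same counts expressed as proportions
freqs = df['city'].value_counts(normalize=True)
df['city_freq'] = df['city'].map(freqs)

print(df)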
Target / Mean Encoding

• Replace each category of a variable with the mean value of the target for the observations that show that category.
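A minimal sketch with an invented binary target; note that in practice the category means should be computed on the training set only, to avoid target leakage:

import pandas as pd

df = pd.DataFrame({'city': ['A', 'B', 'A', 'B', 'C'],
                   'target': [1, 0, 1, 1, 0]})

# mean of the target per category, mapped back onto the column
means = df.groupby('city')['target'].mean()
df['city_encoded'] = df['city'].map(means)

print(df)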
Variable Transformation

• Machine learning algorithms like linear and logistic regression assume that the variables are normally distributed.
• If a variable is not normally distributed, sometimes it is possible to find a mathematical transformation so that the transformed variable is Gaussian.
• Commonly used mathematical transformations are:
1. Logarithm transformation – log(x)
2. Square root transformation – sqrt(x)
3. Reciprocal transformation – 1 / x
4. Exponential transformation – exp(x)
• Loading the numerical features of the Titanic dataset.
• Now, to visualize the distribution of the Age variable, we will plot a histogram and a Q-Q plot.
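A minimal sketch of the diagnostic plots, assuming the Titanic Age column is available in the data1 DataFrame (the helper function name is ours, not from the slides):

import matplotlib.pyplot as plt
import scipy.stats as stats

def diagnostic_plots(df, variable):
    # histogram on the left, Q-Q plot against the normal distribution on the right
    plt.figure(figsize=(10, 4))
    plt.subplot(1, 2, 1)
    df[variable].hist(bins=30)
    plt.subplot(1, 2, 2)
    stats.probplot(df[variable].dropna(), dist='norm', plot=plt)
    plt.show()

diagnostic_plots(data1, 'Age')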
Now, let's apply each transformation and compare the transformed Age variable:
• Logarithmic transformation – log(x)
• Square root transformation – sqrt(x)
• Reciprocal transformation – 1 / x
• Exponential transformation – exp(x)
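A sketch applying each transformation and re-running the diagnostic helper from the previous snippet (assuming Age has already been imputed, and noting that Age is strictly positive, as the log and reciprocal transformations require):

import numpy as np

# logarithm transformation
data1['Age_log'] = np.log(data1['Age'])
diagnostic_plots(data1, 'Age_log')

# square root transformation
data1['Age_sqrt'] = np.sqrt(data1['Age'])
diagnostic_plots(data1, 'Age_sqrt')

# reciprocal transformation
data1['Age_reciprocal'] = 1 / data1['Age']
diagnostic_plots(data1, 'Age_reciprocal')

# exponential transformation
data1['Age_exp'] = np.exp(data1['Age'])
diagnostic_plots(data1, 'Age_exp')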
