
Unit-2

Feature Engineering:
• Introduction
• Feature Transformation
• Subset Selection
Modelling and Evaluation:
• Selecting a model
• Training model
• Model representation
• Evaluating and Improving model performance
Feature Engineering for Machine Learning

What is a Feature
• In the context of machine learning, a feature (also known as a variable or
attribute) is an individual measurable property or characteristic of a data point
that is used as input for a machine learning algorithm.
• Features can be numerical, categorical or text-based, and they represent
different aspects of the data that are relevant to the problem at hand.
• For example, in a dataset of housing prices, features could include the number
of bedrooms, the square footage, the location, and the age of the property.
• The choice and quality of features are critical in machine learning, as they can
greatly impact the accuracy and performance of the model.
Dataset features-IRIS

Petal_width  Petal_length  Sepal_width  Sepal_length  Species_name
0.2          1.4           3.5          5.1           Setosa
1.5          1.4           3.0          4.9           Versicolor
2.2          1.3           3.2          4.7           Setosa
1.2          1.5           3.1          4.6           Versicolor
0.2          1.4           3.6          5.0           Setosa
0.4          1.7           3.9          5.4           Setosa
0.3          1.4           3.4          4.6           Setosa
2.3          1.5           3.4          5.0           Versicolor
Contd..
• Feature engineering is a pre-processing step in machine learning.
• Feature engineering is the process of transforming raw data into features that
are suitable for machine learning models.
• It involves selecting relevant information from raw data and transforming it into
a format that can be easily understood by a model.
• The goal is to improve model accuracy by providing more meaningful and
relevant information.
• The success of machine learning models heavily depends on the quality of the
features used to train them.
• Feature engineering involves a set of techniques that enable us to create new
features by combining or transforming the existing ones.
• These techniques help to highlight the most important patterns and relationships
in the data, which in turn helps the machine learning model to learn from the data
more effectively.
• Features need to be engineered to improve the performance of machine
learning models by providing them with relevant and informative
input data.
• Raw data may contain noise, irrelevant information, or missing
values, which can lead to inaccurate or biased model predictions.
• By engineering features, we can extract meaningful information from
the raw data.
• Feature engineering is a crucial step in preparing data for analysis and
decision-making in various fields, such as finance, healthcare,
marketing, and social sciences.
Processes Involved in Feature Engineering

• Feature engineering in ML contains two major elements:


• Feature Transformation: transforms the data, structured or unstructured,
into a new set of features that can represent the underlying problem the ML
algorithm is trying to solve.
• Feature Subset Selection: derives a subset of features from the full feature
set that is most meaningful in the context of the ML problem.
Processes Involved in Feature Engineering

• Feature engineering in ML contains two major elements:


• Feature Transformation
• Variants:
1. Feature Construction: This process discovers missing information
about the relationships between features and augments the feature
space by creating additional features
2. Feature Extraction: It is the process of extracting a new set of
features from the original set of features using some functional
mapping
Feature Construction
Feature Creation:
• Feature Creation is the process of generating new features based on
domain knowledge or by observing patterns in the data.
• It is a form of feature engineering that can significantly improve the
performance of a machine-learning model.
Types of Feature Creation:
1.Domain-Specific: Creating new features based on domain knowledge,
such as creating features based on business rules or industry standards.
2.Data-Driven: Creating new features by observing patterns in the data,
such as calculating aggregations or creating interaction features.
3.Synthetic: Generating new features by combining existing features or
synthesizing new data points.
• Creating features means deriving new variables that will be most
helpful for our model; this can involve adding or removing some features.
Example: adding a cost per sq. ft column is a feature creation.
• Below are the prices of properties in x city. It shows the area of the
house and total price.
Contd..
• The data may have some errors or may be incorrect, since not all sources on the
internet are reliable. To begin, we'll add a new column to display the cost per
square foot.

• This new feature will help us understand a lot about our data. So, we have a new
column which shows cost per square ft.
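A minimal sketch of this feature creation step using pandas is given below. The column names (area_sqft, total_price) and the values are assumed for illustration, since the slide's original table is not reproduced here.

import pandas as pd

# Hypothetical property data; column names and values are assumed for illustration
df = pd.DataFrame({
    "area_sqft": [1000, 1500, 1200, 2000],
    "total_price": [5000000, 7200000, 6100000, 9500000],
})

# Feature creation: derive cost per square foot from the two existing columns
df["cost_per_sqft"] = df["total_price"] / df["area_sqft"]
print(df)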
Contd..
Benefits of Feature Creation:
1.Improves Model Performance: By providing additional and more
relevant information to the model, feature creation can increase the
accuracy and precision of the model.
2.Increases Model Robustness: By adding additional features, the model
can become more robust to outliers and other anomalies.
3.Improves Model Interpretability: By creating new features, it can be
easier to understand the model’s predictions.
4.Increases Model Flexibility: By adding new features, the model can be
made more flexible to handle different types of data.
Feature Construction is an essential activity
Feature Construction: Encoding nominal variables
Feature Construction: Encoding categorical(ordinal) variables
Feature Construction: Encoding categorical variables
One-Hot Encoding:
• One-hot encoding is a technique used to transform categorical variables
into numerical values that can be used by machine learning models.
• In this technique, each category is transformed into a binary value
indicating its presence or absence.
• For example, consider a categorical variable “Colour” with three
categories: Red, Green, and Blue.
• One-hot encoding would transform this variable into three binary
variables: Colour_Red, Colour_Green, and Colour_Blue, where the value
of each variable would be 1 if the corresponding category is present and
0 otherwise.
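A minimal sketch of one-hot encoding with pandas, using the Colour example above (data values assumed), is shown below.

import pandas as pd

# Sample data for the "Colour" variable from the example above
df = pd.DataFrame({"Colour": ["Red", "Green", "Blue", "Green"]})

# get_dummies creates one binary column per category:
# Colour_Blue, Colour_Green, Colour_Red
encoded = pd.get_dummies(df, columns=["Colour"], prefix="Colour")
print(encoded)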
Feature Construction: Encoding numeric to categorical (ordinal) variables
Feature Construction: Text-specific data (Bag-of-Words)

Vectorization Process for Text Corpus


1. Tokenize
2. Count
3. Normalize

Document-Term Matrix
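A minimal sketch of the tokenize and count steps with scikit-learn's CountVectorizer is given below; the two-sentence corpus is assumed for illustration. The normalize step (e.g. TF-IDF weighting) can be obtained with TfidfVectorizer instead.

from sklearn.feature_extraction.text import CountVectorizer

# Tiny illustrative corpus (not from the slides)
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Tokenize each document and count term occurrences, producing a
# document-term matrix (rows = documents, columns = vocabulary terms)
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())
print(dtm.toarray())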
General Types of Feature Transformation:
1.Normalization: Rescaling the features to have a similar range, such as
between 0 and 1, to prevent some features from dominating others.
2.Scaling: Rescaling the features to have a similar scale, such as having a
standard deviation of 1, to make sure the model considers all features
equally.
3.Encoding: Transforming categorical features into a numerical
representation. Examples are one-hot encoding and label encoding.
4.Transformation: Transforming the features using mathematical operations
to change the distribution or scale of the features. Examples are
logarithmic, square root, and reciprocal transformations.
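A minimal sketch of normalization, scaling and a log transformation with scikit-learn and NumPy is given below (encoding was illustrated earlier); the toy feature matrix is assumed for illustration.

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy feature matrix with two features on very different scales (assumed values)
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Normalization: rescale each feature to the [0, 1] range
X_minmax = MinMaxScaler().fit_transform(X)

# Scaling: standardize each feature to mean 0 and standard deviation 1
X_std = StandardScaler().fit_transform(X)

# Transformation: apply a mathematical function, e.g. a log transform
X_log = np.log1p(X)

print(X_minmax, X_std, X_log, sep="\n\n")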
Contd..
Benefits of Feature Transformation:

1.Improves Model Performance: By transforming the features into a


more suitable representation, the model can learn more meaningful
patterns in the data.
2.Increases Model Robustness: Transforming the features can make the
model more robust to outliers and other anomalies.
3.Improves Computational Efficiency: The transformed features often
require fewer computational resources.
4.Improves Model Interpretability: By transforming the features, it can
be easier to understand the model’s predictions.
Feature Extraction in ML
Feature Extraction Examples used in ML
Feature Extraction Algorithms used in ML

• Principal Component Analysis (PCA)


• Singular Value Decomposition (SVD)
• Linear Discriminant Analysis (LDA)
Principal Component Analysis
• In PCA, a new set of features is extracted from the original features; these new features are quite dissimilar
in nature.
• So, an n-dimensional feature space gets transformed to an m-dimensional feature space, where
the dimensions are orthogonal to each other, i.e. completely independent of each other.

• A vector is a quantity having both magnitude and direction and hence can determine the position
of a point relative to another point in the Euclidean space.
• A vector space is a set of vectors.
• Vector spaces have a property that they can be represented as a linear combination of smaller set
of vectors, called basis vectors.
• So, any vector 'v' in a vector space can be represented as a linear combination of the basis vectors:
v = a1·u1 + a2·u2 + … + an·un, where a1, …, an are 'n' scalars and u1, …, un are the basis vectors.
Principal Component Analysis

• Let us extend this notion to the feature space of a data set


• The feature vector can be transformed to a vector space of the basis vectors
which are termed as principal components.
• A set of feature vectors that have similarity with each other is transformed to a
set of principal components that are completely unrelated
• The principal components capture the variability of the original feature space
• The number of components derived is much smaller than the number of original
features.
Principal Component Analysis
Principal Component Analysis: Steps

https://www.kdnuggets.com/2023/05/principal-component-analysis-pca-scikitlearn.html
Principal Component Analysis

https://www.geeksforgeeks.org/covariance-matrix/
Principal Component Analysis: Steps
*Note: Standardize the features of the dataset by removing the mean and
scaling to unit variance, so that each feature has μ = 0 and σ = 1.
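Following the steps in the linked article, a minimal PCA sketch with scikit-learn on the Iris dataset is given below; the choice of 2 components is an assumption for illustration.

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardize the Iris features (mean 0, standard deviation 1), as in the note above
X, _ = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# Project the 4-dimensional feature space onto 2 orthogonal principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

print(X_pca.shape)                    # (150, 2)
print(pca.explained_variance_ratio_)  # variability captured by each component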
Singular Value Decomposition (SVD)

• SVD is a matrix factorization technique commonly used in linear


algebra.
• SVD of a matrix A (m × n) is a factorization of the form A = U Σ V^T, where U is an m × m orthogonal matrix of left-singular vectors, Σ is an m × n diagonal matrix of singular values, and V is an n × n orthogonal matrix of right-singular vectors.
Singular Value Decomposition (SVD)

• When the dataset is sparse (as in case of text data), it is not advisable to
remove the mean of a data attribute.
• SVD is a good choice for dimensionality reduction in those situations
than PCA.

https://machinelearningmastery.com/singular-value-decomposition-for-machine-learning/
Singular Value Decomposition (SVD)

SVD on text Source Code
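The slide's original source code is not reproduced here; a minimal sketch with scikit-learn's TruncatedSVD on a small assumed text corpus is shown instead.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

# Tiny assumed corpus for illustration
corpus = [
    "machine learning extracts features from data",
    "feature engineering improves machine learning models",
    "singular value decomposition reduces dimensionality",
]

# Build a sparse document-term matrix; TruncatedSVD works on it directly
# without removing the mean, so the matrix stays sparse
dtm = CountVectorizer().fit_transform(corpus)

svd = TruncatedSVD(n_components=2)
reduced = svd.fit_transform(dtm)
print(reduced.shape)  # (3, 2): each document represented by 2 latent components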


Linear Discriminant Analysis (LDA)

• LDA is another commonly used feature extraction technique like PCA or SVD.
• The objective of LDA is to transform a dataset into a lower dimensional feature
space
• The focus of LDA is not to capture the dataset variability.
• Instead, LDA focuses on class separability, i.e. separating the samples based on
their class labels, so as to avoid overfitting of the machine learning model.
• LDA calculates eigenvalues and eigenvectors of the intra-class (within-class) and inter-class
(between-class) scatter matrices.

https://www.statology.org/scree-plot-python/
Linear Discriminant Analysis (LDA)
Steps to be followed are given below:
1. Calculate the mean vectors for the individual classes
2. Calculate intra-class and inter-class scatter matrices
3. Calculate eigenvalues and eigenvectors for Sw and SB, where Sw is the intra-class scatter matrix and SB is
the inter-class scatter matrix
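A minimal LDA sketch with scikit-learn on the Iris dataset is given below; with 3 classes, at most 2 discriminant components can be extracted.

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Iris data: 4 features, 3 classes
X, y = load_iris(return_X_y=True)

# LDA uses the class labels to find directions that maximize class separability
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X_lda.shape)                    # (150, 2)
print(lda.explained_variance_ratio_)  # separability captured by each component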
Feature Selection

Task: predicting weights of students


Issues in High-dimensional data
Objective of Feature Selection
Key drivers of Feature Selection
• Feature Relevance
• Redundancy in Features
Measures of Feature Relevance
• Mutual Information
• Entropy of feature
Measures of Feature Relevance
• Entropy of Feature: Shannon's formula: H(X) = − Σ p(x) · log2 p(x), where the sum runs over the possible values x of the feature and p(x) is the probability of value x.
Measures of Feature Relevance
• Mutual Information: the higher the mutual information of a feature with the class variable, the more relevant that
feature is.
Measures of Feature Relevance
• For supervised learning, mutual information is considered
a good measure
• For unsupervised learning, there is no class variable, hence
feature-to-class mutual information cannot be used as a measure.
• Instead, the entropy of the set of features is calculated with one feature
left out at a time, for each feature in turn.
• Features are ranked in descending order of the information gain*
contributed by each feature, and the top 'p%' are considered relevant features.

https://medium.com/@ompramod9921/decision-trees-6a3c05e9cb82
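A minimal sketch of ranking features by their mutual information with the class variable, using scikit-learn on the Iris dataset (used here only as an example), is given below.

from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

# Mutual information between each feature and the class variable:
# higher values indicate more relevant features (supervised setting)
data = load_iris()
mi = mutual_info_classif(data.data, data.target, random_state=0)

for name, score in sorted(zip(data.feature_names, mi), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")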
Measures of Feature redundancy

• Feature redundancy is based on similar information contribution by


multiple features.
• Measures of similarity of information contribution:
1. Correlation-based measures
2. Distance-based measures
3. Other coefficient-based measure
Measures of Feature redundancy
• Correlation-based measure: It is a measure of linear dependency between two
random variables.
• Pearson’s product moment correlation coefficient for two feature variables F1
and F2 is defined as:
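A minimal sketch of checking redundancy between two feature columns via Pearson's correlation with pandas is shown below; the column names and values are assumed for illustration.

import pandas as pd

# Two hypothetical feature columns (values assumed for illustration)
df = pd.DataFrame({
    "F1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "F2": [2.1, 3.9, 6.2, 8.0, 9.8],
})

# Values of r near +1 or -1 indicate a strong linear dependency,
# i.e. the two features carry largely redundant information
r = df["F1"].corr(df["F2"], method="pearson")
print(round(r, 3))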
Measures of Feature redundancy
• Distance-based measures:
• Euclidean distance
• Minkowski distance
• Manhattan distance
• Hamming distance
Measures of Feature redundancy
• Distance-based measures:
• Euclidean distance is the most common distance measure between two feature
variables F1 and F2 (for n rows), defined as: d(F1, F2) = sqrt( Σ i=1..n (F1i − F2i)² )
Measures of Feature redundancy
• Euclidean Distance: Example
Measures of Feature redundancy
• Euclidean Distance: Example
Measures of Feature redundancy
• Distance-based measures:
Measures of Feature redundancy
• Hamming Distance: A special case of Manhattan distance is the Hamming
distance which measures the distance between binary vectors.
• Example: Hamming distance between 01101011 and 11001001 is 3
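A minimal sketch of these distance measures with SciPy is given below; the two feature vectors are assumed for illustration, and the binary vectors are taken from the Hamming example above.

from scipy.spatial import distance

# Two hypothetical feature vectors (assumed values, n = 4 rows)
f1 = [2.0, 4.0, 6.0, 8.0]
f2 = [1.0, 5.0, 7.0, 6.0]

print(distance.euclidean(f1, f2))       # Euclidean distance
print(distance.cityblock(f1, f2))       # Manhattan distance
print(distance.minkowski(f1, f2, p=3))  # Minkowski distance of order 3

# Hamming distance between the binary vectors from the example above;
# SciPy returns the fraction of differing positions, so multiply by the length
b1 = [0, 1, 1, 0, 1, 0, 1, 1]
b2 = [1, 1, 0, 0, 1, 0, 0, 1]
print(distance.hamming(b1, b2) * len(b1))  # 3.0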
Measures of Feature redundancy
• Other Distance based measures:
• Jaccard index/coefficient is a measure of similarity between two features: J = n11 / (n01 + n10 + n11),
where n11 counts the attributes where both features are 1 and n01, n10 count the disagreements. Jaccard distance is
a measure of dissimilarity between two features, the complement of the Jaccard index: Jaccard distance = 1 − J.
Measures of Feature redundancy
• Jaccard index/coefficient: Example
Measures of Feature redundancy
• Simple Matching Coefficient (SMC): SMC = (n11 + n00) / (n00 + n01 + n10 + n11), i.e. the proportion of matching attribute values (both 1 or both 0) out of all attributes.
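A minimal sketch computing the Jaccard index and the Simple Matching Coefficient for two binary features with NumPy is shown below; the binary vectors are assumed for illustration.

import numpy as np

# Two hypothetical binary feature vectors
a = np.array([1, 0, 1, 1, 0, 0])
b = np.array([1, 1, 1, 0, 0, 0])

n11 = np.sum((a == 1) & (b == 1))  # both 1
n00 = np.sum((a == 0) & (b == 0))  # both 0

jaccard = n11 / (len(a) - n00)   # ignores 0-0 matches
smc = (n11 + n00) / len(a)       # counts both 1-1 and 0-0 matches

print(jaccard, 1 - jaccard)  # Jaccard index and Jaccard distance
print(smc)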
Measures of Feature redundancy
• Cosine Similarity: It is the most popular measure in text classification.
• It measures the angle between two vectors.
• Cosine similarity = 1 => x and y are completely similar (the vectors point in the same direction).
• Cosine similarity = 0 => x and y do not share any similarity (the vectors are orthogonal).

Cosine similarity between two features x and y is given by:
cos(θ) = (x · y) / (||x|| · ||y||), where x · y is the dot product and ||x||, ||y|| are the magnitudes of the vectors.

Measures of Feature redundancy
• Cosine Similarity:
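A minimal cosine similarity sketch with scikit-learn is given below; the two count vectors are assumed for illustration (e.g. word counts of two short documents).

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two hypothetical term-count vectors (assumed values)
x = np.array([[3, 0, 2, 1]])
y = np.array([[1, 1, 2, 0]])

# 1 => the vectors point in the same direction; 0 => they are orthogonal
print(cosine_similarity(x, y)[0, 0])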
Feature Selection process
Types of approaches for Feature Selection
Filter approach for Feature Selection:

• It is based on statistical measures like Pearson’s correlation, ANOVA,


Information Gain, Fisher score, Chi-square etc.
• No learning algorithm is employed to evaluate the goodness of the selected
features.
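A minimal sketch of the filter approach with scikit-learn's SelectKBest (chi-square test, k = 2 assumed) on the Iris dataset is given below.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Rank features with a statistical test (chi-square) and keep the top k;
# no learning algorithm is involved in the selection itself
X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)

print(selector.scores_)  # chi-square score per feature
print(X_selected.shape)  # (150, 2): only the 2 highest-scoring features kept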
Wrapper approach for Feature Selection:
• An inductive learning algorithm is employed to evaluate the goodness of the
selected features.
• In this approach, for every candidate subset, the learning model is trained and the
result is evaluated by running the learning algorithm.
• It is computationally expensive but generally has better performance.
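A minimal sketch of the wrapper approach using recursive feature elimination (RFE) with a logistic regression estimator is given below; the choice of estimator and of the number of features to keep are assumptions for illustration.

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# A learning algorithm is trained repeatedly to evaluate candidate feature
# subsets; RFE drops the weakest feature at each step until 2 remain
X, y = load_iris(return_X_y=True)
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)

print(rfe.support_)  # boolean mask of selected features
print(rfe.ranking_)  # 1 = selected; larger numbers were eliminated earlier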
Hybrid approach for Feature Selection:
Embedded approach for Feature Selection:
