Unit-2

Feature Engineering:
• Introduction
• Feature Transformation
• Subset Selection
Modelling and Evaluation:
• Selecting a model
• Training model
• Model representation
• Evaluating and Improving model performance
Feature Engineering for Machine Learning

What is a Feature
• In the context of machine learning, a feature (also known as a variable or
attribute) is an individual measurable property or characteristic of a data point
that is used as input for a machine learning algorithm.
• Features can be numerical, categorical or text-based, and they represent
different aspects of the data that are relevant to the problem at hand.
• For example, in a dataset of housing prices, features could include the number
of bedrooms, the square footage, the location, and the age of the property.
• The choice and quality of features are critical in machine learning, as they can
greatly impact the accuracy and performance of the model.
Dataset features-IRIS

Petal_width  Petal_length  Sepal_width  Sepal_length  Species_name
0.2          1.4           3.5          5.1           Setosa
1.5          1.4           3.0          4.9           Versicolor
2.2          1.3           3.2          4.7           Setosa
1.2          1.5           3.1          4.6           Versicolor
0.2          1.4           3.6          5.0           Setosa
0.4          1.7           3.9          5.4           Setosa
0.3          1.4           3.4          4.6           Setosa
2.3          1.5           3.4          5.0           Versicolor
Contd..
• Feature engineering is the pre-processing step of machine learning.
• Feature engineering is the process of transforming raw data into features that
are suitable for machine learning models.
• It involves selecting relevant information from raw data and transforming it into
a format that can be easily understood by a model.
• The goal is to improve model accuracy by providing more meaningful and
relevant information.
• The success of machine learning models heavily depends on the quality of the
features used to train them.
• Feature engineering involves a set of techniques that enable us to create new
features by combining or transforming the existing ones.
• These techniques help to highlight the most important patterns and relationships
in the data, which in turn helps the machine learning model to learn from the data
more effectively.
• Features are engineered to improve the performance of machine
learning models by providing them with relevant and informative
input data.
• Raw data may contain noise, irrelevant information, or missing
values, which can lead to inaccurate or biased model predictions.
• By engineering features, we can extract meaningful information from
the raw data.
• Feature engineering is a crucial step in preparing data for analysis and
decision-making in various fields, such as finance, healthcare,
marketing, and social sciences.
Processes Involved in Feature Engineering

• Feature engineering in ML contains two major elements:


• Feature Transformation: transforms the data, structured or unstructured,
into a new set of features that can represent the underlying problem the ML
model is trying to solve.
• Feature Subset Selection: its objective is to derive a subset of features from
the full feature set that is most meaningful in the context of the ML problem.
Processes Involved in Feature Engineering

• Feature engineering in ML contains two major elements:


• Feature Transformation
• Variants:
1. Feature Construction: This process discovers missing information
about the relationships between features and augments the feature
space by creating additional features
2. Feature Extraction: It is the process of extracting a new set of
features from the original set of features using some functional
mapping
Feature Construction
Feature Creation:
• Feature Creation is the process of generating new features based on
domain knowledge or by observing patterns in the data.
• It is a form of feature engineering that can significantly improve the
performance of a machine-learning model.
Types of Feature Creation:
1. Domain-Specific: Creating new features based on domain knowledge,
such as creating features based on business rules or industry standards.
2. Data-Driven: Creating new features by observing patterns in the data,
such as calculating aggregations or creating interaction features.
3. Synthetic: Generating new features by combining existing features or
synthesizing new data points.
• Creating features means deriving new variables that will be most
helpful for the model; this can involve adding or removing features.
Example: the cost per sq. ft column below is a created feature.
• Below are the prices of properties in x city, showing the area of each
house and its total price.
Contd..
• The data may contain errors or inaccuracies, since not all sources on the
internet are reliable. To begin, we add a new column to show the cost per
square foot.

• This new feature helps us understand a lot about the data: we now have a
column showing the cost per square foot.
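
A minimal sketch of this feature-creation step with pandas; the column names and property values below are illustrative, not taken from the original table:

```python
import pandas as pd

# Illustrative property data: area in square feet and total price
df = pd.DataFrame({
    "area_sqft": [1200, 1500, 900, 2000],
    "total_price": [300000, 390000, 225000, 520000],
})

# Feature creation: derive cost per square foot from existing columns
df["cost_per_sqft"] = df["total_price"] / df["area_sqft"]
print(df)
```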
Contd..
Benefits of Feature Creation:
1. Improves Model Performance: By providing additional and more
relevant information to the model, feature creation can increase the
accuracy and precision of the model.
2. Increases Model Robustness: By adding additional features, the model
can become more robust to outliers and other anomalies.
3. Improves Model Interpretability: By creating new features, it can be
easier to understand the model’s predictions.
4. Increases Model Flexibility: By adding new features, the model can be
made more flexible to handle different types of data.
Feature Construction is an essential activity
Feature Construction: Encoding nominal variables
Feature Construction: Encoding categorical (ordinal) variables
Feature Construction: Encoding categorical variables
One-Hot Encoding:
• One-hot encoding is a technique used to transform categorical variables
into numerical values that can be used by machine learning models.
• In this technique, each category is transformed into a binary value
indicating its presence or absence.
• For example, consider a categorical variable “Colour” with three
categories: Red, Green, and Blue.
• One-hot encoding would transform this variable into three binary
variables: Colour_Red, Colour_Green, and Colour_Blue, where the value
of each variable would be 1 if the corresponding category is present and
0 otherwise.
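
A minimal sketch of one-hot encoding the Colour example with pandas; pandas.get_dummies is one common way, and scikit-learn's OneHotEncoder is another:

```python
import pandas as pd

df = pd.DataFrame({"Colour": ["Red", "Green", "Blue", "Green"]})

# One-hot encode: one binary column per category
encoded = pd.get_dummies(df, columns=["Colour"])
print(encoded)
# Columns produced: Colour_Blue, Colour_Green, Colour_Red
```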
Feature Construction: Encoding numeric to categorical (ordinal) variables
Feature Construction: Text-specific data (Bag-of-Words)

Vectorization Process for Text Corpus


1. Tokenize
2. Count
3. Normalize

Document-Term Matrix
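
A minimal sketch of the tokenize and count steps with scikit-learn's CountVectorizer (assuming a recent scikit-learn version); the corpus is illustrative, and the normalize step can be added with TfidfVectorizer:

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Tokenize and count: build the document-term matrix
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # vocabulary (terms)
print(dtm.toarray())                       # term counts per document
```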
General Types of Feature Transformation:
1. Normalization: Rescaling the features to have a similar range, such as
between 0 and 1, to prevent some features from dominating others.
2. Scaling: Rescaling the features to have a similar scale, such as a
standard deviation of 1, to make sure the model considers all features
equally.
3. Encoding: Transforming categorical features into a numerical
representation. Examples are one-hot encoding and label encoding.
4. Transformation: Transforming the features using mathematical operations
to change the distribution or scale of the features. Examples are
logarithmic, square root, and reciprocal transformations.
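
A minimal sketch of these four transformation types with scikit-learn, pandas, and numpy; the small feature table is illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

df = pd.DataFrame({"area": [900, 1500, 2400],
                   "price": [120000, 300000, 810000],
                   "city": ["A", "B", "A"]})

# 1. Normalization: rescale numeric features to the [0, 1] range
norm = MinMaxScaler().fit_transform(df[["area", "price"]])

# 2. Scaling: standardize to zero mean and unit standard deviation
scaled = StandardScaler().fit_transform(df[["area", "price"]])

# 3. Encoding: turn the categorical 'city' column into numeric columns
encoded = pd.get_dummies(df["city"], prefix="city")

# 4. Transformation: log-transform the skewed 'price' feature
log_price = np.log1p(df["price"])
```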
Contd..
Benefits of Feature Transformation:

1. Improves Model Performance: By transforming the features into a
more suitable representation, the model can learn more meaningful
patterns in the data.
2. Increases Model Robustness: Transforming the features can make the
model more robust to outliers and other anomalies.
3. Improves Computational Efficiency: The transformed features often
require fewer computational resources.
4. Improves Model Interpretability: By transforming the features, it can
be easier to understand the model’s predictions.
Feature Extraction in ML
Feature Extraction Examples used in ML
Feature Extraction Algorithms used in ML

• Principal Component Analysis (PCA)


• Singular Value Decomposition (SVD)
• Linear Discriminant Analysis (LDA)
Principal Component Analysis
• In PCA, a new set of features is extracted from the original features; the new features are quite
dissimilar to one another.
• So, an n-dimensional feature space gets transformed to an m-dimensional feature space, where
the dimensions are orthogonal to each other, i.e. completely independent of each other.

• A vector is a quantity having both magnitude and direction and hence can determine the position
of a point relative to another point in the Euclidean space.
• A vector space is a set of vectors.
• Vector spaces have the property that every vector in them can be represented as a linear
combination of a smaller set of vectors, called basis vectors.
• So, any vector v in an n-dimensional vector space can be written as
v = a1·u1 + a2·u2 + … + an·un, where a1, …, an are scalars and u1, …, un are the basis vectors.
Principal Component Analysis

• Let us extend this notion to the feature space of a data set


• The feature vector can be transformed to a vector space of the basis vectors
which are termed as principal components.
• A set of feature vectors that are similar to each other is transformed into a
set of principal components that are completely unrelated.
• The principal components capture the variability of the original feature space.
• The number of components derived is much smaller than the original set of
features.
Principal Component Analysis
Principal Component Analysis: Steps

https://www.kdnuggets.com/2023/05/principal-component-analysis-pca-scikitlearn.html
Principal Component Analysis

https://www.geeksforgeeks.org/covariance-matrix/
Principal Component Analysis: Steps
Note: Standardize the features of the dataset by removing the mean and
scaling to unit variance, so that each feature has μ = 0 and σ = 1.
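
A minimal sketch of these steps with scikit-learn, in the spirit of the KDnuggets reference above; the iris data and the choice of two components are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

# Step 1: standardize so each feature has mean 0 and standard deviation 1
X_std = StandardScaler().fit_transform(X)

# Remaining steps (covariance matrix, eigen decomposition, projection)
# are handled internally by scikit-learn's PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

print(pca.explained_variance_ratio_)  # variance captured by each component
```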
Singular Value Decomposition (SVD)

• SVD is a matrix factorization technique commonly used in linear algebra.
• The SVD of an m × n matrix A is a factorization of the form A = U Σ Vᵀ, where U is an
m × m orthogonal matrix, Σ is an m × n diagonal matrix of singular values, and V is an
n × n orthogonal matrix.
Singular Value Decomposition (SVD)

• When the dataset is sparse (as in the case of text data), it is not advisable to
remove the mean of a data attribute.
• In those situations, SVD is a better choice for dimensionality reduction than PCA.

https://machinelearningmastery.com/singular-value-decomposition-for-machine-learning/
Singular Value Decomposition (SVD)

SVD on text Source Code
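
A minimal sketch of applying SVD to text with scikit-learn's TruncatedSVD, which works directly on the sparse document-term matrix without mean-centering; the corpus and component count are illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

corpus = [
    "machine learning models need good features",
    "feature engineering improves model accuracy",
    "raw data is transformed into features",
]

# Sparse document-term matrix (no mean removal, so sparsity is preserved)
dtm = CountVectorizer().fit_transform(corpus)

# Truncated SVD reduces the term dimensions to a small number of components
svd = TruncatedSVD(n_components=2, random_state=42)
reduced = svd.fit_transform(dtm)
print(reduced.shape)  # (3, 2)
```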


Linear Discriminant Analysis (LDA)

• LDA is another commonly used feature extraction technique, like PCA or SVD.
• The objective of LDA is to transform a dataset into a lower-dimensional feature
space.
• The focus of LDA is not to capture the dataset's variability.
• Instead, LDA focuses on class separability, i.e. it separates the features based on
how well they discriminate between classes, which helps avoid overfitting of the
machine learning model.
• LDA calculates eigenvalues and eigenvectors of the within-class and between-class
scatter matrices.

https://www.statology.org/scree-plot-python/
Linear Discriminant Analysis (LDA)
Steps to be followed are given below:
1. Calculate the mean vectors for the individual classes.
2. Calculate the intra-class (within-class) and inter-class (between-class) scatter matrices.
3. Calculate the eigenvalues and eigenvectors for SW and SB, where SW is the intra-class scatter matrix and SB is
the inter-class scatter matrix.
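
In practice these steps are wrapped up by scikit-learn's LinearDiscriminantAnalysis; a minimal sketch on the iris data, with an illustrative component count:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# LDA projects the data onto directions that maximize class separability;
# with 3 classes, at most 2 discriminant components are available
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)
print(X_lda.shape)  # (150, 2)
```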
Feature Selection

Task: predicting weights of students


Issues in High-dimensional data
Objective of Feature Selection
Key drivers of Feature Selection
• Feature Relevance
• Redundancy in Features
Measures of Feature Relevance
• Mutual Information
• Entropy of feature
Measures of Feature Relevance
• Entropy of a Feature (Shannon's formula): H(F) = − Σ p_i log₂ p_i, where p_i is the probability of the i-th value of the feature.
Measures of Feature Relevance
• Mutual Information: the higher the mutual information of a feature with the class variable, the more
relevant that feature is. For a class variable C and feature F, it can be computed as MI(C, F) = H(C) − H(C | F).
Measures of Feature Relevance
• For supervised learning, mutual information between a feature and
the class variable is considered a good measure of relevance.
• For unsupervised learning, there is no class variable, hence
feature-to-class mutual information cannot be used.
• Instead, the entropy of the set of features is calculated with one
feature left out at a time, for every feature.
• Features are ranked in descending order of the information gain*
obtained from each feature, and the top ‘p%’ are considered the relevant features.

https://medium.com/@ompramod9921/decision-trees-6a3c05e9cb82
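
For supervised problems, a minimal sketch of ranking features by mutual information with scikit-learn's mutual_info_classif, using the iris dataset introduced earlier as an illustrative example:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

data = load_iris()
X, y = data.data, data.target

# Mutual information between each feature and the class variable;
# higher values indicate more relevant features
mi = mutual_info_classif(X, y, random_state=42)
for name, score in sorted(zip(data.feature_names, mi), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```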
Measures of Feature redundancy

• Feature redundancy is based on similar information contribution by


multiple features.
• Measures of similarity of information contribution:
1. Correlation-based measures
2. Distance-based measures
3. Other coefficient-based measures
Measures of Feature redundancy
• Correlation-based measure: correlation is a measure of linear dependency between two
random variables.
• Pearson’s product-moment correlation coefficient for two feature variables F1
and F2 (over n rows) is defined as:
r(F1, F2) = Σ (F1_i − mean(F1)) (F2_i − mean(F2)) / √[ Σ (F1_i − mean(F1))² · Σ (F2_i − mean(F2))² ]
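
A minimal sketch of checking pairwise feature correlation with pandas; two highly correlated features would be redundant, and the small data frame below is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "F1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "F2": [2.1, 3.9, 6.2, 8.1, 9.8],   # nearly a linear function of F1
    "F3": [5.0, 1.0, 4.0, 2.0, 3.0],
})

# Pearson correlation matrix; values close to +1/-1 indicate redundancy
print(df.corr(method="pearson"))
```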
Measures of Feature redundancy
• Distance-based measures:
• Euclidean distance
• Minkowski distance
• Manhattan distance
• Hamming distance
Measures of Feature redundancy
• Distance-based measures:
• Euclidean distance is the most common distance measure; between two random
feature variables F1 and F2 (over n rows) it is defined as:
d(F1, F2) = √[ Σ_{i=1}^{n} (F1_i − F2_i)² ]
Measures of Feature redundancy
• Euclidean Distance: Example
Measures of Feature redundancy
• Distance-based measures:
Measures of Feature redundancy
• Hamming Distance: A special case of Manhattan distance is the Hamming
distance which measures the distance between binary vectors.
• Example: Hamming distance between 01101011 and 11001001 is 3
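
A minimal sketch of these distance measures using scipy; the two feature vectors are illustrative, and note that scipy's hamming returns a fraction, so it is multiplied by the vector length to recover the count:

```python
import numpy as np
from scipy.spatial import distance

f1 = np.array([1.0, 2.0, 3.0, 4.0])
f2 = np.array([2.0, 4.0, 6.0, 8.0])

print(distance.euclidean(f1, f2))       # Euclidean distance
print(distance.cityblock(f1, f2))       # Manhattan distance
print(distance.minkowski(f1, f2, p=3))  # Minkowski distance with p = 3

# Hamming distance between the binary strings from the example above
a = [int(ch) for ch in "01101011"]
b = [int(ch) for ch in "11001001"]
print(int(distance.hamming(a, b) * len(a)))  # 3
```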
Measures of Feature redundancy
• Other Distance based measures:
• Jaccard index/coefficient is a measure of similarity between two features; for two binary feature
vectors it is J = f11 / (f01 + f10 + f11), the number of positions where both are 1 divided by the
number of positions where at least one is 1. Jaccard distance, a measure of dissimilarity between
two features, is its complement: 1 − J.
Measures of Feature redundancy
• Jaccard index/coefficient: Example
Measures of Feature redundancy
• Simple Matching Coefficient (SMC): for two binary feature vectors, SMC = (f11 + f00) / (f00 + f01 + f10 + f11), i.e. the proportion of positions where the two vectors match (unlike the Jaccard index, it also counts 0–0 matches).
Measures of Feature redundancy
• Cosine Similarity: the most popular similarity measure in text classification.
• It measures the cosine of the angle between two vectors.
• Cosine similarity = 1 => x and y point in the same direction (maximally similar).
• Cosine similarity = 0 => x and y are orthogonal and do not share any similarity.

Cosine similarity between two features x and y is given by cos(x, y) = (x · y) / (‖x‖ · ‖y‖).
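
A minimal sketch of cosine similarity with numpy; the two vectors are illustrative:

```python
import numpy as np

def cosine_similarity(x, y):
    # cos(x, y) = (x . y) / (||x|| * ||y||)
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([3.0, 2.0, 0.0, 5.0])
y = np.array([1.0, 0.0, 0.0, 0.0])
print(cosine_similarity(x, y))  # approx 0.49
```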


Measures of Feature redundancy
• Cosine Similarity:
Feature Selection process
Types of approaches for Feature Selection
Filter approach for Feature Selection:

• It is based on statistical measures such as Pearson’s correlation, ANOVA,
information gain, Fisher score, and chi-square.
• No learning algorithm is employed to evaluate the goodness of the selected
features.
Wrapper approach for Feature Selection:
• An inductive learning algorithm is employed to evaluate the goodness of the
selected feature subset.
• In this approach, for every candidate subset, the learning model is trained and
the result is evaluated by running the learning algorithm.
• It is computationally expensive but generally gives better performance.
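
A minimal sketch contrasting a filter method (SelectKBest with a statistical score) and a wrapper method (recursive feature elimination around a learning algorithm) using scikit-learn; the iris data, k = 2, and the logistic regression estimator are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Filter approach: score features with the ANOVA F-test, no model involved
filt = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print("Filter scores:", filt.scores_)

# Wrapper approach: repeatedly train a model and drop the weakest feature
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=2).fit(X, y)
print("Wrapper-selected mask:", rfe.support_)
```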
Hybrid approach for Feature Selection:
Embedded approach for Feature Selection:
