Lecture 14


Feature Selection

Prof. Subir Kumar Das, Dept. of CSE


Generalization Error
• In supervised learning, the main goal is to use training data to build a
model that will be able to make accurate predictions based on new,
unseen data, which has the same characteristics as the initial training set.
• This is known as generalization.
• Generalization relates to how effectively the concepts learned by a
machine learning model apply to particular examples that were not used
throughout the training.
• To train a machine learning model, the dataset is split into 3 sets: training,
validation, and testing.
• Models are trained on the training data, then compared and tuned using their evaluation results on the validation set; in the end, the performance of the best model is evaluated on the testing set (see the sketch after this list).
• The error rate on new cases is called the generalization error (or out-of-sample error); evaluating the models on the validation set gives an estimate of this error.
• A model’s generalization error (also known as prediction error) can be expressed as the sum of three very different errors: bias error, variance error, and irreducible error.
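As a minimal sketch (assuming scikit-learn; the dataset and model below are illustrative and not part of the lecture), the three-way split and the estimation of the generalization error can look like this:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative dataset and model, not from the lecture.
X, y = load_breast_cancer(return_X_y=True)

# First split off a test set, then carve a validation set out of the remainder
# (roughly 60% train / 20% validation / 20% test).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# The validation error estimates the generalization (out-of-sample) error and is
# used to compare and tune models; the test set gives the final check.
print("validation error:", 1 - accuracy_score(y_val, model.predict(X_val)))
print("test error:", 1 - accuracy_score(y_test, model.predict(X_test)))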
Bias-Variance Tradeoff
• Bias error results from incorrect assumptions, such as thinking that the data is linear when it is actually quadratic.
• Bias is defined as a systematic error that happens in the machine learning
model as a result of faulty ML assumptions.
• Bias can also be measured as the squared difference between the model’s average prediction and the actual values.
• Models with higher bias will not match the training data closely.
• On the other hand, models with lower bias will fit the training dataset closely.
• Variance, as a generalization error, occurs due to the model’s excessive
sensitivity to small variations in the training data.
• In supervised learning, the model learns from training data.
• So a change in the training data will also affect the model.
• The variance shows the amount by which the performance of the
predictive model will be impacted when evaluating based on the
validation data.
Bias-Variance Tradeoff

• Bias/variance in machine learning relates to the problem of simultaneously minimizing two error sources (bias error and variance error).
• If the model is too simple (e.g., linear model), it will have high bias and low
variance.
• If your model is very complex and has many parameters, it will have low
bias and high variance.
• Decreasing the bias error will increase the variance error, and vice versa.
• This correlation is known as the bias-variance tradeoff.
• Total Error = Bias² + Variance + Irreducible Error
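A rough numeric sketch of this decomposition (the quadratic data-generating function, the noise level, and the deliberately too-simple linear model are all invented for illustration):

import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)
f = lambda x: 0.5 * x ** 2        # assumed "true" function (illustrative)
noise_sd = 0.3                    # irreducible noise (assumed)
x_test = np.linspace(-1, 1, 50)

# Fit the same (too simple) linear model on many independent training sets.
preds = []
for _ in range(200):
    x = rng.uniform(-1, 1, 30)
    y = f(x) + rng.normal(0, noise_sd, x.size)
    coefs = P.polyfit(x, y, deg=1)            # high-bias, low-variance model
    preds.append(P.polyval(x_test, coefs))
preds = np.array(preds)

bias_sq = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)   # Bias^2
variance = np.mean(preds.var(axis=0))                      # Variance
print(f"bias^2={bias_sq:.3f}  variance={variance:.3f}  irreducible={noise_sd ** 2:.3f}")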
Overfitting
• When a model performs very well for training data but has poor
performance with test data (new data), it is known as overfitting.
• This is like a child who memorized every math problem in the problem book and would struggle when facing problems from anywhere else.
• In this case, the machine learning model learns the details and noise in the
training data such that it negatively affects the performance of the model
on test data.
• If the model is overfitting, even a slight change in the training data will cause the model to change significantly.
• Models that are overfitting usually have low bias and high variance.
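A small illustrative sketch of overfitting (the sine data and the polynomial degree are invented for illustration): a very flexible model fits the training points closely but does much worse on held-out data:

import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(3 * x)                       # assumed "true" function (illustrative)
x_train = np.sort(rng.uniform(-1, 1, 15))
x_test = np.sort(rng.uniform(-1, 1, 100))
y_train = f(x_train) + rng.normal(0, 0.2, x_train.size)
y_test = f(x_test) + rng.normal(0, 0.2, x_test.size)

coefs = np.polyfit(x_train, y_train, deg=12)      # very flexible: low bias, high variance
mse = lambda y, p: float(np.mean((y - p) ** 2))
print("train MSE:", mse(y_train, np.polyval(coefs, x_train)))   # typically very small
print("test MSE:", mse(y_test, np.polyval(coefs, x_test)))      # typically much larger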



Underfitting
• When a model has not learned the patterns in the training data well and is
unable to generalize well on the new data, it is known as underfitting.
• An underfit model has poor performance on the training data and will
result in unreliable predictions.
• Underfitting occurs when a model is not able to learn enough from
training data, making it difficult to capture the dominating trend (model is
unable to create a mapping between the input and the target variable).
• Machine learning models that underfit tend to have poor performance on both the training and testing sets.
• This is like a child who learned only addition and was not able to solve problems involving the other basic arithmetic operations, either from the math problem book or during the math exam.
• Underfitting models usually have high bias and low variance.
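A companion sketch of underfitting on the same kind of invented data: a constant-only ("degree 0") model cannot capture the trend, so training and test errors are both high:

import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(3 * x)                       # same illustrative setup as above
x_train = np.sort(rng.uniform(-1, 1, 15))
x_test = np.sort(rng.uniform(-1, 1, 100))
y_train = f(x_train) + rng.normal(0, 0.2, x_train.size)
y_test = f(x_test) + rng.normal(0, 0.2, x_test.size)

coefs = np.polyfit(x_train, y_train, deg=0)       # too simple: high bias, low variance
mse = lambda y, p: float(np.mean((y - p) ** 2))
print("train MSE:", mse(y_train, np.polyval(coefs, x_train)))   # already high
print("test MSE:", mse(y_test, np.polyval(coefs, x_test)))      # similarly high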



Feature Selection
• Feature Selection is the process of selecting the most important features
to input in machine learning algorithms.
• It helps in picking the most important factors from a bunch of options to
build better models in machine learning.
• Feature selection techniques are employed to reduce the number of
input variables by eliminating redundant or irrelevant features and
narrowing down the set of features to those most relevant to the
machine learning model.
• This process is crucial for several reasons:
• Improving Accuracy — By focusing on relevant data and eliminating
noise, the accuracy of the model improves.
• Reducing Overfitting — Less redundant data means less opportunity for
the model to make decisions based on noise, thereby reducing the risk of
overfitting.
• Reducing Training Time — Fewer data points reduce algorithm
complexity and the amount of time needed to train a model.
• Simplifying Models — Simpler models are easier to interpret and explain,
which is valuable in many applications.
Feature Selection
• Simpler models — simple models are easier to explain; a model that is too complex and unexplainable is of little value.
• Variance reduction — increase the precision of the estimates that can be
obtained for a given simulation
• Avoid the curse of high dimensionality — the curse of dimensionality states that, as dimensionality and the number of features increase, the volume of the feature space grows so fast that the available data become sparse. Feature selection may be used to reduce dimensionality.
• The most common input variable data types include:
• Numerical Variables, such as Integer Variables and Floating Point Variables;
and Categorical Variables, such as Boolean Variables, Ordinal Variables,
and Nominal Variables.
• Popular implementations include scikit-learn’s feature_selection module in Python and comparable feature selection packages in R (see the sketch after this list).
• Selection algorithms are categorized as either supervised, which can be
used for labeled data; or unsupervised, which can be used for unlabeled
data.
• Both categories are further classified as filter methods, wrapper methods, embedded methods, or hybrid methods:
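As a minimal sketch of scikit-learn's feature_selection module (the tiny array below is invented for illustration), VarianceThreshold is an unsupervised filter that drops constant or near-constant features without looking at any label:

import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Tiny invented matrix: the first column is constant and carries no information.
X = np.array([[0.0, 1.2, 5.0],
              [0.0, 0.9, 5.3],
              [0.0, 1.1, 4.7],
              [0.0, 1.0, 5.0]])

selector = VarianceThreshold()            # default threshold: remove zero-variance features
X_reduced = selector.fit_transform(X)
print(selector.get_support())             # [False  True  True] -> constant first column dropped
print(X_reduced.shape)                    # (4, 2)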
Filter Method

• Filter methods select features based on statistical measures rather than on cross-validation performance.
• A selected metric is applied to identify irrelevant attributes and perform
recursive feature selection.
• The scores from these evaluations are used to choose the input variables.
• Filter methods are either univariate, in which an ordered ranking list of
features is established to inform the final selection of feature subset;
• or multivariate, which evaluates the relevance of the features as a whole,
identifying redundant and irrelevant features.
• Filter methods are generally used as a preprocessing step.
• The selection of features is independent of any machine learning
algorithms.
• For basic guidance, the following correlation measures can be referred to, depending on the types of the input and output variables:



Filter Method

• Pearson’s Correlation: It is used as a measure for quantifying linear dependence between two continuous variables X and Y.
• Its value varies from -1 to +1.
• Pearson’s correlation is given as ρ(X, Y) = cov(X, Y) / (σX · σY).
• LDA: Linear discriminant analysis is used to find a linear combination of
features that characterizes or separates two or more classes (or levels) of a
categorical variable.
• ANOVA: ANOVA stands for Analysis of variance.
• It is similar to LDA except for the fact that it is operated using one or more
categorical independent features and one continuous dependent feature.
• Chi-Square: It is a statistical test applied to the groups of categorical
features to evaluate the likelihood of correlation or association between
them using their frequency distribution.
• One thing that should be kept in mind is that filter methods do not remove multicollinearity.
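A minimal sketch of a filter method in scikit-learn (the iris dataset and k=2 are illustrative): each feature is scored against the class label with the ANOVA F-test, and only the top-k features are kept; chi2 could be substituted for non-negative categorical or count features:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Illustrative dataset; k=2 is an arbitrary choice.
X, y = load_iris(return_X_y=True)

# Score every feature independently (univariate filter) and keep the best two.
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

print("ANOVA F-scores:", selector.scores_.round(1))
print("selected feature indices:", selector.get_support(indices=True))
print("reduced shape:", X_new.shape)      # (150, 2)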
Wrapper Method

• Wrapper feature selection methods treat the selection of a set of features as a search problem, in which combinations of features are prepared, evaluated, and compared against other combinations.
• It tries to use a subset of features and train a model using them.
• This method facilitates the detection of possible interactions amongst
variables.
• Wrapper methods focus on feature subsets that will help improve the quality of the results of the learning algorithm used for the selection.
• These methods are usually computationally very expensive.
• Some common examples of wrapper methods are forward feature
selection, backward feature elimination, recursive feature elimination,
etc.
• One of the best ways of implementing feature selection with wrapper methods is to use the Boruta package, which finds the importance of a feature by creating shadow features.
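A minimal sketch of a wrapper-style method in scikit-learn (the estimator, dataset, and number of kept features are illustrative): recursive feature elimination repeatedly fits a model and discards the weakest feature until the desired subset size remains:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Illustrative estimator and dataset; keeping 5 features is an arbitrary choice.
X, y = load_breast_cancer(return_X_y=True)

# The wrapped estimator is retrained on each candidate subset, so this is
# far more expensive than a filter method.
rfe = RFE(estimator=LogisticRegression(max_iter=5000), n_features_to_select=5)
rfe.fit(X, y)

print("kept feature indices:", rfe.get_support(indices=True))
print("ranking (1 = selected):", rfe.ranking_)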
Embedded Method
• Embedded methods combine the qualities of filter and wrapper methods.
• Embedded feature selection methods integrate feature selection into the learning algorithm itself, so that classification and feature selection are performed simultaneously.
• It’s implemented by algorithms that have their own built-in feature
selection methods.
• The features that will contribute the most to each iteration of the model
training process are carefully extracted.
• Random forest feature selection, decision tree feature selection, and
LASSO feature selection are common embedded methods.
• LASSO and Ridge regression have built-in penalization (regularization) functions that reduce overfitting.
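A minimal sketch of an embedded method in scikit-learn (the diabetes dataset and the alpha value are illustrative): an L1-penalized (LASSO) model drives some coefficients exactly to zero, and SelectFromModel keeps only the features with non-zero coefficients:

from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Illustrative dataset; alpha=1.0 is chosen only for demonstration.
X, y = load_diabetes(return_X_y=True)

# Feature selection happens as a by-product of fitting the penalized model.
lasso = Lasso(alpha=1.0).fit(X, y)
selector = SelectFromModel(lasso, prefit=True)

print("non-zero coefficients:", (lasso.coef_ != 0).sum())
print("selected feature indices:", selector.get_support(indices=True))
print("reduced shape:", selector.transform(X).shape)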



Advantages of Wrapper Feature Selection
• Wrapper feature selection methods are a family of supervised feature
selection techniques that use a predictive model to evaluate the
importance of different subsets of features based on their predictive
performance.
• Performance-Oriented — Wrapper methods tend to provide the best-
performing feature set for the specific model used, as they are
algorithm-oriented and optimize for the highest accuracy or other
performance metrics.
• Model Interaction — They interact directly with the classifier to assess
feature usefulness, which can lead to better model performance
compared to methods that do not.
• Feature Interactions — These methods can capture interactions
between features that may be missed by simpler filter methods.



Disadvantages of Wrapper Feature Selection
• Computationally Intensive — Wrapper methods are computationally
expensive because they require training and evaluating a model for
each candidate subset of features, which can be time consuming and
resource intensive.
• Risk of Overfitting — There is a higher potential for overfitting the
predictors to the training data, as the method seeks to optimize
performance on the given dataset. This may not generalize well to
unseen data.
• Model Dependency — The feature subsets produced by wrapper
methods are specific to the type of model used for selection, which
means they might not perform as well if applied to a different model.
• Lack of Transparency — Wrapper methods do not provide
explanations for why certain features are selected over others, which
can reduce the interpretability of the model.



Thank You

