Feature Selection Methods

The document discusses feature selection, which aims to select a subset of important features for classification. It covers filter and wrapper methods, search strategies like sequential forward and backward selection, and evaluation criteria. Genetic algorithms can also be used for randomized feature selection.

Feature Selection



Feature Selection
• Given a set of n features, the goal of feature selection is to select a
subset of d features (d < n) in order to minimize the classification error.


• Why perform feature selection?

– Data interpretation / knowledge discovery (insight into which factors
are most representative of your problem)
– Curse of dimensionality (the amount of data needed grows exponentially
with the number of features, O(2^n))

• Fundamentally different from dimensionality reduction (we will
discuss next time), which is based on feature combinations (i.e.,
feature extraction).
Feature Selection vs.
Dimensionality Reduction
• Feature Selection
– When classifying novel patterns, only a small number of features
need to be computed (i.e., faster classification).
– The measurement units (length, weight, etc.) of the features are
preserved.

• Dimensionality Reduction (next time)


– When classifying novel patterns, all features need to be computed.
– The measurement units (length, weight, etc.) of the features are
lost.
Feature Selection Steps

• Feature selection is an
optimization problem.
– Step 1: Search the space of
possible feature subsets.

– Step 2: Pick the subset that is
optimal or near-optimal with
respect to some objective
function.
Feature Selection Steps (cont’d)

Search strategies
– Optimal
– Heuristic

Evaluation strategies
– Filter methods
– Wrapper methods
Evaluation Strategies
• Filter Methods
– Evaluation is independent of
the classification algorithm.

– The objective function
evaluates feature subsets by
their information content,
typically interclass distance,
statistical dependence, or
information-theoretic
measures.
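
As a concrete illustration (an assumption, not part of the slides), a filter criterion can be as simple as ranking features by their mutual information with the class label. A minimal sketch in Python using scikit-learn and a synthetic dataset:

# A minimal filter-method sketch: score each feature by mutual information
# with the class label, independently of any classifier, and keep the top d.
# The dataset and the choice of measure are purely illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

scores = mutual_info_classif(X, y, random_state=0)   # information content per feature
d = 5
selected = np.argsort(scores)[::-1][:d]              # indices of the top-d features
print("selected features:", selected)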
Evaluation Strategies

• Wrapper Methods
– Evaluation uses criteria
related to the
classification algorithm.

– The objective function is a
pattern classifier, which
evaluates feature subsets by
their predictive accuracy
(recognition rate on test data),
estimated by statistical
resampling or cross-validation.
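
A minimal wrapper-style objective function, sketched under illustrative assumptions (scikit-learn, a k-NN classifier, synthetic data); any classifier and resampling scheme could be substituted:

# A minimal wrapper-method sketch: the objective function trains an actual
# classifier on a candidate subset and returns its cross-validated accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

def wrapper_criterion(subset):
    """Predictive accuracy (5-fold cross-validation) using only `subset`."""
    return cross_val_score(KNeighborsClassifier(), X[:, list(subset)], y, cv=5).mean()

print(wrapper_criterion([0, 1, 2]))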
Filter vs. Wrapper Approaches
Search Strategies
• Assuming n features, an exhaustive search would
require:
– Examining all C(n, d) = n! / (d!(n−d)!) possible subsets of size d.

– Selecting the subset that performs the best according to the
criterion function.

• The number of subsets grows combinatorially, making
exhaustive search impractical.
• In practice, heuristics are used to speed up the search, but
they cannot guarantee optimality.
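
A sketch of exhaustive search over all C(n, d) subsets, assuming some criterion function J(subset) to maximize (for example, the wrapper criterion sketched earlier); it is only feasible for small n and d:

# Exhaustive search: evaluate every d-element subset of {0, ..., n-1} and
# return the best one under the supplied criterion J.
from itertools import combinations

def exhaustive_search(n, d, J):
    return max(combinations(range(n), d), key=J)

# Toy usage with an illustrative criterion (here simply the sum of indices):
print(exhaustive_search(n=6, d=3, J=sum))   # -> (3, 4, 5)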

Naïve Search
• Sort the given n features in order of their probability of
correct recognition.

• Select the top d features from this sorted list.

• Disadvantage
– Correlation among features is not considered.
– The best pair of features may not even contain the best
individual feature.
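
A small illustration of this pitfall, using hypothetical data: x0 and x1 are useless individually but perfect together (y = XOR(x0, x1)), while x2 is the best single feature, so a one-feature-at-a-time ranking keeps x2 and misses the pair that actually solves the problem:

# Naive-search pitfall: individual rankings ignore feature interactions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 1000)
x1 = rng.integers(0, 2, 1000)
y = x0 ^ x1                                      # target is the XOR of x0 and x1
x2 = np.where(rng.random(1000) < 0.7, y, 1 - y)  # agrees with y only 70% of the time
X = np.column_stack([x0, x1, x2])

def acc(cols):
    return cross_val_score(DecisionTreeClassifier(), X[:, cols], y, cv=5).mean()

print("individual accuracies:", [round(acc([i]), 2) for i in range(3)])  # x2 is best alone
print("pair {x0, x1}:", round(acc([0, 1]), 2))                           # near-perfect together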
Sequential forward selection (SFS)
(heuristic search)
• First, the best single feature is selected (i.e.,
using some criterion function).
• Then, pairs of features are formed using one of
the remaining features and this best feature, and
the best pair is selected.
• Next, triplets of features are formed using one
of the remaining features and these two best
features, and the best triplet is selected.
• This procedure continues until a predefined
number of features are selected.

SFS performs
best when the
optimal subset is
small.
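
A minimal SFS sketch, assuming a criterion function J(subset) that returns a score to maximize (for example, a wrapper criterion such as cross-validated accuracy):

# Sequential forward selection: greedily grow the subset from the empty set.
def sfs(n_features, d, J):
    selected, remaining = [], set(range(n_features))
    while len(selected) < d:
        # Add the remaining feature that improves the criterion the most.
        best = max(remaining, key=lambda f: J(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Example usage: sfs(n_features=20, d=5, J=wrapper_criterion)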
Example
(Figure omitted.) Results of sequential forward feature selection for classification of a
satellite image using 28 features. The x-axis shows the classification accuracy (%) and the
y-axis shows the features added at each iteration (the first iteration is at the bottom).
The highest accuracy value is shown with a star.
Sequential backward selection (SBS)
(heuristic search)
• First, the criterion function is computed for all n
features.
• Then, each feature is deleted one at a time, the
criterion function is computed for all subsets with
n-1 features, and the worst feature is discarded.
• Next, each feature among the remaining n-1 is
deleted one at a time, and the worst feature is
discarded to form a subset with n-2 features.
• This procedure continues until a predefined
number of features are left.
SBS performs
best when the
optimal subset is
large.
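
A minimal SBS sketch, again assuming a criterion function J(subset) to maximize:

# Sequential backward selection: greedily shrink the subset from the full set.
def sbs(n_features, d, J):
    selected = list(range(n_features))   # start from the full feature set
    while len(selected) > d:
        # "Worst" feature = the one whose removal leaves the highest criterion value.
        worst = max(selected, key=lambda f: J([g for g in selected if g != f]))
        selected.remove(worst)
    return selected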
Example
(Figure omitted.) Results of sequential backward feature selection for classification of a
satellite image using 28 features. The x-axis shows the classification accuracy (%) and the
y-axis shows the features removed at each iteration (the first iteration is at the top).
The highest accuracy value is shown with a star.
Bidirectional Search (BDS)
• BDS applies SFS and SBS
simultaneously:
– SFS is performed from the
empty set.
– SBS is performed from the
full set.
• To guarantee that SFS and SBS
converge to the same
solution:
– Features already selected by
SFS are not removed by SBS.
– Features already removed by
SBS are not added by SFS.
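
A minimal BDS sketch, assuming a criterion J(subset); the strict alternation of one forward and one backward step is an illustrative choice, but it enforces the two constraints above so that both searches meet at the same subset:

# Bidirectional search: SFS grows `forward`, SBS shrinks `backward`, and each
# step is constrained by the other search's decisions.
def bds(n_features, J):
    forward = []                         # grown by SFS-style steps
    backward = list(range(n_features))   # shrunk by SBS-style steps
    while len(forward) < len(backward):
        # SFS step: only features not yet removed by SBS may be added.
        candidates = [f for f in backward if f not in forward]
        forward.append(max(candidates, key=lambda f: J(forward + [f])))
        if len(forward) == len(backward):
            break
        # SBS step: only features not yet selected by SFS may be removed.
        candidates = [f for f in backward if f not in forward]
        worst = max(candidates, key=lambda f: J([g for g in backward if g != f]))
        backward.remove(worst)
    return forward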
Limitations of SFS and SBS

• The main limitation of SFS is that it is unable to
remove features that become non-useful after the
addition of other features.
• The main limitation of SBS is its inability to
reevaluate the usefulness of a feature after it has
been discarded.
• We will examine some generalizations of SFS and
SBS:
– "Plus-L, minus-R" selection (LRS)
– Sequential floating forward/backward selection (SFFS and
SFBS)
“Plus-L, minus-R” selection (LRS)
• A generalization of SFS and SBS
– If L > R, LRS starts from the empty set and repeatedly:
• Adds L features
• Removes R features
– If L < R, LRS starts from the full set and repeatedly:
• Removes R features
• Adds L features

Its main limitation is the lack of a
theory to help choose the optimal
values of L and R.
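
A minimal LRS sketch for the L > R case (starting from the empty set), assuming a criterion J(subset); for simplicity it runs whole add-L / remove-R cycles, so the final subset may slightly overshoot d:

# "Plus-L, minus-R" selection: the subset grows by L - R features per cycle.
def lrs(n_features, d, L, R, J):
    assert L > R, "this sketch covers only the L > R (start-from-empty) variant"
    selected, remaining = [], set(range(n_features))
    while len(selected) < d:
        for _ in range(L):               # plus-L step: greedily add features
            if not remaining:
                break
            best = max(remaining, key=lambda f: J(selected + [f]))
            selected.append(best)
            remaining.remove(best)
        for _ in range(R):               # minus-R step: greedily drop features
            worst = max(selected, key=lambda f: J([g for g in selected if g != f]))
            selected.remove(worst)
            remaining.add(worst)
    return selected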
Sequential floating forward/backward
selection (SFFS and SFBS)
• An extension to LRS:
– Rather than fixing the values of L and R, floating methods
determine these values from the data.
– The dimensionality of the subset during the search can be
thought of as "floating" up and down.

• Two floating methods:


– Sequential floating forward selection (SFFS)
– Sequential floating backward selection (SFBS)

P. Pudil, J. Novovicova, J. Kittler, "Floating search methods in feature
selection," Pattern Recognition Letters 15 (1994) 1119–1125.
Sequential floating forward selection
(SFFS)
• Sequential floating forward selection (SFFS) starts from
the empty set.
• After each forward step, SFFS performs backward steps
as long as the objective function increases.
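
A minimal SFFS sketch, assuming a criterion J(subset) to maximize; following the idea in Pudil et al. of comparing against the best subset of each size found so far, a backward step is taken only if the reduced subset beats that record, which keeps the search from cycling:

# Sequential floating forward selection: forward steps with conditional
# backward steps that are accepted only when they set a new best for that size.
def sffs(n_features, d, J):
    selected, remaining = [], set(range(n_features))
    best_at_size = {}                    # best criterion value seen for each subset size
    while len(selected) < d:
        # Forward step: add the best remaining feature.
        best = max(remaining, key=lambda f: J(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        best_at_size[len(selected)] = max(best_at_size.get(len(selected), float("-inf")),
                                          J(selected))
        # Conditional backward steps (the "floating" part).
        while len(selected) > 2:
            worst = max(selected, key=lambda f: J([g for g in selected if g != f]))
            reduced = [g for g in selected if g != worst]
            if J(reduced) > best_at_size.get(len(reduced), float("-inf")):
                selected = reduced
                remaining.add(worst)
                best_at_size[len(reduced)] = J(reduced)
            else:
                break
    return selected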
Sequential floating backward selection
(SFBS)

• Sequential floating backward selection (SFBS) starts
from the full set.

• After each backward step, SFBS performs forward steps
as long as the objective function increases.
Feature Selection using GAs
(randomized search)

• GAs provide a simple, general, and powerful framework
for feature selection.

(Diagram omitted.) Data → Feature Extraction → Feature Subset → Classifier, with a
Feature Selection (GA) module choosing the feature subset.
Feature Selection Using GAs
(cont’d)
• Binary encoding: 1 means "choose feature" and 0 means "do not choose
feature".

(Diagram omitted.) A binary chromosome of length N, one bit per feature (positions 1 … N).

• Fitness evaluation (to be maximized):

Fitness = w1 × accuracy + w2 × #zeros

where accuracy is the classification accuracy measured on a validation set,
#zeros is the number of features not selected, and w1 >> w2.
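
A sketch of this fitness function under illustrative assumptions (scikit-learn, a k-NN classifier, a synthetic dataset, and w1 = 1.0, w2 = 0.01 as arbitrary weights with w1 >> w2):

# GA fitness for feature selection: accuracy on a validation set dominates,
# and the number of zero bits gently rewards smaller subsets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

def fitness(chromosome, w1=1.0, w2=0.01):
    cols = np.flatnonzero(chromosome)            # indices of the chosen features
    if cols.size == 0:
        return 0.0                               # an empty subset gets no reward
    clf = KNeighborsClassifier().fit(X_tr[:, cols], y_tr)
    accuracy = clf.score(X_val[:, cols], y_val)
    return w1 * accuracy + w2 * np.sum(chromosome == 0)

# Example: evaluate one random chromosome (a GA would evaluate a whole population).
print(fitness(np.random.default_rng(0).integers(0, 2, X.shape[1])))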
Feature Selection Summary
• Feature selection has the two-fold advantage of providing some interpretation
of the data and making the learning problem easier.

• Finding the global optimum is impractical in most situations, so we rely on
heuristics instead (greedy/random search).

• Filtering is fast and general but can pick a large number of features.

• Wrapping accounts for model bias but is MUCH slower due to
training multiple models.
