WHY DIMENSIONALITY REDUCTION?
◼ Curse of Dimensionality: learning becomes harder as the number of features grows relative to the number of training samples.
◼ Visualization: high-dimensional data can be projected onto 2D or 3D.
◼ Two main approaches: feature extraction and feature selection.
◼ Feature selection:
The problem of selecting a subset of the input features on which the learner should focus its attention, while ignoring the rest.
Diagnostic features:
- Eye separation
- Eye height
Non-Diagnostic features:
- Mouth height
- Nose length
FEATURE SELECTION METHODS
❖ Feature selection is an optimization problem: search the space of feature subsets for the subset that optimizes an evaluation criterion.
◼ In a bottom-up method, one gradually adds the ranked features in order of their individual discrimination power and stops when the error rate stops decreasing.
◼ In a top-down truncation method, one starts with the complete set of features and progressively eliminates features while searching for the optimal performance point.
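The bottom-up procedure can be sketched as follows. This is a minimal illustration, not a fixed algorithm: `score` (a feature's individual discrimination power) and `error` (an error-rate estimate for a candidate subset) are assumed to be supplied by the caller.

```python
def bottom_up_select(features, score, error):
    """Add features in decreasing order of individual score;
    stop as soon as the error estimate stops decreasing."""
    ranked = sorted(features, key=score, reverse=True)
    selected = []
    best_err = float("inf")
    for f in ranked:
        err = error(selected + [f])
        if err >= best_err:   # adding f no longer helps
            break
        selected.append(f)
        best_err = err
    return selected
```

The top-down variant is symmetric: start from the full list and drop the lowest-ranked feature while the error estimate keeps improving.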
FEATURE SELECTION METHODS
❖ Filter Methods
◼ Evaluation is independent of the classification algorithm.
❖ Wrapper Methods
◼ Evaluation uses criteria related to the classification algorithm.
Wrapper Approach
❖ Advantages
◼ Accuracy: wrappers generally achieve better recognition rates than filters, since they are tuned to the specific interactions between the classifier and the features.
◼ Ability to generalize: wrappers have a mechanism to avoid overfitting, since they typically use cross-validation measures of predictive accuracy.
❖ Disadvantages
◼ Slow execution: every candidate subset requires training (and cross-validating) the classifier.
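A wrapper criterion based on cross-validation can be sketched as below. This is an illustrative skeleton, assuming the caller supplies the classifier as a `fit`/`predict` pair and that `k` does not exceed the number of samples; the fold scheme is a simple interleaved split.

```python
def cv_accuracy(X, y, subset, fit, predict, k=5):
    """Wrapper criterion: mean k-fold accuracy of the classifier
    trained on only the feature columns listed in `subset`.
    `fit(X, y)` returns a model; `predict(model, row)` returns a label."""
    Xs = [[row[j] for j in subset] for row in X]
    n = len(Xs)
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved folds
    accs = []
    for fold in folds:
        train = [i for i in range(n) if i not in fold]
        model = fit([Xs[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, Xs[i]) == y[i] for i in fold)
        accs.append(correct / len(fold))
    return sum(accs) / len(accs)
```

Because this criterion retrains the classifier once per fold for every candidate subset, it directly exhibits both wrapper properties from the slide: accuracy tuned to the classifier, and slow execution.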
FILTER VS WRAPPER APPROACHES (CONT'D)
Filter Approach
❖ Advantages
◼ Fast execution: Filters generally involve a non-iterative computation on the
dataset, which can execute much faster than a classifier training session
◼ Generality: Since filters evaluate the intrinsic properties of the data, rather
than their interactions with a particular classifier, their results exhibit more
generality; the solution will be “good” for a large family of classifiers
❖ Disadvantages
◼ Tendency to select large subsets: filter objective functions are generally monotonic, so larger subsets never score worse and the search is biased toward selecting nearly all of the features.
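As an illustration of a filter, the sketch below ranks features by a Fisher-style score computed from the data alone, with no classifier in the loop. The particular score (squared difference of class means over summed class variances) is one common choice among many; the function names are illustrative.

```python
from statistics import mean, variance

def fisher_score(xs0, xs1):
    """Class-separation score for one feature's values in class 0 and class 1."""
    v = variance(xs0) + variance(xs1)
    return (mean(xs0) - mean(xs1)) ** 2 / v if v else float("inf")

def rank_features(X, y):
    """Return feature indices sorted by decreasing Fisher score (binary labels)."""
    d = len(X[0])
    scores = []
    for j in range(d):
        xs0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        xs1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(fisher_score(xs0, xs1))
    return sorted(range(d), key=lambda j: -scores[j])
```

Note that the ranking is computed once, without any classifier training, which is exactly why filters are fast and classifier-agnostic.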
SEARCH STRATEGIES
Four Features – x1, x2, x3, x4
[Figure: lattice of all subsets of the four features, from 1,1,1,1 (all selected) down to 0,0,0,0 (none selected); 1 means xi is selected, 0 means it is not.]
❖ Disadvantages (of ranking features individually)
◼ Feature correlation is not considered.
◼ The best pair of features may not even contain the best individual feature.
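A tiny example (with assumed toy data) makes the second point concrete: with an XOR-style target, the two jointly perfect features each look useless on their own, so an individual ranking would pick neither of them first.

```python
import itertools

# Toy data: y = x0 XOR x1, while x2 merely agrees with y on 3 of 4 rows.
X = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
y = [0, 1, 1, 0]

def subset_consistency(cols):
    """Fraction of rows whose label is uniquely determined by the chosen
    columns (1.0 means the subset separates the classes perfectly)."""
    groups = {}
    for row, lab in zip(X, y):
        groups.setdefault(tuple(row[j] for j in cols), set()).add(lab)
    ok = sum(1 for row, lab in zip(X, y)
             if len(groups[tuple(row[j] for j in cols)]) == 1)
    return ok / len(X)
```

Here x2 is the best single feature, yet the best pair is {x0, x1}, which excludes it entirely.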
SEQUENTIAL FORWARD SELECTION (SFS)
(HEURISTIC SEARCH)
ILLUSTRATION (SFS)
Four Features – x1, x2, x3, x4
1-xi is selected; 0-xi is not selected
[Figure: SFS climbing the subset lattice from 0,0,0,0 toward 1,1,1,1, adding one feature at each step.]
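SFS can be sketched in a few lines; `J` is an assumed subset criterion to maximize, and `k` is the target subset size:

```python
def sfs(features, J, k):
    """Sequential Forward Selection: greedily grow a subset, at each
    step adding the feature that maximizes the criterion J."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: J(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Being greedy, SFS evaluates only O(d·k) subsets instead of all 2^d, but a feature, once added, is never reconsidered.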
SEQUENTIAL BACKWARD SELECTION (SBS)
(HEURISTIC SEARCH)
[Figure: SBS example – J is evaluated with each feature left out in turn; x3 is the worst feature (the subset without it has maximum J), so x3 is removed first.]
ILLUSTRATION (SBS)
Four Features – x1, x2, x3, x4
1-xi is selected; 0-xi is not selected
[Figure: SBS descending the subset lattice from 1,1,1,1 toward 0,0,0,0, removing one feature at each step.]
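The mirror-image sketch of SBS, under the same assumptions (a caller-supplied criterion `J` to maximize, target size `k`):

```python
def sbs(features, J, k):
    """Sequential Backward Selection: start from the full set and
    repeatedly drop the feature whose removal keeps J highest."""
    selected = list(features)
    while len(selected) > k:
        worst = max(selected,
                    key=lambda f: J([g for g in selected if g != f]))
        selected.remove(worst)
    return selected
```

SBS works best when the optimal subset is large, since it spends most of its time evaluating subsets close to the full set; SFS is cheaper when the optimal subset is small.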
BIDIRECTIONAL SEARCH (BDS)
(HEURISTIC SEARCH)
❖ SFS and SBS run in parallel: SFS grows a subset from the empty set while SBS shrinks one from the full set {x1, x2, x3, x4}.
❖ To guarantee the two searches converge to the same solution, features added by SFS are never removed by SBS, and features removed by SBS are never added by SFS.
❖ Example: SFS has already added x2, so SBS may only drop x1, x3, or x4, giving candidates {x2, x3, x4}, {x2, x1, x4}, and {x2, x1, x3}. J(x2, x1, x4) is maximum, so x3 is removed.
❖ On the next SFS step, x3 can no longer be added, leaving candidates {x2, x1} and {x2, x4}.
ILLUSTRATION (BDS)
Four Features – x1, x2, x3, x4
1-xi is selected; 0-xi is not selected
[Figure: BDS working the subset lattice from both ends – SFS upward from 0,0,0,0 and SBS downward from 1,1,1,1 – until the two searches meet.]
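BDS can be sketched by running the forward and backward steps in lockstep while enforcing the two locking rules above. As before, `J` and `k` are assumed inputs; this is an illustration, not a canonical implementation.

```python
def bds(features, J, k):
    """Bidirectional Search: SFS and SBS in lockstep. Features added
    by SFS are never removed by SBS; features removed by SBS are
    never added by SFS."""
    added = []                 # SFS side, grows from the empty set
    kept = list(features)      # SBS side, shrinks from the full set
    while len(added) < k and len(kept) > k:
        # Forward step: add the best feature still kept and not yet added.
        cands = [f for f in kept if f not in added]
        best = max(cands, key=lambda f: J(added + [f]))
        added.append(best)
        # Backward step: remove the worst feature not protected by SFS.
        cands = [f for f in kept if f not in added]
        if cands:
            worst = max(cands,
                        key=lambda f: J([g for g in kept if g != f]))
            kept.remove(worst)
    return added if len(added) == k else kept
```

On the four-feature example from the slides, with an additive criterion, the forward pass picks x2 first and the backward pass removes x3 first, matching the illustrated trace.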
FEATURE SELECTION METHODS
Filter:
All Features → Filter → Selected Features → Supervised Learning Algorithm → Classifier
Wrapper:
All Features → Feature Subset Search ⇄ Feature Subset Evaluation (criterion value) → Selected Features → Supervised Learning Algorithm → Classifier
[Figure: example wrapper search over features A, B, C, D – singletons A, B, C, D; pairs A,B / B,C / B,D; triples A,B,C / B,C,D – each candidate subset scored by the criterion.]
INTRODUCTION TO MACHINE LEARNING AND DATA MINING, CARLA BRODLEY