
Model Selection and Feature Selection

Piyush Rai

CS5350/6350: Machine Learning

September 22, 2011



What is Model Selection

Given a set of models M = {M_1, M_2, ..., M_R}, choose the model that is expected to do the best on the test data. M may consist of:
Same learning model with different complexities or hyperparameters
Nonlinear Regression: Polynomials with different degrees
K-Nearest Neighbors: Different choices of K
Decision Trees: Different choices of the number of levels/leaves
SVM: Different choices of the misclassification penalty hyperparameter C
Regularized Models: Different choices of the regularization parameter
Kernel based Methods: Different choices of kernels
.. and almost any learning problem
Different learning models (e.g., SVM, KNN, DT, etc.)

Note: Usually considered in supervised learning contexts, but unsupervised learning also faces this issue (e.g., how many clusters when doing clustering)



Held-out Data

Set aside a fraction (say 10%-20%) of the training data


This part becomes our held-out data
Other names: validation/development data

Remember: Held-out data is NOT the test data


Train each model using the remaining training data
Evaluate error on the held-out data
Choose the model with the smallest held-out error
Problems:
Wastes training data, so typically used when we have plenty of training data
Held-out data may not be good if there was an unfortunate split
Can ameliorate unfortunate splits by repeated random subsampling
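A minimal held-out split sketch, assuming the data comes as NumPy arrays X and y; the function name and interface are illustrative, not from the slides:

import numpy as np

def holdout_split(X, y, frac=0.2, seed=0):
    # Shuffle the example indices and set aside a fraction as held-out data
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_held = int(frac * len(X))
    held, rest = idx[:n_held], idx[n_held:]
    # Train each candidate model on (X[rest], y[rest]); evaluate on (X[held], y[held])
    return X[rest], y[rest], X[held], y[held]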



Cross-Validation

K-fold Cross-Validation
Create K equal sized partitions of the training data
Each partition has N/K examples
Train using K - 1 partitions, validate on the remaining partition
Repeat the same K times, each with a different validation partition

Finally, choose the model with smallest average validation error


Usually K is chosen as 10
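A minimal K-fold sketch, assuming model_fn(X_train, y_train) returns a fitted model with a predict method (a hypothetical interface, not prescribed by the slides); the model with the smallest returned error would be selected:

import numpy as np

def kfold_error(model_fn, X, y, K=10, seed=0):
    # Shuffle once, then split the indices into K roughly equal folds
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), K)
    errors = []
    for k in range(K):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        model = model_fn(X[train], y[train])
        # Misclassification rate on the held-out fold
        errors.append(np.mean(model.predict(X[val]) != y[val]))
    return np.mean(errors)  # average validation error across the K folds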



Leave-One-Out (LOO) Cross-Validation

Special case of K-fold CV when K = N (number of training examples)


Each partition is now an example
Train using N - 1 examples, validate on the remaining example
Repeat the same N times, each with a different validation example

Finally, choose the model with smallest average validation error


Can be expensive for large N. Typically used when N is small
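Reusing the hypothetical kfold_error sketch above, leave-one-out is just the K = N case:

# Leave-one-out: every fold holds exactly one example
loo_error = kfold_error(model_fn, X, y, K=len(X))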



Random Subsampling Cross-Validation

Randomly subsample a fixed fraction αN (0 < α < 1) of the examples; call it the validation set
Train using the rest of the examples, measure error on the validation set
Repeat K times, each with a different, randomly chosen validation set

Finally, choose the model with smallest average validation error


Usually α is chosen as 0.1, K as 10
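A minimal sketch, again assuming the hypothetical model_fn interface from the K-fold example:

import numpy as np

def random_subsampling_error(model_fn, X, y, alpha=0.1, K=10, seed=0):
    rng = np.random.default_rng(seed)
    n_val = int(alpha * len(X))
    errors = []
    for _ in range(K):
        # A fresh random split for every round
        idx = rng.permutation(len(X))
        val, train = idx[:n_val], idx[n_val:]
        model = model_fn(X[train], y[train])
        errors.append(np.mean(model.predict(X[val]) != y[val]))
    return np.mean(errors)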



Bootstrapping

Given: a set of N examples


Idea: Sample N elements from this set with replacement
An already sampled element could be picked again

Use this new sample as the training data


Use the set of examples not selected as the validation data
For large N, training data consists of about only 63% unique examples
Training data is effectively smaller ⇒ the error estimate may be pessimistic
Use the following equation to compute the expected model error

e = 0.632 × e_test-examples + 0.368 × e_training-examples

Note: the above estimate may still be bad if we overfit and have
e_training-examples = 0. Why?
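A single-round sketch of the idea, using the hypothetical model_fn interface from earlier; function and variable names are illustrative:

import numpy as np

def bootstrap_632_error(model_fn, X, y, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    boot = rng.integers(0, n, size=n)              # sample N indices with replacement
    out_of_bag = np.setdiff1d(np.arange(n), boot)  # examples never picked (~37% of them)
    model = model_fn(X[boot], y[boot])
    e_train = np.mean(model.predict(X[boot]) != y[boot])
    e_test = np.mean(model.predict(X[out_of_bag]) != y[out_of_bag])
    # 0.632 correction for the pessimism of the out-of-bag error
    return 0.632 * e_test + 0.368 * e_train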



Information Criteria based methods

Akaike Information Criterion (AIC)

AIC = 2k - 2 log(L)

Bayesian Information Criterion (BIC)

BIC = k log(N) - 2 log(L)

k: # of model parameters
L: maximum value of the model likelihood function
Applicable for probabilistic models (when likelihood is defined)
AIC/BIC penalize model complexity
.. as measured by the number of model parameters
BIC penalizes the number of parameters more than AIC
Model with the lowest AIC/BIC will be chosen
Can be used even for model selection in unsupervised learning
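These are direct formulas; a minimal sketch (N here is the number of training examples, as in the BIC definition above):

import numpy as np

def aic(k, log_lik):
    # 2k - 2 log L: lower is better
    return 2 * k - 2 * log_lik

def bic(k, n, log_lik):
    # k log N - 2 log L: penalizes parameters more heavily than AIC once n > e^2
    return k * np.log(n) - 2 * log_lik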



Minimum Description Length (MDL)

MDL measures the number of bits to encode a probability distribution

MDL = - log2 P(z)

Minimum Description Length for a model M

Length(M) = - log P(Y | X, w, M) - log P(w | M)

Note: it's just the MDL for the model's posterior distribution

P(w | X, Y, M) ∝ P(w | M) P(Y | X, w, M)

Complex posterior distribution ⇒ Complex model


Choose the model with the lowest MDL
Note: the MDL criterion is roughly equivalent to preferring the best regularized model
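A minimal sketch of Length(M) for a linear-Gaussian model with a Gaussian prior on w; the Gaussian choices and the noise/prior variances are illustrative assumptions, not from the slides:

import numpy as np

def description_length(y, X, w, noise_var=1.0, prior_var=1.0):
    n, d = X.shape
    resid = y - X @ w
    # -log P(Y | X, w, M): Gaussian likelihood term
    neg_log_lik = 0.5 * (n * np.log(2 * np.pi * noise_var) + resid @ resid / noise_var)
    # -log P(w | M): Gaussian prior term (this is where regularization enters)
    neg_log_prior = 0.5 * (d * np.log(2 * np.pi * prior_var) + w @ w / prior_var)
    return neg_log_lik + neg_log_prior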



Feature Selection

Selecting a useful subset from all the features


Why Feature Selection?

Some algorithms scale (computationally) poorly with increased dimension

Irrelevant features can confuse some algorithms

Redundant features adversely affect regularization

Removal of features can increase (relative) margin (and generalization)

Reduces data set and resulting model size

Note: Feature Selection is different from Feature Extraction


The latter transforms original features to get a small set of new features
More on feature extraction when we cover Dimensionality Reduction



Feature Selection Methods

Methods agnostic to the learning algorithm


Preprocessing based methods
E.g., remove a binary feature if it's ON in very few or most examples (a minimal sketch follows this list)

Filter Feature Selection methods


Use some ranking criteria to rank features
Select the top ranking features

Wrapper Methods (keep the learning algorithm in the loop)


Requires repeated runs of the learning algorithm with different sets of features
Can be computationally expensive
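A minimal sketch of the preprocessing idea mentioned above, assuming X is a binary (0/1) feature matrix; the threshold value is an illustrative choice:

import numpy as np

def drop_rare_binary_features(X, min_frac=0.01):
    # Fraction of examples in which each binary feature is ON
    on_frac = X.mean(axis=0)
    # Keep features that are neither almost never ON nor almost always ON
    keep = (on_frac >= min_frac) & (on_frac <= 1 - min_frac)
    return X[:, keep], np.where(keep)[0]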



Filter Feature Selection

Uses heuristics but is much faster than wrapper methods

Correlation Criteria: Rank features in order of their correlation with the labels

R(X_d, Y) = cov(X_d, Y) / sqrt(var(X_d) var(Y))

Mutual Information Criteria:

MI(X_d, Y) = Σ_{X_d ∈ {0,1}} Σ_{Y ∈ {-1,+1}} P(X_d, Y) log [ P(X_d, Y) / (P(X_d) P(Y)) ]

High mutual information ⇒ high relevance of that feature


Note: These probabilities can be easily estimated from the data
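A minimal correlation-ranking sketch; a mutual-information version would be analogous, with the probabilities estimated by counting (the function name and top_k are illustrative):

import numpy as np

def rank_by_correlation(X, y, top_k=10):
    # Pearson correlation of each feature column with the labels
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).mean(axis=0) / (X.std(axis=0) * y.std() + 1e-12)
    # Indices of the top_k features by absolute correlation
    return np.argsort(-np.abs(corr))[:top_k]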



Wrapper Methods

Two types: Forward Search and Backward Search


Forward Search
Start with no features
Greedily include the most relevant feature
Stop when the desired number of features has been selected

Backward Search
Start with all the features
Greedily remove the least relevant feature
Stop when reduced to the desired number of features

The inclusion/removal criterion uses cross-validation



Wrapper Methods

Forward Search
  Let F = {}
  While not selected the desired number of features:
    For each unused feature f:
      Estimate the model's error on feature set F ∪ {f} (using cross-validation)
    Add the f with the lowest error to F

Backward Search
  Let F = {all features}
  While not reduced to the desired number of features:
    For each feature f ∈ F:
      Estimate the model's error on feature set F \ {f} (using cross-validation)
    Remove the f with the lowest error from F
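A runnable sketch of forward search, assuming a helper cv_error(features) that returns the cross-validated error of the model restricted to those features (a hypothetical helper, e.g. built from the kfold_error sketch earlier):

def forward_search(cv_error, all_features, target_size):
    selected = []
    remaining = list(all_features)
    while len(selected) < target_size and remaining:
        # Try adding each unused feature and keep the one with the lowest CV error
        errors = {f: cv_error(selected + [f]) for f in remaining}
        best = min(errors, key=errors.get)
        selected.append(best)
        remaining.remove(best)
    return selected

Backward search is symmetric: start from all features and greedily remove the feature whose removal gives the lowest cross-validated error.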

