Model Selection and Feature Selection: Piyush Rai CS5350/6350: Machine Learning
K-fold Cross-Validation
Create K equal sized partitions of the training data
Each partition has N/K examples
Train using K-1 partitions, validate on the remaining partition
Repeat the same K times, each with a different validation partition
Note: the above estimate may still be bad if we overfit and have
training error e_training = 0. Why?
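The K-fold procedure above can be sketched as follows. This is a minimal illustration, not a specific library's API: `train_fn` and `error_fn` are assumed callables that fit a model and score it on held-out data.

```python
import numpy as np

def kfold_cv_error(X, y, train_fn, error_fn, K=5, seed=0):
    """Estimate generalization error by K-fold cross-validation.

    train_fn(X_tr, y_tr) -> model, and error_fn(model, X_va, y_va) -> float,
    are placeholder callables for illustration.
    """
    N = len(y)
    idx = np.random.default_rng(seed).permutation(N)
    folds = np.array_split(idx, K)  # K partitions of ~N/K examples each
    errors = []
    for k in range(K):
        val = folds[k]  # validate on this partition
        # train on the remaining K-1 partitions
        tr = np.concatenate([folds[j] for j in range(K) if j != k])
        model = train_fn(X[tr], y[tr])
        errors.append(error_fn(model, X[val], y[val]))
    return float(np.mean(errors))  # average validation error over the K folds
```

Averaging over all K validation partitions gives a less noisy error estimate than a single train/validation split, at K times the training cost.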
AIC = 2k - 2 log(L)
k: # of model parameters
L: maximum value of the model likelihood function
Applicable for probabilistic models (when likelihood is defined)
AIC/BIC penalize model complexity
.. as measured by the number of model parameters
BIC penalizes the number of parameters more than AIC
Model with the lowest AIC/BIC will be chosen
Can be used even for model selection in unsupervised learning
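A quick sketch of both criteria. The AIC formula is the one given above; the BIC formula (k log n - 2 log L, with n the number of examples) is the standard definition and shows why BIC's per-parameter penalty exceeds AIC's once n is larger than a handful of examples.

```python
import numpy as np

def aic(k, log_l):
    # AIC = 2k - 2 log L, where log_l is the maximized log-likelihood
    return 2 * k - 2 * log_l

def bic(k, n, log_l):
    # BIC = k log(n) - 2 log L: the penalty per parameter is log(n),
    # which exceeds AIC's penalty of 2 whenever n > e^2 (about 7.4)
    return k * np.log(n) - 2 * log_l
```

For model selection, compute AIC (or BIC) for each candidate model and keep the one with the lowest value.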
Backward Search
Start with all the features
Greedily remove the least relevant feature
Stop when reduced to the desired number of features
Forward Search
Let F = {}
While not selected desired number of features
For each unused feature f:
Estimate the model's error on feature set F ∪ {f} (using cross-validation)
Add the feature giving the lowest error to F
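The forward search loop above can be written as a short greedy routine. Here `cv_error` is an assumed callable that returns the cross-validated error for a given feature subset (e.g. the K-fold estimator from earlier).

```python
def forward_search(all_features, cv_error, d):
    """Greedy forward feature selection.

    cv_error(feature_set) -> float is a placeholder for a
    cross-validated error estimate on that feature subset.
    """
    F = set()
    while len(F) < d:
        # try adding each unused feature; keep the one with lowest error
        best_f = min((f for f in all_features if f not in F),
                     key=lambda f: cv_error(F | {f}))
        F.add(best_f)
    return F
```

Each round costs one cross-validation run per unused feature, so selecting d of D features takes O(dD) model fits per fold.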
Backward Search
Let F = {all features}
While not reduced to desired number of features
For each feature f ∈ F:
Estimate the model's error on feature set F \ {f} (using cross-validation)
Remove the feature whose removal gives the lowest error
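Backward search is the mirror image: start from the full set and greedily drop the least relevant feature. As before, `cv_error` is an assumed cross-validated error callable.

```python
def backward_search(all_features, cv_error, d):
    """Greedy backward feature elimination.

    cv_error(feature_set) -> float is a placeholder for a
    cross-validated error estimate on that feature subset.
    """
    F = set(all_features)
    while len(F) > d:
        # drop the feature whose removal hurts the error least
        worst_f = min(F, key=lambda f: cv_error(F - {f}))
        F.remove(worst_f)
    return F
```

Backward search is costlier when d is small (it must fit models on large feature sets), but it can catch features that are only useful jointly, which forward search may miss.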