Lecture 7 - Feature Selection & Model Optimization
Hanoi, 10/2024
Outline
● True Error versus Empirical Error
● Overfitting, Underfitting
● Bias-Variance Tradeoff
● Model Optimization
○ Feature Selection
○ Regularization
○ Model Ensemble
FIT-CS INT3405E - Machine Learning 2
Recap: Support Vector Machines (SVM)
[Figures: SVM margin with support vectors; Price vs. Size regression fits illustrating "underfitting" and "overfitting"; the sigmoid function for classification]
Model Complexity
[Figure: true error and empirical error vs. model complexity — underfitting on the left, the best model in the middle, overfitting on the right]
Empirical error (training error) is no longer a good indicator of true error
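As a hedged illustration (toy synthetic data, not from the lecture), fitting polynomials of increasing degree shows the gap: training error keeps falling with model complexity, while test error does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: quadratic ground truth plus noise.
def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0, 0.2, n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

def mse(xs, ys, coeffs):
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

train_err, test_err = {}, {}
for degree in (1, 2, 12):                       # underfit, good fit, overfit
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err[degree] = mse(x_train, y_train, coeffs)
    test_err[degree] = mse(x_test, y_test, coeffs)
    print(degree, round(train_err[degree], 3), round(test_err[degree], 3))
```

The degree-12 fit achieves the lowest training error of the three, yet its test error is no better than the degree-2 fit that matches the true model.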
• Optimal predictor: the hypothesis that minimizes the true error.
• Regression (squared loss): the optimal predictor is f*(x) = E[Y | X = x].
• Approximation error: the gap between the best hypothesis in the model class and the optimal predictor.
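These terms fit together in the standard bias–variance decomposition for squared loss (here σ² denotes the irreducible noise):

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```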
[Figure: fits on 3 independent training datasets — high bias vs. low bias with overfitting]
• Low bias, high variance: a good approximation on average, but unstable — the fitted models vary widely across independent training datasets.
[Figure: true error and empirical error vs. model complexity — the underfitting region has high bias, the overfitting region has high variance, with the best model in between]
[Figure: learning curves (error vs. training set size) — high bias: training and test/CV error converge at a high value; high variance: a large gap between low training error and high test/CV error]
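A minimal sketch of the high-bias learning curve (assuming a synthetic dataset, not the lecture's): a linear model fit to quadratic data, where training and validation error converge at a high plateau as the training set grows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: quadratic target, fit with a LINEAR model => high bias.
x = rng.uniform(-1, 1, 500)
y = x**2 + rng.normal(0, 0.1, 500)
x_val, y_val = x[400:], y[400:]          # held-out validation split

def mse(xs, ys, coeffs):
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

err_train, err_val = {}, {}
for m in (10, 50, 400):                  # growing training set sizes
    coeffs = np.polyfit(x[:m], y[:m], 1) # degree-1 fit cannot capture x**2
    err_train[m] = mse(x[:m], y[:m], coeffs)
    err_val[m] = mse(x_val, y_val, coeffs)
    print(m, round(err_train[m], 3), round(err_val[m], 3))
```

With more data the two curves meet, but at a high error level: collecting more examples does not help a model that is too simple.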
Idea: compute the importance of each feature, then choose the most important ones.
Feature Selection - Information Gain
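One way to score a categorical feature is its information gain over the labels, IG(Y; X) = H(Y) − Σᵥ P(X=v)·H(Y | X=v). A minimal sketch on a hypothetical toy dataset (the "outlook"/"play" names below are illustrative, not from the lecture):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum over values v of P(X=v) * H(Y | X=v)."""
    n = len(labels)
    remainder = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: the feature fully determines the label,
# so its information gain equals the full label entropy (1 bit here).
outlook = ["sunny", "sunny", "rain", "rain", "sunny", "rain"]
play    = ["no",    "no",    "yes",  "yes",  "no",    "yes"]
print(information_gain(outlook, play))
```

Ranking features by this score and keeping the top-k is one concrete instance of "choose the most important ones."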
• Choice of regularizer
• Ridge regression (L2 penalty)
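Ridge regression adds an L2 penalty λ‖w‖² to least squares, shrinking the weights toward zero as λ grows. A minimal closed-form sketch on synthetic data (all names and values below are illustrative):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ true_w + rng.normal(0, 0.1, size=100)

w_small = ridge_fit(X, y, lam=0.01)   # close to ordinary least squares
w_large = ridge_fit(X, y, lam=100.0)  # heavily shrunk toward zero
print(np.round(w_small, 2))
print(np.round(w_large, 2))
```

Larger λ trades a little bias for lower variance, which is exactly the knob the bias–variance discussion above calls for.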
Thank you