
UET

Since 2004

ĐẠI HỌC CÔNG NGHỆ, ĐHQGHN


VNU-University of Engineering and Technology

INT3405 - Machine Learning


Lecture 7: Model Optimization

Hanoi, 10/2024
Outline
● True Error versus Empirical Error
● Overfitting, Underfitting
● Bias-Variance Tradeoff
● Model Optimization
○ Feature Selection
○ Regularization
○ Model Ensemble
FIT-CS INT3405E - Machine Learning 2
Recap: Support Vector Machines (SVM)

[Figure: SVM maximum-margin decision boundary; the highlighted points on the margin are the support vectors]

FIT-CS INT3405E - Machine Learning 3


Recap: Soft Margin SVM
● Standard Linear SVM
○ Introduce slack variables
○ Relax the constraints
○ Penalize the relaxation

Primal Problem:

C is a regularization parameter. The soft-margin SVM trades off between maximizing the margin and minimizing the misclassification error rate.
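For reference, since the equation did not survive the slide extraction, the standard soft-margin primal is:

\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, n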

FIT-CS INT3405E - Machine Learning 4


True Error versus Empirical Error
• True Error/Risk: target performance measure
• Classification: probability of misclassification
• Regression: mean squared error
• Performance on a random test/unseen point (X,Y)
• Empirical Error/Risk: performance on training data
• Classification: proportion of misclassified examples

• Regression: average squared error
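In standard notation (the slide's own formulas were not preserved in extraction): for a loss L, the true risk of a predictor h is

R(h) = \mathbb{E}_{(X,Y)}\big[\, L(h(X), Y) \,\big],

e.g. L(h(X),Y) = \mathbf{1}\{h(X) \ne Y\} for classification and L(h(X),Y) = (h(X) - Y)^2 for regression, while the empirical risk on a training set \{(x_i, y_i)\}_{i=1}^{n} is

\hat{R}_n(h) = \frac{1}{n} \sum_{i=1}^{n} L(h(x_i), y_i).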

FIT-CS INT3405E - Machine Learning 5


True Error versus Empirical Error

FIT-CS INT3405E - Machine Learning 6


Overfitting
Example: Linear regression (housing prices)
[Figure: three fits of housing price vs. size, using models of increasing complexity]

Overfitting: If we have too many features (complicated predictor), the learned hypothesis may fit the training set very well, but fail to generalize to new examples (predict prices on new examples).
FIT-CS INT3405E - Machine Learning 7
Overfitting versus Underfitting
Example: Logistic regression

[Figure: three logistic-regression decision boundaries of increasing complexity (y vs. x; g = sigmoid function); “Underfitting” on the left, “Overfitting” on the right]
FIT-CS INT3405E - Machine Learning 8
Model Complexity

[Figure: error vs. model complexity. Empirical (training) error decreases monotonically with complexity, while true error first falls and then rises; underfitting on the left, the best model in the middle, overfitting on the right.]
Empirical error (training error) is no longer a good indicator of true error

FIT-CS INT3405E - Machine Learning 9


Examples of Model Complexity
● Examples of Model Spaces with increasing complexity:
○ Regression with polynomials of order k=0,1,2,…

Higher degree => higher complexity

○ Decision Trees with depth k or with k leaves

Higher depth / more leaves => higher complexity

○ KNN classifiers with varying neighbourhood sizes k = 1, 2, 3, …

Smaller neighbourhood => higher complexity
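A minimal sketch of the first example (synthetic data; assumes numpy and scikit-learn are available): as the polynomial degree grows, training error keeps shrinking while held-out error eventually rises again.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=60)    # noisy synthetic target
x_tr, x_te, y_tr, y_te = x[:40], x[40:], y[:40], y[40:]   # simple train / test split

for degree in (1, 3, 9, 15):                              # increasing model complexity
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_tr, y_tr)
    train_err = mean_squared_error(y_tr, model.predict(x_tr))   # empirical error
    test_err = mean_squared_error(y_te, model.predict(x_te))    # proxy for true error
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")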

FIT-CS INT3405E - Machine Learning 10


Risk Analysis (1)
• True Error/Risk vs Empirical Error/Risk

• Optimal Predictor

• Empirical Error Minimization over a model class
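Written out (standard definitions, since the slide's equations were lost in extraction): the optimal predictor is

f^* = \arg\min_{f} R(f),

and empirical risk minimization over a model class \mathcal{F} returns

\hat{f}_n = \arg\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} L(f(x_i), y_i).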

FIT-CS INT3405E - Machine Learning 11


Risk Analysis (2)
• Excess Risk

Excess risk = estimation error + approximation error

• Estimation error: due to the randomness of the training data (finite sample size)
• Approximation error: due to the restriction to a model class

[Figure: the excess risk split into estimation error and approximation error]
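In symbols, the decomposition sketched above is

R(\hat{f}_n) - R(f^*)
= \underbrace{\Big[ R(\hat{f}_n) - \inf_{f \in \mathcal{F}} R(f) \Big]}_{\text{estimation error}}
+ \underbrace{\Big[ \inf_{f \in \mathcal{F}} R(f) - R(f^*) \Big]}_{\text{approximation error}}.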
FIT-CS INT3405E - Machine Learning 12


Bias - Variance Trade-off

• Regression:

Notice that the optimal predictor does not have zero error.
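The standard regression decomposition behind this slide (assuming Y = f(X) + \varepsilon with \mathbb{E}[\varepsilon] = 0 and \mathrm{Var}(\varepsilon) = \sigma^2, and with \hat{f}_D the predictor learned from training set D), evaluated at a fixed point x:

\mathbb{E}_{D,\varepsilon}\big[(Y - \hat{f}_D(x))^2\big]
= \underbrace{\sigma^2}_{\text{noise}}
+ \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathrm{Var}_D\big(\hat{f}_D(x)\big)}_{\text{variance}}

The irreducible noise term \sigma^2 is why even the optimal predictor has nonzero error.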

FIT-CS INT3405E - Machine Learning 14


Bias

FIT-CS INT3405E - Machine Learning 15


Variance

FIT-CS INT3405E - Machine Learning 16


Bias and Variance: Intuition

[Figure: dartboard illustration, a 2×2 grid of low/high bias × low/high variance; high bias corresponds to underfitting, high variance to overfitting]

FIT-CS INT3405E - Machine Learning 17




Bias -Variance Trade-off
• High bias, low variance – poor approximation but robust/stable
• Low bias, high variance – good approximation but unstable

[Figure: predictors fit on 3 independent training datasets]

FIT-CS INT3405E - Machine Learning 19


Model Optimization

[Figure: true error and empirical error vs. model complexity; underfitting on the left, the best model in the middle, overfitting on the right]

FIT-CS INT3405E - Machine Learning 20


Learning Curve

[Figure: learning curves (error vs. training set size).
Left, high bias: training and test/CV error converge to a similar, high error.
Right, high variance: a large gap remains between a low training error and a high test/CV error.]

FIT-CS INT3405E - Machine Learning 21


How to address Overfitting
● Reduce number of features
○ Feature selection
○ Model selection algorithms
● Regularization
○ Incorporate model complexity for optimization, penalize
complex models using prior knowledge
○ Keep all the features, but reduce magnitude/values of model
parameters
○ Works well when we have a lot of features, each of which
contributes a bit to the prediction

FIT-CS INT3405E - Machine Learning 22


Feature Selection
Idea: Find the best subset of features for building an optimized model => reduce the model complexity.

FIT-CS INT3405E - Machine Learning 23


Feature Selection Methods

FIT-CS INT3405E - Machine Learning 24


Supervised Feature Selection

FIT-CS INT3405E - Machine Learning 25


Feature Selection - Filter Methods

Idea: Compute an importance score for each feature, independently of the final model => choose the most important ones
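A minimal filter-method sketch (assumes scikit-learn; the dataset is only for illustration). Mutual information plays the role of information gain on the next slide, and chi-square is the criterion on the slide after that; both score each feature on its own and keep the top-k:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# Score each feature independently (information gain ~ mutual information),
# then keep the 10 highest-scoring features.
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_new = selector.fit_transform(X, y)
print(X_new.shape)            # (n_samples, 10)
print(selector.scores_[:5])   # per-feature importance scores

# Chi-square is another common filter criterion (requires non-negative features).
X_chi2 = SelectKBest(score_func=chi2, k=10).fit_transform(X, y)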
FIT-CS INT3405E - Machine Learning 26
Feature Selection - Information Gain

FIT-CS INT3405E - Machine Learning 27


Feature Selection - Chi-square

FIT-CS INT3405E - Machine Learning 28


Feature Selection - Wrapper Methods

Idea: Gradually build up the set of selected features, evaluating each candidate subset with the learning model itself
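A wrapper-method sketch (assumes scikit-learn's SequentialFeatureSelector; the model and parameters are illustrative): feature subsets are grown greedily, and each candidate subset is scored by cross-validating the actual model.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Greedy forward selection: repeatedly add the feature whose inclusion
# most improves the cross-validated score of the wrapped model.
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=8,
    direction="forward",
    cv=5,
)
sfs.fit(X, y)
print(sfs.get_support())      # boolean mask of the selected features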

FIT-CS INT3405E - Machine Learning 29


Feature Selection - Regularization
• Regularized learning framework: minimize empirical error + cost of model (model complexity)

• Penalize complex models using prior knowledge.
• Two examples
• Regularized Linear Regression (ridge regression)
• Regularized Logistic Regression

FIT-CS INT3405E - Machine Learning 30


Regularized Linear Regression
• Linear Regression

• Regularized Linear Regression

• Choice of regularizer
• “Ridge Regression” (L2 penalty)

• “Lasso” (Least Absolute Shrinkage and Selection Operator; L1 penalty)
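Written out (standard formulations, since the slide's equations did not survive extraction), with regularization weight \lambda > 0:

\text{Ridge:} \quad \min_{w} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 + \lambda \|w\|_2^2
\qquad
\text{Lasso:} \quad \min_{w} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 + \lambda \|w\|_1

The L1 penalty drives some weights exactly to zero, which is why Lasso also acts as a feature selector.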

FIT-CS INT3405E - Machine Learning 31


Regularized Logistic Regression

• L2-regularized Logistic Regression

• L1-regularized Logistic Regression (“Sparse Logistic Regression”)
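A small illustration with scikit-learn (the dataset and hyperparameters are arbitrary; note that sklearn's C is the inverse of the regularization strength λ):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# L2-regularized logistic regression (the default penalty).
l2_model = LogisticRegression(penalty="l2", C=1.0, max_iter=5000).fit(X, y)

# L1-regularized ("sparse") logistic regression: many weights become exactly 0.
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
print("non-zero weights:", np.count_nonzero(l1_model.coef_))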

FIT-CS INT3405E - Machine Learning 32


Model Ensemble
• Basic Idea: Instead of learning one model, learn several and combine them

• Typically improves the accuracy, often by a lot

FIT-CS INT3405E - Machine Learning 33


Why does It Work?
Suppose there are 25 base classifiers
• Each classifier has error rate ε = 0.35
• Assume the classifiers are independent
• Probability that the ensemble (majority-vote) classifier makes a wrong
prediction = probability that at least 13 of the 25 classifiers misclassify
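This is a binomial tail probability,

P(\text{ensemble wrong}) = \sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1-\varepsilon)^{25-i} \approx 0.06,

which can be checked directly from the numbers on the slide:

from math import comb

eps, n = 0.35, 25
# The majority vote is wrong when at least 13 of the 25 independent classifiers err.
p_wrong = sum(comb(n, k) * eps**k * (1 - eps)**(n - k) for k in range(13, n + 1))
print(round(p_wrong, 3))   # ~0.06, far below the individual error rate of 0.35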

FIT-CS INT3405E - Machine Learning 34


Bagging Classifiers
• In general, sampling from P(h|D) is difficult
• P(h|D) is difficult to compute
• P(h|D) is impossible to compute for non-probabilistic classifiers such as SVM
• Bagging Classifiers:
• Realize sampling from P(h|D) by resampling the training examples

FIT-CS INT3405E - Machine Learning 35


Bootstrap Sampling
• Bagging = Bootstrap aggregating
• Bootstrap sampling: given a set D containing n training examples
• Create Di by drawing n examples at random with replacement from D
• Di is expected to leave out about 37% of the examples in D

FIT-CS INT3405E - Machine Learning 36


Bagging
• Sampling with replacement

• Build classifier on each bootstrap sample


• In a single draw, a given training example has probability (1 – 1/n) of not being selected
• Across all n draws, an example has probability (1 – 1/n)^n of never being selected
• This value tends to 1/e ≈ 0.37 for large n
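A quick numerical check of the 1/e figure (numpy; the index set is synthetic):

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
bootstrap = rng.integers(0, n, size=n)        # draw n indices with replacement
left_out = 1 - np.unique(bootstrap).size / n  # fraction of D never drawn
print(left_out)                               # ≈ 0.37, i.e. (1 - 1/n)^n ≈ 1/e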

FIT-CS INT3405E - Machine Learning 37


Bagging Algorithm
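The algorithm box did not survive extraction; a minimal sketch of bagging with decision trees (scikit-learn; dataset and parameters are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Bagging: train 50 trees, each on a bootstrap sample of the training data,
# and aggregate their predictions by majority vote.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        bootstrap=True, random_state=0)
print(cross_val_score(bag, X, y, cv=5).mean())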

FIT-CS INT3405E - Machine Learning 38


Bagging ~ Bayesian Average

FIT-CS INT3405E - Machine Learning 39


Inefficiency with Bagging

• Inefficient bootstrap sampling:


• Every example has equal chance to be
sampled
• No distinction between “easy”
examples and “difficult” examples
• Inefficient model combination:
• A constant weight for each classifier
• No distinction between accurate
classifiers and inaccurate classifiers

FIT-CS INT3405E - Machine Learning 40


Improve the Efficiency of Bagging
• Better sampling strategy
• Focus on the examples that are difficult to classify

• Better combination strategy


• Accurate models should be assigned larger weights

FIT-CS INT3405E - Machine Learning 41


Intuition

FIT-CS INT3405E - Machine Learning 42


Boosting: Example
• Instances that are wrongly classified will have their weights increased
• Instances that are correctly classified will have their weights decreased

• Example 4 is hard to classify


• Its weight is increased, therefore it is more likely to be
chosen again in subsequent rounds

FIT-CS INT3405E - Machine Learning 43


AdaBoost
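The slide's algorithm box was not extracted; the standard binary AdaBoost update it refers to (labels y_i in {-1, +1}) is: at round t, fit a weak learner h_t on the weighted data, compute its weighted error \varepsilon_t, and set

\alpha_t = \tfrac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t},
\qquad
w_i \leftarrow w_i \exp\big(-\alpha_t\, y_i\, h_t(x_i)\big) \;\; \text{(then renormalize)},

so misclassified examples are up-weighted and correctly classified ones down-weighted (matching the previous slide), with final prediction H(x) = \mathrm{sign}\big(\sum_t \alpha_t h_t(x)\big).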

FIT-CS INT3405E - Machine Learning 44


AdaBoost Example

FIT-CS INT3405E - Machine Learning 45


Stacking
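The stacking diagram was not extracted; a sketch with scikit-learn (the base learners are illustrative): the base models' out-of-fold predictions become the input features of a second-level meta-model.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Level-0 base learners are combined by a level-1 meta-learner trained on
# their out-of-fold predictions.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
print(cross_val_score(stack, X, y, cv=5).mean())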

FIT-CS INT3405E - Machine Learning 46


Summary
● True Error versus Empirical Error
● Overfitting, Underfitting
● Bias-Variance Tradeoff
● Model Optimization
○ Feature Selection
○ Regularization
○ Model Ensemble
FIT-CS INT3405E - Machine Learning 47
UET
Since 2004

ĐẠI HỌC CÔNG NGHỆ, ĐHQGHN


VNU-University of Engineering and Technology

Thank you
