
UET

Since 2004

ĐẠI HỌC CÔNG NGHỆ, ĐHQGHN


VNU-University of Engineering and Technology

INT3405 - Machine Learning


Lecture 7: Model Optimization

Hanoi, 10/2024
Outline
● True Error versus Empirical Error
● Overfitting, Underfitting
● Bias-Variance Tradeoff
● Model Optimization
○ Feature Selection
○ Regularization
○ Model Ensemble
FIT-CS INT3405E - Machine Learning 2
Recap: Support Vector Machines (SVM)

[Figure: SVM maximum-margin decision boundary; the highlighted points on the margin are the support vectors]

FIT-CS INT3405E - Machine Learning 3


Recap: Soft Margin SVM
● Standard Linear SVM
○ Introduce slack variables
○ Relax the constraints
○ Penalize the relaxation

Primal Problem:

C is a regularization parameter. The soft-margin SVM trades off between maximizing the margin and minimizing the misclassification error rate.
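For reference, since the equation did not survive the slide extraction, the standard soft-margin primal is:

\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, n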

FIT-CS INT3405E - Machine Learning 4


True Error versus Empirical Error
• True Error/Risk: target performance measure
• Classification: probability of misclassification
• Regression: mean squared error
• Performance on a random test/unseen point (X,Y)
• Empirical Error/Risk: performance on training data
• Classification: proportion of misclassified examples

• Regression: average squared error
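In standard notation (the slide's own formulas were not preserved in extraction): for a loss L, the true risk of a predictor h is

R(h) = \mathbb{E}_{(X,Y)}\big[\, L(h(X), Y) \,\big],

e.g. L(h(X),Y) = \mathbf{1}\{h(X) \ne Y\} for classification and L(h(X),Y) = (h(X) - Y)^2 for regression, while the empirical risk on a training set \{(x_i, y_i)\}_{i=1}^{n} is

\hat{R}_n(h) = \frac{1}{n} \sum_{i=1}^{n} L(h(x_i), y_i).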

FIT-CS INT3405E - Machine Learning 5


True Error versus Empirical Error

FIT-CS INT3405E - Machine Learning 6


Overfitting
Example: Linear regression (housing prices)
[Figure: three fits of housing price vs. size, using models of increasing complexity]

Overfitting: If we have too many features (complicated predictor), the learned hypothesis may fit the training set very well, but fail to generalize to new examples (predict prices on new examples).
FIT-CS INT3405E - Machine Learning 7
Overfitting versus Underfitting
Example: Logistic regression

[Figure: three logistic-regression decision boundaries of increasing complexity (y vs. x; g = sigmoid function); “Underfitting” on the left, “Overfitting” on the right]
FIT-CS INT3405E - Machine Learning 8
Model Complexity

[Figure: error vs. model complexity. Empirical (training) error decreases monotonically with complexity, while true error first falls and then rises; underfitting on the left, the best model in the middle, overfitting on the right.]
Empirical error (training error) is no longer a good indicator of true error

FIT-CS INT3405E - Machine Learning 9


Examples of Model Complexity
● Examples of Model Spaces with increasing complexity:
○ Regression with polynomials of order k=0,1,2,…

Higher degree => higher complexity

○ Decision Trees with depth k or with k leaves

Higher depth / more leaves => higher complexity

○ KNN classifiers with varying neighbourhood sizes k = 1, 2, 3, …

Smaller neighbourhood => higher complexity
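A minimal sketch of the first example (synthetic data; assumes numpy and scikit-learn are available): as the polynomial degree grows, training error keeps shrinking while held-out error eventually rises again.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=60)    # noisy synthetic target
x_tr, x_te, y_tr, y_te = x[:40], x[40:], y[:40], y[40:]   # simple train / test split

for degree in (1, 3, 9, 15):                              # increasing model complexity
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_tr, y_tr)
    train_err = mean_squared_error(y_tr, model.predict(x_tr))   # empirical error
    test_err = mean_squared_error(y_te, model.predict(x_te))    # proxy for true error
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")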

FIT-CS INT3405E - Machine Learning 10


Risk Analysis (1)
• True Error/Risk vs Empirical Error/Risk

• Optimal Predictor

• Empirical Error Minimization over a model class
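Written out (standard definitions, since the slide's equations were lost in extraction): the optimal predictor is

f^* = \arg\min_{f} R(f),

and empirical risk minimization over a model class \mathcal{F} returns

\hat{f}_n = \arg\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} L(f(x_i), y_i).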

FIT-CS INT3405E - Machine Learning 11


Risk Analysis (2)
• Excess Risk

Excess risk = estimation error + approximation error

• Estimation error: due to the randomness of the training data (finite sample size)
• Approximation error: due to the restriction to a model class

[Figure: the excess risk split into estimation error and approximation error]
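In symbols, the decomposition sketched above is

R(\hat{f}_n) - R(f^*)
= \underbrace{\Big[ R(\hat{f}_n) - \inf_{f \in \mathcal{F}} R(f) \Big]}_{\text{estimation error}}
+ \underbrace{\Big[ \inf_{f \in \mathcal{F}} R(f) - R(f^*) \Big]}_{\text{approximation error}}.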
FIT-CS INT3405E - Machine Learning 12


Bias - Variance Trade-off

• Regression:

Notice that the optimal predictor does not have zero error.
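The standard regression decomposition behind this slide (assuming Y = f(X) + \varepsilon with \mathbb{E}[\varepsilon] = 0 and \mathrm{Var}(\varepsilon) = \sigma^2, and with \hat{f}_D the predictor learned from training set D), evaluated at a fixed point x:

\mathbb{E}_{D,\varepsilon}\big[(Y - \hat{f}_D(x))^2\big]
= \underbrace{\sigma^2}_{\text{noise}}
+ \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathrm{Var}_D\big(\hat{f}_D(x)\big)}_{\text{variance}}

The irreducible noise term \sigma^2 is why even the optimal predictor has nonzero error.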

FIT-CS INT3405E - Machine Learning 14


Bias

FIT-CS INT3405E - Machine Learning 15


Variance

FIT-CS INT3405E - Machine Learning 16


Bias and Variance: Intuition

[Figure: dartboard illustration, a 2×2 grid of low/high bias × low/high variance; high bias corresponds to underfitting, high variance to overfitting]

FIT-CS INT3405E - Machine Learning 17




Bias -Variance Trade-off
• High bias, low variance – poor approximation but robust/stable
• Low bias, high variance – good approximation but unstable

[Figure: predictors fit on 3 independent training datasets]

FIT-CS INT3405E - Machine Learning 19


Model Optimization

[Figure: true error and empirical error vs. model complexity; underfitting on the left, the best model in the middle, overfitting on the right]

FIT-CS INT3405E - Machine Learning 20


Learning Curve

[Figure: learning curves (error vs. training set size).
Left, high bias: training and test/CV error converge to a similar, high error.
Right, high variance: a large gap remains between a low training error and a high test/CV error.]

FIT-CS INT3405E - Machine Learning 21


How to address Overfitting
● Reduce number of features
○ Feature selection
○ Model selection algorithms
● Regularization
○ Incorporate model complexity for optimization, penalize
complex models using prior knowledge
○ Keep all the features, but reduce magnitude/values of model
parameters
○ Works well when we have a lot of features, each of which
contributes a bit to the prediction

FIT-CS INT3405E - Machine Learning 22


Feature Selection
Idea: Find the best subset of features for building an optimized model => reduce the model complexity.

FIT-CS INT3405E - Machine Learning 23


Feature Selection Methods

FIT-CS INT3405E - Machine Learning 24


Supervised Feature Selection

FIT-CS INT3405E - Machine Learning 25


Feature Selection - Filter Methods

Idea: Compute an importance score for each feature, independently of the final model => choose the most important ones
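A minimal filter-method sketch (assumes scikit-learn; the dataset is only for illustration). Mutual information plays the role of information gain on the next slide, and chi-square is the criterion on the slide after that; both score each feature on its own and keep the top-k:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# Score each feature independently (information gain ~ mutual information),
# then keep the 10 highest-scoring features.
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_new = selector.fit_transform(X, y)
print(X_new.shape)            # (n_samples, 10)
print(selector.scores_[:5])   # per-feature importance scores

# Chi-square is another common filter criterion (requires non-negative features).
X_chi2 = SelectKBest(score_func=chi2, k=10).fit_transform(X, y)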
FIT-CS INT3405E - Machine Learning 26
Feature Selection - Information Gain

FIT-CS INT3405E - Machine Learning 27


Feature Selection - Chi-square

FIT-CS INT3405E - Machine Learning 28


Feature Selection - Wrapper Methods

Idea: Gradually build up the set of selected features, evaluating each candidate subset with the learning model itself
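A wrapper-method sketch (assumes scikit-learn's SequentialFeatureSelector; the model and parameters are illustrative): feature subsets are grown greedily, and each candidate subset is scored by cross-validating the actual model.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Greedy forward selection: repeatedly add the feature whose inclusion
# most improves the cross-validated score of the wrapped model.
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=8,
    direction="forward",
    cv=5,
)
sfs.fit(X, y)
print(sfs.get_support())      # boolean mask of the selected features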

FIT-CS INT3405E - Machine Learning 29


Feature Selection - Regularization
• Regularized learning framework: minimize empirical error + cost of model (model complexity)

• Penalize complex models using prior knowledge.
• Two examples
• Regularized Linear Regression (ridge regression)
• Regularized Logistic Regression

FIT-CS INT3405E - Machine Learning 30


Regularized Linear Regression
• Linear Regression

• Regularized Linear Regression

• Choice of regularizer
• “Ridge Regression” (L2 penalty)

• “Lasso” (Least Absolute Shrinkage and Selection Operator; L1 penalty)
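Written out (standard formulations, since the slide's equations did not survive extraction), with regularization weight \lambda > 0:

\text{Ridge:} \quad \min_{w} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 + \lambda \|w\|_2^2
\qquad
\text{Lasso:} \quad \min_{w} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 + \lambda \|w\|_1

The L1 penalty drives some weights exactly to zero, which is why Lasso also acts as a feature selector.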

FIT-CS INT3405E - Machine Learning 31


Regularized Logistic Regression

• L2-regularized Logistic Regression

• L1-regularized Logistic Regression (“Sparse Logistic Regression”)
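A small illustration with scikit-learn (the dataset and hyperparameters are arbitrary; note that sklearn's C is the inverse of the regularization strength λ):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# L2-regularized logistic regression (the default penalty).
l2_model = LogisticRegression(penalty="l2", C=1.0, max_iter=5000).fit(X, y)

# L1-regularized ("sparse") logistic regression: many weights become exactly 0.
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
print("non-zero weights:", np.count_nonzero(l1_model.coef_))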

FIT-CS INT3405E - Machine Learning 32


Model Ensemble
• Basic Idea: Instead of learning one model, learn several and combine them

• Typically improves the accuracy, often by a lot

FIT-CS INT3405E - Machine Learning 33


Why does It Work?
Suppose there are 25 base classifiers
• Each classifier has error rate ε = 0.35
• Assume the classifiers are independent
• Probability that the ensemble (majority-vote) classifier makes a wrong
prediction = probability that at least 13 of the 25 classifiers misclassify
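This is a binomial tail probability,

P(\text{ensemble wrong}) = \sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1-\varepsilon)^{25-i} \approx 0.06,

which can be checked directly from the numbers on the slide:

from math import comb

eps, n = 0.35, 25
# The majority vote is wrong when at least 13 of the 25 independent classifiers err.
p_wrong = sum(comb(n, k) * eps**k * (1 - eps)**(n - k) for k in range(13, n + 1))
print(round(p_wrong, 3))   # ~0.06, far below the individual error rate of 0.35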

FIT-CS INT3405E - Machine Learning 34


Bagging Classifiers
• In general, sampling from P(h|D) is difficult
• P(h|D) is difficult to compute
• P(h|D) is impossible to compute for non-probabilistic classifiers such as SVM
• Bagging Classifiers:
• Realize sampling from P(h|D) by resampling the training examples

FIT-CS INT3405E - Machine Learning 35


Bootstrap Sampling
• Bagging = Bootstrap aggregating
• Bootstrap sampling: given a set D containing n training examples
• Create Di by drawing n examples at random with replacement from D
• Di is expected to leave out about 37% of the examples in D

FIT-CS INT3405E - Machine Learning 36


Bagging
• Sampling with replacement

• Build classifier on each bootstrap sample


• In a single draw, a given training example has probability (1 – 1/n) of not being selected
• Across all n draws, an example has probability (1 – 1/n)^n of never being selected
• This value tends to 1/e ≈ 0.37 for large n
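A quick numerical check of the 1/e figure (numpy; the index set is synthetic):

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
bootstrap = rng.integers(0, n, size=n)        # draw n indices with replacement
left_out = 1 - np.unique(bootstrap).size / n  # fraction of D never drawn
print(left_out)                               # ≈ 0.37, i.e. (1 - 1/n)^n ≈ 1/e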

FIT-CS INT3405E - Machine Learning 37


Bagging Algorithm
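The algorithm box did not survive extraction; a minimal sketch of bagging with decision trees (scikit-learn; dataset and parameters are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Bagging: train 50 trees, each on a bootstrap sample of the training data,
# and aggregate their predictions by majority vote.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        bootstrap=True, random_state=0)
print(cross_val_score(bag, X, y, cv=5).mean())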

FIT-CS INT3405E - Machine Learning 38


Bagging ~ Bayesian Average

FIT-CS INT3405E - Machine Learning 39


Inefficiency with Bagging

• Inefficient bootstrap sampling:


• Every example has equal chance to be
sampled
• No distinction between “easy”
examples and “difficult” examples
• Inefficient model combination:
• A constant weight for each classifier
• No distinction between accurate
classifiers and inaccurate classifiers

FIT-CS INT3405E - Machine Learning 40


Improve the Efficiency of Bagging
• Better sampling strategy
• Focus on the examples that are difficult to classify

• Better combination strategy


• Accurate models should be assigned larger weights

FIT-CS INT3405E - Machine Learning 41


Intuition

FIT-CS INT3405E - Machine Learning 42


Boosting: Example
• Instances that are wrongly classified will have their weights increased
• Instances that are correctly classified will have their weights decreased

• Example 4 is hard to classify


• Its weight is increased, therefore it is more likely to be
chosen again in subsequent rounds

FIT-CS INT3405E - Machine Learning 43


AdaBoost
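The slide's algorithm box was not extracted; the standard binary AdaBoost update it refers to (labels y_i in {-1, +1}) is: at round t, fit a weak learner h_t on the weighted data, compute its weighted error \varepsilon_t, and set

\alpha_t = \tfrac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t},
\qquad
w_i \leftarrow w_i \exp\big(-\alpha_t\, y_i\, h_t(x_i)\big) \;\; \text{(then renormalize)},

so misclassified examples are up-weighted and correctly classified ones down-weighted (matching the previous slide), with final prediction H(x) = \mathrm{sign}\big(\sum_t \alpha_t h_t(x)\big).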

FIT-CS INT3405E - Machine Learning 44


AdaBoost Example

FIT-CS INT3405E - Machine Learning 45


Stacking
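The stacking diagram was not extracted; a sketch with scikit-learn (the base learners are illustrative): the base models' out-of-fold predictions become the input features of a second-level meta-model.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Level-0 base learners are combined by a level-1 meta-learner trained on
# their out-of-fold predictions.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
print(cross_val_score(stack, X, y, cv=5).mean())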

FIT-CS INT3405E - Machine Learning 46


Summary
● True Error versus Empirical Error
● Overfitting, Underfitting
● Bias-Variance Tradeoff
● Model Optimization
○ Feature Selection
○ Regularization
○ Model Ensemble
FIT-CS INT3405E - Machine Learning 47
UET
Since 2004

ĐẠI HỌC CÔNG NGHỆ, ĐHQGHN


VNU-University of Engineering and Technology

Thank you
