This document discusses validation and model selection in machine learning. It explains that validation is used to directly estimate out-of-sample error, while regularization is used to estimate the overfitting penalty. It also discusses splitting data into training and validation sets, analyzing the validation error estimate, and using cross-validation to address the tradeoff between validation set size and model selection bias. Cross-validation involves dividing data into folds and iteratively training and validating on different folds to select models and hyperparameters.


ECS171: Machine Learning

Lecture 13: Validation, Model Selection

Cho-Jui Hsieh
UC Davis

Feb 28, 2018


Validation

Validation versus regularization

We know:

$$E_{\text{out}}(h) = E_{\text{in}}(h) + \underbrace{\text{overfit penalty}}_{\text{regularization estimates this}}$$

Regularization: estimates the overfit penalty.

Validation: directly estimates $E_{\text{out}}(h)$.


Analyzing the estimate

On an out-of-sample point $(x, y)$, the error is $e(h(x), y)$, e.g., $e(h(x), y) = (h(x) - y)^2$.

Out-of-sample (test) error:

$$\mathbb{E}[e(h(x), y)] = E_{\text{out}}(h)$$

Variance:

$$\text{var}[e(h(x), y)] = \sigma^2$$
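As a concrete illustration, here is a minimal sketch of the pointwise squared error; the linear hypothesis, its weights, and the data point are all made up for the example:

```python
import numpy as np

# Hypothetical linear hypothesis h(x) = w^T x; weights and point are made up
w = np.array([0.5, -1.0])
x, y = np.array([2.0, 1.0]), 0.3        # one out-of-sample point (x, y)

e = (w @ x - y) ** 2                    # e(h(x), y) = (h(x) - y)^2
print(e)                                # ≈ 0.09
```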
Validation error

Given a set $(x_1, y_1), \cdots, (x_K, y_K)$ of validation data,

$$E_{\text{val}}(h) = \frac{1}{K} \sum_{k=1}^{K} e(h(x_k), y_k)$$

$$\mathbb{E}[E_{\text{val}}(h)] = \frac{1}{K} \sum_{k=1}^{K} \mathbb{E}[e(h(x_k), y_k)] = E_{\text{out}}(h)$$

$$\text{var}[E_{\text{val}}(h)] = \frac{1}{K^2} \sum_{k=1}^{K} \text{var}[e(h(x_k), y_k)] = \frac{\sigma^2}{K}$$

So we (roughly) have

$$E_{\text{val}}(h) = E_{\text{out}}(h) \pm \underbrace{O\!\left(\frac{1}{\sqrt{K}}\right)}_{\text{standard deviation}}$$
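A minimal numerical sketch of these identities (the hypothesis h, the data distribution, and all values here are made up for illustration): averaging pointwise errors over K validation points gives E_val(h), and its spread around E_out(h) shrinks roughly like 1/√K.

```python
import numpy as np

rng = np.random.default_rng(0)

def h(x):
    # A fixed hypothesis (here simply the true slope; purely illustrative)
    return 2.0 * x

def e_val(K):
    # K validation points from a synthetic distribution y = 2x + noise, noise ~ N(0, 0.5^2)
    x = rng.uniform(-1, 1, size=K)
    y = 2.0 * x + rng.normal(0, 0.5, size=K)
    return np.mean((h(x) - y) ** 2)     # E_val(h) = (1/K) * sum_k e(h(x_k), y_k)

# E_out(h) = 0.25 here (the noise variance); the spread of E_val shrinks ~ 1/sqrt(K)
for K in (10, 100, 1000):
    estimates = [e_val(K) for _ in range(200)]
    print(K, round(np.mean(estimates), 3), round(np.std(estimates), 3))
```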
Validation is taken out of training set

Given the data set $D = (x_1, y_1), \cdots, (x_N, y_N)$, split into:

$D_{\text{val}}$: $K$ points for validation
$D_{\text{train}}$: $N - K$ points for training

Tradeoff in choosing $K$:
Small $K$ ⟹ bad estimate
Large $K$ ⟹ small set for training
Validation

$$D \longrightarrow \underbrace{D_{\text{train}}}_{N-K} \cup \underbrace{D_{\text{val}}}_{K}$$

$D_{\text{train}} \Longrightarrow g^-$

Validation error: $E_{\text{val}} = E_{\text{val}}(g^-)$

Final model: $D \Longrightarrow g$

Rule of Thumb: $K = N/5$
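A minimal sketch of such a split (generic NumPy arrays; the helper name and the 20% fraction following the K = N/5 rule of thumb are illustrative choices):

```python
import numpy as np

def train_val_split(X, y, val_frac=0.2, seed=0):
    """Split D into D_train (N - K points) and D_val (K points), K ≈ N/5."""
    rng = np.random.default_rng(seed)
    N = len(y)
    K = int(round(val_frac * N))
    perm = rng.permutation(N)
    val_idx, train_idx = perm[:K], perm[K:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Usage on synthetic data
X = np.random.randn(100, 3)
y = np.random.randn(100)
X_tr, y_tr, X_val, y_val = train_val_split(X, y)
print(X_tr.shape, X_val.shape)   # (80, 3) (20, 3)
```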


Why “validation”?

$D_{\text{val}}$ is used to make learning choices (e.g., parameter tuning).

Examples: regularization parameter $\lambda$, number of iterations, ...

Validation error ≠ test error, because the validation set affects learning (optimistic bias).
Model selection by validation

$D = D_{\text{train}} \cup D_{\text{val}}$

$M$ models $H_1, \cdots, H_M$

Use $D_{\text{train}}$ to learn $g_m^-$ for each model

Evaluate each $g_m^-$ using $D_{\text{val}}$: $E_m = E_{\text{val}}(g_m^-)$, $m = 1, \cdots, M$

Pick the model $m = m^*$ with the smallest $E_m$
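A minimal sketch of this selection loop. Here the M candidate models are polynomial fits of different degrees; this is an illustrative choice, not the lecture's specific example:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=200)
y = np.sin(3 * X) + rng.normal(0, 0.2, size=200)

# D = D_train ∪ D_val
X_tr, y_tr, X_val, y_val = X[:160], y[:160], X[160:], y[160:]

def fit(degree, X, y):
    # Learn g_m^- on D_train: least-squares polynomial of the given degree
    return np.polyfit(X, y, degree)

def e_val(coef, X, y):
    # E_m = E_val(g_m^-): mean squared error on D_val
    return np.mean((np.polyval(coef, X) - y) ** 2)

models = [1, 2, 3, 5, 8]                      # H_1, ..., H_M (polynomial degrees)
errors = [e_val(fit(d, X_tr, y_tr), X_val, y_val) for d in models]
m_star = models[int(np.argmin(errors))]       # pick the model with smallest E_m
print(m_star, errors)
```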
The bias in validation

We select the model $H_{m^*}$ using $D_{\text{val}}$

$E_{\text{val}}(g_{m^*}^-)$ is a biased estimate of $E_{\text{out}}(g_{m^*}^-)$
How much bias?

For $M$ models $H_1, \cdots, H_M$, assume $D_{\text{val}}$ is used for “training” on the finalist models.

Selecting the best model from

$$H_{\text{val}} = \{g_1^-, g_2^-, \cdots, g_M^-\}$$

Back to Hoeffding and VC:

$$E_{\text{out}}(g_{m^*}^-) \le E_{\text{val}}(g_{m^*}^-) + O\!\left(\sqrt{\frac{\log M}{K}}\right)$$

For continuous-valued choices (e.g., the regularization parameter), $M$ can be replaced by the growth function $m_H(K)$.
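For intuition about the size of this bias term, a tiny sketch that evaluates √(log M / K) for a few values (the constant hidden in the O(·) is ignored, so these are only relative magnitudes):

```python
import numpy as np

# sqrt(log M / K): grows slowly with the number of models M, shrinks with K
for M in (2, 10, 100):
    for K in (20, 100, 500):
        print(f"M={M:3d}  K={K:3d}  bound_term={np.sqrt(np.log(M) / K):.3f}")
```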
Cross Validation

The dilemma about K

The chain of reasoning:

$$E_{\text{out}}(g) \underset{\text{(small } K\text{)}}{\approx} E_{\text{out}}(g^-) \underset{\text{(large } K\text{)}}{\approx} E_{\text{val}}(g^-)$$

Can we have $K$ both small and large?


Leave one out

$N - 1$ points for training, and 1 point for validation!

$$D_n = (x_1, y_1), \cdots, (x_{n-1}, y_{n-1}), \underbrace{(x_n, y_n)}_{\text{held out}}, (x_{n+1}, y_{n+1}), \cdots, (x_N, y_N)$$

Training: $D_n \longrightarrow g_n^-$

$$e_n = E_{\text{val}}(g_n^-) = e(g_n^-(x_n), y_n)$$

Cross-validation error: $E_{\text{CV}} = \frac{1}{N} \sum_{n=1}^{N} e_n$
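A minimal leave-one-out sketch (the base learner, a one-variable least-squares line, and the synthetic data are illustrative; any fit/predict pair could be plugged in):

```python
import numpy as np

def loo_cv_error(X, y, fit, predict):
    """E_CV = (1/N) * sum_n e(g_n^-(x_n), y_n), leaving one point out per round."""
    N = len(y)
    errors = np.empty(N)
    for n in range(N):
        mask = np.arange(N) != n                       # D_n: all points except (x_n, y_n)
        g_n = fit(X[mask], y[mask])                    # train g_n^- on N - 1 points
        errors[n] = (predict(g_n, X[n]) - y[n]) ** 2   # e_n
    return errors.mean()

# Usage with a least-squares line as the learner (illustrative)
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=30)
y = 1.5 * X + rng.normal(0, 0.1, size=30)
fit = lambda X, y: np.polyfit(X, y, 1)
predict = lambda coef, x: np.polyval(coef, x)
print(loo_cv_error(X, y, fit, predict))
```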
Illustration of cross validation
Model selection using CV
Leave more than one out

Divide the dataset into C folds, each with K examples


C training sessions on N − K points each

Usually C = 5 (5-fold CV) or 10 (10-fold CV)
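A minimal sketch of the fold bookkeeping, assuming the data is indexed 0..N−1 (the function name and defaults are illustrative); the full λ-selection procedure appears in the “Classification/regression with cross validation” slide below:

```python
import numpy as np

def make_folds(N, C=5, seed=0):
    """Return a list of C index arrays, each holding roughly N/C validation indices."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(N)
    return np.array_split(perm, C)

folds = make_folds(N=23, C=5)
print([len(f) for f in folds])   # fold sizes, e.g. [5, 5, 5, 4, 4]
```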


Classification/regression with CV

Given data $D = \{(x_1, y_1), \cdots, (x_N, y_N)\}$

Choose a suitable model:

Ridge regression:
$$\min_{w} \; \frac{1}{N} \sum_{i=1}^{N} (w^T x_i - y_i)^2 + \lambda w^T w$$

Logistic regression:
$$\min_{w} \; \frac{1}{N} \sum_{i=1}^{N} \log(1 + e^{-y_i w^T x_i}) + \lambda w^T w$$

Linear SVM:
$$\min_{w} \; \frac{1}{N} \sum_{i=1}^{N} \max(1 - y_i w^T x_i, 0) + \lambda w^T w$$
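As a sketch, the three objectives can be written directly in NumPy (w, X, y, lam are placeholders; minimizing them would need an optimizer, or for ridge the closed form):

```python
import numpy as np

def ridge_objective(w, X, y, lam):
    # (1/N) * sum_i (w^T x_i - y_i)^2 + lambda * w^T w
    return np.mean((X @ w - y) ** 2) + lam * w @ w

def logistic_objective(w, X, y, lam):
    # (1/N) * sum_i log(1 + exp(-y_i * w^T x_i)) + lambda * w^T w, with y_i in {-1, +1}
    return np.mean(np.log1p(np.exp(-y * (X @ w)))) + lam * w @ w

def linear_svm_objective(w, X, y, lam):
    # (1/N) * sum_i max(1 - y_i * w^T x_i, 0) + lambda * w^T w, with y_i in {-1, +1}
    return np.mean(np.maximum(1 - y * (X @ w), 0)) + lam * w @ w
```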
Classification/regression with cross validation

Given data $D = \{(x_1, y_1), \cdots, (x_N, y_N)\}$:

Split the data into $D = D_1 \cup D_2 \cup \cdots \cup D_5$

For each choice of $\lambda$:
  For $c = 1, \cdots, 5$:
    Obtain $g_c^-$ using $D \setminus D_c$
    Compute $e_c = E_{\text{val}}(g_c^-)$
  Set $E_{\text{CV}}(\lambda) = (e_1 + \cdots + e_C)/C$

Choose $\lambda^*$ with the best validation error

Train the model using the full data $D$ and $\lambda^*$
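A minimal end-to-end sketch of this procedure for ridge regression, using its closed-form solution (the λ grid and the synthetic data are illustrative, not from the lecture):

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form minimizer of (1/N)||Xw - y||^2 + lam * w^T w
    N, d = X.shape
    return np.linalg.solve(X.T @ X / N + lam * np.eye(d), X.T @ y / N)

def cv_error(X, y, lam, C=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), C)
    errs = []
    for c in range(C):
        val = folds[c]
        tr = np.concatenate([folds[j] for j in range(C) if j != c])
        w = ridge_fit(X[tr], y[tr], lam)                    # g_c^- from D \ D_c
        errs.append(np.mean((X[val] @ w - y[val]) ** 2))    # e_c = E_val(g_c^-)
    return np.mean(errs)                                    # E_CV(lambda)

# Synthetic data and a lambda grid (illustrative)
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(0, 0.3, size=100)

lambdas = [1e-3, 1e-2, 1e-1, 1.0]
lam_star = min(lambdas, key=lambda lam: cv_error(X, y, lam))
w_final = ridge_fit(X, y, lam_star)        # retrain on the full data D with lambda*
print(lam_star, w_final.round(2))
```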
Conclusions

Next class: Support vector machines, Kernel methods

Questions?
