Overfitting, Generalization, Cross Validation

The document discusses the concepts of overfitting, generalization, and model evaluation in machine learning, emphasizing the importance of selecting the right polynomial order for data modeling. It explains the process of dividing data into training and test sets, and introduces k-fold cross-validation as a method to reduce statistical variance in model evaluation. The advantages and disadvantages of different validation methods, including leave-one-out cross-validation, are also highlighted.

Uploaded by

AHSAN FAREED


Overfitting, Generalization, Cross-Validation

Video content can be watched here:

https://www.youtube.com/channel/UCov-vAEaVfqEtlHkdiRL3Tg
Overfitting vs Generalization
• Which order of polynomial is best for the data?
• A first answer: the model with the least possible error

0th-order polynomial regression vs 1st-order polynomial regression

The 1st-order fit is better because it has less error
Overfitting vs Generalization
• Which order of polynomial is best for the data?
• What about 9th-order polynomial regression?
• This may look like the BEST model because its error on the given data is ZERO!!

Do you agree with this?

Overfitting vs Generalization
• What is the purpose of Machine Learning?
• Learning the given data as exactly as possible
  vs
• Predicting the unknown data as exactly as possible, based on the given data

Overfitting vs Generalization
• Then, which one looks best?
• As the order (M) increases, the complexity of the model increases
• As the complexity of the model increases, the model can learn the given data more exactly
• However, the prediction accuracy does not necessarily increase
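
The behaviour described above can be checked numerically. The following is a minimal sketch, not the slides' own experiment: it assumes a synthetic data set (10 noisy samples of a sine curve) and uses NumPy's `Polynomial.fit` for least-squares polynomial regression. The orders 0, 1, 3 and 9 and the noise level are illustrative choices.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def noisy_sine(x):
    # Hypothetical target (assumption): sin(2*pi*x) plus Gaussian noise.
    return np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, len(x))

def rmse(model, x, y):
    # Root-mean-square error of the fitted polynomial on (x, y).
    return float(np.sqrt(np.mean((model(x) - y) ** 2)))

# The given (training) data: 10 evenly spaced points.
x_train = np.linspace(0.0, 1.0, 10)
y_train = noisy_sine(x_train)

# Unknown data the model has never seen.
x_test = rng.uniform(0.0, 1.0, 100)
y_test = noisy_sine(x_test)

errors = {}
for order in (0, 1, 3, 9):
    model = Polynomial.fit(x_train, y_train, deg=order)
    errors[order] = (rmse(model, x_train, y_train), rmse(model, x_test, y_test))
    print(f"M={order}: train RMSE={errors[order][0]:.3f}, "
          f"test RMSE={errors[order][1]:.3f}")
```

With 10 training points, the 9th-order polynomial passes through every point, so its training error is essentially zero. Its error on the unseen points is what tells us how well it generalizes.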

Model Evaluation
• Which model is best?
  • You may try several approaches and need to choose one
  • You may try several different parameters of a model and need to choose one
• Model evaluation methods
  • Based on training & testing data sets
  • Cross-Validation

Training Set and Test Set
• How to choose a good model
• Divide the given data into a TRAINING set and a TEST set
• The training set and the test set must NOT overlap!!
• The two sets should be as independent as possible
• With the training set, build various models
• With the test set, evaluate each model
• Choose the model that shows the best performance on the test set

Data = Training Set + Test Set

Training Set and Test Set
• How to choose a good model

1. Randomly split the data into a training set and a test set
2. Build the models (Model 1, Model 2, …, Model n) with the training set
3. Test each model with the test set
4. Choose the best model
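
The four steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data set is synthetic, the candidate models are polynomials of order 0 to 9, the test fraction is 30%, and mean squared error is used as the performance measure. None of these choices come from the slides.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

# Hypothetical data set (assumption): 60 noisy samples of a sine curve.
x = rng.uniform(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 60)

# 1. Randomly split into non-overlapping training and test sets (30% test).
idx = rng.permutation(len(x))
n_test = int(0.3 * len(x))
test_idx, train_idx = idx[:n_test], idx[n_test:]

# 2. Build one model per candidate polynomial order, using the training set only.
models = {m: Polynomial.fit(x[train_idx], y[train_idx], deg=m) for m in range(10)}

# 3. Evaluate every model on the test set (mean squared error).
test_mse = {
    m: float(np.mean((f(x[test_idx]) - y[test_idx]) ** 2))
    for m, f in models.items()
}

# 4. Choose the model with the lowest test error.
best_order = min(test_mse, key=test_mse.get)
print("test MSE per order:", {m: round(e, 3) for m, e in test_mse.items()})
print("chosen order:", best_order)
```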
Training Set and Test Set
• Performance Graph

Training Set and Test Set
• Size of test set
  • 30~50% of the given data
• Advantage
  • Simple & easy
• Disadvantage
  • The test set is not used for model building: waste of data
  • The data is randomly split: the evaluation can be significantly different depending on the data split
• => Any good idea?

Training Set and Test Set
• Data Waste & Random Split
  • Split 1: Data → Training Set + Test Set → Model → Bad performance
  • Split 2: Data → Training Set + Test Set → Model → Good performance
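
To see how much a single random split can matter, the sketch below repeats the same hold-out evaluation with 20 different random splits and reports the spread of the test error. The data set, the 3rd-order polynomial model, and the 30% test fraction are illustrative assumptions, not values from the slides.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)

# Hypothetical small data set (assumption): 30 noisy samples of a sine curve.
x = rng.uniform(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 30)

def holdout_mse(seed, order=3, test_fraction=0.3):
    # One random train/test split, controlled by `seed`.
    idx = np.random.default_rng(seed).permutation(len(x))
    n_test = int(test_fraction * len(x))
    test, train = idx[:n_test], idx[n_test:]
    model = Polynomial.fit(x[train], y[train], deg=order)
    return float(np.mean((model(x[test]) - y[test]) ** 2))

# Same data, same model, 20 different random splits.
scores = [holdout_mse(seed) for seed in range(20)]
print(f"min={min(scores):.3f}  max={max(scores):.3f}  std={np.std(scores):.3f}")
```

The gap between the smallest and largest score comes only from how the data was split, which is the variance that cross-validation tries to reduce.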

Cross Validation
• Used in order to reduce statistical variance
• k-fold cross validation is widely used

• K-fold Cross Validation
  • Split the given data into k folds
  • Folds should not overlap with each other

Data = Fold 1 | Fold 2 | Fold 3 | … | Fold k

  • From the k folds, compose a training set out of k-1 folds and a test set out of the remaining 1 fold
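
The fold construction above can be sketched as follows. The helper name `k_fold_indices` is hypothetical, and shuffling before splitting is an assumption of this sketch; `np.array_split` is used so that the folds may differ in size by one when the data size is not a multiple of k.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Return k (train_idx, test_idx) pairs built from k non-overlapping folds."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)  # k disjoint folds covering all samples
    pairs = []
    for i in range(k):
        test_idx = folds[i]  # 1 fold is the test set
        # The other k-1 folds together form the training set.
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        pairs.append((train_idx, test_idx))
    return pairs

pairs = k_fold_indices(n_samples=10, k=4)
for train_idx, test_idx in pairs:
    print("train:", sorted(train_idx.tolist()), "test:", sorted(test_idx.tolist()))
```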

Cross Validation
• Example: 4-fold cross validation

Training/Test Set    Training set      Test set
Set I                Fold 1, 2, 3      Fold 4
Set II               Fold 4, 1, 2      Fold 3
Set III              Fold 3, 4, 1      Fold 2
Set IV               Fold 2, 3, 4      Fold 1

Choose a model by the average performance over the 4 sets

Cross Validation
• Example: 4-fold cross validation
• The performance table is filled in one training/test set at a time (Set I, then II, III, IV)

Performance    Training set      Test set    Model 1    Model 2    …    Model n
Set I          Fold 1, 2, 3      Fold 4      3          4          …    5
Set II         Fold 4, 1, 2      Fold 3      4          1          …    2
Set III        Fold 3, 4, 1      Fold 2      5          2          …    3
Set IV         Fold 2, 3, 4      Fold 1      4          3          …    2
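
As a worked example, the averages for the three models shown in the table can be computed directly. The scores are copied from the table; the models hidden behind "…" are left out. The slides do not say whether a higher or a lower performance value is better, so this sketch assumes that higher is better.

```python
# Scores per training/test set (Set I, II, III, IV), copied from the table.
# The models elided by "…" in the table are not included.
performance = {
    "Model 1": [3, 4, 5, 4],
    "Model 2": [4, 1, 2, 3],
    "Model n": [5, 2, 3, 2],
}

# Average performance of each model over the 4 sets.
averages = {name: sum(scores) / len(scores) for name, scores in performance.items()}

# Assumption: a higher performance value is better.
best = max(averages, key=averages.get)

print(averages)  # {'Model 1': 4.0, 'Model 2': 2.5, 'Model n': 3.0}
print(best)      # Model 1
```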
Cross Validation
• Summary
  • The data set is divided into k subsets, and the holdout method is repeated k times.
  • Each time, one of the k subsets is used as the test set and the other k-1 subsets are put together to form a training set.
  • Then the average error across all k trials is computed.
  • The variance is reduced as k is increased.
• Advantage
  • Less dependent on how the data gets divided.
  • Every data point gets to be in a test set exactly once, and gets to be in a training set k-1 times.
• Disadvantage
  • Time!! Every model has to be trained k times.
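
Putting the summary together, the sketch below uses k-fold cross validation to choose a polynomial order. It is a minimal illustration under assumptions that are not in the slides: a synthetic data set, k = 4, polynomial orders 0 to 7 as the candidate models, and mean squared error as the error measure. The function name `cv_error` is hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(3)

# Hypothetical data set (assumption): 40 noisy samples of a sine curve.
x = rng.uniform(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 40)

def cv_error(order, k=4, seed=0):
    """Average test MSE of a polynomial of the given order over k folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(x)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = Polynomial.fit(x[train], y[train], deg=order)
        errors.append(np.mean((model(x[test]) - y[test]) ** 2))
    return float(np.mean(errors))  # average error across all k trials

# The same folds (same seed) are used for every candidate model.
cv_errors = {order: cv_error(order) for order in range(8)}
best_order = min(cv_errors, key=cv_errors.get)

for order, err in cv_errors.items():
    print(f"M={order}: CV MSE={err:.3f}")
print("chosen order:", best_order)
```

Each candidate is fitted k times, which is the time cost listed as the disadvantage.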
Cross Validation: Variation
• No fixed folds and Random Division
  • Regular CV: split into folds and shuffling
  • Here: no split into folds; randomly divide the data into a test set and a training set k different times
• LOOCV (Leave-one-out Cross Validation)
  • Extreme case of k-fold cross validation
  • If the data size is n, set k = n
  • Every data point except one is used for training and the remaining one is used for testing
  • Repeat this n times
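
Both variations can be sketched with the same small helper. This is an illustrative sketch only: the data set, the 3rd-order polynomial model, the number of random divisions (10), and the test-set size (6 of 20 points) are assumptions, and `fit_and_score` is a hypothetical helper name.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(4)

# Hypothetical data set (assumption): 20 noisy samples of a sine curve.
x = rng.uniform(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 20)
n = len(x)

def fit_and_score(train, test, order=3):
    # Fit on the training indices, return the MSE on the test indices.
    model = Polynomial.fit(x[train], y[train], deg=order)
    return float(np.mean((model(x[test]) - y[test]) ** 2))

# LOOCV: k = n, so every point is the test set exactly once.
loocv_scores = [
    fit_and_score(np.delete(np.arange(n), i), np.array([i])) for i in range(n)
]

# Random division: no fixed folds; draw a fresh random split k times.
k, n_test = 10, 6
random_scores = []
for _ in range(k):
    idx = rng.permutation(n)
    random_scores.append(fit_and_score(idx[n_test:], idx[:n_test]))

print(f"LOOCV MSE:           {np.mean(loocv_scores):.3f}")
print(f"Random-division MSE: {np.mean(random_scores):.3f}")
```

With random division a data point may land in several test sets or in none, whereas LOOCV tests every point exactly once at the cost of n model fits.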