5 DL

The document discusses the importance of proper dataset partitioning into training, validation, and test sets to avoid bias and overfitting in machine learning models. It highlights the challenges posed by imbalanced classes and offers strategies for handling such data, including down-sampling, up-sampling, and reweighting. Additionally, it emphasizes the significance of realistic evaluation plans and the need for data distribution consistency across sets to ensure effective model performance.


Dataset – Train, Test & Validation

The basic rule in model validation

Don’t test your model on data it’s been trained with!
Random selection of data vs bias
• Unbiased data:
– Data that represents the diversity of the target distribution on which we want to make predictions, i.e., a random, representative sample. This is the good scenario.

• Biased data:
– Data that does not accurately reflect the target population.
– Example: asking only people with glasses whom they will vote for, instead of asking a random sample of the population.
– Training on biased data can lead to inaccurate predictions.
– There are methods to correct for bias.
The three-way split
• Training set
A set of examples used for learning.

• Validation set
A set of examples used to tune the hyperparameters of a classifier.

• Test set
A set of examples used only to assess the performance of a fully-trained classifier. After assessing the model with the test set, YOU MUST NOT further tune your model (that’s the theory, anyway – this prevents ‘learning the test set’ and overfitting to it).
The three-way split

[Diagram: the entire available dataset is split into Training data (learning), Validation data (model tuning), and Test data (performance evaluation).]
How to perform the split?

• How many examples in each data set?
– Training: typically 60-80% of the data
– Test set: typically 20-30% of the data
– Validation set: often 20% of the data
• Examples
– 3-way: Training 60%, CV 20%, Test 20%
– 2-way: Training 70%, Test 30%
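As an illustration (not from the original slides), here is a minimal sketch of a 60/20/20 three-way split, assuming scikit-learn is available; X and y are dummy placeholders standing in for your own features and labels.

# Minimal sketch of a 60/20/20 split using scikit-learn (assumed available).
# X and y below are dummy placeholders for your own features and labels.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, size=1000)

# First hold out 20% as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y)

# Then split the remaining 80% into 60% train / 20% validation (0.25 of 80% = 20%).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200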
How to perform the split?

• Time-based split – if time is important.

Can you use the Jul 2011 stock price to try to predict the Apr 2011 stock price? (Training on data from the future to predict the past leaks information.)
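As a sketch of what a time-based split looks like in practice (my own illustration using pandas; the DataFrame and its columns are made up), sort by date and use a cutoff so that the test period strictly follows the training period.

# Time-based split: train on data before a cutoff date, test on data after it,
# so the model never uses "future" information to predict the past.
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2011-01-01", periods=12, freq="MS"),  # monthly data
    "price": range(12),
})

df = df.sort_values("date")
cutoff = pd.Timestamp("2011-07-01")

train = df[df["date"] < cutoff]    # Jan-Jun 2011
test = df[df["date"] >= cutoff]    # Jul-Dec 2011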
How to perform the split?
• Some datasets are affected by seasonality.
[Chart: monthly sales ($0–90,000) over 24 months, illustrating a seasonal pattern.]
Is it a good test set?


How to perform the split?
• Many more possible scenarios…
• Key messages:
– Evaluation plan: very important to define at the beginning of the project.

– Realistic evaluation: such a plan should reflect the real-life scenarios your model will have to work on.

– External evaluation by a third party is the gold standard (but not always realistic).
Imbalanced classes
Dealing with imbalanced classes

Class 1: Short people


Class 2: Tall people

Can we build a classifier that can find the needles in the haystack?
Imbalanced Class Sizes
• Sometimes, classes have very unequal frequency
– Cancer diagnosis: 99% healthy, 1% disease
– eCommerce: 90% don’t buy, 10% buy
– Security: >99.99% of Americans are not terrorists

• Similar situation with multiple classes.

• This creates problems for training and evaluating a model


Evaluating a model with imbalanced data

• Example: 99% of people do not have cancer.

• If we simply build a ‘trivial classifier’ that predicts nobody has cancer, then 99% of our predictions are correct – yet the classifier is useless, because it misses every cancer patient.
• If we misclassify only 1% of healthy patients and accurately find all cancer patients,
– then ~50% of the people we tell they have cancer are actually healthy!
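To make the arithmetic behind that “~50%” explicit (my own worked calculation, assuming 1% of the population has cancer and a 1% false-positive rate on the 99% who are healthy):

\text{Precision} = \frac{TP}{TP + FP} = \frac{0.01}{0.01 + 0.99 \times 0.01} \approx \frac{0.01}{0.0199} \approx 0.50

So roughly half of the people flagged as having cancer are in fact healthy, even though the overall accuracy looks excellent.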
Learning with imbalanced datasets is often done by
balancing the classes

• Important: even if you train on balanced data, evaluate the results on an imbalanced held-out (test) set that reflects the real class distribution.

• How to create a “balanced” set
– Treat the majority class
• Down-sample
• Bootstrap (repeat down-sampling with replacement)
– Treat the minority class
• Up-sample the minority class
• Assign larger weights to the minority class samples
Down-sample and Up-sample
• Down-sample:
– Sample (randomly choose) a subset of points from the majority class.

• Up-sample:
– Repeat minority points (data duplication).
– Synthetic Minority Oversampling Technique (SMOTE): based on the k-nearest-neighbours algorithm, it synthetically generates new data points in the proximity of existing minority-class samples.

https://www.analyticsvidhya.com/blog/2020/11/handling-imbalanced-data-machine-learning-computer-vision-and-nlp/
https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data
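A minimal sketch of random down-sampling and up-sampling with scikit-learn’s resample utility (my own illustration; the arrays are made up, and SMOTE itself, available in the separate imbalanced-learn package, is not shown):

# Random down-sampling of the majority class and up-sampling of the minority
# class using sklearn.utils.resample; X_major and X_minor are dummy arrays.
import numpy as np
from sklearn.utils import resample

X_major = np.random.rand(990, 4)   # 990 majority-class samples
X_minor = np.random.rand(10, 4)    # 10 minority-class samples

# Down-sample: keep a random subset of the majority, matching the minority size.
X_major_down = resample(X_major, replace=False,
                        n_samples=len(X_minor), random_state=0)

# Up-sample: repeat minority points (duplication with replacement).
X_minor_up = resample(X_minor, replace=True,
                      n_samples=len(X_major), random_state=0)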
Bootstrapping
• Sample m sets from the majority class so that each set’s size is equal to the minority set.
• The down-sampling is done with replacement (repeats allowed).
• Train a model of the minority class vs. the bootstrapped sample for each bootstrap iteration.
• This gives us m different models; this idea is the basis of ensemble methods such as Random Forest.
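A sketch of that idea in code (my own illustration; the data, and the choice of logistic regression as the base model, are assumptions rather than part of the slides):

# Train m models, each on (minority class) vs. (one bootstrapped majority
# sample of the same size), then average their predicted probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

X_major = np.random.rand(990, 4)   # majority class (label 0)
X_minor = np.random.rand(10, 4)    # minority class (label 1)

m = 5
models = []
for i in range(m):
    # One bootstrapped majority sample, the same size as the minority set.
    X_boot = resample(X_major, replace=True,
                      n_samples=len(X_minor), random_state=i)
    X_i = np.vstack([X_boot, X_minor])
    y_i = np.concatenate([np.zeros(len(X_boot)), np.ones(len(X_minor))])
    models.append(LogisticRegression().fit(X_i, y_i))

# Ensemble prediction: average the m models' probabilities for the minority class.
X_new = np.random.rand(3, 4)
p_minority = np.mean([clf.predict_proba(X_new)[:, 1] for clf in models], axis=0)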
Reweighting

• Idea: Assign larger weights to samples from the smaller class

• A commonly used weighting scheme is linearly by class size:


$w_c = \frac{n}{n_c}$

– where $n_c$ is the size of class $c$ and $n = \sum_c n_c$ is the total sample size.
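A short sketch of this weighting scheme (my own example with made-up class counts). For reference, scikit-learn’s class_weight='balanced' option uses the closely related formula n / (K * n_c), where K is the number of classes.

# Compute w_c = n / n_c for each class from its label counts.
import numpy as np

y = np.array([0] * 990 + [1] * 10)            # imbalanced labels
classes, counts = np.unique(y, return_counts=True)
n = counts.sum()

weights = {c: n / n_c for c, n_c in zip(classes, counts)}
# -> {0: ~1.01, 1: 100.0}: the minority class gets a much larger weight.

# Per-sample weights, e.g. to pass as a sample_weight argument during training.
sample_weights = np.array([weights[label] for label in y])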
Imbalanced data can be harnessed
• The Viola/Jones Face Detector
Faces vs. non-faces

Training data: 5000 faces, ~10^8 non-faces

• P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features”, CVPR, 2001
• P. Viola and M. Jones, “Robust real-time face detection”, IJCV 57(2), 2004
Distribution of data for improving the accuracy of
the model

Training set: the set which you run your learning algorithm on.

Dev (development) set: the set which you use to tune parameters, select features, and make other decisions regarding the learning algorithm. Sometimes also called the hold-out cross-validation set.

Test set: the set which you use to evaluate the performance of the algorithm, but not to make any decisions regarding what learning algorithm or parameters to use.
Partition your data in different categories
Training Set | Dev (Hold-Out Cross Validation) Set | Test Set

Traditional-style partitioning: 70/30 or 60/20/20.

But in the era of deep learning the split may even go down to 99/0.5/0.5.

If the data size is 1,000,000, then 5,000 examples will still be available in each of the dev and test sets.
Importance of Choosing dev and test sets wisely

• The purpose of the dev and test sets is to direct your team toward the most important changes to make to the machine learning system.

• It is very important that the dev and test sets reflect data you expect to get in the future and want to do well on.

• A bad distribution will severely restrict your analysis when trying to work out why the model is not giving good results on the test data.
Data Distribution Mismatch
It is naturally best to have the data in all the sets come from the same distribution.

For example: housing data coming from Mumbai while we are trying to predict house prices in Chandigarh.

Otherwise we risk wasting a lot of time improving performance on the dev set, only to find out that the model does not work well on the test set.

Sometimes we have only a two-way partition of the data; in that case the sets are called train/dev or train/test sets.
Underfitting vs Overfitting
Lecture 13

Bias/Variance

Bias/Variance

• The algorithm’s error rate on the training set is the algorithm’s bias.

• How much worse the algorithm does on the dev (or test) set than on the training set is the algorithm’s variance.

Bias/Variance
Train set error 1, Dev set error 9 → High variance (overfitting)
Train set error 12, Dev set error 13 → High bias (underfitting)
Train set error 8, Dev set error 16 → High bias, high variance (underfitting)
Train set error 1, Dev set error 1.5 → Low bias, low variance (good fit)

A multi-dimensional system can have high bias in some areas and high variance in other areas of the system, resulting in a combined high-bias and high-variance issue.
High Bias
• Increase the model size (such as the number of neurons/layers): allows you to fit the training set better. If you find that this increases variance, then use regularization, which will usually eliminate the increase in variance.

• Modify input features based on insights from error analysis: create additional features that help the algorithm eliminate a particular category of errors. These new features could help with both bias and variance.

• Reduce or eliminate regularization (L2, L1 regularization, dropout): reduces avoidable bias but increases variance.

• Modify the model architecture (such as the neural network architecture) so that it is more suitable for your problem: this can affect both bias and variance.
High Variance
• Add more training data: the simplest and most reliable way to address variance, so long as you have access to significantly more data and enough computational power to process it.

• Add regularization (L2, L1 regularization, dropout): this technique reduces variance but increases bias.

• Add early stopping (stop gradient descent early, based on dev set error): reduces variance but increases bias.

• Modify the model architecture (such as the neural network architecture) so that it is more suitable for your problem: this affects both bias and variance.
Often compare with human level performance
Examples: image recognition, spam classification.

Reasons:
• Ease of obtaining labels from human labelers.
• Error analysis can draw on human intuition.
• Human-level performance can be used to estimate the optimal error rate and to set a “desired error rate”.
Tasks Where we don’t compare with human level
performance
Examples: picking a book to recommend to you; picking an ad to show a user on a website; predicting the stock market.

Reasons:
• It is harder to obtain labels.
• Human intuition is harder to count on.
• It is hard to know what the optimal error rate and a reasonable desired error rate are.

Classification example for animals
Scenario 1: Human (Bayes) error 1, Training error 8, Dev error 10 → Avoidable bias 7 → focus on bias.
Scenario 2: Human (Bayes) error 7.5, Training error 8, Dev error 10 → Avoidable bias 0.5 → focus on variance.
New scenario
Human (Bayes) error estimates: 1 / 0.7 / 0.5 (the same three human-level estimates in all scenarios).
Scenario 1: Training error 5, Dev error 6 → bias issue.
Scenario 2: Training error 1, Dev error 5 → variance issue.
Scenario 3: Training error 0.7, Dev error 0.8 → difficult to tell.
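As a small sketch of how these tables are read (my own illustration, taking 0.5, the lowest of the three human-level estimates, as the proxy for Bayes error): avoidable bias is training error minus human-level error, and variance is dev error minus training error.

# Decompose the error of each scenario into avoidable bias and variance.
scenarios = {
    "Scenario 1": {"human": 0.5, "train": 5.0, "dev": 6.0},
    "Scenario 2": {"human": 0.5, "train": 1.0, "dev": 5.0},
    "Scenario 3": {"human": 0.5, "train": 0.7, "dev": 0.8},
}

for name, e in scenarios.items():
    avoidable_bias = e["train"] - e["human"]   # gap between training and human error
    variance = e["dev"] - e["train"]           # gap between dev and training error
    print(f"{name}: avoidable bias = {avoidable_bias:.1f}, variance = {variance:.1f}")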

Two fundamental assumptions:
• You can fit the training set well.
• Training set performance should generalize to the dev/test set.

Thank You
For more information, please visit the
following links:

[email protected]
[email protected]
https://www.linkedin.com/in/gauravsingal789/
http://www.gauravsingal.in

