Training, Validation, and Test Sets: 2019 Philipp Krähenbühl and Chao-Yuan Wu

The document discusses splitting a dataset into training, validation, and test sets. The training set is used to train the model parameters, the validation set is used to tune hyperparameters and evaluate the model, and the test set is used to measure the final performance of the model on unseen data.

Training, validation, and test sets
© 2019 Philipp Krähenbühl and Chao-Yuan Wu

Dataset
• Training set
    • Learn model parameters
• Validation set
    • Learn hyper-parameters
• Test set
    • Measure generalization performance

Why split the data?
• Overfitting: a model can do well on the data it was trained on and still do poorly on new data
• Goal: learn a model that works well in the real world
• Optimization objective: learn a model that works well on the training data

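The gap between the goal and the optimization objective can be made concrete with a toy experiment. This is a minimal sketch under our own assumptions (a made-up 1-D task with 20% label noise and a 1-nearest-neighbour classifier, neither taken from the slides): the classifier memorizes the training set, so it scores perfectly there but noticeably worse on held-out data drawn from the same distribution.

```python
import random

def make_data(n, noise, rng):
    """Hypothetical 1-D toy task: label is 1 if x > 0.5, with a fraction `noise` of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # label noise that a good model should NOT reproduce
        data.append((x, y))
    return data

def one_nn_predict(train, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    _, y = min(train, key=lambda p: abs(p[0] - x))
    return y

def accuracy(train, data):
    """Fraction of points in `data` whose label the 1-NN model predicts correctly."""
    return sum(one_nn_predict(train, x) == y for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(200, noise=0.2, rng=rng)
held_out = make_data(200, noise=0.2, rng=rng)

train_acc = accuracy(train, train)        # every training point finds itself, so this is 1.0
held_out_acc = accuracy(train, held_out)  # memorized noise hurts on new data
print(f"train accuracy:    {train_acc:.2f}")
print(f"held-out accuracy: {held_out_acc:.2f}")
```

Judged on the training set alone, this model looks perfect; only the held-out points reveal that it overfits.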
Training set
• Used to train all parameters of the model
• The model will usually work very well on the training set, since it was fit to that data
• Size: 60-80% of the data
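Learning model parameters on the training set can be sketched with about the simplest parametric model available: a straight line y ≈ w·x + b fit by closed-form least squares. The model and the data points are our own illustrative assumptions; the slides do not name a model. The parameters are computed from training pairs only, and the training error comes out lower than the error on points the fit never saw.

```python
def fit_line(train):
    """Closed-form least squares for y ≈ w * x + b, using ONLY the training pairs."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in train)
    var = sum((x - mean_x) ** 2 for x, _ in train)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

def mse(params, data):
    """Mean squared error of the line on a list of (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

# Hypothetical data: the training points lie exactly on y = 2x + 1.
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
params = fit_line(train)
train_error = mse(params, train)                           # the fit is exact on its own data
other_error = mse(params, [(4.0, 9.5), (5.0, 10.5)])       # points not used for fitting
print(params, train_error, other_error)
```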


Validation set
• Used to determine how well the model works on data it was not trained on
• Used to tune the model and its hyper-parameters
• Size: 10-20% of the data
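Tuning a hyper-parameter on the validation set can be sketched as follows. We assume a k-nearest-neighbour classifier on a made-up noisy 1-D task and treat the neighbour count `k` as the hyper-parameter; all names and candidate values are illustrative. The model is always built from the training set, each candidate `k` is scored on the validation set, and the best-scoring value is kept.

```python
import random

def make_data(n, rng, noise=0.2):
    """Hypothetical toy task: label 1 if x > 0.5, with a fraction `noise` of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote over the k training points closest to x (odd k avoids ties)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in nearest)
    return int(2 * votes > k)

def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(300, rng)
valid = make_data(100, rng)

# Score each candidate on the validation set; never on the test set.
candidates = [1, 3, 5, 9, 15, 31]
scores = {k: accuracy(train, valid, k) for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])
print(scores, "best k:", best_k)
```

Because `best_k` was chosen to maximize validation accuracy, its validation score is itself optimistically biased, which is why a separate test set is still needed.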


Testing set
• Used to measure the performance of the model on unseen data
• Used exactly once, after all training and tuning is finished
• Size: 10-20% of the data
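"Used exactly once" is a discipline rather than a library feature, but it can be enforced in code. The wrapper below is a hypothetical sketch (the class and method names are ours): it hands the test data to a metric a single time and refuses any second evaluation, so the test score cannot quietly become another tuning signal.

```python
class OneShotTestSet:
    """Hypothetical wrapper that allows the test data to be evaluated only once."""

    def __init__(self, data):
        self._data = list(data)
        self._used = False

    def evaluate(self, metric):
        """Apply `metric` to the test data; raise if the test set was already used."""
        if self._used:
            raise RuntimeError("test set already used; report the first result")
        self._used = True
        return metric(self._data)

def threshold_accuracy(data):
    """Accuracy of a fixed, already-tuned model: predict 1 when x > 0.5."""
    return sum(int(x > 0.5) == y for x, y in data) / len(data)

test = OneShotTestSet([(0.2, 0), (0.8, 1), (0.6, 1)])
final_score = test.evaluate(threshold_accuracy)

try:
    test.evaluate(threshold_accuracy)
    second_call_allowed = True
except RuntimeError:
    second_call_allowed = False
print(final_score, second_call_allowed)
```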


How to split the data?
• Random sampling without replacement

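Random sampling without replacement amounts to shuffling the example indices once and cutting the shuffled list into three consecutive pieces, so every example lands in exactly one split. This is a minimal sketch; the function name and the default 70/15/15 fractions are our own choices within the ranges on the slides, and the test set simply receives the remainder.

```python
import random

def split_dataset(data, train_frac=0.7, valid_frac=0.15, seed=0):
    """Split `data` into disjoint train/valid/test lists by shuffling indices once."""
    if not (0 < train_frac and 0 <= valid_frac and train_frac + valid_frac < 1):
        raise ValueError("need 0 < train_frac, 0 <= valid_frac, train_frac + valid_frac < 1")
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)  # a random permutation: no index is drawn twice
    n_train = round(train_frac * len(data))
    n_valid = round(valid_frac * len(data))
    train = [data[i] for i in idx[:n_train]]
    valid = [data[i] for i in idx[n_train:n_train + n_valid]]
    test = [data[i] for i in idx[n_train + n_valid:]]  # remainder goes to the test set
    return train, valid, test

data = list(range(1000))
train, valid, test = split_dataset(data)
print(len(train), len(valid), len(test))
```

Fixing `seed` makes the split reproducible, so the same test examples stay held out across experiments.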
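Distribution of data
• Low dimensions: $D_{\text{data}} \approx D_{\text{train}} \approx D_{\text{valid}} \approx D_{\text{test}}$
• High dimensions: $D_{\text{data}} \neq D_{\text{train}} \neq D_{\text{valid}} \neq D_{\text{test}}$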
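One way to see why a random split stops being representative in high dimensions (our reading of the slide, not stated on it) is that a finite sample covers a high-dimensional space very sparsely. The sketch below assumes uniform data in the unit cube $[0, 1]^d$ and measures, for each test point, the distance to its nearest training point relative to its average distance to all training points. In low dimensions the nearest training point is very close; as $d$ grows the ratio approaches 1, meaning no training point is much closer than any other.

```python
import math
import random

def nn_ratio(dim, n_train=200, n_test=50, seed=0):
    """Mean over test points of (nearest-train distance) / (mean distance to all train points)."""
    rng = random.Random(seed)
    train = [[rng.random() for _ in range(dim)] for _ in range(n_train)]
    test = [[rng.random() for _ in range(dim)] for _ in range(n_test)]
    ratios = []
    for q in test:
        dists = [math.dist(q, p) for p in train]  # Euclidean distances (Python 3.8+)
        ratios.append(min(dists) / (sum(dists) / len(dists)))
    return sum(ratios) / len(ratios)

for d in (1, 2, 10, 50):
    print(d, round(nn_ratio(d), 3))
```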


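Graduate student descent
[Figure: an iterative loop through three steps, "Look at your data / model output", "Design and train your model", and "Evaluate your model on validation set"; the arrows between them are labeled "semi-manual", "automated", and "automated".]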
