How To Use Learning Curves To Diagnose Machine Learning Model Performance
Diagnosing Model Behavior
The shape and dynamics of a learning curve can be used to diagnose the behavior of a machine learning model and, in turn, suggest the type of configuration changes that may be made to improve learning and/or performance.
There are three common dynamics that you are likely to observe in learning curves; they are:
● Underfit.
● Overfit.
● Good Fit.
We will take a closer look at each with examples. The examples will assume that we are looking at a minimizing metric, meaning that smaller scores on the y-axis indicate more or better learning.
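For example, in Keras a learning curve can be plotted directly from the object returned by fit(). The sketch below is a minimal illustration, assuming a small synthetic classification dataset and a placeholder model rather than any specific architecture:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import matplotlib.pyplot as plt

# placeholder data and model; substitute your own
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(20,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

# the history object records the loss on each dataset at the end of every epoch
history = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100, verbose=0)

# smaller is better: inspect the shape of, and the gap between, the two curves
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()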
Books
● Deep Learning, 2016.
● An Introduction to Statistical Learning: with Applications in R, 2013.
Papers
● Learning curve models and applications: Literature review and research directions, 2011.
Posts
● How to Diagnose Overfitting and Underfitting of LSTM Models
● Overfitting and Underfitting With Machine Learning Algorithms
Articles
● Learning curve, Wikipedia.
● Overfitting, Wikipedia.
Summary
In this post, you discovered learning curves and how they can be used to diagnose the learning
and generalization behavior of machine learning models.
Specifically, you learned:
● Learning curves are plots that show changes in learning performance over time in terms
of experience.
● Learning curves of model performance on the train and validation datasets can be used to
diagnose an underfit, overfit, or well-fit model.
● Learning curves of model performance can be used to diagnose whether the train or validation dataset is relatively representative of the problem domain.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Jason Brownlee, PhD is a machine learning specialist who teaches developers how to get
results with modern machine learning methods via hands-on tutorials.
https://machinelearningmastery.com/data-leakage-machine-learning/
On the other side, underfitting appears when we need more experience (more epochs) to train the model, so the learning curves trend continually down until you reach stabilization with the appropriate number of epochs.
My second question is: how do you interpret the case when the validation data gets better performance than the training data? Is it a good indication of good generalization?
Thank you, Jason, for allowing us to share in your knowledge!
However, when I predicted on the test dataset I got only around 53% accuracy. I had my data divided into train, validation, and test.
What could go wrong here? Any explanation would be so helpful. And thank you for the learning curves blog, it was indeed helpful.
Also, can you make predictions using validation data? What could go wrong/right here?
https://machinelearningmastery.com/how-to-control-the-speed-and-stability-of-training-neural-networks-with-gradient-descent-batch-size/
More on scaling:
https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/
https://machinelearningmastery.com/how-to-use-transfer-learning-when-developing-convolutional-neural-network-models/
Yes exactly. A “boar” class and an “other” class.
https://en.wikipedia.org/wiki/Coefficient_of_determination
https://machinelearningmastery.com/start-here/#better
I understand from this tutorial that the optimization learning curves are used for checking the model's fitness?
This may occur if the validation dataset has too few examples as compared to the training dataset.
My question is: if you have more validation examples, say 30% of the entire dataset, will the curve smooth out?
Or is the fault in the distribution of the validation set itself? (The validation data might not have the same distribution as the training data.)
If the above is not a case of an unrepresentative validation dataset, then what would the curves look like when the validation data distribution is completely different from the training dataset? And what are the remedies to counteract this issue?
How can I get a larger training size in a time series? (By going backward, for example adding 1 day each time, and appending the last train and test loss?)
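One common way to do this is walk-forward validation with an expanding window: each step adds one more observation (e.g., one more day) to the training set and records the losses. A minimal sketch, assuming a univariate series and using naive persistence as a stand-in for a real model:

import numpy as np

# stand-in for a real time series
series = np.sin(np.linspace(0, 20, 200))

train_losses, test_losses = [], []
for split in range(100, len(series) - 1):
    train, test = series[:split], series[split]   # expand the training set by one step
    prediction = train[-1]                        # persistence forecast as a placeholder model
    train_losses.append(np.mean((train[1:] - train[:-1]) ** 2))  # in-sample one-step error
    test_losses.append((prediction - test) ** 2)
# plotting train_losses and test_losses against training size gives the learning curve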
https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
https://machinelearningmastery.com/faq/single-faq/why-do-i-get-different-results-each-time-i-run-the-code
The training loss is always (except in corner cases) lower than the validation loss.
https://machinelearningmastery.com/faq/single-faq/can-i-translate-your-posts-books-into-another-language
(with accuracy over epoch, all of the values are between 0 and 1 – or 0% and 100%)
https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
I would like to send you some plots but I don't know how to do that.
Regarding Roland Fernandez's reply, the first reply to this article: I have built some models and compiled them with 'mse' loss, and I'm getting a value of 0.0090 at the first epoch and 0.0077 at the second, and it keeps learning but just a little bit per epoch, drawing at the end an almost flat line like the one in the first learning curve, “Example of Training Learning Curve Showing An Underfit Model That Does Not Have Sufficient Capacity”. So I want your opinion on this.
Are these models, as Roland says, not representative of underfitting due to the low values, or are they in fact underfitting as you established in the article?
I must add that the predictions obtained with these models are in the expected range.
The inputs have shape 5395, 23, 1; the outputs have shape 5395, 23.
Inputs: 1,2,3 → Outputs: 4,5,6
Inputs: 2,3,4 → Outputs: 5,6,7
Inputs: 3,4,5 → Outputs: 6,7,8
Could this be causing the learning curve to be almost flat? Should I be training at a different batch_size?
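For reference, overlapping input/output windows like the ones tabulated above can be built with a small helper. This is a minimal sketch with illustrative window lengths of 3 in and 3 out, not the actual 23-step windows from the question:

import numpy as np

def make_windows(series, n_in, n_out):
    # slide a window of n_in inputs and n_out outputs one step at a time
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])
        y.append(series[i + n_in:i + n_in + n_out])
    return np.array(X), np.array(y)

series = np.arange(1, 11)
X, y = make_windows(series, n_in=3, n_out=3)
print(X[0], y[0])  # [1 2 3] [4 5 6]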
https://machinelearningmastery.com/cross-entropy-for-machine-learning/
It is much better to select a metric and compare models that way:
https://machinelearningmastery.com/faq/single-faq/how-to-know-if-a-model-has-good-performance
Bel, April 5, 2020 at 5:35 pm
Hello Jason,
Is there any range which is considered good for the loss values (y-axis)? Say, must the highest loss value be above some specific value?
Or does each problem have its own range of values, where only the shape of the curves matters?
Thank you
https://machinelearningmastery.com/cross-entropy-for-machine-learning/
Generally, it is better to compare the results to a naive model.
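A minimal sketch of such a naive baseline, using scikit-learn's DummyClassifier on placeholder data; a real model should beat this score to demonstrate skill:

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# placeholder data; substitute your own
X, y = make_classification(n_samples=1000, random_state=1)

# predict the most frequent class, ignoring the inputs entirely
naive = DummyClassifier(strategy='most_frequent')
scores = cross_val_score(naive, X, y, scoring='accuracy', cv=10)
print('naive baseline accuracy: %.3f' % scores.mean())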
What if I obtain a high validation accuracy, but the curve is not smooth?
Thanks
https://machinelearningmastery.com/diagnose-overfitting-underfitting-lstm-models/
This can help with choosing a metric for classification:
https://machinelearningmastery.com/tour-of-evaluation-metrics-for-imbalanced-classification/
But when I changed my loss function to RMSE and plotted the loss curves, there is a huge gap between the training loss curve and the validation loss curve (epoch: 200, training loss: 0.0757, test loss: 0.1079).
In my code, I only changed the loss function part (MSE to RMSE). I applied regularization techniques such as Batch Normalization and Dropout, but there is still a big gap between the curves.
I'm new to deep learning, but do you know the reason why there is a huge gap between the curves when applying RMSE?
Is it something to do with the evaluation metric, or something wrong in the coding part?
Thanks.
And for training the model, should I leave out the loss function part or use 'MSE' as the loss function for training the model?
https://machinelearningmastery.com/faq/single-faq/can-you-explain-this-research-paper-to-me
I'm still new to deep learning and I'm confused by the terminology of validation loss and test loss. Are they the same or completely different?
And also, you can't train the model on the test data?
https://machinelearningmastery.com/difference-test-validation-datasets/
After we choose a model and config, we can fit the final model on all available data. We cannot fit the model on test data in order to evaluate it, as the model must be evaluated on data not used to train it to give a fair estimate of performance.
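A minimal sketch of that workflow, with placeholder data and a placeholder model; the test set is touched only once, for the final estimate, and the chosen model is then refit on everything:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=1)

# hold out a test set that is never used for fitting or tuning
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# split the remainder into train and validation sets for tuning
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=1)

model = LogisticRegression()
model.fit(X_train, y_train)
print('validation accuracy:', model.score(X_val, y_val))  # used while tuning
print('test accuracy:', model.score(X_test, y_test))      # fair final estimate

# once the model and config are chosen, refit on all available data
final_model = LogisticRegression().fit(X, y)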
For another task, I want to compare with other deep learning models. For instance, I want to use an MLP (multilayer perceptron) or logistic regression (a machine learning model).
Is it possible to employ those models for movie rating prediction from 0 to 5?
Thanks.
https://stackoverflow.com/questions/62877425/validation-loss-curve-is-flat-and-training-loss-curve-is-higher-than-validation
Thanks.
https://machinelearningmastery.com/how-to-develop-a-cnn-from-scratch-for-cifar-10-photo-classification/
- the intermediate accuracy values for validation (not test), after saving the weights every 5 epochs
- the value of accuracy after training + validation at the end of all the epochs
- drawing the accuracy curve for validation (the accuracy is known every 5 epochs)
- knowing the value of accuracy after 50 epochs for validation
During the whole deep network training, both the validation loss and the training loss reduce as the epochs increase. But the reduction in validation loss is much smaller than the reduction in training loss; is that normal and representative?
Starting from epoch 0, the training loss curve starts high and reduces along the epochs, but the validation loss curve starts already small and then reduces only slightly along the epochs.
Thank you.
The literature always says a good fit is when the validation loss is slightly higher than the training loss, but how high is "slightly higher"? Could you please give some hint?
Thank you as always.
https://machinelearningmastery.com/different-results-each-time-in-machine-learning/
https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
Perhaps try data cleaning to make the decision boundary more clear.
https://machinelearningmastery.com/faq/single-faq/why-cant-i-get-100-accuracy-or-zero-error-with-my-model
https://ibb.co/NsnY1qH
As you can see, I am using log loss for evaluation. My interpretation is that it doesn't over- or underfit the data and that I am good to go.
2.) I have a regression task for the last 13% of the data (positive samples) and I have to
predict the different contract values.
My learning curve looks like this:
https://ibb.co/MnZbB15
My interpretation here is that I need more data to make a good prediction. The contract values range from $0 to $200,000 and the distribution is super skewed…
Thanks as always for all your support!
Marlon
https://machinelearningmastery.com/difference-test-validation-datasets/
Yes, I have tons of examples on the blog, use the search box. Perhaps start here:
https://machinelearningmastery.com/display-deep-learning-model-training-history-in-keras/
Correct, the learning curve is a diagnostic for poor model performance; it is not helpful for model selection or as a general test harness like nested cross-validation.
I have model learning curves with loss curves, both train and test, that look okay; however, both training and testing accuracy are at 100% from the first epoch.
What should I do?
Any suggestions?
Always thank you!
https://machinelearningmastery.com/faq/single-faq/what-does-it-mean-if-i-have-0-error-or-100-accuracy
The accuracy that I got is 97%, but I don’t know whether the model is overfitting or
underfitting based on the learning curves that I got.
Thank you.
https://machinelearningmastery.com/faq/single-faq/why-do-i-get-different-results-each-time-i-run-the-code
When I plot the first model, it gives a good fit, but the RMSE value is not good.
But when I plot the second model, the test loss plot is below the train loss plot, with a difference between them nearly similar to the “Unrepresentative Validation Dataset” case (the train loss decreases and is stable), but with a better RMSE value than the first model.
https://machinelearningmastery.com/faq/single-faq/what-does-it-mean-if-i-have-0-error-or-100-accuracy
https://ibb.co/Z6nrXM4
Thank you very much
Not sure early stopping also helps here to get to exactly that best epoch.
Thoughts?
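Early stopping will not guarantee landing on exactly the best epoch, but with a patience window and restore_best_weights it usually gets close. A minimal Keras sketch, assuming model and the training/validation arrays are already defined:

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # stop when validation loss has not improved for `patience` epochs,
    # then roll back to the weights from the best epoch seen
    EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
    # alternatively, persist the best weights to disk as training runs
    ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True),
]
history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                    epochs=500, callbacks=callbacks, verbose=0)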
https://ibb.co/Wpmzmh3
https://docs.google.com/document/d/1_OjzPLk9QBPVR1aNFNQGDOZIJSUfTUcF5mFvEGZa0A0/edit?usp=sharing
However, I do not see any gap at the end of the lines, something that can usually be found in an overfitting model.
On the other hand, I might have underfitting, as:
1. The learning curve of an underfit model has a low training loss at the beginning which gradually increases upon adding training examples and then stays flat, indicating that adding more training examples can't improve the model's performance on unseen data.
2. The training loss and validation loss are close to each other at the end.
However, the training error is not too big, something that is usually found in underfitting models.
I am confused. Can you please provide me with some advice?
https://i.stack.imgur.com/xGKAj.png
https://i.stack.imgur.com/gkRMn.png