Lecture 4
How do we do it?
MSE = (1/n) Σ_{i=1}^{n} ( y_i − fˆ(x_i) )²,

where fˆ(x_i) is the prediction that fˆ gives for the ith observation.
The MSE will be small if the predicted responses are very close to the true
responses, and will be large if, for some of the observations, the predicted
and true responses differ substantially.
This is the rationale behind dividing the data into training and test sets,
or conducting cross-validation: to assess model accuracy, we calculate the
MSE on test data, not on the data used to fit the model.
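The procedure above can be sketched with simulated data (a hypothetical example using NumPy only; the true function sin(x), the noise level, and the cubic-polynomial model are all assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n observations from y = f(x) + noise, with f(x) = sin(x).
n = 200
x = rng.uniform(0, 6, n)
y = np.sin(x) + rng.normal(scale=0.3, size=n)

# Randomly divide the data into a training set and a test set.
idx = rng.permutation(n)
train, test = idx[: n // 2], idx[n // 2 :]

# Fit a simple model (a cubic polynomial) on the training data only.
coef = np.polyfit(x[train], y[train], deg=3)

def mse(xs, ys):
    """Mean squared error of the fitted model on (xs, ys)."""
    pred = np.polyval(coef, xs)
    return np.mean((ys - pred) ** 2)

# Training MSE measures fit to the data we used; test MSE assesses accuracy.
print("training MSE:", mse(x[train], y[train]))
print("test MSE:    ", mse(x[test], y[test]))
```

Only the test MSE is a fair estimate of how the model will perform on new observations.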
AIL 7310: ML for Econ Lecture 4 3/9
Assessing Model Accuracy
As model flexibility increases, training MSE will decrease, but the test MSE
may not. When a given method yields a small training MSE but a large
test MSE, we might be overfitting the data. This happens because our
statistical learning procedure is working too hard to find patterns in the
training data, and may be picking up some patterns that are just caused by
random chance rather than by true properties of the unknown function f.
When we overfit the training data, the test MSE will be very large because
the supposed patterns that the method found in the training data simply
don’t exist in the test data. Note that regardless of whether or not
overfitting has occurred, we almost always expect the training MSE to be
smaller than the test MSE because most statistical learning methods either
directly or indirectly seek to minimize the training MSE.
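The widening gap between training and test MSE can be sketched by sweeping over model flexibility (here, polynomial degree stands in for flexibility; the simulated data and the particular degrees are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: y = sin(x) + noise.
n = 100
x = rng.uniform(0, 6, n)
y = np.sin(x) + rng.normal(scale=0.4, size=n)

idx = rng.permutation(n)
train, test = idx[: n // 2], idx[n // 2 :]

# Increasing polynomial degree = increasing flexibility.
results = {}
for deg in (1, 3, 10):
    coef = np.polyfit(x[train], y[train], deg)
    train_mse = np.mean((y[train] - np.polyval(coef, x[train])) ** 2)
    test_mse = np.mean((y[test] - np.polyval(coef, x[test])) ** 2)
    results[deg] = (train_mse, test_mse)
    print(f"degree {deg:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Training MSE can only decrease as the degree grows, since each higher-degree fit nests the lower-degree ones; the test MSE, by contrast, eventually turns back up once the model starts fitting noise.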
A good classifier is one for which the error rate, the fraction of
observations that are misclassified, is smallest on test data.
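A minimal sketch of computing the test error rate, assuming simulated two-class Gaussian data and a simple nearest-centroid classifier (both are illustrative choices, not a method from the lecture):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two Gaussian classes in 2-D, labelled 0 and 1.
n = 200
X = np.vstack([rng.normal(0.0, 1.0, (n // 2, 2)),
               rng.normal(1.5, 1.0, (n // 2, 2))])
y = np.repeat([0, 1], n // 2)

# Split into training and test sets.
idx = rng.permutation(n)
train, test = idx[: n // 2], idx[n // 2 :]

# Fit a nearest-centroid classifier on the training data only.
centroids = np.array([X[train][y[train] == k].mean(axis=0) for k in (0, 1)])

def predict(Xs):
    """Assign each point to the class with the nearer centroid."""
    d = np.linalg.norm(Xs[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# Test error rate: the average of I(y_i != yhat_i) over the test set.
test_err = np.mean(predict(X[test]) != y[test])
print("test error rate:", test_err)
```

As with the MSE for regression, the error rate computed on the training data would understate how often the classifier errs on new observations.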