lecture03b_overfitting_annotated
Martin Jaggi
Last updated on: September 24, 2024
credits to Mohammad Emtiyaz Khan & Rüdiger Urbanke
Motivation
Models can be too limited or they can be too rich. In the
first case we cannot find a function in our model that is a good
fit for the data; we then say that we underfit. In the second
case the model is so rich that it fits not only the underlying
signal but also the noise, and therefore generalizes poorly;
we then say that we overfit.
[Figure: the simplest case, a constant model f_0(x) = w_0 (M = 0), plotted as t versus x on [0, 1] together with the data points.]
More generally, we can fit a polynomial of degree M to the data:

y_n ≈ w_0 + w_1 x_n + w_2 x_n^2 + ... + w_M x_n^M =: φ(x_n)^T w.

This is a polynomial feature expansion (a "lifting") of the scalar input x_n ∈ R; the model is still linear in the parameters w ∈ R^(M+1).
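As a concrete illustration (not part of the original notes), the polynomial lifting and a fit of w could look as follows in NumPy; the helper names poly_features and fit_least_squares are hypothetical, and ordinary least squares is assumed as the fitting criterion.

import numpy as np

def poly_features(x, M):
    # Lift each scalar x_n to phi(x_n) = (1, x_n, x_n^2, ..., x_n^M); one row per sample.
    return np.vander(np.asarray(x, dtype=float), M + 1, increasing=True)

def fit_least_squares(x, y, M):
    # The model is linear in w, so w can be fitted by ordinary least squares.
    Phi = poly_features(x, M)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

For example, fit_least_squares(x, y, 3) returns the four coefficients w_0, ..., w_3 of a cubic fit.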
Overfitting with Linear Models
In the following four figures, circles are data points, the green
line represents the “true function”, and the red line is the
model. The parameter M is the maximum degree in the
polynomial basis.
[Four panels: polynomial fits for M = 0, 1, 3, and 9, each plotting t versus x on [0, 1], showing the data points (circles), the true function (green), and the fitted model (red). Annotations note that the M = 9 fit achieves a training loss of zero, i.e. it passes through every training point, while a held-out test point can lie far from the fitted curve.]
For M = 0 the model is a constant and underfits, and the
same is true for M = 1. For M = 3 the model fits the data
fairly well and is not yet so rich that it also fits the small
"wiggles" caused by the noise. But for M = 9 the model is
so rich that it can fit every single data point exactly, and we
see severe overfitting taking place.
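The following sketch (my own illustration, not from the lecture) reproduces this behaviour numerically, assuming the true function is sin(2πx) sampled at N = 10 points with Gaussian noise; the exact data behind the figures is not given in the notes.

import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)            # assumed "true function"

x_tr = rng.uniform(0.0, 1.0, 10)                    # N = 10 noisy training points
y_tr = true_f(x_tr) + 0.2 * rng.standard_normal(10)
x_te = rng.uniform(0.0, 1.0, 200)                   # fresh test points
y_te = true_f(x_te) + 0.2 * rng.standard_normal(200)

def rmse(x, y, w, M):
    return np.sqrt(np.mean((np.vander(x, M + 1, increasing=True) @ w - y) ** 2))

for M in [0, 1, 3, 9]:
    Phi = np.vander(x_tr, M + 1, increasing=True)   # polynomial feature lifting
    w, *_ = np.linalg.lstsq(Phi, y_tr, rcond=None)  # least-squares fit of w
    print(f"M={M}: train RMSE {rmse(x_tr, y_tr, w, M):.3f}, "
          f"test RMSE {rmse(x_te, y_te, w, M):.3f}")

Typically the training error keeps shrinking with M (reaching roughly zero at M = 9, which has as many parameters as data points), while the test error eventually grows again.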
What can we do to avoid overfitting? If we increase the
amount of data (increase N but keep M fixed), the overfitting
might be reduced. This is shown in the following two figures,
where we again use the same model complexity M = 9 but have
more data (N = 15 or even N = 100); a small numerical sketch
of this experiment follows the figures.
[Two panels: M = 9 polynomial fits with more data, N = 15 and N = 100, plotting t versus x on [0, 1]. With more data the fitted red curve stays much closer to the green true function.]
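A sketch of this "more data" experiment under the same assumed setup: keep M = 9 fixed and grow N (again an illustration, not the lecture's actual data).

import numpy as np

rng = np.random.default_rng(1)
true_f = lambda x: np.sin(2 * np.pi * x)            # same assumed true function
x_te = np.linspace(0.0, 1.0, 500)                   # dense grid for evaluation

M = 9                                               # model complexity stays fixed
for N in [10, 15, 100]:
    x_tr = rng.uniform(0.0, 1.0, N)
    y_tr = true_f(x_tr) + 0.2 * rng.standard_normal(N)
    Phi = np.vander(x_tr, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, y_tr, rcond=None)
    pred = np.vander(x_te, M + 1, increasing=True) @ w
    print(f"N={N:4d}: test RMSE {np.sqrt(np.mean((pred - true_f(x_te)) ** 2)):.3f}")

With M fixed, more data typically pulls the fitted curve toward the true function, so the test error shrinks even though the model is very rich.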
Additional Materials
Read about overfitting in the paper by Pedro Domingos (Sections 3 and 5
of “A few useful things to know about machine learning”).