Bias Variance Derivation
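\[
\operatorname{Var}(X) = E\left[X^2\right] - \left(E[X]\right)^2
\]
\[
\operatorname{Var}\left(f(x) - \hat{f}(x)\right) = \underbrace{E\left[\left(f(x) - \hat{f}(x)\right)^2\right]}_{\text{second moment}} - \underbrace{\left(E\left[f(x) - \hat{f}(x)\right]\right)^2}_{\text{square of mean}} \tag{1}
\]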
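The variance identity above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the normal distribution and its parameters are arbitrary choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: any distribution with finite variance would do.
x = rng.normal(loc=2.0, scale=3.0, size=100_000)

second_moment = np.mean(x ** 2)    # sample estimate of E[X^2]
square_of_mean = np.mean(x) ** 2   # sample estimate of (E[X])^2

# Variance via the identity: second moment minus square of mean.
var_identity = second_moment - square_of_mean

# Variance computed directly as mean((x - mean(x))^2), i.e. ddof=0.
var_direct = np.var(x)

print(var_identity, var_direct)
```

Both quantities agree up to floating-point rounding, because the identity holds exactly for the empirical distribution of the sample as well.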
Let $y = f(x) + \epsilon$ be the generic model, where $f(x)$ represents the underlying natural model and $\epsilon$ is the random disturbance (noise) that is inherently added to the model and cannot be reduced. The noise is assumed to be zero-mean and i.i.d. $\hat{f}(x)$ is the machine learning model, which is expected to approximate $y$ as closely as possible. We will specifically be looking at Figure 1, whose y-axis represents the mean squared test error. We will expand the test error and express it in terms of the three elements responsible for it, namely the noise, the bias, and the variance.
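To make the setup concrete, here is a small simulation sketch of the model $y = f(x) + \epsilon$ and of the test MSE of a fitted model. The sine function used for `f`, the noise level `sigma`, and the cubic polynomial used as the learned model are all assumptions chosen for illustration; none of them come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical "true" model, chosen only for illustration.
    return np.sin(2 * np.pi * x)

sigma = 0.3  # assumed standard deviation of the zero-mean noise

# Training data generated as y = f(x) + eps.
x_train = rng.uniform(0, 1, 50)
y_train = f(x_train) + rng.normal(0, sigma, 50)

# f_hat: a cubic polynomial fitted by least squares (an arbitrary choice).
coeffs = np.polyfit(x_train, y_train, deg=3)

# Fresh test data drawn from the same model.
x_test = rng.uniform(0, 1, 10_000)
y_test = f(x_test) + rng.normal(0, sigma, 10_000)

# Test MSE: average squared difference between y and f_hat(x).
test_mse = np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)
print(test_mse)
```

Because the test targets contain noise that no model can predict, the test MSE in this sketch stays close to or above `sigma ** 2` no matter how well the polynomial matches `f`.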
Figure 1
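\[
\begin{aligned}
\text{Test MSE} &= E\left[\left(y - \hat{f}(x)\right)^2\right] \\
&= E\left[\left(f(x) + \epsilon - \hat{f}(x)\right)^2\right] \\
&= E\left[\left(\left(f(x) - \hat{f}(x)\right) + \epsilon\right)^2\right] \\
&= E\left[\left(f(x) - \hat{f}(x)\right)^2 + \epsilon^2 + 2\epsilon\left(f(x) - \hat{f}(x)\right)\right] \\
&= E\left[\left(f(x) - \hat{f}(x)\right)^2\right] + E\left[\epsilon^2\right] + 2E\left[\epsilon\left(f(x) - \hat{f}(x)\right)\right] \\
&= E\left[\left(f(x) - \hat{f}(x)\right)^2\right] + E\left[\epsilon^2\right] + 2E\left[f(x) - \hat{f}(x)\right]E[\epsilon] \\
&= E\left[\left(f(x) - \hat{f}(x)\right)^2\right] + E\left[\epsilon^2\right] \qquad \because\ E[\epsilon] = 0
\end{aligned}
\]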
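The step that drops the cross term factors it as $2E[f(x) - \hat{f}(x)]\,E[\epsilon]$, which relies on the noise being independent of $f(x) - \hat{f}(x)$. The sketch below checks this numerically under that assumption. The array `d` stands in for $f(x) - \hat{f}(x)$, and its distribution is an arbitrary choice made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# d stands in for f(x) - f_hat(x); its distribution is an arbitrary assumption.
d = rng.normal(loc=0.5, scale=1.0, size=n)

# eps is zero-mean noise drawn independently of d.
eps = rng.normal(loc=0.0, scale=0.5, size=n)

# Sample estimate of the cross term 2 E[eps (f - f_hat)].
cross_term = 2 * np.mean(eps * d)

# Left- and right-hand sides of the simplified expression.
lhs = np.mean((d + eps) ** 2)              # E[((f - f_hat) + eps)^2]
rhs = np.mean(d ** 2) + np.mean(eps ** 2)  # E[(f - f_hat)^2] + E[eps^2]

print(cross_term, lhs, rhs)
```

The cross term is close to zero, and `lhs` differs from `rhs` by exactly that small cross term. If `eps` were correlated with `d`, the cross term would not vanish and the simplification would not hold.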
If we add and subtract the term $\left(E\left[f(x) - \hat{f}(x)\right]\right)^2$, we obtain
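\[
\begin{aligned}
\text{Test MSE} &= E\left[\left(f(x) - \hat{f}(x)\right)^2\right] + E\left[\epsilon^2\right] - \left(E\left[f(x) - \hat{f}(x)\right]\right)^2 + \left(E\left[f(x) - \hat{f}(x)\right]\right)^2 \\
&= E\left[\epsilon^2\right] + \left(E\left[f(x) - \hat{f}(x)\right]\right)^2 + \underbrace{E\left[\left(f(x) - \hat{f}(x)\right)^2\right] - \left(E\left[f(x) - \hat{f}(x)\right]\right)^2}_{\text{variance from eq. 1}} \\
&= \underbrace{\sigma^2}_{\text{Noise}} + \underbrace{\left(E\left[f(x) - \hat{f}(x)\right]\right)^2}_{\text{Bias}^2} + \underbrace{\operatorname{Var}\left(f(x) - \hat{f}(x)\right)}_{\text{Variance}} \qquad \because\ \sigma^2 = E\left[\epsilon^2\right]
\end{aligned}
\]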
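The final decomposition can be checked with a Monte Carlo sketch: refit a model on many noisy training sets, evaluate it at one test point, and compare the test MSE with the sum of the three terms. Everything specific here is an assumption made for illustration: the sine function for `f`, the noise level `sigma`, the fixed training grid, the cubic polynomial model, and the test point `x0`.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    # Hypothetical "true" model, chosen only for illustration.
    return np.sin(2 * np.pi * x)

sigma = 0.3                       # assumed noise standard deviation
x0 = 0.25                         # single test point
x_train = np.linspace(0, 1, 30)   # fixed training inputs; only the noise is redrawn
n_trials, degree = 4000, 3

errors = np.empty(n_trials)       # f(x0) - f_hat(x0) for each fitted model
sq_test_err = np.empty(n_trials)  # (y0 - f_hat(x0))^2 for each fitted model

for t in range(n_trials):
    y_train = f(x_train) + rng.normal(0, sigma, x_train.size)
    pred = np.polyval(np.polyfit(x_train, y_train, degree), x0)
    y0 = f(x0) + rng.normal(0, sigma)  # fresh noisy test observation
    errors[t] = f(x0) - pred
    sq_test_err[t] = (y0 - pred) ** 2

test_mse = sq_test_err.mean()
noise = sigma ** 2                # E[eps^2]
bias_sq = errors.mean() ** 2      # (E[f - f_hat])^2
variance = errors.var()           # Var(f - f_hat), ddof=0

print(test_mse, noise + bias_sq + variance)
```

The two printed numbers agree up to Monte Carlo error, which shrinks as `n_trials` grows.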
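Notice that in the variance term, the quantity being subtracted, $\left(E\left[f(x) - \hat{f}(x)\right]\right)^2$ (highlighted in blue in the original), is exactly the bias term. For a given value of $E\left[\left(f(x) - \hat{f}(x)\right)^2\right]$, the variance therefore increases when the bias decreases, and vice versa. In this sense, the bias and the variance complement each other.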
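The opposing movement of the two terms can be illustrated by comparing a rigid model with a flexible one. The sketch below is only an illustration under assumed settings: a sine function for `f`, a fixed training grid, polynomial models of degree 1 and 7, and a single test point `x0`. The helper `bias_sq_and_variance` is a hypothetical name introduced here.

```python
import numpy as np

def f(x):
    # Hypothetical "true" model, chosen only for illustration.
    return np.sin(2 * np.pi * x)

def bias_sq_and_variance(degree, n_trials=2000, sigma=0.3, x0=0.25, seed=3):
    """Estimate (E[f - f_hat])^2 and Var(f - f_hat) at x0 for a polynomial fit."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0, 1, 30)  # fixed inputs; only the noise is redrawn
    errors = np.empty(n_trials)
    for t in range(n_trials):
        y_train = f(x_train) + rng.normal(0, sigma, x_train.size)
        coeffs = np.polyfit(x_train, y_train, degree)
        errors[t] = f(x0) - np.polyval(coeffs, x0)
    return errors.mean() ** 2, errors.var()

bias_sq_low, var_low = bias_sq_and_variance(degree=1)    # rigid model
bias_sq_high, var_high = bias_sq_and_variance(degree=7)  # flexible model

print("degree 1:", bias_sq_low, var_low)
print("degree 7:", bias_sq_high, var_high)
```

In this particular setup, the straight line has the larger squared bias and the smaller variance, while the degree-7 polynomial shows the opposite pattern.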