Bias-Variance Decomposition
• Choosing the regularization parameter λ in maximum likelihood / least squares estimation
• Five-part discussion:
  1. On-line regression demo
  2. Point estimate: Chinese Emperor's height
  3. Formulation for regression
  4. Example
  5. Choice of optimal λ
Bias-Variance in Regression
• Interactive demo at
  http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_bias_variance.htm
• A low-degree polynomial has high bias (fits poorly) but low variance across different data sets
• A high-degree polynomial has low bias (fits well) but high variance across different data sets
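To make the demo's behavior concrete, here is a minimal Python sketch (not from the slides; the true function sin x, the noise level, and the degrees compared are all assumptions) that fits low- and high-degree polynomials to many noisy data sets and measures squared bias and variance:

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.sin                       # assumed true function h(x)
x = np.linspace(0, 2 * np.pi, 25)
L = 100                          # number of independent data sets

for degree in (1, 9):            # low-degree vs high-degree polynomial
    preds = []
    for _ in range(L):
        t = h(x) + rng.normal(scale=0.3, size=x.shape)  # noisy targets
        coeffs = np.polyfit(x, t, deg=degree)           # fit one data set
        preds.append(np.polyval(coeffs, x))
    preds = np.array(preds)                  # shape (L, N)
    y_bar = preds.mean(axis=0)               # average prediction over data sets
    bias2 = np.mean((y_bar - h(x)) ** 2)     # squared bias
    variance = np.mean((preds - y_bar) ** 2) # variance around the average
    print(f"degree {degree}: (bias)^2 = {bias2:.4f}, variance = {variance:.4f}")
```

With these settings the degree-1 fit should show larger squared bias and smaller variance than the degree-9 fit, matching the demo's behavior.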
Point Estimate: Chinese Emperor's Height
• Everyone's belief about the Emperor's height centers on 180, while the true height is 200, so the bias error is -20
• Scenario 1: Everyone believes it is 180 (variance = 0)
  – The answer is always 180, so the error is always -20
  – Average bias error is -20; average squared error is 400
  – 400 = 400 + 0 (squared bias + variance)
• Scenario 2: Normally distributed beliefs with mean 180 and std dev 10 (variance = 100)
  – Poll two people: one says 190, the other 170
  – Bias errors are -10 and -30; average bias error is -20
  – Squared errors: 100 and 900; average squared error: 500
  – 500 = 400 + 100 (squared bias + variance)
• Scenario 3: Normally distributed beliefs with mean 180 and std dev 20 (variance = 400)
  – Poll two people: one says 200, the other 160
  – Errors: 0 and -40; average error is -20
  – Squared errors: 0 and 1600; average squared error: 800
  – 800 = 400 + 400 (squared bias + variance)
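Each scenario's numbers instantiate the decomposition of average squared error into squared bias plus variance; spelling out Scenario 2 (true height 200, estimates 190 and 170):

$$
\begin{aligned}
\text{avg. squared error} &= \tfrac{1}{2}\bigl[(190-200)^2 + (170-200)^2\bigr] = \tfrac{100 + 900}{2} = 500 \\
(\text{bias})^2 &= \Bigl(\tfrac{190+170}{2} - 200\Bigr)^{2} = (-20)^2 = 400 \\
\text{variance} &= \tfrac{1}{2}\bigl[(190-180)^2 + (170-180)^2\bigr] = \tfrac{100 + 100}{2} = 100 \\
500 &= 400 + 100
\end{aligned}
$$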
Determining optimal λ
• Suppose we have L data sets, with y^{(l)}(x) the prediction learned from the l-th data set and h(x) the true function
• Average prediction
  $$\bar{y}(x) = \frac{1}{L}\sum_{l=1}^{L} y^{(l)}(x)$$
• Squared bias
  $$(\mathrm{bias})^2 = \frac{1}{N}\sum_{n=1}^{N}\left\{\bar{y}(x_n) - h(x_n)\right\}^2$$
• Variance
  $$\mathrm{variance} = \frac{1}{N}\sum_{n=1}^{N}\frac{1}{L}\sum_{l=1}^{L}\left\{y^{(l)}(x_n) - \bar{y}(x_n)\right\}^2$$
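A minimal Python sketch of this procedure (the sinusoidal true function, Gaussian basis functions, noise level, and the ln λ grid are assumptions in the spirit of the example, not specifics from the slides): for each λ it fits regularized least squares on L data sets, then evaluates (bias)² and variance from the formulas above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 25, 100
x = np.linspace(0, 1, N)
h = np.sin(2 * np.pi * x)                          # true function h(x)
mu = np.linspace(0, 1, 9)                          # Gaussian basis centers
Phi = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * 0.1 ** 2))

for ln_lam in (-3.0, -0.31, 2.0):                  # candidate values of ln λ
    lam = np.exp(ln_lam)
    preds = []
    for _ in range(L):
        t = h + rng.normal(scale=0.3, size=N)      # noisy targets, data set l
        # regularized least-squares solution: w = (λI + ΦᵀΦ)⁻¹ Φᵀ t
        w = np.linalg.solve(lam * np.eye(9) + Phi.T @ Phi, Phi.T @ t)
        preds.append(Phi @ w)
    preds = np.asarray(preds)                      # shape (L, N)
    y_bar = preds.mean(axis=0)                     # average prediction ȳ(x_n)
    bias2 = np.mean((y_bar - h) ** 2)              # (bias)^2
    variance = np.mean((preds - y_bar) ** 2)       # variance
    print(f"ln λ = {ln_lam:+5.2f}: (bias)^2 = {bias2:.4f}, "
          f"variance = {variance:.4f}, sum = {bias2 + variance:.4f}")
```

Scanning a finer grid of ln λ and picking the minimizer of (bias)² + variance is the "choice of optimal λ" the outline refers to.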
• In the example, the minimum of (bias)² + variance occurs near ln λ = -0.31 (plot omitted)
Bias-Variance vs Bayesian
• The bias-variance decomposition provides insight into the model-complexity issue
• It has limited practical value, since it is based on ensembles of data sets
  – In practice there is only a single observed data set
  – If there were many training sets, we could combine them into one large set
    • which would itself reduce over-fitting for a given model complexity
• The Bayesian approach gives useful insights into over-fitting and is also practical