
Machine Learning - Srihari

Bias-Variance Decomposition
• Choosing λ in maximum likelihood/least squares estimation
• Five-part discussion:
  1. On-line regression demo
  2. Point estimate: the Chinese emperor's height
  3. Formulation for regression
  4. Example
  5. Choice of optimal λ


Bias-Variance in Regression

• Interactive demo at http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_bias_variance.htm
• A low-degree polynomial has high bias (it fits poorly) but low variance across different data sets
• A high-degree polynomial has low bias (it fits well) but high variance across different data sets
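To make the demo's point concrete, the following is a minimal sketch (not part of the original slides): it fits low- and high-degree polynomials to repeated noisy samples of sin(2πx) and estimates the bias² and variance of the fitted curves. The degrees, noise level, and sample sizes are illustrative assumptions.

```python
# A minimal sketch (not from the slides): fit low- and high-degree polynomials
# to many noisy samples of a sinusoid and compare bias^2 and variance of the fits.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 12)                  # training inputs (assumed)
x_test = np.linspace(0, 1, 200)                  # grid on which the fits are compared
true_f = lambda x: np.sin(2 * np.pi * x)

for degree in (1, 9):                            # low vs high polynomial degree
    fits = []
    for _ in range(100):                         # 100 independent noisy data sets
        t = true_f(x_train) + rng.normal(0, 0.3, x_train.shape)
        coeffs = np.polyfit(x_train, t, degree)
        fits.append(np.polyval(coeffs, x_test))
    fits = np.array(fits)
    bias2 = np.mean((fits.mean(axis=0) - true_f(x_test)) ** 2)  # squared bias
    var = np.mean(fits.var(axis=0))                             # variance across data sets
    print(f"degree {degree}: bias^2 = {bias2:.3f}, variance = {var:.3f}")
```

With these assumptions, the low-degree fit shows large bias² and small variance, and the high-degree fit the reverse, which is the behaviour the demo illustrates.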


Bias-Variance in Point Estimate


True height of the Chinese emperor: 200 cm, about 6'6"
Poll a random American: ask "How tall is the emperor?"
We want to determine how wrong they are, on average
Each scenario below has an expected answer of 180 (a bias error of -20), but increasing variance in the estimate

• Scenario 1: everyone believes the height is 180 (variance = 0)
  – The answer is always 180, so the error is always -20 (average bias error -20)
  – Average squared error = 400, and 400 = 400 + 0
• Scenario 2: normally distributed beliefs with mean 180 and std dev 10 (variance = 100)
  – Poll two people: one says 190, the other 170; bias errors are -10 and -30 (average -20)
  – Squared errors are 100 and 900; average squared error = 500, and 500 = 400 + 100
• Scenario 3: normally distributed beliefs with mean 180 and std dev 20 (variance = 400)
  – Poll two people: one says 200, the other 160; errors are 0 and -40 (average -20)
  – Squared errors are 0 and 1600; average squared error = 800, and 800 = 400 + 400

Squared error = square of bias error + variance
As variance increases, the error increases
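The following minimal sketch (not part of the original slides) reproduces the arithmetic above and confirms that, in each scenario, the average squared error equals squared bias plus variance; the two-person polls are the illustrative samples used on the slide, not full distributions.

```python
# Numeric check of the emperor example (values taken from the slide).
true_height = 200
scenarios = {
    "Scenario 1": [180, 180],   # everyone answers 180
    "Scenario 2": [190, 170],   # beliefs with mean 180, some spread
    "Scenario 3": [200, 160],   # beliefs with mean 180, more spread
}
for name, answers in scenarios.items():
    errors = [a - true_height for a in answers]
    bias = sum(errors) / len(errors)                         # average (bias) error
    variance = sum((a - 180) ** 2 for a in answers) / len(answers)
    mse = sum(e ** 2 for e in errors) / len(errors)          # average squared error
    print(f"{name}: bias = {bias}, variance = {variance}, "
          f"bias^2 + variance = {bias**2 + variance}, avg squared error = {mse}")
```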

Bias-Variance in Regression


• y(x): estimate of the value of t for input x
• h(x): optimal prediction
  h(x) = E[t | x] = ∫ t p(t | x) dt
• If we assume the squared loss function L(t, y(x)) = {y(x) - t}²
• E[L] can be written as
  expected loss = (bias)² + variance + noise
• where
  (bias)² = ∫ {E_D[y(x;D)] - h(x)}² p(x) dx   (the difference between the expected prediction and the optimal prediction)
  variance = ∫ E_D[{y(x;D) - E_D[y(x;D)]}²] p(x) dx
  noise = ∫∫ {h(x) - t}² p(x,t) dx dt
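The decomposition above follows by adding and subtracting E_D[y(x;D)] inside the squared error; a brief sketch of the standard argument, in the notation of this slide, is:

```latex
% Add and subtract E_D[y(x;D)] inside the squared error:
\begin{aligned}
E_D\big[\{y(x;D)-h(x)\}^2\big]
  &= E_D\big[\{\,y(x;D)-E_D[y(x;D)] + E_D[y(x;D)]-h(x)\,\}^2\big] \\
  &= \underbrace{\{E_D[y(x;D)]-h(x)\}^2}_{\text{(bias)}^2\text{ at }x}
   + \underbrace{E_D\big[\{y(x;D)-E_D[y(x;D)]\}^2\big]}_{\text{variance at }x}
\end{aligned}
% The cross term vanishes because E_D[y(x;D)-E_D[y(x;D)]] = 0.
% Integrating over p(x) gives (bias)^2 + variance; the remaining
% {h(x)-t}^2 term, integrated over p(x,t), is the intrinsic noise.
```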

Dependence of Bias-Variance on Model Complexity


• h(x) = sin(2πx)
• Regularization parameter λ
• L = 100 data sets
• Each with N = 25 points
• 24 Gaussian basis functions
  – Number of parameters M = 25
• Total error function:
  (1/2) ∑_{n=1}^{N} {t_n - wᵀφ(x_n)}² + (λ/2) wᵀw
  where φ is a vector of basis functions

[Figure: left panels show 20 fits, each to 25 data points, for high λ (low variance, high bias) and for low λ (high variance, low bias); red is the average of the fits, green is the sinusoid from which the data were generated]

• The result of averaging multiple solutions of a complex model gives a good fit
• Weighted averaging of multiple solutions is at the heart of the Bayesian approach: not with respect to multiple data sets, but with respect to the posterior distribution of parameters

Determining optimal λ
• Average prediction:
  ȳ(x) = (1/L) ∑_{l=1}^{L} y^{(l)}(x)
• Squared bias:
  (bias)² = (1/N) ∑_{n=1}^{N} {ȳ(x_n) - h(x_n)}²
• Variance:
  variance = (1/N) ∑_{n=1}^{N} (1/L) ∑_{l=1}^{L} {y^{(l)}(x_n) - ȳ(x_n)}²
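A minimal sketch (not from the slides) of the experiment on the previous slide, using the estimators above: L = 100 data sets of N = 25 points from sin(2πx), fit by regularized least squares with a bias term plus 24 Gaussian basis functions (M = 25 parameters). The basis widths and noise standard deviation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 100, 25                                   # number of data sets, points per set
x = np.linspace(0, 1, N)
h = np.sin(2 * np.pi * x)                        # true function h(x) = sin(2*pi*x)

centers = np.linspace(0, 1, 24)                  # 24 Gaussian basis function centres
s = 0.1                                          # basis width (an assumption)

def design(x):
    # bias column + 24 Gaussian basis functions -> M = 25 parameters
    phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * s ** 2))
    return np.hstack([np.ones((len(x), 1)), phi])

Phi = design(x)

def bias2_variance(lam):
    preds = []
    for _ in range(L):                           # L independent data sets
        t = h + rng.normal(0, 0.3, N)            # noisy targets (noise level assumed)
        w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ t)
        preds.append(Phi @ w)                    # y^{(l)}(x_n)
    preds = np.array(preds)
    y_bar = preds.mean(axis=0)                   # average prediction
    bias2 = np.mean((y_bar - h) ** 2)            # squared bias
    var = np.mean(preds.var(axis=0))             # variance
    return bias2, var

for ln_lam in (-2.4, -0.31, 2.6):                # small, intermediate, large regularization
    b2, v = bias2_variance(np.exp(ln_lam))
    print(f"ln lambda = {ln_lam:5.2f}: bias^2 = {b2:.4f}, variance = {v:.4f}, sum = {b2 + v:.4f}")
```

With these assumptions, small λ makes the variance term dominate and large λ makes the bias term dominate, which is the behaviour plotted on the next slide.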

Squared Bias and Variance vs λ

• The test error minimum occurs close to the minimum of (bias² + variance), at ln λ = -0.31
• Small values of λ allow the model to become finely tuned to the noise, leading to large variance
• Large values of λ pull the weight parameters toward zero, leading to large bias
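As an illustrative usage of the bias2_variance sketch above, one can sweep ln λ over a grid and locate the value minimizing (bias² + variance); the exact minimizing value depends on the assumed noise level and basis widths.

```python
# Sweep ln(lambda) and report where bias^2 + variance is smallest
# (uses bias2_variance() from the sketch above).
import numpy as np

ln_lams = np.linspace(-3, 2, 26)
totals = [sum(bias2_variance(np.exp(l))) for l in ln_lams]
best = ln_lams[int(np.argmin(totals))]
print(f"(bias^2 + variance) is minimized near ln lambda = {best:.2f}")
```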

Bias-Variance vs Bayesian
• The bias-variance decomposition provides insight into the model-complexity issue
• It has limited practical value, since it is based on ensembles of data sets
  – In practice there is only a single observed data set
  – If we had many training sets, we would simply combine them into one large set
    • which would reduce over-fitting for a given model complexity
• The Bayesian approach gives useful insights into over-fitting and is also practical
