Lecture 11: Bias-Variance Tradeoff
As usual, we are given a dataset D = {(x1 , y1 ), … , (xn , yn )} , drawn i.i.d. from some distribution
P (X, Y ) . Throughout this lecture we assume a regression setting, i.e. y ∈ R . In this lecture we will
decompose the generalization error of a classifier into three rather interpretable terms. Before we do that,
let us consider that for any given input x there might not exist a unique label y. For example, if your
vector x describes the features of a house (e.g., #bedrooms, square footage, ...) and the label y its price, you
could imagine two houses with identical description selling for different prices. So for any given feature
vector x , there is a distribution over possible labels. We therefore define the following, which will come in
useful later on:
The expected label denotes the label you would expect to obtain, given a feature vector x:
$$\bar{y}(x) = E_{y\mid x}\left[y\right] = \int_y y \, \Pr(y \mid x) \, \partial y.$$
Alright, so we draw our training set D , consisting of n inputs, i.i.d. from the distribution P . As a second
step we typically call some machine learning algorithm A on this data set to learn a hypothesis (aka
classifier). Formally, we denote this process as hD = A(D).
For a given hD , learned on data set D with algorithm A , we can compute the generalization error (as
measured in squared loss) as follows:
$$E_{(x,y)\sim P}\left[\left(h_D(x) - y\right)^2\right] = \int_x \int_y \left(h_D(x) - y\right)^2 \Pr(x, y) \, \partial y \, \partial x.$$
Note that one can use other loss functions. We use squared loss because it has nice mathematical
properties, and it is also the most common loss function.
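To make this concrete, here is a minimal Python sketch (not from the lecture) that approximates this expectation by Monte Carlo averaging over test samples. The distribution P and the hypothesis h_D below are hypothetical stand-ins chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_P(m):
    """Draw m i.i.d. (x, y) pairs from the assumed distribution P."""
    x = rng.uniform(0.0, 1.0, size=m)
    return x, 2.0 * x + rng.normal(0.0, 0.1, size=m)

def h_D(x):
    """A fixed hypothesis, e.g. the output of some learning algorithm A."""
    return 1.9 * x + 0.05

# Approximate E_{(x,y)~P}[(h_D(x) - y)^2] by an average over test samples.
x_te, y_te = sample_P(100_000)
print(f"estimated generalization error: {np.mean((h_D(x_te) - y_te) ** 2):.4f}")
```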
The previous statement is true for a given training set D. However, remember that D itself is drawn i.i.d. from P^n, and is therefore a random variable. Further, hD is a function of D, and is therefore also a random variable; we can compute its expectation:
$$\bar{h} = E_{D\sim P^n}\left[h_D\right] = \int_D h_D \Pr(D) \, \partial D,$$
where Pr(D) is the probability of drawing data set D from P^n. Here, $\bar{h}$ is a weighted average over functions.
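The integral over all possible data sets is intractable, but $\bar{h}$ can be approximated empirically. The following sketch (same hypothetical setup as above; the algorithm A is assumed here to be ordinary least squares) averages the predictions of many hypotheses, each trained on a freshly drawn D:

```python
import numpy as np

rng = np.random.default_rng(1)
n, num_datasets = 20, 500

def sample_P(m):
    x = rng.uniform(0.0, 1.0, size=m)
    return x, 2.0 * x + rng.normal(0.0, 0.1, size=m)

def A(x, y):
    """The learning algorithm: least-squares fit of y = w*x + b."""
    w, b = np.polyfit(x, y, deg=1)
    return lambda x_new: w * x_new + b

# Average the predictions of h_D over many freshly drawn training sets D.
x_grid = np.linspace(0.0, 1.0, 50)
preds = np.zeros((num_datasets, x_grid.size))
for i in range(num_datasets):
    x_tr, y_tr = sample_P(n)          # draw a fresh D ~ P^n
    preds[i] = A(x_tr, y_tr)(x_grid)

h_bar = preds.mean(axis=0)            # pointwise estimate of h-bar on x_grid
print(h_bar[:5])
```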
Similarly, we can compute the expected test error, where the expectation is now also over the draw of the training set:
$$E_{\substack{(x,y)\sim P \\ D\sim P^n}}\left[\left(h_D(x) - y\right)^2\right] = \int_D \int_x \int_y \left(h_D(x) - y\right)^2 \Pr(x, y) \Pr(D) \, \partial x \, \partial y \, \partial D.$$
To be clear, D is our training set and the (x, y) pairs are the test points.
We are interested in exactly this expression, because it evaluates the quality of a machine learning
algorithm A with respect to a data distribution P (X, Y ) . In the following we will show that this
expression decomposes into three meaningful terms.
$$E_{x,y,D}\left[\left(h_D(x) - y\right)^2\right] = E_{x,y,D}\left[\left[\left(h_D(x) - \bar{h}(x)\right) + \left(\bar{h}(x) - y\right)\right]^2\right]$$
$$= E_{x,D}\left[\left(h_D(x) - \bar{h}(x)\right)^2\right] + 2\, E_{x,y,D}\left[\left(h_D(x) - \bar{h}(x)\right)\left(\bar{h}(x) - y\right)\right] + E_{x,y}\left[\left(\bar{h}(x) - y\right)^2\right]$$
The middle term of the above equation is 0, as we show below:
$$E_{x,y,D}\left[\left(h_D(x) - \bar{h}(x)\right)\left(\bar{h}(x) - y\right)\right] = E_{x,y}\left[E_D\left[h_D(x) - \bar{h}(x)\right]\left(\bar{h}(x) - y\right)\right]$$
$$= E_{x,y}\left[\left(E_D\left[h_D(x)\right] - \bar{h}(x)\right)\left(\bar{h}(x) - y\right)\right]$$
$$= E_{x,y}\left[\left(\bar{h}(x) - \bar{h}(x)\right)\left(\bar{h}(x) - y\right)\right]$$
$$= E_{x,y}\left[0\right] = 0$$
Returning to the earlier expression, we are left with the variance and another term:
$$E_{x,y,D}\left[\left(h_D(x) - y\right)^2\right] = \underbrace{E_{x,D}\left[\left(h_D(x) - \bar{h}(x)\right)^2\right]}_{\text{Variance}} + E_{x,y}\left[\left(\bar{h}(x) - y\right)^2\right]$$
We can break down the second term in the above equation as follows:
$$E_{x,y}\left[\left(\bar{h}(x) - y\right)^2\right] = E_{x,y}\left[\left[\left(\bar{h}(x) - \bar{y}(x)\right) + \left(\bar{y}(x) - y\right)\right]^2\right]$$
$$= \underbrace{E_{x,y}\left[\left(\bar{y}(x) - y\right)^2\right]}_{\text{Noise}} + \underbrace{E_{x}\left[\left(\bar{h}(x) - \bar{y}(x)\right)^2\right]}_{\text{Bias}^2} + 2\, E_{x,y}\left[\left(\bar{h}(x) - \bar{y}(x)\right)\left(\bar{y}(x) - y\right)\right]$$
The third term in the equation above is 0, as we show below:
$$E_{x,y}\left[\left(\bar{h}(x) - \bar{y}(x)\right)\left(\bar{y}(x) - y\right)\right] = E_{x}\left[E_{y\mid x}\left[\bar{y}(x) - y\right]\left(\bar{h}(x) - \bar{y}(x)\right)\right]$$
$$= E_{x}\left[\left(\bar{y}(x) - E_{y\mid x}\left[y\right]\right)\left(\bar{h}(x) - \bar{y}(x)\right)\right]$$
$$= E_{x}\left[\left(\bar{y}(x) - \bar{y}(x)\right)\left(\bar{h}(x) - \bar{y}(x)\right)\right]$$
$$= E_{x}\left[0\right] = 0$$
This gives us the decomposition of the expected test error:
$$\underbrace{E_{x,y,D}\left[\left(h_D(x) - y\right)^2\right]}_{\text{Expected test error}} = \underbrace{E_{x,D}\left[\left(h_D(x) - \bar{h}(x)\right)^2\right]}_{\text{Variance}} + \underbrace{E_{x,y}\left[\left(\bar{y}(x) - y\right)^2\right]}_{\text{Noise}} + \underbrace{E_{x}\left[\left(\bar{h}(x) - \bar{y}(x)\right)^2\right]}_{\text{Bias}^2}$$
Variance: Captures how much your classifier changes if you train on a different training set. How "over-
specialized" is your classifier to a particular training set (overfitting)? If we have the best possible model
for our training data, how far off are we from the average classifier?
Bias: What is the inherent error that you obtain from your classifier even with infinite training data? This
is due to your classifier being "biased" to a particular kind of solution (e.g. linear classifier). In other
words, bias is inherent to your model.
Noise: How big is the data-intrinsic noise? This error measures ambiguity due to your data distribution and feature representation. You can never beat this; it is an aspect of the data.
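The decomposition can also be checked numerically. Below is a sketch under hypothetical assumptions (x ~ Uniform(0,1), expected label ȳ(x) = sin(2πx), Gaussian label noise, and A fits a degree-1 polynomial, so the bias is clearly nonzero); it estimates each of the three terms and compares their sum to the expected test error:

```python
import numpy as np

rng = np.random.default_rng(2)
n, num_datasets, sigma = 20, 500, 0.1

def y_bar(x):
    """Expected label E[y|x]; known here only because we chose P ourselves."""
    return np.sin(2 * np.pi * x)

def sample_D(m):
    """Draw m i.i.d. (x, y) pairs: x ~ Uniform(0,1), y = y_bar(x) + noise."""
    x = rng.uniform(0.0, 1.0, size=m)
    return x, y_bar(x) + rng.normal(0.0, sigma, size=m)

# Fixed test sample drawn from the same distribution P.
x_te, y_te = sample_D(2_000)

# Train h_D on many independently drawn training sets D ~ P^n.
preds = np.zeros((num_datasets, x_te.size))
for i in range(num_datasets):
    x_tr, y_tr = sample_D(n)
    preds[i] = np.polyval(np.polyfit(x_tr, y_tr, deg=1), x_te)  # A = line fit

h_bar = preds.mean(axis=0)                        # estimate of h-bar on x_te
variance = np.mean((preds - h_bar) ** 2)          # E[(h_D(x) - h_bar(x))^2]
bias_sq  = np.mean((h_bar - y_bar(x_te)) ** 2)    # E[(h_bar(x) - y_bar(x))^2]
noise    = np.mean((y_te - y_bar(x_te)) ** 2)     # E[(y_bar(x) - y)^2] ~ sigma^2
total    = np.mean((preds - y_te) ** 2)           # expected test error

print(f"variance + bias^2 + noise = {variance + bias_sq + noise:.4f}")
print(f"expected test error       = {total:.4f}")
```

Up to Monte Carlo error, the two printed numbers should agree, which is exactly the identity derived above.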
Fig 2: The variation of bias and variance with model complexity. This is similar to the concept of overfitting and underfitting: more complex models tend to overfit, while the simplest models underfit.
Source: http://scott.fortmann-roe.com/docs/BiasVariance.html
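One can reproduce the qualitative trend in Fig 2 with the same hypothetical sinusoid setup as above: sweeping the polynomial degree (a stand-in for model complexity) shows the bias term shrinking while the variance term grows:

```python
import numpy as np

rng = np.random.default_rng(3)
n, num_datasets, sigma = 30, 500, 0.1

def y_bar(x):
    return np.sin(2 * np.pi * x)

def sample_D(m):
    x = rng.uniform(0.0, 1.0, size=m)
    return x, y_bar(x) + rng.normal(0.0, sigma, size=m)

x_te = np.linspace(0.05, 0.95, 200)
for degree in [1, 3, 5, 7]:          # degree as a proxy for model complexity
    preds = np.zeros((num_datasets, x_te.size))
    for i in range(num_datasets):
        x_tr, y_tr = sample_D(n)
        preds[i] = np.polyval(np.polyfit(x_tr, y_tr, deg=degree), x_te)
    h_bar = preds.mean(axis=0)
    bias_sq = np.mean((h_bar - y_bar(x_te)) ** 2)
    variance = np.mean((preds - h_bar) ** 2)
    print(f"degree {degree}: bias^2 = {bias_sq:.5f}, variance = {variance:.5f}")
```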
Figure 3: Test and training error as the number of training instances increases.
The graph above plots the training error and the test error and can be divided into two overarching
regimes. In the first regime (on the left side of the graph), training error is below the desired error
threshold (denoted by ϵ ), but test error is significantly higher. In the second regime (on the right side of
the graph), test error is remarkably close to training error, but both are above the desired tolerance of ϵ .
Regime 1 (High Variance): the first regime, where the training error is low but the test error is high, indicates overfitting.
Symptoms: training error is much lower than test error; training error is below ϵ; test error is above ϵ.
Remedies: add more training data; reduce model complexity (complex models are prone to high variance); bagging (covered later in the course).
Regime 2 (High Bias): the second regime, where training and test error are close but both above ϵ, indicates underfitting.
Symptoms: training error is higher than the desired error threshold ϵ.
Remedies: use a more complex model (e.g., kernelize, use non-linear models); add features; boosting (covered later in the course).
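A short sketch (again under the hypothetical sinusoid setup) traces the learning curves of Figure 3: with few training instances the gap between training and test error is large (high variance), and as n grows both errors converge toward a floor set by bias and noise:

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_P(m):
    x = rng.uniform(0.0, 1.0, size=m)
    return x, np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, size=m)

x_te, y_te = sample_P(5_000)          # large test sample, fixed across runs
for n in [5, 10, 20, 50, 100, 500]:
    x_tr, y_tr = sample_P(n)
    coeffs = np.polyfit(x_tr, y_tr, deg=3)   # A = degree-3 polynomial fit
    train_err = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    test_err  = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(f"n = {n:4d}: train = {train_err:.4f}, test = {test_err:.4f}")
```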