Bias-Variance Decomposition
Bias-variance decomposition of machine learning algorithms for various loss functions.
Overview
Often, researchers use the terms bias and variance or "bias-variance tradeoff" to describe the performance of a model -- i.e., you may stumble upon talks, books, or articles where people say that a model has a high variance or high bias. So, what does that mean? In general, we might say that "high variance" is proportional to overfitting and "high bias" is proportional to underfitting.

So why are we attempting this bias-variance decomposition in the first place? Decomposing the loss into bias and variance helps us understand learning algorithms, as these concepts are correlated with underfitting and overfitting.
To use the more formal terms for bias and variance, assume we have a point estimator $\hat{\theta}$ of some parameter or function $\theta$. Then, the bias is commonly defined as the difference between the expected value of the estimator and the parameter that we want to estimate:

$$\text{Bias} = E[\hat{\theta}] - \theta.$$
If the bias is larger than zero, we also say that the estimator is positively biased; if the bias is smaller than zero, the estimator is negatively biased; and if the bias is exactly zero, the estimator is unbiased. Similarly, we define the variance as the difference between the expected value of the squared estimator and the squared expectation of the estimator:

$$\text{Var}(\hat{\theta}) = E[\hat{\theta}^2] - \big(E[\hat{\theta}]\big)^2.$$
Note that in the context of this lecture, it will be more convenient to write the variance in its alternative form:

$$\text{Var}(\hat{\theta}) = E\big[(E[\hat{\theta}] - \hat{\theta})^2\big].$$
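To make these definitions concrete, here is a small simulation sketch. The estimator chosen here, the plug-in sample variance of a normal sample, is purely an illustrative choice (it is not part of the discussion above); its bias and variance are estimated by drawing many samples with a known true variance.

import numpy as np

rng = np.random.default_rng(0)

n, true_var = 10, 1.0
estimates = np.array([np.var(rng.normal(0.0, 1.0, n))   # plug-in variance estimate (ddof=0)
                      for _ in range(100000)])

bias = estimates.mean() - true_var                       # E[theta_hat] - theta
variance = np.mean(estimates**2) - estimates.mean()**2   # E[theta_hat^2] - (E[theta_hat])^2

print('bias: %.3f (theory: %.3f)' % (bias, -true_var / n))
print('variance: %.3f' % variance)

The estimated bias comes out close to the theoretical value of $-\sigma^2/n$ for this particular estimator.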
Suppose there is an unknown target function or "true function" that we want to approximate. Now, suppose we have different training sets drawn from an unknown distribution defined as "true function + noise." The following plot shows different linear regression models, each fit to a different training set. None of these hypotheses approximate the true function well, except at two points (around x=-10 and x=6). Here, we can say that the bias is large because the difference between the true value and the predicted value, on average (here, average means "expectation over the training sets," not "expectation over examples in the training set"), is large:
http://rasbt.github.io/mlxtend/user_guide/evaluate/bias_variance_decomp/#bias-variance-decomposition Página 1 de 9
Bias-Variance Decomposition - mlxtend 28/07/2021 06:02
The next plot shows different unpruned decision tree models, each fit to a different training set. Note that these hypotheses fit the training data very closely. However, if we consider the expectation over training sets, the average hypothesis would fit the true function perfectly (given that the noise is unbiased and has an expected value of 0). As we can see, the variance is very large, since on average, a prediction differs a lot from the expected value of the prediction:
http://rasbt.github.io/mlxtend/user_guide/evaluate/bias_variance_decomp/#bias-variance-decomposition Página 2 de 9
Bias-Variance Decomposition - mlxtend 28/07/2021 06:02
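The following sketch mimics this setup numerically; the cubic "true function," the noise level, and the test point are made-up choices for illustration. It fits a linear regression model (high bias, like the first plot) and an unpruned decision tree (high variance, like the second plot) to many noisy training sets and compares their squared bias and variance at a single test point.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

def true_function(x):                      # hypothetical "true function"
    return 0.05 * x**3

x_train = np.linspace(-10, 10, 30).reshape(-1, 1)
x_test = np.array([[4.0]])                 # a single test point
preds = {'linear regression': [], 'unpruned decision tree': []}

for _ in range(200):                       # 200 training sets = true function + noise
    y_train = true_function(x_train).ravel() + rng.normal(0, 5, len(x_train))
    preds['linear regression'].append(
        LinearRegression().fit(x_train, y_train).predict(x_test)[0])
    preds['unpruned decision tree'].append(
        DecisionTreeRegressor().fit(x_train, y_train).predict(x_test)[0])

y_true = float(true_function(x_test)[0, 0])
for name, p in preds.items():
    p = np.array(p)
    print('%s: bias^2=%.2f, variance=%.2f'
          % (name, (p.mean() - y_true)**2, p.var()))

The linear model shows a large squared bias and small variance, while the unpruned tree shows a small bias and large variance, mirroring the two plots described above.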
Before we introduce the bias-variance decomposition of the 0-1 loss for classification, let us start with the decomposition of the squared loss as an easy warm-up exercise to get familiar with the overall concept.

The previous section already listed the common formal definitions of bias and variance; however, let us define them again for convenience. In the context of these machine learning lecture notes, $\hat{y}$ denotes the prediction of a model (an estimator of the true target value $y$), so that

$$\text{Bias} = E[\hat{y}] - y \quad\text{and}\quad \text{Var}(\hat{y}) = E\big[(E[\hat{y}] - \hat{y})^2\big].$$

Note that unless noted otherwise, the expectation is over training sets!
To get started with the squared error loss decomposition into bias and variance, let us do some algebraic manipulation, i.e., adding and subtracting the expected value of $\hat{y}$ and then expanding the expression using the quadratic formula $(a + b)^2 = a^2 + b^2 + 2ab$:

$$
\begin{aligned}
S = (y - \hat{y})^2 \\
(y - \hat{y})^2 &= (y - E[\hat{y}] + E[\hat{y}] - \hat{y})^2 \\
&= (y - E[\hat{y}])^2 + (E[\hat{y}] - \hat{y})^2 + 2(y - E[\hat{y}])(E[\hat{y}] - \hat{y}).
\end{aligned}
$$
Next, we take the expectation on both sides. Since $y$ and $E[\hat{y}]$ are constants with respect to the expectation over training sets, the cross-term vanishes:

$$
\begin{aligned}
E\big[2(y - E[\hat{y}])(E[\hat{y}] - \hat{y})\big] &= 2E\big[(y - E[\hat{y}])(E[\hat{y}] - \hat{y})\big] \\
&= 2(y - E[\hat{y}])\,E\big[E[\hat{y}] - \hat{y}\big] \\
&= 2(y - E[\hat{y}])\big(E[E[\hat{y}]] - E[\hat{y}]\big) \\
&= 2(y - E[\hat{y}])\big(E[\hat{y}] - E[\hat{y}]\big) \\
&= 0,
\end{aligned}
$$

so that

$$
E\big[(y - \hat{y})^2\big] = \underbrace{(y - E[\hat{y}])^2}_{\text{Bias}^2} + \underbrace{E\big[(E[\hat{y}] - \hat{y})^2\big]}_{\text{Variance}}.
$$
So, this is the canonical decomposition of the squared error loss into bias and variance. The next section will discuss some approaches to decomposing the 0-1 loss that we commonly use for classification accuracy or error.
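Before moving on, here is a quick numerical sanity check of the squared-loss identity above; the true target and the spread of the predictions (one prediction per hypothetical training set) are made-up values for illustration.

import numpy as np

rng = np.random.default_rng(0)

y = 3.2                                      # fixed true target at one test point
y_hat = 2.0 + rng.normal(0.0, 1.5, 100000)   # hypothetical predictions, one per training set

loss = np.mean((y - y_hat)**2)               # E[(y - y_hat)^2]
bias_sq = (y - y_hat.mean())**2              # (y - E[y_hat])^2
variance = np.mean((y_hat.mean() - y_hat)**2)

print('%.4f = %.4f + %.4f' % (loss, bias_sq, variance))   # loss equals bias^2 + variance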
The following figure is a sketch of variance and bias in relation to the training error and generalization error -- how high variance relates to overfitting, and how large bias relates to underfitting:
http://rasbt.github.io/mlxtend/user_guide/evaluate/bias_variance_decomp/#bias-variance-decomposition Página 3 de 9
Bias-Variance Decomposition - mlxtend 28/07/2021 06:02
Domingos's paper [1] may offer the most intuitive and general formulation of the 0-1 loss decomposition. However, for simplicity, we will first go over the Kong & Dietterich formulation [2], which is the same as Domingos's but excludes the noise term.
The following summarizes the relevant terms we used for the squared loss in relation to the 0-1 loss. Recall that the 0-1 loss, $L$, is 0 if a class label is predicted correctly, and 1 otherwise. The main prediction for the squared error loss is simply the average over the predictions, $E[\hat{y}]$ (the expectation is over training sets); for the 0-1 loss, Kong & Dietterich and Domingos defined it as the mode. I.e., if a model predicts the label 1 more than 50% of the time (considering all possible training sets), then the main prediction is 1, and 0 otherwise.
Hence, as a result of using the mode to define the main prediction of the 0-1 loss, the bias is 1 if the main prediction does not agree with the true label $y$, and 0 otherwise (here, $E[\hat{y}]$ denotes the main prediction, i.e., the mode):

$$
\text{Bias} =
\begin{cases}
1 & \text{if } y \neq E[\hat{y}], \\
0 & \text{otherwise.}
\end{cases}
$$
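As a small illustration (the prediction values below, standing in for predictions from ten hypothetical training sets, are made up), the main prediction and the bias of the 0-1 loss can be computed like this:

import numpy as np

y_true = 1
y_hat = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])  # predictions from 10 hypothetical training sets

# main prediction = mode; for binary labels this is 1 if label 1 is predicted more than 50% of the time
main_prediction = int(y_hat.mean() > 0.5)
bias = int(main_prediction != y_true)   # 1 if the main prediction disagrees with the true label

print(main_prediction, bias)            # -> 1 0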
http://rasbt.github.io/mlxtend/user_guide/evaluate/bias_variance_decomp/#bias-variance-decomposition Página 4 de 9
Bias-Variance Decomposition - mlxtend 28/07/2021 06:02
The variance of the 0-1 loss is defined as the probability that the predicted label does not match the main prediction:

$$\text{Variance} = P(\hat{y} \neq E[\hat{y}]).$$
Next, let us take a look at what happens to the loss if the bias is 0. Given the general definition of the loss, loss = bias + variance, if the bias is 0, then the loss equals the variance:

$$\text{Loss} = 0 + \text{Variance} = P(\hat{y} \neq y) = \text{Variance} = P(\hat{y} \neq E[\hat{y}]).$$

In other words, if a model has zero bias, its loss is entirely defined by the variance, which is intuitive if we think of variance as being proportional to overfitting.
The more surprising scenario is if the bias is equal to 1. If the bias is equal to 1, as explained by Pedro Domingos, increasing the variance can decrease the loss, which is an interesting observation. This can be seen by first rewriting the 0-1 loss function as

$$\text{Loss} = P(\hat{y} \neq y) = 1 - P(\hat{y} = y).$$
(Note that we have not done anything new, yet.) Now, if we look at the previous equation of the bias, if the bias is 1, we have $y \neq E[\hat{y}]$. Since $y$ does not equal the main prediction, a prediction $\hat{y}$ that equals $y$ cannot equal the main prediction, and vice versa (for binary labels, $\hat{y} = y$ if and only if $\hat{y} \neq E[\hat{y}]$). Using the "inverse" ("1 minus"), we can then write the loss as

$$\text{Loss} = P(\hat{y} \neq y) = 1 - P(\hat{y} = y) = 1 - P(\hat{y} \neq E[\hat{y}]).$$
Since the variance is $P(\hat{y} \neq E[\hat{y}])$, the loss is hence defined as "loss = bias - variance" if the bias is 1 (or "loss = 1 - variance"). This might be quite unintuitive at first, but the explanation Kong, Dietterich, and Domingos offer is that if a model has a very high bias such that its main prediction is always wrong, increasing the variance can be beneficial, since increasing the variance would push the decision boundary, which might lead to some correct predictions just by chance. In other words, for scenarios with high bias, increasing the variance can improve (decrease) the loss!
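The two cases can be checked numerically with a small sketch; the prediction arrays below are made-up values. The helper computes the main prediction, bias, variance, and average 0-1 loss over hypothetical training sets:

import numpy as np

def zero_one_stats(y_true, y_hat):
    # main prediction (mode for binary labels), bias, variance, and average 0-1 loss
    main = int(np.mean(y_hat) > 0.5)
    bias = int(main != y_true)
    variance = float(np.mean(y_hat != main))
    loss = float(np.mean(y_hat != y_true))
    return bias, variance, loss

# zero-bias case: loss == bias + variance
print(zero_one_stats(1, np.array([1, 1, 0, 1, 1])))   # -> (0, 0.2, 0.2)

# bias = 1 case: loss == bias - variance
print(zero_one_stats(0, np.array([1, 1, 0, 1, 1])))   # -> (1, 0.2, 0.8)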
References
[1] Domingos, Pedro. "A unified bias-variance decomposition." Proceedings of the 17th International Conference on Machine Learning, 2000.
[2] Dietterich, Thomas G., and Eun Bae Kong. "Machine learning bias, statistical bias, and statistical variance of decision tree algorithms." Technical report, Department of Computer Science, Oregon State University, 1995.
http://rasbt.github.io/mlxtend/user_guide/evaluate/bias_variance_decomp/#bias-variance-decomposition Página 5 de 9
Bias-Variance Decomposition - mlxtend 28/07/2021 06:02
The following example prepares the bias-variance decomposition of a decision tree classifier on the Iris dataset:

from mlxtend.data import iris_data
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = iris_data()
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                     test_size=0.3,
                                                     random_state=123,
                                                     shuffle=True,
                                                     stratify=y)

tree = DecisionTreeClassifier(random_state=123)
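The decomposition itself is computed with the bias_variance_decomp function documented in the API section below; a sketch of the call for this classifier (the random_seed value here is an arbitrary illustrative choice):

from mlxtend.evaluate import bias_variance_decomp

avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
        tree, X_train, y_train, X_test, y_test,
        loss='0-1_loss',
        random_seed=123)

print('Average expected loss: %.3f' % avg_expected_loss)
print('Average bias: %.3f' % avg_bias)
print('Average variance: %.3f' % avg_var)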
For comparison, here is the bias-variance decomposition of a bagging classifier, which should intuitively have a lower variance than a single decision tree:
from sklearn.ensemble import BaggingClassifier

tree = DecisionTreeClassifier(random_state=123)
bag = BaggingClassifier(base_estimator=tree,   # use estimator=tree in scikit-learn >= 1.2
                        n_estimators=100,
                        random_state=123)
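The corresponding call is a sketch along the same lines, with bag passed as the estimator:

avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
        bag, X_train, y_train, X_test, y_test,
        loss='0-1_loss',
        random_seed=123)

print('Average expected loss: %.3f' % avg_expected_loss)
print('Average bias: %.3f' % avg_bias)
print('Average variance: %.3f' % avg_var)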
http://rasbt.github.io/mlxtend/user_guide/evaluate/bias_variance_decomp/#bias-variance-decomposition Página 6 de 9
Bias-Variance Decomposition - mlxtend 28/07/2021 06:02
The decomposition also supports the squared error loss ('mse'); the following example applies it to a decision tree regressor on the Boston Housing dataset:

from mlxtend.data import boston_housing_data
from sklearn.tree import DecisionTreeRegressor

X, y = boston_housing_data()
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                     test_size=0.3,
                                                     random_state=123,
                                                     shuffle=True)

tree = DecisionTreeRegressor(random_state=123)
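For the regressor, a sketch of the same kind of call applies, this time with loss='mse' (the random_seed value is again an arbitrary choice):

from mlxtend.evaluate import bias_variance_decomp

avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
        tree, X_train, y_train, X_test, y_test,
        loss='mse',
        random_seed=123)

print('Average expected loss: %.3f' % avg_expected_loss)
print('Average bias: %.3f' % avg_bias)
print('Average variance: %.3f' % avg_var)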
For comparison, the bias-variance decomposition of a bagging regressor is shown below, which should
intuitively have a lower variance than a single decision tree:
from sklearn.ensemble import BaggingRegressor

tree = DecisionTreeRegressor(random_state=123)
bag = BaggingRegressor(base_estimator=tree,   # use estimator=tree in scikit-learn >= 1.2
                       n_estimators=100,
                       random_state=123)
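And a sketch of the corresponding call for the bagging regressor:

avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
        bag, X_train, y_train, X_test, y_test,
        loss='mse',
        random_seed=123)

print('Average expected loss: %.3f' % avg_expected_loss)
print('Average bias: %.3f' % avg_bias)
print('Average variance: %.3f' % avg_var)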
http://rasbt.github.io/mlxtend/user_guide/evaluate/bias_variance_decomp/#bias-variance-decomposition Página 7 de 9
Bias-Variance Decomposition - mlxtend 28/07/2021 06:02
The decomposition can also be applied to a TensorFlow/Keras model. First, the network below is trained and evaluated on the original train/test split:

import numpy as np
import tensorflow as tf
from sklearn.metrics import mean_squared_error

np.random.seed(1)
tf.random.set_seed(1)

X, y = boston_housing_data()
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                     test_size=0.3,
                                                     random_state=123,
                                                     shuffle=True)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation=tf.nn.relu),
    tf.keras.layers.Dense(1)
])

optimizer = tf.keras.optimizers.Adam()
model.compile(loss='mean_squared_error', optimizer=optimizer)
model.fit(X_train, y_train, epochs=100, verbose=0)  # train the network; the epoch count here is an illustrative choice

mean_squared_error(model.predict(X_test), y_test)

32.69300595184836
Note that it is highly recommended to use the same number of training epochs that you would use on the
original training set to ensure convergence:
np.random.seed(1)
tf.random.set_seed(1)
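# A sketch of the corresponding decomposition call, continuing the code above.
# It assumes that extra keyword arguments such as `epochs` and `verbose` are forwarded
# to the estimator's .fit() method via fit_params (see the API section below); the
# epoch count and num_rounds values are illustrative choices.
from mlxtend.evaluate import bias_variance_decomp

avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
        model, X_train, y_train, X_test, y_test,
        loss='mse',
        num_rounds=100,
        random_seed=123,
        epochs=100,
        verbose=0)

print('Average expected loss: %.3f' % avg_expected_loss)
print('Average bias: %.3f' % avg_bias)
print('Average variance: %.3f' % avg_var)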
API
bias_variance_decomp(estimator, X_train, y_train, X_test, y_test, loss='0-1_loss', num_rounds=200, random_seed=None, fit_params)
Parameters

estimator : object
    A classifier or regressor object or class implementing both a fit and predict method similar to the scikit-learn API.
http://rasbt.github.io/mlxtend/user_guide/evaluate/bias_variance_decomp/#bias-variance-decomposition Página 8 de 9
Bias-Variance Decomposition - mlxtend 28/07/2021 06:02
X_train : array-like
    A training dataset for drawing the bootstrap samples to carry out the bias-variance decomposition.

y_train : array-like
    Targets (class labels, continuous values in case of regression) associated with the X_train examples.

X_test : array-like
    The test dataset for computing the average loss, bias, and variance.

y_test : array-like
    Targets (class labels, continuous values in case of regression) associated with the X_test examples.

loss : str (default='0-1_loss')
    Loss function for performing the bias-variance decomposition. Currently allowed values are '0-1_loss' and 'mse'.

num_rounds : int (default=200)
    Number of bootstrap rounds for performing the bias-variance decomposition.

random_seed : int (default=None)
    Random seed for the bootstrap sampling used for the bias-variance decomposition.

fit_params : additional parameters
    Additional parameters to be passed to the .fit() function of the estimator when it is fit to the bootstrap samples.
Returns
avg_expected_loss, avg_bias, avg_var : returns the average expected loss, average bias, and average variance (all floats), where the average is computed over the data points in the test set.
Examples
For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/bias_variance_decomp/