
Machine Learning 2025

National University of Singapore CS3244


Prof Lee Wee Sun and Prof Wang Ye

Tutorial 4
1. Flexible vs Inflexible Method (Modified from An Introduction to Statistical Learning)
For each of parts (a) through (d), indicate whether we would generally expect the performance of a flexible statistical learning method to be better or worse than an inflexible method. Justify your answer in terms of bias and variance.

(a) The sample size n is extremely large, and the number of features is small.
Solution: If the sample size is large and the number of features is small, the variance can usually be kept small even with flexible methods. Flexible methods also tend to have lower bias, so a flexible method will likely be preferable in this case.
(b) The number of features is extremely large, and the number of observations n is small.
Solution: If the number of features is extremely large and the number of observations is small, it may be difficult to keep the variance small with flexible methods. An inflexible method may perform better, even though it is more biased, if its variance is much smaller.
(c) The relationship between the features and response is highly non-linear.
Solution: Inflexible methods may not be able to represent highly non-linear functions and will therefore have high bias. In this case, using flexible methods to reduce the bias may be helpful, although we still need to consider the variance.
(d) The variance of the error terms, i.e. $\sigma^2 = \mathrm{Var}(\epsilon)$, is extremely high.
Solution: If the data is very noisy, then it is easy for flexible methods to overfit and
have high variance. An inflexible method would be less likely to overfit the noise.

2. Bias, Variance and Error Curves. (From An Introduction to Statistical Learning)

(a) Provide a sketch of typical (squared) bias, variance, training error, test error, and irreducible error curves, on a single plot, as we go from less flexible statistical learning
methods towards more flexible approaches. The x-axis should represent the amount
of flexibility in the method, and the y-axis should represent the values for each curve.
There should be five curves. Make sure to label each one.
Solution:

Figure 1: Curves showing bias, variance, training error, test error and Bayes error against flexibility.
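As an illustration only, the short Python sketch below reproduces the qualitative shapes of the five curves; the functional forms (e.g. squared bias proportional to 1/flexibility) are arbitrary plotting choices and are not derived from any model.

```python
# Illustrative only: the functional forms below are arbitrary choices that
# reproduce the qualitative shapes of the five curves; they are not derived
# from any particular model.
import numpy as np
import matplotlib.pyplot as plt

flexibility = np.linspace(1, 10, 200)

bias_sq = 4.0 / flexibility                         # squared bias: decreases
variance = 0.08 * flexibility ** 2                  # variance: increases
irreducible = np.full_like(flexibility, 1.0)        # irreducible (Bayes) error: constant
test_error = bias_sq + variance + irreducible       # expected test error: U-shaped
training_error = 4.0 * np.exp(-0.6 * flexibility)   # training error: decreases towards 0

for curve, label in [(bias_sq, "squared bias"), (variance, "variance"),
                     (irreducible, "irreducible error"),
                     (test_error, "test error"), (training_error, "training error")]:
    plt.plot(flexibility, curve, label=label)
plt.xlabel("Flexibility")
plt.ylabel("Error")
plt.legend()
plt.show()
```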

(b) Explain why each of the five curves has the shape displayed in part (a).
Solution: As flexibility increases, the model is better able to approximate the target conditional distribution, so the bias decreases. For universal approximators, the bias should decrease to zero. As flexibility increases, the variance increases, since the number of ways to fit the same training dataset also increases. Training error decreases, eventually reaching zero once the function class is able to interpolate the training data exactly. Test error decreases initially, while the reduction in bias dominates, but eventually increases as the variance grows faster than any further reduction in bias. Irreducible error is external data noise; it is not affected by the approximator and so does not change with the flexibility of the method.
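To complement the explanation above, here is a minimal empirical sketch (not part of the original solution) using polynomial regression of increasing degree on noisy samples from a non-linear target: training MSE keeps falling with flexibility, while test MSE follows the U-shape described above. The target function, noise level and sample sizes are arbitrary choices.

```python
# A minimal empirical illustration (not part of the original solution):
# polynomial regression of increasing degree on noisy samples of a
# non-linear target. Training MSE keeps falling while test MSE is U-shaped.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)                # true non-linear regression function
x_tr = rng.uniform(0, 1, 30)
y_tr = f(x_tr) + rng.normal(0, 0.3, 30)
x_te = rng.uniform(0, 1, 1000)
y_te = f(x_te) + rng.normal(0, 0.3, 1000)

for degree in [1, 3, 5, 9, 15]:                    # degree plays the role of flexibility
    coef = np.polyfit(x_tr, y_tr, degree)
    train_mse = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
    test_mse = np.mean((np.polyval(coef, x_te) - y_te) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```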

3. Bias and Variance for kNN


In this problem we will consider the k-nearest neighbour regression model. Consider a training set $\{(x_i, y_i)\}_{i=1,\ldots,N}$, where each sample follows the assumption $y_i = f(x_i) + \epsilon_i$, with $\epsilon_i \sim N(0, \sigma^2)$ i.i.d. Gaussian noise (so that $E[\epsilon_i] = 0$ and $\mathrm{Var}[\epsilon_i] = \sigma^2$).

(a) Let's first try to understand the assumptions about the training dataset. Which of the following statements about the training set are correct?
A) all $\epsilon_i$ have the same mean
B) all $y_i$ have the same mean
C) all $y_i$ have the same variance
Solution: A and C.
Reason: Consider the data generation process $y_i = f(x_i) + \epsilon_i$. Here $f(x_i)$ is a function of $x_i$, which is not a random variable, while $\epsilon_i$ is a random variable following the Gaussian distribution $\epsilon_i \sim N(0, \sigma^2)$, sampled independently for each $i$ and added to $f(x_i)$. Therefore,
$$E[y_i] = E[f(x_i) + \epsilon_i] = E[f(x_i)] + E[\epsilon_i] = f(x_i),$$
$$\mathrm{Var}[y_i] = \mathrm{Var}[f(x_i) + \epsilon_i] = \mathrm{Var}[\epsilon_i] = \sigma^2.$$
The mean $f(x_i)$ generally differs across samples, so B is incorrect, whereas the variance $\sigma^2$ is the same for every $i$.
Note: $f(x_i)$ is a constant here, not a random variable. The property $\mathrm{Var}[X + a] = \mathrm{Var}[X]$, where $X$ is a random variable and $a$ is a constant, is used here.
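A quick Monte Carlo check of these two identities, for an arbitrarily chosen $f$, $x_i$ and $\sigma$ (illustrative only):

```python
# Quick Monte Carlo check (illustrative only): for a fixed x_i, repeatedly
# draw y_i = f(x_i) + eps_i and confirm the sample mean approaches f(x_i)
# and the sample variance approaches sigma^2. f, x_i and sigma are
# arbitrary choices.
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: x ** 2
x_i, sigma = 1.5, 0.5

y_samples = f(x_i) + rng.normal(0, sigma, 100_000)
print("sample mean of y_i:", y_samples.mean(), " vs f(x_i) =", f(x_i))
print("sample var  of y_i:", y_samples.var(),  " vs sigma^2 =", sigma ** 2)
```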
(b) Using squared-error loss, the expected prediction error of a regression fit $\hat{f}(x)$ at an input point $x = x_0$ can be written as the sum of the irreducible error, the squared bias, and the variance. Assume that the neighbours are fixed, which makes the analysis simpler. If the inputs $x_i$ are random rather than fixed, the analysis would not be exactly right, but the insights from the simpler analysis are nonetheless useful.
Under this assumption, the error can be expressed as
$$\mathrm{Err}(x_0) = E\big[(y - \hat{f}_k(x_0))^2 \mid x_0\big] = \sigma^2 + \Big(f(x_0) - \frac{1}{k}\sum_{i=1}^{k} f(x_i)\Big)^2 + \frac{\sigma^2}{k},$$

where $x_i$, $i = 1, \ldots, k$, are the $k$ nearest data points.


Which part of the equation above represents the irreducible error, the squared bias, and the variance? Derive the equation above by calculating each of the three terms.
Solution:
• Irreducible error: $\sigma^2$, arising from the noise in the observation $y = f(x_0) + \epsilon$ at the test point.
• Variance: $\mathrm{Var}[f(x_i)] = 0$ because we assume the nearest neighbours $x_i$ are fixed, hence the values $f(x_i)$ are also fixed. Therefore
$$\mathrm{Var}[\hat{f}_k(x_0)] = \mathrm{Var}\Big[\frac{1}{k}\sum_{i=1}^{k}\big(f(x_i) + \epsilon_i\big)\Big] = \frac{1}{k^2}\sum_{i=1}^{k}\mathrm{Var}[\epsilon_i] = \frac{\sigma^2}{k}.$$
• Squared bias: assuming fixed neighbours $x_i$, $E[\hat{f}_k(x_0)] = E\big[\frac{1}{k}\sum_{i=1}^{k} y(x_i)\big] = \frac{1}{k}\sum_{i=1}^{k} f(x_i)$, so the squared bias is $\big(f(x_0) - \frac{1}{k}\sum_{i=1}^{k} f(x_i)\big)^2$. (A simulation sketch verifying the decomposition is given after this list.)
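Here is the simulation sketch referred to above: it checks the decomposition $\mathrm{Err}(x_0) = \sigma^2 + \text{bias}^2 + \sigma^2/k$ numerically under the same fixed-neighbour assumption. The choice of $f$, $\sigma$, $k$ and neighbour locations is arbitrary.

```python
# Sketch verifying Err(x0) = sigma^2 + bias^2 + sigma^2 / k by simulation,
# under the same fixed-neighbour assumption: the neighbour locations x_i are
# held fixed and only the noise is redrawn. f, sigma, k and the neighbour
# locations are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(x)
sigma, k = 0.5, 5
x0 = 0.0
x_neighbours = np.linspace(-0.4, 0.4, k)            # the k fixed nearest neighbours

bias_sq = (f(x0) - f(x_neighbours).mean()) ** 2
theory = sigma ** 2 + bias_sq + sigma ** 2 / k

# Monte Carlo estimate of E[(y - f_hat_k(x0))^2 | x0]
n_rep = 200_000
y_neighbours = f(x_neighbours) + rng.normal(0, sigma, (n_rep, k))
f_hat = y_neighbours.mean(axis=1)                   # kNN regression estimate at x0
y_new = f(x0) + rng.normal(0, sigma, n_rep)         # independent test response at x0
empirical = np.mean((y_new - f_hat) ** 2)

print("theoretical Err(x0):", theory)
print("empirical   Err(x0):", empirical)
```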
(c) How would the equations from Part (b) change when k is varied? How would you
choose an optimal value of k using the above equations?
Solution:
• The choice of k does not affect the irreducible error.
• The variance $\sigma^2/k$ decreases as k increases.
• As k increases, the bias will likely increase: as the number of neighbours grows, we consider points further away from $x_0$, so the average $\frac{1}{k}\sum_{i=1}^{k} f(x_i)$ moves further away from $f(x_0)$ (in the extreme case where k equals the number of points in the training set, $\hat{f}(x)$ would just give the mean of the training set outputs).
• An optimal k balances these terms: using the equation above, choose the k that minimises the squared bias plus the variance, $\big(f(x_0) - \frac{1}{k}\sum_{i=1}^{k} f(x_i)\big)^2 + \sigma^2/k$, since the irreducible error is constant in k. A sketch of this trade-off is given below.
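The sketch below illustrates the trade-off from part (c): it sweeps $k$ and picks the value minimising the theoretical error $\sigma^2 + \text{bias}^2 + \sigma^2/k$ at $x_0$. The bias term requires knowing the true $f$ at the neighbours, so this is a thought experiment; with real data one would estimate test error on a validation set instead. The training grid and target function here are assumptions for illustration.

```python
# Thought-experiment sketch: sweep k and pick the value minimising the
# theoretical error sigma^2 + bias^2 + sigma^2 / k at x0. The bias term
# needs the true f on the neighbours, so with real data one would use a
# validation set instead. The training grid and f are arbitrary choices.
import numpy as np

f = lambda x: np.sin(x)
sigma, x0 = 0.5, 1.0
x_train = np.linspace(-3, 3, 61)                    # fixed training inputs (assumption)

best = None
for k in range(1, 31):
    nearest = x_train[np.argsort(np.abs(x_train - x0))[:k]]   # k nearest inputs to x0
    err = sigma ** 2 + (f(x0) - f(nearest).mean()) ** 2 + sigma ** 2 / k
    if best is None or err < best[1]:
        best = (k, err)
print("best k:", best[0], "expected error:", best[1])
```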
