
QBUS6810

Statistical Learning and Data Mining


Week 5 Tutorial

Question 1

Let $\hat{\theta}$ be an estimator and $\theta$ the quantity to be estimated. You can think of $\theta$ as a scalar-valued parameter, but the estimand can be any quantity of interest, such as $f(x)$ in a regression model.

(a) What is an estimator (in words)?


(b) What is the mathematical definition of the bias of the estimator $\hat{\theta}$? Interpret the equation.

(c) Define the risk of the estimator as

$$ R(\hat{\theta}) = \mathbb{E}_{p_{\text{data}}}\left[ L(\theta, \hat{\theta}) \right] $$

for a loss function $L$. The term risk appears again here because decision theory also applies to the choice of estimator.

Furthermore, assume the squared error loss, such that

$$ R(\hat{\theta}) = \mathbb{E}\left[ \left( \theta - \hat{\theta} \right)^2 \right], $$

where $\theta$ is the actual value of the parameter.

Show that

$$ R(\hat{\theta}) = \mathrm{Bias}^2(\hat{\theta}) + \mathbb{V}(\hat{\theta}). $$

Identify the property used in each step of the derivation.
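Before deriving this algebraically, it can help to see the identity numerically. Below is a minimal Python sketch (the setup is an assumption for illustration only: a normal mean estimated with a deliberately biased, shrunken sample mean) that checks the decomposition by Monte Carlo:

    import numpy as np

    # Monte Carlo check that E[(theta - theta_hat)^2] = Bias^2 + Variance.
    # Assumed toy setup: theta = 1.0, data ~ N(theta, 1), and the deliberately
    # biased estimator theta_hat = 0.9 * sample mean (so the bias is nonzero).
    rng = np.random.default_rng(0)
    theta, n, reps = 1.0, 20, 100_000

    estimates = 0.9 * rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)

    risk = np.mean((theta - estimates) ** 2)     # Monte Carlo estimate of the risk
    bias_sq = (estimates.mean() - theta) ** 2    # squared bias
    variance = estimates.var()                   # variance of the estimator
    print(risk, bias_sq + variance)              # the two numbers should agree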

Question 2

The kNN regression algorithm is based on the prediction rule

$$ \hat{f}(x) = \mathrm{Average}\left( y_i \mid i \in N_k(x, \mathcal{D}) \right) = \frac{1}{k} \sum_{i \in N_k(x, \mathcal{D})} y_i, $$

where $\mathcal{D} = \{(y_i, x_i)\}_{i=1}^{n}$ is the training data and $N_k(x, \mathcal{D})$ contains the indexes of the $k$ closest data points to $x$ in $\mathcal{D}$ according to some distance function $\mathrm{dist}(x, x_i)$.
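As a concrete illustration of this prediction rule, here is a minimal Python sketch (Euclidean distance and the toy data are assumptions for illustration, not part of the tutorial):

    import numpy as np

    def knn_predict(x, X_train, y_train, k):
        """kNN regression: average the y-values of the k nearest training points."""
        dists = np.linalg.norm(X_train - x, axis=1)   # dist(x, x_i), Euclidean here
        neighbours = np.argsort(dists)[:k]            # indexes in N_k(x, D)
        return y_train[neighbours].mean()             # (1/k) * sum of the y_i

    # Tiny usage example with made-up data
    X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
    y_train = np.array([0.1, 0.9, 2.1, 2.9])
    print(knn_predict(np.array([1.4]), X_train, y_train, k=2))  # averages y at x=1, x=2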

(a) Why do we say that the kNN algorithm is a nonparametric method?

(b) What is a possible advantage of using a nonparametric method such as kNN over a parametric approach such as linear regression?

(c) Suppose that the DGP is the additive error model

$$ Y_i = \mu(x_i) + \varepsilon_i, \quad i = 1, \dots, n, $$

where each $\varepsilon_i$ is a random error with mean zero and variance $\sigma^2$ that is independent of everything else. Furthermore, assume that the training inputs are fixed (therefore, all randomness comes from the errors).

Define the effective number of parameters (effective degrees of freedom) of a regression estimator as

$$ \mathrm{df}(\hat{f}) = \frac{\sum_{i=1}^{n} \mathrm{Cov}\big(Y_i, \hat{f}(x_i)\big)}{\sigma^2}. $$

The effective number of parameters measures the complexity of a regression model. For linear regression, we can show that $\mathrm{df}(\hat{f})$ is the actual number of parameters, $p + 1$.

Show that the effective number of parameters of $k$-nearest neighbours regression is $n/k$.
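The derivation is the exercise; as a sanity check, the following Python sketch (a 1-D setup with a sine mean function, all assumed for illustration) estimates the covariances by simulation and compares $\mathrm{df}(\hat{f})$ with $n/k$:

    import numpy as np

    # Estimate df = sum_i Cov(Y_i, f_hat(x_i)) / sigma^2 by Monte Carlo for
    # kNN regression with fixed inputs, and compare with n/k.
    rng = np.random.default_rng(1)
    n, k, sigma, reps = 60, 5, 1.0, 20_000
    x = np.linspace(0, 1, n)                      # fixed training inputs
    mu = np.sin(2 * np.pi * x)                    # some smooth mean function

    # Precompute each training point's k nearest neighbours (in 1-D, by |x - x_i|)
    nbrs = np.array([np.argsort(np.abs(x - xi))[:k] for xi in x])

    Y = mu + sigma * rng.normal(size=(reps, n))   # reps draws of the training ys
    fhat = Y[:, nbrs].mean(axis=2)                # kNN fit at each x_i, each rep

    # Cov(Y_i, f_hat(x_i)) across replications, summed over i
    cov = np.mean((Y - Y.mean(0)) * (fhat - fhat.mean(0)), axis=0)
    print(cov.sum() / sigma**2, n / k)            # the two should be close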

(d) Based on the effective number of parameters, how does the model complexity change as a function of $n$ and $k$?

(e) Consider a test case $x_*$. Derive the bias of $\hat{f}(x_*)$ for estimating $\mu(x_*)$. How does it change as a function of $k$?

(f) Derive the variance of the estimator $\hat{f}(x_*)$. How does it change as a function of $k$?

(g) Interpret the last two results together (the simulation sketch below may help check your answers).
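For intuition on parts (e)-(g), here is a minimal simulation sketch (an assumed 1-D setup with a sine mean function; not the requested derivation) that estimates the bias and variance of $\hat{f}(x_*)$ for several values of $k$:

    import numpy as np

    # Monte Carlo bias and variance of the kNN estimator at a test point x*.
    rng = np.random.default_rng(2)
    n, sigma, reps, x_star = 100, 1.0, 10_000, 0.25
    x = np.linspace(0, 1, n)
    mu = np.sin(2 * np.pi * x)
    mu_star = np.sin(2 * np.pi * x_star)          # true value at the test point

    order = np.argsort(np.abs(x - x_star))        # training points sorted by distance to x*
    Y = mu + sigma * rng.normal(size=(reps, n))

    for k in (1, 5, 25, 75):
        fhat_star = Y[:, order[:k]].mean(axis=1)  # f_hat(x*) in each replication
        bias = fhat_star.mean() - mu_star
        var = fhat_star.var()
        print(f"k={k:3d}  bias={bias:+.3f}  var={var:.4f}")
    # At this x*, |bias| grows with k while the variance shrinks like sigma^2 / k.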

Question 3

A few years ago, users found that Google Photos was automatically tagging some people as “gorillas”, generating negative publicity for the company. A link to the story is on the tutorial page.

One way to prevent failures of this type is to allow for a reject option in the classifier. In
this case, the algorithm can decline to provide an answer if it’s not sufficiently confident
in the prediction.

Suppose that the possible labels are $\mathcal{Y} = \{1, \dots, C\}$ and the actions are $\mathcal{A} = \mathcal{Y} \cup \{0\}$, where action 0 represents the reject option. Define the loss function

$$ L(y, a) = \begin{cases} 0 & \text{if } y = a, \\ \ell_r & \text{if } a = 0, \\ \ell_e & \text{if } y \neq a \text{ and } a \neq 0. \end{cases} $$
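To make the decision problem concrete before part (a), here is a minimal Python sketch (the posterior probabilities and cost values are assumed inputs, not given in the tutorial) that selects the action minimizing the posterior expected loss:

    import numpy as np

    def best_action(posterior, loss_reject, loss_error):
        """Pick the action (a class label in 1..C, or 0 = reject) that
        minimizes the posterior expected loss under the loss function above."""
        C = len(posterior)
        # Expected loss of predicting class a is loss_error * P(y != a | x)
        exp_loss = {a: loss_error * (1.0 - posterior[a - 1]) for a in range(1, C + 1)}
        exp_loss[0] = loss_reject     # rejecting costs loss_reject regardless of y
        return min(exp_loss, key=exp_loss.get)

    # Usage: a confident posterior yields a prediction; an uncertain one rejects.
    print(best_action(np.array([0.90, 0.05, 0.05]), loss_reject=0.3, loss_error=1.0))  # -> 1
    print(best_action(np.array([0.40, 0.35, 0.25]), loss_reject=0.3, loss_error=1.0))  # -> 0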

(a) Derive the optimal policy.


(b) Discuss the result.

