
QBUS6810

Statistical Learning and Data Mining


Week 5 Tutorial

Question 1

Let $\hat{\theta}$ be an estimator and $\theta$ the quantity to be estimated. You can think of $\theta$ as a scalar-valued parameter, but the estimand can be any quantity of interest, such as $f(x)$ in a regression model.

(a) What is an estimator (in words)?


(b) What is the mathematical definition of the bias of the estimator $\hat{\theta}$? Interpret the equation.

(c) Define the risk of the estimator as

$$R(\hat{\theta}) = \mathbb{E}_{p_{\text{data}}}\!\left[L(\theta, \hat{\theta})\right]$$

for a loss function $L$. The term risk appears again here because decision theory also applies to the choice of estimator.

Furthermore, assume the squared error loss, such that

$$R(\hat{\theta}) = \mathbb{E}\!\left[\big(\theta - \hat{\theta}\big)^2\right],$$

where $\theta$ is the actual value of the parameter.

Show that

$$R(\hat{\theta}) = \operatorname{Bias}^2(\hat{\theta}) + \mathbb{V}(\hat{\theta}).$$

Identify the property used by each step of the derivation.
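
As a sanity check on the algebra, the decomposition can also be verified numerically. The sketch below is a minimal Monte Carlo illustration (the normal population, the shrunken estimator, and all constants are invented for the example and are not part of the question): it simulates a deliberately biased estimator of a normal mean and compares the simulated risk against the sum of squared bias and variance.

```python
import numpy as np

# Minimal Monte Carlo check of R(theta_hat) = Bias^2(theta_hat) + V(theta_hat).
# Hypothetical setup: estimate the mean theta of a normal population using the
# deliberately biased estimator theta_hat = 0.9 * (sample mean).
rng = np.random.default_rng(0)
theta, sigma, n, reps = 2.0, 1.0, 25, 100_000

samples = rng.normal(theta, sigma, size=(reps, n))
theta_hat = 0.9 * samples.mean(axis=1)            # one estimate per simulated dataset

risk = np.mean((theta - theta_hat) ** 2)          # Monte Carlo estimate of E[(theta - theta_hat)^2]
bias_sq = (theta_hat.mean() - theta) ** 2         # (E[theta_hat] - theta)^2
variance = theta_hat.var()                        # V(theta_hat)

print(risk, bias_sq + variance)                   # the two values should agree closely
```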

Question 2

The kNN regression algorithm is based on the prediction rule


 
$$\hat{f}(x) = \operatorname{Average}\big(y_i \mid i \in N_k(x, \mathcal{D})\big) = \frac{1}{k} \sum_{i \in N_k(x, \mathcal{D})} y_i,$$

where $\mathcal{D} = \{(y_i, x_i)\}_{i=1}^{n}$ is the training data and $N_k(x, \mathcal{D})$ contains the indexes of the $k$ closest data points to $x$ in $\mathcal{D}$ according to some distance function $\operatorname{dist}(x, x_i)$.
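
For concreteness, a minimal Python sketch of this prediction rule is given below (the function name, interface, and example data are illustrative only; they are not part of the tutorial):

```python
import numpy as np

def knn_predict(x, X_train, y_train, k):
    """Predict f_hat(x) as the average response over the k nearest training points."""
    # Euclidean distance is used here; any distance function dist(x, x_i) could be substituted.
    distances = np.linalg.norm(X_train - x, axis=1)
    neighbours = np.argsort(distances)[:k]      # indexes in N_k(x, D)
    return y_train[neighbours].mean()           # (1/k) * sum of the neighbours' y_i

# Tiny illustrative example with made-up data
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 2.0, 2.5, 4.0])
print(knn_predict(np.array([1.2]), X_train, y_train, k=2))   # averages y for x = 1.0 and 2.0
```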

(a) Why do we say that the kNN algorithm is a nonparametric method?

(b) What is a possible advantage of using a nonparametric method such as kNN over a parametric approach such as linear regression?

(c) Suppose that the DGP is the additive error model

$$Y_i = \mu(x_i) + \varepsilon_i, \qquad i = 1, \ldots, n,$$

where each $\varepsilon_i$ is a random error with mean zero and variance $\sigma^2$ that is independent of everything else. Furthermore, assume that the training inputs are fixed (therefore, all randomness comes from the errors).

Define the effective number of parameters (effective degrees of freedom) of a regression estimator as

$$\operatorname{df}(\hat{f}) = \frac{\sum_{i=1}^{n} \operatorname{Cov}\big(Y_i, \hat{f}(x_i)\big)}{\sigma^2}.$$

The effective number of parameters measures the complexity of a regression model. For a linear regression, we can show that $\operatorname{df}(\hat{f})$ is the actual number of parameters, $p + 1$.

Show that the effective number of parameters of $k$-nearest neighbours regression is $n/k$.
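
Before doing the derivation, a simulation can be used to check that the claim is plausible. The sketch below is illustrative only (the fixed design, the regression function $\mu$, and the values of $n$, $k$, and $\sigma$ are all made up): it repeatedly redraws the training responses under the fixed-design additive error model, estimates $\sum_i \operatorname{Cov}(Y_i, \hat{f}(x_i))/\sigma^2$ across the replications, and compares it with $n/k$.

```python
import numpy as np

# Illustrative simulation of the effective degrees of freedom of kNN regression
# under the fixed-design additive error model (all values below are made up).
rng = np.random.default_rng(1)
n, k, sigma, reps = 100, 5, 1.0, 5_000

x = np.linspace(0, 1, n)                     # fixed training inputs
mu = np.sin(2 * np.pi * x)                   # arbitrary true regression function

# Precompute each training point's k nearest neighbours (including itself).
neighbours = np.array([np.argsort(np.abs(x - xi))[:k] for xi in x])

Y = mu + rng.normal(0.0, sigma, size=(reps, n))     # repeated draws of the training responses
f_hat = Y[:, neighbours].mean(axis=2)                # kNN fit evaluated at each training point

# Cov(Y_i, f_hat(x_i)) estimated across replications, then summed over i.
cov_sum = np.mean((Y - Y.mean(axis=0)) * (f_hat - f_hat.mean(axis=0)), axis=0).sum()
print(cov_sum / sigma**2, n / k)             # both values should be close to n/k = 20
```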

(d) Based on the effective number of parameters, how does the model complexity
change as a function of n and k?

(e) Consider a test case $x^*$. Derive the bias of $\hat{f}(x^*)$ for estimating $\mu(x^*)$. How does it change as a function of $k$?

(f) Derive the variance of the estimator $\hat{f}(x^*)$. How does it change as a function of $k$?

(g) Interpret the last two results together.

Question 3

A few years ago, users found that Google Photos was automatically tagging some people as “gorillas”, generating negative publicity for the company. A link to the story is on the tutorial page.

One way to prevent failures of this type is to allow for a reject option in the classifier. In
this case, the algorithm can decline to provide an answer if it’s not sufficiently confident
in the prediction.
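
As an illustration of how such a reject option can be implemented, the sketch below (a made-up example; the fixed confidence threshold and function name are hypothetical, and Question 3(a) asks how the optimal cut-off follows from the losses) abstains whenever the largest posterior class probability falls below a threshold:

```python
import numpy as np

def predict_with_reject(posterior, threshold=0.9):
    """Return the most probable class label (1, ..., C), or 0 (reject) when the
    classifier is not sufficiently confident in its prediction."""
    best = int(np.argmax(posterior)) + 1          # labels are 1, ..., C
    return best if posterior.max() >= threshold else 0

print(predict_with_reject(np.array([0.55, 0.30, 0.15])))   # low confidence -> 0 (reject)
print(predict_with_reject(np.array([0.95, 0.03, 0.02])))   # high confidence -> class 1
```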

Suppose that the possible labels are $\mathcal{Y} = \{1, \ldots, C\}$ and the actions are $\mathcal{A} = \mathcal{Y} \cup \{0\}$, where action 0 represents the reject option. Define the loss function

$$L(y, a) = \begin{cases} 0 & \text{if } y = a \\ \ell_r & \text{if } a = 0 \\ \ell_e & \text{if } y \neq a \text{ and } a \neq 0 \end{cases}.$$

(a) Derive the optimal policy.


(b) Discuss the result.
