
QBUS6810

Statistical Learning and Data Mining


Week 5 Tutorial

Question 1

Let $\hat{\theta}$ be an estimator and $\theta$ the quantity to be estimated. You can think of $\theta$ as a scalar-valued parameter, but the estimand can be any quantity of interest, such as $f(x)$ in a regression model.

(a) What is an estimator (in words)?


(b) What is the mathematical definition of the bias of the estimator $\hat{\theta}$? Interpret the equation.

(c) Define the risk of the estimator as

$$ R(\hat{\theta}) = \mathbb{E}_{p_{\text{data}}}\left[ L(\theta, \hat{\theta}) \right] $$

for a loss function $L$. The term risk appears again here because decision theory also applies to the choice of estimator.

Furthermore, assume the squared error loss, such that

$$ R(\hat{\theta}) = \mathbb{E}\left[ \left( \theta - \hat{\theta} \right)^2 \right], $$

where $\theta$ is the actual value of the parameter.

Show that

$$ R(\hat{\theta}) = \mathrm{Bias}^2(\hat{\theta}) + \mathbb{V}(\hat{\theta}). $$

Identify the property used in each step of the derivation.
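Before deriving this algebraically, it can help to see the identity numerically. Below is a minimal Python sketch (the setup is an assumption for illustration only: a normal mean estimated with a deliberately biased, shrunken sample mean) that checks the decomposition by Monte Carlo:

    import numpy as np

    # Monte Carlo check that E[(theta - theta_hat)^2] = Bias^2 + Variance.
    # Assumed toy setup: theta = 1.0, data ~ N(theta, 1), and the deliberately
    # biased estimator theta_hat = 0.9 * sample mean (so the bias is nonzero).
    rng = np.random.default_rng(0)
    theta, n, reps = 1.0, 20, 100_000

    estimates = 0.9 * rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)

    risk = np.mean((theta - estimates) ** 2)     # Monte Carlo estimate of the risk
    bias_sq = (estimates.mean() - theta) ** 2    # squared bias
    variance = estimates.var()                   # variance of the estimator
    print(risk, bias_sq + variance)              # the two numbers should agree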

Question 2

The kNN regression algorithm is based on the prediction rule

$$ \hat{f}(x) = \mathrm{Average}\left( y_i \mid i \in N_k(x, \mathcal{D}) \right) = \frac{1}{k} \sum_{i \in N_k(x, \mathcal{D})} y_i, $$

where $\mathcal{D} = \{(y_i, x_i)\}_{i=1}^{n}$ is the training data and $N_k(x, \mathcal{D})$ contains the indexes of the $k$ closest data points to $x$ in $\mathcal{D}$ according to some distance function $\mathrm{dist}(x, x_i)$.
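As a concrete illustration of this prediction rule, here is a minimal Python sketch (Euclidean distance and the toy data are assumptions for illustration, not part of the tutorial):

    import numpy as np

    def knn_predict(x, X_train, y_train, k):
        """kNN regression: average the y-values of the k nearest training points."""
        dists = np.linalg.norm(X_train - x, axis=1)   # dist(x, x_i), Euclidean here
        neighbours = np.argsort(dists)[:k]            # indexes in N_k(x, D)
        return y_train[neighbours].mean()             # (1/k) * sum of the y_i

    # Tiny usage example with made-up data
    X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
    y_train = np.array([0.1, 0.9, 2.1, 2.9])
    print(knn_predict(np.array([1.4]), X_train, y_train, k=2))  # averages y at x=1, x=2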

(a) Why do we say that the kNN algorithm is a nonparametric method?

(b) What is a possible advantage of using a nonparametric method such as kNN over a parametric approach such as linear regression?

(c) Suppose that the DGP is the additive error model

$$ Y_i = \mu(x_i) + \varepsilon_i, \quad i = 1, \dots, n, $$

where each $\varepsilon_i$ is a random error with mean zero and variance $\sigma^2$ that is independent of everything else. Furthermore, assume that the training inputs are fixed (therefore, all randomness comes from the errors).

Define the effective number of parameters (effective degrees of freedom) of a regression estimator as

$$ \mathrm{df}(\hat{f}) = \frac{\sum_{i=1}^{n} \mathrm{Cov}\big(Y_i, \hat{f}(x_i)\big)}{\sigma^2}. $$

The effective number of parameters measures the complexity of a regression model. For linear regression, we can show that $\mathrm{df}(\hat{f})$ is the actual number of parameters, $p + 1$.

Show that the effective number of parameters of $k$-nearest neighbours regression is $n/k$.
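The derivation is the exercise; as a sanity check, the following Python sketch (a 1-D setup with a sine mean function, all assumed for illustration) estimates the covariances by simulation and compares $\mathrm{df}(\hat{f})$ with $n/k$:

    import numpy as np

    # Estimate df = sum_i Cov(Y_i, f_hat(x_i)) / sigma^2 by Monte Carlo for
    # kNN regression with fixed inputs, and compare with n/k.
    rng = np.random.default_rng(1)
    n, k, sigma, reps = 60, 5, 1.0, 20_000
    x = np.linspace(0, 1, n)                      # fixed training inputs
    mu = np.sin(2 * np.pi * x)                    # some smooth mean function

    # Precompute each training point's k nearest neighbours (in 1-D, by |x - x_i|)
    nbrs = np.array([np.argsort(np.abs(x - xi))[:k] for xi in x])

    Y = mu + sigma * rng.normal(size=(reps, n))   # reps draws of the training ys
    fhat = Y[:, nbrs].mean(axis=2)                # kNN fit at each x_i, each rep

    # Cov(Y_i, f_hat(x_i)) across replications, summed over i
    cov = np.mean((Y - Y.mean(0)) * (fhat - fhat.mean(0)), axis=0)
    print(cov.sum() / sigma**2, n / k)            # the two should be close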

(d) Based on the effective number of parameters, how does the model complexity change as a function of $n$ and $k$?

(e) Consider a test case $x_*$. Derive the bias of $\hat{f}(x_*)$ for estimating $\mu(x_*)$. How does it change as a function of $k$?

(f) Derive the variance of the estimator $\hat{f}(x_*)$. How does it change as a function of $k$?

(g) Interpret the last two results together (the simulation sketch below may help check your answers).
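For intuition on parts (e)-(g), here is a minimal simulation sketch (an assumed 1-D setup with a sine mean function; not the requested derivation) that estimates the bias and variance of $\hat{f}(x_*)$ for several values of $k$:

    import numpy as np

    # Monte Carlo bias and variance of the kNN estimator at a test point x*.
    rng = np.random.default_rng(2)
    n, sigma, reps, x_star = 100, 1.0, 10_000, 0.25
    x = np.linspace(0, 1, n)
    mu = np.sin(2 * np.pi * x)
    mu_star = np.sin(2 * np.pi * x_star)          # true value at the test point

    order = np.argsort(np.abs(x - x_star))        # training points sorted by distance to x*
    Y = mu + sigma * rng.normal(size=(reps, n))

    for k in (1, 5, 25, 75):
        fhat_star = Y[:, order[:k]].mean(axis=1)  # f_hat(x*) in each replication
        bias = fhat_star.mean() - mu_star
        var = fhat_star.var()
        print(f"k={k:3d}  bias={bias:+.3f}  var={var:.4f}")
    # At this x*, |bias| grows with k while the variance shrinks like sigma^2 / k.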

Question 3

A few years ago, users found that Google Photos was automatically tagging some people as “gorillas”, generating negative publicity for the company. A link to the story is on the tutorial page.

One way to prevent failures of this type is to allow for a reject option in the classifier. In
this case, the algorithm can decline to provide an answer if it’s not sufficiently confident
in the prediction.

Suppose that the possible labels are $\mathcal{Y} = \{1, \dots, C\}$ and the actions are $\mathcal{A} = \mathcal{Y} \cup \{0\}$, where action 0 represents the reject option. Define the loss function

$$ L(y, a) = \begin{cases} 0 & \text{if } y = a, \\ \ell_r & \text{if } a = 0, \\ \ell_e & \text{if } y \neq a \text{ and } a \neq 0. \end{cases} $$
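To make the decision problem concrete before part (a), here is a minimal Python sketch (the posterior probabilities and cost values are assumed inputs, not given in the tutorial) that selects the action minimizing the posterior expected loss:

    import numpy as np

    def best_action(posterior, loss_reject, loss_error):
        """Pick the action (a class label in 1..C, or 0 = reject) that
        minimizes the posterior expected loss under the loss function above."""
        C = len(posterior)
        # Expected loss of predicting class a is loss_error * P(y != a | x)
        exp_loss = {a: loss_error * (1.0 - posterior[a - 1]) for a in range(1, C + 1)}
        exp_loss[0] = loss_reject     # rejecting costs loss_reject regardless of y
        return min(exp_loss, key=exp_loss.get)

    # Usage: a confident posterior yields a prediction; an uncertain one rejects.
    print(best_action(np.array([0.90, 0.05, 0.05]), loss_reject=0.3, loss_error=1.0))  # -> 1
    print(best_action(np.array([0.40, 0.35, 0.25]), loss_reject=0.3, loss_error=1.0))  # -> 0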

(a) Derive the optimal policy.


(b) Discuss the result.

