HW#4
Local Smoothing. The goal of this homework is to help you better understand the statistical properties
and computational challenges of local smoothing methods such as loess, Nadaraya-Watson (NW) kernel
smoothing, and spline smoothing.
For this purpose, we will compute the empirical bias, empirical variance, and empirical mean square error
(MSE) based on m = 1000 Monte Carlo runs, where in each run we simulate a data set of n = 101
observations from the additive noise model
$$Y_i = f(x_i) + \epsilon_i, \qquad (1)$$
where $\epsilon_1, \ldots, \epsilon_n$ are independent and identically distributed (iid) $N(0, 0.2^2)$ and $f$ is the Mexican hat function
$$f(x) = (1 - x^2)\, e^{-x^2/2}. \qquad (2)$$
This function is known to pose a variety of estimation challenges, and below we explore the difficulties inherent in this function.
(1) Let us first consider the (deterministic fixed) design with equi-distant points in [−2π, 2π].
(a) For each of m = 1000 Monte Carlo runs, simulate or generate a data set of the form $(x_i, Y_i)$ with
$x_i = 2\pi\left(-1 + 2\,\frac{i-1}{n-1}\right)$, where $Y_i$ is from the model in (1). Denote such a data set as $D_j$ at the j-th
Monte Carlo run for $j = 1, \ldots, m = 1000$.
(b) For each data set $D_j$ or each Monte Carlo run, compute the three different kinds of local smoothing
estimates at every point in $D_j$: loess (with span = 0.75), Nadaraya-Watson (NW) kernel
smoothing with Gaussian kernel and bandwidth = 0.2, and spline smoothing with the default
tuning parameter.
(c) At each point $x_i$, for each local smoothing method, based on m = 1000 Monte Carlo runs, compute
the empirical bias, empirical variance, and empirical mean square error (MSE), which are defined
as
$$\widehat{\mathrm{Bias}}\{f(x_i)\} = \bar{f}_m(x_i) - f(x_i), \quad \text{with } \bar{f}_m(x_i) = \frac{1}{m} \sum_{j=1}^{m} \hat{f}^{(j)}(x_i),$$
$$\widehat{\mathrm{Var}}\{f(x_i)\} = \frac{1}{m} \sum_{j=1}^{m} \left( \hat{f}^{(j)}(x_i) - \bar{f}_m(x_i) \right)^2,$$
$$\widehat{\mathrm{MSE}}\{f(x_i)\} = \frac{1}{m} \sum_{j=1}^{m} \left( \hat{f}^{(j)}(x_i) - f(x_i) \right)^2,$$
where $\hat{f}^{(j)}(x_i)$ denotes the estimate at $x_i$ from the j-th Monte Carlo run. Here we use the
true function value $f(x_i)$ in (2) in the definitions of the empirical bias and empirical MSE, which
helps us better understand the performance of the different local smoothing methods when
estimating the true function $f$. Moreover, we purposely use the coefficient $\frac{1}{m}$ (instead of the
standard coefficient $\frac{1}{m-1}$) in the definition of the empirical variance, so that the well-known relation
MSE = Bias$^2$ + Var is applicable to the empirical version.
(d) Plot these quantities against $x_i$ for all three kinds of local smoothing estimators: loess, NW kernel,
and spline smoothing (a sample R sketch for parts (a)-(d) is given after this list).
(e) Provide a thorough analysis of what the plots suggest, e.g., which method is better/worse on bias,
variance, and MSE? Do you think it is a fair comparison between these three methods?
Why or why not?
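To make parts (a)-(d) concrete, below is a minimal R sketch for the equidistant design; this is one possible implementation rather than the required one. The names f.true, fvlp, fvnw, and fvss are illustrative choices (the last three match the appendix code), and the bias/variance/MSE lines implement the definitions in (c) for loess.

n <- 101; m <- 1000
x <- 2*pi*(-1 + 2*((1:n) - 1)/(n - 1))        ## equi-distant design on [-2*pi, 2*pi]
f.true <- (1 - x^2)*exp(-0.5*x^2)             ## Mexican hat function in eq. (2)
fvlp <- fvnw <- fvss <- matrix(0, nrow = n, ncol = m)
set.seed(79)                                  ## any fixed seed, for reproducibility
for (j in 1:m){
  y <- f.true + rnorm(n, sd = 0.2)            ## simulate Y values from model (1)
  fvlp[, j] <- predict(loess(y ~ x, span = 0.75), newdata = x)
  fvnw[, j] <- ksmooth(x, y, kernel = "normal", bandwidth = 0.2, x.points = x)$y
  fvss[, j] <- predict(smooth.spline(x, y), x = x)$y   ## default tuning parameter
}
meanlp <- apply(fvlp, 1, mean)                ## Monte Carlo mean of the loess fits
biaslp <- meanlp - f.true                     ## empirical bias at each x_i
varlp  <- apply((fvlp - meanlp)^2, 1, mean)   ## empirical variance (coefficient 1/m)
mselp  <- apply((fvlp - f.true)^2, 1, mean)   ## empirical MSE (against the true f)
plot(x, biaslp, type = "l", xlab = "x", ylab = "empirical bias (loess)")

The same four lines, with fvnw or fvss in place of fvlp, give the corresponding quantities for the NW kernel and spline smoothing estimators.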
(2) Repeat part (1) with another (deterministic) design that has non-equidistant points in the interval
[−2π, 2π]. The following R code was used to generate the design points $x_i$'s on my laptop, denoted by x2
below (you can keep these $x_i$'s fixed in the m = 1000 Monte Carlo runs):
set.seed(79)
x2 <- round(2*pi*sort(c(0.5, -1 + rbeta(50,2,2), rbeta(50,2,2))), 8)
For those students who use Python or other software, the values of x2 are available in the dataset
“HW04part2.x.csv”.
Here, for simplicity and a reasonable comparison, when estimating and predicting by the local smoothing
methods, please use span = 0.3365 for loess, bandwidth = 0.2 for NW kernel smoothing, and spar =
0.7163 for spline smoothing.
Discuss the statistical challenges and the computational challenges when using these three local smoothing
methods to estimate the Mexican hat function under this non-equidistant design, including the
suitable choices of tuning parameters (one illustrative data-driven approach is sketched below).
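When discussing tuning-parameter choice, one illustrative data-driven option (among many) is leave-one-out cross-validation; the sketch below applies it to the loess span on a single simulated data set. The helper name cv.span and the candidate grid are hypothetical choices made only for this illustration.

cv.span <- function(span, x, y){
  ## leave-one-out CV error for a given loess span;
  ## surface = "direct" lets loess predict at the left-out endpoints
  errs <- sapply(seq_along(x), function(i){
    fit <- loess(yy ~ xx, data = data.frame(xx = x[-i], yy = y[-i]),
                 span = span, control = loess.control(surface = "direct"))
    (y[i] - predict(fit, newdata = data.frame(xx = x[i])))^2
  })
  mean(errs, na.rm = TRUE)
}
y <- (1 - x2^2)*exp(-0.5*x2^2) + rnorm(length(x2), sd = 0.2)
spans <- seq(0.15, 0.75, by = 0.05)           ## hypothetical candidate grid
cv.errs <- sapply(spans, cv.span, x = x2, y = y)
spans[which.min(cv.errs)]                     ## CV-selected span

An analogous grid search over the bandwidth for ksmooth, or the built-in cross-validation option cv = TRUE in smooth.spline, would be natural counterparts for the other two methods.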
Appendix: below is some sample R code that may be useful for this homework, and please note that you
might need to modify/revise this code!
##Generate data, fit the data and store the fitted values
## (this sketch is for part (2); for part (1), use the equidistant x,
##  span = 0.75 for loess, and the default spline tuning parameter)
fvlp <- fvnw <- fvss <- matrix(0, nrow = length(x2), ncol = m);
for (j in 1:m){
  ## within each loop, first simulate the Y values from eq. (1),
  ## where f(x) is the Mexican hat function in eq. (2)
  y <- (1-x2^2) * exp(-0.5 * x2^2) + rnorm(length(x2), sd=0.2);
  ## then compute the three local smoothing estimates at every design point
  fvlp[,j] <- predict(loess(y ~ x2, span = 0.3365), newdata = x2);
  fvnw[,j] <- ksmooth(x2, y, kernel="normal", bandwidth= 0.2, x.points=x2)$y;
  fvss[,j] <- predict(smooth.spline(y ~ x2, spar= 0.7163), x=x2)$y;
}
## Below is the sample R code to compute the mean of the three estimators
meanlp = apply(fvlp,1,mean);
meannw = apply(fvnw,1,mean);
meanss = apply(fvss,1,mean);
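A possible continuation (assuming the variable names from the loop above) overlays the three Monte Carlo means on the true curve in a single plot; f2 is an assumed name introduced here for the true function values at x2.

f2 <- (1 - x2^2)*exp(-0.5*x2^2)               ## true function values at x2
matplot(x2, cbind(meanlp, meannw, meanss), type = "l", lty = 1:3, col = 2:4,
        xlab = "x", ylab = "mean of fitted values")
lines(x2, f2, lwd = 2)                        ## true Mexican hat curve
legend("topright", c("loess", "NW kernel", "smoothing spline", "true f"),
       lty = c(1:3, 1), col = c(2:4, 1), lwd = c(1, 1, 1, 2))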