Nonparametric regression
9.1 Introduction
Let $(X_1, Y_1), \cdots, (X_n, Y_n)$ be a bivariate random sample. In regression analysis, we are often interested in the regression function
$$m(x) = E(Y \mid X = x).$$
Sometimes, we will write
$$Y_i = m(X_i) + \epsilon_i,$$
where $\epsilon_i$ is mean-zero noise. The simple linear regression model assumes that $m(x) = \beta_0 + \beta_1 x$, where $\beta_0$ and $\beta_1$ are the intercept and slope parameters. In this lecture, we will talk about methods that directly estimate the regression function $m(x)$ without imposing any parametric form on $m(x)$.
We start with a very simple but extremely popular method called the regressogram, often referred to as the binning approach. You can view it as the regression analogue of the histogram.
For simplicity, we assume that the covariates $X_i$'s are from a distribution over $[0, 1]$.

Similar to the histogram, we first choose $M$, the number of bins. Then we partition the interval $[0, 1]$ into $M$ equal-width bins:
$$B_1 = \left[0, \tfrac{1}{M}\right),\quad B_2 = \left[\tfrac{1}{M}, \tfrac{2}{M}\right),\quad \cdots,\quad B_{M-1} = \left[\tfrac{M-2}{M}, \tfrac{M-1}{M}\right),\quad B_M = \left[\tfrac{M-1}{M}, 1\right].$$
When $x \in B_\ell$, we estimate $m(x)$ by
$$\widehat{m}_M(x) = \frac{\sum_{i=1}^n Y_i\, I(X_i \in B_\ell)}{\sum_{i=1}^n I(X_i \in B_\ell)} = \text{average of the responses whose covariates are in the same bin as } x.$$
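As an illustration of the binning idea, here is a minimal Python sketch of the regressogram; the function name and the simulated data are illustrative, not from the lecture:

import numpy as np

def regressogram(x0, X, Y, M):
    # Regressogram estimate of m(x0) with M equal-width bins on [0, 1].
    bin_idx = min(int(x0 * M), M - 1)                      # bin containing x0
    in_bin = np.minimum((X * M).astype(int), M - 1) == bin_idx
    if not in_bin.any():
        return np.nan                                      # empty bin: estimate undefined
    return Y[in_bin].mean()                                # average response in that bin

# Example usage on simulated data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=200)
Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=200)
print(regressogram(0.25, X, Y, M=10))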
Given a point $x_0$, assume that we are interested in the value $m(x_0)$. Here is a simple method to estimate that value. When $m$ is smooth, an observation $X_i \approx x_0$ implies $m(X_i) \approx m(x_0)$. Thus, the response value $Y_i = m(X_i) + \epsilon_i \approx m(x_0) + \epsilon_i$. Using this observation, to reduce the noise $\epsilon_i$, we can use the sample average. Thus, an estimator of $m(x_0)$ is the average of those responses whose covariates are close to $x_0$.
To make it more concrete, let $h > 0$ be a threshold. The above procedure suggests using
$$\widehat{m}_{\mathrm{loc}}(x_0) = \frac{\sum_{i:|X_i - x_0| \le h} Y_i}{n_h(x_0)} = \frac{\sum_{i=1}^n Y_i\, I(|X_i - x_0| \le h)}{\sum_{i=1}^n I(|X_i - x_0| \le h)}, \tag{9.1}$$
where $n_h(x_0)$ is the number of observations whose covariate satisfies $|X_i - x_0| \le h$. This estimator, $\widehat{m}_{\mathrm{loc}}$, is called the local average estimator. Indeed, to estimate $m(x)$ at any given point $x$, we are using a local average as an estimator.
The local average estimator can be rewritten as
$$\widehat{m}_{\mathrm{loc}}(x_0) = \frac{\sum_{i=1}^n Y_i\, I(|X_i - x_0| \le h)}{\sum_{i=1}^n I(|X_i - x_0| \le h)} = \sum_{i=1}^n \frac{I(|X_i - x_0| \le h)}{\sum_{\ell=1}^n I(|X_\ell - x_0| \le h)}\cdot Y_i = \sum_{i=1}^n W_i(x_0)\, Y_i, \tag{9.2}$$
where
$$W_i(x_0) = \frac{I(|X_i - x_0| \le h)}{\sum_{\ell=1}^n I(|X_\ell - x_0| \le h)} \tag{9.3}$$
is a weight for each observation. Note that $\sum_{i=1}^n W_i(x_0) = 1$ and $W_i(x_0) \ge 0$ for all $i = 1, \cdots, n$; this implies that the $W_i(x_0)$'s are indeed weights. Equation (9.2) shows that the local average estimator can be written as a weighted average estimator, so the $i$-th weight $W_i(x_0)$ determines the contribution of the response $Y_i$ to the estimator $\widehat{m}_{\mathrm{loc}}(x_0)$.
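As a quick illustration of the weighted-average form (9.2)-(9.3), here is a minimal Python sketch (the function names are mine, not from the lecture):

import numpy as np

def local_average_weights(x0, X, h):
    # Hard-threshold weights W_i(x0) of equation (9.3); assumes at least one X_i within h of x0.
    indicator = (np.abs(X - x0) <= h).astype(float)
    return indicator / indicator.sum()

def local_average(x0, X, Y, h):
    # Local average estimator of equation (9.2): a weighted average of the responses.
    return np.sum(local_average_weights(x0, X, h) * Y)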
In constructing the local average estimator, we are placing a hard threshold on the neighboring points: those within a distance $h$ are given equal weight, while those outside the threshold $h$ are ignored completely. This hard thresholding leads to an estimator that is not continuous.
To avoid this problem, we consider another construction of the weights. Ideally, we want to give more weight to those observations that are close to $x_0$, and we want the weights to be 'smooth'. The Gaussian function $G(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ seems to be a good candidate. We now use the Gaussian function to construct an estimator. We first construct the weight
$$W_i(x_0) = \frac{G\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n G\!\left(\frac{x_0 - X_\ell}{h}\right)}.$$
The quantity $h > 0$ plays a role similar to the threshold in the local average, but now it acts as the smoothing bandwidth of the Gaussian. After constructing the weights, our new estimator is
$$\widehat{m}_G(x_0) = \sum_{i=1}^n W_i(x_0)\, Y_i = \sum_{i=1}^n \frac{G\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n G\!\left(\frac{x_0 - X_\ell}{h}\right)}\, Y_i = \frac{\sum_{i=1}^n Y_i\, G\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n G\!\left(\frac{x_0 - X_\ell}{h}\right)}. \tag{9.4}$$
This new estimator has weights that change more smoothly than those of the local average, and the resulting estimate is smooth, as desired.
Comparing equations (9.1) and (9.4), one may notice that these local estimators are all of a similar form:
$$\widehat{m}_h(x_0) = \frac{\sum_{i=1}^n Y_i\, K\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n K\!\left(\frac{x_0 - X_\ell}{h}\right)} = \sum_{i=1}^n W^K_i(x_0)\, Y_i, \qquad W^K_i(x_0) = \frac{K\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n K\!\left(\frac{x_0 - X_\ell}{h}\right)}, \tag{9.5}$$
where $K$ is some function. When $K$ is a Gaussian, we obtain the estimator (9.4); when $K$ is uniform over $[-1, 1]$, we obtain the local average (9.1). The estimator in equation (9.5) is called the kernel regression estimator or the Nadaraya-Watson estimator$^1$. The function $K$ plays a similar role to the kernel function in the KDE and thus is also called the kernel function. The quantity $h > 0$ is similar to the smoothing bandwidth in the KDE, so it is also called the smoothing bandwidth.
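As an illustration, here is a minimal Python sketch of the kernel regression estimator (9.5); the function name and the vectorized form are my own choices, not from the lecture:

import numpy as np

def nadaraya_watson(x0, X, Y, h, kernel="gaussian"):
    # Kernel regression (Nadaraya-Watson) estimate of m(x0), equation (9.5).
    u = (x0 - X) / h
    if kernel == "gaussian":
        K = np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)   # Gaussian kernel G, giving (9.4)
    else:
        K = (np.abs(u) <= 1).astype(float)           # uniform kernel, giving the local average (9.1)
    return np.sum(K * Y) / np.sum(K)                 # weighted average of the responses

# Example usage: evaluate the estimate on a grid.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=300)
Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=300)
grid = np.linspace(0.05, 0.95, 19)
m_hat = np.array([nadaraya_watson(x, X, Y, h=0.1) for x in grid])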
9.3.1 Theory
Bias. The bias of the kernel regression estimator is
$$\mathrm{bias}(\widehat{m}_h(x)) = \frac{h^2}{2}\,\mu_K\left(m''(x) + 2\,\frac{m'(x)\,p'(x)}{p(x)}\right) + o(h^2),$$
where $p(x)$ is the probability density function of the covariates $X_1, \cdots, X_n$ and $\mu_K = \int x^2 K(x)\,dx$ is the same constant of the kernel function as in the KDE.

The bias has two components: a curvature component $m''(x)$ and a design component $\frac{m'(x)\,p'(x)}{p(x)}$. The curvature component is similar to the one in the KDE: when the regression function curves a lot, kernel smoothing will smooth out the structure, introducing some bias. The second component, also known as the design bias, is a new component compared to the bias in the KDE. This component depends on the density of the covariates, $p(x)$. Note that in some studies we can choose the values of the covariates, so the density $p(x)$ is also called the design (this is why this component is known as the design bias).
Variance. The variance of the estimator is
$$\mathrm{Var}(\widehat{m}_h(x)) = \frac{\sigma^2\cdot\sigma_K^2}{p(x)}\cdot\frac{1}{nh} + o\!\left(\frac{1}{nh}\right),$$
where $\sigma^2 = \mathrm{Var}(\epsilon_i)$ is the noise level and $\sigma_K^2 = \int K^2(x)\,dx$ is a constant of the kernel function (the same as in the KDE). This expression tells us the possible sources of variance. First, the variance increases when $\sigma^2$ increases. This makes perfect sense because $\sigma^2$ is the noise level; when the noise level is large, we expect the estimation error to increase. Second, the density of the covariates $p(x)$ is inversely related to the variance. This is also very reasonable because when $p(x)$ is large, there tend to be more data points around $x$, increasing the size of the sample that we are averaging over. Last, the convergence rate is $O\!\left(\frac{1}{nh}\right)$, which is the same as the KDE.

$^1$ https://en.wikipedia.org/wiki/Kernel_regression
$^2$ If you are interested in the derivation, check http://www.ssc.wisc.edu/~bhansen/718/NonParametrics2.pdf and http://www.maths.manchester.ac.uk/~peterf/MATH38011/NPR%20N-W%20Estimator.pdf
MSE and MISE. Using the expression of the bias and variance, the MSE at point $x$ is
$$\mathrm{MSE}(\widehat{m}_h(x)) = \frac{h^4}{4}\,\mu_K^2\left(m''(x) + 2\,\frac{m'(x)\,p'(x)}{p(x)}\right)^2 + \frac{\sigma^2\cdot\sigma_K^2}{p(x)}\cdot\frac{1}{nh} + o(h^4) + o\!\left(\frac{1}{nh}\right). \tag{9.6}$$
Optimizing the major components in equation (9.6) (the AMISE), we obtain the optimal value of the smoothing bandwidth
$$h_{\mathrm{opt}} = C^* \cdot n^{-1/5},$$
where $C^*$ is a constant depending on $p$ and $K$.
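To see where the $n^{-1/5}$ rate comes from, here is a short sketch of the optimization (a standard argument; $C_1$ and $C_2$ are shorthand I introduce for the constants collected from the squared-bias and variance terms):
$$\mathrm{AMISE}(h) = C_1 h^4 + \frac{C_2}{nh}, \qquad \frac{d}{dh}\mathrm{AMISE}(h) = 4 C_1 h^3 - \frac{C_2}{n h^2} = 0 \;\Longrightarrow\; h^5 = \frac{C_2}{4 C_1 n} \;\Longrightarrow\; h_{\mathrm{opt}} = \left(\frac{C_2}{4 C_1}\right)^{1/5} n^{-1/5}.$$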
Similarly, we can estimate the MSE as we did in Lectures 5 and 6. However, when using the bootstrap to estimate the uncertainty, one has to be very careful: when $h$ is either too small or too large, the bootstrap estimate may fail to converge to its target.

When we choose $h = O(n^{-1/5})$, the bootstrap estimate of the variance is consistent, but the bootstrap estimate of the MSE might not be. The main reason is that it is easier for the bootstrap to estimate the variance than the bias. When we choose $h$ in this way, both the bias and the variance contribute substantially to the MSE, so we cannot ignore the bias. However, in this case the bootstrap cannot estimate the bias consistently, so the estimate of the MSE is not consistent.
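As an illustration of the variance part, here is a minimal Python sketch of the empirical (pairs) bootstrap for estimating $\mathrm{Var}(\widehat{m}_h(x_0))$; this is one common bootstrap scheme, not necessarily the one used in the lectures, and nadaraya_watson refers to the sketch given earlier:

import numpy as np

def bootstrap_variance(x0, X, Y, h, B=500, seed=0):
    # Empirical (pairs) bootstrap estimate of the variance of m_hat_h(x0).
    rng = np.random.default_rng(seed)
    n = len(X)
    estimates = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)        # resample (X_i, Y_i) pairs with replacement
        estimates[b] = nadaraya_watson(x0, X[idx], Y[idx], h)
    return estimates.var(ddof=1)                # sample variance over bootstrap replicates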
Confidence interval. To construct a confidence interval for $m(x)$, we will use the following property of the kernel regression:
$$\sqrt{nh}\,\bigl(\widehat{m}_h(x) - E(\widehat{m}_h(x))\bigr) \overset{D}{\to} N\!\left(0, \frac{\sigma^2\cdot\sigma_K^2}{p(x)}\right),$$
or equivalently,
$$\frac{\widehat{m}_h(x) - E(\widehat{m}_h(x))}{\sqrt{\mathrm{Var}(\widehat{m}_h(x))}} \overset{D}{\to} N(0, 1).$$
In practice, $\sigma^2$ is unknown, so we estimate it by
$$\widehat{\sigma}^2 = \frac{1}{n - 2\nu + \widetilde{\nu}}\sum_{i=1}^n e_i^2, \tag{9.7}$$
where $e_i = Y_i - \widehat{m}_h(X_i)$ is the $i$-th residual and $\nu, \widetilde{\nu}$ are quantities acting as degrees of freedom, which we will explain later. Thus, a $1-\alpha$ CI can be constructed using
$$\widehat{m}_h(x) \pm z_{1-\alpha/2}\,\frac{\widehat{\sigma}\cdot\sigma_K}{\sqrt{nh\,\widehat{p}_n(x)}},$$
where $\widehat{p}_n(x)$ is the KDE of the covariates.
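Putting these pieces together, here is a minimal Python sketch of the plug-in confidence interval; it takes an estimate sigma_hat of $\sigma$ as input (see equation (9.9) below), uses the Gaussian-kernel constant $\sigma_K^2 = \int K^2(x)\,dx = 1/(2\sqrt{\pi})$, and calls the nadaraya_watson sketch from earlier:

import numpy as np
from scipy.stats import norm

def kernel_ci(x0, X, Y, h, sigma_hat, alpha=0.05):
    # Plug-in (1 - alpha) confidence interval for m(x0) based on the normal limit.
    sigma_K2 = 1 / (2 * np.sqrt(np.pi))          # integral of K^2 for the Gaussian kernel
    n = len(X)
    p_hat = np.mean(norm.pdf((x0 - X) / h)) / h  # KDE of the covariates at x0
    m_hat = nadaraya_watson(x0, X, Y, h)
    se = sigma_hat * np.sqrt(sigma_K2 / (n * h * p_hat))
    z = norm.ppf(1 - alpha / 2)
    return m_hat - z * se, m_hat + z * se

Note that this interval is centered at $\widehat{m}_h(x_0)$, so it covers $E(\widehat{m}_h(x_0))$ rather than $m(x_0)$ unless the bias is negligible.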
Bias issue, cross-validation, and the bootstrap approach: see http://faculty.washington.edu/yenchic/17Sp_403/Lec8-NPreg.pdf for related discussion of these topics.
Many theoretical results for the KDE carry over to nonparametric regression. For instance, we can generalize the MISE to other types of error measurement between $\widehat{m}_h$ and $m$. We can also use derivatives of $\widehat{m}_h$ as estimators of the corresponding derivatives of $m$. Moreover, when we have a multivariate covariate, we can use either a radial basis kernel or a product kernel to generalize the kernel regression to the multivariate case.
The KDE and the kernel regression have a very interesting relationship. Using the given bivariate random sample $(X_1, Y_1), \cdots, (X_n, Y_n)$, we can estimate the joint PDF $p(x, y)$ as
$$\widehat{p}_n(x, y) = \frac{1}{nh^2}\sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right) K\!\left(\frac{Y_i - y}{h}\right).$$
Since $m(x) = E(Y \mid X = x) = \frac{\int y\, p(x, y)\,dy}{p(x)}$, replacing $p(x, y)$ and $p(x)$ by their corresponding estimators $\widehat{p}_n(x, y)$ and $\widehat{p}_n(x)$, we obtain an estimate of $m(x)$ as
$$\begin{aligned}
\widehat{m}_n(x) &= \frac{\int y\, \widehat{p}_n(x, y)\,dy}{\widehat{p}_n(x)}\\
&= \frac{\int y\, \frac{1}{nh^2}\sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right) K\!\left(\frac{Y_i - y}{h}\right) dy}{\frac{1}{nh}\sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right)}\\
&= \frac{\sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right)\cdot \int y \cdot K\!\left(\frac{Y_i - y}{h}\right)\frac{dy}{h}}{\sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right)}\\
&= \frac{\sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right) Y_i}{\sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right)}\\
&= \widehat{m}_h(x).
\end{aligned}$$
Note that when $K(x)$ is symmetric, $\int y \cdot K\!\left(\frac{Y_i - y}{h}\right)\frac{dy}{h} = Y_i$, which gives the fourth equality. Namely, we may understand the kernel regression as the estimator obtained by plugging the KDE of the joint PDF into the formula for the regression function.
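The identity above can also be checked numerically; the sketch below approximates $\int y\,\widehat{p}_n(x, y)\,dy / \widehat{p}_n(x)$ on a fine grid of $y$ values (the names and the grid choice are mine) and should agree with the nadaraya_watson sketch up to numerical integration error:

import numpy as np
from scipy.stats import norm

def m_from_joint_kde(x0, X, Y, h, n_grid=2000):
    # Estimate m(x0) by integrating y against the joint KDE and dividing by the marginal KDE.
    y_grid = np.linspace(Y.min() - 5 * h, Y.max() + 5 * h, n_grid)
    Kx = norm.pdf((X[:, None] - x0) / h)                  # K((X_i - x0)/h), shape (n, 1)
    Ky = norm.pdf((Y[:, None] - y_grid[None, :]) / h)     # K((Y_i - y)/h), shape (n, n_grid)
    p_joint = (Kx * Ky).sum(axis=0) / (len(X) * h**2)     # joint KDE on the y-grid
    p_marginal = Kx.sum() / (len(X) * h)                  # marginal KDE at x0
    return np.trapz(y_grid * p_joint, y_grid) / p_marginal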
Now we are going to introduce a very important notion called the linear smoother. The class of linear smoothers contains many regression estimators with nice properties. A linear smoother is an estimator of the regression function of the form
$$\widehat{m}(x) = \sum_{i=1}^n \ell_i(x)\, Y_i, \tag{9.8}$$
where the weights $\ell_1(x), \cdots, \ell_n(x)$ may depend on the covariates $X_1, \cdots, X_n$ but not on the responses.
Let $e = (e_1, \cdots, e_n)^T$ be the vector of residuals and define an $n \times n$ matrix $L$ by $L_{ij} = \ell_j(X_i)$:
$$L = \begin{pmatrix}
\ell_1(X_1) & \ell_2(X_1) & \ell_3(X_1) & \cdots & \ell_n(X_1)\\
\ell_1(X_2) & \ell_2(X_2) & \ell_3(X_2) & \cdots & \ell_n(X_2)\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\ell_1(X_n) & \ell_2(X_n) & \ell_3(X_n) & \cdots & \ell_n(X_n)
\end{pmatrix}.$$
Then the predicted vector is $\widehat{Y} = (\widehat{Y}_1, \cdots, \widehat{Y}_n)^T = L Y$, where $Y = (Y_1, \cdots, Y_n)^T$ is the vector of observed $Y_i$'s, and $e = Y - \widehat{Y} = Y - LY = (I - L)Y$.
Example: Linear Regression. For the linear regression, let $\mathbf{X}$ denote the data matrix (the first column is all 1's and the second column is $X_1, \cdots, X_n$). We know that $\widehat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T Y$ and $\widehat{Y} = \mathbf{X}\widehat{\beta} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T Y$. This implies that the matrix $L$ is
$$L = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T,$$
which is also the projection matrix in linear regression. Thus, the linear regression is a linear smoother.
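As a quick check of this example, here is a short Python sketch (the simulated data and names are mine) verifying that $L = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ reproduces the least-squares fitted values:

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, size=50)
Y = 1.0 + 2.0 * x + rng.normal(scale=0.2, size=50)
Xmat = np.column_stack([np.ones_like(x), x])          # design matrix: intercept and slope columns
L = Xmat @ np.linalg.inv(Xmat.T @ Xmat) @ Xmat.T      # projection (hat) matrix
beta_hat = np.linalg.lstsq(Xmat, Y, rcond=None)[0]    # least-squares coefficients
assert np.allclose(L @ Y, Xmat @ beta_hat)            # L Y equals the fitted values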
Example: Regressogram. The regressogram is also a linear smoother. Let $B_1, \cdots, B_M$ be the bins of the covariate and let $B(x)$ denote the bin to which $x$ belongs. Then
$$\ell_j(x) = \frac{I(X_j \in B(x))}{\sum_{i=1}^n I(X_i \in B(x))}.$$
Example: Kernel Regression. As you may expect, the kernel regression is also a linear smoother. Recall from equation (9.5) that
$$\widehat{m}_h(x_0) = \frac{\sum_{i=1}^n Y_i\, K\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n K\!\left(\frac{x_0 - X_\ell}{h}\right)} = \sum_{i=1}^n W^K_i(x_0)\, Y_i, \qquad W^K_i(x_0) = \frac{K\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n K\!\left(\frac{x_0 - X_\ell}{h}\right)},$$
so
$$\ell_j(x) = \frac{K\!\left(\frac{x - X_j}{h}\right)}{\sum_{\ell=1}^n K\!\left(\frac{x - X_\ell}{h}\right)}.$$
The linear smoother admits an (approximately) unbiased estimator of the underlying noise level $\sigma^2$. Recall that the noise level is $\sigma^2 = \mathrm{Var}(\epsilon_i)$.
We need a standard fact about covariance matrices: for a fixed matrix $A$ and a random vector $X$,
$$\mathrm{Cov}(AX) = A\,\mathrm{Cov}(X)\,A^T.$$
Thus, the covariance matrix of the residual vector is
$$\mathrm{Cov}(e) = \mathrm{Cov}((I - L)Y) = (I - L)\,\mathrm{Cov}(Y)\,(I - L^T).$$
Because the errors $\epsilon_1, \cdots, \epsilon_n$ are IID, $\mathrm{Cov}(Y) = \sigma^2 I_n$, where $I_n$ is the $n \times n$ identity matrix. This implies
$$\mathrm{Cov}(e) = (I - L)\,\mathrm{Cov}(Y)\,(I - L^T) = \sigma^2 (I - L - L^T + LL^T).$$
The sum of the variances of the residuals is the trace of this matrix:
$$\sum_{i=1}^n \mathrm{Var}(e_i) = \mathrm{Tr}(\mathrm{Cov}(e)) = \sigma^2\,\mathrm{Tr}(I - L - L^T + LL^T) = \sigma^2 (n - 2\nu + \widetilde{\nu}),$$
where $\nu = \mathrm{Tr}(L)$ and $\widetilde{\nu} = \mathrm{Tr}(LL^T)$. Because the squared residual $e_i^2$ is approximately $\mathrm{Var}(e_i)$, we have
$$\sum_{i=1}^n e_i^2 \approx \sum_{i=1}^n \mathrm{Var}(e_i) = \sigma^2 (n - 2\nu + \widetilde{\nu}).$$
Thus, we can estimate $\sigma^2$ by
$$\widehat{\sigma}^2 = \frac{1}{n - 2\nu + \widetilde{\nu}}\sum_{i=1}^n e_i^2, \tag{9.9}$$
which is what we did in equation (9.7). The quantity $\nu$ is called the degree of freedom. In the linear regression case, $\nu = \widetilde{\nu} = p + 1$, the number of covariates plus one for the intercept, so the variance estimator is $\widehat{\sigma}^2 = \frac{1}{n - p - 1}\sum_{i=1}^n e_i^2$. If you have learned the variance estimator of a linear regression, you should be familiar with this estimator.
The degree of freedom $\nu$ is easy to interpret in the linear regression. The power of equation (9.9) is that it works for every linear smoother as long as the errors $\epsilon_i$'s are IID. This shows how we can define an effective degree of freedom for other, more complicated regression estimators.
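To make the linear-smoother machinery concrete, here is a minimal Python sketch that builds the smoothing matrix $L$ for the Gaussian kernel regression, computes $\nu = \mathrm{Tr}(L)$ and $\widetilde{\nu} = \mathrm{Tr}(LL^T)$, and evaluates the noise estimate (9.9); the function names are mine, not from the lecture:

import numpy as np
from scipy.stats import norm

def kernel_smoother_matrix(X, h):
    # Smoothing matrix with entries L[i, j] = l_j(X_i) for the Gaussian kernel regression.
    K = norm.pdf((X[:, None] - X[None, :]) / h)     # K((X_i - X_j) / h)
    return K / K.sum(axis=1, keepdims=True)         # each row sums to one

def sigma2_hat(X, Y, h):
    # Noise-level estimate of equation (9.9) for the kernel regression smoother.
    L = kernel_smoother_matrix(X, h)
    e = Y - L @ Y                                   # residuals e = (I - L) Y
    nu = np.trace(L)                                # effective degrees of freedom
    nu_tilde = np.trace(L @ L.T)
    return np.sum(e**2) / (len(Y) - 2 * nu + nu_tilde)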