
The Nadaraya-Watson Estimator

Derivation of the estimator


We have a random sample of bivariate data $(x_1, Y_1), \ldots, (x_n, Y_n)$.
The Nadaraya-Watson estimator we will be studying in this section is
more suitable for a random design, i.e. when the data come from a joint pdf
$f(x, y)$. The regression model is
\[
Y_i = m(x_i) + \varepsilon_i, \quad i = 1, \ldots, n
\]
where $m(\cdot)$ is unknown. The errors $\{\varepsilon_i\}$ satisfy
\[
E(\varepsilon_i) = 0, \quad V(\varepsilon_i) = \sigma^2_\varepsilon, \quad \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0 \ \text{for} \ i \neq j.
\]
To derive the estimator note that we can express $m(x)$ in terms of the
joint pdf $f(x, y)$ as follows:
\[
m(x) = E[Y \mid X = x] = \int y f(y \mid x)\,dy = \frac{\int y f(x, y)\,dy}{\int f(x, y)\,dy}
\]
We want to estimate the numerator and denominator separately using kernel
estimators. Firstly, for the joint density $f(x, y)$ we use a product kernel
density estimator, i.e.
\[
\hat{f}(x, y) = \frac{1}{n h_x h_y} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h_x}\right) K\!\left(\frac{y - y_i}{h_y}\right)
= \frac{1}{n} \sum_{i=1}^{n} K_{h_x}(x - x_i)\, K_{h_y}(y - y_i)
\]
Hence, we have that
\[
\int y \hat{f}(x, y)\,dy = \frac{1}{n} \int y \sum_{i=1}^{n} K_{h_x}(x - x_i)\, K_{h_y}(y - y_i)\,dy
\]
Now, $\int y K_{h_y}(y - y_i)\,dy = y_i$. Hence, we can write that
\[
\int y \hat{f}(x, y)\,dy = \frac{1}{n} \sum_{i=1}^{n} K_{h_x}(x - x_i)\, y_i
\]
This is our estimate of the numerator. For the denominator we have
\[
\begin{aligned}
\int \hat{f}(x, y)\,dy &= \frac{1}{n} \sum_{i=1}^{n} K_{h_x}(x - x_i) \int K_{h_y}(y - y_i)\,dy \\
&= \frac{1}{n} \sum_{i=1}^{n} K_{h_x}(x - x_i) \quad \text{since the integral wrt } y \text{ equals one} \\
&= \hat{f}(x)
\end{aligned}
\]
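As a quick sanity check of the two kernel integrals used above, the short sketch below evaluates $\int y K_{h_y}(y - y_i)\,dy$ and $\int K_{h_y}(y - y_i)\,dy$ numerically for a Gaussian kernel (an assumed choice; any zero-mean kernel integrating to one works) and confirms that they return $y_i$ and $1$.

```python
# Numerical check of the identities used for the numerator and denominator:
#   int y K_{h_y}(y - y_i) dy = y_i   and   int K_{h_y}(y - y_i) dy = 1,
# where K_h(u) = K(u / h) / h. Uses a Gaussian kernel (illustrative choice).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def K_h(u, h):
    """Scaled kernel K_h(u) = K(u/h)/h with K the standard normal density."""
    return norm.pdf(u / h) / h

y_i, h_y = 2.7, 0.5  # an arbitrary observation and bandwidth

mass, _ = quad(lambda y: K_h(y - y_i, h_y), -np.inf, np.inf)
mean, _ = quad(lambda y: y * K_h(y - y_i, h_y), -np.inf, np.inf)

print(mass)  # ~1.0  (denominator identity)
print(mean)  # ~2.7  (numerator identity, equals y_i)
```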
Therefore, the Nadaraya-Watson estimate of the unknown regression function is given by
\[
\hat{m}(x) = \frac{\sum_{i=1}^{n} K_{h_x}(x - x_i)\, y_i}{\sum_{i=1}^{n} K_{h_x}(x - x_i)}
= \sum_{i=1}^{n} W_{h_x}(x, x_i)\, y_i
\]
where the weight function $W_{h_x}(x, x_i) = \dfrac{K_{h_x}(x - x_i)}{\sum_{j=1}^{n} K_{h_x}(x - x_j)}$. Note that
$\sum_{i=1}^{n} W_{h_x}(x, x_i) = 1$. This kernel regression estimator was first proposed by Nadaraya (1964)
and Watson (1964). Note that the estimator is linear in the observations
$\{y_i\}$ and is, therefore, a linear smoother.
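The weighted-average form above translates directly into code. The following is a minimal sketch of the estimator, assuming a Gaussian kernel and a fixed, user-supplied bandwidth $h_x$ (bandwidth selection is not addressed here); the function name is illustrative and not taken from any particular library.

```python
# Minimal Nadaraya-Watson estimator: a weighted average of the responses y_i
# with weights W_{h_x}(x, x_i) proportional to K_{h_x}(x - x_i).
import numpy as np

def nadaraya_watson(x_grid, x, y, h):
    """Evaluate the Nadaraya-Watson estimate at each point of x_grid.

    Assumes a Gaussian kernel; h is the bandwidth h_x (chosen by the user).
    """
    x_grid = np.asarray(x_grid, dtype=float)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # Kernel matrix: entry (j, i) is K_h(x_grid[j] - x[i]).
    u = (x_grid[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi) / h
    # Each row of weights sums to one, so m_hat is a weighted average of y.
    W = K / K.sum(axis=1, keepdims=True)
    return W @ y

# Example on simulated data from Y = m(x) + error, with m(x) = sin(2*pi*x).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
x_grid = np.linspace(0.05, 0.95, 10)
print(nadaraya_watson(x_grid, x, y, h=0.08))
```

Because the rows of the weight matrix each sum to one and do not depend on the responses, the estimate is linear in $\{y_i\}$, exactly as noted above.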
Asymptotic properties
This is complicated by the fact that the estimator is the ratio of two correlated
random variables. In the denominator we have that
\[
E[\hat{f}(x)] \approx f(x) + \frac{h^2}{2}\, \sigma^2_K\, f^{(2)}(x)
\quad \text{and} \quad
V(\hat{f}(x)) \approx \frac{R(K) f(x)}{nh}
\]
(see Section 2 on kernel density estimation). Here $\sigma^2_K = \int s^2 K(s)\,ds$ and $R(K) = \int K(s)^2\,ds$.
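These two kernel density facts can also be checked by simulation. The sketch below, an illustration under an assumed standard normal design density and Gaussian kernel, estimates $\hat{f}(x_0)$ over repeated samples and compares the Monte Carlo bias and variance with $\frac{h^2}{2}\sigma^2_K f^{(2)}(x_0)$ and $R(K) f(x_0)/(nh)$.

```python
# Monte Carlo check of E[f_hat(x0)] - f(x0) ~ (h^2/2) sigma2_K f''(x0)
# and V(f_hat(x0)) ~ R(K) f(x0) / (n h), for an assumed N(0,1) design
# density and a Gaussian kernel.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x0, h, n = 0.5, 0.3, 1000
sigma2_K = 1.0                      # second moment of the Gaussian kernel
R_K = 1.0 / (2.0 * np.sqrt(np.pi))  # int K(s)^2 ds for the Gaussian kernel

f = norm.pdf                                # design density
f2 = lambda x: (x**2 - 1.0) * norm.pdf(x)   # its second derivative

f_hats = np.array([
    np.mean(norm.pdf((x0 - rng.normal(size=n)) / h)) / h   # f_hat(x0)
    for _ in range(5000)
])

print(f_hats.mean() - f(x0), 0.5 * h**2 * sigma2_K * f2(x0))  # bias
print(f_hats.var(), R_K * f(x0) / (n * h))                    # variance
```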
For the numerator,
\[
E\!\left[\frac{1}{n}\sum_{i=1}^{n} K_{h_x}(x - x_i)\, Y_i\right]
= \int\!\!\int v\, \frac{1}{h_x} K\!\left(\frac{x - u}{h_x}\right) f(u, v)\,du\,dv
= \int\!\!\int v\, K(s)\, f(x - h_x s, v)\,ds\,dv \qquad (+)
\]
using the change of variable $s = \frac{x - u}{h_x}$. Now,
\[
f(v \mid x - h_x s) = \frac{f(x - h_x s, v)}{f(x - h_x s)}
\]
so that $f(x - h_x s, v) = f(v \mid x - h_x s)\, f(x - h_x s)$. The integral in (+) above is
therefore equal to
\[
\int\!\!\int v\, K(s)\, f(v \mid x - h_x s)\, f(x - h_x s)\,ds\,dv
= \int K(s)\, f(x - h_x s) \int v\, f(v \mid x - h_x s)\,dv\,ds
\]
\[
\begin{aligned}
&= \int K(s)\, f(x - h_x s)\, m(x - h_x s)\,ds \\
&= f(x) m(x) + h_x^2\, \sigma^2_K \left[ f^{(1)}(x) m^{(1)}(x) + f^{(2)}(x) m(x)/2 + f(x) m^{(2)}(x)/2 \right] + o(h^2)
\end{aligned}
\]
using Taylor series expansions for $f(x - h_x s)$ and $m(x - h_x s)$. Therefore,
\[
\begin{aligned}
E[\hat{m}(x)] &\approx \frac{E\left[\int y \hat{f}(x, y)\,dy\right]}{E[\hat{f}(x)]}
\approx \frac{f(x)\left[m(x) + h_x^2\, \sigma^2_K \left(f^{(1)} m^{(1)}/f + f^{(2)} m/(2f) + m^{(2)}/2\right)\right]}{f(x)\left[1 + h_x^2\, \sigma^2_K\, f^{(2)}/(2f)\right]} \\
&= m(x) + \frac{h_x^2}{2}\, \sigma^2_K \left[ m^{(2)}(x) + 2\, m^{(1)}(x)\, \frac{f^{(1)}(x)}{f(x)} \right]
\end{aligned}
\]
using the approximation that $(1 + h^2 c)^{-1} \approx (1 - h^2 c)$ for small $h$ in the factor
in the denominator and multiplying through. Hence, for a random design,
\[
\mathrm{bias}(\hat{m}(x)) \approx \frac{h_x^2}{2}\, \sigma^2_K \left[ m^{(2)}(x) + 2\, m^{(1)}(x)\, \frac{f^{(1)}(x)}{f(x)} \right]
\]
However, in the fixed design case
\[
\mathrm{bias}(\hat{m}(x)) \approx \frac{h_x^2}{2}\, \sigma^2_K\, m^{(2)}(x)
\]
When $f^{(1)}(x) = 0$ the bias with a random design equals that with a fixed
design. However, the two situations are not identical. The random design
has zero probability of being equally spaced, even when $f(x)$ is the $U(0, 1)$
pdf.
The $V(\hat{m}(x))$ can be obtained by using the following approximation for
the variance of the ratio of two random variables, $N$ and $D$:
\[
V\!\left(\frac{N}{D}\right) \approx \left(\frac{EN}{ED}\right)^{2} \left[ \frac{V(N)}{(EN)^2} + \frac{V(D)}{(ED)^2} - \frac{2\,\mathrm{Cov}(N, D)}{(EN)(ED)} \right]
\]
provided the variance of the ratio exists. This result is based on a first-order
Taylor series expansion.
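As an illustration of this delta-method approximation (not part of the original derivation), the sketch below compares the formula against the empirical variance of a ratio of two correlated normal random variables whose means are well away from zero, so that the ratio is well behaved; the distributional choices are assumptions made purely for the check.

```python
# Monte Carlo check of V(N/D) ~ (EN/ED)^2 [ V(N)/(EN)^2 + V(D)/(ED)^2
#                                           - 2 Cov(N,D)/(EN ED) ].
# N and D are taken to be correlated normals with means far from zero
# (an assumed toy setup), so the first-order approximation is reasonable.
import numpy as np

rng = np.random.default_rng(1)
mean = np.array([10.0, 5.0])                # (EN, ED)
cov = np.array([[1.0, 0.6], [0.6, 0.5]])    # V(N), Cov(N,D); Cov(N,D), V(D)

N, D = rng.multivariate_normal(mean, cov, size=200_000).T
empirical = np.var(N / D)

EN, ED = mean
approx = (EN / ED) ** 2 * (cov[0, 0] / EN**2 + cov[1, 1] / ED**2
                           - 2 * cov[0, 1] / (EN * ED))

print(empirical, approx)  # the two values should agree to a few percent
```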
Now,
\[
V\!\left[\frac{1}{n}\sum_{i=1}^{n} K_{h_x}(x - x_i)\, Y_i\right]
= \frac{1}{n}\, E\!\left[\left\{K_{h_x}(x - x_i)\, Y_i\right\}^2\right] - O(n^{-1})
\approx \frac{R(K) f(x)}{nh}\left[\sigma^2_\varepsilon + m(x)^2\right]
\]
using the facts that $\int v^2 f(v \mid x - h_x s)\,dv = \sigma^2_\varepsilon(x - h_x s) + m(x - h_x s)^2$ and
$\sigma^2_\varepsilon(x) = \sigma^2_\varepsilon$ for all $x$ (i.e. a constant). Also,
\[
V(\hat{f}(x)) \approx \frac{R(K) f(x)}{nh}
\]
Finally,
\[
\mathrm{Cov}\!\left[\frac{1}{n}\sum_{i=1}^{n} K_{h_x}(x - x_i)\, Y_i,\; \frac{1}{n}\sum_{i=1}^{n} K_{h_x}(x - x_i)\right]
= \frac{1}{n}\, E\!\left[K_{h_x}(x - x_i)^2\, Y_i\right] - O(n^{-1})
\approx \frac{R(K) f(x) m(x)}{nh}
\]
Substituting into the approximation formula gives
\[
V(\hat{m}(x)) \approx \frac{R(K)\, \sigma^2_\varepsilon}{n h f(x)}
\]
The variance of $\hat{m}(x)$ involves terms relating to the error variance $\sigma^2_\varepsilon$ and
the relative amount of data through $f(x)$.
We can use the above point-wise bias and variance results to construct
an expression for the AMSE of $\hat{m}(x)$, which is as follows:
\[
\mathrm{AMSE}(\hat{m}(x)) \approx \frac{h_x^4}{4}\, \sigma^4_K \left[ m^{(2)}(x) + 2\, m^{(1)}(x)\, \frac{f^{(1)}(x)}{f(x)} \right]^{2} + \frac{R(K)\, \sigma^2_\varepsilon}{n h f(x)}
\]
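To see these asymptotic expressions at work, the sketch below simulates repeated samples from a model with known $m$, $f$ and $\sigma^2_\varepsilon$, evaluates $\hat{m}(x_0)$ at a fixed point, and compares the Monte Carlo bias and variance with the formulas above. The model, kernel, bandwidth and sample size are all assumptions made purely for illustration.

```python
# Compare the Monte Carlo bias and variance of m_hat(x0) with the asymptotic
# formulas. All modelling choices below (m, f, sigma_eps, h, n) are assumed.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

m  = np.sin                 # true regression function
m1 = np.cos                 # m^(1)
m2 = lambda x: -np.sin(x)   # m^(2)
f  = norm.pdf               # design density (X ~ N(0, 1))
f1 = lambda x: -x * norm.pdf(x)   # f^(1)
sigma_eps, h, n, x0 = 0.3, 0.25, 2000, 0.5

sigma2_K = 1.0                      # second moment of the Gaussian kernel
R_K = 1.0 / (2.0 * np.sqrt(np.pi))  # R(K) = int K(s)^2 ds for the Gaussian kernel

def m_hat(x0, x, y, h):
    """Nadaraya-Watson estimate at x0 (Gaussian kernel; 1/h cancels in the ratio)."""
    w = norm.pdf((x0 - x) / h)
    return np.sum(w * y) / np.sum(w)

estimates = []
for _ in range(2000):                       # Monte Carlo replications
    x = rng.normal(size=n)
    y = m(x) + rng.normal(scale=sigma_eps, size=n)
    estimates.append(m_hat(x0, x, y, h))
estimates = np.array(estimates)

bias_mc, var_mc = estimates.mean() - m(x0), estimates.var()
bias_asym = 0.5 * h**2 * sigma2_K * (m2(x0) + 2 * m1(x0) * f1(x0) / f(x0))
var_asym = R_K * sigma_eps**2 / (n * h * f(x0))

print(bias_mc, bias_asym)   # squared bias term of the AMSE uses bias_asym
print(var_mc, var_asym)     # variance term of the AMSE
```

Squaring the first comparison and adding the second reproduces the two terms that the AMSE balances: the squared-bias term grows like $h^4$ while the variance term shrinks like $1/(nh)$, which is the usual bias-variance trade-off in the bandwidth.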