Nonparametric Inference Using Orthogonal Functions
8.1 Introduction
In this chapter we use orthogonal function methods for nonparametric inference. Specifically, we use an orthogonal basis to convert regression and density estimation into a Normal means problem and then we construct estimates and confidence sets using the theory from Chapter 7. In the regression case, the resulting estimators are linear smoothers and thus are a special case of the estimators described in Section 5.2. We discuss another approach to orthogonal function regression based on wavelets in the next chapter.
8.2 Nonparametric Regression
Consider the regression model
Y_i = r(x_i) + \sigma\epsilon_i, \qquad i = 1, \ldots, n,   (8.1)
where the \epsilon_i \sim N(0, 1) are iid. For now, we assume a regular design, meaning that x_i = i/n, i = 1, \ldots, n.
Let φ1 , φ2 , . . . be an orthonormal basis for [0, 1]. In our examples we will
often use the cosine basis:
\phi_1(x) \equiv 1, \qquad \phi_j(x) = \sqrt{2}\,\cos((j - 1)\pi x), \quad j \ge 2.   (8.2)
Expand r as
r(x) = \sum_{j=1}^{\infty} \theta_j \phi_j(x)   (8.3)
where \theta_j = \int_0^1 \phi_j(x) r(x)\,dx.
First, we approximate r by
r_n(x) = \sum_{j=1}^{n} \theta_j \phi_j(x)
which is the projection of r onto the span of \{\phi_1, \ldots, \phi_n\}.¹ This introduces an integrated squared bias of size
B_n(\theta) = \int_0^1 \bigl(r(x) - r_n(x)\bigr)^2\,dx = \sum_{j=n+1}^{\infty} \theta_j^2.
If θ lies in a Sobolev ellipsoid Θ(m, c),² then
\sup_{\theta \in \Theta(m,c)} B_n(\theta) = O\!\left(\frac{1}{n^{2m}}\right).   (8.5)
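For intuition, here is a short sketch of where the rate in (8.5) comes from, under the assumption that Θ(m, c) is the Sobolev ellipsoid \{\theta : \sum_j a_j^2 \theta_j^2 \le c^2\} with a_j = (\pi j)^m as in Definition 7.2; if the definition used there differs in detail, only the constant changes:

B_n(\theta) = \sum_{j=n+1}^{\infty} \theta_j^2
  \;\le\; \frac{1}{a_{n+1}^2} \sum_{j=n+1}^{\infty} a_j^2 \theta_j^2
  \;\le\; \frac{c^2}{(\pi(n+1))^{2m}}
  \;=\; O\!\left(\frac{1}{n^{2m}}\right),

uniformly over Θ(m, c), since a_j is nondecreasing in j.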
Hence this bias is negligible and we shall ignore it for the rest of the chapter.
More precisely, we will focus on estimating r_n rather than r. Our next task is to estimate θ = (θ_1, \ldots, θ_n). Let
Z_j = \frac{1}{n} \sum_{i=1}^{n} Y_i \phi_j(x_i), \qquad j = 1, \ldots, n.   (8.6)
¹ More generally we could take r_n(x) = \sum_{j=1}^{p(n)} \theta_j \phi_j(x) where p(n) → ∞ at an appropriate rate.
² See Definition 7.2.
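The following Python sketch shows how the cosine basis (8.2) and the coefficients Z_j in (8.6) might be computed; it assumes NumPy is available and the function names are illustrative rather than taken from the text.

import numpy as np

def cosine_basis(x, n_terms):
    """Evaluate phi_1, ..., phi_{n_terms} of the cosine basis (8.2) at the points x."""
    x = np.asarray(x, dtype=float)
    Phi = np.empty((len(x), n_terms))
    Phi[:, 0] = 1.0                                   # phi_1(x) = 1
    for j in range(2, n_terms + 1):                   # phi_j(x) = sqrt(2) cos((j-1) pi x)
        Phi[:, j - 1] = np.sqrt(2.0) * np.cos((j - 1) * np.pi * x)
    return Phi

def coefficients(y, x):
    """Z_j = n^{-1} sum_i Y_i phi_j(x_i), j = 1, ..., n, as in (8.6)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    Phi = cosine_basis(x, n)          # n x n matrix with entries phi_j(x_i)
    return Phi.T @ y / n

# Example on a regular design x_i = i/n:
# n = 100; x = np.arange(1, n + 1) / n
# y = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(0).standard_normal(n)
# Z = coefficients(y, x)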
Given Z = (Z_1, \ldots, Z_n), a modulation estimator of θ takes the form
\hat\theta = b Z = (b_1 Z_1, b_2 Z_2, \ldots, b_n Z_n)   (8.8)
where b = (b_1, \ldots, b_n) is a modulator, that is, a vector with 0 ≤ b_j ≤ 1. Constant modulators have b_1 = \cdots = b_n, nested subset selection (NSS) modulators have the form
b = (1, \ldots, 1, 0, \ldots, 0),
and monotone modulators satisfy
1 \ge b_1 \ge \cdots \ge b_n \ge 0.
The set of constant modulators is denoted by M_CONS, the set of nested subset modulators is denoted by M_NSS and the set of monotone modulators is denoted by M_MON.
Given a modulator b = (b_1, \ldots, b_n), the function estimator is
\hat{r}_n(x) = \sum_{j=1}^{n} \hat\theta_j \phi_j(x) = \sum_{j=1}^{n} b_j Z_j \phi_j(x).   (8.9)
Observe that
\hat{r}_n(x) = \sum_{i=1}^{n} Y_i \ell_i(x)   (8.10)
where
\ell_i(x) = \frac{1}{n} \sum_{j=1}^{n} b_j \phi_j(x) \phi_j(x_i).   (8.11)
Hence \hat{r}_n is a linear smoother of the kind studied in Chapter 5. It remains to choose the modulator b. We shall address the problem using Stein's unbiased risk estimator (Section 7.4) instead of cross-validation.
Let
R(b) = E_\theta\!\left[ \sum_{j=1}^{n} (b_j Z_j - \theta_j)^2 \right]
denote the risk of the modulation estimator bZ, and let R̂(b) denote the corresponding modified risk estimate,³ obtained from Stein's unbiased risk estimator with an estimate σ̂² of σ² plugged in.
³ We call this a modified risk estimator since we have inserted an estimate σ̂ of σ and we replaced (Z_j² − σ̂²/n) with (Z_j² − σ̂²/n)_+, which usually improves the risk estimate.
For a fixed b, we expect that R̂(b) approximates R(b). But for the react estimator we require more: we want R̂(b) to approximate R(b) uniformly for b ∈ M. If so, then inf_{b∈M} R̂(b) ≈ inf_{b∈M} R(b) and the b̂ that minimizes R̂(b) should be nearly as good as the b that minimizes R(b). This motivates the next result.
8.17 Theorem (Beran and Dümbgen, 1998). Let M be one of M_CONS, M_NSS or M_MON. Let R(b) denote the true risk of the estimator (b_1 Z_1, \ldots, b_n Z_n). Let b^* minimize R(b) over M and let b̂ minimize R̂(b) over M. Then
|R(b̂) − R(b^*)| → 0.
To minimize R̂(b) over M_NSS, find the J that minimizes the estimated risk and set
\hat{r}_n(x) = \sum_{j=1}^{J} Z_j \phi_j(x).
It is a good idea to plot the estimated risk as a function of J. To minimize R̂(b) over M_MON, note that R̂(b) can be written as
\hat{R}(b) = \sum_{i=1}^{n} (b_i - g_i)^2 Z_i^2 + \frac{\hat\sigma^2}{n} \sum_{i=1}^{n} g_i   (8.19)
where g_i = (Z_i^2 - \hat\sigma^2/n)/Z_i^2. So it suffices to minimize
\sum_{i=1}^{n} (b_i - g_i)^2 Z_i^2
subject to the monotonicity constraint 1 \ge b_1 \ge \cdots \ge b_n \ge 0.
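The text does not prescribe an algorithm for this constrained minimization; one natural route, used here purely as an illustration, is weighted isotonic regression with the fitted values clipped to [0, 1], for example via scikit-learn's IsotonicRegression. A minimal sketch, assuming Z and an estimate σ̂² are available:

import numpy as np
from sklearn.isotonic import IsotonicRegression

def monotone_modulator(Z, sigma2_hat):
    """Minimize sum_i (b_i - g_i)^2 Z_i^2 subject to 1 >= b_1 >= ... >= b_n >= 0."""
    Z = np.asarray(Z, dtype=float)
    n = len(Z)
    g = (Z**2 - sigma2_hat / n) / Z**2        # g_i = (Z_i^2 - sigma_hat^2 / n) / Z_i^2
    idx = np.arange(n, dtype=float)
    # Weighted decreasing (antitonic) fit of g against the index, with weights Z_i^2,
    # clipped to [0, 1] so the result is a valid modulator.
    iso = IsotonicRegression(increasing=False, y_min=0.0, y_max=1.0)
    iso.fit(idx, g, sample_weight=Z**2)
    return iso.predict(idx)

For M_NSS, by contrast, the minimization reduces to choosing the cutoff J, as summarized next.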
Summary of react

1. Let Z_j = n^{-1} \sum_{i=1}^{n} Y_i \phi_j(x_i) for j = 1, \ldots, n.
2. Find J to minimize the estimated risk R̂ over the nested subset (NSS) modulators.
3. Let
\hat{r}_n(x) = \sum_{j=1}^{J} Z_j \phi_j(x).
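A compact Python sketch of these steps for the NSS modulator, assuming NumPy, a regular design, and a supplied estimate sigma2_hat of σ² (its defining equation (8.12) is not reproduced above). The particular form of the estimated risk used in step 2 is the usual SURE-type expression and should be treated as an assumption; all names are illustrative.

import numpy as np

def nss_react(y, x, sigma2_hat):
    """NSS react fit on a regular design: choose J to minimize an estimated risk."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)

    # Step 1: Z_j = n^{-1} sum_i Y_i phi_j(x_i), cosine basis (8.2).
    j = np.arange(1, n + 1)
    Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, j - 1))
    Phi[:, 0] = 1.0
    Z = Phi.T @ y / n

    # Step 2: estimated risk of keeping the first J terms,
    #   Rhat(J) = J * sigma2_hat / n + sum_{j > J} (Z_j^2 - sigma2_hat / n)_+ ,
    # a SURE-type NSS risk estimate (cf. footnote 3); treat this form as an assumption.
    excess = np.maximum(Z**2 - sigma2_hat / n, 0.0)
    suffix = np.concatenate((np.cumsum(excess[::-1])[::-1][1:], [0.0]))  # sums over j > J
    risk = np.arange(1, n + 1) * sigma2_hat / n + suffix
    J = int(np.argmin(risk)) + 1

    # Step 3: rhat_n(t) = sum_{j <= J} Z_j phi_j(t).
    def rhat(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        Pt = np.sqrt(2.0) * np.cos(np.pi * np.outer(t, np.arange(J)))
        Pt[:, 0] = 1.0
        return Pt @ Z[:J]

    return J, risk, rhat

# Example: J, risk, rhat = nss_react(y, x, sigma2_hat); fit = rhat(np.linspace(0, 1, 200))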
8.20 Example (Doppler function). Recall that the Doppler function from
Example 5.63 is
r(x) = \sqrt{x(1 - x)}\,\sin\!\left(\frac{2.1\pi}{x + .05}\right).
The top left panel in Figure 8.1 shows the true function. The top right panel
shows 1000 data points. The data were simulated from the model Y_i = r(i/n) + σε_i with σ = 0.1 and ε_i ∼ N(0, 1). The bottom left panel shows the estimated risk for the NSS modulator as a function of the number of terms in the fit. The risk was minimized by using the modulator:
b = (\underbrace{1, \ldots, 1}_{187}, \underbrace{0, \ldots, 0}_{813}).
The bottom right panel shows the react fit. Compare with Figure 5.6.
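The simulation described in this example can be reproduced along the following lines; the square root in the Doppler function below follows the standard definition of the test function, and the generated (x, y) can be fed to the nss_react sketch given earlier.

import numpy as np

def doppler(x):
    """Doppler test function of Example 5.63."""
    return np.sqrt(x * (1.0 - x)) * np.sin(2.1 * np.pi / (x + 0.05))

n, sigma = 1000, 0.1
rng = np.random.default_rng(0)                      # fixed seed for reproducibility
x = np.arange(1, n + 1) / n                         # regular design x_i = i / n
y = doppler(x) + sigma * rng.standard_normal(n)     # Y_i = r(i/n) + sigma * eps_i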
8.21 Example (CMB data). Let us compare react to local smoothing for the
CMB data from Example 4.4. The estimated risk (for NSS) is minimized by
using J = 6 basis functions. The fit is shown in Figure 8.2 and is similar to
the fits obtained in Chapter 5. (We are ignoring the fact that the variance is
not constant.) The plot of the risk reveals that there is another local minimum
around J = 40. The bottom right plot shows the fit using 40 basis functions.
This fit appears to undersmooth.
FIGURE 8.1. Doppler test function. Top left: true function. Top right: 1000 data
points. Bottom left: estimated risk as a function of the number of terms in the fit.
Bottom right: final react fit.
There are several ways to construct a confidence set for r. We begin with
confidence balls. First, construct a confidence ball Bn for θ = (θ1 , . . . , θn )
using any of the methods in Section 7.8. Then define
C_n = \left\{ r = \sum_{j=1}^{n} \theta_j \phi_j : (\theta_1, \ldots, \theta_n) \in B_n \right\}.   (8.22)
8.23 Theorem (Beran and Dümbgen, 1998). Let θ̂ be the MON or NSS estimator and let σ̂² be the estimator of σ² defined in (8.12). Let
B_n = \left\{ \theta = (\theta_1, \ldots, \theta_n) : \sum_{j=1}^{n} (\theta_j - \hat\theta_j)^2 \le s_n^2 \right\}   (8.24)
where
s_n^2 = \hat{R}(\hat{b}) + \frac{\hat\tau\, z_\alpha}{\sqrt{n}},
\hat\tau^2 = \frac{2\hat\sigma^4}{n} \sum_j \bigl[(2 b_j - 1)(1 - c_j)\bigr]^2
  + 4\hat\sigma^2 \sum_j \left( Z_j^2 - \frac{\hat\sigma^2}{n} \right) \bigl[(1 - b_j) + (2 b_j - 1) c_j \bigr]^2
FIGURE 8.2. CMB data using react. Top left: NSS fit using J = 6 basis functions.
Top right: estimated risk. Bottom left: NSS fit using J = 40 basis functions.
and
c_j = \begin{cases} 0 & \text{if } j \le n - J \\ 1/J & \text{if } j > n - J. \end{cases}
Then, for any c > 0 and m > 1/2, the ball B_n has asymptotic coverage 1 − α uniformly over the Sobolev ellipsoid Θ(m, c), that is,
\sup_{\theta \in \Theta(m,c)} \bigl| P_\theta(\theta \in B_n) - (1 - \alpha) \bigr| \to 0 \quad \text{as } n \to \infty.
To construct confidence bands, we use the fact that \hat{r}_n is a linear smoother and we can then use the method from Section 5.7. The band is given by (5.99), namely,
I(x) = \left( \hat{r}_n(x) - c\,\hat\sigma\,\|\ell(x)\|,\ \hat{r}_n(x) + c\,\hat\sigma\,\|\ell(x)\| \right)   (8.25)
where
\|\ell(x)\|^2 \approx \frac{1}{n} \sum_{j=1}^{n} b_j^2 \phi_j^2(x).   (8.26)
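A sketch of evaluating the band (8.25)-(8.26) on a grid, assuming the cosine basis, a modulator b, the coefficients Z, an estimate σ̂ and the constant c from Section 5.7 are already in hand (the constant is not rederived here):

import numpy as np

def confidence_band(t, Z, b, sigma_hat, c):
    """Lower and upper band values at the points t, following (8.25)-(8.26)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    Z = np.asarray(Z, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(Z)
    j = np.arange(1, n + 1)
    Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(t, j - 1))
    Phi[:, 0] = 1.0                               # cosine basis (8.2)
    rhat = Phi @ (b * Z)                          # rhat_n(t) = sum_j b_j Z_j phi_j(t)
    ell_norm = np.sqrt(Phi**2 @ b**2 / n)         # ||ell(t)|| from (8.26)
    half = c * sigma_hat * ell_norm
    return rhat - half, rhat + half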
When the design points are irregularly spaced rather than of the form x_i = i/n, there are several ways to handle this case. The simplest is to use a basis \{\phi_1, \ldots, \phi_n\} that is orthogonal with respect to the design points x_1, \ldots, x_n. That is, we choose a basis for L_2(P_n) where P_n = n^{-1} \sum_{i=1}^{n} \delta_i and \delta_i is a point mass at x_i. This requires that
\|\phi_j\| = 1, \qquad j = 1, \ldots, n
and
\langle \phi_j, \phi_k \rangle = 0, \qquad 1 \le j < k \le n
where
\langle f, g \rangle = \int f(x) g(x)\, dP_n(x) = \frac{1}{n} \sum_{i=1}^{n} f(x_i) g(x_i)
and
\|f\|^2 = \int f^2(x)\, dP_n(x) = \frac{1}{n} \sum_{i=1}^{n} f^2(x_i).
Such a basis can be constructed by Gram-Schmidt orthogonalization. Start with any convenient linearly independent functions g_1, \ldots, g_n and define
\phi_1(x) = \frac{\psi_1(x)}{\|\psi_1\|} \quad \text{where } \psi_1(x) = g_1(x),
and, for r = 2, \ldots, n,
\phi_r(x) = \frac{\psi_r(x)}{\|\psi_r\|} \quad \text{where } \psi_r(x) = g_r(x) - \sum_{j=1}^{r-1} a_{r,j}\,\phi_j(x) \text{ and } a_{r,j} = \langle g_r, \phi_j \rangle.
It follows that
Z_j \approx N\!\left(\theta_j, \frac{\sigma^2}{n}\right)
and we can then use the methods that we developed in this chapter.
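For concreteness, here is a minimal numerical sketch of this Gram-Schmidt construction, applied to the matrix of starting-basis values at the design points and using the empirical inner product ⟨f, g⟩ = n^{-1} Σ_i f(x_i)g(x_i); the choice of starting functions g_1, ..., g_n in the commented example (the cosine basis) is only an illustration.

import numpy as np

def empirical_gram_schmidt(G):
    """Orthonormalize the columns of G (entries g_j(x_i)) with respect to
    <f, g> = n^{-1} sum_i f(x_i) g(x_i)."""
    G = np.asarray(G, dtype=float)
    n, p = G.shape
    Phi = np.zeros_like(G)
    for r in range(p):
        psi = G[:, r].copy()                          # psi_r starts at g_r
        for j in range(r):
            a_rj = np.dot(G[:, r], Phi[:, j]) / n     # a_{r,j} = <g_r, phi_j>
            psi -= a_rj * Phi[:, j]
        Phi[:, r] = psi / np.sqrt(np.dot(psi, psi) / n)   # normalize so ||phi_r|| = 1
    return Phi

# Illustration on an irregular design, starting from the cosine basis:
# x = np.sort(np.random.default_rng(0).random(50))
# j = np.arange(1, 51)
# G = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, j - 1)); G[:, 0] = 1.0
# Phi = empirical_gram_schmidt(G)   # columns now orthonormal in L2(P_n)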
The same ideas apply to density estimation. Let X_1, \ldots, X_n be iid observations from a density f on [0, 1] with f(x) = \sum_j \theta_j \phi_j(x), where \theta_j = \int \phi_j(x) f(x)\,dx, and let Z_j = n^{-1} \sum_{i=1}^{n} \phi_j(X_i). Then,
E(Z_j) = \int \phi_j(x) f(x)\,dx = \theta_j
and
V(Z_j) = \frac{1}{n}\left[ \int \phi_j^2(x) f(x)\,dx - \theta_j^2 \right] \equiv \sigma_j^2.
We estimate σ_j² by
\hat\sigma_j^2 = \frac{1}{n^2} \sum_{i=1}^{n} \bigl( \phi_j(X_i) - Z_j \bigr)^2.
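A sketch of these quantities for observations X_1, ..., X_n on [0, 1], using the cosine basis and taking Z_j = n^{-1} Σ_i φ_j(X_i), which is the form consistent with the displayed mean and variance; the function name is illustrative.

import numpy as np

def density_coefficients(X, n_terms):
    """Return Z_j and sigma_hat_j^2 for j = 1, ..., n_terms."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    j = np.arange(1, n_terms + 1)
    Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(X, j - 1))
    Phi[:, 0] = 1.0                                   # cosine basis (8.2)
    Z = Phi.mean(axis=0)                              # Z_j = n^{-1} sum_i phi_j(X_i)
    sigma2 = ((Phi - Z)**2).sum(axis=0) / n**2        # estimate of V(Z_j)
    return Z, sigma2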
8.8 Exercises
1. Prove Lemma 8.4.
7. Get the data on fragments of glass collected in forensic work from the
book website. Let Y be refractive index and let x be aluminium content (the fourth variable). Perform a nonparametric regression to fit the model Y = r(x) + ε. Use react and compare to local linear smoothing.
Estimate the variance. Construct 95 percent confidence bands for your
estimate.
8. Get the motorcycle data from the book website. The covariate is time
(in milliseconds) and the response is acceleration at time of impact. Use
react to fit the data. Compute 95 percent confidence bands. Compute
a 95 percent confidence ball. Can you think of a creative way to display
the confidence ball?
10. Repeat the previous exercise but use Cauchy errors instead of Normal
errors. How might you change the procedure to make the estimators
more robust?
11. Generate n = 1000 data points from (1/2)N (0, 1) + (1/2)N (µ, 1). Com-
pare kernel density estimators and react density estimators. Try µ =
0, 1, 2, 3, 4, 5.
12. Recall that a modulator is any vector of the form b = (b1 , . . . , bn ) such
that 0 ≤ bj ≤ 1, j = 1, . . . , n. The greedy modulator is the modulator
b∗ = (b∗1 , . . . , b∗n ) chosen to minimize the risk R(b) over all modulators.
(a) Find b∗ .
(b) What happens if we try to estimate b^* from the data? In particular, consider taking b̂^* to minimize the estimated risk R̂. Why will this not work well? (The problem is that we are now minimizing R̂ over a very large class and R̂ does not approximate R uniformly over such a large class.)
13. Let
Y_i = r(x_{1i}, x_{2i}) + ε_i
14. Download the air quality data set from the book website. Model ozone
as a function of solar R, wind and temperature. Use a tensor product
basis.