Abstract. In this paper, we derive the asymptotic properties of average derivative estimators
when the regressors are contaminated with classical measurement error and the density of this
error is unknown. Average derivatives of conditional mean functions are used extensively in
economics and statistics, most notably in semiparametric index models. As well as ordinary
smooth measurement error, we provide results for supersmooth error distributions. This is a
particularly important class of error distribution as it includes the popular Gaussian density.
We show that under this ill-posed inverse problem, despite using nonparametric deconvolution
techniques and an estimated error characteristic function, we are able to achieve a √n rate
of convergence for the average derivative estimator. Interestingly, if the measurement error
density is symmetric, the asymptotic variance of the average derivative estimator is the same
irrespective of whether the error density is estimated or not.
1. Introduction
Since the seminal paper of Powell, Stock and Stoker (1989), average derivatives have enjoyed
much popularity. They have found primary use in estimating coefficients in single index models,
where Powell, Stock and Stoker (1989) showed that these estimators identify the parameters of
interest up-to-scale. They have also been employed to great effect in the estimation of consumer
demand functions (see, for example, Blundell, Duncan and Pendakur, 1998, and Yatchew, 2003)
and sample selection models (for example, Das, Newey and Vella, 2003). Finally, several testing
procedures have also made use of these estimators (see, for example, Härdle, Hildenbrand and
Jerison, 1991, and Racine, 1997).
A key benefit of average derivative estimators is their ability to achieve a √n rate of
convergence despite being constructed using nonparametric techniques. Powell, Stock and Stoker
(1989), among many others, demonstrated this parametric rate in the standard case of correctly
measured regressors. Fan (1995) extended this result to allow for regressors contaminated with
classical measurement error from the class of ordinary smooth distributions, for example, gamma
or Laplace. In that paper, it was shown that average derivative estimators, constructed using
deconvolution techniques, were able to retain the √n rate of convergence enjoyed by their
correctly measured counterparts. However, this result relied on knowledge of the true error
distribution and did not cover the case of supersmooth error densities, which includes Gaussian error.

Financial support from the ERC Consolidator Grant (SNP 615882) is gratefully acknowledged (Otsu).
Extending these results to supersmooth measurement error is not a trivial extension, and it
is not clear a priori whether this parametric rate can be achieved in this case. Indeed, in many
estimation and testing problems, convergence rates and asymptotic distributions are fundamen-
tally different between ordinary smooth and supersmooth error densities (see, for example, Fan,
1991, van Es and Uh, 2005, Dong and Otsu, 2018, and Otsu and Taylor, 2019).
Furthermore, no result has been provided regarding the asymptotic properties of average
derivative estimators in the more realistic situation where the measurement error density is
unknown. Much recent work in the errors-in-variables literature has been aimed at relaxing the
assumption of a known measurement error distribution, and deriving the asymptotic properties
of estimators and test statistics in this setting (see, for example, Delaigle, Hall and Meister, 2008,
Dattner, Reiß and Trabs, 2016, and Kato and Sasaki, 2018).
Measurement error is rife in datasets from all fields. It is a problem that affects economic,
medical, social, and physical data sets, to name just a few. In response to the slow convergence
rates achieved by nonparametric deconvolution techniques, practitioners may shy away from the
use of these estimators in the face of classical measurement error. By showing that we can still
obtain a parametric rate of convergence even in the worst case scenario of supersmooth error and
an estimated error characteristic function, we hope to encourage greater use of nonparametric
estimation in applied work when covariates are contaminated with error.
Moreover, since the curse of dimensionality (which plagues all nonparametric estimators) is
exacerbated in the presence of measurement error, the potential gain from using average deriva-
tives is increased when regressors are mismeasured. In particular, in the case of ordinary smooth
error densities, the convergence rate of deconvolution estimators, although slower than stan-
dard nonparametric estimators, remains polynomial. However, for supersmooth densities, this
convergence typically deteriorates to a log(n) rate.
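To put the gap between these rates in perspective, a toy numerical comparison helps; the polynomial exponent 2/5 below is a stand-in for a typical ordinary smooth rate, not a rate derived in this paper.

```python
import math

# Compare a polynomial convergence rate n^(-2/5) (illustrative of ordinary
# smooth deconvolution) with a logarithmic rate 1/log(n) (illustrative of
# supersmooth deconvolution).
for n in (10**3, 10**6, 10**9):
    poly_rate = n ** (-2 / 5)
    log_rate = 1 / math.log(n)
    print(f"n = {n:>10}: n^(-2/5) = {poly_rate:.2e}, 1/log(n) = {log_rate:.2e}")
```

Even at n = 10^9, the logarithmic rate has barely improved, which is why recovering a parametric rate by averaging is especially valuable in the supersmooth case.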
In the next section, we describe the setup of our model, discuss the assumptions imposed, and
provide our main result. All mathematical proofs are relegated to the Appendix.
2. Main result
2.1. Setup. The covariate of interest, $X^*$, is not observed directly; instead we observe the error-contaminated measurement
$$X = X^* + \varepsilon,$$
where $\varepsilon$ is a classical measurement error. With $g(\cdot) = E[Y|X^* = \cdot]$ the conditional mean function and $f$ the Lebesgue density of $X^*$, the parameter of interest is the density-weighted average derivative
$$\theta = -2E[Y f'(X^*)] = E[g'(X^*) f(X^*)],$$
where g′ and f′ are the first-order derivatives of g and f, respectively. The second equality
follows from using integration by parts (see Lemma 2.1 of Powell, Stock and Stoker, 1989).
The key use of such density weighted average derivatives is in single-index models and partially
linear single-index models. Taking g(X) = g(X1′β, X2) for some unknown link function g with
X = (X1, X2), we obtain the partially linear case; when X2 is removed, this becomes the single-
index model. Such specifications are very general and cover a wide variety of regression models,
for example, binary choice models, truncated and censored dependent variable models, and
duration models (see Ichimura, 1993, for a more detailed discussion). They can also be used as
a simple dimension reduction solution to the curse of dimensionality.
For identification purposes, it is necessary to make some normalization restriction on β. This
is because any scaling factor can be subsumed into g. Hence, this parameter of interest is only
identified up to scale. Due to the linear index structure, the density weighted average derivative
identifies this scaled β.
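To make the up-to-scale identification concrete, the following short calculation (written in one-dimensional notation for the index; the general case replaces g′ with a gradient) combines the integration-by-parts identity with the index structure:

```latex
% Density-weighted average derivative (Lemma 2.1 of Powell, Stock and Stoker, 1989):
\theta = -2\,E[Y f'(X^*)] = E[f(X^*)\,g'(X^*)] .
% Under the single-index specification g(X^*) = \tilde{g}(X^{*\prime}\beta),
% the chain rule gives \nabla g(X^*) = \dot{\tilde{g}}(X^{*\prime}\beta)\,\beta , so
\theta = E\big[f(X^*)\,\dot{\tilde{g}}(X^{*\prime}\beta)\big]\,\beta = c\,\beta ,
% i.e., \theta identifies \beta up to the unknown scalar c.
```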
If we directly observe $X^*$, $\theta$ can be estimated by the sample analog $-\frac{2}{n}\sum_{j=1}^{n} Y_j \tilde{f}'(X_j^*)$, where $\tilde{f}'$ is a nonparametric estimator of the density derivative. When $X^*$ is contaminated with measurement error, the densities must instead be estimated by applying the deconvolution method. Let $i = \sqrt{-1}$ and $f^{\mathrm{ft}}$ denote the Fourier transform of a function $f$. If the error density $f_\varepsilon$ is known, based on the i.i.d. sample $\{Y_j, X_j\}_{j=1}^{n}$ of $(Y, X)$, the densities $f$ and $h$ can be estimated by
$$\tilde{f}(x) = \frac{1}{nb_n}\sum_{j=1}^{n} K\Big(\frac{x - X_j}{b_n}\Big), \qquad \tilde{h}(x, y) = \frac{1}{nb_n^2}\sum_{j=1}^{n} K\Big(\frac{x - X_j}{b_n}\Big) K_y\Big(\frac{y - Y_j}{b_n}\Big).$$
Under the assumption that $f_\varepsilon$ is symmetric, its Fourier transform $f_\varepsilon^{\mathrm{ft}}$ can be
estimated by (Delaigle, Hall and Meister, 2008)
$$\hat{f}_\varepsilon^{\mathrm{ft}}(t) = \Bigg|\frac{1}{n}\sum_{j=1}^{n}\cos\{t(X_j - X_j^r)\}\Bigg|^{1/2}, \qquad (4)$$
where $X_j^r$ is a repeated measurement of $X_j^*$. The estimators $\hat{f}$ and $\hat{h}$ are then defined as $\tilde{f}$ and $\tilde{h}$ with $K$ replaced by the deconvolution kernel based on (4), that is,
$$\hat{K}(x) = \frac{1}{2\pi}\int e^{-itx}\,\frac{K^{\mathrm{ft}}(t)}{\hat{f}_\varepsilon^{\mathrm{ft}}(t/b_n)}\,dt.$$
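As a quick numerical sanity check of (4), the sketch below estimates |f_ε^ft| from simulated repeated measurements and compares it with the truth; the Laplace error and all parameter values are illustrative assumptions, not part of the paper's setup. For a Laplace(0, b) error, f_ε^ft(t) = 1/(1 + b²t²).

```python
import numpy as np

# Estimate |f_eps^ft(t)| from repeated measurements via (4), using
# simulated Laplace(0, 1) measurement errors (illustrative values).
rng = np.random.default_rng(0)
n, b = 50_000, 1.0
x_star = rng.normal(0.0, 1.0, n)           # true covariate
X = x_star + rng.laplace(0.0, b, n)        # contaminated measurement
Xr = x_star + rng.laplace(0.0, b, n)       # repeated measurement

def fhat_eps_ft(t):
    # |(1/n) sum_j cos{t (X_j - X_j^r)}|^{1/2}, as in (4)
    return np.abs(np.mean(np.cos(t * (X - Xr)))) ** 0.5

for t in (0.5, 1.0, 2.0):
    truth = 1.0 / (1.0 + (b * t) ** 2)     # Laplace characteristic function
    print(f"t = {t}: estimate = {fhat_eps_ft(t):.3f}, truth = {truth:.3f}")
```

The construction works because X − X^r = ε − ε^r, whose characteristic function equals |f_ε^ft(t)|² when the two errors are identically distributed; taking the square root recovers |f_ε^ft(t)| without ever observing ε.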
Then the parameter $\theta$ can be estimated by
$$\hat{\theta} = -2\iint y\,\hat{f}'(x)\hat{h}(x, y)\,dx\,dy = -\frac{2}{n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int \hat{K}'\Big(\frac{x - X_j}{b_n}\Big)\hat{K}\Big(\frac{x - X_k}{b_n}\Big)\,dx, \qquad (5)$$
where $\hat{f}'$ and $\hat{K}'$ are the first-order derivatives of $\hat{f}$ and $\hat{K}$, respectively, and the second equality follows from $\int y K_y((y - Y_k)/b_n)\,dy = b_n Y_k$. Here we have derived the estimator for the case of a
continuous Y. However, our estimator θ̂ in (5) can be applied to the case of a discrete Y as well.
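For intuition about the object being estimated, here is a minimal sketch of the error-free benchmark, the Powell–Stock–Stoker sample analog −(2/n)∑ⱼ Yⱼ f̃′(Xⱼ), with a Gaussian kernel and a simulated design g(x) = 2x; all simulation settings (sample size, bandwidth, distributions) are illustrative assumptions.

```python
import numpy as np

# Error-free density-weighted average derivative (Powell-Stock-Stoker):
# theta_hat = -(2/n) sum_j Y_j * fhat'(X_j), using a leave-one-out
# Gaussian-kernel estimate of the density derivative f'.
rng = np.random.default_rng(1)
n, bn = 2000, 0.25
X = rng.normal(0.0, 1.0, n)                # correctly measured covariate
Y = 2.0 * X + rng.normal(0.0, 0.1, n)      # g(x) = 2x, so theta = 2 E[f(X)] = 1/sqrt(pi)

U = (X[:, None] - X[None, :]) / bn         # pairwise scaled differences
Kp = -U * np.exp(-U**2 / 2) / np.sqrt(2 * np.pi)   # Gaussian kernel derivative K'(u)
np.fill_diagonal(Kp, 0.0)                  # leave-one-out
f_prime = Kp.sum(axis=1) / ((n - 1) * bn**2)
theta_hat = -2.0 * np.mean(Y * f_prime)
print(theta_hat)                           # near 1/sqrt(pi), up to smoothing bias
```

Replacing the kernel K by the deconvolution kernel K̂ of (5) is exactly what converts this benchmark into the errors-in-variables estimator θ̂.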
Throughout this paper, we will focus on the case of a single covariate to keep the notation
simple. The proposed method, however, can easily adapt to the multivariate case. In particular,
when there are multiple covariates and one of them is mismeasured, i.e.,
$$Y = g(X^*, Z) + u,$$
the corresponding estimator becomes
$$\hat{\theta}_d = -\frac{2}{n^2 b_n^{2D+3}}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\iint K\Big(\frac{x - X_j}{b_n}\Big) K\Big(\frac{x - X_k}{b_n}\Big)\frac{\partial K_z\big(\frac{z - Z_j}{b_n}\big)}{\partial z_d}\, K_z\Big(\frac{z - Z_k}{b_n}\Big)\,dx\,dz.$$
We expect that analogous results to our main theorem can be established for this estimator as
well.
2.2. Asymptotic properties. We now investigate the asymptotic properties of the average
derivative estimator θ̂ in (5). Let G = gf . For ordinary smooth measurement error densities, we
impose the following assumptions.
Assumption OS.
(1): {Yj , Xj , Xjr }nj=1 is an i.i.d. sample of (Y, X, X r ) satisfying (1). g(·) = E[Y |X ∗ = ·]
has p continuous, bounded, and integrable derivatives. The density function f (·) of X ∗
has (p + 1) continuous, bounded, and integrable derivatives, where p is a positive integer
satisfying p > α + 1.
(2): $(\varepsilon, \varepsilon^r)$ are mutually independent and independent of $(Y, X^*)$, the distributions of $\varepsilon$ and $\varepsilon^r$ are identical, absolutely continuous with respect to the Lebesgue measure, and the characteristic function $f_\varepsilon^{\mathrm{ft}}$ satisfies
$$f_\varepsilon^{\mathrm{ft}}(t) \sim \frac{1}{\sum_{v=0}^{\alpha} C_v |t|^{v}} \quad \text{for all } t \in \mathbb{R}.$$
(3): $K$ is a kernel function of order $p$. Also, $K^{\mathrm{ft}}$ is compactly supported on $[-1, 1]$, symmetric around zero, and bounded.
(4): $n^{-1/2} b_n^{-2(1+3\alpha)}\log(b_n^{-1})^{-1/2} \to 0$ and $n^{1/2} b_n^{p} \to 0$ as $n \to \infty$.
The i.i.d. restriction on the data from Assumption (1) is standard in the literature and is
imposed merely for ease of derivation rather than necessity. The second part of this assumption
requires sufficient smoothness from the regression function and the density function of X* relative to
the smoothness of the measurement error. Assumption (2) is the conventional ordinary smooth
assumption for the measurement error. Assumption (3) requires a kernel function of order p to
remove the bias term from the nonparametric estimator. The first part of Assumption (4) requires
that the bandwidth does not decay to zero too quickly as n → ∞. This is necessary to ensure
the asymptotic linearity of the estimator and apply a Hoeffding projection. The particular rate
depends on the parameters of the measurement error characteristic function. The second part of
Assumption (4) ensures the bandwidth approaches zero sufficiently fast to remove the asymptotic
bias from the nonparametric estimator. Finally, Assumption (5) is a high-level assumption on
the boundedness of the asymptotic variance of the average derivative estimator.
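The substantive difference between the ordinary smooth and supersmooth classes is the tail decay of the error characteristic function, which determines how violently deconvolution amplifies high-frequency noise. A small illustration, using the standard formulas for the Laplace and Gaussian characteristic functions (not objects from this paper):

```python
import math

# Tail decay of error characteristic functions: polynomial for ordinary
# smooth (Laplace) versus exponential for supersmooth (Gaussian).
def laplace_ft(t, b=1.0):
    # Laplace(0, b): 1 / (1 + b^2 t^2); ordinary smooth with alpha = 2
    return 1.0 / (1.0 + (b * t) ** 2)

def gaussian_ft(t, sigma=1.0):
    # N(0, sigma^2): exp(-sigma^2 t^2 / 2); supersmooth with gamma = 2
    return math.exp(-sigma**2 * t**2 / 2)

for t in (1.0, 5.0, 10.0):
    print(f"t = {t:>4}: Laplace ft = {laplace_ft(t):.2e}, Gaussian ft = {gaussian_ft(t):.2e}")
```

At t = 10 the Gaussian characteristic function is already below 10⁻²¹, so the deconvolution weight 1/f_ε^ft(t/b_n) explodes exponentially; this is the source of the severe ill-posedness in the supersmooth case.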
For the supersmooth case, we impose the following assumptions.
Assumption SS.
(1): {Yj , Xj , Xjr }nj=1 is an i.i.d. sample of (Y, X, X r ) satisfying (1). g(·) = E[Y |X ∗ = ·]
and the Lebesgue density f (·) of X ∗ are infinitely differentiable.
(2): $(\varepsilon, \varepsilon^r)$ are mutually independent and independent of $(Y, X^*)$, the distributions of $\varepsilon$ and $\varepsilon^r$ are identical, absolutely continuous with respect to the Lebesgue measure, and the characteristic function $f_\varepsilon^{\mathrm{ft}}$ is of the form
$$f_\varepsilon^{\mathrm{ft}}(t) = Ce^{-\mu|t|^{\gamma}} \quad \text{for all } t \in \mathbb{R},$$
where $C, \mu > 0$ and $\gamma$ is an even positive integer.
(3): $K$ is an infinite-order kernel. Also, $K^{\mathrm{ft}}$ is compactly supported on $[-1, 1]$, symmetric around zero, and bounded.
(4): $b_n \to 0$ and $n^{-1/2} b_n^{-2} e^{6\mu b_n^{-\gamma}}\log(b_n^{-1})^{-1/2} \to 0$ as $n \to \infty$.
Many of the same comments as for the ordinary smooth case apply to this setting. However,
the second part of Assumption (1) is more restrictive and appears to be necessary. As discussed
in Meister (2009), one can show that the class of infinitely differentiable functions still contains
'a comprehensive nonparametric class of densities' (p. 44), including, of course, Gaussian and
mixtures of Gaussians. For the regression function, all polynomials satisfy this restriction, as
well as circular functions, exponentials, and products or sums of such smooth functions. As-
sumption (2) is the conventional supersmooth assumption for the measurement error, with the
non-standard additional constraint on γ being even. Although this rules out the Cauchy dis-
tribution (where γ = 1), importantly, this still contains the canonical Gaussian distribution as
well as Gaussian mixtures. van Es and Gugushvili (2008) imposed a similar constraint, although
they restrict themselves further to γ = 2. Assumption (3) requires an infinite-order kernel func-
tion; these are often required in supersmooth deconvolution problems. Meister (2009) discussed
their construction and noted that the commonly used sinc kernel, K(x) = sin(x)/(πx), satisfies the
requirements. Assumption (4) requires the bandwidth to decay to zero at a logarithmic rate. In
particular, because we are using an infinite-order kernel, we can ignore concerns of the bias from
the nonparametric estimator and choose a bandwidth of at least $b_n = O\big((13\mu)^{1/\gamma}\log(n)^{-1/\gamma}\big)$
to satisfy this assumption.
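To illustrate these choices, the sketch below builds the known-error deconvolution kernel for Gaussian measurement error using the sinc kernel (K^ft(t) = 1{|t| ≤ 1}) and the logarithmic bandwidth rule just described; the error standard deviation and sample size are assumed values for the illustration.

```python
import numpy as np

# Deconvolution kernel K_dec(x) = (1/2pi) * int_{-1}^{1} e^{-itx}
# K^ft(t) / f_eps^ft(t/bn) dt, with the sinc kernel and Gaussian error
# f_eps^ft(t) = exp(-mu |t|^gamma), gamma = 2, mu = sigma^2 / 2.
sigma, n = 0.3, 1000                     # illustrative values
mu, gamma = sigma**2 / 2, 2
bn = (13 * mu) ** (1 / gamma) * np.log(n) ** (-1 / gamma)   # bandwidth rule above

def K_dec(x):
    t = np.linspace(-1.0, 1.0, 2001)
    # integrand is real by symmetry: cos(tx) * exp(+mu |t/bn|^gamma)
    g = np.cos(t * x) * np.exp(mu * np.abs(t / bn) ** gamma)
    dt = t[1] - t[0]
    return (g.sum() - 0.5 * (g[0] + g[-1])) * dt / (2 * np.pi)  # trapezoid rule

print(bn, K_dec(0.0), K_dec(1.5))
```

The exponentially growing factor exp(μ|t/b_n|^γ) inside the integral is precisely what makes supersmooth deconvolution delicate: the logarithmic bandwidth keeps this factor polynomial in n, which is what Assumption SS (4) controls.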
Based on these assumptions, our main result is as follows.
Theorem. Under Assumption OS or SS,
$$\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{d} N\big(0,\, 4\,\mathrm{Var}(r(X, Y))\big).$$
The most important aspect of this result is the √n convergence of the estimator. Before this
result, Powell, Stock and Stoker (1989) showed the same rate of convergence in the case of cor-
rectly measured regressors, and Fan (1995) confirmed this result for ordinary smooth error in the
regressors when the error distribution is known. The above theorem shows that the convergence
rate of these average derivative estimators does not change when measurement error is intro-
duced. In particular, it does not change in the severely ill-posed case of supersmooth error, nor
does it change when the measurement error distribution is estimated. Interestingly, as outlined
in the Appendix, the asymptotic variance depends on the symmetry of the measurement error
density. When the measurement error is symmetric around zero, remainder terms associated
with the estimation error of the measurement error characteristic function vanish, and the as-
ymptotic variance is the same as if the measurement error distribution is known; however, this
is not the case for asymmetric distributions.
In pointwise estimation and testing problems, √n convergence is typically not attained. For
example, Holzmann and Boysen (2006) showed that the integrated squared error of deconvolution
estimators has a fundamentally different asymptotic distribution in the face of supersmooth
measurement error in comparison to the case of ordinary smooth error. Similarly, Fan (1991)
showed that deconvolution estimators under supersmooth contamination attain a log(n) rate of
convergence, whereas ordinary smooth measurement error results in a polynomial rate of convergence
in n. In this paper, we show that this discontinuity in the properties of deconvolution estimators
facing supersmooth or ordinary smooth error does not continue to hold for averaged estimators.
As a by-product of the proof, we also establish the asymptotic distribution of Fan's (1995)
estimator for θ when the distribution of ε is known and supersmooth.
Corollary. Suppose Assumption SS holds true without the repeated measurement X r . Then the
estimator θ̃ defined by replacing K̂ in (5) with K satisfies
$$\sqrt{n}(\tilde{\theta} - \theta) \xrightarrow{d} N\big(0,\, 4\,\mathrm{Var}(r(X, Y))\big).$$
Appendix A. Proof of theorem (supersmooth case)
Since the arguments are similar, we first present a proof for the supersmooth case. In Section
C, we provide a proof for the ordinary smooth case by explaining in detail the parts of the proof
that differ from the supersmooth setting.
Let $\hat{\xi}(t) = \frac{1}{n}\sum_{l=1}^{n}\xi_l(t)$ for $\xi_l(t) = \cos(t(X_l - X_l^r))$, and $\xi(t) = |f_\varepsilon^{\mathrm{ft}}(t)|^2$. Note that $\hat{f}_\varepsilon^{\mathrm{ft}}(t) = |\hat{\xi}(t)|^{1/2}$ and $f_\varepsilon^{\mathrm{ft}}(t) = |\xi(t)|^{1/2}$. By expansions around $\hat{\xi}(t/b_n) = \xi(t/b_n)$, we obtain
where
$$A_1(x) = -\frac{1}{4\pi}\int e^{-itx} K^{\mathrm{ft}}(t)\,\frac{\hat{\xi}(t/b_n) - \xi(t/b_n)}{|\xi(t/b_n)|^{3/2}}\,dt, \qquad A_2(x) = \frac{i}{4\pi}\int e^{-itx}\, t K^{\mathrm{ft}}(t)\,\frac{\hat{\xi}(t/b_n) - \xi(t/b_n)}{|\xi(t/b_n)|^{3/2}}\,dt,$$
$$R_1(x) = -\frac{1}{4\pi}\int e^{-itx} K^{\mathrm{ft}}(t)\bigg\{\frac{1}{|\tilde{\xi}(t/b_n)|^{1/2}} - \frac{1}{|\xi(t/b_n)|^{1/2}}\bigg\}\frac{\hat{\xi}(t/b_n) - \xi(t/b_n)}{|\xi(t/b_n)|}\,dt - \frac{1}{2\pi}\int e^{-itx} K^{\mathrm{ft}}(t)\bigg\{\frac{1}{|\hat{\xi}(t/b_n)|^{1/2}} - \frac{1}{|\xi(t/b_n)|^{1/2}}\bigg\}\frac{|\hat{\xi}(t/b_n)|^{1/2} - |\xi(t/b_n)|^{1/2}}{|\xi(t/b_n)|^{1/2}}\,dt,$$
$$R_2(x) = \frac{i}{4\pi}\int e^{-itx}\, t K^{\mathrm{ft}}(t)\bigg\{\frac{1}{|\tilde{\xi}(t/b_n)|^{1/2}} - \frac{1}{|\xi(t/b_n)|^{1/2}}\bigg\}\frac{\hat{\xi}(t/b_n) - \xi(t/b_n)}{|\xi(t/b_n)|}\,dt + \frac{i}{2\pi}\int e^{-itx}\, t K^{\mathrm{ft}}(t)\bigg\{\frac{1}{|\hat{\xi}(t/b_n)|^{1/2}} - \frac{1}{|\xi(t/b_n)|^{1/2}}\bigg\}\frac{|\hat{\xi}(t/b_n)|^{1/2} - |\xi(t/b_n)|^{1/2}}{|\xi(t/b_n)|^{1/2}}\,dt,$$
for some $\tilde{\xi}(t/b_n) \in (\xi(t/b_n), \hat{\xi}(t/b_n))$. Thus, we can decompose
$$\hat{\theta} = -\frac{2}{n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int \hat{K}'\Big(\frac{x - X_j}{b_n}\Big)\hat{K}\Big(\frac{x - X_k}{b_n}\Big)\,dx = S + T_1 + \cdots + T_6, \qquad (6)$$
where
$$S = -\frac{2}{n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int K'\Big(\frac{x - X_j}{b_n}\Big)K\Big(\frac{x - X_k}{b_n}\Big)\,dx - \frac{2}{n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int K'\Big(\frac{x - X_j}{b_n}\Big)A_1\Big(\frac{x - X_k}{b_n}\Big)\,dx - \frac{2}{n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int A_2\Big(\frac{x - X_j}{b_n}\Big)K\Big(\frac{x - X_k}{b_n}\Big)\,dx,$$
$$T_1 = -\frac{2}{n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int K'\Big(\frac{x - X_j}{b_n}\Big)R_1\Big(\frac{x - X_k}{b_n}\Big)\,dx, \qquad T_2 = -\frac{2}{n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int R_2\Big(\frac{x - X_j}{b_n}\Big)K\Big(\frac{x - X_k}{b_n}\Big)\,dx,$$
$$T_3 = -\frac{2}{n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int A_2\Big(\frac{x - X_j}{b_n}\Big)A_1\Big(\frac{x - X_k}{b_n}\Big)\,dx, \qquad T_4 = -\frac{2}{n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int R_2\Big(\frac{x - X_j}{b_n}\Big)A_1\Big(\frac{x - X_k}{b_n}\Big)\,dx,$$
$$T_5 = -\frac{2}{n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int A_2\Big(\frac{x - X_j}{b_n}\Big)R_1\Big(\frac{x - X_k}{b_n}\Big)\,dx, \qquad T_6 = -\frac{2}{n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int R_2\Big(\frac{x - X_j}{b_n}\Big)R_1\Big(\frac{x - X_k}{b_n}\Big)\,dx.$$
We first show that
$$T_1, \ldots, T_6 = o_p(n^{-1/2}). \qquad (7)$$
For $T_2$, we decompose $T_2 = T_{2,1} + T_{2,2}$, where
$$T_{2,1} = -\frac{i}{2\pi n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int\!\!\int e^{-it\frac{x - X_j}{b_n}}\, t K^{\mathrm{ft}}(t)\{|\tilde{\xi}(t/b_n)|^{-1/2} - |\xi(t/b_n)|^{-1/2}\}|\xi(t/b_n)|^{-1}\{\hat{\xi}(t/b_n) - \xi(t/b_n)\}\,dt\, K\Big(\frac{x - X_k}{b_n}\Big)\,dx,$$
$$T_{2,2} = -\frac{i}{\pi n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int\!\!\int e^{-it\frac{x - X_j}{b_n}}\, t K^{\mathrm{ft}}(t)\{|\hat{\xi}(t/b_n)|^{-1/2} - |\xi(t/b_n)|^{-1/2}\}|\xi(t/b_n)|^{-1/2}\{|\hat{\xi}(t/b_n)|^{1/2} - |\xi(t/b_n)|^{1/2}\}\,dt\, K\Big(\frac{x - X_k}{b_n}\Big)\,dx.$$
For $T_{2,1}$, we have
$$|n^{1/2} T_{2,1}| = \Bigg|\frac{1}{2\pi n^{3/2} b_n^2}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int e^{it\frac{X_j - X_k}{b_n}}\, t K^{\mathrm{ft}}(t) K^{\mathrm{ft}}(-t)\{|\tilde{\xi}(t/b_n)|^{-1/2} - |\xi(t/b_n)|^{-1/2}\}|\xi(t/b_n)|^{-3/2}\{\hat{\xi}(t/b_n) - \xi(t/b_n)\}\,dt\Bigg|$$
$$= O_p\bigg(n^{1/2} b_n^{-2}\sup_{|t|\le b_n^{-1}}\{|\tilde{\xi}(t)|^{-1/2} - |\xi(t)|^{-1/2}\}|\xi(t)|^{-3/2}\{\hat{\xi}(t) - \xi(t)\}\bigg) = O_p\Big(n^{1/2} b_n^{-2} e^{4\mu b_n^{-\gamma}}\varrho_n^2\Big) = o_p(1),$$
where the first equality follows from a change of variables, the second equality follows from $|e^{it(X_j - X_k)/b_n}| = 1$, $\frac{1}{n}\sum_{k=1}^{n}|Y_k| = O_p(1)$, and $\int|t K^{\mathrm{ft}}(t) K^{\mathrm{ft}}(-t)|\,dt < \infty$ (by Assumption SS (3)), the third equality follows from the definition of $\tilde{\xi}(t)$, Assumption SS (2), and Lemma 1, and the last equality follows from Assumption SS (4). A similar argument yields $T_{2,2} = o_p(n^{-1/2})$, and thus $T_2 = o_p(n^{-1/2})$. Also, using similar arguments as for $T_2$ gives $T_1 = o_p(n^{-1/2})$ and $T_3 = o_p(n^{-1/2})$.
For $T_4$, we decompose $T_4 = T_{4,1} + T_{4,2}$, where
$$T_{4,1} = \frac{i}{8\pi^2 n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int\bigg[\int e^{-it\frac{x - X_j}{b_n}}\, t K^{\mathrm{ft}}(t)\{|\tilde{\xi}(t/b_n)|^{-1/2} - |\xi(t/b_n)|^{-1/2}\}|\xi(t/b_n)|^{-1}\{\hat{\xi}(t/b_n) - \xi(t/b_n)\}\,dt\bigg]\bigg[\int e^{-it\frac{x - X_k}{b_n}} K^{\mathrm{ft}}(t)|\xi(t/b_n)|^{-3/2}\{\hat{\xi}(t/b_n) - \xi(t/b_n)\}\,dt\bigg]\,dx,$$
$$T_{4,2} = \frac{i}{4\pi^2 n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int\bigg[\int e^{-it\frac{x - X_j}{b_n}}\, t K^{\mathrm{ft}}(t)\{|\hat{\xi}(t/b_n)|^{-1/2} - |\xi(t/b_n)|^{-1/2}\}|\xi(t/b_n)|^{-1/2}\{|\hat{\xi}(t/b_n)|^{1/2} - |\xi(t/b_n)|^{1/2}\}\,dt\bigg]\bigg[\int e^{-it\frac{x - X_k}{b_n}} K^{\mathrm{ft}}(t)|\xi(t/b_n)|^{-3/2}\{\hat{\xi}(t/b_n) - \xi(t/b_n)\}\,dt\bigg]\,dx.$$
For $T_{4,1}$, we have
$$|n^{1/2} T_{4,1}| = O_p\Big(n^{1/2} b_n^{-2} e^{5\mu b_n^{-\gamma}}\varrho_n^3\Big) = o_p(1),$$
where this order follows from a change of variables, $|e^{it(X_j - X_k)/b_n}| = 1$, $\frac{1}{n}\sum_{k=1}^{n}|Y_k| = O_p(1)$, $\int|t K^{\mathrm{ft}}(t) K^{\mathrm{ft}}(-t)|\,dt < \infty$ (by Assumption SS (3)), the definition of $\tilde{\xi}(t)$, Assumption SS (2), Lemma 1, and finally Assumption SS (4). A similar argument yields $T_{4,2} = o_p(n^{-1/2})$, and thus $T_4 = o_p(n^{-1/2})$. Also, similar arguments as used for $T_4$ imply $T_5 = o_p(n^{-1/2})$.
For $T_6$, we decompose $T_6 = T_{6,1} + T_{6,2} + T_{6,3} + T_{6,4}$, where
$$T_{6,1} = \frac{i}{8\pi^2 n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int\bigg[\int e^{-it\frac{x - X_j}{b_n}}\, t K^{\mathrm{ft}}(t)\{|\tilde{\xi}(t/b_n)|^{-1/2} - |\xi(t/b_n)|^{-1/2}\}|\xi(t/b_n)|^{-1}\{\hat{\xi}(t/b_n) - \xi(t/b_n)\}\,dt\bigg]\bigg[\int e^{-it\frac{x - X_k}{b_n}} K^{\mathrm{ft}}(t)\{|\tilde{\xi}(t/b_n)|^{-1/2} - |\xi(t/b_n)|^{-1/2}\}|\xi(t/b_n)|^{-1}\{\hat{\xi}(t/b_n) - \xi(t/b_n)\}\,dt\bigg]\,dx,$$
$$T_{6,2} = \frac{i}{4\pi^2 n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int\bigg[\int e^{-it\frac{x - X_j}{b_n}}\, t K^{\mathrm{ft}}(t)\{|\hat{\xi}(t/b_n)|^{-1/2} - |\xi(t/b_n)|^{-1/2}\}|\xi(t/b_n)|^{-1/2}\{|\hat{\xi}(t/b_n)|^{1/2} - |\xi(t/b_n)|^{1/2}\}\,dt\bigg]\bigg[\int e^{-it\frac{x - X_k}{b_n}} K^{\mathrm{ft}}(t)\{|\tilde{\xi}(t/b_n)|^{-1/2} - |\xi(t/b_n)|^{-1/2}\}|\xi(t/b_n)|^{-1}\{\hat{\xi}(t/b_n) - \xi(t/b_n)\}\,dt\bigg]\,dx,$$
$$T_{6,3} = \frac{i}{4\pi^2 n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int\bigg[\int e^{-it\frac{x - X_j}{b_n}}\, t K^{\mathrm{ft}}(t)\{|\tilde{\xi}(t/b_n)|^{-1/2} - |\xi(t/b_n)|^{-1/2}\}|\xi(t/b_n)|^{-1}\{\hat{\xi}(t/b_n) - \xi(t/b_n)\}\,dt\bigg]\bigg[\int e^{-it\frac{x - X_k}{b_n}} K^{\mathrm{ft}}(t)\{|\hat{\xi}(t/b_n)|^{-1/2} - |\xi(t/b_n)|^{-1/2}\}|\xi(t/b_n)|^{-1/2}\{|\hat{\xi}(t/b_n)|^{1/2} - |\xi(t/b_n)|^{1/2}\}\,dt\bigg]\,dx,$$
$$T_{6,4} = \frac{i}{2\pi^2 n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int\bigg[\int e^{-it\frac{x - X_j}{b_n}}\, t K^{\mathrm{ft}}(t)\{|\hat{\xi}(t/b_n)|^{-1/2} - |\xi(t/b_n)|^{-1/2}\}|\xi(t/b_n)|^{-1/2}\{|\hat{\xi}(t/b_n)|^{1/2} - |\xi(t/b_n)|^{1/2}\}\,dt\bigg]\bigg[\int e^{-it\frac{x - X_k}{b_n}} K^{\mathrm{ft}}(t)\{|\hat{\xi}(t/b_n)|^{-1/2} - |\xi(t/b_n)|^{-1/2}\}|\xi(t/b_n)|^{-1/2}\{|\hat{\xi}(t/b_n)|^{1/2} - |\xi(t/b_n)|^{1/2}\}\,dt\bigg]\,dx.$$
Since $T_{6,2}$ and $T_{6,3}$ are cross-product terms, it is enough to focus on $T_{6,1}$ and $T_{6,4}$. For $T_{6,1}$, we have
$$|n^{1/2} T_{6,1}| = O_p\bigg(n^{1/2} b_n^{-2}\sup_{|t|\le b_n^{-1}}\{|\tilde{\xi}(t)|^{-1/2} - |\xi(t)|^{-1/2}\}^2|\xi(t)|^{-2}\{\hat{\xi}(t) - \xi(t)\}^2\bigg) = O_p\Big(n^{1/2} b_n^{-2} e^{6\mu b_n^{-\gamma}}\varrho_n^4\Big) = o_p(1),$$
where the first equality follows from a change of variables, $|e^{it(X_j - X_k)/b_n}| = 1$, $\frac{1}{n}\sum_{k=1}^{n}|Y_k| = O_p(1)$, and $\int|t K^{\mathrm{ft}}(t) K^{\mathrm{ft}}(-t)|\,dt < \infty$ (by Assumption SS (3)), the second equality follows from the definition of $\tilde{\xi}(t)$, Assumption SS (2), and Lemma 1, and the last equality follows from Assumption SS (4). A similar argument yields $T_{6,4} = o_p(n^{-1/2})$, and thus $T_6 = o_p(n^{-1/2})$.
Combining these results, we obtain (7).
We now consider the term $S$ in (6). Let $d_j = (Y_j, X_j, \xi_j)$ and decompose $S = n^{-2}(n-1)(n-2)U + S_1 + \cdots + S_4$, where $U$ is a third-order U-statistic whose symmetric kernel $p_n(d_j, d_k, d_l)$ is obtained by symmetrizing
$$q_n(d_j, d_k, d_l) = -\frac{1}{3b_n^3}\Bigg[\int K'\Big(\frac{x - X_j}{b_n}\Big) Y_k K\Big(\frac{x - X_k}{b_n}\Big)\,dx + \frac{i}{4\pi}\int\!\!\int e^{-it\frac{x - X_j}{b_n}}\, Y_k K\Big(\frac{x - X_k}{b_n}\Big)\,dx\,\frac{\xi_l(t/b_n) - E[\xi_l(t/b_n)]}{|\xi(t/b_n)|^{3/2}}\, t K^{\mathrm{ft}}(t)\,dt - \frac{1}{4\pi}\int\!\!\int K'\Big(\frac{x - X_j}{b_n}\Big) Y_k e^{-it\frac{x - X_k}{b_n}}\,dx\,\frac{\xi_l(t/b_n) - E[\xi_l(t/b_n)]}{|\xi(t/b_n)|^{3/2}}\, K^{\mathrm{ft}}(t)\,dt\Bigg].$$
We show that
$$S_1, \ldots, S_4 = o_p(n^{-1/2}). \qquad (8)$$
For $S_1$, we have
$$|n^{1/2} S_1| = O(n^{-5/2} b_n^{-3})\Bigg\{\Bigg|\sum_{j=1}^{n}\sum_{k=j+1}^{n}\int K'\Big(\frac{x - X_j}{b_n}\Big) Y_k K\Big(\frac{x - X_k}{b_n}\Big)\,dx\Bigg| + \Bigg|\sum_{j=1}^{n}\sum_{k=j+1}^{n}\frac{i}{4\pi}\int\!\!\int e^{-it\frac{x - X_j}{b_n}}\, Y_k K\Big(\frac{x - X_k}{b_n}\Big)\,dx\,\frac{\xi_l(t/b_n) - E[\xi_l(t/b_n)]}{|\xi(t/b_n)|^{3/2}}\, t K^{\mathrm{ft}}(t)\,dt\Bigg| + \Bigg|\sum_{j=1}^{n}\sum_{k=j+1}^{n}\frac{1}{4\pi}\int\!\!\int K'\Big(\frac{x - X_j}{b_n}\Big) Y_k e^{-it\frac{x - X_k}{b_n}}\,dx\,\frac{\xi_l(t/b_n) - E[\xi_l(t/b_n)]}{|\xi(t/b_n)|^{3/2}}\, K^{\mathrm{ft}}(t)\,dt\Bigg|\Bigg\} \equiv S_{1,1} + S_{1,2} + S_{1,3}.$$
For $S_{1,1}$, we have
$$S_{1,1} = O(n^{-5/2} b_n^{-2})\Bigg|\sum_{j=1}^{n}\sum_{k=j+1}^{n} Y_k\int\!\!\int\!\!\int \frac{1}{b_n} e^{-i(s+t)x/b_n}\, e^{i\frac{tX_k + sX_j}{b_n}}\, is\,\frac{K^{\mathrm{ft}}(s)}{f_\varepsilon^{\mathrm{ft}}(s/b_n)}\,\frac{K^{\mathrm{ft}}(t)}{f_\varepsilon^{\mathrm{ft}}(t/b_n)}\,dx\,ds\,dt\Bigg| = O_p\Big(n^{-1/2} b_n^{-2} e^{2\mu b_n^{-\gamma}}\Big) = o_p(1),$$
where the second equality follows from a change of variables, $|e^{i(tX_k + sX_j)/b_n}| = 1$, $\frac{1}{n}\sum_{k=1}^{n}|Y_k| = O_p(1)$, and Assumption SS (3), and the last equality follows from Assumption SS (4). For $S_{1,2}$, a similar argument as used for $T_3$ can be used to show
$$S_{1,2} = O_p\Big(n^{-1/2} b_n^{-2} e^{4\mu b_n^{-\gamma}}\varrho_n\Big) = o_p(1).$$
Furthermore, the same arguments can be used to show S2 , S3 , S4 = op (n−1/2 ).
Pn
We now analyze the main term U . Let rn (dj ) = E[pn (dj , dk , dl )|dj ] and Û = θ+ n3 j=1 {rn (dj )−
then it holds
n
3X
U =θ+ {rn (dj ) − E[rn (dj )]} + op (n−1/2 ). (10)
n
j=1
To this end, decompose
$$E[p_n(d_j, d_k, d_l)^2] \le \frac{1}{3b_n^6} E\Bigg[\bigg(\int K'\Big(\frac{x - X_j}{b_n}\Big) Y_k K\Big(\frac{x - X_k}{b_n}\Big)\,dx\bigg)^2\Bigg] + \frac{1}{3b_n^6} E\Bigg[\bigg(\frac{i}{4\pi}\int\!\!\int e^{-it\frac{x - X_j}{b_n}}\, Y_k K\Big(\frac{x - X_k}{b_n}\Big)\,dx\,\frac{\xi_l(t/b_n) - E[\xi_l(t/b_n)]}{|\xi(t/b_n)|^{3/2}}\, t K^{\mathrm{ft}}(t)\,dt\bigg)^2\Bigg] + \frac{1}{3b_n^6} E\Bigg[\bigg(\frac{1}{4\pi}\int\!\!\int K'\Big(\frac{x - X_j}{b_n}\Big) Y_k e^{-it\frac{x - X_k}{b_n}}\,dx\,\frac{\xi_l(t/b_n) - E[\xi_l(t/b_n)]}{|\xi(t/b_n)|^{3/2}}\, K^{\mathrm{ft}}(t)\,dt\bigg)^2\Bigg] \equiv P_1 + P_2 + P_3.$$
For $P_1$,
$$P_1 = \frac{1}{3b_n^4}\int\!\!\int\!\!\int\!\!\int\bigg(\int K'(z) K\Big(z + \frac{s_j + t_j - s_k - t_k}{b_n}\Big)\,dz\bigg)^2 E[Y^2|X^* = s_k]\, f(s_k) f(s_j) f_\varepsilon(t_k) f_\varepsilon(t_j)\,ds_k\,ds_j\,dt_k\,dt_j$$
$$= \frac{1}{12\pi^2 b_n^4}\int\!\!\int\bigg\{\int\!\!\int e^{-i(w_1 + w_2)\frac{s_j - s_k}{b_n}}\, E[Y^2|X^* = s_k] f(s_k) f(s_j)\,ds_k\,ds_j\bigg\}\frac{w_1 w_2 |K^{\mathrm{ft}}(w_1)|^2 |K^{\mathrm{ft}}(w_2)|^2}{|f_\varepsilon^{\mathrm{ft}}(w_1/b_n)|^2 |f_\varepsilon^{\mathrm{ft}}(w_2/b_n)|^2}\,dw_1\,dw_2 = O\Big(b_n^{-4} e^{4\mu b_n^{-\gamma}}\Big),$$
where the first equality follows by the change of variables $z = (x - s_j - t_j)/b_n$, the second equality follows by Lemma 2, and the last equality follows from Assumption SS (2). Thus Assumption SS (4) guarantees $P_1 = o(n)$.
For $P_2$, Lemma 2 implies
$$\int\bigg[\frac{i}{4\pi}\int t e^{-itz}\,\frac{K^{\mathrm{ft}}(t)}{f_\varepsilon^{\mathrm{ft}}(t/b_n)^{3}}\{\xi_l(t/b_n) - E[\xi_l(t/b_n)]\}\,dt\bigg] K(z - c)\,dz = \frac{i}{4\pi}\int\frac{w e^{-iwc}|K^{\mathrm{ft}}(w)|^2}{|f_\varepsilon^{\mathrm{ft}}(w/b_n)|^4}\{\xi_l(w/b_n) - E[\xi_l(w/b_n)]\}\,dw.$$
Then we can write
$$P_2 = \frac{1}{3b_n^6} E\Bigg[Y_k^2\bigg(\frac{i}{4\pi}\int\!\!\int t e^{-it\frac{x - X_j}{b_n}}\,\frac{K^{\mathrm{ft}}(t)}{f_\varepsilon^{\mathrm{ft}}(t/b_n)^{3}}\{\xi_l(t/b_n) - E[\xi_l(t/b_n)]\}\,dt\, K\Big(\frac{x - X_k}{b_n}\Big)\,dx\bigg)^2\Bigg]$$
$$= \frac{1}{3b_n^6}\int\!\cdots\!\int E\Bigg[\bigg(\frac{i}{4\pi}\int\!\!\int t e^{-it\frac{x - s_j - u_j}{b_n}}\,\frac{K^{\mathrm{ft}}(t)}{f_\varepsilon^{\mathrm{ft}}(t/b_n)^{3}}\{\xi_l(t/b_n) - E[\xi_l(t/b_n)]\}\,dt\, K\Big(\frac{x - s_k - u_k}{b_n}\Big)\,dx\bigg)^2\Bigg] E[Y^2|X^* = s_k]\, f(s_k) f(s_j) f_\varepsilon(u_k) f_\varepsilon(u_j)\,ds_k\,ds_j\,du_k\,du_j$$
$$= \frac{1}{12\pi^2 b_n^4}\int\!\!\int\bigg\{\int\!\!\int e^{-i(w_1 + w_2)\frac{s_j - s_k}{b_n}}\, E[Y^2|X^* = s_k] f(s_k) f(s_j)\,ds_k\,ds_j\bigg\}\frac{w_1 w_2 |K^{\mathrm{ft}}(w_1)|^2 |K^{\mathrm{ft}}(w_2)|^2}{|f_\varepsilon^{\mathrm{ft}}(w_1/b_n)|^6 |f_\varepsilon^{\mathrm{ft}}(w_2/b_n)|^6}\, E[\{\xi_l(w_1/b_n) - E[\xi_l(w_1/b_n)]\}\{\xi_l(w_2/b_n) - E[\xi_l(w_2/b_n)]\}]\,dw_1\,dw_2$$
$$= O\Big(b_n^{-4} e^{12\mu b_n^{-\gamma}}\log(b_n^{-1})^{-1}\Big) = o(n),$$
where the third equality follows from a similar argument as for $P_1$ combined with Kato and Sasaki (2018, Lemma 4) to bound $\{\xi_l(w_1/b_n) - E[\xi_l(w_1/b_n)]\}$, and the last equality follows from Assumption SS (4). The order of $P_3$ can be shown in an almost identical manner, and we obtain (9).
Combining (6), (7), (8), (10), and a direct calculation to characterize $r_n(d_j) = E[p_n(d_j, d_k, d_l)|d_j]$, it follows that
$$\sqrt{n}(\hat{\theta} - \theta) = \frac{3}{\sqrt{n}}\sum_{j=1}^{n}\{r_n(d_j) - E[r_n(d_j)]\} + o_p(1) = \frac{2}{\sqrt{n}\, b_n^3}\sum_{j=1}^{n}\{\eta_j - E[\eta_j]\} - \frac{1}{2\pi\sqrt{n}\, b_n^3}\sum_{j=1}^{n}\int\Delta(t)\,\frac{\xi_j(t/b_n) - E[\xi_j(t/b_n)]}{|\xi(t/b_n)|^{3/2}}\, K^{\mathrm{ft}}(t)\,dt + o_p(1), \qquad (11)$$
where
$$\eta_j = \int K\Big(\frac{x - X_j}{b_n}\Big)\bigg\{E\Big[Y K'\Big(\frac{x - X^*}{b_n}\Big)\Big] - Y_j E\Big[K'\Big(\frac{x - X^*}{b_n}\Big)\Big]\bigg\}\,dx,$$
$$\Delta(t) = \int\bigg\{it\, E\Big[e^{-it\frac{x - X}{b_n}}\Big] E\Big[Y K\Big(\frac{x - X^*}{b_n}\Big)\Big] - E\Big[K'\Big(\frac{x - X^*}{b_n}\Big)\Big] E\Big[Y e^{-it\frac{x - X}{b_n}}\Big]\bigg\}\,dx.$$
For the first term in (11), note that
$$\frac{\eta_j}{b_n^3} = \frac{1}{b_n^2}\int K(z)\{q_1(X_j + b_n z) - Y_j q_2(X_j + b_n z)\}\,dz = \sum_{l=0}^{\infty}\frac{(-1)^l b_n^l}{l!}\int\!\!\int K(z) K(w)(w - z)^l\,dw\,dz\,\{G^{(l+1)}(X_j) - Y_j f^{(l+1)}(X_j)\}$$
$$= \sum_{l=0}^{\infty}\frac{(-1)^l b_n^l}{l!}\int K(z) z^l\,dz\,\{G^{(l+1)}(X_j) - Y_j f^{(l+1)}(X_j)\} = \sum_{h=0}^{\infty}\frac{\mu^h}{i^{h\gamma} C h!}\{G^{(h\gamma+1)}(X_j) - Y_j f^{(h\gamma+1)}(X_j)\},$$
where the first equality follows by $q_1(x) = E[Y K'((x - X^*)/b_n)]$ and $q_2(x) = E[K'((x - X^*)/b_n)]$ and the change of variable $z = (x - X_j)/b_n$, the second equality follows by Lemma 4, the third equality follows from Assumption SS (3), and the last equality follows by Lemma 5.
Let $\Xi_j(t) = \frac{\xi_j(t/b_n) - E[\xi_j(t/b_n)]}{|\xi(t/b_n)|}$. For the second term in (11), we have
$$\int\Delta(t)\,\frac{\xi_j(t/b_n) - E[\xi_j(t/b_n)]}{|\xi(t/b_n)|^{3/2}}\, K^{\mathrm{ft}}(t)\,dt$$
$$= i\int t f_\varepsilon^{\mathrm{ft}}(t/b_n)\bigg\{\int\!\!\int e^{-itx/b_n} K\Big(\frac{x - x^*}{b_n}\Big) G(x^*)\,dx\,dx^*\bigg\}\Xi_j(t) K^{\mathrm{ft}}(t)\,dt - \int G^{\mathrm{ft}}(t/b_n)\bigg\{\int\!\!\int e^{-itx/b_n} K'\Big(\frac{x - x^*}{b_n}\Big) f(x^*)\,dx\,dx^*\bigg\}\Xi_j(t) K^{\mathrm{ft}}(t)\,dt$$
$$= i\int t f_\varepsilon^{\mathrm{ft}}(t/b_n)\bigg\{\int\!\!\int e^{-itx/b_n} K\Big(\frac{x - x^*}{b_n}\Big) G(x^*)\,dx\,dx^*\bigg\}\Xi_j(t) K^{\mathrm{ft}}(t)\,dt - i\int t\, G^{\mathrm{ft}}(t/b_n)\bigg\{\int\!\!\int e^{-itx/b_n} K\Big(\frac{x - x^*}{b_n}\Big) f(x^*)\,dx\,dx^*\bigg\}\Xi_j(t) K^{\mathrm{ft}}(t)\,dt$$
$$= i b_n\int t K^{\mathrm{ft}}(t) f_\varepsilon^{\mathrm{ft}}(t/b_n) K^{\mathrm{ft}}(-t) G^{\mathrm{ft}}(-t/b_n)\Xi_j(t)\,dt - i b_n\int t K^{\mathrm{ft}}(-t) f_\varepsilon^{\mathrm{ft}}(-t/b_n) K^{\mathrm{ft}}(t) G^{\mathrm{ft}}(t/b_n)\Xi_j(t)\,dt = 0,$$
where the first equality follows from the definition of $\Delta(t)$, the second equality follows by integration by parts, that is, $\int e^{-itx/b_n} K'\big(\frac{x - x^*}{b_n}\big)\,dx = it\int e^{-itx/b_n} K\big(\frac{x - x^*}{b_n}\big)\,dx$, the third equality follows from a change of variables, and the last equality follows from symmetry of $\xi_j(t)$ and $\xi(t)$
(which implies symmetry of $\Xi_j(t)$).
Therefore, the conclusion follows by the central limit theorem.
Appendix B. Lemmas
$$\bigg|\int K'(z) K(z)\,dz\bigg| = \frac{1}{2\pi}\bigg|\int\frac{w|K^{\mathrm{ft}}(w)|^2}{|f_\varepsilon^{\mathrm{ft}}(w/b_n)|^2}\,dw\bigg| = O\bigg(\Big(\inf_{|w|\le b_n^{-1}}|f_\varepsilon^{\mathrm{ft}}(w)|\Big)^{-2}\bigg),$$
where the second equality follows from compactness of the support of $K^{\mathrm{ft}}$ (Assumption SS (3)). The conclusion follows by Assumption SS (2).
Lemma 4. Under Assumption SS (1) and (3), it holds
$$E\bigg[Y_k K'\Big(z + \frac{X_j - X_k^*}{b_n}\Big)\bigg| X_j\bigg] = \sum_{l=0}^{\infty}\frac{(-1)^l b_n^{l+2} G^{(l+1)}(X_j)}{l!}\int K(w)(w - z)^l\,dw,$$
$$E\bigg[K'\Big(z + \frac{X_j - X_k^*}{b_n}\Big)\bigg| X_j\bigg] = \sum_{l=0}^{\infty}\frac{(-1)^l b_n^{l+2} f^{(l+1)}(X_j)}{l!}\int K(w)(w - z)^l\,dw.$$
For the first statement, the expectation equals
$$\sum_{l=0}^{\infty}\frac{(-1)^l b_n^{l+2} G^{(l+1)}(X_j)}{l!}\int K(w)(w - z)^l\,dw,$$
where the second equality follows by the change of variable $w = z + (X_j - s)/b_n$, the third equality follows by integration by parts, and the fourth equality follows by an expansion of $G'(X_j - b_n(w - z))$ around $X_j$. The second statement can be proved by similar arguments.
Lemma 5. Under Assumption SS (2) and (3), it holds
$$\int K(z) z^p\,dz = \begin{cases}\dfrac{\mu^{p/\gamma}\, p!}{b_n^{p}\, i^{p}\, C\,(p/\gamma)!} & \text{for } p = h\gamma \text{ with } h = 0, 1, \ldots,\\[4pt] 0 & \text{for other positive integers,}\end{cases}$$
where the first equality follows by Assumption SS (2) and $K^{\mathrm{ft}}(t) = K^{\mathrm{ft}}(-t)$, the second equality follows by $e^u = \sum_{h=0}^{\infty} u^h/h!$, the third equality follows by $(K^{(l)})^{\mathrm{ft}}(t) = (-it)^l K^{\mathrm{ft}}(t)$ (see, e.g., Lemma A.6 of Meister, 2009), and the fourth equality follows by $(K^{(h\gamma)})^{\mathrm{ft}}(-t) = (K^{(h\gamma)})^{\mathrm{ft}}(t)$, which is from $K^{\mathrm{ft}}(t) = K^{\mathrm{ft}}(-t)$, $(K^{(l)})^{\mathrm{ft}}(t) = (-it)^l K^{\mathrm{ft}}(t)$, and the assumption that $\gamma$ is even. Thus, we have
$$\int K(z) z^p\,dz = \sum_{h=0}^{\infty}\frac{\mu^h}{C h! (-i b_n)^{h\gamma}}\int z^p K^{(h\gamma)}(z)\,dz,$$
and the conclusion follows by Assumption SS (3) and integration by parts.
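The Fourier-derivative identity used above is easy to verify numerically under the convention f^ft(t) = ∫e^{itx}f(x)dx; the check below uses a Gaussian test kernel, which is an assumption for the illustration only (the paper's K is an infinite-order kernel).

```python
import numpy as np

# Numerical check of (K^(l))^ft(t) = (-it)^l K^ft(t) for l = 1, with a
# Gaussian test kernel and the convention f^ft(t) = int e^{itx} f(x) dx.
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
K = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # Gaussian density as test kernel
K1 = -x * K                                   # its first derivative

t = 1.3
lhs = np.sum(K1 * np.exp(1j * t * x)) * dx               # (K')^ft(t)
rhs = (-1j * t) * np.sum(K * np.exp(1j * t * x)) * dx    # (-it) K^ft(t)
print(abs(lhs - rhs))                         # ~ 0 up to quadrature error
```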
Appendix C. Proof of theorem (ordinary smooth case)
The steps in this proof are the same as those for the supersmooth case; as such, we only explain
the parts of the proof that differ. Furthermore, in the proof of the supersmooth case we endeavoured
to obtain expressions in terms of $f_\varepsilon^{\mathrm{ft}}$ wherever possible. This allows us to skip to the final step in
each asymptotic argument and requires us only to input the relevant form of $f_\varepsilon^{\mathrm{ft}}$. This proof
also leverages much of the work of Fan (1995) but extends it by allowing for an estimated
measurement error density.
As in the proof of the supersmooth case, we have
$$\hat{\theta} = -\frac{2}{n^2 b_n^3}\sum_{j=1}^{n}\sum_{k=1}^{n} Y_k\int \hat{K}'\Big(\frac{x - X_j}{b_n}\Big)\hat{K}\Big(\frac{x - X_k}{b_n}\Big)\,dx = S + T_1 + \cdots + T_6.$$
Then we have $T_2 = o_p(n^{-1/2})$ by Lemma 1 and Assumption OS (2); the rest of $T_1, T_3, \ldots, T_6$ are shown to be of order $o_p(n^{-1/2})$ in a similar way.
Again, decompose S = n−2 (n − 1)(n − 2)U + S1 + · · · + S4 , where all objects are defined in
the proof of the supersmooth case. We can show the asymptotic negligibility of S1 , . . . , S4 as
follows. We again decompose |n1/2 S1 | = S1,1 + S1,2 + S1,3 . To bound S1,1 , we write
$$S_{1,1} = O_p\bigg(n^{-1/2} b_n^{-2}\Big(\inf_{|t|\le b_n^{-1}}|f_\varepsilon^{\mathrm{ft}}(t)|\Big)^{-2}\bigg) = o_p(1),$$
where the second equality follows from Assumption OS (2) and (4). Recall from the proof of the supersmooth case that
$$S_{1,2} = O_p\bigg(n^{-1/2} b_n^{-2}\Big(\sup_{|t|\le b_n^{-1}}|\hat{f}_\varepsilon^{\mathrm{ft}}(t) - f_\varepsilon^{\mathrm{ft}}(t)|\Big)\Big(\inf_{|t|\le b_n^{-1}}|f_\varepsilon^{\mathrm{ft}}(t)|^4\Big)^{-1}\bigg) = o_p(1).$$
The asymptotic negligibility of S1,3 can be shown in an almost identical way. The same arguments
can also be used to show S2 , S3 , S4 = op (n−1/2 ).
As in the supersmooth case, we also need to show $E[p_n(d_j, d_k, d_l)^2] = o(n)$ in order to write $U = \theta + \frac{3}{n}\sum_{j=1}^{n}\{r_n(d_j) - E[r_n(d_j)]\} + o_p(n^{-1/2})$. We begin by decomposing $E[p_n(d_j, d_k, d_l)^2] = P_1 + P_2 + P_3$, where these objects are defined in the supersmooth proof. For $P_1$ and $P_2$,
$$P_1 = O\bigg(b_n^{-4}\Big(\inf_{|w|\le b_n^{-1}}|f_\varepsilon^{\mathrm{ft}}(w)|^2\Big)^{-2}\bigg) = o(n), \qquad P_2 = O\bigg(b_n^{-4}\Big(\inf_{|w|\le b_n^{-1}}|f_\varepsilon^{\mathrm{ft}}(w)|^6\Big)^{-2}\log(b_n^{-1})^{-2}\bigg) = o(n),$$
by Assumption OS (2) and (4). The order of $P_3$ can be shown in an almost identical manner.
Then, it follows that
$$\sqrt{n}(\hat{\theta} - \theta) = \frac{3}{\sqrt{n}}\sum_{j=1}^{n}\{r_n(d_j) - E[r_n(d_j)]\} + o_p(1),$$
and the remainder of the proof for the supersmooth case applies, as it does not depend on the form of $f_\varepsilon^{\mathrm{ft}}$.
References
[1] Ahn, H. and J. L. Powell (1993) Semiparametric estimation of censored selection models with a nonparametric
selection mechanism, Journal of Econometrics, 58, 3-29.
[2] Blundell, R., Duncan, A. and K. Pendakur (1998) Semiparametric estimation and consumer demand, Journal
of Applied Econometrics, 13, 435-461.
[3] Das, M., Newey, W. K. and F. Vella (2003) Nonparametric estimation of sample selection models, Review of
Economic Studies, 70, 33-58.
[4] Dattner, I., Reiß, M. and M. Trabs (2016) Adaptive quantile estimation in deconvolution with unknown error
distribution, Bernoulli, 22, 143-192.
[5] Delaigle, A., Hall, P. and A. Meister (2008) On deconvolution with repeated measurements, Annals of
Statistics, 36, 665-685.
[6] Dong, H. and T. Otsu (2018) Nonparametric estimation of additive model with errors-in-variables, Working
paper.
[7] Fan, J. (1991) On the optimal rates of convergence for nonparametric deconvolution problems, Annals of
Statistics, 19, 1257-1272.
[8] Fan, Y. (1995) Average derivative estimation with errors-in-variables, Journal of Nonparametric Statistics,
4, 395-407.
[9] Härdle, W., Hildenbrand, W. and M. Jerison (1991) Empirical evidence on the law of demand, Econometrica,
1525-1549.
[10] Holzmann, H. and L. Boysen (2006) Integrated square error asymptotics for supersmooth deconvolution,
Scandinavian Journal of Statistics, 33, 849-860.
[11] Ichimura, H. (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models,
Journal of Econometrics, 58, 71-120.
[12] Kato, K. and Y. Sasaki (2018) Uniform confidence bands in deconvolution with unknown error distribution,
Journal of Econometrics, 207, 129-161.
[13] Meister, A. (2009) Deconvolution Problems in Nonparametric Statistics, Springer.
[14] Otsu, T. and L. Taylor (2019) Specification testing for errors-in-variables models, Working paper.
[15] Powell, J. L., Stock, J. H. and T. M. Stoker (1989) Semiparametric estimation of index coefficients, Econo-
metrica, 57, 1403-1430.
[16] Racine, J. (1997) Consistent significance testing for nonparametric regression, Journal of Business & Eco-
nomic Statistics, 15, 369-378.
[17] van Es, B. and S. Gugushvili (2008) Weak convergence of the supremum distance for supersmooth kernel
deconvolution, Statistics & Probability Letters, 78, 2932-2938.
[18] van Es, B. and H.-W. Uh (2005) Asymptotic normality for kernel type deconvolution estimators, Scandinavian
Journal of Statistics, 32, 467-483.
[19] Yatchew, A. (2003) Semiparametric Regression for the Applied Econometrician, Cambridge University Press.
Department of Economics, Southern Methodist University, 3300 Dyer Street, Dallas, TX
75275, US.
Email address: [email protected]
Department of Economics and Business Economics, Fuglesangs Allé 4 building 2631, 12 8210
Aarhus V, Denmark
Email address: [email protected]