TIKHONOV REGULARISATION FOR FUNCTIONAL MINIMUM DISTANCE ESTIMATORS

P. Gagliardini∗ and O. Scaillet†

This version: May 2006 (first version: May 2006)

∗ University of Lugano.

† HEC Genève and Swiss Finance Institute.

Abstract

We study the asymptotic properties of a Tikhonov regularised (TiR) estimator of a functional parameter based on a minimum distance principle for nonparametric conditional moment restrictions. The estimator is computationally tractable and even takes a closed form in the linear case. We derive its Mean Integrated Squared Error (MISE), its rate of convergence and its pointwise asymptotic normality under a regularisation parameter depending on the sample size. The optimal value of the regularisation parameter is characterised. We illustrate our theoretical findings and the small sample properties with simulation results for two numerical examples. We also discuss two data-driven selection procedures of the regularisation parameter via a spectral representation and a subsampling approximation of the MISE.

Keywords and phrases: Minimum Distance, Nonparametric Estimation, Ill-posed Inverse Problems, Endogeneity, Generalized Method of Moments, Subsampling, Tikhonov Regularisation.

JEL classification: C13, C14.

AMS 2000 classification: 62G08, 62G20.
1 Introduction

Minimum distance and extremum estimators have received a lot of attention in the literature as a way to exploit conditional moment restrictions assumed to hold true on the data generating process [see e.g. Newey and McFadden (1994) for a review]. In a parametric setting, leading examples are the Ordinary Least Squares estimator, which takes a closed form, and the Nonlinear Least Squares estimator, which is computed through numerical optimization. Corrections for endogeneity are provided by the Instrumental Variable estimator in the linear case and by the Generalised Method of Moments estimator in the nonlinear case.

In a functional setting, regression curves are inferred by local polynomial estimators and sieve estimators. A well-known example is the Rosenblatt-Parzen kernel estimator. Recently, several suggestions have been made to correct for endogeneity in the nonparametric context as well, mainly motivated by the interest in non-parametric IV estimation of structural equations. Newey and Powell (NP, 2003) consider the problem of estimating non-parametrically a regression function, which is the conditional expectation of the dependent variable given a set of instruments. They propose a consistent minimum distance estimator, which is a non-parametric analog of the Two-Stage Least Squares estimator. The NP methodology extends to the case of general, nonlinear conditional moment restrictions. Ai and Chen (AC, 2003) follow a similar approach to estimate an unknown function contained in a conditional moment. Although their focus is more on the efficient estimation of the parametric component in a semi-parametric conditional moment specification, they show that the estimator of the functional component converges at a rate faster than $T^{-1/4}$ in an appropriate metric.
Darolles, Florens and Renault (DFR, 2003) and Hall and Horowitz (HH, 2005) also consider non-parametric estimation of an instrumental regression function, but focus on the linear case. Their estimation approach is based on the empirical analog of the conditional moment restriction, seen as a linear integral equation in the unknown functional parameter. HH derive the optimal rate of convergence of their estimator in quadratic mean. [Cite some other peripheric papers (Chernozukov, Vanhems)].

The main theoretical difficulty to overcome in non-parametric estimation with endogeneity is ill-posedness. Ill-posedness occurs because the mapping of the reduced form parameter (that is, the distribution of the data) into the structural parameter (the instrumental regression function) is not continuous. This may have serious consequences; in particular, it can lead to inconsistency of the estimators. The problem of ill-posedness has been addressed in the literature in different ways. NP and AC propose to introduce bounds on the derivatives of the functional parameter of interest, which amounts to assuming a compact parameter space. In the linear case, DFR and HH adopt a regularisation technique, which results in a kind of ridge regression approach in a functional setting.

The aim of this paper is to introduce a new minimum distance estimator for a functional parameter identified by conditional moment restrictions. To address the issue of ill-posedness, we consider penalized extremum estimators which minimize a criterion of the type $Q_T(\varphi) + \lambda_T G(\varphi)$, where $Q_T(\varphi)$ is a minimum distance criterion in the functional parameter $\varphi$, $G(\varphi)$ is a penalty function and $\lambda_T$ is a positive sequence converging to zero. The penalty function $G(\varphi)$ corresponds to the Sobolev norm of function $\varphi$, which involves the $L^2$ norms of both $\varphi$ and its derivative $\nabla\varphi$. The basic idea behind our estimator is that the term $\lambda_T G(\varphi)$ penalizes highly oscillating components of the estimator, which are otherwise unduly enhanced by the minimum distance criterion $Q_T(\varphi)$ because of ill-posedness. The amount of regularisation is tuned by parameter $\lambda_T$. We call our estimator a Tikhonov Regularised (TiR) estimator, since the penalty term is inspired by the pioneering paper of Tikhonov (1963) on the regularisation of ill-posed inverse problems. We stress that the regularisation approach of DFR and HH is also an example of Tikhonov regularisation, but with a penalty term involving the $L^2$ norm instead of the Sobolev norm of the parameter. To avoid confusion, we refer to the DFR and HH estimator as a regularised estimator with $L^2$ norm.

Our paper contributes to the literature along several directions. First, we introduce a nonparametric estimator for conditional moment restrictions, which has the following features: (i) it applies in the general (linear and nonlinear) setting; (ii) the tuning parameter is allowed to depend on sample size and to be stochastic; (iii) it may have a faster rate of convergence than the DFR and HH estimator in the linear case; (iv) it admits a closed form in the linear case. We emphasize that point (ii) is crucial to develop estimators with data-driven selection of the tuning parameter. This point is not addressed in the setting of NP and AC, where the tuning parameter is the bound on the Sobolev norm of the estimator and is assumed fixed in all theoretical results. For the same reason, feature (iv) is not shared by the NP and AC estimators [see Section 2.4 for more details on the links between the TiR estimator and the literature]. Concerning point (iii), we give in Section 4 the condition under which this property holds. In our Monte-Carlo experiments in Section 6, we find that the TiR estimator performs better than the regularised estimator with $L^2$ norm.¹

Second, we study the asymptotic properties of our estimator in some depth. In particular: (i) we prove the consistency of the TiR estimator; (ii) we derive the asymptotic expansion of the Mean Integrated Squared Error (MISE) as a function of the sample size and the (deterministic) regularisation parameter; (iii) we prove the pointwise asymptotic normality of the TiR estimator. To the best of our knowledge, results (ii) and (iii), as well as (i) for a sequence of stochastic regularisation parameters, are new for non-parametric estimators of this type. In particular, the asymptotic expansion of the MISE allows us to study the effect of the regularisation parameter on the variance term and on the bias term of the TiR estimator, to define the optimal sequence of regularisation parameters, and to derive the associated optimal rate of convergence of the TiR estimator. The methodology is easily extended to the case of regularisation with $L^2$ norm, so that these results are also relevant for the study of the properties of the DFR and HH estimators. Finally, the asymptotic expansion of the MISE suggests a procedure for the data-driven selection of the regularisation parameter, which we implement in the Monte-Carlo study.

Third, we investigate the attractiveness of the TiR estimator from an applied point of view. In the nonlinear case, the TiR estimator only requires running an unconstrained optimisation routine instead of a constrained one, and in the linear case it even takes a closed form. Such numerical tractability is a key advantage in practice, for example when using computationally heavy resampling techniques. The finite sample properties appear very appealing in our numerical experiments on two examples and with two data-driven selection procedures of the regularisation parameter.

¹ The advantage of the Sobolev norm compared to the $L^2$ norm for regularisation is also pointed out in a numerical example in Kress (1999), Example 16.21.

The rest of the paper is organized as follows. In Section 2, we first introduce the general setting of non-parametric estimation under conditional moment restrictions and the problem of ill-posedness. We then define the TiR estimator and discuss the links with the literature. In Section 3 we prove the consistency of the TiR estimator. Section 4 is devoted to the analysis of the MISE and of the optimal rates of convergence of the TiR estimator. The case of linear moment restrictions is detailed in Section 5. In Section 6 we present a Monte-Carlo study of the finite sample properties of the TiR estimator. Finally, Section 7 concludes. The proofs of all results in the paper are gathered in the Appendices. The proofs of technical Lemmas are collected in a separate document, which is available from the authors on request.

2 Minimum Distance estimators under Tikhonov regularisation

In this section we introduce the class of Tikhonov Regularised (TiR) estimators. In Section 2.1 we present the general setting of non-parametric Minimum Distance estimation. In Section 2.2 we highlight its main issue, namely ill-posedness. In Section 2.3 TiR estimators are defined as a regularisation method for the ill-posedness problem. Finally, links with estimators and results currently available in the literature are discussed in detail in Section 2.4.
2.1 Nonparametric Minimum Distance estimation

Let $\{(Y_t, X_t, Z_t) : t = 1, \dots, T\}$ be i.i.d. variables, and let the support of $X_t$ be $\mathcal{X} = [0, 1]$. Suppose that the parameter of interest is a function $\varphi_0$ defined on $\mathcal{X}$, which satisfies the conditional moment restriction

$$E_0\left[g\left(Y, \varphi_0(X)\right) \mid Z\right] = 0, \tag{1}$$

where $g$ is a known function. Parameter $\varphi_0$ belongs to a subset $\Theta$ of $L^2[0, 1]$, equipped with the $L^2$ scalar product $\langle \varphi, \psi \rangle = \int_{\mathcal{X}} \varphi(x)\psi(x)\,dx$ and the $L^2$ norm $\|\varphi\| = \langle \varphi, \varphi \rangle^{1/2}$. It is assumed that $\Theta$ is bounded and closed, and that $\varphi_0$ is the unique function $\varphi \in \Theta$ that satisfies the conditional moment restriction (1).

The non-parametric Minimum Distance approach to estimate $\varphi_0$ as in AC and NP relies on $\varphi_0$ minimizing the criterion

$$Q_\infty(\varphi) = E_0\left[m(\varphi, Z)'\,\Omega_0(Z)\,m(\varphi, Z)\right], \quad \varphi \in \Theta, \tag{2}$$

where $m(\varphi, z) = E_0\left[g(Y, \varphi(X)) \mid Z = z\right]$, and $\Omega_0(z)$ is a p.d. matrix for any given $z$. This criterion is well-defined if $m(\varphi, z)$ belongs to $L^2_{\Omega_0}(F_Z)$ for any $\varphi \in \Theta$, where $L^2_{\Omega_0}(F_Z)$ denotes the $L^2$ space of square integrable vector-valued functions of $Z$ defined by the scalar product $\langle \psi_1, \psi_2 \rangle_{L^2_{\Omega_0}(F_Z)} = E_0\left[\psi_1(Z)'\,\Omega_0(Z)\,\psi_2(Z)\right]$. Then, the idea is to estimate $\varphi_0$ by the minimizer of the empirical counterpart of Criterion (2). For instance, AC and NP estimate the conditional moment $m(\varphi, z)$ by an orthogonal polynomials approach, and minimize the empirical criterion over a finite-dimensional Sieve approximation of $\Theta$ based on polynomial or spline functions.
The main difficulty in non-parametric Minimum Distance estimation is that, contrary to the standard parametric case, the assumption that function $\varphi_0$ is identified in a bounded and closed parameter set $\Theta$ is not sufficient in general to get the consistency of the estimator. This is due to the so-called ill-posedness of such an estimation problem.

2.2 Unidentifiability and ill-posedness in Minimum Distance estimation

The goal of this section is to highlight the issue of ill-posedness in Minimum Distance estimation [NP; see also Kress (1999), Chapter 15, for a general treatment of ill-posed inverse problems, and Carrasco, Florens and Renault (2005) for a survey on inverse problems in econometrics]. To briefly explain what ill-posedness is, note that solving the equation $E_0\left[g(Y, \varphi(X)) \mid Z\right] = 0$ for the unknown function $\varphi \in \Theta$ can be seen as an inverse problem, which maps the conditional distribution $F_0(y, x \mid z)$ of $(Y, X)$ given $Z = z$ into the solution $\varphi_0$ [see Equation (1)]. Ill-posedness arises when this mapping is not continuous. As a consequence, the estimator $\hat\varphi$ of $\varphi_0$, which is the solution of the inverse problem corresponding to a consistent estimator $\hat F$ of $F_0$, is not guaranteed to be consistent. Indeed, by non-continuity, small deviations of $\hat F$ from $F_0$ may result in large deviations of $\hat\varphi$ from $\varphi_0$. We refer to NP for a more in-depth discussion along these lines. In this paper, we prefer to emphasize the link between ill-posedness and a classical concept in econometrics, namely parameter identification.

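To illustrate the main point, let us consider the case of non-parametric linear IV estimation, where $g(y, \varphi(x)) = \varphi(x) - y$. The moment function $m(\varphi, z) = E_0\left[\varphi(X) - Y \mid Z = z\right]$ can be written as

$$m(\varphi, z) = (A\varphi)(z) - r(z) = (A\Delta\varphi)(z), \tag{3}$$

where $\Delta\varphi := \varphi - \varphi_0$, operator $A$ is defined by $(A\varphi)(z) = \int \varphi(x) f(w \mid z)\,dw$ and $r(z) = \int y f(w \mid z)\,dw$, with $w = (y, x)$. The second equality in (3) uses $A\varphi_0 = r$, which is restriction (1) in this linear case. Conditional moment restriction (1) identifies $\varphi_0$ if and only if operator $A$ is injective. The limit criterion in (2) becomes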

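$$Q_\infty(\varphi) = E_0\left[(A\Delta\varphi)(Z)'\,\Omega_0(Z)\,(A\Delta\varphi)(Z)\right] = \langle \Delta\varphi, A^*A\Delta\varphi \rangle, \tag{4}$$

where $A^*$ denotes the adjoint operator of $A$ w.r.t. the scalar products $\langle \cdot, \cdot \rangle$ and $\langle \cdot, \cdot \rangle_{L^2_{\Omega_0}(F_Z)}$.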

Under weak regularity conditions, operator $A$ is compact. Thus, $A^*A$ is compact and self-adjoint. Let us denote by $\{\psi_j : j \in \mathbb{N}\}$ the orthonormal eigenfunctions of operator $A^*A$, and by $\mu_1 \geq \mu_2 \geq \cdots$, with $\mu_j > 0$, the corresponding eigenvalues [see Kress (1999), Section 15.3, for the spectral decomposition of compact, self-adjoint operators]. By compactness of $A^*A$, the eigenvalues are such that $\mu_j \to 0$. Assume that $\varphi_0$ is an interior point of parameter set $\Theta$. Then, the limit criterion $Q_\infty(\varphi)$ can be minimized by a sequence in $\Theta$ such as

$$\varphi_n = \varphi_0 + \varepsilon\psi_n, \quad n \in \mathbb{N}, \tag{5}$$

for $\varepsilon > 0$ small enough, which does not converge to $\varphi_0$. Indeed, $Q_\infty(\varphi_n) = \varepsilon^2 \langle \psi_n, A^*A\psi_n \rangle = \varepsilon^2 \mu_n \to 0$ as $n \to \infty$, but $\|\varphi_n - \varphi_0\| = \varepsilon$ for all $n$. Since we can choose $\varepsilon > 0$ as small as we want, the usual identification assumption [e.g., White and Wooldridge (1991)]

$$\inf_{\varphi \in \Theta : \|\varphi - \varphi_0\| \geq \varepsilon} Q_\infty(\varphi) > 0 = Q_\infty(\varphi_0), \quad \text{for } \varepsilon > 0, \tag{6}$$

is not satisfied. In other words, function $\varphi_0$ is not identified in $\Theta$ as an isolated minimum of $Q_\infty$. This is the identification problem of Minimum Distance estimation with a functional parameter. Failure of identification condition (6) is due to 0 being a limit point of the eigenvalues of operator $A^*A$. It applies in the general setting of conditional moment restriction (1) whenever the linearization of moment function $m(\varphi, z)$ around $\varphi = \varphi_0$ involves a compact operator. This is the maintained assumption in our paper, and is stated below.

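Assumption 1 (Ill-posedness): The moment function $m(\varphi, z)$ is such that $m(\varphi, z) = (A\Delta\varphi)(z) + R(\varphi, z)$ for any $\varphi \in \Theta$, where

(i) the operator $A$ defined by $(A\Delta\varphi)(z) = \int \frac{\partial g}{\partial v}\left(y, \varphi_0(x)\right) f(w \mid z)\, \Delta\varphi(x)\, dw$ is a compact operator in $L^2[0, 1]$;

(ii) the second-order term $R(\varphi, z)$ is such that $\sup_{\varphi \in \Theta} \|R(\varphi, \cdot)\|_{L^2_{\Omega_0}(F_Z)} / \|A\Delta\varphi\|_{L^2_{\Omega_0}(F_Z)} < 1$.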

Under Assumption 1, the identification condition (6) is not satisfied, and the Minimum Distance estimator which minimizes the empirical counterpart of criterion $Q_\infty(\varphi)$ over (a Sieve approximation of) set $\Theta$ is not consistent.
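The flatness of $Q_\infty$ along the directions in (5) can be checked numerically. The following Python sketch is purely illustrative and not part of the estimation method: it assumes $Z$ uniform on $[0, 1]$, $\Omega_0 = 1$, and a hypothetical Gaussian kernel $k(z, x)$ in place of the conditional density in the linear IV operator $A$ of (3). It discretizes $A$ on a midpoint grid and evaluates $Q_\infty(\varphi_0 + \varepsilon\psi_n)$ for a few eigenfunctions $\psi_n$ of $A^*A$.

```python
import numpy as np

# Hypothetical setup: X and Z on a midpoint grid of [0, 1], Z uniform, Omega_0 = 1,
# and a Gaussian kernel k(z, x) standing in for the conditional density of X given Z.
n_grid = 200
h = 1.0 / n_grid
grid = (np.arange(n_grid) + 0.5) * h
K = np.exp(-((grid[:, None] - grid[None, :]) ** 2) / (2 * 0.1 ** 2))
A = h * K        # (A phi)(z_i) ~ h * sum_j k(z_i, x_j) phi(x_j)
AtA = A.T @ A    # matrix of A*A acting on grid values of phi


def l2_inner(u, v):
    """Midpoint-rule approximation of <u, v> = int_0^1 u(x) v(x) dx."""
    return h * float(u @ v)


def q_inf(delta):
    """Limit criterion (4) at phi = phi_0 + delta: <delta, A*A delta>."""
    return l2_inner(delta, AtA @ delta)


# Eigenpairs of A*A, sorted so that mu_1 >= mu_2 >= ...
mu, vecs = np.linalg.eigh(AtA)
order = np.argsort(mu)[::-1]
mu = mu[order]
psi = vecs[:, order] / np.sqrt(h)   # rescaled to unit L2 norm

eps = 0.1
for j in [0, 3, 6, 9]:
    delta = eps * psi[:, j]          # phi_n - phi_0 = eps * psi_n, as in (5)
    print(f"n={j + 1:2d}  mu_n={mu[j]:.2e}  Q_inf={q_inf(delta):.2e}  "
          f"||phi_n - phi_0||={np.sqrt(l2_inner(delta, delta)):.3f}")
```

In this sketch, $Q_\infty(\varphi_n) = \varepsilon^2\mu_n$ shrinks quickly with $n$ while $\|\varphi_n - \varphi_0\|$ stays equal to $\varepsilon$, which is the failure of (6) described above.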

2.3 Tikhonov Regularised (TiR) estimators

In this paper, we address the issue of ill-posedness by introducing Minimum Distance estimators based on Tikhonov regularisation. We consider extremum estimators which minimize a criterion of the type $Q_T(\varphi) + \lambda_T G(\varphi)$, where $Q_T(\varphi)$ is an empirical counterpart of criterion $Q_\infty(\varphi)$ in (2), $G(\varphi)$ is a penalty function introduced to solve the unidentifiability problem arising from ill-posedness, and $\lambda_T$ is a sequence converging to zero as sample size $T$ increases. Functions $Q_T(\varphi)$ and $G(\varphi)$ are defined next.

The conditional moment $m(\varphi, z) = E_0\left[g(Y, \varphi(X)) \mid Z = z\right]$ can be estimated non-parametrically by $\hat m(\varphi, z) = \int g(y, \varphi(x))\, \hat f(w \mid z)\, dw$, where $\hat f(w \mid z)$ denotes a kernel estimator of the density of $(Y, X)$ given $Z = z$ with kernel $K$, bandwidth $h_T$, and $w = (y, x)$. Then, the criterion $Q_T(\varphi)$ is defined by

$$Q_T(\varphi) = \frac{1}{T} \sum_{t=1}^{T} \hat m(\varphi, Z_t)'\, \Omega_T(Z_t)\, \hat m(\varphi, Z_t),$$

where $\Omega_T(z)$, $T \in \mathbb{N}$, is a sequence of p.d. matrices converging to $\Omega_0(z)$, P-a.s., for any $z$.
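A minimal sketch of $Q_T(\varphi)$ for a scalar moment follows, under assumptions that are ours rather than the paper's: $\Omega_T = 1$, a Gaussian kernel $K$, and smoothing in $z$ only. That is, $\hat f(w \mid z)$ is replaced by weights that put a point mass at each observed $w_t$, so that $\hat m(\varphi, z)$ becomes a Nadaraya-Watson regression of $g(Y_t, \varphi(X_t))$ on $Z_t$. The data-generating process and the function names (`nw_weights`, `q_t`) are hypothetical.

```python
import numpy as np


def nw_weights(z_eval, z_data, bandwidth):
    """Nadaraya-Watson weights with a Gaussian kernel; each row sums to one."""
    u = (z_eval[:, None] - z_data[None, :]) / bandwidth
    k = np.exp(-0.5 * u ** 2)
    return k / k.sum(axis=1, keepdims=True)


def q_t(phi, g, y, x, z, bandwidth):
    """Empirical criterion Q_T(phi) for a scalar moment with Omega_T = 1.

    m_hat(phi, Z_t) is approximated by a Nadaraya-Watson regression of
    g(Y_s, phi(X_s)) on Z_s (smoothing in z only; an assumption of this sketch)."""
    weights = nw_weights(z, z, bandwidth)
    m_hat = weights @ g(y, phi(x))   # m_hat(phi, Z_t), t = 1, ..., T
    return float(np.mean(m_hat ** 2))


def phi0(s):
    return np.sin(np.pi * s)


def g(y_obs, phi_x):
    """Linear IV moment function g(y, phi(x)) = phi(x) - y."""
    return phi_x - y_obs


# Hypothetical linear IV design.
rng = np.random.default_rng(0)
T = 1000
z = rng.uniform(0.0, 1.0, T)
v = rng.normal(0.0, 0.1, T)
x = np.clip(z + v, 0.0, 1.0)                      # endogenous regressor supported on [0, 1]
y = phi0(x) + 0.5 * v + rng.normal(0.0, 0.1, T)   # error correlated with x through v

print(q_t(phi0, g, y, x, z, bandwidth=0.05))          # true function phi_0
print(q_t(lambda s: s, g, y, x, z, bandwidth=0.05))   # a function violating (1)
```

With this design, $E_0[Y - \varphi_0(X) \mid Z] = 0$ holds by construction, so $Q_T(\varphi_0)$ should be close to zero, while a function violating the moment restriction, such as $\varphi(x) = x$, yields a criterion bounded away from zero.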

Different choices of penalty function $G(\varphi)$ are possible, leading to consistent estimators under the assumptions of Theorem 1 in Section 3 below. In this paper, we focus on the Sobolev norm $G(\varphi) = \|\varphi\|_H^2$. More precisely, we assume that $\varphi_0$ belongs to some subset $\Theta$ of the Sobolev space $H^2[0, 1]$, which is defined as the completion of the linear space $\{\varphi \in C^1[0, 1] \mid \nabla\varphi \in L^2[0, 1]\}$ with respect to the scalar product $\langle \varphi, \psi \rangle_H := \langle \varphi, \psi \rangle + \langle \nabla\varphi, \nabla\psi \rangle$. Sobolev space $H^2[0, 1]$ is a Hilbert space w.r.t. this scalar product, and the corresponding Sobolev norm is denoted by $\|\varphi\|_H = \langle \varphi, \varphi \rangle_H^{1/2}$.

The Minimum Distance estimator under Tikhonov regularisation with Sobolev norm is defined next.

Definition 1: The Tikhonov Regularised (TiR) Minimum Distance estimator is defined by

$$\hat\varphi = \arg\inf_{\varphi \in \Theta_T} \left\{ Q_T(\varphi) + \lambda_T \|\varphi\|_H^2 \right\}, \tag{7}$$

where $\lambda_T$ is a stochastic sequence such that $\lambda_T \geq 0$ and $\lambda_T \to 0$, P-a.s., and $(\Theta_T)$ is an increasing sequence of subsets of the Sobolev space $H^2[0, 1]$, which are compact w.r.t. the $L^2$ norm $\|\cdot\|$.

The name Tikhonov Regularised (TiR) estimators that we use to characterize the Minimum Distance estimators introduced in Definition 1 goes back to Tikhonov (1963), in his pioneering paper on the regularisation of ill-posed inverse problems [see Kress (1999), Chapter 16]. The main intuition is that the term $\lambda_T \|\varphi\|_H^2$ in the criterion penalizes highly oscillating components of the estimated function, which would otherwise be unduly enhanced, since the criterion $Q_T(\varphi)$ becomes asymptotically flat along some directions because of ill-posedness. For instance, in the linear IV case where $Q_\infty(\varphi) = \langle \Delta\varphi, A^*A\Delta\varphi \rangle$, these directions correspond to the eigenfunctions $\psi_n$ of operator $A^*A$ associated with eigenvalues $\mu_n$ close to zero, that is, for large $n$ [see Equation (5) and the discussion in Section 2.2]. Typically, $\psi_n$ is a highly oscillating function and $\|\psi_n\|_H \to \infty$ as $n \to \infty$, so that these directions are penalized by the term $G(\varphi) = \|\varphi\|_H^2$ in the empirical criterion $Q_T(\varphi) + \lambda_T \|\varphi\|_H^2$. In Theorem 1 in Section 3 below, we provide precise conditions under which the penalty function $G(\varphi) = \|\varphi\|_H^2$ restores the validity of the identification condition (6) and ensures the consistency of the TiR estimator.
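To illustrate the last point, suppose (hypothetically) that the flat directions are $\psi_n(x) = \sqrt{2}\sin(n\pi x)$. Each $\psi_n$ has unit $L^2$ norm, while $\|\psi_n\|_H^2 = 1 + n^2\pi^2$, so the penalty $\lambda_T\|\varphi\|_H^2$ grows quadratically in $n$ along these directions. The short Python check below computes both norms with the trapezoidal rule.

```python
import math

import numpy as np

x = np.linspace(0.0, 1.0, 20001)
dx = x[1] - x[0]


def integrate(values):
    """Trapezoidal rule on the uniform grid x over [0, 1]."""
    return float(dx * (values.sum() - 0.5 * (values[0] + values[-1])))


def norms_sq(n):
    """Return (||psi_n||^2, ||psi_n||_H^2) for psi_n(x) = sqrt(2) sin(n pi x)."""
    psi = math.sqrt(2.0) * np.sin(n * math.pi * x)
    dpsi = math.sqrt(2.0) * n * math.pi * np.cos(n * math.pi * x)   # derivative of psi_n
    l2 = integrate(psi ** 2)
    return l2, l2 + integrate(dpsi ** 2)


for n in [1, 5, 20, 50]:
    l2, sob = norms_sq(n)
    print(f"n={n:2d}  ||psi_n||^2={l2:.6f}  ||psi_n||_H^2={sob:10.2f}  "
          f"1+(n*pi)^2={1 + (n * math.pi) ** 2:10.2f}")
```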

The sequence $(\lambda_T)$ in Definition 1 controls the amount of regularisation introduced by the term $G(\varphi) = \|\varphi\|_H^2$, and how this amount depends on sample size $T$. Therefore, $\lambda_T$ can be seen as a tuning parameter (or as a sequence of tuning parameters). The rate of convergence of $\lambda_T$ to zero affects the rate of convergence of the TiR estimator $\hat\varphi$. In Section 4 below we discuss the choice of the sequence $(\lambda_T)$ that achieves an optimal rate of convergence of the TiR estimator $\hat\varphi$, and in Section 6 we present two global data-driven selection procedures for $\lambda_T$.
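In the linear IV case, once $\varphi$ is restricted to a finite-dimensional sieve $\varphi = \sum_k c_k b_k$, criterion (7) is quadratic in the coefficients $c$ and its minimizer solves a linear system, in line with the closed form mentioned in the Introduction. The sketch below is not the paper's implementation: it assumes a hypothetical cosine sieve $b_0 = 1$, $b_k(x) = \sqrt{2}\cos(k\pi x)$, for which the Sobolev Gram matrix is $G = \operatorname{diag}(1 + k^2\pi^2)$, Nadaraya-Watson smoothing in $z$ in place of the kernel estimator $\hat f(w \mid z)$, $\Omega_T = 1$, and a simulated design. Writing $P$ and $r$ for the smoothed basis functions and the smoothed $Y$, the first-order condition is $(P'P/T + \lambda G)c = P'r/T$. All function names are hypothetical.

```python
import numpy as np


def cosine_basis(x, n_basis):
    """Hypothetical sieve: b_0 = 1 and b_k = sqrt(2) cos(k pi x), orthonormal in L2[0, 1]."""
    B = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, np.arange(n_basis)))
    B[:, 0] = 1.0
    return B


def sobolev_gram(n_basis):
    """G_kl = <b_k, b_l>_H = <b_k, b_l> + <b_k', b_l'>, diagonal for this basis."""
    return np.diag(1.0 + (np.pi * np.arange(n_basis)) ** 2)


def nw_weights(z, bandwidth):
    """Nadaraya-Watson weights with a Gaussian kernel; each row sums to one."""
    u = (z[:, None] - z[None, :]) / bandwidth
    k = np.exp(-0.5 * u ** 2)
    return k / k.sum(axis=1, keepdims=True)


def tir_linear(y, x, z, lam, n_basis=15, bandwidth=0.05):
    """Closed-form TiR sketch for g(y, phi(x)) = phi(x) - y, Omega_T = 1, phi = B c.

    Minimizes Q_T(c) + lam * c'Gc with Q_T(c) = mean((P c - r)^2), where
    P = W B and r = W y are kernel regressions on Z (W: smoothing weights)."""
    T = len(y)
    W = nw_weights(z, bandwidth)
    P = W @ cosine_basis(x, n_basis)   # P[t, k] estimates E[b_k(X) | Z = Z_t]
    r = W @ y                          # r[t] estimates E[Y | Z = Z_t]
    G = sobolev_gram(n_basis)
    # First-order condition of the penalized quadratic criterion.
    c = np.linalg.solve(P.T @ P / T + lam * G, P.T @ r / T)
    q_t = float(np.mean((P @ c - r) ** 2))
    return c, q_t, float(c @ G @ c)


def phi0(s):
    return np.sin(np.pi * s)


# Hypothetical data-generating process (not from the paper).
rng = np.random.default_rng(0)
T = 1000
z = rng.uniform(0.0, 1.0, T)
v = rng.normal(0.0, 0.1, T)
x = np.clip(z + v, 0.0, 1.0)
y = phi0(x) + 0.5 * v + rng.normal(0.0, 0.1, T)

x_grid = np.linspace(0.0, 1.0, 101)
results = {}
for lam in [1e-1, 1e-3, 1e-5]:
    c, q_t, pen = tir_linear(y, x, z, lam)
    rmse = float(np.sqrt(np.mean((cosine_basis(x_grid, len(c)) @ c - phi0(x_grid)) ** 2)))
    results[lam] = (q_t, pen)
    print(f"lambda={lam:.0e}  Q_T={q_t:.2e}  ||phi_hat||_H^2={pen:.3f}  RMSE={rmse:.3f}")
```

Increasing $\lambda$ can only raise the fitted criterion $Q_T$ and lower the penalty $\|\hat\varphi\|_H^2$, which is the trade-off tuned by $\lambda_T$; the RMSE column shows how this trade-off translates into estimation error for this particular simulated design.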

2.4 Links with the literature

The goal of this Section is to discuss the links between the TiR estimator and the different approaches proposed in the literature on nonparametric estimation under conditional moment restrictions.

2.4.1 Regularisation by compactness

To address the issue of ill-posedness, NP and AC [see also Blundell, Chen and Kristensen (BCK, 2005)] suggest considering a compact parameter set $\Theta$. In this case, by the same argument as in the standard parametric setting, the assumption that $\varphi_0$ is the unique function in $\Theta$ which satisfies (1) implies identification condition (6). Compact sets in $L^2[0, 1]$ can be defined by imposing a bound $\|\varphi\|_H \leq B$ on the Sobolev norm of the functional parameter. Then, the estimator is obtained by minimization problem (7), where $\lambda_T$ is interpreted as a Kuhn-Tucker multiplier.

Our approach by TiR estimators differs from AC and NP along two directions. On the one hand, for TiR estimators $\lambda_T$ is a free regularisation parameter, whereas $\lambda_T$ is tied down by the slackness condition in the NP and AC approach: either $\lambda_T = 0$ or $\|\hat\varphi\|_H = B$, P-a.s. As a consequence, the approach by TiR estimators presents three important advantages.

i) Optimal rates of convergence. Although, for a given sample size $T$, selecting different $\lambda_T$ amounts to selecting different $B$ when the constraint is binding, the asymptotic properties of the TiR estimator and of the estimators with fixed $B$ are different. In particular, the adoption of a bound $B$ on the Sobolev norm independent of sample size $T$ implies in general the selection of a sub-optimal sequence of regularisation parameters $\lambda_T$. Thus, the NP and AC estimators have rates of convergence which are slower than that of the TiR estimator with an optimally selected sequence of regularisation parameters. The optimal rates of convergence for the TiR estimator are characterized in Section 4. Finally, note that letting $B = B_T$ grow (slowly) with sample size $T$ is not equivalent to our approach and does not guarantee the consistency of the estimator. Indeed, when $B_T \to \infty$, the resulting limit parameter set $\Theta$ is not compact.

ii) Data-driven selection of tuning parameters. For the TiR estimator, the tuning parameter $\lambda_T$ is allowed to depend on sample size $T$ and on the sample data, whereas in the theoretical setting of NP and AC the tuning parameter $B$ is treated as fixed. Thus, our approach allows for a discussion of the asymptotic properties of regularised estimators with data-driven selection of the tuning parameter [see Theorem 1 in Section 3 for consistency].

iii) Computational tractability. Finally, we emphasize that the TiR estimator features computational advantages compared to the NP and AC estimators. This is because, for given $\lambda_T$, the TiR estimator is defined by an unconstrained optimization problem, whereas the inequality constraint $\|\varphi\|_H \leq B$ has to be accounted for in the minimization defining estimators with given $B$. In particular, in the case of linear conditional moment restrictions, TiR estimators admit a closed form [see Section 5], whereas the computation of the NP and AC estimators requires a numerical constrained quadratic optimization routine.

On the other hand, a second difference is that NP, AC and BCK use finite-dimensional Sieve estimators, whereas in our approach .... TO BE CONTINUED.

2.4.2 Regularisation with L2 norm

For the special case of non-parametric IV estimation of a single equation model [see Equation (3)], DFR and HH [see also Carrasco, Florens and Renault (2005)] introduce a regularised estimator defined by minimization problem (7) with the Sobolev norm $\|\varphi\|_H$ replaced by the $L^2$ norm $\|\varphi\|$ in the penalty term, and $\Omega_0(Z) = 1$. Indeed, it can be shown that the first order condition for such an estimator corresponds to the linear equation (4.1) in DFR, or to the estimator defined at p. 4 in HH. DFR and HH study the consistency and the optimal rates of convergence of their estimator for a deterministic sequence of regularisation parameters $\lambda_T$. In Section 4, we compare the optimal rates of convergence of regularised estimators with Sobolev norm and with $L^2$ norm, and give conditions under which the former is faster. Finally, note that the techniques used in this paper to study the asymptotic properties of TiR estimators are easily extended to estimators with $L^2$ regularisation and allow us to derive new results for the DFR and HH estimators, such as the asymptotic expansion of the Mean Integrated Squared Error (MISE) in Section 4.

3 Consistency of TiR estimators

In this section we show the consistency of the TiR estimator. To highlight the main idea, we first provide in Section 3.1 a consistency theorem for penalized extremum estimators minimizing the criterion $Q_T(\varphi) + \lambda_T G(\varphi)$ with a general penalty function $G(\varphi)$. Then, in Section 3.2 the assumptions of the theorem are particularized to the Sobolev penalty function $G(\varphi) = \|\varphi\|_H^2$ used for the TiR estimator.

3.1 A general consistency result for penalized extremum estimators

Let us consider an extremum estimator of the TiR type as in Definition 1 with a general penalty function $G(\varphi)$:

$$\hat\varphi = \arg\inf_{\varphi \in \Theta_T} \left\{ Q_T(\varphi) + \lambda_T G(\varphi) \right\}, \tag{8}$$

where $Q_T(\varphi)$, $(\lambda_T)$ and $\Theta_T$ are as in Definition 1. This estimator is well-defined and measurable under weak conditions [see Appendix 1].

The consistency of estimator $\hat\varphi$ defined in (8) is stated in the next Theorem.

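Theorem 1: Let

(i) $\delta_T := \sup_{\varphi \in \Theta_T} |Q_T(\varphi) - Q_\infty(\varphi)| \xrightarrow{p} 0$;

(ii) $\varphi_0 \in \Theta$, and $\bigcup_{T=1}^{\infty} \Theta_T$ is dense in $\Theta \subset H^2[0, 1]$;

(iii) for any $\varepsilon > 0$, $C_\varepsilon(\lambda) := \inf_{\varphi \in \Theta : \|\varphi - \varphi_0\| \geq \varepsilon} \left[Q_\infty(\varphi) + \lambda G(\varphi)\right] - Q_\infty(\varphi_0) - \lambda G(\varphi_0) > 0$ for any $\lambda > 0$ small enough;

(iv) there exists $a > 0$ such that $\lim_{\lambda \to 0} \lambda^{-a} C_\varepsilon(\lambda) > 0$, $T^a \delta_T \xrightarrow{p} 0$, and $T^a \rho_T \to 0$, for any $\varepsilon > 0$, where $\rho_T := \inf_{\varphi \in \Theta_T : \|\varphi - \varphi_0\| \leq \varepsilon} \left[Q_\infty(\varphi) + |G(\varphi) - G(\varphi_0)|\right]$.

Then, under (i)-(iv), for any sequence $(\lambda_T)$ such that $\lambda_T > 0$, $\lambda_T \to 0$, P-a.s., and

$$\lambda_T / T \to 0, \quad \text{P-a.s.}, \tag{9}$$

the estimator $\hat\varphi$ defined in (8) is consistent, namely $\|\hat\varphi - \varphi_0\| \xrightarrow{p} 0$.

Proof: See Appendix 1.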

If $G = 0$, Theorem 1 corresponds to the standard result of consistency for extremum estimators [e.g., Wooldridge and White (1991), Corollary 2.6]. Indeed, in this case Condition (iii) is the usual identification condition (6), whereas Condition (iv) is satisfied. Theorem 1 extends this consistency result to situations where Condition (6) does not hold, as is the case in our ill-posed setting (see Section 2.2). The identification of $\varphi_0$ as an isolated minimum is restored by including a small additional component $\lambda G(\varphi)$ in the limit criterion. Thus, Condition (iii) in Theorem 1 is the condition on the penalty function $G(\varphi)$ to overcome ill-posedness and achieve consistency of the estimator $\hat\varphi$. To interpret Condition (iv), note that in the ill-posed setting we have $C_\varepsilon(\lambda) \to 0$ as $\lambda \to 0$, and the rate of this convergence can be seen as a measure of the severity of ill-posedness. Thus, Condition (iv) introduces a bound on ill-posedness severity, related to the rate of uniform convergence $\delta_T \xrightarrow{p} 0$ and to the approximation error $\rho_T \to 0$ of the Sieve $\Theta_T$. In Appendix 1, we provide technical regularity conditions to quantify this bound and to verify Conditions (i), (ii), and (iv) of Theorem 1 for the TiR estimator. Finally, it is important to emphasize that Theorem 1 is more general than the results currently known in the literature, since sequence $(\lambda_T)$ is allowed to be stochastic, possibly data dependent, in a fully general way. Condition (9) on $\lambda_T$ for consistency requires that $\lambda_T$ converges to zero at a rate smaller than $1/T$.

The rest of this Section focuses on the key assumption of Theorem 1, that is, identification assumption (iii). The next Proposition provides a sufficient condition for the validity of this assumption.

Proposition 2: Assume that the function $G$ is bounded from below. Furthermore, suppose that, for any $\varepsilon > 0$ and any sequence $(\varphi_n)$ in $\Theta$ such that $\|\varphi_n - \varphi_0\| \geq \varepsilon$ for all $n \in \mathbb{N}$, we have

$$Q_\infty(\varphi_n) \to Q_\infty(\varphi_0) \text{ as } n \to \infty \implies G(\varphi_n) \to \infty \text{ as } n \to \infty. \tag{10}$$

Then, Condition (iii) of Theorem 1 is satisfied.

Proof: See Appendix 1.

Condition (10) provides a simple intuition to explain why the penalty function $G(\varphi)$ restores identification. Indeed, it basically requires that the sequences $(\varphi_n)$ in $\Theta$ which minimize $Q_\infty(\varphi)$ without converging to $\varphi_0$ are penalized by function $G(\varphi)$. In the next section, we particularize this condition for the penalty function which is relevant for the TiR estimator in Definition 1, that is, the Sobolev norm $G(\varphi) = \|\varphi\|_H^2$.

3.2 Penalization with Sobolev norm

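When the penalty function $G(\varphi) = \|\varphi\|_H^2$ is used, Condition (10) in Proposition 2 can be nicely stated in terms of the spectrum of the operator $A^*A$, where $A$ is the operator in the linearization of the moment function defined in Assumption 1.

Assumption 2: Let $\{\psi_j : j \in \mathbb{N}\}$ be the orthonormal eigenfunctions of operator $A^*A$ associated with eigenvalues $\mu_j$, ordered such that $\mu_1 \geq \mu_2 \geq \cdots$, and let $\psi_j \in H^2[0, 1]$ for any $j \in \mathbb{N}$. Then, $M_n := \inf_{\varphi \in S_n : \|\varphi\| = 1} \|\varphi\|_H \to \infty$ as $n \to \infty$, where $S_n = \operatorname{span}\{\psi_j : j \geq n\}$.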

Assumption 2 basically requires that the subspace spanned by the eigenfunctions of $A^*A$ associated with eigenvalues close to zero consists of highly oscillating functions with large Sobolev norm. Then, deviations of the estimator $\hat\varphi$ from $\varphi_0$ along such directions are penalized by $G(\varphi) = \|\varphi\|_H^2$. This compensates for the inability of the empirical criterion $Q_T(\varphi)$ to achieve this task, since it becomes asymptotically flat in such directions.
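Assumption 2 can be probed numerically for a given operator. The Python sketch below assumes a hypothetical Gaussian kernel in place of the conditional density in $A$, $Z$ uniform on $[0, 1]$, $\Omega_0 = 1$, a midpoint grid, and a forward-difference approximation of $\nabla\varphi$; none of these choices come from the paper. Since the eigenfunctions are $L^2$-orthonormal, $M_n^2$ is the smallest eigenvalue of the Sobolev Gram matrix of $\{\psi_j : j \geq n\}$.

```python
import numpy as np

# Hypothetical discretization: midpoint grid on [0, 1], Z uniform, Omega_0 = 1,
# Gaussian kernel k(z, x) in place of the conditional density in A.
n_grid = 200
h = 1.0 / n_grid
grid = (np.arange(n_grid) + 0.5) * h
A = h * np.exp(-((grid[:, None] - grid[None, :]) ** 2) / (2 * 0.1 ** 2))
mu, vecs = np.linalg.eigh(A.T @ A)
psi = vecs[:, np.argsort(mu)[::-1]] / np.sqrt(h)   # psi_1, psi_2, ...: unit L2 norm

# Forward-difference approximation of the derivative, shape (n_grid - 1, n_grid).
D = (np.eye(n_grid, k=1) - np.eye(n_grid))[:-1] / h


def m_n(n):
    """M_n = min ||phi||_H over phi in span{psi_j : j >= n} with ||phi|| = 1."""
    V = psi[:, n - 1:]
    DV = D @ V
    # Columns of V are L2-orthonormal, so ||V a|| = |a| and ||V a||_H^2 = a'G a;
    # M_n^2 is therefore the smallest eigenvalue of the Sobolev Gram matrix G.
    G = h * (V.T @ V) + h * (DV.T @ DV)
    return float(np.sqrt(np.linalg.eigvalsh(G)[0]))


for n in [1, 2, 5, 10, 20, 40]:
    print(f"n={n:2d}  M_n={m_n(n):.2f}")
```

Because $S_{n+1} \subset S_n$, the computed $M_n$ is nondecreasing in $n$, and $M_1 = 1$ since $S_1$ contains the constant function.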

In Lemma A.1 in Appendix 1, we show that Assumptions 1 and 2 imply Condition (10) in Proposition 2. Then, the consistency of the TiR estimator follows from Theorem 1 and Proposition 2.

4 Mean Integrated Square Error analysis of the TiR estimator

4.1 The Mean Integrated Square Error

In this section, we derive the Mean Integrated Square Error (MISE) of the TiR estimator with a deterministic sequence of regularisation parameters. To simplify the exposition, we assume that an optimal weighting matrix is used.

Assumption 3: The asymptotic weighting matrix is $\Omega_0(z) = V_0\left[g(Y_t, \varphi_0(X_t)) \mid Z_t = z\right]^{-1}$.

The asymptotic expansion of the MISE is characterized in the next Proposition.

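Proposition 3: Under Assumptions 1-3, the assumptions in Appendix B, and the bandwidth conditions

$$h_T^m = o\left(\lambda_T\, b(\lambda_T)\right), \qquad (T\lambda_T)^{-1} = o\left(h_T^{d_Z}\right), \tag{11}$$

the MISE of the TiR estimator $\hat\varphi$ with deterministic sequence $(\lambda_T)$ is given by

$$E\left[\|\hat\varphi - \varphi_0\|^2\right] = \frac{1}{T} \sum_{j=1}^{\infty} \frac{\nu_j}{(\lambda_T + \nu_j)^2} \|\phi_j\|^2 + b(\lambda_T)^2 =: M_T(\lambda_T) \tag{12}$$

up to terms which are asymptotically negligible w.r.t. the RHS, where $\{\phi_j : j \in \mathbb{N}\}$ are the orthonormal eigenfunctions of operator $\bar{A}A$ associated with eigenvalues $\nu_j$, $\bar{A}$ denotes the adjoint operator of $A$ w.r.t. the scalar products $\langle \cdot, \cdot \rangle_H$ and $\langle \cdot, \cdot \rangle_{L^2_{\Omega_0}(F_Z)}$, the function $b(\lambda_T)$ is given by

$$b(\lambda_T) = \left\| (\lambda_T + \bar{A}A)^{-1} \bar{A}A\varphi_0 - \varphi_0 \right\|, \tag{13}$$

$m$ is the order of the kernel $K$, and $d_Z$ is the dimension of $Z$.

Proof: See Appendix 2.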

The asymptotic expansion of the MISE consists of two components: a variance term and a bias term.

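(i) The bias function $b(\lambda_T)$ is the $L^2$ norm of $(\lambda_T + \bar{A}A)^{-1}\bar{A}A\varphi_0 - \varphi_0 =: \varphi^* - \varphi_0$. To interpret function $\varphi^*$, note that the quadratic approximation of the limit criterion [see (4) and Assumption 1] can be written as

$$ \langle\Delta\varphi, A^*A\Delta\varphi\rangle = E_0\big[(A\Delta\varphi)(Z)'\,\Omega_0(Z)\,(A\Delta\varphi)(Z)\big] = \langle\Delta\varphi, \bar{A}A\Delta\varphi\rangle_H, \quad \varphi \in \Theta. $$

Then, function $\varphi^*$ minimizes the penalized asymptotic criterion $\langle\Delta\varphi, \bar{A}A\Delta\varphi\rangle_H + \lambda_T\|\varphi\|_H^2$. Thus, $b(\lambda_T)$ is the asymptotic bias arising from introducing the penalty $\lambda_T\|\varphi\|_H^2$ in the criterion.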

It corresponds to the so-called regularisation bias in the theory of Tikhonov regularisation [see e.g. Kress (1999), Groetsch (1984)]. Under general conditions on operator $\bar{A}A$ and true function $\varphi_0$, the bias function $b(\lambda)$ is increasing w.r.t. $\lambda$ and such that $b(\lambda) \to 0$ as $\lambda \to 0$.
(ii) The variance term $T^{-1}\sum_{j=1}^{\infty}\|\phi_j\|^2\,\nu_j/(\lambda_T+\nu_j)^2$ involves a weighted sum of the "regularised" inverse eigenvalues $\nu_j/(\lambda_T+\nu_j)^2$ of operator $\bar{A}A$, with weights $\|\phi_j\|^2$.² To interpret it, note that the inverse of operator $\bar{A}A$ corresponds to the standard asymptotic variance matrix $(J_0'V_0^{-1}J_0)^{-1}$ of the efficient GMM estimator in the parametric setting, where $J_0 = E_0[\partial g/\partial\theta']$ and $V_0 = V_0[g]$. In the ill-posed nonparametric setting, the inverse of operator $\bar{A}A$ is unbounded, and its eigenvalues $1/\nu_j$ diverge to infinity. The penalty term $\lambda_T\|\varphi\|_H^2$ in the criterion defining the TiR estimator implies that the inverse eigenvalues $1/\nu_j$ are replaced by $\nu_j/(\lambda_T+\nu_j)^2$.
The variance term $T^{-1}\sum_{j=1}^{\infty}\|\phi_j\|^2\,\nu_j/(\lambda_T+\nu_j)^2$ is a decreasing function of $\lambda_T$. To study its behaviour when $\lambda_T \to 0$, we introduce the next assumption.

Assumption 4: The eigenfunctions $\phi_j$ and the eigenvalues $\nu_j$ of $\bar{A}A$ satisfy

$$ \sum_{j=1}^{\infty}\nu_j^{-1}\|\phi_j\|^2 = \infty. $$

Under Assumption 4, the series $k_T := \sum_{j=1}^{\infty}\|\phi_j\|^2\,\nu_j/(\lambda_T+\nu_j)^2$ diverges as $\lambda_T \to 0$. When $k_T \to \infty$ such that $k_T/T \to 0$, the variance term converges to zero. However, the rate of convergence is slower than the parametric rate $1/T$. This slower rate of convergence is typical in nonparametric estimation. Note, however, that the slower rate does not come from localization, as in kernel estimation, but from the ill-posedness of the problem, which implies $\nu_j \to 0$.

² Since $\nu_j/(\lambda_T+\nu_j)^2 \le \nu_j/\lambda_T^2$, the infinite sum converges under Assumption B.6 (i) in Appendix B.

The asymptotic expansion of the MISE of the TiR estimator given in Proposition 3 does not involve the bandwidth $h_T$, as long as Conditions (11) are satisfied. The variance term is asymptotically independent of $h_T$ because the asymptotic expansion of $\hat\varphi - \varphi_0$ involves the kernel density estimator integrated w.r.t. $(Y, X, Z)$ [see Equation (36) in Appendix 2, first term, and the proof of Lemma A.3]. The integral averages out the localization effect of the bandwidth $h_T$. By contrast, the kernel estimation $\hat m(\varphi, z)$ of the conditional moment function does affect the bias of the TiR estimator. However, the assumption $h_T^m = o(\lambda_T b(\lambda_T))$ in (11) implies that this estimation bias is asymptotically negligible compared to the regularisation bias [see Lemma A.4 in Appendix 2].

Finally, it is also possible to derive a similar asymptotic expansion of the MISE for the estimator $\tilde\varphi_T$ regularised by the $L^2$ norm. This characterisation is new:

$$ E\big[\|\tilde\varphi_T - \varphi_0\|^2\big] = \frac{1}{T}\sum_{j=1}^{\infty}\frac{\mu_j}{(\lambda_T+\mu_j)^2} + \tilde b(\lambda_T)^2, \tag{14} $$

where $\mu_j$ are the eigenvalues of operator $A^*A$, and $\tilde b(\lambda_T) = \big\|(\lambda_T + A^*A)^{-1}A^*A\varphi_0 - \varphi_0\big\|$.

Let us now come back to the MISE $M_T(\lambda)$ of the TiR estimator in Proposition 3 and discuss the optimal choice of the regularisation parameter $\lambda_T$. Since the bias term is increasing in the regularisation parameter, whereas the variance term is decreasing, we face a bias-variance trade-off. The optimal sequence of deterministic regularisation parameters is given by $\lambda_T^* = \arg\min_{\lambda>0} M_T(\lambda)$, and the corresponding optimal MISE of the TiR estimator is given by $M_T^* := M_T(\lambda_T^*)$.
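The trade-off can be made concrete numerically. The minimal NumPy sketch below evaluates the leading-order expression (12) on a truncated spectrum and approximates $\lambda_T^*$ on a grid. The spectrum and bias function are hypothetical inputs in the spirit of Assumptions 5 and 6, with all constants set to one, and the function names are ours.

```python
import numpy as np


def mise_leading_order(lam, T, nu, phi_norm2, bias):
    """Leading-order MISE of eq. (12), truncated to the supplied eigenpairs:
    (1/T) * sum_j nu_j / (lam + nu_j)^2 * ||phi_j||^2  +  b(lam)^2."""
    variance = np.sum(nu / (lam + nu) ** 2 * phi_norm2) / T
    return variance + bias(lam) ** 2


def optimal_lambda_on_grid(T, nu, phi_norm2, bias, grid):
    """Grid approximation of lambda*_T = argmin_{lam > 0} M_T(lam)."""
    values = np.array([mise_leading_order(lam, T, nu, phi_norm2, bias) for lam in grid])
    k = int(np.argmin(values))
    return grid[k], values[k]


# Hypothetical inputs: nu_j = exp(-alpha j), ||phi_j||^2 = j^(-beta), b(lam) = lam^delta.
alpha, beta, delta = 2.0, 3.0, 0.5
j = np.arange(1, 201)
nu = np.exp(-alpha * j)
phi_norm2 = j ** (-beta)
bias = lambda lam: lam ** delta
grid = np.logspace(-8, 0, 401)

lam_star, mise_star = optimal_lambda_on_grid(1000, nu, phi_norm2, bias, grid)
print(f"lambda*_T ~ {lam_star:.2e}, M_T* ~ {mise_star:.3e}")
```

Rerunning `optimal_lambda_on_grid` with a larger `T` moves the minimiser towards zero, because the variance term is scaled by $1/T$ while the bias term does not depend on $T$.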

The optimal sequence of regularisation parameters $\lambda_T^*$, and in particular its rate of convergence to zero, depends on the decay behaviour of the eigenvalues $\nu_j$ and of the norms of eigenfunctions $\|\phi_j\|$, as well as on the bias function $b(\lambda)$ close to $\lambda = 0$. In the next section, we characterize the optimal sequence of regularisation parameters $\lambda_T^*$, the corresponding optimal MISE $M_T^*$, and their rates of convergence in a broad class of models.

4.2 Optimal rates of convergence

The eigenvalues $\nu_j$ and the norms of eigenfunctions $\|\phi_j\|$ can feature different types of decay as $j \to \infty$, for instance geometric or hyperbolic decay. Intuitively, the first type is associated with a faster convergence of the spectrum to zero, and thus with a more serious ill-posedness problem. In this section, we focus our analysis on the case where the eigenvalues $\nu_j$ feature geometric decay and the norms of eigenfunctions $\|\phi_j\|$ feature hyperbolic decay. Results for the other cases are summarised at the end of the section.

Assumption 5: The eigenvalues $\nu_j$ and the norms of the eigenfunctions $\|\phi_j\|$ of operator $\bar{A}A$ are such that, for $j = 1, 2, \ldots$, and some positive constants $C_1, C_2$,

(i) $\nu_j = C_1\exp(-\alpha j)$, $\alpha > 0$,   (ii) $\|\phi_j\|^2 = C_2 j^{-\beta}$, $\beta > 0$.

Assumption 5 (i) is satisfied for a large number of models, including for instance the two examples that we consider below in our Monte-Carlo analysis. In general, it is known that, under appropriate regularity conditions, compact integral operators with smooth kernel feature eigenvalues with decay of (at least) exponential type [see Theorem 15.20 in Kress (1999)].³ Assumption 5 (ii) is adopted e.g. in Wahba (1977), and is also satisfied in the examples of our Monte-Carlo analysis.

We further assume that the bias function features a power-law behaviour close to λ = 0.

Assumption 6: The bias function is such that $b(\lambda) = C_3\lambda^{\delta}$, $\delta > 0$, for $\lambda$ close to 0, where $C_3$ is a positive constant.

Then, the MISE and the optimal sequence of regularisation parameters are characterised in

the next Proposition.

Proposition 4: Under the Assumptions of Proposition 3 and Assumptions 5 and 6, for some positive constants $c_1$, $c_2$, $c$ and $\bar c$, we have

(i) The MISE is $M_T(\lambda) = c_1\frac{1}{T\lambda\,[\log(1/\lambda)]^{\beta}} + c_2\lambda^{2\delta}$, up to terms which are negligible when $\lambda \to 0$ and $T \to \infty$.

(ii) The optimal sequence of regularisation parameters is

$$ \log\lambda_T^* = \log c - \frac{1}{1+2\delta}\log T, \qquad T \in \mathbb{N}, \tag{15} $$

up to a term which is negligible w.r.t. the RHS.

(iii) The optimal MISE is $M_T^* = \bar c\,T^{-\frac{2\delta}{1+2\delta}}(\log T)^{-\frac{2\delta\beta}{1+2\delta}}$, up to a term which is negligible w.r.t. the RHS.

³ In the case of linear IV estimation and regularisation with $L^2$ norm, the eigenvalues correspond to the nonlinear canonical correlations of $(X, Z)$. When $X$ and $Z$ are monotonic transformations of variables which are jointly normally distributed with correlation parameter $\rho$, the canonical correlations of $(X, Z)$ are $\rho^j$, $j \in \mathbb{N}$ [see e.g. DFR]. Thus the eigenvalues feature exponential decay.
Proof: See Appendix 3.

The log of the optimal regularisation parameter is linear in the log sample size. The slope coefficient $\gamma := 1/(1+2\delta)$ is smaller than 1, and depends on the convexity parameter $\delta$ of the bias function close to $\lambda = 0$. We have $\gamma < 1/2$ when the squared bias function $b(\lambda)^2$ is convex, that is $2\delta > 1$, and $\gamma > 1/2$ when $2\delta < 1$. The optimal MISE converges to zero as a power of $T$ and of $\log T$. The exponent of $T$ in the dominant term is $-2\delta/(1+2\delta)$. Since $2\delta/(1+2\delta) < 1$, this rate is slower than the parametric rate $1/T$ because of ill-posedness, and it improves as the convexity parameter $\delta$ of the bias function increases. Note that the geometric decay rate $\alpha$ affects neither the rate of convergence of the optimal regularisation sequence nor that of the MISE, whereas the coefficient $\beta$ of the eigenfunction norms affects only the exponent of the $\log T$ term in the MISE. Finally, under Assumptions 5 and 6, the bandwidth conditions (11) are fulfilled for the optimal sequence of regularisation parameters (15) if $h_T = C\cdot T^{-\eta}$, with

$$ \frac{1}{m}\,\frac{1+\delta}{1+2\delta} < \eta < \frac{1}{d_Z}\,\frac{2\delta}{1+2\delta}. $$

Up to logarithmic factors, $\lambda_T^* b(\lambda_T^*)$ is of order $T^{-(1+\delta)/(1+2\delta)}$ and $(T\lambda_T^*)^{-1}$ is of order $T^{-2\delta/(1+2\delta)}$. This condition can be satisfied if $m/d_Z > (1+\delta)/(2\delta)$.
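The admissible range for $\eta$ can be checked mechanically. The sketch below, with a function name of our choosing, plugs $\lambda_T \propto T^{-1/(1+2\delta)}$ and $b(\lambda) \propto \lambda^{\delta}$ into the two conditions in (11). It ignores constants and logarithmic factors.

```python
def bandwidth_exponent_window(delta, m, d_z):
    """Exponents eta for h_T = C * T**(-eta) compatible with conditions (11) when
    lambda_T ~ T**(-1/(1+2*delta)) and b(lambda) ~ lambda**delta (logs ignored).

    h_T^m = o(lambda_T * b(lambda_T))  <=>  eta > (1 + delta) / (m * (1 + 2*delta))
    (T*lambda_T)^(-1) = o(h_T^d_z)     <=>  eta < 2*delta / (d_z * (1 + 2*delta))
    """
    eta_low = (1.0 + delta) / (m * (1.0 + 2.0 * delta))
    eta_high = 2.0 * delta / (d_z * (1.0 + 2.0 * delta))
    return eta_low, eta_high, eta_low < eta_high


# Second-order kernel, scalar instrument, delta = 0.5:
print(bandwidth_exponent_window(0.5, m=2, d_z=1))  # (0.375, 0.5, True)
```

The window is non-empty exactly when $m/d_Z > (1+\delta)/(2\delta)$. For a small $\delta$, this requires a higher-order kernel.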
To conclude this section, we briefly discuss the optimal rate of convergence of the MISE when the eigenvalues feature hyperbolic decay, that is $\nu_j = Cj^{-\alpha}$, $\alpha > 0$, or when regularisation with $L^2$ norm is adopted. The results are summarized in Table 1 below, and are found using Formula (14) and an argument similar to the proof of Proposition 4. In Table 1, parameter $\beta$ is defined as in Assumption 5 (ii) for the TiR estimator. Parameters $\alpha$ and $\tilde\alpha$ denote the hyperbolic decay rates of the eigenvalues of operator $\bar{A}A$ for the TiR estimator, and of operator $A^*A$ for $L^2$ regularisation, respectively. We assume $\alpha, \tilde\alpha > 1$, and $\alpha > \beta - 1$ to satisfy Assumption 4. Finally, parameters $\delta$ and $\tilde\delta$ are the power-law coefficients of the bias functions $b(\lambda)$ and $\tilde b(\lambda)$ for $\lambda \to 0$ as in Assumption 6, where $b(\lambda)$ is defined in (13) for the TiR estimator, and $\tilde b(\lambda)$ in (14) for $L^2$ regularisation.

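Spectrum     | TiR estimator                                                               | L2 regularisation
geometric    | $T^{-\frac{2\delta}{1+2\delta}}(\log T)^{-\frac{2\delta\beta}{1+2\delta}}$  | $T^{-\frac{2\tilde\delta}{1+2\tilde\delta}}$
hyperbolic   | $T^{-\frac{2\delta}{1+2\delta+(1-\beta)/\alpha}}$                           | $T^{-\frac{2\tilde\delta}{1+2\tilde\delta+1/\tilde\alpha}}$

Table 1: Optimal rate of convergence of the MISE. The decay factors are $\alpha$ and $\tilde\alpha$ for the eigenvalues, $\delta$ and $\tilde\delta$ for the bias, and $\beta$ for the squared norm of the eigenfunctions.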
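The entries of Table 1 can be encoded in a small helper that returns the exponents $(r, s)$ of a rate of the form $T^{-r}(\log T)^{-s}$. This is a sketch with a hypothetical name. Following the text, the $L^2$ column is obtained by setting $\beta = 0$ and passing $(\tilde\alpha, \tilde\delta)$ in place of $(\alpha, \delta)$.

```python
import math


def mise_rate(delta, spectrum, beta=0.0, alpha=math.inf):
    """Exponents (r, s) such that the optimal MISE behaves like T**(-r) * (log T)**(-s),
    as in Table 1. For L2 regularisation pass beta=0 and (alpha~, delta~)."""
    if spectrum == "geometric":
        return 2 * delta / (1 + 2 * delta), 2 * delta * beta / (1 + 2 * delta)
    if spectrum == "hyperbolic":
        return 2 * delta / (1 + 2 * delta + (1 - beta) / alpha), 0.0
    raise ValueError("spectrum must be 'geometric' or 'hyperbolic'")


# TiR, geometric spectrum, delta = 0.5, beta = 3:  T^(-1/2) (log T)^(-3/2)
print(mise_rate(0.5, "geometric", beta=3.0))   # (0.5, 1.5)
```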

With hyperbolic spectrum, the rate of convergence (power of T ) of the TiR estimator

features an additional term (1 − β) /α in the denominator, which involves both the α and

β coeﬃcients. When β > 1, the rate of convergence is faster than that with geometric

spectrum. This is an eﬀect of the less severe ill-posedness problem. The rate of convergence

with geometric spectrum is recovered letting α → ∞ (up to the log T term).

The rate of convergence with $L^2$ regularisation coincides with that of the TiR estimator with $\beta = 0$ and coefficients $\alpha$, $\delta$ corresponding to operator $A^*A$ instead of $\bar{A}A$. With geometric spectrum, the TiR estimator features a faster rate of convergence than the regularised estimator with $L^2$ norm if $\delta > \tilde\delta$, that is, if the bias function of the TiR estimator is more convex. Finally, note that with hyperbolic spectrum and $L^2$ regularisation, the formula given in Table 1 corresponds to that derived by HH, Theorem 4.1.⁴

5 The TiR estimator for linear moment restrictions

In this section we derive the TiR estimator when the moment restrictions are linear w.r.t. the

functional parameter ϕ0 . We consider the case of non-parametric IV estimation of a single

equation model, with g (y, ϕ0 (x)) = ϕ0 (x) − y, and conditional moment as in (3). Then, the

estimated moment function is given by

$$ \hat m(\varphi, z) = \int \varphi(x)\,\hat f(w|z)\,dw - \int y\,\hat f(w|z)\,dw =: (\hat A\varphi)(z) - \hat r(z). $$

To simplify the exposition, we assume that $\Omega_0(z) = V_0[Y_t - \varphi_0(X_t) \mid Z = z]^{-1} = 1$ in Assumption 3. The objective function of the TiR estimator in Definition 1 can be rewritten as [see Appendix 2.1]

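$$ Q_T(\varphi) + \lambda_T\|\varphi\|_H^2 = \langle\varphi, \hat{\bar{A}}\hat A\varphi\rangle_H - 2\langle\varphi, \hat{\bar{A}}\hat r\rangle_H + \lambda_T\langle\varphi, \varphi\rangle_H, \quad \varphi \in H^2[0,1], \tag{16} $$

up to a term independent of $\varphi$, where $\hat{\bar{A}}$ denotes the linear operator defined on $L^2_{\Omega_0}(F_Z)$ by

$$ \langle\varphi, \hat{\bar{A}}\psi\rangle_H = \frac{1}{T}\sum_{t=1}^{T}(\hat A\varphi)(Z_t)\,\psi(Z_t), \quad \varphi \in H^2[0,1],\ \psi \in L^2_{\Omega_0}(F_Z). \tag{17} $$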

Under the regularity conditions in Appendix B, Criterion (16) admits a global minimum $\hat\varphi$ on $H^2[0,1]$, which is characterized by the first order condition

$$ \big(\lambda_T + \hat{\bar{A}}\hat A\big)\hat\varphi = \hat{\bar{A}}\hat r. \tag{18} $$

⁴ To see this, note that their Assumption A.3 corresponds to $\tilde\delta = (2\beta_{HH} - 1)/(\tilde\alpha + \beta_{HH})$, where $\beta_{HH}$ is the $\beta$ coefficient of HH.

This is a Fredholm integral equation of Type II.⁵ The transformation of the ill-posed problem (1) into the well-posed estimating equation (18) is induced by the penalty term involving the Sobolev norm. The TiR estimator is the unique solution of Equation (18) and is given by

$$ \hat\varphi = \big(\lambda_T + \hat{\bar{A}}\hat A\big)^{-1}\hat{\bar{A}}\hat r. \tag{19} $$

The TiR estimator can be approximated numerically by introducing a finite-dimensional basis of functions $\{P_j : j = 1, \ldots, K\}$ in $H^2[0,1]$ and solving Equation (19) on the subspace spanned by $\{P_j : j = 1, \ldots, K\}$, which yields

$$ \varphi \simeq \sum_{j=1}^{K}\theta_j P_j =: \theta'P, \quad \theta \in \mathbb{R}^K. \tag{20} $$

The $K \times K$ matrix corresponding to operator $\hat{\bar{A}}\hat A$ on the subspace spanned by $\{P_j\}$ is given by [using (17)]

$$ \langle P_i, \hat{\bar{A}}\hat A P_j\rangle_H = \frac{1}{T}\sum_{t=1}^{T}(\hat A P_i)(Z_t)\,(\hat A P_j)(Z_t) = \frac{1}{T}\big(\hat P'\hat P\big)_{i,j}, \quad i, j = 1, \ldots, K, $$

where $\hat P$ is the $T \times K$ matrix with rows $\hat P(Z_t)' = \int P(x)'\hat f(w|Z_t)\,dw$, $t = 1, \ldots, T$. Matrix $\hat P$ is the matrix of the "fitted values" in the regression of $P(X)$ on $Z$ at the sample points. Then, Equation (18) reduces to the matrix equation $\big(\lambda_T D + \frac{1}{T}\hat P'\hat P\big)\theta = \frac{1}{T}\hat P'\hat R$, where $\hat R = (\hat r(Z_1), \ldots, \hat r(Z_T))'$ and $D$ is the $K \times K$ matrix of Sobolev scalar products $D_{i,j} = \langle P_i, P_j\rangle_H$, $i, j = 1, \ldots, K$. The solution is given by

$$ \hat\theta = \Big(\lambda_T D + \frac{1}{T}\hat P'\hat P\Big)^{-1}\frac{1}{T}\hat P'\hat R, $$

which yields the approximation of the TiR estimator $\hat\varphi \simeq \hat\theta'P$.
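The following is a minimal NumPy sketch of the closed-form solution for $\hat\theta$. It assumes $\hat P$, $\hat R$ and $D$ have already been computed. The function name and the toy inputs are ours, and `np.linalg.solve` is used rather than forming the inverse explicitly.

```python
import numpy as np


def tir_coefficients(P_hat, R_hat, D, lam):
    """Solve (lam * D + P_hat'P_hat / T) theta = P_hat'R_hat / T.

    P_hat: (T, K) fitted values of the basis P(X) given Z at the sample points.
    R_hat: (T,) fitted values r_hat(Z_t).
    D:     (K, K) Gram matrix of Sobolev scalar products <P_i, P_j>_H."""
    T = P_hat.shape[0]
    lhs = lam * D + P_hat.T @ P_hat / T
    rhs = P_hat.T @ R_hat / T
    return np.linalg.solve(lhs, rhs)   # solve rather than invert, for numerical stability


# Toy inputs (not the paper's design); with D = I the ridge term shrinks theta as lam grows.
rng = np.random.default_rng(0)
P_hat = rng.normal(size=(200, 4))
R_hat = P_hat @ np.array([1.0, -0.5, 0.25, 0.0]) + 0.1 * rng.normal(size=200)
D = np.eye(4)
for lam in (0.0, 0.1, 10.0):
    print(lam, np.round(tir_coefficients(P_hat, R_hat, D, lam), 3))
```

With `lam = 0` this reduces to least squares of $\hat R$ on the fitted values $\hat P$, that is, the 2SLS part. The term `lam * D` is the ridge correction.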

⁵ See e.g. Linton and Mammen (2005), (2006), Gagliardini and Gouriéroux (2006), and the survey by Carrasco, Florens, Renault (2005) for other examples of estimation problems leading to Type II equations.
Estimator $\hat\theta$ is a 2SLS estimator with a ridge correction term. It is easy to verify that this is the estimator obtained by substituting Approximation (20) into Criterion (16) and minimizing w.r.t. $\theta$. This latter approach has been considered by NP, AC, and Blundell et al (2005), who use sieve estimators. However, it is important to emphasize that the introduction of a series of basis functions as in (20) is simply a method to approximate numerically the true TiR estimator $\hat\varphi$ in (19), which is a well-defined estimator on the function space. In particular, when $K = K_T \to \infty$ sufficiently fast with $T$, the asymptotic properties of estimator $\hat\theta'P$ are the same as those of estimator $\hat\varphi$. Moreover, the asymptotic properties (consistency) of the estimators proposed by NP, AC, and Blundell et al (2005) have been derived only in the case where parameter $\lambda_T$ is tied down by the inequality constraint $\|\hat\varphi\|_H \le \bar B$ for fixed $\bar B$, whereas for the TiR estimator $\lambda_T$ is treated as a free regularisation parameter depending on sample size.

6 A Monte Carlo study

6.1 Data generating process

Following NP we draw the errors $U$ and $V$ and the instrument $Z$ as

$$ \begin{pmatrix} U \\ V \\ Z \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho & 0 \\ \rho & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\right), \quad \rho \in \{0, 0.5\}, $$

and build $X^* = Z + V$. Then we map $X^*$ into a variable $X = \Phi(X^*)$, which lives in $[0,1]$. The function $\Phi$ denotes the cdf of a standard Gaussian variable, and is assumed to be known. To generate $Y$, we restrict ourselves to the linear case, since a simulation analysis of a nonlinear case would be very time consuming. We examine two designs:

Case 1: $Y = B_{a,b}(X) + U$, where $B_{a,b}$ denotes the cdf of a Beta$(a, b)$ variable;

Case 2: $Y = \sin(\pi X) + U$.

The parameters of the beta distribution are chosen equal to $a = 2$ and $b = 5$. When the correlation $\rho$ between $U$ and $V$ is 50%, there is endogeneity in both cases. When $\rho = 0$, there is no need to correct for the endogeneity bias.

The moment condition is

$$ E_0[Y - \varphi_0(X) \mid Z] = 0, $$

where the functional parameter is $\varphi_0(x) = B_{a,b}(x)$ in Case 1, and $\varphi_0(x) = \sin(\pi x)$ in Case 2, $x \in [0,1]$.
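The following is a sketch of this data generating process in NumPy for $a = 2$, $b = 5$. The Beta$(2,5)$ cdf is written in closed form using the identity $I_x(a, b) = \sum_{k=a}^{a+b-1}\binom{a+b-1}{k}x^k(1-x)^{a+b-1-k}$, valid for integer $a, b$. The function names and the random generator are our choices and do not reproduce the authors' Gauss code.

```python
import math
import numpy as np


def beta_2_5_cdf(x):
    """cdf of Beta(2, 5), via the binomial-sum identity for integer parameters."""
    n = 6  # a + b - 1
    return sum(math.comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(2, n + 1))


def std_normal_cdf(x):
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(x) / math.sqrt(2.0)))


def simulate(T, rho, case, rng):
    """Draw (Y, X, Z) for Case 1 (Beta cdf) or Case 2 (sin) with corr(U, V) = rho."""
    cov = np.array([[1.0, rho, 0.0], [rho, 1.0, 0.0], [0.0, 0.0, 1.0]])
    U, V, Z = rng.multivariate_normal(np.zeros(3), cov, size=T).T
    X = std_normal_cdf(Z + V)            # X = Phi(X*), X* = Z + V, so X lies in [0, 1]
    phi0 = beta_2_5_cdf if case == 1 else (lambda x: np.sin(np.pi * x))
    Y = phi0(X) + U
    return Y, X, Z


rng = np.random.default_rng(42)
Y, X, Z = simulate(400, rho=0.5, case=1, rng=rng)
```

With `rho=0.5` the structural error $U$ is positively correlated with $X$ but uncorrelated with $Z$. This is the endogeneity that OLS fails to correct.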

6.2 Estimation procedure

Since we face an unknown function $\varphi_0$ on $[0,1]$, we use a series approximation based on standardized shifted Chebyshev polynomials of the first kind (see Section 22 on orthogonal polynomials of Abramowitz and Stegun (1970) for their mathematical properties). We take orders 0 to 5, which yields six coefficients ($K = 6$) to be estimated in the approximation $\varphi(x) \simeq \sum_{j=0}^{5}\theta_j P_j(x)$, where $P_0(x) = T_0^*(x)/\sqrt{\pi}$, $P_j(x) = T_j^*(x)/\sqrt{\pi/2}$, $j \neq 0$, and the shifted Chebyshev polynomials of the first kind are

$$ T_0^*(x) = 1, \quad T_1^*(x) = -1 + 2x, \quad T_2^*(x) = 1 - 8x + 8x^2, $$
$$ T_3^*(x) = -1 + 18x - 48x^2 + 32x^3, \quad T_4^*(x) = 1 - 32x + 160x^2 - 256x^3 + 128x^4, $$
$$ T_5^*(x) = -1 + 50x - 400x^2 + 1120x^3 - 1280x^4 + 512x^5. $$

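The (squared) Sobolev norm $\|\varphi\|_H^2 = \int_0^1\varphi^2 + \int_0^1(\nabla\varphi)^2$ is approximated by

$$ \|\varphi\|_H^2 \simeq \sum_{i=0}^{5}\sum_{j=0}^{5}\theta_i\theta_j\int_0^1\big(P_i(x)P_j(x) + \nabla P_i(x)\nabla P_j(x)\big)\,dx. $$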

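The coefficients in this quadratic form $\theta'D\theta$ take a closed form, and can be computed easily via integration with a symbolic calculus package. The matrix $D$ is symmetric:

$$ D = \frac{1}{\pi}\begin{pmatrix}
1 & 0 & -\frac{\sqrt{2}}{3} & 0 & -\frac{\sqrt{2}}{15} & 0 \\
0 & \frac{26}{3} & 0 & \frac{38}{5} & 0 & \frac{166}{21} \\
-\frac{\sqrt{2}}{3} & 0 & \frac{218}{5} & 0 & \frac{1182}{35} & 0 \\
0 & \frac{38}{5} & 0 & \frac{3898}{35} & 0 & \frac{5090}{63} \\
-\frac{\sqrt{2}}{15} & 0 & \frac{1182}{35} & 0 & \frac{67894}{315} & 0 \\
0 & \frac{166}{21} & 0 & \frac{5090}{63} & 0 & \frac{82802}{231}
\end{pmatrix}. $$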
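The $L^2$ norm $\|\varphi\|^2$ can be approximated in a similar way with $\theta'B\theta$, where

$$ B = \frac{1}{\pi}\begin{pmatrix}
1 & 0 & -\frac{\sqrt{2}}{3} & 0 & -\frac{\sqrt{2}}{15} & 0 \\
0 & \frac{2}{3} & 0 & -\frac{2}{5} & 0 & -\frac{2}{21} \\
-\frac{\sqrt{2}}{3} & 0 & \frac{14}{15} & 0 & -\frac{38}{105} & 0 \\
0 & -\frac{2}{5} & 0 & \frac{34}{35} & 0 & -\frac{22}{63} \\
-\frac{\sqrt{2}}{15} & 0 & -\frac{38}{105} & 0 & \frac{62}{63} & 0 \\
0 & -\frac{2}{21} & 0 & -\frac{22}{63} & 0 & \frac{98}{99}
\end{pmatrix}. $$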

Such simple and exact forms ease implementation, improve speed, and contribute to the numerical stability of the estimation procedure.⁶
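As a cross-check on these closed forms, $D$ and $B$ can also be computed numerically. The sketch below uses helper names of our own. It evaluates $P_j$ and $\nabla P_j$ from the polynomial coefficients listed earlier and integrates with Gauss-Legendre quadrature on $[0,1]$. The quadrature is exact here because every integrand is a polynomial of degree at most 10.

```python
import numpy as np
from numpy.polynomial import legendre, polynomial as Poly

# Shifted Chebyshev polynomials T_j^*(x), coefficients in increasing powers of x (from the text).
T_STAR = [
    [1],
    [-1, 2],
    [1, -8, 8],
    [-1, 18, -48, 32],
    [1, -32, 160, -256, 128],
    [-1, 50, -400, 1120, -1280, 512],
]
# Standardisation: P_0 = T_0^* / sqrt(pi), P_j = T_j^* / sqrt(pi / 2) for j != 0.
SCALE = [1 / np.sqrt(np.pi)] + [1 / np.sqrt(np.pi / 2)] * 5


def basis(x):
    """Return (P, dP), two (6, len(x)) arrays with P_j(x) and the derivatives P_j'(x)."""
    P = np.array([s * Poly.polyval(x, c) for s, c in zip(SCALE, T_STAR)])
    dP = np.array([s * Poly.polyval(x, Poly.polyder(c)) for s, c in zip(SCALE, T_STAR)])
    return P, dP


def gram_matrices(n_nodes=40):
    """D_ij = int_0^1 (P_i P_j + P_i' P_j') and B_ij = int_0^1 P_i P_j by Gauss-Legendre."""
    u, w = legendre.leggauss(n_nodes)
    x, w = (u + 1) / 2, w / 2          # map nodes and weights from [-1, 1] to [0, 1]
    P, dP = basis(x)
    B = (P * w) @ P.T
    D = B + (dP * w) @ dP.T
    return D, B


D, B = gram_matrices()
print(np.round(np.pi * D, 4))   # compare with the rational entries of pi * D
```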

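The kernel estimator $\hat m(\varphi, z)$ of the conditional moment is approximated through $\theta'\hat P(z) - \hat r(z)$, where

$$ \hat P(z) \simeq \frac{\sum_{t=1}^{T}P(X_t)\,K\big(\frac{Z_t - z}{h}\big)}{\sum_{t=1}^{T}K\big(\frac{Z_t - z}{h}\big)}, \qquad \hat r(z) \simeq \frac{\sum_{t=1}^{T}Y_t\,K\big(\frac{Z_t - z}{h}\big)}{\sum_{t=1}^{T}K\big(\frac{Z_t - z}{h}\big)}, $$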

where $h$ denotes the bandwidth, and $K$ is the Gaussian kernel. This kernel estimator is asymptotically equivalent to the one described in the lines above. We prefer it because of its numerical tractability: it avoids bivariate numerical integration and the choice of two additional bandwidths. The bandwidth is selected via the standard rule of thumb $h = 1.06\,\hat\sigma_Z T^{-1/5}$ (Silverman (1986)), where $\hat\sigma_Z$ is the empirical standard deviation of $Z_t$.

The weighting function $\Omega_0(z)$ is taken equal to unity. This satisfies Assumption 3, since $V_0[Y - \varphi_0(X) \mid Z = z] = V_0[U \mid Z = z] = 1$ in both designs.
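The fitted values entering $\hat P$ and $\hat R$ can be sketched in NumPy as follows. The helper names are ours, and we assume the $T \times T$ kernel matrix fits in memory. The text does not say whether the empirical standard deviation divides by $T$ or $T-1$; using $T-1$ (`ddof=1`) is our assumption.

```python
import numpy as np


def silverman_bandwidth(Z):
    """Rule of thumb h = 1.06 * sigma_Z * T^(-1/5) (Silverman, 1986); ddof=1 is our assumption."""
    return 1.06 * np.std(Z, ddof=1) * len(Z) ** (-1 / 5)


def nw_fitted(values, Z, h):
    """Nadaraya-Watson regression of each column of `values` (T x q) on Z with a
    Gaussian kernel, evaluated at the sample points Z_1, ..., Z_T."""
    u = (Z[None, :] - Z[:, None]) / h          # u[s, t] = (Z_t - Z_s) / h
    kern = np.exp(-0.5 * u**2)                # Gaussian kernel; 1/sqrt(2 pi) cancels in the ratio
    weights = kern / kern.sum(axis=1, keepdims=True)
    return weights @ values


# Usage with hypothetical inputs: P_X is the T x K matrix of basis values, Y the outcomes.
rng = np.random.default_rng(0)
T = 300
Z = rng.normal(size=T)
X = 1 / (1 + np.exp(-(Z + rng.normal(size=T))))   # any variable in (0, 1), for illustration
P_X = np.column_stack([np.ones(T), 2 * X - 1])    # first two (unscaled) shifted Chebyshev terms
Y = X + rng.normal(scale=0.1, size=T)
h = silverman_bandwidth(Z)
P_hat = nw_fitted(P_X, Z, h)                      # T x K
R_hat = nw_fitted(Y[:, None], Z, h)[:, 0]         # length T
```

Because the weights in each row sum to one, every fitted value is a weighted average of the observations. A constant basis function is therefore reproduced exactly.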

⁶ The Gauss programs developed for this section are available on request from the authors.
6.3 Simulation results

The sample size is initially ﬁxed at T = 400. Estimator performance is measured in terms

of the Mean Integrated Squared Error (MISE) and the Integrated Squared Bias (ISB) based

on averages over 1000 repetitions. We use a univariate Gauss-Legendre quadrature with 40

knots to compute the integrals.
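The two performance measures can be computed as sketched below. We assume that each replication's estimate is stored at the 40 Gauss-Legendre nodes mapped to $[0,1]$, and the names are ours. With this convention, MISE = ISB + integrated variance holds exactly when the variance uses the $1/R$ normalisation over the $R$ replications.

```python
import numpy as np
from numpy.polynomial import legendre


def gauss_legendre_01(n=40):
    """Nodes and weights of n-point Gauss-Legendre quadrature mapped from [-1, 1] to [0, 1]."""
    u, w = legendre.leggauss(n)
    return (u + 1) / 2, w / 2


def mise_and_isb(estimates, phi0_nodes, w):
    """estimates: (R, n) array of R Monte Carlo estimates evaluated at the nodes.
    MISE = average over replications of int (phi_hat - phi0)^2,
    ISB  = int (mean phi_hat - phi0)^2."""
    mise = np.mean((estimates - phi0_nodes) ** 2 @ w)
    isb = (estimates.mean(axis=0) - phi0_nodes) ** 2 @ w
    return mise, isb


x, w = gauss_legendre_01()
phi0 = np.sin(np.pi * x)                                          # Case 2 true function
rng = np.random.default_rng(0)
fake = phi0 + 0.05 + rng.normal(scale=0.1, size=(1000, x.size))   # placeholder estimates
mise, isb = mise_and_isb(fake, phi0, w)
```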

Figures 1 to 4 concern Case 1, while Figures 5 to 8 concern Case 2. In each figure, the left panel plots the MISE on a grid of lambda, the central panel the ISB on a grid of lambda, and the right panel the mean estimated functions and the true function on the unit interval. Mean estimated functions correspond to averages obtained either from regularised estimates with a lambda achieving the lowest MISE or from OLS estimates. The regularisation schemes use the Sobolev norm, corresponding to the TiR estimator (odd-numbered figures), and the $L^2$ norm (even-numbered figures). We consider designs exhibiting endogeneity ($\rho = 0.5$) in Figures 1, 2, 5, 6, while Figures 3, 4, 7, 8 are dedicated to the designs without endogeneity ($\rho = 0$).

Three remarks can be made. First, the bias of the OLS estimator can be large under endogeneity. Second, the MISE of the TiR estimator is more convex in lambda than the one obtained with an $L^2$ norm, and performance is clearly better for the TiR estimator. The Sobolev norm should be strongly favoured over the $L^2$ norm in order to recover the shape of the true functions. Third, the fit obtained by the OLS estimator is almost perfect when endogeneity is absent. Using six polynomials delivers a very good approximation of the true functions.
We have also examined sample sizes $T = 100$ and $T = 1000$, as well as approximations based on polynomials with orders up to 10 and 15. The above conclusions remain qualitatively unaffected. This suggests that, as soon as the order of the polynomials is large enough to deliver a good numerical approximation of the underlying function, it is not necessary to link it with sample size, as explained in Section 5. For example, Figures 9 and 10 are the analogues of Figures 1 and 5 with $T = 1000$. We can see that the bias term is almost identical, while the variance term decreases by a factor of about 2.5, as predicted by Proposition 3 (the variance term is proportional to $1/T$, and $1000/400 = 2.5$).

In Figure 11 we display the six eigenvalues of operator $\bar{A}A$ and the $L^2$-norms of the corresponding eigenfunctions when the same approximation basis of six polynomials is used. These true quantities have been computed by Monte Carlo integration. The eigenvalues $\nu_j$ feature a geometric decay w.r.t. the order $j$, whereas the decay of the squared norms $\|\phi_j\|^2$ is of hyperbolic type. This is consistent with Assumption 5 and the analysis conducted in Proposition 4. A linear fit of the plotted points gives a decay factor $\hat\alpha = 2.254$ for the eigenvalues and a decay factor $\hat\beta = 2.911$ for the norms of the eigenfunctions.
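The decay factors can be recovered by two least-squares line fits, as sketched below (the helper name is ours). In line with Assumption 5, $\alpha$ is minus the slope of $\log\nu_j$ on $j$, and $\beta$ is minus the slope of $\log\|\phi_j\|^2$ on $\log j$.

```python
import numpy as np


def decay_factors(nu, phi_norm2):
    j = np.arange(1, len(nu) + 1)
    alpha_hat = -np.polyfit(j, np.log(nu), 1)[0]                   # geometric decay of nu_j
    beta_hat = -np.polyfit(np.log(j), np.log(phi_norm2), 1)[0]     # hyperbolic decay of ||phi_j||^2
    return alpha_hat, beta_hat


# Synthetic six-term spectrum obeying Assumption 5 exactly (constants are arbitrary):
j = np.arange(1, 7)
alpha_hat, beta_hat = decay_factors(0.8 * np.exp(-2.254 * j), 0.3 * j ** (-2.911))
```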

Figure 12 is dedicated to checking whether the line $\log\lambda_T^* = \log c - \gamma\log T$, implied by Proposition 4 (ii), holds in small samples. For $\rho = 0.5$, the right panel for Case 1 as well as the left panel for Case 2 exhibit a linear relationship between the logarithm of the regularisation parameter minimizing the average MISE over the 1000 Monte Carlo simulations and the logarithm of sample size, ranging from $T = 50$ to $T = 1000$. The OLS estimation of this linear relationship from the plotted pairs delivers $\hat c = .226$, $\hat\gamma = .752$ in Case 1, and $\hat c = .012$, $\hat\gamma = .428$ in Case 2. Both estimated slope coefficients are smaller than 1, and qualitatively consistent with the implications of Proposition 4. Indeed, from Figures 9 and 10 the ISB curve appears to be more convex in Case 2 than in Case 1. This points to a larger $\delta$ parameter, and thus to a smaller slope coefficient $\gamma = 1/(1+2\delta)$, in Case 2. Inverting the relationship $\gamma = 1/(1+2\delta)$, we get estimates of the decay factor $\delta$, namely $\hat\delta = .165$ and $\hat\delta = .668$ in Case 1 and Case 2, respectively.
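The regression and the inversion $\delta = (1/\gamma - 1)/2$ are sketched below with a helper name of our own. As a noise-free check, it reproduces the $\hat\delta$ values above from the reported $(\hat c, \hat\gamma)$.

```python
import numpy as np


def fit_lambda_path(T_values, lam_values):
    """OLS fit of log lambda*_T = log c - gamma * log T, and delta from gamma = 1/(1 + 2 delta)."""
    slope, intercept = np.polyfit(np.log(T_values), np.log(lam_values), 1)
    gamma = -slope
    delta = (1.0 / gamma - 1.0) / 2.0
    return np.exp(intercept), gamma, delta


# Noise-free check using the Case 1 estimates reported above:
T_grid = np.arange(50, 1001, 50)
c_hat, gamma_hat, delta_hat = fit_lambda_path(T_grid, 0.226 * T_grid ** (-0.752))
print(round(c_hat, 3), round(gamma_hat, 3), round(delta_hat, 3))   # 0.226 0.752 0.165
```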

By a similar argument, Proposition 4 also explains the better performance of the TiR estimator compared to the $L^2$-regularised estimator that we reported above. Indeed, comparing the ISB curves of the two estimators in Case 1 (Figures 1 and 2) and in Case 2 (Figures 5 and 6), it appears that the TiR estimator features a more convex ISB curve. This implies $\delta > \tilde\delta$ and thus a faster rate of convergence of the TiR estimator.

Finally, we conclude with a brief discussion of data driven selection procedures for the regularisation parameter $\lambda_T$. We investigate a first method based on the asymptotic spectral representation of the MISE provided in Proposition 3, and a second method based on a resampling approximation.

The first data driven selection procedure aims at estimating Expression (12) directly in order to derive the optimal regularisation parameter. In unreported results, we have checked that the asymptotic MISE, the asymptotic ISB and the asymptotic variance are close to the ones exhibited in Figures 9 and 10. These true quantities have also been computed by Monte Carlo integration. We have found an asymptotic optimal lambda equal to .0018 in Case 1 and to .0009 in Case 2, which are of the same magnitude as .0013 and .0007 in Figures 9 and 10. We have also checked that the linear relationship exhibited in Figure 12 holds true when deduced from optimizing the asymptotic MISE. The OLS estimation delivers $\hat c = .418$, $\hat\gamma = .795$ in Case 1, and $\hat c = .037$, $\hat\gamma = .546$ in Case 2, and thus $\hat\delta = .129$ and $\hat\delta = .418$, respectively.

The data driven estimation algorithm goes as follows:

Algorithm

(i) Perform the spectral decomposition of the matrix $D^{-1}\hat P'\hat P/T$ to get eigenvalues $\hat\nu_j$ and eigenvectors $\hat w_j$, normalized to $\hat w_j'D\hat w_j = 1$, $j = 1, \ldots, K$.

(ii) Get a first-step TiR estimator $\bar\theta$ using a pilot regularisation parameter $\bar\lambda$.

(iii) Estimate the MISE:

$$ \tilde M(\lambda) = \frac{1}{T}\sum_{j=1}^{K}\frac{\hat\nu_j}{(\lambda+\hat\nu_j)^2}\,\hat w_j'B\hat w_j + \bar\theta'\Big[\frac{1}{T}\hat P'\hat P\Big(\lambda D + \frac{1}{T}\hat P'\hat P\Big)^{-1} - I\Big]B\Big[\Big(\lambda D + \frac{1}{T}\hat P'\hat P\Big)^{-1}\frac{1}{T}\hat P'\hat P - I\Big]\bar\theta, $$

and minimize it w.r.t. $\lambda$ to get the optimal regularisation parameter $\hat\lambda$.

(iv) Compute the second-step TiR estimator $\hat\theta$ using regularisation parameter $\hat\lambda$.
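Steps (i) and (iii) can be sketched in NumPy as below. This is our reading of the algorithm, not the authors' Gauss code, and the names are hypothetical. The generalised eigenproblem $\frac{1}{T}\hat P'\hat P\,w = \nu D w$ is equivalent to diagonalising $D^{-1}\hat P'\hat P/T$. We solve it through a Cholesky factor of $D$, which directly delivers eigenvectors with $\hat w_j'D\hat w_j = 1$.

```python
import numpy as np


def estimated_mise(lam, P_hat, D, B, theta_bar):
    """M~(lam) of step (iii): spectral variance term plus plug-in bias term."""
    T, K = P_hat.shape
    X = P_hat.T @ P_hat / T
    # Step (i): X w = nu D w with w'Dw = 1, via D = L L' and y = L' w.
    L_inv = np.linalg.inv(np.linalg.cholesky(D))
    nu, Y = np.linalg.eigh(L_inv @ X @ L_inv.T)
    W = L_inv.T @ Y                                   # columns w_j, with W' D W = I
    wBw = np.einsum("ij,ik,kj->j", W, B, W)           # w_j' B w_j for each j
    variance = np.sum(nu / (lam + nu) ** 2 * wBw) / T
    # Bias term: theta_bar' M' B M theta_bar with M = (lam D + X)^{-1} X - I.
    M = np.linalg.solve(lam * D + X, X) - np.eye(K)
    b = M @ theta_bar
    return variance + b @ B @ b


def select_lambda(grid, P_hat, D, B, theta_bar):
    """Grid minimiser of the estimated MISE (step (iii))."""
    values = [estimated_mise(lam, P_hat, D, B, theta_bar) for lam in grid]
    return grid[int(np.argmin(values))]
```

For instance, `select_lambda(np.logspace(-6, -1, 200), P_hat, D, B, theta_bar)` returns the grid point playing the role of $\hat\lambda$. The spectral variance term equals $\frac{1}{T}\mathrm{tr}\big(B S X S\big)$ with $S = (\lambda D + X)^{-1}$ and $X = \hat P'\hat P/T$, which is a convenient check.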

A second-step estimated MISE, viewed as a function of sample size $T$ and regularisation parameter $\lambda$, can then be computed with $\hat\theta$ instead of $\bar\theta$. Besides, if we assume the decay behaviour of Assumptions 5 and 6, the decay factors $\alpha$ and $\beta$ can be estimated as minus the slopes of the linear fits on the pairs $(\log\hat\nu_j, j)$ and on the pairs $(\log\hat w_j'B\hat w_j, \log j)$, $j = 1, \ldots, K$. After getting the lambdas minimizing the second-step estimated MISE on a grid of sample sizes, we can also estimate $\gamma$ by regressing the logarithm of lambda on the logarithm of sample size.

We have used $\bar\lambda \in \{.0005, .0001\}$ as the pilot regularisation parameter for $T = 1000$ and $\rho = .5$. In Case 1, the average (quartiles) of the selected lambda over 1000 simulations is equal to .0028 (.0014, .0020, .0033) when $\bar\lambda = .0005$, and .0027 (.0007, .0014, .0029) when $\bar\lambda = .0001$. In Case 2, the results are .0009 (.0007, .0008, .0009) when $\bar\lambda = .0005$, and .0008 (.0004, .0006, .0009) when $\bar\lambda = .0001$. The selection procedure tends to overpenalize slightly on average, especially in Case 1, but this does not seem to have much impact on the MISE of the two-step TiR estimator. Indeed, if we use the optimal data driven regularisation parameter at each simulation, the MISE based on averages over the 1000 simulations is equal to .0120 for Case 1 and .0144 for Case 2 when $\bar\lambda = .0005$ (resp., .0156 and .0175 when $\bar\lambda = .0001$). These are of the same magnitude as the best MISE values, .0099 and .0121, in Figures 9 and 10. In Case 1, the selection procedure can overpenalize without unduly affecting efficiency because the MISE curve is flat.

We also get average values for the decay factors $\alpha$ and $\beta$ close to the asymptotic ones. These have been computed by estimating the coefficients of a linear fit for each simulation, and averaging over the 1000 simulations. For $\alpha$ the average (quartiles) is equal to 2.2502 (2.1456, 2.2641, 2.3628), and for $\beta$ it is equal to 2.9222 (2.8790, 2.9176, 2.9619).

To compute the average value for the decay factor $\gamma$, we have used an equally spaced grid of sample sizes $T \in \{500, 550, \ldots, 950, 1000\}$ in the variance component of the MISE, together with the data driven estimate of $\theta$ in the bias component of the MISE. Optimizing on the grid of sample sizes yields an optimal lambda for each sample size per simulation. The logarithm of the optimal lambda is then regressed on the logarithm of the sample size, and the estimated slope is averaged over the 1000 simulations to obtain the average estimated gamma. In Case 1, we get an average (quartiles) of .6081 (.4908, .6134, .6979) when $\bar\lambda = .0005$, and .7224 (.5171, .6517, .7277) when $\bar\lambda = .0001$. In Case 2, we get an average (quartiles) of .5597 (.4918, .5333, .5962) when $\bar\lambda = .0005$, and .5764 (.4946, .5416, .6203) when $\bar\lambda = .0001$.

The second data driven selection procedure builds on the suggestion of Goh (2004) based on a subsampling procedure (also called the m-out-of-n (moon) bootstrap). Although his theoretical results are derived for semiparametric estimators, we believe that they could be extended to our case as well. Recognizing that $\lambda_T = cT^{-\gamma}$, we propose to choose the $c$ and $\gamma$ which minimize the following estimator of the MISE:

$$ \widehat{MISE}(c, \gamma) = \frac{1}{I}\frac{1}{J}\sum_{i,j}\int_0^1\big(\hat\varphi_{i,j}(x; c, \gamma) - \bar\varphi(x)\big)^2\,dx, $$

where $\hat\varphi_{i,j}(x; c, \gamma)$ denotes the estimator based on the $j$th subsample of size $m_i$ ($m_i \ll T$) with regularisation parameter $\lambda_{m_i} = c\,m_i^{-\gamma}$. Here $\bar\varphi(x)$ denotes the estimator based on the original sample of size $T$ with a pilot regularisation parameter $\bar\lambda$ chosen small enough to eliminate the bias.
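The following is a sketch of this criterion. It assumes a user-supplied routine `fit(sample, lam)` that returns the fitted function on $[0,1]$; this interface, like the other names, is hypothetical. Subsamples are drawn without replacement, and the integral uses Gauss-Legendre nodes on $[0,1]$.

```python
import numpy as np
from numpy.polynomial import legendre


def subsampling_mise(data, fit, phi_bar, c, gamma, sizes, J, seed=0, n_nodes=40):
    """(1/I)(1/J) sum_{i,j} int_0^1 (phi_hat_{i,j}(x) - phi_bar(x))^2 dx,
    where phi_hat_{i,j} uses the j-th subsample of size m_i and lambda = c * m_i**(-gamma).

    data:    tuple of equal-length arrays, e.g. (Y, X, Z).
    fit:     hypothetical callable fit(subsample, lam) returning a function of x on [0, 1].
    phi_bar: full-sample estimate with a small pilot lambda, as a function of x."""
    u, w = legendre.leggauss(n_nodes)
    x, w = (u + 1) / 2, w / 2                  # Gauss-Legendre nodes and weights on [0, 1]
    target = phi_bar(x)
    T = len(data[0])
    rng = np.random.default_rng(seed)          # same seed -> same subsamples for every (c, gamma)
    total = 0.0
    for m in sizes:
        lam = c * m ** (-gamma)
        for _ in range(J):
            idx = rng.choice(T, size=m, replace=False)   # subsample without replacement
            phi_hat = fit(tuple(a[idx] for a in data), lam)
            total += (phi_hat(x) - target) ** 2 @ w
    return total / (len(sizes) * J)


def select_c_gamma(grid, **kwargs):
    """Return the (c, gamma) pair of the grid with the smallest estimated MISE."""
    return min(grid, key=lambda cg: subsampling_mise(c=cg[0], gamma=cg[1], **kwargs))
```

Passing the same `seed` for every grid point evaluates all $(c, \gamma)$ pairs on identical subsamples (common random numbers). As a result, the comparison across the grid is not driven by subsampling noise; this is our design choice.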

In practice, we have chosen 500 subsamples ($J = 500$) for each subsample size $m_i \in \{50, 60, 70, \ldots, 100\}$ ($I = 6$), $\bar\lambda \in \{.0005, .0001\}$, and $T = 1000$. To determine $c$ and $\gamma$, we have built a joint grid with values around the OLS estimates coming from Case 1, namely $\{.15, .2, .25\} \times \{.7, .75, .8\}$, and with values around the OLS estimates coming from Case 2, namely $\{.005, .01, .015\} \times \{.35, .4, .45\}$. Note that the two grids yield a similar range for $\lambda_T$. In the experiments for $\rho = 0.5$, we want to verify whether the data driven procedure is able to pick $c$ and $\gamma$ most of the time in the first set of values in Case 1, and in the second set of values in Case 2. On 1000 simulations, we have found a frequency of adequate choices equal to 96% in Case 1 when $\bar\lambda = .0005$, and 87% when $\bar\lambda = .0001$. In Case 2, we have found 77% when $\bar\lambda = .0005$, and 82% when $\bar\lambda = .0001$. These frequencies are scattered among the grid values.

References
Ai, C., and X. Chen (2003): "Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions", Econometrica, 71, 1795-1843.

Carrasco, M., Florens, J.-P., and E. Renault (2005): "Linear Inverse Problems in Structural Econometrics: Estimation Based on Spectral Decomposition and Regularisation", forthcoming in the Handbook of Financial Econometrics.

Darolles, S., Florens, J.-P., and E. Renault (2003): "Nonparametric Instrumental Regression", D.P.

Hall, P., and J. Horowitz (2005): "Nonparametric Methods for Inference in the Presence of Instrumental Variables", Annals of Statistics.

Kress, R. (1999): Linear Integral Equations, Springer.

Newey, W., and J. Powell (2003): "Instrumental Variable Estimation of Nonparametric Models", Econometrica, 71, 1565-1578.

Reed, M., and B. Simon (1980): Functional Analysis, Academic Press.

White, H., and J. Wooldridge (1991): "Some Results on Sieve Estimation with Dependent Observations", in Nonparametric and Semiparametric Methods in Econometrics and Statistics, Proceedings of the Fifth International Symposium in Economic Theory and Econometrics, Cambridge University Press.

[Figure 1 (plot not reproduced): left panel "MISE" and central panel "ISB" against λ (×10⁻³); right panel "Estimated and true functions" against x.]

Figure 1: MISE (left panel), ISB (central panel) and estimated function (right panel) for the TiR estimator using Sobolev norm (solid line) and for OLS estimator (dashed line). The true function is the dotted line in the right panel, and corresponds to Case 1. Correlation parameter is ρ = 0.5, and sample size is T = 400.
[Figure 2 (plot not reproduced): left panel "MISE" and central panel "ISB" against λ; right panel "Estimated and true functions" against x.]

Figure 2: MISE (left panel), ISB (central panel) and estimated function (right panel) for the regularised estimator using L2 norm (solid line) and for OLS estimator (dashed line). The true function is the dotted line in the right panel, and corresponds to Case 1. Correlation parameter is ρ = 0.5, and sample size is T = 400.
[Figure 3 (plot not reproduced): left panel "MISE" and central panel "ISB" against λ (×10⁻³); right panel "Estimated and true functions" against x.]

Figure 3: MISE (left panel), ISB (central panel) and estimated function (right panel) for the TiR estimator using Sobolev norm (solid line) and for OLS estimator (dashed line). The true function is the dotted line in the right panel, and corresponds to Case 1. Correlation parameter is ρ = 0, and sample size is T = 400.
[Figure 4 (plot not reproduced): left panel "MISE" and central panel "ISB" against λ; right panel "Estimated and true functions" against x.]

Figure 4: MISE (left panel), ISB (central panel) and estimated function (right panel) for the regularised estimator using L2 norm (solid line) and for OLS estimator (dashed line). The true function is the dotted line in the right panel, and corresponds to Case 1. Correlation parameter is ρ = 0, and sample size is T = 400.
[Figure panels: MISE (left), ISB (centre) and estimated and true functions (right); horizontal axes λ (×10⁻³), λ (×10⁻³) and x.]

Figure 5: MISE (left panel), ISB (central panel) and estimated function (right panel) for
the TiR estimator using Sobolev norm (solid line) and for OLS estimator (dashed line). The
true function is the dotted line in the right panel, and corresponds to Case 2. Correlation
parameter is ρ = 0.5, and sample size is T = 400.

[Figure panels: MISE (left), ISB (centre) and estimated and true functions (right); horizontal axes λ, λ and x.]

Figure 6: MISE (left panel), ISB (central panel) and estimated function (right panel) for the
regularised estimator using L2 norm (solid line) and for OLS estimator (dashed line). The
true function is the dotted line in the right panel, and corresponds to Case 2. Correlation
parameter is ρ = 0.5, and sample size is T = 400.

[Figure panels: MISE (left), ISB (centre) and estimated and true functions (right); horizontal axes λ (×10⁻³), λ (×10⁻³) and x.]

Figure 7: MISE (left panel), ISB (central panel) and estimated function (right panel) for
the TiR estimator using Sobolev norm (solid line) and for OLS estimator (dashed line). The
true function is the dotted line in the right panel, and corresponds to Case 2. Correlation
parameter is ρ = 0, and sample size is T = 400.

[Figure panels: MISE (left), ISB (centre) and estimated and true functions (right); horizontal axes λ, λ and x.]

Figure 8: MISE (left panel), ISB (central panel) and estimated function (right panel) for the
regularised estimator using L2 norm (solid line) and for OLS estimator (dashed line). The
true function is the dotted line in the right panel, and corresponds to Case 2. Correlation
parameter is ρ = 0, and sample size is T = 400.

[Figure panels: MISE (left), ISB (centre) and estimated and true functions (right); horizontal axes λ (×10⁻³), λ (×10⁻³) and x.]

Figure 9: MISE (left panel), ISB (central panel) and estimated function (right panel) for
the TiR estimator using Sobolev norm (solid line) and for OLS estimator (dashed line). The
true function is the dotted line in the right panel, and corresponds to Case 1. Correlation
parameter is ρ = 0.5, and sample size is T = 1000.

[Figure panels: MISE (left), ISB (centre) and estimated and true functions (right); horizontal axes λ (×10⁻³), λ (×10⁻³) and x.]

Figure 10: MISE (left panel), ISB (central panel) and estimated function (right panel) for
the TiR estimator using Sobolev norm (solid line) and for OLS estimator (dashed line). The
true function is the dotted line in the right panel, and corresponds to Case 2. Correlation
parameter is ρ = 0.5, and sample size is T = 1000.

[Figure panels: "Eigenvalues" (vertical axis log(ν_j)) and "Eigenfunctions" (vertical axis log(‖φ_j‖²)); horizontal axis label log(j).]

Figure 11: The six largest eigenvalues (left panel) and the $L^2$-norms of the corresponding
eigenfunctions (right panel) of operator $A^*A$.

[Figure panels: "Case 1: Beta" (left) and "Case 2: Sin" (right); vertical axis log(λ_T), horizontal axis log(T).]

Figure 12: Log of optimal regularisation parameter as a function of log of sample size for
Case 1 (left panel) and Case 2 (right panel). Correlation parameter is ρ = 0.5.

Appendix 1

Consistency of the TiR estimator

In this Appendix we prove the consistency of penalized extremum estimators

$$\hat\varphi = \arg\inf_{\varphi\in\Theta_T} Q_T(\varphi) + \lambda_T G(\varphi). \qquad \text{(A.1)}$$

This covers, as a special case, the TiR estimator in Definition 1, where $G(\varphi) = \|\varphi\|_H^2$.

A.1.1 Existence and measurability of the estimator

From Theorem 2.2 of White and Wooldridge (1991), the estimator $\hat\varphi$ in (A.1) is well-defined and measurable if

(i) the function $Q_T : \Omega\times\Theta_T \to \mathbb{R}$ is Borel-measurable, where $Q_T(\omega,\varphi)$ denotes the value of the random variable $Q_T(\varphi)$ for the event $\omega\in\Omega$, and $(\Omega, \mathcal{F}, P)$ is a complete probability space;

(ii) the mappings $\varphi\mapsto G(\varphi)$ and $\varphi\mapsto Q_T(\omega,\varphi)$ are weakly lower semi-continuous on $\Theta_T$, $P$-a.s., for any $T$, w.r.t. the $L^2$ norm $\|\cdot\|$;

(iii) the set $\Theta_T$ is compact w.r.t. the $L^2$ norm $\|\cdot\|$ for any $T$.

A.1.2 Consistency of penalized extremum estimators

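Proof of Theorem 1: For any $T$ and some given $\varepsilon > 0$, let us define $\varphi_T^*\in\Theta_T$ such that

$$Q_\infty(\varphi_T^*) + \lambda_T G(\varphi_T^*) = \inf_{\varphi\in\Theta_T:\|\varphi-\varphi_0\|\le\varepsilon} Q_\infty(\varphi) + \lambda_T G(\varphi).$$

We have

$$P\left[\|\hat\varphi-\varphi_0\|>\varepsilon\right] \le P\left[\inf_{\varphi\in\Theta_T:\|\varphi-\varphi_0\|\ge\varepsilon} Q_T(\varphi)+\lambda_T G(\varphi) \le Q_T(\varphi_T^*)+\lambda_T G(\varphi_T^*)\right].$$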

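Let us bound the probability on the RHS. Denoting $\Delta Q_T := Q_T - Q_\infty$, we get

$$\begin{aligned}
&\inf_{\varphi\in\Theta_T:\|\varphi-\varphi_0\|\ge\varepsilon} Q_T(\varphi) + \lambda_T G(\varphi) \le Q_T(\varphi_T^*) + \lambda_T G(\varphi_T^*) \\
\Longrightarrow\ &\inf_{\varphi\in\Theta:\|\varphi-\varphi_0\|\ge\varepsilon} Q_\infty(\varphi) + \lambda_T G(\varphi) + \inf_{\varphi\in\Theta_T}\Delta Q_T(\varphi) \le Q_\infty(\varphi_T^*) + \lambda_T G(\varphi_T^*) + \sup_{\varphi\in\Theta_T}\left|\Delta Q_T(\varphi)\right| \\
\Longrightarrow\ &\inf_{\varphi\in\Theta:\|\varphi-\varphi_0\|\ge\varepsilon} Q_\infty(\varphi) + \lambda_T G(\varphi) - \lambda_T G(\varphi_0) \le \inf_{\varphi\in\Theta_T:\|\varphi-\varphi_0\|\le\varepsilon}\left\{Q_\infty(\varphi) + \lambda_T\left[G(\varphi) - G(\varphi_0)\right]\right\} + 2\sup_{\varphi\in\Theta_T}\left|\Delta Q_T(\varphi)\right| \\
&\qquad\le \inf_{\varphi\in\Theta_T:\|\varphi-\varphi_0\|\le\varepsilon}\left\{Q_\infty(\varphi) + \left|G(\varphi) - G(\varphi_0)\right|\right\} + 2\sup_{\varphi\in\Theta_T}\left|\Delta Q_T(\varphi)\right| =: \rho_T + 2\delta_T,
\end{aligned}$$

where the last inequality holds for $T$ large enough that $\lambda_T \le 1$.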

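Thus, from (iii) we get for $a > 0$

$$P\left[\|\hat\varphi-\varphi_0\|>\varepsilon\right] \le P\left[C_\varepsilon(\lambda_T) \le \rho_T + 2\delta_T\right] = P\left[1 \le \frac{1}{\lambda_T^{-a}C_\varepsilon(\lambda_T)}\,\frac{1}{(T\lambda_T)^a}\left(T^a\rho_T + 2T^a\delta_T\right)\right] =: P\left[1 \le Z_T\right].$$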

Since $\lambda_T\to0$ such that $(T\lambda_T)^{-1}\to0$, $P$-a.s., for $a$ chosen as in (iv) we have $Z_T\xrightarrow{p}0$, and we deduce $P\left[\|\hat\varphi-\varphi_0\|>\varepsilon\right] \le P\left[Z_T\ge1\right]\to0$. Since $\varepsilon > 0$ is arbitrary, the proof is concluded.

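Proof of Proposition 2: By contradiction, assume that Condition (iii) of Theorem 1 is not satisfied. Then there exist $\varepsilon > 0$ and a sequence $(\lambda_n)$ such that $\lambda_n\searrow0$ and

$$C_\varepsilon(\lambda_n) \le 0, \quad \forall n\in\mathbb{N}. \qquad (21)$$

By definition of the function $C_\varepsilon(\lambda)$, for any $\lambda > 0$ and $\eta > 0$, there exists $\varphi\in\Theta$ such that $\|\varphi-\varphi_0\|\ge\varepsilon$ and $Q_\infty(\varphi) + \lambda G(\varphi) - \lambda G(\varphi_0) \le C_\varepsilon(\lambda) + \eta$. Setting $\lambda = \eta = \lambda_n$ for $n\in\mathbb{N}$, we deduce from (21) that there exists a sequence $(\varphi_n)$ such that $\varphi_n\in\Theta$, $\|\varphi_n-\varphi_0\|\ge\varepsilon$, and

$$Q_\infty(\varphi_n) + \lambda_n G(\varphi_n) - \lambda_n G(\varphi_0) \le \lambda_n. \qquad (22)$$

Now, since $Q_\infty(\varphi_n)\ge0$, we get $\lambda_n G(\varphi_n) - \lambda_n G(\varphi_0) \le \lambda_n$, that is

$$G(\varphi_n) \le G(\varphi_0) + 1. \qquad (23)$$

Moreover, since $G(\varphi_n) - G(\varphi_0) \ge G_0 - G(\varphi_0)$, where $G_0$ denotes the lower bound of $G$, we get $Q_\infty(\varphi_n) + \lambda_n G_0 - \lambda_n G(\varphi_0) \le \lambda_n$ from (22), that is $Q_\infty(\varphi_n) \le \lambda_n\left(1 + G(\varphi_0) - G_0\right)$, which implies

$$\lim_{n\to\infty} Q_\infty(\varphi_n) = 0 = Q_\infty(\varphi_0). \qquad (24)$$

Obviously, the simultaneous holding of (23) and (24) violates Assumption (10). ∎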

A.1.3 Penalization with Sobolev norm

In this Section we check that the assumptions in A.1.1 and A.1.2 hold for the special case $G(\varphi) = \|\varphi\|_H^2$ under Assumptions 1 and 2.

i) The mapping $\varphi\mapsto\|\varphi\|_H^2$ is lower semi-continuous on $H^2[0,1]$ w.r.t. the norm $\|\cdot\|$ [see Reed and Simon (1980), p. 358].

ii) Let us verify that the assumptions of Proposition 2 are satisfied. Clearly the function $G(\varphi) = \|\varphi\|_H^2$ is bounded from below by 0. Let us now check that Assumption (10) in Proposition 2 is satisfied.

Lemma A.1: Assumptions 1 and 2 imply Assumption (10) in Proposition 2.

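Proof: Let $\varepsilon > 0$ and let $(\varphi_n)$ be a sequence in $\Theta$ such that $\|\varphi_n-\varphi_0\|\ge\varepsilon$ for all $n\in\mathbb{N}$, and

$$Q_\infty(\varphi_n)\to0 \quad \text{as } n\to\infty. \qquad (25)$$

We have to prove

$$\|\varphi_n\|_H\to\infty \quad \text{as } n\to\infty. \qquad (26)$$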

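To this aim, define the sequence $e_n = \dfrac{\varphi_n-\varphi_0}{\|\varphi_n-\varphi_0\|}$, $n\in\mathbb{N}$. Then $\|e_n\| = 1$ for all $n\in\mathbb{N}$ and, writing $\Delta\varphi_n := \varphi_n - \varphi_0$, Assumption 1 and (25) give

$$\langle e_n, A^*Ae_n\rangle = \frac{\langle\Delta\varphi_n, A^*A\Delta\varphi_n\rangle}{\|\varphi_n-\varphi_0\|^2} \le \frac{1}{c_2\varepsilon^2}\,Q_\infty(\varphi_n)\to0,$$

as $n\to\infty$. Let $\Pi_{(N)}$ denote the orthogonal projection (w.r.t. the scalar product $\langle\cdot,\cdot\rangle$) on the subspace spanned by $\{\psi_1,\dots,\psi_N\}$. Then we have for any $N\in\mathbb{N}$

$$\left\|\Pi_{(N)}e_n\right\|^2 = \sum_{j=1}^N\langle\psi_j,e_n\rangle^2 \le \frac{1}{\mu_N}\sum_{j=1}^N\mu_j\langle\psi_j,e_n\rangle^2 \le \frac{1}{\mu_N}\sum_{j=1}^\infty\mu_j\langle\psi_j,e_n\rangle^2 = \frac{1}{\mu_N}\langle e_n, A^*Ae_n\rangle\to0, \quad \text{as } n\to\infty,$$

that is $\left\|\Pi_{(N)}e_n\right\|\to0$ as $n\to\infty$, for any $N\in\mathbb{N}$.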
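A finite-dimensional analogue of this projection bound can be checked numerically. In the sketch below, which is only an illustration and not part of the proof, a symmetric positive semi-definite matrix with decreasing eigenvalues $\mu_1\ge\mu_2\ge\dots$ stands in for $A^*A$, its orthonormal eigenvectors play the role of the $\psi_j$, and the Euclidean inner product replaces $\langle\cdot,\cdot\rangle$; the dimension, the eigenvalues and the seed are arbitrary assumptions.

```python
import numpy as np

def projection_bound_holds(n=30, seed=1):
    """Check ||Pi_(N) e||^2 <= <e, A*A e> / mu_N for every N = 1..n, with a
    hypothetical PSD matrix standing in for A*A and a random unit vector e."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # columns: orthonormal psi_j
    mu = 0.7 ** np.arange(n)                           # mu_1 >= mu_2 >= ... > 0
    AtA = Q @ np.diag(mu) @ Q.T                        # stand-in for A*A

    e = rng.standard_normal(n)
    e /= np.linalg.norm(e)                             # ||e|| = 1
    coef = Q.T @ e                                     # <psi_j, e>
    quad = e @ AtA @ e                                 # <e, A*A e>
    for N in range(1, n + 1):
        proj_sq = np.sum(coef[:N] ** 2)                # ||Pi_(N) e||^2
        if proj_sq > (quad / mu[N - 1]) * (1 + 1e-9):  # mu[N - 1] is mu_N
            return False
    return True

print(projection_bound_holds())
```

The bound uses only that $\mu_j\ge\mu_N$ for $j\le N$, which is why it holds for every $N$ regardless of the particular spectrum chosen.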

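Let us now derive a lower bound for the Sobolev norm $\|e_n\|_H$. By the triangle inequality,

$$\|e_n\|_H \ge \left\|\Pi_{(N)}^\perp e_n\right\|_H - \left\|\Pi_{(N)}e_n\right\|_H, \qquad (27)$$

and we now derive bounds for the two terms in the RHS of (27). We have

$$\begin{aligned}
\left\|\Pi_{(N)}^\perp e_n\right\|_H &= \left\|\sum_{j=N+1}^\infty\langle\psi_j,e_n\rangle\psi_j\right\|_H = \frac{\left\|\sum_{j=N+1}^\infty\langle\psi_j,e_n\rangle\psi_j\right\|_H}{\left(\sum_{j=N+1}^\infty\langle\psi_j,e_n\rangle^2\right)^{1/2}}\left(\sum_{j=N+1}^\infty\langle\psi_j,e_n\rangle^2\right)^{1/2} \\
&\ge \inf_{\varphi\in S_{N+1}:\|\varphi\|=1}\|\varphi\|_H\left(\sum_{j=N+1}^\infty\langle\psi_j,e_n\rangle^2\right)^{1/2} = M_{N+1}\left(1-\left\|\Pi_{(N)}e_n\right\|^2\right)^{1/2},
\end{aligned}$$

since $\|e_n\| = 1$, where $S_{N+1} = \mathrm{span}\{\psi_j : j\ge N+1\}$ and $M_{N+1} = \inf_{\varphi\in S_{N+1}:\|\varphi\|=1}\|\varphi\|_H$.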

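Moreover,

$$\begin{aligned}
\left\|\Pi_{(N)}e_n\right\|_H^2 &= \left\|\sum_{j=1}^N\langle\psi_j,e_n\rangle\psi_j\right\|_H^2 = \sum_{j,l=1}^N\langle\psi_j,e_n\rangle\langle\psi_l,e_n\rangle\langle\psi_j,\psi_l\rangle_H \\
&\le \sum_{j,l=1}^N\left|\langle\psi_j,e_n\rangle\right|\left|\langle\psi_l,e_n\rangle\right|\|\psi_j\|_H\|\psi_l\|_H \le \max_{j=1,\dots,N}\|\psi_j\|_H^2\sum_{j,l=1}^N\left|\langle\psi_j,e_n\rangle\right|\left|\langle\psi_l,e_n\rangle\right| \\
&= \bar M_N\left(\sum_{j=1}^N\left|\langle\psi_j,e_n\rangle\right|\right)^2 \le N\bar M_N\sum_{j=1}^N\langle\psi_j,e_n\rangle^2 = N\bar M_N\left\|\Pi_{(N)}e_n\right\|^2,
\end{aligned}$$

where $\bar M_N = \max_{j=1,\dots,N}\|\psi_j\|_H^2$ and the last inequality follows from the Cauchy-Schwarz inequality.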

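Thus, we get from (27)

$$\|e_n\|_H \ge M_{N+1}\left(1-\left\|\Pi_{(N)}e_n\right\|^2\right)^{1/2} - c_N\left\|\Pi_{(N)}e_n\right\|, \qquad (28)$$

for any $N$ and $n\in\mathbb{N}$, where $c_N = \sqrt{N\bar M_N}$. Note that, since $\bar M_N \ge \|\psi_N\|_H^2 \ge M_N^2$, by Assumption 2 we have $c_N\to\infty$ as $N\to\infty$. Since the bound (28) holds for any $N\in\mathbb{N}$, it follows that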

$$\|e_n\|_H \ge M_{N_n+1}\left(1-\left\|\Pi_{(N_n)}e_n\right\|^2\right)^{1/2} - c_{N_n}\left\|\Pi_{(N_n)}e_n\right\|, \quad \text{for any } n\in\mathbb{N}, \qquad (29)$$

for any sequence of integers $(N_n)$.
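Bound (28), on which (29) rests, can likewise be checked in finite dimensions. The sketch below is an illustration under assumptions: a hypothetical Gram matrix $G\succeq I$ defines the $H$ inner product, $\langle u,v\rangle_H = u^\top Gv$ (so that $\|\cdot\|_H\ge\|\cdot\|$, as for a Sobolev norm), the Euclidean inner product plays the role of $\langle\cdot,\cdot\rangle$, and random orthonormal vectors stand for the $\psi_j$. $M_{N+1}$ is computed as the square root of the smallest eigenvalue of $G$ restricted to $S_{N+1}$.

```python
import numpy as np

def bound_28(G, Q, e, N):
    """Return (||e||_H, RHS of (28)) for a unit vector e and 1 <= N < n.

    G : Gram matrix of the hypothetical H inner product, <u, v>_H = u' G v
    Q : orthonormal columns playing the role of psi_1, ..., psi_n
    """
    coef = Q.T @ e                                         # <psi_j, e>
    proj = np.linalg.norm(coef[:N])                        # ||Pi_(N) e||
    Q_tail = Q[:, N:]                                      # orthonormal basis of S_{N+1}
    M_next = np.sqrt(np.linalg.eigvalsh(Q_tail.T @ G @ Q_tail).min())  # M_{N+1}
    M_bar = np.diag(Q[:, :N].T @ G @ Q[:, :N]).max()       # max_j ||psi_j||_H^2
    c_N = np.sqrt(N * M_bar)
    lhs = np.sqrt(e @ G @ e)                               # ||e||_H
    rhs = M_next * np.sqrt(max(0.0, 1.0 - proj ** 2)) - c_N * proj
    return lhs, rhs

rng = np.random.default_rng(2)
n = 20
C = rng.standard_normal((n, n))
G = np.eye(n) + C @ C.T                                    # G >= I
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
e = rng.standard_normal(n)
e /= np.linalg.norm(e)                                     # ||e|| = 1
results = [bound_28(G, Q, e, N) for N in range(1, n)]
print(all(lhs >= rhs - 1e-10 for lhs, rhs in results))
```

The check combines the three steps of the argument: the triangle inequality (27), the lower bound through $M_{N+1}$, and the Cauchy-Schwarz bound through $\bar M_N$.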

Let us now prove that there exists a sequence of integers $(N_n)$ such that the RHS of (29) diverges. To this end, define the sequence $n(N)$, $N = 1, 2, \dots$ recursively by

$$\begin{aligned}
n(1) &= \min\left\{n^*\in\mathbb{N} \;\middle|\; c_1\left\|\Pi_{(1)}e_n\right\| \le 1 \text{ for all } n\ge n^*\right\}, \\
n(N) &= \min\left\{n^*\in\mathbb{N} \;\middle|\; n^* > n(N-1),\ c_N\left\|\Pi_{(N)}e_n\right\| \le 1 \text{ for all } n\ge n^*\right\}, \quad N = 2, 3, \dots
\end{aligned}$$

Since $c_N\left\|\Pi_{(N)}e_n\right\|\to0$ as $n\to\infty$, for any $N\in\mathbb{N}$, it follows that $n(N) < \infty$ for any $N\in\mathbb{N}$, and the sequence $n(N)$, $N = 1, 2, \dots$ is strictly increasing. Then, let the sequence of integers $(N_n)$, for $n\ge n(1)$, be defined by

$$N_n = \begin{cases} 1 & \text{if } n(1)\le n < n(2), \\ 2 & \text{if } n(2)\le n < n(3), \\ \ \vdots & \ \vdots \end{cases}$$
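The diagonal construction of $n(N)$ and $N_n$ can be made concrete on a finite horizon. In the sketch below, `values[N-1][n]` is a hypothetical array standing for $c_N\|\Pi_{(N)}e_n\|$ for $n = 0,\dots,H-1$. Because the horizon is finite, the requirement "for all $n\ge n^*$" is only checked up to $H-1$, so the code illustrates the recursion rather than the limit argument itself.

```python
import bisect

def first_index_from(row, start):
    """Smallest n* >= start such that row[n] <= 1 for every n >= n* in the horizon."""
    n_star = max(len(row), start)
    for n in range(len(row) - 1, start - 1, -1):
        if row[n] > 1:
            break
        n_star = n
    return n_star

def thresholds(values):
    """n(1) < n(2) < ... computed recursively as in the text (finite horizon)."""
    out, prev = [], -1
    for row in values:
        prev = first_index_from(row, prev + 1)   # enforces n(N) > n(N-1)
        out.append(prev)
    return out

def diagonal_index(n, n_of_N):
    """N_n = N if n(N) <= n < n(N+1); None for n < n(1)."""
    if n < n_of_N[0]:
        return None
    return bisect.bisect_right(n_of_N, n)        # number of thresholds <= n

# Hypothetical values c_N * ||Pi_(N) e_n|| = 2N / (n + 1): for each fixed N they
# vanish as n grows, but larger N needs larger n before they drop below 1.
values = [[2 * N / (n + 1) for n in range(20)] for N in range(1, 4)]
n_of_N = thresholds(values)
print(n_of_N)
print([diagonal_index(n, n_of_N) for n in range(8)])
```

Here `n_of_N` is `[1, 3, 5]`, so $N_n$ steps up by one each time $n$ crosses a threshold; this is the mechanism that lets $N_n\to\infty$ while keeping $c_{N_n}\|\Pi_{(N_n)}e_n\|\le1$.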

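By construction, we have

$$c_{N_n}\left\|\Pi_{(N_n)}e_n\right\| \le 1, \qquad (30)$$

for any $n\ge n(1)$. Since $n(N) < \infty$ for every $N$, we have $N_n\to\infty$ as $n\to\infty$; since moreover $c_N\to\infty$ as $N\to\infty$, we deduce

$$\left\|\Pi_{(N_n)}e_n\right\| \le 1/2, \quad \forall n \text{ large enough}. \qquad (31)$$

Using Bounds (30) and (31) in Inequality (29), we get $\|e_n\|_H \ge M_{N_n+1}(3/4)^{1/2} - 1\to\infty$, as $n\to\infty$, from Assumption 2.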

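Finally, we get

$$\begin{aligned}
\|\varphi_n\|_H &= \left\|\,\|\varphi_n-\varphi_0\|\,e_n + \varphi_0\right\|_H \ge \|\varphi_n-\varphi_0\|\,\|e_n\|_H - \|\varphi_0\|_H \qquad (32) \\
&\ge \varepsilon\|e_n\|_H - \|\varphi_0\|_H\to\infty.
\end{aligned}$$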

Therefore, (26) follows, and the proof is concluded. ∎

Appendix 2

The MISE of the TiR estimator

In this Appendix we derive the asymptotic expansion of the MISE with a deterministic sequence of regularisation parameters (Proof of Proposition 3). We focus on the linear IV case $m(\varphi,z) = E_0\left[\varphi(X) - Y \mid Z = z\right] = (A\varphi)(z) - r(z)$, where $(A\varphi)(z) = \int\varphi(x)f(w|z)\,dw$ and $r(z) = \int y f(w|z)\,dw$.

A.2.1 The first-order condition

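The objective function of the TiR estimator becomes in the linear case

$$Q_T(\varphi) + \lambda_T\|\varphi\|_H^2 = \frac{1}{T}\sum_{t=1}^T\left[\left(\hat A\varphi\right)(Z_t) - \hat r(Z_t)\right]^2 + \lambda_T\langle\varphi,\varphi\rangle_H. \qquad (33)$$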

Let us now prove that this objective function can be written as a quadratic form in $\varphi\in H^2[0,1]$. To this aim, let us introduce the dual operator $\hat A^*$ of $\hat A$.

Lemma A.1: Under regularity conditions, the following properties hold $P$-a.s.:

(i) The function $\hat r$ is in $L^2(F_Z)$;

(ii) The operator $\hat A$ maps $H^2[0,1]$ into $L^2(F_Z)$;

(iii) There exists a linear operator $\hat A^*$ from $L^2(F_Z)$ into $H^2[0,1]$ such that

$$\left\langle h, \hat A^*\psi\right\rangle_H = \frac{1}{T}\sum_{t=1}^T\left(\hat Ah\right)(Z_t)\,\psi(Z_t), \quad \text{for any } \psi\in L^2(F_Z) \text{ and } h\in H^2[0,1];$$

(iv) The operator $\hat A^*\hat A : H^2[0,1]\to H^2[0,1]$ is compact.

Proof: See Appendix B.

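Then, from Lemma A.1 i)-iii), Criterion (33) can be rewritten as

$$Q_T(\varphi) + \lambda_T\|\varphi\|_H^2 = \left\langle\varphi, \left(\lambda_T + \hat A^*\hat A\right)\varphi\right\rangle_H - 2\left\langle\varphi, \hat A^*\hat r\right\rangle_H, \qquad (34)$$

up to a term independent of $\varphi$. From Lemma A.1 iv), $\hat A^*\hat A$ is a compact operator from $H^2[0,1]$ into itself. Since $\hat A^*\hat A$ is positive, the operator $\lambda_T + \hat A^*\hat A$ is invertible [Kress (1999), Theorem 3.4]. It follows that the quadratic criterion function (34) admits a global minimum over $H^2[0,1]$. It is given by the first-order condition $\left(\hat A^*\hat A + \lambda_T\right)\hat\varphi = \hat A^*\hat r$, that is

$$\hat\varphi = \left(\lambda_T + \hat A^*\hat A\right)^{-1}\hat A^*\hat r. \qquad (35)$$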
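In practice, (35) is solved after discretising $\varphi$. The sketch below shows one possible discretisation, not necessarily the one used in the paper: $\varphi$ is represented by its values on a uniform grid of $[0,1]$ (linear interpolation in between), $\hat A$ and $\hat r$ are built from Nadaraya-Watson weights with a Gaussian kernel, and the Sobolev inner product is assumed to be $\langle f,g\rangle_H = \langle f,g\rangle + \langle f',g'\rangle$, approximated with a rectangle rule and forward differences. With $B$ the matrix of $(\hat A\varphi_k)(Z_t)$ for the grid basis functions $\varphi_k$ and $G$ the Gram matrix of $\langle\cdot,\cdot\rangle_H$, criterion (34) becomes $c^\top(\lambda_T G + B^\top B/T)c - 2c^\top B^\top\hat r/T$, so the first-order condition is a linear system. The data-generating process, the bandwidth and the value of $\lambda_T$ are purely hypothetical.

```python
import numpy as np

def sobolev_gram(n_grid):
    """Gram matrix of <f, g>_H = <f, g> + <f', g'> on [0, 1] (assumed form) for
    grid-value vectors: rectangle rule plus forward differences."""
    h = 1.0 / (n_grid - 1)
    D = (np.eye(n_grid, k=1) - np.eye(n_grid))[:-1] / h   # (n_grid-1) x n_grid
    return h * np.eye(n_grid) + h * D.T @ D

def nw_weights(Z, bandwidth):
    """Row-normalised Gaussian-kernel weights w[t, s] for regressions on Z."""
    K = np.exp(-0.5 * ((Z[:, None] - Z[None, :]) / bandwidth) ** 2)
    return K / K.sum(axis=1, keepdims=True)

def tir_linear(B, r_hat, G, lam):
    """Minimise (1/T)||B c - r_hat||^2 + lam c'G c through its normal equations,
    the discretised analogue of (lam + A_hat* A_hat) phi = A_hat* r_hat."""
    T = B.shape[0]
    return np.linalg.solve(lam * G + B.T @ B / T, B.T @ r_hat / T)

# Illustrative simulated IV data (hypothetical design, not the paper's Monte Carlo).
rng = np.random.default_rng(0)
T, n_grid = 400, 41
grid = np.linspace(0.0, 1.0, n_grid)
Z = rng.uniform(size=T)
eps = rng.standard_normal(T)                     # shared shock: X is endogenous
X = np.clip(0.6 * Z + 0.3 * rng.uniform(size=T) + 0.05 * eps, 0.0, 1.0)
Y = np.sin(np.pi * X) + 0.1 * eps

W = nw_weights(Z, bandwidth=0.1)                 # estimates E[. | Z = Z_t]
L = np.column_stack([np.interp(X, grid, e_k) for e_k in np.eye(n_grid)])  # phi(X_s) = L @ c
B = W @ L                                        # (A_hat phi)(Z_t) = B @ c
r_hat = W @ Y                                    # r_hat(Z_t)
G = sobolev_gram(n_grid)
c_hat = tir_linear(B, r_hat, G, lam=1e-3)        # estimated phi on the grid
```

Solving the normal equations directly mirrors (35); for badly conditioned problems an equivalent, numerically gentler route is a least-squares solve of the stacked system $[B/\sqrt{T};\ \sqrt{\lambda_T}R]c\approx[\hat r/\sqrt{T};\ 0]$ with $G = R^\top R$.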

A.2.2 The asymptotic expansion of the first-order condition

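Let us now expand the estimator in (35). We can write

$$\hat r(z) = \int\left(y-\varphi_0(x)\right)\Delta\hat f(w|z)\,dw + \int\varphi_0(x)\hat f(w|z)\,dw =: \hat\psi(z) + \left(\hat A\varphi_0\right)(z),$$

where $\Delta\hat f(w|z) := \hat f(w|z) - f(w|z)$, and where we have used $\int\left(y-\varphi_0(x)\right)f(w|z)\,dw = r(z) - (A\varphi_0)(z) = 0$. Hence, $\hat\varphi = \left(\lambda_T + \hat A^*\hat A\right)^{-1}\hat A^*\hat\psi + \left(\lambda_T + \hat A^*\hat A\right)^{-1}\hat A^*\hat A\varphi_0$, which yields

$$\begin{aligned}
\hat\varphi - \varphi_0 &= \left(\lambda_T + A^*A\right)^{-1}A^*\hat\psi + \left[\left(\lambda_T + A^*A\right)^{-1}A^*A\varphi_0 - \varphi_0\right] + R_T \\
&=: V_T + D(\lambda_T) + R_T, \qquad (36)
\end{aligned}$$

where the remainder term $R_T$ is given by

$$R_T = \left[\left(\lambda_T + \hat A^*\hat A\right)^{-1}\hat A^* - \left(\lambda_T + A^*A\right)^{-1}A^*\right]\hat\psi + \left[\left(\lambda_T + \hat A^*\hat A\right)^{-1}\hat A^*\hat A - \left(\lambda_T + A^*A\right)^{-1}A^*A\right]\varphi_0.$$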
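Decomposition (36) is an exact algebraic identity once $\hat r = \hat\psi + \hat A\varphi_0$, which can be confirmed with matrices. In the sketch below, random matrices stand in for $A$ and $\hat A$, the adjoint is the Euclidean transpose (a simplifying assumption: in the text it is taken w.r.t. $\langle\cdot,\cdot\rangle_H$), and $\hat\psi$, $\varphi_0$ and $\lambda_T$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, lam = 15, 10, 0.05
A = rng.standard_normal((m, n))                   # stand-in for A
A_hat = A + 0.1 * rng.standard_normal((m, n))     # stand-in for A_hat
phi0 = rng.standard_normal(n)
psi_hat = 0.1 * rng.standard_normal(m)
r_hat = psi_hat + A_hat @ phi0                    # r_hat = psi_hat + A_hat phi0

def reg_inv(M):
    """(lam + M'M)^{-1}, with M' the Euclidean adjoint of M."""
    return np.linalg.inv(lam * np.eye(M.shape[1]) + M.T @ M)

phi_hat = reg_inv(A_hat) @ A_hat.T @ r_hat        # equation (35)
V = reg_inv(A) @ A.T @ psi_hat                    # V_T
D = reg_inv(A) @ A.T @ A @ phi0 - phi0            # D(lambda_T)
R = ((reg_inv(A_hat) @ A_hat.T - reg_inv(A) @ A.T) @ psi_hat
     + (reg_inv(A_hat) @ A_hat.T @ A_hat - reg_inv(A) @ A.T @ A) @ phi0)  # R_T
print(np.allclose(phi_hat - phi0, V + D + R))
```

Note that $D(\lambda_T) = -\lambda_T(\lambda_T + A^*A)^{-1}\varphi_0$, which is often the more convenient form for bounding the regularisation bias.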

³ ´
Lemma A.2: Assume the bandwidth conditions hm
T = o (λT ) , λ/TT = o hdTZ , where m is

the order of the kernel K , and dZ the dimension of Z. Then, under regularity assumptions,
£ ¤ ¡ £ ¤¢
E kRT k2 = o E kVT + D(λT )k2 .

Proof: See Appendix B.

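From (36) we deduce

$$\begin{aligned}
E\left[\|\hat\varphi-\varphi_0\|^2\right] &= E\left[\|V_T + D(\lambda_T)\|^2\right] + E\left[\|R_T\|^2\right] + 2E\left[\langle V_T + D(\lambda_T), R_T\rangle\right] \\
&= E\left[\|V_T + D(\lambda_T)\|^2\right] + o\left(E\left[\|V_T + D(\lambda_T)\|^2\right]\right),
\end{aligned}$$

by applying the Cauchy-Schwarz inequality twice and Lemma A.2. Since the cross term between the deterministic part and the centred part of $V_T + D(\lambda_T)$ has zero expectation,

$$E\left[\|V_T + D(\lambda_T)\|^2\right] = \left\|\left(\lambda_T + A^*A\right)^{-1}A^*A\varphi_0 - \varphi_0 + \left(\lambda_T + A^*A\right)^{-1}A^*E\hat\psi\right\|^2 + E\left[\left\|\left(\lambda_T + A^*A\right)^{-1}A^*\left(\hat\psi - E\hat\psi\right)\right\|^2\right], \qquad (37)$$

and we get

$$E\left[\|\hat\varphi-\varphi_0\|^2\right] = \left\|\left(\lambda_T + A^*A\right)^{-1}A^*A\varphi_0 - \varphi_0 + \left(\lambda_T + A^*A\right)^{-1}A^*E\hat\psi\right\|^2 + E\left[\left\|\left(\lambda_T + A^*A\right)^{-1}A^*\left(\hat\psi - E\hat\psi\right)\right\|^2\right], \qquad (38)$$

up to a term which is asymptotically negligible w.r.t. the RHS. This asymptotic expansion consists of a bias term (regularisation bias plus estimation bias) and a variance term, which are analysed separately below in Lemmas A.3 and A.4. Combining these two Lemmas with the asymptotic expansion in (38) yields Proposition 3.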
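Equation (37) is the usual bias-variance decomposition of $V_T + D(\lambda_T) = K\hat\psi + D(\lambda_T)$ with $K = (\lambda_T + A^*A)^{-1}A^*$. Its sample analogue holds exactly when expectations are replaced by averages over draws, as the following sketch checks. The matrix $K$, the vector standing for $D(\lambda_T)$ and the distribution of the draws of $\hat\psi$ are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_draws = 8, 5000
K = rng.standard_normal((n, n))                  # stands for (lam + A*A)^{-1} A*
d = rng.standard_normal(n)                       # stands for D(lambda_T)
draws = 0.3 + rng.standard_normal((n_draws, n))  # draws of psi_hat (nonzero mean)

values = draws @ K.T + d                         # K psi_hat + D, one row per draw
total = np.mean(np.sum(values ** 2, axis=1))     # average ||V_T + D||^2
mean_psi = draws.mean(axis=0)                    # sample analogue of E[psi_hat]
bias_sq = np.sum((K @ mean_psi + d) ** 2)        # squared bias term in (37)
variance = np.mean(np.sum(((draws - mean_psi) @ K.T) ** 2, axis=1))  # variance term
print(np.isclose(total, bias_sq + variance))
```

The nonzero mean of the draws plays the role of the estimation bias $E\hat\psi$, which is why it enters the bias term together with $D(\lambda_T)$.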

A.2.3 Asymptotic expansion of the variance term

Lemma A.3: Under regularity conditions, up to a term which is asymptotically negligible w.r.t. the RHS, we have

$$E\left[\left\|\left(\lambda_T + A^*A\right)^{-1}A^*\left(\hat\psi - E\hat\psi\right)\right\|^2\right] = \frac{1}{T}\sum_{j=1}^\infty\frac{\nu_j}{(\lambda_T+\nu_j)^2}\left\|\phi_j\right\|^2.$$

Proof: See Appendix B.

A.2.4 Asymptotic expansion of the bias term

Lemma A.4: Define $b(\lambda_T) = \left\|\left(\lambda_T + A^*A\right)^{-1}A^*A\varphi_0 - \varphi_0\right\|$. Then, under regularity conditions and the bandwidth condition $h_T^m = o\left(\lambda_T b(\lambda_T)\right)$, where $m$ is the order of the kernel $K$, we have

$$\left\|\left(\lambda_T + A^*A\right)^{-1}A^*A\varphi_0 - \varphi_0 + \left(\lambda_T + A^*A\right)^{-1}A^*E\hat\psi\right\| = b(\lambda_T),$$

up to a term which is asymptotically negligible w.r.t. the RHS.

Proof: See Appendix B.
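Lemmas A.3 and A.4 reduce the MISE to the variance term $\frac{1}{T}\sum_j\frac{\nu_j}{(\lambda+\nu_j)^2}\|\phi_j\|^2$ plus the squared regularisation bias $b(\lambda)^2$. The sketch below evaluates both in finite dimensions. It assumes, for illustration only, that $\langle\cdot,\cdot\rangle_H$ and $\langle\cdot,\cdot\rangle$ are both the Euclidean inner product, so that the eigenvectors of $A^*A$ have unit norm, and it computes $b(\lambda)$ through the identity $(\lambda+A^*A)^{-1}A^*A\varphi_0 - \varphi_0 = -\lambda(\lambda+A^*A)^{-1}\varphi_0$. The spectrum, $\varphi_0$ and $T$ are hypothetical.

```python
import numpy as np

def variance_term(lam, T, nu, phi_norm_sq):
    """(1/T) * sum_j nu_j / (lam + nu_j)^2 * ||phi_j||^2, as in Lemma A.3."""
    return np.sum(nu / (lam + nu) ** 2 * phi_norm_sq) / T

def reg_bias(lam, AtA, phi0):
    """b(lam) = ||(lam + A*A)^{-1} A*A phi0 - phi0|| = lam * ||(lam + A*A)^{-1} phi0||."""
    n = AtA.shape[0]
    return lam * np.linalg.norm(np.linalg.solve(lam * np.eye(n) + AtA, phi0))

rng = np.random.default_rng(5)
n, T = 25, 1000
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthonormal eigenvectors
nu = 0.6 ** np.arange(1, n + 1)                    # hypothetical spectrum of A*A
AtA = Q @ np.diag(nu) @ Q.T
phi0 = Q @ (1.0 / np.arange(1, n + 1))             # hypothetical true function
phi_norm_sq = np.ones(n)                           # unit-norm eigenvectors here

lams = np.logspace(-6, 0, 61)
variances = np.array([variance_term(lam, T, nu, phi_norm_sq) for lam in lams])
biases_sq = np.array([reg_bias(lam, AtA, phi0) ** 2 for lam in lams])
mise = variances + biases_sq                       # MISE expansion of Proposition 3
lam_best = lams[np.argmin(mise)]
print(f"grid minimiser of the MISE expansion: {lam_best:.1e}")
```

The variance term decreases and the squared bias increases in $\lambda$, so the expansion trades them off; the grid minimiser is a crude stand-in for the optimal sequence studied in Appendix 3.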

Appendix 3

Rate of convergence with geometric spectrum

In this Appendix we prove Proposition 4.

i) The next Lemma A.5 characterizes the variance term of the asymptotic expansion of the MISE in Proposition 3.

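Lemma A.5: Let $\nu_j$ and $\|\phi_j\|^2$ satisfy Assumption 5, and define the function

$$I(\lambda) = \sum_{j=1}^\infty\frac{\nu_j}{(\lambda+\nu_j)^2}\left\|\phi_j\right\|^2, \quad \lambda > 0.$$

Then $\lim_{\lambda\to0}\lambda\left[\log(1/\lambda)\right]^\beta I(\lambda) = \left(\frac{1}{\alpha}\right)^{1-\beta}C_2$.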
Proof: See Appendix B.
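Lemma A.5 can be illustrated numerically. Assumption 5 is not restated in this section, so the sketch below assumes the geometric specification $\nu_j = e^{-\alpha j}$ and $\|\phi_j\|^2 = C_2j^{-\beta}$, an illustrative choice that is consistent with the stated limit, and evaluates $\lambda[\log(1/\lambda)]^\beta I(\lambda)$ relative to $(1/\alpha)^{1-\beta}C_2$ for decreasing $\lambda$. The values of $\alpha$, $\beta$, $C_2$ and the truncation point of the series are arbitrary.

```python
import numpy as np

def scaled_I(lam, alpha, beta, C2, n_terms=2000):
    """lam * log(1/lam)^beta * I(lam) under the assumed specification
    nu_j = exp(-alpha j), ||phi_j||^2 = C2 j^(-beta); series truncated at n_terms."""
    j = np.arange(1, n_terms + 1)
    nu = np.exp(-alpha * j)                      # underflows harmlessly to 0 for large j
    I = np.sum(nu / (lam + nu) ** 2 * C2 * j ** (-beta))
    return lam * np.log(1.0 / lam) ** beta * I

alpha, beta, C2 = 0.5, 0.5, 2.0
limit = (1.0 / alpha) ** (1.0 - beta) * C2      # limit stated in Lemma A.5
ratios = {lam: scaled_I(lam, alpha, beta, C2) / limit for lam in (1e-4, 1e-8, 1e-12)}
for lam, ratio in ratios.items():
    print(f"lambda={lam:.0e}  ratio to limit={ratio:.4f}")
```

The terms of $I(\lambda)$ concentrate around the index $j$ at which $\nu_j\approx\lambda$, that is $j\approx\log(1/\lambda)/\alpha$; this is where the factor $[\log(1/\lambda)]^\beta$ comes from, and why the ratio approaches 1 only at a logarithmic pace.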

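From Lemma A.5 and using Assumption 6, we get

$$M_T(\lambda) = c_1\frac{1}{T\lambda\left[\log(1/\lambda)\right]^\beta} + c_2\lambda^{2\delta},$$

for $\lambda\to0$ and $T\to\infty$, where $c_1 = \left(\frac{1}{\alpha}\right)^{1-\beta}C_2$ and $c_2 = C_3^2$.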
α
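ii) The optimal sequence $\lambda_T^*$ is obtained by minimizing the function $M_T(\lambda)$ w.r.t. $\lambda$. We have

$$\begin{aligned}
\frac{dM_T(\lambda)}{d\lambda} &= -c_1\frac{1}{T}\frac{1}{\lambda^2\left[\log(1/\lambda)\right]^{2\beta}}\left(\left[\log(1/\lambda)\right]^\beta - \lambda\beta\left[\log(1/\lambda)\right]^{\beta-1}\frac{1}{\lambda}\right) + 2c_2\delta\lambda^{2\delta-1} \\
&= -c_1\frac{1}{T}\frac{\log(1/\lambda) - \beta}{\lambda^2\left[\log(1/\lambda)\right]^{\beta+1}} + 2c_2\delta\lambda^{2\delta-1}.
\end{aligned}$$

Thus

$$\frac{dM_T(\lambda_T^*)}{d\lambda} = 0 \iff \frac{1}{T}\frac{c_1}{2c_2\delta}\frac{\log(1/\lambda_T^*) - \beta}{\left[\log(1/\lambda_T^*)\right]^{\beta+1}} = \left(\lambda_T^*\right)^{2\delta+1}. \qquad (39)$$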

To solve the latter equation for $\lambda_T^*$, define $\tau_T := \log(1/\lambda_T^*)$. Then $\tau_T$ satisfies

$$\tau_T = c_3 + \frac{1}{1+2\delta}\log T + \frac{1+\beta}{1+2\delta}\log\tau_T - \frac{1}{1+2\delta}\log(\tau_T-\beta),$$

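where $c_3 = (1+2\delta)^{-1}\log(2c_2\delta/c_1)$. Since $(1+\beta)\log\tau_T - \log(\tau_T-\beta) = \beta\log\tau_T + o(1)$, it follows that

$$\tau_T = c_4 + \frac{1}{1+2\delta}\log T + \frac{\beta}{1+2\delta}\log\log T + o(\log\log T),$$

for some constant $c_4$, that is

$$\log(\lambda_T^*) = -c_4 - \frac{1}{1+2\delta}\log T - \frac{\beta}{1+2\delta}\log\log T + o(\log\log T).$$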

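iii) Finally, let us compute the MISE corresponding to $\lambda_T^*$. We have

$$M_T(\lambda_T^*) = c_1\frac{1}{T}\frac{1}{\lambda_T^*\left[\log(1/\lambda_T^*)\right]^\beta} + c_2\left(\lambda_T^*\right)^{2\delta} = c_1\frac{1}{T}\frac{1}{\lambda_T^*\tau_T^\beta} + c_2\left(\lambda_T^*\right)^{2\delta}.$$

From (39),

$$\lambda_T^* = \left(\frac{c_1}{2c_2\delta}\right)^{\frac{1}{2\delta+1}}T^{-\frac{1}{2\delta+1}}\left(\frac{\tau_T-\beta}{\tau_T^{\beta+1}}\right)^{\frac{1}{2\delta+1}} = c_5T^{-\frac{1}{2\delta+1}}\tau_T^{-\frac{\beta}{2\delta+1}},$$

for some constant $c_5$, up to a term which is negligible w.r.t. the RHS. Thus we get

$$\begin{aligned}
M_T(\lambda_T^*) &= \frac{c_1}{T}\,\frac{1}{c_5T^{-\frac{1}{2\delta+1}}\tau_T^{-\frac{\beta}{2\delta+1}+\beta}} + c_2c_5^{2\delta}T^{-\frac{2\delta}{2\delta+1}}\tau_T^{-\frac{2\delta\beta}{2\delta+1}} \\
&= c_6T^{-\frac{2\delta}{2\delta+1}}\tau_T^{-\frac{2\delta\beta}{2\delta+1}} = c_7T^{-\frac{2\delta}{2\delta+1}}(\log T)^{-\frac{2\delta\beta}{2\delta+1}},
\end{aligned}$$

for some constants $c_6$ and $c_7$, up to a term which is negligible w.r.t. the RHS.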
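The characterisation of $\lambda_T^*$ can be checked numerically by iterating the fixed-point equation for $\tau_T$ and comparing the result with a direct grid minimisation of $M_T(\lambda)$. The constants $c_1$, $c_2$, $\beta$ and $\delta$ below are arbitrary illustrative values. The sketch also reports $\tau_T - \frac{1}{1+2\delta}\log T - \frac{\beta}{1+2\delta}\log\log T$, which varies only slowly with $T$, consistent with the leading terms of the expansion of $\log\lambda_T^*$ implied by (39).

```python
import numpy as np

def mise(lam, T, c1, c2, beta, delta):
    """M_T(lam) = c1 / (T lam log(1/lam)^beta) + c2 lam^(2 delta)."""
    return c1 / (T * lam * np.log(1.0 / lam) ** beta) + c2 * lam ** (2 * delta)

def tau_fixed_point(T, c1, c2, beta, delta, n_iter=200):
    """Iterate tau = c3 + (log T + (1+beta) log tau - log(tau - beta)) / (1 + 2 delta)."""
    a = 1.0 / (1.0 + 2.0 * delta)
    c3 = a * np.log(2.0 * c2 * delta / c1)
    tau = np.log(T)                                      # starting value
    for _ in range(n_iter):
        tau = c3 + a * (np.log(T) + (1 + beta) * np.log(tau) - np.log(tau - beta))
    return tau

c1, c2, beta, delta = 1.0, 1.0, 1.0, 1.0
a = 1.0 / (1.0 + 2.0 * delta)
results = {}
for T in (1e4, 1e8, 1e16):
    tau = tau_fixed_point(T, c1, c2, beta, delta)
    lam_star = np.exp(-tau)                              # root of (39)
    grid = np.exp(-np.linspace(tau - 3, tau + 3, 6001))  # log-grid around lam_star
    lam_grid = grid[np.argmin(mise(grid, T, c1, c2, beta, delta))]
    resid = tau - a * np.log(T) - a * beta * np.log(np.log(T))
    results[T] = (lam_star, lam_grid, resid)
    print(f"T={T:.0e}  lam*={lam_star:.3e}  grid argmin={lam_grid:.3e}  residual={resid:.3f}")
```

Since (39) is the exact first-order condition of $M_T(\lambda)$, the fixed point and the grid minimiser agree up to the grid spacing; the slowly moving residual reflects the $o(\log\log T)$ term.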
