
Journal of Multivariate Analysis 171 (2019) 139–162


A copula approach for dependence modeling in multivariate nonparametric time series

Natalie Neumeyer a, Marek Omelka b,∗, Šárka Hudecová b
a Fachbereich Mathematik, Universität Hamburg, Bundesstraße 55, 20146 Hamburg, Germany
b Department of Probability and Statistics, Faculty of Mathematics and Physics, Charles University, Sokolovská 83, 186 75 Praha 8, Czech Republic

Article history: Received 18 May 2017; Available online 6 December 2018.

AMS 2010 subject classifications: primary 62H12; secondary 62G05, 62M10.

Keywords: Asymptotic representation; CHARN model; Empirical copula process; Goodness-of-fit testing; Nonparametric AR-ARCH model; Nonparametric SCOMDY model; Weak convergence.

Abstract: This paper is concerned with modeling the dependence structure of two (or more) time series in the presence of a (possibly multivariate) covariate which may include past values of the time series. We assume that the covariate influences only the conditional mean and the conditional variance of each of the time series, but the distribution of the standardized innovations is not influenced by the covariate and is stable in time. The joint distribution of the time series is then determined by the conditional means, the conditional variances and the marginal distributions of the innovations, which we estimate nonparametrically, and the copula of the innovations, which represents the dependence structure. We consider a nonparametric and a semiparametric estimator based on the estimated residuals. We show that under suitable assumptions these copula estimators are asymptotically equivalent to estimators that would be based on the unobserved innovations. The theoretical results are illustrated by simulations and a real data example.

© 2018 Elsevier Inc. All rights reserved.

1. Introduction

Modeling the dependence of k observed time series can be of utmost importance for applications, e.g., in risk management
to model the dependence between several exchange rates. We will consider the problem of modeling k dependent
nonparametric AR-ARCH time series defined, for all i ∈ {1, . . . , n}, j ∈ {1, . . . , k}, by

Yji = mj (X i ) + σj (X i ) εji ,

where the covariate X i may include past values of the process, Yj,i−1 , Yj,i−2 , . . . with j ∈ {1, . . . , k}, or other exogenous
variables. It will be assumed that the innovations (ε1i , . . . , εki ) with i ∈ Z, are mutually independent and identically
distributed random vectors and that (ε1i , . . . , εki ) is independent of the past and present covariates {X ℓ : ℓ ≤ i} for all
i ∈ Z. For identifiability we further assume E (εji ) = 0, var(εji ) = 1 for all j ∈ {1, . . . , k}, so that the functions mj and
σj represent the conditional mean and volatility function of the jth time series. Such models are also called multivariate
nonparametric CHARN (conditional heteroscedastic autoregressive nonlinear) models and have gained much attention over
the last decades; see Fan and Yao [10] and Gao [11] for extensive overviews.
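The model above can be simulated directly. The following sketch is purely illustrative: the mean and volatility functions, the AR(1) covariate, and the Gaussian copula for the innovations are all hypothetical choices, not prescribed by the paper (which only requires iid innovation vectors with zero mean and unit variance, independent of past and present covariates).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Innovations: a Gaussian copula with correlation 0.6 (illustrative choice).
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)

# Hypothetical smooth mean and volatility functions m_j, sigma_j of an
# exogenous AR(1) covariate X_i.
X = np.zeros(n)
Y1 = np.zeros(n)
Y2 = np.zeros(n)
for i in range(1, n):
    X[i] = 0.6 * X[i - 1] + rng.standard_normal()
    Y1[i] = np.sin(X[i]) + np.sqrt(1 + 0.2 * X[i] ** 2) * eps[i, 0]
    Y2[i] = 0.5 * X[i] + np.sqrt(1 + 0.4 * X[i] ** 2) * eps[i, 1]
```

The cross-series dependence here enters only through the innovation copula, exactly as in the CHARN structure: the covariate shifts and scales each series but leaves the standardized innovations' joint law untouched.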

∗ Corresponding author.
E-mail address: [email protected] (M. Omelka).

https://doi.org/10.1016/j.jmva.2018.11.016
0047-259X/© 2018 Elsevier Inc. All rights reserved.

Note that due to the structure of the model and Sklar’s theorem, see, e.g., Nelsen [26], for zj = {yj − mj (x)}/σj (x) with
j ∈ {1, . . . , k}, one has
Pr(Y1i ≤ y1 , . . . , Yki ≤ yk | X i = x) = Pr(ε1i ≤ z1 , . . . , εki ≤ zk ) = C {F1ε (z1 ), . . . , Fkε (zk )},
where F1ε , . . . , Fkε denote the marginal distributions of the innovations and C their copula. Thus the joint conditional
distribution of the observations, given the covariate, is completely specified by the individual conditional mean and variance
functions, the marginal distributions of the innovations, and their copula. The copula C describes the dependence structure
of the k time series, conditional on the covariates, after removing influences of the conditional means and variances as well
as marginal distributions.
We will model the conditional mean and variance function nonparametrically like Härdle et al. [18], among others.
Semiparametric estimation, e.g., with additive structure for mj and multiplicative structure for σj2 as in Yang et al. [36],
can be considered as well and all results derived herein remain valid under appropriate changes for the estimators and
assumptions. Further we will model the marginal distributions of the innovations nonparametrically, whereas we will
consider two options for the estimation of the copula C : a parametric and a nonparametric approach. As the innovations
are unobservable, both estimators will be based on estimated residuals. We will show that the asymptotic distribution is not
affected by the necessary pre-estimation of the mean and variance functions. This remarkable result is intrinsic for copula
estimation and it was already observed in (semi-)parametric estimation of copula; see the references in the next paragraph.
In contrast, the asymptotic distribution of empirical distribution functions is typically influenced by pre-estimation of mean
and variance functions. Moreover, a comparison between the parametric and nonparametric copula estimator allows us to
test the fit of a parametric class of copulas.
Our approach extends the following parametric and semiparametric approaches in time series contexts. Chen and Fan
[6] introduced SCOMDY (semiparametric copula-based multivariate dynamic) models which are very similar to the model
considered here. However, the conditional mean and variance functions are modeled parametrically, while the marginal
distributions of innovations are estimated nonparametrically and a parametric copula model is applied to model the
dependence. See also Kim et al. [19] for similar methods for some parametric time series models including nonlinear GARCH
models, Kim et al. [20], Rémillard et al. [30], and the review by Patton [27]. Chan et al. [5] even give (next to the parametric
estimation of a copula) a goodness-of-fit test for the innovation copula in the GARCH context. Further, in an iid setting, Gijbels
et al. [16] show that in nonparametric location-scale models the asymptotic distribution of the empirical copula is not
influenced by pre-estimation of the mean and variance function. This result was further generalized by Portier and Segers
[28] to a completely nonparametric model for the marginals.
The remainder of the paper is organized as follows. In Section 2 we define the estimators and state some regularity
assumptions. In Section 2.1 we show the weak convergence of the copula process, while in Section 2.2 we show the
asymptotic normality of a parameter estimator when considering a parametric class of copulas. Section 2.3 is devoted to
goodness-of-fit testing. In Section 3 we present simulation results and Section 4 features a real data example. All proofs are
given in the Appendix.

2. Main results

For the ease of presentation we will focus on the case of two time series, i.e., k = 2, but all results can be extended to
general k ≥ 2 in an obvious manner. Suppose we have observed, for i ∈ {1, . . . , n}, a section of the stationary stochastic
process {Y1i , Y2i , X i }i∈Z that satisfies, for all i ∈ {1, . . . , n},
Y1i = m1 (X i ) + σ1 (X i ) ε1i , Y2i = m2 (X i ) + σ2 (X i ) ε2i ,
where X i = (Xi1 , . . . , Xid ) is a d-dimensional covariate and the innovations {(ε1i , ε2i ) : i ∈ Z} are independent identically

distributed random vectors. Further assume that (ε1i , ε2i ) is independent of the past and present covariates X k , k ≤ i, for
all i ∈ Z, and that E (ε1i ) = E (ε2i ) = 0, var(ε1i ) = var(ε2i ) = 1. If the marginal distribution functions F1ε and F2ε of the
innovations are continuous, then the copula C of the innovations is unique and can be expressed, for all (u1 , u2 ) ∈ [0, 1]2 , as

C(u1, u2) = Fε{F1ε^{-1}(u1), F2ε^{-1}(u2)}. (1)


As the innovations (ε1i , ε2i ) are unobserved, the inference about the copula C must be based on the estimated residuals
defined, for all i ∈ {1, . . . , n}, j ∈ {1, 2}, by
ε̂ji = {Yji − m̂j (X i )}/σ̂j (X i ), (2)
where m̂j and σ̂j are the estimates of the unknown functions mj and σj . In what follows we will consider the local polynomial
estimators of order p; see, e.g., [9,23].
Here, for a given x = (x1 , . . . , xd )⊤ , m̂j (x) is defined as β̂0 , the component of β̂ with multi-index 0 = (0, . . . , 0), where β̂
is the solution to the minimization problem
min_{β=(βi)i∈I} ∑_{ℓ=1}^{n} { Yjℓ − ∑_{i∈I} βi ψi,hn(Xℓ − x) }² Khn(Xℓ − x). (3)

Here I = I(d, p) denotes the set of multi-indices i = (i1, . . . , id) with i. = i1 + · · · + id ≤ p, and ψi,hn(x) = ∏_{k=1}^{d} (xk/hn^{(k)})^{ik} (1/ik!). In addition,

Khn(Xℓ − x) = ∏_{k=1}^{d} (1/hn^{(k)}) k{(Xℓk − xk)/hn^{(k)}},

with k being a kernel function and hn = (hn^{(1)}, . . . , hn^{(d)}) the smoothing parameter. Moreover, σj²(x) is estimated as

σ̂j²(x) = ŝj(x) − m̂j²(x),

where ŝj(x) is obtained in the same way as m̂j(x) but with Yjℓ replaced with Yjℓ².
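For d = 1 and p = 1 the minimization (3) is an ordinary weighted least-squares problem. The sketch below, on hypothetical data with an Epanechnikov kernel and a fixed bandwidth (not the paper's bandwidth choice), illustrates the construction of m̂j, ŝj, σ̂j² = ŝj − m̂j², and the estimated residuals of the form (2).

```python
import numpy as np

def local_linear(x0, X, Y, h):
    # Local linear (p = 1) fit at x0 with the Epanechnikov kernel;
    # returns beta_0, i.e., the fitted value at x0.
    u = (X - x0) / h
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
    Z = np.column_stack([np.ones_like(X), X - x0])
    WZ = Z * w[:, None]
    beta = np.linalg.lstsq(WZ.T @ Z, WZ.T @ Y, rcond=None)[0]
    return beta[0]

rng = np.random.default_rng(1)
n, h = 400, 0.5                      # fixed bandwidth, for illustration only
X = rng.uniform(-2, 2, n)
Y = np.sin(X) + np.sqrt(0.5 + 0.25 * X ** 2) * rng.standard_normal(n)

m_hat = np.array([local_linear(x, X, Y, h) for x in X])
s_hat = np.array([local_linear(x, X, Y ** 2, h) for x in X])  # fit of E(Y^2 | X = x)
sigma2_hat = np.maximum(s_hat - m_hat ** 2, 0.05)  # s_hat - m_hat^2, floored for safety
resid = (Y - m_hat) / np.sqrt(sigma2_hat)          # estimated residuals, cf. (2)
```

The floor on σ̂² is a numerical safeguard for this illustration; the paper's assumptions instead control how fast inf σj(x) may approach zero.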
For any function f defined on an interval J in R^d, define, for ℓ ∈ N and δ ∈ (0, 1],

∥f∥_{ℓ+δ} = max_{i∈I(d,ℓ)} sup_{x∈J} |D^i f(x)| + max_{i∈I(d,ℓ), i.=ℓ} sup_{x,x′∈J, x≠x′} |D^i f(x) − D^i f(x′)| / ∥x − x′∥^δ,

where D^i = ∂^{i.}/(∂x1^{i1} · · · ∂xd^{id}) and ∥·∥ is the Euclidean norm on R^d. Denote by C_M^{ℓ+δ}(J) the set of ℓ-times differentiable functions f on J such that ∥f∥_{ℓ+δ} ≤ M. Denote by C̃_M^{ℓ+δ}(J) the subset of C_M^{ℓ+δ}(J) of functions that satisfy inf_{x∈J} f(x) ≥ 1/2.
In what follows we are going to prove that under appropriate regularity assumptions, using the estimated residuals (2)
instead of the (true) unobserved innovations εji affects neither the asymptotic distribution of the empirical copula estimator
nor the parametric estimator of a copula.

2.1. Empirical copula estimation

Mimicking (1), the copula C can be estimated nonparametrically as

C̃n(u1, u2) = F̂ε̂{F̂1ε̂^{-1}(u1), F̂2ε̂^{-1}(u2)},

where

F̂ε̂(y1, y2) = (1/Wn) ∑_{i=1}^{n} wni 1(ε̂1i ≤ y1, ε̂2i ≤ y2)

is the estimate of the joint distribution function Fε(y1, y2) and, for j ∈ {1, 2},

F̂jε̂(y) = (1/Wn) ∑_{i=1}^{n} wni 1(ε̂ji ≤ y)

are the corresponding marginal empirical cumulative distribution functions. Here we make use of a weight function wn(x) = 1(x ∈ Jn) and put wni = wn(X i) as well as Wn = wn1 + · · · + wnn. For some real positive sequence cn → ∞ we set Jn = [−cn, cn]^d.
Now let Cn^{(or)} be the 'oracle' estimator based on the unobserved innovations, i.e.,

Cn^{(or)}(u1, u2) = F̂ε{F̂1ε^{-1}(u1), F̂2ε^{-1}(u2)},

where

F̂ε(z1, z2) = (1/n) ∑_{i=1}^{n} 1(ε1i ≤ z1, ε2i ≤ z2)

is the estimator of Fε(z1, z2) based on the unobserved innovations and F̂1ε, F̂2ε are the corresponding marginal empirical
cumulative distribution functions.
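The estimator C̃n is rank-based and easy to compute. The following sketch, on illustrative data with uniform weights wni ≡ 1 (i.e., Jn covering all observations, so Wn = n), evaluates the empirical copula of a bivariate sample at a single point via the marginal ECDFs.

```python
import numpy as np

def empirical_copula(e1, e2, u1, u2):
    # Empirical copula at (u1, u2): joint ECDF of the sample after transforming
    # each coordinate by its own marginal ECDF (uniform-weight version of C_tilde_n).
    n = len(e1)
    F1 = np.searchsorted(np.sort(e1), e1, side="right") / n   # F_hat_1(e1_i)
    F2 = np.searchsorted(np.sort(e2), e2, side="right") / n   # F_hat_2(e2_i)
    return np.mean((F1 <= u1) & (F2 <= u2))

rng = np.random.default_rng(2)
z = rng.standard_normal((1000, 2))
e1 = z[:, 0]
e2 = 0.5 * z[:, 0] + np.sqrt(0.75) * z[:, 1]   # correlation 0.5

c = empirical_copula(e1, e2, 0.5, 0.5)         # independence would give 0.25
```

In the paper's setting, e1 and e2 would be the estimated residuals ε̂1i and ε̂2i restricted to indices with wni > 0; here they are simulated directly.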

Regularity assumptions
(β) The process (X i, Y1i, Y2i)i∈Z is strictly stationary and absolutely regular (β-mixing) with mixing coefficients βi satisfying βi = O(i^{-b}) for some b > d + 3.
(Fε) The second-order partial derivatives Fε^{(1,1)}, Fε^{(1,2)} and Fε^{(2,2)} of the joint cumulative distribution function Fε(y1, y2) = Pr(ε1 ≤ y1, ε2 ≤ y2), with Fε^{(j,k)}(y1, y2) = ∂²Fε(y1, y2)/(∂yj ∂yk), satisfy

max_{j,k∈{1,2}} sup_{y1,y2∈R} |Fε^{(j,k)}(y1, y2)| (1 + |yj|)(1 + |yk|) < ∞.

Furthermore, for j ∈ {1, 2}, the innovation density fjε satisfies

lim_{u→0+} {1 + |Fjε^{-1}(u)|} fjε{Fjε^{-1}(u)} = 0 and lim_{u→1−} {1 + |Fjε^{-1}(u)|} fjε{Fjε^{-1}(u)} = 0.

(FX ) The common density fX of the observations X i with i ∈ Z is bounded and differentiable with bounded uniformly
continuous first order partial derivatives. Suppose that the sequence cn which is of order O{(ln n)1/d } is chosen in such
a way that infx∈Jn fX (x) converges to zero not faster than some negative power of ln n.
(M) For some s > (2b − 2 − d)/(b − 3 − d), with b from assumption (β), and for j ∈ {1, 2}, E|εj0|^{2s} < ∞, the functions σj^{2s} fX and |mj σj|^s fX are bounded, and there are some i∗ ∈ N, B > 0 such that for all i ≥ i∗,

sup_{x0,xi} σj²(x0) σj²(xi) fX0,Xi(x0, xi) ≤ B,  sup_{x0,xi} |mj(x0) mj(xi)| σj(x0) σj(xi) fX0,Xi(x0, xi) ≤ B,

where fX0,Xi denotes the joint density of (X 0, X i), which is bounded for i ≥ i∗.


(mσ) For j ∈ {1, 2} and for each n ∈ N, let mj and σj be elements of C_{Mn}^{p+1}(Jn) for some sequence Mn that is either bounded or diverges to infinity not faster than some power of n. Further, assume E{σj⁴(X 1)} < ∞ and that min_{j∈{1,2}} inf_{x∈Jn} σj(x) is either bounded away from zero or converges to zero not faster than a negative power of ln n.
(Bw) There exists a sequence hn such that hn^{(k)}/hn → ak with ak ∈ (0, ∞) for all k ∈ {1, . . . , d}. Further, there exists some δ > d/(b − 1) such that, for all D > 0,

n hn^{2p+2} (ln n)^D = o(1),  n hn^{3d+2δ} (ln n)^{-D} → ∞. (4)

(k) The kernel k : R → R is a symmetric (d + 2)-times continuously differentiable probability density function supported on [−1, 1].

Remark 1. Using Fε(y1, y2) = C{F1ε(y1), F2ε(y2)}, assumption (Fε) requires that

max_{j,k∈{1,2}} sup_{u1,u2∈[0,1]} |C^{(j,k)}(u1, u2) fjε{Fjε^{-1}(uj)} fkε{Fkε^{-1}(uk)} + C^{(j)}(u1, u2) f′jε{Fjε^{-1}(uj)} 1(j = k)| {1 + |Fjε^{-1}(uj)|}{1 + |Fkε^{-1}(uk)|} < ∞,

where C^{(j)}(u1, u2) = ∂C(u1, u2)/∂uj and C^{(j,k)}(u1, u2) = ∂²C(u1, u2)/(∂uj ∂uk) stand for the first and second order partial derivatives of the copula function. Thus, provided that for some η > 0,

C^{(j,k)}(u1, u2) = O{uj^{-η}(1 − uj)^{-η} uk^{-η}(1 − uk)^{-η}},

we need the functions f′jε{Fjε^{-1}(u)}{1 + |Fjε^{-1}(u)|} to be of order O{u^{η}(1 − u)^{η}} and the functions fjε{Fjε^{-1}(uj)}{1 + |Fjε^{-1}(uj)|}² to be bounded.

Remark 2. Parts of our assumptions are reproduced from Hansen [17] because we apply his results about uniform rates of
convergence for kernel estimators several times in our proofs. Note that in his Theorem 2, we set q = ∞ to simplify the
assumptions. Further note that if beta mixing coefficients are diminishing exponentially fast, then it is sufficient to assume
s > 2 in (M).

Remark 3. Note that the bandwidth conditions (4) can be fulfilled if and only if 2p + 2 > 3d + 2δ , i.e., in view of assumption
(Bw) if and only if 2p + 2 > 3d + 2d/(b − 1). Thus if b > 2d + 1, then for d = 1 it is sufficient to take p = 1 and for d = 2 one
can take p = 3. In general with increasing dimension d, higher smoothness of the unknown functions has to be assumed and
higher order local polynomial estimators have to be used. This phenomenon is well known in the context of nonparametric
inference.
So in general one can choose the bandwidth as hn ∼ n−1/a , where a ∈ (3d + 2d/(b − 1), 2p + 2). The problem is that if
one wants to take p as small as possible, the range of possible values for a is rather short, which makes the choice of a rather
delicate. To make the choice of a more flexible in practice, one can for instance assume that b > 10d + 1 which (among
others) includes models for beta mixing coefficients diminishing exponentially fast. Now for d = 1 and p = 1 one can take
a in the interval (3.1, 4). See also the bandwidth choice in our simulation study in Section 3.

Remark 4. The choice of cn is a delicate problem in practice. As far as we know, this problem has not yet been addressed even in analogous settings; see, e.g., [8,22,25] and the references therein. Note that the weight function wn(x) is chosen in the simplest possible form in order to simplify the presentation of the proof. In practice it is of interest to use more general forms of Jn. Further, as the density fX is unknown, data-driven procedures to choose Jn are of interest. In the simulation study reported in Section 3, we suggest a data-driven procedure for the choice of the weighting function in the case d = 1. Nevertheless, the data-driven choice of Jn (in particular for general d) and its theoretical justification call for further research.

Theorem 1. Suppose that assumptions (β), (Fε), (FX), (Bw), (M), (k), (Jn) and (mσ) are satisfied. Then

sup_{(u1,u2)∈[0,1]²} |√n {C̃n(u1, u2) − Cn^{(or)}(u1, u2)}| = oP(1).

Note that Theorem 1, together with the weak convergence of √n (Cn^{(or)} − C) as reported, e.g., in Proposition 3.1 of [31] or Theorem 1 (together with Remark 2) in [13], implies that the process C̃n = √n (C̃n − C) weakly converges in the space of bounded functions ℓ∞([0, 1]²) to a centered Gaussian process GC, which can be written as

GC(u1, u2) = BC(u1, u2) − C^{(1)}(u1, u2) BC(u1, 1) − C^{(2)}(u1, u2) BC(1, u2),

where BC is a Brownian bridge on [0, 1]² with covariance function

E{BC(u1, u2) BC(u1′, u2′)} = C(u1 ∧ u1′, u2 ∧ u2′) − C(u1, u2) C(u1′, u2′).
Nevertheless, when this result is used for statistical inference in applications, we recommend replacing the sample size n with Wn = wn1 + · · · + wnn in the formulas. The reason is that the copula is in fact estimated from only Wn observations, and taking this into account improves the finite-sample performance of asymptotic inference procedures.

2.2. Semiparametric copula estimation

The copula C describes the dependence between the two time series of interest, given the covariate. For applications, modeling this dependence structure parametrically is advantageous because a parametric model often gives easier access to interpretation. Goodness-of-fit testing will be considered in the next section.
Suppose that the joint distribution of (ε1i, ε2i) is given by the copula C(u1, u2; θ), where θ = (θ1, . . . , θp)⊤ is an unknown parameter that belongs to a parameter space Θ ⊂ Rp. In copula settings we are often interested in semiparametric estimation of the parameter θ, i.e., estimation of θ without making any parametric assumption on the marginal distributions F1ε and F2ε. The methods of semiparametric estimation for iid settings are summarized in Tsukahara [32]. The question of interest is what happens if we use the estimated residuals (2) instead of the unobserved innovations εji. Generally speaking, thanks to Theorem 1, the answer is that using ε̂ji instead of εji does not change the asymptotic distribution, provided that the parameter of interest can be written as a Hadamard differentiable functional of the copula.

2.2.1. Method-of-moments using rank correlation


This method is described in a general way, for instance, in Section 5.5.1 of [24]. To illustrate the application of Theorem 1 for this method, suppose that the parameter θ is one-dimensional. The inversion of Kendall's tau is a very popular method of estimating the unknown parameter. For this method the estimator of θ is given by

θ̂n^{(ik)} = τ^{-1}(τ̂n),
where
τ(θ) = −1 + 4 ∫_0^1 ∫_0^1 C(u1, u2; θ) dC(u1, u2; θ)

is the theoretical Kendall's tau and τ̂n is an estimate of Kendall's tau. In our setting, Kendall's tau would be computed from the estimated residuals (ε̂1i, ε̂2i) for which wni > 0. By Theorem 1 and the Hadamard differentiability of Kendall's tau proved in Lemma 1 of [35], the estimators of Kendall's tau based on ε̂ji or on εji are asymptotically equivalent. Thus provided
that τ′(θ) ≠ 0, one finds that, as n → ∞,

√n (θ̂n^{(ik)} − θ) ⇝ N[0, στ²/{τ′(θ)}²],

where

στ² = var{8 C(U11, U21; θ) − 4 U11 − 4 U21} and (U11, U21) = (F1ε(ε11), F2ε(ε21)).
Analogously, one can show that working with residuals has asymptotically negligible effects also for the method of moments
introduced in Brahimi and Necir [3].
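For the Clayton family the inversion is explicit, since τ(θ) = θ/(θ + 2) gives θ = 2τ/(1 − τ). The sketch below draws an illustrative sample directly from the Clayton copula by conditional inversion (in the paper's setting, the estimated residuals would be used instead) and recovers θ from the sample Kendall's tau.

```python
import numpy as np

def kendall_tau(x, y):
    # Sample Kendall's tau via pairwise concordance (O(n^2); fine for moderate n).
    n = len(x)
    return np.sum(np.sign(x[:, None] - x[None, :])
                  * np.sign(y[:, None] - y[None, :])) / (n * (n - 1))

def clayton_theta_from_tau(tau):
    # Invert tau(theta) = theta / (theta + 2) for the Clayton family.
    return 2.0 * tau / (1.0 - tau)

rng = np.random.default_rng(3)
theta, n = 2.0, 800                  # true parameter; tau(2) = 0.5
u1 = rng.uniform(size=n)
w = rng.uniform(size=n)
# Conditional-inversion sampler for the Clayton copula.
u2 = (u1 ** (-theta) * (w ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)

theta_hat = clayton_theta_from_tau(kendall_tau(u1, u2))
```

Since Kendall's tau is invariant under monotone transformations of the margins, applying this to residuals or to their ranks gives the same estimate.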

2.2.2. Minimum distance estimation


Here one can follow, for instance, Section 3.2 of [32]. Note that thanks to Theorem 1, the proof of Theorem 3 of [32] does not change when Cn^{(or)} is replaced with C̃n. Thus, provided assumptions (B.1)–(B.5) of [32] are satisfied with δ(u1, u2; θ) = ∂C(u1, u2; θ)/∂θ, the estimator defined as

θ̂n^{(md)} = arg min_{t∈Θ} ∫∫_{[0,1]²} {C̃n(u1, u2) − C(u1, u2; t)}² du1 du2

is asymptotically normal and satisfies, as n → ∞,

√n (θ̂n^{(md)} − θ) ⇝ N(0p, Σ(md)),

where

Σ(md) = var[ ∫∫_{[0,1]²} γ(u1, u2; θ) { 1(U11 ≤ u1, U21 ≤ u2) − ∑_{j=1}^{2} C^{(j)}(u1, u2; θ) 1(Uj1 ≤ uj) } du1 du2 ],

with

γ(u1, u2; θ) = { ∫∫_{[0,1]²} δ(v1, v2; θ) δ⊤(v1, v2; θ) dv1 dv2 }^{-1} δ(u1, u2; θ).

2.2.3. M-estimator, rank approximate Z-estimators


To define a general M-estimator let us introduce

(Ũ1i, Ũ2i) = {Wn/(Wn + 1)} (F̂1ε̂(ε̂1i), F̂2ε̂(ε̂2i)) (5)

that can be viewed as estimates of the unobserved (U1i, U2i). Note that the multiplier Wn/(Wn + 1) is introduced in order to keep both coordinates of the vector (Ũ1i, Ũ2i) bounded away from 0 and 1. The M-estimator of the parameter θ is now defined as

θ̂n = arg min_{t∈Θ} ∑_{i=1}^{n} wni ρ(Ũ1i, Ũ2i; t),

where ρ(u1, u2; θ) is a given loss function. This class of estimators includes, among others, the pseudo-maximum likelihood estimators θ̂n^{(pl)}, for which ρ(u1, u2; θ) = − ln c(u1, u2; θ), with c being the copula density function; see [12].
Note that the estimator θ̂n is usually searched for as a solution to the estimating equations

∑_{i=1}^{n} wni φ(Ũ1i, Ũ2i; θ̂n) = 0p, (6)

where φ(u1, u2; θ) = ∂ρ(u1, u2; θ)/∂θ. In [32], the estimator defined as the solution of (6) is called a rank approximate Z-estimator.
In what follows we give general assumptions under which there exists a consistent root θ̂n of the estimating equations (6) that is asymptotically equivalent to the consistent root θ̂n^{(or)} of the 'oracle' estimating equations given by

∑_{i=1}^{n} φ(Û1i, Û2i; θ̂n^{(or)}) = 0p, (7)

where

(Û1i, Û2i) = {n/(n + 1)} (F̂1ε(ε1i), F̂2ε(ε2i)) (8)

are the standard pseudo-observations calculated from the unobserved innovations and their marginal empirical distribution functions F̂jε(y).
Unfortunately, these general assumptions exclude some useful models (e.g., pseudo-maximum likelihood estimator in
the Clayton family of copulas) for which the function φ(u1 , u2 ; θ ) viewed as a function of (u1 , u2 ) is unbounded. The reason
is that for empirical distribution functions computed from estimated residuals ε̂ji , we lack some of the sophisticated results
that are available for empirical distribution functions computed from (true) innovations εji . For such copula families one
can use, e.g., the method of moments using rank correlation (see Section 2.2.1) to stay on the safe side. Nevertheless the
simulation study in Section 3 suggests that the pseudo-maximum likelihood estimation can be used also for the Clayton
copula (and probably also for other families of copulas with non-zero tail dependence) provided that the dependence is not
very strong.
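A minimal sketch of pseudo-maximum likelihood for the Clayton copula follows, with uniform weights, pseudo-observations of the form (5)/(8), and a simple grid search over θ in place of a Newton-type solver for the estimating equations (illustrative simplifications, not the paper's numerical method).

```python
import numpy as np

def clayton_neg_loglik(theta, u, v):
    # Negative pseudo log-likelihood for the Clayton copula density
    # c(u, v; theta) = (1 + theta) (u v)^(-theta-1) (u^-theta + v^-theta - 1)^(-2 - 1/theta).
    s = u ** (-theta) + v ** (-theta) - 1.0
    ll = (np.log1p(theta) - (theta + 1.0) * (np.log(u) + np.log(v))
          - (2.0 + 1.0 / theta) * np.log(s))
    return -np.sum(ll)

rng = np.random.default_rng(4)
theta_true, n = 2.0, 1500
u1 = rng.uniform(size=n)
w = rng.uniform(size=n)
# Conditional-inversion sampler for the Clayton copula.
u2 = (u1 ** (-theta_true) * (w ** (-theta_true / (1 + theta_true)) - 1) + 1) ** (-1 / theta_true)

# Pseudo-observations: rescaled ranks, bounded away from 0 and 1 as in (5)/(8).
r1 = np.argsort(np.argsort(u1)) + 1.0
r2 = np.argsort(np.argsort(u2)) + 1.0
U, V = r1 / (n + 1), r2 / (n + 1)

grid = np.linspace(0.2, 8.0, 400)
theta_hat = grid[np.argmin([clayton_neg_loglik(t, U, V) for t in grid])]
```

With n = 1500 and moderate dependence (τ = 0.5), the grid minimizer lands close to the true θ = 2; note that the unbounded score of the Clayton density near the corners is exactly the feature that places this family outside the theorem's assumptions.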

Regularity assumptions
In what follows let θ stand for the true value of the parameter and V (θ ) for an open neighborhood of θ .

(Id) θ is a unique minimizer of the function r(t) = E ρ (U1i , U2i ; t) and θ is an inner point of Θ .
(φ) There exists V(θ) such that for each ℓ1, ℓ2 ∈ {1, . . . , p} the functions φℓ1(u1, u2; t) = ∂ρ(u1, u2; t)/∂tℓ1 and φℓ1,ℓ2(u1, u2; t) = ∂²ρ(u1, u2; t)/(∂tℓ1 ∂tℓ2) are uniformly continuous in (u1, u2) uniformly in t ∈ V(θ) and of uniformly bounded Hardy–Krause variation; see, e.g., Berghaus et al. [1].
(φ(j)) There exist V(θ) and a function h(u1, u2) such that for each t ∈ V(θ),

max_{j∈{1,2}} max_{ℓ∈{1,...,p}} |φℓ^{(j)}(u1, u2; t)| ≤ h(u1, u2),

where φℓ^{(j)}(u1, u2; t) = ∂φℓ(u1, u2; t)/∂uj and E{h(U11, U21)} < ∞.
(Γ) Each element of the (matrix) function Γ(t) = E{∂φ(U1, U2; t)/∂t⊤} is a continuous function on V(θ) and the matrix Γ = Γ(θ) is positive definite.

Theorem 2. Suppose that the assumptions of Theorem 1 are satisfied and that also (Id), (φ), (φ(j)), and (Γ) hold. Then with probability going to 1, there exists a consistent root θ̂n of the estimating equations (6), which satisfies

√n (θ̂n − θ) ⇝ Np(0p, Γ^{-1} Σ Γ^{-1}),

where

Σ = var[ φ(U11, U21; θ) + ∑_{j=1}^{2} ∫∫ {1(Uj1 ≤ vj) − vj} (∂/∂vj) φ(v1, v2; θ) dC(v1, v2; θ) ].

The proof of Theorem 2 is given in Appendix B. Note that the asymptotic distribution of the estimator θ̂n coincides with the distribution given in Section 4 of Genest et al. [12] that corresponds to the consistent root θ̂n^{(or)} of the estimating equations (7). Thus using the residuals instead of the true innovations has an asymptotically negligible effect on the (first-order) asymptotic properties. In fact, it can even be shown that both θ̂n and θ̂n^{(or)} have the same asymptotic representations and thus

√n (θ̂n − θ̂n^{(or)}) = oP(1).

2.3. Goodness-of-fit testing

When modeling multivariate data using copulas parametrically, one needs to choose a suitable family of copulas, and goodness-of-fit tests are a useful tool for this choice. Thus we are interested in testing H0 : C ∈ C0, where C0 = {Cθ : θ ∈ Θ} is a given parametric family of copulas.
Many testing methods have been proposed, see, e.g., [14,21] and the references therein. The most standard ones are based
on the comparison of nonparametric and parametric estimators of a copula. For instance the Cramér–von Mises statistic is
given by
Sn = ∫∫ {C̃n(u1, u2) − C(u1, u2; θ̂n)}² dC̃n(u1, u2),

where θ̂ n is an estimate of the unknown parameter θ .


As the asymptotic distributions of C̃n(u1, u2) and θ̂n are the same as those of the oracle quantities Cn^{(or)}(u1, u2) and θ̂n^{(or)}, the significance of the test statistic can be assessed in the same way as in iid settings. Thus one can use, for instance, the parametric bootstrap by simply generating independent and identically distributed observations from the copula C(u1, u2; θ̂n). The test statistic is then recalculated from these observations in the same way as if we directly observed the innovations. The only difference is that instead of generating n observations, we recommend generating only Wn observations.
Similar remarks hold when testing other hypotheses about the copula such as symmetry, for instance. Note that testing
H0 : C (u1 , u2 ) ≡ u1 u2 provides a test for conditional independence of the two time series, given the covariate.
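The parametric bootstrap described above can be sketched as follows, under several illustrative simplifications: a Clayton null hypothesis, θ estimated by inversion of Kendall's tau, uniform weights (so Wn = n), a small number of bootstrap replicates, and an 'observed' sample that is itself simulated under the null rather than obtained from residuals.

```python
import numpy as np

rng = np.random.default_rng(5)

def clayton_cdf(u, v, theta):
    # Clayton copula C(u, v; theta) = (u^-theta + v^-theta - 1)^(-1/theta).
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)

def clayton_sample(n, theta, rng):
    # Conditional-inversion sampler for the Clayton copula.
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = (u ** (-theta) * (w ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)
    return u, v

def pseudo_obs(x, y):
    # Rescaled ranks, bounded away from 0 and 1.
    n = len(x)
    return ((np.argsort(np.argsort(x)) + 1) / (n + 1),
            (np.argsort(np.argsort(y)) + 1) / (n + 1))

def kendall_tau(u, v):
    n = len(u)
    return np.sum(np.sign(u[:, None] - u[None, :])
                  * np.sign(v[:, None] - v[None, :])) / (n * (n - 1))

def cvm_stat(u, v):
    # Cramer-von Mises distance between the empirical copula and the fitted
    # Clayton copula, with theta estimated by inversion of Kendall's tau.
    tau = kendall_tau(u, v)
    theta = 2 * tau / (1 - tau)
    emp = np.mean((u[None, :] <= u[:, None]) & (v[None, :] <= v[:, None]), axis=1)
    return np.sum((emp - clayton_cdf(u, v, theta)) ** 2), theta

n = 300
u_obs, v_obs = pseudo_obs(*clayton_sample(n, 2.0, rng))   # null model holds here
s_obs, theta_hat = cvm_stat(u_obs, v_obs)

# Parametric bootstrap: simulate from C(., .; theta_hat), recompute the statistic.
B = 50
s_boot = [cvm_stat(*pseudo_obs(*clayton_sample(n, theta_hat, rng)))[0] for _ in range(B)]
p_value = np.mean(np.array(s_boot) >= s_obs)
```

In the paper's setting, u_obs and v_obs would be pseudo-observations of the estimated residuals and the bootstrap sample size would be Wn rather than n; B would also be taken much larger in practice.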

3. Simulation study

A small Monte Carlo study was conducted in order to compare the semiparametric estimators based on the residuals with
the ‘oracle’ estimators based on (unobserved) innovations. The inversion of Kendall’s tau (IK) method and the maximum
pseudo-likelihood (MPL) method were considered for the following five copula families: Clayton, Frank, Gumbel, normal,
and Student with 4 degrees of freedom. The values of the parameters are chosen so that they correspond to Kendall's tau values
τ ∈ {0.25, 0.50, 0.75}. The data were simulated from the following four models:
Y1i = (0.5 + 0.4 e^{−0.8 Xi²}) Xi + √(1 + 0.2 Xi²) ε1i,   Y2i = 0.5 − 0.5 Xi + √(1 + 0.4 Xi²) ε2i,   (Mod 1)

Y1i = 0.7 Y1,i−1 + ε1i,   Y2i = −0.5 Y2,i−1 + ε2i,   (Mod 2)

Y1i = 0.5 Y1,i−1 / (1 + 0.1 Y1,i−1²) + ε1i,   Y2i = −0.4 Y2,i−1 + ε2i,   (Mod 3)

Y1i = σ1i ε1i, σ1i² = 1 + 0.3 Y1,i−1²,   Y2i = σ2i ε2i, σ2i² = 5 + 0.2 Y2,i−1²,   (Mod 4)

where the innovations ε1i, ε2i marginally follow the standard normal distribution, and Xi is an exogenous variable following the AR model Xi = 0.6 Xi−1 + ξi with ξi iid standard normal. The simulations were also conducted for innovations ε1i, ε2i with Student marginals with 5 degrees of freedom, but the corresponding results are very similar, so for brevity we do not present them here. All simulations were conducted in the R computing environment [29].
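A simulation of this design, say (Mod 2) with a Clayton innovation copula, can be sketched as follows: Clayton-coupled uniforms are mapped through the normal quantile function so that the innovations have standard normal marginals, and the two AR(1) recursions are then run (the paper uses R; this Python sketch only mirrors the data-generating mechanism).

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(6)
nd = NormalDist()

def clayton_normal_innovations(n, theta, rng):
    # Clayton(theta)-coupled uniforms, mapped to standard normal marginals.
    u1 = np.clip(rng.uniform(size=n), 1e-12, 1 - 1e-12)
    w = np.clip(rng.uniform(size=n), 1e-12, 1 - 1e-12)
    u2 = (u1 ** (-theta) * (w ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)

    def to_normal(u):
        return np.array([nd.inv_cdf(p) for p in np.clip(u, 1e-12, 1 - 1e-12)])

    return to_normal(u1), to_normal(u2)

n, theta = 500, 2.0                  # theta = 2 corresponds to Kendall's tau = 0.5
e1, e2 = clayton_normal_innovations(n, theta, rng)

# (Mod 2): two AR(1) series; cross-series dependence enters only via the innovations.
Y1 = np.zeros(n)
Y2 = np.zeros(n)
for i in range(1, n):
    Y1[i] = 0.7 * Y1[i - 1] + e1[i]
    Y2[i] = -0.5 * Y2[i - 1] + e2[i]
```

Feeding (Y1, Y2) with lagged covariates Y1,i−1 and Y2,i−1 into the estimation pipeline of Section 2 would then recover the innovation copula from the residuals.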
The nonparametric estimates m̂j and σ̂j are constructed as local polynomial estimators of order p = 1 with K being the
triweight kernel. The bandwidth hn is chosen for each estimation separately by the cross-validation method from the interval
(D, H), where D = σ̂Z /n1/(3+ε) and H = σ̂Z ln2 (n)/n1/(4−ε) for ε = 0.1 (see Remark 3) and σ̂Z is an estimate of the standard
deviation of the explanatory variable Z (being Xi or Yi−1 , depending on the model) given by σ̂Z = min{SZ , IQRZ /1.34}, where
SZ stands for the sample standard deviation and IQRZ is the interquartile range.
The weights are given by wn(z) = 1(z ∈ [cnL, cnU]), where [cnL, cnU] is the largest possible interval such that inf_{z∈[cnL, cnU]} f̂Z(z) ≥ 1/{σ̂Z ln²(n)}, with f̂Z the kernel density estimator of the marginal density of Z with triweight kernel and bandwidth chosen by the standard normal reference rule; see, e.g., p. 201 in [10].

Table 1
Estimation for the Clayton copula with normal marginals (100 multiples of bias, SD and RMSE).
Model  τ     Estim       n = 200              n = 500              n = 1000
                         Bias   SD    RMSE    Bias   SD    RMSE    Bias   SD    RMSE

Known innovations
       0.25  θ̂n(ik,or)  −0.03  3.25  3.25    0.10  2.47  2.47    0.12  1.86  1.87
       0.25  θ̂n(pl,or)   0.51  3.00  3.04    0.35  2.15  2.18    0.24  1.69  1.70
       0.50  θ̂n(ik,or)   0.01  2.64  2.64    0.06  2.03  2.03    0.07  1.52  1.52
       0.50  θ̂n(pl,or)   0.09  2.47  2.47    0.08  1.84  1.85    0.04  1.39  1.39
       0.75  θ̂n(ik,or)   0.01  1.58  1.58    0.05  1.19  1.19    0.02  0.89  0.89
       0.75  θ̂n(pl,or)  −0.28  1.48  1.50   −0.17  1.10  1.11   −0.12  0.80  0.81

1      0.25  θ̂n(ik)     −0.08  4.66  4.66   −0.22  2.97  2.97   −0.16  2.06  2.06
       0.25  θ̂n(pl)      0.62  4.15  4.19    0.07  2.62  2.62   −0.02  1.82  1.82
       0.50  θ̂n(ik)     −0.46  3.94  3.97   −0.41  2.48  2.51   −0.25  1.74  1.76
       0.50  θ̂n(pl)     −0.90  3.59  3.70   −0.81  2.25  2.39   −0.55  1.60  1.69
       0.75  θ̂n(ik)     −1.04  2.45  2.66   −0.85  1.55  1.77   −0.59  1.07  1.22
       0.75  θ̂n(pl)     −3.00  2.66  4.01   −2.23  1.59  2.74   −1.57  1.15  1.94

2      0.25  θ̂n(ik)     −0.43  4.78  4.79   −0.05  2.93  2.92    0.07  2.08  2.08
       0.25  θ̂n(pl)      0.26  4.30  4.31    0.25  2.58  2.59    0.15  1.90  1.90
       0.50  θ̂n(ik)     −0.91  3.93  4.03   −0.24  2.40  2.41   −0.09  1.71  1.72
       0.50  θ̂n(pl)     −1.50  3.62  3.92   −0.57  2.21  2.29   −0.36  1.60  1.64
       0.75  θ̂n(ik)     −1.96  2.63  3.27   −0.70  1.52  1.68   −0.39  1.05  1.12
       0.75  θ̂n(pl)     −4.63  3.19  5.62   −2.14  1.84  2.82   −1.27  1.16  1.72

3      0.25  θ̂n(ik)     −0.43  4.81  4.83   −0.09  2.91  2.91    0.03  2.09  2.09
       0.25  θ̂n(pl)      0.24  4.37  4.38    0.19  2.56  2.57    0.11  1.90  1.90
       0.50  θ̂n(ik)     −0.93  3.97  4.07   −0.32  2.41  2.43   −0.16  1.72  1.72
       0.50  θ̂n(pl)     −1.52  3.70  4.00   −0.66  2.20  2.30   −0.46  1.61  1.67
       0.75  θ̂n(ik)     −1.85  2.61  3.20   −0.82  1.53  1.73   −0.53  1.04  1.16
       0.75  θ̂n(pl)     −4.39  3.05  5.35   −2.25  1.78  2.86   −1.46  1.14  1.85

4      0.25  θ̂n(ik)     −0.49  4.85  4.87   −0.09  2.93  2.93    0.02  2.10  2.10
       0.25  θ̂n(pl)      0.13  4.37  4.37    0.14  2.58  2.59    0.06  1.90  1.90
       0.50  θ̂n(ik)     −0.82  3.99  4.07   −0.25  2.40  2.41   −0.12  1.73  1.73
       0.50  θ̂n(pl)     −1.54  3.70  4.01   −0.80  2.22  2.36   −0.53  1.60  1.69
       0.75  θ̂n(ik)     −1.22  2.57  2.84   −0.49  1.48  1.56   −0.28  1.04  1.08
       0.75  θ̂n(pl)     −3.43  2.76  4.40   −1.93  1.65  2.54   −1.20  1.10  1.62

For each setting, we compute the estimate of the copula parameter θ from the true (but unobserved) innovations using the inversion of Kendall's tau method (θ̂n^{(ik,or)}) and the maximum pseudo-likelihood method (θ̂n^{(pl,or)}). These oracle estimators are compared with their counterparts computed from the residuals, θ̂n^{(ik)} and θ̂n^{(pl)}. To make the results comparable across copula families, the estimates are reported on the Kendall's tau scale. That is, we are in fact comparing nonparametric estimates of Kendall's tau with parametric estimates, where the parameter is estimated with the help of the maximum pseudo-likelihood method [12]. The performance of the estimators is measured by the bias, standard deviation (SD), and the root mean square error (RMSE), which are estimated from 1000 random samples for sample sizes n ∈ {200, 500, 1000}. Since the obtained quantities are of order 10^{-2} and smaller, we report 100 multiples of bias, SD and RMSE in Tables 1–5. As θ̂n^{(ik)} and θ̂n^{(pl)} are natural competitors, the larger of the two corresponding performance measures (bias, SD, RMSE) is stressed by bold font.
In agreement with the results of [12,32], the results for the (oracle) estimates based on the (unobserved) innovations are in favor of the MPL method. This continues to hold when working with estimated residuals, provided that the dependence is not very strong (i.e., τ = 0.25 or τ = 0.50). But if the dependence is strong (i.e., τ = 0.75), then one should consider using the IK method. This seems to be true in particular for the Clayton copula and, to some extent, also for the Frank and Gumbel copulas. A closer inspection of the results reveals that while the standard deviation of the MPL method is almost always slightly smaller than that of the IK method, its bias can be substantially larger. In contrast, the results suggest that for the normal and the Student copulas one can stick with the MPL method even in the case of strong dependence.
Finally, note that for large sample sizes, the performance of the estimates based on residuals is usually almost as good as that of the oracle estimates based on the (unobserved) innovations. But there is still some price to pay even for the sample size n = 1000, and this price increases somewhat with the level of dependence. A question for possible further research is how to explain the poor performance of the MPL method based on residuals for the Clayton copula under strong dependence.
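The IK estimator compared above is straightforward to reproduce. As a minimal sketch (not the authors' implementation), the following computes the inversion-of-Kendall's-tau estimate for the Clayton family, for which τ = θ/(θ + 2) and hence θ = 2τ/(1 − τ):

```python
import numpy as np
from scipy.stats import kendalltau

def clayton_theta_ik(x, y):
    """Inversion-of-Kendall's-tau (IK) estimate of the Clayton copula
    parameter: for Clayton, tau = theta / (theta + 2), so the sample
    Kendall's tau is plugged into theta = 2 * tau / (1 - tau)."""
    tau, _ = kendalltau(x, y)
    return 2.0 * tau / (1.0 - tau)

# A sample whose Kendall's tau is exactly 1/3 yields theta = 1:
theta = clayton_theta_ik([1, 2, 3], [1, 3, 2])
```

In the tables this nonparametric estimate is the competitor of the maximum pseudo-likelihood estimate; both are then reported on the Kendall's tau scale.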

Table 2
Estimation for the Frank copula with normal marginals (100 multiples of bias, SD and RMSE).
Model τ Estim n = 200 n = 500 n = 1000
Bias SD RMSE Bias SD RMSE Bias SD RMSE
Known innovations
 0.25  θ̂n(ik,or)  −0.01 3.16 3.16  −0.05 2.33 2.33  −0.14 1.70 1.71
 0.25  θ̂n(pl,or)   0.04 3.16 3.15  −0.03 2.32 2.32  −0.12 1.70 1.70
 0.50  θ̂n(ik,or)  −0.02 2.37 2.37  −0.01 1.73 1.73  −0.09 1.28 1.28
 0.50  θ̂n(pl,or)   0.00 2.34 2.34  −0.02 1.72 1.72  −0.08 1.27 1.27
 0.75  θ̂n(ik,or)  −0.02 1.18 1.18   0.00 0.87 0.87  −0.03 0.64 0.64
 0.75  θ̂n(pl,or)  −0.13 1.17 1.17  −0.07 0.87 0.87  −0.07 0.64 0.64

0.25 θ̂n(ik) −0.23 4.54 4.54 −0.11 2.82 2.82 −0.05 1.92 1.92
0.25 θ̂n(pl) −0.12 4.52 4.52 −0.05 2.81 2.81 −0.03 1.90 1.90
1 0.50 θ̂n(ik) −0.49 3.46 3.50 −0.32 2.18 2.20 −0.22 1.43 1.44
0.50 θ̂n(pl) −0.47 3.40 3.43 −0.30 2.15 2.17 −0.21 1.42 1.43
0.75 θ̂n(ik) −0.97 1.87 2.11 −0.69 1.15 1.34 −0.53 0.74 0.91
0.75 θ̂n(pl) −1.22 1.84 2.21 −0.81 1.16 1.41 −0.60 0.75 0.96
0.25 θ̂n(ik) −0.28 4.48 4.49 −0.15 2.78 2.79 −0.21 1.88 1.89
0.25 θ̂n(pl) −0.17 4.47 4.47 −0.12 2.77 2.77 −0.19 1.88 1.89
2 0.50 θ̂n(ik) −0.77 3.44 3.53 −0.29 2.13 2.14 −0.24 1.41 1.43
0.50 θ̂n(pl) −0.75 3.40 3.48 −0.31 2.10 2.12 −0.24 1.40 1.42
0.75 θ̂n(ik) −1.65 2.20 2.75 −0.66 1.18 1.35 −0.38 0.75 0.84
0.75 θ̂n(pl) −1.90 2.20 2.91 −0.78 1.18 1.41 −0.43 0.75 0.87
0.25 θ̂n(ik) −0.33 4.53 4.54 −0.17 2.77 2.77 −0.24 1.89 1.91
0.25 θ̂n(pl) −0.23 4.53 4.53 −0.14 2.75 2.75 −0.22 1.89 1.90
3 0.50 θ̂n(ik) −0.83 3.48 3.58 −0.37 2.09 2.12 −0.32 1.42 1.45
0.50 θ̂n(pl) −0.81 3.44 3.53 −0.38 2.06 2.10 −0.32 1.41 1.44
0.75 θ̂n(ik) −1.62 2.15 2.70 −0.77 1.14 1.37 −0.51 0.76 0.92
0.75 θ̂n(pl) −1.86 2.14 2.84 −0.89 1.14 1.44 −0.57 0.77 0.96
0.25 θ̂n(ik) −0.37 4.56 4.57 −0.16 2.79 2.80 −0.22 1.90 1.91
0.25 θ̂n(pl) −0.26 4.54 4.54 −0.13 2.79 2.79 −0.20 1.90 1.91
4 0.50 θ̂n(ik) −0.76 3.48 3.56 −0.30 2.13 2.15 −0.25 1.43 1.46
0.50 θ̂n(pl) −0.73 3.43 3.50 −0.31 2.11 2.13 −0.25 1.42 1.44
0.75 θ̂n(ik) −1.11 2.05 2.34 −0.48 1.15 1.24 −0.30 0.76 0.81
0.75 θ̂n(pl) −1.33 2.01 2.41 −0.58 1.14 1.28 −0.35 0.76 0.83

4. Application

To illustrate the proposed methods let us consider daily log returns of USD/CZK (US Dollar/Czech Koruna) and GBP/CZK
(British Pound/Czech Koruna) exchange rates from January 4, 2010 to December 31, 2012. Note that we take only data until
the end of 2012 (total of 758 observations for each series), because in November 2013 the Czech National Bank started an
action targeting the CZK/EUR exchange rate.
Daily foreign exchange rates have been successfully modeled using the nonparametric autoregression, e.g., in [18,36].
Here, we apply a simple model of two separate nonparametric autoregressions of order 1 and search for a feasible copula for
the innovations. The conditional means and variances are modeled using local polynomials with degree p = 1. The weights
and the smoothing parameters are chosen as in Section 3. The fitted conditional means and standard deviations are plotted
together with the data in Fig. 1. It is visible that the conditional mean functions are rather flat and range around zero.
We use the goodness-of-fit test proposed in Section 2.3 in order to decide which copula should be used for modeling the innovations from the two autoregressions. The copula parameter is estimated using the inversion of Kendall's tau method. The significance of the test statistic is assessed with the help of the bootstrap test based on B = 999 bootstrap samples. We test the Clayton, Frank, Gumbel, normal and Student (with 4 degrees of freedom) copulas and obtain p-values 0.000, 0.000, 0.001, 0.055 and 0.305, respectively. Hence, we conclude that the Student copula seems to be the best choice for the innovations. The normal copula is also not rejected at the 5% level, but the corresponding p-value is rather borderline, so the Student copula seems to provide a better fit. The maximum pseudo-likelihood method yields estimates of 5.156 degrees of freedom and correlation parameter ρ = 0.778. Fig. 2 shows the pseudo-observations (Ũ1i, Ũ2i) given by (5), together with contours of the fitted Student copula.
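The pseudo-observations in Fig. 2 are rank transforms of the estimated residuals. A minimal sketch of this transform (without the local weights $w_{ni}$ that enter (5) in the paper, and assuming no ties):

```python
import numpy as np

def pseudo_observations(e1, e2):
    """Map two residual series to the unit square via the rank
    transform U_i = rank(e_i) / (n + 1); a parametric copula is
    then fitted to these pseudo-observations."""
    n = len(e1)
    r1 = np.argsort(np.argsort(e1)) + 1  # ranks 1..n (no ties assumed)
    r2 = np.argsort(np.argsort(e2)) + 1
    return r1 / (n + 1.0), r2 / (n + 1.0)

u1, u2 = pseudo_observations([0.5, -1.0, 2.0], [3.0, 1.0, 2.0])
```

Dividing by n + 1 rather than n keeps the pseudo-observations strictly inside the unit square, which is needed when evaluating copula densities on them.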

Acknowledgments

The authors are grateful to the Editor-in-Chief, Christian Genest, an Associate Editor and the reviewers for their valuable
comments, which led to an improved manuscript. The first author gratefully acknowledges support from the DFG (Research

Table 3
Estimation for the Gumbel copula with normal marginals (100 multiples of bias, SD and RMSE).
Model τ Estim n = 200 n = 500 n = 1000
Bias SD RMSE Bias SD RMSE Bias SD RMSE
Known innovations
 0.25  θ̂n(ik,or)   0.01 3.19 3.19   0.13 2.43 2.44   0.08 1.88 1.88
 0.25  θ̂n(pl,or)   0.44 3.01 3.04   0.38 2.37 2.40   0.24 1.81 1.82
 0.50  θ̂n(ik,or)   0.02 2.58 2.58   0.11 1.96 1.97   0.02 1.49 1.49
 0.50  θ̂n(pl,or)   0.24 2.42 2.43   0.27 1.89 1.91   0.12 1.44 1.44
 0.75  θ̂n(ik,or)   0.02 1.48 1.48   0.06 1.12 1.12   0.00 0.84 0.84
 0.75  θ̂n(pl,or)  −0.06 1.35 1.36   0.02 1.05 1.05  −0.03 0.78 0.78

0.25 θ̂n(ik) −0.36 4.76 4.78 0.06 3.06 3.05 −0.09 2.06 2.06
0.25 θ̂n(pl) 0.24 4.68 4.68 0.37 2.92 2.94 0.08 2.01 2.01
1 0.50 θ̂n(ik) −0.56 3.92 3.96 −0.17 2.45 2.46 −0.22 1.69 1.70
0.50 θ̂n(pl) −0.36 3.83 3.84 −0.10 2.35 2.35 −0.20 1.65 1.66
0.75 θ̂n(ik) −0.85 2.36 2.50 −0.52 1.42 1.51 −0.49 1.01 1.12
0.75 θ̂n(pl) −1.35 2.32 2.69 −0.84 1.36 1.60 −0.73 0.99 1.22
0.25 θ̂n(ik) −0.16 4.58 4.58 0.02 2.91 2.91 0.04 2.10 2.10
0.25 θ̂n(pl) 0.49 4.42 4.45 0.32 2.86 2.88 0.20 2.03 2.04
2 0.50 θ̂n(ik) −0.66 3.77 3.82 −0.14 2.36 2.36 −0.09 1.67 1.68
0.50 θ̂n(pl) −0.50 3.61 3.64 −0.09 2.30 2.30 −0.05 1.62 1.62
0.75 θ̂n(ik) −1.61 2.50 2.97 −0.52 1.43 1.52 −0.32 0.99 1.04
0.75 θ̂n(pl) −2.37 2.52 3.46 −0.95 1.45 1.73 −0.55 0.98 1.13
0.25 θ̂n(ik) −0.18 4.57 4.57 0.01 2.93 2.92 0.02 2.11 2.11
0.25 θ̂n(pl) 0.46 4.41 4.43 0.31 2.87 2.88 0.18 2.03 2.03
3 0.50 θ̂n(ik) −0.66 3.73 3.78 −0.18 2.36 2.37 −0.16 1.69 1.70
0.50 θ̂n(pl) −0.50 3.59 3.62 −0.13 2.31 2.32 −0.13 1.63 1.64
0.75 θ̂n(ik) −1.52 2.48 2.90 −0.58 1.41 1.53 −0.42 0.98 1.07
0.75 θ̂n(pl) −2.20 2.44 3.29 −0.98 1.40 1.71 −0.64 0.96 1.15
0.25 θ̂n(ik) −0.26 4.60 4.60 −0.06 2.97 2.97 0.04 2.12 2.12
0.25 θ̂n(pl) 0.30 4.47 4.47 0.19 2.89 2.89 0.18 2.04 2.05
4 0.50 θ̂n(ik) −0.63 3.79 3.84 −0.13 2.36 2.37 −0.11 1.69 1.69
0.50 θ̂n(pl) −0.56 3.63 3.67 −0.16 2.31 2.32 −0.13 1.65 1.65
0.75 θ̂n(ik) −0.83 2.38 2.52 −0.29 1.40 1.43 −0.21 0.97 0.99
0.75 θ̂n(pl) −1.51 2.35 2.79 −0.71 1.41 1.57 −0.45 0.95 1.05

Fig. 1. Fitted conditional mean and variance for the analyzed log returns.

Unit FOR 1735 Structural Inference in Statistics: Adaptation and Efficiency). The second author gratefully acknowledges
support from the grant GACR 15-04774Y. The research of the third author was supported by the grant GACR 18-01781Y.

Table 4
Estimation for the normal copula with normal marginals (100 multiples of bias, SD and RMSE).
Model τ Estim n = 200 n = 500 n = 1000
Bias SD RMSE Bias SD RMSE Bias SD RMSE
Known innovations
 0.25  θ̂n(ik,or)  −0.02 3.13 3.13  −0.05 2.32 2.31  −0.03 1.78 1.77
 0.25  θ̂n(pl,or)   0.38 2.99 3.02   0.22 2.19 2.20   0.13 1.66 1.67
 0.50  θ̂n(ik,or)  −0.01 2.44 2.44  −0.04 1.81 1.81  −0.02 1.39 1.39
 0.50  θ̂n(pl,or)   0.32 2.26 2.28   0.19 1.67 1.68   0.12 1.27 1.27
 0.75  θ̂n(ik,or)  −0.01 1.36 1.36  −0.02 1.01 1.01  −0.01 0.77 0.77
 0.75  θ̂n(pl,or)  −0.04 1.23 1.23  −0.03 0.91 0.91  −0.01 0.69 0.69

0.25 θ̂n(ik) −0.29 4.65 4.66 −0.07 2.83 2.83 −0.15 1.99 2.00
0.25 θ̂n(pl) 0.35 4.49 4.50 0.19 2.72 2.72 0.02 1.89 1.89
1 0.50 θ̂n(ik) −0.48 3.67 3.70 −0.23 2.22 2.23 −0.25 1.56 1.58
0.50 θ̂n(pl) 0.00 3.40 3.40 −0.05 2.08 2.08 −0.13 1.44 1.44
0.75 θ̂n(ik) −0.78 2.17 2.30 −0.53 1.27 1.38 −0.47 0.88 1.00
0.75 θ̂n(pl) −0.94 2.02 2.23 −0.64 1.19 1.35 −0.52 0.81 0.96
0.25 θ̂n(ik) −0.34 4.39 4.40 −0.12 2.80 2.80 −0.10 1.94 1.94
0.25 θ̂n(pl) 0.38 4.21 4.22 0.22 2.72 2.72 0.10 1.83 1.83
2 0.50 θ̂n(ik) −0.70 3.47 3.54 −0.25 2.20 2.21 −0.16 1.53 1.54
0.50 θ̂n(pl) −0.20 3.21 3.22 −0.01 2.06 2.06 −0.01 1.40 1.40
0.75 θ̂n(ik) −1.54 2.25 2.73 −0.59 1.31 1.43 −0.34 0.86 0.93
0.75 θ̂n(pl) −1.80 2.14 2.80 −0.71 1.23 1.43 −0.39 0.79 0.88
0.25 θ̂n(ik) −0.38 4.41 4.42 −0.15 2.80 2.81 −0.13 1.95 1.96
0.25 θ̂n(pl) 0.33 4.23 4.24 0.18 2.72 2.73 0.06 1.83 1.83
3 0.50 θ̂n(ik) −0.71 3.48 3.55 −0.32 2.19 2.21 −0.22 1.52 1.53
0.50 θ̂n(pl) −0.21 3.20 3.21 −0.08 2.05 2.06 −0.07 1.39 1.39
0.75 θ̂n(ik) −1.45 2.19 2.63 −0.70 1.29 1.46 −0.43 0.87 0.97
0.75 θ̂n(pl) −1.70 2.07 2.67 −0.81 1.21 1.46 −0.48 0.79 0.93
0.25 θ̂n(ik) −0.34 4.40 4.41 −0.15 2.81 2.81 −0.11 1.96 1.97
0.25 θ̂n(pl) 0.30 4.24 4.25 0.16 2.72 2.72 0.07 1.84 1.84
4 0.50 θ̂n(ik) −0.69 3.47 3.53 −0.27 2.20 2.22 −0.18 1.54 1.55
0.50 θ̂n(pl) −0.26 3.23 3.24 −0.09 2.07 2.07 −0.07 1.41 1.41
0.75 θ̂n(ik) −0.82 2.19 2.34 −0.35 1.29 1.34 −0.22 0.87 0.89
0.75 θ̂n(pl) −1.14 2.08 2.37 −0.52 1.23 1.33 −0.31 0.81 0.86

Fig. 2. Pseudo-observations (Ũ1i , Ũ2i ) given by (5) together with contours of the fitted Student copula (black curves).

Table 5
Estimation for Student copula with normal marginals (100 multiples of bias, SD and RMSE).
Model τ Estim n = 200 n = 500 n = 1000
Bias SD RMSE Bias SD RMSE Bias SD RMSE
Known innovations
 0.25  θ̂n(ik,or)  −0.28 3.53 3.53   0.00 2.68 2.68   0.07 1.99 1.99
 0.25  θ̂n(pl,or)   0.03 3.48 3.48   0.23 2.61 2.62   0.21 1.97 1.98
 0.50  θ̂n(ik,or)  −0.19 2.81 2.82  −0.01 2.14 2.14   0.05 1.60 1.60
 0.50  θ̂n(pl,or)   0.05 2.66 2.66   0.19 2.00 2.00   0.17 1.51 1.52
 0.75  θ̂n(ik,or)  −0.10 1.62 1.62  −0.01 1.23 1.23   0.02 0.93 0.93
 0.75  θ̂n(pl,or)  −0.16 1.46 1.47  −0.02 1.09 1.09   0.02 0.83 0.83

0.25 θ̂n(ik) −0.25 4.93 4.93 −0.18 3.30 3.30 −0.11 2.28 2.28
0.25 θ̂n(pl) 0.24 4.96 4.96 0.08 3.32 3.32 0.00 2.27 2.27
1 0.50 θ̂n(ik) −0.48 3.95 3.97 −0.34 2.62 2.64 −0.24 1.81 1.83
0.50 θ̂n(pl) −0.17 3.82 3.82 −0.18 2.57 2.57 −0.20 1.74 1.75
0.75 θ̂n(ik) −0.79 2.33 2.46 −0.64 1.56 1.68 −0.49 1.06 1.17
0.75 θ̂n(pl) −1.13 2.22 2.48 −0.83 1.47 1.69 −0.66 0.99 1.19
0.25 θ̂n(ik) −0.61 4.99 5.03 −0.20 3.23 3.24 0.02 2.22 2.22
0.25 θ̂n(pl) −0.21 4.98 4.98 0.03 3.18 3.18 0.15 2.19 2.20
2 0.50 θ̂n(ik) −0.89 4.01 4.11 −0.35 2.62 2.64 −0.08 1.79 1.79
0.50 θ̂n(pl) −0.80 3.86 3.94 −0.24 2.45 2.46 −0.01 1.69 1.69
0.75 θ̂n(ik) −1.66 2.57 3.06 −0.70 1.55 1.70 −0.30 1.06 1.10
0.75 θ̂n(pl) −2.37 2.48 3.42 −0.99 1.44 1.75 −0.46 0.97 1.07
0.25 θ̂n(ik) −0.59 5.01 5.05 −0.24 3.22 3.23 −0.01 2.23 2.23
0.25 θ̂n(pl) −0.21 4.97 4.98 −0.01 3.18 3.18 0.12 2.20 2.20
3 0.50 θ̂n(ik) −0.90 4.07 4.16 −0.43 2.59 2.63 −0.14 1.79 1.79
0.50 θ̂n(pl) −0.79 3.88 3.96 −0.33 2.44 2.46 −0.08 1.68 1.69
0.75 θ̂n(ik) −1.60 2.61 3.06 −0.76 1.55 1.73 −0.39 1.06 1.13
0.75 θ̂n(pl) −2.20 2.48 3.31 −1.05 1.43 1.77 −0.56 0.97 1.12
0.25 θ̂n(ik) −0.63 5.03 5.07 −0.23 3.26 3.27 0.00 2.22 2.22
0.25 θ̂n(pl) −0.28 4.97 4.97 −0.01 3.22 3.22 0.12 2.19 2.20
4 0.50 θ̂n(ik) −0.91 4.06 4.16 −0.38 2.61 2.64 −0.09 1.78 1.78
0.50 θ̂n(pl) −0.81 3.82 3.91 −0.32 2.45 2.47 −0.07 1.68 1.68
0.75 θ̂n(ik) −0.97 2.51 2.69 −0.42 1.55 1.61 −0.17 1.05 1.06
0.75 θ̂n(pl) −1.47 2.34 2.76 −0.70 1.44 1.60 −0.33 0.96 1.02

Appendix A. Proof of Theorem 1

Recall the definition $W_n = w_{n1} + \cdots + w_{nn}$. Introduce
$$\hat G_n(u_1,u_2) = \frac{1}{W_n}\sum_{i=1}^n w_{ni}\,\mathbf{1}\{\hat\varepsilon_{1i}\le F_{1\varepsilon}^{-1}(u_1),\ \hat\varepsilon_{2i}\le F_{2\varepsilon}^{-1}(u_2)\} = \hat F_{\hat\varepsilon}\{F_{1\varepsilon}^{-1}(u_1), F_{2\varepsilon}^{-1}(u_2)\}$$
and note that
$$\tilde C_n(u_1,u_2) = \hat G_n\{\hat G_{1n}^{-1}(u_1), \hat G_{2n}^{-1}(u_2)\},$$
where $\hat G_{1n}$ and $\hat G_{2n}$ denote the marginals of $\hat G_n$. Further, $\hat G_n$ is a distribution function on $[0,1]^2$ with marginal cdfs satisfying $\hat G_{1n}(0) = \hat G_{2n}(0) = 0$. Thus one can make use of the Hadamard differentiability of the 'copula mapping' $\Phi: G \mapsto G(G_1^{-1}, G_2^{-1})$ proved in Theorem 2.4 of Bücher and Volgushev [4], provided that we show that the process defined, for all $(u_1,u_2)\in[0,1]^2$, by
$$\mathbb{G}_n(u_1,u_2) = \sqrt{n}\,\{\hat G_n(u_1,u_2) - C(u_1,u_2)\}$$
converges in distribution in the space $\ell^\infty([0,1]^2)$ to a process $\mathbb{G}$ with continuous trajectories such that $\mathbb{G}(u,0) = \mathbb{G}(0,u) = \mathbb{G}(1,1) = 0$ for each $u\in[0,1]$.

A.1. Decomposition and weak convergence of $\mathbb{G}_n$

Denote
$$G_n^{(or)}(u_1,u_2) = \frac 1n\sum_{i=1}^n \mathbf 1\{\varepsilon_{1i}\le F_{1\varepsilon}^{-1}(u_1),\ \varepsilon_{2i}\le F_{2\varepsilon}^{-1}(u_2)\},\qquad \tilde G_n^{(or)}(u_1,u_2) = \frac{1}{W_n}\sum_{i=1}^n w_{ni}\,\mathbf 1\{\varepsilon_{1i}\le F_{1\varepsilon}^{-1}(u_1),\ \varepsilon_{2i}\le F_{2\varepsilon}^{-1}(u_2)\}.$$

Now one can decompose the process $\mathbb G_n$ as
$$\mathbb G_n = \sqrt n\,(\hat G_n - \tilde G_n^{(or)}) + \sqrt n\,(\tilde G_n^{(or)} - G_n^{(or)}) + \sqrt n\,(G_n^{(or)} - C) = \tilde{\mathbb G}_n + \tilde{\mathbb G}_n^{(or)} + \mathbb G_n^{(or)}, \tag{A.1}$$
where $\tilde{\mathbb G}_n$, $\tilde{\mathbb G}_n^{(or)}$ and $\mathbb G_n^{(or)}$ stand for the first, second and third terms, respectively, on the right-hand side of the first equation in (A.1).

In Appendix A.2 it will be shown that the first term on the right-hand side of (A.1) satisfies, uniformly in $(u_1,u_2)\in[0,1]^2$,
$$\tilde{\mathbb G}_n(u_1,u_2) = \frac{1}{\sqrt n}\,C^{(1)}(u_1,u_2)\, f_{1\varepsilon}\{F_{1\varepsilon}^{-1}(u_1)\}\sum_{i=1}^n\{\varepsilon_{1i} + F_{1\varepsilon}^{-1}(u_1)(\varepsilon_{1i}^2-1)/2\}$$
$$\qquad + \frac{1}{\sqrt n}\,C^{(2)}(u_1,u_2)\, f_{2\varepsilon}\{F_{2\varepsilon}^{-1}(u_2)\}\sum_{i=1}^n\{\varepsilon_{2i} + F_{2\varepsilon}^{-1}(u_2)(\varepsilon_{2i}^2-1)/2\} + o_P(1), \tag{A.2}$$
where (in agreement with the last two conditions in (Fε)) for $u_1\in\{0,1\}$ the first term on the right-hand side of (A.2) is defined as zero, and analogously for $u_2\in\{0,1\}$.

In Appendix A.5, we will show the asymptotic negligibility of the second term on the right-hand side of (A.1), i.e.,
$$\sup_{(u_1,u_2)\in[0,1]^2}|\tilde{\mathbb G}_n^{(or)}(u_1,u_2)| = \sup_{(u_1,u_2)\in[0,1]^2}|\sqrt n\,\{\tilde G_n^{(or)}(u_1,u_2) - G_n^{(or)}(u_1,u_2)\}| = o_P(1). \tag{A.3}$$

Now combining (A.1) with (A.2) and (A.3) yields that, uniformly in $(u_1,u_2)$,
$$\mathbb G_n(u_1,u_2) = A_n(u_1,u_2) + B_n(u_1,u_2) + o_P(1), \tag{A.4}$$
where
$$A_n(u_1,u_2) = \frac{1}{\sqrt n}\sum_{i=1}^n\big[\mathbf 1\{U_{1i}\le u_1,\ U_{2i}\le u_2\} - C(u_1,u_2)\big], \tag{A.5}$$
$$B_n(u_1,u_2) = \frac{1}{\sqrt n}\sum_{i=1}^n\sum_{j=1}^2 C^{(j)}(u_1,u_2)\, f_{j\varepsilon}\{F_{j\varepsilon}^{-1}(u_j)\}\,\{\varepsilon_{ji} + F_{j\varepsilon}^{-1}(u_j)(\varepsilon_{ji}^2-1)/2\}.$$

The asymptotic representation (A.4) together with standard techniques yields the weak convergence of the process $\mathbb G_n$. Now, thanks to the Hadamard differentiability of the copula functional and Theorem 3.9.4 in [33],
$$\sqrt n\,\{\tilde C_n(u_1,u_2) - C(u_1,u_2)\} = \sqrt n\,\big[\hat G_n\{\hat G_{1n}^{-1}(u_1),\hat G_{2n}^{-1}(u_2)\} - C(u_1,u_2)\big]$$
$$= \mathbb G_n(u_1,u_2) - C^{(1)}(u_1,u_2)\,\mathbb G_n(u_1,1) - C^{(2)}(u_1,u_2)\,\mathbb G_n(1,u_2) + o_P(1). \tag{A.6}$$
Note that, for all $(u_1,u_2)\in[0,1]^2$,
$$B_n(u_1,u_2) - C^{(1)}(u_1,u_2)\,B_n(u_1,1) - C^{(2)}(u_1,u_2)\,B_n(1,u_2) = 0. \tag{A.7}$$
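Identity (A.7) can be checked directly from the form of $B_n$: since $C(u_1,1)=u_1$, one has $C^{(1)}(u_1,1)=1$, while the $j=2$ summand in $B_n(u_1,1)$ vanishes by the boundary convention adopted after (A.2); hence

```latex
B_n(u_1,1) = \frac{1}{\sqrt n}\, f_{1\varepsilon}\{F_{1\varepsilon}^{-1}(u_1)\}
   \sum_{i=1}^n \bigl\{\varepsilon_{1i}
   + F_{1\varepsilon}^{-1}(u_1)\,(\varepsilon_{1i}^2 - 1)/2\bigr\},
\qquad
B_n(1,u_2) = \frac{1}{\sqrt n}\, f_{2\varepsilon}\{F_{2\varepsilon}^{-1}(u_2)\}
   \sum_{i=1}^n \bigl\{\varepsilon_{2i}
   + F_{2\varepsilon}^{-1}(u_2)\,(\varepsilon_{2i}^2 - 1)/2\bigr\},
```

and substituting these expressions into the left-hand side of (A.7) reproduces the two summands of $B_n(u_1,u_2)$ exactly.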

Further, combining (A.6) with (A.4), (A.5) and (A.7) gives
$$\sqrt n\,\{\tilde C_n(u_1,u_2) - C(u_1,u_2)\} = A_n(u_1,u_2) - C^{(1)}(u_1,u_2)\,A_n(u_1,1) - C^{(2)}(u_1,u_2)\,A_n(1,u_2) + o_P(1). \tag{A.8}$$
Now the right-hand side of (A.8) coincides with the asymptotic representation of the 'oracle' copula process $\sqrt n\,(C_n^{(or)} - C)$, which implies the statement of Theorem 1.

A.2. Showing (A.2)

Let us introduce the process
$$Z_n(f) = \frac{1}{\sqrt n}\sum_{i=1}^n f(X_i, \varepsilon_{1i}, \varepsilon_{2i}),$$
which is indexed by the set of functions
$$\mathcal F = \big\{(x, y_1, y_2)\mapsto \mathbf 1\{x\in[-c,c]^d\}\,\mathbf 1\{y_1\le z_1 b_1(x) + a_1(x),\ y_2\le z_2 b_2(x) + a_2(x)\}:\ c\in\mathbb R^+,\ z_1,z_2\in\mathbb R,\ a_1,a_2\in\mathcal G,\ b_1,b_2\in\tilde{\mathcal G}\big\},$$

where
$$\mathcal G = \big\{a:\ \mathbb R^d\to\mathbb R:\ a\in C_1^{d+\delta}(\mathbb R^d),\ \sup_{x}\|x\|^\nu |a(x)|\le 1\big\}, \tag{A.9}$$
$$\tilde{\mathcal G} = \big\{b:\ \mathbb R^d\to\mathbb R:\ b\in \tilde C_2^{d+\delta}(\mathbb R^d),\ \sup_{x}\|x\|^\nu |b(x)-1|\le 1\big\}, \tag{A.10}$$
with $\delta$ from assumption (Bw) and some $\nu$ large enough that
$$b\{d/\nu + d/(d+\delta)\}/(b-1) < 1. \tag{A.11}$$
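To fix ideas, condition (A.11) is easily met by taking $\nu$ large: for instance, with $d = 1$, $\delta = 1$ and $b = 4$ it reads

```latex
\frac{b}{b-1}\Bigl(\frac d\nu + \frac d{d+\delta}\Bigr)
  = \frac{4}{3}\Bigl(\frac 1\nu + \frac 12\Bigr) < 1
  \iff \frac 1\nu < \frac 14 \iff \nu > 4 .
```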
Denote the centered process as
$$\bar Z_n(f) = Z_n(f) - \mathrm E\, Z_n(f), \tag{A.12}$$
and note that $f\in\mathcal F$ is formally identified by $(c, z_1, z_2, a_1, b_1, a_2, b_2)$. We will use the notation $f \,\hat=\, (c, z_1, z_2, a_1, b_1, a_2, b_2)$. In accordance with van der Vaart and Wellner [34], the notation $\bar Z_n(f_n)$ for random $f_n$ is understood to mean the value of the mapping $f\mapsto \bar Z_n(f)$ evaluated at $f_n$. Consider the semi-norm given by
$$\|f\|_{2,\beta}^2 = \int_0^1 \beta^{-1}(u)\, Q_f^2(u)\, du,$$
where
$$\beta^{-1}(u) = \inf\{x>0:\ \beta_{\lfloor x\rfloor}\le u\},\qquad Q_f(u) = \inf\{x>0:\ \Pr(|f(X_1,\varepsilon_{11},\varepsilon_{21})|>x)\le u\}.$$
From assumption (β) one obtains that $\beta^{-1}(u)\le c\,u^{-1/b}$ for some constant $c$. Further, denote
$$P|f-g| = \mathrm E\,|f(X_1,\varepsilon_{11},\varepsilon_{21}) - g(X_1,\varepsilon_{11},\varepsilon_{21})| = \Pr\big(|f(X_1,\varepsilon_{11},\varepsilon_{21}) - g(X_1,\varepsilon_{11},\varepsilon_{21})|>0\big),$$
where the last equality holds because $\mathcal F$ consists of indicator functions. For $f,g\in\mathcal F$ one thus has $Q_{f-g}(u) = \mathbf 1(0<u<P|f-g|)$ and obtains
$$\|f-g\|_{2,\beta}^2 \le c\int_0^{P|f-g|} u^{-1/b}\, du = \frac{cb}{b-1}\,(P|f-g|)^{1-1/b}. \tag{A.13}$$
Starting with brackets of $\|\cdot\|_2$-length $\epsilon^{2b/(b-1)}$, $\epsilon<1$, for the function classes $\mathcal G$, $\tilde{\mathcal G}$ and $\{x\mapsto \mathbf 1\{x\in[-c,c]^d\}:\ c\in\mathbb R^+\}$, it is then easy to construct brackets for $\mathcal F$ of $\|\cdot\|_{2,\beta}$-length $\epsilon$; compare with the proof of Lemma 1 in [8]. Thus one obtains
$$\ln\{N_{[\,]}(\epsilon, \mathcal F, \|\cdot\|_{2,\beta})\} \le \ln\big\{O(\epsilon^{-2db/(b-1)})\, N_{[\,]}(\epsilon^{2b/(b-1)}, \mathcal G, \|\cdot\|_2)\, N_{[\,]}(\epsilon^{2b/(b-1)}, \tilde{\mathcal G}, \|\cdot\|_2)\big\} \le O\{\ln(\epsilon)\} + O\big(\epsilon^{-2\frac{b}{b-1}(\frac d\nu + \frac d{d+\delta})}\big), \tag{A.14}$$
where the rate follows from Lemma 2 in Appendix C. Further, one bracket is sufficient for $\epsilon\ge 1$. Thus by (A.14) and (A.11),
$$\int_0^\infty \sqrt{\ln N_{[\,]}(\epsilon, \mathcal F, \|\cdot\|_{2,\beta})}\; d\epsilon < \infty.$$

From Dedecker and Louhichi [7], Section 4.3, it follows that the centered process $\bar Z_n$ given by (A.12) is asymptotically $\|\cdot\|_{2,\beta}$-equicontinuous. To apply this result in order to prove (A.2), note that
$$\tilde{\mathbb G}_n(u_1,u_2) = \frac{\sqrt n}{W_n}\sum_{i=1}^n\Big[\mathbf 1\Big\{\varepsilon_{1i}\le F_{1\varepsilon}^{-1}(u_1)\tfrac{\hat\sigma_1(X_i)}{\sigma_1(X_i)} + \tfrac{\hat m_1(X_i)-m_1(X_i)}{\sigma_1(X_i)},\ \varepsilon_{2i}\le F_{2\varepsilon}^{-1}(u_2)\tfrac{\hat\sigma_2(X_i)}{\sigma_2(X_i)} + \tfrac{\hat m_2(X_i)-m_2(X_i)}{\sigma_2(X_i)}\Big\} - \mathbf 1\{\varepsilon_{1i}\le F_{1\varepsilon}^{-1}(u_1),\ \varepsilon_{2i}\le F_{2\varepsilon}^{-1}(u_2)\}\Big]$$
and introduce the process
$$\check{\mathbb G}_n(u_1,u_2) = \frac{1}{\sqrt n}\sum_{i=1}^n w_{ni}\big[\mathbf 1\{\varepsilon_{1i}\le F_{1\varepsilon}^{-1}(u_1)\hat b_1(X_i) + \hat a_1(X_i),\ \varepsilon_{2i}\le F_{2\varepsilon}^{-1}(u_2)\hat b_2(X_i) + \hat a_2(X_i)\} - \mathbf 1\{\varepsilon_{1i}\le F_{1\varepsilon}^{-1}(u_1),\ \varepsilon_{2i}\le F_{2\varepsilon}^{-1}(u_2)\}\big]$$
with $\hat a_j, \hat b_j$, $j\in\{1,2\}$, from Lemma 1 and Remark 5 in Appendix C. Then one obtains by monotonicity arguments, applying Lemma 1(i), that on an event with probability converging to 1,
$$Z_n(f_n^\ell) - Z_n(g_n) \le (W_n/n)\,\tilde{\mathbb G}_n(u_1,u_2) - \check{\mathbb G}_n(u_1,u_2) \le Z_n(f_n^u) - Z_n(g_n)$$
for some deterministic positive sequence $\gamma_n = o(n^{-1/2})$. Here,
$$f_n^\ell \,\hat=\, \big(c_n, F_{1\varepsilon}^{-1}(u_1), F_{2\varepsilon}^{-1}(u_2), \hat a_1-\gamma_n, \hat b_1-\gamma_n\,\mathrm{sign}\{F_{1\varepsilon}^{-1}(u_1)\}, \hat a_2-\gamma_n, \hat b_2-\gamma_n\,\mathrm{sign}\{F_{2\varepsilon}^{-1}(u_2)\}\big),$$
$$f_n^u \,\hat=\, \big(c_n, F_{1\varepsilon}^{-1}(u_1), F_{2\varepsilon}^{-1}(u_2), \hat a_1+\gamma_n, \hat b_1+\gamma_n\,\mathrm{sign}\{F_{1\varepsilon}^{-1}(u_1)\}, \hat a_2+\gamma_n, \hat b_2+\gamma_n\,\mathrm{sign}\{F_{2\varepsilon}^{-1}(u_2)\}\big),$$
$$g_n \,\hat=\, \big(c_n, F_{1\varepsilon}^{-1}(u_1), F_{2\varepsilon}^{-1}(u_2), \hat a_1, \hat b_1, \hat a_2, \hat b_2\big).$$

We only consider the upper bound; the lower one can be handled completely analogously. First note that $Z_n(f_n^u) - Z_n(g_n) = \bar Z_n(f_n^u) - \bar Z_n(g_n) + R_n$, where with probability converging to 1,
$$|R_n| \le 2\sqrt n \max_{j\in\{1,2\}}\ \sup_{\substack{u\in\mathbb R,\ s\in\{-1,1\}\\ v\in(1/2,1),\ w\in(-1,1)}} \big|F_{j\varepsilon}(uv+w) - F_{j\varepsilon}\{u(v+s\gamma_n) + w + \gamma_n\}\big| = o(1), \tag{A.15}$$
where the last equality follows by a Taylor expansion, assumption (Fε) and $\gamma_n = o(n^{-1/2})$. Now, for $j\in\{1,2\}$, introduce the notation
$$F_{j\varepsilon}^{-1}(u, x, \gamma) = F_{j\varepsilon}^{-1}(u)\,\big[\hat b_j(x) + \gamma\,\mathrm{sign}\{F_{j\varepsilon}^{-1}(u)\}\big] + \hat a_j(x) + \gamma. \tag{A.16}$$
One can show as in (A.13) that, for a sufficiently large $M$,
$$\|f_n^u - g_n\|_{2,\beta} \le M\,\Pr\big[\big|\mathbf 1\{X_1\in J_n,\ \varepsilon_{11}\le F_{1\varepsilon}^{-1}(u_1,X_1,\gamma_n),\ \varepsilon_{21}\le F_{2\varepsilon}^{-1}(u_2,X_1,\gamma_n)\} - \mathbf 1\{X_1\in J_n,\ \varepsilon_{11}\le F_{1\varepsilon}^{-1}(u_1,X_1,0),\ \varepsilon_{21}\le F_{2\varepsilon}^{-1}(u_2,X_1,0)\}\big| > 0\big]^{1-1/b}$$
$$\le M\,\Pr\big\{X_1\in J_n,\ F_{1\varepsilon}^{-1}(u_1,X_1,0)\le\varepsilon_{11}\le F_{1\varepsilon}^{-1}(u_1,X_1,\gamma_n)\big\}^{1-1/b} + M\,\Pr\big\{X_1\in J_n,\ F_{2\varepsilon}^{-1}(u_2,X_1,0)\le\varepsilon_{21}\le F_{2\varepsilon}^{-1}(u_2,X_1,\gamma_n)\big\}^{1-1/b},$$
and this can be bounded above by $M n^{-(1-1/b)/2}$ times the bound on the right-hand side of (A.15), and thus converges to zero in probability uniformly in $u_1, u_2$. Therefore there exists a deterministic sequence $\delta_n\searrow 0$ with $\Pr(\sup_{u_1,u_2}\|f_n^u - g_n\|_{2,\beta}\le\delta_n)\to 1$ as $n\to\infty$. Further, by Lemma 1 and Remark 5 of Appendix C, one has $\Pr(f_n^u, f_n^\ell, g_n\in\mathcal F)\to 1$ as $n\to\infty$. Now from the $\|\cdot\|_{2,\beta}$-equicontinuity of $\bar Z_n$ one obtains, for every $\epsilon>0$, that
$$\Pr\Big\{\sup_{u_1,u_2}|\bar Z_n(f_n^u) - \bar Z_n(g_n)| > \epsilon\Big\} \le \Pr\Big\{\sup_{\substack{f,g\in\mathcal F\\ \|f-g\|_{2,\beta}\le\delta_n}}|\bar Z_n(f) - \bar Z_n(g)| > \epsilon\Big\} + o(1) = o(1),$$
and thus $|\bar Z_n(f_n^u) - \bar Z_n(g_n)| = o_P(1)$ uniformly with respect to $u_1, u_2$. In combination with (A.15), analogous considerations for the lower bound $Z_n(f_n^\ell) - Z_n(g_n)$ and the fact that
$$W_n/n = 1 + o_P(1), \tag{A.17}$$

we obtain
$$\sup_{u_1,u_2}\big|(W_n/n)\,\tilde{\mathbb G}_n(u_1,u_2) - \check{\mathbb G}_n(u_1,u_2)\big| = o_P(1).$$
Further, thanks to (A.17), it is sufficient to show that the process $\check{\mathbb G}_n(u_1,u_2)$ has the asymptotic representation given by the right-hand side of (A.2).

Thus the remaining proof of (A.2) is divided into two parts. First we prove that
$$\sup_{u_1,u_2}\big|\check{\mathbb G}_n(u_1,u_2) - \mathrm E^*\{\check{\mathbb G}_n(u_1,u_2)\}\big| = o_P(1) \tag{A.18}$$
and then we calculate $\mathrm E^*\{\check{\mathbb G}_n(u_1,u_2)\}$. Here, with slight abuse of notation, $\mathrm E^*$ denotes expectation, considering the functions $\hat a_j, \hat b_j$ as deterministic.

A.3. Showing (A.18)

Note that we have
$$\check{\mathbb G}_n(u_1,u_2) - \mathrm E^*\{\check{\mathbb G}_n(u_1,u_2)\} = \bar Z_n(f_n) - \bar Z_n(g_n), \tag{A.19}$$
where
$$f_n \,\hat=\, \big(c_n, F_{1\varepsilon}^{-1}(u_1), F_{2\varepsilon}^{-1}(u_2), \hat a_1, \hat b_1, \hat a_2, \hat b_2\big),\qquad g_n \,\hat=\, \big(c_n, F_{1\varepsilon}^{-1}(u_1), F_{2\varepsilon}^{-1}(u_2), 0, 1, 0, 1\big),$$
with $0$ and $1$ standing for the functions constantly equal to 0 and 1, respectively. Similarly as before, one can show that for a sufficiently large $M$,
$$\|f_n - g_n\|_{2,\beta} \le M\Big(\mathrm E\big[|F_{1\varepsilon}\{F_{1\varepsilon}^{-1}(u_1, X_1, 0)\} - u_1|\,\mathbf 1(X_1\in J_n)\big] + \mathrm E\big[|F_{2\varepsilon}\{F_{2\varepsilon}^{-1}(u_2, X_1, 0)\} - u_2|\,\mathbf 1(X_1\in J_n)\big]\Big)^{1-1/b},$$
using notation (A.16). Now, with Lemma 1(iii) in Appendix C, we obtain $\|f_n - g_n\|_{2,\beta} = o_P(1)$ uniformly in $u_1, u_2$. Finally, with the help of (A.17), (A.19) and the asymptotic $\|\cdot\|_{2,\beta}$-equicontinuity of the process $\bar Z_n$, one can conclude (A.18).

A.4. Calculating $\mathrm E^*\{\check{\mathbb G}_n(u_1,u_2)\}$

To simplify the notation and to prevent confusion, let the random vector $X$ have the same distribution as $X_1$. With the help of a second-order Taylor expansion of the right-hand side, one gets
$$\mathrm E^*\{\check{\mathbb G}_n(u_1,u_2)\} = \sqrt n\,\mathrm E^*\big\{w_n(X)\big[F_\varepsilon\{F_{1\varepsilon}^{-1}(u_1,X,0), F_{2\varepsilon}^{-1}(u_2,X,0)\} - F_\varepsilon\{F_{1\varepsilon}^{-1}(u_1), F_{2\varepsilon}^{-1}(u_2)\}\big]\big\}$$
$$= \sqrt n\sum_{j=1}^2 \mathrm E^*\big[w_n(X)\, F_\varepsilon^{(j)}\{F_{1\varepsilon}^{-1}(u_1), F_{2\varepsilon}^{-1}(u_2)\}\, Y_{jX}(u_j)\big] + \frac{\sqrt n}{2}\sum_{j=1}^2\sum_{k=1}^2 \mathrm E^*\big[w_n(X)\, F_\varepsilon^{(j,k)}\{F_{1\varepsilon}^{-1}(u_{1X}), F_{2\varepsilon}^{-1}(u_{2X})\}\, Y_{jX}(u_j)\, Y_{kX}(u_k)\big], \tag{A.20}$$
where, for $j\in\{1,2\}$,
$$Y_{jx}(u) = F_{j\varepsilon}^{-1}(u, x, 0) - F_{j\varepsilon}^{-1}(u) = \hat a_j(x) + F_{j\varepsilon}^{-1}(u)\,\{\hat b_j(x) - 1\},$$
and the point $u_{jx}$ lies between the points $F_{j\varepsilon}\{F_{j\varepsilon}^{-1}(u_j, x, 0)\}$ and $u_j$. Now, using Lemma 1(iv) in Appendix C, for $j\in\{1,2\}$,
$$\sqrt n\,\mathrm E^*\big[w_n(X)\, F_\varepsilon^{(j)}\{F_{1\varepsilon}^{-1}(u_1), F_{2\varepsilon}^{-1}(u_2)\}\, Y_{jX}(u_j)\big] = \sqrt n\, F_\varepsilon^{(j)}\{F_{1\varepsilon}^{-1}(u_1), F_{2\varepsilon}^{-1}(u_2)\}\big[\mathrm E^*\{\hat a_j(X)\,\mathbf 1(X\in J_n)\} + F_{j\varepsilon}^{-1}(u_j)\,\mathrm E^*[\{\hat b_j(X) - 1\}\,\mathbf 1(X\in J_n)]\big]$$
$$= F_\varepsilon^{(j)}\{F_{1\varepsilon}^{-1}(u_1), F_{2\varepsilon}^{-1}(u_2)\}\,\frac{1}{\sqrt n}\sum_{i=1}^n\{\varepsilon_{ji} + F_{j\varepsilon}^{-1}(u_j)\,(\varepsilon_{ji}^2 - 1)/2\} + o_P(1) = \frac{1}{\sqrt n}\,C^{(j)}(u_1,u_2)\, f_{j\varepsilon}\{F_{j\varepsilon}^{-1}(u_j)\}\sum_{i=1}^n\{\varepsilon_{ji} + F_{j\varepsilon}^{-1}(u_j)\,(\varepsilon_{ji}^2 - 1)/2\} + o_P(1)$$
uniformly in $(u_1,u_2)$.

To conclude the proof of (A.2), we need to show that the second-order terms in (A.20) are asymptotically negligible. To this end, note that by assumption (Fε) and Lemma 1(iii) there exists a finite constant $M$ such that, with probability going to 1, uniformly in $(u_1,u_2)\in[0,1]^2$ and $x\in J_n$,
$$|F_\varepsilon^{(j,k)}\{F_{1\varepsilon}^{-1}(u_{1x}), F_{2\varepsilon}^{-1}(u_{2x})\}|\,\{1 + |F_{j\varepsilon}^{-1}(u_j)|\}\,\{1 + |F_{k\varepsilon}^{-1}(u_k)|\}$$
$$= |F_\varepsilon^{(j,k)}\{F_{1\varepsilon}^{-1}(u_{1x}), F_{2\varepsilon}^{-1}(u_{2x})\}|\,\{1 + |F_{j\varepsilon}^{-1}(u_{jx})|\}\,\{1 + |F_{k\varepsilon}^{-1}(u_{kx})|\}\; \frac{\{1 + |F_{j\varepsilon}^{-1}(u_j)|\}\,\{1 + |F_{k\varepsilon}^{-1}(u_k)|\}}{\{1 + |F_{j\varepsilon}^{-1}(u_{jx})|\}\,\{1 + |F_{k\varepsilon}^{-1}(u_{kx})|\}}$$
$$\le M\;\frac{\{1 + |F_{j\varepsilon}^{-1}(u_j)|\}\,\{1 + |F_{k\varepsilon}^{-1}(u_k)|\}}{\{1 + |F_{j\varepsilon}^{-1}(u_{jx})|\}\,\{1 + |F_{k\varepsilon}^{-1}(u_{kx})|\}} \le \frac{M\,\{1 + |F_{j\varepsilon}^{-1}(u_j)|\}\,\{1 + |F_{k\varepsilon}^{-1}(u_k)|\}}{\{1 + |F_{j\varepsilon}^{-1}(u_j)\{1 + o_P(n^{-1/4})\} + o_P(n^{-1/4})|\}\,\{1 + |F_{k\varepsilon}^{-1}(u_k)\{1 + o_P(n^{-1/4})\} + o_P(n^{-1/4})|\}} \le 2M.$$
Thus, to prove
$$\mathrm E^*\big[w_n(X)\, F_\varepsilon^{(j,k)}\{F_{1\varepsilon}^{-1}(u_{1X}), F_{2\varepsilon}^{-1}(u_{2X})\}\, Y_{jX}(u_j)\, Y_{kX}(u_k)\big] = o_P(n^{-1/2}),$$
it is sufficient to use once more Lemma 1(iii).

A.5. Showing (A.3)

Recall that $W_n = w_{n1} + \cdots + w_{nn}$ and decompose
$$\sqrt n\,\{\tilde G_n^{(or)}(u_1,u_2) - G_n^{(or)}(u_1,u_2)\} = \sqrt n\,\{\tilde G_n^{(or)}(u_1,u_2) - C(u_1,u_2)\} - \sqrt n\,\{G_n^{(or)}(u_1,u_2) - C(u_1,u_2)\}$$
$$= \frac{n}{\sqrt n}\sum_{i=1}^n\Big(\frac{w_{ni}}{W_n} - \frac{w_{ni}}{n} + \frac{w_{ni}}{n} - \frac 1n\Big)\big[\mathbf 1\{\varepsilon_{1i}\le F_{1\varepsilon}^{-1}(u_1),\ \varepsilon_{2i}\le F_{2\varepsilon}^{-1}(u_2)\} - C(u_1,u_2)\big]$$
$$= \frac{(n/W_n - 1)}{\sqrt n}\sum_{i=1}^n w_{ni}\big[\mathbf 1\{\varepsilon_{1i}\le F_{1\varepsilon}^{-1}(u_1),\ \varepsilon_{2i}\le F_{2\varepsilon}^{-1}(u_2)\} - C(u_1,u_2)\big] \tag{A.21}$$
$$\qquad + \frac{1}{\sqrt n}\sum_{i=1}^n (w_{ni} - 1)\big[\mathbf 1\{\varepsilon_{1i}\le F_{1\varepsilon}^{-1}(u_1),\ \varepsilon_{2i}\le F_{2\varepsilon}^{-1}(u_2)\} - C(u_1,u_2)\big]$$
$$= (n/W_n - 1)\, B_{n1}(u_1,u_2) + B_{n2}(u_1,u_2),$$
where $B_{n1}(u_1,u_2)$ stands for the first term on the right-hand side of Eq. (A.21) (except for the factor $n/W_n - 1$) and $B_{n2}(u_1,u_2)$ for the second term. Using standard techniques, one can then show that both $B_{n1}(u_1,u_2)$ and $B_{n2}(u_1,u_2)$, viewed as processes on $[0,1]^2$, are asymptotically equicontinuous. To this end, note that $B_{n1}(u_1,u_2)$ corresponds to the process $\bar Z_n(f)$ as defined in Appendix A.2 above with $f \,\hat=\, (c_n, F_{1\varepsilon}^{-1}(u_1), F_{2\varepsilon}^{-1}(u_2), 0, 1, 0, 1)$. Alternatively, results by Bickel and Wichura [2] can be applied. Moreover, given that $n/W_n - 1 = o_P(1)$ and $\mathrm E\{w_n(X_1) - 1\} = -\Pr(X_1\notin J_n) = o(1)$, one can conclude that both processes $(n/W_n - 1)\, B_{n1}(u_1,u_2)$ and $B_{n2}(u_1,u_2)$ are uniformly asymptotically negligible in probability, which together with (A.21) implies (A.3). □

Appendix B. Proof of Theorem 2

Thanks to assumption (φ), the estimator $\hat\theta_n$ is a solution of the estimating equations (6). In what follows, we first prove the existence of a consistent root of the estimating equations (6) and then derive that this root satisfies
$$\sqrt n\,(\hat\theta_n - \theta) = \Gamma^{-1}\,\frac{1}{\sqrt n}\sum_{i=1}^n \varphi(\hat U_{1i}, \hat U_{2i}; \theta) + o_P(1), \tag{B.1}$$
where $(\hat U_{1i}, \hat U_{2i})$ was introduced in (8). The statement of the theorem now follows for $p=1$ by Proposition A.1(ii) of Genest et al. [12] and for $p>1$ by Theorem 1 of Gijbels et al. [15].

B.1. Proving consistency

Let
$$\tilde C_n'(u_1,u_2) = \frac{1}{W_n}\sum_{i=1}^n w_{ni}\,\mathbf 1(\tilde U_{1i}\le u_1,\ \tilde U_{2i}\le u_2),$$
where the pseudo-observations $(\tilde U_{1i}, \tilde U_{2i})$ are defined in (5). Note that
$$\sup_{(u_1,u_2)\in[0,1]^2}|\tilde C_n(u_1,u_2) - \tilde C_n'(u_1,u_2)| = O_P(1/W_n) = O_P(1/n). \tag{B.2}$$

Fix $\ell\in\{1,\ldots,p\}$. By Corollary A.7 of Berghaus et al. [1], one finds
$$\frac{1}{W_n}\sum_{i=1}^n w_{ni}\,\varphi_\ell(\tilde U_{1i}, \tilde U_{2i}; t) = \int_0^1\!\!\int_0^1 \varphi_\ell(v_1, v_2; t)\, d\tilde C_n'(v_1, v_2)$$
$$= \int_0^1\!\!\int_0^1 \tilde C_n'(v_1, v_2)\, d\varphi_\ell(v_1, v_2; t) + \varphi_\ell(1, 1; t) - \int_0^1 \tilde C_n'(v_1, 1)\, d\varphi_\ell(v_1, 1; t) - \int_0^1 \tilde C_n'(1, v_2)\, d\varphi_\ell(1, v_2; t). \tag{B.3}$$
Note that, thanks to assumption (φ), one has, uniformly in $t\in V(\theta)$,
$$\int_0^1 \tilde C_n'(v_1, 1)\, d\varphi_\ell(v_1, 1; t) = \frac{1}{W_n}\sum_{i=1}^n w_{ni}\int_0^1 \mathbf 1(\tilde U_{1i}\le v_1)\, d\varphi_\ell(v_1, 1; t) = \frac{1}{W_n}\sum_{i=1}^{W_n}\int_{i/(W_n+1)}^1 d\varphi_\ell(v_1, 1; t)$$
$$= \varphi_\ell(1, 1; t) - \frac{1}{W_n}\sum_{i=1}^{W_n}\varphi_\ell\{i/(W_n+1), 1; t\} = \varphi_\ell(1, 1; t) - \int_0^1 \varphi_\ell(v_1, 1; t)\, dv_1 + O_P(1/W_n) \tag{B.4}$$
and, analogously,
$$\int_0^1 \tilde C_n'(1, v_2)\, d\varphi_\ell(1, v_2; t) = \varphi_\ell(1, 1; t) - \int_0^1 \varphi_\ell(1, v_2; t)\, dv_2 + O_P(1/W_n). \tag{B.5}$$

Now, combining (B.3), (B.4) and (B.5), we deduce that
$$\frac{1}{W_n}\sum_{i=1}^n w_{ni}\,\varphi_\ell(\tilde U_{1i}, \tilde U_{2i}; t) = \int_0^1\!\!\int_0^1 \tilde C_n'(v_1, v_2)\, d\varphi_\ell(v_1, v_2; t) + A_\ell(t) + O_P(1/n), \tag{B.6}$$
where
$$A_\ell(t) = -\varphi_\ell(1, 1; t) + \int_0^1 \varphi_\ell(v_1, 1; t)\, dv_1 + \int_0^1 \varphi_\ell(1, v_2; t)\, dv_2. \tag{B.7}$$

Analogously, we find
$$\mathrm E\,\varphi_\ell(U_{11}, U_{21}; t) = \int_0^1\!\!\int_0^1 C(v_1, v_2)\, d\varphi_\ell(v_1, v_2; t) + A_\ell(t). \tag{B.8}$$
Now, using (B.2), (B.6), (B.8), Theorem 1 and assumption (φ) gives that, uniformly in $t\in V(\theta)$,
$$\frac{1}{W_n}\sum_{i=1}^n w_{ni}\,\varphi_\ell(\tilde U_{1i}, \tilde U_{2i}; t) - \mathrm E\,\varphi_\ell(U_{11}, U_{21}; t) = \int_0^1\!\!\int_0^1 \{\tilde C_n'(v_1, v_2) - C(v_1, v_2)\}\, d\varphi_\ell(v_1, v_2; t) + O_P(1/n) = o_P(1).$$
The existence of a consistent root of the estimating equations (6) now follows by assumptions (Id) and (Γ). Analogously, one can show the existence of a consistent root of the estimating equations (7). □

B.2. Showing (B.1)

Let $\hat\theta_n$ be a consistent root of the estimating equations (6). Then, by the mean value theorem applied to each coordinate of the vector-valued function
$$\Psi_n(t) = \frac{1}{W_n}\sum_{i=1}^n w_{ni}\,\varphi(\tilde U_{1i}, \tilde U_{2i}; t),$$
we find
$$0_p = \frac{1}{W_n}\sum_{i=1}^n w_{ni}\,\varphi(\tilde U_{1i}, \tilde U_{2i}; \hat\theta_n) = \frac{1}{W_n}\sum_{i=1}^n w_{ni}\,\varphi(\tilde U_{1i}, \tilde U_{2i}; \theta) + \frac{1}{W_n}\sum_{i=1}^n w_{ni}\, D_\varphi(\tilde U_{1i}, \tilde U_{2i}; \theta_n^*)\,(\hat\theta_n - \theta),$$
where $D_\varphi$ stands for $\partial\varphi(u_1,u_2;t)/\partial t$ and $\theta_n^*$ lies between $\hat\theta_n$ and $\theta$. Note that, as the mean value theorem is applied to a vector-valued function, there are in fact $p$ different points $\theta_n^{*,1},\ldots,\theta_n^{*,p}$, one for each coordinate of the function $\Psi_n(t)$; but all of them are consistent, so for simplicity of notation we do not distinguish them.

Thus, to complete the proof of (B.1), it is sufficient to show that
$$\frac{1}{W_n}\sum_{i=1}^n w_{ni}\, D_\varphi(\tilde U_{1i}, \tilde U_{2i}; \theta_n^*) = \Gamma + o_P(1) \tag{B.9}$$
and
$$\frac{\sqrt n}{W_n}\sum_{i=1}^n w_{ni}\,\varphi(\tilde U_{1i}, \tilde U_{2i}; \theta) = \frac{1}{\sqrt n}\sum_{i=1}^n \varphi(\hat U_{1i}, \hat U_{2i}; \theta) + o_P(1). \tag{B.10}$$
When proving (B.9), one can mimic the proof of consistency of $\hat\theta_n$ and show that there exists an open neighborhood $V(\theta)$ of $\theta$ such that
$$\sup_{t\in V(\theta)}\Big\|\frac 1n\sum_{i=1}^n w_{ni}\, D_\varphi(\tilde U_{1i}, \tilde U_{2i}; t) - \mathrm E\, D_\varphi(U_{11}, U_{21}; t)\Big\| = o_P(1).$$

Using the consistency of θ̂ n and assumption (Γ) yields (B.9). Thus one can concentrate on proving (B.10). Set
$$C_n'^{(or)}(u_1,u_2) = \frac 1n\sum_{i=1}^n \mathbf 1(\hat U_{1i}\le u_1,\ \hat U_{2i}\le u_2),$$
where $(\hat U_{1i}, \hat U_{2i})$ are defined in (8). Note that
$$\sup_{(u_1,u_2)\in[0,1]^2}|C_n^{(or)}(u_1,u_2) - C_n'^{(or)}(u_1,u_2)| = O_P(1/n). \tag{B.11}$$
Analogously to (B.6), one can also show that, for $\ell\in\{1,\ldots,p\}$,
$$\frac 1n\sum_{i=1}^n \varphi_\ell(\hat U_{1i}, \hat U_{2i}; \theta) = \int_0^1\!\!\int_0^1 \varphi_\ell(v_1, v_2; \theta)\, dC_n'^{(or)}(v_1, v_2) = \int_0^1\!\!\int_0^1 C_n'^{(or)}(v_1, v_2)\, d\varphi_\ell(v_1, v_2; \theta) + A_\ell(\theta) + O_P(1/n), \tag{B.12}$$

where $A_\ell(\theta)$ is given in (B.7). Now, using (B.2), (B.6), (B.11), (B.12), Theorem 1 and assumption (φ), one finds
$$\frac{\sqrt n}{W_n}\sum_{i=1}^n w_{ni}\,\varphi_\ell(\tilde U_{1i}, \tilde U_{2i}; \theta) - \frac{1}{\sqrt n}\sum_{i=1}^n \varphi_\ell(\hat U_{1i}, \hat U_{2i}; \theta) = \sqrt n\int_0^1\!\!\int_0^1 \{\tilde C_n'(u_1, u_2) - C_n'^{(or)}(u_1, u_2)\}\, d\varphi_\ell(u_1, u_2; \theta) + o_P(1/\sqrt n) = o_P(1),$$
which verifies (B.10) and completes the proof of (B.1). □

Appendix C. Auxiliary results

Lemma 1. Assume that (β), (Fε), (M), (FX), (Bw), (k), (Jn) and (mσ) are satisfied. Then there exist random functions $\hat a_j$ and $\hat b_j$ on $J_n$ such that, for $j\in\{1,2\}$:

(i) $\sup_{x\in J_n}|\{\hat m_j(x) - m_j(x)\}/\sigma_j(x) - \hat a_j(x)| = o_P(n^{-1/2})$ and $\sup_{x\in J_n}|\hat\sigma_j(x)/\sigma_j(x) - \hat b_j(x)| = o_P(n^{-1/2})$;
(ii) $\|\hat a_j\|_{d+\delta} = o_P(1)$ and $\|\hat b_j - 1\|_{d+\delta} = o_P(1)$ for $\delta>0$ from assumption (Bw);
(iii) $\sup_{x\in J_n}|\hat a_j(x)| = o_P(n^{-1/4})$ and $\sup_{x\in J_n}|\hat b_j(x) - 1| = o_P(n^{-1/4})$;
(iv) $\int_{J_n}\hat a_j(x)\, f_X(x)\, dx = \sum_{i=1}^n \varepsilon_{ji}/n + o_P(n^{-1/2})$ and $\int_{J_n}\{\hat b_j(x) - 1\}\, f_X(x)\, dx = \sum_{i=1}^n (\varepsilon_{ji}^2 - 1)/(2n) + o_P(n^{-1/2})$.

Proof. For ease of presentation we set j = 1 and assume hn = (hn, . . . , hn). We will first prove the assertions for m̂1. The proof basically goes along the lines of the proof of Lemma 1 in Müller et al. [25], but changes are necessary due to the dependence of observations in our model and because our covariate density is not assumed to be bounded away from zero on its support.
Recall that I(d, p) denotes the set of multi-indices i = (i1, . . . , id) with i• = i1 + · · · + id ≤ p, where p is the order of the polynomials used in the local polynomial estimation, and we set I = I(d, p). Further introduce $J_n^+ = [-c_n - h_n, c_n + h_n]^d$ and note that, thanks to assumption (Bw), there exists q > 0 such that

\[
\alpha_n^{(1)} = \inf_{x\in J_n^+} f_X(x) \ge 1/(\ln n)^q, \tag{C.1}
\]

as for all sufficiently large n the set $J_n^+$ is a subset of $J_{2n}$. Finally, define $\alpha_n^{(2)} = \min_{j\in\{1,2\}} \inf_{x\in J_n} \sigma_j(x)$, which by assumption (mσ) is either bounded away from zero or converges to zero not faster than a negative power of ln n.

Proof of assertion (i) for m̂1. Fix some x ∈ Jn and let β̂ denote the solution of the minimization problem (3). Then β̂ satisfies the normal equations
\[
A_i(x) + B_i(x) - \sum_{k\in I} \hat Q_{ik}(x)\, \hat\beta_k = 0, \qquad i \in I,
\]

where
\[
A_i(x) = \frac{1}{n}\sum_{\ell=1}^{n} \sigma_1(X_\ell)\,\varepsilon_{1\ell}\,\psi_{i,h_n}(X_\ell - x)\, K_{h_n}(X_\ell - x),
\]
\[
B_i(x) = \frac{1}{n}\sum_{\ell=1}^{n} m_1(X_\ell)\,\psi_{i,h_n}(X_\ell - x)\, K_{h_n}(X_\ell - x),
\]
\[
\hat Q_{ik}(x) = \frac{1}{n}\sum_{\ell=1}^{n} \psi_{i,h_n}(X_\ell - x)\,\psi_{k,h_n}(X_\ell - x)\, K_{h_n}(X_\ell - x).
\]

From Theorem 2 in Hansen [17] we obtain, for $\varrho_n = \{\ln n/(n h_n^d)\}^{1/2}$,
\[
\sup_{x\in J_n} |\hat Q_{ik}(x) - Q_{ik}(x)| = O_P(\varrho_n), \tag{C.2}
\]

where we define Qik(x) = E{Q̂ik(x)} for all i, k ∈ I. Note that
\[
Q_{ik}(x) = \int \psi_{i,(1,\dots,1)}(u)\,\psi_{k,(1,\dots,1)}(u)\, f_X(x + h_n u)\, K(u)\,\mathrm{d}u,
\]
and for x ∈ Jn, consider the matrices Q(x) with entries Qik(x) with i, k ∈ I. Analogously, write Q̂(x) for the matrix with entries Q̂ik(x) with i, k ∈ I.

It follows from (C.1) that 0 < λn ≤ a⊤Q(x)a ≤ Λ < ∞ for all vectors a of unit Euclidean length, where λn is a sequence of positive real numbers of the same rate as $\alpha_n^{(1)}$ in (C.1). Thus Q(x) has eigenvalues in the interval [λn, Λ], and on the event
\[
E_n = \Big\{ \sup_{x\in J_n} \|\hat Q(x) - Q(x)\| \le \lambda_n/2 \Big\}
\]
one has a⊤Q̂(x)a ≥ λn/2 for all a of unit Euclidean length, so that the matrix Q̂(x) is invertible as well. Here and throughout, ∥Q∥ denotes the spectral norm of a matrix Q. Note that Pr(En) → 1 by (C.2) and $\varrho_n = o(\alpha_n^{(1)})$, which holds under assumption (Bw). For the remainder of the proof, we assume that the event En takes place because its complement does not matter for the assertions of the lemma. It follows from the normal equations that, for x ∈ Jn,

m̂1 (x) = e1 ⊤ Q̂−1 (x) {A(x) + B(x)},

where e1 = (1, 0, . . . , 0)⊤ and A(x) and B(x) denote the vectors with components Ai (x) and Bi (x) with i ∈ I, respectively.
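This representation is just a weighted least-squares fit at each point x. A minimal local linear sketch (p = 1, d = 1, Epanechnikov kernel; our own illustrative implementation, not the paper's code):

```python
import numpy as np

def local_linear(x0, X, Y, h):
    # kernel weights K_h(X_l - x0), Epanechnikov kernel
    u = (X - x0) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0) / h
    # rescaled local basis psi((X_l - x0)/h) = (1, (X_l - x0)/h), i.e. p = 1
    Z = np.column_stack([np.ones_like(X), u])
    Qhat = (Z * w[:, None]).T @ Z / len(X)   # matrix with entries Qhat_ik(x0)
    rhs = (Z * w[:, None]).T @ Y / len(X)    # the vector A(x0) + B(x0)
    return np.linalg.solve(Qhat, rhs)[0]     # mhat(x0) = e1' Qhat^{-1} (A + B)

rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(-2.0, 2.0, n)
Y = np.sin(X) + 0.2 * rng.standard_normal(n)
m_hat = local_linear(0.5, X, Y, h=0.25)      # estimate of m(0.5) = sin(0.5)
```

The intercept coefficient of the local fit is the regression estimate at x0, exactly the e1⊤ component singled out above.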
Now define, for x ∈ Jn ,

â1 (x) = e1 ⊤ Q−1 (x) A(x)/σ1 (x). (C.3)

Then we have the decomposition

{m̂1 (x) − m1 (x)}/σ1 (x) − â1 (x) = r1 (x) + r2 (x) (C.4)

with remainder terms

r1 (x) = e1 ⊤ {Q̂−1 (x) − Q−1 (x)} A(x)/σ1 (x), r2 (x) = e1 ⊤ Q̂−1 (x) {B(x) − Q̂(x) β(x)}/σ1 (x),

where β(x) is the vector with components $\beta_i(x) = h_n^{i_\bullet} D^i m_1(x)$ with i ∈ I. From Theorem 2 in [17], we obtain

\[
\sup_{x\in J_n} |A_i(x)| = O_P(\varrho_n) \tag{C.5}
\]

for all i ∈ I. For the treatment of the inverse matrices in r1 (x), we use Cramér’s rule and write
\[
\hat Q^{-1}(x) - Q^{-1}(x) = \frac{\{\hat C(x)\}^\top}{\det\{\hat Q(x)\}} - \frac{\{C(x)\}^\top}{\det\{Q(x)\}}
= \frac{\det\{Q(x)\} - \det\{\hat Q(x)\}}{\det\{\hat Q(x)\}\,\det\{Q(x)\}}\, \{\hat C(x)\}^\top + \frac{1}{\det\{Q(x)\}}\, \{\hat C(x) - C(x)\}^\top,
\]

where Ĉ(x) and C(x) denote the cofactor matrices of Q̂(x) and Q(x), respectively. Due to the boundedness of the functions Qik, each element of Ĉ(x) − C(x) can be absolutely bounded by OP(ϱn) by (C.2), and the same rate is obtained for |det{Q(x)} − det{Q̂(x)}|, uniformly in x. Using the lower bound $\lambda_n^{|I|}$ for the determinant of Q(x), and assumption (mσ) to bound 1/σ1, gives the rate
\[
\sup_{x\in J_n} |r_1(x)| = O_P(1)\, \frac{\varrho_n^2}{(\alpha_n^{(1)})^{2|I|}\, \alpha_n^{(2)}} = o_P(n^{-1/2}) \tag{C.6}
\]

by assumption (Bw). In order to show negligibility of r2(x), first note that the spectral norm of Q̂−1(x) is given by the reciprocal of the square root of the smallest eigenvalue of Q̂(x)⊤Q̂(x). With
\[
\{a^\top \hat Q(x)^\top \hat Q(x)\, a\}^{1/2} = \|\hat Q(x)\, a\| \ge \{a^\top Q(x)^\top Q(x)\, a\}^{1/2} - \|\hat Q(x) - Q(x)\| \ge \lambda_n/2
\]
(on En) for all a with ∥a∥ = 1, we obtain the rate $O(\lambda_n^{-1})$ for ∥Q̂−1(x)∥. Further, by a Taylor expansion of m1(Xℓ) of order p + 1
in the definition of Bi (x) and using assumption (mσ ), we have
\[
\|B_i(x) - \{\hat Q(x)\,\beta(x)\}_i\| \le \frac{1}{n}\sum_{\ell=1}^{n} M_n h_n^{p+1}\, \|\psi_{i,h_n}(X_\ell - x)\|\, K_{h_n}(X_\ell - x) = O(M_n h_n^{p+1})\, \hat f_X(x),
\]
where the kernel density estimator $\hat f_X(x) = \frac{1}{n}\sum_{\ell=1}^{n} K_{h_n}(X_\ell - x)$ converges to $f_X(x)$ uniformly in x ∈ Jn; see Theorem 6 of [17]. In sum, we have
\[
\sup_{x\in J_n} |r_2(x)| = O_P\bigg(\frac{M_n h_n^{p+1}}{\alpha_n^{(1)}\,\alpha_n^{(2)}}\bigg) = o_P(n^{-1/2}) \tag{C.7}
\]
using assumption (Bw). Now assertion (i) for m̂1 follows from (C.3), (C.4), (C.6) and (C.7).

Proof of assertion (ii) for â1. Note that p ≥ d and thus â1 is (d + 1)-times partially differentiable and
\[
\|\hat a_1\|_{d+\delta} = \max_{i\in I(d,d)} \sup_{x\in J_n} |D^i \hat a_1(x)| + \max_{\substack{i\in I(d,d)\\ i_\bullet = d}} \max\Bigg\{ \sup_{\substack{x,x'\in J_n\\ \|x-x'\|\le h_n}} \frac{|D^i \hat a_1(x) - D^i \hat a_1(x')|}{\|x-x'\|^{\delta}},\; \sup_{\substack{x,x'\in J_n\\ \|x-x'\|> h_n}} \frac{|D^i \hat a_1(x) - D^i \hat a_1(x')|}{\|x-x'\|^{\delta}} \Bigg\}
\]
\[
\le \max_{i\in I(d,d)} \sup_{x\in J_n} |D^i \hat a_1(x)| + \max_{\substack{i\in I(d,d+1)\\ i_\bullet = d+1}} \sup_{x\in J_n} |D^i \hat a_1(x)|\, h_n^{1-\delta} + 2 \max_{\substack{i\in I(d,d)\\ i_\bullet = d}} \sup_{x\in J_n} |D^i \hat a_1(x)|\, h_n^{-\delta}
\]
by the Mean Value Theorem. Again by Theorem 2 of [17] we have, for all i ∈ I(d, d + 1),
\[
\sup_{x\in J_n} |h_n^{i_\bullet} D^i A_k(x)| = O_P(\varrho_n).
\]

Further note that
\[
\frac{\partial}{\partial x_k}\, Q^{-1}(x) = -\,Q^{-1}(x)\, \Big\{\frac{\partial}{\partial x_k}\, Q(x)\Big\}\, Q^{-1}(x)
\]
and that the spectral norm of Q−1(x) can be bounded by $O(1/\alpha_n^{(1)})$ with the same considerations as before. We apply the product rule for derivatives to obtain
\[
\|\hat a_1\|_{d+\delta} \le \max_{\ell\in\{1,\dots,d\}} \sum_{j=0}^{\ell}\sum_{k=0}^{j} O_P(1)\, \frac{\varrho_n h_n^{-k} M_n^{\ell-j}}{(\alpha_n^{(1)})^{j-k+1}\,(\alpha_n^{(2)})^{\ell-j+1}} + \sum_{j=0}^{d+1}\sum_{k=0}^{j} O_P(1)\, \frac{\varrho_n h_n^{-k} M_n^{d+1-j}}{(\alpha_n^{(1)})^{j-k+1}\,(\alpha_n^{(2)})^{d-j+2}}\, h_n^{1-\delta}
\]
\[
\qquad + 2 \sum_{j=0}^{d}\sum_{k=0}^{j} O_P(1)\, \frac{\varrho_n h_n^{-k} M_n^{d-j}}{(\alpha_n^{(1)})^{j-k+1}\,(\alpha_n^{(2)})^{d-j+1}}\, h_n^{-\delta}
\]
\[
= O_P\bigg(\frac{\varrho_n h_n^{-d}}{\alpha_n^{(1)}\alpha_n^{(2)}}\bigg) + O_P\bigg(\frac{\varrho_n h_n^{-d-1}}{\alpha_n^{(1)}\alpha_n^{(2)}}\bigg)\, h_n^{1-\delta} + O_P\bigg(\frac{\varrho_n h_n^{-d}}{\alpha_n^{(1)}\alpha_n^{(2)}}\bigg)\, h_n^{-\delta} = O_P\bigg(\frac{\varrho_n}{\alpha_n^{(1)}\alpha_n^{(2)}\, h_n^{d+\delta}}\bigg) = o_P(1)
\]
by assumption (Bw). Assertion (ii) for â1 follows.

Proof of assertion (iii) for â1. From the definition of â1 and (C.5) we obtain that
\[
\sup_{x\in J_n} |\hat a_1(x)| = O_P\bigg(\frac{\varrho_n}{\alpha_n^{(1)}\,\alpha_n^{(2)}}\bigg) = o_P(n^{-1/4})
\]
and thus (iii) follows for â1.

Proof of assertion (iv) for â1. To prove (iv), note that
\[
\int_{J_n} \hat a_1(x)\, f_X(x)\,\mathrm{d}x = \frac{1}{n}\sum_{i=1}^{n} \varepsilon_{1i}\, \Delta_n(X_i)
\]

with
\[
\Delta_n(X_i) = \sigma_1(X_i) \int e_1^\top Q^{-1}(x)\, \psi_{h_n}(X_i - x)\, K_{h_n}(X_i - x)\, \frac{f_X(x)}{\sigma_1(x)}\,\mathrm{d}x
= \sigma_1(X_i) \int_{\left[\frac{X_i - c_n}{h_n},\, \frac{X_i + c_n}{h_n}\right]} e_1^\top Q^{-1}(X_i - u h_n)\, \psi(u)\, K(u)\, \frac{f_X(X_i - u h_n)}{\sigma_1(X_i - u h_n)}\,\mathrm{d}u.
\]

From the support properties of the kernel function it follows that $\Delta_n(X_i)\,\mathbf{1}(X_i \notin J_n^+) = 0$. Further, for $J_n^- = [-c_n + h_n, c_n - h_n]^d$, note that
\[
\frac{1}{n}\sum_{i=1}^{n} \varepsilon_{1i}\, \Delta_n(X_i)\, \mathbf{1}(X_i \in J_n^+ \setminus J_n^-) = o_P(n^{-1/2}) \tag{C.8}
\]

because the expectation is zero and the variance is bounded by
\[
\frac{1}{n}\, \sup_{x\in J_n}\frac{1}{\sigma_1(x)}\, \frac{1}{\lambda_n}\, \sup_{x\in\mathbb{R}^d} f_X(x)\; \mathrm{E}\{\mathbf{1}(X_1 \in J_n^+ \setminus J_n^-)\,\sigma_1^2(X_1)\} \le \frac{1}{n}\, O\bigg(\frac{1}{\alpha_n^{(1)}\alpha_n^{(2)}}\bigg)\, M_n \Pr(X_1 \in J_n^+ \setminus J_n^-)
= \frac{1}{n}\, O\bigg(\frac{M_n h_n}{\alpha_n^{(1)}\alpha_n^{(2)}}\bigg) = o(n^{-1}).
\]

It remains to consider
\[
\frac{1}{n}\sum_{i=1}^{n} \varepsilon_{1i}\, \Delta_n(X_i)\, \mathbf{1}(X_i \in J_n^-),
\]

with $\Delta_n(X_i) = \Delta_n^{(1)}(X_i) + \Delta_n^{(2)}(X_i)$, where
\[
\Delta_n^{(1)}(X_i) = \int_{[-1,1]^d} e_1^\top Q^{-1}(X_i - u h_n)\, \psi(u)\, K(u)\, f_X(X_i - u h_n)\,\mathrm{d}u,
\]
\[
\Delta_n^{(2)}(X_i) = \int_{[-1,1]^d} e_1^\top Q^{-1}(X_i - u h_n)\, \psi(u)\, K(u)\, f_X(X_i - u h_n)\, \{\sigma_1(X_i)/\sigma_1(X_i - u h_n) - 1\}\,\mathrm{d}u.
\]
Now, by applying the mean value theorem to σ1, for $X_i \in J_n^-$, $|\Delta_n^{(2)}(X_i)|$ can be bounded by $O\{M_n h_n/(\alpha_n^{(1)}\alpha_n^{(2)})\} = o(1)$. Thus, analogously as when showing (C.8), one can use Markov's inequality to get
\[
\int_{J_n} \hat a_1(x)\, f_X(x)\,\mathrm{d}x - \frac{1}{n}\sum_{i=1}^{n} \varepsilon_{1i} = \frac{1}{n}\sum_{i=1}^{n} \varepsilon_{1i}\, \{\Delta_n^{(1)}(X_i) - 1\}\, \mathbf{1}(X_i \in J_n^-) + o_P(n^{-1/2}).
\]
i=1 i=1

To obtain the desired negligibility it remains to show $\mathrm{E}[\{\Delta_n^{(1)}(X_i) - 1\}^2\, \mathbf{1}(X_i \in J_n^-)] \to 0$. To this end we write
\[
1 = \int_{[-1,1]^d} e_1^\top Q_*^{-1}(x - h_n u)\, \psi(u)\, K(u)\, f_X(x - h_n u)\,\mathrm{d}u,
\]
where the matrix Q∗(x) has entries
\[
Q_{*ik}(x) = f_X(x) \int \psi_i(u)\, \psi_k(u)\, K(u)\,\mathrm{d}u
\]
with i, k ∈ I. Note that Q∗(x) has the smallest eigenvalue of order λn. Thus we can write

\[
\mathrm{E}[\{\Delta_n^{(1)}(X_i) - 1\}^2\, \mathbf{1}(X_i \in J_n^-)]
= \mathrm{E}\Bigg[\bigg[\int_{[-1,1]^d} e_1^\top \{Q^{-1}(X_i - u h_n) - Q_*^{-1}(X_i - u h_n)\}\, \psi(u)\, K(u)\, f_X(X_i - u h_n)\,\mathrm{d}u\bigg]^2 \mathbf{1}(X_i \in J_n^-)\Bigg]
\]
\[
\le O(1) \int_{J_n^-} \int_{[-1,1]^d} \|Q^{-1}(x - u h_n) - Q_*^{-1}(x - u h_n)\|^2\,\mathrm{d}u\, f_X(x)\,\mathrm{d}x
\]
\[
\le O(1)\, \sup_{x\in J_n} \|Q^{-1}(x)\|^2\, \sup_{x\in J_n} \|Q_*^{-1}(x)\|^2 \int_{J_n^-} \int_{[-1,1]^d} \|Q(x - u h_n) - Q_*(x - u h_n)\|^2\,\mathrm{d}u\, f_X(x)\,\mathrm{d}x.
\]

Now, with bounds for the matrix norms similar to before, and inserting the definitions of Q and Q∗, we obtain
\[
\mathrm{E}[\{\Delta_n^{(1)}(X_i) - 1\}^2\, \mathbf{1}(X_i \in J_n^-)] \le O\bigg(\frac{1}{(\alpha_n^{(1)})^4}\bigg) \int_{J_n^-} \int_{[-1,1]^d} \int_{[-1,1]^d} |f_X(x - u h_n + h_n v) - f_X(x - u h_n)|^2\, K(v)\,\mathrm{d}v\,\mathrm{d}u\, f_X(x)\,\mathrm{d}x
= O\{h_n^2/(\alpha_n^{(1)})^4\} = o(1)
\]
by the Mean Value Theorem and assumptions (FX) and (k).

Proof of assertions (i)–(iv) for σ̂1. Recall the definition σ̂1² = ŝ1 − m̂1², where ŝ1 is the local polynomial estimator based on (X1, Y11²), . . . , (Xn, Y1n²). With the notation s1(x) = E(Y1i² | Xi = x) = σ1²(x) + m1²(x), we obtain
\[
\frac{\hat\sigma_1(x)}{\sigma_1(x)} = 1 + \frac{\hat s_1(x) - s_1(x)}{2\sigma_1^2(x)} - \frac{\hat m_1(x) - m_1(x)}{\sigma_1(x)}\, \frac{m_1(x)}{\sigma_1(x)} + r(x),
\]
where
\[
r(x) = -\,\frac{1}{2}\, \frac{\{\hat m_1(x) - m_1(x)\}^2}{\sigma_1^2(x)} - \frac{\{\hat\sigma_1^2(x) - \sigma_1^2(x)\}^2}{2\sigma_1^2(x)\, \{\hat\sigma_1(x) + \sigma_1(x)\}^2}.
\]
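The identity σ̂1² = ŝ1 − m̂1² simply estimates Var(Y | X = x) = E(Y² | X = x) − {E(Y | X = x)}² by two regression fits. A minimal sketch, using a Nadaraya–Watson smoother as a stand-in for the local polynomial estimator of the paper (the simulated model, bandwidth and all names are our illustrative choices):

```python
import numpy as np

def nw(x0, X, Z, h):
    # Nadaraya-Watson smoother with a Gaussian kernel (stand-in for the
    # local polynomial estimator used in the paper)
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    return np.sum(w * Z) / np.sum(w)

rng = np.random.default_rng(3)
n = 5000
X = rng.uniform(-2.0, 2.0, n)
m = np.sin(X)                      # conditional mean m_1
sig = 0.5 + 0.25 * X ** 2          # conditional standard deviation sigma_1
Y = m + sig * rng.standard_normal(n)

x0 = 1.0
m_hat = nw(x0, X, Y, h=0.25)           # mhat_1(x0)
s_hat = nw(x0, X, Y ** 2, h=0.25)      # shat_1(x0): regression of Y^2 on X
var_hat = s_hat - m_hat ** 2           # sigmahat_1^2(x0) = shat_1 - mhat_1^2
```

Here the true value is σ1²(1) = 0.75² = 0.5625, and the plug-in estimate recovers it up to smoothing bias and sampling noise.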

For x ∈ Jn, set
\[
\hat c_1(x) = e_1^\top Q^{-1}(x)\, \tilde A(x)/\{2\,\sigma_1^2(x)\},
\]
where Ã(x) denotes the vector with components
\[
\tilde A_i(x) = \frac{1}{n}\sum_{\ell=1}^{n} \{2\, m_1(X_\ell)\, \sigma_1(X_\ell)\, \varepsilon_{1\ell} + \sigma_1^2(X_\ell)\, (\varepsilon_{1\ell}^2 - 1)\}\, \psi_{i,h_n}(X_\ell - x)\, K_{h_n}(X_\ell - x)
\]
ℓ=1

with i ∈ I. Along the lines of the proof of (i) and (ii) for m̂1, one can prove that
\[
\sup_{x\in J_n} \bigg|\frac{\hat s_1(x) - s_1(x)}{2\sigma_1^2(x)} - \hat c_1(x)\bigg| = O_P(1)\, \frac{\varrho_n^2}{(\alpha_n^{(1)})^{|I|}\, (\alpha_n^{(2)})^2} + O_P\bigg(\frac{M_n h_n^{p+1}}{\alpha_n^{(1)}\, (\alpha_n^{(2)})^2}\bigg) = o_P(n^{-1/2})
\]

and
\[
\sup_{x\in J_n} |\hat c_1(x)| = O_P\bigg(\frac{\varrho_n}{\alpha_n^{(1)}\, (\alpha_n^{(2)})^2}\bigg) = o_P(n^{-1/4}).
\]

Now, noticing that σ̂1²(x) − σ1²(x) = ŝ1(x) − s1(x) − {m̂1(x) − m1(x)}{m̂1(x) + m1(x)}, we obtain the rate
\[
\sup_{x\in J_n} |r(x)| = o_P(n^{-1/2}) + O_P(1)\, \frac{\varrho_n^2}{(\alpha_n^{(1)})^2\, (\alpha_n^{(2)})^4} = o_P(n^{-1/2})
\]
and (i) follows for σ̂1.


If we define b̂1(x) = 1 + ĉ1(x) − â1(x) m1(x)/σ1(x), then (ii) and (iii) follow analogously to before. The only difference is an additional factor σ1(x) in the denominator that needs to be considered.
To show validity of (iv), note that the regression model Y1i² = s1(Xi) + ηi holds with error term ηi = σ1²(Xi)(ε1i² − 1) + 2 m1(Xi) σ1(Xi) ε1i. From this one obtains, analogously to the derivation of (iv) for â1, that
\[
\int_{J_n} \frac{\hat s_1(x) - s_1(x)}{2\sigma_1^2(x)}\, f_X(x)\,\mathrm{d}x = \frac{1}{2n}\sum_{i=1}^{n} (\varepsilon_{1i}^2 - 1) + \frac{1}{n}\sum_{i=1}^{n} \varepsilon_{1i}\, \frac{m_1(X_i)}{\sigma_1(X_i)} + o_P(n^{-1/2}).
\]
But the second sum is also the dominating term in $\int_{J_n} \hat a_1(x)\, \{m_1(x)/\sigma_1(x)\}\, f_X(x)\,\mathrm{d}x$, which is again shown analogously to the proof of (iv) for â1. Thus (iv) follows for b̂1. □

Remark 5. Note that due to property (iii) of Lemma 1 and (C.1), we have, for x ∈ Jn = [−cn, cn]d,
\[
\|x\|^{\nu}\, |\hat a_1(x)| \le O\{(\ln n)^{\nu/d}\}\, o_P(n^{-1/4}) = o_P(1)
\]
for every ν > 0. In the proof of Lemma 1, â1(x) was only defined for x ∈ Jn. Now we define â1 on Rd in such a way that if â1 ∈ C1d+δ(Jn) and ∥x∥ν |â1(x)| ≤ 1, then â1 ∈ G defined in (A.9). Then Pr(â1 ∈ G) → 1 by Lemma 1. Analogously, b̂1 is defined on Rd such that Pr(b̂1 ∈ G̃) → 1 for G̃ from (A.10).

Lemma 2. Let H = G or H = G̃ denote one of the function classes defined in (A.9) and (A.10) (depending on ν > 0 and δ ∈ (0, 1]). Then
\[
\ln N(\epsilon, H, \|\cdot\|_\infty) = O\big[\epsilon^{-\{d/\nu + d/(d+\delta)\}}\big]
\]
as ϵ ↘ 0, and thus the same bound holds for $\ln N_{[\,]}(\epsilon, H, \|\cdot\|_2)$.

Proof. Let H = G (the proof is similar for G̃) and let ϵ > 0. Choose D = D(ϵ) = ϵ−1/ν and let B denote the ball of radius D around the origin. Let a1, . . . , am : B → R denote the centers of ϵ-balls with respect to the supremum norm that cover C1d+δ(B), i.e., m = N(ϵ, C1d+δ(B), ∥ · ∥∞). Then for each a ∈ G we have a|B ∈ C1d+δ(B) and thus there exists j0 ∈ {1, . . . , m} such that supx∈B |a(x) − aj0(x)| ≤ ϵ. Now for each j ∈ {1, . . . , m}, define aj(x) = 0 for x ∈ Rd \ B. Then
\[
\sup_{x\in\mathbb{R}^d} |a(x) - a_{j_0}(x)| \le \max\Big\{\epsilon,\; \sup_{\|x\|\ge D} |a(x)|\Big\} \le \epsilon
\]
because ∥x∥ν |a(x)| ≤ 1 by definition of G. We obtain N(ϵ, G, ∥ · ∥∞) ≤ m and, due to Theorem 2.7.1 in van der Vaart and Wellner [33], we have, for some universal constant K,
\[
\ln m \le K\, \lambda_d(B_1)\, \epsilon^{-d/(d+\delta)} = O\{(D+2)^d\}\, \epsilon^{-d/(d+\delta)} = O\big[\epsilon^{-\{d/\nu + d/(d+\delta)\}}\big],
\]
where B1 = {x : ∥x − B∥ < 1}. Thus the first assertion follows. The second assertion follows by the proof of Corollary 2.7.2 in [33]. □

References

[1] B. Berghaus, A. Bücher, S. Volgushev, Weak convergence of the empirical copula process with respect to weighted metrics, Bernoulli 23 (2017) 743–772.
[2] P.J. Bickel, M.J. Wichura, Convergence criteria for multiparameter stochastic processes and some applications, Ann. Math. Stat. 42 (1971) 1656–1670.
[3] B. Brahimi, A. Necir, A semiparametric estimation of copula models based on the method of moments, Stat. Methodol. 9 (2012) 467–477.
[4] A. Bücher, S. Volgushev, Empirical and sequential empirical copula processes under serial dependence, J. Multivariate Anal. 119 (2013) 61–70.
[5] N.-H. Chan, J. Chen, X. Chen, Y. Fan, L. Peng, Statistical inference for multivariate residual copula of GARCH models, Statist. Sinica 19 (2009) 53–70.
[6] X. Chen, Y. Fan, Estimation and model selection of semiparametric copula-based multivariate dynamic models under copula misspecification, J.
Econometrics 135 (2006) 125–154.
[7] J. Dedecker, S. Louhichi, Maximal inequalities and empirical central limit theorems, in: H. Dehling, T. Mikosch, M. Sorensen (Eds.), Empirical Process
Techniques for Dependent Data, Birkhäuser Boston, 2002, pp. 137–160.
[8] H. Dette, J.C. Pardo-Fernández, I. Van Keilegom, Goodness-of-fit tests for multiplicative models with dependent data, Scand. J. Stat. 36 (2009) 782–799.
[9] J. Fan, I. Gijbels, Local Polynomial Modelling and its Applications, Chapman & Hall/CRC, London, 1996.
[10] J. Fan, Q. Yao, Nonlinear Time Series: Nonparametric and Parametric Methods, Springer, New York, 2005.
162 N. Neumeyer, M. Omelka and Š. Hudecová / Journal of Multivariate Analysis 171 (2019) 139–162

[11] J. Gao, Nonlinear Time Series: Semiparametric and Nonparametric Methods, Chapman & Hall/CRC, Boca Raton, FL, 2007.
[12] C. Genest, K. Ghoudi, L.-P. Rivest, A semiparametric estimation procedure of dependence parameters in multivariate families of distributions,
Biometrika 82 (1995) 543–552.
[13] C. Genest, J.G. Nešlehová, B. Rémillard, Asymptotic behavior of the empirical multilinear copula process under broad conditions, J. Multivariate Anal.
159 (2017) 82–110.
[14] C. Genest, B. Rémillard, D. Beaudoin, Goodness-of-fit tests for copulas: A review and a power study, Insurance Math. Econom. 44 (2009) 199–213.
[15] I. Gijbels, M. Omelka, M. Pešta, N. Veraverbeke, Score tests for covariate effects in conditional copulas, J. Multivariate Anal. 159 (2017) 111–133.
[16] I. Gijbels, M. Omelka, N. Veraverbeke, Estimation of a copula when a covariate affects only marginal distributions, Scand. J. Stat. 42 (2015) 1109–1126.
[17] B.E. Hansen, Uniform convergence rates for kernel estimation with dependent data, Econom. Theory 24 (2008) 726–748.
[18] W. Härdle, A. Tsybakov, L. Yang, Nonparametric vector autoregression, J. Statist. Plann. Inference 68 (1998) 221–245.
[19] G. Kim, M.J. Silvapulle, P. Silvapulle, Semiparametric estimation of the error distribution in multivariate regression using copulas, Aust. N. Z. J. Stat.
49 (2007) 321–336.
[20] G. Kim, M.J. Silvapulle, P. Silvapulle, Estimating the error distribution in multivariate heteroscedastic time-series models, J. Statist. Plann. Inference
138 (2008) 1442–1458.
[21] I. Kojadinovic, M. Holmes, Tests of independence among continuous random vectors based on Cramér–von Mises functionals of the empirical copula
process, J. Multivariate Anal. 100 (2009) 1137–1154.
[22] H.L. Koul, X. Zhu, Goodness-of-fit testing of error distribution in nonparametric ARCH(1) models, J. Multivariate Anal. 137 (2015) 141–160.
[23] E. Masry, Multivariate local polynomial regression for time series: uniform strong consistency and rates, J. Time Series Anal. 17 (1996) 571–599.
[24] A.J. McNeil, R. Frey, P. Embrechts, Quantitative Risk Management: Concepts, Techniques and Tools, Princeton University Press, Princeton, NJ, 2005.
[25] U.U. Müller, A. Schick, W. Wefelmeyer, Estimating the error distribution function in nonparametric regression, Statist. Probab. Lett. 79 (2009) 957–964.
[26] R.B. Nelsen, An Introduction to Copulas, second ed., Springer, New York, 2006.
[27] A.J. Patton, A review of copula models for economic time series, J. Multivariate Anal. 110 (2012) 4–18.
[28] F. Portier, J. Segers, On the weak convergence of the empirical conditional copula under a simplifying assumption, J. Multivariate Anal. 166 (2018)
160–181.
[29] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2018.
[30] B. Rémillard, N. Papageorgiou, F. Soustra, Copula-based semiparametric models for multivariate time series, J. Multivariate Anal. 110 (2012) 30–42.
[31] J. Segers, Weak convergence of empirical copula processes under nonrestrictive smoothness assumptions, Bernoulli 18 (2012) 764–782.
[32] H. Tsukahara, Semiparametric estimation in copula models, Canad. J. Statist. 33 (2005) 357–375.
[33] A.W. van der Vaart, J.A. Wellner, Weak Convergence and Empirical Processes, Springer, New York, 1996.
[34] A.W. van der Vaart, J.A. Wellner, Empirical processes indexed by estimated functions, in: E.A. Cator, G. Jongbloed, C. Kraaikamp, H.P. Lopuhaä, J.A.
Wellner (Eds.), Asymptotics: Particles, Processes and Inverse Problems, Institute of Mathematical Statistics, Hayward, CA, 2007, pp. 234–252.
[35] N. Veraverbeke, M. Omelka, I. Gijbels, Estimation of a conditional copula and association measures, Scand. J. Stat. 38 (2011) 766–780.
[36] L. Yang, W.K. Härdle, J. Nielsen, Nonparametric autoregression with multiplicative volatility and additive mean, J. Time Series Anal. 20 (1999) 579–604.
