A copula approach for dependence modeling in multivariate nonparametric time series

N. Neumeyer, M. Omelka, Š. Hudecová
Article history: Received 18 May 2017; Available online 6 December 2018.

AMS 2010 subject classifications: primary 62H12; secondary 62G05, 62M10.

Keywords: Asymptotic representation; CHARN model; Empirical copula process; Goodness-of-fit testing; Nonparametric AR-ARCH model; Nonparametric SCOMDY model; Weak convergence

Abstract

This paper is concerned with modeling the dependence structure of two (or more) time series in the presence of a (possibly multivariate) covariate which may include past values of the time series. We assume that the covariate influences only the conditional mean and the conditional variance of each of the time series, but that the distribution of the standardized innovations is not influenced by the covariate and is stable in time. The joint distribution of the time series is then determined by the conditional means, the conditional variances and the marginal distributions of the innovations, which we estimate nonparametrically, and by the copula of the innovations, which represents the dependence structure. We consider a nonparametric and a semiparametric estimator based on the estimated residuals. We show that under suitable assumptions these copula estimators are asymptotically equivalent to estimators that would be based on the unobserved innovations. The theoretical results are illustrated by simulations and a real data example.

© 2018 Elsevier Inc. All rights reserved.
1. Introduction
Modeling the dependence of k observed time series can be of utmost importance for applications, e.g., in risk management
to model the dependence between several exchange rates. We will consider the problem of modeling k dependent
nonparametric AR-ARCH time series defined, for all i ∈ {1, . . . , n}, j ∈ {1, . . . , k}, by
Yji = mj (X i ) + σj (X i ) εji ,
where the covariate X i may include past values of the process, Yj i−1 , Yj i−2 , . . . with j ∈ {1, . . . , k}, or other exogenous
variables. It will be assumed that the innovations (ε1i , . . . , εki ) with i ∈ Z, are mutually independent and identically
distributed random vectors and that (ε1i , . . . , εki ) is independent of the past and present covariates {X ℓ : ℓ ≤ i} for all
i ∈ Z. For identifiability we further assume E (εji ) = 0, var(εji ) = 1 for all j ∈ {1, . . . , k}, so that the functions mj and
σj represent the conditional mean and volatility function of the jth time series. Such models are also called multivariate
nonparametric CHARN (conditional heteroscedastic autoregressive nonlinear) models and have gained much attention over
the last decades; see Fan and Yao [10] and Gao [11] for extensive overviews.
∗ Corresponding author.
E-mail address: [email protected] (M. Omelka).
https://doi.org/10.1016/j.jmva.2018.11.016
0047-259X/© 2018 Elsevier Inc. All rights reserved.
140 N. Neumeyer, M. Omelka and Š. Hudecová / Journal of Multivariate Analysis 171 (2019) 139–162
Note that due to the structure of the model and Sklar’s theorem, see, e.g., Nelsen [26], for zj = {yj − mj (x)}/σj (x) with
j ∈ {1, . . . , k}, one has
Pr(Y1i ≤ y1 , . . . , Yki ≤ yk | X i = x) = Pr(ε1i ≤ z1 , . . . , εki ≤ zk ) = C {F1ε (z1 ), . . . , Fkε (zk )},
where F1ε , . . . , Fkε denote the marginal distributions of the innovations and C their copula. Thus the joint conditional
distribution of the observations, given the covariate, is completely specified by the individual conditional mean and variance
functions, the marginal distributions of the innovations, and their copula. The copula C describes the dependence structure
of the k time series, conditional on the covariates, after removing influences of the conditional means and variances as well
as marginal distributions.
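The decomposition above can be illustrated with a small simulation: once each series is standardized by its conditional mean and volatility, the remaining dependence is exactly the innovation copula. A minimal Python sketch (the functions m_j, σ_j and the Gaussian innovation copula below are illustrative choices, not the paper's models):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Innovations with a Gaussian copula (correlation 0.7) and standard normal margins.
cov = np.array([[1.0, 0.7], [0.7, 1.0]])
eps = rng.multivariate_normal(np.zeros(2), cov, size=n)

# Covariate and the two location-scale series Y_ji = m_j(X_i) + sigma_j(X_i) eps_ji.
x = rng.normal(size=n)
m1, s1 = np.sin(x), np.sqrt(1.0 + 0.5 * x**2)
m2, s2 = 0.5 * x, np.sqrt(1.0 + 0.2 * x**2)
y1 = m1 + s1 * eps[:, 0]
y2 = m2 + s2 * eps[:, 1]

# Standardizing with the (here: true) mean and volatility functions recovers the
# innovations, so the dependence left after standardization is the innovation copula.
z1 = (y1 - m1) / s1
z2 = (y2 - m2) / s2
assert np.allclose(z1, eps[:, 0]) and np.allclose(z2, eps[:, 1])
```

In practice m_j and σ_j are unknown and are replaced by nonparametric estimates, which is exactly the situation the paper analyzes.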
We will model the conditional mean and variance function nonparametrically like Härdle et al. [18], among others.
Semiparametric estimation, e.g., with additive structure for mj and multiplicative structure for σj2 as in Yang et al. [36],
can be considered as well and all results derived herein remain valid under appropriate changes for the estimators and
assumptions. Further we will model the marginal distributions of the innovations nonparametrically, whereas we will
consider two options for the estimation of the copula C : a parametric and a nonparametric approach. As the innovations
are unobservable, both estimators will be based on estimated residuals. We will show that the asymptotic distribution is not
affected by the necessary pre-estimation of the mean and variance functions. This remarkable result is intrinsic to copula estimation and was already observed in (semi-)parametric copula estimation; see the references in the next paragraph.
In contrast, the asymptotic distribution of empirical distribution functions is typically influenced by pre-estimation of mean
and variance functions. Moreover, a comparison between the parametric and nonparametric copula estimator allows us to
test the fit of a parametric class of copulas.
Our approach extends the following parametric and semiparametric approaches in time series contexts. Chen and Fan
[6] introduced SCOMDY (semiparametric copula-based multivariate dynamic) models which are very similar to the model
considered here. However, the conditional mean and variance functions are modeled parametrically, while the marginal
distributions of innovations are estimated nonparametrically and a parametric copula model is applied to model the
dependence. See also Kim et al. [19] for similar methods for some parametric time series models including nonlinear GARCH
models, Kim et al. [20], Rémillard et al. [30], and the review by Patton [27]. Chan et al. [5] even give (next to the parametric estimation of a copula) a goodness-of-fit test for the innovation copula in the GARCH context. Further, in an iid setting, Gijbels
et al. [16] show that in nonparametric location-scale models the asymptotic distribution of the empirical copula is not
influenced by pre-estimation of the mean and variance function. This result was further generalized by Portier and Segers
[28] to a completely nonparametric model for the marginals.
The remainder of the paper is organized as follows. In Section 2 we define the estimators and state some regularity
assumptions. In Section 2.1 we show the weak convergence of the copula process, while in Section 2.2 we show the
asymptotic normality of a parameter estimator when considering a parametric class of copulas. Section 2.3 is devoted to
goodness-of-fit testing. In Section 3 we present simulation results and Section 4 features a real data example. All proofs are
given in the Appendix.
2. Main results
For the ease of presentation we will focus on the case of two time series, i.e., k = 2, but all results can be extended to
general k ≥ 2 in an obvious manner. Suppose we have observed, for i ∈ {1, . . . , n}, a section of the stationary stochastic
process {Y1i , Y2i , X i }i∈Z that satisfies, for all i ∈ {1, . . . , n},
Y1i = m1 (X i ) + σ1 (X i ) ε1i , Y2i = m2 (X i ) + σ2 (X i ) ε2i ,
where X i = (Xi1 , . . . , Xid )⊤ is a d-dimensional covariate and the innovations {(ε1i , ε2i ) : i ∈ Z} are independent identically distributed random vectors. Further assume that (ε1i , ε2i ) is independent of the past and present covariates X k , k ≤ i, for all i ∈ Z, and that E (ε1i ) = E (ε2i ) = 0, var(ε1i ) = var(ε2i ) = 1. If the marginal distribution functions F1ε and F2ε of the
innovations are continuous, then the copula C of the innovations is unique and can be expressed, for all (u1 , u2 ) ∈ [0, 1]², as C(u1 , u2 ) = Fε {F1ε⁻¹(u1 ), F2ε⁻¹(u2 )}.
Here I = I(d, p) denotes the set of multi-indices i = (i1 , . . . , id ) with i. = i1 + · · · + id ≤ p and ψi,hn (x) = ∏_{k=1}^d (xk /hn(k))^{ik} (1/ik !). In addition,

Khn (X ℓ − x) = ∏_{k=1}^d (1/hn(k)) k{(Xℓk − xk )/hn(k)},

with k being a kernel function and hn = (hn(1), . . . , hn(d)) the smoothing parameter. Moreover, σj²(x) is estimated as σ̂j²(x) = ŝj (x) − m̂j²(x), where ŝj denotes the analogous local polynomial estimator of the conditional second moment E (Yj1² | X 1 = x).

Here Di = ∂^{i.}/(∂x1^{i1} · · · ∂xd^{id}) and ∥·∥ is the Euclidean norm on Rd. Denote by CM^{ℓ+δ}(J) the set of ℓ-times differentiable functions f on J such that ∥f ∥_{ℓ+δ} ≤ M. Denote by C̃2^{ℓ+δ}(J) the subset of C2^{ℓ+δ}(J) of the functions that satisfy inf_{x∈J} f (x) ≥ 1/2.
In what follows we prove that, under appropriate regularity assumptions, using the estimated residuals (2) instead of the (true) unobserved innovations εji affects neither the asymptotic distribution of the empirical copula estimator nor that of the parametric copula estimator.
The estimate of the joint distribution function Fε (y1 , y2 ) is

F̂ε̂ (y1 , y2 ) = (1/Wn ) ∑_{i=1}^n wni 1(ε̂1i ≤ y1 , ε̂2i ≤ y2 ),

and, for j ∈ {1, 2},

F̂jε̂ (y) = (1/Wn ) ∑_{i=1}^n wni 1(ε̂ji ≤ y)

are the corresponding marginal empirical cumulative distribution functions. Here we make use of the weight function wn (x) = 1(x ∈ Jn ) and put wni = wn (X i ) as well as Wn = wn1 + · · · + wnn . For some real positive sequence cn → ∞ we set Jn = [−cn , cn ]^d.
Now let Cn(or) be the ‘oracle’ estimator based on the unobserved innovations, i.e.,

Cn(or)(u1 , u2 ) = F̂ε {F̂1ε⁻¹(u1 ), F̂2ε⁻¹(u2 )},

where

F̂ε (z1 , z2 ) = (1/n) ∑_{i=1}^n 1(ε1i ≤ z1 , ε2i ≤ z2 )

is the estimator of Fε (z1 , z2 ) based on the unobserved innovations and F̂1ε , F̂2ε are the corresponding marginal empirical cumulative distribution functions.
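The oracle estimator plugs the marginal empirical quantile functions into the joint empirical distribution function. A minimal Python sketch under illustrative assumptions (independent standard normal innovations, so that C(0.5, 0.5) = 0.25):

```python
import numpy as np

def empirical_copula(e1, e2, u1, u2):
    """Oracle empirical copula C_n^(or)(u1, u2) = F_hat{F1_hat^{-1}(u1), F2_hat^{-1}(u2)}."""
    n = len(e1)
    # Generalized inverses of the marginal empirical cdfs (smallest order statistic
    # whose empirical cdf value reaches u).
    z1 = np.sort(e1)[max(int(np.ceil(n * u1)) - 1, 0)]
    z2 = np.sort(e2)[max(int(np.ceil(n * u2)) - 1, 0)]
    # Joint empirical cdf evaluated at (z1, z2).
    return np.mean((e1 <= z1) & (e2 <= z2))

rng = np.random.default_rng(1)
e = rng.normal(size=(4000, 2))          # independent innovations (illustrative)
val = empirical_copula(e[:, 0], e[:, 1], 0.5, 0.5)
# Under independence C(0.5, 0.5) = 0.25, so the estimate should be close to 0.25.
assert abs(val - 0.25) < 0.05
```

In the paper the same construction is applied to the estimated residuals, with the weighted empirical distribution functions defined above.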
Regularity assumptions
(β) The process (X i , Y1i , Y2i )i∈Z is strictly stationary and absolutely regular (β -mixing) with the mixing coefficient βi that
satisfies βi = O(i−b ) with b > d + 3.
(Fε ) The second-order partial derivatives Fε(1,1), Fε(1,2) and Fε(2,2) of the joint cumulative distribution function Fε (y1 , y2 ) = Pr(ε1 ≤ y1 , ε2 ≤ y2 ), with Fε(j,k)(y1 , y2 ) = ∂²Fε (y1 , y2 )/(∂ yj ∂ yk ), satisfy

max_{j,k∈{1,2}} sup_{(y1 ,y2 )∈R²} |Fε(j,k)(y1 , y2 )| (1 + |y1 |)(1 + |y2 |) < ∞.
(FX ) The common density fX of the observations X i with i ∈ Z is bounded and differentiable with bounded uniformly
continuous first order partial derivatives. Suppose that the sequence cn which is of order O{(ln n)1/d } is chosen in such
a way that infx∈Jn fX (x) converges to zero not faster than some negative power of ln n.
(M) For some s > (2b − 2 − d)/(b − 3 − d) with b from assumption (β), for j ∈ {1, 2}, E |εj0 |^{2s} < ∞, the functions σj^{2s} fX and |mj σj |^s fX are bounded, and there are some i∗ ∈ N, B > 0 such that for all i ≥ i∗ ,

sup_{x0 ,xi} σj²(x0 ) σj²(xi ) fX0,Xi (x0 , xi ) ≤ B,   sup_{x0 ,xi} |mj (x0 ) mj (xi )| σj (x0 ) σj (xi ) fX0,Xi (x0 , xi ) ≤ B.
Remark 1. Using Fε (y1 , y2 ) = C {F1ε (y1 ), F2ε (y2 )}, assumption (Fε ) requires that

max_{j,k∈{1,2}} sup_{u1 ,u2 ∈[0,1]} |C(j,k)(u1 , u2 ) fjε {Fjε⁻¹(uj )} fkε {Fkε⁻¹(uk )} + C(j)(u1 , u2 ) f′jε {Fjε⁻¹(uj )} 1(j = k)| {1 + |Fjε⁻¹(uj )|}{1 + |Fkε⁻¹(uk )|} < ∞,

where C(j)(u1 , u2 ) = ∂ C (u1 , u2 )/∂ uj and C(j,k)(u1 , u2 ) = ∂²C (u1 , u2 )/(∂ uj ∂ uk ) stand for the first and second order partial derivatives of the copula function. Thus, provided that for some η > 0

C(j,k)(u1 , u2 ) = O{uj⁻η (1 − uj )⁻η uk⁻η (1 − uk )⁻η },

then we need that the functions fjε {Fjε⁻¹(u)}{1 + |Fjε⁻¹(u)|} are of order O{u^η (1 − u)^η } and the functions f′jε {Fjε⁻¹(u)}{1 + |Fjε⁻¹(u)|}² are bounded.
Remark 2. Parts of our assumptions are reproduced from Hansen [17] because we apply his results about uniform rates of
convergence for kernel estimators several times in our proofs. Note that in his Theorem 2, we set q = ∞ to simplify the
assumptions. Further note that if beta mixing coefficients are diminishing exponentially fast, then it is sufficient to assume
s > 2 in (M).
Remark 3. Note that the bandwidth conditions (4) can be fulfilled if and only if 2p + 2 > 3d + 2δ , i.e., in view of assumption
(Bw) if and only if 2p + 2 > 3d + 2d/(b − 1). Thus if b > 2d + 1, then for d = 1 it is sufficient to take p = 1 and for d = 2 one
can take p = 3. In general with increasing dimension d, higher smoothness of the unknown functions has to be assumed and
higher order local polynomial estimators have to be used. This phenomenon is well known in the context of nonparametric
inference.
So in general one can choose the bandwidth as hn ∼ n^{−1/a}, where a ∈ (3d + 2d/(b − 1), 2p + 2). The problem is that if one wants to take p as small as possible, the range of possible values for a is rather narrow, which makes the choice of a rather
delicate. To make the choice of a more flexible in practice, one can for instance assume that b > 10d + 1 which (among
others) includes models for beta mixing coefficients diminishing exponentially fast. Now for d = 1 and p = 1 one can take
a in the interval (3.1, 4). See also the bandwidth choice in our simulation study in Section 3.
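The admissible range for the exponent a follows directly from the inequalities above; a small helper (a sketch, with d, p and the mixing rate b as inputs):

```python
def bandwidth_exponent_range(d, p, b):
    """Admissible range for a in h_n ~ n**(-1/a): a in (3d + 2d/(b-1), 2p + 2).
    Returns None when the interval is empty, i.e., when 2p + 2 <= 3d + 2d/(b-1)."""
    lo, hi = 3 * d + 2 * d / (b - 1), 2 * p + 2
    return (lo, hi) if lo < hi else None

# With d = 1 and p = 1, larger b widens the interval; e.g., b = 21 gives (3.1, 4),
# the interval quoted in Remark 3.
assert bandwidth_exponent_range(1, 1, 21) == (3.1, 4)
```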
Remark 4. The choice of cn is a delicate problem in practice. As far as we know even in analogous settings, see, e.g., [8,22,25]
and the references therein, this problem has not yet been addressed. Note that the weight function wn (x) is chosen in the
simplest possible form in order to simplify the presentation of the proof. In practice it is of interest to use more general
forms of Jn . Further as the density fX is unknown, data-driven procedures to choose Jn are of interest. In the simulation
study reported in Section 3, we suggest a data-driven procedure for the choice of the weighting function in the case d = 1.
Nevertheless the data driven choice of Jn (in particular for general d) and its theoretical justification call for further research.
Theorem 1. Suppose that assumptions (β), (Fε ), (FX ), (Bw), (M), (k), (Jn ) and (mσ ) are satisfied. Then
sup_{(u1 ,u2 )∈[0,1]²} |√n {C̃n (u1 , u2 ) − Cn(or)(u1 , u2 )}| = oP (1).

Note that Theorem 1, together with the weak convergence of √n (Cn(or) − C ) as reported, e.g., in Proposition 3.1 of [31] or Theorem 1 (together with Remark 2) in [13], implies that the process ℂ̃n = √n (C̃n − C ) converges weakly in the space of bounded functions ℓ∞([0, 1]²) to a centered Gaussian process GC , which can be written as

GC (u1 , u2 ) = BC (u1 , u2 ) − C(1)(u1 , u2 ) BC (u1 , 1) − C(2)(u1 , u2 ) BC (1, u2 ),
The copula C describes the dependency between the two time series of interest, given the covariate. For applications
modeling this dependency structure parametrically is advantageous because a parametric model often gives easier access
to interpretations. Goodness-of-fit testing will be considered in the next section.
Suppose that the joint distribution of (ε1i , ε2i ) is given by the copula C (u1 , u2 ; θ ), where θ = (θ1 , . . . , θp )⊤ is an unknown parameter that belongs to a parameter space Θ ⊂ Rp. In copula settings one is often interested in semiparametric estimation of the parameter θ , i.e., estimation of θ without making any parametric assumption on the marginal distributions F1ε and F2ε . Methods of semiparametric estimation in iid settings are summarized in Tsukahara [32]. The
question of interest is what happens if we use the estimated residuals (2) instead of the unobserved innovations εji . Generally
speaking, thanks to Theorem 1, the answer is that using ε̂ji instead of εji does not change the asymptotic distribution provided
that the parameter of interest can be written as a Hadamard differentiable functional of a copula.
2.2.1. Method of moments using rank correlation

The inversion of Kendall's tau estimator is θ̂n(ik) = τ⁻¹(τ̂n ), where τ (θ ) = 4 ∫∫_{[0,1]²} C (u1 , u2 ; θ ) dC (u1 , u2 ; θ ) − 1 is the theoretical Kendall's tau and τ̂n is an estimate of Kendall's tau. In our setting, Kendall's tau would be computed
from the estimated residuals (ε̂1i , ε̂2i ) for which wni > 0. By Theorem 1 and Hadamard differentiability of Kendall’s tau
proved in Lemma 1 of [35], the estimators of Kendall’s tau based on ε̂ji or on εji are asymptotically equivalent. Thus provided
that τ ′ (θ ) ̸ = 0 one finds that, as n → ∞,
√n (θ̂n(ik) − θ ) ⇝ N [0, στ²/{τ ′(θ )}²],

where στ² = var{8 C (U11 , U21 ; θ ) − 4 U11 − 4 U21 } and (U11 , U21 ) = (F1ε (ε11 ), F2ε (ε21 )).
Analogously, one can show that working with residuals has asymptotically negligible effects also for the method of moments
introduced in Brahimi and Necir [3].
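As an illustration of the inversion of Kendall's tau, the Clayton family has τ(θ) = θ/(θ + 2), so τ⁻¹(t) = 2t/(1 − t). A minimal Python sketch (the Marshall–Olkin sampler and the parameter values are illustrative assumptions, not taken from the paper):

```python
import numpy as np
from scipy.stats import kendalltau

def clayton_theta_from_tau(tau):
    """Invert Kendall's tau for the Clayton family: tau(theta) = theta/(theta + 2)."""
    return 2.0 * tau / (1.0 - tau)

# Sample from a Clayton copula with theta = 2 (so tau = 0.5) via the Marshall-Olkin
# algorithm: V ~ Gamma(1/theta), U_j = (1 + E_j/V)^(-1/theta) with E_j ~ Exp(1).
rng = np.random.default_rng(2)
theta, n = 2.0, 20000
v = rng.gamma(1.0 / theta, size=n)
e = rng.exponential(size=(n, 2))
u = (1.0 + e / v[:, None]) ** (-1.0 / theta)

# Invert the sample Kendall's tau to estimate theta.
tau_hat, _ = kendalltau(u[:, 0], u[:, 1])
theta_hat = clayton_theta_from_tau(tau_hat)
assert abs(theta_hat - theta) < 0.2
```

In the paper the sample tau would be computed from the estimated residuals (ε̂1i, ε̂2i) with positive weights; by Theorem 1 this does not change the limit distribution.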
with

γ (u1 , u2 ; θ ) = [∫∫_{[0,1]²} δ(v1 , v2 ; θ ) δ⊤(v1 , v2 ; θ ) dv1 dv2 ]⁻¹ δ(u1 , u2 ; θ ).
where ρ (u1 , u2 ; θ ) is a given loss function. This class of estimators includes, among others, the pseudo-maximum likelihood estimators θ̂n(pl), for which ρ (u1 , u2 ; θ ) = − ln c(u1 , u2 ; θ ), with c being the copula density function; see [12].

Note that the estimator θ̂n is usually searched for as a solution to the estimating equations

∑_{i=1}^n wni φ(Ũ1i , Ũ2i ; θ̂n ) = 0p ,   (6)

where φ(u1 , u2 ; θ ) = ∂ρ (u1 , u2 ; θ )/∂θ . In [32], the estimator defined as the solution of (6) is called a rank approximate Z-estimator.
In what follows we give general assumptions under which there exists a consistent root θ̂n of the estimating equations (6) that is asymptotically equivalent to the consistent root θ̂n(or) of the ‘oracle’ estimating equations given by

∑_{i=1}^n φ(Û1i , Û2i ; θ̂n(or)) = 0p ,   (7)

where

(Û1i , Û2i ) = (n/(n + 1)) (F̂1ε (ε1i ), F̂2ε (ε2i ))   (8)

are the standard pseudo-observations calculated from the unobserved innovations and their marginal empirical distribution functions F̂jε (y).
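For illustration, rank-based pseudo-observations and pseudo-maximum likelihood estimation can be sketched in Python, using the Clayton family as an example (the sampler, sample size and parameter value are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import rankdata

def clayton_negloglik(theta, u1, u2):
    """Negative pseudo-log-likelihood of the Clayton copula density
    c(u, v; theta) = (1 + theta)(uv)^(-1-theta) (u^-theta + v^-theta - 1)^(-2-1/theta)."""
    if theta <= 0:
        return np.inf
    s = u1 ** (-theta) + u2 ** (-theta) - 1.0
    ll = (np.log1p(theta) - (1.0 + theta) * (np.log(u1) + np.log(u2))
          - (2.0 + 1.0 / theta) * np.log(s))
    return -np.sum(ll)

# Clayton(theta = 2) sample via the Marshall-Olkin algorithm (illustrative).
rng = np.random.default_rng(3)
theta, n = 2.0, 5000
v = rng.gamma(1.0 / theta, size=n)
e = rng.exponential(size=(n, 2))
x = (1.0 + e / v[:, None]) ** (-1.0 / theta)

# Pseudo-observations: ranks divided by n + 1, matching (8).
u1 = rankdata(x[:, 0]) / (n + 1.0)
u2 = rankdata(x[:, 1]) / (n + 1.0)

res = minimize_scalar(clayton_negloglik, bounds=(0.01, 20.0), args=(u1, u2),
                      method="bounded")
assert abs(res.x - theta) < 0.3
```

In the paper's setting the pseudo-observations (Ũ1i, Ũ2i) are built from the estimated residuals with the weights wni, rather than from an observed sample.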
Unfortunately, these general assumptions exclude some useful models (e.g., pseudo-maximum likelihood estimator in
the Clayton family of copulas) for which the function φ(u1 , u2 ; θ ) viewed as a function of (u1 , u2 ) is unbounded. The reason
is that for empirical distribution functions computed from estimated residuals ε̂ji , we lack some of the sophisticated results
that are available for empirical distribution functions computed from (true) innovations εji . For such copula families one
can use, e.g., the method of moments using rank correlation (see Section 2.2.1) to stay on the safe side. Nevertheless the
simulation study in Section 3 suggests that the pseudo-maximum likelihood estimation can be used also for the Clayton
copula (and probably also for other families of copulas with non-zero tail dependence) provided that the dependence is not
very strong.
Regularity assumptions
In what follows let θ stand for the true value of the parameter and V (θ ) for an open neighborhood of θ .
(Id) θ is a unique minimizer of the function r(t) = E ρ (U1i , U2i ; t) and θ is an inner point of Θ .
(φ) There exists V (θ ) such that for each ℓ1 , ℓ2 ∈ {1, . . . , p} the functions φℓ1 (u1 , u2 ; t) = ∂ρ (u1 , u2 ; t)/∂ tℓ1 and φℓ1,ℓ2 (u1 , u2 ; t) = ∂²ρ (u1 , u2 ; t)/(∂ tℓ1 ∂ tℓ2 ) are uniformly continuous in (u1 , u2 ) uniformly in t ∈ V (θ ) and of uniformly bounded Hardy–Krause variation; see, e.g., Berghaus et al. [1].
(φ(j)) There exist V (θ ) and a function h(u1 , u2 ) such that for each t ∈ V (θ ),

max_{j∈{1,2}} max_{ℓ∈{1,...,p}} |φℓ(j)(u1 , u2 ; t)| ≤ h(u1 , u2 ),

where φℓ(j)(u1 , u2 ; t) = ∂φℓ (u1 , u2 ; t)/∂ uj and E {h(U11 , U21 )} < ∞.
(Γ) Each element of the (matrix) function Γ(t) = E {∂φ(U1 , U2 ; t)/∂ t⊤} is a continuous function on V (θ ) and the matrix Γ = Γ(θ ) is positive definite.
Theorem 2. Suppose that the assumptions of Theorem 1 are satisfied and that also (Id), (φ), (φ(j)), and (Γ) hold. Then with probability going to 1, there exists a consistent root θ̂n of the estimating equations (6), which satisfies

√n (θ̂n − θ ) ⇝ Np (0p , Γ⁻¹ Σ Γ⁻¹),

where

Σ = var[φ(U11 , U21 ; θ ) + ∑_{j=1}^2 ∫∫ {1(Uj1 ≤ vj ) − vj } ∂φ(v1 , v2 ; θ )/∂vj dC (v1 , v2 ; θ )].
The proof of Theorem 2 is given in Appendix B. Note that the asymptotic distribution of the estimator θ̂n coincides with the distribution given in Section 4 of Genest et al. [12] that corresponds to the consistent root θ̂n(or) of the estimating equations (7). Thus using the residuals instead of the true innovations has an asymptotically negligible effect on the (first-order) asymptotic properties. In fact, it can even be shown that both θ̂n and θ̂n(or) have the same asymptotic representation and thus

√n (θ̂n − θ̂n(or)) = oP (1).
2.3. Goodness-of-fit testing

When modeling multivariate data using copulas parametrically, one needs to choose a suitable family of copulas, and goodness-of-fit tests are a useful tool for this choice. Thus we are interested in testing H0 : C ∈ C0 , where C0 = {Cθ : θ ∈ Θ } is a given parametric family of copulas.
Many testing methods have been proposed; see, e.g., [14,21] and the references therein. The most standard ones are based on a comparison of nonparametric and parametric estimators of the copula. For instance, the Cramér–von Mises statistic is given by

Sn = ∫∫_{[0,1]²} {C̃n (u1 , u2 ) − C (u1 , u2 ; θ̂n )}² dC̃n (u1 , u2 ),
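Since C̃n places mass 1/n-type weights on the pseudo-observations, the integral reduces to a sum of squared discrepancies over those points. A minimal Python sketch (the independence copula C(u1, u2) = u1 u2 stands in for a fitted parametric copula, and the data-generating choices are illustrative):

```python
import numpy as np
from scipy.stats import rankdata

def cvm_statistic(u1, u2, copula):
    """S_n-type statistic: sum over pseudo-observations of the squared difference
    between the empirical copula and a candidate parametric copula."""
    emp = np.array([np.mean((u1 <= a) & (u2 <= b)) for a, b in zip(u1, u2)])
    return np.sum((emp - copula(u1, u2)) ** 2)

indep = lambda a, b: a * b  # candidate copula: independence

rng = np.random.default_rng(4)
n = 500
# Independent sample: the statistic should be small.
x = rng.normal(size=(n, 2))
u1, u2 = rankdata(x[:, 0]) / (n + 1), rankdata(x[:, 1]) / (n + 1)
s_indep = cvm_statistic(u1, u2, indep)

# Strongly dependent sample: the statistic should be much larger.
y = rng.normal(size=n)
z2 = y + 0.1 * rng.normal(size=n)
v1, v2 = rankdata(y) / (n + 1), rankdata(z2) / (n + 1)
s_dep = cvm_statistic(v1, v2, indep)

assert s_dep > 10 * s_indep
```

In the paper the null distribution of Sn is approximated by a bootstrap rather than compared against a fixed cutoff.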
3. Simulation study
A small Monte Carlo study was conducted in order to compare the semiparametric estimators based on the residuals with
the ‘oracle’ estimators based on (unobserved) innovations. The inversion of Kendall’s tau (IK) method and the maximum
pseudo-likelihood (MPL) method were considered for the following five copula families: Clayton, Frank, Gumbel, normal,
and Student with 4 degrees of freedom. The values of the parameters are chosen so that they correspond to the Kendall’s tau
τ ∈ {0.25, 0.50, 0.75}. The data were simulated from the following four models:
Y1i = (0.5 + 0.4 e^{−0.8 Xi²}) Xi + √(1 + 0.2 Xi²) ε1i ,   Y2i = 0.5 − 0.5 Xi + √(1 + 0.4 Xi²) ε2i ,   (Mod 1)
where the innovations ε1i , ε2i marginally follow the standard normal distribution, and Xi is an exogenous variable following the AR(1) model Xi = 0.6 Xi−1 + ξi with ξi iid standard normal. The simulations were also conducted for innovations ε1i , ε2i with Student marginals with 5 degrees of freedom, but the corresponding results are very similar and, for brevity, are not presented here. All simulations were conducted in the R computing environment [29].
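Although the simulations in the paper were run in R, the data-generating mechanism of (Mod 1) can be sketched in Python as follows (the Gaussian innovation copula with ρ = 0.7 is an illustrative choice; the study varies the copula family and the strength of dependence):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500

# Exogenous AR(1) covariate X_i = 0.6 X_{i-1} + xi_i.
x = np.zeros(n)
for i in range(1, n):
    x[i] = 0.6 * x[i - 1] + rng.normal()

# Bivariate innovations: standard normal margins, Gaussian copula (rho = 0.7,
# an illustrative stand-in for the copula families considered in the study).
cov = np.array([[1.0, 0.7], [0.7, 1.0]])
eps = rng.multivariate_normal(np.zeros(2), cov, size=n)

# (Mod 1): nonparametric location-scale structure driven by X_i.
y1 = (0.5 + 0.4 * np.exp(-0.8 * x**2)) * x + np.sqrt(1 + 0.2 * x**2) * eps[:, 0]
y2 = 0.5 - 0.5 * x + np.sqrt(1 + 0.4 * x**2) * eps[:, 1]

assert np.isfinite(y1).all() and np.isfinite(y2).all()
```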
The nonparametric estimates m̂j and σ̂j are constructed as local polynomial estimators of order p = 1 with K being the
triweight kernel. The bandwidth hn is chosen for each estimation separately by the cross-validation method from the interval
(D, H), where D = σ̂Z /n1/(3+ε) and H = σ̂Z ln2 (n)/n1/(4−ε) for ε = 0.1 (see Remark 3) and σ̂Z is an estimate of the standard
deviation of the explanatory variable Z (being Xi or Yi−1 , depending on the model) given by σ̂Z = min{SZ , IQRZ /1.34}, where
SZ stands for the sample standard deviation and IQRZ is the interquartile range.
The weights are given by wn (z) = 1(z ∈ [cnL , cnU ]), where [cnL , cnU ] is the largest possible interval such that inf_{z∈[cnL ,cnU ]} f̂Z (z) ≥ 1/{σ̂Z ln²(n)}, and f̂Z is the kernel density estimator of the marginal density of Z with triweight kernel and the bandwidth chosen by the standard normal reference rule; see, e.g., p. 201 in [10].
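The data-driven choice of the weighting interval can be sketched as follows (a Gaussian KDE with its default bandwidth replaces the triweight kernel and the normal reference rule used in the paper, and the density is scanned on a grid; both are illustrative simplifications):

```python
import numpy as np
from scipy.stats import gaussian_kde, iqr

def weight_interval(z):
    """Data-driven J_n = [c_L, c_U]: region where the estimated density of Z stays
    above 1/(sigma_hat_Z * ln(n)^2), following the rule described in Section 3."""
    n = len(z)
    sigma = min(np.std(z, ddof=1), iqr(z) / 1.34)   # robust scale estimate
    thresh = 1.0 / (sigma * np.log(n) ** 2)
    grid = np.linspace(z.min(), z.max(), 512)
    dens = gaussian_kde(z)(grid)
    keep = grid[dens >= thresh]
    return keep.min(), keep.max()

rng = np.random.default_rng(5)
z = rng.normal(size=1000)
c_lo, c_hi = weight_interval(z)
w = (z >= c_lo) & (z <= c_hi)    # weights w_ni = 1(X_i in J_n)
# Only extreme observations should be down-weighted.
assert w.mean() > 0.9
```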
Table 1
Estimation for the Clayton copula with normal marginals (100 multiples of bias, SD and RMSE).

Model  τ     Estim          n = 200                n = 500                n = 1000
                            Bias    SD    RMSE     Bias    SD    RMSE     Bias    SD    RMSE
Known innovations
       0.25  θ̂n(ik,or)    −0.03   3.25  3.25      0.10   2.47  2.47      0.12   1.86  1.87
       0.25  θ̂n(pl,or)     0.51   3.00  3.04      0.35   2.15  2.18      0.24   1.69  1.70
       0.50  θ̂n(ik,or)     0.01   2.64  2.64      0.06   2.03  2.03      0.07   1.52  1.52
       0.50  θ̂n(pl,or)     0.09   2.47  2.47      0.08   1.84  1.85      0.04   1.39  1.39
       0.75  θ̂n(ik,or)     0.01   1.58  1.58      0.05   1.19  1.19      0.02   0.89  0.89
       0.75  θ̂n(pl,or)    −0.28   1.48  1.50     −0.17   1.10  1.11     −0.12   0.80  0.81
0.25 θ̂n(ik) −0.08 4.66 4.66 −0.22 2.97 2.97 −0.16 2.06 2.06
0.25 θ̂n(pl) 0.62 4.15 4.19 0.07 2.62 2.62 −0.02 1.82 1.82
1 0.50 θ̂n(ik) −0.46 3.94 3.97 −0.41 2.48 2.51 −0.25 1.74 1.76
0.50 θ̂n(pl) −0.90 3.59 3.70 −0.81 2.25 2.39 −0.55 1.60 1.69
0.75 θ̂n(ik) −1.04 2.45 2.66 −0.85 1.55 1.77 −0.59 1.07 1.22
0.75 θ̂n(pl) −3.00 2.66 4.01 −2.23 1.59 2.74 −1.57 1.15 1.94
0.25 θ̂n(ik) −0.43 4.78 4.79 −0.05 2.93 2.92 0.07 2.08 2.08
0.25 θ̂n(pl) 0.26 4.30 4.31 0.25 2.58 2.59 0.15 1.90 1.90
2 0.50 θ̂n(ik) −0.91 3.93 4.03 −0.24 2.40 2.41 −0.09 1.71 1.72
0.50 θ̂n(pl) −1.50 3.62 3.92 −0.57 2.21 2.29 −0.36 1.60 1.64
0.75 θ̂n(ik) −1.96 2.63 3.27 −0.70 1.52 1.68 −0.39 1.05 1.12
0.75 θ̂n(pl) −4.63 3.19 5.62 −2.14 1.84 2.82 −1.27 1.16 1.72
0.25 θ̂n(ik) −0.43 4.81 4.83 −0.09 2.91 2.91 0.03 2.09 2.09
0.25 θ̂n(pl) 0.24 4.37 4.38 0.19 2.56 2.57 0.11 1.90 1.90
3 0.50 θ̂n(ik) −0.93 3.97 4.07 −0.32 2.41 2.43 −0.16 1.72 1.72
0.50 θ̂n(pl) −1.52 3.70 4.00 −0.66 2.20 2.30 −0.46 1.61 1.67
0.75 θ̂n(ik) −1.85 2.61 3.20 −0.82 1.53 1.73 −0.53 1.04 1.16
0.75 θ̂n(pl) −4.39 3.05 5.35 −2.25 1.78 2.86 −1.46 1.14 1.85
0.25 θ̂n(ik) −0.49 4.85 4.87 −0.09 2.93 2.93 0.02 2.10 2.10
0.25 θ̂n(pl) 0.13 4.37 4.37 0.14 2.58 2.59 0.06 1.90 1.90
4 0.50 θ̂n(ik) −0.82 3.99 4.07 −0.25 2.40 2.41 −0.12 1.73 1.73
0.50 θ̂n(pl) −1.54 3.70 4.01 −0.80 2.22 2.36 −0.53 1.60 1.69
0.75 θ̂n(ik) −1.22 2.57 2.84 −0.49 1.48 1.56 −0.28 1.04 1.08
0.75 θ̂n(pl) −3.43 2.76 4.40 −1.93 1.65 2.54 −1.20 1.10 1.62
For each setting, we compute the estimate of the copula parameter θ from the true (but unobserved) innovations using the inversion of Kendall's tau method (θ̂n(ik,or)) and the maximum pseudo-likelihood method (θ̂n(pl,or)). These oracle estimators are compared with their counterparts computed from the residuals, θ̂n(ik) and θ̂n(pl). To make the results more comparable across copula families, the parameter estimates are reported on the Kendall's tau scale. That is, we are in fact comparing nonparametric estimates of Kendall's tau with parametric estimates, where the parameter is estimated with the help of the maximum pseudo-likelihood method [12]. The performance of the estimators is measured by the bias, standard deviation (SD), and the root mean square error (RMSE), which are estimated from 1000 random samples for the sample sizes n ∈ {200, 500, 1000}. Since the obtained quantities are of order 10⁻² and smaller, we report 100 multiples of bias, SD and RMSE in Tables 1–5. As θ̂n(ik) and θ̂n(pl) are natural competitors, the larger of the two corresponding performance measures (bias, SD, RMSE) is stressed by bold font.
In agreement with the results of [12,32], the results for the (oracle) estimates based on (unobserved) innovations are in favor of the MPL method. This continues to hold when working with estimated residuals, provided that the dependence is not very strong (i.e., τ = 0.25 or τ = 0.50). But if the dependence is strong (i.e., τ = 0.75), then one should consider using the IK method. This seems to be true in particular for the Clayton copula and to some extent also for the Frank and Gumbel copulas. A closer inspection of the results reveals that while the standard deviation of the MPL method is almost always slightly smaller than that of the IK method, its bias can be substantially larger. In contrast, the results suggest that for the normal and the Student copula one can stick with the MPL method even in case of strong dependence.

Finally, note that for large sample sizes the performance of the estimates based on residuals is usually almost as good as that of the oracle estimates based on (unobserved) innovations. But there is still some price to pay even for the sample size n = 1000, and this price increases somewhat with the level of dependence. A question for possible further research is how to explain the poor performance of the MPL method based on residuals for the Clayton copula under strong dependence.
Table 2
Estimation for the Frank copula with normal marginals (100 multiples of bias, SD and RMSE).

Model  τ     Estim          n = 200                n = 500                n = 1000
                            Bias    SD    RMSE     Bias    SD    RMSE     Bias    SD    RMSE
Known innovations
       0.25  θ̂n(ik,or)    −0.01   3.16  3.16     −0.05   2.33  2.33     −0.14   1.70  1.71
       0.25  θ̂n(pl,or)     0.04   3.16  3.15     −0.03   2.32  2.32     −0.12   1.70  1.70
       0.50  θ̂n(ik,or)    −0.02   2.37  2.37     −0.01   1.73  1.73     −0.09   1.28  1.28
       0.50  θ̂n(pl,or)     0.00   2.34  2.34     −0.02   1.72  1.72     −0.08   1.27  1.27
       0.75  θ̂n(ik,or)    −0.02   1.18  1.18      0.00   0.87  0.87     −0.03   0.64  0.64
       0.75  θ̂n(pl,or)    −0.13   1.17  1.17     −0.07   0.87  0.87     −0.07   0.64  0.64
0.25 θ̂n(ik) −0.23 4.54 4.54 −0.11 2.82 2.82 −0.05 1.92 1.92
0.25 θ̂n(pl) −0.12 4.52 4.52 −0.05 2.81 2.81 −0.03 1.90 1.90
1 0.50 θ̂n(ik) −0.49 3.46 3.50 −0.32 2.18 2.20 −0.22 1.43 1.44
0.50 θ̂n(pl) −0.47 3.40 3.43 −0.30 2.15 2.17 −0.21 1.42 1.43
0.75 θ̂n(ik) −0.97 1.87 2.11 −0.69 1.15 1.34 −0.53 0.74 0.91
0.75 θ̂n(pl) −1.22 1.84 2.21 −0.81 1.16 1.41 −0.60 0.75 0.96
0.25 θ̂n(ik) −0.28 4.48 4.49 −0.15 2.78 2.79 −0.21 1.88 1.89
0.25 θ̂n(pl) −0.17 4.47 4.47 −0.12 2.77 2.77 −0.19 1.88 1.89
2 0.50 θ̂n(ik) −0.77 3.44 3.53 −0.29 2.13 2.14 −0.24 1.41 1.43
0.50 θ̂n(pl) −0.75 3.40 3.48 −0.31 2.10 2.12 −0.24 1.40 1.42
0.75 θ̂n(ik) −1.65 2.20 2.75 −0.66 1.18 1.35 −0.38 0.75 0.84
0.75 θ̂n(pl) −1.90 2.20 2.91 −0.78 1.18 1.41 −0.43 0.75 0.87
0.25 θ̂n(ik) −0.33 4.53 4.54 −0.17 2.77 2.77 −0.24 1.89 1.91
0.25 θ̂n(pl) −0.23 4.53 4.53 −0.14 2.75 2.75 −0.22 1.89 1.90
3 0.50 θ̂n(ik) −0.83 3.48 3.58 −0.37 2.09 2.12 −0.32 1.42 1.45
0.50 θ̂n(pl) −0.81 3.44 3.53 −0.38 2.06 2.10 −0.32 1.41 1.44
0.75 θ̂n(ik) −1.62 2.15 2.70 −0.77 1.14 1.37 −0.51 0.76 0.92
0.75 θ̂n(pl) −1.86 2.14 2.84 −0.89 1.14 1.44 −0.57 0.77 0.96
0.25 θ̂n(ik) −0.37 4.56 4.57 −0.16 2.79 2.80 −0.22 1.90 1.91
0.25 θ̂n(pl) −0.26 4.54 4.54 −0.13 2.79 2.79 −0.20 1.90 1.91
4 0.50 θ̂n(ik) −0.76 3.48 3.56 −0.30 2.13 2.15 −0.25 1.43 1.46
0.50 θ̂n(pl) −0.73 3.43 3.50 −0.31 2.11 2.13 −0.25 1.42 1.44
0.75 θ̂n(ik) −1.11 2.05 2.34 −0.48 1.15 1.24 −0.30 0.76 0.81
0.75 θ̂n(pl) −1.33 2.01 2.41 −0.58 1.14 1.28 −0.35 0.76 0.83
4. Application
To illustrate the proposed methods let us consider daily log returns of USD/CZK (US Dollar/Czech Koruna) and GBP/CZK
(British Pound/Czech Koruna) exchange rates from January 4, 2010 to December 31, 2012. Note that we take only data until
the end of 2012 (a total of 758 observations for each series), because in November 2013 the Czech National Bank started interventions targeting the CZK/EUR exchange rate.
Daily foreign exchange rates have been successfully modeled using the nonparametric autoregression, e.g., in [18,36].
Here, we apply a simple model of two separate nonparametric autoregressions of order 1 and search for a feasible copula for
the innovations. The conditional means and variances are modeled using local polynomials with degree p = 1. The weights
and the smoothing parameters are chosen as in Section 3. The fitted conditional means and standard deviations are plotted
together with the data in Fig. 1. The conditional mean functions are visibly rather flat and fluctuate around zero.
We use the goodness-of-fit test proposed in Section 2.3 in order to decide which copula should be used for modeling the
innovations from the two autoregressions. The copula parameter is estimated using the inversion of Kendall’s tau method.
The significance of the test statistics is assessed with the help of the bootstrap test based on B = 999 bootstrap samples.
We test the Clayton, Frank, Gumbel, normal and Student copula with 4 degrees of freedom and obtain p-values 0.000, 0.000, 0.001, 0.055 and 0.305, respectively. Hence, we conclude that the Student copula seems to be the best choice for the innovations. The normal copula is also not rejected at the 5% level, but its p-value is rather borderline, so the Student copula seems to provide a better fit. The maximum pseudo-likelihood method estimates 5.156 degrees of freedom and correlation parameter ρ = 0.778. Fig. 2 shows a plot of the pseudo-observations (Ũ1i , Ũ2i ) given by (5), together with contours of the fitted Student copula.
Acknowledgments
The authors are grateful to the Editor-in-Chief, Christian Genest, an Associate Editor and the reviewers for their valuable
comments, which led to an improved manuscript. The first author gratefully acknowledges support from the DFG (Research
Table 3
Estimation for the Gumbel copula with normal marginals (100 multiples of bias, SD and RMSE).

Model  τ     Estim          n = 200                n = 500                n = 1000
                            Bias    SD    RMSE     Bias    SD    RMSE     Bias    SD    RMSE
Known innovations
       0.25  θ̂n(ik,or)     0.01   3.19  3.19      0.13   2.43  2.44      0.08   1.88  1.88
       0.25  θ̂n(pl,or)     0.44   3.01  3.04      0.38   2.37  2.40      0.24   1.81  1.82
       0.50  θ̂n(ik,or)     0.02   2.58  2.58      0.11   1.96  1.97      0.02   1.49  1.49
       0.50  θ̂n(pl,or)     0.24   2.42  2.43      0.27   1.89  1.91      0.12   1.44  1.44
       0.75  θ̂n(ik,or)     0.02   1.48  1.48      0.06   1.12  1.12      0.00   0.84  0.84
       0.75  θ̂n(pl,or)    −0.06   1.35  1.36      0.02   1.05  1.05     −0.03   0.78  0.78
0.25 θ̂n(ik) −0.36 4.76 4.78 0.06 3.06 3.05 −0.09 2.06 2.06
0.25 θ̂n(pl) 0.24 4.68 4.68 0.37 2.92 2.94 0.08 2.01 2.01
1 0.50 θ̂n(ik) −0.56 3.92 3.96 −0.17 2.45 2.46 −0.22 1.69 1.70
0.50 θ̂n(pl) −0.36 3.83 3.84 −0.10 2.35 2.35 −0.20 1.65 1.66
0.75 θ̂n(ik) −0.85 2.36 2.50 −0.52 1.42 1.51 −0.49 1.01 1.12
0.75 θ̂n(pl) −1.35 2.32 2.69 −0.84 1.36 1.60 −0.73 0.99 1.22
0.25 θ̂n(ik) −0.16 4.58 4.58 0.02 2.91 2.91 0.04 2.10 2.10
0.25 θ̂n(pl) 0.49 4.42 4.45 0.32 2.86 2.88 0.20 2.03 2.04
2 0.50 θ̂n(ik) −0.66 3.77 3.82 −0.14 2.36 2.36 −0.09 1.67 1.68
0.50 θ̂n(pl) −0.50 3.61 3.64 −0.09 2.30 2.30 −0.05 1.62 1.62
0.75 θ̂n(ik) −1.61 2.50 2.97 −0.52 1.43 1.52 −0.32 0.99 1.04
0.75 θ̂n(pl) −2.37 2.52 3.46 −0.95 1.45 1.73 −0.55 0.98 1.13
0.25 θ̂n(ik) −0.18 4.57 4.57 0.01 2.93 2.92 0.02 2.11 2.11
0.25 θ̂n(pl) 0.46 4.41 4.43 0.31 2.87 2.88 0.18 2.03 2.03
3 0.50 θ̂n(ik) −0.66 3.73 3.78 −0.18 2.36 2.37 −0.16 1.69 1.70
0.50 θ̂n(pl) −0.50 3.59 3.62 −0.13 2.31 2.32 −0.13 1.63 1.64
0.75 θ̂n(ik) −1.52 2.48 2.90 −0.58 1.41 1.53 −0.42 0.98 1.07
0.75 θ̂n(pl) −2.20 2.44 3.29 −0.98 1.40 1.71 −0.64 0.96 1.15
0.25 θ̂n(ik) −0.26 4.60 4.60 −0.06 2.97 2.97 0.04 2.12 2.12
0.25 θ̂n(pl) 0.30 4.47 4.47 0.19 2.89 2.89 0.18 2.04 2.05
4 0.50 θ̂n(ik) −0.63 3.79 3.84 −0.13 2.36 2.37 −0.11 1.69 1.69
0.50 θ̂n(pl) −0.56 3.63 3.67 −0.16 2.31 2.32 −0.13 1.65 1.65
0.75 θ̂n(ik) −0.83 2.38 2.52 −0.29 1.40 1.43 −0.21 0.97 0.99
0.75 θ̂n(pl) −1.51 2.35 2.79 −0.71 1.41 1.57 −0.45 0.95 1.05
Fig. 1. Fitted conditional mean and variance for the analyzed log returns.
Unit FOR 1735 Structural Inference in Statistics: Adaptation and Efficiency). The second author gratefully acknowledges
support from the grant GACR 15-04774Y. The research of the third author was supported by the grant GACR 18-01781Y.
Table 4
Estimation for the normal copula with normal marginals (100 multiples of bias, SD and RMSE).

                                   n = 200               n = 500               n = 1000
Model   τ     Estim             Bias    SD  RMSE      Bias    SD  RMSE      Bias    SD  RMSE
Known innovations
        0.25  θ̂n^(ik,or)      −0.02  3.13  3.13     −0.05  2.32  2.31     −0.03  1.78  1.77
        0.25  θ̂n^(pl,or)       0.38  2.99  3.02      0.22  2.19  2.20      0.13  1.66  1.67
        0.50  θ̂n^(ik,or)      −0.01  2.44  2.44     −0.04  1.81  1.81     −0.02  1.39  1.39
        0.50  θ̂n^(pl,or)       0.32  2.26  2.28      0.19  1.67  1.68      0.12  1.27  1.27
        0.75  θ̂n^(ik,or)      −0.01  1.36  1.36     −0.02  1.01  1.01     −0.01  0.77  0.77
        0.75  θ̂n^(pl,or)      −0.04  1.23  1.23     −0.03  0.91  0.91     −0.01  0.69  0.69
1       0.25  θ̂n^(ik)         −0.29  4.65  4.66     −0.07  2.83  2.83     −0.15  1.99  2.00
        0.25  θ̂n^(pl)          0.35  4.49  4.50      0.19  2.72  2.72      0.02  1.89  1.89
        0.50  θ̂n^(ik)         −0.48  3.67  3.70     −0.23  2.22  2.23     −0.25  1.56  1.58
        0.50  θ̂n^(pl)          0.00  3.40  3.40     −0.05  2.08  2.08     −0.13  1.44  1.44
        0.75  θ̂n^(ik)         −0.78  2.17  2.30     −0.53  1.27  1.38     −0.47  0.88  1.00
        0.75  θ̂n^(pl)         −0.94  2.02  2.23     −0.64  1.19  1.35     −0.52  0.81  0.96
2       0.25  θ̂n^(ik)         −0.34  4.39  4.40     −0.12  2.80  2.80     −0.10  1.94  1.94
        0.25  θ̂n^(pl)          0.38  4.21  4.22      0.22  2.72  2.72      0.10  1.83  1.83
        0.50  θ̂n^(ik)         −0.70  3.47  3.54     −0.25  2.20  2.21     −0.16  1.53  1.54
        0.50  θ̂n^(pl)         −0.20  3.21  3.22     −0.01  2.06  2.06     −0.01  1.40  1.40
        0.75  θ̂n^(ik)         −1.54  2.25  2.73     −0.59  1.31  1.43     −0.34  0.86  0.93
        0.75  θ̂n^(pl)         −1.80  2.14  2.80     −0.71  1.23  1.43     −0.39  0.79  0.88
3       0.25  θ̂n^(ik)         −0.38  4.41  4.42     −0.15  2.80  2.81     −0.13  1.95  1.96
        0.25  θ̂n^(pl)          0.33  4.23  4.24      0.18  2.72  2.73      0.06  1.83  1.83
        0.50  θ̂n^(ik)         −0.71  3.48  3.55     −0.32  2.19  2.21     −0.22  1.52  1.53
        0.50  θ̂n^(pl)         −0.21  3.20  3.21     −0.08  2.05  2.06     −0.07  1.39  1.39
        0.75  θ̂n^(ik)         −1.45  2.19  2.63     −0.70  1.29  1.46     −0.43  0.87  0.97
        0.75  θ̂n^(pl)         −1.70  2.07  2.67     −0.81  1.21  1.46     −0.48  0.79  0.93
4       0.25  θ̂n^(ik)         −0.34  4.40  4.41     −0.15  2.81  2.81     −0.11  1.96  1.97
        0.25  θ̂n^(pl)          0.30  4.24  4.25      0.16  2.72  2.72      0.07  1.84  1.84
        0.50  θ̂n^(ik)         −0.69  3.47  3.53     −0.27  2.20  2.22     −0.18  1.54  1.55
        0.50  θ̂n^(pl)         −0.26  3.23  3.24     −0.09  2.07  2.07     −0.07  1.41  1.41
        0.75  θ̂n^(ik)         −0.82  2.19  2.34     −0.35  1.29  1.34     −0.22  0.87  0.89
        0.75  θ̂n^(pl)         −1.14  2.08  2.37     −0.52  1.23  1.33     −0.31  0.81  0.86
Fig. 2. Pseudo-observations (Ũ1i , Ũ2i ) given by (5) together with contours of the fitted Student copula (black curves).
Table 5
Estimation for the Student copula with normal marginals (100 multiples of bias, SD and RMSE).

                                   n = 200               n = 500               n = 1000
Model   τ     Estim             Bias    SD  RMSE      Bias    SD  RMSE      Bias    SD  RMSE
Known innovations
        0.25  θ̂n^(ik,or)      −0.28  3.53  3.53      0.00  2.68  2.68      0.07  1.99  1.99
        0.25  θ̂n^(pl,or)       0.03  3.48  3.48      0.23  2.61  2.62      0.21  1.97  1.98
        0.50  θ̂n^(ik,or)      −0.19  2.81  2.82     −0.01  2.14  2.14      0.05  1.60  1.60
        0.50  θ̂n^(pl,or)       0.05  2.66  2.66      0.19  2.00  2.00      0.17  1.51  1.52
        0.75  θ̂n^(ik,or)      −0.10  1.62  1.62     −0.01  1.23  1.23      0.02  0.93  0.93
        0.75  θ̂n^(pl,or)      −0.16  1.46  1.47     −0.02  1.09  1.09      0.02  0.83  0.83
1       0.25  θ̂n^(ik)         −0.25  4.93  4.93     −0.18  3.30  3.30     −0.11  2.28  2.28
        0.25  θ̂n^(pl)          0.24  4.96  4.96      0.08  3.32  3.32      0.00  2.27  2.27
        0.50  θ̂n^(ik)         −0.48  3.95  3.97     −0.34  2.62  2.64     −0.24  1.81  1.83
        0.50  θ̂n^(pl)         −0.17  3.82  3.82     −0.18  2.57  2.57     −0.20  1.74  1.75
        0.75  θ̂n^(ik)         −0.79  2.33  2.46     −0.64  1.56  1.68     −0.49  1.06  1.17
        0.75  θ̂n^(pl)         −1.13  2.22  2.48     −0.83  1.47  1.69     −0.66  0.99  1.19
2       0.25  θ̂n^(ik)         −0.61  4.99  5.03     −0.20  3.23  3.24      0.02  2.22  2.22
        0.25  θ̂n^(pl)         −0.21  4.98  4.98      0.03  3.18  3.18      0.15  2.19  2.20
        0.50  θ̂n^(ik)         −0.89  4.01  4.11     −0.35  2.62  2.64     −0.08  1.79  1.79
        0.50  θ̂n^(pl)         −0.80  3.86  3.94     −0.24  2.45  2.46     −0.01  1.69  1.69
        0.75  θ̂n^(ik)         −1.66  2.57  3.06     −0.70  1.55  1.70     −0.30  1.06  1.10
        0.75  θ̂n^(pl)         −2.37  2.48  3.42     −0.99  1.44  1.75     −0.46  0.97  1.07
3       0.25  θ̂n^(ik)         −0.59  5.01  5.05     −0.24  3.22  3.23     −0.01  2.23  2.23
        0.25  θ̂n^(pl)         −0.21  4.97  4.98     −0.01  3.18  3.18      0.12  2.20  2.20
        0.50  θ̂n^(ik)         −0.90  4.07  4.16     −0.43  2.59  2.63     −0.14  1.79  1.79
        0.50  θ̂n^(pl)         −0.79  3.88  3.96     −0.33  2.44  2.46     −0.08  1.68  1.69
        0.75  θ̂n^(ik)         −1.60  2.61  3.06     −0.76  1.55  1.73     −0.39  1.06  1.13
        0.75  θ̂n^(pl)         −2.20  2.48  3.31     −1.05  1.43  1.77     −0.56  0.97  1.12
4       0.25  θ̂n^(ik)         −0.63  5.03  5.07     −0.23  3.26  3.27      0.00  2.22  2.22
        0.25  θ̂n^(pl)         −0.28  4.97  4.97     −0.01  3.22  3.22      0.12  2.19  2.20
        0.50  θ̂n^(ik)         −0.91  4.06  4.16     −0.38  2.61  2.64     −0.09  1.78  1.78
        0.50  θ̂n^(pl)         −0.81  3.82  3.91     −0.32  2.45  2.47     −0.07  1.68  1.68
        0.75  θ̂n^(ik)         −0.97  2.51  2.69     −0.42  1.55  1.61     −0.17  1.05  1.06
        0.75  θ̂n^(pl)         −1.47  2.34  2.76     −0.70  1.44  1.60     −0.33  0.96  1.02
where Ĝ1n and Ĝ2n denote the marginals of Ĝn . Further Ĝn is a distribution function on [0, 1]² with marginal cdfs
satisfying Ĝ1n (0) = Ĝ2n (0) = 0. Thus one can make use of the Hadamard differentiability of the 'copula mapping'
Φ : G ↦ G(G1^{−1}, G2^{−1}) proved in Theorem 2.4 of Bücher and Volgushev [4] provided that we show that the process defined,
Denote

G_n^{(or)}(u1, u2) = (1/n) ∑_{i=1}^n 1{ε1i ≤ F1ε^{−1}(u1), ε2i ≤ F2ε^{−1}(u2)},

G̃_n^{(or)}(u1, u2) = (1/Wn) ∑_{i=1}^n wni 1{ε1i ≤ F1ε^{−1}(u1), ε2i ≤ F2ε^{−1}(u2)}.
where (in agreement with the last two conditions in (Fε )) for u1 ∈ {0, 1} the first term on the right-hand side of (A.2) is
defined as zero and analogously for u2 ∈ {0, 1}.
In Appendix A.3, we will show the asymptotic negligibility of the second term on the right-hand side of (A.1), i.e.,
sup_{(u1,u2)∈[0,1]²} |√n {G̃_n^{(or)}(u1, u2) − G_n^{(or)}(u1, u2)}| = oP(1).   (A.3)
Now combining (A.1) with (A.2) and (A.3) yields that, uniformly in (u1, u2),

Ĝn(u1, u2) = An(u1, u2) + Bn(u1, u2) + oP(1),   (A.4)

where
An(u1, u2) = (1/√n) ∑_{i=1}^n [1{U1i ≤ u1, U2i ≤ u2} − C(u1, u2)],   (A.5)

Bn(u1, u2) = (1/√n) ∑_{i=1}^n ∑_{j=1}^2 C^{(j)}(u1, u2) fjε{Fjε^{−1}(uj)} {εji + Fjε^{−1}(uj)(εji² − 1)/2}.
The asymptotic representation (A.4) together with standard techniques yields the weak convergence of the process Ĝn .
Now thanks to the Hadamard differentiability of the copula functional and Theorem 3.9.4 in [33],
√n {C̃n(u1, u2) − C(u1, u2)} = √n [Ĝn{Ĝ1n^{−1}(u1), Ĝ2n^{−1}(u2)} − C(u1, u2)]

= Ĝn(u1, u2) − C^{(1)}(u1, u2) Ĝn(u1, 1) − C^{(2)}(u1, u2) Ĝn(1, u2) + oP(1).   (A.6)
Bn (u1 , u2 ) − C (1) (u1 , u2 )Bn (u1 , 1) − C (2) (u1 , u2 )Bn (1, u2 ) = 0. (A.7)
where
G = {a : R^d → R : a ∈ C1^{d+δ}(R^d), sup_x ∥x∥^ν |a(x)| ≤ 1},   (A.9)

G̃ = {b : R^d → R : b ∈ C̃2^{d+δ}(R^d), sup_x ∥x∥^ν |b(x) − 1| ≤ 1}   (A.10)

and for δ from assumption (Bw) and some ν large enough such that

b{d/ν + d/(d + δ)}/(b − 1) < 1.   (A.11)
Denote the centered process as
Z̄n (f ) = Zn (f ) − E Zn (f ), (A.12)
and note that f ∈ F is formally identified by (c, z1, z2, a1, b1, a2, b2). We will use the notation f =̂ (c, z1, z2, a1, b1, a2, b2).
In accordance with van der Vaart and Wellner [34], the notation Z̄n (fn ) for random fn is understood to mean the value of the
mapping f ↦ Z̄n(f) evaluated at fn. Consider the semi-norm given by

∥f∥²_{2,β} = ∫_0^1 β^{−1}(u) Q_f²(u) du,
where
β^{−1}(u) = inf{x > 0 : β_{⌊x⌋} ≤ u},   Q_f(u) = inf{x > 0 : Pr(|f(ε11, ε21, X1)| > x) ≤ u}.

From assumption (β) one obtains that β^{−1}(u) ≤ c u^{−1/b} for some constant c. Further denote
P|f − g| = E|f(X1, ε11, ε21) − g(X1, ε11, ε21)| = Pr(|f(X1, ε11, ε21) − g(X1, ε11, ε21)| > 0).
As F consists of indicator functions, for f, g ∈ F one has Q_{f−g}(u) = 1(0 < u < P|f − g|). Thus one obtains, for ϵ < 1,
∥f − g∥²_{2,β} ≤ c ∫_0^{P|f−g|} u^{−1/b} du = (cb/(b − 1)) (P|f − g|)^{1−1/b}.   (A.13)
Starting with brackets of ∥·∥2-length ϵ^{2b/(b−1)} of the function classes G, G̃ and {x ↦ 1{x ∈ [−c, c]^d} : c ∈ R+}, it is then easy to construct brackets for F with ∥·∥_{2,β}-length ϵ; compare with the proof of Lemma 1 in [8]. Thus one obtains
ln{N_{[ ]}(ϵ, F, ∥·∥_{2,β})} ≤ ln{O(ϵ^{−2db/(b−1)}) N_{[ ]}(ϵ^{2b/(b−1)}, G, ∥·∥2) N_{[ ]}(ϵ^{2b/(b−1)}, G̃, ∥·∥2)}

≤ O{ln(ϵ)} + O(ϵ^{−2(b/(b−1))(d/ν + d/(d+δ))}),   (A.14)
where the rate follows from Lemma 2 in Appendix C. Further one bracket is sufficient for ϵ ≥ 1. Thus by (A.14) and (A.11),
∫_0^∞ √(ln N_{[ ]}(ϵ, F, ∥·∥_{2,β})) dϵ < ∞.
From Dedecker and Louhichi [7], Section 4.3, it follows that the centered process Z̄n given by (A.12) is asymptotically ∥ · ∥2,β -
equicontinuous. To apply this result in order to prove (A.2) note that
G̃n(u1, u2) = (√n/Wn) ∑_{i=1}^n wni [1{ε1i ≤ F1ε^{−1}(u1) σ̂1(Xi)/σ1(Xi) + (m̂1(Xi) − m1(Xi))/σ1(Xi),
ε2i ≤ F2ε^{−1}(u2) σ̂2(Xi)/σ2(Xi) + (m̂2(Xi) − m2(Xi))/σ2(Xi)}
− 1{ε1i ≤ F1ε^{−1}(u1), ε2i ≤ F2ε^{−1}(u2)}]
and introduce the process
Ǧn(u1, u2) = (1/√n) ∑_{i=1}^n wni [1{ε1i ≤ F1ε^{−1}(u1) b̂1(Xi) + â1(Xi), ε2i ≤ F2ε^{−1}(u2) b̂2(Xi) + â2(Xi)}
− 1{ε1i ≤ F1ε^{−1}(u1), ε2i ≤ F2ε^{−1}(u2)}]
We only consider the upper bound; the lower one can be handled completely analogously. First note that Zn (fnu ) − Zn (gn ) =
Z̄n (fnu ) − Z̄n (gn ) + Rn , where with probability converging to 1,
|Rn| ≤ 2√n max_{j∈{1,2}} sup_{u∈R, s∈{−1,1}, v∈(1/2,1), w∈(−1,1)} |Fjε(uv + w) − Fjε{u(v + sγn) + w + γn}| = o(1),   (A.15)
where the last equality follows by a Taylor expansion, assumption (Fε ) and γn = o(n−1/2 ). Now for j ∈ {1, 2}, introduce the
notation
and thus |Z̄n (fnu ) − Z̄n (gn )| = oP (1) uniformly with respect to u1 , u2 . In combination with (A.15), analogous considerations
for the lower bound Zn (fnℓ ) − Zn (gn ) and the fact that
Wn /n = 1 + oP (1), (A.17)
we obtain
Further, thanks to (A.17), it is sufficient to show that the process Ǧn (u1 , u2 ) has the asymptotic representation given by the
right-hand side of (A.2).
Thus the remaining proof of (A.2) is divided into two parts. First we prove that
and then we calculate E∗ {Ǧn (u1 , u2 )}. Here, with slight abuse of notation, E∗ denotes expectation, considering the functions
âj , b̂j as deterministic.
where
with 0 and 1 standing for functions that are constantly equal to 0 and 1, respectively. Similarly to before one can show that
for a sufficiently large M
∥fn − gn∥_{2,β} ≤ M (E[|F1ε{F1ε^{−1}(u1, X1, 0)} − u1| 1(X1 ∈ Jn)] + E[|F2ε{F2ε^{−1}(u2, X1, 0)} − u2| 1(X1 ∈ Jn)])^{1−1/b}
using notation (A.16). Now note that with Lemma 1(iii) in Appendix C, we obtain ∥fn − gn ∥2,β = oP (1) uniformly in u1 , u2 .
Finally with the help of (A.17), (A.19) and the asymptotic ∥ · ∥2,β -equicontinuity of the process Z̄n , one can conclude (A.18).
To simplify the notation and to prevent the confusion let the random vector X have the same distribution as X 1 . With the
help of a second-order Taylor series expansion of the right-hand side, one gets
E*{Ǧn(u1, u2)} = √n E*{wn(X)[Fε{F1ε^{−1}(u1, X, 0), F2ε^{−1}(u2, X, 0)} − Fε{F1ε^{−1}(u1), F2ε^{−1}(u2)}]}

= √n ∑_{j=1}^2 E*[wn(X) Fε^{(j)}{F1ε^{−1}(u1), F2ε^{−1}(u2)} Y_{jX}(uj)]   (A.20)

+ (1/2)√n ∑_{j=1}^2 ∑_{k=1}^2 E*[wn(X) Fε^{(j,k)}{F1ε^{−1}(u_{1X}), F2ε^{−1}(u_{2X})} Y_{jX}(uj) Y_{kX}(uk)],
where

Y_{jx}(u) = Fjε^{−1}(u, x, 0) − Fjε^{−1}(u) = âj(x) + Fjε^{−1}(u) {b̂j(x) − 1},

and the point u_{jx} lies between the points Fjε{Fjε^{−1}(uj, x, 0)} and uj. Now using Lemma 1(iv) in Appendix C, for j ∈ {1, 2},
√n E*[wn(X) Fε^{(j)}{F1ε^{−1}(u1), F2ε^{−1}(u2)} Y_{jX}(uj)]

= √n Fε^{(j)}{F1ε^{−1}(u1), F2ε^{−1}(u2)} [E*{âj(X) 1(X ∈ Jn)} + Fjε^{−1}(uj) E*[{b̂j(X) − 1} 1(X ∈ Jn)]]

= Fε^{(j)}{F1ε^{−1}(u1), F2ε^{−1}(u2)} (1/√n) ∑_{i=1}^n {εji + Fjε^{−1}(uj)(εji² − 1)/2} + oP(1)

= (1/√n) ∑_{i=1}^n C^{(j)}(u1, u2) fjε{Fjε^{−1}(uj)} {εji + Fjε^{−1}(uj)(εji² − 1)/2} + oP(1)
uniformly in (u1 , u2 ).
To conclude the proof of (A.2) we need to show that ‘the second order terms’ in (A.20) are asymptotically negligible. To
show that note that by assumption (Fε ) and Lemma 1(iii) there exists a finite constant M such that with probability going
to 1,
|Fε^{(j,k)}{F1ε^{−1}(u_{1x}), F2ε^{−1}(u_{2x})}| {1 + |Fjε^{−1}(uj)|} {1 + |Fkε^{−1}(uk)|}

= |Fε^{(j,k)}{F1ε^{−1}(u_{1x}), F2ε^{−1}(u_{2x})}| {1 + |Fjε^{−1}(u_{jx})|} {1 + |Fkε^{−1}(u_{kx})|} ·
[{1 + |Fjε^{−1}(uj)|} {1 + |Fkε^{−1}(uk)|}] / [{1 + |Fjε^{−1}(u_{jx})|} {1 + |Fkε^{−1}(u_{kx})|}]

≤ M [{1 + |Fjε^{−1}(uj)|} {1 + |Fkε^{−1}(uk)|}] / [{1 + |Fjε^{−1}(u_{jx})|} {1 + |Fkε^{−1}(u_{kx})|}]

≤ M [{1 + |Fjε^{−1}(uj)|} {1 + |Fkε^{−1}(uk)|}] / [{1 + |Fjε^{−1}(uj){1 + oP(n^{−1/4})} + oP(n^{−1/4})|} {1 + |Fkε^{−1}(uk){1 + oP(n^{−1/4})} + oP(n^{−1/4})|}] ≤ 2M
uniformly in (u1, u2) ∈ [0, 1]² and x ∈ Jn. Thus to prove

E*[wn(X) Fε^{(j,k)}{F1ε^{−1}(u_{1X}), F2ε^{−1}(u_{2X})} Y_{jX}(uj) Y_{kX}(uk)] = oP(n^{−1/2}),
it is sufficient to use once more Lemma 1(iii).
where Bn1(u1, u2) stands for the first term on the right-hand side of Eq. (A.21) (except for the factor n/Wn − 1) and Bn2(u1, u2)
for the second term. Using standard techniques, one can then show that both Bn1(u1, u2) and Bn2(u1, u2), viewed as processes
on [0, 1]², are asymptotically equicontinuous. To this end, note that Bn1(u1, u2) corresponds to the process Z̄n(f) as defined in
Appendix A.1 above with f =̂ (cn, F1ε^{−1}(u1), F2ε^{−1}(u2), 0, 1, 0, 1). Alternatively, results by Bickel and Wichura [2] can be applied.
Moreover, given that n/Wn − 1 = oP(1) and E{wn(X1) − 1} = −Pr(X1 ∉ Jn) = o(1), one can conclude that both processes
(n/Wn − 1) Bn1 (u1 , u2 ) and Bn2 (u1 , u2 ) are uniformly asymptotically negligible in probability, which together with (A.21)
implies (A.3). □
Thanks to assumption (φ), the estimator θ̂ n is a solution to the estimating equations (6). In what follows, first we prove
the existence of a consistent root of the estimating equations (6) and then we derive that this root satisfies
√n (θ̂n − θ) = Γ^{−1} (1/√n) ∑_{i=1}^n φ(Û1i, Û2i; θ) + oP(1),   (B.1)
where (Û1i, Û2i) was introduced in (8). The statement of the theorem now follows for p = 1 by Proposition A.1(ii) of Genest
et al. [12] and for p > 1 by Theorem 1 of Gijbels et al. [15].
Let

C̃n′(u1, u2) = (1/Wn) ∑_{i=1}^n wni 1(Ũ1i ≤ u1, Ũ2i ≤ u2),
where the pseudo-observations (Ũ1i , Ũ2i ) are defined in (5). Note that
sup_{(u1,u2)∈[0,1]²} |C̃n(u1, u2) − C̃n′(u1, u2)| = OP(1/Wn) = OP(1/n).   (B.2)
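For intuition, the rank-based pseudo-observations and an unweighted empirical copula (the weights wni are omitted here for simplicity) can be sketched numerically; the n + 1 rescaling and all names are illustrative conventions, not the paper's code.

```python
import numpy as np

def pseudo_observations(e):
    """Rank-transform each column to (0, 1): U_ji = rank(e_ji) / (n + 1)."""
    n = e.shape[0]
    ranks = e.argsort(axis=0).argsort(axis=0) + 1  # 1-based ranks per column
    return ranks / (n + 1)

def empirical_copula(u, grid):
    """C_n(u1, u2) = (1/n) * #{i : U_1i <= u1, U_2i <= u2} at grid points."""
    return np.array([np.mean((u[:, 0] <= g1) & (u[:, 1] <= g2)) for g1, g2 in grid])

rng = np.random.default_rng(2)
e = rng.standard_normal((200, 2))  # stand-in for residual pairs
u = pseudo_observations(e)
c = empirical_copula(u, [(0.5, 0.5), (1.0, 1.0)])
```

For independent marginals, the value at (0.5, 0.5) should be close to 0.25, and the value at (1, 1) is exactly 1.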
Fix ℓ ∈ {1, . . . , p}. By Corollary A.7 of Berghaus et al. [1], one finds
(1/Wn) ∑_{i=1}^n wni φℓ(Ũ1i, Ũ2i; t) = ∫_0^1 ∫_0^1 φℓ(v1, v2; t) dC̃n′(v1, v2)

= ∫_0^1 ∫_0^1 C̃n′(v1, v2) dφℓ(v1, v2; t) + φℓ(1, 1; t)

− ∫_0^1 C̃n′(v1, 1) dφℓ(v1, 1; t) − ∫_0^1 C̃n′(1, v2) dφℓ(1, v2; t).   (B.3)
and, analogously,
∫_0^1 C̃n′(1, v2) dφℓ(1, v2; t) = φℓ(1, 1; t) − ∫_0^1 φℓ(1, v2; t) dv2 + OP(1/Wn).   (B.5)
where
Aℓ(t) = −φℓ(1, 1; t) + ∫_0^1 φℓ(v1, 1; t) dv1 + ∫_0^1 φℓ(1, v2; t) dv2.   (B.7)
Analogously, we find
E φℓ(U11, U21; t) = ∫_0^1 ∫_0^1 C(v1, v2) dφℓ(v1, v2; t) + Aℓ(t).   (B.8)
Now using (B.2), (B.6), (B.8) and assumption (φ) gives that, uniformly in t ∈ V (θ ),
(1/Wn) ∑_{i=1}^n wni φℓ(Ũ1i, Ũ2i; t) − E φℓ(U11, U21; t) = ∫_0^1 ∫_0^1 {C̃n′(v1, v2) − C(v1, v2)} dφℓ(v1, v2; t) + OP(1/n) = oP(1),
where we have used Theorem 1 and assumption (φ). The existence of a consistent root of estimating equations (6) now
follows by assumptions (Id) and (Γ).
Analogously, one can show the existence of a consistent root of estimating equations (7). □
Let θ̂ n be a consistent root of the estimating equations (6). Then by the Mean Value Theorem applied to each coordinate
of the vector-valued function, viz.
Ψn(t) = (1/Wn) ∑_{i=1}^n wni φ(Ũ1i, Ũ2i; t)
we find
0p = (1/Wn) ∑_{i=1}^n wni φ(Ũ1i, Ũ2i; θ̂n) = (1/Wn) ∑_{i=1}^n wni φ(Ũ1i, Ũ2i; θ) + [(1/Wn) ∑_{i=1}^n wni Dφ(Ũ1i, Ũ2i; θn*)] (θ̂n − θ),

where Dφ stands for ∂φ(u1, u2; t)/∂t and θn* is between θ̂n and θ. Note that as the Mean Value Theorem is applied to a vector-valued function, the point θn* may differ from coordinate to coordinate.
and
(√n/Wn) ∑_{i=1}^n wni φ(Ũ1i, Ũ2i; θ) = (1/√n) ∑_{i=1}^n φ(Û1i, Û2i; θ) + oP(1).   (B.10)
When proving (B.9) one can mimic the proof of consistency of θ̂ n and show that there exists an open neighborhood V (θ )
of θ such that
sup_{t∈V(θ)} |(1/n) ∑_{i=1}^n wni Dφ(Ũ1i, Ũ2i; t) − E Dφ(U11, U21; t)| = oP(1).
Using the consistency of θ̂ n and assumption (Γ) yields (B.9). Thus one can concentrate on proving (B.10). Set
Cn′^{(or)}(u1, u2) = (1/n) ∑_{i=1}^n 1(Û1i ≤ u1, Û2i ≤ u2),
Analogously as (B.6), one can also show that, for ℓ ∈ {1, . . . , p},
(1/n) ∑_{i=1}^n φℓ(Û1i, Û2i; θ) = ∫_0^1 ∫_0^1 φℓ(v1, v2; θ) dCn′^{(or)}(v1, v2)

= ∫_0^1 ∫_0^1 Cn′^{(or)}(v1, v2) dφℓ(v1, v2; θ) + Aℓ(θ) + OP(1/n),   (B.12)
where Aℓ (θ ) is given in (B.7). Now using (B.2), (B.6), (B.11), (B.12), Theorem 1 and (φ), one finds
(√n/Wn) ∑_{i=1}^n wni φℓ(Ũ1i, Ũ2i; θ) − (1/√n) ∑_{i=1}^n φℓ(Û1i, Û2i; θ)

= √n ∫_0^1 ∫_0^1 {C̃n′(u1, u2) − Cn′^{(or)}(u1, u2)} dφℓ(u1, u2; θ) + OP(n^{−1/2}) = oP(1),
Lemma 1. Assume that (β), (Fε ), (M), (FX ), (Bw), (k), (Jn ) and (mσ ) are satisfied. Then there exist random functions âj and b̂j on
Jn such that, for j ∈ {1, 2},
(i) sup_{x∈Jn} |{m̂j(x) − mj(x)}/σj(x) − âj(x)| = oP(n^{−1/2}) and sup_{x∈Jn} |σ̂j(x)/σj(x) − b̂j(x)| = oP(n^{−1/2});
(ii) ∥âj∥_{d+δ} = oP(1) and ∥b̂j − 1∥_{d+δ} = oP(1) for δ > 0 from assumption (Bw);
(iii) sup_{x∈Jn} |âj(x)| = oP(n^{−1/4}) and sup_{x∈Jn} |b̂j(x) − 1| = oP(n^{−1/4});
(iv) ∫_{Jn} âj(x) fX(x) dx = ∑_{i=1}^n εji/n + oP(n^{−1/2}) and ∫_{Jn} {b̂j(x) − 1} fX(x) dx = ∑_{i=1}^n (εji² − 1)/(2n) + oP(n^{−1/2}).
Proof. For ease of presentation we set j = 1 and assume hn = (hn , . . . , hn ). We will first prove the assertions for m̂1 . The proof
basically goes along the lines of the proof of Lemma 1 by Müller et al. [25], but changes are necessary due to the dependency
of observations in our model and because our covariate density is not assumed to be bounded away from zero on its support.
Recall that I(d, p) denotes the set of multi-indices i = (i1, . . . , id) with i. = i1 + · · · + id ≤ p and we set I = I(d, p), where
p is the order of the polynomials used in the local polynomial estimation. Further introduce J_n^+ = [−cn − hn, cn + hn]^d and
note that thanks to assumption (Bw) there exists q > 0 such that, for all sufficiently large n, the set J_n^+ is a subset of J_{2n}.
Finally define α_n^{(2)} = min_{j∈{1,2}} inf_{x∈Jn} σj(x), which is by assumption (mσ) either bounded away from zero or
converges to zero not faster than a negative power of ln n.
Proof of assertion (i) for m̂1 . Fix some x ∈ Jn and let β̂ denote the solution of the minimization problem (3). Then β̂ satisfies
the normal equations
∀ i ∈ I :   Ai(x) + Bi(x) − ∑_{k∈I} Q̂ik(x) β̂k = 0,
where

Ai(x) = (1/n) ∑_{ℓ=1}^n σ1(Xℓ) ε1ℓ ψ_{i,hn}(Xℓ − x) K_{hn}(Xℓ − x),

Bi(x) = (1/n) ∑_{ℓ=1}^n m1(Xℓ) ψ_{i,hn}(Xℓ − x) K_{hn}(Xℓ − x),

Q̂ik(x) = (1/n) ∑_{ℓ=1}^n ψ_{i,hn}(Xℓ − x) ψ_{k,hn}(Xℓ − x) K_{hn}(Xℓ − x).
where we define Qik(x) = E{Q̂ik(x)} for all i, k ∈ I. Note that

Qik(x) = ∫ ψ_{i,(1,...,1)}(u) ψ_{k,(1,...,1)}(u) fX(x + hn u) K(u) du,
and for x ∈ Jn , consider the matrices Q(x) with entries Qik (x) with i, k ∈ I. Analogously, put Q̂(x) for the matrix with entries
Q̂ik (x) with i, k ∈ I.
It follows from (C.1) that 0 < λn ≤ a⊤ Q(x) a ≤ Λ < ∞ for all vectors a of unit Euclidean length, where λn is a sequence
of positive real numbers of the same rate as α_n^{(1)} in (C.1). Thus Q(x) has eigenvalues in the interval [λn, Λ], and on the event

En = {sup_{x∈Jn} ∥Q̂(x) − Q(x)∥ ≤ λn/2}

one has a⊤ Q̂(x) a ≥ λn/2 for all a of unit Euclidean length, so that the matrix Q̂(x) is invertible as well. Here and throughout,
∥Q∥ denotes the spectral norm of a matrix Q. Note that Pr(En) → 1 by (C.2) and ϱn = o(α_n^{(1)}), which holds under assumption
(Bw). For the remainder of the proof, we assume that the event En takes place because its complement does not matter for
the assertions of the lemma. It follows from the normal equations that, for x ∈ Jn,
where e1 = (1, 0, . . . , 0)⊤ and A(x) and B(x) denote the vectors with components Ai (x) and Bi (x) with i ∈ I, respectively.
Now define, for x ∈ Jn,

r1(x) = e1⊤ {Q̂^{−1}(x) − Q^{−1}(x)} A(x)/σ1(x),   r2(x) = e1⊤ Q̂^{−1}(x) {B(x) − Q̂(x) β(x)}/σ1(x),

where β(x) is the vector with components βi(x) = hn^{i.} D^i m1(x) with i ∈ I. From Theorem 2 in [17], we obtain
for all i ∈ I. For the treatment of the inverse matrices in r1(x), we use Cramér's rule and write

Q̂^{−1}(x) − Q^{−1}(x) = {Ĉ(x)}⊤/det{Q̂(x)} − {C(x)}⊤/det{Q(x)}

= [det{Q(x)} − det{Q̂(x)}] / [det{Q̂(x)} det{Q(x)}] · {Ĉ(x)}⊤ + {Ĉ(x) − C(x)}⊤/det{Q(x)},
where Ĉ(x) and C(x) denote the cofactor matrices of Q̂(x) and Q(x), respectively. Due to the boundedness of the functions Qik,
each element of Ĉ(x) − C(x) can be absolutely bounded by OP(ϱn) by (C.2), and the same rate is obtained for
|det{Q(x)} − det{Q̂(x)}|, uniformly in x. Using the lower bound λn^{|I|} for the determinant of Q(x), and assumption (mσ) to
bound 1/σ1, gives the rate

sup_{x∈Jn} |r1(x)| = OP(1) (ϱn)² / {(α_n^{(1)})^{2|I|} α_n^{(2)}} = oP(n^{−1/2})   (C.6)
by assumption (Bw). In order to show negligibility of r2 (x) first note that the spectral norm of Q̂−1 (x) is given by the reciprocal
of the square root of the smallest eigenvalue of Q̂(x)⊤ Q̂(x). With
{a⊤ Q̂(x)⊤ Q̂(x) a}1/2 = ∥Q̂(x)a∥ ≥ {a⊤ Q(x)⊤ Q(x) a}1/2 − ∥Q̂(x) − Q(x)∥ ≥ λn /2
using assumption (Bw). Now assertion (i) for m̂1 follows from (C.3), (C.4), (C.6) and (C.7).
N. Neumeyer, M. Omelka and Š. Hudecová / Journal of Multivariate Analysis 171 (2019) 139–162 159
Proof of assertion (ii) for â1. Note that p ≥ d and thus â1 is (d + 1)-times partially differentiable and

∥â1∥_{d+δ} = max_{i∈I(d,d)} sup_{x∈Jn} |D^i â1(x)|
+ max_{i∈I(d,d), i.=d} max{ sup_{x,x′∈Jn, ∥x−x′∥≤hn} |D^i â1(x) − D^i â1(x′)|/∥x − x′∥^δ , sup_{x,x′∈Jn, ∥x−x′∥>hn} |D^i â1(x) − D^i â1(x′)|/∥x − x′∥^δ }

≤ max_{i∈I(d,d)} sup_{x∈Jn} |D^i â1(x)| + max_{i∈I(d,d+1), i.=d+1} sup_{x∈Jn} |D^i â1(x)| hn^{1−δ} + 2 max_{i∈I(d,d), i.=d} sup_{x∈Jn} |D^i â1(x)| hn^{−δ}

by the Mean Value Theorem. Again by Theorem 2 of [17] we have, for all i ∈ I(d, d + 1),

sup_{x∈Jn} |hn^{i.} D^i Ak(x)| = OP(ϱn).
Proof of assertion (iii) for â1. From the definition of â1 and (C.5) we obtain that

sup_{x∈Jn} |â1(x)| = OP(1) ϱn / {α_n^{(1)} α_n^{(2)}} = oP(n^{−1/4})
with

∆n(Xi) = σ1(Xi) ∫ (1/σ1(x)) e1⊤ Q^{−1}(x) ψ_{hn}(Xi − x) K_{hn}(Xi − x) fX(x) dx

= σ1(Xi) ∫_{[(Xi−cn)/hn, (Xi+cn)/hn]} e1⊤ Q^{−1}(Xi − uhn) ψ(u) K(u) fX(Xi − uhn)/σ1(Xi − uhn) du.
note that

(1/n) ∑_{i=1}^n ε1i ∆n(Xi) 1(Xi ∈ J_n^+ \ J_n^−) = oP(n^{−1/2}),   (C.8)

since the second moment of the left-hand side is of order

(1/n) O{Mn hn / (α_n^{(1)} α_n^{(2)})} = o(n^{−1}).
It remains to consider

(1/n) ∑_{i=1}^n ε1i ∆n(Xi) 1(Xi ∈ J_n^−),
with ∆n(Xi) = ∆n^{(1)}(Xi) + ∆n^{(2)}(Xi), where

∆n^{(1)}(Xi) = ∫_{[−1,1]^d} e1⊤ Q^{−1}(Xi − uhn) ψ(u) K(u) fX(Xi − uhn) du,

∆n^{(2)}(Xi) = ∫_{[−1,1]^d} e1⊤ Q^{−1}(Xi − uhn) ψ(u) K(u) fX(Xi − uhn) {σ1(Xi)/σ1(Xi − uhn) − 1} du.

Now, by applying the mean value theorem for σ1, for Xi ∈ J_n^−, |∆n^{(2)}(Xi)| can be bounded by O{Mn hn/(α_n^{(1)} α_n^{(2)})} = o(1). Thus
analogously as when showing (C.8) one can use Markov’s inequality to get
∫_{Jn} â1(x) fX(x) dx − (1/n) ∑_{i=1}^n ε1i = (1/n) ∑_{i=1}^n ε1i {∆n^{(1)}(Xi) − 1} 1(Xi ∈ J_n^−) + oP(n^{−1/2}).
To obtain the desired negligibility it remains to show E[{∆n^{(1)}(Xi) − 1}² 1(Xi ∈ J_n^−)] → 0. To this end we write

1 = ∫_{[−1,1]^d} e1⊤ Q_*^{−1}(x − hn u) ψ(u) K(u) fX(x − hn u) du,
with i, k ∈ I. Note that Q_*(x) has its smallest eigenvalue of order λn. Thus we can write

E[{∆n^{(1)}(Xi) − 1}² 1(Xi ∈ J_n^−)]

= E[ (∫_{[−1,1]^d} e1⊤ {Q^{−1}(Xi − uhn) − Q_*^{−1}(Xi − uhn)} ψ(u) K(u) fX(Xi − uhn) du)² 1(Xi ∈ J_n^−) ]

≤ O(1) ∫_{J_n^−} ∫_{[−1,1]^d} ∥Q^{−1}(x − uhn) − Q_*^{−1}(x − uhn)∥² du fX(x) dx

≤ O(1) sup_{x∈Jn} ∥Q^{−1}(x)∥² sup_{x∈Jn} ∥Q_*^{−1}(x)∥² ∫_{J_n^−} ∫_{[−1,1]^d} ∥Q(x − uhn) − Q_*(x − uhn)∥² du fX(x) dx.
Now with bounds for the matrix norms similar to before, and inserting the definitions of Q and Q∗ , we obtain
E[{∆n^{(1)}(Xi) − 1}² 1(Xi ∈ J_n^−)] ≤ O{1/(α_n^{(1)})^4} ∫_{J_n^−} ∫_{[−1,1]^d} ∫_{[−1,1]^d} |fX(x − uhn + hn v) − fX(x − uhn)|² K(v) dv du fX(x) dx

= O{hn²/(α_n^{(1)})^4} = o(1)
by the Mean Value Theorem and assumptions (FX ) and (k).
Proof of assertions (i)–(iv) for σ̂1. Recall the definition σ̂1² = ŝ1 − m̂1², where ŝ1 is the local polynomial estimator based on
(X1, Y11²), . . . , (Xn, Y1n²). With the notation s1(x) = E(Y1i² | Xi = x) = σ1²(x) + m1²(x), we obtain

σ̂1(x)/σ1(x) = 1 + (ŝ1(x) − s1(x))/(2σ1²(x)) − {(m̂1(x) − m1(x))/σ1(x)} · {m1(x)/σ1(x)} + r(x),
where
r(x) = −(1/2) {m̂1(x) − m1(x)}²/σ1²(x) − {σ̂1²(x) − σ1²(x)}² / [2σ1²(x) {σ̂1(x) + σ1(x)}²].
For x ∈ Jn , set
with i ∈ I. Along the lines of the proof of (i) and (ii) for m̂1, one can prove that

sup_{x∈Jn} |(ŝ1(x) − s1(x))/(2σ1²(x)) − ĉ1(x)| = OP(1) (ϱn)² / {(α_n^{(1)})^{|I|} (α_n^{(2)})²} + OP(Mn hn^{p+1} / {α_n^{(1)} (α_n^{(2)})²}) = oP(n^{−1/2})
and
and

sup_{x∈Jn} |ĉ1(x)| = OP(1) ϱn / {α_n^{(1)} (α_n^{(2)})²} = oP(n^{−1/4}).
Now noticing that σ̂1²(x) − σ1²(x) = ŝ1(x) − s1(x) − {m̂1(x) − m1(x)}{m̂1(x) + m1(x)}, we obtain the rate

sup_{x∈Jn} |r(x)| = oP(n^{−1/2}) + OP(1) ϱn² / {(α_n^{(1)})² (α_n^{(2)})^4} = oP(n^{−1/2})
But the second sum is also the dominating term in ∫_{Jn} â1(x) {m1(x)/σ1(x)} fX(x) dx, which is again shown analogously to the
proof of (iv) for â1. Thus (iv) follows for b̂1. □
Remark 5. Note that due to property (iii) of Lemma 1 and (C.1), we have, for x ∈ Jn = [−cn, cn]^d,

∥x∥^ν |â1(x)| ≤ O{(ln n)^{ν/d}} o(n^{−1/4}) = o(1)

for every ν > 0. In the proof of Lemma 1, â1(x) was only defined for x ∈ Jn. Now we define â1 on R^d in such a way that if
â1 ∈ C1^{d+δ}(Jn) and ∥x∥^ν |â1(x)| ≤ 1, then â1 ∈ G defined in (A.9). Then Pr(â1 ∈ G) → 1 by Lemma 1. Analogously b̂1 is defined
on R^d such that Pr(b̂1 ∈ G̃) → 1 for G̃ from (A.10).
Lemma 2. Let H = G or G̃ denote one of the function classes defined in (A.9) and (A.10) (depending on ν > 0 and δ ∈ (0, 1]);
then we have

ln N(ϵ, H, ∥·∥∞) = O[ϵ^{−{d/ν + d/(d+δ)}}].
Proof. Let H = G (the proof is similar for G̃) and let ϵ > 0. Choose D = D(ϵ) = ϵ^{−1/ν}. Let B denote the ball of radius D around
the origin. Let a1, . . . , am : B → R denote the centers of ϵ-balls with respect to the supremum norm that cover C1^{d+δ}(B),
i.e., m = N(ϵ, C1^{d+δ}(B), ∥·∥∞). Then for each a ∈ G we have a|B ∈ C1^{d+δ}(B) and thus there exists j0 ∈ {1, . . . , m} such that
sup_{x∈B} |a(x) − a_{j0}(x)| ≤ ϵ. Now for each j ∈ {1, . . . , m}, define aj(x) = 0 for x ∈ R^d \ B. Then
sup_{x∈R^d} |a(x) − a_{j0}(x)| ≤ max{ϵ, sup_{∥x∥≥D} |a(x)|} ≤ ϵ

because ∥x∥^ν |a(x)| ≤ 1 by definition of G. We obtain N(ϵ, G, ∥·∥∞) ≤ m and due to Theorem 2.7.1 in van der Vaart and
Wellner [33], we have, for some universal K,
Wellner [33], we have, for some universal K ,
ln m ≤ K λd(B1) ϵ^{−d/(d+δ)} = O{(D + 2)^d} ϵ^{−d/(d+δ)} = O[ϵ^{−{d/ν + d/(d+δ)}}],
where B1 = {x : ∥x − B∥ < 1}. Thus the first assertion follows. The second assertion follows by the proof of Corollary 2.7.2
in [33]. □
References
[1] B. Berghaus, A. Bücher, S. Volgushev, Weak convergence of the empirical copula process with respect to weighted metrics, Bernoulli 23 (2017) 743–772.
[2] P.J. Bickel, M.J. Wichura, Convergence criteria for multiparameter stochastic processes and some applications, Ann. Math. Stat. 42 (1971) 1656–1670.
[3] B. Brahimi, A. Necir, A semiparametric estimation of copula models based on the method of moments, Stat. Methodol. 9 (2012) 467–477.
[4] A. Bücher, S. Volgushev, Empirical and sequential empirical copula processes under serial dependence, J. Multivariate Anal. 119 (2013) 61–70.
[5] N.-H. Chan, J. Chen, X. Chen, Y. Fan, L. Peng, Statistical inference for multivariate residual copula of GARCH models, Statist. Sinica 19 (2009) 53–70.
[6] X. Chen, Y. Fan, Estimation and model selection of semiparametric copula-based multivariate dynamic models under copula misspecification, J.
Econometrics 135 (2006) 125–154.
[7] J. Dedecker, S. Louhichi, Maximal inequalities and empirical central limit theorems, in: H. Dehling, T. Mikosch, M. Sorensen (Eds.), Empirical Process
Techniques for Dependent Data, Birkhäuser Boston, 2002, pp. 137–160.
[8] H. Dette, J.C. Pardo-Fernández, I. Van Keilegom, Goodness-of-fit tests for multiplicative models with dependent data, Scand. J. Stat. 36 (2009) 782–799.
[9] J. Fan, I. Gijbels, Local Polynomial Modelling and its Applications, Chapman & Hall/CRC, London, 1996.
[10] J. Fan, Q. Yao, Nonlinear Time Series: Nonparametric and Parametric Methods, Springer, New York, 2005.
[11] J. Gao, Nonlinear Time Series: Semiparametric and Nonparametric Methods, Chapman & Hall/CRC, Boca Raton, FL, 2007.
[12] C. Genest, K. Ghoudi, L.-P. Rivest, A semiparametric estimation procedure of dependence parameters in multivariate families of distributions,
Biometrika 82 (1995) 543–552.
[13] C. Genest, J.G. Nešlehová, B. Rémillard, Asymptotic behavior of the empirical multilinear copula process under broad conditions, J. Multivariate Anal.
159 (2017) 82–110.
[14] C. Genest, B. Rémillard, D. Beaudoin, Goodness-of-fit tests for copulas: A review and a power study, Insurance Math. Econom. 44 (2009) 199–213.
[15] I. Gijbels, M. Omelka, M. Pešta, N. Veraverbeke, Score tests for covariate effects in conditional copulas, J. Multivariate Anal. 159 (2017) 111–133.
[16] I. Gijbels, M. Omelka, N. Veraverbeke, Estimation of a copula when a covariate affects only marginal distributions, Scand. J. Stat. 42 (2015) 1109–1126.
[17] B.E. Hansen, Uniform convergence rates for kernel estimation with dependent data, Econom. Theory 24 (2008) 726–748.
[18] W. Härdle, A. Tsybakov, L. Yang, Nonparametric vector autoregression, J. Statist. Plann. Inference 68 (1998) 221–245.
[19] G. Kim, M.J. Silvapulle, P. Silvapulle, Semiparametric estimation of the error distribution in multivariate regression using copulas, Aust. N. Z. J. Stat.
49 (2007) 321–336.
[20] G. Kim, M.J. Silvapulle, P. Silvapulle, Estimating the error distribution in multivariate heteroscedastic time-series models, J. Statist. Plann. Inference
138 (2008) 1442–1458.
[21] I. Kojadinovic, M. Holmes, Tests of independence among continuous random vectors based on Cramér–von Mises functionals of the empirical copula
process, J. Multivariate Anal. 100 (2009) 1137–1154.
[22] H.L. Koul, X. Zhu, Goodness-of-fit testing of error distribution in nonparametric ARCH(1) models, J. Multivariate Anal. 137 (2015) 141–160.
[23] E. Masry, Multivariate local polynomial regression for time series: uniform strong consistency and rates, J. Time Series Anal. 17 (1996) 571–599.
[24] A.J. McNeil, R. Frey, P. Embrechts, Quantitative Risk Management: Concepts, Techniques and Tools, Princeton University Press, Princeton, NJ, 2005.
[25] U.U. Müller, A. Schick, W. Wefelmeyer, Estimating the error distribution function in nonparametric regression, Statist. Probab. Lett. 79 (2009) 957–964.
[26] R.B. Nelsen, An Introduction to Copulas, second ed., Springer, New York, 2006.
[27] A.J. Patton, A review of copula models for economic time series, J. Multivariate Anal. 110 (2012) 4–18.
[28] F. Portier, J. Segers, On the weak convergence of the empirical conditional copula under a simplifying assumption, J. Multivariate Anal. 166 (2018)
160–181.
[29] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2018.
[30] B. Rémillard, N. Papageorgiou, F. Soustra, Copula-based semiparametric models for multivariate time series, J. Multivariate Anal. 110 (2012) 30–42.
[31] J. Segers, Weak convergence of empirical copula processes under nonrestrictive smoothness assumptions, Bernoulli 18 (2012) 764–782.
[32] H. Tsukahara, Semiparametric estimation in copula models, Canad. J. Statist. 33 (2005) 357–375.
[33] A.W. van der Vaart, J.A. Wellner, Weak Convergence and Empirical Processes, Springer, New York, 1996.
[34] A.W. van der Vaart, J.A. Wellner, Empirical processes indexed by estimated functions, in: E.A. Cator, G. Jongbloed, C. Kraaikamp, H.P. Lopuhaä, J.A.
Wellner (Eds.), Asymptotics: Particles, Processes and Inverse Problems, Institute of Mathematical Statistics, Hayward, CA, 2007, pp. 234–252.
[35] N. Veraverbeke, M. Omelka, I. Gijbels, Estimation of a conditional copula and association measures, Scand. J. Stat. 38 (2011) 766–780.
[36] L. Yang, W.K. Härdle, J. Nielsen, Nonparametric autoregression with multiplicative volatility and additive mean, J. Time Series Anal. 20 (1999) 579–604.