Abstract
Measuring conditional dependence is one of the important tasks in statistical inference and
is fundamental in causal discovery, feature selection, dimensionality reduction, Bayesian
network learning, and others. In this work, we explore the connection between conditional
dependence measures induced by distances on a metric space and reproducing kernels
associated with a reproducing kernel Hilbert space (RKHS). For certain distance and kernel
pairs, we show the distance-based conditional dependence measures to be equivalent to kernel-based measures. On the other hand, we also show that some popular kernel conditional dependence measures, based on the Hilbert-Schmidt norm of a certain conditional cross-covariance operator, do not have a simple distance representation, except in some limiting cases.
Keywords: Conditional independence test, distance covariance, energy distance, Hilbert-
Schmidt independence criterion, reproducing kernel Hilbert space
1. Introduction
where $\mu_k(\nu)$ is called the mean element or kernel mean embedding of $\nu$. Using this notion, the kernel distance, also called the maximum mean discrepancy (MMD), between two probability distributions P and Q is defined as the distance between their mean elements (Gretton et al., 2007), i.e., $D(P,Q) = \|\mu_k(P) - \mu_k(Q)\|_{\mathcal{H}_k}$. The kernel embedding and the
kernel distance are well-studied in the literature and their mathematical theory is well-
developed (Sriperumbudur et al., 2010, 2011; Sriperumbudur, 2016; Szabó and Sriperumbudur,
2018; Simon-Gabriel and Schölkopf, 2018; Simon-Gabriel et al., 2020). Generalizing this
notion of kernel embedding to distributions defined on product spaces yields a kernel measure
of dependence, called the Hilbert-Schmidt independence criterion (HSIC; Gretton et al., 2005,
Gretton et al. 2008, Smola et al., 2007), which can then be used as a measure of conditional dependence by applying it to conditional probability distributions (Fukumizu et al., 2004, 2008). Fukumizu et al. (2004) and Gretton et al. (2005) provided an alternate interpretation for
HSIC in terms of the Hilbert-Schmidt norm of a certain cross-covariance operator, based on
which the Hilbert-Schmidt norm of a conditional cross-covariance operator (we refer to it as
HSC̈IC) is then proposed as a measure of conditional dependence. We point the reader to
Sections 4 and 5 for details and refer to this class of probability metrics as kernel-based measures.
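For illustration, the following is a minimal sketch of the biased (V-statistic) estimate of the squared kernel distance $D^2(P,Q)$ from two samples; the Gaussian kernel and its bandwidth are illustrative assumptions, not choices prescribed in this paper.

```python
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    # Gram matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    # Biased (V-statistic) estimate of D^2(P, Q) =
    # E k(X, X') + E k(Y, Y') - 2 E k(X, Y)
    Kxx = gaussian_gram(X, X, sigma)
    Kyy = gaussian_gram(Y, Y, sigma)
    Kxy = gaussian_gram(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

# Example: samples from two Gaussians with different means
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(0.5, 1.0, size=(200, 2))
print(mmd2_biased(X, Y))
```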
Sejdinovic et al. (2013) established an equivalence between distance-based and kernel-
based dependence measures (i.e., distance covariance and HSIC) by showing that a repro-
ducing kernel that defines HSIC induces a semi-metric of negative type which in turn defines
the distance covariance (Székely et al., 2007, 2009), and vice-versa. However, despite the
striking similarity, the relationship between conditional distance covariance and related
kernel measures is not known. The goal of this work is to investigate the relationship
between distance and kernel-based measures of conditional independence, and in particular,
understand whether these measures are equivalent (i.e., the distance measure can be obtained
from the kernel measure and vice-versa).
As our contributions, first, in Theorem 1 (Section 4.2), we generalize the conditional
distance covariance of Wang et al. (2015) to arbitrary metric spaces of negative type—we
call this generalized CdCov (gCdCov)—and develop a kernel measure of conditional
dependence (we refer to it as HSCIC) that is equivalent to gCdCov. Therefore, it follows from
Theorem 1 that CdCov introduced by Wang et al. (2015) is a special case of the HSCIC. In
fact, the HSCIC we obtain is exactly the conditional dependence measure recently proposed
by Park and Muandet (2020). Second, in Theorem 2 (Section 5), we consider the kernel
measure of conditional dependence based on the Hilbert-Schmidt norm of the conditional
cross-covariance operator (i.e., HSC̈IC) and obtain its distance-based interpretation. We
show that this distance-based version of HSC̈IC does not have an elegant interpretation,
except in limiting cases where it is related to CdCov and gCdCov (see Corollaries 3 and 4).
The paper is organized as follows. Definitions and notation that are widely used
throughout the paper are collected in Section 2. The preliminaries on distance-based and
kernel-based measures are presented in Sections 3 and 4.1, respectively, while main results
are presented in Sections 4.2 and 5.
As a crucial property, CdCov is zero $P_Z$-almost surely if and only if $X \perp\!\!\!\perp Y \mid Z$. Similar
to distance covariance, one advantage of this measure is that its sample version can be
expressed elegantly as a V - or U -statistic, based on which Wang et al. (2015) proposed a
statistically consistent conditional independence test.
The conditional distance covariance defined above can also be computed in terms of the
conditional expectations of pairwise Euclidean distances:
$\mathcal{V}^2(X,Y|Z) = E\big[E[\|X-X'\|\,\|Y-Y'\| \mid X,Y,Z]\mid Z\big] + E[\|X-X'\|\mid Z]\,E[\|Y-Y'\|\mid Z] - 2\,E\big[E[\|X-X'\|\mid X,Z]\,E[\|Y-Y'\|\mid Y,Z]\mid Z\big]$,   (2)
where (X, Y) and (X', Y') are independent copies given Z. In a similar spirit to Lyons (2013), CdCov can be extended to metric spaces of negative type through conditional expectations, so that (2) can be written as

$\mathcal{V}^2_{\rho_X,\rho_Y}(X,Y|Z) = E\big[E[\rho_X(X,X')\,\rho_Y(Y,Y') \mid X,Y,Z]\mid Z\big] + E[\rho_X(X,X')\mid Z]\,E[\rho_Y(Y,Y')\mid Z] - 2\,E\big[E[\rho_X(X,X')\mid X,Z]\,E[\rho_Y(Y,Y')\mid Y,Z]\mid Z\big]$   (3)

$=: \mathcal{G}[\rho_X(X,X')\,\rho_Y(Y,Y')] =: \mathcal{G}\circ[\rho_X\rho_Y]$,   (4)

where $\rho_X$ and $\rho_Y$ are metrics of strongly negative type defined on spaces $\mathcal{X}$ and $\mathcal{Y}$ respectively, with $E[\rho_X^2(X,x_0)\mid Z] < \infty$ a.s.-$P_Z$ and $E[\rho_Y^2(Y,y_0)\mid Z] < \infty$ a.s.-$P_Z$ for some $x_0 \in \mathcal{X}$ and $y_0 \in \mathcal{Y}$. The moment conditions ensure that the expectations are finite. When $\rho_X$ and $\rho_Y$ are of strongly negative type, clearly (3) is zero if and only if $X \perp\!\!\!\perp Y \mid Z$.
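As an illustration of how (3) can be estimated at a fixed conditioning value, the following is a minimal plug-in sketch in which conditional expectations given Z = z are replaced by Nadaraya-Watson weighted averages over the sample; the Gaussian smoothing kernel, its bandwidth, and the use of Euclidean distances are illustrative assumptions, not the estimator of Wang et al. (2015). Replacing the distances by reproducing kernels in the same code gives the analogous kernel-based quantity discussed in Section 4.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gcdcov_at_z(X, Y, Z, z, h=0.5):
    """Plug-in estimate of V^2_{rhoX,rhoY}(X, Y | Z = z) mimicking (3),
    with conditional expectations replaced by Nadaraya-Watson weights."""
    # Smoothing weights w_i(z), normalized to sum to one
    w = np.exp(-np.sum((Z - z)**2, axis=1) / (2 * h**2))
    w = w / w.sum()
    # Pairwise Euclidean distances as rho_X and rho_Y
    Dx = cdist(X, X)
    Dy = cdist(Y, Y)
    W = np.outer(w, w)
    term1 = np.sum(W * Dx * Dy)               # E[E[rhoX rhoY | X,Y,Z] | Z]
    term2 = np.sum(W * Dx) * np.sum(W * Dy)   # E[rhoX | Z] E[rhoY | Z]
    term3 = np.sum(w * (Dx @ w) * (Dy @ w))   # E[E[rhoX|X,Z] E[rhoY|Y,Z] | Z]
    return term1 + term2 - 2 * term3

rng = np.random.default_rng(1)
Z = rng.normal(size=(300, 1))
X = Z + 0.1 * rng.normal(size=(300, 1))
Y = Z + 0.1 * rng.normal(size=(300, 1))   # X and Y dependent only through Z
print(gcdcov_at_z(X, Y, Z, z=np.zeros(1), h=0.3))
```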
where Hk is an RKHS with k as the reproducing kernel. Based on this embedding, a distance
on the space of probabilities can be defined through the distance between the embeddings,
i.e., $D_k(P,Q) = \|\mu_P - \mu_Q\|_{\mathcal{H}_k}$, called the kernel distance or maximum mean discrepancy (Gretton et al., 2007). If the map $P \mapsto \mu_P$ is injective, then the kernel $k$ that induces $\mu_P$ is said to be characteristic (Fukumizu et al., 2009; Sriperumbudur et al., 2010) and therefore $D_k(P,Q)$ induces a metric on $\mathcal{M}_k^{1/2}(\mathcal{X}) := \{P \in \mathcal{M}_+^1(\mathcal{X}) : \int_{\mathcal{X}} \sqrt{k(x,x)}\, dP(x) < \infty\}$, where $\mathcal{M}_+^1(\mathcal{X})$ denotes the set of all probability measures on $\mathcal{X}$. Using the reproducing
property of the kernel, it can be shown that
$D_k^2(P,Q) = E_{XX'}\,k(X,X') + E_{YY'}\,k(Y,Y') - 2\,E_{XY}\,k(X,Y)$,

where $X, X' \stackrel{\mathrm{i.i.d.}}{\sim} P$ and $Y, Y' \stackrel{\mathrm{i.i.d.}}{\sim} Q$. Extending this distance to probability measures on
product spaces, particularly the joint measure PXY and product of marginals PX PY , yields
If the kernels $k_X$ and $k_Y$ are characteristic, then HSIC characterizes independence (Szabó and Sriperumbudur, 2018), i.e., $D_{k_Xk_Y}(P_{XY}, P_XP_Y) = 0$ if and only if $X \perp\!\!\!\perp Y$. An empirical version of (5) has been used as a test statistic in independence testing, and the resultant test is shown to be consistent against all alternatives as long as $k_X$ and $k_Y$ are characteristic (Gretton et al., 2008). An interesting connection between the kernel-based HSIC and the distance-based dCov was established by Sejdinovic et al. (2013): dCov in (1) is in fact a special case of HSIC, and HSIC is equivalent to the generalized dCov introduced by Lyons (2013). This result provides a unifying framework for the distance and kernel-based dependence measures.
With this background, in the rest of the paper, we explore the relation between distance
and kernel-based measures of conditional dependence.
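For concreteness, a standard biased (V-statistic) estimate of HSIC can be written in terms of centered Gram matrices as $\frac{1}{n^2}\operatorname{tr}(K_X H K_Y H)$ with $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^{\top}$; the sketch below uses Gaussian kernels and bandwidths purely as illustrative assumptions.

```python
import numpy as np

def gaussian_gram(A, sigma=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(A**2, 1)[None, :] - 2 * A @ A.T
    return np.exp(-sq / (2 * sigma**2))

def hsic_biased(X, Y, sigma_x=1.0, sigma_y=1.0):
    # Biased estimate of D^2_{kX kY}(P_XY, P_X P_Y) = (1/n^2) tr(KX H KY H),
    # where H = I - (1/n) 11^T is the centering matrix.
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    KX = gaussian_gram(X, sigma_x)
    KY = gaussian_gram(Y, sigma_y)
    return np.trace(KX @ H @ KY @ H) / n**2

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 1))
Y = X**2 + 0.1 * rng.normal(size=(200, 1))   # nonlinearly dependent on X
print(hsic_biased(X, Y))
```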
and

$k_Y(y,y') = \rho_Y(y,\theta_0) + \rho_Y(y',\theta_0) - \rho_Y(y,y')$

for some $\theta \in \mathcal{X}$ and $\theta_0 \in \mathcal{Y}$. Then

$\mathcal{V}^2_{\rho_X,\rho_Y}(X,Y|Z) = \mathcal{G}\circ[\rho_X\rho_Y] = D^2_{k_Xk_Y}(P_{XY|Z},\, P_{X|Z}P_{Y|Z})$, a.s.-$P_Z$,   (6)

with

$D^2_{k_Xk_Y}(P_{XY|Z},\, P_{X|Z}P_{Y|Z}) = \mathcal{G}\circ[k_Xk_Y]$,

where $\mathcal{G}$ is defined in (4).
On the other hand, let $k_X$ and $k_Y$ be pd kernels on $\mathcal{X}$ and $\mathcal{Y}$ respectively. Suppose $E[k_X^2(X,X)\mid Z] < \infty$ and $E[k_Y^2(Y,Y)\mid Z] < \infty$ a.s.-$P_Z$. If $\rho_X$ and $\rho_Y$ are semi-metrics on $\mathcal{X}$ and $\mathcal{Y}$ that are kernel-induced, i.e.,

$\rho_X(x,x') = \dfrac{k_X(x,x) + k_X(x',x')}{2} - k_X(x,x')$

and

$\rho_Y(y,y') = \dfrac{k_Y(y,y) + k_Y(y',y')}{2} - k_Y(y,y')$,

then (6) holds.
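The two constructions above fit together: applying the kernel-induced semi-metric formula to a distance-induced kernel recovers the original semi-metric. A minimal numerical check of this round trip, using the Euclidean distance and an arbitrary base point θ (both illustrative choices):

```python
import numpy as np

rho = lambda a, b: np.linalg.norm(a - b)      # Euclidean distance (negative type)
theta = np.zeros(3)                            # base point theta in X

def k_from_rho(x, xp):
    # distance-induced kernel: k(x,x') = rho(x,theta) + rho(x',theta) - rho(x,x')
    return rho(x, theta) + rho(xp, theta) - rho(x, xp)

def rho_from_k(x, xp):
    # kernel-induced semi-metric: (k(x,x) + k(x',x'))/2 - k(x,x')
    return 0.5 * (k_from_rho(x, x) + k_from_rho(xp, xp)) - k_from_rho(x, xp)

rng = np.random.default_rng(3)
x, xp = rng.normal(size=3), rng.normal(size=3)
print(rho(x, xp), rho_from_k(x, xp))          # the two values coincide
```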
a.s.-$P_Z$, where we used the fact that $\mathcal{G}[g(X,Y,X',Y')] = 0$ a.s.-$P_Z$ when $g$ does not depend on one or more of its arguments (for example, a constant function). On the other hand, suppose $\rho_X$ and $\rho_Y$ are kernel-induced. Clearly they are of negative type. Then

$\mathcal{V}^2_{\rho_X,\rho_Y}(X,Y|Z) = \mathcal{G}[\rho_X(X,X')\,\rho_Y(Y,Y')] = \mathcal{G}[k_X(X,X')\,k_Y(Y,Y')]$
$= \big\|E[k_X(X,\cdot)\otimes k_Y(Y,\cdot)\mid Z] - E[k_X(X,\cdot)\mid Z]\otimes E[k_Y(Y,\cdot)\mid Z]\big\|^2_{\mathcal{H}_X\otimes\mathcal{H}_Y}$
$= \big\|\mu_{P_{XY|Z}} - \mu_{P_{X|Z}}\otimes\mu_{P_{Y|Z}}\big\|^2_{\mathcal{H}_X\otimes\mathcal{H}_Y}$.   (7)
measures of conditional dependence (which we do in Section 5), first we will briefly discuss
how HSIC is related to the Hilbert-Schmidt norm of a cross-covariance operator so that its
extension to the conditional version is natural.
For random variables X ∼ PX and Y ∼ PY with joint distribution PXY such that
E[kX (X, X)] < ∞ and E[kY (Y, Y )] < ∞, there exists a unique bounded linear operator,
called the cross-covariance operator (Baker, 1973; Fukumizu et al., 2004), $\Sigma_{YX}: \mathcal{H}_{k_X} \to \mathcal{H}_{k_Y}$, such that for all $f \in \mathcal{H}_{k_X}$ and $g \in \mathcal{H}_{k_Y}$,

$\langle g, \Sigma_{YX} f\rangle_{\mathcal{H}_{k_Y}} = E[f(X)g(Y)] - E[f(X)]\,E[g(Y)]$.

In fact, using the reproducing property that $f(x) = \langle f, k_X(\cdot,x)\rangle_{\mathcal{H}_{k_X}}$ for all $x \in \mathcal{X}$ and $g(y) = \langle g, k_Y(\cdot,y)\rangle_{\mathcal{H}_{k_Y}}$ for all $y \in \mathcal{Y}$, it follows that

$\Sigma_{YX} = \int\!\!\int k_Y(\cdot,y)\otimes k_X(\cdot,x)\, dP_{XY}(x,y) - \int k_Y(\cdot,y)\, dP_Y(y) \otimes \int k_X(\cdot,x)\, dP_X(x)$,   (8)
where ⊗ denotes the tensor product. Clearly, ΣY X is a natural generalization of the finite-
dimensional covariance matrix between two random vectors X ∈ Rp and Y ∈ Rq . Based on
(8) and the reproducing property, it can be verified that
$\|\Sigma_{YX}\|^2_{\mathrm{HS}} = \Big\|\int\!\!\int k_X(\cdot,x)\otimes k_Y(\cdot,y)\, d(P_{XY} - P_XP_Y)(x,y)\Big\|^2_{\mathrm{HS}}$
$= \int\!\!\int\!\!\int\!\!\int k_X(x,x')\,k_Y(y,y')\, d(P_{XY} - P_XP_Y)(x,y)\, d(P_{XY} - P_XP_Y)(x',y') = D^2_{k_Xk_Y}(P_{XY}, P_XP_Y)$,   (9)
where $\|\cdot\|_{\mathrm{HS}}$ denotes the Hilbert-Schmidt norm. Since HSCIC is a conditional version of HSIC, and since the latter is the Hilbert-Schmidt norm of the cross-covariance operator, it is natural to extend $\Sigma_{YX}$ to its conditional version as a $P_Z$-measurable bounded linear operator $\dot\Sigma_{YX|Z}: \mathcal{H}_{k_X} \to \mathcal{H}_{k_Y}$ such that for all $f \in \mathcal{H}_{k_X}$ and $g \in \mathcal{H}_{k_Y}$,

$\langle g, \dot\Sigma_{YX|Z} f\rangle_{\mathcal{H}_{k_Y}} = E[f(X)g(Y)\mid Z] - E[f(X)\mid Z]\,E[g(Y)\mid Z]$, a.s.-$P_Z$,

thereby yielding

$\dot\Sigma_{YX|Z} = E[k_X(\cdot,X)\otimes k_Y(\cdot,Y)\mid Z] - E[k_X(\cdot,X)\mid Z]\otimes E[k_Y(\cdot,Y)\mid Z]$.

Similar to (9), it is easy to verify that

$\|\dot\Sigma_{YX|Z}\|^2_{\mathrm{HS}} = D^2_{k_Xk_Y}(P_{XY|Z}, P_{X|Z}P_{Y|Z})$

a.s.-$P_Z$. Therefore, if $k_X$ and $k_Y$ are characteristic, then $X \perp\!\!\!\perp Y \mid Z \iff \dot\Sigma_{YX|Z} = 0$, $P_Z$-a.s.
However, in the kernel literature, to the best of our knowledge, besides the concurrent
and independent work by Park and Muandet (2020) in which a quantity similar to HSCIC
is proposed, HSCIC has not been used as a measure of conditional independence probably
because it is a random operator. We can obtain a single measure of conditional dependence
by considering the expectation of HSCIC over $Z \sim P_Z$, i.e.,

$D_{P_Z}(P_{XY|Z}, P_{X|Z}P_{Y|Z}) := E_Z\big[\|\dot\Sigma_{YX|Z}\|^2_{\mathrm{HS}}\big]$.   (10)

This single measure of conditional dependence, obtained by averaging HSCIC over Z, is not discussed in Park and Muandet (2020).
However, unlike $\dot\Sigma_{YX|Z}$, the conditional cross-covariance operator $\Sigma_{YX|Z}$ does not characterize conditional independence, since $\Sigma_{YX|Z} = 0$—assuming $k_X$ and $k_Y$ to be characteristic—only implies $P_{XY} = E_Z[P_{X|Z}P_{Y|Z}]$ and not $\dot\Sigma_{YX|Z} = 0$, a.s.-$P_Z$ (Fukumizu et al., 2004, Theorem 8). Therefore, Fukumizu et al. (2004, Corollary 9) considered Z as a part of X by defining $\ddot X := (X, Z)$ and showed that $\Sigma_{Y\ddot X|Z} = 0$ if and only if $X \perp\!\!\!\perp Y \mid Z$, assuming $k_X$, $k_Y$ and $k_Z$ to be characteristic. This is indeed the case since if $k_X$, $k_Y$ and $k_Z$ are characteristic, then $\Sigma_{Y\ddot X|Z} = 0$ implies $E_Z[\dot\Sigma_{Y\ddot X|Z}] = 0$ and therefore

$E\big[E[\mathbf{1}\{X\in A, Y\in B, Z\in C\}\mid Z]\big] - E\big[E[\mathbf{1}\{X\in A, Z\in C\}\mid Z]\,E[\mathbf{1}\{Y\in B\}\mid Z]\big]$
$= E[\mathbf{1}\{X\in A, Y\in B, Z\in C\}] - E\big[E[\mathbf{1}\{X\in A, Z\in C\}\mid Z]\,E[\mathbf{1}\{Y\in B\}\mid Z]\big]$
$= E\big[E[\mathbf{1}\{X\in A, Y\in B\}\mid Z]\,\mathbf{1}\{Z\in C\}\big] - E\big[E[\mathbf{1}\{X\in A\}\mid Z]\,\mathbf{1}\{Z\in C\}\,E[\mathbf{1}\{Y\in B\}\mid Z]\big]$
$= E\big[\big(P_{XY|Z}(A\times B\mid Z) - P_{X|Z}(A\mid Z)\,P_{Y|Z}(B\mid Z)\big)\,\mathbf{1}\{Z\in C\}\big] = 0,$

for all $A \in \mathcal{B}_{\mathcal{X}}$, $B \in \mathcal{B}_{\mathcal{Y}}$ and $C \in \mathcal{B}_{\mathcal{Z}}$, where $\mathcal{B}_{\mathcal{X}}$, $\mathcal{B}_{\mathcal{Y}}$ and $\mathcal{B}_{\mathcal{Z}}$ are the Borel σ-algebras associated with $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ respectively. This implies

$P_{XY|Z}(A\times B\mid Z) - P_{X|Z}(A\mid Z)\,P_{Y|Z}(B\mid Z) = 0$, a.s.-$P_Z$,

implying $X \perp\!\!\!\perp Y \mid Z$, a.s.-$P_Z$. Hence $\|\Sigma_{Y\ddot X|Z}\|^2_{\mathrm{HS}}$ can be used as a measure of conditional independence, which we refer to as HSC̈IC.
The goal of this section is to explore the distance counterpart of HSC̈IC and understand how it is related to CdCov, gCdCov, and $D_{P_Z}$ defined in (10). To this end, we first provide an expression for $\|\Sigma_{Y\ddot X|Z}\|^2_{\mathrm{HS}}$ in terms of kernels, using which we obtain an expression in terms of distances.
Theorem 2 Suppose $E_X[k_X^2(X,X)] < \infty$, $E_Y[k_Y^2(Y,Y)] < \infty$ and $E_Z[k_Z^2(Z,Z)] < \infty$. Denote $\ddot X = (X, Z)$. Then

$\|\Sigma_{Y\ddot X|Z}\|^2_{\mathrm{HS}} = E_ZE_{Z'}\big[k_Z(Z,Z')\,\langle\dot\Sigma_{YX|Z}, \dot\Sigma_{YX|Z'}\rangle_{\mathrm{HS}}\big] = E_ZE_{Z'}\big[k_Z(Z,Z')\,h(Z,Z')\big]$,   (11)
where $h(Z,Z') := F_{YX|Z}F_{Y'X'|Z'}[k_X(X,X')\,k_Y(Y,Y')]$, $F_{YX|Z} := E_{XY|Z} - E_{Y|Z}E_{X|Z}$ and $E_{XY|Z} := E[\cdot\mid Z]$ ($E_{Y|Z}$ and $E_{X|Z}$ are defined similarly).
Suppose $k_X$ and $k_Y$ are distance-induced, i.e.,

$k_X(x,x') = \rho_X(x,\theta) + \rho_X(x',\theta) - \rho_X(x,x')$ and $k_Y(y,y') = \rho_Y(y,\theta_0) + \rho_Y(y',\theta_0) - \rho_Y(y,y')$

for some $\theta \in \mathcal{X}$ and $\theta_0 \in \mathcal{Y}$. Then $h(Z,Z') = F_{YX|Z}F_{Y'X'|Z'}[\rho_X(X,X')\,\rho_Y(Y,Y')]$.
Therefore,

$\|\Sigma_{Y\ddot X|Z}\|^2_{\mathrm{HS}} = \big\|E[\dot\Sigma_{YX|Z}\otimes k_Z(\cdot,Z)]\big\|^2_{\mathrm{HS}}$
$= \big\langle E_Z[\dot\Sigma_{YX|Z}\otimes k_Z(\cdot,Z)],\, E_Z[\dot\Sigma_{YX|Z}\otimes k_Z(\cdot,Z)]\big\rangle_{\mathrm{HS}}$
$= E_ZE_{Z'}\big\langle \dot\Sigma_{YX|Z}\otimes k_Z(\cdot,Z),\, \dot\Sigma_{YX|Z'}\otimes k_Z(\cdot,Z')\big\rangle_{\mathrm{HS}}$
$= E_ZE_{Z'}\big\langle \dot\Sigma_{YX|Z},\, \dot\Sigma_{YX|Z'}\big\rangle_{\mathrm{HS}}\,\langle k_Z(\cdot,Z), k_Z(\cdot,Z')\rangle_{\mathcal{H}_{k_Z}}$
$= E_ZE_{Z'}\big\langle \dot\Sigma_{YX|Z},\, \dot\Sigma_{YX|Z'}\big\rangle_{\mathrm{HS}}\, k_Z(Z,Z')$.   (12)
Note that $\dot\Sigma_{YX|Z} = F_{YX|Z}[k_Y(\cdot,Y)\otimes k_X(\cdot,X)]$. Therefore,

$\big\langle \dot\Sigma_{YX|Z}, \dot\Sigma_{YX|Z'}\big\rangle_{\mathrm{HS}} = \big\langle F_{YX|Z}[k_Y(\cdot,Y)\otimes k_X(\cdot,X)],\, F_{YX|Z'}[k_Y(\cdot,Y)\otimes k_X(\cdot,X)]\big\rangle_{\mathrm{HS}}$
$= \big\langle F_{YX|Z}[k_Y(\cdot,Y)\otimes k_X(\cdot,X)],\, F_{Y'X'|Z'}[k_Y(\cdot,Y')\otimes k_X(\cdot,X')]\big\rangle_{\mathrm{HS}}$,

using which in (12) yields the result. If $k_X$ and $k_Y$ are distance-induced, then using the fact that $F_{YX|Z}F_{Y'X'|Z'}[g(X,X',Y,Y')] = 0$ when $g$ does not depend on one or more of its arguments—basically, the same argument that we carried out in the proof of Theorem 1—we have

$h(Z,Z') = F_{YX|Z}F_{Y'X'|Z'}[\rho_X(X,X')\,\rho_Y(Y,Y')]$,

and the result follows.
While $h(Z,Z')$ has a distance interpretation as shown in Theorem 2, $\|\Sigma_{Y\ddot X|Z}\|^2_{\mathrm{HS}}$ does not have a simple distance representation, except in the limiting cases considered below.
Corollary 3 Suppose the assumptions of Theorem 2 hold and $P_Z$ has a density $p_Z$ w.r.t. the Lebesgue measure on $\mathbb{R}^d$ such that $h(z,\cdot)p_Z$ is uniformly continuous and bounded for all $z \in \mathbb{R}^d$. For $t > 0$, let

$k_Z(z,z') = \frac{1}{t^d}\,\psi\!\left(\frac{z-z'}{t}\right)$, $\quad z, z' \in \mathbb{R}^d$,

where $\psi \in L^1(\mathbb{R}^d)$ is a bounded continuous positive definite function with $\int_{\mathbb{R}^d}\psi(z)\,dz = 1$. Then

$\lim_{t\to 0}\|\Sigma_{Y\ddot X|Z}\|^2_{\mathrm{HS}} = E_Z\big[\|\dot\Sigma_{YX|Z}\|^2_{\mathrm{HS}}\,p_Z(Z)\big] = D_{P_Z^2}(P_{XY|Z}, P_{X|Z}P_{Y|Z})$.
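For instance, the conditions on $\psi$ are met by the Gaussian kernel on $\mathbb{R}^d$, which is of the required form:

$k_Z(z,z') = \frac{1}{(2\pi t^2)^{d/2}}\exp\!\Big(-\frac{\|z-z'\|^2}{2t^2}\Big) = \frac{1}{t^d}\,\psi\!\Big(\frac{z-z'}{t}\Big)$, where $\psi(z) = (2\pi)^{-d/2}e^{-\|z\|^2/2}$,

and $\psi$ is bounded, continuous, positive definite and satisfies $\int_{\mathbb{R}^d}\psi(z)\,dz = 1$.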
where $*$ denotes convolution. Taking the limit on both sides as $t \to 0$ and applying the dominated convergence theorem, we obtain

$\lim_{t\to 0}\|\Sigma_{Y\ddot X|Z}\|^2_{\mathrm{HS}} = \lim_{t\to 0}\int p_Z(z)\,\big(\psi_t * (h(z,\cdot)p_Z)\big)(z)\,dz = \int p_Z(z)\,\lim_{t\to 0}\big(\psi_t * (h(z,\cdot)p_Z)\big)(z)\,dz$.

The result follows from Folland (1999, Theorem 8.14), which yields $\lim_{t\to 0}\big(\psi_t * (h(z,\cdot)p_Z)\big)(z) = h(z,z)\,p_Z(z)$ for all $z \in \mathbb{R}^d$, and by noting that $h(Z,Z) = \|\dot\Sigma_{YX|Z}\|^2_{\mathrm{HS}}$.
Then

$\|\Sigma_{Y\ddot X|Z}\|^2_{\mathrm{HS}} = \big\|E_Z\big[\eta(Z)\big(\phi_{XY|Z} - \phi_{X|Z}\phi_{Y|Z}\big)\big]\big\|^2_{L^2(w)}$,   (15)

and if

$\operatorname{ess\,sup}_Z \int \big|\phi_{XY|Z}(t,s) - \phi_{X|Z}(t)\,\phi_{Y|Z}(s)\big|^2\, dw(t,s) < \infty$,   (16)

then

$\lim_{t\to 0}\|\Sigma_{Y\ddot X|Z}\|^2_{\mathrm{HS}} = p_Z^2(a)\,\mathcal{V}^2(X, Y \mid Z = a)$.   (17)
and therefore (15) follows by using (18) in (11) with $k(z,z') = \eta(z)\eta(z')$ and applying the dominated convergence theorem through (14). We now prove (18). Consider
where

$\Lambda(t,s,Z,Z') = \big[\phi_{XY|Z}(t,s) - \phi_{X|Z}(t)\,\phi_{Y|Z}(s)\big]\,\overline{\big[\phi_{XY|Z'}(t,s) - \phi_{X|Z'}(t)\,\phi_{Y|Z'}(s)\big]}$
$= \Big[E\big[e^{i(\langle t,X\rangle + \langle s,Y\rangle)}\mid Z\big] - E\big[e^{i\langle t,X\rangle}\mid Z\big]E\big[e^{i\langle s,Y\rangle}\mid Z\big]\Big]\cdot\overline{\Big[E\big[e^{i(\langle t,X\rangle + \langle s,Y\rangle)}\mid Z'\big] - E\big[e^{i\langle t,X\rangle}\mid Z'\big]E\big[e^{i\langle s,Y\rangle}\mid Z'\big]\Big]}$
$= E_{XY|Z}E_{X'Y'|Z'}\,e^{i(\langle t,X-X'\rangle + \langle s,Y-Y'\rangle)} - E_{XY|Z}E_{X'|Z'}E_{Y'|Z'}\,e^{i(\langle t,X-X'\rangle + \langle s,Y-Y'\rangle)}$
$\quad - E_{X|Z}E_{Y|Z}E_{X'Y'|Z'}\,e^{i(\langle t,X-X'\rangle + \langle s,Y-Y'\rangle)} + E_{X|Z}E_{Y|Z}E_{X'|Z'}E_{Y'|Z'}\,e^{i(\langle t,X-X'\rangle + \langle s,Y-Y'\rangle)}$
$= F_{YX|Z}F_{Y'X'|Z'}\,e^{i(\langle t,X-X'\rangle + \langle s,Y-Y'\rangle)}$,   (20)
by noting that $\sin\langle t, X - X'\rangle$ and $\sin\langle s, Y - Y'\rangle$ are odd functions w.r.t. $t$ and $s$ respectively. Since $\cos\langle t, X - X'\rangle\cos\langle s, Y - Y'\rangle = 1 - (1 - \cos\langle t, X - X'\rangle) - (1 - \cos\langle s, Y - Y'\rangle) + (1 - \cos\langle t, X - X'\rangle)(1 - \cos\langle s, Y - Y'\rangle)$ and

$F_{YX|Z}F_{Y'X'|Z'}[f(X, X', Y, Y')] = 0$
where the last equality follows from Lemma 1 of Székely et al. (2007) through $\int \frac{1 - \cos\langle t, x\rangle}{c_p\|t\|^{p+1}}\, dt = \|x\|$, thereby proving the result in (15). By defining $\theta_t(z) = t^{-d}\,\theta\!\left(\frac{z}{t}\right)$, we have

$E_Z\big[\eta(Z)\big(\phi_{XY|Z} - \phi_{X|Z}\phi_{Y|Z}\big)\big] = \theta_t * \big[\big(\phi_{XY|Z} - \phi_{X|Z}\phi_{Y|Z}\big)\,p_Z\big](a)$,

which by Folland (1999, Theorem 8.14) converges to $\big(\phi_{XY|Z=a} - \phi_{X|Z=a}\,\phi_{Y|Z=a}\big)\,p_Z(a)$ as $t \to 0$. Using these in (15) along with the dominated convergence theorem combined with (16) yields (17).
6. Discussion
Conditional distance covariance is a commonly used metric for measuring conditional
dependence in the statistics community. In the machine learning community, a conditional
dependence measure based on reproducing kernels is popularly used in applications such as
conditional independence testing. In this work, we have explored the connection between
these two conditional dependence measures: we showed the distance-based measure to be a limiting version of the kernel-based measure, so that conditional distance covariance may be viewed as a member of a much larger class of kernel-based conditional dependence measures. This may enable the design of more powerful conditional independence tests by choosing a richer class of kernels.
Having understood the relation between these various measures of conditional dependence,
an important question to understand is the statistical behavior of conditional independence
tests based on these measures. Fukumizu et al. (2004, Proposition 5) provides an alternate
representation for the conditional covariance operator $\Sigma_{Y\ddot X|Z}$ in terms of only covariance operators (this is reminiscent of the situation when (X, Y, Z) are jointly normal, so that the conditional covariance matrix can be represented in terms of the joint covariance matrices) as $\Sigma_{Y\ddot X|Z} = \Sigma_{Y\ddot X} - \Sigma_{YZ}\tilde\Sigma_{ZZ}^{-1}\Sigma_{Z\ddot X}$, where $\tilde\Sigma_{ZZ}^{-1}$ is the right inverse of $\Sigma_{ZZ}$ on $(\mathrm{Ker}(\Sigma_{ZZ}))^{\perp}$. The advantage of this alternate form is that $\Sigma_{Y\ddot X|Z}$ can be estimated from i.i.d. data $(X_i, Y_i, Z_i)_{i=1}^n \sim P_{XYZ}$ by simply estimating the (cross-)covariance operators $\Sigma_{Y\ddot X}$, $\Sigma_{YZ}$, $\Sigma_{Z\ddot X}$, and replacing $\tilde\Sigma_{ZZ}^{-1}$ by an inverse of a regularized version of an empirical estimator of $\Sigma_{ZZ}$. Using these, a plug-in (biased) estimator $\|\hat\Sigma_{Y\ddot X|Z}\|^2_{\mathrm{HS}}$ of HSC̈IC (i.e., of $\|\Sigma_{Y\ddot X|Z}\|^2_{\mathrm{HS}}$) can be shown to be consistent and to have a computational complexity of $O(n^3)$, where $\hat\Sigma_{Y\ddot X|Z} := \hat\Sigma_{Y\ddot X} - \hat\Sigma_{YZ}(\hat\Sigma_{ZZ} + \lambda I)^{-1}\hat\Sigma_{Z\ddot X}$ and $\lambda > 0$—these claims can be proved
using the ideas in Fukumizu et al. (2008) where such claims are proved for a normalized
version of ΣY Ẍ|Z . Similar results are shown for the kernel version of HSCIC (see (7)) by
Park and Muandet (2020). To elaborate, Park and Muandet (2020, Section 5.2) proposed a biased estimator of HSCIC (see the r.h.s. of (7)), which is based on Gram matrices on $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ and an associated regularized inverse, yielding a computational complexity of $O(n^3)$. On the other hand, Wang et al. (2015) proposed a (biased) estimator of CdCov—the same idea can be used to estimate gCdCov and therefore HSCIC—based on a Nadaraya-Watson type density estimator of $P_{XY|Z}$, where it can be shown that HSCIC can be consistently estimated with a computational complexity of $O(n^3)$. This means that all these different estimators of HSCIC and HSC̈IC are consistent and have the same computational complexity. However, the statistical
performance of these estimators as test statistics to test for conditional independence remains
open.
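As a rough, illustrative sketch of this plug-in construction (not the exact estimator analyzed in the cited works), one can approximate the feature maps by random Fourier features, so that the covariance operators become finite matrices and $\hat\Sigma_{Y\ddot X|Z} = \hat\Sigma_{Y\ddot X} - \hat\Sigma_{YZ}(\hat\Sigma_{ZZ} + \lambda I)^{-1}\hat\Sigma_{Z\ddot X}$ is plain matrix algebra; the feature dimension, bandwidths, and $\lambda$ below are arbitrary illustrative choices.

```python
import numpy as np

def rff(A, W, b):
    # Random Fourier features approximating a Gaussian-kernel feature map
    return np.sqrt(2.0 / W.shape[1]) * np.cos(A @ W + b)

def cross_cov(Pf, Qf):
    # Empirical (centered) cross-covariance matrix between two feature maps
    return (Pf - Pf.mean(0)).T @ (Qf - Qf.mean(0)) / Pf.shape[0]

def hs_cdic(X, Y, Z, D=200, lam=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    XZ = np.hstack([X, Z])                      # the augmented variable (X, Z)
    feats = {}
    for name, A in (("xz", XZ), ("y", Y), ("z", Z)):
        W = rng.normal(size=(A.shape[1], D))    # frequencies for unit bandwidth
        b = rng.uniform(0, 2 * np.pi, size=D)
        feats[name] = rff(A, W, b)
    S_yxz = cross_cov(feats["y"], feats["xz"])
    S_yz  = cross_cov(feats["y"], feats["z"])
    S_zxz = cross_cov(feats["z"], feats["xz"])
    S_zz  = cross_cov(feats["z"], feats["z"])
    S_cond = S_yxz - S_yz @ np.linalg.solve(S_zz + lam * np.eye(D), S_zxz)
    return np.sum(S_cond**2)                    # squared Hilbert-Schmidt (Frobenius) norm

rng = np.random.default_rng(4)
Z = rng.normal(size=(300, 1))
X = Z + 0.1 * rng.normal(size=(300, 1))
Y = Z + 0.1 * rng.normal(size=(300, 1))         # conditionally independent given Z
print(hs_cdic(X, Y, Z))
```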
Acknowledgements
BKS is partially supported by National Science Foundation (NSF) award DMS-1713011 and
CAREER award DMS-1945396.
References
N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical
Society, 68(3):337–404, 1950.
R. D. Cook and B. Li. Dimension reduction for conditional mean in regression. The Annals
of Statistics, 30(2):455–474, 2002.
J. Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, New
York, USA, 2000.
G. Székely and M. Rizzo. Testing for equal distributions in high dimension. InterStat, (5),
2004.
G. Székely and M. Rizzo. Brownian distance covariance. The Annals of Applied Statistics, 3(4):1236–1265, 2009.
X. Wang, W. Pan, W. Hu, Y. Tian, and H. Zhang. Conditional distance correlation. Journal
of the American Statistical Association, 110(512):1726–1734, 2015.