
A new coefficient of correlation

Sourav Chatterjee

Sourav Chatterjee A new coefficient of correlation 1 / 29


Coefficients of correlation
The three most popular classical measures of statistical association
are Pearson’s correlation coefficient, Spearman’s ρ, and Kendall’s τ .
These coefficients are powerful for detecting linear or monotone
associations, and they have well-developed asymptotic theories for
calculating P-values.
However, a serious problem is that they are not effective for detecting
associations that are not monotonic, even in the complete absence of
noise.
There have been many proposals to address this deficiency of the
classical coefficients, such as
the maximal correlation coefficient,
various coefficients based on joint cumulative distribution functions and
ranks,
kernel-based methods,
information theoretic coefficients,
coefficients based on copulas, and
coefficients based on pairwise distances.
Problems

Some of these coefficients are popular among practitioners. But there are two common problems.
First, most of these coefficients are designed for testing independence,
and not for measuring the strength of the relationship between the
variables.
Ideally, one would like a coefficient that approaches its maximum
value if and only if one variable looks more and more like a noiseless
function of the other.
It is sometimes believed that the maximal information coefficient and
the maximal correlation coefficient measure the strength of the
relationship in the above sense, but that’s not correct.
Although MIC and maximal correlation are maximized when one
variable is a function of the other, the converse is not true. They may
be equal to 1 even if the relationship is very noisy. (An example is
given in the paper.)
Problems, contd.

The second problem is that none of the coefficients for testing independence have simple asymptotic theories under the hypothesis of independence that facilitate the quick computation of P-values.
In the absence of such theories, the only recourse is to use
computationally expensive permutation tests or other kinds of
bootstrap.



Goal of this talk

One may wonder if it is at all possible to define a coefficient that is
as simple as the classical coefficients, and yet
is a consistent estimator of some measure of dependence which is 0 if
and only if the variables are independent and 1 if and only if one is a
measurable function of the other, and
has a simple asymptotic theory under the hypothesis of independence,
like the classical coefficients.
I will now present such a coefficient.
The formula is so simple that it is likely that there are many such
coefficients, some of them possibly having better properties than the
one I am going to present.
Reference:
Chatterjee, S. (2021). A new coefficient of correlation. J. Amer.
Statist. Assoc., 116, no. 536, 2009–2022.



A new coefficient of correlation
Let (X , Y ) be a pair of random variables, where Y is not a constant.
Our data consists of n i.i.d. pairs (X1 , Y1 ), . . . , (Xn , Yn ) with the same
law as (X , Y ), where n ≥ 2.
Given the data, rearrange it as (U1, V1), . . . , (Un, Vn), where
U1 ≤ · · · ≤ Un. If there are multiple ways to do this rearrangement (i.e.,
if the Xi's have ties), then choose one uniformly at random.
Let ri be the number of j such that Vj ≤ Vi, and let ℓi be the number
of j such that Vj ≥ Vi.
Then define:

$$\xi_n(X, Y) := 1 - \frac{n \sum_{i=1}^{n-1} |r_{i+1} - r_i|}{2 \sum_{i=1}^{n} \ell_i (n - \ell_i)}.$$

When there are no ties among the Yi's, ℓ1, . . . , ℓn is just a permutation of 1, . . . , n, and so the denominator in the above expression is just n(n² − 1)/3.
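As a concrete companion to the definition, here is a minimal Python/NumPy sketch (not from the talk; the function name and the tie-breaking seed argument are my own choices). Sorting-based rank computations give the O(n log n) running time mentioned later.

```python
import numpy as np

def xi_n(x, y, seed=None):
    """Chatterjee's xi coefficient, general (ties-allowed) formula."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    # Rearrange so that U_1 <= ... <= U_n, breaking ties in x uniformly at random
    order = np.lexsort((rng.random(n), x))
    v = y[order]
    vs = np.sort(v)
    r = np.searchsorted(vs, v, side="right")      # r_i = #{j : V_j <= V_i}
    l = n - np.searchsorted(vs, v, side="left")   # l_i = #{j : V_j >= V_i}
    return 1 - n * np.abs(np.diff(r)).sum() / (2 * (l * (n - l)).sum())
```

For a perfectly monotone, tie-free sample of size n this returns 1 − 3/(n + 1), consistent with the no-ties simplification of the denominator above.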
Consistency

The following theorem shows that ξn is a consistent estimator of a certain measure of dependence between the random variables X and Y.

Theorem (C., 2021)


If Y is not almost surely a constant, then as n → ∞, ξn(X, Y) converges almost surely to the deterministic limit

$$\xi(X, Y) := \frac{\int \operatorname{Var}(\mathbb{E}(\mathbf{1}_{\{Y \ge t\}} \mid X)) \, d\mu(t)}{\int \operatorname{Var}(\mathbf{1}_{\{Y \ge t\}}) \, d\mu(t)},$$

where µ is the law of Y . This limit belongs to the interval [0, 1]. It is 0 if
and only if X and Y are independent, and it is 1 if and only if there is a
measurable function f : R → R such that Y = f (X ) almost surely.



Remarks

Unlike most coefficients, ξn is not symmetric in X and Y.
But we would like to keep it that way, because we may want to understand if Y is a function of X, and not just if one of the variables is a function of the other.
A symmetric measure of dependence, if required, can be easily
obtained by taking the maximum of ξn (X , Y ) and ξn (Y , X ).
By the theorem, this symmetrized coefficient converges in probability
to max{ξ(X , Y ), ξ(Y , X )}, which is 0 if and only if X and Y are
independent, and 1 if and only if at least one of X and Y is a
measurable function of the other.



Remarks, contd.

In the theorem, there are no restrictions on the law of (X, Y) other than that Y is not a constant.
In particular, X and Y can be discrete, continuous, light-tailed or
heavy-tailed.
The coefficient ξn (X , Y ) remains unchanged if we apply strictly
increasing transformations to X and Y , because it is based on ranks.
For the same reason, it can be computed in time O(n log n). (The
actual computation on a computer is also very fast.)
One downside of this coefficient is that it has low power for testing
independence against standard alternatives, compared to some other
tests. (See my survey for references.)



Remarks, contd.

The limiting value ξ(X, Y) has appeared earlier in the literature (Dette et al. 2013, and Gamboa et al. 2018).
Dette et al. gave a copula-based estimator for ξ(X, Y) when X and Y are continuous, which is consistent under smoothness assumptions on the copula and appears to be computable in time $O(n^{5/3})$ for an optimal choice of tuning parameters.
The coefficient ξn looks similar to some coefficients defined earlier
(e.g., by Friedman and Rafsky 1983), but in spite of its simple form,
it seems to be genuinely new.



Asymptotic theory under independence when Y is
continuous

Theorem (C., 2021)


Suppose that X and Y are independent and Y is continuous. Then

$$\sqrt{n}\, \xi_n \to N(0, 2/5)$$

in distribution as n → ∞.

In numerical examples, it is seen that the CLT is roughly valid even for n as small as 20.
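To illustrate how this CLT gives quick P-values, here is a small sketch (my own, not from the talk; the function name is hypothetical) of the resulting one-sided asymptotic test. Dependence inflates ξn, so large positive values count against independence.

```python
from math import erf, sqrt

def xi_pvalue(xi, n):
    # Under independence, with Y continuous: sqrt(n) * xi_n -> N(0, 2/5).
    z = xi * sqrt(n / 0.4)                # standardized statistic
    return 0.5 * (1 - erf(z / sqrt(2)))   # P(N(0, 1) > z)
```

For example, ξn = 0.3 with n = 100 gives z ≈ 4.74, far out in the tail of N(0, 1).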



Asymptotic distribution under independence when Y is not
continuous

For each t ∈ R, let F(t) := P(Y ≤ t) and G(t) := P(Y ≥ t). Let
φ(y, y′) := min{F(y), F(y′)}, and define

$$\tau^2 = \frac{\mathbb{E}\,\phi(Y_1, Y_2)^2 - 2\,\mathbb{E}(\phi(Y_1, Y_2)\,\phi(Y_1, Y_3)) + (\mathbb{E}\,\phi(Y_1, Y_2))^2}{(\mathbb{E}\, G(Y)(1 - G(Y)))^2},$$

where Y1, Y2, Y3 are independent copies of Y. Then, we have:

Theorem (C., 2021)



Suppose that X and Y are independent. Then $\sqrt{n}\,\xi_n$ converges to N(0, τ²) in distribution as n → ∞, where τ² is given by the formula stated above. The number τ² is strictly positive if Y is not a constant, and equals 2/5 if Y is continuous.



How to estimate τ²
There is a simple way to estimate τ² from the data using the estimator

$$\hat{\tau}_n^2 = \frac{a_n - 2 b_n + c_n^2}{d_n^2},$$

where a_n, b_n, c_n and d_n are defined as follows.
For each i, let

$$R(i) := \#\{j : Y_j \le Y_i\}, \qquad L(i) := \#\{j : Y_j \ge Y_i\}.$$

Let u_1 ≤ u_2 ≤ · · · ≤ u_n be an increasing rearrangement of R(1), . . . , R(n). Let $v_i := \sum_{j=1}^{i} u_j$ for i = 1, . . . , n. Define

$$a_n := \frac{1}{n^4} \sum_{i=1}^{n} (2n - 2i + 1)\, u_i^2, \qquad b_n := \frac{1}{n^5} \sum_{i=1}^{n} (v_i + (n - i) u_i)^2,$$
$$c_n := \frac{1}{n^3} \sum_{i=1}^{n} (2n - 2i + 1)\, u_i, \qquad d_n := \frac{1}{n^3} \sum_{i=1}^{n} L(i)(n - L(i)).$$
Then we have the following result.
Consistency of $\hat{\tau}_n^2$

Theorem (C., 2021)


The estimator $\hat{\tau}_n^2$ can be computed in time O(n log n), and converges to $\tau^2$ almost surely as n → ∞.

Thus, we have a complete asymptotic theory for ξn under independence.
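The estimator defined above translates directly into code. The following NumPy sketch (the function name is my choice, not from the talk) follows the displayed formulas; for tie-free data the result should be close to 2/5, matching the continuous case.

```python
import numpy as np

def tau2_hat(y):
    """Plug-in estimator of tau^2 from the slide's formulas."""
    y = np.asarray(y)
    n = len(y)
    ys = np.sort(y)
    R = np.searchsorted(ys, y, side="right")      # R(i) = #{j : Y_j <= Y_i}
    L = n - np.searchsorted(ys, y, side="left")   # L(i) = #{j : Y_j >= Y_i}
    u = np.sort(R)                                # increasing rearrangement of R
    v = np.cumsum(u)                              # v_i = u_1 + ... + u_i
    i = np.arange(1, n + 1)
    a = ((2 * n - 2 * i + 1) * u**2).sum() / n**4
    b = ((v + (n - i) * u) ** 2).sum() / n**5
    c = ((2 * n - 2 * i + 1) * u).sum() / n**3
    d = (L * (n - L)).sum() / n**3
    return (a - 2 * b + c**2) / d**2
```

On a tie-free sample the value depends only on n (the ranks are a permutation of 1, . . . , n), and it approaches 2/5 as n grows.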



Asymptotic theory under dependence

Theorem (Lin & Han, 2022)


If the joint c.d.f. of (X , Y ) is continuous and Y is not almost surely a
function of X ,
$$\frac{\xi_n - \mathbb{E}(\xi_n)}{\sqrt{\operatorname{Var}(\xi_n)}} \to N(0, 1)$$
in distribution as n → ∞.

Lin and Han also show that under some additional mild assumptions
on the joint distribution, one can replace E(ξn ) by ξ above, and
thereby obtain confidence intervals for ξ.
Moreover, they give a statistic for accurately estimating the variance
of ξn from data.
Thus, we now have a complete asymptotic theory under dependence
too.



Some simulated examples
[Scatterplot figure, nine panels: the displayed values of ξn are 0.97, 0.51 and 0.03 in the first row, 0.94, 0.27 and 0.03 in the second, and 0.88, 0.56 and 0.02 in the third.]
Figure: Values of ξn for various kinds of scatterplots, with n = 100.
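The figure's qualitative message is easy to reproduce. Below is a small simulation sketch (my own; the exact values depend on the seed, and are not those on the slide) computing ξn via the no-ties formula for a noiseless, a noisy, and an independent relationship, each with n = 100.

```python
import numpy as np

def xi_cont(x, y):
    # no-ties formula: xi_n = 1 - 3 * sum |r_{i+1} - r_i| / (n^2 - 1)
    n = len(x)
    r = np.argsort(np.argsort(y[np.argsort(x)])) + 1  # ranks in x-order
    return 1 - 3 * np.abs(np.diff(r)).sum() / (n**2 - 1)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
noise = rng.normal(0, 1, 100)
print(xi_cont(x, np.sin(4 * x)))           # noiseless: close to 1
print(xi_cont(x, np.sin(4 * x) + noise))   # noisy: intermediate
print(xi_cont(x, rng.normal(0, 1, 100)))   # independent: near 0
```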


A real data example

In a landmark paper, Spellman et al. (1998) studied the expressions of 6223 yeast genes with the goal of identifying genes whose transcript levels oscillate during the cell cycle.
In lay terms, this means that the expressions were studied over a
number of successive time points (23, to be precise), and the goal
was to identify the genes for which the transcript levels follow an
oscillatory pattern.
This example illustrates the utility of correlation coefficients in
detecting patterns, because the number of genes is so large that
identifying patterns by visual inspection is out of the question.



Example, contd.

P-values were obtained for each gene, and a set of significant genes
were selected using the Benjamini–Hochberg FDR procedure, with the
expected proportion of false discoveries set at 0.05.
It turned out that there were 215 genes that were selected by ξn but
by none of the other tests that have been used previously.
The figure in the next slide shows the transcript levels of the top 6 of
these genes (that is, those with the smallest P-values). As the figure
shows, these genes exhibit almost perfect oscillatory behavior — and
yet, they were not selected by other tests.



Transcript levels of the top 6 genes selected by ξn

[Figure: transcript levels over time for the six genes YBL003C, YLR462W, YGR044C, YKL164C, YDR224C and YHR218W, each showing an oscillatory pattern.]


A multivariate generalization

Two measurable spaces are said to be isomorphic to each other if there is a bijection between the two spaces which is measurable and whose inverse is also measurable.
A standard Borel space is a measurable space that is isomorphic to a
Borel subset of a Polish space.
The Borel isomorphism theorem says that any uncountable standard
Borel space is isomorphic to R.



Generalization of ξn

Let X and Y be two standard Borel spaces.


Let ϕ be an isomorphism between X and a Borel subset of R, and let
ψ be an isomorphism between Y and a Borel subset of R.
Let (X , Y ) be an X × Y-valued pair of random variables, and let
(X1 , Y1 ), . . . , (Xn , Yn ) be i.i.d. copies of (X , Y ).
Let X′ := ϕ(X) and Y′ := ψ(Y), so that (X′, Y′) is a pair of real-valued random variables. Let Xi′ := ϕ(Xi) and Yi′ := ψ(Yi) for each i.
Finally, define

$$\xi_n := \xi_n',$$

where ξn′ is the ξ-correlation for the data (X1′, Y1′), . . . , (Xn′, Yn′).
Note that the definition of ξn depends on our choices of ϕ and ψ.



Main theorem about generalized ξn
Theorem (C., 2022)
If Y is not almost surely a constant, then as n → ∞, ξn converges almost
surely to a deterministic limit ξ. This limit belongs to the interval [0, 1]. It
is 0 if and only if X and Y are independent, and it is 1 if and only if there
is a Borel measurable function f : X → Y such that Y = f (X ) almost
surely.

This result is from my recent survey paper "A survey of some recent developments in measures of association", available on arXiv.
This survey paper also contains a list of all the major developments
since my paper came out, including other multivariate generalizations,
extensions to conditional dependence coefficients, power calculations,
suggested improvements, etc.
The univariate ξ coefficient, as well as the above multivariate
generalization and P-values for testing independence, can be
computed using the R package XICOR.
Example of a Borel isomorphism
Here is an example of a Borel isomorphism between Rd and a Borel
subset of R.
Take any x = (x1, . . . , xd) ∈ Rd. Let

$$a_{i,1} \cdots a_{i,k_i} \,.\, b_{i,1} b_{i,2} \cdots$$

be the binary expansion of |xi|.
Filling in extra 0's at the beginning if necessary, let us assume that k1 = · · · = kd = k. Then, let us 'interlace' the digits to get the number

$$a_{1,1} a_{2,1} \cdots a_{d,1} \, a_{1,2} a_{2,2} \cdots a_{d,2} \cdots a_{1,k} a_{2,k} \cdots a_{d,k} \,.\, b_{1,1} b_{2,1} \cdots b_{d,1} \, b_{1,2} b_{2,2} \cdots b_{d,2} \cdots$$
Let ci = 1 if xi ≥ 0 and 0 if xi < 0.
Sticking 1c1 c2 · · · cd in front of the above list, we get an encoding of
the vector x as a real number.
Numerical simulations with ξn computed using the above scheme
produced satisfactory results.
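As an illustration only, here is a finite-precision Python sketch of the digit-interlacing map (the function name and the fixed-point precision are my own choices; a true Borel isomorphism needs infinitely many digits). It encodes a vector as a single integer code by interleaving fixed-point binary digits and prepending the sign bits c1, . . . , cd.

```python
def interleave(xs, int_bits=16, frac_bits=16):
    # Fixed-point encode |x_i|, interleave the bits position by position,
    # then stick 1 c_1 ... c_d (the sign bits) in front, as on the slide.
    signs = [1 if x >= 0 else 0 for x in xs]
    fixed = [int(round(abs(x) * (1 << frac_bits))) for x in xs]
    total = int_bits + frac_bits
    assert all(f < (1 << total) for f in fixed), "value too large for int_bits"
    code = 0
    for b in range(total - 1, -1, -1):        # most significant digit first
        for f in fixed:
            code = (code << 1) | ((f >> b) & 1)
    prefix = 1
    for s in signs:
        prefix = (prefix << 1) | s
    # dividing by 2**(frac_bits * len(xs)) would place the binary point
    return code | (prefix << (total * len(xs)))
```

On any fixed set of representable vectors the map is injective, since the interleaving merely permutes the bits and the sign bits sit in the prefix.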
Where does the formula for ξn come from?

Warning: The next couple of slides are going to be technical. Please bear with me.
Unfortunately, I do not know of a simpler way of explaining where the
formula for ξn comes from.
For simplicity, we shall consider only the case where X and Y are continuous random variables, where the formula for ξn simplifies to

$$\xi_n = 1 - \frac{3}{n^2 - 1} \sum_{i=1}^{n-1} |r_{i+1} - r_i|,$$

where ri is the rank of Y(i) , where (X(1) , Y(1) ), . . . , (X(n) , Y(n) ) is the
rearrangement of (X1 , Y1 ), . . . , (Xn , Yn ) such that X(1) ≤ · · · ≤ X(n) .



Things to show

There are two main things to show. First, we have to show that
ξn → ξ, where
$$\xi = \frac{\int \operatorname{Var}(\mathbb{E}(\mathbf{1}_{\{Y \ge t\}} \mid X)) \, d\mu(t)}{\int \operatorname{Var}(\mathbf{1}_{\{Y \ge t\}}) \, d\mu(t)},$$

where µ is the law of Y .


Then, we need to understand why ξ ∈ [0, 1], ξ = 0 if and only if X
and Y are independent, and ξ = 1 if and only if Y = f (X ) for some
measurable function f .
The second step is easier, so let’s do that first.



Proof sketch for properties of ξ

Recall that

$$\xi = \frac{\int \operatorname{Var}(\mathbb{E}(\mathbf{1}_{\{Y \ge t\}} \mid X)) \, d\mu(t)}{\int \operatorname{Var}(\mathbf{1}_{\{Y \ge t\}}) \, d\mu(t)},$$
where µ is the law of Y .
Since Var(E(1{Y ≥t} |X )) ≤ Var(1{Y ≥t} ) for every t, we have
ξ ∈ [0, 1].
Now, Var(E(1{Y ≥t} |X )) = Var(1{Y ≥t} ) iff E(Var(1{Y ≥t} |X )) = 0 iff
1{Y ≥t} is a measurable function of X .
This holds for all t in the support of Y if and only if Y is a
measurable function of X .
This proves that ξ = 1 iff Y is a measurable function of X .



Proof sketch contd.

Similarly, Var(E(1{Y ≥t} |X )) = 0 iff E(1{Y ≥t} |X ) is a constant, iff


1{Y ≥t} is independent of X .
Again, this holds for all t in the support of Y if and only if Y and X
are independent.
This proves that ξ = 0 iff X and Y are independent.



Proof sketch for ξn → ξ
Recall that ri is the rank of Y(i) , where (X(1) , Y(1) ), . . . , (X(n) , Y(n) ) is
a rearrangement of the data in increasing order of Xi ’s.
Recall that

$$\xi_n = 1 - \frac{3}{n^2 - 1} \sum_{i=1}^{n-1} |r_{i+1} - r_i|.$$
Note that ri /n ≈ F (Y(i) ), where F is the c.d.f. of Y .
(Glivenko–Cantelli)
Thus,

$$\xi_n \approx 1 - \frac{3}{n} \sum_{i=1}^{n-1} |F(Y_{(i+1)}) - F(Y_{(i)})| \approx 1 - \frac{3}{n} \sum_{j=1}^{n} |F(Y_j) - F(Y_{N(j)})|,$$

where N(j) is the index k such that Xk is immediately to the right of Xj.
Proof sketch contd.

Now, $|F(x) - F(y)| = \int (\mathbf{1}_{\{t \le x\}} - \mathbf{1}_{\{t \le y\}})^2 \, d\mu(t)$, where µ is the law of Y.
Since Xi ≈ XN(i) , the random variables Yi and YN(i) are
approximately i.i.d. conditional on X = (X1 , . . . , Xn ).
This gives

$$\mathbb{E}[(\mathbf{1}_{\{t \le Y_i\}} - \mathbf{1}_{\{t \le Y_{N(i)}\}})^2 \mid \mathbf{X}] \approx 2 \operatorname{Var}(\mathbf{1}_{\{t \le Y_i\}} \mid \mathbf{X}) = 2 \operatorname{Var}(\mathbf{1}_{\{t \le Y_i\}} \mid X_i).$$

Thus, $\mathbb{E}(\mathbf{1}_{\{t \le Y_i\}} - \mathbf{1}_{\{t \le Y_{N(i)}\}})^2 \approx 2\, \mathbb{E}[\operatorname{Var}(\mathbf{1}_{\{t \le Y\}} \mid X)]$.


So, we get $\mathbb{E}|F(Y_i) - F(Y_{N(i)})| \approx \int 2\, \mathbb{E}[\operatorname{Var}(\mathbf{1}_{\{t \le Y\}} \mid X)] \, d\mu(t)$.
From this, it is easy to show E(ξn ) → ξ. Using concentration
inequalities, we then get ξn → ξ.

