A New Coefficient of Correlation (Slides) - Sourav Chatterjee
Sourav Chatterjee
$$\xi_n(X, Y) := 1 - \frac{n \sum_{i=1}^{n-1} |r_{i+1} - r_i|}{2 \sum_{i=1}^{n} \ell_i (n - \ell_i)}.$$
where $\mu$ is the law of $Y$. This limit belongs to the interval $[0, 1]$. It is 0 if
and only if $X$ and $Y$ are independent, and it is 1 if and only if there is a
measurable function $f : \mathbb{R} \to \mathbb{R}$ such that $Y = f(X)$ almost surely.
For each $t \in \mathbb{R}$, let $F(t) := \mathbb{P}(Y \le t)$ and $G(t) := \mathbb{P}(Y \ge t)$. Let
$\phi(y, y') := \min\{F(y), F(y')\}$, and define
Lin and Han also show that under some additional mild assumptions
on the joint distribution, one can replace $\mathbb{E}(\xi_n)$ by $\xi$ above, and
thereby obtain confidence intervals for $\xi$.
Moreover, they give a statistic for accurately estimating the variance
of $\xi_n$ from data.
Thus, we now have a complete asymptotic theory under dependence
too.
P-values were obtained for each gene, and a set of significant genes
was selected using the Benjamini–Hochberg FDR procedure, with the
expected proportion of false discoveries set at 0.05.
It turned out that there were 215 genes that were selected by $\xi_n$ but
by none of the other tests that have been used previously.
The figure in the next slide shows the transcript levels of the top 6 of
these genes (that is, those with the smallest P-values). As the figure
shows, these genes exhibit almost perfect oscillatory behavior — and
yet, they were not selected by other tests.
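The selection step mentioned above can be sketched as follows. This is a minimal illustration of the standard Benjamini–Hochberg step-up rule, not the code used for the gene analysis; the function name `benjamini_hochberg` and the example P-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)                      # indices of P-values, ascending
    thresholds = q * np.arange(1, m + 1) / m   # q*k/m for k = 1, ..., m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Largest k with p_(k) <= q*k/m; reject the k smallest P-values.
        k = np.nonzero(below)[0].max()
        reject[order[:k + 1]] = True
    return reject

# Hypothetical P-values, for illustration only.
mask = benjamini_hochberg([0.01, 0.04, 0.03, 0.5], q=0.05)
print(mask)
```

The rule is step-up: a hypothesis can be rejected even when its own P-value exceeds its threshold, as long as some larger P-value falls below the threshold for its rank.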
[Figure: transcript levels of the top 6 genes selected by ξn but by none of the other tests. Panels: YBL003C, YLR462W, YGR044C, YKL164C, YDR224C, YHR218W.]
where $r_i$ is the rank of $Y_{(i)}$, and $(X_{(1)}, Y_{(1)}), \ldots, (X_{(n)}, Y_{(n)})$ is the
rearrangement of $(X_1, Y_1), \ldots, (X_n, Y_n)$ such that $X_{(1)} \le \cdots \le X_{(n)}$.
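The statistic can be sketched in a few lines of Python. This is a minimal illustration, not the author's code. The function name `xi_n` is hypothetical. The counts $r_i = \#\{j : Y_{(j)} \le Y_{(i)}\}$ and $\ell_i = \#\{j : Y_{(j)} \ge Y_{(i)}\}$ follow the usual tie-aware definition, which is an assumption here because $\ell_i$ is not defined on these slides. Ties among the $X_i$ are broken by sort order in this sketch, whereas the original definition breaks them uniformly at random.

```python
import numpy as np

def xi_n(x, y):
    """Tie-aware xi_n for paired samples x, y (1-D sequences of equal length)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Rearrange the pairs so that X_(1) <= ... <= X_(n).
    # Assumption: ties in x are broken by stable sort order, not at random.
    order = np.argsort(x, kind="stable")
    y_by_x = y[order]
    y_sorted = np.sort(y_by_x)
    # r_i = number of j with Y_(j) <= Y_(i)
    r = np.searchsorted(y_sorted, y_by_x, side="right")
    # l_i = number of j with Y_(j) >= Y_(i)
    l = n - np.searchsorted(y_sorted, y_by_x, side="left")
    numerator = n * np.abs(np.diff(r)).sum()
    denominator = 2 * (l * (n - l)).sum()  # zero if all y values are equal
    return 1.0 - numerator / denominator

# For n = 5 and Y = X, the ranks are 1, ..., 5 and the value is 0.5.
print(xi_n([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
```

For small $n$ the value stays well below 1 even for a perfect functional relationship; it approaches 1 only as $n$ grows. The sketch does not guard against a constant $Y$, for which the denominator is zero.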
There are two main things to show. First, we have to show that
$\xi_n \to \xi$, where
$$\xi = \frac{\int \operatorname{Var}(\mathbb{E}(1_{\{Y \ge t\}} \mid X))\, d\mu(t)}{\int \operatorname{Var}(1_{\{Y \ge t\}})\, d\mu(t)},$$
Recall that
$$\xi = \frac{\int \operatorname{Var}(\mathbb{E}(1_{\{Y \ge t\}} \mid X))\, d\mu(t)}{\int \operatorname{Var}(1_{\{Y \ge t\}})\, d\mu(t)},$$
where $\mu$ is the law of $Y$.
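Since $\operatorname{Var}(\mathbb{E}(1_{\{Y \ge t\}} \mid X)) \le \operatorname{Var}(1_{\{Y \ge t\}})$ for every $t$, we have
$\xi \in [0, 1]$.
Now, $\operatorname{Var}(\mathbb{E}(1_{\{Y \ge t\}} \mid X)) = \operatorname{Var}(1_{\{Y \ge t\}})$ iff $\mathbb{E}(\operatorname{Var}(1_{\{Y \ge t\}} \mid X)) = 0$ iff
$1_{\{Y \ge t\}}$ is a measurable function of $X$.
This holds for all $t$ in the support of $Y$ if and only if $Y$ is a
measurable function of $X$.
This proves that $\xi = 1$ iff $Y$ is a measurable function of $X$.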
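Both the inequality and the equality case above rest on the law of total variance, applied to the indicator for each fixed $t$. Spelled out (a standard identity, added here only to make the step explicit):

```latex
\operatorname{Var}\bigl(1_{\{Y \ge t\}}\bigr)
  = \operatorname{Var}\bigl(\mathbb{E}(1_{\{Y \ge t\}} \mid X)\bigr)
  + \mathbb{E}\bigl(\operatorname{Var}(1_{\{Y \ge t\}} \mid X)\bigr).
```

The second term on the right is nonnegative, which gives the inequality, and the two variances coincide exactly when that term is zero.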
of Y .
Since $X_i \approx X_{N(i)}$, the random variables $Y_i$ and $Y_{N(i)}$ are
approximately i.i.d. conditional on $X = (X_1, \ldots, X_n)$.
This gives