High-Dimensional, Two-Sample Testing
1 Introduction
Given two samples
$$X_1, \ldots, X_n \sim P, \qquad Y_1, \ldots, Y_m \sim Q,$$
we want to test
$$H_0: P = Q \quad \text{versus} \quad H_1: P \neq Q.$$
Throughout, we will assume that n/(n + m) → π ∈ (0, 1) as the sample size increases.
In low dimensions, there are many tests with good power. For example, we could use the
test statistic
$$T = \sup_t |\hat{F}_n(t) - \hat{G}_m(t)|$$
where $\hat{F}_n$ and $\hat{G}_m$ are the empirical cdf's of the two samples. To find the $\alpha$-level critical value we can use
asymptotic theory or permutation testing. But there are other approaches for the high-
dimensional case.
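Here is a minimal Python sketch of this statistic with a permutation critical value (the function names and the number of permutations are my own choices):

    import numpy as np

    def ks_stat(x, y):
        # T = sup_t |F_hat_n(t) - G_hat_m(t)|, evaluated at the pooled sample points
        z = np.sort(np.concatenate([x, y]))
        F = np.searchsorted(np.sort(x), z, side="right") / len(x)
        G = np.searchsorted(np.sort(y), z, side="right") / len(y)
        return np.max(np.abs(F - G))

    def perm_test(x, y, stat=ks_stat, B=999, seed=0):
        # permutation test: recompute the statistic under random relabelings of the pooled sample
        rng = np.random.default_rng(seed)
        z, n = np.concatenate([x, y]), len(x)
        T = stat(x, y)
        null = np.array([stat(*np.split(rng.permutation(z), [n])) for _ in range(B)])
        return T, (1 + np.sum(null >= T)) / (B + 1)   # statistic and permutation p-value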
2 Metrics
One way to define a test is to first define a metric between distributions. For example
$$d(P, Q) = \sup_{g \in \mathcal{G}} \left| \int g \, dP - \int g \, dQ \right|$$
for some class of functions $\mathcal{G}$. Here are some examples. If $\mathcal{G} = \{g : \|g\|_\infty \leq 1\}$ then $d(P, Q)$ is the total variation distance. If $\mathcal{G}$ is the set of $g$ such that
$$\sup_{x \neq y} \frac{|g(y) - g(x)|}{\|x - y\|} \leq 1$$
then $d(P, Q)$ is the earth-mover distance (or Wasserstein distance). This is equivalent to $\inf_R E_R \|X - Y\|$ where the infimum is over all joint distributions $R$ for $(X, Y)$ with marginals $P$ and $Q$. If $\mathcal{G} = \{I_{(-\infty, t]} : t \in \mathbb{R}^d\}$ then $d(P, Q)$ is the Kolmogorov-Smirnov distance. If $\mathcal{G}$ is the unit ball of a reproducing kernel Hilbert space (RKHS) with kernel $K$, then $d(P, Q)$ is the RKHS distance, also known as the maximum mean discrepancy. See
Sriperumbudur et al (2010) for more examples.
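For one-dimensional samples some of these metrics are easy to estimate directly. For example, here is a quick sketch using scipy's built-in empirical earth-mover distance (the simulated data are just for illustration):

    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=500)   # sample from P
    y = rng.normal(0.3, 1.0, size=500)   # sample from Q
    # empirical earth-mover (Wasserstein-1) distance between the two samples
    print(wasserstein_distance(x, y))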
How do we know when to reject $H_0$? One approach is to find the limiting distribution of $T$ under $H_0$. This turns out to be, for the RKHS distance (suitably rescaled),
$$T^2 \rightsquigarrow \sum_{j=1}^{\infty} \lambda_j (Z_j^2 - 1)$$
where the $Z_j$'s are N(0,1) and the $\lambda_j$'s are the eigenvalues defined by
$$\int L(x, y) \psi_j(x) \, dP(x) = \lambda_j \psi_j(y)$$
where $L(x, y) = K(x, y) - E[K(x, X)] - E[K(X, y)] + E[K(X, X')]$, with $X$ and $X'$ independent draws from $P$. This limiting distribution is called a Gaussian chaos. It has infinitely many nuisance parameters (the $\lambda_j$'s), which makes it unusable in practice. Instead, we use the permutation distribution to choose the critical value.
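Concretely, the permutation test recomputes the kernel statistic under random relabelings of the pooled sample. Here is a minimal sketch; the Gaussian kernel, the fixed bandwidth h, and the function names are my own choices, not prescribed by the references:

    import numpy as np

    def mmd2(x, y, h=1.0):
        # squared RKHS (MMD) distance with Gaussian kernel K(u,v) = exp(-||u-v||^2 / (2 h^2))
        # x, y: arrays of shape (n, d) and (m, d)
        def K(a, b):
            d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=2)
            return np.exp(-d2 / (2 * h ** 2))
        return K(x, x).mean() + K(y, y).mean() - 2 * K(x, y).mean()

    def mmd_perm_test(x, y, B=499, h=1.0, seed=0):
        rng = np.random.default_rng(seed)
        z, n = np.vstack([x, y]), len(x)
        T = mmd2(x, y, h)
        null = []
        for _ in range(B):
            idx = rng.permutation(len(z))
            null.append(mmd2(z[idx[:n]], z[idx[n:]], h))
        # permutation p-value
        return T, (1 + np.sum(np.array(null) >= T)) / (B + 1)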
For $\beta$-smooth densities, the minimax separation rate for this testing problem is $n^{-2\beta/(4\beta+d)}$. This was proved by Arias-Castro, Pelletier and Saligrama (2016) based on techniques developed by Ingster (1987). We'll discuss this more below.
The problem is that the kernel is hiding a lot. To see this, note that $T$ is essentially the same as
$$\int (\hat{p}_h(x) - \hat{q}_h(x))^2 \, dx$$
where $\hat{p}_h$ and $\hat{q}_h$ are kernel density estimators. This test was proposed by Anderson, Hall and Titterington (1994). But remember, the kernel has a tuning parameter. If it is Gaussian, there is a bandwidth. The statement $T - d(P, Q) = O_P(1/\sqrt{N})$ assumes we do not change the bandwidth. But to have good power, we need to let the bandwidth go to zero, and then we no longer have the fast rate. The power of the RKHS test in general, nonparametric settings is not well studied.
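To make the connection concrete, here is a rough one-dimensional sketch of the statistic $\int (\hat{p}_h - \hat{q}_h)^2$, approximating the integral by a Riemann sum on a grid; the Gaussian kernel and the grid size are arbitrary choices of mine:

    import numpy as np

    def kde(data, grid, h):
        # Gaussian kernel density estimate evaluated on a grid (1-d data)
        u = (grid[:, None] - data[None, :]) / h
        return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

    def l2_kde_stat(x, y, h, num_grid=512):
        # approximate the integral of (p_hat - q_hat)^2 by a Riemann sum
        lo, hi = min(x.min(), y.min()) - 3 * h, max(x.max(), y.max()) + 3 * h
        grid = np.linspace(lo, hi, num_grid)
        diff = kde(x, grid, h) - kde(y, grid, h)
        return np.sum(diff ** 2) * (grid[1] - grid[0])

The bandwidth h is exactly the tuning parameter discussed above; the critical value would again come from the permutation distribution.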
Now suppose we want a confidence interval for $\theta = d^2(P, Q)$. Unfortunately, there is no known practical method if we use the above estimator. However, we can use the idea in Gretton et al (2012) to get a simple (but statistically inefficient) method. Instead of using a $U$-statistic, we break the sample into blocks of size two. For simplicity, assume that $n_1 = n_2 = n$. Define
$$\hat{\theta} = \frac{2}{n} \sum_j h\big( (X_{2j-1}, Y_{2j-1}), (X_{2j}, Y_{2j}) \big) \equiv \frac{1}{m} \sum_j R_j,$$
where $m = n/2$ is the number of blocks and $h((x, y), (x', y')) = K(x, x') + K(y, y') - K(x, y') - K(x', y)$ is the kernel of the $U$-statistic. The $R_j$'s are i.i.d., so a Normal-approximation confidence interval for $\theta$ can be built from their sample mean and standard error.
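A minimal sketch of this block estimator and the resulting Normal-approximation confidence interval (the Gaussian kernel with a fixed bandwidth is my own choice):

    import numpy as np
    from scipy import stats

    def gauss_k(u, v, h=1.0):
        return np.exp(-np.sum((u - v) ** 2) / (2 * h ** 2))

    def block_mmd_ci(x, y, h=1.0, alpha=0.05):
        # theta_hat = average of R_j over disjoint blocks of size two
        n = (min(len(x), len(y)) // 2) * 2
        R = np.array([gauss_k(x[2*j], x[2*j+1], h) + gauss_k(y[2*j], y[2*j+1], h)
                      - gauss_k(x[2*j], y[2*j+1], h) - gauss_k(x[2*j+1], y[2*j], h)
                      for j in range(n // 2)])
        theta_hat = R.mean()
        se = R.std(ddof=1) / np.sqrt(len(R))
        z = stats.norm.ppf(1 - alpha / 2)
        return theta_hat, (theta_hat - z * se, theta_hat + z * se)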
3 Graph-Based Tests

Another class of tests is based on geometric graphs. Let $Z_1, \ldots, Z_N$ be the combined sample where $N = n + m$. Let $L_i = 1$ if $Z_i$ is from group 1 and $L_i = 2$ if $Z_i$ is from group 2. The test statistic is
$$T = \frac{1}{Nk} \sum_{i=1}^{N} \sum_{r=1}^{k} B_i(r)$$
where $B_i(r) = 1$ if the $r$th nearest neighbor of $Z_i$ has the same label as $Z_i$. This corresponds to forming a $k$ nearest neighbor graph and asking how many of the $k$ nearest neighbors are
from the same group as the node. The probability of getting the same label under H0 is
$\mu = \pi^2 + (1 - \pi)^2$.
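Here is a minimal sketch of the k-NN statistic using scikit-learn; the observed value can then be compared to $\mu$ via the permutation distribution (the function name and the default k are mine):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_stat(x, y, k=3):
        # fraction of the k nearest neighbors (in the pooled sample) that share the node's label
        # x, y: arrays of shape (n, d) and (m, d)
        z = np.vstack([x, y])
        labels = np.r_[np.ones(len(x)), 2 * np.ones(len(y))]
        nn = NearestNeighbors(n_neighbors=k + 1).fit(z)   # +1: each point is its own closest neighbor
        _, idx = nn.kneighbors(z)
        same = labels[idx[:, 1:]] == labels[:, None]
        return same.mean()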
In high dimensions we need to correct the test to account for some strange effects (Mondal, Biswas and Ghosh, 2015). If $P$ concentrates its data on a ring $R$ and $Q$ concentrates its data on a larger ring $S$ that surrounds $R$, then every point from $Q$ can have its nearest neighbor come from $P$.
Here is an example. Let's take $k = 1$ and $n = m$. Let $B_i = 1$ if the nearest neighbor of $Z_i$ is from the same group. The test statistic is $T = (2n)^{-1} \sum_i B_i$. We are testing
$$H_0: P(B_i = 1) = \frac{1}{2} \quad \text{versus} \quad H_1: P(B_i = 1) > \frac{1}{2}.$$
Suppose that $X_1, X_2 \sim N(\mu_1, \sigma_1^2 I)$ and $Y_1, Y_2 \sim N(\mu_2, \sigma_2^2 I)$. Take $\mu_1 = (a, \ldots, a)$ and $\mu_2 = (b, \ldots, b)$. Now, as $d \to \infty$,
$$\frac{1}{d}\|X_1 - X_2\|^2 \stackrel{P}{\to} 2\sigma_1^2, \qquad \frac{1}{d}\|Y_1 - Y_2\|^2 \stackrel{P}{\to} 2\sigma_2^2, \qquad \frac{1}{d}\|X_1 - Y_2\|^2 \stackrel{P}{\to} \sigma_1^2 + \sigma_2^2 + (a - b)^2.$$
Let $a = 0$, $b = 0.2$, $\sigma_1^2 = 1$, $\sigma_2^2 = 1.2$. Then $2\sigma_1^2 = 2$ and $\sigma_1^2 + \sigma_2^2 + (a - b)^2 = 2.24 < 2.4 = 2\sigma_2^2$, so for large $d$ the nearest neighbor of every $X_i$ is another $X$, and the nearest neighbor of every $Y_i$ is also an $X$:
$$\begin{array}{c|cccc|cccc}
 & X_1 & X_2 & \cdots & X_n & Y_1 & Y_2 & \cdots & Y_n \\
\hline
B_i & 1 & 1 & \cdots & 1 & 0 & 0 & \cdots & 0
\end{array}$$
We will not reject $H_0$ in this case since $(2n)^{-1} \sum_i B_i = 1/2$. The problem is that $P(B_i = 1 \mid L_i = 1) = 1$ and $P(B_i = 1 \mid L_i = 2) = 0$ but $P(B_i = 1) = 1/2$. However, if we do a
two-sided test, separately within each group, we would reject. Mondal, Biswas and Ghosh (2015) suggest taking
$$U = (T_1 - \theta)^2 + (T_2 - \theta)^2$$
where $T_j = (nk)^{-1} \sum_{i: L_i = j} \sum_{Z_\ell \in N_i} I(L_i = L_\ell)$ and $N_i$ is the set of $k$ nearest neighbors of $Z_i$. However, this test can have low power in other cases. The best strategy is to use both tests, i.e. $W = T \vee U$.
A similar test, called the cross-match test, was defined by Rosenbaum (2005). We take the pooled sample and partition the data into pairs $W_1 = (Z_1, Z_2), W_2 = (Z_3, Z_4), \ldots$. The partition is chosen to minimize $\sum_j \|Z_{2j} - Z_{2j-1}\|^2$. Let
$$T = \sum_i A_i$$
where $A_i = 1$ if the $i$th pair has differing labels (i.e. (1,2) or (2,1)) and $A_i = 0$ otherwise. We reject when $T$ is small. The exact distribution of $T$ under $H_0$ is known; it is hypergeometric. It can accurately be approximated with a $N(\mu, \sigma^2)$ where
$$\mu = \frac{mn}{N - 1}, \qquad \sigma^2 = \frac{2n(n-1)m(m-1)}{(N-3)(N-1)^2}.$$
This accurate, simple limiting distribution for $T$ under the null is the main advantage of this test. However, it seems to have less power than the NN test. Also, the distribution of $T$ under $H_1$ is not known. We could have defined $T = \sum_i B_i$ where $B_i = 1 - A_i$ and rejected when $T$ is large. This is then the same as the $k$-NN test with $k = 1$ except that each point belongs to exactly one pair, so the pairs do not overlap.
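Given the observed cross-match count $T$, the Normal approximation above makes the p-value immediate. Here is a sketch; the minimum-distance pairing itself must be computed separately with any minimum-weight perfect matching routine:

    import numpy as np
    from scipy import stats

    def crossmatch_pvalue(T, n, m):
        # Normal approximation to the null distribution of the cross-match count T
        N = n + m
        mu = n * m / (N - 1)
        sigma2 = 2 * n * (n - 1) * m * (m - 1) / ((N - 3) * (N - 1) ** 2)
        # we reject for small T, so the p-value is the lower tail
        return stats.norm.cdf((T - mu) / np.sqrt(sigma2))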
4 Smooth Tests
Neyman (1937) introduced a method for testing that takes advantage of smoothness. First,
consider one-dimensional data $Y_1, \ldots, Y_n \sim P$. Suppose we want to test $H_0: P = \mathrm{Unif}(0,1)$.
If we want to have power against smooth alternatives, Neyman proposed that we define
$$p_\theta(x) = c(\theta) \exp\left( \sum_{j=1}^{k} \theta_j \psi_j(x) \right)$$
for fixed basis functions $\psi_1, \ldots, \psi_k$.
The null hypothesis corresponds to $\theta = (\theta_1, \ldots, \theta_k) = (0, \ldots, 0)$. One way to test $H_0$ is to use the likelihood ratio test $T = 2(\ell(\hat{\theta}) - \ell(0))$. Under $H_0$, $T \rightsquigarrow \chi^2_k$. But Neyman pointed out that there is a computationally easier test,
$$U = n \sum_j \overline{\psi}_j^2$$
where
$$\overline{\psi}_j = \frac{1}{n} \sum_i \psi_j(Y_i).$$
This also has the property that, under $H_0$, $U \rightsquigarrow \chi^2_k$. But it avoids having to deal with the normalizing constant.
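Here is a minimal sketch of $U$ for data on $[0,1]$, using the cosine basis $\psi_j(x) = \sqrt{2}\cos(\pi j x)$ (the choice of basis and of $k$ is mine; any orthonormal system with mean zero under the uniform distribution works):

    import numpy as np
    from scipy import stats

    def neyman_U(y, k=4):
        # y: 1-d array of values in [0,1]
        # psi_j(x) = sqrt(2) cos(pi j x), j = 1..k, orthonormal and mean zero under Unif(0,1)
        j = np.arange(1, k + 1)
        psi_bar = np.mean(np.sqrt(2) * np.cos(np.pi * j[None, :] * y[:, None]), axis=0)
        U = len(y) * np.sum(psi_bar ** 2)
        return U, stats.chi2.sf(U, df=k)   # statistic and chi^2_k p-value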
Now we move to the two-sample case. Let $F(t) = P(X \leq t)$ and $G(t) = Q(Y \leq t)$. Let $Z = F(Y)$. Then the cdf of $Z$ is
$$H(z) = \mathbb{P}(Z \leq z) = \mathbb{P}(F(Y) \leq z) = \mathbb{P}(Y \leq R(z)) = G(R(z))$$
where $R(z) = F^{-1}(z)$. Under $H_0$, $Z \sim \mathrm{Unif}(0, 1)$. Now $H$ has density
$$\rho(z) = \frac{q(F^{-1}(z))}{p(F^{-1}(z))}$$
and $\rho(z) = 1$ under $H_0$. Bera, Ghosh and Xiao (2013) suggest using the family
$$\rho_\theta(z) = c(\theta) \exp\left( \sum_{j=1}^{k} \theta_j \psi_j(z) \right).$$
Their test statistic is $m \overline{\psi}^T \overline{\psi}$ where
$$\overline{\psi}_j = \frac{1}{m} \sum_i \psi_j(V_i)$$
and $V_i = \hat{F}_n(Y_i)$. Bera, Ghosh and Xiao (2013) prove that the statistic again has a limiting $\chi^2_k$ distribution.
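The two-sample version simply transforms the $Y$'s through the empirical cdf of the $X$'s and then applies the same machinery. A sketch, again with a cosine basis of my choosing:

    import numpy as np
    from scipy import stats

    def bgx_stat(x, y, k=4):
        # x, y: 1-d arrays; V_i = F_hat_n(Y_i) maps the Y's through the empirical cdf of the X sample
        V = np.searchsorted(np.sort(x), y, side="right") / len(x)
        j = np.arange(1, k + 1)
        psi_bar = np.mean(np.sqrt(2) * np.cos(np.pi * j[None, :] * V[:, None]), axis=0)
        T = len(y) * np.sum(psi_bar ** 2)
        return T, stats.chi2.sf(T, df=k)   # compare to chi^2_k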
Zhou, Zheng and Zhang (arXiv:1509.03459) considered the high-dimensional case. They consider all one-dimensional projections of the data. Their test is
$$T = \sqrt{\frac{nm}{n+m}} \, \sup_u T(u)$$
where the supremum is over the $(d-1)$-dimensional sphere and $T(u)$ is the Bera-Ghosh-Xiao statistic based on the projected one-dimensional data $u^T X_i$ and $u^T Y_i$. They also allow the parameter $k$ to be chosen from the data. (In fact, they maximize the test over $k$.)
5 Histogram Test
Under smoothness assumptions and compact support, Ingster (1987) showed that optimal tests can be obtained using histograms. Arias-Castro, Pelletier and Saligrama (2016) extended this to the multivariate case. Assume smoothness level $\beta$. For simplicity let $m = n$. Form a histogram with $N \approx n^{2/(4\beta+1)}$ bins. Set
$$T = \sum_j (C_j - D_j)^2$$
where $C_j$ is the number of $X_i$'s in bin $j$ and $D_j$ is the number of $Y_i$'s in bin $j$. We reject for $T$ large. This test is, in theory, optimal. In fact, Ingster later showed that the test can be made adaptive to the degree of smoothness.
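A one-dimensional sketch of the histogram statistic; the constant in front of the number of bins is arbitrary, and the critical value would again come from the permutation distribution:

    import numpy as np

    def hist_stat(x, y, beta=2.0):
        # number of bins grows like n^{2/(4 beta + 1)}; the constant is arbitrary
        n = len(x)
        num_bins = max(2, int(np.ceil(n ** (2.0 / (4.0 * beta + 1.0)))))
        edges = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), num_bins + 1)
        C, _ = np.histogram(x, bins=edges)
        D, _ = np.histogram(y, bins=edges)
        return np.sum((C - D) ** 2)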
6 Sparsity
Let us write
$$X_i = (X_i(1), \ldots, X_i(d)), \qquad Y_i = (Y_i(1), \ldots, Y_i(d)).$$
In some cases, we might suspect that $P$ and $Q$ only differ in a few features. In other words, there is sparsity. If so, the easiest thing to do is all the one-dimensional marginal tests with a Bonferroni correction. Let $T_j$ be your favorite one-dimensional test applied to the $j$th feature only. Then take the statistic to be $T = \vee_j T_j$. This test will have good power in the sparse case and it is very easy to compute.
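A sketch using the two-sample Kolmogorov-Smirnov test as the "favorite one-dimensional test", with the Bonferroni correction:

    import numpy as np
    from scipy import stats

    def marginal_bonferroni_test(x, y, alpha=0.05):
        # x, y: arrays of shape (n, d) and (m, d); test each coordinate separately
        d = x.shape[1]
        pvals = np.array([stats.ks_2samp(x[:, j], y[:, j]).pvalue for j in range(d)])
        # Bonferroni: reject H0 if the smallest marginal p-value is below alpha / d
        return pvals.min() <= alpha / d, pvals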
7 Minimax Theory
What does it mean for a test to be optimal? Just as there is a theory for minimax estimation,
there is also a theory for minimax testing. We discussed this a few weeks ago. I’ll remind
you of a few basic facts.
Recall that a level $\alpha$ test is a function $\phi$ of the data taking values 0 or 1 such that $P(\phi = 1) \leq \alpha$ for every $P \in H_0$. Let $\Phi_n$ denote all level $\alpha$ tests. The minimax type II error, for a set of distributions $\mathcal{P}$, is
$$\beta_n(\epsilon) = \inf_{\phi \in \Phi_n} \sup_{P, Q} P^n(\phi = 0)$$
where the supremum is over all $P, Q \in \mathcal{P}$ such that $d(P, Q) > \epsilon$. Fix any small $\delta > 0$. We say that the minimax separation is $\epsilon_n$ if $\epsilon < \epsilon_n$ implies that $\beta_n(\epsilon) \geq \delta$.
If $\mathcal{P}$ is the $\beta$ smoothness class and $d$ is the $L_2$ distance between densities, then Arias-Castro, Pelletier and Saligrama (2016) show that
$$\epsilon_n \asymp \left( \frac{1}{n} \right)^{\frac{2\beta}{4\beta + d}}.$$
The minimax risk is achieved by the histogram test.
8 Discrete Distributions
Suppose that $X_i$ and $Y_i$ are discrete random variables taking values in $\{1, \ldots, d\}$. Let
$$C_j = \#\{i : X_i = j\}, \qquad D_j = \#\{i : Y_i = j\}.$$
Let $C = (C_1, \ldots, C_d)$ and $D = (D_1, \ldots, D_d)$. These are multinomial and we can test $H_0: P = Q$ using a likelihood ratio test or $\chi^2$ test.

But when $d$ is large, the usual tests might have poor power. Improved tests have been developed by Chan et al (2014) and Diakonikolas and Kane (2016), for example. Moreover, these tests are designed to have good power against alternatives with respect to total variation distance. For example, Chan et al propose the test statistic
$$T = \sum_j \frac{(C_j - D_j)^2 - (C_j + D_j)}{C_j + D_j}.$$
We reject when $T$ is large. They prove that this test has good power as long as $\mathrm{TV}(P, Q) > d^{1/4}/\sqrt{n}$, which is the minimax bound.
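A sketch of this statistic computed from the observed counts; cells with $C_j + D_j = 0$ contribute nothing and are skipped:

    import numpy as np

    def chan_stat(C, D):
        # T = sum_j [(C_j - D_j)^2 - (C_j + D_j)] / (C_j + D_j), over non-empty cells
        C, D = np.asarray(C, dtype=float), np.asarray(D, dtype=float)
        S = C + D
        keep = S > 0
        return np.sum(((C[keep] - D[keep]) ** 2 - S[keep]) / S[keep])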
References
Anderson, Hall and Titterington (1994). Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates. Journal of Multivariate Analysis, 41-54.
Berlinet and Thomas-Agnan (2011). Reproducing kernel Hilbert spaces in probability and
statistics, Springer.
Chan, S., et al. (2014). Optimal algorithms for testing closeness of discrete distributions. Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA).
Gretton, Borgwardt, Rasch, Schölkopf and Smola (2007). A kernel method for the two-sample-problem. NIPS.
Henze, N. (1988). A multivariate two-sample test based on the number of nearest neighbor
type coincidences. The Annals of Statistics, 772-783.
Mondal, Biswas and Ghosh (2015). On high dimensional two-sample tests based on nearest
neighbors. Journal of Multivariate Analysis, 168-178.
Sriperumbudur, B. K., et al. (2010). Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11, 1517-1561.
Székely and Rizzo (2004). Testing for equal distributions in high dimension. InterStat, 1-6.