Nonparametric Methods
Edited by
P. Bickel
P. Diggle
S. Fienberg
U. Gather
I. Olkin
S. Zeger
Multivariate Nonparametric
Methods with R
An Approach Based on Spatial Signs
and Ranks
Prof. Hannu Oja
University of Tampere
Tampere School of Public Health
FIN-33014 Tampere
Finland
[email protected]
ISSN 0930-0325
ISBN 978-1-4419-0467-6 e-ISBN 978-1-4419-0468-3
DOI 10.1007/978-1-4419-0468-3
Springer New York Dordrecht Heidelberg London
This book introduces a new way to analyze multivariate data. The analysis of data
based on multivariate spatial signs and ranks proceeds very much as does a tradi-
tional multivariate analysis relying on the assumption of multivariate normality: the
L2 norm is just replaced by different L1 norms, observation vectors are replaced by
their (standardized and centered) spatial signs and ranks, and so on. The methods are
fairly efficient and robust, and no moment assumptions are needed. A unified the-
ory starting with the simple one-sample location problem and proceeding through
the several-sample location problems to the general multivariate linear regression
model and finally to the analysis of cluster-dependent data is presented.
The one-sample location case is treated thoroughly in Chapters 5-8. The treatment
starts with the familiar Hotelling's T^2 test and the corresponding estimate, the
sample mean vector. The spatial sign test with the spatial median as well as the
spatial signed-rank test with the Hodges-Lehmann estimate are treated in Chapters 6 and
7. All the tests and estimates are made practical; the algorithms for the estimates
and the estimates of the covariance matrices of the estimates are also discussed and
described in detail. Chapter 8 is devoted to the comparisons of these three competing
approaches.
Sign and rank tests with companion estimates for the comparison of two or sev-
eral treatment effects are given in Chapter 11 (independent samples) and Chapter 12
(randomized block design). The general multivariate multiple regression case with
L1 objective functions is finally discussed in Chapter 13. The book ends with sign
and rank procedures for cluster-dependent data in Chapter 14.
Throughout the book, the theory is illustrated with examples. For computation
of the statistical procedures described in the book, the R packages MNM and
SpatialNP are available on CRAN. In the analysis we always compare three different
score functions, the identity score, the spatial sign score, and the spatial rank (or
spatial signed-rank) score, and the general estimating and testing strategy is ex-
plained in each case. Some basic vector and matrix algebra tools and asymptotic
results are given in Appendices A and B.
Acknowledgements
The research reported in this book is to a great degree based on the thesis work
of several ex-students of mine, including Ahti Niinimaa, Jyrki Möttönen, Samuli
Visuri, Esa Ollila, Sara Taskinen, Jaakko Nevalainen, Seija Sirkiä, and Klaus Nord-
hausen. I wish to thank them all. This would not have been possible without their
work. I have been lucky to have such excellent students. My special thanks go to
Klaus Nordhausen for his hard work in writing and putting together (with Jyrki
and Seija) the R-code to implement the theory. I am naturally also indebted to
many colleagues and coauthors for valuable and stimulating discussions. I express
my sincere thanks for discussions and cooperation in this specific research area
with Biman Chakraborty, Probal Chaudhuri, Christopher Croux, Marc Hallin, Tom
Hettmansperger, Visa Koivunen, Denis Larocque, Jukka Nyblom, Davy Paindav-
eine, Ron Randles, Bob Serfling, Juha Tienari, and Dave Tyler.
Thanks are also due to the Academy of Finland for several research grants for work-
ing in the area of multivariate nonparametric methods. I also thank the editors of this
series and John Kimmel of Springer-Verlag for his encouragement and patience.
Tampere,
January 2010 Hannu Oja
Chapter 1
Introduction
The univariate concepts of sign and rank are based on the ordering of the uni-
variate data y_1, ..., y_n. The ordering is manifested with the univariate sign function
U(y) with values −1, 0, and 1 for y < 0, y = 0, and y > 0, respectively. The sign
and centered rank of the observation y_i are then U(y_i) and AVE_j{U(y_i − y_j)}. In the
multivariate case there is no natural coordinate-free ordering of the data points; see
Barnett (1976) for a discussion on the problem. An approach utilizing L1 objective
or criterion functions is therefore often used to extend these concepts to the multi-
variate case. Let Y = (y_1, ..., y_n)' be an n × p data matrix with n observations and
p variables. The multivariate spatial sign U_i, multivariate spatial (centered) rank R_i,
and multivariate spatial signed-rank Q_i, i = 1, ..., n, may be implicitly defined us-
ing the three L1 criterion functions with Euclidean norm |·|. The sign, rank, and
signed-rank are then defined implicitly by
AVE{|y_i|} = AVE{U_i'y_i},
(1/2) AVE{|y_i − y_j|} = AVE{R_i'y_i}, and
(1/4) AVE{|y_i − y_j| + |y_i + y_j|} = AVE{Q_i'y_i}.
See Hettmansperger and Aubuchon (1988). Note also that the sign, centered rank,
and signed-rank may be seen as scores T(y) corresponding to the three objective
functions. These score functions then are
U(y) = |y|^{-1} y,
R(y) = AVE_i{U(y − y_i)}, and
Q(y) = (1/2) AVE_i{U(y − y_i) + U(y + y_i)}.
The identity score T(y_i) = y_i, i = 1, ..., n, is the score corresponding to the regular
L2 criterion AVE{|y_i|^2} = AVE{y_i'y_i}.
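To make the definitions concrete, here is a minimal base-R sketch of the three score functions; the function names are ours for illustration and are not taken from the MNM package.

spatial.sign.score <- function(y) {
  # U(y) = |y|^{-1} y, with U(0) = 0
  r <- sqrt(sum(y^2))
  if (r > 0) y / r else 0 * y
}

spatial.rank.score <- function(y, Y) {
  # R(y) = AVE_i { U(y - y_i) }; Y is the n x p data matrix (p >= 2)
  rowMeans(apply(Y, 1, function(yi) spatial.sign.score(y - yi)))
}

spatial.signedrank.score <- function(y, Y) {
  # Q(y) = (1/2) AVE_i { U(y - y_i) + U(y + y_i) }
  rowMeans(apply(Y, 1, function(yi)
    (spatial.sign.score(y - yi) + spatial.sign.score(y + yi)) / 2))
}

# Example: observed signs, ranks, and signed-ranks as rows
Y <- matrix(rnorm(100), ncol = 2)
U <- t(apply(Y, 1, spatial.sign.score))
R <- t(apply(Y, 1, spatial.rank.score, Y = Y))
Q <- t(apply(Y, 1, spatial.signedrank.score, Y = Y))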
Multivariate spatial sign and spatial rank methods are thus based on the L1 ob-
jective functions and corresponding score functions. The L1 methods have a long
history in the univariate case but they are often regarded as computationally highly
demanding. See, however, Portnoy and Koenker (1997). The first objective function
AVE{|yi |}, if applied to the residuals in the general linear regression model, is the
mean deviation of the residuals from the origin, and it is the basis for the so-called
least absolute deviation (LAD) methods. It yields different median-type estimates
and sign tests in the one-sample, two-sample, several-sample and finally general
linear model settings. The second objective function AVE{|yi − y j |} is the mean
difference of the residuals, which in fact measures how close together the residu-
als are. The second and third objective functions generate Hodges-Lehmann type
estimates and rank tests for different location problems.
The general strategy in the analysis of the multivariate data followed in this book
is first to replace the original observations yi by some scores Ti = T(yi ) or, in more
complex designs, by centered and/or standardized scores T̂i = T̂(yi ), i = 1, ..., n.
The statistical tests are then based on the new data matrix of transformed scores (T̂_1, ..., T̂_n)'.
The spatial sign score U(y), the spatial rank score R(y), and the spatial signed-
rank score Q(y) thus correspond to the three L1 criterion functions given above.
The tests are then rotation invariant but not affine invariant. Inner centering and/or
standardization is used to attain the desired affine invariance property of the tests.
The location estimates are chosen to minimize the selected criterion function; they
are also obtained if one applies inner centering with the corresponding score. Inner
standardization may be used to construct affine equivariant versions of the estimates
and as a side product one gets scatter matrix estimates for the inference on the
covariance structure of the data.
The tests and estimates for the multivariate location problem based on multivari-
ate signs and ranks have been widely discussed in the literature. See, for example,
Möttönen and Oja (1995), Choi and Marden (1997), Marden (1999a), and Oja and
Randles (2004). The scatter matrix estimates by Tyler (1987) and Dümbgen (1998)
are often used in inner standardizations. The location tests and estimates are robust
and they have good efficiency properties even in the multivariate normal model.
Möttönen et al. (1997) calculated the asymptotic efficiencies e_1(p, ν) and e_2(p, ν)
of the multivariate spatial sign and rank methods, respectively, in the p-variate t_{ν,p}
distribution case (t_{∞,p} is the p-variate normal distribution); the 3-variate case, for
example, is worked out there.
The procedures based on spatial signs and ranks, however, yield only one
possible approach to multivariate nonparametric tests (sign test, rank test) and cor-
responding estimates (median, Hodges-Lehmann estimate). Randles (1989), for
example, developed an affine invariant sign test based on interdirections. Interdi-
rections measure the angular distance between two observation vectors relative to
the rest of the data. Randles (1989) was followed by a series of papers introduc-
ing nonparametric sign and rank interdirection tests for different location problems.
These tests are typically asymptotically equivalent with affine invariant versions of
the spatial sign and rank tests. The tests and estimates based on interdirections are,
unfortunately, computationally heavy.
The multivariate inference methods based on marginal signs and ranks are de-
scribed in detail in the monograph by Puri and Sen (1971). This first extension of
the univariate sign and rank methods to the multivariate setting is based on the cri-
terion functions
Affine equivariant multivariate signs and ranks are obtained if one uses the L1
criterion functions
where the average is over all the p-tuples and (p + 1)-tuples of observations, respec-
tively, and
V(y_1, ..., y_{p+1}) = (1/p!) abs det [ 1 ··· 1 ; y_1 ··· y_{p+1} ]
is the volume of the p-variate simplex with vertices y_1, ..., y_{p+1}. The one-sample
location estimate is known as the Oja median (Oja, 1983). The approach in the
one-sample and two-sample location cases is described in Oja (1999). In a paral-
lel approach Koshevoy and Mosler (1997a,b, 1998) used so-called zonotopes and
lift-zonotopes to illustrate and characterize a multivariate data cloud. The duality
relationship between these two approaches is analyzed in Koshevoy et al. (2004).
In all the approaches listed above the power of the rank tests may often be in-
creased, if some further transformations are applied to the ranks. In a series of pa-
pers, Hallin and Paindaveine constructed optimal signed-rank tests for the location
and scatter problems in the elliptical model; see the series of papers starting with
Hallin and Paindaveine (2002). In their approach, the location tests were based on
the spatial signs and optimally transformed ranks of the Euclidean lengths of the
standardized observations. Like the model of elliptically symmetric distributions, the
so-called independent component (IC) model is an extension of the multivariate
normal model. In the test construction in this model, one first transforms the obser-
vations to the estimated independent coordinates, then calculates the values of the
marginal sign and rank test statistics, and finally combines the asymptotically inde-
pendent tests in the regular way. See Nordhausen et al. (2009) and Oja et al. (2009)
for optimal rank tests in the IC model.
In this book the approach is thus based on the spatial signs and ranks. A uni-
fied theory starting with the simple one-sample location problem and proceeding
through the several-sample location problems to the general multivariate linear re-
gression model and finally to the analysis of cluster-dependent data is presented.
The theory is often presented using a general score function and, for comparison
to classical methods, the classical normal-based approach (L2 criterion and identity
score function) is carefully reviewed in each case. Also statistical inference on scat-
ter or shape matrices based on the spatial signs and ranks is discussed. The theory is
illustrated with several examples. For the computation of the statistical procedures
described in the book, R packages MNM and SpatialNP are available on CRAN.
The readers who are not so familiar with R are advised to learn more from Venables
et al. (2009), Dalgaard (2008), or Everitt (2005).
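Both packages can be obtained from CRAN in the usual way; this short snippet is simply the standard installation route.

install.packages(c("MNM", "SpatialNP"))
library(MNM)
library(SpatialNP)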
Chapter 2
Multivariate location and scatter models
Abstract In this chapter we first introduce and describe different symmetrical and
asymmetrical parametric and semiparametric (linear) models which are then later
used as the model assumptions in the statistical analysis. The models discussed in-
clude the multivariate normal distribution N_p(μ, Σ) and its different extensions, in-
cluding the multivariate t distribution t_{ν,p}(μ, Σ) and the multivariate elliptical distri-
bution E_p(μ, Σ, ρ), as well as still wider semiparametric symmetrical models. Also
some models with skew distributions (generalized elliptical model, mixture models,
skew-elliptical model, independent component model) are briefly discussed.
where M (n, p) is the set of n × p matrices. In the one-sample case, the p-variate
observations yi , i = 1, ..., n, may be thought to be independent and to be generated
by
yi = μ + Ω ε i , i = 1, ..., n,
where the p-vectors ε_i are called standardized and centered residuals, μ is a lo-
cation p-vector, Ω is a full-rank p × p transformation matrix, and Σ = ΩΩ' > 0
is called a scatter matrix. We are more explicit later regarding what is meant by a
standardized and centered random vector. (It is usual to say that ε_i is standardized
if COV(ε_i) = I_p, and it is centered if E(ε_i) = 0.) The notation Σ > 0 means that Σ is
positive definite (with rank p).
In the multivariate normal model (A0), the location parameter μ is the well-defined sym-
metry center of the distribution of yi . Extra assumptions are needed to make Σ well
defined. In the models (A0)–(A2) the scatter matrix Σ has a natural interpretation;
Σ is proportional to the covariance matrix if it exists.
Under the assumptions (B1)–(B4), parameters μ and Σ still describe the location
and the scatter of the distribution of the ε i . Under assumption (B2), for example, the
directions u_i = |ε_i|^{-1} ε_i of the transformed (standardized and centered) observations
ε_i = Ω^{-1}(y_i − μ) are "uniformly" distributed in the sense that E(u_i) = 0 and p ·
E(u_i u_i') = I_p. (Then μ and Σ are the so-called Hettmansperger-Randles functionals
which are discussed later in the book.) Even in the weakest model (B4), μ is a
natural location parameter in the sense that E(ui ) = 0. (We later show that the spatial
median of ε i is then the zero vector.) Also, all hyperplanes going through μ divide
the probability mass of the distribution of yi into two parts of equal size 1/2. (Then
μ is the so-called half-space median or Tukey median.) Note, however, that in these
models the scatter matrix Σ is no longer related to the regular covariance matrix.
Theorem 2.1. The assumptions (A0)–(A4) and (B1)–(B4) satisfy the joint hierarchy
yi = μ + Ω ε i , i = 1, ..., n.
We assume that ε 1 , ..., ε n are independent and identically distributed random vectors
from a spherically symmetrical and continuous distribution. We say that the distri-
bution of ε is spherically symmetrical around the origin if the density function f (ε )
of ε depends on ε only through the modulus |ε|. We can then write
f(ε) = exp{−ρ(|ε|)}
for some function ρ(r). Note that the equal density contours are then spheres. The
modulus ri = |ε i | and direction ui = |ε i |−1 ε i are independent, and the direction
vector ui is uniformly distributed on the p-dimensional unit sphere S p . It is then
easy to see that
E(u_i) = 0 and COV(u_i) = E(u_i u_i') = (1/p) I_p.
The density of the modulus r_i = |ε_i| is then
f_r(r) = c_p r^{p-1} exp{−ρ(r)},
where
c_p = 2π^{p/2} / Γ(p/2)
is the surface area of the unit sphere S^p. The scatter matrix Σ = ΩΩ' is, however,
confounded with ρ. To fix the scatter matrix Σ one can then, for example, assume
that ρ is chosen so that E(r_i^2) = p or Med(r_i^2) = χ²_{p,0.5} (the median of the chi-square
distribution with p degrees of freedom). Then Σ is the regular covariance matrix in
the multivariate normal case.
Under these assumptions, the random sample Y = (y1 , ..., yn ) comes from a p-
variate elliptical distribution with probability density function
f_y(y) = |Σ|^{-1/2} f(Σ^{-1/2}(y − μ)),
where μ is the symmetry center and Σ > 0 is the scatter matrix (parameter). The
matrix Σ −1/2 is chosen here to be symmetric. The location parameter μ is the mean
vector (if it exists) and the scatter matrix Σ is proportional to the regular covariance
matrix (if it exists). We also write
yi ∼ E p (μ , Σ , ρ ).
Note that the transformation matrix Ω is not uniquely defined in the elliptical
model as, for any orthogonal matrix O, Ωε_i = (ΩO)(O'ε_i) = Ω*ε*_i and also ε*_i
has a spherically symmetric distribution with density f. In principal component
analysis the eigenvectors and eigenvalues of Σ are of interest; the orthogonal matrix
of eigenvectors O = O(Σ) and the diagonal matrix of eigenvalues D = D(Σ) (in a
decreasing order of magnitude) are obtained from the eigenvector and eigenvalue
decomposition
Σ = ODO'.
Write diag(Σ) for a diagonal matrix having the same diagonal elements as Σ. Often,
the scatter matrix is normalized as
[diag(Σ)]^{-1/2} Σ [diag(Σ)]^{-1/2}.
This is the correlation matrix (if it exists). Another way to normalize the scatter
matrix is to divide it by a scale parameter.
Definition 2.2. Let Σ be a scatter matrix. Then a scale parameter σ 2 = σ 2 (Σ ) is the
scalar-valued function that satisfies
σ²(I_p) = 1 and σ²(c · Σ) = c · σ²(Σ).
See Paindaveine (2008) for a discussion of the choices of the scale and shape pa-
rameters.
The distribution of the direction vector is easily obtained in the multivariate nor-
mal case as the components of ε i are independent and N(0, 1) distributed. Thus
ε_{ij}² ∼ χ²_1 and ε_{ij}²/2 ∼ Γ(1/2). But then
(u_{i1}², ..., u_{ip}²)' = (1/(ε_i'ε_i)) (ε_{i1}², ..., ε_{ip}²)'
has a so-called Dirichlet distribution D_p(1/2, ..., 1/2). See Section 3.3 in Bilodeau
and Brenner (1999). As the distribution of ui is the same for all spherical distribu-
tions, we have the following.
Theorem 2.2. Let the distribution of a p-variate random vector ε be spherically
symmetric around the origin, and let u = |ε |−1 ε be the direction vector. Then
(u_1², ..., u_p²) has the Dirichlet distribution D_p(1/2, ..., 1/2). Moreover, Σ_{i=1}^k u_i² ∼
Beta(k/2, (p − k)/2).
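A quick Monte Carlo check of the Beta marginal in Theorem 2.2, using a normal sample as the spherical distribution (our illustration):

set.seed(1)
p <- 5; k <- 2; n <- 10000
eps <- matrix(rnorm(n * p), n, p)        # spherical N_p(0, I_p) sample
u2  <- eps^2 / rowSums(eps^2)            # squared direction components
s   <- rowSums(u2[, 1:k])                # sum of the first k of them
ks.test(s, "pbeta", k / 2, (p - k) / 2)  # should not reject Beta(k/2, (p-k)/2)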
The components of ε i are uncorrelated, but not independent, and their marginal
distribution is a univariate tν distribution. The smaller ν is, the heavier are the tails
of the distribution. The expected value exists if ν ≥ 2, and the covariance matrix
exists for degrees of freedom ν ≥ 3. The very heavy-tailed distribution with ν = 1
is called the multivariate Cauchy distribution. The multivariate normal distribution
is obtained as a limit case as ν → ∞. The distribution of yi = μ + Ω ε i is denoted by
t_{ν,p}(μ, Σ). If ε* ∼ N_p(0, I_p) and s² ∼ χ²_ν, and ε* and s² are independent, then
ε = (s²/ν)^{-1/2} ε* ∼ t_{ν,p}.
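This stochastic representation gives a direct way to simulate from t_{ν,p}(μ, Σ). A base-R sketch (the function name is ours), using the Cholesky factor of Σ as the transformation matrix:

rmvt.sketch <- function(n, mu, Sigma, nu) {
  p <- length(mu)
  R <- chol(Sigma)                        # R'R = Sigma
  eps.star <- matrix(rnorm(n * p), n, p)  # rows from N_p(0, I_p)
  s2 <- rchisq(n, df = nu)
  eps <- eps.star / sqrt(s2 / nu)         # rows from t_{nu,p}(0, I_p)
  sweep(eps %*% R, 2, mu, "+")            # y_i = mu + Omega eps_i, Omega = t(R)
}
Y <- rmvt.sketch(100, mu = c(0, 0), Sigma = diag(2), nu = 3)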
where
k_{p,ν} = p Γ(p/2) / [π^{p/2} Γ((2ν + p)/(2ν)) 2^{(2ν+p)/(2ν)}]
is determined so that the density integrates to 1. Now |ε_i|^{2ν} ∼ Γ(1/2, p/(2ν)),
which can be used to simulate observations from this flexible parametric model.
If ν = 1 then ε_i ∼ N_p(0, I_p). The model includes both heavy-tailed (ν < 1) and light-
tailed (ν > 1) elliptical distributions. The (heavy-tailed) multivariate double expo-
nential (Laplace) distribution is given by ν = 1/2, and a (light-tailed) multivariate
uniform elliptical distribution is obtained as a limit when ν → ∞. The distribution
of y_i = Ωε_i + μ is denoted by PE(μ, Σ, ν). See Gómez et al. (1998).
Example 2.4. Location-scatter model. (A2) is a wider model than the elliptical
model assuming only that the components of ε i are exchangeable and marginally
symmetric; that is,
where ||·|| is any permutation and sign-change invariant metric (||JPε|| = ||ε|| for
all sign-change matrices J and permutation matrices P). This is true for the L_α norm,
for example, and therefore a wide variety of distributional shapes is available in this
model. Recall that the elliptical model is given with the L2 -norm.
Note that if we still weaken the assumptions and assume only that
Example 2.5. Generalized elliptical model (B1) and its extension (B2). The defi-
nition of the generalized elliptical model was given by Frahm (2004). In this model
he assumes that
Oui ∼ ui , for all orthogonal O.
The direction vectors ui = |ε i |−1 ε i are then distributed as if the original observa-
tions ε i were coming from an elliptically symmetric distribution. Randles (2000)
considered the same assumption saying that the distribution has “elliptical direc-
tions". However, no assumptions on the distribution of the modulus r_i = |ε_i| are made;
skew distributions may be obtained if the distribution of ri depends on ui . Again, a
weaker model is obtained if
Still in this wider model, the spatial median of ε i is 0 and the so-called Tyler’s scat-
ter matrix (discussed later) is proportional to the identity matrix. This fixes location
and shape of the distribution of yi , that is, parameters μ and Σ (up to scale).
π_1 μ_1 + π_2 μ_2 = 0 and π_1(Σ_1 + μ_1 μ_1') + π_2(Σ_2 + μ_2 μ_2') = I_p.
This mixture of two distributions can be easily extended to the case of general k ≥ 2
mixtures, and also to the case where the observations come from other elliptical
distributions. See McLachlan and Peel (2000).
where ε*_i is coming from E_p(0, I_p, ρ) and s_i is a random variable with possible val-
ues ±1, possibly depending on ε*_i. Note that if μ = 0 is known then Σ is proportional
to the regular covariance matrix with respect to the origin; that is, E(y_i y_i') ∝ Σ. See
Azzalini (2005) and references therein.
Chapter 3
Location and scatter functionals and sample
statistics
We first define what we mean by a location vector and a scatter matrix defined as
vector- and matrix-valued functionals in wide nonparametric families of multivari-
ate distributions (often including discrete distributions as well). Let y be a p-variate
random variable with cumulative distribution function (cdf) Fy .
Definition 3.1.
(i) A p-vector M(F) is a location vector (functional) if it is affine equivariant; that
is,
M(FAy+b ) = AM(Fy ) + b
for all random vectors y, all full-rank p × p-matrices A and all p-vectors b.
(ii) A symmetric p × p matrix S(F) ≥ 0 is a scatter matrix (functional) if it is affine
equivariant in the sense that
S(F_{Ay+b}) = A S(F_y) A'
for all random vectors y, all full-rank p × p matrices A, and all p-vectors b.
Classical location and scatter functionals, namely the mean vector E(y) and the
covariance matrix
COV(y) = E[(y − E(y))(y − E(y))'],
serve as first examples. Note that the matrix of second moments E(yy'), for example,
is not a scatter matrix in the regular sense but can be seen as a scatter matrix with
respect to the origin.
The theory of location and scatter functionals has been developed mainly to find
new tools for robust estimation of the regular mean vector and covariance matrix
in a neighborhood of the multivariate normal model or in the wider model of el-
liptically symmetric distributions. The competitors of the regular covariance matrix
do not usually have the so-called independence property: if a random vector y has
independent components then S(Fy ) is a diagonal matrix. It is easy to see that the
regular covariance matrix has the independence property. Naturally this property is
not important in the elliptic model as the multivariate normal distribution is the only
elliptical distribution that can have independent margins. On the other hand, the
independence property is crucial if one is working in the independent component
model mentioned in Chapter 2 (the ICA problem).
Using the affine equivariance properties, one easily gets the following.
The determinant det(S) or trace tr(S) is often used as a global measure of multi-
variate scatter. In fact, [det(S)]^{1/p} is the geometric mean and tr(S)/p the arithmetic
mean of the eigenvalues of S. The functional det(COV(y)) is sometimes called the
generalized variance.
There are several alternative competing techniques to construct location and scat-
ter functionals, for example, M-functionals, S-functionals, τ -functionals, projection-
based functionals, CM- and MM-functionals, and so on. These functionals and
related estimates are discussed in numerous research and review papers;
see, for example, Maronna (1976), Davies (1987), Lopuhaä (1989), and Tyler
(2002). See also the recent monograph by Maronna et al. (2006). A common fea-
ture in these approaches is that the functionals and related estimates are built for
inference in elliptical models only. Next we consider M-functionals in more detail.
Definition 3.2. Location and scatter M-functionals are functionals M = M(Fy ) and
S = S(F_y) which simultaneously satisfy the two implicit equations
M = [E[w_1(r)]]^{-1} E[w_1(r) y]
and
S = [E[w_3(r)]]^{-1} E[w_2(r)(y − M)(y − M)']
for some suitably chosen weight functions w_1(r), w_2(r), and w_3(r). The random
variable r is the Mahalanobis distance between y and M; that is,
r = |y − M|_S = [(y − M)' S^{-1} (y − M)]^{1/2}.
Consider an elliptic model with known ρ(r) and its derivative function ψ(r) =
ρ'(r). If one then chooses w_1(r) = w_2(r) = ψ(r)/r and w_3(r) ≡ 1, the M-functionals
are called the pseudo maximum likelihood (ML) functionals corresponding to that
specific distribution determined by ρ . In the multivariate normal case w1 (r) ≡
w2 (r) ≡ 1, and the corresponding functionals are the mean vector and the covari-
ance matrix again. A classical M-estimate is Huber’s M-functional with choices
(and w3 (r) ≡ 1) with some positive tuning constants c and d. The value of the func-
tional does not depend strongly on the tails of the distribution; the tuning constant c
controls this property. The constant d is just a scaling factor.
If M1 (F) and S1 (F) are any affine equivariant location and scatter functionals
then so are the one-step M-functionals, starting from M_1 and S_1, and given by
M_2 = [E[w_1(r)]]^{-1} E[w_1(r) y]
and
S_2 = E[w_2(r)(y − M_1)(y − M_1)'],
where now r = |y − M1 |S1 . It is easy to see that M2 and S2 are affine equivariant
location and scatter functionals as well. Repeating this step until it converges yields
the regular M-estimate (with w3 (r) ≡ 1).
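As a concrete illustration, the following base-R sketch iterates the one-step update until convergence. The Huber-type weights w_1(r) = min(1, c/r) and w_2(r) = min(1, c²/r²)/d with w_3(r) ≡ 1 are our assumed example choices; the book's exact constants are not reproduced here.

m.estimate <- function(Y, c = 2, d = 1, tol = 1e-8, maxit = 100) {
  M <- colMeans(Y); S <- cov(Y)                 # start from mean and covariance
  for (it in 1:maxit) {
    Z <- sweep(Y, 2, M)                         # centered observations y_i - M
    r <- sqrt(rowSums((Z %*% solve(S)) * Z))    # Mahalanobis distances |y_i - M|_S
    w1 <- pmin(1, c / r)                        # assumed Huber-type weights
    w2 <- pmin(1, c^2 / r^2) / d
    M.new <- colSums(w1 * Y) / sum(w1)          # M = [AVE w1(r)]^{-1} AVE{w1(r) y}
    S.new <- crossprod(sqrt(w2) * Z) / nrow(Y)  # S = AVE{w2(r)(y - M)(y - M)'}
    if (max(abs(M.new - M)) + max(abs(S.new - S)) < tol) break
    M <- M.new; S <- S.new
  }
  list(location = M, scatter = S)
}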
In this section we introduce sample statistics to estimate unknown location and scat-
ter parameters μ and Σ in different models (or the unknown theoretical or population
values of location and scatter functionals M(F) and S(F)). The values of the sam-
ple statistics are similarly denoted by M(Y) and S(Y), where Y = (y_1, ..., y_n)' ∈
M (n, p) (sample space).
If Y = (y1 , ..., yn ) is a random sample, it is then often natural that location and
scatter statistics are invariant in the following sense.
Definition 3.4.
(i) A location statistic M is permutation invariant if M(Y) = M(PY) for all Y and
all n × n permutation matrices P.
(ii) A scatter statistic S is permutation invariant if S(Y) = S(PY) for all Y and all
n × n permutation matrices P.
Note that if M(Y) and S(Y) are not permutation invariant, invariant estimates
(with the same bias but with smaller variation; use the Rao-Blackwell theorem) can
be easily obtained as
In the one-sample location test constructions we often use scatter statistics S with
respect to the origin that are permutation and sign-change invariant, that is,
S(JPYA') = A S(Y) A'
for all Y with S(Y) > 0, all p × p matrices A, all n × n permutation matrices P, and
all sign-change matrices J.
Location and scatter functionals M(F) and S(F) yield corresponding sample
statistics simply by applying the definition to a discrete random variable with the
empirical distribution F_n. The location and scatter M-statistics, for example, satisfy
M = [AVE{w_1(r_i)}]^{-1} AVE{w_1(r_i) y_i}
and
S = [AVE{w_3(r_i)}]^{-1} AVE{w_2(r_i)(y_i − M)(y_i − M)'}
for weight functions w_1(r), w_2(r), and w_3(r). The scalar r_i is the Mahalanobis dis-
tance between y_i and M; that is, r_i = |y_i − M|_S.
Why do we need different location and scatter functionals and statistics? In symmet-
rical models (A0)–(A4) all location statistics M(Y) estimate the symmetry center μ .
In models (A0)–(A2) all scatter statistics S(Y) estimate the same population quan-
tity (up to a scaling factor). The statistical properties (convergence, limiting distribu-
tion, efficiency, robustness, computational convenience, etc.) of the estimates may
considerably differ, however. One can then simply pick the estimate that is best for
one's purposes.
To consider the possible bias and the accuracy of location and scatter statistics we
next find a general structure for their first and second moments for random samples
coming from an elliptical distribution (A1) as well as from a location-scatter model
(A2). For most results in this section, we refer to Tyler (1982).
In the following, it is notationally easier to work with the vectors rather than with
the matrices. The “vec” operation is used to vectorize a matrix. If S > 0 is a scatter
matrix then vec(S) is the vector obtained by stacking the columns of S on top of each
other:
vec(S) = vec((s_1, ..., s_p)) = (s_1', ..., s_p')'.
Moreover, for treating the p² × p² covariance matrices of the vectorized p × p scatter
matrices, the following matrices prove very useful for bookkeeping. Let e_i be a p-
vector with the ith element one and the others zero, i = 1, ..., p. Then Σ_{i=1}^p e_i e_i' = I_p,
and we write
D_{p,p} = Σ_{i=1}^p (e_i e_i') ⊗ (e_i e_i'),
J_{p,p} = Σ_{i=1}^p Σ_{j=1}^p (e_i e_j') ⊗ (e_i e_j'),
K_{p,p} = Σ_{i=1}^p Σ_{j=1}^p (e_i e_j') ⊗ (e_j e_i'), and
I_{p,p} = Σ_{i=1}^p Σ_{j=1}^p (e_i e_i') ⊗ (e_j e_j').
Naturally I_{p,p} = I_{p²}.
P_3 vec(A) = (tr(A)/p) vec(I_p).
Consider the location-scatter model (A2) with the assumption that the standard-
ized variable ε i is marginally symmetrical and exchangeable. Then the following
lemma yields a general structure for the first and second moments of location and
scatter statistics, M = M(ε) and S = S(ε), calculated for ε = (ε_1, ..., ε_n)'. These
statistics then clearly satisfy
Theorem 3.2. Assume that the p-variate random vector M satisfies PJM ∼ M for
all p × p permutation matrices P and all sign-change matrices J. Then there is a
positive constant σ 2 such that
Second, assume that the random symmetric p × p matrix S > 0 satisfies PJSJP' ∼ S
for all p × p permutation matrices P and all sign-change matrices J. Then there are
positive constants η , τ1 , and τ2 and a constant τ3 such that
where
Here Cov(S11 , S22 ) means the covariance between any two different diagonal ele-
ments of S.
Proof. As E(M) = PJ E(M) and COV(M) = PJ COV(M) J'P', one easily sees that
E(M) = 0 and COV(M) = σ² I_p for some σ² > 0. Similarly, it is straightforward to
see that E(S) = η I p for some η > 0. A general formula for the covariance matrix
of the vectorized S is
COV(vec(S)) = Σ_{i=1}^p Σ_{j=1}^p Σ_{r=1}^p Σ_{s=1}^p Cov(S_{ij}, S_{rs}) (e_i e_r') ⊗ (e_j e_s').
By symmetry arguments, all the variances of the diagonal elements of S must be the
same, say Var(S11 ), and all the variances of the off-diagonal elements must be the
same as well, say Var(S12 ), and all the correlations between off-diagonal elements
and other elements of S must be zero. Finally, all the covariances between diagonal
elements must also be the same, say Cov(S11 , S22 ). The details in the proof are left
to the reader.
Under the stronger assumption that OSO' ∼ S for all orthogonal p × p matrices
O, corresponding to the elliptic model (A1), one can further show that
and one gets a still simpler structure for the covariance matrix of the scatter statistic
S.
Corollary 3.1. Assume that the random symmetric p × p matrix S > 0 satisfies
OSO' ∼ S for all p × p orthogonal matrices O. Then there are positive constants η
and τ_1 and a constant τ_3 such that
Corollary 3.2. Assume that the random symmetric p × p matrix S > 0 satisfies
tr(S) = p and OSO' ∼ S for all p × p orthogonal matrices O. Then there is a posi-
tive constant τ_1 such that
In the general elliptical case we then obtain the following using Theorem 3.2
and Corollary 3.1 and the affine equivariance property of the location and scatter
statistics.
Theorem 3.3 thus implies that, in the elliptical model, all location statistics are
unbiased estimators of the symmetry center μ . The constant σ 2 may then be used
in efficiency comparisons. Remember, however, that σ 2 for the choice M depends
on the distribution of ri = |zi | and on the sample size n. The scatter statistic S is
unbiased for ηΣ , η again depending on S, the sample size n, and the distribution of
ri . Then the correction factors may be used to guarantee the unbiasedness (or at least
consistency) in the case of multivariate normality, for example. The constants τ1 and
τ3 determine the variance-covariance structure of a scatter estimate S for sample
size n and distribution F_{y_i}. In many multivariate procedures based on the covariance
matrix, it is sufficient to know the covariance matrix only up to a constant, that is,
the shape matrix. According to Corollary 3.2, the constant τ1 is sufficient for shape
matrix efficiency comparisons. The problem, of course, is how to estimate these
constants which are unknown in practice.
To define the breakdown point, we still need a distance measure δ (T, T∗ ) be-
tween the observed value T = T(Y) and the corrupted value T∗ = T(Y∗ ) of the
statistic. The maximum distance over all m-replacements is then
m = 0, ..., n.
The sample mean vector and sample covariance matrix have the smallest possi-
ble breakdown point 1/n. Maronna (1976) and Huber (1981) showed that the M-
statistics have relatively low breakdown points, always below 1/(p + 1). For high
breakdown points, some alternative estimation techniques (e.g., S-estimates) should
be used.
The influence function (Hampel, 1968, 1974) is often used to measure local robust-
ness of the functional or the statistic.
Definition 3.7. The influence function (IF) of a functional T(F) at F is
IF(y; T, F) = lim_{ε→0} [T((1 − ε)F + ε Δ_y) − T(F)] / ε,
where Δ_y is the cdf of the distribution with all its mass at y. For location and scatter
functionals M and S at an elliptical distribution F, the influence functions take the forms
IF(y; M, F) = γ(r) Σ^{1/2} u
and
IF(y; S, F) = α(r) Σ^{1/2} u u' Σ^{1/2} − β(r) Σ,
where r = |z| and u = |z|^{-1} z with z = Σ^{-1/2}(y − μ), and γ, α, and β are real-valued
functions determined by M and S and the spherical distribution F_{0,I}.
If T(Y) = T(Fn ) is the sample version of the functional T(F ) and the functional
is sufficiently regular, then often (and this should of course be proven separately for
each statistic)
√n (T(Y) − T(F)) = √n AVE{IF(y_i; T, F)} + o_P(1).
See Huber (1981). Then, under general conditions, using the central limit theorem
and the influence functions given in Theorem 3.5,
√n (M(Y) − μ) →_D N_p(0, σ² Σ)
and the limiting distribution of √n (vec(S(Y)) − vec(Σ)) is
N_{p²}(0, τ_1 (I_{p,p} + K_{p,p})(Σ ⊗ Σ) + τ_3 vec(Σ)(vec(Σ))'),
where now
σ² = E[γ²(r_i)] / p,
τ_1 = E[α²(r_i)] / (p(p + 2)), and
τ_3 = E[α²(r_i)] / (p(p + 2)) − 2 E[α(r_i)β(r_i)] / p + E[β²(r_i)],
with r_i = |z_i| and z_i = Σ^{-1/2}(y_i − μ). Note that the constants σ², τ_1, and τ_3 are related
to, but not the same as, the finite-sample constants discussed in Section 3.3.
γ(r) ∝ w_1(r) r,
α(r) ∝ w_2(r) r², and
β(r) + constant ∝ w_2(r) r².
Thus in M-estimation the weight functions determine the local robustness and effi-
ciency properties of M and S. Recall that Huber's M-statistic, for example, is
obtained with choices
The scatter matrices S(Y) are often used to transform the dataset. If one writes
(spectral or eigenvalue decomposition)
S(Y) = O(Y) D(Y) O(Y)',
where O(Y) is an orthogonal matrix and D(Y) is a diagonal matrix with positive
diagonal elements in a decreasing order, then the components of the transformed
data matrix
Z = YO(Y),
are the so called principal components, used in principal component analysis (PCA).
The columns of O are also called eigenvectors of S, and the diagonal elements of D
list the corresponding eigenvalues. The principal components are uncorrelated and
ordered according to their dispersion in the sense that S(Z) = D. Principal com-
ponents are often used to reduce the dimension of the data. The idea is then to take
just a few first principal components combining most of the variation; the remaining
components are thought to represent the noise.
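In R the transformation can be sketched with the regular covariance matrix as the scatter statistic; any scatter estimate could be plugged in instead.

Y <- matrix(rnorm(200), ncol = 2)       # example data
eig <- eigen(cov(Y), symmetric = TRUE)  # S(Y) = O D O'
O <- eig$vectors                        # eigenvectors, eigenvalues in decreasing order
Z <- Y %*% O                            # principal components; cov(Z) equals D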
Scatter matrices are also often used to standardize the data. The transformed
standardized dataset
Z = YO(Y)[D(Y)]−1/2
or
Z = Y[S(Y)]−1/2
then has standardized components (in the sense that S(Z) = I p ), and the observations
zi tend to be spherically distributed in the elliptic case. The symmetric version of
the square root matrix
S^{-1/2} = O D^{-1/2} O'
is usually chosen. (A square root of a diagonal matrix with positive elements is a
diagonal matrix of square roots of the elements.) Unfortunately, even in that case,
the transformed dataset Z is not coordinate-free but the following is true.
Theorem 3.6. For all Y, A, and S, there is an orthogonal matrix O such that
In the independent component analysis (ICA), most ICA algorithms first standardize
the data using the regular covariance matrix S = S(Y) and then rotate the standard-
ized data in such a way that the components of Z = YS−1/2 O are “as independent as
possible”. In this procedure the regular covariance matrix may be replaced by any
scatter matrix that has the independence property.
A location statistic M = M(Y) and a scatter matrix S = S(Y) may be used to-
gether to center and standardize the dataset. Then the transformed dataset is given
by
Z = (Y − 1_n M') S^{-1/2}.
This is often called the whitening of the data. Then M(Z) = 0 and S(Z) = I p . Again,
if you rotate the dataset using an orthogonal matrix O, it is still true that M(ZO) = 0
and S(ZO) = I p . This means that the whitening procedure is not uniquely defined.
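A small base-R sketch of the whitening step with the sample mean and covariance, using the symmetric square root described above (the function name is ours):

whiten <- function(Y) {
  M <- colMeans(Y)                        # location statistic
  eig <- eigen(cov(Y), symmetric = TRUE)  # scatter statistic S(Y) = O D O'
  S.invsqrt <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)
  sweep(Y, 2, M) %*% S.invsqrt            # Z = (Y - 1_n M') S^{-1/2}
}
Z <- whiten(matrix(rnorm(200), ncol = 2))
colMeans(Z); cov(Z)                       # zero vector and identity matrix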
Recently, Tyler et al. (2009) developed an approach called the invariant coordinate
selection (ICS) which is based on the simultaneous use of two scatter matrices, S_1
and S_2. In this procedure the data are first standardized using S_1^{-1/2}, and then the
standardized data are rotated using the principal component transformation O, but
now based on S_2 and the transformed data Y S_1^{-1/2}. Then
S_1(Y S_1^{-1/2} O) = I_p and S_2(Y S_1^{-1/2} O) = D,
where now D is the diagonal matrix of eigenvalues of S_1^{-1/2} S_2 S_1^{-1/2}. If both S_1 and S_2
have the independence property, the procedure finds, under general assumptions, the
independent components in the ICA problem.
Two location vectors and two scatter matrices may be used simultaneously to
describe the skewness and kurtosis properties of a multivariate distribution. Affine
invariant multivariate skewness statistics may be defined as squared Mahalanobis
distances between two location statistics
Chapter 4
Multivariate signs and ranks
Abstract In this chapter the concepts of multivariate spatial signs and ranks and
signed-ranks are introduced. The centering and standardization of the scores are
discussed. Different properties of the sign and rank scores are obtained. The sign
and rank covariance matrices UCOV, TCOV, QCOV, and RCOV are introduced
and discussed.
yi = μ + Ω ε i , i = 1, ..., n,
where the ε i are independent, centered, and standardized residuals with cumulative
distribution function F. As discussed before, different assumptions on the distribu-
tion of ε i yield different parametric and semiparametric models. In some applica-
tions,
Y = Xβ + εΩ'
so that the symmetry center depends on the design matrix X. Before introducing dif-
ferent nonparametric scores used in our approach, we present alternative strategies
on how the scores should be centered and standardized before their use in the test
construction and in the estimation.
A general idea to construct tests and estimates for location parameter μ and scat-
ter matrix Σ = Ω Ω is to use a p-vector-valued score function T(y) yielding in-
dividual scores Ti = T(yi ), i = 1, ..., n. Throughout this book we use the identity
score function T(y) = y, the spatial sign score function U(y), the spatial rank score
function R(y), and the spatial signed-rank score function Q(y).
The general likelihood inference theory suggests that a good choice for T in the
location problem is the optimal location score function
L(y) = ∇_μ log f(y − μ) |_{μ=0},
that is, the gradient vector of log f(y − μ) with respect to μ at the origin. In the
N p (0, I p ) case the optimal score function is the identity score function,
T(y) = y.
The optimal location score function for the p-variate t-distribution with ν degrees
of freedom, t_{ν,p}(0, I_p), for example, is
T(y) = [(ν + p)/(ν + |y|²)] y,
and, more generally, for a spherical density of the form exp{−ρ(|y|)} it is
T(y) = [ψ(|y|)/|y|] y,
where ψ(r) = ρ'(r), that is, the derivative function of ρ. An example of a robust
choice of the score function is Huber's score function
T(y) = min(c/|y|, 1) y
with some choice of c > 0. The validity, efficiency, and robustness properties of the
testing and estimation procedures then naturally depend on the choice of the score
function and of course on the true model.
For different testing or estimation purposes, one then often wishes the scores to be
centered and standardized in some natural way.
Outer centering of the scores: T_i → T̂_i = T_i − T̄.
Outer standardization of the scores: T_i → T̂_i = COV(T)^{-1/2} T_i.
1. Inner centering of the scores: Find shift vector M such that, if T̂i = T(yi − M),
then
AVE{T̂i } = 0.
Then transform
Ti → T̂i = T(yi − M).
2. Inner standardization of the scores: Find a transformation matrix S^{-1/2} such that,
if T̂_i = T(S^{-1/2} y_i), then
p · AVE{T̂_i T̂_i'} = AVE{|T̂_i|²} I_p.
Then transform
T_i → T̂_i = T(S^{-1/2} y_i).
3. Inner centering and standardization of the scores: Find a shift vector M and trans-
formation matrix S^{-1/2} such that, if T̂_i = T(S^{-1/2}(y_i − M)), then
AVE{T̂_i} = 0 and p · AVE{T̂_i T̂_i'} = AVE{|T̂_i|²} I_p.
Then transform
T_i → T̂_i = T(S^{-1/2}(y_i − M)).
Note that, in the inner approach, M = M(Y) is a location statistic and S = S(Y)
is the scatter statistic corresponding to the score function T(y). The matrix S−1/2 is
assumed to be a symmetric matrix here. Note, however, that inner centering and/or
standardization may not always be possible.
The tests and estimates are then no longer optimal under the multivariate normality
assumption but are robust and more powerful under heavy-tailed distributions. For
the asymptotic behavior of the tests and estimates we need the following matrices
A = E[T(ε_i) L(ε_i)'] and B = E[T(ε_i) T(ε_i)'],
the covariance matrix between chosen score and the optimal score and the variance-
covariance matrix of the chosen score. These two matrices play an important role in
the following chapters.
We trace the ideas from the univariate concepts of sign, rank and signed-rank.
These concepts are linked with the possibility to order the data. The ordering is
done with the univariate sign function
U(y) = +1 if y > 0, U(y) = 0 if y = 0, and U(y) = −1 if y < 0.
Consider a univariate dataset Y = (y_1, ..., y_n)' and assume that there are no ties. Let
−Y = (−y_1, ..., −y_n)' be the dataset when the observations are reflected with respect
to the origin. The (empirical) centered rank function is
R_Y(y) = AVE_i{U(y − y_i)},
and the signed-rank function can be taken as Q_Y(y) = (1/2)[R_Y(y) + R_{−Y}(y)].
The numbers Ui = U(yi ), Ri = R(yi ), and Qi = Q(yi ), i = 1, ..., n, are the observed
signs, observed centered ranks, and observed signed-ranks. The possible values of
observed centered ranks Ri are
−(n − 1)/n, −(n − 3)/n, ..., (n − 3)/n, (n − 1)/n.
The possible values of observed signed-ranks Qi are
−(2n − 1)/(2n), −(2n − 3)/(2n), ..., (2n − 3)/(2n), (2n − 1)/(2n).
The centered ranks and the signed-ranks are located on the interval (-1,1) (univari-
ate unit ball). Note that the centered rank R(y) and signed-rank Q(y) provide both
magnitude (robust distances from the median and from the origin, respectively) and
direction (sign with respect to the median and sign with respect to the origin, re-
spectively).
There are n! possible values of the vector of ranks (R_1, ..., R_n)', given by
P (−(n − 1)/n, −(n − 3)/n, ..., (n − 3)/n, (n − 1)/n)',
where P goes through all n × n permutation matrices, and if the observations are
independent and identically distributed, then all these possible values have an equal
probability 1/n!. The vector of signed-ranks (Q_1, ..., Q_n)' has 2^n n! possible values
obtained from
JP (1/(2n), 3/(2n), ..., (2n − 3)/(2n), (2n − 1)/(2n))',
with all possible permutation matrices P and all possible sign-change matrices J,
and if the distribution is symmetric around zero then all the possible values have the
same probability 1/(2^n n!).
It is easy to find the connection between the regular rank (with values 1, 2, ..., n)
and the centered rank, namely,
centered rank = (2/n) · (regular rank − (n + 1)/2).
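A one-line check of this relation in R (our illustration):

y <- c(3.1, -0.4, 2.2, 5.9, 0.7)
n <- length(y)
(2 / n) * (rank(y) - (n + 1) / 2)   # centered ranks: -(n-1)/n, ..., (n-1)/n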
The above definitions of univariate signs and ranks are based on the ordering
of the data. However, in the multivariate case there is no natural ordering of the
data points. The approach utilizing objective or criterion functions is then needed
to extend the concepts to the multivariate case. The concepts of univariate sign and
rank and signed-rank may be implicitly defined using the L1 criterion functions
AVE{|y_i|} = AVE{U(y_i) · y_i},
(1/2) AVE{|y_i − y_j|} = AVE{R_Y(y_i) · y_i},
(1/2) AVE{|y_i + y_j|} = AVE{R_{−Y}(y_i) · y_i}, and
(1/4) AVE{|y_i − y_j| + |y_i + y_j|} = AVE{Q_Y(y_i) · y_i}.
See Hettmansperger and Aubuchon (1988).
Let us have a closer look at these objective functions if applied to the residuals in
the linear regression model. We first remind the reader that the classical L2 objective
function optimal in the normal case is similarly
AVE{|y_i|²} = AVE{y_i · y_i}
and corresponds to the identity score function. The first objective function, the
mean deviation of the residuals, is the basis for the so-called least absolute devi-
ation (LAD) methods; it yields different median-type estimates and sign tests in the
one-sample, two-sample, several-sample and finally general linear model settings.
The second objective function is the mean difference of the residuals. The second,
third, and fourth objective functions generate Hodges-Lehmann-type estimates and
rank tests for different location problems. Note also that the sign, centered rank, and
signed-rank function may be seen as the score functions corresponding to the first,
second, and fourth objective functions. This formulation suggests a natural way to
generalize the concepts sign, rank, and signed-rank to the multivariate case (without
defining a multivariate ordering).
Consider next the corresponding theoretical functions, and assume that y, y1 , and
y2 are independent and identically distributed continuous random variables from a
univariate distribution with cdf F. Then, similarly to the empirical equations, the
theoretical centered rank and signed-rank functions are R_F(y) = E{U(y − y_1)} and
Q_F(y) = (1/2) E{U(y − y_1) + U(y + y_1)}, and R_Y(y) → R_F(y) and Q_Y(y) → Q_F(y)
as n → ∞. The functions Q_Y(y) and Q_F(y) are odd, that is, Q_Y(−y) = −Q_Y(y)
and QF (−y) = −QF (y), and for distributions symmetric about the origin QF (y) =
RF (y). Note also that the inverse of the centered rank function (i.e., the inverse of
the centered cumulative distribution function) is the univariate quantile function.
Observe that in the univariate case the regular sign, rank, and signed-rank functions
are obtained. Clearly the multivariate signed-rank function Q(y) is also odd; that is,
Q(−y) = −Q(y).
The observed spatial signs are Ui = U(yi ), i = 1, ..., n. As in the univariate case,
the observed spatial ranks are certain averages of signs of pairwise differences,
R_i = R_Y(y_i) = AVE_j{U(y_i − y_j)}.
Theorem 4.1. The spatial signs, spatial ranks, and spatial signed-ranks are orthog-
onal equivariant in the sense that
U(Oyi ) = OU(yi ),
RYO (Oyi ) = ORY (yi ), and
QYO (Oyi ) = OQY (yi )
for all yi and all orthogonal matrices O. The centered ranks are invariant under
location shifts and
AVEi {Ri } = 0.
Example 4.1. The spatial signs, ranks, and signed-ranks are not affine equivariant,
however. In Figure 4.1 one can see scatterplots for 50 bivariate observations from
N2 (0, I2 ) with the corresponding bivariate spatial signs and ranks and signed-ranks.
The data points are then rescaled (Figure 4.2) and shifted (Figure 4.3). The figures
illustrate the behavior of the signs, ranks, and signed-ranks under these transforma-
tions: they are not equivariant under rescaling of the components. The spatial ranks
are invariant under location shifts. See below the R-code needed for the plots.
> library(MNM)
> set.seed(1)
> X <- matrix(rnorm(100), ncol = 2)   # 50 observations from N_2(0, I_2)
> opar <- par(mfrow = c(2, 2))        # panel layout; plotting calls not shown
> par(opar)
Fig. 4.1 The scatterplots for a random sample of size 50 from N2 (0, I2 ) with scatterplots for cor-
responding observed spatial signs, spatial ranks, and spatial signed-ranks.
Fig. 4.2 The scatterplots for a random sample of size 50 from N2 (0, I2 ) with rescaled second
component (multiplied by 5) with scatterplots for corresponding observed spatial signs, spatial
ranks, and spatial signed-ranks.
The sign, centered rank, and signed-rank may again be implicitly defined through
multivariate L1 type objective functions
AVE{|y_i|} = AVE{U_i'y_i},
(1/2) AVE{|y_i − y_j|} = AVE{R_i'y_i}, and
(1/4) AVE{|y_i − y_j| + |y_i + y_j|} = AVE{Q_i'y_i}.
Fig. 4.3 The scatterplots for a random sample of size 50 from N2 (0, I2 ) with shifted first compo-
nent (shifted by 1) with scatterplots for corresponding observed spatial signs, spatial ranks, and
spatial signed-ranks.
The rank function RF (y) characterizes the distribution F (up to a location shift).
If we know the rank function, we know the distribution (up to a location shift). See
Koltchinskii (1997). For F symmetric around the origin, QF (y) = RF (y), for all y.
The empirical functions converge uniformly in probability to the theoretical ones
under mild assumptions. (For the proof, see Möttönen et al. (1997).)
Theorem 4.2. Assume that Y is a random sample of size n from a distribution with
cdf F and uniformly bounded density. Then, as n → ∞,
Chaudhuri (1996) considered the inverse of the spatial rank function and called it
the spatial quantile function. See also Koltchinskii (1997). Let u be a vector in the p-
variate open unit ball B^p. Write Φ(u, y) = |y| − u'y. Then, according to Chaudhuri's
definition, the value of the spatial quantile function θ = θ(u) of F at u minimizes
E{Φ(u, y − θ) − Φ(u, y)},
where y has the cdf F. The second term in the expectation guarantees that the ex-
pectation always exists. It is easy to check that the spatial quantile function is the
inverse of the map
y → RF (y).
Serfling (2004) gives an extensive review of the inference methods based on the
concept of the spatial quantile, and introduces and studies some nonparametric mea-
sures of multivariate location, spread, skewness and kurtosis in terms of these quan-
tiles. The quantile at u = 0 is the so-called spatial median which is discussed in
more detail later.
If F is spherically symmetric around the origin, the rank and signed-rank function
has a simple form as described in the following.
Theorem 4.3. For a spherical distribution F, the theoretical spatial rank and signed-
rank function is
R_F(y) = Q_F(y) = q_F(r) u,
where r = |y|, u = |y|^{-1} y, and
q_F(r) = E_F[ (r − y_1) / ((r − y_1)² + y_2² + ··· + y_p²)^{1/2} ].
(The expected value always exists.) Naturally also |qF (r)| ≤ 1, so RF (y) is in the
unit ball.
In the following examples we give general formulas for the spatial rank function
in cases of multivariate normal, multivariate t, and multivariate normal scale mixture
models. For these distributions, the spatial rank functions can be formulated with the
generalized hypergeometric functions.
It is straightforward but tedious then to get the theoretical rank functions in the
following cases. See Möttönen et al. (2005).
Example 4.2. For the multivariate normal distribution N_p(0, I_p),
q_0(r) = [r Γ((p+1)/2) / (2^{1/2} Γ((p+2)/2))] exp(−r²/2) · 1F1((p+1)/2; (p+2)/2; r²/2).
Example 4.3. Let φ (y; μ , Σ ) be the density of a multivariate normal distribution with
mean vector μ and the covariance matrix Σ . Then a standardized multivariate nor-
mal scale mixture density is given by
f(y) = ∫ φ(y; 0, s^{-2} I_p) dH(s).
Example 4.4. For the p-variate t distribution t_{ν,p}(0, I_p),
q(r) = [r Γ((p+1)/2) Γ((ν+1)/2)] / [ν^{1/2} Γ(ν/2) Γ((p+2)/2) (1 + r²/ν)^{(ν+1)/2}]
· 2F1((p+1)/2, (ν+1)/2; (p+2)/2; (r²/ν)/(1 + r²/ν)).
Example 4.5. Consider the mixture of two normal distributions, N_p(0, I_p) with
probability π_1 and N_p(0, σ² I_p) with probability π_2, π_1 + π_2 = 1. Then
If spatial sign and ranks are used to analyze the data, the sign and rank covariance
matrices also naturally play an important role.
Definition 4.4. Let Y ∈ M(n, p) be a data matrix. Then the spatial sign covariance
matrix UCOV(Y) and the spatial Kendall's tau matrix TCOV(Y) are
UCOV(Y) = AVE{U(y_i) U(y_i)'} and
TCOV(Y) = AVE{U(y_i − y_j) U(y_i − y_j)'}.
Definition 4.5. Let Y ∈ M(n, p) be a data matrix. Then the spatial rank covariance
matrix RCOV(Y) and the spatial signed-rank covariance matrix QCOV(Y) are
RCOV(Y) = AVE{R_i R_i'} and
QCOV(Y) = AVE{Q_i Q_i'}.
Note that if one uses the vectors of marginal signs and ranks instead of the spatial
signs and ranks, then the regular sign covariance matrix (UCOV), Kendall’s tau
matrix (TCOV) and Spearman’s rho matrix (RCOV) are obtained.
The matrices UCOV(Y), RCOV(Y), TCOV(Y), and QCOV(Y) are not scatter
matrices as they are not affine equivariant. They are equivariant under orthogonal
transformations only. The RCOV and TCOV are shift invariant. Note also that the
sign covariance matrix and the Kendall’s tau matrix are standardized in the sense
that tr(UCOV(Y)) = tr(TCOV(Y)) = 1.
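Sample versions of UCOV and TCOV are straightforward to compute; a base-R sketch (the function names are ours):

u.sign <- function(y) { r <- sqrt(sum(y^2)); if (r > 0) y / r else 0 * y }

ucov <- function(Y) {
  U <- t(apply(Y, 1, u.sign))
  crossprod(U) / nrow(Y)              # AVE{ U(y_i) U(y_i)' }; trace is 1
}

tcov <- function(Y) {
  n <- nrow(Y); S <- matrix(0, ncol(Y), ncol(Y))
  for (i in 1:n) for (j in 1:n) if (i != j)
    S <- S + tcrossprod(u.sign(Y[i, ] - Y[j, ]))
  S / (n * (n - 1))                   # AVE over pairs i != j; trace is 1
}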
The theoretical (population) spatial sign and Kendall’s tau covariance matrices
are defined in the following.
The result suggests that, in the elliptic model, the spatial sign and rank covariance
matrices can be used in the principal component analysis (PCA) to find the principal
components. Let D, DU , and DT be the (diagonal) matrices of distinct eigenvalues
of COV(F), UCOV(F), and TCOV(F), respectively, in a decreasing order. Then
D_U = D_T = E[ (D^{1/2} u u' D^{1/2}) / (u' D u) ],
where u is uniformly distributed on the unit sphere. See Locantore et al. (1999),
Marden (1999b), Visuri et al. (2000, 2003), and Croux et al. (2002) for these robust
alternatives to classical PCA. For more discussion on PCA based on spatial signs
and ranks, see Chapter 9.
The theoretical spatial rank and signed-rank covariance matrices are given in the
following.
Using Corollary 3.2, the moments of the sign covariance matrix are easily found
in the spherically symmetric case.
where, as before,
P_1 + P_2 = (1/2)(I_{p,p} + K_{p,p}) − (1/p) J_{p,p}.
Proof. Let y1 , ..., yn be a random sample from a spherical distribution, and let Ui =
|yi |−1 yi , i = 1, ..., n. Then
Clearly p · E[UCOV(Y)] = I_p. For the variances and covariances, recall from The-
orem 2.2 that (U_{i1}², ..., U_{ip}²) has a Dirichlet distribution D_p(1/2, ..., 1/2), so
E[U_{ij}²] = 1/p, E[U_{ij}² U_{ik}²] = 1/(p(p + 2)), and E[U_{ij}⁴] = 3/(p(p + 2)),
If Y is a random sample from a distribution that is spherically symmetric around
the origin, the first and second moments of TCOV(Y) have a structure similar to
those of UCOV(Y), with expected value (1/p) I_p and covariance matrix τ(P_1 + P_2),
but τ now depends on the sample size n and the distribution of the modulus. The limiting
distributions of UCOV(Y) and TCOV(Y) can be easily given in the spherical case.
See Sirkiä et al. (2008).
Theorem 4.6. Assume that Y is a random sample of size n from a spherical distri-
bution around the origin. Then, as n → ∞,
√n vec(UCOV(Y) − (1/p) I_p) →_d N_{p²}(0, (2/(p(p + 2))) (P_1 + P_2)) and
√n vec(TCOV(Y) − (1/p) I_p) →_d N_{p²}(0, (2τ/(p(p + 2))) (P_1 + P_2)),
where
τ = 4 E[ (1 − |y_1|/|y_1 − y_2|) (1 − |y_1|/|y_1 − y_3|) ].
Similar limiting results hold for RCOV(Y) and QCOV(Y) as n → ∞,
so that the sample statistics are asymptotically unbiased. They estimate the same
population quantity if F is symmetrical around the origin. In the spherically sym-
metric case their covariance matrices are structured as
τ_2 (P_1 + P_2) + τ_3 P_3.
Next we give a short survey of some other approaches to multivariate sign and rank
methods. The most straightforward extension is just to use vectors of componentwise
univariate signs and ranks: the componentwise signs and ranks then correspond to
the L1 type criterion functions
utilizing the so-called "Manhattan distance". The sign and rank vectors are invariant
under componentwise monotone transformations (e.g., marginal rescaling) but not
orthogonal equivariant. This approach is perhaps the most natural one for data with
independent components. See Puri and Sen (1971) for a complete discussion of this
approach.
The corresponding score functions are then the affine equivariant (Oja) multivariate signs and ranks, and the inference methods based on them are affine equivariant/invariant. For a review of the multivariate location problem, see Oja (1999).
Visuri et al. (2000, 2003) and Ollila et al. (2003b, 2004) introduced and considered
the corresponding affine equivariant sign and rank covariance matrices.
Koshevoy and Mosler (1997a,b, 1998) and Mosler (2002) proposed the use
of zonotopes and lift zonotopes, p- and (p + 1)-variate convex sets Z p (Y) and
LZ p+1 (Y), respectively, to describe and investigate the properties of a data matrix Y.
Koshevoy et al. (2003) developed a scatter matrix estimate based on the zonotopes.
It appears that there is a nice duality relation between zonotopes (lift zonotopes) and
affine equivariant signs (ranks); the objective functions yielding affine equivariant
signs and ranks are just volumes of zonotope Z p (Y) and lift zonotope LZ p+1 (Y),
respectively. See Koshevoy et al. (2004).
Randles (1989) developed an affine invariant sign test based on an ingenious con-
cept of interdirection counts. Affine invariant interdirection counts depend on the
directions of the observation vectors; they measure the angular distances between
two vectors relative to the rest of the data. Randles (1989) was followed by a series
of papers introducing nonparametric sign and rank interdirection tests for multivari-
ate one-sample and two-sample location problems, for example. This approach is closely related to the spatial sign and rank approach, as we show in later chapters.
Still one important approach is to combine the directions (spatial signs or in-
terdirection counts) and the transformed ranks of the Mahalanobis distances from
the origin or data center. In a series of papers, Hallin and Paindaveine constructed
optimal signed-rank location tests in the elliptical model; see the seminal papers by
Hallin and Paindaveine (2002, 2006). Similarly, Nordhausen et al. (2009) developed
optimal rank tests in the independent component model.
Chapter 5
One-sample problem: Hotelling’s T 2 -test
Abstract We start with a one-sample location example with trivariate and bivariate
observations. It is shown how a general score function T(y) is used to construct tests
and estimates in the one-sample location problem. The identity score T(y) gives the
regular Hotelling’s T 2 -test and sample mean.
5.1 Example
We consider the classical data due to Rao (1948) consisting of weights of cork bor-
ings on trees in four directions: north (N), east (E), south (S), and west (W). We
have these four measurements on 28 trees, and we wish to test whether the weight
of cork borings is independent of the direction.
Table 5.1 Weights of cork borings (in centigrams) in the four directions
N E S W N E S W
72 66 76 77 91 79 100 75
60 53 66 63 56 68 47 50
56 57 64 58 79 65 70 61
41 29 36 38 81 80 68 58
32 32 35 36 78 55 67 60
30 35 34 26 46 38 37 38
39 39 31 27 39 35 34 37
42 43 31 25 32 30 30 32
37 40 31 25 60 50 67 54
33 29 27 36 35 37 48 39
32 30 34 28 39 36 39 31
63 45 74 63 50 34 37 40
54 46 60 52 43 37 39 50
47 51 52 43 48 54 57 43
Fig. 5.1 The scatterplot for differences E-N, S-N and W-N.
Fig. 5.2 The scatterplot for differences S-N and W-E.
> data(cork)
yi = μ + Ω ε i , i = 1, ..., n,
where the ε i are centered and standardized residuals with cumulative distribution
function F. The tests and estimates are constructed under different symmetry as-
sumptions (A0)–(A4) and (B0)–(B4). Note that the zero vector may be used as a
null value without loss of generality, because to test H0 : μ = μ0 , we just substitute
yi − μ0 in place of yi in the tests.
We now describe the use of a general score function T(y) for the statistical infer-
ence in the one sample location problem. The results are only heuristic and general,
and the distributional assumptions for the asymptotic theory of course depend on the
chosen score function. For the one-sample symmetry center problem it is natural to
assume that the score function T(y) is odd; that is, T(−y) = −T(y) for all y.
Outer standardization. We first discuss the test and estimate that use the outer
standardization. The test may not be affine invariant, and the estimate may not be
affine equivariant. Now
• The test statistic is
• The companion location estimate μ̂ is the shift vector obtained in the inner cen-
tering, and is determined by estimating equations
AVE {T(yi − μ̂ )} = 0.
For the asymptotic theory in our approach we need the following p × p matrices
A and B (expectations taken under the null hypothesis),
$$A = E\{T(y_i)\,L(y_i)'\} \quad\text{and}\quad B = E\{T(y_i)\,T(y_i)'\}.$$
• Under the null hypothesis H0 , the squared version of the test statistic
where
$$\hat{B} = \mathrm{AVE}\{T_i T_i'\}.$$
• Under the sequence of alternatives Hn ,
$$\sqrt{n}\,\mathrm{AVE}\{T_i\} \to_d N_p(A\delta, B).$$
Note that, in the testing case, S is not a regular scatter matrix estimate but a
scatter matrix with respect to a known center (the origin). In the estimation case, S
is a regular scatter matrix (around the estimated value μ̂ ).
For later extensions to the several-sample and regression cases we next give the
test statistics in a slightly different form. The test statistics is then seen to compare
two different scatter matrices. For that purpose, write
$$P_X = X(X'X)^{-1}X'$$
for any $n \times q$ matrix X with rank $q < n$. The matrix $P_X$ is the $n \times n$ projection matrix to the subspace spanned by the columns of X. The transformation $Y \to P_{1_n} Y$ then just
replaces all the observations by their sample mean vector. Now, in outer standard-
ization,
$$Y \to T \to Q^2 = n \cdot \mathrm{tr}\left((T' P_{1_n} T)(T'T)^{-1}\right),$$
and, in inner standardization,
The test statistic based on inner standardization (if possible) is affine invariant. This
is not necessarily true if one uses outer standardization.
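A minimal sketch of the outer-standardized statistic in this projection form, for an n × p score matrix T (assumed here to have full column rank):

> # Q2 = n * tr((T' P T)(T' T)^{-1}) with P the projection onto 1_n.
> Q2.outer <- function(T) {
+   n <- nrow(T)
+   P <- matrix(1 / n, n, n)   # P = 1_n (1_n' 1_n)^{-1} 1_n'
+   n * sum(diag(crossprod(T, P %*% T) %*% solve(crossprod(T))))
+ }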
The approximate p-value may thus be based on the limiting chi-square distribu-
tion. For small sample sizes, an alternative way to construct the p-value is to use the
sign-change argument. Let J be an n×n diagonal matrix with diagonal elements ±1.
It is called a sign-change matrix. The value of the test statistic for a sign-changed
sample JY is then
($T'T$ and $\hat{T}'\hat{T}$ are invariant under sign changes). Then the p-value of a conditionally distribution-free sign-change test statistic is
$$E_J\left[I\left\{Q^2(JY) \ge Q^2(Y)\right\}\right],$$
where J has a uniform distribution over all its $2^n$ possible values. This sign-change version of the test is valid for the null hypothesis $-y \sim y$ (model (A4)).
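A hedged sketch of this conditionally distribution-free p-value, approximating the expectation over J by random sign changes (statistic stands for any function computing Q² from the data matrix):

> sign.change.pvalue <- function(Y, statistic, B = 1000) {
+   obs <- statistic(Y)
+   reps <- replicate(B, {
+     J <- diag(sample(c(-1, 1), nrow(Y), replace = TRUE))
+     statistic(J %*% Y)   # value for the sign-changed sample JY
+   })
+   mean(reps >= obs)      # estimate of E_J[I{Q2(JY) >= Q2(Y)}]
+ }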
Let Y = (y1 , ..., yn ) be a random sample from an unknown distribution, and assume
that the p-variate observation vectors yi are generated by
yi = μ + Ω ε i , i = 1, ..., n,
where the ε i are centered and standardized random vectors with the cumulative dis-
tribution function F. In the Hotelling’s test case, we assume that the standardized
vectors ε i have mean vector zero and covariance matrix I p . Then E(yi ) = μ is an un-
known mean vector, and $\mathrm{COV}(y_i) = \Sigma = \Omega\Omega' > 0$ an unknown covariance matrix.
At first, we wish to estimate unknown μ and test the null hypothesis
H0 : μ = 0,
Hotelling’s T 2 and the sample mean: Hotelling’s test is obtained with score func-
tion T(y) = y. Then
T(Y) = ȳ and also μ̂ = ȳ
and, under the null hypothesis,
$$A = I_p \quad\text{and}\quad B = C = \Sigma.$$
Both the outer and inner standardizations yield the same Hotelling’s one-sample
test statistic
$$Q^2 = Q^2(Y) = n\,\bar{y}'\,\hat{B}^{-1}\,\bar{y}.$$
In the above test construction, we standardized the sample mean using the sample covariance matrix with respect to the origin, $B(Y) = \mathrm{AVE}\{y_i y_i'\}$. The popular ver-
sion of Hotelling’s T 2 uses the regular sample mean and regular sample covariance
matrix, namely ȳ and
$$C = \mathrm{AVE}\{(y_i - \bar{y})(y_i - \bar{y})'\},$$
and is defined as
$$T^2 = T^2(Y) = n\,\bar{y}'\,C^{-1}\,\bar{y}.$$
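A minimal sketch of this version computed from its definition, with the limiting chi-square approximation for the p-value:

> hotelling.T2 <- function(Y) {
+   n <- nrow(Y); ybar <- colMeans(Y)
+   C <- crossprod(sweep(Y, 2, ybar)) / n   # AVE{(y_i - ybar)(y_i - ybar)'}
+   T2 <- n * drop(t(ybar) %*% solve(C) %*% ybar)
+   c(T.2 = T2, p.value = 1 - pchisq(T2, df = ncol(Y)))
+ }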
Under the null hypothesis, both B(Y) and C(Y) converge in probability to Σ.
In the testing procedure with sample size n, instead of reporting a p-value, one
often compares the observed value of the test statistic Q2n to a critical value cn . The
null hypothesis is rejected if Q2n > cn . The validity and asymptotic validity of a test
(Q2n , cn ) for the null hypothesis H0 is defined as follows.
Definition 5.1. The test (Q2n , cn ) is a valid level-α test for sample size n if PH0 (Q2n >
cn ) = α . The sequence of tests (Q2n , cn ), n = 1, 2, ..., is asymptotically valid with
level α if PH0 (Q2n > cn ) → α . The probabilities PH0 are calculated under the null
hypothesis H0 .
$n\mu'\Sigma^{-1}\mu$. Also the sample size calculations to attain fixed size and power may be based on this result.
$$Q^2(Y) = 1_n'\,Y(Y'Y)^{-1}Y'\,1_n \quad\text{and}\quad Q^2(JY) = 1_n'\,JY(Y'Y)^{-1}Y'J\,1_n,$$
and the p-value obtained from the sign-change or exact version of the test is
$$E_J\left[I\left\{Q^2(JY) \ge Q^2(Y)\right\}\right].$$
Moreover,
$$Q^2(YH') = Q^2(Y)$$
for any full-rank H. This implies that the null distribution does not depend on $\Omega$ at all. If we write
$$\hat{\varepsilon} = Y\hat{B}^{-1/2}$$
for the estimated residuals, then
In the multivariate normality case, the sample mean ȳ and the (slightly adjusted)
sample covariance matrix n/(n − 1)C(Y) are optimal (uniformly minimum variance
unbiased, UMVU) estimators of unknown μ and Σ . Also, it is known that
$$\bar{y} \sim N_p\left(\mu, \frac{1}{n}\Sigma\right), \qquad \widehat{\mathrm{COV}}(\hat{\mu}) = \frac{1}{n}\,C(Y).$$
Example 5.1. Cork boring data Consider first the 3-variate vector of E-N, S-N and
W-N. If we wish to test the null hypothesis that the mean vector is zero, we get
> mv.1sample.test(X)
data: X
T.2 = 20.742, df = 3, p-value = 0.0001191
alternative hypothesis: true location is not equal to c(0,0,0)
If we then wish to estimate the population mean vector with the sample mean vector, the estimate and its estimated covariance matrix are as given below. These are then used to find the 95% confidence ellipsoid for the population mean vector. This is given in Figure 5.3.
Fig. 5.3 The scatterplot with estimated mean and 95% confidence ellipsoid for 3-variate data.
Example 5.2. Consider then the bivariate vector of S-N and W-E. If we wish to test
the null hypothesis that the mean vector is zero, we get
> mv.1sample.test(cork_2v)
data: cork_2v
T.2 = 0.4433, df = 2, p-value = 0.8012
alternative hypothesis: true location is not equal to c(0,0)
Again, the mean vector, its estimated covariance matrix, and the 95% confidence ellipse are given below.
Fig. 5.4 The scatterplot with estimated mean and 95% confidence ellipse for 2-variate data.
Chapter 6
One-sample problem: Spatial sign test
and spatial median
Abstract The spatial sign score function U(y) is used for the one-sample location
problem. The test is then the spatial sign test, and the estimate is the spatial median.
The tests and estimates using outer standardization as well as those using inner
standardization are discussed.
6.1.1 Preliminaries
The aim is to find, in the one-sample location problem, test statistics that are valid
under much weaker conditions than Hotelling’s T 2 . We consider a multivariate gen-
eralization of the univariate sign test, perhaps the simplest test ever proposed. The
spatial sign test uses the spatial sign score $U(y) = |y|^{-1}y$ for $y \ne 0$, and $U(0) = 0$.
We start by giving some approximations and key results needed in the following.
Let $y \ne 0$ and $\mu$ be any p-vectors, $p > 1$, and write $r = |y|$ and $u = |y|^{-1}y$.
Lemma 6.1.
1. $\left||y - \mu| - |y|\right| \le |\mu|$.
2. $\left||y - \mu| - |y| + u'\mu\right| \le 2\,\dfrac{|\mu|^2}{r}$.
3. $\left||y - \mu| - |y| + u'\mu - \dfrac{1}{2r}\,\mu'[I_p - uu']\mu\right| \le C\,\dfrac{|\mu|^{2+\delta}}{r^{1+\delta}}$
for all $0 < \delta < 1$, where C does not depend on y or $\mu$.
Lemma 6.3. Assume that the density function f (ε ) of the p-variate continuous ran-
dom vector ε is uniformly bounded. Then E{|ε |−α } exists for all 0 ≤ α < 2.
yi = μ + ε i , i = 1, ..., n,
E(U(ε i )) = 0.
H0 : μ = 0.
The matrices
$$A = E\left\{|\varepsilon_i|^{-1}\left(I_p - |\varepsilon_i|^{-2}\varepsilon_i\varepsilon_i'\right)\right\} \quad\text{and}\quad B = \mathrm{UCOV}(F) = E\left\{|\varepsilon_i|^{-2}\varepsilon_i\varepsilon_i'\right\}$$
are often needed in the following. As the density function of $\varepsilon_i$ is continuous and uniformly bounded, A also exists and is bounded.
The multivariate spatial sign test is thus obtained with the score function $T(y) = U(y)$ (spatial sign score). Write $U_i = U(y_i)$, $i = 1, ..., n$. Then
$$T = T(Y) = \mathrm{AVE}\{U_i\}$$
and
$$Q^2 = Q^2(Y) = n\,T'\,\hat{B}^{-1}\,T, \qquad\text{where}\qquad \hat{B} = \mathrm{AVE}\{U_i U_i'\}.$$
Note that T(Y) is of course not a location statistic; it is only orthogonally equivariant ($T(YO') = O\,T(Y)$).
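A minimal sketch computing Q² with outer standardization from these formulas (spatial signs obtained as in the earlier sketches):

> U <- spatial.sign(cork_3v, FALSE, FALSE)   # U_i = |y_i|^{-1} y_i
> Tbar <- colMeans(U)                        # T = AVE{U_i}
> Bhat <- crossprod(U) / nrow(U)             # Bhat = AVE{U_i U_i'}
> Q2 <- nrow(U) * drop(t(Tbar) %*% solve(Bhat) %*% Tbar)
> 1 - pchisq(Q2, df = ncol(U))               # approximate p-value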
E{U(yi − μ )} = 0.
Note that this assumption is naturally true under symmetry, that is, if $-(y_i - \mu) \sim (y_i - \mu)$. As $B = E(U_i U_i')$ always exists, the weak law of large numbers (WLLN, Kolmogorov) implies that
$$\hat{B} \to_P B.$$
Then, using the central limit theorem (Lindeberg-Lévy) and Slutsky’s lemma, we
easily obtain the following.
Theorem 6.1. If the null hypothesis $H_0: \mu = 0$ is true, then $\sqrt{n}\,T \to_d N_p(0, B)$ and the test statistic with outer standardization
$$Q^2 = n\,T'\,\hat{B}^{-1}\,T \to_d \chi^2_p.$$
The finite-sample power of the test can be approximated using the following
theorem.
Theorem 6.2. Assume that $E\{U(y_i - \mu)\} = 0$ and that the density of $\varepsilon_i$ is uniformly bounded. Then, under the sequence of alternative distributions $H_n: \mu = n^{-1/2}\delta$, the limiting distribution of $\sqrt{n}\,T(Y)$ is $N_p(A\delta, B)$, and
$$Q^2(Y) \to_d \chi^2_p\left(\delta' A B^{-1} A\delta\right).$$
$$JU \sim U$$
for all $n \times n$ sign-change matrices J. Here $U = (U_1, ..., U_n)'$. Note that the sign covariance matrix satisfies $\mathrm{UCOV}(Y) = \mathrm{UCOV}(JY)$ for all J (invariance under sign changes). Therefore
$$Q^2(Y) = 1_n'\,U(U'U)^{-1}U'\,1_n \quad\text{and}\quad Q^2(JY) = 1_n'\,JU(U'U)^{-1}U'J\,1_n,$$
where $I\{\cdot\}$ is an indicator function and the expected value is calculated for a uniformly distributed sign-change matrix J (with $2^n$ possible values). In practice, the expected value is often approximated by simulations from the uniform distribution of J.
Fig. 6.1 The scatterplot for spatial signs for differences E-N, S-N and W-N. The spatial signs lie
on the 3-variate unit sphere.
Example 6.1. Cork boring data. Consider again the 3-variate vector of E-N, S-
N and W-N. The 3-variate spatial signs are illustrated in Figure 6.1. The observed
value of Q2 (Y) is 13.874 with corresponding p-value 0.003:
data: cork_3v
Q.2 = 13.87, df = 3, p-value = 0.003082
alternative hypothesis: true location is not equal to c(0,0,0)
>
> pairs(signs_3v, labels = colnames(cork_3v), las = 1)
>
Fig. 6.2 The scatterplot for spatial signs for differences S-N and W-E.
Example 6.2. Cork boring data. In the bivariate case the spatial signs are given in Figure 6.2. The observed value of $Q^2(Y)$ is 0.017 with p-value 0.991. With the R package,
data: cork_2v
Q.2 = 0.0173, df = 2, p-value = 0.9914
alternative hypothesis: true location is not equal to c(0,0)
>
> plot(signs_2v, xlab = "S_N", ylab = "W_E", ylim = c(-1, 1),
xlim = c(-1, 1), las = 1, pty = "s")
>
Despite all its nice properties listed so far, the sign test statistic with outer stan-
dardization is unfortunately not affine invariant. Then, for example, the p-value
depends on the chosen coordinate system. It is, however, invariant under orthogonal
transformation; that is, with outer standardization,
$$Q^2(YO') = 1_n'\,UO'(OU'UO')^{-1}OU'\,1_n = 1_n'\,U(U'U)^{-1}U'\,1_n = Q^2(Y),$$
How should one then choose the scatter statistic S? It seems natural to use the
scatter matrix given by inner standardization. It then appears that the resulting affine
invariant sign test is distribution-free under extremely weak assumptions. Using the
inner standardization we get the following.
Definition 6.1. Tyler’s transformation S−1/2 is the transformation that makes the
spatial sign covariance matrix proportional to the identity matrix,
$$p \cdot \mathrm{UCOV}(YS^{-1/2}) = I_p.$$
The matrix can be chosen so that tr(S) = p; this shape matrix is then called Tyler’s
scatter matrix (with respect to the origin).
Tyler’s transformation (and Tyler’s shape matrix) exists under weak conditions;
see Tyler (1987). Tyler’s transformation tries to make the spatial signs of the trans-
formed data points ± S−1/2 yi , i = 1, ..., n, be uniformly distributed on the unit p-
sphere. Tyler’s shape matrix S and Tyler’s transformation S−1/2 are surprisingly
easy to compute. The iterative construction may begin with S = I p and an iteration
step is
$$S \leftarrow p\; S^{1/2}\,\mathrm{UCOV}(YS^{-1/2})\,S^{1/2}.$$
If |p UCOV(YS−1/2 ) − I p | is sufficiently small, then stop and fix the scale by
S ← [p/tr(S)]S. Tyler (1987) gives weak conditions under which the algorithm con-
verges.
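A hedged sketch of this iteration with a symmetric square root (tested implementations exist in the R packages accompanying the book):

> mat.sqrt <- function(A) {
+   e <- eigen(A, symmetric = TRUE)
+   e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)
+ }
> tyler.shape.origin <- function(Y, eps = 1e-6) {
+   p <- ncol(Y); S <- diag(p)
+   repeat {
+     S12 <- mat.sqrt(S)
+     U <- spatial.sign(Y %*% solve(S12), FALSE, FALSE)
+     UCOV <- crossprod(U) / nrow(U)
+     S <- p * S12 %*% UCOV %*% S12              # iteration step
+     if (max(abs(p * UCOV - diag(p))) < eps) break
+   }
+   p * S / sum(diag(S))                          # fix the scale: tr(S) = p
+ }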
Tyler’s shape matrix S(Y) is thus calculated with respect to the origin. It is inter-
esting to note that
1. Its value depends on yi only through Ui = |yi |−1 yi .
2. It is affine equivariant in the sense that
$$S(YH') \propto H\,S(Y)\,H',$$
yi = μ + Ωε i , i = 1, ..., n,
where the independent residuals ε i have a uniformly bounded density and the ε i are
centered and standardized so that
(true in model (B2)). Then under the null hypothesis Tyler's shape matrix S(Y) converges in probability to $[p/\mathrm{tr}(\Sigma)]\Sigma$, where $\Sigma = \Omega\Omega'$.
for the matrix of observed standardized spatial signs. The multivariate sign test based on standardized signs, the spatial sign test with inner standardization, then rejects $H_0$ for large values of
$$Q^2(YS^{-1/2}) = 1_n'\,\hat{U}(\hat{U}'\hat{U})^{-1}\hat{U}'\,1_n = \frac{p}{n}\left|1_n'\hat{U}\right|^2 = np\left|\mathrm{AVE}\{\hat{U}_i\}\right|^2.$$
Q2 (YS−1/2 ) is simply np times the squared length of the average direction of the
transformed data points. This test was proposed and developed in Randles (2000)
where the following important result is also given.
Theorem 6.4. The spatial sign test with inner standardization, $Q^2(YS^{-1/2})$, is affine invariant and strictly distribution-free in the model (B1) of elliptical directions ($|\varepsilon_i|^{-1}\varepsilon_i$ uniformly distributed). The limiting distribution of $Q^2(YS^{-1/2})$ under the null hypothesis is then a $\chi^2_p$ distribution.
Proof. The affine invariance was proven in Theorem 6.3. The fact that the test is
distribution-free under the model of elliptical directions follows from the fact that
$Q^2(YS^{-1/2})$ depends on the observations only through $|\varepsilon_i|^{-1}\varepsilon_i$, $i = 1, ..., n$.
Assume next (without loss of generality) that $\Omega = I_p$; that is, $Y = \varepsilon$. Then $S^{-1/2}$ is a root-n consistent estimate of $I_p$ (Tyler (1987)). Thus
$$\Delta^* = \sqrt{n}\left(S^{-1/2} - I_p\right) = O_P(1).$$
Then write
$$S^{-1/2} = I_p + n^{-1/2}\Delta^*,$$
where $\Delta^*$ is thus bounded in probability. Using Lemma 6.2 we obtain
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n U(S^{-1/2}y_i) = \frac{1}{\sqrt{n}}\sum_{i=1}^n U_i + \frac{1}{n}\sum_{i=1}^n \left(\Delta^* - U_i'\Delta^* U_i\right)U_i + o_P(1).$$
For $|\Delta^*| < M$, the second term in the expansion converges uniformly in probability to zero due to its linearity with respect to the elements of $\Delta^*$ and due to the symmetry of the distribution of $U_i$. Therefore
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n U(S^{-1/2}y_i) - \frac{1}{\sqrt{n}}\sum_{i=1}^n U_i \to_P 0.$$
Example 6.3. Cork boring data. Consider again the 3-variate vector of E-N, S-
N and W-N. The 3-variate standardized spatial signs are illustrated in Figure 6.3.
The observed value of Q2 is 14.57 with corresponding p-value 0.002. Using the R
package,
data: cork_3v
Q.2 = 14.57, df = 3, p-value = 0.002222
alternative hypothesis: true location is not equal to c(0,0,0)
>
> pairs(signs_i_3v, labels = colnames(cork_3v), las = 1)
>
Example 6.4. Cork boring data. In the bivariate case, with standardized signs shown in Figure 6.4, the null hypothesis cannot be rejected: $Q^2(YS^{-1/2})$ is 0.012 with p-value 0.994.
>
> signs_i_2v <- spatial.sign(cork_2v, FALSE, TRUE)
>
> mv.1sample.test(cork_2v, score = "s", stand = "i")
data: cork_2v
Q.2 = 0.0117, df = 2, p-value = 0.9942
alternative hypothesis: true location is not equal to c(0,0)
Fig. 6.3 The scatterplot for (inner) standardized spatial signs for differences E-N, S-N and W-N.
The spatial signs lie on the 3-variate unit sphere.
>
> plot(signs_i_2v, xlab = "S_N", ylab = "W_E", ylim = c(-1, 1),
xlim = c(-1, 1), las = 1, pty = "s")
>
We show some connections to other test statistics proposed in the literature. Assume
the model (B2). Then
If $\mu = 0$ and $\Omega = I_p$, then
$$Q^2(Y) - \frac{p}{n}\sum_{i=1}^n\sum_{j=1}^n U_i'U_j \to_P 0.$$
Therefore, in this case, the spatial sign test statistic is asymptotically equivalent to
Rayleigh’s statistic
Fig. 6.4 The scatterplot for standardized spatial signs for differences S-N and W-E.
$$\frac{p}{n}\sum_{i=1}^n\sum_{j=1}^n \cos(U_i, U_j) = \frac{p}{n}\sum_{i=1}^n\sum_{j=1}^n \cos(y_i, y_j),$$
where $\cos(y_i, y_j)$ is the cosine of the angle between $y_i$ and $y_j$. Next note that if $S = S(Y)$ is Tyler's scatter matrix, then
$$Q^2(YS^{-1/2}) = \frac{p}{n}\sum_{i=1}^n\sum_{j=1}^n \cos(\hat{U}_i, \hat{U}_j).$$
where the proportion p̂i, j is the observed fraction of times that yi and y j fall on oppo-
site sides of data-based hyperplanes formed by the origin and p − 1 data points. This
is an extension of the Blumen (1958) bivariate sign test. The test statistic is affine
invariant and strictly distribution-free under the model (B1) of elliptical directions.
It is remarkable that no scatter matrix estimate is then needed to attain affine invariance. The test is, however, computationally difficult in high dimensions. See also
Chaudhuri and Sengupta (1993) and Koshevoy et al. (2004).
The sign test using the affine equivariant Oja signs (see Oja (1999)) is also, in the
elliptic case, asymptotically equivalent to the invariant version of the spatial sign test
using Q2 (YS−1/2 ). The latter is again computationally much more convenient. For
general classes of distribution-free bivariate sign tests, see Oja and Nyblom (1989)
and Larocque et al. (2000).
The one-sample location test based on marginal signs is described in Puri and
Sen (1971). The test is not affine invariant but it is invariant under odd monotone
transformations to the marginal variables. Affine invariant versions are obtained us-
ing the transformation technique described in Chakraborty and Chaudhuri (1999).
See also the approach based on the invariant coordinate system in Nordhausen et al.
(2009).
Next we consider the one-sample location estimation problem. The estimate cor-
responding to the spatial sign test is the so-called spatial median μ̂ = μ̂ (Y) which
is, under general assumptions, a root-n consistent estimate of the true spatial popu-
lation median μ , and has good limiting and finite-sample distributional properties.
The estimate μ̂ is also robust with a bounded influence function and a breakdown
point 1/2. In addition, we also give an affine equivariant version of the estimate.
Definition 6.2. The sample spatial median μ̂ (Y) minimizes the criterion function
$\mathrm{AVE}\{|y_i - \mu|\}$ or, equivalently,
$$D_n(\mu) = \mathrm{AVE}\left\{|y_i - \mu| - |y_i|\right\}.$$
The spatial median has a very long history starting in Weber (1909), Gini and
Galvani (1929) and Haldane (1948). Gower (1974) used the term mediancenter.
Brown (1983) has developed many of the properties of the spatial median. This min-
imization problem is also sometimes known as the Fermat-Weber location problem.
Taking the gradient of the objective function, one sees that if μ̂ solves the equation
AVE{U(yi − μ̂ )} = 0,
then μ̂ is the observed spatial median. This shows the connection between the spatial
median and the spatial sign test. The estimate μ̂ is the value of the location parameter
which, if used as a null value, offers the highest possible p-value. The solution can
also be seen as the shift vector corresponding to the inner centering of the spatial
sign score test; the location shift makes the spatial signs (directions) of the centered
data points sum up to 0.
The spatial median is unique if the dimension of the data cloud is greater than one (Milasevic and Ducharme, 1987). The Weiszfeld algorithm for the computation of the spatial median has a simple iteration step,
$$\mu \leftarrow \mu + \frac{\mathrm{AVE}\{U(y_i - \mu)\}}{\mathrm{AVE}\{|y_i - \mu|^{-1}\}}.$$
The algorithm may sometimes fail, but a slightly modified algorithm that converges quickly and monotonically is described by Vardi and Zhang (2001).
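A minimal sketch of the basic Weiszfeld iteration (assuming no observation coincides with the current value of μ; mv.1sample.est with the spatial sign score in MNM is the tested route):

> spatial.median.weiszfeld <- function(Y, eps = 1e-8) {
+   mu <- colMeans(Y)
+   repeat {
+     D <- sweep(Y, 2, mu)                    # rows y_i - mu
+     d <- sqrt(rowSums(D^2))
+     step <- colMeans(D / d) / mean(1 / d)   # AVE{U(y_i-mu)} / AVE{|y_i-mu|^{-1}}
+     mu <- mu + step
+     if (sum(step^2) < eps^2) break
+   }
+   mu
+ }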
Next we consider the consistency and limiting distribution of the spatial median
μ̂ (Y). Under general assumptions, if Y is a random sample from F, then the sample
spatial median μ̂ (Y) converges to the population spatial median μ = μ (F).
Definition 6.3. Assume that the cumulative distribution function of $y_i$ is F. Then the theoretical or population spatial median $\mu = \mu(F)$ minimizes the criterion function
$$D(\mu) = E\left\{|y_i - \mu| - |y_i|\right\}.$$
Note that, as $\left||y_i - \mu| - |y_i|\right| \le |\mu|$, the expectation in the above definition always exists.
For the asymptotical properties of the estimate we need the following assump-
tion.
Assumption 1. The density function of $y_i$ is uniformly bounded and continuous. Moreover, the population spatial median $\mu = \mu(F)$ is unique; that is, $D(\mu^*) > D(\mu)$ for all $\mu^* \ne \mu$.
For the consideration of the limiting distribution, we can assume that the popu-
lation spatial median is 0. This is not a restriction as the estimate and the functional
are clearly shift equivariant. Note also that no assumption about the existence of the
moments is needed.
First note that both $D_n(\mu)$ and $D(\mu)$ are convex and that, based on Lemmas 6.1 and 6.3, we have the pointwise convergences
$$D_n(\mu) \to_P D(\mu) = \mu' A\mu + o(|\mu|^2)$$
and
$$n D_n(n^{-1/2}\mu) - \left(\sqrt{n}\,T - \frac{1}{2}A\mu\right)'\mu = o_P(1).$$
Then Theorem B.5 in Appendix B gives the following result.
Theorem 6.5. Under Assumption 1, $\hat\mu \to_P \mu$ and the limiting distribution of $\sqrt{n}(\hat\mu - \mu)$ is
$$N_p\left(0,\ A^{-1}BA^{-1}\right).$$
The proofs can be found in Appendix B. Heuristically, the results in Theorem 6.5
can be simply based on Taylor’s expansion with μ = 0,
$$0 = \sqrt{n}\,\mathrm{AVE}\{U(y_i - \hat\mu)\} = \sqrt{n}\,\mathrm{AVE}\{U_i\} - \mathrm{AVE}\{A(y_i)\}\,\sqrt{n}\,\hat\mu + o_P(1),$$
where
$$A(y) = |y|^{-1}\left(I_p - |y|^{-2}yy'\right)$$
is the $p \times p$ Hessian matrix (matrix of the second derivatives) of $|y|$. The result follows as
$$\mathrm{AVE}\{A(y_i)\} \to_P A \quad\text{and}\quad \sqrt{n}\,\mathrm{AVE}\{U_i\} \to_d N_p(0, B).$$
Then write
which, under the stated assumption, converge in probability to the population values
Note that à and B̃ would be our estimates for A and for B, respectively, if we knew
the true value μ = 0. Then naturally
à →P A and B̃ →P B.
As " "
" a−b a "" |b|
"
" |a − b| − |a| " ≤ 2 |a| , ∀ a = 0, b
Then
$$\tilde{A} - \hat{A} = \frac{1}{n}\sum_{i=1}^n \left(A(y_i) - A(y_i - \hat\mu)\right) = \frac{1}{n}\sum_{i=1}^n I_{1i}\left[A(y_i) - A(y_i - \hat\mu)\right] + \frac{1}{n}\sum_{i=1}^n I_{2i}\left[A(y_i) - A(y_i - \hat\mu)\right] + \frac{1}{n}\sum_{i=1}^n I_{3i}\left[A(y_i) - A(y_i - \hat\mu)\right].$$
The first average, with nonzero terms in a shrinking neighborhood of $\hat\mu$ only, is zero with a probability
$$P(I_{11} = \cdots = I_{1n} = 0) \ge \left(1 - \frac{\delta_2^p\,cM}{n^{p/2}}\right)^n \ge \left(1 - \frac{\delta_2^2\,cM}{n}\right)^n \to e^{-cM\delta_2^2},$$
where $M = \sup_y f(y) < \infty$ and $c = \pi^{p/2}/\Gamma(p/2 + 1)$ is the volume of the p-variate unit ball. The first average is thus zero with a probability that can be made close to one with small choices of $\delta_2 > 0$. For the second average, one gets
$$\frac{1}{n}\sum_{i=1}^n \left|I_{2i}\left[A(y_i) - A(y_i - \hat\mu)\right]\right| \le \frac{1}{n}\sum_{i=1}^n \frac{6 I_{2i}\,|\hat\mu|}{|y_i - \hat\mu|\,|y_i|} \le \frac{1}{n}\sum_{i=1}^n \frac{6 I_{2i}\,\delta_1}{\delta_2\,|y_i|},$$
which converges to a constant that can be made as close to zero as one wishes with
small δ3 > 0. Finally also the third average
$$\frac{1}{n}\sum_{i=1}^n \left|I_{3i}\left[A(y_i) - A(y_i - \hat\mu)\right]\right| \le \frac{1}{n}\sum_{i=1}^n \frac{6 I_{3i}\,|\hat\mu|}{|y_i - \hat\mu|\,|y_i|} \le \frac{1}{\sqrt{n}}\cdot\frac{1}{n}\sum_{i=1}^n \frac{6 I_{3i}\,\delta_1}{\delta_3\,|y_i|}.$$
Theorems 7.4 and 6.6 thus imply that the distribution of $\hat\mu$ can be approximated by
$$N_p\left(\mu,\ \frac{1}{n}\,\hat{A}^{-1}\hat{B}\hat{A}^{-1}\right),$$
and approximate confidence ellipsoids for $\mu$ can be constructed. In the spherically symmetric case the estimation is much easier as the matrices are simply ($p > 1$)
$$A = \frac{(p-1)\,E[|y_i - \mu|^{-1}]}{p}\,I_p \quad\text{and}\quad B = \frac{1}{p}\,I_p.$$
An estimate of the limiting covariance matrix of the spatial median is then
$$\frac{p}{(p-1)^2\left[\mathrm{AVE}\{|y_i - \hat\mu|^{-1}\}\right]^2}\,I_p.$$
The spatial median is extremely robust: Brown showed that the estimator has a
bounded influence function. It also has a breakdown point of 1/2. See Niinimaa and
Oja (1995) and Lopuhaä and Rousseeuw (1991).
If all components are on the same unit of measurement (and all the components
may be rescaled only in a similar way), the spatial median is an attractive descriptive
measure of location. Rotating the data cloud rotates the median correspondingly;
that is,
Theorem 6.7. Let S = S(Y) be any scatter matrix. Then the transformation-retransformation spatial median
$$\hat\mu_{TR}(Y) = S^{1/2}\,\hat\mu\left(YS^{-1/2}\right)$$
is affine equivariant.
It is remarkable that the almost sure convergence and the limiting normality of
the spatial median did not require any moment assumptions. Therefore, for the trans-
formation, a scatter matrix with weak assumptions should be used as well. It is an
appealing idea also to link the spatial median with Tyler's transformation. This was proposed by Hettmansperger and Randles (2002).
Definition 6.4. Let $\mu$ be a p-vector and $S > 0$ a symmetric $p \times p$ matrix, and define $\varepsilon_i = S^{-1/2}(y_i - \mu)$, $i = 1, ..., n$. The Hettmansperger-Randles (HR) estimates of location and scatter are the values of $\mu$ and S that simultaneously satisfy
$$\mathrm{AVE}\{U(\varepsilon_i)\} = 0 \quad\text{and}\quad p\,\mathrm{AVE}\left\{U(\varepsilon_i)U(\varepsilon_i)'\right\} = I_p.$$
In the HR estimation, the location estimate is the TR spatial median, and the
scatter estimate is Tyler’s estimate with respect to the TR spatial median. The shift
vector and scatter matrix are thus obtained using inner centering and standardization
with the spatial sign score function U(y). The location and scatter estimates are
affine equivariant and apparently estimate $\mu$ and $\Sigma$ in the model (B2) of elliptical symmetry.
Theorem 6.8. Let Y = (y1 , ..., yn ) be a random sample and assume that the yi are
generated by
yi = Ω ε i + μ , i = 1, ..., n,
where
$$E\{U(\varepsilon_i)\} = 0 \quad\text{and}\quad p\,E\left\{U(\varepsilon_i)U(\varepsilon_i)'\right\} = I_p.$$
Then the limiting distribution of $\sqrt{n}(\tilde\mu - \mu)$ is $N_p\left(0,\ p^{-1}S^{1/2}A^{-2}S^{1/2}\right)$, where $A = E\left\{A(S^{-1/2}y_i)\right\}$ and S is Tyler's scatter matrix.
The HR estimate is easy to compute even in high dimensions. The iteration steps
(as in M-estimation) first update the residuals, then the location center, and finally
the scatter matrix as follows.
1. $\varepsilon_i \leftarrow S^{-1/2}(y_i - \mu)$, $i = 1, ..., n$.
2. $\mu \leftarrow \mu + \dfrac{S^{1/2}\,\mathrm{AVE}\{U(\varepsilon_i)\}}{\mathrm{AVE}\{|\varepsilon_i|^{-1}\}}$.
3. $S \leftarrow p\,S^{1/2}\,\mathrm{AVE}\left\{U(\varepsilon_i)U(\varepsilon_i)'\right\}\,S^{1/2}$.
Unfortunately, there is no proof so far for the convergence of the above algorithm
although in practice it always seems to work. There is no proof for the existence or
uniqueness of the HR estimate either. In practice, this is not a problem, however. If,
in the spherical case around the origin, the initial location and shape estimates, say
M and S, are root-n consistent, that is,
$$\sqrt{n}\,M = O_P(1) \quad\text{and}\quad \sqrt{n}(S - I_p) = O_P(1),$$
and tr(S) = p, then the k-step estimates (obtained after k iterations of the above
algorithm) satisfy
$$\sqrt{n}\,M_k = \left(\frac{1}{p}\right)^k \sqrt{n}\,M + \left[1 - \left(\frac{1}{p}\right)^k\right]\frac{p}{(p-1)\,E(r_i^{-1})}\,\sqrt{n}\,\mathrm{AVE}\{u_i\} + o_P(1)$$
and
$$\sqrt{n}\,(S_k - I_p) = \left(\frac{2}{p+2}\right)^k \sqrt{n}\,(S - I_p) + \left[1 - \left(\frac{2}{p+2}\right)^k\right]\frac{p+2}{p}\,\sqrt{n}\left(p\cdot\mathrm{AVE}\{u_i u_i'\} - I_p\right) + o_P(1).$$
Example 6.5. Cork boring data. If we then wish to estimate the unknown spatial median (3-variate case), the regular spatial median and the HR estimate behave quite similarly. See also Figure 6.5.
> summary(est.sign.o.2v)
The spatial median of cork_2v is:
[1] -0.3019 0.0580
Fig. 6.5 The scatterplot with estimated regular spatial median and the HR estimate and their 95%
confidence ellipsoids.
The regular spatial median and HR estimate behave almost identically in the case of bivariate data; see Figure 6.6. However, if the second component is multiplied by 10 (Figure 6.7), the results differ. The equivariant spatial median now has a smaller confidence ellipsoid because the regular spatial median loses efficiency if the marginal variables are heterogeneous in their variation. The R code for this comparison is as follows.
Fig. 6.6 The scatterplot with estimated regular spatial median and the HR estimate and their 95%
confidence ellipsoids.
Fig. 6.7 The estimates with 95% confidence ellipsoids for the regular spatial median and the affine equivariant spatial median (HR estimate).
See Puri and Sen (1971) and Rao (1988) for the asymptotic covariance matrix
of the vector of marginal sample medians. The asymptotic efficiencies naturally
agree with the univariate asymptotic efficiencies. It is not affine invariant but the
transformation-retransformation technique can be used to find the invariant version
of the estimate; see Chakraborty and Chaudhuri (1998).
and the multivariate Liu median (Liu, 1990). For these and other multivariate medians, see the surveys by Small (1990) and Niinimaa and Oja (1999).
Chapter 7
One-sample problem: Spatial signed-rank test
and Hodges-Lehmann estimate
Abstract The spatial signed-rank score function Q(y) is used for the one-sample
location problem. The test is then the spatial signed-rank test, and the estimate is the
spatial Hodges-Lehmann estimate. The tests and estimates based on outer standard-
ization as well as those based on inner standardization are again discussed.
yi = μ + ε i , i = 1, ..., n,
$$E\left(U(\varepsilon_i + \varepsilon_j)\right) = 0, \qquad i \ne j.$$
Again, we start by giving the (theoretical) score function QF , the test statistic
T(Y), and the matrices A and B. The multivariate spatial signed-rank test is obtained
if one uses the spatial signed-rank score function
$$T(y) = Q_F(y) = \frac{1}{2}\,E\left\{U(y - \varepsilon_i) + U(y + \varepsilon_i)\right\}.$$
Then the test for testing $H_0: \mu = 0$ uses the estimated scores $Q_i = Q(y_i)$, $i = 1, ..., n$, and the test statistic is simply the following.
Definition 7.1. The spatial signed-rank test statistic for testing H0 : μ = 0 is the
average of spatial signed-ranks,
T̂(Y) = AVE{Qi }.
As $\mathrm{AVE}\{U(y_i - y_j)\} = 0$, the test statistic can be simplified to
$$\hat{T}(Y) = \frac{1}{2}\,\mathrm{AVE}\left\{U(y_i + y_j)\right\}.$$
The statistic $\hat{T}$ is a V-statistic and asymptotically equivalent to the corresponding U-statistic
$$\tilde{T}(Y) = \mathrm{AVE}_{i<j}\left\{\frac{1}{2}\left[U(y_i - y_j) + U(y_i + y_j)\right]\right\}.$$
For the theory of U-statistics and V-statistics, we refer to Serfling (1980). Asymp-
totic equivalence means that
$$\sqrt{n}\left(\tilde{T}(Y) - \hat{T}(Y)\right) \to_P 0.$$
The test statistic $\hat{T}$ is only orthogonally equivariant, not affine equivariant. Moreover, its finite-sample and asymptotic null distributions depend both on the distribution of the modulus $|y_i|$ and on the distribution of the direction $U_i$. A natural estimate of its asymptotic covariance matrix is the signed-rank covariance matrix,
$$\hat{B} = B(Y) = \mathrm{QCOV}(Y) = \mathrm{AVE}\left\{Q_i Q_i'\right\}.$$
We thus replace the “true” but unknown test statistic T(Y) by test statistic T̂(Y)
which in turn is asymptotically equivalent with T̃(Y). It is straightforward to see
that T(Y) is the projection of T̃(Y) in the sense that
$$T(Y) = \sum_{i=1}^n E\left[\tilde{T}(Y)\,\middle|\,y_i\right]$$
and therefore (use Theorem 5.3.2 in Serfling (1980) with a bounded kernel) the
following holds.
Lemma 7.2. $\sqrt{n}\left[T(Y) - \tilde{T}(Y)\right] \to_P 0$ and also $\sqrt{n}\left[T(Y) - \hat{T}(Y)\right] \to_P 0$.
The regular central limit theorem (CLT) with independent and identically dis-
tributed observations then gives for T̂ = T̂(Y)
Theorem 7.1. Under $H_0: \mu = 0$, $\sqrt{n}\,\hat{T} \to_d N_p(0, B)$.
If
$$E\left(U(y_1 + y_2 - 2\mu)\right) = 0,$$
then the limiting distribution of the test statistic $Q^2$ under $H_0: \mu = 0$ is a chi-square distribution with p degrees of freedom, and the approximate p-values are found as the tail probabilities of $\chi^2_p$.
Recall that Hotelling's test and the test based on spatial signs were constructed in a similar way. The exact p-value for the conditionally distribution-free sign-change test is
$$E_J\left[I\left\{Q^2(JY) \ge Q^2(Y)\right\}\right].$$
yi = Ω ε i + μ , i = 1, ..., n,
Theorem 7.2. Let S = S(Y) be any scatter matrix. Then the signed-rank test statistic
calculated for the transformed data set, Q2 (YS−1/2 ), is affine invariant.
A natural choice for S is the scatter matrix that makes the signed-rank covariance
matrix of the standardized observations proportional to the identity matrix. Let us
give the following definition.
Unfortunately, unlike for Tyler's scatter matrix, there is no proof of the convergence of the algorithm so far, but in practice it always seems to converge.
which is simply np times the ratio of the squared length of the average signed-rank
to the average of squared lengths of signed-ranks. As in the case of the spatial sign
test, we have the next theorem.
Theorem 7.3. The test statistic $Q^2(YS^{-1/2})$ is affine invariant and, under the null hypothesis $H_0: \mu = 0$,
$$Q^2(YS^{-1/2}) \to_d \chi^2_p.$$
yi = Ω ε i + μ , i = 1, ..., n,
where the constant $\tau^2$ depends on the distribution of $|\varepsilon_i|$. But then the test statistic $Q^2(YS^{-1/2})$ is asymptotically equivalent to
$$\frac{p}{4n^3\tau^2}\sum \cos\left(\varepsilon_i + \varepsilon_j,\ \varepsilon_{i'} + \varepsilon_{j'}\right),$$
where $i$, $j$, $i'$, and $j'$ all go over the indices $1, ..., n$. Jan and Randles (1994) constructed
an affine invariant analogue of this test based again on the interdirection counts.
Example 7.1. Cork boring data. Consider again the datasets with a 3-variate vector of E-N, S-N and W-N. The standardized spatial signed-ranks are illustrated in Figure 7.1. The observed value of $Q^2(Y)$, with inner standardization, in the 3-variate case is 13.67 and the corresponding p-value is 0.003.
data: cork_3v
Q.2 = 13.67, df = 3, p-value = 0.003384
alternative hypothesis: true location is not equal to c(0,0,0)
Fig. 7.1 The standardized spatial signed-ranks for the 3-variate data.
Next consider the bivariate data with variables S-N and W-E. The standardized
spatial signed-ranks are illustrated in Figure 7.2. The observed value of Q2 (Y) with
inner standardization is 0.44 with corresponding p-value 0.80.
data: cork_2v
Q.2 = 0.4373, df = 2, p-value = 0.8036
alternative hypothesis: true location is not equal to c(0,0)
Fig. 7.2 The standardized spatial signed-ranks for the 2-variate data.
We now move to the estimation problem and define the multivariate Hodges-
Lehmann estimate of location center μ as the spatial median of all pairwise means,
the Walsh averages
$$\frac{y_i + y_j}{2}, \qquad i, j = 1, ..., n.$$
Definition 7.3. The sample spatial Hodges-Lehmann (HL) estimate μ̂ (Y) mini-
mizes the criterion function
$$D_n(\mu) = \mathrm{AVE}\left\{|y_i + y_j - 2\mu| - |y_i + y_j|\right\}.$$
The link between the HL estimate and the signed-rank test statistic is again that
μ̂ often solves the equation
$$\mathrm{AVE}\left\{U(y_i + y_j - 2\hat\mu)\right\} = 0.$$
The estimate μ̂ as a null value is the value with the highest possible p-value pro-
duced by the spatial signed-rank test. Again, the spatial median is unique if the dimension of the data cloud is greater than one. The iteration step to compute its value is
$$\mu \leftarrow \mu + \frac{1}{2}\cdot\frac{\mathrm{AVE}\{U(y_i + y_j - 2\mu)\}}{\mathrm{AVE}\{|y_i + y_j - 2\mu|^{-1}\}}.$$
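A hedged sketch: the HL estimate computed as the spatial median of the Walsh averages, reusing the Weiszfeld sketch of Chapter 6 (mv.1sample.est with the signed-rank score in MNM is the tested implementation):

> hl.estimate <- function(Y) {
+   pairs <- which(upper.tri(diag(nrow(Y)), diag = TRUE), arr.ind = TRUE)
+   walsh <- (Y[pairs[, 1], , drop = FALSE] +
+             Y[pairs[, 2], , drop = FALSE]) / 2   # (y_i + y_j)/2, i <= j
+   spatial.median.weiszfeld(walsh)
+ }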
Consider next the consistency and limiting distribution of the HL estimate $\hat\mu(Y)$ under mild assumptions.
Assumption 2. The density of $y_i$ is uniformly bounded and continuous, with a unique spatial median minimizing
$$D(\mu) = E\left\{|y_i + y_j - 2\mu| - |y_i + y_j|\right\}, \qquad i \ne j.$$
$$D_n(\mu) \to_P D(\mu) = 4\mu' A\mu + o(|\mu|^2)$$
and
$$n D_n(n^{-1/2}\mu) - 4\left(\sqrt{n}\,T - A\mu\right)'\mu = o_P(1),$$
where
$$A = E\left\{A(y_i + y_j)\right\}, \qquad i \ne j,$$
with $A(y) = |y|^{-1}\left(I_p - |y|^{-2}yy'\right)$. Then Theorem B.5 in Appendix B gives the following result.
lowing result.
Theorem 7.4. Under Assumption 2, $\hat\mu \to_P \mu$ and the limiting distribution of $\sqrt{n}(\hat\mu - \mu)$ is
$$N_p\left(0,\ A^{-1}BA^{-1}\right).$$
We next define sample statistics needed in the estimation of the limiting covari-
ance matrix of the HL estimate. Now write
 = A(Y − 1n μ̂ ) = AVE A (yi + y j − 2 μ̂ ) and B̂ = B(Y − 1n μ̂ )
which, under the stated assumption, converge in probability to the population values
$$A = E\left\{A(\varepsilon_i + \varepsilon_j)\right\},\ i \ne j, \quad\text{and}\quad B = E\left\{Q_F(\varepsilon_i)Q_F(\varepsilon_i)'\right\},$$
respectively. (The proof is similar to that in the spatial median case.) Theorem 7.4
suggests that the distribution of μ̂ (Y) can be approximated by
$$N_p\left(\mu,\ \frac{1}{n}\,\hat{A}^{-1}\hat{B}\hat{A}^{-1}\right).$$
The simultaneous estimates of location and scatter based on the spatial signed-rank
score function are the values of μ and S that satisfy
$$\mathrm{AVE}\{Q(\varepsilon_i)\} = 0 \quad\text{and}\quad p\,\mathrm{AVE}\left\{Q(\varepsilon_i)Q(\varepsilon_i)'\right\} = \mathrm{AVE}\left\{Q(\varepsilon_i)'Q(\varepsilon_i)\right\}\,I_p.$$
This pair of location and scatter estimates is easy to compute. Again the first
iteration step updates the residuals, the second one updates the location center, and
finally the third one updates the scatter matrix as follows.
1. $\varepsilon_i \leftarrow S^{-1/2}(y_i - \mu)$, $i = 1, ..., n$.
2. $\mu \leftarrow \mu + \dfrac{1}{2}\cdot\dfrac{S^{1/2}\,\mathrm{AVE}\{U(\varepsilon_i + \varepsilon_j)\}}{\mathrm{AVE}\{|\varepsilon_i + \varepsilon_j|^{-1}\}}$.
3. $S \leftarrow p\,S^{1/2}\,\dfrac{\mathrm{AVE}\{Q(\varepsilon_i)Q(\varepsilon_i)'\}}{\mathrm{AVE}\{Q(\varepsilon_i)'Q(\varepsilon_i)\}}\,S^{1/2}$.
Note, however, that the location estimate with inner standardization is now for
the transformation-retransformation Hodges-Lehmann center, not for the regular
Hodges-Lehmann center. In the symmetric case the centers of course are the same.
Example 7.2. Cork boring data Consider again the two datasets, one with a 3-
variate vector of E-N, S-N and W-N and one with a bivariate S-N and W-E. We wish
to estimate the Hodges-Lehmann estimates in the 3-variate case and in the bivariate
cases. We give an R code to find the estimate and its covariance matrix. To compare
the estimates, the 95% confidence ellipsoids for the three estimates are illustrated in
Figures 7.3 and 7.4.
The signed-rank scores tests by Puri and Sen (1971) combine marginal signed-rank
scores tests in the widest symmetric nonparametric model. These tests are not affine
Fig. 7.3 The estimates with 95% confidence ellipsoids for the 3-variate data.
Fig. 7.4 The estimates with 95% confidence ellipsoids for the 2-variate data.
invariant and may have poor efficiency if the marginal variables are dependent. In-
variant versions of Puri-Sen tests are obtained if the data points are first transformed
to invariant coordinates; see Chakraborty and Chaudhuri (1999) and Nordhausen
et al. (2006).
The optimal signed-rank scores tests by Hallin and Paindaveine (2002) are based
on standardized spatial signs (or Randles’ interdirections; see Randles (1989) for the
corresponding sign test) and the ranks of Mahalanobis distances between the data
points and the origin. These tests assume ellipticity but do not require any moment
assumption. The tests are optimal (in the Le Cam sense) at correctly specified (ellip-
tical) densities. They are affine-invariant, robust, and highly efficient under a broad
range of densities. Later Oja and Paindaveine (2005) showed that interdirections
together with the so-called lift-interdirections allow for building totally hyperplane-
based versions of these tests. Nordhausen et al. (2009) constructed optimal signed-
rank tests in the independent component model in a similar way.
The sign and signed-rank tests in Hettmansperger et al. (1994) and Hettmansperger et al. (1997) are based on multivariate Oja signs and signed-ranks. They can
be used in all models above, are asymptotically equivalent to spatial sign and signed-
rank tests in the spherical case, and are affine-invariant. However, at the elliptic
model, their efficiency (as well as that of the spatial sign and signed-rank tests) may
be poor when compared with the Hallin and Paindaveine tests.
Chaudhuri (1992) gives Bahadur-type representations for the spatial median and
the spatial Hodges-Lehmann estimate. See also Möttönen et al. (2005) for multi-
variate generalized spatial signed-rank methods.
Chapter 8
One-sample problem: Comparisons of tests
and estimates
Abstract The efficiency and robustness properties of the tests and estimates are
discussed. The estimates (the mean vector, the spatial median, and the spatial
Hodges-Lehmann estimate) are compared using their limiting covariance matrices.
The Pitman asymptotical relative efficiencies (ARE) of the spatial sign and spatial signed-rank tests with respect to Hotelling's $T^2$ are considered in the multivariate t distribution case. The tests using inner and outer standardizations are compared
as well. Simulation studies and some analyses of real datasets are used to illustrate
the difference between the estimates and between the tests.
In this section we first consider the limiting Pitman efficiencies of the spatial sign and signed-rank tests with respect to the classical Hotelling's $T^2$-test in the one-sample location case. In this comparison we assume that Y is a random sample from a
symmetrical distribution around μ with a p-variate density function f (y − μ ). Here
f (y) is a density function symmetrical around the origin; that is, f (−y) = f (y). We
wish to test the null hypothesis
H0 : μ = 0.
We write $L(y) = -\nabla \log f(y)$ for the optimal location score function. We also assume that the Fisher information matrix $I = E\{L(y_i)L(y_i)'\}$ is bounded.
The tests considered are based on a statistic of the form
$$T = T(Y) = \mathrm{AVE}\{T(y_i)\}$$
for a p-vector valued function T(y).
to assume that T is odd so that E{T(yi )} = 0. It is then well known that, if one is
interested in the high efficiency of the test, the best choice for the score function
is the optimal location score $L(y)$; we write
$$L = L(Y) = \mathrm{AVE}\{L(y_i)\}$$
for this optimal test statistic.
yields a test that is asymptotically equivalent with Hotelling’s T 2 and optimal under
multivariate normality.
Using the multivariate central limit theorem we get the following theorem.
Theorem 8.1. Assume that the alternative sequences of the form $H_n: f(y - n^{-1/2}\delta)$ are contiguous, satisfying, under $H_0$,
$$\sum_{i=1}^n \log\frac{f(y_i - n^{-1/2}\delta)}{f(y_i)} = n^{1/2}\,L'\delta - \frac{1}{2}\,\delta' I\delta + o_P(1).$$
Then, under the alternative sequences Hn , the limiting distribution of the test statis-
tic n1/2 T is a p-variate normal distribution with mean vector Aδ and covariance
matrix B.
Corollary 8.1. Under the sequence of contiguous alternatives the limiting distribution of the squared test statistic $Q^2 = n\,T'B^{-1}T$ is a noncentral chi-square distribution with p degrees of freedom and the noncentrality parameter $\delta' A' B^{-1} A\delta$.
In the case of the null distribution, we could show (using Slutsky’s theorem) that
This is then true for the contiguous alternative sequences as well, and the limiting χ p2
distribution with noncentrality parameter δ A B−1 Aδ holds also for the tests where
the true value B is replaced by its convergent estimate B̂.
As all the test statistics have limiting distributions of the same type, $\chi^2_p$, the Pitman asymptotical relative efficiencies (ARE) of the multivariate sign test and multivariate signed-rank test relative to Hotelling's $T^2$ are simply the ratios of the noncentrality parameters,
$$\mathrm{ARE} = \frac{\delta' A' B^{-1} A\delta}{\delta' \Sigma^{-1}\delta}.$$
In the following we compare the efficiency of the tests in the case of spherically
symmetrical distribution F. Assume that F is the distribution of y and write, as
before, r = |y| and u = |y|−1 y. In the spherically symmetric case the Pitman ARE
of the spatial sign test with respect to Hotelling's $T^2$ is then simply
$$\mathrm{ARE}_1 = \left(\frac{p-1}{p}\right)^2 E\{r^2\}\,E^2\{r^{-1}\}.$$
The theoretical signed-rank function Q(y) = q(r)u in the multivariate normal and
in the t distribution cases is given in Examples 4.3 and 4.4. The Pitman efficiency
of the spatial signed-rank test with respect to Hotelling’s test is then
$$\mathrm{ARE}_2 = \frac{\nu(\nu + p)^2}{p(\nu - 2)}\,E^2\left\{\frac{q(r)\,r}{\nu + r^2}\right\}\left[E\{q^2(r)\}\right]^{-1}$$
in the $t_\nu$ distribution case, and
$$\mathrm{ARE}_2 = \frac{1}{p}\,E^2\{q(r)\,r\}\left[E\{q^2(r)\}\right]^{-1}$$
in the multivariate normal case.
Table 8.1 Asymptotic relative efficiencies of the sign test and the signed-rank test relative to
Hotelling’s T 2 under p-variate t distributions with ν degrees of freedom for selected values of p
and ν .
The estimates (the mean vector, the spatial median and the spatial Hodges-Lehmann estimate) are root-n consistent and
$$\sqrt{n}(\hat\mu - \mu) \to_d N_p\left(0,\ A^{-1}BA^{-1}\right),$$
where, as before, the matrices A and B depend on the chosen score T(y) and the distribution F. The comparison of the
estimates is then based on the asymptotic covariance matrix A−1 BA−1 and possible
global measures of variation are, for example, the geometric mean or the arithmetic
mean of the eigenvalues, that is,
$$\det\left(A^{-1}BA^{-1}\right)^{1/p} \qquad\text{or}\qquad \mathrm{tr}\left(A^{-1}BA^{-1}\right)/p.$$
The former is more natural as it is invariant under affine transformations to the orig-
inal observations. Note also that the volume of the approximate confidence ellipsoid
is proportional to det(A−1 BA−1 ). In the case of elliptically symmetric distributions,
it is then enough to consider the spherical cases only and, when comparing two
estimates, the ratio of the geometric means of the eigenvalues is the same as the
asymptotic relative efficiency of the corresponding tests. The asymptotical efficien-
cies listed in Table 8.1, for example, then also hold true for the corresponding esti-
mates.
Example 8.1. Estimates and confidence ellipsoids for data sets with outliers.
In our first example we consider again the 3-variate and bivariate datasets with mea-
surements E-N, S-N and W-N and S-N and W-E, respectively. We use the estimates
with inner standardization in the comparisons. Figure 8.1 shows how different es-
timates and corresponding confidence ellipsoids change if the first observation in
the 3-variate data set is moved to $(-50, 50, -50)'$ (one outlier). The estimates and
the confidence ellipsoids for the original dataset are given in Figure 7.3. Note that
the sample mean and the corresponding confidence ellipsoid react strongly to the
outlying observation. The sample mean moves in the direction of the outlier and the
shape of the ellipsoid is also changing. The R code used for the comparison is as
follows.
Fig. 8.1 The confidence ellipsoids for the contaminated 3-variate data.
In Figure 8.2 a similar behavior of the estimates is illustrated for the bivariate data. Now the first observation is replaced by $(50, 50)'$. The estimates and the confidence ellipsoids for the original data set are given in Figure 7.4. The R code follows.
Fig. 8.2 The confidence ellipsoids for the contaminated bivariate data.
Fig. 8.3 A sample from a bivariate normal distribution: estimates with 95% confidence ellipsoids.
Fig. 8.4 A sample from a 3-variate t3 distribution: estimates with 95% confidence ellipsoids.
As expected, in the multivariate normal case, the accuracies of the sample mean vec-
tor and the Hodges-Lehmann estimate are almost the same (the asymptotic relative
efficiency of the HL-estimate is close to one), and the spatial median is poorest in
this sense. In the heavy-tailed t3 distribution case, the spatial median has the smallest
confidence ellipsoid, and the mean vector is now very poor in its efficiency. These
results are well in accordance with the asymptotical efficiencies reported in Table
8.1. The datasets, estimates, and plots were obtained as follows.
> set.seed(1234)
> X.N <- rmvnorm(100,c(0,0))
>
> est1.N <- mv.1sample.est(X.N)
> est2.N <- mv.1sample.est(X.N, "s", "i")
> est3.N <- mv.1sample.est(X.N, "r", "i")
>
> plotMvloc(est1.N, est2.N, est3.N, X.N, color.ell = 1:3,
lty.ell = 1:3, pch.ell = 15:17)
>
> set.seed(1234)
> X.t3 <- rmvt(100, diag(3), 3)
>
> est1.t3 <- mv.1sample.est(X.t3)
> est2.t3 <- mv.1sample.est(X.t3, "s", "i")
> est3.t3 <- mv.1sample.est(X.t3, "r", "i")
>
> plotMvloc(est1.t3, est2.t3, est3.t3, X.t3, alim = "e",
color.ell = 1:3, lty.ell = 1:3, pch.ell = 15:17)
[Legend of Fig. 8.5: Hotelling T^2; inner spatial sign test; inner spatial signed-rank test; outer spatial sign test; outer spatial signed-rank test.]
Fig. 8.5 The p-values of the tests as a function of the measurement unit. The second component
is multiplied by c.
Example 8.4. Simulation studies to compare the tests. Next we compare the
finite sample efficiencies of the three competing tests, Hotelling’s T 2 -test, the spa-
tial sign test, and the spatial signed-rank test. Again the samples of sizes n = 50
were simulated from a 3-variate standard normal distribution and from a 3-variate
$t_3$ distribution with covariance matrices $I_3$. The powers of the tests for the alternatives $\mu = (0, 0, 0)'$, $(0, 0, 0.25)'$, $(0, 0, 0.50)'$, and $(0, 0, 0.75)'$ were estimated by generating 1000 samples of size n = 50 from all these distributions. In the tests we used the asymptotical critical value $\chi^2_{p,0.95}$. The R code for the simulation in the 3-variate normal case, with the results, follows.
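The Hotelling p-values used below (the Hot.N objects) can be produced in the same way with the default identity score; a hedged sketch:

> set.seed(1)
> Hot.N.0.00 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0)))$p.value)
> Hot.N.0.25 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.25)))$p.value)
> Hot.N.0.50 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.50)))$p.value)
> Hot.N.0.75 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.75)))$p.value)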
> set.seed(1)
> Sign.N.0.00 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0)),score="s",stand="i")$p.value)
> Sign.N.0.25 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.25)),score="s",stand="i")$p.value)
> Sign.N.0.50 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.50)),score="s",stand="i")$p.value)
> Sign.N.0.75 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.75)),score="s",stand="i")$p.value)
>
> set.seed(1)
> Rank.N.0.00 <- replicate(1000, mv.1sample.test(rmvnorm
(50, c(0,0,0)),score="r",stand="i")$p.value)
> Rank.N.0.25 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.25)),score="r",stand="i")$p.value)
> Rank.N.0.50 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.50)),score="r",stand="i")$p.value)
> Rank.N.0.75 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.75)),score="r",stand="i")$p.value)
>
> power.Hot.N <- rowMeans(rbind(Hot.N.0.00, Hot.N.0.25,
Hot.N.0.50, Hot.N.0.75) <= 0.05)
> power.Sign.N <- rowMeans(rbind(Sign.N.0.00, Sign.N.0.25,
Sign.N.0.50, Sign.N.0.75) <= 0.05)
> power.Rank.N <- rowMeans(rbind(Rank.N.0.00, Rank.N.0.25,
Rank.N.0.50, Rank.N.0.75) <= 0.05)
>
> res.N <- cbind(delta= seq(0,0.75,0.25), power.Hot.N,
power.Sign.N, power.Rank.N)
> rownames(res.N) <- NULL
> res.N
delta power.Hot.N power.Sign.N power.Rank.N
[1,] 0.00 0.058 0.040 0.035
[2,] 0.25 0.323 0.235 0.251
[3,] 0.50 0.873 0.747 0.818
[4,] 0.75 0.996 0.984 0.992
>
First note that the spatial sign test and the spatial signed-rank test seem too con-
servative; the true rejection probability seems to be smaller than 0.05. Therefore the
permutation version of the tests should be used for small sample sizes. The Hotelling
test is naturally the best one in this case. Next the results in the t3 distribution case
follow. Again, the spatial sign test is most efficient in this case. This is in agreement
with the asymptotical relative efficiencies of the tests.
> res.T
delta power.Hot.T power.Sign.T power.Rank.T
[1,] 0.00 0.050 0.039 0.031
[2,] 0.25 0.190 0.215 0.204
Chapter 9
One-sample problem: Inference for shape
In this chapter we consider the inference tools for Σ in models (A0), (A1), and
(A2), and the main focus is on the procedures based on the spatial signs and ranks.
The scatter parameter Σ is assumed to be nonsingular and it is decomposed, as
in Section 2.2, into two parts by Σ = σ 2Λ where σ 2 = σ 2 (Σ ) > 0 is a scalar-
valued scale parameter and Λ = σ −2 Σ is a matrix-valued shape parameter. The
scale functional σ 2 (Σ ) is supposed to satisfy
σ 2 (I p ) = 1 and σ 2 (cΣ ) = cσ 2 (Σ ).
Examples of such scale functionals are
$$\Sigma_{11}, \qquad \frac{\mathrm{tr}(\Sigma)}{p}, \qquad \frac{p}{\mathrm{tr}(\Sigma^{-1})}, \qquad\text{or}\qquad \det(\Sigma)^{1/p}.$$
The shape matrix Λ can be seen as a normalized version of the scatter matrix Σ and
is a well-defined parameter in models (A0), (A1), and (A2). See also Paindaveine
(2008). We wish to test the null hypothesis of sphericity, H0 : Λ = I p , and estimate
the unknown value of $\Lambda$. This is not a restriction: if one is interested in testing the null hypothesis $H_0: \Lambda = \Lambda_0$, then one can first transform $y_i \to \Lambda_0^{-1/2} y_i$ and then apply the test to the transformed observations.
Most of the results here are given under the assumption of elliptical symmetry.
Recall that a random variable yi is elliptically symmetric if ε i is spherically sym-
metric. The density function of yi is then of the form
$$\det(\Sigma)^{-1/2} f\left(\Sigma^{-1/2}(y - \mu)\right),$$
where
f (ε ) = exp(−ρ (|ε |))
with some function ρ. The parameter μ is then the symmetry center of the distribution of y_i, and the parameter Σ is a positive definite symmetric p × p scatter matrix. It is easy to see that the shape matrix estimate

Λ̂ = σ²(S)⁻¹ S

then estimates the population shape matrix Λ = σ²(Σ)⁻¹Σ whenever the scatter estimate S is consistent for Σ up to a multiplicative constant.
9.2 Important matrix tools
Here we recall some matrix tools introduced and already used in Section 3.3. See
also Appendix A. As before, K_{p,p} is the commutation matrix, that is, the p² × p² block matrix whose (i, j)-block is the p × p matrix with one at entry (j, i) and zeros elsewhere, and J_{p,p} = vec(I_p)vec(I_p)'. The matrix

C_{p,p} = (1/2)( I_{p²} + K_{p,p} ) − (1/p) J_{p,p}

built from K_{p,p} and J_{p,p} projects a vectorized matrix vec(A) to the space of symmetric and centered vectorized matrices. (Recall that C_{p,p} = P1 + P2, where P1 and P2 are the projections introduced in Section 3.3.) The tests and estimates for the shape parameter are based on the squared norm of such a projection,

Q²(A) = | C_{p,p} vec(A) |².

For a symmetric A this is the squared norm |A − (tr(A)/p) I_p|², and

Q²(A) = 0 ⇔ A ∝ I_p.
The Moore-Penrose inverse of a symmetric matrix A with eigenvalue decomposition A = Σ_{i=1}^r λ_i o_i o_i' (rank r, nonzero eigenvalues λ_1, ..., λ_r) is

A⁻ = Σ_{i=1}^r λ_i⁻¹ o_i o_i'.
9.3 The general strategy for estimation and testing

A general idea to construct tests and estimates for location was to use a p-vector valued score function T(y) yielding individual scores T_i = T(y_i), i = 1, ..., n. To attain affine equivariance/invariance of the location procedures we used either

• Inner standardization of the scores: Find a transformation matrix S^{-1/2} such that, if T̂_i = T(S^{-1/2} y_i), then

p · AVE{ T̂_i T̂_i' } = AVE{ |T̂_i|² } I_p,

or

• Inner centering and standardization of the scores: Find a shift vector μ̂ and a transformation matrix S^{-1/2} such that, if T̂_i = T(S^{-1/2}(y_i − μ̂)), then

AVE{ T̂_i } = 0 and p · AVE{ T̂_i T̂_i' } = AVE{ |T̂_i|² } I_p.
In the first case, depending on the chosen score, one gets a scatter or shape matrix
estimate S = S(Y) with respect to the origin. In the second case, simultaneous es-
timates of location and scatter, μ̂ and S, are obtained. Of course one should check
separately in each case whether the estimates really exist for a data set at hand.
Also it is not at all clear whether the estimates using different scores estimate the
same population quantity. This did not present a problem in the location problem
as the transformation S−1/2 was seen there only as a natural tool to attain affine
invariance/equivariance.
The algorithm for the shape matrix estimate S = S(Y) with respect to the origin
then uses the following two steps.
1. Update the scores:

T̂_i ← T(S^{-1/2} y_i), i = 1, ..., n; T̂ ← (T̂_1, ..., T̂_n)'.

2. Update the shape matrix:

S ← ( p / tr(T̂'T̂) ) · S^{1/2} T̂'T̂ S^{1/2}.
Note that in our approach n⁻¹ T̂'T̂ is COV, UCOV, TCOV, or RCOV, depending on which score function is chosen.
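The following minimal R sketch (not the MNM implementation; the function names sym.power and shape.sign.sketch are hypothetical) illustrates this two-step iteration with the spatial sign score, so that the fixed point is Tyler's shape estimate with respect to the origin.

sym.power <- function(S, a) {
  # symmetric matrix power S^a via the eigendecomposition
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(e$values^a) %*% t(e$vectors)
}

shape.sign.sketch <- function(Y, eps = 1e-6, maxiter = 100) {
  Y <- as.matrix(Y)
  p <- ncol(Y)
  S <- diag(p)
  for (k in seq_len(maxiter)) {
    T.hat <- Y %*% sym.power(S, -1/2)        # step 1: T(S^{-1/2} y_i)
    T.hat <- T.hat / sqrt(rowSums(T.hat^2))  # spatial signs
    V <- crossprod(T.hat) / nrow(Y)          # n^{-1} T-hat' T-hat
    S.half <- sym.power(S, 1/2)
    S.new <- p / sum(diag(V)) * S.half %*% V %*% S.half  # step 2
    S.new <- p * S.new / sum(diag(S.new))    # fix tr(S) = p: shape only
    delta <- max(abs(S.new - S))
    S <- S.new
    if (delta < eps) break
  }
  S
}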
Several approaches based on the regular covariance matrix for testing the shape
in the multivariate normal and elliptic case can be found in the literature. These
tests are thus based on the identity score T(y) = y and on the regular covariance
matrix. Mauchly (1940) showed that in the multivariate normal distribution case the likelihood ratio test statistic for testing sphericity, that is, the null hypothesis H0 : Λ = I_p, is given by

L = [ det(COV) / ( tr(COV)/p )^p ]^{n/2},

where COV = COV(Y) is the regular sample covariance matrix. Note that L is essentially the ratio of the two scale parameters det(COV)^{1/p} and tr(COV)/p. Under the null hypothesis, −2 log L has a limiting χ²_{(p+2)(p−1)/2} distribution. Muirhead and Waternaux (1980) showed
that the test based on L may also be used to test the sphericity under elliptical mod-
els with finite fourth moments. Later, Tyler (1983) obtained a robust version of the
likelihood ratio test by replacing the sample covariance matrix with a robust scatter
matrix estimator.
John (1971) and John (1972) also considered the testing problem in the normal distribution case. He showed that the test based on

Q²_J = (np²/2) | COV/tr(COV) − (1/p) I_p |² = (np²/2) Q²( COV/tr(COV) )
is the locally most powerful invariant test for sphericity under the multivariate nor-
mality assumption. This test is, however, valid only under the multivariate normality
assumption. In the wider elliptical model one can use a slight modification of John’s
test, which remains asymptotically valid under elliptical distributions but of course
needs the assumption of finite fourth moments. The modified John's test is defined as

Q²_J = ( np² / (2(1 + κ_F)) ) Q²( COV/tr(COV) ),

where κ_F is the value of the classical kurtosis measure based on the standardized fourth moment of the marginal distribution, that is, one third of the classical excess kurtosis,

κ_F = E(ε⁴_{ij}) / ( 3 E²(ε²_{ij}) ) − 1.
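As an illustration, a minimal R sketch of the modified John's test might look as follows. The function name mod.john.sketch is hypothetical, and the kurtosis correction is here estimated with a Mardia-type moment estimate (an assumption; mv.shape.test in the MNM package provides the actual tests).

mod.john.sketch <- function(Y) {
  Y <- as.matrix(Y)
  n <- nrow(Y); p <- ncol(Y)
  S <- cov(Y)
  A <- S / sum(diag(S))                    # COV / tr(COV)
  Q2 <- sum((A - diag(p) / p)^2)           # Q^2(A) = |A - (tr A / p) I|^2
  Z <- sweep(Y, 2, colMeans(Y))            # centered observations
  d2 <- rowSums((Z %*% solve(S)) * Z)      # squared Mahalanobis distances
  kappa <- mean(d2^2) / (p * (p + 2)) - 1  # Mardia-type estimate of kappa_F
  Q2J <- n * p^2 / (2 * (1 + kappa)) * Q2
  df <- (p + 2) * (p - 1) / 2
  c(Q2J = Q2J, df = df, p.value = 1 - pchisq(Q2J, df))
}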
To study the asymptotic powers of the tests, we consider sequences of alternatives

H_n : Λ ∝ I_p + n^{-1/2} D,

where D is a symmetric matrix. Note that D fixes the "direction" of the alternative sequence.
The test statistic is proportional to the variance of the eigenvalues of the spatial
sign covariance matrix.
Definition 9.1. The spatial sign test statistic is defined as
Q² = Q²(UCOV) = | C_{p,p} vec(UCOV) |².
Recall that the value of Q2 (S) is equal to zero if and only if S ∝ I p . It is remark-
able that the finite sample (and limiting) null distribution of Q2 is the same for all
spherical distributions as the sign covariance matrix UCOV depends on the observa-
tions only through their direction vectors. The limiting distribution of Q2 under the
null hypothesis as well as under the alternative sequence is given by the following.
In the wider model (A3), where the shape parameter Λ is still well defined, the covariance matrix of vec(UCOV) can be estimated by

COV̂(vec(UCOV)) = (1/n) [ AVE{ U_i U_i' ⊗ U_i U_i' } − vec(UCOV) vec(UCOV)' ].

In the elliptic case n · COV(vec(UCOV)) →_P τ_1 C_{p,p}, and the statistic (which is valid in the wider model)

( C_{p,p} vec(UCOV) )' [ COV̂(vec(UCOV)) ]⁻ ( C_{p,p} vec(UCOV) )

has a limiting chi-square distribution with (p + 2)(p − 1)/2 degrees of freedom under the null hypothesis.
Next we introduce the shape matrix estimate S = S(Y) corresponding to the spa-
tial sign score and give its limiting distribution in the elliptic case. This estimate
was already used to standardize the observations in the one sample location testing
problem.
Definition 9.2. The Tyler shape estimate S based on spatial signs is the matrix that
solves
UCOV( Y S^{-1/2} ) = (1/p) I_p.   (9.1)
The estimate was given in Tyler (1987) where the limiting distribution is also
found.
Theorem 9.2. Under elliptical symmetry with shape parameter Λ , the limiting dis-
tribution of the shape estimate S is given by
√n vec(S − Λ) →_d N_{p²}( 0, (p + 2)² τ_1 C_{p,p}(Λ) ).
The case of unknown location μ is also considered in Tyler (1987). It is, for example, possible to replace μ with a √n-consistent estimate μ̂ without affecting the asymptotic properties of UCOV or S. As mentioned before, Hettmansperger and
Randles (2002) propose a simultaneous estimation of the multivariate median μ and
a shape matrix Λ .
For the null hypothesis H0 : Λ = I p , the multivariate Kendall’s tau-type rank test
statistic TCOV = TCOV(Y) is constructed in exactly the same way as the sign
test statistic but for the pairwise differences. We denote the pairwise differences by
yi j = yi − y j and their spatial signs by Ui j = U(yi j ), 1 ≤ i, j ≤ n.
Definition 9.3. The Kendall’s tau covariance matrix is defined as
TCOV = AVE_{i<j} { U_ij U_ij' }.
This matrix is introduced and studied in Visuri et al. (2000). Because vec(TCOV) is a U-statistic with a bounded vector-valued kernel

h(y_i, y_j) = U_ij ⊗ U_ij,

it is asymptotically normal; under the null hypothesis, E(C_{p,p} vec(TCOV)) = 0 and COV(C_{p,p} vec(TCOV)) = (τ_F/n) C_{p,p} + o(1/n) with some τ_F > 0. In the general case, the covariance may be estimated by

COV̂(vec(TCOV)) = (4/n) [ AVE{ (U_ij U_ik') ⊗ (U_ij U_ik') } − vec(TCOV) vec(TCOV)' ].
The test statistic based on Kendall’s tau covariance matrix is defined as follows.
Definition 9.4. The Kendall's tau test statistic is defined as

Q² = Q²(TCOV) = | C_{p,p} vec(TCOV) |².

A version that is valid in the wider model is again given by

( C_{p,p} vec(TCOV) )' [ COV̂(vec(TCOV)) ]⁻ ( C_{p,p} vec(TCOV) ),

which has a limiting chi-square distribution with (p + 2)(p − 1)/2 degrees of freedom under the null hypothesis.
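A direct R sketch of TCOV of Definition 9.3 is given below; the function name is hypothetical, and the computation uses all n(n−1)/2 pairs and is therefore feasible only for moderate n.

tcov.sketch <- function(Y) {
  Y <- as.matrix(Y)
  n <- nrow(Y)
  idx <- t(combn(n, 2))                            # all pairs i < j
  D <- Y[idx[, 1], , drop = FALSE] - Y[idx[, 2], , drop = FALSE]
  U <- D / sqrt(rowSums(D^2))                      # spatial signs U_ij
  crossprod(U) / nrow(U)                           # AVE_{i<j} { U_ij U_ij' }
}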
The Kendall’s tau test statistic naturally leads to a companion estimate of shape.
Definition 9.5. The Dümbgen shape estimate S, based on the spatial signs of pairwise differences, is the one that solves

TCOV( Y S^{-1/2} ) = (1/p) I_p.
The estimator was first introduced by Dümbgen (1998) and further studied by
Sirkiä et al. (2007) as a member of a class of symmetrized M-estimates of scatter.
The algorithm to calculate S is similar to that for calculating Tyler’s estimate. The
breakdown properties were considered in Dümbgen and Tyler (2005). The Dümbgen
estimate is again affine equivariant in the sense that S(YA') ∝ A S(Y) A'.
The limiting distribution in the elliptic case is given in the following theorem.
Theorem 9.4. Under an elliptical distribution with shape parameter Λ, the limiting distribution of the shape estimate S is given by

√n vec(S − Λ) →_d N_{p²}( 0, (p + 2)² τ_F C_{p,p}(Λ) ).
9.6 Tests and estimates based on RCOV

The Spearman's rho-type test statistic for the null hypothesis H0 : Λ = I_p is constructed in the same way as the spatial sign test statistic but using the spatial rank covariance matrix RCOV = RCOV(Y) instead of the spatial sign covariance matrix. Denote again y_ij = y_i − y_j and U_ij = U(y_ij).
Definition 9.6. The spatial rank covariance matrix is defined as
RCOV = AVE{ R_i R_i' } = AVE{ U_ij U_ik' }.
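Similarly, a minimal R sketch of RCOV (again with a hypothetical function name) first computes the spatial ranks and then averages their outer products.

rcov.sketch <- function(Y) {
  Y <- as.matrix(Y)
  n <- nrow(Y)
  R <- t(sapply(1:n, function(i) {
    D <- sweep(-Y, 2, -Y[i, ])             # rows y_i - y_j, j = 1, ..., n
    len <- sqrt(rowSums(D^2)); len[i] <- 1 # avoid 0/0 for j = i
    colMeans(D / len)                      # spatial rank R_i
  }))
  crossprod(R) / n                         # AVE { R_i R_i' }
}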
This matrix is considered in Marden (1999b) and Visuri et al. (2000). Now
vec(RCOV) is (up to a constant) asymptotically equivalent to a U-statistic with the symmetric kernel

h(y_1, y_2, y_3) = vec( U_12 U_13' + U_13 U_12' + U_21 U_23' + U_23 U_21' + U_31 U_32' + U_32 U_31' ),

covering all six possible permutations of the three arguments. Then, under the null hypothesis,

E( C_{p,p} vec(RCOV) ) = 0 and COV( C_{p,p} vec(RCOV) ) = (τ_F/n) C_{p,p} + o(1/n),

again with some constant τ_F. In the general case, the covariance may be estimated with

COV̂( C_{p,p} vec(RCOV) ) = (9/n) [ AVE{ h(y_i, y_j, y_k) h(y_i, y_l, y_m)' } − vec(RCOV) vec(RCOV)' ].
The corresponding shape estimate S = S(Y) based on the spatial ranks solves

RCOV( Y S^{-1/2} ) ∝ I_p.

In practice the algorithm always seems to yield a unique solution. The following theorem gives the limiting distribution of the shape estimator, assuming that it is unique and √n-consistent.
Theorem 9.6. Under the assumptions above, for an elliptical distribution with
shape parameter Λ , the limiting distribution of the shape estimate S is given by
√n vec(S − Λ) →_d N_{p²}( 0, (p + 2)² τ_F (c²_F)⁻¹ C_{p,p}(Λ) ).
9.7 Limiting efficiencies
We next compare the sphericity tests based on UCOV, TCOV, and RCOV to the
classical (modified) John’s test. The modified John’s test is based on the test statistic
Q²_J = ( np² / (2(1 + κ_F)) ) Q²( COV/tr(COV) ),
where κF is the classical kurtosis measure of the marginal distribution. The limiting
distribution of the modified John’s test under the alternative sequence is derived in
Hallin and Paindaveine (2006) and is given in the following.
Theorem 9.7. Under the alternative sequence Hn ,
Q²_J →_d χ²_{(p+2)(p−1)/2}( (2(1 + κ_F))⁻¹ Q²(D) ).
The limiting distributions of the different test statistics are of the same type, and the efficiency comparisons may therefore simply be based on their noncentrality parameters. The Pitman asymptotic relative efficiencies of the tests based on UCOV, TCOV, and RCOV with respect to the modified John's test (based on COV) then reduce to ratios of the corresponding noncentrality parameters. Recall that κ_F, τ_F, and c²_F are constants depending on the underlying distribution.
Note that the Pitman AREs give the asymptotic relative efficiencies of the corresponding shape estimates as well.
In Table 9.1, the limiting efficiencies are given under t-distributions with some
selected dimensions p and some degrees of freedom ν , with ν = ∞ referring again to
the multivariate normal case. Note that κF = 0 for the multivariate normal distribu-
tion, and κF = 2/(ν − 4) for the multivariate tν distribution. Formulas for calculat-
ing the c²_F coefficients can be found in Möttönen et al. (1997).

Table 9.1 Asymptotic relative efficiencies of tests (estimates) based on UCOV, TCOV, and RCOV relative to the test based on COV for different t-distribution cases with selected values of dimension p and degrees of freedom ν

One can see that the rank-based tests based on TCOV and RCOV behave very similarly and are highly
efficient even in the normal case. The test based on UCOV is less efficient than
the rank-based tests but still outperforms the classical test for heavy-tailed distribu-
tions. Note that the efficiencies increase with dimension. See Sirkiä et al. (2008) for
a more complete discussion and for the finite-sample efficiencies.
9.8 Examples
Example 9.1. Cork boring data: The tests and estimates for shape. To illustrate
the estimated shape matrices we plot the corresponding estimates of the 50 % toler-
ance regions. The estimated tolerance regions based on a location estimate T and a shape estimate S are constructed as follows. First calculate the squared Mahalanobis distances based on T and S, that is,

r²_i = (y_i − T)' S⁻¹ (y_i − T), i = 1, ..., n.

The estimated 50% tolerance region is then the ellipsoid that contains the half of the observations with the smallest distances, that is,

{ y : (y − T)' S⁻¹ (y − T) ≤ median(r²_1, ..., r²_n) }.
We compare the tolerance ellipsoids for the shape matrices based on COV,
UCOV (Tyler’s shape), and TCOV (Dümbgen’s shape). The corresponding loca-
tion estimates are the sample mean, the (affine equivariant) spatial median, and the
(affine equivariant) Hodges-Lehmann estimate. Note that if we are interested in the
shape matrix, only the shape (not the size or location) of the tolerance ellipsoid is
relevant. The shape should be circular or spherical in the case that Λ = I p . The
tolerance ellipsoids for the 3-variate cork boring data are given in Figure 9.1.
> data(cork)
> cork_3v <- sweep(cork[,2:4], 1, cork[,1], "-")
> colnames(cork_3v) <- c("E_N", "S_N", "W_N")
>
> EST1 <- list(location = colMeans(cork_3v),
scatter = cov(cork_3v),
est.name = "COV")
>
> HR.cork_3v <- HR.Mest(cork_3v)
> EST2 <- list(location = HR.cork_3v$center,
scatter = HR.cork_3v$scatter,
est.name = "Tyler")
>
Fig. 9.1 The shape matrix estimates (COV, Tyler, Duembgen) for the 3-variate data.
The ellipsoids do not seem spherical but only the test based on the regular co-
variance matrix gets a small p-value. (The sample size is too small for a reliable
inference on the shape parameter.) The p-values are as follows.
> mv.shape.test(cork_3v)
data: cork_3v
L = 0.0037, df = 5, p-value = 0.04722
data: cork_3v
data: cork_3v
Q2 = 9.232, df = 5, p-value = 0.1002
Consider the bivariate case next. The estimated shape matrices are illustrated
in Figure 9.2. As in the 3-variate case, the shape estimates based on the regular
covariance matrix and TCOV are close to each other. The p-values are now
Fig. 9.2 The shape matrix estimates (COV, HR, Duembgen) for the 2-variate data.
> mv.shape.test(cork_2v)
data: cork_2v
L = 0.0748, df = 2, p-value = 0.07478
data: cork_2v
Q2 = 0.1885, df = 2, p-value = 0.91
data: cork_2v
Q2 = 2.637, df = 2, p-value = 0.2675
>
Example 9.2. Comparison of the tests and estimates in the t3 case. To illustrate
the finite sample efficiencies of the estimates for a heavy-tailed distribution we gen-
erated a random sample of size n = 150 from a spherical 3-variate t3 distribution.
The null hypothesis H0 : Λ = I p is thus true. The three shape estimates based on
COV, UCOV, and TCOV are illustrated in Figure 9.3. The R code for getting the
figure follows.
> set.seed(1234)
> X<-rmvt(150, diag(3),3)
>
>
> EST1 <- list(location = colMeans(X), scatter = cov(X),
est.name = "COV")
>
> HR.X<-HR.Mest(X)
> EST2 <- list(location = HR.X$center, scatter = HR.X$scatter,
est.name = "Tyler")
>
> EST3 <- list(location = mv.1sample.est(X, score = "rank",
stand = "inner")$location,
scatter = duembgen.shape(X), est.name = "Duembgen")
>
> plotShape(EST1, EST2, EST3, X, lty.ell = 1:3,
pch.ell = 14:16, level = 0.95)
The shape of the regular covariance matrix clearly differs most from the spherical
shape. This can also be seen from the p-values below. The regular covariance matrix
does not seem too reliable in the heavy-tailed distribution case.
Fig. 9.3 The shape matrix estimates (COV, Tyler, Duembgen) for a simulated sample from a spherical 3-variate t3 distribution.
> mv.shape.test(X)
data: X
L = 0, df = 5, p-value = 2.371e-13
data: X
Q2 = 2.476, df = 5, p-value = 0.7801
data: X
Q2 = 1.323, df = 5, p-value = 0.9326
9.9 Principal component analysis based on spatial signs and ranks
Let again Y = (y1 , ..., yn ) be a random sample from a p-variate elliptical distribution
with the cumulative distribution function F. Write
COV(F) = O D O'
for the eigenvector and eigenvalue decomposition of the covariance matrix. Thus
O is the matrix of eigenvectors and D is the diagonal matrix of eigenvalues of
COV(F). The orthogonal matrix O is then used to transform the random vector
yi to a new coordinate system
z_i = O' y_i, i = 1, ..., n.
The components of zi in this new coordinate system are called the principal com-
ponents. The principal components are then uncorrelated and ordered according to
their variances (diagonal elements of D). In the multivariate normal case the princi-
pal components are independent. In principal component analysis (PCA) one wishes
to estimate the transformation O to principal components as well as the variances D.
PCA is often used to reduce the dimension of the original vector from p = p1 + p2
to p1 , say. If
O = (O1 , O2 ),
where O_1 is a p × p_1 matrix and O_2 a p × p_2 matrix, then the original observations may be replaced by the transformed p_1-variate observations O_1'y_i. In the elliptical model all scatter functionals are proportional to each other and therefore come with the same principal component transformation. The same is naturally true for the
scatter or shape matrix functionals S that are based on UCOV and TCOV, namely
for Tyler’s shape estimate and Dümbgen shape estimate. This means that the eigen-
vectors of the sample matrices can be used to estimate the unknown population
eigenvectors. Locantore et al. (1999) and Marden (1999b) proposed the use of the
spatial signs and ranks for a simple robust alternative of classical PCA. See also
Visuri et al. (2000) and Croux et al. (2002).
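A minimal sketch of such a sign-based PCA, with a hypothetical wrapper name, estimates the transformation by the eigenvectors of Tyler's shape matrix computed with HR.Mest (assuming the MNM package is loaded; mvPCA in MNM should be used in practice).

sign.pca.sketch <- function(Y) {
  HR <- HR.Mest(Y)                          # simultaneous location and Tyler's shape
  e <- eigen(HR$scatter, symmetric = TRUE)  # eigenvectors estimate O
  Z <- sweep(as.matrix(Y), 2, HR$center)    # centered observations
  list(loadings = e$vectors,
       scores = Z %*% e$vectors)            # principal components O'(y_i - mu)
}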
All the scatter and shape matrices mentioned above are root-n consistent and
have a limiting multivariate normal distribution. We next find the limiting distribution of the corresponding sample eigenvector matrix, that is, of the estimate of the principal component transformation. For simplicity we assume that O = I_p and that
the eigenvalues listed in D are distinct. (The limiting distribution in the general case
is found simply by using the rotation equivariance properties of the estimates.) Then
we have the following: the diagonal elements of the estimated transformation Ô satisfy

√n ( Ô_ii − 1 ) = o_P(1), i = 1, ..., p,

and the efficiencies of the shape matrices then give the efficiencies for the eigenvectors as well.
Fig. 9.4 Air pollution dataset. The marginal variables are standardized with MAD.
> data(usair)
> pairs(usair)
>
> # the dataset as used in Everitt + giving the variables names
>
> usair2 <- usair[,-1]
> usair2 <- transform(usair2, x1 = -x1)
> colnames(usair2) <- c("NegTemp", "LargeF", "Pop", "Wind",
"AvRain", "DaysRain")
>
> mads <- apply(usair2, 2, mad)
> usair3 <- sweep(usair2,2,mads, "/")
>
> pairs(usair3)
>
The next step is to calculate the three shape matrices: Tyler’s and Dümbgen’s
shape matrices and the one based on the regular covariance matrix. (The corre-
sponding transformations standardize UCOV, TCOV and COV, resp.) As some of
the marginal distributions are strongly skewed, the shape of the regular covariance
matrix clearly differs most from the spherical shape. Unlike the spatial sign- and
rank-based shape matrices, the covariance matrix is very sensitive to heavy tails,
which can be clearly seen from Figure 9.5. One can then expect that the results in
the PCA can be quite different.
Fig. 9.5 The shape matrix estimates (Cov, Tyler's shape matrix, Duembgen's shape matrix) for the standardized air pollution data.
>
> COV <- cov(usair3)
> SI <- HR.Mest(usair3)
> Dumb <-duembgen.shape(usair3)
> rank.inner <-rank.shape(usair3)
>
> aff.HL <- mv.1sample.est(usair3, score = "rank",
stand = "inner")$location
>
> classical <- list(location= colMeans(usair3),
scatter= COV / sum(diag(COV)) * 6, est.name="Cov")
> signs.inner <- list(location= SI$center,
scatter= SI$scatter / sum(diag(SI$scatter)) * 6,
est.name="Tyler’s shape matrix")
> symm.signs.inner <- list(location=aff.HL,
scatter= Dumb / sum(diag(Dumb)) * 6 ,
est.name="Duembgen’s shape matrix")
> ranks.inner <- list(location= aff.HL,
    scatter= rank.inner / sum(diag(rank.inner)) * 6,
    est.name="Rank shape matrix")
>
>
> plotShape(classical, signs.inner, symm.signs.inner,
    lty.ell= 1:3, pch.ell= 15:17,
    x.legend= -3, y.legend= -1.2,
    labels= colnames(usair3), cex.labels = 1.5)
In the following the R-function with the three scores is applied to get the results
in the corresponding PCA. We first compare the similarity of the results by calcu-
lating the correlations between the principal components coming from different ap-
proaches. The correlations show some similarity between the rank- and sign-based
solutions whereas the regular PCA solution differs from the others.
>
> PCA.identity <- mvPCA(usair3, score = "identity")
> PCA.signs.inner <- mvPCA(usair3, score = "sign",
estimate= "inner")
> PCA.symm.signs.inner <- mvPCA(usair3, score = "sym",
estimate= "inner")
> round(cor(PCA.signs.inner$scores,PCA.identity$scores),2)
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
Comp.1 0.81 -0.34 -0.47 0.09 0.01 0.00
Comp.2 -0.91 -0.16 -0.37 -0.07 -0.01 0.01
Comp.3 -0.41 -0.71 0.36 0.45 -0.04 0.00
Comp.4 0.22 -0.75 0.30 -0.54 -0.01 -0.01
Comp.5 -0.58 -0.22 0.14 0.08 0.74 -0.20
Comp.6 0.14 -0.14 0.12 -0.03 0.38 0.89
> round(cor(PCA.symm.signs.inner$scores,PCA.identity$scores),2)
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
Comp.1 0.99 -0.10 -0.12 0.04 0.01 0.00
Comp.2 -0.45 -0.68 -0.57 0.03 0.00 0.00
Comp.3 0.06 -0.73 0.67 0.16 -0.01 0.00
Comp.4 -0.23 0.34 -0.16 0.90 -0.01 0.00
Comp.5 -0.32 -0.11 0.07 0.04 0.93 -0.14
Comp.6 0.17 -0.04 0.05 0.00 0.23 0.96
> round(cor(PCA.symm.signs.inner$scores,
PCA.signs.inner$scores),2)
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
Comp.1 0.89 -0.84 -0.36 0.24 -0.55 0.14
Comp.2 0.14 0.73 0.48 0.23 0.33 -0.04
Comp.3 0.00 -0.19 0.80 0.67 0.22 0.18
Comp.4 -0.14 0.16 0.21 -0.84 0.11 -0.12
Comp.5 -0.24 0.27 0.22 0.00 0.94 0.21
Comp.6 0.12 -0.17 -0.03 0.07 -0.10 0.98
We then compare the results based on the regular shape matrix with those based
on Tyler’s shape matrix. The principal components coming from the regular shape
matrix may now be easier to interpret. The first one is related to the human popula-
tion, the second one to the rain conditions, the third one to other climate variables,
and the fourth one to the wind. The robust shape matrices cut down the effects of a few
outlying observations, the few cities with very high population and manufacturing
as well as the few cities with a hot climate and low precipitation. The results com-
ing from the robust PCA as reported below (based on Tyler’s shape matrix) therefore
differ a lot from the results of the regular PCA. The first principal component may
be related to the climate in general, the second one to the human population, the
third one to the rain, and the fourth one to the wind.
Comp.5 Comp.6
0.01368 0.008397
0.99160 1.000000
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
NegTemp 0.703 0.227 0.556 0.371
LargeF -0.780 0.272 -0.554
Pop -0.604 -0.176 -0.330 0.702
Wind -0.121 0.405 -0.893 -0.115
AvRain -0.755 -0.383 -0.188 0.461 0.188
DaysRain -0.649 0.402 0.327 -0.531 -0.152
> PCA.signs.inner
PCA for usair3 based on Tyler’s shape matrix
Standardized eigenvalues:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
2.4436 1.5546 1.1125 0.6872 0.1193 0.0829
Comp.5 Comp.6
0.01988 0.01382
0.98618 1.00000
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
NegTemp -0.453 -0.402 0.353 0.458 0.546
LargeF -0.389 0.564 0.164 -0.164 0.520 -0.454
9.10 Other approaches
Hallin and Paindaveine (2006) proposed test statistics for sphericity based on spatial
sign vectors ui = |yi |−1 yi and the ranks of lengths ri = |yi |, i = 1, ..., n. Their test
statistics are of the same form as that defined in John (1971), but instead of the
sample covariance matrix, they use AVE{K(Ri /(n + 1))ui ui }, where K = Kg is a
score function corresponding to spherical density g and Ri denotes the rank of ri
among r1 , ..., rn , i = 1, ..., n. The Hallin and Paindaveine (2006) tests appear to be
valid without any moment assumptions and asymptotically optimal if f = g.
Chapter 10
Multivariate tests of independence
As (X, Y) is a random sample, the rows are independent. Our null hypothesis of the independence of the x- and y-variables can then be written as H0 : F_{(x,y)} = F_x F_y, where F_x and F_y are the marginal cumulative distribution functions.
We wish to use general score functions Tx (x) and Ty (y) in the test construction.
We again use the identity score function, the spatial sign score function, and the
spatial rank score functions so that Tx (x) is an r-vector and Ty (y) is an s-vector,
where

T̂_X' 1_n = T̂_Y' 1_n = 0

and

r · T̂_X'T̂_X = |T̂_X|² I_r and s · T̂_Y'T̂_Y = |T̂_Y|² I_s.

Under general assumptions (which should be checked separately for each score function) and under the null hypothesis, the limiting distribution of

Q² = Q²(X, Y) = nrs |T̂_X'T̂_Y|² / ( |T̂_X|² |T̂_Y|² )

is a chi-square distribution with rs degrees of freedom.
Note also that, as the inner and outer centering and standardization are permutation invariant,

Q²(X, PY) = nrs |T̂_X'PT̂_Y|² / ( |T̂_X|² |T̂_Y|² ),

so that simply |T̂_X'PT̂_Y|² can be used in practice to find the permutation p-value.
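A minimal R sketch of this permutation strategy is given below; the function name is hypothetical, and TX and TY are assumed to be the inner centered and standardized score matrices (n × r and n × s).

perm.ind.sketch <- function(TX, TY, nrep = 1000) {
  stat <- function(M) sum(crossprod(TX, M)^2)       # |T.X' T.Y|^2
  obs <- stat(TY)
  reps <- replicate(nrep,
    stat(TY[sample(nrow(TY)), , drop = FALSE]))     # permute the rows of T.Y
  mean(reps >= obs)                                 # permutation p-value
}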
The classical parametric test due to Wilks (1935) is the likelihood ratio test statistic
in the multivariate normal model and is based on
W = W(X, Y) = det(COV) / ( det(COV_11) det(COV_22) ).
Another classical test for independence is Pillai’s trace test statistic (Pillai (1955))
which uses
W* = W*(X, Y) = tr( COV_11⁻¹ COV_12 COV_22⁻¹ COV_21 ).
Pillai's trace statistic and Wilks' test statistic are asymptotically equivalent in the sense that, if the fourth moments exist and the null hypothesis is true, then

nW* + n log W →_P 0.
If we follow our general strategy with the identity score function (T_x(x) = x and T_y(y) = y), then it is easy to see that Q² = nW*, Pillai's trace statistic.
10.3 Tests based on spatial signs and ranks

In the following we describe the tests that generalize the popular univariate tests due
to Blomqvist (1950), Spearman (1904), and Kendall (1938) to any dimensions r and
s. The tests provide practical and robust but still efficient alternatives to multivariate
normal theory methods.
Extension of Blomqvist quadrant test. The test is thus based on the r- and s-
variate spatial sign scores U(x) and U(y). To make the test statistic affine invariant,
we first construct inner centered and standardized spatial signs separately for X and
Y and transform
(X, Y) → (ÛX , ÛY ).
(The outer centering and standardization yield procedures that are only rotation in-
variant.) Recall that the inner centering and standardization are accomplished by the
simultaneous Hettmansperger-Randles estimates of multivariate location and shape;
see Definition 6.4. The inner centered and standardized spatial signs satisfy

Û_X' 1_n = 0 and r · Û_X'Û_X = n I_r,

and similarly

Û_Y' 1_n = 0 and s · Û_Y'Û_Y = n I_s.
The test statistic is then
Q² = Q²(X, Y) = (rs/n) | Û_X'Û_Y |².
To consider the limiting null distribution, assume that the observations are generated by

(X, Y) = ( 1_n μ_x' + ε_x Ω_x', 1_n μ_y' + ε_y Ω_y' ),

where E{U(ε_xi)} = 0 and r · E{U(ε_xi)U(ε_xi)'} = I_r, and

E{U(ε_yi)} = 0 and s · E{U(ε_yi)U(ε_yi)'} = I_s.
Note that, for (ε_x)_i and (ε_y)_i separately, this is a wider model than the model (B2) discussed in Chapter 2. The spatial median and Tyler's shape matrix of ε_xi (and similarly of ε_yi) are a zero vector and an identity matrix, respectively. If μ̂_x and μ̂_y are any root-n consistent estimates of μ_x and μ_y, and Σ̂_x and Σ̂_y are any root-n consistent estimates of Σ_x = Ω_xΩ_x' and Σ_y = Ω_yΩ_y' (up to a constant), respectively, and

(Û_X)_i = U( Σ̂_x^{-1/2}(x_i − μ̂_x) ) and (Û_Y)_i = U( Σ̂_y^{-1/2}(y_i − μ̂_y) ),

i = 1, ..., n, one can show as in Taskinen et al. (2003), using the expansions in Section 6.1.1, that, under the null hypothesis, Q² →_d χ²_{rs}. (It naturally remains to show that the location and shape estimates used here are root-n consistent.)
Extension of Spearman's rho. The test is based on the spatial rank scores R_X(x) and R_Y(y) with dimensions r and s, respectively. We then calculate the inner standardized spatial ranks separately for X and Y and transform

(X, Y) → (R̂_X, R̂_Y).

Note that, in this case, the scores are automatically centered, and the inner centering is therefore not needed. The inner standardized spatial ranks satisfy

R̂_X' 1_n = 0 and r · R̂_X'R̂_X = |R̂_X|² I_r,

and

R̂_Y' 1_n = 0 and s · R̂_Y'R̂_Y = |R̂_Y|² I_s.

The test statistic is then

Q² = Q²(X, Y) = nrs |R̂_X'R̂_Y|² / ( |R̂_X|² |R̂_Y|² ).
Next consider the limiting null distribution of Q2 . For considering the asymp-
totic behavior of the test, we assume that there are (up to a constant) unique scatter
matrices (parameters) Σ x and Σ y and positive constants τx2 and τy2 which satisfy
E{ U( Σ_x^{-1/2}(x_1 − x_2) ) U( Σ_x^{-1/2}(x_1 − x_3) )' } = τ_x² I_r

and

E{ U( Σ_y^{-1/2}(y_1 − y_2) ) U( Σ_y^{-1/2}(y_1 − y_3) )' } = τ_y² I_s,
respectively, and that Σ̂ x and Σ̂ y are root-n consistent estimates of Σ x and Σ y , again
up to a constant. If we then choose
(R̂_X)_i = AVE_j { U( Σ̂_x^{-1/2}(x_i − x_j) ) } and (R̂_Y)_i = AVE_j { U( Σ̂_y^{-1/2}(y_i − y_j) ) },
i = 1, ..., n, one can show again as in Taskinen et al. (2005) that, under these mild
assumptions and under the null hypothesis, Q2 →d χrs 2 . (For our proposal above,
one needs to show that the shape matrix estimate based on the spatial ranks is root-n
consistent.)
To illustrate and compare the efficiencies of different test statistics for independence,
we derive the limiting distributions of the test statistics under specific contiguous al-
ternative sequences. Let x∗i and y∗i be independent with spherical marginal densities
exp{−ρ_x(|x|)} and exp{−ρ_y(|y|)}, and write, for some choices of M_1 and M_2,

( x_i', y_i' )' = [ (1 − Δ)I_r  ΔM_1 ; ΔM_2  (1 − Δ)I_s ] ( x_i*', y_i*' )'

with Δ = δ/√n. Let f_Δ be the density of (x_i', y_i')'. Note that the joint distribution of (x_i', y_i')' is not spherically symmetric any more; the only exception is the
multivariate normal case. The optimal likelihood ratio test statistic for testing H0
against H_Δ is then

L = Σ_{i=1}^n { log f_Δ(x_i, y_i) − log f_0(x_i, y_i) }.
We need the general assumption (which must be checked separately in each case)
that, under the null hypothesis,
L = (δ/√n) Σ_{i=1}^n [ ( r − ψ_x(r_xi) r_xi + ψ_x(r_xi) r_yi u_xi'M_1 u_yi ) + ( s − ψ_y(r_yi) r_yi + ψ_y(r_yi) r_xi u_yi'M_2 u_xi ) ] + o_P(1),
where r_xi = |x_i|, u_xi = |x_i|⁻¹ x_i, and ψ_x(r) = ρ_x'(r), and similarly for r_yi, u_yi, and
ψy (r). Under this assumption the sequence of alternatives is contiguous to the null
hypothesis.
Under the above sequence of alternatives we get the following limiting distribu-
tions.
Theorem 10.1. Assume that max(r, s) > 1. The limiting distribution of the multivariate Blomqvist statistic under the sequence of alternatives is a noncentral chi-square distribution with rs degrees of freedom and noncentrality parameter

( δ²/(rs) ) | c_1 M_1 + c_2 M_2' |²,

where c_1 and c_2 are constants depending on the marginal distributions. The limiting distribution of the multivariate Spearman statistic under the sequence of alternatives is a noncentral chi-square distribution with rs degrees of freedom and noncentrality parameter

( δ² / (4 rs τ_x² τ_y²) ) | d_1 M_1 + d_2 M_2' |²,

where

d_1 = (r − 1) E(|y_i* − y_j*|) E(|x_i* − x_j*|⁻¹)

and

d_2 = (s − 1) E(|x_i* − x_j*|) E(|y_i* − y_j*|⁻¹).

Finally, the limiting distribution of Wilks' test statistic −n log W (and of Pillai's trace statistic Q² = nW*) under the sequence of alternatives is a noncentral chi-square distribution with rs degrees of freedom and noncentrality parameter

δ² | M_1 + M_2' |².
The above results are thus found in the case when the null marginal distributions
are spherically symmetric. If the marginal distributions are elliptically symmetric,
the efficiencies are of the same type |h_1 M_1 + h_2 M_2'|², where h_1 and h_2 depend on the marginal spherical distributions and on the test used. If the marginal distributions are of the same dimension and of the same type (which implies that h_1 = h_2), then the relative efficiencies do not depend on M_1 and M_2 at all. Note that, unfortunately, the tests are not unbiased (positive noncentrality parameter) for all alternative sequences. The efficiencies of the sign- and rank-based tests relative to the Wilks test increase with the dimensions r and s and are high in high dimensions. For more details, simulation studies, and efficiencies under other distributions, see Taskinen et al. (2003, 2005).
Table 10.1 Asymptotic relative efficiencies of Kendall's tau and Spearman's rho (and the Blomqvist quadrant test in parentheses) as compared to the Wilks test at different r- and s-variate t distributions for selected ν = ν_1 = ν_2

                         r
  ν        s        2             5             10
  ν = 5    2     1.12 (0.79)  1.14 (0.91)  1.16 (0.96)
           5                  1.17 (1.05)  1.19 (1.10)
           10                               1.20 (1.16)
  ν = 10   2     1.00 (0.69)  1.02 (0.80)  1.03 (0.84)
           5                  1.04 (0.92)  1.05 (0.96)
           10                               1.07 (1.01)
  ν = ∞    2     0.93 (0.62)  0.95 (0.71)  0.96 (0.75)
           5                  0.96 (0.82)  0.97 (0.86)
           10                               0.98 (0.91)
Fig. 10.1 LASERI data. Pairwise scatterplots of Height, Weight, Hip, HRT1T4, COT1T4, and SVRIT1T4.
> data(LASERI)
> attach(LASERI)
> pairs( cbind(Height,Weight,Hip,HRT1T4,COT1T4,SVRIT1T4),
col=as.numeric(Sex), pch=15:16)
For the two 3-variate vectors of variables we first used the test based on the identity score (Pillai's trace), the test based on the standardized spatial signs of the observation vectors (an extension of the Blomqvist test), and the test based on the spatial signs of the differences of the observation vectors (an extension of Kendall's tau). All the p-values are small; see the results below.
> mv.ind.test(cbind(Height,Waist,Hip),
cbind(HRT1T4,COT1T4,SVRIT1T4))
> mv.ind.test(cbind(Height,Waist,Hip),
cbind(HRT1T4,COT1T4,SVRIT1T4),score="si")
> mv.ind.test(cbind(Height,Waist,Hip),
cbind(HRT1T4,COT1T4,SVRIT1T4),score="sy")
We next dropped the two variables Hip and COT1T4. For the remaining two bivariate vectors we cannot reject the null hypothesis of independence; one gets the following results (Pillai's trace and an extension of Spearman's rho; approximate p-values coming from permutation tests are also given).
> mv.ind.test(cbind(Height,Waist),
cbind(HRT1T4,SVRIT1T4))
> mv.ind.test(cbind(Height,Waist),
cbind(HRT1T4,SVRIT1T4), method="p")
> mv.ind.test(cbind(Height,Waist),
cbind(HRT1T4,SVRIT1T4), score="r")
> mv.ind.test(cbind(Height,Waist),
cbind(HRT1T4,SVRIT1T4), score="r", method="p")
Assume that r ≤ s. The classical canonical correlation analysis based on the covariance matrix finds the transformation matrices H_x and H_y such that

COV(XH_x, YH_y) = [ I_r  L ; L'  I_s ],

where L = (L_0, O) with an r × r diagonal matrix L_0 and a zero matrix O. Note that Pillai's trace statistic is the sum of the squared canonical correlations.
Write again

COV(X, Y) = COV = [ COV_11  COV_12 ; COV_21  COV_22 ].

Then

H_x'COV_11 H_x = I_r, H_y'COV_22 H_y = I_s, and H_x'COV_12 H_y = L.
To simplify the notation, assume that r = s and that the limiting distribution of the vectorized

√n ( [ COV_11  COV_12 ; COV_21  COV_22 ] − [ I_r  Λ ; Λ  I_r ] )

is a multivariate normal distribution with zero mean vector, where the diagonal matrix Λ has distinct diagonal elements. Then, using the three equations above and Slutsky's theorem, also √n(H_x − I_r), √n(H_y − I_r), and √n(L − Λ) have multivariate normal limiting distributions that can be solved from

√n(H_x − I_r)' + √n(H_x − I_r) + √n(COV_11 − I_r) = o_P(1),
√n(H_y − I_r)' + √n(H_y − I_r) + √n(COV_22 − I_r) = o_P(1), and
√n(H_x − I_r)'Λ + √n Λ(H_y − I_r) + √n(COV_12 − Λ) = √n(L − Λ) + o_P(1).
10.7 Other approaches

A nonparametric analogue to Wilks' test was given by Puri and Sen (1971). They
developed a class of tests based on componentwise ranking which uses a test statistic
of the form
S_J = |T| / ( |T_11| |T_22| ),

where |·| denotes the determinant. Here the elements of the (r + s) × (r + s) matrix T are

T_kl = (1/n) Σ_{i=1}^n J( R_ki/(n + 1) ) J( R_li/(n + 1) ),

where R_ki denotes the rank of the kth component of (x_i', y_i')' among the kth components of all n vectors, and J(·) is an arbitrary (standardized) score function. Under H0, −n log S_J →_d χ²_{rs}.
Oja et al. (2009) found optimal nonparametric tests of independence in the sym-
metric independent component model. These tests were based on marginal signed-
ranks applied to the (estimated) independent components.
Chapter 11
Several-sample location problem
Abstract In this chapter we consider tests and estimates based on the identity, spatial sign, and spatial rank scores in the several independent samples setting. We get multivariate extensions of Mood's test, the Wilcoxon-Mann-Whitney test, the Kruskal-Wallis test, and the two-sample Hodges-Lehmann estimator. Equivariant/invariant versions are found using inner centering and standardization.
Again, as in the one-sample case, we wish to use a general location score function T(y) in testing and estimation. Using inner centering and outer standardization, the procedure for testing proceeds as follows.

1. Find the inner centered scores T̂_ij = T(y_ij − μ̂), where the location estimate μ̂ is chosen so that

AVE{ T(y_ij − μ̂) } = 0.

2. The test statistic for testing whether the ith sample differs from the others is then based on

T̂_i = AVE_j { T̂_ij }, i = 1, ..., c.

3. The test statistic for H0 : F_1 = · · · = F_c is

Q² = Q²(Y) = Σ_{i=1}^c n_i T̂_i' B̂⁻¹ T̂_i,

where

B̂ = AVE{ T̂_ij T̂_ij' }.

4. Under the null hypothesis H0 : μ_1 = · · · = μ_c and under assumptions specified later, the limiting distribution of Q²(Y) is χ²_{(c−1)p}.
Inner centering makes the test statistic location invariant; that is,
Q²( Y + 1_n b' ) = Q²(Y)
for all p-vectors b. Note that the test statistic can be written as

Q² = tr( [ Σ_{i=1}^c n_i T̂_i T̂_i' ] [ AVE{ T̂_ij T̂_ij' } ]⁻¹ ),

which compares two scatter matrices (for "between" and "total" variation). In fact, Q² is the classical Pillai trace statistic for MANOVA, now based on the centered score values T̂_ij instead of the original centered values y_ij − ȳ.
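A minimal R sketch of this statistic for a generic score matrix, with a hypothetical function name, could read as follows; T.hat is assumed to be the n × p matrix of inner centered scores and g the factor of sample labels.

csample.Q2.sketch <- function(T.hat, g) {
  g <- factor(g)
  B.hat <- crossprod(T.hat) / nrow(T.hat)         # AVE{T.hat_ij T.hat_ij'}
  ni <- as.vector(table(g))                       # sample sizes n_1, ..., n_c
  Tbar <- rowsum(T.hat, g) / ni                   # groupwise score means T.hat_i
  sum(ni * diag(Tbar %*% solve(B.hat) %*% t(Tbar)))  # sum_i n_i T_i' B^{-1} T_i
}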
The approach based on inner centering and inner standardization is given next.
Q² = np · ( Σ_i n_i |T̂_i|² ) / ( Σ_i Σ_j |T̂_ij|² ).
The test using both inner centering and inner standardization is affine invariant;
that is,
Q²( YH' + 1_n b' ) = Q²(Y)
for all full-rank p × p matrices H and for all p-vectors b. For both test versions, un-
der general assumptions stated later, the limiting distribution of the test statistic Q2 is
a chi-square distribution with (c − 1)p degrees of freedom that can be used to calcu-
late approximate p-values. The p-value can also be calculated for the conditionally distribution-free, exact permutation test version. Let P be an n × n permutation matrix (obtained from an identity matrix by permuting rows or columns). The p-value of the permutation test is then

E_P [ I( Q²(PY) ≥ Q²(Y) ) ],

where P is uniformly distributed over the set of all n! permutation matrices.
We start with the classical multivariate analysis of variance (MANOVA) test statis-
tics which are given by the choice T(y) = y, identity score function. Again, write
yi j = μ i + Ωε i j , i = 1, ..., c; j = 1, ..., ni ,
Our general testing strategy with the identity score function then uses sample
mean vectors
ȳi = AVE j {yi j }, i = 1, ..., c,
“grand mean vector”
ȳ = AVE{yi j },
and the sample covariance matrix

S = AVE{ (y_ij − ȳ)(y_ij − ȳ)' }.
2. The test statistic for testing whether the ith sample differs from the others is then
based on
T̂i = ȳi − ȳ, i = 1, ..., c.
3. The test statistic is
Q² = Q²(Y) = Σ_{i=1}^c n_i (ȳ_i − ȳ)' S⁻¹ (ȳ_i − ȳ).
It is straightforward to see that the inner centering and inner standardization yield
exactly the same test statistic. The classical MANOVA procedure usually starts with
a decomposition
SST = SSB + SSW
corresponding to

AVE{ (y_ij − ȳ)(y_ij − ȳ)' } = AVE{ (ȳ_i − ȳ)(ȳ_i − ȳ)' } + AVE{ (y_ij − ȳ_i)(y_ij − ȳ_i)' }.

Thus the "total" variation SST is decomposed into the sum of the "between" and "within" variations, SSB and SSW. Our test statistic

Q² = n · tr( SSB · SST⁻¹ )

compares the "between" and "total" matrices and is known in the literature under the name Pillai's trace statistic. Another possibility is to base the test on the Lawley-Hotelling trace statistic, Q²_LH = n · tr( SSB · SSW⁻¹ ). The test statistic Q² (correspondingly Q²_LH) is simply n times the sum of the eigenvalues of SSB·SST⁻¹ (correspondingly SSB·SSW⁻¹). Instead of considering the sum (or the arithmetic mean) of the eigenvalues of SSB·SST⁻¹ or SSB·SSW⁻¹, one could base the test on the product (or the geometric mean) of the eigenvalues. The so-called Wilks' lambda, for example, is Λ = det( SSW · SST⁻¹ ). In fact, Wilks' lambda is the likelihood ratio test statistic in the multivariate normal case.
Another formulation of the problem and test statistics. In the practical analysis of data, the data are usually given in the form

(X, Y),

where the ith column of the n × c matrix X indicates the membership in the ith sample, and X'X is a c × c diagonal matrix whose diagonal elements are the c group sizes, here denoted by n_1, ..., n_c.
For the regular outer centering of the observation vectors, one can use the pro-
jection matrix
P_{1n} = 1_n ( 1_n'1_n )⁻¹ 1_n' = (1/n) 1_n 1_n'.
The projection Y → P1n Y then replaces the observations by the “grand” mean
vector, and the outer (and inner) centered observations are obtained as residuals,
Y → Ŷ = ( I_n − P_{1n} ) Y = ( I_n − (1/n) 1_n 1_n' ) Y.
Similarly,

P_X = X ( X'X )⁻¹ X'
is the projection matrix that projects the data points to the subspace spanned by the
columns of X. This means that, in our case, transformation Y → PX Y replaces the
observation vectors by their group mean vectors. Matrix In − PX is the projection
matrix to the corresponding residual space; that is, matrix (In − PX )Y yields the
differences between observations and the corresponding group (sample) means.
It follows that

Q² = n · tr( Ŷ'P_X Ŷ ( Ŷ'Ŷ )⁻¹ )

and

Q²_LH = n · tr( Ŷ'P_X Ŷ ( Ŷ'(I_n − P_X)Ŷ )⁻¹ ).
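A minimal R sketch of Pillai's trace computed through this projection formulation (hypothetical function name) is given below.

pillai.Q2.sketch <- function(Y, g) {
  Y <- as.matrix(Y)
  n <- nrow(Y)
  X <- model.matrix(~ factor(g) - 1)              # n x c membership matrix
  Yc <- scale(Y, center = TRUE, scale = FALSE)    # (I_n - P_1n) Y
  PX <- X %*% solve(crossprod(X)) %*% t(X)        # projection matrix P_X
  n * sum(diag(t(Yc) %*% PX %*% Yc %*% solve(crossprod(Yc))))
}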
Recall that in the one-sample case, for testing H0 : μ = 0, one uses the decomposition

Y'Y = Y'P_{1n}Y + Y'( I_n − P_{1n} )Y = Y'P_{1n}Y + Ŷ'Ŷ,

and there we obtained Q² = n · tr( Y'P_{1n}Y ( Ŷ'Ŷ )⁻¹ ).
Nonparametric model with finite second moments. The test statistics Q2 and
Q2LH are also asymptotically valid in a wider nonparametric model where we only
assume that the second moments exist. In this model, we then assume that the ob-
servations are generated according to
yi j = μ i + Ω ε i j , i = 1, ..., c; j = 1, ..., ni ,
where ε i j are independent vectors all having the same unknown distribution with
E(ε i j ) = 0 and COV(ε i j ) = I p . Recall that the vectors μ 1 , ..., μ c are unknown pop-
ulation mean vectors, and Σ = Ω Ω is the covariance matrix of yi j .
Next we consider the limiting null distribution of the test statistic. For that we
need the assumption that
n_i / n → λ_i, i = 1, ..., c, as n → ∞,
where 0 < λi < 1, i = 1, ..., c. First, under the null hypothesis,
(1/n) Ŷ'Ŷ →_P Σ and (1/n) Ŷ'( I_n − P_X )Ŷ →_P Σ,
and therefore Q2LH and Q2 are asymptotically equivalent; that is, Q2LH − Q2 →P 0.
Then note, using the multivariate central limit theorem (CLT), that the limiting dis-
tribution of
( √n_1 ȳ_1', ..., √n_{c−1} ȳ_{c−1}' )'

is a multivariate normal distribution with mean vector zero and covariance matrix

( I_{c−1} − dd' ) ⊗ Σ,

where d = ( √λ_1, ..., √λ_{c−1} )'. We use the first c − 1 sample mean vectors only, as the covariance matrix of ( √n_1 ȳ_1', ..., √n_c ȳ_c' )' is singular.
Then we find that

Q² = Σ_i n_i (ȳ_i − ȳ)' S⁻¹ (ȳ_i − ȳ)
   = ( √n_1 ȳ_1', ..., √n_{c−1} ȳ_{c−1}' ) [ ( I_{c−1} − d̂d̂' )⁻¹ ⊗ S⁻¹ ] ( √n_1 ȳ_1', ..., √n_{c−1} ȳ_{c−1}' )'

with d̂_i = √(n_i/n) and with

( I_{c−1} − d̂d̂' )⁻¹ = I_{c−1} + d̂d̂' / ( 1 − d̂'d̂ ).
Assume that E(Y) = Δ and center the mean matrix as Δ̂ = (In − P1n )Δ . Accord-
ing to Theorem 11.1, the distribution of Q2 can be approximated by a noncentral
chi-square distribution with (c − 1)p degrees of freedom and noncentrality parameter

tr( Δ̂'P_X Δ̂ Σ⁻¹ ).
This result can be used in the power and sample size calculations.
Permutation tests. Under the null hypothesis the observations are exchangeable, so that

PY ∼ Y

for all n × n permutation matrices P. In this approach we use the second formulation of the model. Now the value of the test statistic for the permuted data set PY is

Q²(PY) = n · tr( Ŷ'P'P_X P Ŷ ( Ŷ'Ŷ )⁻¹ ).

To calculate a new value of the test statistic after the permutation, we only need to transform the projection matrix P_X → P'P_X P. The exact p-value is again obtained as

E_P [ I( Q²(PY) ≥ Q²(Y) ) ],

where P is uniformly distributed in the set of all n! different permutation matrices.
Some final comments. It is easy to check that the test statistic Q² is affine invariant in the sense that

Q²( YH' + 1_n b' ) = Q²(Y)

for any full-rank p × p transformation matrix H and for any p-vector b. This is naturally true for Q²_LH as well. This means that the exact null distribution does not depend on μ or Ω at all. If we write Ẑ = Ŷ( Ŷ'Ŷ )^{-1/2}, then Q² = n · |P_X Ẑ|², which is just the squared norm of a projection of the centered and standardized data matrix.
11.3 The test based on spatial signs

In this section we consider the test statistic that uses the spatial sign score function
T(y) = U(y). The tests are then extensions of univariate two- and several-sample
Mood’s test or median test; see Mood (1954). Again, we assume that the observa-
tions yi j are generated by
yi j = μ i + Ωε i j , i = 1, ..., c; j = 1, ..., ni ,
where ε i j are independent, centered, and standardized observations (in the sense
described later) with a joint cumulative distribution function F. Again, we wish to
test the null hypothesis
H0 : μ 1 = · · · = μ c .
where

A = E{ |y_ij − μ_i|⁻¹ ( I_p − |y_ij − μ_i|⁻² (y_ij − μ_i)(y_ij − μ_i)' ) }

and

B = E{ |y_ij − μ_i|⁻² (y_ij − μ_i)(y_ij − μ_i)' }.
Proof. We first assume that the null hypothesis is true with μ 1 = · · · = μ c = 0. Then
E(U(y_ij)) = 0. Write Û_ij = U(y_ij − μ̂), where μ̂ is the spatial median of the combined sample. As the null hypothesis is true, we are back in the one-sample case, and

√n μ̂ = A⁻¹ √n Ū + o_P(1).
Recall the results in Section 6.2. Using the results in Section 6.1.1, we can conclude
that
√n_i Û_i = √n_i Ū_i − √n_i A μ̂ + o_P(1) = √n_i ( Ū_i − Ū ) + o_P(1).
But then we are back in the regular MANOVA case with the identity score function
and the Ui j as observation vectors, and the null result follows from Theorem 11.1.
Consider next a sequence of alternatives

H_n : μ_i = μ + (1/√n) δ_i, i = 1, ..., c.

Under the sequence of alternatives the limiting distribution of Q² is a noncentral chi-square distribution with (c − 1)p degrees of freedom and a noncentrality parameter that depends on the alternative only through (I_{c−1} ⊗ A)δ̃.
For the permutation version of the test, write the data again in the form (X, Y),
where the ith column of the matrix X indicates the membership of the ith sample.
Then transform the matrix Y to the inner centered score matrix Û. Then again
Q²(Y) = n · tr( Û'P_X Û ( Û'Û )⁻¹ )

and

Q²(PY) = n · tr( Û'P'P_X P Û ( Û'Û )⁻¹ ).

As before, the exact p-value is given by

E_P [ I( Q²(PY) ≥ Q²(Y) ) ],

where P is uniformly distributed over the set of all n! different permutation matrices.
Affine invariant test. Unfortunately, the spatial sign test statistic discussed so
far is not affine invariant. An affine invariant version of the test is obtained if one
uses inner centering and inner standardization:

1. Find the simultaneous Hettmansperger-Randles estimates μ̂ and S of location and shape, and compute the inner centered and standardized spatial signs Û_ij = U( S^{-1/2}(y_ij − μ̂) ).
2. The test statistic for testing whether the ith sample differs from the others is then based on

Û_i = AVE_j { Û_ij }, i = 1, ..., c.

3. The several-sample location test statistic is

Q² = p · Σ_i n_i |Û_i|².

Note that if one transforms Y → Û, where the scores in Û are inner centered and inner standardized, then simply Q²(Y) = p · |P_X Û|². Now

Q²(PY) = p · |P_X P Û|²,

and the p-values of the exact test version are easily calculated.
11.4 The tests based on spatial ranks
This approach uses the spatial rank function R(y) = R_Y(y). For a dataset Y = (y_1, ..., y_n)', the spatial (centered) ranks

R_i = R(y_i), i = 1, ..., n,

are automatically centered, that is, Σ_{i=1}^n R_i = 0, and no inner centering is therefore needed. The tests extend the two-sample Wilcoxon-Mann-Whitney test and the several-sample Kruskal-Wallis test.
yi j = μ i + Ωε i j , i = 1, ..., c; j = 1, ..., ni ,
where ε i j are independent, centered, and standardized observations (in the sense
described later) with a joint cumulative distribution function F. Again, we wish to
test the null hypothesis
H0 : μ 1 = · · · = μ c .
where now

A = E{ |y_1 − y_2|⁻¹ ( I_p − |y_1 − y_2|⁻² (y_1 − y_2)(y_1 − y_2)' ) }

and

B = E{ (y_1 − y_2)(y_1 − y_3)' / ( |y_1 − y_2| · |y_1 − y_3| ) }.
Proof. We first assume that the null hypothesis is true; that is, E(U(y_ij − y_rs)) = 0 for all i, j, r, s. The proof then proceeds as in the spatial sign case; under the sequence of alternatives, the noncentrality parameter again depends on the alternative only through (I_{c−1} ⊗ A)δ̃.
Moreover,

Q²(PY) = n · tr( R'P'P_X P R ( R'R )⁻¹ ),
and the exact p-values are obtained as before.
The test statistic is not affine invariant, however. An affine invariant modifica-
tion of the test is obtained if one uses inner centering and inner standardization as
follows.

1. Find the shape matrix S such that

RCOV( Y S^{-1/2} ) ∝ I_p,

and write

R̂_ij = R_{YS^{-1/2}}( S^{-1/2} y_ij ), i = 1, ..., c; j = 1, ..., n_i.
2. The test statistic for testing whether the ith sample differs from the others is then based on

R̃_i = AVE_j { R̂_ij }, i = 1, ..., c.

3. The several-sample location test statistic is then

Q² = np · ( Σ_i n_i |R̃_i|² ) / ( Σ_i Σ_j |R̂_ij|² ).

4. Under the null hypothesis and under some weak assumptions, the limiting distribution of Q²(Y) is χ²_{(c−1)p}.
yi j = μ i + Ω ε i j , i = 1, ..., c; j = 1, ..., ni ,
where the ε i j are the standardized and centered vectors all having the same unknown
distribution. As before, μ 1 , ..., μ c are unknown location centers and the matrix Σ =
Ω Ω > 0 is a joint unknown scatter matrix.
The difference between the sample mean vectors. The first estimate of the difference Δ_ij = μ_j − μ_i is simply

Δ̂_ij = ȳ_j − ȳ_i, i, j = 1, ..., c.
The difference between the sample spatial medians. Let now μ̂i be the spatial
median calculated for Yi , i = 1, ..., c. Our second estimate for Δ i j is then
Δ̂ i j = μ̂ j − μ̂i , i, j = 1, ..., c.
If

A(y) = (1/|y|) ( I_p − yy'/|y|² ) and B(y) = yy'/|y|²

and

A = E{ A(y_ij − μ_i) } and B = E{ B(y_ij − μ_i) },

then, using Theorem 7.4,

√n ( Δ̂_ij − Δ_ij ) →_d N_p( 0, ( 1/λ_i + 1/λ_j ) A⁻¹BA⁻¹ ).
No moment assumptions are needed here. Of course, A and B are unknown but can be estimated by

Â = AVE{ A(y_ij − μ̂_i) } and B̂ = AVE{ B(y_ij − μ̂_i) }.
The Hettmansperger-Randles (HR) estimates of the location centers and the joint scatter matrix are the values of μ_1, ..., μ_c and S that simultaneously satisfy

AVE_j { U(ε̂_ij) } = 0, i = 1, ..., c, and p · AVE{ U(ε̂_ij)U(ε̂_ij)' } = I_p,

where ε̂_ij = S^{-1/2}( y_ij − μ_i ).
If now

A(y) = (1/|y|) ( I_p − yy'/|y|² ) and B(y_1, y_2) = y_1 y_2' / ( |y_1| · |y_2| )

and

A = E{ A(y_11 − y_12) } and B = E{ B(y_11 − y_12, y_11 − y_13) },

then again

√n ( Δ̂_ij − Δ_ij ) →_d N_p( 0, ( 1/λ_i + 1/λ_j ) A⁻¹BA⁻¹ ).
The matrices A and B can now be estimated by

Â = AVE{ A(y_jr − y_is − Δ̂_ij) }

and by

B̂ = AVE{ B(y_jr − y_is − Δ̂_ij, y_jr − y_kl − Δ̂_kj) },

respectively.
Unfortunately, the estimates Δ̂_ij are no longer compatible in the sense that Δ̂_ij = Δ̂_ik + Δ̂_kj does not hold in general. To overcome this problem, one can first, using the kth sample as a reference sample, find an estimate of the difference between the ith and jth samples as

Δ̃_ij·k = Δ̂_ik + Δ̂_kj, k = 1, ..., c,
and then take the weighted average, Spjøtvoll's estimator (Spjøtvoll (1968)),

Δ̃_ij = (1/n) Σ_{k=1}^c n_k Δ̃_ij·k.

Then

Δ̃_ij = Δ̃_ik + Δ̃_kj for all i, k, j = 1, ..., c.

One can easily show that

√n ( Δ̃_ij − Δ̂_ij ) →_P 0,

so that the limiting distributions of √n(Δ̃_ij − Δ_ij) and √n(Δ̂_ij − Δ_ij) are the same. See Nevalainen et al. (2007c).
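A minimal R sketch of Spjøtvoll's weighted average is given below; the function name is hypothetical, and the pairwise estimates are assumed to be stored in a c × c × p array with Delta.hat[i, j, ] estimating μ_j − μ_i.

spjotvoll.sketch <- function(Delta.hat, nk) {
  cc <- length(nk); n <- sum(nk)
  Delta.tilde <- Delta.hat
  for (i in 1:cc) for (j in 1:cc) {
    acc <- 0
    for (k in 1:cc)                         # weighted average over reference samples
      acc <- acc + nk[k] * (Delta.hat[i, k, ] + Delta.hat[k, j, ])
    Delta.tilde[i, j, ] <- acc / n
  }
  Delta.tilde                               # compatible pairwise estimates
}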
The estimates are again made affine equivariant by an inner standardization: the shape matrix S is chosen so that

AVE{ B( S^{-1/2}(y_jr − y_is − Δ̂_ij), S^{-1/2}(y_jr − y_kl − Δ̂_kj) ) } ∝ I_p.

11.6 An example: Egyptian skulls from three epochs

As an example, we consider the classical dataset on Egyptian skulls from three dif-
ferent epochs. The same data were analyzed in Johnson and Wichern (1998). The
three epochs were time periods in years around 4000, around 3300, and around 1850
BC. Thirty skulls were measured for each time period, and the four measured vari-
ables denoted by mb (maximal breadth), bh (basibregmatic height), bl (basialveolar
length), and nh (nasal height). We wish to examine whether there are any differences in the mean skull sizes among the time periods. See below for a first description of the dataset. The observations are plotted in Figure 11.1.
Fig. 11.1 Egyptian skulls data. Pairwise scatterplots of mb, bh, bl, and nh for the three epochs.
> library(MNM)
> library(HSAUR)
>
> # using skulls as in Johnson and Wichern
>
> SKULLS <- skulls[1:90,]
> levels(SKULLS$epoch) <- c(levels(SKULLS$epoch)[1:3], NA, NA)
> summary(SKULLS)
epoch mb bh
c4000BC:30 Min. :119 Min. :121
c3300BC:30 1st Qu.:130 1st Qu.:130
c1850BC:30 Median :133 Median :134
Mean :133 Mean :133
3rd Qu.:136 3rd Qu.:136
Max. :148 Max. :145
bl nh
Min. : 87.0 Min. :44.0
1st Qu.: 94.2 1st Qu.:48.0
Median : 98.0 Median :50.0
Mean : 98.1 Mean :50.4
3rd Qu.:101.0 3rd Qu.:53.0
Max. :114.0 Max. :60.0
> X <- SKULLS[,2:5]
> epoch <- SKULLS$epoch
> pairs(X, col = as.numeric(epoch), pch=as.numeric(epoch)+14)
We compare three tests, namely, (i) the regular MANOVA based on the identity
score function, (ii) the MANOVA based on the inner centered and standardized
spatial signs, and (iii) the MANOVA based on the inner standardized spatial ranks.
The spatial signs and ranks used in the tests are illustrated in Figures 11.2 and 11.3.
We next report the p-values coming from the different tests. Also the p-values based on the permutation distribution are reported for comparison. The classical
MANOVA test produces
Fig. 11.2 Egyptian skulls data. Inner centered and inner standardized spatial signs.
Fig. 11.3 Egyptian skulls data. Inner standardized spatial ranks.
>
> aggregate(X,list(epoch=epoch),mean)
epoch mb bh bl nh
1 c4000BC 131.4 133.6 99.17 50.53
2 c3300BC 132.4 132.7 99.07 50.23
3 c1850BC 134.5 133.8 96.03 50.57
>
> mv.Csample.test(X, epoch)
data: X by epoch
Q.2 = 15.5, df = 8, p-value = 0.05014
alternative hypothesis: true location difference
between some groups is not equal to c(0,0,0,0)
> set.seed(1234)
> mv.Csample.test(X, epoch, method="perm")
data: X by epoch
Q.2 = 15.5, replications = 1000, p-value = 0.05
alternative hypothesis: true location difference
between some groups is not equal to c(0,0,0,0)
If one uses the MANOVA based on the spatial signs (invariant version), one gets
data: X by epoch
Q.2 = 17.1, df = 8, p-value = 0.02909
alternative hypothesis: true location
difference between some groups is not equal to c(0,0,0,0)
> set.seed(1234)
> mv.Csample.test(X, epoch, "s", "i", "perm")
data: X by epoch
Q.2 = 17.1, replications = 1000, p-value = 0.032
alternative hypothesis: true location difference
between some groups is not equal to c(0,0,0,0)
With the invariant rank-based MANOVA one gets the following results.
data: X by epoch
Q.2 = 16.67, df = 8, p-value = 0.03372
alternative hypothesis: true location difference
between some groups is not equal to c(0,0,0,0)
> set.seed(1234)
> mv.Csample.test(X, epoch, "r", "i", "perm")
data: X by epoch
Q.2 = 16.67, replications = 1000, p-value = 0.034
alternative hypothesis: true location difference
between some groups is not equal to c(0,0,0,0)
We end this example with the comparison of different two-sample location esti-
mates. The estimates are (i) the difference of the mean vectors, (ii) the difference
of the affine equivariant spatial medians, and (iii) the affine equivariant two-sample
Hodges-Lehmann estimate. We compare the first and the last epochs. The differences between the three estimates seem to be minimal, as can be seen in Figure 11.4. The assumption of multivariate normality of the data may thus be realistic here.
If the last observation is changed to (200, 200, 200, 200) so that it becomes an outlier, then the estimate and the confidence ellipsoid based on the regular mean vectors also change dramatically, as can be seen in Figure 11.5.
Fig. 11.4 Egyptian skulls data. Estimates of the difference between the first and last epochs.

Fig. 11.5 Egyptian skulls data with an outlier. Estimates of the difference between the first and last epochs.

11.7 References and other approaches

See Möttönen and Oja (1995), Choi and Marden (1997), Marden (1999a), Visuri et al. (2003), Oja and Randles (2004), and Nevalainen et al. (2007c) for different uses of spatial signs and ranks in the multivariate several-sample location problem. Mardia (1967) considered the bivariate problems. See Nevalainen and Oja (2006)
for SAS macros for spatial sign MANOVA methods. Puri and Sen (1971) give a
full description of the several-sample location tests based on the vector of marginal
ranks. Chakraborty and Chaudhuri (1999) and Nordhausen et al. (2006) propose and
consider invariant versions of Puri-Sen tests.
Randles and Peters (1990), Peters and Randles (1990b), and Randles (1992) use
interdirections in the test constructions. The tests based on data depth are given in
Liu (1992), Liu and Singh (1993), and Liu et al. (1999). Multivariate Oja signs and
ranks are used in Hettmansperger and Oja (1994), Hettmansperger et al. (1998) and
Oja (1999).
Chapter 12
Randomized blocks
Abstract A multivariate extension of the Friedman test which is based on the spa-
tial ranks is discussed. Related adjusted and unadjusted treatment effect estimates
are considered as well. Again, the test using outer standardization is rotation invari-
ant but unfortunately not invariant under heterogeneous scaling of the components.
Invariant (equivariant) versions of the test (estimates) based on inner standardization
are discussed as well.
The design and the data. The data consist of N = nc p-dimensional vectors. The N = nc subjects are divided into n blocks of equal size, and within each block the c subjects are assigned to the c treatments at random. The p-variate observations are then usually given in an n × c table as follows.
              Treatments
Blocks    1      2     ···    c
  1      y11    y12    ···   y1c
  2      y21    y22    ···   y2c
  ...    ...    ...    ...   ...
  n      yn1    yn2    ···   ync
Here
$$ \mu = (\mu_1, ..., \mu_c)' $$
is the c × p matrix of the treatment effects, with $\sum_{i=1}^{c}\mu_i = 0$, and the rows of the c × p random matrix $\varepsilon_i$ are dependent but exchangeable; that is,
$$ P\varepsilon_i \sim \varepsilon_i, \quad i = 1, ..., n, $$
for all c × c permutation matrices P. We also assume that $\varepsilon_1, ..., \varepsilon_n$ are independent. We wish to test the null hypothesis
$$ H_0: \mu_1 = \cdots = \mu_c = 0. $$
The treatment differences are
$$ \Delta_{jj'} = \mu_j - \mu_{j'}, \quad j, j' = 1, ..., c. $$
Under the null hypothesis and under a stronger assumption that ε 1 , ..., ε n are inde-
pendent and identically distributed, the random matrices Yi are also independent
and identically distributed (and the limiting distribution of the test statistic given
later can be easily found).
Classical MANOVA test. We first use the identity score in the test construction.
The first step is to center the observed values in each block, that is,
$$ \hat{y}_{ij} = y_{ij} - \bar{y}_i, \quad \text{where } \bar{y}_i = \frac{1}{c}\sum_{j=1}^{c} y_{ij}, $$
and then compute
$$ \hat{B} = \frac{1}{nc}\sum_{i=1}^{n}\sum_{j=1}^{c} \hat{y}_{ij}\hat{y}_{ij}', $$
which is just the covariance matrix estimate for the within-blocks variation.
Now we can define a squared form test statistic for testing H0 and give its limiting
permutational distribution.
Definition 12.1. The MANOVA test statistic for testing $H_0$ is
$$ Q^2 = Q^2(\mathbf{Y}) = \frac{c-1}{nc} \sum_{j=1}^{c} \hat{y}_{\cdot j}'\, \hat{B}^{-1}\, \hat{y}_{\cdot j}. $$
The multivariate Friedman test. This test can be derived in exactly the same
way as the MANOVA tests. The multivariate blockwise centered response vectors
ŷi j are just replaced by multivariate blockwise centered rank vectors Ri j . The vector
Ri j is thus the centered rank of the observation yi j among all the observations in the
ith block, that is, among yi1 , ..., yic , i = 1, ..., n. The ranks can be displayed in a table
as follows.
              Treatments
Blocks    1      2     ···    c      Σ
  1      R11    R12    ···   R1c     0
  2      R21    R22    ···   R2c     0
  ...    ...    ...    ...   ...    ...
  n      Rn1    Rn2    ···   Rnc     0
  Σ      R·1    R·2    ···   R·c     0
Now write
$$ \hat{B} = \frac{1}{nc}\sum_{i=1}^{n}\sum_{j=1}^{c} R_{ij}R_{ij}'. $$
Definition 12.2. The multivariate Friedman test statistic is
$$ Q^2 = \frac{c-1}{nc} \sum_{j=1}^{c} R_{\cdot j}'\, \hat{B}^{-1}\, R_{\cdot j}. $$
Note that Q2 is rotation invariant but not invariant under rescaling of the compo-
nents.
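The statistic of Definition 12.2 is easy to compute directly. The following plain-R sketch (the helper names spatial.signs and mv.friedman are ours, not part of MNM or SpatialNP) computes the blockwise centered spatial ranks and Q²; block and treatment are assumed to be vectors of labels, one per row of Y.

spatial.signs <- function(X) {
  nrm <- pmax(sqrt(rowSums(X^2)), .Machine$double.eps)
  X / nrm                                     # row-wise spatial signs U(y)
}
mv.friedman <- function(Y, block, treatment) {
  n <- length(unique(block)); ct <- length(unique(treatment)); p <- ncol(Y)
  R <- matrix(0, nrow(Y), p)
  for (b in unique(block)) {                  # blockwise centered spatial ranks
    idx <- which(block == b)
    D <- Y[idx, , drop = FALSE]
    for (k in seq_along(idx))                 # R_ij = ave_l U(y_ij - y_il)
      R[idx[k], ] <- colMeans(spatial.signs(sweep(-D, 2, D[k, ], "+")))
  }
  B <- crossprod(R) / (n * ct)                # hat B = AVE{R_ij R_ij'}
  Rsum <- rowsum(R, treatment)                # column sums R_{.j}
  Q2 <- (ct - 1) / (n * ct) * sum(diag(Rsum %*% solve(B) %*% t(Rsum)))
  c(Q2 = Q2, df = p * (ct - 1), p.value = 1 - pchisq(Q2, p * (ct - 1)))
}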
Theorem 12.1. Assume that the null hypothesis of no treatment effect is true and that the sequence (Y_i) is independent and identically distributed up to a location shift. The limiting distribution of Q² is a central chi-square distribution with p(c − 1) degrees of freedom.
For considering asymptotic properties of the tests and estimates we assume that the
original observations yi j are independent and that the cdf of the yi j is F(y − θi − μ j ),
where the θi are the (possibly random) block effects and the μ j the treatment effects
($\sum_i \theta_i = \sum_j \mu_j = 0$). We wish to test the hypothesis
$$ H_0: \mu_1 = \cdots = \mu_c = 0 \quad \text{versus} \quad H_n: \mu_j = \frac{1}{\sqrt{n}}\,\delta_j, \quad j = 1, ..., c, $$
where also $\sum_j \delta_j = 0$. Let $\delta$ denote the vector $(\delta_1' \cdots \delta_{c-1}')'$. As the tests are based
on the centered observations and the centered ranks, it is not a restriction to assume
in the following that θ1 = · · · = θn = 0.
The limiting distribution of the classical MANOVA test statistic is as follows: under the sequence of alternatives $H_n$, Q² has a limiting noncentral chi-square distribution with p(c − 1) degrees of freedom and noncentrality parameter
$$ \delta^2_{MANOVA} = \frac{c-1}{c} \sum_{j=1}^{c} \delta_j'\, \Sigma^{-1}\, \delta_j. $$
For limiting distributions of the rank tests, we first recall the asymptotic theory
for spatial sign and rank tests for comparing two treatments. This is also done to
introduce matrices A, B1 , and B2 needed in the subsequent discussion. First we
consider the dependent samples case (matched pairs) and use the one-sample spatial
sign test. The one-sample spatial sign test statistic for the difference vectors yi2 −yi1 ,
i = 1, . . . , n, for example, is defined as
$$ T_{12} = \sum_{i=1}^{n} U(y_{i2} - y_{i1}). $$
Under the sequence of alternatives,
$$ n^{-1/2}\, T_{12} \to_d N_p\big(A(\delta_2 - \delta_1),\; B_1\big), $$
where
$$ B_1 = E\big\{U(y_{i2} - y_{i1})\,U(y_{i2} - y_{i1})'\big\} $$
is the spatial sign covariance matrix for the difference vectors and
$$ A = E\big\{|y_{i2} - y_{i1}|^{-1}\big[I_p - U(y_{i2} - y_{i1})\,U(y_{i2} - y_{i1})'\big]\big\}. $$
Then
$$ B = E(R_{ij}R_{ij}') = \frac{c-1}{c^2}\, B_1 + \frac{(c-1)(c-2)}{c^2}\, B_2 $$
is the covariance matrix of $R_{ij}$. (The expected values above are taken under the null hypothesis.)
Now we are ready to give the limiting distributions of the rank tests.
Theorem 12.3. Under the sequence of alternatives Hn , the limiting distribution of
the Friedman test statistic Q2 is a noncentral chi-square distribution with p(c − 1)
degrees of freedom and noncentrality parameter
$$ \delta^2_{FRIEDMAN} = \frac{c-1}{c} \sum_{j=1}^{c} \delta_j'\, A\, B^{-1}\, A\, \delta_j. $$
Relative efficiencies for comparing the regular MANOVA test and the Friedman
test are given by the following theorem.
Theorem 12.4. The Pitman asymptotic relative efficiency of the multivariate Friedman test with respect to MANOVA is
$$ ARE_{12} = \frac{\delta^2_{FRIEDMAN}}{\delta^2_{MANOVA}} = \frac{\sum_j \delta_j'\, A B^{-1} A\, \delta_j}{\sum_j \delta_j'\, \Sigma^{-1}\, \delta_j}. $$
Table 12.1 lists efficiencies of the Friedman test with respect to the regular
MANOVA test. The Friedman test clearly outperforms the classical MANOVA test
for heavy-tailed distributions. The efficiencies increase with the dimension p as well
as with the number of treatments c. In the multivariate normal case the efficiency
goes to one as the dimension p → ∞. For c = 2, the efficiencies are the same as those of the spatial sign test; they tend to those of the spatial signed-rank
test as the number of treatments c → ∞. See Möttönen et al. (2003) for a more de-
tailed study and for the asymptotic relative efficiency of the similarly extended Page
test.
Table 12.1 Asymptotic relative efficiencies (ARE) of the spatial multivariate Friedman tests with
respect to the classical MANOVA in the spherical multivariate t p,ν distribution case with different
choices of p, ν , and the number of treatments c
                                 c
   p     ν        2        3       10
   2     4      1.152    1.233    1.367
        10      0.867    0.925    1.023
         ∞      0.785    0.838    0.924
   3     4      1.245    1.308    1.406
        10      0.937    0.980    1.048
         ∞      0.849    0.887    0.946
  10     4      1.396    1.427    1.472
        10      1.050    1.067    1.092
         ∞      0.951    0.964    0.981
The matrix of unadjusted treatment difference estimates is
$$ \hat{\Delta} = (\hat{\Delta}_{jj'})_{j,j'=1,...,c}. $$
Then the adjusted treatment difference estimates are $\tilde{\Delta}_{jj'} = \hat{\Delta}_j - \hat{\Delta}_{j'}$, and the matrix of these adjusted estimates is then
$$ \tilde{\Delta} = (\tilde{\Delta}_{jj'})_{j,j'=1,...,c}. $$
For the univariate case, see Lehmann (1998) and Hollander and Wolfe (1999).
Möttönen et al. (2003) then proved the following.
Theorem 12.5. Under general assumptions, the limiting distribution of $\sqrt{n}\,vec(\hat{\Delta} - \Delta)$ is a multivariate singular normal with mean matrix zero and covariance matrix given by
$$ \Sigma_{(ij),(ij)} = A^{-1}B_1A^{-1}, \quad \Sigma_{(ij),(il)} = A^{-1}B_2A^{-1}, \quad \text{and} \quad \Sigma_{(ij),(lm)} = 0, $$
where the indices i, j, l, m are distinct. On the other hand, the asymptotic relative efficiency of $\tilde{\Delta}_{jj'}$ with respect to $\bar{y}_{\cdot j} - \bar{y}_{\cdot j'}$ is
$$ \left(\frac{\det(\Sigma)}{\det(A^{-1}BA^{-1})}\right)^{1/p}. $$
Note that when the observations come from a spherical distribution the asymp-
totic relative efficiency of the estimate Δ̃ j j is the same as the asymptotic relative
efficiency of the multivariate Friedman test in Theorem 12.4.
Table 12.2 Asymptotic relative efficiency of Δ̃ j j with respect to Δ̂ j j in the spherical multivari-
ate normal distribution case with different choices of dimension p and with different numbers of
treatments c
          number of treatments c
   p        2        3       10
   2      1.000    1.067    1.176
   3      1.000    1.045    1.114
  10      1.000    1.013    1.032
12.5 Examples and final remarks
The tests and estimates discussed above are rotation invariant/equivariant but unfortunately not scale invariant/equivariant. Due to the lack of the scale invariance property, rescaling one of the response variables, for example, changes the results (p-values) and may also greatly reduce the efficiency of the tests and estimates. The transformation and retransformation approach introduced by Chakraborty and Chaudhuri (1996) may again be used to construct affine invariant/equivariant versions of the tests and estimates.
We now illustrate the use of the multivariate Friedman test on a dataset earlier an-
alyzed by Seber (1984). A randomized complete block design experiment was ar-
ranged to study the effects of six different treatments on plots of bean plants infested
by the serpentine leaf miner insect. In this study the number of treatments was c = 6
and the number of blocks was n = 4. The measurement vectors consist of three
different variables: y1 is the number of miners per leaf, y2 is the weight of beans per plot (in kilograms), and $y_3 = \sin^{-1}(\sqrt{pr})$, where pr is the proportion of leaves infested with borer. See Table 12.3 and Figure 12.1 for the original dataset. The blockwise centered ranks are given in Table 12.4.
The observed value of the multivariate Friedman test statistic Q² is now 30.48, and using the chi-square distribution with 15 degrees of freedom the corresponding p-value is approximately 0.01. The results are similar for the affine invariant version of the test (inner standardization). For the MANOVA test the standardized test statistic and p-value are 32.10 and 0.006, respectively. Estimated p-values for the exact permutation tests are also given in the following printout.
> data(beans)
> plot(beans)
Fig. 12.1 Pairwise scatterplots of the beans data: Block, Treatment, y1, y2, y3.
> Y<-cbind(beans$y1,beans$y2,beans$y3)
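With MNM the tests quoted above would be run along the following lines; the two-way layout function name mv.2way.test and its argument order are assumptions here, made by analogy with mv.Csample.test above, and may differ between package versions.

> mv.2way.test(Y, beans$Block, beans$Treatment, "r", "i")
> set.seed(1234)
> mv.2way.test(Y, beans$Block, beans$Treatment, "r", "i", "perm")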
Möttönen et al. (2003) gave an extension of the Page test as well. Another possibil-
ity is to use the marginal centered ranks as described in Puri and Sen (1971). This
approach is scale invariant/equivariant but not rotation invariant/equivariant, and the
efficiency can again be very poor if the response variables are highly correlated.
The transformation and retransformation approach may be used to construct affine
invariant/equivariant (and consequently more efficient) versions of the tests and esti-
mates. A third possibility is to use affine equivariant centered ranks based on the Oja
criterion; see Oja (1999). To compute blockwise centered ranks one has to require,
however, that c ≥ p + 1, which may be a serious limitation in practice.
Chapter 13
Multivariate linear regression
Assume that (X, Y) is the data matrix and consider the linear regression model
$$ Y = X\beta + \varepsilon, \quad \text{that is,} \quad y_i = \beta' x_i + \varepsilon_i, \quad i = 1, ..., n. $$
Assumption 6
for some positive definite q × q matrix D and for all p × q matrices C with positive
rank.
Testing problem I. Consider first the problem of testing the null hypothesis
H0 : β = 0. The null hypothesis thus simply says that y1 , ..., yn are independent and
identically distributed with a joint distribution centered at the origin. By a centered
observation, we mean here that E{T(yi )} = 0 for the chosen p-variate score function
T(y). We write
$$ T_i = T(y_i) \quad \text{and} \quad T_i(\beta) = T(y_i - \beta' x_i), \quad i = 1, ..., n, $$
and
$$ T = (T_1, ..., T_n)' \quad \text{and} \quad T(\beta) = (T_1(\beta), ..., T_n(\beta))'. $$
Again write $A = E\{T(\varepsilon_i)L(\varepsilon_i)'\}$ and $B = E\{T(\varepsilon_i)T(\varepsilon_i)'\}$. As before, L(y) is the optimal multivariate location score function. If B exists, then under the null hypothesis and under our design Assumption 6,
$$ n^{-1/2}\, vec(T'X) $$
has a limiting $N_{pq}(0, D \otimes B)$ distribution, and the test statistic using the outer standardization is then
$$ Q^2 = n \cdot tr\big(T' P_X T\, (T'T)^{-1}\big), \quad \text{where } P_X = X(X'X)^{-1}X'. $$
The distribution of the test statistic at the true value β (close to the origin) can then
be approximated by a noncentral chi-square distribution with pq degrees of freedom
and noncentrality parameter
$$ tr\big(\Delta' P_X \Delta\; A B^{-1} A\big), \quad \text{where } \Delta = X\beta. $$
If one uses the inner standardization, one first finds a full rank transformation matrix $S^{-1/2}$ such that, if we transform
$$ y_i \to \hat{T}_i = T(S^{-1/2} y_i) \quad \text{and} \quad Y \to \hat{T} = (\hat{T}_1, ..., \hat{T}_n)', $$
then
$$ p \cdot \hat{T}'\hat{T} = tr(\hat{T}'\hat{T})\, I_p. $$
The test statistic is then
$$ Q^2 = Q^2(X, Y) = n \cdot tr\big((\hat{T}' P_X \hat{T})(\hat{T}'\hat{T})^{-1}\big) = \frac{np}{tr(\hat{T}'\hat{T})}\, |P_X \hat{T}|^2. $$
Estimation problem. Consider again the model
$$ Y = X\beta + \varepsilon, $$
where we wish to estimate the unknown β. The estimate that is based on the score function T then often solves
$$ T(\hat{\beta})'X = 0. $$
We thus find the estimate β̂ such that the transformed estimated residuals are uncor-
related with the explaining variable. Note that usually one of the explaining variables
in X corresponds to the intercept term in the regression model (the corresponding
column in X is 1n ); this implies that the transformed residuals also sum up to zero.
Under general assumptions,
$$ \sqrt{n}\; vec((\hat{\beta} - \beta)') \to_d N_{qp}\big(0,\; D^{-1} \otimes (A^{-1} B A^{-1})\big), $$
where, as before, $A = E\{T(\varepsilon_i)L(\varepsilon_i)'\}$ and $B = E\{T(\varepsilon_i)T(\varepsilon_i)'\}$. Recall that L is the optimal score function. If one uses inner standardization in the estimation problem, one first finds an estimate β̂ and a full rank transformation matrix $S^{-1/2}$ such that, if we transform
$$ y_i \to \hat{T}_i = T\big(S^{-1/2}(y_i - \hat{\beta}' x_i)\big) \quad \text{and} \quad Y \to \hat{T} = (\hat{T}_1, ..., \hat{T}_n)', $$
then
$$ \hat{T}'X = 0 \quad \text{and} \quad p \cdot \hat{T}'\hat{T} = tr(\hat{T}'\hat{T})\, I_p. $$
Testing problem II. Consider next the partitioned linear regression model
$$ Y = X_1\beta_1 + X_2\beta_2 + \varepsilon $$
and the null hypothesis $H_0: \beta_2 = 0$. To construct the test statistic, first find the centered scores (the centering under the null hypothesis)
$$ \hat{T} = T(\hat{\beta}_1, 0) $$
such that $\hat{T}'X_1 = 0$. We also write $\hat{X}_2 = (I_n - P_{X_1})X_2$. Then the test statistic is
$$ Q^2 = Q^2(X_1, X_2, Y) = n \cdot tr\big(\hat{T}' P_{\hat{X}_2} \hat{T}\,(\hat{T}'\hat{T})^{-1}\big). $$
The Wald-type statistic based on β̂₂ with
$$ \widehat{COV}(\hat{\beta}_2) = \frac{1}{n}\left(\frac{1}{n}\hat{X}_2'\hat{X}_2\right)^{-1} \otimes \hat{A}^{-1}\hat{B}\hat{A}^{-1} $$
(with consistent estimates Â and B̂) is asymptotically equivalent with Q².
If one also uses inner standardization (to attain affine invariance), one first transforms
$$ y_i \to \hat{T}_i = T\big(S^{-1/2}(y_i - \hat{\beta}_1' x_{1i})\big) \quad \text{and} \quad Y \to \hat{T} = (\hat{T}_1, ..., \hat{T}_n)' $$
such that
$$ \hat{T}'X_1 = 0 \quad \text{and} \quad p \cdot \hat{T}'\hat{T} = tr(\hat{T}'\hat{T})\, I_p. $$
The test statistic is then the same (but with the standardized scores),
$$ Q^2 = n \cdot tr\big(\hat{T}' P_{\hat{X}_2} \hat{T}\,(\hat{T}'\hat{T})^{-1}\big) = \frac{np}{tr(\hat{T}'\hat{T})}\, |P_{\hat{X}_2}\hat{T}|^2. $$
Note that the use of the permutation test is questionable here. It is allowed only if X₁ and X₂ are independent (X₂ gives the treatment in a randomized trial, for example), and then the p-value is
$$ E_P\Big[\, I\big( Q^2(X_1, PX_2, Y) \ge Q^2(X_1, X_2, Y) \big) \Big]. $$
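A minimal R sketch of this permutation p-value, assuming a user-supplied function Q2fun(X1, X2, Y) that evaluates the test statistic (both names are ours):

perm.pvalue <- function(Q2fun, X1, X2, Y, B = 1000) {
  q.obs <- Q2fun(X1, X2, Y)                  # observed statistic
  q.perm <- replicate(B, Q2fun(X1, X2[sample(nrow(X2)), , drop = FALSE], Y))
  mean(q.perm >= q.obs)                      # E_P[ I(Q2(X1, P X2, Y) >= Q2) ]
}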
13.2 Multivariate linear L2 regression
Consider again the model
$$ Y = X\beta + \varepsilon, \quad \text{that is,} \quad y_i = \beta' x_i + \varepsilon_i, \quad i = 1, ..., n. $$
In this section we use the identity score function T(y) = y and therefore assume that ε is a random sample from a p-variate distribution with
$$ E(\varepsilon_i) = 0 \quad \text{and} \quad COV(\varepsilon_i) = \Sigma. $$
We thus need the assumption that the second moments exist. We assume that X'X has full rank q and is thus invertible. For the asymptotic results we assume that the explaining (design) variables are fixed and satisfy Assumption 6 with
$$ \frac{1}{n} X'X \to D \quad \text{as } n \to \infty, $$
where the rank of D is also q. We naturally often assume that the first column of
X is 1n so that the first row of β is the so-called intercept parameter. Note that the
one-sample and several-sample location problems are special cases here.
Testing problem I. We first consider the problem of testing the null hypothesis $H_0: \beta = 0$ versus $H_1: \beta \ne 0$. We thus wish to test whether there is any linear structure in the population. The null hypothesis simply says that $E(y_i) = 0$ for all i and therefore does not depend on the values $x_i$, i = 1, ..., n. If we use the identity score, then the test statistic is simply
$$ Q^2 = n \cdot tr\big(Y' P_X Y\,(Y'Y)^{-1}\big). $$
If one transforms X → XV and Y → YW,
where V and W are q × q and p × p full rank transformation matrices, then the value
of the test statistic remains unchanged; that is, Q²(XV, YW) = Q²(X, Y).
Estimation problem. The L2 estimate β̂ minimizes
$$ D_n(\beta) = \frac{1}{n}\,|Y - X\beta|^2 = \frac{1}{n}\, tr\big((Y - X\beta)'(Y - X\beta)\big) = AVE\{|y_i - \beta' x_i|^2\} $$
or solves
$$ (Y - X\hat{\beta})'X = 0. $$
If X has rank q, then the solution is
$$ \hat{\beta} = (X'X)^{-1}X'Y, $$
and the estimate is also called the least squares (LS) estimate. As the estimate is a linear function of Y, the following equivariance properties are easy to verify.
1. Regression equivariance: β̂(X, Y + Xγ) = β̂(X, Y) + γ.
2. Y equivariance: β̂(X, YW) = β̂(X, Y)W.
3. X equivariance: β̂(XV, Y) = V⁻¹β̂(X, Y).
Y = X1 β 1 + X2 β 2 + ε ,
where the linear part is partitioned into two parts. We wish to test the null hypothesis
H0 : β 2 = 0. We first center the matrices Y and X2 using X1 (inner centering); that
is
$$ Y \to \hat{Y} = (I_n - P_{X_1})Y \quad \text{and} \quad X_2 \to \hat{X}_2 = (I_n - P_{X_1})X_2. $$
Then $\hat{Y}'X_1 = 0$ and $\hat{X}_2'X_1 = 0$. The test statistic for testing $H_0: \beta_2 = 0$ is
$$ Q^2 = n \cdot tr\big(\hat{Y}' P_{\hat{X}_2} \hat{Y}\,(\hat{Y}'\hat{Y})^{-1}\big). $$
If one transforms X₁ → X₁V₁, X₂ → X₂V₂, and Y → YW, where V₁, V₂, and W are full rank transformation matrices (with ranks q₁, q₂, and p, respectively), then the value of the test statistic is not changed; that is, Q²(X₁V₁, X₂V₂, YW) = Q²(X₁, X₂, Y). A plain-R sketch of this score test is given below.
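The following sketch computes the identity-score statistic directly from the formulas above; the function name l2.score.test is ours, and X1, X2, Y are assumed to be numeric matrices of full column rank.

l2.score.test <- function(X1, X2, Y) {
  P1  <- X1 %*% solve(crossprod(X1), t(X1))  # projection onto span(X1)
  Yh  <- Y  - P1 %*% Y                       # inner centering of Y
  X2h <- X2 - P1 %*% X2                      # and of X2
  P2  <- X2h %*% solve(crossprod(X2h), t(X2h))
  Q2  <- nrow(Y) * sum(diag(crossprod(Yh, P2 %*% Yh) %*% solve(crossprod(Yh))))
  df  <- ncol(Y) * ncol(X2)
  c(Q2 = Q2, df = df, p.value = 1 - pchisq(Q2, df))
}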
13.3 L1 regression based on spatial signs
Consider now the model
$$ Y = X\beta + \varepsilon, $$
where ε is a random sample of size n from a p-variate distribution with the spatial median at zero; that is,
$$ E(U(\varepsilon_i)) = 0. $$
With the spatial sign score we again need the assumption that the density of $\varepsilon_i$ is bounded. Again, X'X has full rank q, and X satisfies Assumption 6 with
$$ \frac{1}{n} X'X \to D \quad \text{as } n \to \infty. $$
Testing problem I. Consider the testing problem with the null hypothesis $H_0: \beta = 0$. Under the null hypothesis $E(U(y_i)) = 0$ for all i = 1, ..., n. If one uses the spatial sign score U(y), then one first transforms
$$ y_i \to U_i = U(y_i) \quad \text{and} \quad Y \to U = (U_1, ..., U_n)', $$
and the test statistic is based on the covariances between the components of U and X, that is, on the matrix U'X. Then, under the null hypothesis,
$$ n^{-1/2}\, vec(U'X) \to_d N_{pq}(0,\; D \otimes B). $$
Unfortunately, this test is not affine invariant, but an affine invariant test version can be found using inner standardization. Find a transformation matrix $S^{-1/2}$ such that, if we transform
$$ y_i \to \hat{U}_i = U(S^{-1/2} y_i) \quad \text{and} \quad Y \to \hat{U} = (\hat{U}_1, ..., \hat{U}_n)', $$
then
$$ p \cdot \hat{U}'\hat{U} = n\, I_p. $$
The transformation is then again Tyler's transformation, and the test statistic is
$$ Q^2 = p \cdot |P_X \hat{U}|^2. $$
Estimation problem. In the L1 estimation based on the spatial sign score, the estimate β̂ minimizes
$$ D_n(\beta) = AVE\{|y_i - \beta' x_i| - |y_i|\} $$
or solves
$$ U(\hat{\beta})'X = 0, $$
where $U(\beta) = (U_1(\beta), ..., U_n(\beta))'$ with $U_i(\beta) = U(y_i - \beta' x_i)$. The estimate is sometimes also called the least absolute deviation (LAD) estimate.
The solution β̂ cannot be given in a closed form but may be easily calculated using the algorithm with the following two iteration steps (a plain-R sketch follows the steps).
1. e_i ← y_i − β'x_i, i = 1, ..., n.
2. β ← β + [AVE{|e_i|⁻¹ x_i x_i'}]⁻¹ AVE{x_i U(e_i)'}.
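A minimal sketch of this iteration in plain R, under the assumption that no residual hits zero exactly; the helper name lad.sign.reg is ours.

lad.sign.reg <- function(X, Y, maxiter = 200, tol = 1e-8) {
  beta <- solve(crossprod(X), crossprod(X, Y))       # LS start, q x p
  for (it in 1:maxiter) {
    E <- Y - X %*% beta                              # step 1: residuals e_i
    r <- pmax(sqrt(rowSums(E^2)), .Machine$double.eps)
    U <- E / r                                       # spatial signs U(e_i)
    step <- solve(crossprod(X, X / r), crossprod(X, U))  # step 2 update
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  beta
}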
See Appendix B. As in the case of the spatial median, it then follows that
$$ \sqrt{n}\; vec((\hat{\beta} - \beta)') \to_d N_{qp}\big(0,\; D^{-1} \otimes (A^{-1} B A^{-1})\big), $$
where
$$ A = E\{A(\varepsilon_i)\} \quad \text{and} \quad B = E\{B(\varepsilon_i)\} $$
with
$$ A(y) = \frac{1}{|y|}\big(I_p - U(y)U(y)'\big) \quad \text{and} \quad B(y) = U(y)U(y)'. $$
Natural consistent estimates of A and B are then
$$ \hat{A} = AVE\big\{A(y_i - \hat{\beta}' x_i)\big\} \quad \text{and} \quad \hat{B} = AVE\big\{B(y_i - \hat{\beta}' x_i)\big\}, $$
respectively.
As in the case of the one-sample HR estimate, there is no proof for the convergence of the algorithm, but in practice it seems to work. If β = 0, (1/n)X'X → D = I_p, ε_i is spherically distributed around the origin, and the initial regression and shape estimates, say B and S, are root-n consistent, that is,
$$ \sqrt{n}\,B = O_P(1) \quad \text{and} \quad \sqrt{n}(S - I_p) = O_P(1) $$
with tr(S) = p, then one can again show that the k-step estimates (obtained after k iterations of the above algorithm) satisfy
$$ \sqrt{n}\,B_k = \left(\frac{1}{p}\right)^k \sqrt{n}\,B + \left[1 - \left(\frac{1}{p}\right)^k\right] \frac{1}{E(r_i^{-1})}\,\frac{p}{p-1}\, \sqrt{n}\, AVE\{x_i u_i'\} + o_P(1) $$
and
$$ \sqrt{n}(S_k - I_p) = \left(\frac{2}{p+2}\right)^k \sqrt{n}(S - I_p) + \left[1 - \left(\frac{2}{p+2}\right)^k\right] \frac{p+2}{p}\, \sqrt{n}\left(p \cdot AVE\{u_i u_i'\} - I_p\right) + o_P(1), $$
where $r_i = |\varepsilon_i|$ and $u_i = U(\varepsilon_i)$.
Testing problem II. Consider again the model with two parts of explaining
variables, X1 and X2 :
Y = X1 β 1 + X 2 β 2 + ε .
We wish to test the null hypothesis that the variables in the X2 part have no effect on
the response variable. Thus H0 : β 2 = 0. In the null case, the estimate of β 1 solves
$$ U(\hat{\beta}_1, 0)'X_1 = 0. $$
Then write
$$ \hat{U} = U(\hat{\beta}_1, 0) \quad \text{and} \quad \hat{X}_2 = (I_n - P_{X_1})X_2. $$
Then $\hat{U}'X_1 = 0$ and $\hat{X}_2'X_1 = 0$. The test statistic for testing $H_0: \beta_2 = 0$ is then
$$ Q^2 = n \cdot tr\big(\hat{U}' P_{\hat{X}_2} \hat{U}\,(\hat{U}'\hat{U})^{-1}\big). $$
An affine invariant version of the test is obtained by using both inner centering and inner standardization. Then the spatial signs
$$ \hat{U}_i = U\big(S^{-1/2}(y_i - \hat{\beta}_1' x_{1i})\big), \quad i = 1, ..., n, $$
satisfy
$$ \hat{U}'X_1 = 0 \quad \text{and} \quad p \cdot \hat{U}'\hat{U} = n\, I_p. $$
The test statistic is then
$$ Q^2 = p \cdot |P_{\hat{X}_2} \hat{U}|^2. $$
13.4 L1 regression based on spatial ranks
This approach uses the spatial rank function R(y) as a score function. Note that the spatial ranks are invariant under location shifts. Therefore a separate procedure is needed for the estimation of the intercept vector. This approach extends the Wilcoxon-Mann-Whitney and Kruskal-Wallis tests to the general multivariate regression case. Write
$$ y_{ij} = y_j - y_i, \quad x_{ij} = x_j - x_i, \quad \text{and} \quad \varepsilon_{ij} = \varepsilon_j - \varepsilon_i, \quad i, j = 1, ..., n, $$
for the pairwise differences.
Testing problem I. Consider the testing problem with the null hypothesis $H_0: \beta = 0$; that is, $E(U(y_i - y_j)) = 0$ for all i ≠ j. The natural test statistic here, a U-statistic computed from the pairwise differences, is under the null hypothesis asymptotically equivalent to the multivariate rank test statistic
$$ n^{-1/2}\, vec(R'\hat{X}), $$
where $\hat{X} = (I_n - P_{1_n})X$. Recall that the matrix of spatial ranks R is obtained by the transformations
$$ y_i \to R_i = R(y_i) \quad \text{and} \quad Y \to R = (R_1, ..., R_n)'. $$
Unfortunately, this test is not affine invariant, but an affine invariant test version can be found, for example, using the following natural inner standardization. Find a transformation matrix $S^{-1/2}$ such that, if we transform the observations $y_i \to S^{-1/2}y_i$ and write R̂ for the matrix of spatial ranks of the transformed observations, then
$$ p \cdot \hat{R}'\hat{R} = tr(\hat{R}'\hat{R})\, I_p. $$
The transformation is then a Tyler-type transformation but using ranks instead of signs, and the test statistic is
$$ Q^2 = Q^2(X, Y) = \frac{np}{tr(\hat{R}'\hat{R})}\, |P_{\hat{X}}\hat{R}|^2. $$
Estimation problem. The estimate β̂ now minimizes
$$ D_n(\beta) = AVE\{|y_{ij} - \beta' x_{ij}| - |y_{ij}|\} $$
or solves
$$ AVE\{U_{ij}(\beta)\, x_{ij}'\} = 0, \quad \text{where } U_{ij}(\beta) = U(y_{ij} - \beta' x_{ij}), \ i, j = 1, ..., n. $$
The solution β̂ may be found as in the regular LAD regression but replacing observations and explaining variables by differences of observations and explaining variables, respectively. The algorithm then uses the two iteration steps (a plain-R sketch follows the steps):
1. e_ij ← y_ij − β'x_ij, i ≠ j.
2. β ← β + [AVE{|e_ij|⁻¹ x_ij x_ij'}]⁻¹ AVE{x_ij U(e_ij)'}.
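Building on the lad.sign.reg sketch of the preceding section, the rank-based fit may be sketched by applying the same iteration to the pairwise differences (the helper name rank.sign.reg is ours; note that an intercept column must not be included in X, as its differences vanish):

rank.sign.reg <- function(X, Y, ...) {
  n  <- nrow(X)
  ij <- which(upper.tri(diag(n)), arr.ind = TRUE)    # all pairs i < j
  lad.sign.reg(X[ij[, 1], , drop = FALSE] - X[ij[, 2], , drop = FALSE],
               Y[ij[, 1], , drop = FALSE] - Y[ij[, 2], , drop = FALSE], ...)
}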
Equivalently, the estimate solves
$$ R(\hat{\beta})'X = 0 $$
with
$$ R(\beta) = (R_1(\beta), ..., R_n(\beta))', $$
where $R_i(\beta)$ is the spatial rank of $y_i - \beta' x_i$ among $y_1 - \beta' x_1, ..., y_n - \beta' x_n$. See also
Zhou (2009) for this estimate and its properties.
Under general assumptions,
$$ \sqrt{n}\; vec((\hat{\beta} - \beta)') \to_d N_{qp}\big(0,\; D^{-1} \otimes (A^{-1} B A^{-1})\big), $$
where
$$ A = E\{A(\varepsilon_i - \varepsilon_j)\} \quad \text{and} \quad B = E\{B(\varepsilon_i - \varepsilon_j,\, \varepsilon_i - \varepsilon_k)\} $$
with distinct i, j, and k, and
$$ A(y) = \frac{1}{|y|}\big(I_p - U(y)U(y)'\big) \quad \text{and} \quad B(y_1, y_2) = U(y_1)U(y_2)'. $$
Natural consistent estimates Â and B̂ are again obtained by averaging over the estimated residuals; in fact, B̂ is simply the spatial rank covariance matrix of the estimated residuals.
As in the case of the regular LAD estimate, the estimate β̂ = β̂ (X, Y) is regres-
sion equivariant and X equivariant but not Y equivariant. The transformation re-
transformation estimation procedure can be created, for example, by first updating
the residuals, then the β matrix, and finally the residual scatter matrix S as follows.
1. e_ij ← S^{-1/2}(y_ij − β'x_ij), i, j = 1, ..., n.
2. β ← β + [AVE{|e_ij|⁻¹ x_ij x_ij'}]⁻¹ AVE{x_ij U(e_ij)'} S^{1/2}.
3. S ← p S^{1/2} AVE{U(e_ij) U(e_ik)'} S^{1/2}.
Testing problem II. Consider again the model with two parts of explaining
variables, X1 and X2 :
Y = 1 n μ + X1 β 1 + X2 β 2 + ε .
We wish to test the null hypothesis $H_0: \beta_2 = 0$. In the null case, the estimate of β₁ solves
$$ R(\hat{\beta}_1, 0)'X_1 = 0. $$
Then write
$$ \hat{R} = R(\hat{\beta}_1, 0) \quad \text{and} \quad \hat{X}_2 = (I_n - P_{X_1})X_2. $$
Then $\hat{R}'X_1 = 0$ and $\hat{X}_2'X_1 = 0$. The test statistic for testing $H_0: \beta_2 = 0$ is then
$$ Q^2 = n \cdot tr\big(\hat{R}' P_{\hat{X}_2} \hat{R}\,(\hat{R}'\hat{R})^{-1}\big). $$
If inner standardization is used as well, the standardized ranks satisfy
$$ \hat{R}'X_1 = 0 \quad \text{and} \quad p \cdot \hat{R}'\hat{R} = tr(\hat{R}'\hat{R})\, I_p, $$
and the test statistic is then
$$ Q^2 = \frac{np}{tr(\hat{R}'\hat{R})}\, |P_{\hat{X}_2}\hat{R}|^2. $$
13.5 An example
The dataset considered in this example is the LASERI data already analyzed in
Chapter 10. We consider the multivariate regression problem where the response
variables are the differences HRT1T2, COT1T2, and SVRIT1T2, and the explaining
variables are sex (0/1), age (years), and WHR (waist to hip ratio). See Figure 13.1 for
the scatterplot matrix. The variables HRT1T2, COT1T2, and SVRIT1T2 measure
the reaction of the individual hemodynamic system to the change in positions.
We first estimate the regression coefficient matrix in the full model with three
explaining variables: sex, age, and WHR. If the spatial sign score (LAD) with inner
standardization is used, one gets
Fig. 13.1 Pairwise scatterplots for the variables used in the regression analysis.
> data(LASERI)
> with(LASERI, pairs( cbind(Sex, Age, WHR, HRT1T2,
COT1T2, SVRIT1T2 )))
>
> is.reg.fullmodel <- mv.l1lm(cbind(HRT1T2, COT1T2, SVRIT1T2)
~ Age + WHR + Sex, data=LASERI, score="s", stand="i")
> with(LASERI, pairs( cbind(Sex, Age, WHR,
residuals(is.reg.fullmodel))))
> summary(is.reg.fullmodel)
Call:
mv.l1lm(formula = cbind(HRT1T2, COT1T2, SVRIT1T2) ~
Age + WHR + Sex, scores = "s", stand = "i", data = LASERI)
Results by response:
Response HRT1T2 :
Estimate Std. Error
(Intercept) -21.713 6.249
Response COT1T2 :
Estimate Std. Error
(Intercept) 2.3401 0.7047
Age 0.0140 0.0113
WHR -2.5060 0.9196
SexMale -0.0496 0.1467
Response SVRIT1T2 :
Estimate Std. Error
(Intercept) -1525.80 415.80
Age -3.31 6.67
WHR 1173.87 542.58
SexMale 76.06 86.53
The residuals are plotted in Figure 13.2. The spatial signs of the residuals and the
explaining variables are made uncorrelated.
Fig. 13.2 Residual plots for the estimated full model with the spatial sign score for LASERI data.
If one uses the identity score instead, regular L2 analysis gives quite similar re-
sults. See the results below.
Call:
mv.l1lm(formula = cbind(HRT1T2, COT1T2, SVRIT1T2)
~ Age + WHR + Sex, data = LASERI)
Results by response:
Response HRT1T2 :
Estimate Std. Error
(Intercept) -21.013 6.282
Age 0.140 0.101
WHR 5.146 8.197
SexMale -2.510 1.307
Response COT1T2 :
Estimate Std. Error
(Intercept) 3.0223 0.6406
Age 0.0105 0.0103
WHR -3.1486 0.8359
SexMale -0.0084 0.1333
Response SVRIT1T2 :
Estimate Std. Error
(Intercept) -1834.03 365.30
Age -1.74 5.86
WHR 1462.88 476.69
SexMale 63.27 76.02
If one wishes to test the hypothesis that the variable WHR has no effect on the
response variables, one can first estimate the parameters in the submodel (without
WHR) and then use the score test as described earlier. One then gets
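With MNM the comparison might proceed along the following lines; the anova-type comparison of nested mv.l1lm fits is an assumption here, and its interface may differ between package versions.

> is.reg.submodel <- mv.l1lm(cbind(HRT1T2, COT1T2, SVRIT1T2)
~ Age + Sex, data = LASERI, score = "s", stand = "i")
> anova(is.reg.fullmodel, is.reg.submodel)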
Rao (1988) proposed the use of univariate LAD regression separately for the
p response variables. Puri and Sen (1985), Section 6.4, and Davis and McKean
(1993) developed multivariate regression methods based on coordinatewise ranks.
Chakraborty (1999) used the transformation retransformation technique with
marginal LAD estimates to find affine equivariant versions of the LAD estimates.
Multivariate spatial sign methods have been studied in Bai et al. (1990) and Ar-
cones (1998). Multivariate affine equivariant regression quantiles based on spatial
signs and the transformation retransformation technique were introduced and dis-
cussed in Chakraborty (2003). Asymptotics for the spatial rank methods were con-
sidered in Zhou (2009).
Theil-type estimates based on the Oja median were given in Busarova et al.
(2006) and Shen (2009). For a different type of regression coefficient estimates that
are based on the Oja sign and rank covariance matrices, see Ollila et al. (2002,
2004b).
Chapter 14
Analysis of cluster-correlated data
Abstract In this chapter it is shown how the spatial sign and rank methods can be
extended to cluster-correlated data. Tests and estimates for the one-sample location
problem with a general score function are given in detail. Then two-sample weighted
spatial rank tests are considered.
14.1 Introduction
In previous chapters we assumed that the observations in Y = (y1 , ..., yn ) are gen-
erated by the model
Y = Xβ + ε ,
where the n p-variate residuals, that is, the rows of ε = (ε 1 , ..., ε n ) , are independent
and identically distributed (i.i.d.) random vectors. The assumption that the observa-
tions are independent is not true, however, if the data are clustered.
Clustered data can arise in a variety of applications. There may be natural groups in the target population; these groupings may, for example, be based on clinics for patients, schools for students, litters for rats, and so on. Another example of clustered data arises in longitudinal studies, where the measurements on the individuals (clusters) are taken repeatedly over a time interval. If the clustering
in the data is simply ignored, in some cases there can be a serious underestimation
of the variability of the estimators. The true standard deviation of the sample mean
as an estimator of the population mean, for example, may be much larger than its
estimate under the i.i.d. assumption. This underestimation will further result in con-
fidence intervals that are too narrow and p-values that are too small. Therefore an
adjustment to standard statistical methods depending on cluster sizes and intraclass
correlation is needed.
Traditionally, parametric mixed models have been used to account for the corre-
lation structures among the dependent observational units. Then one assumes that
$$ Y = Z\alpha + X\beta + \varepsilon, $$
where Zα collects the random cluster effects and Xβ the fixed effects.
14.2 One-sample case
Let
$$ Y = (y_1, y_2, ..., y_n)' $$
be a sample of p-variate random vectors with sample size n. We assume now that the observations come in d clusters and that the n × d matrix
$$ Z = (z_1, z_2, ..., z_n)' $$
indicates the cluster membership. Note that
$$ (ZZ')_{ij} = \begin{cases} 1, & \text{if the ith and jth observations come from the same cluster,} \\ 0, & \text{otherwise,} \end{cases} $$
and that Z'Z is a d × d diagonal matrix whose diagonal elements are the d cluster sizes, say $m_1, ..., m_d$.
The one-sample parametric location model with random cluster effects is often written as
$$ Y = Z\alpha + 1_n\mu' + \varepsilon, $$
where the d rows of α are i.i.d. from $N_p(0, \Omega)$, the n rows of ε are i.i.d. from $N_p(0, \Sigma)$, and α and ε are independent. The model can be reformulated as
$$ Y = 1_n\mu' + \varepsilon, $$
where now
$$ vec(\varepsilon') \sim N_{np}\big(0,\; I_n \otimes \Sigma + ZZ' \otimes \Omega\big). $$
We thus move the cluster effect to the covariance matrix of the error variable. If
$\varepsilon = (\varepsilon_1, ..., \varepsilon_n)'$, then the model states the following.
1. $\varepsilon_i \sim N_p(0, \Sigma + \Omega)$ for all i = 1, ..., n.
2. If $(ZZ')_{ij} = 1$, i ≠ j, then
$$ vec(\varepsilon_i, \varepsilon_j) \sim N_{2p}\left(0,\; \begin{pmatrix} \Sigma + \Omega & \Omega \\ \Omega & \Sigma + \Omega \end{pmatrix}\right). $$
Consider now the one-sample location model
$$ Y = 1_n\mu' + \varepsilon. $$
A general idea to construct tests and estimates is again to use an odd vector-valued score function T(y) to calculate the individual scores $T_i = T(y_i)$, i = 1, ..., n. We write $T = (T_1, T_2, ..., T_n)'$.
We need the assumption that E(|T(ε i )|2+γ ) is bounded for some γ > 0. Let L(y) be
the optimal location score function, that is, the gradient vector of log f (y − μ ) with
respect to μ at the origin. Here f (y) is the density of ε i . If H0 : μ = 0 is true, then
$E(T_i) = 0$ for all i = 1, ..., n. Write, as before, $B = E\{T(\varepsilon_i)T(\varepsilon_i)'\}$. We now also need the covariances of two distinct transformed residuals in the same cluster; that is,
$$ C = E\{T(\varepsilon_i)T(\varepsilon_j)'\}, \quad i \ne j, \ (ZZ')_{ij} = 1. $$
Clearly
$$ COV(vec(T')) = I_n \otimes B + (ZZ' - I_n) \otimes C. $$
For the sampling design, we have the next assumption.
Assumption 8 Assume that
$$ \frac{1}{n}\, 1_n'(ZZ' - I_n)1_n = \frac{1}{n}\sum_{i=1}^{d} m_i^2 - 1 \to d_o $$
as d → ∞.
Then, for the one-sample location problem, the test and the estimate are constructed as in the i.i.d. case; the estimate μ̂, for example, solves
$$ AVE\{T(y_i - \hat{\mu})\} = 0. $$
The covariance estimates must, however, be corrected for the intracluster dependence, and the estimated confidence ellipsoid without this correction is too small (positive intracluster correlation).
In this section we consider the tests and estimates based on the weighted scores. Let
$$ W = diag(w_1, ..., w_n) $$
be a diagonal matrix of positive weights. The weighted estimate μ̂ then solves
$$ AVE\{w_i\, T(y_i - \hat{\mu})\} = 0. $$
How should one then choose the weights? Using the results above, one can choose the weights to maximize the Pitman efficiency of the test or to minimize the determinant of the covariance matrix of the estimate, for example. Explicit solutions can be found in some simplified cases. If C = ρB (ρ is the intraclass correlation), then the covariance matrix has the structure
$$ \Sigma \otimes B \quad \text{with} \quad \Sigma = (1-\rho)I_n + \rho\, ZZ'. $$
One can then use the Lagrange multiplier technique to find the optimal weights $w = (w_1, ..., w_n)'$. The solution is
$$ w = \lambda\, \Sigma^{-1} 1_n $$
for some constant λ > 0.
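With Σ = (1 − ρ)I_n + ρZZ', the vector Σ⁻¹1_n gives every member of the ith cluster the same weight, proportional to (1 + ρ(m_i − 1))⁻¹. A small R sketch (the helper name opt.weights is ours; cluster is a vector of cluster labels):

opt.weights <- function(cluster, rho) {
  m <- as.numeric(table(cluster)[as.character(cluster)])  # own cluster size
  w <- 1 / (1 + rho * (m - 1))               # proportional to Sigma^{-1} 1_n
  w / sum(w) * length(w)                     # normalize so that sum(w) = n
}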
We end this section with the note that the proposed score-based testing and esti-
mation procedures are not necessarily affine invariant and equivariant, respectively.
Again, affine invariant and equivariant versions may be obtained, as before, using
the transformation retransformation technique. Natural unweighted and weighted
scatter matrix estimates for this purpose have not yet been developed.
14.3 Two samples: Weighted spatial rank test
Assume that (X, Z, Y) is the data matrix and consider first the general linear regression model
Y = Xβ + ε ,
where as before Y = (y1 , ..., yn ) is an n × p matrix of n observed values of p re-
sponse variables, X = (x1 , ..., xn ) is an n × q matrix of observed values of q explain-
ing variables, β is a q × p matrix of regression coefficients, and ε = (ε 1 , ..., ε n ) is
an n × p matrix of residuals. At the individual level, one can then write
$$ y_i = \beta' x_i + \varepsilon_i, \quad i = 1, ..., n. $$
As before, the matrix Z is the n × d matrix indicating the cluster membership. The
residuals ε 1 , ..., ε n are not iid any more but satisfy
Assumption 10 The rows of $\varepsilon = (\varepsilon_1, ..., \varepsilon_n)'$ satisfy
1. $E(U(\varepsilon_i)) = 0$ and $\varepsilon_i \sim \varepsilon_j$ for all i, j = 1, ..., n.
2. $(\varepsilon_i, \varepsilon_j) \sim (\varepsilon_{i'}, \varepsilon_{j'})$ for all i ≠ j and i' ≠ j' such that $(ZZ')_{ij} = (ZZ')_{i'j'}$.
3. If $(ZZ')_{ij} = 0$ then $\varepsilon_i$ and $\varepsilon_j$ are independent.
The first condition says that all the p-variate distributions of the ε i are the same; no
symmetry condition is needed. The condition E(U(ε i )) = 0 is used here just to fix
the center of the distribution of the residuals. (The spatial median of the residuals is
zero.)
In the two-sample location case,
$$ X = (1_n, x) \quad \text{and} \quad \beta = (\mu, \Delta)', $$
where
the n-vector x is the indicator for the second sample membership, and μ and μ + Δ
are the two location centers (spatial medians of the two populations). We wish to
test the null hypothesis H0 : Δ = 0 and estimate the value of unknown Δ .
and the cluster sizes $m_1, ..., m_d$ are the diagonal elements of the d × d diagonal matrix Z'Z. The sample design is given by the frequency table for group and cluster membership, that is,
$$ \big((1_n - x)'Z,\; x'Z\big). $$
If the null hypothesis H0 : Δ = 0 is true then the observations y1 , ..., yn are i.i.d.
from a distribution with the cdf F, say. The population spatial rank score function is
then
$$ R_F(y) = E\big(U(y - y_i)\big). $$
The function $R_F$ is naturally unknown. An often improved estimate of the population spatial rank function is obtained if one uses a weighted spatial rank function
$$ R_w(y) = \frac{\sum_{i=1}^{n} w_i\, U(y - y_i)}{\sum_{i=1}^{n} w_i} $$
with some positive, strategically chosen individual weights $w_1, ..., w_n$. We again write
$$ R_F = (R_F(y_1), ..., R_F(y_n))' \quad \text{and} \quad R_w = (R_w(y_1), ..., R_w(y_n))'. $$
Note that the weighted ranks $R_w(y_1), ..., R_w(y_n)$ are now centered in the sense that
$$ \sum_{i=1}^{n} w_i\, R_w(y_i) = 0. $$
The test statistic is then based on the weighted sum of the weighted ranks over the second sample, that is, on
$$ R_w' W x. $$
One can then show that, under the null hypothesis and under some general assumptions,
$$ \frac{1}{\sqrt{n}}\, R_w' W x = \frac{1}{\sqrt{n}}\, R_F' W x_w + o_P(1), $$
where
$$ x_w = \left(I_n - \frac{1}{n}\, 1_n 1_n' W\right) x. $$
Note that now the $x_w$ are centered (instead of the ranks) so that $x_w'w = 0$. Thus the limiting null distribution of $n^{-1/2} R_w'Wx$ is a p-variate normal distribution with mean value zero and covariance matrix $d_1 B + d_2 C$, where
$$ B = E\big(R_F(\varepsilon_i) R_F(\varepsilon_i)'\big) $$
and
$$ C = E\big(R_F(\varepsilon_i) R_F(\varepsilon_j)'\big) \quad \text{with } i \ne j \text{ such that } (ZZ')_{ij} = 1. $$
Here $d_1$ and $d_2$ are constants depending on the weights and the cluster structure; it is assumed that the corresponding limits exist.
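The weighted ranks and the statistic R_w'Wx are straightforward to compute; in the following plain-R sketch (the helper name is ours) Y is the data matrix, x the second-sample indicator, and w the vector of positive weights, all assumed as in the text.

weighted.spatial.ranks <- function(Y, w) {
  n <- nrow(Y); R <- matrix(0, n, ncol(Y))
  for (k in 1:n) {
    D   <- sweep(-Y, 2, Y[k, ], "+")           # y_k - y_i for all i
    nrm <- pmax(sqrt(rowSums(D^2)), .Machine$double.eps)
    R[k, ] <- colSums(w * (D / nrm)) / sum(w)  # weighted average of signs
  }
  R
}
Rw   <- weighted.spatial.ranks(Y, w)
stat <- crossprod(Rw, w * x)                   # the statistic R_w' W x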
Most theoretical work for the analysis of longitudinal or clustered data concerns
univariate continuous response variables having a normal distribution. Rosner and
Grove (1999) and Rosner et al. (2003) generalized the standard Wilcoxon-Mann-
Whitney rank sum test to the cluster-correlated case with cluster members belonging
to the same treatment groups. Datta and Satten (2005) and Datta and Satten (2008)
developed the rank-sum tests for cases where members in the same cluster may
belong to different treatment groups. Additionally, the correlation between cluster
members may depend on the cluster size. Finally, Rosner et al. (2003) derived an
adjusted variance estimate for a randomization-based Wilcoxon signed rank test
for clustered paired data. They also introduced a weighted signed-rank statistic to
attain better efficiency. The weighted multivariate sign test is the only nonparametric
multivariate test for cluster-correlated data considered in the literature thus far.
Appendix A
Some vector and matrix algebra
An r × s matrix A = (a_ij) is an array of real numbers with r rows and s columns. The number a_ij is called the (i, j) element of A. The set of r × s matrices is here
denoted by M (r, s). An r × s zero matrix, written 0, is a matrix with all elements
zero and is the zero element in M (r, s).
r × 1 matrices are called (column) vectors or r-vectors; 1 × s matrices are row
vectors. Column vectors are denoted by bold lower-case letters a, b, .... A 1 × 1 ma-
trix a is just a real number. A set of vectors a1 , ..., ar is said to be linearly dependent
if there exist scalars c1 , ..., cr , not all zero, such that c1 a1 + · · · + cr ar = 0. Otherwise
they are linearly independent. Write ei , i = 1, ..., r for an r-vector with ith element
one and other elements zero. These vectors are linearly independent and give an
orthonormal base for Rr .
The transpose of A, written as A', is the s × r matrix
$$ A' = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{r1} \\ a_{12} & a_{22} & \cdots & a_{r2} \\ \vdots & \vdots & & \vdots \\ a_{1s} & a_{2s} & \cdots & a_{rs} \end{pmatrix}, $$
obtained by interchanging the roles of the rows and columns; the ith row becomes
the ith column and the jth column becomes the jth row, i = 1, ..., r and j = 1, ..., s.
The sum of two r × s matrices A and B is again an r × s matrix C = A + B, whose
(i, j) element is
ci j = ai j + bi j .
The scalar product of a real number c and an r × s matrix A is the r × s matrix with (i, j) element c · a_ij. The product of an r × s matrix A and an s × t matrix B is the r × t matrix C = AB, whose (i, j) element is
$$ c_{ij} = \sum_{k=1}^{s} a_{ik} b_{kj}. $$
An elemental permutation matrix is obtained just by interchanging the ith and jth rows of the identity matrix. Permutation matrices can be given as products of elemental permutations. Write c(P_r) for the smallest number of elemental permutations needed for the transformation I_r → P_r. The set of all r × r permutation matrices, denoted by P_r, includes r! different permutation matrices.
An r × r matrix A is called a projection matrix if it is idempotent and symmetric, that is, if A² = A and A' = A. If A is a projection matrix, then so is I_r − A.
An r × r matrix B is the inverse of an r × r matrix A if
$$ AB = BA = I_r. $$
Then we write B = A⁻¹. A square matrix A is called invertible if its inverse A⁻¹ exists. Clearly the identity matrix I_r is invertible with I_r⁻¹ = I_r, and permutation matrices are invertible with P_r⁻¹ = P_r'. Every r × s matrix A has a singular value decomposition
$$ A = UDV'. $$
For the Kronecker product of matrices of suitable dimensions,
$$ (A \otimes B)(C \otimes D) = (AC) \otimes (BD). $$
In statistics, one often wishes to work with vectors instead of matrices. The "vec" operation is then used to vectorize a matrix. If A = (a_1 ··· a_s) is an r × s matrix, then
$$ vec(A) = \begin{pmatrix} a_1 \\ \vdots \\ a_s \end{pmatrix} $$
just stacks the columns of A on top of each other. An often very useful result for vectorizing the product of three matrices is
$$ vec(ABC) = (C' \otimes A)\, vec(B). $$
Write
$$ K_{p,p} = \sum_{i=1}^{p}\sum_{j=1}^{p} (e_i e_j') \otimes (e_j e_i'). $$
Then, for any p × p matrix A,
$$ K_{p,p}\, vec(A) = vec(A'). $$
The matrix $K_{p,p}$ is sometimes called a commutation matrix.
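Both identities are easy to verify numerically; a short R check (all object names are ours):

set.seed(1)
A <- matrix(rnorm(6), 2, 3); B <- matrix(rnorm(12), 3, 4); C <- matrix(rnorm(8), 4, 2)
all.equal(as.vector(A %*% B %*% C),            # vec(ABC) = (C' kron A) vec(B)
          as.vector((t(C) %x% A) %*% as.vector(B)))
p <- 3
e <- function(i) { v <- rep(0, p); v[i] <- 1; v }
K <- Reduce("+", unlist(lapply(1:p, function(i) lapply(1:p, function(j)
       (e(i) %*% t(e(j))) %x% (e(j) %*% t(e(i))))), recursive = FALSE))
M <- matrix(rnorm(p * p), p, p)
all.equal(as.vector(K %*% as.vector(M)), as.vector(t(M)))  # K vec(M) = vec(M')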
Appendix B
Asymptotical results for methods based
on spatial signs
2. With r = |y| and u = U(y),
$$ \left| \frac{y+\mu}{|y+\mu|} - \frac{y}{|y|} - \frac{1}{r}\,[I_p - uu']\,\mu \right| \le C\, \frac{|\mu|^{1+\delta}}{r^{1+\delta}} $$
for all 0 < δ < 1, where C does not depend on y or μ. Use Part 1 above and the Taylor theorem again.
For the first three theorems, see Sections 1.8 and 1.9 in Serfling (1980), for example.
Theorem B.1. (Chebyshev). Let y1 , y2 , ... be uncorrelated (univariate) random vari-
ables with means μ1 , μ2 , ... and variances σ12 , σ22 , .... If ∑ni=1 σi2 = o(n2 ), n → ∞,
then
$$ \frac{1}{n}\sum_{i=1}^{n} y_i - \frac{1}{n}\sum_{i=1}^{n} \mu_i \to_P 0. $$
Theorem B.2 (a strong law of large numbers) states that, under suitable conditions on the variances,
$$ \frac{1}{n}\sum_{i=1}^{n} y_i - \frac{1}{n}\sum_{i=1}^{n} \mu_i \to 0 \quad \text{almost surely.} $$
Theorem B.3 (a central limit theorem for weighted sums) assumes in addition that
$$ \max_{1 \le i \le n} |c_{in}| \to 0 \quad \text{as } n \to \infty. $$
Then
$$ \sum_{i=1}^{n} c_{in}\, y_i \to_D N(0, \sigma^2). $$
i=1
Theorem B.4 (Lyapunov) considers independent random variables $y_1, y_2, ...$ with zero means, variances $\sigma_1^2, \sigma_2^2, ...$, and third absolute moments $\gamma_i = E|y_i|^3$. If
$$ \frac{\big(\sum_{i=1}^{n} \gamma_i\big)^2}{\big(\sum_{i=1}^{n} \sigma_i^2\big)^3} \to 0, $$
then
$$ \frac{\sum_{i=1}^{n} y_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}} \to_d N(0, 1). $$
The following key result is Lemma 4.2 in Davis et al. (1992) and Theorem 1 in
Arcones (1998).
Theorem B.5. Let Gn (μ ), μ ∈ R p , be a sequence of convex stochastic processes,
and let G(μ ) be a convex (limit) process in the sense that the finite dimensional dis-
tributions of Gn (μ ) converge to those of G(μ ). Let μ̂ , μ̂ 1 , μ̂ 2 , ... be random variables
such that μ̂ is the unique minimizer of G(μ) and μ̂_n minimizes G_n(μ), n = 1, 2, .... Then
$$ \hat{\mu}_n \to_d \hat{\mu}. $$
Let y be a p-variate random vector with cdf F and p > 1. The spatial median of F minimizes the objective function
$$ D(\mu) = E\{|y - \mu| - |y|\}. $$
(Note that no moment assumptions are needed, as D(μ) ≤ |μ|.) We wish to test the null hypothesis H0: μ = 0 and also estimate the unknown value of μ.
The sample version of the objective function is
$$ D_n(\mu) = ave\{|y_i - \mu| - |y_i|\}. $$
The function $D_n(\mu)$, as well as D(μ), is convex and bounded. The sample spatial median is defined as
$$ \hat{\mu} = \arg\min_{\mu} D_n(\mu). $$
We also define the vector- and matrix-valued functions
$$ U(y) = \frac{y}{|y|}, \quad A(y) = \frac{1}{|y|}\left(I_p - \frac{yy'}{|y|^2}\right), \quad \text{and} \quad B(y) = \frac{yy'}{|y|^2}. $$
The statistic
$$ T_n = ave\{U(y_i)\} $$
is then the spatial sign test statistic for testing the null hypothesis that the spatial
median is zero.
Assumption 11 We assume that (i) the density function f of y is continuous and
bounded in an open neighborhood of the origin, and that (ii) the spatial median of
the distribution of y is zero and unique; that is,
$$ D(\mu) > 0 \quad \text{for all } \mu \ne 0. $$
For C > 0, write
$$ \delta = \inf_{|\mu| \ge C} D(\mu). $$
Using our results in Sections B.1 and B.2 we easily get the following.
Lemma B.4. Under our assumptions, $\sqrt{n}\,T_n \to_d N_p(0, B)$.
Lemma B.5. Under our assumptions,
$$ n\,D_n(n^{-1/2}\mu) - \left[-\sqrt{n}\,T_n + \frac{1}{2}\,A\mu\right]'\mu \to_P 0. $$
2
The proof was constructed in the multivariate case (p > 1). The univariate case
can be proved in the same way. The matrix A is then replaced by the scalar a = 2 f (0)
and B by b = 1.
Anderson, T.W. (1999). Asymptotic theory for canonical correlation analysis. Jour-
nal of Multivariate Analysis, 70, 1–29.
Anderson, T.W. (2003). An Introduction to Multivariate Statistical Analysis. Third
Edition, Wiley, New York.
Arcones, M.A. (1998). Asymptotic theory for M-estimators over a convex kernel.
Econometric Theory, 14, 387–422.
Arcones, M.A., Chen, Z., and Gine, E. (1994). Estimators related to U-processes
with applications to multivariate medians: Asymptotic normality. Annals of
Statistics, 22, 1460–1477.
Azzalini, A. (2005). The skew-normal distribution and related multivariate families.
Scandinavian Journal of Statistics, 32, 159–188.
Bai, Z.D., Chen, R., Miao, B.Q., and Rao, C.R. (1990). Asymptotic theory of least
distances estimate in multivariate linear models. Statistics, 4, 503–519.
Barnett, V. (1976). The ordering of multivariate data. Journal of the Royal Statistical Society, A, 139, 318–355.
Bassett, G. and Koenker, R. (1978). Asymptotic theory of least absolute error re-
gression. Journal of the American Statistical Association, 73, 618–622.
Bickel, P.J. (1964). On some asymptotically nonparametric competitors of
Hotelling’s T 2 . Annals of Mathematical Statistics, 36, 160–173.
Bilodeau, M. and Brenner, D. (1999). Theory of Multivariate Statistics. Springer-
Verlag, New York.
Blomqvist, N. (1950). On a measure of dependence between two random variables.
Annals of Mathematical Statistics, 21, 593–600.
Blumen, I. (1958). A new bivariate sign test for location. Journal of the American
Statistical Association, 53, 448–456.
Brown, B.M. (1983). Statistical uses of the spatial median. Journal of the Royal Statistical Society, B, 45, 25–30.
Brown, B. and Hettmansperger, T. (1987). Affine invariant rank methods in the bi-
variate location model. Journal of the Royal Statististical Society, B 49, 301–310.
Brown, B. and Hettmansperger, T. (1989). An affine invariant bivariate version of
the sign test. Journal of the Royal Statistical Society, B 51, 117–125.
Brown, B.M., Hettmansperger, T.P., Nyblom, J., and Oja, H. (1992). On certain
bivariate sign tests and medians. Journal of the American Statistical Association,
87, 127–135.
Busarova, D., Tyurin, Y., Möttönen, J., and Oja, H. (2006). Multivariate Theil esti-
mator with the corresponding test. Mathematical Methods of Statistics, 15, 1–19.
Chakraborty, B. (1999). On multivariate median regression. Bernoulli, 5, 683–703.
Chakraborty, B. (2003). On multivariate quantile regression. Journal of Statistical
Planning and Inference, 110, 109–132.
Chakraborty, B. and Chaudhuri, P. (1996). On the transformation and retransforma-
tion technique for constructing affine equivariant multivariate median. Proceed-
ings of the American Mathematical Society, 124, 2539–2547.
Donoho, D.L. and Huber, P.J. (1983). The notion of breakdown point. In: A Festschrift for Erich L. Lehmann (ed. P.J. Bickel, K.A. Doksum, and J.L. Hodges), Wadsworth, Belmont, pp. 157–184.
Dümbgen, L. (1998). On Tyler's M-functional of scatter in high dimension. Annals of the Institute of Statistical Mathematics, 50, 471–491.
Dümbgen, L. and Tyler, D. (2005). On the breakdown properties of some multivari-
ate M-functionals. Scandinavian Journal of Statistics, 32, 247–264.
Everitt, B. (2004). An R and S-PLUS Companion to Multivariate Analysis. London:
Springer.
Frahm, G. (2004). Generalized Elliptical Distributions: Theory and Applications. Doctoral Thesis, Universität zu Köln, Wirtschafts- und Sozialwissenschaftliche Fakultät, Seminar für Wirtschafts- und Sozialstatistik.
Gieser, P.W. and Randles, R.H. (1997). A nonparametric test of independence be-
tween two vectors. Journal of the American Statistical Association, 92, 561–567.
Gini, C. and Galvani, L. (1929). Di talune estensioni dei concetti di media ai caratteri qualitativi. Metron, 8.
Gómez, E., Gómez-Villegas, M.A., and Marı́n, J.M. (1998). A multivariate gener-
alization of the power exponential family of distributions. Communications in
Statististics -Theory and Methods, 27, 3, 589–600.
Gower, J.C. (1974). The mediancentre. Applied Statistics, 23, 466–470.
Haataja, R., Larocque, D., Nevalainen, J., and Oja, H. (2008). A weighted multivari-
ate signed-rank test for cluster-correlated data. Journal of Multivariate Analysis,
100, 1107–1119.
Haldane, J.B.S. (1948). Note on the median of a multivariate distribution. Biometrika, 35, 414–415.
Hallin, M. and Paindaveine, D. (2002). Optimal tests for multivariate location based
on interdirections and pseudo-Mahalanobis ranks. Annals of Statistics, 30, 1103–
1133.
Hallin, M. and Paindaveine, D. (2006). Semiparametrically efficient rank-based in-
ference for shape. I. Optimal rank-based tests for sphericity. Annals of Statistics,
34, 2707–2756.
Hallin, M., Oja, H., and Paindaveine, D. (2006). Semiparametrically efficient rank-based inference for shape. II. Optimal R-estimation of shape. Annals of Statistics, 34, 2757–2789.
Hampel, F.R. (1968). Contributions to the theory of robust estimation. Ph.D. Thesis, University of California, Berkeley.
Hampel, F.R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69, 383–393.
Hampel, F.R., Rousseeuw, P.J., Ronchetti, E.M., and Stahel, W.A. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.
Hettmansperger, T.P. and Aubuchon, J.C. (1988). Comment on “Rank-based robust
analysis of linear models. I. Exposition and review” by David Draper. Statistical
Science, 3, 262–263.
Hettmansperger, T.P. and McKean, J.W. (1998). Robust Nonparametric Statistical
Methods. Arnold, London.
Niinimaa, A. and Oja, H. (1995). On the influence function of certain bivariate me-
dians. Journal of the Royal Statistical Society, B, 57, 565–574.
Niinimaa, A. and Oja, H. (1999). Multivariate median. In: Encyclopedia of Statis-
tical Sciences (Update Volume 3) (ed. S. Kotz, N.L. Johnson, and C.P. Read),
Wiley, New York.
Nordhausen, K., Oja, H., and Paindaveine, D. (2009). Signed-rank tests for location
in the symmetric independent component model. Journal of Multivariate Analy-
sis, 100, 821–834.
Nordhausen, K., Oja, H. and Ollila, E. (2009). Multivariate Models and the First
Four Moments. In: Festschrift in Honour of Tom Hettmansperger, to appear.
Nordhausen, K., Oja, H., and Tyler, D. (2006). On the efficiency of invariant multi-
variate sign and rank tests. In: Festschrift for Tarmo Pukkila on his 60th Birthday
(ed. E. Liski, J. Isotalo, J, J. Niemel, S. Puntanen, and G. Styan).
Oja, H. (1983). Descriptive statistics for multivariate distributions. Statistics &
Probability Letters 1, 327–332.
Oja, H. (1987). On permutation tests in multiple regression and analysis of covari-
ance problems. Australian Journal of Statistics 29, 81–100.
Oja, H. (1999). Affine invariant multivariate sign and rank tests and corresponding
estimates: a review. Scandinavian Journal of Statistics 26, 319–343.
Oja, H. and Niinimaa, A. (1985). Asymptotical properties of the generalized median
in the case of multivariate normality. Journal of the Royal Statistical Society, B,
47, 372–377.
Oja, H. and Nyblom, J. (1989). On bivariate sign tests. Journal of the American
Statistical Association, 84, 249–259.
Oja, H. and Paindaveine, D. (2005). Optimal signed-rank tests based on hyper-
planes. Journal of Statistical Planning and Inference, 135, 300–323.
Oja, H. and Randles, R.H. (2004). Multivariate nonparametric tests. Statistical Sci-
ence, 19, 598–605.
Oja, H., Paindaveine, D., and Taskinen, S. (2009). Parametric and nonparametric
tests for multivariate independence in the independence component model. Sub-
mitted.
Oja, H., Sirkiä, S., and Eriksson, J. (2006). Scatter matrices and independent com-
ponent analysis. Austrian Journal of Statistics, 35, 175–189.
Ollila, E., Croux, C., and Oja, H. (2004). Influence function and asymptotic ef-
ficiency of the affine equivariant rank covariance matrix. Statistica Sinica, 14,
297–316.
Ollila, E., Hettmansperger, T.P., and Oja, H. (2002). Estimates of regression coeffi-
cients based on sign covariance matrix. Journal of the Royal Statistical Society,
B, 64, 447–466.
Ollila, E., Oja, H., and Croux, C. (2003b). The affine equivariant sign covariance
matrix: Asymptotic behavior and efficiency. Journal of Multivariate Analysis, 87,
328–355.
Ollila, E., Oja, H., and Koivunen, V. (2003). Estimates of regression coefficients
based on rank covariance matrix. Journal of the American Statistical Association,
98, 90–98.