
Lecture Notes in Statistics 199

Edited by
P. Bickel
P. Diggle
S. Fienberg
U. Gather
I. Olkin
S. Zeger

For other titles published in this series, go to


http://www.springer.com/series/694
Hannu Oja

Multivariate Nonparametric
Methods with R
An Approach Based on Spatial Signs
and Ranks

Prof. Hannu Oja
University of Tampere
Tampere School of Public Health
FIN-33014 Tampere
Finland
[email protected]

ISSN 0930-0325
ISBN 978-1-4419-0467-6 e-ISBN 978-1-4419-0468-3
DOI 10.1007/978-1-4419-0468-3
Springer New York Dordrecht Heidelberg London

Library of Congress Control Number: 2010924740

© Springer Science+Business Media, LLC 2010


All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York,
NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer software,
or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are
not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject
to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


To my family.
Preface

This book introduces a new way to analyze multivariate data. The analysis of data
based on multivariate spatial signs and ranks proceeds very much as does a tradi-
tional multivariate analysis relying on the assumption of multivariate normality: the
L2 norm is just replaced by different L1 norms, observation vectors are replaced by
their (standardized and centered) spatial signs and ranks, and so on. The methods are
fairly efficient and robust, and no moment assumptions are needed. A unified the-
ory starting with the simple one-sample location problem and proceeding through
the several-sample location problems to the general multivariate linear regression
model and finally to the analysis of cluster-dependent data is presented.

The material is divided into 14 chapters. Chapter 1 serves as a short introduc-


tion to the general ideas and strategies followed in the book. Chapter 2 introduces
and discusses different types of parametric, nonparametric, and semiparametric sta-
tistical models used to analyze the multivariate data. Chapter 3 provides general
descriptive tools to describe the properties of multivariate distributions and multi-
variate datasets. Multivariate location and scatter functionals and statistics and their
use is described in detail. Chapter 4 introduces the concepts of multivariate spatial
sign, signed-rank, and rank, and shows their connection to certain L1 objective func-
tions. Also sign and rank covariance matrices are discussed carefully. The first four
chapters thus provide the necessary tools to understand the remaining part of the
book.

The one-sample location case is treated thoroughly in Chapters 5-8. The story starts with the familiar Hotelling's T² test and the corresponding estimate, the sample mean vector. The spatial sign test with the spatial median, as well as the spatial signed-rank test with the Hodges-Lehmann estimate, are treated in Chapters 6 and 7. All the tests and estimates are made practical; the algorithms for the estimates and the estimates of the covariance matrices of the estimates are also discussed and described in detail. Chapter 8 is devoted to the comparisons of these three competing approaches.


Chapters 9 and 10 continue the discussion of the one-sample case. In Chapter 9, tests


and estimates for the shape (scatter) matrices based on different sign and rank co-
variance matrices are given. Also principal component analysis is discussed. Chap-
ter 10 provides different tests for the important hypothesis of independence between
two subvectors.

Sign and rank tests with companion estimates for the comparison of two or sev-
eral treatment effects are given in Chapter 11 (independent samples) and Chapter 12
(randomized block design). The general multivariate multiple regression case with
L1 objective functions is finally discussed in Chapter 13. The book ends with sign
and rank procedures for cluster-dependent data in Chapter 14.

Throughout the book, the theory is illustrated with examples. For computation
of the statistical procedures described in the book, the R package MNM (and Spa-
tialNP) is available on CRAN. In the analysis we always compare three different
score functions, the identity score, the spatial sign score, and the spatial rank (or
spatial signed-rank) score, and the general estimating and testing strategy is ex-
plained in each case. Some basic vector and matrix algebra tools and asymptotic
results are given in Appendices A and B.

Acknowledgements
The research reported in this book is to a great degree based on the thesis work
of several ex-students of mine, including Ahti Niinimaa, Jyrki Möttönen, Samuli
Visuri, Esa Ollila, Sara Taskinen, Jaakko Nevalainen, Seija Sirkiä, and Klaus Nord-
hausen. I wish to thank them all. This would not have been possible without their
work. I have been lucky to have such excellent students. My special thanks go to
Klaus Nordhausen for his hard work in writing and putting together (with Jyrki
and Seija) the R-code to implement the theory. I am naturally also indebted to
many colleagues and coauthors for valuable and stimulating discussions. I express
my sincere thanks for discussions and cooperation in this specific research area
with Biman Chakraborty, Probal Chaudhuri, Christopher Croux, Marc Hallin, Tom
Hettmansperger, Visa Koivunen, Denis Larocque, Jukka Nyblom, Davy Paindav-
eine, Ron Randles, Bob Serfling, Juha Tienari, and Dave Tyler.

Thanks are also due to the Academy of Finland for several research grants for work-
ing in the area of multivariate nonparametric methods. I also thank the editors of this
series and John Kimmel of Springer-Verlag for his encouragement and patience.

Tampere,
January 2010 Hannu Oja
Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 Multivariate location and scatter models . . . . . . . . . . . . . . . . . . . . . . . . . . 5


2.1 Construction of the multivariate models . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Multivariate elliptical distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Other distribution families . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3 Location and scatter functionals and sample statistics . . . . . . . . . . . . . . 15


3.1 Location and scatter functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Location and scatter statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3 First and second moments of location and scatter statistics . . . . . . . . 19
3.4 Breakdown point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5 Influence function and asymptotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.6 Other uses of location and scatter statistics . . . . . . . . . . . . . . . . . . . . . 25

4 Multivariate signs and ranks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29


4.1 The use of score functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Univariate signs and ranks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.3 Multivariate spatial signs and ranks . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.4 Sign and rank covariance matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.5 Other approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

5 One-sample problem: Hotelling’s T²-test . . . . . . . . . . . . . . . . . . . . . . . . . . 47


5.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2 General strategy for estimation and testing . . . . . . . . . . . . . . . . . . . . . 49
5.3 Hotelling’s T²-test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52


6 One-sample problem: Spatial sign test and spatial median . . . . . . . . . . 59


6.1 Multivariate spatial sign test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.1.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.1.2 The test outer standardization . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.1.3 The test with inner standardization . . . . . . . . . . . . . . . . . . . . . . 65
6.1.4 Other sign-based approaches for testing problem . . . . . . . . . . 68
6.2 Multivariate spatial median . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.2.1 The regular spatial median . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.2.2 The estimate with inner standardization . . . . . . . . . . . . . . . . . . 75
6.2.3 Other multivariate medians . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

7 One-sample problem: Spatial signed-rank test and Hodges-Lehmann estimate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.1 Multivariate spatial signed-rank test . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.2 Multivariate spatial Hodges-Lehmann estimate . . . . . . . . . . . . . . . . . . 89
7.3 Other approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

8 One-sample problem: Comparisons of tests and estimates . . . . . . . . . . 95


8.1 Asymptotic relative efficiencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
8.2 Finite sample comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

9 One-sample problem: Inference for shape . . . . . . . . . . . . . . . . . . . . . . . . . 107


9.1 The estimation and testing problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.2 Important matrix tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
9.3 The general strategy for estimation and testing . . . . . . . . . . . . . . . . . . 110
9.4 Test and estimate based on UCOV . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
9.5 Test and estimates based on TCOV . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
9.6 Tests and estimates based on RCOV . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
9.7 Limiting efficiencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
9.8 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
9.9 Principal component analysis based on spatial signs and ranks . . . . . 123
9.10 Other approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

10 Multivariate tests of independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131


10.1 The problem and a general strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
10.2 Wilks’ and Pillai’s tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
10.3 Tests based on spatial signs and ranks . . . . . . . . . . . . . . . . . . . . . . . . . . 133
10.4 Efficiency comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
10.5 A real data example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
10.6 Canonical correlation analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
10.7 Other approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

11 Several-sample location problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145


11.1 A general strategy for testing and estimation . . . . . . . . . . . . . . . . . . . . 145
11.2 Hotelling’s T² and MANOVA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
11.3 The test based on spatial signs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
11.4 The tests based on spatial ranks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
11.5 Estimation of the treatment effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
11.6 An example: Egyptian skulls from three epochs. . . . . . . . . . . . . . . . . . 163
11.7 References and other approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168

12 Randomized blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171


12.1 The problem and the test statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
12.2 Limiting distributions and efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
12.3 Treatment effect estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
12.4 Affine invariant tests and affine equivariant estimates . . . . . . . . . . . . . 179
12.5 Examples and final remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
12.6 Other approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

13 Multivariate linear regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183


13.1 General strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
13.2 Multivariate linear L2 regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
13.3 L1 regression based on spatial signs . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
13.4 L1 regression based on spatial ranks . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
13.5 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
13.6 Other approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

14 Analysis of cluster-correlated data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201


14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
14.2 One-sample case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
14.2.1 Notation and assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
14.2.2 Tests and estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
14.2.3 Tests and estimates, weighted versions . . . . . . . . . . . . . . . . . . 204
14.3 Two samples: Weighted spatial rank test . . . . . . . . . . . . . . . . . . . . . . . 206
14.4 References and other approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

A Some vector and matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

B Asymptotical results for methods based on spatial signs . . . . . . . . . . . . 215


B.1 Some auxiliary results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
B.2 Basic limit theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
B.3 Notation and assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
B.4 Limiting results for spatial median . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
B.5 Limiting results for the multivariate regression estimate . . . . . . . . . . 219
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
Notation

Y = (y1, y2, ..., yn)'    n × p matrix of response variables
X = (x1, x2, ..., xn)'    n × q matrix of explaining variables
Np(μ, Σ)                  p-variate normal distribution
tν,p(μ, Σ)                p-variate t-distribution with ν degrees of freedom
Ep(μ, Σ, ρ)               p-variate elliptic distribution
Sp                        p-variate unit sphere
Bp                        p-variate unit ball
|A|                       Matrix norm [tr(A'A)]^(1/2)
M(F)                      Multivariate location vector (functional)
C(F)                      Multivariate scatter matrix (functional)
M(Y)                      Multivariate location statistic
S(Y)                      Multivariate scatter statistic
T(y)                      General score function
L(y)                      Optimal location score function
U(y)                      Spatial sign function
R(y), RF(y)               Spatial rank function
Q(y), QF(y)               Spatial signed-rank function
Ui = U(yi)                Spatial sign of yi
Ri = R(yi)                Spatial rank of yi
Qi = Q(yi)                Spatial signed-rank of yi
AVE                       Average
COV                       Covariance matrix
UCOV                      Sign covariance matrix
TCOV                      Sign covariance matrix based on pairs of observations
QCOV                      Signed-rank covariance matrix
RCOV                      Rank covariance matrix

Chapter 1
Introduction

Classical multivariate statistical inference methods (Hotelling’s T², multivariate


analysis of variance, multivariate regression, tests for independence, canonical cor-
relation analysis, principal component analysis, etc.) are based on the use of the reg-
ular sample mean vector and covariance matrix. See, for example, the monographs
by Anderson (2003) and Mardia et al. (1979). These standard moment-based mul-
tivariate techniques are optimal under the assumption of multivariate normality but
unfortunately poor in their efficiency for heavy-tailed distributions and are highly
sensitive to outlying observations. The book by Puri and Sen (1971) gives a com-
plete presentation of multivariate analysis methods based on marginal signs and
ranks. Pesarin (2001) considers permutation tests for multidimensional hypotheses.
In this book nonparametric and robust competitors to standard multivariate infer-
ence methods based on (multivariate) spatial signs and ranks are introduced and
discussed in detail. An R statistical software package MNM (Multivariate Nonpara-
metric Methods) to implement the procedures is freely available to users of spatial
sign and rank methods.

The univariate concepts of sign and rank are based on the ordering of the uni-
variate data y1, ..., yn. The ordering is manifested by the univariate sign function U(y) with values −1, 0, and 1 for y < 0, y = 0, and y > 0, respectively. The sign and centered rank of the observation yi are then U(yi) and AVE_j{U(yi − yj)}. In the
multivariate case there is no natural coordinate-free ordering of the data points; see
Barnett (1976) for a discussion on the problem. An approach utilizing L1 objective
or criterion functions is therefore often used to extend these concepts to the multi-
variate case. Let Y = (y1, ..., yn)' be an n × p data matrix with n observations and p variables. The multivariate spatial sign Ui, multivariate spatial (centered) rank Ri, and multivariate spatial signed-rank Qi, i = 1, ..., n, may be defined implicitly using three L1 criterion functions with the Euclidean norm | · |:


AVE{|yi|} = AVE{Ui'yi},
(1/2) AVE{|yi − yj|} = AVE{Ri'yi},  and
(1/4) AVE{|yi − yj| + |yi + yj|} = AVE{Qi'yi}.
See Hettmansperger and Aubuchon (1988). Note also that the sign, centered rank,
and signed-rank may be seen as scores T(y) corresponding to the three objective
functions. These score functions then are

U(y) = |y|^(−1) y,
R(y) = AVE{U(y − yi)},  and
Q(y) = (1/2) AVE{U(y − yi) + U(y + yi)}.
The identity score T(yi) = yi, i = 1, ..., n, is the score corresponding to the regular L2 criterion AVE{|yi|²} = AVE{yi'yi}.
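To make these definitions concrete, the following sketch computes the spatial signs, spatial ranks, and spatial signed-ranks of a data matrix directly from the score functions above. The function names are ours and the code is base R only; the MNM and SpatialNP packages mentioned in the Preface provide production implementations.

spatial.sign.score <- function(y) {
  # U(y) = |y|^(-1) y, with U(0) = 0
  r <- sqrt(sum(y^2))
  if (r > 0) y / r else 0 * y
}
spatial.rank.score <- function(y, Y) {
  # R(y) = AVE_i { U(y - y_i) }
  colMeans(t(apply(Y, 1, function(yi) spatial.sign.score(y - yi))))
}
spatial.signedrank.score <- function(y, Y) {
  # Q(y) = (1/2) AVE_i { U(y - y_i) + U(y + y_i) }; note U(y + y_i) = U(y - (-y_i))
  (spatial.rank.score(y, Y) + spatial.rank.score(y, -Y)) / 2
}

Y <- matrix(rnorm(50 * 3), ncol = 3)                  # n = 50, p = 3
U <- t(apply(Y, 1, spatial.sign.score))               # spatial signs U_i
R <- t(apply(Y, 1, spatial.rank.score, Y = Y))        # spatial ranks R_i
Q <- t(apply(Y, 1, spatial.signedrank.score, Y = Y))  # spatial signed-ranks Q_i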

Multivariate spatial sign and spatial rank methods are thus based on the L1 ob-
jective functions and corresponding score functions. The L1 methods have a long
history in the univariate case but they are often regarded as computationally highly
demanding. See, however, Portnoy and Koenker (1997). The first objective function
AVE{|yi |}, if applied to the residuals in the general linear regression model, is the
mean deviation of the residuals from the origin, and it is the basis for the so called
least absolute deviation (LAD) methods. It yields different median-type estimates
and sign tests in the one-sample, two-sample, several-sample and finally general
linear model settings. The second objective function AVE{|yi − yj|} is the mean difference of the residuals, which in fact measures how close together the residuals are. The second and third objective functions generate Hodges-Lehmann-type
estimates and rank tests for different location problems.
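As a small illustration of the LAD idea, the sketch below (ours, not the book's algorithm) computes the spatial median by minimizing the first criterion function AVE{|yi − μ|} with R's general-purpose optimizer; Chapter 6 discusses the estimate and dedicated algorithms in detail.

spatial.median <- function(Y) {
  # minimize AVE{ |y_i - mu| } over the location candidate mu
  crit <- function(mu) mean(sqrt(rowSums(sweep(Y, 2, mu)^2)))
  optim(colMeans(Y), crit, method = "BFGS")$par
}

Y <- matrix(rt(200 * 2, df = 3), ncol = 2)  # heavy-tailed sample
spatial.median(Y)                           # compare with colMeans(Y)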

The general strategy in the analysis of the multivariate data followed in this book
is first to replace the original observations yi by some scores Ti = T(yi ) or, in more
complex designs, by centered and/or standardized scores T̂i = T̂(yi ), i = 1, ..., n.
The statistical tests are then based on the new data matrix

T = (T1, ..., Tn)'   or   T̂ = (T̂1, ..., T̂n)'.

The spatial sign score U(y), the spatial rank score R(y), and the spatial signed-
rank score Q(y) thus correspond to the three L1 criterion functions given above.
The tests are then rotation invariant but not affine invariant. Inner centering and/or
standardization is used to attain the desired affine invariance property of the tests.
The location estimates are chosen to minimize the selected criterion function; they
are also obtained if one applies inner centering with the corresponding score. Inner
standardization may be used to construct affine equivariant versions of the estimates

and as a side product one gets scatter matrix estimates for the inference on the
covariance structure of the data.

The tests and estimates for the multivariate location problem based on multivari-
ate signs and ranks have been widely discussed in the literature. See, for example,
Möttönen and Oja (1995), Choi and Marden (1997), Marden (1999a), and Oja and
Randles (2004). The scatter matrix estimates by Tyler (1987) and Dümbgen (1998)
are often used in inner standardizations. The location tests and estimates are robust
and they have good efficiency properties even in the multivariate normal model.
Möttönen et al. (1997) calculated the asymptotic efficiencies e1 (p, ν ) and e2 (p, ν )
of the multivariate spatial sign and rank methods, respectively, in the p-variate tν ,p
distribution case (t∞,p is the p-variate normal distribution). In the 3-variate case, for
example, the asymptotic efficiencies are

e1 (3, 3) = 2.162, e1 (3, 10) = 1.009, e1 (3, ∞) = 0.849,


e2 (3, 3) = 1.994, e2 (3, 10) = 1.081, e2 (3, ∞) = 0.973,

and in the 10-variate case the efficiencies are even higher:

e1 (10, 3) = 2.422, e1 (10, 10) = 1.131, e1 (10, ∞) = 0.951,


e2 (10, 3) = 2.093, e2 (10, 10) = 1.103, e2 (10, ∞) = 0.989.

The procedures based on spatial signs and ranks, however, yield only one
possible approach to multivariate nonparametric tests (sign test, rank test) and cor-
responding estimates (median, Hodges-Lehmann estimate). Randles (1989), for
example, developed an affine invariant sign test based on interdirections. Interdi-
rections measure the angular distance between two observation vectors relative to
the rest of the data. Randles (1989) was followed by a series of papers introduc-
ing nonparametric sign and rank interdirection tests for different location problems.
These tests are typically asymptotically equivalent with affine invariant versions of
the spatial sign and rank tests. The tests and estimates based on interdirections are,
unfortunately, computationally heavy.

The multivariate inference methods based on marginal signs and ranks are de-
scribed in detail in the monograph by Puri and Sen (1971). This first extension of
the univariate sign and rank methods to the multivariate setting is based on the cri-
terion functions

AVE{|yi1| + · · · + |yip|}   and   AVE{|yi1 − yj1| + · · · + |yip − yjp|},



respectively. Note that here the Euclidean norm, the L2-norm |y| = (y1² + · · · + yp²)^(1/2), is replaced by the L1-norm |y| = |y1| + · · · + |yp|. Unfortunately, the tests are
not affine invariant and the estimates are not affine equivariant in this approach.
Chakraborty and Chaudhuri (1996) and Chakraborty and Chaudhuri (1998) used the

so-called transformation retransformation technique (TR-technique) to find affine


invariant versions of the tests and affine equivariant versions of the estimates.

Affine equivariant multivariate signs and ranks are obtained if one uses the L1
criterion functions

AVE{V(yi1, ..., yip, 0)}   and   AVE{V(yi1, ..., yi,p+1)}

where the average is over all the p-tuples and (p + 1)-tuples of observations, respectively, and

V(y_1, ..., y_{p+1}) = \frac{1}{p!}\, \mathrm{abs}\, \det \begin{pmatrix} 1 & \cdots & 1 \\ y_1 & \cdots & y_{p+1} \end{pmatrix}
is the volume of the p-variate simplex with vertices y1, ..., yp+1. The one-sample location estimate is known as the Oja median (Oja, 1983). The approach in the
one-sample and two-sample location cases is described in Oja (1999). In a paral-
lel approach Koshevoy and Mosler (1997a,b, 1998) used so-called zonotopes and
lift-zonotopes to illustrate and characterize a multivariate data cloud. The duality
relationship between these two approaches is analyzed in Koshevoy et al. (2004).

In all the approaches listed above the power of the rank tests may often be increased if some further transformations are applied to the ranks. In a series of pa-
pers, Hallin and Paindaveine constructed optimal signed-rank tests for the location
and scatter problems in the elliptical model; see the series of papers starting with
Hallin and Paindaveine (2002). In their approach, the location tests were based on
the spatial signs and optimally transformed ranks of the Euclidean lengths of the
standardized observations. Like the model of elliptically symmetric distributions, the so-called independent component (IC) model is an extension of the multivariate normal model. In the test construction in this model, one first transforms the obser-
vations to the estimated independent coordinates, then calculates the values of the
marginal sign and rank test statistics, and finally combines the asymptotically inde-
pendent tests in the regular way. See Nordhausen et al. (2009) and Oja et al. (2009)
for optimal rank tests in the IC model.

In this book the approach is thus based on the spatial signs and ranks. A uni-
fied theory starting with the simple one-sample location problem and proceeding
through the several-sample location problems to the general multivariate linear re-
gression model and finally to the analysis of cluster-dependent data is presented.
The theory is often presented using a general score function and, for comparison
to classical methods, the classical normal-based approach (L2 criterion and identity
score function) is carefully reviewed in each case. Also statistical inference on scat-
ter or shape matrices based on the spatial signs and ranks is discussed. The theory is
illustrated with several examples. For the computation of the statistical procedures
described in the book, R packages MNM and SpatialNP are available on CRAN.
The readers who are not so familiar with R are advised to learn more from Venables
et al. (2009), Dalgaard (2008), or Everitt (2005).
Chapter 2
Multivariate location and scatter models

Abstract In this chapter we first introduce and describe different symmetrical and
asymmetrical parametric and semiparametric (linear) models which are then later
used as the model assumptions in the statistical analysis. The models discussed in-
clude the multivariate normal distribution Np(μ, Σ) and its different extensions, including the multivariate t distribution tν,p(μ, Σ) and the multivariate elliptical distribution Ep(μ, Σ, ρ), as well as still wider semiparametric symmetrical models. Also
some models with skew distributions (generalized elliptical model, mixture models,
skew-elliptical model, independent component model) are briefly discussed.

2.1 Construction of the multivariate models

In this chapter we consider different parametric, nonparametric, and semiparametric


models for multivariate continuous observations. Consider a data matrix consisting
of n observed values of a p-variate response variable,

Y = (y1, ..., yn)' ∈ M(n, p),

where M (n, p) is the set of n × p matrices. In the one-sample case, the p-variate
observations yi , i = 1, ..., n, may be thought to be independent and to be generated
by
yi = μ + Ω ε i , i = 1, ..., n,
where the p-vectors ε i are called standardized and centered residuals, μ is a lo-
cation p-vector, Ω is a full-rank p × p transformation matrix, and Σ = ΩΩ' > 0
is called a scatter matrix. We are more explicit later regarding what is meant by a
standardized and centered random vector. (It is usual to say that ε i is standardized
if COV(ε i ) = I p , and it is centered if E(ε i ) = 0.) Notation Σ > 0 means that Σ is
positive definite (with rank p).

In the several-sample and multivariate regression case, we write


X = (x1, ..., xn)'

for an n × q matrix of the values of q explaining variables measured on n individuals


and assume that
Y = Xβ + εΩ',
where β is the q × p matrix of regression coefficients and ε is an n × p matrix of n independent and standardized residuals ε1, ..., εn. The one-sample case is obtained
with X = 1n . In this book the aim is to develop statistical inference methods, tests
and estimates, for the unknown parameters β and also for Σ = ΩΩ' under weak assumptions on the distribution of the standardized residuals εi, i = 1, ..., n.
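The following lines sketch how data obeying the one-sample model yi = μ + Ωεi can be generated in R; the multivariate normal choice for the standardized residuals εi is only one admissible example, and all the numerical values below are arbitrary.

set.seed(1)
n <- 100; p <- 3
mu    <- c(1, 0, -1)
Omega <- matrix(c(2,   0, 0,
                  1,   1, 0,
                  0.5, 0, 1), nrow = 3, byrow = TRUE)  # full rank
eps <- matrix(rnorm(n * p), ncol = p)        # standardized residuals
Y   <- sweep(eps %*% t(Omega), 2, mu, "+")   # rows y_i' = mu' + eps_i' Omega'
Sigma <- Omega %*% t(Omega)                  # scatter matrix Sigma = Omega Omega'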

Different parametric and semiparametric (or nonparametric) models are obtained


by making different assumptions on the distribution of ε i . First we list symmetry as-
sumptions often needed in future developments. Symmetry of a distribution of the
standardized p-variate random variable ε may be seen as an invariance property
of the distribution under certain transformations. Relevant transformations include
orthogonal transformations (ε → Oε with O'O = OO' = Ip), sign-change trans-
formations (ε → Jε , where J is a sign-change matrix, a p × p diagonal matrix with
diagonal elements ±1), and permutations ε → Pε , where P is a p × p permutation
matrix (obtained by successively permuting the rows and/or columns of I p ). See
also Appendix A for our matrix notation and definitions.

Definition 2.1. The random p-vector ε is


• spherically symmetrical if Oε ∼ ε for all orthogonal matrices O
• marginally symmetrical if Jε ∼ ε for all sign-change matrices J
• symmetrical (or centrally symmetrical) if −ε ∼ ε
• exchangeable if Pε ∼ ε for all permutation matrices P

Note that a spherically symmetrical random variable is also marginally symmet-


rical, centrally symmetrical and exchangeable as J, −I p and P are all orthogonal. A
marginally symmetrical random variable is naturally also centrally symmetrical. A
hierarchy of symmetrical models is then obtained with assumptions
(A0) ε i ∼ N p (0, I p )
(A1) ε i spherically symmetrical
(A2) ε i marginally symmetrical and exchangeable
(A3) ε i marginally symmetrical
(A4) ε i symmetrical
The first and strongest assumption gives the regular multivariate normal (para-
metric) model with yi ∼ N p (μ , Σ ) (or yi ∼ N p (xi β , Σ )). The classical multivariate
inference methods (Hotelling’s T 2 , multivariate analysis of variance (MANOVA),
multivariate regression analysis, principal component analysis (PCA), canonical
correlation analysis (CCA), factor analysis (FA), etc.) rely on the assumption of
multivariate normality. The multivariate normal distribution is discussed in Section
2.2. In models (A1)–(A4), which may be seen as semiparametric extensions of the

multivariate normal model (A0), the location parameter μ is the well defined sym-
metry center of the distribution of yi . Extra assumptions are needed to make Σ well
defined. In the models (A0)–(A2) the scatter matrix Σ has a natural interpretation;
Σ is proportional to the covariance matrix if it exists.

Asymmetrical semiparametric models are obtained if the assumptions are made


merely on the distribution of the direction vectors ui = |ε i |−1 ε i . Our directional
symmetry assumptions then are
(B1) ui uniformly distributed on unit sphere S p
(B2) ui marginally symmetrical and exchangeable
(B3) ui marginally symmetrical
(B4) ui symmetrical
Note that the assumptions (B1)–(B4) do not say anything about the distribution of
the radius or modulus ri = |ε i | of the standardized vector; the radius ri and direction
vector ui may even be dependent so that skew distributions are allowed.

Under the assumptions (B1)–(B4), parameters μ and Σ still describe the location
and the scatter of the distribution of the ε i . Under assumption (B2), for example, the
directions ui = |ε i |−1 ε i of the transformed (standardized and centered) observations
εi = Ω^(−1)(yi − μ) are “uniformly” distributed in the sense that E(ui) = 0 and p · E(ui ui') = Ip. (Then μ and Σ are the so-called Hettmansperger-Randles functionals
which are discussed later in the book.) Even in the weakest model (B4), μ is a
natural location parameter in the sense that E(ui ) = 0. (We later show that the spatial
median of ε i is then the zero vector.) Also, all hyperplanes going through μ divide
the probability mass of the distribution of yi into two parts of equal size 1/2. (Then
μ is the so-called half-space median or Tukey median.) Note, however, that in these models the scatter matrix Σ is no longer related to the regular covariance matrix.

In the following, we refer to these models or families of distributions for yi using


symbols (A0)–(A4) and (B1)–(B4). The model (A1) is the so-called elliptical model, which we discuss in more detail in Section 2.2. The model (A2), with somewhat weaker assumptions, is called in the following the location-scatter model.
This model includes all elliptical distributions and, for example, distributions with
independent and identically distributed components. It is straightforward to see the
following.

Theorem 2.1. The assumptions (A0)–(A4) and (B1)–(B4) satisfy the joint hierarchy

(A0) ⇒ (A1) ⇒ (A2) ⇒ (A3) ⇒ (A4)


⇓ ⇓ ⇓ ⇓
(B1) ⇒ (B2) ⇒ (B3) ⇒ (B4).

2.2 Multivariate elliptical distributions

In this section we consider the model (A1). As before, let

yi = μ + Ω ε i , i = 1, ..., n.

We assume that ε 1 , ..., ε n are independent and identically distributed random vectors
from a spherically symmetrical and continuous distribution. We say that the distri-
bution of ε is spherically symmetrical around the origin if the density function f (ε )
of ε depends on ε only through the modulus |ε |. We can then write

f (ε ) = exp{−ρ (|ε |)}

for some function ρ (r). Note that the equal density contours are then spheres. The
modulus ri = |ε i | and direction ui = |ε i |−1 ε i are independent, and the direction
vector ui is uniformly distributed on the p-dimensional unit sphere S p . It is then
easy to see that

E(ui) = 0   and   COV(ui) = E(ui ui') = (1/p) Ip.
The density of the modulus is

g(r) = cp r^(p−1) exp{−ρ(r)},   r > 0,

where
cp = 2π^(p/2) / Γ(p/2)
is the surface area of the unit sphere Sp. The scatter matrix Σ = ΩΩ' is, however, confounded with g. To fix the scatter matrix Σ one can then, for example, assume that ρ is chosen so that E(ri²) = p or Med(ri²) = χ²_{p,0.5} (the median of the chi-square distribution with p degrees of freedom). Then Σ is the regular covariance matrix in the multivariate normal case.

Under these assumptions, the random sample Y = (y1, ..., yn)' comes from a p-variate elliptical distribution with probability density function

fy(y) = |Σ|^(−1/2) f(Σ^(−1/2)(y − μ)),

where μ is the symmetry center and Σ > 0 is the scatter matrix (parameter). The
matrix Σ −1/2 is chosen here to be symmetric. The location parameter μ is the mean
vector (if it exists) and the scatter matrix Σ is proportional to the regular covariance
matrix (if it exists). We also write

yi ∼ E p (μ , Σ , ρ ).

Note that the transformation matrix Ω is not uniquely defined in the elliptical
model as, for any orthogonal matrix O, Ωεi = (ΩO)(O'εi) = Ω*ε*i, and also ε*i
has a spherically symmetric distribution with density f . In principal component
analysis the eigenvectors and eigenvalues of Σ are of interest; the orthogonal matrix
of eigenvectors O = O(Σ ) and the diagonal matrix of eigenvalues D = D(Σ ) (in a
decreasing order of magnitude) are obtained from the eigenvector and eigenvalue
decomposition
Σ = ODO'.
Write diag(Σ ) for a diagonal matrix having the same diagonal elements as Σ . Often,
the scatter matrix is normalized as

[diag(Σ )]−1/2 Σ [diag(Σ )]−1/2 .

This is the correlation matrix (if it exists). Another way to normalize the scatter
matrix is to divide it by a scale parameter.
Definition 2.2. Let Σ be a scatter matrix. Then a scale parameter σ 2 = σ 2 (Σ ) is the
scalar-valued function that satisfies

σ 2 (I p ) = 1 and σ 2 (c · Σ ) = c · σ 2(Σ ).

The normalized matrix


σ −2 Σ
is then the corresponding shape matrix.
Possible choices for the scale parameter are, for example, the arithmetic, geo-
metric, and harmonic means of the eigenvalues of Σ , namely

tr(Σ)/p,   [det(Σ)]^(1/p),   and   [tr(Σ^(−1))/p]^(−1).

See Paindaveine (2008) for a discussion of the choices of the scale and shape pa-
rameters.
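For illustration, the three scale parameters above are easy to compute from the eigenvalues of Σ, and each choice yields a shape matrix σ^(−2)Σ; the helper below is our own sketch.

scale.parameters <- function(Sigma) {
  ev <- eigen(Sigma, symmetric = TRUE)$values
  c(arithmetic = mean(ev),                  # tr(Sigma)/p
    geometric  = prod(ev)^(1 / length(ev)), # [det(Sigma)]^(1/p)
    harmonic   = 1 / mean(1 / ev))          # [tr(Sigma^(-1))/p]^(-1)
}

Sigma <- matrix(c(4, 1, 1, 2), 2, 2)
Sigma / scale.parameters(Sigma)["geometric"]  # corresponding shape matrix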

Example 2.1. Multivariate normal distribution. Assume that ε i is spherically


symmetric and that ri² ∼ χ²p. (Now E(ri²) = p.) It then follows that the distribution of εi is the standard multivariate normal distribution Np(0, Ip) with the (probability) density function

f(ε) = (2π)^(−p/2) exp{−ε'ε/2}.
We write ε i ∼ N p (0, I p ). The p components of ε i are then independent and dis-
tributed as N(0, 1). In fact, ε i is spherically symmetric and has independent compo-
nents if and only if ε i has a multivariate normal distribution. Finally, the distribution
of yi is a p-variate normal distribution Np(μ, Σ) with the density function

fy(y) = (2π)^(−p/2) |Σ|^(−1/2) exp{−(1/2)(y − μ)'Σ^(−1)(y − μ)}.

The distribution of the direction vector is easily obtained in the multivariate nor-
mal case as the components of ε i are independent and N(0, 1) distributed. Thus
ε²ij ∼ χ²1 and ε²ij/2 ∼ Γ(1/2). But then

(u²i1, ..., u²ip)' = (εi'εi)^(−1) (ε²i1, ..., ε²ip)'

has a so-called Dirichlet distribution Dp(1/2, ..., 1/2). See Section 3.3 in Bilodeau
and Brenner (1999). As the distribution of ui is the same for all spherical distribu-
tions, we have the following.
Theorem 2.2. Let the distribution of a p-variate random vector ε be spherically symmetric around the origin, and let u = |ε|^(−1)ε be the direction vector. Then (u²1, ..., u²p) has the Dirichlet distribution Dp(1/2, ..., 1/2). Moreover, u²1 + · · · + u²k ∼ Beta(k/2, (p − k)/2).
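A quick Monte Carlo check of the theorem (our own sketch, using the standard normal as one convenient spherical distribution):

set.seed(2)
p <- 3; n <- 1e5
eps <- matrix(rnorm(n * p), ncol = p)       # spherical residuals
u   <- eps / sqrt(rowSums(eps^2))           # direction vectors
round(colMeans(u), 3)                       # approx. (0, 0, 0)
round(p * crossprod(u) / n, 2)              # approx. I_p
ks.test(u[, 1]^2, "pbeta", 1/2, (p - 1)/2)  # u_1^2 ~ Beta(1/2, (p-1)/2)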

Example 2.2. Multivariate t distribution. Assume again that ε i is spherically


symmetric and that ri²/p ∼ F(p, ν) (the F distribution with p and ν degrees of freedom). Then the distribution of εi is the p-variate t distribution with ν degrees of freedom; write εi ∼ tν,p. The density function of εi is then

f(ε) = [Γ((p + ν)/2) / (Γ(ν/2)(πν)^(p/2))] (1 + ε'ε/ν)^(−(p+ν)/2).

The components of ε i are uncorrelated, but not independent, and their marginal
distribution is a univariate tν distribution. The smaller ν is, the heavier are the tails
of the distribution. The expected value exists if ν ≥ 2, and the covariance matrix
exists for degrees of freedom ν ≥ 3. The very heavy-tailed distribution with ν = 1
is called the multivariate Cauchy distribution. The multivariate normal distribution
is obtained as a limit case as ν → ∞. The distribution of yi = μ + Ω ε i is denoted by
tν,p(μ, Σ). If ε* ∼ Np(0, Ip) and s² ∼ χ²ν, and ε* and s² are independent, then

ε = (s²/ν)^(−1/2) ε* ∼ tν,p.
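This representation gives a direct way to simulate from tν,p(μ, Σ); a minimal sketch (the function name is ours):

rmvt.ep <- function(n, nu, mu, Omega) {
  p    <- length(mu)
  eps0 <- matrix(rnorm(n * p), ncol = p)  # eps* ~ N_p(0, I_p)
  s2   <- rchisq(n, df = nu)              # s^2 ~ chi^2_nu, independent of eps*
  eps  <- eps0 / sqrt(s2 / nu)            # eps ~ t_{nu,p}
  sweep(eps %*% t(Omega), 2, mu, "+")     # y = mu + Omega eps
}

Y <- rmvt.ep(500, nu = 3, mu = c(0, 0), Omega = diag(2))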

Example 2.3. Multivariate power exponential family. In the so-called multivari-


ate power exponential family, the density of the distribution of εi is

f(ε) = kp,ν exp{−|ε|^(2ν)/2},

where

kp,ν = pΓ(p/2) / [π^(p/2) Γ((2ν + p)/(2ν)) 2^((2ν+p)/(2ν))]

is determined so that the density integrates to 1. Now |εi|^(2ν) ∼ Γ(1/2, p/(2ν)), which can be used to simulate observations from this flexible parametric model. If ν = 1 then εi ∼ Np(0, Ip). The model includes both heavy-tailed (ν < 1) and light-tailed (ν > 1) elliptical distributions. The (heavy-tailed) multivariate double exponential (Laplace) distribution is given by ν = 1/2, and a (light-tailed) multivariate uniform elliptical distribution is obtained as a limit when ν → ∞. The distribution
of yi = Ω ε i + μ is denoted by PE(μ , Σ , ν ). See Gómez et al. (1998).

2.3 Other distribution families

Example 2.4. Location-scatter model. (A2) is a wider model than the elliptical
model assuming only that the components of ε i are exchangeable and marginally
symmetric; that is,

JPε i ∼ ε i , for all sign-changes J and permutations P.

In addition to elliptical distributions, the family includes distributions where the


marginal variables are symmetrical, independent, and identically distributed. The
model also includes distributions where ε i has a density of the general form

f (ε ) = exp {−ρ (||ε ||)} ,

where || · || is any permutation and sign change invariant metric (||JPε || = ||ε || for
all J and P). This is true for the Lα -norm

||ε||α = (|ε1|^α + · · · + |εp|^α)^(1/α),

for example, and therefore a wide variety of distributional shapes is available in this
model. Recall that the elliptical model is given with the L2 -norm.

We thus assume that our observed vectors yi are generated by yi = μ + Ω ε i ,


i = 1, ..., n. As in the elliptic model, both μ and Σ = ΩΩ' are well defined if we again assume that E(|εi|²) = p or Med(|εi|²) = χ²_{p,0.5}. Thus μ is the symmetry center

and Σ is proportional to the covariance matrix if it exists. (Σ is the covariance matrix


in the multivariate normal case.)

Note that if we still weaken the assumptions and assume only that

Jε i ∼ ε i , for all sign-changes J,

μ is still well defined (symmetry center) but Σ = ΩΛΩ', where Λ is a diagonal


matrix depending on the (unknown) distribution of ε i . It is, however, still possible
to fix some of the features of the transformation matrix Ω .

Example 2.5. Generalized elliptical model (B1) and its extension (B2). The defi-
nition of the generalized elliptical model was given by Frahm (2004). In this model
he assumes that
Oui ∼ ui , for all orthogonal O.
The direction vectors ui = |ε i |−1 ε i are then distributed as if the original observa-
tions εi were coming from an elliptically symmetric distribution. Randles (2000) considered the same assumption, saying that the distribution has “elliptical directions”. However, no assumptions on the distribution of the modulus ri = |εi| are made;
skew distributions may be obtained if the distribution of ri depends on ui . Again, a
weaker model is obtained if

JPui ∼ ui , for all sign-changes J and permutations P.

Still in this wider model, the spatial median of ε i is 0 and the so-called Tyler’s scat-
ter matrix (discussed later) is proportional to the identity matrix. This fixes location
and shape of the distribution of yi , that is, parameters μ and Σ (up to scale).

Example 2.6. Independent component model. In the independent component


model the independent and identically distributed random variables yi are gener-
ated by yi = μ + Ω ε i where the standardized and centered p-vector ε i has indepen-
dent components. In the so-called independent component analysis (ICA) the aim
is, based on the data Y = (y1 , ..., yn ) , to find an estimate of a retransformation ma-
trix Ω −1 (or transformation matrix Ω ) and then transform the data to independent
components for further analysis. See Hyvärinen et al. (2001) for the ICA problem.
It is then often assumed that only a few of the independent components are informa-
tive (dimension reduction problem). The problem is not well posed in this general
formulation of the model: if Ω −1 transforms to independent coordinates then so
do DPΩ −1 for all diagonal matrices D with nonzero diagonal elements and for all
permutation matrices P. See Nordhausen et al. (2009b) for additional assumptions
needed to make Ω unique.

Example 2.7. Mixtures of multivariate normal distributions and mixtures of el-


liptical distribution. In the mixture model of two multivariate normal distribu-
tions, ε i is thought to come from an N p (μ 1 , Σ 1 )-distribution with probability π1
and from an N p (μ 2 , Σ 2 )-distribution with probability π2 = 1 − π1 . The parame-
ters (π1 , π2 ), (μ 1 , μ 2 ), and (Σ 1 , Σ 2 ) then together determine the location, scatter,
skewness, and kurtosis properties of the multivariate distribution. A natural way to
standardize and center ε i is to require that E(ε i ) = 0 and COV(ε i ) = I p ; that is,

π1μ1 + π2μ2 = 0   and   π1(Σ1 + μ1μ1') + π2(Σ2 + μ2μ2') = Ip.

If μ1 = μ2 = 0 then εi is elliptically distributed and the heaviness of the tails is determined by (π1, π2) and (Σ1, Σ2). Intuitively, if π2 is “small” and Σ2 is “large”
then the second population may be thought to represent the population where the

outliers originated. If, for example, Σ1 = Ip and Σ2 = σ²Ip then the probability density function of εi is

f(ε) = π1 · (2π)^(−p/2) exp{−ε'ε/2} + π2 · (2πσ²)^(−p/2) exp{−ε'ε/(2σ²)}.

This mixture of two distributions can be easily extended to the case of general k ≥ 2
mixtures, and also to the case where the observations come from other elliptical
distributions. See McLachlan and Peel (2000).
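For illustration, the sketch below (ours) simulates standardized residuals from the two-component scale mixture above with Σ1 = Ip, Σ2 = σ²Ip, and μ1 = μ2 = 0; the final rescaling enforces the standardization COV(εi) = Ip.

rmix.eps <- function(n, p, pi2 = 0.1, sigma2 = 9) {
  from2 <- runif(n) < pi2                 # component indicators
  sdev  <- ifelse(from2, sqrt(sigma2), 1)
  eps   <- matrix(rnorm(n * p), ncol = p) * sdev
  eps / sqrt((1 - pi2) + pi2 * sigma2)    # so that COV(eps) = I_p
}

eps <- rmix.eps(1000, p = 2)  # pi2 small, sigma2 large: an outlier model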

Example 2.8. Skew-normal and skew-elliptical model. In the skew-normal model,


the p-variate “standardized” observations ε i are obtained as follows. Let ε ∗i come
from a (p + 1)-variate standard normal distribution N p+1 (0, I p+1 ) and write
εi = U(ε*i,p+1 − α − β ε*ip) · (ε*i1, ..., ε*ip)',   i = 1, ..., n.

Here U(y) = −1, 0, 1 if y <, =, > 0. Then again yi = μ + Ω ε i , i = 1, ..., n. The


multivariate skew-elliptical distribution may be defined in the same way just by
assuming that ε ∗i come from E p+1(0, I p+1 , ρ ). Note that a still more general model
is obtained if we assume that
εi = si · (ε*i1, ..., ε*ip)',

where ε*i comes from Ep(0, Ip, ρ) and si is a random variable with possible values ±1, possibly depending on ε*i. Note that if μ = 0 is known then Σ is proportional to the regular covariance matrix with respect to the origin; that is, E(yi yi') ∝ Σ. See
Azzalini (2005) and references therein.
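The skew-normal construction is easy to simulate; a minimal sketch (function name and parameter values ours):

rskew.eps <- function(n, p, alpha = 0, beta = 1) {
  # eps*_i ~ N_{p+1}(0, I_{p+1}); the sign of eps*_{i,p+1} - alpha - beta * eps*_{ip}
  # flips the first p coordinates
  eps.star <- matrix(rnorm(n * (p + 1)), ncol = p + 1)
  s <- sign(eps.star[, p + 1] - alpha - beta * eps.star[, p])
  eps.star[, 1:p, drop = FALSE] * s
}

eps <- rskew.eps(1000, p = 2, alpha = 0.5, beta = 2)
colMeans(eps)  # the p-th coordinate has nonzero mean when beta != 0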
Chapter 3
Location and scatter functionals and sample
statistics

Abstract In this chapter, we introduce multivariate location and scatter functionals,


M(F) and S(F), which can be used to describe the multivariate distribution. The
sample versions of the functionals, M(Y) and S(Y), respectively, can then often be
used to estimate the population parameters μ and Σ in the semiparametric models
introduced in Chapter 2. Some general properties and the use of the sample statistics
are discussed.

3.1 Location and scatter functionals

The characteristics of univariate distributions most commonly considered are lo-


cation, scale, skewness, and kurtosis. These concepts are often identified with the
mean, standard deviation, and standardized third and fourth moments. Skewness
and kurtosis are often seen only as secondary statistics indicating the stability of
the primary statistics, location and scale. Skewness and kurtosis are also used in
(parametric) model selection. In parametric or semiparametric models one often has
natural parameters for location and scale, that is, μ and Σ . In wide nonparamet-
ric models functionals for location and scale must be used instead. The functionals
or measures are then supposed to satisfy certain natural equivariance or invariance
properties.

We first define what we mean by a location vector and a scatter matrix defined as
vector- and matrix-valued functionals in wide nonparametric families of multivari-
ate distributions (often including discrete distributions as well). Let y be a p-variate
random variable with cumulative distribution function (cdf) Fy .

Definition 3.1.
(i) A p-vector M(F) is a location vector (functional) if it is affine equivariant; that
is,
M(FAy+b ) = AM(Fy ) + b


for all random vectors y, all full-rank p × p-matrices A and all p-vectors b.
(ii) A symmetric p × p matrix S(F) ≥ 0 is a scatter matrix (functional) if it is affine
equivariant in the sense that

S(FAy+b) = AS(Fy)A'

for all random vectors y, all full-rank p × p-matrices A and all p-vectors b.

Classical location and scatter functionals, namely the mean vector E(y) and the
covariance matrix
 
COV(y) = E[(y − E(y))(y − E(y))'],

serve as first examples. Note that the matrix of second moments E(yy'), for example,
is not a scatter matrix in the regular sense but can be seen as a scatter matrix with
respect to the origin.

The theory of location and scatter functionals has been developed mainly to find
new tools for robust estimation of the regular mean vector and covariance matrix
in a neighborhood of the multivariate normal model or in the wider model of el-
liptically symmetric distributions. The competitors of the regular covariance matrix
do not usually have the so-called independence property: if a random vector y has
independent components then S(Fy ) is a diagonal matrix. It is easy to see that the
regular covariance matrix has the independence property. Naturally this property is
not important in the elliptic model as the multivariate normal distribution is the only
elliptical distribution that can have independent margins. On the other hand, the
independence property is crucial if one is working in the independent component
model mentioned in Chapter 2 (the ICA problem).

Using the affine equivariance properties, one easily gets the following.

Theorem 3.1. Assume that random vector y is distributed in the location-scatter


model (A2); that is,
y = μ + Ωε ,
where
JPε ∼ ε , for all sign-changes J and permutations P,
μ is the location vector (parameter) and Σ = ΩΩ' is the scatter matrix (parameter).
Then
(i) M(Fy ) = μ for all location vectors M.
(ii) S(Fy) ∝ Σ for all scatter matrices S.

The determinant det(S) or trace tr(S) is often used as a global measure of multi-
variate scatter. In fact, [det(S)]^(1/p) is the geometric mean and tr(S)/p the arithmetic
mean of the eigenvalues of S. The functional det(COV(y)) is sometimes called the

generalized variance; the functional tr(COV(y)) may be seen as a multivariate ex-


tension of the variance as well because tr(COV(y)) = E(|y − E(y)|2 ).

There are several alternative competing techniques to construct location and scat-
ter functionals, for example, M-functionals, S-functionals, τ -functionals, projection-
based functionals, CM- and MM-functionals, and so on. These functionals and related estimates are discussed in numerous research and review papers; see, for example, Maronna (1976), Davies (1987), Lopuhaä (1989), and Tyler
(2002). See also the recent monograph by Maronna et al. (2006). A common fea-
ture in these approaches is that the functionals and related estimates are built for
inference in elliptical models only. Next we consider M-functionals in more detail.

Definition 3.2. Location and scatter M-functionals are functionals M = M(Fy ) and
S = S(Fy ) which simultaneously satisfy two implicit equations

M = [E[w1(r)]]^(−1) E[w1(r)y]

and

S = [E[w3(r)]]^(−1) E[w2(r)(y − M)(y − M)']
for some suitably chosen weight functions w1 (r), w2 (r), and w3 (r). The random
variable r is the Mahalanobis distance between y and M; that is,

r = |y − M|_S = [(y − M)'S^(−1)(y − M)]^(1/2).

Consider an elliptic model with known ρ(r) and its derivative function ψ(r) = ρ'(r). If one then chooses w1(r) = w2(r) = ψ(r)/r and w3(r) ≡ 1, the M-functionals
are called the pseudo maximum likelihood (ML) functionals corresponding to that
specific distribution determined by ρ . In the multivariate normal case w1 (r) ≡
w2 (r) ≡ 1, and the corresponding functionals are the mean vector and the covari-
ance matrix again. A classical example is Huber’s M-functional with choices

w1(r) = min(c/r, 1)   and   w2(r) = d · min(c²/r², 1)

(and w3 (r) ≡ 1) with some positive tuning constants c and d. The value of the func-
tional does not depend strongly on the tails of the distribution; the tuning constant c
controls this property. The constant d is just a scaling factor.

If M1 (F) and S1 (F) are any affine equivariant location and scatter functionals
then so are the one-step M-functionals, starting from M1 and S1 , and given by

M2 = [E[w1(r)]]^(−1) E[w1(r)y]

and

S2 = E[w2(r)(y − M1)(y − M1)'],

where now r = |y − M1 |S1 . It is easy to see that M2 and S2 are affine equivariant
location and scatter functionals as well. Repeating this step until it converges yields
the regular M-estimate (with w3 (r) ≡ 1).

3.2 Location and scatter statistics

In this section we introduce sample statistics to estimate unknown location and scat-
ter parameters μ and Σ in different models (or the unknown theoretical or population
values of location and scatter functionals M(F) and S(F)). The values of the sam-
ple statistics are similarly denoted by M(Y) and S(Y), where Y = (y1, ..., yn)' ∈ M(n, p) (the sample space).

Definition 3.3. (i) A p-vector M(Y) is a location statistic if it is affine equivariant in the sense that

M(YA′ + 1b′) = AM(Y) + b

for all data matrices Y, all full-rank p × p matrices A, and all p-vectors b.
(ii) A symmetric p × p matrix S(Y) ≥ 0 is a scatter statistic if it is affine equivariant in the sense that

S(YA′ + 1b′) = AS(Y)A′

for all datasets Y with S(Y) > 0, all full-rank p × p matrices A, and all p-vectors b.

If Y = (y1 , ..., yn ) is a random sample, it is then often natural that location and
scatter statistics are invariant in the following sense.

Definition 3.4.
(i) A location statistic M is permutation invariant if M(Y) = M(PY) for all Y and
all n × n permutation matrices P.
(ii) A scatter statistic S is permutation invariant if S(Y) = S(PY) for all Y and all
n × n permutation matrices P.

Note that if M(Y) and S(Y) are not permutation invariant, invariant estimates
(with the same bias but with smaller variation; use the Rao-Blackwell theorem) can
be easily obtained as

AVEP {M(PY)} and AVEP {S(PY)}

where the average is over all n! possible permutation matrices P.

In the one-sample location test constructions we often use scatter statistics S with
respect to the origin that are permutation and sign-change invariant, that is,

S(JPYA′) = AS(Y)A′

for all Y with S(Y) > 0, all p × p matrices A, all n × n permutation matrices P, and
all sign-change matrices J.

Location and scatter functionals M(F) and S(F) yield corresponding sample statistics simply by applying the definitions to the empirical distribution Fn,

M(Y) = M(Fn) and S(Y) = S(Fn),

where Fn is the empirical p-variate cumulative distribution function based on the sample Y = (y1, ..., yn)′. These statistics are then automatically permutation invariant.

The different techniques to construct location and scatter functionals (M-functionals, S-functionals, MM-functionals, τ-functionals, etc.) mentioned in Section 3.1 may thus be used to find the corresponding sample statistics. The M-statistics, for example, may be defined in the following way.

Definition 3.5. Location and scatter M-statistics M = M(Y) and S = S(Y) simultaneously satisfy the two implicit equations

M = [AVE[w1(ri)]]^{−1} AVE[w1(ri) yi]

and

S = [AVE[w3(ri)]]^{−1} AVE[w2(ri)(yi − M)(yi − M)′]

for weight functions w1(r), w2(r), and w3(r). The scalar ri is the Mahalanobis distance between yi and M; that is, ri = |yi − M|_S.
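To make the definition concrete, the following is a minimal R sketch (an illustrative base R implementation, not the algorithm of the MNM package; the function name huberM and the default tuning constants are hypothetical choices) that computes Huber's location and scatter M-estimates by repeating the one-step update of Section 3.1 until convergence.

# Iterative computation of Huber's M-estimates (a sketch;
# starts from the sample mean and covariance, uses w3(r) = 1)
huberM <- function(Y, c = 2, d = 1, eps = 1e-6, maxiter = 100) {
  M <- colMeans(Y)                     # initial location statistic
  S <- cov(Y)                          # initial scatter statistic
  for (k in 1:maxiter) {
    r <- sqrt(mahalanobis(Y, M, S))    # r_i = |y_i - M|_S
    w1 <- pmin(c / r, 1)               # Huber's weights w1(r_i)
    w2 <- d * pmin(c^2 / r^2, 1)       # Huber's weights w2(r_i)
    M.new <- colSums(w1 * Y) / sum(w1) # M = [AVE w1]^{-1} AVE[w1 y]
    Yc <- sweep(Y, 2, M.new)
    S.new <- crossprod(sqrt(w2) * Yc) / nrow(Y)  # AVE[w2 (y - M)(y - M)']
    if (max(abs(M.new - M), abs(S.new - S)) < eps) break
    M <- M.new
    S <- S.new
  }
  list(location = M, scatter = S)
}

For example, huberM(matrix(rnorm(100), ncol = 2)) returns a location vector near the origin and a scatter matrix near I2.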

3.3 First and second moments of location and scatter statistics

Why do we need different location and scatter functionals and statistics? In symmet-
rical models (A0)–(A4) all location statistics M(Y) estimate the symmetry center μ .
In models (A0)–(A2) all scatter statistics S(Y) estimate the same population quan-
tity (up to a scaling factor). The statistical properties (convergence, limiting distribu-
tion, efficiency, robustness, computational convenience, etc.) of the estimates may
differ considerably, however. One can then simply pick the estimate that is best suited for the purpose at hand.

To consider the possible bias and the accuracy of location and scatter statistics we
next find a general structure for their first and second moments for random samples
coming from an elliptical distribution (A1) as well as from a location-scatter model
(A2). For most results in this section, we refer to Tyler (1982).

In the following, it is notationally easier to work with the vectors rather than with
the matrices. The “vec” operation is used to vectorize a matrix. If S > 0 is a scatter

matrix then vec(S) is a vector obtained by stacking the columns of S on top of each
other:

vec(S) = vec((s1, ..., sp)) = (s1′, ..., sp′)′.
Moreover, for treating the p² × p² covariance matrices of the vectorized p × p scatter matrices, the following matrices prove very useful for bookkeeping. Let ei be a p-vector with the ith element one and the others zero, i = 1, ..., p. Then Σ_{i=1}^p ei ei′ = Ip, and we write

D_{p,p} = Σ_{i=1}^p (ei ei′) ⊗ (ei ei′),
J_{p,p} = Σ_{i=1}^p Σ_{j=1}^p (ei ej′) ⊗ (ei ej′),
K_{p,p} = Σ_{i=1}^p Σ_{j=1}^p (ei ej′) ⊗ (ej ei′), and
I_{p,p} = Σ_{i=1}^p Σ_{j=1}^p (ei ei′) ⊗ (ej ej′).

Naturally, I_{p,p} = I_{p²}.

The covariance structure of a scatter matrix C in the symmetric models (A1) and (A2) can be formulated using three orthogonal projection matrices

P1 = (1/2)(I_{p,p} + K_{p,p}) − D_{p,p},  P2 = D_{p,p} − (1/p) J_{p,p},  and  P3 = (1/p) J_{p,p}.

Then, for any p × p matrix A,

1. P1 collects the off-diagonal elements of the symmetrized A:

P1 vec(A) = vec((1/2)(A + A′) − diag(A)).

2. P2 picks up the centered diagonal elements; that is,

P2 vec(A) = vec(diag(A) − (tr(A)/p) Ip).

3. P3 projects the matrix A onto the space spanned by the identity matrix:

P3 vec(A) = (tr(A)/p) vec(Ip).
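As a small numerical check (an illustrative base R sketch; all object names are ad hoc), one can build D_{p,p}, J_{p,p}, and K_{p,p} for, say, p = 3 and verify the three projection properties on a random matrix A; note that c(A) is vec(A) in R.

p <- 3
E <- diag(p)                              # columns are the basis vectors e_i
D <- matrix(0, p^2, p^2); J <- D; K <- D
for (i in 1:p) for (j in 1:p) {
  Eij <- E[, i] %o% E[, j]                # e_i e_j'
  J <- J + kronecker(Eij, Eij)
  K <- K + kronecker(Eij, t(Eij))
  if (i == j) D <- D + kronecker(Eij, Eij)
}
P1 <- (diag(p^2) + K) / 2 - D
P2 <- D - J / p
P3 <- J / p
A <- matrix(rnorm(p^2), p, p)
max(abs(P1 %*% c(A) - c((A + t(A)) / 2 - diag(diag(A)))))             # property 1
max(abs(P2 %*% c(A) - c(diag(diag(A)) - sum(diag(A)) / p * diag(p)))) # property 2
max(abs(P3 %*% c(A) - sum(diag(A)) / p * c(diag(p))))                 # property 3

All three maxima should be numerically zero.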

Consider the location-scatter model (A2) with the assumption that the standardized variable εi is marginally symmetric and exchangeable. Then the following lemma yields a general structure for the first and second moments of location and scatter statistics M = M(ε) and S = S(ε) calculated from ε = (ε1, ..., εn)′. These statistics then clearly satisfy

PJM ∼ M and PJSJP′ ∼ S, for all P and J.

Theorem 3.2. Assume that the p-variate random vector M satisfies PJM ∼ M for all p × p permutation matrices P and all sign-change matrices J. Then there is a positive constant σ² such that

E(M) = 0 and COV(M) = σ² Ip.

Second, assume that the random symmetric p × p matrix S > 0 satisfies PJSJP′ ∼ S for all p × p permutation matrices P and all sign-change matrices J. Then there are positive constants η, τ1, and τ2 and a constant τ3 such that

E(S) = η Ip and COV(vec(S)) = τ1 P1 + τ2 P2 + τ3 P3,

where

τ1 = 2 · Var(S12), τ2 = Var(S11) − Cov(S11, S22), and τ3 = (p − 1) · Cov(S11, S22).

Here Cov(S11, S22) denotes the covariance between any two different diagonal elements of S.

Proof. As E(M) = PJ E(M) and COV(M) = PJ COV(M) JP′, one easily sees that E(M) = 0 and COV(M) = σ² Ip for some σ² > 0. Similarly, it is straightforward to see that E(S) = η Ip for some η > 0. A general formula for the covariance matrix of the vectorized S is

COV(vec(S)) = Σ_{i=1}^p Σ_{j=1}^p Σ_{r=1}^p Σ_{s=1}^p Cov(Sij, Srs) (ei er′) ⊗ (ej es′).

By symmetry arguments, all the variances of the diagonal elements of S must be the same, say Var(S11), and all the variances of the off-diagonal elements must be the same as well, say Var(S12), and all the correlations between off-diagonal elements and other elements of S must be zero. Finally, all the covariances between pairs of different diagonal elements must also be the same, say Cov(S11, S22). The details of the proof are left to the reader.

Under the stronger assumption that OSO′ ∼ S for all orthogonal p × p matrices O, corresponding to the elliptic model (A1), one can further show that

Cov(S11, S22) = Var(S11) − 2 · Var(S12)



and one gets a still simpler structure for the covariance matrix of the scatter statistic
S.

Corollary 3.1. Assume that the random symmetric p × p matrix S > 0 satisfies OSO′ ∼ S for all p × p orthogonal matrices O. Then there are positive constants η and τ1 and a constant τ3 such that

E(S) = η Ip and COV(vec(S)) = τ1 (P1 + P2) + τ3 P3,

where τ1 and τ3 are as in Theorem 3.2.

As mentioned before, scatter matrices are often standardized with the transformation S → [p/tr(S)]S; the standardized matrix is then called a shape matrix (the scale information is lost). The covariance structure of the shape matrix in the elliptical case may be found using the following.

Corollary 3.2. Assume that the random symmetric p × p matrix S > 0 satisfies tr(S) = p and OSO′ ∼ S for all p × p orthogonal matrices O. Then there is a positive constant τ1 such that

E(S) = Ip and COV(vec(S)) = τ1 (P1 + P2),

where, as before, τ1 = 2 · Var(S12).

In the general elliptical case we then obtain the following using Theorem 3.2
and Corollary 3.1 and the affine equivariance property of the location and scatter
statistics.

Theorem 3.3. Assume that Y is a random sample from an elliptical distribution with location vector μ and scatter matrix Σ. For any location and scatter statistics M and S, there are positive constants η, σ², and τ1 and a constant τ3 such that

E(M(Y)) = μ and COV(M(Y)) = σ² Σ

and E(S(Y)) = ηΣ and

COV(vec(S(Y))) = τ1 (P1 + P2)(Σ ⊗ Σ) + τ3 vec(Σ)(vec(Σ))′.

Theorem 3.3 thus implies that, in the elliptical model, all location statistics are
unbiased estimators of the symmetry center μ . The constant σ 2 may then be used
in efficiency comparisons. Remember, however, that σ 2 for the choice M depends
on the distribution of ri = |zi | and on the sample size n. The scatter statistic S is
unbiased for ηΣ , η again depending on S, the sample size n, and the distribution of
ri . Then the correction factors may be used to guarantee the unbiasedness (or at least
consistency) in the case of multivariate normality, for example. The constants τ1 and
τ3 determine the variance-covariance structure of a scatter estimate S for sample
size n and distribution Fyi . In many multivariate procedures based on the covariance
matrix, it is sufficient to know the covariance matrix only up to a constant, that is,

the shape matrix. According to Corollary 3.2, the constant τ1 is sufficient for shape
matrix efficiency comparisons. The problem, of course, is how to estimate these
constants which are unknown in practice.

3.4 Breakdown point

The breakdown point (BP) of a sample statistic T = T(Y) is a measure of global robustness. We discuss here the breakdown properties of a sample statistic only, in the spirit of Donoho and Huber (1983). Hampel (1968) was the first to consider the breakdown point of a functional T(F) in an asymptotic sense.

Roughly speaking, the breakdown point of a statistic T is the maximum proportion of contaminated "bad" observations in a dataset Y that can make the observed value T(Y) totally unreliable or uninformative. To be more specific, define the m-neighborhood Bm(Y) of Y in the sample space M(n, p) as

Bm(Y) = {Y* ∈ M(n, p) : Y and Y* have n − m rows in common},

m = 0, ..., n. A corrupted dataset Y* ∈ Bm(Y) may thus be obtained by replacing m observation vectors (rows) in Y with arbitrary values. Clearly B0 ⊂ B1 ⊂ ··· ⊂ Bn, and the extreme cases are B0(Y) = {Y} and Bn(Y) = M(n, p).

To define the breakdown point, we still need a distance measure δ (T, T∗ ) be-
tween the observed value T = T(Y) and the corrupted value T∗ = T(Y∗ ) of the
statistic. The maximum distance over all m-replacements is then

δm (T; Y) = sup {δ (T(Y), T(Y∗ )) : Y∗ ∈ Bm (Y)} ,

m = 0, ..., n.

Definition 3.6. The breakdown point of T at Y is then

BD(T; Y) = min{ m/n : δm(T; Y) = ∞ }.
For location statistics, a natural distance measure is δ(M, M*) = |M − M*|. For scatter matrices, one can use δ(S, S*) = max(|S^{−1}S* − I|, |(S*)^{−1}S − I|), for example. The scatter matrix becomes "useless" if one of its eigenvalues approaches 0 or ∞. Davies (1987) gave an upper bound for the breakdown points of location vectors and scatter matrices.
Theorem 3.4. For any location statistic M, for any scatter statistic S, and for any
(genuinely p-variate) data matrix Y,
BD(M; Y) ≤ (n − p + 1)/(2n) and BD(S; Y) ≤ (n − p + 1)/(2n).

The sample mean vector and the sample covariance matrix have the smallest possible breakdown point 1/n. Maronna (1976) and Huber (1981) showed that the M-statistics have relatively low breakdown points, always below 1/(p + 1). For high breakdown points, alternative estimation techniques (e.g., S-estimates) should be used.
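A quick numerical illustration of the first claim (an R sketch with arbitrary numbers): replacing a single observation by an arbitrarily remote point already ruins the sample mean vector, reflecting its breakdown point 1/n, while the componentwise median resists.

set.seed(1)
Y <- matrix(rnorm(100), ncol = 2)     # n = 50, p = 2
Ystar <- Y
Ystar[1, ] <- c(1e6, 1e6)             # corrupt m = 1 observation
colMeans(Y); colMeans(Ystar)          # the mean vector breaks down
apply(Y, 2, median); apply(Ystar, 2, median)   # the median barely moves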

3.5 Influence function and asymptotics

The influence function (Hampel, 1968, 1974) is often used to measure local robust-
ness of the functional or the statistic.
Definition 3.7. The influence function (IF) of a functional T(F) at F is

IF(y; T, F) = lim_{ε↓0} [T((1 − ε)F + εΔy) − T(F)] / ε

(if the limit exists), where Δy is the cumulative distribution function of a distribution with all of its probability mass at y.
Note that the influence function may be seen as a simple derivative of the vector-valued function ε → T((1 − ε)F + εΔy) at zero. A robust estimator should then have a bounded and continuous IF. Intuitively, in that case, a small contamination of the distribution does not have an arbitrarily large effect on the estimate. Estimates with bounded IF can be further compared using the so-called gross error sensitivity sup_y |IF(y; T, F)|.
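As a simple numerical sketch of Definition 3.7 (illustrative R code; the contamination proportion eps and the Monte Carlo sample size are arbitrary choices), one can approximate the influence function of the univariate mean functional, for which IF(y; T, F) = y − E(y), by a finite difference:

if.mean <- function(y, eps = 1e-4, n = 1e6) {
  x <- rnorm(n)                       # a large sample approximating F = N(0, 1)
  T.F <- mean(x)                      # T(F), approximately 0
  T.eps <- (1 - eps) * T.F + eps * y  # the mean of (1 - eps) F + eps Delta_y
  (T.eps - T.F) / eps                 # finite-difference approximation of IF
}
if.mean(3)   # approximately 3, in line with IF(y; mean, F) = y - E(y)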

In the case of elliptically symmetric distributions, the influence functions of the location and scatter functionals have simple expressions (Hampel et al. (1986); Croux and Haesbrock (2000)). We assume that the scatter functional S is equipped with a correction factor so that it is a consistent estimate of Σ. If F = F_{μ,Σ} is the cdf of E_p(μ, Σ, ρ), then the following holds.
Theorem 3.5. The influence functions of the location and scatter functionals M(F) and S(F) at an elliptical F = F_{μ,Σ} are

IF(y; M, F) = γ(r) Σ^{1/2} u

and

IF(y; S, F) = α(r) Σ^{1/2} u u′ Σ^{1/2} − β(r) Σ,

where r = |z| and u = |z|^{−1} z with z = Σ^{−1/2}(y − μ), and γ, α, and β are real-valued functions determined by M and S and the spherical distribution F_{0,I}.
If T(Y) = T(Fn ) is the sample version of the functional T(F ) and the functional
is sufficiently regular, then often (and this should of course be proven separately for
each statistic)
√n (T(Y) − T(F)) = √n AVE{IF(yi; T, F)} + oP(1).

See Huber (1981). Then, under general conditions, using the central limit theorem and the influence functions given in Theorem 3.5,

√n (M(Y) − μ) →d Np(0, σ² Σ)

and the limiting distribution of √n (vec(S(Y)) − vec(Σ)) is

N_{p²}(0, τ1 (I_{p,p} + K_{p,p})(Σ ⊗ Σ) + τ3 vec(Σ)(vec(Σ))′),

where now

σ² = E[γ²(ri)] / p,
τ1 = E[α²(ri)] / (p(p + 2)), and
τ3 = E[α²(ri)] / (p(p + 2)) − 2 E[α(ri)β(ri)] / p + E[β²(ri)]

with ri = |zi| and zi = Σ^{−1/2}(yi − μ). Note that the constants σ², τ1, and τ3 are related to, but not the same as, the finite-sample constants discussed in Section 3.3.

The influence functions of the M-functionals may be found in Huber (1981). It is then interesting to note that

γ(r) ∝ w1(r) r,
α(r) ∝ w2(r) r², and
β(r) + constant ∝ w2(r) r².

Thus in M-estimation the weight functions determine the local robustness and efficiency properties of M and S. Recall that Huber's M-statistic, for example, is obtained with the choices

w1(r) = min(c/r, 1) and w2(r) = d · min(c²/r², 1),

which then guarantee the boundedness of the influence function.

3.6 Other uses of location and scatter statistics

The scatter matrices S(Y) are often used to transform the dataset. If one writes (the spectral or eigenvalue decomposition)

S(Y) = O(Y) D(Y) (O(Y))′,

where O(Y) is an orthogonal matrix and D(Y) is a diagonal matrix with positive diagonal elements in decreasing order, then the components of the transformed data matrix

Z = YO(Y)

are the so-called principal components, used in principal component analysis (PCA). The columns of O are also called the eigenvectors of S, and the diagonal elements of D list the corresponding eigenvalues. The principal components are uncorrelated and ordered according to their dispersion in the sense that S(Z) = D. Principal components are often used to reduce the dimension of the data. The idea is then to take just the first few principal components, which combine most of the variation; the remaining components are thought to represent the noise.
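The following base R lines sketch this transformation with the sample covariance matrix as the scatter statistic (any other scatter statistic could be substituted; the data and seed are arbitrary).

set.seed(3)
Y <- matrix(rnorm(200), ncol = 2) %*% matrix(c(2, 1, 0, 1), 2, 2)
e <- eigen(cov(Y), symmetric = TRUE)  # S(Y) = O D O'
O <- e$vectors                        # eigenvectors of S(Y)
Z <- Y %*% O                          # the principal components
round(cov(Z), 3)                      # diagonal: S(Z) = D
e$values                              # eigenvalues, in decreasing order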

Scatter matrices are also often used to standardize the data. The transformed, standardized dataset

Z = YO(Y)[D(Y)]^{−1/2}  or  Z = Y[S(Y)]^{−1/2}

then has standardized components (in the sense that S(Z) = Ip), and the observations zi tend to be spherically distributed in the elliptic case. The symmetric version of the square root matrix,

S^{−1/2} = O D^{−1/2} O′,

is usually chosen. (A square root of a diagonal matrix with positive elements is the diagonal matrix of the square roots of the elements.) Unfortunately, even in that case, the transformed dataset Z is not coordinate-free, but the following is true.
Theorem 3.6. For all Y, A, and S, there is an orthogonal matrix O such that

YA′ [S(YA′)]^{−1/2} O = Y [S(Y)]^{−1/2}.

In the independent component analysis (ICA), most ICA algorithms first standardize
the data using the regular covariance matrix S = S(Y) and then rotate the standard-
ized data in such a way that the components of Z = YS−1/2 O are “as independent as
possible”. In this procedure the regular covariance matrix may be replaced by any
scatter matrix that has the independence property.

A location statistic M = M(Y) and a scatter matrix S = S(Y) may be used together to center and standardize the dataset. The transformed dataset is then given by

Z = (Y − 1n M′) S^{−1/2}.

This is often called the whitening of the data. Then M(Z) = 0 and S(Z) = Ip. Again, if one rotates the dataset using an orthogonal matrix O, it is still true that M(ZO) = 0 and S(ZO) = Ip. This means that the whitening procedure is not uniquely defined.
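A minimal whitening sketch in base R, assuming the sample mean and the sample covariance matrix as M and S (the symmetric square root is computed from the eigendecomposition):

set.seed(4)
Y <- matrix(rnorm(150), ncol = 3)
M <- colMeans(Y)
e <- eigen(cov(Y), symmetric = TRUE)
S.invsqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
Z <- sweep(Y, 2, M) %*% S.invsqrt     # Z = (Y - 1_n M') S^{-1/2}
round(colMeans(Z), 3)                 # M(Z) = 0
round(cov(Z), 3)                      # S(Z) = I_p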
Recently, Tyler et al. (2009) developed an approach called invariant coordinate selection (ICS) which is based on the simultaneous use of two scatter matrices, S1 and S2. In this procedure the data are first standardized using S1^{−1/2}, and then the standardized data are rotated using the principal component transformation O, now based on S2 and the transformed data Y S1^{−1/2}. Then

S1(Y S1^{−1/2} O) = Ip and S2(Y S1^{−1/2} O) = D,

where now D is the diagonal matrix of the eigenvalues of S1^{−1/2} S2 S1^{−1/2}. If both S1 and S2 have the independence property, the procedure finds, under general assumptions, the independent components in the ICA problem.

Two location vectors and two scatter matrices may be used simultaneously to describe the skewness and kurtosis properties of a multivariate distribution. Affine invariant multivariate skewness statistics may be defined as squared Mahalanobis distances between two location statistics,

|M1 − M2|²_S = (M1 − M2)′ S^{−1} (M1 − M2).

The eigenvalues of S1^{−1} S2, say d1 ≥ ··· ≥ dp, may be used to describe the multivariate kurtosis. The measures of skewness and kurtosis can then be used in testing for symmetry, multivariate normality, or multivariate ellipticity as well as in the separation between the models. See Kankainen et al. (2007) and Nordhausen et al. (2009b).
Chapter 4
Multivariate signs and ranks

Abstract In this chapter the concepts of multivariate spatial signs and ranks and
signed-ranks are introduced. The centering and standardization of the scores are
discussed. Different properties of the sign and rank scores are obtained. The sign
and rank covariance matrices UCOV, TCOV, QCOV, and RCOV are introduced
and discussed.

4.1 The use of score functions

Let the random sample Y = (y1, y2, ..., yn)′ be generated by

yi = μ + Ω ε i , i = 1, ..., n,

where the ε i are independent, centered, and standardized residuals with cumulative
distribution function F. As discussed before, different assumptions on the distribu-
tion of ε i yield different parametric and semiparametric models. In some applica-
tions,
Y = Xβ + ε Ω′
so that the symmetry center depends on the design matrix X. Before introducing dif-
ferent nonparametric scores used in our approach, we present alternative strategies
on how the scores should be centered and standardized before their use in the test
construction and in the estimation.

A general idea to construct tests and estimates for location parameter μ and scat-
ter matrix Σ = Ω Ω  is to use a p-vector-valued score function T(y) yielding in-
dividual scores Ti = T(yi ), i = 1, ..., n. Throughout this book we use the identity
score function T(y) = y, the spatial sign score function U(y), the spatial rank score
function R(y), and the spatial signed-rank score function Q(y).


The general likelihood inference theory suggests that a good choice for T in the
location problem is the optimal location score function

L(y) = ∇ log f (y − μ )|μ =0 ,

that is, the gradient vector of log f (y − μ ) with respect to μ at the origin. In the
N p (0, I p ) case the optimal score function is the identity score function,

T(y) = y.

The optimal location score function for the p-variate t-distribution with ν degrees of freedom, t_{ν,p}(0, Ip), for example, is

T(y) = [(ν + p)/(ν + |y|²)] y,

and, in the general spherical model with the density function

f(y) = exp{−ρ(|y|)},

the optimal location score function is

T(y) = [ψ(|y|)/|y|] y,

where ψ(r) = ρ′(r), that is, the derivative function of ρ. An example of a robust choice of the score function is Huber's score function

T(y) = min(c/|y|, 1) y

with some choice of c > 0. The validity, efficiency, and robustness properties of the
testing and estimation procedures then naturally depend on the choice of the score
function and of course on the true model.

In the test construction we first transform

Y = (y1, ..., yn)′ → T = (T1, ..., Tn)′ = (T(y1), ..., T(yn))′.

For different testing or estimation purposes, one then often wishes the scores to be
centered and standardized in some natural way.

We then have the following possibilities.

1. Outer centering of the scores:

Ti → T̂i = Ti − T̄.

2. Outer standardization of the scores:

Ti → T̂i = COV(T)^{−1/2} Ti.

3. Outer centering and standardization of the scores:

Ti → T̂i = COV(T)^{−1/2} (Ti − T̄).

However, it is often more natural to use the following.

1. Inner centering of the scores: find a shift vector M such that, if T̂i = T(yi − M), then

AVE{T̂i} = 0.

Then transform

Ti → T̂i = T(yi − M).

2. Inner standardization of the scores: find a transformation matrix S^{−1/2} such that, if T̂i = T(S^{−1/2} yi), then

p · AVE{T̂i T̂i′} = AVE{T̂i′ T̂i} Ip.

Then transform

Ti → T̂i = T(S^{−1/2} yi).

3. Inner centering and standardization of the scores: find a shift vector M and a transformation matrix S^{−1/2} such that, if T̂i = T(S^{−1/2}(yi − M)), then

AVE{T̂i} = 0 and p · AVE{T̂i T̂i′} = AVE{T̂i′ T̂i} Ip.

Then transform

Ti → T̂i = T(S^{−1/2}(yi − M)).

Note that, in the inner approach, M = M(Y) is a location statistic and S = S(Y) is the scatter statistic corresponding to the score function T(y). The matrix S^{−1/2} is assumed to be a symmetric matrix here. Note, however, that inner centering and/or standardization may not always be possible. A code sketch of inner standardization with the spatial sign score is given below.
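As an illustration of inner standardization, the following base R sketch uses the spatial sign score U(y); the fixed-point iteration is essentially Tyler's M-estimator of shape, computed here with respect to the origin (the function name and tolerances are illustrative).

Y <- matrix(rnorm(300), ncol = 3)
tyler.shape <- function(Y, eps = 1e-8, maxiter = 100) {
  p <- ncol(Y); n <- nrow(Y)
  S <- diag(p)
  for (k in 1:maxiter) {
    d <- rowSums((Y %*% solve(S)) * Y)       # y_i' S^{-1} y_i
    S.new <- p * crossprod(Y / sqrt(d)) / n  # p * AVE{ y_i y_i' / d_i }
    S.new <- p * S.new / sum(diag(S.new))    # fix the scale: tr(S) = p
    if (max(abs(S.new - S)) < eps) break
    S <- S.new
  }
  S
}
S <- tyler.shape(Y)
Sinv.sqrt <- with(eigen(S, symmetric = TRUE),
                  vectors %*% diag(1 / sqrt(values)) %*% t(vectors))
Z <- Y %*% Sinv.sqrt
U <- Z / sqrt(rowSums(Z^2))                  # spatial signs of standardized rows
round(ncol(Y) * crossprod(U) / nrow(Y), 2)   # p * AVE{U U'} is approximately I_p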

In this book we mainly use the following score functions.


• Identity score T(y) = y.
• Spatial sign score T(y) = U(y).
• Spatial rank score T(y) = R(y).
• Spatial signed-rank score T(y) = Q(y).
In the different approaches we simply replace the regular identity score by the nonparametric scores,

Y = (y1, ..., yn)′ → T = (T1, ..., Tn)′ or T̂ = (T̂1, ..., T̂n)′.



The tests and estimates are then no longer optimal under the multivariate normality
assumption but are robust and more powerful under heavy-tailed distributions. For
the asymptotic behavior of the tests and estimates we need the following matrices,

A = E[T(εi) L(εi)′] and B = E[T(εi) T(εi)′],

the covariance matrix between the chosen score and the optimal score, and the variance-covariance matrix of the chosen score. These two matrices play an important role in the following chapters.

4.2 Univariate signs and ranks

We trace the ideas from the univariate concepts of sign, rank, and signed-rank. These concepts are linked with the possibility of ordering the data. The ordering is done with the univariate sign function

U(y) = +1 if y > 0, 0 if y = 0, and −1 if y < 0.

Consider a univariate dataset Y = (y1, ..., yn)′ and assume that there are no ties. Let −Y = (−y1, ..., −yn)′ be the dataset where the observations are reflected with respect to the origin. The (empirical) centered rank function is

R(y) = RY(y) = AVE{U(y − yi)},

and the signed-rank function is

Q(y) = QY(y) = (1/2)[RY(y) + R−Y(y)].
Note that the signed-rank function QY (y) is just the centered rank function calcu-
lated among the combined 2n-set of the original observations y1 , ..., yn and their
reflections −y1 , ..., −yn . The signed-rank function is odd, meaning that QY (−y) =
−QY (y) for all y and Y. Note also that R−Y (y) = −RY (−y).

The numbers Ui = U(yi ), Ri = R(yi ), and Qi = Q(yi ), i = 1, ..., n, are the observed
signs, observed centered ranks, and observed signed-ranks. The possible values of
the observed centered ranks Ri are

−(n−1)/n, −(n−3)/n, ..., (n−3)/n, (n−1)/n.
The possible values of the observed signed-ranks Qi are

−(2n−1)/(2n), −(2n−3)/(2n), ..., (2n−3)/(2n), (2n−1)/(2n).
The centered ranks and the signed-ranks are located in the interval (−1, 1) (the univariate unit ball). Note that the centered rank R(y) and the signed-rank Q(y) provide both a magnitude (a robust distance from the median and from the origin, respectively) and a direction (the sign with respect to the median and the sign with respect to the origin, respectively).

There are n! possible values of the vector of ranks (R1, ..., Rn)′, given by

P (−(n−1)/n, −(n−3)/n, ..., (n−3)/n, (n−1)/n)′,

where P goes through all n × n permutation matrices, and if the observations are independent and identically distributed, then all these possible values have equal probability 1/n!. The vector of signed-ranks (Q1, ..., Qn)′ has 2^n n! possible values, obtained from

JP (1/(2n), 3/(2n), ..., (2n−3)/(2n), (2n−1)/(2n))′,

with all possible permutation matrices P and all possible sign-change matrices J, and if the distribution is symmetric around zero, then all the possible values have the same probability 1/(2^n n!).

It is easy to find the connection between the regular rank (with values 1, 2, ..., n) and the centered rank, namely,

centered rank = (2/n) (regular rank − (n+1)/2).
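These univariate quantities are easy to compute (a base R sketch with arbitrary data); the signed-rank of yi is just the centered rank of yi in the combined 2n-point set of the observations and their reflections.

y <- c(1.2, -0.4, 2.5, 0.3, -1.8)
n <- length(y)
centered.rank <- (2 / n) * (rank(y) - (n + 1) / 2)
signed.rank <- sapply(y, function(x) mean(sign(x - c(y, -y))))
cbind(centered.rank, signed.rank)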

The above definitions of univariate signs and ranks are based on the ordering
of the data. However, in the multivariate case there is no natural ordering of the
data points. The approach utilizing objective or criterion functions is then needed
to extend the concepts to the multivariate case. The concepts of univariate sign and
rank and signed-rank may be implicitly defined using the L1 criterion functions

AVE{|yi|} = AVE{U(yi) · yi},
(1/2) AVE{|yi − yj|} = AVE{RY(yi) · yi},
(1/2) AVE{|yi + yj|} = AVE{R−Y(yi) · yi}, and
(1/4) AVE{|yi − yj| + |yi + yj|} = AVE{QY(yi) · yi}.
See Hettmansperger and Aubuchon (1988).

Let us have a closer look at these objective functions if applied to the residuals in
the linear regression model. We first remind the reader that the classical L2 objective
function optimal in the normal case is similarly

AVE{|yi |2 } = AVE{yi · yi }

and corresponds to the identity score function. The first objective function, the
mean deviation of the residuals, is the basis for the so-called least absolute devi-
ation (LAD) methods; it yields different median-type estimates and sign tests in the
one-sample, two-sample, several-sample and finally general linear model settings.
The second objective function is the mean difference of the residuals. The second,
third, and fourth objective functions generate Hodges-Lehmann-type estimates and
rank tests for different location problems. Note also that the sign, centered rank, and
signed-rank function may be seen as the score functions corresponding to the first,
second, and fourth objective functions. This formulation suggests a natural way to
generalize the concepts sign, rank, and signed-rank to the multivariate case (without
defining a multivariate ordering).

Consider next the corresponding theoretical functions, and assume that y, y1, and y2 are independent and identically distributed continuous random variables from a univariate distribution with cdf F. Then, similarly to the empirical equations,

E{|y|} = E{U(y) · y},
(1/2) E{|y1 − y2|} = E{(2F(y) − 1) · y},
(1/2) E{|y1 + y2|} = E{(1 − 2F(−y)) · y}, and
(1/4) E{|y1 − y2| + |y1 + y2|} = E{U(y)[F(|y|) − F(−|y|)] · y}.
4
The theoretical (population) centered rank function and signed-rank function for F are naturally defined as

RF(y) = 2F(y) − 1 and QF(y) = U(y)[F(|y|) − F(−|y|)].

If Y is a random sample of size n from F, then it is easy to see that

sup_y |RY(y) − RF(y)| →P 0 and sup_y |QY(y) − QF(y)| →P 0

as n → ∞. The functions QY(y) and QF(y) are odd, that is, QY(−y) = −QY(y) and QF(−y) = −QF(y), and for distributions symmetric about the origin, QF(y) = RF(y). Note also that the inverse of the centered rank function (i.e., the inverse of the centered cumulative distribution function) is the univariate quantile function.

4.3 Multivariate spatial signs and ranks

Next we go to the multivariate case. Let Y = (y1, ..., yn)′ be an n × p dataset. The multivariate concepts of spatial sign, spatial rank, and spatial signed-rank are then given in the following.

Definition 4.1. The empirical spatial sign, spatial rank, and spatial signed-rank functions U(y), R(y) = RY(y), and Q(y) = QY(y) are defined as

U(y) = |y|^{−1} y for y ≠ 0 (and U(0) = 0),
R(y) = AVE{U(y − yi)}, and
Q(y) = (1/2)[RY(y) + R−Y(y)].

Observe that in the univariate case regular sign, rank, and signed-rank functions
are obtained. Clearly multivariate signed-rank function Q(y) is also odd; that is,
Q(−y) = −Q(y).

The observed spatial signs are Ui = U(yi ), i = 1, ..., n. As in the univariate case,
the observed spatial ranks are certain averages of signs of pairwise differences

Ri = R(yi ) = AVE j {U(yi − y j )}, i = 1, ..., n.

Finally, the observed spatial signed-ranks are given as

Qi = Q(yi) = (1/2) AVE_j {U(yi − yj) + U(yi + yj)}, i = 1, ..., n.
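The MNM package provides spatial.sign, spatial.rank, and spatial.signrank for these quantities; the following base R sketch (with ad hoc function names) only makes the formulas above concrete.

spat.sign <- function(y) if (all(y == 0)) y * 0 else y / sqrt(sum(y^2))
spat.rank <- function(y, Y)              # R_Y(y) = AVE_i U(y - y_i)
  colMeans(t(apply(Y, 1, function(yi) spat.sign(y - yi))))
spat.signrank <- function(y, Y)          # Q_Y(y) = (R_Y(y) + R_{-Y}(y)) / 2
  (spat.rank(y, Y) + spat.rank(y, -Y)) / 2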
The spatial sign Ui is just a direction vector of length one (lying on the unit p-sphere S_p) whenever yi ≠ 0. The centered ranks Ri and signed-ranks Qi lie in the unit p-ball B_p. The direction of Ri (Qi) roughly gives the direction of yi from the center of the data cloud (from the origin), and its length roughly tells how far away this point is from the center (the origin). The next theorem collects some equivariance properties.

Theorem 4.1. The spatial signs, spatial ranks, and spatial signed-ranks are orthogonally equivariant in the sense that

U(Oyi ) = OU(yi ),
RYO (Oyi ) = ORY (yi ), and
QYO (Oyi ) = OQY (yi )

for all yi and all orthogonal matrices O. The centered ranks are invariant under
location shifts and
AVEi {Ri } = 0.
Example 4.1. The spatial signs, ranks, and signed-ranks are not affine equivariant,
however. In Figure 4.1 one can see scatterplots for 50 bivariate observations from
N2 (0, I2 ) with the corresponding bivariate spatial signs and ranks and signed-ranks.
The data points are then rescaled (Figure 4.2) and shifted (Figure 4.3). The figures
illustrate the behavior of the signs, ranks, and signed-ranks under these transforma-
tions: they are not equivariant under rescaling of the components. The spatial ranks
are invariant under location shifts. See below the R-code needed for the plots.

>library(MNM)
>set.seed(1)

>X <- rmvnorm(50, c(0, 0))
>colnames(X) <- c("X_1", "X_2")

>opar <- par(mfrow = c(2, 2), pty = "s", las = 1)

>plot(X, xlim = c(-4, 4), ylim = c(-4, 4),
 main = "Original data")
>plot(spatial.sign(X, FALSE, FALSE), xlim = c(-1, 1),
 ylim = c(-1, 1), xlab = "X_1", ylab = "X_2",
 main = "Spatial signs")
>plot(spatial.rank(X, FALSE), xlim = c(-1, 1), ylim = c(-1, 1),
 xlab = "X_1", ylab = "X_2", main = "Spatial ranks")
>plot(spatial.signrank(X, FALSE, FALSE), xlim = c(-1, 1),
 ylim = c(-1, 1), xlab = "X_1", ylab = "X_2",
 main = "Spatial signed ranks")

>X.rescaled <- transform(X, X_2 = X_2 * 5)

>plot(X.rescaled, xlim = c(-15, 15), ylim = c(-15, 15),
 main = "Rescaled data")
>plot(spatial.sign(X.rescaled, FALSE, FALSE), xlim = c(-1, 1),
 ylim = c(-1, 1), xlab = "X_1", ylab = "X_2",
 main = "Spatial signs")
>plot(spatial.rank(X.rescaled, FALSE), xlim = c(-1, 1),
 ylim = c(-1, 1), xlab = "X_1", ylab = "X_2",
 main = "Spatial ranks")
>plot(spatial.signrank(X.rescaled, FALSE, FALSE),
 xlim = c(-1, 1), ylim = c(-1, 1), xlab = "X_1", ylab = "X_2",
 main = "Spatial signed ranks")

>X.shifted <- transform(X, X_1 = X_1 + 1)

>plot(X.shifted, xlim = c(-4, 4), ylim = c(-4, 4),
 main = "Shifted data")
>plot(spatial.sign(X.shifted, FALSE, FALSE),
 xlim = c(-1, 1), ylim = c(-1, 1),
 xlab = "X_1", ylab = "X_2", main = "Spatial signs")
>plot(spatial.rank(X.shifted, FALSE), xlim = c(-1, 1),
 ylim = c(-1, 1), xlab = "X_1", ylab = "X_2",
 main = "Spatial ranks")
>plot(spatial.signrank(X.shifted, FALSE, FALSE),
 xlim = c(-1, 1), ylim = c(-1, 1), xlab = "X_1",
 ylab = "X_2", main = "Spatial signed ranks")

>par(opar)

[Figure 4.1 here: four panels — Original data, Spatial signs, Spatial ranks, Spatial signed ranks — each plotting X_2 against X_1.]

Fig. 4.1 The scatterplots for a random sample of size 50 from N2(0, I2) with scatterplots for the corresponding observed spatial signs, spatial ranks, and spatial signed-ranks.

[Figure 4.2 here: four panels — Rescaled data, Spatial signs, Spatial ranks, Spatial signed ranks — each plotting X_2 against X_1.]

Fig. 4.2 The scatterplots for a random sample of size 50 from N2(0, I2) with a rescaled second component (multiplied by 5), with scatterplots for the corresponding observed spatial signs, spatial ranks, and spatial signed-ranks.

The sign, centered rank, and signed-rank may again be implicitly defined through multivariate L1-type objective functions

AVE{|yi|} = AVE{Ui′ yi},
(1/2) AVE{|yi − yj|} = AVE{Ri′ yi}, and
(1/4) AVE{|yi − yj| + |yi + yj|} = AVE{Qi′ yi}.

Here |y| = (y1² + ··· + yp²)^{1/2}.

The theoretical (population) functions are defined as follows.


Definition 4.2. The theoretical spatial rank function and signed-rank function for a p-variate random variable yi with cdf F are

RF(y) = E{U(y − yi)} and
QF(y) = (1/2) E{U(y − yi) + U(y + yi)}.

[Figure 4.3 here: four panels — Shifted data, Spatial signs, Spatial ranks, Spatial signed ranks — each plotting X_2 against X_1.]

Fig. 4.3 The scatterplots for a random sample of size 50 from N2(0, I2) with a shifted first component (shifted by 1), with scatterplots for the corresponding observed spatial signs, spatial ranks, and spatial signed-ranks.

The rank function RF (y) characterizes the distribution F (up to a location shift).
If we know the rank function, we know the distribution (up to a location shift). See
Koltchinskii (1997). For F symmetric around the origin, QF (y) = RF (y), for all y.
The empirical functions converge uniformly in probability to the theoretical ones
under mild assumptions. (For the proof, see Möttönen et al. (1997).)

Theorem 4.2. Assume that Y is a random sample of size n from a distribution with cdf F and a uniformly bounded density. Then, as n → ∞,

sup_y |RY(y) − RF(y)| →P 0 and sup_y |QY(y) − QF(y)| →P 0.

Chaudhuri (1996) considered the inverse of the spatial rank function and called it the spatial quantile function. See also Koltchinskii (1997). Let u be a vector in the p-variate open unit ball B_p. Write Φ(u, y) = |y| − u′y. Then, according to Chaudhuri's definition, the value of the spatial quantile function θ = θ(u) of F at u minimizes

E{Φ (u, θ − y) − Φ (u, y)},

where y has the cdf F. The second term in the expectation guarantees that the ex-
pectation always exists. It is easy to check that the spatial quantile function is the
inverse of the map
y → RF (y).
Serfling (2004) gives an extensive review of the inference methods based on the
concept of the spatial quantile, and introduces and studies some nonparametric mea-
sures of multivariate location, spread, skewness and kurtosis in terms of these quan-
tiles. The quantile at u = 0 is the so-called spatial median which is discussed in
more detail later.

If F is spherically symmetric around the origin, the rank and signed-rank function
has a simple form as described in the following.

Theorem 4.3. For a spherical distribution F, the theoretical spatial rank and signed-rank function is

RF(y) = QF(y) = qF(r) u,

where r = |y| and u = |y|^{−1} y, and

qF(r) = EF[ (r − y1) / ((r − y1)² + y2² + ··· + yp²)^{1/2} ].

Note that qF(r) is the derivative function of

DF(r) = EF(|r e1 − y| − |y|).

(The expected value always exists.) Naturally also |qF(r)| ≤ 1, so RF(y) is in the unit ball.

In the following examples we give general formulas for the spatial rank function
in cases of multivariate normal, multivariate t, and multivariate normal scale mixture
models. For these distributions, the spatial rank functions can be formulated with the
generalized hypergeometric functions.

Definition 4.3. A generalized hypergeometric function is defined as the series

pFq(a1, a2, ..., ap; b1, b2, ..., bq; r) = Σ_{i=0}^∞ [(a1)_i (a2)_i ··· (ap)_i] / [(b1)_i (b2)_i ··· (bq)_i] · r^i / i!,

where (c)_i = Γ(c + i)/Γ(c).
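A naive numerical sketch of this series in base R (the truncation point N and the log-scale evaluation of the Pochhammer symbols are implementation choices; valid for r > 0):

pFq <- function(a, b, r, N = 200) {
  poch <- function(c, i) lgamma(c + i) - lgamma(c)   # log of (c)_i
  sum(sapply(0:N, function(k)
    exp(sum(poch(a, k)) - sum(poch(b, k)) + k * log(r) - lgamma(k + 1))))
}
pFq(1, 1, 0.5)        # 1F1(1; 1; r) = exp(r); exp(0.5) = 1.6487...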

It is straightforward but tedious to derive the theoretical rank functions in the following cases. See Möttönen et al. (2005).

Example 4.2. In the case of a multivariate normal distribution Np(0, Ip) we get

D0(r) = 2^{1/2} exp(−r²/2) [Γ((p+1)/2) / Γ(p/2)] · 1F1((p+1)/2; p/2; r²/2)

and

q0(r) = (r / 2^{1/2}) exp(−r²/2) [Γ((p+1)/2) / Γ((p+2)/2)] · 1F1((p+1)/2; (p+2)/2; r²/2).

Example 4.3. Let φ(y; μ, Σ) be the density of a multivariate normal distribution with mean vector μ and covariance matrix Σ. Then a standardized multivariate normal scale mixture density is given by

∫ φ(y; 0, s^{−2} Ip) dH(s).

Under general assumptions, one then easily gets

D(r) = ∫ D0(rs) dH(s) and q(r) = ∫ q0(rs) dH(s).

Example 4.4. The multivariate t_{p,ν} distribution is obtained if ν s² ∼ χ²(ν). Then

D(r) = [ν^{1/2} Γ((p+1)/2) Γ((ν−1)/2)] / [Γ(ν/2) Γ(p/2) (1 + r²/ν)^{(ν−1)/2}] · 2F1((p+1)/2, (ν−1)/2; p/2; (r²/ν)/(1 + r²/ν))

and

q(r) = [r Γ((p+1)/2) Γ((ν+1)/2)] / [ν^{1/2} Γ(ν/2) Γ((p+2)/2) (1 + r²/ν)^{(ν+1)/2}] · 2F1((p+1)/2, (ν+1)/2; (p+2)/2; (r²/ν)/(1 + r²/ν)).

Example 4.5. In the case of a mixture of two normal distributions, Np(0, Ip) with probability π1 and Np(0, σ² Ip) with probability π2, π1 + π2 = 1, we get

D(r) = π1 D0(r) + π2 D0(r/σ) and q(r) = π1 q0(r) + π2 q0(r/σ).

4.4 Sign and rank covariance matrices

If spatial sign and ranks are used to analyze the data, the sign and rank covariance
matrices also naturally play an important role.
Definition 4.4. Let Y ∈ M(n, p) be a data matrix. Then the spatial sign covariance matrix UCOV(Y) and the spatial Kendall's tau matrix TCOV(Y) are

UCOV(Y) = AVE{U(yi) U(yi)′} and
TCOV(Y) = AVE{U(yi − yj) U(yi − yj)′}.

We also define the following.

Definition 4.5. Let Y ∈ M(n, p) be a data matrix. Then the spatial rank covariance matrix RCOV(Y) and the spatial signed-rank covariance matrix QCOV(Y) are

RCOV(Y) = AVE{Ri Ri′} and
QCOV(Y) = AVE{Qi Qi′}.
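Using the spat.sign, spat.rank, and spat.signrank sketches of Section 4.3, the four matrices can be computed directly (an O(n²) illustration only; here the Kendall's tau average is taken over all n² ordered pairs, the i = j terms being zero vectors).

signrank.covs <- function(Y) {
  n <- nrow(Y)
  U <- t(apply(Y, 1, spat.sign))                # spatial signs
  R <- t(apply(Y, 1, spat.rank, Y = Y))         # spatial ranks
  Q <- t(apply(Y, 1, spat.signrank, Y = Y))     # spatial signed-ranks
  D <- Y[rep(1:n, n), ] - Y[rep(1:n, each = n), ]  # all pairwise y_i - y_j
  UD <- t(apply(D, 1, spat.sign))
  list(UCOV = crossprod(U) / n,  TCOV = crossprod(UD) / n^2,
       RCOV = crossprod(R) / n,  QCOV = crossprod(Q) / n)
}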

Note that if one uses the vectors of marginal signs and ranks instead of the spatial
signs and ranks, then the regular sign covariance matrix (UCOV), Kendall’s tau
matrix (TCOV) and Spearman’s rho matrix (RCOV) are obtained.

The matrices UCOV(Y), RCOV(Y), TCOV(Y), and QCOV(Y) are not scatter
matrices as they are not affine equivariant. They are equivariant under orthogonal
transformations only. The RCOV and TCOV are shift invariant. Note also that the
sign covariance matrix and the Kendall’s tau matrix are standardized in the sense
that tr(UCOV(Y)) = tr(TCOV(Y)) = 1.

The theoretical (population) spatial sign and Kendall’s tau covariance matrices
are defined in the following.

Definition 4.6. Let y, y1, and y2 be independent observations from a p-variate distribution with cdf F. Then the theoretical spatial sign covariance matrix UCOV(F) and the theoretical spatial Kendall's tau matrix TCOV(F) are

UCOV(F) = E{U(y) U(y)′} and
TCOV(F) = E{U(y1 − y2) U(y1 − y2)′}.

Visuri et al. (2000) proved the following interesting result.

Theorem 4.4. Let F be the cumulative distribution function of an elliptical random variable y centered around the origin. Let the eigenvalues of COV(F) be distinct. Then the eigenvalues of UCOV(F) and TCOV(F) are also distinct, and the eigenvectors of COV(F), UCOV(F), and TCOV(F) (in decreasing order of the corresponding eigenvalues) are the same.

The result suggests that, in the elliptic model, the spatial sign and rank covariance matrices can be used in principal component analysis (PCA) to find the principal components. Let D, DU, and DT be the (diagonal) matrices of the distinct eigenvalues of COV(F), UCOV(F), and TCOV(F), respectively, in decreasing order. Then

DU = DT = E[ (D^{1/2} u u′ D^{1/2}) / (u′ D u) ],

where u is uniformly distributed on the unit sphere. See Locantore et al. (1999), Marden (1999b), Visuri et al. (2000, 2003), and Croux et al. (2002) for these robust alternatives to classical PCA. For more discussion on PCA based on spatial signs and ranks, see Chapter 9.

The theoretical spatial rank and signed-rank covariance matrices are given in the following.

Definition 4.7. Let y be a p-variate random variable with cumulative distribution function F, spatial rank function RF(y), and spatial signed-rank function QF(y). Then the theoretical spatial rank covariance matrix RCOV(F) and the theoretical spatial signed-rank covariance matrix QCOV(F) are

RCOV(F) = E{RF(y) RF(y)′} and
QCOV(F) = E{QF(y) QF(y)′}.

Using Corollary 3.2, the moments of the sign covariance matrix are easily found in the spherically symmetric case.

Theorem 4.5. Assume that Y is a random sample from a spherical distribution around the origin. Then UCOV(Y) is distribution-free (its distribution does not depend on the distribution of the modulus), and

E(UCOV(Y)) = (1/p) Ip and COV(vec(UCOV(Y))) = [2/(np(p + 2))] (P1 + P2),

where, as before,

P1 + P2 = (1/2)(I_{p,p} + K_{p,p}) − (1/p) J_{p,p}.
Proof. Let y1, ..., yn be a random sample from a spherical distribution, and let Ui = |yi|^{−1} yi, i = 1, ..., n. Then

UCOV(Y) = AVE{Ui Ui′}.

Clearly p · E[UCOV(Y)] = Ip. For the variances and covariances, recall from Theorem 2.2 that (U_{i1}², ..., U_{ip}²) has a Dirichlet distribution D_p(1/2, ..., 1/2), so that

E[U_{ij}²] = 1/p, E[U_{ij}² U_{ik}²] = 1/(p(p+2)), and E[U_{ij}⁴] = 3/(p(p+2)),

with j ≠ k. But then

Var(U_{ij}²) = 2(p−1)/(p²(p+2)), Var(U_{ij} U_{ik}) = 1/(p(p+2)), and Cov(U_{ij}², U_{ik}²) = −2/(p²(p+2)),

with j ≠ k, and the result follows from Corollary 3.2.

The rank covariance matrices TCOV(Y), RCOV(Y), and QCOV(Y) are (asymptotically equivalent to) matrix-valued U-statistics with kernel sizes 2, 3, and 3, respectively; they are not (even asymptotically) distribution-free, and the expressions for their covariance matrices are more complicated. For spherical distributions around the origin, the first and second moments of TCOV(Y) have a structure similar to those of UCOV(Y), with expected value (1/p) Ip and covariance matrix

τ (P1 + P2),

but τ now depends on the sample size n and on the distribution of the modulus. The limiting distributions of UCOV(Y) and TCOV(Y) can be easily given in the spherical case. See Sirkiä et al. (2008).

Theorem 4.6. Assume that Y is a random sample of size n from a spherical distribution around the origin. Then, as n → ∞,

√n vec(UCOV(Y) − (1/p) Ip) →d N_{p²}(0, [2/(p(p+2))] (P1 + P2)) and
√n vec(TCOV(Y) − (1/p) Ip) →d N_{p²}(0, [2τ/(p(p+2))] (P1 + P2)),

where

τ = 4 E[(1 − |y1|/|y1 − y2|)(1 − |y1|/|y1 − y3|)].

The expected values of RCOV(Y) and QCOV(Y), when Y is a random sample from F, satisfy

E(RCOV(Y)) = RCOV(F) + o(1/n) and E(QCOV(Y)) = QCOV(F) + o(1/n),

so that the sample statistics are asymptotically unbiased. They estimate the same population quantity if F is symmetric around the origin. In the spherically symmetric case their covariance matrices are structured as

τ2 (P1 + P2) + τ3 P3.

See Corollaries 3.1 and 3.2.

4.5 Other approaches

Next we give a short survey of some other approaches to multivariate sign and rank methods. The most straightforward extension is just to use vectors of componentwise univariate signs and ranks: the componentwise signs and ranks then correspond to the L1-type criterion functions

AVE_i{|y_{i1}| + ··· + |y_{ip}|} and AVE_{i,j}{|y_{i1} − y_{j1}| + ··· + |y_{ip} − y_{jp}|},

utilizing the so-called "Manhattan distance". The sign and rank vectors are invariant under componentwise monotone transformations (e.g., marginal rescaling) but not orthogonally equivariant. This approach is perhaps the most natural one for data with independent components. See Puri and Sen (1971) for a complete discussion of this approach.

Affine equivariant multivariate sign and rank methods may be constructed as follows. The geometric idea in the p-variate location model is to consider data-based simplices built on p + 1 residuals, or on p residuals and the origin. The volume of the simplex with vertices y_{i1}, ..., y_{i(p+1)} is

V(y_{i1}, ..., y_{i(p+1)}) = (1/p!) abs det[(1, ..., 1); (y_{i1}, ..., y_{i(p+1)})],

that is, the absolute value of the determinant of the (p + 1) × (p + 1) matrix with a first row of ones above the observation vectors. The two criterion functions now are

AVE{V(0, y_{i1}, ..., y_{ip})} and AVE{V(y_{i1}, ..., y_{i(p+1)})}.

The corresponding score functions are the affine equivariant or Oja multivariate signs and ranks, and the inference methods based on these are then affine equivariant/invariant. For a review of the multivariate location problem, see Oja (1999). Visuri et al. (2000, 2003) and Ollila et al. (2003b, 2004) introduced and considered the corresponding affine equivariant sign and rank covariance matrices.

Koshevoy and Mosler (1997a,b, 1998) and Mosler (2002) proposed the use
of zonotopes and lift zonotopes, p- and (p + 1)-variate convex sets Z p (Y) and
LZ p+1 (Y), respectively, to describe and investigate the properties of a data matrix Y.
Koshevoy et al. (2003) developed a scatter matrix estimate based on the zonotopes.
It appears that there is a nice duality relation between zonotopes (lift zonotopes) and
affine equivariant signs (ranks); the objective functions yielding affine equivariant
signs and ranks are just volumes of zonotope Z p (Y) and lift zonotope LZ p+1 (Y),
respectively. See Koshevoy et al. (2004).

Randles (1989) developed an affine invariant sign test based on an ingenious con-
cept of interdirection counts. Affine invariant interdirection counts depend on the
directions of the observation vectors; they measure the angular distances between
two vectors relative to the rest of the data. Randles (1989) was followed by a series
of papers introducing nonparametric sign and rank interdirection tests for multivari-
ate one sample and two sample location problems, for example. This approach is
quite related to the spatial sign and rank approach, as we show in later chapters.

One more important approach is to combine the directions (spatial signs or interdirection counts) and the transformed ranks of the Mahalanobis distances from the origin or from the data center. In a series of papers, Hallin and Paindaveine constructed optimal signed-rank location tests in the elliptical model; see the seminal papers by

Hallin and Paindaveine (2002, 2006). Similarly, Nordhausen et al. (2009) developed
optimal rank tests in the independent component model.
Chapter 5
One-sample problem: Hotelling's T²-test

Abstract We start with a one-sample location example with trivariate and bivariate observations. It is shown how a general score function T(y) is used to construct tests and estimates in the one-sample location problem. The identity score T(y) = y gives the regular Hotelling's T²-test and the sample mean.

5.1 Example

We consider the classical data due to Rao (1948) consisting of weights of cork bor-
ings on trees in four directions: north (N), east (E), south (S), and west (W). We
have these four measurements on 28 trees, and we wish to test whether the weight
of cork borings is independent of the direction.

Table 5.1 Weights of cork borings (in centigrams) in the four directions
N E S W N E S W
72 66 76 77 91 79 100 75
60 53 66 63 56 68 47 50
56 57 64 58 79 65 70 61
41 29 36 38 81 80 68 58
32 32 35 36 78 55 67 60
30 35 34 26 46 38 37 38
39 39 31 27 39 35 34 37
42 43 31 25 32 30 30 32
37 40 31 25 60 50 67 54
33 29 27 36 35 37 48 39
32 30 34 28 39 36 39 31
63 45 74 63 50 34 37 40
54 46 60 52 43 37 39 50
47 51 52 43 48 54 57 43


[Figure 5.1 here: pairwise scatterplot matrix of E_N, S_N, and W_N.]

Fig. 5.1 The scatterplot for the differences E-N, S-N, and W-N.

[Figure 5.2 here: scatterplot of W_E against S_N.]

Fig. 5.2 The scatterplot for the differences S-N and W-E.



>data(cork)

>cork_3v <- sweep(cork[,2:4], 1, cork[,1], "-")


>colnames(cork_3v) <- c("E_N", "S_N", "W_N")
>pairs(cork_3v, las = 1)

>cork_2v <- with(cork,


data.frame(S_N = South - North, W_E = West - East))
>par(pty = "s")
>plot(cork_2v, xlim = c(-30, 30), ylim = c(-30, 30), las = 1)

Whether the weight of the cork borings depends on the direction is examined with a trivariate variable consisting of the differences E-N, S-N, and W-N, where north (N) is subtracted from the other three measurements. See Figure 5.1. Also, the symmetry of the cork borings is studied with a bivariate vector of the differences S-N and W-E. See Figure 5.2. We wish to test the null hypothesis that the distribution of (E-N, S-N, W-N), as well as that of (S-N, W-E), is symmetric around zero.

5.2 General strategy for estimation and testing

Let the random sample Y = (y1, y2, ..., yn)′ be generated by

yi = μ + Ω ε i , i = 1, ..., n,

where the ε i are centered and standardized residuals with cumulative distribution
function F. The tests and estimates are constructed under different symmetry as-
sumptions (A0)–(A4) and (B0)–(B4). Note that the zero vector may be used as a
null value without loss of generality, because to test H0 : μ = μ0 , we just substitute
yi − μ0 in place of yi in the tests.

We now describe the use of a general score function T(y) for the statistical infer-
ence in the one sample location problem. The results are only heuristic and general,
and the distributional assumptions for the asymptotic theory of course depend on the
chosen score function. For the one-sample symmetry center problem it is natural to
assume that the score function T(y) is odd; that is, T(−y) = −T(y) for all y.

Outer standardization. We first discuss the test and estimate that use the outer
standardization. The test may not be affine invariant, and the estimate may not be
affine equivariant. Now
• The test statistic is

T(Y) = AVE {Ti } = AVE {T(yi )} .


50 5 One-sample problem: Hotelling’s T 2 -test

• The companion location estimate μ̂ is the shift vector obtained in the inner centering and is determined by the estimating equation

AVE{T(yi − μ̂)} = 0.

For the asymptotic theory in our approach we need the following p × p matrices A and B (expectations taken under the null hypothesis),

A = E{T(yi) L(yi)′} and B = E{T(yi) T(yi)′}.

Consider the null hypothesis H0: μ = 0 and a (contiguous) sequence of alternatives Hn: μ = n^{−1/2} δ. The alternatives are used to consider the asymptotic relative efficiencies (ARE) of the tests and estimates. They can also be used in sample size calculations. Then, under general assumptions, we get the following results.

• Under the null hypothesis H0,

√n AVE{Ti} →d Np(0, B).

• Under the null hypothesis H0, the squared version of the test statistic

Q² = n |B̂^{−1/2} AVE{Ti}|² →d χ²_p,

where

B̂ = AVE{Ti Ti′}.

• Under the sequence of alternatives Hn,

√n AVE{Ti} →d Np(Aδ, B).

• The limiting distribution of the estimate μ̂ is given by

√n (μ̂ − μ) →d Np(0, A^{−1} B A^{−1}).

Inner standardization. Inner standardization is sometimes used in the construction of the test and the estimate. If inner standardization is possible, then the test is affine invariant and the estimate is affine equivariant.

• In the inner standardization of the test, one first finds a p × p transformation matrix S^{−1/2} such that, for T̂i = T(S^{−1/2} yi), i = 1, ..., n,

p · AVE{T̂i T̂i′} = AVE{T̂i′ T̂i} Ip.

The squared form of the test statistic is then

Q² = np · |AVE{T̂i}|² / AVE{|T̂i|²},

with a limiting chi-square distribution with p degrees of freedom.


• Find a shift vector μ̂ and a transformation matrix S−1/2 such that, for T̂i =
T(S−1/2 (yi − μ̂ )), i = 1, ..., n,
     
AVE T̂i = 0 and p · AVE T̂i T̂i = AVE T̂i T̂i I p .

Then μ̂ is the location estimate based on inner standardization.

Note that, in the testing case, S is not a regular scatter matrix estimate but a
scatter matrix with respect to a known center (the origin). In the estimation case, S
is a regular scatter matrix (around the estimated value μ̂ ).

For later extensions to the several-sample and regression cases we next give the test statistics in a slightly different form. The test statistic is then seen to compare two different scatter matrices. For that purpose, write

P_X = X(X′X)^{−1} X′

for any n × q matrix X with rank q < n. The matrix P_X is the n × n projection matrix onto the subspace spanned by the columns of X. The transformation Y → P_{1n} Y then just replaces all the observations by their sample mean vector. Now, in outer standardization,

Y → T → Q² = n · tr((T′ P_{1n} T)(T′ T)^{−1}),

and, in inner standardization,

Y → T̂ → Q² = n · tr((T̂′ P_{1n} T̂)(T̂′ T̂)^{−1}).

The approximate p-value may thus be based on the limiting chi-square distribution. For small sample sizes, an alternative way to construct the p-value is to use the sign-change argument. Let J be an n × n diagonal matrix with diagonal elements ±1; it is called a sign-change matrix. The value of the test statistic for a sign-changed sample JY is then

Q²(JY) = n · tr((T′J P_{1n} JT)(T′T)^{−1}) or n · tr((T̂′J P_{1n} JT̂)(T̂′T̂)^{−1})

(T′T and T̂′T̂ are invariant under sign changes). Then the p-value of the conditionally distribution-free sign-change test statistic is

E_J{ I(Q²(JY) ≥ Q²(Y)) },

where J has a uniform distribution over all of its 2^n possible values. This sign-change version of the test is valid for the null hypothesis −y ∼ y (model (A4)).
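A base R sketch of the sign-change test with the identity score (illustrative; the exact version would enumerate all 2^n sign-change matrices, here a random sample of nsim of them is drawn):

Q2 <- function(Y) {
  Y <- as.matrix(Y); n <- nrow(Y)
  ybar <- colMeans(Y)
  B <- crossprod(Y) / n               # B-hat = AVE{y_i y_i'}
  n * drop(t(ybar) %*% solve(B, ybar))
}
signchange.pvalue <- function(Y, nsim = 2000) {
  obs <- Q2(Y)
  sims <- replicate(nsim,
    Q2(as.matrix(Y) * sample(c(-1, 1), nrow(Y), TRUE)))
  mean(sims >= obs)                   # proportion of Q^2(JY) >= Q^2(Y)
}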

5.3 Hotelling's T²-test

Let Y = (y1, ..., yn)′ be a random sample from an unknown distribution, and assume
that the p-variate observation vectors yi are generated by

yi = μ + Ω ε i , i = 1, ..., n,

where the ε i are centered and standardized random vectors with the cumulative dis-
tribution function F. In the Hotelling’s test case, we assume that the standardized
vectors ε i have mean vector zero and covariance matrix I p . Then E(yi ) = μ is an un-
known mean vector, and COV(yi ) = Σ = Ω Ω  > 0 an unknown covariance matrix.
At first, we wish to estimate unknown μ and test the null hypothesis

H0 : μ = 0,

and we use the identity score function T(y) = y. Then we get

Hotelling's T² and the sample mean: Hotelling's test is obtained with the score function T(y) = y. Then

T(Y) = ȳ and also μ̂ = ȳ,

and, under the null hypothesis,

A = Ip and B = C = Σ.

Both the outer and the inner standardization yield the same Hotelling's one-sample test statistic

Q² = Q²(Y) = n ȳ′ B̂^{−1} ȳ.

In the above test construction, we standardized the sample mean using the sample covariance matrix with respect to the origin, B(Y) = AVE{yi yi′}. The popular version of Hotelling's T² uses the regular sample mean and the regular sample covariance matrix, namely ȳ and

C = AVE{(yi − ȳ)(yi − ȳ)′},

and is defined as

T² = T²(Y) = n ȳ′ C^{−1} ȳ.

Under the null hypothesis, both B(Y) and

C(Y) = B(Y) − ȳ ȳ′


converge in probability to the true value Σ .

In the testing procedure with sample size n, instead of reporting a p-value, one
often compares the observed value of the test statistic Q2n to a critical value cn . The
null hypothesis is rejected if Q2n > cn . The validity and asymptotic validity of a test
(Q2n , cn ) for the null hypothesis H0 is defined as follows.

Definition 5.1. The test (Q2n , cn ) is a valid level-α test for sample size n if PH0 (Q2n >
cn ) = α . The sequence of tests (Q2n , cn ), n = 1, 2, ..., is asymptotically valid with
level α if PH0 (Q2n > cn ) → α . The probabilities PH0 are calculated under the null
hypothesis H0 .

Model of multivariate normality. We now recall some well-known properties of Hotelling's T 2 -test in the case where the observations come from a multivariate
normal distribution N p (μ , Σ ). (The distribution of ε i is N(0, I p ).) The Hotelling’s
T 2 test rejects H0 if T 2 is large enough. Hotelling's T 2 is a valid test statistic in this model as its exact distribution under H0 is known. Large values of T 2 support the alternative μ ≠ 0. The distribution of

((n − p)/(np)) T 2

is known to be an F-distribution with p and n − p degrees of freedom, denoted by Fp,n−p . The exact p-value is then obtained as the tail probability of Fp,n−p . Under the alternative (μ ≠ 0), the distribution of T 2 is known to be a noncentral Fp,n−p with noncentrality parameter nμ ′Σ −1 μ ; one can therefore calculate the exact power under a fixed alternative as well.
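As a small illustration, the exact F-based p-value can be computed with a few lines of base R (a sketch under the normality assumption; the helper name hotelling.T2 is ours, not a function of the MNM package):

# Hotelling's T^2 with C = AVE{(y_i - ybar)(y_i - ybar)'} (divisor n),
# so that ((n - p)/(np)) T^2 is exactly F(p, n - p) under normality and H0.
hotelling.T2 <- function(Y) {
  n <- nrow(Y); p <- ncol(Y)
  ybar <- colMeans(Y)
  C <- cov(Y) * (n - 1) / n
  T2 <- n * sum(ybar * solve(C, ybar))
  F.stat <- (n - p) / (n * p) * T2
  list(T2 = T2, p.value = 1 - pf(F.stat, p, n - p))
}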

Nonparametric model with bounded second moments. Hotelling's test statistic (both versions Q2 and T 2 ) can be used in a larger model where we only assume
that the ε i have mean vector zero and covariance matrix identity. Note that the dis-
tribution does not have to be symmetric; only the assumption on the existence of
the second moments is needed. Under this wider model, the test statistic T 2 is still
asymptotically valid as its limiting distribution under H0 is known. Again, large values of T 2 support the alternative μ ≠ 0. If the second moments exist (E(ε i′ ε i ) < ∞), then the central limit theorem (CLT) can be used to show that the limiting distribution of Q2 (and T 2 ) is a chi-square distribution with p degrees of freedom. The test is asymptotically unbiased as, under any fixed alternative μ ≠ 0 and any
c > 0, P(Q2 > c) → 1, as n → ∞. Under the sequence of alternative hypotheses
Hn : μ = n−1/2 δ , the limiting distribution of n1/2 ȳ is N p (δ , Σ ), and, consequently,
the limiting distribution of Q2 = Q2 (Y) is a noncentral chi-square distribution with p
degrees of freedom and noncentrality parameter δ  Σ −1 δ . This result can be used to
calculate approximate power as, for μ close to zero, Q2 has an approximate noncen-
tral chi square distribution with p degrees of freedom and noncentrality parameter

nμ  Σ −1 μ . Also the sample size calculations to attain fixed size and power may be
based on this result.
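A sketch of such an approximate power computation in R (approx.power is our own illustrative helper; a sample size calculation would simply increase n until the returned power exceeds the target):

# Approximate power of the level-alpha chi-square test at a fixed
# alternative mu, using the noncentrality parameter n mu' Sigma^{-1} mu.
approx.power <- function(n, mu, Sigma, alpha = 0.05) {
  p <- length(mu)
  ncp <- n * sum(mu * solve(Sigma, mu))
  1 - pchisq(qchisq(1 - alpha, df = p), df = p, ncp = ncp)
}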

Nonparametric model with symmetry assumption. The sign-change version of Hotelling's T 2 test can be used under the symmetry assumption

J(ε 1 , ..., ε n ) ∼ (ε 1 , ..., ε n ) ,

where J is an n × n sign-change matrix. Note that no assumption on the existence of the moments is needed here. Next note that B(JY) = B(Y), but C(JY) = C(Y) is
not necessarily true. This is to the advantage of our version Q2 as one then simply
has

Q2 (Y) = 1n′ Y(Y′ Y)−1 Y′ 1n and Q2 (JY) = 1n′ JY(Y′ Y)−1 Y′ J1n

and the p-value obtained from the sign-change or exact version of the test

EJ [ I{ Q2 (JY) ≥ Q2 (Y) } ]

is more easily calculated.

Test statistic Q2 (as well as T 2 ) is affine invariant in the sense that

Q2 (YH ) = Q2 (Y)

for any full-rank H. This implies that the null distribution does not depend on Ω at
all. If we write
ε̂ = YB̂−1/2 ,
for the estimated residuals, then

Q2 (Y) = Q2 (ε̂ ) = n · |AVE{ε̂ i }|2 .

Note that the mean-squared length of the standardized observations is AVE{|ε̂ i |²} = p.

In the multivariate normality case, the sample mean ȳ and the (slightly adjusted)
sample covariance matrix n/(n − 1)C(Y) are optimal (uniformly minimum variance
unbiased, UMVU) estimators of unknown μ and Σ . Also, it is known that

ȳ ∼ N p (μ , (1/n) Σ ),

and a natural estimate of the covariance matrix of μ̂ is

COV(μ̂ ) = (1/n) C(Y).

In the wider model where we only assume that

E(ε i ) = 0 and COV(ε i ) = I p ,

the limiting distribution of √n (ȳ − μ ) is a multivariate normal distribution N p (0, Σ ) as well, and therefore the distribution of ȳ may be approximated by N p (μ , (1/n) C(Y)). The estimated covariance matrix is thus (1/n)C(Y) and the approximate 95% confidence ellipsoid is given by

{ μ : n(μ − ȳ)′ C−1 (μ − ȳ) ≤ χ²p,0.95 }.
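In R, membership of a candidate null value μ0 in this ellipsoid can be checked directly (a minimal sketch; in.conf.ellipsoid is our own helper name):

# TRUE if mu0 lies in the approximate 95% confidence ellipsoid for mu.
in.conf.ellipsoid <- function(Y, mu0, level = 0.95) {
  n <- nrow(Y); p <- ncol(Y)
  d <- colMeans(Y) - mu0
  C <- cov(Y) * (n - 1) / n        # C = AVE{(y_i - ybar)(y_i - ybar)'}
  n * sum(d * solve(C, d)) <= qchisq(level, df = p)
}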

Example 5.1. Cork boring data. Consider first the 3-variate vector of E-N, S-N, and W-N. If we wish to test the null hypothesis that the mean vector is zero, we get

> mv.1sample.test(cork_3v)

Hotelling’s one sample T2-test

data: cork_3v
T.2 = 20.742, df = 3, p-value = 0.0001191
alternative hypothesis: true location is not equal to c(0,0,0)

If we then wish to estimate the population mean vector with the sample mean
vector, the estimate and its estimated covariance matrix are as given below. The es-
timate and its estimated covariance matrix are then used to find the 95% confidence
ellipsoid for the population mean vector. This is given in Figure 5.3.

> est <- mv.1sample.est(cork_3v)


> summary(est)
The sample mean vector of cork_3v is:
E_N S_N W_N
-4.3571 -0.8571 -5.3571

And has the covariance matrix:


E_N S_N W_N
E_N 2.2440 0.2598 0.4199
S_N 0.2598 2.2691 1.2585
W_N 0.4199 1.2585 2.2810
>
> plotMvloc(est, X=cork_3v, color.ell=c(1,1,1))
>

Fig. 5.3 The scatterplot with estimated mean and 95% confidence ellipsoid for 3-variate data.

Example 5.2. Consider then the bivariate vector of S-N and W-E. If we wish to test
the null hypothesis that the mean vector is zero, we get

> mv.1sample.test(cork_2v)

Hotelling’s one sample T2-test

data: X
T.2 = 0.4433, df = 2, p-value = 0.8012
alternative hypothesis: true location is not equal to c(0,0)

Again, the mean vector, its estimated covariance matrix, and 95 % confidence
ellipsoids are given by

> est <- mv.1sample.est(cork_2v)


> summary(est)
The sample mean vector of cork_2v is:
S_N W_E
-0.8571 -1.0000

And has the covariance matrix:


S_N W_E
S_N 2.2691 0.9987
W_E 0.9987 3.6852
> plotMvloc(est, X=cork_2v, color.ell=c(1,1,1))
>


Fig. 5.4 The scatterplot with estimated mean and 95% confidence ellipse for 2-variate data.
Chapter 6
One-sample problem: Spatial sign test
and spatial median

Abstract The spatial sign score function U(y) is used for the one-sample location
problem. The test is then the spatial sign test, and the estimate is the spatial median.
The tests and estimates using outer standardization as well as those using inner
standardization are discussed.

6.1 Multivariate spatial sign test

6.1.1 Preliminaries

The aim is to find, in the one-sample location problem, test statistics that are valid
under much weaker conditions than Hotelling’s T 2 . We consider a multivariate gen-
eralization of the univariate sign test, perhaps the simplest test ever proposed. The
spatial sign test uses the spatial sign score U(y) which is given by

U(y) = |y|−1 y, for y = 0,

and U(0) = 0.

We start by giving some approximations and key results needed in the following.
Let y = 0 and μ be any p-vectors, p > 1, and write

r = |y| and u = |y|−1 y.


The accuracies of the different (constant, linear, and quadratic) approximations of |y − μ | as a function of μ are given by the following.

Lemma 6.1.
1. | |y − μ | − |y| | ≤ |μ |.
2. | |y − μ | − |y| + u′ μ | ≤ 2 |μ |²/r.
3. | |y − μ | − |y| + u′ μ − (1/(2r)) μ ′[I p − uu′ ]μ | ≤ C |μ |^(2+δ) / r^(1+δ) for all 0 < δ < 1, where C does not depend on y or μ .

In a similar way, the accuracies of the constant and linear approximations of |y − μ |−1 (y − μ ) as a function of μ are given by the following.

Lemma 6.2.
1. | (y − μ )/|y − μ | − y/|y| | ≤ 2 |μ |/r.
2. | (y − μ )/|y − μ | − y/|y| + (1/r)[I p − uu′ ]μ | ≤ C |μ |^(1+δ) / r^(1+δ) for all 0 < δ < 1, where C does not depend on y or μ .

See Appendix B.

We also often need the following lemma.

Lemma 6.3. Assume that the density function f (ε ) of the p-variate continuous ran-
dom vector ε is uniformly bounded. Then E{|ε |−α } exists for all 0 ≤ α < 2.

6.1.2 The test with outer standardization

We consider first the location model

yi = μ + ε i , i = 1, ..., n,

where the independent residuals ε i have a joint density f (ε ) that is uniformly bounded and are centered so that

E(U(ε i )) = 0.

As before, the cumulative distribution function of ε i is denoted by F(ε ). The centering is needed to get an interpretation for the location parameter μ . We later show
that μ is the so-called spatial median of yi . We wish to test the null hypothesis

H0 : μ = 0.

The matrices

A = E{|ε i |−1 (I p − |ε i |−2 ε i ε i′ )} and B = UCOV(F) = E{|ε i |−2 ε i ε i′ }

are often needed in the following. As the density function of ε i is continuous and uniformly bounded, A also exists and is bounded.

The multivariate spatial sign test is thus obtained with the score function T(y) = U(y) (the spatial sign score). Write Ui = U(yi ), i = 1, ..., n. Then

T = T(Y) = AVE{Ui }

and

Q2 = Q2 (Y) = nT′ B̂−1 T,

where

B̂ = AVE{Ui Ui′ }.

Note that T(Y) is of course not a location statistic; it is only orthogonal equivariant (T(YO) = OT(Y)).
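The whole construction takes only a few lines of base R. The following sketch mirrors what mv.1sample.test with score = "s" computes (the helper name is ours; possible zero observations, for which U(0) = 0, are ignored for simplicity):

# Spatial sign test with outer standardization: Q2 = n T' Bhat^{-1} T.
spatial.sign.test <- function(Y) {
  n <- nrow(Y); p <- ncol(Y)
  U <- Y / sqrt(rowSums(Y^2))      # spatial signs U_i = |y_i|^{-1} y_i
  Tbar <- colMeans(U)
  Bhat <- crossprod(U) / n         # Bhat = AVE{U_i U_i'}
  Q2 <- n * sum(Tbar * solve(Bhat, Tbar))
  list(Q2 = Q2, p.value = 1 - pchisq(Q2, df = p))
}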

Totally nonparametric model. The sign test statistic is asymptotically valid under extremely weak assumptions. For testing H0 : μ = 0 we only have to assume
that the observations yi are centered around μ in the sense that

E{U(yi − μ )} = 0.

Note that this assumption is naturally true under symmetry, that is, if −(yi − μ ) ∼
(yi − μ ). As B = E(Ui Ui ) always exists, the weak law of large numbers (WLLN,
Kolmogorov) implies that
B̂ →P B.
Then, using the central limit theorem (Lindeberg-Lévy) and Slutsky’s lemma, we
easily obtain the following.

Theorem 6.1. If the null hypothesis H0 : μ = 0 is true, then √n T →d N p (0, B) and the test statistic with outer standardization

Q2 = nT′ B̂−1 T →d χ p2 .

It is remarkable that no assumptions are made on the distribution of the modulus |yi − μ |. The modulus may even depend on the direction Ui = |yi − μ |−1 (yi − μ ), and
so even skewed distributions are allowed. No assumptions on moments are needed.

The finite-sample power of the test can be approximated using the following
theorem.

Theorem 6.2. Assume that E{U(yi − μ )} = 0 and that the density of ε i is uniformly bounded. Then, under the sequence of alternative distributions Hn : μ = n−1/2 δ , the limiting distribution of √n T(Y) is N p (Aδ , B), and

Q2 (Y) →d χ p2 (δ ′AB−1 Aδ ),

a noncentral chi-square distribution with p degrees of freedom and noncentrality parameter δ ′AB−1 Aδ .
Proof. Lemmas 6.2 and 6.3 together imply that

√n AVE{ U(yi + n−1/2 δ ) } = √n AVE{U(yi )} + AVE{ (1/|yi |)(I p − U(yi )U(yi )′ ) } δ + oP (1).

See also Möttönen et al. (1997).

Nonparametric model with symmetry assumption. An exact sign-change version of the test is obtained if we can assume that, under the null hypothesis,

JU ∼ U

for all n × n sign-change matrices J. Here U = (U1 , ..., Un )′ . Note that the sign covariance matrix satisfies UCOV(Y) = UCOV(JY) for all J (invariance under sign changes). Therefore

Q2 (Y) = 1n′ U(U′ U)−1 U′ 1n and Q2 (JY) = 1n′ JU(U′ U)−1 U′ J1n .

Recall that the exact p-value is calculated as

EJ [ I{ Q2 (JY) ≥ Q2 (Y) } ],

where I{·} is an indicator function and the expected value is calculated for a uniformly distributed sign-change matrix J (with 2ⁿ possible values). In practice, the expected value is naturally often approximated by simulations from the uniform distribution of J.

Fig. 6.1 The scatterplot for spatial signs for differences E-N, S-N and W-N. The spatial signs lie
on the 3-variate unit sphere.

Example 6.1. Cork boring data. Consider again the 3-variate vector of E-N, S-
N and W-N. The 3-variate spatial signs are illustrated in Figure 6.1. The observed
value of Q2 (Y) is 13.874 with corresponding p-value 0.003:

> signs_3v <- spatial.sign(cork_3v, FALSE, FALSE)


> colMeans(signs_3v)
[1] -0.28654 0.01744 -0.28271
> SCov(signs_3v, location = c(0, 0, 0))
[,1] [,2] [,3]
[1,] 0.32215 0.05380 0.03225
[2,] 0.05380 0.34064 0.08021
[3,] 0.03225 0.08021 0.33721
> mv.1sample.test(cork_3v, score = "s")

One sample spatial sign test using outer standardization

data: cork_3v
Q.2 = 13.87, df = 3, p-value = 0.003082
alternative hypothesis: true location is not equal to c(0,0,0)

>
> pairs(signs_3v, labels = colnames(cork_3v), las = 1)
>

Fig. 6.2 The scatterplot for spatial signs for differences S-N and W-E.

Example 6.2. Cork boring data. In the bivariate case the spatial signs are given in Figure 6.2. The observed value of Q2 (Y) is 0.017 with p-value 0.991. With the R package,

> signs_2v <- spatial.sign(cork_2v, FALSE, FALSE)


> colMeans(signs_2v)
[1] -0.0170622 0.0002194
> SCov(signs_2v, location = c(0, 0))
[,1] [,2]
[1,] 0.47083 0.01315
[2,] 0.01315 0.52917
> mv.1sample.test(cork_2v, score = "s")

One sample spatial sign test using outer standardization

data: cork_2v
Q.2 = 0.0173, df = 2, p-value = 0.9914
alternative hypothesis: true location is not equal to c(0,0)

>
> plot(signs_2v, xlab = "S_N", ylab = "W_E", ylim = c(-1, 1),
xlim = c(-1, 1), las = 1, pty = "s")
>

6.1.3 The test with inner standardization

Despite all its nice properties listed so far, the sign test statistic with outer standardization is unfortunately not affine invariant. Then, for example, the p-value depends on the chosen coordinate system. It is, however, invariant under orthogonal transformations; that is, with outer standardization,

Q2 (YO) = 1n′ UO(O′ U′ UO)−1 O′ U′ 1n = 1n′ U(U′ U)−1 U′ 1n = Q2 (Y),

for all orthogonal O. Theorem 3.6 then implies the following.


Theorem 6.3. Let S = S(Y) be any scatter matrix with respect to the origin. Then
Q2 (YS−1/2 ) is affine invariant.

How should one then choose the scatter statistic S? It seems natural to use the
scatter matrix given by inner standardization. It then appears that the resulting affine
invariant sign test is distribution-free under extremely weak assumptions. Using the
inner standardization we get the following.
Definition 6.1. Tyler’s transformation S−1/2 is the transformation that makes the
spatial sign covariance matrix proportional to the identity matrix,

p · UCOV(YS−1/2 ) = I p .

The matrix can be chosen so that tr(S) = p; this shape matrix is then called Tyler’s
scatter matrix (with respect to the origin).
Tyler’s transformation (and Tyler’s shape matrix) exists under weak conditions;
see Tyler (1987). Tyler’s transformation tries to make the spatial signs of the trans-
formed data points ± S−1/2 yi , i = 1, ..., n, be uniformly distributed on the unit p-
sphere. Tyler’s shape matrix S and Tyler’s transformation S−1/2 are surprisingly
easy to compute. The iterative construction may begin with S = I p and an iteration
step is
S ← p S1/2 UCOV(YS−1/2 ) S1/2 .
If |p UCOV(YS−1/2 ) − I p | is sufficiently small, then stop and fix the scale by
S ← [p/tr(S)]S. Tyler (1987) gives weak conditions under which the algorithm con-
verges.
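Because S1/2 UCOV(YS−1/2 ) S1/2 = AVE{yi yi′ /(yi′ S−1 yi )} for the symmetric square root, the same iteration can be written directly in terms of S, which gives a compact R sketch (tyler.shape.sketch is our own illustrative name; packages such as ICSNP, on which MNM builds, provide production implementations):

# Fixed-point iteration for Tyler's shape matrix with respect to the origin,
# normalized so that tr(S) = p.
tyler.shape.sketch <- function(Y, eps = 1e-6, maxiter = 100) {
  n <- nrow(Y); p <- ncol(Y)
  S <- diag(p)
  for (k in seq_len(maxiter)) {
    d <- rowSums((Y %*% solve(S)) * Y)        # d_i = y_i' S^{-1} y_i
    S.new <- p * crossprod(Y / sqrt(d)) / n   # p * AVE{y_i y_i' / d_i}
    S.new <- p * S.new / sum(diag(S.new))     # fix the scale: tr(S) = p
    if (max(abs(S.new - S)) < eps) return(S.new)
    S <- S.new
  }
  S
}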

Tyler’s shape matrix S(Y) is thus calculated with respect to the origin. It is inter-
esting to note that
1. Its value depends on yi only through Ui = |yi |−1 yi .
2. It is affine equivariant in the sense that

S(YH ) ∝ HS(Y)H ,

for all datasets Y and all full rank matrices H.



Consider in the following the location-scatter model

yi = μ + Ωε i , i = 1, ..., n,

where the independent residuals ε i have a uniformly bounded density and the ε i are
centered and standardized so that

E(U(ε i )) = 0 and p · E(U(ε i )U(ε i ) ) = I p

(true in model (B2)). Then under the null hypothesis Tyler’s shape matrix S(Y)
converges in probability to [p/tr(Σ )]Σ where Σ = Ω Ω  .

The spatial signs of the Tyler-transformed observations, Ûi = U(S−1/2 yi ), i = 1, ..., n, are called standardized spatial signs. As before, we then write

Û = (Û1 , ..., Ûn )′

for the matrix of observed standardized spatial signs. The multivariate sign test
based on standardized signs, the spatial sign test with inner standardization, then
rejects H0 for large values of

Q2 (YS−1/2 ) = 1n′ Û(Û′ Û)−1 Û′ 1n = (p/n) |Û′ 1n |² = np |AVE{Ûi }|² .

Q2 (YS−1/2 ) is simply np times the squared length of the average direction of the
transformed data points. This test was proposed and developed in Randles (2000)
where the following important result is also given.

Theorem 6.4. The spatial sign test with inner standardization, Q2 (YS−1/2 ), is affine
invariant and strictly distribution-free in the model of (B1) of elliptical directions
(|ε i |−1 ε i uniformly distributed). The limiting distribution of Q2 (YS−1/2 ) under the
null hypothesis is then a χ p2 distribution.
Proof. The affine invariance was proven in Theorem 6.3. The fact that the test is
distribution-free under the model of elliptical directions follows from the fact that
Q2 (YS−1/2 ) depends on the observations only through |ε i |−1 εi , i = 1, ..., n.
Assume next (without loss of generality) that Ω = I p ; that is, Y = ε . Then S−1/2 is a root-n consistent estimate of I p (Tyler (1987)). Thus

Δ ∗ = √n (S−1/2 − I p ) = OP (1).

Then write

S−1/2 = I p + n−1/2 Δ ∗ ,

where Δ ∗ is thus bounded in probability. Using Lemma 6.2 we obtain

n−1/2 ∑ U(S−1/2 yi ) = n−1/2 ∑ Ui + n−1 ∑ (Δ ∗ − Ui′ Δ ∗ Ui ) Ui + oP (1),

where the sums are over i = 1, ..., n.

For |Δ ∗ | < M, the second term in the expansion converges uniformly in probability
to zero due to its linearity with respect to the elements of Δ ∗ and due to the symmetry
of the distribution of Ui . Therefore

n−1/2 ∑ U(S−1/2 yi ) − n−1/2 ∑ Ui →P 0.

It is easy to see that both B(Y) and B(YS−1/2 ) converge in probability to B = (1/p)I p . Therefore Q2 (YS−1/2 ) − Q2 (Y) →P 0 and the result follows.

Example 6.3. Cork boring data. Consider again the 3-variate vector of E-N, S-
N and W-N. The 3-variate standardized spatial signs are illustrated in Figure 6.3.
The observed value of Q2 is 14.57 with corresponding p-value 0.002. Using the R
package,

> signs_i_3v <- spatial.sign(cork_3v, FALSE, TRUE)


>
> mv.1sample.test(cork_3v, score = "s", stand = "i")

One sample spatial sign test using inner standardization

data: cork_3v
Q.2 = 14.57, df = 3, p-value = 0.002222
alternative hypothesis: true location is not equal to c(0,0,0)

>
> pairs(signs_i_3v, labels = colnames(cork_3v), las = 1)
>

Example 6.4. Cork boring data. In the bivariate case, with the standardized signs shown in Figure 6.4, the null hypothesis cannot be rejected, as Q2 (Y) is 0.012 with p-value 0.994.

>
> signs_i_2v <- spatial.sign(cork_2v, FALSE, TRUE)
>
> mv.1sample.test(cork_2v, score = "s", stand = "i")

One sample spatial sign test using inner standardization

data: cork_2v
Q.2 = 0.0117, df = 2, p-value = 0.9942
alternative hypothesis: true location is not equal to c(0,0)

Fig. 6.3 The scatterplot for (inner) standardized spatial signs for differences E-N, S-N and W-N. The spatial signs lie on the 3-variate unit sphere.

>
> plot(signs_i_2v, xlab = "S_N", ylab = "W_E", ylim = c(-1, 1),
xlim = c(-1, 1), las = 1, pty = "s")
>

6.1.4 Other sign-based approaches for the testing problem

We show some connections to other test statistics proposed in the literature. Assume
the model (B2). Then

E(|ε i |−1 ε i ) = 0 and p · E(|ε i |−2 ε i ε i ) = I p .

If μ = 0 and Ω = I p then

Q2 (Y) − (p/n) ∑i, j Ui′ U j →P 0.

Therefore, in this case, the spatial sign test statistic is asymptotically equivalent to
Rayleigh’s statistic

Fig. 6.4 The scatterplot for standardized spatial signs for differences S-N and W-E.

(p/n) ∑i, j cos(Ui , U j ) = (p/n) ∑i, j cos(yi , y j ),

where cos(yi , y j ) is the cosine of the angle between yi and y j . Next note that if
S = S(Y) is Tyler’s scatter matrix then

Q2 (YS−1/2 ) = (p/n) ∑i, j cos(Ûi , Û j ).

Randles (1989) introduced a nonparametric counterpart of cos(Ui , U j ) based on the so-called interdirection counts. His test statistic was

V (Y) = (p/n) ∑i, j cos(π p̂i, j ),

where the proportion p̂i, j is the observed fraction of times that yi and y j fall on oppo-
site sides of data-based hyperplanes formed by the origin and p − 1 data points. This
is an extension of the Blumen (1958) bivariate sign test. The test statistic is affine
invariant and strictly distribution-free under the model (B1) of elliptical directions.
It is remarkable that no scatter matrix estimate is then needed to attain affine equiv-
ariance. The test is, however, computationally difficult in high dimensions. See also
Chaudhuri and Sengupta (1993) and Koshevoy et al. (2004).

The sign test using the affine equivariant Oja signs (see Oja (1999)) is also, in the
elliptic case, asymptotically equivalent to the invariant version of the spatial sign test
using Q2 (YS−1/2 ). The latter is again computationally much more convenient. For
general classes of distribution-free bivariate sign tests, see Oja and Nyblom (1989)
and Larocque et al. (2000).

The one-sample location test based on marginal signs is described in Puri and
Sen (1971). The test is not affine invariant but it is invariant under odd monotone
transformations to the marginal variables. Affine invariant versions are obtained us-
ing the transformation technique described in Chakraborty and Chaudhuri (1999).
See also the approach based on the invariant coordinate system in Nordhausen et al.
(2009).

6.2 Multivariate spatial median

6.2.1 The regular spatial median

Next we consider the one-sample location estimation problem. The estimate cor-
responding to the spatial sign test is the so-called spatial median μ̂ = μ̂ (Y) which
is, under general assumptions, a root-n consistent estimate of the true spatial popu-
lation median μ , and has good limiting and finite-sample distributional properties.
The estimate μ̂ is also robust with a bounded influence function and a breakdown
point 1/2. In addition, we also give an affine equivariant version of the estimate.

We introduce the spatial median using the corresponding objective function: as seen before, the spatial sign score corresponds to the sum of Euclidean distances.
Recall that the sample mean vector minimizes the objective function which is the
mean of squared Euclidean distances AVE{|yi − μ |2 }. A natural alternative measure
of distance is just the mean of Euclidean distances, and we define the following.

Definition 6.2. The sample spatial median μ̂ (Y) minimizes the criterion function
AVE{|yi − μ |} or, equivalently,

Dn (μ ) = AVE{|yi − μ | − |yi|}.

The spatial median has a very long history starting in Weber (1909), Gini and
Galvani (1929) and Haldane (1948). Gower (1974) used the term mediancenter.
Brown (1983) has developed many of the properties of the spatial median. This min-
imization problem is also sometimes known as the Fermat-Weber location problem.
Taking the gradient of the objective function, one sees that if μ̂ solves the equation

AVE{U(yi − μ̂ )} = 0,

then μ̂ is the observed spatial median. This shows the connection between the spatial
median and the spatial sign test. The estimate μ̂ is the value of the location parameter
which, if used as a null value, offers the highest possible p-value. The solution can
also be seen as the shift vector corresponding to the inner centering of the spatial
sign score test; the location shift makes the spatial signs (directions) of the centered
data points sum up to 0.

The spatial median is unique if the dimension of the data cloud is greater than one; see Milasevic and Ducharme (1987). The Weiszfeld algorithm for the computation of the spatial median has a simple iteration step,

μ ← μ + AVE{U(yi − μ )} / AVE{|yi − μ |−1 }.

The algorithm may fail sometimes, however, but a slightly modified algorithm that
converges quickly and monotonically is described by Vardi and Zhang (2001).
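A plain R sketch of the unmodified Weiszfeld iteration follows (spatial.median.sketch is our own name; in practice mv.1sample.est of the MNM package should be preferred):

# Weiszfeld iteration: mu <- mu + AVE{U(y_i - mu)} / AVE{|y_i - mu|^{-1}}.
spatial.median.sketch <- function(Y, eps = 1e-8, maxiter = 500) {
  mu <- colMeans(Y)                    # starting value
  for (k in seq_len(maxiter)) {
    D <- sweep(Y, 2, mu)               # rows y_i - mu
    r <- pmax(sqrt(rowSums(D^2)), .Machine$double.eps)
    step <- colMeans(D / r) / mean(1 / r)
    mu <- mu + step
    if (sqrt(sum(step^2)) < eps) break
  }
  mu
}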

Next we consider the consistency and limiting distribution of the spatial median
μ̂ (Y). Under general assumptions, if Y is a random sample from F, then the sample
spatial median μ̂ (Y) converges to the population spatial median μ = μ (F).

Definition 6.3. Assume that the cumulative distribution function of yi is F. Then the
theoretical or population spatial median μ = μ (F) minimizes the criterion function

D(μ ) = E{|yi − μ | − |yi|}.

Note that, as | |yi − μ | − |yi | | ≤ |μ |, the expectation in the above definition always
exists.

For the asymptotical properties of the estimate we need the following assump-
tion.
Assumption 1. The density function of yi is uniformly bounded and continuous. Moreover, the population spatial median μ = μ (F) is unique; that is, D(μ ′) > D(μ ) for all μ ′ ≠ μ .

For the consideration of the limiting distribution, we can assume that the popu-
lation spatial median is 0. This is not a restriction as the estimate and the functional
are clearly shift equivariant. Note also that no assumption about the existence of the
moments is needed.

First note that both Dn (μ ) and D(μ ) are convex and that, based on Lemmas 6.1 and 6.3, we have the pointwise convergences

Dn (μ ) →P D(μ ) = (1/2) μ ′Aμ + o(|μ |²)

and

nDn (n−1/2 μ ) = −√n T′ μ + (1/2) μ ′Aμ + oP (1).
Then Theorem B.5 in Appendix B gives the following result.


Theorem 6.5. Under Assumption 1, μ̂ →P μ and the limiting distribution of √n (μ̂ − μ ) is

N p (0, A−1 BA−1 ).

The proofs can be found in Appendix B. Heuristically, the results in Theorem 6.5
can be simply based on Taylor’s expansion with μ = 0,
0 = √n AVE{U(yi − μ̂ )} = √n AVE{Ui } − AVE{A(yi )} √n μ̂ + oP (1),

where
A(y) = |y|−1 (I p − |y|−2 yy′ )
is the p × p Hessian matrix (matrix of the second derivatives) of |y|. The result
follows as
AVE{A(yi )} →P A
and √n AVE{Ui } →d N p (0, B).

For practical use of the normal approximation of the distribution of μ̂ one naturally needs an estimate of the asymptotic covariance matrix A−1 BA−1 . We
estimate A and B separately. First, let

A(y) = |y|−1 (I p − |y|−2 yy′ ) and B(y) = |y|−2 yy′ .

Then write

Â = AVE{A(yi − μ̂ )} and B̂ = AVE{B(yi − μ̂ )},

which, under the stated assumption, converge in probability to the population values

A = E{A(yi − μ )} and B = E{B(yi − μ )},

respectively. We now prove the following

Theorem 6.6. Under Assumption 1, Â →P A and B̂ →P B.

Proof. It is not a restriction to assume that μ = 0. Then the spatial median μ̂ is a root-n consistent estimate of 0. We write

à = AVE {A(yi )} and B̃ = AVE {B(yi )} .



Note that à and B̃ would be our estimates for A and for B, respectively, if we knew
the true value μ = 0. Then naturally

à →P A and B̃ →P B.

As

| (a − b)/|a − b| − a/|a| | ≤ 2 |b|/|a| for all a ≠ 0 and all b,

and

| (a − b)(a − b)′/|a − b|² − aa′/|a|² | ≤ 4 |b|/|a| for all a ≠ 0 and all b,

then

|B̂ − B̃| ≤ (1/n) ∑i 4 |μ̂ |/|yi | →P 0.

(Use Slutsky's theorem.) As B̃ →P B, also B̂ →P B.


It is much trickier to prove that Â →P A. We need to play with three positive constants δ1 , δ2 , and δ3 . We use the root-n consistency of μ̂ and assume that |μ̂ | < δ1 /√n. (This is true with a probability that can be made close to one with large δ1 .) Also the observations yi should somehow be blocked out from μ̂ , i = 1, ..., n. For that, write

I1i = I{ |yi − μ̂ | < δ2 /√n },
I2i = I{ δ2 /√n ≤ |yi − μ̂ | < δ3 }, and
I3i = I{ |yi − μ̂ | ≥ δ3 }.

Then

Ã − Â = (1/n) ∑i (A(yi ) − A(yi − μ̂ ))
 = (1/n) ∑i I1i · [A(yi ) − A(yi − μ̂ )]
 + (1/n) ∑i I2i · [A(yi ) − A(yi − μ̂ )]
 + (1/n) ∑i I3i · [A(yi ) − A(yi − μ̂ )].

The first average, with nonzero terms in a shrinking neighborhood of μ̂ only, is zero with a probability

P(I11 = · · · = I1n = 0) ≥ (1 − cM δ2^p / n^(p/2))^n ≥ (1 − cM δ2² / n)^n → e^(−cM δ2²),

where M = supy f (y) < ∞ and c = π^(p/2) /Γ (p/2 + 1) is the volume of the p-variate unit ball. The first average is thus zero with a probability that can be made close to one with small choices of δ2 > 0. For the second average, one gets

(1/n) ∑i | I2i · [A(yi ) − A(yi − μ̂ )] | ≤ (1/n) ∑i 6 I2i |μ̂ | / (|yi − μ̂ | |yi |) ≤ (1/n) ∑i 6 I2i δ1 / (δ2 |yi |)

which converges to a constant that can be made as close to zero as one wishes with
small δ3 > 0. Finally also the third average

(1/n) ∑i | I3i · [A(yi ) − A(yi − μ̂ )] | ≤ (1/n) ∑i 6 I3i |μ̂ | / (|yi − μ̂ | |yi |) ≤ (1/(n√n)) ∑i 6 I3i δ1 / (δ3 |yi |)

converges to zero in probability for all choices of δ1 and δ3 .

Theorems 6.5 and 6.6 thus imply that the distribution of μ̂ can be approximated by

N p (μ , (1/n) Â−1 B̂Â−1 ),
and approximate confidence ellipsoids for μ can be constructed. In the spherically
symmetric case the estimation is much easier as the matrices are simply (p > 1)

A = ((p − 1)/p) E[|yi − μ |−1 ] I p and B = (1/p) I p .

An estimate of the limiting covariance matrix of the spatial median is then

( p / ( (p − 1)² [AVE{|yi − μ̂ |−1 }]² ) ) I p .

The spatial median is extremely robust: Brown showed that the estimator has a
bounded influence function. It also has a breakdown point of 1/2. See Niinimaa and
Oja (1995) and Lopuhaä and Rousseeuw (1991).

If all components are on the same unit of measurement (and all the components
may be rescaled only in a similar way), the spatial median is an attractive descriptive
measure of location. Rotating the data cloud rotates the median correspondingly;
that is,

μ̂ (YO ) = Oμ̂ (Y).


Unfortunately, the estimate is not equivariant under arbitrary affine transformations.

6.2.2 The estimate with inner standardization

To create an affine equivariant version of the multivariate median, we transform the data as we did in the test construction. The estimates, however, must be transformed
back to the original coordinate system. The procedure is then as follows.
1. Take any scatter matrix S = S(Y).
2. Standardize the data matrix: YS−1/2 .
3. Find the spatial median for the standardized data matrix μ̂ (YS−1/2 ).
4. Retransform the estimate: μ̃ (Y) = S1/2 μ̂ (YS−1/2 ).
This median utilizing “data-driven” transformation S−1/2 is known as the
transformation-retransformation (TR) spatial median, and was considered by
Chakraborty et al. (1998). Then the affine equivariance follows.

Theorem 6.7. Let S = S(Y) be any scatter matrix. Then the transformation retrans-
formation spatial median

μ̃ (Y) = S1/2 μ̂ (YS−1/2 )

is affine equivariant.

It is remarkable that the almost sure convergence and the limiting normality of
the spatial median did not require any moment assumptions. Therefore, for the trans-
formation, a scatter matrix with weak assumptions should be used as well. It is an
appealing idea also to link the spatial median with Tyler's transformation. This was proposed by Hettmansperger and Randles (2002).

Definition 6.4. Let μ be a p-vector and S > 0 a symmetric p × p matrix, and define

ε i = ε i (μ , S) = S−1/2 (yi − μ ), i = 1, ..., n.

The Hettmansperger-Randles (HR) estimates of location and scatter are the values of μ and S that simultaneously satisfy

AVE{U(ε i )} = 0 and p · AVE{U(ε i )U(ε i )′ } = I p .

In the HR estimation, the location estimate is the TR spatial median, and the
scatter estimate is Tyler’s estimate with respect to the TR spatial median. The shift
vector and scatter matrix are thus obtained using inner centering and standardization
with the spatial sign score function U(y). The location and scatter estimates are
affine equivariant and apparently estimate μ and Σ in the model (B2) of elliptical

directions. Their properties were developed by Hettmansperger and Randles (2002).


They showed that the HR estimate has a bounded influence function and a positive
breakdown point. The limiting distribution is given by the following.

Theorem 6.8. Let Y = (y1 , ..., yn ) be a random sample and assume that the yi are
generated by
yi = Ω ε i + μ , i = 1, ..., n,
where

E{U(ε i )} = 0 and p · E{U(ε i )U(ε i )′ } = I p .

Then the limiting distribution of √n (μ̃ − μ ) is N p (0, p−1 S1/2 A−2 S1/2 ), where A = E(A(S−1/2 yi )) and S is Tyler's scatter matrix.

The HR estimate is easy to compute even in high dimensions. The iteration steps
(as in M-estimation) first update the residuals, then the location center, and finally
the scatter matrix as follows.
1. ε i ← S−1/2 (yi − μ ), i = 1, ..., n.
2. μ ← μ + S1/2 AVE{U(ε i )} / AVE{|ε i |−1 }.
3. S ← p S1/2 AVE{U(ε i )U(ε i )′ } S1/2 .
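A direct R transcription of the three steps is given below (hr.estimate.sketch is our own illustrative name; in the book's examples the same estimate is obtained with mv.1sample.est using the spatial sign score and inner standardization):

# Hettmansperger-Randles estimate: iterate the residual, location, and
# scatter updates; S^{1/2} is the symmetric square root of S.
hr.estimate.sketch <- function(Y, eps = 1e-6, maxiter = 200) {
  n <- nrow(Y); p <- ncol(Y)
  mu <- colMeans(Y); S <- diag(p)
  for (k in seq_len(maxiter)) {
    eig <- eigen(S, symmetric = TRUE)
    S.root <- eig$vectors %*% (sqrt(eig$values) * t(eig$vectors))
    E <- sweep(Y, 2, mu) %*% solve(S.root)   # eps_i = S^{-1/2}(y_i - mu)
    r <- pmax(sqrt(rowSums(E^2)), .Machine$double.eps)
    U <- E / r                               # spatial signs of the residuals
    step <- as.vector(S.root %*% colMeans(U)) / mean(1 / r)
    mu <- mu + step
    S <- p * S.root %*% (crossprod(U) / n) %*% S.root
    S <- p * S / sum(diag(S))                # keep tr(S) = p
    if (sqrt(sum(step^2)) < eps) break
  }
  list(location = mu, scatter = S)
}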

Unfortunately, there is no proof so far for the convergence of the above algorithm
although in practice it always seems to work. There is no proof for the existence or
uniqueness of the HR estimate either. In practice, this is not a problem, however. If,
in the spherical case around the origin, the initial location and shape estimates, say
M and S are root-n consistent, that is,
√n M = OP (1) and √n (S − I p ) = OP (1)

and tr(S) = p, then the k-step estimates (obtained after k iterations of the above algorithm) satisfy

√n Mk = (1/p)^k √n M + [1 − (1/p)^k ] ( p / ((p − 1) E(ri−1 )) ) √n AVE{ui } + oP (1)

and

√n (Sk − I p ) = (2/(p + 2))^k √n (S − I p ) + [1 − (2/(p + 2))^k ] ((p + 2)/p) √n ( p · AVE{ui ui′ } − I p ) + oP (1).

Asymptotically, the k-step estimate behaves as a linear combination of the initial pair of estimates and the Hettmansperger-Randles estimate. The larger k is, the more
similar is the distribution to that of the HR estimate.

Example 6.5. Cork boring data. If we then wish to estimate the unknown spatial
median (3-variate case), then the regular spatial median and HR estimate behave in
a quite similar way. See also Figure 6.5.

> est.sign.o.3v <- mv.1sample.est(cork_3v, "s")


> summary(est.sign.o.3v)
The spatial median of cork_3v is:
[1] -3.5013 -0.0875 -3.9750

And has the covariance matrix:


[,1] [,2] [,3]
[1,] 2.0120 0.9025 0.4734
[2,] 0.9025 3.3516 1.3180
[3,] 0.4734 1.3180 3.3704
>
> est.sign.i.3v <- mv.1sample.est(cork_3v, "s", "i")
> summary(est.sign.i.3v)
The equivariant spatial median of cork_3v is:
[1] -3.2736 -0.0013 -4.2687

And has the covariance matrix:


[,1] [,2] [,3]
[1,] 2.172 1.374 0.154
[2,] 1.374 3.487 1.297
[3,] 0.154 1.297 2.364
>
> plotMvloc(est.sign.o.3v, est.sign.i.3v, X=cork_3v,
color.ell=1:3, lty.ell=1:3, pch.ell=15:17)

Example 6.6. Cork boring data. In the bivariate case we get

> est.sign.o.2v <- mv.1sample.est(cork_2v, "s")
> summary(est.sign.o.2v)
The spatial median of cork_2v is:
[1] -0.3019 0.0580

And has the covariance matrix:


[,1] [,2]

Fig. 6.5 The scatterplot with estimated regular spatial median and the HR estimate and their 95%
confidence ellipsoids.

[1,] 5.434 -1.883


[2,] -1.883 6.184
>
> est.sign.i.2v <- mv.1sample.est(cork_2v, "s", "i")
> summary(est.sign.i.2v)
The equivariant spatial median of cork_2v is:
[1] -0.2467 -0.0234

And has the covariance matrix:


[,1] [,2]
[1,] 5.626 -1.595
[2,] -1.595 6.017
>
> plotMvloc(est.sign.o.2v, est.sign.i.2v, X=cork_2v,
color.ell=1:3, lty.ell=1:3, pch.ell=15:17)

The regular spatial median and HR estimate behave almost identically in the
case of bivariate data; see Figure 6.6. However, if the second component is multi-
plied by 10 ( Figure 6.7), the results differ. The equivariant spatial median now has
a smaller confidence ellipsoid because the regular spatial median loses efficiency
if the marginal variables are heterogeneous in their variation. The R code for this
comparison is as follows.

Fig. 6.6 The scatterplot with estimated regular spatial median and the HR estimate and their 95%
confidence ellipsoids.

> cork_2v_scaled <- transform(cork_2v, W_E = W_E * 10)


>
> est.sign.o.2v_scaled <- mv.1sample.est(cork_2v_scaled, "s")
> summary(est.sign.o.2v_scaled)
The spatial median of cork_2v_scaled is:
[1] 0.4728 12.7727

And has the covariance matrix:


[,1] [,2]
[1,] 11.161 4.616
[2,] 4.616 880.176
>
> est.sign.i.2v_scaled <-
mv.1sample.est(cork_2v_scaled, "s", "i")
> summary(est.sign.i.2v_scaled)
The equivariant spatial median of cork_2v_scaled is:
[1] -0.2467 -0.2337

And has the covariance matrix:


[,1] [,2]
[1,] 5.626 -15.94
[2,] -15.945 601.74
>
> plotMvloc(est.sign.o.2v_scaled, est.sign.i.2v_scaled,
X=cork_2v_scaled, color.ell=1:3, lty.ell=1:3,
pch.ell=15:17)


Fig. 6.7 The estimates with 95 % confidence ellipsoids for the regular spatial median and the affine
equivariant spatial median (HR estimate).

6.2.3 Other multivariate medians

The vector of marginal medians minimizes the L1 objective function

AVE{|yi1 − μ1 | + · · · + |yip − μ p|}.

See Puri and Sen (1971) and Rao (1988) for the asymptotic covariance matrix
of the vector of marginal sample medians. The asymptotic efficiencies naturally
agree with the univariate asymptotic efficiencies. It is not affine invariant but the
transformation-retransformation technique can be used to find the invariant version
of the estimate; see Chakraborty and Chaudhuri (1998).

There are a number of affine equivariant multivariate generalizations of the median: the half-space median (Tukey (1975)), the multivariate Oja median (Oja (1983)), and the multivariate Liu median (Liu (1990)). For these and other multivariate medians, see the surveys by Small (1990) and Niinimaa and Oja (1999).
Chapter 7
One-sample problem: Spatial signed-rank test
and Hodges-Lehmann estimate

Abstract The spatial signed-rank score function Q(y) is used for the one-sample
location problem. The test is then the spatial signed-rank test, and the estimate is the
spatial Hodges-Lehmann estimate. The tests and estimates based on outer standard-
ization as well as those based on inner standardization are again discussed.

7.1 Multivariate spatial signed-rank test

We consider first the location model

yi = μ + ε i , i = 1, ..., n,

where the independent residuals ε i have a joint density f (ε ) that is uniformly bounded. The residuals are now thought to be centered so that

E(U(ε i + ε j )) = 0, i ≠ j.

As before, the cumulative distribution function of ε i is denoted by F(ε ). In this model, the parameter μ is the so-called Hodges-Lehmann center of the distribution of yi ; that is,

E(U(yi + y j − 2 μ )) = 0, i ≠ j.
Parameter μ is the spatial median of the distribution of the average (y1 + y2 )/2. If
F is symmetrical then naturally μ is the symmetry center. We wish to test the null
hypothesis
H0 : μ = 0.

Again, we start by giving the (theoretical) score function QF , the test statistic
T(Y), and the matrices A and B. The multivariate spatial signed-rank test is obtained
if one uses the spatial signed-rank score function

T(y) = QF (y) = (1/2) E{U(y − ε i ) + U(y + ε i )}.
Then the test for testing H0 : μ = 0 uses

T(Y) = AVE{QF (yi )}.

The asymptotical properties of the test are based on the matrices

A = E{A(ε 1 + ε 2 )} and B = QCOV(F) = E{QF (ε i )QF (ε i )′ },

where, as before, A(ε ) = |ε |−1 (I p − |ε |−2 ε ε ′ ). As the density function of ε i is assumed to be continuous and uniformly bounded, A also exists.

Note that the population signed-rank score function is of course unknown in practice, and in the test construction it is replaced by the estimated score function

Q(y) = (1/2) AVE{U(y − yi ) + U(y + yi )}.

(Q(y) is a consistent estimate of QF (y) if the null hypothesis is true.) The observed spatial signed-ranks are then

Qi = Q(yi ), i = 1, ..., n,

and the test statistic using the estimated scores is simply the following.
Definition 7.1. The spatial signed-rank test statistic for testing H0 : μ = 0 is the
average of spatial signed-ranks,

T̂(Y) = AVE{Qi }.

 
As AVE{U(yi − y j )} = 0, the test statistic can be simplified to

T̂(Y) = (1/2) AVE{U(yi + y j )}.
The statistic T̂ is a V-statistic and asymptotically equivalent to the corresponding
U-statistic
T̃(Y) = (1/2) AVEi&lt;j { U(yi − y j ) + U(yi + y j ) }.
For the theory of U-statistics and V-statistics, we refer to Serfling (1980). Asymp-
totic equivalence means that
√n ( T̃(Y) − T̂(Y) ) →P 0.

The test statistic T̂ is only orthogonal equivariant, not affine equivariant. Moreover,
its finite sample and asymptotic null distribution depend both on the distributions

of modulus |yi | and on the distribution of the direction Ui . A natural estimate of its
asymptotic covariance matrix is the signed-rank covariance matrix,
 
B̂ = B(Y) = QCOV(Y) = AVE{ Qi Qi′ }.

Note that B̂ is asymptotically equivalent with a U-statistic with a symmetric and bounded kernel, and the next lemma easily follows.
Lemma 7.1. Under H0 , B̂ →P B.

We thus replace the “true” but unknown test statistic T(Y) by test statistic T̂(Y)
which in turn is asymptotically equivalent with T̃(Y). It is straightforward to see
that T(Y) is the projection of T̃(Y) in the sense that
T(Y) = ∑ᵢ₌₁ⁿ E[ T̃(Y) | yi ]

and therefore (use Theorem 5.3.2 in Serfling (1980) with a bounded kernel) the
following holds.
Lemma 7.2. √n [T(Y) − T̃(Y)] →P 0 and also √n [T(Y) − T̂(Y)] →P 0.
The regular central limit theorem (CLT) with independent and identically dis-
tributed observations then gives for T̂ = T̂(Y)

Theorem 7.1. Under H0 : μ = 0, √n T̂ →d N p (0, B) and

Q2 = Q2 (Y) = nT̂′ B̂−1 T̂ →d χ p2 .
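For concreteness, the whole construction, from the estimated scores Qi to the statistic Q2, can be sketched in base R with an O(n²p) loop (spatial.signrank.test is our own helper name; mv.1sample.test with score = "r" is the MNM implementation):

# Spatial signed-rank test with outer standardization.
spatial.signrank.test <- function(Y) {
  n <- nrow(Y); p <- ncol(Y)
  U <- function(Z) {                 # row-wise spatial signs, with U(0) = 0
    r <- sqrt(rowSums(Z^2)); r[r == 0] <- 1
    Z / r
  }
  Q <- matrix(0, n, p)               # Q_i = (1/2) AVE_j{U(y_i - y_j) + U(y_i + y_j)}
  for (i in 1:n) {
    Dm <- sweep(-Y, 2, Y[i, ], "+")  # rows y_i - y_j
    Dp <- sweep(Y, 2, Y[i, ], "+")   # rows y_i + y_j
    Q[i, ] <- 0.5 * (colMeans(U(Dm)) + colMeans(U(Dp)))
  }
  Tbar <- colMeans(Q)
  Bhat <- crossprod(Q) / n           # signed-rank covariance matrix
  Q2 <- n * sum(Tbar * solve(Bhat, Tbar))
  list(Q2 = Q2, p.value = 1 - pchisq(Q2, df = p))
}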

Totally nonparametric model. We collect the results obtained so far. We first transform

Y = (y1 , ..., yn )′ → Q = (Q1 , ..., Qn )′ .

We then use outer standardization and get (the squared form of) the test statistic

Q2 = 1n′ Q(Q′ Q)−1 Q′ 1n or n · tr( (Q′ P1n Q)(Q′ Q)−1 ).

If the distribution of yi is centered around μ in the sense that

E(U(y1 + y2 − 2 μ )) = 0,

then the limiting distribution of the test statistic Q2 under H0 : μ = 0 is a chi square
distribution with p degrees of freedom and the approximate p-values are found as
the tail probabilities of χ p2 .

Recall that Hotelling’s test and the test based on spatial signs were similarly

1n′ Y(Y′ Y)−1 Y′ 1n and 1n′ U(U′ U)−1 U′ 1n ,



respectively.

Nonparametric model with symmetry assumptions. An exact sign-change test version is obtained if
JY ∼ Y
for all n × n sign-change matrices J. Note that QCOV(JY) = QCOV(Y) and

Q2 (JY) = 1n′ JQ(Q′ Q)−1 Q′ J1n ,

and the exact p-value for the conditionally distribution-free sign-change test is

EJ [ I{ Q2 (JY) ≥ Q2 (Y) } ].

Consider next the model

yi = Ω ε i + μ , i = 1, ..., n,

where the standardized residuals ε i are independent, symmetric, and centered so that

E(U(ε i + ε j )) = 0, i ≠ j.
Symmetry center μ is thus the Hodges-Lehmann center of the distribution of yi . As
the model is closed under affine transformations, it is natural to require that the test
is affine invariant (the selected coordinate system does not have an effect on the p-
value obtained from the test). The signed-rank test statistic is not affine invariant.
However, we again have the following result.

Theorem 7.2. Let S = S(Y) be any scatter matrix. Then the signed-rank test statistic
calculated for the transformed data set, Q2 (YS−1/2 ), is affine invariant.

A natural choice for S is the scatter matrix that makes the signed-rank covariance
matrix of the standardized observations proportional to the identity matrix. Let us
give the following definition.

Definition 7.2. The scatter matrix based on signed-ranks is the symmetric p × p matrix S = S(Y) > 0 with tr(S) = p such that, if Q̂i = QYS−1/2 (S−1/2 yi ), then the signed-ranks Q̂i satisfy

p · AVE{Q̂i Q̂i′ } = AVE{|Q̂i |²} · I p .

The transformation S−1/2 makes the signed-rank covariance matrix QCOV(YS−1/2 ) proportional to the identity matrix, as if the signed-ranks were spherically distributed on the unit p-ball B p . The iterative algorithm for its computation again uses the iteration steps

S ← p S1/2 QCOV(YS−1/2 ) S1/2 and S ← [p/tr(S)] S.

Unfortunately, unlike for Tyler’s scatter matrix, there is no proof of the convergence
of the algorithm so far, but in practice it seems always to converge.

The spatial signed-ranks of the transformed observations, Q̂i , i = 1, ..., n, are called standardized spatial signed-ranks. The multivariate spatial signed-rank test
based on the inner standardization then rejects H0 for large values of

Q2 (YS−1/2 ) = 1n′ Q̂(Q̂′ Q̂)−1 Q̂′ 1n = np · |AVE{Q̂i }|² / AVE{|Q̂i |²},

which is simply np times the ratio of the squared length of the average signed-rank
to the average of squared lengths of signed-ranks. As in the case of the spatial sign
test, we have the next theorem.

Theorem 7.3. Test statistic Q2 (YS−1/2 ) is affine invariant and, under the null hy-
pothesis H0 : μ = 0,
Q2 (YS−1/2 ) →d χ p2 .

Elliptically symmetric model. Consider the elliptical model

yi = Ω ε i + μ , i = 1, ..., n,

where ε i has a spherical distribution with cumulative distribution function F. Write ε = (ε 1 , ..., ε n )′ . Now the statistics Q2 (YS−1/2 ) and Q2 (ε ) are asymptotically equiv-
ε = (ε 1 , ..., ε n ) . Now the statistics Q2 (YS−1/2 ) and Q2 (ε ) are asymptotically equiv-
alent. Let us construct the test using observations (ε 1 , ..., ε n ) from a spherically
symmetric distribution (model (A2)). Then

E(q2F (|ε i |)) τ2


B = QCOV(F) = RCOV(F) = Ip = Ip,
p p

where the constant τ 2 depends on the distribution of |ε i |. But then the test statistic Q2 (YS−1/2 ) is asymptotically equivalent to

( p/(4n³ τ 2 ) ) ∑ cos(ε i + ε j , ε i′ + ε j′ ),

where i, j, i′ , and j′ all go over the indices 1, ..., n. Jan and Randles (1994) constructed
an affine invariant analogue of this test based again on the interdirection counts.

Example 7.1. Cork boring data. Consider again the dataset with the 3-variate vector of E-N, S-N, and W-N. The standardized spatial signed-ranks are illustrated in Figure 7.1. The observed value of Q2 (Y), with inner standardization, in the 3-variate case is 13.67 and the corresponding p-value is 0.003.

> signed_ranks_i_3v <- spatial.signrank(cork_3v, FALSE, TRUE)


>
> pairs(signed_ranks_i_3v, labels = colnames(cork_3v), las = 1)
>

> mv.1sample.test(cork_3v, score = "r", stand = "i")

One sample spatial signed-rank test using


inner standardization

data: cork_3v
Q.2 = 13.67, df = 3, p-value = 0.003384
alternative hypothesis: true location is not equal to c(0,0,0)


Fig. 7.1 The standardized spatial signed-ranks for the 3-variate data.

Next consider the bivariate data with variables S-N and W-E. The standardized
spatial signed-ranks are illustrated in Figure 7.2. The observed value of Q2 (Y) with
inner standardization is 0.44 with corresponding p-value 0.80.

> signed_ranks_i_2v <- spatial.signrank(cork_2v, FALSE, TRUE)


>

> plot(signed_ranks_i_2v, xlab = "S_N", ylab = "W_E",


ylim = c(-1, 1), xlim = c(-1, 1), las = 1, pty = "s")
>
> mv.1sample.test(cork_2v, score = "r", stand = "i")

One sample spatial signed-rank test using


inner standardization

data: cork_2v
Q.2 = 0.4373, df = 2, p-value = 0.8036
alternative hypothesis: true location is not equal to c(0,0)


Fig. 7.2 The standardized spatial signed-ranks for the 2-variate data.

7.2 Multivariate spatial Hodges-Lehmann estimate

We now move to the estimation problem and define the multivariate Hodges-Lehmann estimate of the location center μ as the spatial median of all pairwise means, the Walsh averages

(yi + y j )/2, i, j = 1, ..., n.

We thus define the following.

Definition 7.3. The sample spatial Hodges-Lehmann (HL) estimate μ̂ (Y) mini-
mizes the criterion function
 
Dn (μ ) = AVE{ |yi + y j − 2 μ | − |yi + y j | }.

The link between the HL estimate and the signed-rank test statistic is again that
μ̂ often solves the equation

AVE{U(yi + y j − 2 μ̂ )} = 0.

The estimate μ̂ as a null value is the value with the highest possible p-value produced by the spatial signed-rank test. Again, the spatial median is unique if the dimension of the data cloud is greater than one. The iteration step to compute its value is

μ ← μ + (1/2) · AVE{U(yi + y j − 2 μ )} / AVE{|yi + y j − 2 μ |−1 }.
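Since μ̂ is just the spatial median of the n² Walsh averages, a compact (if memory-hungry) R sketch reuses the Weiszfeld iteration on the pairwise means (hl.estimate.sketch is our own name):

# Spatial Hodges-Lehmann estimate: Weiszfeld iteration over all Walsh
# averages (y_i + y_j)/2, i, j = 1, ..., n (the V-statistic form).
hl.estimate.sketch <- function(Y, eps = 1e-8, maxiter = 500) {
  n <- nrow(Y)
  ij <- expand.grid(i = 1:n, j = 1:n)
  W <- (Y[ij$i, , drop = FALSE] + Y[ij$j, , drop = FALSE]) / 2
  mu <- colMeans(W)
  for (k in seq_len(maxiter)) {
    D <- sweep(W, 2, mu)
    r <- pmax(sqrt(rowSums(D^2)), .Machine$double.eps)
    step <- colMeans(D / r) / mean(1 / r)
    mu <- mu + step
    if (sqrt(sum(step^2)) < eps) break
  }
  mu
}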

Consider next the limiting distribution of the HL estimate μ̂ (Y) under a mild assumption.

Assumption 2. The density of yi is uniformly bounded and continuous with a unique spatial median minimizing

D(μ ) = E{ |yi + y j − 2 μ | − |yi + y j | }, i ≠ j.

Again, for the asymptotic results, it is not a restriction to assume that μ = 0. As in the case of the spatial median, both Dn (μ ) and D(μ ) are again convex, and one can use Lemmas 6.1 and 6.3 for the pointwise results

Dn (μ ) →P D(μ ) = 2 μ ′Aμ + o(|μ |²)

and

nDn (n−1/2 μ ) = −4 √n T̂′ μ + 2 μ ′Aμ + oP (1),

where

A = E{A(yi + y j )}, i ≠ j,

with A(y) = |y|−1 (I p − |y|−2 yy′ ). Then Theorem B.5 in Appendix B gives the following result.


Theorem 7.4. Under Assumption 2, μ̂ →P μ and the limiting distribution of n(μ̂ −
μ ) is 
N p 0, A−1 BA−1 .

We next define sample statistics needed in the estimation of the limiting covariance matrix of the HL estimate. Now write

Â = A(Y − 1n μ̂ ′) = AVE{A(yi + y j − 2 μ̂ )} and B̂ = B(Y − 1n μ̂ ′),

which, under the stated assumption, converge in probability to the population values

A = E{A(ε i + ε j )}, i ≠ j, and B = E{QF (ε i )QF (ε i )′ },

respectively. (The proof is similar to that in the spatial median case.) Theorem 7.4 suggests that the distribution of μ̂ (Y) can be approximated by

N p (μ , (1/n) Â−1 B̂Â−1 ),

and approximate confidence ellipsoids for μ can be constructed.

The spatial Hodges-Lehmann estimate with inner standardization is obtained by linking the spatial Hodges-Lehmann estimate with the scatter matrix based on spatial signed-ranks in the following way.
Definition 7.4. Let μ be a p-vector and S > 0 a symmetric p × p matrix, and define

ε i = ε i (μ , S) = S−1/2 (yi − μ ), i = 1, ..., n.

The simultaneous estimates of location and scatter based on the spatial signed-rank
score function are the values of μ and S that satisfy
   
AVE {Q(ε i )} = 0 and p AVE Q(ε i )Q(ε i ) = AVE Q(ε i ) Q(ε i ) I p .

This pair of location and scatter estimates is easy to compute. Again the first
iteration step updates the residuals, the second one updates the location center, and
finally the third one updates the scatter matrix as follows.
1. ε i ← S−1/2 (yi − μ ), i = 1, ..., n.
2. μ ← μ + (1/2) · S1/2 AVE{U(ε i + ε j )} / AVE{|ε i + ε j |−1 }.
3. S ← p S1/2 QCOV(ε ) S1/2 .

Note, however, that the location estimate with inner standardization is now for
the transformation-retransformation Hodges-Lehmann center, not for the regular
Hodges-Lehmann center. In the symmetric case the centers of course are the same.

Example 7.2. Cork boring data. Consider again the two datasets, one with the 3-variate vector of E-N, S-N, and W-N and one with the bivariate vector of S-N and W-E. We wish to compute the Hodges-Lehmann estimate in the 3-variate case and in the bivariate case. We give R code to find the estimate and its covariance matrix. To compare the estimates, the 95% confidence ellipsoids for the three estimates are illustrated in Figures 7.3 and 7.4.

> # 3-variate case


> est <- mv.1sample.est(cork_3v)
> est.sign.i.3v <- mv.1sample.est(cork_3v, "s", "i")
> est.signrank.i.3v <- mv.1sample.est(cork_3v, "r", "i")
> summary(est.signrank.i.3v)
The equivariant spatial Hodges-Lehmann estimator of cork_3v is:
[1] -3.9246 -0.6865 -4.8635

And has the covariance matrix:


[,1] [,2] [,3]
[1,] 2.0198 0.2974 0.6502
[2,] 0.2974 1.8848 1.1947
[3,] 0.6502 1.1947 2.5190
>

> plotMvloc(est, est.sign.i.3v, est.signrank.i.3v, X=cork_3v,
alim="e", color.ell=1:3, lty.ell=1:3, pch.ell=15:17)
>
> # 2-variate case
> est <- mv.1sample.est(cork_2v)
> est.sign.i.2v <- mv.1sample.est(cork_2v, "s", "i")
> est.signrank.i.2v <- mv.1sample.est(cork_2v, "r", "i")
> summary(est.signrank.i.2v)
The equivariant spatial Hodges-Lehmann estimator of cork_2v is:
[1] -0.6854 -0.7337

And has the covariance matrix:


[,1] [,2]
[1,] 1.677 1.025
[2,] 1.025 3.198
>
> plotMvloc(est, est.sign.i.2v, est.signrank.i.2v, X=cork_2v,
color.ell=1:3, lty.ell=1:3, pch.ell=15:17)
>

Fig. 7.3 The sample mean vector, the equivariant spatial median, and the equivariant spatial Hodges-Lehmann estimator, with their 95% confidence ellipsoids, for the 3-variate data.

Fig. 7.4 The sample mean vector, the equivariant spatial median, and the equivariant spatial Hodges-Lehmann estimator, with their 95% confidence ellipsoids, for the 2-variate data.

7.3 Other approaches

The signed-rank scores tests by Puri and Sen (1971) combine marginal signed-rank scores tests in the widest symmetric nonparametric model. These tests are not affine

invariant and may have poor efficiency if the marginal variables are dependent. In-
variant versions of Puri-Sen tests are obtained if the data points are first transformed
to invariant coordinates; see Chakraborty and Chaudhuri (1999) and Nordhausen
et al. (2006).

The optimal signed-rank scores tests by Hallin and Paindaveine (2002) are based
on standardized spatial signs (or Randles’ interdirections; see Randles (1989) for the
corresponding sign test) and the ranks of Mahalanobis distances between the data
points and the origin. These tests assume ellipticity but do not require any moment
assumption. The tests are optimal (in the Le Cam sense) at correctly specified (ellip-
tical) densities. They are affine-invariant, robust, and highly efficient under a broad
range of densities. Later Oja and Paindaveine (2005) showed that interdirections
together with the so-called lift-interdirections allow for building totally hyperplane-
based versions of these tests. Nordhausen et al. (2009) constructed optimal signed-
rank tests in the independent component model in a similar way.

The sign and signed-rank test in Hettmansperger et al. (1994) and Hettmans-
perger et al. (1997) are based on multivariate Oja signs and signed-ranks. They can
be used in all models above, are asymptotically equivalent to spatial sign and signed-
rank tests in the spherical case, and are affine-invariant. However, at the elliptic
model, their efficiency (as well as that of the spatial sign and signed-rank tests) may
be poor when compared with the Hallin and Paindaveine tests.

Chaudhuri (1992) gives Bahadur-type representations for the spatial median and
the spatial Hodges-Lehmann estimate. See also Möttönen et al. (2005) for multi-
variate generalized spatial signed-rank methods.
Chapter 8
One-sample problem: Comparisons of tests
and estimates

Abstract The efficiency and robustness properties of the tests and estimates are
discussed. The estimates (the mean vector, the spatial median, and the spatial
Hodges-Lehmann estimate) are compared using their limiting covariance matrices.
The Pitman asymptotic relative efficiencies (ARE) of the spatial sign and spatial signed-rank tests with respect to Hotelling's T² are considered in the multivariate t distribution case. The tests using inner and outer standardizations are compared
as well. Simulation studies and some analyses of real datasets are used to illustrate
the difference between the estimates and between the tests.

8.1 Asymptotic relative efficiencies

In this section we first consider the limiting Pitman efficiencies of the spatial sign and signed-rank tests with respect to the classical Hotelling's T²-test in the one-sample
location case. In this comparison we assume that Y is a random sample from a
symmetrical distribution around μ with a p-variate density function f (y − μ ). Here
f (y) is a density function symmetrical around the origin; that is, f (−y) = f (y). We
wish to test the null hypothesis

H0 : μ = 0.

We write L(y) = −∇ log f(y) for the optimal location score function. We also assume that the Fisher information matrix I = E{L(y_i)L(y_i)′} is bounded.

Consider the score test statistics of the general form

T = T(Y) = AVE{T(yi )}

for a p-vector valued function T(y). In the one-sample location case, it is natural
to assume that T is odd so that E{T(yi )} = 0. It is then well known that, if one is
interested in the high efficiency of the test, the best choice for the score function

T(y) is the optimal location score function L(y). Write

L = L(Y) = AVE{L(yi )}

for this optimal test statistic. Also recall that the identity score function, T(y) = y,
yields a test that is asymptotically equivalent with Hotelling’s T 2 and optimal under
multivariate normality.

Using the multivariate central limit theorem we get the following lemma.

Lemma 8.1. Assume that

A = E{T(y_i)L(y_i)′} and B = E{T(y_i)T(y_i)′}

exist and are bounded. Then


n^{1/2}(T′, L′)′ →_d N_{2p}( 0, [ B  A ; A′  I ] ).

Theorem 8.1. Assume that the alternative sequences of the form H_n : f(y − n^{−1/2}δ) are contiguous, satisfying, under H_0,

∑_{i=1}^{n} log[ f(y_i − n^{−1/2}δ) / f(y_i) ] = n^{1/2}L′δ − (1/2)δ′Iδ + o_P(1).

Then, under the alternative sequences H_n, the limiting distribution of the test statistic n^{1/2}T is a p-variate normal distribution with mean vector Aδ and covariance matrix B.

Proof. See Möttönen et al. (1997).

Corollary 8.1. Under the sequence of contiguous alternatives the limiting distribution of the squared test statistic Q² = nT′B^{−1}T is a noncentral chi-square distribution with p degrees of freedom and noncentrality parameter δ′A′B^{−1}Aδ.

In the case of the null distribution, we could show (using Slutsky's theorem) that

nT′B̂^{−1}T − nT′B^{−1}T →_P 0.

This is then true for the contiguous alternative sequences as well, and the limiting noncentral χ²_p distribution with noncentrality parameter δ′A′B^{−1}Aδ holds also for the tests where the true value B is replaced by its convergent estimate B̂.

As all the test statistics have limiting distributions of the same type, χ²_p, the Pitman asymptotic relative efficiencies (ARE) of the multivariate sign test and the multivariate signed-rank test relative to Hotelling's T² are simply the ratios of the noncentrality parameters,

ARE = (δ′A′B^{−1}Aδ) / (δ′Σ^{−1}δ).

In the following we compare the efficiencies of the tests in the case of a spherically symmetric distribution F. Assume that F is the distribution of y and write, as before, r = |y| and u = |y|^{−1}y. In the spherically symmetric case the Pitman ARE of the spatial sign test with respect to Hotelling's T² is then simply

ARE₁ = ((p − 1)/p)² E{r²} E²{r^{−1}}.
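As a quick numerical check (a small sketch, not part of the book's code), the normal-case entries of Table 8.1 can be reproduced from closed-form moments of r, since r² has a χ²_p distribution there and E(r^k) = 2^{k/2}Γ((p + k)/2)/Γ(p/2):

> are1.normal <- function(p) {
    Erinv <- gamma((p - 1) / 2) / (sqrt(2) * gamma(p / 2))  # E(1/r)
    ((p - 1) / p)^2 * p * Erinv^2                           # E(r^2) = p
  }
> round(sapply(c(2, 4, 10), are1.normal), 2)
[1] 0.78 0.88 0.95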

The asymptotic relative efficiencies when the underlying population is a multivariate spherical t distribution were derived by Möttönen et al. (1997). The optimal score function of the p-variate t distribution with ν degrees of freedom, t_{ν,p}, is

L(y) = ((ν + p)/(ν + y′y)) y.

The theoretical signed-rank function Q(y) = q(r)u in the multivariate normal and in the t distribution cases is given in Examples 4.3 and 4.4. The Pitman efficiency of the spatial signed-rank test with respect to Hotelling's test is then

ARE₂ = (ν(ν + p)² / (p(ν − 2))) E²{ q(r) r/(ν + r²) } [E{q²(r)}]^{−1},

where r²/p has an F(p, ν) distribution. If the observations come from a multivariate normal distribution N_p(0, I), then we get

ARE₂ = (1/p) E²{q(r) r} [E{q²(r)}]^{−1},

where now r² has a χ²_p distribution. See Möttönen et al. (1997).

The efficiencies in the multivariate t distribution case with some choices of ν are displayed in Table 8.1. We see that as the dimension p increases and as the distribution gets heavier tailed (ν gets smaller), the performances of the spatial sign test and the spatial signed-rank test improve relative to T². The sign test and the signed-rank test are clearly better than T² in heavy-tailed cases. For high dimensions and very heavy tails, the sign test is the most efficient one. Note that ν = ∞ is the multivariate normal case.

Table 8.1 Asymptotic relative efficiencies of the sign test and the signed-rank test relative to Hotelling's T² under p-variate t distributions with ν degrees of freedom for selected values of p and ν.

                    Sign test              Signed-rank test
dimension p     ν = 3  ν = 6  ν = ∞      ν = 3  ν = 6  ν = ∞
     1           1.62   0.88   0.64       1.90   1.16   0.95
     2           2.00   1.08   0.78       1.95   1.19   0.97
     4           2.25   1.22   0.88       2.02   1.21   0.98
    10           2.42   1.31   0.95       2.09   1.22   0.99

In the comparison of the asymptotic efficiencies of the estimates of the location center μ, we first note that all the estimates μ̂ considered here (the mean vector, the
spatial median and the spatial Hodges-Lehmann estimate) are root-n consistent and

n^{1/2}(μ̂ − μ) →_d N_p(0, A^{−1}BA^{−1}),

where, as before,

A = E{T(y_i)L(y_i)′} and B = E{T(y_i)T(y_i)′}

depend on the chosen score T(y) and the distribution F. The comparison of the
estimates is then based on the asymptotic covariance matrix A−1 BA−1 and possible
global measures of variation are, for example, the geometric mean or the arithmetic mean of the eigenvalues, that is,

det(A^{−1}BA^{−1})^{1/p}  or  tr(A^{−1}BA^{−1})/p.

The former is more natural as it is invariant under affine transformations of the original observations. Note also that the volume of the approximate confidence ellipsoid is proportional to det(A^{−1}BA^{−1}). In the case of elliptically symmetric distributions, it is then enough to consider the spherical cases only and, when comparing two estimates, the ratio of the geometric means of the eigenvalues is the same as the asymptotic relative efficiency of the corresponding tests. The asymptotic efficiencies listed in Table 8.1, for example, then also hold true for the corresponding estimates.
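Both summaries are immediate to compute for a given limiting covariance matrix (a small sketch; gvar is an illustrative helper, not an MNM function):

> gvar <- function(V) {
    p <- nrow(V)
    c(geometric = det(V)^(1/p),     # det(V)^{1/p}
      arithmetic = sum(diag(V))/p)  # tr(V)/p
  }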

8.2 Finite sample comparisons

Example 8.1. Estimates and confidence ellipsoids for data sets with outliers.
In our first example we consider again the 3-variate and bivariate datasets with measurements E-N, S-N, and W-N and S-N and W-E, respectively. We use the estimates with inner standardization in the comparisons. Figure 8.1 shows how the different estimates and the corresponding confidence ellipsoids change if the first observation in the 3-variate dataset is shifted by (−50, 50, −50)′ (one outlier). The estimates and
the confidence ellipsoids for the original dataset are given in Figure 7.3. Note that
the sample mean and the corresponding confidence ellipsoid react strongly to the
outlying observation. The sample mean moves in the direction of the outlier and the
shape of the ellipsoid is also changing. The R code used for the comparison is as
follows.

> cork_3v_cont <- cork_3v
> cork_3v_cont[1,] <- cork_3v[1,] + c(-50, 50, -50)
>
> est.c <- mv.1sample.est(cork_3v_cont)
> est.sign.i.3v.c <- mv.1sample.est(cork_3v_cont, "s", "i")
> est.signrank.i.3v.c <- mv.1sample.est(cork_3v_cont, "r", "i")
>
> plotMvloc(est.c, est.sign.i.3v.c, est.signrank.i.3v.c,
  X=cork_3v_cont,
  alim="e", color.ell=1:3, lty.ell=1:3, pch.ell=15:17)

[Figure omitted: scatterplot matrix of E_N, S_N, and W_N for the contaminated data with the three estimates and their confidence ellipsoids.]

Fig. 8.1 The confidence ellipsoids for the contaminated 3-variate data.

In Figure 8.2 a similar behavior of the estimates is illustrated for the bivariate data. Now the first observation is shifted by (50, 50)′. The estimates and the confidence ellipsoids for the original dataset are given in Figure 7.4. The R code follows.

> cork_2v_cont <- cork_2v
> cork_2v_cont[1,] <- cork_2v[1,] + c(50, 50)
>
> est.c <- mv.1sample.est(cork_2v_cont)
> est.sign.i.2v.c <- mv.1sample.est(cork_2v_cont, "s", "i")
> est.signrank.i.2v.c <- mv.1sample.est(cork_2v_cont, "r", "i")
>
> plotMvloc(est.c, est.sign.i.2v.c, est.signrank.i.2v.c,
  X=cork_2v_cont,
  color.ell=1:3, lty.ell=1:3, pch.ell=15:17)

[Figure omitted: scatterplot of S_N versus W_E for the contaminated data with the three estimates and their confidence ellipsoids.]

Fig. 8.2 The confidence ellipsoids for the contaminated bivariate data.

Example 8.2. Comparison of the estimates in the multivariate normal and t distribution cases. To illustrate the finite sample efficiencies of the estimates we generated random samples of size n = 100 from a bivariate normal distribution and from a 3-variate t3 distribution. The estimates with corresponding 95% confidence ellipsoids are found in Figures 8.3 and 8.4.

[Figure omitted: scatterplot of var 1 versus var 2 with the three estimates and their 95% confidence ellipsoids.]

Fig. 8.3 A sample from a bivariate normal distribution: estimates with 95% confidence ellipsoids.

[Figure omitted: scatterplot matrix of var 1, var 2, and var 3 with the three estimates and their 95% confidence ellipsoids.]

Fig. 8.4 A sample from a 3-variate t3 distribution: estimates with 95% confidence ellipsoids.

As expected, in the multivariate normal case the accuracies of the sample mean vector and the Hodges-Lehmann estimate are almost the same (the asymptotic relative efficiency of the HL-estimate is close to one), and the spatial median is the poorest in this sense. In the heavy-tailed t3 distribution case, the spatial median has the smallest confidence ellipsoid, and the mean vector is now very poor in its efficiency. These results are in good accordance with the asymptotic efficiencies reported in Table 8.1. The datasets, estimates, and plots were obtained as follows.

> set.seed(1234)
> X.N <- rmvnorm(100,c(0,0))
>
> est1.N <- mv.1sample.est(X.N)
> est2.N <- mv.1sample.est(X.N, "s", "i")
> est3.N <- mv.1sample.est(X.N, "r", "i")
>
> plotMvloc(est1.N, est2.N, est3.N, X.N, color.ell=1:3,
  lty.ell=1:3, pch.ell=15:17)
>
> set.seed(1234)
> X.t3 <- rmvt(100, diag(3), 3)
>
> est1.t3 <- mv.1sample.est(X.t3)
> est2.t3 <- mv.1sample.est(X.t3, "s", "i")
> est3.t3 <- mv.1sample.est(X.t3, "r", "i")
>
> plotMvloc(est1.t3, est2.t3, est3.t3, X.t3, alim="e",
  color.ell=1:3, lty.ell=1:3, pch.ell=15:17)

Example 8.3. Outer standardization or inner standardization? In this small study we consider the effect of standardization. If outer standardization is used, then
the spatial sign test and the spatial signed-rank test are not affine invariant. This
means that the p-value depends on the measurement units used, for example. To show the importance of the inner standardization we generated n = 150 observations from an N₂((0.1, 0)′, I₂) distribution. The null hypothesis to be tested, H₀ : μ = 0, is thus not true. To see the effect of the measurement unit, we then rescaled the first component by multiplying it by the values c = 0.1, 0.3, 0.6, 1, 2, 3, 5 and calculated, in all these seven cases, the p-values coming from Hotelling's T²-test, from the spatial sign test (with inner and outer standardization), and from the spatial signed-rank test (with inner and outer standardization). See Figure 8.5 for the results. The p-values
of the spatial sign test and the spatial signed-rank test with outer standardization
depend strongly on the rescaling constant c. Note that the p-value obtained from the
spatial sign test with outer standardization varies from the smallest p-value to the
largest one with c. The invariant test versions behave as expected: the p-values are
in the order of the asymptotic efficiency of the tests. The R code used is lengthy but
available on request.
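A minimal sketch of the experiment is given below (an assumed reconstruction, since the full code is not printed). As in Example 8.4, mv.1sample.test with its default score gives Hotelling's test, and the values "o" and "i" of stand are assumed to select outer and inner standardization; rmvnorm is from the mvtnorm package.

> set.seed(1)
> X <- rmvnorm(150, c(0.1, 0))
> cs <- c(0.1, 0.3, 0.6, 1, 2, 3, 5)
> pvals <- sapply(cs, function(cc) {
    Xc <- X %*% diag(c(cc, 1))   # rescale the first component by cc
    c(Hotelling  = mv.1sample.test(Xc)$p.value,
      sign.outer = mv.1sample.test(Xc, score = "s", stand = "o")$p.value,
      sign.inner = mv.1sample.test(Xc, score = "s", stand = "i")$p.value,
      rank.outer = mv.1sample.test(Xc, score = "r", stand = "o")$p.value,
      rank.inner = mv.1sample.test(Xc, score = "r", stand = "i")$p.value)
  })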

[Figure omitted: p-values as a function of c for Hotelling's T², the inner and outer spatial sign tests, and the inner and outer spatial signed-rank tests.]

Fig. 8.5 The p-values of the tests as a function of the measurement unit. The second component is multiplied by c.

Example 8.4. Simulation studies to compare the tests. Next we compare the finite sample efficiencies of the three competing tests, Hotelling's T²-test, the spatial sign test, and the spatial signed-rank test. Again, samples of size n = 50 were simulated from a 3-variate standard normal distribution and from a 3-variate t3 distribution with covariance matrices I₃. The powers of the tests for the alternatives μ = (0, 0, 0)′, (0, 0, 0.25)′, (0, 0, 0.50)′, and (0, 0, 0.75)′ were estimated by generating 1000 samples of size n = 50 from each of these distributions. In the tests we used the asymptotic critical value χ²_{p,0.95}. The R code for the simulation in the 3-variate normal case with the results follows.

> # Simulation in the normal case
>
> set.seed(1)
> Hot.N.0.00 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0)))$p.value)
> Hot.N.0.25 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.25)))$p.value)
> Hot.N.0.50 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.50)))$p.value)
> Hot.N.0.75 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.75)))$p.value)
>
> set.seed(1)
> Sign.N.0.00 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0)),score="s",stand="i")$p.value)
> Sign.N.0.25 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.25)),score="s",stand="i")$p.value)
> Sign.N.0.50 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.50)),score="s",stand="i")$p.value)
> Sign.N.0.75 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.75)),score="s",stand="i")$p.value)
>
> set.seed(1)
> Rank.N.0.00 <- replicate(1000, mv.1sample.test(rmvnorm
(50, c(0,0,0)),score="r",stand="i")$p.value)
> Rank.N.0.25 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.25)),score="r",stand="i")$p.value)
> Rank.N.0.50 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.50)),score="r",stand="i")$p.value)
> Rank.N.0.75 <- replicate(1000, mv.1sample.test
(rmvnorm(50, c(0,0,0.75)),score="r",stand="i")$p.value)
>
> power.Hot.N <- rowMeans(rbind(Hot.N.0.00, Hot.N.0.25,
Hot.N.0.50, Hot.N.0.75) <= 0.05)
> power.Sign.N <- rowMeans(rbind(Sign.N.0.00, Sign.N.0.25,
Sign.N.0.50, Sign.N.0.75) <= 0.05)
> power.Rank.N <- rowMeans(rbind(Rank.N.0.00, Rank.N.0.25,
Rank.N.0.50, Rank.N.0.75) <= 0.05)
>
> res.N <- cbind(delta= seq(0,0.75,0.25), power.Hot.N,
power.Sign.N, power.Rank.N)
> rownames(res.N) <- NULL
> res.N
delta power.Hot.N power.Sign.N power.Rank.N
[1,] 0.00 0.058 0.040 0.035
[2,] 0.25 0.323 0.235 0.251
[3,] 0.50 0.873 0.747 0.818
[4,] 0.75 0.996 0.984 0.992
>

First note that the spatial sign test and the spatial signed-rank test seem too conservative; the true rejection probability seems to be smaller than 0.05. Therefore the permutation versions of the tests should be used for small sample sizes. The Hotelling test is naturally the best one in this case. Next, the results in the t3 distribution case follow. Again, the spatial sign test is the most efficient one in this case. This is in agreement with the asymptotic relative efficiencies of the tests.
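The t3 samples can be generated analogously (a sketch; the book's code for this case is not shown). Note that the sigma argument of rmvt is the scale matrix of the t distribution, so with df = 3 the covariance of the generated data is 3·sigma:

> set.seed(1)
> Hot.T.0.00 <- replicate(1000, mv.1sample.test(
    rmvt(50, diag(3), 3))$p.value)
> Hot.T.0.25 <- replicate(1000, mv.1sample.test(
    sweep(rmvt(50, diag(3), 3), 2, c(0, 0, 0.25), "+"))$p.value)
> # ... and similarly for the other shifts and for the sign and
> # signed-rank tests with score = "s" or "r" and stand = "i".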

> res.T
delta power.Hot.T power.Sign.T power.Rank.T
[1,] 0.00 0.050 0.039 0.031
[2,] 0.25 0.190 0.215 0.204
[3,] 0.50 0.511 0.651 0.581
[4,] 0.75 0.835 0.958 0.930
>
Chapter 9
One-sample problem: Inference for shape

Abstract The one-sample multivariate shape problem is considered. The shape matrix is a scatter matrix rescaled or normalized in a certain way. The procedures here are based on multivariate spatial signs and spatial ranks. Tests and estimates based on the spatial sign covariance matrix and on spatial rank covariance matrices of different types are considered. The asymptotic efficiencies show a good performance of these methods, particularly for heavy-tailed distributions.

9.1 The estimation and testing problem

We consider the one-sample case where Y = (y_1, y_2, ..., y_n)′ is a random sample from a symmetric distribution. We again assume that the observations are generated by

y_i = μ + Ωε_i, i = 1, ..., n,

where the ε_i are centered and standardized residuals with cumulative distribution function F(ε) and density function f(ε). Different assumptions on the distribution of ε_i yield different parametric, nonparametric, and semiparametric models. Parameter μ is the unknown location center, and Σ = ΩΩ′ the unknown scatter matrix.

In previous chapters we tested the null hypothesis H₀ : μ = 0 and estimated the unknown μ. The tests and estimates were based on different score functions (identity, spatial sign, and spatial signed-rank). The inner standardization to attain the
tity, spatial sign, and spatial signed-rank). The inner standardization to attain the
affine invariance of the location tests or the affine equivariance of the location es-
timates also yielded the corresponding scatter matrix or shape matrix estimates.
In this chapter we compare the shape matrix estimates and the corresponding tests
obtained in this way. The proofs of the results here can mostly be found in Sirkiä
et al. (2008).


We recall that assumptions on the distribution of ε_i were needed to fix the parameters μ and Σ in a natural way. In all symmetric models (A0)–(A4) parameter μ is well defined as the unique symmetry center of the distribution of y_i. The scatter parameter Σ is uniquely defined only in the multivariate normal model (A0), and defined up to a constant in the elliptic model (A1) as well as in the location-scatter model (A2). We refer back to Chapter 2 for a discussion of the symmetric models.

In this chapter we consider the inference tools for Σ in models (A0), (A1), and (A2), and the main focus is on procedures based on the spatial signs and ranks. The scatter parameter Σ is assumed to be nonsingular, and it is decomposed, as in Section 2.2, into two parts by Σ = σ²Λ, where σ² = σ²(Σ) > 0 is a scalar-valued scale parameter and Λ = σ^{−2}Σ is a matrix-valued shape parameter. The scale functional σ²(Σ) is supposed to satisfy

σ²(I_p) = 1 and σ²(cΣ) = cσ²(Σ).

In the literature, the scale σ² = σ²(Σ) has been defined, for example, as

Σ₁₁,  tr(Σ)/p,  p/tr(Σ^{−1}),  or  det(Σ)^{1/p}.

The shape matrix Λ can be seen as a normalized version of the scatter matrix Σ and is a well-defined parameter in models (A0), (A1), and (A2). See also Paindaveine (2008). We wish to test the null hypothesis of sphericity, H₀ : Λ = I_p, and to estimate the unknown value of Λ. This is not a restriction: if one is interested in testing the null hypothesis H₀ : Λ = Λ₀, then one can first transform y_i → Λ₀^{−1/2}y_i and then apply the test to the transformed observations.

In many classical problems in multivariate analysis it is sufficient to base the analysis on an estimate of Λ only. The applications include the (robust) estimation of the correlation matrix, principal component analysis (PCA), canonical correlation analysis (CCA), and multivariate regression analysis, among others.

Most of the results here are given under the assumption of elliptical symmetry. Recall that a random variable y_i is elliptically symmetric if ε_i is spherically symmetric. The density function of y_i is then of the form

det(Σ)^{−1/2} f( Σ^{−1/2}(y − μ) ),

where

f(ε) = exp(−ρ(|ε|))
with some function ρ . Then parameter μ is the symmetry center of the distribution
of yi , and parameter Σ is a positive definite symmetric p × p scatter matrix. It is easy
to see that the shape matrix estimate

Λ̂ = σ 2 (S)−1 S

based on any consistent scatter matrix S is a consistent estimate of the corresponding population quantity Λ.

9.2 Important matrix tools

Here we recall some matrix tools introduced and already used in Section 3.3. See also Appendix A. As before, K_{p,p} is the commutation matrix, that is, the p² × p² block matrix whose (i, j)-block is the p × p matrix with one at entry (j, i) and zeros elsewhere, and we write J_{p,p} = vec(I_p)vec(I_p)′. The matrices K_{p,p} and J_{p,p} have the following interesting properties:

K_{p,p} vec(A) = vec(A′) and J_{p,p} vec(A) = tr(A) vec(I_p).

The matrix

C_{p,p} = (1/2)(I_{p²} + K_{p,p}) − (1/p) J_{p,p}

projects a vectorized matrix vec(A) onto the space of symmetric and centered vectorized matrices. (Recall that C_{p,p} = P₁ + P₂, where P₁ and P₂ are the projections introduced in Section 3.3.) The tests and estimates for the shape parameter are based on the squared norm of such a projection,

Q²(A) = |C_{p,p} vec(A)|²,

which is proportional to the variance of the eigenvalues of a symmetrized version of A. It is easy to see that, for symmetric positive definite p × p matrices A,

Q²(A) = 0 ⇔ A ∝ I_p.

In the following, we also often need the notation

C_{p,p}(V) = (1/2)(I_{p²} + K_{p,p})(V ⊗ V) − (1/p) vec(V)vec(V)′.

Clearly C_{p,p}(I_p) = C_{p,p}.
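These matrices are straightforward to construct numerically. The following base R helpers (a sketch, not part of the MNM package) are reused in later snippets of this chapter:

> Kpp <- function(p) {  # commutation matrix: Kpp(p) %*% c(A) = c(t(A))
    K <- matrix(0, p^2, p^2)
    for (i in 1:p) for (j in 1:p)
      K[(i - 1) * p + j, (j - 1) * p + i] <- 1
    K
  }
> Cpp <- function(p)    # the projection matrix C_{p,p}
    (diag(p^2) + Kpp(p)) / 2 - tcrossprod(c(diag(p))) / p
> Q2 <- function(A) sum((Cpp(nrow(A)) %*% c(A))^2)  # |C_{p,p} vec(A)|^2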

Finally, if A is a nonnegative definite symmetric p × p matrix with r positive eigenvalues λ₁ ≥ ··· ≥ λ_r > 0, p ≥ r ≥ 1, and if o_1, ..., o_r are the corresponding eigenvectors, then the Moore-Penrose inverse of the matrix

A = ∑_{i=1}^{r} λ_i o_i o_i′

is

A^− = ∑_{i=1}^{r} λ_i^{−1} o_i o_i′.

9.3 The general strategy for estimation and testing

A general idea to construct tests and estimates for location was to use a p-vector
valued score function T(y) yielding individual scores Ti = T(yi ), i = 1, ..., n. To
attain affine equivariance/invariance of the location procedures we used either
• Inner standardization of the scores: Find a transformation matrix S^{−1/2} such that, if T̂_i = T(S^{−1/2}y_i), then

p · AVE{T̂_iT̂_i′} = AVE{T̂_i′T̂_i} I_p;

or
• Inner centering and standardization of the scores: Find a shift vector μ̂ and a transformation matrix S^{−1/2} such that, if T̂_i = T(S^{−1/2}(y_i − μ̂)), then

AVE{T̂_i} = 0 and p · AVE{T̂_iT̂_i′} = AVE{T̂_i′T̂_i} I_p.

In the first case, depending on the chosen score, one gets a scatter or shape matrix
estimate S = S(Y) with respect to the origin. In the second case, simultaneous es-
timates of location and scatter, μ̂ and S, are obtained. Of course one should check
separately in each case whether the estimates really exist for a data set at hand.
Also it is not at all clear whether the estimates using different scores estimate the
same population quantity. This did not present a problem in the location problem
as the transformation S−1/2 was seen there only as a natural tool to attain affine
invariance/equivariance.

The algorithm for the shape matrix estimate S = S(Y) with respect to the origin then uses the following two steps.

1. T̂_i ← T(S^{−1/2}y_i), i = 1, ..., n;  T̂ ← (T̂_1, ..., T̂_n)′.
2. S ← (p / tr(T̂′T̂)) S^{1/2} T̂′T̂ S^{1/2}.
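In R the two steps can be written directly (a sketch that assumes the iteration converges; mat.sqrt and shape.iter are illustrative helpers, not MNM functions). With the spatial sign score, score = function(y) y / sqrt(sum(y^2)), this produces a Tyler-type shape estimate.

> mat.sqrt <- function(S) {  # symmetric square root of S
    e <- eigen(S, symmetric = TRUE)
    e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)
  }
> shape.iter <- function(X, score, eps = 1e-6, maxit = 100) {
    p <- ncol(X)
    S <- diag(p)
    for (k in 1:maxit) {
      S.old <- S
      W <- mat.sqrt(S)
      Th <- t(apply(X %*% solve(W), 1, score))  # rows are T(S^{-1/2} y_i)
      S <- (p / sum(Th^2)) * W %*% crossprod(Th) %*% W
      if (max(abs(S - S.old)) < eps) break
    }
    S
  }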

The test statistic for testing H₀ : Λ = I_p is simply

Q²(n^{−1}T′T) = |C_{p,p} vec(n^{−1}T′T)|².

Note that in our approach n^{−1}T′T is COV, UCOV, TCOV, or RCOV depending on which score function is chosen.

Several approaches based on the regular covariance matrix for testing the shape in the multivariate normal and elliptic cases can be found in the literature. These tests are thus based on the identity score T(y) = y and on the regular covariance matrix. Mauchly (1940) showed that in the multivariate normal distribution case the likelihood ratio test for testing sphericity, that is, the null hypothesis H₀ : Λ = I_p, is given by

L = ( det(COV) / [tr(COV)/p]^p )^{n/2},

where COV = COV(Y) is the regular sample covariance matrix. Note that L is essentially the ratio of the two scale parameters det(COV)^{1/p} and tr(COV)/p. Under the null hypothesis, −2 log L ∼ χ²_{(p+2)(p−1)/2} asymptotically. Muirhead and Waternaux (1980) showed that the test based on L may also be used to test sphericity under elliptical models with finite fourth moments. Later, Tyler (1983) obtained a robust version of the likelihood ratio test by replacing the sample covariance matrix with a robust scatter matrix estimator.

John (1971) and John (1972) also considered the testing problem in the normal distribution case. He showed that the test statistic

Q²_J = (np²/2) |COV/tr(COV) − (1/p)I_p|² = (np²/2) Q²( COV/tr(COV) )

yields the locally most powerful invariant test for sphericity under the multivariate normality assumption. This test is, however, valid only under the multivariate normality assumption. In the wider elliptical model one can use a slight modification of John's test, which remains asymptotically valid under elliptical distributions but of course needs the assumption of finite fourth moments. The modified John's test is defined as

Q²_J∗ = (np² / (2(1 + κ_F))) Q²( COV/tr(COV) ),

where κ_F is the value of the classical kurtosis measure based on the standardized fourth moment of the marginal distribution, that is,

κ_F = E(ε⁴_ij) / (3 E²(ε²_ij)) − 1.

In the multivariate normal case κ_F = 0. In practice, κ_F must be replaced by its estimate, the corresponding sample statistic. See also Muirhead and Waternaux (1980) and Tyler (1982).

9.4 Test and estimate based on UCOV

Assume that Y = (y_1, ..., y_n)′ is a random sample from an elliptically symmetric distribution with symmetry center μ = 0 and shape parameter Λ. We wish to test the null hypothesis

H₀ : Λ = I_p.
The null hypothesis then says that the observations are coming from a spherical
distribution. In efficiency studies we consider contiguous alternative sequences

Hn : Λ ∝ I p + n−1/2D,

where D is a symmetric matrix. Note that D fixes the “direction” for the alternative
sequence.

The spatial sign covariance matrix is defined as

UCOV = AVE{ U_iU_i′ }.

Under the null hypothesis,

E(vec(UCOV)) = (1/p) vec(I_p) and COV(vec(UCOV)) = (τ/n) C_{p,p},

where

τ = 2/(p(p + 2)),

and therefore also

E(C_{p,p} vec(UCOV)) = 0 and COV(C_{p,p} vec(UCOV)) = (τ/n) C_{p,p}.

(Recall that C_{p,p} is a projection matrix.)

The test statistic is proportional to the variance of the eigenvalues of the spatial sign covariance matrix.

Definition 9.1. The spatial sign test statistic is defined as

Q² = Q²(UCOV) = |C_{p,p} vec(UCOV)|².

Recall that the value of Q2 (S) is equal to zero if and only if S ∝ I p . It is remark-
able that the finite sample (and limiting) null distribution of Q2 is the same for all
spherical distributions as the sign covariance matrix UCOV depends on the observa-
tions only through their direction vectors. The limiting distribution of Q2 under the
null hypothesis as well as under the alternative sequence is given by the following.
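A sketch of the statistic in R, using the Q2 helper of Section 9.2 (the packaged alternative is mv.shape.test(X, score = "si"), used in the examples below):

> ucov <- function(X) {
    U <- X / sqrt(rowSums(X^2))   # spatial signs of the rows
    crossprod(U) / nrow(X)        # AVE{ U_i U_i' }
  }
> # Under H0, (n / tau) * Q2(ucov(X)) with tau = 2 / (p * (p + 2)) is
> # approximately chi-square with (p + 2)(p - 1)/2 degrees of freedom.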

Theorem 9.1. Under the alternative sequence H_n,

(n/τ) Q² →_d χ²_{(p+2)(p−1)/2}( Q²(D) / (τ(p + 2)²) ).

In the wider model (A3), where the shape parameter Λ is still well defined, the covariance matrix of vec(UCOV) can be estimated by

ĈOV(vec(UCOV)) = (1/n) [ AVE{ U_iU_i′ ⊗ U_iU_i′ } − vec(UCOV)vec(UCOV)′ ].

In the elliptic case n ĈOV(vec(UCOV)) →_P τ C_{p,p}, and the statistic (which is valid in the wider model)

(C_{p,p} vec(UCOV))′ [ĈOV(vec(UCOV))]^− (C_{p,p} vec(UCOV))

is asymptotically equivalent to (n/τ)Q².

Next we introduce the shape matrix estimate S = S(Y) corresponding to the spatial sign score and give its limiting distribution in the elliptic case. This estimate was already used to standardize the observations in the one-sample location testing problem.

Definition 9.2. The Tyler shape estimate S based on spatial signs is the matrix that solves

UCOV(YS^{−1/2}) = (1/p) I_p.   (9.1)
The estimate was given in Tyler (1987) where the limiting distribution is also
found.

Theorem 9.2. Under elliptical symmetry with shape parameter Λ, the limiting distribution of the shape estimate S is given by

n^{1/2} vec(S − Λ) →_d N_{p²}( 0, (p + 2)²τ C_{p,p}(Λ) ).

Here Λ and S are normalized so that det(Λ) = det(S) = 1.

It is remarkable that in the elliptic case this estimate is distribution-free.

The case of unknown location μ is also considered in Tyler (1987). It is, for example, possible to replace μ with a root-n consistent estimate μ̂ without affecting the asymptotic properties of UCOV or S. As mentioned before, Hettmansperger and Randles (2002) propose a simultaneous estimation of the multivariate median μ and a shape matrix Λ.

9.5 Test and estimates based on TCOV

For the null hypothesis H₀ : Λ = I_p, the multivariate Kendall's tau-type rank test statistic TCOV = TCOV(Y) is constructed in exactly the same way as the sign test statistic but from the pairwise differences. We denote the pairwise differences by y_ij = y_i − y_j and their spatial signs by U_ij = U(y_ij), 1 ≤ i, j ≤ n.

Definition 9.3. The Kendall's tau covariance matrix is defined as

TCOV = AVE_{i<j}{ U_ijU_ij′ }.
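A direct O(n²) computation of TCOV is straightforward (a sketch in base R):

> tcov <- function(X) {
    n <- nrow(X)
    S <- matrix(0, ncol(X), ncol(X))
    for (i in 1:(n - 1)) for (j in (i + 1):n) {
      u <- X[i, ] - X[j, ]
      u <- u / sqrt(sum(u^2))     # spatial sign of the difference
      S <- S + tcrossprod(u)
    }
    S / choose(n, 2)              # average over the pairs i < j
  }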

This matrix is introduced and studied in Visuri et al. (2000). Because vec(TCOV) is a U-statistic with the bounded vector-valued kernel

h(y_i, y_j) = U_ij ⊗ U_ij,

the limiting multinormality easily follows, with

E(vec(TCOV)) = p^{−1} vec(I_p) and COV(vec(TCOV)) = (τ_F/n) C_{p,p} + o(1/n)

with some τ_F > 0. In the general case, the covariance may be estimated by

ĈOV(vec(TCOV)) = (4/n) [ AVE{ U_ijU_ik′ ⊗ U_ijU_ik′ } − vec(TCOV)vec(TCOV)′ ].
The test statistic based on Kendall's tau covariance matrix is defined as follows.

Definition 9.4. The Kendall's tau test statistic is defined as

Q² = Q²(TCOV) = |C_{p,p} vec(TCOV)|².

Again, it holds that tr(TCOV) = 1, and so Q² = 0 only when TCOV = (1/p)I_p. Note that because the test statistic is based on pairwise differences, the parameter μ need not be known.
Theorem 9.3. Under the alternative sequence H_n, with a constant τ_F > 0 depending on F,

(n/τ_F) Q² →_d χ²_{(p+2)(p−1)/2}( Q²(D) / ((p + 2)²τ_F) ).
In practice, the distribution of the observations, or more specifically the distribution of the lengths of the observations, is not specified, and the coefficient τ_F is not known and has to be estimated. See Sirkiä et al. (2008). Alternatively, as n ĈOV(vec(TCOV)) →_P τ_F C_{p,p} in the spherical case, one may use the statistic

(C_{p,p} vec(TCOV))′ [ĈOV(vec(TCOV))]^− (C_{p,p} vec(TCOV)).

The Kendall’s tau test statistic naturally leads to a companion estimate of shape.

Definition 9.5. The Dümbgen shape estimate S, based on the spatial signs of pairwise differences, is the one that solves

TCOV(YS^{−1/2}) = (1/p) I_p.
The estimator was first introduced by Dümbgen (1998) and further studied by Sirkiä et al. (2007) as a member of a class of symmetrized M-estimates of scatter. The algorithm to calculate S is similar to that for calculating Tyler's estimate. The breakdown properties were considered in Dümbgen and Tyler (2005). The Dümbgen estimate is again affine equivariant in the sense that S(YA′) ∝ A S(Y) A′.

The limiting distribution in the elliptic case is given in the following theorem.

Theorem 9.4. At an elliptical distribution with shape parameter Λ, the limiting distribution of the shape estimate S is given by

n^{1/2} vec(S − Λ) →_d N_{p²}( 0, (p + 2)²τ_F C_{p,p}(Λ) ).

9.6 Tests and estimates based on RCOV

The Spearman's rho-type test statistic for the null hypothesis H₀ : Λ = I_p is constructed in the same way as the spatial sign test statistic but using the spatial rank covariance matrix RCOV = RCOV(Y) instead of the spatial sign covariance matrix. Denote again y_ij = y_i − y_j and U_ij = U(y_ij).

Definition 9.6. The spatial rank covariance matrix is defined as

RCOV = AVE{ R_iR_i′ } = AVE{ U_ijU_ik′ }.
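A sketch in base R, using the convention that the spatial rank R_i averages U(y_i − y_j) over all j, with the i = j term equal to zero:

> spatial.ranks <- function(X) {
    n <- nrow(X)
    t(sapply(1:n, function(i) {
      D <- matrix(X[i, ], n, ncol(X), byrow = TRUE) - X  # y_i - y_j
      D <- D[-i, , drop = FALSE]                  # drop the zero row
      colSums(D / sqrt(rowSums(D^2))) / n         # R_i
    }))
  }
> rcov <- function(X) crossprod(spatial.ranks(X)) / nrow(X)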

This matrix is considered in Marden (1999b) and Visuri et al. (2000). Now vec(RCOV) is (up to a constant) asymptotically equivalent to a U-statistic with the symmetric kernel

h(y_1, y_2, y_3) = vec( U₁₂U₁₃′ + U₁₃U₁₂′ + U₂₁U₂₃′ + U₂₃U₂₁′ + U₃₁U₃₂′ + U₃₂U₃₁′ ),

covering all six possible permutations of the three arguments. Then, under the null hypothesis,

E(C_{p,p} vec(RCOV)) = 0 and COV(C_{p,p} vec(RCOV)) = (τ_F/n) C_{p,p} + o(1/n),

again with some constant τ_F. In the general case, the covariance may be estimated with

ĈOV(C_{p,p} vec(RCOV)) = (9/n) [ AVE{ h(y_i, y_j, y_k)h(y_i, y_l, y_m)′ } − vec(RCOV)vec(RCOV)′ ].

Definition 9.7. The Spearman's rho test statistic is defined as Q² = Q²(RCOV).

Note that the trace of RCOV is no longer fixed but varies from sample to sample. Moreover, even though the expected value of RCOV under the null hypothesis is always proportional to the identity matrix, the expected value of the trace depends on the distribution. Still, Q² is equal to zero only when RCOV is proportional to the identity matrix. The asymptotic distribution of Q² is given in the following theorem.
Theorem 9.5. Under the alternative sequence H_n,

(n/τ_F) Q² →_d χ²_{(p+2)(p−1)/2}( (c²_F)² Q²(D) / ((p + 2)²τ_F) ),

where τ_F and c²_F depend on the background distribution F.


Again, the coefficient τ_F needs to be estimated when the distribution of the data is not known. Alternatively, as n ĈOV(C_{p,p} vec(RCOV)) →_P τ_F C_{p,p} in the spherical case, one may use the statistic

(C_{p,p} vec(RCOV))′ [ĈOV(C_{p,p} vec(RCOV))]^− (C_{p,p} vec(RCOV)),

which is asymptotically equivalent to (n/τ_F)Q².

As in the previous sections, it is possible to define a shape estimate corresponding to the above test.

Definition 9.8. The shape estimate S = S(Y) based on the rank covariance matrix is the one for which Q²(RCOV(YS^{−1/2})) = 0, that is, for which

RCOV(YS^{−1/2}) ∝ I_p.

Unfortunately, there is no proof of the uniqueness or even the existence of the solution. It seems to us, however, that one can use an iterative algorithm with the steps

S ← S^{1/2} RCOV(YS^{−1/2}) S^{1/2} and S ← (p/tr(S)) S.

In practice the algorithm always seems to yield a unique solution. The following theorem gives the limiting distribution of the shape estimator assuming that it is unique and root-n consistent.

Theorem 9.6. Under the assumptions above, for an elliptical distribution with shape parameter Λ, the limiting distribution of the shape estimate S is given by

n^{1/2} vec(S − Λ) →_d N_{p²}( 0, (p + 2)²τ_F (c²_F)^{−2} C_{p,p}(Λ) ).
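A sketch of the iteration in R, using rcov and mat.sqrt from the earlier sketches (remember that convergence is not guaranteed in theory). MNM's rank.shape, used in Example 9.3 below, computes a shape estimate of this type.

> rank.shape.iter <- function(X, eps = 1e-6, maxit = 100) {
    p <- ncol(X)
    S <- diag(p)
    for (k in 1:maxit) {
      S.old <- S
      W <- mat.sqrt(S)
      S <- W %*% rcov(X %*% solve(W)) %*% W
      S <- p * S / sum(diag(S))   # normalize so that tr(S) = p
      if (max(abs(S - S.old)) < eps) break
    }
    S
  }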

9.7 Limiting efficiencies

We next compare the sphericity tests based on UCOV, TCOV, and RCOV to the classical (modified) John's test. The modified John's test is based on the test statistic

Q²_J = (np² / (2(1 + κ_F))) Q²( COV/tr(COV) ),

where κ_F is the classical kurtosis measure of the marginal distribution. The limiting distribution of the modified John's test under the alternative sequence is derived in Hallin and Paindaveine (2006) and is given in the following theorem.

Theorem 9.7. Under the alternative sequence H_n,

Q²_J →_d χ²_{(p+2)(p−1)/2}( Q²(D) / (2(1 + κ_F)) ).

The limiting distributions of the different test statistics are of the same type; therefore the efficiency comparisons may simply be based on their noncentrality parameters. The Pitman asymptotic relative efficiencies of the tests based on UCOV, TCOV, and RCOV with respect to the modified John's test (based on COV) reduce to

p(1 + κ_F)/(p + 2),   2(1 + κ_F)/((p + 2)²τ_F),   and   2(c²_F)²(1 + κ_F)/((p + 2)²τ_F).

Recall that κ_F, τ_F, and c²_F are constants depending on the underlying distribution. Note that the Pitman AREs give the asymptotic relative efficiencies of the corresponding shape estimates as well.

In Table 9.1, the limiting efficiencies are given under t distributions with some selected dimensions p and degrees of freedom ν, with ν = ∞ again referring to the multivariate normal case. Note that κ_F = 0 for the multivariate normal distribution, and κ_F = 2/(ν − 4) for the multivariate t_ν distribution. Formulas for calculating the c²_F coefficients can be found in Möttönen et al. (1997).

Table 9.1 Asymptotic relative efficiencies of tests (estimates) based on UCOV, TCOV, and RCOV relative to the test based on COV for different t-distribution cases with selected values of dimension p and degrees of freedom ν.

          p = 2                 p = 4                 p = 5
  ν   UCOV TCOV RCOV       UCOV TCOV RCOV       UCOV TCOV RCOV
  5   1.50 2.43 2.42       2.00 2.62 2.56       2.14 2.71 2.63
  8   0.75 1.26 1.25       1.00 1.32 1.30       1.07 1.35 1.31
 15   0.59 1.04 1.04       0.79 1.07 1.06       0.84 1.08 1.07
  ∞   0.50 0.93 0.95       0.67 0.95 0.97       0.71 0.95 0.99

One can see that the
rank-based tests based on TCOV and RCOV behave very similarly and are highly
efficient even in the normal case. The test based on UCOV is less efficient than
the rank-based tests but still outperforms the classical test for heavy-tailed distribu-
tions. Note that the efficiencies increase with dimension. See Sirkiä et al. (2008) for
a more complete discussion and for the finite-sample efficiencies.

9.8 Examples

Example 9.1. Cork boring data: The tests and estimates for shape. To illustrate the estimated shape matrices we plot the corresponding estimates of the 50% tolerance regions. The estimated tolerance regions based on a location estimate T and a shape estimate S are constructed as follows. First calculate the squared Mahalanobis distances based on S and T; that is,

r_i² = |y_i − T|²_S = (y_i − T)′S^{−1}(y_i − T), i = 1, ..., n.

The estimated 50% tolerance region is then the ellipsoid

{ y : |y − T|²_S ≤ Med{r_1², ..., r_n²} }.
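In R the squared distances and the tolerance radius take one line each (a sketch; mahalanobis is a base R function):

> tol.radius2 <- function(X, T, S) {
    r2 <- mahalanobis(X, center = T, cov = S)   # the r_i^2
    median(r2)   # squared radius of the 50% tolerance ellipsoid
  }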

We compare the tolerance ellipsoids for the shape matrices based on COV, UCOV (Tyler's shape), and TCOV (Dümbgen's shape). The corresponding location estimates are the sample mean, the (affine equivariant) spatial median, and the (affine equivariant) Hodges-Lehmann estimate. Note that if we are interested in the shape matrix, only the shape (not the size or the location) of the tolerance ellipsoid is relevant. The shape should be circular or spherical in the case that Λ = I_p. The tolerance ellipsoids for the 3-variate cork boring data are given in Figure 9.1.

The figures are obtained using the following R code.

> data(cork)
> cork_3v <- sweep(cork[,2:4], 1, cork[,1], "-")
> colnames(cork_3v) <- c("E_N", "S_N", "W_N")
>
> EST1 <- list(location = colMeans(cork_3v),
scatter = cov(cork_3v),
est.name = "COV")
>
> HR.cork_3v <- HR.Mest(cork_3v)
> EST2 <- list(location = HR.cork_3v$center,
scatter = HR.cork_3v$scatter,
est.name = "Tyler")
>

[Figure omitted: scatterplot matrix of E_N, S_N, and W_N with 50% tolerance ellipsoids labeled COV, Tyler, and Duembgen.]

Fig. 9.1 The shape matrix estimates for the 3-variate data.

> EST3 <- list(location = mv.1sample.est(cork_3v, score="rank",
  stand = "inner")$location, scatter = duembgen.shape(cork_3v),
  est.name = "Duembgen")
>
> plotShape(EST1, EST2, EST3, cork_3v, lty.ell = 1:3,
  pch.ell = 14:16, level = 0.5)

The ellipsoids do not seem spherical but only the test based on the regular co-
variance matrix gets a small p-value. (The sample size is too small for a reliable
inference on the shape parameter.) The p-values are as follows.

> mv.shape.test(cork_3v)

Mauchly test for sphericity

data: cork_3v
L = 0.0037, df = 5, p-value = 0.04722

> mv.shape.test(cork_3v, score= "si")

Test for sphericity based on UCOV

data: cork_3v
Q2 = 6.023, df = 5, p-value = 0.3040

> mv.shape.test(cork_3v, score= "sy")

Test for sphericity based on TCOV

data: cork_3v
Q2 = 9.232, df = 5, p-value = 0.1002

Consider the bivariate case next. The estimated shape matrices are illustrated
in Figure 9.2. As in the 3-variate case, the shape estimates based on the regular
covariance matrix and TCOV are close to each other. The p-values are now

[Figure omitted: scatterplot of S_N versus W_E with tolerance ellipsoids labeled COV, HR, and Duembgen.]

Fig. 9.2 The shape matrix estimates for the 2-variate data.

> mv.shape.test(cork_2v)

Mauchly test for sphericity

data: cork_2v
L = 0.0748, df = 2, p-value = 0.07478

> mv.shape.test(cork_2v, score= "si")

Test for sphericity based on UCOV

data: cork_2v
Q2 = 0.1885, df = 2, p-value = 0.91

> mv.shape.test(cork_2v, score= "sy")

Test for sphericity based on TCOV

data: cork_2v
Q2 = 2.637, df = 2, p-value = 0.2675

>

Example 9.2. Comparison of the tests and estimates in the t3 case. To illustrate
the finite sample efficiencies of the estimates for a heavy-tailed distribution we gen-
erated a random sample of size n = 150 from a spherical 3-variate t3 distribution.
The null hypothesis H0 : Λ = I p is thus true. The three shape estimates based on
COV, UCOV, and TCOV are illustrated in Figure 9.3. The R code for getting the
figure follows.

> set.seed(1234)
> X<-rmvt(150, diag(3),3)
>
>
> EST1 <- list(location = colMeans(X), scatter = cov(X),
est.name = "COV")
>
> HR.X<-HR.Mest(X)
> EST2 <- list(location = HR.X$center, scatter = HR.X$scatter,
est.name = "Tyler")
>
> EST3 <- list(location = mv.1sample.est(X, score = "rank",
stand = "inner")$location,
scatter = duembgen.shape(X), est.name = "Duembgen")
>
> plotShape(EST1, EST2, EST3, X, lty.ell = 1:3,
pch.ell = 14:16, level = 0.95)

The shape of the regular covariance matrix clearly differs most from the spherical
shape. This can also be seen from the p-values below. The regular covariance matrix
does not seem too reliable in the heavy-tailed distribution case.

[Figure omitted: scatterplot matrix of var 1, var 2, and var 3 with 95% tolerance ellipsoids labeled COV, Tyler, and Duembgen.]

Fig. 9.3 Shape estimates for a sample from a 3-variate t3 distribution.

> mv.shape.test(X)

Mauchly test for sphericity

data: X
L = 0, df = 5, p-value = 2.371e-13

> mv.shape.test(X, score= "si")

Test for sphericity based on UCOV

data: X
Q2 = 2.476, df = 5, p-value = 0.7801

> mv.shape.test(X, score= "sy")

Test for sphericity based on TCOV

data: X
Q2 = 1.323, df = 5, p-value = 0.9326

9.9 Principal component analysis based on spatial signs and ranks

Let again Y = (y_1, ..., y_n)′ be a random sample from a p-variate elliptical distribution with cumulative distribution function F. Write

COV(F) = ODO′

for the eigenvector and eigenvalue decomposition of the covariance matrix. Thus O is the matrix of eigenvectors and D is the diagonal matrix of eigenvalues of COV(F). The orthogonal matrix O is then used to transform the random vector y_i to a new coordinate system,

z_i = O′y_i, i = 1, ..., n.

The components of z_i in this new coordinate system are called the principal components. The principal components are uncorrelated and ordered according to their variances (the diagonal elements of D). In the multivariate normal case the principal components are independent. In principal component analysis (PCA) one wishes to estimate the transformation O to the principal components as well as the variances D. PCA is often used to reduce the dimension of the original vector from p = p₁ + p₂ to p₁, say. If

O = (O₁, O₂),

where O₁ is a p × p₁ matrix and O₂ a p × p₂ matrix, then the original observations

y_i = O₁O₁′y_i + O₂O₂′y_i, i = 1, ..., n,

are approximated by the first part,

ŷ_i = O₁O₁′y_i = o₁z_{i1} + ··· + o_{p₁}z_{ip₁}, i = 1, ..., n.

As shown before in Theorem 4.4, the eigenvalue and eigenvector decompositions of UCOV and TCOV satisfy, for the elliptical distribution F,

UCOV(F) = OD_U O′ and TCOV(F) = OD_T O′

with the same principal component transformation. The same is naturally true for the scatter or shape matrix functionals S that are based on UCOV and TCOV, namely for Tyler's shape estimate and the Dümbgen shape estimate. This means that the eigenvectors of the sample matrices can be used to estimate the unknown population eigenvectors. Locantore et al. (1999) and Marden (1999b) proposed the use of the spatial signs and ranks for a simple robust alternative to classical PCA. See also Visuri et al. (2000) and Croux et al. (2002).
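The plug-in principle is a short wrapper around eigen() (a sketch; centering at the location estimate that accompanies the scatter estimate is assumed):

> pca.scatter <- function(X, mu, S) {
    e <- eigen(S, symmetric = TRUE)   # S = O D O'
    list(loadings = e$vectors,        # estimated eigenvectors O
         scores = sweep(X, 2, mu) %*% e$vectors)
  }
> # e.g., with the Hettmansperger-Randles estimates:
> # HR <- HR.Mest(X); pca.scatter(X, HR$center, HR$scatter)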

All the scatter and shape matrices mentioned above are root-n consistent and have limiting multivariate normal distributions. We next find the limiting distribution of the corresponding sample eigenvector matrix, that is, of an estimate of the principal component transformation. For simplicity we assume that O = I_p and that the eigenvalues listed in D are distinct. (The limiting distribution in the general case is found simply by using the rotation equivariance properties of the estimates.) Then we have the following.

Theorem 9.8. Let S be a positive definite symmetric p × p random matrix such that the limiting distribution of n^{1/2}(S − D) is a p²-variate (singular) normal distribution with zero mean vector. Let S = ÔD̂Ô′ be the eigenvalue and eigenvector decomposition of S. Then the limiting distributions of

n^{1/2} vec(Ô − I_p) and n^{1/2} vec(D̂ − D)

are both multivariate normal and are given by

n^{1/2} vec(S − D) = ((D ⊗ I_p) − (I_p ⊗ D)) n^{1/2} vec(Ô − I_p) + n^{1/2} vec(D̂ − D) + o_P(1).
n vec(S − D) = ((D ⊗ I p ) − (I p ⊗ D)) n vec(Ô − I p ) + n vec(D̂ − D) + oP(1).

If we are interested in the limiting distribution of Ô_ij, we then obtain

n^{1/2} Ŝ_ij = (D_ii − D_jj) n^{1/2} Ô_ij + o_P(1), i ≠ j,

and

n^{1/2}(Ô_ii − 1) = o_P(1), i = 1, ..., p.

The efficiencies of the shape matrices then give the efficiencies for the eigenvectors as well.
as well.

Example 9.3. Principal component analysis on the air pollution dataset. We apply the robust principal component analysis based on Tyler's and Dümbgen's shape matrices to air pollution data from the United States. The dataset contains 6 measurements on 41 cities in the United States. The data were originally collected to build a regression model for pollution measurements (SO2) using these 6 explaining variables. Two of the 6 explaining variables are related to human population (LargeF, Pop); four are climate variables (NegTemp, Wind, AvRain, DaysRain). See Section 3.3 in Everitt (2005) for more details and for the use of PCA on this dataset. For the principal component analysis, we first rescaled all the marginal variables with the median absolute deviation (MAD). The results are then independent of the measurement units used for the original variables. (The PCA is not affine invariant, however.) See Figure 9.4 for the transformed dataset.

The R code for getting the figure follows.
[Figure omitted: scatterplot matrix of NegTemp, LargeF, Pop, Wind, AvRain, and DaysRain.]

Fig. 9.4 Air pollution dataset. The marginal variables are standardized with MAD.

> data(usair)
> pairs(usair)
>
> # the dataset as used in Everitt + giving the variables names
>
> usair2 <- usair[,-1]
> usair2 <- transform(usair2, x1 = -x1)
> colnames(usair2) <- c("NegTemp", "LargeF", "Pop", "Wind",
"AvRain", "DaysRain")
>
> mads <- apply(usair2, 2, mad)
> usair3 <- sweep(usair2,2,mads, "/")
>
> pairs(usair3)
>

The next step is to calculate the three shape matrices: Tyler’s and Dümbgen’s
shape matrices and the one based on the regular covariance matrix. (The corre-
sponding transformations standardize UCOV, TCOV and COV, resp.) As some of
the marginal distributions are strongly skewed, the shape of the regular covariance
matrix clearly differs most from the spherical shape. Unlike the spatial sign- and
rank-based shape matrices, the covariance matrix is very sensitive to heavy tails,
which can be clearly seen from Figure 9.5. One can then expect that the results in
the PCA can be quite different.

[Figure omitted: scatterplot matrix of the six standardized variables with ellipsoids labeled Cov, Tyler's shape matrix, and Duembgen's shape matrix.]

Fig. 9.5 Air pollution dataset: Shape estimates.

>
> COV <- cov(usair3)
> SI <- HR.Mest(usair3)
> Dumb <-duembgen.shape(usair3)
> rank.inner <-rank.shape(usair3)
>
> aff.HL <- mv.1sample.est(usair3, score = "rank",
stand = "inner")$location
>
> classical <- list(location= colMeans(usair3),
scatter= COV / sum(diag(COV)) * 6, est.name="Cov")
> signs.inner <- list(location= SI$center,
scatter= SI$scatter / sum(diag(SI$scatter)) * 6,
est.name="Tyler’s shape matrix")
> symm.signs.inner <- list(location=aff.HL,
scatter= Dumb / sum(diag(Dumb)) * 6 ,
est.name="Duembgen’s shape matrix")
> ranks.inner <- list(location= aff.HL, scatter=
9.9 Principal component analysis based on spatial signs and ranks 127

rank.inner / sum(diag(rank.inner)) * 6,
est.name="Rank shape matrix")
>
>
> plotShape(classical, signs.inner, symm.signs.inner,
  lty.ell= 1:3, pch.ell=15:17,
  x.legend= -3, y.legend= -1.2,
  labels=colnames(usair3), cex.labels = 1.5)

In the following, the R function mvPCA with the three scores is applied to obtain the results of the corresponding PCAs. We first compare the similarity of the results by calculating the correlations between the principal components coming from the different approaches. The correlations show some similarity between the rank- and sign-based solutions, whereas the regular PCA solution differs from the others.

>
> PCA.identity <- mvPCA(usair3, score = "identity")
> PCA.signs.inner <- mvPCA(usair3, score = "sign",
estimate= "inner")
> PCA.symm.signs.inner <- mvPCA(usair3, score = "sym",
estimate= "inner")
> round(cor(PCA.signs.inner$scores,PCA.identity$scores),2)
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
Comp.1 0.81 -0.34 -0.47 0.09 0.01 0.00
Comp.2 -0.91 -0.16 -0.37 -0.07 -0.01 0.01
Comp.3 -0.41 -0.71 0.36 0.45 -0.04 0.00
Comp.4 0.22 -0.75 0.30 -0.54 -0.01 -0.01
Comp.5 -0.58 -0.22 0.14 0.08 0.74 -0.20
Comp.6 0.14 -0.14 0.12 -0.03 0.38 0.89
> round(cor(PCA.symm.signs.inner$scores,PCA.identity$scores),2)
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
Comp.1 0.99 -0.10 -0.12 0.04 0.01 0.00
Comp.2 -0.45 -0.68 -0.57 0.03 0.00 0.00
Comp.3 0.06 -0.73 0.67 0.16 -0.01 0.00
Comp.4 -0.23 0.34 -0.16 0.90 -0.01 0.00
Comp.5 -0.32 -0.11 0.07 0.04 0.93 -0.14
Comp.6 0.17 -0.04 0.05 0.00 0.23 0.96
> round(cor(PCA.symm.signs.inner$scores,
PCA.signs.inner$scores),2)
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
Comp.1 0.89 -0.84 -0.36 0.24 -0.55 0.14
Comp.2 0.14 0.73 0.48 0.23 0.33 -0.04
Comp.3 0.00 -0.19 0.80 0.67 0.22 0.18
Comp.4 -0.14 0.16 0.21 -0.84 0.11 -0.12
Comp.5 -0.24 0.27 0.22 0.00 0.94 0.21
Comp.6 0.12 -0.17 -0.03 0.07 -0.10 0.98

We then compare the results based on the regular shape matrix with those based
on Tyler’s shape matrix. The principal components coming from the regular shape
matrix may now be easier to interpret. The first one is related to the human population, the second one to the rain conditions, the third one to the other climate variables, and the fourth one to the wind. The robust shape matrices cut down the effects of a few outlying observations: the few cities with very high population and manufacturing as well as the few cities with a hot climate and low precipitation. The results coming from the robust PCA as reported below (based on Tyler's shape matrix) therefore differ a lot from the results of the regular PCA. The first principal component may be related to the climate in general, the second one to the human population, the third one to the rain, and the fourth one to the wind.

> summary(PCA.identity, loadings=TRUE)


Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4
Proportion of Variation 0.5527 0.1991 0.1506 0.07555
Cumulative Proportion 0.5527 0.7518 0.9024 0.97792

Comp.5 Comp.6
0.01368 0.008397
0.99160 1.000000

Loadings:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
NegTemp 0.703 0.227 0.556 0.371
LargeF -0.780 0.272 -0.554
Pop -0.604 -0.176 -0.330 0.702
Wind -0.121 0.405 -0.893 -0.115
AvRain -0.755 -0.383 -0.188 0.461 0.188
DaysRain -0.649 0.402 0.327 -0.531 -0.152
> PCA.signs.inner
PCA for usair3 based on Tyler’s shape matrix

Standardized eigenvalues:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
2.4436 1.5546 1.1125 0.6872 0.1193 0.0829

6 variables and 41 observations.


> summary(PCA.signs.inner, loadings=TRUE)
Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4
Proportion of Variation 0.4073 0.2591 0.1854 0.1145
Cumulative Proportion 0.4073 0.6664 0.8518 0.9663

Comp.5 Comp.6
0.01988 0.01382
0.98618 1.00000

Loadings:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
NegTemp -0.453 -0.402 0.353 0.458 0.546
LargeF -0.389 0.564 0.164 -0.164 0.520 -0.454
Pop -0.302 0.602 -0.146 -0.476 0.541
Wind -0.481 -0.342 0.798
AvRain 0.566 0.392 0.153 0.480 0.397 0.338
DaysRain 0.838 0.289 -0.358 -0.286
> plot(PCA.signs.inner)

9.10 Other approaches

Hallin and Paindaveine (2006) proposed test statistics for sphericity based on the spatial sign vectors u_i = |y_i|^{−1}y_i and the ranks of the lengths r_i = |y_i|, i = 1, ..., n. Their test statistics are of the same form as that defined in John (1971), but instead of the sample covariance matrix they use AVE{K(R_i/(n + 1))u_iu_i′}, where K = K_g is a score function corresponding to a spherical density g and R_i denotes the rank of r_i among r_1, ..., r_n, i = 1, ..., n. The Hallin and Paindaveine (2006) tests appear to be valid without any moment assumptions and asymptotically optimal if f = g.
Chapter 10
Multivariate tests of independence

Abstract Multivariate extensions of the quadrant test by Blomqvist (1950) and of the Kendall's tau and Spearman's rho statistics are discussed. Asymptotic theory is given to approximate the null distributions as well as to calculate limiting Pitman efficiencies. The tests are compared to the classical Wilks (1935) test.

10.1 The problem and a general strategy

In the analysis of multivariate data, it is often of interest to find potential relationships among subsets of the variables. In the air pollution data of Example 9.3, for instance, two variables are related to human population and four variables represent climate attributes. It may be of interest to find out whether the two sets of variables are related. This requires a test of independence between pairs of vectors. The vectors may then have different measurement scales and dimensions.

Let (X, Y) be a random sample of size n from a p-variate distribution, where X is an n × r matrix of measurements of r variables and Y is an n × s matrix of measurements of the remaining s variables, r + s = p. As before, we write

X = (x_1, ..., x_n)′ and Y = (y_1, ..., y_n)′.

As (X, Y) is a random sample, the rows are independent. Our null hypothesis of the
independence of the x- and y-variables can be written as

H0 : X and Y are independent.

We wish to use general score functions Tx (x) and Ty (y) in the test construction.
We again use the identity score function, the spatial sign score function, and the
spatial rank score functions so that Tx (x) is an r-vector and Ty (y) is an s-vector,


respectively. As explained in Section 4.1, the observations in X may be replaced
(i) by their outer centered and standardized scores or (ii) by their inner centered
and standardized scores, and similarly and independently with Y. If we use inner
centering and standardization, we transform

    (X, Y) → (T̂X, T̂Y),

where

    T̂′X 1n = T̂′Y 1n = 0

and

    r · T̂′X T̂X = |T̂X|2 Ir and s · T̂′Y T̂Y = |T̂Y|2 Is.

Here, as before,

    |T̂X|2 = tr(T̂′X T̂X) and |T̂Y|2 = tr(T̂′Y T̂Y).

Under general assumptions (which should be checked separately for each score
function) and under the null hypothesis, the limiting distribution of

    Q2 = Q2(X, Y) = nrs |T̂′X T̂Y|2 / ( |T̂X|2 |T̂Y|2 )

is then a chi-square distribution with rs degrees of freedom. As the null hypothesis
says that

    (X, PY) ∼ (X, Y)

for all n × n permutation matrices P, the p-value from a conditionally distribution-
free permutation test is obtained as

    EP[ I( Q2(X, PY) ≥ Q2(X, Y) ) ].

Note also that, as the inner and outer centering and standardization are permutation
invariant,

    Q2(X, PY) = nrs |T̂′X P T̂Y|2 / ( |T̂X|2 |T̂Y|2 ),

so that simply |T̂′X P T̂Y|2 can be used in practice to find the permutation p-value.
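
As a small illustration, the permutation p-value can be computed along the
following lines in R. This is only a sketch under the assumption that TX and TY
already contain the inner centered and standardized scores; the function and
argument names are ours.

perm.pvalue <- function(TX, TY, nrep = 1000) {
  n <- nrow(TX)
  stat <- function(M) sum(crossprod(TX, M)^2)       # |T_X' M|^2
  obs <- stat(TY)                                   # observed |T_X' T_Y|^2
  reps <- replicate(nrep, stat(TY[sample(n), , drop = FALSE]))
  mean(reps >= obs)                                 # E_P[ I(...) ]
}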

10.2 Wilks’ and Pillai’s tests

The classical parametric test due to Wilks (1935) is the likelihood ratio test statistic
in the multivariate normal model and is based on

    W = W(X, Y) = det(COV) / ( det(COV11) det(COV22) ),

using the partitioned covariance matrix

    COV(X, Y) = COV = ( COV11  COV12
                        COV21  COV22 ),

where COV(X) = COV11 is an r × r matrix and COV(Y) = COV22 is an s × s
matrix. The test is optimal under the multivariate normal model. Under H0, with
finite fourth moments, the test statistic −n log W →d χ2rs.

Another classical test for independence is Pillai's trace test statistic (Pillai (1955)),
which uses

    W∗ = W∗(X, Y) = tr( COV11⁻¹ COV12 COV22⁻¹ COV21 ).

Pillai's trace statistic and Wilks' test statistic are asymptotically equivalent in the
sense that if the fourth moments exist and the null hypothesis is true then

    nW∗ + n log W →P 0.

If we follow our general strategy with the identity score function (Tx(x) = x and
Ty(y) = y) then it is easy to see that

    Q2(X, Y) = nW∗(X, Y);

that is, Q2 is the Pillai trace statistic.
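
For illustration, both statistics are easily computed from the partitioned sample
covariance matrix. The sketch below (with our own function name wilks.pillai)
assumes that X and Y are the n × r and n × s data matrices.

wilks.pillai <- function(X, Y) {
  r <- ncol(X); s <- ncol(Y)
  C <- cov(cbind(X, Y))                      # partitioned covariance matrix
  C11 <- C[1:r, 1:r, drop = FALSE]
  C22 <- C[r + (1:s), r + (1:s), drop = FALSE]
  C12 <- C[1:r, r + (1:s), drop = FALSE]
  W <- det(C) / (det(C11) * det(C22))        # Wilks' statistic
  W.star <- sum(diag(solve(C11, C12) %*% solve(C22, t(C12))))  # Pillai's W*
  c(W = W, W.star = W.star)
}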

Muirhead (1982) examined the effect of the group of affine transformations on
this problem. The Wilks and Pillai tests are invariant under the group of affine
transformations

    (X, Y) → (XHx + 1nb′x, YHy + 1nb′y)

for arbitrary r- and s-vectors bx and by and for arbitrary nonsingular r × r and s × s
matrices Hx and Hy, respectively. Thus the performance of the tests does not depend
on the variance-covariance structure of either X (COV11) or Y (COV22). This
characteristic generally improves the power and the control of the α-levels.

10.3 Tests based on spatial signs and ranks

In the following we describe the tests that generalize the popular univariate tests due
to Blomqvist (1950), Spearman (1904), and Kendall (1938) to any dimensions r and
s. The tests provide practical and robust but still efficient alternatives to multivariate
normal theory methods.

Extension of the Blomqvist quadrant test. The test is based on the r- and s-
variate spatial sign scores U(x) and U(y). To make the test statistic affine invariant,

we first construct inner centered and standardized spatial signs separately for X and
Y and transform
(X, Y) → (ÛX , ÛY ).
(The outer centering and standardization yield procedures that are only rotation in-
variant.) Recall that the inner centering and standardization are accomplished by the
simultaneous Hettmansperger-Randles estimates of multivariate location and shape;
see Definition 6.4. The inner standardized spatial signs satisfy

    Û′X 1n = 0 and r · Û′X ÛX = nIr

and similarly

    Û′Y 1n = 0 and s · Û′Y ÛY = nIs.

The test statistic is then

    Q2 = Q2(X, Y) = (rs/n) |Û′X ÛY|2.

Consider next the limiting null distribution of Q2. We assume that

    (X, Y) = (1nμ′x + ε x Ω′x, 1nμ′y + ε y Ω′y),

where (ε x, ε y) is a random sample from a standardized distribution such that

    E{U(ε xi)} = 0 and r · E{U(ε xi)U(ε xi)′} = Ir

and

    E{U(ε yi)} = 0 and s · E{U(ε yi)U(ε yi)′} = Is.

Note that, for ε xi and ε yi separately, this is a wider model than the model (B2)
discussed in Chapter 2. The spatial median and Tyler's shape matrix of ε xi (and
similarly of ε yi) are a zero vector and an identity matrix, respectively. If μ̂x and μ̂y
are any root-n consistent estimates of μx and μy, and Σ̂x and Σ̂y are any root-n
consistent estimates of Σx = Ω xΩ′x and Σy = Ω yΩ′y (up to a constant), respectively,
and

    (ÛX)i = U( Σ̂x−1/2(xi − μ̂x) ) and (ÛY)i = U( Σ̂y−1/2(yi − μ̂y) ),

i = 1, ..., n, one can show as in Taskinen et al. (2003), using the expansions in Section
6.1.1, that, under the null hypothesis, Q2 →d χ2rs. (It naturally remains to show that
the Hettmansperger-Randles estimate used in our approach is root-n consistent.)
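
In R, the statistic can be sketched with the inner standardized spatial signs
provided by the MNM package; here X and Y stand for the two data matrices, and
the chi-square approximation is used for the p-value. This is a sketch only; in
practice one would call mv.ind.test with score = "si" as in the example of
Section 10.5.

library(MNM)
UX <- spatial.sign(X, center = TRUE, shape = TRUE)   # inner standardized signs
UY <- spatial.sign(Y, center = TRUE, shape = TRUE)
n <- nrow(UX); r <- ncol(UX); s <- ncol(UY)
Q2 <- (r * s / n) * sum(crossprod(UX, UY)^2)         # (rs/n) |U_X' U_Y|^2
pchisq(Q2, df = r * s, lower.tail = FALSE)           # approximate p-value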

Extension of Spearman’s rho. The test is based on the spatial rank scores
RX (x) and RY (y) with dimensions r and s, respectively. We then calculate the inner
standardized spatial ranks separately for X and Y and transform

(X, Y) → (R̂X , R̂Y ).



Note that, in this case, the scores are automatically centered, and the inner centering
is therefore not needed. The inner standardized spatial ranks satisfy

    R̂′X 1n = 0 and r · R̂′X R̂X = |R̂X|2 Ir

and

    R̂′Y 1n = 0 and s · R̂′Y R̂Y = |R̂Y|2 Is.

The test statistic is then

    Q2 = Q2(X, Y) = nrs |R̂′X R̂Y|2 / ( |R̂X|2 |R̂Y|2 ).

Next consider the limiting null distribution of Q2. For the asymptotic behavior
of the test, we assume that there are (up to a constant) unique scatter matrices
(parameters) Σx and Σy and positive constants τx2 and τy2 which satisfy

    E{ U(Σx−1/2(x1 − x2)) U(Σx−1/2(x1 − x3))′ } = τx2 Ir

and

    E{ U(Σy−1/2(y1 − y2)) U(Σy−1/2(y1 − y3))′ } = τy2 Is,

respectively, and that Σ̂x and Σ̂y are root-n consistent estimates of Σx and Σy, again
up to a constant. If we then choose

    (R̂X)i = AVEj{ U(Σ̂x−1/2(xi − xj)) } and (R̂Y)i = AVEj{ U(Σ̂y−1/2(yi − yj)) },

i = 1, ..., n, one can show again as in Taskinen et al. (2005) that, under these mild
assumptions and under the null hypothesis, Q2 →d χ2rs. (For our proposal above,
one needs to show that the shape matrix estimate based on the spatial ranks is root-n
consistent.)
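
The corresponding sketch for the rank-based statistic uses the inner standardized
spatial ranks (spatial.rank from the MNM package; X and Y as above).

RX <- spatial.rank(X, shape = TRUE)                  # inner standardized ranks
RY <- spatial.rank(Y, shape = TRUE)
n <- nrow(RX); r <- ncol(RX); s <- ncol(RY)
Q2 <- n * r * s * sum(crossprod(RX, RY)^2) /
        (sum(RX^2) * sum(RY^2))       # nrs |R_X' R_Y|^2 / (|R_X|^2 |R_Y|^2)
pchisq(Q2, df = r * s, lower.tail = FALSE)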

Extension of Kendall's tau. As in the Spearman rho case, first standardize the
observations (xi → x̂i and yi → ŷi) so that

    r · R̂′X R̂X = |R̂X|2 Ir and s · R̂′Y R̂Y = |R̂Y|2 Is.

Then form a new data matrix of n(n − 1) differences

    (X̂, Ŷ)d = (X̂d, Ŷd)

with rows consisting of all possible differences

    ( (x̂i − x̂j)′, (ŷi − ŷj)′ ), i ≠ j = 1, ..., n.

Then transform to the spatial signs

    (X̂d, Ŷd) → (ÛXd, ÛYd).

The extension of Kendall's tau statistic

    Q2 = Q2(X, Y) = nrs |Û′Xd ÛYd|2 / ( 4(n − 1)2 |R̂X|2 |R̂Y|2 )

is then again asymptotically chi-square distributed with rs degrees of freedom. For
details, see Taskinen et al. (2005).

10.4 Efficiency comparisons

To illustrate and compare the efficiencies of the different test statistics for independence,
we derive the limiting distributions of the test statistics under specific contiguous al-
ternative sequences. Let x∗i and y∗i be independent with spherical marginal densities
exp{−ρx(|x|)} and exp{−ρy(|y|)}, and write, for some choices of M1 and M2,

    ( xi )   ( (1 − Δ)Ir     ΔM1     ) ( x∗i )
    ( yi ) = (    ΔM2     (1 − Δ)Is ) ( y∗i )

with Δ = δ/√n. Let fΔ be the density of (x′i, y′i)′. Note that the joint distribution
of ((x∗i)′, (y∗i)′)′ is not spherically symmetric anymore; the only exception is the
multivariate normal case. The optimal likelihood ratio test statistic for testing H0
against HΔ is then

    L = ∑ni=1 { log fΔ(xi, yi) − log f0(xi, yi) }.

We need the general assumption (which must be checked separately in each case)
that, under the null hypothesis,

    L = (δ/√n) ∑ni=1 [ {r − ψx(rxi)rxi + ψx(rxi)ryi} u′xi M1 uyi
          + {s − ψy(ryi)ryi + ψy(ryi)rxi} u′yi M2 uxi ] + oP(1),

where rxi = |xi|, uxi = |xi|−1 xi, and ψx(r) = ρ′x(r), and similarly for ryi, uyi, and
ψy(r). Under this assumption the sequence of alternatives is contiguous to the null
hypothesis.

Under the above sequence of alternatives we get the following limiting distributions.

Theorem 10.1. Assume that max(r, s) > 1. The limiting distribution of the multi-
variate Blomqvist statistic under the sequence of alternatives is a noncentral chi-
square distribution with rs degrees of freedom and noncentrality parameter

    (δ2/(rs)) |c1M1 + c2M2|2,

where

    c1 = (r − 1)E(|y∗i|)E(|x∗i|−1) and c2 = (s − 1)E(|x∗i|)E(|y∗i|−1).

The limiting distributions of multivariate Spearman's rho and multivariate Kendall's
tau under the sequence of alternatives above are noncentral chi-square distributions
with rs degrees of freedom and noncentrality parameter

    (δ2/(4 rs τx2 τy2)) |d1M1 + d2M2|2,

where

    d1 = (r − 1)E(|y∗i − y∗j|)E(|x∗i − x∗j|−1)

and

    d2 = (s − 1)E(|x∗i − x∗j|)E(|y∗i − y∗j|−1).

Finally, the limiting distribution of Wilks' test statistic −n log W (and of Pillai's trace
statistic Q2 = nW∗) under the sequence of alternatives is a noncentral chi-square
distribution with rs degrees of freedom and noncentrality parameter

    δ2 |M1 + M2|2.

The above results are thus found in the case where the null marginal distributions
are spherically symmetric. If the marginal distributions are elliptically symmetric,
the efficiencies are of the same type |h1M1 + h2M2|2, where h1 and h2 depend on the
marginal spherical distributions and on the test used. If the marginal distributions are
of the same dimension and of the same type (which implies that h1 = h2), then the
relative efficiencies do not depend on M1 and M2 at all. Note that, unfortunately, the
tests are not unbiased (positive noncentrality parameter) for all alternative sequences.

As the limiting distributions are of the same type, χ2rs, the efficiency comparisons
may be based on the noncentrality parameters only. The efficiency comparisons
are now made in the multivariate normal, t5, t10, and contaminated normal
distribution cases. For simplicity, we assume that M1 = M2.
The resulting limiting efficiencies for selected dimensions are listed in Table 10.1.
Note that because limiting multinormality of the regular covariance matrix holds
only if the fourth moments of the underlying distribution are finite, −n log W has a
limiting distribution only when ν ≥ 5. When the underlying distribution is multivariate
normal (ν = ∞), the Wilks test is naturally the best one, but Kendall's tau and
Spearman's rho are very competitive with it. As the underlying population becomes
heavy-tailed (ν gets smaller), the rank-based tests are better than the Wilks test. The
extension of the Blomqvist quadrant test is good for heavy-tailed distributions and for
high dimensions. For more details, simulation studies, and efficiencies under other
distributions, see Taskinen et al. (2003, 2005).

Table 10.1 Asymptotic relative efficiencies of Kendall's tau and Spearman's rho (and of the
Blomqvist quadrant test in parentheses) as compared to the Wilks test at different r- and s-variate
t distributions for selected ν = ν1 = ν2

                            r
               s        2       5       10
   ν = 5       2      1.12    1.14    1.16
                     (0.79)  (0.91)  (0.96)
               5              1.17    1.19
                             (1.05)  (1.10)
              10                      1.20
                                     (1.16)
   ν = 10      2      1.00    1.02    1.03
                     (0.69)  (0.80)  (0.84)
               5              1.04    1.05
                             (0.92)  (0.96)
              10                      1.07
                                     (1.01)
   ν = ∞       2      0.93    0.95    0.96
                     (0.62)  (0.71)  (0.75)
               5              0.96    0.97
                             (0.82)  (0.86)
              10                      0.98
                                     (0.91)

10.5 A real data example

The dataset considered in this example is a subset of the hemodynamic data collected
as a part of the LASERI study (English title: Cardiovascular risk in young Finns study)
using whole-body impedance cardiography and plethysmographic blood pressure
using whole-body impedance cardiography and plethysmographic blood pressure
recordings from fingers. The measurements are on 223 healthy subjects between 26
and 42 years of age. The hemodynamic variables were recorded both in a supine
position and during a passive head-up tilt on a motorized table. During that experi-
ment the subject spent the first ten minutes in a supine position, then the motorized
table was tilted to a head-up position (60 degrees) for five minutes, and for the last
five minutes the table was again returned to the supine position. We are interested in
the differences between some recorded values before and after the tilt, denoted by
HRT1T4, COT1T4, and SVRIT1T4. It is of interest whether these variables and the
weight, height, and hip measurements are independent. The observed values can be
seen in Figure 10.1.

Fig. 10.1 A scatterplot matrix for LASERI data.

The figure is obtained using

> data(LASERI)
> attach(LASERI)
> pairs( cbind(Height,Weight,Hip,HRT1T4,COT1T4,SVRIT1T4),
col=as.numeric(Sex), pch=15:16)

For the two 3-variate observation vectors we first used the test based on the identity
score (Pillai's trace) and the tests based on the standardized spatial signs of the obser-
vation vectors (an extension of the Blomqvist test) and on those of the differences of
the observation vectors (an extension of Kendall's tau). All the p-values are small;
see the results below.

> mv.ind.test(cbind(Height,Waist,Hip),
cbind(HRT1T4,COT1T4,SVRIT1T4))

Test of independence using Pillai’s trace



data: cbind(Height, Waist, Hip) and


cbind(HRT1T4, COT1T4, SVRIT1T4)
Q.2 = 20.371, df = 9, p-value = 0.01576
alternative hypothesis: true
measure of dependence is not equal to 0

> mv.ind.test(cbind(Height,Waist,Hip),
cbind(HRT1T4,COT1T4,SVRIT1T4),score="si")

Spatial sign test of independence


using inner standardization

data: cbind(Height, Waist, Hip) and


cbind(HRT1T4, COT1T4, SVRIT1T4)
Q.2 = 24.1245, df = 9, p-value = 0.004109
alternative hypothesis:
true measure of dependence is not equal to 0

> mv.ind.test(cbind(Height,Waist,Hip),
cbind(HRT1T4,COT1T4,SVRIT1T4),score="sy")

Symmetrized spatial sign test


of independence using inner
standardization

data: cbind(Height, Waist, Hip) and


cbind(HRT1T4, COT1T4, SVRIT1T4)
Q.2 = 25.5761, df = 9, p-value = 0.002396
alternative hypothesis:
true measure of dependence is not equal to 0

We next dropped the two variables Hip and COT1T4. For the remaining two bivariate
observation vectors we cannot reject the null hypothesis of independence, and one gets
the following results (Pillai's trace and an extension of Spearman's rho; approximate
p-values coming from permutation tests are also given).

> mv.ind.test(cbind(Height,Waist),
cbind(HRT1T4,SVRIT1T4))

Test of independence using


Pillai’s trace

data: cbind(Height, Waist) and


cbind(HRT1T4, SVRIT1T4)
Q.2 = 3.9562, df = 4, p-value = 0.412
alternative hypothesis: true measure
of dependence is not equal to 0

> mv.ind.test(cbind(Height,Waist),
cbind(HRT1T4,SVRIT1T4), method="p")

Test of independence using


Pillai’s trace

data: cbind(Height, Waist) and


cbind(HRT1T4, SVRIT1T4)
Q.2 = 3.9562, replications = 1000, p-value = 0.422
alternative hypothesis:
true measure of dependence is not equal to 0

> mv.ind.test(cbind(Height,Waist),
cbind(HRT1T4,SVRIT1T4), score="r", method="p")

Spatial rank test of independence


using inner standardization

data: cbind(Height, Waist) and


cbind(HRT1T4, SVRIT1T4)
Q.2 = 4.0556, replications = 1000, p-value = 0.408
alternative hypothesis:
true measure of dependence is not equal to 0


10.6 Canonical correlation analysis

Assume that r ≤ s. The classical canonical correlation analysis based on the covari-
ance matrix finds the transformation matrices Hx and Hy such that

    COV(XHx, YHy) = ( Ir  L
                      L′  Is ),

where L = (L0, O) with an r × r diagonal matrix L0. Note that Pillai's trace statistic
is the sum of the squared canonical correlations,

    W∗ = tr(LL′) = tr(L′L).

Write again

    COV(X, Y) = COV = ( COV11  COV12
                        COV21  COV22 ).

Then

    H′x COV11 Hx = Ir,  H′y COV22 Hy = Is,  and  H′x COV12 Hy = L.

To simplify the notation, assume that r = s and that the limiting distribution of the
(vectorized)

    √n ( COV11 − Ir   COV12 − Λ
         COV21 − Λ    COV22 − Ir )

is a multivariate normal distribution with zero mean vector, where the diagonal matrix
Λ has distinct diagonal elements. Then, using the three equations above and Slutsky's
theorem, √n(Hx − Ir), √n(Hy − Ir), and √n(L − Λ) also have multivariate normal
limiting distributions that can be solved from

    √n(Hx − Ir)′ + √n(Hx − Ir) + √n(COV11 − Ir) = oP(1),
    √n(Hy − Ir)′ + √n(Hy − Ir) + √n(COV22 − Ir) = oP(1), and
    √n(Hx − Ir)′Λ + √nΛ(Hy − Ir) + √n(COV12 − Λ) = √n(L − Λ) + oP(1).

In the multivariate normal case, the limiting distribution of the standardized canoni-
cal correlations √n(lii − Λii), i = 1, ..., r, for example, is N(0, (1 − Λii2)2), and the
canonical correlations are asymptotically independent. See Anderson (1999). Nat-
urally, in the general elliptic case, any root-n consistent and asymptotically normal
scatter matrix S can be used for the canonical analysis in a similar way. Also in that
case the limiting distribution of the standardized canonical correlations √n(lii − Λii),
i = 1, ..., r, is a multivariate normal distribution N(0, (1 − Λii2)2 ASV(S12)), where
ASV(S12) is the limiting variance of an off-diagonal element of S(X) when X is a
random sample from a corresponding (r + s)-variate spherical distribution F stan-
dardized so that S(F) = Ir+s. Also, the canonical correlations are then asymptotically
independent. See Taskinen et al. (2006) for details.
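
In R, the classical canonical correlations are available through the function cancor
in the stats package, and their squares tie directly to Pillai's trace. A minimal
sketch, with X and Y the two data matrices as before:

cc <- cancor(X, Y)          # classical canonical correlation analysis
cc$cor                      # estimated canonical correlations l_11, ..., l_rr
sum(cc$cor^2)               # = W*(X, Y), the Pillai trace statistic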

10.7 Other approaches

A nonparametric analogue of Wilks' test was given by Puri and Sen (1971). They
developed a class of tests based on componentwise ranking which uses a test statistic
of the form

    SJ = |T| / ( |T11| |T22| ).
Here the elements of the (r + s) × (r + s) matrix T are

    Tkl = (1/n) ∑ni=1 J( Rki/(n + 1) ) J( Rli/(n + 1) ),

where Rki denotes the rank of the kth component of (x′i, y′i)′ among the kth compo-
nents of all n vectors and J(·) is an arbitrary (standardized) score function. Under
H0, −n log SJ →d χ2rs.

Gieser and Randles (1997) proposed a nonparametric test based on interdirection
counts that generalizes the univariate (r = s = 1) quadrant test of Blomqvist (1950)
and is invariant under the transformation group. Taskinen et al. (2003) gave
a more practical invariant extension of the quadrant test based on spatial signs. It is
easy to compute for data in common dimensions. These two tests are asymptotically
equivalent if the marginal vectors are elliptically distributed. Later, Taskinen et al.
(2005) introduced multivariate extensions of Kendall's tau and Spearman's rho that
are based on interdirections. Sukhanova et al. (2009) gave extensions based on the
Oja signs and ranks.

Oja et al. (2009) found optimal nonparametric tests of independence in the sym-
metric independent component model. These tests were based on marginal signed-
ranks applied to the (estimated) independent components.
Chapter 11
Several-sample location problem

Abstract In this chapter we consider tests and estimates based on the identity, spatial
sign, and spatial rank scores in the several independent samples setting. We obtain
multivariate extensions of Mood's test, the Wilcoxon-Mann-Whitney test, the Kruskal-
Wallis test, and the two-sample Hodges-Lehmann estimator. Equivariant/invariant
versions are found using inner centering and standardization.

11.1 A general strategy for testing and estimation

Let now the data matrix

    Y = (Y′1, ..., Y′c)′

consist of c independent random samples

    Yi = (yi1, ..., yini)′, i = 1, ..., c,

from p-variate distributions with cumulative distribution functions F1, ..., Fc. We
write n = n1 + · · · + nc for the total sample size. We assume that the independent
p-variate observation vectors yij with cumulative distribution functions Fi are gen-
erated by

    yij = μi + Ω εij, i = 1, ..., c; j = 1, ..., ni,

where the εij are independent and standardized vectors all having the same un-
known distribution. Then μ1, ..., μc are unknown location centers and the matrix
Σ = ΩΩ′ > 0 is a joint unknown scatter matrix specified later. At first, we wish to
test the null hypothesis

    H0 : μ1 = · · · = μc.

Under our assumptions, the null hypothesis can also be written as H0 : F1 = · · · = Fc,
saying that all observations come from the same population (back in the one-sample
case). Later in this chapter, we also wish to estimate the unknown centers μ1, ..., μc
or the differences Δij = μj − μi, i, j = 1, ..., c.


Again, as in the one-sample case, we wish to use a general location score function
T(y) in testing and estimation. Using inner centering and outer standardization, the
procedure for testing proceeds as follows.

Test statistic using inner centering and outer standardization.


1. Find a shift vector μ̂ such that

    AVE{T(yij − μ̂)} = 0.

For the inner centered scores, write

    T̂ij = T(yij − μ̂), i = 1, ..., c; j = 1, ..., ni.

2. The test statistic for testing whether the ith sample differs from the others is then
based on

    T̂i = AVEj{T̂ij}, i = 1, ..., c.

3. The test statistic for H0 : F1 = · · · = Fc is

    Q2 = Q2(Y) = ∑ci=1 ni T̂′i B̂−1 T̂i,

where

    B̂ = AVE{T̂ij T̂′ij}.

4. Under the null hypothesis H0 : μ1 = · · · = μc and under assumptions specified
later, the limiting distribution of Q2(Y) is χ2(c−1)p.

Inner centering makes the test statistic location invariant; that is,

    Q2(Y + 1nb′) = Q2(Y)

for all p-vectors b. Note that the test statistic can be written as

    Q2 = tr[ ( ∑ci=1 ni T̂i T̂′i ) ( AVE{T̂ij T̂′ij} )−1 ],

which compares two scatter matrices (for "between" and "total" variation). In fact,
Q2 is the classical Pillai trace statistic for MANOVA, now based on the centered
score values T̂ij instead of the original centered values yij − ȳ.

The approach based on inner centering and inner standardization is given next.

Test statistic using inner centering and inner standardization.


1. Find a shift vector μ̂ and a full rank transformation matrix S−1/2 such that the
scores

    T̂ij = T(S−1/2(yij − μ̂)), i = 1, ..., c; j = 1, ..., ni,

are standardized and centered in the sense that

    AVE{T̂ij} = 0 and p · AVE{T̂ij T̂′ij} = AVE{|T̂ij|2} Ip.

This is the inner centering and inner standardization.

2. The test statistic for testing whether the ith sample differs from the others is then
based on

    T̂i = AVEj{T̂ij}, i = 1, ..., c.

3. The several-sample location test statistic is then (under some assumptions)

    Q2 = np ∑i ni |T̂i|2 / ( ∑i ∑j |T̂ij|2 ).

4. Under the null hypothesis H0 : μ1 = · · · = μc and under assumptions specified
later, the limiting distribution of Q2(Y) is χ2(c−1)p.

The test using both inner centering and inner standardization is affine invariant;
that is,

    Q2(YH′ + 1nb′) = Q2(Y)

for all full-rank p × p matrices H and for all p-vectors b. For both test versions, un-
der the general assumptions stated later, the limiting distribution of the test statistic Q2
is a chi-square distribution with (c − 1)p degrees of freedom that can be used to calcu-
late approximate p-values. The p-value can also be calculated for the conditionally
distribution-free, exact permutation test version. Let P be an n × n permutation ma-
trix (obtained from an identity matrix by permuting rows or columns). The p-value
of the permutation test statistic is then

    EP[ I( Q2(PY) ≥ Q2(Y) ) ],

where P has a uniform distribution over all possible n! permutations.
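
A generic sketch of this permutation p-value in R: here Q2fun is assumed to
compute the test statistic from the data matrix Y and the vector g of sample
labels, and permuting the rows of Y corresponds to PY (the function names are
ours).

perm.pvalue.C <- function(Y, g, Q2fun, nrep = 1000) {
  obs <- Q2fun(Y, g)                                 # observed statistic
  reps <- replicate(nrep,
    Q2fun(Y[sample(nrow(Y)), , drop = FALSE], g))    # statistic for PY
  mean(reps >= obs)                                  # E_P[ I(Q2(PY) >= Q2(Y)) ]
}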

11.2 Hotelling’s T 2 and MANOVA

We start with the classical multivariate analysis of variance (MANOVA) test statis-
tics which are given by the choice T(y) = y, the identity score function. Again, write

    Y = (Y′1, ..., Y′c)′ and Yi = (yi1, ..., yini)′, i = 1, ..., c,

and let the observations yij be generated by

    yij = μi + Ω εij, i = 1, ..., c; j = 1, ..., ni,

where the εij are independent, centered, and standardized observations with cumulative
distribution function F. In this first approach we assume that the second moments
of the εij exist and that

    E(εij) = 0 and COV(εij) = Ip.

Then E(yij) = μi and COV(yij) = Σ = ΩΩ′. First, we wish to test the null hypoth-
esis

    H0 : μ1 = · · · = μc.

Our general testing strategy with the identity score function then uses the sample
mean vectors

    ȳi = AVEj{yij}, i = 1, ..., c,

the "grand mean vector"

    ȳ = AVE{yij},

and the sample covariance matrix

    B̂ = S = AVE{(yij − ȳ)(yij − ȳ)′}.

Then we get the following.

Test statistic using inner centering and outer standardization.


1. The inner (and outer) centered scores are

    T̂ij = yij − ȳ, i = 1, ..., c; j = 1, ..., ni.

2. The test statistic for testing whether the ith sample differs from the others is then
based on

    T̂i = ȳi − ȳ, i = 1, ..., c.

3. The test statistic is

    Q2 = Q2(Y) = ∑ci=1 ni (ȳi − ȳ)′ S−1 (ȳi − ȳ).

4. Under the null hypothesis H0 : μ1 = · · · = μc, the limiting distribution of Q2(Y)
is χ2(c−1)p.

It is straightforward to see that the inner centering and inner standardization yield
exactly the same test statistic. The classical MANOVA procedure usually starts with
a decomposition

    SST = SSB + SSW

corresponding to

    ∑i ∑j (yij − ȳ)(yij − ȳ)′ = ∑i ∑j (ȳi − ȳ)(ȳi − ȳ)′ + ∑i ∑j (yij − ȳi)(yij − ȳi)′.

Thus the "total" variation SST is decomposed into a sum of "between" and "within"
variations, SSB and SSW. Our test statistic

    Q2 = n · tr( SSB SST−1 )

compares the "between" and "total" matrices, and it is known in the literature under
the name Pillai's trace statistic. Another possibility is to base the test on the Lawley-
Hotelling trace statistic, Q2LH = n · tr( SSB SSW−1 ). The test statistic Q2 (correspond-
ingly Q2LH) is simply n times the sum of the eigenvalues of SSB SST−1 (correspondingly
SSB SSW−1). Instead of considering the sum (or the arithmetic mean) of the eigen-
values of SSB SST−1 or SSB SSW−1, one could base the test on the product (or the ge-
ometric mean) of the eigenvalues. The so-called Wilks' lambda, for example, is
Λ = det( SSW SST−1 ). In fact, Wilks' lambda is the likelihood ratio test statistic in the
multivariate normal case.

Another formulation of the problem and test statistics. In the practical anal-
ysis of data, the data are usually given in the form

    (X, Y),

where X is an n × c matrix indicating the group (sample) membership and the n × p
matrix Y gives the values of the p-variate response vector. Then

    Xij = 1, if the ith observation comes from group (sample) j, and Xij = 0 otherwise.

The matrices XX′ and X′X have nice interpretations, namely

    (XX′)ij = 1, if the ith and jth observations come from the same treatment group,
    and (XX′)ij = 0 otherwise,

and X′X is a c × c diagonal matrix whose diagonal elements are the c group sizes,
here denoted by n1, ..., nc.

For the regular outer centering of the observation vectors, one can use the pro-
jection matrix

    P1n = 1n(1′n1n)−1 1′n = (1/n) 1n1′n.

The projection Y → P1nY then replaces the observations by the "grand" mean
vector, and the outer (and inner) centered observations are obtained as residuals,

    Y → Ŷ = (In − P1n)Y = ( In − (1/n) 1n1′n ) Y.

Next recall that


    PX = X(X′X)−1X′

is the projection matrix that projects the data points onto the subspace spanned by the
columns of X. This means that, in our case, the transformation Y → PXY replaces the
observation vectors by their group mean vectors. The matrix In − PX is the projection
matrix onto the corresponding residual space; that is, the matrix (In − PX)Y yields the
differences between the observations and the corresponding group (sample) means.

Using these notations,

    Ŷ′Ŷ = Ŷ′PXŶ + Ŷ′(In − PX)Ŷ,

and therefore

    Q2 = n · tr( Ŷ′PXŶ (Ŷ′Ŷ)−1 )

and

    Q2LH = n · tr( Ŷ′PXŶ (Ŷ′(In − PX)Ŷ)−1 ).
Recall that in the one-sample case, for testing H0 : μ = 0, one uses the decomposi-
tion

    Y′Y = Y′P1nY + Y′(In − P1n)Y = Y′P1nY + Ŷ′Ŷ,

and there we obtained

    Q2 = n · tr( Y′P1nY (Y′Y)−1 ) and T2 = n · tr( Y′P1nY (Ŷ′Ŷ)−1 ).
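
The projection formulation translates directly into R. A sketch, assuming a data
matrix Y and a factor g of group labels (model.matrix builds the membership
matrix X):

n  <- nrow(Y)
Yc <- sweep(Y, 2, colMeans(Y))                  # (I_n - P_1n) Y, centered data
X  <- model.matrix(~ g - 1)                     # n x c group membership matrix
PX <- X %*% solve(crossprod(X), t(X))           # P_X = X (X'X)^{-1} X'
Q2 <- n * sum(diag(t(Yc) %*% PX %*% Yc %*% solve(crossprod(Yc))))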

Model of multivariate normality. We assume first that Yi is a random sample
from a multivariate normal distribution Np(μi, Σ), i = 1, ..., c. If c = 2,

    Q2 = (n1n2/n) (ȳ1 − ȳ2)′ S−1 (ȳ1 − ȳ2),

and similarly Q2LH, are different versions of Hotelling's two-sample T2-test. In the gen-
eral c-sample case, Q2LH is the classical MANOVA test statistic, namely the Lawley-
Hotelling trace statistic. The test statistic Q2 is known as Pillai's trace statistic. The
exact distributions of the test statistics are known but quite complicated; see, for
example, DasGupta (1999a,b) for a discussion. In practice, one can often use the
approximate distribution which is a chi-square distribution with (c − 1)p degrees
of freedom. This is discussed next.
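
In practice, the identity score test is available in base R: the Pillai trace reported
by summary.manova is tr(SSB SST−1), so n times it gives Q2. A sketch, assuming a
data matrix X and a factor g of sample labels:

fit <- manova(as.matrix(X) ~ g)
summary(fit, test = "Pillai")      # Pillai's trace with an F approximation
Q2 <- nrow(X) * summary(fit, test = "Pillai")$stats[1, "Pillai"]
pchisq(Q2, df = (nlevels(g) - 1) * ncol(X), lower.tail = FALSE)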

Nonparametric model with finite second moments. The test statistics Q2 and
Q2LH are also asymptotically valid in a wider nonparametric model where we only
assume that the second moments exist. In this model, we then assume that the ob-
servations are generated according to

    yij = μi + Ω εij, i = 1, ..., c; j = 1, ..., ni,

where the εij are independent vectors all having the same unknown distribution with
E(εij) = 0 and COV(εij) = Ip. Recall that the vectors μ1, ..., μc are unknown pop-
ulation mean vectors, and Σ = ΩΩ′ is the covariance matrix of yij.

Next we consider the limiting null distribution of the test statistic. For that we
need the assumption that

    ni/n → λi, i = 1, ..., c, as n → ∞,

where 0 < λi < 1, i = 1, ..., c. First, under the null hypothesis,

    (1/n) Ŷ′Ŷ →P Σ and (1/n) Ŷ′(In − PX)Ŷ →P Σ,

and therefore Q2LH and Q2 are asymptotically equivalent; that is, Q2LH − Q2 →P 0.
Then note, using the multivariate central limit theorem (CLT), that the limiting dis-
tribution of

    (√n1 ȳ′1, ..., √nc−1 ȳ′c−1)′

is a multivariate normal distribution with mean vector zero and covariance matrix

    (Ic−1 − dd′) ⊗ Σ,

where d = (√λ1, ..., √λc−1)′. We use the first c − 1 sample mean vectors only, as
the covariance matrix of (√n1 ȳ′1, ..., √nc ȳ′c)′ is singular. Then we find that
 
    Q2 = ∑ci=1 ni (ȳi − ȳ)′ S−1 (ȳi − ȳ)
       = (√n1(ȳ1 − ȳ)′, ..., √nc−1(ȳc−1 − ȳ)′) [ (Ic−1 − d̂d̂′)−1 ⊗ S−1 ]
           × (√n1(ȳ1 − ȳ)′, ..., √nc−1(ȳc−1 − ȳ)′)′

with d̂i = √(ni/n) and with

    (Ic−1 − d̂d̂′)−1 = Ic−1 + d̂d̂′ / (1 − d̂′d̂).

This shows that the limiting distribution of Q2 is a chi-square distribution with
(c − 1)p degrees of freedom. An approximate power is obtained if one considers a
sequence of alternatives Hn : μi = μ + n−1/2δi, i = 1, ..., c, where ∑ci=1 λiδi = 0. It
is straightforward to see that, under this sequence of alternatives, the limiting distri-
bution of Q2 is a noncentral chi-square distribution with (c − 1)p degrees of freedom
and noncentrality parameter

    ∑ci=1 λi δ′i Σ−1 δi.

We have thus proved the following.


Theorem 11.1.
• Under the assumptions stated above and under the null hypothesis H0 : μ1 = · · · =
μc, the limiting distribution of Q2 is a chi-square distribution with (c − 1)p de-
grees of freedom.
• Under the assumptions stated above and under the sequence of alternatives Hn :
μi = μ + n−1/2δi, i = 1, ..., c, where ∑ci=1 λiδi = 0, the limiting distribution of
the test statistic Q2 is a noncentral chi-square distribution with (c − 1)p degrees
of freedom and noncentrality parameter ∑ci=1 λi δ′i Σ−1 δi.

Assume that E(Y) = Δ and center the mean matrix as Δ̂ = (In − P1n)Δ. Accord-
ing to Theorem 11.1, the distribution of Q2 can be approximated by a noncentral
chi-square distribution with (c − 1)p degrees of freedom and noncentrality parame-
ter

    tr(Δ̂′PX Δ̂ Σ−1).

This result can be used in power and sample size calculations.

Totally nonparametric model. The permutation version of the MANOVA test
is based on the fact that under the null hypothesis

    PY ∼ Y

for all n × n permutation matrices P. In this approach we use the second formulation
of the model. Now the value of the test statistic for the permuted dataset PY is

    Q2(PY) = n · tr( Ŷ′P′PXPŶ (Ŷ′Ŷ)−1 ).

To calculate a new value of the test statistic after the permutation, we only need to
transform the projection matrix PX → P′PXP. The exact p-value is again obtained
as

    EP[ I( Q2(PY) ≥ Q2(Y) ) ],

where P is uniformly distributed over the set of all n! different permutation matrices.

Some final comments. It is easy to check that the test statistic Q2 is affine
invariant in the sense that

    Q2(YH′ + 1nb′) = Q2(Y)

for any full-rank p × p transformation matrix H and for any p-vector b. This is
naturally true for Q2LH as well. This means that the exact null distribution then does
not depend on μ or Ω at all. If we write

    ε̂ = ŶS−1/2 = (In − P1n)YS−1/2,

then the test statistic simplifies to

    Q2(Y) = Q2(ε̂) = tr(ε̂′PX ε̂) = |PX ε̂|2,

which is just the squared norm of a projection of the centered and standardized data
matrix.

11.3 The test based on spatial signs

In this section we consider the test statistic that uses the spatial sign score function
T(y) = U(y). The tests are then extensions of the univariate two- and several-sample
Mood's test or median test; see Mood (1954). Again, we assume that the observa-
tions yij are generated by

    yij = μi + Ω εij, i = 1, ..., c; j = 1, ..., ni,

where the εij are independent, centered, and standardized observations (in the sense
described later) with a joint cumulative distribution function F. Again, we wish to
test the null hypothesis

    H0 : μ1 = · · · = μc.

Then the first testing procedure is as follows.

Test statistic using inner centering and outer standardization.


1. The inner centered scores are

    Ûij = U(yij − μ̂), i = 1, ..., c; j = 1, ..., ni,

where μ̂ = μ̂(Y) is the spatial median so that AVE{Ûij} = 0.

2. The test statistic for testing whether the ith sample differs from the others is based
on

    Ûi = AVEj{Ûij}, i = 1, ..., c.

3. Moreover, the test statistic is

    Q2 = Q2(Y) = ∑ci=1 ni Û′i B̂−1 Ûi,

where B̂ = AVE{Ûij Û′ij}.

4. Under the null hypothesis H0 : μ1 = · · · = μc and under some weak assumptions,
the limiting distribution of Q2(Y) is χ2(c−1)p.
The distributional assumptions needed for the asymptotic results are now the
following.

Assumption 3 The density function of yij is bounded and continuous. Moreover,
the spatial median of yij − μi is unique and zero.

The assumptions are weaker than in the case of the identity score in the sense that no
moment assumptions are needed. As in the identity score case, we need assumptions
on the limiting behavior of the sample sizes.

Assumption 4

    ni/n → λi, i = 1, ..., c, as n → ∞,

with 0 < λi < 1, i = 1, ..., c.
Then we can prove the next theorem.
Theorem 11.2.
• Under Assumptions 3 and 4 and under the null hypothesis H0 : μ1 = · · · = μc,
the limiting distribution of Q2 is a chi-square distribution with (c − 1)p degrees
of freedom.
• Under Assumptions 3 and 4 and under the sequence of alternatives Hn : μi =
n−1/2δi, i = 1, ..., c, where ∑ci=1 λiδi = 0, the limiting distribution of the test
statistic Q2 is a noncentral chi-square distribution with (c − 1)p degrees of freedom
and noncentrality parameter

    ∑ci=1 λi δ′i AB−1A δi,

where, as in the one-sample case,

    A = E{ (1/|yij − μi|) ( Ip − (yij − μi)(yij − μi)′ / |yij − μi|2 ) }

and

    B = E{ (yij − μi)(yij − μi)′ / |yij − μi|2 }.
Proof. We first assume that the null hypothesis is true with μ1 = · · · = μc = 0. Then
E(U(yij)) = 0. Write

    Uij = U(yij), i = 1, ..., c; j = 1, ..., ni,

for the "theoretical scores" and

    Ūi = AVEj{Uij} and Ū = AVE{Uij}.

As the null hypothesis is true, we are back in the one-sample case and

    √n μ̂ = √n A−1Ū + oP(1);

recall the results in Section 6.2. Using the results in Section 6.1.1, we can conclude
that

    √ni Ûi = √ni Ūi − √ni Aμ̂ + oP(1) = √ni (Ūi − Ū) + oP(1).

But then we are back in the regular MANOVA case with the identity score function
and the Uij as observation vectors, and the null result follows from Theorem 11.1.
Consider next a sequence of alternatives

    Hn : μi = μ + (1/√n) δi, i = 1, ..., c,

where ∑ci=1 λiδi = 0. Write

    δ̃ = (√λ1 δ′1, ..., √λc−1 δ′c−1)′.

Then, under the alternative sequence, the limiting distribution of

    (√n1 Û′1, ..., √nc−1 Û′c−1)′

is multivariate normal with mean vector

    (Ic−1 ⊗ A) δ̃

and covariance matrix

    (Ic−1 − dd′) ⊗ B,

where d = (√λ1, ..., √λc−1)′. Then, under the alternative sequence, the limiting
distribution of Q2 is the noncentral chi-square distribution with (c − 1)p degrees of
freedom and noncentrality parameter

    δ̃′ [ ( Ic−1 + dd′/(1 − d′d) ) ⊗ AB−1A ] δ̃ = ∑ci=1 λi δ′i AB−1A δi.

Again, if Δ is the n × p matrix of the location centers of the observations under
the alternative and Δ̂ = (In − P1n)Δ, one can use Theorem 11.2 to approximate the
distribution of Q2. It has an approximate noncentral chi-square distribution with
(c − 1)p degrees of freedom and noncentrality parameter

    tr(Δ̂′PX Δ̂ AB−1A).

Totally nonparametric model. The permutation version of the MANOVA test
based on the spatial sign score is obtained as follows. As before, write the dataset as

    (X, Y),

where the ith column of the matrix X indicates the membership in the ith sample.
Then transform the matrix Y to the inner centered score matrix Û. Then again

    Q2(Y) = n · tr( Û′PXÛ (Û′Û)−1 )

and

    Q2(PY) = n · tr( Û′P′PXPÛ (Û′Û)−1 ).

As before, the exact p-value is given by

    EP[ I( Q2(PY) ≥ Q2(Y) ) ],

where P is uniformly distributed over the set of all n! different permutation matrices.

Affine invariant test. Unfortunately, the spatial sign test statistic discussed so
far is not affine invariant. An affine invariant version of the test is obtained if one
uses inner centering and inner standardization:

Test statistic using inner centering and inner standardization.

1. Find a shift vector μ̂ and a full rank transformation matrix S−1/2 such that the
scores

    Ûij = U(S−1/2(yij − μ̂)), i = 1, ..., c; j = 1, ..., ni,

are standardized in the sense that

    AVE{Ûij} = 0 and p · AVE{Ûij Û′ij} = Ip.

2. The test statistic for testing whether the ith sample differs from the others is then
based on

    Ûi = AVEj{Ûij}, i = 1, ..., c.

3. The several-sample location test statistic is

    Q2 = p · ∑ci=1 ni |Ûi|2.

4. Under the null hypothesis H0 : μ1 = · · · = μc and under some weak assumptions,
the limiting distribution of Q2 is χ2(c−1)p.

Note that if one transforms Y → Û, where the scores in Û are inner centered and
inner standardized, then simply

    Q2(Y) = p · |PXÛ|2.

Now

    Q2(PY) = p · |PXPÛ|2,

and the p-values of the exact test version are easily calculated.
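
A sketch of the invariant sign test in R, assuming a data matrix Y and a factor g
of sample labels (spatial.sign from the MNM package gives the inner centered and
standardized signs; in practice one would call mv.Csample.test as in the example
of Section 11.6):

U  <- spatial.sign(Y, center = TRUE, shape = TRUE)   # n x p standardized signs
X  <- model.matrix(~ g - 1)                          # group membership matrix
PX <- X %*% solve(crossprod(X), t(X))                # projection onto group means
p  <- ncol(U)
Q2 <- p * sum((PX %*% U)^2)                          # p |P_X U|^2
pchisq(Q2, df = (nlevels(g) - 1) * p, lower.tail = FALSE)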

11.4 The tests based on spatial ranks

This approach uses the spatial rank function R(y) = RY(y). For a dataset Y =
(y1, ..., yn)′, the spatial (centered) rank function is

    R(y) = AVE{U(y − yi)}.

Remember that the spatial ranks

    Ri = R(yi), i = 1, ..., n,

are automatically centered; that is, ∑ni=1 Ri = 0, and no inner centering is there-
fore needed. The tests extend the two-sample Wilcoxon-Mann-Whitney test and the
several-sample Kruskal-Wallis test.

We assume that the observations yi j are generated by

yi j = μ i + Ωε i j , i = 1, ..., c; j = 1, ..., ni ,

where ε i j are independent, centered, and standardized observations (in the sense
described later) with a joint cumulative distribution function F. Again, we wish to
test the null hypothesis
H0 : μ 1 = · · · = μ c .

Then the first testing procedure is the following.

Test statistic with outer standardization.


1. The centered spatial rank scores are

    Rij = RY(yij), i = 1, ..., c; j = 1, ..., ni,

which satisfy AVE{Rij} = 0.

2. The test statistic for testing whether the ith sample differs from the others is based
on

    R̄i = AVEj{Rij}, i = 1, ..., c.

3. Moreover, the test statistic for H0 : μ1 = · · · = μc is

    Q2 = Q2(Y) = ∑ci=1 ni R̄′i B̂−1 R̄i,

where B̂ = RCOV(Y) = AVE{Rij R′ij}.

4. Under the null hypothesis and under some weak assumptions, the limiting distri-
bution of Q2(Y) is χ2(c−1)p.
The distributional assumptions needed for the asymptotic results are now as
follows.

Assumption 5 The density function of yij is bounded and continuous. Moreover,
the spatial median of yij − yrs is unique (and equals μi − μr).

Again, no moment assumptions are needed, and the limiting behavior of the sample
sizes is as stated in Assumption 4.

Then we can prove the next theorem.
Theorem 11.3.
• Under Assumptions 4 and 5 and under the null hypothesis H0 : μ1 = · · · = μc,
the limiting distribution of Q2 is a chi-square distribution with (c − 1)p degrees
of freedom.
• Under Assumptions 4 and 5 and under the sequence of alternatives Hn : μi =
n−1/2δi, i = 1, ..., c, where ∑ci=1 λiδi = 0, the limiting distribution of the test
statistic Q2 is a noncentral chi-square distribution with (c − 1)p degrees of freedom
and noncentrality parameter

    ∑ci=1 λi δ′i AB−1A δi,

where now, with the expected values calculated in the null case,

    A = E{ (1/|y1 − y2|) ( Ip − (y1 − y2)(y1 − y2)′ / |y1 − y2|2 ) }

and

    B = E{ (y1 − y2)(y1 − y3)′ / ( |y1 − y2| · |y1 − y3| ) }.

Proof. We first assume that the null hypothesis is true; that is, E(U(yij − yrs)) = 0
for all i, j, r, s. The test statistic

    R̄i = AVEj{Rij}

is a two-sample U-statistic and, under the null hypothesis, asymptotically equivalent
to its projection statistic

    AVEj{RF(yij)} − AVEij{RF(yij)},

where RF(y) = EF(U(y − yij)) is the theoretical (population) centered rank func-
tion. But now we are back in the regular MANOVA case with the identity score function
and the RF(yij) as observed vectors, and the null result follows from Theorem 11.1.
Next consider a sequence of alternatives

    Hn : μi = μ + (1/√n) δi, i = 1, ..., c,

where ∑ci=1 λiδi = 0. Write

    δ̃ = (√λ1 δ′1, ..., √λc−1 δ′c−1)′.

Then, under the alternative sequence, the limiting distribution of

    (√n1 R̄′1, ..., √nc−1 R̄′c−1)′

is multivariate normal with mean vector

    (Ic−1 ⊗ A) δ̃

and covariance matrix

    (Ic−1 − dd′) ⊗ B,

where d = (√λ1, ..., √λc−1)′. Then, under the alternative sequence, the limiting
distribution of Q2 is a noncentral chi-square distribution with (c − 1)p degrees of
freedom and noncentrality parameter

    δ̃′ [ ( Ic−1 + dd′/(1 − d′d) ) ⊗ AB−1A ] δ̃ = ∑ci=1 λi δ′i AB−1A δi.

Totally nonparametric model. The permutation version of the MANOVA test
based on the spatial ranks is obtained in a similar way as in the case of the other scores.
As before, an n × c matrix X identifies the samples so that the ith column of the
matrix X indicates the membership in the ith sample. The n × p data matrix Y is
transformed to the matrix R of centered spatial ranks. Then

    Q2(Y) = n · tr( R′PXR (R′R)−1 )

and

    Q2(PY) = n · tr( R′P′PXPR (R′R)−1 ),

and the exact p-values are obtained as before.

The test statistic is not affine invariant, however. An affine invariant modifica-
tion of the test is obtained if one uses inner centering and inner standardization as
follows.

Test statistic using inner standardization.

1. Find a full rank transformation matrix S−1/2 such that

    RCOV(YS−1/2) ∝ Ip.

Write

    R̂ij = RYS−1/2(S−1/2 yij), i = 1, ..., c; j = 1, ..., ni.

2. The test statistic for testing whether the ith sample differs from the others is then
based on

    R̃i = AVEj{R̂ij}, i = 1, ..., c.

3. The several-sample location test statistic is then

    Q2 = np · ∑i ni |R̃i|2 / ( ∑i ∑j |R̂ij|2 ).

4. Under the null hypothesis and under some weak assumptions, the limiting distri-
bution of Q2(Y) is χ2(c−1)p.
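
The analogous sketch for the rank-based invariant statistic in R (spatial.rank from
the MNM package; Y and g as in the sign-test sketch above):

R  <- spatial.rank(Y, shape = TRUE)                  # inner standardized ranks
X  <- model.matrix(~ g - 1)
PX <- X %*% solve(crossprod(X), t(X))
p  <- ncol(R)
Q2 <- nrow(R) * p * sum((PX %*% R)^2) / sum(R^2)     # np |P_X R|^2 / sum |R_ij|^2
pchisq(Q2, df = (nlevels(g) - 1) * p, lower.tail = FALSE)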

11.5 Estimation of the treatment effects

Let the data matrix

    Y = (Y′1, ..., Y′c)′

consist of c independent random samples

    Yi = (yi1, ..., yini)′, i = 1, ..., c,

from p-variate distributions. Write n = n1 + · · · + nc and, for the asymptotic results,
assume that

    n → ∞ and ni/n → λi, i = 1, ..., c,

where 0 < λi < 1, i = 1, ..., c. We again assume that the independent p-variate ob-
servation vectors yij are generated by

    yij = μi + Ω εij, i = 1, ..., c; j = 1, ..., ni,

where the εij are standardized and centered vectors all having the same unknown
distribution. As before, μ1, ..., μc are unknown location centers and the matrix Σ =
ΩΩ′ > 0 is a joint unknown scatter matrix.

We now wish to estimate the unknown differences Δij = μj − μi, i, j = 1, ..., c. The
three scores (identity, spatial sign, and spatial rank) yield, correspondingly, the
following estimates:
• the difference between the sample mean vectors,
• the difference between the sample spatial medians, and
• the two-sample Hodges-Lehmann estimate.
For symmetrical distributions, all three estimates estimate the same population
quantity Δij. We now have a closer look at these three estimates.

The difference between the sample means. The first estimate of Δij is

    Δ̂ij = ȳj − ȳi, i, j = 1, ..., c.

If Σ = COV(yij) exists then, using the central limit theorem,

    √n(Δ̂ij − Δij) →d Np( 0, (1/λi + 1/λj) Σ ), i ≠ j.

Of course, Σ is unknown but can be estimated with

    Σ̂ = AVE{(yij − ȳi)(yij − ȳi)′}.

Naturally, the estimates are affine equivariant and satisfy

    Δ̂ij = Δ̂ik + Δ̂kj for all i, k, j = 1, ..., c.

The difference between the sample spatial medians. Let now μ̂i be the spatial
median calculated from Yi, i = 1, ..., c. Our second estimate for Δij is then

    Δ̂ij = μ̂j − μ̂i, i, j = 1, ..., c.

If

    A(y) = (1/|y|) ( Ip − yy′/|y|2 ) and B(y) = yy′/|y|2

and

    A = E{A(yij − μi)} and B = E{B(yij − μi)},

then, using Theorem 7.4,

    √n(Δ̂ij − Δij) →d Np( 0, (1/λi + 1/λj) A−1BA−1 ).

No moment assumptions are needed here. Of course, A and B are unknown but can
be estimated by

    Â = AVE[A(yij − μ̂i)] and B̂ = AVE[B(yij − μ̂i)].

The estimates satisfy

    Δ̂ij = Δ̂ik + Δ̂kj for all i, k, j = 1, ..., c,

but they are not affine equivariant.

An affine equivariant estimator is obtained with the inner standardization as fol-
lows. Let μi, i = 1, ..., c, be p-vectors and S > 0 a symmetric p × p matrix, and
define

    εij = εij(μi, S) = S−1/2(yij − μi), i = 1, ..., c; j = 1, ..., ni.

The Hettmansperger-Randles (HR) estimates of the location centers and the joint
scatter matrix are the values of μ1, ..., μc and S that simultaneously satisfy

    AVEj{U(εij)} = 0, i = 1, ..., c,

and

    p · AVE{U(εij)U(εij)′} = Ip.

The two-sample Hodges-Lehmann estimate. The two-sample Hodges-
Lehmann estimate Δ̂ij is the spatial median calculated from the pairwise differences

    yjr − yis, r = 1, ..., nj; s = 1, ..., ni.

If now

    A(y) = (1/|y|) ( Ip − yy′/|y|2 ) and B(y1, y2) = y1y′2 / ( |y1| · |y2| )

and

    A = E{A(y11 − y12)} and B = E{B(y11 − y12, y11 − y13)},

then again

    √n(Δ̂ij − Δij) →d Np( 0, (1/λi + 1/λj) A−1BA−1 ).

Again, A and B are unknown but can be estimated by

    Â = AVE{ A(yjr − yis − Δ̂ij) }

and by

    B̂ = AVE{ B(yjr − yis − Δ̂ij, yjr − ykl − Δ̂kj) },

respectively.

Unfortunately, the estimates Δ̂ij are no longer compatible in the sense that
Δ̂ij = Δ̂ik + Δ̂kj. To overcome this problem, one can first, using the kth sample as a
reference sample, find an estimate for the difference between the ith and jth samples as

    Δ̃ij·k = Δ̂ik + Δ̂kj, k = 1, ..., c,

and then take the weighted average, Spjøtvoll's estimator (Spjøtvoll (1968)),

    Δ̃ij = (1/n) ∑ck=1 nk Δ̃ij·k.

Then

    Δ̃ij = Δ̃ik + Δ̃kj for all i, k, j = 1, ..., c.

One can easily show that

    √n(Δ̃ij − Δ̂ij) →P 0,

so that the limiting distributions of √n(Δ̃ij − Δij) and √n(Δ̂ij − Δij) are the same.
See Nevalainen et al. (2007c).
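
A sketch of the averaging step in R: here Dhat is assumed to be a function
returning the pairwise estimate Δ̂ik as a p-vector (with Dhat(i, i) the zero vector)
and nk the vector of sample sizes; all names are ours.

spjotvoll.est <- function(Dhat, nk, i, j) {
  n <- sum(nk)
  est <- 0
  for (k in seq_along(nk))            # Dtilde_{ij.k} = Dhat_{ik} + Dhat_{kj}
    est <- est + nk[k] * (Dhat(i, k) + Dhat(k, j))
  est / n                             # weighted average over reference samples k
}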

An affine equivariant estimator is obtained with the transformation and retrans-
formation technique as follows. Let Δ̂ij, i, j = 1, ..., c, be p-vectors and S > 0 a sym-
metric p × p matrix that simultaneously satisfy

    AVErs{ U( S−1/2(yjr − yis − Δ̂ij) ) } = 0 for all i, j = 1, ..., c,

and

    AVE{ B( S−1/2(yjr − yis − Δ̂ij), S−1/2(yjr − ykl − Δ̂kj) ) } ∝ Ip.

11.6 An example: Egyptian skulls from three epochs.

As an example, we consider the classical dataset on Egyptian skulls from three dif-
ferent epochs. The same data were analyzed in Johnson and Wichern (1998). The
three epochs are time periods in years around 4000, around 3300, and around 1850
BC. Thirty skulls were measured for each time period, and the four measured vari-
ables are denoted by mb (maximal breadth), bh (basibregmatic height), bl (basialveolar
length), and nh (nasal height). We wish to consider whether there are any differences
in the mean skull sizes among the time periods. See below for a first description of
the dataset. The observations are plotted in Figure 11.1.

Fig. 11.1 Egyptian skulls data.



> library(MNM)
> library(HSAUR)
>
> # using skulls as in Johnson and Wichern
>
> SKULLS <- skulls[1:90,]
> levels(SKULLS$epoch) <- c(levels(SKULLS$epoch)[1:3], NA, NA)
> summary(SKULLS)
epoch mb bh
c4000BC:30 Min. :119 Min. :121
c3300BC:30 1st Qu.:130 1st Qu.:130
c1850BC:30 Median :133 Median :134
Mean :133 Mean :133
3rd Qu.:136 3rd Qu.:136
Max. :148 Max. :145
bl nh
Min. : 87.0 Min. :44.0
1st Qu.: 94.2 1st Qu.:48.0
Median : 98.0 Median :50.0
Mean : 98.1 Mean :50.4
3rd Qu.:101.0 3rd Qu.:53.0
Max. :114.0 Max. :60.0
> X <- SKULLS[,2:5]
> epoch <- SKULLS$epoch
> pairs(X, col = as.numeric(epoch), pch=as.numeric(epoch)+14)

We compare three tests, namely, (i) the regular MANOVA based on the identity
score function, (ii) the MANOVA based on the inner centered and standardized
spatial signs, and (iii) the MANOVA based on the inner standardized spatial ranks.
The spatial signs and ranks used in the test are illustrated in Figures 11.2 and 11.3
which are given by

> pairs(spatial.sign(X, TRUE, TRUE), col = as.numeric(epoch),


pch=as.numeric(epoch)+14)
> pairs(spatial.rank(X, TRUE), col = as.numeric(epoch),
pch=as.numeric(epoch)+14)

We next report the p-values coming from different tests. Also the p-values
based on the permutation distribution are reported for the comparison. The classical
MANOVA test produces

Fig. 11.2 Egyptian skulls data. Inner centered and inner standardized spatial signs.

Fig. 11.3 Egyptian skulls data. Inner standardized spatial ranks.



>
> aggregate(X,list(epoch=epoch),mean)
epoch mb bh bl nh
1 c4000BC 131.4 133.6 99.17 50.53
2 c3300BC 132.4 132.7 99.07 50.23
3 c1850BC 134.5 133.8 96.03 50.57
>
> mv.Csample.test(X, epoch)

Several samples location test


using Hotellings T2

data: X by epoch
Q.2 = 15.5, df = 8, p-value = 0.05014
alternative hypothesis: true location difference
between some groups is not equal to c(0,0,0,0)

> set.seed(1234)
> mv.Csample.test(X, epoch, method="perm")

Several samples location test


using Hotellings T2

data: X by epoch
Q.2 = 15.5, replications = 1000, p-value = 0.05
alternative hypothesis: true location difference
between some groups is not equal to c(0,0,0,0)

If one uses the MANOVA based on the spatial signs (invariant version), one gets

> mv.Csample.test(X, epoch, "s", "i")

Equivariant several samples location


test using spatial signs

data: X by epoch
Q.2 = 17.1, df = 8, p-value = 0.02909
alternative hypothesis: true location
difference between some groups is not equal to c(0,0,0,0)

> set.seed(1234)
> mv.Csample.test(X, epoch, "s", "i", "perm")

Equivariant several samples location


test using spatial signs

data: X by epoch
Q.2 = 17.1, replications = 1000, p-value = 0.032
alternative hypothesis: true location difference
between some groups is not equal to c(0,0,0,0)

With the invariant rank-based MANOVA one gets the following results.

> mv.Csample.test(X, epoch, "r", "i")

Equivariant several samples location


test using spatial ranks

data: X by epoch
Q.2 = 16.67, df = 8, p-value = 0.03372
alternative hypothesis: true location difference
between some groups is not equal to c(0,0,0,0)

> set.seed(1234)
> mv.Csample.test(X, epoch, "r", "i", "perm")

Equivariant several samples location


test using spatial ranks

data: X by epoch
Q.2 = 16.67, replications = 1000, p-value = 0.034
alternative hypothesis: true location difference
between some groups is not equal to c(0,0,0,0)

We end this example with the comparison of different two-sample location esti-
mates. The estimates are (i) the difference of the mean vectors, (ii) the difference
of the affine equivariant spatial medians, and (iii) the affine equivariant two-sample
Hodges-Lehmann estimate. We compare the first and the last epoch, and get the
following.

> SKULLS.13 <- SKULLS[c(1:30,61:90),]


> levels(SKULLS.13$epoch) <-
    c(levels(SKULLS.13$epoch)[1], NA,
      levels(SKULLS.13$epoch)[3])
> X.13 <- SKULLS.13[,2:5]
> epoch.13 <- SKULLS.13$epoch
> EST1.13 <- mv.2sample.est(X.13,epoch.13)
> summary(EST1.13)
The difference between sample mean vectors
of X.13 by epoch.13 is:
mb bh bl nh
-3.1000 -0.2000 3.1333 -0.0333

And has the covariance matrix:


mb bh bl nh
mb 1.2810 0.1646 -0.0107 0.2715
bh 0.1646 1.4920 0.0933 0.0101
bl -0.0107 0.0933 1.8450 -0.0083
nh 0.2715 0.0101 -0.0083 0.6745
> EST3.13 <- mv.2sample.est(X.13,epoch.13,"s" ,"i")
> summary(EST3.13)
The difference between equivariant spatial medians
of X.13 by epoch.13 is:
[1] -3.6719 -0.1304 3.8302 0.1285

And has the covariance matrix:


[,1] [,2] [,3] [,4]
[1,] 1.3423 0.4713 -0.1570 0.2580
[2,] 0.4713 1.4661 -0.0794 0.2313
[3,] -0.1570 -0.0794 1.7255 0.0738
[4,] 0.2580 0.2313 0.0738 0.5072
> EST5.13 <- mv.2sample.est(X.13,epoch.13,"r" ,"i")
> summary(EST5.13)
The equivariant spatial Hodges-Lehmann estimator for
location difference of X.13 by epoch.13 is:
[,1] [,2] [,3] [,4]
[1,] -3.3465 -0.1283 3.3335 0.1556

And has the covariance matrix:


[,1] [,2] [,3] [,4]
[1,] 1.2440 0.2697 -0.0018 0.2701
[2,] 0.2697 1.3138 -0.0741 0.1081
[3,] -0.0018 -0.0741 1.7973 0.0605
[4,] 0.2701 0.1081 0.0605 0.5920
>
> plotMvloc(EST1.13, EST3.13, EST5.13, lty.ell = 1:3,
pch.ell = 14:17)

The differences between the estimates seem to be minimal, as can be seen in Figure
11.4. The assumption of multivariate normality of the data may thus be realistic here.
If the last observation is changed to (200, 200, 200, 200)′ to be an outlier, then the
estimate and the confidence ellipsoid based on the regular mean vectors change
dramatically, as can be seen in Figure 11.5.

11.7 References and other approaches

See Möttönen and Oja (1995), Choi and Marden (1997), Marden (1999a), Visuri
et al. (2003), Oja and Randles (2004), and Nevalainen et al. (2007c) for different
uses of spatial signs and ranks in the multivariate several-sample location problem.
Mardia (1967) considered the bivariate problems. See Nevalainen and Oja (2006)


Fig. 11.4 Egyptian skulls data. Estimates of the difference between the first and last epochs.

Fig. 11.5 Egyptian skulls data with an outlier. Estimates of the difference between the first and
last epochs.

for SAS macros for spatial sign MANOVA methods. Puri and Sen (1971) give a
full description of the several-sample location tests based on the vector of marginal
ranks. Chakraborty and Chaudhuri (1999) and Nordhausen et al. (2006) propose and
consider invariant versions of Puri-Sen tests.

Randles and Peters (1990), Peters and Randles (1990b), and Randles (1992) use
interdirections in the test constructions. The tests based on data depth are given in
Liu (1992), Liu and Singh (1993), and Liu et al. (1999). Multivariate Oja signs and
ranks are used in Hettmansperger and Oja (1994), Hettmansperger et al. (1998) and
Oja (1999).
Chapter 12
Randomized blocks

Abstract A multivariate extension of the Friedman test which is based on the spa-
tial ranks is discussed. Related adjusted and unadjusted treatment effect estimates
are considered as well. Again, the test using outer standardization is rotation invari-
ant but unfortunately not invariant under heterogeneous scaling of the components.
Invariant (equivariant) versions of the test (estimates) based on inner standardization
are discussed as well.

12.1 The problem and the test statistics

The blocked design for the comparison of c ≥ 2 treatments is obtained by generalizing the paired-sample design. A randomized complete block design with n blocks then requires blocks of equal size c; the c subjects in each block are randomly assigned to the c treatments. For univariate response variables, the most popular tests of the null hypothesis of no treatment differences are the regular balanced two-way analysis of variance (ANOVA) test and the nonparametric and (under the null hypothesis) distribution-free rank-based Friedman test. These tests are based on the identity score and on the rank score, respectively. Naturally, the permutation test version of the ANOVA test is conditionally distribution-free as well. For a discussion of the univariate tests and the corresponding treatment difference estimates, see Hollander and Wolfe (1999) or Lehmann (1998).

We discuss a p-variate generalization of the Friedman test with the companion treatment effect estimates. The results and discussion here are based on Möttönen
et al. (2003). We do not use the spatial sign score function here as the limiting
properties of the test in that case seem too complicated.

The design and the data. The data consist of N = nc p-dimensional vectors.
The N = nc subjects are in n blocks of equal size c, and within each block the c subjects
are assigned to the c treatments at random. The p-variate observations are then usually
given in an n × c table as follows.

                  Treatments
Blocks     1     2    ···    c
  1       y11   y12   ···   y1c
  2       y21   y22   ···   y2c
  ⋮        ⋮     ⋮     ⋱     ⋮
  n       yn1   yn2   ···   ync

The N × p data matrix can also be written as

Y = (Y1′, ..., Yn′)′,

where the blockwise c × p data matrices are

Yi = (yi1, ..., yic)′,  i = 1, ..., n.

Distributional assumptions. Due to the randomization, it is natural to assume that

Yi = μ + 1c θi′ + εi,  i = 1, ..., n,

where θi, i = 1, ..., n, is the block effect (a p-vector), and

μ = (μ1, ..., μc)′

is the c × p matrix of the treatment effects, ∑ci=1 μi = 0, and the rows of the c × p random matrix εi are dependent but exchangeable; that is,

Pεi ∼ εi,  i = 1, ..., n,

for all c × c permutation matrices P. We also assume that ε1, ..., εn are independent.

Problem. We wish to test the hypothesis of no treatment effects; that is,

H0 : μ1 = · · · = μc = 0.

As in the several-independent-samples case, we also wish to estimate the differences of the treatment effects; that is,

Δjj′ = μj − μj′,  j, j′ = 1, ..., c.

Note that, under the null hypothesis,

PYi ∼ Yi  for all c × c permutation matrices P.



Under the null hypothesis and under a stronger assumption that ε 1 , ..., ε n are inde-
pendent and identically distributed, the random matrices Yi are also independent
and identically distributed (and the limiting distribution of the test statistic given
later can be easily found).

Classical MANOVA test. We first use the identity score in the test construction.
The first step is to center the observed values in each block, that is,

yij → ŷij = yij − ȳi,  i = 1, ..., n; j = 1, ..., c,

where ȳi = (1/c) ∑cj=1 yij. Next write

ŷ·j = ∑ni=1 ŷij,  j = 1, ..., c,

and

B̂ = (1/(nc)) ∑ni=1 ∑cj=1 ŷij ŷij′,

which is just the covariance matrix estimate for the within-blocks variation.

Now we can define a quadratic-form test statistic for testing H0 and give its limiting permutation distribution.

Definition 12.1. The MANOVA test statistic for testing H0 is

Q² = Q²(Y) = ((c − 1)/(nc)) ∑cj=1 ŷ·j′ B̂⁻¹ ŷ·j.
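For illustration, the statistic can be computed directly as in the following minimal R sketch; the function name manova_blocks and the n × c × p array layout of the data are our own conventions for this sketch (in practice the MNM function mv.2way.test with the identity score is used).

# Minimal sketch: the classical MANOVA statistic of Definition 12.1 for data
# in an n x c x p array Y (blocks x treatments x response variables).
manova_blocks <- function(Y) {
  n <- dim(Y)[1]; cc <- dim(Y)[2]; p <- dim(Y)[3]
  Yc <- Y
  for (i in 1:n) {                     # center the observations within each block
    Yi <- matrix(Y[i, , ], cc, p)
    Yc[i, , ] <- sweep(Yi, 2, colMeans(Yi))
  }
  ydot <- apply(Yc, c(2, 3), sum)      # c x p matrix with rows yhat_{.j}
  B <- matrix(0, p, p)                 # within-blocks covariance estimate B-hat
  for (i in 1:n) for (j in 1:cc) B <- B + Yc[i, j, ] %o% Yc[i, j, ]
  B <- B / (n * cc)
  Q2 <- (cc - 1) / (n * cc) * sum(diag(ydot %*% solve(B) %*% t(ydot)))
  c(Q2 = Q2, df = p * (cc - 1))
}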

Asymptotically distribution-free MANOVA test. Möttönen et al. (2003) showed that if (Yi) is a (constant) sequence of matrices with uniformly bounded second and third moments and if B̂ → B > 0, then the limiting permutation null distribution of Q²(Y) is a χ² distribution with p(c − 1) degrees of freedom. If Y1, ..., Yn are i.i.d. with bounded second moments, then the limiting (unconditional) distribution is the same χ² distribution with p(c − 1) degrees of freedom. (As the test is based on the centered observations, the same limit is obtained even if the Yi − 1cθi′ are i.i.d. for some p-vectors θi, i = 1, 2, .... The constants θi, i = 1, ..., n, are then the fixed block effects.)

Note also that if

F̂ = n ∑cj=1 (ȳ·j − ȳ··)(ȳ·j − ȳ··)′,

then we can write

Q² = ((c − 1)/c) tr(F̂B̂⁻¹),

which is again the Pillai trace statistic. The statistic Q² is naturally affine invariant.

Conditionally distribution-free MANOVA test. If the null hypothesis is true then

(Y1′, ..., Yn′)′ ∼ ((P1Y1)′, ..., (PnYn)′)′

for all c × c permutation matrices P1, ..., Pn. The p-value from the permutation test is then given by

EP[ I{ Q²(PY) ≥ Q²(Y) } ],

where P = diag(P1, ..., Pn) is uniformly distributed over all (c!)ⁿ possible values.

The multivariate Friedman test. This test can be derived in exactly the same
way as the MANOVA tests. The multivariate blockwise centered response vectors
ŷi j are just replaced by multivariate blockwise centered rank vectors Ri j . The vector
Ri j is thus the centered rank of the observation yi j among all the observations in the
ith block, that is, among yi1 , ..., yic , i = 1, ..., n. The ranks can be displayed in a table
as follows.

                  Treatments
Blocks     1     2    ···    c      Σ
  1       R11   R12   ···   R1c     0
  2       R21   R22   ···   R2c     0
  ⋮        ⋮     ⋮     ⋱     ⋮      ⋮
  n       Rn1   Rn2   ···   Rnc     0
  Σ       R·1   R·2   ···   R·c     0

Now write

B̂ = (1/(nc)) ∑ni=1 ∑cj=1 Rij Rij′.

Definition 12.2. The multivariate Friedman test statistic is

Q² = ((c − 1)/(nc)) ∑cj=1 R·j′ B̂⁻¹ R·j.
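The rank statistic can be computed in the same way once the blockwise centered spatial ranks are formed. The sketch below (with a hypothetical helper name and the same n × c × p array layout as above; in practice the MNM function mv.2way.test is used) makes the construction explicit.

# Minimal sketch: the multivariate Friedman statistic of Definition 12.2.
friedman_Q2 <- function(Y) {
  n <- dim(Y)[1]; cc <- dim(Y)[2]; p <- dim(Y)[3]
  R <- array(0, dim(Y))
  for (i in 1:n) for (j in 1:cc) {
    D <- sweep(matrix(Y[i, , ], cc, p), 2, Y[i, j, ])  # rows y_ik - y_ij
    len <- sqrt(rowSums(D^2)); len[len == 0] <- Inf
    R[i, j, ] <- -colMeans(D / len)    # centered rank: AVE_k U(y_ij - y_ik)
  }
  Rdot <- apply(R, c(2, 3), sum)       # c x p matrix of columnwise rank sums R_{.j}
  B <- matrix(0, p, p)                 # B-hat = AVE{ R_ij R_ij' }
  for (i in 1:n) for (j in 1:cc) B <- B + R[i, j, ] %o% R[i, j, ]
  B <- B / (n * cc)
  Q2 <- (cc - 1) / (n * cc) * sum(diag(Rdot %*% solve(B) %*% t(Rdot)))
  c(Q2 = Q2, df = p * (cc - 1), p.value = pchisq(Q2, p * (cc - 1), lower.tail = FALSE))
}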

Note that Q² is rotation invariant but not invariant under rescaling of the components.

Theorem 12.1. Assume that the null hypothesis of no treatment effect is true and that the sequence (Yi) is independent and identically distributed up to a location shift. The limiting distribution of Q² is then a central chi-square distribution with p(c − 1) degrees of freedom.

Möttönen et al. (2003) also considered the limiting permutation distribution. It is easy to see that the test statistic Q² reduces to the classical Friedman test statistic when p = 1.

12.2 Limiting distributions and efficiency

For considering the asymptotic properties of the tests and estimates we assume that the original observations yij are independent and that the cdf of yij is F(y − θi − μj), where the θi are the (possibly random) block effects and the μj the treatment effects (∑i θi = ∑j μj = 0). We wish to test the hypothesis

H0 : μ1 = · · · = μc = 0  versus  Hn : μj = n^{−1/2} δj,  j = 1, ..., c,

where also ∑j δj = 0. Let δ denote the vector (δ1′ · · · δc−1′)′. As the tests are based on the centered observations and the centered ranks, it is not a restriction to assume in the following that θ1 = · · · = θn = 0.

The limiting distribution of the classical MANOVA test statistic is then given in the following.

Theorem 12.2. Under the sequence of alternatives Hn, the limiting distribution of the MANOVA test statistic Q² is a noncentral χ² distribution with p(c − 1) degrees of freedom and noncentrality parameter

δ²MANOVA = ((c − 1)/c) ∑cj=1 δj′ Σ⁻¹ δj,

where Σ is the covariance matrix of yij.

For the limiting distributions of the rank tests, we first recall the asymptotic theory of the spatial sign and rank tests for comparing two treatments. This is also done to introduce the matrices A, B1, and B2 needed in the subsequent discussion. First we consider the dependent-samples case (matched pairs) and use the one-sample spatial sign test. The one-sample spatial sign test statistic for the difference vectors yi2 − yi1, i = 1, ..., n, for example, is defined as

T12 = ∑ni=1 U(yi2 − yi1).

As given before in Chapter 8, under Hn,

n^{−1/2} T12 →d Np(A(δ2 − δ1), B1),

where

B1 = E{U(yi2 − yi1)U(yi2 − yi1)′}

is the spatial sign covariance matrix of the difference vectors and

A = E{ |yi2 − yi1|⁻¹ (Ip − U(yi2 − yi1)U(yi2 − yi1)′) }.

In the following we also need

B2 = E{U(y11 − y21)U(y11 − y31)′}.

Then, under the null hypothesis,

B = E(Rij Rij′) = ((c − 1)/c²) B1 + ((c − 1)(c − 2)/c²) B2

is the covariance matrix of Rij. (The expected values above are taken under the null hypothesis.)

Now we are ready to give the limiting distributions of the rank tests.

Theorem 12.3. Under the sequence of alternatives Hn, the limiting distribution of the Friedman test statistic Q² is a noncentral chi-square distribution with p(c − 1) degrees of freedom and noncentrality parameter

δ²FRIEDMAN = ((c − 1)/c) ∑cj=1 δj′ A B⁻¹ A δj.

Relative efficiencies for comparing the regular MANOVA test and the Friedman test are given by the following theorem.

Theorem 12.4. The Pitman asymptotic relative efficiency of the multivariate Friedman test with respect to MANOVA is

ARE12 = δ²FRIEDMAN / δ²MANOVA = ( ∑j δj′ A B⁻¹ A δj ) / ( ∑j δj′ Σ⁻¹ δj ).

Table 12.1 lists efficiencies of the Friedman test with respect to the regular MANOVA test. The Friedman test clearly outperforms the classical MANOVA test for heavy-tailed distributions. The efficiencies increase with the dimension p as well as with the number of treatments c. In the multivariate normal case the efficiency goes to one as the dimension p → ∞. For c = 2, the efficiencies are those of the spatial sign test; as the number of treatments c → ∞, they approach the efficiencies of the spatial signed-rank test. See Möttönen et al. (2003) for a more detailed study and for the asymptotic relative efficiency of the similarly extended Page test.

12.3 Treatment effect estimates

Consider now the pairwise spatial sign test statistics

Tjj′ = ∑ni=1 U(yij − yij′),  j, j′ = 1, ..., c,

Table 12.1 Asymptotic relative efficiencies (ARE) of the spatial multivariate Friedman tests with respect to the classical MANOVA in the spherical multivariate t(p,ν) distribution case with different choices of p, ν, and the number of treatments c

                     c
  p    ν      2      3     10
  2    4    1.152  1.233  1.367
      10    0.867  0.925  1.023
       ∞    0.785  0.838  0.924
  3    4    1.245  1.308  1.406
      10    0.937  0.980  1.048
       ∞    0.849  0.887  0.946
 10    4    1.396  1.427  1.472
      10    1.050  1.067  1.092
       ∞    0.951  0.964  0.981

for comparing treatments j and j′. Note that Tjj = 0 and Tj′j = −Tjj′, j, j′ = 1, ..., c.

The treatment difference estimates Δ̂jj′ corresponding to Tjj′ are the spatial medians of the differences yij − yij′, i = 1, ..., n. Let

Δ̂ = (Δ̂jj′)j,j′=1,...,c

be the pc × c matrix of treatment difference estimates (Δ̂jj = 0, j = 1, ..., c). As in the univariate case and in the case of several independent samples, the problem then is that these estimates may not be consistent in the sense that Δ̂jj″ = Δ̂jj′ + Δ̂j′j″. A solution to this consistency problem is obtained as in the several-independent-samples case. First take the averages

Δ̂j = −(1/c) ∑cj′=1 Δ̂jj′.

Then the adjusted treatment difference estimates are Δ̃jj′ = Δ̂j′ − Δ̂j, and the matrix of these adjusted estimates is

Δ̃ = (Δ̃jj′)j,j′=1,...,c.

For the univariate case, see Lehmann (1998) and Hollander and Wolfe (1999). Möttönen et al. (2003) then proved the following.

Theorem 12.5. Under general assumptions, the limiting distribution of √n vec(Δ̂ − Δ) is a singular multivariate normal distribution with zero mean and covariance matrix given by

Σ(ij),(ij) = A⁻¹B1A⁻¹,  Σ(ij),(il) = A⁻¹B2A⁻¹,  and  Σ(ij),(lm) = 0,

for i < j < l < m, where

Σ(ij),(lm) = Cov(Δ̂ij, Δ̂lm),  i, j, l, m = 1, ..., c.

Moreover, the limiting distribution of √n vec(Δ̃ − Δ) is a singular multivariate normal distribution with zero mean and covariance matrix given by

Σ(ij),(ij) = (2c/(c − 1)) A⁻¹BA⁻¹,  Σ(ij),(il) = (c/(c − 1)) A⁻¹BA⁻¹,  and  Σ(ij),(lm) = 0,

for i < j < l < m, where now

Σ(ij),(lm) = Cov(Δ̃ij, Δ̃lm),  i, j, l, m = 1, ..., c.

The asymptotic relative efficiency between competing treatment difference estimates is usually given in terms of the Wilks generalized variance, that is, the determinant of the covariance matrix. The asymptotic relative efficiency (ARE) of Δ̃jj′ with respect to Δ̂jj′ is then given by

( det(A⁻¹B1A⁻¹) / det((2c/(c − 1)) A⁻¹BA⁻¹) )^{1/p}.

On the other hand, the asymptotic relative efficiency of Δ̃jj′ with respect to ȳ·j − ȳ·j′ is

( det(Σ) / det(A⁻¹BA⁻¹) )^{1/p}.

Note that when the observations come from a spherical distribution the asymptotic relative efficiency of the estimate Δ̃jj′ is the same as the asymptotic relative efficiency of the multivariate Friedman test in Theorem 12.4.

Table 12.2 Asymptotic relative efficiency of Δ̃jj′ with respect to Δ̂jj′ in the spherical multivariate normal distribution case with different choices of dimension p and different numbers of treatments c

        number of treatments c
  p       2      3      10
  2     1.000  1.067  1.176
  3     1.000  1.045  1.114
 10     1.000  1.013  1.032

12.4 Affine invariant tests and affine equivariant estimates

The tests and estimates discussed above are rotation invariant/equivariant but unfortunately not scale invariant/equivariant. Due to the lack of the scale invariance property, rescaling one of the response variables, for example, changes the results (p-values) and may also greatly reduce the efficiency of the tests and estimates. The transformation and retransformation approach introduced by Chakraborty and Chaudhuri (1996) may again be used to construct affine invariant/equivariant versions of the tests and estimates.

To repeat, the idea in the transformation and retransformation approach is as follows. Let S = S(Y) be any scatter matrix (estimate of Σ) based on the data. Transform the data set,

Ŷ = Y S^{−1/2},

and construct the test based on the transformed data Ŷ. Then find the treatment difference estimates Δ̂jj′(Ŷ) for the transformed data and retransform the estimates:

Δ̃jj′(Y) = S^{1/2} Δ̂jj′(Ŷ).
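As a minimal R sketch of one transformation-retransformation cycle, take the regular sample covariance matrix as the scatter estimate S and use its symmetric square root; any other scatter matrix could be plugged in instead, and Delta.hat stands for any treatment difference estimate computed from the transformed data.

Y <- matrix(rnorm(60), ncol = 3)       # toy n x p data matrix
S <- cov(Y)                            # scatter estimate S = S(Y)
ev <- eigen(S, symmetric = TRUE)
S.half <- ev$vectors %*% diag(sqrt(ev$values)) %*% t(ev$vectors)  # S^{1/2}
Y.hat <- Y %*% solve(S.half)           # transformed data Y-hat = Y S^{-1/2}
# ... compute the estimate Delta.hat from Y.hat, then retransform:
# Delta.tilde <- S.half %*% Delta.hat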

12.5 Examples and final remarks

We now illustrate the use of the multivariate Friedman test on a dataset earlier analyzed by Seber (1984). A randomized complete block design experiment was arranged to study the effects of six different treatments on plots of bean plants infested by the serpentine leaf miner insect. In this study the number of treatments was c = 6 and the number of blocks was n = 4. The measurement vectors consist of three different variables: y1 is the number of miners per leaf, y2 is the weight of beans per plot (in kilograms), and y3 = sin⁻¹(√pr), where pr is the proportion of leaves infested with borer. See Table 12.3 and Figure 12.1 for the original dataset. The blockwise centered ranks are given in Table 12.4.

The observed value of the multivariate Friedman test statistic Q² is 30.48, and using the χ² distribution with 15 degrees of freedom the corresponding p-value is approximately 0.01. The results are similar for the affine invariant version of the test (inner standardization). For the MANOVA test the standardized test statistic and p-value are 32.10 and 0.006, respectively. Estimated p-values for the exact permutation tests are also given in the following printout.

> data(beans)
> plot(beans)

[Figure: pairwise scatterplots of Block, Treatment, y1, y2, and y3 for the bean plants data.]
Fig. 12.1 Bean plants data.

Table 12.3 Bean plants data

                        Treatments
Blocks        1     2     3     4     5     6
  1    y1   1.7   1.7   1.4   0.1   1.3   1.7
       y2   0.4   1.0   0.8   0.8   1.0   0.5
       y3   0.20  0.40  0.28  0.10  0.12  0.74
  2    y1   1.2   1.2   1.5   0.2   1.4   2.1
       y2   1.4   0.6   0.8   1.2   1.2   1.0
       y3   0.20  0.25  0.83  0.08  0.20  0.59
  3    y1   1.3   1.7   1.1   0.3   1.3   2.3
       y2   0.6   0.1   0.7   1.2   0.8   0.4
       y3   0.36  0.32  0.58  0.00  0.30  0.50
  4    y1   1.7   1.1   1.1   0.0   1.2   1.3
       y2   1.1   0.0   0.9   0.4   0.6   0.9
       y3   0.39  0.29  0.50  0.00  0.36  0.28

> Y<-cbind(beans$y1,beans$y2,beans$y3)

> mv.2way.test(Y, beans$Block, beans$Treatment,
               score="r", stand="o", method="a")

Multivariate Friedman test using spatial ranks



data: Y by beans$Treatment within beans$Block


Q2 = 30.4843, df = 15, p-value = 0.01029
alternative hypothesis: true location difference
between some groups is not equal to c(0,0,0)
> mv.2way.test(Y, beans$Block, beans$Treatment, "r", "i", "a")

        Affine invariant multivariate Friedman test
        using spatial ranks

data: Y by beans$Treatment within beans$Block


Q2 = 30.7522, df = 15, p-value = 0.00948
alternative hypothesis: true location difference
between some groups is not equal to c(0,0,0)
> mv.2way.test(Y, beans$Block, beans$Treatment, "i", "o", "a")

MANOVA test in a randomized complete block design

data: Y by beans$Treatment within beans$Block


Q2 = 32.1024, df = 15, p-value = 0.006235
alternative hypothesis: true location difference
between some groups is not equal to c(0,0,0)

> mv.2way.test(Y, beans$Block, beans$Treatment, "r", "o", "p")

Multivariate Friedman test using spatial ranks

data: Y by beans$Treatment within beans$Block


Q2 = 30.4843, replications = 1000, p-value < 2.2e-16
alternative hypothesis: true location difference
between some groups is not equal to c(0,0,0)

> mv.2way.test(Y, beans$Block, beans$Treatment, "i", "o", "p")

MANOVA test in a randomized complete block design

data: Y by beans$Treatment within beans$Block


Q2 = 32.1024, replications = 1000, p-value = 0.002
alternative hypothesis: true location difference
between some groups is not equal to c(0,0,0)

12.6 Other approaches

Möttönen et al. (2003) gave an extension of the Page test as well. Another possibility is to use the marginal centered ranks as described in Puri and Sen (1971). This approach is scale invariant/equivariant but not rotation invariant/equivariant, and the efficiency can again be very poor if the response variables are highly correlated. The transformation and retransformation approach may be used to construct affine invariant/equivariant (and consequently more efficient) versions of the tests and estimates. A third possibility is to use affine equivariant centered ranks based on the Oja criterion; see Oja (1999). To compute the blockwise centered ranks one has to require, however, that c ≥ p + 1, which may be a serious limitation in practice.

Table 12.4 Bean plants data: spatial ranks

                           Treatments
Blocks        1      2      3      4      5      6    Sum
  1        0.35   0.43  -0.08  -0.81  -0.20   0.31   0.00
          -0.50   0.40   0.00   0.02   0.38  -0.31   0.00
          -0.21   0.14  -0.03  -0.13  -0.32   0.56   0.00
  2       -0.15  -0.13   0.14  -0.77   0.17   0.73   0.00
           0.49  -0.52  -0.24   0.11   0.17  -0.01   0.00
          -0.17  -0.15   0.53  -0.18  -0.21   0.18   0.00
  3       -0.02   0.28  -0.35  -0.69   0.01   0.77   0.00
          -0.13  -0.56   0.08   0.38   0.35  -0.12   0.00
          -0.03  -0.07   0.39  -0.25  -0.17   0.12   0.00
  4        0.65   0.01  -0.18  -0.75   0.06   0.21   0.00
           0.45  -0.69   0.32  -0.16  -0.23   0.30   0.00
           0.07  -0.03   0.32  -0.22   0.03  -0.17   0.00
Sum        0.83   0.60  -0.47  -3.01   0.04   2.02   0.00
           0.31  -1.37   0.17   0.36   0.67  -0.14   0.00
          -0.35  -0.11   1.22  -0.78  -0.67   0.69   0.00
Chapter 13
Multivariate linear regression

Abstract In this chapter we consider the multivariate multiple regression problem. The tests and estimates are again based on the identity, spatial sign, and spatial rank scores. The estimates obtained in this way are the regular LS estimate, the LAD estimate based on the mean deviation of the residuals (from the origin), and the estimate based on the mean difference of the residuals. The estimates are thus multivariate extensions of the univariate L1 estimates. Equivariant/invariant versions are found using inner centering and standardization.

13.1 General strategy

Assume that (X, Y) is the data matrix and consider the linear regression model

Y = Xβ + ε,

where Y = (y1, ..., yn)′ is an n × p matrix of n observed values of p response variables, X = (x1, ..., xn)′ is an n × q matrix of observed values of q explaining variables, β is a q × p matrix of regression coefficients, and ε = (ε1, ..., εn)′ is an n × p matrix of residuals. One can then also write

yi = β′xi + εi,  i = 1, ..., n.

We assume that ε = (ε1, ..., εn)′ is a random sample from a p-variate distribution “centered” at the origin. Throughout this chapter we assume the following.

Assumption 6

(1/n) X′X → D  and  max1≤i≤n {xi′C′Cxi} / ∑ni=1 {xi′C′Cxi} → 0


for some positive definite q × q matrix D and for all p × q matrices C with positive
rank.

Testing problem I. Consider first the problem of testing the null hypothesis H0 : β = 0. The null hypothesis thus simply says that y1, ..., yn are independent and identically distributed with a joint distribution centered at the origin. By a centered observation we mean here that E{T(yi)} = 0 for the chosen p-variate score function T(y). We write

Ti = T(yi)  and  Ti(β) = T(yi − β′xi),  i = 1, ..., n,

and

T = (T1, ..., Tn)′  and  T(β) = (T1(β), ..., Tn(β))′.

Again write A = E{T(εi)L(εi)′} and B = E{T(εi)T(εi)′}. As before, L(y) is the optimal multivariate location score function. If B exists, then under the null hypothesis and under our design Assumption 6,

n^{−1/2} vec(T′X) = n^{−1/2} (X′ ⊗ Ip) vec(T′)

has a limiting Npq(0, D ⊗ B) distribution, and then the test statistic using the outer standardization satisfies

Q² = Q²(X, Y) = n · tr((T′PXT)(T′T)⁻¹) →d χ²_{pq},

where, as before, PX = X(X′X)⁻¹X′ is the n × n projection matrix onto the linear space spanned by the columns of X. The test thus compares the covariance matrix of the projected and transformed data to the covariance matrix of the transformed data.

Under the sequence of alternatives Hn : β = n^{−1/2}δ, where the q × p matrix δ gives the direction of the alternative sequence, the limiting distribution of n^{−1/2} vec(T′X) is often a Npq(vec(Aδ′D), D ⊗ B) distribution, and that of Q² is a noncentral chi-square distribution with pq degrees of freedom and noncentrality parameter

vec(δ′)′ (D ⊗ (AB⁻¹A)) vec(δ′) = tr((δ′Dδ)(AB⁻¹A)).

The distribution of the test statistic at the true value β (close to the origin) can then be approximated by a noncentral chi-square distribution with pq degrees of freedom and noncentrality parameter

tr( (Δ′PXΔ)(AB⁻¹A) ),

where Δ = Xβ.
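A minimal R sketch of the outer-standardized statistic is given below; score_test_Q2 is a hypothetical helper name, and the score matrix Tmat (identity, spatial sign, or spatial rank scores) must be supplied by the user.

# Minimal sketch: Q^2 = n tr((T'P_X T)(T'T)^{-1}) with a chi-square p-value.
score_test_Q2 <- function(X, Tmat) {
  n <- nrow(X)
  PX <- X %*% solve(crossprod(X), t(X))      # projection onto the columns of X
  Q2 <- n * sum(diag(t(Tmat) %*% PX %*% Tmat %*% solve(crossprod(Tmat))))
  df <- ncol(Tmat) * ncol(X)                 # pq degrees of freedom
  c(Q2 = Q2, df = df, p.value = pchisq(Q2, df, lower.tail = FALSE))
}

With Tmat = Y this gives the L2 test of Section 13.2; with spatial sign or rank scores it gives the tests of Sections 13.3 and 13.4.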

If one uses inner standardization, one first finds a full-rank transformation matrix S^{−1/2} such that, if we transform

yi → T̂i = T(S^{−1/2}yi)  and  Y → T̂ = (T̂1, ..., T̂n)′,

then

p · T̂′T̂ = tr(T̂′T̂) Ip.

The test statistic is then

Q² = Q²(X, Y) = n · tr((T̂′PXT̂)(T̂′T̂)⁻¹) = (np / tr(T̂′T̂)) |PXT̂|²,

which also often has the limiting χ²_{pq} null distribution. The p-value from a conditionally distribution-free permutation test is obtained as

EP[ I{ Q²(PX, Y) ≥ Q²(X, Y) } ],

where P is uniformly distributed over the space of all possible n × n permutation matrices.

Estimation problem. We consider the linear regression model

Y = Xβ + ε

and wish to estimate the unknown β. The estimate based on the score function T then often solves

T(β̂)′X = 0.

We thus find the estimate β̂ such that the transformed estimated residuals are uncorrelated with the explaining variables. Note that usually one of the explaining variables in X corresponds to the intercept term in the regression model (the corresponding column of X is 1n); this implies that the transformed residuals also sum up to zero.

Under general assumptions, the approximate connection between the estimate and the test is

√n (β̂ − β)′ = A⁻¹ ( n^{−1/2} T(β)′X ) D⁻¹ + oP(1),

which further implies that

√n vec((β̂ − β)′) →d Nqp( 0, D⁻¹ ⊗ (A⁻¹BA⁻¹) ),

where, as before, A = E{T(εi)L(εi)′} and B = E{T(εi)T(εi)′}. Recall that L is the optimal score function. If one uses inner standardization in the estimation problem, one first finds an estimate β̂ and a full-rank transformation matrix S^{−1/2} such that, if we transform

yi → T̂i = T(S^{−1/2}(yi − β̂′xi))  and  Y → T̂ = (T̂1, ..., T̂n)′,

then

T̂′X = 0  and  p · T̂′T̂ = tr(T̂′T̂) Ip.

Testing problem II. Consider next the partitioned linear regression model

Y = X1β1 + X2β2 + ε,

where X1 (resp. X2) is an n × q1 (resp. n × q2) matrix. We wish to test the null hypothesis H0 : β2 = 0.

To construct the test statistic, first find the centered scores (centered under the null hypothesis)

T̂ = T(β̂1, 0)

such that T̂′X1 = 0. We also write X̂2 = (In − PX1)X2. Then the test statistic

Q² = Q²(X1, X2, Y) = n · tr( T̂′PX̂2T̂ (T̂′T̂)⁻¹ )

has an approximate chi-square distribution with q2p degrees of freedom. The distribution of the test statistic for Δ = Xβ (close to the null point where PX1Δ = Δ) can be approximated by a noncentral chi-square distribution with q2p degrees of freedom and noncentrality parameter

tr( (Δ̂′PX̂2Δ̂)(AB⁻¹A) ),

where Δ̂ = (In − PX1)Δ.

Note also that, under general assumptions,

√n (β̂2 − β2)′ = A⁻¹ ( n^{−1/2} T(β)′X̂2 ) ( (1/n) X̂2′X̂2 )⁻¹ + oP(1),

and then the Wald test statistic

n vec(β̂2′)′ [COV(β̂2)]⁻¹ vec(β̂2′),

where

COV(β̂2) = ( (1/n) X̂2′X̂2 )⁻¹ ⊗ ( Â⁻¹B̂Â⁻¹ )

(with consistent estimates Â and B̂), is asymptotically equivalent to Q².

If one also uses inner standardization (to attain affine invariance), one first transforms

yi → T̂i = T(S^{−1/2}(yi − β̂1′x1i))  and  Y → T̂ = (T̂1, ..., T̂n)′

such that

T̂′X1 = 0  and  p · T̂′T̂ = tr(T̂′T̂) Ip.

The test statistic is then the same (but with the standardized scores)

Q² = n · tr( T̂′PX̂2T̂ (T̂′T̂)⁻¹ ) = (np / tr(T̂′T̂)) |PX̂2T̂|².

Note that the use of the permutation test is questionable here. It is allowed only if X1 and X2 are independent (X2 gives the treatment in a randomized trial, for example), and then the p-value is

EP[ I{ Q²(X1, PX2, Y) ≥ Q²(X1, X2, Y) } ].

Again, P is uniformly distributed over the space of all possible n × n permutation matrices.

13.2 Multivariate linear L2 regression

We consider the linear regression model

Y = Xβ + ε,

where Y is an n × p matrix of response values, ε is an n × p matrix of error values, X is an n × q matrix of explaining variables, and β is a q × p matrix of regression coefficients. For each individual i we then have

yi = β′xi + εi,  i = 1, ..., n.

In this section we use the identity score function T(y) = y and therefore assume that ε is a random sample from a p-variate distribution with

E(εi) = 0  and  COV(εi) = Σ.

We thus need the assumption that the second moments exist. We assume that X′X is a full-rank matrix, the rank is q, and thus its inverse exists. For the asymptotic results we assume that the explaining (design) variables are fixed and satisfy Assumption 6 with

(1/n) X′X → D  as n → ∞,

where the rank of D is also q. We often assume that the first column of X is 1n so that the first row of β is the so-called intercept parameter. Note that the one-sample and several-sample location problems are special cases here.

Testing problem I. We first consider the problem of testing the null hypothesis H0 : β = 0 versus H1 : β ≠ 0. We thus wish to test whether there is any linear structure in the population. The null hypothesis simply says that E(yi) = 0 for all i, independently of the values xi, i = 1, ..., n. If we use the identity score, then the test statistic is simply based on

Y′X  or, equivalently,  Y′X(X′X)⁻¹.

Under our assumptions and under the null hypothesis,

(1/n) X′X → D  and  (1/n) Y′Y →P Σ,

and therefore

n^{−1/2} vec(Y′X) →d Npq(0, D ⊗ Σ)

and finally

Q² = Q²(Y) = n · tr( Y′PXY (Y′Y)⁻¹ ) →d χ²_{pq}.
This test is affine invariant in the sense that if we transform

(X, Y) → (XV, YW),

where V and W are q × q and p × p full-rank transformation matrices, then the value of the test statistic remains unchanged; that is,

Q²(X, Y) = Q²(XV, YW).

Estimation problem. In the L2 estimation, the estimate β̂ minimizes

Dn(β) = (1/n) |Y − Xβ|² = (1/n) tr( (Y − Xβ)′(Y − Xβ) ) = AVE{|yi − β′xi|²}

or solves

(Y − Xβ̂)′X = 0.

If X has rank q, then the solution is

β̂ = (X′X)⁻¹X′Y,

and the estimate is also called the least squares (LS) estimate. As

β̂ − β = (X′X)⁻¹X′(Y − Xβ) = (X′X)⁻¹X′ε,

we easily get that

√n vec((β̂ − β)′) →d Nqp(0, D⁻¹ ⊗ Σ).
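In R the LS estimate is a one-line computation; the matrices below are toy data for illustration only.

X <- cbind(1, matrix(rnorm(40), ncol = 2))       # n = 20, q = 3 (with intercept)
Y <- matrix(rnorm(60), ncol = 3)                 # p = 3 response variables
beta.hat <- solve(crossprod(X), crossprod(X, Y)) # (X'X)^{-1} X'Y, a q x p matrix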

The estimate β̂ = β̂(X, Y) has the following important equivariance properties.

1. Regression equivariance:
   β̂(X, XH + Y) = β̂(X, Y) + H for all q × p matrices H.
2. Y equivariance:
   β̂(X, YW) = β̂(X, Y)W for all full-rank p × p matrices W.
3. X equivariance:
   β̂(XV, Y) = V⁻¹β̂(X, Y) for all full-rank q × q matrices V.

Testing problem II. Next consider the model

Y = X1β1 + X2β2 + ε,

where the linear part is partitioned into two parts. We wish to test the null hypothesis H0 : β2 = 0. We first center the matrices Y and X2 using X1 (inner centering); that is,

Y → Ŷ = (In − PX1)Y  and  X2 → X̂2 = (In − PX1)X2.

Then Ŷ′X1 = 0 and X̂2′X1 = 0. The test statistic for testing H0 : β2 = 0 is

Q² = n · tr( Ŷ′PX̂2Ŷ (Ŷ′Ŷ)⁻¹ ),

which has an approximate chi-square distribution with q2p degrees of freedom. This test for H0 : β2 = 0 is affine invariant in the sense that if we transform

(X1, X2, Y) → (X1V1, X2V2, YW),

where V1, V2, and W are full-rank transformation matrices (with ranks q1, q2, and p, respectively), then the value of the test statistic is not changed; that is,

Q²(X1, X2, Y) = Q²(X1V1, X2V2, YW).

13.3 L1 regression based on spatial signs

We now assume that, in the linear regression model

Y = Xβ + ε,

ε is a random sample of size n from a p-variate distribution with the spatial median at zero; that is,

E(U(εi)) = 0.

With the spatial sign score we again need the assumption that the density of εi is bounded. Again, X′X is a full-rank matrix with rank q, and X satisfies Assumption 6 with

(1/n) X′X → D  as n → ∞.

Testing problem I. Consider the testing problem with the null hypothesis H0 : β = 0. Under the null hypothesis E(U(yi)) = 0 for all i = 1, ..., n. If one uses the spatial sign score U(y), then one first transforms

yi → Ui = U(yi),  i = 1, ..., n,  and  Y → U = (U1, ..., Un)′,

and the test statistic is based on the covariances between the components of U and X, that is, on the matrix U′X. Then, under the null hypothesis,

n^{−1/2} vec(U′X) →d Npq(0, D ⊗ B),

where B = E{U(εi)U(εi)′}, and finally

Q² = Q²(Y) = n · tr( U′PXU (U′U)⁻¹ ) →d χ²_{pq}.

Unfortunately, this test is not affine invariant, but an affine invariant version can be found using inner standardization. Find a transformation matrix S^{−1/2} such that, if we transform

yi → Ûi = U(S^{−1/2}yi)  and  Y → Û = (Û1, ..., Ûn)′,

then

p · Û′Û = nIp.

The transformation is then again Tyler's transformation, and the test statistic is

Q² = Q²(X, Y) = p · tr(Û′PXÛ) = p · |PXÛ|².

The statistic Q² has the limiting χ²_{pq} null distribution as well.

Estimation problem. In the L1 estimation based on the spatial sign score, the estimate β̂ minimizes

AVE{|yi − β′xi|}  or  Dn(β) = AVE{|yi − β′xi| − |yi|}.

The estimate β̂ then often solves

U(β̂)′X = 0,

where

Ui(β) = U(yi − β′xi),  i = 1, ..., n,  and  U(β) = (U1(β), ..., Un(β))′.

The estimate is sometimes also called the least absolute deviation (LAD) estimate.

The solution β̂ cannot be given in a closed form but may easily be computed with an algorithm using the following two iteration steps (see the sketch after the steps).

1. ei ← yi − β′xi,  i = 1, ..., n.
2. β ← β + [ AVE{|ei|⁻¹ xi xi′} ]⁻¹ AVE{xi U(ei)′}.
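The following minimal R sketch implements the two iteration steps; lad_fit is a hypothetical helper name (in practice the MNM function mv.l1lm with the spatial sign score is used), and a small guard keeps the weights finite at zero residuals.

# Minimal sketch: iterative computation of the spatial sign (LAD) estimate.
lad_fit <- function(X, Y, eps = 1e-6, maxit = 100) {
  beta <- solve(crossprod(X), crossprod(X, Y))     # LS starting value
  for (it in 1:maxit) {
    E <- Y - X %*% beta                            # residuals e_i (rows of E)
    r <- sqrt(rowSums(E^2)); r[r < 1e-10] <- 1e-10 # |e_i|, guarded away from 0
    U <- E / r                                     # spatial signs U(e_i)
    step <- solve(crossprod(X, X / r), crossprod(X, U))  # step 2 of the algorithm
    beta <- beta + step
    if (max(abs(step)) < eps) break
  }
  beta
}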

Assume for a moment that β = 0. Then under our assumptions

Dn(β) →P D(β) = tr(Dβ′Aβ) + o(|β|),

where Dn(β) and D(β) are convex. Also

nDn(n^{−1/2}β) − vec(n^{−1/2}U′X − Aβ′D)′ vec(β′) = oP(1).

See Appendix B. As in the case of the spatial median, it then follows that

√n vec((β̂ − β)′) →d Nqp( 0, D⁻¹ ⊗ (A⁻¹BA⁻¹) ),

where

A = E{A(εi)}  and  B = E{B(εi)}

with

A(y) = (1/|y|)( Ip − U(y)U(y)′ )  and  B(y) = U(y)U(y)′.

Natural consistent estimates of A and B are then

Â = AVE{ A(yi − β̂′xi) }  and  B̂ = AVE{ B(yi − β̂′xi) },

respectively.
respectively.

The estimate β̂ = β̂(X, Y) is regression equivariant and X equivariant but, unfortunately, not Y equivariant. A fully equivariant LAD estimate is again obtained using the transformation-retransformation technique. A natural extension of the Hettmansperger-Randles estimate is then obtained with an algorithm that first updates the residuals, then the β matrix, and finally the residual scatter matrix S as follows.

1. ei ← S^{−1/2}(yi − β′xi),  i = 1, ..., n.
2. β ← β + [ AVE{|ei|⁻¹ xi xi′} ]⁻¹ AVE{xi U(ei)′} S^{1/2}.
3. S ← p S^{1/2} AVE{U(ei)U(ei)′} S^{1/2}.

As in the case of the one-sample HR estimate, there is no proof of the convergence of the algorithm, but in practice it seems to work. If β = 0, (1/n)X′X → D = Iq, εi is spherically distributed around the origin, and the initial regression and shape estimates, say B and S, are root-n consistent, that is,

√n B = OP(1)  and  √n (S − Ip) = OP(1)

with tr(S) = p, then one can again show that the k-step estimates (obtained after k iterations of the above algorithm) satisfy

√n Bk = (1/p)^k √n B + [1 − (1/p)^k] (1/E(ri⁻¹)) (p/(p − 1)) √n AVE{xi ui′} + oP(1)

and

√n (Sk − Ip) = (2/(p + 2))^k √n (S − Ip) + [1 − (2/(p + 2))^k] ((p + 2)/p) √n ( p · AVE{ui ui′} − Ip ) + oP(1).

Asymptotically, the k-step estimate thus behaves as a linear combination of the initial pair of estimates and the L1 estimate; the larger k is, the closer the limiting distribution is to that of the L1 estimate.

Testing problem II. Consider again the model with two sets of explaining variables, X1 and X2:

Y = X1β1 + X2β2 + ε.

We wish to test the null hypothesis that the variables in the X2 part have no effect on the response variables; thus H0 : β2 = 0. In the null case, the estimate of β1 solves

U(β̂1, 0)′X1 = 0.

Then write

Û = U(β̂1, 0)  and  X̂2 = (In − PX1)X2.

Then Û′X1 = 0 and X̂2′X1 = 0. The test statistic for testing H0 : β2 = 0 is then

Q² = n · tr( Û′PX̂2Û (Û′Û)⁻¹ ),

with a limiting chi-square distribution with q2p degrees of freedom.

Unfortunately, the test statistic Q² is not invariant under affine transformations of Y. The affine invariant version of the test is obtained if the spatial sign scores are computed using both inner centering and inner standardization. Then the spatial signs

Ûi = U( S^{−1/2}(yi − β̂1′x1i) ),  i = 1, ..., n,

satisfy

Û′X1 = 0  and  p · Û′Û = nIp.

The test statistic is then

Q² = p · |PX̂2Û|².

13.4 L1 regression based on spatial ranks

This approach uses the spatial rank function R(y) as the score function. Note that the spatial ranks are invariant under location shifts; therefore a separate procedure is needed for the estimation of the intercept vector. The approach here extends the Wilcoxon-Mann-Whitney and Kruskal-Wallis tests to the general multivariate regression case.

We again assume that

Y = 1nμ′ + Xβ + ε,

where X is an n × q matrix of genuine explaining variables with regression coefficient matrix β, μ is the intercept, and ε is a random sample from a p-variate continuous distribution. We again need the assumption that the density of εi is bounded. Then the densities of εi − εj and εi + εj are also bounded. The design matrix (1n, X) satisfies Assumption 6 with

(1/n) X′(In − P1n)X → D0  as n → ∞.

To shorten the notation, we write in the following

yij = yj − yi,  xij = xj − xi,  and  εij = εj − εi,

for i, j = 1, ..., n. Note that

yij = β′xij + εij,

and μ cancels out of the formula.

Testing problem I. Consider the testing problem with the null hypothesis H0 : β = 0; that is, E(U(yi − yj)) = 0 for all i ≠ j. The test statistic one can use here is

n^{1/2} vec( AVE{Uij xij′} ).

It is, under the null hypothesis, asymptotically equivalent to the multivariate rank test statistic

n^{−1/2} vec(R′X̂),

where X̂ = (In − P1n)X. Recall that the matrix of spatial ranks R is obtained by the transformations

yi → Ri = R(yi),  i = 1, ..., n,  and  Y → R = (R1, ..., Rn)′.

Under the null hypothesis,

n^{−1/2} vec(R′X̂) →d Npq(0, D0 ⊗ B),

where B = E{R(yi)R(yi)′}, and finally

Q² = Q²(Y) = n · tr( R′PX̂R (R′R)⁻¹ ) →d χ²_{pq}.

Unfortunately, this test is not affine invariant, but an affine invariant version can be found, for example, using the following natural inner standardization. Find a transformation matrix S^{−1/2} such that if we transform

yi → R̂i = R(S^{−1/2}yi)  and  Y → R̂ = (R̂1, ..., R̂n)′,

then

p · R̂′R̂ = tr(R̂′R̂) Ip.

The transformation is then a Tyler-type transformation but using ranks instead of signs, and the test statistic is

Q² = Q²(X, Y) = (np / tr(R̂′R̂)) |PX̂R̂|².

The statistic Q² has the limiting χ²_{pq} null distribution.

Estimation problem. The estimate β̂ corresponding to the rank test minimizes

Dn(β) = AVE{|yij − β′xij| − |yij|}

or solves

AVE{Uij(β) xij} = 0,

where

Uij(β) = U(yij − β′xij),  i, j = 1, ..., n.

The solution β̂ may be found as in the regular LAD regression but replacing the observations and explaining variables by differences of observations and differences of explaining variables, respectively. The algorithm then uses the two iteration steps (see the sketch below):

1. eij ← yij − β′xij,  i ≠ j.
2. β ← β + [ AVE{|eij|⁻¹ xij xij′} ]⁻¹ AVE{xij U(eij)′}.
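A minimal sketch of the rank-based estimate applies the same LAD iteration to all pairwise differences (O(n²) pairs), reusing lad_fit from the sketch in Section 13.3. Note that X must contain only the genuine explaining variables: an intercept column would difference to zero.

rank_fit <- function(X, Y, ...) {
  idx <- t(combn(nrow(X), 2))                      # all pairs i < j
  Xd <- X[idx[, 2], , drop = FALSE] - X[idx[, 1], , drop = FALSE]
  Yd <- Y[idx[, 2], , drop = FALSE] - Y[idx[, 1], , drop = FALSE]
  lad_fit(Xd, Yd, ...)                             # LAD iteration on differences
}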

The estimate β̂ can be called a rank-based estimate, as the estimating equation can also be written in the form

R(β̂)′X = 0

with

R(β) = (R1(β), ..., Rn(β))′,

where Ri(β) is the spatial rank of yi − β′xi among y1 − β′x1, ..., yn − β′xn. See also Zhou (2009) for this estimate and its properties.

As before, it is possible to show that

√n vec((β̂ − β)′) →d Nqp( 0, D0⁻¹ ⊗ (A⁻¹BA⁻¹) ),

where

A = E{A(εi − εj)}  and  B = E{B(εi − εj, εi − εk)}

with distinct i, j, and k, and

A(y) = (1/|y|)( Ip − U(y)U(y)′ )  and  B(y1, y2) = U(y1)U(y2)′.

Natural consistent estimates of A and B are then

Â = AVE{ A(yij − β̂′xij) }  and  B̂ = AVE{ B(yij − β̂′xij, yik − β̂′xik) },

respectively. In fact, B̂ is simply the spatial rank covariance matrix of the estimated residuals.

As in the case of the regular LAD estimate, the estimate β̂ = β̂(X, Y) is regression equivariant and X equivariant but not Y equivariant. The transformation-retransformation estimation procedure can be constructed, for example, by first updating the residuals, then the β matrix, and finally the residual scatter matrix S as follows.

1. eij ← S^{−1/2}(yij − β′xij),  i, j = 1, ..., n.
2. β ← β + [ AVE{|eij|⁻¹ xij xij′} ]⁻¹ AVE{xij U(eij)′} S^{1/2}.
3. S ← p S^{1/2} AVE{U(eij)U(eik)′} S^{1/2}.

Testing problem II. Consider again the model with two sets of explaining variables, X1 and X2:

Y = 1nμ′ + X1β1 + X2β2 + ε.

We wish to test the null hypothesis H0 : β2 = 0. In the null case, the estimate of β1 solves

R(β̂1, 0)′X1 = 0.

Then write

R̂ = R(β̂1, 0)  and  X̂2 = (In − PX1)X2.

Then R̂′X1 = 0 and X̂2′X1 = 0. The test statistic for testing H0 : β2 = 0 is then

Q² = n · tr( R̂′PX̂2R̂ (R̂′R̂)⁻¹ ),

with a limiting chi-square distribution with q2p degrees of freedom.

Unfortunately, the test statistic Q² is not invariant under affine transformations of Y. The affine invariant version of the test is obtained if the spatial rank scores are computed using both inner centering and inner standardization. Then the spatial ranks

R̂i = R( S^{−1/2}(yi − β̂1′x1i) ),  i = 1, ..., n,

satisfy

R̂′X1 = 0  and  p · R̂′R̂ = tr(R̂′R̂) Ip.

The test statistic is then

Q² = (np / tr(R̂′R̂)) |PX̂2R̂|².

13.5 An example

The dataset considered in this example is the LASERI data already analyzed in Chapter 10. We consider the multivariate regression problem where the response variables are the differences HRT1T2, COT1T2, and SVRIT1T2, and the explaining variables are sex (0/1), age (years), and WHR (waist-to-hip ratio). See Figure 13.1 for the scatterplot matrix. The variables HRT1T2, COT1T2, and SVRIT1T2 measure the reaction of the individual hemodynamic system to the change in position.

We first estimate the regression coefficient matrix in the full model with three explaining variables: sex, age, and WHR. If the spatial sign score (LAD) with inner standardization is used, one gets

[Figure: pairwise scatterplots of Sex, Age, WHR, HRT1T2, COT1T2, and SVRIT1T2 for the LASERI data.]
Fig. 13.1 Pairwise scatterplots for the variables used in the regression analysis.

> data(LASERI)
> with(LASERI, pairs( cbind(Sex, Age, WHR, HRT1T2,
COT1T2, SVRIT1T2 )))
>
> is.reg.fullmodel <- mv.l1lm(cbind(HRT1T2, COT1T2, SVRIT1T2)
˜ Age + WHR + Sex, data=LASERI, score="s", stand="i")
> with(LASERI, pairs( cbind(Sex, Age, WHR,
residuals(is.reg.fullmodel))))
> summary(is.reg.fullmodel)

Multivariate regression using spatial sign scores
and inner standardization

Call:
mv.l1lm(formula = cbind(HRT1T2, COT1T2, SVRIT1T2) ˜
Age + WHR + Sex, scores = "s", stand = "i", data = LASERI)

Testing that all coefficients = 0:


Q.2 = 213.0913 with 12 df, p.value < 2.2e-16

Results by response:

Response HRT1T2 :
Estimate Std. Error
(Intercept) -21.713 6.249
Age            0.169      0.100
WHR 5.337 8.154
SexMale -2.648 1.300

Response COT1T2 :
Estimate Std. Error
(Intercept) 2.3401 0.7047
Age 0.0140 0.0113
WHR -2.5060 0.9196
SexMale -0.0496 0.1467

Response SVRIT1T2 :
Estimate Std. Error
(Intercept) -1525.80 415.80
Age -3.31 6.67
WHR 1173.87 542.58
SexMale 76.06 86.53

The residuals are plotted in Figure 13.2. The spatial signs of the residuals and the
explaining variables are made uncorrelated.

[Figure: pairwise scatterplots of Sex, Age, WHR and the residuals of HRT1T2, COT1T2, and SVRIT1T2 from the full model.]
Fig. 13.2 Residual plots for the estimated full model with the spatial sign score for LASERI data.

If one uses the identity score instead, regular L2 analysis gives quite similar re-
sults. See the results below.

> is.reg.fullmodel2 <- mv.l1lm(cbind(HRT1T2, COT1T2, SVRIT1T2)
     ˜ Age + WHR + Sex, data=LASERI)
> summary(is.reg.fullmodel2)

Multivariate regression using identity scores

Call:
mv.l1lm(formula = cbind(HRT1T2, COT1T2, SVRIT1T2)
˜ Age + WHR + Sex, data = LASERI)

Testing that all coefficients = 0:


Q.2 = 225.7625 with 12 df, p.value < 2.2e-16

Results by response:

Response HRT1T2 :
Estimate Std. Error
(Intercept) -21.013 6.282
Age 0.140 0.101
WHR 5.146 8.197
SexMale -2.510 1.307

Response COT1T2 :
Estimate Std. Error
(Intercept) 3.0223 0.6406
Age 0.0105 0.0103
WHR -3.1486 0.8359
SexMale -0.0084 0.1333

Response SVRIT1T2 :
Estimate Std. Error
(Intercept) -1834.03 365.30
Age -1.74 5.86
WHR 1462.88 476.69
SexMale 63.27 76.02

If one wishes to test the hypothesis that the variable WHR has no effect on the
response variables, one can first estimate the parameters in the submodel (without
WHR) and then use the score test as described earlier. One then gets

> is.reg.submodel <- mv.l1lm(cbind(HRT1T2, COT1T2, SVRIT1T2)
     ˜ Age + Sex, data=LASERI, score="s", stand="i")
> anova(is.reg.fullmodel,is.reg.submodel)

Comparisons between multivariate linear models



Full model: mv.l1lm(formula = cbind(HRT1T2, COT1T2, SVRIT1T2)
    ˜ Age + WHR + Sex, scores = "s", stand = "i", data = LASERI)
Restricted model: mv.l1lm(formula = cbind(HRT1T2, COT1T2, SVRIT1T2)
    ˜ Age + Sex, scores = "s", stand = "i", data = LASERI)

Score type test that coefficients not
in the restricted model are 0:
Q.2 = 18.7278 with 3 df, p.value = 0.0003112

13.6 Other approaches

Rao (1988) proposed the use of univariate LAD regression separately for the p response variables. Puri and Sen (1985), Section 6.4, and Davis and McKean (1993) developed multivariate regression methods based on coordinatewise ranks. Chakraborty (1999) used the transformation-retransformation technique with marginal LAD estimates to find affine equivariant versions of the LAD estimates.

Multivariate spatial sign methods have been studied in Bai et al. (1990) and Arcones (1998). Multivariate affine equivariant regression quantiles based on spatial signs and the transformation-retransformation technique were introduced and discussed in Chakraborty (2003). Asymptotics for the spatial rank methods were considered in Zhou (2009).

Theil-type estimates based on the Oja median were given in Busarova et al. (2006) and Shen (2009). For a different type of regression coefficient estimates based on the Oja sign and rank covariance matrices, see Ollila et al. (2002, 2004b).
Chapter 14
Analysis of cluster-correlated data

Abstract In this chapter it is shown how the spatial sign and rank methods can be
extended to cluster-correlated data. Tests and estimates for the one-sample location
problem with a general score function are given in detail. Then two-sample weighted
spatial rank tests are considered.

14.1 Introduction

In previous chapters we assumed that the observations in Y = (y1, ..., yn)′ are generated by the model

Y = Xβ + ε,

where the n p-variate residuals, that is, the rows of ε = (ε1, ..., εn)′, are independent and identically distributed (i.i.d.) random vectors. The assumption that the observations are independent does not hold, however, if the data are clustered.

Clustered data can arise in a variety of applications. Natural groups may occur in the target population; these groupings may, for example, be based on clinics for patients, schools for students, litters for rats, and so on. Yet another example of clustered data is data arising in longitudinal studies, where the measurements on the individuals (clusters) are taken repeatedly over a time interval. If the clustering in the data is simply ignored, in some cases there can be a serious underestimation of the variability of the estimators. The true standard deviation of the sample mean as an estimator of the population mean, for example, may be much larger than its estimate under the i.i.d. assumption. This underestimation will further result in confidence intervals that are too narrow and p-values that are too small. Therefore an adjustment to standard statistical methods depending on cluster sizes and intraclass correlation is needed.

Traditionally, parametric mixed models have been used to account for the correlation structures among the dependent observational units. One then assumes that

Y = Zα + Xβ + ε,

where Z is an n × d matrix of group (cluster) membership. (The jth column indicates membership in the jth cluster.) The d rows of the d × p matrix α give the random effects of the d clusters. The idea is that the clusters in the sample are not fixed but a random sample from a population of clusters. Often the data are collected in clusters, and the approximate distributions are obtained by letting d → ∞. In the mixed model approach the rows of α and ε are assumed to be independent and multinormally distributed. One can again relax these assumptions and use multivariate spatial sign and rank scores in the analysis. It also seems clear that the observations should have different weights in the analysis: the observations in a cluster of ten should not have the same weight as a single observation in a cluster of size one, as the information in the first cluster is not tenfold.

14.2 One-sample case

14.2.1 Notation and assumptions

Let

Y = (y1, y2, ..., yn)′

be a sample of p-variate random vectors with sample size n. We assume now that the observations come in d clusters and that the n × d matrix

Z = (z1, z2, ..., zn)′

gives the cluster membership, so that zij = 1 if the ith observation comes from cluster j, and zij = 0 otherwise. Note that (ZZ′)ij = 1 if the ith and jth observations come from the same cluster, and (ZZ′)ij = 0 otherwise, and that Z′Z is a d × d diagonal matrix whose diagonal elements are the d cluster sizes, say m1, ..., md.

The one-sample parametric location model with random cluster effects is often written as

Y = Zα + 1nμ′ + ε,

where the d rows of α are i.i.d. from Np(0, Ω), the n rows of ε are i.i.d. from Np(0, Σ), and α and ε are independent. The model can be reformulated as
N p (0, Σ ), and α and ε are independent. The model can be reformulated as

Y = 1nμ′ + ε,

where now

vec(ε′) ∼ Nnp(0, In ⊗ Σ + ZZ′ ⊗ Ω).

We thus move the cluster effect into the covariance matrix of the error variable. If ε = (ε1, ..., εn)′, then the model states the following (a toy construction of this covariance matrix is sketched below).

1. εi ∼ Np(0, Σ + Ω) for all i = 1, ..., n.
2. If (ZZ′)ij = 1, i ≠ j, then

   vec(εi, εj) ∼ N2p( 0, [ Σ+Ω  Ω ; Ω  Σ+Ω ] ).

3. If (ZZ′)ij = 0, then εi and εj are independent.
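As a toy construction (with made-up Σ and Ω), the covariance matrix above can be formed with Kronecker products as follows.

Z <- model.matrix(~ factor(c(1, 1, 1, 2, 2)) - 1)  # n = 5 observations, d = 2 clusters
Sigma <- diag(2)                                   # p = 2, toy residual covariance
Omega <- matrix(c(0.5, 0.2, 0.2, 0.5), 2, 2)       # toy cluster-effect covariance
V <- diag(nrow(Z)) %x% Sigma + tcrossprod(Z) %x% Omega  # I_n (x) Sigma + ZZ' (x) Omega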

In our approach, we relax the assumptions. We still assume that

Y = 1nμ′ + ε,

but the residuals ε1, ..., εn now satisfy only the following.

Assumption 7 The rows of ε = (ε1, ..., εn)′ satisfy

1. εi ∼ −εi and εi ∼ εj for all i, j = 1, ..., n.
2. (εi, εj) ∼ (εi′, εj′) for all i ≠ j and i′ ≠ j′ with (ZZ′)ij = (ZZ′)i′j′.
3. If (ZZ′)ij = 0, then εi and εj are independent.

14.2.2 Tests and estimates

A general idea for constructing the tests and estimates is again to use an odd vector-valued score function T(y) to calculate the individual scores Ti = T(yi), i = 1, ..., n. We write

T = (T1, T2, ..., Tn)′.

We need the assumption that E(|T(εi)|^{2+γ}) is bounded for some γ > 0. Let L(y) be the optimal location score function, that is, the gradient vector of log f(y − μ) with respect to μ at the origin, where f(y) is the density of εi. If H0 : μ = 0 is true, then E(Ti) = 0 for all i = 1, ..., n. Write as before

A = E(T(εi)L(εi)′)  and  B = E(T(εi)T(εi)′).

We now also need the covariance matrix of two distinct transformed residuals in the same cluster; that is,

C = E(T(εi)T(εj)′),  where i ≠ j satisfy (ZZ′)ij = 1.



Clearly,

COV(vec(T′)) = In ⊗ B + (ZZ′ − In) ⊗ C.

For the sampling design, we have the next assumption.

Assumption 8 Assume that

(1/n) 1n′(ZZ′ − In)1n = (1/n) ∑di=1 mi² − 1 → d0

as d → ∞.
Then, for the one-sample location problem:

• The test for H0 : μ = 0 is based on

  AVE{T(yi)} = (1/n)(1n′ ⊗ Ip) vec(T′).

• The companion location estimate μ̂ is determined by the estimating equation

  AVE{T(yi − μ̂)} = 0.

Consider the null hypothesis H0 : μ = 0 and a (contiguous) sequence of alternatives Hn : μ = n^{−1/2}δ. Then (under general assumptions that should be verified separately for each choice of the score function):

• Under the null hypothesis H0,

  √n AVE{T(yi)} →d Np(0, B + d0C).

• Under the sequence of alternatives Hn,

  Q² = 1n′T (T′ZZ′T)⁻¹ T′1n →d χ²p( δ′A(B + d0C)⁻¹Aδ ).

• √n (μ̂ − μ) →d Np(0, A⁻¹(B + d0C)A⁻¹).

Note that the limiting covariance matrix of μ̂ needs the adjustment

A⁻¹BA⁻¹ → A⁻¹(B + d0C)A⁻¹;

the estimated confidence ellipsoid without this correction is too small (under positive intracluster correlation).

14.2.3 Tests and estimates, weighted versions

In this section we consider the tests and estimates based on the weighted scores. Let

W = diag(w1, ..., wn)

be an n × n diagonal matrix with a non-negative weight wi associated with the ith observation. The weight matrix is assumed to be fixed and possibly defined by the cluster structure Z. The covariance structure of the weighted score matrix WT is then

COV(vec(T′W)) = W² ⊗ B + W(ZZ′ − In)W ⊗ C.

For the cluster structure and the weights, we now need the following assumption.

Assumption 9 There exist constants d1 and d2 such that

(1/n) 1n′W²1n → d1  and  (1/n) 1n′W(ZZ′ − In)W1n → d2

as d tends to infinity.
as d tends to infinity.
Then, for the one-sample location problem:

• The test is based on

  AVE{wi T(yi)} = (1/n) T′W1n = (1/n)(1n′ ⊗ Ip) vec(T′W).

• The companion location estimate μ̂ is determined by the estimating equation

  AVE{wi T(yi − μ̂)} = 0.

Consider the null hypothesis H0 : μ = 0 and a (contiguous) sequence of alternatives Hn : μ = n^{−1/2}δ. Then (again under certain assumptions depending on the score function):

• Under the null hypothesis H0,

  √n AVE{wi T(yi)} →d Np(0, d1B + d2C).

• Under the sequence of alternatives Hn,

  Q² = 1n′WT (T′WZZ′WT)⁻¹ T′W1n →d χ²p( δ′A(d1B + d2C)⁻¹Aδ ).

• √n (μ̂ − μ) →d Np(0, A⁻¹(d1B + d2C)A⁻¹).

How should one then choose the weights? Using the results above, one can choose the weights, for example, to maximize the Pitman efficiency of the test or to minimize the determinant of the covariance matrix of the estimate. Explicit solutions can be found in some simplified cases. If C = ρB (ρ is the intraclass correlation), then the covariance matrix has the structure

COV(vec(T′)) = Σ ⊗ B,  where Σ = In + ρ(ZZ′ − In).

One can then use the Lagrange multiplier technique to find the optimal weights w = (w1, ..., wn)′. The solution is

w = λΣ⁻¹1n,

where λ is the Lagrange multiplier chosen so that the constraint w′1n = n is satisfied. The weights in the ith cluster are then proportional to [1 + (mi − 1)ρ]⁻¹ (Larocque et al. (2007)); the larger the cluster size, the smaller the weights (see the sketch below).
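A minimal R sketch of this weighting rule, with the weights rescaled to sum to n, is given below; cluster_weights is a hypothetical helper name.

cluster_weights <- function(cluster, rho) {
  m <- as.numeric(table(cluster)[as.character(cluster)])  # size of each obs.'s cluster
  w <- 1 / (1 + (m - 1) * rho)                            # proportional weights
  length(w) * w / sum(w)                                  # rescale so sum(w) = n
}
cluster_weights(c(1, 1, 1, 1, 2, 2, 3), rho = 0.5)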

We end this section with the note that the proposed score-based testing and estimation procedures are not necessarily affine invariant and equivariant, respectively. Affine invariant and equivariant versions may again be obtained, as before, using the transformation-retransformation technique. Natural unweighted and weighted scatter matrix estimates for this purpose have not yet been developed.

14.3 Two samples: Weighted spatial rank test

Assume that (X, Z, Y) is the data matrix and consider first the general linear regression model

Y = Xβ + ε,

where, as before, Y = (y1, ..., yn)′ is an n × p matrix of n observed values of p response variables, X = (x1, ..., xn)′ is an n × q matrix of observed values of q explaining variables, β is a q × p matrix of regression coefficients, and ε = (ε1, ..., εn)′ is an n × p matrix of residuals. At the individual level, one can then write

yi = β′xi + εi,  i = 1, ..., n.

As before, the matrix Z is the n × d matrix indicating the cluster membership. The residuals ε1, ..., εn are no longer i.i.d. but satisfy the following.

Assumption 10 The rows of ε = (ε1, ..., εn)′ satisfy

1. E(U(εi)) = 0 and εi ∼ εj for all i, j = 1, ..., n.
2. (εi, εj) ∼ (εi′, εj′) for all i ≠ j and i′ ≠ j′ with (ZZ′)ij = (ZZ′)i′j′.
3. If (ZZ′)ij = 0, then εi and εj are independent.

The first condition says that all the p-variate distributions of the εi are the same; no symmetry condition is needed. The condition E(U(εi)) = 0 is used here just to fix the center of the distribution of the residuals. (The spatial median of the residuals is zero.)

Next we focus on the two-sample location model where

X = (1n , x) and β = (μ , Δ ) ,
14.3 Two samples: Weighted spatial rank test 207

the n-vector x is the indicator for the second sample membership, and μ and μ + Δ
are the two location centers (spatial medians of the two populations). We wish to
test the null hypothesis H0 : Δ = 0 and estimate the value of unknown Δ .

Note that the sample sizes are then

n1 = 1n (1n − x) and n2 = 1n x,

and the cluster sizes m1, ..., md are the diagonal elements of the d × d diagonal matrix Z′Z. The sample design is given by the frequency table for group and cluster membership, that is,

  ((1n − x)′Z, x′Z).

If the null hypothesis H0 : Δ = 0 is true then the observations y1, ..., yn are identically distributed with a common cdf F, say. The population spatial rank score function is then

  RF(y) = E(U(y − yi)).

The function RF is naturally unknown. An improved estimate of the population spatial rank function is often obtained if one uses a weighted spatial rank function

  Rw(y) = AVE{wi U(y − yi)}

with strategically chosen positive individual weights w1, ..., wn. We again write

  w = (w1, ..., wn)′ and W = diag(w1, ..., wn).

Also write in short

  RF = (RF(y1), ..., RF(yn))′ and Rw = (Rw(y1), ..., Rw(yn))′.

Note that the weighted ranks Rw(y1), ..., Rw(yn) are now centered in the sense that

  Rw′W1n = Rw′w = 0.

The test statistic is then based on the weighted sum of the weighted ranks over the second sample, that is,

  Rw′Wx.

One can then show that, under the null hypothesis and under some general assumptions,

  n−1/2 Rw′Wx = n−1/2 RF′Wxw + oP(1),

where

  xw = (In − (1/n) 1n 1n′ W) x.

Note that now the xw are centered (instead of the ranks) so that xw′w = 0. Thus the limiting null distribution of n−1/2 Rw′Wx is a p-variate normal distribution with mean value zero and covariance matrix d1 B + d2 C, where

  B = E(RF(εi)RF(εi)′)

and

  C = E(RF(εi)RF(εj)′) with i ≠ j such that (ZZ′)ij = 1.

Here it is assumed that

  xw′W2xw → d1 and xw′W(ZZ′ − In)Wxw → d2.

Finally, if G is a diagonal matrix with diagonal elements given by 2x − 1n, then the limiting null distribution of the squared version

  Q2 = 1n′GWRw (Rw′GWZZ′WGRw)−1 Rw′WG1n

is a chi-squared distribution with p degrees of freedom. This extends the Wilcoxon-Mann-Whitney test to the case of multivariate clustered data. See Nevalainen et al. (2009) for more details and for the several-sample case.
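A minimal R sketch of this two-sample statistic follows. The interface is invented here for illustration, the weights w with w′1n = n are taken as given, and p > 1 is assumed; see the MNM package for production implementations.

weighted.rank.test <- function(Y, x, cluster, w) {
  # Y: n x p responses, x: 0/1 second-sample indicator,
  # cluster: cluster labels, w: positive weights with sum(w) = n
  n <- nrow(Y); p <- ncol(Y)
  Rw <- t(sapply(1:n, function(i) {          # Rw(y_i) = AVE_j{ w_j U(y_i - y_j) }
    D <- sweep(Y, 2, Y[i, ])                 # rows y_j - y_i
    r <- sqrt(rowSums(D^2))
    colSums(w * (-D / ifelse(r > 0, r, 1))) / n
  }))
  Z <- model.matrix(~ factor(cluster) - 1)   # n x d cluster indicators
  g <- 2 * x - 1                             # diagonal of G
  S <- t(Rw) %*% (w * g)                     # Rw'W G 1n = 2 Rw'W x
  M <- t(Z) %*% (g * w * Rw)                 # Z'W G Rw
  V <- crossprod(M)                          # Rw'G W ZZ'W G Rw
  Q2 <- drop(t(S) %*% solve(V) %*% S)
  c(Q2 = Q2, p.value = 1 - pchisq(Q2, df = p))
}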

14.4 References and other approaches

Most theoretical work for the analysis of longitudinal or clustered data concerns
univariate continuous response variables having a normal distribution. Rosner and
Grove (1999) and Rosner et al. (2003) generalized the standard Wilcoxon-Mann-
Whitney rank sum test to the cluster-correlated case with cluster members belonging
to the same treatment groups. Datta and Satten (2005) and Datta and Satten (2008)
developed the rank-sum tests for cases where members in the same cluster may
belong to different treatment groups. Additionally, the correlation between cluster
members may depend on the cluster size. Finally, Rosner et al. (2006) derived an
adjusted variance estimate for a randomization-based Wilcoxon signed rank test
for clustered paired data. They also introduced a weighted signed-rank statistic to
attain better efficiency. The weighted multivariate sign test is the only nonparametric
multivariate test for cluster-correlated data considered in the literature thus far.

Quite recently, Larocque (2003), Nevalainen et al. (2007a,b, 2009), Larocque et al. (2007), and Haataja et al. (2008) have developed extensions of the sign and signed-rank tests and the corresponding estimates to multivariate cluster-correlated data. The results presented here are based on these papers.
Appendix A
Some vector and matrix algebra

An r × s matrix A is an array of real numbers

      ⎛ a11 a12 ... a1s ⎞
  A = ⎜ a21 a22 ... a2s ⎟
      ⎜ ...  ...  ...   ⎟
      ⎝ ar1 ar2 ... ars ⎠ .

The number ai j is called the (i, j) element of A. The set of r × s matrices is here
denoted by M (r, s). An r × s zero matrix, written 0, is a matrix with all elements
zero and is the zero element in M (r, s).
r × 1 matrices are called (column) vectors or r-vectors; 1 × s matrices are row
vectors. Column vectors are denoted by bold lower-case letters a, b, .... A 1 × 1 ma-
trix a is just a real number. A set of vectors a1 , ..., ar is said to be linearly dependent
if there exist scalars c1 , ..., cr , not all zero, such that c1 a1 + · · · + cr ar = 0. Otherwise
they are linearly independent. Write ei , i = 1, ..., r for an r-vector with ith element
one and other elements zero. These vectors are linearly independent and form an orthonormal basis for Rr.
The transpose of A, written as A′, is the s × r matrix

       ⎛ a11 a21 ... ar1 ⎞
  A′ = ⎜ a12 a22 ... ar2 ⎟
       ⎜ ...  ...  ...   ⎟
       ⎝ a1s a2s ... ars ⎠ ,

obtained by interchanging the roles of the rows and columns; the ith row becomes
the ith column and the jth column becomes the jth row, i = 1, ..., r and j = 1, ..., s.
The sum of two r × s matrices A and B is again an r × s matrix C = A + B, whose
(i, j) element is
ci j = ai j + bi j .


The scalar product of real number c and r × s matrix A is an r × s matrix with (i, j)-
element c · ai j . The product of an r × s matrix A and an s × t matrix B is an r × t
matrix C = AB, whose (i, j) element is
  cij = ∑_{k=1}^{s} aik bkj.

An r × r matrix A is a square matrix (the number of rows equals the number of columns). Square matrix A is symmetric if A′ = A. Square matrix A is a diagonal matrix if aij = 0 whenever i ≠ j (off-diagonal elements are zero). The r × r diagonal matrix

       ⎛ 1 0 ... 0 ⎞
  Ir = ⎜ 0 1 ... 0 ⎟
       ⎜ ... ... ... ⎟
       ⎝ 0 0 ... 1 ⎠

is called an identity matrix (diagonal elements are one, off-diagonal elements zero). Note that

  Ir = ∑_{i=1}^{r} ei ei′.

If A is an r × s matrix then Ir A = A and A Is = A. For any r-vector a, diag(a) is a diagonal matrix with r diagonal elements given by a = (a1, ..., ar)′ (in the same order). On the other hand, we also write diag(A) for a diagonal matrix with the diagonal elements as in A.
A permutation matrix Pr is obtained by permuting the rows and/or the columns of Ir. The elements of a permutation matrix are then zeros and ones, and the row and column sums all equal one. An elemental permutation

  Ir − ei ei′ − ej ej′ + ei ej′ + ej ei′

is obtained just by interchanging the ith and jth rows of the identity matrix. Permutation matrices can be given as a product of elemental permutations. Write c(Pr) for the smallest number of elemental permutations needed for the transformation Ir → Pr. The set Pr of all r × r permutation matrices includes r! different permutation matrices.
An r × r matrix A is called a projection matrix if it is symmetric and idempotent, that is, if A2 = A and A′ = A. If A is a projection matrix, then so is Ir − A.
r × r matrix B is the inverse of an r × r matrix A if

AB = BA = Ir .

Then we write B = A−1. A square matrix A is called invertible if its inverse A−1 exists. Clearly the identity matrix Ir is invertible with Ir−1 = Ir. Permutation matrices are

invertible with Pr−1 = Pr′. The inverse of a diagonal matrix A = diag(a1, ..., ar) exists if all the diagonal elements are nonzero, and then A−1 = diag(a1−1, ..., ar−1). Matrix A is called orthogonal if A−1 = A′. Note that permutation matrices are orthogonal.
The determinant of an r × r diagonal matrix A is

det(A) = a11 a22 · · · arr .

More generally, the determinant of an r × r square matrix A, written as det(A), is defined as

  det(A) = ∑ (−1)^c(Pr) det(diag(Pr A)),

where the sum is over all possible permutations Pr ∈ Pr.


The trace of an r × r square matrix A, denoted by tr(A), is the sum of diagonal
elements, that is,
tr(A) = a11 + a22 + · · · + arr .
Note that
tr(A + B) = tr(A) + tr(B) and tr(AB) = tr(BA).
The matrix norm |A| of an r × s matrix A may be written using the trace as

  |A| = (∑_{i=1}^{r} ∑_{j=1}^{s} aij²)^(1/2) = (tr(AA′))^(1/2) = (tr(A′A))^(1/2).

Eigenvalue decompositions: Let A be an r × r symmetric matrix. Then one can write

  A = UDU′,

where U is an orthogonal matrix and D is a diagonal matrix with ordered diagonal elements d1 ≥ d2 ≥ · · · ≥ dr. The columns of U = (u1 · · · ur) are called the eigenvectors and the diagonal elements d1, ..., dr the corresponding eigenvalues of A. Note that then Aui = di ui, i = 1, ..., r. Also note that |A|² = ∑ di².

Next let A be any r × s matrix, and assume that r ≤ s. Then it can be written as

  A = UDV′,

where U and V are r × r and s × s orthogonal matrices, respectively, and D = (D1, D2) with an r × r diagonal matrix D1 and an r × (s − r) zero matrix D2. Again write d1 ≥ · · · ≥ dr for the diagonal elements of D1. The columns of U (V) are the eigenvectors of AA′ (A′A) with the first r eigenvalues di², i = 1, ..., r. Again, |A|² = ∑ di².
An r × r matrix A is said to be nonnegative definite if a′Aa ≥ 0 for all r-vectors a. We then write A ≥ 0. Moreover, A is positive definite (write A > 0) if a′Aa > 0 for all r-vectors a ≠ 0.
The Kronecker product of two matrices A and B, written A ⊗ B, is the partitioned
matrix
          ⎛ a11 B a12 B ... a1s B ⎞
  A ⊗ B = ⎜ a21 B a22 B ... a2s B ⎟
          ⎜ ...   ...   ...      ⎟
          ⎝ ar1 B ar2 B ... ars B ⎠ .


The Kronecker product satisfies

  (A ⊗ B)′ = A′ ⊗ B′ and (A ⊗ B)−1 = A−1 ⊗ B−1

and

  (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).

In statistics, people often wish to work with vectors instead of matrices. The “vec” operation is then used to vectorize a matrix. If A = (a1 · · · as) is an r × s matrix, then

           ⎛ a1 ⎞
  vec(A) = ⎜ ... ⎟
           ⎝ as ⎠

just stacks the columns of A on top of each other. An often very useful result for vectorizing the product of three matrices is

  vec(BCD) = (D′ ⊗ B) vec(C).

Clearly then, with two matrices B and C,

  vec(BC) = (I ⊗ B) vec(C) = (C′ ⊗ I) vec(B).

Note also that, for an r × r matrix A, tr(A) = [vec(Ir)]′ vec(A).
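These identities are easy to verify numerically in R, where %x% is the Kronecker product; a short illustrative check:

set.seed(2)
B <- matrix(rnorm(6), 2, 3); C <- matrix(rnorm(12), 3, 4); D <- matrix(rnorm(8), 4, 2)
vec <- function(A) as.vector(A)                           # stacks the columns
max(abs(vec(B %*% C %*% D) - (t(D) %x% B) %*% vec(C)))    # ~ 0
max(abs(vec(B %*% C) - (diag(4) %x% B) %*% vec(C)))       # ~ 0
A <- matrix(rnorm(9), 3, 3)
sum(diag(A)) - drop(crossprod(vec(diag(3)), vec(A)))      # tr(A) identity, ~ 0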


Starting with vectorized p × p matrices, the following matrices prove very useful. Let ei be a p-vector with ith element one and the others zero. Then define the following p² × p² matrices:

  Dp,p = ∑_{i=1}^{p} (ei ei′) ⊗ (ei ei′),

  Jp,p = ∑_{i=1}^{p} ∑_{j=1}^{p} (ei ej′) ⊗ (ei ej′) = vec(Ip)[vec(Ip)]′,

and

  Kp,p = ∑_{i=1}^{p} ∑_{j=1}^{p} (ei ej′) ⊗ (ej ei′).

Note also that

  Ip² = ∑_{i=1}^{p} ∑_{j=1}^{p} (ei ei′) ⊗ (ej ej′).

Then one can easily see that, if A is a p × p matrix,

  Dp,p vec(A) = vec(diag(A)) and Jp,p vec(A) = tr(A) vec(Ip)



and

  Kp,p vec(A) = vec(A′).

The matrix Kp,p is sometimes called a commutation matrix.
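The three matrices are easily constructed and checked in R; a small illustrative sketch with p = 3:

p <- 3
E <- diag(p)                                  # columns e_1, ..., e_p
Dp <- Jp <- Kp <- matrix(0, p^2, p^2)
for (i in 1:p) for (j in 1:p) {
  Eij <- E[, i] %*% t(E[, j])                 # e_i e_j'
  Jp <- Jp + Eij %x% Eij
  Kp <- Kp + Eij %x% t(Eij)                   # (e_i e_j') kron (e_j e_i')
  if (i == j) Dp <- Dp + Eij %x% Eij
}
A <- matrix(rnorm(p^2), p, p)
max(abs(Dp %*% as.vector(A) - as.vector(diag(diag(A)))))     # vec(diag(A))
max(abs(Jp %*% as.vector(A) - sum(diag(A)) * as.vector(E)))  # tr(A) vec(I_p)
max(abs(Kp %*% as.vector(A) - as.vector(t(A))))              # vec(A')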
Appendix B
Asymptotic results for methods based on spatial signs

B.1 Some auxiliary results

Let y ≠ 0 and μ be any p-vectors, p > 1. Write also

  r = |y| and u = |y|−1 y.

Then the accuracies of the different (constant, linear, and quadratic) approximations of the function |y − μ| of μ are

1. ||y − μ| − |y|| ≤ |μ|.

2. ||y − μ| − |y| + u′μ| ≤ 2|μ|²/r.

   See (3.8) in Bai et al. (1990).

3. ||y − μ| − |y| + u′μ − (1/(2r)) μ′[Ip − uu′]μ| ≤ C |μ|^(2+δ)/r^(1+δ)

   for all 0 < δ < 1, where C does not depend on y or μ. Use Part 2 above and the Taylor theorem, and the result that

     min{a, a²} ≤ a^(1+δ), for all a > 0 and 0 < δ < 1.

   See also Lemma 19(iv) in Arcones (1998).


In a similar way, the accuracies of the constant and linear approximations of the function |y + μ|−1(y + μ) of μ are given by

1. | (y + μ)/|y + μ| − y/|y| | ≤ 2|μ|/r.

   See (2.16) in Bai et al. (1990).

2. | (y + μ)/|y + μ| − y/|y| − (1/r)[Ip − uu′]μ | ≤ C |μ|^(1+δ)/r^(1+δ)

   for all 0 < δ < 1, where C does not depend on y or μ. Use Part 1 above and the Taylor theorem again.

B.2 Basic limit theorems

For the first three theorems, see Sections 1.8 and 1.9 in Serfling (1980), for example.
Theorem B.1. (Chebyshev). Let y1, y2, ... be uncorrelated (univariate) random variables with means μ1, μ2, ... and variances σ1², σ2², .... If ∑_{i=1}^{n} σi² = o(n²) as n → ∞, then

  (1/n) ∑_{i=1}^{n} yi − (1/n) ∑_{i=1}^{n} μi →P 0.

Theorem B.2. (Kolmogorov). Let y1, y2, ... be independent (univariate) random variables with means μ1, μ2, ... and variances σ1², σ2², .... If ∑_{i=1}^{∞} σi²/i² converges, then

  (1/n) ∑_{i=1}^{n} yi − (1/n) ∑_{i=1}^{n} μi → 0 almost surely.

The following theorem (Corollary A.1.2 in Hettmansperger and McKean (1998)) will be useful later. Note that the simple central limit theorem is a special case.

Theorem B.3. Suppose that the (univariate) random variables y1, y2, ... are iid with E(yi) = 0 and Var(yi) = 1. Suppose that the triangular array of constants c1n, ..., cnn, n = 1, 2, ..., is such that

  ∑_{i=1}^{n} cin² → σ², 0 < σ² < ∞,

and

  max_{1≤i≤n} |cin| → 0, as n → ∞.

Then

  ∑_{i=1}^{n} cin yi →d N(0, σ²).
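A quick simulation illustrates the theorem. Here cin = i/n^(3/2), so that ∑ cin² → 1/3 and max |cin| → 0, and the yi are standardized uniform variables, so the normal limit is not built in (illustrative code only):

set.seed(3)
n <- 2000
cin <- (1:n) / n^1.5                        # sum(cin^2) is close to 1/3
S <- replicate(5000, sum(cin * (runif(n) - 0.5) * sqrt(12)))
c(var.S = var(S), sigma2 = sum(cin^2))      # both approximately 1/3
# qqnorm(S) shows an approximately normal shape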

The following result is an easy consequence of Corollary 1.9.2 in Serfling (1980).

Theorem B.4. Suppose that y1, y2, ... are independent random variables with E(yi) = 0, E(yi²) = σi², and E(|yi|³) = γi < ∞. If

  (∑_{i=1}^{n} γi)² / (∑_{i=1}^{n} σi²)³ → 0,

then

  (∑_{i=1}^{n} yi) / (∑_{i=1}^{n} σi²)^(1/2) →d N(0, 1).

The following key result is Lemma 4.2 in Davis et al. (1992) and Theorem 1 in Arcones (1998).

Theorem B.5. Let Gn(μ), μ ∈ Rp, be a sequence of convex stochastic processes, and let G(μ) be a convex (limit) process in the sense that the finite-dimensional distributions of Gn(μ) converge to those of G(μ). Let μ̂, μ̂1, μ̂2, ... be random variables such that

  G(μ̂) = inf_μ G(μ) and Gn(μ̂n) = inf_μ Gn(μ), n = 1, 2, ....

Then

  μ̂n →d μ̂.

B.3 Notation and assumptions

Let y be a p-variate random vector with cdf F and p > 1. The spatial median of F
minimizes the objective function

D(μ ) = E{|y − μ | − |y|}.

(Note that no moment assumptions are needed as D(μ ) ≤ |μ |.) We wish to test the
null hypothesis H0 : μ = 0 and also estimate the unknown value of μ .

Let y1 , ..., yn be a random sample from a p-variate distribution F, p > 1. Write

Dn (μ ) = ave{|yi − μ | − |yi|}.

The function Dn (μ ) as well as D(μ ) is convex and bounded. The sample spatial
median is defined as
μ̂ = arg min Dn (μ ).
We also define the vector- and matrix-valued functions

  U(y) = y/|y|,  A(y) = (1/|y|)(Ip − yy′/|y|²),  and  B(y) = yy′/|y|²

for y ≠ 0, and U(0) = 0 and A(0) = B(0) = 0. The statistic

  Tn = ave{U(yi)}

is then the spatial sign test statistic for testing the null hypothesis that the spatial
median is zero.
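For concreteness, the sign statistic Tn and the sample spatial median are easy to compute in base R by direct minimization of the convex objective Dn; a minimal sketch (the MNM package provides full implementations):

U <- function(y) { r <- sqrt(sum(y^2)); if (r > 0) y / r else 0 * y }

spatial.median <- function(Y) {
  Dn <- function(mu)
    mean(sqrt(rowSums(sweep(Y, 2, mu)^2)) - sqrt(rowSums(Y^2)))
  optim(colMeans(Y), Dn)$par                # Nelder-Mead from the mean vector
}

set.seed(4)
Y <- matrix(rnorm(200), 100, 2)
Tn <- colMeans(t(apply(Y, 1, U)))           # ave{ U(y_i) }
mu.hat <- spatial.median(Y)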
Assumption 11 We assume that (i) the density function f of y is continuous and
bounded in an open neighborhood of the origin, and that (ii) the spatial median of
the distribution of y is zero and unique; that is,

  D(μ) > 0, ∀μ ≠ 0.

First note the following.


Lemma B.1. If the density function f of y is continuous and bounded in an open
neighborhood of the origin then E(|y|−α ) < ∞ for all 0 ≤ α < 2.
We also write
A = E {A(y)} and B = E {B(y)} .
The expectation defining B clearly exists and is bounded (|B(y)| = 1). Our assump-
tion implies that E(|y|−1 ) < ∞ and therefore also A exists and is bounded. Auxiliary
results in Section B.1 and Lemma B.1 then imply the following.
Lemma B.2. Under our assumptions,
  D(μ) = (1/2) μ′Aμ + o(|μ|²).
See also Lemma 19 in Arcones (1998).

B.4 Limiting results for spatial median

Lemma B.3. μ̂ → 0 almost surely.

Proof.
1. First, Dn(μ) → D(μ) almost surely for all μ (LLN).
2. As Dn(μ) and D(μ) are bounded and convex, also

     sup_{|μ|≤C} |Dn(μ) − D(μ)| → 0 almost surely

   for all C > 0 (Theorem 10.8 in Rockafellar (1970)).
3. Write

     μ̂* = arg min_{|μ|≤C} Dn(μ)

   for some C. Then Dn(μ̂*) → 0 and μ̂* → 0 almost surely. This is seen as

     Dn(μ̂*) ≤ Dn(0) → D(0) ≤ D(μ̂*)

   almost surely and Dn(μ̂*) − D(μ̂*) → 0 almost surely.



4. Finally, we show that |μ̂| ≤ C almost surely. Write

     δ = inf_{|μ|≥C} D(μ).

   If |μ| ≥ C then, for any 0 < ε < δ,

     Dn(μ) > δ − ε > Dn(0) ≥ Dn(μ̂*)

   almost surely, and the result follows.

Using our results in Sections B.1 and B.2 we easily get the following.

Lemma B.4. Under our assumptions, √n Tn →d Np(0, B),

and

Lemma B.5. Under our assumptions,

  nDn(n−1/2 μ) − ((1/2)Aμ − √n Tn)′μ →P 0.

Then Theorem B.5 implies the next theorem.

Theorem B.6. Under our assumptions,

  √n μ̂ →d Np(0, A−1BA−1).

The proof was constructed in the multivariate case (p > 1). The univariate case can be proved in the same way; the matrix A is then replaced by the scalar a = 2 f(0) and B by b = 1.
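The limiting covariance matrix is easily estimated by plugging the centered observations into the defining expressions for A and B; a minimal sketch (it assumes that no observation coincides with the estimate μ̂):

sandwich.cov <- function(Y, mu.hat) {
  # estimates A^{-1} B A^{-1} / n, the approximate covariance of mu.hat
  n <- nrow(Y); p <- ncol(Y)
  E <- sweep(Y, 2, mu.hat)
  r <- sqrt(rowSums(E^2))
  U <- E / r                                # spatial signs of centered data
  B <- crossprod(U) / n                     # ave{ u_i u_i' }
  A <- matrix(0, p, p)
  for (i in 1:n) A <- A + (diag(p) - tcrossprod(U[i, ])) / r[i]
  A <- A / n                                # ave{ (I_p - u_i u_i')/r_i }
  solve(A) %*% B %*% solve(A) / n
}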

B.5 Limiting results for the multivariate regression estimate

We now assume that

  yi = β′xi + εi,  i = 1, ..., n,

where ε1, ..., εn are iid from a distribution satisfying Assumption 11. The q-variate design variables satisfy the following assumption.

Assumption 12 Let X = (x1, ..., xn)′ be the (fixed) n × q design matrix. We assume that

  (1/n) X′X → D

and

  max_{1≤i≤n} {xi′C′Cxi} / ∑_{i=1}^{n} {xi′C′Cxi} → 0 for all p × q matrices C.

It is not a restriction to assume that β = 0 so that y1 , ..., yn also satisfy


Assumption 11. Write  
Tn = ave ui xi
which can be used to test the null hypothesis H0 : β = 0. Consider now the objective
function
Dn (β ) = AVE{|yi − μ i | − |yi |},
where
μi = β  x i , i = 1, ..., n.
The function Dn (β ) is again convex, and the solution is unique if there is no p × q-
matrix C such that yi = Cxi for all i = 1, ..., n. The multivariate regression estimate
is then defined as
β̂ = arg min Dn (β ).
Again, using our results in Sections B.1 and B.2, we obtain the following.

Lemma B.6. Under our assumptions, n vec(Tn ) →d N p (0, D ⊗ B).

Lemma B.7. Under our assumptions,


 
√ 1
nDn (n−1/2β ) − vec nTn − Aβ D vec(β ) →P 0.
2

Then Theorem B.5 implies the following.


Theorem B.7. Under our assumptions,

nvec(β̂ ) →d N pq (0, D−1 ⊗ A−1BA−1 ).
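The estimate can be computed by minimizing the convex objective numerically; the sketch below (illustrative only, with a least squares starting value; the MNM package provides full implementations) treats the objective as smooth, which it is except at exact fits:

spatial.regression <- function(Y, X) {
  # minimizes AVE{ |y_i - beta' x_i| } over the q x p matrix beta
  q <- ncol(X); p <- ncol(Y)
  Dn <- function(b) {
    beta <- matrix(b, q, p)
    mean(sqrt(rowSums((Y - X %*% beta)^2)))
  }
  b0 <- as.vector(solve(crossprod(X), crossprod(X, Y)))  # LS start
  matrix(optim(b0, Dn, method = "BFGS")$par, q, p)
}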
References

Anderson, T.W. (1999). Asymptotic theory for canonical correlation analysis. Jour-
nal of Multivariate Analysis, 70, 1–29.
Anderson, T.W. (2003). An Introduction to Multivariate Statistical Analysis. Third
Edition, Wiley, New York.
Arcones, M.A. (1998). Asymptotic theory for M-estimators over a convex kernel.
Econometric Theory, 14, 387–422.
Arcones, M.A., Chen, Z., and Gine, E. (1994). Estimators related to U-processes
with applications to multivariate medians: Asymptotic normality. Annals of
Statistics, 22, 1460–1477.
Azzalini, A. (2005). The skew-normal distribution and related multivariate families.
Scandinavian Journal of Statistics, 32, 159–188.
Bai, Z.D., Chen, R., Miao, B.Q., and Rao, C.R. (1990). Asymptotic theory of least
distances estimate in multivariate linear models. Statistics, 4, 503–519.
Barnett, V. (1976). The ordering of multivariate data. Journal of Royal Statistical
Society, A, 139, 318–355.
Bassett, G. and Koenker, R. (1978). Asymptotic theory of least absolute error re-
gression. Journal of the American Statistical Association, 73, 618–622.
Bickel, P.J. (1964). On some asymptotically nonparametric competitors of
Hotelling’s T 2 . Annals of Mathematical Statistics, 36, 160–173.
Bilodeau, M. and Brenner, D. (1999). Theory of Multivariate Statistics. Springer-
Verlag, New York.
Blomqvist, N. (1950). On a measure of dependence between two random variables.
Annals of Mathematical Statistics, 21, 593–600.
Blumen, I. (1958). A new bivariate sign test for location. Journal of the American
Statistical Association, 53, 448–456.
Brown, B.M. (1983). Statistical uses of the spatial median. Journal of the Royal Statistical Society, B, 45, 25–30.
Brown, B. and Hettmansperger, T. (1987). Affine invariant rank methods in the bi-
variate location model. Journal of the Royal Statististical Society, B 49, 301–310.
Brown, B. and Hettmansperger, T. (1989). An affine invariant bivariate version of
the sign test. Journal of the Royal Statistical Society, B 51, 117–125.
Brown, B.M., Hettmansperger, T.P., Nyblom, J., and Oja, H. (1992). On certain
bivariate sign tests and medians. Journal of the American Statistical Association,
87, 127–135.
Busarova, D., Tyurin, Y., Möttönen, J., and Oja, H. (2006). Multivariate Theil esti-
mator with the corresponding test. Mathematical Methods of Statistics, 15, 1–19.
Chakraborty, B. (1999). On multivariate median regression. Bernoulli, 5, 683–703.
Chakraborty, B. (2003). On multivariate quantile regression. Journal of Statistical
Planning and Inference, 110, 109–132.
Chakraborty, B. and Chaudhuri, P. (1996). On the transformation and retransformation technique for constructing affine equivariant multivariate median. Proceedings of the American Mathematical Society, 124, 2539–2547.


Chakraborty, B. and Chaudhuri, P. (1998). On an adaptive transformation retransformation estimate of multivariate location. Journal of the Royal Statistical Society, B, 60, 147–157.
Chakraborty, B. and Chaudhuri, P. (1999). On affine invariant sign and rank tests in
one sample and two sample multivariate problems. In: Multivariate, Design and
Sample Survey (ed. S. Ghosh). Marcel-Dekker, New York. pp. 499–521.
Chakraborty, B., Chaudhuri, P., and Oja, H. (1998). Operating transformation re-
transformation on spatial median and angle test. Statistica Sinica, 8, 767–784.
Chaudhuri, P. (1992). Multivariate location estimation using extension of R-
estimates through U-statistics type approach, Annals of Statistics, 20, 897–916.
Chaudhuri, P. (1996). On a geometric notion of quantiles for multivariate data. Jour-
nal of the American Statistical Society, 91, 862–872.
Chaudhuri, P. and Sengupta, D. (1993). Sign tests in multidimension: Inference
based on the geometry of data cloud. Journal of the American Statistical Soci-
ety, 88, 1363–1370.
Choi, K. and Marden, J. (1997). An approach to multivariate rank tests in multivari-
ate analysis of variance. Journal of the American Statistical Society, 92, 1581–
1590.
Croux, C. and Haesbrock, G. (2000). Principal component analysis based on ro-
bust estimators of the covariance or correlation matrix: Influence functions and
efficiencies. Biometrika, 87, 603–618.
Croux, C., Ollila, E. and Oja, H. (2002). Sign and rank covariance matrices: Statisti-
cal properties and application to principal component analysis. In Statistical Data
Analysis Based on the L1-Norm and Related Methods (ed. by Yadolah Dodge)
Birkhäuser, Basel, pp. 257–270.
Dalgaard, P. (2008). Introductory Statistics with R, Second Edition, Springer, New
York.
DasGupta, S. (1999a). Lawley-Hotelling trace. In: Encyclopedia of Biostatistics,
Wiley, New York.
DasGupta, S. (1999b). Wilks’ Lambda criterion. In: Encyclopedia of Biostatistics,
Wiley, New York.
Datta, S. and Satten, G. A. (2005). Rank-sum tests for clustered data. Journal of the
American Statistical Association, 100, 908–915.
Datta, S. and Satten, G. A. (2008). A signed-rank test for clustered data. Biometrics,
64, 501–507.
Davies, P.L. (1987). Asymptotic behavior of S-estimates of multivariate location
parameters and dispersion matrices. Annals of Statistics, 15, 1269–1292.
Davis, J. B. and McKean, J. (1993). Rank-based methods for multivariate linear
models. Journal of the American Statistical Association, 88, 245–251.
Davis, R. A., Knight, K. and Liu, J. (1992). M-estimation for autoregression with
infinite variance. Stochastic Processes and Their Applications, 40, 145–180.
Dietz, E.J. (1982). Bivariate nonparametric tests for the one-sample location prob-
lem. Journal of the American Statistical Association 77, 163–169.

Donoho, D.L. and Huber, P.J. (1983). The notion of breakdown point. In: A Festschrift for Erich L. Lehmann (ed. P.J. Bickel, K.A. Doksum and J.L. Hodges), Wadsworth, Belmont, pp. 157–184.
Dümbgen, L. (1998). On Tyler’s M-functional of scatter in high dimension. Annals
of the Institute of Statistal Mathematics, 50, 471–491.
Dümbgen, L. and Tyler, D. (2005). On the breakdown properties of some multivari-
ate M-functionals. Scandinavian Journal of Statistics, 32, 247–264.
Everitt, B. (2004). An R and S-PLUS Companion to Multivariate Analysis. London:
Springer.
Frahm, G. (2004). Generalized Elliptical Distributions: Theory and Applications. Doctoral Thesis, Universität zu Köln, Wirtschafts- und Sozialwissenschaftliche Fakultät, Seminar für Wirtschafts- und Sozialstatistik.
Gieser, P.W. and Randles, R.H. (1997). A nonparametric test of independence be-
tween two vectors. Journal of the American Statistical Association, 92, 561–567.
Gini, C. and Galvani, L. (1929). Di talune estensioni dei concetti di media ai caratteri qualitativi. Metron, 8.
Gómez, E., Gómez-Villegas, M.A., and Marı́n, J.M. (1998). A multivariate gener-
alization of the power exponential family of distributions. Communications in
Statististics -Theory and Methods, 27, 3, 589–600.
Gower, J.C. (1974). The mediancentre. Applied Statistics, 23, 466–470.
Haataja, R., Larocque, D., Nevalainen, J., and Oja, H. (2008). A weighted multivari-
ate signed-rank test for cluster-correlated data. Journal of Multivariate Analysis,
100, 1107–1119.
Haldane, J.B.S. (1948). Note on the median of a multivariate distribution. Biometrika, 35, 414–415.
Hallin, M. and Paindaveine, D. (2002). Optimal tests for multivariate location based
on interdirections and pseudo-Mahalanobis ranks. Annals of Statistics, 30, 1103–
1133.
Hallin, M. and Paindaveine, D. (2006). Semiparametrically efficient rank-based in-
ference for shape. I. Optimal rank-based tests for sphericity. Annals of Statistics,
34, 2707–2756.
Hampel, F.R. (1968). Contributions to the theory of robust estimation. Ph.D. Thesis,
University of California, Berkeley.
Hallin, M., Oja, H., and Paindaveine, D. (2006). Semiparametrically efficient rank-
based inference for shape. II. Optimal R-estimation of shape. Annals of Statistics,
34, 2757–2789.
Hampel, F.R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69, 383–393.
Hampel, F.R., Rousseeuw, P.J., Ronchetti, E.M., and Stahel, W.A. (1986). Robust
Statistics: The Approach Based on Influence Functions. Wiley, New York.
Hettmansperger, T.P. and Aubuchon, J.C. (1988). Comment on “Rank-based robust
analysis of linear models. I. Exposition and review” by David Draper. Statistical
Science, 3, 262–263.
Hettmansperger, T.P. and McKean, J.W. (1998). Robust Nonparametric Statistical
Methods. Arnold, London.

Hettmansperger, T.P. and Oja, H. (1994). Affine invariant multivariate multisample


sign tests. Journal of the Royal Statistical Society, B, 56, 235–249.
Hettmansperger, T.P. and Randles, R.H. (2002). A practical affine equivariant mul-
tivariate median. Biometrika, 89, 851–860.
Hettmansperger, T. P. , Möttönen, J., and Oja, H. (1997). Multivariate affine invariant
one-sample signed-rank tests. Journal of the American Statistical Association, 92,
1591–1600.
Hettmansperger, T. P. , Möttönen, J., and Oja, H. (1998). Multivariate affine invariant
rank tests for several samples. Statistica Sinica, 8, 785–800.
Hettmansperger, T. P. , Möttönen, J., and Oja, H. (1999). The geometry of the affine
invariant multivariate sign and rank methods. Journal of Nonparametric Statis-
tics, 11, 271–285.
Hettmansperger, T.P., Nyblom, J., and Oja, H. (1992). On multivariate notions of
sign and rank. L1-Statistical Analysis and Related Methods, 267–278. Ed. by Y.
Dodge. Elsevier, Amsterdam.
Hettmansperger, T.P., Nyblom , J., and Oja, H. (1994). Affine invariant multivariate
one-sample sign test. Journal of the Royal Statistical Society, B, 56, 221–234.
Hodges, J.L. (1955). A bivariate sign test. Annals of Mathematical Statistics, 26, 523–527.
Hollander, M. and Wolfe, D.A. (1999). Nonparametric Statistical Methods. 2nd ed.
Wiley, New York.
Huber, P.J. (1981). Robust Statistics. Wiley, New York.
Hyvärinen, A., Karhunen, J., and Oja, E. (2001). Independent Component Analysis.
Wiley, New York.
Hössjer, O. and Croux, C. (1995). Generalizing univariate signed rank statistics for
testing and estimating a multivariate location parameter. Journal of Nonparamet-
ric Statistics, 4, 293–308.
Jan, S.L. and Randles, R.H. (1994). A multivariate signed sum test for the one sam-
ple location problem. Journal of Nonparametric Statistics, 4, 49–63.
Jan, S.L. and Randles, R.H. (1996). Interaction tests for simple repeated measures
designs. Journal of the American Statistical Association, 91, 1611–1618.
John, S. (1971). Some optimal multivariate tests. Biometrika, 58, 123–127.
John, S. (1972). The distribution of a statistic used for testing sphericity of normal
distributions. Biometrika, 59, 169–173.
Johnson, R.A. and Wichern, D.W. (1998). Multivariate Statistical Analysis. Prentice
Hall, Upper Saddle River, NJ.
Kankainen, A., Taskinen, S., and Oja, H. (2007). Tests of multinormality based
on location vectors and scatter matrices. Statistical Methods & Applications, 16,
357–379.
Kendall, M.G. (1938). A new measure of rank correlation. Biometrika, 30, 81–93.
Kent, J. and Tyler, D. (1996). Constrained M-estimation for multivariate location
and scatter. Annals of Statistics, 24, 1346–1370.
Koltchinskii, V. I. (1997). M-estimation, convexity and quantiles. Annals of Statis-
tics, 25, 435–477.
Koshevoy, G. and Mosler, K. (1997a). Multivariate Gini indices. Journal of Multi-
variate Analysis, 60, 252–276.

Koshevoy, G. and Mosler, K. (1997b). Zonoid trimming for multivariate distributions. Annals of Statistics, 25, 1998–2017.
Koshevoy, G. and Mosler, K. (1998). Lift zonoids, random convex hulls and the
variability of random vectors. Bernoulli, 4, 377–399.
Koshevoy, G., Möttönen, J., and Oja, H. (2003). A scatter matrix estimate based on
the zonotope. Annals of Statistics, 31, 1439–1459.
Koshevoy, G., Möttönen, J., and Oja, H. (2004). On the geometry of multivariate L1
objective functions. Allgemeines Statistisches Archiv, 88, 137–154.
Larocque, D. (2003). An affine-invariant multivariate sign test for cluster correlated
data. The Canadian Journal of Statistics, 31, 437–455.
Larocque, D., Nevalainen, J., and Oja, H. (2007). A weighted multivariate sign test
for cluster correlated data. Biometrika, 94, 267–283.
Larocque, D., Nevalainen, J., and Oja, H. (2007). One-sample location tests for multilevel data. Journal of Statistical Planning and Inference, 137, 2469–2482.
Larocque, D., Haataja, R., Nevalainen, J., and Oja, H. (2010). Two sample tests
for the nonparametric Behrens-Fisher problem with clustered data. Journal of
Nonparametric Statistics, to appear.
Larocque, D., Tardif, S. and van Eeden, C. (2000). Bivariate sign tests based on
the sup, L1 and L2-norms. Annals of the Institute of Statistical Mathematics 52,
488–506.
Lehmann, E.L. (1998). Nonparametrics: Statistical Methods Based on Ranks. Pren-
tice Hall, Englewood Cliffs, NJ.
Liu, R.Y. (1990). On a notion of data depth based upon random simplices. Annals of Statistics, 18, 405–414.
Liu, R.Y. (1992). Data depth and multivariate rank tests. In: L1-Statistical Analysis and Related Methods (ed. Y. Dodge), Elsevier, Amsterdam, pp. 279–294.
Liu, R. and Singh, K. (1993). A quality index based on data depth and multivariate rank tests. Journal of the American Statistical Association, 88, 252–260.
Liu, R.Y., Parelius, J.M., and Singh, K. (1999). Multivariate analysis by data depth: Descriptive statistics, graphics and inference (with discussion). Annals of Statistics, 27, 783–858.
Locantore, N., Marron, J.S., Simpson, D.G., Tripoli, N., Zhang, J.T., and Cohen,
K.L. (1999). Robust principal components for functional data. Test, 8, 1–73.
Lopuhaä, H.P. (1989). On the relation between S-estimators and M-estimators of
multivariate location and covariance. Annals of Statistics, 17, 1662–1683.
Lopuhaä, H.P. and Rousseeuw, P.J. (1991). Breakdown properties of affine equivari-
ant estimators of multivariate location and covariance matrices. Annals of Statis-
tics, 19, 229–248.
Magnus, J.R. and Neudecker, H. (1988). Matrix Differential Calculus with Applica-
tions in Statistics and Economics. Wiley, New York.
Mardia, K.V. (1967). A nonparametric test for the bivariate location problem. Journal of the Royal Statistical Society, B, 29, 320–342.
Mardia, K.V. (1972). Statistics of Directional Data. Academic Press, London.
Mardia, K.V., Kent, J.T., and Bibby, J.M. (1979). Multivariate Analysis. Academic
Press, Orlando, FL.

Marden, J. (1999a). Multivariate rank tests. In: Multivariate Analysis, Design of Experiments, and Survey Sampling (ed. S. Ghosh), M. Dekker, New York, pp. 401–432.
Marden, J. (1999b). Some robust estimates of principal components. Statistics &
Probability Letters, 43, 349–359.
Maronna, R.A. (1976). Robust M-estimators of multivariate location and scatter.
Annals of Statistics, 4, 51–67.
Maronna, R.A., Martin, R.D., and Yohai, V. Y. (2006). Robust Statistics. Theory and
Methods. Wiley, New York.
Mauchly, J.W. (1940). Significance test for sphericity of a normal n-variate distri-
bution. Annals of Mathematical Statistics, 11, 204–209.
McLachlan, G. and Peel, D. (2000). Finite Mixture Models. Wiley, New York.
Milasevic, P. and Ducharme, G.R. (1987). Uniqueness of the spatial median. Annals
of Statistics, 15, 1332–1333.
Mood, A.M. (1954). On the asymptotic efficiency of certain nonparametric two-
sample tests. Annals of Mathematical Statistics, 25, 514–533.
Mosler, K. (2002). Multivariate Dispersion, Central Regions, and Depth: The Lift
Zonoid Approach. Springer, New York.
Möttönen, J. and Oja, H. (1995). Multivariate spatial sign and rank methods. Jour-
nal of Nonparametric Statistics, 5, 201–213.
Möttönen, J., Hüsler, J., and Oja, H. (2003). Multivariate nonparametric tests
in randomized complete block design. Journal of Multivariate Analysis, 85,
106–129.
Möttönen, J., Hettmansperger, T.P., Oja, H., and Tienari, J. (1998). On the efficiency
of the affine invariant multivariate rank tests. Journal of Multivariate Analysis, 66,
118–132.
Möttönen, J., Oja, H., and Serfling, R. (2005). Multivariate generalized spatial
signed-rank methods. Journal of Statistical Research, 39, 19–42.
Möttönen, J., Oja, H., and Tienari, J. (1997). On the efficiency of multivariate spatial
sign and rank tests. Annals of Statistics, 25, 542–552.
Muirhead, R.J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York.
Muirhead, R.J. and Waternaux, C.M. (1980). Asymptotic distributions in canonical
correlation analysis and other multivariate procedures for nonnormal populations.
Biometrika, 67, 31–43.
Nevalainen, J. and Oja, H. (2006). SAS/IML macros for a multivariate analysis of
variance based on spatial signs. Journal of Statistical Software, 16, Issue 5.
Nevalainen, J., Larocque, D., and Oja, H. (2007a). A weighted spatial median for clustered data. Statistical Methods & Applications, 15, 355–379.
Nevalainen, J., Larocque, D., and Oja, H. (2007b). On the multivariate spatial median for clustered data. Canadian Journal of Statistics, 35, 215–231.
Nevalainen, J., Larocque, D., Oja, H., and Pörsti, I. (2009). Nonparametric analysis
of clustered multivariate data. Journal of the American Statistical Association,
under revision.
Nevalainen, J., Möttönen, J., and Oja, H. (2007). A spatial rank test and corre-
sponding rank estimates for several samples. Statistics & Probability Letters, 78,
661–668.

Niinimaa, A. and Oja, H. (1995). On the influence function of certain bivariate me-
dians. Journal of the Royal Statistical Society, B, 57, 565–574.
Niinimaa, A. and Oja, H. (1999). Multivariate median. In: Encyclopedia of Statis-
tical Sciences (Update Volume 3) (ed. S. Kotz, N.L. Johnson, and C.P. Read),
Wiley, New York.
Nordhausen, K., Oja, H., and Paindaveine, D. (2009). Signed-rank tests for location
in the symmetric independent component model. Journal of Multivariate Analy-
sis, 100, 821–834.
Nordhausen, K., Oja, H. and Ollila, E. (2009). Multivariate Models and the First
Four Moments. In: Festschrift in Honour of Tom Hettmansperger, to appear.
Nordhausen, K., Oja, H., and Tyler, D. (2006). On the efficiency of invariant multi-
variate sign and rank tests. In: Festschrift for Tarmo Pukkila on his 60th Birthday
(ed. E. Liski, J. Isotalo, J. Niemelä, S. Puntanen, and G. Styan).
Oja, H. (1983). Descriptive statistics for multivariate distributions. Statistics &
Probability Letters 1, 327–332.
Oja, H. (1987). On permutation tests in multiple regression and analysis of covari-
ance problems. Australian Journal of Statistics 29, 81–100.
Oja, H. (1999). Affine invariant multivariate sign and rank tests and corresponding
estimates: a review. Scandinavian Journal of Statistics 26, 319–343.
Oja, H. and Niinimaa, A. (1985). Asymptotical properties of the generalized median
in the case of multivariate normality. Journal of the Royal Statistical Society, B,
47, 372–377.
Oja, H. and Nyblom, J. (1989). On bivariate sign tests. Journal of the American
Statistical Association, 84, 249–259.
Oja, H. and Paindaveine, D. (2005). Optimal signed-rank tests based on hyper-
planes. Journal of Statistical Planning and Inference, 135, 300–323.
Oja, H. and Randles, R.H. (2004). Multivariate nonparametric tests. Statistical Sci-
ence, 19, 598–605.
Oja, H., Paindaveine, D., and Taskinen, S. (2009). Parametric and nonparametric
tests for multivariate independence in the independence component model. Sub-
mitted.
Oja, H., Sirkiä, S., and Eriksson, J. (2006). Scatter matrices and independent com-
ponent analysis. Austrian Journal of Statistics, 35, 175–189.
Ollila, E., Croux, C., and Oja, H. (2004). Influence function and asymptotic ef-
ficiency of the affine equivariant rank covariance matrix. Statistica Sinica, 14,
297–316.
Ollila, E., Hettmansperger, T.P., and Oja, H. (2002). Estimates of regression coeffi-
cients based on sign covariance matrix. Journal of the Royal Statistical Society,
B, 64, 447–466.
Ollila, E., Oja, H., and Croux, C. (2003b). The affine equivariant sign covariance
matrix: Asymptotic behavior and efficiency. Journal of Multivariate Analysis, 87,
328–355.
Ollila, E., Oja, H., and Koivunen, V. (2003). Estimates of regression coefficients
based on rank covariance matrix. Journal of the American Statistical Association,
98, 90–98.

Paindaveine, D. (2008). A canonical definition of shape. Statistics & Probability Letters, 78, 2240–2247.
Pesarin, F. (2001). Multivariate Permutation Tests with Applications in Biostatistics. Wiley, Chichester.
Peters, D. and Randles, R.H. (1990). A multivariate signed-rank test for the one-sample location problem. Journal of the American Statistical Association, 85, 552–557.
Peters, D. and Randles, R.H. (1990). A bivariate signed rank test for the two-sample
location problem. Journal of the Royal Statistical Society, B, 53, 493–504.
Pillai, K.C.S. (1955). Some new test criteria in multivariate analysis. Annals of
Mathematical Statistics, 26, 117–121.
Portnoy, S. and Koenker, R. (1997). The Gaussian hare and the Laplacian tortoise:
Computability of squared-error versus absolute-error estimators. Statistical Sci-
ence, 12, 279–300.
Puri, M.L. and Sen, P.K. (1971). Nonparametric Methods in Multivariate Analysis.
Wiley, New York.
Puri, M.L. and Sen, P.K. (1985). Nonparametric Methods in General Linear Models.
Wiley, New York.
Randles, R.H. (1989). A distribution-free multivariate sign test based on interdirec-
tions. Journal of the American Statistical Association, 84, 1045–1050.
Randles, R.H. (1992). A Two Sample Extension of the Multivariate Interdirection
Sign Test. L1-Statistical Analysis and Related Methods (ed. Y. Dodge), Elsevier,
Amsterdam, pp. 295–302.
Randles, R.H. (2000). A simpler, affine equivariant multivariate, distribution-free
sign test. Journal of the American Statistical Association, 95, 1263–1268.
Randles, R.H. and Peters, D. (1990). Multivariate rank tests for the two-sample
location problem. Communications in Statistics – Theory and Methods, 19,
4225–4238.
Rao, C.R. (1948). Tests of significance in multivariate analysis. Biometrika 35,
58–79.
Rao, C.R. (1988). Methodology based on the L1-norm in statistical inference. Sankhya, A, 50, 289–311.
Rockafellar, R.T. (1970). Convex Analysis. Princeton University Press, Princeton,
NJ.
Rosner, B. and Grove, D. (1999). Use of the Mann-Whitney U-test for clustered
data. Statistics in Medicine, 18, 1387–1400.
Rosner, B., Grove, D., and Ting Lee, M.-L. (2003). Incorporation of clustering ef-
fects for the Wilcoxon rank sum test: A large sample approach. Biometrics, 59,
1089–1098.
Rosner, B., Grove, D., and Ting Lee, M.-L. (2006). The Wilcoxon signed rank test
for paired comparisons of clustered data. Biometrics, 62, 185–192.
Seber, G.A.F. (1984). Multivariate Observations. Wiley, New York.
Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. Wiley,
New York.

Serfling, R.J. (2004). Nonparametric multivariate descriptive measures based on spatial quantiles. Journal of Statistical Planning and Inference, 123, 259–278.
Shen, G. (2008). Asymptotics of Oja median estimate. Statistics and Probability
Letters, 78, 2137–2141.
Shen, G. (2009). Asymptotics of a Theil-type estimate in multiple linear regression.
Statistics and Probability Letters, 79, 1053–1064.
Sirkiä, S., Taskinen, S., and Oja, H. (2007). Symmetrised M-estimators of scatter.
Journal of Multivariate Analysis, 98, 1611–1629.
Sirkiä, S., Taskinen, S., Oja, H. and Tyler, D. (2008). Tests and estimates of
shape based on spatial signs and ranks. Journal of Nonparametric Statistics, 21,
155–176.
Small, C.G. (1990). A survey of multidimensional medians. International Statistical
Review, 58, 263–277.
Spearman, C. (1904). The proof and measurement of association between two
things. American J. Psychology, 15, 72–101.
Spjøtvoll, E. (1968). A note on robust estimation in analysis of variance. Annals of
Mathematical Statistics, 39, 1486–1492.
Sukhanova, E., Tyurin, Y., Möttönen, J., and Oja, H. (2009). Multivariate tests of
independence based on matrix rank and sign correlations. Manuscript.
Tamura, R. (1966). Multivariate nonparametric several-sample tests. Annals of
Mathematical Statistics, 37, 611–618.
Taskinen, S., Croux, C., Kankainen, A., Ollila, E., and Oja, H. (2006). Influence
functions and efficiencies of the canonical correlation and vector estimates based
on scatter and shape matrices. Journal of Multivariate Analysis, 97, 359–384.
Taskinen, S., Kankainen, A., and Oja, H. (2002). Tests of independence based on
sign and rank covariances. In: Developments in Robust Statistics (ed. R. Dutter,
P. Filzmoser, U. Gather and P.J. Rousseeuw), Springer, Heidelberg, pp. 387–403.
Taskinen, S., Kankainen, A., and Oja, H. (2003). Sign test of independence between
two random vectors. Statistics & Probability Letters, 62, 9–21.
Taskinen, S., Kankainen, A., and Oja, H. (2003). Rank scores tests of multivariate
independence. In: Theory and Applications of Recent Robust Methods (ed. M. Hu-
bert, G. Pison, A. Stryuf and S. Van Aelst), Birkhauser, Basel, pp. 153–164.
Taskinen, S., Oja, H., and Randles, R.H. (2005). Multivariate nonparametric tests of
independence. Journal of the American Statistical Association, 100, 916–925.
Tatsuoka, K.S. and Tyler, D. (2000). On the uniqueness of S-functionals and M-functionals under nonelliptic distributions. Annals of Statistics, 28, 1219–1243.
Tukey, J.W. (1975). Mathematics and the picturing of data. In: Proceedings of the International Congress of Mathematicians, Vol. 2, Vancouver, 1974, pp. 523–531.
Tyler, D.E. (1982). Radial estimates and the test for the sphericity, Biometrika, 69,
429–436.
Tyler, D.E. (1983). Robustness and efficiency properties of scatter matrices,
Biometrika, 70, 411–420.
Tyler, D.E. (1987). A distribution-free M-estimator of multivariate scatter. Annals
of Statistics, 15, 234–251.

Tyler, D.E. (2002). High breakdown point multivariate M-estimation. Estadistica, 54, 213–247.
Tyler, D., Critchley, F., Dümbgen, L., and Oja, H. (2009). Invariant co-ordinate selection. Journal of the Royal Statistical Society, B, 71, 549–592.
Vardi, Y. and Zhang, C.-H. (2001). A modified Weiszfeld algorithm for the Fermat-Weber location problem. Mathematical Programming, 90, 559–566.
Venables, W.N. Smith, D.M., and the R Development Core Team (2009). An Intro-
duction to R. Notes on R: A Programming Environment for Data Analysis and
Graphics. Version 2.10.0 (2009-10-26), available at http://www.r-project.org/
Visuri, S., Koivunen, V., and Oja, H. (2000). Sign and rank covariance matrices.
Journal of Statistical Planning and Inference, 91, 557–575.
Visuri, S., Ollila, E., Koivunen, V., Möttönen, J. and Oja, H. (2003). Affine equiv-
ariant multivariate rank methods. Journal of Statistical Planning and Inference,
114, 161–185.
Weber, A. (1909). Über den Standort der Industrien, Tübingen.
Wilks, S.S. (1935). On the independence of k sets of normally distributed statistical
variates, Econometrica, 3, 309–326.
Zhou, W. (2009). Asymptotics of the multivariate Wilcoxon regression estimates.
Under revision.
Zuo, Y. and Serfling, R. (2000). General notions of statistical depth function, Annals
of Statistics, 28, 461–482.
Index

L1 estimation, 190, 194
L2 estimation, 188
L2 regression, 187

breakdown point, 23

canonical correlation, 141
clustered data, 201
computation
  L1 regression estimate, 191, 195
  shape matrix, 110
  spatial Hodges-Lehmann estimate, 91
  spatial median, 71, 76

data
  air pollution, 124
  bean plants, 180
  cork borings, 47
  Egyptian skulls, 163
  LASERI, 138
dedication, v

eigenvalue, 211
eigenvector, 211
estimate
  k-step, 77
  Dümbgen, 115
  Hettmansperger and Randles, 75
  transformation retransformation, 75
  treatment effect, 176
  Tyler, 113

Friedman test, 174

Hotelling's T²
  one sample, 52
  two samples, 150

independence
  test for, 133
independent component analysis, 26
influence function, 24
inner
  centering, 31, 146
  centering and standardization, 31, 110, 146
  standardization, 31, 50, 110
interdirection, 45, 69, 94, 143
invariant coordinate selection, 26

John's test, 111

Kronecker product, 211
Kruskal-Wallis test, 157

Lawley-Hotelling's trace, 149
lift zonotope, 45
lift-interdirection, 94
location statistic, 18
location vector, 15

M-functional, 17
M-statistic, 19
matrix
  commutation, 213
  determinant, 211
  diagonal, 210
  identity, 210
  inverse, 210
  norm, 211
  orthogonal, 211
  permutation, 210
  positive definite, 211
  projection, 210
  trace, 211
  transpose, 209
median
  half space, 80
  Liu, 81
  Oja, 80
mixed model, 201
model
  elliptical, 8
  generalized elliptical, 12
  independent component, 12
  multivariate t, 10
  multivariate Cauchy, 10
  multivariate location-scatter, 11
  multivariate mixture, 12
  multivariate normal, 9
  multivariate power exponential, 10
  multivariate skew elliptical, 13
Mood's test, 153
multivariate analysis of variance, 147, 173
multivariate kurtosis, 27
multivariate regression, 183
multivariate skewness, 27

notation, xiii

Oja
  sign test, 94
  signed-rank test, 94
  signs and ranks, 45, 170
outer
  centering, 30
  centering and standardization, 31
  standardization, 30, 49, 146

Page test, 181
Pillai's trace, 133, 149
Pitman asymptotic relative efficiency, 97, 117, 138, 176
preface, vii
principal component analysis, 26, 42, 123
projection matrix, 51, 109, 150

quadrant test, 133

randomized blocks, 171
Rayleigh's statistic, 68
regression equivariance, 188

scale parameter, 108
scatter matrix, 16, 108
scatter statistic, 18
shape matrix, 107
sign-change matrix, 51
spatial
  Hodges-Lehmann estimate
    one sample, 90, 91
    two samples, 162
  Kendall's tau, 41, 114
  median, 70, 75, 217
    clustered data, 204
  quantile, 39
  rank, 35
  rank covariance matrix, 41
  rank test
    for shape, 115
    independence, 135, 136
    randomized blocks, 174
    regression, 193, 196
    several samples, 157, 160
    two clustered samples, 206
  sign, 35
  sign covariance matrix, 41
  sign test, 218
    clustered data, 204
    for shape, 112
    independence, 134
    one sample, 61, 66
    regression, 190, 192
    several samples, 153, 156
  signed-rank, 35
  signed-rank test, 84, 87
  Spearman's rho, 42, 116
sphericity
  test for, 111

theorem
  Chebyshev, 216
  convex processes, 217
  Kolmogorov, 216
Tyler
  shape matrix, 65, 113
  transformation, 65

valid test, 53
vec operation, 19, 212

Wilcoxon-Mann-Whitney test, 157
Wilks' lambda, 149
Wilks' test, 133

zonotope, 45
