Model Validity
Series Editor
N. Balakrishnan
McMaster University
Department of Mathematics and Statistics
1280 Main Street West
Hamilton, Ontario L8S 4K1
Canada
Max Engelhardt
EG&G Idaho, Inc.
Idaho Falls, ID 83415
Harry F. Martz
Group A-1, MS F600
Los Alamos National Laboratory
Los Alamos, NM 87545
Gary C. McDonald
NAO Research & Development Center
30500 Mound Road
Box 9055
Warren, MI 48090-9055
Peter R. Nelson
Department of Mathematical Sciences
Clemson University
Martin Hall
Box 341907
Clemson, SC 29634-1907
Kazuyuki Suzuki
Communication & Systems Engineering Department
University of Electro-Communications
1-5-1 Chofugaoka
Chofu-shi
Tokyo 182
Japan
Goodness-of-Fit Tests and
Model Validity
C. Huber-Carol
N. Balakrishnan
M.S. Nikulin
M. Mesbah
Editors
M. S. Nikulin
Laboratoire Statistique Mathematique
Universite Bordeaux 2
33076 Bordeaux Cedex
France
and
Laboratory of Statistical Methods
V. Steklov Mathematical Institute
191011 St. Petersburg
Russia

M. Mesbah
Laboratoire de Statistique Appliquee
Universite de Bretagne Sud
56000 Vannes
France
A CIP catalogue record for this book is available from the Library of Congress, Washington D.C., USA.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher Springer Science+Business Media, LLC, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.
9 8 7 6 5 4 3 2 1
Contents
Preface xvii
Contributors xix
List of Tables xxvii
List of Figures xxxiii
3 Approximate Models 25
Peter J. Huber
3.1 Models 25
3.2 Bayesian Modeling 27
3.3 Mathematical Statistics and Approximate Models 29
3.4 Statistical Significance and Relevance 31
3.5 Composite Models 32
3.6 The Role of Simulation 38
3.7 Summary Conclusions 40
References 40
4.1 Introduction 45
4.2 Neyman Smooth Goodness-of-Fit Tests 46
4.2.1 Smooth goodness-of-fit tests for
categorized data 47
4.2.2 Partitioning the Pearson-Fisher
chi-squared statistic 48
4.3 Constructing the Pearson-Fisher Decomposition 49
4.4 Simulation Study 50
4.5 Results and Discussion 51
References 55
6.1 Introduction 65
6.2 Chi-Squared Goodness-of-Fit Test 66
6.2.1 Statistics with censoring 66
6.2.2 Goodness-of-fit test for a composite hypothesis 67
6.3 Demonstration 68
References 69
7.1 Introduction 73
7.2 Preliminary Notion 74
7.3 SOADR Results for BLUE and LSE 77
7.4 Asymptotics for $\tilde{W}$ 81
7.5 Asymptotics Under Alternatives 85
References 87
8.1 Introduction 89
8.2 Progressive Censoring 91
8.3 Test for Exponentiality 92
8.3.1 Null distribution of T 93
8.4 Power Function Approximation and Simulation
Results 95
8.4.1 Approximation of power function 95
8.4.2 Monte Carlo power comparison 97
8.5 Modified EDF and Shapiro-Wilk Statistics 98
8.6 Two-Parameter Exponential Case 99
8.7 Illustrative Examples 100
8.7.1 Example 1: One-parameter exponential case 100
8.7.2 Example 2: Two-parameter exponential case 101
8.8 Multi-Sample Extension 102
8.9 Conclusions 103
References 103
Index 505
Preface
May 2001
Contributors
Akahira, Masafumi
Institute of Mathematics, University of Tsukuba, Ibaraki 305-8571, Japan
e-mail: [email protected]
Arnold, Barry C.
Department of Statistics, University of California, Riverside, California
92521-0138, U.S.A.
e-mail: [email protected];edu
Bagdonavicius, V.
Departement de Mathematiques et Sciences Sociales, Universite Bordeaux
2, 33076 Bordeaux Cedex, France
e-mail: [email protected]
Balakrishnan, N.
Department of Mathematics and Statistics, McMaster University,
Hamilton, Ontario L8S 4K1, Canada
e-mail: [email protected]
Baraud, Y.
Ecole Normale Superieure, Paris, France
Beaver, Robert J.
Department of Statistics, University of California, Riverside, California
92521-0138, U.S.A.
e-mail: [email protected]
Bosq, Denis
Laboratoire de Probabilites, Universite Paris VI, 4, Place Jussieu, 75252
Paris Cedex 05, France
e-mail: [email protected]
Bretagnolle, Jean
Laboratoire de Statistique Appliquee, Universite de Paris XI, 91405 Orsay
Castillo, Enrique
Department of Applied Mathematics and Sciences, University of Cantabria,
E-39005 Santander, Cantabria, Spain
e-mail: [email protected]
Cox, D. R.
Department of Statistics, Nuffield College, Oxford OX1 1NF, England,
U.K.
e-mail: [email protected]
Cuadras, Carles M.
Department of Statistics, University of Barcelona, 08023 Barcelona, Spain
e-mail: [email protected]
Cuadras, Daniel
University of Barcelona, 08023 Barcelona, Spain
D'Agostino, Ralph B.
Statistics and Consulting Unit, Department of Mathematics and
Statistics, Boston University, Boston, Massachusetts 02215, U.S.A.
e-mail: [email protected]
Deheuvels, Paul
L.S.T.A., Universite Paris VI, 92340, Bourg-la-Reine, France
e-mail: [email protected]
Devarajan, Karthik
Division of Statistics, Northern Illinois University, DeKalb, Illinois 60115,
U.S.A.
Diack, Cheikh A. T.
Department of Statistics, University of Warwick, Coventry CV4 7AL,
UK
e-mail: [email protected]
Dodge, Yadolah
Groupe de Statistique, University of Neuchatel, CH-2002 Neuchatel,
Switzerland
e-mail: [email protected]
Ducharme, Gilles R.
Departement des Sciences Mathematiques, Universite Montpellier II, 34095
Montpellier Cedex 5, France
e-mail: [email protected]
Dudley, Richard M.
Department of Mathematics, Massachusetts Institute of Technology,
Cambridge, Massachusetts 02215, U.S.A.
e-mail: [email protected]
Dufour, Jean-Marie
CIRANO and CRDE, Universite de Montreal, Montreal, Quebec H3C
3J7, Canada
e-mail:
Dupuy, Jean-Francois
Department of Applied Statistics, University of South Brittany, 56000
Vannes, France
e-mail: [email protected]
Ebrahimi, Nader
Division of Statistics, University of Northern Illinois, DeKalb, Illinois
60115, U.S.A.
e-mail: [email protected]
Farhat, Abdeljelil
CIRANO, Universite de Montreal, Montreal, Quebec H3A 2A5, Canada
e-mail: [email protected]
Frichot, Benoit
Departement des Sciences Mathematiques, Universite Montpellier II, 34095
Montpellier Cedex 5, France
e-mail: [email protected]
Gerville-Reache, Leo
Departement de Mathematiques et Sciences Sociales, Universite Bordeaux
2, 33076 Bordeaux Cedex, France
e-mail: [email protected]
Gross, Shulamith T.
Laboratoire de Statistique, Universite de Paris V, 75006 Paris, France
e-mail: [email protected]
Gulati, Sneh
Department of Statistics, Florida International University, Miami, Florida
33199, U.S.A.
e-mail: [email protected]
Hamon, Agnes
Laboratoire SABRES, Universite de Bretagne-Sud, 56000 Vannes, France
e-mail: [email protected]
Haughton, Dominique M.
Mathematical Sciences, Bentley College, Waltham, Massachusetts 02452-
4705, U.S.A.
e-mail: [email protected]
Hoang, Thu
Laboratoire de Statistique Medicale, Universite de Paris V, 75006 Paris,
France
e-mail: [email protected]
Huber, P. J.
P.O. Box 198, CH-7250, Klosters, Switzerland
e-mail: [email protected]
Huber-Carol, Catherine
Universite Paris V and U472 INSERM, Paris, France
e-mail: [email protected]
Huet, S.
INRA Biometrie, 78352 Jouy-en-Josas Cedex, France
Jureckova, Jana
Statistics Department, Charles University, Czech Republic
e-mail: [email protected]
Kannan, N.
Division of Mathematics and Statistics, The University of Texas at San
Antonio, Texas 78249-0664, U.S.A.
e-mail: [email protected]
Keiding, Niels
Department of Biostatistics, University of Copenhagen, 2200 Copenhagen,
Denmark
e-mail: [email protected]
Khattree, Ravi
Department of Mathematics and Statistics, Oakland University, Rochester,
Michigan 48309-4485, U.S.A.
e-mail: [email protected]
Laurent, B.
Laboratoire de Statistique, Universite de Paris XI, 91405 Orsay Cedex,
France
e-mail: [email protected]
Mohdeb, Zaher
Departement de Mathematiques, Universite Mentouri Constantine, 25000
Constantine, Algeria
e-mail: [email protected]
Mokkadem, Abdelkader
Department of Mathematics, University of Versailles-Saint-Quentin, 78035
Versailles Cedex, France
e-mail: [email protected]
Mudholkar, G. S.
Department of Statistics, University of Rochester, Rochester, New York
14627-0047, U.S.A.
e-mail: [email protected]
Naik, Dayanand N.
Department of Mathematics and Statistics, Oakland University, Rochester,
Michigan 48309-4485, U.S.A.
Nam, Byung-Ho
Statistics and Consulting Unit, Department of Mathematics and Statis-
tics, Boston University, Boston, Massachusetts 02215, U.S.A.
e-mail: [email protected]
Neus, Jordan
Biostatistics, State University of New York at Stony Brook, Stony Brook,
New York, U.S.A.
e-mail: [email protected]
Ng, H. K. T.
Department of Mathematics and Statistics, McMaster University,
Hamilton, Ontario L8S 4K1, Canada
e-mail: [email protected]
Nikulin, M. S.
UFR de Mathematiques, Informatique et Sciences Sociales, Universite
Bordeaux 2, Bordeaux, France
e-mail: [email protected]
Parsons, Van L.
National Center for Health Statistics, Hyattsville, Maryland 20782-2003,
U.S.A.
e-mail: [email protected]
Pons, Odile
INRA Biometrie, 78352 Jouy-en-Josas Cedex, France
e-mail: [email protected]
Rao, C. R.
Department of Statistics, Pennsylvania State University, University Park,
Pennsylvania 16802, U.S.A.
e-mail: [email protected]
Rayner, G. D.
School of Computing and Mathematics, Deakin University, Geelong,
VIC 3217, Australia
e-mail: [email protected]
Rayner, J. C. W.
School of Mathematics and Applied Statistics, University of Wollongong,
Wollongong NSW 2522, Australia
e-mail: [email protected]
Rousseeuw, P. J.
Department of Mathematics and Computer Science, University of Antwerp,
Universiteitsplein 1, B-2610 Antwerp, Belgium
e-mail: [email protected]
Sen, P. K.
Department of Biostatistics, University of North Carolina at Chapel Hill,
North Carolina 27599-7400, U.S.A.
e-mail: [email protected]
Seymour, Lynne
Department of Statistics, The University of Georgia, Athens, Georgia
30602-1952, U.S.A.
e-mail: [email protected]
Solev, V.
The Laboratory of Statistical Methods, Steklov Mathematical Institute,
St. Petersburg, 191011, Russia
e-mail: [email protected]
Struyf, Anja
Research Assistant, FWO, 1000, Brussels, Belgium
Whitmore, G. A.
McGill University, Montreal, Quebec H3A 2T5, Canada
Zerbet, Aicha
Departement de Mathematiques et Sciences Sociales, Universite Bordeaux
2, 33076 Bordeaux Cedex, France
e-mail: [email protected]
List of Tables
Table 8.1 Progressive censoring schemes used in the Monte Carlo 104
simulation study
Table 8.2 Monte Carlo power estimates for Weibull distribution at 105
10% and 5% levels of significance
Table 8.3 Monte Carlo power estimates for Lomax distribution at 106
10% and 5% levels of significance
Table 8.4 Monte Carlo power estimates for Lognormal distribution 107
at 10% and 5% levels of significance
Table 8.5 Monte Carlo power estimates for Gamma distribution at 108
10% and 5% levels of significance
Table 8.6 Monte Carlo null probabilities of T for exponential dis- 109
tribution at levels 2.5 (2.5) 50%
Table 8.7 Simulated and approximate values of the power of T* at 109
10% and 5% levels of significance
Table 9.1 Power comparisons, n = 50, 5 cutpoints @ 0.4, 0.8, 1.2, 119
1.6, 2.0
Table 9.2 Power comparisons, n = 50, 9 cutpoints @ 0.25,0.5,0.75, 119
1.0, 1.25, 1.5, 1.75, 2.0, 2.25
Table 11.1 Simulation based upper 90, 95 and 99th percentiles of the 146
statistic T for different values of n and m
Table 11.2 Accuracy of chi-square approximations for percentiles of 147
T
Table 11.3 Simulation based upper 90, 95 and 99th percentiles of the 148
statistic T for different values of n and m
Table 11.4 Accuracy of chi-square approximations for percentiles of 149
T
Table 11.5 Power of the T test of size .05 with a standard normal 151
null hypothesis
Table 11.6 Power of the T test of size .05 with a standard normal 152
null hypothesis
Table 11.7 Power of the UkOD test of size .05 with a standard nor- 153
mal null hypothesis
Table 11.8 Ranked set sample of shrub sizes 154
Table 14.1 Empirical quantiles, when $\sigma^2$ is estimated by $S_n^2$ (theoretical 190
values at level 1%, 5%, 10% are 2.33, 1.65, 1.28
respectively)
Table 14.2 Empirical quantiles, when $\sigma^2$ is estimated by $\hat\sigma_n^2$ (theoretical 190
values at level 1%, 5%, 10% are 2.33, 1.65, 1.28
respectively)
Table 14.3 Proportion of rejections in 1000 samples of size n = 50, 191
with two examples of alternatives: $h(t) = a_1 t + a_2 + \rho t e^{-2t}$
and $h(t) = a_1 t + a_2 + \rho t^2$ ($\sigma^2$ estimated by $S_n^2$)
Table 14.4 Proportion of rejections in 1000 samples of size n = 50, 192
with two examples of alternatives: $h(t) = a_1 t + a_2 + \rho t e^{-2t}$
and $h(t) = a_1 t + a_2 + \rho t^2$ ($\sigma^2$ estimated by $\tilde\sigma^2$)
Table 19.1 Bachelor and Hackett (1970) skin grafts data on severely 262
burnt patients
Table 19.2 Some risk sets R and jump sets S for skin grafts data 263
Table 19.3 Model selection for burn data 263
Table 19.4 Parameters estimation in model 8 having the smallest 263
AlC
Table 33.1 Continuous distributions with their means and variances 442
Table 33.2 Empirical level and power for tests of equality of two 443
distributions: m = 22, n = 22 and a = 5%
Table 33.3 Empirical level and power for MC tests of equality of 445
two continuous distributions having same mean and same
variance: m = n = 22 and a = 5%
Table 33.4 Empirical level and power for tests of equality of two 446
discrete distributions: m = n = 22 and a = 5%
Table 34.1 Power and efficiency of test statistics compared to iso- 457
tonized Kruskal Wallis statistic for a = 0.01, 5 x 5 grids
and one observation per cell
Table 34.2 Power and efficiency of test statistics compared to iso- 458
tonized Kruskal Wallis statistic for a = 0.01, 5 x 5 grids
and four observations per cell
Table 34.3 Comparing ranges of efficiency of statistics and choosing a 459
test for selected trend shapes and distributions and for
a = 0.01, 5 x 5 grids and one observation per cell
Table 34.4 Comparing efficiency of statistics and choosing a test for 460
selected trend shapes and distributions and for a = 0.01,
5 x 5 grids and four observations per cell
Figure 4.1 Sampling distribution of the V3, V4, V5, V6 statistics ob- 52
tained from R = 10,000 samples of size n = 20 taken
from the standard normal distribution. The top row is
for the uncategorised data (u) using Rayner and Best's
method (1989, Chapter 6), and the other rows use my cat-
egorised method (Section 4.3) with ml = 10 categories
(middle, Cl) and m2 = 6 categories (bottom, C2)
Figure 28.1 Step by step procedure with the CAC for the Mobility 374
dimension
Figure 28.2 Traces of the Mobility dimension items 379
Figure 28.3 Difficulty estimates in each group formed by the individ- 380
uals who positively answer to item 2 (GI) and negatively
answer to item 2 (Go)
Figure 28.4 Difficulty estimates in each group formed by the individu- 381
als who positively answer to item 10 (GI) and negatively
answer to item 10 (Go)
Figure 30.1 Examples of (a) a discrete and (b) a continuous angularly 403
symmetric distribution around c. Transforming (a) and
(b) through the mapping h(x) = (x - c)/llx - cll yields
the centrosymmetric distributions in (c) and (d)
Figure 30.2 (a) Bagplot of the spleen weight versus heart weight of 408
73 hamsters. (b) Bagplot of the log-transformed data set
Figure 30.3 Evolution of the exchange rates of DEM/USD (dashed 409
line) and JPY/USD (full line) from July to December
1998
Figure 30.4 Differences between exchange rates on consecutive days 410
for DEM/USD and JPY/USD in the second half of 1998.
The origin is depicted as a triangle
Figure 30.5 The azimuth data 410
Figure 32.1 The probability distribution function $g(x; \theta_4, 0, 1)$ for 429
varying values of $\theta_4$
Figure 32.2 Probability density function of the bimodal distribution 430
given by Equation (32.1) with modes at 0 and 4.32
Figure 32.3 Comparison of t-test, Wilcoxon test and score test power 431
curves for testing $H_0: \mu = 0$ against $K: \mu \neq 0$ as the
data becomes progressively more non-normal
Figure 32.4 Comparison of power curves of the Wald test using the 432
nearest mode technique for samples of size 20 (solid), 50
(dashes) and 100 (dots) from the bimodal distribution in
Figure 32.3 above; 1000 simulations
Goodness-of-Fit Tests and
Model Validity
PART I
HISTORY AND FUNDAMENTALS
1
Karl Pearson and the Chi-Squared Test
D. R. Cox
Nuffield College, Oxford, England, UK
Abstract: This historical and review paper is in three parts. The first gives
some brief details about Karl Pearson. The second describes in outline the
1900 paper which is being celebrated at this conference. The third provides
some perspective on the importance, historically and contemporarily, of the
chi-squared test.
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
not matrix algebra, and there would have been substantial emphasis on parts
of classical mathematical physics. More importantly the emphasis was strongly
on ingenuity and manipulative skill in problem solving rather than on the devel-
opment of new concepts. There is some evidence that K. P. met, although not
necessarily to be taught by, such major figures as Clerk Maxwell, Cayley and
Green and more particularly Todhunter. Todhunter had published a History of
the Theory of Probability, essentially a long critical essay and review of what
had been published on Probability up to that point, and he was engaged in a
comparable book on the Theory of Elasticity.
After graduating, K. P. spent an extremely influential year in Germany,
studying physics but also philosophy and other aspects of German culture. He
was particularly attracted to the 17th century rationalist philosopher Spinoza.
During this year he changed the spelling of his name from the English spelling
Carl to the Germanic Karl.
After returning to England he qualified as a lawyer and then spent some
years, partly supported by a Fellowship from King's College, Cambridge, in mis-
cellaneous lecturing mostly on such topics as German philosophy and Marxism.
He was part of an active world of literary and cultural life in London towards
the end of the 19th century. His views seem broadly those of left-wing thought
of the time, enlightened in their attitude to women's rights, socialist in political
thought, believing in the solution of social problems via rational enquiry and
holding views on racial matters that would now widely be regarded as unac-
ceptable. Biographies of major non-scientific figures of the period quite often
mention K. P., in passing at least.
He applied for a number of permanent academic jobs and in 1884 was ap-
pointed Professor of Engineering and Applied Mathematics at University Col-
lege London. His primary duty was to teach elementary mathematics to engi-
neers; he is reported as being outstandingly successful in this. He published
research papers on the theory of elasticity and collaborated with Todhunter on
his History of that field, writing, it is said, much of the second volume.
In 1890 W. F. R. Weldon was appointed Professor of Biology at University
College and an intensely active collaboration developed between them lasting
until Weldon's early death in 1906. Following the impact on Victorian thought
of Charles Darwin and more immediately for K. P. and Weldon of Galton, this
was a period of intense interest in genetics and evolutionary biology. Weldon be-
lieved that careful collection of observational data on natural variability would
provide the key to important issues and K. P. became involved in the analysis
of data collected by Weldon (and others) in their extensive field work and in
the development of what came to be called the biometric school. Their main
technique was the careful study of the shape of univariate and occasionally
bivariate frequency distributions and, in discrete cases, the analysis of two-dimensional contingency tables. Recognition that distributions were often far
from the normal or Gaussian form led to the development of the flexible system
References
1. Barnard, G. A. (1991). Introduction to Pearson (1900), In Breakthroughs
in Statistics, Vol. 2 (Eds., S. Kotz and N. L. Johnson), pp. 1-10, New
York: Springer-Verlag.
2

Karl Pearson Chi-Square Test

C. R. Rao
Pennsylvania State University, University Park, Pennsylvania
2.1 Introduction
In an article entitled Trial by Number, Hacking says that the goodness-of-fit
chi-square test introduced by Karl Pearson (1900), "ushered in a new kind of
decision making" and gives it a place among the top 20 discoveries since 1900
considering all branches of science and technology. R. A. Fisher, who was in-
volved in bitter controversies with Pearson, was appreciative of the chi-square
test. In his book Statistical Methods for Research Workers (1958, 13th edition, p. 22), Fisher says, "This (chi-square), I believe, is the great contribution to statistical methodology by which the unsurpassed energy of Professor Pearson's work will be remembered," and devoted one full chapter to numerous ingenious applications of the chi-square test.
Pearson's chi-square is ideally applicable to qualitative data with a finite
number, say s, of natural categories and the data are in the form of frequencies
of individuals in different categories. The specified hypothesis is of the form
$$D(p - \pi(\theta)) \qquad (2.2)$$

for a suitable choice of distance or dissimilarity measure, where $p = (p_1, \ldots, p_s)'$, $\pi(\theta) = (\pi_1(\theta), \ldots, \pi_s(\theta))'$ and $\hat\theta$ is an efficient estimate of $\theta$. Ideally $\theta$ is estimated by

$$\hat\theta = \arg\min_{\theta}\, D(p - \pi(\theta)). \qquad (2.3)$$
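The minimum-distance estimation in (2.3) can be made concrete with a small numerical sketch. The Python below is illustrative only: the trinomial cell model, the grid search, and the counts are hypothetical choices, not taken from the text, and a grid search stands in for a proper numerical optimizer.

```python
def min_chisq_estimate(observed, pi, grid):
    """Minimum-distance estimate in the spirit of (2.3): the theta in `grid`
    minimizing the Pearson chi-square distance between the observed
    frequencies and the expected frequencies n * pi(theta)."""
    n = sum(observed)

    def dist(theta):
        return sum((o - n * q) ** 2 / (n * q)
                   for o, q in zip(observed, pi(theta)))

    return min(grid, key=dist)

# Hypothetical cell model: pi(theta) = ((1-theta)^2, 2 theta (1-theta), theta^2).
pi = lambda t: ((1 - t) ** 2, 2 * t * (1 - t), t ** 2)
grid = [i / 1000 for i in range(1, 1000)]

theta_hat = min_chisq_estimate([36, 48, 16], pi, grid)
print(theta_hat)
```

Here the observed proportions (0.36, 0.48, 0.16) fit the model exactly at theta = 0.4, so the minimum-distance estimate lands on that grid point.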
(2.5)
ratio, Wald and Rao's score tests, also referred to as the Holy Trinity. [See Koenker
(1987, p. 294) and Lehmann (1999, pp. 525-529)]. In Section 2.3, Pearson's
chi-square and related tests are shown to be score tests, as observed by A.
Bera. The difficulties involved in deriving Wald tests for composite hypotheses
are discussed. Alternative tests of goodness-of-fit based on dissimilarity or
divergence measures derived from entropy functions are given in Section 2.4.
Tests of significance of goodness-of-fit for continuous distributions are reviewed
in Section 2.5. A new test is proposed and the possibility of using bootstrap is
pointed out.
Large sample properties of the likelihood ratio criterion were studied by Wilks
(1938). It was shown that asymptotically
(2.9)
holds, where $I(\theta)$ is the information matrix for a single observation. Then the Wald (1943) test for $H_s: \theta = \theta_0$ is

$$n(\hat\theta - \theta_0)'\, I(\hat\theta)(\hat\theta - \theta_0) \sim \chi^2(k) \qquad (2.10)$$
and for the composite hypothesis $H_c$ defined in (2.7) the Wald test is
(2.11)
where
$$M(\theta) = \left(\frac{\partial a_i}{\partial \theta_j}\right), \quad \text{an } r \times k \text{ matrix.}$$
(2.13)
(2.14)
where $\tilde\theta$ is the m.l. estimate of $\theta$ under the restrictions (2.7) of the composite hypothesis.

A variation of the test (2.14) where, instead of the m.l. estimate $\tilde\theta$, only a $\sqrt{n}$-consistent estimate $\bar\theta$ is substituted for $\theta$, is called the Neyman-Rao test by Hall and Mathiason (1990). Such a statistic has the same chi-square distribution on $r$ degrees of freedom.
(2.16)
(2.17)
$$\frac{\partial \log L}{\partial \pi_i} = \frac{n_i}{\pi_i}, \qquad i = 1, \ldots, s \qquad (2.18)$$
where $\Delta$ is a diagonal matrix with $\pi_i$ as the $i$-th diagonal element. The score
statistic is
(2.20)
where $y' = (n_1/\pi_{10}, \ldots, n_s/\pi_{s0})$. Observing that $(\Delta(\pi_0) - \pi_0\pi_0')$ is a g-inverse of $C(\pi_0)$, we find (2.21) where $E_{i0} = n\pi_{i0}$, the expected value when $\pi_i = \pi_{i0}$, $i = 1, \ldots, s$. The statistic (2.21) is Pearson's chi-square for testing a simple hypothesis. [Note that in general, the scores have to be computed for independent parameters, $\pi_1, \ldots, \pi_{s-1}$ in the present case, in which case the variance-covariance matrix will be nonsingular. The statistic $R_s$ will have the same expression (2.21).]
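In computational terms, the statistic in (2.21) is just the familiar sum of (observed minus expected) squared over expected, with expected counts fixed by the simple null. A minimal Python sketch (the four-cell uniform null and the counts are made up for illustration, not data from the text):

```python
def pearson_chi_square(observed, null_probs):
    """Pearson's chi-square statistic for a simple hypothesis:
    sum_i (O_i - E_i0)^2 / E_i0 with E_i0 = n * pi_i0, as in (2.21).
    Refer the value to chi-square on s - 1 degrees of freedom."""
    n = sum(observed)
    expected = [n * p for p in null_probs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [26, 25, 24, 25]            # hypothetical counts, n = 100
null_probs = [0.25, 0.25, 0.25, 0.25]  # simple null: all cells equally likely
print(pearson_chi_square(observed, null_probs))
```

With these numbers the statistic is 0.08 on 3 degrees of freedom, far from significant, as expected for data this close to the null.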
(2.22)
LRT

$$2\sum_{i} O_i \log\frac{O_i}{E_i^*} \sim \chi^2(s - 1 - k) \qquad (2.23)$$
Score test
Rao's score test, obtained by substituting $E_i$ for $E_{i0}$ in (2.21), is
(2.24)
The results (2.21) and (2.24) show that Pearson's chi-square tests (with the modification made by Fisher for degrees of freedom when $\theta$ is estimated) can be obtained as Rao's score tests.
Wald test
The derivation of the Wald test for the composite hypothesis (2.22) is somewhat
complicated as it requires the formulation of (2.22) in the form of restrictions
(2.25)
where $\theta$ is the probability that a child is a male. The equations (2.25) can be written as restrictions

$$\frac{(s-1)\pi_1}{1\cdot\pi_2} = \frac{(s-2)\pi_2}{2\cdot\pi_3} = \cdots = \frac{1\cdot\pi_{s-1}}{(s-1)\pi_s} \qquad (2.26)$$

on $\pi_1, \ldots, \pi_s$, which are in the form (2.7) required for applying the Wald test.
We use the formula (2.11) to derive the Wald statistic.
It may be noted that there is no unique representation of the restrictions
(2.26), and the Wald statistic may depend on the actual form in which the
restrictions are expressed. This is one of the drawbacks of the Wald statistic.
Under the model $H_1$, the null hypothesis (2.27) may be stated as

with $\theta' = (\theta_1, \ldots, \theta_k)$ as nuisance parameters. To apply the score test, we need to compute the scores for $a_1, \ldots, a_r$ and $\theta_1, \ldots, \theta_k$ and the information matrix for the $(r + k)$ parameters at the values $a_1 = \cdots = a_r = 1$ and maximum likelihood estimates of $\theta$ under $H_0$.

The scores for $a_i$ and $\theta_j$ at estimated values under $H_0$ are

$$a_i = O_i - n\pi_i(\hat\theta), \quad i = 1, \ldots, r; \qquad \theta_j = 0, \quad j = 1, \ldots, k. \qquad (2.29)$$
(2.30)
where the matrix $B$ has entries of the form $-\pi_r\pi_1, -\pi_r\pi_2, \ldots$, and $I$ is the information matrix for $\theta_1, \ldots, \theta_k$. The score statistic for testing
Ho is
(2.31)
where $a' = (a_1, \ldots, a_r)$. The asymptotic distribution of (2.31) is chi-square with $r$ degrees of freedom, if $|A - BI^{-1}B'| \neq 0$. Otherwise it is equal to the rank of $A - BI^{-1}B'$; if the rank is less than $r$, we use a g-inverse in the definition of (2.31). [Note that $A - BI^{-1}B'$ is the asymptotic variance-covariance matrix of $a$.]
(2.32)
and
(2.33)
which is not a score test, although (2.32) and (2.33) are score tests, but is
asymptotically equivalent to the score test for H2 against HI. It can be further
shown that asymptotically
(2.35)
where Eli and E2i are expected values of frequencies in the i-th cell under the
hypotheses HI and H2 respectively.
An illustration of such a test for examining the equality of the A and B gene frequencies of the O, A, B, AB blood group system in two communities is
given in a paper by Rao (1961).
(2.36)
for every convex function $\phi : [0, \infty) \to \mathbb{R} \cup \{\infty\}$ where $0\,\phi(0/0) = 0$ and $0\,\phi(p/0) = \lim_{u \to \infty} \phi(u)/u$. It is shown by Morales, Pardo and Vajda (1995) that
(2.37)
if we choose
Read and Cressie (1988) proposed what they call power divergence statistics
defined by
(2.38)
where $E_i = n\pi_i(\hat\theta)$, using a BAN estimator of $\theta$. The statistic (2.38) has the
same asymptotic chi-square distribution on s - 1 - k degrees of freedom. This
class can be obtained as a special case of (2.36) by choosing
$$\phi(x) = \frac{1}{\lambda(\lambda + 1)}\left(x^{\lambda + 1} - x\right), \qquad \lambda \neq 0, -1. \qquad (2.39)$$
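To see the family (2.38)-(2.39) in action, here is a small Python sketch. It is illustrative only: the counts are hypothetical, the expected frequencies are taken as fixed rather than computed from a BAN estimator, and the lambda = 0 branch uses the usual continuity limit. With lambda = 1 the statistic reduces algebraically to Pearson's chi-square, and as lambda tends to 0 it approaches the likelihood-ratio statistic.

```python
import math

def power_divergence(observed, expected, lam):
    """Read-Cressie power divergence statistic, in the spirit of (2.38):
    (2 / (lam * (lam + 1))) * sum_i O_i * ((O_i / E_i)^lam - 1).
    The lam -> 0 limit is the LRT statistic 2 * sum_i O_i log(O_i / E_i)."""
    if lam == 0:
        return 2.0 * sum(o * math.log(o / e)
                         for o, e in zip(observed, expected))
    return (2.0 / (lam * (lam + 1))) * sum(
        o * ((o / e) ** lam - 1.0) for o, e in zip(observed, expected))

O = [30, 20, 25, 25]          # hypothetical counts, n = 100
E = [25.0, 25.0, 25.0, 25.0]  # fixed expected frequencies for the sketch

pearson = sum((o - e) ** 2 / e for o, e in zip(O, E))
print(power_divergence(O, E, 1.0), pearson)  # lam = 1 reproduces Pearson
print(power_divergence(O, E, 0.0))           # LRT statistic
```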
$$J_H(x, y) = H\!\left(\frac{x + y}{2}\right) - \frac{1}{2}H(x) - \frac{1}{2}H(y) \qquad (2.40)$$
$$8n \sum_{i=1}^{s}\left[\phi\!\left(\frac{O_i/n + E_i/n}{2}\right) - \frac{1}{2}\phi\!\left(\frac{O_i}{n}\right) - \frac{1}{2}\phi\!\left(\frac{E_i}{n}\right)\right] \qquad (2.41)$$
(2.42)
where $x' = (x_1, \ldots, x_s)$ and the coefficients $a_{ij}$ are chosen such that the matrix
(2.43)
(2.44)
(2.47)
$$B_s = \frac{\sqrt{n}\sum_{i}\left(x_{(i)} - \hat x_{(i)}\right)^2}{\sum_{i}\left(x_{(i)} - \bar x\right)^2 + \sum_{i}\left(\hat x_{(i)} - \bar x\right)^2} \qquad (2.49)$$
The sampling distribution can be obtained by simulation as F(x, ()o) is known.
To test a composite hypothesis that the sample comes from a distribution
F(x, ()), where the function F is specified and the parameter () is unknown, we
can use a statistic $B_c$ which is of the same form as (2.49) with $\hat x_{(i)}$ defined by

$$F(\hat x_{(i)}, \hat\theta) = \frac{2i - 1}{2n}, \qquad i = 1, \ldots, n \qquad (2.50)$$

where $\hat\theta$ is an efficient estimate of $\theta$. The distribution of $B_c$ may be complicated or may involve the unknown parameter. In such a case, it is worthwhile examining whether the bootstrap method is applicable.
There is some simplification if $F(x, \theta)$ belongs to a translation-scale family, $F[(x - \mu)/\sigma]$. In such a case, we define $\hat x_{(i)}$ as

$$F(\hat x_{(i)} \mid \mu = 0, \sigma = 1) = \frac{2i - 1}{2n}, \qquad i = 1, \ldots, n. \qquad (2.51)$$
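Since the statistic in (2.49) compares the order statistics with expected quantiles, and the null distribution F(x, theta_0) is fully specified, its sampling distribution can be obtained by simulation exactly as the text suggests. The Python sketch below is illustrative, not a definitive implementation: it takes a standard normal null, uses a crude bisection inverse CDF, centers both sums in the denominator at their own means (one plausible reading of the formula), and picks arbitrary simulation sizes.

```python
import math
import random
import statistics

def norm_ppf(p):
    """Inverse standard normal CDF by bisection on math.erf
    (crude but adequate for a sketch)."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def b_statistic(sample):
    """A B_s-type statistic: compares the order statistics x_(i) with the
    expected standard normal quantiles at probabilities (2i - 1) / (2n)."""
    n = len(sample)
    xs = sorted(sample)
    xh = [norm_ppf((2 * i - 1) / (2 * n)) for i in range(1, n + 1)]
    num = math.sqrt(n) * sum((a - b) ** 2 for a, b in zip(xs, xh))
    xbar = statistics.fmean(xs)
    hbar = statistics.fmean(xh)
    den = (sum((a - xbar) ** 2 for a in xs)
           + sum((b - hbar) ** 2 for b in xh))
    return num / den

# Simulate the null distribution and read off a 5% critical value.
random.seed(1)
null_draws = sorted(
    b_statistic([random.gauss(0.0, 1.0) for _ in range(30)])
    for _ in range(500))
critical_5pct = null_draws[int(0.95 * len(null_draws))]
print(critical_5pct > 0.0)
```

A sample that coincides exactly with the expected quantiles gives a statistic of zero, which is a handy sanity check on the implementation.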
In conclusion, the tests described in this paper may have different power
functions depending on the alternatives, but none of them dominates the others.
The purpose of data analysis is learning from data, and a global test is only
of an exploratory nature. Of greater interest is the study of the pattern of
deviations between observed and expected frequencies. No doubt a global test
provides some confidence in search for possible alternatives. Any reasonably
good global test will do for this purpose. Long live chi-square.
References
1. Beran, R. (1986). Simulated power functions, Annals of Statistics, 14,
151-173.
3

Approximate Models

Peter J. Huber
Klosters, Switzerland
Abstract: The increasing size and complexity of data sets increasingly forces
us to deal with less than perfect, but ever more complicated models. I shall
discuss general issues of model fitting and of assessing the quality of fit, and
the important and often crucial roles of robustness and simulation.
3.1 Models
The anniversary of Karl Pearson's paper offers a timely opportunity for a digression: to discuss the role of models in contemporary and future statistics, and the assessment of the adequacy of their fit, in a somewhat broader context,
stressing necessary changes in philosophy rather than the technical nitty-gritty
they involve. The present paper elaborates on what I had tentatively named
"postmodern robustness" in Huber (1996, final section).
Karl Pearson's pioneering 1900 paper had been a first step. He had been con-
cerned exclusively with distributional models, and with global tests of goodness-
of-fit. He had disregarded problems caused by models containing free para-
meters: how to estimate such parameters, and how to adjust the count of the
number of degrees of freedom. Corresponding improvements then were achieved
by Fisher and others. However, the basic paradigm of distributional models re-
mained in force and still forms the prevalent mental framework for statistical
modeling. For example, the classical texts in theoretical statistics, such as Cox
and Hinkley (1974) or Lehmann (1986), discuss goodness-of-fit tests only in the
context of distributional models. Apart from this, it is curious how current
statistical parlance categorizes models into classes - such as "linear models"
or "generalized linear models." The reason is of course that each such class per-
mits a specific formal statistical theory and analysis, or, expressing it the other
way around: each class constitutes the natural scope of a particular theoretical
framework. In my opinion, to be elaborated below, such narrow categorizations
can have undesirable consequences.
We now have a decisive advantage over Pearson: whenever there is an ar-
bitrarily complex but supposedly exact model and a test statistic, we can, at
least in principle, determine the distribution of that statistic under the null
hypothesis to an arbitrary degree of accuracy with the help of simulation. It is
this advantage which allows us to concentrate on the conceptual aspects, in par-
ticular on the traditionally suppressed problems posed by partial or otherwise
inaccurate models.
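The simulation advantage described above can be stated generically: given any fully specified null model and any test statistic, brute-force Monte Carlo approximates the null distribution, and hence critical values and p-values, to any desired accuracy. The Python below is a generic sketch of that recipe; the particular statistic and the exponential null are arbitrary illustrations, not examples from the text.

```python
import random
import statistics

def simulated_null(statistic, sampler, n, reps=2000, seed=0):
    """Approximate the null distribution of `statistic` for samples of
    size n drawn from the model `sampler`, by brute-force simulation.
    Returns the sorted simulated values."""
    rng = random.Random(seed)
    return sorted(statistic([sampler(rng) for _ in range(n)])
                  for _ in range(reps))

def mc_p_value(observed, null_values):
    """Monte Carlo p-value: fraction of simulated statistics >= observed."""
    return sum(v >= observed for v in null_values) / len(null_values)

# Illustrative statistic: scaled gap between the maximum and the mean,
# under a hypothetical exponential(1) null model.
def max_gap(xs):
    return (max(xs) - statistics.fmean(xs)) / statistics.stdev(xs)

null = simulated_null(max_gap, lambda rng: rng.expovariate(1.0), n=25)
print(0.0 <= mc_p_value(2.0, null) <= 1.0)
```

The point is that nothing in this recipe depends on the statistic having a tractable distribution; only the null model must be fully specified.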
The word "model" has a bewilderingly wide semantic range, from tiny trains
to long-legged girls. Yet it hides a simple common feature: a model is
a representation of the essential aspects of some real thing in an idealized,
exemplary form, ignoring the inessential ones. The title of this paper is an
intentional pleonasm: by definition, a model is not an exact counterpart of the
real thing, but a judicious approximation. Mathematical statisticians, being
concerned with the pure ideas, sometimes tend to forget this, despite strong
admonitions to the contrary, such as by McCullagh and NeIder (1983, p. 6):
"all models are wrong." What is considered essential of course depends on the
current viewpoint - the same thing may need to be modeled in various different
ways. Progress in science usually occurs through thinking in models, they help
to separate the essential from the inessential.
A general discussion of the role mathematical models have played in sci-
ence should help to clarify the issues. They can be qualitative or quantitative,
theoretical or empirical, causal or phenomenological, deterministic or stochas-
tic, and very often are a mixture of all. The historical development of the
very first non-trivial mathematical models, namely those for planetary motion,
illustrates some salient points most nicely, namely the interplay between concep-
tual/qualitative and phenomenological/quantitative models, the discontinuous
jumps from one model class to the next, and the importance of precisely locating
the discrepancy rather than merely establishing the existence of discrepancies
by a global test of goodness-of-fit.
Early Greek scientists, foremost among them Anaxagoras (ca. 450 BC), had
tried to explain the irregular motions of the planets by a "stochastic model,"
namely by the random action of vortices. This model did not really explain
anything; it just offered a convenient excuse for their inability to understand
what was going on in the skies. In the 4th century BC, Eudoxos then invented
an incredibly clever qualitative model. He managed to explain the puzzling
retrograde motion of the planets deterministically in accordance with the philo-
sophical theory that celestial motions ought to be circular and uniform. For
each planet, he needed four concentric spheres attached to each other, all
rotating uniformly. Quantitatively, the model was not too good. In particular,
even after several improvements, it could not possibly explain the varying lu-
minosity of the planets, because in this model their distances from the earth
remained constant. About the same time the Babylonians devised empirical
models, apparently devoid of any philosophical underpinning; they managed to
describe the motions of the moon and the planets phenomenologically through
arithmetic schemes involving additive superposition of piecewise linear func-
tions. Around 130 AD, Ptolemy then constructed an even cleverer model than
Eudoxos. He retained the politically correct uniform circular motion, but the
circles were no longer concentric. With minor adjustments this type of model
withstood the tests of observational astronomy for almost 1500 years, until Ke-
pler, with the superior observations of Tycho Brahe at his disposal, found a
misfit of merely 8' (a quarter of the apparent diameter of sun or moon, and just
barely above observational accuracy) in the motion of a single planet (Mars).
He devised a fundamentally new model, replacing the uniform circular by el-
liptical motions. The laws describing his model later formed the essential basis
for Newton's theory of gravitation.
For us, Kepler's step is the most interesting and relevant. First, we note that
the geocentric Ptolemaic and the heliocentric Copernican models phenomeno-
logically are equivalent - both belong to the class of epicyclic models, and with
properly adjusted parameters the Ptolemaic model renders the phenomena, as
seen from the earth, absolutely identically to the Copernican one. But the
conceptual step to heliocentricity was a most crucial inducement for Kepler's
invention. Second, Kepler's own account shows how difficult it is to overcome
modeling prejudices. It required the genius of Kepler to jump over the shadow
cast forward by 1500 years of epicyclic modeling. To resist and overcome the
temptation to squeeze the data into a preconceived but wrong model class -
by piling up a few more uninterpretable parameters - may require comparable
independence of mind. I think that here we have a warning tale about the
dangers of categorizing models into narrowly specified classes.
pirical and the model distribution. To fix ideas, let us take the Kolmogorov
distance, which makes the exposition simpler (but exactly the same arguments
apply to a distance based on the chi-squared test statistic):
In case (a), the test sometimes correctly rejects the hypothesis of normality
for the wrong reason, namely if an outlier inflates the ML estimate for σ, even
though the fit, in terms of minimized distance (b), is good. The disturbing
fact is that the traditional recommendation, namely to estimate the unknown
free parameters by any asymptotically efficient estimate, viz. either maximum
likelihood or minimum chi-squared, may have very different consequences
depending on which estimate one chooses.
(2) Minimax robust estimates. Observational errors in most cases are excel-
lently modeled by the normal distribution, if we make allowance for occasional
gross errors (which may be of different origin). If we formalize this in terms of
a contamination model, then the normal part represents the essential aspects of
the observational process in an idealized form, the contamination part merely
models some disturbance factors unrelated to the quantities of interest. But
model-optimal estimates for the idealized model, e.g., the sample mean, are
unstable under small deviations in the tails of the distribution, while robust
estimates, such as judiciously chosen M-estimates, offer stability but lose very
little efficiency at the normal model.
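A minimal numerical sketch of this stability claim follows; the contamination fraction, the contaminant location, and the tuning constant k = 1.345 are illustrative assumptions, not part of the text's argument:

```python
import math
import random

def huber_m_estimate(xs, k=1.345, tol=1e-8, max_iter=100):
    """Location M-estimate with Huber's psi, scale fixed at the MAD
    (a minimal sketch; production code would iterate the scale as well)."""
    med = sorted(xs)[len(xs) // 2]
    mad = sorted(abs(x - med) for x in xs)[len(xs) // 2] / 0.6745
    mu = med
    for _ in range(max_iter):
        # iteratively reweighted mean: weight = min(1, k*scale/|residual|)
        ws = [min(1.0, k * mad / abs(x - mu)) if x != mu else 1.0 for x in xs]
        new = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
        if abs(new - mu) < tol:
            break
        mu = new
    return mu

rng = random.Random(1)
# 95% standard normal observations, 5% gross errors centred at 50
data = [rng.gauss(0.0, 1.0) if rng.random() < 0.95 else rng.gauss(50.0, 1.0)
        for _ in range(1000)]
mean = sum(data) / len(data)     # dragged toward the contaminant
robust = huber_m_estimate(data)  # stays near the normal centre 0
```

The sample mean is pulled far from zero by the 5% contaminant, while the M-estimate loses very little at the normal part, which is the trade-off described above.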
Typically, we are not interested in estimating the parameters of the conta-
minant, or in testing it for goodness-of-fit, and mostly we are not even able to,
given the available sample size. Note that a least favorable distribution is not intended
to model the underlying situation (even though it may approximate the true
error distribution better than a normal model), its purpose is to provide a ML
estimate that is minimax robust.
(3) Linear fit. Assume that you want to fit a straight line to an approx-
imately linear function that can be observed with errors in a certain interval
(a,b). Assume that the goal of the fit is to minimize the integrated mean
square deviation between the true, approximately linear function and an ideal-
ized straight line. A model-optimal design will put one half of the observations
at each of the endpoints of the interval. A fit-optimal design will distribute the
observations roughly uniformly over the interval (a,b). The unexpected and
surprising fact is that subliminal deviations of the function from a straight line
(i.e. deviations too small to be detected by goodness-of-fit tests) may suffice
to make the fit-optimal design superior to the model-optimal design [cf. Huber
(1975b)].
In short: for the sake of robustness, we may sometimes prefer a (slightly)
wrong model to a (possibly) right one.
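The bias part of this phenomenon can be reproduced in the noise-free limit: with slight curvature, the endpoint (model-optimal) design fits the chord, while the uniform (fit-optimal) design approximates the L2-best line. The sketch below uses an arbitrary illustrative curvature of 0.05 on (0,1); with observational noise the endpoint design regains a variance advantage, which is why the deviations must be subliminal for the effect to be surprising:

```python
def fit_line_ls(xs, ys):
    """Ordinary least-squares straight line through the points (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope   # intercept, slope

def imse(f, a_, b_, line, m=2000):
    """Integrated squared deviation between f and the fitted line on (a_, b_),
    by midpoint quadrature."""
    h = (b_ - a_) / m
    c0, c1 = line
    return sum((f(a_ + (i + 0.5) * h) - (c0 + c1 * (a_ + (i + 0.5) * h))) ** 2
               for i in range(m)) * h

def f(x):
    return x + 0.05 * x * x   # approximately linear, slight curvature

a, b, n = 0.0, 1.0, 20
# model-optimal design: half the observations at each endpoint
xs_end = [a] * (n // 2) + [b] * (n // 2)
# fit-optimal design: observations spread roughly uniformly over (a, b)
xs_uni = [a + (i + 0.5) * (b - a) / n for i in range(n)]
line_end = fit_line_ls(xs_end, [f(x) for x in xs_end])
line_uni = fit_line_ls(xs_uni, [f(x) for x in xs_uni])
imse_end = imse(f, a, b, line_end)
imse_uni = imse(f, a, b, line_uni)
# in the noise-free limit the bias term alone favours the uniform design
```

Here the endpoint design's integrated squared error exceeds the uniform design's by roughly a factor of six, purely from the curvature bias.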
the separate models Ai, keeping the other parts of the model fixed, may be se-
rious underestimates. Cross-validation estimates, for example, are invalidated
by repeated cycling.
I once felt that stochastic modeling, despite its importance, belonged
so much to a particular field of application that it was difficult to discuss it in a
broad and general framework, and I had therefore excluded it from a discussion
of current issues in statistics [Huber (1975a, p. 86)]. I now would modify my
former stance. I still believe that an abstract and general discussion will fail
because it is practically impossible to establish a common basis of understanding
between the partners of such a discussion. On the other hand, a discussion based
on, and exemplified by, substantial and specific applications will be fruitful.
All my better examples are concerned with the modeling of various applied
stochastic processes. This is not an accident: stochastic processes create
the most involved modeling problems. The following example (on modeling the
rotation of the earth) may be the best I have met so far. It shows the intricacies
of stochastic models in real situations, in particular how much modeling and
data processing sometimes has to be done prior to any statistics, and how
different model components must and can be separated despite interactions.
The runner-up is in biology rather than in geophysics, and it operates in the
time domain rather than in the frequency domain [modeling of circadian cycles,
Brown (1988)].
Example: Modeling the length of day [Huber (2000)]. Because of tidal fric-
tion, the rotation of the earth slows down: the length of day (LOD) increases by
about 2 milliseconds per century. An analysis of medieval and ancient eclipses
back to about 700 BC had shown that on top of the systematic slow-down
there are very substantial random fluctuations. They must be taken into ac-
count when one is extrapolating astronomical calculations to a more distant
historical past. In terms of the LOD-process, these poorly determinable fluctu-
ations are compatible with a Brownian motion (or random walk) model, whose
increments have a variance of about 0.05 ms²/year.
Being interested in estimating the size of the extrapolation errors, I won-
dered whether such a millennial Brownian motion component was a long-range
effect only, or whether it would be discernible also in the more accurate but
much shorter modern series of measurements, and in particular whether those
modern measurements might even permit a more accurate estimate of the vari-
ance of the increments. I could obtain three such series of different lengths and
observational accuracies, listing length-of-day values in intervals of 4 months,
5 days and 1 day, starting in the years 1830, 1962 and 1976 respectively. In
the power spectrum of a differenced series of LOD values, a Brownian motion
component· should manifest itself in the low frequency part of the spectrum
as a horizontal tail end. The problem with the modern measurements is that
there are many nuisance effects, with periodicities ranging from days to tens
of years, and of a size comparable to the putative Brownian motion process.
(1) Systematic drift of about 2 ms/cy. The "true" rate cannot be estimated
very accurately from the data because of the random Brownian motion (2)
sitting on top.
(2) Brownian motion (or random walk process). Putative cause: cumulative
random changes in the rotational moment of inertia of the earth's mantle,
induced by plate tectonics.
(4) Seasonal effects, with an amplitude of about 0.4 ms. Exchange of angular
momentum between the atmosphere and the solid earth, caused by sea-
sonal temperature changes and winds. See Figure 3.2. They can be taken
out cleanly by fitting a trigonometric polynomial to the LOD-process.
(7) Solid earth tides. These are reasonably well understood, deterministic
effects. In the later parts of the data series made available to me, namely
since 1982, they had been eliminated through preprocessing, but not be-
fore (remnants are the peaks in the 10-14 days range in Figure 3.2).
Figure 3.1: The 4-lunation series (covering the years 1830-1990 in 4-month
intervals) in the time domain: the actual data in the series, and a smoothed
version (obtained by forming moving averages). Note the changing level of the
observational noise and the decadal waves.
[Figure 3.2 legend: Log10-spectrum of the differenced 5-day series. Data:
actual (dotted), deseasoned (solid), 6 simulations (grey). Model: superposition
of a random walk and an AR(2) "50-day oscillation".]
Figure 3.2: Log10-spectrum of the differenced 5-day series (covering the years
1962-1995 in 5-day intervals). The cross-over between the random walk process
and the AR(2) model occurs near 8 months (243.81 days). On purpose, only
the two most prominent components (2) and (5) of the model are used.
Approximate Models 37
There is a delicate interplay between the components (2) and (3). We note
that random changes in the rotational moment of inertia of the earth's mantle,
as postulated in (2), by preservation of angular momentum cause wiggles in the
rotation rate of the mantle. These wiggles excite damped oscillations on the
mantle-core boundary, with a resonance in the decadal range. High-frequency
components only wiggle the mantle, but in the low-frequency range, mantle
and core move together as one solid body. Even though the exact coupling
mechanisms are not known, the net effect is that the spectrum of the differenced
LOD-series will be flat both below and above the resonance frequency, with a
19% smaller value in the low frequency range (the size of the drop is determined
by the known ratio between the moments of inertia of mantle and core), and
a hump of poorly determined shape in between.
The feature of interest in the spectrum of Figure 3.2 is the putative Brown-
ian motion (or random walk) component, which should manifest itself in a flat
low-frequency spectrum. We would like to check whether a Brownian motion
model fits, and we would like to estimate the size of its contribution. In the 5-
day series, the AR(2)-contribution corresponding to the 50-day Madden-Julian
oscillation dominates the spectrum of the deseasoned series for periodicities
shorter than about 8 months. Information about the putative Brownian motion
part can be gleaned only from the tiny low-frequency tail end of the spectrum
(which has been stretched out in Figure 3.2 for better visibility by plotting the
power spectra against the square root of the frequency rather than against the
frequency itself). It is obviously non-trivial to separate the Brownian motion
contribution from the AR(2) component. When estimating the AR(2) parame-
ters we must rely on the middle frequency range (where the 50-day oscillation is
dominant), and we must remember the tricks of the robustness trade and make
sure for example that the irrelevant peaks in the 10-14 day range do not bias
our parameter estimates. Then, in the low frequency range, where the Brown-
ian motion is dominant, the expected contribution of the AR(2) model must
be subtracted as a correction from the total spectrum, in order to estimate the
contribution of the Brownian motion. As an added complication, the very low
end of the data spectrum may already be somewhat inflated by the contribu-
tions of decadal fluctuations (they were not modeled in the simulations depicted
in Figure 3.2). In order to reduce processing artifacts caused by smoothing, the
actual parameter estimation was not based on the spectra shown in Figure 3.2,
but on the periodogram values themselves. This yielded an estimate of the vari-
ance of the increments of 0.072 ms²/year, valid above the resonance frequency.
For the millennial range (below the resonance frequency), this translates into
0.058 ms²/year, with a 95% confidence interval (0.040, 0.089). By the way, for
somebody reared on Box-Jenkins time-series analysis, it is quite an educational
experience to have to devise one's own ARMA estimates based on a periodogram
segment!
In the 4-lunation series the high level of measurement noise creates compli-
cations: it dominates the spectrum for periodicities shorter than 8-10 years.
This leaves deplorably few periodogram ordinates for estimating the Brown-
ian motion contribution in the spectrum. They straddle the decadal resonance
hump (3); after subtracting a somewhat crudely estimated contribution from
measurement errors leaking into that range, the average spectral power there
is 0.13 ms²/year. While this value is boosted by the decadal hump, it still is
just barely significantly larger than the value 0.072 ms²/year estimated from
the 5-day series. If we model (3) by a damped harmonic oscillator excited by
that Brownian motion and estimate the coupling parameters from the data (in
statistical parlance this amounts to modeling the mantle-core oscillations by an
AR(2) process), we can get an essentially perfect fit of the spectrum in that
range.
Among other things, this example illustrates that for some parts of the
model (usually the less interesting ones) we may have an abundance of degrees
of freedom, and a scarcity for the interesting parts.
Of course, just like the classical approaches, the simulation methods only
measure stochastic variabilities intrinsic to a model assumed to be correct: they
implicitly assume that the estimated parameter values are so close to the true
ones that the latter can be replaced by the former without committing a serious
error when assessing the variability of an estimate or test statistic. Admittedly,
in practice there may be problems; for example, parameter estimation may fail
to converge for a small percentage of the samples, and this may selectively affect
the tails.
Under mild monotonicity assumptions, but at a relatively high computing
cost, it is possible to supplement point estimates with somewhat more reliable
confidence interval estimates than by the method just described. For example,
in order to find a lower confidence bound, one takes the model and replaces
the estimated value θ of the model parameter of interest (in the example of the
preceding section: the variance of the increments of the Brownian motion) by
a suitably chosen smaller value θ₀. Then one uses the thus modified model for
simulating 1000 data sets, and derives new parameter estimates from each of
these sets. If, say, 950 of the newly estimated values of θ then are smaller, and
50 larger than the original estimate of θ, then θ₀ constitutes an approximate
one-sided 95% lower confidence bound. The determination of a suitable θ₀ is
expensive because it requires a considerable amount of trial and error.
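The procedure can be sketched on a toy model in which the parameter of interest is the variance of independent normal observations (standing in for the increment variance of the text's example). The factor 0.75 used for the trial value θ₀ is an arbitrary starting point; the real procedure adjusts θ₀ by trial and error until the count is close to 950:

```python
import math
import random

def estimate_var(xs):
    """ML estimate of the variance (the 'parameter of interest' here)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def count_below(theta0, theta_hat, n, n_sim=1000, seed=3):
    """Simulate n_sim data sets from the model with parameter theta0 and
    count how many re-estimated values fall below the original estimate."""
    rng = random.Random(seed)
    below = 0
    for _ in range(n_sim):
        s = [rng.gauss(0.0, math.sqrt(theta0)) for _ in range(n)]
        if estimate_var(s) < theta_hat:
            below += 1
    return below

rng = random.Random(4)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]
theta_hat = estimate_var(data)
# trial value: if about 950 of the 1000 re-estimates fall below theta_hat,
# theta0 is an approximate one-sided 95% lower confidence bound; otherwise
# theta0 is adjusted and the simulation repeated
theta0 = 0.75 * theta_hat
n_below = count_below(theta0, theta_hat, n=100)
```

Each trial value of θ₀ costs a full simulation run, which is the expense mentioned in the text.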
But thanks to simulation, a judgmental assessment of goodness-of-fit need
not even be based on a test statistic (whose selection always is delicate). The
principle is simple and inspired by the line-up methods used by the police: if
the actual data hides well among half a dozen or a dozen simulations of the
model, the model is judged acceptable; if the actual data sticks out like a sore
thumb, the model is no good. Figure 3.2 illustrates the approach by showing
the spectrum estimates resulting from 6 simulations of the model, indicating a
good fit in the interesting low frequency range. But the example also illustrates
a general problem of any global approaches to goodness-of-fit, namely the peaks
in the 10-14 day range, which make the actual data set stick out like the sore
thumb mentioned above. In this case the origin of the discrepancy is under-
stood, and it is irrelevant because it lies in an uninteresting frequency range. A
more sophisticated version of the line-up method, permitting approximate sig-
nificance tests, is to create a pool composed of the actual data set and, say, 99
simulated sets. Somebody not knowing which is which has 5 attempts to pick
the actual set out of the pool, using any tools of his or her choice. If the actual
data set is not among the selected five, the model is deemed to be adequate.
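The bookkeeping of this pooling scheme is easily sketched. Since code cannot supply the human judge, a max-|x| statistic stands in here for the person making the picks; the data sizes and the planted outlier are illustrative assumptions:

```python
import random

def lineup_pool(real, simulate, n_sim=99, seed=5):
    """Pool the real data set with n_sim simulated sets in random order;
    return the shuffled pool and the hidden position of the real set."""
    rng = random.Random(seed)
    pool = [simulate(rng) for _ in range(n_sim)] + [real]
    rng.shuffle(pool)
    return pool, pool.index(real)

def lineup_test(pool, statistic, position, picks=5):
    """Approximate 5% test: reject the model if the real set is among the
    `picks` most extreme sets according to the chosen statistic."""
    ranked = sorted(range(len(pool)), key=lambda i: statistic(pool[i]),
                    reverse=True)
    return position in ranked[:picks]

def simulate(rng):
    return [rng.gauss(0.0, 1.0) for _ in range(30)]

rng0 = random.Random(6)
real = [rng0.gauss(0.0, 1.0) for _ in range(30)]
real[0] = 8.0   # a "sore thumb" the model cannot produce
pool, pos = lineup_pool(real, simulate)
rejected = lineup_test(pool, lambda s: max(abs(x) for x in s), pos)
```

With 99 simulated sets and 5 picks, identifying the real set by chance has probability 5%, which is what makes the line-up an approximate significance test.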
3.7 Summary Conclusions
If we want to be able to deal with increasingly larger and more complex data
sets, we need to go beyond the current, overly narrow ingrained modeling con-
cepts of statistics. The models will become more complex, too, but a clean
statistical theory with mathematically rigorous results is possible only for clean
and simple models. We will have to pay a price, but we will also gain something
in the process. The questions will shift more and more from a mere yes-or-no
global check whether the model is adequate (I prefer the term "model ade-
quacy" to "model validity" - a model adequately rendering the observations
need not be valid in any intrinsic sense), to a detailed assessment of the quality
of the fit, to questions of interpreting the fit, and in particular to the need to
locate and interpret deviations from a model which is known to be imprecise,
and to separate essential deviations from irrelevant ones.
References
1. Besag, J., Green, P., Higdon, D., and Mengersen, K. (1995). Bayesian
computations and stochastic systems, Statistical Science, 10, 1-66.
10. Huber, P. J. (2000). Modeling the length of day and extrapolating the
rotation of the earth, In Astronomical Amusements, Papers in Honor of
Jean Meeus (Eds., F. Bonoli, S. De Meis, and A. Panaino), Milano: IsIAO.
11. Huber, P. J., Sachs, A., Stol, M., Whiting, R. M., Leichty, E., Walker, C.
B. F., and vanDriel, G. (1982). Astronomical Dating of Babylon I and Ur
III. Occasional Papers on the Near East, Vol. 1, Issue 4, Malibu: Undena
Publications.
Partitioning the Pearson-Fisher Chi-Squared Goodness-of-Fit Statistic

G. D. Rayner
Deakin University, Geelong, Australia
Abstract: This paper presents an overview of Rayner and Best's (1989) cate-
gorised Neyman smooth goodness-of-fit score tests, along with an explanation
of recent work into how these tests can be used to construct components of the
Pearson-Fisher chi-squared test statistic in the presence of unknown nuisance
parameters. A short simulation study examining the size of these component
test statistics is also presented.
4.1 Introduction
Pearson's (1900) chi-squared test statistic was the first, is the most well
known, and is probably the most frequently used test for goodness-of-fit. This
test is essentially an omnibus test in that it is sensitive to a wide variety of
different ways in which the data can differ from the hypothesized distribution.
For example, the chi-squared test is able to detect data that differ from the
hypothesized distribution in terms of any of location, scale, shape, etc. It is
interesting to consider the component test statistics, sensitive only to more
specific departures, that might combine to produce the chi-squared test statistic.
For a sample space broken into m classes, let N_j (j = 1, ..., m) be the
number of observations from the sample (of size n = \sum_j N_j) that fall into the
j-th class. If p_j is the probability of an observation falling into the j-th class
under the completely specified hypothesized distribution, then the Pearson chi-
squared test statistic is

    X_P^2 = \sum_{j=1}^{m} \frac{(N_j - n p_j)^2}{n p_j}.    (4.1)
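In code, equation (4.1) is a one-liner; the toy counts used here are illustrative:

```python
def pearson_chi2(counts, probs):
    """Pearson's X^2 = sum_j (N_j - n p_j)^2 / (n p_j) of equation (4.1)."""
    n = sum(counts)
    return sum((nj - n * pj) ** 2 / (n * pj)
               for nj, pj in zip(counts, probs))

# counts from n = 100 observations over m = 4 equiprobable classes
x2 = pearson_chi2([30, 20, 28, 22], [0.25, 0.25, 0.25, 0.25])
```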
Rayner and Best (1989) embed the null in the categorised Neyman smooth
alternative with class probabilities

    \pi_j = C(\theta) \exp\Big( \sum_{i=1}^{k} \theta_i h_{i,j} \Big) p_j,    j = 1, ..., m,

where \theta = (\theta_1, ..., \theta_k)^T are k \le m-1 real parameters and C(\theta) is a normalizing
constant (such that \sum_j \pi_j = 1), and for each i, the h_{i,j} are the values taken by a
random variable H_i with P(H_i = h_{i,j}) = \pi_j. Testing

    H_0: \theta = 0    versus    H_1: \theta \ne 0

decides between the null hypothesis probabilities p and the k-dimensional Ney-
man smooth alternative probabilities \pi = (\pi_1, ..., \pi_m).
Put N = (N_1, ..., N_m)^T as the vector of observed counts in each category,
n = \sum_j N_j the sample size, D = diag(p_1, ..., p_m), and the k x m matrix H as
having entries h_{i,j}. Rayner and Best (1989) calculate the score statistic for this
situation as

    S_k = (N - np)^T H^T \Sigma^{-1} H (N - np) / n,

where \Sigma is the covariance matrix of the random variables H_i under the null
hypothesis.
For k = m - 1, requiring that H satisfies H D H^T = I_{m-1} and Hp = 0 gives
\Sigma = I_{m-1}, where I_k is the k x k identity matrix. This means that the score
test statistic S_k can be written as

    S_{m-1} = (N - np)^T H^T H (N - np) / n
            = (N - np)^T (D^{-1} - D^{-1} p p^T D^{-1}) (N - np) / n
            = \sum_{i=1}^{m} (N_i - n p_i)^2 / (n p_i) = X_P^2,
which define these orthogonal polynomials. Rayner and Best (1989) suggest
that this selection allows an r-th order moment departure interpretation for
the r-th component test statistic V_r. That is, a significantly large value for V_r
indicates that the data departs from the hypothesized distribution in terms of
moments of order less than or equal to r.
Rayner and Best (1989) clearly describe how these component statistics
should be used: either (1), in an EDA fashion to examine how the data differ
from the hypothesized distribution; or (2), when testing for the hypothesized
distribution, only the first few components along with a residual statistic (say,
V_1, V_2, V_3, V_4 and X_P^2 - V_1^2 - V_2^2 - V_3^2 - V_4^2) should be used. It is important to
avoid post mortem testing using what are discovered to be the most significant
components.
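The decomposition of the Pearson statistic into squared components can be verified numerically. The sketch below builds an orthonormal matrix H satisfying HDH^T = I and Hp = 0 by Gram-Schmidt on monomials of the class index (a stand-in for Rayner and Best's orthonormal polynomials, which it imitates under the stated constraints) and checks that the squared components sum to X^2:

```python
import math

def orthonormal_h(p):
    """Rows h_1, ..., h_{m-1} with sum_j h_rj h_sj p_j = delta_rs and
    sum_j h_rj p_j = 0 (i.e. HDH^T = I and Hp = 0), built by Gram-Schmidt
    on monomials of the class index under the p-weighted inner product."""
    m = len(p)
    def dot(u, v):
        return sum(ui * vi * pi for ui, vi, pi in zip(u, v, p))
    basis = [[1.0] * m]   # the constant vector enforces Hp = 0
    for r in range(1, m):
        v = [float(j) ** r for j in range(m)]
        for b in basis:
            c = dot(v, b) / dot(b, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        basis.append(v)
    return [[vi / math.sqrt(dot(v, v)) for vi in v] for v in basis[1:]]

def components(counts, p):
    """Component statistics V_r = sum_j h_rj (N_j - n p_j) / sqrt(n)."""
    n = sum(counts)
    resid = [nj - n * pj for nj, pj in zip(counts, p)]
    return [sum(hj * rj for hj, rj in zip(row, resid)) / math.sqrt(n)
            for row in orthonormal_h(p)]

counts, p = [30, 20, 28, 22], [0.25, 0.25, 0.25, 0.25]
n = sum(counts)
vs = components(counts, p)
x2 = sum((nj - n * pj) ** 2 / (n * pj) for nj, pj in zip(counts, p))
# the sum of squared components reproduces the Pearson statistic X^2
```

Any H satisfying the two constraints gives the same sum of squares; the particular choice of H only determines how X^2 is split into interpretable pieces.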
    H_0: \theta = 0    versus    H_1: \theta \ne 0.

For \hat{\beta} the MLE's of the nuisance parameters \beta, then \hat{p} = p(\hat{\beta}) and \hat{H} = H(\hat{\beta}).
Now the score statistic is

    \hat{S}_k = (N - n\hat{p})^T \hat{H}^T \hat{\Sigma}^{-1} \hat{H} (N - n\hat{p}) / n,

where \hat{\Sigma} = \Sigma(\hat{\beta}).
Partitioning the Pearson-Fisher Chi-Squared Goodness-of-Fit Statistic 49
(4.6)

provides the desired m - q - 1 component test statistics of X_{PF}^2 [Rayner (2000a)].
A program using the free statistical package R [Ihaka and Gentleman (1996)] to
obtain the component test statistics and p-values when testing for the normal
distribution is available from the author's website [Rayner (2000b)].
(i) First, the uncategorised data was used, and the composite uncategorised
tests for normality of Rayner and Best (1989, Chapter 6) were calculated
to obtain V_{3,u}, ..., V_{6,u};

(ii) Then the data were moderately categorised into the m_1 = 10 classes
(-∞,-3], (-3,-2], (-2,-1.5], (-1.5,-0.5], (-0.5,0], (0,0.5], (0.5,1.5],
(1.5,2], (2,3], (3,∞) to obtain V_{3,c1}, ..., V_{6,c1} using the method in Sec-
tion 4.3;

(iii) Finally, the data were coarsely categorised into m_2 = 6 classes (-∞,-2],
(-2,-1], (-1,0], (0,1], (1,2], (2,∞) to obtain V_{3,c2}, ..., V_{5,c2}, also using
the method in Section 4.3.
Figure 4.1: Sampling distribution of the V_3, V_4, V_5, V_6 statistics obtained from
R = 10,000 samples of size n = 20 taken from the standard normal distribution.
The top row is for the uncategorised data (u) using Rayner and Best's method
(1989, Chapter 6), and the other rows use my categorised method (Section 4.3)
with m_1 = 10 categories (middle, c1) and m_2 = 6 categories (bottom, c2).
References
1. Carolan, A. C. and Rayner, J. C. W. (2000). A note on the asymptotic
behaviour of smooth tests of goodness-of-fit, Manuscript in preparation.
Aïcha Zerbet
Universite Bordeaux 2, Bordeaux, France
    s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2.
Let

    Y_i = \frac{X_i - \bar{X}_n}{s_n},    i = 1, ..., n.

We note that under H_0 the statistic Y_i follows the so-called Thompson distrib-
ution with n - 2 degrees of freedom:

    P\{Y_i \le y\} = T_{n-2}(y)
        = \frac{\Gamma(\frac{n-1}{2})}{\sqrt{\pi(n-1)}\,\Gamma(\frac{n-2}{2})}
          \int_{-\sqrt{n-1}}^{y} \Big(1 - \frac{u^2}{n-1}\Big)^{\frac{n-4}{2}} du,
    |y| < \sqrt{n-1},

which does not depend on \mu and \sigma^2; T_{n-2}\big(\frac{x - \bar{X}_n}{s_n}\big) is the MVUE for \Phi\big(\frac{x - \mu}{\sigma}\big).
We consider \nu^* = (\nu_1^*, ..., \nu_r^*), the frequency vector obtained by grouping Y_1, Y_2,
..., Y_n over the intervals (x_0, x_1], (x_1, x_2], ..., (x_{r-1}, x_r). For testing H_0 we con-
sider, following Drost (1988) and Zhang (1999), the statistic Y_n^2 of Nikulin-Rao-
Robson-Moore, where

    X^2 = \sum_{i=1}^{r} \frac{(\nu_i^* - n p_i)^2}{n p_i}
and
    a(\nu^*) = \sum_{j=1}^{r} \frac{\nu_j^* (\varphi(x_j) - \varphi(x_{j-1}))}{p_j},

with \varphi the standard normal density.
Theorem 5.1.1 Under H_0, the statistic Y_n^2 has in the limit, when n \to \infty,
the chi-squared distribution with r - 1 degrees of freedom.
where \alpha is the significance level (0 < \alpha < 0.5). Note that Sturges' empirical
rule suggests

    r = 1 + \log_2 n.

With this choice of r, the expected number of observations in each class is not
small. If there is no alternative for H_0, then it is reasonable to choose p_i = 1/r.
For more details, see Drost (1988), and Greenwood and Nikulin (1996).
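A sketch of the equiprobable-class construction follows. Note that it computes only the Pearson part X^2 of the statistic, not the full Nikulin-Rao-Robson-Moore correction for estimated parameters; the bisection quantile routine and the toy data are illustrative assumptions:

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_quantile(u, mu, sigma, lo=-1e6, hi=1e6):
    """Inverse normal CDF by bisection (adequate for a sketch)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if normal_cdf(mid, mu, sigma) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def equiprobable_chi2(xs, r):
    """Pearson part of the statistic with r equiprobable classes whose
    boundaries are computed at the estimated mean and standard deviation."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    bounds = [normal_quantile(i / r, mu, sigma) for i in range(1, r)]
    counts = [0] * r
    for x in xs:
        counts[sum(1 for b in bounds if x > b)] += 1   # class index of x
    return sum((c - n / r) ** 2 / (n / r) for c in counts)

data = [4.77, 4.78, 4.79, 4.80, 4.76, 4.78, 4.79, 4.77, 4.78, 4.80,
        4.78, 4.77, 4.79, 4.78, 4.78]
x2 = equiprobable_chi2(data, r=3)
```

With equiprobable classes each class has expected count n/r, so a moderate r keeps every expected count away from zero, as the text requires.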
We suppose, under our hypothesis H_0, that the data are the realizations of a normal
N(\mu, \sigma^2) sample of size 58. On the basis of these data we have \bar{X}_n = 4.7808
and s_n^2 = 22980 \times 10^{-8}. To test H_0, we construct a chi-squared test based on
the statistic Y_n^2 with equiprobable classes. This hypothesis should be rejected
if Y_n^2 > x_\alpha, where x_\alpha is the \alpha-upper quantile of the chi-squared distribution
with (r - 1) degrees of freedom. If we choose, for example, r = 3 and \alpha = 0.1 then the
results of computations are as follows:
So, in this case we have x_\alpha = 4.6052. Since the observed value of Y_{58}^2 is less
than x_\alpha, we accept the hypothesis of normality H_0.
After using the statistics V_1, ..., V_n, we construct the vector of the order sta-
tistics

    V_{(\cdot)} = (V_{(1)}, ..., V_{(n)}),    V_{(1)} \le ... \le V_{(n)}.

Then, supposing \alpha is fixed (0 \le \alpha \le 0.5), we compute V_{(j(i))} for all X_i, i =
1, ..., n, where j(i) is the number of the V_{(j)} corresponding to X_i. If

    \frac{V_{(j(i))}}{j(i)} \le \frac{\alpha}{\lambda},

then we declare that X_i is an outlier observation (\lambda = 1 in the unilateral case;
\lambda = 2 in the bilateral case).
5.2.2 Example 2: Analysis of the data of Daniel (1959)
Daniel records the results of a 2^5 factorial experiment: 5 factors, each one
at two levels, where the 31 contrasts, in ascending order of absolute value, are

    0        0.0281  -0.0561  -0.0842  -0.0982   0.1263   0.1684   0.1964
    0.2245  -0.2526   0.2947  -0.3087   0.3929   0.4069   0.4209   0.4350
    0.4630  -0.4771   0.5472   0.6595   0.7437  -0.7437  -0.7577  -0.8138
   -0.8138  -0.8980   1.080   -1.305    2.147   -2.666   -3.143
By applying the chi-squared test for normality to these data (\alpha = 0.1),
we obtain for r = 4 the following results:
Since the value of the statistic Y_n^2 is higher than the quantile x_\alpha = 6.2514 of
the chi-squared law with 3 degrees of freedom corresponding to the level of
significance \alpha = 0.1, we must reject the null hypothesis.
Carrying out the test of Bol'shev to detect outliers at the same level
of significance \alpha = 0.1, we conclude that the observation X_{31} = -3.143 is an
outlier. We apply the chi-squared test again on the remainder of the data after
elimination of the outlier X_{31}. At the end we obtain the following results:
Statistical Tests for Normal Family 61
This time, we must accept the null hypothesis since Y_{30}^2 < 6.2514, noting
that the earlier rejection of the hypothesis H_0 was due to the presence of one
outlier.
Since we test H_0 against H_1, it is better [see, for example, Aguirre and Nikulin
(1994)] to consider the Neyman-Pearson classes I_1 and I_2 for grouping the data,
with class boundaries at the roots of the equation

    \varphi\Big(\frac{x - \mu}{\sigma}\Big) = g\Big(\frac{x - \mu}{\sigma}\Big),

equating the null density \varphi to the alternative density g.
Let ai = ai(Bn ), (i = 1, ... , m) be the roots of the last equation and let
l aj (iJ) 1
, - fn(
aj_l(B) Sn
x-
Sn
Xn
)
V
1
= Pj + r.:;Cj(()n), (j = 1,2, ... , m).
n
A
Theorem 5.3.1
where
and
References
1. Aguirre, N. and Nikulin, M. S. (1994). Goodness-of-fit test for the family
of logistic distributions, Qüestiió, 18, 317-335.
2. Bol'shev, L. N. and Ubaidullaeva, M. (1974). Chauvenet's test in the
classical theory of errors, Theory of Probability and its Applications, 19,
683-692.
3. Chauvenet, W. (1863). A Manual of Spherical and Practical Astronomy,
II, Philadelphia.
4. Daniel, C. (1959). Use of half-normal plots in interpreting factorial two-
level experiments, Technometrics, 1, 311-341.
5. Drost, F. C. (1988). Asymptotics for generalised chi-square goodness-of-
fit tests, In Amsterdam: Centre for Mathematics and Computer Sciences,
CWI tracts, 48.
6. Greenwood, P. and Nikulin, M. (1996). A Guide to Chi-squared Testing,
New York: John Wiley & Sons.
7. Grubbs, F. E. (1950). Sample criteria for testing outlying observations,
Annals of Mathematical Statistics, 21, 27-58.
8. Linnik, Yu. V. (1962). The Method of Least Squares and the Principles of
the Mathematical-Statistical Theory of Processing of Observations, Second
revised and augmented edition. Moscow: Gosudarstv. Izdat. Fiz. -Mat.
Lit.
9. Moore, D. S. and Spruill, M. C. (1975). Unified large-sample theory of
general chi-squared statistics for tests of fit, Annals of Statistics, 3, 599-
616.
10. Pearson, E. S. and Chandrasekar, C. (1936). The efficiency of statistical
tools and a criterion for rejection of outlying observation, Biometrika, 28,
308-320.
11. Rao, K. C. and Robson, D. S. (1974). A chi-squared statistic for goodness-
of-fit tests within the exponential family, Communications in Statistics,
3, 1139-1153.
64 A. Zerbet
Leo Gerville-Reache
Universite Bordeaux 2, Bordeaux, France
6.1 Introduction
Consider an individual of age x (in years) at time 0, taken as the origin. Denote by T_x his residual lifetime counted from this origin, which is a random variable. The probability law of T_x is characterized by the probability of death:
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
66 L. Gerville-Reache
So we easily obtain the annual death rate for the Makeham law as
$$
q_x = 1 - \exp\left(-a - \beta\,\frac{c-1}{\log c}\, c^x\right), \qquad a > 0,\ \beta > 0,\ c > 1. \qquad (6.4)
$$
(6.5)
We notice that if $s_{i_x} = 1$ and $t_{i_x} = 0$ for all $i_x$, then $Q^*_x = D^*_x / l_x$ (this is the case without censoring). Censoring is supposed to be independent of death and age.
Let
$$
\bar s_x = \frac{1}{l_x} \sum_{i_x=1}^{l_x} s_{i_x}, \qquad \bar t_x = \frac{1}{l_x} \sum_{i_x=1}^{l_x} t_{i_x}, \qquad \bar s'_x = \frac{1}{D^*_x} \sum_{i_x} s_{i_x};
$$
we then obtain
$$
Q^*_x = \frac{D^*_x}{l_x(\bar s_x - \bar t_x) + D^*_x(1 - \bar s'_x)}.
$$
$$
X^2_w = \sum_{x=0}^{w-1} \frac{(D^*_x - l^*_x q_x)^2}{l^*_x q_x (1 - q_x)} \qquad (6.8)
$$
follows asymptotically a chi-squared law with w degrees of freedom.
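As a quick computational sketch (not from the chapter), the statistic (6.8) can be evaluated as follows; the exposures $l^*_x$, death counts $D^*_x$, and fitted rates $q_x$ used here are hypothetical inputs:

```python
import numpy as np

def chi2_death_rates(D_star, l_star, q):
    """Statistic (6.8): sum over ages x = 0,...,w-1 of
    (D*_x - l*_x q_x)^2 / (l*_x q_x (1 - q_x))."""
    D_star, l_star, q = (np.asarray(v, dtype=float) for v in (D_star, l_star, q))
    return float(np.sum((D_star - l_star * q) ** 2 / (l_star * q * (1.0 - q))))

# hypothetical data for three ages
stat = chi2_death_rates(D_star=[3, 5, 4], l_star=[100, 90, 80], q=[0.02, 0.05, 0.06])
```

The observed value would then be compared with the appropriate chi-squared quantile with w degrees of freedom.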
(6.9)
It should be noted here that the statistic $X^2_w(\hat\theta)$ is different from the traditional Pearson statistic insofar as we define a goodness-of-fit test on conditional probabilities. We saw previously that the $D^*_x$ are independent random variables which follow binomial laws with parameters $l_x$ and $(\bar s_x - \bar t_x)q_x$.
The likelihood function of $(D^*_0, D^*_1, \ldots, D^*_{w-1})^t$ is
$$
L(\theta) = \prod_{x=0}^{w-1} C_{l_x}^{D^*_x} \left[(\bar s_x - \bar t_x)\, q_x(\theta)\right]^{D^*_x} \left[1 - (\bar s_x - \bar t_x)\, q_x(\theta)\right]^{l_x - D^*_x}. \qquad (6.10)
$$
One takes the estimator which maximizes the likelihood function:
$$
\hat\theta = \arg\max_{\theta} L(\theta). \qquad (6.11)
$$
6.3 Demonstration
Let $q^*_x(\theta) = (\bar s_x - \bar t_x)\, q_x(\theta)$. We have seen that the likelihood function of $(D^*_0, D^*_1, \ldots, D^*_{w-1})^t$ is
$$
L(\theta) = \prod_{x=0}^{w-1} C_{l^*_x}^{D^*_x} \left[q^*_x(\theta)\right]^{D^*_x} \left[1 - q^*_x(\theta)\right]^{l^*_x - D^*_x} \qquad (6.12)
$$
which yields
$$
\ln [L(\theta)] = \sum_{x=0}^{w-1} \left[ \ln C_{l^*_x}^{D^*_x} + D^*_x \ln q^*_x(\theta) + (l^*_x - D^*_x) \ln\left(1 - q^*_x(\theta)\right) \right].
$$
Under the assumption that $q_x(\theta)$, a function of $\theta = (\theta_1, \theta_2, \ldots, \theta_s)^t \in \Theta \subset R^s$, admits continuous partial derivatives, a necessary condition for $\hat\theta$ to be the maximum likelihood estimator of $\theta$ is
$$
\frac{\partial}{\partial\theta} \ln[L(\theta)] = 0_s
\;\Longleftrightarrow\;
\sum_{x=0}^{w-1} \left[ \frac{D^*_x}{q^*_x(\theta)} \cdot \frac{\partial q^*_x(\theta)}{\partial\theta_i} - \frac{l^*_x - D^*_x}{1 - q^*_x(\theta)} \cdot \frac{\partial q^*_x(\theta)}{\partial\theta_i} \right] = 0, \qquad i = 1, \ldots, s.
$$
Chi-Squared Test for the Law of Annual Death Rates 69
Hence,

This is the equation which characterizes the maximum likelihood estimator in the case when there is no censoring, as shown in Gerville-Reache and Nikulin (2000). Therefore, we can follow the derivation of the asymptotic law of the statistic (6.9) of the non-censored case simply by replacing $N$ by $N^*$, $l_x$ by $l^*_x$, $D_x$ by $D^*_x$, and $Q_x$ by $Q^*_x$ in the result of Gerville-Reache and Nikulin (2000).
References
1. Chernoff, H. and Lehmann, E. L. (1954). The use of maximum likelihood
estimates in chi-square test of goodness-of-fit, Annals of Mathematical
Statistics, 25, 579-586.
7.1 Introduction
An omnibus goodness-of-fit (GOF) test for normality, with nuisance location and scale parameters $\mu$ and $\sigma$, is due to Shapiro and Wilk (1965). Their ingenious
test is based on the regression of the observed sample order statistics on the ex-
pected values of order statistics in a sample of the same size from the standard
normal distribution. Based on extensive numerical studies it has been revealed
that their test has good power properties against a broad class of alternatives.
However, the actual distribution of their test statistic, even under the null hy-
pothesis, is quite involved; tables have been provided up to sample size 50,
and beyond that suitable approximations have been incorporated to approxi-
mate them well (Shapiro 1998). In this context, some asymptotic distributional
problems have also been discussed by De Wet and Venter (1973), though that
provides very little simplicity in this respect. It has been thoroughly discussed
in Shapiro (1998) that generally such asymptotic approximations entail some
74 P. K. Sen
loss of power. One of the objectives of the current study is to focus on some
asymptotics that would provide good explanation for this shortcoming.
Basically, the asymptotic distribution of the Shapiro-Wilk type of tests is
governed by a second-order asymptotic distributional representation (SOADR)
property that has been systematically presented in Jureckova and Sen (1996,
Ch. 4). Borrowing strength from such results, Jureckova and Sen (2000) con-
sidered a general class of GOF-tests for a class of underlying distributions (in-
cluding the normal one as a notable case), and proposed alternative tests based
on a pair of location estimators that are first-order equivalent (FOE). In their
set-up too, the SOADR results play a vital role. We refer to Jureckova, Picek,
and Sen (2001) for some numerical studies relating to such robust GOF tests.
In view of the fact that such GOF tests are for composite hypotheses and
the alternatives are not necessarily contiguous, there may not be an omnibus
test having better power property for the entire class of alternatives. In the
same vein, the usual (asymptotic) optimality properties of likelihood ratio type
tests may not be tenable here. As such, we find it more convincing to stress the
simplicity of the asymptotic null hypothesis distribution and other robustness
properties. In this context as well, there is a basic role of the SOADR results
most of which are known by this time. Along with the preliminary notion, the
Shapiro-Wilk (1965) type of test statistics are presented in Section 7.2. SOADR
results are presented in Section 7.3. In the light of these results, in Section 7.4,
the contemplated asymptotics are discussed. The last section is devoted to
some concluding remarks.
Also let $m_n = (m_{n1}, \ldots, m_{nn})'$ and $V_n = ((v_{nij}))$ be the vector of expected order statistics and their covariance matrix, respectively. Then the best linear unbiased estimator (BLUE) of $\sigma$ is given by
$$
\hat\sigma_n = \sum_{i=1}^{n} a_{ni} X_{n:i}. \qquad (7.3)
$$
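The coefficients $a_n \propto V_n^{-1} m_n$ behind (7.3) can be sketched numerically. Exact tables of $m_n$ and $V_n$ are not reproduced here; this illustration estimates them by Monte Carlo for a small n, which is an assumption of the sketch rather than the chapter's method:

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 5, 200_000
E = np.sort(rng.standard_normal((B, n)), axis=1)  # standard normal order statistics
m = E.mean(axis=0)                                # m_n: expected order statistics
V = np.cov(E, rowvar=False)                       # V_n: their covariance matrix

w = np.linalg.solve(V, m)                         # V_n^{-1} m_n
a = w / (m @ w)                                   # normalized so that a' m_n = 1 (unbiasedness)

x = np.array([1.2, -0.7, 0.3, 2.1, -1.5])         # a toy sample
sigma_hat = a @ np.sort(x)                        # BLUE-type estimate of sigma as in (7.3)
```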
Note that by virtue of the symmetry of $\Phi$, we have $a'_n 1_n = 0$, but $a'_n a_n$ may not be strictly equal to one. For that reason, Shapiro and Wilk (1965) considered a modified estimator wherein they let
$$
a^0_n = \left\{ m'_n V_n^{-1} V_n^{-1} m_n \right\}^{-1/2} \left(V_n^{-1} m_n\right). \qquad (7.5)
$$
(7.7)
where the $e_i$ are independent and identically distributed random variables with zero mean and unit variance; under $H_0$, the $e_i$ have the standard normal distribution. Further, let $e_{n:i}$, $i = 1, \ldots, n$, be the associated order statistics, so that $X_{n:i} = \mu + \sigma e_{n:i}$, $i = 1, \ldots, n$. We also use the notation
Note that under the null hypothesis $e^*_{n:i}$ has the expectation $m_{ni}$, so that rewriting $W^*_n$ as
$$
W^*_n = (n-1)\, \frac{ e^{*\prime}_n \left[\, I_n - n^{-1} 1_n 1'_n - a^0_n a^{0\prime}_n \right] e^*_n }{ e^{*\prime}_n \left[\, I_n - n^{-1} 1_n 1'_n \right] e^*_n }, \qquad (7.11)
$$
we claim that its distribution is free from the nuisance parameters, and under the null hypothesis $W^*_n$ would be stochastically smaller (still nonnegative) [as may be verified by invoking the idempotent nature of the two matrices in the numerator and denominator of (7.11)]. Under alternatives, the expectation vector would differ from $m_n$, and as a result, the expectation of the numerator quadratic form would be away from zero, so that $W^*_n$ would be $O_p(n)$. This provides the rationale of the SW-test. In the same way, for a suitable approximation to $a^0_n$, say denoted by $b_n$, the corresponding analogous form of the modified SW-test statistic can be expressed as
(7.15)
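The ratio form (7.11) can be coded directly; any normalized score vector with $a^{0\prime} 1_n = 0$ and $a^{0\prime} a^0 = 1$ may be supplied, and the toy vector below is an assumption for illustration, not the actual SW scores:

```python
import numpy as np

def sw_type_statistic(x, a0):
    """W*_n of (7.11): (n-1) times a ratio of quadratic forms in the ordered sample.

    Requires a0' 1_n = 0 and a0' a0 = 1."""
    x = np.asarray(x, dtype=float)
    a0 = np.asarray(a0, dtype=float)
    n = x.size
    e = np.sort(x)                        # ordered sample
    centered = e - e.mean()               # applies I_n - n^{-1} 1_n 1_n'
    den = centered @ centered             # e*'[I - 11'/n] e*
    num = den - (a0 @ e) ** 2             # subtract the a0 a0' quadratic form (uses a0'1 = 0)
    return (n - 1) * num / den

# toy scores: a normalized antisymmetric contrast (hypothetical, not the SW scores)
a0 = np.array([-1.0, 0.0, 1.0]) / np.sqrt(2.0)
w_star = sw_type_statistic([0.0, 1.0, 2.0], a0)
```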
$$
U^{(2)}_n = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} g_2(X_i, X_j), \qquad (7.18)
$$
where $F_n(x)$ is the empirical d.f. of the $X_i$, and the score function $\psi_n(u)$ converges to a smooth score function $\psi(u) = \Phi^{-1}(u)$, the quantile function of the standard normal d.f., for all $u \in (0,1)$. In fact, $\psi_n(u)$ can be approximated well by $\Phi^{-1}(nu/(n+1))$, $u \in (0,1)$; we refer to Jung (1955) for an excellent motivation and useful illustration. Although in Jung's case there was a smooth score function while we have here the scores generated by the expected order statistics values and their covariance matrix, smoothness can be imported up to the second-order terms by using the Hajek (1968) projection results and representations on the projected terms. Thus, side by side, we let $\xi(F) = \int_0^1 F^{-1}(u)\,\psi(u)\,du$ and note that when $F$ is normal (with a scale parameter $\sigma$), $\xi(F) = \sigma$. Then, proceeding as in Section 7.4 of Sen (1981) [viz., (7.4.30) to (7.4.37)], we obtain that
so that $Z_{n1}$ is an average of i.i.d. random variables with zero mean and a finite positive variance, say $\gamma^2$. The component $Z_{n2}$ involves second-order terms; by virtue of unbiasedness of $\hat\sigma_n$ when $F$ is normal, we have $E(Z_{n2}) = 0$ for $F$ normal, and proceeding as in Theorem 4.3.1 of Jureckova and Sen (1996), it can be shown that $Z_{n2} = O_p(n^{-1})$. Actually, their Theorem 4.5.2 (p. 155) gives a SOADR result for $\hat\sigma_n$. In passing, we may remark that when $F$ is itself normal, we have $Z_{n1} = (1/2)U_n^{(1)}$, where the latter is defined by (7.16).
GOF Tests for Normality 79
(7.23)
(7.25)
so that the right-hand side of (7.25) cannot be greater than $(n-1)$. It is to be noted further that by definition
$$
m'_n m_n = n - \sum_{i=1}^{n} v_{nii} = n - \operatorname{trace}(V_n), \qquad (7.26)
$$
so that
(7.27)
where $\rho^2_n = \{(m'_n V_n^{-1} m_n)^2\}/\{(m'_n m_n)(m'_n V_n^{-2} m_n)\}$ is bounded from above by 1. Note further that
(7.28)
and as a result,
(7.29)
This displays the role of the score function $a_n$ in relation to the constant $c_n$.
This result will be useful in the sequel.
For some $b_n$ other than $a^0_n$, if we define $c^*_n$ as in (7.23), we would have (7.24) intact, though (7.25) and (7.27) would be somewhat different. Specifically, parallel to (7.25), we would have
(7.30)
(7.31)
(7.32)
(7.33)
(7.34)
(7.35)
Or, in other words, $\gamma_n$ is an eigenvalue of $V_n^{-1}$ with respect to the eigenvector $m_n$. As such, we have
(7.36)
and repeated use of this equation leads us to the identity
where $\rho^2_n$ is defined after (7.27) and $c^*_n$ refers to the specific case of the Shapiro-Francia (1972) modification. Let us now consider the general form of $c^*_n$ when $b_n = B_n m_n$, where $B_n$ is a symmetric positive definite matrix such that $1'_n B_n m_n = 0$ and $m'_n B_n B_n m_n = 1$. Let then $m'_n B_n m_n = d_n\, m'_n m_n$, so that by a similar argument we claim that $d_n$ is an eigenvalue of $B_n$ with respect to the eigenvector $m_n$. Then, it follows from the above that for the entire class of such $B_n$, the corresponding $c^*_n$ will be equal to $c_n$ in (7.25), and by (7.29), we write this as
$$
c_n = (2n+1)\left[\,n - \operatorname{trace}(V_n)\,\right]/2n. \qquad (7.38)
$$
This characterization is essentially based on the properties of the eigenvalues and eigenvectors. On the other hand, for some other $b_n$ which cannot be expressed exactly as $B_n m_n$, we could write $b_n = B_n m^0_n$ for some scores vector $m^0_n$ that replaces $m_n$. Although, for large sample sizes, $m_n$ and $m^0_n$ could be very close to each other (as is the case with some other modifications of the SW-test proposed in the literature [Shapiro (1998)]), we would have an eigenvector different from $m_n$, and as a result, the corresponding $\rho^2_n$ may not be strictly equal to one, and this in turn may also cause perturbation in the associated $c^*_n$. This point will be made clear in the next section.
By virtue of (7.11), for the desired asymptotics, we may assume without any loss of generality that $\mu = 0$, $\sigma = 1$. Thus, effectively, we work with the reduced order statistics $e_{n:i}$; then $S^2_n$ can be defined as $(n-1)^{-1} e^{*\prime}_n [I_n - n^{-1} 1_n 1'_n] e^*_n$, and we have the following SOADR result:
$$
S^2_n = 1 + U^{(1)}_n + U^{(2)}_n, \qquad U^{(2)}_n = \frac{1}{n-1}\{1 - Z_1^2\} + o_p(n^{-3/2}), \qquad (7.39)
$$
where $U^{(1)}_n$ is defined by (7.16) (but based on the $e_{n:i}$), and $Z_1^2 = n \bar e_n^2$ has the chi-square distribution with 1 DF, independently of $U^{(1)}_n$. Note that $S^2_n$, $Z_1$ are jointly sufficient statistics for the normal $F$, so that $U^{(2)}_n$ is also a function of the sufficient statistics.
We rewrite $\hat\sigma_n$, defined by (7.3) and (7.4), as $\hat\sigma_n = (m'_n V_n^{-1} e^*_n)/(m'_n V_n^{-1} m_n)$ and note that here $\hat\sigma_n$ has expectation 1. Let then
$$
Q_{ni} = E\{\hat\sigma_n \mid e_i\} = \sum_{j=1}^{n} a_{nj}\, E[e_{n:j} \mid e_i] = \sum_{j=1}^{n} a_{nj}\, q_{nj}(e_i) = q_n(e_i), \qquad i = 1, \ldots, n, \qquad (7.40)
$$
where $Z_{n1}$ is defined as in (7.22), but now based on the $e_i$ instead of the $X_i$. We write
$$
\hat\sigma_n - 1 = Z_{n1} + R_n, \qquad \text{and} \qquad 2 Z_{n1} = U^{(1)}_n. \qquad (7.43)
$$
Therefore, defining $a^0_n$ as in (7.5), we have

where in the last step we make use of (7.43) and (7.39). It may be noted that $\tfrac{1}{4} n U_n^{(1)2}$ has asymptotically a chi-square distribution with 1 DF, independently of $Z_1^2$, and hence the distribution of $W^*_n$, under the null hypothesis of normality, depends on (i) the nonstochastic $\operatorname{trace}(V_n)$ (which is $\ge 1$), and (ii) the stochastic $Z_1^2$, $n U_n^{(1)2}$, as well as the residual $R_n$ (which is $O_p(n^{-1})$). On using the basic results of Hoeffding (1953) on expected order statistics, we claim that as $n \to \infty$,
$$
\frac{1}{n}(m'_n m_n) \to 1, \quad \text{so that} \quad \frac{1}{n}\operatorname{trace}(V_n) \to 0. \qquad (7.46)
$$
With this simplification, we note that
(7.47)
We shall see that $W^{*0}_n$ has a simpler asymptotic distribution. To see this, we write
(7.48)
where the last factor on the right-hand side is, by (7.39), $1 + O_p(n^{-1/2})$. Consequently, by the Slutsky theorem, $W^{*0}_n$ and $W^{**}_n$ both have the same asymptotic distribution, if they have any at all. To study this, we incorporate (7.39) and (7.43), and conclude that $W^{**}_n$ has the asymptotic representation
(7.49)
At this stage, noting that for normal $F$, $E(R_n) = 0$, we make use of Theorem 4.5.2 of Jureckova and Sen (1996) (after verifying that for the normal distribution, the associated scores $a_n$ satisfy the needed regularity conditions), and obtain a SOADR for $R_n$. We may write

where the $\lambda_k$ are nonnegative Fourier coefficients and the $Z_k$ are independent standard normal deviates. Here $\lambda_1 = 1$, $\lambda_2 = 1/2$. This form is in close proximity to the form suggested by De Wet and Venter (1973) for some allied forms of $W_n$. We provide here a clear representation involving appropriate SOADR results that have been studied in the literature, mostly in the past decade.
There are certain distinct advantages in writing $W^{*0}_n$ in terms of such degenerate U-statistics. It not only provides access to the study of the asymptotic properties of the SW-test statistic, but also allows us to make use of suitable resampling plans to generate asymptotic critical levels of the SW-test statistic, a much needed task to make the SW-test applicable for large sample sizes. In this context, we may refer to Huskova and Janssen (1993), where the validity of bootstrapping for degenerate U-statistics has been critically examined, and we may adopt with some advantage their methodology to generate the asymptotic null distribution of $W^{*0}_n$. The asymptotic distribution for $W^*_n$ can be readily obtained from that of $W^{*0}_n$.
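The limiting law $\sum_{k\ge 1} \lambda_k(Z_k^2 - 1)$ can be sampled directly by Monte Carlo once the Fourier coefficients are available; only $\lambda_1 = 1$ and $\lambda_2 = 1/2$ are given in the text, so the geometric tail below is a hypothetical stand-in for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = np.array([1.0, 0.5, 0.25, 0.125, 0.0625])  # lambda_1 = 1, lambda_2 = 1/2; tail assumed
Z = rng.standard_normal((100_000, lam.size))     # independent standard normal deviates
limit_sample = (Z ** 2 - 1.0) @ lam              # draws from sum_k lambda_k (Z_k^2 - 1)
crit_95 = np.quantile(limit_sample, 0.95)        # Monte Carlo 95% critical level
```

A resampling calibration in the spirit of Huskova and Janssen (1993) would replace the assumed $\lambda_k$ by data-driven estimates.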
Let us now discuss the case of the Shapiro-Francia (1972) modification of the SW-test statistic, considered in (7.29). If we define their estimate of $\sigma$ by
(7.52)
and note that it is unbiased but not the BLUE of $\sigma$ (while $\hat\sigma_n$ is the BLUE), then, exploiting the BLUE characterization and noting that $\hat\sigma^*_n$ is asymptotically first-order efficient (FOE), we can write
where $Z_{n1}$, $Z_{n2}$ are defined as in (7.20) and $Z_{n3}$ is orthogonal to $Z_{n2}$ (and is also $O_p(n^{-1})$). Also, the SOADR result applies to $\hat\sigma^*_n$ as well. As such, if we proceed as in (7.19) through (7.50), we obtain the following representation for $\widetilde W^*_n$:
$$
\widetilde W^*_n \to_D \sum_{k \ge 1} \lambda_k (Z_k^2 - 1). \qquad (7.54)
$$
This explains why, in considering a modified form of the SW-test statistic, there is a need to adjust the critical values, and doing that might make those modifications more competitive with the SW-test itself. Of course, as regards the power properties, even for large sample sizes, we need to pay adequate attention to the intricate asymptotics, and these are considered in the next section.
The asymptotics for the Shapiro-Francia test in general go over to other cases where $b_n$ is related to $m_n$ by suitable matrix multiplication, as has been discussed after (7.37). However, we need to assume that they are FOE and admit a SOADR. If we consider other $b_n$ that are related to various approximations to $m_n$ and $V_n$, as has been discussed in Shapiro (1998, pp. 481-482), the eigenvalues will be different, though quite close to the ones discussed earlier. Further, because in such a case $\rho^2_n$, defined in an analogous way, will typically be less than one (though quite close to 1), while $n^{-1}(m'_n m_n) \to 1$, we could see that there will be additional variation due to $\rho^2_n$ being less than one, and more so due to other terms that appear in the second-order expansion. As such, the SW-approximation may not generally apply here very satisfactorily. Although their critical values can be estimated by similar resampling methods, because of more variable second-order terms, their distribution will be more dispersed, and as a result, there could be some loss of power.
Let us recall the notation introduced in (7.3) through (7.7). Note that under
On the other hand, $S^2_n$, defined by (7.13), is an unbiased and consistent estimator of the second moment (say, $\nu_2$) of the standardized d.f. $G$ (whose scale parameter is taken as 1); $\nu_2$ may not necessarily be equal to 1; take, for example, $G$ logistic with unit scale parameter. Further, using the results of Hoeffding (1953) on expected order statistics, we claim that
(7.58)
(7.59)
mn V-
2
n - ( I
n mn )( I' n l'n I ).
while using the Hoeffding (1953) results, along with the fact [Jung (1955)] that $V_n^{-1} m_n$ can be well approximated by $m_n$, we conclude that as $n$ increases,
(7.61)
where $F_0$ is the standard normal d.f., so that its second moment is equal to 1, and by definition, $\nu_2 = \int_0^1 (G^{-1}(u))^2\, du$. As a result, it follows by some standard steps that under the alternative that $G$ is the true d.f.,
On the other hand, from the results in Section 7.4, we conclude that under the
null hypothesis of normality,
As a result, we conclude that the SW-test for normality is consistent for the entire class of alternatives for which $\Delta$ is strictly less than one. This is the case when the two quantile functions $\Phi^{-1}(p)$ and $G^{-1}(p)$ (for $p \in (0,1)$) do not coincide for all values of $p$. In that way, the domain of consistency of the SW-test includes all nonnormal distributions admitting finite second-order moments (so that the normal BLUE of $\sigma$ converges to a limit other than the scale parameter of such a distribution). This is certainly a very strong result in the sense that it includes separable families of alternatives in a very natural way, and it includes mixture models also in the same vein. For example, against $F$ normal, we might be interested in the alternative that it is a contaminated d.f., namely,
$$
F(x) = (1-\eta)\,\Phi(x) + \eta\, H(x), \qquad (7.64)
$$
where $\eta > 0$ is small, and $H(x)$ has a heavier tail; it could be a normal d.f. with a larger variance or even some other one, like the Laplace, that has a heavier tail than the normal. It is also possible to treat $\eta$ as a sequence converging to 0, and in that way local contamination models are also contemplated in this setup. However, consistency is a minimum requirement for any GOF test, and it should not be overemphasized. There may not be a unique GOF test for normality with power-optimality against such a broad class of alternatives. For this reason, Jureckova and Sen (2000) discussed such asymptotic power pictures for other tests.
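A small sketch of sampling from the contaminated d.f. (7.64), taking $H$ to be a zero-mean normal with a larger variance; the values of $\eta$ and the scale of $H$ are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

def rcontam(size, eta=0.1, h_scale=3.0):
    """Draws from F = (1 - eta) * Phi + eta * H, with H = N(0, h_scale^2)."""
    heavy = rng.random(size) < eta        # which observations come from H
    x = rng.standard_normal(size)
    x[heavy] *= h_scale                   # rescale the contaminated fraction
    return x

sample = rcontam(10_000)
# the mixture variance is (1 - eta) + eta * h_scale^2 = 1.8 here
```

Feeding such samples to an SW-type statistic and recording rejection rates gives Monte Carlo power against alternatives of the form (7.64).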
As regards the consistency property of modified SW-type tests (as discussed in earlier sections), the picture is the same. It is only with respect to power properties that there could be some difference. In view of (7.55), there is a need to calibrate the critical levels of such modified test statistics (such as $W^{**}_n$), as otherwise for local alternatives there might not be a perceptible difference, particularly when the sample size is large. However, if we consider a fixed alternative (which is more appropriate in the present context), then the rate at which the power function goes to one [in the Bahadur (1960) sense] might be different. The basic difficulty for such a study stems from the fact that, due to their complicated null hypothesis distributions, the exact Bahadur slopes for such statistics are not simple to formulate, while the approximate Bahadur slope comparisons are known to be deficient in certain respects. As such, the empirical evidence acquired from extensive numerical studies made so far [viz., Shapiro (1998)] should be used as a stepping stone for further comparative studies.
References
1. Bahadur, R. R. (1960). Stochastic comparison of tests. Annals of Math-
ematical Statistics, 31, 276-295.
2. De Wet, T. and Venter, J. H. (1973). Asymptotic distributions of quadratic
forms with application to test of fit, Annals of Statistics, 1, 380-387.
Abstract: There have been numerous tests proposed in the literature to de-
termine whether or not an exponential model is appropriate for a given data
set. These procedures range from graphical techniques, to tests that exploit
characterization results for the exponential distribution. In this article, we pro-
pose a goodness-of-fit test for the exponential distribution based on general
progressively Type-II censored data. This test based on spacings generalizes a
test proposed by Tiku (1980). We derive the exact and asymptotic null dis-
tribution of the test statistic. The results of a simulation study of the power
under several different alternatives like the Weibull, Lomax, Lognormal and
Gamma distributions are presented. We also discuss an approximation to the
power based on normality and compare the results with those obtained by sim-
ulation. A wide range of sample sizes and progressive censoring schemes have
been considered for the empirical study. We also compare the performance of
this procedure with two standard tests for exponentiality, viz. the Cramer-von
Mises and the Shapiro-Wilk test. The results are illustrated on some real data
for the one- and two-parameter exponential models. Finally, some extensions
to the multi-sample case are suggested.
8.1 Introduction
The exponential distribution is one of the most widely used life-time models
in the areas of life testing and reliability. The volume by Balakrishnan and
Basu (1995) [see also Johnson, Kotz, and Balakrishnan (1994, Chapter 19)]
provides an extensive review of the genesis of the distribution and its properties,
90 N. Balakrishnan, H. K. T. Ng, and N. Kannan
$$
H_0: X \sim \mathrm{Exp}(\sigma) \quad \text{against} \quad H_1: X \not\sim \mathrm{Exp}(\sigma). \qquad (8.3)
$$
For convenience, we will suppress the censoring scheme in the notation of the $X_{i:m:n}$'s.
Define the normalized spacings $S_1, S_2, \ldots, S_m$ as
$$
\begin{aligned}
S_1 &= n\, X^{(R_1,\ldots,R_m)}_{1:m:n}, \\
S_2 &= (n - R_1 - 1)\left(X^{(R_1,\ldots,R_m)}_{2:m:n} - X^{(R_1,\ldots,R_m)}_{1:m:n}\right), \\
S_3 &= (n - R_1 - R_2 - 2)\left(X^{(R_1,\ldots,R_m)}_{3:m:n} - X^{(R_1,\ldots,R_m)}_{2:m:n}\right), \ \ldots
\end{aligned}
$$
The numerator of the test statistic is a linear combination of the spacings with decreasing weights, and the denominator is the sum of the spacings. The test statistic is clearly scale invariant, with small and large values of T leading to the rejection of $H_0$. The statistic T was suggested by Tiku (1980) for complete and doubly Type-II censored samples. Balakrishnan (1983) studied the power of the test against a variety of alternatives, and showed that the test (for complete samples) performs well compared to standard tests in the literature.
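The normalized spacings and the statistic T of (8.5) can be sketched as follows; the censoring scheme R and the data are hypothetical:

```python
import numpy as np

def tiku_T(x, R):
    """T = sum_{i<m} (m-i) S_i / [(m-1) sum_i S_i] from progressively
    Type-II censored order statistics x_(1) < ... < x_(m), scheme R."""
    x = np.asarray(x, dtype=float)
    R = np.asarray(R)
    m = x.size
    n = m + R.sum()                                   # total sample size
    # coefficient of the ith spacing: n - R_1 - ... - R_{i-1} - (i - 1)
    coef = n - np.concatenate(([0], np.cumsum(R)[:-1])) - np.arange(m)
    S = coef * np.diff(np.concatenate(([0.0], x)))    # normalized spacings S_1,...,S_m
    weights = m - np.arange(1, m)                     # m-1, m-2, ..., 1
    return (weights @ S[:-1]) / ((m - 1) * S.sum())

t_obs = tiku_T([0.4, 0.9, 1.7, 3.2], R=[1, 0, 2, 0])  # hypothetical censored sample
```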
(8.6)
where
$$
Z_j = \frac{\sum_{i=1}^{j} S_i}{\sum_{i=1}^{m} S_i}, \qquad j = 1, 2, \ldots, m-1.
$$
Since $S_1, S_2, \ldots, S_m$ are all independent and identically distributed as exponential with scale parameter $\sigma$, the joint p.d.f. of $S_1, S_2, \ldots, S_m$ is given by
We then have
$$
\begin{aligned}
S_2 &= Z_2 Z_m - Z_1 Z_m, \\
S_3 &= Z_3 Z_m - Z_2 Z_m, \\
&\ \,\vdots \\
S_{m-1} &= Z_{m-1} Z_m - Z_{m-2} Z_m, \\
S_m &= Z_m - Z_{m-1} Z_m,
\end{aligned}
$$
with Jacobian
$$
|J| = \det \begin{pmatrix}
Z_m & 0 & \cdots & 0 & 0 & Z_1 \\
-Z_m & Z_m & \cdots & 0 & 0 & Z_2 - Z_1 \\
\vdots & & \ddots & & & \vdots \\
0 & 0 & \cdots & -Z_m & Z_m & Z_{m-1} - Z_{m-2} \\
0 & 0 & \cdots & 0 & -Z_m & 1 - Z_{m-1}
\end{pmatrix}.
$$
The joint distribution of $Z_1, Z_2, \ldots, Z_{m-1}$ is thus the same as the joint distribution of the $(m-1)$ order statistics (say, $U_{(1)}, \ldots, U_{(m-1)}$) obtained from a random sample of size $(m-1)$ from the Uniform(0,1) distribution (say, $U_1, \ldots, U_{m-1}$). Hence, we immediately have
This implies that the null distribution of the test statistic T is exactly the same as that of the average of $(m-1)$ i.i.d. Uniform(0,1) random variables. Therefore, the null distribution of T tends to normality very rapidly as m increases. It is readily verified that the mean of the limiting distribution is $E(T) = \frac{1}{2}$ and the variance is $\mathrm{Var}(T) = \frac{1}{12(m-1)}$.
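A Monte Carlo check of this null distribution result: under exponentiality the $S_i$ are i.i.d. exponential, and T should behave as the mean of $(m-1)$ Uniform(0,1) variables, with $E(T) = 1/2$ and $\mathrm{Var}(T) = 1/(12(m-1))$:

```python
import numpy as np

rng = np.random.default_rng(3)
m, B = 8, 50_000
S = rng.exponential(size=(B, m))                 # i.i.d. spacings under exponentiality
weights = m - np.arange(1, m)                    # m-1, ..., 1
T = (S[:, :-1] @ weights) / ((m - 1) * S.sum(axis=1))

mean_err = abs(T.mean() - 0.5)                   # should be near 0
var_err = abs(T.var() - 1.0 / (12 * (m - 1)))    # should be near 0
```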
A Test of Exponentiality 95
Remark 8.3.1 The above expressions $E(T) = \frac{1}{2}$ and $\mathrm{Var}(T) = \frac{1}{12(m-1)}$ can also be derived by taking expectations on both sides of
$$
T\,(m-1) \sum_{i=1}^{m} S_i = \sum_{i=1}^{m-1} (m-i)\, S_i
$$
and using Basu's theorem with the facts that $2\sum_{i=1}^{m} S_i/\sigma$ is distributed as $\chi^2_{2m}$, $2S_i/\sigma$ is distributed as $\chi^2_2$, and that the ancillary statistic T is independent of the complete sufficient statistic $\sum_{i=1}^{m} S_i$.
$$
\Pr(T \ge c) = \Pr\left[ \frac{\sum_{i=1}^{m-1}(m-i)\, S_i}{(m-1) \sum_{i=1}^{m} S_i} \ge c \right] = \Pr(L \ge 0),
$$
where
$$
L = \sum_{i=1}^{m-1} (m-i)\, S_i - c\,(m-1) \sum_{i=1}^{m} S_i.
$$
where
$$
\begin{aligned}
a_i &= [(m-i) - c(m-1)](R_i + 1) + (n - i - R_1 - \cdots - R_i), \qquad i = 1, \ldots, m-1, \\
a_m &= -c\,(m-1)(R_m + 1).
\end{aligned}
$$
For large values of m, we may approximate the probability by
The single and product moments of progressively Type-II right censored or-
der statistics occurring in the above expression may be obtained by first-order
approximations; see Balakrishnan and Rao (1997). The idea is to use the prob-
ability integral transformation
$$
X^{(R_1,\ldots,R_m)}_{i:m:n} \stackrel{d}{=} F^{-1}\!\left(U^{(R_1,\ldots,R_m)}_{i:m:n}\right), \qquad (8.8)
$$
where $U^{(R_1,\ldots,R_m)}_{i:m:n}$ is the ith progressively Type-II right censored order statistic from the uniform U(0,1) distribution, and $F^{-1}$ is the inverse cdf of the underlying distribution.
The mean, variance, and covariance of progressively Type-II censored order statistics from the Uniform U(0,1) distribution are given by [see Balakrishnan and Aggarwala (2000)]
$$
\begin{aligned}
E(U_{i:m:n}) &= 1 - b_i, \qquad i = 1, \ldots, m, \qquad &(8.9) \\
\mathrm{Var}(U_{i:m:n}) &= a_i b_i, \qquad i = 1, \ldots, m, \qquad &(8.10) \\
\mathrm{Cov}(U_{i:m:n}, U_{j:m:n}) &= a_i b_j, \qquad 1 \le i \le j \le m, \qquad &(8.11)
\end{aligned}
$$
where
$$
b_i = \prod_{k=1}^{i} \frac{R_k + R_{k+1} + \cdots + R_m + m - k + 1}{R_k + R_{k+1} + \cdots + R_m + m - k + 2},
\qquad
a_i = \prod_{k=1}^{i} \frac{R_k + R_{k+1} + \cdots + R_m + m - k + 2}{R_k + R_{k+1} + \cdots + R_m + m - k + 3} - b_i,
$$
and where $F^{-1(1)}(u) = \dfrac{dF^{-1}(u)}{du} = \dfrac{1}{f(F^{-1}(u))}$. Balakrishnan and Rao (1997) used these results to derive expressions for the approximate best linear unbiased estimators for an arbitrary location-scale family of distributions.
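The first-order approximation built on (8.8) can be sketched numerically: the means $E(U_{i:m:n})$ follow from the product representation for uniform progressively censored order statistics, and plugging them into $F^{-1}$ approximates $E(X_{i:m:n})$. Writing $\gamma_k = m - k + 1 + R_k + \cdots + R_m$ (the number of units still on test), a minimal sketch is:

```python
import numpy as np

def mean_uniform_progressive(R):
    """E(U_{i:m:n}) = 1 - prod_{k=1}^{i} gamma_k / (gamma_k + 1),
    with gamma_k = m - k + 1 + R_k + ... + R_m."""
    R = np.asarray(R)
    m = R.size
    suffix = R.sum() - np.concatenate(([0], np.cumsum(R)[:-1]))  # R_k + ... + R_m
    gamma = (m - np.arange(1, m + 1) + 1) + suffix
    return 1.0 - np.cumprod(gamma / (gamma + 1.0))

# sanity check against the classical case R = (0,...,0): E(U_(i)) = i/(n+1)
EU_classical = mean_uniform_progressive([0, 0, 0, 0])

# first-order approximation to exponential means via F^{-1}(u) = -log(1 - u)
EU = mean_uniform_progressive([2, 0, 1])
approx_means = -np.log(1.0 - EU)
```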
We would like to point out that even though limiting results for linear combinations of regular order statistics are available [see, for example, David (1981)], such results under progressive censoring have not been studied yet. It is unclear whether the results in the regular case can be easily extended to progressive censoring.
Instead of rewriting the test statistic as a linear combination of the pro-
gressively censored order statistics, we may directly approximate the power by
considering the test statistic T. We may write
$$
T = \frac{\sum_{i=1}^{m-1} (m-i)\, S_i}{(m-1) \sum_{i=1}^{m} S_i}. \qquad (8.15)
$$
We then have,
(8.16)
See Kendall and Stuart (1969) for details. We may then approximate the dis-
tribution of T by a normal distribution with mean and variance given by the
above expressions.
(8.19)
where the scale $\sigma > 0$ and the location $\mu$ are unknown parameters. In this case, the progressively Type-II right censored spacings $S^*_1, S^*_2, \ldots, S^*_m$ are defined as
$$
S^*_1 = n\left(X^{(R_1,\ldots,R_m)}_{1:m:n} - \mu\right),
$$
where the $S_i$'s are as defined earlier in (8.4). Once again, $S^*_1, S^*_2, \ldots, S^*_m$ are all independent and identically distributed as exponential with scale parameter $\sigma$. Since the first spacing $S^*_1$ involves the unknown parameter $\mu$, the test statistic T proposed earlier in (8.5) may be modified as
$$
T^* = \frac{\sum_{i=2}^{m-1} (m-i)\, S^*_i}{(m-2) \sum_{i=2}^{m} S^*_i}. \qquad (8.22)
$$
Following the same procedure outlined in Section 8.3, the null distribution of the test statistic T* can be derived. The distribution of T* is the same as the distribution of the average of $(m-2)$ i.i.d. Uniform(0,1) random variables. Hence, the asymptotic null distribution of T* is normal with mean $E(T^*) = \frac{1}{2}$ and variance $\mathrm{Var}(T^*) = \frac{1}{12(m-2)}$. Furthermore, the power approximation procedure discussed in Section 8.4 can also be adapted to this two-parameter exponential case.
$$
2\,\Phi\!\left(\frac{0.4\ldots - 0.5}{\sqrt{1/84}}\right) = 2 \times 0.26810 = 0.53620.
$$
Based on this p-value, we fail to reject the null hypothesis that the random
sample is from an exponential distribution. This is consistent with the findings
of Nelson (1982) and Viveros and Balakrishnan (1994).
Spinelli and Stephens (1987) studied tests based on regression and the em-
pirical distribution function for testing the null hypothesis of exponentiality
using the complete sample. They found that all the test statistics were highly
significant (with p-value < 0.01) and rejected the null hypothesis that the data
are exponentially distributed with p.d.f. (8.18).
The test statistic in (8.22) for testing the validity of a two-parameter exponential distribution is computed as
$$
T^* = \frac{\sum_{i=2}^{m-1} (m-i)\, S^*_i}{(m-2) \sum_{i=2}^{m} S^*_i} = \frac{19983.72}{26506.8} = 0.75391,
$$
To test whether k independent progressively Type-II censored samples $X^{(R_{i1},\ldots,R_{im_i})}_{j:m_i:n_i}$, $i = 1, 2, \ldots, k$, come from exponential populations $E(\mu_i, \sigma_i)$, we can generalize the test statistic T* in (8.22) as follows:
(8.23)
where $T^*_i$ is the test statistic computed from the ith sample. Small and large values of T* indicate the non-exponentiality of at least one of the k samples. If we wish to test that the samples come from one-parameter exponential populations $E(\sigma_i)$, we can generalize the test statistic T in (8.5) as follows:
close to the simulated values for most cases considered. It is of interest to note
that combinations of censoring schemes for the k samples provide distinctly
different power values.
8.9 Conclusions
In this article, we have proposed goodness-of-fit tests for the one- and two-
parameter exponential models under general progressive Type-II censoring.
These tests are based on normalized spacings, generalizing tests proposed by
Tiku (1980). The exact and asymptotic null distribution of the test statistics
have been derived. Further, two approximations to compute the power under
different alternatives have been suggested.
Results of the simulation study for a wide range of sample sizes and cen-
soring schemes show that the test performs well in detecting departures from
exponentiality. If the alternative model is distinctly different from exponential,
the power values are close to 1. The approximations for the power are very close
to the values obtained through simulations. The proposed test procedures are
illustrated on some real data for the one- and two-parameter exponential mod-
els. The conclusions drawn from these tests are consistent with those drawn
by other authors using different procedures. Finally, some extensions to the
multi-sample case have been suggested.
There are several theoretical aspects that still need to be looked at carefully.
In particular, it would be useful to develop limit theorems for linear combina-
tions of progressively Type-II censored order statistics. This would provide
theoretical justification for the normal approximations suggested in this paper.
Finally, it would also be interesting to develop analogous goodness-of-fit tests
for the general location-scale family of distributions.
Table 8.2: Monte Carlo power estimates for Weibull distribution at 10% and
5% levels of significance
Weibull(0.5)
10% 5%
c.s. T App(L) App(W) A WE T App(L) App(W) A WE
1 0.71672 0.79033 0.70467 0.60746 0.55796 0.61452 0.69872 0.59668 0.52883 0.45663
2 0.51847 0.66998 0.57311 0.37713 0.33930 0.39990 0.55197 0.45694 0.29668 0.24614
3 0.57377 0.72616 0.63247 0.43776 0.39414 0.45581 0.61273 0.51555 0.35465 0.29455
4 0.83379 0.83901 0.81327 0.66873 0.63183 0.76001 0.77399 0.72796 0.59327 0.53617
5 0.69 96 0.78209 0.73798 0.47934 0.44554 0.59111 0.68995 0.63459 0.39233 0.34245
6 0.85449 0.85515 0.83683 0.68503 0.65829 0.79102 0.80064 0.76174 0.61718 0.56993
7 0.90389 0.87568 0.88596 0.71755 0.68749 0.85230 0.82815 0.82318 0.64605 0.59762
8 0.84024 0.85959 0.86140 0.59537 0.56385 0.76609 0.80119 0.78752 0.51014 0.46244
9 0.92164 0.89050 0.90996 0.74722 0.72349 0.87656 0.84947 0.85511 0.68152 0.63910
10 0.80630 0.83480 0.79047 0.69662 0.64007 0.72293 0.76423 0.69720 0.61928 0.54240
11 0.58011 0.69950 0.62607 0.39267 0.36022 0.46239 0.58494 0.51114 0.31052 0.26233
12 0.70747 0.81095 0.75718 0.51728 0.48377 0.60430 0.72898 0.65652 0.43550 0.38432
13 0.95650 0.91312 0.94612 0.80383 0.77610 0.92708 0.88138 0.90818 0.74337 0.69770
14 0.86101 0.87312 0.88381 0.56856 0.54223 0.78663 0.81655 0.81548 0.48023 0.43650
15 0.94605 0.91715 0.95127 0.75547 0.73113 0.91204 0.88713 0.91625 0.68957 0.64844
16 0.99036 0.95235 0.98797 0.87649 0.86136 0.98199 0.93601 0.97634 0.83067 0.80121
17 0.97063 0.94666 0.97797 0.74441 0.72546 0.94722 0.92471 0.95751 0.67077 0.63439
18 0.97969 0.95122 0.98410 0.79352 0.77495 0.96253 0.93245 0.96856 0.72763 0.69231
19 0.95994 0.91637 0.95033 0.82456 0.79270 0.93151 0.88588 0.91435 0.76548 0.71624
20 0.83761 0.85969 0.86431 0.53713 0.50973 0.75680 0.79580 0.78887 0.44811 0.40481
21 0.94038 0.92336 0.95982 0.72449 0.70207 0.90314 0.89470 0.92723 0.65344 0.61401
22 0.99840 0.97427 0.99792 0.93655 0.92493 0.99638 0.96575 0.99528 0.90704 0.88510
23 0.98985 0.97129 0.99328 0.80242 0.78826 0.97958 0.95890 0.98529 0.73386 0.70346
24 0.99819 0.97611 0.99834 0.92981 0.91833 0.99601 0.96820 0.99615 0.89855 0.87707
25 0.99970 0.98529 0.99961 0.96329 0.95653 0.99916 0.98049 0.99901 0.94347 0.92864
26 0.99872 0.98636 0.99921 0.90278 0.89330 0.99691 0.98104 0.99797 0.85950 0.83900
27 0.99924 0.98660 0.99949 0.92795 0.92069 0.99818 0.98174 0.99866 0.89362 0.87540
Weibull(2.0)
10% 5%
c.s. T App(L) App(W) A² WE T App(L) App(W) A² WE
1 0.81945 0.89734 0.93811 0.25905 0.23759 0.68849 0.81533 0.84627 0.17926 0.14879
2 0.49956 0.55783 0.55755 0.14362 0.13726 0.34470 0.39873 0.39878 0.08634 0.07574
3 0.60582 0.68462 0.68882 0.16607 0.15746 0.44619 0.52574 0.52552 0.10594 0.09065
4 0.91172 0.94854 0.97258 0.30676 0.25619 0.82316 0.89889 0.92478 0.21770 0.16360
5 0.72826 0.78185 0.78672 0.19354 0.16742 0.58402 0.64632 0.64752 0.12369 0.09740
6 0.87935 0.92728 0.94502 0.24654 0.21070 0.76253 0.85062 0.86532 0.16674 0.12852
7 0.95772 0.97482 0.98732 0.34094 0.27326 0.90236 0.94434 0.96109 0.24467 0.17720
8 0.89272 0.92350 0.93404 0.26365 0.21288 0.79847 0.84933 0.85772 0.17796 0.13134
9 0.96362 0.97831 0.98915 0.33296 0.27218 0.91216 0.94911 0.96429 0.23641 0.17514
10 0.93802 0.95734 0.99257 0.42401 0.35878 0.87462 0.92656 0.97241 0.32530 0.24949
11 0.58036 0.63285 0.63318 0.15917 0.14561 0.42602 0.47993 0.47986 0.09810 0.08196
12 0.73598 0.79830 0.80971 0.19894 0.17756 0.58615 0.66508 0.66851 0.12964 0.10439
13 0.99378 0.99339 0.99923 0.53410 0.40719 0.98161 0.98593 0.99659 0.42401 0.29094
14 0.91227 0.93098 0.93719 0.28490 0.22011 0.83312 0.86453 0.86957 0.19431 0.13708
15 0.98015 0.98502 0.99220 0.37401 0.29222 0.94816 0.96359 0.97446 0.27154 0.19144
16 0.99956 0.99932 0.99994 0.61063 0.45501 0.99800 0.99806 0.99964 0.49531 0.33160
17 0.99342 0.99455 0.99691 0.45004 0.32440 0.98097 0.98533 0.98956 0.33607 0.21861
18 0.99723 0.99742 0.99910 0.51169 0.37044 0.99111 0.99282 0.99636 0.39531 0.25746
19 0.99673 0.99372 0.99978 0.63063 0.48642 0.98981 0.98830 0.99890 0.52833 0.36543
20 0.89434 0.91226 0.91695 0.26949 0.20967 0.80681 0.83526 0.83849 0.18194 0.12919
21 0.97705 0.98067 0.98895 0.37580 0.29183 0.94247 0.95524 0.96683 0.27360 0.19119
22 1.00000 0.99994 1.00000 0.76301 0.57640 0.99993 0.99982 0.99999 0.66509 0.45036
23 0.99878 0.99896 0.99953 0.54811 0.38065 0.99612 0.99674 0.99805 0.42790 0.26786
24 1.00000 0.99994 1.00000 0.67704 0.50448 0.99984 0.99977 0.99997 0.56469 0.37931
25 1.00000 1.00000 1.00000 0.81038 0.61641 1.00000 0.99999 1.00000 0.71880 0.49141
26 0.99999 0.99996 0.99999 0.69585 0.49341 0.99988 0.99985 0.99996 0.58462 0.36861
27 0.99999 0.99998 1.00000 0.74535 0.54158 0.99997 0.99994 0.99999 0.64051 0.41547
Table 8.3: Monte Carlo power estimates for Lomax distribution at 10% and 5%
levels of significance
Lomax(0.5)
10% 5%
c.s. T App(L) App(W) A² WE T App(L) App(W) A² WE
1 0.82303 0.93658 0.79763 0.68983 0.68715 0.77238 0.91681 0.73874 0.64701 0.63292
2 0.21607 0.31514 0.23585 0.12015 0.11848 0.13322 0.23460 0.15311 0.06864 0.06345
3 0.35568 0.48739 0.37394 0.17834 0.17364 0.25550 0.37055 0.27518 0.11914 0.10764
4 0.93605 0.96461 0.92302 0.82777 0.81768 0.91141 0.95555 0.89321 0.79650 0.77966
5 0.50119 0.61830 0.51626 0.22339 0.20745 0.39396 0.50200 0.41055 0.15846 0.13678
6 0.87040 0.93725 0.83274 0.73201 0.71268 0.83437 0.92106 0.78946 0.69530 0.66573
7 0.97759 0.97571 0.97297 0.90291 0.89338 0.96663 0.97001 0.95990 0.88231 0.86585
8 0.84985 0.88405 0.85203 0.54634 0.51649 0.79228 0.84954 0.79410 0.47630 0.43396
9 0.97492 0.97467 0.97028 0.89796 0.88618 0.96332 0.96873 0.95639 0.87614 0.85794
10 0.89205 0.95408 0.87313 0.76905 0.76184 0.85680 0.94141 0.83023 0.73191 0.71581
11 0.15319 0.22198 0.16594 0.10395 0.10412 0.08547 0.15746 0.09761 0.05345 0.05197
12 0.30073 0.38308 0.31154 0.14420 0.13878 0.20851 0.28029 0.22031 0.08717 0.07877
13 0.99203 0.98155 0.99126 0.94559 0.93672 0.98753 0.97742 0.98616 0.93202 0.91870
14 0.51911 0.57796 0.53041 0.17396 0.15807 0.40612 0.45920 0.41917 0.11173 0.09385
15 0.87879 0.88714 0.84641 0.63073 0.57919 0.83686 0.85845 0.80122 0.57361 0.51157
16 0.99947 0.98876 0.99965 0.98655 0.98275 0.99906 0.98639 0.99934 0.98181 0.97631
17 0.95434 0.91384 0.95475 0.61017 0.55853 0.92672 0.88995 0.92805 0.53408 0.46811
18 0.98875 0.94983 0.98856 0.83422 0.80025 0.98088 0.93788 0.98073 0.79008 0.74271
19 0.99196 0.98152 0.99125 0.94550 0.93663 0.98750 0.97739 0.98614 0.93192 0.91861
20 0.27506 0.31672 0.29002 0.11468 0.10877 0.18182 0.21508 0.19538 0.06215 0.05715
21 0.67153 0.72728 0.67173 0.29673 0.26005 0.58254 0.64608 0.58271 0.22258 0.18253
22 0.99998 0.99205 0.99999 0.99707 0.99574 0.99997 0.99041 0.99998 0.99578 0.99347
23 0.95153 0.90840 0.95330 0.49868 0.43662 0.92107 0.87992 0.92351 0.41043 0.33957
24 0.99844 0.97332 0.99211 0.96215 0.94623 0.99740 0.96775 0.98879 0.95045 0.92816
25 1.00000 0.99392 1.00000 0.99937 0.99884 1.00000 0.99269 1.00000 0.99906 0.99835
26 0.99923 0.96646 0.99920 0.91593 0.88434 0.99846 0.95875 0.99839 0.88664 0.84007
27 0.99997 0.97647 0.99992 0.98098 0.97197 0.99987 0.97147 0.99982 0.97274 0.95854
Lomax(2.0)
10% 5%
c.s. T App(L) App(W) A² WE T App(L) App(W) A² WE
1 0.30907 0.28451 0.21716 0.17958 0.16996 0.22452 0.21132 0.14050 0.12285 0.10901
2 0.11154 0.18329 0.12257 0.09971 0.09993 0.05553 0.13127 0.06576 0.05001 0.05019
3 0.12760 0.19612 0.13394 0.10209 0.10199 0.06827 0.14152 0.07401 0.05164 0.05101
4 0.40318 0.34265 0.28542 0.21513 0.19097 0.31774 0.24740 0.19945 0.15517 0.12782
5 0.14348 0.18665 0.14423 0.10234 0.10098 0.07840 0.12598 0.08144 0.05257 0.05097
6 0.34351 0.26750 0.22272 0.19374 0.17011 0.26190 0.18905 0.14520 0.13441 0.11023
7 0.48634 0.41182 0.35544 0.24258 0.20713 0.39831 0.30515 0.26333 0.17987 0.14128
8 0.24019 0.25206 0.21992 0.11499 0.11005 0.15713 0.16879 0.14033 0.06245 0.05625
9 0.47963 0.40199 0.34740 0.24079 0.20562 0.39190 0.29672 0.25631 0.17800 0.13977
10 0.36079 0.30957 0.25018 0.19619 0.18124 0.27491 0.22425 0.16867 0.13741 0.11851
11 0.10710 0.16402 0.11660 0.09873 0.10080 0.05239 0.11186 0.06150 0.04911 0.04951
12 0.12625 0.17748 0.12956 0.10021 0.10212 0.06505 0.12231 0.07084 0.05093 0.05065
13 0.55863 0.47640 0.42100 0.27115 0.22498 0.46974 0.36730 0.32598 0.20418 0.15610
14 0.13945 0.16562 0.14154 0.10118 0.09983 0.07666 0.10342 0.07927 0.05174 0.05031
15 0.28328 0.23982 0.21548 0.13414 0.12020 0.20092 0.15828 0.13758 0.07975 0.06556
16 0.69864 0.60886 0.56620 0.32850 0.26260 0.61953 0.51160 0.47313 0.25704 0.18883
17 0.28986 0.29036 0.27554 0.11381 0.10897 0.19623 0.19331 0.18472 0.06078 0.05647
18 0.40740 0.39444 0.37317 0.13461 0.12268 0.30562 0.28362 0.27218 0.07802 0.06664
19 0.55835 0.47602 0.42058 0.27110 0.22499 0.46945 0.36690 0.32560 0.20421 0.15603
20 0.11311 0.14384 0.11934 0.09941 0.09869 0.05773 0.08832 0.06338 0.05016 0.04953
21 0.17575 0.19212 0.16926 0.10483 0.10222 0.10522 0.12260 0.10054 0.05452 0.05210
22 0.79492 0.70105 0.67852 0.37994 0.29782 0.72959 0.62078 0.59503 0.30652 0.21959
23 0.26916 0.26786 0.25884 0.10682 0.10408 0.17633 0.17449 0.16974 0.05611 0.05310
24 0.59787 0.49216 0.46968 0.23909 0.18454 0.50840 0.38396 0.36923 0.17237 0.11847
25 0.86092 0.76694 0.76334 0.42547 0.33020 0.80912 0.70166 0.69209 0.34902 0.24792
26 0.51287 0.49757 0.48655 0.13395 0.12176 0.40291 0.38241 0.37653 0.07531 0.06508
27 0.65010 0.61974 0.60888 0.17696 0.15089 0.55128 0.51589 0.50570 0.11144 0.08636
A Test of Exponentiality 107
Table 8.4: Monte Carlo power estimates for Lognormal distribution at 10% and
5% levels of significance
Lognormal(0.5)
10% 5%
c.s. T App(L) App(W) A² WE T App(L) App(W) A² WE
1 0.97317 0.99822 0.99953 0.57405 0.45279 0.93250 0.99138 0.99521 0.45287 0.32599
2 0.92186 0.95576 0.95264 0.36626 0.30817 0.82028 0.87119 0.86690 0.26666 0.20415
3 0.95509 0.98294 0.98435 0.44054 0.35978 0.88607 0.93814 0.93878 0.32760 0.24556
4 0.98369 0.99916 0.99951 0.61419 0.40693 0.95757 0.99580 0.99662 0.48638 0.28805
5 0.97790 0.99171 0.99046 0.49165 0.34131 0.93753 0.96818 0.96493 0.37131 0.23011
6 0.98599 0.99917 0.99931 0.53921 0.37542 0.95939 0.99506 0.99516 0.41481 0.25898
7 0.98804 0.99948 0.99947 0.62071 0.37729 0.96865 0.99725 0.99697 0.49201 0.26345
8 0.99050 0.99799 0.99747 0.57392 0.35703 0.97141 0.99078 0.98897 0.44477 0.24460
9 0.99008 0.99964 0.99960 0.60367 0.37816 0.97302 0.99787 0.99752 0.47486 0.26274
10 0.99883 0.99999 1.00000 0.91263 0.73497 0.99648 0.99993 1.00000 0.84945 0.61977
11 0.98880 0.99641 0.99603 0.58133 0.43831 0.96171 0.98239 0.98083 0.45922 0.31393
12 0.99844 0.99971 0.99988 0.70643 0.54012 0.99226 0.99791 0.99862 0.58682 0.40452
13 0.99968 1.00000 1.00000 0.93916 0.63184 0.99916 0.99999 1.00000 0.88579 0.50824
14 0.99983 0.99998 0.99998 0.83711 0.52330 0.99918 0.99985 0.99980 0.73616 0.39745
15 0.99995 1.00000 1.00000 0.88454 0.58418 0.99977 0.99999 1.00000 0.80170 0.45680
16 0.99987 1.00000 1.00000 0.94584 0.59018 0.99970 1.00000 1.00000 0.89330 0.46627
17 0.99999 1.00000 1.00000 0.92174 0.56214 0.99993 1.00000 0.99999 0.85183 0.43561
18 0.99999 1.00000 1.00000 0.93584 0.57814 0.99993 1.00000 1.00000 0.87549 0.45259
19 0.99998 1.00000 1.00000 0.99074 0.80103 0.99993 1.00000 1.00000 0.97817 0.70248
20 0.99997 1.00000 1.00000 0.90574 0.60691 0.99986 0.99998 0.99998 0.83300 0.48040
21 1.00000 1.00000 1.00000 0.95619 0.70913 1.00000 1.00000 1.00000 0.91205 0.58757
22 0.99999 1.00000 1.00000 0.99518 0.73613 0.99999 1.00000 1.00000 0.98564 0.62309
23 1.00000 1.00000 1.00000 0.98858 0.69547 1.00000 1.00000 1.00000 0.96991 0.57676
24 1.00000 1.00000 1.00000 0.99267 0.73544 0.99999 1.00000 1.00000 0.97930 0.62046
25 1.00000 1.00000 1.00000 0.99540 0.71991 1.00000 1.00000 1.00000 0.98692 0.60573
26 1.00000 1.00000 1.00000 0.99435 0.71123 1.00000 1.00000 1.00000 0.98352 0.59598
27 1.00000 1.00000 1.00000 0.99510 0.71781 1.00000 1.00000 1.00000 0.98578 0.60334
Lognormal(1.0)
10% 5%
c.s. T App(L) App(W) A² WE T App(L) App(W) A² WE
1 0.20500 0.29174 0.27413 0.11028 0.10874 0.12145 0.21294 0.18012 0.05800 0.05538
2 0.24330 0.27591 0.26382 0.10210 0.10222 0.13892 0.18235 0.16012 0.05253 0.05076
3 0.23727 0.28077 0.26728 0.10238 0.10244 0.13645 0.18965 0.16506 0.05240 0.05095
4 0.18549 0.24289 0.22190 0.11502 0.10925 0.11137 0.17729 0.14109 0.06320 0.05743
5 0.21975 0.25508 0.24096 0.09901 0.09883 0.12839 0.17135 0.14707 0.05157 0.04913
6 0.20836 0.27502 0.26113 0.11233 0.10829 0.12532 0.19362 0.16736 0.06123 0.05592
7 0.18059 0.20926 0.18788 0.12156 0.11289 0.11081 0.15041 0.11520 0.06750 0.05970
8 0.16825 0.21182 0.19219 0.10035 0.10005 0.09605 0.14480 0.11486 0.05062 0.04966
9 0.18455 0.21834 0.19881 0.12050 0.11267 0.11222 0.15713 0.12356 0.06690 0.05911
10 0.29002 0.41242 0.40955 0.13184 0.12372 0.19056 0.30891 0.29732 0.07706 0.06719
11 0.40977 0.43709 0.43629 0.11688 0.11248 0.27017 0.30055 0.29513 0.06533 0.05891
12 0.42399 0.46854 0.46800 0.11977 0.11430 0.28509 0.32841 0.32411 0.06761 0.05954
13 0.21872 0.28042 0.27187 0.14004 0.12703 0.13898 0.20047 0.18414 0.08305 0.06956
14 0.40377 0.42535 0.42453 0.11755 0.10848 0.27285 0.29961 0.29391 0.06296 0.05608
15 0.33082 0.40361 0.40213 0.11992 0.11126 0.21963 0.28892 0.28084 0.06599 0.05802
16 0.20384 0.21442 0.20436 0.15210 0.13327 0.13046 0.15171 0.12945 0.09250 0.07404
17 0.26370 0.29442 0.28827 0.11259 0.10725 0.16711 0.20260 0.18821 0.06001 0.05530
18 0.20548 0.24718 0.23741 0.11441 0.10902 0.12479 0.17094 0.15149 0.06116 0.05656
19 0.27206 0.37697 0.37584 0.16762 0.14182 0.18277 0.28612 0.27495 0.10273 0.08003
20 0.59741 0.61445 0.61335 0.14869 0.12681 0.45061 0.46773 0.46762 0.08561 0.06788
21 0.54944 0.59360 0.59258 0.14769 0.12547 0.41023 0.45363 0.45334 0.08552 0.06683
22 0.21695 0.22832 0.22276 0.18074 0.14920 0.14186 0.16210 0.14505 0.11487 0.08614
23 0.41991 0.44414 0.44395 0.13418 0.11751 0.29660 0.32185 0.31788 0.07462 0.06233
24 0.27574 0.35465 0.35287 0.15682 0.13262 0.18403 0.25781 0.24838 0.09450 0.07436
25 0.22026 0.19908 0.19353 0.18767 0.15237 0.14658 0.13547 0.12114 0.12045 0.08813
26 0.25109 0.28162 0.27666 0.12797 0.11442 0.16094 0.19489 0.18231 0.06991 0.05983
27 0.19203 0.22635 0.21912 0.13294 0.11846 0.11469 0.15524 0.13924 0.07404 0.06270
Table 8.5: Monte Carlo power estimates for Gamma distribution at 10% and
5% levels of significance
Gamma(0.75)
10% 5%
c.s. T App(L) App(W) A² WE T App(L) App(W) A² WE
1 0.18415 0.20186 0.14326 0.13881 0.13226 0.10768 0.14541 0.08019 0.08196 0.07277
2 0.17285 0.24856 0.18925 0.13105 0.12493 0.09870 0.18408 0.11582 0.07548 0.06725
3 0.17698 0.24464 0.18837 0.13428 0.12747 0.10133 0.17894 0.11471 0.07845 0.06911
4 0.20126 0.18905 0.15029 0.13406 0.12826 0.12070 0.12841 0.08523 0.07908 0.07091
5 0.19227 0.23571 0.20012 0.12946 0.12390 0.11468 0.16427 0.12364 0.07521 0.06779
6 0.22455 0.19843 0.15971 0.13788 0.13169 0.14011 0.13668 0.09233 0.08220 0.07366
7 0.21915 0.18877 0.16068 0.13048 0.12508 0.13728 0.12449 0.09287 0.07550 0.06775
8 0.21378 0.22985 0.20544 0.12798 0.12310 0.13250 0.15524 0.12734 0.07413 0.06615
9 0.23260 0.19744 0.17014 0.13328 0.12758 0.14705 0.13109 0.10003 0.07760 0.06965
10 0.20925 0.20162 0.15722 0.14563 0.13917 0.12582 0.13937 0.08970 0.08834 0.07907
11 0.19293 0.25732 0.21050 0.13239 0.12763 0.11360 0.18612 0.13223 0.07732 0.07132
12 0.21813 0.27227 0.23039 0.14202 0.13576 0.13313 0.19523 0.14796 0.08415 0.07732
13 0.25790 0.20852 0.18919 0.13505 0.12917 0.16655 0.13580 0.11353 0.07895 0.07154
14 0.24734 0.27371 0.25698 0.12966 0.12466 0.15974 0.18714 0.16810 0.07488 0.06859
15 0.28922 0.26257 0.24672 0.13907 0.13300 0.19353 0.17766 0.15884 0.08176 0.07444
16 0.30315 0.23667 0.22608 0.13627 0.13220 0.20500 0.15465 0.14195 0.07859 0.07327
17 0.29555 0.30082 0.29251 0.13444 0.13005 0.19685 0.20602 0.19660 0.07629 0.07161
18 0.29813 0.29032 0.28190 0.13498 0.13071 0.19872 0.19721 0.18755 0.07702 0.07198
19 0.26838 0.21659 0.19826 0.14554 0.13880 0.17535 0.14130 0.12004 0.08561 0.07798
20 0.25713 0.28799 0.27132 0.13654 0.13307 0.16713 0.19870 0.17981 0.07880 0.07305
21 0.30867 0.32241 0.30814 0.14807 0.14217 0.20850 0.22575 0.21093 0.08878 0.08112
22 0.35995 0.28396 0.27819 0.13843 0.13502 0.25210 0.19031 0.18315 0.08067 0.07448
23 0.35395 0.36277 0.35837 0.13581 0.13221 0.24655 0.25689 0.25220 0.07848 0.07290
24 0.39902 0.33282 0.32834 0.14345 0.13933 0.29018 0.23045 0.22488 0.08515 0.07792
25 0.40398 0.32213 0.31855 0.13907 0.13471 0.29295 0.22116 0.21665 0.07973 0.07439
26 0.39963 0.39085 0.38809 0.13739 0.13324 0.29200 0.28087 0.27803 0.07866 0.07312
27 0.40108 0.37789 0.37505 0.13829 0.13378 0.29301 0.26930 0.26625 0.07910 0.07378
Gamma(2.0)
10% 5%
c.s. T App(L) App(W) A² WE T App(L) App(W) A² WE
1 0.46065 0.62475 0.62450 0.12548 0.12036 0.31030 0.46339 0.46149 0.07321 0.06460
2 0.31230 0.35326 0.35024 0.10908 0.10780 0.18855 0.23024 0.22497 0.05885 0.05508
3 0.35993 0.42015 0.41776 0.11336 0.11036 0.22623 0.27977 0.27830 0.06245 0.05763
4 0.51783 0.67413 0.67629 0.13134 0.11943 0.37470 0.52369 0.52329 0.07698 0.06509
5 0.42380 0.46831 0.46754 0.11795 0.11029 0.28395 0.32517 0.32399 0.06544 0.05888
6 0.50328 0.64337 0.64437 0.12347 0.11446 0.35268 0.47912 0.47894 0.07051 0.06077
7 0.57175 0.71129 0.71348 0.13545 0.12040 0.42260 0.56653 0.56661 0.07713 0.06545
8 0.52100 0.58340 0.58317 0.12623 0.11473 0.37236 0.43239 0.43223 0.07143 0.06223
9 0.58871 0.72468 0.72717 0.13433 0.11948 0.43735 0.57774 0.57791 0.07667 0.06468
10 0.64455 0.79434 0.81290 0.18295 0.15373 0.50515 0.67882 0.68469 0.11693 0.09002
11 0.40708 0.44438 0.44338 0.12129 0.11380 0.27068 0.30636 0.30449 0.06812 0.06056
12 0.49898 0.56138 0.56073 0.13200 0.12107 0.34671 0.40423 0.40411 0.07695 0.06556
13 0.76435 0.86731 0.87863 0.19476 0.15400 0.64243 0.77634 0.78295 0.12190 0.08749
14 0.64708 0.67861 0.67864 0.15059 0.13037 0.50255 0.53621 0.53617 0.08910 0.07071
15 0.74438 0.81444 0.81902 0.16688 0.14117 0.60845 0.69161 0.69312 0.10085 0.07851
16 0.84495 0.91453 0.92076 0.19910 0.15700 0.74340 0.84214 0.84661 0.12385 0.08902
17 0.79250 0.82753 0.82892 0.17654 0.14476 0.67360 0.71617 0.71654 0.10737 0.08076
18 0.81585 0.85752 0.86045 0.18534 0.14961 0.70285 0.75773 0.75903 0.11441 0.08376
19 0.83133 0.90917 0.92771 0.24927 0.18278 0.73048 0.84521 0.86022 0.16829 0.11168
20 0.67630 0.70367 0.70371 0.16159 0.13807 0.53490 0.56499 0.56494 0.09688 0.07688
21 0.78997 0.83131 0.83647 0.18907 0.15461 0.66470 0.71712 0.71917 0.11754 0.08968
22 0.93598 0.96845 0.97375 0.25943 0.18517 0.87880 0.93434 0.94018 0.17081 0.11083
23 0.90123 0.91524 0.91700 0.21660 0.16524 0.82293 0.84274 0.84371 0.13600 0.09562
24 0.94513 0.96927 0.97321 0.24655 0.18208 0.89032 0.93273 0.93696 0.15958 0.10905
25 0.96211 0.98182 0.98469 0.27168 0.19690 0.92165 0.95854 0.96208 0.18012 0.11732
26 0.94769 0.95990 0.96196 0.24890 0.18512 0.89636 0.91673 0.91854 0.16102 0.10871
27 0.95310 0.96790 0.97045 0.25895 0.19005 0.90658 0.93157 0.93412 0.16873 0.11246
Table 8.7: Simulated and approximate values of the power of T* at 10% and
5% levels of significance
References
1. Balakrishnan, N. (1983). Empirical power study of a multi-sample test of
exponentiality based on spacings, Journal of Statistical Computation and
Simulation, 18, 265-271.
11. Nelson, W. (1982). Applied Life Data Analysis, New York: John Wiley
& Sons.
9.1 Introduction
In a number of life testing experiments, it is impossible to monitor units con-
tinuously; instead one inspects the units intermittently or at prespecified times.
Thus the data consists of the number of failures or deaths in an interval. For
example, when testing a large number of inexpensive units for time to failure,
it may be cost prohibitive to connect each one to a monitoring device. Thus
an inspector may inspect them at predetermined time intervals and record the
number of units that failed since the last inspection. Similarly, in cancer
follow-up studies where the variable of interest is time to relapse, a patient may
be monitored only at regular intervals or may seek help only after tangible
symptoms of the disease appear. Thus the time to relapse cannot be specified
exactly, but will only be known to lie between two successive clinic visits [see
Yu et al. (2000) for details]. Grouped data also arise when it is not possible to
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
114 S. Gulati and J. Neus
measure units precisely due to the finite precision of the measuring instrument.
As a result, one can only record the interval in which a measurement falls. See
Steiner et al. (1994) for some excellent examples of how grouped data can arise
naturally in industry.
The first test to assess the goodness-of-fit of any model was developed by
Karl Pearson and is the well-known chi-square test. The chi-square test is also
the first test developed for grouped data since the test discretises any given
data set and compares the observed cell counts to the expected cell counts.
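The chi-square recipe for grouped data is direct: compare the observed cell counts with the counts expected under the hypothesized model. A minimal sketch (the counts here are invented purely for illustration):

```python
def chi_square_stat(observed, expected):
    # Pearson's X^2 = sum over cells of (O_i - E_i)^2 / E_i.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical grouped data: 100 observations in 4 cells, tested against
# equal cell probabilities (all counts invented for illustration).
obs = [18, 30, 29, 23]
exp = [25.0, 25.0, 25.0, 25.0]
x2 = chi_square_stat(obs, exp)
# Compare with the 5% chi-square critical value on 4 - 1 = 3 degrees of
# freedom (7.815): reject the hypothesized model if x2 exceeds it.
reject = x2 > 7.815
```

When cell probabilities depend on estimated parameters, the degrees of freedom are reduced by the number of parameters estimated.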
Next came the empirical distribution function (EDF) tests, the Kolmogorov-Smirnov (KS) tests, and the Cramer-von Mises statistics. Originally developed for complete data sets, they have also been extensively studied for testing goodness-of-fit for discrete and grouped data sets. The use of the KS test statistic for grouped-data goodness-of-fit tests was studied by Schmid (1958), Conover (1972), Pettitt and Stephens (1977), and Wood and Altavela (1978), among others. While Schmid (1958) studied the asymptotic distribution of the KS statistic for grouped data, Conover (1972) derived the exact null distribution of the test statistic, as did Pettitt and Stephens (1977). A detailed study of the use of Cramer-von Mises
statistics for studying goodness-of-fit of discrete data was done by Choulakian
et al. (1994). They derived the asymptotic distribution of the W², U² and
A² statistics for a discrete distribution and showed that asymptotically all three
test statistics can be written as a sum of independent non-central chi-square
variates.
It is well known that the KS statistic is based on the maximum distance
between the EDF and the hypothesized cumulative distribution function (CDF),
while the Cramer-von Mises statistics are functions of the distance between the
empirical CDF and the true CDF at all the observed data values (for continuous
data the difference is measured at all data points, while for grouped data, the
distance is measured at all the end points of the groups). Hence, the Cramer-von Mises statistics are, in general, more powerful than the KS statistics. As a result, a
subclass of KS statistics that utilize the distance between the EDF and the
hypothesized CDF at all data values or at certain quantiles has been proposed
by, among others, Riedwyl (1967), Maag et al. (1973), Green and Hegazy
(1976) and, more recently, Damianou and Kemp (1990). These test statistics are
more powerful than the KS statistic and Watson's U-statistic, and are comparable
to the Cramer-von Mises statistics.
Most goodness-of-fit tests developed for grouped data so far have been for
a completely specified null distribution, i.e., a simple null hypothesis. The pur-
pose of this paper is to develop statistics to test whether the given grouped
data comes from an exponential distribution with an unknown mean. We use
the methodology of Damianou and Kemp (1990) to develop the test statis-
tics. We develop the test statistics in Section 9.2. The asymptotic distribution
of the statistics is studied in Section 9.3, and finally, in Section 9.4, we study the
GOF Statistics for Grouped Exponential Data 115
power of the test statistics via simulations. An example to show the practical
applications of the test is also presented in Section 9.4.
Since θ is unknown, our first step in testing the hypothesis involves the
estimation of θ. From Kulldorff (1961), the maximum likelihood estimator
(MLE) θ̂ of θ exists if and only if n₁ < n and n_k < n, and is obtained by solving
the following equation:

    \sum_{i=1}^{k-1} \frac{n_i (x_i - x_{i-1})}{e^{\theta (x_i - x_{i-1})} - 1} - \sum_{i=2}^{k} n_i x_{i-1} = 0.    (9.2)
While the above equation can be solved easily by using iterative methods,
note that if all the intervals are of the same length, then (9.2) has the closed-form
solution

    \hat{\theta} = \frac{1}{x_1} \ln\left(1 + \frac{n - n_k}{\sum_{i=1}^{k} (i-1) n_i}\right).    (9.3)
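A quick numerical check of the two estimators above (taking x₀ = 0 and the last cell as the open interval (x_{k-1}, ∞); function names are ours, and the sample counts are invented): solving the likelihood equation (9.2) by bisection reproduces, for equal-width cells, the closed form (9.3).

```python
import math

def theta_equal_width(width, counts):
    # Closed-form MLE (9.3) when every finite cell has the same width x1.
    n, nk = sum(counts), counts[-1]
    s = sum(i * c for i, c in enumerate(counts))  # sum of (i-1) * n_i
    return math.log(1.0 + (n - nk) / s) / width

def theta_general(boundaries, counts, lo=1e-8, hi=50.0):
    # Bisection solve of the likelihood equation (9.2); boundaries are
    # [x1, ..., x_{k-1}] with x0 = 0, and counts has k entries, the last
    # for the open interval (x_{k-1}, infinity).
    x = [0.0] + list(boundaries)
    def g(t):
        total = -sum(counts[i - 1] * x[i - 1] for i in range(2, len(counts) + 1))
        for i in range(1, len(counts)):  # finite cells only
            w = x[i] - x[i - 1]
            total += counts[i - 1] * w / (math.exp(t * w) - 1.0)
        return total
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For cells (0,1], (1,2], (2,3], (3,∞) with counts 30, 20, 10, 40, both routines return ln(1.375)/1 ≈ 0.3185.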
Let π_i = e^{−θx_{i−1}} − e^{−θx_i}, 1 ≤ i ≤ k, be the true probability under the null
of observing a value in the ith interval. Kulldorff (1961) has shown that θ̂ is
consistent and asymptotically sufficient with asymptotic variance σ²_θ̂ given by
\big( n \sum_i (1/\pi_i) (d\pi_i/d\theta)^2 \big)^{-1}. From Nelson (1977), we also have that under the null hy-
pothesis, θ̂ is asymptotically normally distributed with mean θ and asymptotic
variance σ²_θ̂.
Now, in order to develop the test statistics, define the following quantities at
the inspection times x_j:

    SW1 = \sqrt{n} \sum_{j=1}^{k-1} \frac{|F_n(x_j) - F(x_j, \hat{\theta})|}{\Psi_1(j)}    (9.4)
and
    D_p = \mathrm{diag}(p_1, p_2, \ldots, p_k).
Now, if we let

    \sigma^2 = (d\pi/d\theta)' D_p^{-1} (d\pi/d\theta) = \sum_i (d\pi_i/d\theta)^2 / \pi_i

and

    L = \frac{D_p^{-1} (d\pi/d\theta) (d\pi/d\theta)'}{\sigma^2},
then, again from Bishop, Fienberg, and Holland (1975), we have the following:
Theorem 9.3.1 Under the null hypothesis (9.1) and the regularity conditions
defined in Chapter 14 of Bishop, Fienberg, and Holland (1975), the k × 1 vector
W defined as

    W = \big( \sqrt{n}(p_1 - \pi_1), \; \sqrt{n}(p_2 - \pi_2), \; \ldots, \; \sqrt{n}(p_k - \pi_k) \big)'

converges in distribution to a multivariate normal random vector with mean
0 and variance-covariance matrix \Sigma = (D_p - \pi\pi')(I - L).
Since the one-sided test statistics defined in (9.5) and (9.6) are linear combinations of the vector W, Theorem 9.3.1 immediately gives us:
Theorem 9.3.2 Assume that the null hypothesis defined in (9.1) is true and
the aforementioned regularity conditions are satisfied. Then, as n → ∞, SW1*
and SW2* converge in distribution to normal random variables with mean 0 and
variances σ₁² and σ₂² respectively, where σ₁² and σ₂² are scalar functions
of the matrix Σ.
Let B denote the 1 × k row vector (1, 1, …, 1) and let C denote the k × k
lower-triangular matrix with (j, i) entry 1 for i ≤ j and 0 otherwise. We also
define the k × k matrices of the weight functions as follows: Q_{Ψ₁} is the k × k
diagonal matrix with kth diagonal entry 0 and, for 1 ≤ j ≤ k − 1, jth diagonal
entry

    1/\Psi_1(j) = \big[ F(x_j, \hat{\theta}) \, (1 - F(x_j, \hat{\theta})) \big]^{-1},

and Q_{Ψ₂} is defined similarly for the weight function Ψ₂; that is, Q_{Ψ₂} is the
k × k diagonal matrix with kth diagonal entry 0 and, for 1 ≤ j ≤ k − 1, jth
diagonal entry 1/Ψ₂(j).
Now note that

    SW1^* = \sum_{j=1}^{k-1} \frac{\sum_{i=1}^{j} W_i}{\Psi_1(j)} = (B Q_{\Psi_1} C) W    (9.8)

and

    SW2^* = \sum_{j=1}^{k-1} \frac{\sum_{i=1}^{j} W_i}{\Psi_2(j)} = (B Q_{\Psi_2} C) W.    (9.9)
While a one-tailed test is not commonly used to test hypotheses of the form
(9.1), Theorem 9.3.2 provides the foundation for the distribution theory of the
two-sided test statistics. To test the hypothesis (9.1) against a general omnibus
alternative hypothesis, that is, H_a: the data do not come from an exponential
distribution, we use the test statistics defined in (9.4) and (9.5). Note that we
can write the statistics SW1 and SW2 as follows:

    SW1 = \sum_{j=1}^{k-1} \frac{\big| \sum_{i=1}^{j} W_i \big|}{\Psi_1(j)}
    \quad \text{and} \quad
    SW2 = \sum_{j=1}^{k-1} \frac{\big| \sum_{i=1}^{j} W_i \big|}{\Psi_2(j)}.
Thus from Theorem 9.3.2, SW1 and SW2 converge in distribution to the sum of
the absolute values of the components of a multivariate normal random vector.
Note that while the asymptotic distribution of the test statistics is not known in
closed form, with the proliferation of high-speed computers, it can be simulated
quite easily, enabling us to calculate "bootstrapped p-values" for the test.
Finally, the testing procedure is given as follows. From the given data set,
calculate the test statistic SW1 (or SW2), henceforth referred to as the data
test statistic SWdat. As mentioned previously, the distribution theory outlined
above allows us to calculate the p-value of the test through the following para-
metric bootstrap technique. Using the estimate θ̂ of θ calculated from the data,
generate 5,000 samples of size n from the density f(x, θ̂) = θ̂ e^{−θ̂x}. Each sam-
ple is then grouped into the intervals defined by (0, x₁), (x₁, x₂), …, (x_{k−1}, ∞),
and θ̂ and the "bootstrapped" test statistic SW1 (or SW2) are calculated. The
p-value of the test is defined to be the proportion of "bootstrapped" test sta-
tistics less than or equal to the data test statistic, SWdat. The test is rejected
for small p-values. A FORTRAN program to calculate the p-value of the test
is available from the authors upon request.
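A minimal sketch of this procedure, under stated assumptions: equal-width cells, the closed-form MLE as in (9.3), and a Damianou-Kemp-style weighted EDF distance standing in for the chapter's SW1; the p-value below is taken as the proportion of bootstrap statistics at least as large as the observed one, the usual orientation for a distance statistic (the chapter states its own convention). All function names are ours.

```python
import math
import random

def group(sample, boundaries):
    # Cell counts for (0, x1], (x1, x2], ..., (x_{k-1}, infinity).
    counts = [0] * (len(boundaries) + 1)
    for v in sample:
        i = 0
        while i < len(boundaries) and v > boundaries[i]:
            i += 1
        counts[i] += 1
    return counts

def theta_hat(counts, width):
    # Closed-form grouped-exponential MLE for equal-width cells, cf. (9.3).
    s = sum(i * c for i, c in enumerate(counts))
    return math.log(1.0 + (sum(counts) - counts[-1]) / s) / width

def sw_stat(counts, boundaries, theta):
    # sqrt(n) * sum_j |F_n(x_j) - F(x_j)| / sqrt(F(x_j)(1 - F(x_j))):
    # an illustrative weighted EDF distance, not the chapter's exact SW1.
    n = sum(counts)
    cum, stat = 0, 0.0
    for j, xj in enumerate(boundaries):
        cum += counts[j]
        F = 1.0 - math.exp(-theta * xj)
        stat += abs(cum / n - F) / math.sqrt(F * (1.0 - F))
    return math.sqrt(n) * stat

def bootstrap_pvalue(sample, boundaries, nboot=500, rng=random.Random(7)):
    width = boundaries[0]
    counts = group(sample, boundaries)
    t_hat = theta_hat(counts, width)
    observed = sw_stat(counts, boundaries, t_hat)
    exceed = 0
    for _ in range(nboot):
        bc = group([rng.expovariate(t_hat) for _ in sample], boundaries)
        if sw_stat(bc, boundaries, theta_hat(bc, width)) >= observed:
            exceed += 1
    return exceed / nboot
```

For example, a sample of 80 identical values is flagrantly non-exponential, and `bootstrap_pvalue([2.5] * 80, [1.0, 2.0, 3.0])` comes out essentially zero.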
[Figure: estimated power curves of the test for the Weibull(0.8), Gamma(1.5), Weibull(1.5), and HN(0.2) alternatives.]
References
1. Bishop, Y. M., Fienberg, S. E., and Holland, P. W. (1975). Discrete
Multivariate Analysis, Cambridge: MIT Press.
12. Schmid, P. (1958). On the Kolmogorov and Smirnov limit theorems for
discontinuous distribution functions, Annals of Mathematical Statistics,
29, 1011-1027.
13. Steiner, S. H., Geyer, P. L., and Wesolowsky, G. O. (1994). Control charts
based on grouped data, International Journal of Production Research, 32,
75-91.
14. Wood, C. L. and Altavela, M. M. (1978). Large sample results for the
Kolmogorov-Smirnov statistics for discrete distributions, Biometrika, 65,
235-239.
15. Yu, Q., Li, L., and Wong, G. (2000). On consistency of the self-consistent
estimator of survival functions with interval censored data, Scandinavian
Journal of Statistics, 27, 35-44.
10
Characterization Theorems and Goodness-of-Fit
Tests
126 C. E. Marchetti and G. S. Mudholkar
bility distributions will be used to illustrate a theme of this paper, namely the
role of the characterization results in GOF tests.
1. The Uniform Distribution. This is the simplest, the earliest, and the best-
known entropy characterization. Among all random variables taking values in
[0,1], the U(0,1) variate has maximum entropy.
1. Sample Mean and Variance. The mean and variance of a random sample
from a population are independent if and only if the population is normal. This
result is attributed to Geary (1936), Lukacs (1942) and Zinger (1951).
4. Mean and Difference from Harmonic Mean. From the classical arith-
metic, geometric and harmonic mean inequality it follows that E(X⁻¹) − (E(X))⁻¹ ≥ 0,
with equality if and only if X is degenerate. Actually, the above difference is a
legitimate scale parameter for the distribution of the reciprocal of the IG(μ, λ)
variate. It is well known that the maximum likelihood estimates μ̂ = X̄ and
1/λ̂ = V = (1/n) Σ (1/Xᵢ − 1/X̄) based on a random sample from the IG popula-
tion are independently distributed. Khatri (1962) has shown that X̄ and V are
independently distributed if and only if the population is inverse Gaussian.
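The harmonic-mean gap behind this characterization can be seen numerically: for simulated IG(μ, λ) data, the sample version of E(1/X) − 1/E(X) settles near 1/λ. The sketch below uses the Michael-Schucany-Haas transformation to generate IG variates, a standard method that is an assumption of ours, not something stated in this chapter:

```python
import math
import random

def rand_inverse_gaussian(mu, lam, rng):
    # Michael-Schucany-Haas (1976) transformation method for IG(mu, lambda).
    y = rng.gauss(0.0, 1.0) ** 2
    x = (mu + mu * mu * y / (2.0 * lam)
         - (mu / (2.0 * lam)) * math.sqrt(4.0 * mu * lam * y + (mu * y) ** 2))
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x

rng = random.Random(3)
mu, lam = 2.0, 5.0
xs = [rand_inverse_gaussian(mu, lam, rng) for _ in range(100000)]
m = sum(xs) / len(xs)                    # estimates E(X) = mu
h = sum(1.0 / v for v in xs) / len(xs)   # estimates E(1/X) = 1/mu + 1/lambda
gap = h - 1.0 / m                        # estimates E(1/X) - 1/E(X) = 1/lambda
```

With μ = 2 and λ = 5, the gap estimate should sit near 1/λ = 0.2.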
where X₍₁₎ ≤ X₍₂₎ ≤ ⋯ ≤ X₍ₙ₎, m is a positive integer less than n/2, X₍ᵢ₎ =
X₍₁₎ for i < 1 and X₍ᵢ₎ = X₍ₙ₎ for i > n. Vasicek then proposed rejecting the
null hypothesis H₀ that the population is normal if the sample entropy is small,
or equivalently if

    K_{mn} = \frac{n}{2ms} \left\{ \prod_{i=1}^{n} \big( X_{(i+m)} - X_{(i-m)} \big) \right\}^{1/n} \le C_\alpha.    (10.2)
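Vasicek's statistic is straightforward to compute. A sketch, using the boundary convention above and taking s as the standard deviation with divisor n (an assumption; the definition of s is not restated here). Note that K_mn is invariant under location-scale changes of the data, which is what makes fixed critical values possible:

```python
import math

def vasicek_K(sample, m):
    # K_mn = n/(2 m s) * { prod_i (X_(i+m) - X_(i-m)) }^(1/n), using the
    # convention X_(i) = X_(1) for i < 1 and X_(i) = X_(n) for i > n.
    # Here s is assumed to be the standard deviation with divisor n.
    x = sorted(sample)
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    def at(i):  # 1-based index clipped to [1, n]
        return x[min(max(i, 1), n) - 1]
    log_prod = sum(math.log(at(i + m) - at(i - m)) for i in range(1, n + 1))
    return n / (2.0 * m * s) * math.exp(log_prod / n)
```

Applying an affine map aX + b to the data rescales the spacings and s by the same factor a, so the statistic is unchanged.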
The IG analog rejects the composite IG hypothesis when the statistic in (10.3),
where H_{mn} is the sample entropy of the Y₍ᵢ₎'s as defined by Vasicek and w is
the scale statistic of (10.4), computed from Σᵢ Yᵢ and Σᵢ Yᵢ⁻¹, is observed
to be small. As in the case of the entropy test of normality, even the asymptotic
null distribution of this test is analytically intractable. Hence an empirical table
of the 5% points was constructed and compared with the similar table in Vasicek
(1976). Interestingly, the values in the two tables are remarkably close, but not
close enough to be considered identical. We shall return to this point in Section
10.5. Mudholkar and Tian have also considered the use of the Kullback-Leibler
information measure for testing the composite IG hypothesis against simple or
restricted composite alternatives.
1. The Z2-Test. This test due to Lin and Mudholkar (1980), then labeled the
Z test, used the characteristic independence of the sample mean and variance
    Z_2 = \frac{1}{2} \log\left( \frac{1+r}{1-r} \right),    (10.5)

as a test statistic for normality. Under normality, as n → ∞, √n Z₂ → N(0, 3).
For use with small samples, n ~ 5, they empirically obtain approximations for
Var(Z2) and Kurto8i8(Z2) and recommend use of Edgeworth or Cornish-Fisher
corrections to the null distribution. Furthermore, Lin and Mudholkar show
that the Z2 test is consistent against and appropriate for detecting all skewed
alternatives to normality.
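The $Z_2$ statistic can be sketched as follows, assuming (as in Lin and Mudholkar's construction) that $r$ is the correlation between the observations and the cube roots of the corresponding leave-one-out sample variances; the sample size and seed are illustrative.

```python
import math
import random
import statistics

def z2_statistic(x):
    # Y_i = cube root of the variance of the sample with x_i deleted;
    # r = corr(X, Y); Z2 = Fisher's z-transform of r, as in (10.5)
    n = len(x)
    y = [statistics.pvariance(x[:i] + x[i + 1:]) ** (1.0 / 3.0)
         for i in range(n)]
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    den = math.sqrt(sum((a - xbar) ** 2 for a in x)
                    * sum((b - ybar) ** 2 for b in y))
    r = num / den
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

rng = random.Random(3)
z2 = z2_statistic([rng.gauss(0.0, 1.0) for _ in range(40)])
# under normality, sqrt(n) * Z2 is approximately N(0, 3)
```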
$$Z_3 = \frac{1}{2}\log\left(\frac{1+r_3}{1-r_3}\right), \qquad (10.6)$$
$\mathrm{Var}(Z_3) = 4$ and $\mathrm{Cov}(Z_2, Z_3) = 0$. Mudholkar, Lin, and Marchetti use the
two Z-tests to detect four targeted skewness-kurtosis alternatives: right-skew
heavy-tail, right-skew light-tail, left-skew heavy-tail and left-skew light-tail.
This is done by combining the one-tail versions of the two Z-tests, which are
for all practical purposes independent, using the Fisher (1932) classical method
of combining independent p-values.
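Fisher's method for combining the two independent one-tail p-values works as follows; the p-values shown are made-up numbers for illustration.

```python
import math

def fisher_combine(pvals):
    # Fisher (1932): -2 * sum(log p_i) ~ chi-square(2k) under H0.
    # For even degrees of freedom 2k the survival function has the closed form
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    stat = -2.0 * sum(math.log(p) for p in pvals)
    half = stat / 2.0
    term, total = 1.0, 1.0
    for j in range(1, len(pvals)):
        term *= half / j
        total += term
    return stat, math.exp(-half) * total

stat, p_combined = fisher_combine([0.04, 0.20])  # hypothetical one-tail p-values
```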
as the test statistic, i.e., by applying the $Z_2$ test to the Y's. They derive the
asymptotic null distribution of $Z_p$, and offer its empirical refinement, which is
applicable for $n \ge 10$.
4. The Z(IG) Test. Mudholkar, Natarajan, and Chaubey (2000) have em-
ployed Khatri's (1962) characterization of the inverse Gaussian distribution, as
in the examples above, to construct the Z(IG) statistic for testing the compos-
ite IG hypothesis. They find that asymptotically, under the null hypothesis, as
$n \to \infty$, $\sqrt{n}\,Z(IG)$ is normally distributed with zero mean and variance 3, and
present a small sample refinement of the distribution. It is interesting that the
asymptotic null distribution of Z(IG) is exactly the same as that of the $Z_2$ statistic
of normality. We shall return to amplify this point in the next section.
We close this section by noting the paucity of GOF tests for the im-
portant composite gamma distributional assumption and report that a Z test
based on the characterization of the gamma distribution stated in the previous
section is under development.
4. Robust Tests. In her dissertation, Natarajan (1998) began with the IG-
GOF tests, then developed and studied IG analogs of the classical tests for
equality of variances due to Bartlett, Cochran, Hartley and others, and the IG
analogs of the order constrained versions of these tests due to Fujino (1979).
She also developed IG analogs of the robust tests for homogeneity of variances
in Mudholkar, McDermott, and Aumont (1993) and Mudholkar, McDermott,
and Mudholkar (1995). Also considered in her dissertation is an IG analog of
the transformation methods of Box and Cox (1964). The motivation for the
entire investigation was the similarity between the normal theory and the IG
theory originally stimulated by the GOF problem.
References
1. Barlow, R. E., Bartholomew, D. J., Bremner, J. M., and Brunk, H. D.
(1971). Statistical Inference Under Order Restrictions, New York: John
Wiley & Sons.
4. Beirlant, J., Dudewicz, E. J., Györfi, L., and van der Meulen, E. C.
(1997). Nonparametric entropy: An overview, International Journal of
Mathematical and Statistical Sciences, 6, 17-39.
29. Hwang, T-Y. and Hu, C-Y. (1999). On a characterization of the gamma
distribution: The independence of the sample mean and the sample coef-
ficient of variation, Annals of the Institute of Statistical Mathematics, 51,
749-753.
31. Kagan, A. M., Linnik, Y. V., and Rao, C. R. (1973). Characterization Prob-
lems in Mathematical Statistics, New York: John Wiley & Sons.
35. Kullback, S. (1959). Information Theory and Statistics, p. 15, New York:
John Wiley & Sons.
57. Robertson, T., Wright, F. T., and Dykstra, R. L. (1988). Order Restricted
Statistical Inference, New York: John Wiley & Sons.
64. Soofi, E. S., Ebrahimi, N., and Habibullah, M. (1995). Information dis-
tinguishability with application to analysis of failure data, Journal of the
American Statistical Association, 90, 657-668.
67. Vasicek, O. (1976). A test for normality based on the sample entropy,
Journal of the Royal Statistical Society, Series B, 38, 54-59.
69. Wald, A. (1947). Sequential Analysis, New York: John Wiley & Sons.
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
144 B. C. Arnold, R. J. Beaver, E. Castillo, and J. M. Sarabia
statistics with common parent distribution F (i.e., generalized ranked set sam-
ples [Kim and Arnold (1999)]). In both cases we will wish to test $H : F = F_0$.
It is natural to also consider the problem of testing a composite hypothesis
$H : F \in \{F_\theta : \theta \in \Theta\}$ using record and ranked set data configurations. In such
a situation the first step will be to use the data to estimate $\theta$.
Simulation based power studies are provided for the proposed tests in the
simple hypothesis case. The major emphasis will be on the ranked set data
case. As we shall see in Section 11.2, record value data is readily analysed by
taking advantage of characteristic properties of record spacings for exponentially
distributed data.
(11.1)
and ask whether these are reasonably supposed to be uniform order statistics.
A Pearson-like goodness-of-fit statistic for this is of the form
(11.2)
provided the ratios $i_j/n_j$ are not too extreme. In practice, however, the $n_j$'s will be small.
If J is large, a $\chi^2$ approximation may be adequate. If J is small then a more
accurate evaluation of the null distribution of T will be needed. Balanced ranked
set samples are most commonly used. These consist of m independent replicates
of a complete set of n independent order statistics $X_{1:n}, X_{2:n}, \ldots, X_{n:n}$, where n
is small and m is usually not so small. Simulation based upper 90, 95 and 99th
percentiles of the statistic T for such balanced ranked set samples are provided
in Table 11.1 for an array of choices of values for m and n. These simulations
based on 200,000 replications for some representative choices of m and n can be
expected to provide two figure accuracy and often three figure accuracy. More
extensive tables will be published elsewhere.
The discrepancies between the percentiles displayed in Table 11.1 and the
corresponding $\chi^2_{mn}$ approximation can be quite large. Some representative com-
parisons are given in Table 11.2. Note that the percentage error is in the range
$-7.8\%$ to $6.3\%$.
It is evident from Table 11.2 that only for large values of mn (say mn > 100)
is it reasonable to use a $\chi^2$ approximation for the 90th percentile of T. Even
larger values of mn are required if we wish to accurately approximate the 95th
and 99th percentiles. In general, recourse should be made to the simulated
values in Table 11.1.
Of course, one could instead have transformed to get exponential order
statistics instead of uniform ones. Thus we might define
(11.3)
$$\tilde{T} = \sum_{j=1}^{J} \frac{\left(Z_{i_j:n_j} - M_{i_j:n_j}\right)^2}{\sigma^2_{i_j:n_j}}, \qquad (11.4)$$
where
$$M_{i:n} = \sum_{k=0}^{i-1} \frac{1}{n-k} \qquad \text{and} \qquad \sigma^2_{i:n} = \sum_{k=0}^{i-1} \frac{1}{(n-k)^2}$$
are the mean and variance of the $i$th order statistic of a sample of size $n$ from the standard exponential distribution.
Table 11.3 includes simulation based percentiles for the statistic $\tilde{T}$ for bal-
anced ranked set samples for an array of choices of m and n. These simulations
are based on 200,000 replications for each choice of m and n. More extensive
tables will be published elsewhere.
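Percentiles like those in Table 11.3 can be approximated by direct simulation. The sketch below standardizes each measurement by the exact mean and variance of the corresponding exponential order statistic (obtained from the Rényi representation); the particular chi-square-like statistic, replication count and seed are illustrative assumptions rather than the authors' exact recipe.

```python
import random

def exp_os_moments(i, n):
    # mean and variance of the i-th of n standard exponential order statistics
    mean = sum(1.0 / (n - k) for k in range(i))
    var = sum(1.0 / (n - k) ** 2 for k in range(i))
    return mean, var

def standardized_statistic(obs):
    # obs: list of (z, i, n); chi-square-like sum of standardized deviations
    total = 0.0
    for z, i, n in obs:
        m, v = exp_os_moments(i, n)
        total += (z - m) ** 2 / v
    return total

def simulated_percentile(m, n, q, reps=2000, seed=7):
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        obs = []
        for i in range(1, n + 1):       # one balanced cycle: each rank once...
            for _ in range(m):          # ...replicated m times
                ranked = sorted(rng.expovariate(1.0) for _ in range(n))
                obs.append((ranked[i - 1], i, n))
        values.append(standardized_statistic(obs))
    values.sort()
    return values[min(int(q * reps), reps - 1)]

p95 = simulated_percentile(m=1, n=3, q=0.95)
```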
Table 11.4 provides indications of the discrepancies between the entries of
Table 11.3 and the corresponding $\chi^2_{mn}$ approximation for the distribution of $\tilde{T}$.
Here, too, large values of mn are required if we wish to reliably approximate the
and $(n_j - i_j)$ i.i.d. uniform $(Y_{i_j:n_j}, 1)$ variates.
$$U^2 = \frac{1}{12N} + \sum_{i=1}^{N}\left(\frac{2i-1}{2N} - Y_{(i)}\right)^2 - N(\bar{Y} - 0.5)^2 \qquad (11.5)$$
and the modified statistic is given by
$$U^2_{MOD} = \left(U^2 - \frac{0.1}{N} + \frac{0.1}{N^2}\right)\left(1 + \frac{0.8}{N}\right).$$
Critical values for $U^2_{MOD}$ when N > 10 were supplied by Stephens. They are:
90th percentile = 0.152, 95th percentile = 0.187 and 99th percentile = 0.267.
For values of $N \le 10$, Quesenberry and Miller (1977) provide simulated critical
values that differ only slightly from the values corresponding to the case N > 10.
Under the null hypothesis $F = F_0$, our augmented sample of Y's will be
distributed as a sample of size N from a uniform (0,1) distribution and so the
relevant critical value of $U^2_{MOD}$ is the customary value for a random sample of
size N.
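Watson's $U^2$ and Stephens' modification can be computed directly from (11.5); the evenly spaced input below is a toy example in which the quadratic term vanishes exactly.

```python
def watson_u2(u):
    # U^2 = 1/(12N) + sum_i ((2i-1)/(2N) - u_(i))^2 - N*(ubar - 0.5)^2
    u = sorted(u)
    n = len(u)
    ubar = sum(u) / n
    quad = sum(((2 * i - 1) / (2.0 * n) - ui) ** 2
               for i, ui in enumerate(u, start=1))
    return 1.0 / (12 * n) + quad - n * (ubar - 0.5) ** 2

def u2_mod(u2, n):
    # Stephens' modification; compare with 0.187 at the 5% level for N > 10
    return (u2 - 0.1 / n + 0.1 / n ** 2) * (1.0 + 0.8 / n)

u = [(2 * i - 1) / 20.0 for i in range(1, 11)]  # perfectly even spacing, N = 10
u2 = watson_u2(u)                               # equals 1/120 here
m = u2_mod(u2, len(u))
```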
11.4 Power
In Section 11.3, we introduced three different tests for goodness-of-fit of $H : F =
F_0$ where $F_0$ is a completely specified distribution. Critical values for the test
statistics T, $\tilde{T}$ and $U^2_{MOD}$, based on simulation studies, were provided. We
turn now to consider how well the tests perform. A priori it is not easy to
visualize which of the three tests will be best for particular situations. A small
power simulation study, reported in Section 11.3, indicated that sometimes T
is more powerful than $\tilde{T}$ and sometimes the situation is reversed.
Simulated power studies can provide some guidance in selection of a test
from the three available. They provide only limited information since the results
obtained may well be specific to the particular alternatives considered and the
particular sample sizes used, etc. The simulation studies to be reported in this
section are based on balanced ranked set samples with a spectrum of choices of
values of m and n. In all cases a test of size .05 was used. The null hypothesis
was that F is a standard normal distribution. Four alternative hypotheses were
considered: Normal (0,4), Normal (2,1), Logistic (0,1) and Logistic (0,4).
The simulated power determinations are based on 10,000 replications for
T and $\tilde{T}$ and on 20,000 replications for $U^2_{MOD}$. The results for a selection of
values of m and n are displayed in Tables 11.5-11.7. More extensive tables will
be presented elsewhere.
Comparison of Tables 11.5-11.7 reveals that, almost uniformly over the
range of values of m and n considered, the $\tilde{T}$ test is more powerful than the T
test, which itself is more powerful than the $U^2_{MOD}$ test. It must be emphasized
that this may be specific to the choice of null hypothesis (normal) and the
choices of alternatives. We know that for a logistic null hypothesis, as reported
in Section 11.3, $\tilde{T}$ is not uniformly more powerful than T. More extensive
and detailed power simulations will be required to resolve the issue. For the
moment, however, for a standard normal null hypothesis the test based on $\tilde{T}$
seems to be the one to choose.
The reader will have noticed from Tables 11.5-11.7 that none of the tests
is really able to distinguish standard normal data from standard logistic data.
This is especially true for the test based on T which actually appears to be
biased since the test of size .05 actually rejects normality less than 5% of the
Goodness-of-Fit Tests 151
Table 11.5: Power of the T test of size .05 with a standard normal null hypoth-
esis
n m 95th Percentile N(0,1) N(0,4) N(2,1) L(0,1) L(0,4)
1 1 2.709000 0.049300 0.319200 0.514400 0.057200 0.296400
3 1 7.610900 0.055000 0.191400 0.925600 0.039900 0.156300
5 1 11.317700 0.055100 0.286700 0.999700 0.042000 0.242700
10 1 19.351601 0.049500 0.638600 1.000000 0.048200 0.491700
Table 11.6: Power of the $\tilde{T}$ test of size .05 with a standard normal null hypoth-
esis
Table 11.7: Power of the $U^2_{MOD}$ test of size .05 with a stan-
dard normal null hypothesis
n m N(0,1) N(0,4) N(2,1) L(0,1) L(0,4)
1 1
3 1 0.04755 0.14545 0.71445 0.04755 0.11595
5 1 0.04970 0.21365 0.96800 0.05045 0.16185
10 1 0.04675 0.46550 1.00000 0.05090 0.34700
Table 11.8: Initial data (top) and log-transformed data (bottom)
Rank 1: 0.79 0.20 0.57 0.35 0.75
Rank 2: 1.45 0.97 0.97 0.98 1.50
Rank 3: 0.52 0.62 2.54 2.12 1.86

Rank 1: -0.235722 -1.609440 -0.562119 -1.049820 -0.287682
Rank 2: 0.371564 -0.030459 -0.030459 -0.020203 0.405465
Rank 3: -0.653926 -0.478036 0.932164 0.751416 0.620576
time when the data has a standard logistic distribution. The tests based on
$\tilde{T}$ and $U^2_{MOD}$ do better but an embarrassingly low power is achieved for a
standard logistic alternative even for large values of m and n. It has been
observed by many authors that the normal and the logistic densities are not
easy to distinguish. The current study reinforces that observation.
and
$$\hat{\sigma} = \sum_{j=1}^{k} c_j X_{i_j:n_j}, \qquad (11.8)$$
where
(11.9)
(11.10)
and
$$a_j = E(X_{i_j:n_j}). \qquad (11.11)$$
These estimates for the data in Table 11.8 are:
$\hat\mu = -0.125112$; $\hat\sigma = 0.283775$.
Step 2: We complete the sample by simulating the missing data using the
actual estimates of $\mu$ and $\sigma$. To this end we first transform the data to a uniform
sample using the cdf associated with these values of $\mu$ and $\sigma$, we simulate the
uniform missing data, and finally, we return to our normal sample.
Step 3: We calculate $\bar{x} = \sum_{i=1}^{N} x_i/N$ and $s^2 = \sum_{i=1}^{N}(x_i - \bar{x})^2/(N-1)$ using the
actual completed sample.
Step 4: We simulate $\sigma^2_i$, an inverted gamma $IG\!\left(\frac{N-1}{2}, S^2_N\right)$ random value.
Step 5: We simulate $\mu_i$, a normal $N(\bar{x}, \sigma^2_i/N)$ random value.
Step 6: We repeat Steps 2 to 5 N1 + N2 times (in the example we have used
N1 = 500, N2 = 500).
Step 7: We disregard the first N1 iterations and then estimate the parameters
using
$$\hat\mu = \frac{1}{N_2}\sum_{i=N_1+1}^{N_1+N_2} \mu_i = -0.125202; \qquad \hat\sigma^2 = \frac{1}{N_2}\sum_{i=N_1+1}^{N_1+N_2} \sigma^2_i = 0.502888.$$
Step 8: We complete the sample, as in Step 2, but using the estimates for $\mu$
and $\sigma$ from Step 7.
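Steps 3-7 form a small Gibbs-type sampler. The sketch below shows the complete-data version, with the conditional distributions (an inverted gamma for $\sigma^2$ and a normal for $\mu$) assumed from a standard vague-prior normal model; the data and seeds are made up for illustration.

```python
import math
import random
import statistics

def posterior_means(data, n1=500, n2=500, seed=4):
    rng = random.Random(seed)
    n = len(data)
    xbar = statistics.fmean(data)
    s2 = statistics.variance(data)
    mu_draws, sig2_draws = [], []
    for it in range(n1 + n2):
        # Step 4 analog: sigma^2 ~ InvGamma((n-1)/2, (n-1)s^2/2)  [assumed form]
        sig2 = ((n - 1) * s2 / 2.0) / rng.gammavariate((n - 1) / 2.0, 1.0)
        # Step 5 analog: mu | sigma^2 ~ N(xbar, sigma^2 / n)       [assumed form]
        mu = rng.gauss(xbar, math.sqrt(sig2 / n))
        if it >= n1:                    # Step 7: discard the burn-in draws
            mu_draws.append(mu)
            sig2_draws.append(sig2)
    return statistics.fmean(mu_draws), statistics.fmean(sig2_draws)

data_rng = random.Random(0)
data = [data_rng.gauss(1.0, 2.0) for _ in range(50)]
mu_hat, sig2_hat = posterior_means(data)
```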
Step 9: We obtain an i.i.d. uniform sample by transforming the sample using the
transformation
$$u_i = \Phi\left(\frac{x_i - \mu}{\sigma}\right),$$
where $\Phi(\cdot)$ is the cdf of the standard N(0,1) distribution, and simulating missing
uniform observations. This sample of size 45 after being sorted becomes:
0.00158157, 0.0315106, 0.0329857, 0.0552611, 0.0624795, 0.0880001,
0.100829, 0.101379, 0.112136, 0.121759, 0.146543, 0.192474, 0.214823,
0.217738, 0.24146, 0.252666, 0.292152, 0.311323, 0.373312, 0.395422,
0.413025, 0.448858, 0.536143, 0.539685, 0.569218, 0.574718, 0.574718,
0.582695, 0.599063, 0.623866, 0.624577, 0.771671, 0.803022, 0.838381,
0.841018, 0.854342, 0.895774, 0.900308, 0.930962, 0.942142, 0.954493,
0.959348, 0.980614, 0.982249, 0.993978
Step 10: We test the uniformity of this sample using the $U^2$ statistic given by
$$U^2 = \frac{1}{12N} + \sum_{i=1}^{N}\left\{\frac{2i-1}{2N} - u_{(i)}\right\}^2 - N(\bar{u} - 0.5)^2 = 0.141958, \qquad (11.12)$$
where u(i) is the ith order statistic from the transformed sample of size N. The
value of the test statistic is modified as follows prior to entering the table of
critical values
$$U^2_{MOD} = \left\{U^2 - \frac{0.1}{N} + \frac{0.1}{N^2}\right\}\left\{1 + \frac{0.8}{N}\right\} = 0.14227.$$
Observe that we get a value that is smaller than the critical value 0.187 at the
0.05 significance level. Thus, we cannot reject the assumption that the sample
comes from a lognormal population.
11.6 Remarks
(i) Only minor modifications of our ranked set procedures will be required
if more than one unit in each ranked set is measured, i.e., if some of the
$X_{i_j:n_j}$'s are dependent, coming from the same sample.
References
1. Arnold, B. C., Balakrishnan, N., and Nagaraja, H. N. (1998). Records,
New York: John Wiley & Sons.
Lynne Seymour
University of Georgia, Athens, Georgia
Abstract: We explore a model for social networks that may be viewed either
as an extension of logistic regression or as a Gibbs distribution on a com-
plete graph. The model was developed for data from a mental health service
system which includes a neighborhood structure on the clients in the system.
This neighborhood structure is used to develop a Markov chain Monte Carlo
goodness-of-fit test for the fitted model, with pleasing results.
12.1 Introduction
Researchers in the social sciences require an understanding of the social net-
work within which individuals act, as well as the individual interactions within
that network. In an attempt to capture the global and local interactions si-
multaneously, spatial models, in which the spatial adjacency matrix is replaced
by a matrix of social interdependencies, were considered [Doreian (1980, 1982,
1989)] with some success [e.g., Gould (1991)]. Another modeling effort looks
at log-linear models. These models, in which the social interdependency is the
observed random variable, have also been successful in modeling social networks
[Strauss and Ikeda (1990), Galaskiewicz and Wasserman (1993) and Wasserman
and Pattison (1996)]. Such logistic regression models are called Markov random
graphs in the social science literature.
In statistical image analysis, the Gibbs distribution - which was originally
introduced by Gibbs (1902) to model particle interactions in statistical mechanics -
162 L. Seymour
The dependent variable, Yi, is a discrete measure of whether or not the ith
client received case management.
For this study, Yi is taken to be the simplest case - a symmetric binary
variable (-lor 1) - but in general Yi may be a multinomial response. For
example, the response of interest here is the number of case managers a client
has had. Ideally, the responses should reflect whether client i has had no
case manager, has had one case manager, or has had more than one case man-
ager. Due to model complexity and the available data, however, Yi is herein
considered as a binary variable.
Initial considerations imply a model of the form
(12.1)
where
$x_i = (1, x_{i1}, \ldots, x_{i5})'$ is the vector of covariates for the ith client.
If there is no interdependence, then the model in (12.1) - the logistic regression
model - is adequate. However, an extension of the logistic regression model
which can account for client interdependence is required. The description of
and estimation strategies for such a model follows, the bulk of which may also
be found in Seymour et al. (2000).
A general solution to this problem was given by Besag (1974). Assuming
that there are no interactions of orders greater than two, and that the second-
order interactions are determined by the $w_{ij}$'s, the general formula in Seymour
et al. (2000) yields
the strength of dependence in the random field; see Seymour (2000). If the
dependence is very weak, then a "small" sample size is 100 clients; the sample
size considered "small" increases as the strength of dependence increases.
A computational technique which circumvents the intractable likelihood
function in a more statistically satisfying way is the Markov chain Monte
Carlo (MCMC) approximation to the likelihood function derived by Geyer and
Thompson (1992). The value that maximizes that MCMC approximation with
respect to $\theta$ is called a Monte Carlo MLE (MCMLE) and converges almost
surely to the true MLE as the length of the chain goes to infinity. In addition,
since this technique estimates the true log-likelihood function (to within a mul-
tiplicative constant), the approximation of standard errors using the observed
information matrix [approximated numerically via quasi-Newton methods; see
Georgii (1988)] is valid.
In principle, $\theta_0$ may be any value in the parameter space, but in practice,
it is known that the procedure works best if $\theta_0$ is not too far from the MLE.
For this purpose, the current demonstration uses the MLE under independence
as $\theta_0$ since it is easily implemented. In order to get values of $\theta_0$ which are
closer to the dependence MLE, subsequent values of the Monte Carlo MLE
are iteratively assigned to $\theta_0$ as the Monte Carlo procedure is run again. This
procedure was suggested by Geyer and Thompson (1992), and results obtained
using this procedure appear to be numerically stable.
All three models indicate that a client's sex and level of education (variables
1 and 3) are potentially significant predictors of whether the client changes case
management (males are more likely to change case management; the likelihood
of changing case management increases as educational level increases), whereas
a client's age, schizophrenia diagnosis, and marital status (variables 2, 4, and 5)
are nowhere near significant. An interesting phenomenon is that inclusion of the
W information forces the intercept to be zero in both the logistic(+) and Gibbs
regressions. Particularly for Gibbs regression, this makes some sense since an
intercept should depend on the neighboring responses of a given client. Using
Gibbs Regression 167
the negative log-likelihoods and the standard errors of the parameter estimates
as a guide, the Gibbs regression appears to contribute something significant
towards explaining the relationships involved.
In order to further assess the fit of the Gibbs regression, an MCMC version of
Pearson's goodness-of-fit statistic may be calculated for the "contingency table"
interaction profile in Table 12.1 (with lumping). In a traditional contingency
table setting, let c be the number of categories into which N responses uniquely
fall. Let $O_i$ be the observed number of responses in category $i \in \{1, \ldots, c\}$,
$N = O_1 + \cdots + O_c$. Let $E_i$ be the expected number of responses in category
$i \in \{1, \ldots, c\}$, under an assumed model. Then Pearson's goodness-of-fit statistic
is given by
$$\sum_{i=1}^{c} \frac{(O_i - E_i)^2}{E_i}, \qquad (12.4)$$
which has a $\chi^2(c-1)$ distribution under the null hypothesis that the assumed
model is the true model, assuming that the responses making up the contingency
table are independent. The following development is the first goodness-of-fit
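The statistic in (12.4) itself is a one-liner; the observed and expected counts below are hypothetical, not the counts of Table 12.1.

```python
def pearson_statistic(observed, expected):
    # sum over categories of (O_i - E_i)^2 / E_i
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

stat = pearson_statistic([18, 30, 12], [20.0, 28.0, 12.0])
# with c = 3 independent categories this would be referred to chi-square(2)
```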
test developed for Gibbs distributions.
In the current setting, the categories are the numbers of organizations shared
in common (as in Table 12.1), a response is whether both of a given pair of
clients have changed case management, the observed counts are given in the
sixth column of Table 12.1, and the expected counts must be estimated via
MCMC methods. In addition, the responses are not independent. Hence, the
goodness-of-fit statistic (12.4) may not have a $\chi^2$ distribution.
In order to evaluate (12.4), we first generate a Markov chain of social net-
works via the Metropolis algorithm [Metropolis et al. (1953)], using the can-
didate model with both the W matrix and the covariate information held con-
stant. "Expected" counts of shared positive responses for each value of Wij are
aggregated from the chain, and the statistic (12.4) is then calculated using the
number of shared positive responses in Table 12.1 as "observed" values. In a
traditional contingency table setting, since c = 3, the appropriate distribution
for this statistic is $\chi^2(2)$. In this situation, however, the MCMC Pearson sta-
tistic for the Gibbs regression model in Table 12.2 appears to be distributed
Gamma($\alpha$, $\theta$) as in Table 12.3, where $\alpha$ and $\theta$ depend on the length of the
Markov chain. (N.B. $\chi^2(2)$ = Gamma(1, 2).) We did not explore this distribu-
tion under the logistic or logistic( +) regression models.
Table 12.3: Gamma($\alpha$, $\theta$) parameters by Markov chain length
Chain length     $\alpha$      $\theta$
10            0.8636   18.1036
100           0.8922   15.8216
1000          0.9004   14.9529
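A Gamma($\alpha$, $\theta$) shape can be fitted to a batch of simulated statistics by moment matching, since the mean is $\alpha\theta$ and the variance is $\alpha\theta^2$; the sample values below are invented for illustration.

```python
import statistics

def fit_gamma_moments(xs):
    # alpha = mean^2 / variance, theta = variance / mean
    m = statistics.fmean(xs)
    v = statistics.variance(xs)
    return m * m / v, v / m

alpha, theta = fit_gamma_moments([9.2, 14.1, 8.7, 21.5, 12.9, 17.3, 10.4, 15.8])
```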
The MCMC Pearson statistics for each of the models are shown in Table
12.4; all used a Markov chain of length 1000. The expected counts and Pearson
statistics of the logistic and logistic(+) regressions were calculated simply for
comparison; in fact, it is expected that their MCMC Pearson statistics will be
distributed differently from that of the Gibbs regression. Nevertheless, note
that the expected counts match up best with the observed counts under the
Gibbs regression.
Table 12.5 gives the percentiles from the simulated distribution (sample size
of 5000) for the MCMC Pearson statistic under the chosen Gibbs regression for
the chain lengths shown in Table 12.5. One can easily determine that one
cannot reject the null hypothesis that the Gibbs regression fits this data.
12.4 Discussion
Though the Gibbs regression model may be very effective in modeling social
networks, there are some difficulties with the data and with the model.
The fit of a Gibbs regression will almost surely be improved by weighting
the individual organizations according to their expected impact upon the re-
sponse. Unfortunately, the data described herein gave no information about
the individual organizations within the service system.
There is an abundance of model diagnostic tools in the logistic regression
literature which may be extended to Gibbs regression, some of which were
used in Seymour et al. (2000); however, there are no such diagnostic tools
in the Gibbs random field literature. For model selection, an ad hoc kind
of backwards selection from classical multiple regression was used to choose
the models in Table 12.2; some other ad hoc selection criterion could easily
have been used. Again, there are numerous criteria from the logistic regression
literature that could be extended. In addition, there are two criteria from the
Gibbs-Markov random field literature that could be used for Gibbs regression:
one MCMC-based Bayesian information criterion [Seymour and Ji (1996)], and
one pseudolikelihood criterion [Ji and Seymour (1996)].
References
1. Besag, J. E. (1974). Spatial interaction and the statistical analysis of lat-
tice systems (with discussion), Journal of the Royal Statistical Society, Series
B, 36, 192-236.
4. Cressie, N. (1993). Statistics for Spatial Data, New York: John Wiley &
Sons.
13. Gould, R. (1991). Multiple networks and mobilization in the Paris Com-
mune, 1871, American Sociological Review, 56, 716-729.
15. Lehman, A., Postrado, L., Roth, D., McNary, S., and Goldman, H. (1994).
An evaluation of continuity of care, case management, and client outcomes
in the Robert Wood Johnson Program on chronic mental illness, The
Milbank Quarterly, 72, 105-122.
17. Morrissey, J. P., Calloway, M., Bartko, W. T., Ridgley, S., Goldman, H.,
and Paulson, R 1. (1994). Local mental health authorities and service
system change: Evidence from the Robert Wood Johnson Foundation
Program on Chronic Mental Illness, The Milbank Quarterly, 72, 49-80.
21. Seymour, L. and Ji, C. (1996). Approximate Bayes model selection crite-
ria for Gibbs-Markov random fields, Journal of Statistical Planning and
Inference, 51, 75-97.
22. Seymour, L., Smith, R., Calloway, M., and Morrissey, J. P. (2000). Lattice
models for social networks with binary data, Technical Report 2000-24,
Department of Statistics, University of Georgia.
24. Wasserman, S. and Pattison, P. (1996). Logit models and logistic regres-
sions for social networks: 1. An introduction to Markov graphs and p*,
Psychometrika, 61, 401-425.
13
A CLT for the $L_2$ Norm of the Regression
Estimators Under $\alpha$-Mixing: Application to
G-O-F Tests
Cheikh A. T. Diack
University of Warwick, Coventry, UK
13.1 Introduction
The local and global properties of commonly used nonparametric estimators on
the basis of i.i.d. observations are now well known and allow powerful methods of
statistical inference such as goodness-of-fit tests. However, much less is known
in the case of dependent observations. Whereas there are many papers in
nonparametric curve estimation under mixing, only local properties are usually
established.
In this paper, we consider the problem of estimating a regression function
when the design points are nonrandom and the errors are dependent. We es-
timate the regression function using splines. The rates of convergence for such
estimators are derived by Burman (1991). Our objective is to obtain a global
measure of quality for the least squares spline as an estimate of the regression
function. Specifically, we derive the central limit theorem for the integrated
square error of the least squares splines estimator. We apply this new result to
validating an asymptotic goodness-of-fit test. We also discuss the consistency
of the proposed tests.
174 C. A. T. Diack
(13.1)
The design points $\{x_i\}_{i=1}^{n}$ are deterministic. Without loss of generality, we
assume that $x_i \in [0,1]$. We also assume that $\{Z_k, k \in \mathbb{Z}\}$ is the two-sided
moving average
$$Z_t = \sum_{j=-\infty}^{+\infty} \psi_j X_{t-j}, \qquad (13.2)$$
where $X_t \sim \mathrm{IID}(0, \sigma^2)$ and the sequence $\{\psi_j\}$ is absolutely summable. Let
$$\gamma_r = \sigma^2 \sum_{j=-\infty}^{+\infty} \psi_j \psi_{j+r} \qquad (13.3)$$
be its covariance sequence. Let $\sigma(Z_i, i \le 0)$ and $\sigma(Z_i, i \ge j)$ be the $\sigma$-fields
generated by $\{Z_i, i \le 0\}$ and $\{Z_i, i \ge j\}$, respectively. We assume that the
sequence $\{Z_k, k \in \mathbb{Z}\}$ is $\alpha$-mixing, that is:
$$\alpha_j = \sup\left\{|P(A \cap B) - P(A)P(B)| : A \in \sigma(Z_i, i \le 0),\; B \in \sigma(Z_i, i \ge j)\right\} \to 0 \text{ as } j \to \infty.$$
We assume that the spectral density of Z is bounded away from zero and infinity.
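For the moving average (13.2) the covariance sequence has the closed form $\gamma_r = \sigma^2\sum_j \psi_j\psi_{j+r}$, which is easy to check numerically for a finite $\psi$ sequence; the MA(1) coefficients below are an illustrative choice.

```python
def ma_autocov(psi, sigma2, r):
    # gamma_r = sigma2 * sum_j psi_j * psi_{j+r} for Z_t = sum_j psi_j X_{t-j}
    r = abs(r)
    return sigma2 * sum(a * b for a, b in zip(psi, psi[r:]))

psi = [1.0, 0.5]                 # MA(1): Z_t = X_t + 0.5 X_{t-1}
g0 = ma_autocov(psi, 1.0, 0)     # 1 + 0.25 = 1.25
g1 = ma_autocov(psi, 1.0, 1)     # 0.5
g2 = ma_autocov(psi, 1.0, 2)     # 0.0: MA(1) covariance vanishes beyond lag 1
```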
13.2 Estimators
To estimate the function g, we use a least squares spline estimator. Let $\eta_0 =
0 < \eta_1 < \cdots < \eta_{k+1} = 1$ be a subdivision of the interval [0,1] by k distinct
points. We define $S(k, d)$ as the collection of all polynomial splines of order d
(i.e., degree $\le d-1$) having the sequence of knots $\eta_1 < \cdots < \eta_k$. The class $S(k, d)$
of such splines is a linear space of functions with dimension $(k + d)$. A basis
for this linear space is provided by the B-splines [see Schumaker (1981)]. Let
$\{N_1, \ldots, N_{k+d}\}$ denote the set of normalized B-splines. The least squares spline
estimator of g is defined by
$$\hat{g}(x) = \sum_{p=1}^{k+d} \hat{\theta}_p N_p(x),$$
Regression Splines for G-O-F 175
where
(13.4)
We need to specify some conditions. For any two sequences of positive real
numbers $\{a_n\}$ and $\{b_n\}$, we write $a_n \sim b_n$ to mean that $a_n/b_n$ stays bounded
between two positive constants.
We assume that the sequence of knots is generated by $p(x)$, a positive
continuous density on $[0, 1]$ such that
$$\int_0^{\eta_i} p(x)\,dx = \frac{i}{k+1}, \qquad i = 0, \ldots, k+1.$$
We set $\delta_k = \max_{0 \le i \le k}(\eta_{i+1} - \eta_i)$; then it is easy to see that
$$\delta_k \sim k^{-1}. \qquad (13.5)$$
We assume that
$$\sup_{x \in [0,1]} |H_n(x) - H(x)| = o(k^{-1}), \qquad (13.6)$$
where $H_n(x)$ is the empirical distribution function of $\{x_i\}_{i=1}^{n}$ and $H(x)$ is the
limit distribution with positive density $h(x)$.
We denote the $n \times n$ matrices with $(i,j)$th elements $\Gamma_{ij} = \gamma_{|i-j|}$ and $\Gamma^+_{ij} = \gamma_{i+j}$
by $\Gamma$ and $\Gamma^+$, respectively. We assume that the spectral density of $\{Z_k, k \in \mathbb{Z}\}$
is bounded away from zero and infinity. A classical result on Toeplitz matrices
[see Grenander and Szegő (1984)] proves that $2\pi\lambda_{\min}\Gamma$ and $2\pi\lambda_{\max}\Gamma$ (where
$\lambda_{\min}\Gamma$ and $\lambda_{\max}\Gamma$ are the smallest and the largest eigenvalues of $\Gamma$, respectively)
converge, respectively, to the minimum and the maximum of the spectral density
of Z. Hence, the assumption on the spectral density of $\{Z_k, k \in \mathbb{Z}\}$ guarantees
that the eigenvalues of $\Gamma$ are bounded away from zero and infinity.
Let $\pi_{pq}$ be the $(p,q)$th element of the $n \times n$ matrix $F'M_n^{-1}M_hM_n^{-1}F$ and
(13.9)
We set $\Lambda = (\pi_1, \ldots, \pi_{n-1})'$.
Theorem 13.3.1 Suppose that $\sum_{j} \alpha_j^{1-2/\epsilon} < \infty$ and $E|Z_1|^{2\epsilon} < \infty$ for some
$\epsilon > 2$. Assume that (13.5) and (13.6) hold and $\lim_{n\to\infty} \rho_n < 1$. We also assume
that $\{Z_t\}$ is the two-sided moving average
$$Z_t = \sum_{j=-\infty}^{+\infty} \psi_j X_{t-j},$$
where $X_t \sim \mathrm{IID}(0, \sigma^2)$ and $\sum_{j=-\infty}^{+\infty} |j\psi_j| < +\infty$. Then, if $k = o(n)$ and
$EX_t^4 = \eta\sigma^4 < \infty$,
$$nT - \frac{1}{n}\mathrm{tr}\left(F'M_n^{-1}M_hM_n^{-1}F\Gamma\right) - \frac{nB_{2d}}{(2d)!\,k^{2d}}\int\left\{g^{(d)}(x)/p^d(x)\right\}^2 h(x)\,dx$$
is asymptotically normally distributed.
Actually, we believe that Theorem 13.3.1 is new even for the case of uncor-
related errors, when the variance is $2\gamma_0^2\sum_{p=1}^{n-1}\pi_p^2 + (\eta - 1)\pi_0^2\gamma_0^2$.
13.4 Inference
In this section we use Theorem 13.3.1 to construct consistent nonparametric
tests. We prove that the tests have asymptotic powers for some local alterna-
tives.
Goodness-of-fit tests
The null hypothesis is $H_0 : g = g_0$. Against an unrestricted alternative,
it is natural to use the $L_2$ distance between the estimator $\hat{g}$ and $g_0$. Therefore,
the statistic of the test is given by
Using Theorem 13.3.1, we see that the null hypothesis can be rejected at as-
ymptotic level $\alpha$ if
$$nT \ge q_\alpha\sqrt{V} + \frac{1}{n}\mathrm{tr}\left(F'M_n^{-1}M_hM_n^{-1}F\Gamma\right) + \frac{nB_{2d}}{(2d)!\,k^{2d}}\int\left\{g_0^{(d)}(x)\right\}^2 h(x)\,dx, \qquad (13.11)$$
where $q_\alpha$ is the upper $\alpha$ quantile of the limiting normal distribution and $V$ is its variance.
Specification test
Under some assumptions, the same cutoff point for the goodness-of-fit test
may be used for testing composite hypotheses of the form $H_0 : g = g_0(\cdot, \beta)$,
where $\beta \in \Theta$ is an unknown parameter. However, we must use the statistic
T by substituting an estimate $\hat\beta$ for the unknown parameter $\beta$. We need the
following assumption:
Under some mild regularity conditions, estimators such as the least squares,
generalized method for moments or the adaptive efficient weighted estimators
satisfy the required assumption. Hence, the specification test has the same
properties as the goodness-of-fit test.
Asymptotic power
To make a local power calculation for the tests described above, we need to con-
sider the behavior of different statistics (calculated under a fixed but unknown
point $g_0 \in H_0$) for a sequence of alternatives of the form
Discussion
We have proposed an asymptotic goodness-of-fit test and a specification test
based on least squares spline estimators. The tests are consistent and have
power against some local alternatives.
In applications, the covariance matrix $\Gamma$ is unknown. Therefore, we must
estimate it. The estimators which we shall use for $\gamma_r$, $r \ge 0$, are
$$\hat\gamma_r = \frac{1}{n}\sum_{i=1}^{n-r}(Y_i - \bar{Y})(Y_{i+r} - \bar{Y}), \qquad r = 0, \ldots, n-1,$$
where Y is the sample mean. The estimators :Yr, r = 0, ... , n - 1, have the
r
desirable property that for each n 2:: 1, the matrix with elements rij = :Yli-jl,
is non-negative definite [ef. Brockwell and Davis (1991)]. However, plug-in r
in order to estimate the variance does not guarantee that we have a consistent
estimator. This is an open problem which is under study.
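As a concrete sketch (our own illustration, not code from the chapter), the estimators $\hat\gamma_r$ and the matrix $\hat\Gamma$ can be computed as follows, and the non-negative definiteness checked numerically:

```python
import numpy as np

def autocov(y):
    """Sample autocovariances gamma_hat_r, r = 0, ..., n-1, each
    normalized by n (not n - r), as in the text."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    yc = y - y.mean()
    return np.array([np.dot(yc[:n - r], yc[r:]) / n for r in range(n)])

def gamma_matrix(y):
    """Matrix Gamma_hat with entries gamma_hat_{|i-j|}; non-negative
    definite for every n [cf. Brockwell and Davis (1991)]."""
    g = autocov(y)
    idx = np.abs(np.subtract.outer(np.arange(len(y)), np.arange(len(y))))
    return g[idx]

rng = np.random.default_rng(0)
y = rng.normal(size=200)
G = gamma_matrix(y)
min_eig = float(np.linalg.eigvalsh(G).min())   # >= 0 up to rounding
```

The division by $n$ rather than $n-r$ is exactly what makes the Toeplitz matrix $\hat\Gamma$ non-negative definite, which the smallest eigenvalue confirms.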
Using regression splines may be advantageous when we want to impose
properties such as monotonicity and/or convexity. We could then test the
shape of a regression function by using the functional T defined in (13.7) when
we substitute a constrained estimator for $g$. This problem is also under study.
13.5 Proofs
PROOF OF THEOREM 13.3.1. We can write T = Tl + T2 + T3 where
and
$$ T_3 = \int \{\hat g(x) - \mathbb{E}\hat g(x)\}\,\{\mathbb{E}\hat g(x) - g(x)\}\, h(x)\, dx. $$
$$ T_2 \sim \frac{B_{2d}}{(2d)!\,k^{2d}} \int \left\{ g^{(d)}(x) / p(x)^d \right\}^2 h(x)\,dx. \qquad (13.15) $$
On the other hand, $\mathbb{E}T_3 = 0$ and $\operatorname{var}(T_3) = o(\operatorname{var}(T))$. Therefore, to prove
Theorem 13.3.1, it is enough to prove that
where $U = \frac{1}{n}\operatorname{tr}(F' M_n^{-1} M_h M_n^{-1} F \Gamma)$ and
$$ \operatorname{var}(T_1) = \frac{1}{n^2} \sum_{|p|<n} \sum_{|q|<n} \pi_p \pi_q \operatorname{cov}(Z_0 Z_p, Z_0 Z_q). $$
This can be rewritten in the following form. It follows that
$$ \operatorname{var}(nT_1) = \sum_{|p|<n}\sum_{|q|<n} \pi_p \pi_q \left\{ (\eta - 3)\sigma^4 \sum_{i=-\infty}^{\infty} \psi_i^2 \psi_{i+p} \psi_{i+q} + \gamma_0 \gamma_{p-q} + \gamma_p \gamma_q \right\}. \qquad (13.17) $$
One can easily show the following equality
(13.18)
where $m_{rs}$ is the $(r,s)$th element of the matrix $M_n^{-1} M_h M_n^{-1}$. Using equation
6.22 in Agarwal and Studden (1980) and Lemma 6.3 in Zhou, Shen, and Wolfe
(1998), we see that $|m_{rs}| = O(k\,\nu^{|r-s|})$ with $\nu \in (0,1)$. Therefore, it is easy to
see that the second term of the right-hand side of (13.18) is $O(kp/n)$. Besides,
a classical result on B-splines proves that $\sup_x |N'(x)| = O(k)$. Moreover,
using (13.6), we see that $|x_{p+q} - x_q| = O(k^{-1}p)$. Finally, we obtain
$$ \sum_{|p|<n} \sum_{|q|<n} \pi_p \pi_q \sum_{i=-\infty}^{\infty} \psi_i^2 \psi_{i+p} \psi_{i+q} \sim \pi_0^2 \sum_{|p|<n} \sum_{|q|<n} \sum_{i=-\infty}^{\infty} \psi_i^2 \psi_{i+p} \psi_{i+q}. $$
Hence we have
(13.21)
Next, we show that $T_1$ is Gaussian. But we first show that $\operatorname{var}(nT_1) \sim k^2$.
Using Lemma 6.5 in Zhou, Shen, and Wolfe (1998), we see that we just need to
show that $\|A\|^2 \sim k^2$. We have
$$ \|A\|^2 = \sum_{p=0}^{n-1} \pi_p^2, $$
and
$$ |\pi_p| \le \frac{c_8 k}{n} \sum_{q=1}^{n-p} \sum_{r,s} N_r(x_q)\, N_s(x_{p+q})\, \nu^{|r-s|}. $$
Noting that $N_r(x_q) = 0$ when $x_q \notin (t_r, t_{r+d})$ and since $|x_{p+q} - x_q| = c_n k^{-1} p$,
we have
$$ |\pi_p| \le \frac{c_9 k (n-p)}{n}\, \nu^{c_n k^{-1} p}, $$
with
and
Using Lemma 4.1 in Burman (1991) and since $|\pi_p \pi_q| \le k^2$, we have, for some $t > 2$,
$$ \operatorname{var}(T_{1,1}) \le \frac{c_{10} k^2}{n^4} \sum_{p \neq q} \alpha_{|p-q|}^{1-2/t} \le \frac{c_{10} k^2}{n^3} \sum_{p} \alpha_p^{1-2/t} = o(k^2), \qquad (13.22) $$
where $\theta_n^2 = \operatorname{var}\big(\sum_{p \neq q} \pi_p \pi_q (Z_p Z_q - \mathbb{E} Z_p Z_q)\big)$. Eqs. (13.22) and (13.23) are trivial.
Reasoning as above, one can show that $\theta_n^2 \ge n^3$; hence, (13.24) follows easily
and Theorem 13.3.1 is proven. •
PROOF OF THEOREM 13.4.1. The theorem follows quite readily from Theorem
13.3.1. •
Acknowledgement. This research was carried out while the author was a
Fellow of EURANDOM in Eindhoven, The Netherlands.
References
1. Agarwal, G. G. and Studden, W. J. (1980). Asymptotic integrated mean
square error using least squares and bias minimizing splines, The Annals
of Statistics, 8, 1307-1325.
7. Zhou, S., Shen, X., and Wolfe, D. A. (1998). Local asymptotics for regression
splines and confidence regions, The Annals of Statistics, 26, 1760-
1782.
14
Testing the Goodness-of-Fit of a Linear Model in
Nonparametric Regression
14.1 Introduction
We consider the following regression model
where $f$ is an unknown real function defined on the interval $[0,1]$ and $t_1 = 0 <
t_2 < \cdots < t_n = 1$ is a fixed sampling of $[0,1]$. The errors $\varepsilon_i$ are independent
and identically distributed random variables with zero mean and variance $\sigma^2$.
Our aim is to construct a linear hypothesis test on the regression function
$f$. Let $g_1(t), \dots, g_p(t)$ be linearly independent functions on $[0,1]$ and let $U_p$ be
the vector space spanned by $g_1, \dots, g_p$; we want to test the hypothesis
185
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
186 Z. Mohdeb and A. Mokkadem
literature is based on the spline method: Cox and Koh (1989) and Eubank and
Spiegelman (1990) derived goodness-of-fit tests for a linear model based on test
statistics constructed from nonparametric regression fits to residuals from linear
regression; Jayasuriya (1996) proposed a test based on nonparametric fits
to the residuals from $k$th order polynomial regression. A method based on
a kernel estimate of $f$ was proposed by Müller (1992). Härdle and Mammen
(1993) suggested a test statistic based on a weighted $L^2$-distance between the
parametric fit and a nonparametric fit based on a kernel estimator. The use of
empirical Fourier coefficients of $f$ to construct a hypothesis test in the model
(14.1) was developed by Eubank and Spiegelman (1990) and Eubank and Hart
(1992).
In order to test the hypothesis $H_0$ in (14.2), Dette and Munk (1998) proposed
a procedure based on the large-sample behaviour of an empirical $L^2$-distance
between $f$ and the subspace $U_p$. Their test statistic makes use of
weights. In the present paper, we propose a test statistic based on a similar
approach but without weights, and show that it has the same asymptotic
behaviour.
The remainder of this paper is organized as follows. In Section 14.2 a test
is derived for Ho and its asymptotic distribution under the null and alternative
hypotheses is given. In Section 14.3 we employ insights from Section 14.2 to
derive a practical test for Ho using Monte Carlo techniques for small samples.
(A2) $h$, $f$, $g_k$, $k = 1, \dots, p$, satisfy the Hölder condition with order $\gamma > 1/2$;
(A3) $\forall n$, $\varepsilon_1, \dots, \varepsilon_n$ are independent and $\exists C \in \mathbb{R}_+$ such that $\mathbb{E}(\varepsilon_i^4) < C$, $\forall i$.
Set
(14.3)
Since the hypothesis "$f \in U_p$" is equivalent to "$\phi = f - \sum_{k=1}^{p} a_k g_k \in U_p$," as
a measure of discrepancy between the regression function and the subspace $U_p$,
Testing the Goodness-of-Fit 187
(14.4)
where $G(v_1, \dots, v_k)$ denotes the Gram determinant $|(\langle v_i, v_j \rangle)|_{i,j=1,\dots,k}$ for
$v_1, \dots, v_k$ in $L^2(d\mu)$.
We thus need to estimate $T^2(\phi)$; for this, we introduce the observations
$X = (X_1, \dots, X_n)'$, where
$$ X_i = Y_i - \sum_{k=1}^{p} a_k g_k(t_i), \qquad i = 1, \dots, n. $$
We follow the procedure of Dette and Munk (1998), but applied to $\phi$ and
$X_i$, and without the use of weights. Let $\Delta_i = t_i - t_{i-1}$ $(i = 2, \dots, n)$, $\Delta_1 =
\Delta_2$, $W = \operatorname{diag}(\Delta_i h(t_i))_{i=1,\dots,n}$ and $g_{k,n} = (g_k(t_1), \dots, g_k(t_n))'$, $k = 1, \dots, p$.
Let $U_{p,n}$ be the vector subspace of $\mathbb{R}^n$ spanned by $(g_{1,n}, \dots, g_{p,n})$ and $\Pi_n^{\perp}$ the
projection matrix onto the orthogonal complement of $U_{p,n}$.
We define $G_n(X, g_1, \dots, g_p)$ as the determinant obtained by replacing in
(14.4) the inner products $\langle \phi, \phi \rangle$ and $\langle \phi, g_k \rangle$ $(k = 1, \dots, p)$ respectively by
$$ X' W X = \sum_{i=1}^{n} \Delta_i h(t_i) X_i^2 $$
and
$$ X' W g_{k,n} = \sum_{i=1}^{n} \Delta_i h(t_i)\, g_k(t_i)\, X_i, \qquad k = 1, \dots, p. $$
Hence, we estimate $T^2(\phi)$ by
(14.5)
where $S_n^2$ is the estimator of Gasser, Sroka, and Jennen-Steinmetz (1986), defined
by
$$ S_n^2 = \frac{1}{6(n-2)} \sum_{i=2}^{n-1} (Y_{i+1} + Y_{i-1} - 2Y_i)^2. \qquad (14.6) $$
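A minimal sketch of the estimator (14.6) (our own illustration; function and variable names are not from the chapter):

```python
import numpy as np

def gsj_variance(y):
    """Difference-based variance estimator of Gasser, Sroka, and
    Jennen-Steinmetz (1986), eq. (14.6), for an equispaced design:
    S_n^2 = sum_{i=2}^{n-1} (y_{i+1} + y_{i-1} - 2 y_i)^2 / (6(n-2))."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d2 = y[2:] + y[:-2] - 2.0 * y[1:-1]     # second differences
    return float(np.sum(d2 ** 2) / (6.0 * (n - 2)))

rng = np.random.default_rng(1)
n, sigma = 500, 0.5
t = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * t) + rng.normal(scale=sigma, size=n)
s2 = gsj_variance(y)   # close to sigma**2 = 0.25 for a smooth trend
```

A second difference of i.i.d. noise has variance $6\sigma^2$, which explains the normalization; the smooth trend contributes only $O(n^{-4})$ per term.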
(14.7)
where $\eta_i = \eta_{i,n}$, $i = 1, \dots, n$, are random variables which form a centered row-wise
2-dependent array.
We show that
$$ \lim_{n \to \infty} \operatorname{Var}\Big( \frac{1}{\sqrt{n}} \sum_{i=2}^{n} \eta_i \Big) = \frac{17}{9}\sigma^4 + 4\sigma^2 T^2(f). $$
$$ \hat\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \Big\{ Y_i - \sum_{k=1}^{p} \hat a_k\, g_k(t_i) \Big\}^2, \qquad (14.8) $$
14.3 Simulations
In order to investigate both the power and the level of our test of (14.2), we conducted
a small-scale simulation using the model $Y_i = f(t_i) + \varepsilon_i$, $i = 1, \dots, n$ ($n = 50$),
with $t_i = (i-1)/(n-1)$, $i = 1, \dots, n$, and uncorrelated normal random
errors with variance $\sigma^2$. In our simulations, we study the test of the hypothesis
$H_0: f \in U_2$, where $U_2$ is the subspace of $L^2(d\mu)$ spanned by $g_1(t) = t$ and
$g_2(t) = 1$. The Monte Carlo study, for the small sample size $n = 50$, turns on
the comparison of the statistic $T_n^2$ and the statistic $M_n^2$ proposed by Dette and
Munk (1998). The empirical quantiles, denoted by $Q_T$ and $Q_{DM}$, of the test
statistics $T_n^2$ and $M_n^2$ respectively, are given in Table 14.1 and Table 14.2. As
shown in Table 14.1, use of the $S^2 = S_n^2$ estimator of Gasser, Sroka, and Jennen-Steinmetz
(1986) in the statistics reveals that the normal approximation is not
satisfactory. In Table 14.2, $S^2$ is replaced by $\hat\sigma^2 = (1/n)\sum_{i=1}^{n}|Y_i - \hat a_1 t_i - \hat a_2|^2$;
it appears that the normal law is better approximated with $T_n^2$ than with $M_n^2$.
In order to study the power, we consider two different forms for $f$, namely
$f_1(t) = -1.5t + 0.5 + \beta t e^{-2t}$ and $f_2(t) = -1.5t + 0.5 + \beta t^2$, with several choices
of $\beta$ in the interval $[0,2]$. The empirical powers of $T_n^2$ and $M_n^2$ are denoted by
$P_T$ and $P_{DM}$. As shown in Table 14.3 and Table 14.4, where the level is taken
at $\alpha = 5\%$, the results of the empirical power study give a slight advantage to
the statistic $T_n^2$. We note also that the test has good power properties.
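The simulation design above can be sketched as follows. This is our own generic Monte Carlo illustration using a plain residual-sum-of-squares statistic calibrated by simulation, not the chapter's statistics $T_n^2$ or $M_n^2$; the noise level $\sigma = 0.2$ is also ours:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, n_rep = 50, 0.2, 200
t = np.arange(n) / (n - 1)                 # t_i = (i-1)/(n-1)
X = np.column_stack([t, np.ones(n)])       # U2: g1(t) = t, g2(t) = 1

def residual_ss(y):
    """Residual sum of squares of the least-squares fit of y on U2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

def reject_rate(f, crit):
    """Monte Carlo proportion of replications with RSS above crit."""
    hits = sum(residual_ss(f(t) + rng.normal(scale=sigma, size=n)) > crit
               for _ in range(n_rep))
    return hits / n_rep

# calibrate a 5% critical value under H0 (f linear) by simulation
null_ss = np.sort([residual_ss(-1.5 * t + 0.5 + rng.normal(scale=sigma, size=n))
                   for _ in range(1000)])
crit = null_ss[949]
level = reject_rate(lambda u: -1.5 * u + 0.5, crit)                 # ~ 0.05
power = reject_rate(lambda u: -1.5 * u + 0.5 + 3.0 * u ** 2, crit)  # f2-type, beta = 3
```

Even this crude statistic detects the quadratic departure easily, which is consistent with the good power properties reported for $T_n^2$.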
Table 14.1 and Table 14.2: empirical quantiles $Q_T$ and $Q_{DM}$ at $\alpha = 1\%$, $\alpha = 5\%$, and $\alpha = 10\%$.
Table 14.3: Proportion of rejections in 1000 samples of size $n = 50$, with two
examples of alternatives: $f_1(t) = a_1 t + a_2 + \beta t e^{-2t}$ and $f_2(t) = a_1 t + a_2 + \beta t^2$
($\sigma^2$ estimated by $S_n^2$); columns $\beta$, $P_T$, $P_{DM}$ for each of $f_1(t)$ and $f_2(t)$.
Table 14.4: Proportion of rejections in 1000 samples of size $n = 50$, with the
same two alternatives ($\sigma^2$ estimated by $\hat\sigma^2$); columns $\beta$, $P_T$, $P_{DM}$ for each of
$f_1(t)$ and $f_2(t)$.
References
1. Cox, D. and Koh, E. (1989). A smoothing spline based test of model
adequacy in polynomial regression, Annals of the Institute of Statistical
Mathematics, 41, 383-400.
15.1 Introduction
We consider the regression model
(15.1)
196 Y. Baraud, S. Huet, and B. Laurent
belonging to $]0,1[$ and consider the family of Fisher tests of level $\alpha_m$ of the null
hypothesis $F \in V^*$ against the alternative $F \in V^* + S_m$. Our procedure rejects
the null hypothesis if one of these Fisher tests does.
Let us give some examples. If $F$ is defined on $I = [0,1]$, one can take for
$S_m$, $m \in \mathbb{N}$, the space of trigonometric polynomials of degree not larger
than $m$, the space of piecewise polynomials of fixed degree based on the
grid $\{k/m,\ k = 0, \dots, m\}$, or the space spanned by the Haar basis $\{\varphi_{j,k},\ j \in
\{0,\dots,m\},\ k \in \{1,\dots,2^j\}\}$. Moreover, it is possible to mix several kinds of these
spaces to constitute the collection $\{S_m,\ m \in \mathcal{M}\}$. If $F$ is defined on $I = [0,1]^2$,
one can take for example the space of constant functions on a finite partition
of $[0,1]^2$ into rectangles.
Under the Gaussian assumption on the errors, the results given in this paper
are nonasymptotic. For each $n$, the test has the desired level and we characterize
a set of functions over which our test is powerful. Under the further
assumption that $F$ belongs to the Hölder ball
$$ \mathcal{H}_s(R) = \{F: [0,1] \to \mathbb{R},\ \forall (x,y) \in [0,1]^2,\ |F(x) - F(y)| \le R|x-y|^s\}, \qquad (15.2) $$
for some $R > 0$ and $s \in ]0,1]$, we show that the test is rate optimal among the
adaptive tests (that is, among the tests which have no prior knowledge of $R$ or
$s$). Such a result has been obtained by Horowitz and Spokoiny (2000) for a
different procedure that will be described in Section 15.4. In addition, our test
recovers the parametric rate of testing over directional alternatives. A similar
result has been obtained by Eubank and Hart (1992).
Finally, we present a simulation study to evaluate the performance of our
testing procedure both when the errors are Gaussian and when they are non-Gaussian.
(15.3)
where for each $u \in ]0,1[$, $\bar F_{D_m,N_m}^{-1}(u)$ is the $1-u$ quantile of a Fisher random
variable with $D_m$ and $N_m$ degrees of freedom, and where $\{\alpha_m,\ m \in \mathcal{M}\}$ is a
collection of numbers in $]0,1[$ satisfying
$$ \sum_{m \in \mathcal{M}} \alpha_m \le \alpha. \qquad (15.4) $$
We reject the null hypothesis when $T_\alpha$ is positive. In the sequel, we choose the
collection $\{\alpha_m,\ m \in \mathcal{M}\}$ in accordance with one of the following procedures:
(15.5)
$$ \sum_{m \in \mathcal{M}} \alpha_m = \alpha. \qquad (15.6) $$
Comments: For each $m \in \mathcal{M}$, the Fisher test of level $\alpha_m$ of the hypothesis
$F \in V^*$ against $F \notin V^*$, $F \in V^* + S_m$, rejects the null hypothesis if the test
statistic
(15.7)
where for each $u \in ]0,1[$, $\bar\chi_{D_m}^{-1}(u)$ denotes the $1-u$ quantile of a $\chi^2$ random
variable with $D_m$ degrees of freedom (note that in this case the assumption
that $N_m \ne 0$ can be relaxed). For a complete proof of those results in the case
of an unknown variance, i.e. for the test statistic $T_\alpha$, we refer the reader to the
paper by Baraud, Huet, and Laurent (2000).
Theorem 15.3.1 Let $\alpha$ and $\beta$ be two numbers in $]0,1[$. Let $T_\alpha$ be the test
statistic defined by (15.7). Let
Comment: The definition of $\mathcal{F}_n(\beta)$ given in (15.8) shows that we would take
advantage of a collection of spaces $\{S_m,\ m \in \mathcal{M}\}$ containing spaces with good
approximation properties, that is, spaces $S_m$ such that both $d_n^2(F, S_m)$ and the
dimension of $S_m$ are small.
• Let
$$ \mathcal{M}_1 = \{k \in \{1, \dots, n\},\ k \in \{2^j,\ j \ge 0\} \cup \{n\}\}. $$
We set $\alpha_{(k,1)} = \alpha/(4|\mathcal{M}_1|)$ if $k \ne n$ and $\alpha_{(n,1)} = \alpha/4$. Let $S_{(k,1)}$ be the
linear subspace of $\mathcal{F}([0,1], \mathbb{R})$ spanned by the piecewise constant functions
$(\mathbf{1}_{](j-1)/k,\,j/k]},\ j = 1, \dots, k)$ if $k \ne n$, and $S_{(n,1)} = \mathcal{F}([0,1], \mathbb{R})$.
• Let $(\phi_j)_{j \ge 1}$ be a Hilbert basis of $\mathcal{F}([0,1], \mathbb{R})$. For each $k \in \mathcal{M}_2 = \{1, \dots, n\}$
we define $\alpha_{(k,2)} = 3\alpha/(\pi^2 k^2)$ and $S_{(k,2)}$ as the linear space spanned by
the function $\phi_k$.
In the sequel, we set $\mathcal{M} = \{(k,1),\ k \in \mathcal{M}_1\} \cup \{(k,2),\ k \in \mathcal{M}_2\}$ and consider
the collections $\{S_m,\ m \in \mathcal{M}\}$ and $\{\alpha_m,\ m \in \mathcal{M}\}$. Note that $\sum_{m \in \mathcal{M}} \alpha_m \le \alpha$,
and therefore the inequality (15.4) holds.
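The level allocation above can be checked numerically; this sketch (ours) verifies that the collection sums to at most $\alpha$:

```python
import math

def alpha_collection(n, alpha=0.05):
    """Levels alpha_m for the two families described above:
    alpha_(k,1) = alpha/(4|M1|) for dyadic k < n, alpha_(n,1) = alpha/4,
    and alpha_(k,2) = 3*alpha/(pi**2 * k**2) for k = 1, ..., n."""
    M1 = sorted({k for k in range(1, n + 1) if k & (k - 1) == 0} | {n})
    a1 = {k: (alpha / 4 if k == n else alpha / (4 * len(M1))) for k in M1}
    a2 = {k: 3 * alpha / (math.pi ** 2 * k ** 2) for k in range(1, n + 1)}
    return a1, a2

a1, a2 = alpha_collection(250)
total = sum(a1.values()) + sum(a2.values())
# sum over the second family is below 3*alpha/pi^2 * (pi^2/6) = alpha/2,
# and the first family sums to less than alpha/2, so total <= alpha
```

This is exactly the Bonferroni-type budget behind inequality (15.4).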
Corollary 15.3.1 Let $\alpha$ and $\beta$ be two numbers in $]0,1[$. Let $T_\alpha$ be the test
statistic defined by (15.7); then the following holds:
(i) Let us denote by $L_n$ the quantity $\log\log(n)$, and assume that $R^2 \ge \sigma^2\sqrt{L_n}/n$.
There exists a constant $C(\alpha, \beta)$ depending only on $\alpha$ and $\beta$ such that for
all $s \in ]0,1]$, for all $R > 0$ and for all $F \in \mathcal{H}_s(R)$ such that
$d_n^2(F, V^*)$
(ii) Assume that $F \in L^2([0,1], dx)$ and that the $x_i$'s satisfy for all $k \ge 1$,
If for some $k_0 \ge 1$,
15.4 Simulations
We carry out a simulation study in order to evaluate the performance of our
procedure both when the errors are normally distributed and when they are not,
and to compare it with the testing procedure proposed recently by Horowitz
and Spokoiny (2000). This section is organized as follows: we present the
simulation experiment, then we describe how our testing procedure is performed.
Finally, we give the results of the simulation study.
$$ F_0(x) = 1 + x, \qquad F_\tau(x) = 1 + x + \frac{1}{\tau}\,\phi\Big(\frac{x}{\tau}\Big) \quad \text{with } \tau = 0.25 \text{ and } \tau = 1, $$
where $\phi$ is the density of a standard Gaussian variable. When $\tau = 0.25$ the
regression function $F_\tau$ presents a peak; when $\tau = 1$, $F_\tau$ presents a small bump.
We will test the linearity of the function $F$ at level $\alpha = 5\%$.
The number of observations is $n = 250$. The $x_i$'s are simulated once and for
all, as centered Gaussian variables with variance equal to 25, and are constrained
to lie in the interval $[\Phi^{-1}(0.05), \Phi^{-1}(0.95)]$, where $\Phi$ is the distribution function
of a standard Gaussian variable.
The collection $\{S_m,\ m \in \mathcal{M}\}$. We consider the spaces $S_m$ based on piecewise
constant functions and trigonometric polynomials. More precisely, for each $k = 1, \dots, n$,
we consider the spaces $S_{(k,1)}$ of regular histograms based on the regular grid
$\{l/k,\ l = 0, \dots, k\}$ and $S_{(k,2)}$ the space of trigonometric polynomials of degree
not larger than $k$. For each $\delta \in \{1,2\}$ and for each $k = 1, \dots, n$, let us recall
that $S_{(k,\delta)}$ is the linear subspace of $\mathbb{R}^n$ defined as the orthogonal projection
of $\{(F(x_1), \dots, F(x_n))^T,\ F \in S_{(k,\delta)}\}$ onto $V^{\perp}$, where $V$ is the subspace of $\mathbb{R}^n$ with
dimension $d = 2$ spanned by the vectors $(1, \dots, 1)^T$ and $(x_1, \dots, x_n)^T$. Now let
us set
$$ \mathcal{M} = \{(k,\delta) \in \{1,\dots,n\} \times \{1,2\},\ \dim(S_{(k,\delta)}) \in \{2^j,\ j \ge 0\} \cap \{1, \dots, n/2\}\}. $$
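The construction of the projected spaces can be sketched as follows (our own illustration; the numerical-rank threshold is an implementation choice, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 250
x = np.sort(rng.normal(scale=5.0, size=n))     # fixed design points

V = np.column_stack([np.ones(n), x])           # d = 2: span{(1,...,1)', (x_1,...,x_n)'}
Qv, _ = np.linalg.qr(V)

def project_out_V(F_cols):
    """Orthonormal basis of the projection of the columns of F_cols
    onto the orthogonal complement of V in R^n."""
    M = F_cols - Qv @ (Qv.T @ F_cols)          # remove the V-component
    Q, R = np.linalg.qr(M)
    keep = np.abs(np.diag(R)) > 1e-8 * np.abs(R).max()
    return Q[:, keep]

def histogram_columns(k):
    """Design matrix of the regular histogram with k cells on the range of x."""
    edges = np.linspace(x.min(), x.max() + 1e-9, k + 1)
    cell = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, k - 1)
    return (cell[:, None] == np.arange(k)).astype(float)

S41 = project_out_V(histogram_columns(4))
# the constant function lies in V, so projecting the 4 histogram
# columns onto the orthogonal of V leaves a subspace of dimension 3
```

The drop in dimension from 4 to 3 illustrates why $\dim(S_{(k,\delta)})$, not $k$, enters the definition of $\mathcal{M}$.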
They proposed an adaptive procedure for testing that the regression
function belongs to some parametric family of functions. Their procedure rejects
the null hypothesis if, for some bandwidth in a grid, the distance
between the nonparametric kernel estimator and the kernel-smoothed parametric
estimator of $F$ under the null hypothesis is large. The quantiles of their test
statistic are estimated by a bootstrap method.
The results of the simulation experiment are reported in Table 15.1, under
the column HS-test. They used 1000 simulations for estimating the level of the
test and 250 simulations for estimating the power.
Test of Linear Hypothesis 203
15.5 Proofs
15.5.1 Proof of Theorem 15.3.1
For the sake of simplicity, and to keep our formulae as short as possible, we
assume that $\sigma^2 = 1$. By definition of $T_\alpha$, for any $F \in \mathcal{F}(\mathbb{R}^k, \mathbb{R})$,
(15.9)
then $\mathbb{P}_F(T_\alpha \le 0) \le \beta$. We shall use the two following inequalities, respectively
due to Laurent and Massart (2000) and to Birgé (2000). For all $u \in ]0,1[$:
Since $\sqrt{u} + \sqrt{v} \le \sqrt{2(u+v)}$ for any positive numbers $u, v$, inequality (15.9)
holds as soon as
For any linear space $W \subset \mathbb{R}^n$, we denote by $\Pi_W$ the orthogonal projector onto
$W$. Using the fact that $S_m \subset V^{\perp}$, by the Pythagorean equality, $\|\Pi_m f\|^2 =
\|f - \Pi_V f\|^2 - \|\Pi_{V^{\perp}} f - \Pi_m f\|^2$. Noting that $\|\Pi_{V^{\perp}} f - \Pi_m f\|^2 = \|f - \Pi_{V + S_m} f\|^2$,
we get $\|\Pi_m f\|^2 / n = d_n^2(F, V^*) - d_n^2(F, V^* + S_m)$, which concludes the proof of
Theorem 15.3.1.
It follows from Theorem 15.3.1 that $\mathbb{P}_F(T_\alpha > 0) \ge 1 - \beta$ for all $F$ such that
$d_n^2(F, V^*) \ge \rho_n^2(F)$. Let us therefore give an upper bound for $\rho_n^2(F)$.
Note that since $S_{(n,1)} = \mathcal{F}([0,1], \mathbb{R})$, we have $d_n^2(F, S_{(n,1)}) = 0$, and since
$D_{(n,1)} \le n$ (we do not assume the design points to be distinct) and $\alpha_{(n,1)} = \alpha/4$,
we have
(15.12)
Noting that $d_n^2(F, V^* + S_{(k,1)}) \le d_n^2(F, S_{(k,1)})$, the statement of the first part
of the corollary follows from the two following inequalities: for all $F \in \mathcal{H}_s(R)$
and for all $k \in \mathcal{M}_1$,
The inequality (15.14) follows easily by noting that for all $k \in \mathcal{M}_1$,
and $D_{(k,1)} = k$.
Therefore, we have
(15.15)
(15.16)
(15.17)
(15.18)
Using that $2\mathbb{P}(X > u) \le \exp(-u^2/2)$ for all $u > 0$, where $X \sim \mathcal{N}(0,1)$, it is
easy to show that for all $t \in ]0,1]$,
$$ \bar{x}_1(t) \le -2\log(t). $$
In the same way, using that for all $\mu \in \mathbb{R}$ and $0 < u < \mu$,
we get
•
References
1. Baraud, Y. (2000). Non asymptotic minimax rates of testing in signal
detection, Technical Report 00.25, École Normale Supérieure, Paris.
2. Baraud, Y., Huet, S., and Laurent, B. (2000). Adaptive tests of linear
hypotheses by model selection, Technical Report 99-13, École Normale
Supérieure, Paris.
Odile Pons
INRA Biométrie, Jouy-en-Josas, France
16.1 Introduction
The distribution of a survival time $T^0$ conditionally on a vector $Z$ of explanatory
variables or processes is characterized by the hazard function
$$ \lambda(t \mid Z) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \Pr(t < T^0 \le t + \Delta t \mid T^0 > t,\ Z(s),\ 0 \le s \le t). $$
Cox's model (1972) is widely used for the analysis of censored survival data
under controlled experimental conditions; it expresses the hazard function of
$T^0$ in the form $\lambda(t \mid Z) = \lambda(t)\,e^{\beta^T Z(t)}$, where $\beta$ is a vector of unknown regression
parameters and $\lambda$ is an unknown non-parametric baseline hazard function. If
the survival time is right censored at a random time $C$, the observations are
the censored time $T = T^0 \wedge C$, the indicator $\delta = 1_{\{T^0 \le C\}}$ and the covariate $Z$.
The regression parameter is estimated by the value that maximizes the "partial
likelihood" [Cox (1972)], and an estimator of the cumulative hazard function
$\Lambda(t) = \int_0^t \lambda(s)\,ds$ was defined by Breslow (1972).
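As an illustration of these two classical estimators (a simplified sketch assuming no tied event times; the simulation design and all names are ours):

```python
import numpy as np

def cox_partial_loglik(beta, time, delta, z):
    """Cox (1972) log partial likelihood for right-censored data (no ties):
    sum over events of beta'z_i - log sum_{j: T_j >= T_i} exp(beta'z_j)."""
    eta = z @ beta
    order = np.argsort(time)
    delta, eta = delta[order], eta[order]
    risk = np.cumsum(np.exp(eta)[::-1])[::-1]   # suffix sums = risk-set totals
    ev = delta == 1
    return float(np.sum(eta[ev] - np.log(risk[ev])))

def breslow_cumhaz(beta, time, delta, z, t_grid):
    """Breslow (1972) estimator of the cumulative baseline hazard."""
    score = np.exp(z @ beta)
    ev_times = np.sort(time[delta == 1])
    jumps = np.array([1.0 / score[time >= s].sum() for s in ev_times])
    return np.array([jumps[ev_times <= t].sum() for t in t_grid])

# illustration: T0 | z ~ Exp(rate = exp(beta*z)), censoring C ~ Exp(1/2)
rng = np.random.default_rng(4)
n, beta_true = 400, 0.8
z = rng.normal(size=(n, 1))
t0 = rng.exponential(np.exp(-beta_true * z[:, 0]))
c = rng.exponential(2.0, size=n)
time, delta = np.minimum(t0, c), (t0 <= c).astype(int)

grid = np.linspace(-2.0, 2.0, 81)
beta_hat = grid[np.argmax([cox_partial_loglik(np.array([b]), time, delta, z)
                           for b in grid])]    # near beta_true
cumhaz = breslow_cumhaz(np.array([beta_hat]), time, delta, z, [0.25, 0.5, 1.0])
```

A grid search stands in for the usual Newton-Raphson maximization; the Breslow estimator is a nondecreasing step function, as the evaluated values show.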
212 O. Pons
These models have been discussed and studied in the literature by several authors,
in particular by Keiding (1991) for (16.1), by Brown (1975) and Zucker
and Karr (1990) for time-varying coefficients, and by Hastie and Tibshirani
(1993) for (16.2). In Pons and Visser (2000) and Pons (1999), I proposed new
estimators of the regression parameters and the cumulative baseline hazard
function, and I studied their asymptotic properties. They are based on the
local likelihood estimation method introduced by Tibshirani and Hastie (1987)
for non-parametric regression models and adapted with kernels by Staniswalis
(1989). They are defined using a kernel $K$ that is a continuous and symmetric
density with support $[-1,1]$, and $K_{h_n}(s) = h_n^{-1} K(h_n^{-1} s)$, where the bandwidth
$h_n$ tends to zero at a convenient rate. The asymptotic properties of the estimators
$\hat\beta_n$ and $\hat\Lambda_n$ of $\beta$ and of the cumulative baseline hazard function will
be presented for both models. They follow the classical lines, but the kernel
estimation requires modifications. In model (16.1), the convergence rate of $\hat\beta_n$
is not modified by the kernel smoothing; it is $n^{-1/2}$ as in the Cox model, and
$\hat\Lambda_n$ converges at the non-parametric rate for a kernel estimator, as expected. In
model (16.2), $\hat\Lambda_n$ only involves kernel terms through the regression functions,
but both $\hat\beta_n$ and $\hat\Lambda_n$ have the same non-parametric rate of convergence, as was
also the case for spline estimators in Zucker and Karr (1990). Goodness-of-fit
test statistics for the classical Cox model against the alternatives of models
(16.1) or (16.2) are deduced from these results.
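The kernel ingredients can be sketched as follows (the Epanechnikov form of $K$ is our choice; the chapter only requires a continuous symmetric density supported on $[-1,1]$):

```python
import numpy as np

def K(u):
    """Epanechnikov kernel: a continuous, symmetric density on [-1, 1].
    (One admissible choice; the chapter only fixes these properties.)"""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def K_h(s, h):
    """Rescaled kernel K_{h_n}(s) = h_n^{-1} K(h_n^{-1} s)."""
    return K(s / h) / h

# K_h integrates to one for any bandwidth h > 0
h = 0.1
s = np.linspace(-1.0, 1.0, 20001)
ds = s[1] - s[0]
mass = float((K_h(s, h) * ds).sum())   # ~ 1.0
```

Rescaling preserves total mass while concentrating the weight on $[-h_n, h_n]$, which is what localizes the likelihood around $s$.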
where $S_n^{(0)}(x; s, \beta) = n^{-1} \sum_j K_{h_n}(s - S_j)\, Y_j(x) \exp\{\beta^T Z_j(S_j + x)\}$ and $Y_i(x) =
1_{\{T_i \wedge C_i \ge S_i + x\}} = 1_{\{\bar X_i \ge x\}}$ with $\bar X_i = X_i \wedge (C_i - S_i)$. The estimator $\hat\beta_n$ of the
regression coefficient maximizes the following partial likelihood
where $c_n(s) = 1_{[h_n, \tau - h_n]}(s)$. In this section, we shall use the following notations
and assumptions. We assume that under $P_0$, $S$ has a density $f_S$ such that for
some $\eta > 0$, the supports of the distributions of $S$ and $C$ contain $[-\eta, \tau + \eta]$,
and we define $I_{\tau,\eta} = \{(s,x): x \in [0,\tau],\ s \in [-\eta, \tau+\eta],\ s+x \in [-\eta, \tau+\eta]\}$. For
a vector $z$ in $\mathbb{R}^p$, let $z^{\otimes 0} = 1$, $z^{\otimes 1} = z$ and $z^{\otimes 2} = z z^T$. The norms in $\mathbb{R}^p$ and
in $(\mathbb{R}^p)^{\otimes 2}$ are denoted $\|\cdot\|$ and the uniform norm of a function or a process
on a set $J$ is denoted $\|\cdot\|_J$. For $k = 0, 1, 2$, $(s,x)$ in $I_{\tau,\eta}$ and under the next
conditions, let
variable and if there exist conditional densities $f_{X|S,Z}$ and $g_{C|S,Z}$ for $X$ and $C$,
and a constant $M_2$ such that
The weak consistency of $\hat\beta_n$ and the asymptotic normality of $n^{1/2}(\hat\beta_n - \beta_0)$
are established by the classical arguments of maximum likelihood estimation.
For large $n$, an expansion of the score process gives
$$ \sup_{\|\beta - \beta_0\| \le \epsilon} \|I_n(\beta) - I(\beta_0)\| \to 0 \quad \text{as } \epsilon \to 0 \text{ and } n \to \infty, $$
is approximated using a statistic of the form $\sum_{i \ne j} \varphi_n(\xi_i, \zeta_j)$ with $\xi_i = (S_i, X_i, \delta_i)$
and $\zeta_j = (S_j, X_j \wedge (C_j - S_j), Z_j)$. Denoting $\psi_n((\xi_i, \zeta_i), (\xi_j, \zeta_j)) = \frac{1}{2}\{\varphi_n(\xi_i, \zeta_j) +
\varphi_n(\xi_j, \zeta_i)\}$, we obtain a U-statistic of order 2, $\sum_{i \ne j} \psi_n((\xi_i, \zeta_i), (\xi_j, \zeta_j))$, and the
weak convergence of $n^{1/2} U_n$ follows from a Hoeffding decomposition [Serfling
(1980)].
For fixed $s$ in $[h_n, \tau - h_n]$, let $D([0, \tau - s])$ be the space of right-continuous
real functions with left-hand limits on $[0, \tau - s]$. In Pons and Visser
(2000), conditions are given for weak convergence of the process $L_n(\cdot; s) =
(n h_n)^{1/2}(\hat\Lambda_{n,X|S} - \Lambda_{0,X|S})(\cdot; s)$ in the Skorohod topology on $D[0, \tau - s]$. Its limit
is a continuous Gaussian process $L$ with mean zero and covariance $\sigma(x' \wedge x; s)$
at $(x; s)$ and $(x'; s)$ in $I_{\tau,\eta}$, where
(16.3)
(16.4)
$$ \mathbb{E}\{\alpha_n^H(s,x)\,\alpha_n^H(s',x')\} = H(s \wedge s', x \wedge x') - H(s,x)H(s',x'), $$
$$ \mathbb{E}\{\alpha_n^{(0)}(s,x)\,\alpha_n^{(0)}(s',x')\} = \mathbb{E}\big[1_{\{S_i \le s \wedge s'\}} 1_{\{\bar X_i \ge x \vee x'\}}\, e^{\beta_0^T\{Z_i(s+x) + Z_i(s'+x')\}}\big] - W^{(0)}(s,x)\,W^{(0)}(s',x'), $$
$$ \mathbb{E}\{\alpha_n^{(0)}(s,x)\,\alpha_n^H(s',x')\} = \mathbb{E}\big[1_{\{S_i \le s \wedge s'\}} 1_{\{x \le \bar X_i \le x'\}}\, e^{\beta_0^T Z_i(s+x)}\big], $$
$$ \Delta_n(x; s) = (n h_n)^{1/2} \int_0^x \big\{S_n^{(0)-1}(y; s, \hat\beta_n) - s^{(0)-1}(y; s)\big\}\, H_{1n}(dy; s) - \int_0^x s^{(0)-2}(y; s)\, \alpha_n^{(0)}(y; s)\, H_1(dy; s), $$
with $\bar\alpha_n(x; s) = \int_0^\tau K_{h_n}(s - u)\, \alpha_n(x, du)$ for $(s,x) \in I_{\tau,\eta}$. An integration by
parts entails
$$ \alpha_n^{(0)}(s,x) = \int e^{\beta_0^T z} \Big\{ \int_0^x \nu_n(s, dt, dz) - \nu_n(s, x^-, dz) \Big\} $$
and
$$ \alpha_n^H(s,x) = \int \nu_n^H(s, x, dz), $$
Tests for $H_0$ can be based on the difference between the estimated hazard
functions under the hypothesis and under the alternative, i.e. on the process
$D_{1n}$ defined on $I_{\tau,\eta}$ by
based on a discretization of $D_{1n}$ on a finite grid: let $(x_j)_{0 \le j \le J}$ be an increasing
sequence of $[0,\tau]$ with $x_0 = 0$, and let $V_{1n}(s)$ be the vector of dimension $J$ with
components the variables
or
and therefore to zero if $|s - s'| > 2h_n$, since $K$ is zero outside $[-1,1]$. Then, under
C1-C5, the statistic $V_n^T \hat A_n^{-1} V_n$ has an asymptotic $\chi^2$ distribution with $IJ$ degrees
of freedom under $H_0$, and it tends in probability to infinity if $\Lambda_{X|S}(x_j; s_i) \ne
\Lambda_X(x_j)$ for some $i \le I$ and $j \le J$.
every $x$ in $J_X$. In Pons (1999), the estimator $\hat\beta_n(x)$ is defined as the value of $\beta$
which maximizes
where $Y_i(t) = 1_{\{T_i \ge t\}}$ is the risk indicator for individual $i$ at $t$, and an estimator
of the integrated baseline hazard function follows,
where $\beta$ is a function satisfying the conditions below. For $\beta \in \mathbb{R}^p$, $x \in J_X$, let
Now, we denote by $U_n(\beta, x)$ and $-I_n(\beta, x)$ the first two derivatives of $n^{-1} l_{n,x}(\beta)$
with respect to $\beta$, and we simply write
By classical arguments, $\hat\beta_n(x)$ and $I_n(\beta, x)$ converge in probability to $\beta_0(x)$
and $I(\beta, x)$ for any $x$ in $J_X$ and $\beta$ in $B$, and this point-wise weak consistency
is extended to a uniform convergence under C5: The variables
where
$V_{1,n,i}(x)$
$V_{2,n,i}(x)$
Let $W^{(k)} = \mathbb{E} W_i^{(k)}$, $\alpha_n^{(k)} = n^{1/2}\big(n^{-1} \sum_i W_i^{(k)} - W^{(k)}\big)$, $k = 1, \dots, 4$, and $\alpha_n =
(\alpha_n^{(1)}, \dots, \alpha_n^{(4)})^T$. For every $n$ the variables $\alpha_n(t,x)$ and $\alpha_n(t',x')$ have the same
covariance matrix $\Sigma_\alpha(t, t', x, x')$, which is again of the form
and the proof is similar to the convergence of the process $L_{2n}$ for Theorem
16.2.2.
The asymptotic behavior of the process $(\hat\Lambda_n - \Lambda_0)$ relies on an expansion of
$S_n^{(0)}(\hat\beta_n)$ for $\hat\beta_n$ close to $\beta_0$, and therefore on the behavior of $S_n^{(0)}(\hat\beta_n) - S_n^{(0)}(\beta_0)$.
From a development of this sum and due to the convergence rate $(nh_n)^{-1/2}$ of
the estimators $\hat\beta_n(X_i)$, the process
A goodness-of-fit test of a Cox model for the survival time $T^0$ against the
alternative of a model (16.2), where the regression coefficients vary with the
values of the variable $X$ on $J_X$, is a test for the hypothesis $H_0: \beta(x) = \beta \in B$
for every $x$ in $J_X$. Tests for $H_0$ can be based on the process
$x$ in $J_X$, where $\hat\beta_{n,0}$ is the Cox estimator of the regression parameter $\beta_0$ under
the null hypothesis. Under $H_0$, $\hat\beta_{n,0}$ converges to $\beta_0$ at the rate $n^{-1/2}$ and
Under C1-C5 and under the conditions of Theorem 16.3.3, it converges weakly
under $H_0$ to a Gaussian process with mean zero and variance $I_0^{-1}(x)$, and
$\|D_{2n}\|_{J_X}$ tends to infinity under the alternative. A bootstrap test based on
the process $D_{2n}^T(x)\, \hat I_n^{-1}(x)\, D_{2n}(x)$ may be used, where $\hat I_n(x) = I_n(\hat\beta_n(x), x)$ is a
consistent estimator of $I_0(x)$.
Suppose we restrict the alternative to possible differences between the values $\beta(x_i)$ on
a finite subset $(x_i)_{i \le I}$ of $J_X$ such that $(x_i)_{i \le I}$ is an increasing sequence in $J_X$
with $x_i - x_{i-1} > 2h_n$. Let $V_{2n}$ be the vector of dimension $I$ with components the
variables $D_{2n}(x_i)$. From the expression of the covariances of $\alpha_n$, the covariance
of
$$ \int_0^\tau K_{h_n}(x - u)\, \alpha_n(du) \quad \text{and} \quad \int_0^\tau K_{h_n}(x' - u')\, \alpha_n(du') $$
tends to zero for every $s$ and $s' \in J_X$ such that $|s - s'| > 2h_n$. The asymptotic
variance of the variable $V_{2n}$ is therefore a block-diagonal matrix of dimension $pI$
with the sub-matrices $I_0^{-1}(x_i)$, $i \le I$, as diagonal blocks; it is consistently
estimated by the block-diagonal matrix $\hat A_{2n}$ with sub-matrices $I_n(\hat\beta_n(x_i))$. Then
a simple test statistic for constant regression coefficients is given by $V_{2n}^T \hat A_{2n}^{-1} V_{2n}$;
under conditions C1-C5 it has an asymptotic $\chi^2$ distribution with $I$ degrees of
freedom under $H_0$, and it tends to infinity under the alternative if $\beta_0(x_i) \ne
\beta_0(x_j)$ for some $i$ and $j \le I$.
References
1. Andersen, P. K., Borgan, 0., Gill, R. D., and Keiding, N. (1993). Statis-
tical Models Based on Counting Processes, New York: Springer-Verlag.
228 M-L. T. Lee and C. A. Whitmore
the WCL model and investigate the relationship between a disease marker and
clinical disease by modeling them as a bivariate stochastic process. The disease
process is assumed to be latent or unobservable. The time to reach the primary
endpoint or failure (for example, death, disease onset, etc.) is the time when
the latent disease process first crosses a failure threshold. The marker process
is assumed to be correlated with the latent disease process and, hence, tracks
disease, albeit perhaps imperfectly. The general development of this latent survival
model does not require the proportional hazards assumption. The Wiener
process assumptions of the original WCL model and the extended model by
LDS, however, must be verified in actual applications to have confidence in the
validity of the findings in these applications. In this article, we present a suite
of techniques for checking the assumptions of this model and discuss a number of
remedies that are available to make the model applicable.
We assume that the subject's initial disease level is some negative number
$X(0) < 0$. This initial level is unknown and will be estimated. We set a failure
threshold at zero on the disease scale. The closer the value of $X(t)$ is to zero, the
more diseased is the subject. A subject fails, i.e., reaches a primary endpoint,
when the disease level first reaches the origin. The distance from the initial level
to the origin will be denoted by the parameter $\delta$, where $\delta = |X(0)|$. Also, we find it
convenient to consider a modified marker process that measures changes in the
marker process from its initial level, i.e., we consider the marker change process
$\{Y(t)\}$ where $Y(t) = Y_w(t) - Y_w(0)$. The initial marker level $Y_w(0) = Y_{w0}$ is used
as a baseline covariate. We denote the first passage time from the initial disease
Assumptions of a Latent Survival Model 229
1. Transformation:
Appropriate transformations may bring an application context into conformity
with the model. Engineering applications of this kind of model
show that monotonic transformations of the time scale are often needed so
that a Wiener process has a constant drift over time [see, for example, Whitmore
and Schenkelberg (1997)]. Some disease or marker processes will
tend to accelerate or decelerate with time. A monotonic transformation
of the calendar time scale $\tau$, such as $t = \tau^\gamma$ for some $\gamma > 0$, may produce
a constant process mean parameter on the transformed time scale
$t$. Parameters in this time transformation (such as $\gamma$) would then require
estimation. Transformations may be suggested by scientific knowledge
rather than by the data. Longitudinal data would be useful for checking
on the suitability of a transformation for a marker, but there are some
checks that don't require this. For example, a monotonic transformation
of the time scale may bring the observed survival times into conformity
with an inverse Gaussian distribution. Nonstationarity in both the marker
and disease processes may be an issue. As the marker is expected to track
the disease, a similar transformation may be required for both processes.
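The effect of such a power transformation can be illustrated by simulation (our own sketch; the parameter values $\gamma = 2$, $\mu = 1.5$, $\sigma = 0.1$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
gamma, mu, sigma = 2.0, 1.5, 0.1
tau = np.linspace(0.0, 1.0, 2001)        # calendar time
t = tau ** gamma                          # transformed time t = tau^gamma
dt = np.diff(t)
# Wiener process with constant drift mu on the transformed t scale
X = np.concatenate([[0.0],
                    np.cumsum(mu * dt + sigma * rng.normal(scale=np.sqrt(dt)))])
# on the calendar scale the mean path mu * tau^gamma is curved, but a
# straight-line fit against the transformed time recovers a constant drift
drift_hat = float(np.polyfit(t, X, 1)[0])   # ~ mu
```

In practice $\gamma$ is unknown and is estimated together with the process parameters, as the text notes.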
2. Measurement Errors:
If independent measurement errors are present in the increments of the
marker process, then the associated measurement bias and variability will
already be incorporated in the parameters $\mu_y$ and $\sigma_{yy}$. The marker increments
are then interpreted as 'marker increments measured with error'
and no modifications are required in the analysis or the model. If measurement
error is an independent disturbance term that appears in each
marker reading (as indicated by a negative correlation of marker increments),
then the true marker $Y$ becomes latent or unobservable. The
$j$th measurement on the marker process then becomes disguised by the
presence of a measurement error term $\epsilon_j$ as follows (here we suppress the
subject index $i$ in the notation):
$$ O_{wj} = Y_{wj} + \epsilon_j \qquad (17.3) $$
Here $O_{wj}$ is the observed reading and $Y_{wj}$ is the true marker value. In
this situation, we might assume that the $\epsilon_j$ are independently distributed
as $N(0, \nu)$. The likelihood function of the model can then be extended
accordingly and the parameter $\nu$ estimated together with the process parameters.
See Whitmore (1995) for a similar extension of the Wiener model
to include measurement error.
It should be noted that an independent measurement error of the type
shown in (17.3) will introduce a negative correlation between the baseline
marker reading (with error) $O_{w0} = Y_{w0} + \epsilon_0$ and the observed marker
increment $\Delta O_{w1} = O_{w1} - O_{w0}$, because they share the measurement error
$\epsilon_0$ in the initial reading $O_{w0}$. The same negative correlations would appear
in the marker increments for longitudinal marker data, as we noted
earlier. This kind of correlation is found in longitudinal blood pressure
readings, for example. We note, however, that the presence of measurement
error can be confounded with the presence of other dependencies in
the increments of a marker process.
References
1. Lee, M.-L.T., DeGruttola, V., and Schoenfeld, D. (2000). A model for
marker and latent health status, Journal of the Royal Statistical Society, Series B, 62, 747-762.
Abstract: For testing the validity of the Cox proportional hazards model,
we propose a goodness-of-fit test of the null proportional hazards assumption
based on the Kullback-Leibler distance and a semi-parametric generalization of
the Cox model in which the hazard functions can cross for different values of
the covariates. The proposed method is illustrated using some real data. Our
test is compared with previously described tests via simulation experiments
and is found to perform very well.
18.1 Introduction
The Cox proportional hazards (PH) model [Cox (1972)] offers a method for
exploring the association of covariates with the failure time variable often seen
in medical and engineering studies. It is a widely used tool in the analysis of
survival data and hence testing its validity is a matter of prime importance.
For a given vector of observed covariates z = (1, z_1, ..., z_p)', the hazard
at time t is modeled as
λ(t|z) = λ_0(t) exp(β'z), (18.1)
where β = (β_0, β_1, ..., β_p)' is a (p + 1)-vector of regression coefficients and λ_0(t)
is an unspecified function of t, referred to as the baseline hazard function.
Over the years, numerous graphical and analytical procedures have been de-
veloped to test the assumption of proportional hazards. The graphical methods
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
238 K. Devarajan and N. Ebrahimi
include plotting the logarithm of survival [Kalbfleisch and Prentice (1980)] and
methods based on the definitions of different residuals, among others. Schoen-
feld (1982) and Lin and Wei (1991) recommended plotting the elements of the
Schoenfeld residuals against failure times. Pettitt and Bin Daud (1990) sug-
gested smoothing the Schoenfeld residuals to consider time dependent effects.
Wei (1984) and Therneau, Grambsch and Fleming (1990) recommended plotting
the cumulative sums of martingale residuals. Lagakos (1980) proposed a graph-
ical method for assessing covariates in the Cox PH model based on the cumu-
lative hazard transformation and the score from the partial likelihood. Thaler
(1984) proposed nonparametric estimation and plotting of the hazards ratio
to check for non-proportionality. Arjas (1988) proposed a graphical method
based on comparisons between observed and expected frequencies of failures as
estimated from the Cox PH model. Departure from the proportional hazards
assumption is indicated by an imbalance between such frequencies as shown by
the graphs. Other graphical methods include those by Kay (1977), Crowley and
Hu (1977), Cox (1979) and Crowley and Storer (1983). In general, graphical
procedures give a first-hand idea about the departure from proportionality but
are quite subjective.
The analytical methods include, among others, tests for equality sensitive
to crossing hazards [Fleming et al. (1980)] and tests of proportionality for
grouped data [Gail (1981)]. A number of tests have been proposed based on
the time-weighted score tests of the proportional hazards hypothesis. These
include models in which the parameter vector β is defined as a given function
of time as considered by Cox (1972) and Stablein et al. (1981), tests in which β
varies as a step function according to a given partition of the time axis [Moreau,
O'Quigley, and Mesbah (1985)], tests in which β has defined trends along the
time intervals [O'Quigley and Pessione (1989)] and tests in which β varies as a
step function according to a given partition of the covariate space [Schoenfeld
(1980) and Andersen (1982)], among others. Wei (1984) and Gill and Schumacher
(1987) developed tests using time-weighted score tests based on the two-sample
hazards ratio. In the multiple regression setting, similar tests were developed
for the parameters using a rank transformation of time by Harrell (1986) and
Harrell and Lee (1986). Nagelkerke, Oosting, and Hart (1984) considered testing
the global validity of the proportional hazards hypothesis without reference to
any alternative. Horowitz and Neumann (1992) and Lin, Wei, and Ying (1993)
proposed global tests based on the cumulative sums of martingale residuals.
Most of these tests have been shown to be special cases of the methods developed
by Therneau and Grambsch (1994). These methods are applicable when a
prespecified form is given for departures from proportionality.
Lin and Wei (1991) extended the methods of White (1982) for detecting para-
metric model misspecification to the Cox partial likelihood. Kooperberg, Stone,
and Truong (1995) introduced a model for the log-hazard function conditional
on the covariates. The Cox PH model is a member of the class of models
considered for the conditional log-hazard, and hence their method tests the
proportionality assumption.
Goodness-of-Fit Testing 239
Hess (1994) considered cubic spline functions for assessing time by
covariate interactions in the Cox PH model. Quantin et al. (1996) derived a
global test of the proportional hazards hypothesis using the score statistic from
the partial likelihood. Pena (1998) discussed smooth goodness-of-fit tests for
the baseline hazard in the Cox PH model.
We define a semi-parametric generalization of the Cox PH model in which
the hazard functions corresponding to different values of the covariates can
cross. The cumulative hazard function corresponding to a covariate vector z is
given by
Λ(t|z) = [Λ_0(t)]^{exp(γ'z)} exp(β'z), (18.2)
where Λ_0(t) is an arbitrary baseline cumulative hazard function and β and γ
are unknown (p + 1)-vectors of parameters.
In addition to being a semi-parametric generalization of the Cox PH model,
the model (18.2) has several nice features. Below we only mention two. For
other features and also more details about this model see Quantin et al. (1996)
and Devarajan and Ebrahimi (2000).
That is, the corresponding ratios of the hazard function to the cumulative
hazard function are proportional.
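The crossing property of (18.2) can be made concrete with a single binary covariate and the unit-exponential baseline Λ_0(t) = t, for which the hazard reduces to λ(t|z) = exp((β + γ)z) t^{exp(γz)−1}. The sketch below uses illustrative values β = 0.5, γ = −0.7 (our own choice, not from the chapter) and shows the hazard ratio between z = 1 and z = 0 passing from above 1 to below 1, which is impossible under the Cox PH model.

```python
import math

def hazard(t, z, b=0.5, g=-0.7):
    """Hazard under model (18.2) with baseline Lambda_0(t) = t:
    Lambda(t|z) = t**exp(g*z) * exp(b*z), so by differentiation
    lambda(t|z) = exp((b + g)*z) * t**(exp(g*z) - 1).
    b and g stand in for beta and gamma (illustrative values)."""
    return math.exp((b + g) * z) * t ** (math.exp(g * z) - 1.0)

ratio_early = hazard(0.1, 1) / hazard(0.1, 0)   # hazard ratio early in time
ratio_late = hazard(10.0, 1) / hazard(10.0, 0)  # hazard ratio later in time
print(ratio_early > 1.0, ratio_late < 1.0)      # the ratio crosses 1
```

Under (18.1) the same ratio would be constant in t, so any sign change of its logarithm signals non-proportionality.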
In this paper, our goal is to develop goodness-of-fit testing methods for the
Cox PH model against the model (18.2) using Kullback-Leibler Discrimination
(KLD) information measures [see Kullback (1978) and Ebrahimi et al. (1994)].
The structure of the paper is as follows.
The methods are described in Section 18.2. In Section 18.3, we compare
the proposed test with existing tests based on empirical power estimates via
simulation experiments. In Section 18.4, we illustrate our methods using real
life data sets. Throughout this paper we assume that the data consist of
independent observations on the triples (X_i, δ_i, z_i), i = 1, ..., n, where X_i is the
minimum of a failure and censoring time pair (T_i, C_i), δ_i = I(T_i ≤ C_i) is the
indicator of the event that a failure has been observed, and z_i = (1, z_i1, ..., z_ip)'
is a (p + 1)-vector of covariates for the i-th individual. The random variables
T_i and C_i are assumed to be independent.
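As a concrete (hypothetical) instance of this data structure, the sketch below generates triples (X_i, δ_i, z_i) from an exponential-baseline Cox model with uniform censoring; all parameter choices are illustrative and not taken from the paper.

```python
import math
import random

random.seed(7)

def simulate_triple(beta0=0.1, beta1=0.5, B=3.0):
    """One observation (X, delta, z): T is exponential with rate
    exp(beta'z) (a Cox model with constant baseline hazard), and C is
    an independent Uniform(0, B) censoring time. Values are illustrative."""
    z = (1.0, random.random())                    # covariate vector (1, z1)
    rate = math.exp(beta0 * z[0] + beta1 * z[1])  # lambda(t|z) under (18.1)
    T = random.expovariate(rate)                  # failure time
    C = random.uniform(0.0, B)                    # censoring time
    return min(T, C), int(T <= C), z              # (X, delta, z)

data = [simulate_triple() for _ in range(200)]
print(all(x >= 0 and d in (0, 1) for x, d, _ in data))
```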
240 K. Devarajan and N. Ebrahimi
which measures the discrepancy between the two distributions F and G in the
direction of H_1.
We can derive the divergence measure
Similarly, the survival function and the density function corresponding to the
non-proportional hazards model of H_1, denoted by G(t|z) and g(t|z), respectively, are
and
g(t|z) = exp{−Λ(t|z) + β'z + γ'z} λ_0(t) [Λ_0(t)]^{exp(γ'z)−1}. (18.10)
Thus, using (18.8) and (18.10), the directed divergence I(G : F|z), given the
covariate vector z, is given by
and the directed divergence I(F : G|z), given the covariate vector z, is given by
Hence, given the covariate vector z, the divergence measure J(F : G|z) is
Here K_e = ∫_0^∞ e^{−u} log u du is Euler's constant and Γ(·) is the gamma function.
An important feature of all three measures described in (18.11)-(18.13) is that
they are all free of the baseline λ_0(t).
Since evaluations of I(G : F|z), I(F : G|z) and J(F : G|z) in (18.11)-
(18.13) require complete knowledge of the unknown parameters β and γ, these
measures are not operational. We operationalize I(G : F|z), I(F : G|z) and
J(F : G|z) by developing discrimination information statistics Î(G : F|z), Î(F :
G|z) and Ĵ(F : G|z), where β and γ are replaced by β̂ and γ̂. Here, β̂ and γ̂
are the maximum likelihood estimates obtained by approximating the baseline
hazard function with a linear combination of cubic B-spline basis functions.
For more details about these estimates and their properties see Devarajan and
Ebrahimi (2000). Thus, for a given value of z, our goodness-of-fit test will be
based on either Î(G : F|z), Î(F : G|z) or Ĵ(F : G|z).
242 K. Devarajan and N. Ebrahimi
Remark 18.2.1 Observing Î(G : F|z), Î(F : G|z) and Ĵ(F : G|z), we see that
they all depend on the covariate vector z. As a global measure of goodness-
of-fit, we suggest averaging Î(F : G|z_i), Î(G : F|z_i) and Ĵ(F : G|z_i) over all
individuals in the sample, that is, considering (1/n) Σ_{i=1}^n Î(F : G|z_i),
(1/n) Σ_{i=1}^n Î(G : F|z_i) and (1/n) Σ_{i=1}^n Ĵ(F : G|z_i). Another approach
is taking Î(F : G|z̄), Î(G : F|z̄) and Ĵ(F : G|z̄), where z̄ is the average covariate
value over all the individuals.
Now, to implement the test statistics Î(F : G|z), Î(G : F|z) and Ĵ(F : G|z), use
the following steps:
Step 1: Use the Devarajan and Ebrahimi (2000) approach to estimate β and
γ. Denote the estimates by β̂ and γ̂.
Step 2: Replace β and γ by β̂ and γ̂ in equations (18.11)-(18.13) to get
Î(F : G|z), Î(G : F|z) and Ĵ(F : G|z).
Step 3: One can show that 2n Î(F : G|z), 2n Î(G : F|z) and n Ĵ(F : G|z)
have asymptotically chi-squared distributions with q degrees of freedom under
H_0 [see Kullback (1978)]. Here q is the number of parameters under H_0.
Therefore, if you are using Î(F : G|z), then reject H_0 if 2n Î(F : G|z) > χ²_{q,α},
where χ²_{q,α} is the upper α-point of the chi-squared distribution with q degrees
of freedom and α is the significance level of the test. If you are using Î(G : F|z),
then reject H_0 if 2n Î(G : F|z) > χ²_{q,α}. Finally, if you are using Ĵ(F : G|z),
then reject H_0 if n Ĵ(F : G|z) > χ²_{q,α}.
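The decision rule of Step 3 amounts to a one-line comparison. In the sketch below the statistic value is hypothetical (not computed from data), and 3.841 is the upper 5% point of the chi-squared distribution with q = 1 degree of freedom, the case used in the simulation study of Section 18.3.

```python
# Sketch of the decision rule in Step 3. The divergence estimate is a
# hypothetical value standing in for I-hat(F : G | z) from Steps 1-2.
n = 100            # sample size
I_hat = 0.025      # hypothetical estimated directed divergence
chi2_crit = 3.841  # chi-squared critical value for q = 1, alpha = 0.05

stat = 2 * n * I_hat      # test statistic 2n * I-hat
reject = stat > chi2_crit
print(stat, reject)       # 5.0 True
```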
(0, B) to result in 25% censoring. Computations for the proposed method were
based on the full likelihood approach using B-spline approximations for the
baseline hazard as described by Devarajan and Ebrahimi (2000).
The proposed method is compared with goodness-of-fit tests of Quantin et
al. (1996), Breslow et al. (1984), Gill and Schumacher (1987), Nagelkerke et
al. (1984), Wei (1984) and that of Cox (1972) incorporating time-dependent
covariate effect. The critical values of the test statistics for the proposed test
were computed based on the Chi-square distribution with 1 degree of freedom
at a significance level of 0.05 as discussed in Section 18.2. Quantin et al. (1996)
note that the critical values of all the test statistics in their comparison study
were also based on the Chi-square distribution with 1 degree of freedom at a
significance level of 0.05 except that of Wei (1984) whose critical values are
based on the tables of Koziol and Byar (1975).
It was seen in the simulations that the proposed test achieves the specified
significance level of 5% and hence it is a consistent test. Tables 18.1 through
18.6 present the results of the power study. From the results, there is clear
evidence that the proposed test performs better than most of the existing tests
for the Cox PH model. The empirical power of the proposed test is higher than
all the other tests in most situations. Overall, the results are much better for the
case of the Weibull distribution with shape parameter β = 0.5 for both choices
of the scale parameter α = 1, 2. The proposed test performs moderately well
relative to the other tests for the case of β = 2 and α = 1 in both uncensored
and censored samples with group sizes 30 and 50.
It would be interesting to study how the proposed test performs in the case
of other distributions such as a log-logistic and a lognormal distribution that al-
low the hazard functions corresponding to differing scale and shape parameters
to cross but they do not satisfy the model (18.2). In order to get a first-hand
idea on the performance of the proposed test in such situations, a small simu-
lation study was performed. Simulations were performed based on a lognormal
model for the two-sample problem using hazard functions λ_0(t) in Group 0 cor-
responding to the standard lognormal distribution with scale parameter μ = 0
and shape parameter σ = 1 and λ_1(t) in Group 1 corresponding to a lognormal
distribution with scale parameter μ and shape parameter σ. The experiment
was repeated 1000 times for uncensored samples for each of the following combi-
nations: μ = 0, 1, σ = 0.5, 2 and n = 30, 50 and 100 per group. The results are
shown in Tables 18.7 through 18.9. From the tables, we see that the empirical
power estimates are much better when σ = 2 relative to σ = 1. Even in the
case of σ = 1, the empirical powers are higher for the case of μ = 1 relative to
μ = 0. There is an indication that the proposed test is able to pick up crossing
hazards even in situations where the underlying distributions do not belong to
the family of models (i.e., Weibull) included in the non-proportional hazards
model (18.2).
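The crossing of lognormal hazards invoked here can be checked numerically. The sketch below (the evaluation times 0.05 and 5 are our own illustrative choices) evaluates the lognormal hazard, the density over the survival function, for Group 0 (μ = 0, σ = 1) and for one of the Group 1 configurations (μ = 1, σ = 2), and shows the hazard ratio crossing 1.

```python
import math

def lognorm_hazard(t, mu, sigma):
    """Hazard of a lognormal(mu, sigma) distribution at time t,
    computed as the normal pdf over the normal survival function
    on the log scale."""
    z = (math.log(t) - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / (sigma * t * math.sqrt(2.0 * math.pi))
    sf = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(lifetime > t)
    return pdf / sf

# Group 0: standard lognormal (mu = 0, sigma = 1);
# Group 1: mu = 1, sigma = 2, one of the simulated configurations.
r_early = lognorm_hazard(0.05, 1, 2) / lognorm_hazard(0.05, 0, 1)
r_late = lognorm_hazard(5.0, 1, 2) / lognorm_hazard(5.0, 0, 1)
print(r_early > 1.0, r_late < 1.0)   # the two hazards cross
```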
We also compare the goodness-of-fit statistics based on the directed divergences
Î(F : G) and Î(G : F) and the divergence Ĵ(F : G). The
comparison is made for each combination of sample size, censoring percentage
and Weibull distribution characteristics as given above. In each case, we ob-
serve that the directed divergence Î(F : G) gives the highest empirical
power among the three statistics. But Ĵ(F : G) measures the directed
divergences in both directions, as pointed out earlier. Overall, the power estimates
based on these three measures are in proximity to each other.
(a) Use of the information statistic enables us to test the Cox PH model against
a non-proportional hazards model.
(b) Unlike the other tests, our proposed test does not depend on the baseline
hazard function λ_0(t).
(c) The proposed test performs very well in terms of power compared with other
leading tests against non-proportional hazards alternatives.
References
1. Andersen, K. (1982). Testing goodness-of-fit of Cox's regression and life
model, Biometrics, 38, 67-77.
11. Ebrahimi, N., Habibullah, M., and Soofi, E. (1994). Testing exponential-
ity based on Kullback-Leibler information, Journal of the Royal Statistical
Society, Series B, 54, 739-748.
20. Kay, R. (1977). Proportional hazards regression models and the analysis
of censored survival data, Applied Statistics, 26, 227-237.
21. Kooperberg, C., Stone, C. J., and Truong, Y. K. (1995). Hazard regres-
sion, Journal of the American Statistical Association, 90, 78-94.
27. Lin, D. Y., Wei, L. J., and Ying. Z. (1993). Checking the Cox model with
cumulative sums of martingale residuals, Biometrika, 80, 557-572.
28. Lin, D. Y. and Wei, L. J. (1991). Goodness-of-fit tests for the general
Cox regression model, Statistica Sinica, 1, 1-17.
29. Moreau, T., O'Quigley, J., and Mesbah, M. (1985). A global goodness-
of-fit statistic for the proportional hazards model, Applied Statistics, 34,
212-218.
34. Quantin, C., Moreau, T., Asselain, B., Maccario, J., and Lellouch, J.
(1996). A Regression model for testing the proportional hazards hypoth-
esis, Biometrics, 52, 874-885.
36. Schoenfeld, D. (1982). Partial residuals for the proportional hazards re-
gression model, Biometrika, 69, 239-241.
19.1 Introduction
A large fraction of analyses of correlated survival data appearing in recent sta-
tistical publications is based on some form of frailty models. Frailty models
have been used both to model population heterogeneity in univariate survival
data and to model association in multiple survival data. The model we intro-
duce in the present treatise is meant to offer an alternative to the latter only.
256 S. T. Gross and C. Huber-Carol
The former were first introduced by Vaupel, Manton, and Stallard (1979) and
Clayton and Cuzick (1985). For recent references and an illuminating discussion
of the two types of frailty models, we refer the reader to Scheike, Petersen, and
Martinussen (1999), and for a rigorous study of the asymptotic properties of
frailty models to Parner (1998). A Cox type conditional frailty model assumes
that the data for a single cluster k, k = 1,2, ... ,K consists of possibly censored
observations (T_k1, ..., T_kd) and death indicators (δ_k1, ..., δ_kd), where
T_kj = X_kj ∧ C_kj,
δ_kj = 1{X_kj ≤ C_kj}.
The censoring variables C_k = (C_k1, ..., C_kd) are assumed independent and in-
dependent of the survival variables X_k = (X_k1, ..., X_kd). Their distributions
are also assumed not to depend explicitly on the parameters β and γ character-
izing the laws of X_k. The components of X_k are assumed independent given the
frailty W_k and the covariates (Z, V) of cluster k with hazards (instantaneous in
the continuous case and discrete in the discrete case)
(19.1)
for some parameter α > 0, where the marginal survival functions S_1, S_2, ..., S_d
are arbitrary, and often assumed to follow the Cox model:
(19.3)
New Multivariate Survival Model 257
Here the margin parameters are interpretable in the usual way but the model,
dependent as it is on a single association parameter α, allows no possibility
of modeling within-cluster dependence on the individual-level covariate V_kj,
j = 1,2, ... ,d. The model may therefore represent a first order approximation
to the far more complicated dependence structure likely to be present in real
survival data. Our model, which breaks down the dependence structure in the
data into hierarchical components without resorting to the use of random effects,
allows a finer representation of dependence. Moreover it allows an interpretation
of the parameters as log odds ratios for failure.
Here y_k(t) and r_k(t) denote the observed values of Y_k(t) and R_k(t) for k =
1, 2, ..., K and t = 1, 2, ..., T. For uncensored data r_k(t) = r_k(t − 1) − y_k(t)
and V(θ) = V_1(θ). In case of censoring, we assume that θ = (θ_1, θ_2), where θ_1
is the parameter of interest, so that we can use the partial likelihood V_1 for
inference on θ_1. Without further comment, we shall drop the subscript 1 from
258 S. T. Gross and C. Huber-Carol
P(Y_k(t) = y | R_k(t) = r) = P(Y(t) = y | R(t) = r). (19.6)
This is a Markov like assumption that characterizes our model. The likelihood
V(θ) = ∏_{k=1}^K ∏_{t=1}^T P^θ[Y_k(t) = y_k(t) | R_k(t) = r_k(t)] (19.7)
is a complete likelihood for our model in the uncensored case and a partial
likelihood in the right censored case. It may also be viewed, in light of (19.5-
19.6), as a partial likelihood for θ for censored or uncensored data when the true
model for the complete, possibly unobserved, data, does not satisfy assumption
(19.6). Defining now
L = Σ_{k=1}^K Σ_{t=1}^T Σ_r Σ_{y≤r} N(r, y, t) ln(P^θ[Y(t) = y | R(t) = r]). (19.8)
(19.9)
and
c(r, t) = Σ_{y≤r} exp{ Σ_{0<r'≤r} Σ_{0<y'≤r'∧y} P_{r',y'}(t) }. (19.10)
When no restrictions are placed on the new parameters P_{r,y}(t), our model is
saturated and imposes no further restrictions beyond the basic Markov
assumption (19.6). For d = 2 we have a set of five p-parameters, P_{11,11}(t),
P_{11,01}(t), P_{11,10}(t), P_{10,10}(t) and P_{01,01}(t).
Theorem 19.3.1 For clusters of size d, the law defined by (19.7)-(19.10) with
K = 1 for complete data defines a law for independent X_1, ..., X_d if and only
if P_{r,y}(t) ≡ 0 for all r with |r| > 1 and all y ≤ r, and marginal laws defined by
We state Theorems 19.3.2 and 19.3.3 for the bivariate case. Extensions to
the general d-dimensional case are straightforward but require cumbersome
notation.
Theorem 19.3.2 For the bivariate law defined by (19.1)-(19.10) with K = 1,
d = 2 for the complete data (without censoring), we have:
P_{11,10} = ln( [P(X_1 = t, X_2 > t | X_1 ≥ t, X_2 ≥ t) / P(X_1 > t, X_2 > t | X_1 ≥ t, X_2 ≥ t)]
/ [P(X_1 = t, X_2 < t | X_1 ≥ t, X_2 < t) / P(X_1 > t, X_2 < t | X_1 ≥ t, X_2 < t)] ). (19.12)
P(X_1 = t_1, X_2 = t_2) = ∏_{t=1}^{t_1−1} P(X_1 > t, X_2 > t | X_1 ≥ t, X_2 ≥ t)
× P(X_1 = t_1, X_2 > t_1 | X_1 ≥ t_1, X_2 ≥ t_1)
× ∏_{t=t_1+1}^{t_2−1} P(X_2 > t | X_1 = t_1, X_2 ≥ t)
× P(X_2 = t_2 | X_1 = t_1, X_2 ≥ t_2). (19.16)
By symmetry, we have the same result for the case T ≥ t_1 > t_2 > 0. For
0 ≤ t_1 = t_2 = t ≤ T, we can always write
P(X_1 = t, X_2 = t) = ∏_{t'=1}^{t−1} P(X_1 > t', X_2 > t' | X_1 ≥ t', X_2 ≥ t')
× P(X_1 = t, X_2 = t | X_1 ≥ t, X_2 ≥ t). (19.18)
This proves the 'if' part of the theorem. We note that the formulation of this
result in terms of our parameters P_{r,y}(t) and c(r, t) is
Conversely, if L(X_1, X_2) is in our family, (19.14) follows directly from the basic
assumption (19.6) of our model. Thus, a test criterion for being in our family
may be based on equations (19.14). This completes the proof of the theorem. ∎
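As a sanity check on the factorization argument, the sketch below verifies an identity of the form (19.18) numerically in the simplest independent case, taking geometric marginals on {1, 2, ...} (our own choice of example, not from the chapter): the conditional product of survival terms reproduces the joint probability exactly.

```python
def geom_pmf(t, p):
    """P(X = t) for a geometric waiting time on {1, 2, ...}."""
    return p * (1.0 - p) ** (t - 1)

def joint_via_factorization(t, p1, p2):
    """Right-hand side of a (19.18)-type factorization for independent
    geometric X1, X2: P(X > t' | X >= t') = 1 - p for each margin, and
    by independence the conditional joint terms factor."""
    prod = 1.0
    for _ in range(1, t):                   # t' = 1, ..., t-1
        prod *= (1.0 - p1) * (1.0 - p2)     # P(X1 > t', X2 > t' | both >= t')
    return prod * (p1 * p2)                 # P(X1 = t, X2 = t | both >= t)

t, p1, p2 = 4, 0.3, 0.5
lhs = geom_pmf(t, p1) * geom_pmf(t, p2)     # P(X1 = t, X2 = t) by independence
rhs = joint_via_factorization(t, p1, p2)
print(abs(lhs - rhs) < 1e-12)               # True: the factorization holds
```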
Remarks
1. In case of censoring, the only unusable observations are those for which
the first time is a censoring time. Otherwise, one can stratify the number
of second "deaths" at each time t on the first observed "death" time
and check for independence. For consistency and asymptotic results for
our partial maximum likelihood estimate see Gross and Huber (2000).
2. In applications, when (X_k1, ..., X_kd) are not exchangeable, their indices
can represent "structural covariates". In some applications X_k1 may
represent survival of the treated and X_k2 survival of the non-treated. In
the skin grafts example below there are up to 4 closely matched grafts
and up to 4 poorly matched grafts per patient. One representation of the
data would then involve up to d = 8 grafts, where X_k1, ..., X_k4 represent
survival of closely matched grafts, and X_k5, ..., X_k8 represent survival
of poorly matched grafts. We chose a more parsimonious representation
below that reflects the exchangeability within the poorly matched and
within closely matched grafts.
Table 19.1: Batchelor and Hackett (1970) skin grafts data on severely burnt
patients
Table 19.2: Some risk sets R and jump sets S for skin grafts data
Risk sets:
R_k(i) = (1122): 2 close and 2 poor grafts are present at time i in patient k.
R_k'(i) = (1120): 2 close and 1 poor grafts are present at time i in patient k'.
R_k''(i) = (1100): 2 close grafts are present at time i in patient k''.
Jump sets:
S_k(i) = (1200): 1 close and 1 poor graft were rejected at time i in patient k.
S_k'(i) = (2000): 1 poor graft was rejected at time i in patient k'.
S_k''(i) = (0000): No graft was rejected at time i in patient k''.
effect of 0.22 with a 95% confidence interval of (0.048, 1.03) based on a subset
of the data. Our result is certainly in rough agreement with theirs. Nielsen et
al. (1992) could not reject the hypothesis that the variance parameter of the
frailty gamma distribution is zero, in other words, the hypothesis that allografts
of a given patient are independent. In our fitted model we can reject the null
hypothesis that allograft survivals in a single patient are independent (although
the 95% confidence interval for P_{12,1} = P_{12,2} contains zero, but just barely). The
likelihood ratio test for H_0 : P_{12,1} = P_{12,2} = 0 in model 8 is LR = 5.7, which is
parameter and that it does not depend on graft match.
References
1. Arjas, E. and Haara, P. (1988). A note on the asymptotic normality in
the Cox regression model, The Annals of Statistics, 16, 1133-1140.
2. Batchelor, J. R. and Hackett, M. (1970). HL-A matching in treatment of
burned patients with skin allografts, Lancet, 19, 581-583.
3. Clayton, D. G. and Cuzick, J. (1985). Multivariate generalizations of
the proportional hazards model (with discussion), Journal of the Royal
Statistical Society, Series A, 148, 82-117.
4. Clegg, L. X., Cai, J., and Sen, P. K. (1999). A marginal mixed baseline
hazards model for multivariate failure time data, Biometrics, 55, 805-812.
5. Efron, B. (1988). Logistic regression, survival analysis, and the Kaplan-
Meier curve, Journal of the American Statistical Association, 83, 414-425.
6. Gross, S. T. and Huber-Carol C. (2000). Hierarchical dependency models
for multivariate survival data with censoring, Lifetime Data Analysis, 6,
299-320.
7. Hanley, J. A. and Parnes, M. N. (1983). Nonparametric estimation of
a multivariate distribution in the presence of censoring, Biometrics, 39,
129-139.
8. Huster, W. J., Brookmeyer, R., and Self, S. G. (1989). Modeling paired
survival data with covariates, Biometrics, 45, 145-156.
9. Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure
Time Data, New York: John Wiley & Sons.
10. Nielsen, G. G., Gill, R. D., Andersen, P. K., and Sørensen, T. I. A. (1992).
A counting process approach to maximum likelihood estimation in frailty
models, Scandinavian Journal of Statistics, 19, 25-43.
New Multivariate Survival Model 265
14. Vaupel, J. W., Manton, K. G., and Stallard, E. (1979). The impact of
heterogeneity in individual frailty and the dynamics of mortality, Demog-
raphy, 16, 439-447.
20
Discrimination Index, the Area Under the ROC
Curve
268 B-H. Nam and R. B. D'Agostino
20.1 Introduction
Background: Performance measures in mathematical predictive
models
Consider a vector of variables V = (V_1, V_2, ..., V_k), independent variables in a re-
gression, or risk factors, and a variable W, the dependent variable, or outcome
variable, having 1 for a positive outcome and 0 for a negative outcome. Here, 'positive
outcome' indicates occurrence or presence of an event such as coronary
heart disease.
Health Risk Appraisal functions (HRAF) are mathematical models that are
functions of the data (V), which relates to the probability of an event (W).
Symbolically, for a configuration V of the data,
P(W = 1) = 1 / (1 + exp(−β'V)).
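A sketch of the HRAF computation follows; the coefficient vector and risk-factor values are invented for illustration, with β[0] the intercept and V carrying a leading 1 to match it.

```python
import math

def hraf_probability(v, beta):
    """Logistic Health Risk Appraisal function: P(W = 1 | V = v),
    i.e. 1 / (1 + exp(-beta'v)). Both arguments are illustrative."""
    linear = sum(b * x for b, x in zip(beta, v))
    return 1.0 / (1.0 + math.exp(-linear))

beta = [-2.0, 0.04, 0.8]   # hypothetical intercept, age-like and indicator effects
v = [1.0, 50.0, 1.0]       # one configuration of risk factors
p = hraf_probability(v, beta)
print(round(p, 3))         # 0.69: predicted probability of the event
```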
Second, the Cox regression model is a survival analysis model that relates
V to the development of an event over a period of time t, but in this model, we
take into consideration the time to event and censoring, for example, dropouts,
lost to follow-up. The following is its mathematical expression:
where S_0(T = t | V̄) is the survival probability for those with the mean vector
values V̄.
The value of C varies from 0.5 with no discrimination ability to 1 with per-
fect discrimination and is related only to the ranks of the predicted probabilities.
Bamber (1975) recognized that the area under the ROC curve is an unbiased
estimator of the probability of correctly ranking an (event, no-event) pair and
that this probability is closely connected with the Mann-Whitney statistic.
Hanley and McNeil (1982) elaborated the relationship between the area
under the ROC curve and the Mann-Whitney statistic and showed that the
two are identical, i.e.,
C statistic (20.1)
where
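The Mann-Whitney form of the C statistic is easy to state in code: it is the proportion of (event, non-event) pairs in which the event member has the higher predicted value, with ties counted as one half. The toy scores below are invented for illustration.

```python
def c_statistic(y_events, y_nonevents):
    """Area under the ROC curve as a rescaled Mann-Whitney count:
    the proportion of (event, non-event) pairs ranked correctly,
    with ties counted as one half."""
    pairs = len(y_events) * len(y_nonevents)
    wins = sum((a > b) + 0.5 * (a == b)
               for a in y_events for b in y_nonevents)
    return wins / pairs

events = [0.9, 0.8, 0.6]          # hypothetical predicted probabilities, W = 1
nonevents = [0.7, 0.4, 0.3, 0.6]  # hypothetical predicted probabilities, W = 0
print(c_statistic(events, nonevents))   # 0.875
```

Here 10 of the 12 pairs are correctly ranked and one pair is tied, giving (10 + 0.5)/12 = 0.875.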
Point estimate of Δ
Say we have random samples of n_1 events and n_2 non-events; then, under the
shift model, the observations Y_21, Y_22, ..., Y_2n_2 and Y_11 − Δ, Y_12 − Δ, ..., Y_1n_1 − Δ
have the same distribution. Hence, we could estimate Δ by the amount
by which the Y_1-values must be shifted to give the best possible agreement with
the Y_2-values. To do this, we define D_jk = Y_1j − Y_2k, for j = 1 to n_1 and k = 1
to n_2. Then, following Lehmann (1975), the estimator Δ̂ would be the median
of the n_1n_2 values of D_jk, which Lehmann (1975) showed is an unbiased estimator
of Δ (i.e., E(Δ̂) = Δ) and also median unbiased (i.e., Δ is the median of the
distribution of Δ̂) if one of the following conditions is satisfied:
(2) The two sample sizes are equal, that is, n_1 = n_2.
Confidence interval of Δ
(20.2)
(20.3)
(20.4)
ℓ = (1/2)[ n_1n_2 + 1 − 1.96 √( n_1n_2(n_1 + n_2 + 1)/3 ) ], (20.5)
so Δ_low will be the ℓth value from the lowest of the n_1n_2 values of D_jk. In
a similar fashion, the upper bound Δ_up = D_(n_1n_2 − ℓ + 1) is the (n_1n_2 − ℓ + 1)th
value from the lowest of the values of D_jk.
20.2.3 Confidence interval for the area under the ROC curve
We now construct the lower and upper confidence bounds for C, (C_low, C_up),
by using the lower and upper bounds for Δ. Let Dis(Y) denote the distribution
of Y. Then, it can be seen that, under the shift model,
Hence,
For the lower bound C_low, let Δ_1 = Δ_low and Δ_0 = Δ̂. Then, from (20.6) and (20.7),
Dis(Y_1 − Δ̂) = Dis(Y_2), and
Now, say we have a new pair (Y_2k, V_j) for k = 1 to n_2, j = 1 to n_1, where
V_j = Y_1j + (Δ_low − Δ̂). Hence, C_low would be
C_low = (1/(n_1n_2)) [ {number of pairs (k, j) with Y_2k < V_j}
+ {number of pairs (k, j) with Y_2k = V_j} ]
= (1/(n_1n_2)) W_{VY_2}. (20.9)
For the upper bound C_up, let Δ_1 = Δ_up and Δ_0 = Δ̂. Then, from (20.6) and (20.7),
Dis(Y_1 − Δ̂) = Dis(Y_2). So,
Now, say we have a new pair (Y_2k, U_j) for k = 1 to n_2, j = 1 to n_1, where
U_j = Y_1j + (Δ_up − Δ̂). Hence, C_up would be (1/(n_1n_2)) W_{UY_2}. Therefore, the
confidence interval for C, the area under the ROC curve, would be
(20.11)
Area Under the ROC Curve 273
Y_1: 0.111 0.148 0.189 0.237 0.251
Y_2: 0.034 0.067 0.095 0.107 0.114 0.121 0.128 0.133 0.139 0.142
0.147 0.152 0.155 0.164 0.175 0.187 0.193 0.216 0.227 0.243
Δ̂ = (1/2)[D_(50) + D_(51)] = (1/2)[0.041 + 0.042] = 0.0415.
From (20.5), the ℓ value for the 95% confidence interval is ℓ = 22. Hence, Δ_low is
D_(22) = −0.022. Similarly, Δ_up is D_(100−22+1) = D_(79) = 0.104. So, the 95%
confidence interval for Δ is (−0.022, 0.104).
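The worked example can be reproduced directly from the listed Y_1 and Y_2 scores. The sketch below recomputes the point estimate, the ℓ value of (20.5), and the 95% interval, matching the values quoted above.

```python
import math

# n1 = 5 event scores and n2 = 20 non-event scores from the worked example.
Y1 = [0.111, 0.148, 0.189, 0.237, 0.251]
Y2 = [0.034, 0.067, 0.095, 0.107, 0.114, 0.121, 0.128, 0.133, 0.139,
      0.142, 0.147, 0.152, 0.155, 0.164, 0.175, 0.187, 0.193, 0.216,
      0.227, 0.243]

D = sorted(round(y1 - y2, 3) for y1 in Y1 for y2 in Y2)  # n1*n2 = 100 differences
n1, n2 = len(Y1), len(Y2)

delta_hat = 0.5 * (D[49] + D[50])     # median of the 100 values of D_jk
ell = round(0.5 * (n1 * n2 + 1
                   - 1.96 * math.sqrt(n1 * n2 * (n1 + n2 + 1) / 3)))
delta_low = D[ell - 1]                # the ell-th smallest difference
delta_up = D[n1 * n2 - ell]           # the (n1*n2 - ell + 1)-th smallest
print(delta_hat, ell, delta_low, delta_up)
```

Running this yields Δ̂ = 0.0415, ℓ = 22, and the interval (−0.022, 0.104), as in the text.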
Then, we have n pairs (T_1, Y_1), (T_2, Y_2), ..., (T_n, Y_n).
Define
where,
1. event vs. non-event: comparing those who developed events against those
who did not
2. event vs. event: comparing those who developed events against those who
also developed events
3. event vs. censored: comparing those who developed events against those
who were censored
(20.12)
Area Under the ROC Curve 275
where
Here, since all the survival times for those who did not develop events are
longer than the maximum value of the event time for those who developed
events, it is obvious that a_ij is always equal to 1. Hence,
C_1 = (1/Q_1) Σ_{i=1}^{n_1} Σ_{j=1}^{n} b_ij. (20.13)
(20.15)
where
C_2 = (1/2)(T̄ + 1). (20.16)
(20.17)
where
(20.18)
C = a C_1 + b C_2 + (1 − a − b) C_3.
Now, we can argue that the overall C tends to normality since C_1, C_2 and C_3
are all independent of one another and each of them is asymptotically normal.
See appendix for the mean and variance of the overall C.
Appendix
1. Mean and variance of the C statistic, the area under the ROC curve, in logistic
regression
where,
E(f)
Var(f)
E(C3)
Var(C3)
where,
P_32 = P(Y_1 > Y_3, Y_1' > Y_3 | T_1 < T_3, T_1' < T_3),
P_33 = P(Y_1 > Y_3, Y_1 > Y_3' | T_1 < T_3, T_1 < T_3'),
and Q_3, A and B are unknown quantities.
Var[C] = Var[a C_1 + b C_2 + (1 − a − b) C_3]
= a² Var[C_1] + b² Var[C_2] + (1 − a − b)² Var[C_3]
= a² (1/(n_1n_2)) {P_1(1 − P_1) + (n_1 − 1)(P_12 − P_1²) + (n_2 − 1)(P_13 − P_1²)}
+ b² (1/4) [ (4(n_1 − 2)/(n_1(n_1 − 1))) Var(T_i) + (2/(n_1(n_1 − 1))) (1 − T̄²) ]
+ (1 − a − b)² (1/Q_3²) [ Q_3 P_3(1 − P_3) + A(P_32 − P_3²) + B(P_33 − P_3²) ]
= [ n_1n_2 / {n_1n_2 + (1/2)n_1(n_1 − 1) + Σ_{i=1}^{n_1} Σ_{j=1}^{n} a_ij}² ]
× {P_1(1 − P_1) + (n_1 − 1)(P_12 − P_1²) + (n_2 − 1)(P_13 − P_1²)}
+ [ {(1/2)n_1(n_1 − 1)}² / {n_1n_2 + (1/2)n_1(n_1 − 1) + Σ_{i=1}^{n_1} Σ_{j=1}^{n} a_ij}² ]
× (1/4) [ (4(n_1 − 2)/(n_1(n_1 − 1))) Var(T_i) + (2/(n_1(n_1 − 1))) (1 − T̄²) ]
+ [ 1 / {n_1n_2 + (1/2)n_1(n_1 − 1) + Σ_{i=1}^{n_1} Σ_{j=1}^{n} a_ij}² ]
References
1. Bamber, D. (1975). The area above the ordinal dominance graph and
the area below the receiver operating graph, Journal of Mathematical
Psychology, 12, 387-415 .
4. Hanley, J. A. and McNeil, B. J. (1982). The meaning and use of the area
under a receiver operating characteristic (ROC) curve, Radiology, 143,
29-36.
11. Ury, H. K. (1972). On distribution-free confidence bounds for Pr{Y < X},
Technometrics, 14, 577-581.
21
Goodness-of-Fit Tests for Accelerated Life Models
21.1 Introduction
In accelerated life testing (ALT) units are tested at higher-than-usual levels of
stress to induce early failures. The results are extrapolated to estimate the
lifetime distribution at the design stress using models which relate the lifetime
to the stress.
Many models for constant-over-time stresses are known. An important tool
for generalization of such models to the case of time-varying stresses is the physi-
cal principle in reliability formulated by Sedyakin (1966) for simple step-stresses
and generalized by Bagdonavicius (1978) for general time-varying stresses.
Some of the well-known accelerated life models [see, for example, Bagdon-
avicius et al. (2000)] for time-varying stresses as, for example, the accelerated
failure time (AFT) model, verify this principle, some do not. An example is the
case of the proportional hazards (PH) model when the failure time distribution
is not exponential under constant stresses. In this paper a goodness-of-fit test
is given for the generalized Sedyakin's (GS) model when the data are obtained
from accelerated experiments with step-stresses.
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
282 V. BagdonaviCius and M. S. Nikulin
$$x(\tau) = \begin{cases} x_1, & 0 \le \tau < t_1, \\ x_2, & \tau \ge t_1, \end{cases}$$
and the second always under the constant stress $x_2$, then for all $s > 0$
(21.1)
Equivalently, the model can be written in the form $\alpha_{x(\cdot)}(t) = g_1\bigl(x(t), A_{x(\cdot)}(t)\bigr)$, where $g_1(x, s) = g(x, \exp\{-s\})$. On sets $E_1$ of constant stresses the equality (21.1) always holds, as is seen from the following proposition.

Proposition 21.2.1 If the hazard rates $\alpha_x(t) > 0$, $t > 0$, exist on a set of constant stresses $E_1$, then the GS model holds on $E_1$.
(21.2)
where $x_t$ is the constant stress equal to the value of the time-varying stress $x(\cdot)$ at the moment $t$.
•
Restrictions of the GS model, in which not only the rule (21.2) but also some relations between survival under different constant stresses are assumed, can be considered. These models, narrower than the GS model, can be formulated by combining models for constant stresses with the rule (21.2).
Let us consider the meaning of the rule (21.2) for step-stresses of the form:
(21.4)
(21.5)
The accumulated hazard rate verifies the integral equation (we denote by $g$ the function $g_1$ in what follows)
(21.6)
and
So for all $t \ge t_1$ the functions $A_{x(\cdot)}(t)$ and $A_{x_2}(t - t_1 + t_1^*)$ satisfy the integral equation
$$h(t) = a + \int_{t_1}^{t} g\bigl(x_2, h(u)\bigr)\, du$$
with the initial condition $h(t_1) = a$. The solution of this equation is unique; therefore we have
Let us consider a set Em of more general stepwise stresses of the form (21.3).
Set to = O. Using Proposition 21.2.3 and by recurrence the following proposition
can be shown.
In the literature on ALT [see Nelson (1990)] the model (21.7) is also called the basic cumulative exposure model.

N. M. Sedyakin called his model the physical principle in reliability, meaning that this model is very wide. Nevertheless, this model and its generalization may not be appropriate in situations of periodic and rapid changes of stress level, or when switching the stress from one level to another can itself cause failures or shorten the lifetime.
Let us consider an example which shows how the GS model can be used to generalize models for constant stresses to the case of time-varying stresses. Suppose that under different constant stresses the survival functions differ only in scale: for any $x \in E_1$
Let us generalize the model (21.9) to the case of time-varying stresses by sup-
posing that the GS model also holds, i.e. the hazard rates under time-varying
stresses are obtained from the hazard rates under constant stresses by the rule
(21.2).
Let us find the expression of the survival function under time-varying stresses
for the model (21.10).
Proposition 21.2.6 The model (21.10) holds on a set of stresses $E$ iff there exists a survival function $G$ such that for all $x(\cdot) \in E$
(21.11)
The model (21.13) is not natural when items age under the usual constant stress. Indeed, denote by $x_t$ the constant-in-time stress equal to the value of the time-varying stress $x(\cdot)$ at the moment $t$. Then the PH model implies that

For any $t$ the intensity under the time-varying stress $x(\cdot)$ at the moment $t$ does not depend on the values of the stress $x(\cdot)$ before the moment $t$, but only on the value of the stress at this moment. This is not natural when the hazard rates are not constant under constant stresses, i.e. when the times-to-failure under constant stresses are not exponential. When is the PH model also the GS model? The answer is given in the following proposition.
Here $\delta_i$ is the probability for an item not to fail because of the switch-up of stress at the moment $t_i$. In this case the GS model for step-stresses can be modified as follows:
(21.14)
where
(21.15)
$$\hat A^{(1)}(t) = \int_0^t \frac{dN(v)}{Y(v)}.$$
The second is suggested by the GS model [formulae (21.7) and (21.8)] and is
obtained from the experiments under constant stresses:
where
$$\hat t_0^* = 0, \quad \hat t_1^* = \hat A_2^{-1}\bigl(\hat A_1(t_1)\bigr),\ \dots,\ \hat t_{i+1}^* = \hat A_{i+2}^{-1}\bigl(\hat A_{i+1}(t_{i+1} - t_i + \hat t_i^*)\bigr), \quad i = 0, \dots, m-2, \tag{21.16}$$
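Numerically, the recursion $\hat t_{i+1}^* = \hat A_{i+2}^{-1}(\hat A_{i+1}(t_{i+1} - t_i + \hat t_i^*))$ only requires the (estimated) cumulative hazards under the constant stresses and a routine inverting an increasing function. A minimal sketch, assuming the $\hat A_i$ are supplied as increasing callables (function names and the bisection bracket are our own choices):

```python
def invert_increasing(f, y, lo=0.0, hi=1e3, tol=1e-10):
    """Numerically invert an increasing function f at the value y by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def sedyakin_times(A, switch_times):
    """Recursive moments t*_1, ..., t*_{m-1} for a step-stress with switch
    times t_1 < ... < t_{m-1}.  `A` is a list of m cumulative-hazard
    functions A_1, ..., A_m (estimated or known, passed as callables);
    the recursion is t*_0 = 0 and
    t*_{i+1} = A_{i+2}^{-1}( A_{i+1}( t_{i+1} - t_i + t*_i ) )."""
    t = [0.0] + list(switch_times)      # t_0 = 0, t_1, ..., t_{m-1}
    t_star = [0.0]                      # t*_0 = 0
    for i in range(len(switch_times)):  # i = 0, ..., m-2
        y = A[i](t[i + 1] - t[i] + t_star[i])
        t_star.append(invert_increasing(A[i + 1], y))
    return t_star[1:]
```

For exponential lifetimes ($A_i(u) = a_i u$) the recursion reduces to $t_{i+1}^* = (a_{i+1}/a_{i+2})(t_{i+1} - t_i + t_i^*)$, which the sketch reproduces.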
(a) The hazard rates ai are positive and continuous on (0, DO);
$$\operatorname{Cov}\bigl(U(s_1), U(s_2)\bigr) = \frac{1}{l_0}\,\frac{1 - S(s_1 \wedge s_2)}{S(s_1 \wedge s_2)} := \sigma^2(s_1 \wedge s_2),$$
with $S_i = \exp\{-A_i\}$, $S = \exp\{-A\}$.
PROOF. Under Assumptions A, for any $t \in (0, t_m)$ the estimators $\hat A_i$ and $\hat A^{(1)}$ are uniformly consistent on $[0, t]$, and
(21.18)
on $D[0, t]$, the space of càdlàg functions on $[0, t]$ with the Skorokhod metric.
We prove (21.17) by recurrence. If i = 1 then
(21.20)
(21.21 )
$$\sqrt{n}\,(\hat t_{j+1}^* - t_{j+1}^*) = \sqrt{n}\,\bigl\{\hat A_{j+2}^{-1}\bigl(\hat A_{j+1}(\hat t_j^* + \Delta t_j)\bigr) - A_{j+2}^{-1}\bigl(A_{j+1}(t_j^* + \Delta t_j)\bigr)\bigr\}$$
$$= a_{j+1}\bigl\{U_{j+1}(t_j^* + \Delta t_j) - U_{j+2}(t_{j+1}^*)\bigr\} + a_{j+1}\,e_j\,\sqrt{n}\,(\hat t_j^* - t_j^*) + \Delta_n,$$
where $\Delta_n \xrightarrow{P} 0$ as $n \to \infty$. The last formula and the recurrence assumption imply that
$$\sqrt{n}\,(\hat t_{j+1}^* - t_{j+1}^*) \xrightarrow{\mathcal{D}} a_{j+1}\bigl\{U_{j+1}(t_j^* + \Delta t_j) - U_{j+2}(t_{j+1}^*)\bigr\}$$
Let us consider the limit distribution of the statistic $T_n$. Note that
$$\frac{K(v)}{\sqrt{n}} \xrightarrow{P} k(v) = \frac{l_0\, l_{i+1}}{l_0 + l_{i+1}}\, S(v)\, g\bigl((l_0 + l_{i+1})\,S(v)\bigr), \qquad v \in [t_i, t_{i+1}).$$
Set $e_j = \alpha_{j+1}(t_j^* + \Delta t_j)/\alpha_{j+1}(t_j^*)$.
(21.23)
(21.24)
$$= \sum_{i=0}^{m-2} f_{i+1}\, U_{i+1}(t_i^* + \Delta t_i) - \sum_{i=1}^{m-1} f_i\, U_{i+1}(t_i^*) + o_p(1). \tag{21.25}$$
The formulae (21.24) and (21.23) imply the result of the proposition. •
Corollary 21.5.1 Under the assumptions of the theorem, $T_n \xrightarrow{\mathcal{D}} N(0, \sigma_T^2)$, where
(21.26)

where
$$f_0 = f_m = 0, \qquad f_i = 2\sum_{s=i}^{m-1} e_s\, d_{si}, \quad i = 1, \dots, m-1,$$
$$d_{si} = \prod_{l=i}^{s-1} e_l, \quad i = 1, \dots, s-1, \qquad d_{ss} = 1,$$
$$e_l = \frac{\alpha_{l+1}(t_l^* + \Delta t_l)}{\alpha_{l+1}(t_l^*)}.$$
Under $H^*$:
$$\hat A^{(1)}(v) \xrightarrow{P} A^{(1)}_*(v) = A_{i+1}(v) - A_{i+1}(t_i) + \mathbf{1}\{i > 0\}\sum_{l=1}^{i}\bigl\{A_l(t_l) - A_l(t_{l-1})\bigr\},$$
$$\hat A^{(2)}(v) \xrightarrow{P} A^{(2)}_*(v) = A_i(v - t_i + t_i^*), \qquad v \in [t_i, t_{i+1}),\quad i = 0, \dots, m-1,$$
$$\frac{1}{\sqrt{n}}\,K(v) \xrightarrow{P} k^*(v), \qquad v \in [t_i, t_{i+1}),$$
where
$$k^*(v) = \frac{l_0\, l_{i+1}\, S^{(1)}_*(v)\, S_i(v - t_i + t_i^*)}{l_0\, S^{(1)}_*(v) + l_{i+1}\, S_i(v - t_i + t_i^*)}\; g\bigl(l_0\, S^{(1)}_*(v) + l_{i+1}\, S_i(v - t_i + t_i^*)\bigr), \tag{21.27}$$
$$T_n = \int_0^\infty K(v)\, d\bigl\{\hat A^{(1)}(v) - A^{(1)}_*(v)\bigr\} - \int_0^\infty K(v)\, d\bigl\{\hat A^{(2)}(v) - A^{(2)}_*(v)\bigr\} + \int_0^\infty K(v)\, d\bigl\{A^{(1)}_*(v) - A^{(2)}_*(v)\bigr\} = T_{1n} + T_{2n} + T_{3n}. \tag{21.28}$$
Under H*
$$T_{1n} + T_{2n} \xrightarrow{\mathcal{D}} N(0, \sigma_T^{*2}),$$
where $\sigma_T^{*2}$ has the same form as (21.26), with the only difference that $k$ is replaced by $k^*$ and $\sigma^2(t)$ is replaced by
$$\bigl(\sigma^{(1)}\bigr)^2(t) = \frac{1}{l_0}\left(\frac{1}{S^{(1)}_*(t)} - 1\right).$$
Under H* we have
(21.29)
and
$$\frac{T_{1n} + T_{2n}}{\hat\sigma_T^*} \xrightarrow{\mathcal{D}} N(0, 1). \tag{21.30}$$
The third member in (21.28) can be written in the form
$$T_{3n} = \sum_{i=1}^{m-1} \int_{t_i}^{t_{i+1}} K(v)\,\bigl\{\alpha_{i+1}(v) - \alpha_{i+1}(v - t_i + t_i^*)\bigr\}\, dv. \tag{21.31}$$
•
Remark 21.7.1 If the $\alpha_i$ are increasing (decreasing), then the test is consistent against $H^*$.

PROOF. The inequalities $x_1 < \cdots < x_m$ imply that $t_i > t_i^*$ for all $i$. If the $\alpha_i$ are increasing (decreasing), then $\Delta^* > 0$ ($\Delta^* < 0$) under $H^*$. Proposition 21.7.1 implies the consistency of the test.
Consider the approaching alternatives
$$H_n^*:\ \text{the PH model with } \alpha_i(t) = \alpha(t)\,\theta_i^{1/\sqrt{n}}.$$
Under $H_n^*$,
$$T_{3n} \xrightarrow{P} \mu = -\sum_{i=1}^{m-1} \int_{t_i}^{t_{i+1}} k^*(v)\, \ln\Bigl(1 + \frac{t_i^* - t_i}{2}\, v\Bigr)\, dv > 0,$$
and
$$\frac{T_n}{\hat\sigma_T} \xrightarrow{\mathcal{D}} N(a, 1), \qquad \Bigl(\frac{T_n}{\hat\sigma_T}\Bigr)^2 \xrightarrow{\mathcal{D}} \chi^2(1, a),$$
where $a = \mu/\sigma_T$ and $\chi^2(1, a)$ denotes the chi-square distribution with one degree of freedom and non-centrality parameter $a$ (or a random variable having such a distribution).
The power function of the test is approximated by the function (21.32).

Let us find the power of the test against the following alternatives:

$H^{**}$: the model (21.14) with specified time-to-failure distributions under constant stresses.
Under $H^{**}$,
where $k^{**}(v)$ has the same form as $k^*(v)$, with the only difference that $S^{(1)}_*$ is replaced by $S^{(1)}_{**}(v) = \exp\{-A^{(1)}_{**}(v)\}$. Convergence is uniform on $[0, t_m]$.
$$\Delta^{**} = \int_0^\infty k^{**}(v)\, d\bigl\{A^{(1)}_{**}(v) - A^{(2)}_*(v)\bigr\} \neq 0.$$
Remark 21.7.2 If the $\alpha_i$ are increasing (decreasing), then the test is consistent against $H^{**}$.

$H_n^{**}$: the model (21.14) with specified time-to-failure distributions under constant stresses and $\delta_i = 1 - \dfrac{c_i}{\sqrt{n}}$.

The parameter $\mu$ is positive (negative) if the functions $\alpha_i$ are convex (concave).

The power function of the test is approximated by the function (21.32) with $a = \mu/\sigma_T^{**}$.
Niels Keiding
University of Copenhagen, Copenhagen, Denmark
22.1 Introduction
Survival analysis has contributed a number of specialised approaches to the
general methodology of goodness-of-fit; for recent textbook surveys see e.g.
Andersen et al. (1993, Section VII.3), Hill et al. (1996, Chapter 7), Klein and
Moeschberger (1997, Chapter 11), Hosmer and Lemeshow (1999, Chapter 6) or
Therneau and Grambsch (2000, Chapter 6).
Beyond asserting the role of the specific approaches to goodness-of-fit in
survival analysis and more general event history analysis, the purpose of this
presentation is to report on two recent examples from my own experience, where
the classical stratification approach to graphically assessing proportionality of
hazards was used in nonstandard contexts.
302 N. Keiding
$$\lambda(t \mid z) = \lambda_0(t)\, e^{\beta z}$$
for the hazard $\lambda(t \mid z)$ at time $t$ for given covariates $z$. Here $\lambda_0(t)$ is a freely varying "underlying" hazard, which together with the log-linear dependence on covariates makes the model semi-parametric.
The proportionality assumption is restrictive, but nevertheless often taken for granted. The most common specific graphical checks and numerical tests focus attention on one covariate component at a time, i.e. ask whether proportionality holds "with respect to $w$" in the model
Cox (1972a) pointed out that his model and the estimation techniques allowed time-dependent covariates $z(t)$; he was there primarily motivated by the wish to test numerically for proportional hazards. As an example in our situation, this could be done by defining an additional covariate $w_1(t) = w \cdot \log t$ and testing the hypothesis $\delta = 0$ in the extended model
$$\lambda(t \mid z, w) = \lambda_w(t)\, e^{\beta z}$$
and check (usually graphically) proportionality of $\lambda_0(t)$ and $\lambda_1(t)$. Note that $\beta$ does not depend on $w$. Under proportionality $\lambda_1(t) = e^{\gamma} \lambda_0(t)$, i.e. $\Lambda_1(t) = e^{\gamma} \Lambda_0(t)$ and $\log \Lambda_1(t) = \gamma + \log \Lambda_0(t)$, with $\Lambda_w(t) = \int_0^t \lambda_w(u)\, du$.

Thus, the curves $(t, \log \Lambda_w(t))$, $w = 0, 1$, are parallel, and $(\Lambda_0(t), \Lambda_1(t))$ is the line through $(0, 0)$ with slope $e^{\gamma}$. The two most common plots are therefore $(t, \log \hat\Lambda_w(t))$ and $(\hat\Lambda_0(t), \hat\Lambda_1(t))$; see Andersen et al. (1993, Section VII.3).
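The stratified check can be sketched numerically: estimate the cumulative hazard separately in each stratum (here with the Nelson–Aalen estimator; function and argument names are our own) and inspect whether the log cumulative hazard curves are roughly parallel:

```python
import numpy as np

def nelson_aalen(time, status):
    """Nelson-Aalen estimate of the cumulative hazard from right-censored
    data: `time` are follow-up times, `status` is 1 for an observed event
    and 0 for censoring.  Returns the event times and the estimated
    cumulative hazard at those times."""
    time = np.asarray(time, dtype=float)
    status = np.asarray(status, dtype=int)
    order = np.argsort(time)
    time, status = time[order], status[order]
    at_risk = len(time) - np.arange(len(time))   # risk-set size Y(t)
    cumhaz = np.cumsum(status / at_risk)         # sum of dN(t) / Y(t)
    return time[status == 1], cumhaz[status == 1]

# Under proportional hazards the curves (t, log cumhaz) computed separately
# for the strata w = 0 and w = 1 should be roughly parallel, with vertical
# distance gamma = log of the hazard ratio.
t1, H1 = nelson_aalen([2, 3, 5, 7, 11], [1, 1, 0, 1, 1])
log_H1 = np.log(H1)
```

This is only a sketch of the stratified graphical check, not the authors' software; dedicated survival-analysis packages provide the same estimator with proper tie handling.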
We shall here give two non-standard examples of this basic stratification
approach to testing proportional hazards.
Assessing Proportionality of Hazards 303
Under the assumed model the ratio between the logarithms of one minus
the hazard for persons i and j is
1. Times $x_i$ from one renewal to the next, contributing the density $f(x_i)$ to the likelihood.
[Figure: Kaplan–Meier estimate with upper and lower confidence limits, plotted against time 0–38.]
In practice the regression coefficients $\beta_{hj}$ and the underlying intensities $\alpha_{0hj}(t)$ after claim $j$ are assumed identical for $j = 2, 3, \dots$. A good evaluation of the fit of the Cox model can be based on first assessing identity of regression coefficients ($\beta_{h1} = \beta_{h2}$) and then, refitting in a so-called stratified Cox regression model with identical $\beta_{hj}$ but freely varying $\alpha_{0hj}(t)$ over $j$, comparing the underlying intensities ($\alpha_{0h1}(t) = \alpha_{0h2}(t)$) after first and after later claims. For the first hypothesis a standard log partial likelihood ratio test may be performed; for the second, Keiding, Andersen, and Fledelius (1998) documented a series of graphical checks as surveyed by Andersen et al. (1993, Section VII.3). Further development of this goodness-of-fit approach might follow the lines of Andersen et al. (1983).
Based on generalizations of the standard graphs mentioned above, Keiding, Andersen, and Fledelius (1998) concluded that property and auto claims could not be described by the postulated modulated renewal processes, while the model for household claims was not rejected by this approach.
References
1. Aalen, O. O. and Husebye, E. (1991). Statistical analysis of repeated
events forming renewal processes, Statistics in Medicine, 10, 1227-1240.
3. Andersen, P. K., Borgan, Ø., Gill, R. D., and Keiding, N. (1993). Statistical Models Based on Counting Processes, New York: Springer-Verlag.
10. Gill, R. D. (1983). Discussion of the papers by Helland and Kurtz, Bulletin of the International Statistical Institute, 50, 239-243.
12. Hill, C., Com-Nougué, C., Kramar, A., Moreau, T., O'Quigley, J., Senoussi, R., and Chastang, C. (1996). Analyse statistique des données de survie, 2e éd., Paris: Flammarion.
17. Keiding, N., Andersen, C., and Fledelius, P. (1998). The Cox regression
model for claims data in non-life insurance, ASTIN Bulletin, 28, 95-118.
18. Keiding, N. and Gill, R. D. (1990). Random truncation models and
Markov processes, The Annals of Statistics, 18, 582-602.
23.1 Introduction
Andrews plots [Andrews (1972)] as a tool for the graphical representation of
multivariate data have recently gained considerable popularity due to the sim-
plicity of their plotting and many desirable and attractive mathematical prop-
erties. Khattree and Naik (2001) provide a review of Andrews and other related
plots along with their merits and demerits.
For a $p$-dimensional multivariate observation $y = (y_1, \dots, y_p)'$, the Andrews curve in argument $t$ is defined by
312 R. Khattree and D. N. Naik
which has all the properties of the Andrews plots yet avoids some of their shortcomings. Specifically, the odd-numbered terms in the Andrews plot given in (23.1) simultaneously vanish at $t = 0$. This is a major disadvantage, since human eyes tend to concentrate on and notice changes or similarities more easily and more immediately in the central region of the graph (that is, around $t = 0$); yet, for the Andrews plot around $t = 0$, these similarities or dissimilarities are mostly due to the even-numbered variables $y_2, y_4, \dots$. This is not so for the modified Andrews plots given by (23.2). It can be seen from the fact that $\frac{1}{\sqrt{2}}[\sin(kt) \pm \cos(kt)] = \sin\bigl(kt \pm \frac{\pi}{4}\bigr) = \sin\bigl(k\bigl(t \pm \frac{\pi}{4k}\bigr)\bigr)$, which has the phase angle $\pm\frac{\pi}{4k}$. Thus, no two terms in (23.2) have a common phase angle, and hence they will not vanish at the same point. Therefore, for any $t$, the value of $f_y(t)$ depends on at least $(p - 1)$ of the $y_i$ values.
The objective of this article is to use the Andrews plots to graphically rep-
resent the associations in the contingency tables. Often such associations are
depicted through correspondence analysis, where the rows and columns of the contingency table are represented by points in a two- or three-dimensional scatter plot. As mentioned earlier, this may not be completely satisfactory due
to loss of information caused by the dimensionality reduction. Andrews and
modified Andrews plots are clearly the ideal alternatives to avoid such loss. We
describe the approach in the next section.
Modified Andrews Plots 313
$$r = Q\mathbf{1} \quad \text{and} \quad c = Q'\mathbf{1}, \qquad D_r = \operatorname{diag}(r), \quad D_c = \operatorname{diag}(c),$$
and let
$$F = D_r^{-1}(Q - rc')\, D_c^{-1}(Q - rc')'.$$
Then $n\,\operatorname{tr}(F)$ is Pearson's chi-squared test statistic for testing independence between $A$ and $B$. The quantity $\eta = \operatorname{tr}(F)$ is referred to as the total inertia. It is just a multiple of Pearson's chi-squared and can be interpreted as a measure of the magnitude of the total row (or column) squared deviations. This interpretation avails us an opportunity to attempt to partition $\eta$ into several components.
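A short numerical sketch of these quantities (names follow the text; the raw table `N` is a hypothetical input, with $Q$ its table of relative frequencies):

```python
import numpy as np

def total_inertia(N):
    """Total inertia eta = tr(F) of a two-way contingency table N, where
    Q is the table of relative frequencies, r = Q 1 and c = Q' 1 are the
    row and column margins, and F = D_r^{-1}(Q - rc') D_c^{-1}(Q - rc')'.
    n * eta is Pearson's chi-squared statistic for independence."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    Q = N / n
    r = Q.sum(axis=1)
    c = Q.sum(axis=0)
    E = np.outer(r, c)                          # r c' (independence model)
    F = ((Q - E) / r[:, None]) @ ((Q - E) / c).T
    return np.trace(F)
```

For a perfectly associated 2x2 table the inertia equals 1 (so the chi-squared statistic equals $n$), and it is 0 under exact independence.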
Normal (1) 13 8 1 0 0
Some (2) 6 43 19 4 5
One (3) 1 9 155 54 24
Two (4) 0 2 18 162 68
Three (5) 0 0 11 27 240
The row and column points y = (Yl, Y2, Y3, Y4)' for the data are obtained by
performing the generalized singular value decomposition and these are reported
in Table 23.2. Using these, we obtain the functions fy(t) and gy(t) for each of
the five rows and columns. These plots are given in Figures 23.1 and 23.2. In
the case of perfect agreement between the clinical site and quality control site
readings, all row points should be the same as the corresponding column points,
and thus the corresponding Andrews curves should be identical also. As seen
in the plots, the agreement between the two readings appears to be quite good
for each of the five classification groups and the (modified) Andrews curves of
the corresponding points follow each other very closely in the case of $f_y(t)$ as well as $g_y(t)$.
The same argument can be made for the plots in Figures 23.3, 23.4 and 23.5, where the components of the vector $y$ have been permuted so that different $y_i$ are assigned to the constant coefficient $\frac{1}{\sqrt{2}}$ in the expression of $g_y(t)$ (similar plots corresponding to $f_y(t)$ are not shown here). It is not surprising, since in the present case none of the four dimensions can be deemed redundant, in that each explains a significant percent of the total inertia. Specifically, these percentages are 43.80, 28.59, 16.31 and 11.30%. However, in many cases the major portion (up to 85 or 90%) of the total inertia can be attributed to the first two dimensions. Whenever this does not happen, as in the present case, the use of only the first two dimensions to obtain a plot of points will not be very effective compared to the use of Andrews plots, which utilize all dimensions.
Points                  Coordinates
                    y1         y2         y3         y4
Row (QC Site)
  Normal       3.14513    1.72789    0.91817    1.52739
  Some         1.67179    0.25096   -0.32230   -0.97903
  One          0.24290   -0.84605   -0.32512    0.27160
  Two         -0.34541   -0.16330    0.72270   -0.15729
  Three       -0.61364    0.68014   -0.34912    0.05434
Example 23.3.2 Data in Table 23.3 are taken from Srole et al. (1978) and
are reported in Agresti (1990) as well. The objective of the study was to
examine the relationship, if any, between the mental impairment and parents'
socioeconomic status. Six levels of socioeconomic status, 1 (high) to 6 (low)
and four levels of mental health status, A (well), B (mild symptom formation),
C (moderate symptom formation) and D (impairment) are taken. Data are
obtained on a sample of 1660 residents of Manhattan. It may be pointed out
that both variables are ordinal in nature.
As per the structure of the data for plotting, we have six row points and
four column points in the three-dimensional space. However, the first dimension
alone is able to explain 93.95% of the total inertia, and the first two dimensions
combined explain almost 99% of that. Thus, the third dimension contains no
significantly useful information for all practical purposes. The ten row and
column points are listed in Table 23.4. The modified Andrews plots are given
in Figures 23.6 and 23.7. The following observations are made from Figure 23.6.
(a) The two variables are ordered categorical in nature and this fact clearly
shows up in Figure 23.6. Also, it can be observed that there is possibly
a further subgrouping within the socioeconomic status. Status 1 and 2
are similar to each other; 3 and 4 are similar and to some extent, there
[Figures: modified Andrews curves of the row and column points; the lower panel is labelled "Coeff. of y1 is 1/sqrt(2)".]
(b) With respect to mental health status, the health status B (mild symptom
formation) and C (moderate symptom formation) appear to be similar,
but are clearly different from A (well) and D (impaired). Further, the
ordinal nature of each health status is self evident in these plots.
(c) There appears to be positive association between the mental health status
of children and parents' socioeconomic status, with mental health of chil-
dren to be generally better at the higher levels of parents' socioeconomic
status. Further, the upper two levels of parents' socioeconomic status
groups generally correspond to a "well" mental health status; the next
[Figures: modified Andrews curves; one panel is labelled "Coeff. of y3 is 1/sqrt(2)".]
The plot in Figure 23.7 corresponds to the case when $y_2$ has been assigned to the constant coefficient $\frac{1}{\sqrt{2}}$ in the function $g_y(t)$. This plot reveals a few additional interesting observations.
(a) With respect to parents' socioeconomic status, curves corresponding to
groups 1 and 2 show the patterns just opposite to those corresponding to
groups 5 and 6.
(b) With respect to mental health status, groups A (well) and D (impaired)
follow the patterns opposite to each other.
(c) There is, possibly, very little difference between groups 3 and 4 of parents' socioeconomic status, and between groups B and C with respect to mental health status. Conceivably, in each case the corresponding two groups can be combined. If this is done, then there is a positive association between the combined groups (3, 4) and (B, C). Plots for this situation are not shown.
Parents' socioeconomic status     A      B      C      D
1 (high)                         64     94     58     46
2                                57     94     54     40
3                                57    105     65     60
4                                72    141     77     94
5                                36     97     54     78
6 (low)                          21     71     54     71
It may be emphasized that so much information could not have been ex-
tracted from these data if traditional two-dimensional scatter plots of corre-
spondence analysis were constructed or if the usual Pearson's chi-squared test
for independence was applied. Further, we note that the suggested modifica-
tion of Andrews plots provides curves which are easier to interpret and are more
informative.
Points                       Coordinates
                        y1           y2           y3
(Socioeconomic)  1   0.180932     0.019248     0.027525
                 2   0.184996     0.011625    -0.027386
                 3   0.059031     0.022197    -0.010575
                 4  -0.008887    -0.042080     0.011025
                 5  -0.165392    -0.043606    -0.010368
                 6  -0.287690     0.061994     0.004824
Define
$$H = \begin{pmatrix} h_1' \\ h_2' \\ \vdots \\ h_a' \end{pmatrix},$$
where the vector $h_i$ is defined as
[Figure: modified Andrews curves for the mental health status groups; the curve for group A is labelled "Well".]
and as stated earlier, it depends only on the ith and jth profiles.
The canonical coordinates for representing row profiles are determined from the singular value decomposition of the $a \times b$ matrix $D_r^{\frac{1}{2}}(H - \mathbf{1}\eta')$, where Rao suggests the choice of the vector $\eta$ as
$$\eta = \Bigl(\frac{n_{\cdot 1}}{n}, \frac{n_{\cdot 2}}{n}, \dots, \frac{n_{\cdot b}}{n}\Bigr)'$$
or
$$\eta = H'r.$$
It must be noted that the dimension of the space $m$, in this case, is the rank of $D_r^{\frac{1}{2}}(H - \mathbf{1}\eta')$ and is not necessarily equal to $\min(a, b) - 1$. If the singular value decomposition of $D_r^{\frac{1}{2}}(H - \mathbf{1}\eta')$ is
$$D_r^{\frac{1}{2}}(H - \mathbf{1}\eta') = \lambda_1 u_1 v_1' + \lambda_2 u_2 v_2' + \cdots + \lambda_m u_m v_m',$$
then the canonical coordinates for row profiles are given by
$$\lambda_1 D_r^{-\frac{1}{2}} u_1,\ \lambda_2 D_r^{-\frac{1}{2}} u_2,\ \dots,\ \lambda_m D_r^{-\frac{1}{2}} u_m,$$
where
$$a_c = \Bigl[\operatorname{diag}\bigl[(H - \mathbf{1}\eta')'(H - \mathbf{1}\eta')\bigr]\Bigr]^{\frac{1}{2}}.$$
These can be plotted in the same plot. If one wishes to have the canonical
coordinates for column profiles and standard coordinates for row profiles, above
formulas can be suitably modified.
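The SVD construction translates directly into a few lines of code. This is only a sketch under our own reading of the notation: we assume $H$ holds the row profiles $n_{ij}/n_{i\cdot}$, $\eta$ is the column-margin profile $n_{\cdot j}/n$, and $D_r$ carries the row masses:

```python
import numpy as np

def rao_row_coordinates(N):
    """Canonical coordinates for the row profiles of a contingency table N
    via the SVD of D_r^{1/2} (H - 1 eta').  The k-th coordinate vector is
    lambda_k D_r^{-1/2} u_k; the number of dimensions is the rank of the
    decomposed matrix (not necessarily min(a, b) - 1)."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    r = N.sum(axis=1) / n                     # row masses
    H = N / N.sum(axis=1, keepdims=True)      # row profiles
    eta = N.sum(axis=0) / n                   # Rao's suggested eta
    M = np.sqrt(r)[:, None] * (H - eta)       # D_r^{1/2} (H - 1 eta')
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    m = int(np.sum(s > 1e-12))                # rank of M
    return (U[:, :m] * s[:m]) / np.sqrt(r)[:, None]
```

For an exactly independent table all row profiles equal $\eta$, the decomposed matrix vanishes, and no coordinates remain; a perfectly associated table yields a single dimension separating the rows.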
We will illustrate the modified Andrews plots based on Rao's correspondence analysis using the drug efficacy data of Calimlin et al. (1982).
Example 23.4.1 The data are reported in Table 23.5. The objective of the study was to examine the ratings assigned by hospital patients to four drugs, in order to determine whether a particular drug or group of drugs is favored by the patients. The four drugs are named Z100, EC4, C60 and C15, and the five ratings on an ordinal scale run from poor to excellent. We compute here the canonical coordinates for the row profiles representing the various drugs and the standard coordinates for the column profiles representing the ratings. The coordinates are not shown here to save space. Also, we present only the modified Andrews plots, as Figures 23.8 and 23.9.
Figure 23.8 clearly illustrates the closeness of the modified Andrews curves corresponding to drugs C15 and C60, and that of the curves corresponding to EC4 and Z100. In the same plot one also observes the anticipated clustering of the efficacy ratings, namely, {Excellent, Very Good} forming one cluster and the remaining three ratings forming the other. The proximity of EC4 and Z100 to the two higher efficacy ratings, and of C15 and C60 to the three lowest ratings, is also noted. This indicates that EC4 and Z100 are perceived as the superior choices by patients.
Figure 23.9 represents modified Andrews plot when the roles of Yl and Y2
(that is, the scores corresponding to the first and second dimensions) in (23.2)
have been interchanged. This obviously changes the curves. However, in this
plot the similarities and dissimilarities between various drugs and ratings and
associations between the particular drugs and ratings are depicted not by the
relative closeness of corresponding curves but by their particular patterns (in
terms of ups and downs). The curves corresponding to EC4 and Z100 have
23.5 Conclusions
Andrews plots as a powerful graphical technique for multivariate data are not
only useful in traditional clustering or outlier detection problems, but they can
also be very important tools in analyzing experimental data in other discrete
or descriptive multivariate analysis problems, such as contingency tables and
correspondence analysis. The fact that there is no loss of dimension, and hence
no approximations before the graphical displays of the data, allows these plots to
he much more informative than traditional scatter plots or biplots. Further, the
modified Andrews plots are even more informative and useful in the graphical
displays of these data.
References
1. Agresti, A. (1990). Categorical Data Analysis, New York: John Wiley &
Sons.
24.1 Introduction
Given an ordered sample $X = \{x_1 \le \cdots \le x_n\}$, where the $x_i$'s are iid observations of a r.v. $X$ with continuous cdf $F$, the classical statistics for testing the hypothesis $H_0: F = F_0$ are based on
and
(24.1)
where $F_n$ is the empirical cdf and $\Psi$ is a suitable function. Recently, some goodness-of-fit tests have been proposed by studying the Wasserstein distance between $F_n$ and $F_0$
328 C. M. Cuadras and D. Cuadras
[del Barrio, Cuesta-Albertos, and Matrán (2000)]. Another related test is based on the maximum correlation between the sample and $X$
where $\bar{x}, \mu$ are the means, $s^2, \sigma^2$ the variances, and
[Cuadras and Fortiana (1993)]. These statistics can be justified as follows. Let $X, Y$ be two r.v.'s with cdf's $F, G$ and finite means and variances $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, respectively. The Wasserstein distance between $F, G$ is
$$W^2 = \inf_{H} E(X - Y)^2 = \int_0^1 \bigl[F^{-1}(u) - G^{-1}(u)\bigr]^2\, du,$$
$$\rho^+ = \sup_{H} \rho(X, Y) = \Bigl(\int_0^1 F^{-1}(u)\, G^{-1}(u)\, du - \mu_1\mu_2\Bigr)\Big/\,\sigma_1\sigma_2,$$
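The sample version of the Wasserstein distance can be approximated directly from the order statistics, using $F_n^{-1}(u) = x_{(i)}$ for $u \in ((i-1)/n,\, i/n]$. A sketch (the grid size and names are our own choices), where the hypothesized distribution enters only through its quantile function:

```python
import numpy as np

def wasserstein_sq(sample, quantile, grid=2000):
    """Approximate the squared Wasserstein distance between the empirical
    cdf of `sample` and a hypothesized cdf given through its quantile
    function F0^{-1}: int_0^1 (Fn^{-1}(u) - F0^{-1}(u))^2 du, evaluated
    by the midpoint rule on a uniform grid over (0, 1)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    u = (np.arange(grid) + 0.5) / grid                  # midpoints in (0, 1)
    fn_inv = x[np.minimum((u * n).astype(int), n - 1)]  # empirical quantiles
    return np.mean((fn_inv - quantile(u)) ** 2)
```

For instance, against the uniform quantile function $F_0^{-1}(u) = u$, the two-point sample $\{0.25, 0.75\}$ gives the exact value $1/48$, which the grid approximation reproduces to high accuracy.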
i.e., $(\lambda_j, \psi_j)$ satisfies $K\psi_j = \lambda_j\psi_j$. Cuadras and Fortiana (1995, 2000) proved that, defining
$$h_j(x) = \int_a^x \psi_j(s)\, ds,$$
where
$$\mu_0 = E(X), \qquad \mu_j = E(X_j).$$
The sequence $(X_j)$ can be obtained by the Karhunen–Loève expansion of the stochastic process $X = \{X_t,\ t \in [a, b]\}$, where $X_t$ for each $t \in [a, b]$ is the indicator of $[X > t] = X^{-1}(t, \infty)$, or by continuous scaling on the distance function $\delta(x, x') = |x - x'|^{1/2}$. It can be proved that
$$\sum_{j=1}^{\infty} \lambda_j = V,$$
where $X, X'$ are iid and $V = \frac{1}{2}E|X - X'|$ is the so-called geometric variability of $X$ with respect to the distance $\delta$. Moreover $\lambda_j = \operatorname{Var}(X_j)$, and each eigenvalue accounts for part of the geometric variability $V$, which is a dispersion measure for $X$. Thus
$$X_1 = h_1(X) = \int_a^b X_t\, \psi_1(t)\, dt,$$
$$\operatorname{tr}(K) = \sum_{j=1}^{\infty} \frac{1}{(j\pi)^2} = \frac{1}{6},$$
$$\operatorname{tr}(K) = \sum_{j=1}^{\infty} \frac{4}{\bigl((2j-1)\pi\bigr)^2} = \frac{1}{2},$$
$$\operatorname{tr}(K) = \sum_{j=1}^{\infty} \frac{1}{j(j+1)} = 1,$$
where $L_1(x) = \sqrt{3}(2x - 1)$, $L_2(x) = \sqrt{5}(6x^2 - 6x + 1), \dots$ are the shifted Legendre polynomials on $[0, 1]$.
3. Let us define the "maximum correlations" $r_1, r_2, \dots$ between the sample and the principal dimensions $X_1, X_2, \dots$ of a r.v. $X$ as the correlations obtained by considering the bivariate cdf $H_n(x, y) = \min\{F(x), F_n(y)\}$. Assuming $\mu_0 = 0$ and $\operatorname{var}(X) = 1$, and writing the last expansion in (24.3) as
$$X = \sum_{j=1}^{\infty} \rho_j Y_j,$$
where each $Y_j$ now has variance 1, it is clear that $\rho_j = \rho(X, X_j)$ is the correlation coefficient between $X$ and the principal dimension $X_j$. The agreement between $F_n$ and $F$ may then be seen by comparing $r_1, r_2, \dots$ to
Orthogonal Expansions 331
$\rho_1, \rho_2, \dots$. Taking the expectation $E(X \cdot X)$, where $X$ is the sample, with respect to $H_n$, we obtain the following relation between these correlations and the overall maximum correlation:
$$r_n^+ = \sum_{j=1}^{\infty} \rho_j r_j.$$
Thus $r_n^+$ has an expansion like $A_n^2$ and $W_n^2$, where $\rho_1, \rho_2, \dots$ are constant coefficients and $r_1, r_2, \dots$ are random but not independent.
where
$$A_i = (n - i)\log(n - i) + i\log(i),$$
$$B_i = (n - i + 1)\log(n - i + 1) + (i - 1)\log(i - 1),$$
with $0\log 0 = 0$. Thus the maximal Hoeffding correlation between the sample and the logistic variable $X$, with mean 0 and variance $\pi^2/3$, is given by
Thus, the first four correlations between the sample $X$ and the principal dimensions are:
$$r_1 = \sqrt{12}\,(m_1 - \bar{x}/2)/s,$$
$$r_2 = \sqrt{180}\,(m_2 - m_1 + \bar{x}/6)/s,$$
$$r_3 = \sqrt{28}\,(10m_3 - 15m_2 + 6m_1 - \bar{x}/2)/s,$$
$$r_4 = \sqrt{900}\,(7m_4 - 14m_3 + 9m_2 - 2m_1 + \bar{x}/10)/s,$$
where $\bar{x}, s$ are the mean and standard deviation of the sample. The expansion of $r_n^+$ is
$$r_n^+ = \frac{3}{\pi} \sum_{j\ \mathrm{odd}} \frac{\sqrt{3(2j+1)}}{j(j+1)}\, r_j.$$
D. Plotting the principal dimensions.

Figures 24.1–24.4 give the principal dimensions $h_i(X)$, $i = 1, \dots, 4$, where $X$ is standard logistic. By using the following approximation [Abramowitz and Stegun (1972)]:
$$y(p) = \sqrt{-2\log p} - \frac{2.515517 + 0.802853\sqrt{\log(1/p^2)} + 0.010328\log(1/p^2)}{1 + 1.432788\sqrt{\log(1/p^2)} + 0.189269\log(1/p^2) + 0.001308\bigl(\log(1/p^2)\bigr)^{3/2}},$$
where $p$ is $(0, 1)$ uniform, we generate the $N(0, 1)$ r.v. $Z = y(1 - p)$. Thus $Y = (\pi/\sqrt{3})\,Z$ is a normal r.v. with the same mean and variance as $X$ with standard logistic cdf $F$. Then we also plot $h_i(Y)$, $i = 1, \dots, 4$, where $Y$ is normal; see Figures 24.1–24.4.

These plots have been obtained as follows. We write $h_i(X)$ in terms of $F(X) = p$, and $h_i(Y)$ in terms of $F(Y)$, i.e., $F(G^{-1}(p))$, where $G$ is the cdf of $Z$. As described below, in Figures 24.5–24.8 we perform a similar plot, but replacing $X, Y$ by a logistic and a normal sample, respectively.
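The quantile approximation quoted above translates directly into code (a sketch; the rational-approximation constants are the ones printed in the formula):

```python
import math

def y_approx(p):
    """Rational approximation y(p) to the upper-tail standard normal
    quantile for p in (0, 1): y(p) is exceeded with probability about p.
    The absolute error of this classical approximation is below 4.5e-4."""
    t = math.sqrt(-2.0 * math.log(p))        # = sqrt(log(1/p^2))
    num = 2.515517 + 0.802853 * t + 0.010328 * t ** 2
    den = 1.0 + 1.432788 * t + 0.189269 * t ** 2 + 0.001308 * t ** 3
    return t - num / den

# Z = y_approx(1 - p) for uniform p is approximately N(0, 1);
# Y = (pi / sqrt(3)) * Z then has the mean and variance of the
# standard logistic distribution.
z = y_approx(0.025)   # roughly 1.96, the upper 2.5% normal point
```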
where:

1. The ten left and the ten right samples are generated as logistic and normal, respectively.

2. $r_L^+$, $r_N^+$ are the maximum correlations and $D_L$, $D_N$ are the Kolmogorov statistics obtained assuming a logistic and a normal distribution [Stephens
334 C. M. Cuadras and D. Cuadras
Thus, for the first logistic sample rt = 0.983 < rt = 0.985 and the fit to
the normal appears better. However, the sample correlations with the first four
dimensions are:
and the agreement with the theoretical correlations is quite good. For the first
normal sample with the same size we have r_L^+ = 0.976 < r_N^+ = 0.983.
Figures 24.5-24.8 give a plot of the first to the fourth dimensions. The
continuous line contains the theoretical values, the △-line is the first logistic
sample and the •-line corresponds to the first normal sample. The fit to the
theoretical line is better for the logistic sample, whereas the normal sample
trend is similar to its theoretical curve, see Figures 24.5-24.8. So both samples
can be identified correctly.
Some conclusions are:
1. r+ may not distinguish between logistic and normal when the sample is
logistic, but provides a correct distinction when the sample is normal.
4. The graphical decision may not be 100% conclusive due to the proximity
between the logistic and normal curves, but it may help the user when the
other tests are unable to make a clear distinction.
Figure 24.1: Plot of the theoretical principal dimensions hI (X), hI (Y), where
X, Y follow the logistic (solid line) and normal (dashed line) distribution re-
spectively
Figure 24.2: Plot of the theoretical principal dimensions h2(X), h2(Y), where
X, Y follow the logistic (solid line) and normal (dashed line) distribution re-
spectively
Figure 24.3: Plot of the theoretical principal dimensions h3(X), h3(Y), where
X, Y follow the logistic (solid line) and normal (dashed line) distribution re-
spectively
Figure 24.4: Plot of the theoretical principal dimensions h4(X), h4(Y), where
X, Y follow the logistic (solid line) and normal (dashed line) distribution re-
spectively
Figure 24.5: First logistic dimension: continuous line. Logistic sample: △-line.
Normal sample: •-line. Compare to Figure 24.1
Figure 24.6: Second logistic dimension: continuous line. Logistic sample: △-line.
Normal sample: •-line. Compare to Figure 24.2
Figure 24.7: Third logistic dimension: continuous line. Logistic sample: △-line.
Normal sample: •-line. Compare to Figure 24.3
Figure 24.8: Fourth logistic dimension: continuous line. Logistic sample: △-line.
Normal sample: •-line. Compare to Figure 24.4
References
1. Abramowitz, M. and Stegun, I. A. (1972). Handbook of Mathematical
Functions, New York: Dover Publications.
2. Cuadras, C. M. and Fortiana, J. (1993). Continuous metric scaling and
prediction, In Multivariate Analysis, Future Directions 2 (Eds., C. M.
Cuadras and C. R. Rao), pp. 47-66, Amsterdam: Elsevier Science Publishers
B. V. (North-Holland).
Denis Bosq
Universite Paris VI, Paris, France
25.1 Introduction
In this paper we study a large class of goodness-of-fit tests in a general
framework. This class contains the smooth test introduced by Neyman (1937) and
Pearson's χ²-test (1900).
The functional tests of fit (FTF) are based on the deviation of a density
estimator with respect to the true density. They have been considered by
several authors; we may quote Bickel and Rosenblatt (1973), Nadaraja (1976),
Henze (1997), Hart (1997), Gregory (1980), among others.
We now describe the class of tests that is studied in this paper: Let X₁, ...,
X_n be i.i.d. random variables with values in a measurable space (E, B). We
want to test

H₀ : X₁ has the distribution μ  (X₁ ∼ μ).
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
with
(25.6)
If B holds, then
(25.8)
where e₀ = 1, e₁, e₂, ... is a fixed orthonormal system in L²(μ) such that
m_n = max_{0≤j≤k_n} ||e_j||_∞ < ∞.
Concerning ν we suppose that the covariance matrix

Γ_{ν,n} = ( ∫ e_j e_ℓ dν − ∫ e_j dν ∫ e_ℓ dν )_{1≤j,ℓ≤k_n}   (25.10)

ν = f·μ + γ,

where 0 ≠ f ∈ L²(μ) and γ is orthogonal to μ.
Now as usual we will say that the test ||T_n|| > w_n is consistent if
Functional Tests of Fit 345
The following statement gives necessary and sufficient conditions for consis-
tency.
Theorem 25.3.1 If B holds, the FTF test is consistent if and only if

w_n / n → 0,   (25.11)

and

(w_n − Σ_{j≥1} λ_{jn}) / (Σ_{j≥1} λ²_{jn})^{1/2} → ∞.   (25.12)
Now it is possible to obtain bounds for α_n and β_n. In the next statement
K′_n = K_n − 1 and w_n = c√n.
α_n ≤ 2 exp( −n c² / (2a_n + (3/2) b_n c) )   (25.13)
with
and
b_n = ||K_n(X₁, ·)||_∞.
(2) If τ_n := ||∫ K_n(x, ·) dν(x)|| − c > 0, then
where
and
b′_n = ||K′_n(X₁, ·) − E_ν K′_n(X₁, ·)||_∞.
If K_n = Σ_{j=0}^k e_j ⊗ e_j, where k is fixed and ||e_j||_∞ < ∞, j = 1, ..., k,
Theorem 25.3.2 shows that the error probabilities α_n and β_n of the test tend
to zero at an exponential rate. More generally the same property holds if

K_n = Σ_{j=0}^∞ λ_j e_j ⊗ e_j

is fixed and bounded.
ν_n = (1 + h_n) · μ   (25.16)

a condition that implies contiguity of (ν_n) with respect to μ [see Roussas (1978)
or Van der Vaart (1998)].
Under adjacency we have the following asymptotic result.
Theorem 25.4.1 If (C1) and (C2) [resp. (C)] hold then
(25.18)
K_n(x, y) = 1 + Σ_{j=1}^{k_n} e_{jn}(x) e_{jn}(y);  x, y ∈ E, n ≥ 1, with (k_n) → ∞.   (25.19)
We then have

Theorem 25.4.2 If sup_{j,n} ||e_{jn}||_∞ < ∞, k_n → ∞ and k_n/n → 0, then
(25.20)
thus
(25.21)
Now for the kernel defined by (25.19) one may use (25.8) in Corollary 25.2.1.
Set w_n² = k_n + √(2k_n) N_α; then ||T_n|| > w_n has asymptotic level α and
Theorem 25.4.2 provides the asymptotic power
(25.22)
ν_{n,i} = (1 + g_{n,i}/√n) · μ,

g_{n,i} → g_i ≠ 0 weakly, 1 ≤ i ≤ k, with 1, g₁, ..., g_k orthogonal. This situation
leads to the choice

(25.24)

Concerning the λ_j's, they can be useful to measure a weight for each "part" of
H₁.
(2) If X_{1n} ∼ ν_n, n ≥ 1
This lemma is, in some sense, more general than those generally given since
it does not suppose that (c_n ||g_n||²) has a limit.
Here we only use Lemma 25.6.1 in the particular case where c_n → 1. It
is then easy to see that, if the N.P. test has asymptotic level α ∈ ]0,1[, its
asymptotic power is given by
Now if (C) holds one may use (25.21) and (25.27) to obtain asymptotic
efficiency of the optimal FTF test. We have
which shows that the FTF test has a good asymptotic behaviour.
with sup_j ||e_j||_∞ < ∞, |λ_j| ↓ 0 and Σ_j λ_j² < ∞. On the other hand we set (K′)

c_T(ν) = (Δ²(ν)/θ²)(1 + o(1)) as Δ(ν) → 0,   (25.31)

and

E_B(h) = ||h||² / ( 2 ∫ (1 + h) log(1 + h) dμ ).   (25.34)
Note that
where

â_jn = (1/n) Σ_{i=1}^n e_j(X_i),  j ≥ 1.
Concerning Z_{n1} one uses the Sazonov (1968) inequality, which gives the first
bound. The second bound is obtained by using the Chebyshev inequality at the
order 4. The details, which are rather intricate, appear in Bosq (1980).
Corollary 25.2.1 is an easy consequence of Theorem 25.2.1. Proof of Theo-
rem 25.2.2 is similar to that of Theorem 25.2.1. Theorem 25.3.1 is a consequence
of Theorems 25.2.1 and 25.2.2. Theorem 25.3.2 is easily established by using
exponential type inequalities in Hilbert space [see Pinelis-Sakhanenko (1985)
and Bosq (2000)].
Proofs of Theorems 25.4.1, 25.4.2 and Lemma 25.6.1 are given in Bosq
(1983) .
PROOF OF THEOREM 25.6.1. (Sketch) The FTF test is based on the statistic
U_n = ||(1/n) Σ_{i=1}^n K′(X_i, ·)||.
The strong law of large numbers in a Hilbert space entails
On the other hand the Sethuraman theorem [see Nikitin (1995, p. 23)] implies

(1/n) log P_μ(U_n > c) →_{n→∞} ℓ(c),  c > 0,   (25.37)

where

ℓ(c) = −(c²/(2θ²))(1 + o(1)) as c → 0   (25.38)

with

θ² = sup_{||x*||=1} Var x*[K′(X₁, ·)].
It is easy to see that
Now we are in a position to use the Bahadur theorem [see Nikitin (1995,
pp. 6-7)]. We obtain

c_T(ν) = (1/θ²) ||∫ K′(x, ·) dν(x)||² (1 + o(1)), hence (25.31).
25.8 Simulations
The simulations presented below have been performed by Izraelewitch et al.
(1988). The kernel has the form (25.15) where e₁, ..., e_k are Legendre
polynomials over [−1, +1].
Here m is the uniform distribution on [−1, +1]. The goal of these simulations
is a comparison between the power of the χ²-test and the power of the FTF test
based on the Legendre polynomials under various alternatives.
For each alternative the problem is transported over [-1, +1] by putting
[Power curves (in %) of the χ²-test and of the Legendre FTF test under the various alternatives; plots not reproduced here.]
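The FTF statistic used in these simulations can be sketched as follows: a Neyman-type smooth statistic built from Legendre polynomials normalized under the uniform law on [−1, +1]. The function name and the truncation k below are our choices, not the paper's; under H₀ (uniform on [−1, +1]) the statistic is asymptotically χ² with k degrees of freedom.

```python
import numpy as np

def legendre_smooth_stat(x, k=4):
    """T = sum_{j=1..k} ( n^{-1/2} * sum_i e_j(x_i) )^2, where
    e_j = sqrt(2j+1) * P_j is the Legendre polynomial P_j normalized to
    unit variance under the uniform distribution on [-1, 1]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = 0.0
    for j in range(1, k + 1):
        coef = np.zeros(j + 1)
        coef[j] = 1.0                     # selects P_j in the Legendre basis
        ej = np.sqrt(2 * j + 1) * np.polynomial.legendre.legval(x, coef)
        t += ej.sum() ** 2 / n
    return t
```

For a sample symmetric about 0, the odd-degree components vanish, which makes the statistic easy to check by hand on small examples.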
References
1. Bickel, P. J. and Rosenblatt, M. (1973). On some global measures of the
deviations of density function estimates, Annals of Statistics, 1, 1071-1095.
2. Bosq, D. (1980). Sur une classe de tests qui contient le test du χ², Publ.
ISUP, fasc. 1-2, pp. 1-16.
9. Izraelewitch, E., Lafitte, I., Lavault, Z., and Roubert, B. (1988). Le test
du χ² et le test de Legendre, Projet ISUP, Paris.
11. Neyman, J. (1937). Smooth test for goodness of fit, Skand. Aktuar., 20,
119-128.
26.1 Introduction
Let X₁, ..., X_n be i.i.d. observations from a real random variable X with density
f(·) on ℝ. We consider the problem of testing

H₀ : f(·) = (1/σ) f₀((· − μ)/σ)  against  H₁ : f(·) = (1/σ) f₁((· − μ)/σ),   (26.1)

where f₀ and f₁ are two densities with known form and (μ, σ) ∈ ℝ × ℝ₊ are
the location-scale parameters. If (μ, σ) are given under both hypotheses, the
Neyman-Pearson Lemma gives the most powerful test for (26.1). Otherwise,
it is natural to restrict attention to the class of tests invariant to the group of
affine-linear transformations
Lehmann (1959) gives in this context the most powerful invariant (MPI) test
for (26.1), which rejects H₀ for large values of log(q₁/q₀) where, for j = 0, 1 and
358 G. R. Ducharme and B. Frichot
(26.2)
(26.3)
Proposition 26.2.1 If the derivatives of order 2 of ℓ_j are continuous in the
neighborhood of (μ, σ) and if (μ, σ) are in the interior of D, then

Corollary 26.2.1 If ∂ℓ_j/∂μ and ∂²ℓ_j/∂σ² are continuous in the neighborhood
of (μ_j, σ_j) in D, (26.2) can be written
Quasi Most Powerful Invariant Tests of GOF 359
(26.6)
(26.7)
q_L = (2π) e^{ℓ(μ̂,σ̂)} / ( n σ̂ √|AB − C²| )

with (μ̂, σ̂) the maximum likelihood estimators of (μ, σ) and

A = 1/σ̂² − (2/(nσ̂²)) Σ_{i=1}^n exp((X_i − μ̂)/σ̂) / [1 + exp((X_i − μ̂)/σ̂)]²,   (26.8)

B = (2/(nσ̂²)) Σ_{i=1}^n [ ((X_i − μ̂)/σ̂) exp((X_i − μ̂)/σ̂) / (1 + exp((X_i − μ̂)/σ̂)) ]²,   (26.9)

C   (26.10)
References
1. Barndorff-Nielsen, O. E. and Cox, D. R. (1989). Asymptotic Techniques
for Use in Statistics, New York: Chapman & Hall.
2. D'Agostino, R. B. and Stephens, M. A. (1986). Goodness-of-fit Techniques,
New York and Basel: Marcel Dekker Inc.
3. Franck, W. E. (1981). The most powerful invariant test of normal versus
Cauchy with applications to stable alternatives, Journal of the American
Statistical Association, 76, 1002-1005.
4. Hajek, J. and Sidak, Z. (1967). Theory of Rank Tests, New York: Academic
Press.
Jean Bretagnolle
University of Paris-Sud, Orsay, France
366 J. Bretagnolle
(27.4)
Then, to evaluate the level of a test about the partial sums, we can increase it
under the additional hypothesis that, in every column, the p_ij does not depend
on i:
- for the tests on the total sum
(In the Rasch model, we suppose that the odds ratio λ = p/(1 − p) can be
written λ_ij = exp(θ_i − β_j).)
The good ordering of the questions corresponds to the case θ = id.
We have one observation (e_{i,j}) (independent conditionally on the λ_{i,j}), with
law B(1, p_{i,j}) (in the following, we consider the λ as deterministic). Let S_j be the
score of question j. We want to test the hypothesis θ = id against θ ≠ id.
(Actually, as there is an equal number of parameters and observations, we cannot
estimate them.) Let then
H₀: The p_j are increasing in j.   (27.5)

Let us consider the associated problem, where the independent T_j follow
respective laws B(I, p_j). Denote by H_A the additional hypothesis:
The natural estimates are the p̂_j = S_j/I. But under H₀ and H_A, we can use
another estimate, called isotonic:
Let F_j be the cumulative sums: F₀ = 0, ..., F_j = F_{j−1} + S_j, ....
The maximum likelihood estimate, under the monotonicity constraint, is
constructed as follows: Let G_j be the greatest convex function smaller than F_j.
The estimate is defined as
(27.7)
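Computationally, the slopes of this greatest convex minorant are produced by the pool-adjacent-violators algorithm; a minimal sketch (the function name is ours):

```python
def pava_nondecreasing(y, w=None):
    """Isotonic (nondecreasing) fit of y by pool-adjacent-violators:
    returns the left slopes of the greatest convex minorant of the
    cumulative-sum diagram (F_0 = 0, F_j = S_1 + ... + S_j)."""
    w = [1.0] * len(y) if w is None else list(w)
    blocks = []                      # each block: [mean, weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # merge while the last two blocks violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    out = []
    for m, _, c in blocks:
        out.extend([m] * c)
    return out
```

The nodes mentioned below are exactly the boundaries of the final blocks, where G and F coincide.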
By the sequence of nodes we mean the sequence of integers where G and F
coincide, and by structure of the observation, the statistic

Σ = (U, v₀ = 0, F₀ = 0, v₁, F_{v₁}, ..., v_U = J, F_J)   (27.8)

where the right-hand side depends only on the p_j. We will then get a good
approximation by substituting the p̃_j or the p̂_j.
This upper-bound evaluation is valid under H₀ but, unfortunately, it
is very conservative because the corresponding inequality for FL is very crude (in
a similar problem of isotonic estimation of a density or a regression, the
order of magnitude is not the same).
(B) The statistic FL is convex in the scores. Denote by P_{(p_j)} any probability
with mean (p_j), and by P_{(p_j),A} the binomial probability with the same mean.
Then, for any increasing convex φ, we have
(27.11)
(27.12)
Under H_A, the bootstrap level is very good, but slightly larger than the
theoretical level when the p_j are constant, which is not surprising for specialists,
who know well that isotonic estimation is most problematic when the
estimand is not strictly monotonic. This phenomenon corrects itself when
I increases (otherwise, the test is conservative). Let α be the theoretical level and
α̂ the bootstrap level.
For J = 4, 5, 6, f = 400, m = 200, by I = 10, for the two tests, with p_j
constant:

α = 5%,  α̂ − α < 1.9%
α = 10%, α̂ − α < 3%
α = 20%, α̂ − α < 4%.
27.6 Conclusion
Our present applications are promising, but do not include the expected result,
which could be:

for tests about the gap between the observation and its isotonic
estimate, the maximum level is obtained in the binomial case, and that
will allow us to use a test function only of the scores.
References
1. Borodin, A. and Salminen, P. (1996). Handbook of Brownian Motion
Facts and Formulae, Basel: Birkhäuser Verlag.
28.1 Introduction
For the past ten years, almost all major clinical trials have included programs
designed to evaluate Quality of Life (QoL). Now that the data from these trials
are being analyzed as part of the results, we are faced with the question of the
validity of the instrument which produced such measures.
"Increasing and sometimes indiscriminate use of QoL measures has provoked
concern about these methods in the (health) context, especially when important
consequences, such as treatment decisions or resource allocation, depend on
them" [Cox et al. (1992)]. Generally we can assume that QoL is a complex
notion which can be divided into multiple components. Each component is related
to a specific domain, such as sociability, communication or mobility, and is
372 A. Hamon, J. F. Dupuy, and M. Mesbah
This formula is known as the Spearman-Brown formula; it shows that when the
number of items increases, the reliability tends to 1. Its maximum likelihood
Validation of QoL Measurements 373
estimator, under the assumption of a normal distribution of the error and the
true score, is known as the Cronbach Alpha Coefficient (CAC) [Kristof (1963)]:
α̂ = [k/(k − 1)] [1 − (Σ_{j=1}^k S_j²) / S_tot²]   (28.3)

where S_j² = (1/(n−1)) Σ_{i=1}^n (X_ij − X̄_j)² and S_tot² = (1/(n−1)) Σ_{i=1}^n (Σ_{j=1}^k X_ij − X̄)²
is the variance of the total scores. CAC
can be computed to find the most reliable subset of items [Moret et al. (1993)].
As a first step, all items are used to compute the CAC. Then at every step,
one item is removed from the scale. The removed item is the one which gives
for the scale without the item the maximum CAC. This procedure is repeated
until only two items remain. If the parallel model is true, it can be shown, using
the Spearman-Brown formula, that increasing the number of items increases the
reliability of the total score, which is estimated by Cronbach alpha. Thus, a
decrease of such a curve when adding an item could strongly lead us to suspect
that the given item is a bad one (in terms of goodness-of-fit of the model).
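A sketch of (28.3) and of one step of the backward procedure (taking S_tot² as the sample variance of the total score; the function names are ours):

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach alpha (28.3) for an n x k matrix of item scores
    (rows = subjects, columns = items), with n-1 denominators."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    sum_item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)    # variance of the total score
    return k / (k - 1) * (1.0 - sum_item_vars / total_var)

def backward_step(X):
    """One step of the step-by-step procedure: remove the item whose
    removal gives the largest alpha for the remaining scale."""
    X = np.asarray(X, dtype=float)
    alphas = [cronbach_alpha(np.delete(X, j, axis=1)) for j in range(X.shape[1])]
    j = int(np.argmax(alphas))
    return j, alphas[j]
```

For perfectly parallel items (all pairwise correlations equal to 1) the coefficient equals 1, its theoretical maximum.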
A supplementary and popular way to assess the influence of the item on the
goodness-of-fit of the parallel model is by examining the empirical correlations
item to total (or to total minus the given item). Assuming the parallel model,
these correlations must be equal. A low correlation indicates a bad item. We
now present a real data example.
[Curve of the CAC (from about 0.66 to 0.72) against the number of items; the marked step corresponds to the removal of item 3.]
Figure 28.1: Step by step procedure with the CAC for the Mobility dimension
(28.4)
(28.5)
(28.6)
The CML estimators will be denoted by β̂_c. The conditional likelihood based
on the sub-sample of persons with score s will be denoted by L_c^{(s)}(x, β) and the
associated maximum likelihood estimators by β̂_c^{(s)}. Andersen (1973) proposed
a test based on the ratio of the estimated likelihood L_c(x, β̂_c) and the sum of
the sub-sample likelihoods L_c^{(s)}(x, β̂_c^{(s)}). The test statistic is:
Z = 2 Σ_{s=1}^{k−1} L_c^{(s)}(x, β̂_c^{(s)}) − 2 L_c(x, β̂_c).   (28.7)
This theorem implies that, only when all the sizes (n_s) of the sub-samples of
persons with a score s are sufficiently large, can we consider that the distribution
of Z can be approximated by a χ²-distribution. These requirements are not
generally achieved in practice. Most often, we find that the number of persons
with a low or large score is too small. To take this problem into account,
Andersen (1973) proposed to group the values s to obtain larger and more
homogeneous sub-samples. Let s₀, ..., s_r be integers (r > 1) so that we have
s₀ = 0 < s₁ < ... < s_{r−1} < s_r = k − 1. Then we group in a sub-sample all
the persons who achieved a score between 1 and s₁, in another sub-sample we
group the individuals with scores between s₁ + 1 and s₂, and so on. In each
sub-sample, we can obtain CML estimators β̂_c^{(l)}.
Andersen proves that, with slight modifications, the previous theorem remains
true. Now the test statistic is Z₁,
where n_l is the number of individuals with a score between s_{l−1} + 1 and s_l. If
n_l → ∞ for l = 1, ..., r, then the statistic Z₁ is asymptotically χ²-distributed
with (r − 1)(k − 1) degrees of freedom. Andersen (1973) considers the power
of this statistic against the specific alternative of the two-parameter logistic
model. In this model the probability of a positive answer now depends on two
parameters:

P(X_ij = 1 | θ_i, β_j) = exp{α_j(θ_i − β_j)} / (1 + exp{α_j(θ_i − β_j)}).   (28.10)
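In code, (28.10) reads as follows (setting α_j = 1 for every item recovers the Rasch model; the function name is ours):

```python
import math

def p_positive(theta, beta, alpha=1.0):
    """Two-parameter logistic model (28.10): probability of a positive
    answer for a person with ability theta on an item with difficulty
    beta and discrimination alpha; alpha = 1 gives the Rasch model."""
    z = alpha * (theta - beta)
    return math.exp(z) / (1.0 + math.exp(z))
```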
E(N_sj, β | N_s = n_s) = [γ_{k−1}(s − 1) exp(−β_j) / γ_k(s)] Σ_i P(S_i = s | N_s = n_s).   (28.11)
(28.12)
Let d_s = (d_{sj})_{j=1,...,k} be the vector of the differences n_sj − E(N_sj, β̂_c | N_s = n_s)
and V̂_{s,c} its estimated variance-covariance matrix.
(28.13)
If for all s (1 ≤ s ≤ k − 1) the size of the sub-sample of persons with score
s tends to infinity, then the statistic R₁ is asymptotically χ²-distributed with
(k − 1)(k − 2) degrees of freedom.
The same argument with the number of persons who positively answer to two
items l and j leads to the construction of the R₂ statistic. This statistic has an
asymptotic χ² distribution with k(k − 1)/2 − 2(k − 1) degrees of freedom.
In a validating process, we aim at detecting bad items (in the sense that
they do not fit well), and sometimes also subsets of items that fit the model.
Molenaar (1983) proposes a graphical method to select homogeneous subsets of
items. We will briefly present this interesting method in the next section.
(a) for each particular item i in its turn, separate the respondents into two
subgroups (respectively denoted by G₁ and G₀) according to their response
(respectively yes and no) to the chosen item i;
(b) estimate the difficulties of the other items j (j ≠ i) in the two groups G₀
and G₁;
(c) plot the estimates of the difficulties in G₁ versus the estimates of the
difficulties in G₀.
If the items i and j are locally independent, the position of the item j should
lie close to the first diagonal. If there exists some dependence between these
two items, item j should lie below or above the diagonal.
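Steps (a)-(b) can be sketched as follows. This is a simplification: we use the logit of the proportion of negative answers as a crude stand-in for the CML difficulty estimates the chapter uses, and the function name is ours.

```python
import numpy as np

def split_difficulties(X, i):
    """Molenaar-style diagnostic, simplified: split respondents on their
    answer to item i, then estimate every other item's difficulty in both
    groups as the logit of the proportion of negative answers
    (higher value = more difficult item)."""
    X = np.asarray(X)
    g0, g1 = X[X[:, i] == 0], X[X[:, i] == 1]
    cols = [j for j in range(X.shape[1]) if j != i]
    def diffs(G):
        p = G[:, cols].mean(axis=0).clip(1e-6, 1 - 1e-6)  # positive-answer rate
        return np.log((1.0 - p) / p)
    return cols, diffs(g0), diffs(g1)
```

Plotting the two returned difficulty vectors against each other then gives the diagnostic scatter of step (c).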
[Figure 28.2: curve for item 7 plotted against the raw score (vertical axis in percent).]
data. The R₁ test is also significant. This test is based on the observed and
expected frequency of the positive answers to each item in each subgroup. In
Table 28.4, we present these numbers for item 3 because it is the worst item
of all, in the sense of a large difference between observed and expected frequency.
If this item is removed, the statistic R₁ is no longer significant. Its value is 20 and
the asymptotic distribution is a χ² with 24 degrees of freedom (p-value = 0.68).
With this set of 9 items, the R₂ test is highly significant, indicating that at
least two items of the questionnaire are not locally independent. Applying the
described graphical method results in the following conclusions. Separation of
respondents is made on item 2 of the scale.
Figure 28.3: Difficulty estimates in each group formed by the individuals who
positively answer to item 2 (G₁) and negatively answer to item 2 (G₀)
to be more difficult for people scoring 0 to item 2 than for people scoring 1 to
item 2. To a lesser extent, item 10 seems to be correlated with item 2. Other
items seem to lie close to the diagonal, indicating that the response to each of
them is independent of the response to item 2. This method is repeated for
each item of the scale. We present one more example. It involves separation on
item 10, in order to check the conclusions drawn from the previous graph. Item
Figure 28.4: Difficulty estimates in each group formed by the individuals who
positively answer to item 10 (G₁) and negatively answer to item 10 (G₀)
10 is used to split and gives Figure 28.4. Item 2 appears to be correlated
with item 10, whereas there is no indication that the other items depend on the
response to item 10.
All the items appear to be locally independent of item 8 (the figure is not
presented here). Similar plots are obtained when using items 1, 4, 5, 6, 7 and
9 to separate the respondents. Hence the scale Mobility contains a group of
three items correlated with each other, {1, 2, 10}. In order to obtain a Rasch-homogeneous
instrument, we choose one item from {1, 2, 10} and form a scale
by gathering this item and the items {4, 5, 6, 7, 8, 9}.
For making a choice, we calculate the likelihood ratio test. We check that
it is not significant for each of the three possible sets of items. As a scale for
Mobility we choose the set for which the level of significance of the test is the
largest.
28.6 Conclusion
A century after the famous Karl Pearson paper, the fields of application where
goodness-of-fit tests are useful are increasing. The validation of quality of life
questionnaires is undoubtedly one of those. Nevertheless, checking the underlying
properties of those ideal measurement models is, in practice, often done by
simple graphics or by way of simple statistics.
References
1. Andersen, E. B. (1973). Asymptotic properties of conditional maximum
likelihood estimators, Journal of the Royal Statistical Society, Series B,
32, 283-301.
2. Cox, D., Fitzpatrick, R., Fletcher, A. E., Gore, S. M., Spiegelhalter, D.
J., and Jones, D. R. (1992). Quality-of-life Assessment: Can we Keep It
Simple?, Journal of the Royal Statistical Society, Series A, 155, 353-393.
3. Glas, C. A. W. (1988). The derivation of some tests for the Rasch model
from the multinomial distribution, Psychometrika, 53, 525-546.
4. Glas, C. A. W. and Ellis, J. (1993). Rasch Scaling Program: User's
Manual Guide, iec ProGAMMA, Groningen, The Netherlands.
5. Glas, C. A. W. and Verhelst, N. D. (1995). Testing the Rasch model, In
Rasch Models: Foundations, Recent Developments and Applications (Eds.,
G. Fischer and I. Molenaar), New York: Springer-Verlag.
6. Kristof, W. (1963). The statistical theory of stepped-up reliability coef-
ficients when a test has been divided into several equivalent parts, Psy-
chometrika, 28, 221-238.
7. Mellenbergh, G. J. and Vijn, P. (1981). The Rasch model as a loglinear
model, Applied Psychological Measurement, 5, 369-376.
8. Molenaar, I. W. (1983). Some improved diagnostics for failure of the
Rasch model, Psychometrika, 48, 49-72.
9. Moret, L., Mesbah, M., Chwalow, J., and Lellouch, J. (1993). Validation
statistique interne d'une échelle de mesure : relation entre analyse
en composantes principales, coefficient α de Cronbach et coefficient de
corrélation intra-classe, Revue d'Épidémiologie et de Santé Publique, 41,
179-186.
10. Rasch, G. (1960). Probabilistic models for some intelligence and attain-
ment tests, Danmarks Paedagogiske Institut, Copenhagen.
11. Tjur, T. (1982). A connection between Rasch's item analysis model and
a multiplicative Poisson model, Scandinavian Journal of Statistics, 9, 23-
30.
PART VIII
TESTS OF HYPOTHESES AND ESTIMATION
WITH APPLICATIONS
29
One-Sided Hypotheses in a Multinomial Model
29.1 Introduction
In this paper we continue the study [Dudley and Haughton (1997)] of model
selection based on an extended Schwarz (1978) BIC method for multiple data sets,
where individual models may be half-spaces or half-lines.
Multiple clinical trials of the same treatment against placebo give multiple
data sets, where the true or pseudo-true parameter values may differ between
studies. We also consider models where one parameter, a common odds ratio,
is the same for all data sets, while others vary. Some models of interest are
one-sided; specifically, models where the treatment is beneficial or harmful. For
one treatment, the long-term use of aspirin after a heart attack (myocardial
infarction, or MI), six of seven individual studies did not show a significant
effect of aspirin on overall mortality at the 5% level, and one showed a mildly
significant benefit. The AMIS (1980) study had the largest sample size and
found a (non-significant) negative effect for aspirin. Several meta-analyses of
the mortality data have been done, e.g. Canner (1987), DerSimonian and Laird
(1986), and Gaver et al. (1992), which included 6 of the 7 studies, but not
Vogel, Fischer, and Huyke (1979), the one study showing a significant benefit
388 R. M. Dudley and D. M. Haughton
(29.1)
of θ for P^(k) [Berk (1966), Huber (1967) and Poskitt (1987)]. We will need some
regularity conditions on models m_j [Dudley and Haughton (1997, 2000)].
One is that each m_j is either a manifold, or a manifold-with-boundary
included in some m_i where, in a suitable parameterization, m_i is an open set in a
Euclidean space and m_j is a half-space intersected with m_i.
We call a model m_j the best model if it is the unique smallest model (in the
sense of inclusion) containing θ_k for k = 1, ..., K. A best model will always
exist if for any i and j = 1, ..., J, m_i ∩ m_j = m_r for some r, as will hold for
the models we consider. Our object is to choose the best model m_j, under loss
functions which may differ for different wrong choices of models.
Well-known model selection criteria are based on penalized maximum likelihood.
Let ML_k(m_j) be the supremum of the likelihood for the kth data set
over θ ∈ m_j. Let MLL_kj := log ML_k(m_j). Suppose first that there is only
one data set (K = 1) and all the models are manifolds. One chooses the model
for which MLL_1j − t_1j is largest, where the penalty t_1j increases with the
dimension d_j of m_j. In the BIC of G. Schwarz (1978), t_1j = d_j (log n)/2 where
n = n₁ is the sample size. It has been shown under some conditions [Haughton
(1988) and Poskitt (1987)] that as n₁ → ∞, the BIC is consistent, i.e. the
probability of choosing the best model converges to 1. Moreover, the BIC is
equivalent to choosing the model having the highest value of the leading terms
in the logarithm of its posterior probability of being the best model, under any
prior probabilities having strictly positive, continuous densities. The leading
terms do not involve the particular choices of priors.
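For K = 1 the selection rule just described can be sketched as follows (the data structure and function name are ours):

```python
import math

def bic_select(models, n):
    """Choose the model maximizing the Schwarz criterion
    MLL_j - (d_j / 2) * log(n), where MLL_j is the maximized
    log-likelihood and d_j the dimension of model m_j.
    `models` maps a model name to a pair (mll, dim)."""
    def criterion(item):
        _, (mll, d) = item
        return mll - 0.5 * d * math.log(n)
    return max(models.items(), key=criterion)[0]
```

For example, with n = 1000 a one-dimensional model at log-likelihood −100 beats a three-dimensional one at −98, since each extra dimension costs (log 1000)/2 ≈ 3.45.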
The BIC has not been found to work optimally, so one may look beyond
the leading terms. In our cases the well-known Jeffreys (1946) prior measure J_j
exists on each m_j. The density of J_j is the square root of the determinant of
the Fisher information matrix, independently of the parameterization; see also
Kass (1989). If 0 < J_j(m_j) < ∞, as is true in our cases, then the Jeffreys prior
probability on m_j will be called μ_j := J_j/J_j(m_j).
In the hierarchical Bayes method of meta-analysis, e.g. DerSimonian and
Laird (1986) and Gaver et al. (1992), a parameter δ_i gives an effect size, e.g.
a difference in mortality rates, for the treatment vs. placebo in the ith study,
where the δ_i are i.i.d. with a hyper-parametric prior. Such priors may be more
flexible and realistic, but we use Jeffreys priors to avoid subjective choices.
To approximate posterior probabilities we will need to approximate integrals
∫ lik^(n) dμ_j for k = 1, ..., K, j = 1, ..., J, where lik^(n) is a likelihood function.
Here n may be n_k for some k, when lik^(n) is the likelihood function for the kth
data set, or n may be the total sample size N = n₁ + ... + n_K in (29.1). If m_j
is a manifold, and ML(m_j) := sup_{m_j} lik^(n), then we use the approximation
(29.2)
some further hypotheses. If θ₀ ∉ m_j, both the actual integral and its
approximant become exponentially small in n relative to the integral over any m_i
containing θ₀ [Haughton (1988, 1989), Poskitt (1987) and Dudley and Haughton
(2000)]. The simple and accurate approximation (29.2) applies if and only if
μ_j is the Jeffreys prior. The factors n^{−d_j/2} ML(m_j) correspond to the BIC; the
remaining constant factors provide the sharpening.
If m_j is a half-space intersected with a parameterized manifold m_i, and m_r
is the boundary of m_j, so that d_j = d_i = d_r + 1, then J_j is J_i restricted to m_j.
Let MLL_u := sup_{m_u} log lik^(n). We use the approximation
(29.3)
M₇(t) will be the full common odds ratio model where 0 < ψ < ∞. M₄(t), M₅(t)
and M₆(t) will be the submodels of M₇(t) where ψ = 1, ψ ≥ 1 and ψ ≤ 1,
respectively. Thus M₄(t) is the null hypothesis that the treatment has no effect.
Since M₇(t) = M₅(t) ∪ M₆(t), M₇(t) can never be the best model.
One-Sided Hypotheses 391
The models M_j(t) for K = 1 will be called m_j(t), where 0 < t = t₁ < 1 in
this case. Then m₁(t) = m₇(t), m₂(t) = m₅(t), and m₃(t) = m₆(t). The total
Jeffreys prior measures of these models are, for 0 < t < 1,
of M_j(t). For each model M_j(t), the Fisher information matrix has a block
structure F_j = diag(C_j, V), where V is the (K − 1) × (K − 1) Fisher information matrix
for the multinomial family (1, v_1, ..., v_K) with the parameters v_1, ..., v_{K−1}. Thus
v_K := 1 − v_1 − ... − v_{K−1}, V_kk = v_k^{-1} + v_K^{-1} for k = 1, ..., K − 1, and V_kr = V_rk = v_K^{-1},
1 ≤ k < r ≤ K − 1. We have det V = 1/Π_{k=1}^K v_k, as can be seen by way
of the parameters w_k = v_k^{1/2}: the Jeffreys prior measure for the multinomial
(1, w_1^2, ..., w_K^2) family is 2^{K−1} w_K^{-1} dw_1 ... dw_{K−1} = (Π_{k=1}^K v_k)^{-1/2} dv_1 ... dv_{K−1},
which is the surface area measure in the positive orthant of the sphere w_1^2 +
... + w_K^2 = 1, e.g. Kass (1989).
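The determinant identity det V = 1/(v_1 ⋯ v_K) can be checked numerically; the sketch below is an added illustration (not from the chapter), building V at an arbitrary point of the open simplex.

```python
import numpy as np

# an arbitrary point of the open simplex, K = 4
v = np.array([0.1, 0.2, 0.3, 0.4])
K = len(v)

# V_kk = 1/v_k + 1/v_K for k < K, and V_kr = 1/v_K off the diagonal
V = np.full((K - 1, K - 1), 1.0 / v[-1]) + np.diag(1.0 / v[:-1])

det_V = np.linalg.det(V)
target = 1.0 / np.prod(v)   # det V should equal 1/(v_1 ... v_K)
```

The identity follows from the rank-one update formula det(D + c·11′) = (Π d_k)(1 + c Σ 1/d_k) applied with D = diag(1/v_k) and c = 1/v_K.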
The matrix C_1 = C_2 = C_3 is a 2K × 2K diagonal matrix with diagonal
entries v_k t_k/[r_k(1 − r_k)] and v_k(1 − t_k)/[p_k(1 − p_k)] for k = 1, ..., K. So
By symmetry, if the integral is taken only over the region where r_k ≤ p_k for
each k, for M_3(t), it is multiplied by 2^{-K}, and likewise for the M_2(t) region. By
normalization of Dirichlet distributions, e.g. Johnson and Kotz (1972, Chap.
40, Sec. 5), the total Jeffreys prior measures of M_j(t) for j = 1, 2, 3 are
(29.7)
(29.8)
The matrix C_4 for M_4(t) is a K × K diagonal matrix with entries v_k/[p_k(1 − p_k)],
k = 1, ..., K, so the total Jeffreys prior measure of M_4(t) is
(29.9)
For the common odds ratio model M_7(t) and its submodels M_5(t) and M_6(t),
the matrix C_5 = C_6 = C_7 is a (K + 1) × (K + 1) non-diagonal matrix C with

C_{r+1,r+1} = (v_r/y_r) [ (1 − t_r)/(1 + y_r)^2 + ψ t_r/(1 + ψ y_r)^2 ]

and C_{1,r+1} = C_{r+1,1} = t_r v_r/(1 + ψ y_r)^2 for r = 1, ..., K, and C_uv = C_vu = 0 for
2 ≤ u < v ≤ K + 1. By row reduction we get a matrix B with
Define S_ijk and X_irk as S_ij and X_ir respectively for the kth data set in (29.3)
with n = n_k. We use (29.3) for each data set separately and (29.8) to get,
for j = 2, 3, π_N(M_j) ∼ λ_1(2K + 1) Π_{k=1}^K Φ(S_{1jk} X_{14k})/D, as N → +∞, where
v_k > 0 for all k and ∼ denotes asymptotic equivalence except possibly for
exponentially small probabilities (Dudley and Haughton, 2000). We also get
π_N(M_1) ∼ λ_1 [1 − Π_{k=1}^K Φ(S_{12k} X_{14k}) − Π_{k=1}^K Φ(S_{13k} X_{14k})]/D. By (29.2) and (29.9),
π_N(M_4) ∼ (2/N)^{(2K−1)/2} ML(4) (K − 1)!/(D√π). Applying (29.2)
to M_1 gives λ_1 ∼ (2π/N)^{(3K−1)/2} ML(1)/J_1(M_1), where J_1(M_1) is given
by (29.7). For M_7 we get ∫ lik(N) dν_7 ∼ (2π/N)^K ML(7)/J_7(M_7). Let S_ij
and X_ir be as in (29.3) for the full data set with n = N and m_u replaced
by M_u for each u. Then for j = 5 or 6, since M_j is a half-space in M_7,
∫ lik(N) dν_j / ∫ lik(N) dν_7 ∼ 2Φ(S_{7j} X_{74}).
The numbers in the above seven tables are as in Appendix I of the ATC
(1994) survey, which included some updates from the original publications,
except for the first row of the table from the PARIS study, not given in ATC
(1994).
Patients entered the CDPRG study on average 7 years after their last heart
attack. For two other studies, it was also a long time between last MI and
entry into the study for some patients. The next two tables give data only on
those patients who began treatment within 6 months after their heart attack
in those two studies. In the other four studies, all or nearly all patients entered
the studies within 6 months of their last MI.
References
1. AMIS (1980). The Aspirin Myocardial Infarction Study Research Group.
The Aspirin Myocardial Infarction Study: Final results, Circulation, 62
(suppl. V), V79-V84.
11. Elwood, P. C., Cochrane, A. L., Burr, M. L., Sweetnam, P. M., Williams,
G., Welsby, E., Hughes, S. J., and Renton, R. (1974). A randomized con-
trolled trial of acetyl salicylic acid in the secondary prevention of mortality
from myocardial infarction, British Medical Journal, 1, 436-440.
13. Gaver, D. P., Draper, D., Goel, P. K., Greenhouse, J. B., Hedges, L.
V., Morris, C. N., and Waternaux, C. (1992). Combining Information:
Statistical Issues and Opportunities for Research, Washington, D.C.:
National Academy Press.
15. Haughton, D. (1989). Size of the error in the choice of a model to fit data
from an exponential family, Sankhya, Series A, 51, 45-58.
17. Jeffreys, H. (1946). An invariant form for the prior probability in esti-
mation problems, Proceedings of the Royal Society of London, Series A,
186, 453-461.
28. Vogel, G., Fischer, C., and Huyke, R. (1979). Reinfarktprophylaxe mit
Azetylsalizylsaure, Folia Haematologica, 106, 797-803.
30
A Depth Test for Symmetry
30.1 Introduction
It is natural to expect of a multivariate location estimator that in the case of
a symmetric distribution the population estimate corresponds to the center of
symmetry. Rousseeuw and Struyf (2000) prove that for any angularly sym-
metric multivariate distribution the point with maximal location depth [Tukey
(1975)] corresponds to the center of angular symmetry, and they give an expres-
sion for this maximal depth. Moreover, they show the converse: whenever the
maximal depth equals this expression, the distribution has to be angularly sym-
metric. Based on this characterization we will now construct a test for angular
symmetry of a particular distribution, which also gives us more insight into some
existing tests for centrosymmetry and uniformity of a spherical distribution.
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
402 P. J. Rousseeuw and A. Struyf
(30.1)
Figure 30.1: Examples of (a) a discrete and (b) a continuous angularly sym-
metric distribution around θ. Transforming (a) and (b) through the mapping
h(x) = (x − θ)/||x − θ|| yields the centrosymmetric distributions in (c) and (d).
as multivariate ranking [Eddy (1985) and Green (1981)]. This can be visualized
by means of the ldepth regions D_α given by
These regions are convex sets, with D_α ⊆ D_{α′} for each α′ < α. The center of
gravity of the innermost ldepth region is a point with maximal ldepth, called
the deepest location or the Tukey median of the data set. This multivariate
location T*(P) is a robust generalization of the univariate median. Donoho and
Gasko (1992) explored the properties of the location depth and of the deepest
location for finite data sets. Masse and Theodorescu (1994) and Rousseeuw and
Ruts (1999) gave several properties of the location depth for general probability
distributions P, which need not have a density. The asymptotic behavior of the
depth function was studied by He and Wang (1997) and Masse (1999), and that
of the deepest location by Bai and He (1999). Many statistical applications of
location depth have been developed. A survey is given in Liu, Parelius, and
Singh (1999).
Recently, Rousseeuw, Ruts, and Tukey (1999) proposed the bagplot, a bi-
variate generalization of the univariate boxplot based on location depth. Fig-
ure 30.2a depicts the weight of the spleen versus the heart weight of 73 hamsters
[Cleveland (1993)]. The Tukey median is given by the black-on-white cross in
the center of the data cloud. The dark gray area around the deepest location
is called the bag. It is an interpolation of two subsequent depth regions, and
contains 50% of the data. The bag corresponds to the box in the classical box-
plot. An outlier (plotted as a star) is a point lying outside of the fence, which is
obtained by inflating the bag by a factor 3 relative to the Tukey median. The
light gray loop ("bolster") is the convex hull of all nonoutlying data points. In
this example, 4 hamsters seem to have an extraordinarily large spleen and/or
heart. The shape of the loop reveals skewness of the data cloud, and suggests
a logarithmic transformation of both variables. In Figure 30.2b the bagplot of
the transformed data set is given. In this plot, only one outlier remains.
Rousseeuw and Struyf (2000) prove that the location depth can be used to
characterize angular symmetry (Theorem 30.2.1): P is angularly symmetric about θ_0
if and only if

ldepth(θ_0; P) = 1/2 + (1/2) P({θ_0}).
In that case, ldepth(θ_0; P) is the maximal value of ldepth(θ; P) over all θ ∈ R^p.
From Theorem 30.2.1 it follows that any P which is angularly symmetric about
some θ_0 with P({θ_0}) > 0 has a unique center of angular symmetry. Otherwise,
there can only be two different centers θ_1 ≠ θ_2 of angular symmetry if P has
all its mass on the straight line through θ_1 and θ_2. These corollaries have been
proved in another way by Liu (1990) and Zuo and Serfling (2000). A similar
property holds for the L1-median,
since Zuo and Serfling (2000) have proved that a distribution P that is angularly
symmetric about a unique point θ_0 has θ_0 as an L1-median.
Zuo and Serfling (2000)]. This is not sufficient, however. For instance, take
a distribution P_1 which is not angularly symmetric and with ldepth(θ; P_1) = k.
Then put P_2 = δ_θ and P := (1/2)P_1 + (1/2)P_2. For this probability measure P we find
ldepth(θ; P) = (1/2)k + (1/2) ≥ 1/2 although P is not angularly symmetric. As a
consequence, Theorem 30.2.1 is stronger than a similar property given by Zuo
and Serfling (2000) which is based on halfspace symmetry and requires stricter
conditions on P.
For the special case of probability measures with a density it always holds
that max_θ ldepth(θ; P) ≤ 1/2, which yields the following corollary of Theo-
rem 30.2.1: such a P is angularly symmetric about θ_0 if and only if

ldepth(θ_0; P) = 1/2.

In that case, ldepth(θ_0; P) = max_θ ldepth(θ; P).
The 'only if' part of this property was previously proved by Rousseeuw and Ruts
(1999) in a different way, whereas the 'if' part follows from Theorem 30.2.1.
In the bivariate case, Daniels (1954) gave an expression for the cumulative
distribution function H_n(k) of the location depth under the null hypothesis,
if k ≤ [(n − 1)/2],
otherwise.
(30.3)
Here j′ = [k/(n − 2k)] and each term is a probability of the binomial distribution
B(n, 1/2).
The same test statistic has been used by other people to test for different null
hypotheses. In two dimensions, the location depth ldepth(θ_0, X_n) reduces to the
bivariate sign test statistic of Hodges (1955), where the null hypothesis H_0 was
that P is centrosymmetric about θ_0. By Theorem 30.2.1 we can now see that the
real null hypothesis of this test is larger than the original Ho. It actually tests
for angular symmetry instead of centrosymmetry, which is a special case. Ajne
(1968) uses essentially the same test statistic to test for another null hypothesis,
that a distribution on the circle is uniform. Bhattacharyya and Johnson (1969)
first noted that both tests use the same test statistic. By the construction in
(30.1) and Property 30.2.1 it follows that Ajne's test has a much larger null
hypothesis, namely centrosymmetry of the circular distribution. The latter
is an illustration of the fact that the masses of all hemispheres of a sphere
S in R^p do not suffice to characterize the distribution P on S. Indeed, for
any centrosymmetric distribution P on S (such as the one in Figure 30.1d)
it is true that the mass of each hemisphere equals 1/2, and hence we cannot
distinguish between such distributions on the basis of the masses of hemispheres
alone. On the other hand, the masses of all caps of S would be sufficient to
characterize P on S by the theorem of Cramer and Wold (1936), since any
nontrivial intersection of a halfspace H ⊂ R^p and S determines a cap of S and
vice versa.
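Before turning to the examples, here is a small self-contained sketch of the bivariate location depth (an added illustration; it is not the Rousseeuw and Ruts program mentioned below): the depth of θ_0 is the smallest number of data points contained in a closed halfplane whose boundary line passes through θ_0. A brute-force O(n²) scan over directions determined by the data suffices for illustration.

```python
import math

def ldepth(theta, points, eps=1e-9):
    """Location (halfspace) depth of theta: the smallest number of data
    points lying in a closed halfplane whose boundary passes through
    theta.  Brute-force O(n^2) scan; assumes no point equals theta."""
    angles = [math.atan2(y - theta[1], x - theta[0]) for x, y in points]
    best = len(points)
    for a in angles:
        # the count of points in the halfplane only changes when the
        # normal crosses a +/- pi/2 from a data direction, so sampling
        # just on either side of every such breakpoint covers all cases
        for phi in (a + math.pi / 2 - eps, a + math.pi / 2 + eps,
                    a - math.pi / 2 - eps, a - math.pi / 2 + eps):
            u, w = math.cos(phi), math.sin(phi)
            count = sum(1 for x, y in points
                        if (x - theta[0]) * u + (y - theta[1]) * w >= 0)
            best = min(best, count)
    return best
```

For the test of angular symmetry one would then compare this depth with the null distribution H_n discussed above; a depth of 0 means θ_0 lies outside the convex hull of the data.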
Example 1. Let us consider the exchange rates of the German Mark relative
to the US Dollar (DEM/USD) and of the Japanese Yen (JPY/USD) from July
to December 1998. Every weekday (except on holidays), the exchange rates
were recorded at 8PM GMT. Figure 30.3 shows the evolution of the exchange
rates over this time period, measured in units of 0.0001 DEM/USD and 0.01
JPY/USD. The data set in Figure 30.4 consists of the 129 differences (Δx, Δy)
between the exchange rates on consecutive days, for both currencies.
From the time series plot in Figure 30.3 as well as from the scatter plot in
Figure 30.4 it is clear that Δx and Δy are correlated. We want to test whether
these pairs of exchange rate movements come from a bivariate distribution
which is angularly symmetric around the origin. Intuitively, we want to test if a
movement (Δx, Δy) of the rates of DEM/USD and JPY/USD with Δy/Δx = a
and Δx > 0 is equally likely to occur as a movement (Δx, Δy) with Δy/Δx = a
and Δx < 0. The location depth of the point θ_0 = (0, 0) can be calculated with
the program of Rousseeuw and Ruts (1996). Here, ldepth(θ_0, X_n) = 57. The
p-value equals H_129(57) = 0.88435, hence we accept the null hypothesis that the
data are angularly symmetric around θ_0. Note that large distances or long tails
have no effect on this result.
A Depth Test for Symmetry 407
Example 2. The azimuth data [Till (1974, p. 39) and Hand et al. (1994)]
consist of 18 measurements of paleocurrent azimuths from the Jura Quartzite,
Islay. The original measurements (in degrees east of north) are projected onto
the circle in Figure 30.5. The location depth of the point (0, 0) relative to this
data set equals 1. The p-value is H_18(1) = 0.002197, so we conclude that the
distribution of these data points deviates significantly from angular symmetry.
Figure 30.2: (a) Bagplot of the spleen weight versus heart weight of 73 hamsters.
(b) Bagplot of the log-transformed data set
Figure 30.3: Evolution of the exchange rates of DEM/USD (dashed line) and
JPY/USD (full line) from July to December 1998
Figure 30.4: Scatter plot of the 129 differences (Δx, Δy) between the DEM/USD
and JPY/USD exchange rates on consecutive days

Figure 30.5: The azimuth data projected onto the circle
References
1. Ajne, B. (1968). A simple test for uniformity of a circular distribution,
Biometrika, 55, 343-354.
11. Hand, D. J., Daly, F., Lunn, A. D., McConway, K. J., and Ostrowski, E.
(1994). A Handbook of Small Data Sets, London: Chapman & Hall.
12. He, X. and Wang, G. (1997). Convergence of depth contours for multi-
variate datasets, Annals of Statistics, 25, 495-504.
16. Masse, J. C. (1999). Asymptotics for the Tukey depth, Technical Report,
Universite Laval, Quebec, Canada.
21. Rousseeuw, P. J., Ruts, I., and Tukey, J. W. (1999). The bagplot: A
bivariate boxplot, The American Statistician, 53, 382-387.
23. Till, R. (1974). Statistical Methods for the Earth Scientist, London:
MacMillan.
26. Zuo, Y. and Serfling, R. (2000). On the performance of some robust non-
parametric location measures relative to a general notion of multivariate
symmetry, Journal of Statistical Planning and Inference, 84, 55-79.
31
Adaptive Combination of Tests
31.1 Introduction
Consider the linear regression model
Y = Xβ + z    (31.1)
414 Y. Dodge and J. Jureckova
absolute errors; and (3) maximum of absolute errors. These three methods
are members of the class called L_p-estimators, which are obtained by mini-
mizing what is known as the Minkowski metric or L_p-norm (criterion), defined as
[Σ |z_i|^p]^{1/p} with p ≥ 1. If we set p = 1, we obtain what is known as the
absolute or city block metric or L_1-norm. The minimization of this criterion is
called the L_1-norm or Least Absolute Deviations (LAD) method. If p = 2, we have
what is known as the Euclidean metric or L_2-norm. The minimization of this
distance is known as the least squares (LS) method. The classical approach
to the regression problem uses this method. If we minimize the L_p-norm for
p = ∞ we have the minimax method. There are, however, many other methods
for estimating β. For a complete treatment of some of the major methods the
reader is referred to Birkes and Dodge (1993) and to Jureckova and Sen (1996).
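As a rough numerical sketch of the first two criteria (an added illustration, not the book's algorithms): least squares has a closed form, while the LAD fit can be approximated by iteratively reweighted least squares; dedicated simplex or linear-programming algorithms would be used in practice.

```python
import numpy as np

def lp_fit(X, y, p=2, iters=200):
    """Minimize sum_i |y_i - x_i'b|^p for p = 2 (least squares) or
    p = 1 (LAD), the latter by iteratively reweighted least squares."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]      # LS solution
    if p == 2:
        return b
    for _ in range(iters):                        # IRLS for LAD
        r = y - X @ b
        w = 1.0 / np.maximum(np.abs(r), 1e-8)     # damped weights 1/|r_i|
        Xw = X * w[:, None]
        b = np.linalg.solve(X.T @ Xw, X.T @ (w * y))
    return b
```

For an intercept-only design the two fits reduce to the sample mean and the sample median, which makes the robustness of the L_1 criterion to a single outlier visible immediately.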
While the method of least squares enjoys well-known properties within
Gaussian parametric models, it is recognized that outliers, which arise from
heavy-tailed distributions, have an unusually large influence on the resulting
estimates. Outlier diagnostic statistics based on least squares have been devel-
oped to detect observations with a large influence on the least squares estima-
tion. For documents related to such diagnostics the reader is referred to Cook
and Weisberg (1982, 1994).
While many location estimators were extended in a straightforward way to
the linear regression model, this, until recently, was not the case of L-estimators
(linear combinations of order statistics). The attempts which were made either
were computationally difficult or did not keep the convenient properties of lo-
cation estimators. In 1978, Koenker and Bassett introduced the concept of
regression quantile which provided a basis for L-procedures in the linear model.
The trimmed least squares estimator, suggested by the same authors, is an
extension of the trimmed mean to the linear model.
At present, there exists a wide variety of robust estimators in the linear model.
Besides distributional robustness, estimators resistant to the leverage points in
the design matrix and possessing a high breakdown point [introduced originally
by Hampel (1968); the finite sample version is studied in Donoho and Huber
(1983)] were developed and studied.
Summarizing, the last 40 years brought a host of statistical procedures,
many of them enjoying excellent properties and being equipped with com-
putational software. On the other hand, this progress has put the applied sta-
tistician into a difficult situation: if he needs to fit his data with a regression
hyperplane, he hesitates over which procedure to use. His decision is then
sometimes based on peculiar reasons. If he had some more information on the
model, he could choose the estimation procedure accordingly. If his data are
automatically collected by a computer and he is not able to make any diag-
nostics, then he might use one of the high breakdown-point estimators, but he
usually would not, either due to the difficult computation or perhaps due to
his scepticism. Then, finally, he might prefer simplicity to optimality
and good asymptotics and use the classical least squares, the LAD-method or one
of the other reasonably simple methods.
An idea for such a situation, instead of concentrating on
one method, is to combine two convenient methods and in this way diminish
the eventual shortcomings of both. This idea, simple as it is, was surprisingly not
very much elaborated until recently. Arthanari and Dodge (1981) introduced
an estimation method based on a direct convex combination of the LAD- and LS-
methods. Dodge (1984) extended this method to a convex combination of the
LAD and Huber's M-estimation methods and supplemented it with a numerical
study based on simulated data.
Later on, Dodge and Jureckova (1988) observed that the convex combina-
tion of two methods could be made adaptive, in the sense that the optimal value of
the combination coefficient, minimizing the resulting asymptotic
variance, could be estimated from the observations. The resulting estimator at-
tains a minimum asymptotic variance over all estimators of this kind and for any
general distribution with a nuisance scale. Dodge and Jureckova (1988, 1991)
then extended the adaptive procedure to combinations of the LAD-method
with M-estimation and with trimmed least squares estimation methods. An
analogous idea can be used for the combination of two tests of the
linear hypothesis in the linear regression model. In Section 31.3 we develop and
discuss the optimal combination of tests.
In what follows we briefly describe the general idea, leading to a construction
of an adaptive convex combination of two estimation methods.
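To fix ideas, a schematic version of such a convex combination is sketched below (an added illustration; the adaptive estimation of the coefficient from the data, which is the point of the method, is not implemented here). It minimizes a δ-weighted mixture of the squared-error and absolute-error criteria for a fixed δ.

```python
import numpy as np

def combined_fit(X, y, delta, s=1.0, iters=300):
    """Minimize (1-delta)/s * sum r_i^2 + delta * sum |r_i| over b
    by iteratively reweighted least squares; delta = 0 gives LS and
    delta = 1 gives LAD.  Here delta is fixed, not estimated."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ b
        # weights from majorizing |r| by r^2 / (2|r|) + const
        w = (1 - delta) / s + delta / (2 * np.maximum(np.abs(r), 1e-8))
        Xw = X * w[:, None]
        b = np.linalg.solve(X.T @ Xw, X.T @ (w * y))
    return b
```

Intermediate values of δ interpolate between the mean-like LS fit and the median-like LAD fit; the adaptive procedure of Dodge and Jureckova chooses δ to minimize the estimated asymptotic variance.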
Σ_{i=1}^n ρ((Y_i − x_i′t)/s_n) := min    (31.4)
σ_0^2 = ∫ x^2 f_0(x) dx
and
EP = ∫ |x| f_0(x) dx,
and in the M-estimation case
and
EP = ∫ |ψ(x)| f_0(x) dx,
and the minimization (31.4) leads to the estimator T_n(δ_0) such that
(31.5)
where
(31.6)
Y = Xβ + z    (31.7)
where f_0 is a fixed (but generally unknown) symmetric density such that f_0(0) =
1, and the scale statistic s is s = 1/f(0). Denote by F = {f : f(z) = (1/s)
f_0(z/s), s > 0} the family of densities, satisfying (31.8) to (31.9), indexed by s.
We shall consider the hypothesis
H_0: β = 0;    (31.10)
but obviously the procedure could also be applied to more general hypotheses
of the type H: Aβ = b.
We can generally consider three types of tests of the linear hypothesis:
(i) the Wald type tests,
(31.12)
with a fixed β_0 ∈ R^p. The problem may be that of estimating the covariance
matrix V.
(ii) and (iii): The likelihood ratio tests and the score type tests are closely
related. The latter has a simpler linear form: for instance, for the model f(x, θ)
with the scalar parameter θ, and hypothesis H*: θ = θ_0, the parametric score
test is based on the statistic
(31.13)
The score tests can be performed with less or no estimation of unknown parame-
ters and matrices, compared with the two other tests; moreover, the sign-rank
tests, which asymptotically have forms of score tests, need even less estimation
due to their invariance. For this reason, we recommend using the ranks rather
than the regression quantiles or LAD estimation for testing various hypotheses
in the linear model.
The score tests belong to the class of M-tests of H_0, which are closely
connected with the M-estimation of β. The M-test of H_0 is characterized by
the test criterion
(31.14)
where
M_n = (nQ_n)^{-1/2} Σ_{i=1}^n x_i ψ(Y_i),   Q_n = n^{-1} X_n′X_n,    (31.15)
parameter
(31.17)
where
S_n^+ = (nQ_n)^{-1/2} Σ_{i=1}^n x_i sign(Y_i) φ^+(R_i^+/(n + 1)),    (31.18)
where R_i^+ is the rank of |Y_i| among |Y_1|, ..., |Y_n|, and φ^+ : [0, 1) → R^+ is a
nondecreasing score function, square-integrable on (0, 1), and such that φ^+(0) =
0. Denote also
The test criterion (31.17) has asymptotically the χ² distribution under H_0 and
the noncentral χ² distribution under H_n with noncentrality parameter
Moreover, under H_0, the sign-rank statistic (31.18) admits the asymptotic rep-
resentation
S_n^+ = (nQ_n)^{-1/2} Σ_{i=1}^n x_i φ(F(Z_i)) + o_p(1) as n → ∞,    (31.21)
(31.22)
and σ̂² is an estimator of σ². On the other hand, the median test of H_0 is also
of type (31.14)-(31.15) with
M_n = S_n^+ = (nQ_n)^{-1/2} Σ_{i=1}^n x_i (1/2) sign Y_i.    (31.23)
While the F-test is optimal for f normal, the median test is the locally most
powerful signed-rank test of H_0 for the double exponential distribution of errors.
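As a small numerical sketch of the median-test statistic in (31.23) (an added illustration): with Q_n = X′_n X_n/n, the statistic is the vector (nQ_n)^{−1/2} Σ_i x_i (1/2) sign Y_i, where the matrix inverse square root can be taken via an eigendecomposition of the symmetric matrix nQ_n = X′X.

```python
import numpy as np

def median_test_statistic(X, y):
    """S_n^+ = (n Q_n)^(-1/2) sum_i x_i * (1/2) * sign(y_i),
    with Q_n = X'X/n, so that n*Q_n = X'X."""
    M = X.T @ X                                   # n * Q_n
    vals, vecs = np.linalg.eigh(M)                # symmetric eigendecomposition
    M_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return M_inv_sqrt @ (X.T @ (0.5 * np.sign(y)))
```

For an intercept-only design this reduces to the centered sign statistic: the number of positive minus negative responses, divided by 2√n.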
We are looking for the optimal convex combination
W_n = (1 − δ)(1/s) T_n + δ S_n^+,   0 ≤ δ ≤ 1.    (31.24)
The test is considered optimal when it has the maximum Pitman efficacy over
δ ∈ [0, 1]. Notice that the test (31.24) is of the type (31.14) with the ψ function
(31.26)
where
η² = (1 − δ)² σ_0²/s² + δ²/4 + δ(1 − δ) EP/s = (1/s²) σ²(ψ, F, δ)    (31.27)
with
σ²(ψ, F, δ) = (1/4) {4(1 − δ)² σ_0² + 4δ(1 − δ) EP + δ²}.    (31.28)
(31.29)
If we knew the parameters EP, σ_0², s, we would use the test criterion
(31.30)
The test would have the maximal efficacy for δ minimizing σ²(ψ, F, δ) given in
(31.28). Thus, we conclude that δ̂_n, leading to the optimal adaptive combination
of the least squares and L_1 estimators, also leads to the optimal adaptive combination
of the F-test and the median-type test.
(31.31)
(31.32)
with
W_n →_d N_p(μ, κ² I),   μ = Q^{1/2} β_0.    (31.34)
If we knew the parameters EP, σ_0², s, we would use the test criterion
W̄_n = κ^{-2} W_n′W_n = 4s² σ^{-2}(ψ, δ, k) W_n′W_n,    (31.35)
δ_0 =    (31.36)
4σ_0² − 4EP γ_0 + γ_0²
ÊP = (1/(n s_n)) Σ_{i=1}^n (Y_i − x_i′β̂(n)) I[|Y_i − x_i′β̂(n)| ≤ k S_m]
    + (k/n) Σ_{i=1}^n {1 − I[|Y_i − x_i′β̂(n)| ≤ k S_m]}    (31.37)
(31.39)
References
1. Arthanari, T. S. and Dodge, Y. (1981). Mathematical Programming in
Statistics, New York: John Wiley & Sons.
J. C. W. Rayner
University of Wollongong, Wollongong, Australia
that involves k θs. We test H_0 : θ = 0 against K : θ ≠ 0. Here f(x; β) involves a
q × 1 vector of nuisance parameters β (for example, composed of the mean μ and
standard deviation σ when testing for normality with unspecified parameters).
The {h_i(x; β)} may be taken to be a set of functions orthonormal on f(x; β), and
C(θ, β) is a normalizing constant that ensures g_k(x; θ, β) is a proper probability
density function. Care must be taken because C(θ, β) may not exist or may
426 J. C. W. Rayner
only exist over restricted domains. See Figures 32.1 and 32.2 for examples of
smooth probability density functions.
Neyman (1937) used this model with f(x; β) the continuous uniform dis-
tribution and {h_i(x; β)} the Legendre polynomials. He assumed there were no
nuisance parameters, so the probability integral transform could be used to con-
vert completely specified distributions to uniformity. If testing for normality, the
orthonormal system is usually taken to be the normalized Hermite-Chebyshev
polynomials. Kopecky and Pierce (1979) and Thomas and Pierce (1979) both
permitted nuisance parameters but took h_r(x; β) = x^r, with a consequent lack
of convenience.
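For a completely specified null, Neyman's construction is easy to carry out directly: transform the data by the null cdf, evaluate the first k orthonormal Legendre polynomials on [0, 1], and reject for large Ψ²_k = Σ_j U²_j with U_j = n^{−1/2} Σ_i h_j(u_i), which is asymptotically χ² with k degrees of freedom under the null. The sketch below (an added illustration, with k = 2) spells this out.

```python
import math

def neyman_psi2(u):
    """Neyman's smooth statistic Psi_k^2 for k = 2, for data u already
    transformed by the fully specified null cdf (u_i ~ U(0,1) under H0).
    h1, h2 are the first two orthonormal Legendre polynomials on [0,1]."""
    n = len(u)
    h1 = [math.sqrt(3.0) * (2.0 * x - 1.0) for x in u]
    h2 = [math.sqrt(5.0) * (6.0 * x * x - 6.0 * x + 1.0) for x in u]
    U1 = sum(h1) / math.sqrt(n)
    U2 = sum(h2) / math.sqrt(n)
    return U1 * U1 + U2 * U2
```

Under H_0, Ψ²_2 is approximately chi-squared with 2 degrees of freedom, so values above 5.99 reject at the 5% level; U_1 picks up location departures and U_2 dispersion departures.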
Here by Partially Parametric Inference I mean inference based on the more
complicated probability density function of the form g_k(x; θ, β) rather than
f(x; β). One way this may arise is if a smooth goodness-of-fit test for f(x; β) has
been applied and rejected. Often goodness-of-fit testing is preliminary to some
other use, or is a retrospective verification that use was valid. So normality
may be assessed if the analysis of variance is contemplated. If normality is
rejected, the alternative specified by a smooth test of goodness-of-fit is of the
form g_k(x; θ, β). This leads to generalizations of the tests recommended in
Section 32.3. In general, the term Partially Parametric Inference could be used
for more general forms. For example, the family {h_i(x; β)} may not be the
orthonormal polynomials; a different form of nesting family other than that
specified in (32.1) may be used. An example of the latter occurred in Rayner
and Best (1996), where a smooth alternative to independence was proposed and
used to find new tests of independence.
If the summation in (32.1) is "full" (k maximal), then virtually all distribu-
tions are possible. Pearson's X² is an example. If the summation is "empty",
the probability density function is f(x; β) and the inference is parametric. The
more terms involved in the summation, the richer the family g_k(x; θ, β) and
the more "distribution-free" the inference. The focus here is on having only
one or two terms in the summation, towards the parametric end of what might
be called the partially parametric continuum. Semiparametric generally has a
different meaning. Three examples will be outlined.
      0    1    2    3    4   Total   Mean
S1    0    1    8    6    1    16     2.438
S2    0    0    4   11    6    21     3.095
S3    0    0    4    6    0    10     2.600
S4    1   14   42   19    3    79     2.114
S5    4    6   19   16   12    56     2.456
may be neglected, with the caveat that the conclusions only apply in respect
of, roughly, location and dispersion effects.
Rayner and Rayner (1997) derive score tests for the situation when hierar-
chic testing is appropriate; Rayner and Rayner (1998) construct the same test
statistics nonhierarchically as contrasts in the V_r's. From this approach an LSD
assessment can be made. Ultimately we find a first order difference between
samples two and four, and a second order difference between samples four and
five. The location differences suggest the relatively small class responded better
than the larger class, a well-known effect. The second order difference between
the larger classes reflects greater polarization of one class relative to the other.
In fact this was due to using different teaching methods that some students
in the polarized class responded well to, while others responded poorly. This
effect is not usually accessed by current methods of analysis of such surveys.
Using different targets may affect the conclusion, but the example in Rayner
and Rayner (1997) shows remarkable robustness to the targets considered.
I suspect the sort of modifications made by the data-driven school to the
one sample problem may well be applied profitably to this problem.
Figure 32.1: The probability density function g(x; θ_4, 0, 1) for varying values
of θ_4 (θ_4 = 0.2, 0.4, 0.8, −0.4, −0.8)
when θ_4 = 0 there is no real difference between the power curves, and that
as θ_4 increases the score test quickly becomes dominant. The Wilcoxon test
is inferior to the t-test for −1.2 < θ_4 < 0 and thereafter becomes the more
powerful of the two. The results are most effectively shown graphically. See
Figure 32.3.
It is interesting to note that one of the early criticisms of Pearson's test
was that when it rejected a model, no alternative model was identified. The
presentation of Pearson's test as a smooth test [see Rayner and Best (1989,
Theorem 5.1.2)] overcomes this objection. The study reported above shows
there may be considerable power gains available if testing is based on the model
identified by goodness-of-fit testing.
A score test based on this model is not the only option. In fact Carolan
and Rayner (2000b) found a Wald test to be preferable in a number of ways.
In particular the Wald test was found to have better power properties for non-
local alternatives. While the chi-squared asymptotic distribution is quite sat-
isfactory for the score test with moderate to large sample sizes, resampling is
recommended to obtain p-values for the Wald test. Both these tests require the
numerical evaluation of maximum likelihood estimates and computational con-
siderations limit the number of parameters that can be included in the model
(particularly if resampling is to be used for p-values). In general it is imprac-
tical to include more than two extra parameters. Fortunately, Carolan and
Rayner (2000b) showed that including just (h can lead to large power gains
Figure 32.2: Probability density function of the bimodal distribution given by
Equation (32.1) with modes at 0 and 4.32
and in general the gains to be made by including further parameters are small
for data from all but the most extremely nonnormal distributions (including θ_6
may be worthwhile for distinctly trimodal data).
This work has been extended to cover the completely randomized, random-
ized blocks and balanced incomplete block designs. Papers on these designs are
in preparation.
sample of size n (taking the values 20, 50 and 100 below) from a bimodal
distribution of the form (32.1) with modes at 0 and 4.32. To achieve this we
put q = 1 (β = μ), k = 3 and θ = (0, 0.25, −0.03)′. See Figure 32.2.
The power functions have the interesting but sensible property of having
power approximately equal to the size at both modes. It seems we are not so
much testing "are the data consistent with a population mode of zero?" as "are
the data consistent with a population mode?" See Figure 32.4.
This procedure addresses a problem previously given very little attention in
the literature.
Figure 32.3: Comparison of t-test, Wilcoxon test and score test power curves
for testing H_0 : μ = 0 against K : μ ≠ 0 as the data becomes progressively
more non-normal (panels for θ_4 = 0, −0.4, −0.8, −1.2, −1.6)
Figure 32.4: Comparison of power curves of the Wald test using the nearest
mode technique for samples of size 20 (solid), 50 (dashes) and 100 (dots) from
the bimodal distribution in Figure 32.2 above; 1000 simulations
References
1. Carolan, A. M. and Rayner, J. C. W. (2000a). One sample tests of location for nonnormal symmetric data, Communications in Statistics - Theory and Methods, 29, 1569-1581.
33

Two-Sample Homogeneity Tests

J.-M. Dufour and A. Farhat
Abstract: In this paper, we study several tests for the equality of two un-
known distributions. Two are based on empirical distribution functions, three
on nonparametric probability density estimates, and the last ones on differences
between sample moments. We suggest controlling the size of such tests (under
nonparametric assumptions) by using permutational versions of the tests jointly
with the method of Monte Carlo tests properly adjusted to deal with discrete
distributions. In a simulation experiment, we show that this technique provides
perfect control of test size, in contrast with usual asymptotic critical values.
33.1 Introduction
A common problem in statistics consists in testing whether the distributions
of two random variables are identical against the alternative that they differ
in some way. More precisely, we consider two random samples X1, ..., Xn and Y1, ..., Ym such that F(x) = P[Xi ≤ x], i = 1, ..., n, and G(x) = P[Yj ≤ x], j = 1, ..., m.
j = 1, ... ,m. In this paper, we do not wish to impose additional restrictions on
the form of the cumulative distribution functions (cdf) F and G, which may be
continuous or discrete. We consider the problem of testing the null hypothesis
H0 : F = G against the alternative H1 : F ≠ G.
H0 is a nonparametric hypothesis, so testing H0 requires a distribution-free procedure. Many users faced with such a comparison therefore resort to a goodness-of-fit test, usually the two-sample Kolmogorov-Smirnov (KS) test [Smirnov (1939)] or the Cramér-von Mises (CM) test [see Lehmann (1951),
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
436 J.-M. Dufour and A. Farhat
Rosenblatt (1952) and Fisz (1960)]. Other procedures that have been suggested include permutation tests based on L1 and L2 distances between kernel-type estimators of the relevant probability density functions (pdf) [Allen (1997)] and tests based on the difference of the means of the two samples considered [see Pitman (1937), Dwass (1957), Efron and Tibshirani (1993)]. Except for the last procedure, which is designed to have power against samples that differ through their means, the exact and limiting distributions of the test statistics are not standard, and tables for the exact distributions are available only for a limited number of sample sizes. These tests are thus usually performed with the help of tables based on asymptotic distributions, leading to procedures that do not have the targeted size (which can easily be too small or too large) and may have low power.
In this paper, we aim at finding test procedures with two basic features: they should be (1) truly distribution-free, irrespective of whether the underlying distribution F is discrete or continuous, and (2) exact in finite samples (i.e., they must achieve the desired size even for small samples). In this respect, it is important to note that the finite and large sample distributions of the usual test statistics are not necessarily distribution-free under H0. In particular, while the KS and CM statistics are distribution-free when the observations are i.i.d. with a continuous distribution, this is no longer the case when they follow a discrete distribution. For the statistics based on kernel-type density estimators, distribution-freeness does not obtain even for i.i.d. observations with a continuous distribution. This difficulty can be circumvented by considering a permutational version of these tests, which uses the fact that all permutations of the pooled observations are equally likely when the observations are i.i.d. with a continuous distribution. The latter property, however, does not hold when the observations follow a discrete distribution. So none of the procedures proposed to date for testing H0 satisfies the double requirement of yielding a test that is both distribution-free and exact.
Given recent progress in computing power, a way to solve this difficulty consists in using simulation-based methods, such as bootstrapping or Monte Carlo tests. The bootstrap technique, however, does not ensure that the level will be fully controlled in finite samples. For this reason, we favor Monte Carlo (MC) test methods. MC tests were introduced by Dwass (1957) and Barnard (1963) and developed by Birnbaum (1974), Dufour (1995), Kiviet and Dufour (1996), Dufour et al. (1998), and Dufour and Kiviet (1998).
In this paper, we first show how the size of all the two-sample homogeneity tests described above can be perfectly controlled for both continuous and discrete distributions by considering their permutational distribution and using the technique of MC tests properly adjusted to deal with discrete distributions. As a result, in order to implement these tests, it is no longer necessary to establish the distributions of the test statistics, either in finite samples or asymptotically. Second, as a consequence of the great flexibility allowed by the
Two-Sample Homogeneity Tests 437
where Fn(x) and Gm(x) are the usual empirical distribution functions (edf) associated with the X and Y samples respectively. It is well known that KS is distribution-free [see Conover (1971, page 313)] under H0 when the common distribution function F is continuous, but its exact and limiting distributions are not standard. Birnbaum and Hall (1960) provide tables for its exact distribution in the case of equal sample sizes, whereas Massey (1952) does the same for unequal sample sizes. Further, it is important to note that KS is not distribution-free when F is a discrete distribution.
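As an illustration, the KS statistic sup_x |Fn(x) − Gm(x)| can be computed directly from the two edfs. The following Python sketch evaluates the edfs at the pooled sample points, where the supremum is attained; the function name and implementation details are ours, not the chapter's:

```python
import numpy as np

def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic sup_t |F_n(t) - G_m(t)|,
    evaluated at the pooled sample points (where the sup is attained)."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    pooled = np.concatenate([x, y])
    # right-continuous edfs evaluated at every pooled observation
    Fn = np.searchsorted(x, pooled, side="right") / len(x)
    Gm = np.searchsorted(y, pooled, side="right") / len(y)
    return np.max(np.abs(Fn - Gm))

# disjoint samples: the edfs never overlap, so KS = 1
print(ks_two_sample([1, 2, 3], [10, 11, 12]))  # -> 1.0
```

For discrete data this statistic is still computable, but, as noted above, its null distribution is no longer distribution-free, which is precisely why the permutational MC approach of Section 33.3 is needed.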
The two-sample CM statistic is defined as
where

cX = n^{1/5}/(2 sX),   K(x) = 1/2 if |x| ≤ 1, 0 if |x| > 1,

and sX = [(n − 1)^{-1} Σ_{i=1}^{n} (Xi − X̄)²]^{1/2} is the usual estimator of the population standard deviation. If g is the pdf associated with the cdf G, its estimator ĝm(x) is defined in a way analogous to (33.3). The L1-distance test initially proposed by Allen (1997) is based on the statistic
(33.5)
For the case where both F and G are discrete, the pdf f and g are replaced by the probability mass functions p(x) = P[X = x] and q(x) = P[Y = x]: each one is estimated with the help of formula (33.3), after which f̂n and ĝm are replaced by these estimators in the L1, L2 and L∞ statistics. In contrast with the KS and CM statistics, the finite sample distributions of the statistics L1, L2 and L∞ are not distribution-free even when the distribution function F is continuous, as well as (a fortiori) when F is discrete.
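The density-distance statistics can be sketched as follows. Since the exact displays of (33.3) and (33.5) are lost in this excerpt, the sketch assumes the standard kernel form f̂n(x) = (cX/n) Σ_i K(cX(x − Xi)) with the uniform kernel and constant cX given above, and approximates the L1, L2 and L∞ distances numerically on an evaluation grid; all function names and the grid choice are illustrative assumptions:

```python
import numpy as np

def uniform_kernel(u):
    # K(u) = 1/2 on [-1, 1] and 0 elsewhere, as in the text
    return np.where(np.abs(u) <= 1, 0.5, 0.0)

def kde(sample, grid):
    """Kernel pdf estimate using the text's constant c = n^{1/5}/(2 s).
    The exact placement of c in (33.3) is lost here; we assume the
    standard form f(x) = (c/n) * sum_i K(c (x - X_i))."""
    sample = np.asarray(sample, float)
    n, s = len(sample), np.std(sample, ddof=1)
    c = n ** 0.2 / (2 * s)
    return c / n * uniform_kernel(c * (grid[:, None] - sample[None, :])).sum(axis=1)

def distance_stats(x, y, npts=512):
    """L1, L2 and Linf distances between the two kernel estimates,
    approximated on a common evaluation grid."""
    lo = min(np.min(x), np.min(y)) - 3.0
    hi = max(np.max(x), np.max(y)) + 3.0
    grid = np.linspace(lo, hi, npts)
    d = np.abs(kde(x, grid) - kde(y, grid))
    dx = grid[1] - grid[0]
    return d.sum() * dx, (d ** 2).sum() * dx, d.max()
```

For discrete data the same three distances would instead be computed from the estimated probability mass functions, as described above.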
The next statistic to enter our study is the difference of the sample means θ̂1 = X̄ − Ȳ. Permutation tests based on θ̂1 were initially proposed by Fisher
(1935) and used by Dwass (1957) for testing the equality of means, but Efron and Tibshirani (1993, Chap. 15) suggested extending their use, along with bootstrap tests, to testing the equality of two unknown distributions. Contrary to Allen (1997), who also considered bootstrap tests, the statistic based on the studentized difference of sample means

(33.7)

will not be considered, since our study is restricted to permutation tests and it is straightforward to see that permutation tests based on θ̂1 and t are equivalent [see, for instance, Lehmann (1986)]. Further, we suggest here alternative test statistics based on comparing higher-order moments. Namely, the difference between unbiased estimators of sample variances,
θ̂2 = (n − 1)^{-1} Σ_{i=1}^{n} (Xi − X̄)² − (m − 1)^{-1} Σ_{i=1}^{m} (Yi − Ȳ)²,   (33.8)

(33.9)
where
(33.10)
Note that skewness and kurtosis coefficients play a central role in testing normality [see Jarque and Bera (1987) and Dufour et al. (1998)].
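Since equations (33.9)-(33.10) are lost in this excerpt, the following sketch computes θ̂2 exactly as in (33.8), together with plausible skewness- and kurtosis-difference analogues for θ̂3 and θ̂4; the latter two forms are assumptions on our part, not the authors' exact definitions:

```python
import numpy as np

def theta2(x, y):
    """theta_2-hat: difference of unbiased sample variances, eq. (33.8)."""
    return np.var(x, ddof=1) - np.var(y, ddof=1)

def skew(z):
    # standardized third central moment
    z = np.asarray(z, float)
    m2 = np.mean((z - z.mean()) ** 2)
    m3 = np.mean((z - z.mean()) ** 3)
    return m3 / m2 ** 1.5

def kurt(z):
    # standardized fourth central moment
    z = np.asarray(z, float)
    m2 = np.mean((z - z.mean()) ** 2)
    m4 = np.mean((z - z.mean()) ** 4)
    return m4 / m2 ** 2

def theta3(x, y):  # assumed form of (33.9): difference of skewness coefficients
    return skew(x) - skew(y)

def theta4(x, y):  # assumed form: difference of kurtosis coefficients
    return kurt(x) - kurt(y)
```

Under H0 each of these differences has a permutational null distribution centered at zero, which is all the MC test procedure below requires.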
p̂N(x) = (N ĜN(x) + 1)/(N + 1),   (33.11)

where

1_A(x) = 1 if x ∈ A, 0 if x ∉ A.   (33.12)
P[p̂N(T0) ≤ α] = I[α(N + 1)]/(N + 1),   0 ≤ α ≤ 1,   (33.13)
where I[x] denotes the largest integer not exceeding x. Thus, if N is chosen such that α(N + 1) is an integer, the critical region (33.12) has the same size as the critical region G(T0) ≤ α. The MC test so obtained is theoretically exact, irrespective of the number N of replications used.
The above procedure is closely related to the parametric bootstrap, with a fundamental difference however. Bootstrap tests are, in general, provably valid only as N → ∞. In contrast, we see from (33.13) that N is explicitly taken into consideration in establishing the validity of MC tests. Although the value of N has no effect on size control, it may have an impact on power, which typically increases with N.
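A minimal sketch of a Monte Carlo permutation test built on (33.11), using the mean-difference statistic θ̂1 as the example; the implementation details (function names, the choice of N, the seeding) are ours:

```python
import numpy as np

def mc_permutation_pvalue(x, y, stat, N=99, seed=0):
    """Monte Carlo permutation p-value as in (33.11):
    p_N(T0) = (N*G_N(T0) + 1)/(N + 1), with G_N(T0) the proportion of the
    N permutation statistics that are >= the observed T0.  When alpha*(N+1)
    is an integer (e.g. N = 99 for alpha = 5%), rejecting for p_N <= alpha
    has exact size alpha for a continuously distributed statistic."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, pooled = len(x), np.concatenate([x, y])
    t0 = stat(x, y)
    ge = 0
    for _ in range(N):
        perm = rng.permutation(pooled)   # random split of the pooled sample
        ge += stat(perm[:n], perm[n:]) >= t0
    return (ge + 1) / (N + 1)

# example statistic: absolute mean difference |Xbar - Ybar| (theta_1-hat)
diff = lambda a, b: abs(a.mean() - b.mean())
```

With N = 99 and α = 5%, α(N + 1) = 5 is an integer, so the rejection rule p̂N(T0) ≤ 0.05 is exact; any of the statistics of Section 33.2 can be plugged in as `stat`.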
Note that (33.13) holds for tests based on statistics with continuous distributions, in which case ties have zero probability. For discrete statistics, however, the technique of MC tests can be adapted by appealing to the following randomized tie-breaking procedure [see Dufour (1995, Section 2.2)]. Draw N + 1 uniformly distributed variates U0, U1, ..., UN, independently of the Ti's, and arrange the pairs (Ti, Ui) following the lexicographic order:
p̃N(x) = (N G̃N(x) + 1)/(N + 1),

where G̃N is defined like ĜN but with ties resolved by the lexicographic order above. The resulting critical region p̃N(T0) ≤ α has the same level as the region G(T0) ≤ α, provided again that α(N + 1) is an integer. More precisely, an analogue of (33.13) then holds with p̃N in place of p̂N.
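The tie-breaking step can be sketched as follows, under the assumption that G̃N counts the simulated pairs (Ti, Ui) that are lexicographically at or above the observed pair (T0, U0); the function name is ours:

```python
import numpy as np

def mc_pvalue_ties(t0, t_sims, seed=0):
    """Tie-broken MC p-value for a possibly discrete statistic
    [Dufour (1995, Sec. 2.2)]: draw uniforms U_0, U_1, ..., U_N and rank
    the pairs (T_i, U_i) lexicographically, so ties in T are broken at
    random."""
    rng = np.random.default_rng(seed)
    N = len(t_sims)
    u = rng.uniform(size=N + 1)  # u[0] is paired with the observed statistic t0
    # count simulated pairs (T_i, U_i) >= (t0, u[0]) in lexicographic order
    g = sum((t > t0) or (t == t0 and ui >= u[0]) for t, ui in zip(t_sims, u[1:]))
    return (g + 1) / (N + 1)
```

When the statistic is continuous, ties occur with probability zero and this reduces to the p-value of (33.11); for discrete statistics the randomization restores exactness.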
For the second part of the study, where F and G are discrete, the five most commonly used distributions were retained: discrete uniform (DU), binomial (Bin), geometric (Geo), negative binomial (Nbin) and Poisson (P). Since it is
a prohibitive task to find parameters that will simultaneously give rise to both a common mean and a common variance, the following three situations were considered: (i) the distributions were DU(19), Bin(20, 0.5), Geo(0.1), Nbin(8, 0.2), P(10) and thus had common mean 10 and variance 30, 5, 90, 2.5 and 10 respectively; (ii) the distributions were DU(10), Bin(33, 0.5), Geo((√34 − 1)/16.5), Nbin(3, (√108 − 3)/16.5), P(8.25) and thus had mean 5.5, 16.5, 3.42, 2.23 and 8.25 respectively but common variance 8.25; (iii) the distributions were DU(10), Bin(10, 0.1), Geo(0.3), Nbin(10, 0.2), P(5) and thus had mean 5.5, 1, 3.33, 50 and 5 respectively and variance 8.25, 0.9, 7.78, 200 and 5 respectively.
As a first check on the accuracy of our study, Tables 1 and 2 of Allen (1997) were redone, adding however the CM, the L∞ and the MC tests based on higher moments, and excluding the bootstrap tests. The results appear in Table 33.2 and are quite similar to those of Allen (1997).
Table 33.2: Empirical level and power for tests of equality of two distributions: m = 22, n = 22 and α = 5%

F = N(0, 1)
G            Original tests    MC tests
             KS    CM      KS    CM    θ̂1    θ̂2    θ̂3    θ̂4    L1    L2    L∞
N(0, 1)      6.2   5.2     4.8   5.1   5.0   4.9   4.7   5.4   4.8   4.8   4.7
N(0.2, 1)    7.7   6.7     6.2   6.3   6.4   4.7   5.0   5.1   5.3   5.2   5.1
N(0.3, 1)    10.8  9.5     8.6   9.0   9.8   4.8   5.2   4.9   6.8   6.7   6.7
N(0.4, 1)    16.1  15.8    13.5  15.0  16.2  4.7   5.6   5.5   10.2  10.1  9.2
N(0.5, 1)    32.8  34.1    28.3  32.9  36.1  3.9   6.1   5.9   19.3  18.7  17.1
N(0.7, 1)    54.3  57.5    48.9  55.9  60.3  3.1   6.1   6.5   34.8  33.9  30.8
N(0, 1.2²)   7.3   5.9     5.8   5.8   5.1   11.6  4.7   5.1   9.9   10.1  10.0
N(0, 1.4²)   9.1   6.8     7.2   6.6   5.0   28.5  4.2   5.2   22.7  23.4  23.6
N(0, 1.6²)   12.5  9.7     10.5  9.8   5.1   49.9  4.0   4.5   42.0  42.6  42.9
N(0, 1.8²)   15.4  11.5    13.1  11.6  5.3   66.2  3.2   3.9   58.9  59.7  59.8
N(0, 2²)     20.4  16.1    17.3  15.6  5.1   80.2  2.6   3.4   74.6  75.3  74.5
Tables 33.3 and 33.4 contain the results of our study. The following conclusions can be drawn. As expected, the MC tests control size perfectly and are easily applicable. The original KS and CM tests, for which tables are available, show size distortions. Although in the case where both F and G are continuous the CM test appears adequate and the KS test exhibits only light size distortions, the distortions become severe when both F and G are discrete.
The use of θ̂1 to carry out equality tests of two distributions is erroneous. It is obvious that two distributions cannot be equal if they do not have the same mean, but the converse is not true. Consequently, if the test based on θ̂1 accepts the hypothesis H0, this should not be interpreted as an acceptance of the fact that F = G but rather that these distributions have equal means. The L1 and L2 tests behave almost identically and differ slightly from the L∞ test.
In the same way, the power of the KS test is not very different from that of the CM test. In general, if we compare the powers of the tests based on edfs (KS and CM tests) with those based on pdf estimates (L1, L2 and L∞ tests), we notice a great difference, and we cannot conclude that a test stemming from one group is more powerful than all the tests in the other group. The edf tests are more powerful than those based on pdf estimates when the two distributions have the same variance but different means. On the other hand, if the two distributions have the same mean but different variances, the tests based on pdf estimates are the most powerful.
As for the case where both F and G are discrete, the results for two of the three situations are presented successively in Table 33.4. As in the case where the distributions were continuous, the conclusions reached for the MC tests still apply. Moreover, the simulation confirms the result stated by Noether (1967) indicating that, if the random variables are discrete, the KS test is still valid but becomes conservative. On the other hand, it reveals that the CM test is quite often liberal, although Conover (1971) indicates that it has a tendency to be conservative in the case of discrete distributions.
33.5 Conclusion
In this paper, we first showed that finite-sample distribution-free two-sample homogeneity tests, for both continuous and discrete distributions, can be easily obtained by combining two techniques: (1) considering permutational versions of most tests proposed for that problem; (2) implementing the permutation procedures as Monte Carlo tests with an appropriate tie-breaking technique to take account of the discreteness of the null distributions of the test statistics. Second, due to the flexibility of the Monte Carlo test technique, we could easily introduce and implement several alternative procedures, such as permutation tests comparing higher-order moments. Other alternative procedures are described in Dufour and Farhat (2000). Third, in a simulation study, it was shown that the proposed procedures work as expected from the viewpoint of size control, while the new suggestions yielded power gains.
Table 33.3: Empirical level and power for MC tests of equality of two continuous distributions having the same mean and the same variance: m = n = 22 and α = 5%
F = N
G      Original tests    MC tests
       KS    CM      KS    CM    θ̂1    θ̂2    θ̂3    θ̂4    L1    L2    L∞
N      6.2   5.2     4.8   5.1   5.0   4.9   4.7   5.4   4.8   4.8   4.7
Exp    16.1  12.8    13.6  12.6  5.6   10.6  41.7  15.7  17.4  17.2  15.7
Γ      11.0  8.4     8.8   8.2   5.2   7.9   26.8  10.1  11.3  11.4  11.0
B      7.1   5.6     5.7   5.7   4.7   5.6   7.7   7.4   6.3   6.2   5.9
Log    6.4   5.2     5.1   5.0   4.7   5.5   5.4   6.3   5.2   5.4   5.4
Λ      77.3  69.1    71.6  65.0  6.0   59.0  70.3  63.1  68.8  67.9  65.6
U      8.2   5.9     6.7   5.9   5.2   6.1   6.7   17.1  6.7   6.8   7.2
F = Exp
Exp    6.1   5.0     4.9   5.0   4.9   5.2   5.4   5.2   4.8   4.9   4.8
Γ      7.0   5.7     5.7   5.5   5.0   6.3   7.6   6.2   6.2   6.1   6.4
B      13.6  9.8     11.6  9.9   5.1   12.8  36.6  20.9  15.6  16.0  16.1
Log    16.7  13.3    14.1  12.8  5.2   8.9   36.5  11.5  15.1  14.7  13.6
Λ      88.7  76.3    84.8  72.2  5.0   49.9  30.6  31.8  55.2  55.0  55.0
U      19.0  13.6    16.2  13.1  5.0   16.7  55.7  35.3  22.1  22.4  23.5
F = Γ
Γ      5.9   4.9     4.7   4.8   4.9   5.2   4.9   5.5   5.0   5.1   5.0
B      9.0   7.1     7.4   6.7   5.0   8.9   20.2  11.7  9.4   9.5   9.6
Log    11.3  8.7     9.2   8.7   4.8   6.5   22.1  6.5   9.1   9.1   8.6
Λ      84.3  72.5    80.1  67.8  5.2   53.6  43.0  43.4  60.7  60.1  59.7
U      12.8  9.4     10.6  9.0   5.2   12.6  35.9  25.6  15.0  15.2  15.5
F = B
B      6.5   5.6     5.2   5.5   5.3   5.2   5.1   4.9   5.3   5.2   5.4
Log    7.6   5.3     6.2   5.4   4.5   6.4   8.2   11.4  6.8   6.7   6.8
Λ      83.7  76.1    78.3  71.4  5.8   63.3  74.4  69.7  72.3  71.4  69.9
U      6.6   5.2     5.5   5.1   5.0   5.8   8.0   10.3  5.9   5.8   6.1
F = Log
Log    6.4   5.2     5.0   5.1   5.1   4.7   4.5   4.6   4.7   4.7   4.6
Λ      72.8  64.6    66.2  60.6  5.8   56.9  60.1  52.7  66.4  65.4  63.3
U      9.6   6.9     8.0   6.7   5.0   8.2   9.4   25.6  9.2   9.3   9.6
F = Λ
Λ      6.2   4.9     4.8   4.8   4.7   5.0   5.1   5.0   5.0   5.0   4.9
U      87.9  82.4    82.3  77.2  5.7   65.5  91.0  83.8  76.6  75.7  73.2
Table 33.4: Empirical level and power for tests of equality of two discrete distributions: m = n = 22 and α = 5%
References

1. Allen, D. L. (1997). Hypothesis testing using an L1-distance bootstrap, The American Statistician, 51, 145-150.

10. Dufour, J.-M. (1995). Monte Carlo tests with nuisance parameters: A general approach to finite sample inference and nonstandard asymptotics in econometrics, Discussion paper, CRDE, Université de Montréal.

12. Dufour, J.-M., Farhat, A., Gardiol, L. and Khalaf, L. (1998). Simulation-based finite sample normality tests in linear regressions, The Econometrics Journal, 1, 154-173.

13. Dufour, J.-M. and Kiviet, J. F. (1998). Exact inference methods for first-order autoregressive distributed lag models, Econometrica, 66, 79-104.

19. Kiviet, J. F. and Dufour, J.-M. (1996). Exact tests in single equation autoregressive distributed lag models, Journal of Econometrics, 80, 325-353.

22. Massey, F. J. (1952). Distribution table for the deviation between two sample cumulatives, Annals of Mathematical Statistics, 23, 435-441.
34

Power Comparisons

T. Hoang and V. L. Parsons
34.1 Introduction
Data from two-factor completely random designs are typically analyzed using
classical ANOVA methods. However, in certain two-factor experiments, the
response is expected to increase as the levels of both factors increase. (Note,
decreasing expectations and/or levels can be reparameterized to satisfy this
framework.) This knowledge allows the researcher to implement order-restricted
data analysis techniques on the experimental data. The usage of such techniques
450 T. Hoang and V. L. Parsons
should result in testing procedures having more power than unrestricted F-tests.
For example, consider a two-factor design for controlling high blood pressure
by a hypertension treatment at increasing dosage and a regimen of physical
exercise at increasing levels. Blood pressure should be expected to decrease
if levels of both treatments increase. In such cases, we say that experimental
responses are stochastically ordered with respect to the lattice order on the two
factors. Discussion of such experiments appears in Higgins and Bain (1999),
and we adopt some of their notation in presenting the analytical structures.
More precisely, consider a two-way experiment in which the increasing levels of the factors A and B are labeled by i ∈ {1, 2, ..., I} and j ∈ {1, 2, ..., J}, respectively. We refer to the resulting I·J pairs of levels as the I × J cross-classification grid. A stochastic ordering of the experimental response random variable, Xij, defined on the I × J grid can be imposed, and for this paper the case of distributional shift will be considered. We assume that the response has the form Xij = μij + eij, where the eij's are independent identically distributed random errors having a continuous distribution function. The trend function, μij, on the I × J grid will be said to be isotone with respect to the lattice ordering ≼ on the grid if μij ≤ μkl whenever (i, j) ≼ (k, l), i.e., i ≤ k and j ≤ l.
In experimental situations involving small sample sizes and weak distributional assumptions about the samples, nonparametric procedures may be appropriate. Most of the previous work on nonparametric tests of ordered alternatives against homogeneity concerns one-way layouts in completely random designs or randomized complete block designs [Terpstra (1952), Jonckheere (1954), Büning and Kössler (1996), Chacko (1963) and Shorack (1967)]. Ordered two-factor completely random designs have been investigated to a lesser degree [Ager and Brent (1978) and Higgins and Bain (1999)]. Chacko (1963) and Shorack (1967) used an order-restricted least squares (ORLS) approach [see Robertson, Wright, and Dykstra (1988) for a general discussion] to define test statistics. For two-factor designs we focus on a Kruskal-Wallis-type test statistic defined by ORLS procedures [see Robertson, Wright, and Dykstra (1988, Chapter 4.5)]. For this case the test statistic and its null distribution are difficult to compute, and thus its usage has been deferred. Recent advances in computer processing have made ORLS methods more practical. With nonparametric tests, complicated null distributions can be simulated, and statistics involving heavy computation can be put to practical use. The authors are not aware of a power study of nonparametric ORLS tests such as the one herein.
against the alternative hypothesis, H1: the μij's are isotone with respect to the lattice order on the I × J grid, and strict inequality holds for at least one pair of cells.
Three types of "traditional" test statistics for trend will be assessed on the rectangular grid: Jonckheere-Kendall, lattice-rank, and linear rank type statistics. The two-factor Jonckheere-type statistic is of the form

J = Σ_{(a,b): a ≺ b} U_{ab},

summing Mann-Whitney counts U_{ab} over comparable pairs of distinct cells, while the lattice-rank statistic is

V* = Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{n_ij} RL(Xijk),
where RL(Xijk) is the lattice rank of Xijk. The lattice rank for an observation in cell (i, j) is its rank among all observations in cells that are comparable to (i, j), i.e., all Xghk such that (g, h) ≼ (i, j) or (i, j) ≼ (g, h).
Traditional linear rank statistics can also be defined for a lattice ordering, for example,

L = Σ_{i=1}^{I} Σ_{j=1}^{J} c_ij Σ_{k=1}^{n_ij} Rijk,

where the c_ij are isotone coefficients with respect to the lattice order. We chose c_ij = (√i + √j)² = i + j + 2√(ij) to represent monotonic main effects and an interaction structure. The case c_ij = i + j turned out to be quite similar.
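The three statistics can be sketched on a small grid as follows. Since the summand of J is garbled in this excerpt, the sketch takes J to be the sum of Mann-Whitney counts over comparable pairs of distinct cells, which is an assumption consistent with the usual Jonckheere construction; the data layout and function names are ours:

```python
import numpy as np
from itertools import product

def grid_statistics(data):
    """J, V* and L for observations data[(i, j)] on an I x J grid with
    the lattice order (i, j) <= (k, l) iff i <= k and j <= l."""
    cells = sorted(data)
    prec = lambda a, b: a[0] <= b[0] and a[1] <= b[1]

    # J: Mann-Whitney counts summed over comparable pairs of distinct cells
    J = sum(int(np.sum(np.asarray(data[b], float)[:, None]
                       > np.asarray(data[a], float)[None, :]))
            for a, b in product(cells, cells) if a != b and prec(a, b))

    # V*: each observation contributes its lattice rank, i.e. its rank
    # among observations in all cells comparable to its own cell
    V = 0
    for c in cells:
        comp = np.sort(np.concatenate(
            [np.asarray(data[d], float) for d in cells if prec(c, d) or prec(d, c)]))
        V += int(sum(np.searchsorted(comp, x, side="right") for x in data[c]))

    # L: linear rank statistic with isotone coefficients c_ij = (sqrt i + sqrt j)^2
    pooled = np.sort(np.concatenate([np.asarray(data[c], float) for c in cells]))
    L = sum((np.sqrt(i) + np.sqrt(j)) ** 2
            * sum(np.searchsorted(pooled, x, side="right") for x in data[(i, j)])
            for (i, j) in cells)
    return J, V, L
```

On the 2 × 2 grid with one observation per cell, data = {(1,1): [1], (1,2): [2], (2,1): [3], (2,2): [4]}, the five comparable cell pairs each contribute one concordance, so J = 5.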
The statistics J, V* and L all reject H0 when large values are attained. These three statistics will be compared to a nonparametric version of the ORLS-motivated statistic, χ̄²₀₁, for testing homogeneity against ordered alternatives [see Robertson, Wright, and Dykstra (1988, Chapter 2)]. A general ORLS version of χ̄²₀₁ can be presented as follows:
If {(Ti, ni/σ²)}_{i=1}^{k} are random variables and specified weights associated with the k elements of a finite partially ordered set (S, ≼), then a test statistic for testing H0: the E(Ti) are constant versus H1: the E(Ti) satisfy the partial ordering (not all constant) is

χ̄²₀₁ = Σ_{i=1}^{k} (ni/σ²)(Ti* − T̄)²,
where (T1*, T2*, ..., Tk*) is the isotonic regression of {(Ti, ni/σ²)}_{i=1}^{k} [see Robertson, Wright, and Dykstra (1988, Definition 1.3.3)] and T̄ is the weighted grand mean of the Ti's. One rejects H0 for large values of χ̄²₀₁.
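For a simple (total) order the isotonic regression can be computed by the pool-adjacent-violators algorithm, which gives the following sketch of χ̄²₀₁; the lattice order on the grid requires more general isotonic-regression algorithms, so this illustrates only the form of the statistic:

```python
import numpy as np

def pava(t, w):
    """Pool-adjacent-violators: isotonic regression of values t with
    weights w under the simple order t*_1 <= ... <= t*_k."""
    out = []  # blocks of (pooled value, total weight, cell count)
    for ti, wi in zip(t, w):
        out.append((float(ti), float(wi), 1))
        # pool backwards while the monotonicity constraint is violated
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            v2, w2, c2 = out.pop()
            v1, w1, c1 = out.pop()
            out.append(((w1 * v1 + w2 * v2) / (w1 + w2), w1 + w2, c1 + c2))
    return np.concatenate([[v] * c for v, _, c in out])

def chibar01(t, w):
    """chi-bar^2_01 = sum_i w_i (t*_i - tbar)^2, with t* the isotonic
    regression of t and tbar the weighted grand mean of the t_i."""
    t, w = np.asarray(t, float), np.asarray(w, float)
    tstar = pava(t, w)
    tbar = np.sum(w * t) / np.sum(w)
    return float(np.sum(w * (tstar - tbar) ** 2))
```

An already isotone configuration is left unchanged by PAVA, so χ̄²₀₁ then equals the ordinary weighted sum of squares about the grand mean; a completely reversed trend is pooled into a constant fit and yields zero.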
For rank-transformed data with Wilcoxon scores, Chacko (1963) and Shorack (1967) proposed analogues of the χ̄²₀₁ test statistic, which can be thought of as ORLS versions of the Kruskal-Wallis test statistic. Shiraishi (1982) provided generalizations that cover scores other than Wilcoxon scores. For our study let Rijk denote the Wilcoxon-scored rank of Xijk based upon all n = Σ_{i,j} n_ij observations on the I × J grid. We will also consider the median scores:
φ(Rijk) = 0 if 0 < Rijk < (n + 1)/2,  1 if (n + 1)/2 ≤ Rijk ≤ n.
The latter transformation is often used when sampling from heavy-tailed distributions. For each cell (i, j) we compute the cell means of the scores, and the median-score version of the statistic takes the form

χ̄²₀₁(φR) = (n − 1)/(n p̄(1 − p̄)) Σ_{i=1}^{I} Σ_{j=1}^{J} n_ij (φ̄*_ij − p̄)²,

where the φ̄*_ij are the isotonized cell means of the scores and p̄ is their weighted grand mean.
2. Shift alternative distributions, Xijk = μij + eijk, where the eijk were independent identically distributed random errors from the families: standard Gaussian, double exponential, H = Z exp(hZ²/2) for h = 0.5, 1, 2 [see Hoaglin, Mosteller, and Tukey (1985)], and Gamma(γ) with density function ∝ x^{γ−1} e^{−x}, γ = 1/2, 1 (exponential), 2. All the distributions for the error, e, were scaled to have unit variance except for the H distributions, which were scaled so that P(−1 < e < 1) = .6826 to agree with a standard Gaussian. The Gamma distributions were selected to generate a skewed error.
3. The shape of the trend function, μij, on the I × J grid was determined to be the most important factor in power comparisons, but having two factors now squares the order of magnitude of the number of cases needing consideration compared with one dimension. For a two-factor design, the response as a function of the experimental levels is μij = θ + αi + βj + γij, where α and β are the main factors and γ the interaction. To represent a broad range of increasing trends on the grid we considered simple discretized monotonic functions on the unit square and simple monotone step functions. The following list provides some basic trend shapes on the 25-cell grid which should provide some insight into the relative merits of the proposed statistics.
Our choice of the terms "early" and "late" for trends Tr8-Tr11 refers to the position of the effects producing maximum change. Trends Tr1-Tr4 represent extreme cases of trend functions taking just two values.
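The error families in item 2 above can be generated as follows. Since H = Z exp(hZ²/2) is increasing in Z, P(|H| < e^{h/2}) = P(|Z| < 1) = .6826, so dividing by e^{h/2} achieves the required calibration exactly; the helper names are ours:

```python
import numpy as np

def h_errors(h, size, rng):
    """Heavy-tailed errors H = Z exp(h Z^2 / 2), scaled so that
    P(-1 < e < 1) = .6826 as for a standard Gaussian: H is increasing
    in Z, so dividing by exp(h/2) calibrates the interval exactly."""
    z = rng.standard_normal(size)
    return z * np.exp(h * z ** 2 / 2 - h / 2)

def gamma_errors(gamma, size, rng):
    """Skewed errors: Gamma(gamma), centered and scaled to unit variance
    (mean and variance of Gamma(gamma) both equal gamma)."""
    g = rng.gamma(gamma, size=size)
    return (g - gamma) / np.sqrt(gamma)
```

Adding a scaled trend μij·δ to errors generated this way reproduces the shift alternatives Xijk = μij + eijk used in the power study.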
relative superiority for the larger n. This superior behavior is also apparent for trends Tr7-Tr11 in Tables 34.3 and 34.4. While the L test and V* test exhibit superiority in some cases, the Jonckheere statistic appears to perform well over a wider range of strictly increasing trend shapes.
• The one-step trend shapes Tr2, Tr3, Tr4 and Tr6 reverse the superiority just discussed. For these trend shapes the χ̄²₀₁(R) statistic shows superior power over the J, L and V* statistics. The border one-step trend, Tr3, appears to be an extreme case. The one-main-effect trend, Tr6, is detailed in Tables 34.1 and 34.2.
• The angle three-step trend, Tr5, represents a degree of trend somewhat between the one-step and strictly increasing trends. Here, the J and V* statistics tend to have greater power than the χ̄²₀₁(R) statistic, but reversals are more frequent than for the stronger trends, especially for n = 1.
• Some caution must be used with the interpretation of the one-step trend, Tr1. Here, the cell parameter μ5,5 can be thought of as the only cell that deviates from H0. The reduction of the data to ranks results in very low power for small sample sizes, even as μ5,5 → ∞. For trend Tr1 the maximum power was of the order 0.05 for n = 1 and about 0.20 for n = 4. When comparing the two different sample sizes for this case, the patterns of efficiency were less consistent between sample sizes than for the other trend shapes.
• The V* statistic is defined to detect strong trends that occur at late levels of the experimental treatments. Its best performance was for the "late effects + interaction" trend, Tr10, and the one-step trend, Tr1. These observations are consistent with the discussion presented in Higgins and Bain (1999).
In conclusion, the study seems to suggest that if extremely heavy-tailed distributions are discounted, then the Jonckheere statistic or the χ̄²₀₁(R) statistic would be a reasonable choice of test statistic for testing a broad range of two-factor ordered alternatives. The Jonckheere statistic might be favored if the researcher believed that an increase in any one factor level should result in a strictly increased response. For experiments where the effectiveness of one factor or of many levels is questionable, the χ̄²₀₁(R) statistic would be favored.
Power Comparisons 457
Power and efficiency are computed for random variables of the form cX + μδ, where μ is a trend function on the grid scaled by δ, the distance of μ to H0, and X a random variable standardized by c. Distributions of X are symmetric: Gaussian Z, H = Z exp(hZ²/2) for h = .5, 1, 2, or skewed: exponential, gamma(.5), gamma(2). In the case of the uniform gradient trend μij ∝ (i − 1) + (j − 1) and for uniform steps μij ∝ (i − 1)
Power and efficiency are computed for random variables of the form cX + μδ, where μ is a trend function on the grid scaled by δ, the distance of μ to H0, and X a random variable standardized by a constant c. Distributions of X are symmetric: Gaussian Z, H = Z exp(hZ²/2) for h = .5, 1, 2, or skewed: exponential, gamma(.5), gamma(2). In the case of the uniform gradient trend μij ∝ (i − 1) + (j − 1) and for uniform steps μij ∝ (i − 1) on cells (i, j)
Table 34.3: Comparing ranges of efficiency of statistics and choosing a test for selected trend shapes and distributions, for α = 0.01, 5 × 5 grids and one observation per cell

Trend                Distribution    Jonckheere    Linear rank    V* test       Isotonic        Test
                                     test                                       median test     choice
                                     Min    Max    Min    Max     Min    Max    Min    Max
n = 1
One step             Gauss           1.21   1.34   1.14   1.24    2.15   2.26   0.18   0.36     V*
                     Skewed          1.08   1.50   0.98   1.32    1.57   2.17   0.18   0.58     V*
                     Heavy tail 1    1.13   1.45   1.11   1.40    1.87   2.34   0.20   0.50     V*
                     Heavy tail 2    1.07   1.37   1.18   1.30    2.07   2.40   0.20   0.38     V*
Diagonal             Gauss           0.90   0.96   0.91   0.96    0.71   0.83   0.74   0.77     KW
one step             Skewed          0.90   0.98   0.89   0.94    0.66   0.73   0.53   0.81     KW
                     Heavy tail 1    0.90   0.99   0.92   1.00    0.72   0.85   0.94   1.19     IM KW
                     Heavy tail 2    0.94   1.00   0.94   1.01    0.77   0.84   1.30   1.76     IM
Border               Gauss           0.50   0.77   0.39   0.63    0.53   0.90   0.30   0.62     KW
one step             Skewed          0.49   0.90   0.39   0.77    0.51   0.85   0.33   1.47     KW
                     Heavy tail 1    0.48   0.83   0.38   0.73    0.52   0.93   0.29   0.94     KW
                     Heavy tail 2    0.56   0.85   0.47   0.69    0.62   0.83   0.42   1.11     KW
Angle corner         Gauss           0.66   0.91   0.56   0.84    0.64   0.99   0.61   0.69     KW
one step             Skewed          0.65   0.92   0.57   0.78    0.63   0.81   0.70   1.71     KW IM
                     Heavy tail 1    0.64   0.95   0.55   0.82    0.64   0.92   0.71   1.03     KW
                     Heavy tail 2    0.77   0.91   0.66   0.81    0.74   0.86   0.98   1.49     IM KW
Angle                Gauss           0.94   1.08   0.91   1.05    0.96   1.15   0.62   0.71     KW V*
three steps          Skewed          0.98   1.08   0.93   0.99    0.89   1.05   0.66   0.93     KW Jonc
                     Heavy tail 1    0.96   1.09   0.92   1.03    0.98   1.12   0.74   1.06     V*
                     Heavy tail 2    1.06   1.11   0.99   1.03    1.04   1.11   1.04   1.52     IM Jonc
Uniform steps        Gauss           0.73   0.88   0.67   0.88    0.67   0.90   0.60   0.67     KW
(one effect)         Skewed          0.77   0.92   0.67   0.86    0.67   0.78   0.57   0.87     KW
                     Heavy tail 1    0.77   0.94   0.68   0.93    0.70   0.92   0.73   1.11     KW
                     Heavy tail 2    0.90   1.02   0.78   0.93    0.82   0.95   1.03   1.58     IM KW
Uniform gradient     Gauss           1.07   1.09   1.06   1.08    1.01   1.03   0.61   0.67     Jonc
(two uniformly       Skewed          1.05   1.24   1.01   1.13    0.83   1.10   0.56   0.85     Jonc
increasing effects)  Heavy tail 1    1.11   1.18   1.09   1.12    1.00   1.09   0.73   1.03     Jonc
                     Heavy tail 2    1.12   1.30   1.09   1.17    1.04   1.07   1.07   1.51     IM Jonc
Early effects        Gauss           1.03   1.08   1.08   1.09    0.89   0.97   0.61   0.66     L Jonc
                     Skewed          1.00   1.14   0.98   1.10    0.74   0.99   0.51   0.86     Jonc
                     Heavy tail 1    1.06   1.14   1.11   1.13    0.93   0.99   0.75   1.00     L Jonc
                     Heavy tail 2    1.13   1.27   1.11   1.18    0.98   1.03   1.08   1.52     IM Jonc
Late effects         Gauss           1.04   1.06   0.98   1.00    1.05   1.07   0.60   0.65     V* Jonc
                     Skewed          1.06   1.26   1.01   1.09    0.90   1.18   0.63   0.88     Jonc
                     Heavy tail 1    1.08   1.14   1.00   1.07    1.06   1.11   0.72   0.97     Jonc
                     Heavy tail 2    1.11   1.28   1.01   1.10    1.07   1.11   1.05   1.48     IM Jonc
Late effects         Gauss           1.04   1.06   1.06   1.09    1.15   1.18   0.57   0.65     V*
+ interaction        Skewed          1.04   1.27   1.06   1.18    0.99   1.27   0.64   0.83     Jonc V*
                     Heavy tail 1    1.09   1.16   1.09   1.15    1.17   1.24   0.71   0.96     V*
                     Heavy tail 2    1.10   1.28   1.08   1.17    1.12   1.21   1.04   1.46     IM Jonc
Early + late         Gauss           1.04   1.05   1.04   1.06    0.97   1.01   0.59   0.68     L Jonc
effects              Skewed          1.04   1.20   0.97   1.10    0.82   1.08   0.56   0.81     Jonc
                     Heavy tail 1    1.10   1.14   1.06   1.11    0.99   1.07   0.70   1.06     Jonc
                     Heavy tail 2    1.13   1.27   1.09   1.13    1.02   1.05   1.01   1.48     IM Jonc

Distributions: Gauss; Skewed: exponential, gamma(.5), gamma(2); Heavy tail 1: double exponential, H = Z exp(hZ²/2) for h = .5; Heavy tail 2: H for h = 1, 2
Table 34.4: Comparing effiency of statistics and choosing a test for selected
trend shapes and distributions and for Q; = 0.01, 5 X 5 grids and four observations
per cell
Trend Distribution Janckheere test Unear r..,k test I yo test I Isotanie medi.., test Test
Min Max Min Max Min Max Min Max choice
n=4
One step Gauss 0.48 0.85 0.48 0.82 0.90 1.48 0.20 0.46 KW V*
Skewed 0.49 1.04 0.48 1.01 0.94 1.67 0.20 1.51 KW V*
Heavy tail 1 0.45 0.95 0.44 0.99 0.86 1.81 0.16 0.61 KW V*
Heavy tail 2 0.59 0.96 0.59 0.91 1.10 1.62 0.28 0.81 V*
Diagonal Gauss 0.99 1.21 0.95 1.15 0.76 0.90 0.65 0.68 Jonc KW
one step Skewed 0.95 1.20 0.94 1.10 0.78 0.87 0.28 0.56 Jonc KW
Heavy tail 1 0.98 1.24 0.95 1.19 0.76 0.93 0.96 1.23 Jonc KW
Heavy tail 2 1.00 1.21 0.97 1.13 0.78 0.90 1.44 2.18 IM
Border Gauss 0.44 0.99 0.39 0.86 0.49 0.98 0.56 0.73 KW
one step Skewed 0.28 0.85 0.26 0.77 0.32 0.84 0.59 1.79 KW
Heavy tail 1 0.41 0.95 0.36 0.84 0.46 0.98 0.69 1.24 KW
Heavy tail 2 0.48 0.93 0.42 0.81 0.54 0.94 0.96 2.03 IM KW
Angle Gauss 0.73 1.01 0.63 0.88 0.67 0.91 0.63 0.64 KW
corner Skewed 0.64 0.99 0.56 0.89 0.63 0.92 0.41 0.91 KW
one step Heavy tail 1 0.71 1.07 0.62 0.93 0.66 1.00 0.94 1.18 KW IM
Heavy tail 2 0.75 1.06 0.66 0.90 0.70 0.94 1.40 2.20 IM KW
Angle Gauss 0.99 1.29 0.94 1.24 0.97 1.21 0.62 0.66 Jonc
three steps Skewed 0.95 1.17 0.89 1.08 0.98 1.09 0.36 1.06 Jonc
Heavy tail 1 0.98 1.29 0.93 1.20 0.95 1.20 0.89 1.14 Jonc
Heavy tail 2 1.03 1.27 0.94 1.20 0.95 1.21 1.34 2.11 IM Jonc
Uniform steps Gauss 0.76 1.02 0.73 0.98 0.89 0.92 0.62 0.65 KW
(one effect) Skewed 0.76 0.99 0.70 0.95 0.70 0.85 0.29 0.86 KW
Heavy tail 1 0.75 1.02 0.72 0.99 0.89 0.94 0.90 1.12 KW
Heavy tail 2 0.83 1.05 0.76 1.02 0.74 0.93 1.34 2.19 IM KW
Uniform gradient Gauss 1.18 1.22 1.16 1.21 1.08 1.09 0.63 0.67 Jonc L
(two uniformly Skewed 1.19 1.29 1.15 1.21 1.04 1.25 0.35 0.92 Jonc
increasing effects) Heavy tail 1 1.19 1.28 1.15 1.25 1.06 1.19 0.87 1.11 Jonc
Heavy tail 2 1.22 1.29 1.14 1.21 1.03 1.11 1.26 1.87 IM Jonc
Early effects Gauss 1.11 1.21 1.14 1.21 0.94 1.01 0.63 0.65 L Jonc
Skewed 1.12 1.21 1.11 1.20 0.92 1.10 0.30 0.77 L Jonc
Heavy tail 1 1.10 1.27 1.11 1.28 0.95 1.07 0.85 1.12 L Jonc
Heavy tail 2 1.16 1.24 1.13 1.25 0.96 1.03 1.25 1.84 IM L Jonc
Late effects Gauss 1.12 1.21 1.06 1.13 1.10 1.15 0.60 0.65 Jonc
Skewed 1.16 1.30 1.04 1.14 1.11 1.23 0.41 0.98 Jonc
Heavy tail 1 1.12 1.28 1.05 1.18 1.08 1.24 0.87 1.12 Jonc
Heavy tail 2 1.18 1.28 1.06 1.17 1.05 1.17 1.26 1.91 IM Jonc
Late effects Gauss 1.12 1.24 1.14 1.24 1.21 1.28 0.63 0.65 V*
+ interaction Skewed 1.15 1.31 1.12 1.23 1.22 1.30 0.47 1.08 V* Jonc
Heavy tail 1 1.14 1.28 1.13 1.27 1.18 1.34 0.85 1.10 V* Jonc
Heavy tail 2 1.19 1.25 1.15 1.23 1.13 1.28 1.22 1.85 IM V* Jonc
One early + Gauss 1.12 1.23 1.10 1.21 1.02 1.10 0.62 0.66 Jonc
one late effect Skewed 1.14 1.25 1.10 1.14 1.02 1.15 0.35 0.97 Jonc
Heavy tail 1 1.11 1.27 1.08 1.24 1.02 1.15 0.85 1.12 Jonc
Heavy tail 2 1.16 1.24 1.10 1.18 1.00 1.09 1.22 1.85 IM Jonc
Appendix
The simulation standard error of eff can be approximated by applying a Taylor
linearization to the functional form. For an estimator p̂ of a proportion p
we have Var(Φ⁻¹(p̂)) ≈ [φ(Φ⁻¹(p))]⁻² p(1 − p)/n, where n is the number of
simulations and Φ and φ denote the standard normal distribution and density
functions. For two proportions estimated from the same simulation, the
correlation coefficient satisfies ρ(Φ⁻¹(p̂₁), Φ⁻¹(p̂₂)) ≈ ρ(p̂₁, p̂₂). For a
variable of the form T² = [(Φ⁻¹(p₂) − Φ⁻¹(α)) / (Φ⁻¹(p₁) − Φ⁻¹(α))]², with α
treated as a constant, we have, after additional Taylor linearizations, an
approximation for the standard error of T² due to the simulation:

SE(T²) ≈ 2T² ( CV₁² + CV₂² − 2ρ(p̂₁, p̂₂) CV₁ CV₂ )^{1/2},

where CV_i² = [φ(Φ⁻¹(p_i)) · (Φ⁻¹(p_i) − Φ⁻¹(α))]⁻² p_i(1 − p_i)/n. The value
of ρ is estimated from the simulation. Using this approximation, examples of
estimates for the standard error of eff are 0.0281 and 0.0127 when
p₁ = p₂ = 0.15 and p₁ = p₂ = 0.50, respectively. (Here, we used ρ = 0.30.)
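The approximation above is straightforward to compute. A minimal sketch (the function name and the choice of n are ours; Φ⁻¹ and φ are taken to be the standard normal quantile and density, as the formula suggests):

```python
from statistics import NormalDist

def se_T2(p1, p2, alpha, rho, n):
    """Delta-method (Taylor linearization) standard error of
    T^2 = [(Phi^-1(p2) - Phi^-1(alpha)) / (Phi^-1(p1) - Phi^-1(alpha))]^2
    based on n simulation runs, following the appendix formula."""
    nd = NormalDist()
    za, z1, z2 = nd.inv_cdf(alpha), nd.inv_cdf(p1), nd.inv_cdf(p2)
    T2 = ((z2 - za) / (z1 - za)) ** 2
    # CV_i^2 = p_i(1 - p_i)/n / [phi(Phi^-1(p_i)) (Phi^-1(p_i) - Phi^-1(alpha))]^2
    cv1sq = p1 * (1 - p1) / n / (nd.pdf(z1) * (z1 - za)) ** 2
    cv2sq = p2 * (1 - p2) / n / (nd.pdf(z2) * (z2 - za)) ** 2
    return 2 * T2 * (cv1sq + cv2sq - 2 * rho * (cv1sq * cv2sq) ** 0.5) ** 0.5
```

With p₁ = p₂ the ratio T² equals 1 and the expression collapses to 2(2(1 − ρ))^{1/2} CV₁; in all cases the standard error shrinks like n^{-1/2}.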
References
1. Ager, J. W. and Brent, S. B. (1978). An index of agreement between a
hypothesized partial order and an empirical rank order, Journal of the
American Statistical Association, 73, 827-830.
10. Robertson, T., Wright, F. T., and Dykstra, R. (1988). Order Restricted
Statistical Inference, Chichester: John Wiley & Sons.
Paul Deheuvels
L.S.T.A., Université Paris VI, Bourg-la-Reine, France

P(X > x, Y > y) = exp( −(x + y) A( x/(x + y) ) ) for x > 0, y > 0,  (35.1)
463
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
464 P. Deheuvels
Towards the aim of testing (H.0) against the general alternative (H.1) that
A(u) ≠ 1 for some 0 < u < 1, we introduce the empirical process [see, for
example, Deheuvels (1991)]
(35.2)

where we set

X̄_n = (1/n) ∑_{i=1}^n X_i  and  Ȳ_n = (1/n) ∑_{i=1}^n Y_i.  (35.3)
Theorem 35.1.1 Under (H.0), the empirical process {Z_n(u) : 0 ≤ u ≤ 1}
converges weakly in (C[0,1], U) to {Z(u) : 0 ≤ u ≤ 1} as n → ∞.
Given this result, it is natural to consider the general class of tests of inde-
pendence of X and Y based upon statistics of the form Θ(Z_n), where Θ is an
appropriate functional on (C[0,1], U). Since, under suitable continuity condi-
tions on Θ, the limiting distribution as n → ∞ of Θ(Z_n) is that of Θ(Z), it is
not too difficult to make use of tests of this type, as long as the distribution
of Θ(Z) can be evaluated. In the following, we present some examples of this
kind. For further details on this problem, we refer to Deheuvels and Martynov
(1996, 2000).
and almost surely continuous sample paths. Under these assumptions [see,
for example, Adler (1990, pp. 66-79), Kac and Siegert (1947), Shorack and
Wellner (1986, pp. 206-218)] there exist constants λ₁ ≥ λ₂ ≥ ... ≥ 0, together
with continuous functions e₁(t), e₂(t), ..., on [0, 1] (the eigenfunctions of the
covariance kernel R(u, v)), such that the following properties (K.1-2-3-4) are
fulfilled.

(K.1) The {e_k : k ≥ 1} are orthonormal in L²[0, 1], i.e.,

∫₀¹ e_i(t) e_j(t) dt = 1 if i = j, and 0 if i ≠ j.  (35.5)
(K.2) The {(λ_k, e_k) : k ≥ 1} form a complete set of solutions of the Fredholm-
type equation in (λ, e),

(K.3) We have

(35.9)

where

(35.12)
The relations (35.12)-(35.13) have applications when {Z(u) : 0 ≤ u ≤ 1} is the
weak limit (with respect to an appropriate topology) of a sequence {Z_n(u) :
0 ≤ u ≤ 1} of empirical processes. In this framework, the statistic
(35.14)

is the Cramér-von Mises statistic [see, for example, Shorack and Wellner (1986,
Proposition 1 and Theorem 1, pp. 213-217), Durbin (1973, p. 32), Darling
(1955, p. 15), Smirnov (1948), Anderson and Darling (1952), and Darling (1957)].
- The limiting process Z(t) = B(t)/√(t(1 − t)) of the Anderson-Darling statistic,
where B(t) denotes a Brownian bridge [see, for example, Anderson and Darling
(1954), Watson (1961, 1967), Shorack and Wellner (1986, pp. 148 and 224-227)].
In this case, λ_k = 1/(k(k + 1)) for k ≥ 1.
It turns out that in the framework of our study, the KL expansion of the
centered Gaussian process Z with covariance function (35.4) has been obtained
by Deheuvels and Martynov (2000). Their result is given in the next theorem.
The following notation and facts from the theory of orthogonal polynomials will
be needed.
The Jacobi polynomials [see, for example, Tricomi (1970, pp. 160-177) and
Szegő (1967)], denoted by P_n^{α,β}(x) for n ≥ 0 and α, β > −1, with x ∈ [−1, 1],
yield the modified Jacobi polynomials [see, for example, Chihara (1978, (2.1),
p. 143)] via the change of variable u = (x + 1)/2. These are defined, for n ≥ 0
and 0 ≤ u ≤ 1, by

Q_n^{α,β}(u) = P_n^{α,β}(2u − 1)
 = ((−1)^n / n!) · (1 / (u^β (1 − u)^α)) · (d^n/du^n){ u^{β+n} (1 − u)^{α+n} }.  (35.15)

The modified Jacobi polynomials {Q_n^{α,β} : n ≥ 0} fulfill the orthogonality rela-
tions [see, for example, Chihara (1978, (2.18), p. 148)], for m, n ≥ 0,

∫₀¹ Q_m^{α,β}(u) Q_n^{α,β}(u) u^β (1 − u)^α du = 0 when m ≠ n,  (35.16)
 = Γ(n + α + 1) Γ(n + β + 1) / ( (2n + α + β + 1) Γ(n + α + β + 1) n! ) when m = n.

For α = β = 2, the polynomials satisfy, for m, n ≥ 0 and 0 ≤ u ≤ 1,

∫₀¹ Q_m^{2,2}(u) Q_n^{2,2}(u) u² (1 − u)² du = 0 when m ≠ n,
 = (1/(2n + 5)) × ( (n + 1)(n + 2) / ((n + 3)(n + 4)) ) when m = n.
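As a sanity check on (35.15)-(35.16), the α = β = 2 case can be verified with exact rational arithmetic. This is our own illustration (assuming the Rodrigues form above); polynomials are represented as coefficient lists in ascending powers of u:

```python
from fractions import Fraction
from math import comb, factorial

def deriv(c):
    # derivative of a polynomial given by ascending coefficients
    return [Fraction(k) * c[k] for k in range(1, len(c))]

def div_linear(c, r):
    # exact division of polynomial c by (u - r); the remainder must vanish
    n = len(c) - 1
    b = [Fraction(0)] * n
    b[n - 1] = c[n]
    for i in range(n - 2, -1, -1):
        b[i] = c[i + 1] + r * b[i + 1]
    assert c[0] + r * b[0] == 0
    return b

def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def jacobi_Q22(n):
    # Rodrigues formula (35.15) with alpha = beta = 2:
    # Q_n(u) = ((-1)^n/n!) u^{-2}(1-u)^{-2} d^n/du^n [u^{n+2}(1-u)^{n+2}]
    c = [Fraction(0)] * (n + 2) + \
        [Fraction((-1) ** j * comb(n + 2, j)) for j in range(n + 3)]
    for _ in range(n):
        c = deriv(c)
    c = c[2:]                                    # exact division by u^2
    c = div_linear(div_linear(c, Fraction(1)), Fraction(1))  # by (1-u)^2
    s = Fraction((-1) ** n, factorial(n))
    return [s * x for x in c]

def weighted_inner(m, n):
    # integral over [0,1] of Q_m Q_n u^2 (1-u)^2, computed exactly
    w = [Fraction(0), Fraction(0), Fraction(1), Fraction(-2), Fraction(1)]
    p = poly_mul(poly_mul(jacobi_Q22(m), jacobi_Q22(n)), w)
    return sum(c / (k + 1) for k, c in enumerate(p))
```

For instance, `jacobi_Q22(1)` gives 3(2u − 1) and the diagonal inner products match the constant (n + 1)(n + 2)/((2n + 5)(n + 3)(n + 4)) exactly.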
Theorem 35.2.1 Let Z and R(u, v) be as in (35.4). Then the properties
(K.1-2-3-4) hold with

λ_k = 6 / ( k(k + 1)(k + 2)(k + 3) ) for k ≥ 1,  (35.19)
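These eigenvalues are summable, as the (K.3)-type condition requires; in fact the series telescopes to 1/3. A quick exact check of this (our own illustration, assuming (35.19) as stated):

```python
from fractions import Fraction

def lam(k):
    # eigenvalues from (35.19)
    return Fraction(6, k * (k + 1) * (k + 2) * (k + 3))

def partial_sum(K):
    # 1/(k(k+1)(k+2)(k+3)) telescopes as
    # (1/3)[1/(k(k+1)(k+2)) - 1/((k+1)(k+2)(k+3))],
    # so the partial sum is exactly 1/3 - 2/((K+1)(K+2)(K+3))
    return sum(lam(k) for k in range(1, K + 1))
```

The full sum ∑_k λ_k = 1/3, which, by the general KL identity ∑_k λ_k = ∫₀¹ R(u, u) du, equals E ∫₀¹ Z(u)² du.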
Tests of Independence 469
× ∑_{j=0}^{k−1} ( (k+1) choose (k−1−j) ) ( (k−1) choose j ) u^{k−j} (u − 1)^{j+1}.  (35.21)
PROOF. Combine Theorems 35.1.1 and 35.2.1 with (35.12)-(35.13) and (35.26). ∎
The statistic J_n in (35.26) allows us to test (H.0) against (H.1) by rejecting
the null hypothesis when J_n exceeds a critical level c_{n,α}, chosen in such a way
that, for a specified 0 < α < 1, P(J_n ≥ c_{n,α} | (H.0)) = α. The evaluation of the
exact values of c_{n,α} for the various possible choices of the risk level α ∈ (0, 1)
and the sample size n ≥ 1 is beyond the scope of the present paper. Below, we
limit ourselves to some selected values of the limiting constants c_α such that

(35.28)
Remark 35.3.1 We will not discuss here the efficiency of the test based upon
J_n with respect to alternative methods [refer to Balakrishnan and Basu (1995)].
Among the many possible statistics based upon functionals of Z_n which (in
addition to J_n) may be used to test (H.0) against (H.1), one should mention
the principal component test statistics

(35.29)
A direct consequence of the above theorems is that, under (H.0), for each k ≥ 1,
(35.30)
and we may use this property to reject (H.0) when ±T_{n,k} or |T_{n,k}| exceeds the
appropriate quantiles of the N(0, 1) law. The fact that the e_k have explicit
expressions allows a simple use of this methodology. For example, making use
of (35.22), we obtain readily, under (H.0), that, as n → ∞,

(√35/√n) ∑_{i=1}^n { X_i Y_i − (X_i Ȳ_n + Y_i X̄_n) − 1/3 } →_d N(0, 1),
so that the test of (H.0) based upon T_{n,1} is consistent. This property is not
shared in general by T_{n,k} for k ≥ 2. For example, for k = 2 and A(u) = A(1 − u)
for 0 ≤ u ≤ 1, we infer from (35.22) that

∫₀¹ {A(u) − 1} e₂(u) du = ∫₀¹ {A(u) − 1} √210 u(1 − u)(1 − 2u) du = 0.
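The function e₂(u) = √210 u(1 − u)(1 − 2u), as read from the display above, is antisymmetric about u = 1/2, which is what annihilates the integral for symmetric A; it also has unit L²[0, 1] norm. A small exact check (our own illustration):

```python
from fractions import Fraction

# p(u) = u(1-u)(1-2u) = u - 3u^2 + 2u^3, ascending coefficients
p = [Fraction(0), Fraction(1), Fraction(-3), Fraction(2)]

def poly_int01(c):
    # exact integral of a polynomial over [0, 1]
    return sum(x / (k + 1) for k, x in enumerate(c))

def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# squared L2 norm of e_2: 210 * integral of p(u)^2 du, which should be 1
norm_sq = 210 * poly_int01(poly_mul(p, p))
```

The exact value of ∫₀¹ p(u)² du is 1/210, so the normalizing constant √210 is consistent with (K.1).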
J_n^+ = (1/4) ∫₀¹ { Z_n(u) + Z_n(1 − u) }² du,  (35.32)

and

J_n^− = (1/4) ∫₀¹ { Z_n(u) − Z_n(1 − u) }² du.  (35.33)
(35.34)

and

lim_{n→∞} E( exp(iu J_n^−) ) = ∏_{ℓ=1}^∞ (1 − 2iu λ_{2ℓ})^{−1/2}.  (35.35)
References
1. Adler, R. J. (1990). An introduction to continuity, extrema, and related
topics for general Gaussian processes, IMS Lecture Notes-Monograph Se-
ries 12, Hayward, California: Institute of Mathematical Statistics.
21. Durbin, J. (1973). Distribution theory for tests based upon the sample
distribution function, Regional Conference Series in Applied Mathematics,
9, Philadelphia: SIAM.
22. Einmahl, J. H. J., de Haan, L., and Huang, X. (1993). Estimating a multi-
dimensional extreme-value distribution, Journal of Multivariate Analysis,
47, 35-47.
23. Falk, M., Hüsler, J., and Reiss, R. D. (1994). Laws of Small Numbers:
Extremes and Rare Events, Basel: Birkhäuser.
27. Grad, A. and Solomon, H. (1955). Distribution of quadratic forms and some
applications, Annals of Mathematical Statistics, 26, 464-477.
32. Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994). Continuous Uni-
variate Distributions, Volume 1, New York: John Wiley & Sons.
34. Kuelbs, J. (1976). A strong convergence theorem for Banach space valued
random variables, Annals of Probability, 4, 744-771.
51. Smirnov, N. V. (1948). Table for estimating the goodness of fit of empirical
distributions, Annals of Mathematical Statistics, 19, 279-281.
52. Smith, R. L., Tawn, J. A., and Yuen, H. K. (1990). Statistics of multivariate
extremes, International Statistical Review, 58, 47-58.
58. Watson, G. S. (1967). Another test for the uniformity of a circular
distribution, Biometrika, 54, 675-676.
36
Testing Problem for Increasing Function in a
Model with Infinite Dimensional Nuisance
Parameter
36.1 Introduction
Let X₁ be a random variable with distribution function F(t) and density
function f(t), let Ψ(t) be an increasing absolutely continuous function, and let
Φ(t) be the inverse function Φ = Ψ⁻¹. We put X₂ = Φ(X₁). The distribution function
478 M. Nikulin and V. Soley
the first one with the distribution function F(t) and the density function f(t),
and the second one with the distribution function G(t) = F(Ψ(t)) and the
density function g(t) = f(Ψ(t)) ψ(t). Let

(36.2)

where

Φ(t) = Φ(θ; t), Ψ(t) = Ψ(θ; t), φ(t) = φ(θ; t), ψ(t) = ψ(θ; t),

and

0 < lim_{n→∞} n₁/n = p₁,  0 < lim_{n→∞} n₂/n = p₂.  (36.4)
We note here that this problem is very natural in accelerated life testing;
see, for example, Bagdonavičius and Nikulin (1999, 2000), Gerville-Réache and
Nikoulina (1999), and Bagdonavičius, Gerville-Réache, Nikoulina and Nikulin (2000).
It is convenient for us to admit that the random variables T_{1,j} or T_{2,i} may take the
value +∞ with positive probability.
We shall use the Kaplan-Meier estimators F̂_{n₁}(t) and Ĝ_{n₂}(t) as the non-
parametric estimators for F(t) and G(θ; t),

1 − F̂_{n₁}(t) = ∏_{j : Z_{1,j} ≤ t} ( (n₁ − j) / (n₁ − j + 1) )^{κ_{1,j}}

(with the convention that ∏_∅ = 1), where Z_{1,1} ≤ ... ≤ Z_{1,n₁} are the order
statistics of the sample Y¹ = (Y_{1,1}, ..., Y_{1,n₁}) and Z_{2,1} ≤ ... ≤ Z_{2,n₂} are the
order statistics of the sample Y² = (Y_{2,1}, ..., Y_{2,n₂}).
We consider the kernel density estimator f̂_n(t) for estimating f(t) at a fixed
point t ∈ R,

(36.5)

and the kernel density estimator ĝ_n(t) for estimating g(θ; t) at a fixed point
t ∈ R,

(36.6)
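The displays (36.5)-(36.6) are lost to the scan, but a kernel density estimator of this kind typically takes the form f̂_n(t) = (1/(n h_n)) ∑_i K((t − Y_i)/h_n). A sketch under that assumption (the kernel choice and the function name are ours):

```python
import math

def kde(t, data, h):
    """Kernel density estimate at t with bandwidth h,
    using the Gaussian kernel K(x) = exp(-x^2/2)/sqrt(2*pi)."""
    n = len(data)
    k = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return sum(k((t - y) / h) for y in data) / (n * h)
```

The estimate integrates to 1 over the line, and bandwidths with h_n → 0 and n h_n → ∞ give consistency at continuity points of the density.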
J(θ) = [a(θ), b(θ)], where a(θ) = Ψ(θ; a), b(θ) = Ψ(θ; b).
We suppose for simplicity that the function Ψ(θ; ·) is defined on the real line and
the image of Ψ(θ; ·) does not depend on θ. We define the function f(θ, θ₁; t) by
the relation

(36.7)

then

(36.8)
We put

For a nonnegative function r₀(t) we define the function r₀(t) by the relations

It is clear that

θ̂_n = argmin_θ ∫_I | f̂_n(t) − f_n*(θ; t) |² r₀(t) dt.  (36.10)

It is evident that

T_n = ∫_I | f̂_n(t) − f_n*(θ; t) |² r₀(t) dt

So we have

H*(θ; t) = H(θ; Ψ(θ; t)).
We assume that

D1. The functions H(t) and H*(θ; t) satisfy the conditions:

b < t* = sup{ t : H(t) < 1 } and b < t*(θ) = sup{ t : H*(θ; t) < 1 }.

We suppose too that the density function f(t), the weight r₀(t) and the distri-
bution functions H(t), H(θ; t) satisfy the following conditions on some appro-
priately chosen interval [a, b]:
F1. The function f is continuously differentiable on the interval [a, b + ε] for some
ε > 0, the function (∂/∂t) f is absolutely continuous on [a, b + ε], and

sup_{t ∈ [a, b+ε]} | (∂f/∂t)(t) / f(t) | < ∞,  sup_{t ∈ [a, b+ε]} | (∂²f/∂t²)(t) | < ∞.
F2. The functions f and g satisfy the condition

where

f_{nK}(t) = ∫ (1/h_n) K( (t − u)/h_n ) f(u) du,

and

g_n*(θ; t) = ∫ (1/h_n) K( (t − u)/h_n ) g(θ; u) du.
Q1. The functions Q₁(t) and Q₂(t) are absolutely continuous on [a, b + ε] and

sup_{t ∈ [a, b+ε]} | q_j(t) | < ∞, where q_j(t) = (d/dt) Q_j(t) (j = 1, 2);
and ⟨·, ·⟩, ‖·‖ are the inner product and the norm in R^d. Similarly, a function
φ(θ; t) (as a function of θ) that is differentiable (in the space L²_r) in θ, in the
Estimation of Increasing Function 483
∇φ(θ; ·) − ∇φ(θ₀; ·) = H_φ(θ₀; ·)(θ − θ₀) + ϑ_{θ₀}(θ; ·),

where

‖ϑ_{θ₀}(·, θ)‖₂ → 0 in the space L²_r, when ‖θ − θ₀‖ → 0.
Here for a matrix A we write ‖A‖₂² = tr A*A, where A* denotes the conjugate
matrix, and

H_φ(θ; t) = ( ∂²φ(θ; t) / ∂θ_i ∂θ_j )_{i,j=1}^d.  (36.14)
Burke, Csörgő, and Horváth (1981) proved that under the above conditions the
following proposition holds.
(36.15)

where

I_n(p) = ∫ | f̂_n(t) − f(t) |^p r(t) dt.
and

θ̂_n = argmin_θ ∫_I | f̂_n(t) − f_n*(θ; t) |² r₀(t) dt,

so that

∫_I | f̂_n(t) − f_n*(θ̂_n; t) |² r₀(t) dt →^P 0 as n → ∞,  (36.19)

and therefore

(36.20)

∫_I | f(θ, θ̂_n; t) − f_n*(θ̂_n; t) |² r₀(t) dt →^P 0 as n → ∞.  (36.21)

∫_I | f(t) − f(θ, θ̂_n; t) |² r₀(t) dt →^P 0 as n → ∞.  (36.22)

∎
We suppose that the function f(θ, θ₁; ·) is continuous in θ₁ in the space L²_{r₀}
in some neighborhood of the true value θ and for any ε > 0

(36.24)

Since

δ(ε) → 0, when ε → 0,

and, as follows from Proposition 36.2.2, if θ is the true value of the parameter,
then

P{ ∫ | f(t) − f(θ, θ̂_n; t) |² r₀(t) dt ≥ δ } → 0 as n → ∞,
Let

Y_{(1,n)} ≤ Y_{(2,n)} ≤ ... ≤ Y_{(n,n)}

be the order statistics of the sample Y = {Y₁, ..., Y_n}. We consider the
Kaplan-Meier estimator F̂_n(t) of the distribution function F(t),

1 − F̂_n(t) = ∏_{j : 1 ≤ j ≤ n, Y_{(j,n)} ≤ t} ( (n − j) / (n − j + 1) )^{K_j}.  (36.26)

Notice that in the case when P{X_j ≤ T_j} = 1 the Kaplan-Meier estimator
F̂_n(t) coincides with the usual empirical distribution function F_n(t),

F_n(t) = (1/n) ∑_{i=1}^n 1_{(−∞, t)}(X_i).
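The product-limit form (36.26) is easy to compute directly. A minimal sketch (the function name is ours; K_j is the event indicator, 1 for an uncensored observation):

```python
from fractions import Fraction

def kaplan_meier_survival(times, events, t):
    """1 - F_n(t) from the product-limit formula (36.26):
    product over ranks j with Y_(j,n) <= t of ((n-j)/(n-j+1))^{K_j}."""
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    s = Fraction(1)
    for j, i in enumerate(order, start=1):   # j is the rank of Y_(j,n)
        if times[i] <= t and events[i]:
            s *= Fraction(n - j, n - j + 1)
    return s
```

With no censoring (all K_j = 1) the product telescopes to (n − m)/n, where m is the number of observations ≤ t, which is exactly 1 − F_n(t) for the empirical distribution function, as the text notes; censored observations simply skip their factor.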
p(t) = f(t) / ( (1 − H(t))(1 − F(t)) ) = f(t) / ( (1 − Q₁(t))(1 − F(t))² ),  D(a) = 0.  (36.27)
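The two expressions for p(t) coincide because, for Y = min(X, T) with X and T independent, 1 − H(t) = (1 − F(t))(1 − Q₁(t)). A quick check of that identity with exponential X and T (our own illustration, not the paper's model):

```python
import math

def surv_min_exponentials(lam, mu, t):
    """P(min(X, T) > t) = P(X > t) P(T > t) for independent
    X ~ Exp(lam), T ~ Exp(mu); this equals exp(-(lam + mu) t)."""
    sx = math.exp(-lam * t)   # 1 - F(t)
    st = math.exp(-mu * t)    # 1 - Q1(t)
    return sx * st
```

Comparing the product against the closed form exp(−(λ + μ)t) confirms the factorization used above.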
Burke, Csörgő, and Horváth (1981) proved that under the condition

where

Thus,

(d/dt) Y(t) = −y(t) f(t),

and if

ℓ(y) = ∫_{[a,b]} f(t) y(t) dt = 0, then Y(a) = Y(b) = 0.
PROOF. Under the conditions F(a) = 0, F(b) < 1, integration by parts gives

2 ∫_{[a,b]} y(t) Y(t) f(t) / (1 − F(t)) dt = Y²(a) + ∫_{[a,b]} Y²(t) f(t) / (1 − F(t))² dt.  (36.31)

Since

( A[y](t) )² f(t) / (1 − F(t))² = y²(t) f(t) − 2 y(t) Y(t) f(t) / (1 − F(t)) + Y²(t) f(t) / (1 − F(t))²,
Then the relation

Y(y) = ∫_{[a,b]} y(t) dW*(t)

determines a Gaussian random variable with zero mean and variance σ²(y).

∫_{[a,b]} W(V(t)) f(t) y(t) dt
Further we shall denote by S the set of all such functions y(t). Integration by
parts gives

∫_{[a,b]} W(V(t)) f(t) y(t) dt = ∫_{[a,b]} Y(t) dW(V(t)).

Hence for such a function y(t) the integral

∫_{[a,b]} y(t) dW*(t) = ∫_{[a,b]} [ (1 − F(t)) y(t) − Y(t) ] dW(V(t)) = ∫_{[a,b]} A[y](t) dW(V(t)).
Since the last integral is well defined for any y(t) such that

and the set S is dense in the space with semi-norm ‖·‖*, we can define the
integral with respect to dW*(t) for any function φ with ‖φ‖* < ∞ by the relation

∎
Let W(t) be a Wiener process and let

where

Here

g(θ; t) / ( (1 − Q₂(t))(1 − G(θ; t))² ),  D(a) = 0.  (36.33)
Thus, we have

(1/√n₁) ∫ (1/h_n) K( (t − u)/h_n ) dW_n*(u) + (1/√n₁) ∫ (1/h_n) K( (t − u)/h_n ) dR_n(u),
and, on [a(θ), b(θ)], we have

n h_n^{1/2} ∫ | (1/√n₂) φ(θ; t) ∫ (1/h_n) K( (Φ(θ; t) − u)/h_n ) dR_n(u) |² r₀(t) dt →^P 0 as n → ∞.
From Lemma 36.3.4 and Lemma 36.3.5 follows the next lemma.

Lemma 36.3.6 Under the conditions of Lemma 36.3.4 and Lemma 36.3.5
and the conditions F2 we have

I_{n₂}(θ; t) = (1/√n₂) φ(θ; t) ∫ (1/h_n) K( (Φ(θ; t) − u)/h_n ) dW̄_n(u),

r_n →^P 0 as n → ∞.
References
1. Bagdonavičius, V. and Nikulin, M. (1999). On semiparametric estimation
of reliability from accelerated life testing, In Statistical and Probabilis-
tic Models in Reliability (Eds., D. Ionescu and N. Limnios), pp. 75-89,
Boston: Birkhäuser.
7. Csörgő, M., Gombay, E., and Horváth, L. (1991). Central limit theorems
for Lp distances of kernel estimators of densities under random censorship,
Annals of Statistics, 19, 1813-1831.
Masafumi Akahira
University of Tsukuba, Ibaraki, Japan
37.1 Introduction
In their paper, Hodges and Lehmann (1970) introduced the concept of (asymp-
totic) deficiency as follows. Let δ_n be a statistical procedure based on n obser-
vations, and δ_{k_n} a less effective procedure which requires a larger number k_n
of observations to give equally good performance. They called the additional
number k_n − n of observations needed by the procedure δ_{k_n} the deficiency. If
d := lim_{n→∞}(k_n − n) exists, it is called the asymptotic deficiency. In the higher
order asymptotics, the concept of asymptotic deficiency is very useful in dis-
criminating asymptotically efficient estimators [see, for example, Akahira (1986,
1999a, 1999b)]. Then, it is desirable for the value of the asymptotic deficiency
496 M. Akahira
(37.3)

If there exists

d := lim_{n→∞} (k_n − n)

and

n^α r(δ_{j,n}) = a + b_j/n + o(1/n), j = 1, 2,  (37.4)

then

lim_{n→∞} (k_n − n) = (b₂ − b₁) / (αa).
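The conclusion can be checked numerically: if r(δ_{j,n}) = (a + b_j/n)/n^α, the sample size k that equates r(δ_{2,k}) with r(δ_{1,n}) satisfies k − n → (b₂ − b₁)/(αa). A sketch with made-up constants (a = 1, α = 1, b₁ = 0, b₂ = 2, so the limit is 2):

```python
def risk(n, a, alpha, b):
    # r(delta_n) = (a + b/n) / n^alpha, the expansion in (37.4)
    return (a + b / n) / n ** alpha

def matching_size(n, a, alpha, b1, b2):
    """Real-valued k with risk(k; b2) = risk(n; b1), found by bisection
    (risk is decreasing in the sample size)."""
    target = risk(n, a, alpha, b1)
    lo, hi = n, 10.0 * n
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if risk(mid, a, alpha, b2) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For large n, `matching_size(n, 1.0, 1.0, 0.0, 2.0) - n` is close to the limiting deficiency (b₂ − b₁)/(αa) = 2.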
PROOF. Since the risks r(δ_{1,n}) and r(δ_{2,n}) are positive and converge to zero as
n → ∞, we have k_n → ∞ as n → ∞. Since, by (37.3)

Since

(37.6)

Since lim_{n→∞} k_n = ∞, it is easily seen from (37.6) that lim_{n→∞} k_n/n = 1.
Subtracting a^{1/α} and multiplying by n in (37.6), we have for large n

a^{1/α} n / [k_n] − b₂/(αa) + o(1) = a^{1/α} (k_n − n)/k_n + a^{1/α} − b₁/(αa) + o(1).  (37.7)
If, for each j = 1, 2, the (as.) variance V_θ(θ̂_{j,n}) admits the expansion

(37.9)

which is the gen. (as.) deficiency (by as. variance) of θ̂_{2,k_n} relative to θ̂_{1,n}, where
I(θ) is the amount of the Fisher information of X₁, i.e.
Example 37.2.1 Suppose that X₁, X₂, ..., X_n, ... is a sequence of i.i.d. ran-
dom variables according to the normal distribution with mean θ and variance
1, where n ≥ 2. Then we consider the two estimators

θ̂_{1,n} := X̄ = (1/n) ∑_{i=1}^n X_i,

θ̂_{2,n}^{(w)} := (1/n) { X₁ + ... + X_{n−2} + wX_{n−1} + (2 − w)X_n },
Generalized Asymptotic Deficiency 499
V_θ(θ̂_{2,n}^{(w)}) = 1/n + (2/n²)(w − 1)².  (37.11)

Note that the Fisher information number I(θ) is equal to 1 and θ̂_{1,n} is the
UMVU estimator of θ. From (37.8) and (37.11) we have
which is the gen. as. deficiency (by variance) of θ̂_{2,k_n}^{(w)} relative to θ̂_{1,n}. Here

and

V_θ(θ̂_{2,k_n}^{(w)}) = (1 − π_{k_n}) V_θ(θ̂_{2,[k_n]}^{(w)}) + π_{k_n} V_θ(θ̂_{2,[k_n]+1}^{(w)}).
On the other hand, from (37.9) and (37.11) we have Δ₁(θ) ≡ 0, Δ₂(θ) ≡
2(w − 1)², hence

(37.13)
It is easily seen from (37.12) and (37.13) that (37.10) holds. For example, if
w = 1/4, it follows from (37.13) that d(θ̂_{2,k_n}^{(1/4)}, θ̂_{1,n}) = 9/8. We define

V_θ(θ̂_{2,n+(9/8)}^{(1/4)}) = (7/8) V_θ(θ̂_{2,n+1}^{(1/4)}) + (1/8) V_θ(θ̂_{2,n+2}^{(1/4)}).

From Theorem 37.2.1 we have

This means that θ̂_{2,k_n}^{(1/4)} asymptotically needs a sample 9/8 larger (in the
continualized sense) than θ̂_{1,n} for θ̂_{2,k_n}^{(1/4)} to be asymptotically equivalent to
θ̂_{1,n} in the variance.
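The variance expansion (37.11) and the resulting deficiency are easy to verify directly (our own sketch; the function names are not the paper's):

```python
def var_theta2(n, w):
    """Exact variance of (1/n)(X_1 + ... + X_{n-2} + w X_{n-1} + (2 - w) X_n)
    for i.i.d. X_i with unit variance; equals 1/n + 2(w - 1)^2/n^2 as in (37.11)."""
    return ((n - 2) + w ** 2 + (2 - w) ** 2) / n ** 2

def deficiency(w):
    # Delta_2 - Delta_1 = 2(w - 1)^2, with I(theta) = 1
    return 2.0 * (w - 1.0) ** 2
```

In particular `deficiency(0.25)` gives 1.125 = 9/8, matching the example.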
(37.14)
minimum chi-square; θ̂_mcs
modified minimum chi-square; θ̂_mmcs, x², 2
minimum Haldane discrepancy; θ̂_mHD_k, x^{k+1}, k+1
minimum Hellinger distance; θ̂_mHd, −x^{1/2}, 1/2
minimum Kullback-Leibler separator; θ̂_KL, x log x, 1
Since any m.d. estimator θ̂_g in L has a bias, in order to adjust the bias up
to the order o(1/n), let

(37.15)

where

Theorem 37.3.1 The gen. as. deficiency (by as. variance) of θ̂_g* relative to θ̂_ml*
is given by

where

Furthermore, θ̂_ml* has the minimum gen. as. deficiency (by as. variance) in L*.
θ̂_mcs =

respectively, and

I(θ) = 2 / ( θ(1 − 2θ) ),

d(θ̂_mcs*, θ̂_ml*) = (1 − 2θ)/(4θ) > 0,

where the bias-adjustment is due to (37.15). For example, if θ = 1/4, d(θ̂_mcs*, θ̂_ml*)
= 1/2. We newly denote θ̂_mcs* and θ̂_ml* by θ̂_mcs*(n) and θ̂_ml*(n) based on the sam-
ple (X₁, ..., X_n) and define
References
1. Akahira, M. (1986). The Structure of Asymptotic Deficiency of Estima-
tors, Queen's Papers in Pure and Applied Mathematics 75, Kingston,
Canada: Queen's University Press.