
Statistics for Industry and Technology

Series Editor

N. Balakrishnan
McMaster University
Department of Mathematics and Statistics
1280 Main Street West
Hamilton, Ontario L8S 4K1
Canada

Editorial Advisory Board

Max Engelhardt
EG&G Idaho, Inc.
Idaho Falls, ID 83415

Harry F. Martz
Group A-1, MS F600
Los Alamos National Laboratory
Los Alamos, NM 87545

Gary C. McDonald
NAO Research & Development Center
30500 Mound Road
Box 9055

Warren, MI 48090-9055

Peter R. Nelson
Department of Mathematical Sciences
Clemson University
Martin Hall
Box 341907

Clemson, SC 29634-1907

Kazuyuki Suzuki
Communication & Systems Engineering Department
University of Electro-Communications
1-5-1 Chofugaoka
Chofu-shi
Tokyo 182
Japan
Goodness-of-Fit Tests and
Model Validity

C. Huber-Carol
N. Balakrishnan
M.S. Nikulin
M. Mesbah
Editors

Springer Science+Business Media, LLC


C. Huber-Carol
Laboratoire de Statistique Medicale
Universite Rene Descartes - Paris 5
75006 Paris
France

N. Balakrishnan
Department of Mathematics and Statistics
McMaster University
Hamilton, Ontario L8S 4K1
Canada

M. S. Nikulin
Laboratoire Statistique Mathematique
Universite Bordeaux 2
33076 Bordeaux Cedex
France
and
Laboratory of Statistical Methods
V. Steklov Mathematical Institute
191011 St. Petersburg
Russia

M. Mesbah
Laboratoire de Statistique Appliquee
Universite de Bretagne Sud
56000 Vannes
France

Library of Congress Cataloging-in-Publication Data

A CIP catalogue record for this book is available from the Library of Congress,
Washington D.C., U.S.A.

AMS Subject Classifications: 62-06, 62F03

Printed on acid-free paper.


© 2002 Springer Science+Business Media New York
Originally published by Birkhäuser Boston in 2002
Softcover reprint of the hardcover 1st edition 2002

All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher Springer Science+Business Media, LLC, except for brief excerpts in
connection with reviews or scholarly analysis. Use in connection with any form of information storage
and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the
former are not especially identified, is not to be taken as a sign that such names, as understood by the
Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

ISBN 978-1-4612-6613-6 ISBN 978-1-4612-0103-8 (eBook)


DOI 10.1007/978-1-4612-0103-8

Typeset by the editors in LaTeX.

9 8 7 6 5 4 3 2 1
Contents

Preface xvii
Contributors xix
List of Tables xxvii
List of Figures xxxiii

PART I: HISTORY AND FUNDAMENTALS

1 Karl Pearson and the Chi-Squared Test 3


D. R. Cox
1.1 Karl Pearson 1857-1937: Background to the
Chi-Squared Paper 3
1.2 K. P.: After Chi-Squared 5
1.3 The 1900 Paper 5
1.4 Importance of the Chi-Squared Test 6
References 8

2 Karl Pearson Chi-Square Test: The Dawn of


Statistical Inference 9
C. R. Rao
2.1 Introduction 9
2.2 Large Sample Criteria: The Holy Trinity 11
2.2.1 Likelihood ratio criterion 11
2.2.2 Wald test 12
2.2.3 Rao's score test 12
2.3 Specification Tests for a
Multinomial Distribution 13
2.3.1 Test of a simple hypothesis 13
2.3.2 Tests of a composite hypothesis 14
2.3.3 Test for goodness-of-fit in a subset of cells 15
2.3.4 Analysis of chi-square 17
2.3.5 Some applications of the chi-square test 18
2.4 Other Tests of Goodness-of-Fit 18


2.5 Specification Tests for Continuous Distributions 20


References 22

3 Approximate Models 25
Peter J. Huber

3.1 Models 25
3.2 Bayesian Modeling 27
3.3 Mathematical Statistics and Approximate Models 29
3.4 Statistical Significance and Relevance 31
3.5 Composite Models 32
3.6 The Role of Simulation 38
3.7 Summary Conclusions 40
References 40

PART II: CHI-SQUARED TEST

4 Partitioning the Pearson-Fisher Chi-Squared


Goodness-of-Fit Statistic 45
G. D. Rayner

4.1 Introduction 45
4.2 Neyman Smooth Goodness-of-Fit Tests 46
4.2.1 Smooth goodness-of-fit tests for
categorized data 47
4.2.2 Partitioning the Pearson-Fisher
chi-squared statistic 48
4.3 Constructing the Pearson-Fisher Decomposition 49
4.4 Simulation Study 50
4.5 Results and Discussion 51
References 55

5 Statistical Tests for Normal Family in Presence of


Outlying Observations 57
Aïcha Zerbet

5.1 The Chi-Squared Test of Normality in the


Univariate Case 57
5.1.1 Example: Analysis of the data of Milliken 59
5.2 Bol'shev Test for Outliers 59
5.2.1 Stages of applications of the test of Bol'shev 60
5.2.2 Example 2: Analysis of the data of Daniel (1959) 60
5.3 Power of the Chi-Squared Test 61
References 63

6 Chi-Squared Test for the Law of Annual Death Rates:


Case with Censure for Life Insurance Files 65
Leo Gerville-Reache

6.1 Introduction 65
6.2 Chi-Squared Goodness-of-Fit Test 66
6.2.1 Statistics with censure 66
6.2.2 Goodness-of-fit test for a composite hypothesis 67
6.3 Demonstration 68
References 69

PART III: GOODNESS-OF-FIT TESTS FOR


PARAMETRIC DISTRIBUTIONS

7 Shapiro-Wilk Type Goodness-of-Fit Tests for


Normality: Asymptotics Revisited 73
Pranab Kumar Sen

7.1 Introduction 73
7.2 Preliminary Notion 74
7.3 SOADR Results for BLUE and LSE 77
7.4 Asymptotics for W~ 81
7.5 Asymptotics Under Alternatives 85
References 87

8 A Test for Exponentiality Based on Spacings for


Progressively Type-II Censored Data 89
N. Balakrishnan, H. K. T. Ng, and N. Kannan

8.1 Introduction 89
8.2 Progressive Censoring 91
8.3 Test for Exponentiality 92
8.3.1 Null distribution of T 93
8.4 Power Function Approximation and Simulation
Results 95
8.4.1 Approximation of power function 95
8.4.2 Monte Carlo power comparison 97
8.5 Modified EDF and Shapiro-Wilk Statistics 98
8.6 Two-Parameter Exponential Case 99
8.7 Illustrative Examples 100
8.7.1 Example 1: One-parameter exponential case 100
8.7.2 Example 2: Two-parameter exponential case 101
8.8 Multi-Sample Extension 102
8.9 Conclusions 103
References 103

9 Goodness-of-Fit Statistics for the Exponential


Distribution When the Data are Grouped 113
Sneh Gulati and Jordan Neus

9.1 Introduction 113


9.2 The Model and the Test Statistics 115
9.3 Asymptotic Distribution 116
9.4 Power Studies 119
References 122

10 Characterization Theorems and Goodness-of-Fit Tests 125


Carol E. Marchetti and Govind S. Mudholkar

10.1 Introduction and Summary 126


10.2 Characterization Theorems 127
10.2.1 Entropy characterizations 127
10.2.2 Statistical independence 128
10.3 Maximum Entropy Tests 130
10.4 Four Z Tests 131
10.5 Byproducts: The G-IG Analogies 134
References 137

11 Goodness-of-Fit Tests Based on Record Data and


Generalized Ranked Set Data 143
Barry C. Arnold, Robert J. Beaver, Enrique Castillo,
and Jose Maria Sarabia

11.1 Introduction 143


11.2 Record Data 144
11.3 Generalized Ranked Set Data 144
11.4 Power 150
11.5 Composite Null Hypotheses 154
11.6 Remarks 156
References 156

PART IV: REGRESSION AND GOODNESS-OF-FIT TESTS

12 Gibbs Regression and a Test of Goodness-of-Fit 161


Lynne Seymour

12.1 Introduction 161


12.2 The Motivation and the Model 162
12.3 Application and Evaluation of the Model 165
12.4 Discussion 169
References 170
13 A CLT for the L_2 Norm of the Regression Estimators
Under α-Mixing: Application to G-O-F Tests 173
Cheikh A. T. Diack

13.1 Introduction 173


13.2 Estimators 174
13.3 A Limit Theorem 175
13.4 Inference 177
13.5 Proofs 178
References 183

14 Testing the Goodness-of-Fit of a Linear Model in


Nonparametric Regression 185
Zaher Mohdeb and Abdelkader Mokkadem

14.1 Introduction 185


14.2 The Test Statistic 186
14.3 Simulations 189
References 193

15 A New Test of Linear Hypothesis in Regression 195


Y. Baraud, S. Huet, and B. Laurent

15.1 Introduction 195


15.2 The Testing Procedure 196
15.2.1 Description of the procedure 197
15.2.2 Behavior of the test under the null
hypothesis 198
15.2.3 A toy framework: The case of a known
variance 198
15.3 The Power of the Test 198
15.3.1 The main result 198
15.3.2 Rates of testing 199
15.4 Simulations 201
15.4.1 The simulation experiment 201
15.4.2 The testing procedure 202
15.4.3 The test proposed by Horowitz and
Spokoiny (2000) 202
15.4.4 Results of the simulation study 203
15.5 Proofs 203
15.5.1 Proof of Theorem 15.3.1 203
15.5.2 Proof of Corollary 15.3.1 204
References 206

PART V: GOODNESS-OF-FIT TESTS IN SURVIVAL ANALYSIS


AND RELIABILITY

16 Inference in Extensions of the Cox Model for


Heterogeneous Populations 211
Odile Pons
16.1 Introduction 211
16.2 Non-Stationary Cox Model 212
16.3 Varying-Coefficient Cox Model 219
References 224

17 Assumptions of a Latent Survival Model 227


Mei-Ling Ting Lee and G. A. Whitmore
17.1 Introduction 227
17.2 Latent Survival Model 228
17.3 Data and Parameter Estimation 229
17.4 Model Validation Methods 230
17.5 Remedies to Achieve a Better Model Fit 233
References 235

18 Goodness-of-Fit Testing for the Cox Proportional


Hazards Model 237
Karthik Devarajan and Nader Ebrahimi
18.1 Introduction 237
18.2 Goodness-of-Fit Testing for the Cox PH Model 240
18.3 Comparison of the Proposed Goodness-of-Fit Test
with Existing Methods 242
18.4 Illustration of the Goodness-of-Fit Test using
Real-Life Data 249
18.5 Concluding Remarks 250
References 251

19 A New Family of Multivariate Distributions for


Survival Data 255
Shulamith T. Gross and Catherine Huber-Carol

19.1 Introduction 255


19.2 Frailty Models: An Overview 255
19.3 The Model 257
19.4 An Application to Skin Grafts Rejection 261
19.4.1 Description of the data 261
References 264

20 Discrimination Index, the Area Under the ROC Curve 267


Byung-Ho Nam and Ralph B. D'Agostino

20.1 Introduction 268


20.2 Nonparametric Confidence Interval for Area under
the ROC Curve 269
20.2.1 Discrimination in logistic regression 269
20.2.2 Estimation of the shift parameter under
the shift model 271
20.2.3 Confidence interval for the area under the
ROC curve 272
20.3 Extension of C Statistic to Survival Analysis 273
Appendix 277
References 279

21 Goodness-of-Fit Tests for Accelerated Life Models 281


Vilijandas Bagdonavicius and Mikhail S. Nikulin
21.1 Introduction 281
21.2 Generalized Sedyakin's Model 282
21.3 Alternatives to the GS Model 286
21.3.1 Proportional hazards model 286
21.3.2 Model including influence of
switch-up's of stresses on reliability 287
21.4 Test Statistic for the GS Model 287
21.5 Asymptotic Distribution of the Test Statistic 288
21.6 The Test 293
21.7 Consistency and the Power of the Test Against
Approaching Alternatives 293
References 296

PART VI: GRAPHICAL METHODS AND


GENERAL GOODNESS-OF-FIT TESTS

22 Two Nonstandard Examples of the


Classical Stratification Approach to
Graphically Assessing Proportionality of Hazards 301
Niels Keiding

22.1 Introduction 301


22.2 Some Approaches to Testing Proportionality of
Hazards 302
22.3 "Proportionality" in Discrete-Time Regression for
Retro-Hazard 303

22.4 The Renewal Assumption in Modulated Renewal


Processes 304
References 308

23 Association in Contingency Tables, Correspondence


Analysis, and (Modified) Andrews Plots 311
Ravindra Khattree and Dayanand N. Naik

23.1 Introduction 311


23.2 (Modified) Andrews Plots in Correspondence
Analysis 313
23.3 Some Examples 314
23.4 Modified Andrews Plots and Rao's Correspondence
Analysis 320
23.5 Conclusions 325
References 325

24 Orthogonal Expansions and Distinction Between


Logistic and Normal 327
Carles M. Cuadras and Daniel Cuadras

24.1 Introduction 327


24.2 Orthogonal Expansion in Principal Components 328
24.3 Maximum Correlation for the Logistic Distribution 331
24.4 Distinction Between Logistic and Normal 333
References 339

25 Functional Tests of Fit 341


Denis Bosq

25.1 Introduction 341


25.2 Behaviour of ||Tn|| in Distribution 342
25.3 Consistency of FTF Tests and Rate of Convergence 344
25.4 Adjacent Hypothesis 346
25.5 Choosing a Kernel 347
25.6 Local Efficiency of FTF Tests 348
25.7 Indications Concerning the Proofs 351
25.8 Simulations 352
References 355

26 Quasi Most Powerful Invariant Tests of Goodness-of-Fit 357


Gilles R. Ducharme and Benoit Frichot

26.1 Introduction 357


26.2 Laplace Approximation 358
26.3 Quasi Most Powerful Invariant Test 359
References 360

PART VII: MODEL VALIDITY IN QUALITY OF LIFE

27 Test of Monotonicity for the Rasch Model 365


Jean Bretagnolle

27.1 Results of the Literature 365


27.2 Extension of Hoeffding Result 366
27.3 A Questionnaire Model 366
27.4 Simulations about the Level in the
Conditional Test Case 368
27.5 Simulations about the Power under H_A 369
27.6 Conclusion 369
References 369

28 Validation of Model Assumptions in Quality of


Life Measurements 371
A. Hamon, J. F. Dupuy, and M. Mesbah

28.1 Introduction 371


28.2 Classical Theory 372
28.3 SIP Mobility Data (I) 373
28.4 The Rasch Model 375
28.4.1 Goodness-of-fit tests 375
28.4.2 A graphical method 378
28.5 SIP Mobility Data (II) 378
28.6 Conclusion 382
References 382

PART VIII: TESTS OF HYPOTHESES AND ESTIMATION


WITH APPLICATIONS

29 One-Sided Hypotheses in a Multinomial Model 387


Richard M. Dudley and Dominique M. Haughton

29.1 Introduction 387


29.2 Putting Multiple Data Sets Into an i.i.d. Form 388
29.3 Model Selection Criteria 388
29.4 Application to 2 x 2 Contingency Tables 390
29.5 Common Odds Ratio Profile Likelihoods 391
29.6 Jeffreys Priors for Mixture Models 391
29.7 Posterior Probabilities that Models are Best 393
29.8 Data on Long-Term Aspirin Therapy after an MI 394
29.9 Numerical Results 395
29.10 Discussion and Conclusions 396
References 397

30 A Depth Test for Symmetry 401


Peter J. Rousseeuw and Anja Struyf

30.1 Introduction 401


30.2 Location Depth and Angular Symmetry 402
30.3 A Test for Angular Symmetry 405
30.4 Regression Depth and Linearity of the
Conditional Median 407
References 411

31 Adaptive Combination of Tests 413


Yadolah Dodge and Jana Jureckova

31.1 Introduction 413


31.2 Adaptive Combination of Estimators 415
31.3 Adaptive Combination of Tests 417
31.3.1 Adaptive combination of F-test and
median-type test 420
31.3.2 Adaptive combination of M-test and
median-type test 421
References 423

32 Partially Parametric Testing 425


J. C. W. Rayner

32.1 Partially Parametric Inference 425


32.2 S-Sample Smooth Tests for Goodness-of-Fit 426
32.3 Partially Parametric Alternatives to the t-Test 428
32.4 Tests for the Location of Modes 430
References 432

33 Exact Nonparametric Two-Sample Homogeneity Tests 435


Jean-Marie Dufour and Abdeljelil Farhat

33.1 Introduction 435


33.2 Test Statistics 437
33.3 Exact Randomized Permutation Tests 440
33.4 Simulation Study 442
33.5 Conclusion 444
References 447

34 Power Comparisons of Some Nonparametric Tests for


Lattice Ordered Alternatives in Two-Factor Experiments 449
Thu Hoang and Van L. Parsons

34.1 Introduction 449


34.2 Hypotheses and Test Statistics 450
34.3 Test Statistic Power Evaluations 452

34.4 Results and Conclusions 455


Appendix 461
References 461

35 Tests of Independence with Exponential Marginals 463


Paul Deheuvels
35.1 Introduction 463
35.2 Karhunen-Loeve Expansions 465
35.3 Applications to Tests of Independence 470
References 472

36 Testing Problem for Increasing Function in


a Model with Infinite Dimensional Nuisance Parameter 477
M. Nikulin and V. Solev
36.1 Introduction 477
36.2 Consistency of the Estimator 483
36.3 Asymptotic Behavior of Kernel Estimators of Densities 486
References 492

37 The Concept of Generalized Asymptotic Deficiency


and its Application to the Minimum Discrepancy
Estimation 495
M. Akahira
37.1 Introduction 495
37.2 The Concept of Generalized Asymptotic Deficiency 496
37.3 An Application to the Minimum Discrepancy
Estimation 500
References 502

Index 505
Preface

Commemorating the centennial anniversary of the landmark paper by Karl


Pearson on the chi-square goodness-of-fit test, an International Conference on
Goodness-of-Fit Tests and Model Validity was organized in Paris, France, during
May 29-31, 2000. This conference successfully attracted numerous statisticians
from all over the world, many of them renowned experts in this area of research.
The conference thus provided the participants with details on historical devel-
opments, elaborate surveys of pertinent topics, information on new research
work, and many lively after-lecture discussions. We thank Natacha Heutte,
Chantal Guihenneuc, Min Thu Do Hoang, Anouar Benmalek, Jean Marie Tri-
cot, Jean François Petiot, Florence Duguesnoy, Leo Gerville-Reache and Valia
Nikoulina for helping us with the organization of the conference. We also thank
the French Ministry of Education and Research for their financial support, and
the French Group of Biometric Society and the French Statistical Society for
their support and cooperation. Thanks are expressed to Habida Mesbah for her
patience and her special delivery of forgotten items.
This volume presents a broad spectrum of papers presented at the Interna-
tional Conference. It includes 37 articles in total which, for better presentation
as well as convenience of the readers, have been divided into the following eight
parts:
Part I - History and Fundamentals
Part II - Chi-Squared Test
Part III - Goodness-of-Fit Tests for Parametric Distributions
Part IV - Regression and Goodness-of-Fit Tests
Part V - Goodness-of-Fit Tests in Survival Analysis and Reliability
Part VI - Graphical Methods and General Goodness-of-Fit Tests
Part VII - Model Validity in Quality of Life
Part VIII - Tests of Hypotheses and Estimation with Applications
The articles in this volume provide a clear testimony to the importance
and significance of work relating to the theory, methods and applications of
goodness-of-fit tests and model validity. We sincerely hope that the readers will
find this volume of interest. It is also our hope that new researchers will gain
insight as well as new ideas from this volume, which may possibly encourage


them to work in this fertile area of research.


We express our thanks to Lauren Schultz (Birkhauser, Boston) for taking a
keen interest in this project, and to Elizabeth Lowe (of Texniques) for assisting
us with the production of this volume. We express our gratitude to all authors
for sending in their articles in time and in good form. We express our gratitude
to Odile Pons, Ion Grama and some anonymous reviewers for helping us with
critically examining merits of the papers during the editorial process. We thank
Mme Curmi of the financial services of the Universite Paris V for her competence
and kindness. Thanks are also expressed to the different personnel at Universite de
Paris V, Universite de Bretagne Sud, Universite de Bordeaux 2 and McMaster
University for providing support and help in order to organize the conference
smoothly and successfully. Our special thanks go to Debbie Iscoe (Canada) for
a fine job in typesetting this entire volume in a camera-ready form.

Paris, France C. Huber-Carol


Hamilton, Ontario, Canada N. Balakrishnan
Bordeaux, France; St. Petersburg, Russia M. S. Nikulin
Vannes, France M. Mesbah

May 2001
Contributors

Akahira, Masafumi
Institute of Mathematics, University of Tsukuba, Ibaraki 305-8571, Japan
e-mail: [email protected]
Arnold, Barry C.
Department of Statistics, University of California, Riverside, California
92521-0138, U.S.A.
e-mail: [email protected]
Bagdonavicius, V.
Departement de Mathematiques et Sciences Sociales, Universite Bordeaux
2, 33076 Bordeaux Cedex, France
e-mail: [email protected]
Balakrishnan, N.
Department of Mathematics and Statistics, McMaster University,
Hamilton, Ontario L8S 4K1, Canada
e-mail: [email protected]
Baraud, Y.
Ecole Normale Superieure, Paris, France
Beaver, Robert J.
Department of Statistics, University of California, Riverside, California
92521-0138, U.S.A.
e-mail: [email protected]
Bosq, Denis
Laboratoire de Probabilites, Universite Paris VI, 4, Place Jussieu, 75252
Paris Cedex 05, France
e-mail: [email protected]
Bretagnolle, Jean
Laboratoire de Statistique Appliquee, Universite de Paris XI, 91405 Orsay
Cedex, France
e-mail: Jean.Bretagnolle@math.u-psud.fr

Castillo, Enrique
Department of Applied Mathematics and Sciences, University of Cantabria,
E-39005 Santander, Cantabria, Spain
e-mail: [email protected]

Cox, D. R.
Department of Statistics, Nuffield College, Oxford OX1 1NF, England,
U.K.
e-mail: [email protected]

Cuadras, Carles M.
Department of Statistics, University of Barcelona, 08023 Barcelona, Spain
e-mail: [email protected]

Cuadras, Daniel
University of Barcelona, 08023 Barcelona, Spain

D 'Agostino, Ralph B.
Statistics and Consulting Unit, Department of Mathematics and
Statistics, Boston University, Boston, Massachusetts 02215, U.S.A.
e-mail: ralph@bu.edu

Deheuvels, Paul
L.S.T.A., Universite Paris VI, 92340, Bourg-la-Reine, France
e-mail: [email protected]

Devarajan, Karthik
Division of Statistics, Northern Illinois University, DeKalb, Illinois 60115,
U.S.A.

Diack, Cheikh A. T.
Department of Statistics, University of Warwick, Coventry CV4 7AL,
U.K.
e-mail: [email protected]

Dodge, Yadolah
Groupe de Statistique, University of Neuchatel, CH-2002 Neuchatel,
Switzerland
e-mail: [email protected]

Ducharme, Gilles R.
Departement des Sciences Mathematiques, Universite Montpellier II, 34095
Montpellier Cedex 5, France
e-mail: [email protected]

Dudley, Richard M.
Department of Mathematics, Massachusetts Institute of Technology,
Cambridge, Massachusetts 02215, U.S.A.
e-mail: [email protected]

Dufour, J ean-Marie
CIRANO and CRDE, Universite de Montreal, Montreal, Quebec H3C
3J7, Canada
e-mail:
Dupuy, Jean-Francois
Department of Applied Statistics, University of South Brittany, 56000
Vannes, France
e-mail: [email protected]

Ebrahimi, Nader
Division of Statistics, University of Northern Illinois, DeKalb, Illinois
60115, U.S.A.
e-mail: [email protected]

Farhat, Abdeljelil
CIRANO, Universite de Montreal, Montreal, Quebec H3A 2A5, Canada
e-mail: [email protected]

Frichot, Benoit
Departement des Sciences Mathematiques, Universite Montpellier II, 34095
Montpellier Cedex 5, France
e-mail: [email protected]

Gerville-Reache, Leo
Departement de Mathematiques et Sciences Sociales, Universite Bordeaux
2, 33076 Bordeaux Cedex, France
e-mail: [email protected]

Gross, Shulamith T.
Laboratoire de Statistique, Universite de Paris V, 75006 Paris, France
e-mail: [email protected]

Gulati, Sneh
Department of Statistics, Florida International University, Miami, Florida
33199, U.S.A.
e-mail: [email protected]

Hamon, Agnes
Laboratoire SABRES, Universite de Bretagne-Sud, 56000 Vannes, France
e-mail: [email protected]
Haughton, Dominique M.
Mathematical Sciences, Bentley College, Waltham, Massachusetts 02452-
4705, U.S.A.
e-mail: [email protected]
Hoang, Thu
Laboratoire de Statistique Medicale, Universite de Paris V, 75006 Paris,
France
e-mail: [email protected]
Huber, P. J.
P.O. Box 198, CH-7250, Klosters, Switzerland
e-mail: [email protected]
Huber-Carol, Catherine
Universite Paris V and U472 INSERM, Paris, France
e-mail: [email protected]
Huet, S.
INRA Biometrie, 78352 Jouy-en-Josas Cedex, France
Jureckova, Jana
Statistics Department, Charles University, Czech Republic
e-mail: [email protected]
Kannan, N.
Division of Mathematics and Statistics, The University of Texas at San
Antonio, Texas 78249-0664, U.S.A.
e-mail: [email protected]
Keiding, Niels
Department of Biostatistics, University of Copenhagen, 2200 Copenhagen,
Denmark
e-mail: [email protected]
Khattree, Ravindra
Department of Mathematics and Statistics, Oakland University, Rochester,
Michigan 48309-4485, U.S.A.
e-mail: [email protected]
Laurent, B.
Laboratoire de Statistique, Universite de Paris XI, 91405 Orsay Cedex,
France
e-mail: [email protected]

Lee, Mei-Ling Ting


Channing Laboratory, Harvard University, Boston, Massachusetts 02115-
5804, U.S.A.
e-mail: [email protected]
Marchetti, Carol E.
Rochester Institute of Technology, Rochester, New York 14623, U.S.A.
e-mail: [email protected]
M esbah, M ounir
Department of Applied Statistics, University of South Brittany, 56000
Vannes, France
e-mail: [email protected]

M ohdeb, Zaher
Departement de Mathematiques, Universite Mentouri Constantine, 25000
Constantine, Algeria
e-mail: [email protected]
Mokkadem, Abdelkader
Department of Mathematics, University of Versailles-Saint-Quentin, 78035
Versailles Cedex, France
e-mail: [email protected]

Mudholkar, G. S.
Department of Statistics, University of Rochester, Rochester, New York
14627-0047, U.S.A.
e-mail: [email protected]
Naik, Dayanand N.
Department of Mathematics and Statistics, Oakland University, Rochester,
Michigan 48309-4485, U.S.A.
Nam, Byung-Ho
Statistics and Consulting Unit, Department of Mathematics and Statis-
tics, Boston University, Boston, Massachusetts 02215, U.S.A.
e-mail: [email protected]
Neus, Jordan
Biostatistics, State University of New York at Stony Brook, Stony Brook,
New York, U.S.A.
e-mail: [email protected]
Ng, H. K. T.
Department of Mathematics and Statistics, McMaster University,
Hamilton, Ontario L8S 4K1, Canada
e-mail: [email protected]

Nikulin, M. S.
UFR de Mathematiques, Informatique et Sciences Sociales, Universite
Bordeaux 2, Bordeaux, France
e-mail: [email protected]

Parsons, Van L.
National Center for Health Statistics, Hyattsville, Maryland 20782-2003,
U.S.A.
e-mail: [email protected]

Pons, Odile
INRA Biometrie, 78352 Jouy-en-Josas Cedex, France
e-mail: [email protected]

Rao, C. R.
Department of Statistics, Pennsylvania State University, University Park,
Pennsylvania 16802, U.S.A.
e-mail: crr1@psu.edu

Rayner, G. D.
School of Computing and Mathematics, Deakin University, Geelong,
VIC3217 Australia
e-mail: [email protected]

Rayner, J. C. W.
School of Mathematics and Applied Statistics, University of Wollongong,
Wollongong NSW 2522, Australia
e-mail: [email protected]

Rousseeuw, P. J.
Department of Mathematics and Computer Science, University of Antwerp,
Universiteitsplein 1, B-2610 Antwerp, Belgium
e-mail: [email protected]

Sarabia, Jose Maria


Economics Department, University of Cantabria, E-39005 Santander,
Cantabria, Spain
e-mail: [email protected]

Sen, P. K.
Department of Biostatistics, University of North Carolina at Chapel Hill,
North Carolina 27599-7400, U.S.A.
e-mail: [email protected]

Seymour, Lynne
Department of Statistics, The University of Georgia, Athens, Georgia
30602-1952, U.S.A.
e-mail: [email protected]

Solev, V.
The Laboratory of Statistical Methods, Steklov Mathematical Institute,
St. Petersburg, 191011, Russia
e-mail: [email protected]

Struyf, Anja
Research Assistant, FWO, 1000 Brussels, Belgium

Whitmore, G. A.
McGill University, Montreal, Quebec H3A 2T5, Canada

Zerbet, Aïcha
Departement de Mathematiques et Sciences Sociales, Universite Bordeaux
2, 33076 Bordeaux Cedex, France
e-mail: [email protected]
List of Tables

Table 4.1 Simulated percentage test sizes using (asymptotic) χ₁² 53
critical values for the component tests V̂3², V̂4², V̂5², and
V̂6² under different categorisations of the data: the un-
categorised method (u) [Rayner and Best (1989, Chapter
6)], and my method under two different categorisation
schemes (C1 and C2)
Table 4.2 Simulated and asymptotic (χ₁²) critical values for V̂3², V̂4², 54
V̂5², and V̂6² under different categorisations of the data:
the uncategorised method (u) [Rayner and Best (1989,
Chapter 6)], and my method under two different cate-
gorisation schemes (C1 and C2)

Table 8.1 Progressive censoring schemes used in the Monte Carlo 104
simulation study
Table 8.2 Monte Carlo power estimates for Weibull distribution at 105
10% and 5% levels of significance
Table 8.3 Monte Carlo power estimates for Lomax distribution at 106
10% and 5% levels of significance
Table 8.4 Monte Carlo power estimates for Lognormal distribution 107
at 10% and 5% levels of significance
Table 8.5 Monte Carlo power estimates for Gamma distribution at 108
10% and 5% levels of significance
Table 8.6 Monte Carlo null probabilities of T for exponential dis- 109
tribution at levels 2.5 (2.5) 50%
Table 8.7 Simulated and approximate values of the power of T* at 109
10% and 5% levels of significance

Table 9.1 Power comparisons, n = 50, 5 cutpoints @ 0.4, 0.8, 1.2, 119
1.6, 2.0
Table 9.2 Power comparisons, n = 50, 9 cutpoints @ 0.25,0.5,0.75, 119
1.0, 1.25, 1.5, 1.75, 2.0, 2.25


Table 11.1 Simulation based upper 90, 95 and 99th percentiles of the 146
statistic T for different values of n and m
Table 11.2 Accuracy of chi-square approximations for percentiles of 147
T
Table 11.3 Simulation based upper 90, 95 and 99th percentiles of the 148
statistic T for different values of n and m
Table 11.4 Accuracy of chi-square approximations for percentiles of 149
T
Table 11.5 Power of the T test of size .05 with a standard normal 151
null hypothesis
Table 11.6 Power of the T test of size .05 with a standard normal 152
null hypothesis
Table 11.7 Power of the UkOD test of size .05 with a standard nor- 153
mal null hypothesis
Table 11.8 Ranked set sample of shrub sizes 154

Table 12.1 Interaction profile 166


Table 12.2 Parameter estimates 167
Table 12.3 Gamma parameters for MCMC Pearson statistics 168
Table 12.4 Results for MCMC Pearson statistic 168
Table 12.5 Percentiles under Gibbs regression 169

Table 14.1 Empirical quantiles, when σ² is estimated by Sn² (theo- 190
retical values at level 1%, 5%, 10%, are 2.33, 1.65, 1.28
respectively)
Table 14.2 Empirical quantiles, when σ² is estimated by σ̂n² (theo- 190
retical values at level 1%, 5%, 10%, are 2.33, 1.65, 1.28
respectively)
Table 14.3 Proportion of rejections in 1000 samples of size n = 50; 191
with two examples of alternatives: h(t) = a₁t + a₂ +
ρte^(−2t) and h(t) = a₁t + a₂ + ρt², (σ² estimated by Sn²)
Table 14.4 Proportion of rejections in 1000 samples of size n = 50; 192
with two examples of alternatives: h(t) = a₁t + a₂ +
ρte^(−2t) and h(t) = a₁t + a₂ + ρt², (σ² estimated by σ̂n²)

Table 15.1 Percentage of rejection 203

Table 18.1 Simulation results: Power comparison of proposed test 244


uncensored Weibull samples, Weibull sample size per
group = 30
Table 18.2 Simulation results: Power comparison of proposed test 244
uncensored Weibull samples, sample size per group = 50

Table 18.3 Simulation results: Power comparison of proposed test 245


uncensored Weibull samples, sample size per group = 100
Table 18.4 Simulation results: Power comparison of proposed test 245
25% censoring, Weibull samples, sample size per group
= 30
Table 18.5 Simulation results: Power comparison of proposed test 246
25% censoring, Weibull samples, sample size per group
= 50
Table 18.6 Simulation results: Power comparison of proposed test 246
25% censoring, Weibull samples, sample size per group
= 100
Table 18.7 Power comparison of directed divergence and divergence 246
uncensored lognormal samples, sample size per group =
30
Table 18.8 Power comparison of directed divergence and divergence 247
uncensored lognormal samples, sample size per group =
50
Table 18.9 Power comparison of directed divergence and divergence 247
uncensored lognormal samples, sample size per group =
100
Table 18.10 Power comparison of directed divergence and divergence 247
uncensored Weibull samples, sample size per group = 30
Table 18.11 Power comparison of directed divergence and divergence 248
uncensored Weibull samples, sample size per group = 50
Table 18.12 Power comparison of directed divergence and divergence 248
uncensored Weibull samples, sample size per group = 100
Table 18.13 Power comparison of directed divergence and divergence 248
25% censoring, Weibull samples, sample size per group =
30
Table 18.14 Power comparison of directed divergence and divergence 249
25% censoring, Weibull samples, sample size per group =
50
Table 18.15 Power comparison of directed divergence and divergence 249
25% censoring, Weibull samples, sample size per group =
100

Table 19.1 Batchelor and Hackett (1970) skin grafts data on severely 262
burnt patients
Table 19.2 Some risk sets R and jump sets S for skin grafts data 263
Table 19.3 Model selection for burn data 263
Table 19.4 Parameter estimation in model 8 having the smallest 263
AIC

Table 23.1 Agreement with respect to number of diseased vessels 315


Table 23.2 Clinical and QC site evaluations: Row and column points 316
Table 23.3 Cross-classification of mental health status and parents' 320
socioeconomic status
Table 23.4 Mental health and parents' socioeconomic status: row & 321
column points
Table 23.5 Results of a survey on analgesic efficacy of drugs 323

Table 26.1 Empirical power of tests of normality based on 10,000 360


samples of size n = 50 from a logistic distribution

Table 28.1 Items of the Mobility dimension; n = 466 374


Table 28.2 Distribution of the individuals scores for the Mobility di- 374
mension; n = 466
Table 28.3 Division into 4 subgroups; n = 466 379
Table 28.4 Expected and observed frequency of positive answers to 380
item 3 in each subgroup

Table 32.1 Class survey results 427


Table 32.2 Components Vrs using a discrete uniform target and nor- 427
malized Chebyshev polynomials

Table 33.1 Continuous distributions with their means and variances 442
Table 33.2 Empirical level and power for tests of equality of two 443
distributions: m = 22, n = 22 and α = 5%
Table 33.3 Empirical level and power for MC tests of equality of 445
two continuous distributions having same mean and same
variance: m = n = 22 and α = 5%
Table 33.4 Empirical level and power for tests of equality of two 446
discrete distributions: m = n = 22 and α = 5%

Table 34.1 Power and efficiency of test statistics compared to iso- 457
tonized Kruskal Wallis statistic for α = 0.01, 5 × 5 grids
and one observation per cell
Table 34.2 Power and efficiency of test statistics compared to iso- 458
tonized Kruskal Wallis statistic for α = 0.01, 5 × 5 grids
and four observations per cell
Table 34.3 Comparing ranges of efficiency of statistics and choosing a 459
test for selected trend shapes and distributions and for
α = 0.01, 5 × 5 grids and one observation per cell
Table 34.4 Comparing efficiency of statistics and choosing a test for 460
selected trend shapes and distributions and for α = 0.01,
5 × 5 grids and four observations per cell

Table 37.1 Function 9 and value of Cg of various estimators 501


List of Figures

Figure 3.1 The 4-lunation series (covering the years 1830-1990 in 4- 35


month intervals) in the time domain: the actual data in
the series, and a smoothed version (obtained by forming
moving averages). Note the changing level of the obser-
vational noise and the decadal waves
Figure 3.2 Log10-spectrum of the differenced 5-day series (covering 36
the years 1962-1995 in 5-day intervals). The cross-over
between the random walk process and the AR(2) model
occurs near 8 months (243.81 days). On purpose, only
the two most prominent components (2) and (5) of the
model are used

Figure 4.1 Sampling distribution of the V3, V4, V5, V6 statistics ob- 52
tained from R = 10, 000 samples of size n = 20 taken
from the standard normal distribution. The top row is
for the uncategorised data (u) using Rayner and Best's
method (1989, Chapter 6), and the other rows use my cat-
egorised method (Section 4.3) with ml = 10 categories
(middle, Cl) and m2 = 6 categories (bottom, C2)

Figure 5.1 Neyman-Pearson classes 62

Figure 9.1 Power comparisons for SW1, k = 6, distance = 0.4 120

Figure 22.1 The probability of remaining property-claim free calcu- 306


lated by the Kaplan-Meier estimate based on durations
since an observed claim (with pointwise 95% confidence
limits) and by the nonparametric maximum likelihood es-
timate based on all observations in the assumed station-
ary renewal process. It is seen that the durations after
an observed claim are generally shorter. [From Andersen
and Fledelius (1996)].


Figure 23.1 Agreement w.r.t. no. of diseased vessels 317


Figure 23.2 Agreement w.r.t. no. of diseased vessels 317
Figure 23.3 Agreement w.r.t. no. of diseased vessels 318
Figure 23.4 Agreement w.r.t. no. of diseased vessels 318
Figure 23.5 Agreement w.r.t. no. of diseased vessels 319
Figure 23.6 Agreement w.r.t. no. of diseased vessels 319
Figure 23.7 Agreement w.r.t. no. of diseased vessels 322
Figure 23.8 Drug vs. efficacy rating 324
Figure 23.9 Drug vs. efficacy rating 324

Figure 24.1 Plot of the theoretical principal dimensions hI (X), 335


hl(Y), where X, Y follow the logistic (solid line) and nor-
mal (dashed line) distribution respectively
Figure 24.2 Plot of the theoretical principal dimensions h2 (X), 335
h2(Y), where X, Y follow the logistic (solid line) and nor-
mal (dashed line) distribution respectively
Figure 24.3 Plot of the theoretical principal dimensions h3 (X) , 336
h3(Y), where X, Y follow the logistic (solid line) and nor-
mal (dashed line) distribution respectively
Figure 24.4 Plot of the theoretical principal dimensions h4 (X), 336
h4(Y), where X, Y follow the logistic (solid line) and nor-
mal (dashed line) distribution respectively
Figure 24.5 First logistic dimension: continuous line. Logistic sam- 337
ple: ~-line. Normal sample: "'line. Compare to Figure
24.1
Figure 24.6 Second logistic dimension: continuous line. Logistic sam- 337
ple: ~-line. Normal sample: "'line. Compare to Figure
24.2
Figure 24.7 Third logistic dimension: continuous line. Logistic sam- 338
ple: ~-line. Normal sample: "'line. Compare to Figure
24.3
Figure 24.8 Fourth logistic dimension: continuous line. Logistic sam- 338
ple: ~-line. Normal sample: "'line. Compare to Figure
24.4

Figure 25.1 H: N(0, 1); Ha: N(0, 25/16); n = 50 352
Figure 25.2 H: N(0, 1); Ha: N(0, 25/16); n = 100 353
Figure 25.3 H: N(0, 1); Ha: N(0.5, 1); n = 50 353
Figure 25.4 H: N(0, 1); Ha: N(0.5, 1); n = 100 353
Figure 25.5 H: CAUCHY(0, 1); Ha: STUDENT(25); n = 50 354
Figure 25.6 H: U(0, 1); Ha: BETA(3/2, 1) 354
Figure 25.7 H: N(0, 1); Ha: 0.9N(0, 1) + 0.1N(0, 25); n = 50 354
Figure 25.8 H: N(0, 1); Ha: 0.8N(0, 1) + 0.2N(0, 0.04) 355

Figure 28.1 Step by step procedure with the CAC for the Mobility 374
dimension
Figure 28.2 Traces of the Mobility dimension items 379
Figure 28.3 Difficulty estimates in each group formed by the individ- 380
uals who positively answer to item 2 (GI) and negatively
answer to item 2 (Go)
Figure 28.4 Difficulty estimates in each group formed by the individu- 381
als who positively answer to item 10 (GI) and negatively
answer to item 10 (Go)

Figure 30.1 Examples of (a) a discrete and (b) a continuous angularly 403
symmetric distribution around c. Transforming (a) and
(b) through the mapping h(x) = (x − c)/||x − c|| yields
the centrosymmetric distributions in (c) and (d)
Figure 30.2 (a) Bagplot of the spleen weight versus heart weight of 408
73 hamsters. (b) Bagplot of the log-transformed data set
Figure 30.3 Evolution of the exchange rates of DEM/USD (dashed 409
line) and JPY /USD (full line) from July to December
1998
Figure 30.4 Differences between exchange rates on consecutive days 410
for DEM/USD and JPY /USD in the second half of 1998.
The origin is depicted as a triangle
Figure 30.5 The azimuth data 410

Figure 32.1 The probability distribution function g(x; θ4, 0, 1) for 429
varying values of θ4
Figure 32.2 Probability density function of the bimodal distribution 430
given by Equation (32.1) with modes at 0 and 4.32
Figure 32.3 Comparison of t-test, Wilcoxon test and score test power 431
curves for testing H0: μ = 0 against K: μ ≠ 0 as the
data becomes progressively more non-normal
Figure 32.4 Comparison of power curves of the Wald test using the 432
nearest mode technique for samples of size 20 (solid), 50
(dashes) and 100 (dots) from the bimodal distribution in
Figure 32.2 above; 1000 simulations
Goodness-of-Fit Tests and
Model Validity
PART I
HISTORY AND FUNDAMENTALS
1
Karl Pearson and the Chi-Squared Test

D. R. Cox
Nuffield College, Oxford, England, UK

Abstract: This historical and review paper is in three parts. The first gives
some brief details about Karl Pearson. The second describes in outline the
1900 paper which is being celebrated at this conference. The third provides
some perspective on the importance, historically and contemporarily, of the
chi-squared test.

Keywords and phrases: K. P., history of statistics, chi-squared, goodness-


of-fit test, statistical inference

1.1 Karl Pearson 1857-1937: Background to the


Chi-Squared Paper
Karl Pearson, K. P. as he is usually referred to, was born of middle-class York-
shire parents, his father a lawyer and his mother's family connected with ship-
ping. He was at school in London and before going to University had a private
tutor in mathematics, Routh, a well-known expert in the theory of elasticity,
who introduced K. P. to that subject. He read Mathematics at Cambridge,
graduating Third Wrangler in 1879. The two men above him in the Tripos list
followed academic careers at Cambridge but there is no evidence that they were
research workers of note.
The stereotypic Yorkshire man was and is independent and forthright. While
an undergraduate K. P. had a long fight, which he won, with the Authorities
of his College securing the abandonment of compulsory attendance at divinity
lectures.
It is worth considering the kind of mathematics that K. P. studied at Cam-
bridge. There would, of course, have been no analysis but appreciable calculus,
algebra would have meant largely the Theory of Equations and determinants,


not matrix algebra, and there would have been substantial emphasis on parts
of classical mathematical physics. More importantly the emphasis was strongly
on ingenuity and manipulative skill in problem solving rather than on the devel-
opment of new concepts. There is some evidence that K. P. met, although not
necessarily to be taught by, such major figures as Clerk Maxwell, Cayley and
Green and more particularly Todhunter. Todhunter had published a History of
the Theory of Probability, essentially a long critical essay and review of what
had been published on Probability up to that point, and he was engaged in a
comparable book on the Theory of Elasticity.
After graduating, K. P. spent an extremely influential year in Germany,
studying physics but also philosophy and other aspects of German culture. He
was particularly attracted to the 17th century rationalist philosopher Spinoza.
During this year he changed the spelling of his name from the English spelling
Carl to the Germanic Karl.
After returning to England he qualified as a lawyer and then spent some
years, partly supported by a Fellowship from Kings College, Cambridge, in mis-
cellaneous lecturing mostly on such topics as German philosophy and Marxism.
He was part of an active world of literary and cultural life in London towards
the end of the 19th century. His views seem broadly those of left-wing thought
of the time, enlightened in their attitude to women's rights, socialist in political
thought, believing in the solution of social problems via rational enquiry and
holding views on racial matters that would now widely be regarded as unac-
ceptable. Biographies of major non-scientific figures of the period quite often
mention K. P., in passing at least.
He applied for a number of permanent academic jobs and in 1884 was ap-
pointed Professor of Engineering and Applied Mathematics at University Col-
lege London. His primary duty was to teach elementary mathematics to engi-
neers; he is reported as being outstandingly successful in this. He published
research papers on the theory of elasticity and collaborated with Todhunter on
his History of that field, writing, it is said, much of the second volume.
In 1890 W. F. R. Weldon was appointed Professor of Biology at University
College and an intensely active collaboration developed between them lasting
until Weldon's early death in 1906. Following the impact on Victorian thought
of Charles Darwin and more immediately for K. P. and Weldon of Galton, this
was a period of intense interest in genetics and evolutionary biology. Weldon be-
lieved that careful collection of observational data on natural variability would
provide the key to important issues and K. P. became involved in the analysis
of data collected by Weldon (and others) in their extensive field work and in
the development of what came to be called the biometric school. Their main
technique was the careful study of the shape of univariate and occasionally
bivariate frequency distributions and, in discrete cases to the analysis of two-
dimensional contingency tables. Recognition that distributions were often far
from the normal or Gaussian form led to the development of the flexible system

of frequency curves named after Pearson. These were fitted by moments. It


was the need for some form of relatively objective way of assessing adequacy of
fit that led to the paper [Pearson (1900)] celebrated in this conference.
It was published in Philosophical Magazine then as now a respected journal
of the physical sciences; indeed it is currently owned by the Physical Society.
Before considering the paper a few comments will be made about K. P. 's
work after the chi-squared paper.

1.2 K. P.: After Chi-Squared


In November 1900 Weldon suggested the need for a new journal and in October
1901 the first issue of Biometrika appeared. K. P. seems to have been by far
the dominant figure in all this, once the initial suggestion had been made. For
a period there were joint editors but following Weldon's death K. P. was the
sole editor until his death at age 80; indeed he was correcting proof a few weeks
before his death and had two papers, one characteristically long and polemical,
in the last issue.
In the period up to 1914 K. P. published about 90 papers in Biometrika
alone, few of them short, and seems to have been the moving force behind
many more. Even a cursory glance at these papers reveals K. P.'s astonishing
range of interests and his intellectual vigor and originality. Some are method-
ological papers concerned, for instance, with the analysis of ordinal and nominal
data via an underlying Gaussian distribution and with many other topics. By
far the majority, however, are substantial pieces of analysis of observational
data. The fields range among biology, sociology, criminology, medicine and epi-
demiology and physical anthropology. The emphasis is always on the data and
their interpretation; they are rarely treated merely as exercises in technique.
After the end of the Great War K. P. continued to publish prolifically but his
work seems mostly of less current interest; the focus of development of statistics
had shifted elsewhere.

1.3 The 1900 Paper


The essence of the paper is as follows. First it is shown by direct transformation
of the multiple integrals involved that the distribution of the exponent of a
(nonsingular) multivariate normal distribution in d dimensions has what we
now call the chi-squared distribution with d degrees of freedom. Evaluation of
its tail area is shown to be possible by integration by parts.

Next the covariance matrix of a multinomial distribution with k cells is


found and the distribution considered across k - 1 cells to avoid singularities
approximated by a multivariate normal distribution in k - 1 dimensions.
The exponent of that multivariate normal distribution is then reexpressed
in more symmetrical form as the nowadays familiar chi-squared statistic for
comparing observed and theoretically known cell probabilities. The informal
reasonableness of using this as a test statistic for goodness-of-fit is discussed.
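
In modern notation (a reconstruction for the reader's convenience, not Pearson's 1900 symbols), the statistic at the heart of the paper and its limiting law can be summarized as follows.

```latex
% Pearson's statistic in modern notation; a reconstruction, not the 1900 notation.
% n_i: observed count in cell i; \pi_i: hypothesized cell probability; n: sample size.
X^2 = \sum_{i=1}^{k} \frac{(n_i - n\pi_i)^2}{n\pi_i}
      \xrightarrow{\,d\,} \chi^2_{k-1} \qquad (n \to \infty).
% With p parameters fitted efficiently from the data, the limit becomes
% \chi^2_{k-p-1}, which is Fisher's later correction discussed in the next paragraph.
```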
A long verbal discussion then follows recognizing that usually the compar-
ison is with fitted rather than known cell probabilities. It is argued that this
replacement will have a relatively small effect. That is the degrees of freedom
remain k - 1. (Indeed even after Greenwood and Yule, and subsequently in gen-
erality Fisher, had shown that the degrees of freedom for a 2 × 2 contingency table
were one, K. P. insisted that they were three. Note, however, that for many
problems of examining distributional form, which were K. P.'s primary moti-
vation, the number of parameters is appreciably less than the number of cells,
in which case K. P.'s conclusion is a reasonable first approximation. Another
line of explanation of K. P.'s attitude is that he might have regarded the proper
chi-squared statistic to be that based on the theoretical probabilities which is
only estimated by chi-squared from the fitted frequencies. If these are obtained
essentially by minimizing a chi-squared measure a bias correction is needed.
This leads to the notion that with p adjustable parameters the observed chi-
squared plus p should be tested with k - 1 degrees of freedom and this is not
too far from use of k - p - 1 as the degrees of freedom.)
The paper concludes with a variety of numerical examples.
For very interesting further comments on the paper see Barnard (1991).
This is not the place to go into the interchanges between K. P. and other major
figures such as Student, Yule and, of course, R. A. Fisher, on which there is an
extensive literature.

1.4 Importance of the Chi-Squared Test


For perhaps 70 years following its introduction the chi-squared test was one of
the most widely used tools of formal statistical analysis. This may have been
not so much because of its application to the originating problem of assessing
distributional shape, but rather because the chi-squared test of independence
in a contingency table provided the main route for interpretation of qualitative
data, especially as they arise in the social sciences. Evidence of departure from
independence would then be interpreted descriptively.
More recently the role of the test is less central. There are a number of
reasons for this.
First, as compared with K. P.'s time, the primary focus of most studies has

shifted from studies of distributional form to studies of dependence. In these


studies, issues of distributional shape are of secondary interest in indicating
efficient methods of analysis but are not the primary focus. In some contexts,
moreover, studies of robustness remove much of the dependence on strong as-
sumptions of distributional form. Even when comparisons with, say, the Poisson
distribution are involved, a more relevant and focused question is often whether
the variance is equal to the mean or, less commonly, whether minus the log of
the probability of zero is equal to the mean. The former is relevant to the esti-
mation of the standard error of a rate and the latter to the analysis of dilution
series. Much later Fisher gave the appropriate exact distributional theory and
C. R. Rao supplied important complementary results.
Secondly, there has developed a tendency to prefer focused tests rather than
what M. S. Bartlett called omnibus tests. This is partly an issue of power but
at least as importantly of diagnostic effectiveness. Thus a test for normality
based, say on the larger of the standardized third and fourth cumulant, gives
a direct indication of the kind of departure from normality involved, whereas
an overall chi-squared statistic does not. Even more directly the chi-squared
dispersion test for the Poisson distribution is a direct examination of the ratio
of the variance to the mean.
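
To make this concrete, here is a minimal Python sketch of the dispersion (variance-to-mean) test; the function name and the two-sided handling of the p-value are illustrative choices, not taken from the text.

```python
# Minimal sketch of the chi-squared dispersion test for the Poisson distribution:
# the index of dispersion sum (x_i - xbar)^2 / xbar is approximately chi-squared
# with n - 1 degrees of freedom under the Poisson hypothesis.
import numpy as np
from scipy import stats

def poisson_dispersion_test(counts):
    x = np.asarray(counts, dtype=float)
    n = len(x)
    xbar = x.mean()
    d = ((x - xbar) ** 2).sum() / xbar          # dispersion statistic
    # two-sided p-value: over- and under-dispersion both count as departures
    p_upper = stats.chi2.sf(d, df=n - 1)
    p_lower = stats.chi2.cdf(d, df=n - 1)
    return d, 2 * min(p_upper, p_lower)

rng = np.random.default_rng(0)
print(poisson_dispersion_test(rng.poisson(4.0, size=200)))
```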
Further there has recently developed a preference for procedures based on
estimation of interpretable parameters over those yielding primarily a signifi-
cance test. While chi-squared could be rescaled to estimate a distance measure
this would often not be easily interpreted. The preference for log linear models
as a route for the interpretation of contingency tables stems partly from this
preference for estimation.
There is the following broader issue. Models, as their name implies, are in-
evitably idealized. Especially with complex biological and social phenomena, it
is inconceivable that the systems are precisely described by any mathematical
or computer model, especially by the relatively simple models that are com-
monly used in statistics. Why then should we test goodness-of-fit to a model
that we know must be at some level inadequate? One answer with modest
amounts of data is that so long as no reasonably significant departure is found
the direction of inadequacy is not clearly established by the data. Thus it may
be poor strategy either to interpret the departure or to modify the model when,
for example, the direction in which we should modify the model is not firmly
established. Especially with modest amounts of data it is likely that substan-
tively important departures from the null hypothesis might be present, so that
a significant departure from the null hypothesis deserves to be taken seriously.
On the other hand, with very large amounts of data the position is often differ-
ent. Even differences that are quite small in subject-matter terms are likely to
be highly statistically significant so that here the issue is often best regarded as
considering whether any lack of fit is important in subject-matter terms. Note,
however, that very large sets of data often have internal structure which may

make assessments of precision based on strong assumptions of independence


very misleading.
In many fields the most important aspect of tests of goodness-of-fit lies
in checking for various forms of departure from some standard conditions of
relative homogeneity. Examples of such tests are those for interaction, for
nonlinearity and for heterogeneity when information from different sources is
considered for combination.
Despite the generally decreased emphasis on omnibus tests of goodness-
of-fit, the notion that such tests are available remains in principle of great
importance. It represents the openmindedness of the attitude that our current
formulation of a problem may sometimes be shown empirically to be unsatis-
factory in ways not clearly formulated a priori. The organizers of the present
Conference are surely to be congratulated on bringing together this celebration
of K. P.'s path-breaking paper.

References
1. Barnard, G. A. (1991). Introduction to Pearson (1900), In Breakthroughs
in Statistics, Vol. 2 (Eds., S. Kotz and N. L. Johnson), pp. 1-10, New
York: Springer-Verlag.

2. Pearson, K. (1900). On the criterion that a given system of deviations


from the probable in the case of a correlated system of variables is such
that it can reasonably be supposed to have arisen from random sampling,
Phil. Mag., Series 5, 50, 157-175.
2
Karl Pearson Chi-Square Test
The Dawn of Statistical Inference

C. R. Rao
Pennsylvania State University, University Park, Pennsylvania

Abstract: Specification or stochastic modeling of data is an important step in


statistical analysis of data. Karl Pearson was the first to recognize this problem
and introduce a criterion, in a paper published in 1900, to examine whether the
observed data support a given specification. He called it the chi-square goodness-
of-fit test, which motivated research in testing of hypotheses and estimation
of unknown parameters and led to the development of statistics as a separate
discipline. Efron (1995) says, "Karl Pearson's famous chi-square paper appeared
in the spring of 1900, an auspicious beginning to a wonderful century for the
field of statistics."
This paper reviews the early work on the chi-square statistic, its derivation
from the general theory of asymptotic inference as a score test introduced by
Rao (1948), its use in practice and recent contributions to alternative tests. A
new test for goodness-of-fit in the continuous case is proposed.

Keywords and phrases: Chi-square test, Jensen difference, likelihood ratio


test, quadratic entropy, Rao's score test, Wald test

2.1 Introduction
In an article entitled Trial by Number, Hacking says that the goodness-of-fit
chi-square test introduced by Karl Pearson (1900), "ushered in a new kind of
decision making" and gives it a place among the top 20 discoveries since 1900
considering all branches of science and technology. R. A. Fisher, who was in-
volved in bitter controversies with Pearson, was appreciative of the chi-square
test. In his book on Statistical Methods for Research Workers (1958, 13th edi-
tion, p. 22), Fisher says, "This (chi-square), I believe is the great contribution
to statistical methodology by which the unsurpassed energy of Professor Pearson's
work will be remembered," and devoted one full chapter to numerous ingenious
applications of the chi-square test.
Pearson's chi-square is ideally applicable to qualitative data with a finite
number, say s, of natural categories, where the data are in the form of frequencies
of individuals in different categories. The specified hypothesis is of the form

\pi_i = \pi_i(\theta), \quad i = 1, \ldots, s    (2.1)

where the probability \pi_i in category i is a given function of a k-vector parameter
\theta. If p_i is the observed proportion in category i, then a natural test criterion is
of the form

D(p - \pi(\hat{\theta}))    (2.2)

for a suitable choice of distance or dissimilarity measure, where p = (p_1, \ldots, p_s)',
\pi(\hat{\theta}) = (\pi_1(\hat{\theta}), \ldots, \pi_s(\hat{\theta}))' and \hat{\theta} is an efficient estimate of \theta.
Ideally \theta is estimated by

\hat{\theta} = \arg\min_{\theta} D(p - \pi(\theta)).    (2.3)

Various tests of goodness-of-fit proposed in statistical literature differ in the


measure (2.2) chosen.
When we have a sample from a continuous distribution with a distribution
function F(x, \theta), there are two ways of deriving the goodness-of-fit test. One
is to discretize the continuous distribution by choosing class intervals -\infty =
a_0, a_1, \ldots, a_{s-1}, a_s = \infty, and defining

\pi_{i+1}(\theta) = F(a_{i+1}, \theta) - F(a_i, \theta), \quad i = 0, \ldots, s - 1,    (2.4)

in which case a test of the type (2.2) is applicable. An excellent treatment of


such an approach with all the issues involved in the choice of class intervals
and the estimation of \theta is given in the monograph by Greenwood and Nikulin
(1996).
Another is the direct method of estimating \theta by an efficient procedure such
as the maximum likelihood and constructing a test based on a suitable measure
of difference

D(F_n, F(\cdot, \hat{\theta}))    (2.5)

where F_n is the empirical distribution function. A good review of this approach


can be found in Durbin (1973).
Section 2.2 of the paper describes three general methods of constructing
large sample test criteria of simple and composite hypotheses, viz., likelihood

ratio, Wald and Rao's score tests, also referred to as the Holy Trinity [see Koenker
(1987, p. 294) and Lehmann (1999, pp. 525-529)]. In Section 2.3, Pearson's
chi-square and related tests are shown to be score tests, as observed by A.
Bera. The difficulties involved in deriving Wald tests for composite hypotheses
are discussed. Alternative tests of goodness-of-fit based on dissimilarity or
divergence measures derived from entropy functions are given in Section 2.4.
Tests of significance of goodness-of-fit for continuous distributions are reviewed
in Section 2.5. A new test is proposed and the possibility of using bootstrap is
pointed out.

2.2 Large Sample Criteria: The Holy Trinity


Let (\mathcal{X}, \mathcal{B}, P_\theta) be a probability space, where P_\theta stands for a family of distribu-
tions indexed by a k-vector parameter \theta \in \Theta. Further let X = (X_1, \ldots, X_n)' be
a vector of iid observations from P_\theta. We consider the problem of testing simple
and composite hypotheses concerning P_\theta. When n is large, there are three gen-
eral methods [Koenker (1987, p. 294), Lehmann (1999, pp. 525-529)], referred
to as the Holy Trinity, of constructing large sample test criteria.

2.2.1 Likelihood ratio criterion


Let L(X, \theta) denote the likelihood function based on the sample observations X.
Then, to test a simple hypothesis H_s : \theta = \theta_0 (a specified value), the likelihood
ratio criterion introduced by Neyman and Pearson (1928) is

\Lambda_s = L(X, \theta_0) \div \sup_{\theta \in \Theta} L(X, \theta).    (2.6)

To test a composite hypothesis H_c, expressed in the form

\theta \in C = \{\theta : f_1(\theta) = 0, \ldots, f_r(\theta) = 0\},    (2.7)

the likelihood ratio criterion is

\Lambda_c = \sup_{\theta \in C} L(X, \theta) \div \sup_{\theta \in \Theta} L(X, \theta).    (2.8)

Large sample properties of the likelihood ratio criterion were studied by Wilks
(1938). It was shown that asymptotically

-2 \log \Lambda_c \sim \chi^2(r)    (2.9)

where \chi^2(b) represents the chi-square distribution on b degrees of freedom. Note
that r is the number of restrictions on \theta.

2.2.2 Wald test

Let \hat{\theta} be the maximum likelihood estimate of \theta such that the asymptotic nor-
mality

\sqrt{n}(\hat{\theta} - \theta) \to N_k(0, I(\theta)^{-1})

holds, where I(\theta) is the information matrix for a single observation. Then the
Wald (1943) test for H_s : \theta = \theta_0 is

n(\hat{\theta} - \theta_0)' I(\hat{\theta})(\hat{\theta} - \theta_0) \sim \chi^2(k)    (2.10)

and for the composite hypothesis H_c defined in (2.7) the Wald test is

n f(\hat{\theta})' [M(\hat{\theta}) I(\hat{\theta})^{-1} M(\hat{\theta})']^{-1} f(\hat{\theta}) \sim \chi^2(r)    (2.11)

where

f(\cdot) = (f_1(\cdot), \ldots, f_r(\cdot))', an r-vector,
M(\theta) = (\partial f_i / \partial \theta_j), an r \times k matrix.

2.2.3 Rao's score test

The score vector function as defined by Fisher is

S(\theta) = \left( \frac{1}{\sqrt{n}} \frac{\partial \log L}{\partial \theta_1}, \ldots, \frac{1}{\sqrt{n}} \frac{\partial \log L}{\partial \theta_k} \right)'.    (2.12)

The score test of Rao (1948) for the simple hypothesis H_s is

S(\theta_0)' I(\theta_0)^{-1} S(\theta_0) \sim \chi^2(k)    (2.13)

and for the composite hypothesis (2.7) is

S(\tilde{\theta})' I(\tilde{\theta})^{-1} S(\tilde{\theta}) \sim \chi^2(r)    (2.14)

where \tilde{\theta} is the m.l. estimate of \theta under the restrictions (2.7) of the composite
hypothesis.
A variation of the test (2.14) where, instead of the m.l. estimate \tilde{\theta}, a \sqrt{n}-
consistent estimate \bar{\theta} is substituted for \theta is called the Neyman-Rao test by Hall
and Mathiason (1990). Such a statistic has the same chi-square distribution on
r degrees of freedom.
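To make the trinity concrete, consider the simplest binomial case, with n trials
and H_s : \theta = \theta_0. The following minimal sketch in Python (the counts are
illustrative, not from the text) computes all three statistics; each is referred to
a \chi^2(1) distribution, and in regular problems the three are asymptotically
equivalent.

# Minimal sketch (assumptions: binomial data, simple hypothesis; numbers invented).
import numpy as np
from scipy.stats import chi2

def holy_trinity(x, n, theta0):
    theta_hat = x / n                              # unrestricted ML estimate
    def loglik(t):
        return x * np.log(t) + (n - x) * np.log(1 - t)
    lrt = 2 * (loglik(theta_hat) - loglik(theta0))                        # cf. (2.6)
    wald = n * (theta_hat - theta0) ** 2 / (theta_hat * (1 - theta_hat))  # cf. (2.10)
    score = (x - n * theta0) ** 2 / (n * theta0 * (1 - theta0))           # cf. (2.13)
    return {name: (val, chi2.sf(val, df=1))
            for name, val in [("LRT", lrt), ("Wald", wald), ("Score", score)]}

print(holy_trinity(x=62, n=100, theta0=0.5))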

2.3 Specification Tests for a Multinomial Distribution

Let \pi_1, \ldots, \pi_s (\sum \pi_i = 1) and O_1, \ldots, O_s (\sum O_i = n) be the cell probabili-
ties and observed frequencies in a sample of size n from an s-cell multinomial
distribution. Further let

p = (p_1, \ldots, p_s)'; \quad p_i = O_i/n, \quad \sum_{i=1}^{s} p_i = 1

be the estimated probabilities. The variance-covariance matrix of
p = (p_1, \ldots, p_s)' is n^{-1} C(\pi), where C(\pi) = \Delta(\pi) - \pi\pi' and \Delta(\pi) is the
diagonal matrix with \pi_i as the i-th diagonal element.

2.3.1 Test of a simple hypothesis

Let us consider the test of a simple hypothesis

H_s : \pi_i = \pi_{i0}, \quad i = 1, \ldots, s

where \pi_{i0} are specified values.
The likelihood ratio test is

(LRT)_s = 2 \sum_{i=1}^{s} O_i \log \frac{O_i}{E_{i0}}; \quad E_{i0} = n\pi_{i0}, \quad i = 1, \ldots, s    (2.15)

which is distributed asymptotically as \chi^2 on (s - 1) d.f.
The Wald test is

W_s = n(p - \pi_0)' [C(p)]^- (p - \pi_0)    (2.16)

where [C(p)]^- is a generalized inverse of C(p). An alternative expression for
(2.16) is

\sum_{i=1}^{s} \frac{(O_i - E_{i0})^2}{O_i}    (2.17)

which is usually called Neyman's modification of Pearson's chi-square defined
in (2.21).

For Rao's score test, we compute the scores

\frac{\partial \log L}{\partial \pi_i} = \frac{O_i}{\pi_i}, \quad i = 1, \ldots, s    (2.18)

with the variance-covariance matrix

C(\pi) = n(\Delta^{-1} - 11'), \quad 1' = (1, \ldots, 1)    (2.19)

where \Delta is a diagonal matrix with \pi_i as the i-th diagonal element. The score
statistic is

R_s = y' [C(\pi_0)]^- y    (2.20)

where y' = (O_1/\pi_{10}, \ldots, O_s/\pi_{s0}). Observing that n^{-1}(\Delta(\pi_0) - \pi_0\pi_0') is a
g-inverse of C(\pi_0), we find

R_s = \sum_{i=1}^{s} \frac{(O_i - E_{i0})^2}{E_{i0}}    (2.21)

where E_{i0} = n\pi_{i0}, the expected value when \pi_i = \pi_{i0}, i = 1, \ldots, s. The statistic
(2.21) is Pearson's chi-square for testing a simple hypothesis. [Note that in gen-
eral, the scores have to be computed for independent parameters, \pi_1, \ldots, \pi_{s-1}
in the present case, in which case the variance-covariance matrix will be non-
singular. The statistic R_s will have the same expression (2.21).]
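As a numerical illustration of (2.21) (the frequencies below are invented),
Pearson's statistic can be computed directly; scipy's chisquare routine returns
the same value as \sum (O_i - E_{i0})^2 / E_{i0}, referred to s - 1 degrees of
freedom.

# Minimal sketch (frequencies invented); chisquare returns Pearson's statistic (2.21).
import numpy as np
from scipy.stats import chisquare

O = np.array([43, 28, 17, 12])          # observed frequencies, n = 100
pi0 = np.array([0.4, 0.3, 0.2, 0.1])    # simple hypothesis H_s
E0 = O.sum() * pi0                      # E_i0 = n pi_i0
X2, p = chisquare(O, f_exp=E0)          # chi-square on s - 1 = 3 d.f.
print(X2, p)                            # same value as ((O - E0)**2 / E0).sum()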

2.3.2 Tests of a composite hypothesis

Consider the composite hypothesis

H_c : \pi_i = \pi_i(\theta), \quad i = 1, \ldots, s    (2.22)

where \theta is a k-vector unknown parameter. Denote the m.l. estimate of \theta by \hat{\theta}.

LRT

The likelihood ratio test of the hypothesis (2.22) is

2 \sum_{i=1}^{s} O_i \log \frac{O_i}{\hat{E}_i} \sim \chi^2(s - 1 - k)    (2.23)

where \hat{E}_i = n\pi_i(\hat{\theta}), the expected value when \theta = \hat{\theta}.



Score test

Rao's score test, obtained by substituting \hat{E}_i for E_{i0} in (2.21), is

R_c = \sum_{i=1}^{s} \frac{(O_i - \hat{E}_i)^2}{\hat{E}_i} \sim \chi^2(s - 1 - k).    (2.24)

The results (2.21) and (2.24) show that Pearson's chi-square tests (with the
modification made by Fisher for degrees of freedom when \theta is estimated) can
be obtained as Rao's score tests.

Wald test

The derivation of the Wald test for the composite hypothesis (2.22) is somewhat
complicated as it requires the formulation of (2.22) in the form of restrictions

g_i(\pi_1, \ldots, \pi_s) = 0, \quad i = 1, \ldots, s - 1 - k

on the cell probabilities \pi_1, \ldots, \pi_s.
For example, consider the hypothesis that the distribution of male births in
families with (s - 1) children is binomial, i.e.,

\pi_i = \binom{s-1}{i-1} \theta^{i-1} (1 - \theta)^{s-i}, \quad i = 1, \ldots, s    (2.25)

where \theta is the probability that a child is a male. The equations (2.25) can be
written as restrictions

\frac{s-1}{1} \frac{\pi_1}{\pi_2} = \frac{s-2}{2} \frac{\pi_2}{\pi_3} = \cdots = \frac{1}{s-1} \frac{\pi_{s-1}}{\pi_s}    (2.26)

on \pi_1, \ldots, \pi_s, which are in the form (2.7) required for applying the Wald test.
We use the formula (2.11) to derive the Wald statistic.
It may be noted that there is no unique representation of the restrictions
(2.26), and the Wald statistic may depend on the actual form in which the
restrictions are expressed. This is one of the drawbacks of the Wald statistic.
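While the Wald route is awkward, the score (Pearson) route for this binomial
hypothesis is straightforward. A minimal sketch with invented family counts:
\theta is estimated by maximum likelihood and (2.24) is applied with Fisher's
degrees-of-freedom correction, here s - 1 - k = 2.

# Minimal sketch (family counts invented): binomial specification (2.25) for
# families with 3 children, theta estimated by maximum likelihood.
import numpy as np
from scipy.stats import binom, chi2

O = np.array([110, 350, 400, 140])                  # families with 0,1,2,3 boys
n, m = O.sum(), 3
theta_hat = (O * np.arange(m + 1)).sum() / (n * m)  # ML estimate of theta
E = n * binom.pmf(np.arange(m + 1), m, theta_hat)   # fitted expected frequencies
X2 = ((O - E) ** 2 / E).sum()                       # score statistic (2.24)
df = len(O) - 1 - 1                                 # Fisher's correction: s - 1 - k
print(X2, chi2.sf(X2, df))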

2.3.3 Test for goodness-of-fit in a subset of cells


While comparing observed and expected frequencies one may find that the
discrepancies are confined to a subset of the cells of the multinomial distribution.
A typical case is undercount or overcount in the cell for "zero events" in a
binomial or a Poisson distribution. Let us consider the score test for a given
specification

H_0 : \pi_i = \pi_i(\theta), \quad i = 1, \ldots, s    (2.27)



given the alternative

H_1 : \pi_i = a_i \pi_i(\theta)/T, \quad i = 1, \ldots, r,
      \pi_i = \pi_i(\theta)/T, \quad i = r + 1, \ldots, s,
T = a_1\pi_1(\theta) + \cdots + a_r\pi_r(\theta) + \pi_{r+1}(\theta) + \cdots + \pi_s(\theta).    (2.28)

Under the model H_1, the null hypothesis (2.27) may be stated as

H_0 : a_1 = \cdots = a_r = 1

with \theta' = (\theta_1, \ldots, \theta_k) as nuisance parameters. To apply the score test, we need
to compute the scores for a_1, \ldots, a_r and \theta_1, \ldots, \theta_k and the information matrix
for the (r + k) parameters at the values a_1 = \cdots = a_r = 1 and maximum
likelihood estimates of \theta under H_0.
The scores for a_i and \theta_j at estimated values under H_0 are

a_i = O_i - n\pi_i(\hat{\theta}), \quad i = 1, \ldots, r; \quad \text{the scores for } \theta_j \text{ are } 0, \quad j = 1, \ldots, k.    (2.29)

The information matrix is n times

\begin{pmatrix} A & B \\ B' & I \end{pmatrix}    (2.30)

where A = (\pi_i(\delta_{ij} - \pi_j)) is the r \times r matrix with diagonal elements
\pi_i(1 - \pi_i) and off-diagonal elements -\pi_i\pi_j, B is the corresponding cross-
information block, and I is the information matrix for \theta_1, \ldots, \theta_k. The score
statistic for testing H_0 is

n^{-1} a' (A - B I^{-1} B')^{-1} a    (2.31)

where a' = (a_1, \ldots, a_r). The asymptotic distribution of (2.31) is chi-square
with r degrees of freedom if |A - B I^{-1} B'| \neq 0. Otherwise the number of
degrees of freedom is equal to the rank of A - B I^{-1} B'; if the rank is less than
r, we use a g-inverse in the definition of (2.31). [Note that A - B I^{-1} B' is the
asymptotic variance-covariance matrix of n^{-1/2} a.]

2.3.4 Analysis of chi-square


Consider a multinomial distribution in s classes with the following possible
specifications for the cell probabilities.

H : \pi_1, \ldots, \pi_s are arbitrary.
H_1 : \pi_i = \pi_i(\theta), \quad i = 1, \ldots, s, \quad \theta \in R^k.
H_2 : \pi_i = \pi_i(g(\phi)) = \pi_i(\phi), \quad i = 1, \ldots, s, \quad \phi \in R^q, q < k.
In practice, there are situations where we first test H_1 against H, and if this
holds, test H_2 against H_1, i.e., whether the specification H_1 holds with the
restrictions on \theta implied by H_2.
Let O_1, \ldots, O_s be the observed frequencies in s cells, and \hat{\theta} and \hat{\phi} be m.l.
estimators of \theta and \phi under the hypotheses H_1 and H_2 respectively. Then, the
chi-square tests for H_1 against H and H_2 against H are

\chi^2_1 = \sum_{i=1}^{s} \frac{(O_i - E_{1i})^2}{E_{1i}}, \quad d.f. = s - 1 - k    (2.32)

and

\chi^2_2 = \sum_{i=1}^{s} \frac{(O_i - E_{2i})^2}{E_{2i}}, \quad d.f. = s - 1 - q.    (2.33)

A test for H_2 given H_1 can be obtained as

\chi^2_{2 \cdot 1} = \chi^2_2 - \chi^2_1, \quad d.f. = k - q    (2.34)

which is not a score test, although (2.32) and (2.33) are score tests, but is
asymptotically equivalent to the score test for H_2 against H_1. It can be further
shown that asymptotically

\chi^2_{2 \cdot 1} \approx \sum_{i=1}^{s} \frac{(E_{1i} - E_{2i})^2}{E_{2i}}    (2.35)

where E_{1i} and E_{2i} are expected values of frequencies in the i-th cell under the
hypotheses H_1 and H_2 respectively.
An illustration of such a test for examining the equality of the A and B
gene frequencies of the O, A, B, AB blood group system in two communities is
given in a paper by Rao (1961).
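A minimal sketch of the decomposition (2.32)-(2.35), reusing the invented
binomial counts above, with H_1 the binomial with free \theta (k = 1) and H_2
the binomial with \theta = 1/2 (q = 0); the last line checks the asymptotically
equivalent form (2.35).

# Minimal sketch (same invented data): H_1 = binomial, free theta; H_2 = binomial,
# theta fixed at 1/2; the difference (2.34) tests H_2 given H_1 on k - q = 1 d.f.
import numpy as np
from scipy.stats import binom, chi2

O = np.array([110, 350, 400, 140]); n, m = O.sum(), 3
theta_hat = (O * np.arange(m + 1)).sum() / (n * m)
E1 = n * binom.pmf(np.arange(m + 1), m, theta_hat)  # expected under H_1
E2 = n * binom.pmf(np.arange(m + 1), m, 0.5)        # expected under H_2
chi2_1 = ((O - E1) ** 2 / E1).sum()                 # (2.32)
chi2_2 = ((O - E2) ** 2 / E2).sum()                 # (2.33)
chi2_21 = chi2_2 - chi2_1                           # (2.34)
print(chi2_21, chi2.sf(chi2_21, df=1))
print(((E1 - E2) ** 2 / E2).sum())                  # the equivalent form (2.35)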

2.3.5 Some applications of the chi-square test


Chapters 3 and 4 of Fisher's (1958) book contain numerous examples of the
use of the chi-square test. In testing for goodness-of-fit, the main problem is
the comparison of observed frequencies with those expected under the given
specification. As a global test for a given specification, any measure of dif-
ference between the observed and expected frequencies can be used. Whether
or not the global test is significant at a chosen level of significance, of greater
importance is the discovery of patterns in the differences between observed and
expected frequencies. (A visual examination of the observed and expected fre-
quencies placed side by side in two columns, followed by a confirmatory test of
any observed pattern of departure from the null hypothesis, should be a standard
practice in statistical modeling). A brilliant example is provided by the data
in Table 11 of Chapter 3 in Fisher's book, where Fisher points out a tendency
for families to produce more children of one sex. Another example of interest
is the data of Table 4 in the same Chapter of Fisher's book where the observed
and expected frequencies agree very closely, yielding a small value of chi-square,
indicating that the data are edited.

2.4 Other Tests of Goodness-of-Fit


As observed in earlier sections, any good measure of difference between the
vectors of observed and expected frequencies can provide a test of goodness-of-
fit. A rich class of tests can be generated by using the Csiszár \phi-divergence
between observed (O_i) and expected (E_i) frequencies defined by

D_\phi(O, E) = \sum_{i=1}^{s} E_i \, \phi(O_i / E_i)    (2.36)

for every convex function \phi : [0, \infty) \to R \cup \{\infty\}, where 0\phi(0/0) = 0 and
0\phi(p/0) = p \lim_{u \to \infty} \phi(u)/u. It is shown by Morales, Pardo and Va-
jda (1995) that

\frac{2}{\phi''(1)} D_\phi(O, E) \sim \chi^2(s - 1 - k)    (2.37)

if we choose for \hat{\theta} the minimum \phi-divergence estimate as in (2.3), where \theta
is a k-vector parameter.



Read and Cressie (1988) proposed what they call power divergence statistics
defined by

\frac{2}{\lambda(\lambda + 1)} \sum_{i=1}^{s} O_i \left[ \left( \frac{O_i}{E_i} \right)^{\lambda} - 1 \right], \quad \lambda \neq 0, -1,    (2.38)

where E_i = n\pi_i(\hat{\theta}), using a BAN estimator \hat{\theta} of \theta. The statistic (2.38) has the
same asymptotic chi-square distribution on s - 1 - k degrees of freedom. This
class can be obtained as a special case of (2.36) by choosing

\phi(x) = \frac{1}{\lambda(\lambda + 1)} (x^{\lambda + 1} - x), \quad \lambda \neq 0, -1.    (2.39)

Special choices of \lambda lead to test criteria such as Pearson's chi-square, Ney-
man's modified chi-square, log likelihood ratio, modified log likelihood ratio
and the Freeman-Tukey statistic.
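Several members of this family are available directly in scipy under the
Read-Cressie parametrization. A minimal sketch, with the fitted frequencies of
the earlier binomial example and ddof set to the number of estimated
parameters:

# Minimal sketch (observed and fitted frequencies carried over from above).
import numpy as np
from scipy.stats import power_divergence

O = np.array([110, 350, 400, 140])
E = np.array([108.3, 356.8, 391.6, 143.3])   # fitted values, one estimated parameter
for lam in ["pearson", "log-likelihood", "freeman-tukey", "neyman"]:
    stat, p = power_divergence(O, f_exp=E, ddof=1, lambda_=lam)  # ddof = k
    print(lam, round(float(stat), 3), round(float(p), 3))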
A rich class of diversity measures as test criteria for goodness-of-fit is pro-
vided by the Jensen difference defined by Rao (1982)

J_H(x, y) = H\left( \frac{x + y}{2} \right) - \frac{1}{2} H(x) - \frac{1}{2} H(y)    (2.40)

where H is an entropy (i.e., a concave) function of a nonnegative s-vector
variable. We can generate an entropy function in many ways. One is to define

H(x_1, \ldots, x_s) = \sum_{i=1}^{s} \phi(x_i)

where \phi : (0, \infty) \to R is a continuous concave function. In such a case the
statistic for the goodness-of-fit test based on (2.40) is

8n \sum_{i=1}^{s} \left[ \phi\left( \frac{O_i/n + E_i/n}{2} \right) - \frac{1}{2} \phi\left( \frac{O_i}{n} \right) - \frac{1}{2} \phi\left( \frac{E_i}{n} \right) \right]    (2.41)

which is distributed asymptotically as a linear combination of s - k - 1 chi-square
variables on one degree of freedom each, where k is the number of unknown
parameters in the specification of the cell probabilities. For a study of the test
(2.41) based on the Bose-Einstein entropy H defined by Burbea and Rao (1982a,
1982b), the reader is referred to Pardo (1999).
Another possibility is to use the quadratic entropy introduced by Rao (1984)

H(x) = \sum_{i=1}^{s} \sum_{j=1}^{s} a_{ij} x_i x_j    (2.42)

where x' = (x_1, \ldots, x_s) and the coefficients a_{ij} are chosen such that the matrix

(a_{is} + a_{js} - a_{ij} - a_{ss}), \quad i, j = 1, \ldots, s - 1



is nonnegative definite. The Jensen difference between O/n and E/n under
quadratic entropy (2.42) is

J_{DQ}(O/n, E/n) = -\frac{1}{4} \left( \frac{O - E}{n} \right)' A \left( \frac{O - E}{n} \right), \quad A = (a_{ij}).    (2.43)

The quantity J_{DQ}(O, E) is actually a metric on the space of vectors x' =
(x_1, \ldots, x_s), with x_i \geq 0 and x_1 + \cdots + x_s = 1. The goodness-of-fit test based
on D_Q(O, E) is

(2.44)

which is distributed asymptotically as a linear function of chi-square variables
on one degree of freedom. The test statistics introduced in this section provide
a rich field for further research.
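As one concrete instance (an illustrative choice, not prescribed above), take
\phi(x) = -x \log x, the Shannon entropy. Since the limiting law of (2.41) is a
linear combination of chi-squares rather than a single chi-square, the minimal
sketch below calibrates the statistic by simulating from the null multinomial.

# Minimal sketch (assumption: phi(x) = -x log x; null calibrated by simulation
# because (2.41) is a linear combination of chi-squares, not a single chi-square).
import numpy as np
rng = np.random.default_rng(0)

def phi(x):
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, -x * np.log(np.where(x > 0, x, 1.0)), 0.0)

def jensen_stat(O, pi):
    n = O.sum(); p = O / n
    return 8 * n * np.sum(phi((p + pi) / 2) - 0.5 * phi(p) - 0.5 * phi(pi))

pi0 = np.array([0.4, 0.3, 0.2, 0.1])     # fully specified null (k = 0)
O = np.array([43, 28, 17, 12])
t_obs = jensen_stat(O, pi0)
sims = np.array([jensen_stat(rng.multinomial(O.sum(), pi0), pi0)
                 for _ in range(5000)])
print(t_obs, np.mean(sims >= t_obs))     # Monte Carlo p-value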

2.5 Specification Tests for Continuous Distributions


The specification tests for the continuous case are somewhat complicated. One
method is to replace the continuous distribution by a histogram choosing a
set of class intervals, in which case the methods of Section 2.3 are applicable.
However, there are no definite rules for the choices of the number and the
actual boundaries of class intervals. Some ways of dealing with these problems
are described in great detail in the book by Greenwood and Nikulin (1996).
A general criterion for goodness-of-fit in the continuous case is

T = \| F_n(x) - F(x, \hat{\theta}) \|    (2.45)

choosing a suitable norm or a discrepancy measure for the difference between
F_n(\cdot), the edf (empirical distribution function), and F(\cdot, \hat{\theta}), the specified distri-
bution function with an estimated value or a specified value of the parameter
\theta. Some special cases of (2.45) are the Kolmogorov-Smirnov statistic

K_n = \sup_{|x| < \infty} \sqrt{n} \, |F_n(x) - F(x, \hat{\theta})|    (2.46)

and the Cramér-von Mises statistic

W_n^2 = n \int [F_n(x) - F(x, \hat{\theta})]^2 \, \Psi(F(x, \hat{\theta})) \, dF(x, \hat{\theta})    (2.47)

where \Psi is some nonnegative function. The asymptotic distributions of (2.46)
and (2.47) are very complicated except in the case of a specified value of \theta. Some
methods of modifying these statistics to simplify their distributions are given
by Durbin (1973). The possibility of using bootstrap methodology is discussed
in the book by Shao and Tu (1995). Reference may also be made to papers by
Beran (1986) and Romano (1988). The bootstrap methodology for goodness-of-
fit tests in the continuous case is not well developed and some further research
is needed.
There are a few other possible tests which are not explicitly mentioned in
the statistical literature and which seem to be worth studying.
Let x' = (x_{(1)}, \ldots, x_{(n)}) be the ordered statistics. To test a simple hypothesis
that the sample comes from a specified distribution F(x, \theta_0), we compute the
theoretical quantiles \tilde{x}_{(i)} defined by

F(\tilde{x}_{(i)}, \theta_0) = \frac{2i - 1}{2n}, \quad i = 1, \ldots, n    (2.48)

and compare the vectors x and \tilde{x} = (\tilde{x}_{(1)}, \ldots, \tilde{x}_{(n)})'. One possible statistic is

B_s = \frac{\sqrt{n} \sum (x_{(i)} - \tilde{x}_{(i)})^2}{\sum (x_{(i)} - \bar{x})^2 + \sum (\tilde{x}_{(i)} - \bar{\tilde{x}})^2}.    (2.49)

The sampling distribution can be obtained by simulation as F(x, \theta_0) is known.
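A minimal sketch of this simulation for B_s, assuming a fully specified
standard normal null (all names are illustrative):

# Minimal sketch (assumption: fully specified N(0,1) null).
import numpy as np
from scipy.stats import norm
rng = np.random.default_rng(1)

def B_s(x):
    n = len(x)
    xs = np.sort(x)
    q = norm.ppf((2 * np.arange(1, n + 1) - 1) / (2 * n))  # quantiles (2.48)
    num = np.sqrt(n) * np.sum((xs - q) ** 2)
    den = np.sum((xs - xs.mean()) ** 2) + np.sum((q - q.mean()) ** 2)
    return num / den

x = rng.standard_normal(50)                 # data (here actually from the null)
t_obs = B_s(x)
sims = np.array([B_s(rng.standard_normal(50)) for _ in range(2000)])
print(t_obs, np.mean(sims >= t_obs))        # simulated p-value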
To test a composite hypothesis that the sample comes from a distribution
F(x, \theta), where the function F is specified and the parameter \theta is unknown, we
can use a statistic B_c which is of the same form as (2.49) with \tilde{x}_{(i)} defined by

F(\tilde{x}_{(i)}, \hat{\theta}) = \frac{2i - 1}{2n}, \quad i = 1, \ldots, n    (2.50)

where \hat{\theta} is an efficient estimate of \theta. The distribution of B_c may be compli-
cated or may involve the unknown parameter. In such a case, it is worthwhile
examining whether the bootstrap method is applicable.
There is some simplification if F(x, \theta) belongs to a translation-scale family,
F[(x - \mu)/\sigma]. In such a case, we define \tilde{x}_{(i)} as

F(\tilde{x}_{(i)} \mid \mu = 0, \sigma = 1) = \frac{2i - 1}{2n}, \quad i = 1, \ldots, n.    (2.51)

Then the appropriate statistic for specification is the correlation coefficient r
computed from the pairs

(x_{(i)}, \tilde{x}_{(i)}), \quad i = 1, \ldots, n.

The distribution of r can be obtained by simulation, drawing samples from
F(x \mid \mu = 0, \sigma = 1).
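For the normal location-scale family this r is essentially a probability-plot
correlation, in the spirit of the Shapiro-Francia statistic; a minimal sketch of
the simulation:

# Minimal sketch (assumption: normal location-scale family).
import numpy as np
from scipy.stats import norm
rng = np.random.default_rng(2)

def corr_stat(x):
    n = len(x)
    q = norm.ppf((2 * np.arange(1, n + 1) - 1) / (2 * n))  # quantiles at mu=0, sigma=1
    return np.corrcoef(np.sort(x), q)[0, 1]

r_obs = corr_stat(rng.standard_normal(50))
sims = np.array([corr_stat(rng.standard_normal(50)) for _ in range(2000)])
print(r_obs, np.mean(sims <= r_obs))        # small r signals lack of fit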

In conclusion, the tests described in this paper may have different power
functions depending on the alternatives, but none of them dominates the others.
The purpose of data analysis is learning from data, and a global test is only
of an exploratory nature. Of greater interest is the study of the pattern of
deviations between observed and expected frequencies. No doubt a global test
provides some confidence in search for possible alternatives. Any reasonably
good global test will do for this purpose. Long live chi-square.

References
1. Beran, R. (1986). Simulated power functions, Annals of Statistics, 14,
151-173.

2. Burbea, J. and Rao, C. R. (1982a). On the convexity of some divergence
measures based on entropy functions, IEEE Transactions on Information
Theory, 28, 489-495.

3. Burbea, J. and Rao, C. R. (1982b). On the convexity of higher order
Jensen differences based on entropy functions, IEEE Transactions on
Information Theory, 28, 961-963.

4. Csiszár, I. (1963). Eine informationstheoretische Ungleichung und ihre
Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten,
Publ. Math. Inst. Hung. Acad. Sci., Ser. A, 8, 85-108.

5. Durbin, J. (1973). Distribution Theory for Tests Based on the Sample


Distribution Function, Vol. 9, Philadelphia: SIAM.

6. Efron, B. (1995). The statistical century, Royal Statistical Society News,


25, 1.

7. Fisher, R. A. (1958). Statistical Methods for Research Workers, 13th


Edition, New York: Hafner.

8. Greenwood, P. E. and Nikulin, M. S. (1996). A Guide to Chi-squared


Testing, New York: John Wiley & Sons.

9. Hall, W. J. and Mathiason, D. J. (1990). On large sample estimation and


testing in parametric models, International Statistical Review, 58, 77-97.

10. Koenker, R. (1987). A comparison of asymptotic testing methods for
l_1-regression, In Statistical Data Analysis (Ed., Y. Dodge), pp. 287-294,
North-Holland.

11. Lancaster, H. O. (1969). The Chi-squared Distribution, New York: John


Wiley & Sons.
12. Lehmann, E. L. (1999). Elements of Large Sample Theory, New York:
Springer-Verlag.
13. Morales, D., Pardo, L. and Vajda, I. (1995). Asymptotic divergence of
estimates of discrete distributions, Journal of Statistical Planning and
Inference, 48, 347-369.
14. Neyman, J. and Pearson, E. S. (1928). On the use and interpretation of
certain test criteria, Biometrika, 20, 175-240, 263-294.
15. Pardo, M. C. (1999). On Burbea-Rao divergence based goodness-of-fit
tests for multinomial models, Journal of Multivariate Analysis, 69, 65-
87.
16. Pearson, K. (1900). On the criterion that a given system of deviations
from the probable in the case of a correlated system of variables is such
that it can be reasonably supposed to have arisen from random sampling,
Philosophical Magazine, Series 5, 50, 157-175.
17. Rao, C. R. (1948). Large sample tests of statistical hypotheses concerning
several parameters with applications to problems of estimation, Proceed-
ings of the Cambridge Philosophical Society, 44, 50-57.
18. Rao, C. R. (1961). A study of large sample test criteria through properties
of efficient estimates, Sankhyā, Series A, 23, 25-40.
19. Rao, C. R. (1982). Diversity and dissimilarity coefficients: a unified ap-
proach, Theoretical Population Biology, 21, 24-43.
20. Rao, C. R. (1984). Convexity properties of entropy functions and analy-
sis of diversity, In Inequalities in Statistics and Probability, IMS Lecture
Notes, 5, 68-77.
21. Read, T. R. C. and Cressie, N. A. C. (1988). Goodness-of-fit Statistics
for Discrete Multivariate Data, New York: Springer-Verlag.
22. Romano, J. P. (1988). A bootstrap revival of some nonparametric distance
tests, Journal of the American Statistical Association, 83, 698-708.
23. Shao, J. and Tu, D. (1995). The Jackknife and Bootstrap, New York:
Springer-Verlag.

24. Wald, A. (1943). Tests of statistical hypotheses concerning several pa-
rameters when the number of observations is large, Transactions of the
American Mathematical Society, 54, 426-482.

25. Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio
for testing composite hypotheses, Annals of Mathematical Statistics, 9,
60-62.
3
Approximate Models

Peter J. Huber
Klosters, Switzerland

Abstract: The increasing size and complexity of data sets increasingly forces
us to deal with less than perfect, but ever more complicated models. I shall
discuss general issues of model fitting and of assessing the quality of fit, and
the important and often crucial roles of robustness and simulation.

Keywords and phrases: Robustness, simulation, Bayesian modeling, Markov


chain, Monte Carlo

3.1 Models
The anniversary of Karl Pearson's paper offers a timely opportunity for a digres-
sion: to discuss the role of models in contemporary and future statistics,
and the assessment of adequacy of their fit, in a somewhat broader context,
stressing necessary changes in philosophy rather than the technical nitty-gritty
they involve. The present paper elaborates on what I had tentatively named
"postmodern robustness" in Huber (1996, final section).
Karl Pearson's pioneering 1900 paper had been a first step. He had been con-
cerned exclusively with distributional models, and with global tests of goodness-
of-fit. He had disregarded problems caused by models containing free para-
meters: how to estimate such parameters, and how to adjust the count of the
number of degrees of freedom. Corresponding improvements then were achieved
by Fisher and others. Though, the basic paradigm of distributional models re-
mained in force and still forms the prevalent mental framework for statistical
modeling. For example, the classical texts in theoretical statistics, such as Cox
and Hinkley (1974) or Lehmann (1986), discuss goodness-of-fit tests only in the
context of distributional models. Apart from this, it is curious how current
statistical parlance categorizes models into classes - such as "linear models"


or "generalized linear models." The reason is of course that each such class per-
mits a specific formal statistical theory and analysis, or, expressing it the other
way around: each class constitutes the natural scope of a particular theoretical
framework. In my opinion, to be elaborated below, such narrow categorizations
can have undesirable consequences.
We now have a decisive advantage over Pearson: whenever there is an ar-
bitrarily complex but supposedly exact model and a test statistic, we can, at
least in principle, determine the distribution of that statistic under the null
hypothesis to an arbitrary degree of accuracy with the help of simulation. It is
this advantage which allows us to concentrate on the conceptual aspects, in par-
ticular on the traditionally suppressed problems posed by partial or otherwise
inaccurate models.
The word "model" has a bewilderingly wide semantic range, from tiny trains
to long-legged girls. Though, it hides a simple common feature: a model is
a representation of the essential aspects of some real thing in an idealized,
exemplary form, ignoring the inessential ones. The title of this paper is an
intentional pleonasm: by definition, a model is not an exact counterpart of the
real thing, but a judicious approximation. Mathematical statisticians, being
concerned with the pure ideas, sometimes tend to forget this, despite strong
admonitions to the contrary, such as by McCullagh and Nelder (1983, p. 6):
"all models are wrong." What is considered essential of course depends on the
current viewpoint - the same thing may need to be modeled in various different
ways. Progress in science usually occurs through thinking in models, they help
to separate the essential from the inessential.
A general discussion of the role mathematical models have played in sci-
ence should help to clarify the issues. They can be qualitative or quantitative,
theoretical or empirical, causal or phenomenological, deterministic or stochas-
tic, and very often are a mixture of all. The historical development of the
very first non-trivial mathematical models, namely those for planetary motion,
illustrates some salient points most nicely, namely the interplay between concep-
tual/qualitative and phenomenological/quantitative models, the discontinuous
jumps from a model class to the next, and the importance of precisely locating
the discrepancy rather than merely establishing the existence of discrepancies
by a global test of goodness-of-fit.
Early Greek scientists, foremost among them Anaxagoras (ca. 450 BC), had
tried to explain the irregular motions of the planets by a "stochastic model,"
namely by the random action of vortices. This model did not really explain
anything, it just offered a convenient excuse for their inability to understand
what was going on in the skies. In the 4th century BC, Eudoxos then invented
an incredibly clever qualitative model. He managed to explain the puzzling
retrograde motion of the planets deterministically in accordance with the philo-
sophical theory that celestial motions ought to be circular and uniform. For
each planet, he needed four concentric spheres attached to each other, all ro-

tating uniformly. Quantitatively, the model was not too good. In particular,
even after several improvements, it could not possibly explain the varying lu-
minosity of the planets, because in this model their distances from the earth
remained constant. About the same time the Babylonians devised empirical
models, apparently devoid of any philosophical underpinning; they managed to
describe the motions of the moon and the planets phenomenologically through
arithmetic schemes involving additive superposition of piecewise linear func-
tions. Around 130 AD, Ptolemy then constructed an even cleverer model than
Eudoxos. He retained the politically correct uniform circular motion, but the
circles were no longer concentric. With minor adjustments this type of model
withstood the tests of observational astronomy for almost 1500 years, until Ke-
pler, with the superior observations of Tycho Brahe at his disposal, found a
misfit of merely 8' (a quarter of the apparent diameter of sun or moon, and just
barely above observational accuracy) in the motion of a single planet (Mars).
He devised a fundamentally new model, replacing the uniform circular by el-
liptical motions. The laws describing his model later formed the essential basis
for Newton's theory of gravitation.
For us, Kepler's step is the most interesting and relevant. First, we note that
the geocentric Ptolemaic and the heliocentric Copernican models phenomeno-
logically are equivalent - both belong to the class of epicyclic models, and with
properly adjusted parameters the Ptolemaic model renders the phenomena, as
seen from the earth, absolutely identically to the Copernican one. But the
conceptual step to heliocentricity was a most crucial inducement for Kepler's
invention. Second, Kepler's own account shows how difficult it is to overcome
modeling prejudices. It required the genius of Kepler to jump over the shadow
cast forward by 1500 years of epicyclic modeling. To resist and overcome the
temptation to squeeze the data into a preconceived but wrong model class -
by piling up a few more uninterpretable parameters - may require comparable
independence of mind. I think that here we have a warning tale about the
dangers of categorizing models into narrowly specified classes.

3.2 Bayesian Modeling


In the past years, after the paper by Geman and Geman (1984), modeling
activity in the statistical literature has concentrated mainly on Bayesian mod-
eling and on Markov Chain Monte Carlo methods; see the survey by Besag et
al. (1995). There are points of close contact of this activity with the present
paper, namely the common concern with complex models and simulation.
Modeling issues provide one of the neatest discriminations between the rel-
ative strengths and weaknesses of the Bayesian and frequentist approaches;
the two complement each other in a most welcome fashion. The frequentist-
Pearsonian approach has difficulties with the comparative assessment of com-
peting models. P-values, when abused for that purpose, are awkward and un-
intuitive, to say the least, while the Bayesian approach provides a streamlined
mechanism for quantitative comparisons through posterior probabilities.
On the other hand, Bayesian statistics lacks a mechanism for assessing
goodness-of-fit in absolute terms. The crux of the matter, alluded to but not
elaborated in the excellent discussion of scientific learning and statistical in-
ference by George Box (1980, see p. 383f.), is as follows. Within orthodox
Bayesian statistics, we cannot even address the question whether a model Mi
under consideration is consonant with the data y. For that, we either must
step outside of the Bayes framework and in frequentist-Pearsonian manner per-
form a goodness-of-fit test of the model against an unspecified alternative (an
idea to which Box, being a Bayesian, but not a dogmatic one, obviously does
not object), or else apply "tentative overfitting" procedures. The latter are
based on the unwarranted presumption that by throwing in a few additional
parameters one can obtain a perfectly fitting model. But how and where to
insert those additional parameters often is far from obvious, unless one is aided
by hindsight and is able to abandon one's tacit prejudices about the class of
models. Remember that Kepler rejected the epicyclic Ptolemaic/Copernican
models because he could not obtain an adequate fit within that class.
The recent tutorial on Bayesian model averaging by Hoeting et al. (1999),
through the very tentativeness of their remarks on the choice of the class of
models over which to average (what are "models supported by the data"?),
further highlights the problems raised by model adequacy. To put the spotlight
on the central problem, we consider the special case of a small finite number of
discrete models (such that the fit cannot be improved by throwing in a few ad-
ditional parameters). Before we can meaningfully average, or assign meaningful
relative probabilities to the different models by Bayesian methods, we ought to
check whether the models collectively make sense. A mere verification by a
goodness-of-fit test that the class contains models not rejected by such a test,
is not good enough. Averaging over a class of spurious models is not better and
safer than treating a single non-rejected model as correct - see the next section
- and maybe even worse, because it obscures the underlying problem. Only
in very exceptional cases one can check appropriateness of a class of models by
a reverse application of a goodness-of-fit test, namely by testing and rejecting
the hypothesis that all models under consideration are wrong. For one of the
rare cases where such a "badness-of-fit" test works see Huber et al. (1982).
In short, while Bayesian modeling offers some food for thought, plus a bag
of simulation techniques to borrow from, it cannot help with regard to questions
of goodness-of-fit and of model adequacy.

3.3 Mathematical Statistics and Approximate Models
Karl Pearson, as can be seen from a few remarks scattered through his 1900
paper, had been perfectly aware that his distributional models were approxima-
tions. For example, he notes that a model fitting well for a particular sample
size would fit for smaller, but not necessarily for larger samples.
After Karl Pearson, under the influence of Fisher and others, mathemati-
cal statistics developed a particular modus operandi: take a simple, idealized
model, create an optimal procedure for this model, and then apply the proce-
dure to the real situation, either ignoring deviations altogether, or invoking a
vague continuity principle. 1 By 1960, this approach began to run out of steam:
on one hand, one began to run out of manageable simple models, on the other
hand, one realized that the continuity principle did not necessarily apply: op-
timization at an idealized model might lead to procedures that were unstable
under minor deviations. The robustness paradigm - explicitly permitting small
deviations from the idealized model when optimizing - carried only a few steps
further.
Statisticians, as a rule, pay little attention to what happens after a statistical
test has been performed. If a goodness-of-fit test rejects a model, we are left
with many alternative actions. Perhaps we do nothing (and continue to live
with a model certified to be inaccurate). Perhaps we tinker with the model by
adding and adjusting a few more features (and thereby destroy its conceptual
simplicity). Or we switch to an entirely different model, maybe one based
on a different theory, or maybe, in the absence of such a theory, to a purely
phenomenological one.
In the opposite case, if a goodness-of-fit test does not reject a model, sta-
tisticians have become accustomed to act as if it were true. Of course this is
logically inadmissible, even more so if with McCullagh and Nelder one believes
that all models are wrong a priori. The data analytic attitude makes more sense,
namely to use such tests merely as warning signs against over-interpretation:
renounce attempts at interpreting deviances from the model if a goodness-of-fit
test (with a much higher than usual level) fails to reject the model.
Moreover, treating a model that has not been rejected as correct can be
misleading and dangerous. Perhaps this is the main lesson we have learned
from robustness. A few illustrations follow.
(1) Model-optimality versus fit-optimality. Distributional goodness-of-fit
tests typically are based on the value of the distance measured between the em-
1 There has been curiously little concern in the statistical literature with the optimality of
goodness-of-fit tests themselves (i.e., of chi-square and F-tests), and it has been relegated to
notes, such as by Lehmann (1986, pp. 428-429).

pirical and the model distribution. To fix the idea, let us take the Kolmogorov
distance, which makes exposition simpler (but exactly the same arguments ap-
ply to a distance based on the chi-square test statistic):

K(\mu, \sigma) = \sup_x |F_n(x) - \Phi((x - \mu)/\sigma)|

where \hat{\mu}, \hat{\sigma} are either the traditional model-optimal estimates:

(a) \hat{\mu}, \hat{\sigma} = ML estimates for the model \Phi,

or the fit-optimal estimates:

(b) \hat{\mu}, \hat{\sigma} = \arg\min_{\mu, \sigma} K(\mu, \sigma).

In case (a), the test sometimes correctly rejects the hypothesis of normality
for the wrong reason, namely if an outlier inflates the ML estimate for \sigma, even
though the fit, in terms of the minimized distance (b), is good. The disturbing
fact is that the traditional recommendation, namely to estimate the unknown
free parameters by any asymptotically efficient estimate, viz. either maximum
likelihood or minimum chi-square, may have very different consequences depending
on which estimate one chooses.
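A minimal sketch of the contrast between (a) and (b), with one artificial gross
error appended to a normal sample; the fit-optimal distance in (b) is obtained
by direct numerical minimization over (\mu, \log \sigma):

# Minimal sketch (illustrative data: normal sample plus one gross error).
import numpy as np
from scipy.stats import norm, kstest
from scipy.optimize import minimize

x = np.append(np.random.default_rng(3).standard_normal(99), 8.0)  # one outlier

def K(params):                              # Kolmogorov distance at (mu, sigma)
    mu, log_sigma = params
    return kstest(x, norm(loc=mu, scale=np.exp(log_sigma)).cdf).statistic

K_model = K([x.mean(), np.log(x.std())])    # (a): ML estimates, sigma inflated
K_fit = minimize(K, x0=[np.median(x), 0.0], method="Nelder-Mead").fun  # (b)
print(K_model, K_fit)                       # the fit-optimal distance is smaller

With the outlier present, the ML estimate of \sigma is inflated, so K at (a)
exceeds the minimized distance (b): a test based on (a) can reject even though
a well-fitting normal exists.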
(2) Minimax robust estimates. Observational errors in most cases are excel-
lently modeled by the normal distribution, if we make allowance for occasional
gross errors (which may be of different origin). If we formalize this in terms of
a contamination model, then the normal part represents the essential aspects of
the observational process in an idealized form, the contamination part merely
models some disturbance factors unrelated to the quantities of interest. But
model-optimal estimates for the idealized model, e.g., the sample mean, are
unstable under small deviations in the tails of the distribution, while robust
estimates, such as judiciously chosen M-estimates, offer stability but lose very
little efficiency at the normal model.
Typically, we are not interested in estimating the parameters of the conta-
minant, or to test it for goodness-of-fit, and mostly not even able, given the
available sample size. Note that a least favorable distribution is not intended
to model the underlying situation (even though it may approximate the true
error distribution better than a normal model), its purpose is to provide a ML
estimate that is minimax robust.
(3) Linear fit. Assume that you want to fit a straight line to an approx-
imately linear function that can be observed with errors in a certain interval
(a,b). Assume that the goal of the fit is to minimize the integrated mean
square deviation between the true, approximately linear function and an ideal-
ized straight line. A model-optimal design will put one half of the observations
at each of the endpoints of the interval. A fit-optimal design will distribute the
observations roughly uniformly over the interval (a,b). The unexpected and

surprising fact is that subliminal deviations of the function from a straight line
(i.e. deviations too small to be detected by goodness-of-fit tests) may suffice
to make the fit-optimal design superior to the model-optimal design [cf. Huber
(1975b)].
In short: for the sake of robustness, we may sometimes prefer a (slightly)
wrong to a (possibly) right model.

3.4 Statistical Significance and Relevance


It is a truism that for a sufficiently large sample size any model ultimately will
be rejected. But this is merely a symptom of a more serious underlying problem,
involving the difference between statistical significance and physical relevance.
Since this is a problem of science rather than one of statistics, statisticians
unfortunately tend to ignore it. The following example is based on an actual
consulting case.
Example: A computer program used by physicists for analyzing certain
experiments automatically plotted the empirical spectrum on top of the theo-
retical spectrum, and in addition it offered a verbal assessment of the goodness-
of-fit (based on a chi-square statistic). Perplexingly, if the fit was visually perfect, it
would often be assessed by the program as poor, while a poor visual fit often
would be assessed as good. At first, one suspected a programming error in the
statistical calculations, but the reason was: the experiments had widely dif-
ferent signal-to-noise ratios, and if the random observational errors were large,
the test would not reach statistical significance, and if they were small, the
test would reject because of systematic errors (either in the models or in the
observations), too small to show up in the plots and probably also too small to
be physically relevant.
With increasing data sizes, the paradoxical situation of this example may
occur even more often: if a global goodness-of-fit test does not reject, then the
observations are too noisy to be useful, and if it rejects, the decision whether or
not to accept the model involves a delicate judgment of relevance rather than
of statistical significance.
In short, in real-life situations the interpretation of the results of goodness-
of-fit tests must rely on judgment of content rather than on P-values. For a
traditional mathematical statistician, the implied primacy of judgment over
mathematical proof and over statistical significance clearly goes against the
grain. John Tukey once said, discussing the future of data analysis (1962, p.
13): 'The most important maxim for data analysts to heed, and one which
many statisticians seem to have shunned, is this: "Far better an approximate
answer to the right question, which is often vague, than an exact answer to the
wrong question, which always can be made precise."' Analogous maxims apply

to modeling: a crude answer derived in a messy way from an approximately


right model is far better than an exact answer from a wrong but tractable
model.

3.5 Composite Models


The problems considered and solved by mathematical statisticians are model
problems themselves - they idealize exemplary situations. Unfortunately, these
problems and their solutions often are too simple to be directly applicable and
then ought to be regarded as mere guidelines. Applied statisticians and data
analysts today are confronted by increasingly more massive and more complex
real-life data, requiring ever more complex models. Such complex models then
must be pieced together from simpler parts.
Composite models facilitate a conceptual separation of the model into im-
portant and less important parts, and they also make it easier to locate where
the deviations from the model occur. Usually, only some parts of the model are
of interest, while other parts are a mere nuisance. Sometimes, in traditional
fashion, we may want to test for the presence or absence (perhaps more ac-
curately: lack of importance) of certain features by not including them in the
model. But we now have to go beyond. We may be willing to condone a lack
of fit in an irrelevant part, provided it does not seriously affect the estimation
of the relevant parameters of interest. Once more, this involves questions of
judgment rather than of mathematical statistics, and the price to pay is that
the traditional global tests of goodness-of-fit lose importance and may become
meaningless. Often, despite huge data sizes, preciously few degrees of freedom
are relevant for the important parts of the model, while the majority, because
of their sheer number, will highlight mere model inadequacies in the less impor-
tant parts. We may not be willing to spend the effort, or perhaps not even able,
to model those irrelevancies. We need local assessments of the relevant parts of
the fit, whether informally through plots of various kinds [cf. the remarks by
Box (1980)], or formally, through tests.
Parameter estimation in a composite model is tricky. Unfortunately, theo-
retical results applying to the components rarely carry over to the composition.
Usually, some version of backfitting will perform part of the work: assume there
are component models A_1, \ldots, A_n, and you know how to estimate the parame-
ters of A_i separately for each i. Fix preliminary parameter estimates for all
models except A_i, improve those of A_i, and cycle repeatedly through all i. This
should result in correct values of the parameter estimates. However, backfitting
creates devilishly tricky statistical problems with regard to the estimation of
standard errors of the estimated parameters, and with counting the number of
degrees of freedom for goodness-of-fit tests. The standard errors estimated from
the separate models A_i, keeping the other parts of the model fixed, may be se-
rious underestimates. Cross-validation estimates, for example, are invalidated
by repeated cycling.
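The cycling scheme itself is easy to write down; the minimal sketch below (an
invented additive model with two toy component fitters, not any particular
application) refits each component against partial residuals with the others
held fixed, which is precisely why naive per-component error estimates go
wrong.

# Minimal sketch (invented additive model y = f1 + f2 + noise; two toy fitters).
import numpy as np

def backfit(y, fitters, n_cycles=20):
    components = [np.zeros_like(y) for _ in fitters]
    for _ in range(n_cycles):
        for i, fit in enumerate(fitters):
            partial = y - sum(c for j, c in enumerate(components) if j != i)
            components[i] = fit(partial)   # improve A_i, all others held fixed
    return components

t = np.linspace(0, 1, 200)
y = 2 * t + np.sin(8 * t) + 0.1 * np.random.default_rng(4).standard_normal(200)
P = np.polynomial.polynomial
line = lambda r: P.polyval(t, P.polyfit(t, r, 1))   # fits a straight line
wave = lambda r: np.sin(8 * t) * (r @ np.sin(8 * t)) / (np.sin(8 * t) @ np.sin(8 * t))
f1, f2 = backfit(y, [line, wave])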
Once I had felt that stochastic modeling, despite its importance, belonged
so much to a particular field of application that it was difficult to discuss it in a
broad and general framework, and I had therefore excluded it from a discussion
of current issues in statistics [Huber (1975a, p. 86)J. I now would modify my
former stance. I still believe that an abstract and general discussion will fail
because it is practically impossible to establish a common basis of understanding
between the partners of such a discussion. On the other hand, a discussion based
on, and exemplified by, substantial and specific applications will be fruitful.
All my better examples are concerned with the modeling of various applied
stochastic processes. This is not an accident: stochastic processes are creating
the most involved modeling problems. The following example (on modeling the
rotation of the earth) may be the best I have met so far. It shows the intricacies
of stochastic models in real situations, in particular how much modeling and
data processing sometimes has to be done prior to any statistics, and how
different model components must and can be separated despite interactions.
The runner-up is in biology rather than in geophysics, and it operates in the
time domain rather than in the frequency domain [modeling of circadian cycles,
Brown (1988)].
Example: Modeling the length of day [Huber (2000)]. Because of tidal fric-
tion, the rotation of the earth slows down: the length of day (LOD) increases by
about 2 milliseconds per century. An analysis of medieval and ancient eclipses
back to about 700 BC had shown that on top of the systematic slow-down
there are very substantial random fluctuations. They must be taken into ac-
count when one is extrapolating astronomical calculations to a more distant
historical past. In terms of the LOD-process, these poorly determinable fluctu-
ations are compatible with a Brownian motion (or random walk) model, whose
increments have a variance of about 0.05 ms^2/year.
Being interested in estimating the size of the extrapolation errors, I won-
dered whether such a millennial Brownian motion component was a long-range
effect only, or whether it would be discernible also in the more accurate but
much shorter modern series of measurements, and in particular whether those
modern measurements might even permit a more accurate estimate of the vari-
ance of the increments. I could obtain three such series of different lengths and
observational accuracies, listing length-of-day values in intervals of 4 months,
5 days and 1 day, starting in the years 1830, 1962 and 1976 respectively. In
the power spectrum of a differenced series of LOD values, a Brownian motion
component should manifest itself in the low frequency part of the spectrum
as a horizontal tail end. The problem with the modern measurements is that
there are many nuisance effects, with periodicities ranging from days to tens
of years, and of a size comparable to the putative Brownian motion process.

Aided by hindsight, a reasonably comprehensive description of the components


of the LOD-process goes as follows.

(1) Systematic drift of about 2 ms/cy. The "true" rate cannot be estimated
very accurately from the data because of the random Brownian motion
(2) sitting on top.

(2) Brownian motion (or random walk process). Putative cause: cumulative
random changes in the rotational moment of inertia of the earth's mantle,
induced by plate tectonics.

(3) Decadal fluctuations, with an amplitude of several milliseconds. See Fig-


ure 3.1. The common opinion is that they have to do with damped oscil-
lations on the mantle-core boundary.

(4) Seasonal effects, with an amplitude of about 0.4 ms. Exchange of angular
momentum between the atmosphere and the solid earth, caused by sea-
sonal temperature changes and winds. See Figure 3.2. They can be taken
out cleanly by fitting a trigonometric polynomial to the LOD-process.

(5) "50-day Madden-Julian oscillation." This is a broad spectral hump near


periodicities of 40-50 days, corresponding to damped oscillations with a
root-mean-square amplitude of about 0.17 ms. See Figure 3.2. Appar-
ently, it is due to exchange of angular momentum between the atmosphere
and the solid earth, mediated by winds and putatively caused by random
temperature changes in the atmosphere. The underlying physical mecha-
nism can be modeled quite accurately by an AR(2) process.

(6) Measurement errors. Extremely inhomogeneous, 3-5 ms in the 19th cen-


tury, dominating the spectrum for periodicities below 8 years in the 4-
month series. For most practical purposes they are negligible after 1962,
i.e. then they are dominated by the contributions of (5). See Figure 3.1.

(7) Solid earth tides. These are reasonably well understood, deterministic
effects. In the later parts of the data series made available to me, namely
since 1982, they had been eliminated through preprocessing, but not be-
fore (remnants are the peaks in the 10-14 days range in Figure 3.2).

(8) Side effects of preprocessing (suppression of high frequency noise, includ-


ing measurement errors, through Kalman filtering). They affect the high
frequency parts of the spectrum and depress the spectral power for peri-
odicities below 30 days, leading to biased estimates of the parameters of
the AR(2) process (5). Their existence was discovered only when analyz-
ing a shorter, very accurate I-day series ranging from 1984-1996, which
showed analogous artifacts, but in the range below 5 days.

[Figure 3.1 here: two panels of the LOD 4m-series, plotted against time,
1830-1990; units ms.]
Figure 3.1: The 4-lunation series (covering the years 1830-1990 in 4-month
intervals) in the time domain: the actual data in the series, and a smoothed
version (obtained by forming moving averages). Note the changing level of the
observational noise and the decadal waves

[Figure 3.2 here: log10-spectrum of the differenced 5d series, plotted against
sqrt(freq). Data: actual (dotted), deseasoned (solid), 6 simulations (grey).
Model: superposition of a random walk and an AR(2) "50 day oscillation".]
Figure 3.2: Log10-spectrum of the differenced 5-day series (covering the years
1962-1995 in 5-day intervals). The cross-over between the random walk process
and the AR(2) model occurs near 8 months (243.81 days). On purpose, only
the two most prominent components (2) and (5) of the model are used

There is a delicate interplay between the components (2) and (3). We note
that random changes in the rotational moment of inertia of the earth's mantle,
as postulated in (2), by preservation of angular momentum cause wiggles in the
rotation rate of the mantle. These wiggles excite damped oscillations on the
mantle-core boundary, with a resonance in the decadal range. High-frequency
components only wiggle the mantle, but in the low-frequency range, mantle
and core move together as one solid body. Even though the exact coupling
mechanisms are not known, the net effect is that the spectrum of the differenced
LOD-series will be flat both below and above the resonance frequency, with a
19% smaller value in the low frequency range (the size of the drop is determined
by the known ratio between the moments of inertia of mantle and core), with
a hump of poorly determined shape in between.
The feature of interest in the spectrum of Figure 3.2 is the putative Brown-
ian motion (or random walk) component, which should manifest itself in a flat
low-frequency spectrum. We would like to check whether a Brownian motion
model fits, and we would like to estimate the size of its contribution. In the 5-
day series, the AR(2)-contribution corresponding to the 50-day Madden-Julian
oscillation dominates the spectrum of the deseasoned series for periodicities
shorter than about 8 months. Information about the putative Brownian motion
part can be gleaned only from the tiny low-frequency tail end of the spectrum
(which has been stretched out in Figure 3.2 for better visibility by plotting the
power spectra against the square root of the frequency rather than against the
frequency itself). It is obviously non-trivial to separate the Brownian motion
contribution from the AR(2) component. When estimating the AR(2) parame-
ters we must rely on the middle frequency range (where the 50-day oscillation is
dominant), and we must remember the tricks of the robustness trade and make
sure for example that the irrelevant peaks in the 10-14 day range do not bias
our parameter estimates. Then, in the low frequency range, where the Brown-
ian motion is dominant, the expected contribution of the AR(2) model must
be subtracted as a correction from the total spectrum, in order to estimate the
contribution of the Brownian motion. As an added complication, the very low
end of the data spectrum may already be somewhat inflated by the contribu-
tions of decadal fluctuations (they were not modeled in the simulations depicted
in Figure 3.2). In order to reduce processing artifacts caused by smoothing, the
actual parameter estimation was not based on the spectra shown in Figure 3.2,
but on the periodogram values themselves. This yielded an estimate of the vari-
ance of the increments of 0.072 ms^2/year, valid above the resonance frequency.
For the millennial range (below the resonance frequency), this translates into
0.058 ms^2/year, with a 95% confidence interval (0.040, 0.089). - By the way, for
somebody reared on Box-Jenkins time-series analysis, it is quite an educational
experience if he has to devise his own ARMA estimates based on a periodogram
segment!
In the 4-lunation series the high level of measurement noise creates com-

plications: it dominates the spectrum for periodicities shorter than 8-10 years.
This leaves deplorably few periodogram ordinates for estimating the Brown-
ian motion contribution in the spectrum. They straddle the decadal resonance
hump (3); after subtracting a somewhat crudely estimated contribution from
measurement errors leaking into that range, the average spectral power there
is 0.13 ms^2/year. While this value is boosted by the decadal hump, it still is
just barely significantly larger than the value 0.072 ms^2/year estimated from
the 5-day series. If we model (3) by a damped harmonic oscillator excited by
that Brownian motion and estimate the coupling parameters from the data (in
statistical parlance this amounts to modeling the mantle-core oscillations by an
AR(2) process), we can get an essentially perfect fit of the spectrum in that
range.
Among other things, this example illustrates that for some parts of the
model (usually the less interesting ones) we may have an abundance of degrees
of freedom, and a scarcity for the interesting parts.

3.6 The Role of Simulation


Following Karl Pearson's classical example, goodness-of-fit usually is assessed
in terms of weighted sums of squared deviations of the observations from their
expected values; their distribution then is approximated by a chi-square distribution
with a suitable number of degrees of freedom. As already Pearson had found,
the distribution theory of this test statistic is not exactly trivial, and with
complex, composite models, this kind of comparison is no longer feasible for
multiple reasons. The fact that iterative backfitting defies distribution theory
(except in trivial cases) is only one of them. Another serious problem is due to
artifacts caused by preprocessing and processing of the data. Remember that a
statistician rarely sees the raw data themselves - most large data collections in
the sciences are being heavily preprocessed already in the collection stage, and
the scientists not only tend to forget to mention it, but sometimes they also
forget exactly what they had done.
A possible, and often the only escape from this quandary is simulation:
compare the data with simulations of the model. (Note that in the goodness-
of-fit context resampling methods are inappropriate, quite apart from the fact
that they do not work for highly structured data.) This way, one can derive
a test from any arbitrary statistic: calculate the statistic and adjust nuisance
parameters in exactly the same way for the data and for a large number of
simulations of the model. Reject if the value of the statistic derived from
the data is sufficiently far out in the tails of the set of values derived from
the simulations. In particular, the spread of the simulated values of a point
estimate makes it possible to supplement the latter with an estimated standard error.
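To make the recipe concrete, here is a minimal R sketch of such a simulation-based test. It is a schematic illustration, not a prescription: the AR(1) null model, the lag-one autocorrelation statistic, and all names are hypothetical stand-ins for whatever model and statistic a given problem suggests.

    # Simulation-based goodness-of-fit test: compute the same statistic for
    # the data and for many simulations of the fitted model, and reject if
    # the data value lands far out in the tails of the simulated values.
    mc_gof_test <- function(x, simulate_model, statistic, n_sim = 999) {
      t_obs <- statistic(x)
      t_sim <- replicate(n_sim, statistic(simulate_model(length(x))))
      p_lo  <- (1 + sum(t_sim <= t_obs)) / (n_sim + 1)
      p_hi  <- (1 + sum(t_sim >= t_obs)) / (n_sim + 1)
      list(t_obs   = t_obs,
           p_value = min(1, 2 * min(p_lo, p_hi)),  # two-sided Monte Carlo p
           se_hat  = sd(t_sim))  # spread of simulated values of a point
    }                            # estimate doubles as its standard error

    # Hypothetical example: adequacy of an AR(1) model with phi = 0.5.
    sim_ar1  <- function(n) as.numeric(arima.sim(list(ar = 0.5), n = n))
    lag1_cor <- function(x) cor(x[-1], x[-length(x)])
    mc_gof_test(sim_ar1(200), sim_ar1, lag1_cor)

In the composite case, any nuisance-parameter fitting belongs inside statistic(), so that data and simulations are processed in exactly the same way.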

Of course, just like the classical approaches, the simulation methods only
measure stochastic variabilities intrinsic to a model assumed to be correct: they
implicitly assume that the estimated parameter values are so close to the true
ones that the latter can be replaced by the former without committing a serious
error when assessing the variability of an estimate or test statistic. Admittedly,
in practice there may be problems; for example, parameter estimation may fail
to converge for a small percentage of the samples, and this may selectively affect
the tails.
Under mild monotonicity assumptions, but at a relatively high computing
cost, it is possible to supplement point estimates with somewhat more reliable
confidence interval estimates than by the method just described. For example,
in order to find a lower confidence bound, one takes the model and replaces
the estimated value $\hat{\theta}$ of the model parameter of interest (in the example of the
preceding section: the variance of the increments of the Brownian motion) by
a suitably chosen smaller value $\theta_0$. Then one uses the thus modified model for
simulating 1000 data sets, and derives new parameter estimates from each of
these sets. If, say, 950 of the newly estimated values of $\theta$ then are smaller, and
50 larger than the original estimate $\hat{\theta}$, then $\theta_0$ constitutes an approximate
one-sided 95% lower confidence bound. The determination of a suitable $\theta_0$ is
expensive because it requires a considerable amount of trial and error.
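A schematic R version of this search is sketched below; the bisection strategy, the bracketing interval, and the 1000-replication budget are illustrative assumptions, not part of the text.

    # One-sided 95% lower confidence bound by simulation: find theta0 below
    # the estimate theta_hat such that about 95% of estimates from data sets
    # simulated under theta0 fall below theta_hat.
    sim_lower_bound <- function(theta_hat, simulate_at, estimate,
                                n_sim = 1000, level = 0.95, tol = 0.01) {
      lo <- theta_hat / 4                 # assumed to bracket the bound
      hi <- theta_hat
      while (hi - lo > tol * theta_hat) {
        theta0 <- (lo + hi) / 2
        est <- replicate(n_sim, estimate(simulate_at(theta0)))
        if (mean(est < theta_hat) > level) lo <- theta0  # theta0 too low
        else hi <- theta0                                # theta0 too high
      }
      (lo + hi) / 2
    }

Each bisection step costs n_sim full re-estimations, which is precisely the considerable amount of trial and error mentioned above.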
But thanks to simulation, a judgmental assessment of goodness-of-fit need
not even be based on a test statistic (whose selection always is delicate). The
principle is simple and inspired by the line-up methods used by the police: if
the actual data hides well among half a dozen or a dozen simulations of the
model, the model is judged acceptable; if the actual data sticks out like a sore
thumb, the model is no good. Figure 3.2 illustrates the approach by showing
the spectrum estimates resulting from 6 simulations of the model, indicating a
good fit in the interesting low frequency range. But the example also illustrates
a general problem of any global approach to goodness-of-fit, namely the peaks
in the 10-14 day range, which make the actual data set stick out like the sore
thumb mentioned above. In this case the origin of the discrepancy is under-
stood, and it is irrelevant because it lies in an uninteresting frequency range. A
more sophisticated version of the line-up method, permitting approximate sig-
nificance tests, is to create a pool composed of the actual data set and, say, 99
simulated sets. Somebody not knowing which is which has 5 attempts to pick
the actual set out of the pool, using any tools of his or her choice. If the actual
data set is not among the selected five, the model is deemed to be adequate.
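Staging the line-up itself is straightforward; the following R sketch (base graphics, with the panel order randomized so the judge cannot know where the data sit) is one possible arrangement.

    # "Police line-up" for model adequacy: hide the actual series among
    # simulations of the model and see whether a judge can pick it out.
    lineup <- function(x, simulate_model, n_decoys = 11) {
      panels <- c(list(x),
                  replicate(n_decoys, simulate_model(length(x)),
                            simplify = FALSE))
      ord <- sample(length(panels))           # shuffled display order
      op <- par(mfrow = c(3, 4), mar = c(2, 2, 1, 1)); on.exit(par(op))
      for (i in ord) plot(panels[[i]], type = "l", ann = FALSE)
      invisible(which(ord == 1))              # secret position of the data
    }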
3.7 Summary Conclusions
If we want to be able to deal with increasingly larger and more complex data
sets, we need to go beyond the current, overly narrow ingrained modeling con-
cepts of statistics. The models will become more complex, too, but a clean
statistical theory with mathematically rigorous results is possible only for clean
and simple models. We will have to pay a price, but we will also gain something
in the process. The questions will shift more and more from a mere yes-or-no
global check whether the model is adequate (I prefer the term "model ade-
quacy" to "model validity" - a model adequately rendering the observations
need not be valid in any intrinsic sense), to a detailed assessment of the quality
of the fit, to questions of interpreting the fit, and in particular to the need to
locate and interpret deviations from a model which is known to be imprecise,
and to separate essential deviations from irrelevant ones.

PART II
CHI-SQUARED TEST
4
Partitioning the Pearson-Fisher Chi-Squared
Goodness-of-Fit Statistic

G. D. Rayner
Deakin University, Geelong, Australia

Abstract: This paper presents an overview of Rayner and Best's (1989) cate-
gorised Neyman smooth goodness-of-fit score tests, along with an explanation
of recent work into how these tests can be used to construct components of the
Pearson-Fisher chi-squared test statistic in the presence of unknown nuisance
parameters. A short simulation study examining the size of these component
test statistics is also presented.

Keywords and phrases: Pearson-Fisher, chi-squared statistic decomposition,


Neyman smooth test, categorized composite null hypothesis

4.1 Introduction
Pearson's (1900) chi-squared test statistic

$$X_P^2 = \sum (\text{observed} - \text{expected})^2 / \text{expected}$$

was the first, is the most well known, and is probably the most frequently used
test for goodness-of-fit. This test is essentially an omnibus test in that it is
sensitive to a wide variety of different ways in which the data can be different
to the hypothesized distribution. For example, the chi-squared test is able to
detect data that differs from the hypothesized distribution in terms of any of
location, scale, shape, etc. It is interesting to try and consider the component
test statistics, sensitive only to more specific departures, that might combine
to produce the chi-squared test statistic.
For a sample space broken into $m$ classes, let $N_j$ ($j = 1, \dots, m$) be the
number of observations from the sample (of size $n = \sum_j N_j$) that fall into the


$j$-th class. If $p_j$ is the probability of an observation falling into the $j$-th class
under the completely specified hypothesized distribution, then the Pearson chi-squared test statistic
$$X_P^2 = \sum_{j=1}^{m} \frac{(N_j - np_j)^2}{np_j} \qquad (4.1)$$
is asymptotically $\chi^2_{m-1}$ distributed. A value of $X_P^2$ larger than the appropriate
critical value indicates that the observations are not distributed according to
the null hypothesis.
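For a completely specified null, equation (4.1) is a one-line computation; a small R illustration with made-up counts over five equiprobable classes:

    # Pearson chi-squared statistic for a completely specified distribution.
    N <- c(18, 25, 16, 21, 20)            # observed class counts
    p <- rep(1/5, 5)                      # hypothesized class probabilities
    n <- sum(N)
    X2 <- sum((N - n * p)^2 / (n * p))    # equation (4.1)
    pchisq(X2, df = length(N) - 1, lower.tail = FALSE)  # asymptotic p-value
    # chisq.test(N, p = p) gives the same statistic and p-value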
When the hypothesized distribution is not completely specified (that is, $q$
unknown parameters are present), then obtaining $\hat{p}_j$ (the estimated probability
of an observation falling into the $j$-th class) requires somehow estimating these $q$
unknown nuisance parameters. Different methods of estimating the nuisance
parameters result in different tests [see Rayner and Best (1989, Chapter 2)]. For
example, if the $q$ nuisance parameters are estimated using MLEs based on the
grouped data, then we obtain the Pearson-Fisher chi-squared test statistic
$$X_{PF}^2 = \sum_{j=1}^{m} \frac{(N_j - n\hat{p}_j)^2}{n\hat{p}_j}. \qquad (4.2)$$
This statistic is asymptotically $\chi^2_{m-q-1}$ distributed.


This paper begins in Section 4.2 with an introduction to Rayner and Best's
(1989) smooth goodness-of-fit tests for categorized data, both when there are
no nuisance parameters (the simple case), and when nuisance parameters are
present (the composite case). Rayner and Best's (1989) approach to partitioning
the Pearson-Fisher chi-squared statistic $X_{PF}^2$ is explained, along with its
major drawback: they supply only conditions for obtaining the component test
statistics, rather than a constructive method for doing so.
Section 4.3 outlines my recently presented method [Rayner (2000a)] for constructing
an interpretable decomposition of $X_{PF}^2$ from the restrictions given by
Rayner and Best (1989). Finally, size studies of these component test statistics
obtained under several different categorisations are presented and discussed in
Sections 4.4 and 4.5.

4.2 Neyman Smooth Goodness-of-Fit Tests


Rayner and Best (1989) base their smooth goodness-of-fit tests on an idea
of Neyman's (1937), where the null hypothesis density is embedded in a $k$-parameter
Neyman smooth alternative, such that when the vector $\theta$ of these $k$
parameters is $0$, this alternative is the same as the hypothesized distribution.
The score test for $\theta = 0$ versus $\theta \neq 0$, with all its desirable asymptotic
properties, is then used as a goodness-of-fit test.

4.2.1 Smooth goodness-of-fit tests for categorized data


In the simple categorized case, where no unknown (nuisance) parameters are
present, Rayner and Best's (1989) Neyman smooth tests of goodness-of-fit are
obtained by deriving the score test statistic for the category probabilities $p = (p_1, \dots, p_m)^T$. Let
$$\pi_j = C(\theta)\exp\left\{\sum_{i=1}^{k}\theta_i h_{i,j}\right\} p_j, \qquad j = 1, \dots, m,$$
where $\theta = (\theta_1, \dots, \theta_k)^T$ are $k \le m-1$ real parameters and $C(\theta)$ is a normalizing
constant (such that $\sum_j \pi_j = 1$), and for each $i$, the $h_{i,j}$ are the values taken by a
random variable $H_i$ with $P(H_i = h_{i,j}) = \pi_j$. Testing
$$H_0: \theta = 0 \quad\text{versus}\quad H_1: \theta \neq 0$$
decides between the null hypothesis probabilities $p$ and the $k$-dimensional Neyman
smooth alternative probabilities $\pi = (\pi_1, \dots, \pi_m)$.
Put $N = (N_1, \dots, N_m)^T$ as the vector of observed counts in each category,
$n = \sum_j N_j$ the sample size, $D = \mathrm{diag}(p_1, \dots, p_m)$, and the $k \times m$ matrix $H$ as
having entries $h_{i,j}$. Rayner and Best (1989) calculate the score statistic for this
situation as
$$S_k = (N - np)^T H^T \Sigma^{-1} H (N - np)/n,$$
where $\Sigma$ is the covariance matrix of the random variables $H_i$ under the null
hypothesis.
For $k = m-1$, requiring that $H$ satisfies $HDH^T = I_{m-1}$ and $Hp = 0$ gives
$\Sigma = I_{m-1}$, where $I_k$ is the $k \times k$ identity matrix. This means that the score
test statistic $S_k$ can be written as
$$S_{m-1} = (N - np)^T H^T H (N - np)/n = (N - np)^T (D^{-1} - D^{-1}pp^T D^{-1})(N - np)/n = \sum_{i=1}^{m}\frac{(N_i - np_i)^2}{np_i} = X_P^2,$$

the Pearson chi-squared statistic (see equation (4.1)). Defining
$$V = H(N - np)/\sqrt{n} = HN/\sqrt{n}$$
points the way towards partitioning $X_P^2 = V^T V$ into $m-1$ asymptotically
independent standard normal component test statistics
$$V_r = \sum_{j=1}^{m} h_{r,j}\,N_j/\sqrt{n}, \qquad r = 1, \dots, m-1.$$

The interpretation of these component test statistics depends entirely on $H$. In
the absence of a compelling reason otherwise, Rayner and Best (1989) recommend
selecting $h_{r,j}$ to be the $r$-th degree orthogonal polynomial evaluated at $j$,
or $h_{r,j} = h_r(j)$. Here, the restrictions on $H$ become
$$\sum_{j=1}^{m} h_r(j)\,h_s(j)\,p_j = \delta_{rs} \qquad (r, s = 0, 1, \dots, m-1;\; h_0 \equiv 1), \qquad (4.3)$$
which define these orthogonal polynomials. Rayner and Best (1989) suggest
that this selection allows an $r$-th order moment departure interpretation for
the $r$-th component test statistic $V_r$. That is, a significantly large value of $V_r$
indicates that the data depart from the hypothesized distribution in terms of
moments of order less than or equal to $r$.
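A minimal R sketch of this partition in the simple case is given below. The weighted Gram-Schmidt step is just one way of generating polynomials satisfying (4.3); it is an illustration, not Rayner and Best's own implementation.

    # Partition Pearson's X2 into components V_r built from polynomials
    # h_r(j) orthonormal with respect to the class probabilities p.
    smooth_components <- function(N, p) {
      m <- length(p); n <- sum(N); j <- seq_len(m)
      B <- outer(j, 0:(m - 1), "^")              # columns 1, j, j^2, ...
      H <- matrix(0, m, m)
      for (r in seq_len(m)) {                    # Gram-Schmidt, weights p
        v <- B[, r]
        for (s in seq_len(r - 1)) v <- v - sum(v * H[, s] * p) * H[, s]
        H[, r] <- v / sqrt(sum(v^2 * p))
      }
      V <- as.vector(t(H[, -1]) %*% (N - n * p)) / sqrt(n)  # drop h_0 = 1
      list(V = V, X2 = sum(V^2))    # sum of squared components recovers X2
    }

    smooth_components(c(18, 25, 16, 21, 20), rep(1/5, 5))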
Rayner and Best (1989) clearly describe how these component statistics
should be used: either (1) in an EDA fashion, to examine how the data differ
from the hypothesized distribution; or (2) when testing for the hypothesized
distribution, using only the first few components along with a residual statistic
(say, $V_1, V_2, V_3, V_4$ and $X_P^2 - V_1^2 - V_2^2 - V_3^2 - V_4^2$). It is important to
avoid post mortem testing using what are discovered to be the most significant
components.

4.2.2 Partitioning the Pearson-Fisher chi-squared statistic


In the composite case, $q$ unknown (nuisance) parameters $\beta = (\beta_1, \dots, \beta_q)$ are
present, so the category probabilities must be expressed in terms of them, that
is, $p = p(\beta)$. As before, Rayner and Best's (1989) Neyman smooth tests of
goodness-of-fit are obtained by deriving the score test statistic for
$$H_0: \theta = 0 \quad\text{versus}\quad H_1: \theta \neq 0,$$
using $k$-dimensional Neyman smooth alternative probabilities $\pi = (\pi_1, \dots, \pi_m)$
which must now be expressed as
$$\pi_j = C(\theta; \beta)\exp\left\{\sum_{i=1}^{k}\theta_i h_{i,j}(\beta)\right\} p_j(\beta), \qquad j = 1, \dots, m.$$
For $\hat\beta$ the MLEs of the nuisance parameters $\beta$, then $\hat{p} = p(\hat\beta)$ and $\hat{H} = H(\hat\beta)$.
Now the score statistic is
$$\hat{S}_k = (N - n\hat{p})^T \hat{H}^T \hat{\Sigma}^{-1} \hat{H} (N - n\hat{p})/n,$$
where $\hat{\Sigma} = \Sigma(\hat\beta)$.

Define the $q \times m$ matrix $\hat{W}$ so that $\hat{W}_{u,j} = \partial p_j/\partial\beta_u$ (for $u = 1, \dots, q$ and
$j = 1, \dots, m$), evaluated at $\hat\beta$. For $k = m - q - 1$, requiring that $\hat{H}$ satisfies
$$\hat{H}\hat{p} = 0, \quad \hat{H}\hat{W}^T = 0 \quad\text{and}\quad \hat{H}\hat{D}\hat{H}^T = I_{m-q-1} \qquad (4.4)$$
means that the score test statistic $\hat{S}_k$ can be written as
$$\hat{S}_{m-q-1} = (N - n\hat{p})^T \hat{H}^T \hat{H} (N - n\hat{p})/n = (N - n\hat{p})^T \hat{D}^{-1}(N - n\hat{p})/n = X_{PF}^2.$$
See Rayner and Best (1989) for details.


This is $X_{PF}^2$, the Pearson-Fisher chi-squared statistic (see equation (4.2)),
obtained by substituting MLEs for unknown parameters in the usual Pearson
chi-squared statistic. The $m - q - 1$ components of $X_{PF}^2$ are therefore given by
$$V = \hat{H}(N - n\hat{p})/\sqrt{n} = \hat{H}N/\sqrt{n},$$
since $X_{PF}^2 = V^T V$. However, Rayner and Best (1989) do not describe how to
obtain $\hat{H}$ from the restrictions in equation (4.4).
obtain fI from the restrictions in equation (4.4).
In Rayner and McAlevey (1990) and Rayner and Best (1990) some examples
are provided that use this construction. Even there, however, only a set of
restrictions on the test statistic is provided, and although the component test
statistics have evidently been calculated in these examples, the method used to
do so is not discussed in sufficient detail that it can be duplicated.

4.3 Constructing the Pearson-Fisher Decomposition


There are many possible decompositions of the form outlined in Section 4.2.2.
Recently [Rayner (2000a)] I demonstrated how each possible decomposition
corresponds precisely with the orthonormal scheme used. This allows $m - q - 1$
component statistics $\hat{V}_{q+1}, \dots, \hat{V}_{m-1}$ to be constructed, where $\hat{V}_r$ is a linear
combination of the $r$-th order orthogonal polynomials defined in equation (4.3).
The particular linear combination is chosen to ensure compatibility with the
simple case (where no nuisance parameters are present). See Rayner (2000a)
for details.
Define the $m \times m$ matrix $F$ of equation (4.5).
Obtain the $m - q - 1$ non-zero eigenvalues $\lambda_1, \dots, \lambda_{m-q-1}$ and normalized
corresponding eigenvectors $f_1, \dots, f_{m-q-1}$ of $F$ (arranged in non-decreasing
eigenvalue order). Define the $m \times m$ matrix $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_{m-q-1}, 0, \dots, 0)$. Also

let $U_1 = (f_1, \dots, f_{m-q-1})$ and $U = (U_1, U_2)$, where $U_2$ is an arbitrary $m \times (q+1)$
matrix of normalized column vectors chosen to be orthogonal to $f_1, \dots, f_{m-q-1}$.
With $U$ and $\Lambda$ defined in this way, $FU = U\Lambda$ and $UU^T = U^TU = I_m$, so that
$$F = U\Lambda U^T. \qquad (4.6)$$

For $H_0$ an $(m - q - 1) \times m$ matrix, choose its rows to be the values of $m - q - 1$
selected orthogonal polynomials $h_r(j)$ ($0 < r < m$) evaluated at $j = 0, \dots, m-1$
and defined by equation (4.3) above (for calculation details see Emerson, 1968).
Then define $F_0 = H_0^T H_0$ and obtain $U_0$ and $\Lambda_0$ from this $F_0$ in the same way as
$U$ and $\Lambda$ are obtained from $F$ (see equation (4.6)), to give $F_0 = U_0\Lambda_0 U_0^T$. Then
let $M$ be as given in Rayner (2000a), where $\Lambda_0^-$ is the inverse of the first
$m - q - 1$ rows and columns of $\Lambda_0$, embedded in the first $m - q - 1$ rows and
columns of an $m \times m$ matrix of zeros. Using $H = H_0 M$ and
$$V = H(N - n\hat{p})/\sqrt{n} = HN/\sqrt{n} = H_0 M N/\sqrt{n} \qquad (4.7)$$
provides the desired $m - q - 1$ component test statistics of $X_{PF}^2$ [Rayner (2000a)].
A program using the free statistical package R [Ihaka and Gentleman (1996)] to
obtain the component test statistics and p-values when testing for the normal
distribution is available from the author's website [Rayner (2000b)].

4.4 Simulation Study


A small size study was carried out to assess nominally 0.5%, 1%, 5%, and 10%
level tests for normality based on samples of size $n = 10, 20, 30$ and $50$. For
each sample size, $R = 10{,}000$ simulated samples were taken from the standard
normal distribution. The tests examined were the component tests $\hat{V}_3, \dots, \hat{V}_6$
under three different categorisation schemes:

(i) First, the uncategorised data were used, and the composite uncategorised
tests for normality of Rayner and Best (1989, Chapter 6) were calculated
to obtain $\hat{V}_{3,u}, \dots, \hat{V}_{6,u}$;

(ii) Then the data were moderately categorised into the $m_1 = 10$ classes
$(-\infty,-3]$, $(-3,-2]$, $(-2,-1.5]$, $(-1.5,-0.5]$, $(-0.5,0]$, $(0,0.5]$, $(0.5,1.5]$,
$(1.5,2]$, $(2,3]$, $(3,\infty)$ to obtain $\hat{V}_{3,c_1}, \dots, \hat{V}_{6,c_1}$ using the method in Section 4.3;

(iii) Finally, the data were coarsely categorised into $m_2 = 6$ classes $(-\infty,-2]$,
$(-2,-1]$, $(-1,0]$, $(0,1]$, $(1,2]$, $(2,\infty)$ to obtain $\hat{V}_{3,c_2}, \dots, \hat{V}_{5,c_2}$, also using
the method in Section 4.3.

While implementing my method of Section 4.3, there were infrequent problems
when some of the smaller samples were categorised into $m_2 = 6$ classes in
such a way that the $\hat{W}\hat{D}^{-1}\hat{W}^T$ term in equation (4.5) was singular (1.99% of
samples when $n = 10$ and 0.03% of samples when $n = 20$). This problem seldom
occurred for the more moderate categorisation into $m_1 = 10$ classes, though
for $n = 10$, 0.06% of all samples caused problems under both categorisations.
Samples caused problems with the moderate categorisation only when they also
caused problems with the coarser categorisation.
For these problem samples, only two adjacent central categories had any
observations. This meant that since $\hat\mu$ and $\hat\sigma$ were being fitted to the sample,
the estimated scale parameter $\hat\sigma$ could be made sufficiently small so as to all
but completely match the observed and expected counts, giving an extremely
small chi-squared statistic $X_{PF}^2$ (the largest $X_{PF}^2$ observed for these samples
was $1.21 \times 10^{-13}$). However, as $\hat\sigma \to 0$ the $\hat{W}\hat{D}^{-1}\hat{W}^T$ term in equation (4.5)
(and thus the estimated covariance matrix $\hat\Sigma$) becomes singular, preventing the
decomposition of $X_{PF}^2$. Due to numerical limitations, this started to occur
when $\hat\sigma \approx 0.1$. In practice, the estimated covariance matrix would be singular
for this reason only if an unrealistic categorisation regime were used.

4.5 Results and Discussion


Figure 4.1 compares the sampling distribution of the component statistics $\hat{V}_{r,u}$,
$\hat{V}_{r,c_1}$, and $\hat{V}_{r,c_2}$ ($r = 3, \dots, 6$) for samples of size $n = 20$ taken from the standard
normal distribution. The asymptotic normality of the $\hat{V}_r \sim N(0,1)$ statistics
seems to be manifesting itself already. Table 4.1 shows simulated component
test sizes for various different categorisations, sample sizes and nominal significance
levels using the asymptotic $\chi_1^2$ critical values. Table 4.2 gives simulated
critical values for these tests.
Carolan and Rayner (2000) observe that when performing composite tests
for normality (where the mean and standard deviation must be estimated) the
uncategorised smooth test statistics converge only slowly to their asymptotic
$\chi_1^2$ distribution. Tables 4.1 and 4.2 confirm that this is also the case for my
composite categorised smooth test statistic components. The component test
sizes are almost always too small and the asymptotic distribution overestimates
the critical value almost every time. In practice, using this asymptotic critical
value would give a conservative test, so normality would be rejected too rarely.

Figure 4.1: Sampling distribution of the $\hat{V}_3$, $\hat{V}_4$, $\hat{V}_5$, $\hat{V}_6$ statistics obtained from
$R = 10{,}000$ samples of size $n = 20$ taken from the standard normal distribution.
The top row is for the uncategorised data (u) using Rayner and Best's method
(1989, Chapter 6), and the other rows use my categorised method (Section 4.3)
with $m_1 = 10$ categories (middle, $c_1$) and $m_2 = 6$ categories (bottom, $c_2$)

Interestingly, this is in contrast to the fairly fast convergence of the distribution
of $\hat{V}_{r,c}$ that has been observed for simple (when there are no nuisance
parameters to estimate) categorised data. See, for example, the discussion of
size in Rayner and Rayner (2000). Their discussion is concerned with testing
for uniformity, but in the absence of nuisance parameters the distribution function
transformation can be used to state any goodness-of-fit problem in terms
of testing for uniformity.
Table 4.1: Simulated percentage test sizes using (asymptotic) critical values for the component tests $\hat{V}_3^2$, $\hat{V}_4^2$, $\hat{V}_5^2$ and $\hat{V}_6^2$ under different categorisations of the data: the uncategorised method (u) [Rayner and Best (1989, Chapter 6)], and my method under two different categorisation schemes ($c_1$ and $c_2$). Entries are given for nominal sizes (significance levels $\alpha$) of 0.5%, 1%, 5% and 10% and sample sizes $n = 10, 20, 30, 50$. [The body of the table is not legible in this copy.]

References

1. Carolan, A. C. and Rayner, J. C. W. (2000). A note on the asymptotic behaviour of smooth tests of goodness-of-fit, Manuscript in preparation.

2. Emerson, P. L. (1968). Numerical construction of orthogonal polynomials from a general recurrence formula, Biometrics, 24, 695-701.

3. Ihaka, R. and Gentleman, R. (1996). R: a language for data analysis and graphics, Journal of Computational and Graphical Statistics, 5, 299-314.

4. Neyman, J. (1937). "Smooth" test for goodness of fit, Skand. Aktuarietidskr., 20, 150-199.

5. Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can reasonably be supposed to have arisen from random sampling, Philosophical Magazine, 5th ser., 50, 157-175.

6. Rayner, G. D. (2000a). Components of the Pearson-Fisher chi-squared statistic, Statistics & Probability Letters (submitted).

7. Rayner, G. D. (2000b). Pearson-Fisher chi-squared decomposition programs and examples [Web documents], http://www3.cm.deakin.edu.au/~gdrayner/mypage/research/findH/. [Accessed: 30/11/2000].

8. Rayner, J. C. W. and Best, D. J. (1989). Smooth Tests of Goodness of Fit, New York: Oxford University Press.

9. Rayner, J. C. W. and Best, D. J. (1990). Smooth tests of goodness of fit: an overview, International Statistical Review, 58, 9-17.

10. Rayner, J. C. W. and McAlevey, L. G. (1990). Smooth goodness of fit tests for categorised composite null hypotheses, Statistics & Probability Letters, 9, 423-429.

11. Rayner, G. D. and Rayner, J. C. W. (2000). Class construction in Neyman smooth categorised testing for uniformity, Communications in Statistics: Simulation and Computation (submitted).
5
Statistical Tests for Normal Family in Presence of
Outlying Observations

Aïcha Zerbet
Universite Bordeaux 2, Bordeaux, France

Abstract: A package of Fortran programs is available for the statistical
analysis of normal data in the presence of outlying observations. First, the
Bol'shev test, based on the Chauvenet rule, is applied to detect all outlying
observations in a sample. Then a chi-squared type test, based on the statistic
of Nikulin-Rao-Robson-Moore with the Neyman-Pearson classes for grouping the
data, is applied to test normality. We include a practical application of
our software to the data of Milliken and the data of Daniel. The power of
the test of normality against the family of logistic distributions, formed
on the Neyman-Pearson classes, is also studied.

Keywords and phrases: Bol'shev test, chi-squared testing, Chauvenet rule,


logistic distribution, maximum likelihood estimator, Neyman-Pearson classes,
Nikulin-Rao-Robson-Moore statistic, normal distribution, outliers

5.1 The Chi-Squared Test of Normality in the


Univariate Case
Consider the problem of testing the hypothesis $H_0$ according to which the distribution
function of the independent identically distributed random variables
$X_1, \dots, X_n$ is $\Phi\left(\frac{x-\mu}{\sigma}\right)$, $|\mu| < \infty$, $\sigma > 0$, where $\Phi(x)$ is the distribution function
of the standard normal law.
Let $p = (p_1, \dots, p_r)$ be a vector of positive probabilities, $p_1 + \dots + p_r = 1$,
and define the $x_j$ by
$$x_0 = -\infty, \quad x_r = +\infty, \quad x_j = \Phi^{-1}(p_1 + \dots + p_j), \quad j = 1, \dots, r-1; \qquad \varphi(x) = \Phi'(x).$$


The maximum likelihood estimator of $\theta = (\mu, \sigma)$ is $\hat\theta_n = (\bar{X}_n, s_n^2)$, where
$$s_n^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2.$$
Let
$$Y_i = \frac{X_i - \bar{X}_n}{s_n}, \qquad i = 1, \dots, n.$$
We note that under $H_0$ the statistic $Y_i$ follows the so-called Thompson distribution
with $n - 2$ degrees of freedom:
$$P\{Y_i \le y\} = T_{n-2}(y) = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\sqrt{\pi(n-1)}\,\Gamma\left(\frac{n-2}{2}\right)}\int_{-\sqrt{n-1}}^{y}\left(1 - \frac{u^2}{n-1}\right)^{\frac{n-4}{2}}du, \qquad |y| < \sqrt{n-1},$$
which does not depend on $\mu$ and $\sigma^2$; $T_{n-2}\left(\frac{x - \bar{X}_n}{s_n}\right)$ is the MVUE for $\Phi\left(\frac{x-\mu}{\sigma}\right)$.
We consider $\nu^* = (\nu_1^*, \dots, \nu_r^*)$, the frequency vector obtained by grouping $Y_1, Y_2,
\dots, Y_n$ over the intervals $(x_0, x_1], (x_1, x_2], \dots, (x_{r-1}, x_r)$. For testing $H_0$ we consider,
following Drost (1988) and Zhang (1999), the statistic of Nikulin-Rao-Robson-Moore given by
$$Y_n^2 = X^2 + \frac{1}{n}\cdot\frac{\lambda_2\,\alpha^2(\nu^*) - 2\lambda_3\,\alpha(\nu^*)\beta(\nu^*) + \lambda_1\,\beta^2(\nu^*)}{\lambda_1\lambda_2 - \lambda_3^2},$$
where
$$X^2 = \sum_{i=1}^{r}\frac{(\nu_i^* - np_i)^2}{np_i}, \qquad \alpha(\nu^*) = \sum_{j=1}^{r}\frac{\nu_j^*\left(\varphi(x_j) - \varphi(x_{j-1})\right)}{p_j},$$
$$\beta(\nu^*) = \sum_{j=1}^{r}\frac{\nu_j^*\left(-x_j\varphi(x_j) + x_{j-1}\varphi(x_{j-1})\right)}{p_j},$$
$$\lambda_1 = 1 - \sum_{j=1}^{r}\frac{\left(\varphi(x_j) - \varphi(x_{j-1})\right)^2}{p_j}, \qquad \lambda_2 = 2 - \sum_{j=1}^{r}\frac{\left(-x_j\varphi(x_j) + x_{j-1}\varphi(x_{j-1})\right)^2}{p_j},$$
$$\lambda_3 = \sum_{j=1}^{r}\frac{\left(\varphi(x_j) - \varphi(x_{j-1})\right)\left(-x_j\varphi(x_j) + x_{j-1}\varphi(x_{j-1})\right)}{p_j}.$$

Theorem 5.1.1 Under $H_0$, the statistic $Y_n^2$ has in the limit, as $n \to \infty$,
the chi-squared distribution with $r - 1$ degrees of freedom.

Remark 5.1.1 We recommend choosing $r$ such that
$$r \le \min\left(\frac{1}{\alpha}, \log n\right),$$
where $\alpha$ is the significance level ($0 < \alpha < 0.5$). Note that Sturges' empirical
rule suggests
$$r = 1 + \log_2 n.$$
With this choice of $r$, the expected number of observations in each class is not
small. If there is no specific alternative to $H_0$, then it is reasonable to choose $p_i = 1/r$.
For more details, see Drost (1988), and Greenwood and Nikulin (1996).
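For equiprobable classes the whole construction is short in R. The sketch below follows the formulas of this section, with the quadratic-form correction in the form displayed above; it is a schematic transcription, not the authors' Fortran package.

    # Nikulin-Rao-Robson-Moore statistic for normality, r equiprobable classes.
    nrr_normal <- function(X, r) {
      n  <- length(X)
      mu <- mean(X); s <- sqrt(mean((X - mu)^2))       # MLEs of mu, sigma
      p  <- rep(1/r, r)
      x  <- qnorm(cumsum(p))[-r]                       # x_1, ..., x_{r-1}
      nu <- tabulate(findInterval((X - mu)/s, x) + 1, nbins = r)
      dphi  <- diff(c(0, dnorm(x), 0))                 # phi(x_j) - phi(x_{j-1})
      dxphi <- -diff(c(0, x * dnorm(x), 0))            # -x_j phi(x_j) + x_{j-1} phi(x_{j-1})
      X2 <- sum((nu - n*p)^2 / (n*p))
      a  <- sum(nu * dphi / p);  b  <- sum(nu * dxphi / p)
      l1 <- 1 - sum(dphi^2 / p); l2 <- 2 - sum(dxphi^2 / p)
      l3 <- sum(dphi * dxphi / p)
      X2 + (l2*a^2 - 2*l3*a*b + l1*b^2) / (n * (l1*l2 - l3^2))
    }

    # e.g. nrr_normal(rnorm(58), r = 3), compared with qchisq(0.9, df = 2)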

5.1.1 Example: Analysis of the data of Milliken


We consider the data of Milliken [Linnik (1962)] from his famous experiment
on the evaluation of the charge of the electron:
4.781 4.795 4.769 4.792 4.770 4.775 4.772 4.791 4.782 4.767 4.764 4.776
4.771 4.789 4.772 4.789 4.764 4.774 4.778 4.791 4.777 4.765 4.785 4.805
4.768 4.801 4.785 4.783 4.808 4.771 4.809 4.790 4.779 4.788 4.772 4.791
4.788 4.783 4.740 4.775 4.761 4.792 4.758 4.764 4.810 4.799 4.799 4.797
4.790 4.747 4.769 4.806 4.779 4.785 4.790 4.777 4.749 4.781

We suppose, under our hypothesis $H_0$, that the data are realizations of a normal
$N(\mu, \sigma^2)$ sample of size 58. On the basis of these data we have $\bar{X}_n = 4.7808$
and $s_n^2 = 22980 \times 10^{-8}$. To test $H_0$, we construct a chi-squared test based on
the statistic $Y_n^2$ with equiprobable classes. The hypothesis should be rejected
if $Y_n^2 > \chi_\alpha$, where $\chi_\alpha$ is the upper $\alpha$-quantile of the chi-squared distribution with $r - 1$
degrees of freedom. If we choose, for example, $r = 3$ and $\alpha = 0.1$, then the
results of the computations are as follows:

So, in this case we have $\chi_\alpha = 4.6052$. Since the observed value of $Y_{58}^2$ is less
than $\chi_\alpha$, we accept the hypothesis of normality $H_0$.

5.2 Bol'shev Test for Outliers


We present here the test of Bol'shev [see Bol'shev and Ubaidullaeva (1974)],
based on the rule of Chauvenet (1863), for the detection of outlying observations
in a set of experimental data. We also underline the importance of the
test of Bol'shev, which does not suppose that one knows the exact number of
outliers, but only that it does not exceed a maximum number $s$, contrary
to other tests, like those of Chauvenet (1863), Pearson and Chandrasekar
(1936), Grubbs (1950) and Wilks (1963), which suppose in advance that one
knows this number exactly.

5.2.1 Stages of application of the test of Bol'shev


Given a normal $N(\mu, \sigma)$ sample $X = (X_1, X_2, \dots, X_n)$, first of all we construct
the statistics $V_1, \dots, V_n$, which are uniform on the interval $[0, n]$, where
$$V_i = \begin{cases} n[1 - T_{n-2}(Y_i)], & \text{in the unilateral case},\\ n[1 - T_{n-2}(|Y_i|)], & \text{in the bilateral case}, \end{cases} \qquad i = 1, \dots, n.$$
Using the statistics $V_1, \dots, V_n$, we construct the vector of order statistics
$$V_{(\cdot)} = (V_{(1)}, \dots, V_{(n)}), \qquad V_{(1)} \le \dots \le V_{(n)}.$$
Then, supposing $a$ is fixed ($0 \le a \le 0.5$), we compute $j(i)$ for all $X_i$, $i = 1, \dots, n$,
where $j(i)$ is the index of the $V_{(j)}$ corresponding to $X_i$. If
$$\frac{V_{(j(i))}}{j(i)} \le \frac{a}{\lambda},$$
then we declare that $X_i$ is an outlying observation ($\lambda = 1$ in the unilateral case;
$\lambda = 2$ in the bilateral case).
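The stages above translate directly into R; in the sketch below (bilateral case) the Thompson distribution function is obtained by numerical integration of the density given in Section 5.1. This is an illustrative transcription, not the authors' Fortran implementation.

    # Thompson distribution function T_{n-2}(y), |y| < sqrt(n-1).
    thompson_cdf <- function(y, n) {
      const <- exp(lgamma((n - 1)/2) - lgamma((n - 2)/2)) / sqrt(pi * (n - 1))
      f <- function(u) (1 - u^2/(n - 1))^((n - 4)/2)
      const * integrate(f, -sqrt(n - 1), y)$value
    }

    # Bol'shev's outlier test, bilateral case (lambda = 2).
    bolshev_outliers <- function(X, a = 0.1, lambda = 2) {
      n <- length(X)
      Y <- (X - mean(X)) / sqrt(mean((X - mean(X))^2))     # Y_i
      V <- n * (1 - sapply(abs(Y), thompson_cdf, n = n))   # V_i in [0, n]
      j <- rank(V, ties.method = "first")    # j(i), so V_{(j(i))} = V_i
      which(V / j <= a / lambda)             # indices declared outliers
    }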
5.2.2 Example 2: Analysis of the data of Daniel (1959)
Daniel records the results of a factorial experiment of the form $2^5$ (5 factors,
each at two levels), where the 31 contrasts, in ascending order of absolute value,
are

0  0.0281  -0.0561  -0.0842  -0.0982  0.1263  0.1684  0.1964  0.2245  -0.2526
0.2947  -0.3087  0.3929  0.4069  0.4209  0.4350  0.4630  -0.4771  0.5472  0.6595
0.7437  -0.7437  -0.7577  -0.8138  -0.8138  -0.8980  1.080  -1.305  2.147  -2.666
-3.143

By applying the chi-squared test of normality to these data ($\alpha = 0.1$), we obtain
for $r = 4$ the following results:

Since the value of the statistic $Y_n^2$ is higher than the quantile $\chi_\alpha = 6.2514$ of
the chi-squared law with 3 degrees of freedom corresponding to the significance
level $\alpha = 0.1$, we must reject the null hypothesis.
Carrying out the test of Bol'shev to detect the outliers at the same significance
level $\alpha = 0.1$, we conclude that the observation $X_{31} = -3.143$ is an
outlier. We apply the chi-squared test again to the remainder of the data after
elimination of the outlier $X_{31}$. We then obtain the following results:

This time, we must accept the null hypothesis since $Y_{30}^2 < 6.2514$, noting
that the earlier rejection of the hypothesis $H_0$ was due to the presence of one
outlier.

5.3 Power of the Chi-Squared Test


In this section, we wish to test the hypothesis $H_0$ against $H_n$, according to
which the density function of $X_1$ is the mixture of the normal and logistic
densities
$$\frac{1}{\sigma}f_n\left(\frac{x-\mu}{\sigma}\right) = \left(1 - \frac{1}{\sqrt{n}}\right)\frac{1}{\sigma}\,\varphi\left(\frac{x-\mu}{\sigma}\right) + \frac{1}{\sigma\sqrt{n}}\,g\left(\frac{x-\mu}{\sigma}\right),$$
where $g$ is the logistic density function,
$$g(x) = \frac{\pi}{\sqrt{3}}\cdot\frac{e^{-\pi x/\sqrt{3}}}{\left(1 + e^{-\pi x/\sqrt{3}}\right)^2}.$$

Since we test $H_0$ against $H_n$, it is better [see, for example, Aguirre and Nikulin
(1994)] to consider the Neyman-Pearson classes $I_1$ and $I_2$ for grouping the data,
where
$$I_1 = \left\{x : \varphi\left(\frac{x-\mu}{\sigma}\right) \ge f_n\left(\frac{x-\mu}{\sigma}\right)\right\} \quad\text{and}\quad I_2 = \left\{x : \varphi\left(\frac{x-\mu}{\sigma}\right) < f_n\left(\frac{x-\mu}{\sigma}\right)\right\},$$
for which we need to solve the equation
$$\varphi\left(\frac{x-\mu}{\sigma}\right) = f_n\left(\frac{x-\mu}{\sigma}\right), \qquad\text{which is equivalent to}\qquad \varphi\left(\frac{x-\mu}{\sigma}\right) = g\left(\frac{x-\mu}{\sigma}\right).$$
Let $a_i = a_i(\hat\theta_n)$, $(i = 1, \dots, m)$, be the roots of the last equation, and let
$$\int_{a_{j-1}(\hat\theta)}^{a_j(\hat\theta)}\frac{1}{s_n}\,f_n\left(\frac{x - \bar{X}_n}{s_n}\right)dx = p_j + \frac{1}{\sqrt{n}}\,c_j(\hat\theta_n), \qquad j = 1, 2, \dots, m.$$
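Numerically, the roots $a_i$ are easy to locate; a short R sketch, written for the unit-variance logistic density displayed above (that standardization is an assumption of this sketch):

    # Roots of phi(x) = g(x): standard normal vs. unit-variance logistic.
    g <- function(x) {
      b <- pi / sqrt(3)                     # scale for unit variance
      b * exp(-b * x) / (1 + exp(-b * x))^2
    }
    h <- function(x) dnorm(x) - g(x)
    grid <- seq(-6, 6, by = 0.01)           # sign changes bracket the roots
    i <- which(diff(sign(h(grid))) != 0)
    sapply(i, function(k) uniroot(h, c(grid[k], grid[k + 1]))$root)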

For example, Figure 5.1 shows the resulting classes.

Figure 5.1: Neyman-Pearson classes

The limit distribution of the test statistic is given by the following theorem.

Theorem 5.3.1

where

the elements $g_{kl}$ of $G$ are

and

and $h_{kl}$ are the elements of the Fisher information matrix.

The asymptotic distribution of the statistic $Y_n^2$ is the noncentral chi-squared
distribution, which can be approximated for small $\lambda$ by a central chi-squared
distribution using Patnaik's method.

Acknowledgements. This research was supported by the Conseil Régional
d'Aquitaine, Grant 20000204009, and by the Comité Mixte Inter-Universitaire
Franco-Marocain (C.M.I.F.M.), No. 00/211/MA.

References
1. Aguirre, N. and Nikulin, M. S. (1994). Goodness-of-fit test for the family
of logistic distributions, Qüestiió, 18, 317-335.
2. Bol'shev, L. N. and Ubaidullaeva, M. (1974). Chauvenet's test in the
classical theory of errors, Theory of Probability and its Applications, 19,
683-692.
3. Chauvenet, W. (1863). A Manual of Spherical and Practical Astronomy,
II, Philadelphia.
4. Daniel, C. (1959). Use of half-normal plots in interpreting factorial two-
level experiments, Technometrics, 1, 311-341.
5. Drost, F. C. (1988). Asymptotics for generalised chi-square goodness-of-fit
tests, CWI Tract 48, Amsterdam: Centre for Mathematics and Computer Science.
6. Greenwood, P. and Nikulin, M. (1996). A Guide to Chi-squared Testing,
New York: John Wiley & Sons.
7. Grubbs, F. E. (1950). Sample criteria for testing outlying observations,
Annals of Mathematical Statistics, 21, 27-58.
8. Linnik, Yu. V. (1962). The Method of Least Squares and the Principles of
the Mathematical-Statistical Theory of Processing of Observations, Second
revised and augmented edition, Moscow: Gosudarstv. Izdat. Fiz.-Mat. Lit.
9. Moore, D. S. and Spruill, M. C. (1975). Unified large-sample theory of
general chi-squared statistics for tests of fit, Annals of Statistics, 3, 599-
616.
10. Pearson, E. S. and Chandrasekar, C. (1936). The efficiency of statistical
tools and a criterion for rejection of outlying observation, Biometrika, 28,
308-320.
11. Rao, K. C. and Robson, D. S. (1974). A chi-squared statistic for goodness-
of-fit tests within the exponential family, Communications in Statistics,
3, 1139-1153.

12. Zhang, B. (1999). A chi-squared goodness-of-fit test for logistic regression


models based on case-control data, Biometrika, 86, 531-539.

13. Wilks, S. S. (1963). Multivariate statistical outliers, Sankhya, Series A,


25, 407-426.
6
Chi-Squared Test for the Law of Annual Death
Rates: Case with Censoring for Life Insurance Files

Leo Gerville-Reache
Universite Bordeaux 2, Bordeaux, France

Abstract: The object of this article is to set up a chi-squared test in the
case of a law of multidimensional binomial type when the observations are
censored. This problem arises, for example, in the construction of a goodness-of-fit
test for the law of annual death rates used in life insurance to calculate
various premiums. For the non-censored case, with a complete example for the
Makeham law, see Gerville-Reache and Nikulin (2000).

Keywords and phrases: Chi-squared test, censoring, composite hypothesis,
conditional probabilities

6.1 Introduction
Consider an individual of age $x$ (in years) at time 0, taken as the origin. Denote
by $T_x$ his residual lifetime counted from this origin, which is a random variable.
One characterizes the probability law of $T_x$ by the probability of death
$$_tq_x = P\{T_x \le t\}, \qquad t > 0, \; x > 0. \qquad (6.1)$$
Then we define the annual death rate
$$q_x = {}_1q_x = P\{T_x \le 1\}, \qquad x > 0. \qquad (6.2)$$
Also we introduce the instantaneous death rate $\mu_x$, so that
$$q_x = 1 - \exp\left(-\int_x^{x+1}\mu_y\,dy\right), \qquad x > 0.$$
Makeham proposed the following formula:
$$\mu_x = \alpha + \beta c^x, \qquad \alpha > 0, \; \beta > 0, \; c > 1. \qquad (6.3)$$


So we obtain easily the annual death rate for the Makeham law as
$$q_x = 1 - \exp\left(-\alpha - \beta\,\frac{c-1}{\log c}\,c^x\right), \qquad \alpha > 0, \; \beta > 0, \; c > 1. \qquad (6.4)$$

6.2 Chi-Squared Goodness-of-Fit Test


6.2.1 Statistics with censoring

One observes over a period of one year a population of $N$ people, assumed to be
mutually independent with respect to the event of death. Grouping the people
of the same age, one obtains at the beginning of the observation $w$ groups
of people: the group $G_x$ contains $l_x$ people of age $x$ ($x = 0, \dots, w-1$).
It is supposed that one does not observe people of age higher than $w$. Thus we
obtain $w$ independent groups. In the group $G_x$, each individual is supposed to
have the same probability $q_x$ of dying during the year.
Let $t_{i_x}$ be the relative date of entry into the study of the individual $i_x$ of age
$x$, and $s_{i_x}$ its date of exit (death or disappearance from the files), with $0 \le t_{i_x} <
s_{i_x} \le 1$. Let $I_x$ be the set of indices of individuals of age $x$ whose deaths were
observed during the study period. This is the principle of censoring.
At the end of the observation, one counts for each group $G_x$ the number of
deaths $D_x^*$, with which we define the annual death rate observed at age $x$ [see
Gerber (1990, pp. 109-114)] as
$$Q_x^* = \frac{D_x^*}{\sum_{i_x=1}^{l_x}(s_{i_x} - t_{i_x}) + \sum_{i_x \in I_x}(1 - s_{i_x})}. \qquad (6.5)$$
We notice that if $s_{i_x} = 1$ ($\forall i_x \notin I_x$) and $t_{i_x} = 0$ ($\forall i_x$), then
$Q_x^* = D_x^*/l_x$ (this is the case without censoring). It is supposed that censoring is
independent of death and age.
Let
$$\bar{s}_x = \frac{1}{l_x}\sum_{i_x=1}^{l_x}s_{i_x}, \qquad \bar{t}_x = \frac{1}{l_x}\sum_{i_x=1}^{l_x}t_{i_x} \qquad\text{and}\qquad \bar{s}_{I_x} = \frac{1}{D_x^*}\sum_{i_x \in I_x}s_{i_x};$$
we obtain then
$$Q_x^* = \frac{D_x^*}{l_x(\bar{s}_x - \bar{t}_x) + D_x^*(1 - \bar{s}_{I_x})}.$$
As $D_x^*$ is small compared to $l_x$, and since for a reasonable death rate $D_x^*(1 - \bar{s}_{I_x})$
is negligible compared to $l_x^* = l_x(\bar{s}_x - \bar{t}_x)$, we note that
$$Q_x^* \approx \frac{D_x^*}{l_x^*}.$$
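A toy R illustration of this estimator for a single age group, with made-up entry and exit dates:

    # Observed annual death rate Q*_x for one age group under censoring.
    t_in  <- c(0.00, 0.00, 0.25, 0.10, 0.00)       # entry dates t_ix
    s_out <- c(1.00, 0.40, 1.00, 0.80, 1.00)       # exit dates s_ix
    dead  <- c(FALSE, FALSE, FALSE, TRUE, FALSE)   # death observed?
    D  <- sum(dead)
    Qx <- D / (sum(s_out - t_in) + sum(1 - s_out[dead]))   # equation (6.5)
    Qx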

Let $T_{i_x}$ be the residual lifetime of the individual $i_x$; then
$$q_x = P\{T_{i_x} \le 1 \mid T_{i_x} > 0\}. \qquad (6.6)$$
It is reasonable then to consider the following approximation:
$$q_{i_x} = P\{T_{i_x} \le s_{i_x} \mid T_{i_x} > t_{i_x}\} \approx (s_{i_x} - t_{i_x})\,q_x. \qquad (6.7)$$
We obtain, by taking an average, that $D_x^*$ follows approximately a binomial
law with parameters
$$(l_x;\; (\bar{s}_x - \bar{t}_x)\,q_x).$$
For rather large $D_x^*$, we obtain that $D_x^*$ follows asymptotically a normal law with
parameters $(l_x^* q_x;\; l_x^* q_x(1 - q_x))$. Because $1 - q_x$ is very close to $1 - q_x(\bar{s}_x - \bar{t}_x)$
for a reasonable amount of censoring, we deduce that $Q_x^*$ is an efficient and
asymptotically normally distributed estimator of $q_x$.
We obtain naturally that

X2 = L
w-l (D* [*
x - xqx
)2
(6.8)
w x=o l~qx(1- qx))
Follows asymptotically a chi-squared law with w degrees of freedom.

6.2.2 Goodness-of-fit test for a composite hypothesis


One wants to test the composite hypothesis $H_0$ according to which $q_x$ comes
from a parametric family of functions
$$q_x = q_x(\theta), \qquad\text{where } \theta = (\theta_1, \theta_2, \dots, \theta_s)^t \in \Theta \subseteq \mathbb{R}^s,$$
with $s < w$. In this case, we must first build an estimator $\hat\theta$ of $\theta$, and then
establish the asymptotic distribution of
$$X_w^2(\hat\theta) = \sum_{x=0}^{w-1}\frac{(D_x^* - l_x^* q_x(\hat\theta))^2}{l_x^* q_x(\hat\theta)(1 - q_x(\hat\theta))}. \qquad (6.9)$$
It should be noted here that the statistic $X_w^2(\hat\theta)$ is different from the traditional
statistic of Pearson insofar as we define a goodness-of-fit test on conditional
probabilities. We saw previously that the $D_x^*$ are independent random
variables which follow binomial laws with parameters $l_x$ and $(\bar{s}_x - \bar{t}_x)q_x$.
The likelihood function of $(D_0^*, D_1^*, \dots, D_{w-1}^*)^t$ is
$$L(\theta) = \prod_{x=0}^{w-1}C_{l_x}^{D_x^*}\left[(\bar{s}_x - \bar{t}_x)q_x(\theta)\right]^{D_x^*}\left[1 - (\bar{s}_x - \bar{t}_x)q_x(\theta)\right]^{l_x - D_x^*}. \qquad (6.10)$$
One takes the estimator which maximizes the likelihood function:
$$\hat\theta = \arg\max_{\theta}L(\theta). \qquad (6.11)$$
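To make the estimation step concrete, here is a schematic R maximization of (6.10) for the Makeham rates (6.4) using optim; the log-reparametrization and the starting values are illustrative assumptions, not part of the chapter.

    # Maximize the censored binomial likelihood (6.10) under Makeham's law.
    q_makeham <- function(x, alpha, beta, cc)            # equation (6.4)
      1 - exp(-alpha - beta * (cc - 1) / log(cc) * cc^x)

    fit_makeham <- function(D, l, expo) {   # expo = s.bar_x - t.bar_x
      x <- seq_along(D) - 1                 # ages 0, ..., w - 1
      negloglik <- function(par) {          # par = log(alpha, beta, c - 1)
        th <- exp(par)
        q  <- expo * q_makeham(x, th[1], th[2], 1 + th[3])
        if (any(q <= 0 | q >= 1)) return(1e10)   # keep probabilities valid
        -sum(dbinom(D, size = l, prob = q, log = TRUE))
      }
      fit <- optim(log(c(0.001, 0.0001, 0.1)), negloglik)
      th <- exp(fit$par)
      c(alpha = th[1], beta = th[2], c = 1 + th[3])
    }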

Theorem 6.2.1 Suppose that $q_x(\theta)$, a function of $\theta = (\theta_1, \theta_2, \dots, \theta_s)^t \in \Theta \subseteq
\mathbb{R}^s$, admits continuous partial derivatives. Also, let us suppose that
$$\lim_{N^* \to \infty}\frac{\max_x(l_x^*)}{\min_x(l_x^*)} \le c, \qquad c \text{ a strictly positive constant},$$
where $N^* = (\bar{s}_x - \bar{t}_x)^2N$. Let $a_x = l_x^*/N^*$ $(\approx l_x/N)$ and
$$F = \left\|\sum_{x=0}^{w-1}\frac{a_x}{q_x(\theta)(1 - q_x(\theta))}\,\frac{\partial q_x(\theta)}{\partial\theta_i}\,\frac{\partial q_x(\theta)}{\partial\theta_j}\right\|_{s \times s}.$$
Then, for $N^* \to \infty$, $(\bar{s}_x - \bar{t}_x)\sqrt{N}(\hat\theta - \theta)$ asymptotically follows a normal law
$N_s(0_s, F^{-1})$, and $X_w^2(\hat\theta)$ asymptotically follows a chi-squared law with $w - s$
degrees of freedom.

6.3 Demonstration
Let $q_x^*(\theta) = (\bar{s}_x - \bar{t}_x)\,q_x(\theta)$. We have seen that the likelihood function of
$(D_0^*, D_1^*, \dots, D_{w-1}^*)^t$ is
$$L(\theta) = \prod_{x=0}^{w-1}C_{l_x}^{D_x^*}\,[q_x^*(\theta)]^{D_x^*}\,[1 - q_x^*(\theta)]^{l_x - D_x^*}, \qquad (6.12)$$
which yields
$$\ln L(\theta) = \sum_{x=0}^{w-1}\left[\ln C_{l_x}^{D_x^*} + D_x^*\ln q_x^*(\theta) + (l_x - D_x^*)\ln\left(1 - q_x^*(\theta)\right)\right].$$
Under the assumption that $q_x(\theta)$, a function of $\theta = (\theta_1, \theta_2, \dots, \theta_s)^t \in \Theta \subseteq \mathbb{R}^s$,
admits continuous partial derivatives, a necessary condition for $\hat\theta$ to be the
maximum likelihood estimator of $\theta$ is
$$\frac{\partial}{\partial\theta}\ln L(\theta) = 0_s \;\Longleftrightarrow\; \frac{\partial}{\partial\theta_i}\ln L(\theta) = 0, \quad \forall i = 1, \dots, s$$
$$\Longleftrightarrow\; \sum_{x=0}^{w-1}\left[\frac{D_x^*}{q_x^*(\theta)}\cdot\frac{\partial}{\partial\theta_i}q_x^*(\theta) - \frac{l_x - D_x^*}{1 - q_x^*(\theta)}\cdot\frac{\partial}{\partial\theta_i}q_x^*(\theta)\right] = 0, \quad \forall i = 1, \dots, s.$$
If $D_x$ is the number of deaths which would have been observed without
censoring, then we have the approximate equality $D_x \approx D_x^*/(\bar{s}_x - \bar{t}_x)$. Moreover,
$q_x(\theta)$ is small compared to 1. So, with a reasonable amount of censoring, we have

Hence,

This is the equation which characterizes the maximum likelihood estimator in
the case without censoring, as shown in Gerville-Reache and Nikulin (2000).
Therefore, we can follow the derivation of the asymptotic law of the statistic
$X_w^2(\hat\theta)$ of (6.9) in the non-censored case simply by replacing $N$ by $N^*$, $l_x$ by $l_x^*$,
$D_x$ by $D_x^*$, and $Q_x$ by $Q_x^*$ in the result of Gerville-Reache and Nikulin (2000).

References

1. Chernoff, H. and Lehmann, E. L. (1954). The use of maximum likelihood estimates in chi-square tests of goodness-of-fit, Annals of Mathematical Statistics, 25, 579-586.

2. Courgeau, D. and Lelievre, E. (1989). Analyse démographique des biographies, Paris: Éditions de l'INED.

3. Gerber, H. U. (1990). Life Insurance Mathematics, New York: Springer-Verlag.

4. Gerville-Reache, L. and Nikulin, M. S. (2000). Analyse statistique du modèle démographique de Makeham, Revue Roumaine de Mathématiques Pures et Appliquées (to appear).

5. Greenwood, P. E. and Nikulin, M. S. (1996). A Guide to Chi-Square Testing, New York: John Wiley & Sons.

6. Petauton, P. (1991). Théorie et pratique de l'assurance vie, Paris: Dunod.

7. Voinov, V. G. and Nikulin, M. S. (1993). Unbiased Estimators and Their Applications, Vol. 1: Univariate Case, Dordrecht: Kluwer.
PART III
GOODNESS-OF-FIT TESTS FOR
PARAMETRIC DISTRIBUTIONS
7
Shapiro-Wilk Type Goodness-of-Fit Tests for
Normality: Asymptotics Revisited

Pranab Kumar Sen


University of North Carolina at Chapel Hill, North Carolina

Abstract: Regression type goodness-of-fit tests for normality based on L-


statistics, proposed by Shapiro and Wilk (1965), are known to possess good
power properties. However, due to some distributional problems (particularly
for large sample sizes), various modifications have been considered in the lit-
erature. The intricacies of these asymptotics are presented here in a general
setup, and in the light of that some theoretical explanations are provided for
the asymptotic power of related tests.

Keywords and phrases: BLUE, degenerate U-statistics, FOE, L-statistics,


regression GOF tests, SOADR

7.1 Introduction
An omnibus goodness-of-fit (GOF) test for normality, with nuisance location
and scale parameters $\mu$, $\sigma$, is due to Shapiro and Wilk (1965). Their ingenious
test is based on the regression of the observed sample order statistics on the
expected values of order statistics in a sample of the same size from the standard
normal distribution. Extensive numerical studies have revealed that their test
has good power properties against a broad class of alternatives.
However, the actual distribution of their test statistic, even under the null hy-
pothesis, is quite involved; tables have been provided up to sample size 50,
and beyond that suitable approximations have been developed to cover larger
sample sizes (Shapiro 1998). In this context, some asymptotic distributional
problems have also been discussed by De Wet and Venter (1973), though that
provides very little simplicity in this respect. It has been thoroughly discussed
in Shapiro (1998) that generally such asymptotic approximations entail some


loss of power. One of the objectives of the current study is to focus on some
asymptotics that would provide a good explanation for this shortcoming.
Basically, the asymptotic distribution of the Shapiro-Wilk type of tests is
governed by a second-order asymptotic distributional representation (SOADR)
property that has been systematically presented in Jureckova and Sen (1996,
Ch. 4). Borrowing strength from such results, Jureckova and Sen (2000) con-
sidered a general class of GOF-tests for a class of underlying distributions (in-
cluding the normal one as a notable case), and proposed alternative tests based
on a pair of location estimators that are first-order equivalent (FOE). In their
set-up too, the SOADR results playa vital role. We refer to Jureckova, Picek,
and Sen (2001) for some numerical studies relating to such robust GOF tests.
In view of the fact that such GOF tests are for composite hypotheses and
the alternatives are not necessarily contiguous, there may not be an omnibus
test having better power properties for the entire class of alternatives. In the
same vein, the usual (asymptotic) optimality properties of likelihood ratio type
tests may not be tenable here. As such, we find it more convincing to stress the
simplicity of the asymptotic null hypothesis distribution and other robustness
properties. In this context as well, there is a basic role of the SOADR results
most of which are known by this time. Along with the preliminary notion, the
Shapiro-Wilk (1965) type of test statistics are presented in Section 7.2. SOADR
results are presented in Section 7.3. In the light of these results, in Section 7.4,
the contemplated asymptotics are discussed. The last section is devoted to
some concluding remarks.

7.2 Preliminary Notion


Let $X_1, \dots, X_n$ be $n$ independent observations from a distribution with a probability
density function (pdf) $f(x) = \sigma^{-1}f_0((x - \mu)/\sigma)$, where $f_0$ is a pdf free
from the nuisance location and scale parameters $\mu$ and $\sigma$. We denote the standard
normal pdf by $\phi(x)$ and the corresponding distribution function by $\Phi(x)$.
In GOF testing for normality, we set
$$H_0: f_0 = \phi \quad\text{against}\quad H_1: f_0 \neq \phi, \qquad (7.1)$$
treating $\mu$, $\sigma$ as nuisance parameters. We arrange the $X_i$ in ascending order,
and denote the order statistics by $X_{n:1} < \dots < X_{n:n}$. Also, in a sample of
size $n$ from the standard normal distribution, we denote the order statistics by
$Z_{n:i}$, $i = 1, \dots, n$. Let then
$$m_{ni} = E\{Z_{n:i}\}, \quad i = 1, \dots, n; \qquad v_{nij} = \mathrm{Cov}\{Z_{n:i}, Z_{n:j}\}, \quad i, j = 1, \dots, n. \qquad (7.2)$$

Also let $m_n = (m_{n1}, \dots, m_{nn})'$ and $V_n = ((v_{nij}))$ be the vector of expected
order statistics and their covariance matrix, respectively. Then the best linear
unbiased estimator (BLUE) of $\sigma$ is given by
$$\hat\sigma_n = \sum_{i=1}^{n}a_{ni}X_{n:i}, \qquad (7.3)$$
where $a_n = (a_{n1}, \dots, a_{nn})'$ is given by
$$a_n = \{m_n'V_n^{-1}m_n\}^{-1}(V_n^{-1}m_n). \qquad (7.4)$$
Note that by virtue of the symmetry of $\Phi$, we have $a_n'1_n = 0$, but $a_n'a_n$ may not
be strictly equal to one. For that reason, Shapiro and Wilk (1965) considered
a modified estimator wherein they let
$$a_n^0 = \{m_n'V_n^{-1}V_n^{-1}m_n\}^{-1/2}(V_n^{-1}m_n), \qquad (7.5)$$
so that $a_n^{0\prime}a_n^0 = 1$. We also note that the BLUE of $\mu$ is $\bar{X}_n = n^{-1}\sum_{i=1}^{n}X_i$,
which is also the maximum likelihood estimator (MLE) of $\mu$. The MLE of $\sigma$ is
$\tilde{S}_n$, where
$$\tilde{S}_n^2 = n^{-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2. \qquad (7.6)$$
The Shapiro-Wilk (SW) test statistic is then expressed as
$$W_n = \left(\sum_{i=1}^{n}a_{ni}^0X_{n:i}\right)^2\bigg/\sum_{i=1}^{n}(X_i - \bar{X}_n)^2. \qquad (7.7)$$

Note that $W_n$ is bounded from above by 1 and non-normality is indicated by a
shift of the test statistic to lower values. For this reason, and from the SOADR
results to be presented later, we shall take the SW test statistic in the equivalent
form
$$W_n^* = (n-1)\{1 - W_n\}, \qquad (7.8)$$
rejecting the null hypothesis for higher values of $W_n^*$. Later on, we shall consider
a slightly different form of $W_n^*$ that appears to have a simpler asymptotic
distribution, but the present form serves better the discussion to follow. For
later convenience, we set
$$X_i = \mu + \sigma e_i, \quad i = 1, \dots, n, \qquad (7.9)$$
where the $e_i$ are independent and identically distributed random variables with
zero mean and unit variance; under $H_0$, the $e_i$ have the standard normal distribution.
Further, let $e_{n:i}$, $i = 1, \dots, n$ be the associated order statistics, so that
$X_{n:i} = \mu + \sigma e_{n:i}$, $i = 1, \dots, n$. We also use the notation
$$e_n^* = (e_{n:1}, \dots, e_{n:n})'. \qquad (7.10)$$
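In R, the statistic $W_n$ of (7.7) (computed with the approximate coefficients in standard use, rather than the exact $a_n^0$) is available directly, and the transformed form (7.8) is then immediate; a small illustration:

    # W_n via the built-in Shapiro-Wilk test, and W_n^* = (n-1)(1 - W_n);
    # large values of W_n^* signal non-normality.
    x  <- rnorm(50)                  # illustrative null data
    sw <- shapiro.test(x)
    W  <- unname(sw$statistic)
    c(W = W, W_star = (length(x) - 1) * (1 - W), p_value = sw$p.value)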



Note that under the null hypothesis $e_{n:i}$ has expectation $m_{ni}$, so that rewriting
$W_n^*$ as
$$W_n^* = (n-1)\,\frac{e_n^{*\prime}\left[I_n - n^{-1}1_n1_n' - a_n^0a_n^{0\prime}\right]e_n^*}{e_n^{*\prime}\left[I_n - n^{-1}1_n1_n'\right]e_n^*}, \qquad (7.11)$$
we claim that its distribution is free from the nuisance parameters, and under
the null hypothesis $W_n^*$ would be stochastically smaller (still nonnegative)
[as may be verified by invoking the idempotent nature of the two matrices in
the numerator and denominator of (7.11)]. Under alternatives, the expectation
vector would differ from $m_n$, and as a result, the expectation of the numerator
quadratic form would be away from zero, so that $W_n^*$ would be $O_p(n)$. This
provides the rationale for the SW test. In the same way, for a suitable approximation
to $a_n^0$, say denoted by $b_n$, the corresponding analogous form of the
modified SW test statistic can be expressed as
$$W_n^{**} = (n-1)\,\frac{e_n^{*\prime}\left[I_n - n^{-1}1_n1_n' - b_nb_n'\right]e_n^*}{e_n^{*\prime}\left[I_n - n^{-1}1_n1_n'\right]e_n^*}, \qquad (7.12)$$
where because of the symmetry of $F$ we must have $b_n'1_n = 0$, and by choice
$b_n'b_n = 1$. In this context, we focus on the modified SW test statistic considered
by Shapiro and Francia (1972), where $b_n$ is chosen as $(m_n'm_n)^{-1/2}m_n$, and
for which other simple and good approximations are available. Shapiro (1998)
commented on some loss of power due to the use of such modified test statistics,
and through some modifications, we will suggest some improvements for large
$n$. A crucial factor in this context is the functional dependence of $b_n$ on $m_n$
through a linear transformation (as is the case with the Shapiro-Francia (1972)
modification). There are some delicate asymptotic issues that involve second-order
asymptotics and will be discussed later. Thus the crux of the problem is
to study the distribution theory of ratios of such quadratic forms in order
statistics. We refer to Shapiro and Wilk (1965) and Shapiro (1998) for an
extensive discussion of the small and moderate sample properties of their GOF
test statistics and their relationship with various approximations that have been
considered during the past three decades by a host of researchers. We intend
to concentrate on large sample properties and on ramifications of the Shapiro-Wilk
type of tests. Actually, based on some SOADR results to be presented
in the next two sections, we advocate a slightly different reformulation of the
SW test statistic which has a simpler asymptotic null distribution, and this
also explains how similar reformulations can be made of the modified forms of
SW test statistics, such as $W_n^{**}$.

7.3 SOADR Results for BLUE and LSE


We reiterate the SOADR properties of the BLUE and the LSE of the scale
parameter, mostly along the lines of Jureckova and Sen (1996, 2000), and
incorporate these findings in a unified treatment of the general asymptotics of the
SW-type GOF tests. We present some SOADR results here, avoiding derivations
by cross-referencing the existing literature.
First consider the MLE (under normality) of $\sigma^2$, for which an unbiased
version is
$$S_n^2 = (n-1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2. \qquad (7.13)$$
Note that if we consider the kernel $g(x, y) = \frac{1}{2}(x - y)^2$, then $\sigma^2 = E_F\{g(X_1, X_2)\}$
and
$$S_n^2 = \binom{n}{2}^{-1}\sum_{1 \le i < j \le n}g(X_i, X_j) \qquad (7.14)$$
is a U-statistic [Hoeffding (1948)] of degree 2. As such, if we use the classical
Hoeffding decomposition of U-statistics, we obtain the following representation:
$$S_n^2 = \sigma^2 + U_n^{(1)} + U_n^{(2)}, \qquad (7.15)$$
where, letting $g_1(x) = E[g(X, Y) \mid X = x] = \frac{1}{2}[(x - \mu)^2 + \sigma^2]$,
$$U_n^{(1)} = n^{-1}\sum_{i=1}^{n}2[g_1(X_i) - Eg_1(X_i)] = n^{-1}\sum_{i=1}^{n}[(X_i - \mu)^2 - \sigma^2] \qquad (7.16)$$
and, writing $g_2(x, y) = g(x, y) - g_1(x) - g_1(y) + \sigma^2$,
$$U_n^{(2)} = \binom{n}{2}^{-1}\sum_{1 \le i < j \le n}g_2(X_i, X_j) = \frac{1}{n}(S_n^2 - \sigma^2) - \left[(\bar{X}_n - \mu)^2 - \frac{1}{n}\sigma^2\right]. \qquad (7.17)$$
Note that $U_n^{(1)}$ is an average of $n$ centered i.i.d. random variables, so that
the classical central limit theorem applies. In particular, when the d.f. $F$ is
normal, we have $\sqrt{n}\,U_n^{(1)}$ asymptotically normal with zero mean and variance
$2\sigma^4$. Moreover, in this special case, we therefore claim that
$$U_n^{(2)} = \frac{1}{n}(S_n^2 - \sigma^2) - \frac{\sigma^2}{n}(Z_1^2 - 1), \qquad (7.18)$$
where $Z_1^2 = n(\bar{X}_n - \mu)^2/\sigma^2$ has a chi square distribution with 1 degree of
freedom (DF), independently of $S_n^2$. As a result, in the normal case, we obtain
the following SOADR result for $S_n^2$:
$$S_n^2/\sigma^2 = 1 + U_n^{(1)}/\sigma^2 + \frac{1}{n-1} - \frac{1}{n-1}Z_1^2 + O_p(n^{-3/2}). \qquad (7.19)$$
It is of course possible to exploit the exact distribution of $(n-1)S_n^2/\sigma^2$ under
normality (which is chi square with $n-1$ DF) to obtain a parallel SOADR
result (under normality), but the above representation is more useful
for alternatives as well as for the study of the limiting distribution of the SW GOF
test statistic in a unified way.
Let us next consider the case of the BLUE of (7. Recall that whereas S;is
an unbiased estimator of (72, the BLUE is for (7, and hence, its unbiasedness and
SOADR properties could be affected in estimating (72 instead 'of (7. For that
reason, we consider first the SOADR for the BLUE of (7 given by (7.3)-(7.4)
and modify that for the Shapiro-Wilk estimator of (72 with scores given by (7.5).
Note that ern in (7.3) is an unbiased estimator of (7, and can be expressed
as
(7.20)

where $F_n(x)$ is the empirical d.f. of the $X_i$, and the score function $\phi_n(u)$ converges
to a smooth score function $\psi(u) = \Phi^{-1}(u)$, the quantile function of the
standard normal d.f., for all $u \in (0,1)$. In fact, $\phi_n(u)$ can be approximated
well by $\Phi^{-1}(nu/(n+1))$, $u \in (0,1)$; we refer to Jung (1955) for an excellent
motivation and useful illustration. Although in Jung's case there was
a smooth score function while we have here the scores generated by the expected
order statistics values and their covariance matrix, smoothness can be
imported up to the second order terms by using the Hajek (1968) projection
results and representations on the projected terms. Thus, side by side, we let
$\xi(F) = \int_0^1 F^{-1}(u)\psi(u)\,du$ and note that when $F$ is normal (with a scale parameter
$\sigma$), $\xi(F) = \sigma$. Then, proceeding as in Section 7.4 of Sen (1981) [viz.,
(7.4.30) to (7.4.37)], we obtain that

$$\hat\sigma_n - \xi(F) = Z_{n1} + Z_{n2}, \tag{7.21}$$
where $Z_{n1} = n^{-1}\sum_{i=1}^{n}\psi_1(X_i)$ with
$$\psi_1(x) = -\int_{-\infty}^{\infty}[I(y \le x) - F(y)]\,\psi(F(y))\,dy, \quad x \in \mathbb{R}, \tag{7.22}$$

so that $Z_{n1}$ is an average of i.i.d. r.v.'s with zero mean and a finite positive
variance, say $\gamma^2$. The component $Z_{n2}$ involves second-order terms; by virtue of
the unbiasedness of $\hat\sigma_n$ when $F$ is normal, we have $E(Z_{n2}) = 0$, for $F$ normal, and
proceeding as in Theorem 4.3.1 of Jureckova and Sen (1996), it can be shown
that $Z_{n2} = O_p(n^{-1})$. Actually, their Theorem 4.5.2 (p. 155) gives a SOADR
result for $\hat\sigma_n$. In passing, we may remark that when $F$ is itself normal, we have
$Z_{n1} = (1/2)U_n^{(1)}$, where the latter is defined by (7.16).
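To fix ideas, here is a minimal sketch of ours (assuming Jung's approximate scores $\Phi^{-1}(i/(n+1))$ are adequate for the illustration) of the normal-scores linear estimate of $\sigma$, set against the sample standard deviation.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 500
x = np.sort(rng.normal(loc=1.0, scale=2.0, size=n))

# approximate expected normal order statistics (Jung's smooth-score device)
scores = norm.ppf(np.arange(1, n + 1) / (n + 1))
sigma_L = np.sum(scores * x) / np.sum(scores ** 2)  # linear estimate of sigma
print(sigma_L, x.std(ddof=1))                       # both should be near 2.0
```

Since the scores sum to zero by symmetry, the unknown location drops out of the linear estimate.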

Let us now look into (7.5), and define $C_n$ by letting
(7.23)
where $a_n^0$ is defined by (7.9) and (7.10). Using the Cauchy-Schwarz inequality,
we have then
(7.24)

On the other hand, by definition,
(7.25)
so that the right hand side of (7.25) cannot be greater than $(n-1)$. It is to
be noted further that, by definition,
$$m_n'm_n = n - \sum_{i=1}^{n} V_{nii} = n - \mathrm{trace}(V_n), \tag{7.26}$$
so that
(7.27)

where $\rho_n^2 = \{(m_n'V_n^{-1}m_n)^2\}/\{(m_n'm_n)(m_n'V_n^{-2}m_n)\}$ is bounded from above
by 1. Note further that
(7.28)
and as a result,
(7.29)
This displays the role of the score function $a_n$ in relation to the constant $C_n$.
This result will be useful in the sequel.
For some $b_n$ other than $a_n^0$, if we define $c_n^0$ as in (7.23), we would have (7.24)
intact, though (7.25) and (7.27) would be somewhat different. Specifically,
parallel to (7.25), we would have
(7.30)
where $b_n'b_n = 1$. In particular, if, following Shapiro and Francia (1972), we
choose $b_n = (m_n'm_n)^{-1/2}m_n$, (7.30) reduces to
(7.31)

Note that by the Cauchy-Schwarz inequality,
(7.32)
and hence, we obtain that
(7.33)
Consequently, $c_n^0 \ge C_n$, and this has a bearing on the asymptotic behaviour
of the Shapiro-Wilk type test. However, as we shall see later on, the basic
difference comes from the SOADR results for the associated linear estimates
of $\sigma$ that are FOE but not necessarily BLUE. In this context, we need the
following results, which are presented first.
Consider the case of a linear estimate of $\sigma$ for which $b_n = B_nm_n$ for some
symmetric p.d. matrix $B_n$ such that $b_n'1_n = 0$ and $b_n'b_n = 1$. Note that by
virtue of the expression for the variance of the BLUE estimator [Jung (1955)],
we have $m_n'V_n^{-1}m_n = 2n$, while by (7.26), we have $m_n'm_n = n - \mathrm{trace}(V_n)$.
Therefore, we may write
(7.34)
We rewrite this equation as follows:
(7.35)

Or, in other words, $\gamma_n$ is an eigenvalue of $V_n^{-1}$ with respect to the eigenvector
$m_n$. As such, we have
(7.36)
and repeated use of this equation leads us to the identity
$$\rho_n^2 = 1 \quad\text{so that}\quad c_n^0 = C_n = n - \mathrm{trace}(V_n), \tag{7.37}$$
where $\rho_n^2$ is defined after (7.27) and $c_n^0$ refers to the specific case of the Shapiro-Francia
(1972) modification. Let us consider now the general form of $c_n^0$ when
$b_n = B_nm_n$, where $B_n$ is a symmetric positive definite matrix such that
$1_n'B_nm_n = 0$ and $m_n'B_nB_nm_n = 1$. Let then $m_n'B_nm_n = d_nm_n'm_n$, so that
by a similar argument we claim that $d_n$ is an eigenvalue of $B_n$ with respect to
the eigenvector $m_n$. Then, it follows from the above that for the entire class of
such $B_n$, the corresponding $c_n^0$ will be equal to $C_n$ in (7.25), and by (7.29), we
write this as
$$C_n = (2n+1)[n - \mathrm{trace}(V_n)]/2n. \tag{7.38}$$
This characterization is essentially based on the properties of the eigenvalues
and eigenvectors. On the other hand, for some other $b_n$ which cannot be
expressed exactly as $B_nm_n$, we could write $b_n = B_nm_n^*$ for some scores vector
$m_n^*$ that replaces $m_n$. Although, for large sample sizes, $m_n$ and $m_n^*$ could be
very close to each other (as is the case with some other modifications of the SW-test
proposed in the literature [Shapiro (1998)]), we would have an eigenvector
different from $m_n$, and as a result, the corresponding $\rho_n^2$ may not be strictly
equal to one, and this in turn may also cause perturbation in the associated $c_n^0$.
This point will be made clear in the next section.

7.4 Asymptotics for $W_n^*$

By virtue of (7.11), for the desired asymptotics, we may assume without any
loss of generality that $\mu = 0$, $\sigma = 1$. Thus, effectively, we work with the reduced
order statistics $e_{n:i}$; then $S_n^2$ can be defined as $(n-1)^{-1}e_n'[I_n - n^{-1}1_n1_n']e_n$,
and we have the following SOADR result:
$$S_n^2 = 1 + U_n^{(1)} + U_n^{(2)}, \qquad U_n^{(2)} = \frac{1}{n-1}\{1 - Z_1^2\} + O_p(n^{-3/2}), \tag{7.39}$$

where $U_n^{(1)}$ is defined by (7.16) (but based on the $e_{n:i}$), and $Z_1^2 = n\bar{e}_n^2$ has the
chi square distribution with 1 DF, independently of $U_n^{(1)}$. Note that $S_n^2$, $Z_1$ are
jointly sufficient statistics for the normal $F$, so that $U_n^{(2)}$ is also a function of
the sufficient statistics.
We rewrite $\hat\sigma_n$, defined by (7.3) and (7.4), as $\hat\sigma_n = (m_n'V_n^{-1}e_n)/(m_n'V_n^{-1}m_n)$,
and note that here $\hat\sigma_n$ has expectation 1. Let then
$$Q_{ni} = E\{\hat\sigma_n \mid e_i\} = \sum_{j=1}^{n} a_{nj}E[e_{n:j}\mid e_i] = \sum_{j=1}^{n} a_{nj}q_{nj}(e_i) = q_n(e_i), \quad i = 1,\dots,n, \tag{7.40}$$

so that by the Hajek (1968) projection result,
$$\hat\sigma_n - 1 = \sum_{i=1}^{n}[q_n(e_i) - 1] + R_n, \qquad E\Big(\hat\sigma_n - 1 - \sum_{i=1}^{n}[q_n(e_i) - 1]\Big)^2 = E(R_n^2). \tag{7.41}$$
Further, comparing (7.41) with (7.21) and (7.22), we conclude that
$$\sum_{i=1}^{n}[q_n(e_i) - 1] = Z_{n1}, \tag{7.42}$$

where $Z_{n1}$ is defined as in (7.22), but now based on the $e_i$ instead of the $X_i$.
We write
$$\hat\sigma_n - 1 = Z_{n1} + R_n, \quad\text{and}\quad 2Z_{n1} = U_n^{(1)}. \tag{7.43}$$
Therefore, defining $a_n^0$ as in (7.5), we have
(7.44)
where $m_n'm_n = n - \mathrm{trace}(V_n)$. Based on these SOADR results, we have by
some simple steps
$$\begin{aligned}
W_n^* ={}& (n-1) - (n - \mathrm{trace}(V_n))\{1 + 2Z_{n1} + 2R_n + Z_{n1}^2 - U_n^{(1)} - U_n^{(2)}\\
& + U_n^{(1)2} - 2Z_{n1}U_n^{(1)} + O_p(n^{-3/2})\}\\
={}& (\mathrm{trace}(V_n) - 1) + (n - \mathrm{trace}(V_n))\{(U_n^{(1)} - 2Z_{n1}) - (U_n^{(1)} - Z_{n1})^2\\
& + (U_n^{(2)} - 2R_n)\} + O_p(n^{-1/2})\\
={}& (\mathrm{trace}(V_n) - 1) + (n - \mathrm{trace}(V_n))\Big\{\frac{1}{n-1}[1 - Z_1^2] - \frac{1}{4}U_n^{(1)2} - 2R_n\Big\} + o_p(1),
\end{aligned} \tag{7.45}$$

where in the last step we make use of (7.43) and (7.39). It may be noted that
$\frac{1}{2}nU_n^{(1)2}$ has asymptotically a chi square distribution with 1 DF, independently
of $Z_1^2$, and hence, the distribution of $W_n^*$, under the null hypothesis of normality,
depends on (i) the nonstochastic $\mathrm{trace}(V_n)$ (which is $\ge 1$), and (ii) the
stochastic $Z_1^2$ and $nU_n^{(1)2}$, as well as the residual $R_n$ (which is $O_p(n^{-1})$). On using
the basic results of Hoeffding (1953) on expected order statistics, we claim that
as $n \to \infty$,
$$\frac{1}{n}(m_n'm_n) \to 1, \quad\text{so that}\quad \frac{1}{n}\mathrm{trace}(V_n) \to 0. \tag{7.46}$$
With this simplification, we note that
(7.47)

We shall see that $W_n^0$ has a simpler asymptotic distribution. To see this, we
write
(7.48)

where the last factor on the right hand side is, by (7.39), $1 + O_p(n^{-1/2})$. Consequently,
by the Slutzky theorem, $W_n^0$ and $W_n^{**}$ both have the same asymptotic
distribution, if they have any at all. To study this, we incorporate (7.39) and
(7.43), and conclude that $W_n^{**}$ has the asymptotic representation
(7.49)

At this stage, noting that for normal $F$, $E(R_n) = 0$, we make use of Theorem
4.5.2 of Jureckova and Sen (1996) (after verifying that for the normal distribution
the associated scores $a_n$ satisfy the needed regularity conditions), and
obtain a SOADR for $R_n$. We may write
$$2nR_n = n\iint \tau^{(2)}(F; x, y)\,d[F_n(x) - F(x)]\,d[F_n(y) - F(y)] + o_p(1), \tag{7.50}$$

where $\tau^{(2)}(F;\cdot,\cdot)$ stands for a functional (second) derivative of the L-functional
that appears in (7.20). Thus, keeping in mind the weak convergence of $\sqrt{n}(F_n - F)$
to a Brownian bridge, we have a typical quadratic variation of a Wiener
process, for which a Cramer-von Mises type asymptotic distribution, as has
been suggested in De Wet and Venter (1973) and others, holds. Combining all
these asymptotics, we conclude that as $n$ increases, for a normal $F$,

$$W_n^0 \to_{\mathcal{D}} \sum_{k\ge1}\lambda_k(Z_k^2 - 1), \tag{7.51}$$
where the $\lambda_k$ are nonnegative Fourier coefficients and the $Z_k$ are independent
standard normal deviates. Here $\lambda_1 = 1$, $\lambda_2 = 1/2$. This form is in close
proximity to the form suggested by De Wet and Venter (1973) for some allied
forms of $W_n$. We provide here a clear representation involving appropriate
SOADR results that have been studied in the literature, mostly in the past
decade.
There are certain distinct advantages in writing $W_n^0$ in terms of such degenerate
U-statistics. It not only provides access to the study of the asymptotic
properties of the SW-test statistic, but also allows us to make use of suitable
resampling plans to generate asymptotic critical levels of the SW-test statistic,
a much needed task to make the SW-test applicable for large sample sizes. In
this context, we may refer to Huskova and Janssen (1993), where the validity of
bootstrapping for degenerate U-statistics has been critically examined, and we
may adopt with some advantage their methodology to generate the asymptotic
null distribution of $W_n^0$. The asymptotic distribution for $W_n^*$ can be readily
obtained from that of $W_n^0$.
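For instance, since by (7.63) below $W_n^* = (n-1)(1 - W_n)$, its null critical points can also be simulated directly; the parametric simulation below is our own illustration (using scipy's implementation of the SW statistic), not the resampling plan of Huskova and Janssen (1993).

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(3)
n, B = 100, 5000
wstar = np.empty(B)
for b in range(B):
    w, _ = shapiro(rng.normal(size=n))   # SW statistic W_n on a null sample
    wstar[b] = (n - 1) * (1 - w)
print(np.quantile(wstar, [0.90, 0.95, 0.99]))  # simulated upper critical points
```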
Let us now discuss the case of the Shapiro-Francia (1972) modification of
the SW-test statistic, considered in (7.29). If we define their estimate of $\sigma$ by
(7.52)
and note that it is unbiased but not the BLUE of $\sigma$ (while $\hat\sigma_n$ is the BLUE),
exploiting the BLUE characterization, and noting that $\tilde\sigma_n$ is asymptotically
first-order efficient (FOE), we can write
$$\tilde\sigma_n = 1 + Z_{n1} + Z_{n2} + Z_{n3}, \tag{7.53}$$
where $Z_{n1}$, $Z_{n2}$ are defined as in (7.21) and $Z_{n3}$ is orthogonal to $Z_{n2}$ (and is
also $O_p(n^{-1})$). Also, the SOADR result applies to $\tilde\sigma_n$ as well. As such, if we
proceed as in (7.19) through (7.50), we obtain the following representation for
$\tilde W_n^*$:
$$\tilde W_n^* \to_{\mathcal{D}} \sum_{k\ge1}\lambda_k^*(Z_k^2 - 1), \tag{7.54}$$
where $\lambda_1^* = \lambda_1 = 1$, $\lambda_2^* = \lambda_2 = 1/2$, and, by the convolution property, $\lambda_k^* \ge \lambda_k$,
$\forall k \ge 3$. Comparing (7.51) and (7.54), we may gather that when $F$ is
normal (i.e., under $H_0$),
$$\tilde W_n^* \text{ is stochastically more dispersed than } W_n^{**}. \tag{7.55}$$

This explains why, in considering a modified form of the SW-test statistic, there
is a need to adjust for the critical values, and doing that might make those
modifications more competitive with the SW-test itself. Of course, speaking
of the power properties, even for large sample sizes, we need to pay adequate
attention to the intricate asymptotics, and these are considered in the next
section.
The asymptotics for the Shapiro-Francia test in general go over to other
cases where $b_n$ is related to $m_n$ by suitable matrix multiplication, as has been
discussed after (7.37). However, we need to assume that they are FOE and admit
SOADR. If we consider other $b_n$ that are related to various approximations
to $m_n$ and $V_n$, as has been discussed in Shapiro (1998, pp. 481-482), the eigenvalues
will be different, though quite close to the ones discussed earlier. Further,
because in such a case $\rho_n^2$, defined in an analogous way, will be
typically less than one (though quite close to 1), while $n^{-1}(m_n'm_n) \to 1$, we
could see that there will be additional variation due to $\rho_n^2$ being less than one,
and more so due to other terms that appear in the second-order expansion. As
such, the SW-approximation may not generally apply here very satisfactorily.
Although their critical values can be estimated by similar resampling methods,
because of more variable second-order terms, their distribution will be more
dispersed, and as a result, there could be some loss of power.

7.5 Asymptotics Under Alternatives


The basic strength of the SW-test relates to its suitability for a very broad class
of alternatives (to the null hypothesis that $F$ is normal), and in this context,
the location and scale parameters are treated as nuisance parameters. As such,
this class of alternatives does not include the conventional location-scale family.
In addition, such alternatives need not be of the contiguous type, in the sense
that they may not be brought closer to the null hypothesis by considering a
suitable sequence of local parametric functions. For example, in testing for $F$
normal (with nuisance mean and variance), we may be interested in alternatives
that $F$ is logistic, or Laplace, or some other d.f., admitting nuisance location
and scale parameters. In order to exploit fully the properties of the BLUE, a
minimal requirement for this class of alternatives is that the associated d.f.'s
admit finite second order moments, so that the expected order statistics and the
associated dispersion matrix are defined properly. Also, the appropriateness of
the SW-type of tests when the null hypothesis relates to $F$ being some other
nonnormal d.f. needs to be assessed properly. In that setup, though it might be
possible to obtain the BLUE of the scale parameter, the sample variance may not
generally be the efficient estimator (MLE) of the (squared) scale parameter.
In that way, a stipulated BLUE estimator (specifically for the assumed $F$)
would be FOE but the conventional sample standard deviation may not be
FOE. As a result, the asymptotic distribution (and even the finite sample ones,
tabulated extensively for the normal case) could be quite different. Faced with
this situation, Jureckova and Sen (2000) considered an alternative approach
based on a pair of robust estimators of the location parameter which are FOE
under the assumed $F$ (under $H_0$), and developed GOF tests that are usable for
a broad class of $F$ including the normal distribution as an important member.
The numerical work presented in Jureckova, Picek, and Sen (2001) casts light
on the competitive nature of such robust tests (though for testing for normality,
as expected, the SW-test performs slightly better).
Let us concentrate on the nature of the asymptotics that crop up in the study
of the SW-type GOF tests for normality. Let us denote the distribution under
the alternative by $G(x)$ and its pdf by $g(x)$; we assume that $G$ admits a finite
second moment, so that the order statistics have finite expectations and variance-covariance
terms. Also, without any loss of generality, we let the location
parameter be 0 and the scale parameter equal to 1. Compared to (7.2), we now
introduce the order statistics in a sample of size $n$ from this standardized $G$ by
$Z_{n:i}^0$, $i = 1,\dots,n$. Let then $\gamma_n = (\gamma_{n1},\dots,\gamma_{nn})'$, where
$$\gamma_{ni} = E\{Z_{n:i}^0\}, \quad i = 1,\dots,n. \tag{7.56}$$

Let us recall the notation introduced in (7.3) through (7.7). Note that under
the alternative hypothesis, $\hat\sigma_n$ is an unbiased estimator of
$$\delta_n = \frac{m_n'V_n^{-1}\gamma_n}{m_n'V_n^{-1}m_n}. \tag{7.57}$$

On the other hand, $S_n^2$, defined by (7.13), is an unbiased and consistent estimator
of the second moment (say, $\nu_2$) of the standardized d.f. $G$ (whose scale
parameter is taken as 1); $\nu_2$ may not necessarily be equal to 1; take, for example,
$G$ logistic with unit scale parameter. Further, using the results of Hoeffding
(1953) on expected order statistics, we claim that
$$n^{-1}(\gamma_n'\gamma_n) \to \nu_2. \tag{7.58}$$

With these results in hand, let us define
$$\Delta_n = \frac{(m_n'V_n^{-1}\gamma_n)^2}{(m_n'V_n^{-1}m_n)(\gamma_n'\gamma_n)}. \tag{7.59}$$

Using the Cauchy-Schwarz inequality, it is easy to show that
$$0 \le \Delta_n \le 1, \quad \forall n \text{ and } G, \tag{7.60}$$
while using the Hoeffding (1953) results, along with the fact [Jung (1955)] that
$V_n^{-1}m_n$ can be well approximated by $m_n$, we conclude that as $n$ increases,
$$\Delta_n \to \Delta = \frac{\big(\int_0^1 F_0^{-1}(u)G^{-1}(u)\,du\big)^2}{\nu_2}, \tag{7.61}$$
where $F_0$ is the standard normal d.f., so that its second moment is equal to 1,
and, by definition, $\nu_2 = \int_0^1 (G^{-1}(u))^2\,du$. As a result, it follows by some standard
steps that under the alternative that $G$ is the true d.f.,
$$W_n \to_P \Delta, \quad\text{as } n \to \infty. \tag{7.62}$$

On the other hand, from the results in Section 7.4, we conclude that under the
null hypothesis of normality,
$$W_n = 1 - (n-1)^{-1}W_n^* \to_P 1, \quad\text{as } n \to \infty. \tag{7.63}$$
As a result, we conclude that the SW-test for normality is consistent for the
entire class of alternatives for which $\Delta$ is strictly less than one. This is the
case when the two quantile functions $\Phi^{-1}(p)$ and $G^{-1}(p)$ (for $p \in (0,1)$) do
not coincide for all values of $p$. In that way, the domain of consistency of
the SW-test includes all nonnormal distributions admitting finite second order
moments (so that the normal BLUE of $\sigma$ converges to a limit other than the
scale parameter of such a distribution). This is certainly a very strong result in

the sense that it includes separable families of alternatives in a very natural way,
and it includes mixture models also in the same vein. For example, against $F$
normal, we might be interested in the set of alternatives that it is a contaminated
d.f., namely,
$$F(x) = (1-\eta)\Phi(x) + \eta H(x), \tag{7.64}$$
where $\eta > 0$ is small, and $H(x)$ has a heavier tail; it could be a normal d.f. with
a larger variance, or even some other one, like the Laplace, that has a heavier
tail than a normal one. It is also possible to treat $\eta$ as a sequence converging
to 0, and in that way local contamination models are also contemplated in this
setup. However, consistency is a minimum requirement for any GOF test,
and it should not be overemphasized. There may not be a unique GOF test for
normality with power-optimality against such a broad class of alternatives. For
this reason, Jureckova and Sen (2000) discussed such asymptotic power pictures
for other tests.
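As a rough illustration of this point (a sketch of ours; the choices of $\eta$, the contaminating scale 3, the sample size, and the 5% level are arbitrary), the power of the SW-test under (7.64) is easy to approximate by simulation.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(4)
n, B, eta = 100, 1000, 0.10
rej = 0
for _ in range(B):
    heavy = rng.random(n) < eta                  # contamination indicators
    x = np.where(heavy, rng.normal(0.0, 3.0, n), rng.normal(0.0, 1.0, n))
    rej += shapiro(x)[1] < 0.05                  # reject when p-value < 0.05
print(rej / B)                                   # Monte Carlo power estimate
```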
As regards the consistency property of modified SW-type tests (as discussed
in earlier sections), the picture is the same. It is only with respect to power
properties that there could be some difference. In view of (7.55), there is a need
to calibrate the critical levels of such modified test statistics (such as $W_n^{**}$),
as otherwise, for local alternatives, there might not be a perceptible difference,
particularly when the sample size is large. However, if we consider a fixed
alternative (which is more appropriate in the present context), then the rate at
which the power function goes to one [in the Bahadur (1960) sense] might be
different. The basic difficulty for such a study stems from the fact that, due
to their complicated null hypothesis distributions, the exact Bahadur slopes for
such statistics are not that simple to formulate, while the approximate Bahadur
slope comparisons are known to be deficient in certain respects. As such, the
empirical evidence acquired from extensive numerical studies made so far [viz.,
Shapiro (1998)] should be used as a stepping stone for further comparative
studies.

Acknowledgements. This work was supported by the US-Czech Collaborative
Research Grant NSF INT-96000518. Thanks are due to the organizers of
the GOF 2000 Conference in Paris, France, for the invitation and support for
the presentation of the manuscript in the meeting.

References

1. Bahadur, R. R. (1960). Stochastic comparison of tests, Annals of Mathematical Statistics, 31, 276-295.

2. De Wet, T. and Venter, J. H. (1973). Asymptotic distributions of quadratic forms with application to tests of fit, Annals of Statistics, 1, 380-387.

3. Hajek, J. (1968). Asymptotic normality of simple linear rank statistics under alternatives, Annals of Mathematical Statistics, 39, 325-346.

4. Hoeffding, W. (1948). On a class of statistics with asymptotically normal distribution, Annals of Mathematical Statistics, 19, 293-325.

5. Hoeffding, W. (1953). On the distribution of the expected values of the order statistics, Annals of Mathematical Statistics, 24, 93-100.

6. Huskova, M. and Janssen, P. (1993). Consistency of the generalized bootstrap for degenerate U-statistics, Annals of Statistics, 21, 1811-1823.

7. Jung, J. (1955). On linear estimates defined by a continuous weight function, Arkiv för Matematik, 3, no. 15, 199-209.

8. Jureckova, J., Picek, J., and Sen, P. K. (2001). A goodness-of-fit test with nuisance parameters: numerical performance, Journal of Statistical Planning and Inference (to appear).

9. Jureckova, J. and Sen, P. K. (1996). Robust Statistical Procedures: Asymptotics and Interrelations, New York: John Wiley & Sons.

10. Jureckova, J. and Sen, P. K. (2000). Goodness-of-fit tests and second-order asymptotic relations, Journal of Statistical Planning and Inference, 91, in press.

11. Sen, P. K. (1981). Sequential Nonparametrics: Invariance Principles and Statistical Inference, New York: John Wiley & Sons.

12. Shapiro, S. S. (1998). Distribution assessment, in Handbook of Statistics, Vol. 17: Order Statistics: Applications (Eds., N. Balakrishnan and C. R. Rao), pp. 475-494, Amsterdam: Elsevier.

13. Shapiro, S. S. and Francia, R. S. (1972). An approximate analysis of variance test for normality, Journal of the American Statistical Association, 67, 215-216.

14. Shapiro, S. S. and Wilk, M. B. (1965). An analysis of variance test for normality (complete samples), Biometrika, 52, 591-611.
8
A Test of Exponentiality Based on Spacings for
Progressively Type-II Censored Data

N. Balakrishnan, H. K. T. Ng, and N. Kannan


McMaster University, Hamilton, Ontario, Canada
McMaster University, Hamilton, Ontario, Canada
The University of Texas at San Antonio, San Antonio, Texas

Abstract: There have been numerous tests proposed in the literature to determine
whether or not an exponential model is appropriate for a given data
set. These procedures range from graphical techniques to tests that exploit
characterization results for the exponential distribution. In this article, we propose
a goodness-of-fit test for the exponential distribution based on general
progressively Type-II censored data. This test, based on spacings, generalizes a
test proposed by Tiku (1980). We derive the exact and asymptotic null distribution
of the test statistic. The results of a simulation study of the power
under several different alternatives like the Weibull, Lomax, Lognormal and
Gamma distributions are presented. We also discuss an approximation to the
power based on normality and compare the results with those obtained by simulation.
A wide range of sample sizes and progressive censoring schemes have
been considered for the empirical study. We also compare the performance of
this procedure with two standard tests for exponentiality, viz. the Cramer-von
Mises and the Shapiro-Wilk tests. The results are illustrated on some real data
for the one- and two-parameter exponential models. Finally, some extensions
to the multi-sample case are suggested.

Keywords and phrases: Exponential distribution, goodness-of-fit, lifetime

8.1 Introduction
The exponential distribution is one of the most widely used life-time models
in the areas of life testing and reliability. The volume by Balakrishnan and
Basu (1995) [see also Johnson, Kotz, and Balakrishnan (1994, Chapter 19)]
provides an extensive review of the genesis of the distribution and its properties,


including several characterization results. Because of its wide applicability and
its relations to other distributions like the gamma and Weibull, there have
been numerous tests proposed in the literature to determine whether or not an
exponential model is indeed appropriate for a given sample.

The history of goodness-of-fit tests originated with the seminal paper by
Karl Pearson in 1900 on the chi-squared test. Tests based on the empirical
distribution function (EDF), like the Kolmogorov-Smirnov, Cramer-von Mises
and their variants, are applicable for testing the hypothesis that the random
sample comes from some arbitrary distribution. The properties of these "omnibus"
tests under various scenarios have been investigated by several authors;
see D'Agostino and Stephens (1986) for a detailed bibliography. These tests
are intuitive, and easily modified in the event of censored data.

However, if the investigator is interested in testing whether a particular
model like the normal or exponential is appropriate, it may be preferable to
use the properties of the underlying distribution to derive a more specific (hopefully,
more powerful) test. For the exponential distribution, one can exploit the
fact that the hazard function is constant or that the logarithm of the survival
function is linear. Shapiro (1995) and Stephens (1986) provide a fairly extensive
review of the literature on tests for the exponential distribution. Spinelli and
Stephens (1987) discuss tests for the two-parameter exponential distribution
when the parameters are unknown.
In this article, we propose a test for exponentiality based on spacings under
progressive Type-II censoring. In Section 8.2, we will briefly describe the idea
of progressive Type-II censoring and some basic results. In Section 8.3, we will
propose a test statistic for exponentiality based on spacings. We will derive
the exact and asymptotic null distributions of the test statistic. In Section 8.4,
we present results of a simulation study to investigate the power of this test
under several different alternatives. We also discuss an approximation to the
power and compare the approximate values with those obtained by simulations.
In Section 8.5, we examine two standard tests (the Cramer-von Mises $A^2$ and the
Shapiro-Wilk $W_E$) for exponentiality discussed extensively in the literature, and
compare the power performance of all three procedures. Section 8.6 considers
tests for the two-parameter exponential distribution. We illustrate the test
procedures proposed here using some numerical examples in Section 8.7. Section
8.8 discusses the multi-sample extension of this procedure. Finally, we conclude
with some comments and suggestions for further research in Section 8.9.

8.2 Progressive Censoring


Conventional Type-I and Type-II censoring schemes do not allow for removal
of units at points other than the terminal point of the experiment. We consider
a more general censoring scheme called progressive Type-II right censoring, as
follows: Consider an experiment in which $n$ units are placed on a life test. At the
time of the first failure, $R_1$ units are randomly removed from the remaining $n-1$
surviving units. At the second failure, $R_2$ units from the remaining $n - 2 - R_1$
units are randomly removed. The test continues until the $m$th failure. At this
time, all remaining $R_m = n - m - R_1 - R_2 - \cdots - R_{m-1}$ units are removed. The
$R_i$'s are fixed prior to the study. If $R_1 = R_2 = \cdots = R_m = 0$, we have $n = m$,
which corresponds to the complete sample. If $R_1 = R_2 = \cdots = R_{m-1} = 0$, then
$R_m = n - m$, which corresponds to the conventional Type-II right censoring
scheme. The idea of progressive censoring is due to Cohen (1963, 1966).

We may introduce a further generalization: Suppose the failure times of
the first $r$ units are not observed. At the $(r+1)$th failure, $R_{r+1}$ units are
randomly removed. At successive failures, we remove units randomly as before.
This is called general progressive Type-II censoring. If $r = 0$, this is the
scheme outlined above. All the procedures outlined in this paper may be easily
modified to deal with this general case. However, for simplicity, we will consider
only progressively Type-II right censored data.
Let $X_{i:m:n}^{(R_1,\dots,R_m)}$ denote the $i$th failure time. These failure times are referred
to as progressively Type-II right censored order statistics. The joint probability
density function of these order statistics may be written using probability arguments.
However, the marginal distributions do not have the same simple form
as in the case of the usual order statistics. For an exhaustive list of references
and further details on progressive censoring, the reader may refer to the recent
book by Balakrishnan and Aggarwala (2000).

The standard goodness-of-fit tests can be easily modified to deal with Type-I
and Type-II censored data. However, the loss of information, especially in the
tails of the distribution, results in a significant loss of discriminatory power in
many cases. In the next section, we propose a test for exponentiality using
spacings under progressive censoring and examine its properties.

8.3 Test for Exponentiality


Let us assume that the failure times have an exponential distribution with
probability density function (p.d.f.)
$$f(x;\sigma) = \frac{1}{\sigma}\exp\Big(-\frac{x}{\sigma}\Big), \quad x > 0, \tag{8.1}$$
and with cumulative distribution function (c.d.f.)
$$F(x;\sigma) = 1 - \exp\Big(-\frac{x}{\sigma}\Big), \quad x > 0, \tag{8.2}$$
where $\sigma > 0$ is an unknown scale parameter.


(Rl, ... ,Rm) X(Rl, ... ,Rm)
L et X I:m:n '2:m:n
X(RI, ... ,Rm ) d
, ... , m:m:n y ype- II
. 1 T
eno t e a progressIve
right censored sample. We would like to test whether such a sample comes from
an exponential distribution with p.d.f. (8.1) with rJ being unknown. In other
words, we want to test the hypotheses

Ho : X ~ Exp(rJ)
d
against HI: X i- Exp(rJ). (8.3)

For convenience, we will suppress the censoring scheme in the notation of the
XI:m:n s .
Define the normalized spacings $S_1, S_2, \dots, S_m$ as
$$\begin{aligned}
S_1 &= nX_{1:m:n}^{(R_1,\dots,R_m)},\\
S_2 &= (n - R_1 - 1)(X_{2:m:n}^{(R_1,\dots,R_m)} - X_{1:m:n}^{(R_1,\dots,R_m)}),\\
S_3 &= (n - R_1 - R_2 - 2)(X_{3:m:n}^{(R_1,\dots,R_m)} - X_{2:m:n}^{(R_1,\dots,R_m)}),\\
&\;\;\vdots\\
S_m &= (n - R_1 - \cdots - R_{m-1} - m + 1)(X_{m:m:n}^{(R_1,\dots,R_m)} - X_{m-1:m:n}^{(R_1,\dots,R_m)}).
\end{aligned} \tag{8.4}$$
If the underlying distribution is exponential, $S_1, S_2, \dots, S_m$ defined in (8.4) are
all independent and identically distributed as exponential with scale parameter
$\sigma$; see Balakrishnan and Aggarwala (2000) for details.
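A direct transcription of (8.4) into code may help fix the notation; the helper below is ours (the names are illustrative), with x the observed failure times and R the removal counts.

```python
import numpy as np

def normalized_spacings(x, R, n):
    """Normalized spacings S_1, ..., S_m of (8.4)."""
    x = np.asarray(x, dtype=float)
    m = len(x)
    S = np.empty(m)
    prev, removed = 0.0, 0
    for i in range(m):
        # n - R_1 - ... - R_{i-1} - (i-1) units are still on test at this failure
        S[i] = (n - removed - i) * (x[i] - prev)
        prev = x[i]
        removed += R[i]
    return S
```

Under $H_0$, these spacings behave as an i.i.d. exponential sample, which is what the test statistic below exploits.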
Consider the test statistic given by
$$T = \frac{\sum_{i=1}^{m-1}(m-i)S_i}{(m-1)\sum_{i=1}^{m}S_i}. \tag{8.5}$$

The numerator of the test statistic is a linear combination of the spacings with
decreasing weights, and the denominator is the sum of the spacings. The test
statistic is clearly scale invariant, with small and large values of $T$ leading to the
rejection of $H_0$. The statistic $T$ was suggested by Tiku (1980) for complete and
doubly Type-II censored samples. Balakrishnan (1983) studied the power of the
test against a variety of alternatives, and showed that the test (for complete
samples) performs well compared to standard tests in the literature.

8.3.1 Null distribution of T

To derive the null distribution of the test statistic $T$, we first write $T$ in the
following form:
$$T = \frac{1}{m-1}\sum_{j=1}^{m-1} Z_j, \tag{8.6}$$
where
$$Z_j = \frac{\sum_{i=1}^{j} S_i}{\sum_{i=1}^{m} S_i}, \quad j = 1, 2, \dots, m-1.$$

Since $S_1, S_2, \dots, S_m$ are all independent and identically distributed as exponential
with scale parameter $\sigma$, the joint p.d.f. of $S_1, S_2, \dots, S_m$ is given by
$$f(s_1, \dots, s_m) = \frac{1}{\sigma^m}\exp\Big(-\frac{1}{\sigma}\sum_{i=1}^{m}s_i\Big), \quad s_i > 0, \; i = 1, 2, \dots, m.$$

Consider the transformation
$$Z_j = \frac{\sum_{i=1}^{j} S_i}{\sum_{i=1}^{m} S_i}, \quad j = 1, 2, \dots, m-1, \qquad Z_m = \sum_{i=1}^{m} S_i.$$

We then have
$$\begin{aligned}
S_1 &= Z_1 Z_m,\\
S_2 &= Z_2 Z_m - Z_1 Z_m,\\
S_3 &= Z_3 Z_m - Z_2 Z_m,\\
&\;\;\vdots\\
S_{m-1} &= Z_{m-1}Z_m - Z_{m-2}Z_m,\\
S_m &= Z_m - Z_{m-1}Z_m.
\end{aligned}$$

The Jacobian of this transformation is
$$\det\begin{pmatrix}
z_m & 0 & 0 & \cdots & 0 & z_1\\
-z_m & z_m & 0 & \cdots & 0 & z_2 - z_1\\
0 & -z_m & z_m & \cdots & 0 & z_3 - z_2\\
\vdots & & & \ddots & & \vdots\\
0 & 0 & 0 & \cdots & z_m & z_{m-1} - z_{m-2}\\
0 & 0 & 0 & \cdots & -z_m & 1 - z_{m-1}
\end{pmatrix},$$
which can be shown to equal $z_m^{m-1}$.


Therefore, the joint density of $Z_1, Z_2, \dots, Z_m$ is given by
$$f_{Z_1,\dots,Z_m}(z_1, \dots, z_m) = \frac{1}{\sigma^m}e^{-z_m/\sigma}z_m^{m-1}, \quad 0 < z_1 < z_2 < \cdots < z_{m-1} < 1, \; z_m > 0,$$
which yields the joint density of $Z_1, Z_2, \dots, Z_{m-1}$ to be
$$\int_0^\infty \frac{1}{\sigma^m}e^{-z_m/\sigma}z_m^{m-1}\,dz_m = (m-1)!, \quad 0 < z_1 < z_2 < \cdots < z_{m-1} < 1.$$

The joint distribution of $Z_1, Z_2, \dots, Z_{m-1}$ is thus the same as the joint
distribution of the $(m-1)$ order statistics (say, $U_{(1)}, \dots, U_{(m-1)}$) obtained from
a random sample of size $(m-1)$ from the Uniform(0,1) distribution (say,
$U_1, \dots, U_{m-1}$). Hence, we immediately have
$$(m-1)T \stackrel{d}{=} \sum_{i=1}^{m-1} Z_i \stackrel{d}{=} \sum_{i=1}^{m-1} U_{(i)} \stackrel{d}{=} \sum_{i=1}^{m-1} U_i.$$
This implies that the null distribution of the test statistic $T$ is exactly the same
as that of the average of $m-1$ i.i.d. Uniform(0,1) random variables. Therefore, the null
distribution of $T$ tends to normality very rapidly as $m$ increases. It is readily
verified that the mean of the limiting distribution is $E(T) = \frac{1}{2}$ and the variance
is $\mathrm{Var}(T) = \frac{1}{12(m-1)}$.

Remark 8.3.1 The above expressions of $E(T) = \frac{1}{2}$ and $\mathrm{Var}(T) = \frac{1}{12(m-1)}$ can
also be derived by taking expectations on both sides of
$$T(m-1)\sum_{i=1}^{m} S_i = \sum_{i=1}^{m-1}(m-i)S_i$$
and using Basu's theorem with the facts that $2\sum_{i=1}^{m} S_i/\sigma$ is distributed as $\chi^2_{2m}$,
$2S_i/\sigma$ is distributed as $\chi^2_2$, and that the ancillary statistic $T$ is independent of
the complete sufficient statistic $\sum_{i=1}^{m} S_i$.
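The distributional identity just derived is straightforward to confirm by simulation (a check of ours; the values of m and the replication count are arbitrary), since under $H_0$ the spacings themselves are i.i.d. exponential.

```python
import numpy as np

rng = np.random.default_rng(6)
m, B = 12, 20000
S = rng.exponential(1.0, size=(B, m))        # i.i.d. Exp spacings under H0
w = np.arange(m - 1, 0, -1)                  # weights m-1, m-2, ..., 1
T = (S[:, :-1] * w).sum(axis=1) / ((m - 1) * S.sum(axis=1))
print(T.mean(), T.var())                     # compare with 1/2, 1/(12(m-1))
print(0.5, 1.0 / (12 * (m - 1)))
```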

8.4 Power Function Approximation and Simulation Results

8.4.1 Approximation of power function

The power function of the test is given by the probability of the two-sided
rejection region, $\Pr(T \le c_1) + \Pr(T \ge c_2)$, for lower and upper critical points
$c_1 < c_2$.

To compute the power under different alternatives, we need to compute probabilities
of the form $\Pr(T \ge c)$, for $c$ some constant. Since $(m-1)\sum_{i=1}^{m} S_i$
is a positive quantity, we may write
$$\Pr(T \ge c) = \Pr\left[\frac{\sum_{i=1}^{m-1}(m-i)S_i}{(m-1)\sum_{i=1}^{m}S_i} \ge c\right] = \Pr(L \ge 0),$$
where
$$L = \sum_{i=1}^{m-1}(m-i)S_i - c(m-1)\sum_{i=1}^{m}S_i.$$

From (8.4), $L$ may be written as a linear combination of the progressively Type-II
right censored order statistics as
$$L = \sum_{i=1}^{m-1}(m-i)S_i - c(m-1)\sum_{i=1}^{m}S_i = \sum_{i=1}^{m} a_i X_{i:m:n}^{(R_1,\dots,R_m)},$$
where
$$a_i = [(m-i) - c(m-1)](R_i + 1) + (n - i - R_1 - \cdots - R_i), \quad i = 1,\dots,m-1,$$
$$a_m = -c(m-1)(R_m + 1).$$
For large values of $m$, we may approximate the probability by
$$\Pr(L \ge 0) \approx \Pr\Big[Z \ge -\frac{\mu_L}{\sigma_L}\Big], \tag{8.7}$$
where $Z$ is a standard normal random variable, and
$$\mu_L = E(L) = \sum_{i=1}^{m} a_i E\big[X_{i:m:n}^{(R_1,\dots,R_m)}\big],$$
$$\sigma_L^2 = \mathrm{Var}(L) = \sum_{i=1}^{m} a_i^2 \mathrm{Var}\big[X_{i:m:n}^{(R_1,\dots,R_m)}\big] + 2\sum_{i=1}^{m-1}\sum_{j=i+1}^{m} a_i a_j \mathrm{Cov}\big[X_{i:m:n}^{(R_1,\dots,R_m)}, X_{j:m:n}^{(R_1,\dots,R_m)}\big].$$

The single and product moments of progressively Type-II right censored order
statistics occurring in the above expression may be obtained by first-order
approximations; see Balakrishnan and Rao (1997). The idea is to use the probability
integral transformation
$$X_{i:m:n}^{(R_1,\dots,R_m)} \stackrel{d}{=} F^{-1}\big(U_{i:m:n}^{(R_1,\dots,R_m)}\big), \tag{8.8}$$
where $U_{i:m:n}^{(R_1,\dots,R_m)}$ is the $i$th progressively Type-II right censored order statistic
from the Uniform(0,1) distribution, and $F^{-1}$ is the inverse c.d.f. of the
underlying distribution.
The mean, variance, and covariance for progressively Type-II censored order
statistics from the Uniform(0,1) distribution are given by [see Balakrishnan and
Aggarwala (2000)]
$$E(U_{i:m:n}) = \alpha_{i:m:n} = 1 - a_i, \quad i = 1,\dots,m, \tag{8.9}$$
$$\mathrm{Var}(U_{i:m:n}) = a_i b_i, \quad i = 1,\dots,m, \tag{8.10}$$
$$\mathrm{Cov}(U_{i:m:n}, U_{j:m:n}) = b_i a_j, \quad 1 \le i \le j \le m, \tag{8.11}$$
where
$$a_i = \prod_{k=1}^{i}\frac{R_k + R_{k+1} + \cdots + R_m + m - k + 1}{R_k + R_{k+1} + \cdots + R_m + m - k + 2},$$
$$b_i = \prod_{k=1}^{i}\frac{R_k + R_{k+1} + \cdots + R_m + m - k + 2}{R_k + R_{k+1} + \cdots + R_m + m - k + 3} - \prod_{k=1}^{i}\frac{R_k + R_{k+1} + \cdots + R_m + m - k + 1}{R_k + R_{k+1} + \cdots + R_m + m - k + 2}.$$

Expanding $F^{-1}(U_{i:m:n})$ in a Taylor series (keeping only the first term), we
have
$$E(X_{i:m:n}) \approx F^{-1}(\alpha_{i:m:n}), \tag{8.12}$$
$$\mathrm{Var}(X_{i:m:n}) \approx \big\{F^{-1(1)}(\alpha_{i:m:n})\big\}^2 \mathrm{Var}(U_{i:m:n}), \tag{8.13}$$
$$\mathrm{Cov}(X_{i:m:n}, X_{j:m:n}) \approx \big\{F^{-1(1)}(\alpha_{i:m:n})\big\}\big\{F^{-1(1)}(\alpha_{j:m:n})\big\}\mathrm{Cov}(U_{i:m:n}, U_{j:m:n}), \tag{8.14}$$
where $F^{-1(1)}(u) = \frac{dF^{-1}(u)}{du} = \frac{1}{f(F^{-1}(u))}$. Balakrishnan and Rao (1997) used
these results to derive expressions for the approximate best linear unbiased
estimators for an arbitrary location-scale family of distributions.
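Putting (8.7)-(8.14) together, a compact sketch of the approximate power computation runs as follows; this is our own assembly (the Weibull(2) alternative, scheme [1] of Table 8.1, and the upper tail of the two-sided 10% test are illustrative choices, and the $a_i$, $b_i$ products follow the reconstruction displayed above).

```python
import numpy as np
from scipy.stats import norm, weibull_min

n, m = 20, 8
R = np.array([12, 0, 0, 0, 0, 0, 0, 0])          # scheme [1] of Table 8.1

# moments of uniform progressive order statistics, (8.9)-(8.11)
gam = np.array([R[k:].sum() + m - k for k in range(m)], dtype=float)
aprod = np.cumprod(gam / (gam + 1.0))
bprod = np.cumprod((gam + 1.0) / (gam + 2.0)) - aprod
alpha = 1.0 - aprod                               # E(U_{i:m:n})
idx = np.arange(m)
covU = np.outer(bprod, aprod)                     # b_i * a_j for i <= j
covU = np.where(idx[:, None] <= idx[None, :], covU, covU.T)

# coefficients a_i of L = sum_i a_i X_{i:m:n} at the upper critical point c
c = 0.5 + norm.ppf(0.95) * np.sqrt(1.0 / (12.0 * (m - 1)))
coef = np.empty(m)
cumR = np.cumsum(R)
for i in range(1, m):                             # 1-based i = 1, ..., m-1
    coef[i - 1] = ((m - i) - c * (m - 1)) * (R[i - 1] + 1) + (n - i - cumR[i - 1])
coef[m - 1] = -c * (m - 1) * (R[m - 1] + 1)

# first-order approximation (8.12)-(8.14) under the Weibull(2) alternative
q = weibull_min.ppf(alpha, 2)
d = 1.0 / weibull_min.pdf(q, 2)                   # F^{-1(1)}(alpha_i)
muL = np.dot(coef, q)
varL = (coef * d) @ covU @ (coef * d)
print(norm.cdf(muL / np.sqrt(varL)))              # approx Pr(T >= c), one tail
```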
We would like to point out that even though limiting results for linear
combinations of regular order statistics are available [see, for example, David
(1981)], such results under progressive censoring have not been studied yet. It
is unclear whether the results in the regular case can be easily extended to
progressive censoring.

Instead of rewriting the test statistic as a linear combination of the progressively
censored order statistics, we may directly approximate the power by
considering the test statistic $T$ itself. We may write
$$T = \frac{\sum_{i=1}^{m-1}(m-i)S_i}{(m-1)\sum_{i=1}^{m}S_i} = \frac{W_1}{W_2}. \tag{8.15}$$
We then have
$$E(T) \approx \frac{E(W_1)}{E(W_2)}, \tag{8.16}$$
$$\mathrm{Var}(T) \approx \Big[\frac{E(W_1)}{E(W_2)}\Big]^2\Big[\frac{\mathrm{Var}(W_1)}{E^2(W_1)} + \frac{\mathrm{Var}(W_2)}{E^2(W_2)} - \frac{2\,\mathrm{Cov}(W_1, W_2)}{E(W_1)E(W_2)}\Big]. \tag{8.17}$$
See Kendall and Stuart (1969) for details. We may then approximate the distribution
of $T$ by a normal distribution with mean and variance given by the
above expressions.

8.4.2 Monte Carlo power comparison

In order to assess the power properties of the test statistic $T$, a Monte Carlo
simulation study was conducted to determine the power under different alternatives.
The following lifetime distributions were used as alternatives to the
exponential distribution:
1. Weibull distribution with shape parameter 0.5, 2.0;
2. Lomax distribution with shape parameter 0.5, 2.0;
3. Lognormal distribution with shape parameter 0.5, 1.0;
4. Gamma distribution with shape parameter 0.75, 2.0.
For a detailed discussion of various properties of these distributions, one may
refer to Johnson, Kotz, and Balakrishnan (1994). For different choices of sample
sizes and progressive censoring schemes, we generated 100,000 sets of data in
order to obtain the estimated power values. These values are tabulated in Tables
8.2-8.5 for $n = 20$ ($m = 8, 12, 16$), $40$ ($m = 10, 20, 30$) and $60$ ($m = 20, 40, 50$)
with three different progressive censoring schemes in each case. For convenience,
Table 8.1 lists the different censoring schemes (c.s.) used in the simulation
study.
The power values presented in Tables 8.2-8.5 clearly show that the proposed
test performs very well for all the alternatives considered. The power
increases with $m$ for a fixed $n$, and also increases as $n$ increases. We also
calculated the power values of $T$ from the normal approximation using the
two methods and found them to be close to the simulated power values for
large values of $m$. The approximations are also presented in Tables 8.2-8.5 for
comparison. It is important to note from these tables that the approximation
in (8.7) does not work well for small values of $m$, even when the value of $n$ is
large.

To demonstrate the accuracy of the Monte Carlo simulations, we also tabulate
in Table 8.6 the null probabilities for the exponential distribution at levels
2.5(2.5)50%. Since the critical values are independent of $n$ and the progressive
censoring schemes, we only present the values for different values of $m$. We can
see that the simulated probabilities under the null distribution are very close
to the pre-fixed levels, which suggests that the Monte Carlo method provides a
very good approximation. The results in Table 8.6 also provide ample evidence
of the accuracy of the normal approximation to the null distribution of the test
statistic. If we have to report the p-value of the test, we are then justified in
computing tail probabilities using the normal approximation.

8.5 Modified EDF and Shapiro-Wilk Statistics

As we have mentioned in the introduction, there have been several goodness-of-fit
tests for exponentiality proposed in the literature. Spinelli and Stephens
(1987) compared the performance of several test procedures based on the EDF
as well as those based on regression methods. They concluded that, in particular,
two statistics, viz. the Cramer-von Mises $A^2$ and the Shapiro-Wilk $W_E$,
had overall better power performance. In this section, we will modify the two
statistics in the case of progressively Type-II censored data, and compare their
performance with the test based on spacings proposed in Section 8.3.


Testing the null hypothesis that the sample comes from an exponential distribution
is equivalent to testing the hypothesis that the spacings $S_1, S_2, \dots, S_m$
are distributed as scaled exponential. We can then apply the procedures in
Spinelli and Stephens (1987) to the $S_i$'s as follows: Let $S_{(1)}, S_{(2)}, \dots, S_{(m)}$ be
the ordered spacings. Let $\hat\sigma = \bar S = \frac{1}{m}\sum_{i=1}^{m} S_i$ denote the estimator of $\sigma$ based
on generalized least squares. Define
$$w_i = S_{(i)}/\hat\sigma \quad\text{and}\quad z_i = 1 - \exp(-w_i), \quad i = 1,\dots,m.$$
The test statistic $A^2$ is then defined as
$$A^2 = -\frac{1}{m}\sum_{i=1}^{m}(2i-1)\{\ln z_i + \ln(1 - z_{m+1-i})\} - m. \tag{8.18}$$
Large values of $A^2$ lead to rejection of the null hypothesis that the sample comes
from an exponential distribution.
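In code, the computation of $A^2$ from the ordered spacings is short; this helper is ours, and it assumes the standard Anderson-Darling pairing of $z_i$ with $z_{m+1-i}$ shown in (8.18).

```python
import numpy as np

def a2_statistic(S):
    """A^2 of (8.18), computed from the normalized spacings S."""
    S = np.sort(np.asarray(S, dtype=float))
    m = len(S)
    z = 1.0 - np.exp(-S / S.mean())   # z_i = 1 - exp(-S_(i)/sigma_hat)
    i = np.arange(1, m + 1)
    return -np.sum((2 * i - 1) * (np.log(z) + np.log(1.0 - z[::-1]))) / m - m
```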
An alternative test was introduced by Shapiro and Wilk (1972) that compares
the generalized least squares estimator of $\sigma$ with the estimator obtained
from the sample variance. The resulting test statistic $W_E$ is defined as
(8.19)
This is a two-tailed test.


A Monte Carlo simulation study was conducted to compare the three procedures.
We simulated the 5 and 10 percentage points for the test statistics $A^2$
and $W_E$, and used these values to compute the power for different alternatives.
The results presented in Tables 8.2-8.5 show that, for all the alternatives considered,
the test based on spacings performs significantly better than either of
the other two procedures.

8.6 Two-Parameter Exponential Case

We may also consider a test for the two-parameter exponential distribution
(location-scale model) with p.d.f.
$$f(x;\mu,\sigma) = \frac{1}{\sigma}\exp\Big[-\Big(\frac{x-\mu}{\sigma}\Big)\Big], \quad x > \mu, \tag{8.20}$$
where the scale $\sigma > 0$ and the location $\mu$ are unknown parameters. In this case,
the progressively Type-II right censored spacings $S_1^*, S_2^*, \dots, S_m^*$ are defined as
$$S_1^* = n\big(X_{1:m:n}^{(R_1,\dots,R_m)} - \mu\big), \qquad S_i^* = S_i, \quad i = 2, 3, \dots, m, \tag{8.21}$$



where the $S_i$'s are as defined earlier in (8.4). Once again, $S_1^*, S_2^*, \dots, S_m^*$ are all
independent and identically distributed as exponential with scale parameter $\sigma$.
Since the first spacing $S_1^*$ involves the unknown parameter $\mu$, the test statistic
$T$ proposed earlier in (8.5) may be modified as
$$T^* = \frac{\sum_{i=2}^{m-1}(m-i)S_i^*}{(m-2)\sum_{i=2}^{m}S_i^*}. \tag{8.22}$$
Following the same procedure outlined in Section 8.3, the null distribution of
the test statistic $T^*$ can be derived. The distribution of $T^*$ is the same as that of the
distribution of the average of $(m-2)$ i.i.d. Uniform(0,1) random variables.
Hence, the asymptotic null distribution of $T^*$ is normal with mean $E(T^*) = \frac{1}{2}$
and variance $\mathrm{Var}(T^*) = \frac{1}{12(m-2)}$. Furthermore, the power approximation
procedure discussed in Section 8.4 can also be adapted to this two-parameter
exponential case.
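A sketch of the two-parameter test in code (ours; the normal reference for the p-value follows the asymptotics above) takes the full spacings vector, with the unusable $S_1^*$ in the first slot.

```python
import numpy as np
from scipy.stats import norm

def tstar_test(S):
    """T* of (8.22) and a two-sided normal-approximation p-value.

    S[0] (= S_1^*) is never used and may be a placeholder such as np.nan.
    """
    S = np.asarray(S, dtype=float)
    m = len(S)
    w = np.arange(m - 2, 0, -1)           # weights m-2, ..., 1 for S_2..S_{m-1}
    t = (S[1:-1] * w).sum() / ((m - 2) * S[1:].sum())
    z = (t - 0.5) / np.sqrt(1.0 / (12.0 * (m - 2)))
    return t, 2.0 * norm.sf(abs(z))
```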

8.7 Illustrative Examples


8.7.1 Example 1: One-parameter exponential case
In this section, we present two examples to illustrate the use of the statistics
T and T* in testing for the validity of the one- and two-parameter exponential
distributions for an observed progressively Type-II right censored sample.
We consider the following progressively Type-II right censored data giving
the times to breakdown of an insulating fluid tested at 34 kilovolts. These data
are taken from Table 6.1 of Nelson (1982), and have been considered earlier by
Viveros and Balakrishnan (1994). The observations in the original time scale,
the progressive censoring pattern, and the spacings computed from Eq. (8.4) are
as follows:

Progressively censored sample presented by Viveros and Balakrishnan (1994)

i          1        2        3        4        5        6        7        8
X_{i:m:n}  0.18999  0.77997  0.95993  1.30996  2.77986  4.84962  6.49999  7.35000
R_i        0        0        3        0        3        0        0        5
S_i        3.60975  10.61969 3.05924  4.55051  17.63873 16.55808 11.55257 5.10007

Nelson (1982) and Viveros and Balakrishnan (1994) considered a Weibull
model for these data and constructed confidence intervals for the Weibull shape
and scale parameters based on the complete sample and the progressively censored
sample, respectively. In both cases, the confidence interval for the shape
parameter contained the value 1 (the shape parameter value for the exponential
case), leading to the conclusion that the data are consistent with an
exponential distribution.

In this example, we have $n = 19$, $m = 8$. The test statistic is computed as
$$T = \frac{\sum_{i=1}^{m-1}(m-i)S_i}{(m-1)\sum_{i=1}^{m}S_i} = \frac{220.06957}{508.82052} = 0.43251,$$
and the p-value is
$$2\Phi\Big(\frac{0.43251 - 0.5}{\sqrt{1/84}}\Big) = 2 \times 0.26810 = 0.53620.$$
Based on this p-value, we fail to reject the null hypothesis that the random
sample is from an exponential distribution. This is consistent with the findings
of Nelson (1982) and Viveros and Balakrishnan (1994).
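This computation is easily reproduced from the listed spacings (a verification we added).

```python
import numpy as np
from scipy.stats import norm

S = np.array([3.60975, 10.61969, 3.05924, 4.55051,
              17.63873, 16.55808, 11.55257, 5.10007])
m = len(S)
T = (S[:-1] * np.arange(m - 1, 0, -1)).sum() / ((m - 1) * S.sum())
p = 2.0 * norm.cdf((T - 0.5) / np.sqrt(1.0 / (12.0 * (m - 1))))
print(T, p)   # 0.43251 and 0.53620, as in the text
```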

8.7.2 Example 2: Two-parameter exponential case

Spinelli and Stephens (1987) reported data with 32 observations on measurements
of modulus of rupture (a measure of the breaking strength of lumber)
of wood beams. For the purpose of illustrating the test procedure outlined in
Section 8.6, a progressively Type-II right censored sample of size $m = 20$ has
been randomly generated from the $n = 32$ observations in Table 3 of Spinelli
and Stephens (1987). The observations, the removal pattern applied, and the
corresponding spacings computed from Eq. (8.21) are as follows:

Progressively censored sample generated from the measurements of modulus
of rupture of wood beams data of Spinelli and Stephens (1987)

i          1      2       3      4       5       6       7       8      9      10
X_{i:m:n}  43.19  49.44   51.55  56.63   67.27   78.47   86.59   90.63  94.38  98.21
R_i        0      2       0      0       2       0       0       0      0      0
S_i^*      --     193.75  59.08  137.16  276.64  257.60  178.64  84.84  75.00  72.77

i          11     12      13     14      15      16      17      18     19     20
X_{i:m:n}  98.39  99.74   100.22 103.48  105.54  107.13  108.14  108.94 110.81 116.39
R_i        2      2       0      0       0       0       1       1      0      2
S_i^*      3.24   20.25   5.76   35.86   20.60   14.31   8.08    4.80   7.48   16.74

Spinelli and Stephens (1987) studied tests based on regression and the empirical
distribution function for testing the null hypothesis of exponentiality
using the complete sample. They found that all the test statistics were highly
significant (with p-value < 0.01) and rejected the null hypothesis that the data
are exponentially distributed with p.d.f. (8.20).

The test statistic in (8.22) for testing the validity of a two-parameter exponential
distribution is computed as
$$T^* = \frac{\sum_{i=2}^{m-1}(m-i)S_i^*}{(m-2)\sum_{i=2}^{m}S_i^*} = \frac{19983.72}{26506.8} = 0.75391,$$

and the p-value is
$$2\Big[1 - \Phi\Big(\frac{0.75391 - 0.5}{\sqrt{1/216}}\Big)\Big] = 2 \times 0.0000951 = 0.00019026.$$
From this p-value, we observe that the data provide enough evidence to reject
the null hypothesis that the progressively censored sample comes from a two-parameter
exponential distribution, which agrees with the conclusion of Spinelli
and Stephens (1987) drawn from the complete sample.

8.8 Multi-Sample Extension

To test that $k$ independent progressively censored samples $X_{1:m_i:n_i}^{(R_{1i},\dots,R_{m_ii})}, \dots,$
$X_{m_i:m_i:n_i}^{(R_{1i},\dots,R_{m_ii})}$, $i = 1, 2, \dots, k$, come from exponential populations $E(\mu_i, \sigma_i)$, we
can generalize the test statistic $T^*$ in (8.22) as follows:
$$T_*^* = \frac{\sum_{i=1}^{k}(m_i - 2)T_i^*}{\sum_{i=1}^{k}(m_i - 2)}, \tag{8.23}$$
where $T_i^*$ is the test statistic computed from the $i$th sample. Small and large
values of $T_*^*$ indicate the non-exponentiality of at least one of the $k$ samples.

If we wish to test that the samples come from one-parameter exponential
populations $E(\sigma_i)$, we can generalize the test statistic $T$ in (8.5) as follows:
$$T_* = \frac{\sum_{i=1}^{k}(m_i - 1)T_i}{\sum_{i=1}^{k}(m_i - 1)}, \tag{8.24}$$
where $T_i$ is the test statistic computed from the $i$th sample. Small and large
values of $T_*$ indicate the non-exponentiality of at least one of the $k$ samples.
Note that, in both cases, we may allow the censoring schemes ($R_{ij}$'s), sample
sizes ($n_i$), and effective sample sizes ($m_i$) for the $k$ samples to be different.
The null distribution of $T_*^*$ ($T_*$) may once again be shown to be equivalent to
the distribution of the average of $\sum_{i=1}^{k}(m_i - 2)$ ($\sum_{i=1}^{k}(m_i - 1)$) Uniform(0,1)
random variables. To compute the power under different alternatives, we may
use an approximation similar to the one discussed in Section 8.4. In this case,
however, we will not be able to write $\Pr(T_*^* > c)$ in terms of a probability involving
linear combinations of progressively Type-II right censored order statistics
from each sample. We rely on the expressions in (8.16) and (8.17) to compute
the moments of $T_*^*$ and $T_*$, and the corresponding normal approximations to
the probabilities.
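For completeness, here is a small sketch of ours of the one-parameter combination (8.24) with its normal-reference p-value.

```python
import numpy as np
from scipy.stats import norm

def combined_T(T_list, m_list):
    """T_* of (8.24); T_list holds the per-sample T_i, m_list the m_i."""
    T_arr = np.asarray(T_list, dtype=float)
    w = np.asarray(m_list, dtype=float) - 1.0
    Tbar = np.sum(w * T_arr) / np.sum(w)
    # T_* is an average of sum(m_i - 1) iid U(0,1), so Var = 1/(12 * sum(w))
    z = (Tbar - 0.5) * np.sqrt(12.0 * np.sum(w))
    return Tbar, 2.0 * norm.sf(abs(z))
```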
Table 8.7 presents some simulation results for $k = 2, 3$ in the case of the one-parameter
exponential model. The approximate values of the power are reasonably
close to the simulated values for most cases considered. It is of interest to note
that combinations of censoring schemes for the $k$ samples provide distinctly
different power values.

8.9 Conclusions

In this article, we have proposed goodness-of-fit tests for the one- and two-parameter
exponential models under general progressive Type-II censoring.
These tests are based on normalized spacings, generalizing tests proposed by
Tiku (1980). The exact and asymptotic null distributions of the test statistics
have been derived. Further, two approximations to compute the power under
different alternatives have been suggested.

Results of the simulation study for a wide range of sample sizes and censoring
schemes show that the test performs well in detecting departures from
exponentiality. If the alternative model is distinctly different from the exponential,
the power values are close to 1. The approximations for the power are very close
to the values obtained through simulations. The proposed test procedures are
illustrated on some real data for the one- and two-parameter exponential models.
The conclusions drawn from these tests are consistent with those drawn
by other authors using different procedures. Finally, some extensions to the
multi-sample case have been suggested.

There are several theoretical aspects that still need to be looked at carefully.
In particular, it would be useful to develop limit theorems for linear combinations
of progressively Type-II censored order statistics. This would provide
theoretical justification for the normal approximations suggested in this paper.
Finally, it would also be interesting to develop analogous goodness-of-fit tests
for the general location-scale family of distributions.

Table 8.1: Progressive censoring schemes used in the Monte Carlo simulation study

n    m    (R_1, R_2, ..., R_m)                                      Scheme No.
20   8    R_1 = 12, R_i = 0 for i != 1                              [1]
          R_8 = 12, R_i = 0 for i != 8                              [2]
          R_1 = R_8 = 6, R_i = 0 for i != 1, 8                      [3]
     12   R_1 = 8, R_i = 0 for i != 1                               [4]
          R_12 = 8, R_i = 0 for i != 12                             [5]
          R_3 = R_5 = R_7 = R_9 = 2, R_i = 0 for i != 3, 5, 7, 9    [6]
     16   R_1 = 4, R_i = 0 for i != 1                               [7]
          R_16 = 4, R_i = 0 for i != 16                             [8]
          R_5 = 4, R_i = 0 for i != 5                               [9]
40   10   R_1 = 30, R_i = 0 for i != 1                              [10]
          R_10 = 30, R_i = 0 for i != 10                            [11]
          R_1 = R_5 = R_10 = 10, R_i = 0 for i != 1, 5, 10          [12]
     20   R_1 = 20, R_i = 0 for i != 1                              [13]
          R_20 = 20, R_i = 0 for i != 20                            [14]
          R_i = 1 for i = 1, 2, ..., 20                             [15]
     30   R_1 = 10, R_i = 0 for i != 1                              [16]
          R_30 = 10, R_i = 0 for i != 30                            [17]
          R_1 = R_30 = 5, R_i = 0 for i != 1, 30                    [18]
60   20   R_1 = 40, R_i = 0 for i != 1                              [19]
          R_20 = 40, R_i = 0 for i != 20                            [20]
          R_1 = R_20 = 10, R_10 = 20, R_i = 0 for i != 1, 10, 20    [21]
     40   R_1 = 20, R_i = 0 for i != 1                              [22]
          R_40 = 20, R_i = 0 for i != 40                            [23]
          R_{2i-1} = 1, R_{2i} = 0 for i = 1, 2, ..., 20            [24]
     50   R_1 = 10, R_i = 0 for i != 1                              [25]
          R_50 = 10, R_i = 0 for i != 50                            [26]
          R_1 = R_50 = 5, R_i = 0 for i != 1, 50                    [27]

Table 8.2: Monte Carlo power estimates for the Weibull distribution at 10% and
5% levels of significance

Weibull(0.5)
           10%                                  5%
c.s.  T  App(L)  App(W)  A^2  W_E  T  App(L)  App(W)  A^2  W_E
1 0.71672 0.79033 0.70467 0.60746 0.55796 0.61452 0.69872 0.59668 0.52883 0.45663
2 0.51847 0.66998 0.57311 0.37713 0.33930 0.39990 0.55197 0.45694 0.29668 0.24614
3 0.57377 0.72616 0.63247 0.43776 0.39414 0.45581 0.61273 0.51555 0.35465 0.29455
4 0.83379 0.83901 0.81327 0.66873 0.63183 0.76001 0.77399 0.72796 0.59327 0.53617
5 0.69 96 0.78209 0.73798 0.47934 0.44554 0.59111 0.68995 0.63459 0.39233 0.34245
6 0.85449 0.85515 0.83683 0.68503 0.65829 0.79102 0.80064 0.76174 0.61718 0.56993
7 0.90389 0.87568 0.88596 0.71755 0.68749 0.85230 0.82815 0.82318 0.64605 0.59762
8 0.84024 0.85959 0.86140 0.59537 0.56385 0.76609 0.80119 0.78752 0.51014 0.46244
9 0.92164 0.89050 0.90996 0.74722 0.72349 0.87656 0.84947 0.85511 0.68152 0.63910
10 0.80630 0.83480 0.79047 0.69662 0.64007 0.72293 0.76423 0.69720 0.61928 0.54240
11 0.58011 0.69950 0.62607 0.39267 0.36022 0.46239 0.58494 0.51114 0.31052 0.26233
12 0.70747 0.81095 0.75718 0.51728 0.48377 0.60430 0.72898 0.65652 0.43550 0.38432
13 0.95650 0.91312 0.94612 0.80383 0.77610 0.92708 0.88138 0.90818 0.74337 0.69770
14 0.86101 0.87312 0.88381 0.56856 0.54223 0.78663 0.81655 0.81548 0.48023 0.43650
15 0.94605 0.91715 0.95127 0.75547 0.73113 0.91204 0.88713 0.91625 0.68957 0.64844
16 0.99036 0.95235 0.98797 0.87649 0.86136 0.98199 0.93601 0.97634 0.83067 0.80121
17 0.97063 0.94666 0.97797 0.74441 0.72546 0.94722 0.92471 0.95751 0.67077 0.63439
18 0.97969 0.95122 0.98410 0.79352 0.77495 0.96253 0.93245 0.96856 0.72763 0.69231
19 0.95994 0.91637 0.95033 0.82456 0.79270 0.93151 0.88588 0.91435 0.76548 0.71624
20 0.83761 0.85969 0.86431 0.53713 0.50973 0.75680 0.79580 0.78887 0.44811 0.40481
21 0.94038 0.92336 0.95982 0.72449 0.70207 0.90314 0.89470 0.92723 0.65344 0.61401
22 0.99840 0.97427 0.99792 0.93655 0.92493 0.99638 0.96575 0.99528 0.90704 0.88510
23 0.98985 0.97129 0.99328 0.80242 0.78826 0.97958 0.95890 0.98529 0.73386 0.70346
24 0.99819 0.97611 0.99834 0.92981 0.91833 0.99601 0.96820 0.99615 0.89855 0.87707
25 0.99970 0.98529 0.99961 0.96329 0.95653 0.99916 0.98049 0.99901 0.94347 0.92864
26 0.99872 0.98636 0.99921 0.90278 0.89330 0.99691 0.98104 0.99797 0.85950 0.83900
27 0.99924 0.98660 0.99949 0.92795 0.92069 0.99818 0.98174 0.99866 0.89362 0.87540

Weibull(2.0)
           10%                                  5%
c.s.  T  App(L)  App(W)  A^2  W_E  T  App(L)  App(W)  A^2  W_E
1 0.81945 0.89734 0.93811 0.25905 0.23759 0.68849 0.81533 0.84627 0.17926 0.14879
2 0.49956 0.55783 0.55755 0.14362 0.13726 0.34470 0.39873 0.39878 0.08634 0.07574
3 0.60582 0.68462 0.68882 0.16607 0.15746 0.44619 0.52574 0.52552 0.10594 0.09065
4 0.91172 0.94854 0.97258 0.30676 0.25619 0.82316 0.89889 0.92478 0.21770 0.16360
5 0.72826 0.78185 0.78672 0.19354 0.16742 0.58402 0.64632 0.64752 0.12369 0.09740
6 0.87935 0.92728 0.94502 0.24654 0.21070 0.76253 0.85062 0.86532 0.16674 0.12852
7 0.95772 0.97482 0.98732 0.34094 0.27326 0.90236 0.94434 0.96109 0.24467 0.17720
8 0.89272 0.92350 0.934U4 0.26365 0.21288 0.79847 0.84933 0.85772 0.17796 0.13134
9 0.96362 0.97831 0.98915 0.33296 0.27218 0.91216 0.94911 0.96429 0.23641 0.17514
10 0.93802 0.95734 0.99257 0.42401 0.35878 0.87462 0.92656 0.97241 0.32530 0.24949
11 0.58036 0.63285 0.63318 0.15917 0.14561 0.42602 0.47993 0.47986 0.09810 0.08196
12 0.73598 0.79830 0.80971 0.19894 0.17756 0.58615 0.66508 0.66851 0.12964 0.10439
13 0.99378 0.99339 0.99923 0.53410 0.40719 0.98161 0.98593 0.99659 0.42401 0.29094
14 0.91227 0.93098 0.93719 0.28490 0.22011 0.83312 0.86453 0.86957 0.19431 0.13708
15 0.98015 0.98502 0.99220 0.37401 0.29222 0.94816 0.96359 0.97446 0.27154 0.19144
16 0.99956 0.99932 0.99994 0.61063 0.45501 0.99800 0.99806 0.99964 0.49531 0.33160
17 0.99342 0.99455 0.99691 0.45004 0.32440 0.98097 0.98533 0.98956 0.33607 0.21861
18 0.99723 0.99742 0.99910 0.51169 0.37044 0.99111 0.99282 0.99636 0.39531 0.25746
19 0.99673 0.99372 0.99978 0.63063 0.48642 0.98981 0.98830 0.99890 0.52833 0.36543
20 0.89434 0.91226 0.91695 0.26949 0.20967 0.80681 0.83526 0.83849 0.18194 0.12919
21 0.97705 0.98067 0.98895 0.37580 0.29183 0.94247 0.95524 0.96683 0.27360 0.19119
22 1.00000 0.99994 1.00000 0.76301 0.57640 0.99993 0.99982 0.99999 0.66509 0.45036
23 0.99878 0.99896 0.99953 0.54811 0.38065 0.99612 0.99674 0.99805 0.42790 0.26786
24 1.00000 0.99994 1.00000 0.67704 0.50448 0.99984 0.99977 0.99997 0.56469 0.37931
25 1.00000 1.00000 1.00000 0.81038 0.61641 1.00000 0.99999 1.00000 0.71880 0.49141
26 0.99999 0.99996 0.99999 0.69585 0.49341 0.99988 0.99985 0.99996 0.58462 0.36861
27 0.99999 0.99998 1.00000 0.74535 0.54158 0.99997 0.99994 0.99999 0.64051 0.41547

Table 8.3: Monte Carlo power estimates for the Lomax distribution at 10% and 5%
levels of significance

Lomax(0.5)
           10%                                  5%
c.s.  T  App(L)  App(W)  A^2  W_E  T  App(L)  App(W)  A^2  W_E
[1 0.82303 0.93658 0.79763 0.68983 0.68715 0.77238 0.91681 0.73874 0.64701 0.63292
2 0.21607 0.31514 0.23585 0.12015 0.11848 0.13322 0.23460 0.15311 0.06864 0.06345
3 0.35568 0.48739 0.37394 0.17834 0.17364 0.25550 0.37055 0.27518 0.11914 0.10764
4) 0.93605 0.96461 0.92302 0.82777 0.81768 0.91141 0.95555 0.89321 0.79650 0.77966
5 0.50119 0.61830 0.51626 0.22339 0.20745 0.39396 0.50200 0.41055 0.15846 0.13678
6 0.87040 0.93725 0.83274 0.73201 0.71268 0.83437 0.92106 0.78946 0.69530 0.66573
7 0.97759 0.97571 0.97297 0.90291 0.89338 0.96663 0.97001 0.95990 0.88231 0.86585
[8 0.84985 0.88405 0.85203 0.54634 0.51649 0.79228 0.84954 0.79410 0.47630 0.43396
9 0.97492 0.97467 0.97028 0.89796 0.88618 0.96332 0.96873 0.95639 0.87614 0.85794
[10) 0.89205 0.95408 0.87313 0.76905 0.76184 0.85680 0.94141 0.83023 0.73191 0.71581
[11) 0.15319 0.22198 0.16594 0.10395 0.10412 0.08547 0.15746 0.09761 0.05345 0.05197
12 0.30073 0.38308 0.31154 0.14420 0.13878 0.20851 0.28029 0.22031 0.08717 0.07877
13 0.99203 0.98155 0.99126 0.94559 0.93672 0.98753 0.97742 0.98616 0.93202 0.91870
14 0.51911 0.57796 0.53041 0.17396 0.15807 0.40612 0.45920 0.41917 0.11173 0.09385
15 0.87879 0.88714 0.84641 0.63073 0.57919 0.83686 0.85845 0.80122 0.57361 0.51157
16 0.99947 0.98876 0.99965 0.98655 0.98275 0.99906 0.98639 0.99934 0.98181 0.97631
17 0.95434 0.91384 0.95475 0.61017 0.55853 0.92672 0.88995 0.92805 0.53408 0.46811
[18J 0.98875 0.94983 0.98856 0.83422 0.80025 0.98088 0.93788 0.98073 0.79008 0.74271
19 0.99196 0.98152 0.99125 0.94550 0.93663 0.98750 0.97739 0.98614 0.93192 0.91861
[20) 0.27506 0.31672 0.29002 0.11468 0.10877 0.18182 0.21508 0.19538 0.06215 0.05715
21 0.67153 0.72728 0.67173 0.29673 0.26005 0.58254 0.64608 0.58271 0.22258 0.18253
22 0.99998 0.99205 0.99999 0.99707 0.99574 0.99997 0.99041 0.99998 0.99578 0.99347
23 0.95153 0.90840 0.95330 0.49868 0.43662 0.92107 0.87992 0.92351 0.41043 0.33957
24 0.99844 0.97332 0.99211 0.96215 0.94623 0.99740 0.96775 0.98879 0.95045 0.92816
25 1.00000 0.99392 1.00000 0.99937 0.99884 1.00000 0.99269 1.00000 0.99906 0.99835
26 0.99923 0.96646 0.99920 0.91593 0.88434 0.99846 0.95875 0.99839 0.88664 0.84007
27 0.99997 0.97647 0.99992 0.98098 0.97197 0.99987 0.97147 0.99982 0.97274 0.95854

Lomax(2.0)
         10%                                                  5%
c.s.  T        App(L)   App(W)   A^2      WE       T        App(L)   App(W)   A^2      WE
1 0.30907 0.28451 0.21716 0.17958 0.16996 0.22452 0.21132 0.14050 0.12285 0.10901
2 0.11154 0.18329 0.12257 0.09971 0.09993 0.05553 0.13127 0.06576 0.05001 0.05019
3 0.12760 0.19612 0.13394 0.10209 0.10199 0.06827 0.14152 0.07401 0.05164 0.05101
4 0.40318 0.34265 0.28542 0.21513 0.19097 0.31774 0.24740 0.19945 0.15517 0.12782
5 0.14348 0.18665 0.14423 0.10234 0.10098 0.07840 0.12598 0.08144 0.05257 0.05097
6 0.34351 0.26750 0.22272 0.19374 0.17011 0.26190 0.18905 0.14520 0.13441 0.11023
7 0.48634 0.41182 0.35544 0.24258 0.20713 0.39831 0.30515 0.26333 0.17987 0.14128
8 0.24019 0.25206 0.21992 0.11499 0.11005 0.15713 0.16879 0.14033 0.06245 0.05625
9 0.47963 0.40199 0.34740 0.24079 0.20562 0.39190 0.29672 0.25631 0.17800 0.13977
10 0.36079 0.30957 0.25018 0.19619 0.18124 0.27491 0.22425 0.16867 0.13741 0.11851
11 0.10710 0.16402 0.11660 0.09873 0.10080 0.05239 0.11186 0.06150 0.04911 0.04951
12 0.12625 0.17748 0.12956 0.10021 0.10212 0.06505 0.12231 0.07084 0.05093 0.05065
13 0.55863 0.47640 0.42100 0.27115 0.22498 0.46974 0.36730 0.32598 0.20418 0.15610
14 0.13945 0.16562 0.14154 0.10118 0.09983 0.07666 0.10342 0.07927 0.05174 0.05031
15 0.28328 0.23982 0.21548 0.13414 0.12020 0.20092 0.15828 0.13758 0.07975 0.06556
16 0.69864 0.60886 0.56620 0.32850 0.26260 0.61953 0.51160 0.47313 0.25704 0.18883
17 0.28986 0.29036 0.27554 0.11381 0.10897 0.19623 0.19331 0.18472 0.06078 0.05647
18 0.40740 0.39444 0.37317 0.13461 0.12268 0.30562 0.28362 0.27218 0.07802 0.06664
19 0.55835 0.47602 0.42058 0.27110 0.22499 0.46945 0.36690 0.32560 0.20421 0.15603
20 0.11311 0.14384 0.11934 0.09941 0.09869 0.05773 0.08832 0.06338 0.05016 0.04953
21 0.17575 0.19212 0.16926 0.10483 0.10222 0.10522 0.12260 0.10054 0.05452 0.05210
22 0.79492 0.70105 0.67852 0.37994 0.29782 0.72959 0.62078 0.59503 0.30652 0.21959
23 0.26916 0.26786 0.25884 0.10682 0.10408 0.17633 0.17449 0.16974 0.05611 0.05310
24 0.59787 0.49216 0.46968 0.23909 0.18454 0.50840 0.38396 0.36923 0.17237 0.11847
25 0.86092 0.76694 0.76334 0.42547 0.33020 0.80912 0.70166 0.69209 0.34902 0.24792
26 0.51287 0.49757 0.48655 0.13395 0.12176 0.40291 0.38241 0.37653 0.07531 0.06508
27 0.65010 0.61974 0.60888 0.17696 0.15089 0.55128 0.51589 0.50570 0.11144 0.08636

Table 8.4: Monte Carlo power estimates for Lognormal distribution at 10% and
5% levels of significance

Lognormal(0.5)
         10%                                                  5%
c.s.  T        App(L)   App(W)   A^2      WE       T        App(L)   App(W)   A^2      WE
1 0.97317 0.99822 0.99953 0.57405 0.45279 0.93250 0.99138 0.99521 0.45287 0.32599
2 0.92186 0.95576 0.95264 0.36626 0.30817 0.82028 0.87119 0.86690 0.26666 0.20415
3 0.95509 0.98294 0.98435 0.44054 0.35978 0.88607 0.93814 0.93878 0.32760 0.24556
4 0.98369 0.99916 0.99951 0.61419 0.40693 0.95757 0.99580 0.99662 0.48638 0.28805
5 0.97790 0.99171 0.99046 0.49165 0.34131 0.93753 0.96818 0.96493 0.37131 0.23011
6 0.98599 0.99917 0.99931 0.53921 0.37542 0.95939 0.99506 0.99516 0.41481 0.25898
7 0.98804 0.99948 0.99947 0.62071 0.37729 0.96865 0.99725 0.99697 0.49201 0.26345
8 0.99050 0.99799 0.99747 0.57392 0.35703 0.97141 0.99078 0.98897 0.44477 0.24460
9 0.99008 0.99964 0.99960 0.60367 0.37816 0.97302 0.99787 0.99752 0.47486 0.26274
10 0.99883 0.99999 1.00000 0.91263 0.73497 0.99648 0.99993 1.00000 0.84945 0.61977
11 0.98880 0.99641 0.99603 0.58133 0.43831 0.96171 0.98239 0.98083 0.45922 0.31393
12 0.99844 0.99971 0.99988 0.70643 0.54012 0.99226 0.99791 0.99862 0.58682 0.40452
13 0.99968 1.00000 1.00000 0.93916 0.63184 0.99916 0.99999 1.00000 0.88579 0.50824
14 0.99983 0.99998 0.99998 0.83711 0.52330 0.99918 0.99985 0.99980 0.73616 0.39745
15 0.99995 1.00000 1.00000 0.88454 0.58418 0.99977 0.99999 1.00000 0.80170 0.45680
16 0.99987 1.00000 1.00000 0.94584 0.59018 0.99970 1.00000 1.00000 0.89330 0.46627
17 0.99999 1.00000 1.00000 0.92174 0.56214 0.99993 1.00000 0.99999 0.85183 0.43561
18 0.99999 1.00000 1.00000 0.93584 0.57814 0.99993 1.00000 1.00000 0.87549 0.45259
19 0.99998 1.00000 1.00000 0.99074 0.80103 0.99993 1.00000 1.00000 0.97817 0.70248
20 0.99997 1.00000 1.00000 0.90574 0.60691 0.99986 0.99998 0.99998 0.83300 0.48040
21 1.00000 1.00000 1.00000 0.95619 0.70913 1.00000 1.00000 1.00000 0.91205 0.58757
22 0.99999 1.00000 1.00000 0.99518 0.73613 0.99999 1.00000 1.00000 0.98564 0.62309
23 1.00000 1.00000 1.00000 0.98858 0.69547 1.00000 1.00000 1.00000 0.96991 0.57676
24 1.00000 1.00000 1.00000 0.99267 0.73544 0.99999 1.00000 1.00000 0.97930 0.62046
25 1.00000 1.00000 1.00000 0.99540 0.71991 1.00000 1.00000 1.00000 0.98692 0.60573
26 1.00000 1.00000 1.00000 0.99435 0.71123 1.00000 1.00000 1.00000 0.98352 0.59598
27 1.00000 1.00000 1.00000 0.99510 0.71781 1.00000 1.00000 1.00000 0.98578 0.60334

Lognormal(1.0)
         10%                                                  5%
c.s.  T        App(L)   App(W)   A^2      WE       T        App(L)   App(W)   A^2      WE
1 0.20500 0.29174 0.27413 0.11028 0.10874 0.12145 0.21294 0.18012 0.05800 0.05538
2 0.24330 0.27591 0.26382 0.10210 0.10222 0.13892 0.18235 0.16012 0.05253 0.05076
3 0.23727 0.28077 0.26728 0.10238 0.10244 0.13645 0.18965 0.16506 0.05240 0.05095
4 0.18549 0.24289 0.22190 0.11502 0.10925 0.11137 0.17729 0.14109 0.06320 0.05743
5 0.21975 0.25508 0.24096 0.09901 0.09883 0.12839 0.17135 0.14707 0.05157 0.04913
6 0.20836 0.27502 0.26113 0.11233 0.10829 0.12532 0.19362 0.16736 0.06123 0.05592
7 0.18059 0.20926 0.18788 0.12156 0.11289 0.11081 0.15041 0.11520 0.06750 0.05970
8 0.16825 0.21182 0.19219 0.10035 0.10005 0.09605 0.14480 0.11486 0.05062 0.04966
9 0.18455 0.21834 0.19881 0.12050 0.11267 0.11222 0.15713 0.12356 0.06690 0.05911
10 0.29002 0.41242 0.40955 0.13184 0.12372 0.19056 0.30891 0.29732 0.07706 0.06719
11 0.40977 0.43709 0.43629 0.11688 0.11248 0.27017 0.30055 0.29513 0.06533 0.05891
12 0.42399 0.46854 0.46800 0.11977 0.11430 0.28509 0.32841 0.32411 0.06761 0.05954
13 0.21872 0.28042 0.27187 0.14004 0.12703 0.13898 0.20047 0.18414 0.08305 0.06956
14 0.40377 0.42535 0.42453 0.11755 0.10848 0.27285 0.29961 0.29391 0.06296 0.05608
15 0.33082 0.40361 0.40213 0.11992 0.11126 0.21963 0.28892 0.28084 0.06599 0.05802
16 0.20384 0.21442 0.20436 0.15210 0.13327 0.13046 0.15171 0.12945 0.09250 0.07404
17 0.26370 0.29442 0.28827 0.11259 0.10725 0.16711 0.20260 0.18821 0.06001 0.05530
18 0.20548 0.24718 0.23741 0.11441 0.10902 0.12479 0.17094 0.15149 0.06116 0.05656
19 0.27206 0.37697 0.37584 0.16762 0.14182 0.18277 0.28612 0.27495 0.10273 0.08003
20 0.59741 0.61445 0.61335 0.14869 0.12681 0.45061 0.46773 0.46762 0.08561 0.06788
21 0.54944 0.59360 0.59258 0.14769 0.12547 0.41023 0.45363 0.45334 0.08552 0.06683
22 0.21695 0.22832 0.22276 0.18074 0.14920 0.14186 0.16210 0.14505 0.11487 0.08614
23 0.41991 0.44414 0.44395 0.13418 0.11751 0.29660 0.32185 0.31788 0.07462 0.06233
24 0.27574 0.35465 0.35287 0.15682 0.13262 0.18403 0.25781 0.24838 0.09450 0.07436
25 0.22026 0.19908 0.19353 0.18767 0.15237 0.14658 0.13547 0.12114 0.12045 0.08813
26 0.25109 0.28162 0.27666 0.12797 0.11442 0.16094 0.19489 0.18231 0.06991 0.05983
27 0.19203 0.22635 0.21912 0.13294 0.11846 0.11469 0.15524 0.13924 0.07404 0.06270

Table 8.5: Monte Carlo power estimates for Gamma distribution at 10% and
5% levels of significance

Gamma(0.75)
         10%                                                  5%
c.s.  T        App(L)   App(W)   A^2      WE       T        App(L)   App(W)   A^2      WE
1 0.18415 0.20186 0.14326 0.13881 0.13226 0.10768 0.14541 0.08019 0.08196 0.07277
2 0.17285 0.24856 0.18925 0.13105 0.12493 0.09870 0.18408 0.11582 0.07548 0.06725
3 0.17698 0.24464 0.18837 0.13428 0.12747 0.10133 0.17894 0.11471 0.07845 0.06911
4 0.20126 0.18905 0.15029 0.13406 0.12826 0.12070 0.12841 0.08523 0.07908 0.07091
5 0.19227 0.23571 0.20012 0.12946 0.12390 0.11468 0.16427 0.12364 0.07521 0.06779
6 0.22455 0.19843 0.15971 0.13788 0.13169 0.14011 0.13668 0.09233 0.08220 0.07366
7 0.21915 0.18877 0.16068 0.13048 0.12508 0.13728 0.12449 0.09287 0.07550 0.06775
8 0.21378 0.22985 0.20544 0.12798 0.12310 0.13250 0.15524 0.12734 0.07413 0.06615
9 0.23260 0.19744 0.17014 0.13328 0.12758 0.14705 0.13109 0.10003 0.07760 0.06965
10 0.20925 0.20162 0.15722 0.14563 0.13917 0.12582 0.13937 0.08970 0.08834 0.07907
11 0.19293 0.25732 0.21050 0.13239 0.12763 0.11360 0.18612 0.13223 0.07732 0.07132
12 0.21813 0.27227 0.23039 0.14202 0.13576 0.13313 0.19523 0.14796 0.08415 0.07732
13 0.25790 0.20852 0.18919 0.13505 0.12917 0.16655 0.13580 0.11353 0.07895 0.07154
14 0.24734 0.27371 0.25698 0.12966 0.12466 0.15974 0.18714 0.16810 0.07488 0.06859
15 0.28922 0.26257 0.24672 0.13907 0.13300 0.19353 0.17766 0.15884 0.08176 0.07444
16 0.30315 0.23667 0.22608 0.13627 0.13220 0.20500 0.15465 0.14195 0.07859 0.07327
17 0.29555 0.30082 0.29251 0.13444 0.13005 0.19685 0.20602 0.19660 0.07629 0.07161
18 0.29813 0.29032 0.28190 0.13498 0.13071 0.19872 0.19721 0.18755 0.07702 0.07198
19 0.26838 0.21659 0.19826 0.14554 0.13880 0.17535 0.14130 0.12004 0.08561 0.07798
20 0.25713 0.28799 0.27132 0.13654 0.13307 0.16713 0.19870 0.17981 0.07880 0.07305
21 0.30867 0.32241 0.30814 0.14807 0.14217 0.20850 0.22575 0.21093 0.08878 0.08112
22 0.35995 0.28396 0.27819 0.13843 0.13502 0.25210 0.19031 0.18315 0.08067 0.07448
23 0.35395 0.36277 0.35837 0.13581 0.13221 0.24655 0.25689 0.25220 0.07848 0.07290
24 0.39902 0.33282 0.32834 0.14345 0.13933 0.29018 0.23045 0.22488 0.08515 0.07792
25 0.40398 0.32213 0.31855 0.13907 0.13471 0.29295 0.22116 0.21665 0.07973 0.07439
26 0.39963 0.39085 0.38809 0.13739 0.13324 0.29200 0.28087 0.27803 0.07866 0.07312
27 0.40108 0.37789 0.37505 0.13829 0.13378 0.29301 0.26930 0.26625 0.07910 0.07378

Gamma(2.0)
10% 5%
c.s.  T        App(L)   App(W)   A^2      WE       T        App(L)   App(W)   A^2      WE
1 0.46065 0.62475 0.62450 0.12548 0.12036 0.31030 0.46339 0.46149 0.07321 0.06460
2 0.31230 0.35326 0.35024 0.10908 0.10780 0.18855 0.23024 0.22497 0.05885 0.05508
3 0.35993 0.42015 0.41776 0.11336 0.11036 0.22623 0.27977 0.27830 0.06245 0.05763
4 0.51783 0.67413 0.67629 0.13134 0.11943 0.37470 0.52369 0.52329 0.07698 0.06509
5 0.42380 0.46831 0.46754 0.11795 0.11029 0.28395 0.32517 0.32399 0.06544 0.05888
6 0.50328 0.64337 0.64437 0.12347 0.11446 0.35268 0.47912 0.47894 0.07051 0.06077
7 0.57175 0.71129 0.71348 0.13545 0.12040 0.42260 0.56653 0.56661 0.07713 0.06545
8 0.52100 0.58340 0.58317 0.12623 0.11473 0.37236 0.43239 0.43223 0.07143 0.06223
9 0.58871 0.72468 0.72717 0.13433 0.11948 0.43735 0.57774 0.57791 0.07667 0.06468
10 0.64455 0.79434 0.81290 0.18295 0.15373 0.50515 0.67882 0.68469 0.11693 0.09002
11 0.40708 0.44438 0.44338 0.12129 0.11380 0.27068 0.30636 0.30449 0.06812 0.06056
12 0.49898 0.56138 0.56073 0.13200 0.12107 0.34671 0.40423 0.40411 0.07695 0.06556
13 0.76435 0.86731 0.87863 0.19476 0.15400 0.64243 0.77634 0.78295 0.12190 0.08749
14 0.64708 0.67861 0.67864 0.15059 0.13037 0.50255 0.53621 0.53617 0.08910 0.07071
15 0.74438 0.81444 0.81902 0.16688 0.14117 0.60845 0.69161 0.69312 0.10085 0.07851
16 0.84495 0.91453 0.92076 0.19910 0.15700 0.74340 0.84214 0.84661 0.12385 0.08902
17 0.79250 0.82753 0.82892 0.17654 0.14476 0.67360 0.71617 0.71654 0.10737 0.08076
18 0.81585 0.85752 0.86045 0.18534 0.14961 0.70285 0.75773 0.75903 0.11441 0.08376
19 0.83133 0.90917 0.92771 0.24927 0.18278 0.73048 0.84521 0.86022 0.16829 0.11168
20 0.67630 0.70367 0.70371 0.16159 0.13807 0.53490 0.56499 0.56494 0.09688 0.07688
21 0.78997 0.83131 0.83647 0.18907 0.15461 0.66470 0.71712 0.71917 0.11754 0.08968
22 0.93598 0.96845 0.97375 0.25943 0.18517 0.87880 0.93434 0.94018 0.17081 0.11083
23 0.90123 0.91524 0.91700 0.21660 0.16524 0.82293 0.84274 0.84371 0.13600 0.09562
24 0.94513 0.96927 0.97321 0.24655 0.18208 0.89032 0.93273 0.93696 0.15958 0.10905
25 0.96211 0.98182 0.98469 0.27168 0.19690 0.92165 0.95854 0.96208 0.18012 0.11732
26 0.94769 0.95990 0.96196 0.24890 0.18512 0.89636 0.91673 0.91854 0.16102 0.10871
27 0.95310 0.96790 0.97045 0.25895 0.19005 0.90658 0.93157 0.93412 0.16873 0.11246

Table 8.6: Monte Carlo null probabilities of T for exponential distribution at
levels 2.5 (2.5) 50%

m       8       10      12      16      20      30      40      50      60
2.5%    0.02253 0.02328 0.02371 0.02301 0.02429 0.02413 0.02476 0.02507 0.02462
5%      0.04890 0.04808 0.04835 0.04923 0.04911 0.04898 0.05009 0.04965 0.04871
7.5%    0.07512 0.07483 0.07351 0.07483 0.07470 0.07397 0.07582 0.07520 0.07412
10%     0.10062 0.10068 0.09847 0.09988 0.09974 0.10001 0.10157 0.10074 0.09971
12.5%   0.12675 0.12631 0.12446 0.12603 0.12449 0.12483 0.12693 0.12671 0.12484
15%     0.15306 0.15266 0.14985 0.15147 0.15003 0.15098 0.15134 0.15170 0.14946
17.5%   0.17857 0.17859 0.17529 0.17705 0.17558 0.17612 0.17663 0.17680 0.17457
20%     0.20468 0.20510 0.20045 0.20283 0.20115 0.20173 0.20129 0.20140 0.19986
22.5%   0.23018 0.23117 0.22611 0.22773 0.22700 0.22683 0.22697 0.22716 0.22475
25%     0.25656 0.25657 0.25116 0.25349 0.25133 0.25243 0.25266 0.25172 0.25019
27.5%   0.28182 0.28228 0.27619 0.27841 0.27594 0.27754 0.27726 0.27639 0.27496
30%     0.30769 0.30775 0.30175 0.30400 0.30082 0.30235 0.30175 0.30168 0.30020
32.5%   0.33301 0.33277 0.32711 0.32783 0.32573 0.32753 0.32655 0.32668 0.32499
35%     0.35814 0.35827 0.35161 0.35277 0.35141 0.35271 0.35235 0.35113 0.35056
37.5%   0.38273 0.38346 0.37726 0.37720 0.37649 0.37799 0.37732 0.37617 0.37520
40%     0.40781 0.40829 0.40274 0.40147 0.40152 0.40335 0.40204 0.40129 0.40072
42.5%   0.43366 0.43375 0.42809 0.42668 0.42666 0.42830 0.42760 0.42633 0.42546
45%     0.45883 0.45921 0.45420 0.45230 0.45167 0.45283 0.45339 0.45039 0.44989
47.5%   0.48439 0.48401 0.47919 0.47614 0.47604 0.47765 0.47838 0.47578 0.47557
50%     0.50943 0.50819 0.50365 0.50109 0.50159 0.50268 0.50383 0.50129 0.50035

Table 8.7: Simulated and approximate values of the power of T* at 10% and
5% levels of significance

                         Lognormal(1.0)                      Lomax(2.0)


                         10%              5%                 10%              5%
k  n1  m1  c.s.          Sim.    App.     Sim.    App.       Sim.    App.     Sim.    App.
2 40 20 0.44296 0.51678 0.35086 0.39493 0.49618 0.37621 0.39880 0.27749
~~~gI~! 0.42930 0.49896 0.33882 0.38034 0.59646 0.43641 0.50350 0.33563
[13][15]
[14][14] 0.65778 0.68772 0.52172 0.55493 0.16830 0.17085 0.09696 0.10069
60 40 0.25098 0.38508 0.16614 0.28199 0.81572 0.78496 0.76808 0.71048
~~~H~:j 0.52032 0.61590 0.43032 0.49254 0.64546 0.53579 0.54524 0.42705
[23][23] 0.66720 0.69734 0.54014 0.57330 0.39242 0.38314 0.28358 0.27471
3 40 20 0.53794 0.70703 0.44136 0.59081 0.57268 0.42501 0.47108 0.32097
l~~W~H~~!
[14][14][15] 0.75916 0.82885 0.64342 0.72754 0.29140 0.24573 0.20196 0.15959
[14][14][14] 0.81718 0.84408 0.71218 0.74542 0.19724 0.19967 0.12016 0.12231
60 40 0.44540 0.60647 U.iloIlU U.78!S:o!U U.!SlUU!S
f~~H~~H~:! 0.65518 0.79441 0.56334 ~::~~:~ 0.70282 0.61208 ~:~~~~~ ~:~~~~~
[23][23][23] 0.82042 0.84665 0.72174 0.75498 0.50552 0.49186 0.38716 0.37461

References

1. Balakrishnan, N. (1983). Empirical power study of a multi-sample test of
exponentiality based on spacings, Journal of Statistical Computation and
Simulation, 18, 265-271.

2. Balakrishnan, N. and Aggarwala, R. (2000). Progressive Censoring: The-
ory, Methods, and Applications, Boston: Birkhäuser.

3. Balakrishnan, N. and Basu, A. P. (1995) (Eds.). The Exponential Dis-
tribution: Theory, Methods and Applications, Langhorne, Pennsylvania:
Gordon and Breach.

4. Balakrishnan, N. and Rao, C. R. (1997). Large-sample approximations
to the best linear unbiased estimation and best linear unbiased predic-
tion based on progressively censored samples and some applications, In
Advances in Statistical Decision Theory and Applications (Eds., S. Pan-
chapakesan and N. Balakrishnan), pp. 431-444, Boston: Birkhäuser.

5. Cohen, A. C. (1963). Progressively censored samples in life testing, Tech-
nometrics, 5, 327-329.

6. Cohen, A. C. (1966). Life testing and early failure, Technometrics, 8,
539-549.

7. D'Agostino, R. B. and Stephens, M. A. (1986) (Eds.). Goodness-of-Fit
Techniques, New York: Marcel Dekker.

8. David, H. A. (1981). Order Statistics, Second Edition, New York: John
Wiley & Sons.

9. Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994). Continuous Uni-
variate Distributions, Volume 1, Second Edition, New York: John Wiley
& Sons.

10. Kendall, M. G. and Stuart, A. (1969). The Advanced Theory of Statistics,
Volume 1, London: Charles Griffin.

11. Nelson, W. (1982). Applied Life Data Analysis, New York: John Wiley
& Sons.

12. Shapiro, S. S. (1995). Goodness-of-fit tests, In The Exponential Distribu-
tion: Theory, Methods and Applications (Eds., N. Balakrishnan and A. P.
Basu), Chapter 13, Langhorne, Pennsylvania: Gordon and Breach.

13. Shapiro, S. S. and Wilk, M. B. (1972). An analysis of variance test for
the exponential distribution, Technometrics, 14, 355-370.

14. Spinelli, J. J. and Stephens, M. A. (1987). Tests for exponentiality when
origin and scale parameters are unknown, Technometrics, 29, 471-476.

15. Stephens, M. A. (1986). Tests for the exponential distribution, In
Goodness-of-Fit Techniques (Eds., R. B. D'Agostino and M. A. Stephens),
Chapter 10, New York: Marcel Dekker.

16. Tiku, M. L. (1980). Goodness of fit statistics based on the spacings of
complete or censored samples, Australian Journal of Statistics, 22, 260-275.

17. Viveros, R. and Balakrishnan, N. (1994). Interval estimation of parame-
ters of life from progressively censored data, Technometrics, 36, 84-91.
9
Goodness-of-Fit Statistics for the Exponential
Distribution When the Data are Grouped

Sneh Gulati and Jordan Neus


Florida International University, Miami, Florida
State University of New York at Stony Brook, Stony Brook, New York

Abstract: In many industrial and biological experiments, the recorded data
consist of the number of observations falling in an interval. In this paper, we
develop two test statistics to test whether the grouped observations come from
an exponential distribution. Following the procedure of Damianou and Kemp
(1990), Kolmogorov-Smirnov type statistics are developed with the maximum
likelihood estimator of the scale parameter substituted for the true unknown
scale. The asymptotic theory for both the statistics is studied and power studies
carried out via simulations.

Keywords and phrases: Empirical distribution function, exponential distri-
bution, grouped data, Kolmogorov-Smirnov, goodness-of-fit, parametric bootstrap

9.1 Introduction
In a number of life testing experiments, it is impossible to monitor units con-
tinuously; instead one inspects the units intermittently or at prespecified times.
Thus the data consist of the number of failures or deaths in an interval. For
example, when testing a large number of inexpensive units for time to failure,
it may be cost prohibitive to connect each one to a monitoring device. Thus
an inspector may inspect them at predetermined time intervals and record the
number of units that failed since the last inspection. Similarly, in cancer
follow-up studies where the variable of interest is time to relapse, a patient may
be monitored only at regular intervals or may seek help only after tangible
symptoms of the disease appear. Thus the time to relapse cannot be specified
exactly, but will only be known to lie between two successive clinic visits [see
Yu et al. (2000) for details]. Grouped data also arise when it is not possible to
measure units precisely due to the finite precision of the measuring instrument.
As a result, one can only record the interval in which a measurement falls. See
Steiner et al. (1994) for some excellent examples of how grouped data can arise
naturally in industry.
The first test to assess the goodness-of-fit of any model was developed by
Karl Pearson and is the well-known chi-square test. The chi-square test is also
the first test developed for grouped data since the test discretises any given
data set and compares the observed cell counts to the expected cell counts.
Next came the empirical distribution function (EDF) tests, the Kolmogorov-
Smirnov (KS) tests and the Cramer-von Mises statistics. Originally devel-
oped for complete data sets, they have also been extensively studied for testing
goodness-of-fit for discrete and grouped data sets. The use of the KS test sta-
tistic for goodness-of-fit tests was studied by Schmid (1958), Conover (1972), Pettitt
and Stephens (1977), and Wood and Altavela (1978), among others. While Schmid
(1958) studied the asymptotic distribution of the KS statistic for grouped data,
Conover (1972) derived the exact null distribution of the test statistic, as did
Pettitt and Stephens (1977). A detailed study of the use of Cramer-von Mises
statistics for studying goodness-of-fit of discrete data was done by Choulakian
et al. (1994). They derived the asymptotic distribution of the W², U² and
A² statistics for a discrete distribution and showed that asymptotically all three
test statistics can be written as a sum of independent non-central chi-square
variates.
It is well known that the KS statistic is based on the maximum distance
between the EDF and the hypothesized cumulative distribution function (CDF),
while the Cramer-von Mises statistics are functions of the distance between the
empirical CDF and the true CDF at all the observed data values (for continuous
data the difference is measured at all data points, while for grouped data, the
distance is measured at all the end points of the groups). Hence, the Cramer-
von Mises statistics are, in general, more powerful than the KS statistics. As a result, a
subclass of KS statistics which utilize the distance between the EDF and the
hypothesized CDF at all data values or at certain quantiles have been proposed,
among others, by Riedwyl (1967), Maag et al. (1973), Green and Hegazy
(1976) and, more recently, Damianou and Kemp (1990). These test statistics are
more powerful than the KS statistic and Watson's U-statistic, and are comparable
to the Cramer-von Mises statistics.
Most goodness-of-fit tests developed for grouped data so far have been for
a completely specified null distribution, i.e., a simple null hypothesis. The pur-
pose of this paper is to develop statistics to test whether the given grouped
data come from an exponential distribution with an unknown mean. We use
the methodology of Damianou and Kemp (1990) to develop the test statis-
tics. We develop the test statistics in Section 9.2. The asymptotic distribution
of the statistics is studied in Section 9.3, and finally, in Section 9.4, we study the
power of the test statistics via simulations. An example to show the practical
applications of the test is also presented in Section 9.4.

9.2 The Model and the Test Statistics


Suppose that n independent observations are made on X, a lifetime with density
f(x). We assume that the units are observed at times x_1, x_2, ..., x_{k-1}, leading
to observations in k groups (0, x_1), (x_1, x_2), ..., (x_{k-1}, ∞). The recorded data
then consist of n_1, n_2, ..., n_k, where n_i (1 ≤ i ≤ k) is the number of observa-
tions falling in the ith interval. The purpose of this paper is to use the above
grouped data to develop test statistics to test the null hypothesis

H_0 : f(x) = θ e^{-xθ},   θ unknown.                                        (9.1)

Since θ is unknown, our first step in testing the hypothesis involves the
estimation of θ. From Kulldorff (1961), the maximum likelihood estimator
(MLE) θ̂ of θ exists if and only if n_1 < n and n_k < n, and is obtained by
solving the following equation:

\sum_{i=1}^{k-1} \frac{n_i (x_i - x_{i-1})}{e^{\theta(x_i - x_{i-1})} - 1} - \sum_{i=2}^{k} n_i x_{i-1} = 0.                    (9.2)

While the above equation can be solved easily by using iterative methods,
note that if all the intervals are of the same length, then (9.2) has a closed form
solution given as follows:

\hat{\theta} = \frac{1}{x_1} \ln\left(1 + \frac{n - n_k}{\sum_{i=1}^{k} (i-1) n_i}\right).                    (9.3)

Let π_i = e^{-θx_{i-1}} - e^{-θx_i}, 1 ≤ i ≤ k, be the true probability, under the null,
of observing a value in the ith interval. Kulldorff (1961) has shown that θ̂ is
consistent and asymptotically sufficient with asymptotic variance σ²_{θ̂} given by
(n Σ_i (1/π_i)(dπ_i/dθ)²)^{-1}. From Nelson (1977), we also have that under the null hy-
pothesis, θ̂ is asymptotically normally distributed with mean θ and asymptotic
variance σ²_{θ̂}.
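For illustration, the following minimal Python sketch (ours, not the authors';
the function name is hypothetical) solves the likelihood equation (9.2) for θ̂ by
bisection; for equal-width intervals its root agrees with the closed form (9.3):

    import math

    def mle_grouped_exponential(counts, cuts):
        # Solve the likelihood equation (9.2) for theta by bisection.
        # counts = [n_1, ..., n_k], with the k-th interval (x_{k-1}, infinity);
        # cuts = [x_1, ..., x_{k-1}] are the inspection times (x_0 = 0).
        # A sketch only; assumes n_1 < n and n_k < n so that the MLE exists.
        x = [0.0] + list(cuts)
        k = len(counts)

        def g(theta):
            lhs = sum(counts[i] * (x[i + 1] - x[i]) /
                      (math.exp(theta * (x[i + 1] - x[i])) - 1.0)
                      for i in range(k - 1))
            rhs = sum(counts[i] * x[i] for i in range(1, k))
            return lhs - rhs

        lo, hi = 1e-8, 1.0
        while g(hi) > 0:          # g decreases in theta; expand the bracket
            hi *= 2.0
        for _ in range(200):      # bisection
            mid = 0.5 * (lo + hi)
            if g(mid) > 0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)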
Now in order to develop the test statistic, define the following quantities at
the inspection times x_i:

F(x_i, θ̂) = 1 - e^{-x_i θ̂}, the estimated null CDF,

F_n(x_i) = Σ_{j=1}^{i} n_j/n, the empirical CDF, and

S_i = F_n(x_i) - F(x_i, θ̂), the difference between the two estimates.


As in Damianou and Kemp (1990), we propose the following two test statis-
tics to test the hypothesis (9.1) against the alternate hypothesis that the data
do not come from an exponential distribution:

SW1 = \sqrt{n} \sum_{j=1}^{k-1} \frac{|S_j|}{\Psi_1(j)},                    (9.4)

where Ψ_1(j) = \sqrt{F(x_j, θ̂)(1 - F(x_j, θ̂))}, and

SW2 = \sqrt{n} \sum_{j=1}^{k-1} \Psi_2(j) |S_j|,                    (9.5)

where Ψ_2(j) = (k/2 - j)²_+, with x_+ = 1 for x = 0 and x_+ = x for x ≠ 0.

Note that the weight function Ψ_1 is the same weight function as in the
Anderson-Darling test and gives more weight to the tails of the distribution,
while the weight function Ψ_2 gives more weight to the center of the distri-
bution.

To test the null hypothesis (9.1) against a one-sided alternate hypothesis, one
can also define the following one-sided test statistics:

SW1^* = \sqrt{n} \sum_{j=1}^{k-1} \frac{S_j}{\Psi_1(j)},                    (9.6)

SW2^* = \sqrt{n} \sum_{j=1}^{k-1} \Psi_2(j) S_j,                    (9.7)

where the weights Ψ_1 and Ψ_2 are defined as in the two-sided statistics.
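For concreteness, a small Python sketch of the two-sided statistics (again ours,
with hypothetical names; it assumes θ̂ has already been computed, e.g. by the
bisection sketch above):

    import math

    def sw_statistics(counts, cuts, theta_hat):
        # Compute SW1 of (9.4) and SW2 of (9.5) from grouped counts.
        # A sketch under the definitions above; not the authors' FORTRAN code.
        n = sum(counts)
        k = len(counts)
        sw1 = sw2 = 0.0
        cum = 0
        for j in range(1, k):                              # j = 1, ..., k-1
            cum += counts[j - 1]
            F = 1.0 - math.exp(-cuts[j - 1] * theta_hat)   # F(x_j, theta_hat)
            S = cum / n - F                                # S_j
            psi1 = math.sqrt(F * (1.0 - F))                # Psi_1(j)
            w = (k / 2.0 - j) ** 2
            psi2 = w if w != 0 else 1.0                    # the x_+ convention
            sw1 += abs(S) / psi1
            sw2 += psi2 * abs(S)
        return math.sqrt(n) * sw1, math.sqrt(n) * sw2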
As is obvious, the small-sample distribution of the test statistics will depend
on the true value of θ as well as the cut points x_1, x_2, ..., x_{k-1}; hence the p-
value of the test statistics is calculated based on their asymptotic distribution.
This is not overly restrictive, since the test statistics approach their asymptotic
distribution very quickly. Based on our simulation results and existing theory
on the distribution of count data, we recommend that the methods presented
in this paper can be used for n as small as 15.
Next we discuss the asymptotic theory of the test statistics.

9.3 Asymptotic Distribution


To study the asymptotic distribution of the test statistics, we define the follow-
ing quantities:

1. p̂_i = n_i/n, 1 ≤ i ≤ k, the point estimate of the probability that an observa-
tion lies in the ith interval.

2. π̂_i = e^{-θ̂x_{i-1}} - e^{-θ̂x_i}, 1 ≤ i ≤ k, the estimated probability (under the null
hypothesis) that an observation falls in the ith interval.

3. p_i, 1 ≤ i ≤ k, the true probability of an observation falling in the ith
interval.

4. π_i = e^{-θx_{i-1}} - e^{-θx_i}, 1 ≤ i ≤ k, the true probability, under the null, of
observing a value in the ith interval.

Now from Bishop, Fienberg, and Holland (1975), we have that as n → ∞,
p̂_i → p_i almost surely (w.p. 1) and π̂_i → p_i = π_i w.p. 1 if the null is true.
As the next step, we define the following vectors and matrices based on the
p_i's: the vector p̄ = (p_1, p_2, ..., p_k)' and the diagonal matrix

D_p = diag(p_1, p_2, \ldots, p_k).

Now if we let

\sigma^2 = (d\bar{\pi}/d\theta)' D_p^{-1} (d\bar{\pi}/d\theta) = \sum_{i} (d\pi_i/d\theta)^2 / p_i

and

L = \frac{D_p^{-1} (d\bar{\pi}/d\theta)(d\bar{\pi}/d\theta)'}{\sigma^2},
then, again from Bishop, Fienberg, and Holland (1975), we have the following:

Theorem 9.3.1 Under the null hypothesis (9.1) and the regularity conditions
defined in Chapter 14 of Bishop, Fienberg, and Holland (1975), the k × 1 vector
Ŵ defined as

\hat{W} = \left(\sqrt{n}(\hat{p}_1 - \hat{\pi}_1), \sqrt{n}(\hat{p}_2 - \hat{\pi}_2), \ldots, \sqrt{n}(\hat{p}_k - \hat{\pi}_k)\right)'

converges in distribution to a multivariate normal random vector W with mean
0 and variance-covariance matrix Σ = (D_p - p̄p̄')(I - L).

Since the one-sided test statistics defined in (9.6) and (9.7) are linear com-
binations of the vector Ŵ, Theorem 9.3.1 then immediately gives us:

Theorem 9.3.2 Assume that the null hypothesis defined in (9.1) is true and
the aforementioned regularity conditions are satisfied. Then as n → ∞, SW1^*
and SW2^* converge in distribution to normal random variables with mean 0 and
variances given by σ₁² and σ₂² respectively, where σ₁² and σ₂² are scalar functions
of the matrix Σ.

OUTLINE OF PROOF. To prove the theorem, we use techniques similar to
those used by Choulakian et al. (1994) and define the following vectors and
matrices:

Let B be the 1 × k row vector (1, 1, ..., 1), and let C be the k × k lower
triangular matrix of ones, with (i, j)th entry 1 for j ≤ i and 0 otherwise.

We also define the k × k matrices of the weight functions as follows: Q_{Ψ1} is
the k × k diagonal matrix with its kth diagonal entry 0 and, for 1 ≤ j ≤ k - 1,
its jth diagonal entry

\frac{1}{\Psi_1(j)} = \frac{1}{\sqrt{F(x_j, \hat{\theta})(1 - F(x_j, \hat{\theta}))}},

and Q_{Ψ2} is defined similarly for the weight function Ψ_2. That is, Q_{Ψ2} is the
k × k diagonal matrix with its kth diagonal entry 0 and, for 1 ≤ j ≤ k - 1, its
jth diagonal entry Ψ_2(j).
Now note that

SW1^* = \sum_{j=1}^{k-1} \left(\sum_{i=1}^{j} \hat{W}_i\right) \frac{1}{\Psi_1(j)} = (B Q_{\Psi_1} C)\hat{W},                    (9.8)

SW2^* = \sum_{j=1}^{k-1} \left(\sum_{i=1}^{j} \hat{W}_i\right) \Psi_2(j) = (B Q_{\Psi_2} C)\hat{W}.                    (9.9)

From Kulldorff (1961), under the null hypothesis (9.1), θ̂ → θ w.p. 1, so that
F(x_j, θ̂) → F(x_j, θ) w.p. 1, and from Theorem 9.3.1, Ŵ converges in distribution to the mul-
tivariate normal vector W. The theorem now follows from the asymptotic
theory described in Section 14.9 of Bishop, Fienberg, and Holland (1975) and
the properties of the multivariate normal distribution with σ₁² and σ₂² defined
appropriately. •

While a one-tailed test is not commonly used to test hypotheses of the form
(9.1), Theorem 9.3.2 provides the foundation for the distribution theory of the
two-sided test statistics. To test the hypothesis (9.1) against a general omnibus
alternate hypothesis, that is, H_a: the data do not come from an exponential
distribution, we use the test statistics defined in (9.4) and (9.5). Note that we can
write the statistics SW1 and SW2 as follows:

SW1 = \sum_{j=1}^{k-1} \frac{\left|\sum_{i=1}^{j} \hat{W}_i\right|}{\Psi_1(j)}, \qquad SW2 = \sum_{j=1}^{k-1} \Psi_2(j)\left|\sum_{i=1}^{j} \hat{W}_i\right|.

From (9.8) and (9.9), SW1 and SW2 are thus the sums of the absolute values of
the components of the vectors Q_{Ψ1}CŴ and Q_{Ψ2}CŴ, respectively. Thus from
Theorem 9.3.1, SW1 and SW2 converge in distribution to the sum of
the absolute values of the components of a multivariate normal random vector.
Note that while the asymptotic distribution of the test statistics is not known in
closed form, with the proliferation of high-speed computers, it can be simulated
quite easily, enabling us to calculate "bootstrapped p-values" for the test.
Finally, the testing procedure is given as follows. From the given data set,
calculate the test statistic SW1 (or SW2), henceforth referred to as the data
test statistic SW_dat. As mentioned previously, the distribution theory outlined
above allows us to calculate the p-value of the test through the following para-
metric bootstrap technique. Using the estimate θ̂ of θ calculated from the data,
generate 5,000 samples of size n from the density f(x, θ̂) = θ̂ e^{-xθ̂}. Each sam-
ple is then grouped into the intervals defined by (0, x_1), (x_1, x_2), ..., (x_{k-1}, ∞),
and θ̂ and the "bootstrapped" test statistic SW1 (or SW2) are calculated for each.
The p-value of the test is defined to be the proportion of "bootstrapped" test sta-
tistics less than or equal to the data test statistic SW_dat. The test is rejected
for small p-values. A FORTRAN program to calculate the p-value of the test
is available from the authors upon request.
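A sketch of the parametric bootstrap just described (our Python, not the
authors' FORTRAN; it reuses the hypothetical helpers defined earlier):

    import random

    def bootstrap_p_value(counts, cuts, use_sw1=True, B=5000, seed=1):
        # Parametric-bootstrap p-value for SW1 or SW2, per the procedure above.
        # For simplicity this sketch ignores the rare bootstrap samples in
        # which the MLE fails to exist (n_1 = n or n_k = n).
        rng = random.Random(seed)
        n, k = sum(counts), len(counts)
        idx = 0 if use_sw1 else 1
        theta = mle_grouped_exponential(counts, cuts)
        observed = sw_statistics(counts, cuts, theta)[idx]
        hits = 0
        for _ in range(B):
            boot = [0] * k
            for _ in range(n):                    # group an exp(theta) sample
                t = rng.expovariate(theta)
                boot[sum(1 for c in cuts if t > c)] += 1
            th_b = mle_grouped_exponential(boot, cuts)
            if sw_statistics(boot, cuts, th_b)[idx] <= observed:
                hits += 1
        return hits / B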

9.4 Power Studies


We studied the power of the test statistics SW1 and SW2 in testing the null
hypothesis (9.1) with θ equal to 1 against the omnibus alternate hypothesis,
H_a: the data do not come from an exponential distribution. To calculate the
power of the test, we used the parametric bootstrap technique as mentioned
earlier. The steps of the technique are outlined below:

1. Generate a single grouped sample from the alternate.

2. Estimate θ and calculate SW1 and SW2.

3. Generate 5000 "grouped" exponential samples with scale parameter θ̂.
We call these samples bootstrapped samples.

4. For the above samples, calculate the bootstrapped p-value.

5. Repeat the above steps 5000 times.

The power of the test is then given by the proportion of bootstrapped


p-values which are less than or equal to a.
For the alternate distributions, we considered the following:
Weibull (shape = 1.5), Weibull (shape = 1.5, scale = 0.886), Half-Normal (0, 1),
Half-Normal (0, 2), Chi-Square (4), Uniform (0, 2), Uniform (0, 3), Gamma
(shape = 1.5), Chi-Square (1), Weibull (shape = 0.8), and Lognormal (0, 1).
For each alternate distribution mentioned above, we studied the power for
sample sizes n = 25(25)100 and significance levels α = 0.2, 0.1, 0.05, 0.025,
0.01 and 0.005. We used 4, 5, 6, and 10 intervals with varying (but equal)
lengths. In general, we found that the test does very well against most alternate
distributions, except for the Weibull, with shape parameter equal to 0.8, and
the lognormal distribution. As expected, the power increases as the sample size
increases, and depends on the number of intervals and the distance between
them. Also as expected, if the length of the intervals is held constant, the
power increases as the number of intervals increases. When comparing the two
test statistics, it was found that SW2 always did better than the Anderson-
Darling type statistic SW1. Some of these results are presented in Tables 9.1 and 9.2
and in Figure 9.1.
Finally, as an example, we decided to study the performance of the test on
the air traffic data described in Hsu (1979). The data consist of 213 aircraft
arrival times (in a particular sector of the Newark Airport) from noon till 8 PM,
on April 30, 1969. Hsu has shown through various tests that the distribution of
the arrival times is best described by the Poisson distribution and that of the
interarrival times is best described by the exponential distribution. We divided
both the arrival times and the interarrival times into 10 intervals and calculated
the p-value of the test statistics SW1 and SW2 on the two data sets. For the
arrival times, both tests reject exponentiality with a p-value of 0.0 and for the
interarrival times, both tests accept exponentiality with the p-value for SW1
being 0.6788 and that for SW2 equal to 0.7756.

Table 9.1: Power comparisons, n = 50, 5 cutpoints at 0.4, 0.8, 1.2, 1.6, 2.0

Alternate Distribution    α      Power of SW1   Power of SW2
Gamma (1.5) 0.10 0.4428 0.4776
Gamma (1.5) 0.05 0.3324 0.3516
Weibull (1.5) 0.10 0.8490 0.8886
Weibull (1.5) 0.05 0.7616 0.8084
Weibull (2) 0.10 0.9990 0.9994
Weibull (2) 0.05 0.9976 0.9988
Half Normal (0,2) 0.10 0.3046 0.2874
Half Normal (0,2) 0.05 0.2066 0.1882
Log-Normal 0.10 0.2510 0.2884
Log-Normal 0.05 0.1536 0.1810

Table 9.2: Power comparisons, n = 50, 9 cutpoints at 0.25, 0.5, 0.75, 1.0, 1.25,
1.5, 1.75, 2.0, 2.25

Alternate Distribution    α      Power of SW1   Power of SW2
Gamma (1.5) 0.10 0.7476 0.8516
Gamma (1.5) 0.05 0.6418 0.7690
Weibull (1.5) 0.10 0.3250 0.4034
Weibull (1.5) 0.05 0.2228 0.2938
Weibull (2) 0.10 1.0 0.9986
Weibull (2) 0.05 0.9998 0.9962
Half Normal (0,2) 0.10 0.6404 0.6428
Half Normal (0,2) 0.05 0.5156 0.5124
Log-Normal 0.10 0.8912 0.9284
Log-Normal 0.05 0.8140 0.8684

[Figure: power of SW1 plotted against significance level, with curves for the
Chi-Square(1), Chi-Square(4), Weibull(0.8), Gamma(1.5), Weibull(1.5),
Half-Normal(0,2) and Lognormal(0,1) alternatives.]

Figure 9.1: Power comparisons for SW1, k = 6, distance = 0.4

References

1. Bishop, Y. M., Fienberg, S. E., and Holland, P. W. (1975). Discrete
Multivariate Analysis, MIT Press: Cambridge.

2. Choulakian, V., Lockhart, R. A., and Stephens, M. A. (1994). Cramer-
von Mises statistics for discrete distributions, Canadian Journal of Sta-
tistics, 22, 125-137.

3. Conover, W. J. (1972). A Kolmogorov goodness-of-fit test for discontinuous
distributions, Journal of the American Statistical Association, 67, 591-596.

4. Damianou, C. and Kemp, A. W. (1990). New goodness-of-fit statistics for dis-
crete and continuous data, American Journal of Mathematical and Man-
agement Sciences, 10, 275-307.

5. Green, J. R. and Hegazy, Y. A. S. (1976). Powerful modified-EDF good-
ness-of-fit tests, Journal of the American Statistical Association, 71, 204-209.

6. Hsu, D. A. (1979). Detecting shifts of parameter in gamma sequences
with applications to stock prices and air traffic flow analysis, Journal of
the American Statistical Association, 74, 31-40.

7. Kulldorff, G. (1961). Estimation from Grouped and Partially Grouped
Samples, John Wiley & Sons: New York.

8. Maag, U. R., Streit, P., and Drouilly, P. A. (1973). Goodness-of-fit test
for grouped data, Journal of the American Statistical Association, 68,
462-465.

9. Nelson, W. (1977). Optimum demonstration tests with grouped inspec-
tion data from an exponential distribution, IEEE Transactions on Relia-
bility, 23, 226-231.

10. Pettitt, A. N. and Stephens, M. A. (1977). The Kolmogorov-Smirnov
goodness-of-fit statistic with discrete and grouped data, Technometrics,
19, 205-210.

11. Riedwyl, H. (1967). Goodness-of-fit, Journal of the American Statistical
Association, 62, 390-398.

12. Schmid, P. (1958). On the Kolmogorov and Smirnov limit theorems for
discontinuous distribution functions, Annals of Mathematical Statistics,
29, 1011-1027.

13. Steiner, S. H., Geyer, P. L., and Wesolowsky, G. O. (1994). Control charts
based on grouped data, International Journal of Production Research, 32,
75-91.

14. Wood, C. L. and Altavela, M. M. (1978). Large sample results for the
Kolmogorov-Smirnov statistics for discrete distributions, Biometrika, 65,
235-239.

15. Yu, Q., Li, L., and Wong, G. (2000). On consistency of the self-consistent
estimator of survival functions with interval censored data, Scandinavian
Journal of Statistics, 27, 35-44.
10
Characterization Theorems and Goodness-of-Fit
Tests

Carol E. Marchetti and Govind S. Mudholkar


Rochester Institute of Technology, Rochester, New York
University of Rochester, Rochester, New York

Abstract: Karl Pearson's chi-square goodness-of-fit test of 1900 is considered
an epochal contribution to science in general and to statistics in particular.
Regarded as the first objective criterion for agreement between a theory and
reality, and suggested as "beginning the prelude to the modern era in statis-
tics," it stimulated a broadband enquiry into the basics of statistics and led to
numerous concepts and ideas which are now common fare in statistical science.
Over the decades of the twentieth century the goodness-of-fit has become a sub-
stantial field of statistical science of both theoretical and applied importance,
and has led to development of a variety of statistical tools. The characterization
theorems in probability and statistics, the other topic of our focus, are widely
appreciated for their role in clarifying the structure of the families of probability
distributions. The purpose of this paper is twofold. The first is to demonstrate
that characterization theorems can be natural, logical and effective starting
points for constructing goodness-of-fit tests. Towards this end, several entropy
and independence characterizations of the normal and the inverse gaussian (IG)
distributions, which have resulted in goodness-of-fit tests, are used. The second
goal of this paper is to show that the interplay between distributional charac-
terizations and goodness-of-fit assessment continues to be a stimulus for new
discoveries and ideas. The point is illustrated using the new concepts of IG
symmetry, IG skewness and IG kurtosis, which resulted from goodness-of-fit in-
vestigations and have substantially expanded our understanding of the striking
and intriguing analogies between the IG and normal families.

Keywords and phrases: Goodness-of-fit, characterization


10.1 Introduction and Summary


Karl Pearson's classic "The Grammar of Science" (1892), his definition of the
product moment correlation coefficient with Filon (1898), and his chi-square
goodness-of-fit (1900) are major landmarks in the history of scientific and sta-
tistical decision making. The chi-square goodness-of-fit (GOF) test in particular
contributed the first objective criterion for agreement between a theory and re-
ality, and has been considered among the "20 Discoveries That Changed Our
Lives"; see Hacking (1984). Arguably, it became the launching platform for
the statistical science; or in the words of Bingham (2000), it "may be regarded
as beginning the prelude to the modern era in statistics." The chi-square GOF
test led to tabulation of the incomplete gamma function in Pearson's Statistical
Laboratory, the subsequent square root transformation due to Fisher, the cube
root transformation by Wilson and Hilferty for approximating the chi-square
distribution, and numerous developments surrounding these topics. It is the
corner stone of the important field of methods to verify distributional assump-
tions used in statistical modeling and data analysis. Early contributors to this
area include Pearson (1933), Greenwood (1946), Moran (1951), Kolmogorov
(1933), and many others; for references, see D'Agostino and Stephens (1986).
Much of this earlier work was focused on the simple GOF hypothesis, with
possible use of the plug-in approach for large sample testing of the composite
GOF hypotheses. In the early 1960's, when asked by 1. Vincze (1984), A. N.
Kolmogorov named testing the composite GOF hypothesis of normality as the
most important statistical problem of the period. Soon the well-known and
widely used Shapiro-Wilk test of normality (1965) appeared, and was followed
by many alternatives.
Characterization theorems are generally well appreciated for their aesthetic
appeal, mathematical completeness and the light they shed on the structures
of the probability distributions. Although logically self-evident, it is not well
recognized that they can be natural and effective bases for construct-
ing GOF tests, useful for assessing the validity of distributional models such as
normal, exponential, inverse gaussian (IG) etc., commonly used in statistical
analyses of data.
To marshal this point, in Section 10.2 we describe some characterization
theorems based on either maximum entropy properties of probability distri-
butions or independence of sample statistics. Goodness-of-fit tests based on
the maximum entropy properties are outlined in Section 10.3. In Section 10.4,
we describe how some of the independence characterizations have been used
to develop GOF tests for the normal, multivariate normal and IG composite
hypotheses.
The IG distribution, introduced by Schrödinger (1915) and Smoluchovsky
(1915) in the context of Brownian motion, entered statistical applications via
the works of Tweedie (1945) and Wald (1947). The IG distribution was in-
dependently discovered by E. Halphen in the context of hydrology, but for
political reasons, was published under the authorship of Dugue (1941); see Se-
shadri (1999). It is now widely used for modeling and analyzing non-negative,
right-skewed data; see Seshadri (1993, 1999). In a review paper, Folks and
Chhikara (1978) highlighted some remarkable similarities between the gaussian
(G) and the IG distributions; see also Iyengar and Patwardhan (1988) and
Mudholkar and Natarajan (2000). As expressed by Dawid, a discussant of the
Folks and Chhikara review paper, these G-IG analogies "intrigued and baffled"
many and still remain a curiosity. In Section 10.5 we show how development
of the IG goodness-of-fit tests based on maximum entropy and independence
characterization properties led to the notions of IG symmetry, IG skewness and
IG kurtosis, and how they substantially expanded the list of G-IG analogies; see
Mudholkar and Natarajan (1999) and Mudholkar and Tian (2000). This sup-
ports the proposition that, as Pearson's chi-square GOF test did, goodness-of-fit
investigations today continue to generate new statistical notions and knowledge.

10.2 Characterization Theorems


One of the earliest and prettiest of the characterization results is Cramer's
(1936) convolution characterization of the normal distribution. It states that if
X_1 and X_2 are independent random variables such that X_1 + X_2 has a normal
distribution, then X_1 and X_2 are normally distributed. The body of litera-
ture on characterization results for probability distributions now forms a well-
developed area of probability and statistics; see the monographs by Lukacs and
Laha (1964) and Kagan, Linnik, and Rao (1973), the Proceedings of the 1974
Calgary Conference, and, more recently, the monograph by Rao and Shanbhag
(1986). In this section we describe some characterizations of relevance to our
present objective.

10.2.1 Entropy characterizations


The notion of entropy and the maximum entropy principle have their roots
in the 19th and early 20th century physics in the formulation of disorder and
chaos, e.g. by Boltzmann, in thermodynamics and statistical mechanics. It is
closely related to Kullback's (1959) notion of divergence, and Kullback-Leibler
information measures; see Kullback and Leibler (1951), Kullback (1959) and
Csiszar (1975).
The entropy, H(f), of a random variable X with p.d.f. f(x) is given by
H(f) = E[-log f(X)]. The following characterizations of some familiar proba-
128 C. E. Marchetti and G. S. Mudholkar

bility distributions will be used to illustrate a theme of this paper, namely the
role of the characterization results in GOF tests.

1. The Uniform Distribution. This is the simplest, the earliest and the best-
known entropy characterization. Among all random variables taking values in
[0,1], the U(0, 1) variate has maximum entropy.

2. The Normal Distribution. This famous result was introduced by Shan-
non (1949) in the context of information theory, i.e. the mathematical theory
of communication. It states that among all real-valued random variables with
given fixed values of E(X) and E(X²), the normal variate with these expec-
tations attains the maximum value log{σ(2πe)^{1/2}} of the entropy. The result may
alternatively be expressed as: Among all real-valued random variables with
variance equal to σ², the normal variates with variance σ² have the maximum
possible entropy.

3. The Exponential Distribution. Among all nonnegative random variables
with E(X) = μ, the exponential random variable attains the maximum value
(1 + log μ) of the entropy.

4. The Gamma Distribution. Among all nonnegative random variables
with fixed values of E(X) and E(log X), the gamma variate has maximum
entropy.

The above examples are particular cases of a general theorem characterizing
members of Koopman's multi-parameter exponential family; see Kagan, Lin-
nik, and Rao (1973), and Csiszar (1975). Seshadri (1993) gives an extensive
treatment of the IG as an exponential family, which implies the corresponding
IG entropy characterization. However, this characterization is not expedient
for testing goodness-of-fit. Instead, the following characterization due to Mud-
holkar and Tian (2000) can be used to construct a maximum entropy GOF test
for IG models.

5. The Inverse Gaussian Distribution. Let X be a non-negative random
variable, let Y = 1/√X, and let δ² = E(Y²) - 1/E(Y⁻²) have a given fixed value.
Then the entropy of Y attains the maximum possible value log{(2πe)^{1/2} δ/2}
when X is an appropriate IG random variable.

10.2.2 Statistical independence


The independence of the mean and the variance of a normal sample is perhaps
the best known and most commonly used result involving independence of sam-
ple statistics. Actually, this is a characterization of a random sample from a
normal population, which stimulated a substantial body of results characteriz-
ing populations in terms of the independence of statistics of random samples
from them. For an excellent account of such results, see Kagan, Linnik, and
Rao (1973). The following are a few of these results which have been used as
starting points for developing GOF tests for parametric statistical models.

1. Sample Mean and Variance. The mean and variance of a random sample
from a population are independent if and only if the population is normal. This
result is attributed to Geary (1936), Lukacs (1942) and Zinger (1951).

2. Sample Mean and Third Central Moment. If X_1, X_2, ..., X_n is a
random sample from a normal population, then it is a simple exercise involving
characteristic functions to show that the sample mean X̄ and the vector (X_1 -
X̄, X_2 - X̄, ..., X_n - X̄) are independently distributed. Consequently, it follows
that X̄ and the third central moment m_3 = Σ(X_i - X̄)³/n are independently
distributed. Actually, Kagan, Linnik, and Rao (1973), as a corollary of a general
result characterizing the normal distribution in terms of the independence of
linear and polynomial statistics of a random sample, show that the sample
mean and the third central sample moment are independent if and only if the
population is normal.

3. Multivariate Mean and Covariance Matrix. The multivariate ver-
sion of the characterization result 1 above states that for a random sample
X_1, X_2, ..., X_n from a p-variate population, the sample mean X̄ and the sam-
ple covariance matrix S are independent if and only if the population is a
p-variate normal population.

4. Mean and Difference from Harmonic Mean. From the classical arith-
metic, geometric and harmonic mean inequality it follows that E(X⁻¹) - 1/E(X) ≥ 0,
with equality if and only if X is degenerate. Actually, the above difference is a
legitimate scale parameter for the distribution of the reciprocal of the IG(μ, λ)
variate. It is well known that the maximum likelihood estimates μ̂ = X̄ and
1/λ̂ = V = Σ(1/X_i - 1/X̄)/n based on a random sample from the IG popula-
tion are independently distributed. Khatri (1962) has shown that X̄ and V are
independently distributed if and only if the population is inverse gaussian.

5. Mean and Coefficient of Variation. It is well-known that if random
variables X and Y are i.i.d., then X + Y and X/Y are independent if and only
if the common distribution of X and Y is gamma. A recent result by Hwang
and Hu (1999) states that the mean and coefficient of variation of a random
sample are independent if and only if the population is gamma.

These simple characterization results have been extended in a variety of
ways. For an overview account of such extensions, see Kagan, Linnik, and Rao
(1973) and Rao and Shanbhag (1986).

10.3 Maximum Entropy Tests


The earliest explicit use of a characterization theorem for constructing a good-
ness-of-fit test is by Vasicek (1976), who used Shannon's maximum entropy
characterization to construct a test for the composite hypothesis of normality.
However, some earlier tests of uniformity and exponentiality can now be inter-
preted as maximum entropy tests. Vasicek's approach involved constructing a
nonparametric estimate of entropy and rejecting normality for its small values.

1. Vasicek's Test. Let X_1, X_2, ..., X_n be a random sample from a population
with density function f(·). Then Vasicek's estimator of the population entropy
H(f) = E(-log f(X)) is

\hat{H}(f) = H_{mn} = \frac{1}{n}\sum_{i=1}^{n} \log\left\{\frac{n}{2m}\left(X_{(i+m)} - X_{(i-m)}\right)\right\},                    (10.1)

where X_{(1)} ≤ X_{(2)} ≤ ... ≤ X_{(n)} are the order statistics, m is a positive
integer less than n/2, X_{(i)} = X_{(1)} for i < 1 and X_{(i)} = X_{(n)} for i > n.
Vasicek then proposed rejecting the null hypothesis H_0 that the population is
normal if H_{mn} is small, or equivalently if

K_{mn} = \frac{n}{2ms}\left\{\prod_{i=1}^{n}\left(X_{(i+m)} - X_{(i-m)}\right)\right\}^{1/n} \le C_\alpha,                    (10.2)

where s is the sample standard deviation.

Vasicek gave an empirical tabulation of the 5% points of the null distribution
of the K_{mn} test, and presented results of a study of its power properties. He
also argued consistency of the test by showing that, under an alternative, as
n → ∞, K_{mn} converges in probability to σ^{-1} exp{H(f)} < √(2πe).
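A small Python sketch of the statistic (ours; the function name is hypothetical,
ties in the sample are assumed absent so no spacing is zero, and s is taken
here with divisor n):

    import math

    def vasicek_Kmn(x, m):
        # Vasicek's normality test: compute H_mn of (10.1) and return
        # K_mn = exp(H_mn)/s of (10.2); small values discredit normality.
        xs = sorted(x)
        n = len(xs)
        low = lambda i: xs[max(i, 0)]        # X_(i) = X_(1) for i < 1
        high = lambda i: xs[min(i, n - 1)]   # X_(i) = X_(n) for i > n
        Hmn = sum(math.log((n / (2.0 * m)) * (high(i + m) - low(i - m)))
                  for i in range(n)) / n
        mean = sum(xs) / n
        s = math.sqrt(sum((v - mean) ** 2 for v in xs) / n)
        return math.exp(Hmn) / s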
Vasicek's approach can be used to interpret Greenwood's (1946) test of
uniformity and its extension by Cressie (1976) in entropy terms; e.g. see
Dudewicz and van der Meulen (1981). Furthermore, it has also been used to
construct GOF tests for a variety of parametric models; e.g. see Gokhale (1983)
and Mudholkar and Lin (1984) for the maximum entropy tests of exponentiality.
Now there exists a substantial body of literature on the distributional prop-
erties of Vasicek's entropy estimator, the sum-log of generalized spacings; e.g.
see Kashimov (1989), Hall (1984, 1986), van Es (1992), Beirlant et al. (1997),
and references therein. However, the null distribution of K mn , which involves
an estimate of the nuisance parameter, is still intractable.
The entropy measure H (f) is closely related to the measures of divergence
and Kullback-Leibler information. Nonparametric estimation of entropy and
information, and use of these estimates for testing goodness-of-fit against simple
and restricted composite alternatives, e.g. H_0 : N(μ, σ) versus H_1 : N(0, σ) or
H_1 : N(0, 1), has been considered by Ebrahimi, Habibullah, and Soofi (1992)
and Soofi, Ebrahimi, and Habibullah (1995).

2. Inverse Gaussian Model. The IG distribution of non-negative right-
skewed random variables is increasingly used for modeling and analysis of data
in applied research; see Seshadri (1999). Mudholkar and Tian (2000) have de-
veloped an IG analog of Vasicek's test, i.e. a maximum entropy test of the com-
posite IG hypothesis using the entropy characterization for the root-reciprocal
IG distribution given in Section 10.2.1.

Specifically, if X_{(1)}, X_{(2)}, ..., X_{(n)} are the order statistics of a random sample
and Y_{(i)} = 1/√X_{(n-i+1)}, for i = 1, 2, ..., n, then the maximum entropy test
rejects the IG character of the population if

\exp(H_{mn})/\hat{w} \le C_\alpha,                    (10.3)

where H_{mn} is the sample entropy of the Y_{(i)}'s as defined by Vasicek and ŵ is
found from

\hat{w}^2 = \frac{\sum_{i=1}^{n} Y_i^2}{n-1} - \frac{n^2}{(n-1)\sum_{i=1}^{n} Y_i^{-2}}.                    (10.4)

As in the case of the entropy test of normality, even the asymptotic
null distribution of this test is analytically intractable. Hence an empirical table
of the 5% points was constructed and compared with the similar table in Vasicek
(1976). Interestingly, the values in the two tables are remarkably close, but not
close enough to be considered identical. We shall return to this point in Section
10.5. Mudholkar and Tian have also considered the use of the Kullback-Leibler
information measure for testing the composite IG hypothesis against simple or
restricted composite alternatives.
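Under the reconstruction of (10.3) and (10.4) given above, a Python sketch of
the IG analog (ours; the name is hypothetical, and the exact form of (10.3) is
an assumption inferred from the surrounding text):

    import math

    def ig_entropy_statistic(x, m):
        # IG maximum entropy test: Y_(i) = 1/sqrt(X_(n-i+1)), H_mn as in
        # Vasicek, w_hat from (10.4); small exp(H_mn)/w_hat discredits IG.
        n = len(x)
        y = sorted(1.0 / math.sqrt(v) for v in x)   # the ordered Y's
        low = lambda i: y[max(i, 0)]
        high = lambda i: y[min(i, n - 1)]
        Hmn = sum(math.log((n / (2.0 * m)) * (high(i + m) - low(i - m)))
                  for i in range(n)) / n
        w2 = (sum(v * v for v in y) / (n - 1)
              - n * n / ((n - 1) * sum(1.0 / (v * v) for v in y)))
        return math.exp(Hmn) / math.sqrt(w2)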

10.4 Four Z Tests


In this section, we outline the use of independence characterizations mentioned
in Section 10.3 as the starting points for developing goodness-of-fit tests. Specif-
ically, four tests for assessing model appropriateness are outlined. Two are for
normality, one each for multivariate normality and for the composite inverse
gaussian hypothesis.

1. The Z2-Test. This test, due to Lin and Mudholkar (1980) and then labeled
the Z test, used the characteristic independence of the sample mean and variance
of random samples from normal populations. For obvious reasons, Mudholkar,
Marchetti, and Lin (2000) later relabeled it.

Let X̄ and S² denote respectively the mean and the variance of a random
sample from a normal population. Obviously, a single replication (X̄, S²) is
insufficient to test the independence. To obtain a measure of dependence be-
tween X̄ and S², Lin and Mudholkar start with the jackknife pseudovalues, or
equivalently, (X̄_{-i}, S²_{-i}), i = 1, 2, ..., n, obtained by deleting one observation at
a time.
It is well-known that the rank tests of independence, e.g. due to Hoeffding
(1948) or Blum, Kiefer, and Rosenblatt (1961), which are consistent against
all bivariate dependence alternatives, suffer from very low power. For this and
the practical reason of simplicity, Lin and Mudholkar first approximately sym-
metrize 8 2 by the well-known Wilson-Hilferty (1931) cube-root transformation
and propose the correlation coefficient r = Corr(X_i, (8=-i)I/3) as a measure of
dependence. Furthermore, because of the normality and robustness properties
of Fisher's arctanh- l transform, they propose

\[
Z_2 = \frac{1}{2}\,\log\left(\frac{1+r}{1-r}\right), \tag{10.5}
\]

as a test statistic for normality. Under normality, as $n \to \infty$, $\sqrt{n}\,Z_2 \to N(0, 3)$.
For use with small samples, $n \ge 5$, they empirically obtain approximations for
$\mathrm{Var}(Z_2)$ and $\mathrm{Kurtosis}(Z_2)$ and recommend use of Edgeworth or Cornish-Fisher
corrections to the null distribution. Furthermore, Lin and Mudholkar show
that the $Z_2$ test is consistent against and appropriate for detecting all skewed
alternatives to normality.
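As a concrete illustration, here is a minimal numpy sketch of the $Z_2$ computation; the vectorized leave-one-out construction and the function name are ours, and the small-sample Edgeworth and Cornish-Fisher corrections are not included.

```python
import numpy as np

def z2_statistic(x):
    """The Z2 statistic of (10.5): Fisher's arctanh of the correlation between
    the leave-one-out means and the cube-rooted leave-one-out variances."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    keep = ~np.eye(n, dtype=bool)                 # row i: the sample with x[i] deleted
    loo = np.where(keep, x, np.nan)
    means = np.nanmean(loo, axis=1)               # Xbar_{-i}
    variances = np.nanvar(loo, axis=1, ddof=1)    # S^2_{-i}
    r = np.corrcoef(means, variances ** (1.0 / 3.0))[0, 1]
    return 0.5 * np.log((1.0 + r) / (1.0 - r))
```

In practice, $\sqrt{n/3}\,Z_2$ would be referred to the standard normal table for moderate n.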

2. The $Z_3$-Test. The characterization of normality in terms of the indepen-
dence between $\bar X$ and $m_3$ was used, as in the $Z_2$ test, by Mudholkar, Marchetti,
and Lin (2000) to construct the $Z_3$ test of normality. However, because of the
symmetry of $m_3$ a transformation was considered unnecessary. Specifically, for
testing normality of a population based on a random sample $X_1, X_2, \ldots, X_n$,
they proposed the test statistic

\[
Z_3 = \frac{1}{2}\,\log\left(\frac{1+r_3}{1-r_3}\right), \tag{10.6}
\]

where $r_3 = \mathrm{Corr}(\bar X_{-i}, m_{3,-i})$. To distinguish between the two Z statistics, they


named Lin and Mudholkar's statistic $Z_2$ and the new statistic $Z_3$. They show
that, as $n \to \infty$ under the composite hypothesis of normality, $\sqrt{n}\,Z_3 \to N(0, 4)$.
They also show that $Z_2$ and $Z_3$ are independent asymptotically, and empirically
for samples as small as n = 5. Furthermore, $Z_3$ is shown to be consistent against
all alternatives with non-normal kurtosis.
In summary, as $n \to \infty$, $\sqrt{n}\,(Z_2, Z_3)$ is asymptotically bivariate normal
with mean 0 and covariance matrix $\Sigma$ with diagonal elements $\mathrm{Var}(Z_2) = 3$,
$\mathrm{Var}(Z_3) = 4$ and $\mathrm{Cov}(Z_2, Z_3) = 0$. Mudholkar, Lin, and Marchetti use the
two Z-tests to detect four targeted skewness-kurtosis alternatives: right-skew
heavy-tail, right-skew light-tail, left-skew heavy-tail and left-skew light-tail.
This is done by combining the one-tail versions of the two Z-tests, which are
for all practical purposes independent, using the Fisher (1932) classical method
of combining independent p-values.
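The combination step can be sketched as follows, reusing z2_statistic from the sketch above; the sign conventions (positive $Z_2$ signaling right skewness, positive $Z_3$ heavy tails) and the function names are our assumptions, not the authors'.

```python
import numpy as np
from scipy import stats

def z3_statistic(x):
    """The Z3 statistic of (10.6): Fisher's arctanh of Corr(Xbar_{-i}, m_{3,-i})."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    keep = ~np.eye(n, dtype=bool)
    loo = np.where(keep, x, np.nan)
    means = np.nanmean(loo, axis=1)
    m3 = np.nanmean((loo - means[:, None]) ** 3, axis=1)   # m_{3,-i}
    r3 = np.corrcoef(means, m3)[0, 1]
    return 0.5 * np.log((1.0 + r3) / (1.0 - r3))

def targeted_p_value(x, skew="right", tail="heavy"):
    """Fisher combination of one-tailed Z2 and Z3 p-values for one of the four
    skewness-kurtosis targets; -2*(log p2 + log p3) is referred to chi-square(4)."""
    n = len(x)
    u2 = z2_statistic(x) * np.sqrt(n / 3.0)   # approximately N(0, 1) under normality
    u3 = z3_statistic(x) * np.sqrt(n / 4.0)   # approximately N(0, 1) under normality
    p2 = stats.norm.sf(u2) if skew == "right" else stats.norm.cdf(u2)
    p3 = stats.norm.sf(u3) if tail == "heavy" else stats.norm.cdf(u3)
    return stats.chi2.sf(-2.0 * (np.log(p2) + np.log(p3)), df=4)
```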

3. The Multivariate $Z_p$-Test. Mudholkar, McDermott, and Srivastava
(1992) developed a p-dimensional adaptation of Lin and Mudholkar's test of
normality. Let $X_1, X_2, \ldots, X_n$ be a random sample from a p-variate popula-
tion, with sample mean $\bar X$ and sample covariance matrix S. Then it is well
known that, if the population is normal, the Mahalanobis distances

\[
D_i^2 = (X_i - \bar X)'\, S^{-1} (X_i - \bar X), \quad i = 1, 2, \ldots, n, \tag{10.7}
\]

are asymptotically independently distributed chi-square variates. Hence, for
large n they may be transformed into approximately i.i.d. normal variates, and
the multivariate normality of the population may be tested by testing univariate
normality of the transforms. Towards this end, Mudholkar, McDermott, and
Srivastava (1992) empirically refine the Wilson-Hilferty transformation, and
claim that, for $p \ge 2$,

\[
Y_i = (D_i^2)^h, \quad i = 1, 2, \ldots, n, \quad \text{where } h = 1/3 - 0.11/p, \tag{10.8}
\]

may be considered approximately i.i.d. normal variates. They propose testing
p-variate normality of the X's using

\[
Z_p = Z_2(Y_1, \ldots, Y_n) \tag{10.9}
\]

as the test statistic, i.e. by applying the $Z_2$ test to the Y's. They derive the
asymptotic null distribution of $Z_p$, and offer its empirical refinement, which is
applicable for $n \ge 10$.
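A sketch of the $Z_p$ computation follows, again reusing the z2_statistic function from the earlier sketch; the direct inversion of the sample covariance matrix is an assumption of convenience.

```python
import numpy as np

def zp_statistic(X):
    """The Z_p test of (10.7)-(10.9): Mahalanobis distances, the refined
    Wilson-Hilferty power h = 1/3 - 0.11/p, then Z2 applied to the Y's."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centered = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    D2 = np.einsum("ij,jk,ik->i", centered, S_inv, centered)   # (10.7)
    h = 1.0 / 3.0 - 0.11 / p
    return z2_statistic(D2 ** h)                               # (10.8)-(10.9)
```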

4. The Z(IG) Test. Mudholkar, Natarajan, and Chaubey (2000) have em-
ployed Khatri's (1962) characterization of the inverse gaussian distribution, as
in the examples above, to construct the Z(IG) statistic for testing the compos-
ite IG hypothesis. They find that asymptotically, under the null hypothesis, as
$n \to \infty$, $\sqrt{n}\,Z(IG)$ is normally distributed with zero mean and variance 3, and
present a small sample refinement of the distribution. It is interesting that the
asymptotic null distribution of Z(IG) is exactly the same as that of the $Z_2$ statistic
of normality. We shall return to amplify this point in the next section.
We close this section by noting the paucity of the GOF tests for the im-
portant composite gamma distributional assumption and report that a Z test
based on the characterization of the gamma distribution stated in the previous
section is under development.

10.5 Byproducts: The G-IG Analogies


This section is given to supporting the proposition, suggested in the intro-
duction, that construction of GOF tests and related issues can generate new
statistical notions and knowledge. The parallelism between the characterization
results for the normal and the IG families and the related GOF tests are used
to illustrate the point.
It is well-known that the approximately 100-year-old IG family is strikingly
similar to the nearly three-centuries-old gaussian family in terms of its
analytical simplicity and remarkably parallel statistical properties and meth-
ods; e.g. see the discussion of Folks and Chhikara (1978). Arguably, the IG and
the gaussian distributions may be regarded as fraternal twins, or the IG and
the G families may be said to form parallel universes. Work on the GOF tests
for the composite IG hypothesis described in the earlier sections resulted in a
substantial expansion of the IG inference methodology and of the list of the
G-IG analogies. Some of these results and connections are now outlined.

1. IG-Skewness and IG-Kurtosis. Lin and Mudholkar's Z test result-


ing from the characteristic independence of the sample mean and variance is
targeted to detect skew alternatives. For the parallel IG test, based on the
independence of the maximum likelihood estimators of the two IG parameters,
the nature of its target alternatives was raised by Mudholkar, Natarajan, and
Chaubey (2000). The investigation into an answer led to the following defini-
tions of the coefficient $\delta_1$ of IG-skewness and $\delta_2$ of IG-kurtosis

\[
\delta_1 = \bigl(E(X)\,E(X^{-1}) - 1\bigr)\sqrt{E(X^2)/E^2(X) - 1}, \qquad
\delta_2 = \frac{\mathrm{Var}(V)}{E^2(V)} + 1, \tag{10.10}
\]

where $V = \sum_i (1/X_i - 1/\bar X)/n$. Interestingly, the asymptotic joint distribution


of $d_1$ and $d_2$, the estimators of $\delta_1$ and $\delta_2$, obtained by substituting sample
moments of IG samples for the population moments, is exactly the same as that
of the Pearson coefficients $b_1$ and $b_2$ based on normal samples. That is, $d_1 \sim N(0, 6/n)$
and $d_2 \sim N(0, 24/n)$ and the two are asymptotically independent;
see Mudholkar and Natarajan (1999). A development of the GOF tests of the
IG composite hypothesis based on d1 and d2, along the lines of the tests of
normality based on b1 and b2 [D'Agostino and Stephens (1986)], is in progress.

2. IG Symmetry. The concept of IG-skewness was developed as a measure


of departure from IG-symmetry, where X is a nonnegative random variable. X
is said to be IG-symmetric about its expectation $\mu$, if

(10.11)

holds for $r = 1, 2, \ldots$.

Mudholkar and Natarajan (1999) show that the contaminated IG distrib-


utions, i.e. the IG scale mixtures, the lognormal distributions and mixtures
of these are IG-symmetric, and suggest that these families may have a role in
future developments of robust IG methods.

3. The ANORE Alternatives. The procedure for testing homogeneity


of IG means is known as the analysis of reciprocals (ANORE), and, as in the
normal theory ANOVA, uses the F distribution. The best known contrast-based
normal theory alternatives to the ANOVA test are Tukey's studentized range
and Dunnett's comparison with control tests. Mudholkar and Tian (2000) define
the IG analogs of these tests for comparing IG means and show that their
null distributions are the same as those of their normal theory counterparts.
Thus these procedures can be implemented without new tables or software.

4. Robust Tests. In her dissertation, Natarajan (1998) began with the IG-
GOF tests, then developed and studied IG analogs of the classical tests for
equality of variances due to Bartlett, Cochran, Hartley and others, and the IG
analogs of the order constrained versions of these tests due to Fujino (1979).
She also developed IG analogs of the robust tests for homogeneity of variances
in Mudholkar, McDermott, and Aumont (1993) and Mudholkar, McDermott,
and Mudholkar (1995). Also considered in her dissertation is an IG analog of
the transformation methods of Box and Cox (1964). The motivation for the
entire investigation was the similarity between the normal theory and the IG
theory originally stimulated by the GOF problem.

5. Order Constrained Inference. The theory for statistical inference sub-


ject to a priori order constraints goes back to Chernoff (1954) and Bartholomew
(1959a,b). An excellent account of a substantial part of the growing body of
literature on the subject appears in Barlow, Bartholomew, Bremner and Brunk
(1971) and Robertson, Wright, and Dykstra (1988). Mudholkar and Tian (2000)
have developed and studied the likelihood ratio test for comparing IG means
subject to linear order constraints, and have shown that the null distributions
of these tests correspond well with the normal theory results. They have also
studied, in the IG context, a simple approach for general order restrictions pro-
posed in Mudholkar and McDermott (1989) and in McDermott and Mudholkar
(1993), and concluded that the results are essentially analogous to those in the
normal theory.

6. Extreme Value Distributions. Extreme value and extreme spacings dis-


tributions are elegant and important parts of statistical theory and practice.
Freimer et al. (1989) showed that the asymptotic distributions of the extremes
and extreme spacings of random samples can be simply derived using elemen-
tary methods, such as Taylor expansions, with quantile functions. They also
showed that if $a_n X_{n:n} + b_n$, where $X_{i:n}$ denotes the ith order statistic of a random
sample of size n, has an asymptotic distribution, then the high probability order of
magnitude of the tail length as measured by the extreme spacings is given by
$X_{n:n} - X_{(n-1):n} = O_p(a_n^{-1})$. However, their work did not cover the pedagogically
important normal and gamma populations, which lack closed form expressions
for their quantile functions. Mudholkar and Tian (2000) fill in this gap and
additionally consider the IG, the reciprocal IG (RIG) and the root-reciprocal
IG (RRIG) populations.

7. Simulation Considerations. Because the null distribution of Vasicek's


maximum entropy test statistic is intractable, he had to tabulate it empirically.
The IG test statistic by Mudholkar and Tian (2000) had the same problem and
consequently had to resort to the same approach. A comparison of their table
with Vasicek's showed the two to be remarkably close, but not close enough
to be considered identical. However, it was noticed that in his Monte Carlo
study, as recommended by IBM at the time, Vasicek used the mean of twelve
uniform random variables as the generator for normal deviates. When Vasicek's
experiment was repeated by Mudholkar and Tian using the current IMSL de-
fault generator, the differences between the percentiles for the entropy tests of
normality and the IG hypothesis were seen to be statistically insignificant.
This observation has obvious implications for the results of early Monte Carlo
experiments.

8. Conclusion. Statistical measurements are generally nonnegative and pos-


itively skewed. Hence, the IG family of such distributions may often be bet-
ter suited for modeling and analysis of data than its older and better-known
gaussian twin. The normal theory for statistical analysis is substantial and very
well developed, whereas the theory for the IG inference may be considered to
be in its infancy. There are indications that the two methodologies may be
strongly analogous, and that much of the effort expended in the development
of normal theory may have served the cause of statistical analysis of IG models.
We emphasize that these findings were part of investigations stimulated by the
interplay between the areas of characterization theorems and goodness-of-fit
tests.

References
1. Barlow, R. E., Bartholomew, D. J., Bremner, J. M., and Brunk, H. D.
(1971). Statistical Inference Under Order Restrictions, New York: John
Wiley & Sons.

2. Bartholomew, D. J. (1959a). A test of homogeneity for ordered alterna-


tives, Biometrika, 46, 36-38.

3. Bartholomew, D. J. (1959b). A test of homogeneity for ordered alterna-


tives II, Biometrika, 46, 328-335.

4. Beirlant, J., Dudewicz, E. J., Györfi, L., and van der Meulen, E. C.
(1997). Nonparametric entropy estimation: An overview, International Journal of
Mathematical and Statistical Sciences, 6, 17-39.

5. Bingham, N. H. (2000). Studies in the history of probability and sta-


tistics XLVI. Measure into probability: From Lebesgue to Kolmogorov,
Biometrika, 87, 145-156.

6. Blum, J. R., Kiefer, J., and Rosenblatt, M. (1961). Distribution free


tests of independence based on the sample distribution function, Annals
of Mathematical Statistics, 32, 485-498.

7. Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations,


Journal of the Royal Statistical Society, Series B, 26, 211-252.

8. Chernoff, H. (1954). Testing homogeneity against ordered alternatives,


Annals of Mathematical Statistics, 34, 945-956.

9. Cramér, H. (1936). Über eine Eigenschaft der normalen Verteilungsfunk-
tion, Math. Zeitschrift, 41, 405-411.

10. Cressie, N. (1976). On the logarithms of high-order spacings, Biometrika,


63, 343-355.

11. Csiszár, I. (1975). I-divergence geometry of probability distributions and
minimization problems, Annals of Probability, 3, 146-158.

12. D'Agostino, R. B. and Stephens, M. A. (Eds.) (1986). Goodness-of-Fit
Techniques, New York: Marcel Dekker.

13. Darling, D. A. (1953). On a class of problems related to the random


division of an interval, Annals of Mathematical Statistics, 24, 239-253.

14. Dawid, A. P. (1978). Comments on "The inverse Gaussian distribution


and its statistical application: A review," Journal of the Royal Statistical
Society, Series B, 40, 280.
15. Dudewicz, E. and van der Meulen, E. C. (1981). Entropy-based tests of
uniformity, Journal of the American Statistical Association, 76, 967-974.

16. Dugué, D. (1941). Sur un nouveau type de courbe de fréquence, Comptes
Rendus de l'Académie des Sciences, Paris, 213, 634-635.
17. Ebrahimi, N., Habibullah, M., and Soofi, E. S. (1992). Testing for ex-
ponentiality based on Kullback-Leibler information, Journal of the Royal
Statistical Society, Series B, 54, 739-748.
18. Fisher, R. A. (1932). Statistical Methods for Research Workers, London:
Oliver and Boyd.

19. Folks, J. L. and Chhikara, R. S. (1978). The inverse gaussian distribution


and its statistical application - a review, Journal of the Royal Statistical
Society, Series B, 40, 263-289.
20. Freimer, M., Kollia, G., Mudholkar, G. S., and Lin, C. T. (1989). Ex-
tremes, extreme spacings and outliers in the Tukey and Weibull families,
Communications in Statistics-Theory and Methods, 18, 4261-4274.
21. Fujino, Y. (1979). Tests for the homogeneity of a set of variances against
ordered alternatives, Biometrika, 66, 133-140.

22. Geary, R. C. (1936). The distribution of 'Student's' ratio for non-normal


samples, Supplement to the Journal of the Royal Statistical Society, 3,
178-184.
23. Gokhale, D. V. (1983). On entropy-based goodness-of-fit tests, Compu-
tational Statistics and Data Analysis, 1, 157-165.
24. Greenwood, M. (1946). The statistical study of infectious disease, Journal
of the Royal Statistical Society, Series B, 109, 85-110.
25. Hacking, I. (1984). Trial by numbers, Science-84, 5, 69-73.
26. Hall, P. (1984). Limit theorems for sums of general functions of m-
spacings, Mathematical Statistics and Data Analysis, 1, 517-532.

27. Hall, P. (1986). On powerful distributional tests based on sample spacings,


Journal of Multivariate Analysis, 19, 201-255.
28. Hoeffding, W. (1948). A nonparametric test of independence, Annals of
Mathematical Statistics, 19, 546-557.

29. Hwang, T-Y. and Hu, C-Y. (1999). On a characterization of the gamma
distribution: The independence of the sample mean and the sample coef-
ficient of variation, Annals of the Institute of Statistical Mathematics, 51,
749-753.

30. Iyengar, S. and Patwardhan, G. (1988). Recent developments in the in-


verse Gaussian distribution, Handbook of Statistics Volume 7, 479-490.

31. Kagan, A. M., Linnik, Y. V., and Rao, C. R. (1973). Characterization Prob-
lems in Mathematical Statistics, New York: John Wiley & Sons.

32. Kashimov, S. A. (1989). Asymptotic properties of functions of spacings,


Theory of Probability and its Applications, 34, 298-307.

33. Khatri, C. G. (1962). A characterization of the inverse gaussian distrib-


ution, Annals of Mathematical Statistics, 33, 800-803.

34. Kolmogorov, A. (1933). Sulla determinazione empirica di una legge di
distribuzione, Giorn. Ist. Ital. Attuari, 4, 83-91.

35. Kullback, S. (1959). Information Theory and Statistics, p. 15, New York:
John Wiley & Sons.

36. Kullback, S. and Leibler, R. A. (1951). On information and sufficiency,


Annals of Mathematical Statistics, 22, 79-86.

37. Lin, C. C. and Mudholkar, G. S. (1980). A simple test for normality


against asymmetric alternatives, Biometrika, 67, 455-461.

38. Lukacs, E. (1942). A characterization of the normal distribution, Annals


of Mathematical Statistics, 13, 91-93.

39. Lukacs, E. and Laha, R. G. (1964). Applications of Characteristic Functions,
New York: Hafner.

40. McDermott, M. P. and Mudholkar, G. S. (1993). A simple approach to


testing homogeneity of order-constrained means, Journal of the American
Statistical Association, 88, 1371-1379.

41. Moran, P. A. P. (1951). The random division of an interval - Part II,


Journal of the Royal Statistical Society, Series B, 9, 92-98.

42. Mudholkar, G. S. and Lin, C. T. (1984). On two applications of charac-


terization theorems to goodness of fit, Colloquia Mathematica Societatis
Janos Bolyai, 45, 395-414.

43. Mudholkar, G. S., Marchetti, C. E., and Lin, C. T. (2000). Independence


characterizations and testing normality, Journal of Statistical Planning
and Inference (to appear).

44. Mudholkar, G. S. and McDermott, M. P. (1989). A class of tests for


equality of ordered means, Biometrika, 76, 161-168.

45. Mudholkar, G. S., McDermott, M. P., and Aumont, J. (1993). Testing


homogeneity of ordered variances, Metrika, 40, 271-281.

46. Mudholkar, G. S., McDermott, M. P., and Mudholkar, A. (1995). Robust


finite-intersection tests for homogeneity of ordered variances, Journal of
Statistical Planning and Inference, 43, 185-195.

47. Mudholkar, G. S., McDermott, M. P., and Srivastava, D. K. (1992). A


test of p-variate normality, Biometrika, 79, 850-854.

48. Mudholkar, G. S. and Natarajan, R. (1999). The inverse gaussian analogs


of symmetry, skewness and kurtosis, Annals of the Institute of Statistical
Mathematics (to appear).

49. Mudholkar, G. S., Natarajan, R., and Chaubey, Y. P. (2000). Indepen-


dence characterization and inverse gaussian goodness of fit composite hy-
pothesis, Sankhya (to appear).

50. Mudholkar, G. S. and Tian, L. (2000). An entropy characterization of the


inverse gaussian distribution and related goodness of fit test, Technical
Report, University of Rochester, Rochester, NY. Submitted for publica-
tion.

51. Natarajan, R. (1998). An investigation of the inverse Gaussian distribu-


tion with an emphasis on Gaussian analogies, Ph.D. Thesis, University of
Rochester, Rochester, NY.

52. Pearson, K. (1892). The Grammar of Science, London: W. Scott.

53. Pearson, K. (1900). On the criterion that a given system of deviations


from the probable in the case of a correlated system of variables is such
that it can reasonably be supposed to have arisen from random sampling,
Phil. Mag., 5, 157-175.

54. Pearson, K. (1933). On a method of determining whether a sample of


given size n supposed to have been drawn from a parent population hav-
ing a known probability integral has probably been drawn at random,
Biometrika, 25, 379-410.

55. Pearson, K. and Filon, L. N. G. (1898). Philosophical Transactions of the


Royal Society of London, 191, 229-311.

56. Rao, C. R. and Shanbhag, D. N. (1986). Recent results on characteriza-


tion of probability distributions: A unified approach through extensions
of Deny's theorem, Advances in Applied Probability, 18, 660-678.

57. Robertson, T., Wright, F. T., and Dykstra, R. L. (1988). Order Restricted
Statistical Inference, New York: John Wiley & Sons.

58. Schrödinger, E. (1915). Zur Theorie der Fall- und Steigversuche an Teilchen
mit Brownscher Bewegung, Physikalische Zeitschrift, 16, 289-295.

59. Seshadri, V. (1993). The Inverse Gaussian Distribution: A Case Study


in Exponential Families, Oxford: Clarendon Press.

60. Seshadri, V. (1999). The Inverse Gaussian Distribution: Statistical The-


ory and Applications, New York: Springer-Verlag.

61. Shannon, C. E. (1949). The Mathematical Theory of Communication, p.


55, University of Illinois Press.

62. Shapiro, S. S. and Wilk, M. B. (1965). An analysis of variance test for


normality (complete samples), Biometrika, 52, 591-611.

63. Smoluchowski, M. v. (1915). Notiz über die Berechnung der Brownschen
Molekularbewegung bei der Ehrenhaft-Millikanschen Versuchsanordnung,
Physikalische Zeitschrift, 16, 318-321.

64. Soofi, E. S., Ebrahimi, N., and Habibullah, M. (1995). Information dis-
tinguishability with application to analysis of failure data, Journal of the
American Statistical Association, 90, 657-668.

65. Tweedie, M. C. K. (1945). Inverse statistical variates, Nature, 155, 453.

66. van Es, B. (1992). Estimating functionals related to a density by a class


of statistics based on spacings, Scandinavian Journal of Statistics, 19,
61-72.

67. Vasicek, O. (1976). A test for normality based on the sample entropy,
Journal of the Royal Statistical Society, Series B, 38, 54-59.

68. Vincze, I. (1984). Colloquia Mathematica Societatis János Bolyai, 45,


395-414.

69. Wald, A. (1947). Sequential Analysis, New York: John Wiley & Sons.

70. Wilson, E. B. and Hilferty, M. M. (1931). The distribution of chi-square,


Proceedings of the National Academy of Sciences, 17, 684-688.

71. Zinger, A. A. (1951). On independent samples from normal populations,


Uspekhi Mat. Nauk, 6, 172-175.
11
Goodness-of-Fit Tests Based on Record Data and
Generalized Ranked Set Data

Barry C. Arnold, Robert J. Beaver, Enrique Castillo, and


Jose Maria Sarabia
University of California, Riverside, California
University of California, Riverside, California
University of Cantabria, Santander, Spain
University of Cantabria, Santander, Spain

Abstract: Assume that observations have common distribution function F.
We wish to test $H : F = F_0$ where $F_0$ is a completely specified distribution. Two
kinds of data are considered: (i) the first k + 1 record values $X_{(0)}, X_{(1)}, \ldots, X_{(k)}$,
or possibly several independent sets of records based on observations with dis-
tribution F; (ii) generalized ranked set data, i.e., J independent order statistics
$X_{i_j:n_j}$, $j = 1, \ldots, J$, with common parent distribution F. Several appropriate goodness-of-
fit tests are described and evaluated by simulation studies. The more general
problem dealing with the composite hypothesis $H : F \in \{F(\cdot; \theta) : \theta \in \Theta\}$ is
also discussed.

Keywords and phrases: Chi-squared test, Watson statistic, missing data,


order statistics

11.1 Introduction


The classic problem of goodness-of-fit involves determining whether a set of i.i.d.
observations can be reasonably supposed to have common distribution function
Fo, a completely specified distribution. It is often assumed, and is assumed
here, that Fo is continuous. Thus, via a straightforward transformation, we
reduce the problem to one of testing goodness-of-fit to either a uniform or an
exponential distribution, whichever is deemed convenient. The non-standard
feature of the analysis in this paper rests in the nature of the data assumed
available. We assume that our data will consist either of record values from
observations with common distribution F, or will consist of independent order


statistics with common parent distribution F (i.e., generalized ranked set sam-
ples [Kim and Arnold (1999)]). In both cases we will wish to test $H : F = F_0$.
It is natural to also consider the problem of testing a composite hypothesis
$H : F \in \{F_\theta : \theta \in \Theta\}$ using record and ranked set data configurations. In such
a situation the first step will be to use the data to estimate $\theta$.
Simulation based power studies are provided for the proposed tests in the
simple hypothesis case. The major emphasis will be on the ranked set data
case. As we shall see in Section 11.2, record value data is readily analysed by
taking advantage of characteristic properties of record spacings for exponentially
distributed data.

11.2 Record Data


If our data consist of a set of k + 1 records $X_{(0)}, X_{(1)}, \ldots, X_{(k)}$ (or perhaps
J independent sets of records), then our problem is readily transformed to a
standard exponential goodness-of-fit problem. This is true because the spacings
of the transformed records $Y_{(i)} = -\log(1 - F_0(X_{(i)}))$ will, under $H : F = F_0$,
be i.i.d. exponential variables.
If, more generally, the hypothesis to be tested is $H : F \in \{F_\theta : \theta \in \Theta\}$, then
it will be necessary to obtain record-based estimates of $\theta$ to be used in trans-
forming the records before testing the spacings for exponentiality. See Arnold,
Balakrishnan, and Nagaraja (1998) for discussion of record-based parameter
estimates.
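A minimal sketch of this reduction follows, assuming $F_0$ is supplied as a vectorized cdf; the resulting spacings can then be handed to any standard test of exponentiality.

```python
import numpy as np

def record_spacings(records, F0):
    """Transform records X_(0) <= ... <= X_(k) by Y = -log(1 - F0(X)); under
    H : F = F0 the resulting k + 1 spacings (with Y_(-1) taken as 0) are
    i.i.d. standard exponential."""
    y = -np.log1p(-F0(np.asarray(records, dtype=float)))
    return np.diff(y, prepend=0.0)
```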

11.3 Generalized Ranked Set Data


Here our data will consist of J independent order statistics $X_{i_1:n_1}, X_{i_2:n_2}, \ldots,$
$X_{i_J:n_J}$, with common parent distribution F. To test $H : F = F_0$, we may
consider

\[
U_{i_j:n_j} = F_0(X_{i_j:n_j}), \quad j = 1, 2, \ldots, J, \tag{11.1}
\]

and ask whether these are reasonably supposed to be uniform order statistics.
A Pearson-like goodness-of-fit statistic for this is of the form

\[
T = \sum_{j=1}^{J} \frac{(U_{i_j:n_j} - \mu_{i_j:n_j})^2}{\sigma^2_{i_j:n_j}}, \tag{11.2}
\]

where $\mu_{i_j:n_j} = i_j/(n_j + 1)$ and $\sigma^2_{i_j:n_j} = i_j(n_j - i_j + 1)/[(n_j + 1)^2(n_j + 2)]$ are
the mean and variance of the corresponding uniform order statistic.

Large values of T will be cause for rejection of H. The null distribution of T
would be expected to be approximately $\chi^2_J$ if J is large, the $n_j$'s are large and if

the ratios $i_j/n_j$ are not too extreme. In practice, however, the $n_j$'s will be small.
If J is large a $\chi^2_J$ approximation may be adequate. If J is small then a more
accurate evaluation of the null distribution of T will be needed. Balanced ranked
set samples are most commonly used. These consist of m independent replicates
of a complete set of n independent order statistics $X_{1:n}, X_{2:n}, \ldots, X_{n:n}$, where n
is small and m is usually not so small. Simulation based upper 90, 95 and 99th
percentiles of the statistic T for such balanced ranked set samples are provided
in Table 11.1 for an array of choices of values for m and n. These simulations
based on 200,000 replications for some representative choices of m and n can be
expected to provide two figure accuracy and often three figure accuracy. More
extensive tables will be published elsewhere.
The discrepancies between the percentiles displayed in Table 11.1 and the
corresponding X~n approximation can be quite large. Some representative com-
parisons are given in Table 11.2. Note that the percentage error is in the range
-7.8% to 6.3%.
It is evident from Table 11.2 that only for large values of mn (say mn > 100)
is it reasonable to use a $\chi^2$ approximation for the 90th percentile of T. Even
larger values of mn are required if we wish to accurately approximate the 95th
and 99th percentiles. In general recourse should be made to the simulated
values in Table 11.1.
Of course, one could instead have transformed to get exponential order
statistics instead of uniform ones. Thus we might define

\[
Z_{i_j:n_j} = -\log(1 - U_{i_j:n_j}), \quad j = 1, 2, \ldots, J, \tag{11.3}
\]

and consider the related Pearson type statistic

\[
\tilde T = \sum_{j=1}^{J} \frac{(Z_{i_j:n_j} - M_{i_j:n_j})^2}{\tilde\sigma^2_{i_j:n_j}}, \tag{11.4}
\]

where

\[
M_{i_j:n_j} = E(Z_{i_j:n_j}) = \sum_{k=1}^{i_j} \frac{1}{n_j - k + 1}
\]

and

\[
\tilde\sigma^2_{i_j:n_j} = \mathrm{Var}(Z_{i_j:n_j}) = \sum_{k=1}^{i_j} \frac{1}{(n_j - k + 1)^2}.
\]
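The two statistics can be computed as in the following sketch, which uses the uniform and exponential order statistic moments given above (the argument layout is illustrative); critical values should be taken from Tables 11.1 and 11.3 rather than from the chi-square approximation.

```python
import numpy as np

def rss_statistics(x, i, n, F0):
    """T of (11.2) and T-tilde of (11.4) for generalized ranked set data:
    x[j] is the observed i[j]-th order statistic from a sample of size n[j]."""
    x, i, n = (np.asarray(a, dtype=float) for a in (x, i, n))
    u = F0(x)                                          # (11.1)
    mu = i / (n + 1)                                   # E(U_{i:n})
    var = i * (n - i + 1) / ((n + 1) ** 2 * (n + 2))   # Var(U_{i:n})
    T = np.sum((u - mu) ** 2 / var)
    z = -np.log1p(-u)                                  # (11.3)
    M = np.array([np.sum(1.0 / (nj - np.arange(1, ij + 1) + 1))
                  for ij, nj in zip(i.astype(int), n.astype(int))])
    s2 = np.array([np.sum(1.0 / (nj - np.arange(1, ij + 1) + 1) ** 2)
                   for ij, nj in zip(i.astype(int), n.astype(int))])
    return T, np.sum((z - M) ** 2 / s2)
```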

Table 11.3 includes simulation based percentiles of the statistic $\tilde T$ for bal-
anced ranked set samples for an array of choices of m and n. These simulations
are based on 200,000 replications for each choice of m and n. More extensive
tables will be published elsewhere.
Table 11.4 provides indications of the discrepancies between the entries of
Table 11.3 and the corresponding $\chi^2_{mn}$ approximation for the distribution of $\tilde T$.
Here too, large values of mn are required if we wish to reliably approximate the

Table 11.1: Simulation based upper 90th, 95th and
99th percentiles of the statistic T for different
values of n and m

n m T_{0.90} T_{0.95} T_{0.99}
2 1 4.2774 5.5352 7.6272
3 1 5.8970 7.6109 11.3275
5 1 9.1312 11.3177 16.8318
10 1 16.2907 19.3516 26.9705

1 3 5.1017 5.7119 6.8890


3 3 14.4149 16.6576 21.2247
5 3 22.5614 25.7760 32.6179
10 3 41.3461 45.9678 56.7119

2 5 15.0453 16.8732 20.4360


3 5 21.9053 24.4662 29.8155
5 5 34.7321 38.5275 46.5919
10 5 64.8435 70.3943 82.6211

1 10 13.7028 14.8317 16.9072


3 10 39.7595 43.1229 49.9132
5 10 63.8748 68.7633 78.7493
10 10 120.8672 128.2020 143.1316

2 25 60.9863 64.4107 71.3143


3 25 90.0779 95.0251 104.6656
5 25 146.7982 154.0079 168.1895
10 25 283.1010 293.8117 315.2286

1 100 111.5231 114.8760 121.2865


3 100 329.8593 338.9327 356.5544
5 100 543.0533 556.2458 581.1844
10 100 1065.0612 1084.8261 1122.8240

Table 11.2: Accuracy of chi-square approximations for percentiles of T

n m Percentile Simulated value Chi-square approximation %Error
2 5 .90 15.0453 15.9872 6.26
5 5 .95 38.5275 37.6525 -2.27
10 5 .99 82.6211 76.1539 -7.83

3 10 .90 39.7595 40.2560 1.24


6 10 .95 80.9052 79.0819 -2.25
9 10 .99 118.4600 124.1163 4.77
4 25 .90 118.7316 118.4980 -0.19
7 25 .95 210.4848 206.8668 -1.72
10 25 .99 315.2286 304.9369 -3.26

2 50 .90 115.4174 118.4980 2.67


5 50 .95 290.2510 287.8815 -0.84
10 50 .99 589.0752 576.4928 -2.14

90th percentile of $\tilde T$ by the corresponding $\chi^2_{mn}$ percentile. Even larger values
are needed to accurately approximate the 95th and 99th percentiles.

Predictably, the test procedures based on T and $\tilde T$ will be more powerful
against different alternatives.
A small simulation study shows, for example, that for testing $F(x) = e^x/(1 + e^x)$
(a logistic distribution) based on a balanced ranked set sample with m =
10, n = 3, the test based on T is more powerful against a standard normal
alternative while the test based on $\tilde T$ is more powerful against a standard Cauchy
alternative. Such power considerations will be discussed further in Section 11.4.
An attractive alternative approach in the generalized ranked set sampling
scenario is to treat the problem as one involving a considerable amount of
missing data. Thus we have one observation $X_{i_1:n_1}$ from the first set of $n_1$
observations, $n_1 - 1$ of which are missing. Similarly $X_{i_2:n_2}$ has $n_2 - 1$ missing
observations associated with it. Using the notation $N = \sum_{j=1}^{J} n_j$ we have
J observations and N - J missing values. Our approach then is to simulate
the missing data assuming $H : F = F_0$ and then use standard goodness-of-fit
procedures applied to the augmented samples. In practice we will transform
using (11.1) or (11.3) to uniform or exponential order statistics and readily
simulate the missing uniform or exponential order statistics. For example, if we
transform to uniform order statistics then the missing data corresponding to
$Y_{i_j:n_j}$ can be simulated by generating

$(i_j - 1)$ i.i.d. uniform $(0, Y_{i_j:n_j})$ variates



Table 11.3: Simulation based upper 90th, 95th and
99th percentiles of the statistic $\tilde T$ for different
values of n and m

n m $\tilde T_{0.90}$ $\tilde T_{0.95}$ $\tilde T_{0.99}$
1 1 1.7345 4.0986 13.1464
3 1 6.5532 10.2042 21.5125
5 1 10.2478 14.3793 26.2311
10 1 17.5866 22.1089 34.1520

1 3 6.7389 10.9627 24.0590


3 3 17.4239 23.1078 38.3326
5 3 25.8054 31.9416 47.2485
10 3 44.2939 50.8899 66.8919

1 5 11.0281 16.2471 31.3853


3 5 26.6323 33.3458 50.9238
5 5 39.4436 46.7528 63.7632
10 5 68.6998 76.6287 94.9124

1 10 19.9969 26.7644 45.2539


3 10 47.4434 56.0884 76.7791
5 10 71.1083 80.2365 101.6653
10 10 126.6812 137.1765 159.6729

1 25 42.5964 52.0486 75.1563


3 25 103.2947 115.2631 141.7282
5 25 158.6502 171.6593 199.9937
10 25 292.0612 306.7315 337.4402

1 100 136.7013 151.6927 184.0668


3 100 356.4998 376.3445 417.3805
5 100 566.5579 589.0772 634.3068
10 100 1083.0568 1109.6497 1162.5362

Table 11.4: Accuracy of chi-square approximations for percentiles of $\tilde T$

n m Percentile Simulated value Chi-squared approximation % Error
2 5 .90 19.3748 15.9872 -17.48
5 5 .95 46.7528 37.6252 -19.46
10 5 .99 94.9124 76.1539 -19.76

3 10 .90 47.4434 40.2560 -15.15


6 10 .95 91.8737 79.0819 -13.73
9 10 .99 147.7456 124.1163 -15.99

4 25 .90 131.1570 118.4980 -9.65


7 25 .95 226.4419 206.8668 -8.64
10 25 .99 337.4402 304.9396 -9.63

2 50 .90 134.5036 118.4980 -11.90


5 50 .95 313.9644 287.8815 -8.31
10 50 .99 618.3149 576.4928 -6.76

and
$(n_j - i_j)$ i.i.d. uniform $(Y_{i_j:n_j}, 1)$ variates.

Simulations will enable us to determine the power of these procedures against a


variety of alternatives. Currently our simulations are based on balanced ranked
set samples, but obviously unbalanced data can be accommodated.
Assuming that we transform to uniform order statistics using (11.1), a rea-
sonable test statistic to apply to the augmented sample of size $N = \sum_{j=1}^{J} n_j$
is Stephens' (1970) modified version of the Watson (1961) $U^2$ statistic. For
background see D'Agostino and Stephens (1986, pp. 248-249). If we denote the
ordered N observations by $Y_{(1)}, \ldots, Y_{(N)}$ then

2 1 ~ 2i - 1 2 - 2
U =- + L..,,(-- - Y(i)) - N(Y - 0.5) (11.5)
12N i=l 2N
and the modified statistic is given by

U2 = {U2 _ 0.1 Q:!:.}{1 0.8}


MOD N + N2 + N . (11.6)

Critical values for $U^2_{MOD}$ when N > 10 were supplied by Stephens. They are:
90th percentile = 0.152, 95th percentile = 0.187 and 99th percentile = 0.267.

For values of $N \le 10$, Quesenberry and Miller (1977) provide simulated critical
values that differ only slightly from the values corresponding to the case N > 10.
Under the null hypothesis $F = F_0$, our augmented sample of Y's will be
distributed as a sample of size N from a uniform (0,1) distribution and so the
relevant critical value of $U^2_{MOD}$ is the customary value for a random sample of
size N.
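The augmentation and the modified Watson statistic can be sketched as follows; the generator seeding and function names are our own.

```python
import numpy as np

rng = np.random.default_rng(2002)

def augment_uniform(u, i, n):
    """Complete each ranked-set observation u[j] = U_{i_j:n_j} under the null by
    simulating its n_j - 1 missing uniform order statistics, as described above."""
    full = []
    for uj, ij, nj in zip(u, i, n):
        full.append(uj)
        full.extend(rng.uniform(0.0, uj, ij - 1))    # (i_j - 1) values below u_j
        full.extend(rng.uniform(uj, 1.0, nj - ij))   # (n_j - i_j) values above u_j
    return np.sort(full)

def watson_u2_mod(y):
    """Watson's U^2 of (11.5) with Stephens' modification (11.6)."""
    y = np.sort(np.asarray(y, dtype=float))
    N = len(y)
    i = np.arange(1, N + 1)
    u2 = 1.0 / (12 * N) + np.sum(((2 * i - 1) / (2.0 * N) - y) ** 2) \
         - N * (y.mean() - 0.5) ** 2
    return (u2 - 0.1 / N + 0.1 / N ** 2) * (1 + 0.8 / N)
```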

11.4 Power
In Section 11.3, we introduced 3 different tests of goodness-of-fit for $H : F =
F_0$ where $F_0$ is a completely specified distribution. Critical values for the test
statistics T, $\tilde T$ and $U^2_{MOD}$, based on simulation studies, were provided. We
turn now to consider how well the tests perform. A priori it is not easy to
visualize which of the 3 tests will be best for particular situations. A small
power simulation study, reported in Section 11.3, indicated that sometimes T
is more powerful than $\tilde T$ and sometimes the situation is reversed.
Simulated power studies can provide some guidance in selection of a test
from the 3 available. They provide only limited information since the results
obtained may well be specific to the particular alternatives considered and the
particular sample sizes used, etc. The simulation studies to be reported in this
section are based on balanced ranked set samples with a spectrum of choices of
values of m and n. In all cases a test of size .05 was used. The null hypothesis
was that F is a standard normal distribution. Four alternative hypotheses were
considered: Normal (0,4), Normal (2,1), Logistic (0,1) and Logistic (0,4).
The simulated power determinations are based on 10,000 replications for
T and $\tilde T$ and on 20,000 replications for $U^2_{MOD}$. The results for a selection of
values of m and n are displayed in Tables 11.5-11.7. More extensive tables will
be presented elsewhere.
Comparison of Tables 11.5-11.7 reveals that, almost uniformly over the
range of values of m and n considered, the $\tilde T$ test is more powerful than the T
test, which itself is more powerful than the $U^2_{MOD}$ test. It must be emphasized
that this may be specific to the choice of null hypothesis (normal) and the
choices of alternatives. We know that for a logistic null hypothesis, as reported
in Section 11.3, $\tilde T$ is not uniformly more powerful than T. More extensive
and detailed power simulations will be required to resolve the issue. For the
moment, however, for a standard normal null hypothesis the test based on $\tilde T$
seems to be the one to choose.
The reader will have noticed from Tables 11.5-11.7 that none of the tests
is really able to distinguish standard normal data from standard logistic data.
This is especially true for the test based on T which actually appears to be
biased since the test of size .05 actually rejects normality less than 5% of the

Table 11.5: Power of the T test of size .05 with a standard normal null hypoth-
esis
n m 95th Percentile N(0,1) N(0,4) N(2,1) L(0,1) L(0,4)
1 1 2.709000 0.049300 0.319200 0.514400 0.057200 0.296400
3 1 7.610900 0.055000 0.191400 0.925600 0.039900 0.156300
5 1 11.317700 0.055100 0.286700 0.999700 0.042000 0.242700
10 1 19.351601 0.049500 0.638600 1.000000 0.048200 0.491700

1 3 5.711900 0.051100 0.446000 0.813000 0.040000 0.374200


3 3 16.657600 0.051100 0.452200 1.000000 0.034600 0.366800
5 3 25.775999 0.050300 0.636800 1.000000 0.036900 0.508400
10 3 45.967800 0.048500 0.975200 1.000000 0.046800 0.909700

1 5 8.461500 0.049900 0.576100 0.939900 0.038300 0.481800


3 5 24.466200 0.056000 0.640900 1.000000 0.029200 0.538100
5 5 38.527500 0.048300 0.842100 1.000000 0.032700 0.721600
10 5 70.394302 0.046800 0.999300 1.000000 0.047000 0.986100

1 10 14.831700 0.050300 0.794400 0.997100 0.030300 0.695800


3 10 43.122898 0.050700 0.896600 1.000000 0.025800 0.810600
5 10 68.763298 0.051300 0.985800 1.000000 0.026100 0.947600
10 10 128.201996 0.046000 1.000000 1.000000 0.042600 1.000000

1 25 32.535400 0.047500 0.984100 1.000000 0.021000 0.946800


3 25 95.025101 0.054900 0.999000 1.000000 0.018600 0.991200
5 25 154.007904 0.045500 0.999900 1.000000 0.020800 1.000000
10 25 293.811707 0.045700 1.000000 1.000000 0.037900 1.000000

1 100 114.875999 0.053200 1.000000 1.000000 0.006500 1.000000


3 100 338.932709 0.053900 1.000000 1.000000 0.004900 1.000000
5 100 556.245789 0.047800 1.000000 1.000000 0.007400 1.000000
10 100 1084.826050 0.046200 1.000000 1.000000 0.026900 1.000000

Table 11.6: Power of the $\tilde T$ test of size .05 with a standard normal null hypoth-
esis

n m 95th Percentile N(0,1) N(0,4) N(2,1) L(0,1) L(0,4)


1 1 4.098600 0.050300 0.196600 0.631800 0.048300 0.185500
3 1 10.204200 0.054000 0.417400 0.993100 0.049300 0.370300
5 1 14.379300 0.051800 0.586000 1.000000 0.052700 0.530800
10 1 22.108900 0.048900 0.906300 1.000000 0.060800 0.848100
1 3 10.962700 0.050200 0.383100 0.890400 0.059300 0.345100
3 3 23.107800 0.048100 0.721300 1.000000 0.058300 0.663200
5 3 31.941601 0.050500 0.904600 1.000000 0.070200 0.851700
10 3 50.889900 0.045300 0.999300 1.000000 0.067400 0.997000
1 5 16.247101 0.049300 0.494200 0.969600 0.071100 0.443300
3 5 33.345798 0.054800 0.857000 1.000000 0.068800 0.807100
5 5 46.752800 0.046500 0.981300 1.000000 0.067600 0.959300
10 5 76.628700 0.043800 1.000000 1.000000 0.077400 0.999900
1 10 26.764400 0.045400 0.694100 0.999100 0.080400 0.643800
3 10 56.088402 0.048200 0.978100 1.000000 0.074000 0.958400
5 10 80.236504 0.047700 0.999300 1.000000 0.073600 0.997000
10 10 137.176498 0.041500 1.000000 1.000000 0.085100 1.000000
1 25 52.048599 0.044500 0.933300 1.000000 0.101000 0.894900
3 25 115.263100 0.048900 1.000000 1.000000 0.090900 0.999700
5 25 171.659302 0.046400 1.000000 1.000000 0.088600 1.000000
10 25 306.731506 0.049800 1.000000 1.000000 0.099400 1.000000
1 100 151.692703 0.043400 1.000000 1.000000 0.152000 0.999900
3 100 376.344513 0.043500 1.000000 1.000000 0.113000 1.000000
5 100 589.077209 0.044700 1.000000 1.000000 0.103400 1.000000
10 100 1109.649658 0.040900 1.000000 1.000000 0.134300 1.000000

Table 11.7: Power of the $U^2_{MOD}$ test of size .05 with a stan-
dard normal null hypothesis
n m N(0,1) N(0,4) N(2,1) L(0,1) L(0,4)
1 1
3 1 0.04755 0.14545 0.71445 0.04755 0.11595
5 1 0.04970 0.21365 0.96800 0.05045 0.16185
10 1 0.04675 0.46550 1.00000 0.05090 0.34700

1 3 0.05505 0.18225 0.57390 0.05660 0.14675


3 3 0.04970 0.33290 0.99520 0.05195 0.25010
5 3 0.05220 0.52980 1.00000 0.05140 0.39555
10 3 0.05075 0.8948 1.00000 0.05775 0.76895

1 5 0.05015 0.27310 0.82135 0.05220 0.20155


3 5 0.04915 0.51920 1.00000 0.05210 0.39370
5 5 0.04925 0.75505 1.00000 0.05860 0.59155
10 5 0.05265 0.98940 1.00000 0.06780 0.94410

1 10 0.04995 0.50465 0.99230 0.05895 0.37085


3 10 0.04755 0.82990 1.00000 0.06070 0.67680
5 10 0.05050 0.97180 1.00000 0.06060 0.89150
10 10 0.04845 1.00000 1.00000 0.08535 0.99875

1 25 0.05010 0.89535 1.00000 0.06230 0.76280


3 25 0.05135 0.99740 1.00000 0.07750 0.97630
5 25 0.05215 1.00000 1.00000 0.09565 0.99960
10 25 0.04715 1.00000 1.00000 0.14555 1.00000

1 100 0.04935 1.00000 1.00000 0.12535 0.99990


3 100 0.04675 1.00000 1.00000 0.17100 1.00000
5 100 0.04680 1.00000 1.00000 0.25155 1.00000
10 100 0.05215 1.00000 1.00000 0.48115 1.00000

Table 11.8: Ranked set sample of shrub sizes

Initial data
Rank: 1  0.79  0.20  0.57  0.35  0.75
Rank: 2  1.45  0.97  0.97  0.98  1.50
Rank: 3  0.52  0.62  2.54  2.12  1.86

Log data
Rank: 1  -0.235722  -1.609440  -0.562119  -1.049820  -0.287682
Rank: 2  0.371564  -0.030459  -0.030459  -0.020203  0.405465
Rank: 3  -0.653926  -0.478036  0.932164  0.751416  0.620576

time when the data have a standard logistic distribution. The tests based on
$\tilde T$ and $U^2_{MOD}$ do better, but an embarrassingly low power is achieved for a
standard logistic alternative even for large values of m and n. It has been
observed by many authors that the normal and the logistic densities are not
easy to distinguish. The current study reinforces that observation.

11.5 Composite Null Hypotheses


If an hypothesis involves a parametric family of distributions then, prior to
transforming to uniform or exponential order statistics, it will be necessary to
utilize a ranked set based estimate of the unknown parameters. Either an EM
algorithm implementation of maximum likelihood or a Gibbs sampler based
diffuse prior Bayesian estimation procedure [Kim and Arnold (1999)] can be
used for this part of the analysis. Simulation based evaluations of the power of
such procedures in balanced ranked set settings are currently underway.
To illustrate the approach consider the following data set from Muttlak and
McDonald (1990). They report an application in which interest centers on the
size of shrubs. In the application the sample contained 46 shrubs. From the
initial 46 shrubs a balanced ranked set sample was taken with n = 3 and m = 5.
The data are in Table 11.8. In this example we test whether or not the sample is
lognormally distributed. Thus, we take logarithms and test for normality.
To test for normality we use the following steps:
Step 1: We obtain the initial crude estimates of $\mu$ and $\sigma$ using the estimates

\[
\hat\mu = \sum_{j=1}^{k} b_j X_{i_j:n_j} \tag{11.7}
\]

and

\[
\hat\sigma = \sum_{j=1}^{k} c_j X_{i_j:n_j}, \tag{11.8}
\]

where the coefficients $b_j$ and $c_j$ are given by

(11.9)

(11.10)

and

\[
a_j = E(X_{i_j:n_j}). \tag{11.11}
\]
These estimates for the data in Table 11.8 are:
$\hat\mu = -0.125112$; $\hat\sigma = 0.283775$.
Step 2: We complete the sample by simulating the missing data using the
current estimates of $\mu$ and $\sigma$. To this end we first transform the data to a uniform
sample using the cdf associated with these values of $\mu$ and $\sigma$, we simulate the
uniform missing data, and finally, we return to our normal sample.
Step 3: We calculate $\bar x = \sum_{i=1}^{N} x_i/N$ and $s^2 = \sum_{i=1}^{N} (x_i - \bar x)^2/(N - 1)$ using
the actual completed sample.
Step 4: We simulate $\sigma_i^2$, an inverted gamma $IG((N-1)/2, (N-1)s^2/2)$ random value.
Step 5: We simulate $\mu_i$, a normal $N(\bar x, \sigma_i^2/N)$ random value.
Step 6: We repeat Steps 2 to 5 $N_1 + N_2$ times (in the example we have used
$N_1 = 500$, $N_2 = 500$).
Step 7: We disregard the first $N_1$ iterations and then estimate the parameters
using

\[
\hat\mu = \frac{1}{N_2} \sum_{i=N_1+1}^{N_1+N_2} \mu_i = -0.125202; \qquad
\hat\sigma^2 = \frac{1}{N_2} \sum_{i=N_1+1}^{N_1+N_2} \sigma_i^2 = 0.502888.
\]

Step 8: We complete the sample, as in Step 2, but using the estimates for $\mu$
and $\sigma$ from Step 7.
Step 9: We obtain an i.i.d. uniform sample by transforming the sample using
the transformation

\[
u_i = \Phi\!\left(\frac{x_i - \hat\mu}{\hat\sigma}\right),
\]

where $\Phi(\cdot)$ is the cdf of the standard N(0, 1) distribution, and simulating the
missing uniform observations. This sample of size 45, after being sorted, becomes:
0.00158157,0.0315106,0.0329857,0.0552611, 0.0624795, 0.0880001,
0.100829,0.101379,0.112136,0.121759,0.146543, 0.192474, 0.214823,
0.217738, 0.24146, 0.252666, 0.292152, 0.311323, 0.373312, 0.395422,
0.413025,0.448858,0.536143,0.539685,0.569218, 0.574718, 0.574718,
0.582695,0.599063,0.623866,0.624577, 0.771671,0.803022,0.838381,
0.841018,0.854342,0.895774,0.900308,0.930962, 0.942142, 0.954493,
0.959348, 0.980614, 0.982249, 0.993978

Step 10: We test the uniformity of this sample using the $U^2$ statistic given by

\[
U^2 = \frac{1}{12N} + \sum_{i=1}^{N} \left(\frac{2i-1}{2N} - u_{(i)}\right)^2 - N(\bar u - 0.5)^2 = 0.141958, \tag{11.12}
\]

where $u_{(i)}$ is the ith order statistic from the transformed sample of size N. The
value of the test statistic is modified as follows prior to entering the table of
critical values:

\[
U^2_{MOD} = \left\{U^2 - \frac{0.1}{N} + \frac{0.1}{N^2}\right\}\left\{1 + \frac{0.8}{N}\right\} = 0.14227.
\]

Observe that we get a value that is smaller than the critical value 0.187 at the
0.05 significance level. Thus, we cannot reject the assumption that the sample
comes from a lognormal population.
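For reference, a compact sketch of the conditional draws in Steps 3-5 follows, under our assumed diffuse-prior parametrization of the inverted gamma.

```python
import numpy as np

rng = np.random.default_rng(11)

def gibbs_sweep(x_completed):
    """One pass of Steps 3-5 on a completed sample: draw sigma^2 from its
    inverted gamma (scaled inverse chi-square) conditional, then mu from its
    normal conditional."""
    N = len(x_completed)
    xbar = np.mean(x_completed)
    s2 = np.var(x_completed, ddof=1)
    sigma2 = (N - 1) * s2 / rng.chisquare(N - 1)   # IG((N-1)/2, (N-1)s^2/2) draw
    mu = rng.normal(xbar, np.sqrt(sigma2 / N))
    return mu, sigma2
```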

11.6 Remarks
(i) Only minor modifications of our ranked set procedures will be required
if more than one unit in each ranked set is measured, i.e. if some of the
$X_{i_j:n_j}$'s are dependent, coming from the same sample.

(ii) In spite of our expectations, we found that the $\chi^2$-approximations for
both T and $\tilde T$ were not very accurate. As a result, it is important that
more extensive tables for both statistics be made available as planned.

References
1. Arnold, B. C., Balakrishnan, N., and Nagaraja, H. N. (1998). Records,
New York: John Wiley & Sons.

2. D'Agostino, R. B. and Stephens, M. A. (1986). Goodness-of-Fit Tech-
niques, New York: Marcel Dekker.

3. Kim, Y. H. and Arnold, B. C. (1999). Parameter estimation under gener-


alized ranked set sampling, Statistics and Probability Letters, 42, 353-360.

4. Muttlak, H. A. and McDonald, L. L. (1990). Ranked set sampling with
size-biased probability of selection, Biometrics, 46, 435-445.

5. Quesenberry, C. P. and Miller, F. L., Jr. (1977). Power studies of some
tests for uniformity, Journal of Statistical Computation and Simulation, 5,
169-191.

6. Stephens, M. A. (1970). Use of the Kolmogorov-Smirnov, Cramer-von


Mises and related statistics without extensive tables, Journal of the Royal
Statistical Society, Series B, 32, 115-122.

7. Watson, G. S. (1961). Goodness-of-fit tests on a circle, Biometrika, 48,


109-114.
PART IV
REGRESSION AND GOODNESS-OF-FIT TESTS
12
Gibbs Regression and a Test for Goodness-of-Fit

Lynne Seymour
University of Georgia, Athens, Georgia

Abstract: We explore a model for social networks that may be viewed either
as an extension of logistic regression or as a Gibbs distribution on a com-
plete graph. The model was developed for data from a mental health service
system which includes a neighborhood structure on the clients in the system.
This neighborhood structure is used to develop a Markov chain Monte Carlo
goodness-of-fit test for the fitted model, with pleasing results.

Keywords and phrases: Gibbs distribution, Markov chain Monte Carlo,


Pearson's goodness-of-fit statistic, dependent binary data

12.1 Introduction
Researchers in the social sciences require an understanding of the social net-
work within which individuals act, as well as the individual interactions within
that network. In an attempt to capture the global and local interactions si-
multaneously, spatial models, in which the spatial adjacency matrix is replaced
by a matrix of social interdependencies, were considered [Doreian (1980, 1982,
1989)] with some success [e.g., Gould (1991)]. Another modeling effort looks
at log-linear models. These models, in which the social interdependency is the
observed random variable, have also been successful in modeling social networks
[Strauss and Ikeda (1990), Galaskiewicz and Wasserman (1993) and Wasserman
and Pattison (1996)]. Such logistic regression models are called Markov random
graphs in the social science literature.
In statistical image analysis, the Gibbs distribution - which was originally
introduced by Gibbs (1902) to model particle interactions in statistical mechan-

161
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
162 L. Seymour

ics - models categorical responses at each point of a regular grid; specifically,


the model is known as a Gibbs random field [Besag (1974) and Geman and Ge-
man (1984)], and may be interpreted intuitively as a distribution describing an
image - a collection of (stochastic) colors at each pixel on a computer screen.
Within the random field, however, "neighboring pixels" are those which are
within some fixed distance of one another, which is not necessarily suitable for
modeling a social network. This research article explores using a Gibbs distri-
bution on a complete graph (i.e., one in which each point is a neighbor of every
other point) as a model for social networks - a model which was initially pro-
posed by Seymour et al. (2000) and which we have dubbed a Gibbs regression.
The social network interpretation of this model sees individuals as vertices on
a complete graph, with categorical responses at each vertex.
This paper proceeds in Section 12.2 with a brief description of the data
and of the Gibbs regression model; more details may be found in Seymour
et al. (2000). Section 12.3 fits the model, and develops and implements a
goodness-of-fit test based on the neighborhood structure necessary for Gibbs
distributions. Section 12.4 concludes with discussion.

12.2 The Motivation and the Model


This implementation of Gibbs regression was motivated by data from a study
which investigates how the continuity of an individual client's case management
(i.e., the supervision of the client's care within the service system) is affected
by the management structure of a city's social services [Lehman et al. (1994)
and Morrissey et al. (1994)]. Continuity of case management was considered
by the experts involved to be an indicator of the stability of the service system.
In that data, there is $w_{ij}$, the number of organizations visited in common
by clients i and j, which will be used as a measure of dependence between those
individuals. In addition, there are traditional covariates: for the ith client, we
have

$x_{i1}$ indicating sex (0 if female; 1 if male),

$x_{i2}$ indicating age (1 if 18-27; 2 if 28-33; 3 if 34-39; 4 if 40+),

$x_{i3}$ indicating level of education (1 if did not complete high school; 2
if completed high school but did not attend college; 3 if some college
education),

$x_{i4}$ indicating schizophrenia diagnosis (0 if not schizophrenic; 1 otherwise),


and

$x_{i5}$ indicating marital status (0 if not married; 1 otherwise).

The dependent variable, $Y_i$, is a discrete measure of whether or not the ith
client received case management.
For this study, $Y_i$ is taken to be the simplest case - a symmetric binary
variable (-1 or 1) - but in general $Y_i$ may be a multinomial response. For
example, the response of interest here is the number of case managers a client
has had. Ideally, the responses should reflect whether client i has had no
case manager, has had one case manager, or has had more than one case man-
ager. Due to model complexity and the available data, however, $Y_i$ is herein
considered as a binary variable.
Initial considerations imply a model of the form

\[
P\{Y_i = 1\} = \frac{\exp(\beta' x_i)}{\exp(\beta' x_i) + \exp(-\beta' x_i)}, \tag{12.1}
\]

where

$y_i \in \{-1, 1\}$ is the response of the ith client;

$\beta = (\beta_0, \ldots, \beta_5)'$ is a vector of regression coefficients ($'$ denotes trans-
pose); and

$x_i = (1, x_{i1}, \ldots, x_{i5})'$ is the vector of covariates for the ith client.
If there is no interdependence, then the model in (12.1) - the logistic regression
model - is adequate. However, an extension of the logistic regression model
which can account for client interdependence is required. The description of
and estimation strategies for such a model follow, the bulk of which may also
be found in Seymour et al. (2000).
A general solution to this problem was given by Besag (1974). Assuming
that there are no interactions of orders greater than two, and that the second-
order interactions are determined by the Wij'S, the general formula in Seymour
et al. (2000) yields

\[
P\{Y_1 = y_1, \ldots, Y_N = y_N\} \propto \exp\left\{\sum_{i=1}^{N} y_i\, \beta' x_i + \frac{1}{2} \sum_{i=1}^{N} \sum_{j \ne i} \rho(w_{ij})\, y_i y_j\right\}, \tag{12.2}
\]

in which the interdependencies are captured by combining $\{w_{ij}\} = W$ with
a real-valued function $\rho(\cdot)$ governing the strength of dependence for a given
value of $w_{ij}$. An elementary version of this sort of interaction has in fact been
considered for network autocorrelation models by Doreian (1980), specifically
with $\rho(w_{ij}) = \alpha w_{ij}$ where $\alpha$ is a real-valued parameter to be estimated. The
conditional form of (12.2) is
164 L. Seymour

\[
P\{Y_i = 1 \mid Y_j = y_j, j \ne i\}
= \frac{\exp\{\beta' x_i + \sum_{j \ne i} \rho(w_{ij})\, y_j\}}
{\exp\{\beta' x_i + \sum_{j \ne i} \rho(w_{ij})\, y_j\} + \exp\{-\beta' x_i - \sum_{j \ne i} \rho(w_{ij})\, y_j\}}, \tag{12.3}
\]

which is a Gibbs distribution on the network of clients, which we call Gibbs
regression. Note that (12.3) reduces cleanly to (12.1) if $\rho(w_{ij}) \equiv 0$ for all values
of $w_{ij}$.
The Gibbs regression model bears similarities to both the spatial and ran-
dom graph approaches from the social sciences. While the interactions employed
in (12.3) are similar to those used in the spatial approach, the responses in that
approach are not categorical but are typically continuous; e.g., the spatial ap-
proach might use a model of the form $y_i = \beta' x_i + \sum_{j \ne i} \rho(w_{ij})\, y_j$. The random
graph approach is based on log-linear models such as (12.1) and (12.3), with
much of the current methodology coming from Markov random field theory.
Indeed, Gibbs and Markov random fields are equivalent if the client has a fixed
and finite set of neighbors which is not dependent upon the number of clients in
the network Besag (1974). But the Markov property clearly does not hold for
the Gibbs regression, since all clients are neighbors no matter how many clients
are within the service system. More importantly, the random graph approach
considers the client interactions as random responses, whereas the Gibbs regres-
sion in (12.3) assumes that all clients interact in a deterministic - and possibly
zero-valued - way.
A major difficulty with (12.2) is that the normalizing constant required
to make it a proper likelihood function requires summation over all possible
outcomes, which is a prohibitively large set. Though this makes the direct
calculation of the likelihood function impossible, there are two methods of cir-
cumventing the problem.
One such method is to use the pseudolikelihood function of Besag (1975),
which is calculated by multiplying the probabilities in (12.3) over all clients.
Indeed, Strauss and Ikeda (1990) use the pseudolikelihood for estimation in
the random graph approach. Though the maximum pseudolikelihood estimate
(MPLE) is consistent [see Comets (1992)], there are at least two problems with
using the MPLE, particularly for social network models. The first of these is
that the variance of the MPLE is unknown and/or intractable, depending on
the strength of dependence in the random field [Possolo (1991), Cressie (1993)
and Seymour (2000)]. Without the variance, there is no statistical inference
for parameters based on the MPLE. The second problem is that the MPLE is
unstable for "small" random fields, where the concept of "small" depends on
the strength of dependence in the random field [see Seymour (2000)]. If the
dependence is very weak, then a "small" sample size is 100 clients; the sample
size considered "small" increases as the strength of dependence increases.
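A minimal numpy sketch of the conditional probability (12.3) and of the log-pseudolikelihood just described is given below; the array names and shapes are illustrative assumptions. Maximizing this function over the parameters yields the MPLE.

```python
import numpy as np

def conditional_prob(beta, rho_w, x, y, i):
    """P{Y_i = 1 | Y_j = y_j, j != i} from (12.3); rho_w is the N x N matrix of
    values rho(w_ij) with a zero diagonal, x is the N x 6 covariate matrix, and
    y is the vector of -1/+1 responses."""
    eta = x[i] @ beta + rho_w[i] @ y   # beta'x_i + sum_{j != i} rho(w_ij) y_j
    return np.exp(eta) / (np.exp(eta) + np.exp(-eta))

def log_pseudolikelihood(beta, rho_w, x, y):
    """Besag's log-pseudolikelihood: the sum over clients of the log conditional
    probability of the observed response."""
    total = 0.0
    for i in range(len(y)):
        p1 = conditional_prob(beta, rho_w, x, y, i)
        total += np.log(p1 if y[i] == 1 else 1.0 - p1)
    return total
```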
A computational technique which circumvents the intractable likelihood
function in a more statistically satisfying way is the Markov chain Monte
Carlo (MCMC) approximation to the likelihood function derived by Geyer and
Thompson (1992). The value that maximizes that MCMC approximation with
respect to e is called a Monte Carlo MLE (MCMLE) and converges almost
surely to the true MLE as the length of the chain goes to infinity. In addition,
since this technique estimates the true log-likelihood function (to within a mul-
tiplicative constant), the approximation of standard errors using the observed
information matrix [approximated numerically via quasi-Newton methods; see
Georgii (1988)] is valid.
In principle, eo may be any value in the parameter space, but in practice,
it is known that the procedure works best if eo is not too far from the MLE.
For this purpose, the current demonstration uses the MLE under independence
as eo since it is easily implemented. In order to get values of eo which are
closer to the dependence MLE, subsequent values of the Monte Carlo MLE
are iteratively assigned to eo as the Monte Carlo procedure is run again. This
procedure was suggested by Geyer and Thompson (1992), and results obtained
using this procedure appear to be numerically stable.
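As a rough illustration of the Geyer-Thompson idea (a sketch under our own conventions, not the chapter's code): writing the Gibbs regression as an exponential family with natural parameter $\theta$ and sufficient statistic $S(y)$, the log-likelihood ratio against $\theta_0$ can be approximated from a single chain simulated at $\theta_0$.

```python
import numpy as np

def mcmc_log_lik_ratio(theta, theta0, s_obs, chain_S):
    """Sketch of the Geyer-Thompson (1992) approximation of
    log L(theta) - log L(theta0): the intractable ratio of normalizing
    constants equals E_{theta0} exp{(theta-theta0)'S(Y)}, which is
    estimated by an average over the sufficient statistics S(y^(t)) of a
    Markov chain y^(1..M) simulated under theta0.

    s_obs: S(y) at the observed network; chain_S: (M, dim) array.
    """
    delta = np.asarray(theta) - np.asarray(theta0)
    log_mean = np.logaddexp.reduce(chain_S @ delta) - np.log(len(chain_S))
    return s_obs @ delta - log_mean
```

Maximizing this surrogate in $\theta$ and re-running the chain at the maximizer reproduces the iterative updating of $\theta_0$ described above.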

12.3 Application and Evaluation of the Model


The data set described above is from a metropolitan area in the Midwestern United States. There are 34 clients who have changed case management (response is +1) within this mental health service system, and 63 who have retained the same case manager (response is -1) throughout the demonstration.

Table 12.1 lists information about the interaction matrix $W$. For example, there are 908 pairs of clients who share no organizations in common. Of those, 436 share a positive response, so that 48.02% of those pairs who have no organizations in common have each changed case management. For the sake of parsimony, it is desirable to combine or "lump" some of the values of $W_{ij}$ together. Indeed, this was also indicated as different versions of the model were fitted to the data. The optimal lumping chosen is also given in Table 12.1. Notice that by combining the values of 2, 3, and 4, the percentage of positive responses for 2 without lumping and for 2+ with lumping has not changed significantly (in an off-the-cuff comparison, a p-value of 0.7114 fails to reject the null hypothesis that these two percentages are equal).

Table 12.1: Interaction profile

              Without lumping                   With lumping
 W_ij    #W_ij   #{Y_i=Y_j=1}   %age       #W_ij   #{Y_i=Y_j=1}   %age
  0        908        436      0.4802       908        436       0.4802
  1       2618       1079      0.4121      2618       1079       0.4121
  2        994        393      0.3954      1130 (2+)   438       0.3876
  3        121         43      0.3554
  4         15          2      0.1333

In order to avoid degeneracy in (12.2), the function $\rho(\cdot)$ must be restricted in some way. There are several such restrictions which could make sense in this context, such as forcing $\sum_k \rho(k)$ to be 0 or 1. In the current application, we impose the restriction that $\rho(0) = 0$, which indicates that clients who share no organizations in common are not dependent, and we force $|\rho(\cdot)|$ to be small (specifically, less than 1), which limits the dependence structure in a reasonable way given the relatively small number of individuals in the study. Indeed, there is currently no understanding of how such a strong dependence structure - called phase transition in the random field literature - will affect inference for a random field on a complete graph, though there is some understanding of inference for strongly dependent random fields on the integer lattice [Georgii (1988) and Comets (1992)].
Table 12.2 compares the fit of three similar models to the data: the logistic (independence) regression; an extension of the logistic regression, called "logistic(+)", in which $x_{i6} = \sum_j W_{ij}$ is taken to be a covariate; and the Gibbs (dependence) regression. The logistic and logistic(+) regressions were fit using straightforward maximum likelihood parameter estimation. The Gibbs regression was fit using the MCMC approximation to the log-likelihood, as described in Section 12.2. A backward elimination procedure was used to eliminate the parameters which were not significantly different from zero. Estimates of the standard errors for the remaining parameter estimates are given in parentheses.

All three models indicate that a client's sex and level of education (variables 1 and 3) are potentially significant predictors of whether the client changes case management (males are more likely to change case management; the likelihood of changing case management increases as educational level increases), whereas a client's age, schizophrenia diagnosis, and marital status (variables 2, 4, and 5) are nowhere near significant. An interesting phenomenon is that inclusion of the $W$ information forces the intercept to be zero in both the logistic(+) and Gibbs regressions. Particularly for Gibbs regression, this makes some sense, since an intercept should depend on the neighboring responses of a given client.
Table 12.2: Parameter estimates

 Parameter     Logistic           Logistic(+)         Gibbs
  β0         -0.5660 (0.2997)      0                   0
  β1          0.3431 (0.2166)    0.4836 (0.2307)     0.4080 (0.2546)
  β2          0                    0                   0
  β3          0.4477 (0.1649)    0.4773 (0.1179)     0.4534 (0.1476)
  β4          0                    0                   0
  β5          0                    0                   0
  β6          -                  -0.0066 (0.0020)      -
  ρ(0)        -                    -                   0
  ρ(1)        -                    -                 -0.0312 (0.0141)
  ρ(2)        -                    -                 -0.0224 (0.0220)
  NLLH       57.4805             57.6425             48.9700

Using the negative log-likelihoods and the standard errors of the parameter estimates as a guide, the Gibbs regression appears to contribute something significant towards explaining the relationships involved.
In order to further assess the fit of the Gibbs regression, an MCMC version of Pearson's goodness-of-fit statistic may be calculated for the "contingency table" interaction profile in Table 12.1 (with lumping). In a traditional contingency table setting, let $c$ be the number of categories into which $N$ responses uniquely fall. Let $O_i$ be the observed number of responses in category $i \in \{1,\dots,c\}$, $N = O_1 + \cdots + O_c$. Let $E_i$ be the expected number of responses in category $i \in \{1,\dots,c\}$ under an assumed model. Then Pearson's goodness-of-fit statistic is given by

$$\sum_{i=1}^{c} \frac{(O_i - E_i)^2}{E_i}, \qquad (12.4)$$

which has a $\chi^2(c-1)$ distribution under the null hypothesis that the assumed model is the true model, assuming that the responses making up the contingency table are independent. The following development is the first goodness-of-fit test developed for Gibbs distributions.
In the current setting, the categories are the numbers of organizations shared in common (as in Table 12.1), a response is whether both of a given pair of clients have changed case management, the observed counts are given in the sixth column of Table 12.1, and the expected counts must be estimated via MCMC methods. In addition, the responses are not independent. Hence, the goodness-of-fit statistic (12.4) may not have a $\chi^2$ distribution.

In order to evaluate (12.4), we first generate a Markov chain of social networks via the Metropolis algorithm [Metropolis et al. (1953)], using the candidate model with both the $W$ matrix and the covariate information held constant. "Expected" counts of shared positive responses for each value of $W_{ij}$ are aggregated from the chain, and the statistic (12.4) is then calculated using the number of shared positive responses in Table 12.1 as "observed" values. In a traditional contingency table setting, since $c = 3$, the appropriate distribution for this statistic is $\chi^2(2)$. In this situation, however, the MCMC Pearson statistic for the Gibbs regression model in Table 12.2 appears to be distributed Gamma($a$, $\theta$) as in Table 12.3, where $a$ and $\theta$ depend on the length of the Markov chain. (N.B. $\chi^2(2)$ = Gamma(1, 2).) We did not explore this distribution under the logistic or logistic(+) regression models.
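The following sketch (ours, with hypothetical names) shows how such "expected" counts can be aggregated from a Metropolis chain over response vectors; field(y, i) is assumed to return $\beta' x_i + \sum_j \rho(W_{ij})y_j$ under the fitted model.

```python
import numpy as np

def mcmc_pearson(observed, field, W, y0, levels, M=1000, seed=0):
    """Sketch: Metropolis chain of +/-1 response vectors (single-site
    flips) under the candidate model, accumulating for each lumped level
    of W the count of pairs with Y_i = Y_j = +1, then Pearson's (12.4)."""
    rng = np.random.default_rng(seed)
    y, n = y0.copy(), len(y0)
    expected = np.zeros(len(levels))
    for _ in range(M):
        i = rng.integers(n)
        # flipping y_i changes the log-probability by -2 * y_i * field(y, i)
        if np.log(rng.random()) < -2.0 * y[i] * field(y, i):
            y[i] = -y[i]
        pos = (y == 1)
        both = np.outer(pos, pos)
        for l, lev in enumerate(levels):
            expected[l] += np.triu(both & (W == lev), 1).sum()
    expected /= M
    return np.sum((observed - expected) ** 2 / expected), expected
```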

Table 12.3: Gamma parameters for MCMC Pearson statistics

 Chain length      a        θ
      10        0.8636   18.1036
     100        0.8922   15.8216
    1000        0.9004   14.9529

The MCMC Pearson statistics for each of the models are shown in Table
12.4; all used a Markov chain of length 1000. The expected counts and Pearson
statistics of the logistic and logistic( +) regressions were calculated simply for
comparison; in fact, it is expected that their MCMC Pearson statistics will be
distributed differently from that of the Gibbs regression. Nevertheless, note
that the expected counts match up best with the observed counts under the
Gibbs regression.

Table 12.4: Results for MCMC Pearson statistic

  W_ij        O      E-Logistic   E-Logistic(+)    E-Gibbs
   0         436       399.315       679.513       428.423
   1        1079      1087.619      1927.686      1045.687
   2+        438       460.915       828.139       423.341
 Statistic              4.578        644.706         1.703

Table 12.5 gives the percentiles from the simulated distribution (sample size
of 5000) for the MCMC Pearson statistic under the chosen Gibbs regression for
the chain lengths shown in Table 12.5. One can easily determine that one
cannot reject the null hypothesis that the Gibbs regression fits this data.

Table 12.5: Percentiles under Gibbs regression

 Percentile   M = 10   M = 100   M = 1000
    1st        0.43      0.50      0.58
    5th        1.39      1.19      1.13
   10th        2.31      1.98      1.80
   25th        4.95      4.60      4.56
   50th       10.28      9.28      8.66
   75th       20.16     17.58     16.94
   90th       35.87     31.60     29.59
   95th       48.30     43.08     42.99
   99th       80.17     72.90     69.52

12.4 Discussion

Though the Gibbs regression model may be very effective in modeling social networks, there are some difficulties with the data and with the model.

The fit of a Gibbs regression will almost surely be improved by weighting the individual organizations according to their expected impact upon the response. Unfortunately, the data described herein gave no information about the individual organizations within the service system.

There is an abundance of model diagnostic tools in the logistic regression literature which may be extended to Gibbs regression, some of which were used in Seymour et al. (2000); however, there are no such diagnostic tools in the Gibbs random field literature. For model selection, an ad hoc kind of backwards selection from classical multiple regression was used to choose the models in Table 12.2; some other ad hoc selection criterion could easily have been used. Again, there are numerous criteria from the logistic regression literature that could be extended. In addition, there are two criteria from the Gibbs-Markov random field literature that could be used for Gibbs regression: one MCMC-based Bayesian information criterion [Seymour and Ji (1996)], and one pseudolikelihood criterion [Ji and Seymour (1996)].

Unfortunately, there are no theoretical properties for a Gibbs distribution


on a complete graph. Hopefully, the potential of the model demonstrated herein
will motivate such results.

References
1. Besag, J. E. (1974). Spatial interaction and the statistical analysis of lat-
tice systems (with discussion), Journal of Royal Statistical Society, Series
B, 36, 192-236.

2. Besag, J. E. (1975). Statistical analysis of non-lattice data, The Statisti-


cian, 24, 179-195.

3. Comets, F. (1992). On consistency of a class of estimators for exponential


families of Markov random fields on the lattice, Annals of Statistics, 20,
455-468.

4. Cressie, N. (1993). Statistics for Spatial Data, New York: John Wiley &
Sons.

5. Doreian, P. (1980). Linear models with spatially distributed data: spatial


disturbances or spatial effects, Sociological Methods & Research, 9, 29-61.

6. Doreian, P. (1982). Maximum likelihood methods for linear models: spa-


tial effect and spatial disturbance terms, Sociological Methods & Research,
10, 243-269.

7. Doreian, P. (1989). Network autocorrelation models: problems and prospects.


Paper presented at 1989 Symposium "Spatial Statistics: Past, Present,
Future", Department of Geography, Syracuse University.

8. Galaskiewicz, J. and Wasserman, S. (1993). Social network analysis: Con-


cepts, methodology, and directions for the 1990s, Sociological Methods &
Research, 22, 3-22.

9. Gibbs, J. W. (1902). Elementary Principles of Statistical Mechanics, Yale


University Press.

10. Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distribu-


tions, and Bayesian restoration of images, IEEE Transactions on Pattern
Analysis and Machine Intelligence, 6, 721-741.

11. Georgii, H. O. (1988). Gibbs Measures and Phase Transitions, Berlin:


Walter de Gruyter.

12. Geyer, C. J. and Thompson, E. A. (1992). Constrained Monte Carlo max-


imum likelihood for dependent data (with discussion), Journal of Royal
Statistical Society, Series B, 54, 657-699.

13. Gould, R. (1991). Multiple networks and mobilization in the Paris Commune, 1871, American Sociological Review, 56, 716-729.

14. Ji, C. and Seymour, L. (1996). A consistent model selection procedure


for Markov random fields based on penalized pseudolikelihood, Annals of
Applied Probability, 6, 423-443.

15. Lehman, A., Postrado, L., Roth, D., McNary, S., and Goldman, H. (1994).
An evaluation of continuity of care, case management, and client outcomes
in the Robert Wood Johnson Program on chronic mental illness, The
Milbank Quarterly, 72, 105-122.

16. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H.,


and Teller, E. (1953). Equations of state calculations by fast computing
machines, Journal of Chemical Physics, 21, 1087-1092.

17. Morrissey, J. P., Calloway, M., Bartko, W. T., Ridgley, S., Goldman, H.,
and Paulson, R. L. (1994). Local mental health authorities and service
system change: Evidence from the Robert Wood Johnson Foundation
Program on Chronic Mental Illness, The Milbank Quarterly, 72, 49-80.

18. Nash, J. C. (1990). Compact Numerical Methods for Computers - Linear


Algebra and Function Minimisation (2nd edition), Bristol: Adam Hilger.

19. Possolo, A. (1991). Subsampling a random field, In Spatial Statistics and Imaging (Ed., A. Possolo), Vol. 20, pp. 286-294, IMS Lecture Notes - Monograph Series.

20. Seymour, L. (2000). A note on the variance of the maximum pseudo-


likelihood estimator, Submitted to Proceedings of the Symposium on Sto-
chastic Processes, Athens, Georgia.

21. Seymour, L. and Ji, C. (1996). Approximate Bayes model selection crite-
ria for Gibbs-Markov random fields, Journal of Statistical Planning and
Inference, 51, 75-97.

22. Seymour, L., Smith, R., Calloway, M., and Morrissey, J. P. (2000). Lattice
models for social networks with binary data, Technical Report 2000-24,
Department of Statistics, University of Georgia.

23. Strauss, D. and Ikeda, M. (1990). Pseudolikelihood estimation for social


networks. Journal of the American Statistical Association, 85, 204-212.

24. Wasserman, S. and Pattison, P. (1996). Logit models and logistic regres-
sions for social networks: 1. An introduction to Markov graphs and p*,
Psychometrika, 61, 401-425.
13
A CLT for the $L_2$ Norm of Regression Estimators Under α-Mixing: Application to G-O-F Tests

Cheikh A. T. Diack
University of Warwick, Coventry, UK

Abstract: We establish a central limit theorem for the integrated square error of least squares spline estimators under α-mixing. The new theorem is used to study the behavior of an asymptotic goodness-of-fit test.

Keywords and phrases: G-O-F tests, B-splines, mixing

13.1 Introduction
The local and global properties of commonly used nonparametric estimators on
the basis of i.i.d. observations are now well known and allow powerful methods of
statistical inference such as goodness-of-fit tests. However, much less is known
in the case of dependent observations. Whereas there are many papers in
nonparametric curve estimation under mixing, only local properties are usually
established.
In this paper, we consider the problem of estimating a regression function when the design points are nonrandom and the errors are dependent. We estimate the regression function using splines. The rates of convergence for such estimators are derived by Burman (1991). Our objective is to obtain a global measure of quality for the least squares spline as an estimate of the regression function. Specifically, we derive the central limit theorem for the integrated square error of the least squares spline estimator. We apply this new result to validating an asymptotic goodness-of-fit test. We also discuss the consistency of the proposed tests.


We consider the following regression model

$$y_i = g(x_i) + Z_i, \qquad i = 1,\dots,n. \qquad (13.1)$$

The design points $\{x_i\}_{i=1}^n$ are deterministic. Without loss of generality, we assume that $x_i \in [0,1]$. We also assume that $\{Z_k, k \in \mathbb{Z}\}$ is the two-sided moving average

$$Z_t = \sum_{j=-\infty}^{+\infty} \psi_j X_{t-j}, \qquad (13.2)$$

where $X_t \sim \mathrm{IID}(0, \sigma^2)$ and the sequence $\{\psi_j\}$ is absolutely summable. Let

$$\gamma_r = \mathrm{Cov}(Z_t, Z_{t+r}), \qquad r \in \mathbb{Z}, \qquad (13.3)$$

be its covariance sequence. Let $\sigma(Z_i, i \le 0)$ and $\sigma(Z_i, i \ge j)$ be the $\sigma$-fields generated by $\{Z_i, i \le 0\}$ and $\{Z_i, i \ge j\}$, respectively. We assume that the sequence $\{Z_k, k \in \mathbb{Z}\}$ is $\alpha$-mixing, that is,

$$\alpha_j = \sup_{\substack{A \in \sigma(Z_i,\, i \le 0) \\ B \in \sigma(Z_i,\, i \ge j)}} |P(AB) - P(A)P(B)| \longrightarrow 0 \quad \text{as } j \to +\infty.$$

We also introduce the maximal coefficient of correlation

$$\rho_j = \sup_{\substack{A \in L^2(\sigma(Z_i,\, i \le 0)) \\ B \in L^2(\sigma(Z_i,\, i \ge j))}} |\mathrm{corr}(A, B)| \longrightarrow 0 \quad \text{as } j \to +\infty.$$

We assume that the spectral density of $Z$ is bounded away from zero and infinity.

13.2 Estimators

To estimate the function $g$, we use a least squares spline estimator. Let $\eta_0 = 0 < \eta_1 < \cdots < \eta_{k+1} = 1$ be a subdivision of the interval $[0,1]$ by $k$ distinct points. We define $S(k,d)$ as the collection of all polynomial splines of order $d$ (i.e., degree $\le d-1$) having the sequence of knots $\eta_1 < \cdots < \eta_k$. The class $S(k,d)$ of such splines is a linear space of functions with dimension $(k+d)$. A basis for this linear space is provided by the B-splines [see Schumaker (1981)]. Let $\{N_1,\dots,N_{k+d}\}$ denote the set of normalized B-splines. The least squares spline estimator of $g$ is defined by

$$\hat g(x) = \sum_{p=1}^{k+d} \hat\theta_p N_p(x),$$

where $\hat\theta = (\hat\theta_1,\dots,\hat\theta_{k+d})'$ minimizes

$$\sum_{i=1}^{n} \Big\{ y_i - \sum_{p=1}^{k+d} \theta_p N_p(x_i) \Big\}^2. \qquad (13.4)$$

We need to specify some conditions. For any two sequences of positive real numbers $\{a_n\}$ and $\{b_n\}$, we write $a_n \sim b_n$ to mean that $a_n/b_n$ stays bounded between two positive constants.

We assume that the sequence of knots is generated by $p(x)$, a positive continuous density on $[0,1]$, such that

$$\int_0^{\eta_i} p(x)\,dx = \frac{i}{k+1}, \qquad i = 0,\dots,k+1.$$

We set $\delta_k = \max_{0 \le i \le k} (\eta_{i+1} - \eta_i)$; then it is easy to see that

$$\delta_k \sim k^{-1}. \qquad (13.5)$$

We assume that

$$\sup_{x \in [0,1]} |H_n(x) - H(x)| = o(k^{-1}), \qquad (13.6)$$

where $H_n(x)$ is the empirical distribution function of $\{x_i\}_{i=1}^n$ and $H(x)$ is the limit distribution with positive density $h(x)$.
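As an illustration only (our sketch, not part of the chapter), the estimator $\hat g$ can be computed by ordinary least squares on the normalized B-spline basis; the knot-placement and naming conventions below are assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline

def ls_spline(x, y, knots, d):
    """Sketch of the least squares spline (13.4): regress y on the k+d
    normalized B-splines of order d with interior knots eta_1..eta_k."""
    t = np.r_[[0.0] * d, knots, [1.0] * d]       # clamped knot vector on [0,1]
    n_basis = len(t) - d                         # = k + d basis functions
    B = np.column_stack([
        BSpline.basis_element(t[p:p + d + 1], extrapolate=False)(x)
        for p in range(n_basis)])
    B = np.nan_to_num(B)                         # zero outside each support
    theta_hat, *_ = np.linalg.lstsq(B, y, rcond=None)
    return theta_hat, B                          # ghat(x_i) = (B @ theta_hat)_i
```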

13.3 A Limit Theorem

In this section we study the global properties of our estimates. The asymptotic distribution of the functional $\int \{\hat g(x) - g(x)\}^2 h(x)\,dx$ is evaluated under appropriate conditions as the sample size $n \to +\infty$. We set

$$T = \int \{\hat g(x) - g(x)\}^2 h(x)\,dx. \qquad (13.7)$$

It is convenient to introduce the following notation. Let $N(x)$ be the vector of the $N_p(x)$, $p = 1,\dots,k+d$, let $F = (N(x_1),\dots,N(x_n))$ be the $(k+d)\times n$ matrix of the basis functions evaluated at the design points, and let $M_n = n^{-1}FF'$. We define the $(k+d)\times(k+d)$ matrix $M_h$ by

$$M_h = \int N(x)N(x)'\,h(x)\,dx. \qquad (13.8)$$

We denote the $n \times n$ matrices with $(i,j)$th elements $\Gamma_{ij} = \gamma_{|i-j|}$ and $\Gamma^+_{ij} = \gamma_{i+j}$ by $\Gamma$ and $\Gamma^+$, respectively. We assume that the spectral density of $\{Z_k, k \in \mathbb{Z}\}$ is bounded away from zero and infinity. A classical result on Toeplitz matrices [see Grenander and Szegő (1984)] proves that $2\pi\lambda_{\min}\Gamma$ and $2\pi\lambda_{\max}\Gamma$ (where $\lambda_{\min}\Gamma$ and $\lambda_{\max}\Gamma$ are the smallest and the largest eigenvalues of $\Gamma$, respectively) converge, respectively, to the minimum and the maximum of the spectral density of $Z$. Hence, the assumption on the spectral density of $\{Z_k, k \in \mathbb{Z}\}$ guarantees that the eigenvalues of $\Gamma$ are bounded away from zero and infinity.

Let $\pi_{pq}$ be the $(p,q)$th element of the $n \times n$ matrix $F'M_n^{-1}M_hM_n^{-1}F$ and

$$\pi_p = \frac{1}{n} \sum_{q=1}^{n-|p|} \pi_{q,\,q+|p|}, \qquad |p| < n. \qquad (13.9)$$

We set $\Lambda = (\pi_1,\dots,\pi_{n-1})'$.

Theorem 13.3.1 Suppose that $\sum_{j=1}^{\infty} \alpha_j^{1-2/\epsilon} < \infty$ and $E|Z_1|^{2\epsilon} < \infty$ for some $\epsilon > 2$. Assume that (13.5) and (13.6) hold and $\lim_{n\to\infty}\rho_n < 1$. We also assume that $\{Z_t\}$ is the two-sided moving average

$$Z_t = \sum_{j=-\infty}^{+\infty} \psi_j X_{t-j},$$

where $X_t \sim \mathrm{IID}(0,\sigma^2)$ and $\sum_{j=-\infty}^{+\infty} |j\psi_j| < +\infty$. Then, if $k = o(n)$ and $EX_t^4 = \eta\sigma^4 < \infty$,

$$\frac{nT - \frac{1}{n}\,\mathrm{tr}\big(F'M_n^{-1}M_hM_n^{-1}F\Gamma\big) - \frac{nB_{2d}}{(2d)!\,k^{2d}} \int \big\{g^{(d)}(x)/p^d(x)\big\}^2 h(x)\,dx} {\sqrt{2\gamma_0\Lambda'(\Gamma+\Gamma^+)\Lambda + \pi_0^2\gamma_0^2 + (\eta-3)\pi_0^2\gamma_0\sum_{p=0}^{\infty}\gamma_p + \big(\sum_{|p|<n}\pi_p\gamma_p\big)^2}} \stackrel{\mathcal D}{\longrightarrow} N(0,1).$$

The above theorem says that

$$\mathrm{IMSE} \approx \frac{1}{n^2}\,\mathrm{tr}\big(F'M_n^{-1}M_hM_n^{-1}F\Gamma\big) + \frac{B_{2d}}{(2d)!\,k^{2d}} \int \big\{g^{(d)}(x)/p^d(x)\big\}^2 h(x)\,dx. \qquad (13.10)$$

When the errors in the model are uncorrelated, $\Gamma = \gamma_0 I$ (with $I$ the identity matrix) and $\frac{1}{n}\mathrm{tr}(F'M_n^{-1}M_hM_n^{-1}F\Gamma) \sim k\gamma_0$. Therefore, (13.10) agrees with the results of Agarwal and Studden (1980).

Actually, we believe that Theorem 13.3.1 is new even for the case of uncorrelated errors, when the variance is $2\gamma_0^2\sum_{p=1}^{n-1}\pi_p^2 + (\eta-1)\pi_0^2\gamma_0^2$.
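A quick way to see the order of the trace term in the uncorrelated case, a verification sketch using the definitions of $F$ and $M_n$ given above, is

$$\frac{1}{n}\,\mathrm{tr}\big(F'M_n^{-1}M_hM_n^{-1}F\,\gamma_0 I\big) = \frac{\gamma_0}{n}\,\mathrm{tr}\big(M_n^{-1}M_hM_n^{-1}FF'\big) = \gamma_0\,\mathrm{tr}\big(M_n^{-1}M_h\big) \approx (k+d)\,\gamma_0 \sim k\,\gamma_0,$$

the last step holding because $M_n$ is close to $M_h$ under (13.6), so that $M_n^{-1}M_h$ is close to the $(k+d)\times(k+d)$ identity.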
13.4 Inference

In this section we use Theorem 13.3.1 to construct consistent nonparametric tests. We prove that the tests have asymptotic powers for some local alternatives.

Goodness-of-fit tests

The null hypothesis is $H_0: g = g_0$. Against an unrestricted alternative, it is natural to use the $L_2$ distance between the estimator $\hat g$ and $g_0$. Therefore, the statistic of the test is given by

$$T = \int \{\hat g(x) - g_0(x)\}^2 h(x)\,dx.$$

Using Theorem 13.3.1, we see that the null hypothesis can be rejected at asymptotic level $\alpha$ if

$$nT \ge q_\alpha\sqrt{V} + \frac{1}{n}\,\mathrm{tr}\big(F'M_n^{-1}M_hM_n^{-1}F\Gamma\big) + \frac{nB_{2d}}{(2d)!\,k^{2d}} \int \big\{g_0^{(d)}(x)/p^d(x)\big\}^2 h(x)\,dx, \qquad (13.11)$$

where

$$V = 2\gamma_0\Lambda'(\Gamma+\Gamma^+)\Lambda + \pi_0^2\gamma_0^2 + (\eta-3)\pi_0^2\gamma_0\sum_{p=0}^{\infty}\gamma_p + \Big(\sum_{|p|<n}\pi_p\gamma_p\Big)^2 \qquad (13.12)$$

and $q_\alpha$ is the upper $100\alpha\%$ percentile of the standard normal distribution.

Specification test

Under some assumptions, the same cutoff point for the goodness-of-fit test may be used for testing composite hypotheses of the form $H_0: g = g_0(\cdot,\beta)$, where $\beta \in \Theta$ is an unknown parameter. However, we must use the statistic $T$ by substituting an estimate $\hat\beta$ for the unknown parameter $\beta$. We need the estimator $\hat\beta$ to be $\sqrt n$-consistent. Under some mild regularity conditions, estimators such as the least squares, generalized method of moments or the adaptive efficient weighted estimators satisfy the required assumption. Hence, the specification test has the same properties as the goodness-of-fit test.
Asymptotic power

To make a local power calculation for the tests described above, we need to consider the behavior of different statistics (calculated under a fixed but unknown point $g_0 \in H_0$) for a sequence of alternatives of the form

$$g_n(x) = g_0(x) + \tau_n\varphi(x), \qquad (13.13)$$

where $g_n$ lies in the alternative hypothesis, $\varphi(\cdot)$ is a known function and $\tau_n$ is a sequence of real numbers converging to zero.

Theorem 13.4.1 We suppose that the assumptions of Theorem 13.3.1 hold and that

$$nk^{-1}\tau_n^2 \to +\infty. \qquad (13.14)$$

Then the test based on $T$ has asymptotic power equal to one under the local alternatives (13.13).

Discussion

We have proposed an asymptotic goodness-of-fit test and a specification test based on least squares spline estimators. The tests are consistent and have power against some local alternatives.

In applications, the covariance matrix $\Gamma$ is unknown. Therefore, we must estimate it. The estimators which we shall use for $\gamma_r$, $r \ge 0$, are

$$\hat\gamma_r = \frac{1}{n}\sum_{i=1}^{n-r}(Y_i - \bar Y)(Y_{i+r} - \bar Y), \qquad r = 0,\dots,n-1,$$

where $\bar Y$ is the sample mean. The estimators $\hat\gamma_r$, $r = 0,\dots,n-1$, have the desirable property that, for each $n \ge 1$, the matrix $\hat\Gamma$ with elements $\hat\Gamma_{ij} = \hat\gamma_{|i-j|}$ is non-negative definite [cf. Brockwell and Davis (1991)]. However, plugging $\hat\Gamma$ in, in order to estimate the variance, does not guarantee that we have a consistent estimator. This is an open problem which is under study.
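For completeness, a minimal sketch (ours) of the computation of the $\hat\gamma_r$:

```python
import numpy as np

def gamma_hat(y):
    """Sample autocovariances (1/n) sum_{i<=n-r} (Y_i-Ybar)(Y_{i+r}-Ybar),
    r = 0,...,n-1; their Toeplitz matrix is non-negative definite."""
    n = len(y)
    yc = np.asarray(y, dtype=float) - np.mean(y)
    return np.correlate(yc, yc, mode="full")[n - 1:] / n
```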
Using regression splines may be advantageous when we want to impose properties such as monotonicity and/or convexity. We could then test the shape of a regression function by using the functional $T$ defined in (13.7) when we substitute a constrained estimator for $\hat g$. This problem is also under study.

13.5 Proofs

PROOF OF THEOREM 13.3.1. We can write $T = T_1 + T_2 + T_3$, where

$$T_1 = \int \{\hat g(x) - E\hat g(x)\}^2 h(x)\,dx,$$

$$T_2 = \int \{E\hat g(x) - g(x)\}^2 h(x)\,dx$$

and

$$T_3 = 2\int \{\hat g(x) - E\hat g(x)\}\{E\hat g(x) - g(x)\}\, h(x)\,dx.$$

From Theorem 3.1 in Agarwal and Studden (1980), we have

$$T_2 \approx \frac{B_{2d}}{(2d)!\,k^{2d}} \int \big\{g^{(d)}(x)/p^d(x)\big\}^2 h(x)\,dx. \qquad (13.15)$$

On the other hand, $ET_3 = 0$ and $\mathrm{var}(T_3) = o(\mathrm{var}(T_1))$. Therefore, to prove Theorem 13.3.1, it is enough to prove that

$$T_1 \rightsquigarrow N(U, V), \qquad (13.16)$$

where $U = \frac{1}{n^2}\,\mathrm{tr}\big(F'M_n^{-1}M_hM_n^{-1}F\Gamma\big)$ and

$$V = \frac{1}{n^2}\Big(2\gamma_0\Lambda'(\Gamma+\Gamma^+)\Lambda + \pi_0^2\gamma_0^2 + (\eta-3)\pi_0^2\gamma_0\sum_{p=0}^{\infty}\gamma_p + \Big(\sum_{|p|<n}\pi_p\gamma_p\Big)^2\Big).$$

We have

$$T_1 = \frac{1}{n^2}\,(FZ)'M_n^{-1}M_hM_n^{-1}(FZ).$$

It follows easily that $ET_1 = U$. Now, since $\{Z_k, k \in \mathbb{Z}\}$ is a strictly stationary sequence, straightforward calculations prove that

$$\mathrm{var}(T_1) = \frac{1}{n^2}\sum_{|p|<n}\sum_{|q|<n}\pi_p\pi_q\,\mathrm{cov}(Z_0Z_p, Z_0Z_q),$$

where the $\pi_p$ are defined by (13.9). This can be rewritten in the following form:

$$\mathrm{var}(T_1) = \frac{1}{n^2}\sum_{|p|<n}\sum_{|q|<n}\pi_p\pi_q\,\big\{EZ_0^2Z_pZ_q - \gamma_p\gamma_q\big\}.$$

Now $Z_t = \sum_{j=-\infty}^{+\infty}\psi_jX_{t-j}$, with $\{X_t\} \sim \mathrm{IID}(0,\sigma^2)$ and $EX_t^4 = \eta\sigma^4$. Hence, straightforward calculations show that

$$EZ_0^2Z_pZ_q = (\eta-3)\,\sigma^4\sum_{i=-\infty}^{\infty}\psi_i^2\psi_{i+p}\psi_{i+q} + \gamma_0\gamma_{p-q} + 2\gamma_p\gamma_q.$$
It follows that

$$\mathrm{var}(nT_1) = \sum_{|p|<n}\sum_{|q|<n}\pi_p\pi_q\Big\{(\eta-3)\,\sigma^4\sum_{i=-\infty}^{\infty}\psi_i^2\psi_{i+p}\psi_{i+q} + \gamma_0\gamma_{p-q} + \gamma_p\gamma_q\Big\}. \qquad (13.17)$$

One can show easily the following equality:

$$\sum_{|p|<n}\sum_{|q|<n}\pi_p\pi_q\,\gamma_0\gamma_{p-q} = 2\gamma_0\Lambda'(\Gamma+\Gamma^+)\Lambda + \pi_0^2\gamma_0^2.$$

The last term of (13.17) is equal to $\big(\sum_{|p|<n}\pi_p\gamma_p\big)^2$.


On the other hand, for $p \ge 0$,

$$\pi_p - \pi_0 = \frac{1}{n}\sum_{q=1}^{n-p}\sum_{r,s} N_r(x_q)\,m_{rs}\,\big[N_s(x_{p+q}) - N_s(x_q)\big] - \frac{1}{n}\sum_{q=n-p+1}^{n}\pi_{qq}, \qquad (13.18)$$

where $m_{rs}$ is the $(r,s)$th element of the matrix $M_n^{-1}M_hM_n^{-1}$. Using equation 6.22 in Agarwal and Studden (1980) and Lemma 6.3 in Zhou, Shen, and Wolfe (1998), we see that $|m_{rs}| = O(k\nu^{|r-s|})$ with $\nu \in (0,1)$. Therefore, it is easy to see that the second term of the right hand side of (13.18) is $O(kp/n)$. Besides,

$$|N_s(x_q) - N_s(x_{p+q})| \le |x_{p+q} - x_q|\,\sup_x |N_s'(x)|.$$

A classical result on B-splines proves that $\sup_x |N_s'(x)| = O(k)$. Moreover, using (13.6), we see that $|x_{p+q} - x_q| = O(k^{-1}p)$. Finally, we obtain

$$|\pi_p - \pi_0| = O\Big(\frac{k|p|}{n}\,\{1 + nc_n\}\Big) \qquad (13.19)$$

with $c_n \to 0$. It is worth noting that $\pi_0 \sim k$. Using this, we can write


$$\sum_{|p|<n}\sum_{|q|<n}\pi_p\pi_q\sum_{i=-\infty}^{\infty}\psi_i^2\psi_{i+p}\psi_{i+q} = \pi_0^2\sum_{|p|<n}\sum_{|q|<n}\sum_{i=-\infty}^{\infty}\psi_i^2\psi_{i+p}\psi_{i+q}$$
$$\quad + \pi_0\sum_{|p|<n}\sum_{|q|<n}\sum_{i=-\infty}^{\infty}(\pi_q - \pi_0 + \pi_p - \pi_0)\,\psi_i^2\psi_{i+p}\psi_{i+q}$$
$$\quad + \sum_{|p|<n}\sum_{|q|<n}\sum_{i=-\infty}^{\infty}(\pi_q - \pi_0)(\pi_p - \pi_0)\,\psi_i^2\psi_{i+p}\psi_{i+q}. \qquad (13.20)$$

Interchanging the order of summation, we find that

$$\lim_{n\to\infty}\ \pi_0^2\sigma^4\sum_{|p|<n}\sum_{|q|<n}\sum_{i=-\infty}^{\infty}\psi_i^2\psi_{i+p}\psi_{i+q} = \pi_0^2\gamma_0\sum_{p=0}^{\infty}\gamma_p \sim k^2.$$
We will show that the two other terms of (13.20) are $o(k^2)$. In what follows, we will denote the generic constants by $c_1, c_2, \dots$. Using the absolute summability of $\{j\psi_j\}$, we have

$$\pi_0\Big|\sum_{|p|<n}\sum_{|q|<n}\sum_{i=-\infty}^{\infty}(\pi_q - \pi_0 + \pi_p - \pi_0)\,\psi_i^2\psi_{i+p}\psi_{i+q}\Big|$$
$$\le\ c_1\,\pi_0\,\frac{k}{n}\{1+nc_n\}\sum_{|p|<n}\sum_{|q|<n}\sum_{i=-\infty}^{\infty}(|p|+|q|)\,|\psi_i^2\psi_{i+p}\psi_{i+q}|$$
$$\le\ c_2\,\pi_0\,\frac{k}{n}\{1+nc_n\}\sum_{|p|<n}\sum_{i=-\infty}^{\infty}|p|\,|\psi_i^2\psi_{i+p}|\sum_{q=-\infty}^{\infty}|\psi_{i+q}|$$
$$\le\ c_3\,\pi_0\,\frac{k}{n}\{1+nc_n\}\sum_{p=-\infty}^{\infty}\sum_{i=-\infty}^{\infty}(|p+i|+|i|)\,|\psi_i^2\psi_{i+p}|$$
$$\le\ c_4\,\pi_0\,\frac{k}{n}\{1+nc_n\}\sum_{p=-\infty}^{\infty}\sum_{i=-\infty}^{\infty}(|p+i|+|i|)\,|\psi_i\psi_{i+p}| = o(k^2).$$

Reasoning as above, we see that

$$\Big|\sum_{|p|<n}\sum_{|q|<n}\sum_{i=-\infty}^{\infty}(\pi_q - \pi_0)(\pi_p - \pi_0)\,\psi_i^2\psi_{i+p}\psi_{i+q}\Big|$$
$$\le\ c_5\,\frac{k^2}{n^2}\{1+nc_n\}^2\sum_{|p|<n}\sum_{|q|<n}\sum_{i=-\infty}^{\infty}|p|\,|q|\,|\psi_i^2\psi_{i+p}\psi_{i+q}|$$
$$\le\ c_6\,\frac{k^2}{n^2}\{1+nc_n\}^2\sum_{i=-\infty}^{\infty}\Big(\sum_{p=-\infty}^{\infty}|p|\,|\psi_i\psi_{i+p}|\Big)^2 = o(k^2).$$
Hence we have

$$\mathrm{var}(nT_1) = 2\gamma_0\Lambda'(\Gamma+\Gamma^+)\Lambda + \pi_0^2\gamma_0^2 + (\eta-3)\pi_0^2\gamma_0\sum_{p=0}^{\infty}\gamma_p + \Big(\sum_{|p|<n}\pi_p\gamma_p\Big)^2 + o(k^2). \qquad (13.21)$$

Next, we show that $T_1$ is Gaussian. But we first show that $\mathrm{var}(nT_1) \sim k^2$. Using Lemma 6.5 in Zhou, Shen, and Wolfe (1998), we see that we just need to show that $\|\Lambda\|^2 \sim k^2$. We have

$$\|\Lambda\|^2 = \sum_{p=1}^{n-1}\pi_p^2,$$

and

$$|\pi_p| \le c_8\,\frac{k}{n}\sum_{q=1}^{n-p}\sum_{r,s} N_r(x_q)\,N_s(x_{p+q})\,\nu^{|r-s|}.$$

Noting that $N_r(x_q) = 0$ when $x_q \notin (\eta_r, \eta_{r+d})$, and since $|x_{p+q} - x_q| = c_n k^{-1}p$, we have

$$|\pi_p| \le c_9\,\frac{k\,(n-p)}{n}\,\nu^{c_n(k-1)p}.$$
Hence $\|\Lambda\|^2 \sim k^2$. Using the definition of $\pi_{pq}$ in (13.9), we can write

$$T_1 = \frac{1}{n^2}\sum_{p=1}^{n}\sum_{q=1}^{n}\pi_{pq}Z_pZ_q = T_{1,1} + T_{1,2},$$

with

$$T_{1,1} = \frac{1}{n^2}\sum_{p=1}^{n}\pi_{pp}Z_p^2 \qquad\text{and}\qquad T_{1,2} = \frac{1}{n^2}\sum_{p\neq q}\pi_{pq}Z_pZ_q.$$

We have $ET_1 = ET_{1,1} + ET_{1,2}$. Besides,

$$\mathrm{var}(T_{1,1}) = \frac{1}{n^4}\sum_{p,q}\pi_{pp}\pi_{qq}\,\mathrm{cov}(Z_p^2, Z_q^2).$$

Using Lemma 4.1 in Burman (1991), and since $|\pi_{pp}| \le c\,k$, we have for some $\epsilon > 2$

$$\mathrm{var}(T_{1,1}) \le \frac{c_{10}k^2}{n^4}\sum_{p,q}\alpha_{|p-q|}^{1-2/\epsilon} \le \frac{c_{10}k^2}{n^3}\sum_{p}\alpha_p^{1-2/\epsilon} = o(k^2/n^2) = o(\mathrm{var}(T_1)).$$

Therefore it is enough to prove that $T_{1,2}$ is Gaussian. We have $\mathrm{var}(T_{1,2}) \asymp \mathrm{var}(T_1)$. According to Corollary 2.1 in Peligrad (1996), it is enough to prove (13.22),

$$\sup_n\ \frac{1}{n^4\,\mathrm{var}(T_1)}\sum_{p,q}\pi_{pq}^2 < \infty, \qquad (13.23)$$

and, for every $\zeta > 0$, a Lindeberg-type condition (13.24), where $\theta_n^2 = \mathrm{var}\big(\sum_{p\neq q}(Z_pZ_q - EZ_pZ_q)\big)$. Eqs. (13.22) and (13.23) are trivial. Reasoning as above, one can show that $\theta_n^2 \ge n^3$; hence (13.24) follows easily and Theorem 13.3.1 is proven. •

PROOF OF THEOREM 13.4.1. The theorem follows quite readily from Theorem
13.3.1. •

Acknowledgement. This research was carried out while the author was a
Fellow of EURANDOM in Eindhoven, The Netherlands.

References
1. Agarwal, G. G. and Studden, W. J. (1980). Asymptotic integrated mean
square error using least squares and bias minimizing splines, The Annals
of Statistics, 8, 1307-1325.

2. Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and


Methods, Second Edition, New York: Springer-Verlag.

3. Burman, P. (1991). Regression function estimation from dependent ob-


servations, Journal of Multivariate Analysis, 36.

4. Grenander, U. and Szegő, G. (1984). Toeplitz Forms and their Applications, New York: Chelsea Publishing Company.

5. Peligrad, M. (1996). On the asymptotic normality of sequences of weak


dependent random variables, Journal of Theoretical Probability, 9, 703-
715.

6. Schumaker, L. (1981). Spline Functions: Basic Theory, New York: John


Wiley & Sons.

7. Zhou, S., Shen, X., and Wolfe, D. A. (1998). Local asymptotics for regression splines and confidence regions, The Annals of Statistics, 26, 1760-1782.
14
Testing the Goodness-of-Fit of a Linear Model in Nonparametric Regression

Zaher Mohdeb and Abdelkader Mokkadem


Universite Mentouri, Constantine, Algeria
University of Versailles-Saint-Quentin, Versailles, France

Abstract: We construct a linear hypothesis test on the regression function $f$ in the nonparametric regression model; more precisely, we test that $f$ is an element of $U$, where $U$ is a finite dimensional vector space. The test statistic is easy to compute and we give the asymptotic level and the asymptotic power of the test. Even if the procedure is based on large sample behaviour, simulation experiments reveal that, for small samples, the proposed statistic is close to the asymptotic distribution and the test has good power properties.

Keywords and phrases: Linear hypothesis, nonparametric regression, nonparametric test

14.1 Introduction

We consider the following regression model

$$Y_i = f(t_i) + \varepsilon_i, \qquad i = 1,\dots,n, \qquad (14.1)$$

where $f$ is an unknown real function defined on the interval $[0,1]$ and $t_1 = 0 < t_2 < \cdots < t_n = 1$ is a fixed sampling of $[0,1]$. The errors $\varepsilon_i$ are independent and identically distributed random variables with zero mean and variance $\sigma^2$.

Our aim is to construct a linear hypothesis test on the regression function $f$. Let $g_1(t),\dots,g_p(t)$ be linearly independent functions on $[0,1]$ and let $U_p$ be the vector space spanned by $g_1,\dots,g_p$; we want to test the hypothesis

$$H_0: f \in U_p \quad\text{against}\quad H_1: f \notin U_p. \qquad (14.2)$$

The use of nonparametric regression methods for developing tests in regression analysis has been the subject of several works. A considerable amount of

literature is based on the spline method: Cox and Koh (1989), Eubank and
Spiegelman (1990) derived goodness-of-fit tests for a linear model based on test
statistics constructed from nonparametric regression fits to residuals from lin-
ear regression; Jayasuriya (1996) proposed a test based on nonparametric fits
to the residuals from kth order polynomial regression. A method based on
a kernel estimate for 1 is proposed by Mi.iller (1992). HardIe and Mammen
(1993) suggested a test statistic based on a weighted L 2 -distance between the
parametric fit and a nonparametric fit based on a kernel estimator. The use of
empirical Fourier coefficients of 1 to construct a hypothesis test in the model
(14.1) is developed by Eubank and Spiegelman (1990) and Eubank and Hart
(1992).
In order to test the hypothesis Ho in (14.2), Dette and Munk (1998) pro-
posed a procedure based on the large sample behaviour of an empirical L2_
distance between 1 and the subspace Up. Their test statistic makes use of
weights. In the present paper, we propose a test statistic based on a simi-
lar approach but without weight and show that it has the same asymptotic
behaviour.
The remainder of this paper is organized as follows. In Section 14.2 a test
is derived for Ho and its asymptotic distribution under the null and alternative
hypotheses is given. In Section 14.3 we employ insights from Section 14.2 to
derive a practical test for Ho using Monte Carlo techniques for small samples.

14.2 The Test Statistic

We assume that the design $\{t_1,\dots,t_n\}$ is associated to a positive density function $h$ on the interval $[0,1]$. We denote by $L^2(d\mu)$, where $d\mu(t) = h(t)\,dt$, the space of square integrable functions, equipped with the usual inner product $(\cdot\,,\cdot)$. The distance between $f$ and $U_p$ is denoted by $\tau(f)$. We assume that the following assumptions are satisfied:

(A1) $\displaystyle\max_{i=2,\dots,n}\Big|\int_{t_{i-1}}^{t_i} h(t)\,dt - \frac{1}{n}\Big| = o\Big(\frac{1}{n}\Big)$;

(A2) $h$, $f$, $g_k$, $k = 1,\dots,p$, satisfy the Hölder condition with order $\gamma > 1/2$;

(A3) for all $n$, $\varepsilon_1,\dots,\varepsilon_n$ are independent and there exists $C \in \mathbb{R}_+$ such that $E(\varepsilon_i^4) < C$ for all $i$.

Set

$$\phi = f - \sum_{k=1}^{p} a_k\,g_k, \qquad (14.3)$$

where $a_1,\dots,a_p$ are given real numbers. Since the hypothesis "$f \in U_p$" is equivalent to "$\phi = f - \sum_{k=1}^p a_k g_k \in U_p$", as a measure of discrepancy between the regression function and the subspace $U_p$,
we use the square distance

$$\tau^2(\phi) = \min_{v \in U_p}\int_0^1 |\phi(t) - v(t)|^2\,h(t)\,dt.$$

This square distance can also be represented as

$$\tau^2(\phi) = \frac{G(\phi, g_1,\dots,g_p)}{G(g_1,\dots,g_p)}, \qquad (14.4)$$

where $G(v_1,\dots,v_k)$ denotes the Gramian determinant $|((v_i,v_j))|_{i,j=1,\dots,k}$ for $v_1,\dots,v_k$ in $L^2(d\mu)$.

We thus need to estimate $\tau^2(\phi)$; for this, we introduce the observations $X = (X_1,\dots,X_n)'$, where

$$X_i = Y_i - \sum_{k=1}^{p} a_k\,g_k(t_i), \qquad i = 1,\dots,n.$$

We follow the procedure of Dette and Munk (1998), but applied to $\phi$ and the $X_i$, and without the use of weights. Let $\Delta_i = t_i - t_{i-1}$ $(i = 2,\dots,n)$, $\Delta_1 = \Delta_2$, $W = \mathrm{diag}(\Delta_i h(t_i))_{i=1,\dots,n}$ and $g_{k,n} = (g_k(t_1),\dots,g_k(t_n))'$, $k = 1,\dots,p$. Let $U_{p,n}$ be the vector subspace of $\mathbb{R}^n$ spanned by $(g_{1,n},\dots,g_{p,n})$ and $\Pi_n^{\perp}$ the projection matrix on the orthogonal of $U_{p,n}$.

We define $G_n(X, g_1,\dots,g_p)$ as the determinant obtained by replacing in (14.4) the inner products $(\phi,\phi)$ and $(\phi,g_k)$ $(k = 1,\dots,p)$ respectively by

$$X'WX = \sum_{i=1}^{n}\Delta_i h(t_i)X_i^2$$

and

$$X'Wg_{k,n} = \sum_{i=1}^{n}\Delta_i h(t_i)\,g_k(t_i)\,X_i, \qquad k = 1,\dots,p.$$

Hence, we estimate $\tau^2(\phi)$ by

$$T_n^2 = \frac{G_n(X, g_1,\dots,g_p)}{G(g_1,\dots,g_p)} - S_n^2\,\mathrm{tr}(W\Pi_n^{\perp}), \qquad (14.5)$$

where $S_n^2$ is the estimator of Gasser, Sroka, and Jennen-Steinmetz (1986), defined by

$$S_n^2 = \frac{1}{6(n-2)}\sum_{i=2}^{n-1}(Y_{i+1} + Y_{i-1} - 2Y_i)^2. \qquad (14.6)$$

We reject $H_0$ when $T_n^2$ is too large; the critical value is given below. We have the following asymptotic behaviour of our test statistic.

Theorem 14.2.1 If the assumptions (A1)-(A3) are satisfied, then

$$\sqrt n\,\big(T_n^2 - \tau^2(\phi)\big) \stackrel{\mathcal D}{\longrightarrow} N\Big(0,\ \frac{17}{9}\,\sigma^4 + 4\sigma^2\tau^2(\phi)\Big) \quad\text{as } n \to +\infty,$$

where $\stackrel{\mathcal D}{\longrightarrow}$ denotes convergence in distribution.

Steps of the proof

First, we show that $G_n(X, g_1,\dots,g_p)$ is invariant with respect to a change of basis in $U_p$. Using then a fixed orthonormal basis $\{v_1,\dots,v_p\}$ of $U_p$, the test statistic may be written

$$T_n^2 = G_n(X, v_1,\dots,v_p) - S_n^2\,\mathrm{tr}(W\Pi_n^{\perp}) = \sum_{i=1}^{n}\Delta_i h(t_i)X_i^2 - \sum_{k=1}^{p}\Big\{\sum_{i=1}^{n}\Delta_i h(t_i)\,v_k(t_i)\,X_i\Big\}^2 - S_n^2\,\mathrm{tr}(W\Pi_n^{\perp}).$$

Then, we use analysis arguments and Lemma A.1 in Dette and Munk (1998) for the weights $w_{i,n} = 1$, $i = 1,\dots,n$, to obtain

$$T_n^2 - \tau^2(\phi) = \frac{1}{n}\sum_{i=2}^{n}\eta_i + o_P(n^{-1/2}), \qquad (14.7)$$

where the $\eta_i = \eta_{i,n}$, $i = 2,\dots,n$, are random variables which form a centered row-wise 2-dependent array. We show that

$$\lim_{n\to\infty}\ \mathrm{Var}\Big(\frac{1}{\sqrt n}\sum_{i=2}^{n}\eta_i\Big) = \frac{17}{9}\,\sigma^4 + 4\sigma^2\tau^2(\phi).$$

Now, the convergence of the distribution of $\frac{1}{\sqrt n}\sum_{i=2}^n \eta_i$ to the normal law $N\big(0, \frac{17}{9}\sigma^4 + 4\sigma^2\tau^2(\phi)\big)$ follows from Orey (1958). Theorem 14.2.1 is obtained from (14.7).

The test procedure

Since $\tau^2(\phi) = 0$ when $f \in U_p$, the theorem provides a very simple test of the hypothesis $H_0$; let $\hat\sigma^2$ be any consistent estimator of $\sigma^2$. We reject $H_0$ at the level $\alpha$ if the test statistic $(9n/17)^{1/2}\,T_n^2/\hat\sigma^2$ exceeds the $(1-\alpha)$-quantile $c_{1-\alpha}$ of the standard normal distribution. $\hat\sigma^2$ may be taken to be $S_n^2$; however, from our simulations, it appears that the use of $S_n^2$ does not always behave well. For this reason, we suggest the use of the empirical estimator of $\sigma^2$,

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\Big\{Y_i - \sum_{k=1}^{p}\hat a_k\,g_k(t_i)\Big\}^2, \qquad (14.8)$$

where the $\hat a_k$ are the least squares estimates; this estimator is consistent under $H_0$ and gives a better approximation by the normal distribution.

14.3 Simulations

In order to investigate both the power and the level of our test (14.2), we conducted a small-scale simulation using the model $Y_i = f(t_i) + \varepsilon_i$, $i = 1,\dots,n$ $(n = 50)$, with $t_i = (i-1)/(n-1)$, $i = 1,\dots,n$, and uncorrelated normal random errors with variance $\sigma^2$. In our simulations, we study the test of the hypothesis $H_0: f \in U_2$, where $U_2$ is the subspace of $L^2(d\mu)$ spanned by $g_1(t) = t$ and $g_2(t) = 1$. The Monte Carlo study, for the small sample size $n = 50$, turns on the comparison of the statistic $T_n^2$ and the statistic $M_n^2$ proposed by Dette and Munk (1998). The empirical quantiles, denoted by $Q_T$ and $Q_{DM}$, of the test statistics $T_n^2$ and $M_n^2$ respectively, are given in Table 14.1 and Table 14.2. As shown in Table 14.1, use of the estimator $S_n^2$ of Gasser, Sroka, and Jennen-Steinmetz (1986) in the statistics reveals that the normal approximation is not satisfactory. In Table 14.2, $S_n^2$ is replaced by $\hat\sigma^2 = (1/n)\sum_{i=1}^n |Y_i - \hat a_1 t_i - \hat a_2|^2$; it appears that the normal law is better approximated with $T_n^2$ than with $M_n^2$.

In order to study the power, we consider two different forms for $f$, namely $f_1(t) = -1.5t + 0.5 + \beta t e^{-2t}$ and $f_2(t) = -1.5t + 0.5 + \beta t^2$, with several choices of $\beta$ in the interval $[0,2]$. The empirical powers of $T_n^2$ and $M_n^2$ are denoted by $P_T$ and $P_{DM}$. As shown in Table 14.3 and Table 14.4, where the level is taken at $\alpha = 5\%$, the results of the empirical power study give a little advantage to the statistic $T_n^2$. We note also that the test has good power properties.
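For instance, the level and power entries can be reproduced in spirit by a loop of the following kind (a sketch reusing linear_fit_test from above; exact numbers will differ with the random seed):

```python
import numpy as np
rng = np.random.default_rng(1)

n, sigma, beta, reps = 50, 0.10, 1.0, 1000
t = np.arange(n) / (n - 1.0)
G = np.column_stack([t, np.ones(n)])             # g1(t) = t, g2(t) = 1
h = np.ones(n)                                   # uniform design density
f1 = -1.5 * t + 0.5 + beta * t * np.exp(-2 * t)  # alternative f1
power = np.mean([
    linear_fit_test(t, f1 + sigma * rng.standard_normal(n), G, h)
    for _ in range(reps)])
```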

Table 14.1: Empirical quantiles when $\sigma^2$ is estimated by $S_n^2$ (theoretical values at levels 1%, 5%, 10% are 2.33, 1.65, 1.28 respectively)

              α = 1%             α = 5%             α = 10%
  σ       Q_T      Q_DM      Q_T      Q_DM      Q_T      Q_DM
 0.05   3.7457   3.5102    2.3354   2.1856    1.7829   1.6124
 0.10   3.6813   3.4737    2.2886   2.1561    1.7751   1.6080
 0.20   3.8228   3.5927    2.3632   2.1640    1.7832   1.6098
 0.50   3.7958   3.5486    2.4075   2.2063    1.7798   1.6192
 1.00   3.5249   3.2360    2.2697   2.0983    1.6775   1.5229
 2.00   3.5239   3.3212    2.2464   2.0923    1.6716   1.5190

Table 14.2: Empirical quantiles when $\sigma^2$ is estimated by $\hat\sigma^2$ (theoretical values at levels 1%, 5%, 10% are 2.33, 1.65, 1.28 respectively)

              α = 1%             α = 5%             α = 10%
  σ       Q_T      Q_DM      Q_T      Q_DM      Q_T      Q_DM
 0.05   2.2711   2.1428    1.7099   1.5900    1.3495   1.2362
 0.10   2.2250   2.0973    1.6540   1.5408    1.3419   1.2198
 0.20   2.1796   2.0346    1.6846   1.5678    1.3870   1.2530
 0.50   2.2001   2.1122    1.6989   1.5707    1.3855   1.2514
 1.00   2.2982   2.1976    1.7188   1.6167    1.4083   1.2708
 2.00   2.2778   2.1173    1.6877   1.5558    1.3519   1.2311

Table 14.3: Proportion of rejections in 1000 samples of size $n = 50$, with two examples of alternatives, $f_1(t) = a_1t + a_2 + \beta te^{-2t}$ and $f_2(t) = a_1t + a_2 + \beta t^2$ ($\sigma^2$ estimated by $S_n^2$)

                        f_1(t)              f_2(t)
            β       P_T      P_DM       P_T      P_DM
 σ = 0.05  0.0    0.052    0.053      0.041    0.042
           0.1    0.060    0.058      0.067    0.065
           0.5    0.160    0.155      0.715    0.670
           1.0    0.666    0.616      0.999    0.999
           1.5    0.972    0.964      1.000    1.000
           2.0    0.999    0.999      1.000    1.000
 σ = 0.10  0.0    0.057    0.059      0.054    0.052
           0.1    0.047    0.044      0.055    0.051
           0.5    0.071    0.072      0.182    0.169
           1.0    0.163    0.136      0.687    0.651
           1.5    0.355    0.327      0.990    0.986
           2.0    0.644    0.599      0.977    0.965
 σ = 0.20  0.0    0.047    0.045      0.049    0.047
           0.1    0.060    0.057      0.048    0.045
           0.5    0.059    0.059      0.055    0.054
           1.0    0.075    0.072      0.200    0.190
           1.5    0.109    0.097      0.415    0.393
           2.0    0.179    0.158      0.375    0.350
 σ = 0.50  0.0    0.065    0.062      0.055    0.050
           0.1    0.056    0.055      0.038    0.037
           0.5    0.050    0.047      0.053    0.048
           1.0    0.048    0.055      0.064    0.066
           1.5    0.057    0.055      0.090    0.086
           2.0    0.066    0.064      0.088    0.087

Table 14.4: Proportion of rejections in 1000 samples of size $n = 50$, with two examples of alternatives, $f_1(t) = a_1t + a_2 + \beta te^{-2t}$ and $f_2(t) = a_1t + a_2 + \beta t^2$ ($\sigma^2$ estimated by $\hat\sigma^2$)

                        f_1(t)              f_2(t)
            β       P_T      P_DM       P_T      P_DM
 σ = 0.05  0.0    0.052    0.049      0.055    0.047
           0.1    0.056    0.054      0.059    0.060
           0.5    0.160    0.145      0.732    0.671
           1.0    0.647    0.576      1.000    1.000
           1.5    0.966    0.944      1.000    1.000
           2.0    1.000    0.998      1.000    1.000
 σ = 0.10  0.0    0.052    0.050      0.045    0.044
           0.1    0.054    0.051      0.049    0.044
           0.5    0.067    0.066      0.170    0.166
           1.0    0.165    0.146      0.715    0.667
           1.5    0.359    0.314      0.981    0.966
           2.0    0.640    0.568      1.000    1.000
 σ = 0.20  0.0    0.043    0.046      0.052    0.057
           0.1    0.047    0.053      0.043    0.041
           0.5    0.060    0.056      0.048    0.047
           1.0    0.077    0.074      0.183    0.167
           1.5    0.104    0.092      0.384    0.343
           2.0    0.173    0.155      0.693    0.635
 σ = 0.50  0.0    0.050    0.048      0.028    0.030
           0.1    0.054    0.050      0.061    0.058
           0.5    0.056    0.058      0.065    0.055
           1.0    0.038    0.037      0.063    0.064
           1.5    0.051    0.052      0.097    0.093
           2.0    0.068    0.064      0.117    0.106

References
1. Cox, D. and Koh, E. (1989). A smoothing spline based test of model adequacy in polynomial regression, Annals of the Institute of Statistical Mathematics, 41, 383-400.

2. Dette, H. and Munk, A. (1998). Validation of linear regression models,


Annals of Statistics, 26, 778-800.

3. Eubank, R. L. and Hart, J. D. (1992). Testing goodness-of-fit in regression via order selection criteria, Annals of Statistics, 20, 1412-1425.

4. Eubank, R. L. and Spiegelman, C. H. (1990). Testing the goodness-of-fit


of a linear model via nonparametric regression techniques, Journal of the
American Statistical Association, 85, 387-392.

5. Gasser, T., Sroka, L., and Jennen-Steinmetz, C. (1986). Residual variance


and residual pattern in nonlinear regression, Biometrika, 73, 625-633.

6. Härdle, W. and Mammen, E. (1993). Comparing nonparametric versus


parametric regression fits, Annals of Statistics, 21, 1926-1947.

7. Jayasuriya, B. R. (1996). Testing for polynomial regression using non-


parametric regression techniques, Journal of the American Statistical As-
sociation, 91, 1626-1631.

8. Müller, H. G. (1992). Goodness of fit diagnostics for regression models,


Scandinavian Journal of Statistics, 19, 157-172.

9. Orey, S. (1958). A central limit theorem for m-dependent random variables, Duke Mathematical Journal, 25, 543-546.
PART V
GOODNESS-OF-FIT TESTS IN SURVIVAL ANALYSIS
AND RELIABILITY
15
A New Test of Linear Hypothesis in Regression

Y. Baraud, S. Huet, and B. Laurent


Ecole Normale Superieure, Paris, France
INRA, Jouy-en-Josas, France
Universite Paris-Sud, Orsay, France

Abstract: In the Gaussian regression model, we propose a new test, based on model selection methods, for testing that the regression function $F$ belongs to a linear space. The test is free from any prior assumption on $F$ and on the variance $\sigma^2$ of the errors. The procedure is rate optimal over both smooth and directional alternatives, and the simulation studies show that it is also robust with respect to the non-Gaussianity of the errors.

Keywords and phrases: Adaptive test, model selection, linear hypothesis,


minimax hypothesis testing, nonparametric alternative, goodness-of-fit, non-
parametric regression, Fisher test

15.1 Introduction

We consider the regression model

$$Y_i = F(x_i) + \sigma\varepsilon_i, \qquad i = 1,\dots,n, \qquad (15.1)$$

where the $x_i$'s are known deterministic points belonging to $\mathbb{R}^k$, $F$ is an unknown function from $\mathbb{R}^k$ into $\mathbb{R}$, $\sigma$ is an unknown positive number and $(\varepsilon_1,\dots,\varepsilon_n)$ is an unobservable sample of i.i.d. standard Gaussian random variables, i.e. $\varepsilon_i \sim N(0,1)$. Let $V^*$ be some finite dimensional subspace of the set of functions $\mathcal{F}(\mathbb{R}^k,\mathbb{R}) = \{F: \mathbb{R}^k \to \mathbb{R}\}$, that is, $V^* = \{\sum_{j=1}^d \theta_j\psi_j,\ \theta_1,\dots,\theta_d \in \mathbb{R}\}$, where the $\psi_j$'s are given functions in $\mathcal{F}(\mathbb{R}^k,\mathbb{R})$.

The aim of this paper is to propose a test of the null hypothesis "$F$ belongs to $V^*$" against that it does not, under no prior information on $F$ and $\sigma$.

The testing procedure can be described in the following way. We consider a finite collection of linear subspaces of $\mathcal{F}(\mathbb{R}^k,\mathbb{R})$, $\{S_m^*, m \in \mathcal{M}\}$. We associate to the collection $\{S_m^*, m \in \mathcal{M}\}$ a suitable collection of numbers $\{\alpha_m, m \in \mathcal{M}\}$
belonging to $]0,1[$ and consider the family of Fisher tests of level $\alpha_m$ of the null hypothesis $F \in V^*$ against the alternative $F \in V^* + S_m^*$. Our procedure rejects the null hypothesis if one of these Fisher tests does.

Let us give some examples. If $F$ is defined on $I = [0,1]$, one can take for $m \in \mathbb{N}$, $S_m^*$ as the space of trigonometric polynomials of degree not larger than $m$, the space of piecewise polynomials with fixed degree based on the grid $\{k/m,\ k = 0,\dots,m\}$, or the space spanned by the Haar basis $\{\varphi_{j,k},\ j \in \{0,\dots,m\},\ k \in \{1,\dots,2^j\}\}$. Moreover, it is possible to mix several kinds of these spaces to constitute the collection $\{S_m^*, m \in \mathcal{M}\}$. If $F$ is defined on $I = [0,1]^2$, one can take for example the space of constant functions on a finite partition of $[0,1]^2$ in rectangles.
Under the Gaussian assumption on the errors, the results given in this paper are nonasymptotic. For each $n$, the test has the desired level and we characterize a set of functions over which our test is powerful. Under the posterior assumption that $F$ belongs to the Hölderian ball

$$\mathcal{H}_s(R) = \big\{F: [0,1] \to \mathbb{R},\ \forall (x,y) \in [0,1]^2,\ |F(x) - F(y)| \le R|x-y|^s\big\}, \qquad (15.2)$$

for some $R > 0$ and $s \in\, ]0,1]$, we show that the test is rate optimal among the adaptive tests (that is, among the tests which have no prior knowledge of $R$ nor $s$). Such a result has been obtained by Horowitz and Spokoiny (2000) for a different procedure that will be described in Section 15.4. In addition, our test recovers the parametric rate of testing over directional alternatives. A similar result has been obtained by Eubank and Hart (1992).

Finally, we present a simulation study to evaluate the performances of our testing procedure both when the errors are Gaussian and non-Gaussian.

15.2 The Testing Procedure

Let $\{S_m^*, m \in \mathcal{M}\}$ be a collection of subspaces of $\mathcal{F}(\mathbb{R}^k,\mathbb{R})$. For each $m \in \mathcal{M}$, we denote by $S_m$ the linear subspace of $\mathbb{R}^n$ defined as the orthogonal projection of

$$\{(F(x_1),\dots,F(x_n))^T,\ F \in S_m^*\}$$

onto $V^{\perp}$, where

$$V = \{(F(x_1),\dots,F(x_n))^T,\ F \in V^*\},$$

and set $V_m = V + S_m$. We make the following assumption:

Condition $(H_{\mathcal M})$: For all $m \in \mathcal{M}$, $S_m \neq \{0\}$ and $V_m \neq \mathbb{R}^n$.

This condition is usually met in practice if, with regard to the design points $x_i$'s, the collection $\{S_m^*, m \in \mathcal{M}\}$ is chosen judiciously. It implies that the dimensions of $S_m$ and $V_m^{\perp}$, respectively denoted by $D_m$ and $N_m$, are both different from 0.

We denote by $P_F$ the law of the observation $Y = (Y_1,\dots,Y_n)^T$ drawn from model (15.1) and by $\|\cdot\|$ the Euclidean norm in $\mathbb{R}^n$. For each $m \in \mathcal{M}$, we denote by $\Pi_m$ and $\Pi_{V_m}$ the orthogonal projectors onto $S_m$ and $V_m$, respectively.

15.2.1 Description of the procedure

Let us fix some $\alpha \in\, ]0,1[$ and assume that the collection $\{S_m^*, m \in \mathcal{M}\}$ satisfies $(H_{\mathcal M})$. Let $T_\alpha$ be the test statistic defined by

$$T_\alpha = \sup_{m \in \mathcal{M}}\left(\frac{\|\Pi_m Y\|^2}{D_m} - \bar F^{-1}_{D_m,N_m}(\alpha_m)\,\frac{\|Y - \Pi_{V_m}Y\|^2}{N_m}\right), \qquad (15.3)$$

where for each $u \in\, ]0,1[$, $\bar F^{-1}_{D_m,N_m}(u)$ is the $1-u$ quantile of a Fisher random variable with $D_m$ and $N_m$ degrees of freedom, and where $\{\alpha_m, m \in \mathcal{M}\}$ is a collection of numbers in $]0,1[$ satisfying

$$\forall F \in V^*, \quad P_F(T_\alpha > 0) \le \alpha. \qquad (15.4)$$

We reject the null hypothesis when $T_\alpha$ is positive. In the sequel, we choose the collection $\{\alpha_m, m \in \mathcal{M}\}$ in accordance with one of the following procedures:

P1 For all $m \in \mathcal{M}$, $\alpha_m = \alpha_n$, where $\alpha_n$ is the largest number $a$ in $]0,1[$ satisfying

$$P\left(\sup_{m \in \mathcal{M}}\left(\frac{\|\Pi_m\varepsilon\|^2}{D_m} - \bar F^{-1}_{D_m,N_m}(a)\,\frac{\|\varepsilon - \Pi_{V_m}\varepsilon\|^2}{N_m}\right) > 0\right) \le \alpha, \qquad (15.5)$$

where $\varepsilon$ is a standard Gaussian vector of dimension $n$.

P2 The $\alpha_m$'s verify the following equality:

$$\sum_{m \in \mathcal{M}} \alpha_m = \alpha.$$

Comments: For each $m \in \mathcal{M}$, the Fisher test of level $\alpha_m$ of the hypothesis $F \in V^*$ against $F \notin V^*$, $F \in V^* + S_m^*$, rejects the null hypothesis if the test statistic

$$\frac{\|\Pi_m Y\|^2}{D_m} - \bar F^{-1}_{D_m,N_m}(\alpha_m)\,\frac{\|Y - \Pi_{V_m}Y\|^2}{N_m}$$

is positive. Since our test statistic $T_\alpha$ is defined as the supremum of those statistics, clearly our test rejects the null if one of those Fisher tests does.
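For illustration (our sketch, not the paper's implementation), the multiple-testing procedure with the P2 weights can be coded directly from bases of $V$ and of the $S_m^*$; the names and helper choices here are assumptions.

```python
import numpy as np
from scipy.stats import f as fisher_dist

def multiple_fisher_test(y, V, S_bases, alphas):
    """Sketch: reject 'F in V*' if some level-alpha_m Fisher test of
    V against V + S_m rejects.  V: (n, d) design of the null space;
    S_bases: list of (n, d_m) designs of the S_m*."""
    n = len(y)
    Pv = V @ np.linalg.pinv(V)                 # projector onto V
    for S, a_m in zip(S_bases, alphas):
        Sm = S - Pv @ S                        # project S_m* onto V-perp
        Pm = Sm @ np.linalg.pinv(Sm)           # projector Pi_m
        Dm = np.linalg.matrix_rank(Sm)
        Nm = n - np.linalg.matrix_rank(np.hstack([V, Sm]))
        num = np.sum((Pm @ y) ** 2) / Dm
        res = y - Pv @ y - Pm @ y              # Y - Pi_{V_m} Y, since Sm is orthogonal to V
        den = np.sum(res ** 2) / Nm
        if num / den > fisher_dist.ppf(1.0 - a_m, Dm, Nm):
            return True
    return False
```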
15.2.2 Behavior of the test under the null hypothesis

The test associated to the procedure P1 has the property of being of size $\alpha$. More precisely, we have

$$\forall F \in V^*, \quad P_F(T_\alpha > 0) = \alpha. \qquad (15.6)$$

This result follows from the definition of $\alpha_n$, noticing that $S_m \subset V^{\perp}$ implies that $\Pi_m Y = \Pi_m\varepsilon$ and $V \subset V_m$ implies that $Y - \Pi_{V_m}Y = \varepsilon - \Pi_{V_m}\varepsilon$. The drawback of this procedure is that it requires heavy computations to obtain the value of $\alpha_n$. In contrast, the choice of the $\alpha_m$'s as proposed in the procedure P2 does not require any computation at all. A test associated to the procedure P2 is a Bonferroni test, for which we only have (15.4), by the union bound: $P_F(T_\alpha > 0) \le \sum_{m\in\mathcal M}\alpha_m = \alpha$.
15.2.3 A toy framework: The case of a known variance

For the sake of simplicity and to clarify the proofs as much as possible, we only state our main results in the simpler case where the variance $\sigma^2$ is known. This means that we content ourselves with proving those results for the test statistic $\bar T_\alpha$ (instead of $T_\alpha$) defined as

$$\bar T_\alpha = \sup_{m \in \mathcal{M}}\Big(\|\Pi_m Y\|^2 - \sigma^2\,\bar\chi^2_{D_m}(\alpha_m)\Big), \qquad (15.7)$$

where for each $u \in\, ]0,1[$, $\bar\chi^2_{D_m}(u)$ denotes the $1-u$ quantile of a $\chi^2$ random variable with $D_m$ degrees of freedom (note that in this case the assumption that $N_m \neq 0$ can be relaxed). For a complete proof of those results in the case of an unknown variance, i.e. for the test statistic $T_\alpha$, we refer the reader to the paper by Baraud, Huet, and Laurent (2000).

15.3 The Power of the Test

15.3.1 The main result

The aim of this section is to describe a subset of $\mathcal{F}(\mathbb{R}^k,\mathbb{R})$ over which the test defined in Section 15.2.3 is powerful. For any $F \in \mathcal{F}(\mathbb{R}^k,\mathbb{R})$ and $\mathcal{G} \subset \mathcal{F}(\mathbb{R}^k,\mathbb{R})$, we define

$$d_n^2(F,\mathcal{G}) = \inf_{G \in \mathcal{G}}\ \frac{1}{n}\sum_{i=1}^{n}\big(F(x_i) - G(x_i)\big)^2.$$

Theorem 15.3.1 Let $\alpha$ and $\beta$ be two numbers in $]0,1[$. Let $\bar T_\alpha$ be the test statistic defined by (15.7). Let

$$v_m^2(\beta) = \frac{\sigma^2}{n}\Big(4\sqrt{2D_m(L_m + L)} + 4L_m + 8L\Big), \qquad L_m = \log(1/\alpha_m),\quad L = \log(1/\beta).$$

Then, $P_F(\bar T_\alpha > 0) \ge 1 - \beta$ for all $F$ belonging to the set

$$\mathcal{F}_n(\beta) = \Big\{F \in \mathcal{F}(\mathbb{R}^k,\mathbb{R}),\ d_n^2(F, V^*) \ge \inf_{m \in \mathcal{M}}\big(d_n^2(F, V^* + S_m^*) + v_m^2(\beta)\big)\Big\}. \qquad (15.8)$$

Comment: The definition of $\mathcal{F}_n(\beta)$ given in (15.8) shows that we take advantage of a collection of spaces $\{S_m^*, m \in \mathcal{M}\}$ containing spaces with good approximation properties, that is, spaces $S_m^*$ such that both $d_n^2(F, S_m^*)$ and the dimension of $S_m^*$ are small.

15.3.2 Rates of testing

By evaluating the order of magnitude of the infimum in (15.8) under the assumption that $F$ belongs to some class of smoothness, we derive from Theorem 15.3.1 an upper bound for the rate of testing achieved by our procedure over this class.

We take full advantage of the possibility to mix several linear spaces in the collection $\{S_m^*, m \in \mathcal{M}\}$, each of them being appropriate for detecting specific alternatives. We show that for an adequate choice of the collection, we obtain a test that is both rate optimal (in some sense) over the $s$-Hölderian balls of radius $R$ defined by (15.2), simultaneously for all $s \in\, ]0,1]$ and $R > 0$, and that is able to detect local alternatives at the parametric rate of testing. For this aim, we introduce the following collections of $\alpha_m$'s and $S_m^*$'s.

• Let
$$\mathcal{M}_1 = \big\{k \in \{1,\dots,n\},\ k \in \{2^j,\ j \ge 0\} \cup \{n\}\big\}.$$
We set $\alpha_{(k,1)} = \alpha/(4|\mathcal{M}_1|)$ if $k \neq n$ and $\alpha_{(n,1)} = \alpha/4$. Let $S^*_{(k,1)}$ be the linear subspace of $\mathcal{F}([0,1],\mathbb{R})$ spanned by the piecewise constant functions $({\bf 1}_{[(j-1)/k,\,j/k[},\ j = 1,\dots,k)$ if $k \neq n$, and $S^*_{(n,1)} = \mathcal{F}([0,1],\mathbb{R})$.

• Let $(\phi_j)_{j \ge 1}$ be a Hilbert basis of $\mathcal{F}([0,1],\mathbb{R})$. For each $k \in \mathcal{M}_2 = \{1,\dots,n\}$, we define $\alpha_{(k,2)} = 3\alpha/(\pi^2k^2)$ and $S^*_{(k,2)}$ as the linear space spanned by the function $\phi_k$.

In the sequel, we set $\mathcal{M} = \{(k,1),\ k \in \mathcal{M}_1\} \cup \{(k,2),\ k \in \mathcal{M}_2\}$ and consider the collections $\{S_m^*, m \in \mathcal{M}\}$ and $\{\alpha_m, m \in \mathcal{M}\}$. Note that $\sum_{m\in\mathcal M}\alpha_m \le \alpha$, and therefore the inequality (15.4) holds.
Corollary 15.3.1 Let $\alpha$ and $\beta$ be two numbers in $]0,1[$. Let $\bar T_\alpha$ be the test statistic defined by (15.7); then the following holds:

(i) Let us denote by $L_n$ the quantity $\log\log(n)$, and assume that $R^2 \ge \sigma^2\sqrt{L_n}/n$. There exists a constant $C(\alpha,\beta)$ depending only on $\alpha$ and $\beta$ such that for all $s \in\, ]0,1]$, for all $R > 0$ and for all $F \in \mathcal{H}_s(R)$ such that

$$d_n^2(F, V^*) \ge C(\alpha,\beta)\left\{\left(R^{2/(1+4s)}\Big(\frac{\sqrt{L_n}\,\sigma^2}{n}\Big)^{4s/(1+4s)}\right) \vee \left(\frac{\sigma^2}{\sqrt n}\right)\right\},$$

we have $P_F(\bar T_\alpha > 0) \ge 1 - \beta$.

(ii) Assume that $F \in L^2([0,1], dx)$ and that the $x_i$'s satisfy for all $k \ge 1$,

If for some $k_0 \ge 1$,

then for $n$ large enough,
Comments: In Case (i), the rate of testing is of order $(\sqrt{\log\log(n)}/n)^{2s/(1+4s)}$, provided that $s > 1/4$. In the Gaussian white noise model, the rate $n^{-2s/(1+4s)}$ is known to be minimax over $s$-Hölderian balls [we refer to Ingster (1993)]. The loss of efficiency of a $\log\log(n)$ factor is due to the fact that the test is adaptive, which means that it does not use any prior knowledge on $s$ and $R$ [for further details we refer to the work of Spokoiny (1996)]. When $s < 1/4$, the rate of testing is of order $n^{-1/4}$. This rate is known to be minimax, as proved in Baraud (2000). Lastly, let us emphasize that the result given in (i) holds under no assumption on the $x_i$'s.

The result presented in (ii) shows that we obtain the parametric rate $1/\sqrt n$ for directional alternatives.

15.4 Simulations
We carry out a simulation study in order to evaluate the performances of our
procedure both when the errors are normally distributed and when they are not,
and to compare its performances with the testing procedure proposed recently
by Horowitz and Spokoiny (2000). This section is organized as follows: we
present the simulation experiment, then we describe how our testing procedure
is performed. Finally we give the results of the simulation study.

15.4.1 The simulation experiment

We evaluate the performances of our test by using the same simulation experiment as Horowitz and Spokoiny (2000). They considered three distributions of the errors $\varepsilon_i$:

1. The Gaussian distribution: $\varepsilon_i \sim N(0,4)$.

2. The mixture of Gaussian distributions: $\varepsilon_i$ is distributed as $\pi X_1 + (1-\pi)X_2$, where $\pi$ is distributed as a Bernoulli variable with expectation 0.9, $X_1$ and $X_2$ are centered Gaussian variables with variance respectively equal to 2.43 and 25, and $\pi$, $X_1$ and $X_2$ are independent. In that case the variance of $\varepsilon_i$ equals 3.9. This distribution has heavy tails.

3. The Type I distribution: $\varepsilon_i$ has density $(s/2)f_X(\mu + (s/2)x)$, where $f_X(x) = \exp\{-x - \exp(-x)\}$, and where $\mu$ and $s^2$ are the expectation and the variance of a variable $X$ with density $f_X$. These $\varepsilon_i$'s are centered variables with variance 4. This distribution is asymmetrical.

By considering the distributions 2 and 3, we evaluate the robustness of our procedure to discrepancies from the assumption of normality.

Three regression functions $F$ are considered:

$$F_0(x) = 1 + x,$$
$$F_\tau(x) = 1 + x + \frac{5}{\tau}\,\phi\Big(\frac{x}{\tau}\Big) \quad\text{with } \tau = 0.25 \text{ and } \tau = 1,$$

where $\phi$ is the density of a standard Gaussian variable. When $\tau = 0.25$ the regression function $F_\tau$ presents a peak; when $\tau = 1$, $F_\tau$ presents a small bump. We will test the linearity of the function $F$ at level $\alpha = 5\%$.

The number of observations is $n = 250$. The $x_i$'s are simulated once and for all, as centered Gaussian variables with variance equal to 25, and are constrained to lie in the interval $[\Phi^{-1}(0.05), \Phi^{-1}(0.95)]$, where $\Phi$ is the distribution function of a standard Gaussian variable.
15.4.2 The testing procedure

The testing procedure depends on the choice of the collections $\{S_m^*, m \in \mathcal{M}\}$ and $\{\alpha_m, m \in \mathcal{M}\}$.

The collection $\{S_m^*, m \in \mathcal{M}\}$. We consider spaces $S_m^*$ based on piecewise constant functions and trigonometric polynomials. More precisely, for each $k = 1,\dots,n$, we consider the space $S^*_{(k,1)}$ of regular histograms based on the regular grid $\{l/k,\ l = 0,\dots,k\}$ and $S^*_{(k,2)}$, the space of trigonometric polynomials of degree not larger than $k$. For each $\delta \in \{1,2\}$ and for each $k = 1,\dots,n$, let us recall that $S_{(k,\delta)}$ is the linear subspace of $\mathbb{R}^n$ defined as the orthogonal projection of $\{(F(x_1),\dots,F(x_n))^T,\ F \in S^*_{(k,\delta)}\}$ onto $V^{\perp}$, where $V$ is the subspace of $\mathbb{R}^n$ with dimension $d = 2$ spanned by the vectors $(1,\dots,1)^T$ and $(x_1,\dots,x_n)^T$. Now let us set

$$\mathcal{M} = \Big\{(k,\delta) \in \{1,\dots,n\}\times\{1,2\},\ \dim(S_{(k,\delta)}) \in \{2^j,\ j \ge 0\} \cap \{1,\dots,n/2\}\Big\}.$$

The collection $\{\alpha_m, m \in \mathcal{M}\}$. We consider the procedure P2 (see Section 15.2.1) where the $\alpha_m$'s equal $\alpha/|\mathcal{M}|$, where $|\mathcal{M}|$ denotes the cardinality of $\mathcal{M}$ and $\alpha = 5\%$. Note that, under the assumption of Gaussian errors, the test based on the procedure P1 is of size $\alpha$ and is more powerful than the test based on the procedure P2, since $\alpha_n$ is larger than the $\alpha_m$'s. But the calculation of $\alpha_n$ makes the procedure P1 complicated to deal with, $\alpha_n$ depending on $n$, on $V$, and on the collection $\{S_m, m \in \mathcal{M}\}$. Therefore we decided to evaluate the performances of the procedure P2, which is very easily implemented.

The test statistic is equal to $T_\alpha$ defined by Equation (15.3), and the null hypothesis is rejected if $T_\alpha > 0$. The results of the simulation study are reported in Table 15.1 under the column $T_\alpha$. They are based on 4000 simulations.
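As an illustration (our sketch, reusing multiple_fisher_test from above), the collection and the P2 weights of this experiment can be assembled as follows; the histogram and trigonometric basis constructions, and the simplified dimension bookkeeping, are our own minimal versions.

```python
import numpy as np

def histogram_basis(x, k):
    """Indicators of the k cells [l/k, (l+1)/k) on the rescaled design."""
    u = (x - x.min()) / (x.max() - x.min() + 1e-12)
    cols = [(u >= l / k) & (u < (l + 1) / k) for l in range(k)]
    return np.column_stack(cols).astype(float)

def trig_basis(x, k):
    """1, cos(2*pi*j*u), sin(2*pi*j*u), j = 1..k, on the rescaled design."""
    u = (x - x.min()) / (x.max() - x.min() + 1e-12)
    cols = [np.ones_like(u)]
    for j in range(1, k + 1):
        cols += [np.cos(2 * np.pi * j * u), np.sin(2 * np.pi * j * u)]
    return np.column_stack(cols)

def build_collection(x, n, alpha=0.05):
    """Dyadic dimensions up to n/2; equal weights alpha/|M| (procedure P2)."""
    S_bases = []
    for k in [2 ** j for j in range(int(np.log2(n / 2)) + 1)]:
        S_bases.append(histogram_basis(x, k))
        S_bases.append(trig_basis(x, max(1, (k - 1) // 2)))
    alphas = [alpha / len(S_bases)] * len(S_bases)
    return S_bases, alphas
```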

15.4.3 The test proposed by Horowitz and Spokoiny (2000)

Horowitz and Spokoiny (2000) proposed an adaptive procedure for testing that the regression function belongs to some parametric family of functions. Their procedure rejects the null hypothesis if, for some bandwidth in a grid, the distance between the nonparametric kernel estimator and the kernel-smoothed parametric estimator of $F$ under the null hypothesis is large. The quantiles of their test statistic are estimated by a bootstrap method.
The results of the simulation experiment are reported in Table 15.1, under the column HS-test. They used 1000 simulations for estimating the level of the test and 250 simulations for estimating the power.

Table 15.1: Percentage of rejection

            Null hypothesis true  |          Null hypothesis false
            HS-test   $T_\alpha$  |  $\tau$  HS-test  $T_\alpha$  |  $\tau$  HS-test  $T_\alpha$
Normal       0.07       0.03      |    1      0.79      0.93      |   0.25    0.92      1
Mixture      0.05       0.05      |    1      0.80      0.93      |   0.25    0.93      1
Type I       0.05       0.03      |    1      0.82      0.94      |   0.25    0.97      1

15.4.4 Results of the simulation study

As expected, when the distribution of the errors is Gaussian, the test based on $T_\alpha$ is of level 5% (see Section 15.2.2). For the "mixture" and "Type I" distributions, it also has the desired level, showing the robustness of the procedure with respect to non-Gaussianity in these cases. In addition, it turns out to be powerful over the alternatives considered by Horowitz and Spokoiny (2000).

15.5 Proofs

15.5.1 Proof of Theorem 15.3.1

For the sake of simplicity, and to keep our formulae as short as possible, we assume that $\sigma^2 = 1$. By definition of $T_\alpha$, for any $F \in \mathcal{F}(\mathbb{R}^k, \mathbb{R})$,
$$P_F(T_\alpha \le 0) = P_F\!\left(\forall m \in \mathcal{M},\ \|\Pi_m Y\|^2 \le \chi_{D_m}^{-1}(\alpha_m)\right) \le \inf_{m \in \mathcal{M}} P_F\!\left(\|\Pi_m Y\|^2 \le \chi_{D_m}^{-1}(\alpha_m)\right).$$

Let $f = (F(x_1), \ldots, F(x_n))^T$. Let us denote by $\chi_D^{-1}(a, u)$ the $u$ quantile of a noncentral $\chi^2$ variable with $D$ degrees of freedom and noncentrality parameter $a$. For each $m \in \mathcal{M}$, the random variable $\|\Pi_m Y\|^2$ is distributed as a noncentral $\chi^2$ variable with $D_m$ degrees of freedom and noncentrality parameter $\|\Pi_m f\|^2$. It follows that if, for some $m$ in $\mathcal{M}$,
$$\chi_{D_m}^{-1}(\alpha_m) \le \chi_{D_m}^{-1}\!\left(\|\Pi_m f\|^2, \beta\right), \tag{15.9}$$
then $P_F(T_\alpha \le 0) \le \beta$. We shall use the two following inequalities, respectively due to Laurent and Massart (2000) and to Birgé (2000): for all $u \in ]0, 1[$,
$$\chi_D^{-1}(u) \le D + 2\sqrt{D \log(1/u)} + 2\log(1/u), \tag{15.10}$$
$$\chi_D^{-1}(a, u) \ge D + a - 2\sqrt{(D + 2a)\log(1/u)}. \tag{15.11}$$



In the following, we set $L_m = \log(1/\alpha_m)$ and $L = \log(1/\beta)$. By using the inequalities $\sqrt{u + v} \le \sqrt{u} + \sqrt{v}$ and $2\sqrt{uv} \le \theta u + \theta^{-1} v$, which hold for all positive numbers $u$, $v$, $\theta$, we derive from (15.11) a lower bound for $\chi_{D_m}^{-1}(\|\Pi_m f\|^2, \beta)$; therefore, using (15.10), and since $\sqrt{u} + \sqrt{v} \le \sqrt{2}\sqrt{u + v}$ for any positive numbers $u$, $v$, inequality (15.9) holds as soon as $\|\Pi_m f\|^2$ is large enough.
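For concreteness, the chaining of (15.10) and (15.11) can be carried out along the following lines; this LaTeX sketch is indicative only, and the constants below need not be those of the original argument.

% Indicative sketch only: chaining (15.11) and (15.10) with the elementary
% inequalities above; constants are not optimized.
\begin{align*}
\chi_{D_m}^{-1}\!\left(\|\Pi_m f\|^2,\beta\right)
 &\ge D_m + \|\Pi_m f\|^2 - 2\sqrt{(D_m + 2\|\Pi_m f\|^2)\,L}
 && \text{by (15.11)}\\
 &\ge D_m + \tfrac12\|\Pi_m f\|^2 - 2\sqrt{D_m L} - 4L
 && \text{taking } \theta = \tfrac12,\\
\chi_{D_m}^{-1}(\alpha_m)
 &\le D_m + 2\sqrt{D_m L_m} + 2L_m
 && \text{by (15.10)},
\end{align*}
so that (15.9) holds as soon as
\[
\|\Pi_m f\|^2 \;\ge\; 4\sqrt{D_m L_m} + 4\sqrt{D_m L} + 4L_m + 8L .
\]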

For any linear space $W \subset \mathbb{R}^n$, we denote by $\Pi_W$ the orthogonal projector onto $W$. Using the fact that $\bar S_m \subset V^\perp$, by the Pythagorean equality, $\|\Pi_m f\|^2 = \|f - \Pi_V f\|^2 - \|\Pi_{V^\perp} f - \Pi_m f\|^2$. Noting that $\|\Pi_{V^\perp} f - \Pi_m f\|^2 = \|f - \Pi_{V + S_m} f\|^2$, we get $\|\Pi_m f\|^2 / n = d_n^2(F, V^*) - d_n^2(F, V^* + S_m^*)$, which concludes the proof of Theorem 15.3.1.

15.5.2 Proof of Corollary 15.3.1

PROOF OF CASE (i). All along the proof, $C$ denotes some constant that may vary from line to line. The dependence of $C$ on various quantities is specified by the notation $C(\cdot)$. Let

It follows from Theorem 15.3.1 that $P_F(T_\alpha > 0) \ge 1 - \beta$ for all $F$ such that $d_n^2(F, V^*) \ge \rho_n^2(F)$. Let us therefore give an upper bound for $\rho_n^2(F)$.
Note that since $S_{(n,1)} = \mathcal{F}([0,1], \mathbb{R})$, we have $d_n^2(F, S_{(n,1)}) = 0$ and, since $D_{(n,1)} \le n$ (we do not assume the design points to be distinct) and $\alpha_{(n,1)} = \alpha/4$, we have
(15.12)

Noting that $d_n^2(F, V^* + S_{(k,1)}) \le d_n^2(F, S_{(k,1)})$, the statement of the first part of the corollary follows from the two following inequalities: for all $F \in \mathcal{H}_s(R)$ and for all $k \in \mathcal{M}_1$,
$$d_n^2(F, S_{(k,1)}) \le R^2 k^{-2s}, \tag{15.13}$$
$$v_{(k,1)}(\beta) \le C(\alpha, \beta)\left(\sqrt{k L_n} + L_n\right). \tag{15.14}$$


Let us prove (15.13). For any $k \in \mathcal{M}_1$, we define the function $F_k$ on $[0,1]$ as follows: for $x \in ](j-1)/k,\ j/k]$, $F_k(x) = F(j/k)$ for $j = 1, \ldots, k$, and $F_k(0) = F(1/k)$. Clearly $F_k \in S_{(k,1)}$ and, under the assumption that $F \in \mathcal{H}_s(R)$, the following inequalities hold

The inequality (15.14) follows easily by noting that, for all $k \in \mathcal{M}_1$,
$$\log\left(1/\alpha_{(k,1)}\right) \le C(\alpha, \beta)\, \log\log(n),$$
and $D_{(k,1)} = k$.
Therefore, we have
(15.15)

Under the assumption on $R$ we know that $k^* \ge 1$. Let us now distinguish between two cases:

1. If $k^* \le n$, then there exists some $k' \in \mathcal{M}_1$ such that $k^* \le k' \le 2k^*$. It follows that

Hence, by (15.15) we get
(15.16)

2. If $k^* > n$, then by (15.12)
(15.17)

Now, by collecting (15.12), (15.16) and (15.17), one obtains that



which concludes the proof in case (i) by replacing $k^*$ by its value.

PROOF OF CASE (ii). Let us set $f_n = (F(x_1), \ldots, F(x_n))^T/\sqrt{n}$, and assume that $\sigma^2 = 1$ for simplicity.
Following the proof of the theorem, we have to show that for some $k \in \mathcal{M}_2$
(15.18)

Using that $2P(X > u) \le \exp(-u^2/2)$ for all $u > 0$, where $X \sim N(0,1)$, it is easy to show that for all $t \in ]0, 1]$,
$$\chi_1^{-1}(t) \le -2\log(t).$$
In the same way, using that for all $\mu \in \mathbb{R}$ and $0 < u < \mu$,

we get
$$\chi_1^{-1}(\mu^2, t) \ge \left(\mu - \sqrt{-2\log(2t)}\right)^2.$$
Then (15.18) is verified if

This holds as soon as $\|\Pi_{(k,2)} f_n\|^2 \ge -4\log(2\alpha_{(k,2)}\beta)$.

Let $r_n^2(k_0) = \sum_{i=1}^n \phi_{k_0}^2(x_i)/n$. We conclude by noticing that


References

1. Baraud, Y. (2000). Non-asymptotic minimax rates of testing in signal detection, Technical Report 00.25, École Normale Supérieure, Paris.

2. Baraud, Y., Huet, S., and Laurent, B. (2000). Adaptive tests of linear hypotheses by model selection, Technical Report 99-13, École Normale Supérieure, Paris.

3. Birgé, L. (2000). An alternative point of view on Lepski's method, In State of the Art in Probability and Statistics: Festschrift for Willem R. van Zwet (Eds., Mathisca C. M. de Gunst, Chris A. J. Klaassen, and Aad W. van der Vaart), Institute of Mathematical Statistics, Lecture Notes-Monograph Series (to appear).

4. Eubank, R. L. and Hart, J. D. (1992). Testing goodness-of-fit in regression via order selection criteria, Annals of Statistics, 20, 1412-1425.

5. Horowitz, J. L. and Spokoiny, V. G. (2000). An adaptive, rate-optimal test of a parametric model against a nonparametric alternative, Econometrica (to appear).

6. Ingster, Yu. I. (1993). Asymptotically minimax testing for nonparametric alternatives I, Mathematical Methods of Statistics, 2, 85-114.

7. Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic functional by model selection, Annals of Statistics (to appear).

8. Spokoiny, V. G. (1996). Adaptive hypothesis testing using wavelets, Annals of Statistics, 24, 2477-2498.
16
Inference in Extensions of the Cox Model for
Heterogeneous Populations

Odile Pons
INRA Biometrie, Jouy-en-Josas, France

Abstract: The analysis of censored time data in heterogeneous populations


requires extensions of the Cox model to describe the distribution of duration
times when the conditions may change according to various schemes. New
estimators and tests are presented for a model with a non-stationary baseline
hazard depending on the time at which the observed phenomenon starts, and
a model where the regression coefficients are functions of an observed variable.

Keywords and phrases: Asymptotic distribution, censored data, Cox model,


non-parametric estimation, non-stationarity

16.1 Introduction

The distribution of a survival time $T^0$ conditionally on a vector $Z$ of explanatory variables or processes is characterized by the hazard function
$$\lambda(t \mid Z) = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\, \Pr\!\left(t \le T^0 \le t + \Delta t \mid T^0 \ge t,\ Z(s),\ 0 \le s \le t\right).$$

Cox's model (1972) is widely used for the analysis of censored survival data under controlled experimental conditions; it expresses the hazard function of $T^0$ in the form $\lambda(t \mid Z) = \lambda(t)\, e^{\beta^T Z(t)}$, where $\beta$ is a vector of unknown regression parameters and $\lambda$ is an unknown non-parametric baseline hazard function. If the survival time is right-censored at a random time $C$, the observations are the censored time $T = T^0 \wedge C$, the indicator $\delta = 1_{\{T^0 \le C\}}$ and the covariate $Z$. The regression parameter is estimated by the value that maximizes the "partial likelihood" [Cox (1972)], and an estimator of the cumulative hazard function $\Lambda(t) = \int_0^t \lambda(s)\, ds$ was defined by Breslow (1972).


In the fields of epidemiology or ecology, heterogeneous populations are observed and more complex models are required to describe the distribution of event times when the conditions may change in time or according to the value of a variable. This paper presents results for two extensions of the classical Cox model involving a non-parametric baseline hazard function and a regression on a $p$-dimensional process:

• a model for the duration $X = T^0 - S$ of a phenomenon, with a non-stationary baseline hazard depending non-parametrically on the time $S$ at which the observed phenomenon starts,
$$\lambda_{X|S,Z}(x \mid S, Z) = \lambda_{X|S}(x; S)\, e^{\beta^T Z(S+x)}, \tag{16.1}$$

• a model where the regression coefficients are smooth functions of an observed variable $X$,
$$\lambda(t \mid X, Z) = \lambda(t)\, e^{\beta(X)^T Z(t)}. \tag{16.2}$$

These models have been discussed and studied in the literature by several authors, in particular by Keiding (1991) for (16.1), by Brown (1975) and Zucker and Karr (1990) for time-varying coefficients, and by Hastie and Tibshirani (1993) for (16.2). In Pons and Visser (2000) and Pons (1999), I proposed new estimators of the regression parameters and the cumulative baseline hazard function, and I studied their asymptotic properties. They are based on the local likelihood estimation method introduced by Tibshirani and Hastie (1987) for non-parametric regression models and adapted with kernels by Staniswalis (1989). They are defined using a kernel $K$ that is a continuous and symmetric density with support $[-1, 1]$, and $K_{h_n}(s) = h_n^{-1} K(h_n^{-1} s)$, where the bandwidth $h_n$ tends to zero at a convenient rate. The asymptotic properties of the estimators $\hat\beta_n$ and $\hat\Lambda_n$ of $\beta$ and of the cumulative baseline hazard function will be presented for both models. They follow the classical lines, but the kernel estimation requires modifications. In model (16.1), the convergence rate of $\hat\beta_n$ is not modified by the kernel smoothing: it is $n^{-1/2}$, as in the Cox model, and $\hat\Lambda_n$ converges at the non-parametric rate expected for a kernel estimator. In model (16.2), $\hat\Lambda_n$ only involves kernel terms through the regression functions, but both $\hat\beta_n$ and $\hat\Lambda_n$ have the same non-parametric rate of convergence, as was also the case for the spline estimators in Zucker and Karr (1990). Goodness-of-fit test statistics for the classical Cox model against the alternatives of models (16.1) or (16.2) are deduced from these results.

16.2 Non-Stationary Cox Model


On a probability space $(\Omega, \mathcal{F}, P_0)$, let $S$ and $T^0$ be the times at which a phenomenon starts and ends respectively, and let $X = T^0 - S > 0$ denote the duration

of the phenomenon. Let $Z$ be a $p$-dimensional left-continuous process of covariates with right-hand limits. We assume that the conditional hazard function of $X$ given $(S, Z)$ follows (16.1) with parameters $(\beta_0, \lambda_{0,X|S})$, that $T^0$ may be right-censored at a random time $C$, independent of $(S, T^0)$ conditionally on $Z$ and non-informative for $\beta_0$ and $\lambda_{0,X|S}$, and that $S$ is uncensored. We observe a sample of $n$ observations $(S_i, T_i, \delta_i, Z_i)_{1 \le i \le n}$ drawn from the distribution of $(S, T, \delta, Z)$, where $T = T^0 \wedge C$ and $\delta = 1_{\{T^0 \le C\}}$ is the censoring indicator. The data are observed on a finite time interval $[0, \tau]$ strictly included in the support of the distributions of the variables $S$, $T^0$ and $C$, and $(S, T^0)$ belongs to the triangle $I_\tau = \{(s, x) \in [0, \tau] \times [0, \tau];\ s + x \le \tau\}$. Up to additive terms constant in $(\beta, \lambda_{X|S})$, the log-likelihood of $(S_i, T_i, \delta_i, Z_i)$ under (16.1) is
$$l^{(i)} = \delta_i\left\{\log \lambda_{X|S}(X_i; S_i) + \beta^T Z_i(T_i)\right\} - \int_0^\tau Y_i(y) \exp\left\{\beta^T Z_i(S_i + y)\right\} \lambda_{X|S}(y; S_i)\, dy.$$

For $S_i$ in a neighborhood of $s$, the baseline hazard $\lambda_{X|S}(\cdot\,; S_i)$ is approximated by $\lambda_{X|S}(\cdot\,; s)$, which yields an approximation $l^{(i)}(s)$ for $l^{(i)}$, and the local log-likelihood at $s \in [h_n, \tau - h_n]$ is defined as $l_n(s) = \sum_i K_{h_n}(s - S_i)\, l^{(i)}(s)$. The estimator of $\Lambda_{0,X|S}(x; s) = \int_0^x \lambda_{0,X|S}(y; s)\, dy$ is then defined for $(s, x) \in I_{n,\tau} = \{(s, x);\ s \in [h_n, \tau - h_n],\ x \in [0, \tau - s]\}$ by $\hat\Lambda_{n,X|S}(x; s) = \tilde\Lambda_{n,X|S}(x; s, \hat\beta_n)$, with
$$\tilde\Lambda_{n,X|S}(x; s, \beta) = \sum_i \frac{K_{h_n}(s - S_i)\, 1_{\{S_i \le C_i,\ X_i \le x \wedge (C_i - S_i)\}}}{n\, S_n^{(0)}(X_i; s, \beta)},$$
where $S_n^{(0)}(x; s, \beta) = n^{-1} \sum_j K_{h_n}(s - S_j)\, Y_j(x) \exp\{\beta^T Z_j(S_j + x)\}$ and $Y_i(x) = 1_{\{T_i^0 \wedge C_i \ge S_i + x\}} = 1_{\{\bar X_i \ge x\}}$, with $\bar X_i = X_i \wedge (C_i - S_i)$. The estimator $\hat\beta_n$ of the regression coefficient maximizes the following partial likelihood
$$l_n(\beta) = \sum_i \delta_i \left[\beta^T Z_i(T_i^0) - \log\left\{n\, S_n^{(0)}(X_i; S_i, \beta)\right\}\right] c_n(S_i),$$
i

where $c_n(s) = 1_{[h_n, \tau - h_n]}(s)$.
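For illustration, the estimator $\tilde\Lambda_{n,X|S}$ can be transcribed directly in Python; the following is a minimal sketch with our own names, for a scalar covariate path supplied as a function Z_at(i, t).

import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def lambda_tilde(x, s, S, X, C, Z_at, beta, h):
    """Kernel-weighted Breslow-type estimator of Lambda_{0,X|S}(x; s):
    sum of K_h(s - S_i) / (n * S_n^(0)(X_i; s, beta)) over uncensored
    durations X_i <= x."""
    n = len(S)
    K = epanechnikov((s - S) / h) / h          # K_h(s - S_i)
    Xbar = np.minimum(X, C - S)                # observed durations
    delta = X <= C - S                         # uncensored indicator
    total = 0.0
    for i in np.where(delta & (X <= x))[0]:
        z = np.array([Z_at(j, S[j] + X[i]) for j in range(n)])
        S0 = np.mean(K * (Xbar >= X[i]) * np.exp(beta * z))  # S_n^(0)(X_i; s, beta)
        total += K[i] / (n * S0)
    return total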
In this section, we shall use the following notation and assumptions. We assume that under $P_0$, $S$ has a density $f_S$ such that, for some $\eta > 0$, the supports of the distributions of $S$ and $C$ contain $[-\eta, \tau + \eta]$, and we define $I_{\tau,\eta} = \{(s, x):\ x \in [0, \tau],\ s \in [-\eta, \tau + \eta],\ s + x \in [-\eta, \tau + \eta]\}$. For a vector $z$ in $\mathbb{R}^p$, let $z^{\otimes 0} = 1$, $z^{\otimes 1} = z$ and $z^{\otimes 2} = z z^T$. The norms in $\mathbb{R}^p$ and in $(\mathbb{R}^p)^{\otimes 2}$ are denoted $\|\cdot\|$, and the uniform norm of a function or a process on a set $J$ is denoted $\|\cdot\|_J$. For $k = 0, 1, 2$, $(s, x)$ in $I_{n,\tau}$ and under the next conditions, let

$$S_n^{(k)}(x; s, \beta) = n^{-1} \sum_j K_{h_n}(s - S_j)\, Y_j(x)\, Z_j^{\otimes k}(S_j + x)\, e^{\beta^T Z_j(S_j + x)},$$
$$s^{(k)}(x; s, \beta) = \mathbb{E}\left[Y(x)\, Z^{\otimes k}(s + x)\, e^{\beta^T Z(s + x)} \mid S = s\right] f_S(s),$$

$$s_n^{(k)}(x; s, \beta) = \mathbb{E}\left[S_n^{(k)}(x; s, \beta)\right] = \int_0^\tau K_{h_n}(s - u)\, s^{(k)}(x; u, \beta)\, du,$$
$$s^{(k,l)}(x, y; s) = \mathbb{E}\left[Y(y)\, Z^{\otimes k}(s + x)\, e^{\beta_0^T Z(s + x)} \left(Z^{\otimes l}(s + y)\right)^T e^{\beta_0^T Z(s + y)} \mid S = s\right] f_S(s), \quad k, l = 0, 1,$$
$$I(\beta) = \int_{I_\tau} \left\{s^{(2)} s^{(0)-1} - \left[s^{(1)} s^{(0)-1}\right]^{\otimes 2}\right\}(x; s, \beta)\; s^{(0)}(x; s, \beta_0)\, \lambda_{0,X|S}(x; s)\, dx\, ds,$$
$$H_n(s, x) = n^{-1} \sum_i \delta_i\, 1_{\{S_i \le s\}} 1_{\{X_i \le x\}},$$
$$H(s, x) = P_0(S \le s,\ X \le x,\ \delta = 1),$$
$$H_{1n}(x; s) = \int_0^\tau K_{h_n}(s - u)\, H(du, x),$$
$$H_1(x; s) = \int_0^x s^{(0)}(y; s)\, \lambda_{0,X|S}(y; s)\, dy,$$
$$W_i^{(k)}(s, x) = Z_i^{\otimes k}(S_i + x)\, e^{\beta_0^T Z_i(S_i + x)}\, 1_{\{S_i \le s\}} 1_{\{\bar X_i \ge x\}}, \quad i = 1, \ldots, n,$$
$$W^{(k)} = \mathbb{E}\, W_i^{(k)} \quad \text{and} \quad W_n^{(k)} = n^{-1} \sum_i W_i^{(k)}.$$
We also denote $\gamma_2(K) = \int_{-1}^1 K^2(z)\, dz$, and we omit $\beta_0$ in the notations. We assume that the next conditions hold:
C1 $\beta_0$ and $(\hat\beta_n)_n$ belong to the interior of $B$, a compact and convex subset of $\mathbb{R}^p$, $\|\lambda_{0,X|S}\|_{I_{\tau,\eta}} < \infty$ and $\|f_S\|_{[-\eta, \tau+\eta]} < \infty$.

C2 The process $Z$ has sample paths in a subset $\mathcal{Z}$ of the space of left-continuous functions with right-hand limits, the matrix $\mathrm{Var}\, Z(t)$ is positive definite for any $t \in [0, \tau]$, $\inf_{\beta \in B} \inf_{(s,x) \in I_{\tau,\eta}} s^{(0)}(x; s, \beta) > 0$, and there exists a constant $M_1$ such that
$$\mathbb{E} \sup_{t \in [-\eta, \tau+\eta]} \sup_{\beta \in B} \left[\|Z(t)\|^{2k+1} e^{2\beta^T Z(t)}\right] \le M_1, \quad k = 0, 1, 2.$$

C3 The functions $s \mapsto \lambda_{0,X|S}(x; s)$ and $s \mapsto s^{(k)}(x; s, \beta)$, $k = 0, 1$, are twice differentiable on $\{s : (s, x) \in I_{\tau,\eta}\}$, with second derivatives such that $\sup_x \sup_{|s_1 - s_2| \le h} |\ddot\lambda_{0,X|S}(x; s_1) - \ddot\lambda_{0,X|S}(x; s_2)|$ and $\sup_x \sup_{\beta \in B} \sup_{|s_1 - s_2| \le h} \|\ddot s^{(k)}(x; s_1, \beta) - \ddot s^{(k)}(x; s_2, \beta)\|$ tend to zero as $h \to 0$, and the functions $s^{(k,l)}(x, y; \cdot)$ are continuous, uniformly in $x$ and $y$.

C4 $\int_{-1}^1 |dK(z)| < \infty$, $n h_n^3 \to \infty$ and $h_n = o(n^{-1/4})$.

C5 $\|S_n^{(k)} - s^{(k)}\|_{I_{n,\tau} \times B}$ converges in probability to zero, for $k = 0, 1, 2$.

Using an integration by parts, C5 is verified if $n^{1/2}\|W_n^{(k)} - W^{(k)}\|_{I_\tau \times B}$ converges weakly. In particular, this convergence holds under C1-C3 if $Z$ is a random

variable and if there exist conditional densities $f_{X|S,Z}$ and $g_{C|S,Z}$ for $X$ and $C$, and a constant $M_2$, such that
$$\sup_{\beta \in B}\ \sup_{(s, x_1, x_2):\ (s, x_1), (s, x_2) \in I_{\tau,\eta}} \mathbb{E}\left[\left\|Z^{\otimes k}(s + x_1)\, e^{\beta^T Z(s + x_1)}\right\|^2 \left\{g_{C|S,Z}(s + x_2; s, Z) + f_{X|S,Z}(x_2; s, Z)\right\} \,\Big|\, S = s\right] \le M_2.$$

The weak consistency of $\hat\beta_n$ and the asymptotic normality of $n^{1/2}(\hat\beta_n - \beta_0)$ are established by the classical arguments of maximum likelihood estimation. For large $n$, an expansion of the score process gives
$$\sup_{\|\beta - \beta_0\| \le \varepsilon} \|I_n(\beta) - I(\beta_0)\| \xrightarrow{P} 0 \quad \text{as } \varepsilon \to 0 \text{ and } n \to \infty,$$
and
$$U_n = n^{-1}\, (dl_n/d\beta)(\beta_0) = n^{-1} \sum_i c_n(S_i)\, \delta_i \left\{Z_i(T_i) - \left[S_n^{(1)} S_n^{(0)-1}\right](X_i; S_i, \beta_0)\right\}$$
is approximated using a statistic of the form $\sum_{i \ne j} \varphi_n(\xi_i, \zeta_j)$, with $\xi_i = (S_i, \bar X_i, \delta_i)$ and $\zeta_j = (S_j, X_j \wedge (C_j - S_j), Z_j)$. Denoting $\psi_n((\xi_i, \zeta_i), (\xi_j, \zeta_j)) = \frac{1}{2}\{\varphi_n(\xi_i, \zeta_j) + \varphi_n(\xi_j, \zeta_i)\}$, we obtain a U-statistic of order 2, $\sum_{i \ne j} \psi_n((\xi_i, \zeta_i), (\xi_j, \zeta_j))$, and the weak convergence of $n^{1/2} U_n$ follows from a Hoeffding decomposition [Serfling (1980)].

Theorem 16.2.1 $n^{1/2}(\hat\beta_n - \beta_0)$ converges weakly to a Gaussian variable $N(0, I^{-1}(\beta_0))$, and $I^{-1}(\beta_0)$ is the minimal variance for a regular estimator of $\beta_0$.

For fixed $s$ in $[h_n, \tau - h_n]$, let $D([0, \tau - s])$ be the space of right-continuous real functions with left-hand limits on $[0, \tau - s]$. In Pons and Visser (2000), conditions are given for the weak convergence of the process $\hat L_n(\cdot\,; s) = (n h_n)^{1/2}(\hat\Lambda_{n,X|S} - \Lambda_{0,X|S})(\cdot\,; s)$ in the Skorohod topology on $D([0, \tau - s])$. Its limit is a continuous Gaussian process $L$ with mean zero and covariance $\sigma^2(x' \wedge x; s)$ at $(x; s)$ and $(x'; s)$ in $I_{n,\tau}$, where
$$\sigma^2(x; s) = \gamma_2(K) \int_0^x s^{(0)-1}(y; s)\, \lambda_{0,X|S}(y; s)\, dy. \tag{16.3}$$
The asymptotic variance of $\hat L_n(x; s)$ is consistently estimated by
$$\hat\sigma^2(x; s) = n h_n \sum_i \frac{K_{h_n}^2(s - S_i)\, 1_{\{S_i \le C_i,\ X_i \le x \wedge (C_i - S_i)\}}}{\left[n\, S_n^{(0)}(X_i; s, \hat\beta_n)\right]^2}, \tag{16.4}$$

which provides a point-wise asymptotic confidence interval for $\Lambda_{0,X|S}(x; s)$.
The weak convergence of $\hat L_n$ can be extended to $I_{n,\tau}$. Let $\alpha_n^{(0)} = n^{1/2}(W_n^{(0)} - W^{(0)})$, $\alpha_n^H = n^{1/2}(H_n - H)$ and $\alpha_n = (\alpha_n^{(0)}, \alpha_n^H)$. For every $n$, the variables $\alpha_n(s, x)$ and $\alpha_n(s', x')$ have the same covariance $\Sigma_\alpha(s, s', x, x')$, defined by
$$\mathbb{E}\{\alpha_n^H(s, x)\, \alpha_n^H(s', x')\} = H(s \wedge s', x \wedge x') - H(s, x)\, H(s', x'),$$
$$\mathbb{E}\{\alpha_n^{(0)}(s, x)\, \alpha_n^{(0)}(s', x')\} = \mathbb{E}\left[1_{\{S_i \le s \wedge s'\}}\, 1_{\{\bar X_i \ge x \vee x'\}}\, e^{\beta_0^T\{Z_i(S_i + x) + Z_i(S_i + x')\}}\right] - W^{(0)}(s, x)\, W^{(0)}(s', x'),$$
$$\mathbb{E}\{\alpha_n^{(0)}(s, x)\, \alpha_n^H(s', x')\} = \mathbb{E}\left[1_{\{S_i \le s \wedge s'\}}\, 1_{\{x \le X_i \le x'\}}\, e^{\beta_0^T Z_i(S_i + x)}\right] - W^{(0)}(s, x)\, H(s', x') \quad \text{if } x \le x',$$
$$\mathbb{E}\{\alpha_n^{(0)}(s, x)\, \alpha_n^H(s', x')\} = -W^{(0)}(s, x)\, H(s', x') \quad \text{if } x \ge x', \tag{16.5}$$
thus $\Sigma_\alpha(s, s', x, x')$ is written in the form $C_\alpha(s \wedge s', x, x') - E_\alpha(s, x)\{E_\alpha(s', x')\}^T$.

Theorem 16.2.2 If there exists a sequence of centered Gaussian processes $B_n$ on $I_{n,\tau}$, with covariance matrix $\Sigma_\alpha$ (16.5), such that $\|\alpha_n - B_n\|_{I_{n,\tau}} = o_p(h_n^{1/2})$, and if $s^{(0)}(\cdot, s)$ has a derivative $\dot s^{(0)}(\cdot, s)$ such that $\dot s^{(0)}$ is continuous on $I_{\tau,\eta}$, then $(n h_n)^{1/2}(\hat\Lambda_{n,X|S} - \Lambda_{0,X|S})$ converges weakly on $I_{n,\tau}$ to a centered Gaussian process $L$.

PROOF. On $I_{n,\tau}$, $\hat L_n$ develops as $\hat L_n = \tilde L_n + L_{1n} + L_{2n}$, where the bias term $\tilde L_n$ is a uniform $O((n h_n^5)^{1/2})$ under the regularity conditions, and
$$L_{1n}(x; s) = (n h_n)^{1/2} \int_0^x \left\{S_n^{(0)-1}(y; s, \hat\beta_n) - S_n^{(0)-1}(y; s)\right\} \hat H_{1n}(dy; s),$$
$$L_{2n}(x; s) = (n h_n)^{1/2} \left\{\int_0^x S_n^{(0)-1}(y; s)\, \hat H_{1n}(dy; s) - \int_0^x s_n^{(0)-1}(y; s)\, H_{1n}(dy; s)\right\},$$
where $\hat H_{1n}(x; s) = \int_0^\tau K_{h_n}(s - u)\, H_n(du, x)$ is the empirical counterpart of $H_{1n}$.
By an expansion of $S_n^{(0)-1}(\hat\beta_n)$ for $\hat\beta_n$ in a neighborhood of $\beta_0$ and by Theorem 16.2.1, $L_{1n}$ is $o_p(1)$ uniformly on $I_{n,\tau}$.
Finally, $L_{2n}$ is approximated by
$$s^{(0)-1}(x; s)\, \bar\alpha_n^H(x; s) - \int_0^x s^{(0)-2}(y; s)\, \bar\alpha_n^H(y; s)\, \dot s^{(0)}(y; s)\, dy - \int_0^x s^{(0)-2}(y; s)\, \bar\alpha_n^{(0)}(y; s)\, H_1(dy; s),$$
with $\bar\alpha_n(x; s) = \int_0^\tau K_{h_n}(s - u)\, \alpha_n(x, du)$ for $(s, x) \in I_{n,\tau}$; an integration by parts entails that the difference between $L_{2n}$ and this approximation tends to zero in probability.
and it tends to zero in probability. The processes defined on IT as the


B~
f;
stochasticintegralB~(x;s) = hlj2 Khn(S-u)Bn(du,x) are centered Gaussian
processes with the same limit covariance since f K 2(z)COI (dz,x,x') is finite
for every x and x', from the expression (16.5). Moreover, under the above
assumptions s~O)(.,s) has a derivative 8~0)(.,s) and 8~0) converges to 8(0) on IT,
therefore IIL2n - L2nliln r converges to zero in probability, with

L 2n (x;s) = s(O)-I(x;s)B;;*(x;s) - fox s(0)-2(y;s)B;;*(Y;S)8(0)(y;s)dy


- fox s(0)-2(y;s)B~0)*(y;s)Hl(dy;s)
and Ln converges weakly to a centered process L whose covariance is the lim-
iting covariance of L 2n .

If $Z$ is a variable with values in a bounded subset $\mathcal{Z}$ of $\mathbb{R}$, let $\nu_n$ be the empirical process of the variables $(S_i, X_i, Z_i)_{i \le n}$ and $\nu_n^\delta$ be the empirical process of the variables $(S_i, X_i, Z_i)$ with $\delta_i = \delta$, $i \le n$. By Theorem 1 in Massart (1989), there exist a constant $c$ and a sequence of Gaussian processes $G_n$ such that $P_0(\|\nu_n - G_n\|_{I_\tau \times \mathcal{Z}} > c (\log n)^{3/2} n^{-1/6})$ tends to zero as $n \to \infty$. As in Burke, Csörgő, and Horváth (1981) and in Castelle (1991), for the empirical process of a one-dimensional variable associated with a discrete variable, it is then expected that $(\nu_n, \nu_n^0, \nu_n^1)$ may be approximated by Gaussian processes at the same rate, so that the rate of the Gaussian approximation of $(\nu_n, \nu_n^0, \nu_n^1)$ is an $o_p(h_n^{1/2})$ if $h_n = O(n^{-d})$ with $d < 1/3$. As $\alpha_n^{(0)}(s, x)$ and $\alpha_n^H(s, x)$ are integrals, with respect to these empirical processes, of the functions appearing in $W_i^{(0)}$ and in $H_n$,

the approximation $\|\alpha_n - B_n\|_{I_\tau} = o_p(h_n^{1/2})$ assumed in Theorem 16.2.2 is then satisfied.
A goodness-of-fit test of a Cox model for the duration $X$, against the alternative of a non-stationary model (16.1), may be viewed as a test for stationarity, with the hypothesis $H_0$: $\lambda_{X|S}(x; s) = \lambda_X(x)$ for every $(s, x) \in I_\tau$, hence
$$\lambda_{X|S,Z}(x; S, Z) = \lambda_X(x) \exp\{\beta^T Z(S + x)\}.$$

Tests for $H_0$ can be based on the difference between the estimated hazard functions under the hypothesis and under the alternative, i.e. on the process $D_{1n}$ defined on $I_{n,\tau}$ by
$$D_{1n}(x; s) = (n h_n)^{1/2}\left(\hat\Lambda_{n,X|S}(x; s) - \hat\Lambda_{n,X}(x)\right),$$
where $\hat\Lambda_{n,X}$ is Breslow's estimator of $\Lambda_X$ under the null hypothesis. Under $H_0$ and under conditions C1-C5, $\hat\Lambda_{n,X} - \Lambda_{0,X|S} = O_p(n^{-1/2})$, therefore
$$D_{1n} = (n h_n)^{1/2}\left(\hat\Lambda_{n,X|S} - \Lambda_{0,X|S}\right) + o_p(1)$$
and it converges weakly to a Gaussian process under the conditions of Theorem 16.2.2. Under the alternative, $\|D_{1n}\|_{I_{n,\tau}}$ tends in probability to infinity.
As for the comparison of the hazard functions of several groups [Andersen et al. (1993)], nonparametric tests for comparing $\lambda_{X|S}$ and $\lambda_X$ can be based on weighted versions of this difference, with a weight process that is predictable with respect to the filtration generated by the observations in the duration scale under $H_0$. For instance, a test statistic can be based on a discretization of $D_{1n}$ on a finite grid: let $(x_j)_{0 \le j \le J}$ be an increasing sequence of $[0, \tau]$ with $x_0 = 0$, and let $V_{1n}(s)$ be the vector of dimension $J$ with components the increments $D_{1n}(x_j; s) - D_{1n}(x_{j-1}; s)$, $1 \le j \le J$.
By (16.3) and (16.4), the components of $V_{1n}(s)$ are asymptotically independent, and their asymptotic variance $v_j(s) = \sigma^2(x_j; s) - \sigma^2(x_{j-1}; s)$ is estimated by $\hat v_{nj}(s) = \hat\sigma^2(x_j; s) - \hat\sigma^2(x_{j-1}; s)$. Let $A_{1n}(s)$ be the diagonal matrix with elements $\hat v_{nj}(s)$; under the conditions of Theorem 16.2.2, the process $A_{1n}^{-1/2} V_{1n}$ tends to a Gaussian process with mean zero and variance identity. However, its asymptotic distribution under $H_0$ depends on the unknown parameters $\lambda_0$ and $\beta_0$ through its covariances. Bootstrap tests with statistics based on this process may be used, with $F_{n,S}$ the empirical distribution function of $(S_i)_{i \le n}$.


A test for stationarity of the baseline hazard function against an alternative
of differences at fixed times Si, of [hn, T - hnl can very simply be performed if
(Si)i<I is an increasing sequence with Si - Si-l > 2h n . Let Vn be the vector of
dimension I J with components the variables

1 :::; i :::; I, 1 :::; j :::; J

and let An be the diagonal matrix with elements

defined by (16.4). From the expression of ~a, the covariance of

(S, x) and (S', x') E In,T, is asymptotically equivalent to

and therefore to zero if Is-s/l > 2h n since K is zero outside [-1,1]. Then under
C1-C5, the statistic V; A~l Vn has an asymptotic X2 distribution with I J de-
grees offreedom under Ho and it tends in probability to infinity ifAxls(Xj; Si) =f
Ax(xj) for some i :::; I and j :::; J.
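A minimal Python sketch of this chi-square test, assuming the vector components are the increments of $D_{1n}$ over the grid and that the variance estimates (16.4) are available; all names are ours.

import numpy as np
from scipy.stats import chi2

def stationarity_chi2(d1n, sigma2_hat, alpha=0.05):
    """d1n[i, j] = D_{1n}(x_j; s_i) and sigma2_hat[i, j] = estimated
    sigma^2(x_j; s_i) from (16.4), on times s_i spaced more than 2 h_n apart."""
    V = np.diff(d1n, axis=1, prepend=0.0)          # increments over the x-grid
    v = np.diff(sigma2_hat, axis=1, prepend=0.0)   # their estimated variances
    stat = np.sum(V ** 2 / v)                      # V' A_n^{-1} V
    df = V.size                                    # I * J degrees of freedom
    return stat, stat > chi2.ppf(1.0 - alpha, df=df)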

16.3 Varying-Coefficient Cox Model

On a probability space $(\Omega, \mathcal{F}, P_0)$, we observe a sample $(T_i, \delta_i, X_i, Z_i)_{1 \le i \le n}$ of $n$ independent and identically distributed observations, where $T_i = T_i^0 \wedge C_i$ and $\delta_i = 1_{\{T_i^0 \le C_i\}}$ are defined as in the introduction, $X_i$ is a real random variable and $Z_i$ is a $p$-dimensional vector of processes. We assume that, conditionally on $(X_i, Z_i)$, $T_i^0$ has a hazard function defined by (16.2) with parameters $(\beta_0, \lambda_0)$, and $C_i$ is independent of $T_i^0$ conditionally on $(X_i, Z_i)$. We suppose that the time observations $T_i$ belong to a finite time interval $[0, \tau]$ and that the $X_i$'s have a density $f_X$. The problem is to estimate the function $\beta_0$ on a compact subset $J_X$ of the support of $f_X$, so that $\beta_0(x)$ belongs to a compact set $B$ for

every $x$ in $J_X$. In Pons (1999), the estimator $\hat\beta_n(x)$ is defined as the value of $\beta$ which maximizes the local partial likelihood $l_{n,x}(\beta)$, where $Y_i(t) = 1_{\{T_i \ge t\}}$ is the risk indicator for individual $i$ at $t$, and an estimator of the integrated baseline hazard function follows.
We assume that $J_X$ and $J_{X,\eta} = \{x : \inf_{y \in J_X} |x - y| \le \eta\}$ are strictly included in $\mathcal{X} = \{x \in \mathbb{R} : f_X(x) > 0\}$ for some $\eta > 0$, and we modify the notations and conditions of Section 16.2, where $\beta$ is now a function satisfying the conditions below. For $\beta \in \mathbb{R}^p$, $x \in J_X$, let
$$S_n^{(k)}(t, \beta, x) = \sum_i K_{h_n}(x - X_i)\, Y_i(t)\, Z_i^{\otimes k}(t)\, e^{\beta^T Z_i(t)},$$
$$\tilde S_n^{(k)}(t, x) = \sum_i K_{h_n}(x - X_i)\, Y_i(t)\, Z_i^{\otimes k}(t)\, e^{\beta_0(X_i)^T Z_i(t)},$$
$$m^{(k)}(t, \beta, x) = \mathbb{E}\left\{Y_i(t)\, Z_i^{\otimes k}(t)\, e^{\beta^T Z_i(t)} \mid X_i = x\right\},$$
$$s^{(k)}(t, \beta, x) = m^{(k)}(t, \beta, x)\, f_X(x),$$
$$\tilde s^{(k)}(t, x) = m^{(k)}(t, \beta_0(x), x)\, f_X(x),$$
$$I(\beta, x) = \int_0^\tau \left\{s^{(2)} - s^{(1)\otimes 2}\, s^{(0)-1}\right\}(t, \beta, x)\, d\Lambda_0(t),$$
$$s_n^{(k)}(t, \beta, x) = \mathbb{E}\, n^{-1} S_n^{(k)}(t, \beta, x) = \int K_{h_n}(x - y)\, m^{(k)}(t, \beta, y)\, f_X(y)\, dy,$$
$$\tilde s_n^{(k)}(t, x) = \mathbb{E}\, n^{-1} \tilde S_n^{(k)}(t, x) = \int K_{h_n}(x - y)\, m^{(k)}(t, \beta_0(y), y)\, f_X(y)\, dy,$$
$$m^{(k,l)}(s, t, x_1, x_2, x) = \mathbb{E}\left\{Y_i(t)\, Z_i^{\otimes k}(s)\, \left(Z_i^{\otimes l}(t)\right)^T e^{\beta_0(x_1)^T Z_i(s) + \beta_0(x_2)^T Z_i(t)} \mid X_i = x\right\},$$
$$x_1, x_2, x \in J_{X,\eta}, \quad k, l = 0, 1.$$



Now, we denote by $U_n(\beta, x)$ and $-I_n(\beta, x)$ the first two derivatives of $n^{-1} l_{n,x}(\beta)$ with respect to $\beta$, and we simply write
$$U_n(x) = U_n(\beta_0(x), x),$$
$$S_n^{(k)}(t, x) = S_n^{(k)}(t, \beta_0(x), x),$$
$$s_n^{(k)}(t, x) = s_n^{(k)}(t, \beta_0(x), x),$$
$$s^{(k)}(t, x) = s^{(k)}(t, \beta_0(x), x).$$

We assume throughout that the next conditions C1-C5 are satisfied:

C1 For every $x$ in $J_{X,\eta}$, $f_X(x) > 0$ and $\beta_0(x)$ belongs to the interior of a compact and convex subset $B$ of $\mathbb{R}^p$.

C2 The process $Z$ has sample paths in a subset $\mathcal{Z}$ of the space of left-continuous functions with right-hand limits, the matrix $\mathrm{Var}\{Z(t) \mid X = x\}$ is positive definite for any $t$ in $[0, \tau]$ and $x$ in $J_{X,\eta}$, $\Lambda_0(\tau) < \infty$, $\|f_X\|_{J_X} < \infty$,
$$\inf_{t \in [0, \tau]}\ \inf_{\beta \in B}\ \inf_{x \in J_{X,\eta}} s^{(0)}(t, \beta, x) > 0,$$
and there exists a constant $M_1$ such that, for $k = 0, 1, 2$,
$$\mathbb{E} \sup_{t \in [0, \tau]} \sup_{\beta \in B}\left[\|Z(t)\|^{2k+1} e^{2\beta^T Z(t)}\right] \le M_1.$$

C3 The functions $\beta_0$ and $f_X$ are twice continuously differentiable on $J_{X,\eta}$. For $k, l = 0, 1$ and for every $s$ and $t$ in $[0, \tau]$ and $\beta$ in $B$, the functions $m^{(k,l)}(s, t, \cdot, \cdot, \cdot)$ are continuous on $J_{X,\eta}^3$, and the functions $m^{(k)}(t, \cdot, \cdot)$ are twice continuously differentiable on $B \times J_{X,\eta}$, with second derivatives such that
$$\lim_{\varepsilon, \varepsilon' \to 0}\ \sup_{|x_1 - x_2| \le \varepsilon}\ \sup_{\|\beta_1 - \beta_2\| \le \varepsilon'}\ \sup_{t \in [0, \tau]} \|\ddot m^{(k)}(t, \beta_1, x_1) - \ddot m^{(k)}(t, \beta_2, x_2)\| = 0.$$

C4 $n h_n \to \infty$ and $h_n = o(n^{-1/5})$.

C5 The variables $\|n^{-1} S_n^{(k)} - s^{(k)}\|_{[0,\tau] \times B \times J_X}$ and $\|n^{-1} \tilde S_n^{(k)} - \tilde s^{(k)}\|_{[0,\tau] \times J_X}$ converge in probability to zero, $k = 0, 1, 2$.

By classical arguments, $\hat\beta_n(x)$ and $I_n(\beta, x)$ converge in probability to $\beta_0(x)$ and $I(\beta, x)$ for any $x$ in $J_X$ and $\beta$ in $B$, and this point-wise weak consistency is extended to uniform convergence under C5: the variables
$$\sup_{x \in J_X} \|\hat\beta_n(x) - \beta_0(x)\| \quad \text{and} \quad \sup_{x \in J_X}\ \sup_{\|\beta - \beta'\| \le \|\hat\beta_n(x) - \beta_0(x)\|} \|I_n(\beta, x) - I(\beta', x)\|$$

converge in probability to zero.


Let
$$\bar U_n(x) = n^{-1} \sum_i K_{h_n}(x - X_i)\, \delta_i \left\{Z_i(T_i) - \left(S_n^{(1)} S_n^{(0)-1}\right)(T_i, x)\right\},$$
$x$ in $J_X$. The asymptotic normality of $(n h_n)^{1/2}(\hat\beta_n(x) - \beta_0(x))$ relies on a uniform expansion of the score process, which implies an expansion of $\hat\beta_n(x) - \beta_0(x)$ in terms of $\bar U_n(x)$ with a remainder $r_n$ such that $\|r_n\|_{J_X} = o_p(1)$. Using a Hoeffding decomposition of $\bar U_n(x)$, we obtain the approximation $(n h_n)^{1/2}\, \bar U_n(x) = n^{1/2}\, U_n^*(x)\{1 + o_p(1)\}$, with a uniform $o_p(1)$ on $J_X$, and
$$U_n^*(x) = n^{-1} \sum_i \left\{V_{1,n,i}(x) - \mathbb{E} V_{1,n,i}(x)\right\} + n^{-1} \sum_i \left\{V_{2,n,i}(x) - \mathbb{E} V_{2,n,i}(x)\right\}, \tag{16.6}$$
where $V_{1,n,i}(x)$ and $V_{2,n,i}(x)$ are the two terms of this decomposition. We deduce a point-wise weak convergence result.


Theorem 16.3.1 The variable $(n h_n)^{1/2}(\hat\beta_n - \beta_0)(x)$ converges weakly to a Gaussian variable $N(0, \gamma_2(K)\, I_0^{-1}(x))$ for every $x$ in $J_X$.
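A minimal Python sketch of the kernel-weighted partial likelihood behind $\hat\beta_n(x)$, for a single time-constant covariate; the function names are ours, and ties and numerical safeguards are ignored.

import numpy as np
from scipy.optimize import minimize

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_cox(x0, T, delta, X, Z, h):
    """Maximize the local partial likelihood
    sum_i w_i delta_i [beta Z_i - log sum_{j: T_j >= T_i} w_j exp(beta Z_j)]
    with kernel weights w_i = K((x0 - X_i)/h)/h."""
    w = epanechnikov((x0 - X) / h) / h
    def neg_loglik(beta):
        b = beta[0]
        log_risk = np.array([np.log(np.sum(w[T >= t] * np.exp(b * Z[T >= t])))
                             for t in T])
        return -np.sum(w * delta * (b * Z - log_risk))
    return minimize(neg_loglik, x0=np.zeros(1), method="Nelder-Mead").x[0]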
For the weak convergence of the process $(n h_n)^{1/2}(\hat\beta_n - \beta_0)$, let us define, for $t \in [0, \tau]$, $x$ and $y \in J_X$, the following processes:
$$W_i^{(1)}(t, y) = \delta_i\, 1_{\{T_i \le t\}} 1_{\{X_i \le y\}},$$
$$W_i^{(2)}(t, y) = \delta_i\, Z_i(T_i)\, 1_{\{T_i \le t\}} 1_{\{X_i \le y\}},$$
$$W_i^{(3)}(t, x, y) = e^{\beta_0(x)^T Z_i(t)}\, 1_{\{T_i \ge t\}} 1_{\{X_i \le y\}},$$
$$W_i^{(4)}(t, x, y) = Z_i(t)\, e^{\beta_0(x)^T Z_i(t)}\, 1_{\{T_i \ge t\}} 1_{\{X_i \le y\}}, \quad 1 \le i \le n.$$
Let $W^{(k)} = \mathbb{E} W_i^{(k)}$, $\alpha_n^{(k)} = n^{1/2}(n^{-1} \sum_i W_i^{(k)} - W^{(k)})$, $k = 1, \ldots, 4$, and $\alpha_n = (\alpha_n^{(1)}, \ldots, \alpha_n^{(4)})^T$. For every $n$, the variables $\alpha_n(t, x)$ and $\alpha_n(t', x')$ have the same covariance matrix $\Sigma_\alpha(t, t', x, x')$, which is again of the form
$$C_\alpha(t, t', x \wedge x') - E_\alpha(t, x)\{E_\alpha(t', x')\}^T.$$

Theorem 16.3.2 If there exists a sequence of centered Gaussian processes $B_n$ on $[0, \tau] \times J_X$ with covariance matrix $\Sigma_\alpha$ such that $\|\alpha_n - B_n\|_{[0,\tau] \times J_X} = o_p(h_n^{1/2})$, and if the functions $s^{(k)}(t, x)$, $k = 0, 1$, have a continuous derivative with respect to $t \in [0, \tau]$, then $(n h_n)^{1/2}(\hat\beta_n - \beta_0)$ converges weakly to a centered Gaussian process on $J_X$.

This result is based on the approximation of the process $\bar U_n$ given by (16.6), and the proof is similar to the convergence of the process $L_{2n}$ for Theorem 16.2.2.
The asymptotic behavior of the process $(\hat\Lambda_n - \Lambda_0)$ relies on an expansion of $S_n^{(0)}(\hat\beta_n)$ for $\hat\beta_n$ close to $\beta_0$, and therefore on the behavior of $S_n^{(0)}(\hat\beta_n) - S_n^{(0)}(\beta_0)$. From a development of this sum, and due to the convergence rate $(n h_n)^{-1/2}$ of the estimators $\hat\beta_n(X_i)$, the corresponding process converges weakly to a Gaussian process $G$; its mean is a $O((n h_n^5)^{1/2})$ under a Lipschitz condition for $m^{(1)}(\cdot, 2\beta_0, x)$ and from the convergence in probability to zero of the variables
$$\|n^{-1} S_n^{(k)} - s^{(k)}\|_{[0,\tau] \times B \times J_X} \quad \text{and} \quad \|n^{-1} \tilde S_n^{(k)} - \tilde s^{(k)}\|_{[0,\tau] \times J_X},$$
$k = 0, 1, 2$ [Pons (1999)]. The asymptotic distribution of $\hat\Lambda_n$ is deduced from these results.

Theorem 16.3.3 The process $(n h_n)^{1/2}(\hat\Lambda_n - \Lambda_0)$ converges weakly to
$$-\int_0^\cdot G(t)\left\{\int_{J_X} \tilde s^{(0)}(t, y)\, dy\right\}^{-2} d\Lambda_0(t).$$

A goodness-of-fit test of a Cox model for the survival time $T^0$, against the alternative of a model (16.2) where the regression coefficients vary with the values of the variable $X$ on $J_X$, is a test of the hypothesis $H_0$: $\beta(x) = \beta \in B$ for every $x$ in $J_X$. Tests for $H_0$ can be based on the process
$$D_{2n}(x) = (n h_n)^{1/2}\left(\hat\beta_n(x) - \hat\beta_{n,0}\right),$$
$x$ in $J_X$, where $\hat\beta_{n,0}$ is the Cox estimator of the regression parameter $\beta_0$ under the null hypothesis. Under $H_0$, $\hat\beta_{n,0}$ converges to $\beta_0$ at the rate $n^{-1/2}$, so that $D_{2n} = (n h_n)^{1/2}(\hat\beta_n - \beta_0) + o_p(1)$. Under C1-C5 and under the conditions of Theorem 16.3.3, $D_{2n}$ converges weakly under $H_0$ to a Gaussian process with mean zero and variance $I_0^{-1}(x)$, and $\|D_{2n}\|_{J_X}$ tends to infinity under the alternative. A bootstrap test based on the process $D_{2n}^T(x)\, \hat I_n^{-1}(x)\, D_{2n}(x)$ may be used, where $\hat I_n(x) = I_n(\hat\beta_n(x), x)$ is a consistent estimator of $I_0(x)$.
Suppose we restrict the alternative to possible differences between the values $\beta(x_i)$ on a finite subset $(x_i)_{i \le I}$ of $J_X$, such that $(x_i)_{i \le I}$ is an increasing sequence in $J_X$ with $x_i - x_{i-1} > 2h_n$. Let $V_{2n}$ be the vector of dimension $pI$ with components the variables $D_{2n}(x_i)$, $i \le I$. From the expression of the covariances of $\alpha_n$, the covariance of
$$\int_0^\tau K_{h_n}(x - u)\, \alpha_n(du) \quad \text{and} \quad \int_0^\tau K_{h_n}(x' - u')\, \alpha_n(du')$$
tends to zero for every $x$ and $x' \in J_X$ such that $|x - x'| > 2h_n$. The asymptotic variance of the variable $V_{2n}$ is therefore a block-diagonal matrix of dimension $pI$, with the sub-matrices $I_0^{-1}(x_i)$, $i \le I$, as diagonal blocks; it is consistently estimated by the block-diagonal matrix $A_{2n}$ with sub-matrices $I_n^{-1}(\hat\beta_n(x_i), x_i)$. Then a simple test statistic for constant regression coefficients is given by $V_{2n}^T A_{2n}^{-1} V_{2n}$; under conditions C1-C5 it has an asymptotic $\chi^2$ distribution with $pI$ degrees of freedom under $H_0$, and it tends to infinity under the alternative if $\beta_0(x_i) \ne \beta_0(x_j)$ for some $i$ and $j \le I$.

References

1. Andersen, P. K., Borgan, Ø., Gill, R. D., and Keiding, N. (1993). Statistical Models Based on Counting Processes, New York: Springer-Verlag.

2. Breslow, N. E. (1972). Discussion of the paper by Cox, D. R., Journal of the Royal Statistical Society, Series B, 34, 216-217.

3. Breslow, N. E. and Crowley, J. J. (1974). A large sample study of the life table and product limit estimates under random censorship, Annals of Statistics, 2, 437-453.

4. Brown, C. (1975). On the use of indicator variables in studying the time-dependence of parameters in a response-time model, Biometrics, 31, 863-872.

5. Burke, M. D., Csörgő, M., and Horváth, L. (1981). Strong approximations of some biometric estimates under random censorship, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 56, 87-112.

6. Castelle, N. (1991). Principes d'invariance et application à la statistique de modèles gaussiens, Thesis, Orsay: Université Paris-Sud.

7. Cox, D. R. (1972). Regression models and life-tables (with discussion), Journal of the Royal Statistical Society, Series B, 34, 187-220.

8. Dabrowska, D. M. (1987). Non-parametric regression with censored survival time data, Scandinavian Journal of Statistics, 14, 181-197.

9. Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models, Journal of the Royal Statistical Society, Series B, 55, 757-779.

10. Keiding, N. (1991). Age-specific incidence and prevalence: A statistical perspective, Journal of the Royal Statistical Society, Series A, 154, 371-412.

11. Massart, P. (1989). Strong approximations for multivariate empirical and related processes, via KMT constructions, Annals of Probability, 17, 266-291.

12. Pons, O. and Visser, M. (2000). A non-stationary Cox model, Scandinavian Journal of Statistics, 27, 619-639.

13. Pons, O. (1999). Nonparametric estimation in a varying-coefficient Cox model, Mathematical Methods of Statistics (to appear).

14. Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics, New York: John Wiley & Sons.

15. Staniswalis, J. G. (1989). The kernel estimate of a regression function in likelihood-based models, Journal of the American Statistical Association, 84, 276-283.

16. Tibshirani, R. and Hastie, T. (1987). Local likelihood estimation, Journal of the American Statistical Association, 82, 559-567.

17. Tsiatis, A. A. (1981). A large sample study of Cox's regression model, Annals of Statistics, 9, 93-108.

18. Zucker, D. and Karr, A. (1990). Nonparametric survival analysis with time-dependent covariate effects: A penalized partial likelihood approach, Annals of Statistics, 18, 329-353.
17
Assumptions of a Latent Survival Model

Mei-Ling Ting Lee and G. A. Whitmore


Harvard University, Cambridge, Massachusetts
McGill University, Montreal, Quebec, Canada

Abstract: Whitmore, Crowder, and Lawless (1998), henceforth WCL, con-


sider a model for failure of engineering systems in which the physical degrada-
tion process is latent or unobservable but a time-varying marker, related to the
degradation process, is observable. Lee, DeGruttola, and Schoenfeld (2000),
henceforth LDS, extend the WCL model and investigate the relationship be-
tween a disease marker and clinical disease by modeling them as a bivariate
stochastic process. The disease process is assumed to be latent or unobserv-
able. The time to reach the primary endpoint or failure (for example, death,
disease onset, etc.) is the time when the latent disease process first crosses
a failure threshold. The marker process is assumed to be correlated with the
latent disease process and, hence, tracks disease, albeit imperfectly perhaps.
The general development of this latent survival model does not require the pro-
portional hazards assumption. The Wiener processes assumptions of the WCL
model and the extended model by LDS, however, must be verified in actual
applications to have confidence in the validity of the findings in these applica-
tions. In this article, we present a suite of techniques for checking assumptions
of this model and discuss a number of remedies that are available to make the
model applicable.

Keywords and phrases: First hitting time, inverse Gaussian distribution,


latent status, marker, Wiener processes

17.1 Introduction


Whitmore, Crowder, and Lawless (1998), henceforth WCL, consider a model for
failure of engineering systems in which the physical degradation process is latent
or unobservable but a time-varying marker, related to the degradation process,
is observable. Lee, DeGruttola, and Schoenfeld (2000), henceforth LDS, extend


the WCL model and investigate the relationship between a disease marker and
clinical disease by modeling them as a bivariate stochastic process. The disease
process is assumed to be latent or unobservable. The time to reach the primary
endpoint or failure (for example, death, disease onset, etc.) is the time when
the latent disease process first crosses a failure threshold. The marker process
is assumed to be correlated with the latent disease process and, hence, tracks
disease, albeit imperfectly perhaps. The general development of this latent sur-
vival model does not require the proportional hazards assumption. The Wiener
processes assumptions of the original WCL model and the extended model by
LDS, however, must be verified in actual applications to have confidence in the
validity of the findings in these applications. In this article, we present a suite
of techniques for checking assumptions of this model and discuss a number of
remedies that are available to make the model applicable.

17.2 Latent Survival Model


We shall follow the extended model adapted by LDS in considering the latent
process as a disease process, rather than an engineering degradation process as
originally proposed by WCL.
Let $\{X(t)\}$ denote a latent disease process which represents the level of deterioration of the health of a subject. Likewise, let $\{Y_w(t)\}$ represent an observed marker process that is correlated with the disease process $\{X(t)\}$ and tracks its progress. For example, in an AIDS setting, CD4 count tracks the latent disease. We consider the two-dimensional Wiener diffusion process $\{X(t), Y_w(t)\}$, for $t \ge 0$, with initial values $\{X(0), Y_w(0)\}$. The vector $\{X(t), Y_w(t)\}$ has a bivariate normal distribution with mean vector $\{X(0), Y_w(0)\} + t\mu$, where $\mu = (\mu_x, \mu_y)$, and covariance matrix $t\Sigma$, where
$$\Sigma = \begin{pmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{yx} & \sigma_{yy} \end{pmatrix}. \tag{17.1}$$
We assume that the subject's initial disease level is some negative number $X(0) < 0$. This initial level is unknown and will be estimated. We set a failure threshold at zero on the disease scale. The closer the value of $X(t)$ is to zero, the more diseased is the subject. A subject fails, i.e., reaches a primary endpoint, when the disease level first reaches the origin. The distance from the initial level to the origin will be denoted by the parameter $\delta$, where $\delta = |X(0)|$. Also, we find it convenient to consider a modified marker process that measures changes in the marker process from its initial level, i.e., we consider the marker change process $\{Y(t)\}$, where $Y(t) = Y_w(t) - Y_w(0)$. The initial marker level $Y_w(0) = y_{w0}$ is used as a baseline covariate. We denote the first passage time from the initial disease

level to the failure threshold by the random variable $S$. To relate our notation to that of WCL, replace their $a$ and $x$ by $\delta$ and $x + \delta$, respectively.

17.3 Data and Parameter Estimation


As the disease process is assumed to be latent, inferences about the disease process must be based on observations of the marker process, supplemented by observed survival times for subjects who fail. No direct observation is made of the latent disease process, although one can infer that the $X$ process has reached the failure threshold at the moment a subject fails, and that $X$ has not reached that level at any time in the interval $(0, t]$ if the subject has survived to time $t$. In the following presentation, we consider the available data to have the following form.
We consider $m$ independent subjects ($i = 1, \ldots, m$). For the $i$th subject, we observe the marker process $\{Y_i(t)\}$ at $n_i + 1$ times or ages: $0 = t_{i0} < t_{i1} < \cdots < t_{in_i}$. The realized initial marker level for subject $i$ is $Y_{wi}(t_{i0}) = Y_{wi}(0) = y_{wi0}$. The realized marker increments over the subsequent time intervals $\Delta t_{ij} = t_{ij} - t_{i,j-1}$ are denoted by $\Delta Y_{ij} = Y_{wi}(t_{ij}) - Y_{wi}(t_{i,j-1}) = \Delta y_{ij}$, for $j = 1, \ldots, n_i$ and $i = 1, \ldots, m$. We let $n = \sum_i n_i$. The corresponding realized increments of the disease process $\{X_i(t)\}$ for subject $i$ are denoted by $\Delta X_{ij} = X_i(t_{ij}) - X_i(t_{i,j-1}) = \Delta x_{ij}$, and the initial disease level is $X_i(t_{i0}) = X_i(0) = x_{i0}$. These disease increments and the initial degradation level are unobservable by assumption.
We denote the first passage time for subject $i$ by the random variable $S_i$ with realization $s_i$. If the $i$th subject is observed for a period of length $b_i$, then all subjects with $S_i > b_i$ will have censored failure times. Subjects that fail before the end of their observation periods $b_i$ are given subscripts $i = 1, \ldots, p$, where $p \le m$. We assume that $t_{in_i} = s_i$ for failing subjects and $t_{in_i} = b_i$ for subjects whose failures are censored. This last statement implies that a marker reading is available for each subject either at the time of failure or at the end of the observation period $b_i$, as the case may be.
The termination time $t_{in_i}$ for the sample path of any subject $i$ is a random variable in advance of data collection, because it is uncertain whether the sample path will end in survival, in which case $t_{in_i} = b_i$, or in failure, in which case $t_{in_i} = s_i$. Thus, the index $n_i$ of the last increment in a sample path is random, as is the length of the last time increment $\Delta t_{in_i}$.
the length of the last time increment b..tini.
As demonstrated by both WCL and LDS, maximum likelihood can be used
to estimate the model parameters. As the disease process is unobservable, one
of its parameters must be fixed arbitrarily. For example, the initial disease
level XiO or the variance CY xx might be set to unity. LDS implemented their
estimation procedure and demonstrated how the model parameters may be

related to relevant baseline covariates of the subjects using generalized link


functions. In the following discussion, we do not take explicit account of possible
individual variation in model parameters that arises because of different baseline
covariates. Modifications in the methods that are needed to take account of
individual parametric variability will be apparent.

17.4 Model Validation Methods


The latent survival model makes a number of assumptions that would require
verification in any application to ensure confidence in scientific findings based
on it. These assumptions center mainly on the bivariate Wiener process for the
joint latent disease and marker processes and the use of a first passage time as a
survival time. As a Wiener process requires independent normally distributed
increments, checks are required for these two features. The first passage time
to a fixed barrier in a Wiener process with drift towards the barrier follows an
inverse Gaussian distribution so the censored survival data provide a check on
this feature. In this section we describe methods for examining the available
data to check these assumptions.
LDS illustrated the application of a latent survival model to data from
the AIDS Clinical Trial Group (ACTG) 116A Study. In brief, this study was a
multicenter, randomized double-blind trial that compared the clinical efficacy of
zidovudine (at varying dosage levels) with two dosage levels of didanosine (500
and 750 mg/d) for patients with AIDS, AIDS-related complex or asymptomatic
HIV. Death was the primary study endpoint. Their data file contained records
for 787 patients. The logarithm of CD4 cell count was taken as the marker.
LDS have illustrated several of the model checking methods listed below. We
give a cross-reference to LDS in each case so the reader can see an application
of the checking method.

1. Checking the Wiener Property of the Latent Disease Process:


The censored survival data can help in verifying the form of the latent
disease process. As already noted, the first passage time to a fixed barrier
in a Wiener process follows an inverse Gaussian (IG) distribution. Hence,
a probability plot can be used to compare the censored survival data
to this distribution. Alternatively, a Kaplan-Meier (KM) nonparametric
survival function plot can be compared with the IG survival curves implied
by the model. LDS illustrate a comparison of IG and KM survival plots for
their case study; see Figure 3, p. 757, of Lee, DeGruttola, and Schoenfeld (2000).
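A minimal Python sketch of this comparison, using the standard closed-form survival function of the first passage time of a Wiener process with drift $\mu_x > 0$ started a distance $\delta$ below the threshold; all names are ours.

import numpy as np
from scipy.stats import norm

def ig_survival(t, delta, mu_x, sigma_xx):
    """Inverse Gaussian survival function implied by the fitted disease process."""
    a = np.sqrt(sigma_xx * t)
    return (norm.cdf((delta - mu_x * t) / a)
            - np.exp(2.0 * mu_x * delta / sigma_xx)
            * norm.cdf((-delta - mu_x * t) / a))

def kaplan_meier(times, events):
    """Product-limit estimate, ignoring ties; events = 1 for failures."""
    order = np.argsort(times)
    t, d = times[order], events[order]
    at_risk = np.arange(len(t), 0, -1)
    return t, np.cumprod(1.0 - d / at_risk)

Plotting the two curves on a common time axis gives the kind of comparison illustrated by LDS.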

2. Checking the Wiener Property of the Observable Marker Process:



The marker increments $\Delta Y_{ij}$ are distributed as $N(\mu_y \Delta t_{ij}, \sigma_{yy} \Delta t_{ij})$ under the latent survival model. Therefore, given the parameter estimates $\hat\mu_y$ and $\hat\sigma_{yy}$, the $\Delta y_{ij}$ may be standardized using the known time increments $\Delta t_{ij}$ as follows:
$$w_{ij} = \frac{\Delta y_{ij} - \hat\mu_y \Delta t_{ij}}{(\hat\sigma_{yy} \Delta t_{ij})^{1/2}}. \tag{17.2}$$
A normal probability plot of these standardized values should confirm approximate normality.
The Wiener process also requires that both the mean and the variance of the marker increment $\Delta Y_{ij}$ vary in direct proportion to the time increment $\Delta t_{ij}$ used in computing the standardized values in (17.2). Therefore, a scatter plot of the $w_{ij}$ against the $\Delta t_{ij}$ should show no systematic pattern.
LDS illustrate both of these plots; see Figure 2, p. 755, and Figure 1, p. 754, respectively, of Lee, DeGruttola, and Schoenfeld (2000).
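Both checks can be run in a few lines of Python, given the pooled increments and the fitted estimates; this is a minimal sketch with our own names.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def check_increments(dy, dt, mu_hat, sigma_hat):
    """w_ij of (17.2), pooled over subjects: a normal probability plot should
    look straight, and a scatter of w against dt should show no pattern."""
    w = (dy - mu_hat * dt) / np.sqrt(sigma_hat * dt)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    stats.probplot(w, dist="norm", plot=ax1)   # normal probability plot
    ax2.scatter(dt, w, s=8)
    ax2.set_xlabel("time increment")
    ax2.set_ylabel("standardized increment")
    return w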

3. Checking the Assumption of Independent Marker Increments:

With longitudinal marker data, the assumption of independent increments can be checked by examining the autocorrelation structure of the marker increments. Specifically, consecutive marker increments of the form $\Delta Y_{ij}$ and $\Delta Y_{i,j-1}$, $j = 2, \ldots, n_i$, for subject $i$ should be uncorrelated under the independent increments assumption. As a practical matter, we note that since both of these marker increments share the same measured marker value at time $t_{i,j-1}$, i.e., they share the reading for $Y_{wi}(t_{i,j-1})$, there will be some negative autocorrelation if the marker reading is subject to independent measurement error. If the measurement error is not large, the negative correlation will be relatively small, although possibly statistically significant.
LDS did not report that they examined this model feature, but we have used their data to perform a small demonstration check, as sketched after this item. We computed the correlation coefficient for the last two marker increments for subjects in their study who had three or more longitudinal measurements of the marker. The coefficient is -0.178. It is small and negative, but significant, indicating that some measurement error is present. We note, however, that the squared correlation coefficient is only 0.03, which suggests that the measurement error variance is small relative to the marker process variance. CD4 cell counts are known to have some measurement error, so the finding is not unexpected. Thus, the marker increments in this study are reasonably consistent with the assumption of independent increments, except for the presence of small (but material) measurement error. In later discussion, we explain how measurement error can be modelled if it is judged to be great enough to warrant explicit attention.
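The demonstration check amounts to a lag-one correlation of within-subject increments; a minimal Python sketch (names are ours):

import numpy as np

def lag_one_correlation(increments_by_subject):
    """Correlate consecutive marker increments within subjects; a small
    negative value is consistent with independent measurement error."""
    pairs = [(d[:-1], d[1:]) for d in increments_by_subject if len(d) >= 2]
    a = np.concatenate([p[0] for p in pairs])
    b = np.concatenate([p[1] for p in pairs])
    return np.corrcoef(a, b)[0, 1]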

4. Goodness-of-Fit Test for Survival Time:

The following chi-square test provides a predictive check on model validity. Partition survival time into a set of time intervals. Assign each subject to an interval based on his or her actual survival time. The total counts in the intervals provide the observed frequencies for the test. Compute the survival probability for each interval for each subject from the fitted model, taking account of individual covariates. Sum these probabilities for each interval over all subjects. These probability sums provide the expected frequencies for the test. A chi-square test is then used to compare the observed and expected frequencies. A low chi-square value would indicate a valid model. The test is more discriminating if the survival probabilities are computed from a model that is fitted without the reference subject (i.e., on a remove-one basis). The test is modified in a standard way for censored survival data.
LDS do not perform this model check, as it is somewhat redundant if the methods described under item 1 above have already been applied. We also will not illustrate the method with data, as its application is straightforward in principle, albeit somewhat tedious to apply; a schematic of the computation is sketched below.
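A schematic Python sketch of the counting and comparison step, for uncensored data and with a conventional degrees-of-freedom choice; all names are ours, and the standard modifications for censoring and for estimated parameters are omitted.

import numpy as np
from scipy.stats import chi2

def predictive_chi_square(p, interval):
    """p[i, k]: fitted (ideally leave-one-out) probability that subject i
    fails in time interval k; interval[i]: interval of the actual failure."""
    K = p.shape[1]
    observed = np.bincount(interval, minlength=K)
    expected = p.sum(axis=0)              # sums of fitted interval probabilities
    stat = np.sum((observed - expected) ** 2 / expected)
    return stat, chi2.sf(stat, df=K - 1)  # a large p-value supports the model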

5. Consistency with Established Science:

The latent survival model must always meet validation checks that arise from subject-matter knowledge and the established science of the field of application. The marker, for example, should be either a known cause or a known effect of the disease. The robustness of the latent survival model has yet to be tested. Experience with other statistical methods based on normal theory suggests that parametric inferences from this model are probably more robust than predictive inferences. Some departures from the model assumptions are not likely to be a major concern. For example, a marker that must have a monotonic sample path (for example, lifetime cumulative tobacco smoking as a marker for lung disease) may still be reasonably well represented by a Wiener process. In this case, the fitted process parameters are likely to be consistent with a relatively smooth sample path, having a small coefficient of variation $(\sigma_{xx})^{1/2}/\mu_x$.
LDS provide a strong defense of the validity of the latent survival model in their case study, based on medical scientific understanding of the disease course of AIDS and its relationship to CD4 cell count.

17.5 Remedies to Achieve a Better Model Fit


A number of remedies are available that frequently can help to bring the latent
survival model into line with the data in a particular application and thus
achieve an acceptable degree of model fit. We itemize some of these remedies
below, although many others are available that may be appropriate in particular
studies but not for general use.

1. Transformation:
Appropriate transformations may bring an application context into conformity with the model. Engineering applications of this kind of model show that monotonic transformations of the time scale are often needed so that a Wiener process has a constant drift over time [see, for example, Whitmore and Schenkelberg (1997)]. Some disease or marker processes will tend to accelerate or decelerate with time. A monotonic transformation of the calendar time scale $r$, such as $t = r^\gamma$ for some $\gamma > 0$, may produce a constant process mean parameter on the transformed time scale $t$. Parameters in this time transformation (such as $\gamma$) would then require estimation; one way to profile $\gamma$ is sketched below. Transformations may be suggested by scientific knowledge rather than by the data. Longitudinal data would be useful for checking the suitability of a transformation for a marker, but there are some checks that don't require this. For example, a monotonic transformation of the time scale may bring the observed survival times into conformity with an inverse Gaussian distribution. Nonstationarity in both the marker and disease processes may be an issue. As the marker is expected to track the disease, a similar transformation may be required for both processes.
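A minimal Python sketch of profiling $\gamma$, for uncensored failure times: transform $t = r^\gamma$ over a grid of $\gamma$ values and evaluate the inverse Gaussian likelihood (including the Jacobian of the transformation); all names are ours.

import numpy as np
from scipy.stats import invgauss

def profile_gamma(r, gammas=np.linspace(0.3, 3.0, 28)):
    """Return the gamma maximizing the IG log-likelihood of t = r**gamma."""
    best_g, best_ll = None, -np.inf
    for g in gammas:
        t = r ** g
        m = t.mean()                               # IG mean MLE
        lam = 1.0 / np.mean(1.0 / t - 1.0 / m)     # IG shape MLE
        # scipy parameterization: invgauss(mu=m/lam, scale=lam)
        ll = (invgauss.logpdf(t, m / lam, scale=lam).sum()
              + np.sum(np.log(g * r ** (g - 1.0))))  # Jacobian dt/dr
        if ll > best_ll:
            best_g, best_ll = g, ll
    return best_g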

2. Measurement Errors:
If independent measurement errors are present in the increments of the marker process, then the associated measurement bias and variability will already be incorporated in the parameters $\mu_y$ and $\sigma_{yy}$. The marker increments are then interpreted as 'marker increments measured with error' and no modifications are required in the analysis or the model. If measurement error is an independent disturbance term that appears in each marker reading (as indicated by a negative correlation of marker increments), then the true marker $Y$ becomes latent or unobservable. The $j$th measurement on the marker process then becomes disguised by the presence of a measurement error term $\epsilon_j$ as follows (here we suppress the subject index $i$ in the notation):
$$O_{wj} = Y_{wj} + \epsilon_j \tag{17.3}$$

Here $O_{wj}$ is the observed reading and $Y_{wj}$ is the true marker value. In this situation, we might assume that the $\epsilon_j$ are independently distributed as $N(0, \nu)$. The likelihood function of the model can then be extended accordingly and the parameter $\nu$ estimated together with the process parameters. See Whitmore (1995) for a similar extension of the Wiener model to include measurement error.
It should be noted that an independent measurement error of the type shown in (17.3) will introduce a negative correlation between the baseline marker reading (with error) $O_{w0} = Y_{w0} + \epsilon_0$ and the observed marker increment $\Delta O_{w1} = O_{w1} - O_{w0}$, because they share the measurement error $\epsilon_0$ in the initial reading $O_{w0}$. The same negative correlations would appear in the marker increments for longitudinal marker data, as we noted earlier. This kind of correlation is found in longitudinal blood pressure readings, for example. We note, however, that the presence of measurement error can be confounded with the presence of other dependencies in the increments of a marker process.

3. Risks Associated with Competing Disease Processes:

The investigation of any disease process must take account of competing diseases. A marker may reflect the progress of one or more diseases simultaneously. One can view the course of several diseases as a multidimensional latent process $\{X(t)\}$, with failure being triggered when the process enters a failure region $F$ for the first time. The failure time would then be defined as $S = \inf\{t : X(t) \in F\}$. To be more specific, the set $F$ may be defined by a separate failure threshold in each dimension of $X(t)$, and the time of failure would be determined by the first threshold which the subject crosses.
Competing risks can be handled easily within the current latent survival model as follows, using an approach that is quite standard in the literature. In this approach, the model and methods are applied to each mode of failure separately, treating failures from competing modes as censored observations. The separate analysis by failure mode can provide important insights into the relationship between the marker and the different disease processes, as well as the effects of covariates if these effects are included in the model.

Acknowledgements. We acknowledge with thanks the financial support pro-


vided for this research by the National Institutes of Health grants HL40619-09
and NIGMS 55326-02 (Lee) and by a grant from the Natural Sciences and
Engineering Research Council of Canada (Whitmore).

References

1. Lee, M.-L. T., DeGruttola, V., and Schoenfeld, D. (2000). A model for markers and latent health status, Journal of the Royal Statistical Society, Series B, 62, 747-762.

2. Whitmore, G. A. (1995). Estimating degradation by a Wiener diffusion process subject to measurement error, Lifetime Data Analysis, 1, 307-319.

3. Whitmore, G. A. and Schenkelberg, F. (1997). Modelling accelerated degradation data using Wiener diffusion with a time scale transformation, Lifetime Data Analysis, 3, 1-19.

4. Whitmore, G. A., Crowder, M. J., and Lawless, J. F. (1998). Failure inference from a marker process based on a bivariate Wiener model, Lifetime Data Analysis, 4, 229-251.
18
Goodness-of-Fit Testing for the Cox Proportional
Hazards Model

Karthik Devarajan and Nader Ebrahimi


Northern Illinois University, DeKalb, Illinois

Abstract: For testing the validity of the Cox proportional hazards model,
a goodness-of-fit test of the null proportional hazards assumption is proposed
based on a semi-parametric generalization of the Cox model, whereby the haz-
ard functions can cross for different values of the covariates, using Kullback-
Leibler distance. The proposed method is illustrated using some real data. Our
test was compared to that of previously described tests by using simulation
experiments and found to perform very well.

Keywords and phrases: Cox proportional hazards, hazard function, cumu-


lative hazard function, Kullback-Leibler discrimination information measure,
directed divergence measure, goodness-of-fit test, Weibull distribution, Chi-
square distribution

18.1 Introduction
The Cox proportional hazards (PH) model [Cox (1972)] offers a method for
exploring the association of covariates with the failure time variable often seen
in medical and engineering studies. It is a widely used tool in the analysis of
survival data and hence testing its validity is a matter of prime importance.
For a given vector of observed covariates $z = (1, z_1, \ldots, z_p)'$, the hazard function at time $t$ is modeled as
$$\lambda(t \mid z) = \lambda_0(t) \exp(\beta' z), \tag{18.1}$$
where $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$ is a $(p+1)$-vector of regression coefficients and $\lambda_0(t)$ is an unspecified function of $t$, referred to as the baseline hazard function.
Over the years, numerous graphical and analytical procedures have been de-
veloped to test the assumption of proportional hazards. The graphical methods


include plotting the logarithm of survival [Kalbfleisch and Prentice (1980)] and methods based on the definitions of different residuals, among others. Schoenfeld (1982) and Lin and Wei (1991) recommended plotting the elements of the Schoenfeld residuals against failure times. Pettitt and Bin Daud (1990) suggested smoothing the Schoenfeld residuals to consider time-dependent effects. Wei (1984) and Therneau, Grambsch, and Fleming (1990) recommended plotting the cumulative sums of martingale residuals. Lagakos (1980) proposed a graphical method for assessing covariates in the Cox PH model based on the cumulative hazard transformation and the score from the partial likelihood. Thaler (1984) proposed nonparametric estimation and plotting of the hazards ratio to check for non-proportionality. Arjas (1988) proposed a graphical method based on comparisons between observed and expected frequencies of failures as estimated from the Cox PH model; departure from the proportional hazards assumption is indicated by an imbalance between such frequencies, as shown by the graphs. Other graphical methods include those by Kay (1977), Crowley and Hu (1977), Cox (1979), and Crowley and Storer (1983). In general, graphical procedures give a first-hand idea about the departure from proportionality but are quite subjective.
The analytical methods include, among others, tests for equality sensitive
to crossing hazards [Fleming et al. (1980)] and tests of proportionality for
grouped data [Gail (1981)]. A number of tests have been proposed based on
the time-weighted score tests of the proportional hazards hypothesis. These
include models in which the parameter vector β is defined as a given function of time as considered by Cox (1972) and Stablein et al. (1981), tests in which β varies as a step function according to a given partition of the time axis [Moreau, O'Quigley, and Mesbah (1985)], tests in which β has defined trends along the time intervals [O'Quigley and Pessione (1989)] and tests in which β varies as a step function according to a given partition of the covariate space [Schoenfeld (1980) and Andersen (1982)], among others. Wei (1984) and Gill and Schumaker
(1987) developed tests using time-weighted score tests based on the two-sample
hazards ratio. In the multiple regression setting, similar tests were developed
for the parameters using a rank transformation of time by Harrell (1986) and
Harrell and Lee (1986). Nagelkerke, Oosting, and Hart (1984) considered testing
the global validity of the proportional hazards hypothesis without reference to
any alternative. Horowitz and Neumann (1992) and Lin, Wei, and Ying (1993)
proposed global tests based on the cumulative sums of martingale residuals.
Most of these tests have been shown to be special cases of the methods developed
by Therneau and Grambsch (1994). These methods are applicable when a
prespecified form is given for departures from proportionality.
Lin and Wei (1991) extended the methods of White (1982) for detecting parametric model misspecification to the Cox partial likelihood. Kooperberg, Stone,
and Truong (1995) introduced a model for the log-hazard function conditional
on the covariates. The Cox PH model is a member of the class of models con-

sidered for the conditional log-hazard and hence they test the proportionality
assumption. Hess (1994) considered cubic spline functions for assessing time by
covariate interactions in the Cox PH model. Quantin et al. (1996) derived a
global test of the proportional hazards hypothesis using the score statistic from
the partial likelihood. Pena (1998) discussed smooth goodness-of-fit tests for
the baseline hazard in the Cox PH model.
We define a semi-parametric generalization of the Cox PH model in which
the hazard functions corresponding to different values of the covariates can
cross. The cumulative hazard function corresponding to a covariate vector z is
given by
Λ(t|z) = {Λ₀(t)}^{exp(γ'z)} exp(β'z),   (18.2)

where Λ₀(t) is an arbitrary baseline cumulative hazard function and β and γ are unknown (p+1)-vectors of parameters.
In addition to being a semi-parametric generalization of the Cox PH model,
the model (18.2) has several nice features. Below we only mention two. For
other features and also more details about this model see Quantin et al. (1996)
and Devarajan and Ebrahimi (2000).

i. For two different covariate vectors z₁ and z₂ we have

   log{λ(t|z₁)/λ(t|z₂)} = α(t)'(z₁ − z₂),

   where α(t) = β + g(t)γ with g(t) = log Λ₀(t). This is the Cox PH model with time-dependent coefficients α(t) that allows for crossing of hazard curves;

ii. For two different values of covariates z₁ and z₂, we have

   λ(t|z₁)/Λ(t|z₁) = [λ(t|z₂)/Λ(t|z₂)] exp(γ'(z₁ − z₂)).

   That is, the corresponding ratios of the hazard function to the cumulative hazard function are proportional.
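Property (i) also makes it easy to locate where two hazard curves cross. The following is a small sketch (our own illustration, not from the paper; it assumes the unit-exponential baseline Λ₀(t) = t for concreteness and requires γ'(z₁ − z₂) ≠ 0):

```python
import numpy as np

def crossing_time(beta, gamma, z1, z2):
    """Time t* at which the hazards under model (18.2) cross, assuming the
    baseline Lambda_0(t) = t: solving alpha(t)'(z1 - z2) = 0 for t gives
    log t* = -beta'(z1 - z2) / gamma'(z1 - z2)."""
    dz = np.asarray(z1, dtype=float) - np.asarray(z2, dtype=float)
    return np.exp(-np.dot(beta, dz) / np.dot(gamma, dz))
```

Under the Cox PH model (γ = 0) no finite crossing time exists, which is exactly the departure that the test developed below is designed to detect.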
In this paper, our goal is to develop goodness-of-fit testing methods for the
Cox PH model against the model (18.2) using Kullback-Leibler Discrimination
(KLD) information measures [see Kullback (1978) and Ebrahimi et al. (1994)].
The structure of the paper is as follows.
The methods are described in Section 18.2. In Section 18.3, we compare
the proposed test with existing tests based on empirical power estimates via
simulation experiments. In Section 18.4, we illustrate our methods using real
life data sets. Throughout this paper we assume that the data consist of independent observations on the triples (Xᵢ, δᵢ, zᵢ), i = 1, …, n, where Xᵢ is the minimum of the failure and censoring time pair (Tᵢ, Cᵢ), δᵢ = I(Tᵢ ≤ Cᵢ) is the indicator of the event that a failure has been observed, and zᵢ = (1, zᵢ₁, …, zᵢₚ)' is a (p+1)-vector of covariates for the i-th individual. The random variables Tᵢ and Cᵢ are assumed to be independent.

18.2 Goodness-of-Fit Testing for the Cox PH Model


In the context of goodness-of-fit testing for the Cox PH model against the
alternative defined by (18.2), the null hypothesis of interest is

H₀ : Λ(t|z) = exp(β'z) Λ₀(t),   (18.3)

where Λ₀(t) is an arbitrary baseline cumulative hazard function, Λ(t|z) is the conditional cumulative hazard function given the covariate vector z, and β is the (p+1)-vector of regression parameters. The alternative is

H₁ : Λ(t|z) = exp(β'z){Λ₀(t)}^{exp(γ'z)},   (18.4)

where γ is an additional (p+1)-vector of regression parameters. Hence, the null hypothesis corresponds to γ = 0 and the alternative corresponds to γ ≠ 0.
Given that F and G are the cumulative distribution functions under H₀ and H₁ respectively, to discriminate between the two hypotheses (18.3) and (18.4) we use the KLD information between the two distributions, given by

I(F : G) = ∫₀^∞ f(x) log{f(x)/g(x)} dx.   (18.5)

Here, I(F : G) measures the discrepancy between the two distributions F and G, corresponding to the Cox model and the model (18.2), in the direction of H₀. It is well known that I(F : G) ≥ 0 and that equality holds if and only if F = G. Similarly, one can also define

I(G : F) = ∫₀^∞ g(x) log{g(x)/f(x)} dx,   (18.6)

which measures the discrepancy between the two distributions F and G in the direction of H₁. We can then derive the divergence measure

J(F : G) = I(F : G) + I(G : F),   (18.7)

measuring the difficulty of discriminating between H₀ and H₁. Both I(F : G) and I(G : F) are commonly referred to as directed divergence measures, and J(F : G) is referred to as the divergence measure.
It is clear that the survival function and the density function corresponding to the Cox PH model, given the covariate vector z, are

F(t|z) = exp(−Λ_F(t|z)) = exp{−Λ₀(t) exp(β'z)}

and

f(t|z) = exp(β'z) λ₀(t) exp{−Λ_F(t|z)}.   (18.8)

Similarly, the survival function and the density function corresponding to the non-proportional hazards model of H₁, denoted by G(t|z) and g(t|z) respectively, are

G(t|z) = exp(−Λ_G(t|z))
       = exp( −∫₀ᵗ λ₀(y) exp{β'z + γ'z + (e^{γ'z} − 1) log Λ₀(y)} dy ),   (18.9)

and

g(t|z) = exp{−Λ_G(t|z) + β'z + γ'z} λ₀(t) [Λ₀(t)]^{exp(γ'z)−1}.   (18.10)

Thus, using (18.8) and (18.10), the directed divergence I(G : F|z) given the covariate vector z is

I(G : F|z) = (1 − e^{−γ'z}){−β'z + K_e} + γ'z − 1
             + exp(β'z − γ'z − β'z e^{−γ'z}) Γ(e^{−γ'z}),   (18.11)

and the directed divergence I(F : G|z) given the covariate vector z is

I(F : G|z) = (1 − e^{γ'z}){−β'z + K_e} − γ'z
             + exp(β'z + γ'z − β'z e^{γ'z}) Γ(e^{γ'z}) − 1.   (18.12)

Hence, given the covariate vector z, the divergence measure J(F : G|z) is

J(F : G|z) = I(F : G|z) + I(G : F|z).   (18.13)

Here K_e = ∫₀^∞ e^{−u} log u du is the negative of Euler's constant and Γ(·) is the gamma function. An important feature of all three measures described in (18.11)–(18.13) is that they are all free of the baseline hazard λ₀(t).
Since evaluation of I(G : F|z), I(F : G|z) and J(F : G|z) in (18.11)–(18.13) requires complete knowledge of the unknown parameters β and γ, these measures are not operational. We operationalize I(G : F|z), I(F : G|z) and J(F : G|z) by developing the discrimination information statistics Î(G : F|z), Î(F : G|z) and Ĵ(F : G|z), in which β and γ are replaced by β̂ and γ̂. Here, β̂ and γ̂ are the maximum likelihood estimates obtained by approximating the baseline hazard function with a linear combination of cubic B-spline basis functions. For more details about these estimates and their properties see Devarajan and Ebrahimi (2000). Thus, for a given value of z, our goodness-of-fit test will be based on either Î(G : F|z), Î(F : G|z) or Ĵ(F : G|z).

Remark 18.2.1 Observe that Î(G : F|z), Î(F : G|z) and Ĵ(F : G|z) all depend on the covariate vector z. As a global measure of goodness-of-fit, we suggest averaging Î(F : G|zᵢ), Î(G : F|zᵢ) and Ĵ(F : G|zᵢ) over all individuals in the sample, that is, considering (1/n) ∑ᵢ₌₁ⁿ Î(F : G|zᵢ), (1/n) ∑ᵢ₌₁ⁿ Î(G : F|zᵢ) and (1/n) ∑ᵢ₌₁ⁿ Ĵ(F : G|zᵢ). Another approach is taking Î(F : G|z̄), Î(G : F|z̄) and Ĵ(F : G|z̄), where z̄ is the average covariate value over all the individuals.

Now, to implement the test statistics Î(F : G|z), Î(G : F|z) and Ĵ(F : G|z), use the following steps:

Step 1: Use the Devarajan and Ebrahimi (2000) approach to estimate β and γ. Denote the estimates by β̂ and γ̂.

Step 2: Replace β and γ by β̂ and γ̂ in equations (18.11)–(18.13) to get Î(F : G|z), Î(G : F|z) and Ĵ(F : G|z).

Step 3: One can show that 2n Î(F : G|z), 2n Î(G : F|z) and n Ĵ(F : G|z) have asymptotically a chi-squared distribution with q degrees of freedom under H₀ [see Kullback (1978)]. Here q is the number of parameters under H₀. Therefore, if you are using Î(F : G|z), then reject H₀ if 2n Î(F : G|z) > χ²_{q,α}, where χ²_{q,α} is the upper-α critical value of the chi-squared distribution with q degrees of freedom and α is the significance level of the test. If you are using Î(G : F|z), then reject H₀ if 2n Î(G : F|z) > χ²_{q,α}. Finally, if you are using Ĵ(F : G|z), then reject H₀ if n Ĵ(F : G|z) > χ²_{q,α}.
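Once β̂ and γ̂ are available, the statistics are simple to evaluate. The following is a minimal sketch (our own illustration in Python, not part of the original chapter; the function names and the use of SciPy are ours) that computes (18.11)–(18.13) at a covariate vector z and applies the rejection rule of Step 3 based on n Ĵ(F : G|z):

```python
import numpy as np
from scipy.special import gamma as gamma_fn
from scipy.stats import chi2

K_E = -0.5772156649015329  # K_e = integral_0^inf e^{-u} log(u) du (negative of Euler's constant)

def divergences(beta, gamma, z):
    """Evaluate I(G:F|z), I(F:G|z) and J(F:G|z) from (18.11)-(18.13)."""
    bz = float(np.dot(beta, z))   # beta'z
    gz = float(np.dot(gamma, z))  # gamma'z
    i_gf = ((1.0 - np.exp(-gz)) * (-bz + K_E) + gz - 1.0
            + np.exp(bz - gz - bz * np.exp(-gz)) * gamma_fn(np.exp(-gz)))    # (18.11)
    i_fg = ((1.0 - np.exp(gz)) * (-bz + K_E) - gz
            + np.exp(bz + gz - bz * np.exp(gz)) * gamma_fn(np.exp(gz)) - 1.0)  # (18.12)
    return i_gf, i_fg, i_gf + i_fg                                           # (18.13)

def kld_test(beta_hat, gamma_hat, z, n, q, alpha=0.05):
    """Chi-squared test of H0 (Cox PH) based on the divergence statistic n*J(F:G|z)."""
    _, _, j_fg = divergences(beta_hat, gamma_hat, z)
    stat = n * j_fg
    p_value = chi2.sf(stat, df=q)
    return stat, p_value, p_value < alpha  # True = reject H0
```

The directed-divergence versions of the test replace n Ĵ(F : G|z) by 2n Î(F : G|z) or 2n Î(G : F|z). Note that both Î statistics vanish, as they should, when γ̂'z = 0.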

18.3 Comparison of the Proposed Goodness-of-Fit Test with Existing Methods
First, we compare our proposed goodness-of-fit test based on the divergence Ĵ(F : G|z) with existing tests for the Cox PH model, using empirical power estimates based on 1000 simulations for different sample sizes and censoring patterns. The comparison is made with results from Quantin et al. (1996) to reduce the computational burden.
Simulations were performed based on a Weibull model for the two-sample problem, using hazard functions λ₀(t) = 1 in Group 0 (corresponding to the unit exponential distribution) and λ₁(t) = βα(αt)^{β−1} in Group 1 (corresponding to a Weibull distribution with shape parameter β and scale parameter α). The experiment was repeated 1000 times for each of the following combinations: β = 0.5, 2, α = 1, 2 and n = 30, 50, 100 per group. Independent censoring samples were generated from the uniform distribution over a fixed interval (0, B) to result in 25% censoring. Computations for the proposed method were based on the full likelihood approach using B-spline approximations for the baseline hazard, as described by Devarajan and Ebrahimi (2000).
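One replicate of this design can be sketched as follows (our own illustration; the censoring bound b_cens is not given in the chapter and would be tuned numerically to yield roughly 25% censoring):

```python
import numpy as np

rng = np.random.default_rng(0)

def one_replicate(n, beta, alpha, b_cens=None):
    """Group 0: unit exponential (lambda_0(t) = 1); Group 1: Weibull with hazard
    lambda_1(t) = beta*alpha*(alpha*t)^(beta-1); optional Uniform(0, b_cens) censoring."""
    t0 = rng.exponential(size=n)
    t1 = rng.weibull(beta, size=n) / alpha   # survival exp{-(alpha*t)^beta}
    t = np.concatenate([t0, t1])
    z = np.repeat([0, 1], n)                 # group indicator
    c = np.full(2 * n, np.inf) if b_cens is None else rng.uniform(0, b_cens, size=2 * n)
    return np.minimum(t, c), (t <= c).astype(int), z   # time, failure indicator, covariate
```

The empirical power for a given (β, α, n) combination is then the proportion of 1000 such replicates in which the test of Section 18.2 rejects H₀ at the 5% level.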
The proposed method is compared with the goodness-of-fit tests of Quantin et al. (1996), Breslow et al. (1984), Gill and Schumaker (1987), Nagelkerke et al. (1984), Wei (1984) and that of Cox (1972) incorporating a time-dependent covariate effect. The critical values of the test statistics for the proposed test were computed based on the chi-squared distribution with 1 degree of freedom at a significance level of 0.05, as discussed in Section 18.2. Quantin et al. (1996) note that the critical values of all the test statistics in their comparison study were also based on the chi-squared distribution with 1 degree of freedom at a significance level of 0.05, except that of Wei (1984), whose critical values are based on the tables of Koziol and Byar (1975).
It was seen in the simulations that the proposed test attains the specified significance level of 5%, so the nominal chi-squared critical values are appropriate. Tables 18.1 through 18.6 present the results of the power study. From the results, there is clear evidence that the proposed test performs better than most of the existing tests for the Cox PH model. The empirical power of the proposed test is higher than that of all the other tests in most situations. Overall, the results are much better for the case of the Weibull distribution with shape parameter β = 0.5 for both choices of the scale parameter α = 1, 2. The proposed test performs moderately well relative to the other tests for the case of β = 2 and α = 1 in both uncensored and censored samples with group sizes 30 and 50.
It would be interesting to study how the proposed test performs in the case of other distributions, such as the log-logistic and the lognormal, that allow the hazard functions corresponding to differing scale and shape parameters to cross but do not satisfy the model (18.2). In order to get a first-hand idea of the performance of the proposed test in such situations, a small simulation study was performed. Simulations were based on a lognormal model for the two-sample problem, using a hazard function λ₀(t) in Group 0 corresponding to the standard lognormal distribution with scale parameter μ = 0 and shape parameter σ = 1, and λ₁(t) in Group 1 corresponding to a lognormal distribution with scale parameter μ and shape parameter σ. The experiment was repeated 1000 times for uncensored samples for each of the following combinations: μ = 0, 1, σ = 0.5, 2 and n = 30, 50 and 100 per group. The results are shown in Tables 18.7 through 18.9. From the tables, we see that the empirical power estimates are much better when σ = 2 relative to σ = 0.5. Even in the case of σ = 0.5, the empirical powers are higher for the case of μ = 1 relative to μ = 0. There is an indication that the proposed test is able to pick up crossing hazards even in situations where the underlying distributions do not belong to the family of models (i.e., Weibull) included in the non-proportional hazards model (18.2).
We also compare the goodness-of-fit statistics based on the directed divergences, as given by Î(F : G) and Î(G : F), and the divergence Ĵ(F : G). The comparison is made for each combination of sample size, censoring percentage and Weibull distribution characteristics as given above. In each case, we observe that the directed divergence Î(F : G) gives the highest empirical power among the three statistics, but Ĵ(F : G) measures the directed divergences in both directions, as pointed out earlier. Overall, the power estimates based on these three measures are in close agreement with each other.

Table 18.1: Simulation results: Power comparison of proposed test, uncensored Weibull samples, sample size per group = 30

    Test                    β = 0.5            β = 2
                          α = 1    α = 2     α = 1    α = 2
    Proposed test         1.000    1.000     0.657    0.899
    Quantin et al.        0.884    0.828     0.842    0.823
    Breslow et al.        0.786    0.734     0.787    0.672
    Gill and Schumaker    0.492    0.537     0.754    0.807
    Nagelkerke et al.     0.208    0.166     0.191    0.711
    Wei                   0.692    0.705     0.603    0.601
    Harrell and Lee       0.788    0.794     0.888    0.728
    Cox (time dependent)  0.704    0.677     0.825    0.716

Table 18.2: Simulation results: Power comparison of proposed test, uncensored Weibull samples, sample size per group = 50

    Test                    β = 0.5            β = 2
                          α = 1    α = 2     α = 1    α = 2
    Proposed test         1.000    1.000     0.832    0.948
    Quantin et al.        0.986    0.979     0.978    0.951
    Breslow et al.        0.955    0.923     0.957    0.898
    Gill and Schumaker    0.673    0.772     0.953    0.933
    Nagelkerke et al.     0.307    0.231     0.338    0.908
    Wei                   0.913    0.901     0.924    0.824
    Harrell and Lee       0.957    0.949     0.990    0.908
    Cox (time dependent)  0.892    0.863     0.958    0.915

Table 18.3: Simulation results: Power comparison of proposed test, uncensored Weibull samples, sample size per group = 100

    Test                    β = 0.5            β = 2
                          α = 1    α = 2     α = 1    α = 2
    Proposed test         1.000    1.000     0.968    0.991
    Quantin et al.        1.000    1.000     1.000    0.999
    Breslow et al.        0.996    0.998     0.999    0.997
    Gill and Schumaker    0.939    0.971     1.000    0.999
    Nagelkerke et al.     0.574    0.428     0.588    0.997
    Wei                   0.998    0.997     0.997    0.982
    Harrell and Lee       0.957    0.949     0.990    0.908
    Cox (time dependent)  0.993    0.986     1.000    0.997

Table 18.4: Simulation results: Power comparison of proposed test, 25% censoring, Weibull samples, sample size per group = 30

    Test                    β = 0.5            β = 2
                          α = 1    α = 2     α = 1    α = 2
    Proposed test         1.000    1.000     0.626    0.705
    Quantin et al.        0.745    0.683     0.752    0.737
    Breslow et al.        0.610    0.554     0.673    0.666
    Gill and Schumaker    0.602    0.612     0.742    0.710
    Nagelkerke et al.     0.098    0.166     0.123    0.290
    Wei                   0.686    0.662     0.696    0.641
    Harrell and Lee       0.686    0.633     0.706    0.551
    Cox (time dependent)  0.536    0.515     0.711    0.654

Table 18.5: Simulation results: Power comparison of proposed test, 25% censoring, Weibull samples, sample size per group = 50

    Test                    β = 0.5            β = 2
                          α = 1    α = 2     α = 1    α = 2
    Proposed test         1.000    1.000     0.787    0.946
    Quantin et al.        0.934    0.891     0.937    0.912
    Breslow et al.        0.829    0.775     0.741    0.861
    Gill and Schumaker    0.820    0.797     0.909    0.896
    Nagelkerke et al.     0.148    0.221     0.171    0.442
    Wei                   0.793    0.778     0.808    0.775
    Harrell and Lee       0.904    0.888     0.917    0.815
    Cox (time dependent)  0.694    0.719     0.908    0.872

Table 18.6: Simulation results: Power comparison of proposed test, 25% censoring, Weibull samples, sample size per group = 100

    Test                    β = 0.5            β = 2
                          α = 1    α = 2     α = 1    α = 2
    Proposed test         1.000    1.000     0.956    0.998
    Quantin et al.        0.999    1.000     0.999    0.994
    Breslow et al.        0.980    0.974     0.992    0.992
    Gill and Schumaker    0.938    0.982     0.996    0.990
    Nagelkerke et al.     0.196    0.208     0.304    0.723
    Wei                   0.986    0.980     0.991    0.973
    Harrell and Lee       0.999    0.995     0.999    0.987
    Cox (time dependent)  0.963    0.948     0.998    0.991

Table 18.7: Power comparison of directed divergence and divergence, uncensored lognormal samples, sample size per group = 30

    Test         σ = 0.5            σ = 2
               μ = 0    μ = 1     μ = 0    μ = 1
    Î(F : G)   0.584    0.876     1.000    1.000
    Î(G : F)   0.540    0.851     1.000    1.000
    Ĵ(F : G)   0.561    0.865     1.000    1.000

Table 18.8: Power comparison of directed divergence and divergence, uncensored lognormal samples, sample size per group = 50

    Test         σ = 0.5            σ = 2
               μ = 0    μ = 1     μ = 0    μ = 1
    Î(F : G)   0.614    0.853     1.000    1.000
    Î(G : F)   0.579    0.832     1.000    1.000
    Ĵ(F : G)   0.599    0.847     1.000    1.000

Table 18.9: Power comparison of directed divergence and divergence, uncensored lognormal samples, sample size per group = 100

    Test         σ = 0.5            σ = 2
               μ = 0    μ = 1     μ = 0    μ = 1
    Î(F : G)   0.646    0.849     1.000    1.000
    Î(G : F)   0.629    0.834     1.000    1.000
    Ĵ(F : G)   0.641    0.841     1.000    1.000

Table 18.10: Power comparison of directed divergence and divergence, uncensored Weibull samples, sample size per group = 30

    Test         β = 0.5            β = 2
               α = 1    α = 2     α = 1    α = 2
    Î(F : G)   1.000    1.000     0.680    0.905
    Î(G : F)   1.000    1.000     0.614    0.881
    Ĵ(F : G)   1.000    1.000     0.657    0.899

Table 18.11: Power comparison of directed divergence and divergence, uncensored Weibull samples, sample size per group = 50

    Test         β = 0.5            β = 2
               α = 1    α = 2     α = 1    α = 2
    Î(F : G)   1.000    1.000     0.842    0.952
    Î(G : F)   1.000    1.000     0.810    0.943
    Ĵ(F : G)   1.000    1.000     0.832    0.948

Table 18.12: Power comparison of directed divergence and divergence, uncensored Weibull samples, sample size per group = 100

    Test         β = 0.5            β = 2
               α = 1    α = 2     α = 1    α = 2
    Î(F : G)   1.000    1.000     0.972    0.991
    Î(G : F)   1.000    1.000     0.964    0.988
    Ĵ(F : G)   1.000    1.000     0.968    0.991

Table 18.13: Power comparison of directed divergence and divergence, 25% censoring, Weibull samples, sample size per group = 30

    Test         β = 0.5            β = 2
               α = 1    α = 2     α = 1    α = 2
    Î(F : G)   1.000    1.000     0.647    0.697
    Î(G : F)   1.000    1.000     0.588    0.681
    Ĵ(F : G)   1.000    1.000     0.626    0.697

Table 18.14: Power comparison of directed divergence and divergence, 25% censoring, Weibull samples, sample size per group = 50

    Test         β = 0.5            β = 2
               α = 1    α = 2     α = 1    α = 2
    Î(F : G)   1.000    1.000     0.798    0.947
    Î(G : F)   1.000    1.000     0.763    0.945
    Ĵ(F : G)   1.000    1.000     0.787    0.946

Table 18.15: Power comparison of directed divergence and divergence, 25% censoring, Weibull samples, sample size per group = 100

    Test         β = 0.5            β = 2
               α = 1    α = 2     α = 1    α = 2
    Î(F : G)   1.000    1.000     0.961    0.998
    Î(G : F)   1.000    1.000     0.949    0.998
    Ĵ(F : G)   1.000    1.000     0.956    0.998

18.4 Illustration of the Goodness-of-Fit Test using Real-Life Data
To illustrate the usefulness of the proposed tests in applied work, we examine
three data sets that have been considered in the literature.

Data 1: The data consist of times to remission (in weeks) of 42 leukemia patients from Kalbfleisch and Prentice (1980). The patients are randomized into two groups: a control group and a treatment group to which the drug 6-MP was administered. Each group consists of 21 patients. The covariate is group membership, specified by the indicator variable z, where z = 0 denotes the control group and z = 1 denotes the treatment group. For the complete data set see Kalbfleisch and Prentice (1980, p. 206).
Using β̂ = 1.573 and γ̂ = −0.111 from Devarajan and Ebrahimi (2000), we have Ĵ(F : G|z = 1) = 3.33 × 10⁻⁶ and nĴ(F : G|z = 1) = 0.00014, which gives a p-value of 0.991 using the asymptotic χ² distribution with 1 degree of freedom under the null hypothesis. Since the p-value = 0.991 > α = 0.05, we do not reject H₀. This data set was also analyzed by Gill and Schumaker (1987). Their conclusion coincides with ours; however, our p-value is larger.

Data 2: In this example, we analyze data from an animal experiment presented in Wei (1984), in which a group of male mice was subjected to 300 rads of radiation and followed for cancer incidence. This is the placebo or control group. The treatment group was placed in a germ-free environment. Using β̂ = −0.414 and γ̂ = −0.348 from Devarajan and Ebrahimi (2000), we have Ĵ(F : G|z = 1) = 0.317. Consequently, nĴ(F : G|z = 1) = 16.167, which gives a p-value of 5.8 × 10⁻⁵ using the asymptotic χ² distribution with 1 degree of freedom under the null hypothesis. Since the p-value = 5.8 × 10⁻⁵ < α = 0.05, we reject H₀ in favor of H₁; that is, the Cox model is not appropriate. This data set was analyzed by Wei (1984), who reached a similar conclusion; however, his p-value is larger.

Data 3: The data consist of the times to remission of 33 leukemia patients presented in Cox and Oakes (1984). There are two covariates, a binary variable indicating group membership (control or treatment group) and a continuous variable denoting the logarithm of the white blood count (WBC). There are no censored observations.

Using β̂₁ = 13.941, β̂₂ = −6.194, γ̂₁ = 1.056 and γ̂₂ = −0.595 from Devarajan and Ebrahimi (2000), Ĵ(F : G|z) = Ĵ(F : G|z₁ = z̄₁, z₂ = 1) = 2.032. Now, nĴ(F : G|z) = 67.056, which gives a p-value of 2.78 × 10⁻¹⁵ using a χ² distribution with 2 degrees of freedom under the null hypothesis. Since the p-value = 2.78 × 10⁻¹⁵ < α = 0.05, we reject H₀ in favor of H₁; that is, the non-proportional hazards model is more appropriate.

This data set was truncated before the observation value of 7 units. For the truncated data the parameter estimates are β̂₁ = 1.336, γ̂₁ = −1.404, β̂₂ = −0.318 and γ̂₂ = 1.007; see Devarajan and Ebrahimi (2000). The value of our test statistic is Ĵ(F : G|z) = 0.0168. Thus, nĴ(F : G|z) = 0.554, which gives a p-value of 0.758 using a χ² distribution with 2 degrees of freedom under the null hypothesis. Since the p-value = 0.758 > α = 0.05, we do not reject H₀; that is, for the truncated data the Cox PH model is appropriate.

It should be mentioned that the mean value z̄₁ of z₁ and z₂ = 1 were used for computing Ĵ(F : G|z) in both cases above.
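The quoted p-values follow directly from the chi-squared reference distributions and can be verified in one line each (our own check, using SciPy):

```python
from scipy.stats import chi2

print(chi2.sf(0.00014, df=1))  # Data 1: ~0.991
print(chi2.sf(16.167, df=1))   # Data 2: ~5.8e-05
print(chi2.sf(67.056, df=2))   # Data 3, full data: ~2.8e-15
print(chi2.sf(0.554, df=2))    # Data 3, truncated data: ~0.758
```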

18.5 Concluding Remarks


A test for the Cox proportional hazards model based on an estimate of the
KLD information between a non-proportional hazards model and the Cox PH
model has been presented. The power of the proposed test was investigated
and compared with the standard tests. The following results were obtained.
Goodness-oi-Fit Testing 251

(a) Use of the information statistic enables us to test the Cox PH model against
a non-proportional hazards model.

(b) Unlike all the other tests, our proposed test does not depend on the baseline hazard function λ₀(t).

(c) The proposed test performs very well in terms of power compared with other
leading tests against non-proportional hazards alternatives.

References
1. Andersen, P. K. (1982). Testing goodness-of-fit of Cox's regression and life model, Biometrics, 38, 67-77.

2. Arjas, E. (1988). A graphical method for assessing goodness of fit in


Cox's proportional hazards model, Journal of the American Statistical
Association, 83, 204-212.

3. Breslow, N. E. (1974). Covariance analysis of censored survival data,


Biometrics, 30, 89-100.

4. Breslow, N. E., Edler, L., and Berger, J. (1984). A two-sample censored-


data rank test for acceleration, Biometrics, 40, 1049-1062.

5. Cox, D. R. (1972). Regression models and life tables (with discussion),


Journal of the Royal Statistical Society, Series B, 34, 187-202.

6. Cox, D. R. (1979). A note on the graphical analysis of survival data,


Biometrika, 66, 188-190.

7. Cox, D. R. and Oakes, D. (1984). Analysis of Survival Data, London: Chapman & Hall.

8. Crowley, J. and Hu, M. (1977). Covariance analysis of heart transplant


survival data, Journal of the American Statistical Association, 72, 27-36.

9. Crowley, J. and Storer, B. E. (1983). Comments on "A Reanalysis of the Stanford Heart Transplant Data," Journal of the American Statistical Association, 78, 277-281.

10. Devarajan, K. and Ebrahimi, N. (2000). Inference for a non-proportional


hazards regression model and applications, submitted for publication.

11. Ebrahimi, N., Habibullah, M., and Soofi, E. (1994). Testing exponentiality based on Kullback-Leibler information, Journal of the Royal Statistical Society, Series B, 54, 739-748.

12. Fleming, T. R., O'Fallon, J. R., O'Brien, P. C., and Harrington, D. P. (1980). Modified Kolmogorov-Smirnov test procedures with application to arbitrarily right-censored data, Biometrics, 36, 607-625.

13. Gail, M. (1981). Evaluating serial cancer marker studies in patients at


risk of recurring disease, Biometrics, 37, 67-78.

14. Gill, R. and Schumaker, M. (1987). A simple test of the proportional hazards assumption, Biometrika, 74, 289-300.

15. Harrell, F. E. (1986). The PHGLM procedure, SAS Supplemental Library User's Guide, Version 5, Cary, North Carolina: SAS Institute Inc.

16. Harrell, F. E. and Lee, K. L. (1986). Verifying assumptions of the Cox


proportional hazards model, In Proceedings of the Eleventh Annual SAS
Users' Group International Conference, pp. 823-828, Cary, North Car-
olina: SAS Institute Inc.

17. Hess, K. R. (1994). Assessing time-by-covariate interactions in proportional hazards regression models using cubic-spline functions, Statistics in Medicine, 13, 1045-1062.

18. Horowitz, J. L. and Neumann, G. R. (1992). A generalized moments spec-


ification test of the proportional hazards model, Journal of the American
Statistical Association, 87, 234-240.

19. Kalbfleisch, J. D. and Prentice, R L. (1980). The Statistical Analysis of


Failure Time Data, New York: John Wiley & Sons.

20. Kay, R. (1977). Proportional hazards regression models and the analysis of censored survival data, Applied Statistics, 26, 227-237.

21. Kooperberg, C., Stone, C. J., and Truong, Y. K. (1995). Hazard regression, Journal of the American Statistical Association, 90, 78-94.

22. Koziol, J. A. and Byar, D. P. (1975). Percentage points of the asymptotic


distributions of one and two-sample K.S. tests for truncated or censored
data, Technometrics, 17, 507-510.

23. Kullback, S. (1978). Information Theory and Statistics, Dover Publications.

24. Kullback, S. and Leibler, R. A. (1951). On information and sufficiency, Annals of Mathematical Statistics, 22, 79-86.

25. Kupperman, M. (1957). Further applications of information theory to


multivariate analysis and statistical inference, Ph.D. Dissertation, Grad-
uate Council of George Washington University.

26. Lagakos, S. W. (1980). The graphical evaluation of explanatory variables


in proportional hazards regression models, Biometrika, 68, 93-98.

27. Lin, D. Y., Wei, L. J., and Ying, Z. (1993). Checking the Cox model with cumulative sums of martingale residuals, Biometrika, 80, 557-572.

28. Lin, D. Y. and Wei, L. J. (1991). Goodness-of-fit tests for the general
Cox regression model, Statistica Sinica, 1, 1-17.

29. Moreau, T., O'Quigley, J., and Mesbah, M. (1985). A global goodness-
of-fit statistic for the proportional hazards model, Applied Statistics, 34,
212-218.

30. Nagelkerke, N. J. D., Oosting, J., and Hart, A. A. M. (1984). A simple test for goodness-of-fit of Cox's proportional hazards model, Biometrics, 40, 483-486.

31. O'Quigley, J. and Pessione, F. (1989). Score tests for homogeneity of regression effect in the proportional hazards model, Biometrics, 45, 135-145.
32. Pena, E. A. (1998). Smooth goodness-of-fit tests for the baseline hazard in Cox proportional hazards model, Journal of the American Statistical Association, 93, 673-692.
33. Pettitt, A. N. and Bin Daud, I. (1990). Investigating time-dependence in Cox's proportional hazards model, Applied Statistics, 39, 313-329.

34. Quantin, C., Moreau, T., Asselain, B., Maccario, J., and Lellouch, J. (1996). A regression model for testing the proportional hazards hypothesis, Biometrics, 52, 874-885.

35. Schoenfeld, D. (1980). Chi-squared goodness-of-fit tests for the propor-


tional hazards regression model, Biometrika, 67, 145-153.

36. Schoenfeld, D. (1982). Partial residuals for the proportional hazards regression model, Biometrika, 69, 239-241.

37. Stablein, D. H., Carter, W. H., and Novak, J. W. (1981). Analysis of


survival data with nonproportional hazard functions, Controlled Clinical
Trials, 2, 149-159.
38. Thaler, H. T. (1984). Nonparametric estimation of the hazard ratio, Jour-
nal of the American Statistical Association, 79, 290-293.

39. Therneau, T. R. and Grambsch, P. M. (1994). Proportional hazards tests


and diagnostics based on weighted residuals, Biometrika, 81, 515-526.

40. Therneau, T. R., Grambsch, P. M., and Fleming, T. R. (1990). Martingale-based residuals for survival models, Biometrika, 77, 147-160.

41. Wei, L. J. (1984). Testing goodness-of-fit for the proportional hazards model with censored observations, Journal of the American Statistical Association, 79, 649-652.

42. White, H. (1982). Maximum likelihood estimation of misspecified models, Econometrica, 50, 1-26.
19
A New Family of Multivariate Distributions for
Survival Data

Shulamith T. Gross and Catherine Huber-Carol


The City University of New York, New York
Universite Paris V and U472 INSERM, Paris, France

Abstract: We introduce a family of multivariate distributions in discrete time


which may be regarded as a multiple logistic distribution for discrete conditional
hazards. In the independent case, the marginal laws are identical to those of
the univariate logistic model for survival data discussed by Efron (1988). We
present the analysis of a data set previously analyzed using frailty models.

Keywords and phrases: Discrete correlated survival data, censoring, frailty


models, Markov assumption, log-odds, Akaike information criterion

19.1 Introduction
A large fraction of analyses of correlated survival data appearing in recent sta-
tistical publications is based on some form of frailty models. Frailty models
have been used both to model population heterogeneity in univariate survival
data and to model association in multiple survival data. The model we intro-
duce in the present treatise is meant to offer an alternative to the latter only.

19.2 Frailty Models: An Overview


Frailty models, or random effect models, for multivariate survival times of d
members of a family may be divided into two broad categories:
1. conditional frailty models in which the individual survivals within a cluster
are assumed independent conditionally given the cluster frailty, and


2. marginal frailty models often referred to as copula models.

The former were first introduced by Vaupel, Manton, and Stallard (1979) and
Clayton and Cuzick (1985). For recent references and an illuminating discussion
of the two types of frailty models, we refer the reader to Scheike, Petersen, and Martinussen (1999), and for a rigorous study of the asymptotic properties of frailty models to Parner (1998). A Cox-type conditional frailty model assumes
that the data for a single cluster k, k = 1, 2, …, K consist of possibly censored observations (T_{k1}, …, T_{kd}) and death indicators (δ_{k1}, …, δ_{kd}), where

T_{kj} = X_{kj} ∧ C_{kj},
δ_{kj} = 1{X_{kj} ≤ C_{kj}}.

The censoring variables C_k = (C_{k1}, …, C_{kd}) are assumed independent and independent of the survival variables X_k = (X_{k1}, …, X_{kd}). Their distributions are also assumed not to depend explicitly on the parameters β and γ characterizing the laws of X_k. The components of X_k are assumed independent given the frailty W_k and the covariates (Z, V) of cluster k, with hazards (instantaneous in the continuous case and discrete in the discrete case)
λ_{kj}(t | W_k, Z_k, V_{kj}) = W_k λ₀(t) exp(β'Z_k + γ'V_{kj}).   (19.1)

The frailties W_k are assumed independent and identically distributed according to some convenient frailty law G. The W_k are also assumed independent of the covariates and the censoring variables. This conditional frailty model suffers from several important shortcomings. Chief among them is the lack of interpretability of the regression parameters. In general, one may no longer interpret a single regression parameter, say a single γ_j, as the effect of the covariate V_j on an individual hazard of failure keeping all other covariates fixed, because one would need to condition on the unobservable frailty W_k of the corresponding cluster. Furthermore, the values of the parameters β and γ depend on the choice of the frailty distribution G, which is typically arbitrary and not easily estimated from the data. Copula or marginal frailty models overcome the problem of parameter interpretation because they are in fact semi-parametric models in which the marginal laws with survival functions S₁, S₂, …, S_d are modeled separately from the copula. Clayton's copula for the multivariate case may be written as
S_k(t₁, …, t_d) = P(X_{k1} ≥ t₁, …, X_{kd} ≥ t_d) = [ ∑_{j=1}^{d} S_{kj}(t_j)^{−1/α} − (d−1) ]^{−α}   (19.2)

for some parameter α > 0, where the marginal survival functions S₁, S₂, …, S_d are arbitrary, and often assumed to follow the Cox model:

S_{kj}(t) = [S₀(t)]^{exp(β'Z_k + γ'V_{kj})}.   (19.3)

Here the margin parameters are interpretable in the usual way, but the model, dependent as it is on a single association parameter α, allows no possibility of modeling within-cluster dependence on the individual-level covariates V_{kj}, j = 1, 2, …, d. The model may therefore represent a first-order approximation to the far more complicated dependence structure likely to be present in real survival data. Our model, which breaks down the dependence structure in the data into hierarchical components without resorting to the use of random effects, allows a finer representation of dependence. Moreover, it allows an interpretation of the parameters as log odds ratios for failure.

19.3 The Model


Let d = max{n_k, k ≤ K} be the maximal cluster size and T the maximum value of the discrete time. R_k(t) is an element of the set {0, 1}^⊗d representing the subset of the elements still at risk just before t in cluster k, while Y_k(t) is an element of the set {0, 1}^⊗d representing the subset, at time t, of those elements of cluster k who jump at time t. Finally, Z_k is an observed covariate that may depend on time t. Following Arjas and Haara (1988), we define the "past" σ-field recurrently by:

F_k(0) is the initial information on cluster k,
G_k(t) = F_k(t−1) ∨ σ(R_k(t), Z_k(t)),   (19.4)
F_k(t) = G_k(t) ∨ σ(Y_k(t)).
Assuming that the censoring law does not depend explicitly on the parameter of interest θ, the likelihood for the parameters θ = (θ₁, …, θ_p)' of interest may be written as

V(θ) = ∏_{k=1}^{K} ∏_{t=1}^{T} P^θ[Y_k(t) = y_k(t) | F_k(t−1), R_k(t)]
          × ∏_{k=1}^{K} ∏_{t=1}^{T} P^θ[R_k(t) = r_k(t) | F_k(t−1)]   (19.5)
      = V₁(θ) V₂(θ).

Here y_k(t) and r_k(t) denote the observed values of Y_k(t) and R_k(t) for k = 1, 2, …, K and t = 1, 2, …, T. For uncensored data r_k(t) = r_k(t−1) − y_k(t−1), and V(θ) = V₁(θ). In case of censoring, we assume that θ = (θ₁, θ₂), where θ₁ is the parameter of interest, so that we can use the partial likelihood V₁ for inference on θ₁. Without further comment, we shall drop the subscript 1 from

θ. The fundamental assumption in our model is

P(Y_k(t) = y | R_k(t) = r) = P(Y(t) = y | R(t) = r).   (19.6)

This is a Markov-like assumption that characterizes our model. The likelihood

V(θ) = ∏_{k=1}^{K} ∏_{t=1}^{T} P^θ[Y_k(t) = y_k(t) | R_k(t) = r_k(t)]   (19.7)

is a complete likelihood for our model in the uncensored case and a partial likelihood in the right-censored case. It may also be viewed, in light of (19.5)–(19.6), as a partial likelihood for θ for censored or uncensored data when the true model for the complete, possibly unobserved, data does not satisfy assumption (19.6). Defining now

N(r, y, t) = #{k : R_k(t) = r, Y_k(t) = y},

we may write the log-likelihood of (19.7) as

L = ∑_{t=1}^{T} ∑_{r} ∑_{y≤r} N(r, y, t) ln(P^θ[Y(t) = y | R(t) = r]).   (19.8)

We now write P^θ[Y(t) = y | R(t) = r] as p_nn(r, y, t)/c(r, t) and parameterize the non-normalized probabilities p_nn(r, y, t) using a set of parameters p_{r,y}(t), indexed by r and 0 < y ≤ r, as follows:

p_nn(r, y, t) = exp( ∑_{0<r'≤r} ∑_{0<y'≤r'∧y} p_{r',y'}(t) )   (19.9)

and

c(r, t) = ∑_{y≤r} exp( ∑_{0<r'≤r} ∑_{0<y'≤r'∧y} p_{r',y'}(t) ).   (19.10)

When no restrictions are placed on the new parameters p_{r,y}(t), our model is saturated and imposes no further restrictions on the model beyond the basic Markov assumption (19.6). For d = 2 we have a set of five p-parameters: p_{11,11}(t), p_{11,01}(t), p_{11,10}(t), p_{10,10}(t) and p_{01,01}(t).
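To make the parameterization (19.9)–(19.10) concrete for d = 2, the following sketch (our own illustration; the parameter values are hypothetical and taken constant in t) enumerates the jump sets y ≤ r and computes P(Y(t) = y | R(t) = r):

```python
import math
from itertools import product

def subvectors(v, include_zero=False):
    """All 0/1 vectors u <= v componentwise; the zero vector is optional."""
    subs = list(product(*[range(vi + 1) for vi in v]))
    return [u for u in subs if include_zero or any(u)]

def p_nn(r, y, params):
    """Non-normalized probability (19.9): exp of the sum of p_{r',y'} over
    0 < r' <= r and 0 < y' <= r' ^ y (componentwise minimum)."""
    total = 0.0
    for rp in subvectors(r):
        cap = tuple(min(a, b) for a, b in zip(rp, y))
        for yp in subvectors(cap):
            total += params.get((rp, yp), 0.0)  # unspecified parameters are 0
    return math.exp(total)

def jump_prob(r, y, params):
    """P(Y(t) = y | R(t) = r) = p_nn(r, y)/c(r); the normalizing sum (19.10)
    runs over all y <= r, including y = 0 (no jump), for which p_nn = 1."""
    c = sum(p_nn(r, yy, params) for yy in subvectors(r, include_zero=True))
    return p_nn(r, y, params) / c

# Hypothetical values for the five saturated parameters when d = 2
params = {((1, 1), (1, 1)): -3.0, ((1, 1), (0, 1)): -0.5, ((1, 1), (1, 0)): -0.5,
          ((1, 0), (1, 0)): -2.0, ((0, 1), (0, 1)): -2.0}
print(jump_prob((1, 1), (1, 0), params))  # P(only member 1 jumps | both at risk)
```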
Theorem 19.3.1 For clusters of size d, the law defined by (19.7)–(19.10) with K = 1 for complete data defines a law for independent X₁, …, X_d if and only if p_{r,y}(t) ≡ 0 for all r with |r| > 1 and all y ≤ r, with marginal laws defined by

P(X_j = t) = [ ∏_{t'≤t} (1 + e^{p_j(t')})^{−1} ] e^{p_j(t)}   (19.11)

for p_j(t) = p_{r,y}(t) with r_{j'} = δ_{jj'} and y = r.



We state Theorems 19.3.2 and 19.3.3 for the bivariate case. Extensions to the general d-dimensional case are straightforward but require cumbersome notation.

Theorem 19.3.2 For the bivariate law defined by (19.7)–(19.10) with K = 1, d = 2 for the complete data (without censoring), we have:

p_{11,10} = ln( [P(X₁ = t, X₂ > t | X₁ ≥ t, X₂ ≥ t) / P(X₁ > t, X₂ > t | X₁ ≥ t, X₂ ≥ t)]
            / [P(X₁ = t, X₂ < t | X₁ ≥ t, X₂ < t) / P(X₁ > t, X₂ < t | X₁ ≥ t, X₂ < t)] )   (19.12)

and similarly for p_{11,01}. Also,

p_{11,11} = ln( [P(X₁ = t, X₂ = t) P(X₁ > t, X₂ > t)] / [P(X₁ = t, X₂ > t) P(X₁ > t, X₂ = t)] ).   (19.13)

PROOF. Immediate.

This shows that the parameter p_{11,10} is the log of the odds ratio of the failure risks for one subject in the cluster according to the presence or absence at risk of the other one in the cluster.
Theorem 19.3.3 A general discrete bivariate law on E_T ⊗ E_T belongs to the family of laws defined by (19.7)–(19.10) for K = 1, d = 2, as defined for complete data, if and only if:

P(X₂ = t | X₂ ≥ t, X₁ < t) = P(X₂ = t | X₂ ≥ t, X₁ = t₁)   ∀ 0 < t₁ < t,
P(X₁ = t | X₁ ≥ t, X₂ < t) = P(X₁ = t | X₁ ≥ t, X₂ = t₂)   ∀ 0 < t₂ < t.   (19.14)

PROOF. We first prove that (19.14) is a sufficient condition for being in our family. As for any bivariate law, we can write, for 0 < t₁ < t₂ ≤ T,

P(X₁ = t₁, X₂ = t₂) = P(X₁ ≥ t₁, X₂ ≥ t₁)
    × P(X₁ = t₁, X₂ > t₁ | X₁ ≥ t₁, X₂ ≥ t₁)
    × P(X₂ ≥ t₂ | X₁ = t₁, X₂ ≥ t₁ + 1)
    × P(X₂ = t₂ | X₂ ≥ t₂, X₁ = t₁)   (19.15)

  = ∏_{t=1}^{t₁−1} P(X₁ > t, X₂ > t | X₁ ≥ t, X₂ ≥ t)
    × P(X₁ = t₁, X₂ > t₁ | X₁ ≥ t₁, X₂ ≥ t₁)
    × ∏_{t=t₁+1}^{t₂−1} P(X₂ > t | X₁ = t₁, X₂ ≥ t)
    × P(X₂ = t₂ | X₁ = t₁, X₂ ≥ t₂).   (19.16)

Under assumption (19.14), we may replace X₁ = t₁ by X₁ < t and X₁ < t₂ respectively inside the two last factors, obtaining thus

P(X₁ = t₁, X₂ = t₂) = ∏_{t=1}^{t₁−1} P(X₁ > t, X₂ > t | X₁ ≥ t, X₂ ≥ t)
    × P(X₁ = t₁, X₂ > t₁ | X₁ ≥ t₁, X₂ ≥ t₁)
    × ∏_{t=t₁+1}^{t₂−1} P(X₂ > t | X₁ < t, X₂ ≥ t)
    × P(X₂ = t₂ | X₁ < t₂, X₂ ≥ t₂).   (19.17)

By symmetry, we have the same result for the case T ≥ t₁ > t₂ > 0. For 0 ≤ t₁ = t₂ = t ≤ T, we can always write

P(X₁ = t, X₂ = t) = ∏_{t'=1}^{t−1} P(X₁ > t', X₂ > t' | X₁ ≥ t', X₂ ≥ t')
    × P(X₁ = t, X₂ = t | X₁ ≥ t, X₂ ≥ t).   (19.18)

This proves the 'if' part of the theorem. We note that the formulation of this result in terms of our parameters p_nn and c(r, t) is

P(X₁ = t₁, X₂ = t₂) = [ ∏_{t=1}^{t₁−1} 1/c((1,1), t) ] × [ p_nn((1,1), (1,0), t₁) / c((1,1), t₁) ]
    × [ ∏_{t=t₁+1}^{t₂−1} 1/c((0,1), t) ] × [ p_nn((0,1), (0,1), t₂) / c((0,1), t₂) ],

P(X₁ = t₁, X₂ = t₁) = [ ∏_{t=1}^{t₁−1} 1/c((1,1), t) ] × [ p_nn((1,1), (1,1), t₁) / c((1,1), t₁) ].

Conversely, if L(X₁, X₂) is in our family, (19.14) follows directly from the basic assumption of our model (19.6). Thus, a test criterion for being in our family may be based on equations (19.14). This completes the proof of the theorem.

Remarks

1. In case of censoring, the only unusable observations are those for which the first time is a censoring time. Otherwise, one can stratify on the first observed "death" time the number of the second "deaths" at each time t and check for independence. For consistency and asymptotic results for our partial maximum likelihood estimate see Gross and Huber (2000).

2. In applications, when (X_{k1}, …, X_{kd}) are not exchangeable, their indices can represent "structural covariates". In some applications X_{k1} may represent survival of the treated and X_{k2} survival of the non-treated. In the skin grafts example below there are up to 4 closely matched grafts and up to 4 poorly matched grafts per patient. One representation of the data would then involve up to d = 8 grafts, where X_{k1}, …, X_{k4} represent survival of closely matched grafts, and X_{k5}, …, X_{k8} represent survival of poorly matched grafts. We chose a more parsimonious representation below that reflects the exchangeability within the poorly matched and within the closely matched grafts.

19.4 An Application to Skin Grafts Rejection


19.4.1 Description of the data
The data from Bachelor and Hackett (1970), see Table 19.1, concern skin grafts on severely burnt patients having 30% to 50% of the surface of their skin burnt. There are at most N = 4 grafts per patient, and the HLA matching of the donor and the receiver is divided into two classes: 1 for close and 2 for poor. Only K = 16 patients (clusters) are available in all. The question is what is the impact of the HLA matching on the duration of the graft, taking into account the dependence of the different grafts on the same patient. The censoring time is death of the patient before rejection. The data are as follows. The notation for the risk and jump sets is slightly different, as one has to take into account the fact that some elements in the cluster are exchangeable, leading to equality of the corresponding parameters p's. Examples of risk sets, jump sets and parameters are given in Table 19.2. Since the data refer to 16 patients in all, with a total of 34 allografts, a completely nonparametric model that leaves all main effect parameters p_{1,1}(t) and p_{2,2}(t), for t between 11 and 93 days, unrestricted is not feasible. We therefore attempted a model with time-linear main effects and constant interaction parameters; that is model 1 (see Table 19.3). We dropped the time dependency from the model using Akaike's criterion and chose model 2. We then proceeded to search for the most parsimonious model with constant p-parameters. In the model chosen for its smallest AIC, model 8, only three parameters remain: p_{1,1}, p_{2,2} and p_{12,1} = p_{12,2}. Thus the single interaction parameter in the model does not depend on the quality of the match (1 = close, 2 = poor). Since p_{1,1} is a log odds and so is p_{2,2}, we may estimate the closeness-of-match effect by p_{1,1} − p_{2,2} = −1.42, with a 95% confidence interval equal to (−2.21, −0.62), leading to e^{(p_{1,1} − p_{2,2})} = 0.24, with a 95% confidence interval equal to (0.11, 0.53). The model we chose, model 8, shows a main effect for graft match, but no second-order effect, since p_{12,1} = p_{12,2} in this model. This should be compared to the results of Nielsen et al. (1992). They fit a Cox model with gamma frailty and obtained an estimate of the "treatment effect" e^β = 0.31. The likelihood ratio test of H₀ : β = 0 was 10.28. No confidence interval is reported. They mention that Kalbfleisch and Prentice (1980) obtained an

Table 19.1: Bachelor and Hackett (1970) skin grafts data on severely burnt
patients

Patient sex age % burnt HLA time until rejection censoring


1 m 7 38 p 19 1
2 f 8 40 c 24 1
3 m 2.5 40 p 18 1
p 18 1
4 f 7 50 p 29 1
c 37 1
5 m 22 35 c 19 1
p 13 1
6 f 14 35 p 19 1
p 19 1
7 f 63 40 c 57 0
c 57 0
p 15 1
8 m 47 45 c 93 1
p 26 1
9 f 60 20 c 16 1
p 11 1
10 m 57 18 c 22 1
p 17 1
11 m 17 45 c 20 1
p 26 1
12 m 16 30 p 21 1
c 18 1
13 m 32 55 c 77 1
c 63 1
p 43 1
c 29 1
14 f 31 50 p 28 0
p 28 0
15 f 42 30 c 29 1
p 15 1
p 18 1
16 m 62 45 p 41 1
c 60 0

Table 19.2: Some risk sets R and jump sets S for skin grafts data

Risk sets:
R_k(i) = (1122): 2 close and 2 poor grafts are present at time i in patient k.
R_{k'}(i) = (1120): 2 close and 1 poor grafts are present at time i in patient k'.
R_{k''}(i) = (1100): 2 close grafts are present at time i in patient k''.

Jump sets:
S_k(i) = (1200): 1 close and 1 poor graft were rejected at time i in patient k.
S_{k'}(i) = (2000): 1 poor graft was rejected at time i in patient k'.
S_{k''}(i) = (0000): no graft was rejected at time i in patient k''.

Table 19.3: Model selection for burn data

    Model #   Model description                                                    AIC
    1         p_{1,1}(t) = a₁ + b₁t, p_{2,2}(t) = a₂ + b₂t,
              p_{11,1}, p_{22,2}, p_{12,1}, p_{12,2} constant                      221.97
    2         p_{1,1}, p_{2,2}, p_{11,1}, p_{22,2}, p_{12,1}, p_{12,2} constant    220.10
    3         p_{1,1}, p_{2,2}, p_{12,1} constant                                  219.59
    4         p_{1,1}, p_{2,2}, p_{11,1} constant                                  220.52
    5         p_{1,1}, p_{2,2}, p_{12,2} constant                                  216.52
    6         p_{1,1} = p_{2,2}, p_{12,2} constant                                 227.79
    7         p_{1,1} = p_{2,2}, p_{12,1} = p_{12,2} constant                      226.72
    8         p_{1,1}, p_{2,2}, p_{12,1} = p_{12,2} constant                       215.21 *
    9         p_{1,1}, p_{2,2} constant                                            218.90
    10        p_{1,1}, p_{2,2}, p_{22,2} constant                                  217.28
    11        p_{1,1}, p_{2,2}, p_{12,1}, p_{12,2} constant                        217.21
    12        p_{1,1}, p_{2,2}, p_{12,1}, p_{22,2} constant                        217.73

Table 19.4: Parameter estimates in model 8, having the smallest AIC

    Parameter             Estimate      95% confidence interval     AIC
    p_{1,1}               -3.1438226    (-3.816822, -2.470823)      215.2077
    p_{2,2}               -1.7249390    (-2.177313, -1.272564)
    p_{12,1} = p_{12,2}   -0.6581479    (-1.951248, 0.634952)

effect of 0.22 with a 95% confidence interval of (0.048, 1.03) based on a subset of the data. Our result is certainly in rough agreement with theirs. Nielsen et al. (1992) could not reject the hypothesis that the variance parameter of the frailty gamma distribution is zero, in other words, the hypothesis that allografts of a given patient are independent. In our fitted model we can reject the null hypothesis that allograft survivals in a single patient are independent (although the 95% confidence interval for p_{12,1} = p_{12,2} contains zero, but just barely). The likelihood ratio test for H₀ : p_{12,1} = p_{12,2} = 0 in model 8 is LR = 5.7, which is significant with p = 0.02. Note that our model contains only one dependence parameter and that it does not depend on graft match.

References
1. Arjas, E. and Haara, P. (1988). A note on the asymptotic normality in
the Cox regression model, The Annals of Statistics, 16, 1133-1140.
2. Bachelor, J. R. and Hackett, M. (1970). HL-A Matching in treatment of
burned patients with skin allografts, Lancet, 19, 581-583.
3. Clayton, D. G. and Cuzick, J. (1985). Multivariate generalizations of the proportional hazards model (with discussion), Journal of the Royal Statistical Society, Series A, 148, 82-117.
4. Clegg, L. X., Cai, J., and Sen, P. K. (1999). A marginal mixed baseline hazards model for multivariate failure time data, Biometrics, 55, 805-812.
5. Efron, B. (1988). Logistic regression, survival analysis, and the Kaplan-
Meier curve, Journal of the American Statistical Association, 83,414-425.
6. Gross, S. T. and Huber-Carol, C. (2000). Hierarchical dependency models for multivariate survival data with censoring, Lifetime Data Analysis, 6, 299-320.
7. Hanley, J. A. and Parnes, M. N. (1983). Nonparametric estimation of
a multivariate distribution in the presence of censoring, Biometrics, 39,
129-139.
8. Huster, W. J., Brookmeyer, R., and Self, S. G. (1989). Modeling paired
survival data with covariates, Biometrics, 45, 145-156.
9. Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data, New York: John Wiley & Sons.
10. Nielsen, G. G., Gill, R. D., Andersen, P. K., and Sorensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models, Scandinavian Journal of Statistics, 19, 25-43.

11. Parner, E. (1998). Asymptotic theory for the correlated gamma-frailty model, Annals of Statistics, 26, 183-214.

12. Ross, E. A. and Moore, D. (1999). Modeling clustered discrete or grouped


time survival data with covariates, Biometrics, 55, 813-819.

13. Scheike, T. H., Petersen, J. H., and Martinussen, T. (1999). Retrospective


ascertainment of recurrent events: An application to time to pregnancy,
Journal of the American Statistical Association, 94, 713-725.

14. Vaupel, J. W., Manton, K. G., and Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality, Demography, 16, 439-447.
20
Discrimination Index, the Area Under the ROC
Curve

Byung-Ho Nam and Ralph B. D'Agostino


Boston University, Boston, Massachusetts

Abstract: The accuracy of fit of a mathematical predictive model is the degree


to which the predicted values coincide with the observed outcome. When the
outcome variable is dichotomous and predictions are stated as probabilities
that an event will occur, models can be checked for good discrimination and
calibration. In case of the multiple logistic regression model for binary outcomes
(event, non-event), the area under the ROC (Receiver Operating Characteristic)
curve is the most used measure of model discrimination. The area under the
ROC curve is identical to the Mann-Whitney statistic. We consider shift models
for the distributions of predicted probabilities for event and non-event. From the
interval estimates of the shift parameter, we calculate the confidence intervals
for the area under the ROC curve. Also, we present the development of a
general description of an overall discrimination index C (overall C) which we
can extend to a survival time model such as the Cox regression model. The
general theory of rank correlation is applied in developing the overall C. The
overall C is a linear combination of three independent components: event vs.
non-event, event vs. event and event vs. censored. By showing that these three
components are asymptotically normally distributed, the overall C is shown to
be asymptotically normally distributed. The expected value and the variance
of the overall C are presented.

Keywords and phrases: Discrimination, calibration, logistic regression


model, receiver operating characteristic, Cox regression model, discrimination
index, health risk appraisal function, confidence interval, Mann-Whitney sta-
tistic, censoring


20.1 Introduction
Background: Performance measures in mathematical predictive
models
Consider a vector of variables V = (V₁, V₂, …, V_k), the independent variables in a regression, or risk factors, and a variable W, the dependent or outcome variable, taking the value 1 for a positive outcome and 0 for a negative outcome. Here, 'positive outcome' indicates occurrence or presence of an event such as coronary heart disease.

Health Risk Appraisal Functions (HRAF) are mathematical models that are functions of the data (V) which relate to the probability of an event (W). Symbolically, for the configuration V of the data,

f(V) = f(V₁, V₂, …, V_k) = P(W = 1) = p,

where P(W = 1) is the probability of a positive outcome or an event.
Our focus is to evaluate the performance of a HRAF with regard to its ability to predict the outcome variable. We consider two models: the logistic regression model, and a survival model such as the Cox regression model.

First, the logistic regression model relates V to the development of an event over a period of time t, and the following is its mathematical expression:

P(W = 1) = 1 / (1 + exp(−β'V)).

Second, the Cox regression model is a survival analysis model that relates V to the development of an event over a period of time t, but in this model we take into consideration the time to event and censoring (for example, dropouts and losses to follow-up). The following is its mathematical expression:

P(W = 1) = 1 − [S₀(t)]^{exp(β'(V − V̄))},

where S₀(t) is the survival probability at time t for those with the mean vector of covariate values V̄.
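As a small illustration (our own sketch; all inputs are assumed to come from a previously fitted model, and s0_t denotes the baseline survival probability at the horizon t), both predicted probabilities can be computed directly from these formulas:

```python
import numpy as np

def logistic_risk(beta, v):
    """Logistic HRAF: P(W = 1) = 1 / (1 + exp(-beta'v))."""
    return 1.0 / (1.0 + np.exp(-np.dot(beta, v)))

def cox_risk(beta, v, v_bar, s0_t):
    """Cox-based HRAF: P(W = 1) = 1 - S_0(t)^exp(beta'(v - v_bar))."""
    return 1.0 - s0_t ** np.exp(np.dot(beta, np.asarray(v) - np.asarray(v_bar)))
```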

Measures of predictive accuracy


The accuracy of a model is the degree to which the predicted values coincide with the observed outcomes. When the outcome variable is dichotomous and predictions are stated as probabilities that an event will occur, models can be checked for two general concepts, discrimination and calibration. Our focus here is to

evaluate the performance of a model with regard to its discrimination. Discrimination refers to a model's ability to correctly distinguish the two classes
of outcomes. Perfect discrimination would result in two non-overlapping sets of
predicted probabilities from the model, one set for the positive outcomes, the
other for the negative outcomes.

20.2 Nonparametric Confidence Interval for Area under the ROC Curve
20.2.1 Discrimination in logistic regression

The area under the Receiver Operating Characteristic (ROC) curve is one of the most used measures of model discrimination. The ROC curve is constructed from the logistic regression as follows.

Suppose we have n subjects. All n subjects have their predicted probabilities (Y₁, Y₂, …, Yₙ). Then, select some probability value Y* and state as the decision rule that all subjects with predicted probabilities equal to or above the value Y* will be classified as positive, and all below it will be classified as negative. Hence, for each Y*, a two-by-two table such as the following can be generated:

                        Call subject + if Yᵢ ≥ Y*
                             +          −
    True state    +          a          b
                  −          c          d

From this table, sensitivity = a/(a + b) and specificity = d/(c + d) can be calculated. If one selects all possible values of Y* as decision cutoff points, plots sensitivity on the Y axis and 1 − specificity on the X axis in a two-dimensional graph, and connects the plots by a line curve, then the resulting curve is called the Receiver Operating Characteristic (ROC) curve. The area under this curve is a measure
of model discrimination. The interpretation of this area, also called the C statistic, is that it is the estimated probability that the predicted value for a positive outcome is higher than that for a negative outcome. Thus,

C statistic = C = area under the ROC curve = P(Y₁ > Y₂),

where
Y₁ = predicted probabilities for those who had events,
Y₂ = predicted probabilities for those without events.

The value of C varies from 0.5, with no discrimination ability, to 1, with perfect discrimination, and is related only to the ranks of the predicted probabilities. Bamber (1975) recognized that the area under the ROC curve is an unbiased estimator of the probability of correctly ranking an (event, no-event) pair and that this probability is closely connected with the Mann-Whitney statistic.

Hanley and McNeil (1982) elaborated the relationship between the area under the ROC curve and the Mann-Whitney statistic and showed that the two are identical, i.e.,

C statistic = C = U / (n₁n₂),   (20.1)

where

n₁ = number of those who had events,
n₂ = number of those without events,
U = [number of pairs (Y₁, Y₂) with Y₁ > Y₂]
    + ½ [number of pairs (Y₁, Y₂) with Y₁ = Y₂].
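Computationally, (20.1) is just a count over the n₁n₂ pairs; the following is a minimal sketch (our own illustration, not from the paper):

```python
import numpy as np

def c_statistic(y_event, y_nonevent):
    """Area under the ROC curve as the Mann-Whitney statistic, eq. (20.1)."""
    y1 = np.asarray(y_event)[:, None]     # predicted probabilities, events
    y2 = np.asarray(y_nonevent)[None, :]  # predicted probabilities, non-events
    wins = (y1 > y2).sum()                # pairs with Y1 > Y2
    ties = (y1 == y2).sum()               # pairs with Y1 = Y2, weighted 1/2
    return (wins + 0.5 * ties) / (y1.size * y2.size)
```

For instance, c_statistic([0.9, 0.7], [0.7, 0.3]) returns (3 + 0.5·1)/4 = 0.875.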

Lehmann (1975) showed that the C statistic is asymptotically normally distributed; see the appendix for details. Several methods for constructing confidence intervals or confidence bounds for the area under the ROC curve have been proposed. Sen (1967) and Govindarajulu (1991) proposed confidence intervals based upon the asymptotic normality of the Mann-Whitney statistic and a consistent estimator of its standard error. Birnbaum and McCarthy (1958) developed a method for constructing confidence bounds that does not depend on asymptotic normality, but which is very conservative. Ury (1972) proposed a method that uses Chebychev's inequality in place of asymptotic normality, which yields a very conservative confidence interval. More recently, Hilgers (1991) proposed distribution-free confidence bounds for ROC curves based on a combination of two local confidence bounds calculated separately for the sensitivity and specificity. Schafter (1994) proposed efficient confidence bounds for ROC curves by elaborating the test statistic introduced by Greenhouse and Mantel, and generalized by Linnet (1987).

In this paper, we propose a nonparametric approach for constructing a confidence interval for the area under a ROC curve. In Section 20.2.1, we reviewed the computation of the area under the ROC curve. In Section 20.2.2, we describe how to obtain point and interval estimates for the shift parameter Δ, the "difference" between the distributions of predicted probabilities of the events and non-events. Using the interval estimates for Δ, we then derive a confidence interval for the area under the ROC curve in Section 20.2.3.
20.2.2 Estimation of the shift parameter Δ under the shift model

As above, let the Y1's be the predicted probabilities for those who had events
and the Y2's be the predicted probabilities for those without events. Let the
cumulative distribution functions of Y1 and Y2 be G and F. The shift model
assumes that G(y) = F(y − Δ) for all y, Δ > 0, so that the distribution G is
obtained by shifting F by an amount Δ.

Point estimate of Δ

Say we have random samples of n1 events and n2 non-events; then, under the
shift model, the observations Y21, Y22, ..., Y2n2 and Y11 − Δ, Y12 − Δ, ...,
Y1n1 − Δ have the same distribution. Hence, we can estimate Δ by the amount by
which the Y1-values must be shifted to give the best possible agreement with
the Y2-values. To do this, we define D_jk = Y1j − Y2k, for j = 1 to n1 and
k = 1 to n2. Then, following Lehmann (1975), the estimator Δ̂ is the median of
the n1n2 values D_jk, which Lehmann (1975) showed is an unbiased estimator of
Δ (i.e., E(Δ̂) = Δ) and also median unbiased (i.e., Δ is the median of the
distribution of Δ̂) if one of the following conditions is satisfied:

(1) The distribution F is symmetric about some point μ.

(2) The two sample sizes are equal, that is, n1 = n2.

Confidence interval of Δ

A distribution-free (i.e., independent of F) confidence interval
(Δ_low, Δ_up) for Δ can now be obtained readily from another theorem of
Lehmann's (1975, p. 87). From this theorem, we have

    (Δ_low, Δ_up) = (D_(ℓ), D_(n1n2 − ℓ + 1)),    (20.2)

where D_(ℓ) is the ℓth ordered value of the D_jk, for j = 1 to n1 and k = 1 to
n2, and δ is the tail probability determining the coverage, such that

    δ = P(W_{Y1Y2} ≤ ℓ − 1).    (20.3)

For small n1 and n2, we can use the table for the Mann-Whitney test to obtain
the value of ℓ and the closest value of δ (e.g., 0.5%, 2.5%, 97.5%, 99.5%).
If n1 and n2 are large, Lehmann (1975) suggested a normal approximation to get
the value of ℓ. Applying the approximation with continuity correction to the
right-hand side of (20.3), we have

    δ = Φ((ℓ − 1/2 − n1n2/2) / sqrt(n1n2(n1 + n2 + 1)/12)),    (20.4)
where Φ(·) is the cumulative distribution function of the standard normal
distribution. For a 95% confidence interval such that
P(Δ_low ≤ Δ ≤ Δ_up) = 0.95, the lower bound is Δ_low = D_(ℓ), where

    ℓ = (1/2)[n1n2 + 1 − 1.96 · sqrt(n1n2(n1 + n2 + 1)/3)],    (20.5)

so Δ_low is the ℓth value from the lowest of the n1n2 values D_jk. In
a similar fashion, the upper bound Δ_up = D_(n1n2 − ℓ + 1) is the
(n1n2 − ℓ + 1)th value from the lowest of the values D_jk.
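As a hedged illustration (our own sketch, not code from the paper), the point estimate and the distribution-free limits can be computed directly from the ordered pairwise differences D_jk:

```python
import numpy as np

def shift_ci(y1, y2, z=1.96):
    """Hodges-Lehmann type estimate and 95% CI for the shift Delta.
    Delta_hat = median of the n1*n2 differences D_jk = Y1j - Y2k;
    Delta_low = D_(l), Delta_up = D_(n1*n2 - l + 1), with l from (20.5).
    Rounding l to the nearest integer is our own convention here."""
    y1, y2 = np.asarray(y1), np.asarray(y2)
    n1, n2 = len(y1), len(y2)
    d = np.sort((y1[:, None] - y2[None, :]).ravel())   # all D_jk, ascending
    delta_hat = np.median(d)
    l = int(round(0.5 * (n1 * n2 + 1 - z * np.sqrt(n1 * n2 * (n1 + n2 + 1) / 3.0))))
    return delta_hat, d[l - 1], d[n1 * n2 - l]         # 1-based order statistics
```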

20.2.3 Confidence interval for the area under the ROC curve

We now construct the lower and the upper confidence bounds for C,
(C_low, C_up), by using the lower and the upper bounds for Δ. Let Dis(Y)
denote the distribution of Y. Then, it can be seen that, under the shift
model,

    Dis(Y1 − Δ0) = Dis(Y2).    (20.6)

Hence,

    Dis(Y1 + Δ1 − Δ0) = Dis(Y2 + Δ1).    (20.7)

For the lower bound (C_low), let Δ1 = Δ_low, Δ0 = Δ̂. Then, from (20.6) and
(20.7), Dis(Y1 − Δ̂) = Dis(Y2) and

    Dis(Y1 + Δ_low − Δ̂) = Dis(Y2 + Δ_low).    (20.8)

Now, say we have a new pair (Y2k, Vj) for k = 1 to n2, j = 1 to n1, where
Vj = Y1j + (Δ_low − Δ̂). Hence, C_low would be

    C_low = (1/(n1 n2)) [{number of pairs (k, j) with Y2k < Vj}
            + (1/2){number of pairs (k, j) with Y2k = Vj}]
          = (1/(n1 n2)) W_{VY2}.    (20.9)

For the upper bound (C_up), let Δ1 = Δ_up, Δ0 = Δ̂. Then, from (20.6) and
(20.7), Dis(Y1 − Δ̂) = Dis(Y2). So,

    Dis(Y1 + Δ_up − Δ̂) = Dis(Y2 + Δ_up).    (20.10)

Now, say we have a new pair (Y2k, Uj) for k = 1 to n2, j = 1 to n1, where
Uj = Y1j + (Δ_up − Δ̂). Hence, C_up would be (1/(n1 n2)) W_{UY2}. Therefore,
the confidence interval for C, the area under the ROC curve, would be

    (C_low, C_up) = ((1/(n1 n2)) W_{VY2}, (1/(n1 n2)) W_{UY2}).    (20.11)
Since W_{Y1Y2} is monotonic non-decreasing, this interval preserves the
confidence coverage probability of (Δ_low, Δ_up).

For example, below we have 20 non-events and 5 events and their predicted
probabilities:

    Y1: 0.111  0.148  0.189  0.237  0.251
    Y2: 0.034  0.067  0.095  0.107  0.114  0.121  0.128  0.133  0.139  0.142
        0.147  0.152  0.155  0.164  0.175  0.187  0.193  0.216  0.227  0.243

From (20.1), we compute W_{Y1Y2} = 70 and C = 70/100 = 0.7.

Point estimate of Δ

Δ̂ is the median of the n1n2 values D_jk. Here, we have n1n2 = 100, and we see
that

    Δ̂ = (1/2)[D_(50) + D_(51)] = (1/2)[0.041 + 0.042] = 0.0415.

Confidence interval for Δ

From (20.5), the ℓ value for a 95% confidence interval is ℓ = 22. Hence,
Δ_low is D_(22) = −0.022. Similarly, Δ_up is D_(100−22+1) = D_(79) = 0.104.
So, the 95% confidence interval for Δ is (−0.022, 0.104).

Confidence interval for C

For C_low, we get Vj = Y1j + (Δ_low − Δ̂) = Y1j + (−0.022 − 0.0415) = Y1j − 0.0635
for j = 1 to 5. Then, we calculate W_{VY2} = 38, hence C_low = 38/100 = 0.38.
For C_up, we get Uj = Y1j + (Δ_up − Δ̂) = Y1j + 0.104 − 0.0415 = Y1j + 0.0625 for
j = 1 to 5. Then we calculate W_{UY2} = 91, hence C_up = 91/100 = 0.91.
Therefore, the 95% confidence interval for C is (0.38, 0.91).
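This worked example can be reproduced with the sketches given earlier (c_statistic and shift_ci are our illustrative helpers from above, not functions from the paper):

```python
y1 = [0.111, 0.148, 0.189, 0.237, 0.251]
y2 = [0.034, 0.067, 0.095, 0.107, 0.114, 0.121, 0.128, 0.133, 0.139, 0.142,
      0.147, 0.152, 0.155, 0.164, 0.175, 0.187, 0.193, 0.216, 0.227, 0.243]

c = c_statistic(y1, y2)                    # 0.70
d_hat, d_low, d_up = shift_ci(y1, y2)      # 0.0415, -0.022, 0.104
v = [y + (d_low - d_hat) for y in y1]      # Y1 shifted down by 0.0635
u = [y + (d_up - d_hat) for y in y1]       # Y1 shifted up by 0.0625
c_low = c_statistic(v, y2)                 # 0.38
c_up = c_statistic(u, y2)                  # 0.91
```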

20.3 Extension of C Statistic to Survival Analysis


Suppose we have n individuals, among which n1 developed events in time t
(event), n2 did not develop events in time t (non-event) and n3 were censored
in time t (censored), with n = n1 + n2 + n3.
Define

    Ti = survival time for the ith individual, i = 1, 2, ..., n,
    Yi = predicted probability of developing an event in time t for the
         ith individual, i = 1, 2, ..., n.

Then, we have n pairs (T1, Y1), (T2, Y2), ..., (Tn, Yn).
Define

    C = (1/Q) Σ_i Σ_j a_ij b_ij,

where

    Q = the total number of comparisons made,
    a_ij = 1 if Ti < Tj and at least one of the pair (Ti, Tj) is for an event,
           i, j = 1, 2, ..., n,
         = 0 otherwise,
    b_ij = 1 if Yi > Yj and at least one of the pair (Yi, Yj) is for an event,
           i, j = 1, 2, ..., n,
         = 0 otherwise.
Hence we have

    T1i = survival time for an event, i = 1, 2, ..., n1,
    Y1i = predicted probability for an event, i = 1, 2, ..., n1,
    T2j = survival time for a non-event, j = 1, 2, ..., n2,
    Y2j = predicted probability for a non-event, j = 1, 2, ..., n2,
    T3j = survival time for a censored observation, j = 1, 2, ..., n3,
    Y3j = predicted probability for a censored observation, j = 1, 2, ..., n3.
From the above, we have three sets of comparisons:

1. event vs. non-event: comparing those who developed events against those
who did not

2. event vs. event: comparing those who developed events against those who
also developed events

3. event vs. censored: comparing those who developed events against those
who were censored

Note that these three comparisons are independent of one another.


Now, we examine the first component of overall C (event vs. non-event):
Define C statistic, Cl

(20.12)
where

    Q1 = the total number of comparisons made,
    a_ij = 1 if T1i < T2j,
         = 0 otherwise,
    b_ij = 1 if Y1i > Y2j,
         = 0 otherwise.

Here, since all the survival times for those who did not develop events are
longer than the maximum event time among those who developed events, a_ij is
always equal to 1. Hence,

    C1 = (1/Q1) Σ_{i=1}^{n1} Σ_{j=1}^{n2} b_ij.    (20.13)

The numerator in C1 is exactly the same as the Mann-Whitney statistic for
continuous data when we compare the predicted probabilities for dichotomous
outcomes, where Q1 = n1 · n2. Thus, C1 can be expressed as

    C1 = (1/(n1 n2)) W_{Y1Y2}.    (20.14)

Therefore, C1 is asymptotically normally distributed.


Next, we examine the second component of the overall C (event vs. event). One
important assumption made here is that one who developed an event earlier in
time has a higher predicted probability for an event. Define

    C2 = (1/Q2) Σ_{i<j} a_ij b_ij,    (20.15)

where

    a_ij = 1 if T1i < T1j, i, j = 1, 2, ..., n1, i < j,
         = 0 otherwise,
    b_ij = 1 if Y1i > Y1j, i, j = 1, 2, ..., n1, i < j,
         = 0 otherwise.

C2 is very closely related to the rank correlation coefficient τ: τ has the
total score in its numerator, while C2 has the total positive score, PS, in
its numerator. Both τ and C2 have the same denominator Q2, which is equal to
(1/2) n1(n1 − 1). It can be shown that C2 has a linear relationship with τ,

    C2 = (1/2)(τ + 1).    (20.16)

Kendall (1970) showed that τ is asymptotically normally distributed; see the
appendix for its mean and variance. Hence C2 is also asymptotically normal.
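As an illustrative sketch (ours, assuming no ties among the event times or the predicted probabilities), C2 can be obtained from the sample Kendall correlation:

```python
from scipy.stats import kendalltau

def c2(event_times, event_probs):
    """C2 = (tau + 1)/2, where tau is Kendall's rank correlation between the
    event times and the negated predicted probabilities, so that a pair is
    concordant exactly when the earlier event has the higher probability."""
    tau, _ = kendalltau(event_times, [-p for p in event_probs])
    return 0.5 * (tau + 1.0)
```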
Now, we examine the third component of the overall C (event vs. censored).
Define

    C3 = (1/Q3) Σ_{i=1}^{n1} Σ_{j=1}^{n3} a_ij b_ij,    (20.17)

where

    a_ij = 1 if T1i < T3j, i = 1, 2, ..., n1, j = 1, 2, ..., n3,
         = 0 otherwise,
    b_ij = 1 if Y1i > Y3j, i = 1, 2, ..., n1, j = 1, 2, ..., n3,
         = 0 otherwise.

In this component, we only take a pair of comparison [(T1i, Y1i), (T3j, Y3j)]
where the censored time (T3j) is longer than the event time (T1i). Thus, the
numerator of C3 can be expressed as Σ_{i=1}^{n1} Σ_{j=1}^{n3} (b_ij | a_ij = 1).
Also, Q3 is equal to Σ_{i=1}^{n1} Σ_{j=1}^{n3} a_ij. An important assumption
that we make here is that the censoring occurs completely at random, so that
C3 is independent of C1 and C2. The numerator of C3 is the same as the
Mann-Whitney statistic given that we only include the pairs where the censored
time is longer than the event time. Hence, C3 can be described as a
conditional Mann-Whitney statistic. We express C3 as

    C3 = (1/Q3) W_{Y1Y3 | T1 < T3}.    (20.18)

Since the Mann-Whitney statistic is asymptotically normally distributed, we
can argue that C3 is also asymptotically normal. See the appendix for its mean
and variance.
Now, we combine all three components into one overall discrimination index C.
From (20.13), (20.15) and (20.17), we have

    C = (Q1 C1 + Q2 C2 + Q3 C3) / (Q1 + Q2 + Q3),

so C is a linear combination of C1, C2 and C3, C = a C1 + b C2 + (1 − a − b) C3,
where

    a = Q1 / (Q1 + Q2 + Q3),  b = Q2 / (Q1 + Q2 + Q3).

Now, we can argue that the overall C tends to normality, since C1, C2 and C3
are all independent of one another and each of them is asymptotically normal.
See the appendix for the mean and variance of the overall C.
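Putting the three components together, here is a minimal sketch (our own illustration; ties in the predicted probabilities and in the event times are scored as non-concordant for simplicity) of the overall index C = (Q1 C1 + Q2 C2 + Q3 C3)/(Q1 + Q2 + Q3):

```python
def overall_c(event, nonevent, censored):
    """Overall discrimination index C for survival data over [0, t].
    event:    list of (T, Y) for those with events by time t
    nonevent: list of Y for those followed event-free to time t
    censored: list of (T, Y) for those censored before time t."""
    num = den = 0
    # 1. event vs. non-event: every pair is usable (a_ij = 1)
    for _, yi in event:
        for yj in nonevent:
            num += yi > yj
            den += 1
    # 2. event vs. event: the earlier event should have the higher probability
    for i in range(len(event)):
        for j in range(i + 1, len(event)):
            (ti, yi), (tj, yj) = event[i], event[j]
            earlier, later = (yi, yj) if ti < tj else (yj, yi)
            num += earlier > later
            den += 1
    # 3. event vs. censored: usable only when censoring occurs after the event
    for ti, yi in event:
        for tj, yj in censored:
            if tj > ti:
                num += yi > yj
                den += 1
    return num / den
```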

Appendix

1. Mean and variance of the C statistic, the area under the ROC curve in
logistic regression:

    E(C) = P(Y1 > Y2) = P1,
    Var(C) = (1/(n1 n2)) [P1(1 − P1) + (n1 − 1)(P12 − P1²)
             + (n2 − 1)(P13 − P1²)],

where

    P12 = P(Y2 < Y1 and Y2 < Y1') for Y1 ≠ Y1',
    P13 = P(Y2 < Y1 and Y2' < Y1) for Y2 ≠ Y2'.

2. Mean and variance of τ̂:

    E(τ̂) = τ,
    Var(τ̂) = [4(n1 − 2)/(n1(n1 − 1))] Var(τi) + [2/(n1(n1 − 1))](1 − τ²),

where τ is the rank correlation coefficient and Var(τi) is an unknown
quantity.

3. C3 tends to normality with

    E(C3) = P3,
    Var(C3) = (1/Q3²) [Q3 P3(1 − P3) + A(P32 − P3²) + B(P33 − P3²)],

where

    P32 = P(Y1 > Y3, Y1' > Y3 | T1 < T3, T1' < T3),
    P33 = P(Y1 > Y3, Y1 > Y3' | T1 < T3, T1 < T3'),

and Q3, A and B are unknown quantities.

4. Mean and variance of the overall C:

    Var[C] = Var[a C1 + b C2 + (1 − a − b) C3]
           = a² Var[C1] + b² Var[C2] + (1 − a − b)² Var[C3]
           = [n1n2 / {n1n2 + (1/2)n1(n1 − 1) + Σ_{i=1}^{n1} Σ_{j=1}^{n3} a_ij}²]
             × {P1(1 − P1) + (n1 − 1)(P12 − P1²) + (n2 − 1)(P13 − P1²)}
           + [{(1/2)n1(n1 − 1)}² / {n1n2 + (1/2)n1(n1 − 1) + Σ_{i=1}^{n1} Σ_{j=1}^{n3} a_ij}²]
             × (1/4)[ (4(n1 − 2)/(n1(n1 − 1))) Var(τi) + (2/(n1(n1 − 1)))(1 − τ²) ]
           + [1 / {n1n2 + (1/2)n1(n1 − 1) + Σ_{i=1}^{n1} Σ_{j=1}^{n3} a_ij}²]
             × [ Σ_{i=1}^{n1} Σ_{j=1}^{n3} a_ij · P3(1 − P3)
                 + A(P32 − P3²) + B(P33 − P3²) ].


References
1. Bamber, D. (1975). The area above the ordinal dominance graph and
the area below the receiver operating graph, Journal of Mathematical
Psychology, 12, 387-415 .

2. Birnbaum, Z. W. and McCarthy, R. C. (1958). A distribution-free upper
bound for Pr{Y < X}, based on independent samples of X and Y,
Annals of Mathematical Statistics, 29, 558-562.

3. Govindarajulu, Z. (1991). Distribution-free confidence bounds for P(X <


Y), Methods of Information in Medicine, 30, 96-101 .

4. Hanley, J. A. and McNeil, B. J. (1982). The meaning and use of the area
under a receiver operating characteristic (ROC) curve, Radiology, 143,
29-36.

5. Hilgers, R. A. (1991). Distribution-free confidence bounds for ROC curves,


Methods of Information in Medicine, 30, 96-101.

6. Kendall, M. G. (1970). Rank Correlation Methods, London: Griffin.

7. Lehmann, E. L. (1975). Nonparametrics: Statistical Methods Based on
Ranks, San Francisco: Holden-Day.

8. Linnet, K. (1987). Comparison of quantitative diagnostic tests: type I


error, power, and sample size, Statistics in Medicine, 6, 147-158.

9. Schafter, H. (1994). Efficient confidence bounds for ROC curves, Statistics


in Medicine, 13, 1551-1561.
10. Sen, P. K. (1967). A note on asymptotically distribution-free confidence
bounds for P{X < Y}, based on two independent samples, Sankhyā,
Series A, 29, 95-102.

11. Ury, H. K. (1972). On distribution-free confidence bounds for Pr{Y < X},
Technometrics, 14, 577-581.
21
Goodness-of-Fit Tests for Accelerated Life Models

Vilijandas Bagdonavicius and Mikhail S. Nikulin


Vilnius University, Lithuania
Universite Bordeaux 2, Bordeaux, France

Abstract: A goodness-of-fit test for the generalized Sedyakin's model is
proposed when accelerated experiments are done under step-stresses. Various
alternatives are considered. The power of the test against approaching
alternatives is investigated.

Keywords and phrases: Accelerated life testing, accelerated failure time


(AFT), Cox model, Generalized Sedyakin's model (GS), goodness-of-fit, power
function, proportional hazards (PH), Sedyakin's model, step-stress

21.1 Introduction
In accelerated life testing (ALT) units are tested at higher-than-usual levels
of stress to induce early failures. The results are extrapolated to estimate
the lifetime distribution at the design stress using models which relate the
lifetime to the stress.
Many models for constant-over-time stresses are known. An important tool
for generalization of such models to the case of time-varying stresses is the physi-
cal principle in reliability formulated by Sedyakin (1966) for simple step-stresses
and generalized by Bagdonavicius (1978) for general time-varying stresses.
Some of the well-known accelerated life models [see, for example, Bagdon-
avicius et al. (2000)] for time-varying stresses as, for example, the accelerated
failure time (AFT) model, verify this principle, some do not. An example is the
case of the proportional hazards (PH) model when the failure time distribution
is not exponential under constant stresses. In this paper a goodness-of-fit test
is given for the generalized Sedyakin's (GS) model when the data are obtained
from accelerated experiments with step-stresses.


Suppose that stress is a deterministic time function

    x(·) : [0, ∞) → E,

where E is a set of possible stresses. If x(·) is constant in time, we shall
write x instead of x(·), and we denote by E1 = {x} a set of constant-in-time
stresses. Denote informally by T_x(·) the time-to-failure under stress x(·)
and by S_x(·)(t) the survival function.

The sense of accelerated life models is best seen if they are formulated in
terms of the hazard rate function

    α_x(·)(t) = lim_{h↓0} (1/h) P{T_x(·) ∈ (t, t + h] | T_x(·) > t}
              = −S'_x(·)(t) / S_x(·)(t).

Each specified accelerated life model relates the hazard rate (or some other
function) to stress in some particular way. Denote by
A_x(·)(t) = −ln{S_x(·)(t)} the accumulated hazard rate under stress x(·).

21.2 Generalized Sedyakin's Model

The idea of the physical principle in reliability is the following. For two
identical populations of units functioning under different stresses x1 and x2,
two moments t1 and t2 are equivalent if the probabilities of survival until
these moments are equal:

    P{T_x1 > t1} = S_x1(t1) = S_x2(t2) = P{T_x2 > t2}.

If after these equivalent moments the units of both groups are observed under
the same stress x2, i.e. the first population is observed under the step-stress

    x(τ) = x1, 0 ≤ τ < t1;  x2, τ ≥ t1,

and the second always under the constant stress x2, then for all s > 0

    α_x(·)(t1 + s) = α_x2(t2 + s).

Using the idea of Sedyakin, Bagdonavicius generalized the model to the case of
any time-varying stresses by supposing that the hazard rate α_x(·)(t) at any
moment t is a function of the value of the stress at this moment and of the
probability of survival until this moment. It is formalized by the following
definition.

Definition 21.2.1 The generalized Sedyakin's (GS) model holds on a set of
stresses E if there exists a function g, positive on E × R+, such that for all
x(·) ∈ E

    α_x(·)(t) = g(x(t), S_x(·)(t)).    (21.1)
Equivalently, the model can be written in the form
α_x(·)(t) = g1(x(t), A_x(·)(t)), where g1(x, s) = g(x, exp{−s}). On sets E1 of
constant stresses the equality (21.1) always holds, as is seen from the
following proposition.

Proposition 21.2.1 If the hazard rates α_x(t) > 0, t > 0, exist on a set of
constant stresses E1, then the GS model holds on E1.

PROOF. For all x ∈ E1 we have

    α_x(t) = α_x(A_x^{-1}(A_x(t))) = g1(x, A_x(t)),

with g1(x, s) = α_x(A_x^{-1}(s)).

Thus, the GS model does not give any relations between the hazard rates (or
survival functions) under different constant stresses. This model only shows
the influence of the variability of stress in time on survival and gives the
rule for constructing the hazard rate (or survival) function under any
time-varying stress from the hazard rate (or survival) functions under
different constant stresses. It is seen from the following proposition.

Proposition 21.2.2 If the GS model holds on a set E ⊃ E1 of stresses
x(·) : R+ → E1, then for all x(·) ∈ E

    α_x(·)(t) = α_{x_t}(A_{x_t}^{-1}(A_x(·)(t))),    (21.2)

where x_t is the constant stress equal to the value of the time-varying stress
x(·) at the moment t.

PROOF. If the GS model holds on a set E ⊃ E1 then for all x ∈ E1

    g1(x, s) = g1{x, A_x(A_x^{-1}(s))} = α_x(A_x^{-1}(s)).

Thus,

    α_x(·)(t) = g1(x(t), A_x(·)(t)) = α_{x_t}(A_{x_t}^{-1}(A_x(·)(t))).

Restrictions of the GS model, when not only the rule (21.2) but also some
relations between survival under different constant stresses are assumed, can
be considered. These models, narrower than GS, can be formulated by using
models for constant stresses and the rule (21.2).
Let us consider the meaning of the rule (21.2) for step-stresses of the form

    x(τ) = x_i,  if τ ∈ [t_{i−1}, t_i),  i = 1, ..., m,    (21.3)

where x1, ..., xm are constant stresses. If m = 2, the step-stress is called
simple. Sets of step-stresses of the form (21.3) will usually be denoted by Em.

Proposition 21.2.3 If the GS model holds on E2 then the survival function
under stress x(·) ∈ E2 verifies the equality

    S_x(·)(t) = S_x1(t) for 0 ≤ t < t1;  S_x(·)(t) = S_x2(t − t1 + t1*) for t ≥ t1;    (21.4)

the moment t1* is determined by the equality S_x1(t1) = S_x2(t1*).

PROOF. Set

    a = A_x1(t1) = A_x2(t1*).    (21.5)

The accumulated hazard rate verifies the integral equation (we denote by g the
function g1 in what follows)

    A_x(·)(t) = a + ∫_{t1}^{t} g(x2, A_x(·)(u)) du,  t ≥ t1.    (21.6)

The equalities (21.5) and (21.6) imply that for all t ≥ t1

    A_x(·)(t) = a + ∫_{t1}^{t} g(x2, A_x(·)(u)) du

and

    A_x2(t − t1 + t1*) = a + ∫_{t1}^{t} g(x2, A_x2(u − t1 + t1*)) du.

So for all t ≥ t1 the functions A_x(·)(t) and A_x2(t − t1 + t1*) satisfy the
integral equation

    h(t) = a + ∫_{t1}^{t} g(x2, h(u)) du

with the initial condition h(t1) = a. The solution of this equation is unique,
therefore we have

    A_x(·)(t) = A_x2(t − t1 + t1*),  t ≥ t1.

This implies the equality (21.4).

Let us consider a set Em of more general stepwise stresses of the form (21.3).
Set t0 = 0. Using Proposition 21.2.3 and recurrence, the following proposition
can be shown.

Proposition 21.2.4 If the GS model holds on Em then the survival function
S_x(·)(t) verifies the equalities

    S_x(·)(t) = S_xi(t − t_{i−1} + t*_{i−1}),  if t ∈ [t_{i−1}, t_i),  i = 1, 2, ..., m,    (21.7)

where the t*_i verify the equations

    S_xi(t_i − t_{i−1} + t*_{i−1}) = S_x{i+1}(t*_i),  i = 1, ..., m − 1.    (21.8)

In the literature on ALT [see Nelson (1990)] the model (21.7) is also called
the basic cumulative exposure model.

N. M. Sedyakin called his model the physical principle in reliability, meaning
that this model is very wide. Nevertheless, this model and its generalization
may not be appropriate in situations of periodic and quick changes of stress
level, or when switch-ons or switch-offs of the stress from one level to
another can cause failures or shorten the lifetime.
Let us consider an example which shows how the GS model can be used for
generalizing models for constant stresses to the case of time-varying
stresses. Suppose that under different constant stresses the survival
functions differ only in scale: for any x ∈ E1

    S_x(t) = G{r(x) t}.    (21.9)

Let us generalize the model (21.9) to the case of time-varying stresses by
supposing that the GS model also holds, i.e. the hazard rates under
time-varying stresses are obtained from the hazard rates under constant
stresses by the rule (21.2).

Proposition 21.2.5 The GS model with the survival functions (21.9) on E1
holds on E ⊃ E1 iff there exist a function r, positive on E, and a function q,
positive on [0, ∞), such that for all x(·) ∈ E

    α_x(·)(t) = r{x(t)} q{S_x(·)(t)}.    (21.10)

Let us find the expression of the survival function under time-varying
stresses for the model (21.10).

Proposition 21.2.6 The model (21.10) holds on a set of stresses E iff there
exists a survival function G such that for all x(·) ∈ E

    S_x(·)(t) = G{∫_0^t r(x(u)) du}.    (21.11)

The model (21.11) is the accelerated failure time model.

21.3 Alternatives to the GS Model

21.3.1 Proportional hazards model

In survival analysis the most used model describing the influence of
covariates on the lifetime distribution is the proportional hazards (PH) or
Cox model. In terms of stresses it is formulated as follows.

Suppose that under different constant stresses x ∈ E1 the hazard rates are
proportional to a baseline hazard rate:

    α_x(t) = r(x) α_0(t).    (21.12)

In the statistical literature the following formal generalization of the PH
model to time-varying stresses is used.

Definition 21.3.1 The proportional hazards (PH) model holds on a set of
stresses E if for all x(·) ∈ E

    α_x(·)(t) = r{x(t)} α_0(t).    (21.13)

The model (21.13) is not natural when items are aging under the usual constant
stress. Indeed, denote by x_t the constant-in-time stress equal to the value
of the time-varying stress x(·) at the moment t. Then the PH model implies
that

    α_x(·)(t) = α_{x_t}(t).

For any t, the intensity under the time-varying stress x(·) at the moment t
does not depend on the values of the stress x(·) before the moment t but only
on the value of the stress at this moment. This is not natural when the hazard
rates are not constant under constant stresses, i.e. when the times-to-failure
under constant stresses are not exponential. When is the PH model also the GS
model? The answer is given in the following proposition.

Proposition 21.3.1 [Bagdonavicius and Nikulin (2000)] Suppose that the PH
model holds on the set E including a set of constant stresses E1 and all
stresses of the form

    x_s(τ) = x1, 0 ≤ τ < s;  x2, τ ≥ s,

with s < δ, where δ is any positive number and x1, x2 ∈ E1 are fixed stresses.
The GS model also holds on E iff the time-to-failure is exponential for all
x ∈ E1.
The proposition implies that on sets of time-varying stresses the PH model is


an alternative to the GS model if the time-to-failure distribution under constant
stresses is not exponential.
As for the AFT model, the PH model for constant stresses can be generalized to
the case of time-varying stresses using the rule (21.2). The model obtained in
this way is different from the model (21.13) and is much more natural.

21.3.2 Model including the influence of switch-ups of stresses on reliability

Suppose that an item is observed under the step-stress (21.3) and that after
the switch-up at the moment t_i from the constant stress x_i to the constant
stress x_{i+1} the survival function has a jump:

    S_x(·)(t_i) = S_x(·)(t_i−) δ_i;

here δ_i is the probability for an item not to fail because of the switch-up
at the moment t_i. In this case the GS model for step-stresses can be modified
as follows:

(21.14)

where

(21.15)

21.4 Test Statistic for the GS Model


Suppose that a group of n_0 units is tested under the step-stress (21.3), and
m groups of n_1, ..., n_m items are tested under the constant stresses
x_1, ..., x_m (x_1 < ... < x_m), respectively. The units are observed over the
time t_m given for the experiment. We write x(·) < y(·) if
S_x(·)(t) > S_y(·)(t) for all t > 0. The idea of the goodness-of-fit test is
based on comparing two estimators Â^(1) and Â^(2) of the accumulated hazard
rate A_x(·). One estimator can be obtained from the experiment under the
step-stress (21.3) and another from the experiments under the stresses
x_1, ..., x_m by using the equalities (21.7) and (21.8).

Denote by N_i(t) and Y_i(t) the number of observed failures in the interval
[0, t] and the number of items at risk just prior to the moment t,
respectively, for the group of items tested under stress x_i, and by N(t),
Y(t) the analogous numbers for the group of items tested under stress x(·).

Set α_i = α_{x_i}, α = α_x(·), A_i = A_{x_i}, A = A_x(·), i = 1, ..., m. The
first estimator Â^(1) of the accumulated hazard A is the well-known
Nelson-Aalen
estimator obtained from the experiment under the step-stress (21.3):

    Â^(1)(t) = ∫_0^t dN(v)/Y(v).

The second is suggested by the GS model [formulae (21.7) and (21.8)] and is
obtained from the experiments under constant stresses:

    Â^(2)(v) = Â_{i+1}(v − t_i + t̂*_i),  v ∈ [t_i, t_{i+1}),  i = 0, ..., m − 1,

where

    t̂*_0 = 0,  t̂*_{i+1} = Â^{-1}_{i+2}(Â_{i+1}(t_{i+1} − t_i + t̂*_i)),  i = 0, ..., m − 2,

    Â^{-1}_i(s) = inf{u : Â_i(u) ≥ s},  Â_i(t) = ∫_0^t dN_i(v)/Y_i(v),  i = 1, ..., m.
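A small numerical sketch (ours; it assumes completely observed, distinct failure times within each constant-stress group, and uses linear interpolation of the step function as an approximation) of the Nelson-Aalen estimate, its generalized inverse, and the recursion for the t̂*_i:

```python
import numpy as np

def nelson_aalen(times):
    """Nelson-Aalen estimate: jump 1/Y(v) at each observed failure time
    (all failures observed, no censoring, for simplicity)."""
    t = np.sort(np.asarray(times, dtype=float))
    at_risk = len(t) - np.arange(len(t))
    return t, np.cumsum(1.0 / at_risk)

def ahat_inverse(t_grid, a_grid, s):
    """A^{-1}(s) = inf{u : A(u) >= s}."""
    idx = np.searchsorted(a_grid, s)
    return t_grid[min(idx, len(t_grid) - 1)]

def t_star(samples, switch_times):
    """Recursion t*_{i+1} = A^{-1}_{i+2}(A_{i+1}(t_{i+1} - t_i + t*_i)).
    samples[i]: failure times observed under the constant stress x_{i+1};
    switch_times: [t_0 = 0, t_1, ..., t_{m-1}]."""
    grids = [nelson_aalen(s) for s in samples]
    ts = [0.0]
    for i in range(len(switch_times) - 1):
        tg, ag = grids[i]                       # \hat A_{i+1}
        a_val = np.interp(switch_times[i + 1] - switch_times[i] + ts[-1], tg, ag)
        ts.append(ahat_inverse(*grids[i + 1], a_val))
    return ts
```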

The test is based on the statistic

    T_n = ∫_0^{t_m} K(v) d{Â^(1)(v) − Â^(2)(v)},    (21.16)

where K is the weight function. We shall consider weight functions of the
following type: for v ∈ [t_i, t_i + Δt_i),

where g is a nonnegative bounded continuous function with bounded variation on
[0, 1] and n = Σ_{i=0}^{m} n_i. The condition x_1 < ... < x_m implies that

    P{T_n is defined} → 1 as n_i → ∞.

21.5 Asymptotic Distribution of the Test Statistic


Let us consider first the asymptotic distribution of the estimators t̂*_i.
Denote by →_D convergence in distribution.

Assumptions A:

(a) the hazard rates α_i are positive and continuous on (0, ∞);

(b) A_i(t) < ∞ for all t > 0;

(c) n → ∞, n_i/n → l_i, l_i ∈ (0, 1).



Lemma 21.5.1 Suppose that Assumptions A hold. Then

    √n(t̂*_j − t*_j) →_D a_j Σ_{l=1}^{j} d_jl {U_l(t*_{l−1} + Δt_{l−1}) − U_{l+1}(t*_l)},    (21.17)

where

    d_jl = Π_{s=l}^{j−1} c_s,  l = 1, ..., j − 1;  d_jj = 1,

U_1, ..., U_m and U are independent Gaussian martingales with
U_i(0) = U(0) = 0 and

    Cov(U_i(s1), U_i(s2)) = (1/l_i) (1 − S_i(s1 ∧ s2))/S_i(s1 ∧ s2) := σ_i²(s1 ∧ s2),

    Cov(U(s1), U(s2)) = (1/l_0) (1 − S(s1 ∧ s2))/S(s1 ∧ s2) := σ²(s1 ∧ s2),

with S_i = exp{−A_i}, S = exp{−A}.

PROOF. Under Assumptions A, for any t ∈ (0, t_m) the estimators Â_i and Â^(1)
are uniformly consistent on [0, t], and

    (21.18)

on D[0, t], the space of cadlag functions on [0, t] with the Skorokhod metric.

We prove (21.17) by recurrence. If i = 1 then

    √n(t̂*_1 − t*_1) = √n(Â_2^{-1}(Â_1(t_1)) − A_2^{-1}(Â_1(t_1)))
                      + √n(A_2^{-1}(Â_1(t_1)) − A_2^{-1}(A_1(t_1))).    (21.19)

For any 0 < s1 < s2 < ∞ [see Andersen et al. (1993)]

    (21.20)

where

    U_2*(s) = e^{−s} U_2(A_2^{-1}(s)) / p_2(A_2^{-1}(s))

and p_i is the density of T_{x_i}. Consistency of the estimator Â_1(t_1) and
the convergence (21.20) imply that

    (21.21)

Using the delta method and the convergence (21.18), we obtain

    √n{A_2^{-1}(Â_1(t_1)) − A_2^{-1}(A_1(t_1))} →_D (1/α_2(t*_1)) U_1(t_1)
        = a_1 U_1(t*_0 + Δt_0).    (21.22)

Thus (21.19), (21.21) and (21.22) imply that

    √n(t̂*_1 − t*_1) →_D a_1 d_11 {U_1(t*_0 + Δt_0) − U_2(t*_1)}.

Suppose that (21.17) holds for i = j. Then, similarly as in the case i = 1, we
have

    √n(t̂*_{j+1} − t*_{j+1})
      = √n{Â_{j+2}^{-1}(Â_{j+1}(t̂*_j + Δt_j)) − A_{j+2}^{-1}(A_{j+1}(t*_j + Δt_j))}
      = a_{j+1} {U_{j+1}(t*_j + Δt_j) − U_{j+2}(t*_{j+1})}
        + a_{j+1} (c_j/a_j) √n(t̂*_j − t*_j) + Δ_n,

where Δ_n →_P 0 as n → ∞. The last formula and the recurrence assumption imply
that

    √n(t̂*_{j+1} − t*_{j+1}) →_D
      a_{j+1} { U_{j+1}(t*_j + Δt_j) − U_{j+2}(t*_{j+1})
        + c_j Σ_{l=1}^{j} d_jl {U_l(t*_{l−1} + Δt_{l−1}) − U_{l+1}(t*_l)} }
      = a_{j+1} Σ_{l=1}^{j+1} d_{j+1,l} {U_l(t*_{l−1} + Δt_{l−1}) − U_{l+1}(t*_l)}.

Let us consider the limit distribution of the statistic T_n. Note that

    K(v)/√n →_P k(v) = [l_0 l_{i+1}/(l_0 + l_{i+1})] S(v) g((l_0 + l_{i+1}) S(v)),
        v ∈ [t_i, t_{i+1}].

Set
Proposition 21.5.1 Under Assumptions A,

    (21.23)

PROOF. The statistic (21.16) can be written in the form

    (21.24)

where o_p(1) →_P 0 as n → ∞. The lemma implies that

    Σ_{i=1}^{m−1} {k(t_i) α_{i+1}(t_i) − k(t_i + Δt_i) α_{i+1}(t_i + Δt_i)} ···
      = Σ_{i=0}^{m−2} f_{i+1} U_{i+1}(t_i + Δt_i) − Σ_{i=1}^{m−1} f_i U_{i+1}(t_i) + o_p(1).    (21.25)

The formulae (21.24) and (21.25) imply the result of the proposition.
Corollary 21.5.1 Under the assumptions of the theorem, T_n →_D N(0, σ_T²),
where

    (21.26)

Remark 21.5.1 The variance σ_T² can be consistently estimated by the statistic

where

    k̂(v) = K(v)/√n,  σ̂²(v) = (n/n_0)(1/Ŝ(v) − 1),

Ŝ and the Ŝ_i are the empirical survival functions,

    f̂_0 = f̂_m = 0,
    f̂_i = Σ_{s=i}^{m−1} ê_s d̂_si,  i = 1, ..., m − 1,
    d̂_si = Π_{l=i}^{s−1} ê_l,  i = 1, ..., s − 1,  d̂_ss = 1,
    ê_l = α̂_{l+1}(t̂*_l + Δt_l) / α̂_{l+1}(t̂*_l),

and α̂_{s+1}(t̂*_s) and α̂_{s+1}(t̂*_s + Δt_s) are kernel estimators; here Ker
is some kernel function.

21.6 The Test

The hypothesis

    H_0 : the GS model holds on E = {x_1, ..., x_m, x(·)}

is rejected with approximate significance level α if

    (T_n/σ̂_T)² > χ²_{1−α}(1),

where χ²_{1−α}(1) is the (1 − α)-quantile of the chi-square distribution with
one degree of freedom.
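In code the decision rule is one line; a sketch (ours), assuming T_n and the variance estimate σ̂_T from Remark 21.5.1 have already been computed:

```python
from scipy.stats import chi2

def gs_test(t_n, sigma_t, alpha=0.05):
    """Reject H0 (the GS model) when (T_n / sigma_T)^2 exceeds the
    (1 - alpha)-quantile of the chi-square distribution with 1 df."""
    return (t_n / sigma_t) ** 2 > chi2.ppf(1.0 - alpha, df=1)
```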

21.7 Consistency and the Power of the Test Against Approaching Alternatives

Let us find the power of the test against the following alternatives:

H* : the PH model with specified non-exponential time-to-failure distributions
under constant stresses.

Under H*:

    Â^(1)(v) →_P A*^(1)(v) = A_{i+1}(v) − A_{i+1}(t_i)
        + 1{i > 0} Σ_{l=1}^{i} {A_l(t_l) − A_l(t_{l−1})},

    Â^(2)(v) →_P A^(2)(v) = A_{i+1}(v − t_i + t*_i),  v ∈ [t_i, t_{i+1}),  i = 0, ..., m − 1,

    (1/√n) K(v) →_P k*(v),  v ∈ [t_i, t_{i+1}),

where

    k*(v) = [l_0 l_{i+1} S*^(1)(v) S_{i+1}(v − t_i + t*_i)
             / (l_0 S*^(1)(v) + l_{i+1} S_{i+1}(v − t_i + t*_i))]
            × g(l_0 S*^(1)(v) + l_{i+1} S_{i+1}(v − t_i + t*_i)),    (21.27)

and S*^(1)(v) = exp{−A*^(1)(v)}. The convergence is uniform on [0, t_m].


Proposition 21.7.1 Assume that Assumptions A hold and

    Δ* = ∫_0^{t_m} k*(v) d{A*^(1)(v) − A^(2)(v)} ≠ 0.

Then the test is consistent against H*.

PROOF. Write the test statistic in the form

    T_n = ∫_0^∞ K(v) d{Â^(1)(v) − A*^(1)(v)} − ∫_0^∞ K(v) d{Â^(2)(v) − A^(2)(v)}
          + ∫_0^∞ K(v) d{A*^(1)(v) − A^(2)(v)}
        = T_{1n} + T_{2n} + T_{3n}.    (21.28)

Under H*,

    T_{1n} + T_{2n} →_D N(0, σ*_T²),

where σ*_T² has the same form (21.26) with the only difference that k is
replaced by k* and σ²(t) is replaced by

    (σ^(1))²(t) = (1/l_0)(1/S^(1)(t) − 1).

Under H* we have

    (21.29)

and

    (T_{1n} + T_{2n})/σ̂_T →_D N(0, 1).    (21.30)

The third member in (21.28) can be written in the form

    T_{3n} = Σ_{i=1}^{m−1} ∫_{t_i}^{t_{i+1}} K(v) {α_{i+1}(v) − α_{i+1}(v − t_i + t*_i)} dv.    (21.31)

The assumptions of the proposition and the equalities (21.28)-(21.31) imply
that under H*

    T_n/σ̂_T →_P ∞.

Thus, under H*, the test is consistent.

Remark 21.7.1 If the α_i are increasing (decreasing) then the test is
consistent against H*.

PROOF. The inequalities x_1 < ... < x_m imply that t_i > t*_i for all i. If
the α_i are increasing (decreasing) then Δ* > 0 (Δ* < 0) under H*. Proposition
21.7.1 implies the consistency of the test.

Let us consider the sequence of approaching alternatives

    H*_n : the PH model with hazard rates α_i(t) deviating from the null at
    the rate ε/√n,

with fixed ε > 0 (i = 1, ..., m). Then

    T_{3n} →_P μ = −ε Σ_{i=1}^{m−1} ∫_{t_i}^{t_{i+1}} k*(v) ln(···) dv > 0,

and

    T_n/σ̂_T →_D N(a, 1),  (T_n/σ̂_T)² →_D χ²(1, a),

where a = μ/σ*_T and χ²(1, a) denotes the chi-square distribution with one
degree of freedom and non-centrality parameter a (or a random variable having
such a distribution).

The power function of the test is approximated by the function

    β = lim_{n→∞} P{(T_n/σ̂_T)² > χ²_{1−α}(1) | H*_n}
      = P{χ²(1, a) > χ²_{1−α}(1)}.    (21.32)

Let us find the power of the test against the following alternatives:

H** : the model (21.14) with specified time-to-failure distributions under
constant stresses.

Under H**,

    Â^(1)(v) →_P A**^(1)(v) = A_{i+1}(v − t_i + t**_i),  v ∈ [t_i, t_{i+1}),  i = 0, ..., m − 1,

    Â^(2)(v) →_P A^(2)(v) = A_{i+1}(v − t_i + t*_i),  v ∈ [t_i, t_{i+1}),  i = 0, ..., m − 1,

    (1/√n) K(v) →_P k**(v),

where k**(v) has the same form as k*(v) with the only difference that S*^(1)
is replaced by S**^(1)(v) = exp{−A**^(1)(v)}. The convergence is uniform on
[0, t_m].

Proposition 21.7.2 Assume that Assumptions A hold and

    Δ** = ∫_0^∞ k**(v) d{A**^(1)(v) − A^(2)(v)} ≠ 0.

Then the test is consistent against H**.

Remark 21.7.2 If the α_i are increasing (decreasing) then the test is
consistent against H**.

Let us consider the sequence of approaching alternatives

    H**_n : the model (21.14) with specified time-to-failure distributions
    under constant stresses and δ_i = 1 − ε/√n.

Similarly as in the case of the alternatives H*_n it can be shown that

    T_n/σ̂_T →_D N(a, 1),  (T_n/σ̂_T)² →_D χ²(1, |a|),

where a = μ/σ**_T. The parameter μ is positive (negative) if the functions α_i
are convex (concave).

The power function of the test is approximated by the function (21.32) with
a = μ/σ**_T.

Acknowledgement. This research was supported by the Conseil Régional
d'Aquitaine, Grant 20000204009.

References
1. Andersen, P. K., Borgan, 0., Gill, R. D., and Keiding, N. (1993). Statis-
tical Models Based on Counting Processes, New York: Springer-Verlag.

2. BagdonaviCius, V. (1978). Testing the hypothesis of the additive accumu-


lation of damages, Probability Theory and its Applications, 23, 403-408.

3. Bagdonavicius, V. and Nikulin, M. (2000). Semiparametric estimation
in accelerated life testing, In Recent Advances in Reliability Theory:
Methodology, Practice and Inference (Eds., N. Limnios and M. Nikulin),
pp. 405-418, Boston: Birkhäuser.

4. Bagdonavicius, V., Gerville-Réache, L., Nikoulina, V., and Nikulin, M.
(2000). Expériences accélérées: analyse statistique du modèle standard
de vie accélérée, Revue de Statistique Appliquée, XLVIII, 5-38.

5. Nelson, W. (1990). Accelerated Testing: Statistical Models, Test Plans,


and Data Analyses, New York: John Wiley & Sons.

6. Sedyakin, N. M. (1966). On one physical principle in reliability theory,


Technical Cybernetics, 3, 80-87.
PART VI
GRAPHICAL METHODS AND
GENERAL GOODNESS-OF-FIT TESTS
22
Two Nonstandard Examples of the Classical
Stratification Approach to Graphically Assessing
Proportionality of Hazards

Niels Keiding
University of Copenhagen, Copenhagen, Denmark

Abstract: Goodness-of-fit assessment of the crucial proportionality and log-


linearity assumptions of the Cox (1972a,b) proportional hazards regression mod-
els for survival data and repeated events has necessitated several new develop-
ments. This contribution presents two concrete examples of nonstandard ap-
plication of these ideas: in discrete-time regression for the retro-hazard of the
reporting delay time in a multiple sclerosis registry, and in analysing repeated
insurance claims in a fixed time window.

Keywords and phrases: Complementary log-log model, Cox regression


model, goodness-of-fit, multiple sclerosis, non-life insurance, nonparametric
maximum likelihood, renewal process, retro-hazard, semi-Markov process

22.1 Introduction
Survival analysis has contributed a number of specialised approaches to the
general methodology of goodness-of-fit; for recent textbook surveys see e.g.
Andersen et al. (1993, Section VII.3), Hill et al. (1996, Chapter 7), Klein and
Moeschberger (1997, Chapter 11), Hosmer and Lemeshow (1999, Chapter 6) or
Therneau and Grambsch (2000, Chapter 6).
Beyond asserting the role of the specific approaches to goodness-of-fit in
survival analysis and more general event history analysis, the purpose of this
presentation is to report on two recent examples from my own experience, where
the classical stratification approach to graphically assessing proportionality of
hazards was used in nonstandard contexts.


22.2 Some Approaches to Testing Proportionality of Hazards

Cox (1972a) proposed the regression model

    λ(t|z) = λ_0(t) e^{βz}

for the hazard λ(t|z) at time t for given covariates z. Here λ_0(t) is a
freely varying "underlying" hazard which, together with the log-linear
dependence on covariates, makes the model semi-parametric.

The proportionality assumption is restrictive, but nevertheless often taken
for granted. The most common specific graphical checks and numerical tests
focus attention on one covariate component at a time, i.e. ask whether
proportionality holds "with respect to w" in the model

    λ(t|z, w) = λ_0(t) e^{βz + γw}.

Cox (1972a) pointed out that his model and the estimation techniques allowed
time-dependent covariates z(t), and he was there primarily motivated by
wishing to test numerically for proportional hazards. As an example in our
situation, this could be done by defining an additional covariate
w_1(t) = w · log t and testing the hypothesis δ = 0 in the extended model

    λ(t|z, w) = λ_0(t) e^{βz + γw + δw log t}.

In this presentation we link to a different tradition, most obvious for
discrete covariates w. Assume for simplicity that w is dichotomous, assuming
values 0 or 1. Then define the model stratified on w,

    λ(t|z, w) = λ_w(t) e^{βz},

and check (usually graphically) the proportionality of λ_0(t) and λ_1(t).
Note that β does not depend on w. Under proportionality λ_1(t) = e^γ λ_0(t),
i.e. Λ_1(t) = e^γ Λ_0(t) and log Λ_1(t) = γ + log Λ_0(t), with
Λ_w(t) = ∫_0^t λ_w(u) du.

Thus, the curves (t, log Λ_w(t)), w = 0, 1, are parallel, and
(Λ_0(t), Λ_1(t)) is the line through (0, 0) with slope e^γ. The two most
common plots are therefore (t, log Λ_w(t)) and (Λ_0(t), Λ_1(t)); see Andersen
et al. (1993, Section VII.3).
We shall here give two non-standard examples of this basic stratification
approach to testing proportional hazards.
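As a small illustration of the first plot, here is a sketch (ours, with a hypothetical dict `strata` mapping each stratum w to its duration and event-indicator arrays); roughly parallel step curves support proportional hazards:

```python
import numpy as np
import matplotlib.pyplot as plt

def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard: increment d_i / Y(t_i) at each time."""
    order = np.argsort(times)
    t, d = np.asarray(times)[order], np.asarray(events)[order]
    at_risk = len(t) - np.arange(len(t))
    return t, np.cumsum(d / at_risk)

for w, (times, events) in strata.items():   # strata: hypothetical data
    t, cumhaz = nelson_aalen(times, events)
    keep = cumhaz > 0
    plt.step(t[keep], np.log(cumhaz[keep]), where="post", label=f"w = {w}")
plt.xlabel("t"); plt.ylabel("log cumulative hazard"); plt.legend(); plt.show()
```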
22.3 "Proportionality" in Discrete-Time Regression


for Retro-Hazard
Esbjerg, Keiding, and Koch-Henriksen (1999) studied models for reporting de-
lay to the Danish Multiple Sclerosis Registry. As is well known particularly
from the large literature on AIDS epidemiology, reporting delays are by nature
right-truncated, and as discussed in detail by Kalbfleisch and Lawless (1989,
1991), Keiding and Gill (1990), Gross and Huber-Carol (1992) and Keiding
(1992), the retro-hazard is then the most convenient parameterization, as seen
by the following argument.
Esbjerg, Keiding, and Koch-Henriksen (1999) worked in discrete time, with
X denoting the calendar time of onset and T the time between onset of multiple
sclerosis (MS) and diagnosis. T will be called the reporting delay. In the analysis
we consider X and T independent discrete random variables. They are only
observed if X +T :::; x*, where x* is some fixed calendar time when the collection
of data stops. Now condition on X +T:::; x* and X = x, and define T = x* -x.
Because of the independence of X and T

fT(tIT, x) P(T = tlX = x, X + T :::; x*)


f(t)/ F(T)
fT(tIT) 0:::; t :::; T
which is independent of x. It follows that the retro-hazard or the reverse time
hazard defined by

p(t) P(T = tiT:::; t) = f(t)/F(t)


satisfies
fT(tIT) .
p(t)
FT(tIT) ,
that is, the retro-hazard is the same in the marginal distribution of T and the
distribution right-truncated by T.
Esbjerg, Keiding, and Koch-Henriksen (1999) assumed the so-called c log log
link model for the distribution of the reporting delay T for a patient with
covariate vector z given by the retro-hazard

P(T = ulT:::; u) = p(ulz) = 1- exp(-expbu + z'j3)) .


The two crucial assumptions in this model for the retro-hazard are the assump-
tion of proportionality of the logarithm of one minus the hazard function and
linearity in'the covariates of the link-function-transformed distribution func-
tion. In the following the two assumptions will be denoted the assumptions of
proportionality and linearity.

Under the assumed model the ratio between the logarithms of one minus the
hazards for persons i and j is

    log(1 − ρ(u|z_i)) / log(1 − ρ(u|z_j)) = exp((z_i − z_j)'β),

which is independent of time u. It is then clear that the ratio

    log(F(u|z_i)) / log(F(u|z_j))

is also independent of time. This property resembles the proportional hazards
assumption in the Cox model. One way to check the assumption for a categorical
time-independent covariate z^m with k levels is as follows. Represent z^m by
the vector v^m = (v^{m1}, ..., v^{mk}), where v^{ml} = I(z^m = l),
l = 1, ..., k. Let z = (z^1, ..., z^{m−1}, v^m) = (z^0, v^m), and consider the
model where all other covariates are included and stratify by the levels of
z^m. The baseline distribution functions from the different strata are
F_{0m1}, ..., F_{0mk}, and the distribution function in stratum j for person i
is given as

    F(u|z_i) = (F_{0mj}(u))^{exp(z_i^0 'β^0)},

while in the original model it is given as

    F(u|z_i) = (F_0(u))^{exp(z_i^0 'β^0 + β_{mj})},  j = 1, ..., k,

with β_{mk} ≡ 0. Plotting log(−log(F_{0m1})), ..., log(−log(F_{0mk})) against
time u, the distance between the curves log(−log(F_{0mj})) and
log(−log(F_{0mk})) should be constant and approximately equal to β_{mj}.

Using these ideas, Esbjerg, Keiding, and Koch-Henriksen (1999) showed that the
cloglog-link model was not satisfied for the covariate age at onset, so that
young and old patients had to be analysed separately.
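A sketch of the corresponding graphical check (ours; it assumes estimated baseline distribution functions F̂_0ml per stratum, already evaluated on a common grid of delay times u):

```python
import numpy as np
import matplotlib.pyplot as plt

def cloglog_check(u, F0_by_stratum):
    """Plot log(-log F_0ml(u)) for each stratum l; under the c log log link
    model the curves are parallel, with vertical distances ~ beta_ml."""
    for label, F in F0_by_stratum.items():
        F = np.asarray(F, dtype=float)
        ok = (F > 0) & (F < 1)                 # log(-log F) defined on (0, 1)
        plt.plot(np.asarray(u)[ok], np.log(-np.log(F[ok])), label=str(label))
    plt.xlabel("reporting delay u")
    plt.ylabel("log(-log F)")
    plt.legend()
    plt.show()
```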

22.4 The Renewal Assumption in Modulated Renewal Processes
Keiding, Andersen, and Fledelius (1998) studied insurance claims data
collected in the time window from 1 January 1988 to 31 December 1991.

Observing a simple renewal process in an observation window [t1, t2] involves
four different elementary observations (a likelihood sketch combining the four
contributions is given after the list):

1. Times X_i from one renewal to the next, contributing the density f(X_i) to
the likelihood.

2. Times from one renewal to t2, right-censored observations of F,
contributing factors of the form 1 − F(t2 − T_j) to the likelihood.

3. Times from t1 to the first renewal (forward recurrence times),
contributing, in the stationary case, factors of the form
(1 − F(T_j − t1))/μ to the likelihood.

4. Knowledge that no renewal happened in [t1, t2], being right-censored
observations of the forward recurrence time, contributing in the stationary
case a factor

    (1/μ) ∫_{t2 − t1}^{∞} (1 − F(u)) du.
In the stationary case the resulting nonparametric maximum likelihood
estimation problem can be solved by an EM-type algorithm [Vardi (1982) and
Soon and Woodroofe (1996)]. Keiding, Andersen, and Fledelius (1998) wanted to
generalize the estimation method from iid variables to the modulated renewal
process proposed by Cox (1972b), and also preferred to avoid the stationarity
condition required for inclusion of the (uncensored and censored) forward
recurrence times of types 3 and 4.

This is possible by restricting attention to (uncensored and censored) times
since a renewal, that is, observations of types 1 and 2. As discussed
repeatedly by Gill (1980, 1983), see also Aalen and Husebye (1991) and
Andersen et al. (1993, Example X.1.8), the likelihood based on observations of
types 1 and 2 is identical to one based on independent uncensored and censored
life times from the renewal distribution F. Therefore the standard estimators
(Kaplan-Meier, Nelson-Aalen) from survival analysis are applicable, and their
usual large sample properties may be shown (albeit with new proofs) to hold.
The above analysis is sensitive to departures from the assumption of homo-
geneity between the iid replications of the renewal process. Restricting atten-
tion to time since first renewal will be biased (in the direction of short renewal
times) if there is unaccounted heterogeneity, as will the re-use of second, third,
... renewals within the time window. As always, incorporation of observed co-
variates may reduce the unaccounted heterogeneity, but the question is whether
this will suffice.
These ideas were implemented on the above mentioned data on insurance
claims over a four-year interval. For property claims Figure 22.1 shows the
Kaplan-Meier estimate (with pointwise 95% confidence intervals) based on ob-
servations of types 1 and 2, (that is, durations after a claim), while the curve
marked "RT-algorithm" is the nonparametric maximum likelihood estimate
based on all four types of observations. As envisioned, the durations after a
claim are shorter than they should be under the stationary renewal hypothesis,
indicating heterogeneity between the insurance takers.

Figure 22.1: The probability of remaining property-claim free calculated by


the Kaplan-Meier estimate based on durations since an observed claim (with
pointwise 95% confidence limits) and by the nonparametric maximum likelihood
estimate based on all observations in the assumed stationary renewal process. It
is seen that the durations after an observed claim are generally shorter. [From
Andersen and Fledelius (1996)].

Keiding, Andersen, and Fledelius (1998) went on to develop a Cox regression
model for a modulated renewal process that could account for some of the
heterogeneity. Here the occurrence of claims of type h for policy holder i at
duration t since the last claim of that type is governed by a Cox regression
model with intensity

    λ_hi(t) = α_0h(t) exp[β_h' Z_hi(t)] Y_hi(t),

where α_0h(t) is a freely varying so-called underlying intensity function
common to all policy holders i but specific to insurance type h. The indicator
Y_hi(t) is 1 if policy holder i is at risk of making a claim of type h at time
t, and 0 otherwise. The covariate process Z_hi(t) includes fixed exogenous as
well as time-dependent endogenous covariates. Finally, the vector β_h contains
the regression coefficients on the covariates Z_hi(t). For this model
Dabrowska (1995) proved asymptotic results for the "usual" profile likelihood
based inference, under the crucial assumption that the covariates Z_hi(t)
depend on time only through (the backwards recurrence time) t.

The claim occurrences are viewed through a fixed time window, but under the
model valid inference may be based on the likelihood composed of the product
of contributions from the distribution of the time from first to second claim,
second to third claim, and so on, the last being right-censored. The expected
deviation from the model is that the time from claim j = 1 is longer than the
times from claims j = 2, 3, .... Keiding, Andersen, and Fledelius (1998)
therefore extended the model to the Cox regression model

    λ_hij(t) = α_0hj(t) exp[β_hj' Z_hi(t)] Y_hij(t).

In practice the regression coefficients β_hj and the underlying intensities
α_0hj(t) after claim j are assumed identical for j = 2, 3, .... A good
evaluation of the fit of the Cox model can be based on first assessing
identity of the regression coefficients (β_h1 = β_h2) and then, refitting in a
so-called stratified Cox regression model with identical β_hj but freely
varying α_0hj(t) over j, comparing the underlying intensities
(α_0h1(t) = α_0h2(t)) after the first and after later claims. For the first
hypothesis a standard log partial likelihood ratio test may be performed; for
the second, Keiding, Andersen, and Fledelius (1998) documented a series of
graphical checks as surveyed by Andersen et al. (1993, Section VII.3). Further
development of this goodness-of-fit approach might follow the lines of
Andersen et al. (1983).

Based on generalizations of the standard graphs mentioned above, Keiding,
Andersen, and Fledelius (1998) concluded that property and auto claims could
not be described by the postulated modulated renewal processes, while the
model for household claims was not rejected by this approach.

References
1. Aalen, O. O. and Husebye, E. (1991). Statistical analysis of repeated
events forming renewal processes, Statistics in Medicine, 10, 1227-1240.

2. Andersen, C. and Fledelius, P. (1996). Individuel tarifering - En anven-


delse af Cox' regressionsmodel i skadesforsikring, M.Sc. thesis in actuarial
mathematics, University of Copenhagen.

3. Andersen, P. K., Borgan, 0., Gill, R D., and Keiding, N. (1993). Statis-
tical Models Based on Counting Processes, New York: Springer Verlag.

4. Andersen, P. K., Christensen, E., Fauerholdt, L., and Schlichting, P.


(1983). Evaluating prognoses based on the proportional hazards model,
Scandinavian Journal of Statistics, 10, 141-144.

5. Cox, D. R (1972a). Regression models and life tables (with discussion),


Journal of the Royal Statistical Society, B, 34, 187-220.

6. Cox, D. R (1972b). The statistical analysis of dependencies in point


processes, In Stochastic Point Processes (Ed., P.A.W. Lewis), pp. 55-66,
New York: John Wiley.

7. Dabrowska, D. M. (1995). Estimation of transition probabilities and boot-


strap in a semiparametric Markov renewal model, N onparametric Statis-
tics, 5, 237-259.

8. Esbjerg, S., Keiding, N., and Koch-Henriksen, N. (1999). Reporting delay


and corrected incidence of multiple sclerosis, Statistics in Medicine, 18,
1691-1706.

9. Gill, R. D. (1980). Nonparametric estimation based on censored
observations of a Markov renewal process, Zeitschrift für
Wahrscheinlichkeitstheorie und verwandte Gebiete, 53, 97-116.

10. Gill, R D. (1983). Discussion of the papers by Helland and Kurtz, Bul-
letin of the International Statistical Institute, 50, 239-243.

11. Gross, S. T. and Huber-Carol, C. (1992). Regression models for truncated
survival data, Scandinavian Journal of Statistics, 19, 193-213.

12. Hill, C., Com-Nougué, C., Kramar, A., Moreau, T., O'Quigley, J.,
Senoussi, R., and Chastang, C. (1996). Analyse statistique des données de
survie, 2e éd., Paris: Flammarion.

13. Hosmer, D. W. and Lemeshow, S. (1999). Applied Survival Analysis.


Regression Modeling of Time to Event Data, New York: John Wiley &
Sons.
14. Kalbfleisch, J. D. and Lawless, J. F. (1989). Inference based on retrospec-
tive ascertainment: an analysis of the data on transfusion-related AIDS,
Journal of the American Statistical Association, 84, 360-372.

15. Kalbfleisch, J. D. and Lawless, J. F. (1991). Regression models for right


truncated data with applications to AIDS incubation times and reporting
lags, Statistica Sinica, 1, 19-32.
16. Keiding, N. (1992). Independent delayed entry, In Survival Analysis:
State of the Art (Eds., J. P. Klein and P. K. Goel), pp. 309-326, Dordrecht:
Kluwer.

17. Keiding, N., Andersen, C., and Fledelius, P. (1998). The Cox regression
model for claims data in non-life insurance, ASTIN Bulletin, 28, 95-118.
18. Keiding, N. and Gill, R. D. (1990). Random truncation models and
Markov processes, The Annals of Statistics, 18, 582-602.

19. Klein, J. P. and Moeschberger, M. L. (1997). Survival Analysis. Tech-


niques for Censored and Truncated Data, New York: Springer Verlag.

20. Soon, G. and Woodroofe, M. (1996). Nonparametric estimation and con-


sistency for renewal processes, Journal of Statistical Planning and Infer-
ence, 53, 171-195.
21. Therneau, T. M. and Grambsch, P. M. (2000). Modeling Survival Data.
Extending the Cox Model, New York: Springer.
22. Vardi, Y. (1982). Nonparametric estimation in renewal processes, Annals
of Statistics, 10, 772-785.
23
Association in Contingency Tables,
Correspondence Analysis, and
(Modified) Andrews Plots

Ravindra Khattree and Dayanand N. Naik

Oakland University, Rochester, Michigan
Old Dominion University, Norfolk, Virginia

Abstract: Andrews plots [Andrews (1972)], as a tool to graphically interpret
multivariate data, have recently gained considerable recognition. We present
the use of Andrews plots and modified Andrews plots, recently introduced by
the authors [Khattree and Naik (2001)], in graphically exhibiting the
association in cross-classified data and in the context of correspondence
analysis. A new alternative to traditional correspondence analysis introduced
by C. R. Rao (1995), and the implementation of our approach in this case, are
also discussed.

Keywords and phrases: Andrews plots, correspondence analysis, Hellinger


distance, modified Andrews plots

23.1 Introduction
Andrews plots [Andrews (1972)] as a tool for the graphical representation of
multivariate data have recently gained considerable popularity due to the sim-
plicity of their plotting and many desirable and attractive mathematical prop-
erties. Khattree and Naik (2001) provide a review of Andrews and other related
plots along with their merits and demerits.
For a p-dimensional multivariate observation y = (y1, ..., yp)', the Andrews
curve in argument t is defined by

    f_y(t) = y1/√2 + y2 sin(t) + y3 cos(t) + y4 sin(2t) + y5 cos(2t) + ···,
        −π ≤ t ≤ π.    (23.1)

Clearly the p-dimensional observation is then represented by a curve and


hence can be plotted in two-dimensions, without any loss of information, which
usually occurs in other graphical representation such as the scatter plots or the
dimensionality reduction based techniques such as Bi-plots.
A plot containing Andrews curves corresponding to all the multivariate ob-
servations is called an Andrews plot. Khattree and Naik (2001) recently sug-
gested a modified Andrews plot as a plot consisting of curves

gy(t) = ~{Yl +Y2[sin(t) +cos(t)]


+Y3[sin(t) - cos(t)] + Y4[sin(2t) + cos(2t)]

+Y5[sin(2t) - cos(2t)] + ... },

-Jr :S t :S Jr, (23.2)

which has all the properties of the Andrews plots yet avoids some of the short-
comings of Andrews plots. Specifically, the odd numbered terms in the Andrews
plot given in (23.1) simultaneously vanish at t = O. This is a major disadvan-
tage since human eyes tend to concentrate on and notice changes or similarities
more easily and more immediately in the central regions of the graph (that is,
around t = 0); yet, for Andrews plot around t = 0, these similarities or dissim-
ilarities are mostly due to even numbered variables Y2, Y4,' ... This is not so,
for the modified Andrews plots given by (23.2). It can be seen from the fact
that ~[sin(kt)±cos(kt)] = sin (kt ± ~J = sin (k (t ± ,Ik)), which has the phase
angle ±4'k' Thus, none of the terms in (23.2) has the common phase angle and
hence will not vanish at the same point. Therefore, for any t, the value of fy(t)
depends on at least (p - 1), Yi values.
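The two curve families are easy to code; a minimal sketch (ours) following (23.1) and (23.2):

```python
import numpy as np

def andrews(y, t):
    """Andrews curve (23.1): y1/sqrt(2) + y2 sin t + y3 cos t + y4 sin 2t + ..."""
    f = y[0] / np.sqrt(2.0)
    for k in range(1, len(y)):
        h = (k + 1) // 2                       # harmonic of the k-th term
        f = f + y[k] * (np.sin(h * t) if k % 2 == 1 else np.cos(h * t))
    return f

def modified_andrews(y, t):
    """Modified Andrews curve (23.2); no two terms share a phase angle."""
    g = np.full_like(np.asarray(t, dtype=float), y[0])
    for k in range(1, len(y)):
        h = (k + 1) // 2
        sign = 1.0 if k % 2 == 1 else -1.0     # alternate sin+cos, sin-cos
        g = g + y[k] * (np.sin(h * t) + sign * np.cos(h * t))
    return g / np.sqrt(2.0)
```

Evaluating either function on a grid t in [−π, π], one curve per observation, produces the (modified) Andrews plot.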
The objective of this article is to use Andrews plots to graphically represent
the associations in contingency tables. Often such associations are depicted
through correspondence analysis, where the rows and columns of the contingency
table are represented by points in a two- or three-dimensional scatter plot.
As mentioned earlier, this may not be completely satisfactory due to the loss
of information caused by the dimensionality reduction. Andrews and modified
Andrews plots are clearly ideal alternatives to avoid such loss. We describe
the approach in the next section.

23.2 (Modified) Andrews Plots in Correspondence Analysis

For an a × b contingency table N = ((n_ij)), the levels of two discrete
variables A and B are represented by rows and columns, respectively.
Correspondence analysis is a mathematical approach to represent these as
points in an m-dimensional space, where m = min(a, b) − 1. This is done by
using the singular value decomposition of the correspondence matrix
Q = ((q_ij)), where

    q_ij = n_ij / n,  n = Σ_{i=1}^{a} Σ_{j=1}^{b} n_ij.

Clearly the correspondence matrix Q is the matrix of relative frequencies. The
above singular value decomposition provides the representation of each row and
column of N as a point in m-dimensional space. Projections of these points in
two or three dimensions provide a way to present the information graphically.
To present things more formally yet briefly, from the correspondence matrix Q,
we define

    r = Q1  and  c = Q'1,

where 1 = (1, ..., 1)' is a column vector of appropriate dimension, containing
1 as the entry at each place. The vectors r and c are called, respectively,
the row and column masses. We also define

    D_r = diag(r),  D_c = diag(c),

and let

    F = D_r^{-1}(Q − rc') D_c^{-1}(Q − rc')'.

One can show with a little manipulation that

    n tr(F) = Σ_{i=1}^{a} Σ_{j=1}^{b} (n_ij − n_i. n_.j / n)² / (n_i. n_.j / n),

which is Pearson's chi-squared test statistic for testing independence between
A and B. The quantity η = tr(F) is referred to as the total inertia. It is
just a multiple of Pearson's chi-squared statistic and can be interpreted as a
measure of the magnitude of the total row (or column) squared deviations. This
interpretation avails us an opportunity to attempt to partition η into several
components and allocate these components (which would essentially be the
eigenvalues of F) to various dimensions. This is done using the generalized
singular value decomposition [see Jobson (1992)] of the matrix Q − rc'. In
fact, one can directly take the generalized singular value decomposition of
the correspondence matrix Q rather than Q − rc'. The two decompositions are
essentially equivalent except for one trivial (and, in fact, redundant for our
purpose) singular value which is always equal to 1.
In the traditional correspondence analysis only the first two or three di-
mensions are retained and using these, one explores any associations or other
striking features in the plots. However, the graphical aspect at this stage can be
handled more effectively through (modified) Andrews plots. Instead of working
with projections of all m x 1 row and column points, we instead transform these
points to Andrews or (modified) Andrews curves. Consequently, the informa-
tion available in the data is used to its fullest. Row (or column) points closer to
each other transform to curves nearer to each other. Any exceptional or influen-
tial categories can also be readily identified by these curves. Further, assuming
a reasonable degree of symmetry in the correspondence matrix Q, association
of any row category to any column category will be reflected in the similari-
ties between the corresponding (modified) Andrews curves. Alternatively, the
asymmetric version of (modified) Andrews plots can be used [Greenacre (1984)].
Independence in the contingency table should result in these plots showing no
specific patterns.
It is more instructive to illustrate the techniques, approach and ways of
interpretation through some examples. The next section provides these illus-
trations using certain real data sets.

23.3 Some Examples


Example 23.3.1 Fisher et al. (1982) present a study of reproducibility of
coronary arteriography diagnostic procedure which plays an important role in
deciding if a bypass surgery on a patient may be appropriate. It is thus im-
portant to monitor the quality of arteriography and ascertain the amount of
agreement between the quality control site readings and the clinical site read-
ings. The problem is essentially that of interlaboratory testing in a categorical
data context. Table 23.1 presents the results for 870 randomly selected pa-
tients with two readings. For these readings the degree or intensity of disease
was classified as none, zero vessel disease but some disease and one-, two-, and
three-vessel disease. For identification in plots, these groups for the quality
control site are denoted by 1, 2, 3, 4 and 5, respectively, and for the clinical
site, the notation used are A, B, C, D and E, respectively.
Modified Andrews Plots 315

Table 23.1: Agreement with respect to number of diseased vessels

Clinical Site Reading


Quality Control
Site Reading Normal Some One Two Three
(A) (B) (C) (D) (E)

Normal (1) 13 8 1 0 0
Some (2) 6 43 19 4 5
One (3) 1 9 155 54 24
Two (4) 0 2 18 162 68
Three (5) 0 0 11 27 240

The row and column points y = (Yl, Y2, Y3, Y4)' for the data are obtained by
performing the generalized singular value decomposition and these are reported
in Table 23.2. Using these, we obtain the functions fy(t) and gy(t) for each of
the five rows and columns. These plots are given in Figures 23.1 and 23.2. In
the case of perfect agreement between the clinical site and quality control site
readings, all row points should be the same as the corresponding column points,
and thus the corresponding Andrews curves should be identical also. As seen
in the plots, the agreement between the two readings appears to be quite good
for each of the five classification groups and the (modified) Andrews curves of
the corresponding points follow each other very closely in case of fy(t) as well
as gy(t).

The same argument can be made for the plots in Figures 23.3, 23.4 and
23.5, where the components of vector y have been permuted so that different
Yi are assigned to the constant ~ in the expressions of gy(t) (similar plots
corresponding to fy(t) are not shown here). It is not surprising, since in the
present case, none of the four dimensions can be deemed redundant in that each
explains a significant percent of total inertia. Specifically, these percentages are
43.80, 28.59, 16.31 and 11.30%. However, in many cases, the major portion (up
to 85 or 90%) of the total inertia can be attributed to the first two dimensions.
Whenever this does not happen, as in the present case, the use of only the first
two dimensions to obtain a plot of points will not be very effective compared
to the use of Andrews plots which utilize all dimensions.
316 R. Khattree and D. N. Naik

Table 23.2: Clinical and QC site evaluations: Rowand column points

Points Coordinates

YI Y2 Y3 Y4
Row Normal 3.14513 1.72789 0.91817 1.52739
(QO Site) Some 1.67179 0.25096 -0.32230 -0.97903
One 0.24290 -0.84605 -0.32512 0.27160
Two -0.34541 -0.16330 0.72270 -0.15729
Three -0.61364 0.68014 -0.34912 0.05434

Column Normal 3.12841 1.74992 0.96954 1.71600


(Clinical) Some 1.94382 0.40705 -0.25836 -1.07768
One 0.35725 -0.89110 -0.45605 0.26900
Two -0.26104 -0.32340 0.72042 -0.12930
Three -0.56821 0.59772 -0.26196 0.02836

Example 23.3.2 Data in Table 23.3 are taken from Srole et al. (1978) and
are reported in Agresti (1990) as well. The objective of the study was to
examine the relationship, if any, between the mental impairment and parents'
socioeconomic status. Six levels of socioeconomic status, 1 (high) to 6 (low)
and four levels of mental health status, A (well), B (mild symptom formation),
C (moderate symptom formation) and D (impairment) are taken. Data are
obtained on a sample of 1660 residents of Manhattan. It may be pointed out
that both variables are ordinal in nature.
As per the structure of the data for plotting, we have six row points and
four column points in the three-dimensional space. However, the first dimension
alone is able to explain 93.95% of the total inertia, and the first two dimensions
combined explain almost 99% of that. Thus, the third dimension contains no
significantly useful information for all practical purposes. The ten row and
column points are listed in Table 23.4. The modified Andrews plots are given
in Figures 23.6 and 23.7. The following observations are made from Figure 23.6.

(a) The two variables are ordered categorical in nature and this fact clearly
shows up in Figure 23.6. Also, it can be observed that there is possibly
a further subgrouping within the socioeconomic status. Status 1 and 2
are similar to each other; 3 and 4 are similar and to some extent, there
Modified Andrews Plots 317

6 ..,-- ~ -~ --~ -.-------~.-- - ~1

5 1 \ i

C
4 ~
.2
U
c
~
I..-

~ 1
~
-c
c
0
... -1
-2
-4 -3 -2 -1 o 1 2 3 4

Solid line: Rows; Broken line: Columns


Coeff. of yl is=l s~=t:2

Figure 23.1: Agreement w.r.t. no. of diseased vessels

6
5-
-=-
Ol
I:
~0
4
3
r~;
" 2 " "' ....
'"
::l
"-
1 < ~

"-!! -~ 13
:0
0
:::;
0 i
-1
-2
-4 -3 -2 -1 0 1 2 4

Solid line: Rows; Broker, line: Cc~\..:.rnns

Coeff_ of y1 is=1Isqrt!21

Figure 23.2: Agreement w.r.t. no. of diseased vessels

is some similarity between status 5 and 6. However, as subgroups, these


three subgroups clearly separate themselves out in these plots.

(b) With respect to mental health status, the health status B (mild symptom
formation) and C (moderate symptom formation) appear to be similar,
but are clearly different from A (well) and D (impaired). Further, the
ordinal nature of each health status is self evident in these plots.

(c) There appears to be positive association between the mental health status
of children and parents' socioeconomic status, with mental health of chil-
dren to be generally better at the higher levels of parents' socioeconomic
status. Further, the upper two levels of parents' socioeconomic status
groups generally correspond to a "well" mental health status; the next
318 R. Khattree and D. N. Naik

6 j
'"
......
'"c
11
0
:nc
..."

:ll
"t)

~
'ij
a
:::<
j
" i
-4 -3 -2 -1 0 1 2 3 4

Solid line: Rows, Bral.en ll::e: ':cluren!;.


coeff. of y2 is=iis<;r~i2J

Figure 23.3: Agreement w.r.t. no. of diseased vessels

g;
i::
.~
H
1
1i 0
"
::I
"- -I
"~ -2
'ii -3
0
:;
-4
-5
-4 -3 -2 -1 o 1 2 3 4

Solid line~ Rows; Ercke~ :ir.e: COlU~r.S

Coeff. of y3 is=1/sqrtl2i

Figure 23.4: Agreement w.r.t. no. of diseased vessels

two levels are closely related to "mild" or "moderate symptom forma-


tion" and the lowest two levels of parents' socioeconomic status generally
severely affect the mental health status of children towards impairment.
It may be remarked that ideally one should use the asymmetric plots to
assess the closeness of row and column points. However, in the present
case our regular plot provides the interpretable results.

The plot Figure 23.7 corresponds to the case when Y2 has been assigned
to the constant ~ in the function gy(t). This plot reveals a few additional
interesting observations.
(a) With respect to parents' socioeconomic status, curves corresponding to
groups 1 and 2 show the patterns just opposite to those corresponding to
Modified Andrews Plots 319

~~
i"l
4 -1
'"C
.2 3
;:; 2
t:
:l
U. 1 j
"-=
~
-~ ~
'6
0
::::; -21
=~ l
-4 - 3 -2 -1 1 2

So:id line: RO'-'1S; Broker. Ilne: Colurr.ns

Figure 23.5: Agreement w.r.t. no. of diseased vessels

groups 5 and 6.

(b) With respect to mental health status, groups A (well) and D (impaired)
follow the patterns opposite to each other.

(c) There, possibly, is very little difference between groups 3 and 4 of parents'
socioeconomic status and between groups Band C with respect to mental
health status. Conceivably, in each case, the corresponding two groups
can be combined. If this is done, then there is a positive association
between the combined groups (3, 4) and (B, C). Plots for this situation

r-
are not shown.

0.21
0.16
0; 0.11 .
i: 0.06
.2
;:; 0.01
c:
2 -0.04
., -0.09
"1J

-= - 0 . 14
:;:;
0
::::; -0.19
-0.24
-0.29
-4 -3 -2 -1 0 1 2 4

Solid line: Rows; Broken lice: Co2umns


Caeff. of y1 is=1/sq::c:121

Figure 23.6: Agreement w.r.t. no. of diseased vessels


320 R. Khattree and D. N. Naik

Table 23.3: Cross-classification of mental health status and par-


ents' socioeconomic status

Mental Health Status

Parents' (Mild (Moderate


Socioeconomic Symptom Symptom
Status (Well) Formation) Formation) (Impaired)
A B C D

1 (high) 64 94 58 46
2 57 94 54 40
3 57 105 65 60
4 72 141 77 94
5 36 97 54 78
6 (low) 21 71 54 71

It may be emphasized that so much information could not have been ex-
tracted from these data if traditional two-dimensional scatter plots of corre-
spondence analysis were constructed or if the usual Pearson's chi-squared test
for independence was applied. Further, we note that the suggested modifica-
tion of Andrews plots provides curves which are easier to interpret and are more
informative.

23.4 Modified Andrews Plots and Rao's


Correspondence Analysis
Rao (1995) suggested the use of Hellinger distance instead of chi-squared dis-
tance for measuring the distance between two row (or column) profiles in corre-
spondence analysis. This was done for the two main reasons (a) the chi-squared
distance is not just the function of individual profiles under consideration but
it also depends on the average of all profiles and (b) for the reason given in
(a) and while measuring the affinities between profiles, undue emphasis may be
given to categories with low frequencies. However, one important point to be
emphasized here is that regardless of the choice of distance, (modified) Andrews
plots can still be used in the context of correspondence analysis. To discuss the
Modified Andrews Plots 321

Table 23.4: Mental health and parents' socioeconomic status: row


& column points

Points Coordinates

Row Yl Y2 Y3
(Socioeconomic ) 1 0.180932 0.019248 0.027525
2 0.184996 0.011625 -.027386
3 0.059031 0.022197 -.010575
4 -.008887 -.042080 0.011025
5 -.165392 -.043606 -.010368
6 -.287690 0.061994 0.004824

Column Well 0.259536 -.012102 0.022589


(Mental Health) Mild 0.029588 -.023651 -.019818
Moderate -.014210 0.069901 -.003230
Impaired -.237392 -.018897 0.015848

approach briefly, let R = ((rij)) be the a x b matrix of row profiles defined as


E!!. Ell Ell
PI. PI. PI.
Ell Ell. ~
P2. P2. P2.
R=D;:-lQ =

Pal &a 1!.9Jz.


Pa. Pa. Pa.

Define
h'1
h'2

h'a
where the vector hi is defined as

hi = (y'ril, y'ri2, ... , ..;rib)', i = 1,2, ... ,a.


The Hellinger distance between the ith and jth profile is defined as

8;j = (hi - hj)'(hi - hj),


322 R. Khattree and D. N. Naik

0.4
-:::- 0.3
0:; 6
i: o. 2
"
/
.Q
U 0.1
c:
:>
u..
O. 0
"tl
~~~~~----~~~~d
~"
0
-0. 1 -
,~
:::e - 0 .2 Well
- 0. 3
-4 -3 -2 -1 1 2 4

Solid line; Rows; Broken line; Columns


Coeif. of y2 is=l/sqrt 121

Figure 23.7: Agreement w.r.t. no. of diseased vessels

and as stated earlier, it depends only on the ith and jth profiles.
The canonical coordinates for representing row profiles are determined from
1

ro '
the singular value decomposition of the a x b matrix D~ (H - 1'11), where Rao
suggests the choice of vector. 11 as

11=( fI~
n.l
~' ~, ... , ~
n.2 n.b

or
11 = H'r.
It must be noted that the dimension of the space m, in this case, is the rank
1
of D~ (H - 1'11) and is not necessarily equal to min(a, b) - 1. If the singular
1
value decomposition of D~ (H - 1'11) is
1
D~(H -1'11) = AlUlV~ + A2U2V; + ... + AmUmV~,
then canonical coordinates for row profiles are given by
1 1 1
A1D;2"Ul, A2D;2"U2, ... , AmD ;2"Um ,

and the column profiles are given by standard coordinates

where 1
ac= [diag[(H - l11')'(H - 111')]] 2" .
Modified Andrews Plots 323

These can be plotted in the same plot. If one wishes to have the canonical
coordinates for column profiles and standard coordinates for row profiles, above
formulas can be suitably modified.
We will illustrate the modified Andrews plots based on Rao's correspondence
analysis using a drug efficacy data of Calimlin et al. (1982).

Example 23.4.1 The data are reported in Table 23.5. The objective of the
study was to examine the ratings assigned by the hospital patients to four
drugs, with an objective to determine whether a particular drug or a group of
drugs are favored by the patients. The four drugs are named as Z100, EC4,
C60 and C15 and the five ratings on ordinal scale are from poor to excellent.
We compute here the canonical coordinates for row profiles representing various
drugs and standard coordinates for the column profiles representing the ratings.
The coordinates are not shown here to save space. Also, we present here only
the modified Andrews plots as Figures 23.8 and 23.9.

Table 23.5: Results of a survey on analgesic efficacy of drugs

Ratings of Drug's Efficacy


Drug Poor Fair Good Very Good Excellent Total
Z100 5 1 10 8 6 30
EC4 5 3 3 8 12 31
C60 10 6 12 3 0 31
C15 7 12 8 1 1 29
Total 27 22 33 20 19 121

Figure 23.8 clearly illustrates the closeness of the modified Andrews curves
corresponding to drugs C15 and C60 and that of curves corresponding to EC4
and Z100. In the same plot, one also observes the anticipated clustering of the
efficacy ratings, namely, {Excellent and Very Good} forming one cluster and
the remaining three ratings forming the other cluster. The proximity of EC4
and Z100 to the higher two efficacy ratings and proximity of C15 and C60 to
the lowest three ratings is also noted. This indicates that the EC5 and Z100
are perceived as the superior choices by patients.
Figure 23.9 represents modified Andrews plot when the roles of Yl and Y2
(that is, the scores corresponding to the first and second dimensions) in (23.2)
have been interchanged. This obviously changes the curves. However, in this
plot the similarities and dissimilarities between various drugs and ratings and
associations between the particular drugs and ratings are depicted not by the
relative closeness of corresponding curves but by their particular patterns (in
terms of ups and downs). The curves corresponding to EC4 and Z100 have
324 R. Khattree and D. N. Naik

-2
-4 -3 -2 -1 a 1 2 3 4

Solid line Rows: Broken line: Columns


Coeff. of yl is=1/sqnI2)

Figure 23.8: Drug vs. efficacy rating

--------
2 oj
~

-0: ,,-
;; 1
.Q
"0
c:
::J
L.-
0
"tl
:! .....
'5 -1
0
:::E

-2
-4 -3 -2 -1 o 1 2 4

Solid lIne: Rows: Broken line: Col'Jmns


Coeff. of y2 is=!,sqrL(2)

Figure 23.9: Drug vs. efficacy rating

larger amplitudes compared to the other two drugs. A similar observation is


made about the efficacy ratings. At the same time, it is also observed that the
curves for EC4 and ZIOO and those corresponding to C15 and C60 follow the
opposite directions (when the former ones go up, the latter ones tend to go
down). At the same time such changes for {EC4, ZlOO} and efficacy ratings
{Excellent, Very Good} are in the same direction, thereby indicating the strong
association in this case. A similar statement can be made for drugs {C15,
C60} and efficacy ratings {Poor, Fair, Good}. It must be emphasized that all
these features are not as distinct in the two-dimensional plots (not shown here)
obtained by appropriate dimensionality reduction.
Modified Andrews Plots 325

23.5 Conclusions
Andrews plots as a powerful graphical technique for multivariate data are not
only useful in traditional clustering or outlier detection problems, but they can
also be very important tools in analyzing experimental data in other discrete
or descriptive multivariate analysis problems, such as contingency tables and
correspondence analysis. The fact that there is no loss of dimension, and hence
no approximations before the graphical displays of the data, allows these plots to
he much more informative than traditional scatter plots or biplots. Further, the
modified Andrews plots are even more informative and useful in the graphical
displays of these data.

References
1. Agresti, A. (1990). Categorical Data Analysis, New York: John Wiley &
Sons.

2. Andrews, D. F. (1972). Plots of high dimensional data, Biometrics, 28,


125-136.

3. Calimlin, J. F., Wardell, W. M, Davis, H. T., Lasagna, L., and Gillies,


A. J. (1982). Analgesic efficacy of an orally administered combination of
Pentazocine and Aspirin: With observations on the use of and statistical
efficacy of global subjective efficacy ratings, Clinical Pharmacology and
Therapeutics, 21, 34-43.

4. Fisher, L. D., Judkins, M. P. Lesperance, J., Cameron, A., Swaye, P.,


Ryan, T. J., Maynard, C., Bourassa, M., Kennnedy, J. W., Gosselin, A.,
Kemp, H., Faxon, D., Wexler, L., and Davis, K. (1982). Reproducibility
of coronary arteriographic reading in the coronary artery surgery study
(CASS), Catheterization and Cardiovascular Diagnosis, 8, 565-575.

5. Greenacre, M. J. (1984). Theory and Applications of Correspondence


A nalysis, London: Academic Press.

6. Jobson, J. D. (1992). Applied Multivariate Data Analysis, Vol. II, New


York: Springer-Verlag.

7. Khattree, R. and Naik, D. N (2001). Andrews plots for multivariate data:


Some new suggestions and applications, Journal of Statistical Planning
and Inference, to appear.
326 R. Khattree and D. N. Naik

8. Rao, C. R. (1995). A review of canonical coordinates and an alternative


to correspondence analysis using Hellinger distance, Quwstiio, 19, 23-63.

9. Srole, L. , Langner, T. S., Michael, S. T., Kirkpatrick, P., Opler, M. K.,


and Rennie, T. A. C. (1978). Mental Health in the Metropolis: The Mid-
town Manhattan Study, Revised edition, New York: NYU Press.
24
Orthogonal Expansions and Distinction Between
Logistic and Normal

Carles M. Cuadras and Daniel Cuadras


University of Barcelona, Barcelona, Spain

Abstract: We propose a graphical goodness-of-fit test which consists in repre-


senting a sample along principal dimensions of a random variable. This method
is used to distinguish the logistic from the normal distribution. Some simula-
tions are given.

Keywords and phrases: Maximum correlation, Wasserstein distance, prin-


cipal components, continuous scaling

24.1 Introduction
Given an ordered sample X = {Xl ~ ... ~ x n }, where the x~s are iid observa-
tions of a r.v. X with continuous cdf F, the classical statistics for testing the
hypothesis Ho : F = Fo are based on

Dn = sup IFn(x) - Fo(x)1


-oo<x<oo

and
(24.1)

where Fn is the empirical cdf and 'l1 is a suitable function. Recently, some
goodness-of-fit tests have been proposed by studying the Wasserstein distance
between Fn and Fo

w~ = fal [F;;,-l(u) - FOl(u)]2du

327
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
328 C. M. Cuadras and D. Cuadras

[del Barrio, Cuesta-Albertos, and Matran (2000)]. Another related test is based
on the maximum correlation between the sample and X

where x, J-L are the means, s2, 0"2 the variances and

EHn(X·X) = 11 F;;l(u)Fo-l(U)du. (24.2)

[Cuadras and Fortiana (1993)]. These statistics can be justified as follows. Let
X, Y be two r.v.'s with cdf's F,G and finite means and variances J-Ll,J-L2,O"I,0"§,
respectively. The Wasserstein distance between F, Gis

W2 = inf E(X -
H
y)2 = r [F-l(u) - G- (u)]2du,
Jo
1 1

and the maximum correlation is

p+ = supp(X, Y) = (
H
r F-l(u)G- (u)du - J-LIJ-L2)/0"1O"2,
Jo
1 1

where H is a bivariate distribution with marginals F, G. The minimum for


E(X - Y? and the maximum for the correlation p(X, Y) is reached for the
distribution H+(x,y) = min{F(x),G(yn, and then F(X) = G(Y) (a.s.), i.e.,
we obtain W 2 and p+ when there is a functional relation between X, Y. Thus
W~ and r;t may be used as agreement measures between Fn and Fo. However,
W2 and p+ are related by
W2 = O"t + O"§ - 20"1O"2P+ + (J-Ll - J-L2?'
and the goodness-of-fit tests based on W~ and r;t are essentially equivalent.
In this paper we relate the orthogonal expansion of a r.v. in principal
components to r;t and use this methodology to distinguish two very similar
distributions such as logistic and normal.

24.2 Orthogonal Expansions in Principal


Components
Let X be a continuous random variable with range [a, b] and cdf F. Let (Aj, 'ljJj)
be the countable orthonormal set of eigenvalues and eigenfunctions of the inte-
gral operator K with kernel K(s, t) = min{F(s), F(tn - F(s)F(t)

(Kcp)(t) = lb K (s, t) cp(s)ds,


Orthogonal Expansions 329

i.e., (Aj, '1/Jj) satisfies K'1/Jj = Aj'1/Jj. Cuadras and Fortiana (1995, 2000) proved

l
that defining
x
hj (x) = '1/Jj (s)ds,

then the sequence Xj = hj (X) ,j ::::: 1, is a countable set of uncorrelated random


variables, providing several orthogonal expansions for X, e.g.,

X = Xo + 2:~1 hj(b)(Xj - hj(xo)),


X = Xo + 2:~1 (X] - hj(b)hj(xo)), (24.3)
X = /.Lo + 2:~1 hj (b) (Xj - /.Lj),

where
/.Lo = E(X), /.Lj = E(Xj ).
The sequence (Xj) can be obtained by Karhunen-Loeve expansion on the sto-
chastic process X = {Xt, t E [a, b]}, where X t for each t E [a, bl is the indica-
tor of [X > t] = X-I (t, <Xl), or by continuous scaling on the distance function
8(x, X') = (I x - x' 1)1/2. It can be proved that
1 00

tr(K) = -E[I X - X' Il = LAj,


2 .
J=
1

!
where X, X' are iid and V = E[I X-X' Il is the so-called geometric variability
of X with respect to distance 8. Moreover Aj =Var(Xj ) and each eigenvalue
accounts for the geometric variability V, which is a dispersion measure for X.

lb
Thus
Xl = h1(X) = X t '1/J1(t)dt,

with variance AI, is the first principal component of X. Examples of princi-


pal components hj(X) and the corresponding variances Aj are [Cuadras and
Fortiana (1995) and Cuadras and Lahlou (2000)]:

1. v'2(1 - cos j7rX)/(j'rr) , Aj = 1/(j7r)2, if X is (0,1) uniform and

1 1
L -('
00

tr(K) = )2 = -6'
J7r
j=l

2. [2Jo(~j exp( -X/2)) - 2Jo(~j)] /~jJo(~j), Aj = 4/~J, if X is exponential


with mean 1, where J1(~j) = 0 and Jo, J1 are the Bessel functions of the
first kind of order 0 and 1, and

4 1
L e = 2'
00

tr(K) =
j=l J
330 C. M. Cuadras and D. Cuadras

3. J(2j + 1) jj(j + l)Lj(F(X)), Aj = 1jj(j + 1), if X is standard logistic


with F(x) = (1 + e-X)-l and

1
=L .(" + 1) = 1,
00

tr(K)
j=l J J

where Ll (x) = y'3(2x - 1), L2(X) = J5(6x 2 - 6x + 1), ... are the shifted
Legendre polynomials on [0, 1J.

Motivation for the use of these expansions is as follows:

1. If X is (0,1) uniform and writing the second expansion in (24.3) as


00 U2
X = L
j=l J 7r
'2 J 2'

where Uj = )2(1 - cosj7rX),j ~ 1, is a countable set of uncorrelated


equally distributed random variables, we have a formal analogy with the
expansion
00 y2
W2 = L '/2'
j=l J 7r
where W2 is the limit distribution of the Cramer-von Mises statistic W;,
and Yl, 1'2, ... are iid N(O, 1).

2. If X is logistic, we also obtain a formal analogy with the Anderson-Darling


statistic A; [see Shorack and Wellner (1986, p. 225)]. Such an analogy
may be due to the function \lI in defining (24.1), which is \lI-l = 1 , the
uniform density, for the Cramer-von Mises statistics, and \lI-l = t(l -
t), giving the logistic density f = F(l - F) for the Anderson-Darling
statistics. This suggests that by setting \lI-l = a probability density
function, we obtain a general form for W;(\lI).

3. Let us define the "maximum correlations" rl, r2, ... between the sample
and the principal dimensions Xl, X 2, ... of a r. v. X as the correlations

°
obtained by considering the bivariate cdf Hn(x, y) = min{F(x), Fn(Y)}.
Assuming /-LO = and var(X) = 1, writing the last expansion in (24.3) as
00

X = LPiYj,
j=l

where each }j now has variance 1, it is clear that Pj = p(X, Xj) is the
correlation coefficient between X and the principal dimension X j . The
agreement between Fn and F may then be seen by comparing rl, r2, ... to
Orthogonal Expansions 331

PI, P2, .... Taking the expectation E(X·X), where X is the sample, with
respect to H n , we obtain the following relation between these correlations
and the overall maximum correlation:
CXl

r;t = LPjrj.
j=l

Thus r;t has an expansion like A~ and W~, where Pl,P2, .... are constant
coefficients and rl, r2, ... are random but not independent.

24.3 Maximum Correlation for the Logistic


Distribution
In this section we obtain the maximum correlations for the logistic distribution
and use this goodness-of-fit approach to distinguish this distribution from the
normal distribution.
A. If X follows the standard logistic distribution F(x) = (1 + e-x)-l, then
F-l(u) = -log (u- l - 1) , and

and since J(log(u- l -1))du = -(l-u)log(l-u) -ulogu, after several


simplifications (24.4) can be written as

where
Ai = (n - i) log (n - i) + i log (i) ,
Bi = (n - i + 1) log (n - i + 1) + (i - 1) log (i - 1) ,
with 0 log 0 = O. Thus the maximal Hoeffding correlation between the sample
and the logistic variable X, with mean 0 and variance 71 2 /3, is given by

B. Correlations between X and the principal dimensions.


332 C. M. Cuadras and D. Cuadras

From the expansion for the logistic variable, as hj ( +00) = 0 if j is even, we


have
00 00

X/O" = L2c§j_l L2j-l(F(X))/0" = LPjYj,


j=1 j=1

where 0" = 7r/V3,cj = J(2j + 1) jj(j + 1), Yj = ..)2j + lLj(X) and


if j is even,
Pj ={ ~J3 (2j + 1)/(7rj (j + 1)) if j is odd.

The first four correlations are

PI = 0.997,P2 = 0,P3 = 0.1103,P4 = O.


C. Correlations between the sample X and the principal dimensions.
A tedious computation shows that the covariances mk = EHn(X·F(X)) are
given by
n
mk = L xi(i k+1 - (i - l)k).
i=1

Thus, the first four correlations between the sample X and the principal dimen-
sions are:

rl = v'I2(ml-x/2)/s,
r2 = ..)180 (m2 - ml + x/6) Is,
r3 = .J28 (10m3 - 15m2 + 6ml - x/2) Is,
r4 = J95 (7m4 - 14m3 + 9m2 - 2ml + x/l0) / s,
where x, s are the mean and standard deviation of the sample. The expansion
of r;t is
r+ = 3.. "J3(2j+l)r.
n 7r ~ . ( . + 1) J
J odd J J
D. Plotting the principal dimensions.
Figures 24.1-24.4 give the principal dimensions hi(X),i = 1, ... ,4, where
X is standard logistic. By using the following approximation [Abramowitz and
Stegun (1972)]:

y(p) = ..)-2Iogp
2.515517 + 0.802853y'log(l/p2) + 0.0103281og(l/p2)
1 + 1.432788Jlog(l/p2) + 0.1892691og(l/p2) + .001308(log(l/p2))1.5'

where p is (0,1) uniform, we generate the N(O,I) r.v. Z = y(1 - p). Thus
y = (7r / V3) Z is a normal r. v. with the same mean and variance as X with
Orthogonal Expansions 333

standard logistic cdf F. Then we also plot hi(Y), i = 1, ... ,4, where Y is
normal, see Figures 24.1-24.4.
These plots have been obtained as follows. We write hi(X) in terms of
F(X) = p, and hi(Y) in terms of F(Y), i.e., F(G- 1 (p)), where G is the cdf of
Z. As it is described below, in Figures 24.5-24.8 we perform a similar plot, but·
replacing X, Y by a logistic and a normal sample, respectively.

24.4 Distinction Between Logistic and Normal


Suppose that X is standard logistic and Y is normal with the same mean 0
and variance 7r2 /3. Both distributions are very similar in shape, as noted by
Johnson, Kotz, and Balakrishnan (1995, p. 119), although the value of the
kurtosis (32 is 4.2 for logistic, quite different from the normal (32 = 3. But this
has a small relative effect on the cdf. Consequently, an ommnibus test based
on the Kolmogorov Dn or the maximum correlation r;t may not be conclusive
for deciding whether a given sample comes from a logistic or from a normal
distribution. As a procedure, we propose to examine the correlations between
the sample and the principal dimensions and represent the sample along the
principal axes.
As an illustration, we generated 10 normal and 10 logistic samples of size
n = 50, computed the maximum correlations and represented the curves. The
results for the 20 samples are:
Logistic samples ormal samples
TL TN DL DN LIN Li Gr TL TN DL DN LIN Li Gr
0.983 0.9850.07 0.10 1.7 L L 0.976 0.9830.06 0.08 1.3 L N
0.986 0.9850.23 0.11 1.3 L L 0.982 0.9880.09 0.07 0.7 N N
0.976 0.9740.06 0.10 8.1 L L 0.958 0.9780.08 0.08 0.2 N N
0.975 0.9840.12 0.10 1.5 L L 0.987 0.9880.08 0.07 2.1 L N
0.972 0.9730.08 0.10 4.1 L L 0.975 0.9850.05 0.07 0.6 N N
0.975 0.9800.11 0.09 0.5 N L 0.979 0.9870.07 0.08 1.2 L N
0.961 0.9670.13 0.13 2.6 L L 0.976 0.9860.11 0.10 0.3 N N
0.971 0.9700.07 0.13 5.5 L L 0.983 0.9860.11 0.09 1.7 L L
0.962 0.9580.14 0.09 4.8 L L 0.976 0.991 0.08 0.06 0.2 N N
0.982 0.991 0.06 0.07 0.7 N L 0.983 0.9920.11 0.08 0.4 N N

where:

1. The ten left and the ten right samples are generated as logistic and normal,
respectively.

t, t
2. r r are the maximum correlations and D L, D N are the Kolmogorov
statistics obtained assuming logistic and normal distribution [Stephens
334 C. M. Cuadras and D. Cuadras

(1979)]. LIN is the likelihood ratio. If LIN> 1 the sample may be


logistic (L), otherwise normal (N).

3. Li indicates the decision (L=logistic, or N=normal) according to the like-


lihood ratio LIN.

4. Gr indicates which decision is taken by looking at the plot along the


principal dimensions. The graphical decision is: L (logistic distribution),
N (normal distribution or no logistic).

Thus, for the first logistic sample rt = 0.983 < rt = 0.985 and the fit to
the normal appears better. However, the sample correlations with the first four
dimensions are:

rl = 0.9507, r2 = -0.0574, r3 = 0.2606, r4 = 0.0155,

and the agreement with the theoretical correlations is quite good. For the first
normal sample with the same size we have rt = 0.976 < rt = 0.983.
Figures 24.5-24.8 give a plot of the first to the fourth dimensions. The
continuous line contains the theoretical values, the 6-line is the first logistic
sample and the .-line corresponds to the first normal sample. The fit to the
theoretical line is better for the logistic sample, whereas the normal sample
trend is similar to its theoretical curve, see Figures 24.5-24.8. So both samples
can be identified correctly.
Some conclusions are:

1. r+ may not distinguish between logistic and normal when the sample is
logistic, but provides a correct distinction when the sample is normal.

2. However, DL, DN and LIN are quantities which cannot discriminate


between logistic and normal.

3. The decision from the graphical display of the principal dimensions is


correct in all cases of logistic sample and in nine cases of normal sample.

4. The graphical decision could not be 100% conclusive due to the proximity
between logistic and normal curves, but may help the user when the other
tests are unable to make a clear distinction.
Orthogonal Expansions 335

2

1.5

0.5
1
O~O----~O.~2--~O~.4~-p~O~.6----~O.~8----

Figure 24.1: Plot of the theoretical principal dimensions hI (X), hI (Y), where
X, Y follow the logistic (solid line) and normal (dashed line) distribution re-
spectively

1.2

Figure 24.2: Plot of the theoretical principal dimensions h2(X), h2(Y), where
X, Y follow the logistic (solid line) and normal (dashed line) distribution re-
spectively
336 C. M. Cuadras and D. Cuadras

14}! I
1.21 ,c·
j.l.
i
08+
0.6

0.4 0.6 0.8


P

Figure 24.3: Plot of the theoretical principal dimensions h3(X), h3(Y), where
X, Y follow the logistic (solid line) and normal (dashed line) distribution re-
spectively

00 0.2 0.4 0.6 0.8


P

Figure 24.4: Plot of the theoretical principal dimensions h4(X), h4(Y), where
X, Y follow the logistic (solid line) and normal (dashed line) distribution re-
spectively
Orthogonal Expansions 337

Figure 24.5: First logistic dimension: continuous line. Logistic sample: ~-line.
Normal sample: .-line. Compare to Figure 24.1

12

• t.

••
<!.<!.
<!.

• ••
0.6

t.

Figure 24.6: Second logistic dimension: continuous line. Logistic sample: ~­


line. Normal sample: .-line. Compare to Figure 24.2
338 C. M. Cuadras and D. Cuadras

f
{

1.2

/'" j"
"" •• •
" }. .• ••
0.6 "t. •
/ " •
! t.

""
•• A

Figure 24.7: Third logistic dimension: continuous line. Logistic sample: .6.-line.
Normal sample: .-line. Compare to Figure 24.3

0.8

"' f:

\-'
o

\
Figure 24.8: Fourth logistic dimension: continuous line. Logistic sample: .6.-,-
line. Normal sample: .-line. Compare to Figure 24.4
Orthogonal Expansions 339

References
1. Abramowitz, M. and Stegun, 1. A. (1972). Handbook of Mathematical
Functions, New York: Dover Publications.
2. Cuadras, C. M. and Fortiana, J. (1993). Continuous metric scaling and
prediction, In Multivariate Analysis, Future Directions 2 (Eds., C. M.
Cuadras and C. R. Rao), pp. 47-66, Amsterdam:Elsevier Science Pub-
lishers B. V. (North-Holland).

3. Cuadras, C. M. and Fortiana, J. (1995). A continuous metric scaling


solution for a random variable, Journal of Multivariate Analysis, 52, 1-
14.

4. Cuadras, C. M. and Fortiana, J. (1996). Weighted continuous metric


scaling, In Multidimensional Statistical Analysis and Theory of Random
Matrices (Eds., A. K. Gupta and V. L. Girko), pp. 27-40, The Nether-
lands: VSP, Zeist.

5. Cuadras, C. M. and Fortiana, J. (2000). The Importance of Geometry


in Multivariate Analysis and some Applications, In Statistics for the 21st
Century (Eds., C. R. Rao and G. Szekely), pp. 93-108, New York: Marcel
Dekker.

6. Cuadras, C. M. and Lahlou, Y. (2000). Some orthogonal expansions


for the logistic distribution, Communications in Statistics-Theory and
Methods, 29 (in press).
7. del Barrio, E., Cuesta-Albertos, J. A., and Matnin, C. (2000). Contri-
butions of empirical and quantile processes to the asymptotic theory of
goodness-of-fit tests, TEST, 9, 1-96.

8. Johnson, N. L., Kotz, S., and Balakrishnan, N. (1995). Continuous Uni-


variate Distributions, Second Edition, New York: John Wiley & Sons.
9. Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Ap-
plications to Statistics, New York: John Wiley & Sons.
10. Stephens, M. A. (1979). Tests of fit for the logistic distribution based on
the empirical distribution function, Biometrika, 66, 591-595.
25
Functional Tests of Fit

Denis Bosq
Universite Paris VI, Paris, France

Abstract: In this paper we define and study a class of functional tests of


fit. These smooth tests are associated with projection density estimators. The
Pearson's X2-test belongs to this class since it is associated with the histogram.
Using a multidimensional Berry-Esseen type inequality we obtain the as-
ymptotic behaviour in distribution of the test's statistics. As a consequence we
obtain consistency with rates. Exponential rate holds in a special case.
We also study asymptotic efficiency under adjacent hypothesis and calculate
the Bahadur slope.
Finally we give criteria for choosing a test in the class and present some
numerical simulations.

Keywords and phrases: Test of fit, X2-test , adjacent hypothesis, contigiiity,


projection density estimator, Berry-Esseen inequality, asymptotic efficiency, Ba-
hadur efficiency

25.1 Introduction
In this paper we study a large class of goodness-of-fit tests in a general frame-
work. This class contains the smooth test introduced by Neyman (1937) and
the Pearson's X2-test (1900).
The functional tests of fit (FTF) are based on the deviation of a density
estimator with respect to the true density. They have been considered by
several authors, we may quote Bickel and Rosenblatt (1973), Nadaraja (1976),
Henze (1997), Hart (1997), Gregory (1980), among others.
We now describe the class oftests that is studied in this paper: Let X I, ... ,
Xn be i.i.d. random variables with values in a measurable space (E,8). We
want to test
Ho : Xl has the distribution J..l(XI J..l). ('oJ

341
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
342 D. Bosq

For this purpose we consider an orthonormal system (ejn, j 2: 0) in L 2 (J-L)


with eon = 1 and a sequence (>'jn, j 2: 0) of real numbers such that Aon = 1
and 2:/]=0 AJn < 00. We always suppose that L2 (M) is an infinite dimensional
separable Hilbert space.
Now we construct the kernel
00
Kn(x, y) = L Ajnejn(x)ejn(Y), X,Y E E (25.1)
j=O
and the associated density estimator defined as
1 n
fn(x) = - LKn(x,Xi ), x E E. (25.2)
n i=1
The associated test is based on the statistic
Tn = Vri(fn - 1), (25.3)
it rejects Ho for large values of IITnl1 where I ·11 denotes the L 2 (J-L)-norm. Note
that the X2-test is the FTF test based on the kernel
kn
Kn,o(x,y) = LJ-L(Ajn)-llAjn(x)lAjn(Y) (25.4)
j=O
where (Ajn) is a partition of (E,8).
Kn,o has clearly the form (25.1) since it appears as the reproducing kernel
of Sp(lA on ,"" 1A kn n), a space that contains the constant 1. It is then possible
to rewrite Kn,o under the form (25.1).
The organization of this paper is as follows: Section 25.2 describes the be-
haviour of IITnl1 in distribution under Ho and under the alternative hypothesis
HI. Section 25.3 deals with consistency and exponential rate of convergence.
Section 25.4 provides limits in distribution under adjacent hypothesis while the
choice of Kn is discussed in Section 25.5. Section 25.6 is devoted to local effi-
ciency of the test and efficiency associated with the Bahadur slope. Indications
concerning the proofs are given in Section 25.7. Finally numerical applications
appear in Section 25.S.

25.2 Behaviour of IITnl1 in Distribution


First we give an inequality that provides approximation of Tn's distribution
for fixed n. For this purpose we need some notation and assumptions. We set

~ ( 1 + -(A-3n-~3-J2-=2 ) (A3n i- 0),


sup Ilejn 1100 < 00,
i?O
Functional Tests of Fit 343

where I . 1100 denotes essential supremum,


Ar,n LAJn, r 2: 0,
j>r
00
Un L Ajn~jejn
j=l

where (~j, j 2: 1) is an i.i.d. auxiliary sequence of N(O, 1) random variables.


Now we have the following bound
Theorem 25.2.1 If Xl rv J-L then

sup IP(IITnI1 2:::; a) - P(llUnl1 2:::; a)1 :::; en, n 2: 1, r 2: 4, (25.5)


f?O

with

(25.6)

where Co is a universal constant.

This explicit bound holds in particular for the X 2-test: if r = kn and


J-L(AjJ = (k n + 1)-1, 0:::; J :::; kn, one obtams the bound 3eQ n vn .
. . (k +1)9/2

Now from (25.5) it is easy to derive limits in distribution for IITn112. We


consider two cases:

(B) There exists (rn) such that

Corollary 25.2.1 If A holds, then


00
2 2
I Tnl1 ~ IIUI1 = LAJe· (25.7)
j=l

If B holds, then

(25.8)
344 D. Bosq

We now give an approximation if Xl f'J v 1= I-" for the special kernel.


kn
Kn,l(X, y) = 1 + L ej(x)ej(Y), X,Y E E (25.9)
j=l

where eo = 1, el, e2, ... is a fixed orthonormal system in L2(1-") such that mn =
maxO::;j::;kn lIejlloo < 00.
Concerning v we suppose that the covariance matrix

r v,n (1 ejef.dv - 1 1
djdv ef. dv ) l::;j, f.::;kn

is regular with eigenvalues 1'1 2:: ... 2:: I'k n > o.


Then we have the following bound
Theorem 25.2.2 If Xl f'J v with rv,n regular and if Tn is constructed with the
kernel Kn,l then

(25.10)

where G kn is the distribution function associated with the characteristic function


[det(Ik n - 2itrv, n)]-1/2.

In particular if k n = k, IITnll2 ~ T f'J 1]k where 1]k is the distribution


associated with Gk.

25.3 Consistency of FTF Tests and Rate of


Convergence
In order to study consistency we specify HI, in the general case, by setting

HI = {v: :3 jn(v), n 2:: 1: lim 11 Ajn(v),n 1 ejn(v),ndVI > o} .


If ejn = ej, n 2:: 1 and limAjn > 0, j 2:: 1, this is equivalent to

HI = {v::3 j(V): 1 ej(v)dv 1= o}.


Thus, if (ej) is a complete orthonormal system in L 2(1-"), Ho + HI contains
all the probabilities of the form

v=j·I-"+'Y
where 0 1= f E L2(1-") and'Y is orthogonal to 1-".
Now as usual we will say that the test IITnll > Wn is consistent if
Functional Tests of Fit 345

The following statement gives necessary and sufficient conditions for consis-
tency.
Theorem 25.3.1 If B holds the FTF test is consistent if and only if

Wn- +
-
n
°, (25.11)

and
Wn - L:l AJn
(25.12)
(L:l A]n) 1/2 -+ 00.

Now it is possible to obtain bounds for an and f3n. In the next statement
K~ = Kn - 1 and Wn = cy'ri.

Theorem 25.3.2 (1) We have

2
an < 2 exp (_n _ _-,-c_ -:-:--) (25.13)
- 2a n + (3/2)b n c
with

and
bn = IIKn (X1, ,)1100'
(2) If Tn := I J Kn(X, ·)dv(x) II - c > 0, then

f3n(v)::::: 1- 2exp (-n2a~ + (~)3)b~Tn) (25.14)

where

and
b~ = IIK~(X1,') - EI/K~(X1' ,)1100'
If Kn = L:J=O ej @ ej where k is fixed and l\ejlloo < 00, j = 1, ... , k,
Theorem 25.3.2 shows that the level and the power of the test tend to zero at
an exponential rate. More generally the same property holds if
00

Kn =L Ajej @ ej
j=O
is fixed and bounded.
346 D. Bosq

25.4 Adjacent Hypothesis


In this section we study the local behaviour of FTF tests. We first consider the
simple case where the kernel is
k
K(x, y) = 1 + L ej(x)ej(Y); x,Y E E. (25.15)
j=l

Thus K is the reproducing kernel of a finite dimensional subspace of L 2 (J.L) ,


say [, which contains the constants and such that Ilrplloo < 00 for each rp in [.
The local situation is described by observation of independent r.v. 's Xl n , ... ,
Xnn with common distribution I/n (1/ 2: 1).
We set K' = K - 1 and [' = sp{ el, ... ,ek} and consider the following
conditions:
(Cl) .;n J K'(x, ·)dl/n(x) n->oo
-+ g, with IIgl12 = ,X2 =1= 0,
where convergence takes place in ['.

(C2) r nil (j, f) = J eje.edl/n - J ejdl/n J e.edl/nn~6j,.e, 1 :::; j, f :::; k,


where 6j ,.e = 1 if j = f; 0 if j =1= f.
lf (Cl) and (C2) are satisfied we will say that the sequence (I/n ) is adjacent
to J.L. Note that if

I/n = (1 + hn ) . 1/ (25.16)

where h n E [', n 2: 1 then (Cl) and (C2) may be replaced by

(C) .;nlll- 4J:IIL2(/L) ---7'x =1= 0

a condition that implies contiguity of (I/n ) with respect to J.L [see Roussas (1978)
or Van der Vaart (1998)].
Under adjacency we have the following asymptotic result.
Theorem 25.4.1 If (CJ) and (C2) [resp. (C)] hold then

II Tnl1 2 ~ Qi).) "-' X2(k,'x) (25.17)

where X2(k,'x) is a X2 with k degrees of freedom and parameter,X.


Moreover, if (C) holds with hn = -:}n then

(25.18)

where c > 0 is constant.


Functional Tests of Fit 347

We now turn to the more general case where

kn
Kn(x,y) = 1 + Lejn(x)ejn(Y); X,Y E E, n ~ 1 with (k n ) -+ 00. (25.19)
j=1

Here we replace (C1) , (C2) by

(CD Jk L::j~1 (J ejndvn)2 -+ f i= o.


(C 2) lim L::j,£=1 (J ejne.endvn - J ejndVn J e.e'ndvn - OJ.e)2 < 1.

We then have
kll
Theorem 25.4.2 IjsuPj,n Ilejnlloo < 00, kn -+ 00, ~ -+ 0 then

(25.20)

Theorems 25.4.1 and 25.4.2 allow us to obtain asymptotic power of the


FTF test IITnl1 > W n . For the kernel defined by (25.15) one may use (25.6) in
Corollary 25.2.1 to obtain a test of asymptotic level a by setting w~ = a Xf
where P(Qh(O) > Xf a) = a. The asymptotic power (3 is given by Theore~
25.4.1, we have '

thus

(25.21)

Now for the kernel defined by (25.19) one may use (25.8) in Corollary 25.2.1.
Set w; = kn + v'2knN a then IITnl1 > Wn has asymptotic level a and Theorem
25.4.2 provides the asymptotic power

(25.22)

25.5 Choosing a Kernel


One of the major interests of FTF tests is possibility of choosing a reasonable
kernel that takes into account the alternative hypothesis HI. We discuss such
a choice in this section.
348 D. Bosq

(a) Testing JL against a mixture


Consider the case where

where 1 = fa, iI, ... ,fk are given densities.


In that case a natural kernel should be the reproducing kernel of
sp{l, iI, ... , fk} since Theorem 25.3.2 shows that the associated test converges
at an exponential rate.
A typical example is the Gaussian case where JL = N(mo,0'6) and 1/ =
Ef=o aiN(mi, 0';). Note that it is eventually possible to replace (mi,O';), 0 ~
i ~ k by suitable estimators without changing asymptotic properties of the test.

(b) Choosing a kernel under adjacent hypothesis

Suppose that I/n = (1 fo) .


+ JL where gn -t 9 t= 0 weakly in L 2 (JL), with 9 ..1 1
and take a kernel of the form
g(x) g(y) ~
K(x,y) = 1 + M' M + f;;-/j (x)ej (y) (25.23)

where 1, Q' e2, ... , ek is an orthonormal system.


Then the associated test has a maximal asymptotic power among the class
of FTF tests. One may extend this property by considering various orthogonal
adjacent hypothesis, namely

I/n-i = (1 + gn,i)
Vn' JL,

gn,i- t git= 0, weakly, 1 ~ i ~ k; with 1,gl, ... ,gk orthogonal. This situation
leads to the choice

(25.24)

Concerning the .xjs they can be useful to measure a weight for each "part" of
HI.

25.6 Local Efficiency of FTF Tests


In order to determine local efficiency of FTF tests we consider adjacent hy-
pothesis (see Section 25.5) and use the optimal kernels defined in the previous
subsection. In the second subsection we study efficiency associated with Ba-
hadur slope.
Functional Tests of Fit 349

(a) Local efficiency


Since the yardstick is the Neyman-Pearson (NP) test we first study the asymp-
totic behaviour of the NP test under adjacent hypothesis. Assumptions in the
next lemma are a little more general than (C).

Lemma 25.6.1 If Vn = (1 + eft) /-l where IIgnl1 2 ---t A2 and (c n) is a sequence


of real numbers such that

max(l, c;) max(l, Ilgnll~J


Vn ---t,
°
then

(1) If Xln r-..J /-l, n ~ 1

c~l t -j = 1log (1 + ~gn(Xjn)) + c; IIgnl1 2 £, N r-..J N(O, ,2).


(25.25)

(2) If Xl n r-..J vn , n ~ 1

c~l t -j = llog (1 + ~gn(Xjn)) - c; IIgnl1 2 £, N r-..J N(O, ,2).


(25.26)

This lemma is, in some sense, more general than those generally given since
it does not suppose that (cnllgnI1 2 ) has a limit.
Here we only use Lemma 25.6.1 in the particular case where Cn ---t 1. It
is then easy to see that, if the N.P. test has asymptotic level a EjO,I[, its
asymptotic power is given by

(30 = P(N > No: - A). (25.27)

Now if (C) holds one may use (25.21) and (25.27) to obtain asymptotic
efficiency of the optimal FTF test. We have

P(Qk(O) > X~ 0: - A2)


Ek = P(N > No: '- A) . (25.28)

If k = 1 this efficiency may be written under the form

P(IN - AI > No:/ 2 )


El = P(N) No: - A) , (25.29)

which shows that the FTF test has a good asymptotic behaviour.
350 D. Bosq

(b) Bahadur efficiency


In order to calculate the Bahadur slope of ETF tests we consider a kernel of the
form K = 1 + K' where
00

K' =L Ajej ® ej (25.30)


j=l

with SUPj lIejlloo < 00, IAjl 1 0 and l:j A; < 00. On the other hand we set

~(v) = IIK(x, ·)dv(x) II ,


Then we have:
Theorem 25.6.1 The Bahadur slope of the test defined by IITnl1 ~ Wn is

(K') ~2(v)
cT (v) = ~(1 + 0(1)) as ~(v) -t O. (25.31)
1

Note that a similar result appears in Gregory (1980). Now if v = (1 + h)J.L


with Ihl < 1, J hdJ.L = 0, J h2dJ.L > 0 are obtained

d/')(v) = ~ (~~r (J hejdJ.L)2.


J_

Hence the best kernel of the form (25.30) is


, h h
K h = TIhf~ lihif (25.32)

and

d{'h) = Ilh11 2 . (25.33)

Now the Bahadur slope of the N.P. test is given by

C~N.P.) = 2 J+ (1 h) log(1 + h)dJ.L


[see Van der Vaart (1998, p. 308)]. Thus the Bahadur relative efficiency of the
FTE test based on Kh is

IIhl1 2 (25.34)
EB(h) = 2 n1 + h) log(1 + h)dJ.L .

Note that

lim EB(h) = 1. (25.35)


II hll 00->0
Functional Tests of Fit 351

25.7 Indications Concerning the Proofs


Proof of Theorem 25.2.1 uses the decomposition
rn
Tn = L Ajno'jnej + L Ajno'jnej := Znl + Zn2
j=1 j>r n

where
1 n
o'jn = "2 L
ej(Xj), j 2: 1.
i=1
Concerning Znl one uses Sazonov (1968) inequality that gives the bound
3coMn~. The bound (6 + 2£n + M2)A;(;f is obtained by using Tchebychev
inequality at the order 4. Details which are rather intricate, appear in Bosq
(1980).
Corollary 25.2.1 is an easy consequence of Theorem 25.2.1. Proof of Theo-
rem 25.2.2 is similar to that of Theorem 25.2.1. Theorem 25.3.1 is a consequence
of Theorems 25.2.1 and 25.2.2. Theorem 25.3.2 is easily established by using
exponential type inequalities in Hilbert space [see Pinelis-Sakhanenko (1985)
and Bosq (2000)].
Proofs of Theorems 25.4.1, 25.4.2 and Lemma 25.6.1 are given in Bosq
(1983) .

PROOF OF THEOREM 25.6.1. (sketch) The FTF test is based on the statistics
Un = II~ L~1 K'(Xi, ')11·
The strong law of large numbers in a Hilbert space entails

Un -Ilf K'(x, ')dv(x)ll, va.s.; V E HI. (25.36)

On the other hand the Sethuraman theorem [see Nikitin (1995, p. 23)] implies
1
-logPj.t(Un > c) - - - t £(c), c>O (25.37)
n n-H)Q

where
c2
£(c) =- 2(}2 (1 + 0(1)) as c - 0 (25.38)

with
(}2 = sup Varx*[K'(Xl' .)].
Ilx*II=1
It is easy to see that

(}2 = lambdai. (25.39)


352 D. Bosq

Now we are in a position to use the Bahadur Theorem [see Nikitin (1995,
pp. 6-7)]. We obtain

Ct(v) = :r II! 2
K'(x, .)dV(X)11 (1 + 0(1)) hence (25.31).

25.8 Simulations
The simulations presented below have been performed by Izraelewitch et al.
(1988). The kernel has the form (25.15) where el, ... , ek are Legendre polyno-
mials over [-1, +1].
Here m is the uniform distribution on [-1, +1]. The goal of these simulations
is comparison between the power of X2 test and the power of the FTF test based
on the Legendre polynomials under various alternatives.
For each alternative the problem is transported over [-1, +1] by putting

where F is the distribution function of v.


The power of each test appears in ordinates while the number of cells in the
X2 test and the degree of the Legendre polynomial used appears in abscissas.
In general the smooth test is better than the X 2-test especially if the vari-
ance is greater under the alternative than under Ho.
100
90
80
70
60
50~
40
30
20
" ... -----------~
10

2 3 4 5 6 7 8 9 10

Figure 25.1: H : N(O, 1); Ha : N(O, 25/16); n = 50


Functional Tests of Fit 353

100
90
80 /
70 ____
60
50
40
30
20
10

2 3 4 5 6 7 8 9 10

Figure 25.2: H : N(O, 1); Ha : N(O, 25/16); n = 100

tOO
90
80
70 '-----"
60
50
40
30
20
10

2 3 4 5 6 7 8 9 10

Figure 25.3: H : N(O, 1); Ha : N(O, 5,1); n = 50

100
~----------
90
80
70
60
50
40
30
20
10

2 J 4 5 6 7 3 9 :0

Figure 25.4: H : N(O, 1); Ha : N(O, 5,1); n = 100


354 D. Bosq

100
90
, ,
80 I \
I
70 I ,,
I

60 I
I

I
50 I
/
I
40 I

30

2 3 4 5 6 7 8 9 10

Figure 25.5: H: CAUCHY (0,1); Ha: STUDENT (25); n = 50

100
90
----- , ,
80
70
60
50
40
30
20
10

2 3 4 5 6 7 3 9 10

Figure 25.6: H: U(O, 1); Ha: BETA(3/2,1)

100
90
80
70
60
50

/-
40
30
20 ,,
10 -- -- ... , -.' _.. -"" --- , ,
2 3 4 5 6 7 8 9 10

Figure 25.7: H : N(O, 1); Ha : 0, 9N(0, 1) + 0, IN(O, 25); n = 50


Functional Tests of Fit 355

100
90
80
70
60
50
'-',\
40 I
\ , ... - ....

,. "
30 \
..... "
,, ,
~

20 '--
10

2 3 4 5 6 7 8 9 10

Figure 25.8: H : N(O, 1); Ha : 0, 8N(0, 1) + 0, 2N(0, 0, 04)

References
1. Bickel, P. J. and Rosenblatt, M. (1973). On some global measures of the
deviations of density function estimates, Annals oj Statistics, 1, 1071-
1095.

2. Bosq, D. (1980). Sur une classe de tests qui contient le test du X 2 , Publ.
[SUP fasc. 1-2 p. 1-16.

3. Bosq, D. (1983). Lois limites et efficacite asymptotique des tests Hilber-


tiens, Stat. et Anal. des donnees, 8, 1-40.

4. Bosq, D. (1989). Tests du X2 generalises. Comparaison avec le test du


X2 classique, Rev. Statist. Appliquee, XXXVII (1), 43-52.

5. Bosq, D. (2000). Linear Processes in Function Spaces - Lecture Notes in


Statistics, New York: Springer-Verlag.

6. Gregory, G. G. (1980). On efficiency and optimality of quadratic tests,


Annals oj Statistics, 8, 116-131.

7. Hart, J. D. (1997). Nonparametric Smoothing and its Applications in


Lack-oj-Fit Testing, New York: Springer-Verlag.

8. Henze, N. (1997). Do component of smooth tests of fit have diagnostic


properties? Metrika, 45, 121-130.
356 D. Bosq

9. Izraelewitch, E., Lafitte, I., Lavault, Z., and Roubert, B. (1988). Le test
du X2 et Ie test de Legendre, Projet ISUP - Paris.

10. Nadaraja, E. A. (1976). A quadratic measure of the deviation of a density


estimator, Theory of Probability and its Applications, 21, 843-850.

11. Neyman, J. (1937). Smooth test for goodness of fit, Skand. Aktuar, 20,
119-128.

12. Nikitin, Y. (1995). Asymptotic Efficiency of Nonparametric Tests, Cam-


bridge University Press.

13. Pearson, K. (1900). On the criterion that a given system of deviations


from the probable in the case of correlated system of variables is such
that it can be reasonably supposed to have arisen from random sampling,
Philosophical Magazine, 50, 157-175.

14. Pinelis, I. F. and Sakhanenko, I. (1985). Remarks on inequalities for large


deviation probabilities, Theory of Probability and Analysis, I, 157-214.

15. Roussas, G. (1978). Contiguity of Probability Measures, Cambridge Uni-


versity Press.

16. Sazanov, V. V. (1968). On the multidimensional central limit theorem,


Sankhya, Series A, 30, 191-204.

17. Sazanov, V. V. (1968). On w 2 criterion, Sankhya, Series A, 30, 205-210.

18. Tenreiro, C. (2000). On a class of integral statistics for testing goodness


of fit, Preprint, University Coimbra, Portugal.

19. Van der Vaart, A. W. (1998). Asymptotic Statistics, Cambridge Univer-


sity Press.
26
Quasi Most Powerful Invariant Tests of
Goodness-of-Fit

Gilles R. Ducharme and Benoit Frichot


Universite M ontpellier II, M ontpellier, France

Abstract: In this chapter, we consider the problem of testing the goodness-of-


fit of either one of two location-scale families of density when these parameters
are unknown. We derive an O(n-l) approximation to the densities of the maxi-
mal invariant on which the most powerful invariant test is based. The resulting
test, which we call quasi most powerful invariant, can be applied to many situ-
ations. The power of the new procedure is studied for some particular cases.

Keywords and phrases: Most powerful invariant test, Laplace approximation

26.1 Introduction

Let $X_1,\ldots,X_n$ be i.i.d. observations from a real random variable $X$ with density $f(\cdot)$ on $\mathbb{R}$. We consider the problem of testing
$$\mathcal{H}_0:\ f(\cdot)=\frac{1}{\sigma}f_0\Big(\frac{\cdot-\mu}{\sigma}\Big) \quad\text{against}\quad \mathcal{H}_1:\ f(\cdot)=\frac{1}{\sigma}f_1\Big(\frac{\cdot-\mu}{\sigma}\Big) \qquad(26.1)$$
where $f_0$ and $f_1$ are two densities of known form and $(\mu,\sigma)\in\mathbb{R}\times\mathbb{R}_+^*$ are the location-scale parameters. If $(\mu,\sigma)$ are given under both hypotheses, the Neyman-Pearson Lemma gives the most powerful test for (26.1). Otherwise, it is natural to restrict attention to the class of tests invariant under the group of affine-linear transformations
$$\mathcal{G}=\{x_1,\ldots,x_n \to cx_1+b,\ldots,cx_n+b;\ (c,b)\in\mathbb{R}_+^*\times\mathbb{R}\}.$$
Lehmann (1959) gives in this context the most powerful invariant (MPI) test for (26.1), which rejects $\mathcal{H}_0$ for large values of $\log(q_1/q_0)$ where, for $j=0,1$ and $\ell_j$ the log-likelihood of the data,
$$q_j=\int_0^\infty\!\!\int_{\mathbb{R}} c^{\,n-2}\prod_{i=1}^n f_j(cx_i+b)\,db\,dc \qquad(26.2)$$
$$\phantom{q_j}=\int_0^\infty\!\!\int_{\mathbb{R}} \exp\{\ell_j(\mu,\sigma)\}\,\frac{d\mu\,d\sigma}{\sigma} \qquad(26.3)$$
which is proportional to the density of the maximal invariant linked to $\mathcal{G}$. The quantity (26.2) has been obtained explicitly for few distributions only: normal [Hajek and Sidak (1967)], uniform and exponential [Uthoff (1970)], double exponential [Uthoff (1973)], Cauchy [Franck (1981)]. Otherwise, it is complicated to calculate (26.3), and this impairs the use of this approach. In this work, we propose an approximation of (26.2) which allows us to circumvent this difficulty.

26.2 Laplace Approximation


The approach developed here is based on the Laplace approximation for integrals, which can be found, for example, in Barndorff-Nielsen and Cox (1989). Let $D=\mathbb{R}\times\mathbb{R}_+^*$, let $(\hat\mu_j,\hat\sigma_j)=\arg\max_{(\mu,\sigma)\in D}\ell_j(\mu,\sigma)$ be the maximum likelihood estimators (MLE), and let $\ddot\ell_j(\mu,\sigma)$ be the Hessian matrix of $\ell_j$, supposed non-singular. Its existence is assured by regularity conditions on $f_j$.

Proposition 26.2.1 If the derivatives of order 2 of $\ell_j$ are continuous in a neighborhood of $(\hat\mu_j,\hat\sigma_j)$ and if $(\hat\mu_j,\hat\sigma_j)$ is in the interior of $D$, then
$$q_j=\hat q_j\,\{1+O(n^{-1})\},\qquad \hat q_j:=\frac{2\pi\,e^{\ell_j(\hat\mu_j,\hat\sigma_j)}}{\hat\sigma_j\,\big|\det\ddot\ell_j(\hat\mu_j,\hat\sigma_j)\big|^{1/2}}. \qquad(26.4)$$

If $\hat\mu_j$ is located at a border of $D$, we can adapt the approximation. For example, if $D=(-\infty,\hat\mu_j]\times\mathbb{R}_+^*$ and if $\partial\ell_j(\mu,\sigma)/\partial\mu>0$ on $D$, we find the following corollary:

Corollary 26.2.1 If $\partial\ell_j/\partial\mu$ and $\partial^2\ell_j/\partial\sigma^2$ are continuous in a neighborhood of $(\hat\mu_j,\hat\sigma_j)$ in $D$, (26.2) can be written
$$\hat q_j=\frac{\sqrt{2\pi}\,e^{\ell_j(\hat\mu_j,\hat\sigma_j)}}{\hat\sigma_j\,\frac{\partial\ell_j}{\partial\mu}(\hat\mu_j,\hat\sigma_j)\,\big|\frac{\partial^2\ell_j}{\partial\sigma^2}(\hat\mu_j,\hat\sigma_j)\big|^{1/2}}. \qquad(26.5)$$

26.3 Quasi Most Powerful Invariant Test


The preceding results suggest approximating the MPI test by the test that rejects $\mathcal{H}_0$ for large values of $\log(\hat q_1/\hat q_0)$. We call this test quasi most powerful invariant (QMPI) because the test statistic is the approximation to order $n^{-1}$ of the statistic of the MPI test and is invariant under the action of $\mathcal{G}$.

• Normal distribution: the exact value of (26.2) is obtained by Hajek and Sidak (1967). They give
$$q_N=\frac{\Gamma\big(\frac{n-1}{2}\big)}{2\,n^{n/2}\,(\pi\hat\sigma^2)^{(n-1)/2}} \quad\text{with}\quad \hat\sigma^2=\frac{1}{n}\sum_{i=1}^n(x_i-\hat\mu)^2 \ \text{ and }\ \hat\mu=\bar x.$$
Proposition 26.2.1 gives
$$\hat q_N=\frac{2\pi\,e^{-n/2}}{\sqrt{2}\,n\,(2\pi)^{n/2}\,\hat\sigma^{\,n-1}}.$$
Going from $\hat q_N$ to $q_N$ is made by Stirling's formula.

• Logistic distribution: the logistic case is interesting because the density of the maximal invariant has not been calculated, to our knowledge. We have
$$f_L(z)=\frac{e^{z}}{(1+e^{z})^2}, \qquad(26.6)$$
$$\ell_L(\mu,\sigma)=-n\log\sigma+\sum_{i=1}^n\Big[\frac{x_i-\mu}{\sigma}-2\log\Big(1+e^{(x_i-\mu)/\sigma}\Big)\Big]. \qquad(26.7)$$
Proposition 26.2.1 gives the approximation
$$\hat q_L=\frac{2\pi\,e^{\ell_L(\hat\mu,\hat\sigma)}}{n\,\hat\sigma\,\sqrt{|AB-C^2|}}$$
with $(\hat\mu,\hat\sigma)$ the maximum likelihood estimators of $(\mu,\sigma)$ and, writing $\hat z_i=(x_i-\hat\mu)/\hat\sigma$,
$$A=-\frac{1}{n}\frac{\partial^2\ell_L}{\partial\mu^2}(\hat\mu,\hat\sigma)=\frac{2}{n\hat\sigma^2}\sum_{i=1}^n\frac{e^{\hat z_i}}{(1+e^{\hat z_i})^2}, \qquad(26.8)$$
$$B=-\frac{1}{n}\frac{\partial^2\ell_L}{\partial\sigma^2}(\hat\mu,\hat\sigma)=\frac{1}{\hat\sigma^2}+\frac{2}{n\hat\sigma^2}\sum_{i=1}^n\Big[\frac{\hat z_i\,e^{\hat z_i/2}}{1+e^{\hat z_i}}\Big]^2, \qquad(26.9)$$
$$C=-\frac{1}{n}\frac{\partial^2\ell_L}{\partial\mu\,\partial\sigma}(\hat\mu,\hat\sigma)=\frac{2}{n\hat\sigma^2}\sum_{i=1}^n\frac{\hat z_i\,e^{\hat z_i}}{(1+e^{\hat z_i})^2}. \qquad(26.10)$$

The QMPI test of the hypothesis $\mathcal{H}_0$ of a normal distribution against a logistic distribution under $\mathcal{H}_1$ rejects $\mathcal{H}_0$ for large values of $\log(\hat q_L/\hat q_N)$. The following Monte Carlo simulations show the gain of power of the QMPI test as compared to some classical tests for normality ($\sqrt{b_1}$: skewness, $b_2$: kurtosis, SW: Shapiro-Wilk, KS: Kolmogorov-Smirnov, CvM: Cramer-von Mises) described in D'Agostino and Stephens (1986).

Table 26.1: Empirical power (%) of tests of normality based on 10,000 samples of size n = 50 from a logistic distribution

level α   √b₁    b₂     SW     KS     CvM    QMPI
0.10      24.7   27.3   24.1   12.4   14.1   38.6
0.05      17.8   21.3   17.0    6.4    5.1   31.4
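To make the construction concrete, here is a minimal numerical sketch (ours, not the authors' code; NumPy/SciPy assumed) of the QMPI statistic $\log(\hat q_L/\hat q_N)$ based on the expressions above; the logistic MLE of $(\mu,\sigma)$ is obtained with a generic optimizer.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def loglik_logistic(mu, sigma, x):
    # log-likelihood (26.7) of the logistic location-scale model
    z = (x - mu) / sigma
    return np.sum(z - 2 * np.logaddexp(0.0, z)) - x.size * np.log(sigma)

def log_qhat_logistic(x):
    n = x.size
    # maximize over (mu, log sigma) so that sigma stays positive
    res = minimize(lambda t: -loglik_logistic(t[0], np.exp(t[1]), x),
                   [np.median(x), np.log(x.std())])
    mu, sigma = res.x[0], np.exp(res.x[1])
    z = (x - mu) / sigma
    w = expit(z) * (1.0 - expit(z))                    # e^z / (1 + e^z)^2, stably
    A = 2.0 / (n * sigma**2) * w.sum()                 # (26.8)
    B = 1.0 / sigma**2 + 2.0 / (n * sigma**2) * (z**2 * w).sum()   # (26.9)
    C = 2.0 / (n * sigma**2) * (z * w).sum()           # (26.10)
    return (loglik_logistic(mu, sigma, x) + np.log(2 * np.pi)
            - np.log(n * sigma) - 0.5 * np.log(abs(A * B - C**2)))

def log_qhat_normal(x):
    n = x.size
    sigma = np.sqrt(np.mean((x - x.mean()) ** 2))
    return (np.log(2 * np.pi) - n / 2 - 0.5 * np.log(2) - np.log(n)
            - (n / 2) * np.log(2 * np.pi) - (n - 1) * np.log(sigma))

# QMPI statistic for H0 normal versus H1 logistic: reject H0 when it is large
rng = np.random.default_rng(0)
x = rng.logistic(size=50)
print(log_qhat_logistic(x) - log_qhat_normal(x))
```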

• Exponential distribution: Uthoff (1970) has obtained
$$q_E=\frac{\Gamma(n-1)}{n^{n}\,\hat\sigma^{\,n-1}}$$
with $\hat\sigma=\bar x-\hat\mu$ and $\hat\mu=\min\{x_1,\ldots,x_n\}$. Corollary 26.2.1 gives
$$\hat q_E=\frac{\sqrt{2\pi}\,e^{-n}}{n^{3/2}\,\hat\sigma^{\,n-1}}.$$
Here again, Stirling's formula shows the closeness of the approximation. Suppose now that we want to test an exponential distribution against an alternative of normality. The QMPI test rejects $\mathcal{H}_0$ for large values of $\log\hat\sigma_E-\log\hat\sigma_N$, where $\hat\sigma_E=\bar x-\hat\mu$ and $\hat\sigma_N$ is the normal scale MLE. One finds here the test of Shapiro and Wilk (1972), which explains its good properties. We can remark that $\log(q_N/q_E)=\log(\hat q_N/\hat q_E)+a_n$, where $a_n$ does not depend on the data but only on the sample size $n$. This shows that the QMPI test has exactly the same power as the MPI test in this case.

References

1. Barndorff-Nielsen, O. E. and Cox, D. R. (1989). Asymptotic Techniques for Use in Statistics, New York: Chapman & Hall.

2. D'Agostino, R. B. and Stephens, M. A. (1986). Goodness-of-fit Techniques, New York and Basel: Marcel Dekker.

3. Franck, W. E. (1981). The most powerful invariant test of normal versus Cauchy with applications to stable alternatives, Journal of the American Statistical Association, 76, 1002-1005.

4. Hajek, J. and Sidak, Z. (1967). Theory of Rank Tests, New York: Academic Press.

5. Lehmann, E. L. (1959). Testing Statistical Hypotheses, New York: John Wiley & Sons.

6. Shapiro, S. S. and Wilk, M. B. (1972). An analysis of variance test for the exponential distribution (complete samples), Technometrics, 14, 355-370.

7. Uthoff, V. A. (1970). An optimum test property of two well-known statistics, Journal of the American Statistical Association, 65, 1597-1600.

8. Uthoff, V. A. (1973). The most powerful scale and location invariant test of the normal versus the double exponential, Annals of Statistics, 1, 170-174.
PART VII
MODEL VALIDITY IN QUALITY OF LIFE
27
Test of Monotonicity for the Rasch Model

Jean Bretagnolle
University of Paris-Sud, Orsay, France

Abstract: Sums of independent, non-identically distributed indicators have been studied by numerous authors. We give an application of these results to some questionnaire models.

Keywords and phrases: Concentration, monotonicity test

27.1 Results in the Literature


Let $A_i$, $i=1,2,\ldots,n$, be $n$ independent events with respective probabilities $p_i$. Let $S=\sum_i 1_{A_i}$ be the number of successes, with mean $np=\sum_i p_i$, where $p$ is the mean probability of success. Let $T$, with probability law $\mathrm{Bin}(n,p)$, have a binomial distribution with parameters $n$ and $p$. For these two variables, we define:
their equal mean, $E(S)=E(T)=np$;
their medians $\mu(S)$, $\mu(T)$ (definition: $\mu(X)=\inf\{k:\ P(X\le k)>1/2\}$);
their modes $m(S)$, $m(T)$ (definition: $m(X)=\arg\max_k P(X=k)$).
Let $k$, $k'$ be two integers bounding $np$: $k=\sup\{j:\ j\le np\}$, $k'=\inf\{j:\ j\ge np\}$. When $X=S$ and when $X=T$, $\mu(X)$ equals $k$ or $k'$, and $m(X)$ equals $k$ or $k'$.
This result is known for the binomial distribution, and its extension to $S$ is due to Samuels (1965) for the mode and to Hamza (1995) for the median, who gives the more precise result $|\mu(S)-E(S)|\le\log 2$.
In another respect, Hoeffding (1956) shows that
$$\forall x<k,\ P(S\le x)\le P(T\le x);\qquad \forall z>k',\ P(T\ge z)\ge P(S\ge z). \qquad(27.1)$$
This last result is about concentration around the mean. A consequence (easier to prove directly) is:
$$\text{if }\varphi\text{ is convex},\quad E\varphi(S)\le E\varphi(T). \qquad(27.2)$$
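These inequalities are easy to verify numerically. The following sketch (ours, not from the chapter; NumPy and SciPy assumed) computes the exact law of $S$ by convolution and compares its tails and a convex moment with those of the binomial $T$.

```python
import numpy as np
from scipy.stats import binom

def pmf_sum_of_indicators(p):
    # exact law of S = sum of independent Bernoulli(p_i), by convolution
    dist = np.array([1.0])
    for pi in p:
        dist = np.convolve(dist, [1.0 - pi, pi])
    return dist

p = np.array([0.1, 0.3, 0.5, 0.7, 0.9, 0.4])
n, pbar = p.size, p.mean()
pmf_S = pmf_sum_of_indicators(p)
pmf_T = binom.pmf(np.arange(n + 1), n, pbar)

# (27.1): S has lighter tails than T on both sides of np
cdf_S, cdf_T = np.cumsum(pmf_S), np.cumsum(pmf_T)
print(np.round(cdf_S - cdf_T, 4))   # <= 0 below np, >= 0 above np

# (27.2): E phi(S) <= E phi(T) for convex phi, e.g. phi(t) = (t - np)^2
k = np.arange(n + 1)
print(((k - n * pbar) ** 2 * pmf_S).sum(), ((k - n * pbar) ** 2 * pmf_T).sum())
```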

27.2 Extension of the Hoeffding Result

Now let $A_{ij}$, with probabilities $p_{ij}$, be independent events, $i\in I_j$, $j\in J$; for convenience, we suppose from now on that $I_j=I$ does not depend upon $j$, and that $I$, $J$ are the cardinalities of those sets. Let $S_j$ be the number of successes in column $j$ and $T_j$ the corresponding $\mathrm{Bin}(I,p_j)$ variable, where $Ip_j=\sum_i p_{ij}$; finally $S$ is the sum of the $S_j$, and $T$ that of the $T_j$. Let $\bar p$ be the "mean of means" probability defined by $I\times J\times\bar p=\sum_j Ip_j=\sum_{ij}p_{ij}$, and let $T'$ be the binomial $\mathrm{Bin}(I\times J,\bar p)$.
The previous result gives immediately that $S$ is more concentrated around its mean than $T'$. But $\mathrm{Var}(T')$ is in general greater than $\mathrm{Var}(T)$.
Combining the three results of Section 27.1, we can extend the concentration result, with a loss of only a lag of one unit.
Let $K$ and $K'$ be two equal or consecutive integers bounding $E(T)=I\times J\times\bar p=\sum_j Ip_j=\sum_{i,j}p_{i,j}$. Then we have
$$\forall x<K-1,\ P(S\le x)\le P(T\le x);\qquad \forall z>K'+1,\ P(T\ge z)\ge P(S\ge z), \qquad(27.3)$$
and, of course, for every convex $\varphi$ of $J$ variables,
$$E\varphi(S_1,\ldots,S_J)\le E\varphi(T_1,\ldots,T_J). \qquad(27.4)$$
Then, to bound the level of a test based on the partial sums, we can majorize it under the additional hypothesis that, in every column, $p_{ij}$ does not depend on $i$:

- for tests on the total sum;

- for tests on a convex function of the partial sums, by replacing the exact level by its convex majorant (in particular, for the Large Deviation approximation).

27.3 A Questionnaire Model


$I$ persons answer $J$ questions (independent answers). The model assumption is that the exact response probabilities $p_{i,j}$ could be stochastic but, conditionally on the abilities $\theta_i$, must verify: for a permutation $\sigma$ not depending on $i$, $p_{i,\sigma(j)}$ is monotone in $j$; this implies (with the previous notation) the weaker hypothesis: for a permutation $\sigma$, $p_{\sigma(j)}$ is monotone in $j$.
(In the Rasch model, we suppose that the odds ratio $\lambda=p/(1-p)$ can be written $\lambda_{ij}=\exp(\theta_i-\beta_j)$.)
The good ordering of the questions corresponds to the case $\sigma=\mathrm{id}$. We have one observation $(e_{i,j})$ (independent conditionally on the $\lambda_{i,j}$), with law $B(1,p_{i,j})$ (in the following, we consider the $\lambda$ as deterministic). Let $S_j$ be the score of question $j$. We want to test the hypothesis $\sigma=\mathrm{id}$ against $\sigma\ne\mathrm{id}$. (Actually, as there are as many parameters as observations, we cannot estimate them.) Let then
$$H_0:\ \text{the }p_j\text{ are increasing in }j. \qquad(27.5)$$
Let us consider the associated problem, where the independent $T_j$ follow the respective laws $B(I,p_j)$. Denote by $H_A$ the additional hypothesis
$$H_A:\ p_{i,j}\text{ does not depend on }i. \qquad(27.6)$$
The natural estimates are $\hat p_j=S_j/I$. But under $H_0$ and $H_A$, we can use another estimate, called isotonic. Let $F_j$ be the cumulative sums: $F_0=0,\ \ldots,\ F_j=F_{j-1}+S_j,\ \ldots$. The maximum likelihood estimate under the monotonicity constraint is constructed as follows: let $G_j$ be the greatest convex function smaller than $F_j$. The estimate is defined as
$$\tilde p_j=(G_j-G_{j-1})/I. \qquad(27.7)$$
By the sequence of nodes we mean the sequence of integers where $G$ and $F$ coincide, and by structure of the observation, the statistic
$$\Sigma=(U,\ v_0=0,\ F_0=0,\ v_1,\ F_{v_1},\ \ldots,\ v_U=J,\ F_J) \qquad(27.8)$$
where $U$ is the number of nodes, the $v$ are the positions of these nodes, and $F_v=G_v$ are the values at these nodes.
Among the conceivable tests, we particularly study two, the first of $L^1$ type, the second of Kolmogorov-Smirnov type, based on the scores (a computational sketch is given below):
$$IFL=\sum_j(F_j-G_j),\qquad FL=\sup_j(F_j-G_j) \qquad(27.9)$$
(note that $F_j-G_j\ge 0$ by construction), with rejection sets $IFL\ge C$, $FL\ge C'$.

(A) Let $f_j=EF_j$, $\Delta=\sup_j(F_j-f_j)$, $\delta=\sup_j(f_j-F_j)$. Under $H_0$, $f$ is convex, and then $FL\le\delta+\Delta$. Using the Paul Levy inequalities and the control of the medians, it follows that
$$P(\delta+\Delta>x)\le 4P(F_J-f_J>x-2)+4P(f_J-F_J>x-2)$$
[this upper bound is asymptotically precise; see formula 2 on page 146 of Borodin and Salminen (1996)]. Finally, using the concentration result, it follows, in general, for $x\ge 3$, that
$$P(FL>x)\le 4P\big(T-E(T)\ge x-3\big)+4P\big(T-E(T)\le 3-x\big) \qquad(27.10)$$
where the right-hand side depends only on the $p_j$. We will then get a good approximation by substituting the $\tilde p_j$ or the $\hat p_j$.
This upper bound is valid under $H_0$ but, unfortunately, it is very conservative because the inequality $FL\le\delta+\Delta$ is very crude (in the similar problem of isotonic estimation of a density or a regression, the order of magnitude is not the same).

(B) The statistic $FL$ is convex in the scores. Denote by $P_{(p_j)}$ any probability with means $(p_j)$, and by $P_{(p_j),A}$ the binomial probability with the same means. Then, for any increasing convex $\varphi$, we have
$$E_{P_{(p_j)}}\varphi(FL)\le E_{P_{(p_j),A}}\varphi(FL). \qquad(27.11)$$
We can check by simulation that the right-hand side satisfies an inequality of the form
(27.12)
where $M$ is of order 2. Moreover, using the crude approximation described in the last section, which gives the order of the tail, we can justify that the factor $M$ must be lower than $e$. But the loss of this factor is, again, too large.

(C) Applying the bootstrap, we replace the distribution by the distribution $\tilde P$ (which depends only on $\Sigma$), and not by the distribution $\hat P$.
We simulate under $H_A$ (that is, the binomial simulation), keeping only those observations which verify $\Sigma(Z)=\Sigma(X)$. Denote by $m$ the number of such observations.
Let $\tilde P_{\Sigma,m}$ be the empirical distribution obtained. $\tilde P_{\Sigma,m}(IFL(Z)\ge C)=\alpha$ gives the threshold $C=C(\Sigma,\alpha)$ of a test with level $\alpha$ (and similarly for the test based on $FL$). (We do not try to construct the test when the structure satisfies $U=J+1$, i.e., when the $X_j$, and then the two estimates, are increasing.) We repeat the operation $\ell$ times, under the same probability $p_{ij}$, of course.
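The conditional bootstrap of step (C) can be sketched as follows (our illustration; NumPy assumed; the guard on the number of simulations is ours): binomial columns are drawn from the isotonic estimates, and only the replicates with the observed structure $\Sigma$ are kept.

```python
import numpy as np

def structure(S, I):
    # nodes of the greatest convex minorant, isotonic estimates, and IFL
    F = np.concatenate([[0.0], np.cumsum(S)])
    hull = [0]
    for j in range(1, len(F)):
        hull.append(j)
        while len(hull) >= 3 and \
              (F[hull[-2]] - F[hull[-3]]) * (hull[-1] - hull[-2]) >= \
              (F[hull[-1]] - F[hull[-2]]) * (hull[-2] - hull[-3]):
            del hull[-2]
    G = np.interp(np.arange(len(F)), hull, F[hull])
    sigma = (tuple(hull), tuple(F[hull]))     # node positions and node values (27.8)
    return sigma, np.diff(G) / I, (F - G).sum()

I, rng = 10, np.random.default_rng(0)
S_obs = np.array([3, 2, 6, 5, 9])
sigma_obs, p_tilde, IFL_obs = structure(S_obs, I)

kept = []
for _ in range(200_000):                      # guard on the number of simulations
    if len(kept) == 200:                      # m = 200 matching replicates
        break
    sigma, _, IFL = structure(rng.binomial(I, p_tilde), I)
    if sigma == sigma_obs:                    # keep only Sigma(Z) = Sigma(X)
        kept.append(IFL)
C = np.quantile(kept, 0.95)                   # threshold of the level-5% test
print(C, IFL_obs)
```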

27.4 Simulations about the Level in the Conditional Test Case

Our simulations show that, with equal $p_j$, the level is largely dominated by the level under the additional hypothesis, as the concentration inequality suggests.
Under $H_A$, the bootstrap level is very good, but slightly larger than the theoretical level when the $p_j$ are constant, which is not surprising for specialists, who know well that isotonic estimation is most problematic when the estimand is not strictly monotone. This phenomenon corrects itself when $I$ increases (otherwise, the test is conservative). Let $\alpha$ be the theoretical level and $\hat\alpha$ the bootstrap level.
For $J=4,5,6$, $\ell=400$, $m=200$, with $I=10$, for the two tests, with $p_j$ constant:

$\alpha=5\%$:  $\hat\alpha-\alpha<1.9\%$;
$\alpha=10\%$: $\hat\alpha-\alpha<3\%$;
$\alpha=20\%$: $\hat\alpha-\alpha<4\%$.

27.5 Simulations about the Power under $H_A$

We wanted to compare our two tests to Gaussian tests based on the following well-known asymptotic property: if we let $A_j=2\arcsin\sqrt{X_j/I}$, then $A_j\approx 2\arcsin\sqrt{p_j}+N_j$, where the $N_j$ are approximately independent $N(0,1/I)$, which allows us to test the monotonicity of the means. But, even for $I=40$ or $80$, this approximation turned out to be very bad (for instance, the distribution of $\sup_j(A_j-A_{j+1})_+$, or of the corresponding sums, is roughly false).

27.6 Conclusion
Our present applications are promising, but do not yet include the expected result, which could be:

for tests about the gap between the observation and its isotonic estimate, the maximum level is obtained in the binomial case, and that will allow us to use a test function of the scores only.

References

1. Borodin, A. and Salminen, P. (1996). Handbook of Brownian Motion: Facts and Formulae, Basel: Birkhauser Verlag.

2. Hamza, K. (1995). The smallest uniform upper bound on the distance between the mean and the median of the binomial and Poisson distributions, Statistics and Probability Letters, 23, 21-25.

3. Hoeffding, W. (1956). On the distribution of the number of successes in independent trials, Annals of Mathematical Statistics, 27, 713-721.

4. Samuels, S. (1965). On the number of successes in independent trials, Annals of Mathematical Statistics, 36, 1272-1278.
28
Validation of Model Assumptions in Quality of
Life Measurements

A. Hamon, J. F. Dupuy, and M. Mesbah


University of South Brittany, Vannes, France

Abstract: The measurement of Quality of Life (QoL) has become an increasingly used outcome measure in clinical trials over the past few years. QoL can be assessed by self-rated questionnaires, i.e., a collection of items intended to measure several characteristics of QoL. When new tests are developed, the goal is to produce a valid measure. The measurement process requires constructing a statistical model and then approaching it in the best way. Different statistical "methods," in fact models, are used during the validation process of QoL questionnaires. In this paper, we present the hypotheses underlying each model and the tests used in practice.

Keywords and phrases: Measurement model, reliability, Cronbach alpha coefficient, Rasch model, goodness-of-fit tests

28.1 Introduction
For the past ten years, almost all major clinical trials have included programs
designed to evaluate Quality of Life (QoL). Now that the data from these trials
are being analyzed as part of the results, we are faced with the validity of the
instrument which produced such measures.
"Increasing and sometimes indiscriminate use of QoL measures has provoked
concern about these method in the (health) context, especially when important
consequences, such as treatment decisions or resource allocation, depend on
them" [Cox et al. (1992)]. Generally we can assume that QoL is a complex
notion which can be divided in multiple components. Each component is related
to a specific domain, such as sociability, communication or mobility, and is

371
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
372 A. Hamon, J. F. Dupuy, and M. Mesbah

evaluated using a set of well-chosen items. Hence a QoL instrument is composed


of different subsets of items referred to as dimensions.
The validation process requires constructing a statistical model and then testing its goodness-of-fit. In practice, the validation of a QoL instrument is mainly based on a survey in which the QoL questionnaires are completed by people chosen randomly from the population of interest. The statistical validation generally consists of the following steps: (1) assumption of an underlying stochastic model generating the individual responses to the form; (2) checking the validity of the model, or of some of its specific properties, by examining the real data obtained. This step could be done by an adequate goodness-of-fit test. In practice, the use of goodness-of-fit tests is unfortunately limited; graphical exploratory methods and the computation of some well-known measures of association, mostly correlations, are more popular. Classical methods are based on mixed linear models. We present in Section 28.2 the most popular measurement model used in this field: the parallel model. Modern methods use a mixed generalized linear model, and since 1960 the Rasch model has been the most frequently used measurement model. After presenting this model briefly, various goodness-of-fit tests are presented.

28.2 Classical Theory


We denote by $X_{ij}$ the response of person $i$ $(i=1,\ldots,n)$ to item $j$ $(j=1,\ldots,k)$. One of the most popular models is a mixed one-way model with the subject as random factor:
$$X_{ij}=\mu_j+a_i+e_{ij}. \qquad(28.1)$$
In this equation, the random variables $a_i$ and $e_{ij}$ are uncorrelated. They have expectation 0 and their variances are denoted, respectively, by $\sigma_a^2$ and $\sigma_e^2$. The variable $T_{ij}=\mu_j+a_i$ is called the true measure, so that the observed score is the sum of the true measure and the measurement error $e_{ij}$.
The reliability $\rho$ of the instrument is defined as the ratio of the true measure variance to the observed measure variance. In the model defined by (28.1) we can show that $\rho=\sigma_a^2/(\sigma_a^2+\sigma_e^2)$, which is also the constant correlation between any two items. The reliability coefficient $\rho$ can be easily interpreted as a correlation coefficient between the true and the observed measure. The $k$ regression lines $(X_{ij},T_{ij})$, corresponding to the items $(j=1,\ldots,k)$, are parallel, and $\mathrm{Corr}(X_{ij},T_{ij})=\rho^{1/2}$. The reliability $\tilde\rho$ of the sum of $k$ items is equal to
$$\tilde\rho=\frac{k\rho}{(k-1)\rho+1}. \qquad(28.2)$$
This formula is known as the Spearman-Brown formula; it shows that when the number of items increases, the reliability tends to 1. Its maximum likelihood estimator, under the assumption of a normal distribution of the error and the true score, is known as the Cronbach Alpha Coefficient (CAC) [Kristof (1963)]:
$$\hat\alpha=\frac{k}{k-1}\left[1-\frac{\sum_{j=1}^k S_j^2}{S_{tot}^2}\right] \qquad(28.3)$$
where $S_j^2=\frac{1}{n-1}\sum_{i=1}^n(X_{ij}-\bar X_j)^2$ and $S_{tot}^2=\frac{1}{n-1}\sum_{i=1}^n\big(\sum_{j=1}^k X_{ij}-\bar X\big)^2$, $\bar X$ being the mean of the individual total scores. The CAC can be computed to find the most reliable subset of items [Moret et al. (1993)]. As a first step, all items are used to compute the CAC. Then, at every step, one item is removed from the scale; the removed item is the one whose removal gives the maximum CAC for the remaining scale. This procedure is repeated until only two items remain. If the parallel model is true, it can be shown, using the Spearman-Brown formula, that increasing the number of items increases the reliability of the total score, which is estimated by the Cronbach alpha. Thus, a decrease of such a curve when adding an item could strongly lead us to suspect that the given item is a bad one (in terms of goodness-of-fit of the model).
A supplementary and popular way to assess the influence of an item on the goodness-of-fit of the parallel model is to examine the empirical correlations of each item with the total score (or with the total minus the given item). Under the parallel model, these correlations must be equal; a low correlation indicates a bad item. A minimal computational sketch of the CAC and of the step-by-step procedure is given below; we then present a real data example.
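The sketch below is ours (NumPy assumed; the simulated data matrix is illustrative): it computes the CAC of (28.3) and the backward item-removal curve.

```python
import numpy as np

def cronbach_alpha(X):
    # CAC of (28.3): X is an n x k matrix of item responses
    n, k = X.shape
    return k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum()
                          / X.sum(axis=1).var(ddof=1))

def backward_curve(X):
    # at each step drop the item whose removal maximizes the CAC
    items = list(range(X.shape[1]))
    curve = [(len(items), cronbach_alpha(X))]
    while len(items) > 2:
        drop = max(items, key=lambda j: cronbach_alpha(
            X[:, [i for i in items if i != j]]))
        items.remove(drop)
        curve.append((len(items), cronbach_alpha(X[:, items])))
    return curve

rng = np.random.default_rng(0)
a = rng.normal(size=(200, 1))                           # true scores a_i
X = (a + rng.normal(size=(200, 8)) > 0).astype(float)   # 8 dichotomous items
print(backward_curve(X))                                # (number of items, CAC) pairs
```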

28.3 SIP Mobility Data (I)


Classical validation of a QoL questionnaire will now be illustrated using data from a clinical trial in the field of depression. 483 persons answered the SIP (Sickness Impact Profile) questionnaire, as reported by Bergner et al.; we use here a French translation. The SIP consists of 136 dichotomous items distributed in 12 dimensions. As an example, we apply the described method to the Mobility scale. This dimension contains 10 items. Individuals are asked to respond "Yes" to those items that describe them on the present day. The items of the Mobility dimension and the numbers of persons who answered positively are presented in Table 28.1. In Table 28.2, we give the distribution of the individual scores. The step-by-step procedure with the alpha coefficient explained in the last section is shown in Figure 28.1.
If an instrument is already validated, the curve is monotonically increasing. Figure 28.1 shows a non-increasing curve. We can choose to increase the reliability of the instrument by deleting one item: number 3.
374 A. Hamon, J. F. Dupuy, and M. Mesbah

Table 28.1: Items of the Mobility dimension; n = 466

Item  Contents                                                 Frequency  Proportion
1     I am getting around only within one building                 97       0.21
2     I stay within a room                                         59       0.13
3     I am staying in bed more                                    235       0.50
4     I am staying in bed most of the time                         88       0.19
5     I am not now using public transportation                    133       0.26
6     I stay home most of the time                                296       0.63
7     I am only going to places with rest rooms nearby             20       0.04
8     I am not going into town                                    167       0.36
9     I stay away from home only for brief periods of time        275       0.59
10    I do not get around in the dark or in unlit places           50       0.11
      without someone's help

Table 28.2: Distribution of the individual scores for the Mobility dimension; n = 466

Score      0   1   2   3   4   5   6   7   8   9   10
Frequency  69  56  62  93  74  54  34  19  7   2   0

[Plot of the Cronbach alpha coefficient (y-axis, about 0.66 to 0.72) against the number of items (x-axis, 2 to 10); the curve peaks when item 3 is removed.]

Figure 28.1: Step-by-step procedure with the CAC for the Mobility dimension

28.4 The Rasch Model


The Rasch model assumes that each of $n$ persons, labelled $i$ $(i=1,\ldots,n)$, answers $k$ items, labelled $j$ $(j=1,\ldots,k)$. For each individual and each item, a binary response $X_{ij}$ equal to 1 or 0 (yes or no) is recorded. Rasch (1960) proposed the model
$$P(X_{ij}=1\mid\theta_i,\beta_j)=\frac{\exp(\theta_i-\beta_j)}{1+\exp(\theta_i-\beta_j)} \qquad(28.4)$$
where $\theta_i$ $(\theta_i\in\mathbb{R})$ is an individual parameter representing the individual's level of quality of life, and $\beta_j$ $(\beta_j\in\mathbb{R})$ is known as the difficulty parameter; in the context of quality of life, it can be interpreted as the quality of life level an individual must possess to have a 0.5 probability of answering the item positively. Individuals are assumed to be independent. The variables $X_{ij}$ $(j=1,\ldots,k)$ are locally independent, that is, if $\theta_i$ is held fixed the answers are independent.
One of the possible estimation methods is based on a particular property of the Rasch model, namely the sufficiency of the total score. This is the conditional maximum likelihood (CML) estimation method, first proposed by Andersen (1972). This method is important because many goodness-of-fit tests are based on CML estimates; a small computational sketch is given below. We will now explain how it is possible to test the fit of a set of items to the Rasch model in order to evaluate the validity of the questionnaire.
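As an illustration of CML estimation (ours, not the chapter's program; NumPy/SciPy assumed), the sketch below evaluates the elementary symmetric functions by the usual polynomial recursion and maximizes the conditional likelihood, fixing $\beta_1=0$ for identifiability.

```python
import numpy as np
from scipy.optimize import minimize

def gammas(beta):
    # gamma_0, ..., gamma_k: coefficients of prod_j (1 + eps_j u), eps_j = exp(-beta_j)
    g = np.array([1.0])
    for eps in np.exp(-beta):
        g = np.append(g, 0.0) + eps * np.append(0.0, g)
    return g

def neg_cond_loglik(free, X):
    beta = np.append(0.0, free)           # fix beta_1 = 0 for identifiability
    s = X.sum(axis=1).astype(int)         # raw scores, sufficient for the theta_i
    return -np.sum(-X @ beta - np.log(gammas(beta)[s]))

rng = np.random.default_rng(0)
beta_true = np.array([0.0, -1.0, 0.5, 1.5])
theta = rng.normal(size=(500, 1))
X = (rng.random((500, 4)) < 1.0 / (1.0 + np.exp(-(theta - beta_true)))).astype(float)

res = minimize(neg_cond_loglik, np.zeros(3), args=(X,))
print(np.append(0.0, res.x))              # CML estimates of the difficulties beta_j
```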

28.4.1 Goodness-of-fit tests


First, we must point out that some tests are constructed by remarking that the Rasch model can also be formulated as a log-linear model. In this context, Mellenbergh and Vijn (1981) have presented some goodness-of-fit tests. Relying on another point of view, Tjur (1982) introduced a random-effect Rasch model and related it to a conditional multiplicative Poisson model. This connection also leads to some goodness-of-fit tests.
There exists an extensive literature on goodness-of-fit tests in the Rasch model [see Glas and Verhelst (1995) for a good review]. Since the statistical properties of tests (asymptotic distribution, power) are not always well established, we will present only some of the well-established tests. For all these goodness-of-fit tests, the null hypothesis is that the data are realizations of random variables distributed according to the Rasch model.
The first test we report was proposed by Andersen (1973) and is based on the following properties of the Rasch model. Firstly, the raw score $S_i=\sum_j X_{ij}$ is a sufficient statistic for the individual parameter $\theta_i$. Secondly, the CML

estimators are consistent and asymptotically normally distributed. Finally, the CML estimators based on the sub-sample of individuals who achieved score $s$ $(s=1,\ldots,k-1)$ are consistent and asymptotically normally distributed when $n_s\to\infty$ [Andersen (1973)]. The conditional likelihood, denoted by $L_c(x,\beta)$, can be written
$$L_c(x,\beta)=\prod_{i=1}^n\frac{\exp\big(-\sum_{j=1}^k x_{ij}\beta_j\big)}{\gamma_{s_i}} \qquad(28.5)$$
where the coefficients $\gamma_s$ are defined by
$$\gamma_s=\sum_{y\in\{0,1\}^k:\ \sum_j y_j=s}\exp\Big(-\sum_{j=1}^k y_j\beta_j\Big). \qquad(28.6)$$
The CML estimators will be denoted by $\hat\beta_c$. The conditional likelihood based on the sub-sample of persons with score $s$ will be denoted by $L_c^{(s)}(x,\beta)$, and the associated maximum likelihood estimators by $\hat\beta_c^{(s)}$. Andersen (1973) proposed a test based on the ratio of the estimated likelihood $L_c(x,\hat\beta_c)$ and the product of the sub-sample likelihoods $L_c^{(s)}(x,\hat\beta_c^{(s)})$. The test statistic is
$$Z=2\sum_{s=1}^{k-1}\log L_c^{(s)}(x,\hat\beta_c^{(s)})-2\log L_c(x,\hat\beta_c). \qquad(28.7)$$

Theorem 28.4.1 [Andersen (1973)] If $n_s\to\infty$ for $s=1,\ldots,k-1$ and if the assumptions of the Rasch model hold, then
$$Z\xrightarrow{\ d\ }\chi^2_{(k-1)(k-2)}. \qquad(28.8)$$

This theorem implies that only when all the sizes $n_s$ of the sub-samples of persons with score $s$ are sufficiently large can we consider that the distribution of $Z$ is well approximated by a $\chi^2$ distribution. These requirements are generally not achieved in practice: most often, the number of persons with a low or a large score is too small. To take this problem into account, Andersen (1973) proposed to group the values of $s$ to obtain larger and more homogeneous sub-samples. Let $s_0,\ldots,s_r$ be $r+1$ integers $(r>1)$ such that $s_0=0<s_1<\cdots<s_{r-1}<s_r=k-1$. Then we group in a first sub-sample all the persons who achieved a score between 1 and $s_1$; in another sub-sample we group the individuals with scores between $s_1+1$ and $s_2$, and so on. In each sub-sample $l$, we can obtain CML estimators $\hat\beta_c^{(l)}$. Andersen proves that, with slight modifications, the previous theorem remains true. Now the test statistic is
$$Z_1=2\sum_{l=1}^{r}\ \sum_{s=s_{l-1}+1}^{s_l}\log L_c^{(s)}(x,\hat\beta_c^{(l)})-2\log L_c(x,\hat\beta_c) \qquad(28.9)$$
where $n_l$ is the number of individuals with a score between $s_{l-1}+1$ and $s_l$. If $n_l\to\infty$ for $l=1,\ldots,r$, then the statistic $Z_1$ is asymptotically $\chi^2$-distributed with $(r-1)(k-1)$ degrees of freedom. Andersen (1973) considers the power of this statistic against the specific alternative of the two-parameter logistic model. In this model, the probability of a positive answer depends on two parameters:
$$P(X_{ij}=1\mid\theta_i,\beta_j)=\frac{\exp\{a_j(\theta_i-\beta_j)\}}{1+\exp\{a_j(\theta_i-\beta_j)\}}. \qquad(28.10)$$
Here $a_j$ is called the discrimination parameter; it allows the items' characteristic curves to have different slopes. A special case is the Rasch model, in which all the discrimination parameters are equal.
Glas (1988) proposed two $\chi^2$ goodness-of-fit tests. The first test, denoted by $R_1$ and called the first-order test, is based on the differences between the observed and expected numbers of individuals with score $s$ who achieved item $j$. The second test ($R_2$, second-order test) is constructed with the numbers of persons who answer positively to two questions $j$ and $l$ $(j\ne l)$.
Let $N_{sj}$ be the number of persons with score $s$ who respond 1 to item $j$. Instead of computing the expectation of $N_{sj}$, Glas (1988) proposed to compute the conditional expectation $E(N_{sj},\beta\mid N_s=n_s)$. After some computations, we can show that
$$E(N_{sj},\beta\mid N_s=n_s)=\frac{\gamma_{k-1}(s-1)\exp(-\beta_j)}{\gamma_k(s)}\sum_{i=1}^n P(S_i=s\mid N_s=n_s), \qquad(28.11)$$
where $\gamma_{k-1}(s-1)$ is the analogue of (28.6) of order $s-1$ for the $k-1$ items other than $j$, and $\gamma_k(s)=\gamma_s$. This expectation is estimated by
$$\hat E(N_{sj})=E(N_{sj},\hat\beta_c\mid N_s=n_s). \qquad(28.12)$$
Let $d_s=(d_{sj})_{j=1,\ldots,k}$ be the vector of the differences $n_{sj}-E(N_{sj},\hat\beta_c\mid N_s=n_s)$ and $\hat V_s$ its estimated variance-covariance matrix.

Theorem 28.4.2 [Glas (1988)] Let $R_1$ be defined by
$$R_1=\sum_{s=1}^{k-1}d_s^{\top}\hat V_s^{-}d_s, \qquad(28.13)$$
where $\hat V_s^{-}$ is a generalized inverse of $\hat V_s$. If, for all $s$ $(1\le s\le k-1)$, the size of the sub-sample of persons with score $s$ tends to infinity, then the statistic $R_1$ is asymptotically $\chi^2$-distributed with $(k-1)(k-2)$ degrees of freedom.

The same argument with the numbers of persons who answer positively to two items $l$ and $j$ leads to the construction of the $R_2$ statistic. This statistic has an asymptotic $\chi^2$ distribution with $k(k-1)/2-2(k-1)$ degrees of freedom.
In a validating process, we aim at detecting bad items (in the sense that they do not fit well), and sometimes also subsets of items that fit the model. Molenaar (1983) proposes a graphical method to select homogeneous subsets of items. We briefly present this interesting method in the next section.

28.4.2 A graphical method


The graphical method suggested by Molenaar (1983) serves to identify items weakly related to the latent trait and subscales of items measuring the same latent trait. We briefly describe this method and then apply it in the following section.
The method is based on the following consideration: the CML estimates of the item difficulties being independent of the individual parameters, the same difficulty estimates should be obtained, apart from random fluctuations, for a division of the respondents into two subgroups according to their response to a particular item. The principle of the method is as follows:

(a) for each particular item $i$ in its turn, separate the respondents into two subgroups (respectively denoted by $G_0$ and $G_1$) according to their response (respectively no and yes) to the chosen item $i$;

(b) estimate the difficulties of the other items $j$ $(j\ne i)$ in the two groups $G_0$ and $G_1$;

(c) plot the estimates of the difficulties in $G_1$ against the estimates of the difficulties in $G_0$.

If the items $i$ and $j$ are locally independent, the position of item $j$ should lie close to the first diagonal. If there exists some dependence between these two items, item $j$ will lie below or above the diagonal.

28.5 SIP Mobility Data (II)


One popular graphical method to assess the monotonicity of the item response function is to plot the item traces (Figure 28.2). The item trace is the graph of the proportion of success against the corresponding raw score. If the Rasch model is valid and if the raw score is replaced by the true latent trait $\theta$, these curves are logistic and "parallel," i.e., shifted.
To compute the goodness-of-fit tests presented before, we use the software RSP [Glas and Ellis (1993)]. For all these tests, the split of the sample based on raw scores is given in Table 28.3. The likelihood ratio test provides a significant result ($Z_1=95$ with 27 degrees of freedom; the p-value is less than 0.0001), indicating that the assumptions of the Rasch model do not hold for the studied
[Plot of the item traces of the 10 Mobility items: percentage of positive answers (0 to 100) against the raw score; the trace of item 7 lies well below the others.]

Figure 28.2: Traces of the Mobility dimension items

Table 28.3: Division into 4 subgroups; n = 466

Group  Scores  Size
1      1 to 2  118
2      3        93
3      4        70
4      5 to 9  116
data. The $R_1$ test is also significant. This test is based on the observed and expected frequencies of the positive answers to each item in each subgroup. In Table 28.4, we present these numbers for item 3, because it is the worst item of all in the sense of a large difference between observed and expected frequencies. If this item is removed, the statistic $R_1$ is no longer significant: its value is 20 and the asymptotic distribution is a $\chi^2$ with 24 degrees of freedom (p-value = 0.68). With this set of 9 items, the $R_2$ test is highly significant, indicating that at least two items of the questionnaire are not locally independent. Applying the described graphical method leads to the following conclusions. Separation of the respondents is first made on item 2 of the scale.

Table 28.4: Expected and observed frequencies of positive answers to item 3 in each subgroup

Group  Expected  Observed
1      31.7      58
2      52.3      59
3      50        37
4      101       81

[Scatter plot of the difficulty estimates in G1 against those in G0 after splitting on item 2; most items lie near the first diagonal. X-axis: difficulty estimates in group G0.]

Figure 28.3: Difficulty estimates in each group formed by the individuals who answer positively to item 2 (G1) and negatively to item 2 (G0)

Figure 28.3 shows that item 1 appears to be dependent on item 2; it appears to be more difficult for people scoring 0 on item 2 than for people scoring 1 on item 2. To a lesser extent, item 10 seems to be correlated with item 2. The other items lie close to the diagonal, indicating that the response to each of them is independent of the response to item 2. This method is repeated for each item of the scale. We present one more example, which involves separation on item 10, in order to check the conclusions drawn from the previous graph.

[Scatter plot of the difficulty estimates in G1 against those in G0 after splitting on item 10; item 2 lies away from the diagonal while the other items lie close to it. X-axis: difficulty estimates in group G0.]

Figure 28.4: Difficulty estimates in each group formed by the individuals who answer positively to item 10 (G1) and negatively to item 10 (G0)

Item 10 is used for the split and gives Figure 28.4. Item 2 appears to be correlated with item 10, whereas there is no indication that the other items depend on the response to item 10.
All the items appear to be locally independent of item 8 (the figure is not presented here). Similar plots are obtained when using items 1, 4, 5, 6, 7 and 9 to separate the respondents. Hence the Mobility scale contains a group of three items correlated with each other: {1, 2, 10}. In order to obtain a Rasch-homogeneous instrument, we choose one item from {1, 2, 10} and form a scale by gathering this item and the items {4, 5, 6, 7, 8, 9}.
To make the choice, we calculate the likelihood ratio test. We check that it is not significant for each of the three possible sets of items. As the scale for Mobility, we choose the set for which the level of significance of the test is the largest.
28.6 Conclusion

A century after the famous Karl Pearson paper, the fields of application where goodness-of-fit tests are useful keep increasing. The validation of quality of life questionnaires is undoubtedly one of them. Nevertheless, checking the underlying properties of those ideal measurement models is, in practice, often done by simple graphics or by way of simple statistics.

References

1. Andersen, E. B. (1973). Asymptotic properties of conditional maximum likelihood estimators, Journal of the Royal Statistical Society, Series B, 32, 283-301.

2. Cox, D. R., Fitzpatrick, R., Fletcher, A. E., Gore, S. M., Spiegelhalter, D. J., and Jones, D. R. (1992). Quality-of-life assessment: can we keep it simple?, Journal of the Royal Statistical Society, Series A, 155, 353-393.

3. Glas, C. A. W. (1988). The derivation of some tests for the Rasch model from the multinomial distribution, Psychometrika, 53, 525-546.

4. Glas, C. A. W. and Ellis, J. (1993). Rasch Scaling Program: User's Manual Guide, iec ProGAMMA, Groningen, The Netherlands.

5. Glas, C. A. W. and Verhelst, N. D. (1995). Testing the Rasch model, In Rasch Models: Foundations, Recent Developments and Applications (Eds., G. Fischer and I. Molenaar), New York: Springer-Verlag.

6. Kristof, W. (1963). The statistical theory of stepped-up reliability coefficients when a test has been divided into several equivalent parts, Psychometrika, 28, 221-238.

7. Mellenbergh, G. J. and Vijn, P. (1981). The Rasch model as a loglinear model, Applied Psychological Measurement, 5, 369-376.

8. Molenaar, I. W. (1983). Some improved diagnostics for failure of the Rasch model, Psychometrika, 48, 49-72.

9. Moret, L., Mesbah, M., Chwalow, J., and Lellouch, J. (1993). Validation statistique interne d'une echelle de mesure : relation entre analyse en composantes principales, coefficient alpha de Cronbach et coefficient de correlation intra-classe, Revue d'Epidemiologie et de Sante Publique, 41, 179-186.

10. Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests, Danmarks Paedagogiske Institut, Copenhagen.

11. Tjur, T. (1982). A connection between Rasch's item analysis model and a multiplicative Poisson model, Scandinavian Journal of Statistics, 9, 23-30.
PART VIII
TESTS OF HYPOTHESES AND ESTIMATION
WITH APPLICATIONS
29
One-Sided Hypotheses in a Multinomial Model

Richard M. Dudley and Dominique M. Haughton


Massachusetts Institute of Technology, Cambridge, Massachusetts
Bentley College, Waltham, Massachusetts

Abstract: Several independent data sets are represented as coordinates of one


i.i.d. data set. We consider model selection for several 2 x 2 contingency tables
including models with and without a common odds ratio and one-sided models
such as models where a treatment is beneficial. We use Jeffreys priors on each
model. For studies of long-term treatment with aspirin after a heart attack, we
find that the treatment appeared to be beneficial if it began within six months
after the heart attack; if begun later it had no effect on mortality.

Keywords and phrases: Information criteria, Jeffreys prior, aspirin

29.1 Introduction
In this paper we continue [Dudley and Haughton (1997)] the study of model selection based on an extended Schwarz (1978) BIC method for multiple data sets, where individual models may be half-spaces or half-lines.
Multiple clinical trials of the same treatment against placebo give multiple data sets, where the true or pseudo-true parameter values may differ between studies. We also consider models where one parameter, a common odds ratio, is the same for all data sets, while others vary. Some models of interest are one-sided; specifically, models where the treatment is beneficial or harmful. For one treatment, the long-term use of aspirin after a heart attack (myocardial infarction, or MI), six of seven individual studies did not show a significant effect of aspirin on overall mortality at the 5% level, and one showed a mildly significant benefit. The AMIS (1980) study had the largest sample size and found a (non-significant) negative effect for aspirin. Several meta-analyses of the mortality data have been done, e.g. Canner (1987), DerSimonian and Laird (1986), and Gaver et al. (1992), which included 6 of the 7 studies, but not Vogel, Fischer, and Huyke (1979), the one study showing a significant benefit of aspirin. In summary, the meta-analyses found a point estimate showing an overall benefit for aspirin, for example with 10.7% mortality in the placebo groups and 9.9% mortality in the aspirin groups [Canner (1987)], but with a 95% confidence interval for the common odds ratio which contains 1, or not, by a narrow margin, depending on the specifics of each analysis. We will do an apparently new kind of analysis, based on a mixture model device and Jeffreys priors, and considering not only the full data but also the data separated according to whether treatment began within 6 months after the MI [cf. Canner (1987)]. We will need to keep in mind possible "multiple comparisons" issues.

29.2 Putting Multiple Data Sets Into an i.i.d. Form


Let $P_\theta$, $\theta\in\Theta$, be laws on a sample space $X$, where each $P_\theta$ has a density $f(\theta,x)$, $\theta\in\Theta$, $x\in X$, with respect to a $\sigma$-finite measure $\mu$ on $X$. For $k=1,\ldots,K$ let $X_{k1},\ldots,X_{kn(k)}$ be i.i.d. $P_{\theta(k)}$ for some $\theta(k)\in\Theta$. For different $k$, let the data sets be independent. Let $N:=n(1)+\cdots+n(K)$. Let's see how the $X_{ki}$ can be taken to be coordinates of $N$ i.i.d. observations. For $r=1,\ldots,K$ let $e_r$ be the $r$th standard unit vector in $\mathbb{R}^K$, so that $(e_r)_k:=\delta_{rk}:=1_{\{r=k\}}$. Let $Z$ be the set of the $K$ points $e_r$. Let $\nu$ be counting measure on $Z$. Consider the family of laws on $X\times Z$ having densities with respect to $\mu\times\nu$ given by
$$g(x,z):=\prod_{k=1}^K\big[v_k f(\theta(k),x)\big]^{z_k} \qquad(29.1)$$
where $0^0:=1$, $\{v_k\}_{k=1}^K\in V_K:=\{\{v_i\}_{i=1}^K:\ v_i\ge 0,\ \sum_{i=1}^K v_i=1\}$, and $z=(z_1,\ldots,z_K)\in Z$. If we take $N$ i.i.d. observations $(x_i,z^{(i)})$ from such a distribution, then the numbers $n_k:=n(k)$ of observations having $z^{(i)}=e_k$ for $k=1,\ldots,K$ have a multinomial $(N,v_1,\ldots,v_K)$ distribution, and the observations $x_i$ with $z^{(i)}=e_r$ will be i.i.d. $P_{\theta(r)}$ for each $r$. The likelihood function (29.1) is the special case, where all observations are "categorized," of the mixture model given e.g. by Titterington, Smith, and Makov (1985, p. 3, (1.2.3)).
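A short simulation (ours; NumPy assumed, with Bernoulli laws standing in for the $P_\theta$) makes the correspondence concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
v = np.array([0.3, 0.5, 0.2])        # multinomial weights v_k (illustrative)
theta = np.array([0.2, 0.5, 0.8])    # theta(k): Bernoulli laws standing in for P_theta
N = 10_000
z = rng.choice(v.size, size=N, p=v)  # labels z(i); the counts n_k are multinomial(N, v)
x = rng.binomial(1, theta[z])        # given z(i) = e_k, the x_i are i.i.d. P_theta(k)
print(np.bincount(z) / N, [x[z == k].mean() for k in range(v.size)])
```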

29.3 Model Selection Criteria


Let $\mathcal{P}=\{P_\theta,\ \theta\in\Theta\}$, where $\Theta$ is an open set in a Euclidean space. Let $m_1,\ldots,m_J$ be a finite set of models included in $\Theta$, where $m_1=\Theta$. For $k=1,\ldots,K$, the $k$th independent data set $X(k):=(X_{k1},\ldots,X_{kn_k})$ will consist of $n_k$ observations i.i.d. with a law $P^{(k)}$ which may not be in $\mathcal{P}$. We assume that the maximum likelihood estimate $\hat\theta_k$ of $\theta\in\Theta$ based on $X(k)$, as $n_k\to\infty$, eventually exists almost surely and converges to some $\theta_{0k}$, the pseudo-true value of $\theta$ for $P^{(k)}$ [Berk (1966), Huber (1967) and Poskitt (1987)]. We will need some regularity conditions on the models $m_j$ [Dudley and Haughton (1997, 2000)]. One is that each $m_j$ is either a manifold, or a manifold-with-boundary included in some $m_i$ where, in a suitable parameterization, $m_i$ is an open set in a Euclidean space and $m_j$ is a half-space intersected with $m_i$.
We call a model $m_j$ the best model if it is the unique smallest model (in the sense of inclusion) containing $\theta_{0k}$ for $k=1,\ldots,K$. A best model will always exist if for any $i$ and $j=1,\ldots,J$, $m_i\cap m_j=m_r$ for some $r$, as will hold for the models we consider. Our object is to choose the best model $m_j$, under loss functions which may differ for different wrong choices of models.
Well-known model selection criteria are based on penalized maximum likelihood. Let $ML_k(m_j)$ be the supremum of the likelihood for the $k$th data set over $\theta\in m_j$. Let $MLL_{kj}:=\log ML_k(m_j)$. Suppose first that there is only one data set $(K=1)$ and all the models are manifolds. One chooses the model for which $MLL_{1j}-t_{1j}$ is largest, where the penalty $t_{1j}$ increases with the dimension $d_j$ of $m_j$. In the BIC of G. Schwarz (1978), $t_{1j}=d_j(\log n)/2$, where $n=n_1$ is the sample size. It has been shown under some conditions [Haughton (1988) and Poskitt (1987)] that as $n_1\to\infty$, the BIC is consistent, i.e. the probability of choosing the best model converges to 1. Moreover, the BIC is equivalent to choosing the model having the highest value of the leading terms in the logarithm of its posterior probability of being the best model, under any prior probabilities having strictly positive, continuous densities. The leading terms do not involve the particular choices of priors.
The BIC has not been found to work optimally, so one may look beyond the leading terms. In our cases the well-known Jeffreys (1946) prior measure $J_j$ exists on each $m_j$. The density of $J_j$ is the square root of the determinant of the Fisher information matrix, independently of the parameterization; see also Kass (1989). If $0<J_j(m_j)<\infty$, as is true in our cases, then the Jeffreys prior probability on $m_j$ will be called $\mu_j:=J_j/J_j(m_j)$.
In the hierarchical Bayes method of meta-analysis, e.g. DerSimonian and Laird (1986) and Gaver et al. (1992), a parameter $\delta_i$ gives an effect size, e.g. a difference in mortality rates, for the treatment vs. placebo in the $i$th study, where the $\delta_i$ are i.i.d. with a hyper-parametric prior. Such priors may be more flexible and realistic, but we use Jeffreys priors to avoid subjective choices.
To approximate posterior probabilities we will need to approximate integrals $\int \mathrm{lik}^{(n)}\,d\mu_j$ for $k=1,\ldots,K$, $j=1,\ldots,J$, where $\mathrm{lik}^{(n)}$ is a likelihood function. Here $n$ may be $n_k$ for some $k$, when $\mathrm{lik}^{(n)}$ is the likelihood function for the $k$th data set, or $n$ may be the total sample size $N=n_1+\cdots+n_K$ in (29.1). If $m_j$ is a manifold, and $ML(m_j):=\sup_{m_j}\mathrm{lik}^{(n)}$, then we use the approximation
$$\int_{m_j}\mathrm{lik}^{(n)}\,d\mu_j\approx ML(m_j)\,(2\pi/n)^{d_j/2}\big/J_j(m_j) \qquad(29.2)$$
given by a sharpened Laplace method, which holds with a relative error of $O_p(n^{-1})$ [Poskitt (1987)], if the pseudo-true parameter $\theta_0$ is in $m_j$ and under
some further hypotheses. If $\theta_0\notin m_j$, both the actual integral and its approximant become exponentially small in $n$ relative to the integral over any $m_i$ containing $\theta_0$ [Haughton (1988, 1989), Poskitt (1987) and Dudley and Haughton (2000)]. The simple and accurate approximation (29.2) applies if and only if $\mu_j$ is the Jeffreys prior. The factors $n^{-d_j/2}ML(m_j)$ correspond to the BIC; the remaining constant factors provide the sharpening.
If $m_j$ is a half-space intersected with a parameterized manifold $m_i$, and $m_r$ is the boundary of $m_j$, so that $d_j=d_i=d_r+1$, then $J_j$ is $J_i$ restricted to $m_j$. Let $MLL_u:=\sup_{m_u}\log\mathrm{lik}^{(n)}$. We use the approximation
$$\int_{m_j}\mathrm{lik}^{(n)}\,d\mu_j\approx 2\,\Phi(s_{ij}x_{ir})\int_{m_i}\mathrm{lik}^{(n)}\,d\mu_i \qquad(29.3)$$
where $\Phi$ is the standard normal distribution function, $s_{ij}=1$ if $MLL_j=MLL_i$ and $s_{ij}=-1$ if $MLL_j<MLL_i$, and $x_{ir}=[2(MLL_i-MLL_r)]^{1/2}$. According to Dudley and Haughton (2000), either (29.3) is valid with a relative error approaching 0 as $n\to\infty$, or $\int_{m_j}\mathrm{lik}^{(n)}d\mu_j\big/\int_{m_u}\mathrm{lik}^{(n)}d\mu_u$ approaches 0 exponentially for any $m_u$ containing the pseudo-true $\theta_0$.
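The half-space factor in (29.3) is a one-liner to compute; for instance (our sketch; SciPy assumed, with illustrative log-likelihood values):

```python
from math import sqrt
from scipy.stats import norm

def half_space_factor(MLL_i, MLL_j, MLL_r):
    # 2 * Phi(s_ij * x_ir) of (29.3); m_r is the boundary of the half-space m_j in m_i
    s = 1.0 if MLL_j == MLL_i else -1.0
    x = sqrt(2.0 * (MLL_i - MLL_r))
    return 2.0 * norm.cdf(s * x)

print(half_space_factor(-100.0, -100.0, -101.5))  # unconstrained optimum inside m_j
print(half_space_factor(-100.0, -101.5, -101.5))  # optimum on the other side of m_r
```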

29.4 Application to 2 × 2 Contingency Tables

We consider seven models $M_j(t)$, $j=1,\ldots,7$, for $K$ independent $2\times 2$ contingency tables, $k=1,\ldots,K$. In each model $t=(t_1,\ldots,t_K)$ where $0<t_k<1$ for each $k$. For our application, the probabilities in the $k$th table will be written as

              Died            Survived            Total
Treatment     $r_kt_k$        $(1-r_k)t_k$        $t_k$
Control       $p_k(1-t_k)$    $(1-p_k)(1-t_k)$    $1-t_k$

The treatment probability $t_k$ was set by design as $t_k=1/2$ in six of the seven studies we consider; in the PARIS study, $t_k=2/3$. In $M_1(t)$, $p_k$ and $r_k$ are unrestricted in $[0,1]$. The other models $M_j(t)$ will be subsets of $M_1(t)$. In $M_2(t)$, $0\le p_k\le r_k\le 1$ for each $k$: the treatment is not beneficial. In $M_3(t)$, $0\le r_k\le p_k$ for each $k$: the treatment is not harmful. In $M_j(t)$, $j=4,5,6,7$, $0<p_k<1$, $0<r_k<1$ for each $k$ and there is a common odds ratio $\psi$, i.e.
$$\frac{r_k(1-p_k)}{p_k(1-r_k)}=\psi,\qquad k=1,\ldots,K. \qquad(29.4)$$
$M_7(t)$ will be the full common odds ratio model where $0<\psi<\infty$. $M_4(t)$, $M_5(t)$ and $M_6(t)$ will be the submodels of $M_7(t)$ where $\psi=1$, $\psi\ge 1$ and $\psi\le 1$, respectively. Thus $M_4(t)$ is the null hypothesis that the treatment has no effect. Since $M_7(t)=M_5(t)\cup M_6(t)$, $M_7(t)$ can never be the best model.

The models $M_j(t)$ for $K=1$ will be called $m_j(t)$, where $0<t=t_1<1$ in this case. Then $m_1(t)=m_7(t)$, $m_2(t)=m_5(t)$, and $m_3(t)=m_6(t)$. The total Jeffreys prior measures of these models are, for $0<t<1$,
$$J_4(m_4(t))=\pi,\qquad J_1(m_1(t))=\pi^2[t(1-t)]^{1/2}, \qquad(29.5)$$
$$J_2(m_2(t))=J_5(m_5(t))=J_3(m_3(t))=J_6(m_6(t))=J_1(m_1(t))/2. \qquad(29.6)$$
Each model $M_4(t)$ is a flat hyperplane if $\psi$ is a coordinate parameter for $M_7(t)$. Thus $M_5(t)$ and $M_6(t)$ are half-spaces in $M_7(t)$.

29.5 Common Odds Ratio Profile Likelihoods


Suppose there is a common odds ratio $\psi$ as in (29.4). First let $K=1$, $p:=p_1$, and $y:=p/(1-p)$. In terms of $y$ and $\psi$, we find that the likelihood function for a $2\times 2$ table with entries $n_{ij}>0$, $i,j=1,2$, with a dot representing summation over an index, is
$$\psi^{n_{11}}\,y^{n_{\cdot 1}}\Big[\frac{1-t}{1+y}\Big]^{n_{2\cdot}}\Big[\frac{t}{1+\psi y}\Big]^{n_{1\cdot}}$$
in the $m_1(t)$ model. For fixed $\psi$ and $t$, the likelihood function clearly is maximized where its derivative equals 0, giving a quadratic equation in $y$ [Yu (1992, eq. (2.4))]. We take the unique positive root, which is the MLE of $y$ for fixed $\psi$. We will get a $y_k$ for each data set, not depending on $t=t_k$. Inserting $y_k$ as a function of $\psi$ given the data $n_{ij}$ for each $k$ into the likelihood function and multiplying over $k$, we get a function $p\ell(\cdot)$ of one variable, $\psi$, the profile likelihood, that has to be maximized to find the MLE $\hat\psi$ for our model. Here $\hat\psi$ also does not depend on $t_1,\ldots,t_K$.
Suppose that $p\ell$ is unimodal, as it was in our cases. Then maximizing it on a computer is straightforward. To get, approximately, a 95% credible interval for $\psi$, i.e. an interval whose posterior probability conditional on $M_7(t)\setminus M_4(t)$ is 0.95, we seek $\psi_0$ and $\psi_1$ such that the posterior probabilities of $\psi\le\psi_0$ and $\psi\ge\psi_1$ are each 0.025. We can do this, with good asymptotic accuracy [Dudley and Haughton (2000)], by finding $\psi_0<\hat\psi<\psi_1$ such that $\log p\ell$ at $\psi_0$ and at $\psi_1$ is smaller than at $\hat\psi$ by $1.92\approx 1.96^2/2$. We do not know a proof that $p\ell$ is unimodal under general conditions. In our cases it appeared to be logarithmically concave on an interval starting at 0 and containing $\hat\psi$, but logarithmically convex for larger $\psi$.
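This computation is easy to script. The sketch below (ours, not the authors' program; NumPy/SciPy assumed) profiles out $y_k$ table by table via the quadratic stationarity equation, maximizes $\log p\ell$, and locates the endpoints where it drops by 1.92; the three tables used are those of studies a, b and c of Section 29.8.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

def y_hat(psi, ndead, ntreat, ncontrol):
    # positive root of the quadratic stationarity equation for y, given psi
    a = psi * (ndead - ntreat - ncontrol)
    b = ndead * (1 + psi) - ncontrol - psi * ntreat
    return (-b - np.sqrt(b * b - 4 * a * ndead)) / (2 * a)   # a < 0, c = ndead > 0

def log_pl(psi, tables):
    # log profile likelihood of psi, the y_k profiled out table by table
    out = 0.0
    for n11, ndead, ntreat, ncontrol in tables:
        y = y_hat(psi, ndead, ntreat, ncontrol)
        out += (n11 * np.log(psi) + ndead * np.log(y)
                - ncontrol * np.log1p(y) - ntreat * np.log1p(psi * y))
    return out

# (aspirin deaths, total deaths, aspirin total, placebo total): studies a, b, c
tables = [(49, 114, 615, 624), (85, 137, 810, 406), (49, 120, 672, 668)]
res = minimize_scalar(lambda u: -log_pl(np.exp(u), tables),
                      bounds=(-3.0, 3.0), method="bounded")
psi_hat, top = np.exp(res.x), -res.fun
drop = lambda p: log_pl(p, tables) - (top - 1.92)
print(psi_hat, (brentq(drop, 1e-3, psi_hat), brentq(drop, psi_hat, 1e3)))
```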

29.6 Jeffreys Priors for Mixture Models


Let $\overline M_1(t)$ be the model given by (29.1) where the parameters $\theta(k)$, $k=1,\ldots,K$, are those of $M_1(t)$, namely $\theta(k)=(p_k,r_k)$, $k=1,\ldots,K$. Likewise, for $j=2,\ldots,7$ let $\overline M_j(t)$ be the model included in $\overline M_1(t)$ whose parameters $\theta(k)$ are those of $M_j(t)$. For each model $\overline M_j(t)$, the Fisher information matrix has the block structure
$$F_j=\begin{pmatrix}C_j & 0\\ 0 & V\end{pmatrix}$$
where $V$ is the $(K-1)\times(K-1)$ Fisher information matrix for the multinomial family $(1,v_1,\ldots,v_K)$ with the parameters $v_1,\ldots,v_{K-1}$. Thus, with $v_K\equiv 1-v_1-\cdots-v_{K-1}$, we have $V_{kk}=v_k^{-1}+v_K^{-1}$ for $k=1,\ldots,K-1$ and $V_{kr}=V_{rk}=v_K^{-1}$ for $1\le k<r\le K-1$. We have $\det V=1\big/\prod_{k=1}^K v_k$, as can be seen by way of the parameters $w_k=v_k^{1/2}$: the Jeffreys prior measure for the multinomial $(1,w_1^2,\ldots,w_K^2)$ family is $2^{K-1}w_K^{-1}\,dw_1\cdots dw_{K-1}=(\prod_{k=1}^K v_k)^{-1/2}\,dv_1\cdots dv_{K-1}$, which is the surface area measure on the positive orthant of the sphere $w_1^2+\cdots+w_K^2=1$, e.g. Kass (1989).
The matrix $C_1=C_2=C_3$ is a $2K\times 2K$ diagonal matrix with diagonal entries $v_kt_k/[r_k(1-r_k)]$ and $v_k(1-t_k)/[p_k(1-p_k)]$ for $k=1,\ldots,K$. So
$$(\det C_1)^{1/2}=\prod_{k=1}^K v_k\,[t_k(1-t_k)]^{1/2}\big/\{p_k(1-p_k)r_k(1-r_k)\}^{1/2}.$$
Thus the Jeffreys prior measure on $\overline M_j(t)$ for $j=1,2$, or 3 is a product measure on a product of $K+1$ spaces, one being $V_K$ and the others being squares ($j=1$) or triangles ($j=2,3$) with coordinates $(p_k,r_k)$, $k=1,\ldots,K$. Since the likelihood function has the same product form, so does any posterior distribution. Each triangle is the intersection of a half-plane with the corresponding square. Thus the results of Dudley and Haughton (2000) apply for each $k$. We have
$$\overline J_1(\overline M_1(t))=\int_{V_K}\int_{[0,1]^{2K}}\prod_{k=1}^K v_k^{1/2}\,[t_k(1-t_k)]^{1/2}\,\{p_k(1-p_k)r_k(1-r_k)\}^{-1/2}\,dp\,dr\,dv.$$
By symmetry, if the integral is taken only over the region where $r_k\le p_k$ for each $k$, for $\overline M_3(t)$, it is multiplied by $2^{-K}$, and likewise for the $\overline M_2(t)$ region. By normalization of Dirichlet distributions, e.g. Johnson and Kotz (1972, Chap. 40, Sec. 5), the total Jeffreys prior measures of $\overline M_j(t)$ for $j=1,2,3$ are
$$\overline J_1(\overline M_1(t))=\pi^{2K}\,\frac{\Gamma(3/2)^K}{\Gamma(3K/2)}\prod_{k=1}^K[t_k(1-t_k)]^{1/2}, \qquad(29.7)$$
$$\overline J_2(\overline M_2(t))=\overline J_3(\overline M_3(t))=2^{-K}\,\overline J_1(\overline M_1(t)). \qquad(29.8)$$
The matrix $C_4$ for $\overline M_4(t)$ is a $K\times K$ diagonal matrix with entries $v_k/[p_k(1-p_k)]$, $k=1,\ldots,K$, so the total Jeffreys prior measure of $\overline M_4(t)$ is
$$\overline J_4(\overline M_4(t))=\pi^K/(K-1)!. \qquad(29.9)$$

For the common odds ratio model $\overline M_7(t)$ and its submodels $\overline M_5(t)$ and $\overline M_6(t)$, the matrix $C_5=C_6=C_7$ is a $(K+1)\times(K+1)$ non-diagonal matrix $C$ with
$$C_{11}=\sum_{r=1}^K\frac{t_rv_ry_r}{\psi(1+\psi y_r)^2},\qquad C_{r+1,r+1}=\frac{v_r}{y_r}\Big[\frac{1-t_r}{(1+y_r)^2}+\frac{\psi t_r}{(1+\psi y_r)^2}\Big],$$
and $C_{1,r+1}=C_{r+1,1}=t_rv_r/(1+\psi y_r)^2$ for $r=1,\ldots,K$, and $C_{uv}=C_{vu}=0$ for $2\le u<v\le K+1$. By row reduction we get a matrix $B$ with $B_{1r}=0$ for $r=2,\ldots,K+1$ and $B_{uv}=C_{uv}$ for $2\le u\le K+1$, $1\le v\le K+1$, so
$$\det C=B_{11}\prod_{r=1}^K C_{r+1,r+1},\qquad B_{11}=C_{11}-\sum_{r=1}^K\frac{C_{1,r+1}^2}{C_{r+1,r+1}}.$$
We have $\overline J_j(\overline M_j(t))=\overline J_7(\overline M_j(t))$ for $j=5,6$. By symmetry, $\overline J_7(\overline M_5(t))=\overline J_7(\overline M_6(t))=\overline J_7(\overline M_7(t))/2$. Perhaps surprisingly, from the form of $B_{11}$, the $y_k$ are not conditionally independent given $\psi$ in the Jeffreys prior, even if $t_k=1/2$ for all $k$. We integrated $(\det C\,\det V)^{1/2}$ over its $2K$-dimensional domain, where all the parameters are nonnegative and $v_1+\cdots+v_{K-1}\le 1$, with $t_1,\ldots,t_K$ fixed, to estimate the total Jeffreys prior measure $\overline J_7(\overline M_7(t))$ by Monte Carlo calculations for $K\ge 2$. Writing $\overline J_j(\overline M_j,K)$ to indicate the dependence on $K$, where in our cases $K=3$, 6 or 7, while $t_j=1/2$ except for one value of $j$ where $t_j=2/3$, our simulations (to 50,000 iterations) gave $\overline J_5(\overline M_5,3)\approx 16.7$, $\overline J_5(\overline M_5,6)\approx 14.5$, $\overline J_5(\overline M_5,7)\approx 9.0$.
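A quick numerical check of the determinant factorization above (our sketch; NumPy assumed) fills in $C$ for random parameter values and compares the two sides.

```python
import numpy as np

rng = np.random.default_rng(1)
K, psi = 4, 0.8
y = rng.uniform(0.05, 1.0, K)
t = np.full(K, 0.5)
v = rng.dirichlet(np.ones(K))

C = np.zeros((K + 1, K + 1))
C[0, 0] = (v * t * y / (psi * (1 + psi * y) ** 2)).sum()
for r in range(K):
    C[0, r + 1] = C[r + 1, 0] = t[r] * v[r] / (1 + psi * y[r]) ** 2
    C[r + 1, r + 1] = (v[r] / y[r]) * ((1 - t[r]) / (1 + y[r]) ** 2
                                       + psi * t[r] / (1 + psi * y[r]) ** 2)

diag = np.diag(C)[1:]
B11 = C[0, 0] - (C[0, 1:] ** 2 / diag).sum()
print(np.linalg.det(C), B11 * diag.prod())   # the two values agree
```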

29.7 Posterior Probabilities that Models are Best


Our overall prior probability will be $\frac{1}{6}\sum_{j=1}^6\nu_j$ where $\nu_j=\overline J_j/\overline J_j(\overline M_j(t))$ is the Jeffreys prior probability on $\overline M_j(t)$ for $j=1,\ldots,6$. Recall that $\overline M_7(t)$ cannot be the best model, so $\nu_7$ is omitted from the prior. Let $\overline M_j:=\overline M_j(t)$.
Let $\mathrm{lik}^{(N)}$ be the likelihood function for all $K$ data sets together, given by (29.1). The posterior distribution is then $\pi_N=(\sum_{j=1}^6\mathrm{lik}^{(N)}\nu_j)/D$, where $D$ is the total mass of $\sum_{j=1}^6\mathrm{lik}^{(N)}\nu_j$ on $\overline M_1$. The posterior probability that $M_j$ is the best model is then $\pi_N(\overline M_j^{\,\circ})$ where $\overline M_j^{\,\circ}=\overline M_j\setminus\bigcup\{\overline M_r:\ \overline M_r\subset\overline M_j,\ \overline M_r\ne\overline M_j\}$ is the set where $M_j$ is the best model. The maximum likelihood estimate of $v_k$ is, as usual, $n_k/N$, where $n_k$ is the sample size for the $k$th data set and $N$ is the grand total sample size.
Let $d_j$ be (now) the dimension of $\overline M_j$ for each $j$. The models $\overline M_j$ are manifolds for $j=1$, 4 and 7. For $K\ge 2$ these three models are of different dimensions, with $\overline M_4\subset\overline M_7\subset\overline M_1$. Let $ML^{(j)}$ be the maximum of the likelihood over $\overline M_j$ for $N$ observations. Then by (29.2) we can approximate the ratio of integrals $\int\mathrm{lik}^{(N)}d\nu_i/\int\mathrm{lik}^{(N)}d\nu_j$ by $\lambda_i/\lambda_j$, where $\lambda_r=ML^{(r)}(2\pi/N)^{d_r/2}/\overline J_r(\overline M_r)$, for $i$ and $j=1$, 4 or 7. Here the maximum likelihoods with respect to the $v_k$ divide out, so we can omit them. Let $\lambda_1:=\int\mathrm{lik}^{(N)}d\nu_1$.
Define $s_{ijk}$ and $x_{irk}$ as $s_{ij}$ and $x_{ir}$, respectively, for the $k$th data set in (29.3) with $n=n_k$. We use (29.3) for each data set separately and (29.8) to get, for $j=2,3$,
$$\pi_N(\overline M_j^{\,\circ})\sim\lambda_1\,2^K\prod_{k=1}^K\Phi(s_{1jk}x_{14k})\Big/D$$
as $N\to+\infty$, where $v_k>0$ for all $k$ and $\sim$ denotes asymptotic equivalence except possibly for exponentially small probabilities [Dudley and Haughton (2000)]. We also get
$$\pi_N(\overline M_1^{\,\circ})\sim\lambda_1\Big[1-\prod_{k=1}^K\Phi(s_{12k}x_{14k})-\prod_{k=1}^K\Phi(s_{13k}x_{14k})\Big]\Big/D.$$
By (29.2) and (29.9), $\pi_N(\overline M_4^{\,\circ})=\pi_N(\overline M_4)\sim(2/N)^{(2K-1)/2}ML^{(4)}(K-1)!/(D\sqrt{\pi})$. Applying (29.2) to $\overline M_1$ gives $\lambda_1\sim(2\pi/N)^{(3K-1)/2}ML^{(1)}/\overline J_1(\overline M_1)$, where $\overline J_1(\overline M_1)$ is given by (29.7). For $\overline M_7$ we get $\int\mathrm{lik}^{(N)}d\nu_7\sim(2\pi/N)^{K}ML^{(7)}/\overline J_7(\overline M_7)$. Let $s_{ij}$ and $x_{ir}$ be as in (29.3) for the full data set with $n=N$ and $m_u$ replaced by $\overline M_u$ for each $u$. Then for $j=5$ or 6, since $\overline M_j$ is a half-space in $\overline M_7$, $\int\mathrm{lik}^{(N)}d\nu_j\big/\int\mathrm{lik}^{(N)}d\nu_7\sim 2\,\Phi(s_{7j}x_{74})$.

29.8 Data on Long-Term Aspirin Therapy after an MI
The following contingency tables give survival data after long-term treatment
(averaging a year or more) following an MI, for clinical trials in which patients
were given aspirin or a placebo, for the full data from 7 different studies.

FULL DATA FROM SEVEN STUDIES

Study  Group    Died  Survived  Total
a      Aspirin    49       566    615
       Placebo    65       559    624
       Total     114      1125   1239
b      Aspirin    85       725    810
       Placebo    52       354    406
       Total     137      1079   1216
c      Aspirin    49       623    672
       Placebo    71       597    668
       Total     120      1220   1340
d      Aspirin    45       713    758
       Placebo    65       706    771
       Total     110      1419   1529
e      Aspirin    27       290    317
       Placebo    32       277    309
       Total      59       567    626
f      Aspirin   103       744    847
       Placebo   127       751    878
       Total     230      1495   1725
g      Aspirin   246      2021   2267
       Placebo   219      2038   2257
       Total     465      4059   4524

a: Elwood et al. (1974); b: PARIS (1980) study; c: Vogel et al. (1979); d: CDPRG (1980) study; e: Breddin et al. (1980); f: Elwood and Sweetnam (1980); g: AMIS (1980) study.
The numbers in the above seven tables are as in Appendix I of the ATC (1994) survey, which included some updates from the original publications, except for the first row of the table from the PARIS study, not given in ATC (1994).
Patients entered the CDPRG study on average 7 years after their last heart attack. For two other studies, it was also a long time between the last MI and entry into the study for some patients. The next two tables give data only on those patients who began treatment within 6 months after their heart attack in those two studies. In the other four studies, all or nearly all patients entered the studies within 6 months of their last MI.

SUBSAMPLES WHO BEGAN TREATMENT WITHIN 6 MONTHS

Study  Group    Died  Survived  Total
b      Aspirin    16       157    173
       Placebo    18        77     95
       Total      34       234    268
g      Aspirin    35       249    284
       Placebo    29       234    263
       Total      64       483    547

b: PARIS (1980) study (for two surviving patients taking aspirin, the time since the last MI was unknown); g: AMIS (1980) study (estimated).
We will also analyze separately the CDPRG study and subsamples of two other studies, tabulated as follows, where patients entered treatment later.

SUBSAMPLES WHO BEGAN TREATMENT AFTER 6 MONTHS

                 Died  Survived  Total                 Died  Survived  Total
Aspirin (b)        69       566    635   Aspirin (g)    211      1772   1983
Placebo            34       277    311   Placebo        190      1804   1994
Total             103       843    946   Total          401      3576   3977

b: PARIS (1980) study; g: AMIS (1980) study (estimated).

29.9 Numerical Results


For the full data sets or either of the subdivisions according to when treatment began, the estimated posterior probabilities of being best, $\pi_N(\widetilde{M}_j)$, never rose above 0.0004 for $j = 1, 2, 3$, apparently because of the higher dimension of these models. Thus our analysis selects common odds ratio models.

For the full data from all seven studies, $\pi_N(M_4) = 0.916$ for the null hypothesis, so it is preferred, and $\pi_N(M_6) = 0.083$ for a benefit of aspirin. Exact frequentist confidence intervals, in a sense, for common odds ratios can be computed [Mehta and Walsh (1992)] by the package StatXact. The approximate 95% credible interval for $\psi$ conditional on $M_7 \setminus M_4$, and the frequentist approximate 95% confidence interval, is $[0.783, 0.993]$, not quite containing 1; the interval also equals the "mid-p corrected exact" confidence interval for $\psi$ given
by StatXact. Thus in a frequentist sense the null hypothesis would be rejected, but by an uncomfortably narrow margin. The contrast between this and the Bayesian outcome is analogous to the well-known "Lindley's paradox" [Lindley (1957); see also, e.g., Bernardo and Smith (1994, pp. 394, 406-7, 415-6, 422)]. The unconditional MLE $\hat\psi_{MLE}$ of $\psi$ is 0.882, as are the conditional MLE $\hat\psi_{CML}$ [e.g., Nam (1993)] and the Mantel-Haenszel (1959) estimator $\hat\psi_{MH}$.
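The Mantel-Haenszel value can be checked directly from the seven tables of Section 29.8; the following sketch (the cell labels are ours) computes $\hat\psi_{MH} = \sum_k a_k d_k/n_k \big/ \sum_k b_k c_k/n_k$.

```python
# Mantel-Haenszel common odds ratio from the seven full 2x2 tables;
# cells are (died, survived) for aspirin, then (died, survived) for placebo.
tables = {
    "a": (49, 566, 65, 559),   "b": (85, 725, 52, 354),
    "c": (49, 623, 71, 597),   "d": (45, 713, 65, 706),
    "e": (27, 290, 32, 277),   "f": (103, 744, 127, 751),
    "g": (246, 2021, 219, 2038),
}
num = sum(da * sp / (da + sa + dp + sp) for da, sa, dp, sp in tables.values())
den = sum(sa * dp / (da + sa + dp + sp) for da, sa, dp, sp in tables.values())
print(round(num / den, 3))     # 0.882, matching psi_MH as reported above
```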
For patients who entered the studies within 6 months of their last MI (six data sets), $\pi_N(M_6) = 0.660$ for a benefit of treatment and $\pi_N(M_4) = 0.339$ for the null hypothesis. We have $\hat\psi_{MLE} = \hat\psi_{CML} = 0.770$ and $\hat\psi_{MH} = 0.771$. The 95% credible interval for $\psi$ in $M_7 \setminus M_4$ is $[0.651, 0.911]$, which again equals the mid-p corrected exact confidence interval. The null hypothesis $M_4$ is rejected in a frequentist sense with a 2-sided p-value by the likelihood ratio test equal to 0.0023, or by more precise StatXact computations, 0.0024 or 0.0026. If with probability 2/3 aspirin reduces mortality (by an estimated 23%) and with probability 1/3 is equivalent to a placebo, then consideration of plausible loss functions suggests that aspirin should be recommended.
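As a toy version of the loss-function argument (our simplification, not the authors' decision analysis), weighting the estimated mortality reduction by the posterior probabilities gives a clearly positive expected benefit:

```python
# Toy expected-benefit calculation under a linear loss in mortality reduction.
p_benefit, p_null = 2 / 3, 1 / 3          # rounded posterior probabilities
reduction_if_benefit = 0.23               # estimated 23% mortality reduction
print(p_benefit * reduction_if_benefit + p_null * 0.0)   # about 0.153
```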
For patients entering studies more than 6 months after their last MI (three data sets), $\pi_N(M_4) = 0.971$, strongly supporting the null hypothesis.

29.10 Discussion and Conclusions


Since the separation of the data according to the time treatment began was done retrospectively, it does not give a statistically clear outcome. The number
of possible multiple comparisons is not well-defined. It appears however that
medical researchers would consider it unethical to do any further prospective
studies of aspirin vs. placebo in the situation of the seven studies we considered.
Aspirin appears to be beneficial only if treatment begins within 6 months
after a heart attack. Canner's survey (1987) found benefits of aspirin only
during the first year or two of treatment. ATC (1994, p. 96) suggest to the
contrary that aspirin (or other antiplatelet) treatment be continued indefinitely
after a heart attack. It may be that lower doses of aspirin avoid enough of its
negative consequences to provide a benefit beyond two years. We would suggest
further clinical trials of that and related questions.

Acknowledgment. We thank Michael Woodroofe for telling us about the paper of Berk (1966).

References
1. AMIS (1980). The Aspirin Myocardial Infarction Study Research Group.
The Aspirin Myocardial Infarction Study: Final results, Circulation, 62
(suppl. V), V79-V84.

2. ATC (1994). Antiplatelet Trialists' Collaboration. Collaborative overview


of randomised trials of antiplatelet therapy-I: Prevention of death, my-
ocardial infarction, and stroke by prolonged antiplatelet therapy in various
categories of patients, British Medical Journal, 308, 81-106.

3. Berk, R. H. (1966). Limiting behavior of posterior distributions when the model is incorrect, Annals of Mathematical Statistics, 37, 51-58.

4. Bernardo, J. M. and Smith, A. F. M. (1994). Bayesian Theory, Chich-


ester: John Wiley & Sons.

5. Breddin, K., Loew, D., Lechner, K., Überla, K., and Walter, E. (1980). The German-Austrian aspirin trial: A comparison of acetylsalicylic acid, placebo and phenprocoumon in secondary prevention of myocardial infarction, Circulation, 62 (suppl. V), V63-V72.

6. Canner, P. L. (1987). An overview of six clinical trials of aspirin in coronary heart disease, Statistics in Medicine, 6, 255-263.

7. CDPRG (1980). The Coronary Drug Project Research Group. Aspirin in


coronary heart disease, Circulation, 62 (suppl. V), V59-V62.

8. DerSimonian, R. and Laird, N. (1986). Meta-analysis in clinical trials, Controlled Clinical Trials, 7, 177-188.

9. Dudley, R. M. and Haughton, D. (1997). Information criteria for multiple data sets and restricted parameters, Statistica Sinica, 7, 265-284.

10. Dudley, R. M. and Haughton, D. (2000). Asymptotic normality with


small relative errors of posterior probabilities of half-spaces, Preprint.

11. Elwood, P. C., Cochrane, A. L., Burr, M. L., Sweetnam, P. M., Williams,
G., Welsby, E., Hughes, S. J., and Renton, R. (1974). A randomized con-
trolled trial of acetyl salicylic acid in the secondary prevention of mortality
from myocardial infarction, British Medical Journal, 1, 436-440.

12. Elwood, P. C. and Sweetnam, P. M. (1980). Aspirin and secondary mor-


tality after myocardial infarction, Circulation, 62 (suppl. V), V53-V58.

13. Gaver, D. P., Draper, D., Goel, P. K., Greenhouse, J. B., Hedges, L. V., Morris, C. N., and Waternaux, C. (1992). Combining Information: Statistical Issues and Opportunities for Research, Washington, D.C.: National Academy Press.

14. Haughton, D. M. A. (1988). On the choice of a model to fit data from an


exponential family, Annals of Statistics, 16, 342-355.

15. Haughton, D. (1989). Size of the error in the choice of a model to fit data from an exponential family, Sankhya, Series A, 51, 45-58.

16. Huber, P. J. (1967). The behavior of maximum likelihood estimates under


nonstandard conditions, Proceedings of the Fifth Berkeley Symposium on
Mathematical Statistics and Probability [1965], vol. 1, 221-233. Berkeley
and Los Angeles: University of California Press.

17. Jeffreys, H. (1946). An invariant form for the prior probability in esti-
mation problems, Proceedings of the Royal Society of London, Series A,
186, 453-461.

18. Johnson, N. L. and Kotz, S. (1972). Distributions in Statistics: Continuous Multivariate Distributions, New York: John Wiley & Sons.
19. Kass, R. E. (1989). The geometry of asymptotic inference, Statistical
Science, 4, 188-219 (with discussion).
20. Lindley, D. V. (1957). A statistical paradox, Biometrika, 44, 187-192.
21. Mantel, N. and Haenszel, W. (1959). Statistical aspects of the analysis of
data from retrospective studies of disease, Journal of the National Cancer
Institute, 22, 719-748.
22. Mehta, C. R. and Walsh, S. J. (1992). Comparison of exact, mid-p, and
Mantel-Haenszel confidence intervals for the common odds ratio across
several 2 x 2 contingency tables, The American Statistician, 46, 146-150.

23. Nam, J.-M. (1993). Bias-corrected maximum likelihood estimator of a log


common odds ratio, Biometrika, 80, 688-694.

24. PARIS (1980). The Persantine-Aspirin Reinfarction Study Research


Group. Persantine and aspirin in coronary heart disease, Circulation,
62, 449-461.

25. Poskitt, D. S. (1987). Precision, complexity and Bayesian model deter-


mination, Journal of the Royal Statistical Society, Series B, 49, 199-208.

26. Schwarz, G. (1978). Estimating the dimension of a model, Annals of


Statistics, 6, 461-464.

27. Titterington, D. M., Smith, A. F. M., and Makov, U. E. (1985). Statistical


Analysis of Finite Mixture Distributions, Chichester: John Wiley & Sons.

28. Vogel, G., Fischer, C., and Huyke, R. (1979). Reinfarktprophylaxe mit Azetylsalizylsäure, Folia Haematologica, 106, 797-803.

29. Yu, K. F. (1992). On estimating standardized risk differences from odds


ratios, Biometrics, 48, 961-964.
30
A Depth Test for Symmetry

Peter J. Rousseeuw and Anja Struyf


University of Antwerp, Antwerp, Belgium
FWO, Belgium

Abstract: It was recently shown for arbitrary multivariate probability distribu-


tions that angular symmetry is completely characterized by location depth. We
use this mathematical result to construct a statistical test of the null hypothesis
that the data were generated by a symmetric distribution, and illustrate the
test by several real examples.

Keywords and phrases: Angular symmetry, characterization, hypothesis


testing, location depth

30.1 Introduction
It is natural to expect of a multivariate location estimator that in the case of
a symmetric distribution the population estimate corresponds to the center of
symmetry. Rousseeuw and Struyf (2000) prove that for any angularly sym-
metric multivariate distribution the point with maximal location depth [Tukey
(1975)] corresponds to the center of angular symmetry, and they give an expres-
sion for this maximal depth. Moreover, they show the converse: whenever the
maximal depth equals this expression, the distribution has to be angularly sym-
metric. Based on this characterization we will now construct a test for angular
symmetry of a particular distribution, which also gives us more insight in some
existing tests for centrosymmetry and uniformity of a spherical distribution.


30.2 Location Depth and Angular Symmetry


Let $P$ be an arbitrary probability distribution on $\mathbb{R}^p$ (with its usual Borel sets) that need not have a density or any moments. We say that $P$ is angularly symmetric about a point $c$ if for any Borel cone $A$ in $\mathbb{R}^p$ (i.e., a Borel set $A$ such that $sA = A$ for any $0 < s < \infty$) it holds that
$$P(c + A) = P(c - A).$$
It can easily be seen that $P$ is angularly symmetric about $c$ if and only if $P|_{\mathbb{R}^p \setminus \{c\}}$ is angularly symmetric about $c$. When $P(\{c\}) < 1$ we can consider the conditional probability distribution on $\mathbb{R}^p \setminus \{c\}$ defined as
$$P'(B) := \frac{P(B \setminus \{c\})}{1 - P(\{c\})},$$
and then the angular symmetry of $P$ is equivalent to that of $P'$. Let us now define the mapping $h : \mathbb{R}^p \setminus \{c\} \to S = S(0, 1)$ as the radial projection onto the unit sphere, i.e. $h(x) = (x - c)/\|x - c\|$. Moreover, let
$$P_h := P' \circ h^{-1} \qquad (30.1)$$
be the law of $h$. Then $P$ is angularly symmetric about $c$ if and only if $P_h(B) = P_h(-B)$ for any Borel set $B \subset S$, i.e. if $P_h$ is centrosymmetric about $0$. Figure 30.1 illustrates the transformation $h$ and the difference between angularly symmetric and centrosymmetric distributions.
The halfspace location depth was introduced by Tukey (1975) as a tool for analyzing finite data sets. The location depth of any point $\theta \in \mathbb{R}^p$ relative to the data set $X_n = \{x_1, \ldots, x_n\} \subset \mathbb{R}^p$ is defined as the smallest fraction of data points in any closed halfspace with boundary through $\theta$, i.e.
$$\mathrm{ldepth}(\theta; X_n) = \min_{\|u\|=1} \#(H_{\theta,u} \cap X_n)/n \qquad (30.2)$$
where $H_{\theta,u} = \{x \in \mathbb{R}^p :\ u'(x - \theta) \geq 0\}$. This definition can easily be generalized to any probability distribution $P$ on $\mathbb{R}^p$. The location depth of a point $\theta$ relative to $P$ then becomes
$$\mathrm{ldepth}(\theta; P) = \inf_{\|u\|=1} P(H_{\theta,u}).$$
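A brute-force $O(n^2)$ sketch of (30.2) in the bivariate case is given below; this is our illustration, not the exact algorithm AS 307 of Rousseeuw and Ruts (1996) used later in this chapter. The count can only change when the halfplane boundary rotates past a data point, so it suffices to examine directions perpendicular to each $x_i - \theta$, perturbed slightly to either side.

```python
# Brute-force bivariate location depth (30.2): minimize, over closed
# halfplanes with boundary through theta, the fraction of data points inside.
import numpy as np

def ldepth(theta, X, eps=1e-9):
    d = X - np.asarray(theta, dtype=float)      # directions x_i - theta
    ang = np.arctan2(d[:, 1], d[:, 0])
    crit = np.concatenate([ang + np.pi / 2, ang - np.pi / 2])
    cand = np.concatenate([crit - eps, crit, crit + eps])
    u = np.column_stack([np.cos(cand), np.sin(cand)])   # inner normals
    counts = (u @ d.T >= -1e-12).sum(axis=1)    # closed halfplane counts
    return counts.min() / len(X)
```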
Figure 30.1: Examples of (a) a discrete and (b) a continuous angularly symmetric distribution around $c$. Transforming (a) and (b) through the mapping $h(x) = (x - c)/\|x - c\|$ yields the centrosymmetric distributions in (c) and (d)

Since (30.2) equals zero for $\theta$ lying outside the convex hull of the data, and increases when $\theta$ moves closer to the center of the data, it is often referred to

as multivariate ranking [Eddy (1985) and Green (1981)]. This can be visualized by means of the ldepth regions $D_\alpha$ given by
$$D_\alpha = \{\theta \in \mathbb{R}^p :\ \mathrm{ldepth}(\theta; X_n) \geq \alpha\}.$$
These regions are convex sets, with $D_\alpha \subseteq D_{\alpha'}$ for each $\alpha' < \alpha$. The center of gravity of the innermost ldepth region is a point with maximal ldepth, called the deepest location or the Tukey median of the data set. This multivariate location $T^*(P)$ is a robust generalization of the univariate median. Donoho and Gasko (1992) explored the properties of the location depth and of the deepest location for finite data sets. Masse and Theodorescu (1994) and Rousseeuw and Ruts (1999) gave several properties of the location depth for general probability distributions $P$, that need not have a density. The asymptotic behavior of the
depth function was studied by He and Wang (1997) and Masse (1999), and that
of the deepest location by Bai and He (1999). Many statistical applications of
location depth have been developed. A survey is given in Liu, Parelius, and
Singh (1999).
Recently, Rousseeuw, Ruts, and Tukey (1999) proposed the bagplot, a bi-
variate generalization of the univariate boxplot based on location depth. Fig-
ure 30.2a depicts the weight of the spleen versus the heart weight of 73 hamsters
[Cleveland (1993)]. The Tukey median is given by the black-on-white cross in
the center of the data cloud. The dark gray area around the deepest location
is called the bag. It is an interpolation of two subsequent depth regions, and
contains 50% of the data. The bag corresponds to the box in the classical box-
plot. An outlier (plotted as a star) is a point lying outside of the fence, which is
obtained by inflating the bag by a factor 3 relative to the Tukey median. The
light gray loop ("bolster") is the convex hull of all nonoutlying data points. In
this example, 4 hamsters seem to have an extraordinary large spleen and/or
heart. The shape of the loop reveals skewness of the data cloud, and suggests
a logarithmic transformation of both variables. In Figure 30.2b the bagplot of
the transformed data set is given. In this plot, only one outlier remains.
Rousseeuw and Struyf (2000) prove that the location depth can be used to characterize angular symmetry:

Theorem 30.2.1 A distribution $P$ on $\mathbb{R}^p$ is angularly symmetric about some $\theta_0$ if and only if
$$\mathrm{ldepth}(\theta_0; P) = \frac{1}{2} + \frac{1}{2}\, P(\{\theta_0\}).$$
In that case, $\mathrm{ldepth}(\theta_0; P)$ is the maximal value of $\mathrm{ldepth}(\theta; P)$ over all $\theta \in \mathbb{R}^p$.

From Theorem 30.2.1 it follows that any $P$ which is angularly symmetric about some $\theta_0$ with $P(\{\theta_0\}) > 0$ has a unique center of angular symmetry. Otherwise, there can only be two different centers $\theta_1 \neq \theta_2$ of angular symmetry if $P$ has all its mass on the straight line through $\theta_1$ and $\theta_2$. These corollaries have been proved in another way by Liu (1990) and Zuo and Serfling (2000). A similar property holds for the $L_1$-median
$$T_{L_1}(P) = \mathop{\mathrm{argmin}}_{x_0} E(\|x - x_0\|)$$
since Zuo and Serfling (2000) have proved that a distribution $P$ that is angularly symmetric about a unique point $\theta_0$ has $\theta_0$ as an $L_1$-median.

Remark. Since the condition of the theorem is that $\mathrm{ldepth}(\theta_0; P) = \frac{1}{2} + \frac{1}{2}P(\{\theta_0\})$, which is at least $1/2$, one might think that it would be sufficient to require that $\mathrm{ldepth}(\theta_0; P) \geq \frac{1}{2}$ [this property is called halfspace symmetry by Zuo and Serfling (2000)]. This is not sufficient, however. For instance, take a distribution $P_1$ which is not angularly symmetric and with $\mathrm{ldepth}(0) = k$. Then put $P_2 = \delta_0$ and $P := \frac{1}{2}P_1 + \frac{1}{2}P_2$. For this probability measure $P$ we find $\mathrm{ldepth}(0) = \frac{1}{2}k + \frac{1}{2} \geq \frac{1}{2}$, although $P$ is not angularly symmetric. As a consequence, Theorem 30.2.1 is stronger than a similar property given by Zuo and Serfling (2000) which is based on halfspace symmetry and requires stricter conditions on $P$.

For the special case of probability measures with a density it always holds that $\max_\theta \mathrm{ldepth}(\theta; P) \leq \frac{1}{2}$, which yields the following corollary of Theorem 30.2.1.

Corollary 30.2.1 Assume that $P$ has a density. Then $P$ is angularly symmetric about some $\theta_0$ if and only if
$$\mathrm{ldepth}(\theta_0; P) = \frac{1}{2}.$$
In that case, $\mathrm{ldepth}(\theta_0; P) = \max_\theta \mathrm{ldepth}(\theta; P)$.

The 'only if' part of this property was previously proved by Rousseeuw and Ruts (1999) in a different way, whereas the 'if' part follows from Theorem 30.2.1.

30.3 A Test for Angular Symmetry


Given $X_n = \{x_1, x_2, \ldots, x_n\}$ and $\theta_0$, Corollary 30.2.1 allows us to use $\mathrm{ldepth}(\theta_0; X_n)$ as a test statistic for the null hypothesis

$H_0$: the data come from a continuous distribution $P$ which is angularly symmetric about $\theta_0$.

In the bivariate case, Daniels (1954) gave an expression for the cumulative distribution function
$$H_n(k) = P\{n \cdot \mathrm{ldepth}(\theta_0; X_n) \leq k\} \qquad (30.3)$$
under the null hypothesis $H_0$: for $k \leq [(n-1)/2]$ it is a finite sum with upper summation limit $j' = [k/(n-2k)]$, each term of which is a probability of the binomial distribution $B(n, \frac{1}{2})$, and $H_n(k) = 1$ otherwise.
The same test statistic has been used by other people to test for different null hypotheses. In two dimensions, the location depth $\mathrm{ldepth}(\theta_0; X_n)$ reduces to the bivariate sign test statistic of Hodges (1955), where the null hypothesis $H_0$ was that $P$ is centrosymmetric about $\theta_0$. By Theorem 30.2.1 we can now see that the real null hypothesis of this test is larger than the original $H_0$. It actually tests for angular symmetry instead of centrosymmetry, which is a special case. Ajne (1968) uses essentially the same test statistic to test for another null hypothesis, that a distribution on the circle is uniform. Bhattacharyya and Johnson (1969) first noted that both tests use the same test statistic. By the construction in (30.1) and Theorem 30.2.1 it follows that Ajne's test has a much larger null hypothesis, namely centrosymmetry of the circular distribution. The latter is an illustration of the fact that the masses of all hemispheres of a sphere $S$ in $\mathbb{R}^p$ do not suffice to characterize the distribution $P$ on $S$. Indeed, for any centrosymmetric distribution $P$ on $S$ (such as the one in Figure 30.1d) it is true that the mass of each hemisphere equals $\frac{1}{2}$, and hence we cannot distinguish between such distributions on the basis of the masses of hemispheres alone. On the other hand, the masses of all caps of $S$ would be sufficient to characterize $P$ on $S$ by the theorem of Cramer and Wold (1936), since any nontrivial intersection of a halfspace $H \subset \mathbb{R}^p$ and $S$ determines a cap of $S$ and vice versa.

Example 1. Let us consider the exchange rates of the German Mark relative to the US Dollar (DEM/USD) and of the Japanese Yen (JPY/USD) from July to December 1998. Every weekday (except on holidays), the exchange rates were recorded at 8PM GMT. Figure 30.3 shows the evolution of the exchange rates over this time period, measured in units of 0.0001 DEM/USD and 0.01 JPY/USD. The data set in Figure 30.4 consists of the 129 differences $(\Delta x, \Delta y)$ between the exchange rates on consecutive days, for both currencies.

From the time series plot in Figure 30.3 as well as from the scatter plot in Figure 30.4 it is clear that $\Delta x$ and $\Delta y$ are correlated. We want to test whether these pairs of exchange rate movements come from a bivariate distribution which is angularly symmetric around the origin. Intuitively, we want to test if a movement $(\Delta x, \Delta y)$ of the rates of DEM/USD and JPY/USD with $\Delta y/\Delta x = a$ and $\Delta x > 0$ is equally likely to occur as a movement $(\Delta x, \Delta y)$ with $\Delta y/\Delta x = a$ and $\Delta x < 0$. The location depth of the point $\theta_0 = (0,0)$ can be calculated with the program of Rousseeuw and Ruts (1996). Here, $n \cdot \mathrm{ldepth}(\theta_0; X_n) = 57$. The p-value equals $H_{129}(57) = 0.88435$, hence we accept the null hypothesis that the data are angularly symmetric around $\theta_0$. Note that large distances or long tails have no effect on this result.
Example 2. The azimuth data [Till (1974, p. 39) and Hand et al. (1994)] consist of 18 measurements of paleocurrent azimuths from the Jura Quartzite, Islay. The original measurements (in degrees east of north) are projected onto the circle in Figure 30.5. The location depth (times $n$) of the point $(0,0)$ relative to this data set equals 1. The p-value is $H_{18}(1) = 0.002197$, so we conclude that the distribution of these data points deviates significantly from angular symmetry.

Example 3. Ferguson et al. (1967) described an experiment in which 14 frogs were captured, transported to another place, and then released to see if they would find their way back. The directions in which the 14 frogs started their journey are given by the angles 104°, 110°, 117°, 121°, 127°, 130°, 136°, 145°, 152°, 178°, 184°, 192°, 200°, and 316°. The depth (times $n$) of the origin relative to these data equals 1, which leads to a p-value of $H_{14}(1) = 0.020508$. Therefore, we reject the null hypothesis that the distribution of the frogs' movements is angularly symmetric around the origin.
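The frog example can be checked numerically. The sketch below recomputes the depth of the origin and then, relying on the fact that the null distribution (30.3) does not depend on the particular continuous angularly symmetric $P$, approximates $H_{14}(1)$ by simulating uniform angles; the simulation size is our own choice.

```python
# Depth of the origin for the 14 frog directions, and a Monte Carlo
# approximation of H_14(1) using uniform angles as the null distribution.
import numpy as np

def n_ldepth_origin(angles):
    # n * ldepth of the origin for points on the unit circle: a point with
    # angle phi lies in the closed halfplane with inner-normal angle a
    # exactly when cos(a - phi) >= 0
    crit = np.concatenate([angles + np.pi / 2, angles - np.pi / 2])
    cand = np.concatenate([crit - 1e-9, crit, crit + 1e-9])
    counts = (np.cos(cand[:, None] - angles[None, :]) >= -1e-12).sum(axis=1)
    return int(counts.min())

frogs = np.radians([104, 110, 117, 121, 127, 130, 136, 145,
                    152, 178, 184, 192, 200, 316])
print(n_ldepth_origin(frogs))     # 1, as in the example

rng = np.random.default_rng(2)
sims = [n_ldepth_origin(rng.uniform(0, 2 * np.pi, 14)) for _ in range(10000)]
print(np.mean(np.array(sims) <= 1))   # should be close to H_14(1) = 0.020508
```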

30.4 Regression Depth and Linearity of the Conditional Median

Van Aelst et al. (2000) present a test similar to the one in Section 30.3 for testing the linearity of the conditional median, using the regression depth [Rousseeuw and Hubert (1999)] as a test statistic. In the simple regression case, the null distribution of that test statistic coincides with the distribution (30.3).
Figure 30.2: (a) Bagplot of the spleen weight versus heart weight of 73 hamsters. (b) Bagplot of the log-transformed data set
Figure 30.3: Evolution of the exchange rates of DEM/USD (dashed line) and JPY/USD (full line) from July to December 1998
Figure 30.4: Differences between exchange rates on consecutive days for DEM/USD and JPY/USD in the second half of 1998. The origin is depicted as a triangle
Figure 30.5: The azimuth data

References
1. Ajne, B. (1968). A simple test for uniformity of a circular distribution,
Biometrika, 55, 343-354.

2. Bai, Z. and He, X. (1999). Asymptotic distributions of the maximal depth


estimators for regression and multivariate location, Annals of Statistics,
27, 1616-1637.

3. Bhattacharyya, G. K. and Johnson, R. A. (1969). On Hodges' bivariate


sign test and a test for uniformity of a circular distribution, Biometrika,
56, 446-449.

4. Cleveland, W. S. (1993). Visualizing Data, Summit, New Jersey: Hobart Press.

5. Cramer, H. and Wold, H. (1936). Some theorems on distribution func-


tions, Journal of the London Mathematical Society, 11, 290-294.

6. Daniels, H. E. (1954). A distribution-free test for regression parameters, Annals of Mathematical Statistics, 25, 499-513.

7. Donoho, D. L. and Gasko, M. (1992). Breakdown properties of location


estimates based on halfspace depth and projected outlyingness, The An-
nals of Statistics, 20, 1803-1827.

8. Eddy, W. F. (1985). Ordering of multivariate data, In Computer Science


and Statistics: Proceedings of the 16th Symposium on the Interface (Ed.,
L. Billard), pp. 25-30, Amsterdam: North-Holland.

9. Ferguson, D. E., Landreth, H. F., and McKeown, J. P. (1967). Sun com-


pass orientation of the northern cricket frog, Animal Behaviour, 15, 43-53.

10. Green, P. J. (1981). Peeling bivariate data, In Interpreting Multivariate


Data (Ed., V. Barnett), pp. 3-19, New York: John Wiley & Sons.

11. Hand, D. J., Daly, F., Lunn, A. D., McConway, K. J., and Ostrowski, E.
(1994). A Handbook of Small Data Sets, London: Chapman & Hall.

12. He, X. and Wang, G. (1997). Convergence of depth contours for multi-
variate datasets, Annals of Statistics, 25, 495-504.

13. Hodges, J. L. (1955). A bivariate sign test, Annals of Mathematical Sta-


tistics, 26, 523-527.

14. Liu, R. Y. (1990). On a notion of data depth based on random simplices,


Annals of Statistics, 18, 405-414.
15. Liu, R. Y., Parelius, J., and Singh, K. (1999). Multivariate analysis by
data depth: descriptive statistics, graphics and inference, The Annals of
Statistics, 27, 783-840.

16. Masse, J. C. (1999). Asymptotics for the Tukey depth, Technical Report,
Universite Laval, Quebec, Canada.

17. Masse, J. C. and Theodorescu, R. (1994). Halfplane trimming for bivari-


ate distributions, Journal of Multivariate Analysis, 48, 188-202.

18. Rousseeuw, P. J. and Hubert, M. (1999). Regression depth, Journal of


the American Statistical Association, 94, 388-402.

19. Rousseeuw, P. J. and Ruts, I. (1996). Algorithm AS 307: Bivariate loca-


tion depth, Applied Statistics (JRSS-C), 45, 516-526.

20. Rousseeuw, P. J. and Ruts, I. (1999). The depth function of a population


distribution, Metrika, 49, 213-244.

21. Rousseeuw, P. J., Ruts, I., and Tukey, J. W. (1999). The bagplot: A
bivariate boxplot, The American Statistician, 53, 382-387.

22. Rousseeuw, P. J. and Struyf, A. (2000). Characterizing Angular Symme-


try and Regression Symmetry, Technical Report, University of Antwerp,
submitted.

23. Till, R. (1974). Statistical Methods for the Earth Scientist, London:
MacMillan.

24. Tukey, J. W. (1975). Mathematics and the picturing of data, Proceedings


of the International Congress of Mathematicians, Vancouver, 2, 523-531.
25. Van Aelst, S., Rousseeuw, P. J., Hubert, M., and Struyf, A. (2000). The
deepest regression method, Technical Report, University of Antwerp.

26. Zuo, Y. and Serfling, R. (2000). On the performance of some robust nonparametric location measures relative to a general notion of multivariate symmetry, Journal of Statistical Planning and Inference, 84, 55-79.
31
Adaptive Combination of Tests

Yadolah Dodge and Jana Jureckova


University of Neuchâtel, Neuchâtel, Switzerland
Charles University, Czech Republic

Abstract: In this paper we present a combination of two tests of the linear


hypothesis in the linear regression model. The adaptive decision rule which
selects the optimal combination of the tests is quite analogous to that which
led to the optimal combinations of estimators proposed by the authors.

Keywords and phrases: Regression, adaptive regression, testing hypothesis,


adaptive estimation

31.1 Introduction
Consider the linear regression model
$$Y = X\beta + z \qquad (31.1)$$
where $Y$ is an $(n \times 1)$ vector of observations with the design matrix $X$ of order $n \times p$ such that $x_{i1} = 1$, $i = 1, \ldots, n$, $\beta$ is a $(p \times 1)$ vector parameter and $z$ is an $(n \times 1)$ vector of independent errors, identically distributed according to a distribution function (d.f.) $F$, which is generally considered as unknown; we only assume that $F$ belongs to some family $\mathcal{F}$ of distribution functions. The problem is that of estimating the parameter $\beta$. Notice that the first component of $\beta$ is an intercept. Depending on the estimation procedures involved, we shall have to put some more restrictions on the class of underlying distributions, like existence of some moments, positivity of the density in some interval or at least at some points, etc. The regularity conditions are usually mild. Even the condition of symmetry is not necessary and we impose it rather to avoid the problem of eventual nonidentifiability of the intercept.
There are different methods of estimating the unknown parameters. Three of such methods are minimization of (1) the sum of squared errors; (2) the sum of absolute errors; and (3) the maximum of absolute errors. These three methods are members of the class called $L_p$-estimators, which are obtained by minimizing what is known as the Minkowski metric or $L_p$-norm (criterion) defined as $[\sum |z_i|^p]^{1/p}$ with $p \geq 1$. If we set $p = 1$, we obtain what is known as an absolute or city block metric or $L_1$-norm. The minimization of this criterion is called the $L_1$-norm or Least Absolute Deviations (LAD) method. If $p = 2$, we have what is known as the Euclidean metric or $L_2$-norm. The minimization of this distance is known as the least squares (LS) method. The classical approach to the regression problem uses this method. If we minimize the $L_p$-norm for $p = \infty$ we have the minimax method. There are, however, many other methods for estimating $\beta$. For a complete treatment of some of the major methods the reader is referred to Birkes and Dodge (1993) and to Jureckova and Sen (1996).
While the method of least squares enjoys well-known properties within
Gaussian parametric models, it is recognized that outliers, which arise from
heavy-tailed distributions, have an unusually large influence on the resulting
estimates. Outlier diagnostics statistics based on least squares have been devel-
oped to detect observations with a large influence on the least squares estima-
tion. For documents related to such diagnostics the reader is referred to Cook
and Weisberg (1982, 1994).
While many location estimators were extended in a straightforward way to
the linear regression model, this, until recently, was not the case of L-estimators
(linear combinations of order statistics). The attempts which were made either
were computationally difficult or did not keep the convenient properties of lo-
cation estimators. In 1978, Koenker and Bassett introduced the concept of
regression quantile which provided a basis for L-procedures in the linear model.
The trimmed least squares estimator, suggested by the same authors, is an
extension of the trimmed mean to the linear model.
At present, there exists a wide variety of robust estimators in the linear model. Besides distributional robustness, estimators resistant to leverage points in the design matrix and possessing a high breakdown point [introduced originally by Hampel (1968); the finite sample version is studied in Donoho and Huber (1983)] were developed and studied.

Summarizing, the last 40 years brought a host of statistical procedures, many of them enjoying excellent properties and being equipped with computational software. On the other hand, this progress has put an applied statistician into a difficult situation: if he needs to fit his data with a regression hyperplane, he hesitates over which procedure he should use. His decision is then sometimes based on peculiar reasons. If he had some more information on the model, he could choose the estimation procedure accordingly. If his data are automatically collected by a computer and he is not able to make any diagnostics, then he might use one of the high breakdown-point estimators, but he usually would not, either due to the difficult computation or perhaps due to his scepticism. Then, finally, he might prefer simplicity to optimality

and good asymptotics and use the classical least squares, the LAD-method or one of the other reasonably simple methods.

An idea of what to advise for such a situation, instead of concentrating on one method, is to combine two convenient methods and in such a way diminish the eventual shortcomings of both. This idea, simple as it is, was surprisingly not
very much elaborated until recently. Arthanari and Dodge (1981) introduced
an estimation method based on a direct convex combination of LAD- and LS-
methods. Dodge (1984) extended this method to a convex combination of
LAD and Huber's M-estimation methods and supplemented that by a numerical
study based on simulated data.
Later on Dodge and Jureckova (1988) observed that the convex combina-
tion of two methods could be adapted in the sense that the optimal value of the
convex combination methods coefficient, minimizing the resulting asymptotic
variance, could be estimated from the observations. The resulting estimator at-
tains a minimum asymptotic variance over all estimators of this kind and for any
general distribution with a nuisance scale. Dodge and Jureckova (1988, 1991)
then extended the adaptive procedure to the combinations of LAD-method
with M-estimation and with trimmed least squares estimation methods. An
analogous idea can be used to develop for the combination of two tests of the
linear hypothesis in the linear regression model. In Section 31.3 we develop and
discuss optimal combination of tests.
In what follows we briefly describe the general idea, leading to a construction
of an adaptive convex combination of two estimation methods.

31.2 Adaptive Combination of Estimators


We shall consider a family of symmetric densities indexed by an appropriate measure of scale:
$$\mathcal{F} = \{f :\ f(z) = s^{-1}f_0(z/s),\ s > 0\}. \qquad (31.2)$$
The shape of $f_0$ is generally unknown; it only satisfies some regularity conditions given later and the unit element $f_0 \in \mathcal{F}$ has $s_0 = 1$. When we would like to combine an $L_1$-estimator with another class of estimators, then we take $s = 1/f(0)$.

Generally, the scale characteristic $s$ should have a reasonably consistent estimator $s_n$ based on $Y_1, \ldots, Y_n$. Moreover, it would be natural to assume that the estimator $s_n$ is regression-invariant and scale-equivariant, i.e.

(a) $s_n(Y) \xrightarrow{P} s$ as $n \to \infty$ (31.3)
(b) $s_n(Y + Xb) = s_n(Y)$ for any $b \in R^p$ (regression-invariance)
(c) $s_n(cY) = c\,s_n(Y)$ for $c > 0$ (scale-equivariance).

The idea of the adaptive estimator as introduced in Dodge and Jureckova (2000) is as follows:

Let $T_n(\delta)$ be a solution of the minimization problem
$$\sum_{i=1}^{n} \rho\left(\frac{Y_i - x_i't}{s_n}\right) := \min \qquad (31.4)$$
with respect to $t \in R^p$, where $s_n$ is a consistent estimator of $s$ and $\rho(z) = \delta\rho_1(z) + (1-\delta)\rho_2(z)$, $0 \leq \delta \leq 1$, where $\rho_1(z)$ and $\rho_2(z)$ are symmetric (convex) discrepancy functions defining the respective estimators. For instance, $\rho_1(z) = |z|$ and $\rho_2(z) = z^2$ if we want to combine LAD and LS estimators. Then $\sqrt{n}\,(T_n(\delta) - \beta)$ has an asymptotically normal distribution $N_p(0, Q^{-1}\sigma^2(\delta, \rho, f))$, where $Q = \lim_{n\to\infty} n^{-1}(X'X)$. Using $\delta = \delta_0$ which minimizes $\sigma^2(\delta, \rho, f)$ with respect to $\delta$, $0 \leq \delta \leq 1$, we get an estimator $T_n(\delta_0)$ minimizing the asymptotic variance for a fixed distribution shape. Typically, $\sigma^2(\delta, \rho, f)$ depends on $f$ only through two moments of $f_0$.
That is, in the case of a least squares estimator
$$\sigma_0^2 = \int x^2 f_0(x)\,dx \quad \text{and} \quad E_1^0 = \int |x|\, f_0(x)\,dx,$$
and in the M-estimation case
$$\sigma_0^2 = \int \psi^2(x)\, f_0(x)\,dx \quad \text{and} \quad E_1^0 = \int |\psi(x)|\, f_0(x)\,dx,$$
where $\psi = \rho_2'$.


It is, in fact, a product of $s^2$ and of an expression containing $\delta$, $\sigma_0^2$, and $E_1^0$ which, being convex in $0 \leq \delta \leq 1$, could be well minimized with respect to $\delta$. Instead of $\sigma^2(\delta, \rho, f)$, we then minimize its estimate, with $\sigma_0^2$ and $E_1^0$ being replaced by their estimators $\hat\sigma_0^2$ and $\hat E_1^0$ based on the data; denote $\hat\delta_0$ the minimizing value.

Then we shall consider the function $\hat\rho(z) = \hat\delta_0\rho_1(z) + (1 - \hat\delta_0)\rho_2(z)$, and the minimization (31.4) leads to the estimator $T_n(\hat\delta_0)$ such that
$$\sqrt{n}\,(T_n(\hat\delta_0) - \beta) \xrightarrow{d} N_p(0, Q^{-1}\sigma^2(\delta_0, \rho, f)), \qquad (31.5)$$
where
$$\sigma^2(\delta_0, \rho, f) = \min_{0 \leq \delta \leq 1} \sigma^2(\delta, \rho, f). \qquad (31.6)$$

Hence, $T_n(\hat\delta_0)$ attains the minimum possible asymptotic variance among all solutions of (31.4) corresponding to the pertaining distribution shape. If $\hat\delta_0 = 1$ or $0$, then $T_n(\hat\delta_0)$ coincides with the estimator generated by $\rho_1$ or $\rho_2$, respectively.

If we combine $L_1$-estimation with another estimation procedure, we take $s = 1/f(0)$ due to the fact that the asymptotic covariance matrix of the $L_1$-estimator is proportional to $1/\{4(f(0))^2\}$. The characteristic $s = 1/f(0)$ is simple and robust but not easily estimable, similarly as the density itself.
Recently Dodge and Jureckova (1995) proposed two estimators of $s$, based on regression quantiles, satisfying (a)-(c) in (31.3). These estimators, of histogram and kernel type, respectively, do not need any initial estimator $\hat\beta$ of $\beta$, and seem to be very convenient for the adaptive convex combinations of estimators of $\beta$.

Various procedures based on a combination of two or several estimators, mostly adaptive in some well defined sense, are described in Dodge and Jureckova (2000). The advantage of these procedures is their simplicity and also that the ideas leading to their construction are acceptable for applied statisticians.
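The following sketch illustrates the whole scheme for the LAD/LS combination on simulated data. All numerical choices are ours: the crude kernel estimate of $f(0)$ merely stands in for the regression-quantile estimators of Dodge and Jureckova (1995), and the variance expression (31.28) below is minimized on a grid.

```python
# Adaptive LAD/LS combination: pilot LAD fit, scale and moment estimates,
# grid minimization of the asymptotic variance, final combined fit.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_t(df=3, size=n)

def fit(delta, s):
    # criterion (31.4) with rho(z) = delta |z| + (1 - delta) z^2
    obj = lambda t: np.sum(delta * np.abs((y - X @ t) / s)
                           + (1 - delta) * ((y - X @ t) / s) ** 2)
    return minimize(obj, np.zeros(p), method="Nelder-Mead",
                    options={"maxiter": 20000, "fatol": 1e-10}).x

r = y - X @ fit(1.0, 1.0)                 # pilot LAD residuals
h = 1.06 * np.std(r) * n ** (-0.2)
s_n = 2 * h / np.mean(np.abs(r) <= h)     # crude s_n = 1 / f_hat(0)
z = r / s_n
sig0, E10 = np.mean(z ** 2), np.mean(np.abs(z))

grid = np.linspace(0, 1, 1001)            # minimize the variance in delta
obj = 4 * (1 - grid) ** 2 * sig0 + 4 * grid * (1 - grid) * E10 + grid ** 2
delta0 = grid[np.argmin(obj)]
print(delta0, fit(delta0, s_n))           # adaptive estimator T_n(delta0)
```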

31.3 Adaptive Combination of Tests


We have considered adaptive convex combinations of two kinds of estimators. An analogous idea can be exploited for the combinations of two tests of the linear hypothesis in the linear regression model. It turns out that the same adaptive decision rules that led to the optimal combinations of estimators also lead to the optimal combinations of two tests. The efficiencies of estimating and testing procedures are also closely related; it is well known that the Pitman efficacy of the test coincides with the reciprocal standard deviation of the asymptotic distribution of the corresponding estimator. Noting that, the adaptive combination of two tests can also be considered as the test corresponding to the adaptive combination of the two pertaining estimators.

In this section, we shall briefly illustrate how to use the adaptive procedures developed in Section 31.2 for an adaptive combination of tests. We shall start with some general remarks, and describe in more detail the important special cases of convex combinations of the F-test with the median-type test and the M-test with the median-type test, respectively.

Consider the linear regression model
$$Y = X\beta + z \qquad (31.7)$$
where $Y$ is an $(n \times 1)$ vector of observations, $X = X_n$ is an $(n \times p)$ design matrix, $\beta$ is a $(p \times 1)$ vector of unknown parameters and $z$ is an $(n \times 1)$ vector of independent errors, identically distributed with the density $f(z)$ satisfying
$$f(z) = f(-z), \quad z \in R^1, \qquad (31.8)$$
$$0 < f(0) < \infty \ \text{ and } f \text{ has a bounded derivative in a neighborhood of } 0,$$
$$0 < \sigma^2 = \int z^2 f(z)\,dz < \infty,$$
and
$$f(z) = (1/s)\, f_0(z/s), \quad s > 0, \qquad (31.9)$$
where $f_0$ is a fixed (but generally unknown) symmetric density such that $f_0(0) = 1$ and the scale statistic $s$ is $s = 1/f(0)$. Denote $\mathcal{F} = \{f :\ f(z) = (1/s)f_0(z/s),\ s > 0\}$ the family of densities satisfying (31.8) to (31.9), indexed by $s$.

We shall consider the hypothesis
$$H_0:\ \beta = 0; \qquad (31.10)$$
but obviously the procedure could also be applied to more general hypotheses of the type $H:\ A\beta = b$.
We can generally consider three types of tests of the linear hypothesis:
(i) the Wald type tests,

(ii) the likelihood ratio type tests,

(iii) the score type tests.


(i) The Wald type test of $H_0$ is based on the quadratic form of an appropriate estimator $\hat\beta$ of $\beta$,
$$\hat\beta'\, V^{-1} \hat\beta, \qquad (31.11)$$
where $V$ is the covariance matrix of $\hat\beta$ or its approximation. Typically, (31.11) has asymptotically a $\chi^2$ distribution under $H_0$ and a noncentral $\chi^2$ distribution with the noncentrality parameter $\beta_0' V^{-1} \beta_0$ under the local (Pitman) alternative
$$H_n:\ \beta = \beta_n = n^{-1/2}\beta_0 \qquad (31.12)$$
with a fixed $\beta_0 \in R^p$. The problem may be that of estimating the covariance matrix $V$.

(ii) and (iii): The likelihood ratio tests and the score type tests are closely related. The latter has a simpler linear form: for instance, for the model $f(x, \theta)$ with the scalar parameter $\theta$, and hypothesis $H^*:\ \theta = \theta_0$, the parametric score test is based on the statistic
$$n^{-1/2} \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \log f(X_i, \theta)\Big|_{\theta = \theta_0}. \qquad (31.13)$$

The score tests can be performed with less or no estimation of unknown parame-
ters and matrices, compared with the two other tests; moreover, the sign-rank
tests, which asymptotically have forms of score tests, need even less estimation
due to their invariance. For this reason, we recommend using the ranks rather
than the regression quantiles or LAD estimation for testing various hypotheses
in the linear model.
The score tests belong to the class of M-tests of Ha, which are closely
connected with the M-estimation of (3. The M-test of Ha is characterized by
the test criterion
(31.14)

where n
Mn = (nQn)-1/2 LXi?j!(Yi), Qn = n-IX~Xn, (31.15)
i=1

X~ is the ith row of X n , i = 1, ... ,n, and a;


is an estimator of the functional
J~oo ?j!2 (x) dF (x); ?j! is the (skew-symmetric) score function generating the M-
estimator. Then the criterion (31.14) has asymptotically the X~ distribution
under H a, and the noncentral X~ distribution under Hn with the noncentrality

i:
parameter

'Y(?j!,1) = f(x)d?j!(x). (31.16)

The noncentrality parameter (31.16)" is equal to the reciprocal square of the


Pitman efficacy of the test (31.14). Also notice that (31.16) is reciprocal to the
asymptotic variance of the M-estimator.
The sign-rank test criterion for $H_0$ has the form
$$S_n^{+\prime} S_n^{+} \big/ A^2(\varphi^{+}), \qquad A^2(\varphi^{+}) = \int_0^1 \{\varphi^{+}(u)\}^2\,du, \qquad (31.17)$$

where
$$S_n^{+} = (nQ_n)^{-1/2} \sum_{i=1}^{n} x_i\, \varphi^{+}\!\left(\frac{R_i^{+}}{n+1}\right), \qquad (31.18)$$
where $R_i^{+}$ is the rank of $|Y_i|$ among $|Y_1|, \ldots, |Y_n|$, and $\varphi^{+} : [0,1) \mapsto R_{+}^{1}$ is a nondecreasing score function, square-integrable on $(0,1)$, and such that $\varphi^{+}(0) = 0$. Denote also
$$\varphi(u) = \begin{cases} \varphi^{+}(2u - 1) & \text{if } \tfrac{1}{2} \leq u < 1 \\ -\varphi(1 - u) & \text{if } 0 < u \leq \tfrac{1}{2}. \end{cases} \qquad (31.19)$$
The test criterion (31.17) has asymptotically the $\chi_p^2$ distribution under $H_0$ and the noncentral $\chi_p^2$ distribution under $H_n$ with noncentrality parameter
$$\gamma(\varphi, f) = -\int_{-\infty}^{\infty} f'(x)\, \varphi(F(x))\,dx. \qquad (31.20)$$

Moreover, under $H_0$, the sign-rank statistic (31.18) admits the asymptotic representation
$$S_n^{+} = (nQ_n)^{-1/2} \sum_{i=1}^{n} x_i\, \varphi(F(Z_i)) + o_p(1) \quad \text{as } n \to \infty, \qquad (31.21)$$
and hence it asymptotically has the form of the M-test.

If we have no knowledge about the shape of the distribution, we could recommend using the pure sign-rank test, which is distribution-free under $H_0$. However, if we believe that our distribution is close to normal, we can try to use a combination of the F-test of $H_0$ with the median-type test, which we shall describe in the next subsection. Another possibility would be to combine the F-test with another simple sign-rank test, such as the Wilcoxon. On the other hand, to combine two sign-rank tests (say the median and Wilcoxon ones) makes no sense because the resulting test is distribution-free, and hence would not adapt itself to the underlying distribution.

31.3.1 Adaptive combination of F-test and median-type test

The classical F-test and the median-type test are the counterparts of the LS and LAD estimators, respectively. The classical F-test of $H_0$ can be described by the criterion (31.14)-(31.15) with $\psi(z) = z$, $z \in R^1$. The F-test of $H_0$ is based on the criterion
$$T_n' T_n / \hat\sigma_n^2, \qquad T_n = (nQ_n)^{-1/2} \sum_{i=1}^{n} x_i Y_i, \qquad (31.22)$$
where $\hat\sigma_n^2$ is an estimator of $\sigma^2$. On the other hand, the median test of $H_0$ is also of type (31.14)-(31.15) with
$$M_n = S_n^{+} = (nQ_n)^{-1/2} \sum_{i=1}^{n} x_i\, \tfrac{1}{2}\,\mathrm{sign}\, Y_i. \qquad (31.23)$$

While the F-test is optimal for $f$ normal, the median test is the locally most powerful signed-rank test of $H_0$ for the double exponential distribution of errors. We are looking for the optimal convex combination
$$W_n = (1 - \delta)\,\frac{1}{s}\,T_n + \delta S_n^{+}, \qquad 0 \leq \delta \leq 1. \qquad (31.24)$$
The test is considered optimal when it has the maximum Pitman efficacy over $\delta \in [0,1]$. Notice that the test (31.24) is of the type (31.14) with the $\psi$ function
$$\psi(z) = \frac{1}{s}(1 - \delta)z + \delta\, \tfrac{1}{2}\,\mathrm{sign}\, z, \qquad z \in R^1. \qquad (31.25)$$

As it follows from Section 31.2, $W_n$ is asymptotically normally distributed under the hypothesis $H_0$; more precisely,
$$W_n \xrightarrow{d} N_p(0, \kappa^2 I_p), \qquad (31.26)$$
where
$$\kappa^2 = (1 - \delta)^2\,\frac{\sigma^2}{s^2} + \frac{\delta^2}{4} + \delta(1 - \delta)\,\frac{E_1}{s} = \frac{1}{s^2}\,\sigma^2(\psi, F, \delta) \qquad (31.27)$$
with
$$\sigma^2(\psi, F, \delta) = \frac{s^2}{4}\left\{4(1 - \delta)^2 \sigma_0^2 + 4\delta(1 - \delta)E_1^0 + \delta^2\right\}. \qquad (31.28)$$

The asymptotic distribution of $W_n$ under the Pitman alternative $H_n$ (31.12) will be normal with a nonzero expectation,
$$W_n \xrightarrow{d} N_p(\mu, \kappa^2 I_p), \qquad \mu = Q^{-1/2}\beta_0. \qquad (31.29)$$
If we knew the parameters $E_1$, $\sigma^2$, $s$, we would use the test criterion
$$\mathcal{W}_n = \kappa^{-2}\, W_n' W_n, \qquad (31.30)$$
which is asymptotically $\chi_p^2$ distributed under $H_0$ and asymptotically noncentral $\chi_p^2$ distributed under $H_n$ with the noncentrality parameter $\kappa^{-2}\beta_0' Q^{-1}\beta_0$. The test would have the maximal efficacy for $\delta$ minimizing $\sigma^2(\psi, F, \delta)$ given in (31.28). Thus, we conclude that $\hat\delta_n$, leading to the optimal adaptive combination of the least squares and $L_1$ estimators, leads to the optimal adaptive combination of the F-test and the median-type test.
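A numerical sketch of this combined test follows, on our own simulated data, with crude plug-in estimates of $s$, $\sigma_0^2$ and $E_1^0$ (under $H_0$ the scale may be estimated from the observations directly):

```python
# Combined F/median test of Section 31.3.1 under H0 (beta = 0):
# build T_n and S_n^+ as in (31.22)-(31.23), choose delta by minimizing
# the estimated kappa^2 of (31.27), and refer (31.30) to chi-square.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n, p = 300, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.laplace(size=n)                 # errors only: H0 holds

w, V = np.linalg.eigh(X.T @ X)          # X'X = n Q_n
A = V @ np.diag(w ** -0.5) @ V.T        # (n Q_n)^{-1/2}
Tn = A @ (X.T @ y)                      # (31.22)
Sn = A @ (X.T @ (0.5 * np.sign(y)))     # (31.23)

h = 1.06 * np.std(y) * n ** (-0.2)
s_hat = 2 * h / np.mean(np.abs(y) <= h)         # estimate of s = 1/f(0)
z = y / s_hat
sig0, E10 = np.mean(z ** 2), np.mean(np.abs(z))
kappa2 = lambda d: (1 - d) ** 2 * sig0 + d * (1 - d) * E10 + d ** 2 / 4
grid = np.linspace(0, 1, 1001)
d0 = grid[np.argmin(kappa2(grid))]

Wn = (1 - d0) * Tn / s_hat + d0 * Sn    # (31.24)
stat = float(Wn @ Wn) / kappa2(d0)      # (31.30)
print(d0, stat, chi2.sf(stat, df=p))    # p-value, roughly uniform under H0
```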

31.3.2 Adaptive combination of M-test and median-type test

Let us consider the combination of the M-test (31.14) with the Huber score function and the median-type test (31.23). We shall deal with the mixed criterion
$$W_n = (1 - \delta)\,\frac{1}{\gamma_0}\,M_n + \delta S_n^{+}, \qquad 0 \leq \delta \leq 1. \qquad (31.31)$$
Equation (31.31) implies that, under $H_0$, $W_n$ has an asymptotically normal distribution $N_p(0, \bar\kappa^2 I_p)$, where
$$\bar\kappa^2 = \frac{1}{s^2}\,\sigma^2(F, \delta, k) \qquad (31.32)$$
with $\sigma_0^2 = \int \psi^2(x/s)\, f(x)\,dx$ and
$$\sigma^2(F, \delta, k) = \frac{s^2}{4}\left\{4(1 - \delta)^2\,\frac{\sigma_0^2}{\gamma_0^2} + 4\delta(1 - \delta)\,\frac{E_1^0}{\gamma_0} + \delta^2\right\}. \qquad (31.33)$$

The asymptotic distribution of $W_n$ under the Pitman alternative $H_n$ (31.12) will be normal with a nonzero expectation,
$$W_n \xrightarrow{d} N_p(\mu, \bar\kappa^2 I_p), \qquad \mu = Q^{-1/2}\beta_0. \qquad (31.34)$$
If we knew the parameters $E_1^0$, $\sigma_0^2$, $s$, we would use the test criterion
$$\mathcal{W}_n = \bar\kappa^{-2}\, W_n' W_n = s^2\,\sigma^{-2}(F, \delta, k)\, W_n' W_n, \qquad (31.35)$$
which is asymptotically $\chi_p^2$ distributed under $H_0$, and asymptotically noncentral $\chi_p^2$ distributed under $H_n$ with the noncentrality parameter $\bar\kappa^{-2}\beta_0' Q^{-1}\beta_0$. The test would have the maximal efficacy for $\delta$ minimizing $\bar\kappa^2$. Hence, the optimal $\delta$ would minimize $\sigma^2(F, \delta, k)$ given in (31.33) and would coincide with $\delta_0$ in (31.36). The unknown $E_1^0$, $\sigma_0^2$, and $\gamma_0$ we estimate by $\hat E_1^0$, $\hat\sigma_0^2$, and $\hat\gamma_0$ defined in (31.37)-(31.39).

$$\delta_0 = \frac{4\sigma_0^2 - 2E_1^0\gamma_0}{4\sigma_0^2 - 4E_1^0\gamma_0 + \gamma_0^2}\,, \qquad (31.36)$$
$$\hat E_1^0 = \frac{1}{n s_n}\sum_{i=1}^{n}\big|Y_i - x_i'\hat\beta\big|\, I\big[\big|Y_i - x_i'\hat\beta\big| \leq k s_n\big] + \frac{k}{n}\sum_{i=1}^{n}\Big\{1 - I\big[\big|Y_i - x_i'\hat\beta\big| \leq k s_n\big]\Big\}, \qquad (31.37)$$
$$\hat\sigma_0^2 = \frac{1}{n s_n^2}\sum_{i=1}^{n}\big(Y_i - x_i'\hat\beta\big)^2\, I\big[\big|Y_i - x_i'\hat\beta\big| \leq k s_n\big] + \frac{k^2}{n}\sum_{i=1}^{n}\Big\{1 - I\big[\big|Y_i - x_i'\hat\beta\big| \leq k s_n\big]\Big\}, \qquad (31.38)$$
$$\hat\gamma_0 = \frac{1}{n}\sum_{i=1}^{n} I\big[\big|Y_i - x_i'\hat\beta\big| \leq k s_n\big], \qquad (31.39)$$
where $\hat\beta$ is a preliminary (e.g., LAD) estimator of $\beta$.

Thus we conclude that $\hat\delta_n$, leading to the optimal adaptive combination of the M-estimator and $L_1$-estimators, leads to the optimal adaptive combination of the M-test and the median-type test.
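A compact sketch of the moment estimates and the resulting $\hat\delta_0$ for the Huber score follows; $\hat\beta$ and $s_n$ are assumed supplied by pilot fits, the tuning constant $k = 1.345$ is our illustrative choice, and $\sigma^2(F, \delta, k)$ is minimized on a grid rather than via (31.36).

```python
# Moment estimates in the spirit of (31.37)-(31.39) for Huber's psi_k,
# and the delta minimizing the variance expression (31.33).
import numpy as np

def adaptive_delta(residuals, s_n, k=1.345):
    z = residuals / s_n
    inside = np.abs(z) <= k          # I[|Y_i - x_i' beta_hat| <= k s_n]
    E10 = np.mean(np.abs(z) * inside) + k * np.mean(~inside)
    sig0 = np.mean(z ** 2 * inside) + k ** 2 * np.mean(~inside)
    gam0 = np.mean(inside)           # estimate of gamma_0
    grid = np.linspace(0.0, 1.0, 1001)
    obj = (4 * (1 - grid) ** 2 * sig0 / gam0 ** 2
           + 4 * grid * (1 - grid) * E10 / gam0 + grid ** 2)
    return grid[np.argmin(obj)]      # delta_0 estimate
```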

References
1. Arthanari, T. S. and Dodge, Y. (1981). Mathematical Programming in
Statistics, New York: John Wiley & Sons.

2. Birkes, D. and Dodge, Y. (1993). Alternative Methods of Regression, New


York: John Wiley & Sons.

3. Cook, R. D. and Weisberg, S. (1982). Residuals and Inference in Regres-


sion, London: Chapman & Hall.

4. Cook, R. D. and Weisberg, S. (1994). An Introduction to Regression


Graphics, New York: John Wiley & Sons.

5. Dodge, Y. (1984). Robust estimation of regression coefficient by minimiz-


ing a convex combination of least squares and least absolute deviations,
Computational Statistics Quarterly, 1, 139-153.

6. Dodge, Y. and Jureckova, J. (1988). Adaptive combination ofM-estimator


and L1-estimator in the linear model, In Optimal Design and Analysis of
Experiments (Eds.,Y. Dodge, V. V. Fedorov, and H. P. Wynn), pp. 167-
176, Amsterdam: North-Holland.

7. Dodge, Y. and Jureckova, J. (1991). Flexible L-estimation in the linear


model, Computational Statistics and Data Analysis, 12, 211-220.

8. Dodge, Y. and Jureckova, J. (1995). Estimation of quantile density function based on regression quantiles, Statistics & Probability Letters, 23, 73-78.

9. Dodge, Y. and Jureckova, J. (2000). Adaptive Linear Regression, New


York: Springer-Verlag.

10. Donoho, D. L. and Huber, P. J. (1983). The notion of breakdown point, In A Festschrift for Erich Lehmann (Eds., P. J. Bickel, K. A. Doksum, and J. L. Hodges), pp. 157-184, Belmont, California: Wadsworth.

11. Hampel, F. R. (1968). Contributions to the theory of robust estimation,


Ph.D. Thesis, University of California, Berkeley, CA.

12. Jureckova, J. and Sen, P. K. (1996). Robust Statistical Inference: Asymp-


totic and Interrelations, New York: John Wiley & Sons.

13. Koenker, R. and Bassett, G. (1978). Regression quantiles, Econometrica, 46, 33-50.
32
Partially Parametric Testing

J. C. W. Rayner
University of Wollongong, Wollongong, Australia

Abstract: If a smooth test of goodness-of-fit is applied and the null hypothesis is rejected, the hypothesized probability density function is replaced by a $k$ parameter alternative to the original model. Three examples are given of inference based on such a model:
• S-sample smooth tests for goodness-of-fit
• partially parametric alternative tests to the t-test
• tests for the location of modes.

Keywords and phrases: $k$ parameter alternative, modes, orthonormal functions, score test, S-sample goodness-of-fit, Wald test, Wilcoxon test

32.1 Partially Parametric Inference


The smooth tests of goodness-of-fit, as described in Rayner and Best (1989), are based on an idea of Neyman (1937), and a tool of Pierce [see Kopecky and Pierce (1979) and Thomas and Pierce (1979)]. If we wish to test for a distribution with probability density function $f(x; \beta)$, we imbed it in a $k$-parameter alternative
$$g_k(x; \theta, \beta) = C(\theta, \beta)\, \exp\left\{\sum_{i=q+1}^{q+k} \theta_i h_i(x; \beta)\right\} f(x; \beta), \quad -\infty < x < \infty, \qquad (32.1)$$
that involves $k$ $\theta$s. We test $H_0:\ \theta = 0$ against $K:\ \theta \neq 0$. Here $f(x; \beta)$ involves a $q \times 1$ vector of nuisance parameters $\beta$ (for example, composed of the mean $\mu$ and standard deviation $\sigma$ when testing for normality with unspecified parameters). The $\{h_i(x; \beta)\}$ may be taken to be a set of functions orthonormal on $f(x; \beta)$ and $C(\theta, \beta)$ is a normalizing constant that ensures $g_k(x; \theta, \beta)$ is a proper probability density function. Care must be taken because $C(\theta, \beta)$ may not exist or may


only exist over restricted domains. See Figures 32.1 and 32.2 for examples of
smooth probability density functions.
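To make (32.1) concrete, the sketch below (our own illustration) takes $f$ standard normal with $h_4$ the normalized probabilists' Hermite polynomial and evaluates $C(\theta, \beta)$ by numerical integration; for $\theta_4 > 0$ the integrand diverges and no such constant exists.

```python
# Order-4 smooth alternative to the standard normal: the normalizing
# constant C(theta) is obtained by numerical integration.
import numpy as np

def g4_unnormalized(x, theta4):
    h4 = (x ** 4 - 6 * x ** 2 + 3) / np.sqrt(24.0)   # orthonormal on N(0,1)
    phi = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
    return np.exp(theta4 * h4) * phi

x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]
for t in (-0.2, -0.4, -0.8):         # theta_4 < 0 keeps the density proper
    dens = g4_unnormalized(x, t)
    print(t, 1.0 / (dens.sum() * dx))    # normalizing constant C(theta)
```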
Neyman (1937) used this model with $f(x; \beta)$ the continuous uniform distribution and $\{h_i(x; \beta)\}$ the Legendre polynomials. He assumed there were no nuisance parameters, so the probability integral transform could be used to convert completely specified distributions to uniformity. If testing for normality the orthonormal system is usually taken to be the normalized Hermite-Chebyshev polynomials. Kopecky and Pierce (1979) and Thomas and Pierce (1979) both permitted nuisance parameters but took $h_r(x; \beta) = x^r$, with a consequent lack of convenience.

Here by Partially Parametric Inference I mean inference based on the more complicated probability density function of the form $g_k(x; \theta, \beta)$ rather than $f(x; \beta)$. One way this may arise is if a smooth goodness-of-fit test for $f(x; \beta)$ has been applied and rejected. Often goodness-of-fit testing is preliminary to some other use, or is a retrospective verification that use was valid. So normality may be assessed if the analysis of variance is contemplated. If normality is rejected, the alternative specified by a smooth test of goodness-of-fit is of the form $g_k(x; \theta, \beta)$. This leads to generalizations of the tests recommended in Section 32.3. In general the term Partially Parametric Inference could be used for more general forms. For example, the family $\{h_i(x; \beta)\}$ may not be the orthonormal polynomials; a different form of nesting family other than that specified in (32.1) may be used. An example of the latter occurred in Rayner and Best (1996), where a smooth alternative to independence was proposed and used to find new tests of independence.

If the summation in (32.1) is "full" ($k$ maximal), then virtually all distributions are possible. Pearson's $X^2$ is an example. If the summation is "empty", the probability density function is $f(x; \beta)$ and the inference is parametric. The more terms involved in the summation, the richer the family $g_k(x; \theta, \beta)$ and the more "distribution-free" the inference. The focus here is on having only one or two terms in the summation, towards the parametric end of what might be called the partially parametric continuum. Semiparametric generally has a different meaning. Three examples will be outlined.

32.2 S-Sample Smooth Tests for Goodness-of-Fit


If we wish to assess if $S$ random samples have been drawn from the same population, we may assume each population probability density function has the form (32.1), but with possibly different $\theta$, and test to see if the corresponding $\theta$ in different populations are consistent. See Rayner and Rayner (1997, 1998). One problem with this approach is the choice of $f(x; \beta)$, which we call a target probability density function.

As an example of the testing procedure, consider the following example from


Rayner and Rayner (1998). Teaching assessments from five class surveys are
given in Table 32.1. These results relate to one question in the survey, "Because
of the lecturer, in this subject my enthusiasm has ... ", which was answered in
five categories, labeled from "decreased greatly" (here coded as 0) to "increased
greatly" (here coded as 4).

Table 32.1: Class survey results

Survey    0    1    2    3    4   Total   Mean
S1        0    1    8    6    1      16   2.438
S2        0    0    4   11    6      21   3.095
S3        0    0    4    6    0      10   2.600
S4        1   14   42   19    3      79   2.114
S5        4    6   19   16   12      56   2.456

Table 32.2: Components $V_{rs}$ using a discrete uniform target and normalized Chebyshev polynomials

Survey            Order (r)
(s)          1         2         3
1        1.237    -3.137    -1.591
2        3.549    -0.913    -2.467
3        1.342    -2.646    -2.683
4        0.716    -7.323    -0.636
5        2.435    -2.216    -1.124

A convenient "target" distribution is the discrete uniform on five points. The Chebyshev orthogonal polynomials were used. The testing procedure involves calculating standardized sums $V_{rs}$ to assess $r$th order effects in the $s$th survey. The $V_{rs}$ are asymptotically standard normal and asymptotically mutually independent, and are the same as the components used in one sample testing for goodness-of-fit.
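The components in Table 32.2 can be reproduced from Table 32.1. The sketch below, our reading of the construction just described, builds the orthonormal polynomials on the discrete uniform target by Gram-Schmidt and computes $V_{rs} = \sum_i h_r(x_i)/\sqrt{n_s}$ for each survey.

```python
# Reproduce the components V_rs of Table 32.2 from the survey counts.
import numpy as np

support = np.arange(5, dtype=float)        # categories 0..4
w = np.full(5, 0.2)                        # discrete uniform target weights

def orthonormal_polys(max_order):
    polys = [np.ones(5)]                   # h_0 = 1
    for r in range(1, max_order + 1):
        v = support ** r
        for q in polys:                    # Gram-Schmidt against lower orders
            v = v - np.sum(w * v * q) * q
        polys.append(v / np.sqrt(np.sum(w * v * v)))
    return polys

counts = {1: [0, 1, 8, 6, 1], 2: [0, 0, 4, 11, 6], 3: [0, 0, 4, 6, 0],
          4: [1, 14, 42, 19, 3], 5: [4, 6, 19, 16, 12]}
h = orthonormal_polys(3)
for s, Ns in counts.items():
    Ns = np.array(Ns, dtype=float)
    V = [float(Ns @ h[r]) / np.sqrt(Ns.sum()) for r in (1, 2, 3)]
    print(s, np.round(V, 3))               # matches the rows of Table 32.2
```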
Roughly, the order $r$ statistic reflects $r$th moment differences between the data and the target distribution. Of the five surveys, survey four has the smallest mean and survey two has the greatest mean. All the means are above that of the target distribution, while all the dispersions are below that of the target distribution. This suggests the target may not have been well chosen. If only the first two orders are thought relevant, then the third order terms in Table 32.2 may be neglected, with the caveat that the conclusions only apply in respect of, roughly, location and dispersion effects.

Rayner and Rayner (1997) derive score tests for the situation when hierarchic testing is appropriate; Rayner and Rayner (1998) construct the same test statistics nonhierarchically as contrasts in the $V_{rs}$. From this approach an LSD assessment can be made. Ultimately we find a first order difference between samples two and four, and a second order difference between samples four and five. The location differences suggest the relatively small class responded better than the larger class, a well-known effect. The second order difference between the larger classes reflects greater polarization of one class relative to the other. In fact this was due to using different teaching methods that some students in the polarized class responded well to, while others responded poorly. This effect is not usually accessed by current methods of analysis of such surveys. Using different targets may affect the conclusion, but the example in Rayner and Rayner (1997) shows remarkable robustness to the targets considered.

I suspect the sort of modifications made by the data-driven school to the one sample problem may well be applied profitably to this problem.

32.3 Partially Parametric Alternatives to the t-Test


Assume we have a random sample $X_1, \ldots, X_n$ from a distribution initially thought to be $N(\mu, \sigma^2)$. We wish to test $H_0:\ \mu = 0$ against $K:\ \mu \neq 0$ with $\sigma$ a nuisance parameter. If normality has been rejected, testing could be based on nonparametric tests, but we now consider using the probability density function the goodness-of-fit testing has identified. We retain symmetry and simplicity by including only one $h_i$ in (32.1), and that is $h_4$, giving, roughly, kurtosis differences from normality. This probability density function is denoted by $g(x; \theta_4, \mu, \sigma)$ in general and by $g_4(x)$ in Figure 32.1, which gives probability density functions of this form for varying $\theta_4$. Note that we restrict consideration to $\theta_4 < 0$ to ensure the probability density function is proper.

We now conduct a simulation study to compare, for a random sample of $n = 50$ and a nominal size of 5%, the performances of three tests of $H_0$ against $K$:
• the t-test, which is known to be uniformly most powerful unbiased level $\alpha$ if the data are normal,
• the score test based on $g(x; \theta_4, \mu, \sigma)$ [derived in Carolan and Rayner (2000a)], and
• the Wilcoxon test.

The score test is quite complicated, and no details are given here; instead see Carolan and Rayner (2000a). Sizes and powers are based on 5,000 simulations.

Our simulations show that for all $\theta_4$ the test sizes are comparable, that
Partially Parametric Testing 429

9=0.2
___ ._._ 9 = 0.4
0.5
------ 9 =0.8
- - 9=-0.4

"
_ ... _ ..- 9=-0.8 ,-, ~

0.4 I \ " "


• ---- 9=-1.6 I
I
\
, ,I \,
,/
\

, ,.
I \
\
,
\

0.1
~., .
, I

0.0
-2 -1 ox 1 2

Figure 32.1: The probability distribution function g(x; ()4, 0,1) for varying val-
ues of ()4

when ()4 = 0 there is no real difference between the power curves, and that
as ()4 increases the score test quickly becomes dominant. The Wilcoxon test
is inferior to the t-test for -1.2 < fh < 0 and thereafter becomes the more
powerful of the two. The results are most effectively shown graphically. See
Figure 32.3.
It is interesting to note that one of the early criticisms of Pearson's test
was that when it rejected a model, no alternative model was identified. The
presentation of Pearson's test as a smooth test [see Rayner and Best (1989,
Theorem 5.1.2)] overcomes this objection. The study reported above shows
there may be considerable power gains available if testing is based on the model
identified by goodness-of-fit testing.
A score test based on this model is not the only option. In fact Carolan
and Rayner (2000b) found a Wald test to be preferable in a number of ways.
In particular the Wald test was found to have better power properties for non-
local alternatives. While the chi-squared asymptotic distribution is quite sat-
isfactory for the score test with moderate to large sample sizes, resampling is
recommended to obtain p-values for the Wald test. Both these tests require the
numerical evaluation of maximum likelihood estimates and computational con-
siderations limit the number of parameters that can be included in the model
(particularly if resampling is to be used for p-values). In general it is imprac-
tical to include more than two extra parameters. Fortunately, Carolan and
Rayner (2000b) showed that including just θ₄ can lead to large power gains
and in general the gains to be made by including further parameters are small
for data from all but the most extremely nonnormal distributions (including θ₆
may be worthwhile for distinctly trimodal data).

Figure 32.2: Probability density function of the bimodal distribution given by Equation (32.1) with modes at 0 and 4.32
This work has been extended to cover the completely randomized, random-
ized blocks and balanced incomplete block designs. Papers on these designs are
in preparation.

32.4 Tests for the Location of Modes


There is some literature on tests for the number of modes, but little on the
location of modes. To construct such tests we consider probability density
functions of the form (32.1) with σ = 1 and q = 1; including the θ₂ term
models dispersion differences from σ = 1. In addition, for the probability
density function to be proper, q + k must be even, θ₂ < 0.5 and θ_{q+k} < 0. If
also h_r(x; β) = (x - μ)^r then Carolan and Rayner (2001) show that μ is a mode
of the distribution. There may be up to k/2 modes.
Score and various Wald tests of H₀ : μ = μ₀ against K : μ ≠ μ₀ may be
derived. One issue is, when there are several modes, which maximum likelihood
estimator should be used? Carolan and Rayner (2001) suggest, depending upon the
aims of the data analyst, either choosing the nearest mode of the fitted distri-
bution or the mode of maximum height. Here we present some power curves
using the nearest mode approach. Again, due to the inadequacies of asymptotic
approximations, resampling would be required to calculate p-values in practice.
As an example, consider testing H₀ : μ = 0 against K : μ ≠ 0 given a
sample of size n (taking the values 20, 50 and 100 below) from a bimodal
distribution of the form (32.1) with modes at 0 and 4.32. To achieve this we
put q = 1 (β = μ), k = 3 and θ = (0, 0.25, -0.03)^T. See Figure 32.2.
The power functions have the interesting but sensible property of having
power approximately equal to the size at both modes. It seems we are not so
much testing "are the data consistent with a population mode of zero?" as "are
the data consistent with a population mode?" See Figure 32.4.
This procedure addresses a problem previously given very little attention in
the literature.

Figure 32.3: Comparison of t-test, Wilcoxon test and score test power curves for testing H₀ : μ = 0 against K : μ ≠ 0 as the data become progressively more nonnormal (panels for θ₄ = 0, -0.4, -0.8, -1.2, -1.6; power plotted against μ)

Figure 32.4: Comparison of power curves of the Wald test using the nearest mode technique for samples of size 20 (solid), 50 (dashes) and 100 (dots) from the bimodal distribution in Figure 32.2 above; 1000 simulations

References

1. Carolan, A. M. and Rayner, J. C. W. (2000a). One sample tests of location
for nonnormal symmetric data, Communications in Statistics - Theory
and Methods, 29, 1569-1581.

2. Carolan, A. M. and Rayner, J. C. W. (2000b). Wald tests of location for
symmetric nonnormal data, Biometrical Journal, 42, 777-792.

3. Carolan, A. M. and Rayner, J. C. W. (2001). One sample score tests for
the location of modes of nonnormal data, Journal of Applied Mathematics
and Decision Sciences, 5, 1-19.

4. Kopecky, K. J. and Pierce, D. A. (1979). Efficiency of smooth goodness-
of-fit tests, Journal of the American Statistical Association, 74, 393-397.

5. Neyman, J. (1937). Smooth test for goodness of fit, Skandinavisk Aktua-
rietidskrift, 20, 150-199.

6. Rayner, J. C. W. and Best, D. J. (1989). Smooth Tests of Goodness of Fit,
Oxford University Press: Oxford, England.

7. Rayner, J. C. W. and Best, D. J. (1996). Smooth extensions of Pearson's
product moment correlation and Spearman's rho, Statistics and Proba-
bility Letters, 30, 171-177.

8. Rayner, J. C. W. and Rayner, G. D. (1997). S-sample smooth goodness
of fit tests, Mathematical Scientist, 22, 106-116.

9. Rayner, J. C. W. and Rayner, G. D. (1998). S-sample smooth goodness of
fit tests: rederivation and Monte Carlo assessment, Biometrical Journal,
40, 651-663.

10. Thomas, D. R. and Pierce, D. A. (1979). Neyman's smooth goodness-
of-fit test when the hypothesis is composite, Journal of the American
Statistical Association, 74, 441-445.
33
Exact Nonparametric Two-Sample Homogeneity
Tests

Jean-Marie Dufour and Abdeljelil Farhat


CIRANO and C.R.D.E., Université de Montréal, Montréal, Québec, Canada

Abstract: In this paper, we study several tests for the equality of two un-
known distributions. Two are based on empirical distribution functions, three
on nonparametric probability density estimates, and the last ones on differences
between sample moments. We suggest controlling the size of such tests (under
nonparametric assumptions) by using permutational versions of the tests jointly
with the method of Monte Carlo tests properly adjusted to deal with discrete
distributions. In a simulation experiment, we show that this technique provides
perfect control of test size, in contrast with usual asymptotic critical values.

Keywords and phrases: Monte Carlo tests, goodness-of-fit tests, nonparametric methods, two-sample problem

33.1 Introduction
A common problem in statistics consists in testing whether the distributions
of two random variables are identical against the alternative that they differ
in some way. More precisely, we consider two random samples X_1, ..., X_n and
Y_1, ..., Y_m such that F(x) = P[X_i ≤ x], i = 1, ..., n, and G(x) = P[Y_j ≤ x],
j = 1, ..., m. In this paper, we do not wish to impose additional restrictions on
the form of the cumulative distribution functions (cdf) F and G, which may be
continuous or discrete. We consider the problem of testing the null hypothesis
H₀ : F = G against the alternative H₁ : F ≠ G.
H₀ is a nonparametric hypothesis, so testing H₀ requires a distribution-free
procedure. Thus, many users faced with such a comparison resort to
a goodness-of-fit test, usually the two-sample Kolmogorov-Smirnov (KS) test
[Smirnov (1939)] or the Cramér-von Mises (CM) test [see Lehmann (1951),
Rosenblatt (1952) and Fisz (1960)]. Other procedures that have been suggested
include permutation tests based on L₁ and L₂ distances between kernel-type
estimators of the relevant probability density functions (pdf) [Allen (1997)] and
tests based on the difference of the means of the two samples considered [see
Pitman (1937), Dwass (1957), Efron and Tibshirani (1993)]. Except for the last
procedure, which is designed to have power against samples that differ through
their means, the exact and limiting distributions of the test statistics are not
standard, and tables for the exact distributions are only available for a limited
number of sample sizes. Thus these tests are usually performed with the help
of tables based on asymptotic distributions. This leads to procedures that do
not have the targeted size (which can easily be too small or too large) and may
have low power.
In this paper, we aim at finding test procedures with two basic features.
Namely, the latter should be: (1) truly distribution-free, irrespective of whether
the underlying distribution F is discrete or continuous, and (2) exact in finite
samples (i.e., they must achieve the desired size even for small samples). In this
respect, it is important to note that the finite and large sample distributions of
usual test statistics are not necessarily distribution-free under H₀. In particu-
lar, while the KS and CM statistics are distribution-free when the observations
are i.i.d. with a continuous distribution, this is no longer the case when they
follow a discrete distribution. For the statistics based on kernel-type density
estimators, distribution-freeness does not hold even for i.i.d. observations with
a continuous distribution. This difficulty can be circumvented by considering a per-
mutational version of these tests, which uses the fact that all permutations of
the pooled observations are equally likely when the observations are i.i.d. with a
continuous distribution. The latter property, however, does not hold when the
observations follow a discrete distribution. So none of the procedures proposed
to date for testing H₀ satisfies the double requirement of yielding a test that is
both distribution-free and exact.
Given recent progress in computing power, a way to solve this difficulty
consists in using simulation-based methods, such as bootstrapping or Monte
Carlo tests. The bootstrap technique, however, does not ensure that the level will be
fully controlled in finite samples. For this reason, we favor Monte Carlo (MC)
test methods. MC tests were introduced by Dwass (1957) and Barnard (1963)
and developed by Birnbaum (1974), Dufour (1995), Kiviet and Dufour (1996),
Dufour et al. (1998), and Dufour and Kiviet (1998).
In this paper, we first show how the size of all the two-sample homogene-
ity tests described above can be perfectly controlled for both continuous and
discrete distributions by considering their permutational distribution and using
the technique of MC tests properly adjusted to deal with discrete distributions.
As a result, in order to implement these tests, it is no longer necessary to
establish the distributions of the test statistics, either in finite samples or as-
ymptotically. Second, as a consequence of the great flexibility allowed by the
MC test technique in selecting test statistics, we suggest alternative procedures,
in particular: (i) a statistic based on the L∞ distance between kernel-type pdf
estimators; (ii) extensions of the permutational test based on the difference of
two-sample means to higher order moments, such as sample variances, asym-
metry (based on third moments) and kurtosis sample coefficients. Thirdly, we
present the results of an MC experiment showing clearly that usual large-sample
critical values do not control size, while the MC versions of the tests achieve
this aim perfectly. Further, we see that the new procedures introduced can
yield power gains, irrespective of whether the distributions are continuous
or discrete.
Section 33.2 presents the test statistics studied. In Section 33.3, we explain
how the technique of MC tests can be applied with all the statistics considered
to control the size of the corresponding tests under nonparametric assumptions.
Section 33.4 describes the results of our study, first for continuous distributions
and then for discrete distributions. We conclude in Section 33.5.

33.2 Test Statistics


Let X_1, ..., X_n be a sample of independent and identically distributed (i.i.d.)
observations with common cdf F(x) = P[X_i ≤ x] and Y_1, ..., Y_m a sample of
i.i.d. observations with cdf G(x) = P[Y_j ≤ x]. The problem is to test the
homogeneity hypothesis H₀ and, to that end, our study includes the
following test statistics. In all the tests presented below, H₀ is rejected when
the test statistic is large.
The first two criteria are the KS and the CM statistics. The KS test was
introduced by Smirnov (1939) and uses the statistic

    KS = sup_x |F_n(x) - G_m(x)|                                    (33.1)

where F_n(x) and G_m(x) are the usual empirical distribution functions (edf)
associated with the X and Y samples respectively. It is well known that KS
is distribution-free [see Conover (1971, page 313)] under H₀ when the common
distribution function F is continuous, but its exact and limiting distributions
are not standard. Birnbaum and Hall (1960) provide tables for its exact dis-
tribution in the case of equal sample sizes, whereas Massey (1952) does the
same for unequal sample sizes. Further, it is important to note that KS is not
distribution-free when F is a discrete distribution.
The two-sample CM statistic is defined as

    CM = [nm/(n + m)²] [Σ_{i=1}^{n} (F_n(X_i) - G_m(X_i))² + Σ_{j=1}^{m} (F_n(Y_j) - G_m(Y_j))²].   (33.2)
CM is also distribution-free under H₀ with F continuous and, again, the exact
and limiting null distributions of CM are not standard. Anderson (1962) and
Burr (1963, 1964) provide tables for the exact distribution in the case of small
sample sizes (n + m ≤ 17). Otherwise, a table of the asymptotic distribution is
available from Anderson and Darling (1952).
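For concreteness, a sketch of the two statistics follows; the CM expression coded below is the standard two-sample form and is assumed to agree with (33.2).

    # KS and CM statistics for two samples x and y.
    import numpy as np

    def edf(sample):
        s = np.sort(sample)
        return lambda t: np.searchsorted(s, t, side="right") / len(s)

    def ks_cm(x, y):
        n, m = len(x), len(y)
        Fn, Gm = edf(x), edf(y)
        pooled = np.concatenate([x, y])
        d = Fn(pooled) - Gm(pooled)
        ks = np.max(np.abs(d))                      # (33.1): sup attained at data points
        cm = n * m / (n + m) ** 2 * np.sum(d ** 2)  # standard two-sample CM
        return ks, cm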
The next three statistics are based on distances (L₁, L₂ and L∞) between
kernel-based pdf estimators. If f is the pdf associated with the cdf F, Allen
(1997) considered the kernel-type density estimator

    f̂_n(x) = (C_X/n) Σ_{i=1}^{n} K[C_X (x - X_i)]                  (33.3)

where

    C_X = n^{1/5}/(2 S_X),   K(x) = 1/2 if |x| ≤ 1, K(x) = 0 if |x| > 1,

and S_X = [(n - 1)^{-1} Σ_{i=1}^{n} (X_i - X̄)²]^{1/2} is the usual estimator of the popula-
tion standard deviation. If g is the pdf associated with the cdf G, its estimator
ĝ_m(x) is defined in a way analogous to (33.3). The L₁-distance test initially
proposed by Allen (1997) is based on the statistic

    L₁ = Σ_{i=1}^{n} |f̂_n(X_i) - ĝ_m(X_i)| + Σ_{j=1}^{m} |f̂_n(Y_j) - ĝ_m(Y_j)|.   (33.4)

The L₂-distance and L∞-distance tests are based on the statistics

    L₂ = Σ_{i=1}^{n} [f̂_n(X_i) - ĝ_m(X_i)]² + Σ_{j=1}^{m} [f̂_n(Y_j) - ĝ_m(Y_j)]²,   (33.5)

    L∞ = sup_x |f̂_n(x) - ĝ_m(x)|.                                  (33.6)

For the case where both F and G are discrete, the pdfs f and g are replaced
by the probability mass functions p(x) = P[X = x] and q(x) = P[Y = x];
each one is estimated with the help of formula (33.3), after which f̂_n and ĝ_m are
replaced by these estimators in the L₁, L₂ and L∞ statistics. In contrast
with the KS and CM statistics, the finite sample distributions of the statistics
L₁, L₂ and L∞ are not distribution-free even when the distribution function F
is continuous, as well as (a fortiori) when F is discrete.
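A sketch of the estimator (33.3) and the three distance statistics is given below; the L∞ supremum is approximated over the pooled data points, which is an assumption of the sketch rather than the exact definition in (33.6).

    # Kernel density estimates with the uniform kernel K and C_X = n^{1/5}/(2 S_X),
    # and the L1, L2 and L-infinity statistics of (33.4)-(33.6).
    import numpy as np

    def kernel_pdf(sample):
        s = np.asarray(sample, float)
        n = len(s)
        C = n ** 0.2 / (2.0 * s.std(ddof=1))            # C_X
        def f_hat(t):
            u = C * (np.atleast_1d(t)[:, None] - s)     # C_X (x - X_i)
            return (C / n) * 0.5 * np.sum(np.abs(u) <= 1, axis=1)  # K = 1/2 on [-1, 1]
        return f_hat

    def distance_statistics(x, y):
        fn, gm = kernel_pdf(x), kernel_pdf(y)
        pts = np.concatenate([x, y])
        d = fn(pts) - gm(pts)
        return np.sum(np.abs(d)), np.sum(d ** 2), np.max(np.abs(d))  # L1, L2, Linf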
The next statistic to enter our study is the difference of the sample means,
θ̂₁ = X̄ - Ȳ. Permutation tests based on θ̂₁ were initially proposed by Fisher
(1935) and used by Dwass (1957) for testing the equality of means, but Efron
and Tibshirani (1993, Chap. 15) suggested extending their use, along with
bootstrap tests, to testing the equality of two unknown distributions. Contrary
to Allen (1997), who also considered bootstrap tests, the statistic based on the
studentized difference of sample means,

    t = (X̄ - Ȳ) / [S (1/n + 1/m)^{1/2}],                          (33.7)

where S is the pooled sample standard deviation, will not be considered, since
our study is restricted to permutation tests and it is straightforward to see that
such tests based on θ̂₁ and t are equivalent [see, for instance, Lehmann (1986)].
Further, we suggest here alternative test statistics based on comparing
higher-order moments; namely, the difference between unbiased estimators of
the sample variances,

    θ̂₂ = (n - 1)^{-1} Σ_{i=1}^{n} (X_i - X̄)² - (m - 1)^{-1} Σ_{j=1}^{m} (Y_j - Ȳ)²,

as well as statistics based on comparing sample skewness and kurtosis coeffi-
cients:

    θ̂₃ = μ̂₃(X)/μ̂₂(X)^{3/2} - μ̂₃(Y)/μ̂₂(Y)^{3/2},                  (33.8)

    θ̂₄ = μ̂₄(X)/μ̂₂(X)² - μ̂₄(Y)/μ̂₂(Y)²,                           (33.9)

where

    μ̂_r(X) = n^{-1} Σ_{i=1}^{n} (X_i - X̄)^r,   μ̂_r(Y) = m^{-1} Σ_{j=1}^{m} (Y_j - Ȳ)^r.   (33.10)

Note that skewness and kurtosis coefficients play a central role in testing nor-
mality [see Jarque and Bera (1987) and Dufour et al. (1998)].
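These moment comparisons are immediate to compute; the sketch below uses the central-moment forms of (33.8)-(33.10).

    # theta1 (mean difference), theta2 (difference of unbiased variance
    # estimators), theta3 (skewness difference) and theta4 (kurtosis difference).
    import numpy as np

    def moment_stats(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        def m(s, r):                        # r-th sample central moment
            return np.mean((s - s.mean()) ** r)
        theta1 = x.mean() - y.mean()
        theta2 = x.var(ddof=1) - y.var(ddof=1)
        theta3 = m(x, 3) / m(x, 2) ** 1.5 - m(y, 3) / m(y, 2) ** 1.5
        theta4 = m(x, 4) / m(x, 2) ** 2 - m(y, 4) / m(y, 2) ** 2
        return theta1, theta2, theta3, theta4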

33.3 Exact Randomized Permutation Tests


Except for the Dwass (1957) procedure, all the tests described in the previous
section involve imperfectly tabulated null distributions or are not distribution-
free in finite samples. Consequently, they may lead to arbitrarily large size
distortions. In view of obtaining distribution-free tests with known size in finite
samples, we first note that truly distribution-free tests (for any given sample
size) can be based on the statistics KS, CM, L₁, L₂, L∞, t, θ̂₁, θ̂₂, θ̂₃ and θ̂₄ by
considering the distribution obtained on permuting in all possible ways (with
equal probabilities) the m + n grouped observations X_1, ..., X_n, Y_1, ..., Y_m.
Since these permutations are equally probable under the null hypothesis H₀,
irrespective of the unknown distribution F, any test which rejects H₀ by using
an exact critical value obtained from its permutational distribution [i.e., its
conditional distribution given the order statistics of the grouped observations]
will have the same level conditionally (on the order statistics) as well as
unconditionally.
If T designates a pivotal test statistic (i.e., its distribution does not depend
on unknown parameters under the null hypothesis), we can proceed as follows
to conduct an MC test. Denote by T₀ the test statistic computed from the
observed sample. When the null hypothesis is rejected for large values of T₀, the
associated critical region of size α may be expressed as G(T₀) ≤ α, where G(x) =
P[T ≥ x | H₀] is the p-value function. Generate N independent samples (X_{i1},
..., X_{in}, Y_{i1}, ..., Y_{im}), 1 ≤ i ≤ N, drawn from the specified null distribution
F₀. This leads to N independent realizations T_i = T(X_{i1}, ..., X_{in}, Y_{i1}, ...,
Y_{im}), 1 ≤ i ≤ N, from which we can compute an empirical p-value function:

    p̂_N(x) = [N Ĝ_N(x) + 1] / (N + 1)                              (33.11)

where Ĝ_N(x) = N^{-1} Σ_{i=1}^{N} 1_{[x,∞)}(T_i) and

    1_A(x) = 1 if x ∈ A,   1_A(x) = 0 if x ∉ A.

The associated MC critical region is defined as

    p̂_N(T₀) ≤ α                                                    (33.12)

where p̂_N(T₀) may be interpreted as an estimate of G(T₀). When T has a
continuous distribution, it can be shown that [see Dufour (1995)]:

    P[p̂_N(T₀) ≤ α | H₀] = I[α(N + 1)] / (N + 1),   0 ≤ α ≤ 1,      (33.13)

where I[x] denotes the largest integer not exceeding x. Thus if N is chosen
such that α(N + 1) is an integer, the critical region (33.12) has the same size as
the critical region G(T₀) ≤ α. The MC test so obtained is theoretically exact,
irrespective of the number N of replications used.
The above procedure is closely related to the parametric bootstrap, with a
fundamental difference however. Bootstrap tests are, in general, provably valid
for N → ∞. In contrast, we see from (33.13) that N is explicitly taken into
consideration in establishing the validity of MC tests. Although the value of
N has no incidence on size control, it may have an impact on power, which
typically increases with N.
Note that (33.13) holds for tests based on statistics with continuous dis-
tributions; for statistics with discrete distributions, ties have non-zero probability.
Nevertheless, the technique of MC tests can be adapted to discrete distributions
by appealing to the following randomized tie-breaking procedure [see Dufour
(1995, Section 2.2)]. Draw N + 1 uniformly distributed variates U₀, U₁, ..., U_N,
independently of the T_i's, and arrange the pairs (T_i, U_i) following the
lexicographic order:

    (T_i, U_i) ≤ (T_j, U_j)  if and only if  T_i < T_j, or T_i = T_j and U_i ≤ U_j.

Then, proceed as in the continuous case and compute

    p̃_N(x) = [N G̃_N(x) + 1] / (N + 1),

where

    G̃_N(T₀) = N^{-1} Σ_{i=1}^{N} 1_{[T_i > T₀]} + N^{-1} Σ_{i=1}^{N} 1_{[T_i = T₀]} 1_{[U_i ≥ U₀]}.

The resulting critical region p̃_N(T₀) ≤ α has the same level as the region
G(T₀) ≤ α, provided again that α(N + 1) is an integer. More precisely,

    P[p̃_N(T₀) ≤ α | H₀] = I[α(N + 1)] / (N + 1),   0 ≤ α ≤ 1.
If a null hypothesis ensures that the random sample is made up of exchange-
able variables and if it should be rejected for large values of the test statistic,
an MC test of that hypothesis is carried out in five steps: first, the test statis-
tic is computed with the help of the observed sample, which gives a value T₀,
say; second, N permutations of the sample are chosen at random and without
replacement from all possible permutations; third, the test statistic is recom-
puted for each of the permuted samples, which gives the values T₁, ..., T_N, say;
fourth, if R₀ designates the rank of T₀ among the set {T₀, T₁, ..., T_N} [in the
case of ties, one may resort to the randomization method suggested by Dufour
(1995)], the p-value associated with the MC test of the null hypothesis is given
by 1 - R₀/(N + 1); lastly, a decision is reached according to the chosen level
[see Dufour (1995)]. The fact that the procedure is randomized plays a central
role in controlling the size of the test. In bootstrap-type procedures, one does
as if the number of replications were infinite.
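The five steps translate directly into code. The sketch below (an illustration under the stated assumptions, not the authors' implementation) draws the N permutations as independent random permutations, a common simplification of the second step, and returns the empirical p-value of (33.11).

    # MC permutation test with randomized tie-breaking; `stat` may be any of
    # the statistics of Section 33.2.  alpha*(N+1) should be an integer.
    import numpy as np

    def mc_permutation_pvalue(x, y, stat, N=99, rng=None):
        rng = rng or np.random.default_rng()
        n, pooled = len(x), np.concatenate([x, y])
        T = np.empty(N + 1)
        T[0] = stat(x, y)                           # step 1: observed T0
        for i in range(1, N + 1):                   # steps 2-3: permuted Ti
            perm = rng.permutation(pooled)
            T[i] = stat(perm[:n], perm[n:])
        U = rng.uniform(size=N + 1)                 # tie-breaking variates
        order = np.lexsort((U, T))                  # lexicographic (Ti, Ui) order
        R0 = int(np.where(order == 0)[0][0]) + 1    # step 4: rank of (T0, U0)
        # step 5: empirical p-value of (33.11); it equals the 1 - R0/(N+1)
        # of the text up to the 1/(N+1) smoothing term
        return (N + 2 - R0) / (N + 1)

Rejecting H₀ whenever the returned p-value is at most α then gives, by (33.13), a test of exact size α provided α(N + 1) is an integer.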

33.4 Simulation Study


In the simulation study presented here, all tests, original as well as MC, were per-
formed at the 5% level using 10,000 trials. This entails that the 95% confidence
interval for the nominal level is [4.57%, 5.43%]. Furthermore, they were all
conducted with equal sample sizes m = n = 22. As mentioned earlier, each MC
test was carried out by picking at random N = 99 permutations of the original
grouped sample, and this was done by using the IMSL Program Library random
number generator. In his simulation study, Allen (1997) used 2500 trials and
each permutation or bootstrap test was carried out with 499 samples. A more
extensive set of simulation results is available in Dufour and Farhat (2000).
For the first part of the study, where F and G are both continuous, the fol-
lowing distributions were considered: normal N(0, 1), exponential Exp(0, 1.5),
gamma Γ(2, 1), beta B(2, 3), logistic Log(-1, 1), lognormal Λ(4, 1.5) and uni-
form U(0, 1). In this choice, care was taken to have at the same time simple
parameters as well as appreciably different means and variances. Table 33.1
gives the list of those means and variances. From this preliminary list, four
situations were considered: (i) the distributions were standardized, and thus
had common null mean and unit variance; (ii) the distributions were centered
to have common null mean but possibly different variances; (iii) the distribu-
tions were scaled to have common unit variance but different means; (iv) the
distributions remained with different means and different variances. Whatever
the situation, a null hypothesis is obtained each time F and G share the same
distribution from the list, and an alternative hypothesis is obtained each time
F and G possess different distributions from that list.

Table 33.1: Continuous distributions with their means and variances

Distribution    N      Exp     Γ      B       Log          Λ        U
Mean            0      1.5     2      0.4     -1           168.17   0.5
Variance        1      2.25    2      0.04    0.55133⁻²    240055   1/12

For the second part of the study, where F and G are discrete, the five most
commonly used distributions were retained: discrete uniform (DU), binomial
(Bin), geometric (Geo), negative binomial (Nbin) and Poisson (P). Since it is
a prohibitive task to find parameters that will simultaneously give rise to either
common mean and common variance, the following three situations were con-
sidered: (i) the distributions were DU(19), Bin(20, 0.5), Geo(0.1), Nbin(8, 0.2),
P(10), and thus had common mean 10 and variance 30, 5, 90, 2.5 and 10, respec-
tively; (ii) the distributions were DU(10), Bin(33, 0.5), Geo((√34 - 1)/16.5),
Nbin(3, (√108 - 3)/16.5), P(8.25), and thus had mean 5.5, 16.5, 3.42, 2.23
and 8.25, respectively, but common variance 8.25; (iii) the distributions were
DU(10), Bin(10, 0.1), Geo(0.3), Nbin(10, 0.2), P(5), and thus had mean 5.5, 1,
3.33, 50 and 5, respectively, and variance 8.25, 0.9, 7.78, 200 and 5, respectively.
As a first check on the accuracy of our study, Tables 1 and 2 of Allen (1997)
were redone, adding, however, the CM, the L∞ and the MC tests based on
higher moments, and excluding the bootstrap tests. The results appear in
Table 33.2 and they are quite similar to those of Allen (1997).

Table 33.2: Empirical level and power for tests of equality of two distributions:
m = 22, n = 22 and α = 5%

F = N(0, 1)
             Original tests                        MC tests
G             KS     CM      KS     CM     θ̂₁     θ̂₂     θ̂₃     θ̂₄     L₁     L₂     L∞
N(0, 1)       6.2    5.2     4.8    5.1    5.0    4.9    4.7    5.4    4.8    4.8    4.7
N(0.2, 1)     7.7    6.7     6.2    6.3    6.4    4.7    5.0    5.1    5.3    5.2    5.1
N(0.3, 1)    10.8    9.5     8.6    9.0    9.8    4.8    5.2    4.9    6.8    6.7    6.7
N(0.4, 1)    16.1   15.8    13.5   15.0   16.2    4.7    5.6    5.5   10.2   10.1    9.2
N(0.5, 1)    32.8   34.1    28.3   32.9   36.1    3.9    6.1    5.9   19.3   18.7   17.1
N(0.7, 1)    54.3   57.5    48.9   55.9   60.3    3.1    6.1    6.5   34.8   33.9   30.8
N(0, 1.2²)    7.3    5.9     5.8    5.8    5.1   11.6    4.7    5.1    9.9   10.1   10.0
N(0, 1.4²)    9.1    6.8     7.2    6.6    5.0   28.5    4.2    5.2   22.7   23.4   23.6
N(0, 1.6²)   12.5    9.7    10.5    9.8    5.1   49.9    4.0    4.5   42.0   42.6   42.9
N(0, 1.8²)   15.4   11.5    13.1   11.6    5.3   66.2    3.2    3.9   58.9   59.7   59.8
N(0, 2²)     20.4   16.1    17.3   15.6    5.1   80.2    2.6    3.4   74.6   75.3   74.5

Tables 33.3 and 33.4 contain the results of our study. The following con-
clusions can be drawn. As expected, the MC tests control size perfectly and
are easily applicable. The original KS and CM tests, for which tables are
available, show size distortions. Although, in the case where both F and G are
continuous, the CM test appears adequate and the KS test only exhibits slight
size distortions, the distortions become severe when both F and G are discrete.
The use of θ̂₁ to carry out equality tests of two distributions is erroneous.
It is obvious that two distributions cannot be equal if they do not have the
same mean, but the converse is not true. Consequently, if the test based on θ̂₁
accepts the hypothesis H₀, it should not be interpreted as an acceptance of the
fact that F = G but rather that these distributions have equal means. The
L₁ and L₂ tests behave almost identically and differ slightly from the L∞ test.

In the same way, the power of the KS test is not very different from that of
the CM test. In general, if we compare the powers of the tests based on edfs
(the KS and CM tests) with those based on pdf estimates (the L₁, L₂ and L∞ tests),
we notice a great difference, and we cannot conclude that a test stemming from
one group is more powerful than all the tests in the other group. The edf tests
are more powerful than those based on pdf estimates when two distributions
have the same variance but different means. On the other hand, if the two
distributions have the same mean but different variances, the tests based on
pdf estimates are the most powerful.
As for the case where both F and G are discrete, the results for two of the
three situations are presented successively in Table 33.4. As in the case where
the distributions were continuous, the conclusions reached for the MC tests still
apply. Moreover, the simulation confirms the result stated by Noether (1967)
indicating that, if the random variables are discrete, the KS test is still valid but
becomes conservative. On the other hand, it reveals that the CM test is quite
often liberal, although Conover (1971) indicates that it has a tendency to be
conservative in the case of discrete distributions.

33.5 Conclusion
In this paper, we first showed that finite-sample distribution-free two-sample
homogeneity tests, for both continuous and discrete distributions, can be eas-
ily obtained by combining two techniques: (1) considering permutational
versions of most tests proposed for that problem; (2) implementing the per-
mutation procedures as Monte Carlo tests with an appropriate tie-breaking
technique to take account of the discreteness of the test null distributions. Sec-
ond, due to the flexibility of the Monte Carlo test technique, we could easily
introduce and implement several alternative procedures, such as permutation
tests comparing higher-order moments. Other alternative procedures are de-
scribed in Dufour and Farhat (2000). Thirdly, in a simulation study, it was
shown that the procedures proposed work as expected from the viewpoint of
size control, while the new suggestions made yielded power gains.

Note: This paper is a summary of Dufour and Farhat (2000).



Table 33.3: Empirical level and power for MC tests of equality of two continuous
distributions having same mean and same variance: m = n = 22 and α = 5%

          Original tests                        MC tests
G          KS     CM      KS     CM     θ̂₁     θ̂₂     θ̂₃     θ̂₄     L₁     L₂     L∞
F = N
N           6.2    5.2     4.8    5.1    5.0    4.9    4.7    5.4    4.8    4.8    4.7
Exp        16.1   12.8    13.6   12.6    5.6   10.6   41.7   15.7   17.4   17.2   15.7
Γ          11.0    8.4     8.8    8.2    5.2    7.9   26.8   10.1   11.3   11.4   11.0
B           7.1    5.6     5.7    5.7    4.7    5.6    7.7    7.4    6.3    6.2    5.9
Log         6.4    5.2     5.1    5.0    4.7    5.5    5.4    6.3    5.2    5.4    5.4
Λ          77.3   69.1    71.6   65.0    6.0   59.0   70.3   63.1   68.8   67.9   65.6
U           8.2    5.9     6.7    5.9    5.2    6.1    6.7   17.1    6.7    6.8    7.2
F = Exp
Exp         6.1    5.0     4.9    5.0    4.9    5.2    5.4    5.2    4.8    4.9    4.8
Γ           7.0    5.7     5.7    5.5    5.0    6.3    7.6    6.2    6.2    6.1    6.4
B          13.6    9.8    11.6    9.9    5.1   12.8   36.6   20.9   15.6   16.0   16.1
Log        16.7   13.3    14.1   12.8    5.2    8.9   36.5   11.5   15.1   14.7   13.6
Λ          88.7   76.3    84.8   72.2    5.0   49.9   30.6   31.8   55.2   55.0   55.0
U          19.0   13.6    16.2   13.1    5.0   16.7   55.7   35.3   22.1   22.4   23.5
F = Γ
Γ           5.9    4.9     4.7    4.8    4.9    5.2    4.9    5.5    5.0    5.1    5.0
B           9.0    7.1     7.4    6.7    5.0    8.9   20.2   11.7    9.4    9.5    9.6
Log        11.3    8.7     9.2    8.7    4.8    6.5   22.1    6.5    9.1    9.1    8.6
Λ          84.3   72.5    80.1   67.8    5.2   53.6   43.0   43.4   60.7   60.1   59.7
U          12.8    9.4    10.6    9.0    5.2   12.6   35.9   25.6   15.0   15.2   15.5
F = B
B           6.5    5.6     5.2    5.5    5.3    5.2    5.1    4.9    5.3    5.2    5.4
Log         7.6    5.3     6.2    5.4    4.5    6.4    8.2   11.4    6.8    6.7    6.8
Λ          83.7   76.1    78.3   71.4    5.8   63.3   74.4   69.7   72.3   71.4   69.9
U           6.6    5.2     5.5    5.1    5.0    5.8    8.0   10.3    5.9    5.8    6.1
F = Log
Log         6.4    5.2     5.0    5.1    5.1    4.7    4.5    4.6    4.7    4.7    4.6
Λ          72.8   64.6    66.2   60.6    5.8   56.9   60.1   52.7   66.4   65.4   63.3
U           9.6    6.9     8.0    6.7    5.0    8.2    9.4   25.6    9.2    9.3    9.6
F = Λ
Λ           6.2    4.9     4.8    4.8    4.7    5.0    5.1    5.0    5.0    5.0    4.9
U          87.9   82.4    82.3   77.2    5.7   65.5   91.0   83.8   76.6   75.7   73.2

Table 33.4: Empirical level and power for tests of equality of two discrete
distributions: m = n = 22 and α = 5%

Same mean but different variances
          Original tests                        MC tests
G          KS     CM      KS     CM     θ̂₁     θ̂₂     θ̂₃     θ̂₄     L₁     L₂     L∞
F = UD
UD          3.8    5.4     4.7    5.1    5.1    5.1    4.8    5.5    5.2    5.2    4.9
Bin        38.3   50.0    51.5   45.1    5.5   99.2    7.0   13.3   96.8   96.9   97.1
Geo        14.1   19.4    16.6   17.9    6.5   35.6   34.6    8.7   22.3   23.2   26.0
BinN       69.2   79.3    82.6   71.3    5.5   100.   25.1   20.8   99.9   99.9   99.9
Poi        18.5   22.6    26.5   21.3    5.2   82.6   12.8   24.8   70.1   71.6   72.0
F = Bin
Bin         2.3    6.0     5.2    5.1    5.1    5.2    4.8    5.1    5.0    5.1    5.0
Geo        75.6   86.9    82.6   82.8    8.2   99.6   13.1    1.2   99.7   99.7   99.7
BinN        4.7   10.6    10.3    8.0    5.3   33.2   17.1   10.6   28.6   29.0   27.5
Poi         3.9    9.5     8.1    8.3    5.2   29.9    4.6    4.7   22.2   22.5   22.5
F = Geo
Geo         3.8    5.1     4.8    4.7    4.6    4.8    4.8    4.6    4.8    4.8    4.8
BinN       94.4   95.7    97.1   91.8    8.4   99.9    1.3    2.7   100.   100.   100.
Poi        53.4   61.3    61.6   57.8    7.8   94.3   10.9    2.4   93.3   93.9   93.8
F = BinN
BinN        1.8    9.7     5.3    5.4    5.2    5.0    4.7    5.2    5.1    5.1    5.1
Poi        11.5   27.5    24.2   21.8    5.1   78.7   10.2    9.8   63.5   64.7   66.7

Different means but same variance
F = UD
UD          2.9    5.6     4.9    4.9    4.9    5.2    4.8    5.2    4.9    4.9    4.8
Bin        100.   100.    100.   100.   100.   62.0   20.1   11.2   99.8   99.8   99.9
Geo        64.0   84.4    70.9   79.4   68.2   13.7   67.0   52.4   64.1   63.3   60.5
BinN       100.   100.    100.   100.   100.    0.2   27.1   33.1   100.   100.   100.
Poi        10.6   15.7    17.3   13.9    9.6   33.9   15.1   26.9   26.5   27.4   26.4
F = Bin
Bin         0.9   23.3     5.1    5.1    5.1    4.9    4.6    4.9    4.9    4.9    4.8
Geo        84.5   95.3    94.9   84.2   99.6   15.1    3.5    0.9   65.8   66.8   71.3
BinN       100.   100.    100.   100.   100.    0.0    7.8   83.1   100.   100.   100.
Poi        100.   100.    100.   100.   100.    7.0   13.4    9.8   100.   100.   100.
F = Geo
Geo         2.1   12.3     4.8    4.7    4.7    4.9    4.8    4.9    4.7    4.6    4.7
BinN       100.   100.    100.   100.   100.    8.0   33.2   77.0   100.   100.   100.
Poi        68.3   79.2    76.1   74.5   57.1   12.8   40.7   23.6   55.0   55.6   56.1
F = BinN
BinN        1.9    7.8     5.2    5.1    5.2    4.8    4.8    5.1    5.0    4.9    5.0
Poi        100.   100.    100.   100.   100.    0.1   15.6   49.9   100.   100.   100.

References

1. Allen, D. L. (1997). Hypothesis testing using an L1-distance bootstrap,
The American Statistician, 51(2), 145-150.

2. Anderson, T. W. (1962). On the distribution of the two-sample Cramér-
von Mises criterion, Annals of Mathematical Statistics, 33, 1148-1159.

3. Anderson, T. W. and Darling, D. A. (1952). Asymptotic theory of cer-
tain "goodness-of-fit" criteria based on stochastic processes, Annals of
Mathematical Statistics, 23, 193-212.

4. Barnard, G. A. (1963). Discussion on M. S. Bartlett, The spectral analysis
of point processes, Journal of the Royal Statistical Society, Series B, 25,
294.

5. Birnbaum, Z. W. (1974). Reliability and Biometry, pp. 441-445, Philadel-
phia: SIAM.

6. Birnbaum, Z. W. and Hall, R. A. (1960). Small sample distributions
for multi-sample statistics of the Smirnov type, Annals of Mathematical
Statistics, 31, 710-720.

7. Burr, E. J. (1963). Distribution of the two-sample Cramér-von Mises
criterion for small equal samples, Annals of Mathematical Statistics, 34,
95-101.

8. Burr, E. J. (1964). Small sample distributions of the two-sample Cramér-
von Mises' W² and Watson's U², Annals of Mathematical Statistics, 35,
1091-1098.

9. Conover, W. J. (1971). Practical Nonparametric Statistics, New York:
John Wiley & Sons.

10. Dufour, J.-M. (1995). Monte Carlo tests with nuisance parameters: A
general approach to finite sample inference and nonstandard asymptotics
in econometrics, Discussion paper, C.R.D.E., Université de Montréal.

11. Dufour, J.-M. and Farhat, A. (2000). Exact nonparametric two-sample
homogeneity tests for possibly discrete distributions, Discussion paper,
C.R.D.E. and CIRANO, Université de Montréal.

12. Dufour, J.-M., Farhat, A., Gardiol, L. and Khalaf, L. (1998). Simulation
based finite sample normality tests in linear regressions, The Econometrics
Journal, 1, 154-173.

13. Dufour, J.-M. and Kiviet, J. F. (1998). Exact inference methods for first
order autoregressive distributed lag models, Econometrica, 66, 79-104.

14. Dwass, M. (1957). Modified randomization tests for nonparametric hy-
potheses, Annals of Mathematical Statistics, 28, 181-187.

15. Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap,
Vol. 57, New York: Chapman & Hall.

16. Fisher, R. A. (1935). The Design of Experiments, London: Oliver and
Boyd.

17. Fisz, M. (1960). On a result by M. Rosenblatt concerning the Mises-
Smirnov test, Annals of Mathematical Statistics, 31, 427-429.

18. Jarque, C. M. and Bera, A. K. (1987). A test for normality of observations
and regression residuals, International Statistical Review, 55, 163-172.

19. Kiviet, J. F. and Dufour, J.-M. (1996). Exact tests in single equation au-
toregressive distributed lag models, Journal of Econometrics, 20(2), 325-
353.

20. Lehmann, E. L. (1951). Consistency and unbiasedness of certain non-
parametric tests, Annals of Mathematical Statistics, 22, 165-179.

21. Lehmann, E. L. (1986). Testing Statistical Hypotheses, 2nd Edition, New
York: John Wiley & Sons.

22. Massey, F. J. (1952). Distribution table for the deviation between two
sample cumulatives, Annals of Mathematical Statistics, 23, 435-441.

23. Noether, G. E. (1967). Elements of Nonparametric Statistics, New York:
John Wiley & Sons.

24. Pitman, E. J. G. (1937). Significance tests which may be applied to
samples from any populations, Journal of the Royal Statistical Society,
Series A, 4, 119-130.

25. Rosenblatt, M. (1952). Limit theorems associated with variants of the
von Mises statistic, Annals of Mathematical Statistics, 23, 617-623.

26. Smirnov, N. V. (1939). Sur les écarts de la courbe de distribution em-
pirique (Russian/French summary), Rec. Math., 6, 3-26.
34
Power Comparisons of Some Nonparametric Tests
for Lattice Ordered Alternatives in Two-Factor
Experiments

Thu Hoang and Van L. Parsons


Université René Descartes, Paris, France
National Center for Health Statistics, Hyattsville, Maryland

Abstract: In biological or medical situations the expectations of response


variables may be ordered by a rectangular grid partial ordering. For example,
serum glucose as a function of body mass index and age would typically be as-
sumed to be nondecreasing in each predictive variable. An order-restricted least
squares approach to hypothesis testing may be implemented, but the practical
implementation of estimation techniques and sampling theory tend to be com-
plicated. However, advances in computer processing have now made computer
intensive methods for such inference more practical.
In this paper we consider the problem of testing trend under the rectangular
grid ordering whenever we have small sample sizes and nonparametric sampling
assumptions. Our focus is upon the behavior of rank-based test statistics,
in particular, the order-restricted version of the Kruskal-Wallis statistic. We
compare the power of test statistics generated by order-restricted least squares
to the power of more traditionally defined statistics.

Keywords and phrases: Isotonic regression, rectangular grid

34.1 Introduction
Data from two-factor completely random designs are typically analyzed using
classical ANOVA methods. However, in certain two-factor experiments, the
response is expected to increase as the levels of both factors increase. (Note,
decreasing expectations and/or levels can be reparameterized to satisfy this
framework.) This knowledge allows the researcher to implement order-restricted
data analysis techniques on the experimental data. The usage of such techniques


should result in testing procedures having more power than unrestricted F-tests.
For example, consider a two-factor design for controlling high blood pressure
by a hypertension treatment at increasing dosage and a regimen of physical
exercise at increasing levels. Blood pressure should be expected to decrease
if levels of both treatments increase. In such cases, we say that experimental
responses are stochastically ordered with respect to the lattice order on the two
factors. Discussion of such experiments appears in Higgins and Bain (1999),
and we adopt some of their notation in presenting the analytical structures.
More precisely, consider a two-way experiment in which the increasing levels
of the factors A and B are labeled by i ∈ {1, 2, ..., I} and j ∈ {1, 2, ..., J},
respectively. We refer to the resulting I·J pairs of levels as the I × J cross-
classification grid. A stochastic ordering of the experimental response random
variable, X_ij, defined on the I × J grid can be imposed, and for this paper the
case of distributional shift will be considered. We assume that the response has
the form X_ij = μ_ij + e_ij, where the e_ij's are independent identically distributed
random errors having a continuous distribution function. The trend function,
μ_ij, on the I × J grid will be said to be isotone with respect to the lattice
ordering ≼ on the grid if μ_ij ≤ μ_kl whenever (i, j) ≼ (k, l), i.e., i ≤ k and j ≤ l.
In experimental situations involving small sample sizes and weak distri-
butional assumptions about the samples, nonparametric procedures may be
appropriate. Most of the previous work on nonparametric tests of ordered al-
ternatives against homogeneity have been discussed for one-way layouts in com-
pletely random design or randomized complete block designs [Terpstra (1952),
Jonckheere (1954), Buning and Kassler (1996), Chacko (1963) and Shorack
(1967)]. Ordered two-factor completely random designs have been investigated
to a lesser degree [Ager and Brent (1978) and Higgins and Bain (1999)]. Chacko
(1963) and Shorack (1967) used an order-restricted least squares (ORLS) ap-
proach [see Robertson, Wright, and Dykstra (1988) for general discussion] to
define test statistics. For two-factor designs we focus on a Kruskal-Wallis-type
test statistic defined by ORLS procedures [see Robertson, Wright, and Dykstra
(1988, Chapter 4.5)]. For this case the test statistic and its null distribution are
difficult to compute, and thus, its usage has been deferred. Recent advances
in computer processing have made ORLS methods more practical. With non-
parametric tests, complicated null distributions can be simulated, and statistics
involving heavy computation can be put to practical use. The authors are not
aware of a power study of nonparametric ORLS tests such as the one herein.

34.2 Hypotheses and Test Statistics


Let X_ijk, k = 1, 2, ..., n_ij, be the sample of size n_ij in cell (i, j) of the I × J
cross-classification grid. The null hypothesis, H₀ : μ_ij = μ, is to be tested
against the alternative hypothesis, H₁: the μ_ij's are isotone with respect to
the lattice order on the I × J grid, and strict inequality holds for at least one
pair of cells.
Three types of "traditional" test statistics for trend will be assessed on
the rectangular grid: Jonckheere-Kendall, lattice-rank, and linear rank type
statistics. The two-factor Jonckheere-type statistic is of the form

    J = Σ_{a ≺ b} U_{a,b}

where U_{a,b} is the Mann-Whitney two-sample test statistic evaluated at distinct
grid points a ≺ b.
Higgins and Bain (1999) considered a Kendall-type statistic on the I × J
grid. For one-factor experiments, it has been shown [see Robertson, Wright,
and Dykstra (1988, p. 202)] that the Kendall and Jonckheere statistics are
equivalent, and the argument can be extended to the two-factor case. Higgins
and Bain (1999) also discussed a Spearman-type lattice rank statistic

    V* = Σ_{i=1}^{I} Σ_{j=1}^{J} ij Σ_{k=1}^{n_ij} R_L(X_ijk)

where R_L(X_ijk) is the lattice rank of X_ijk. The lattice rank for an observation
in cell (i, j) is its rank among all observations in cells that are comparable to
(i, j), i.e., all X_ghk such that (g, h) ≼ (i, j) or (i, j) ≼ (g, h).
Traditional linear rank statistics can also be defined for a lattice ordering,
for example,

    L = Σ_{i=1}^{I} Σ_{j=1}^{J} c_ij Σ_{k=1}^{n_ij} R_ijk

where the c_ij are isotone coefficients with respect to the lattice order. We chose
c_ij = (√i + √j)² = i + j + 2√(ij) to represent monotonic main effects and
interaction structure. The case c_ij = i + j turned out to be quite similar.
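A sketch of the three statistics follows (an illustration, not the authors' Fortran code); the container `cells[i][j]`, holding the observations X_ijk of cell (i, j) with 0-based indices, and the helper names are ours.

    import numpy as np
    from scipy.stats import rankdata

    def mann_whitney_U(lower, upper):
        # U_{a,b}: pairs with the upper-cell value exceeding the lower-cell
        # value (ties counted 1/2; immaterial for continuous data)
        lo, up = np.asarray(lower)[:, None], np.asarray(upper)[None, :]
        return np.sum(up > lo) + 0.5 * np.sum(up == lo)

    def jonckheere_J(cells):
        I, J = len(cells), len(cells[0])
        grid = [(i, j) for i in range(I) for j in range(J)]
        return sum(mann_whitney_U(cells[a[0]][a[1]], cells[b[0]][b[1]])
                   for a in grid for b in grid
                   if a != b and a[0] <= b[0] and a[1] <= b[1])

    def lattice_rank_Vstar(cells):
        I, J = len(cells), len(cells[0])
        V = 0.0
        for i in range(I):
            for j in range(J):
                own = list(cells[i][j])
                others = [x for g in range(I) for h in range(J)
                          if (g, h) != (i, j)
                          and ((g <= i and h <= j) or (i <= g and j <= h))
                          for x in cells[g][h]]
                # lattice ranks R_L of the cell's own observations among all
                # observations in cells comparable to (i, j)
                V += (i + 1) * (j + 1) * rankdata(own + others)[:len(own)].sum()
        return V

    def linear_rank_L(cells):
        I, J = len(cells), len(cells[0])
        R = rankdata([x for row in cells for cell in row for x in cell])
        L, pos = 0.0, 0
        for i in range(I):
            for j in range(J):
                c = (np.sqrt(i + 1) + np.sqrt(j + 1)) ** 2  # c_ij = i + j + 2 sqrt(ij)
                L += c * R[pos:pos + len(cells[i][j])].sum()
                pos += len(cells[i][j])
        return L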
The statistics J, V* and L all reject H₀ when large values are attained.
These three statistics will be compared to a nonparametric version of the ORLS-
motivated statistic, χ̄²₀₁, for testing homogeneity against ordered alternatives
[see Robertson, Wright, and Dykstra (1988, Chapter 2)]. A general ORLS
version of χ̄²₀₁ can be presented as follows:
If {(T_i, n_i/σ²)}_{i=1}^{k} are random variables and specified weights associated
with the k elements of a finite partially ordered set (S, ≼), then a test statistic
for testing H₀: the E(T_i) are constant versus H₁: the E(T_i) satisfy the partial
ordering (not all constant) is

    χ̄²₀₁ = σ^{-2} Σ_{i=1}^{k} n_i (T*_i - T̄)²

where (T*₁, T*₂, ..., T*_k) is the isotonic regression of {(T_i, n_i/σ²)}_{i=1}^{k} [see Robert-
son, Wright, and Dykstra (1988, Definition 1.3.3)] and T̄ is the weighted grand
mean of the T_i's. One rejects H₀ for large values of χ̄²₀₁.
For rank-transformed data with Wilcoxon scores, Chacko (1963) and Shorack
(1967) proposed analogues of the χ̄²₀₁ test statistic, which can be thought of as
ORLS versions of the Kruskal-Wallis test statistic. Shiraishi (1982) provided
generalizations that cover scores other than Wilcoxon scores. For our study
let R_ijk denote the Wilcoxon-scored rank of X_ijk based upon all n = Σ_{i,j} n_ij
observations on the I × J grid. We will also consider the median scores:

    φ(R_ijk) = 0 if 0 < R_ijk < (n + 1)/2,
    φ(R_ijk) = 1 if (n + 1)/2 ≤ R_ijk ≤ n.

The latter transformation is often used when sampling from heavy tailed
distributions. For each cell (i, j) we compute the cell means of the scores

    R̄_ij = Σ_{k=1}^{n_ij} R_ijk / n_ij,   φ̄_ij = Σ_{k=1}^{n_ij} φ(R_ijk) / n_ij.

Two ORLS nonparametric statistics considered for testing H₀ versus H₁ are

    χ̄²₀₁(R) = [12/(n(n + 1))] Σ_{i=1}^{I} Σ_{j=1}^{J} n_ij (R̄*_ij - (n + 1)/2)²,

    χ̄²₀₁(φ_R) = [(n - 1)/(n p̄(1 - p̄))] Σ_{i=1}^{I} Σ_{j=1}^{J} n_ij (φ̄*_ij - p̄)²,

where R̄*_ij and φ̄*_ij denote the isotonic regressions of the cell score means, and
p̄ = 1/2 for n even and p̄ = 1/2 + 1/(2n) for n odd.
For additional motivation and null distribution properties for these non-
parametric statistics the reader is referred to Robertson, Wright, and Dykstra
(1988, pp. 204-206).
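A sketch of χ̄²₀₁(R) follows. The bivariate isotonic regression is computed here by Dykstra's alternating projections onto the row-monotone and column-monotone cones, each projection being a weighted one-dimensional pool-adjacent-violators fit; this is one standard approach, not necessarily the Qian and Eddy (1996) algorithm used in Section 34.3, and it assumes every cell is nonempty.

    import numpy as np
    from scipy.stats import rankdata

    def pava(y, w):
        # weighted pool-adjacent-violators: nondecreasing least squares fit
        vals, wts, cnts = [], [], []
        for yi, wi in zip(map(float, y), map(float, w)):
            vals.append(yi); wts.append(wi); cnts.append(1)
            while len(vals) > 1 and vals[-2] > vals[-1]:
                wt = wts[-2] + wts[-1]
                vals[-2] = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / wt
                wts[-2] = wt; cnts[-2] += cnts.pop(); vals.pop(); wts.pop()
        return np.repeat(vals, cnts)

    def matrix_isotonic(y, w, sweeps=200):
        # Dykstra's alternating projections; converges to the weighted LS
        # projection onto matrices nondecreasing in both indices
        x = y.astype(float)
        p, q = np.zeros_like(x), np.zeros_like(x)
        for _ in range(sweeps):
            z = x + p
            xr = np.vstack([pava(z[i], w[i]) for i in range(z.shape[0])])
            p = z - xr
            z = xr + q
            xc = np.column_stack([pava(z[:, j], w[:, j]) for j in range(z.shape[1])])
            q = z - xc
            x = xc
        return x

    def chibar01_R(cells):
        # cells[i][j] = list of X_ijk; returns chi-bar^2_01(R) of Section 34.2
        I, J = len(cells), len(cells[0])
        R = rankdata([x for row in cells for cell in row for x in cell])
        n = len(R)
        nij = np.array([[len(cells[i][j]) for j in range(J)] for i in range(I)], float)
        Rbar, pos = np.empty((I, J)), 0
        for i in range(I):
            for j in range(J):
                k = len(cells[i][j])
                Rbar[i, j] = R[pos:pos + k].mean(); pos += k
        Rstar = matrix_isotonic(Rbar, nij)          # isotonized mean ranks
        return 12.0 / (n * (n + 1)) * np.sum(nij * (Rstar - (n + 1) / 2) ** 2)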

34.3 Test Statistic Power Evaluations


Any nonparametric comparison among the test statistics involves the dimen-
sions of the I × J grid, the sample sizes per cell, and sampling distributions
under an ordered alternative. We have run simulations under the following
sampling distributional assumptions to give an indication of relative power.

1. We considered I = 5 and J = 5 to specify 25-cell grids with 1 and 4


observations per grid cell.

2. Shift alternative distributions, X_ijk = μ_ij + e_ijk, where the e_ijk were indepen-
dent identically distributed random errors from the families:
standard Gaussian, double exponential, H = Z exp(hZ²/2) for h = 0.5, 1, 2
[see Hoaglin, Mosteller, and Tukey (1985)], and Gamma(γ) with density
function ∝ x^{γ-1} e^{-x}, γ = 1/2, 1 (exponential), 2, were considered.
All the distributions for the error, e, were scaled to have unit variance
except for the H distributions, which were scaled so that P(-1 < e <
1) = 0.6826 to agree with a standard Gaussian. The Gamma distributions
were selected to generate a skewed error.

3. The shape of the trend function, μ_ij, on the I × J grid was determined
to be the most important factor in power comparisons, but having two
factors now squares the order of magnitude for the number of cases needing
consideration over that of one dimension. For a two-factor design, the
response as a function of the experimental levels is μ_ij = θ + α_i + β_j +
γ_ij, where α and β are the main effects and γ the interaction. To represent
a broad range of increasing trends on the grid we considered simple
discretized monotonic functions on the unit square and simple monotone
step functions. The following list provides some basic trend shapes on the
25-cell grid which should provide some insight into the relative merits of the
proposed statistics.

Tr1: One step: μ₅₅ = 1, μ_ij = 0 elsewhere

Tr2: Additive effect threshold: μ_ij = 1 for i + j > 6, μ_ij = 0 elsewhere

Tr3: Border one step: μ₅ⱼ = 1, j = 1, 2, ..., 5, μ_ij = 0 elsewhere

Tr4: Angle corner one step: μ_ij = 0 for i ≤ 4, j ≤ 4, μ_ij = 1 elsewhere

Tr5: Angle three steps: μ_ij = 0 for i, j ≤ 3; μ_ij = 1 for i ≤ 3, j ≥ 4; μ_ij = 2
for i ≥ 4, j ≤ 3; μ_ij = 3 for i ≥ 4, j ≥ 4

Tr6: Uniform one main effect: μ_ij ∝ (j - 1)

Tr7: Uniform two main effects: μ_ij ∝ (i - 1) + (j - 1)

Tr8: Early effects: μ_ij ∝ (i - 1)^{1/2} + (j - 1)^{1/2}

Tr9: Late effects: μ_ij ∝ (i - 1)² + (j - 1)²

Tr10: Late effects + interaction: μ_ij ∝ [(i - 1) + (j - 1)]²

Tr11: Early + late effects: μ_ij ∝ (i - 1)^{1/2} + (j - 1)²

Our choice of the terms "early" and "late" for trends Tr8-Tr11 refers to the
position of the effects producing maximum change. Trends Tr1-Tr4 represent
extreme cases of trend functions taking just two values.

4. We scaled the trend shapes of 3. above as in Robertson, Wright, and Dyk-
stra (1988, Section 2.5) to represent a distance from the null hypothesis.
Denoting the basic shape trends as {μ₀,ij}_{I×J}, for δ = 1, 2, 3, 4 we
defined scaled grids whose distance from the null hypothesis equals δ.
This transformation provided some degree of standardization among dif-
ferent trend functions and standardization for different sample sizes.

5. Based upon 100,000 simulations using a Fortran uniform random number
generator and code [11], an approximate size α = 0.01 critical region was
established for each statistic. The χ̄²₀₁ statistics were computed using the
algorithm of Qian and Eddy (1996). The alternative distribution simu-
lations were based upon 10,000 simulations. Publicly available Fortran
algorithms from DATAPAC (1986) were used. Furthermore, for a fixed error
variable distribution, we used the same e_ijk's for different μ_ij's and test
statistics. In general, this technique introduced a strong positive simula-
tion correlation among the estimated powers for the different statistics.
For example, the estimated correlations between the estimated powers of the
χ̄²₀₁(R) and J statistics were frequently 0.5 or greater for μ_ij close to H₀.
This simulation feature helps to reduce the sampling error when making
comparisons.

Comparing test statistics by their power levels is straightforward when the
isotone function μ_ij, error distribution and sample sizes are fixed, but com-
paring over different specifications becomes difficult. To facilitate global com-
parisons over different H₁ alternatives, a measure of relative efficiency will be
used. John Tukey (personal communication) suggested the following measure
for small sample nonparametric tests:

    eff(S', S) = [Φ⁻¹(P_{H₁}(S' ∈ C')) - Φ⁻¹(P_{H₀}(S' ∈ C'))]² / [Φ⁻¹(P_{H₁}(S ∈ C)) - Φ⁻¹(P_{H₀}(S ∈ C))]²

where P_{H₀}(S ∈ C) is the size of the S test with critical region C, P_{H₁}(S ∈ C) is
the power of the S test for a distribution F ∈ H₁, and Φ⁻¹ is the Gaussian
quantile.
It should be noted that the traditional measure of efficiency is the ratio of
sample sizes for two statistics needed to achieve the same power. The measure
used above is not of that form; here both statistics are based upon the same
sample sizes. However, some motivation for using eff for "large" sample sizes
and shift alternatives can be informally stated as follows.
If P_H(S ∈ C) ≈ Φ([E(S_H) - c(n, α)]/σ₀), then

    [Φ⁻¹(P_{H₁}(S ∈ C)) - Φ⁻¹(P_{H₀}(S ∈ C))]² ≈ [E(S_{H₁}) - E(S_{H₀})]² / σ₀².

This quantity can be considered the distance of H₁ to H₀, and is often
proportional to the sample size n. Under such conditions the magnitude of eff
above would be somewhat consistent with a traditional measure of test statistic
efficiency. For the small sample situation we consider herein, we believe that
the magnitudes of eff and traditional efficiency would exhibit similar patterns.
Another reason to use eff instead of power alone is that this measure adjusts
for the possibly different α-levels of the competitor statistics. For small sample
sizes the discrete nature of the test statistics results in nominal α-levels being
somewhat different from the achieved α-level.
Our main focus in this study is the comparison of the ORLS statistics to
the more classically defined competitors. The χ̄²₀₁(R) statistic is well established
and can be considered an omnibus test statistic for trend. In our work we let
S = χ̄²₀₁(R) in the definition of eff and let S' be a competitor statistic.
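The measure is immediate to evaluate from simulated sizes and powers, as in the sketch below; the illustrative check uses the nominal size 0.01 in place of the achieved sizes, so it only roughly reproduces the tabulated value.

    # eff of a competitor S' relative to the baseline S; norm.ppf is the
    # Gaussian quantile Phi^{-1}.
    from scipy.stats import norm

    def eff(size_comp, power_comp, size_base, power_base):
        num = norm.ppf(power_comp) - norm.ppf(size_comp)
        den = norm.ppf(power_base) - norm.ppf(size_base)
        return (num / den) ** 2

    # Table 34.1, uniform gradient, normal errors, delta = 3: Jonckheere power
    # 0.66 against KW power 0.61 gives eff of about 1.10 (tabulated: 1.09)
    print(eff(0.01, 0.66, 0.01, 0.61))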

34.4 Results and Conclusions


Our simulations are too numerous to present here, but Tables 34.1 and 34.2
provide examples of the types of power and efficiencies obtained. Trend function
shapes, firstly, and error variable distributions, secondly, are the most important
factors in making global assessments about the different tests. Tables 34.1 and
34.2 present the two trend functions most likely to be of practical concern: a
two-main-effects model and a one-main-effect model. Tables 34.3 and 34.4
provide the ranges of efficiencies over different shaped trend functions and
error variables representing degrees of tail weight and skewness. The tabulated
efficiencies are best used to discern general patterns of test statistic superiority.
In Tables 34.3 and 34.4 we specify for each combination of trend function and
distribution the superior statistic(s) in the column "Test choice". As a caveat
we note that large efficiencies may be observed when the powers of both χ̄²₀₁(R)
and its competitor are small, e.g., both powers of the order of 0.10, but superiority
in such a case would be of little practical concern. Furthermore, for both
powers small, the simulation error of the efficiency (see Appendix) could make
a simulated 10% difference not significant. In our tables the results for the
symmetric distributions with δ = 1 tend to result in small powers for all the
statistics, and interpretation of the magnitudes of efficiencies must be treated
with caution. Some conclusions are:
• Except for very heavy tailed distributions, the Jonckheere statistic per-
forms relatively well when the trend function μ_ij is strictly increasing in
both levels i and j. For example, in Tables 34.1 and 34.2 for the uniform
two-main-effects model, Tr7, the Jonckheere statistic shows substantial
increases in power over the χ̄²₀₁(R) statistic. While our targeted cell sam-
ple sizes were limited to n = 1 and n = 4, there appears to be a stronger
relative superiority for the larger n. This superior behavior is also appar-
ent for trends Tr7-Tr11 in Tables 34.3 and 34.4. While the L test and V*
test exhibit superiority in some cases, the Jonckheere statistic appears to
perform well over a wider range of strictly increasing trend shapes.

• The one-step trend shapes Tr2, Tr3, Tr4 and Tr6 reverse the superiority
just discussed. For these trend shapes the χ̄²₀₁(R) statistic shows superior
power over the J, L and V* statistics. The border one-step trend, Tr3,
appears to be an extreme case. The one-main-effect trend, Tr6, is detailed
in Tables 34.1 and 34.2.

• The angle three-step trend, Tr5, represents a degree of trend somewhere
between the one-step and strictly increasing trends. Here, the J and
V* statistics tend to have greater power than the χ̄²₀₁(R) statistic, but
reversals are more frequent than for the stronger trends, especially for
n = 1.

• Some caution must be used with the interpretation of the one-step trend,
Tr1. Here, the cell parameter μ₅,₅ can be thought of as the only cell that
deviates from H₀. The reduction of the data to ranks results in very
low power for small sample sizes, even as μ₅,₅ → ∞. For trend Tr1 the
maximum power was of the order 0.05 for n = 1 and about 0.20 for n = 4.
When comparing the two different sample sizes for this case, the patterns
of eff were less consistent than for the other trend shapes.

• The V* statistic is defined to detect strong trends that occur during late
levels of the experimental treatments. Its best performance was for the
"late effects + interaction" trend, Tr10, and the one-step trend, Tr1.
These observations are consistent with the discussion presented in Higgins
and Bain (1999).

• As expected, for the very heavy tailed distribution, H with h = 2, the
χ̄²₀₁(φ_R) statistic tends to have superior power over the competitor statistics.
This is most noticeable for the n = 4 sample size.

In conclusion, the study seems to suggest that if extremely heavy tailed distri-
butions are discounted, then the Jonckheere statistic or the χ̄²₀₁(R) statistic would
be a reasonable choice of test statistic for testing a broad range of two-factor or-
dered alternatives. The Jonckheere statistic might be favored if the researcher
believed that an increase in any one factor level should result in a strictly in-
creased response. For experiments where the effectiveness of one factor or of many
levels is questionable, the χ̄²₀₁(R) statistic would be favored.

Table 34.1: Power and efficiency of test statistics compared to the isotonized
Kruskal-Wallis statistic for α = 0.01, 5 × 5 grids and one observation per
cell (n = 1)

                              Efficiency                      Power
Distribution   Delta  Jonc   Lin R  V*     Isomed    KW     Jonc   Lin R  V*     Isomed

Uniform gradient (two effects):
normal         1      1.07   1.07   1.03   0.67      0.07   0.08   0.08   0.08   0.05
               3      1.09   1.07   1.01   0.63      0.61   0.66   0.64   0.61   0.39
               4      1.08   1.08   1.01   0.61      0.86   0.90   0.89   0.87   0.63
exponential    1      1.07   1.03   0.83   0.56      0.17   0.19   0.17   0.14   0.09
               3      1.15   1.10   1.04   0.74      0.79   0.86   0.84   0.81   0.65
               4      1.18   1.12   1.09   0.75      0.93   0.96   0.95   0.95   0.83
gamma(.5)      1      1.11   1.01   0.85   0.68      0.33   0.37   0.33   0.28   0.22
               3      1.19   1.11   1.07   0.84      0.87   0.93   0.91   0.90   0.80
               4      1.24   1.13   1.10   0.85      0.95   0.98   0.97   0.97   0.91
gamma(2)       1      1.05   1.01   0.89   0.60      0.11   0.12   0.11   0.10   0.07
               3      1.11   1.08   1.02   0.67      0.72   0.77   0.76   0.73   0.52
               4      1.13   1.10   1.08   0.68      0.91   0.94   0.94   0.93   0.75
H(h=.5)        1      1.12   1.10   1.09   1.03      0.07   0.08   0.08   0.08   0.07
               3      1.15   1.12   1.05   0.90      0.48   0.56   0.53   0.50   0.43
               4      1.18   1.12   1.04   0.89      0.67   0.76   0.73   0.70   0.61
H(h=1)         1      1.12   1.09   1.06   1.24      0.08   0.09   0.09   0.08   0.10
               3      1.21   1.13   1.06   1.08      0.44   0.54   0.50   0.47   0.47
               4      1.22   1.13   1.04   1.07      0.59   0.70   0.66   0.62   0.62
H(h=2)         1      1.16   1.11   1.06   1.51      0.11   0.13   0.12   0.12   0.17
               3      1.30   1.17   1.06   1.31      0.42   0.54   0.49   0.44   0.54
               4      1.30   1.15   1.04   1.28      0.53   0.67   0.60   0.55   0.65

Uniform steps (one effect):
normal         1      0.88   0.88   0.90   0.64      0.05   0.05   0.04   0.05   0.04
               3      0.78   0.73   0.74   0.63      0.39   0.31   0.28   0.29   0.24
               4      0.73   0.67   0.67   0.60      0.67   0.53   0.48   0.48   0.43
exponential    1      0.88   0.79   0.70   0.57      0.10   0.09   0.08   0.07   0.06
               3      0.81   0.70   0.70   0.74      0.60   0.51   0.44   0.44   0.46
               4      0.82   0.68   0.71   0.75      0.80   0.72   0.62   0.64   0.66
gamma(.5)      1      0.92   0.76   0.68   0.70      0.20   0.19   0.15   0.13   0.14
               3      0.90   0.72   0.76   0.86      0.72   0.67   0.55   0.58   0.64
               4      0.89   0.71   0.77   0.87      0.86   0.82   0.71   0.75   0.80
gamma(2)       1      0.91   0.86   0.78   0.61      0.06   0.06   0.06   0.05   0.04
               3      0.79   0.71   0.68   0.69      0.50   0.40   0.35   0.34   0.34
               4      0.77   0.67   0.67   0.69      0.74   0.63   0.54   0.55   0.56
H(h=.5)        1      0.92   0.93   0.92   1.11      0.05   0.05   0.04   0.04   0.05
               3      0.86   0.77   0.79   0.88      0.31   0.27   0.23   0.24   0.27
               4      0.86   0.75   0.78   0.88      0.49   0.43   0.37   0.38   0.43
H(h=1)         1      0.90   0.93   0.90   1.28      0.05   0.05   0.05   0.05   0.06
               3      0.92   0.79   0.83   1.06      0.29   0.27   0.22   0.24   0.31
               4      0.93   0.78   0.82   1.03      0.42   0.41   0.33   0.35   0.44
H(h=2)         1      0.98   0.93   0.95   1.58      0.07   0.07   0.07   0.07   0.10
               3      1.00   0.80   0.87   1.26      0.28   0.29   0.23   0.25   0.36
               4      1.02   0.80   0.87   1.24      0.37   0.39   0.30   0.32   0.46

Power and efficiency are computed for random variables of the form cX + μδ, where μ is a
trend function on the grid scaled by δ, the distance of μ to H₀, and X is a random variable
standardized by c. Distributions of X are symmetric: Gaussian Z, H = Z exp(hZ²/2)
for h = .5, 1, 2, or skewed: exponential, gamma(.5), gamma(2). In the case of the uniform
gradient trend μ_ij ∝ (i - 1) + (j - 1) and for uniform steps μ_ij ∝ (i - 1)
458 T. Hoang and V. L. Parsons

Table 34.2: Power and efficiency of test statistics compared to isotonized


Kruskal Wallis statistic for a = 0.01, 5 x 5 grids and four observations per
cell

Trend Distributim Delta Efficiency p""""


Jmc Un R V" lsomed KW Jmc UnR V" lsomed
n=4
Uniform ~ 1 1.22 1.21 1.08 0.63 0.07 0.09 0.09 0.08 0.05
gradient 3 1.18 1.16 1.09 0.67 0.63 0.71 0.70 0.67 0.44
4 1.19 1.17 1.08 0.66 0.89 0.94 0.94 0.92 0.71
exponential 1 1.21 1.15 1.04 0.41 0.19 0.23 0.22 0.19 0.08
3 1.22 1.18 1.15 0.59 0.92 0.96 0.96 0.95 0.70
4 1.26 1.21 1.21 0.67 0.99 1.00 1.00 1.00 0.93
glWT1ma(.5) 1 1.29 1.18 1.06 0.35 0.46 0.58 0.54 0.49 0.16
T 3 1.26 1.16 1.17 0.81 0.99 1.00 1.00 1.00 0.96
w 4 1.24 1.17 1.25 0.92 1.00 1.00 1.00 1.00 1.00
0
9IWT1ma(2) 1 1.19 1.15 1.05 0.53 0.11 0.13 0.13 0.12 0.06
3 1.19 1.17 1.12 0.60 0.80 0.88 0.87 0.85 0.55
e 0.63 0.96 0.99 0.98 0.98 0.83
4 1.24 1.18 1.15
f
H(h=.5) 1 1.25 1.22 1.12 1.00 0.07 0.09 0.09 0.08 0.07
f
3 1.19 1.16 1.07 1.04 0.58 0.67 0.65 0.62 0.60
e
4 1.21 1.15 1.06 1.02 0.83 0.90 0.88 0.85 0.83
c
t H(h=1) 1 1.25 1.21 1.09 1.35 0.08 0.10 0.10 0.09 0.11
s 3 1.22 1.15 1.06 1.30 0.61 0.71 0.88 0.64 0.74
4 1.24 1.15 1.06 1.26 0.82 0.90 0.88 0.85 0.91
H(h=2) 1 1.22 1.20 1.11 1.87 0.13 0.16 0.16 0.15 0.25
3 1.26 1.15 1.04 1.61 0.70 0.81 o.n 0.72 0.90
4 1.29 1.14 1.03 1.54 0.86 0.94 0.90 0.87 0.97
Uniform ~ 1 1.02 0.98 0.92 0.62 0.05 0.05 0.05 0.05 0.04
steps 3 0.82 0.80 0.75 0.65 0.47 0.39 0.37 0.35 0.30
4 0.76 0.73 0.69 0.63 o.n 0.64 0.62 0.59 0.55
exponential 1 0.98 0.93 0.81 0.37 0.12 0.12 0.11 0.10 0.05
3 0.82 0.75 0.74 0.54 0.82 0.73 0.69 0.68 0.52
4 0.79 0.73 0.73 0.62 0.97 0.92 0.89 0.89 0.83
QIWT1ma(.5) 1 0.96 0.83 o.n 0.29 0.32 0.31 0.26 0.25 0.09
0 3 0.85 0.74 0.78 0.72 0.96 0.93 0.88 0.90 0.88
n 4 0.83 0.70 o.n 0.86 1.00 0.99 0.97 0.98 0.99
e ganma(2) 1 0.99 0.95 0.85 0.51 0.07 0.07 0.07 0.06 0.04
3 0.81 o.n 0.73 0.57 0.88 0.55 0.53 0.50 0.39
e 4 0.76 0.72 0.70 0.59 0.91 0.81 0.78 o.n 0.68
f H(h=.5) 1 1.02 0.99 0.94 1.12 0.05 0.05 0.05 0.05 0.05
f 3 0.85 0.81 o.n 1.05 0.43 0.36 0.34 0.33 0.45
e 4 0.80 0.76 0.72 1.05 0.69 0.58 0.56 0.53 0.72
c H(h=1) 1 1.01 1.00 0.92 1.47 0.06 0.06 0.06 0.05 0.08
t 3 0.86 0.81 o.n 1.37 0.46 0.40 0.37 0.35 0.61
4 0.83 0.76 0.74 1.34 0.70 0.61 0.56 0.55 0.83
H(h=2) 1 1.05 1.02 0.93 2.19 0.08 0.09 0.08 0.08 0.18
3 0.90 0.79 o.n 1.78 0.58 0.51 0.45 0.44 0.84
4 0.89 0.76 0.76 1.72 0.75 0.69 0.61 0.61 0.95

Power and effiency are computed for random variables of the form cX + l·u5 where /L
is a trend function on the grid scale~ by 8, the distance of /L to Ho, and X a random
variable standardized by a constant c. Distributions of X are symmetric: Gaussian
Z, H = Z exp(hZ2 /2) for h = .5,1,2, or skewed: exponential, gamma(.5), gamma(2).
In the case of uniform gradient trend /Lij ex (i - 1) + (j - 1) and for uniform steps
/Lij ex (i - 1) on cells (i,j)
Power Comparisons 459

Table 34.3: Compa ring ranges of effiency of statist ics and choosi
ng a test for
selected trend shapes and distrib utions and for 0; = 0.01, 5 x 5
grids and one
observ ation per cell

Trend Ostribution Jonckheer e test I Unear rank test T V'test I Isotonic rredian test I Test
Mn tv'ex I Mn tv'ex Mn tv'ex Mn tv'ex choice
n-l
O1estep Gauss 1.21 1.34 1.14 1.24 2.15 2.26 0.18 0.36 V'
Skewed 1.08 1.50 0.98 1.32 1.57 2.17 0.18 0.58 V'
Heavy tail 1 1.13 1.45 1.11 1.40 1.87 2.34 0.20 0.50 V'
Heavy tail 2 1.07 1.37 1.18 1.30 2.07 2.40 0.20 0.38 V'
lJagona/ Gauss 0.90 0.96 0.91 0.96 0.71 0.83 0.74 0.77 f<MI
one step Skewed 0.90 0.98 0.89 0.94 0.66 0.73 0.53 0.81 f<MI
Heavy tail 1 0.90 0.99 0.92 1.00 0.72 0.85 0.94 1.19 Mf<MI
Heavy tail 2 0.94 1.00 0.94 1.01 0.77 0.84 1.30 1.76 M
Border Gauss 0.50 0.77 0.39 0.63 0.53 0.90 0.30 0.62 f<MI
one step Skewed 0.49 0.90 0.39 0.77 0.51 0.85 0.33 1.47 f<MI
Heavy tail 1 0.48 0.83 0.38 0.73 0.52 0.93 0.29 0.94 f<MI
Heavy tail 2 0.56 0.85 0.47 0.69 0.62 0.83 0.42 1.11 f<MI
Angle Gauss 0.66 0.91 0.56 0.84 0.64 0.99 0.61 0.69 f<MI
corner Skewed 0.65 0.92 0.57 0.78 0.63 0.81 0.70 1.71 f<MIlM
one step Heavy tail 1 0.64 0.95 0.55 0.82 0.64 0.92 0.71 1.03 f<MI
Heavy tail 2 0.77 0.91 0.66 0.81 0.74 0.86 0.98 1.49 M f<MI
Angle Gauss 0.94 1.08 0.91 1.05 0.96 1.15 0.62 0.71 f<MI V'
three steps Skewed 0.98 1.08 0.93 0.99 0.89 1.05 0.66 0.93 f<MI Jone
Heavy tail 1 0.96 1.09 0.92 1.03 0.98 1.12 0.74 1.06 V'
Heavy tail 2 1.06 1.11 0.99 1.03 1.04 1.11 1.04 1.52 MJone
Lhfform steps Gauss 0.73 0.88 0.67 0.88 0.67 0.90 0.60 0.67 f<MI
(one effect) Skewed 0.77 0.92 0.67 0.86 0.67 0.78 0.57 0.87 f<MI
Heavy tail 1 0.77 0.94 0.68 0.93 0.70 0.92 0.73 1.11 f<MI
Heavy tail 2 0.90 1.02 0.78 0.93 0.82 0.95 1.03 1.58 M f<MI
Lhfform gracfler1! Gauss 1.07 1.09 1.06 1.08 1.01 1.03 0.61 0.67 Jonc
(tw 0 unfforn1y Skewed 1.05 1.24 1.01 1.13 0.83 1.10 0.56 0.85 Jane
increasing effects) Heavy tail 1 1.11 1.18 1.09 1.12 1.00 1.09 0.73 1.03 Jonc
Heavy tail 2 1.12 1.30 1.09 1.17 1.04 1.07 1.07 1.51 MJonc
Early effects Gauss 1.03 1.08 1.08 1.09 0.89 0.97 0.61 0.66 LJane
Skewed 1.00 1.14 0.98 1.10 0.74 0.99 0.51 0.86 Jonc
Heavy tail 1 1.06 1.14 1.11 1.13 0.93 0.99 0.75 1.00 LJonc
Heavy tail 2 1.13 1.27 1.11 1.18 0.98 1.03 1.08 1.52 MJonc
Late effects Gauss 1.04 1.06 0.98 1.00 1.05 1.07 0.60 0.65 V'Jonc
Skewed 1.06 1.26 1.01 1.09 0.90 1.18 0.63 0.88 Jonc
Heavy tail 1 1.08 1.14 1.00 1.07 1.06 1.11 0.72 0.97 Jonc
Heavy tail 2 1.11 1.28 1.01 1.10 1.07 1.11 1.05 1.48 MJone
Late effects Gauss 1.04 1.06 1.06 1.09 1.15 1.18 0.57 0.65 V'
+ interaction Skewed 1.04 1.27 1.06 1.18 0.99 1.27 0.64 0.83 Jane V'
Heavy tail 1 1.09 1.16 1.09 1.15 1.17 1.24 0.71 0.96 V'
Heavy tail 2 1.10 1.28 1.08 1.17 1.12 1.21 1.04 1.46 MJonc
Early + late effects Gauss 1.04 1.05 1.04 1.06 0.97 1.01 0.59 0.68 L Jane
Skewed 1.04 1.20 0.97 1.10 0.82 1.08 0.56 0.81 Jane
Heavy tail 1 1.10 1.14 1.06 1.11 0.99 1.07 0.70 1.06 Jonc
Heavy tail 2 1.13 1.27 1.09 1.13 1.02 1.05 1.01 1.48 MJane

Distrib utions: Gauss; Skewed: expone ntial, gamma (.5), gamma (2);
Heavy tail 1: dou-
ble expone ntial, H = Z exp(hZ 2 /2) for h = .5; Heavy tail 2: H for
h = 1,2
460 T. Hoang and V. L. Parsons

Table 34.4: Comparing effiency of statistics and choosing a test for selected
trend shapes and distributions and for Q; = 0.01, 5 X 5 grids and four observations
per cell
Trend Distribution Janckheere test Unear r..,k test I yo test I Isotanie medi.., test Test
Min Max Min Max Min Max Min Max choice
n=4
One step Gauss 0.48 0.85 0.48 0.82 0.90 1.48 0.20 0.46 KW yo
Skewed 0.49 1.04 0.48 1.01 0.94 1.67 0.20 1.51 KW yo
Heavy tall 1 0.45 0.95 0.44 0.99 0.86 1.81 0.16 0.61 KW V·
Heavy tall 2 0.59 0.96 0.59 0.91 1.10 1.62 0.28 0.81 yo
Diagonal Gauss 0.99 1.21 0.95 1.15 0.76 0.90 0.65 0.68 Jane KW
one step Skewed 0.95 1.20 0.94 1.10 0.78 0.87 0.28 0.56 Jane KW
Heavy tail 1 0.98 1.24 0.95 1.19 0.76 0.93 0.96 1.23 Jane KW
Heavy tail 2 1.00 1.21 0.97 1.13 0.78 0.90 1.44 2.18 1M
Border Gauss 0.44 0.99 0.39 0.86 0.49 0.98 0.56 0.73 KW
one step Skewed 0.28 0.85 0.26 0.77 0.32 0.84 0.59 1.79 KW
Heavy tail 1 0.41 0.95 0.36 0.84 0.46 0.98 0.69 1.24 KW
Heavy tail 2 0.48 0.93 0.42 0.81 0.54 0.94 0.96 2.03 1M KW
Angle Gauss 0.73 1.01 0.63 0.88 0.67 0.91 0.63 0.64 KW
caner Skewed 0.64 0.99 0.56 0.89 0.63 0.92 0.41 0.91 KW
one step Heavy tall 1 0.71 1.07 0.62 0.93 0.66 1.00 0.94 1.18 KW 1M
Heavy tall 2 0.75 1.06 0.66 0.90 0.70 0.94 1.40 2.20 1M KW
Angle Gauss 0.99 1.29 0.94 1.24 0.97 1.21 0.62 0.66 Jonc
lhree steps Skewed 0.95 1.17 0.89 1.08 0.98 1.09 0.36 1.06 Jone
Heavy1alll 0.98 1.29 0.93 1.20 0.95 1.20 0.89 1.14 Jane
Heavy tail 2 1.03 1.27 0.94 1.20 0.95 1.21 1.34 2.11 1M Jane
Uniform steps Gauss 0.76 1.02 0.73 0.98 0.89 0.92 0.62 0.65 KW
(ane effect) Skewed 0.76 0.99 0.70 0.95 0.70 0.85 0.29 0.86 KW
Heavy tail 1 0.75 1.02 0.72 0.99 0.89 0.94 0.90 1.12 KW
Heavytait 2 0.83 1.05 0.76 1.02 0.74 0.93 1.34 2.19 1M KW
Uniform gradient Gauss 1.18 1.22 1.16 121 1.08 1.09 0.63 0.67 JoncL
(two uniformly Skewed 1.19 1.29 1.15 1.21 1.04 1.25 0.35 0.92 Jane
increasing effects) Heavy tall 1 1.19 1.28 1.15 1.25 1.06 1.19 0.87 1.11 Jane
Heavy tail 2 1.22 1.29 1.14 1.21 1.03 1.11 1.26 1.87 1M Jane
Ea1y effects Gauss 1.11 1.21 1.14 121 0.94 1.01 0.63 0.65 L Jane
Skewed 1.12 1.21 1.11 1.20 0.92 1.10 0.30 0.77 L Jane
Heavy tail 1 1.10 1.27 1.11 128 0.95 1.07 0.85 1.12 L Jane
Heavy1all2 1.16 1.24 1.13 1.25 0.96 1.03 1.25 1.84 1M LJane
L.a!e effects Gauss 1.12 1.21 1.06 1.13 1.10 1.15 0.60 0.65 Jane
Skewed 1.16 1.30 1.04 1.14 1.11 1.23 0.41 0.98 Jane
Heavy1alll 1.12 1.28 1.05 1.18 1.08 1.24 0.87 1.12 Jane
Heavy1all2 1.18 1.28 1.06 1.17 1.05 1.17 1.26 1.91 IMJonc
L.a!e effects Gauss 1.12 1.24 1.14 1.24 1.21 1.28 0.63 0.65 yo
+ interaction Skewed 1.15 1.31 1.12 1.23 1.22 1.30 0.47 1.08 V' Jane
Heavy tail 1 1.14 1.28 1.13 1.27 1.18 1.34 0.85 1.10 V' Jane
Heavy1all2 1.19 1.25 1.15 123 1.13 1.28 1.22 1.85 1M yo Jane
One eaiy + Gauss 1.12 1.23 1.10 1.21 1.02 1.10 0.62 0.66 Jane
ane Iale effects Skewed 1.14 1.25 1.10 1.14 1.02 1.15 0.35 0.97 Jane
Heavy tail 1 1.11 1.27 1.08 1.24 1.02 1.15 0.85 1.12 Jane
Heavy1all2 1.16 1.24 1.10 1.18 1.00 1.09 1.22 1.85 1M Jonc

Distributions: Gauss; Skewed: exponential, gamma(.5), gamma(2); Heavy tail 1: dou-


ble exponential, H = Z exp(hZ2/2) for h = .5; Heavy tail 2: H for h = 1,2
Power Comparisons 461

Appendix
The simulation standard error of eff can be approximated by applying a Taylor
linearization on the functional form. For an estimator P of a proportion P
we have Var(cp-I(p)) ~ [¢(cp-l(p))]-2 p(l - p)jn , where n is the number of
simulations. For two estimated proportions on the same simulation we have
the correlation coefficient P (cp-I(PI), cp-I(P2)) ~ p(PI,P2). For a variable of
the form: T2 = [(cp-I(P2) - cp-I(a)) / (cp-I(PI) - cp-l(a))]2 with a treated
as a constant, we have after additional Taylor linearizations an approximation
for the standard error of T2, SE(T2), due to the simulation:
SE(T2) ~ 2T2 (CVl + CV;2 - 2p(pl ,P2)CVICV2) 1/2 where
CV? = [¢(cp-I(Pi)) . (cp-I(pd - cp-l(a))]-2 Pi(l- Pi)/n. The value of pis
estimated from the simulation. Using this approximation examples of estimates
for the standard error of eff will be 0.0281 and 0.0127 when PI = P2 = 0.15 and
PI = P2 = 0.50 respectively. (Here, we used p = 0.30 ).

References
1. Ager, J. W. and Brent, S. B. (1978). An index of agreement between a
hypothesized partial order and an empirical rank order, Journal of the
American Statistical Association, 73, 827-830.

2. Buning, H. and Kassler, W. (1996). Robustness and efficiency of some


tests for ordered alternatives in the c-sample location problem, Journal
of Statistical Computation and Simulation, 55, 337-352.

3. Chacko, V. J. (1963). Testing homogeneity against ordered alternatives,


The Annals of Mathematical Statistics, 34, 945-956.

4. DATAPAC (1986). A Fortran subroutine library for probability distribu-


tions, National Institute of Standards and Technology.

5. Higgins, J. J. and Bain, P. T. (1999). Non parametric tests for ordered


alternatives in unreplicated two-factor experiment, Journal of Nonpara-
metric Statistics, 11, 307-318.

6. Hoaglin, D. C., Mosteller, F., and Tukey, J. W. (Eds.) (1985). Exploring


Data Tables, Trends, and Shapes, New York: John Wiley & Sons.

7. Jonckheere, A. R. (1954). A distribution-free k-sample test against or-


dered alternatives, Biometrica, 41, 133-145.
462 T. Hoang and V. L. Parsons

8. Marsaglia, G. and Zaman, A. (1987). Toward a universal random number


generator, Florida State University Report: FSU-SCRI-8J-50.

9. Qian, S. and Eddy, W. F. (1996). An algorithm for isotonic regression


on ordered rectangular grids, Journal of Computational and Graphical
Statistics, 5, 225-235.

10. Robertson, T., Wright, F. T., and Dykstra, R. (1988). Order Restricted
Statistical Inference, Chichester: John Wiley & Sons.

11. Shiraishi, T. (1982). Testing homogeneity against trend based on rank


in one-way layout, Communications in Statistics-Theory and Methods,
11, 1255-1268.

12. Shorack, G. R. (1967). Testing ordered alternatives in model I analysis of


variance; Normal theory and nonparametric, The Annals of Mathematical
Statistics, 38, 1740-1753.

13. Terpstra, T. J. (1952). The asymptotic normality and consistency of


Kendall's test against trend when ties are present in one ranking, Pro-
ceedings of the Section of Science Koninklijke Nederlandse Akademie van
Wetenschappen (A), 55, Indagationes Mathematicae, 14, 327-333.
35
Tests of Independence with Exponential Marginals

Paul Deheuvels
L.S. T.A., Universite Paris VI, Bourg-la-Reine, France

Abstract: We present tests of independence for bivariate vectors with exponen-


tial marginals, in the setup of bivariate extreme value distributions. These rely
on a new Karhunen-Loeve expansion due to Deheuvels and Martynov (2000).

Keywords and phrases: Test of independence, bivariate extreme values,


Cramer-von Mises-type tests, Karhunen-Loeve expansions

35.1 Introd uction


Let {(Xn, Y n ) : n ~ 1} be independent and identically distributed [LLd] bi-
variate random vectors with exponential marginals. Assume, in addition, that
the distribution of (X, Y) = (Xl, YI), denoted by EA(" v), is such that, for
constants "( > 0 and v > 0,

lP(X ~ "(x, Y ~ vy) = exp ( - (x + Y)A(x: y)) for x> 0, y > 0, (35.1)

where {A(u) : 0 ~ u ~ 1} fulfills the assumptions


(A.1) max{u, 1 - u} ~ A(u) ~ 1 for 0 ~ u ~ 1;
(A.2) A is convex on [0,1].
This model is discussed at length in Falk, Hiisler, and Reiss (1994, Section
4.2, pp. 111-118) [see also Deheuvels (1984) and Resnick (1987, Ch. 5)]. It
is noteworthy [see, for example, Pickands (1981,1989) and Galambos (1987)]
that the conditions (A.1-2) are necessary and sufficient conditions for lP(X ~
x, Y ~ y) to define, via (35.1) the survival function of a bivariate extreme value
probability distribution for minima, with the following characteristic property.
Whenever (X, Y) = EA(" v) (this denoting the fact that (X, Y) follows the
°
distribution E A ( ,,(, v) ), then, for any constants c > and d > 0, mine cX, dY) is

463
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
464 P. Deheuvels

exponential. This holds, in particular, for X (with c = 1, d = (0) and Y (with


c = 00, d = 1), both of which are exponentially distributed with expectations
given by lE(X) = I and lE(Y) = v.
We are concerned with testing the null hypothesis, denoted hereafter by
(H.O), that X and Yare independent, against the alternative. Under the
EA(r, v) model, the independence assumption (H.O) may be reformulated into
(H.O) A(u) = 1 for 0::; u ::; 1.

There is a huge literature dealing either with models of bivariate distri-


butions with exponential marginals [see, for example, Gumbel (1960a), Fre-
und (1961), Marshall and Olkin (1967,1982), Downton (1970), Arnold (1975),
Proschan and Sullo (1976), Johnson and Kotz (1977), Raftery (1984), Barnett
(1985), Balakrishnan and Basu (1995)], or with bivariate extreme value models
[see, for example, Geffroy (1958/59), Tiago de Oliveira (1958, 1984), Gumbel
(1960b), de Haan and Resnick (1977), Pickands (1981, 1989), Leadbetter, Lind-
gren, and Rootzen (1983), Galambos (1987), Smith, Tawn, and Yuen (1990),
Deheuvels (1991), Einmahl, de Haan, and Huang (1993)J. The latter corre-
spond to limit laws obtained by taking coordinatewise minima (or maxima) for
sequences of independent and identically distributed random vectors. In either
case, it is of great practical usefulness to test independence of the marginals.
The approach we will follow turns out to generate a series of theoretical results
of interest.

°
-Towards the aim of testing (H.O) against the general alternative (H.l) that
A(u) i- 1 for some < u < 1, we introduce the empirical process [see, for
example, Deheuvels (1991)J

(35.2)

where we set
_ 1 n
and Yn = - 2:Yi. (35.3)
n i=l

We note that the statistic

_1_ _ ~ ~ . (Xii Xn , Yi/Yn) ,


~ - ~mln
An(u) n i=l u 1- u

is a variant of an estimator of I/A(u) due to Pickands (1981,1989). The original


Pickands estimator (see, e.g., Deheuvels (1991)) yields the maximal likelihood
estimator of I/A(u) under the E(I, 1) model. In (35.2) and elsewhere, we set
x/O = 00 when x > 0. This convention entails that Zn, as given in (35.2),
defines a random variable with values in the Ban·ach space (C[O,I],U). The
latter consists of the set C[O, 1J of continuous functions on [0,1], endowed with
the uniform topology U defined by the sup-norm Ilfll = sUPo:s;u 9 If(u)l. It is
Tests of Independence 465

noteworthy that the distribution of Z in (35.2) is independent of'Y > 0 and


v > O. Denote by {Z(u) : 0 < u < I} a centered Gaussian process with
covariance function
2v - u 2 - v 2
R(u, v) = R(v, u) (l-u)v -l-(l-u)(l-v)-uv (35.4)
for 0 ~ u ~ v ~ 1.

Following the arguments of Deheuvels (1991), Deheuvels and Martynov (2000)


have described the weak limiting behavior under (H.O) of Zn as n -+ 00 by
proving the following theorem.

Theorem 35.1.1 Under (H.O)) the empirical process {Zn(u) : 0 < u < I}
converges weakly in (C[O, l],U) to {Z(u) : 0 ~ u ~ I} as n -+ 00.

Given this result, it is natural to consider the general class of tests of inde-
pendence of X and Y based upon statistics of the form 8(Zn), where 8 is an
appropriate functional on (C[O,l],U). Since, under suitable continuity condi-
tions on 8, the limiting distribution as n -+ 00 of 8(Zn) is that of 8(Z), it is
not too difficult to make use of tests of this type, as long as the distribution
of 8(Z) can be evaluated. In the following, we present some examples of the
kind. For further details on this problem, we refer to Deheuvels and Martynov
(1996,2000).

35.2 Karhunen-Loeve Expansions


We first recall the main properties of the Karhunen-Loeve [KL] expansion of
a general centered Gaussian process {Z (u) : 0 ~ u ~ I} with continuous
covariance function

R(u, v) = IE(Z(u)Z(v)) for 0 ~ u, v ~ 1,

and almost surely continuous sample paths. Under these assumptions [see,
for example, Adler (1990, pp. 66-79), Kac and Siegert (1947), Shorack and
Wellner (1986, pp. 206-218)] there exist constants Al ~ A2 ~ ... ~ 0, together
with continuous functions el (t), e2(t), . .. , on [0,1] (the eigenfunctions of the
covariance kernel R(u, v)), such that the following properties (K.1-2-3-4) are
fulfilled.
(K.1) The {ek : k ~ I} are orthonormal in L2[0, 1], i.e.,

r 1
10 ei(t)ej(t)dt =
{I0 ifi=j,
if i i= j, (35.5)
466 P. Deheuvels

(K.2) The {(Ak' ek) : k 2: I} form a complete set of solutions of the Fredholm-
type equation in (A, e),

Ae(u) = 10 1 R(u,v)e(v)dv for 0:::'; u:::.; 1; (35.6)

(K.3) We have
<Xl

R(u, v) = L Akek(u)ek(v), (35.7)


k=1
where the series on the right hand side of (35.7) is absolutely and uniformly
convergent on [0,1]2;
(K.4) There exists a sequence {Wk : k 2: I} of independent N(O, 1) random
variables such that the following Karhunen-Loeve [KL] expansion holds. For all
O:::';u:::';l
<Xl

Z(u) = L ~ Wkek(U), (35.8)


k=1
where the series is almost surely uniformly convergent on [0,1].
The KL expansion induced by (K.1-2-3-4) is of major interest for several
reasons. Below, we discuss two of the most important applications. We limit
ourselves to the non-trivial case where Ak > 0 for all k 2: 1 and assume implicitly
from now on that this condition is fulfilled.
First, the KL expansion (35.7)-(35.8) yields an explicit description of the
reproducing kernel Hilbert space [RKHS] of Z [see, for example, Kuelbs (1976)
and Adler (1990, Theorem 3.16)]. When Z is considered as a random variable
with values in the Banach space (e[O, l],U), the RKHS lH of Z is the Hilbert
subspace of e[O, 1] given by

lH
k=1
<Xl

= {f : f(u) = L ak~ ek(u), °: .; u :::.; 00

1, L ak <
k=1
oo}, (35.9)

with inner product

where
<Xl <Xl

f(u) = Lak~ek(u) and g(u) = Lbk~ek(u) for 0:::'; u :::.; 1.


k=1 k=1
(35.11)
Via (35.10)-(35.11), the sequence {y'Xkek : k 2: I} yields a convergent ortho-
normal sequence [CONS] in lH, which is essential to analyze the structure of Z
[refer to Ledoux and Talagrand (1991) and Lifshits (1995)].
Tests of Independence 467

Second, a statistically useful consequence of (K. 1-2-3-4) , is the description


of the distribution of

(35.12)

given by its characteristic function

lE(exp(iu.1 2)) = II (1 - 2iuAk) -1/2


<Xl
for u E JR. (35.13)
k=l

°
The relations (35.12)-(35.13) have applications when {Z(u) : ~ u ~ I} is the
weak limit (with respect to an appropriate topology) of a sequence {Zn (u)
°~ u ~ I} of empirical processes. In this framework, the statistic

(35.14)

is typically of interest for tests of goodness-of-fit, since its distribution can be


approximated for large values of n by that of .12 . This, however, necessitates
in practice a numerical evaluation of the quantiles of the latter distribution. It
is not too difficult [see, for example, Johnson, Kotz, and Balakrishnan (1994,
Section 18-8, pp. 444-450)] to invert numerically a finite product approximation
of the right hand side of (35.13) leading to the desired values of IP(.12 ~ x).
For discussions concerning the precision of this approach and related methods,
refer to Imhof (1961) and Martynov (1975,1976). This, in turn, requires a prior
explicit knowledge of the eigenvalues {Ak : k 2: I}, which are implicit in terms of
R(u, v) via (35.6). In general, one knows only R(u, v) and the needed numerical
evaluation of the Ak'S can only be made by tedious recursions, which do not
allow us to achieve reasonable precision for higher order terms. Therefore,
(35.12) is mostly useful for such applications when there exist simple closed-
form expressions for the Ak'S. When it is not the case, one must use different
techniques based on direct approximations of IP(.1 ~ x) [see, for example,
Martynov (1992) and Deheuvels and Martynov (1996)].
For most Gaussian processes of interest with respect to statistics, the values
of the Ak'S are not known in explicit form [see, for example, Adler (1990, p. 76)].
Below, we give examples of Gaussian processes on [0,1] for which the constants
{Ak : k 2: I} in the KL expansion are known [refer to Csorgo (1979, 1981),
Cotterill and Csorgo (1985), Deheuvels (1981), for examples of such processes
indexed on [0, l]d with d 2: 2].
- The (restriction to [0,1] of the) Wiener process {W(t) : t 2: O}, with Z = W
and Ak = l/((k - ~)7r)2 for k 2: 1 [see, e.g. Adler (1990, p. 77)];
°
- The Brownian bridge {B(t) : ~ t ~ I}, with Z = Band Ak = l/(k7r? for
k 2: 1. When Zn is the uniform empirical process on [0, 1], .1~ reduces to the
468 P. Deheuvels

Cramer-von Mises statistic [see, for example, Shorack and Wellner (1986, Propo-
sition 1 and Theorem 1, pp. 213-217), Durbin (1973, p. 32), Darling (1955, p.
15), Smirnov (1948), Anderson and Darling (1952), and Darling (1957)].
- The limiting process Z(t) = B(t)/ Jt(1 - t) of the Anderson-Darling statistic,
where B(t) denotes a Brownian bridge [see, for example, Anderson and Darling
(1954), Watson (1961,1967), Shorack and Wellner (1986, pp. 148 and 224-227)].
In this case, Ak = 1/(k(k + 1)) for k 2:: l.
It turns out that in the framework of our study, the KL expansion of the
centered Gaussian process Z with covariance function (35.4) has been obtained
by Deheuvels and Martynov (2000). Their result is given in the next theorem~
The following notation and facts from the theory of orthogonal polynomials will
be needed.
The Jacobi polynomials [see, for example, Tricomi (1970, pp. 160-177) and
Szego (1967)] denoted by Pi:,f3(x) for n 2:: 0 and a,f3 > -1, with x E [-1,1],
yield the modified Jacobi polynomials [see, for example, Chihara (1978, (2.1),
p. 143)] via the change of variable u = (x + 1)/2. These are defined, for n 2:: 0
and 0 ~ u ~ 1, by
(l)n 1 dn
QQ,f3(u) = PQ,f3(2u - 1) = - - - -{uf3+n (1- u)Q+n}. (35.15)
n n n! uf3(1 - u)Q dun

The modified Jacobi polynomials {Q~,f3 : n 2:: O} fulfill the orthogonality rela-
tions [see, for example, Chihara (1978, (2.18), p. 148)], for m, n 2:: 0,

10 1 Q~l(u)Q~,f3(u)uf3(1- u)Qdu
= 0 when m -I n, (35.16)
r(n + a + l)r(n + f3 + 1)
when m = n.
(2n+a+f3+ l)r(n+a+f3+ l)n!
For a = f3 = 2, the polynomials, defined, for n 2:: 0 and 0 ~ u ~ 1, by

Pn(u) = Q2,2(u) = (-I)n 1 dn {un+2(1 _ u)n+2}, (35.17)


n n! u 2(1 - u)2 dun
fulfill the othogonality relations, for m, n 2:: 0,

101 u 2(1 - u)2 Pm(u)Pn(u)du =0 when m -I n, (35.18)

= An
u
= 1 X (n + 1)(n + 2) when m=n.
2n + 5 (n + 3)(n + 4)
Theorem 35.2.1 Let Z = Z and R( u, v) = R( u, v) be as in (35.4). Then, the
properties (K.I-2-3--4) hold with
6
Ak = k(k + 1)(k + 2)(k + 3) for k 2:: 1, (35.19)
Tests of Independence 469

and, for k ~ 1 and 0 ~ u ~ 1,

u(l- u) p; () = {(2k 3) (k + 2)(k + 3) }1/2


J~k-l k-l U k(k+l) + x
( _I.)k-1 1 d k- 1
k+1 (1 )k+1}
x(k_l)! u(l-u) duk- 1 { u -u . (35.20)

Remark 35.2.1 For an explicit computation of the eigenfunctions {ek : k ~ I}


in (35.20), one may use a binomial expansion of (u - l)i+1 in the formula (see,
for example, Chihara (1978, (2.60), p. 144)]

-{(2k 3) (k + 2)(k + 3) }1/2


ek(u) = + x k(k + 1)

x ~
.
( k + 1 .) (k ~ l)uk-i(U _ l)i+1.
k-l-J J
(35.21)
)=0

The first of these eigenfunctions are given by


e1(u) J30 u(l- u)
e2(u) -v'210 u(l- u)(2u - 1)
e3(u) 3v'IO u(l- u)(14u 2 - 14u + 3) (35.22)
e4(u) -3v'2310 u(l- u)(12u3 - 18u2 + 8u - 1)
e5(u) 2v'1365 u(l- u)(33u4 - 66u 3 + 45u 2 - 12u + 1).
Remark 35.2.2 1°) It follows from (35.20) that ek(l- u) = (-I)k+1ek(u) for
k ~ 1. This, when combined with the version of (35.8) holding for Zo, shows
that

L L
00 00

Z(1 - u) = ~ (_I)k+1wkek(u) =d Z(u) = ~ Wkek(U), (35.23)


k=l k=1
where" =d" denotes equality in distribution. That
{Z(1 - u) : 0 ~ u ~ I} =d {Z(u) : 0 ~ u ~ I}
may be checked directly from the equality
R(u, v) = R(I- u, 1- v) for all 0 ~ U,v ~ 1.

2°) The equality (35.23) implies that the processes


1
2{ Z(u) + Z(I- u)} = L
00

ZS(u) = V>"2R.+1 W2R.+1 eU+1(U), (35.24)


.e=1
and
1
2{ Z(u) - = L v.x; w2.ee2.e(U),
00

ZA(u) = Z(I- u)} (35.25)


.e=1
are independent, with KL expansions given as above.
470 P. Deheuvels

35.3 Applications to Tests of Independence


An immediate corollary of Theorems 35.1.1 and 35.2.1 is given below in terms
of f;,
and .12 defined respectively by

f; = fa1 Z;,(u)du and .12 = fa1 Z2(u)du. (35.26)

Let {Ak : k ~ I} be as in (35.19).

Corollary 35.3.1 Under (H.O), we have

Ji..~ IE(exp(iu.J;)) = IE(exp(iu.J2 )) = IT (1 - 2iuAk) -1/2 for u E JR.


k=1
(35.27)

PROOF. Combine Theorems 35.1.1 and 35.2.1 with (35.12)-(35.13) and (35.26) .

The statistic .1; in (35.26) allows us to test (H.O) against (H.l) by rejecting
the null hypothesis when .1; exceeds a critical level cn,a, chosen in such a way
that, for a specified 0 < ll! < 1, IP(.J; ~ cn,al(H.O)) = ll!. The evaluation of the
exact values of cn,a for the various possible choices of the risk level ll! E (0,1),
and the sample size n ~ 1, is beyond the scope of the present paper. Below, we
limit ourselves to some selected values of the limiting constants Ca such that

(35.28)

In Deheuvels and Martynov (2000) Ca is tabulated with a precision of 10- 3 for


various values of ll!. For example ClO% = 0.770 and C5% = 1.053.

Remark 35.3.1 We will not discuss here the efficiency of the test based upon
.1;, with respect to alternative methods [refer to Balakrishnan and Basu (1995)].
Among the many possible statistics based upon functionals of Zn which (in
addition to .In) may be used to test (H.O) against (H.l), one should mention
the principal component test statistics

(35.29)

A direct consequence of the above theorems is that, under (H.O), for each k ~ 1,

(35.30)
Tests of Independence 471

and we may use this property to reject (H.O) when ±Tn,k or ITn,kl exceeds the
appropriate quantiles of the N(O, 1) law. The fact that the ek have explicit
expressions allows a simple use of this methodology. For example, making use
of (35.22), we obtain readily, under (H.O), that, as n ---+ 00,

= 2v'30 10 Zn(u)u(I - u)du


1
Tn,l (35.31)

J35
Vii
t{
i=l
XiYi
Xi Y n + YiX n
- ~}
3
---+d N(O, 1).

This statistic has a particularly simple expression. Moreover, under (H.I), it


holds that

+ o(I))Jn x 2v'30 Jor 1


1
Tn,l = (1 {A(u) - 1 }u(I - u)du ---+ 00 a.s.,

so that the test of (H.O) based upon Tn ,l is consistent. This property is not
shared in general by Tn,k for k ::::: 2. For example for k = 2 and A(u) = A(I- u)
for 0 :S u :S I, we infer from (35.22) that

Jro {A(u)
1
Jor 1
1 1
- 1 }e2 (u)du = {A(u) - 1 }J2IOu(I - u)(1 - 2u)du = O.

By (35.24)-(35.25), we infer from the above theorems the limiting distribu-


tions of the statistics

41 Jor }2 du,
1 {
.1; Zn(u) + Zn(1- u) (35.32)
and
.1;: = 4 o
r
1 J { Zn(u) - Zn(I - u) }2 duo
1
(35.33)

Under (H.O), it holds that, for u E JR,

(35.34)

and
II (1 - 2iuA2£)
00 -1/2
Ji.,~ lE(exp(iu.1;:)) = . (35.35)
£=1

Acknowledgement. I am grateful to the organizers of the conference for


allowing me to present in this paper a series of results of a joint research program
with G. Martynov.
472 P. Deheuvels

References
1. Adler, R. J. (1990). An introduction to continuity, extrema, and related
topics for general Gaussian processes, IMS Lecture Notes-Monograph Se-
ries 12, Hayward, California: Institute of Mathematical Statistics.

2. Anderson, T. W. and Darling, D. A. (1952). Asymptotic theory of certain


goodness of fit criteria based on stochastic processes, Annals of M athe-
matical Statistics, 23. 193-212.
3. Anderson, T. W. and Darling, D. A. (1954). A test of goodness of fit,
Journal of the American Statistical Association, 49, 765-769.
4. Arnold, B. C. (1975). Multivariate exponential distributions based on
hierarchical successive damage, Journal of Applied Probability, 12, 142-
147.
5. Balakrishnan, N. and Basu, A. P. (1995). The Exponential Distribution,
Theory, Methods and Applications, Amsterdam: Gordon and Breach.
6. Barnett, V. (1985). The bivariate exponential distribution: A review and
some new results, Statistica Neerlandica., 39, 343-356.

7. Basu, A. P. (1988). Multivariate exponential distributions and their ap-


plications in reliability, In Handbook of Statistics (Eds., B. K. Ghosh and
P. K. Sen), pp. 581-592, New York: Marcel Dekker.
8. Chihara, T. S. (1978). An Introduction to Orthogonal Polynomials, New
York: Gordon and Breach.

9. Cotterill, D. S. and Csorgo, M. (1985). On the limiting distribution of and


critical values for the Hoeffding, Blum, Kiefer, Rosenblatt independence
criterion, Statistical Decisions, 3, 1-48.
10. Csorgo, M. (1979). Strong approximations of the Hoeffding, Blum, Kiefer,
Rosenblatt multivariate empirical process, Journal of Multivariate Analy-
sis, 9, 84-100.
11. Csorgo, M. (1981). On the asymptotic distribution of the multivariate
Cramer-von Mises and Hoeffding-Blum-Kiefer-Rosenblatt independence
criteria, In Statistical Distributions in Scientific Work, Vol. 5 (Trieste,
1980), pp. 145-156, ATO Adv. Study Inst. Ser. C: Math. Phys. Sci., 79,
Dordrecht: Reidel.

12. Darling, D. A. (1955). The Cramer-Smirnov test in the parametric case,


Annals of Mathematical Statistics, 26, 1-20.
Tests of Independence 473

13. Darling, D. A. (1957). The Kolmogorov-Smirnov, Cramer-von Mises tests,


Annals of Mathematical Statistics, 28, 823-838.

14. de Haan, L. and Resnick, S. 1. (1977). Limit theory for multivariate


sample extremes, Z. Wahrscheinlichkeit. verw. Gebiete, 40, 317-337.

15. Deheuvels, P. (1981). An asymptotic decomposition for multivariate


distribution-free tests of independence, Journal of Multivariate Analysis,
11, 102-113.

16. Deheuvels, P. (1984). Probabilistic aspects of multivariate extremes, In


Statistical Extremes and Applications (Ed., J. Tiago de Oliveira) pp.
117-130, Dordrecht: Reidel.

17. Deheuvels, P. (1991). On the limiting behavior of the Pickands estimator


for bivariate extreme-value distributions, Statistics & Probability Letters,
12, 429-439.

18. Deheuvels, P. and Martynov, G. V. (1996). Cramer-Von Mises-type tests


with applications to tests of independence for multivariate extreme-value
distributions, Communications in Statistics- Theory and Methods, 25,
871-908.

19. Deheuvels, P. and Martynov, G.y. (2000).A Karhunen-Loeve decomposi-


tion of a Gaussian process generated by independent pairs of exponential
random variables, submitted.

20. Downton, F. (1970). Bivariate exponential distributions in reliability the-


ory, Journal of the Royal Statistical Society, Series B, 32, 408-417.

21. Durbin, J. (1973). Distribution theory for tests based upon the sample
distribution function, Regional Conference Series in Applied Mathematics,
9, Philadelphia: S.LA.M ..

22. Einmahl, J. H. J., de Haan, L., and Huang, X. (1993). Estimating a multi-
dimensional extreme-value distribution. Journal of Multivariate Analysis,
47,35-47.

23. Falk, M., Husler, J., and Reiss, R. D. (1994). Laws of Small Numbers:
Extremes and Rare Events, Basel: Birkhauser.

24. Freund, J. (1961). A bivariate extension of the exponential distribution,


Journal of the American Statistical Association, 56, 971-977.

25. Galambos, J. (1987). The Asymptotic Theory of Extreme Order Statis-


tics, Second Edition, Malabar, Florida: Krieger.
474 P. Deheuvels

26. Geffroy, J. (1958/59). Contribution a la tMorie des valeurs extremes,


Pub. Inst. Statist. Univ. Paris, 7/8, 37-185.

27. Grad, A. and Solomon, H (1955) Distribution of quadratic forms and some
applications, Annals of Mathematical Statistics, 26, 464-477.

28. Gumbel. E. J. (1960a). Bivariate exponential distributions, Journal of


the American Statistical Association, 55, 698-707.

29. Gumbel, E. J. (1960b). Distribution des valeurs extremes en plusieurs


dimensions, Publ. Inst. Statist. Univ. Paris, 9, 171-173.

30. Imhof, J. P. (1961). Computing the distribution of quadratic forms in


normal variables, Biometrika, 48, 419-426.

31. Johnson, N. L. and Kotz, S. (1977). Distributions in Statistics: Continu-


ous Multivariate Distributions, New York: John Wiley & Sons.

32. Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994). Continuous Uni-
variate Distributions, Volume 1, New York: John Wiley & Sons.

33. Kac, M. and Siegert, A. J. F. (1947). An explicit representation of a


stationary Gaussian process, Annals of Mathematical Statistics, 26, 189-
211.

34. Kuelbs, J. (1976). A strong convergence theorem for Banach space valued
random variables, Annals of Probability, 4, 744-771.

35. Leadbetter, M. R., Lindgren, G., and Rootzen, H. (1983). Extremes


and Related Properties of Random Sequences and Processes, New York:
Springer-Verlag.

36. Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces, New


York: Springer-Verlag.

37. Lifshits, M. A. (1995). Gaussian Random Functions, Dordrecht: Kluwer.

38. Marshall, A. W. and Olkin, 1. (1967). A multivariate exponential distri-


bution, Journal of the American Statistical Association, 62, 30-44.

39. Marshall, A. W. and Olkin, 1. (1982). Domains of attraction of multivari-


ate extreme value distributions, Annals of Probability, 10, 168-177.

40. Martynov, G. V. (1975). Computation of distribution function of quadratic


forms of normally distributed random variables, Theory of Probability and
its Applications, 20, 782-793.
Tests of Independence 475

41. Martynov, G. V. (1976). Computation of limit distributions of statistics


for normality tests of type w2 , Theory of Probability and its Applications,
21, 1-13.

42. Martynov, G. V. (1992). Statistical tests based on empirical processes


and related questions, Journal of Soviet Mathematics, 61, 2195-2271.

43. Pickands, J. III (1981). Multivariate extreme value distributions, In Bul-


letin of the International Statistical Institute, Proceedings of the 43rd.
Session, Buenos Aires, pp. 859-878.

44. Pickands, J. III (1989). Multivariate negative exponential and extreme


value distributions, In Extreme Value Theory (Eds., J. Husler and R. D.
Reiss), Lecture Notes in Statistics 51, New York: Springer-Verlag.

45. Proschan, F. and Sullo, P. (1976). Estimating the parameters of a multi-


variate exponential distribution, Journal of the American Statistical As-
sociation, 71, 465-472.

46. Raftery, A. E. (1984). A continuous multivariate exponential distribution,


Communications in Statistics-Theory and Methods, 13, 947-965.

47. Resnick, S. 1. (1987). Extreme Values, Regular Variation, and Point


Processes, New York: Springer-Verlag.

48. Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Ap-


plications to Statistics, New York: John Wiley & Sons.

49. Sibuya, M. (1960). Bivariate extreme statistics. Annals of the Institute


of Statistical Mathematics, 11, 195-210.

50. Smirnov, N. V. (1936). Sur la distribution de w2 , C. R. Acad. Sci. Paris,


202, 449-452.

51. Smirnov, N. V. (1948). Table for estimating the goodness of fit of empir-
ical distributions, Annals of Mathematical Statistics, 19, 279-281.

52. Smith, R. L., Tawn, J. A., and Yuen, H. K. (1990). Statistics of multi-
variate extremes, International Statistical Review, 58, 47-58.

53. Szego, G. (1967). Orthogonal Polynomials, Third Edition, American


Mathematic Society, Colloquium Publication, Vol. 23, New York: Amer-
ican Mathematical Society.

54. Tiago de Oliveira, J. (1958). Extremal distributions, Revista da Fac.


Ciencias Univ. Lisboa, 7, 215-227.
476 P. Deheuvels

55. Tiago de Oliveira, J. (1984). Bivariate models for extremes; Statisti-


cal decision, In Statistical Extremes and Applications (Ed., J. Tiago de
Oliveira), pp. 131-154, Dordrecht: Reidel.

56. Tricomi, F. G. (1970). Vorlesungen uber Orthogonalreihen, New York:


Springer-Verlag.

57. Watson, G. S. (1961). Goodness of fit test on a circle, Biometrika, 48,


109-114.

58. Watson, G. S. (1967). Another test for the uniformity of a circular dis-
tribution, Biometrika, 54, 675-676.
36
Testing Problem for Increasing Function in a
Model with Infinite Dimensional Nuisance
Parameter

M. Nikulin and V. Soley


University Bordeaux 2, Bordeaux, France
f:j Steklov Mathematical Institute, St. Petersburg, Russia
Steklov Mathematical Institute, St. Petersburg, Russia

Abstract: We consider the next statistical problem arising, for example, in


accelerated life testing. Let Xl be a random variable with density function
!(t), W(t) be an increasing absolutely continuous function, <Ii(t) = W-1(t) be its
inverse function, random variable X2 be defined as X2 = <Ii(XI). In order to test
whether function W(t) belongs to a given parametric family when function! is
completely unknown, we take two independent nonparametric estimators in of
density function! and gn of density function g of X2 and compare the function
gn(t) with the function in(w(en; t))'lj;(en; t) for a minimum distance estimator
en. But at the begining we have to investigate the asymptotic behavior of
the estimator en. We consider a parametric minimum distance estimator for W,
when we observe (with a mechanism of independent censoring) two independent
samples from the distributions of Xl and X 2 respectively.

Keywords and phrases: Accelerated life testing, goodness-of-fit test, Kaplan-


Meier estimator, kernel density estimator, minimum distance estimator, non-
parametric estimation, nuisance parameter

36.1 Introduction
Let Xl be a random variable with the distribution function F(t) and density
function !(t), w(t) be an increasing absolutely continuous function, <Ii(t) be the
inverse function: <Ii = w- l . We put X 2 = <Ii(XI). The distribution function

477
C. Huber-Carol et al. (eds.), Goodness-of-Fit Tests and Model Validity
© Springer Science+Business Media New York 2002
478 M. Nikulin and V. Soley

G(t) and the density function g(t) of X2 may be easily calculated

G(t) = F(\J!(t)), g(t) = f(\J!(t))'l/J(t),


(36.1)
where 'l/J(t) = fit \J!(t), ¢(t) = fit <]?(t).
We consider the case when we have two independent samples

the first one with the distribution function F(t) and the density function f(t),
and the second one with the distribution function G(t) = F(\J!(t)) and the
density function g(t) = f(\J!(t))'l/J(t). Let

be another two independent samples of random variables with distribution func-


tions QI(t), Q2(t) and density functions ql(t), q2(t) respectively. We assume
that Xl, X 2 , Tl, T2 are independent.
Now suppose we observe censored data: rl,l,"', rl,nl and r2,1, ... , r2,n2'
where we denote

(36.2)

where

Xk,j :'S Tk,j Xk,j :'S Tk,j


(36.3)
Xk,j > Tk,j , Xk,j > Tk,j ,

j=l, .. ·,nk; k=1,2.


We consider the estimation problem for the unknown function <]? in a situa-
tion, when the distribution function F is completely unknown and the increasing
function <]? belongs to a given parametric family,

<]? E B = {<]? : <]?(t) = <]?(8; t), 8 E 8},

where 8 c Rd. So further we shall write

<]?(t) = <]?(8; t), \J!(t) = \J!(8; t), ¢(t) = ¢(8; t), 'l/J(t) = 'l/J(8; t),

G(t) = G(8; t), g(t) = g(8; t),


but sometimes it will be convenient for us to omit the parameter 8. We suppose
that
Estimation of Increasing Function 479

and
nl . n2
o < n--->oo
lim - = PI,
n
0 < hm -
n--->oo n
= P2· (36.4)

We note here that this problem is very natural in accelerated life testing,
see for example, Bagdonavicius and Nikulin (1999, 2000), Gerville-Reache and
Nikoulina (1999), Bagdonavicius, Gerville-Reache, Nikoulina and Nikulin (2000).
It is convenient for us to admit that random variables TI,j or T2,i may take the
value +00 with positive probability.
We shall use the Kaplan-Meier estimators Fnl (t) and 9n2(t) as the non-
parametric estimators for F(t) and G(B; t),

1-Fnl(t) = II
"K-I Z
( n- j )
<t n - ~ + 1
.,
J. 1,)-, 1,J_

(with the convention that I10 = 1), where ZI,I :S ... :S ZI,nl are the order
statistics of the sample yl = (YI,I,"', YI,nl) and Z2,1 :S ... :S Z2,n2 are the
order statistics of the sample y2 = (Y2,1,"', Y2,n2)'
We consider the kernel density estimator in(t) for estimating f(t) at a fixed
point t E R

(36.5)

and the kernel density estimator gn(t) for estimating g(B; t) at a fixed point
tE R

(36.6)

where K. is a suitably chosen nonnegative kernel function (K. : R ----+ R) such


that J K(x) dx = 1, h n > 0 is a bandwidth such that h n ----+ 0 and nhn ----+ 00, as
n ----+ 00. More about the properties of in(t) and gn(t) and their applications see,
for example, Rosenblatt (1956), Parzen (1962), Bickel and Rosenblatt (1973),
Bretagnolle and Huber (1979), Ghosh and Wei-Min (1991), Hall (1984), Horvath
(1991), Czorgo, Gombay, and Horvath (1991), etc. Further (and here) for
simplicity of notation all integrals J b(x )dx for any function b : R ----+ R, without
set of integration are understood to be integrals JR b(x)dx on the Lebesgue
measure. Let [a, b] be an interval such that F(a) = 0,

J(B) = [a(B), b(B)], where a(B) = w(B; a), b(B) = w(B; b).
480 M. Nikulin and V. Soley

We suppose for simplicity that function 1lJ(0;·) is defined on the real line and
the image of 1lJ(0;·) does not depend on O. We define the function f(O, 01 ; t) by
the relation

(36.7)

So if the operator Ao is defined by

[Aoh] (t) = h (1lJ(0; t)) 'ljJ(0; t),

and therefore the operator A(i 1 is defined by

then

(36.8)

We put

g~(O; t) in (1lJ(0; t)) 'ljJ(0; t),


f~(O; t) gn (<1>(0; t)) ¢(O; t). (36.9)

For a nonnegative function ro(t) we define the function ro(t) by the relations

ro(t)¢(O;t) = ro(<1>(O;t)) or ro(t)'ljJ(O;t) = ro(1lJ(O;t)).

It is clear that

J Ig~(O; t) - gn(t)1 2ro(t) dt = J lin(t) - f~(O; t)1 2ro(t) dt.


1(0) I

We consider the minimum-distance estimator en of 0 which is defined by the


relation

en = argmjn J
I
lin(t) - f~(0;t)12ro(t)dt. (36.10)

It is evident that

en = arg mjn J Ig~(O; t) - gn(t)12ro(t) dt. (36.11)


1(0)

The estimator en may be considered as a candidate for Vn-consistent estimator


with the Gaussian limiting distribution.
Estimation of Increasing Function 481

In order to test whether the function w(t) belongs to a given parametric


family
~ E B = {<l>: ~(t) = ~(O;t),O E 8},

where 8 c R d, and the function f is completely unknown, we have to study


the asymptotic properties of the statistics:

Tn = J
I
lin(t) - f~(O; t)1 2ro(t) dt

and we shall prove the asymptotic normality of Tn.


Now we introduce further notations and list all assumptions. We define the
distribution functions H(t), H(O; t), H*(O; t) by the relation

1 - H(t) = (1 - F(t)) (1- Ql(t)) , }


1 - H(O; t) = (1 - F(w(O; t))) (1 - Q2(t))) , . (36.12)
1- H*(O;t) = (1- F(t)) (1- Q2(~(O;t))).

So we have
H*(O; t) = H(O; ~(O; t)).
We assume, that
DI. Functions H(t) and H(O; t) satisfy the conditions:

b < t* = sup {t: H(t) < I} and b < t*(O) = sup {t: H*(O;t) < I}.

It is clear that under Dl

b(O) = w(O;b) < sup {t: H(O;t) < I}. (36.13)

We suppose also that kernel K satisfies the conditions


KI. Function K(t) is nonnegative, symmetric:

K(t) = K( -t) and K(t) = 0, when It I > 1.

K2. Function K(t) is twice differentiable.


K3. Integral J K(t) dt = l.
I
K4. Integral J [~.J(] (t)1
dt < 00. Notice that

J tK(t) dt = 0, and J t 2K(t) dt > O.

We assume that the sequence of bandwidths satisfies the conditions


HI.
logn 0
hn~O, -2-~ ,
hnn
482 M. Nikulin and V. Sole v

nh~ ~ 0, when n ~ 00.

We suppose too that the density function f(t), the weight ro(t) and the distri-
bution functions H(t), H(B; t) satisfy the following conditions on some appro-
priately chosen interval [a, b]:
Fl. Function f is continuously differentiable on interval [a, b + c:] for some
ft
c: > 0, function f is absolutely continuous on [a, b + c:] and

[ftft (t)
sup
tE[a,b+e:] f(t)
< 00, sup
tE[a,b+e:]
1[:22f]
t
(t)1 < 00.
F2. Functions f and 9 satisfy the condition

where
fnK(t) = J (th;:
- U)h1n K f(u) du,

=J
and
g~(B; t) C~nU) :n JC g(B; u) duo

Ql. Functions Ql(t) and Q2(t) are absolutely continuous on [a,b+c:] and
. d·
sup Iqj (t)1 < 00, where qJ(t) = dtQJ(t) (j = 1,2);
tE[a,b+e:]

rl. Function ro(t) is continuously differentiable on interval [a, b] and sup-


ported on [a, b] .
r2. There exists such C = C(B), that

re(t)C(B) ~ reI (t) for all Bl E 8, t E J(B) n J(BI).

For a nonnegative function r(t) we denote by the L2-space generated L;


by the measure with the density r(t) and by II<PUllr the norm of <p(.) in the
space L;.A function <p(B; t) (as function on B) is differentiable (in the space
L;) on B in the open kernel of 80 if for all Bo, that belongs to 80, there exists
such vector-function V <p that

<p(B;·) - <p(Bo;·) = (V<p (Bo; .), B - Bo) + peo(B; .),


where
Ilpeo (-, B) Ilr ---+ 0, when liB - Bo II ---+ 0,

and (-, .) , II . II are the inner product and the norm in Rd. Similarly a function
<p( B; t) (as function on B), that is differentiable (in the space L;) on B in the
Estimation of Increasing Function 483

open kernel of 8 0, is twice differentiable if for all Bo E 80 there exists such


matrix-function H'P (Bo; t), that

V'P (B;·) - V'P (B o;') = H'P (Bo; ·)(B - Bo) + {}()o(B; .),

where
II{}()o(', B)112 -+ 0 in the space L;, when liB - Boll -+ O.
Here for matrix A we write IIAII~ = trA* A, where A* denotes the conjugate
matrix and for

all ... aId)


matrix A = ( ...... ... , and vector b = (b l , ... , bd)
adl ... add

we use the notation Ab = (CI,"', Cd), where Cj = L.~=l ajkbk.


Further we will write

and

H'P(B;t) = (
a~l a~l cp(B; t) '"
... ..,
a~l at cp(B; t) )
... .

ata~l cp(B;t) atatcp(B;t)


D4. Function f ((), ()l; .) is continuous on ()1 in the space L;o in some neigh-
borhood of the true value () and for any E > 0

D5. Functions w(B; t) and 'lj;(B; t) are continuously differentiable on B.

36.2 Consistency of the Estimator On


For a nonnegative function r consider the L;-distance between inO and f(·)·
We put

(36.14)

Burke, Czorgo, and Horvath (1981) proved that under the above conditions the
next proposal is true.
484 M. Nikulin and V. Soley

Proposition 36.2.1 If function r is continuous then for some nonnegative


constants
(J"2(p) = (J"2 (p, F, Q, r, K) and m(p) = m (p, F, Q, r, K)
for 1 :S p < 00 we have

(36.15)

JIf~(t)
where
In(P) = - f(t)IPr(t) dt.

Proposition 36.2.2 Suppose that 0 is the true value of unknown parameter.


Under the conditions of Proposition 36.2.1 we suppose also, that there exists
such C = C(O), that
r(}(t)C(O) ~ r(Ol; t) for all 01 E e, t E 1(0) n 1(01 ),
Then
(36.16)

PROOF. Under the conditions of Proposition 36.2.1


A P
:fro Un, J) -) 0 as n -----+ 00, (36.17)

and

:fre(9n,g)£.O as n-----+oo. (36.18)

Let en be the minimum distance estimator,

en = argmjn J
1
lin(t) - f~(e;t)12ro(t)dt,

for the true value of the parameter e. We note that

J lin(t) - f~(en;t)12ro(t)dt

J
1

< lin(t) - f~(e;t)12ro(t)dt


1

< 2 {f If(t) - f~(0;t)12ro(t)dt+ f lin(t) - f(t)I'ro(t)dt}


2 {f Ig(0; t) - gn(t) 2r,(t) dt + f lin( t) - f( t) I'ro(t) dt } ,
1
Estimation of Increasing Function 485

from which it follows that

J
I
lin(t) - f~(en; t)1 2ro(t) ~ ° as n ---+ 00, (36.19)

and therefore

(36.20)

By the same way it can be proved that

J
I
A *A
If(8, 8n;t) - fn(8n; t)1 ro(t)
2 P
--t ° as n ---+ 00. (36.21 )

So we have for the true value 8 of the parameter

J
I
If(t) - f(8, en; tWro(t) ~ ° as n ---+ 00. (36.22)


We suppose, that the function f(8, (h;·) is continuous on 81 in the space L;o
in some neighborhood of the true value 8 and for any c > 0

Proposition 36.2.3 We suppose, that the conditions of Proposition 36.2.2 are


verified and for any c > 0
8(c) > O.
Then

P {18 - enl 2: c} ~ ° as n ---+ 00. (36.23)

PROOF. It is clear, that

(36.24)

Since
5(c) ---+ 0, when c --t 0,
and, as it is follows from Proposition 36.2.2, if 8 is the true value of parameter,
then
p {J If(t) - f(e, en; t)12 ro(t)dt 2: 8} ---+ 0, when 5 --t 0,

and hence we obtain (36.23). •


486 M. Nikulin and V. Soley

36.3 Asymptotic Behavior of Kernel Estimators


of Densities
Let X = {Xl,"', Xn} be a sample of Li.d. random variables with the distri-
bution function F(t) and the density function f(t) and T = {TI' .. " Tn} be a
sample of i.i.d. random variables with the distribution function QI (t) and the
density function ql(t). We assume that X and T are independent. Suppose we
observe the random vectors

ri = (Yi,Ki) , (i = 1,'" ,n),


which are defined by X, T as it was done in (36.3). Following to (36.12) we put

H(t) = 1- (1- F(t)) (1- QI(t)). (36.25)

Let
yCI,n) ::; yC2,n) ::; ... ::; Y(n,n)
be the ordered statistics of the sample Y = {YI,"', Yn}. We consider the
Kaplan-Meier estimator Fn(t) of the distribution function F(t)

1 - Fn(t) = IT (
n_ j )K j

n-j+1 '
j : 1 ::; j ::; n,
Y(j,n) ::; t

where we put IT Cj = 1, if S = 0. We consider also a kernel estimator of the


jES
density function f(t),

(36.26)

Notice that in the case, when P {Xj ::; Tj} = 1 the Kaplan-Meier estimator
Fn(t) coincides with the usual empirical distribution function Fn(t),
n
Fn(t) =L l(-oo,t) (Xi) .
i=l

We denote by D(t) the increasing function with density function p(t),

f(t) f(t)
p(t)
(1 - H(t)) (1 - F(t)) - (1 - QI(t)) (1 - F(t))2'
D(a) O. (36.27)
Estimation of Increasing Function 487

Burke, Czorgo, and Horvath (1981) proved that under the condition

b < t* = sup {t : H(t) < 1} D3

there exists such a sequence of Wiener processes {Wn(t), t 2: a}, Wn(a) = 0,


that

sup I~n(t) - W~I = 0 (n- 1 / 2 logn), a.s., (36.28)


tE[a,b]

where

W~(t) = (1 - F(t)) Wn (V(t)) , and ~n(t) = Vii (Fn(t) - F(t)) .

Thus,

Vii (Fn(t) - F(t)) = W~ (t) + Rn(t),


where sup IRn(t)1 = 0 (n- 1/ 2 logn), a.s. (36.29)
tE[a,b]

Let W(t) be a Wiener process on interval [a, 00), W(a) = 0, and

W*(t) = (1 - F(u)) W (D(u)).


We consider the linear operator A: L} - - t L}, such, that for y(.) E L}

A [y] (t) = y(t) (1 - F(t)) - Y(t), where Y(t) = J f(t)y(t) dt.


[t,b]

It must be noted that the function Y(t) is absolutely continuous,

d
dt Y(t) = -y(t)f(t)

J
and if
l(y) = f(t)y(t) dt = 0, then Y(a) = Y(b) = 0.
[a,b]

Lemma 36.3.1 Suppose that F(a) = 0, F(b) < 1, then

J (A [y] (t)? f(t)dt 2


(1 - F(t))
= J y2(t)f(t) dt - ( J y(t)f(t) dt) 2 (36.30)
[a,b] [a,b] [a,b]
488 M. Nikulin and V. Solev

PROOF. Under the conditions F(a) = 0, F(b) < 1, integration by parts gives

2 J y(t)Y(t) f(t)dt
(1 - F(t))
= y2(0) + J y2(t) f(t)dt .
(1 - F(t))2
(36.31)
[a,b] [a,b]

Since
2 f(t) 2 f(t) 2 f(t)
(A [y] (t)) (1 _ F(t))2 = Y (t)f(t) - 2y(t)Y(t) (1 - F(t)) +Y (y) (1 _ F(t))2'

from (36.31) we obtain (36.30).



Suppose that Ql (b) < 1,then

(t) - f (t) <C f (t) £ C 1


p - (1 - Q(t))(l- F(t))2 - (1 _ F(t))2 or = (1 - Q(b))"

Hence, under the conditions H(b) < 1, F(a) = 0, we have

J (A [y] (t))2 p(t)dt ::; C J y2(t)f(t) dt,


[a,b] [a,b]

Lemma 36.3.2 Suppose that function y(t) satisfies the condition

(J"2(y) = J IA [y](tWp(t) dt < 00.

[a,b]

J
Then the relation
Y(y) = y(t) dW*(t)
[a,b]

determines a Gaussian random variable with zero mean and variance (J"2 (y).

PROOF. It is clear that

dW*(t) = -W (V(t)) f(t) dt + (1- F(t)) dW (V(t)).


Since for any bounded function y(t) supported on such subinterval [a*, b*] c
[a, b], that the measure dV( t) is finite on [a*, b*], the integral

J W (V(t)) f(t)y(t) dt
[a,b]

is well defined and

J W (V(t)) f(t)y(t) dt = J W (V(t)) dY(t),


[a,~ [a,~
Estimation of Increasing Function 489

where the function Y(t) is defined on interval [a, b] by the relation

Y(t) = J f(s)y(s) ds.


[t,b]

Further we shall denote by S the set of all such functions y(t). Integration by

J J
parts gives
W (V(t» f(t)y(t) dt = Y(t) dW (V(t».
~~ ~~
Hence for such function y(t) integral

J y(t) dW* (t)


[a,b]

is well defined and

J
[a,b]
y(t) dW*(t) J
[a,b]
[(1 - F(t)y(t) - Y(t»] dW (V(t»

J
[a,b]
[A[y] (t»] dW (V(t».

Since the last integral is well defined for any y(t) such that

IlyOII~ = J IA(y(t))12p(t) dt < 00,

[a,b]

and the set S is dense in the space with semi-norm II· we can define the "*,
integral on dW*(t) for any function 'P, II'PII* < 00, by the relation

1:'('P) = J 'P(t) dW*(t) = J [A['P](t)] dW (V(t)) .


[a,b] [a,b]


Let W (t) be a Wiener process and let

Zn(t) = J (1- F(u»)W(V(u)) [:u:n K C~nU)] duo

Lemma 36.3.3 Suppose that conditions Dl, Kl-K4, Hl hold. Then, as n ~


00, we have
490 M. Nikulin and V. Solev

where

a 2 = a 2 (j, H, K) = 1 (1 - H(t))-2 f2(u)r2(u) du (1 K2(u) dU) 2,

[a,b]

m = m(j, H, K) =1 (1 - H(t))-l f(u)r(u) du 1 K2(u) duo


[a,b]·

This lemma was proved by Burke, Czorgo, and Horvath (1981).


Now we apply (36.29) also to the process 9n(t):

vn (9n(t) - G((); t)) = Wn(t) + Rn(t),


where sup IR n (t)I=O(n- 1 / 2 logn), a.s. (36.32)
tE[a,b]

Here

{wn} is a sequence of Wiener processes and we denote by D 2 (t) the increasing


function with density function P2(t),

g(();t)
(1 - Q2(t)) (1 - G((); t))2'
O. (36.33)

Thus, we have

1~K (~) = 1~K (t - dFn1 (u) U) dF(u)

+-1 1
hn hn hn hn

y'nl
(t - U)
1 1
-K
hn
-
hn
* 1
dWn(u)+-
y'nl
1
-K
hn
-
hn
(t - U) dRn(u) ,
and

f~((); t) ¢(();t) 1 :n K- U) dQn2(U)


CI!(();~:
¢(();t) 1:n
K (iI?(();~: - U) dG(();u)

+ ~¢(();t) 1L K (iI?(();~: - U) dWn(u)

+ ~¢(();t) 1:n K (iI?(();~: - U) dRn(U).

Burke, Czorgo,and Horvath (1981) obtained the next result.



Lemma 36.3.4 Under condition H1 and the conditions

(1) the function $K(t)$ is compactly supported and absolutely continuous,

(2) $\int \left|\frac{d}{dt} K(t)\right| dt < \infty$,

(3) $r_0$ is a continuous nonnegative function such that $\int_{[a,b]} r_0(t)\, dt < \infty$,

(4) the function $H(t)$ satisfies the condition D3,

we have

$$n h_n^{1/2} \int \left|\frac{1}{\sqrt{n}}\, \frac{1}{h_n} \int K\!\left(\frac{t - u}{h_n}\right) dR_n(u)\right|^2 r_0(t)\, dt \to 0 \quad \text{as } n \to \infty.$$


In the same way we can prove the next lemma.

Lemma 36.3.5 Under the conditions of Lemma 36.3.4 and the condition

(5) $r(\theta; t)$ is a nonnegative function such that

$$\int_{[a(\theta), b(\theta)]} r(\theta; t)\, dt < \infty,$$

we have

$$n h_n^{1/2} \int \left|\frac{1}{\sqrt{n}}\, \phi(\theta; t)\, \frac{1}{h_n} \int K\!\left(\frac{\Phi(\theta; t) - u}{h_n}\right) d\bar R_n(u)\right|^2 r_0(t)\, dt \to 0 \quad \text{as } n \to \infty.$$

From Lemma 36.3.4 and Lemma 36.3.5 the next lemma follows.

Lemma 36.3.6 Under the conditions of Lemma 36.3.4 and Lemma 36.3.5 and the condition F2, we have

$$n h_n \left\{\int \left|\hat f_n(t) - \hat f_n^*(\hat\theta_n; t)\right|^2 r_0(t)\, dt\right\} = n h_n \left\{\int \left|J_n^{(1)}(t) - J_n^{(2)}(\hat\theta_n; t)\right|^2 r_0(t)\, dt\right\} + r_n,$$

where

$$J_n^{(1)}(t) = \frac{1}{\sqrt{n}}\, \frac{1}{h_n} \int K\!\left(\frac{t - u}{h_n}\right) dW_n^*(u), \qquad J_n^{(2)}(\theta; t) = \frac{1}{\sqrt{n}}\, \phi(\theta; t)\, \frac{1}{h_n} \int K\!\left(\frac{\Phi(\theta; t) - u}{h_n}\right) dw_n(u),$$

and $r_n \xrightarrow{P} 0$ as $n \to \infty$. Here we suppose that the Wiener processes $W_n$ and $w_n$ are independent.



From this lemma and Lemma 36.3.3 the statement of the theorem follows.

Acknowledgements. This research was supported by the Conseil Régional d'Aquitaine, Grant 20000204009, by the Russian Foundation for Basic Research, grants 99-01-00111 and 00-015-019, and by RFBR-DFG grant 99-01-04027.

References
1. Bagdonavicius, V. and Nikulin, M. (1999). On semiparametric estimation
of reliability from accelerated life testing, In Statistical and Probabilis-
tic Models in Reliability (Eds., D. Ionescu and N. Limnios), pp. 75-89,
Boston: Birkhauser.

2. Bagdonavicius, V. and Nikulin, M. (2000). Semiparametric estimation in


accelerated life testing. In Recent Advances in Reliability Theory, (Eds.,
N. Limnios and M. Nikulin), pp. 405-418, Boston: Birkhauser.

3. Bagdonavicius, V., Gerville-Reache, L., Nikoulina, V., and Nikulin, M. (2000). Experiences accelerees: analyse statistique du modele standard de vie acceleree, Revue de Statistique Appliquee, XLVIII, 5-38.

4. Bickel, P. and Rosenblatt, M. (1973). On some global measures of the deviations of density function estimators, Annals of Statistics, 1, 1071-1095.

5. Bretagnolle, J. and Huber, C. (1979). Estimation des densites: risque minimax, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 47, 119-137.

6. Burke, M., Csörgő, S., and Horváth, L. (1981). Strong approximation of some biometric estimates under random censorship, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 56, 86-112.

7. Csörgő, M., Gombay, E., and Horváth, L. (1991). Central limit theorems for $L_p$ distance of kernel estimators of densities under random censorship, Annals of Statistics, 19, 1813-1831.

8. Gerville-Reache, L. and Nikoulina, V. (1999). Analysis of reliability char-


acteristics estimators in accelerated life testing, In Statistical and Prob-
abilistic Models in Reliability, (Eds., D. Ionescu and N. Limnios), pp.
91-100, Boston: Birkhauser.
9. Ghosh, B. K. and Wei-Min, H. (1991). The power and optimal kernel of the Bickel-Rosenblatt test for goodness of fit, Annals of Statistics, 19, 999-1009.
10. Hall, P. (1984). Central limit theorem for integrated square error of multivariate nonparametric density estimators, Journal of Multivariate Analysis, 14, 1-16.
11. Horváth, L. (1991). On $L_p$-norms of multivariate density estimators, Annals of Statistics, 19, 1933-1949.
12. Parzen, E. (1962). On estimation of a probability density function and
mode, Annals of Mathematical Statistics, 33, 1065-1076.

13. Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a


density function, Annals of Mathematical Statistics, 27, 832-837.
37
The Concept of Generalized Asymptotic
Deficiency and its Application to the Minimum
Discrepancy Estimation

Masafumi Akahira
University of Tsukuba, Ibaraki, Japan

Abstract: The concept of (asymptotic) deficiency was defined as the (limit of the) additional number of observations, which is an integer. However, it would be convenient for the value of the deficiency to be continualized in the higher order asymptotics. In order to do so, Hodges and Lehmann (1970) suggested using stochastic interpolation as one of the methods. In this paper, using this method, we consider the concept of generalized asymptotic deficiency and apply it to the minimum discrepancy estimation, including the minimum chi-square estimation considered by Fisher (1928) in relation to the work of Pearson (1900).

Keywords and phrases: Asymptotic deficiency, risk stochastic interpolation, asymptotic variance, minimum chi-square estimator, maximum likelihood estimator

37.1 Introduction
In their paper, Hodges and Lehmann (1970) introduced the concept of (asymptotic) deficiency as follows. Let $\delta_n$ be a statistical procedure based on $n$ observations, and let $\delta_{k_n}$ be a less effective procedure which requires a larger number $k_n$ of observations to give equally good performance. The additional number $k_n - n$ of observations needed by the procedure $\delta_{k_n}$ is called its deficiency. If $d := \lim_{n\to\infty}(k_n - n)$ exists, it is called the asymptotic deficiency. In the higher order asymptotics, the concept of asymptotic deficiency is very useful in discriminating asymptotically efficient estimators [see, for example, Akahira (1986, 1999a, 1999b)]. Then, it is desirable for the value of the asymptotic deficiency


to be continualized in relation to the higher order term of the asymptotic variance of estimators. The method of stochastic interpolation for this continualization was remarked on by Hodges and Lehmann (1970). In this chapter, using that method, we consider the concept of the generalized asymptotic deficiency (by risk) of a statistical procedure and apply it to the minimum discrepancy estimation of multinomial parameters, which was studied by Fisher (1925), Rao (1961), Ponnapalli (1976), Tanaka and Akahira (1995, 1996) and others. The generalized asymptotic deficiency of the minimum chi-square estimator relative to the maximum likelihood estimator is also given.

37.2 The Concept of Generalized Asymptotic Deficiency
Let $\delta_{1,n} := \delta_{1,n}(X)$ and $\delta_{2,n} := \delta_{2,n}(X)$ be two statistical procedures based on the same sample $X := (X_1, \ldots, X_n)$ of size $n$, with risks $r(\delta_{1,n}) > 0$ and $r(\delta_{2,n}) > 0$ both converging to zero as $n \to \infty$. Let $k$ be a positive number, and define $\pi_k$ and $K$ as

$$\pi_k := k - [k]$$

and

$$K := \begin{cases} [k] & \text{with probability (w.p.) } 1 - \pi_k,\\ [k] + 1 & \text{with probability } \pi_k, \end{cases}$$

where $[k]$ is the largest integer not exceeding $k$. Then we have

$$E(K) = [k](1 - \pi_k) + ([k] + 1)\pi_k = k.$$
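The randomization defining K is straightforward to realize; the sketch below (function name ours) draws interpolated sample sizes and confirms E(K) = k empirically.

    import numpy as np

    rng = np.random.default_rng(2)

    def interpolated_size(k, size):
        # [k] w.p. 1 - pi_k, [k] + 1 w.p. pi_k, so that E(K) = k exactly
        floor_k = int(np.floor(k))
        pi_k = k - floor_k
        return floor_k + (rng.random(size) < pi_k)

    K = interpolated_size(9.125, 200000)   # e.g. k = n + 9/8 with n = 8
    print(K.mean())                        # approximately 9.125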


We also define

$$\delta_{2,k} := \begin{cases} \delta_{2,[k]} & \text{w.p. } 1 - \pi_k,\\ \delta_{2,[k]+1} & \text{w.p. } \pi_k, \end{cases} \tag{37.1}$$

$$r(\delta_{2,k}) := (1 - \pi_k)\, r(\delta_{2,[k]}) + \pi_k\, r(\delta_{2,[k]+1}). \tag{37.2}$$

Choose a sequence $\{k_n\}$ of positive numbers such that for some $\alpha > 0$

$$r(\delta_{2,k_n}) = r(\delta_{1,n}) + o\!\left(\frac{1}{n^{\alpha+1}}\right). \tag{37.3}$$

If there exists

$$d := \lim_{n\to\infty} (k_n - n)$$

and it is independent of the particular sequence $\{k_n\}$ chosen, then it is called the generalized asymptotic (gen. as.) deficiency (by risk) of $\delta_{2,k_n}$ relative to $\delta_{1,n}$ [see also Hodges and Lehmann (1970)]. Note that the value of the gen. as. deficiency is not always an integer.

Theorem 37.2.1 If for some $a > 0$ and $b_j \in \mathbb{R}$

$$n^{\alpha}\, r(\delta_{j,n}) = a + \frac{b_j}{n} + o\!\left(\frac{1}{n}\right), \qquad j = 1, 2, \tag{37.4}$$

then

$$\lim_{n\to\infty} (k_n - n) = \frac{b_2 - b_1}{\alpha a}.$$
PROOF. Since the risks $r(\delta_{1,n})$ and $r(\delta_{2,n})$ are positive and converge to zero as $n \to \infty$, we have $k_n \to \infty$ as $n \to \infty$. Since, by (37.3),

$$r(\delta_{2,k_n}) = r(\delta_{1,n}) + o\!\left(\frac{1}{n^{\alpha+1}}\right),$$

it follows from (37.2) that

$$(1 - \pi_{k_n})\, r(\delta_{2,[k_n]}) + \pi_{k_n}\, r(\delta_{2,[k_n]+1}) = r(\delta_{1,n}) + o\!\left(\frac{1}{n^{\alpha+1}}\right).$$

From (37.4) we have

$$(1 - \pi_{k_n}) \left(\frac{n}{[k_n]}\right)^{\alpha} \left\{a + \frac{b_2}{[k_n]} + o\!\left(\frac{1}{[k_n]}\right)\right\} + \pi_{k_n} \left(\frac{n}{[k_n]+1}\right)^{\alpha} \left\{a + \frac{b_2}{[k_n]+1} + o\!\left(\frac{1}{[k_n]+1}\right)\right\} = a + \frac{b_1}{n} + o\!\left(\frac{1}{n}\right). \tag{37.5}$$

Since the interpolation in (37.2) is linear and $([k_n]+1)/[k_n] \to 1$, the left-hand side of (37.5) equals $(n/k_n)^{\alpha}\{a + b_2/k_n + o(1/k_n)\}$ up to an $o(1/n)$ term, and taking the $1/\alpha$-th power of both sides of (37.5) it follows that

$$a^{1/\alpha}\, \frac{n}{k_n} + \frac{n}{k_n}\cdot\frac{a^{1/\alpha}\, b_2}{\alpha a\, k_n} + o\!\left(\frac{1}{n}\right) = a^{1/\alpha} + \frac{a^{1/\alpha}\, b_1}{\alpha a\, n} + o\!\left(\frac{1}{n}\right). \tag{37.6}$$

Since $\lim_{n\to\infty} k_n = \infty$, it is easily seen from (37.6) that $\lim_{n\to\infty} k_n/n = 1$. Subtracting $a^{1/\alpha}$ from both sides of (37.6) and multiplying by $n$, we have for large $n$

$$\left(\frac{n}{k_n}\right)^{2} \frac{a^{1/\alpha}\, b_2}{\alpha a} + o(1) = a^{1/\alpha}\, (k_n - n)\, \frac{n}{k_n} + a^{1/\alpha}\, \frac{b_1}{\alpha a} + o(1). \tag{37.7}$$
498 M. Akahira

Since $\lim_{n\to\infty} n/k_n = 1$, it follows from (37.7) that

$$\frac{b_2}{\alpha a} = \lim_{n\to\infty}(k_n - n) + \frac{b_1}{\alpha a},$$

which implies

$$\lim_{n\to\infty}(k_n - n) = \frac{b_2 - b_1}{\alpha a}.$$
Thus we complete the proof.
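Theorem 37.2.1 can be illustrated numerically with toy risks $r_j(m) = (a + b_j/m)/m^{\alpha}$ satisfying (37.4) (the constants below are arbitrary choices of ours): solving $r(\delta_{2,k_n}) = r(\delta_{1,n})$ with the interpolated risk (37.2) exhibits $k_n - n \to (b_2 - b_1)/(\alpha a)$.

    import numpy as np
    from scipy.optimize import brentq

    a, alpha, b1, b2 = 2.0, 1.5, 1.0, 4.0      # arbitrary illustrative values

    def r(m, b):                               # toy risk satisfying (37.4)
        return (a + b / m) / m ** alpha

    def r_interp(k, b):                        # interpolated risk as in (37.2)
        fl = np.floor(k)
        return (1 - (k - fl)) * r(fl, b) + (k - fl) * r(fl + 1, b)

    for n in (10, 100, 1000, 10000):
        kn = brentq(lambda k: r_interp(k, b2) - r(n, b1), n, 2 * n)
        print(n, kn - n)                       # -> (b2 - b1)/(alpha*a) = 1.0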

Assume that $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of independent and identically distributed (i.i.d.) random variables according to a density $f(x, \theta)$ with respect to a $\sigma$-finite measure $\mu$, where $\theta$ is a real-valued parameter. Let $\hat\theta_{1,n} := \hat\theta_{1,n}(X)$ and $\hat\theta_{2,n} := \hat\theta_{2,n}(X)$ be first order efficient estimators of $\theta$, where $X = (X_1, \ldots, X_n)$. Denote by $V_\theta(\hat\theta_{j,n})$ $(j = 1, 2)$ the (as.) variances of the estimators $\hat\theta_{j,n}$ of $\theta$. Then for any positive number $k$ we define $\hat\theta_{2,k}$ and $V_\theta(\hat\theta_{2,k})$, like (37.1) and (37.2), by

$$\hat\theta_{2,k} := \begin{cases} \hat\theta_{2,[k]} & \text{w.p. } 1 - \pi_k,\\ \hat\theta_{2,[k]+1} & \text{w.p. } \pi_k := k - [k], \end{cases}$$

and

$$V_\theta(\hat\theta_{2,k}) := (1 - \pi_k)\, V_\theta(\hat\theta_{2,[k]}) + \pi_k\, V_\theta(\hat\theta_{2,[k]+1}),$$

and, in a similar way to (37.3), take $k_n$ such that

$$n V_\theta(\hat\theta_{2,k_n}) = n V_\theta(\hat\theta_{1,n}) + o(1/n). \tag{37.8}$$

If, for each $j = 1, 2$, the (as.) variance $V_\theta(\hat\theta_{j,n})$ admits the expansion

$$V_\theta(\hat\theta_{j,n}) = \frac{1}{n I(\theta)} + \frac{\Delta_j(\theta)}{n^2} + o\!\left(\frac{1}{n^2}\right), \tag{37.9}$$

then, letting $\alpha = 1$, $a = 1/I(\theta)$ and $b_j = \Delta_j(\theta)$ $(j = 1, 2)$ in Theorem 37.2.1, we have

$$d(\hat\theta_{2,k_n}, \hat\theta_{1,n}) = I(\theta)\, (\Delta_2(\theta) - \Delta_1(\theta)), \tag{37.10}$$

which is the gen. (as.) deficiency (by as. variance) of $\hat\theta_{2,k_n}$ relative to $\hat\theta_{1,n}$, where $I(\theta)$ is the amount of the Fisher information of $X_1$, i.e.

$$I(\theta) = E_\theta\!\left[\left(\frac{\partial}{\partial\theta} \log f(X_1, \theta)\right)^2\right].$$

Example 37.2.1 Suppose that $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of i.i.d. random variables according to the normal distribution with mean $\theta$ and variance 1, where $n \ge 2$. Then we consider the two estimators

$$\hat\theta_{1,n} := \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \hat\theta_{2,n}^{(w)} := \frac{1}{n}\left\{X_1 + \cdots + X_{n-2} + w X_{n-1} + (2 - w) X_n\right\},$$
Generalized Asymptotic Deficiency 499

as unbiased estimators of $\theta$, where $0 \le w \le 2$. Their variances are given by

$$V_\theta(\hat\theta_{1,n}) = \frac{1}{n}, \qquad V_\theta\big(\hat\theta_{2,n}^{(w)}\big) = \frac{1}{n} + \frac{2}{n^2}(w - 1)^2. \tag{37.11}$$
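A Monte Carlo check of (37.11); n, w, theta and the replication count are arbitrary illustrative values.

    import numpy as np

    rng = np.random.default_rng(3)
    n, w, theta, reps = 20, 0.25, 0.0, 400000
    X = rng.normal(theta, 1.0, size=(reps, n))
    est = (X[:, :n - 2].sum(axis=1) + w * X[:, n - 2] + (2 - w) * X[:, n - 1]) / n
    print(est.var(), 1 / n + 2 * (w - 1) ** 2 / n ** 2)   # both about 0.0528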

Note that the Fisher information number $I(\theta)$ is equal to 1 and $\hat\theta_{1,n}$ is the UMVU estimator of $\theta$. From (37.8) and (37.11) we have

$$k_n = \frac{1}{2}\left\{n + \sqrt{n^2 + 8n(w - 1)^2}\right\} = n + 2(w - 1)^2 + O\!\left(\frac{1}{n}\right),$$

hence

$$\lim_{n\to\infty}(k_n - n) = 2(w - 1)^2, \tag{37.12}$$
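The closed form of $k_n$ can be evaluated directly; in the sketch below (with w = 1/4, our choice) $k_n - n$ visibly approaches $2(w - 1)^2 = 9/8$.

    import numpy as np

    w = 0.25
    for n in (10, 100, 1000, 10 ** 6):
        kn = 0.5 * (n + np.sqrt(n ** 2 + 8 * n * (w - 1) ** 2))
        print(n, kn - n)              # -> 2*(w - 1)**2 = 1.125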

which is the gen. as. deficiency (by variance) of $\hat\theta_{2,k_n}^{(w)}$ relative to $\hat\theta_{1,n}$. Here

$$\hat\theta_{2,k_n}^{(w)} := \begin{cases} \hat\theta_{2,[k_n]}^{(w)} & \text{w.p. } 1 - \pi_{k_n},\\ \hat\theta_{2,[k_n]+1}^{(w)} & \text{w.p. } \pi_{k_n}, \end{cases}$$

and

$$V_\theta\big(\hat\theta_{2,k_n}^{(w)}\big) = (1 - \pi_{k_n})\, V_\theta\big(\hat\theta_{2,[k_n]}^{(w)}\big) + \pi_{k_n}\, V_\theta\big(\hat\theta_{2,[k_n]+1}^{(w)}\big).$$
On the other hand, from (37.9) and (37.11) we have $\Delta_1(\theta) \equiv 0$, $\Delta_2(\theta) \equiv 2(w - 1)^2$, hence

$$d\big(\hat\theta_{2,k_n}^{(w)}, \hat\theta_{1,n}\big) = 2(w - 1)^2. \tag{37.13}$$

It is easily seen from (37.12) and (37.13) that (37.10) holds. For example, if $w = 1/4$, it follows from (37.13) that $d\big(\hat\theta_{2,k_n}^{(1/4)}, \hat\theta_{1,n}\big) = 9/8$. We define

$$\hat\theta_{2,n+(9/8)}^{(1/4)} := \begin{cases} \hat\theta_{2,n+1}^{(1/4)} & \text{w.p. } 7/8,\\ \hat\theta_{2,n+2}^{(1/4)} & \text{w.p. } 1/8, \end{cases}$$

and

$$V_\theta\big(\hat\theta_{2,n+(9/8)}^{(1/4)}\big) = (7/8)\, V_\theta\big(\hat\theta_{2,n+1}^{(1/4)}\big) + (1/8)\, V_\theta\big(\hat\theta_{2,n+2}^{(1/4)}\big).$$
From Theorem 37.2.1 we have

$$\lim_{n\to\infty}(k_n - n) = 9/8.$$

This means that $\hat\theta_{2,k_n}^{(1/4)}$ asymptotically needs a sample of size $9/8$ more than $\hat\theta_{1,n}$, in the continualized sense, for $\hat\theta_{2,k_n}^{(1/4)}$ to be asymptotically equivalent to $\hat\theta_{1,n}$ in the variance.
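The variance matching (37.8) behind this statement can be verified in exact arithmetic: with w = 1/4 and $k_n = n + 9/8$, the interpolated variance agrees with $V_\theta(\hat\theta_{1,n}) = 1/n$ up to $o(1/n^2)$. A sketch:

    from fractions import Fraction as Fr

    def V2(m):                        # variance (37.11) with w = 1/4
        return Fr(1, m) + Fr(9, 8) / m ** 2

    for n in (10, 100, 1000):
        interp = Fr(7, 8) * V2(n + 1) + Fr(1, 8) * V2(n + 2)
        print(n, float(n ** 2 * (interp - Fr(1, n))))   # -> 0, i.e. o(1/n^2)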

37.3 An Application to the Minimum Discrepancy Estimation
Suppose that $Y = (Y_1, \ldots, Y_k)$ is a random vector with the multinomial distribution $M(n; \pi_1(\theta), \ldots, \pi_k(\theta))$, where $\theta \in \Theta$ and $\pi_1, \ldots, \pi_k$ satisfy suitable regularity conditions. For each $j = 1, \ldots, n$, we put $X_j = (X_{1j}, \ldots, X_{kj})$. Let $X_1, \ldots, X_n$ be i.i.d. random vectors with the multinomial distribution $M(1; \pi_1(\theta), \ldots, \pi_k(\theta))$. Then $S := (\sum_{j=1}^n X_{1j}, \ldots, \sum_{j=1}^n X_{kj})$ is distributed as $M(n; \pi_1(\theta), \ldots, \pi_k(\theta))$, so we can identify $Y$ with $S$. We also denote the observed proportions by $p_i = \sum_{j=1}^n X_{ij}/n$ $(i = 1, \ldots, k)$. Rao (1961) considered an estimator defined as a suitably chosen root of an equation $f(\theta; p) = f(\theta; p_1, \ldots, p_k) = 0$, where $f$ satisfies certain conditions. Let $M$ be the set of estimators $\hat\theta_f$ obtained by solving the estimating equation for $f$. For $l, m = 0, 1, 2, \ldots$, let

$$\mu_{lm}(\theta) := \sum_{r=1}^{k} \left(\frac{\dot\pi_r(\theta)}{\pi_r(\theta)}\right)^{l} \left(\frac{\ddot\pi_r(\theta)}{\pi_r(\theta)}\right)^{m} \pi_r(\theta);$$

in particular, $I(\theta) = \mu_{20}(\theta)$. Below we omit $\theta$ in the $\mu_{lm}(\theta)$'s for simplicity. Now we consider the gen. as. deficiency (by as. variance), following Tanaka and Akahira (1996) [see also Ponnapalli (1976)]. Let the parameter space $\Theta$ be a finite open interval. A minimum discrepancy (m.d.) estimator $\hat\theta_g$ of $\theta$ is defined as one which minimizes the discrepancy function $D(\theta; p) := \sum_{r=1}^{k} p_r\, g(\pi_r(\theta)/p_r)$ for a suitable function $g$. Most of the usual estimators, like the maximum likelihood estimator (mle), the minimum chi-square estimator (mcse), etc., are m.d. estimators [see Greenwood and Nikulin (1996) for the mcse]. Let $L$ be a set of minimum discrepancy estimators $\hat\theta_g$ for $g$ satisfying certain regularity conditions. The class $L$ is regarded as a subset of $M$. Ponnapalli (1976) derived the asymptotic variance of $\hat\theta_g$ in $L$ as (37.14),

with C g := 2 + glll(l)/g"(l). Here, the function 9 and the value of C g corre-


sponding to the various estimators are given by Table 37.1.

Table 37.1: Function $g$ and value of $C_g$ of various estimators

Estimator $\hat\theta_g$                                 Function $g$      Value of $C_g$
maximum likelihood, $\hat\theta_{ml}$                    $-\log x$         $0$
minimum $\chi^2$, $\hat\theta_{mcs}$                     $x^{-1}$          $-1$
modified minimum $\chi^2$, $\hat\theta_{mmcs}$           $x^2$             $2$
minimum Haldane discrepancy, $\hat\theta_{mHD_k}$        $x^{k+1}$         $k+1$
minimum Hellinger distance, $\hat\theta_{mHd}$           $-x^{1/2}$        $1/2$
minimum Kullback-Leibler separator, $\hat\theta_{KL}$    $x \log x$        $1$
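The $C_g$ column of Table 37.1 can be recomputed symbolically from $C_g = 2 + g'''(1)/g''(1)$; a sketch using sympy (the tooling choice is ours, and k is kept symbolic for the Haldane discrepancy):

    import sympy as sp

    x, k = sp.symbols('x k', positive=True)
    cases = [('mle', -sp.log(x)), ('mcse', 1 / x), ('modified min. chi^2', x ** 2),
             ('Haldane', x ** (k + 1)), ('Hellinger', -sp.sqrt(x)),
             ('Kullback-Leibler', x * sp.log(x))]
    for name, g in cases:
        Cg = 2 + sp.diff(g, x, 3).subs(x, 1) / sp.diff(g, x, 2).subs(x, 1)
        print(name, sp.simplify(Cg))   # 0, -1, 2, k + 1, 1/2, 1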

Since any m.d. estimator $\hat\theta_g$ in $L$ has a bias, in order to adjust the bias up to the order $o(1/n)$, let

$$\hat\theta_g^* := \hat\theta_g - \frac{1}{n}\, \Delta_g(\hat\theta_g), \tag{37.15}$$

where

$$\Delta_g(\theta) := \frac{1}{2I^2}\left\{C_g\left(\mu_{30} - I \sum_{r=1}^{k} \frac{\dot\pi_r}{\pi_r}\right) - \mu_{11}\right\}.$$

Then $E_\theta[\hat\theta_g^* - \theta] = o(1/n)$. Let $L^*$ be the set of all the bias-adjusted estimators in $L$. Let $N_0 = \{(p_1, \ldots, p_k) \mid 0 \le p_r \le 1\ (r = 1, \ldots, k),\ \sum_{r=1}^{k} p_r = 1\}$ and $\Theta_0 := \hat\theta_g(N_0)$. We assume that $\Delta_g(\theta)$ and its derivative $\Delta_g'(\theta)$ are continuous on $\bar\Theta_0$ (the closure of $\Theta_0$). Then the following holds [see Tanaka and Akahira (1996)].

Theorem 37.3.1 The gen. as. deficiency (by as. variance) $d(\hat\theta_g^*, \hat\theta_h^*)$ of $\hat\theta_g^*$ relative to $\hat\theta_h^*$ is given explicitly, in terms of $C_g$, $C_h$ and the $\mu_{lm}$, in Tanaka and Akahira (1996). In particular, letting $\hat\theta_h^*$ be $\hat\theta_{ml}^*$, we obtain the gen. as. deficiency of $\hat\theta_g^*$ relative to the bias-adjusted mle. Furthermore, $\hat\theta_{ml}^*$ has the minimum gen. as. deficiency (by as. variance) in $L^*$.

Example 37.3.1 We consider the case when $k = 3$ and $\pi_1(\theta) = \pi_2(\theta) = \theta$, $\pi_3(\theta) = 1 - 2\theta$, where $0 < \theta < 1/2$ [see also Tanaka and Akahira (1995)]. Then the mle and the mcse are given by

$$\hat\theta_{ml} = \frac{p_1 + p_2}{2}, \qquad \hat\theta_{mcs} = \left(2 + \sqrt{\frac{2 p_3^2}{p_1^2 + p_2^2}}\right)^{-1},$$

respectively, and

$$I(\theta) = \frac{2}{\theta(1 - 2\theta)}.$$
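Both estimators are easily computed from simulated trinomial counts; in the sketch below (theta = 1/4, n = 200 and the replication count are our arbitrary choices) each is close to theta, the mcse exhibiting the small O(1/n) bias that the adjustment (37.15) removes.

    import numpy as np

    rng = np.random.default_rng(4)
    theta, n, reps = 0.25, 200, 100000
    counts = rng.multinomial(n, [theta, theta, 1 - 2 * theta], size=reps)
    p1, p2, p3 = (counts / n).T

    theta_ml = (p1 + p2) / 2
    theta_mcs = 1 / (2 + np.sqrt(2 * p3 ** 2 / (p1 ** 2 + p2 ** 2)))
    print(theta_ml.mean(), theta_mcs.mean())   # both near 0.25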

If $g(x) = x^{-1}$, then $\hat\theta_g = \hat\theta_{mcs}$ and $C_g = -1$, as is shown in Table 37.1. From Theorem 37.3.1 it follows that the gen. as. deficiency (by as. variance) of the bias-adjusted mcse $\hat\theta_{mcs}^*$ relative to the bias-adjusted mle $\hat\theta_{ml}^*$ is given by

$$d(\hat\theta_{mcs}^*, \hat\theta_{ml}^*) = \frac{1 - 2\theta}{4\theta} > 0,$$
where the bias-adjustment is due to (37.15). For example, if $\theta = 1/4$, then $d(\hat\theta_{mcs}^*, \hat\theta_{ml}^*) = 1/2$. We now denote $\hat\theta_{mcs}^*$ and $\hat\theta_{ml}^*$ by $\hat\theta_{mcs}^*(n)$ and $\hat\theta_{ml}^*(n)$, based on the sample $(X_1, \ldots, X_n)$, and define

$$\hat\theta_{mcs}^*\!\left(n + \tfrac{1}{2}\right) := \begin{cases} \hat\theta_{mcs}^*(n) & \text{w.p. } 1/2,\\ \hat\theta_{mcs}^*(n+1) & \text{w.p. } 1/2, \end{cases}$$

and

$$V_\theta\!\left(\hat\theta_{mcs}^*\!\left(n + \tfrac{1}{2}\right)\right) = \tfrac{1}{2}\, V_\theta(\hat\theta_{mcs}^*(n)) + \tfrac{1}{2}\, V_\theta(\hat\theta_{mcs}^*(n+1)).$$

From (37.10), we have

$$\lim_{n\to\infty}(k_n - n) = 1/2.$$

This means that $\hat\theta_{mcs}^*$ asymptotically needs a sample of size $1/2$ more than $\hat\theta_{ml}^*$, in the continualized sense, for $\hat\theta_{mcs}^*$ to be asymptotically equivalent to $\hat\theta_{ml}^*$ in the asymptotic variance up to the order $o(1/n^2)$.

References
1. Akahira, M. (1986). The Structure of Asymptotic Deficiency of Estima-
tors, Queen's Papers in Pure and Applied Mathematics 75, Kingston,
Canada: Queen's University Press.

2. Akahira, M. (1999a). The concept of normalized deficiency and its applications, Statistics & Decisions, 17, 403-411.

3. Akahira, M. (1999b). On the normalized deficiency of estimators, Metron, 57, 25-34.

4. Fisher, R. A. (1925). Theory of statistical estimation, Proceedings of the


Cambridge Philosophical Society, 22, 700-725.

5. Fisher, R. A. (1928). On a property connecting the $\chi^2$ measure of discrepancy with the method of maximum likelihood, Atti del Congresso Internazionale dei Matematici, Bologna, 6, 94-100.

6. Greenwood, P. E. and Nikulin, M. S. (1996). A Guide to Chi-Squared


Testing, New York: John Wiley & Sons.

7. Hodges, J. L. and Lehmann, E. L. (1970). Deficiency, Annals of Mathe-


matical Statistics, 41, 783-801.
8. Pearson, K. (1900). On the criterion that a given system of deviations is such that it can be reasonably supposed to have arisen from random sampling, Philosophical Magazine, 50, 157-175.

9. Ponnapalli, R. (1976). Deficiency of minimum discrepancy estimators,


Canadian Journal of Statistics, 4, 33-50.

10. Rao, C. R. (1961). Asymptotic efficiency and limiting information, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, 531-546, Berkeley: University of California Press.

11. Tanaka, H. and Akahira, M. (1995). On the concept of deficiency and


estimation of parameters of the multinomial distribution, (In Japanese),
Proc. Symp., Res. Inst. Math. Sci., Kyoto Univ., 916, 52-74.

12. Tanaka, H. and Akahira, M. (1996). Deficiency of minimum discrepancy


estimators of multinomial parameters, Statistics & Decisions, 14, 241-
251.
Index

Accelerated failure time, 281 Complementary log-log model, 301


Accelerated life testing, 281, 477 Composite hypothesis, 65
Adaptive estimation, 413 Concentration, 365
Adaptive test, 195 Conditional probabilities, 65
Adaptive regression, 413 Confidence interval, 267
Adjacent hypothesis, 341 Contiguity, 341
Akaike information criterion, 255 Continuous scaling, 327
Andrews plots, 311 Correspondence analysis, 311
Angular symmetry, 401 Cox model, 211, 281
Aspirin, 387 Cox proportional hazards, 237
Asymptotic deficiency, 495 Cox regression model, 267, 301
Asymptotic efficiency, 341 Cramer-von Mises-type tests, 463
Asymptotic variance, 495 Cronbach alpha coefficient, 371
Asymptotic distribution, 211 Cumulative hazard function, 237

B-splines, 173 Degenerate U-statistics, 73


Bahadur efficiency, 341 Dependent binary data, 161
Bayesian modeling, 25 Directed divergence measure, 237
Berry-Esseen inequality, 341 Discrete correlated survival data, 255
Bivariate extreme values, 463 Discrimination, 267
BLUE, 73 Discrimination index, 267
Bolshev test, 57
Empirical distribution function, 113
Calibration, 165 Exponential distribution, 89, 113
Categorized composite null hypothesis,
45 First hitting time, 227
Censored data, 211 Fisher test, 195
Censoring, 255, 267 FOE, 73
Censure, 65 Frailty models, 255
Characterization, 125, 401
Chauvenet rule, 57 Generalized Sedyakin's model, 281
Chi-square test, 9, 57, 65, 143 Gibbs distribution, 161
Chi-square distribution, 237 Goodness of fit, 89, 113, 125, 195, 281,
Chi-squared, 3 301
Chi-squared statistic decomposition, 45 tests (G-O-F), 3,173,237,371,435,
χ²-test, 341 477


Grouped data, 113 Monotonicity test, 365


Monte Carlo, 25
Hazard function, 237 Monte Carlo tests, 435
Health risk appraisal function, 267 Most powerful invariant test, 357
Hellinger distance, 311 Multiple sclerosis, 301
History of statistics, 3
Hypothesis testing, 401 Neyman smooth test, 45
Neyman-Pearson classes, 57
Information criteria, 387 Nikulin-Rao-Robson-Moore statistic, 57
Inverse Gaussian distribution, 227 Non-life insurance, 301
Isotonic regression, 449 Non-parametric estimation, 211
Non-stationarity, 211
Jeffreys prior, 387 Nonparametric alternative, 195
Jensen difference, 9 Nonparametric estimation, 477
Nonparametric maximum likelihood, 301
K. P., 3
Nonparametric methods, 435
Kaplan-Meier estimator, 477
Nonparametric regression, 185, 195
Karhunen-Loeve expansions, 463
Nonparametric test, 185
Kernel density estimator, 477
Normal distribution, 57
Kolmogorov-Smirnov, 113
Nuisance parameter, 477
k parameter alternative, modes, 425
Kullback-Leibler discrimination informa- Order statistics, 143
tion measure, 237 Orthonormal functions, 425
Outliers, 57
Laplace approximation, 357
Latent status, 227 Parametric bootstrap, 113
Lifetime data, 89 Pearson-Fisher, 45
Likelihood ratio test, 9 Pearson's goodness-of-fit statistic, 161
Linear hypothesis, 185, 195 Power function, 281
Location depth, 401 Principal components, 327
Log-odds, 255 Progressive type-II censoring, 89
Logistic regression model, 267 Projection density estimator, 341
Logistic distribution, 57 Proportional hazards, 281
L-statistics, 73
Quadratic entropy, 9
Mann-Whitney statistic, 267
Marker, 227 Rao's score test, 9
Markov assumption, 255 Rasch model, 371
Markov chain, 25 Receiver operating characteristic, 267
Markov chain Monte Carlo, 161 Rectangular grid, 449
Maximum likelihood estimator, 57, 495 Regression, 413
Maximum correlation, 327 Regression GOF tests, 73
Measurement model, 371 Reliability, 371
Minimax hypothesis testing, 195 Renewal process, 301
Minimum chi-square estimator, 495 Retro-hazard, 301
Minimum distance estimator, 477 Risk stochastic interpolation, 495
Missing data, 143 Robustness, 25
Mixing, 173
Model selection, 195 Score test, 425
Modified Andrews plots, 311 Sedyakin's model, 281

Semi-Markov process, 301 Wald test, 9, 425


Simulation, 25 Wasserstein distance, 327
SOADR, 73 Watson statistic, 143
Spacings, 89 Weibull distribution, 237
S-sample goodness of fit, 425 Wiener processes, 227
Statistical inference, 3 Wilcoxon test, 425
Step-stress, 281

Test of fit, 341


Test of independence, 463
Testing hypothesis, 413
Two-sample problem, 435
