An Introduction to Nonparametric Statistics
CHAPMAN & HALL/CRC
Texts in Statistical Science Series
John E. Kolassa
First edition published 2021 by CRC Press
© 2021 Taylor & Francis Group, LLC
Contents

Introduction

1 Background
  1.1 Probability Background
    1.1.1 Probability Distributions for Observations
      1.1.1.1 Gaussian Distribution
      1.1.1.2 Uniform Distribution
      1.1.1.3 Laplace Distribution
      1.1.1.4 Cauchy Distribution
      1.1.1.5 Logistic Distribution
      1.1.1.6 Exponential Distribution
    1.1.2 Location and Scale Families
    1.1.3 Sampling Distributions
      1.1.3.1 Binomial Distribution
    1.1.4 χ²-distribution
    1.1.5 T-distribution
    1.1.6 F-distribution
  1.2 Elementary Tasks in Frequentist Inference
    1.2.1 Hypothesis Testing
      1.2.1.1 One-Sided Hypothesis Tests
      1.2.1.2 Two-Sided Hypothesis Tests
      1.2.1.3 P-values
    1.2.2 Confidence Intervals
      1.2.2.1 P-value Inversion
      1.2.2.2 Test Inversion with Pivotal Statistics
      1.2.2.3 A Problematic Example
  1.3 Exercises

3 Two-Sample Testing
  3.1 Two-Sample Approximately Gaussian Inference
    3.1.1 Two-Sample Approximately Gaussian Inference on Expectations
    3.1.2 Approximately Gaussian Dispersion Inference
  3.2 General Two-Sample Rank Tests
    3.2.1 Null Distributions of General Rank Statistics
    3.2.2 Moments of Rank Statistics
  3.3 A First Distribution-Free Test
  3.4 The Mann-Whitney-Wilcoxon Test
    3.4.1 Exact and Approximate Mann-Whitney Probabilities
      3.4.1.1 Moments and Approximate Normality
    3.4.2 Other Scoring Schemes
    3.4.3 Using Data as Scores: the Permutation Test
  3.5 Empirical Levels and Powers of Two-Sample Tests
  3.6 Adaptation to the Presence of Tied Observations
  3.7 Mann-Whitney-Wilcoxon Null Hypotheses
  3.8 Efficiency and Power of Two-Sample Tests
    3.8.1 Efficacy of the Gaussian-Theory Test
    3.8.2 Efficacy of the Mann-Whitney-Wilcoxon Test
    3.8.3 Summarizing Asymptotic Relative Efficiency
    3.8.4 Power for Mann-Whitney-Wilcoxon Testing
  3.9 Testing Equality of Dispersion
  3.10 Two-Sample Estimation and Confidence Intervals
    3.10.1 Inversion of the Mann-Whitney-Wilcoxon Test
  3.11 Tests for Broad Alternatives
  3.12 Exercises

Bibliography
Index
Introduction
Preface
This book is intended to accompany a one-semester MS-level course in non-
parametric statistics. Prerequisites for the course are calculus through mul-
tivariate Taylor series, elementary matrix algebra including matrix inversion,
and a first course in frequentist statistical methods including some basic prob-
ability. Most of the techniques described in this book apply to data with only
minimal restrictions placed on their probability distributions, but performance
of these techniques, and the performance of analogous parametric procedures,
depend on these probability distributions. The first chapter below reviews
probability distributions. It also reviews some objectives of standard frequen-
tist analyses. Chapters covering methods that have elementary parametric
counterparts begin by reviewing those counterparts. These introductions are
intended to give a common terminology for later comparisons with new meth-
ods, and are not intended to reflect the richness of standard statistical analysis,
or to substitute for an intentional study of these techniques.
Because this package is so tightly tied to the presentation in this book, and hence of less general interest, it is hosted on the GitHub repository, and installed via
library(devtools)
install_github("kolassa-dev/NonparametricHeuristic")
and, once installed, loaded into R using the library command.
An appendix gives guidance on performing some of these calculations using
SAS.
Errata and other materials will be posted at
http://stat.rutgers.edu/home/kolassa/NonparametricBook
as they become available.
Acknowledgments
In addition to works referenced in the following chapters, I consulted Stigler
(1986) and Hald (1998) for early bibliographical references. Bibliographic trails
have been tracked through documentation of software packages R and SAS,
and bibliography resources from JSTOR, CiteULike.org, Project Euclid, and
various publishers’ web sites have been used to construct the bibliography.
I am grateful to the Rutgers Physical Sciences librarian Melanie Miller, the Rutgers Department of Statistics Administrative Assistant Lisa Curtin and the work-study students whom she supervised, and the Rutgers Interlibrary Loan staff for assistance in locating reference material.
I am grateful to my students at both Rutgers and the University of
Rochester to whom I taught this material over the years. Halley Constantino,
Jianning Yang, and Peng Zhang used a preliminary version of this manuscript
and were generous with their suggestions for improvements. My experience
teaching this material helped me to select material for this volume, and
to determine its level and scope. I consulted various textbooks during this
time, including those of Hettmansperger and McKean (2011), Hettmansperger
(1984), and Higgins (2004).
I thank my family for their patience with me during preparation of this
volume. I thank my editor and proofreader for their important contributions.
I dedicate this volume to my wife, Yodit.
1 Background
[Figure: densities of the Cauchy, Gaussian, and uniform distributions, plotted against ordinate values from −4 to 4.]
The Gaussian density with expectation µ and variance σ² is f_G(x) = exp(−(x − µ)²/(2σ²))/(σ√(2π)). The parameter µ is both the expectation and the median, and σ is the standard deviation. The Gaussian cumulative distribution function is

F_G(x) = ∫_{−∞}^x f_G(y) dy.
Again, the expectation and median for this distribution are both θ, and the distribution is symmetric about θ. The standard deviation is λ/√12. A common canonical member of this family is the distribution uniform on [0, 1], with θ = 1/2 and λ = 1. The distribution in its generality will be denoted by U(θ, λ).
As before, the expectation and median of this distribution are both θ. The
standard deviation of this distribution is σ. The distribution is symmetric
about θ. A canonical member of this family is the one with θ = 0 and σ = 1.
The distribution in its generality will be denoted by La(θ, σ²).
The expectation is 1, and the median is log(2). The inequality of these values
is an indication of the asymmetry of the distribution. The standard deviation
is 1. The distribution will be denoted by E.
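These densities are all built into R; the following minimal sketch reproduces a comparison like the figure above (the interval for the uniform density is chosen for display only, and is an assumption):

# Compare Cauchy, Gaussian, and uniform densities on ordinates -4 to 4.
curve(dcauchy(x),from=-4,to=4,ylim=c(0,0.4),xlab="ordinate",ylab="density",lty=1)
curve(dnorm(x),add=TRUE,lty=2)
curve(dunif(x,min=-2,max=2),add=TRUE,lty=3)
legend("topright",legend=c("Cauchy","Gaussian","Uniform"),lty=1:3)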
1.1.4 χ2 -distribution
If X1 , . . . , Xk are independent random variables, each with a standard
Gaussian distribution (that is, Gaussian with expectation zero and vari-
ance one), then the distribution of the sum of their squares is called the
chi-square distribution, and is denoted χ²_k. Here the index k is called the degrees of freedom.
Distributions of quadratic forms of correlated summands sometimes have
a χ2 distribution as well. If Y has a multivariate Gaussian distribution with
dimension k, expectation 0 and variance matrix Υ, and if Υ has an inverse,
then

Y⊤ Υ⁻¹ Y ∼ χ²_k.

One can see this by noting that Υ may be written as ΘΘ⊤. Then X = Θ⁻¹Y is multivariate Gaussian with expectation 0, and variance matrix Θ⁻¹ΘΘ⊤Θ⁻⊤ = I, where I is the identity matrix with k rows and columns. Then X is a vector of independent standard Gaussian variables, and Y⊤Υ⁻¹Y = X⊤X.
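This fact is easy to check by simulation; the sketch below, assuming the MASS package for multivariate Gaussian generation and using an arbitrary positive definite variance matrix, compares simulated quantiles of the quadratic form with χ²₃ quantiles:

library(MASS)#For mvrnorm
set.seed(1)
k<-3
Upsilon<-matrix(0.5,k,k)+diag(0.5,k)#An arbitrary positive definite variance matrix
Y<-mvrnorm(10000,mu=rep(0,k),Sigma=Upsilon)
Q<-rowSums((Y%*%solve(Upsilon))*Y)#Quadratic form Y' Upsilon^{-1} Y for each row
rbind(quantile(Q,c(.5,.9,.99)),qchisq(c(.5,.9,.99),df=k))#Simulated vs. chi-square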
Furthermore, still assuming

X_j ∼ G(0, 1), independent, (1.2)

the distribution of

W = Σ_{i=1}^k (X_i − δ_i)² (1.3)

is called the non-central chi-square distribution, denoted χ²_k(Σ_{i=1}^k δ_i²).
1.1.5 T-distribution
When U has a standard Gaussian distribution, V has a χ²_k distribution, and U and V are independent, then T = U/√(V/k) has a distribution called Student's t distribution, denoted here by T_k, with k called the degrees of freedom.
1.1.6 F-distribution
When U and V are independent random variables, with χ²_k and χ²_m distributions respectively, then F = (U/k)/(V/m) is said to have an F distribution with k and m degrees of freedom; denote this distribution by F_{k,m}. If a variable T has a T_m distribution, then T² has an F_{1,m} distribution.
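This relationship between the T and F distribution functions may be verified numerically in R; a minimal check, with arbitrary values of m and q:

m<-7; q<-2.5#Arbitrary degrees of freedom and ordinate
pf(q,1,m)#P[T^2 <= q] from the F distribution
2*pt(sqrt(q),m)-1#The same probability from the T distribution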
The set of potential data values for which the null hypothesis is rejected is called the critical region.
The other type of possible error occurs when the alternative hypothesis is
true, but the null hypothesis is not rejected. The probability of erroneously
failing to reject the null hypothesis is called the type II error rate, and is
denoted β. More commonly, the behavior of the test under an alternative
hypothesis is described in terms of the probability of a correct answer, rather
than of an error; this probability is called power; power is 1 − β.
One might attempt to control power as well, but, unfortunately, in the
common case in which an alternative hypothesis contains probability distri-
butions arbitrarily close to those in the null hypothesis, the type II error rate
will come arbitrarily close to one minus the test level, which is quite large.
Furthermore, for a fixed sample size and mechanism for generating data, once
a particular distribution in the alternative hypothesis is selected, the smallest
possible type II error rate is fixed, and cannot be independently controlled.
Hence generally, tests are constructed primarily to control level. Under this
paradigm, then, a test is constructed by specifying α, choosing a critical value
to give the test this type I error, and determining whether this test rejects the
null hypothesis or fails to reject the null hypothesis.
In the one-sided alternative hypothesis formulation {θ > θ0 }, the inves-
tigator is, at least in principle, interested in detecting departures from the
null hypothesis that vary in proximity to the null hypothesis. (The same
observation will hold for two-sided tests in the next subsection). For plan-
ning purposes, however, investigators often pick a particular value within the
alternative hypothesis. This particular value might be the minimal value of
practical interest, or a value that other investigators have estimated. They
then calculate the power at this alternative, to ensure that it is large enough
to meet their needs. A power that is too small indicates that there is a sub-
stantial chance that the investigator’s alternative hypothesis is correct, but
that they will fail to demonstrate it. Powers near 80% are typical targets.
Consider a test with a null hypothesis of form θ = θ0 and an alternative hypothesis of form θ = θA, using a statistic T such that under the null hypothesis T ∼ G(θ0, σ0²), and under the alternative hypothesis T ∼ G(θA, σA²). Test with a level α, and without loss of generality assume that θA > θ0. In this case, the critical value is approximately t° = θ0 + σ0 zα. Here zα is the number such that Φ(zα) = 1 − α. A common level for such a one-sided test is α = 0.025; z0.025 = 1.96. The power for such a one-sided test is

1 − β = 1 − Φ((θ0 + σ0 zα − θA)/σA). (1.8)

Setting this power to a target value and solving for θA gives

θA = θ0 + σA zβ + σ0 zα. (1.9)
One can then ask whether this effect size is plausible. More commonly, σ0 and
σA both may be made to depend on a parameter representing sample size,
with both decreasing to zero as sample size increases. Then (1.8) is solved for
sample size.
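These calculations are easy to carry out in R; a minimal sketch of (1.8) and its inversion for sample size, with all parameter values below purely illustrative, and taking σ0 = σA = σ/√n in the sample-size step:

alpha<-0.025; beta<-0.2
theta0<-0; thetaA<-1; sigma0<-1; sigmaA<-1.2#Hypothetical values
tcrit<-theta0+sigma0*qnorm(1-alpha)#Critical value
1-pnorm((tcrit-thetaA)/sigmaA)#Power, as in (1.8)
sigma<-1#Per-observation standard deviation (hypothetical)
((qnorm(1-alpha)+qnorm(1-beta))*sigma/(thetaA-theta0))^2#Required sample size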
Power for the two-sided test is the same probability as calculated in (1.11),
with the θA substituted for θ0 , and power substituted for α. Again, assume
that large values of θ make larger values of T more likely. Then, for alternatives
θA greater than θ0 , the first probability added in
1 − β = PθA [T ≤ t◦L ] + PθA [T ≥ t◦U ]
is quite small, and is typically ignored for power calculations. Additionally,
rejection of the null hypothesis because the evidence is in the opposite direc-
tion of that anticipated will result in conclusions from the experiment not
comparable to those for which the power calculation is constructed. Hence
power for the two-sided test is generally approximated as the power for the
one-sided test with level half that of the two-sided tests, and α in (1.8) is often
0.025, corresponding to half of the two-sided test level.
Some tests to be constructed in this volume may be expressed as W = Σ_{j=1}^k U_j² for variables U_j which are, under the null hypothesis, approximately standard Gaussian and independent; furthermore, in such cases, the critical region for such tests is often of the form {W ≥ w°}. In such cases, the test of level α rejects the null hypothesis when W ≥ χ²_{k,α}, for χ²_{k,α} the 1 − α quantile of the χ²_k distribution. If the standard Gaussian approximation for the distribution of U_j is only approximately correct, then the resulting test will have level α approximately, but not exactly.
If, under the alternative hypothesis, the variables U_j have expectations µ_j and standard deviations ε_j, the alternative distribution of W will be a complicated weighted sum of χ²₁(µ_j²) variables. Usually, however, the impact of the move from the null distribution to the alternative distribution is much higher on the component expectations than on the standard deviations, and one might treat these alternative standard deviations as fixed at 1. With this simplification, the sampling distribution of W under the alternative is χ²_k(Σ_{j=1}^k µ_j²), the non-central chi-square distribution.
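A small simulation illustrates this non-central approximation; the component expectations below are hypothetical:

set.seed(2)
mu<-c(0.5,1,0)#Hypothetical alternative expectations for the U_j
k<-length(mu)
W<-replicate(10000,sum((rnorm(k)+mu)^2))#Simulated values of W under the alternative
mean(W>=qchisq(0.95,k))#Simulated power of the level-0.05 test
1-pchisq(qchisq(0.95,k),k,ncp=sum(mu^2))#Non-central chi-square probability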
1.2.1.3 P -values
Alternatively, one might calculate a test statistic, and determine the test level
at which one transitions from rejecting to not rejecting the null hypothesis.
This quantity is called a p-value. For a one-sided test with critical region of
form (1.5), the p-value is given by
P0 [T ≥ tobs ] , (1.14)
for tobs the observed value of the test statistic. For two-sided critical values of
form (1.10), with condition (1.12), the p-value is given by
2 min(P0 [T ≥ tobs ] , P0 [T ≤ tobs ]). (1.15)
These p-values are interpreted as leading to rejection of the null hypothesis when they are at or below the test level specified by the investigator in advance of data collection.
A confidence interval for θ of level 1 − α consists of an interval (L, U), with L and U depending on the data, such that Pθ[L < θ < U] ≥ 1 − α.
There may be t such that the equation PθL [T ≥ t] = α/2 has no solution,
because Pθ [T ≥ t] > α/2 for all θ. In such cases, take θL to be the lower
bound on possible values for θ. For example, if π ∈ [0, 1], and T ∼ Bin(n, π),
then Pπ [T ≥ 0] = 1 > α/2 for all π, Pπ [T ≥ 0] = α/2 has no solution, and
π L = 0. Alternatively, if θ can take any real value, and T ∼ Bin(n, exp(θ)/(1+
exp(θ))), then Pθ [T ≥ 0] = α/2 has no solution, and θL = −∞. Similarly,
there may be t such that the equation PθU [T ≤ t] = α/2 has no solution,
because Pθ [T ≤ t] > α/2 for all θ. In such cases, take θU to be the upper
bound on possible values for θ.
Construction of intervals for the binomial proportion represents a simple
example in which p-values may be inverted (Clopper and Pearson, 1934).
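In R, the Clopper-Pearson interval is returned by binom.test, and may equivalently be computed from beta quantiles; a sketch with arbitrary counts:

x<-7; n<-21; alpha<-0.05#Arbitrary data
binom.test(x,n)$conf.int#Clopper-Pearson interval
c(qbeta(alpha/2,x,n-x+1),qbeta(1-alpha/2,x+1,n-x))#Equivalent beta-quantile form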
Then
{θ|t◦L < T (θ, data) < t◦U } (1.20)
is a confidence interval, if it is really an interval. In the case when (1.20) is
an interval, and when T (θ, data) is continuous in θ, then the interval is of the
form (L, U ); that is, the interval does not include the endpoints.
for υ = σ z_{α/2}.

If X² + Y² < υ², then Q(ρ) in (1.21) has a negative coefficient for ρ², and the maximum value is at ρ = XY/(X² − υ²). The maximum is υ²(−υ² + X² + Y²)/(υ² − X²) < 0, and so the inequality in (1.21) holds for all ρ, and the confidence interval is the entire real line.

If X² + Y² > υ² > X², then the quadratic form in (1.21) has a negative coefficient for ρ², and the maximum is positive. Hence values satisfying the inequality in (1.21) are very large and very small values of ρ; that is, the confidence interval is

(−∞, (−XY − υ√(X² + Y² − υ²))/(υ² − X²)) ∪ ((−XY + υ√(X² + Y² − υ²))/(υ² − X²), ∞).

If X² > υ², then the quadratic form in (1.21) has a positive coefficient for ρ², and the minimum is negative. Then the values of ρ satisfying the inequality in (1.21) are those near the minimizer XY/(X² − υ²). Hence the interval is

((XY − υ√(X² + Y² − υ²))/(X² − υ²), (XY + υ√(X² + Y² − υ²))/(X² − υ²)).
1.3 Exercises
1. Demonstrate that the moment generating function for the statistic (1.3), under (1.2), depends on δ1, . . . , δk only through Σ_{j=1}^k δ_j².
2 One-Sample Nonparametric Inference
Common techniques designed for data with a Gaussian distribution require consequences of this distribution beyond the marginal distribution of the sample average.
TABLE 2.1: True levels for the T Test, Sign Test, and Exact Sign Test, nominal level 0.05
In all cases above, the t-test succeeds in providing a test level not much
larger than the target nominal level. On the other hand, in some cases the
true level is significantly below that expected.
This effect decreases as sample size increases.
Below, the term median refers to the population version, unless otherwise
specified.
[Figure: probability density plotted against data value, over the range −4 to 4.]
Alternatively, one might define the sample median to minimize the sum of distances from the median:

smed[X1, . . . , Xn] = argmin_η Σ_{i=1}^n |Xi − η|; (2.4)
that is, the estimate minimizes the sum of distances from data points to
the potential median value, with distance measured by the sum of absolute
values. This definition (2.4) exactly coincides with the earlier definition (2.3)
for n odd, shares in the earlier definition’s lack of uniqueness for even sample
sizes, and typically shares the opposite resolution (averaging the middle two
observations) of this non-uniqueness. In contrast, the sample mean X̄ of (2.1)
satisfies

X̄ = argmin_η Σ_{i=1}^n |Xi − η|². (2.5)
Both of these, then, are special cases of estimators of the form

argmin_η Σ_{i=1}^n ϱ(Xi − η) (2.6)

for some convex function ϱ; the sample mean uses ϱ(z) = z²/2 and the sample median uses ϱ(z) = |z|. Huber (1964) suggests an alternative estimator combining the behavior of the mean and median, by taking ϱ quadratic for small values, and continuing linearly for larger values, thus balancing the increased efficiency of the mean and the smaller dependence on outliers of the median; he suggests

ϱ(z) = z²/2 for |z| < k, and ϱ(z) = k|z| − k²/2 for |z| ≥ k, (2.7)

and recommends a value of the tuning parameter k between 1 and 2.
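Huber's estimator is implemented in the MASS package; a minimal sketch on an arbitrary data vector:

library(MASS)#For huber
x<-c(0.1,-0.3,0.2,4.0,0.0,-0.1)#Arbitrary data including one outlier
huber(x,k=1.5)#Huber location estimate with tuning parameter k
mean(x); median(x)#For comparison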
That is, tl is that potential value for T such that not more than α/2 probability
sits below it. The largest such tl has probability at least 1 − α/2 equal to or
larger than it, and at least α/2 equal to or smaller than it; hence tl is the α/2
quantile of the Bin(n, 1/2) distribution. Generally, the inequality in (2.10) is
strict; that is, ≤ is actually <. For combinations of n and α for which this
inequality holds with equality, the quantile is not uniquely defined, and take
the quantile to be the lowest candidate. Symmetrically, one might choose the
smallest tu so that

Σ_{j=tu}^n (n choose j) (1/2)^n ≤ α/2; (2.11)
n + 1 − tu is the α/2 quantile of the Bin(n, 1/2) distribution, with the opposite
convention used in case the quantile is not uniquely defined.
Then, reject the null hypothesis if T ≤ t°_L or T ≥ t°_U for t°_L = tl − 1 and t°_U = tu. This test is called the (exact) sign test, or the binomial test (Higgins, 2004). An approximate version of the sign test might be created by selecting critical values from the Gaussian approximation to the distribution of T(θ0).
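The exact sign test may be carried out directly through the binomial distribution in R; a sketch, with a hypothetical sample and null median θ0 = 0:

x<-c(0.8,1.2,-0.3,0.5,0.9,1.7,0.4)#Hypothetical sample
theta0<-0
T<-sum(x>theta0)#Count of observations above the null median
binom.test(T,length(x),p=0.5)#Exact sign test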
Again, direct attention to Table 2.1. Both variants of the sign test succeed
in keeping the test level no larger than the nominal value. However, the sign
test variants, because of the discreteness of the binomial distribution, in some
cases achieve levels much smaller than the nominal target. Subtable (a), for
sample size 10, is the most extreme example of this; subtable (b), for sample
size 17, represents the smallest reduction in actual sample size, and subtable
(c), for sample size 40, is intermediate. Note further that, while the asymp-
totic sign test, based on the Gaussian approximation, is not identical to the
exact version, for subtables (a) and (c) the levels coincide exactly, since for
all simulated data sets, the p-values either exceed 0.05 or fail to exceed 0.05
for both tests. Subtable (b) exhibits a case in which for one data value, the
exact and approximate sign tests disagree on whether p-values exceed 0.05.
Table 2.2 presents characteristics of the exact two-sided binomial test of
the null hypothesis that the probability of success is half, with level α = 0.05,
applied to small samples. In this case, the two-sided p-value is obtained by
doubling the one-sided p-value.
For small samples (n < 6), the smallest one-sided p-value, 1/2^n, is greater
than .025, and the null hypothesis is never rejected. Such small samples are
omitted from Table 2.2. This table consists of two subtables side by side, for
TABLE 2.2: Exact levels and exact and asymptotic lower critical values for
symmetric two-sided binomial tests of nominal level 0.05
n ≤ 23, and for n > 23. The first column of each subtable is sample size. The
second is tl − 1 from (2.10). The third is the value taken from performing the
same operation on the Gaussian approximation; that is, it is the largest a such
that

Φ((tl − 1 − n/2 + 0.5)/(0.5√n)) ≤ α/2. (2.12)
The fourth is the observed test level; that is, it is double the right side of
(2.10). Observations here agree with those from Table 2.1; for sample size 10,
the level of the binomial test is severely too small, for sample size 17, the
binomial test has close to the optimal level, and for sample size 40, the level
for the binomial test is moderately too small.
A complication (or, to an optimist, an opportunity for improved approx-
imation) arises when approximating a discrete distribution by a continuous
distribution. Consider the case with n = 10, exhibited in Figure 2.2. Bar areas
represent the probability under the null hypothesis of observing the number of
successes. Table 2.2 indicates that the one-sided test of level 0.05 rejects the
null hypothesis for W ≤ 1. The actual test size is 0.0215, which is graphically
represented as the sum of the areas in the bar centered at 1, and the very small
area of the neighboring bar centered at 0. Expression (2.12) approximates the
sum of these two bar areas by the area under the dotted curve, representing
the Gaussian
√ density with the appropriate expectation n/2 = 5 and standard
deviation n/2 = 1.58. In order to align the areas of the bars most closely
with the area under the curve, the Gaussian area should be taken to extend to
the upper end of the bar containing 1; that is, evaluate the Gaussian distribu-
tion function at 1.5, explaining the 0.5 in (2.12). More generally, for a discrete
distribution with potential values ∆ units apart, the ordinate is shifted by
∆/2 before applying a Gaussian approximation; this adjustment is called a
correction for continuity.
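For the case n = 10 discussed above, the effect of the correction can be computed directly:

n<-10
pbinom(1,n,0.5)#Exact P[W <= 1], the two leftmost bar areas
pnorm((1.5-n/2)/(sqrt(n)/2))#Gaussian approximation with continuity correction
pnorm((1-n/2)/(sqrt(n)/2))#Approximation without the correction, for contrast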
[Figure 2.2: null probabilities for the number of successes in Bin(10, 1/2), with the approximating Gaussian density; sample size 10, target level 0.05, and from Table 2.2 a − 1 = 1.]
The power of the sign test is determined by P_{θA}[Xj ≤ θ0] for values of θA ≠ θ0. Since θA > θ0 if P_{θA}[Xj ≤ θ0] < 1/2, alternatives θA > θ0 correspond to one-sided alternatives P[Yj = 1] < 1/2.
If θ0 is the true population median of the Xj, and if there exists a set of form (θ0 − ε, θ0 + ε), with ε > 0, such that P[Xj ∈ (θ0 − ε, θ0 + ε)] = 0, then any other θ in this set is also a population median for Xj, and hence the test will have power against such alternatives no larger than the test level. Such occurrences are rare.
Table 2.3 represents powers for these various tests for various sample sizes. The alternative is chosen to make the t-test have power approximately 0.80 for the Gaussian and Laplace distributions, using (1.9). In this case both σ0 and σA for the Gaussian and Laplace distributions are 1/√n. Formula (1.9) is inappropriate for the Cauchy distribution, since in this case X̄ does not have a Gaussian distribution, even approximately.
TABLE 2.3: Power for the T Test, Sign Test, and Exact Sign Test, nominal
level 0.05
Example 2.3.1 An early (and very simple) application of this test was
to test whether the proportion of boys born in a given year is the same as
the proportion of girls born that year (Arbuthnott, 1712). Number of births
was determined for a period of 82 years. Let Xj represent the number
of births of boys, minus the number of births of girls, in year j. The
parameter θ represents the median amount by which the number of girls
exceeds the number of boys; its null value is 0. Let Yj take the value 0 for
years in which more girls than boys are born, and 1 otherwise. Note that
in this case, (2.9) is violated, but P [Xj = 0] is small, and this violation
is not important. Test at level 0.05.
Values in (2.10) and (2.11) are tl = 32 and tu = 51, obtained as the 0.025 and 0.975 quantiles of the binomial distribution with 82 trials and success probability 1/2.
For the nail arsenic data of Example 2.3.2, with n = 21 observations, the null probabilities P[T = t] for t = 0, . . . , 9 are

4.76 × 10⁻⁷, 1.00 × 10⁻⁵, 1.00 × 10⁻⁴, 6.34 × 10⁻⁴, 2.85 × 10⁻³, 9.70 × 10⁻³, 2.59 × 10⁻², 5.54 × 10⁻², 9.70 × 10⁻², 1.40 × 10⁻¹,

and the corresponding cumulative sums are

4.77 × 10⁻⁷, 1.05 × 10⁻⁵, 1.11 × 10⁻⁴, 7.45 × 10⁻⁴, 3.60 × 10⁻³, 1.33 × 10⁻², 3.91 × 10⁻², 9.46 × 10⁻², 1.92 × 10⁻¹, 3.32 × 10⁻¹.
The largest of these cumulative sums smaller than 0.025 is the sixth,
corresponding to T < 6. Hence tl = 6. Similarly, tu = 16. Reject the null
hypothesis that the median is 0.26 if T < 6 or if T ≥ 16. Since 11 of the
observations are greater than the null median 0.26, T = 11. Do not reject
the null hypothesis.
Alternatively, one might calculate a p-value. Using (1.15), the p-value
is 2 min(P0 [T ≥ 11] , P0 [T ≤ 11]) = 1.
Furthermore, the confidence interval for the median is (X(6) , X(16) ) =
(0.118, 0.358).
The values tl and tu may be calculated in R by
a<-qbinom(0.025,21,.5); b<-21+1-qbinom(0.025,21,.5)#Critical values tl and tu
and the ensemble of calculations might also have been performed in R
using
arsenic<-as.data.frame(scan(’arsenic.dat’,
what=list(age=0,sex=0,drink=0,cook=0,water=0,nails=0)))
library(BSDA)#Gives sign test.
SIGN.test(arsenic$nails,md=0.26)#Argument md gives null hyp.
Graphical construction of a confidence interval for the median is calcu-
lated by
library(NonparametricHeuristic)
invertsigntest(log(arsenic$nails),maint="Log Nail Arsenic")
and is given in Figure 2.3. Instructions for installing this last library are
given in the introduction, and in Appendix B.
Values of θ for which the null hypothesis is not rejected are between the horizontal lines; log medians in the confidence interval are those values of θ for which the test statistic falls within this region.
[Figure 2.3: graphical inversion of the sign test for the log nail arsenic data; log median from −3 to 2.]
In this construction, order statistics (that is, the ordered values) are first
plotted on the horizontal axis, with the place in the ordered data set on the
vertical axis. These points are represented by the points in Figure 2.3 where
the step function transitions from vertical to horizontal, as one moves from
lower left to upper right. Next, draw horizontal lines at the values tl and tu ,
given by (2.10) and (2.11) respectively. Finally, draw vertical lines through
the data points that these horizontal lines hit.
For this particular example, the exact one-sided binomial test of level 0.025
rejects the null hypothesis that the event probability is half if the sum of event
indicators is 0, 1, 2, 3, 4, or 5; tl = 6. For Yj of (2.8), the sum is less than
6 for all θ to the left of the point marked X(tl ) . Similarly, the one-sided level
0.025 test in the other direction rejects the null hypothesis if the sum of event
indicators is at least tu = 16. The sum of the Yj exceeds 15 for θ to the right
of the point marked X(tu ) .
By symmetry, one might expect tl = n − tu , but this is not the case. The
asymmetry in definitions (2.10) and (2.11) arises because construction of the
confidence interval requires counting not the data points, but the n − 1 spaces
between them, plus the regions below the minimum and above the maximum,
for a total of n + 1 ranges. Then tl = n + 1 − tu .
This interval is not of the usual form θ̂ ± 2σ̂, for σ̂ with a factor of 1/√n. Cramér (1946, pp. 368f.) shows that if X1, . . . , Xn is a set of independent random variables, each having density f, then Var[smed[X1, . . . , Xn]] ≈ 1/(4 f(θ)² n). Chapter 8 investigates estimation of this density; this estimate
can be used to estimate the median variance, but density estimation is harder
than the earlier confidence interval rule.
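Cramér's variance approximation may be checked by simulation; a minimal sketch for standard Gaussian samples:

set.seed(3)
n<-101
meds<-replicate(10000,median(rnorm(n)))
var(meds)#Simulated variance of the sample median
1/(4*dnorm(0)^2*n)#Cramer's approximation 1/(4 f(theta)^2 n)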
Example 2.3.3 Test the null hypothesis that the upper quartile (that is,
the 0.75 quantile) of the arsenic nail data from Example 2.3.2 is the ref-
erence value 0.26, and give a confidence interval for this quantile. The
analysis is the same as before, except that tl and tu are different. We
determine tl and tu in (2.13). Direct calculation, or using the R com-
mands
a<-qbinom(0.025,21,.75);b<-21+1-qbinom(0.025,21,1-0.75)
shows tl = 12 and tu = 20. Since T = 10 < tl , reject the null hypothesis
that the upper quartile is 0.26. Furthermore, the confidence interval is the
region between the twelfth and twentieth ordered values, (X(12) , X(20) ) =
(0.269, 0.851). With data present in the R workspace, one calculates a
confidence interval as
sort(arsenic$nails)[c(a,b)]
Let ϖ_{j,n}(θA) represent the power for test Tj using n observations, under the alternative θA. Assume that

ϖ_{j,n}(θA) is continuous and increasing in θA for all j, n,
lim_{θA→∞} ϖ_{j,n}(θA) = 1, (2.14)
lim_{n→∞} ϖ_{j,n}(θA) = 1 for all θA > θ0.
The asymptotic relative efficiency of test 2 relative to test 1 is lim_{n1→∞} n1/n2, where n2 is the number of observations needed by test 2 to match the power of test 1 based on n1 observations.
2.4.1.1 Power
Consider test statistics satisfying
Tj ∼ G(µj (θ), ςj2 (θ)), for ςj (θ) > 0, µj (θ) increasing in θ. (2.15)
The Gaussian distribution in (2.15) does not need to hold exactly; holding approximately is sufficient. In this case, one can find the critical values for the two tests, t°_{j,nj}, such that P0[Tj ≥ t°_{j,nj}] = α. Since (Tj − µj(0))/ςj(0) is approximately standard Gaussian under the null hypothesis,

t°_{j,nj} = µj(0) + ςj(0) zα. (2.16)
The power for test j is approximately

ϖ_{j,nj}(θA) ≈ 1 − Φ((µj(0) + ςj(0) zα − µj(θA))/ςj(θA)). (2.17)

Often the variance of the test statistic changes slowly as one moves away from the null hypothesis; in this case, the power for test j is approximately

ϖ_{j,nj}(θA) ≈ 1 − Φ((µj(0) + ςj(0) zα − µj(θA))/ςj(0)). (2.18)

Suppose further that the scale of the statistic decreases with sample size, so that

ςj(θ) = σj(θ)/√nj. (2.19)

Then

ϖ_{j,nj}(θA) = 1 − Φ(√nj (µj(0) + σj(0) zα/√nj − µj(θA))/σj(θA)). (2.20)
As sample sizes increase, power increases for a fixed alternative, and calcula-
tions will consider a class of alternatives moving towards the null. Calculations
below will consider behavior of the expectation of the test statistic near the
null hypothesis (which is taken as θ = 0). Suppose that

σj(θA) ≈ σj(0) for θA near 0. (2.21)

Then

ϖ_{j,nj}(θA) ≈ 1 − Φ(√nj (µj(0) − µj(θA))/σj(0) + zα)
            = Φ(√nj (µj(θA) − µj(0))/σj(0) − zα). (2.22)
This expression for approximate power may be solved for sample size, by noting that if

ϖ_{j,nj}(θA) = 1 − β, (2.23)

then ϖ_{j,nj}(θA) = Φ(zβ), and (2.22) holds if √nj (µj(θA) − µj(0))/σj(0) − zα = zβ, or

nj = (zα + zβ)² σj(0)²/(µj(θA) − µj(0))². (2.24)
Common values for α and β are 0.025 and 0.2, giving upper Gaussian quantiles
of zα = 1.96 and zβ = 0.84. Recall that z with a subscript strictly between
0 and 1 indicates that value for which a standard Gaussian random variable
has that probability above it.
It may be of use in practice, and will be essential in the efficiency cal-
culations below, to approximate which member of the alternative hypothesis
corresponds with a test of a given power, with sample size held fixed. Solving
(2.24) exactly for θA is difficult, since the function µ is generally non-linear.
Approximating this function using a one-term Taylor approximation,

ϖ_{j,nj}(θA) ≈ Φ(√nj ej θA − zα),

for

ej = µ′j(0)/σj(0).

The quantity ej is called the efficacy of test j. Setting this power to 1 − β, zα − √nj ej θA = z_{1−β}. Solving this equation for θA,

θA ≈ (zα + zβ)/(√nj ej), (2.25)
verifying the requirement that θA get close to zero. This expression can be
used to approximate an effect size needed to obtain a certain power with a
certain sample size and test level, and will be used in the context of asymptotic
relative efficiency.
TABLE 2.4: Empirical powers for one-sample location tests with sample size
ratios indicated by asymptotic relative efficiency
                      t-test      Sign test        Relative efficiency
Gaussian   µ′(0)      1/ρ         1/(√(2π) ρ)
           σ(0)       1           1/2
           e          1/ρ         √(2/π)/ρ         π/2
Laplace    µ′(0)      1           1/√2
           σ(0)       1           1/2
           e          1           √2               1/2
Cauchy     µ′(0)      1           1/π
           σ(0)       ∞           1/2
           e          0           2/π              0
Example 2.4.1 In this example I calculate power for a sign test applied
to 49 observations from a Gaussian distribution with unit variance. Sup-
pose X1 , . . . , X49 ∼ G(θ, 1), with null hypothesis θ = 0 and alternative
hypothesis θ = 1/2. The sign test statistic, divided by n, approximately satisfies (2.15) and (2.19), with µ and σ given by (2.27). Then µ1(0) = 0.5, σ1(0) = √(0.5 × 0.5) = 0.5, µ1(0.5) = 0.691, σ1(0.5) = √(0.691 × 0.309) = 0.462, and power for a one-sided test of level 0.025, or a two-sided test of level 0.05, is approximated by (2.17): 1 − Φ(7 × (0.5 + 0.5 × 1.96/7 − 0.691)/0.462) = 1 − Φ(−0.772) = 0.780. The null and alternative standard deviations are close enough to motivate the use of the simpler approximation (2.22), approximating power as Φ(7 × (0.691 − 0.5)/0.5 − 1.96) = Φ(0.714) = 0.762.
If, instead, a test of power 0.85 were desired for alternative expectation 1/2, with a one-sided test of level 0.025, zα = 1.96, and zβ = 1.036. From (2.24), one needs at least n = (1.96 + 1.036)² × 0.5²/(0.691 − 0.5)² ≈ 61.5, and hence 62, observations.
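The calculations in this example may be reproduced in R:

n<-49; alpha<-0.025
mu0<-0.5; sd0<-0.5#Null expectation and standard deviation, from (2.27)
muA<-pnorm(0.5); sdA<-sqrt(muA*(1-muA))#Alternative values
1-pnorm(sqrt(n)*(mu0+sd0*qnorm(1-alpha)/sqrt(n)-muA)/sdA)#Power, as in (2.17)
pnorm(sqrt(n)*(muA-mu0)/sd0-qnorm(1-alpha))#Simpler approximation (2.22)
beta<-0.15
((qnorm(1-alpha)+qnorm(1-beta))*sd0/(muA-mu0))^2#Sample size from (2.24)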
The above intervals will extend outside [0, 1], which is not reasonable; this can
be circumvented by transforming the probability scale.
Figure 2.4 represents the bounds from (2.28), without any rescaling, to
be discussed further in the next example. Confidence bounds in Figure 2.4
exhibit occurrences of larger estimates being associated with upper confi-
dence bounds that are smaller (e.g., in Figure 2.4, the region between the
second-to-largest and the largest observations), and for the region with the
cumulative distribution function estimated at zero or one (that is, the region
below the smallest observed value, and the region above the largest observed
value), confidence limits lie on top of the estimates, indicating no uncertainty.
Both of these phenomena are unrealistic. The first phenomenon, that of non-
monotonic confidence bounds, cannot be reliably avoided through rescaling;
the second, with the upper confidence bounds outside the range of the data,
can never be repaired through rescaling. A preferred solution is to substitute
the intervals of Clopper and Pearson (1934), described in §1.2.2.1, to avoid
all three of these problems (viz., bounds outside (0, 1), bounds ordered differ-
ently than the estimate, and bounds with zero variability). Such intervals are
exhibited in Figure 2.5.
Finally, the confidence associated with these bounds is point-wise, and not
simultaneous. That is, if (L1, U1) and (L2, U2) are 1 − α confidence bounds
associated with two ordinates x1 and x2 , then P [L1 ≤ F (x1 ) ≤ U1 ] ≥ 1 − α
and P [L2 ≤ F (x2 ) ≤ U2 ] ≥ 1 − α, at least approximately, but the preced-
ing argument does not bound P [L1 ≤ F (x1 ) ≤ U1 and L2 ≤ F (x2 ) ≤ U2 ] any
higher than 1 − 2α.
Example 2.5.1 Consider the arsenic data of Example 2.3.2. For every
real x, one counts the number of data points less than this x. For any
x less than the smallest value 0.073, this estimate is F̂ (x) = 0. For x
greater than or equal to this smallest value and smaller than the next
smallest value 0.080, the estimate is F̂ (x) = 1/21. This data set con-
tains one duplicate value 0.118. For values below, but close to, 0.118 (for example, x = 0.1179), F̂(x) = 4/21, since 4 of the observations are less than x. However, F̂(0.118) = 6/21; the jump at 0.118 is twice what it is at other data values, since there are two observations here. This estimate is sketched in both Figures 2.4 and 2.5, and may be constructed in
R using ecdf(arsenic$nails), presuming the data of Example 2.3.2 is
still present in R. The command ecdf(arsenic$nails) does not produce
confidence intervals; use
library(MultNonParam); ecdfcis(arsenic$nails,exact=FALSE)
to add confidence bounds, and changing exact to TRUE forces exact inter-
vals.
FIGURE 2.4: Empirical CDF and Confidence Bounds for Arsenic in Nails
[Plot: empirical cumulative distribution function (step function) with point-wise confidence bounds; probability against arsenic in nails, −1 to 3; confidence level 0.95, approximate.]
2.6 Exercises
1. Calculate the asymptotic relative efficiency for the sign statistic
relative to the one-sample t-test (which you should approximate
using the one-sample z-test). Do this for observations from the
a. uniform distribution on [−1/2, 1/2], with variance 1/12 and mean under the null hypothesis of 0, and
b. logistic distribution, symmetric about 0, with variance π²/3 and density exp(x)/(1 + exp(x))².
FIGURE 2.5: Empirical CDF and Confidence Bounds for Arsenic in Nails
[Plot: empirical cumulative distribution function with point-wise confidence bounds; probability against arsenic in nails, −1 to 3; confidence level 0.95, exact.]
http://ftp.uni-bayreuth.de/math/statlib/datasets/lupus
http://lib.stat.cmu.edu/datasets/bodyfat
gives data on body fat in 252 men. The second column gives propor-
tion of lean body tissue. Give a 95% confidence interval for upper
quartile proportion of lean body tissue. Note that the first 116 lines
and last 10 lines are data set description, and should be deleted.
(Line 117 is blank, and should also be deleted).
4. Suppose 49 observations are drawn from a Cauchy distribution, dis-
placed to have location parameter 1.
a. What is the power of the sign test at level 0.05 to test the null
hypothesis of expectation zero for these observations?
b. What size sample is needed to distinguish between a null hypoth-
esis of median 0 and an alternative hypothesis of median 1, for inde-
pendent Cauchy variables with a one-sided level 0.025 sign test to
give 80% power?
3 Two-Sample Testing
This chapter addresses the question of two-sample testing. Data will gener-
ally consist of observations X1 , . . . , XM1 from continuous distribution function
F , and observations Y1 , . . . , YM2 from a continuous distribution function G.
Model these observations as independent, and unless otherwise specified, treat
their distributions as identical, up to some unknown shift θ; that is,

G(x) = F(x − θ) for all x. (3.1)
Techniques for testing a null hypothesis of form θ = θ0 in (3.1), vs. the alter-
native that (3.1) holds for some alternative θ 6= θ0 , and for estimating θ
assuming (3.1), are presented. Techniques for tests in which (3.1) is the null
hypothesis, for an unspecified θ, are also presented.
A general rank statistic with scores a1, . . . , aN takes the form

T_G^{(k)} = Σ_{j=1}^N a_j I_j^{(k)},

for I_j^{(k)} equal to 1 if the item ranked j in the combined sample comes from group k, and 0 otherwise. The superscript in I_j^{(k)} refers to group, and does not represent a power.

The statistic T_G^{(2)} is designed to take on large values when items in group two are generally larger than the remainder of the observations (that is, the items in group one), and to take small values when items in group two are generally smaller than the remainder of the observations. The statistic T_G^{(1)} is designed to take on large values when items in group one are generally larger than the remainder of the observations (that is, the items in group two), and to take small values when items in group one are generally smaller than the remainder of the observations. The statistic T_G^{(1)} provides no information not also captured in T_G^{(2)}, since T_G^{(2)} = Σ_{j=1}^N a_j − T_G^{(1)}.
sampling without replacement, is used. There are some conditions on the set
of scores needed to ensure that they are approximately Gaussian. Critical
values for the two-sided test of level α of the null hypothesis of equality of
distribution, vs. the alternative, relying on the Gaussian approximation, are
given by
E_0[T_G^{(2)}] ± z_{α/2} √(Var_0[T_G^{(2)}]). (3.8)
Moments needed to evaluate (3.8) and (3.9) are given in the next section.
E_0[T_G^{(k)}] = M_k Σ_{j=1}^N a_j/N = M_k ā; (3.10)
                      Y      X          Total
Greater than Median   A      N/2 − A    N/2
Less than Median      B      N/2 − B    N/2
Total                 M2     M1         N
So

Var[T_G^{(k)}] = Σ_{i=1}^N Σ_{j=1}^N a_i a_j Cov[I_i^{(k)}, I_j^{(k)}]
             = Σ_{i=1}^N a_i² b_1 + Σ_{i≠j} a_i a_j b_2 = (b_1 − b_2) N â + N² b_2 ā²
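These moments may be checked against complete enumeration for a small example; a sketch with Wilcoxon (rank) scores:

a<-1:6#Scores; here ranks, N=6
N<-length(a); M2<-3
sets<-combn(N,M2)#All assignments of M2 of the N ranks to group 2
T2<-apply(sets,2,function(s) sum(a[s]))#Statistic under each assignment
mean(T2)#Agrees with M2*mean(a), as in (3.10)
var(T2)*(ncol(sets)-1)/ncol(sets)#Population variance over assignments; compare (3.11)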
although, as originally formulated, this test was applied only in the case of
even sample sizes, and so the score 0 would not be used.
Mood’s test, and other rank tests, ignore the ordering of the observations
from the first group among themselves, and similarly ignore the orderings of
the second group among themselves. Represent the data set as a vector of N
symbols, M1 of them X and M2 of them Y . The letter X in position j indicates
that, after ordering a combined set of N observations, the observation ranked
j comes from the first group, and the letter Y in position j indicates that
the observation ranked j comes from the second group. The advantage of
Mood’s test lies in its simplicity, and its disadvantage is its low power. To
                      Y      X               Total
Greater than Median   A      (N − 1)/2 − A   (N − 1)/2
Equal to Median       C      1 − C           1
Less than Median      B      (N − 1)/2 − B   (N − 1)/2
Total                 M2     M1              N
To see why its power is low, consider the test with M1 = M2 = 3, for a total
of six observations. A data set label X, Y, X, Y, X, Y indicates that the lowest
observation is from the first group, the second lowest is from the second group,
the third lowest is from the first group, the fourth lowest is from the second
group, the fifth lowest (that is, the second highest) is from the first group,
and the highest is from the second group. Mood’s test treats X, Y, X, Y, X, Y
and X, Y, X, X, Y, Y as having equal evidence against H0 , but the second
should be treated as having more evidence. Furthermore, Mood’s test takes a
value between 0 and min(M2, ⌊N/2⌋). This high degree of discreteness in the
statistic’s support undermines power.
Westenberg (1948) presented the equal-sample case of the statistic, and
Mood (1950) detailed the use in the case with an even combined sample size.
Mood’s test is of sufficiently low importance in applications that the early
references did not bother to present the slight complication that arises when
the combined sample size is odd.
When the total sample size is odd, one might represent the median test
as inference from a 2 × 3 contingency table with ordered categories, as in
Table 3.2. Then TM = A − B. Then the one-sided p-values may be calculated
as

P0[A − B ≥ t] = (M1/N) P0[A − B ≥ t | C = 0] + (M2/N) P0[A − B ≥ t | C = 1].
The probabilities P0 [A − B ≥ t|C = c] are calculated from the hypergeomet-
ric distribution.
The null expectation and variance of TM are given by (3.10) and (3.11)
respectively. Note ā = 0, and
â = (N − 1)/N if N is odd, and â = 1 if N is even.
Critical values and p-values for Mood’s test may be calculated from (3.8) and
(3.9) respectively.
[Figure: normal quantile plots of yarn strength for yarn types A and B; plotting symbols distinguish bobbins 1 through 6.]
Mood’s test, using (3.7) in conjunction with (3.12), is almost never used,
and is presented here only as a point of departure for more powerful tests.
Mood’s test looses power primarily because of the discreteness of the scores
(3.12). The balance of this chapter explores tests with less discrete scores.
[Figure: yarn strength by type (A and B); strength from 12 to 20.]
The Wilcoxon rank-sum statistic is the sum of the ranks of the observations coming from the second group. Denote this specific example of the general rank statistic as TW.
from another test proposed by the same author, to be discussed in a later
chapter.
An alternative test statistic for detecting group differences is
TU = Σ_{i=1}^{M1} Σ_{j=1}^{M2} I(Xi < Yj). (3.14)
This new statistic TU and the previous statistic TW can be shown to be equivalent. For j ∈ {1, . . . , M2} indexing a member of the second sample, define Rj as the rank of observation j among all of the observations from the combined samples. Note that

TW = Σ_{j=1}^{M2} Rj = Σ_{j=1}^{M2} #(sample entries less than or equal to Yj)
   = Σ_{j=1}^{M2} #(X values less than or equal to Yj)
   + Σ_{j=1}^{M2} #(Y values less than or equal to Yj). (3.15)

The first sum in (3.15) is TU, and the second is M2(M2 + 1)/2, and so TW = TU + M2(M2 + 1)/2.
The test based on TU is called the Mann-Whitney test (Mann and Whitney,
1947). This statistic is a U statistic; that is, a statistic formed by summing
over pairs of observations in a data set.
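The identity TW = TU + M2(M2 + 1)/2 is easy to confirm numerically; a sketch with hypothetical samples:

set.seed(4)
x<-rnorm(5); y<-rnorm(7)+1#Hypothetical samples
M1<-length(x); M2<-length(y)
TW<-sum(rank(c(x,y))[M1+seq_len(M2)])#Rank sum for the second sample
TU<-sum(outer(x,y,"<"))#Count of pairs with X_i < Y_j
TW==TU+M2*(M2+1)/2#The identity derived above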
Equating quadratic terms above gives a = 1/3. Setting the linear term to zero gives b = 1/2, and setting the constant term to zero gives c = 1/6. Then

Σ_{j=1}^w j² = g(w) = w(2w + 1)(w + 1)/6, (3.20)

â = (2N + 1)(N + 1)/6,
â − ā² = (2N + 1)(N + 1)/6 − (N + 1)²/4 = (N² − 1)/12. (3.21)
In conjunction with the central limit theorem argument described above, one
can test for equality of distributions, with critical values and p-values given
by (3.8) and (3.9) respectively.
Example 3.4.1 Refer again to the yarn data of Example 3.3.1. Consider yarn strengths for bobbin 3.
wilcox.test(strength~type,data=yarn[yarn$bobbin==3,],
exact=TRUE)
Example 3.4.2 Consider the nail arsenic data of Example 2.3.2. One
might perform an analysis using these scoring methods.
library(exactRankTests)#Gives savage and vw scores
arsenic$savagenails<-as.numeric(cscores(
arsenic$nails,type="Savage"))
arsenic$vwnails<-as.numeric(cscores(arsenic$nails,
type="Normal"))
TP = Σ_{j=1}^N Z_{(j)} I_j^{(2)} = Σ_{j=1}^{M2} Yj = M2 Ȳ. (3.23)

Then

Ȳ − X̄ = Ȳ − (N Z̄ − M2 Ȳ)/M1 = (N Ȳ − N Z̄)/M1 = N(TP − M2 Z̄)/(M1 M2),
where Z̄ = Σ_{i=1}^N Z_{(i)}/N. The pooled variance estimate for the two-sample t statistic is

s_p² = [Σ_{j=1}^{M1} (Xj − X̄)² + Σ_{j=1}^{M2} (Yj − Ȳ)²]/(N − 2)
     = [Σ_{j=1}^{M1} (Xj − Z̄)² − M1(X̄ − Z̄)² + Σ_{j=1}^{M2} (Yj − Z̄)² − M2(Ȳ − Z̄)²]/(N − 2)
     = [(N − 1)s_Z² − M1(X̄ − Z̄)² − M2(Ȳ − Z̄)²]/(N − 2).

Some algebra shows this to be

s_p² = s_Z² (N − 1)/(N − 2) − (TP − M2 Z̄)² (1/M1 + 1/M2)/(N − 2).

Hence, conditional on (Z_{(1)}, . . . , Z_{(N)}), the two-sample pooled t statistic is

√((N − 2) N) (TP − M2 Z̄) / √(s_Z² (N − 1) M1 M2 − (TP − M2 Z̄)² N),

for s_Z the sample standard deviation of (Z_{(1)}, . . . , Z_{(N)}).
Hence the pooled two-sample t statistic is a strictly increasing function of the score statistic TP with ordered data used as scores. However, while the pooled t statistic is typically compared to a T distribution, the rank statistic is compared to the distribution of values arising from random permutations of the group labels; this is the same mechanism that generates the distribution for the rank statistics with scores determined in advance. In the two-sample case, there are (N choose M1) ways to assign M1 labels 1, and M2 labels 2, to the order statistics (Z_{(1)}, . . . , Z_{(N)}). A less-efficient way to think of this process is to specify N labels, the first M1 of them 1 and the remaining M2 of them 2, and randomly assign, or randomly permute, (Z_{(1)}, . . . , Z_{(N)}) without replacement; there are N! such assignments, leading to at most (N choose M1) distinct values. The observed value of TP is then compared with the sampling distribution arising from this random permutation of values; such a test is called a permutation test. The same permutation concept coincides with the desired reference distribution for all of the rank statistics in this chapter.
Example 3.4.3 Again consider the nail arsenic data of Example 2.3.2.
Recall that there are 21 subjects in this data set, of whom 8 are male.
The permutation test testing the null hypothesis of equality of distribution
across gender may be performed in R using
library(MultNonParam)
aov.P(dattab=arsenic$nails,treatment=arsenic$sex)
to give a two-sided p-value of 0.482. In this case, all (21 choose 8) = 203490 ways to reassign arsenic nail levels to the various groups were considered. The
TABLE 3.4: Levels for various Two-Sample Two-Sided Tests, Nominal level
0.05, from 100,000 random data sets each, sample size 10 each
TABLE 3.5: Powers for various Two-Sample Two-Sided Tests, Nominal level
0.05, from 100,000 random data sets each, sample size 10 each, samples offset
by one unit
statistic TP of (3.23) was calculated for each assignment, this value was subtracted from its null expectation M2 Z̄, and the difference was squared to provide a two-sided statistic. The p-value reported is the proportion of reassignments for which the squared difference meets or exceeds that seen in the original data.
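When complete enumeration of all (N choose M1) assignments is infeasible, the permutation distribution is commonly sampled by Monte Carlo; a minimal sketch of the same two-sided test (the function and argument names are illustrative, not part of any package):

set.seed(5)
perm.p<-function(z,grp,nrep=10000){#Monte Carlo permutation p-value
  M2<-sum(grp==2)
  stat<-function(g) (sum(z[g==2])-M2*mean(z))^2#Squared centered T_P
  obs<-stat(grp)
  sims<-replicate(nrep,stat(sample(grp)))#Random relabelings of the groups
  mean(sims>=obs)
}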
Table 3.5 excludes the exact version of the Wilcoxon test and Mood’s test,
since for these sample sizes (Mj = 10 for j = 1, 2), they fail to achieve the
desired level for any data distribution. The approximate Wilcoxon test has
comparable power to that of the t-test under the conditions optimal for the
t-test, and also maintains high power throughout.
[Fragment of Table 3.6, logistic-distribution row, comparing the Gaussian-theory test with the rank-based test: $\mu'(0) = 1$, $\sigma(0) = \pi\zeta^{-1}/\sqrt{3}$, $e = \zeta\sqrt{3}/\pi$ for the former; $\mu'(0) = 1/6$, $\sigma(0) = \zeta^{-1}/(2\sqrt{3})$, $e = \zeta/\sqrt{3}$ for the latter; asymptotic relative efficiency $\pi^2/9 = 1.10$; here $\zeta = (\lambda(1-\lambda))^{1/2}$ and $\varepsilon = \sqrt{\operatorname{Var}[X_i]}$.]
to the model parameter, evaluated at order statistics under the null hypoth-
esis, without providing rank expectations away from the null (Dwass, 1956).
In contrast with the shift alternative (3.1), one might consider the
Lehmann alternative
Example 3.9.1 Consider again the yarn data of Example 3.3.1. Test
equality of dispersion between the two types of yarn. Ranks are given in
Table 3.7, and are calculated in package NonparametricHeuristic as
yarn$ab<-pmin(rank(yarn$strength),rank(-yarn$strength))
yarn$st<-round(siegel.tukey.ranks(yarn$strength),2)
yarnranks<-yarn[order(yarn$strength),
c("strength","type","ab","st")]
R functions may be used to perform the test.
library(DescTools)#For SiegelTukeyTest
SiegelTukeyTest(strength~type,data=yarn)
yarnsplit<-split(yarn$strength,yarn$type)
ansari.test(yarnsplit[[1]],yarnsplit[[2]])
to find the Siegel-Tukey p-value as 0.7179, and the Ansari-Bradley p-value
as 0.6786. There is no evidence of inequality of dispersion.
in close parallel with definitions of §2.3.2. Then, reject the null hypothesis if $T_G^{\{2\}}(\theta_0) \le t_L^\circ$ or $T_G^{\{2\}}(\theta_0) \ge t_U^\circ$, for $t_L^\circ = t_l - 1$ and $t_U^\circ = t_u$, and use as the confidence interval
$$\{\theta \mid t_l \le T_G^{\{2\}}(\theta) < t_u\}. \tag{3.32}$$
Applying the Gaussian approximation to $T_G^{\{2\}}(\theta)$,
$$t_l, t_u \approx M_2\bar{a} \pm z_{\alpha/2}\sqrt{\operatorname{Var}\left[T_G^{\{2\}}(0)\right]}. \tag{3.33}$$
is $\{\theta \mid t_l \le T_U^{\{2\}}(\theta) < t_u\}$, for the largest $t_l$ and smallest $t_u$ as in (3.31), made specific to the Mann-Whitney statistic:
$$\sum_{k=0}^{t_l-1} P_{M_1,M_2}\left[T_U^{\{2\}}(0) = k\right] \le \frac{\alpha}{2}, \qquad \sum_{k=t_u}^{M_1M_2} P_{M_1,M_2}\left[T_U^{\{2\}}(0) = k\right] \le \frac{\alpha}{2}; \tag{3.34}$$
then the above technique indicates an order statistic that is generally not an integer. In this case, many procedures interpolate between adjacent order statistics.
Note parallels between this construction and that of the confidence inter-
val for the median in §2.3.3. In this section, pairwise differences replace raw
observations, and the Wilcoxon quantiles replace those of the binomial distri-
bution. Furthermore, the direction of the test was changed. The rest of the
construction is identical.
Compare Figure 3.3 to Figure 2.3. The test statistic in Figure 2.3 is con-
structed to be non-decreasing in θ; this simplified the discussion of the interval
construction, at the cost of employing a slightly non-standard version of the
sign test statistic. Figure 3.3 uses a more standard version of the test statistic.
[Figure 3.3; horizontal axis: Location Difference (−2 to 1).]
Example 3.11.1 Again consider the yarn data of Example 3.3.1.
Figure 3.4 shows the cumulative distribution functions of strengths for
the two types of yarn. This figure might be generated using
par(mfrow=c(1,1))
plot(range(yarn$strength),c(0,1),type="n",
main="Yarn Strength",xlab="Yarn Strength",
ylab="Probability")
yarnsplit<-split(yarn$strength,yarn$type)
lines(ecdf(yarnsplit[[1]]),col=1)
lines(ecdf(yarnsplit[[2]]),col=2)
legend(17,.2,lty=c(1,1),col=c(1,2),legend=c("A","B"))
The largest difference between distribution functions happens at strengths
slightly larger than 16; this gives the Kolmogorov-Smirnov statistic. The
Cramér-von Mises statistic is the integral of the squared distance between
these curves. Statistics may be calculated using
ks.test(yarnsplit[[1]],yarnsplit[[2]])
library(CvM2SL2Test)
cvmstat<-cvmts.test(yarnsplit[[1]],yarnsplit[[2]])
cvmts.pval(cvmstat,length(yarnsplit[[1]]),
length(yarnsplit[[2]]))
Note that the library CvM2SL2Test, giving cvmts.test and cvmts.pval,
is no longer supported, and must be installed from archives, using
library(devtools);install_version("CvM2SL2Test"). These calcu-
lations can also be performed using package MultNonParam. Significance
tests are based on permutation distributions; the Kolmogorov-Smirnov p-
value is 0.2591, and the Cramér-von Mises p-value is 0.09474.
[Figure 3.4: empirical cumulative distribution functions of strength for yarn Types A and B; horizontal axis: Yarn Strength (12 to 20); vertical axis: Probability.]
3.12 Exercises
1. The data set
http://ftp.uni-bayreuth.de/math/statlib/datasets/schizo
gives eye tracking target type. For all parts of this question select
subjects with target type CS. Gain ratios are given in various
columns; for the balance of this question, use the first of these,
found in the third column.
a. Select only non-schizophrenic patients. The comments at the top
of the data file tell you which subjects are female. Compare the first
gain ratios for women with those for men. Test whether the gain
ratios come from the same distribution, using the Wilcoxon rank
sum test.
b. Test whether the first gain ratios for women and men come from
the same distribution. Use data values as scores.
For the remainder of this question, select only female patients.
c. Estimate the population median of the difference between the
gain ratios for schizophrenic patients (in the lower part of the file)
and for the non-schizophrenic patients (in the upper part of the
file). Calculate a 95% confidence interval for this median.
d. Test the null hypothesis that the dispersion of the first gain
ratios is the same for schizophrenic patients (in the lower part of
the file) as it is for the non-schizophrenic patients (in the upper part
of the file). Use the Ansari-Bradley test.
e. Test the null hypothesis that the distribution of the first gain
ratios is the same for schizophrenic patients (in the lower part of the
file) as it is for the non-schizophrenic patients (in the upper part of
the file). Use the Kolmogorov-Smirnov test.
2. The data set
http://lib.stat.cmu.edu/datasets/biomed.desc
4
Methods for Three or More Groups
This chapter develops nonparametric techniques for one-way analysis of variance.
Suppose Xki are samples from K potentially different populations. That is,
for fixed k, Xk1 , . . . , XkMk are independent and identically distributed, each
with cumulative distribution function Fk . Here k ∈ {1, . . . , K} indexes group,
and Mk represents the number of observations in each group. In order to
determine whether all populations are the same, test the null hypothesis
$$H_0\colon F_1(x) = \cdots = F_K(x) \text{ for all } x \tag{4.1}$$
vs. the alternative hypothesis $H_A$: there exist $j$, $k$, and $x$ such that $F_j(x) \neq F_k(x)$. Most tests considered in this chapter, however, are most powerful against alternatives of the form $H_A\colon F_k(x) \le F_j(x)$ for all $x$, for some indices $k, j$, with strict inequality at some $k$, $j$, and $x$. Of particular interest, particularly for power calculations, are alternatives of the form
$$F_i(x - \theta_i) = F_j(x - \theta_j) \tag{4.2}$$
When the data have a Gaussian distribution, and (4.1) holds, the numerator
and denominator of (4.3) have χ2 distributions, and are independent; hence
the ratio W has an F distribution.
When the data are not Gaussian, the central limit theorem implies that the
numerator is still approximately $\chi^2_{K-1}$, as long as the minimal $M_k$ is large, and
as long as the distribution of the data is not too far from Gaussian. However,
neither the χ2 distribution for the denominator of (4.3), nor the independence
of numerator and denominator, are guaranteed in this case. Fortunately, again
for large sample sizes and data not too far from Gaussian, the strong law of
large numbers indicates that the denominator of (4.3) is close to the population
variance of the observations, and the denominator degrees of freedom for the F distribution are large enough to make the F distribution close to the χ² distribution. Hence in this large-sample close-to-Gaussian case, the standard
analysis of variance results will not mislead.
4.1.1 Contrasts
Let µk = E [Xki ]. Continue considering the null hypothesis that µj = µk for
all j, k pairs, and consider alternative hypotheses in which Xki all have the
same finite variance σ 2 , but the means differ, in a more structured way than
for standard ANOVA. Consider alternatives such that µk+1 − µk are the same
for all k, and denote the common value by ∆ > 0. One might construct a
test particularly sensitive to this departure from the null hypothesis using the
estimate $\hat\Delta$. If $\hat\Delta$ is approximately Gaussian, then the associated test of the null hypothesis (4.1) vs. the ordered and equidistant alternative is constructed as $T = (\hat\Delta - \mathrm{E}_0[\hat\Delta])/\sqrt{\operatorname{Var}_0[\hat\Delta]}$; this statistic is compared to the standard Gaussian distribution in the usual way.
An intuitive estimator $\hat\Delta$ is the least squares estimator; for example, when $K = 3$ then $\hat\Delta = (\bar{X}_{3.} - \bar{X}_{1.})/2$, and when $K = 4$ then $\hat\Delta = (3\bar{X}_{4.} + \bar{X}_{3.} - \bar{X}_{2.} - 3\bar{X}_{1.})/10$. Generally, the least squares estimator is a linear combination of group means, of form $\sum_{k=1}^{K} c_k\bar{X}_{k.}$ for a set of constants $c_k$ such that
$$\sum_{k=1}^{K} c_k = 0, \tag{4.5}$$
with $c_k$ evenly spaced. In this case, $\mathrm{E}_0[\hat\Delta] = 0$ and
$$\operatorname{Var}_0[\hat\Delta] = \operatorname{Var}_0[X_k]\sum_{k=1}^{K} c_k^2/M_k,$$
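As an illustration (not from the book), a minimal sketch of this contrast test follows; contrast.test is a hypothetical helper, and sigma2 stands for an assumed known common variance playing the role of Var0[Xk].
contrast.test<-function(xlist,sigma2){
  K<-length(xlist); Mk<-sapply(xlist,length)
  dev<-seq(K)-(K+1)/2
  ck<-dev/sum(dev^2)                 #least squares coefficients; sum(ck)=0
  deltahat<-sum(ck*sapply(xlist,mean))
  deltahat/sqrt(sigma2*sum(ck^2/Mk)) #compare to standard Gaussian quantiles
}
For K = 3 this reproduces ck = (−1/2, 0, 1/2), matching the estimate above.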
two-sample pooled t-test (3.2), except that the standard deviation estimate of
(4.4) may be substituted for sp , with the corresponding increase in degrees of
freedom in (3.5).
Fisher’s LSD method fails to control family-wise error rate if $K > 3$. To see this, suppose $F_1(x) = F_2(x) = \cdots = F_{K-1}(x) = F_K(x - \Delta)$ for $\Delta \neq 0$. Then null hypotheses $F_j(x) = F_i(x)$ are true, for $i, j < K$. One can make ∆ so large that the analysis of variance test rejects equality of all distributions with probability close to 1. Then multiple true null hypotheses are tested, without control for multiplicity. If $K = 3$, there is only one such test with a true null hypothesis, and so no problem with multiple comparisons.
Contrast this with Tukey’s Honest Significant Difference (HSD) method (Tukey, 1953, 1993). Suppose that $Y_j \sim G(0, 1/M_j)$ for $j \in \{1, \ldots, K\}$, $U \sim \chi^2_m$, and that the $Y_j$ and $U$ are independent. Assume further that the $M_j$ are all equal. The distribution of $\max_{1\le i,j\le K}\left(|Y_j - Y_i|/(\sqrt{U/m}/\sqrt{M_j})\right)$ is called the Studentized range distribution with $K$ and $m$ degrees of freedom. If the $M_j$ are not all equal, use
$$\sqrt{2}\max_{1\le j,k\le K}\left((Y_j - Y_k)\Big/\left(\sqrt{U/m}\sqrt{1/M_j + 1/M_k}\right)\right);$$
then
$$P[\mu_k - \mu_j \notin C_{jk} \text{ for some } j, k] \le \alpha. \tag{4.10}$$
This method had been suggested before the Studentized range distribution
had been derived (Tukey, 1949).
We now proceed to analogs of one-way analysis of variance that preserve
nominal test size for small samples and highly non-Gaussian data.
$$u_k = \frac{N-1}{N(\hat a - \bar a^2)M_k}.$$
The remainder of this section considers the joint distribution of the $T_G^{\{k\}}$, calculates their moments, and confirms the asymptotic distribution for $W_G$.
$$\frac{(N - M_k - M_j)(M_k + M_j)}{N-1}(\hat a - \bar a^2) = \operatorname{Var}\left[T_G^{\{j,k\}}\right] = \operatorname{Var}\left[T_G^{\{j\}}\right] + \operatorname{Var}\left[T_G^{\{k\}}\right] + 2\operatorname{Cov}\left[T_G^{\{j\}}, T_G^{\{k\}}\right] = \frac{(N-M_j)M_j}{N-1}(\hat a - \bar a^2) + \frac{(N-M_k)M_k}{N-1}(\hat a - \bar a^2) + 2\operatorname{Cov}\left[T_G^{\{j\}}, T_G^{\{k\}}\right],$$
and
$$\operatorname{Cov}\left[T_G^{\{j\}}, T_G^{\{k\}}\right] = \frac{-M_jM_k}{N-1}(\hat a - \bar a^2).$$
$$\operatorname{Var}[Y](I + (N/M_K)\nu\nu^\top) = I + (-1 + (N/M_K)(1 - \nu^\top\nu))\nu\nu^\top = I, \tag{4.13}$$
since $\nu^\top\nu = \sum_{j=1}^{K-1} M_j/N = 1 - M_K/N$. Then
$$\operatorname{Var}[Y]^{-1} = I + (N/M_K)\nu\nu^\top. \tag{4.14}$$
Hence
$$Y^\top(I + (N/M_K)\nu\nu^\top)Y \sim \chi^2_{K-1}. \tag{4.15}$$
Also,
$$Y^\top(I + (N/M_K)\nu\nu^\top)Y = \omega^2\sum_{j=1}^{K-1}(T_G^{\{j\}} - M_j\bar{a})^2/M_j + \omega^2\left(\sum_{j=1}^{K-1}(T_G^{\{j\}} - M_j\bar{a})\right)^2\Big/M_K = \omega^2\left[\sum_{j=1}^{K-1}\frac{(T_G^{\{j\}} - M_j\bar{a})^2}{M_j} + \frac{(T_G^{\{K\}} - M_K\bar{a})^2}{M_K}\right] = \frac{N-1}{(\hat a - \bar a^2)N}\sum_{j=1}^{K}\frac{(T_G^{\{j\}} - M_j\bar{a})^2}{M_j}. \tag{4.16}$$
The above calculation required some notions from linear algebra. The cal-
culation (4.13) requires an understanding of the definition of matrix multipli-
cation, and the associative and distributive properties of matrices, and (4.14)
requires an understanding of the definition of a matrix inverse. Observation
(4.15) is deeper; it requires knowing that a symmetric non-negative definite
matrix may be decomposed as V = D > D, for a square matrix D, and an
understanding that variance matrices in the multivariate case transform as
do scalar variances in the one-dimensional case.
One might compare this procedure to the standard analysis of variance procedure, which relies heavily on the distribution of responses. Alternatively, one might perform the ANOVA analysis on ranks; this procedure does not depend on the distribution of responses.
This test is called the Kruskal-Wallis test, and is often referred to as the H test. Here, again, $R_{ki}$ is the rank of $X_{ki}$ within the combined sample, $R_{k.} = \sum_{i=1}^{M_k} R_{ki}$, and (3.21) gives the first multiplicative factor.
$$G_{K-1}^{-1}(1 - \alpha; 0), \tag{4.18}$$
http://lib.stat.cmu.edu/datasets/Andrews/T58.1
represent corn (maize) yields resulting from various fertilizer treatments
(Andrews and Herzberg, 1985, Example 58). Test the null hypothesis
that corn weights associated with various fertilizer combinations have the
same distribution, vs. the alternative hypothesis that a measure of location
varies among these groups. Treatment is a three-digit string representing
three fertilizer components. Fields in this file are separated by space. The
first three fields are example, table, and observation number. The fourth
and following fields are location, block, plot, treatment, ears of corn, and
weight of corn. Location TEAN has no ties; restrict attention to that
location. The yield for one of the original observations (number 36) was missing (denoted by -9999 in the file), and is omitted in this analysis. We calculate:
maize<-as.data.frame(scan("T58.1",what=list(exno=0,tabno=0,
lineno=0,loc="",block="",plot=0,trt="",ears=0, wght=0)))
maize$wght[maize$wght==-9999]<-NA
maize$nitrogen<-as.numeric(substring(maize$trt,1,1))
#Location TEAN has no tied values. R treats ranks of
#missing values nonintuitively. Remove missing values.
tean<-maize[(maize$loc=="TEAN")&(!is.na(maize$wght)),]
cat(’\n Kruskal Wallis H Test for Maize Data \n’)
kruskal.test(split(tean$wght,tean$trt))
#Alternative R syntax:
#kruskal.test(tean$wght,tean$trt)
This might be compared with analysis of variance:
#Note that treatment is already a factor.
anova(lm(wght~trt,data=tean))
and with analysis of variance of the ranks:
anova(lm(rank(wght,na.last=NA)~trt,data=tean))
These last two tests have p-values 0.705 and 0.655 respectively. Note the
difference between these Gaussian theory results and the Kruskal-Wallis
test.
Figure 4.1 shows the support of the normal scores statistic on the set of
possible group rank sums for a hypothetical very small data set; the contour
of the approximate critical region for the test of level 0.05 is superimposed.
As is the case for the chi-square test for contingency tables, points enter the
critical region as the level increases in an irregular way, and so constructing an
additive continuity correction to (4.17) is difficult. Yarnold (1972) constructs a
continuity correction that is additive on the probability, rather than on the statistic,
scale. Furthermore, even though group sizes are very small, the sample space
for the group-wise rank sums is quite rich. This richness of the sample space,
as manifest by the small ratio of the point separation (in this case, 1) to
the marginal standard deviations (the square roots of the variance in (3.22)),
implies that continuity correction will have only very limited utility (Chen
and Kolassa, 2018).
[FIGURE 4.1: Asymptotic Critical Region for Kruskal Wallis Test, Level 0.05; axes: Group 1 rank sum (5 to 25) vs. Group 2 rank sum (10 to 25); group sizes 3, 3, 4.]
Example 4.4.1 Revisiting the TEAN subset of the maize data of Example 4.3.1, one might perform the van der Waerden and Savage score tests:
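The code for these score tests did not survive in this copy; the following hedged sketch builds the scores with cscores from exactRankTests, as in Example 3.4.2, and applies the analysis of variance to the scores as a large-sample approximation, so its p-values need not match the exact values quoted below.
library(exactRankTests)#Gives savage and vw scores
tean$vw<-as.numeric(cscores(tean$wght,type="Normal"))
tean$sav<-as.numeric(cscores(tean$wght,type="Savage"))
anova(lm(vw~trt,data=tean))
anova(lm(sav~trt,data=tean))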
The p-values for normal and Savage scores are 0.9544 and 0.9786 respec-
tively.
One might also apply a permutation test in this context. In this case, the
original data are used in place of ranks. The reference distribution arises from
the random redistribution of group labels among the observed responses. A
Gaussian approximation to this sampling distribution leads to analysis similar
to that of an analysis of variance.
Example 4.4.2 Revisiting the TEAN subset of the maize data of Example 4.3.1, the following syntax performs the exact permutation test, again using package MultNonParam.
#date()
#aov.P(tean$wght[!is.na(tean$wght)],
# tean$trt[!is.na(tean$wght)])
#date()
giving an approximation to the p-value of 0.7254, or, more carefully, $0.7254 \pm 1.96\sqrt{0.7254 \times 0.2746/10000} = (0.717, 0.734)$.
Kruskal-Wallis test (4.17) of §4.3 for the analysis of variance test (4.3) in the
first stage of the procedure, and substituting the Mann-Whitney-Wilcoxon
test (3.14) for the two-sample t-test, with the same lack of Type I error con-
trol.
In the case of rank testing, when sample sizes are equal, the Studentized
range method may be applied to rank means for simultaneous population dif-
ferentiation (Dunn, 1964); Conover and Iman (1979) credits this to Nemenyi
(1963). This technique may be used to give corrected p-values and corrected
simultaneous confidence intervals for rank means. Since rank mean expecta-
tions are generally not of interest, the application of the Studentized range
distribution to rank means is typically of direct interest only for testing. Use $T_G^{\{k\}}/M_k$ in place of $\bar{X}_{k.}$. Note that for $j \neq k$,
$$\operatorname{Var}\left[\frac{T_G^{\{k\}}}{M_k} - \frac{T_G^{\{j\}}}{M_j}\right] = \frac{\operatorname{Var}\left[T_G^{\{k\}}\right]}{M_k^2} + \frac{\operatorname{Var}\left[T_G^{\{j\}}\right]}{M_j^2} - 2\frac{\operatorname{Cov}\left[T_G^{\{j\}}, T_G^{\{k\}}\right]}{M_jM_k} = \left(\frac{N-M_j}{M_j} + \frac{N-M_k}{M_k} + 2\right)\frac{\hat a - \bar a^2}{N-1} = \left(\frac{1}{M_j} + \frac{1}{M_k}\right)\frac{N(\hat a - \bar a^2)}{N-1}.$$
Hence $S$ in (4.7) may be replaced with $\sqrt{N(\hat a - \bar a^2)/(N-1)}$ to obtain simultaneous p-values to satisfy (4.8). Also, take the denominator degrees of freedom to be ∞ as the second argument to $q$. The same substitution may be made in (4.9) to obtain (4.10), but the parameters bounded in these intervals are differences in average rank, which are seldom of interest.
Example 4.5.1 Consider again the yarn data of Example 3.3.1. Con-
sider just type A, and explore pairwise bobbin differences. One might do
all pairwise Mann-Whitney-Wilcoxon tests.
yarna<-yarn[yarn$type=="A",]
cat(’\nMultiple Comparisons for Yarn with No Correction\n’)
pairwise.wilcox.test(yarna$strength,yarna$bobbin,exact=F,
p.adjust.method="none")
This gives pairwise p-values
1 2 3 4 5
2 0.112 - - - -
3 0.470 1.000 - - -
4 0.245 0.030 0.112 - -
5 0.885 0.312 0.772 0.470 -
6 0.661 0.146 0.470 0.042 1.000
In this case, the initial Kruskal-Wallis test fails to reject the null hypoth-
esis of equality of distribution, and no further exploration is performed.
here $U_i$ is the Mann-Whitney statistic for testing group $i$ vs. all preceding groups combined, and $m_i = \sum_{j=1}^{i} M_j$. The second equality in (4.20) follows
from independence of the values Ui (Terpstra, 1952). A simpler expression for
this variance is
" K
#
1 X
Var0 [J] = N (N + 1)(2N + 1) − Mi (Mi + 1)(2Mi + 1) . (4.21)
72 i=1
This test might be corrected for ties, and has certain other desirable properties
(Terpstra, 1952).
Jonckheere (1954), apparently independently, suggested a statistic that
is twice J, centered to have zero expectation, and calculated the vari-
ance, skewness, and kurtosis. The resulting test is generally called the
Jonckheere-Terpstra test.
Example 4.6.1 Consider again the maize data from area TEAN in Example 4.3.1. The treatment variable contains three digits; the first indicates nitrogen level, with four levels, and is extracted in the code in Example 4.3.1. Apply the Jonckheere-Terpstra test:
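The original code block is missing here; a minimal substitute using the clinfun package (an assumed choice, not necessarily the book's) might read:
library(clinfun)#For jonckheere.test; an assumption, not the book's package
jonckheere.test(tean$wght,tean$nitrogen) #ordered alternative in nitrogen
kruskal.test(tean$wght,tean$nitrogen)    #comparative 3 df test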
to perform this test, and the comparative three degree of freedom Kruskal-
Wallis test. The Jonckheere-Terpstra Test gives a p-value 0.3274, as com-
pared with the Kruskal-Wallis p-value 0.4994.
with κij defined as in (4.22). Alternative values for κij under shift models (4.2)
are calculated as in (3.25). Without loss of generality, one may take θ1 = 0.
Consider parallels with the two-group setup of §3. The cumulative distri-
bution function F1 of (4.2) corresponds to F of (3.1), and F2 corresponds to G
of (3.1). Then µ(θ) of (3.25) corresponds to κ12 . Calculation of κkl , defined in
(4.22), and applied to particular pairs of distributions, such as the Gaussian
in (3.26) and the logistic in (3.27), and other calculations from the exercises
of §3, hold in this case as well. Each of the difference probabilities κkl , for
k 6= l, depends on the alternative distribution only through θl − θk .
Power may be calculated from (2.18).
The distribution for $W_H$ is $\chi^2_{K-1}$. One might attempt to act by analogy with (2.17), and calculate power using the alternative hypothesis expectation and variance matrix. This is possible, but difficult, since $W_H$ (and the other approximately $\chi^2_{K-1}$ statistics in this chapter) are standardized using the null hypothesis
variance structure. Rescaling this under the alternative hypothesis gives a
statistic that may be represented approximately as the sum of squares of inde-
pendent Gaussian random variables; however, not only do these variables have
non-zero means, which is easily addressed using a non-central χ2 argument
as in (1.3), but they also have unequal variances for the variables comprising the
summands; an analog to (1.3) would have unequal weights attached to the
squared summands.
A much easier path to power involves analogy with (2.22): Approximate
the alternative distribution using the null variance structure. This results in
the non-central chi-square approximation using (1.4).
Because the Mann-Whitney and Wilcoxon statistics differ only by an addi-
tive constant, the Kruskal-Wallis test may be re-expressed as
$$\left\{\sum_{k=1}^{K}\left(T_k - M_k(N - M_k)\kappa^\circ\right)^2/M_k\right\}\Big/[\psi^2(N+1)N], \tag{4.25}$$
for $T_k$ the Mann-Whitney statistic for testing whether group $k$ differs from all of the other groups, with all of the other groups collapsed.
The variance matrix for rank sums comprising WH is singular (that is,
it does not have an inverse), and the argument justifying (1.4) relied on the
presence of an inverse. The argument of §4.2.2 calculated the appropriate
quadratic form, dropping one of the categories to obtain an invertible vari-
ance matrix, and then showed that this quadratic form is the same as that
generating t. The same argument shows that the appropriate non-centrality
parameter is
$$\xi = \left\{\sum_{k=1}^{K}\left(\mathrm{E}_A[T_k] - M_k(N - M_k)(1/2)\right)^2/M_k\right\}\Big/[\psi^2(N+1)N],$$
where $\mathrm{E}_A[T_k] = M_k\sum_{l=1, l\neq k}^{K} M_l\kappa_{kl}$. The non-centrality parameter is
$$\xi = \frac{1}{\psi^2(N+1)N}\sum_{k=1}^{K}M_k\left(\sum_{l=1, l\neq k}^{K}M_l(\kappa_{kl} - \kappa^\circ)\right)^2 = \frac{1}{\psi^2(N+1)N}\sum_{k=1}^{K}M_k\left(\sum_{l=1}^{K}M_l(\kappa_{kl} - \kappa^\circ)\right)^2. \tag{4.27}$$
$$1 - G_{K-1}\left(G_{K-1}^{-1}(1 - \alpha; 0); \xi\right), \tag{4.28}$$
kwpower(rep(20,3),(0:2)/2,"normal")
$$G_{K-1}^{-1}(\beta; \xi) = G_{K-1}^{-1}(1 - \alpha; 0). \tag{4.29}$$
From (4.27),
$$\xi \approx N\sum_{k=1}^{K}\lambda_k\left(\sum_{l=1}^{K}\lambda_l(\kappa_{kl} - \kappa^\circ)\right)^2\Big/\psi^2, \tag{4.30}$$
for $\psi = 1/\sqrt{12}$, and
$$N \approx \xi\psi^2\Big/\sum_{k=1}^{K}\lambda_k\left(\sum_{l=1}^{K}\lambda_l(\kappa_{kl} - \kappa^\circ)\right)^2, \tag{4.31}$$
Example 4.7.4 The sum with respect to $j$ disappears from $\zeta^2$, since the sum of the proportions $\lambda_j$ is 1. Under the same conditions as in Example 4.7.3, one might use (4.33) rather than (4.31). Take $\theta^\dagger = \theta^A = (0, 1/2, 1)$ and $\Delta = 1$. The derivative $\kappa'$ is tabulated, for the Gaussian and logistic distributions, in Table 3.6 as $\mu'(0)$. In this Gaussian case, $\kappa' = \mu'(0) = (2\sqrt{\pi})^{-1} = 0.282$. Also, $\zeta^2 = 0^2/3 + (1/2)^2/3 + 1^2/3 - (0/3 + (1/2)/3 + 1/3)^2 = 5/12 - 1/4 = 1/6$. The non-centrality parameter, solving (4.29), is $\xi = 9.63$. The approximate sample size is $9.63/(12 \times 0.282^2/6) \approx 61$, or approximately 21 observations per group; compare this with an approximation of 23 per group using a Gaussian approximation, without expanding the $\kappa_{ij}$.
This may be computed using the R package MultNonParam using
kwsamplesize((0:2)/2,"normal",taylor=TRUE)
In order to determine the effect size necessary to give a level α test power
1 − β to detect an alternative hypothesis ∆θ † , determine ξ from (4.28), and
then ∆ from (4.34).
Figure 4.3 reflects the accuracy of the various approximations in this sec-
tion. It reflects performance of approximations to the power of the Kruskal-
Wallis test. Tests with K = 3 groups, with the same number of observations
per group, with tests of level 0.05, were considered. Group sizes between 5 and
30 were considered (corresponding to total sample sizes between 15 and 90).
For each sample size, (4.34) was used to generate an alternative hypothesis
with power approximately 0.80, in the direction of equally-spaced alternatives.
The dashed line represents a Monte Carlo approximation to the power, based
on 50,000 observations. The dotted line represents the standard non-central
chi-square approximation (4.28). The solid line represents this same approxi-
mation, except also incorporating the linear approximation (4.32) representing
[Figure 4.3: Approximate power of the Kruskal-Wallis test, by number of observations per group (5 to 30), for 3 groups, level 0.05, target power 0.8. Solid line: approximation (4.28) with (4.27) and (4.32); dotted line: approximation (4.28) with (4.27); dashed line: Monte Carlo approximation from 50,000 observations.]
for $\lambda_i = M_i/N$, where again $\kappa^\circ$ is the common value of $\kappa_{jk}$ under the null hypothesis, and $\kappa'$ is the derivative of the probability in (3.25), as a function of the location shift between two groups, evaluated at the null hypothesis, and calculated for various examples in §3.8.2. Hence
$$\mu_J'(0) = \sum_{i=1}^{K-1}\sum_{j=i+1}^{K}\lambda_i\lambda_j\kappa'[\theta_j^\dagger - \theta_i^\dagger].$$
Recall that $\kappa_{ij}$ depended on two group indices $i$ and $j$ only because the locations were potentially shifted relative to one another; the value $\kappa^\circ$ and its derivative $\kappa'$ are evaluated at the null hypothesis of equality of distributions, and hence do not depend on the indices. Furthermore, from (4.21), $\operatorname{Var}[T_J] \approx \frac{1}{36}\left[1 - \sum_{k=1}^{K}\lambda_k^3\right]/N$. Consider the simple case in which $\lambda_k = 1/K$ for all $k$, and in which $\theta_j^\dagger - \theta_i^\dagger = (j - i)$. Then $\mu_J'(0) = \kappa'(K^2-1)/(6K)$, $\operatorname{Var}[T_J] \approx \frac{1}{36}\left[1 - 1/K^2\right]/N$, and
$$e_J = \kappa'\frac{(K^2-1)/(6K)}{\sqrt{\frac{1}{36}[1 - 1/K^2]}} = \frac{\kappa'(K^2-1)}{\sqrt{K^2-1}} = \kappa'\sqrt{K^2-1}.$$
The efficacy of the Gaussian-theory contrast, from (4.6), is $\sqrt{K^2-1}/(2\sqrt{3}\varepsilon)$. Hence the asymptotic relative efficiency of the Jonckheere-Terpstra test to the contrast of means is
$$(\kappa')^2(12\varepsilon^2). \tag{4.35}$$
This is the same as the asymptotic relative efficiency in the two-sample case
in Table 3.6.
The ratio of sample sizes needed for approximately the same power for the same alternative from the F and Kruskal-Wallis tests is approximately
$$N_A/N_H \approx (\psi_A/\psi_H)^2(\kappa_H')^2/(\kappa_A')^2 = 12\varepsilon^2(\kappa_H')^2,$$
which is the same as in the unordered case (4.35) and in the two-sample case.
4.9 Exercises
1. The data set
http://ftp.uni-bayreuth.de/math/statlib/datasets/federalistpapers.txt
http://ftp.uni-bayreuth.de/math/statlib/datasets/Plasma Retinol
http://lib.stat.cmu.edu/datasets/CPS 85 Wages
reflects wages from 1985. The first 42 lines of this file contain a
description of the data set, and an explanation of variables; delete
these lines first, or skip them when you read the file. All fields are
numeric. The tenth field is sector, and the sixth field is hourly wage;
you can skip everything else. Test for a difference in wage between
various sector groups, using a Kruskal–Wallis test.
5
Group Differences with Blocking
Under the null hypothesis that the distribution of the Xi is the same as the
distribution of the Yi , and again, assuming symmetry of the differences, then
$(S_1, \ldots, S_n)$ and $(R_1, \ldots, R_n)$ are independent random vectors, because $S_j$ and $|X_j|$ are pairwise independent under $H_0$.
Components of the random vector (R1 , . . . , Rn ) are dependent, and hence
calculation of the variance from TSR via (5.2) requires calculation of the sum of
identically distributed but not independent random variables. An alternative
formulation, as the sum of independent but not identically distributed random
variables, will prove more tractable. Let
$$V_j = \begin{cases} 1 & \text{if the item whose absolute value is ranked } j \text{ is positive} \\ 0 & \text{if the item whose absolute value is ranked } j \text{ is negative.} \end{cases}$$
Hence $T_{SR} = \sum_j jV_j$, and the null expectation and variance are
$$\mathrm{E}_0[T_{SR}] = \sum_{j=1}^{n} j\,\mathrm{E}_0[V_j] = n(n+1)/4 \tag{5.3}$$
and
$$\operatorname{Var}_0[T_{SR}] = \sum_{j=1}^{n} j^2\operatorname{Var}_0[V_j] = \sum_{j=1}^{n} j^2/4 = n(2n+1)(n+1)/24. \tag{5.4}$$
One can also calculate exact probabilities for TSR recursively, as one could
for the two-sample statistic. There are $2^n$ ways to assign signs to ranks $1, \ldots, n$.
Let f (t, n) be the number of such assignments yielding TSR = t with n obser-
vations. Again, as in §3.4.1, summing the counts for shorter random vectors
with alternative final values,
$$f(t, n) = \begin{cases} 0 & \text{for } t < 0 \text{ or } t > n(n+1)/2, \\ 1 & \text{if } n = 1 \text{ and } t \in \{0, 1\}, \\ f(t, n-1) + f(t-n, n-1) & \text{otherwise.} \end{cases}$$
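As an aside (not from the text), this recursion translates directly into R; dividing f(t, n) by 2^n recovers the null probability mass function of TSR, which base R tabulates as dsignrank.
f<-function(t,n){#counts sign assignments to ranks 1..n with signed-rank sum t
  if((t<0)||(t>n*(n+1)/2)) return(0)
  if(n==1) return(as.numeric(t %in% c(0,1)))
  f(t,n-1)+f(t-n,n-1)
}
all.equal(sapply(0:55,f,n=10)/2^10,dsignrank(0:55,10)) #TRUE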
Pair      1    2    3    4    5    6    7    8    9   10
First  1005 1035 1281 1051 1034 1079 1104 1439 1029 1160
Second  963 1027 1272 1079 1070 1173 1067 1347 1100 1204
Diff.   -42   -8   -9   28   36   94  -37  -92   71   44
Rank      6    1    2    3    4   10    5    9    8    7
Example 5.2.2 Perform the asymptotic score tests on the brain volume
differences of Example 5.2.1.
library(MultNonParam)
cat("Asymptotic test using normal scores\n")
brainpairs$normalscores<-qqnorm(seq(length(
brainpairs$diff)),plot.it=F)$x
symscorestat(brainpairs$diff,brainpairs$normalscores)
cat("Asymptotic test using savage scores\n")
brainpairs$savagescores<-cumsum(
1/rev(seq(length(brainpairs$diff))))
symscorestat(brainpairs$diff,brainpairs$savagescores)
giving p-values of 0.863 and 0.730 respectively.
Permutation testing can also be done, using raw data values as scores.
This procedure uses the logic that Xj and Yj having the same marginal
distribution implies Yj − Xj has a symmetric distribution. Joint distributions
exist for which this is not true, but these examples tend to be contrived.
[Figure: hypothetical data set with 6 observations; horizontal axis: Potential Point of Symmetry (−2 to 10).]
Example 5.2.4 Refer again to the brain volume data of Example 5.2.1.
Find a 95% confidence interval for the difference in brain volume. The
0.025 quantile of the Wilcoxon signed-rank statistic with 10 observations
is t◦ = 9; this can be calculated from R using
qsignrank(0.025, 10)
and confidence interval endpoints are observations 9 and 55+1-9=47. As
tabulated above, Walsh averages 9 and 47 are -32.0 and 49.5 respectively.
Hence the confidence interval is (-32.0, 49.5). This might have been cal-
culated directly in R using
wilcox.test(brainpairs$diff,conf.int=TRUE)
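A sketch of the Walsh average computation (not the book's code), assuming the brainpairs data frame with the differences diff as above:
d<-brainpairs$diff
w<-outer(d,d,"+")/2                    #Walsh averages (d_i+d_j)/2 for i<=j
walsh<-sort(w[upper.tri(w,diag=TRUE)]) #10*11/2=55 of them
c(walsh[9],walsh[length(walsh)+1-9])   #reproduces (-32.0, 49.5)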
Similar techniques to those of this section were used in §2.3.3 to give con-
fidence intervals for the median, and in §3.10 to give confidence intervals
for median differences. In each case, a statistic dependent on the unknown
parameter was constructed, and estimates and confidence intervals were con-
structed by finding values of the parameter equating the statistic to appropri-
ate quantiles. However, the treatment of this section and that of §2.3.3 differ,
in that the resulting statistic in this section is non-increasing in θ, while in §2.3.3 it was non-decreasing. The increasing parameterization of §2.3.3 was necessary to allow the same construction to be used in §2.3.4, when quantiles other than the median were estimated; these quantile estimates are
an increasing function of each observation separately. A parallel construction
might have been used in the current section, by inverting the statistic formed
by summing ranks of negative observations; as this definition runs contrary to
the common definition of the Wilcoxon Signed Rank statistic, it was avoided.
to match the notation of (2.15) and (2.19). Note that $\mu'(0)$ is now twice the
value from the Mann-Whitney-Wilcoxon statistic, for distributions symmetric
about 0. This will be used for asymptotic relative efficiency calculations in the
exercises.
and hence
$$\mathrm{E}[R_{k..}] = \sum_{l=1}^{L} M_{kl}\left(\sum_{j=1}^{K} M_{jl} + 1\right)\Big/2. \tag{5.6}$$
Since blocks are ranked independently, variances and covariances for rank sums add across blocks:
$$\operatorname{Cov}[R_{k..}, R_{m..}] = -\sum_{l=1}^{L}[M_{kl}M_{ml}]\left(\sum_{j=1}^{K} M_{jl} + 1\right)\Big/12, \tag{5.8}$$
for $k \neq m$. In this balanced case, the test statistic (5.9) can be shown to equal the sum of squares of ranks away from their average value per treatment group:
$$W_F = 12L\sum_{k=1}^{K}\left[\bar{R}_{k..} - (MK+1)/2\right]^2\Big/[K(KM+1)].$$
expenserank<-t(apply(as.matrix(expensesd[,-1]),1,rank))
rownames(expenserank)<-expensesd[[1]]
gives the table of ranks
g1 g2 g3 g4 g5 g6 g7
Housing 5 1 3 2 4 6 7
Operations 1 3 4 6 2 5 7
Food 1 2 7 3 5 4 6
Clothing 1 3 2 4 5 6 7
Furnishings 2 1 6 3 7 5 4
Transportation 1 2 3 6 5 4 7
Recreation 1 2 3 4 7 5 6
Personal 1 2 3 6 4 7 5
Medical 1 2 4 5 7 3 6
Education 1 2 4 5 3 6 7
Community 1 5 2 3 7 6 4
Vocation 1 5 2 4 3 6 7
Gifts 1 2 3 4 5 6 7
Other 5 4 7 2 6 1 3
Rank sums are 23, 36, 53, 57, 70, 70, 83, and group means, calculated for example via
apply(expenserank,2,mean)
are 1.643, 2.571, 3.786, 4.071, 5.000, 5.000, 5.929. Subtracting the overall
mean, 4, squaring, and adding, gives 13.367. Multiplying by 12L/(K(K +
1)) = 12×14/(7×8) = 3 gives the statistic value WF = 40.10. Comparing
to the χ26 distribution gives a p-value of 4.35×10−7 . This might also have
been done entirely in R using
friedman.test(as.matrix(expensesd[,-1]))
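The arithmetic above can also be scripted directly; a minimal sketch (not the book's code), assuming the expenserank matrix just constructed:
rbar<-apply(expenserank,2,mean)          #1.643 2.571 ... 5.929
L<-nrow(expenserank); K<-ncol(expenserank)
WF<-12*L*sum((rbar-(K+1)/2)^2)/(K*(K+1)) #statistic value 40.10
pchisq(WF,df=K-1,lower.tail=FALSE)       #p-value 4.35e-07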
The development of Friedman’s test in the two-way layout remarkably pre-
ceded analogous testing in one-way layouts. When K = 2 and Mkl are all 1,
Friedman’s test is equivalent to the sign test applied to differences within a
block.
Prentice (1979) performs these calculations in somewhat more generality, and the
general unbalanced two-way test is generally called the Prentice test. Skillings
and Mack (1981) address this question using explicit numerical matrix inver-
sion.
temp1<-temp<-as.data.frame(scan("chicken.dat",what=
list(source="", lev=0,fish=0,weight=0,w2=0)))
temp1$weight<-temp1$w2;temp$h<-0;temp1$h<-1
temp$w2<-NULL;temp1$w2<-NULL
chicken<-rbind(temp,temp1)
attach(chicken)
boxplot(split(weight,source),horizontal=TRUE,ylab="g.",
main="Weight gain for Chickens", xlab="Protein Source")
detach(chicken)
Hence blocking on source is justified. Test for dependence of weight gain
on protein level, blocking on source. Perform these calculations in R using
attach(chicken)
prentice.test(weight,source,blocks=lev)
detach(chicken)
to obtain the p-value 0.1336.
[Figure: horizontal boxplots of weight gain for chickens (g.), by protein source (G, S); title: Weight gain for Chickens.]
library(MultNonParam)
#aov.P requires data sorted by block. Put block ends as the
#third argument.
chicken<-chicken[order(chicken$lev),]
aov.P(chicken$weight,as.numeric(as.factor(chicken$source)),
c(8,16,24))
The p-value is 0.182.
This test was proposed by Page (1963) in the balanced case with one replicate
per block-treatment pair, and is called Page’s test.
For more general replication patterns, the null expectation and variance
for the statistic may be calculated, using (5.6), (5.7), and (5.8), and
$$\mathrm{E}_0[T_L] = \sum_{k=1}^{K} k\,\mathrm{E}_0[R_{k..}]$$
and
$$\operatorname{Var}_0[T_L] = \sum_{k=1}^{K} k^2\operatorname{Var}_0[R_{k..}] + \sum_{i=1}^{K}\sum_{k=1, k\neq i}^{K} ik\operatorname{Cov}_0[R_{i..}, R_{k..}],$$
using (3.20).
library(crank); page.trend.test(expensesd[,-1],FALSE)
Page (1963) presents only the case with Mkl all 1, and provides a table of
the distribution of TL for small values of K and L. For larger values of K and
L than appear in the table, Page (1963) suggests a Gaussian approximation.
The scores in (5.10) are equally spaced. This is often a reasonable choice
in practice. When the Mkl are not all the same, imbalance among numbers of
ranks summed may change the interpretation of these scores, and a preferred
statistic definition to replace (5.10) is
$$T_L^* = \sum_{k=1}^{K} k\bar{R}_{k..},$$
with moments given by
$$\mathrm{E}[\bar{R}_{k..}] = \frac{\sum_{l=1}^{L} M_{kl}\left(\sum_{j=1}^{K} M_{jl} + 1\right)/2}{\sum_{l=1}^{L} M_{kl}},$$
$$\operatorname{Var}[\bar{R}_{k..}] = \frac{\sum_{l=1}^{L}\left\{M_{kl}\left(\sum_{j\neq k} M_{jl}\right)\left(\sum_{j=1}^{K} M_{jl} + 1\right)\right\}}{12\left(\sum_{l=1}^{L} M_{kl}\right)^2},$$
and
$$\operatorname{Cov}[\bar{R}_{k..}, \bar{R}_{m..}] = -\sum_{l=1}^{L}[M_{kl}M_{ml}]\left(\sum_{j=1}^{K} M_{jl} + 1\right)\Big/\left(12\sum_{l=1}^{L} M_{kl}\sum_{l=1}^{L} M_{ml}\right).$$
Example 5.6.2 Refer again to the chicken weight gain data of Exam-
ple 5.4.2. This data set is balanced, but one could ignore the balance and
numerically invert the variance matrix of all rank sums except the last.
In this case, test for an ordered effect of protein level on weight gain,
blocking on protein source. Perform the test using
library(MultNonParam)
attach(chicken)
cat(’\n Page test with replicates \n’)
page.test.unbalanced(weight,lev,source)
detach(chicken)
In this balanced case, rank mean expectations are all 6.5, variances are
1.083, covariances are −0.542. The rank means by treatment level are
7.625, 7.250, 4.625, giving an overall statistic value of 36, compared with
a null expectation of 39 and null variance of 3.25; the p-value is 0.096.
Do not reject the null hypothesis.
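A quick arithmetic check of these values (not the book's code):
rbar<-c(7.625,7.250,4.625)    #rank means by protein level
TL<-sum(seq(3)*rbar)          #statistic value 36
e0<-6.5*sum(seq(3))           #null expectation 39
v0<-sum(seq(3)^2)*1.083+2*(1*2+1*3+2*3)*(-0.542) #null variance 3.25
2*pnorm(-abs(TL-e0)/sqrt(v0)) #two-sided p-value 0.096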
5.7 Exercises
1. The data set
http://ftp.uni-bayreuth.de/math/statlib/datasets/schizo
http://ftp.uni-bayreuth.de/math/statlib/datasets/schizo
http://stat.rutgers.edu/home/kolassa/Data/yarn.dat
http://lib.stat.cmu.edu/datasets/CPS 85 Wages
reflects wages from 1985. The first 42 lines of this file contain a
description of the data set, and an explanation of variables; delete
these lines first, or skip them when you read the file. All fields are
numeric. The tenth field is sector, the fifth field is union member-
ship, and the sixth field is hourly wage; you can skip everything
else. Test for a difference in wage between union members and non-
members, blocking on sector.
6
Bivariate Methods
Suppose that independent random vectors (Xi , Yi ) all have the same joint
density fX,Y (x, y). This chapter investigates the relationship between Xi and
Yi for each i. In particular consider testing the null hypothesis fX,Y (x, y) =
fX (x)fY (y), without specifying an alternative hypothesis for fX,Y , or even
without knowledge of the null marginal densities.
This null hypothesis is most easily tested against the alternative hypothesis
that, vaguely, large values of X are associated with large values of Y (or vice
versa). Furthermore, if the null hypothesis is not true, the strength of the
association between Xi and Yi must be measured.
So
$$\operatorname{Var}\left[\sum_j X_jZ_j\right] = \sum_j X_j^2\operatorname{Var}[Z_j] + \sum_{j\neq i} X_iX_j\operatorname{Cov}[Z_j, Z_i] = n^2\sigma_X^2\sigma_Y^2/(n-1),$$
where $\sigma_X^2 = \frac{1}{n}\sum_i X_i^2$. So
$$\operatorname{Var}[r_P] = 1/(n-1) \tag{6.2}$$
under the permutation distribution (Hotelling and Pabst, 1936). This value
was determined by Student (1908), after fitting a parametric model to empir-
ical data, rounding parameter estimates to values more consistent with intu-
ition, and calculating the moments of this empirical distribution. Higher-order
moments were determined by David et al. (1951) using a method of proof sim-
ilar to that presented above for the variance.
This result suggests a test of the null hypothesis of independence versus the two-sided alternative at level α using $r_P$, that rejects the null hypothesis if
$$|r_P| > z_{\alpha/2}/\sqrt{n-1}. \tag{6.3}$$
that is, $r_S$ is the Pearson correlation on ranks. Exact agreement in the ordering of ranks results in a rank correlation of 1, exact agreement in the opposite direction results in a rank correlation of −1, and the Cauchy-Schwarz theorem indicates that these two values are the extreme possible values for the rank correlation.
The sums of squares in the denominator have the same value for every data set, and the numerator can be simplified. Note that
$$\sum_{j=1}^{n}(j - (n+1)/2)^2 = \sum_{j=1}^{n} j^2 - n(n+1)^2/4$$
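A two-line check of this characterization (not from the text): for tie-free data, the Pearson correlation of the ranks equals the built-in Spearman correlation.
set.seed(1); x<-rnorm(20); y<-x+rnorm(20)
all.equal(cor(rank(x),rank(y)),cor(x,y,method="spearman")) #TRUE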
Example 6.3.1 Consider again the twin brain data of Example 5.2.1. As before, the data set brainpairs has 10 records, reflecting the results from 10 pairs of twins, and is plotted in Figure 6.1 via
attach(brainpairs); plot(v1,v2, xlab="Volume for Twin 1",
ylab="Volume for Twin 2",main="Twin Brain Volumes")
Ranks for twin brains are given in Table 6.1. The sum in the second factor of (6.4) is
1×1 + 4×2 + 9×9 + 5×5 + 3×4 + 6×7 + 7×3 + 10×10 + 2×6 + 8×8 = 366,
and so the entire second factor is 366 − 10 × 11²/4 = 63.5, and the Spearman correlation is (12/(10 × 99)) × 63.5 = 0.770. Observed correlations may be calculated in R using
may be calculated in R using
cat(’\n Permutation test for twin brain data \n’)
attach(brainpairs)
obsd<-c(cor(v1,v2),cor(v1,v2,method="spearman"))
to obtain the Pearson correlation (6.1) 0.914 and the Spearman corre-
lation (6.4) 0.770. Testing the null hypothesis of no association may be
performed using a permutation test with either of these measures.
out<-array(NA,c(2,20001))
dimnames(out)[[1]]<-c("Pearson","Spearman")
for(j in seq(dim(out)[2])){
newv1<-sample(v1)
out[,j]<-c(cor(newv1,v2),cor(newv1,v2,method="spearman"))
}
to obtain p-values 1.5 × 10−4 and 6.1 × 10−3 . The asymptotic critical
value, from (6.3), is given by
cat("\n Asymptotic Critical Value\n")
-qnorm(0.025)/sqrt(length(v1)-1)
[Figure 6.1: scatterplot of Twin Brain Volumes; horizontal axis: Volume for Twin 1 (1000 to 1500); vertical axis: Volume for Twin 2 (900 to 1400).]
Pair            1 2 3 4 5 6 7  8 9 10
Rank of First   1 4 9 5 3 6 7 10 2  8
Rank of Second  1 2 9 5 4 7 3 10 6  8
Note that
$$\sum_{j=1}^{n} jR_j = \sum_{j=1}^{n} j\left(1 + \sum_{i\neq j} I(Y_j > Y_i, X_j > X_i)\right) = n(n+1)/2 + \sum_{j=1}^{n} j\sum_{i\neq j} I(Y_j > Y_i, X_j > X_i).$$
6.3.2 Kendall’s τ
Pairs of bivariate observations for which the X and Y values are in the
same order are called concordant; (6.7) refers to the probability that a pair
is concordant. Pairs that are not concordant are called discordant. Kendall
(1938) constructs a new measure based on counts of concordant and discor-
dant pairs. Consider the population quantity τ = 2p1 − 1, for p1 of (6.7),
for $Z_{ij} = I((X_j - X_i)(Y_j - Y_i) > 0)$. Then $U^\dagger = n(n-1)/2 - U$ is the number of discordant pairs; this number equals the number of rearrangements necessary to make all pairs concordant. Estimate τ by the excess of concordant over discordant pairs, divided by the maximum:
$$r_\tau = \frac{U - (n(n-1)/2 - U)}{n(n-1)/2} = \frac{4U}{n(n-1)} - 1. \tag{6.8}$$
Note that $\mathrm{E}[U] = n(n-1)p_1/2$, for $p_1$ of (6.7), and $\mathrm{E}[r_\tau] = 2p_1 - 1$. The null value of $p_1$ is half, recovering
$$\mathrm{E}_0[U] = n(n-1)/4, \qquad \mathrm{E}_0[r_\tau] = 0. \tag{6.9}$$
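A direct count of concordant pairs recovers rτ via (6.8); in the sketch below (not the book's code) kendall.by.count is a hypothetical helper, checked against base R for tie-free data.
kendall.by.count<-function(x,y){
  n<-length(x); U<-0
  for(i in 1:(n-1)) for(j in (i+1):n)
    U<-U+((x[j]-x[i])*(y[j]-y[i])>0) #concordant pair
  4*U/(n*(n-1))-1                    #r_tau of (6.8)
}
set.seed(2); x<-rnorm(15); y<-x+rnorm(15)
all.equal(kendall.by.count(x,y),cor(x,y,method="kendall")) #TRUE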
As with the Spearman correlation $r_S$, Kendall’s correlation $r_\tau$ may be used to test the null hypothesis of independence, relying on its asymptotic Gaussian distribution, but the test requires the variance of $r_\tau$. Note that
$$\operatorname{Var}[U] = \sum_{i<j}\operatorname{Var}[Z_{ij}] + {\sum}^*\operatorname{Cov}[Z_{ij}, Z_{kl}].$$
Here the sum ${\sum}^*$ is over three distinct indices, $i < j$, $k < l$. This sum consists of $n^2(n-1)^2/4 - n(n-1)/2 - n(n-1)(n-2)(n-3)/4 = n(n-1)(n-2)$ terms. Hence
$$\operatorname{Var}[U] = \sum_{i<j}(p_1 - p_1^2) + {\sum}^*(p_3 - p_1^2)$$
Example 6.3.2 Consider again the twin brain data of Example 5.2.1,
with ranks in Table 6.1. Discordant pairs are 2 – 5, 2 – 9, 4 – 7, 4 – 9,
5 – 7, 5 – 9, 6 – 7, and 7 – 9. Hence 8 of 45 pairs are discordant, and
the remainder, 37, are concordant. Hence rτ = 4 × 37/90 − 1 = 0.644,
from (6.8). This may be calculated using
cor(brainpairs$v1,brainpairs$v2,method="kendall")
Figure 6.2 shows two artificial data sets with functional relationships between two variables. In the first relationship, the two variables are identical, while in the second, the second variable is related to the arc tangent of the first. Both relationships show perfect association. The first relationship represents perfect linear association, while the second reflects perfect nonlinear association. Hence the Spearman and Kendall association measures are 1 for both relationships; the Pearson correlation is 1 for the first relationship, and 0.937 for the second relationship.
[Figure 6.2: two artificial relationships plotted for x from −10 to 10: y = x (perfect linear association) and y = 5 arctan(x) (perfect nonlinear association).]
Generally, $\hat\beta_1 \neq \operatorname{smed}[Y_1, \ldots, Y_n] - \hat\beta_2\operatorname{smed}[X_1, \ldots, X_n]$. Sen (1968) investigates this procedure further.
This approach to estimation of β2 only holds in contexts with no ties
among the explanatory variable values.
t: 1  2  3  4 10 12 18
x: 9 15 19 20 45 55 78
The following commands calculate a confidence interval for the slope
parameter:
tt<-c(1,2,3,4,10,12,18); xx<-c(9,15,19,20,45,55,78)
out<-rep(NA,length(tt)*(length(tt)-1)/2)
count<-0
for(ii in seq(length(tt)-1)) for(jj in (ii+1):length(tt)){
count<-count+1
out[count]<-(xx[jj]-xx[ii])/(tt[jj]-tt[ii])
}
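Continuing the code above (a usage note, not from the text), the slope estimate is the median of the pairwise slopes:
sort(round(out,2)) #the 21 pairwise slopes, listed below
median(out)        #Theil's slope estimate: 4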
There are 7 × 6/2 = 21 pairwise slopes W̃j :
1.00 2.50 3.67 3.71 3.75 3.83 3.93 3.94 4.00 4.00
4.00 4.00 4.06 4.12 4.14 4.17 4.18 4.38 5.00 5.00 6.00
The estimate of β2 in the regression model med [Xj ] = β1 + β2 tj is the
median of these pairwise slopes, 4. Confidence intervals may be exhibited
using quantiles of the number of concordant pairs:
library(MultNonParam); qconcordant(0.025,7)
giving 4 as the quantile. Hence the 0.95 confidence interval is (W̃4 , W̃18 ) =
(3.71, 4.38). These calculations may be performed using theil(xx,tt). They may also be performed using theilsen(xx~tt) from the package deming,
which also gives a confidence interval for the intercept term. The intercept
is estimated by (6.14). Plotting may be done via
library(deming); plot(tt,xx);abline(theilsen(xx~tt))
Results are in Figure 6.3.
[Figure 6.3: scatterplot of the example data with the fitted Theil-Sen line; horizontal axis: 0 to 20.]
which the horizontal lines cross the correlation curve, and their intersec-
tions with the horizontal axis determine the end points of the regression
confidence interval. Recall that the Theil estimator is the inversion of the
Kendall correlation. Figure 6.5 displays the results of
attach(bp)
plot(spd,dpd,main="Blood Pressure Change", ylab="Diastolic",
xlab="Systolic")
library(deming)#For theilsen
tsout<-theilsen(dpd~spd)
abline(tsout); abline(lm(dpd~spd),lty=2)
legend(median(spd),median(dpd),lty=1:2,
legend=c("Inversion of tau","Least squares"))
detach(bp)
producing regression lines.
[Figure 6.5: Blood Pressure Change; horizontal axis: Systolic Blood Pressure Change (−35 to 0); vertical axis: Diastolic Blood Pressure Change (−25 to 5); solid line: inversion of tau; dashed line: least squares.]
6.5 Exercises
1. The data set
http://ftp.uni-bayreuth.de/math/statlib/datasets/federalistpapers.txt
http://stat.rutgers.edu/home/kolassa/Data/twinbrain.dat
http://ftp.uni-bayreuth.de/math/statlib/datasets/schizo
7
Multivariate Analysis
Suppose that one observes n subjects, indexed by i ∈ {1, . . . , n}, and, for sub-
ject i, observes responses Xij , indexed by j ∈ {1, . . . , J}. Potentially, covari-
ates are also observed for these subjects.
This chapter explores explaining the multivariate distribution of Xij in
terms of these covariates. Most simply, these covariates often indicate group
membership.
pooled sample covariance for all observations: $\hat\Sigma_{j,j'} = ((M_1 - 1)\hat\Sigma_{1,j,j'} + (M_2 - 1)\hat\Sigma_{2,j,j'})/(M_1 + M_2 - 2)$. Then the Hotelling two-sample statistic measures the difference between sample mean vectors, in a way that accounts for sample variance, and combines the response variables. Furthermore, under the null hypothesis of equality of distribution, and assuming that this distribution is multivariate Gaussian,
$$\frac{M_1 + M_2 - J - 1}{(M_1 + M_2 - 2)J}T^2 \sim F_{J, M_1+M_2-J-1}.$$
FIGURE 7.1: Univariate Normal Data That are Not Bivariate Normal
that is, the estimate minimizes the sum of distances from data vectors to
the parameter vector, with distance measured by the sum of absolute values.
Then combine components of T from (7.5) to give the multivariate sign test
statistic, or from (7.6) to give the multivariate sign rank test. In either case,
components are combined using the quadratic form (7.7), for $\Upsilon = \operatorname{Var}_0[T]$. As in §2.3, in the case that the null location value is 0, the null distribution for the multivariate test statistic is generated by assigning equal probabilities to all $2^n$ modifications of the data set by multiplying the rows $(X_{i1}, \ldots, X_{iJ})$ by +1 or −1. That is, the null hypothesis distribution of T(X) is generated by placing probability $2^{-n}$ on all of the $2^n$ elements of
Test statistics (7.1) and (7.2) arose as quadratic forms of independent and
identically distributed random vectors, and the variances included in their
definitions were scaled accordingly. Statistic (7.3) is built using a more com-
plicated variance; this pattern will repeat with nonparametric analogies to
parametric tests.
Combining univariate tests into a quadratic form raises two difficulties. In
previous applications of rank statistics, that is, in the case of univariate sign
and signed-rank one-sample tests, in the case of two-sample Mann-Whitney-
Wilcoxon tests, and in the case of of Kruskal-Wallis testing, all dependence
of the permutation distribution on the original data was removed through
ranking. This is not the case for T , since this distribution involves correla-
tions between ranks of the various response vectors. These correlations are not
specified by the null hypothesis. The separate tests are generally dependent,
and dependence structure depends on distribution of raw observations. The
asymptotic distribution of (7.7) relies on this dependence via the correlations
between components of T . The correlations must be estimated.
Furthermore, the distribution of W of (7.7) under the null hypothesis is
dependent on the coordinate system for the variables, but, intuitively, this
dependence on the coordinate system might be undesirable. For example,
suppose that (X1i , X2i ) has an approximate multivariate Gaussian distribu-
tion, with expectation µ, and variance Σ, with Σ known. Consider the null
hypothesis H0 : µ = 0. Then the canonical test is (7.1), and it is unchanged
if the test is based on (Ui , Vi ) for Ui = X1i + X2i and Vi = X1i − X2i , with
Σ modified accordingly. Hence the parametric analysis is independent of the
coordinate system.
The first of these difficulties is readily addressed. Under $H_0$, the marginal sign test statistic (7.5) satisfies $T_j/\sqrt{n} \approx G(0, 1)$. Conditional on the relative ranks of the absolute values of the observations, the permutation distribution
is entirely specified, and conditional joint moments are calculated. Under the
permutation distribution,
X
σ̂jj 0 = s̃(Xij )s̃(Xij 0 )/n, (7.9)
i
and so the variance estimate used in (7.7) has components υ̂jj 0 = σ̂jj 0 /n.
Here again, the covariance is defined under the distribution that consists of
2n random reassignment of signs to the data vectors, each equally weighted.
As before, variances do not depend on data values, but covariances do depend
on data values. The solution for the Wilcoxon signed-rank test is also available
(Bennett, 1965).
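A hedged sketch of the sign-test version of this construction (not the book's code), taking (7.7) to be the quadratic form in T with estimated variance, and assuming the bp data frame of Example 7.3.1 below:
x<-as.matrix(bp[,c("spd","dpd")])
s<-sign(x)                      #s-tilde(X_ij); assumes no zero observations
Tj<-colSums(s)/sqrt(nrow(x))    #marginal sign statistics, approximately G(0,1)
sigmahat<-crossprod(s)/nrow(x)  #covariance estimate (7.9)
W<-drop(t(Tj)%*%solve(sigmahat)%*%Tj)
pchisq(W,df=ncol(x),lower.tail=FALSE) #compare to chi-square on J df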
Using the data to redefine the coordinate system may be used to address
the second problem (Randles, 1989; Oja and Randles, 2004).
Combine the components of T(X) to construct the statistic (7.10); here $G_J^{-1}(1 - \alpha, 0)$ denotes the $1-\alpha$ quantile of the $\chi^2_J$ distribution, with non-centrality parameter 0. Bickel (1965) discusses these (and other tests) in generality.
Example 7.3.1 Consider the data of Example 6.4.2. We test the null hypothesis that the joint distribution of systolic and diastolic blood pressure changes is symmetric about (0, 0), using Hotelling's $T^2$ and the two asymptotic tests that substitute signs and signed-ranks for data. This test is performed in R using
# For Hotelling and multivariate rank tests resp:
library(Hotelling); library(ICSNP)
cat(’\n One-sample Hotelling Test\n’)
HotellingsT2(bp[,c("spd","dpd")])
cat(’\n Multivariate Sign Test\n’)
rank.ctest(bp[,c("spd","dpd")],scores="sign")
cat(’\n Multivariate Signed Rank Test\n’)
rank.ctest(bp[,c("spd","dpd")])
P -values for Hotelling’s T 2 , the marginal sign rank test, and marginal
sign test, are 9.839 × 10−6 , 2.973 × 10−3 , and 5.531 × 10−4 .
Tables 7.1 and 7.2 contain attained levels and powers for one-sample multi-
variate tests with two manifest variables of nominal level 0.05, from various
distributions.
Tests compared are Hotelling’s T 2 tests, and test (7.10) applied to the
sign and signed-rank tests. Tests have close to their nominal levels, except
for Hotelling’s test with the Cauchy distribution; furthermore, the agreement
is closer for sample size 40 than for sample size 20. Furthermore, the sign
test power is close to that of Hotelling’s test for Gaussian variables, and the
signed-rank test has attenuated power. Both nonparametric tests have good
power for the Cauchy distribution, although Hotelling’s test performs poorly,
and both perform better than Hotelling’s test for Laplace variables.
A few of the data sets simulated to create Tables 7.1 and 7.2 yield a singular
estimate of Υ. Care must be taken to avoid numerical difficulties;
in such cases, p-values are set to 1.
larger than that observed. In this way, the analysis of the previous subsection
for the sign test, and by extension the signed-rank test, can be extended to
general rank tests, including tests using the data values themselves as scores.
$$I = \{\mu \mid W(X - 1_n \otimes \mu) \le \chi^2_{J,\alpha}\},$$
Example 7.4.1 Recall again the blood pressure data set of Exam-
ple 6.4.2. Figure 7.2 is generated by
library(MultNonParam); shiftcr(bp[,c("dpd","spd")])
and exhibits the 0.05 contour of p-values for the multivariate test con-
structed from signed-rank tests for each of systolic and diastolic blood pres-
sure; this contour forms a 95% confidence region. Note the lack of convexity.
[Figure 7.2: 95% confidence region from inversion of the bivariate signed-rank test, plotted as diastolic (horizontal axis) versus systolic (vertical axis) blood pressure change.]
have the same distribution for i ≤ M1 , and that the vectors (Xi1 , . . . , XiJ )
all have the same distribution for i > M1 . Let g be a vector indicating group
membership; gi = 1 if i ≤ M1 , and gi = 2 if i > M1 . As in §7.1, consider
testing and confidence interval questions.
$$\mathcal{X}^* = \{(X, g) \mid g \text{ has } M_1 \text{ entries equal to } 1 \text{ and } M_2 \text{ equal to } 2\};$$
p-values are calculated by counting the number of such statistics with values
equal to or exceeding the observed value, and dividing the count by $\binom{M_1+M_2}{M_1}$.
Other marginal statistics may be combined; for example, one might use
the max t-statistic, defined by first calculating univariate t-statistics for each
manifest variable, and reporting the maximum. This statistic is inherently
one-sided, in that it presumes an alternative under which each marginal distribu-
tion for the second group is systematically larger than that of the first group.
Alternatively, one might take the absolute values of the t-statistics before maxi-
mizing. One might perform a parallel analysis with either the maximum of the
Wilcoxon rank-sum statistics or the maximum of their absolute values, after
shifting the statistics to have null expectation zero.
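As a minimal sketch, the maximum of the centered Wilcoxon rank-sum statistics might be computed as follows; X, g, and maxwil are illustrative names, with X an n x J data matrix and g a vector of group labels 1 and 2.
#Maximum centered rank-sum statistic over J manifest variables.
maxwil<-function(X,g){
   M1<-sum(g==1); n<-nrow(X)
   #Rank sums for group 1, shifted by the null expectation M1*(n+1)/2.
   max(apply(X,2,function(x) sum(rank(x)[g==1])-M1*(n+1)/2))
}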
Finally, one might apply permutation testing to the statistic (7.3), calcu-
lated on ranks instead of data values, to make the statistic less sensitive to
extreme values.
Example 7.5.1 Consider the data on wheat yields, in metric tons per
hectare (Cox and Snell, 1981, Set 5), reflecting yields of six varieties of
wheat grown at ten different experimental stations, from
http://stat.rutgers.edu/home/kolassa/Data/set5.data .
Two of these varieties, Huntsman and Atou, are present at all ten sta-
tions, and so the analysis will include only these. Stations are included
from three geographical regions of England; compare those in the north
to those elsewhere. The standard Hotelling two-sample test may be per-
formed in R using
wheat<-as.data.frame(scan("set5.data",what=list(variety="",
y0=0,y1=0,y2=0,y3=0,y4=0,y5=0,y6=0,y7=0,y8=0,y9=0),
na.strings="-"))
# Observations are represented by columns rather than by
# rows. Swap this. New column names are in first column.
dimnames(wheat)[[1]]<-wheat[,1]
wheat<-as.data.frame(t(wheat[,-1]))
dimnames(wheat)[[1]]<-c("E1","E2","N3","N4","N5","N6","W7",
"E8","E9","N10")
wheat$region<-factor(c("North","Other")[1+(
substring(dimnames(wheat)[[1]],1,1)!="N")],
c("Other","North"))
attach(wheat)
plot(Huntsman,Atou,pch=(region=="North")+1,
main="Wheat Yields")
legend(6,5,legend=c("Other","North"),pch=1:2)
Data are plotted in Figure 7.3. The normal-theory p-value for testing
equality of the bivariate yield distributions in the two regions is given by
library(Hotelling)#for hotelling.test
print(hotelling.test(Huntsman+Atou~region))
The results of hotelling.test must be explicitly printed, because the
function codes the results as invisible, and so results won’t be printed
otherwise. The p-value is 0.0327. Comparing this to the univariate results
t.test(Huntsman~region);t.test(Atou~region)
gives two substantially smaller p-values; in this case, treatment as a multi-
variate distribution did not improve statistical power. On the other hand,
the normal quantile plot for Atou yields shows some lack of normality.
Outliers do not appear to be present in these data, but if they were,
they could be addressed by performing the analysis on ranks, either using
asymptotic normality:
cat(’Wheat rank test, normal theory p-values’)
print(hotelling.test(rank(Huntsman)+rank(Atou)~region))
or using permutation testing to avoid the assumption of multivariate nor-
mality:
#Brute-force way to get estimate of permutation p-value for
#both T2 and the max t statistic.
cat(’Permutation Tests for Wheat Data, Brute Force’)
obsh<-hotelling.test(Huntsman+Atou~region)$stats$statistic
obst<-max(c(t.test(Huntsman~region)$statistic,
t.test(Atou~region)$statistic))
out<-array(NA,c(1000,2))
dimnames(out)<-list(NULL,c("Hotelling","t test"))
for(j in seq(dim(out)[[1]])){
newr<-sample(region,length(region))
hto<-hotelling.test(Huntsman+Atou~newr)
out[j,1]<-hto$stats$statistic>=obsh
out[j,2]<-max(t.test(Huntsman~newr)$statistic,
t.test(Atou~newr)$statistic)>=obst
}
apply(out,2,mean)
giving permutation p-values for the Hotelling and max-t statistics of 0.023
and 0.003, respectively. The smaller max-t p-value reflects the strong
association between variety yields across stations. If one wants only the
Hotelling statistic significance via permutation, one could use
print(hotelling.test(Huntsman+Atou~region,perm=T,
progBar=FALSE))
The argument progBar controls whether a progress bar is printed, and an addi-
tional argument controls the number of random permutations.
[Figure 7.3: Scatter plot of Huntsman versus Atou wheat yields, in metric tons per hectare, with plotting symbols distinguishing North from Other regions.]
Example 7.5.2 Consider again the wheat yield data of Example 7.5.1.
Asymptotic nonparametric testing is performed using
library(ICSNP)#For rank.ctest and HotellingsT2.
rank.ctest(cbind(Huntsman,Atou)~region)
rank.ctest(cbind(Huntsman,Atou)~region,scores="sign")
detach(wheat)
to obtain a multivariate version of Mood’s median test.
Alternate syntax for rank.ctest consists of calling it with two argu-
ments corresponding to the two data matrices.
7.6 Exercises
1. The data set
http://lib.stat.cmu.edu/datasets/cloud
contains data from a cloud seeding experiment. The first fifteen lines
contain comment and label information; ignore these. The second
field contains the character S for a seeded trial, and U for unseeded.
a. The fourth and fifth fields represent rainfalls in two target areas. Test
the null hypothesis that the bivariate distribution of observations
after seeding is the same as that without seeding. Use the marginal
rank sum test.
b. Repeat part (a) using the permutation version of Hotelling’s test.
8
Density Estimation
This chapter considers the task of estimating a density from a sample of inde-
pendent and identically distributed observations. Previous chapters began with a review
of parametric techniques. Parametric techniques for density estimation might
involve estimating distribution parameters, and reporting the parametric den-
sity with estimates plugged in; this technique will not be further reviewed in
this volume.
8.1 Histograms
The most elementary approach to this problem is the histogram, which rep-
resents the density as a bar chart.
The bar chart represents the frequencies of a categorical variable's values
in various categories as the heights of bars. When the categories are
ordered, one places the bars in the same order. To construct a histogram for
a continuous random variable, then, coarsen the continuous variable into a
categorical variable, whose categories are subsets of the range of the original
variable. Construct a bar plot for this categorical variable, again, with bars
ordered according to the order of the intervals.
Because the choice of intervals is somewhat arbitrary, the boundary
between bars is deemphasized by making the bars butt up against their neigh-
bors. The most elementary version of the histogram takes the height of each bar
to be the number of observations in the interval. A more sophisticated analysis
makes the height of the bar represent the proportion of observations in the
bar, and a still more sophisticated representation makes the area of the bar
equal to the proportion of observations in the bar; this allows bars of unequal
width while keeping the area under any portion of the curve approximately equal
to the proportion of observations in that region. Unequally-sized bars are unusual.
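In R, for example, the freq argument to hist selects between the count and proportion conventions; a minimal sketch, with an illustrative data vector:
#With freq=FALSE, bar heights are scaled so that total bar area is 1,
#making the histogram a crude density estimate.
x<-rexp(100) #illustrative data
hist(x,freq=FALSE,main="Histogram on the density scale")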
Construction of a histogram, then, requires selection of bar width and bar
starting point. One generally chooses the end points of intervals generating the
bars to be round numbers. A deeper question is the number of such intervals.
An early argument (Sturges, 1926) involved determining the largest number
of intervals so that, if the data were in proportion to a binomial distribution, every interval
would be expected to contain at least one observation; this reasoning yields
approximately $1 + \log_2(n)$ intervals.
The quality of the histogram depends primarily on the width of the bin ∆n ;
the choice of tn is less important. Scott (1979) takes tn = 0, and, using Taylor
approximation techniques within the interval, shows that the integrated mean
squared error of the approximation is
$$\frac{1}{n\Delta_n} + \frac{1}{12}\Delta_n^2 \int_{-\infty}^{\infty} f'(x)^2\,dx + O(1/n + \Delta_n^3). \qquad (8.2)$$
The first term in (8.2) represents the variance of the estimator, and the second
term represents the square of the bias. Minimizing the sum of the first two
terms gives the optimal bin size $\Delta_n^* = [6/\int_{-\infty}^{\infty} f'(x)^2\,dx]^{1/3}\, n^{-1/3}$. One might
approximate $\int_{-\infty}^{\infty} f'(x)^2\,dx$ by its value for the Gaussian density with
variance matching that of the data, to obtain $\Delta_n^* = 3.49\,s\,n^{-1/3}$, for $s$ the
standard deviation of the sample.
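A minimal sketch of this Gaussian-reference rule, computing the bin width from the sample standard deviation (the data here are illustrative):
#Gaussian-reference bin width 3.49*s*n^(-1/3).
x<-rnorm(200)
delta<-3.49*sd(x)*length(x)^(-1/3)
brks<-seq(min(x),max(x)+delta,by=delta) #breaks spanning the data
hist(x,breaks=brks,freq=FALSE)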
Example 8.2.1 Refer again to the nail arsenic data from Example 2.3.2.
Figure 8.1 displays kernel density estimates with a default bandwidth and
for a variety of kernels. This figure was constructed using
cat(’\n Density estimation \n’)
attach(arsenic)
#Save the density object at the same time it is plotted.
plot(a<-density(nails),lty=1,
main="Density for Arsenic in Nails for Various Kernels")
lines(density(nails,kernel="epanechnikov"),lty=2)
lines(density(nails,kernel="triangular"),lty=3)
lines(density(nails,kernel="rectangular"),lty=4)
legend(1,2, lty=rep(1,4), legend=c("Normal","Quadratic",
"Triangular","Rectangular"))
Note that the rectangular kernel is excessively choppy, and fits the poorest.
Generally, any other symmetric density with finite variance may be
used. The parameter $\Delta_n$ is called the bandwidth. The bandwidth should
depend on the spread of the data and on $n$; the spread of the data might be described
using the standard deviation or interquartile range, or, less reliably, the sample
range.
The choice of bandwidth balances the variance and bias of the den-
sity estimator, just as the choice of bin width did for the histogram. If the
bandwidth is too high, the density estimate will be too smooth, and will hide fea-
tures of the data. If the bandwidth is too low, the density estimate will provide
too much clutter to make understanding the distribution possible.
We might consider minimizing the mean squared error for x fixed, rather
than after integration. That is, choose ∆n to minimize the mean square error
of the estimate,
$$\mathrm{MSE}[\hat f(x)] = \mathrm{Var}\left[\hat f(x)\right] + \left(\mathrm{E}\left[\hat f(x)\right] - f(x)\right)^2.$$
[Figure 8.1: Kernel density estimates of nail arsenic concentration with the default bandwidth, using Normal, Quadratic, Triangular, and Rectangular kernels; density plotted against nail arsenic.]
Example 8.2.2 Refer again to the arsenic nail data of Examples 2.3.2
and 8.2.1. The kernel density estimate, using the suggested bandwidth,
and bandwidths substantially larger and smaller than the optimal, are
given in Figure 8.2. The excessively small bandwidth follows separate
data points closely, but obscures the way they cluster together. The exces-
sively large bandwidth wipes out all detail from the data set. Note the
large probability assigned to negative arsenic concentrations by the estimate
with the excessively large bandwidth. This plot was drawn in R using
plot(density(nails,bw=a$bw/10),xlab="Arsenic",type="l",
main="Density estimate with inopportune bandwidths")
lines(density(nails,bw=a$bw*5),lty=2)
legend(1,2,legend=paste("Band width default",c("/10","*5")),
lty=c(1,2))
detach(arsenic)
and the object a is the kernel density estimate with the default bandwidth,
constructed in Example 8.2.1.
Silverman (1986, §3.3) discusses a more general kernel w(t). Equations (8.5)
and (8.6) hold, with
$$C_1(x) = f(x)\int_{-\infty}^{\infty} w(t)^2\,dt, \qquad C_2(x) = \tfrac{1}{2} f''(x)\int_{-\infty}^{\infty} t^2 w(t)\,dt,$$
and (8.8) continues to hold. Sheather and Jones (1991) further discuss the
constants involved.
Epanechnikov (1969) demonstrates that (8.3) extends to multivariate dis-
tributions; estimate the multivariate density f (x) from independent and iden-
tically distributed observations X1 , . . . , Xn , where Xi = (Xi1 , . . . , Xid ), using
$$\hat f(x) = n^{-1}\left(\prod_{j=1}^{d} \Delta_{nj}\right)^{-1} \sum_{i=1}^{n} w\left((X_{i1}-x_1)/\Delta_{n1}, \ldots, (X_{id}-x_d)/\Delta_{nd}\right).$$
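A minimal sketch of this estimator for d = 2, using a product of Gaussian kernels; kde2, X, and delta are illustrative names:
#Product-kernel density estimate at point x; X: n x 2 data matrix,
#delta: vector of two bandwidths.
kde2<-function(x,X,delta)
   mean(dnorm((X[,1]-x[1])/delta[1])*
        dnorm((X[,2]-x[2])/delta[2]))/prod(delta)
#Example: kde2(c(0,0),cbind(rnorm(50),rnorm(50)),c(.5,.5))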
[Figure 8.2: Kernel density estimates of nail arsenic with inopportune bandwidths: the default bandwidth divided by 10 and multiplied by 5.]
8.3 Exercises
1. The data at
http://lib.stat.cmu.edu/datasets/CPS 85 Wages
reflects wages from 1985. The first 27 lines of this file represent an
explanation of variables; delete these lines first, or skip them when
you read the file. The first six fields are numeric. The sixth is hourly
wage; you can skip everything else. Fit a kernel density estimate to
the distribution of wage, and plot the result. Comment on what you
see.
2. The data at
http://lib.stat.cmu.edu/datasets/cloud
contain data from a cloud seeding experiment. The first fifteen lines
contain comment and label information; ignore these. The third field
indicates season, and the sixth field represents a rainfall measure
expected not to be impacted by the experimental intervention. On
the same set of axes, plot kernel density estimates of summer and
winter rain fall, and comment on the difference.
9
Regression Function Estimates
$$Y_j = g(X_j) + \epsilon_j, \qquad (9.1)$$
for $X$ the $n \times K$ matrix with rows given by $X_i^\top$, \qquad (9.4)
and Y is the column vector with entries Yi . The estimator (9.3) is defined only
if X is of full rank, or, in other words, if the inverse in (9.3) exists. Otherwise,
the model is not identifiable, in that two different parameter vectors β and
β ∗ give the same fitted values, or Xβ = Xβ ∗ .
When the errors $\epsilon_j$ have a Gaussian distribution, the vector $\hat\beta$ has a
multivariate Gaussian distribution, exactly, and
$$(\hat\beta_i - \beta_i)/\sqrt{s^2 v_i} \sim T_{n-K}, \qquad (9.5)$$
for $s^2 = \sum_{i=1}^{n}(Y_i - \hat\beta^\top X_i)^2/(n-K)$, and $v_i$ the entry in row and column
$i$ of $(X^\top X)^{-1}$. When the errors $\epsilon_j$ are not Gaussian, under certain circum-
stances this distribution continues to hold approximately.
The weight function can be the same as was used for kernel density estimation.
This weight function is often a Gaussian density, or a uniform density centered
at 0. Fan (1992) discusses a local regression smoother
$$\hat g(x) = \sum_{\ell=0}^{L} \hat\beta_\ell x^\ell, \qquad (9.7)$$
for $L = 1$ and
$$\hat\beta = \operatorname{argmin}_{\beta} \sum_{j=1}^{n} \left(Y_j - \sum_{\ell=0}^{L} \beta_\ell X_j^\ell\right)^2 w\left((x - X_j)/\Delta_n\right), \qquad (9.8)$$
and argues that this estimator has smaller bias than (9.6). Köhler et al. (2014)
present considerations for bandwidth selection.
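As a minimal sketch of (9.7) and (9.8) with L = 1, the local linear estimate at a single point may be obtained by weighted least squares; loclin, delta, and the Gaussian weight are illustrative choices:
#Local linear smoother at x0, with weights w((x0-x)/delta).
loclin<-function(x0,x,y,delta){
   w<-dnorm((x0-x)/delta)
   fit<-lm(y~x,weights=w) #weighted least squares
   unname(predict(fit,newdata=data.frame(x=x0)))
}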
Example 9.2.1 Example 2.3.2 presents nail arsenic levels from a sam-
ple; arsenic levels in drinking water were also recorded, and one can inves-
tigate the dependence of nail arsenic on arsenic in water. The data may
be plotted in R using
attach(arsenic)
plot(water,nails,main="Nail and Water Arsenic Levels",
xlab="Water",ylab="Nails",
sub="Smoother fits. Bandwidth chosen by inspection.")
Kernel smoothing may be performed in R using the function ksmooth.
Bandwidth was chosen by hand.
lines(ksmooth(water,nails,"normal",bandwidth=0.10),lty=1)
lines(ksmooth(water,nails,"box",bandwidth=0.10),lty=2)
legend(0.0,2.0,lty=1:3, legend=c("Smoother, Normal Kernel",
"Smoother, Box Kernel", "Local Polynomial"))
The function locpoly from library KernSmooth applies a local regression
smoother:
library(KernSmooth)
lines(locpoly(water,nails,bandwidth=.05,degree=1),lty=3)
detach(arsenic)
Results are given in Figure 9.1. The box kernel performs poorly; a sample
this small necessarily undermines smoothness of the box kernel fit. Per-
haps the normal smoother is under-smoothed, but not by much. Library
KernSmooth contains a tool dpill for automatically selecting bandwidth.
The documentation for this function indicates that it sometimes fails,
and, in fact, dpill failed in this case. This local regression smoother
ignored the point with the largest values for each variable, giving the curve
a concave rather than convex shape.
These data might have been jointly modeled on the square-root scale,
to avoid issues relating to the distance of the point with the largest values
for each variable from the rest of the data. An exercise suggests exploring
this; in this case, the automatic bandwidth selector dpill returns a value.
Figure 9.2 demonstrates the results of selecting a bandwidth too small.
In this case, the Gaussian kernel gives results approximately constant
in the neighborhood of each data point, and the box kernel result is not
defined for portions of the domain, because both numerator and denomi-
nator in (9.6) are zero.
[Figure 9.1: Nail arsenic versus water arsenic, with smoother fits (normal kernel, box kernel, and local polynomial); bandwidth chosen by inspection.]
[Figure 9.2: Nail arsenic versus water arsenic, with normal and box kernel smoother fits; bandwidth chosen too small, so the box kernel fit is undefined over portions of the domain.]
Example 9.2.2 Return again to the arsenic data previously examined
in Example 9.2.1, this time fit with the loess smoother. The solution using
the default span (the proportion of the data used, 0.75) appears to under-smooth
the data, and raising this parameter above 1 reduces curvature, but not by much.
Again, plot the data in R using
attach(arsenic)
plot(water,nails,main="Nail and Water Arsenic Levels",
xlab="Water",ylab="Nails", sub="Loess fits")
x<-seq(min(water),max(water),length.out=101)#Grid at which to plot fits
lines(x,y=predict(loess(nails~water,span=1),x),lty=2)
[Figure 9.3: Loess fits to the arsenic data with spans 0.75 (default), 1, and 10.]
9.4 Splines
A spline is a smooth curve approximating the relationship between an explana-
tory variable x and a response variable y, based on observed pairs of points
(X1 , Y1 ), . . . , (Xn , Yn ), constructed according to the following method, in order
to describe the dependence of y on x between two points x0 and xN . One first
picks N − 1 intermediate points x1 < x2 < · · · < xN −2 < xN −1 . The interme-
diate points are called knots. One then determines a polynomial of degree M
between xj−1 and xj , constrained so that the derivatives of order up to M − 1
match up at knots. Denote the fitted mean by ĝ(x).
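As a minimal sketch of this construction, a regression spline with cubic pieces (M = 3) and prespecified knots may be fit using the splines package; the knot locations and data below are illustrative:
#Cubic regression spline with interior knots at .25, .5, .75.
library(splines)
x<-runif(50); y<-sin(2*pi*x)+rnorm(50,sd=.2)
fit<-lm(y~bs(x,knots=c(.25,.5,.75),degree=3))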
[Figure 9.4: Isotonic regression fit of nail arsenic on water arsenic.]
Taken to an extreme, if all Xj are unique, then one can fit all n points with
a polynomial of degree n − 1; this, however, will yield a fit with unrealistically
extreme fluctuations. Instead, choose the polynomials to minimize
$$\sum_{j=1}^{n}\left(Y_j - \hat g(X_j)\right)^2 + \lambda \int_{X_{(1)}}^{X_{(n)}} \left|\hat g''(x)\right|\,dx;$$
Example 9.4.1 Again revisit the arsenic data of Example 9.2.1. The
spline is fit using R as
attach(arsenic)
plot(water,nails, main="Nail Arsenic and Water Arsenic")
hgrid<-min(water)+(0:50)*diff(range(water))/50
lines(predict(spl<-smooth.spline(water,nails),hgrid))
lines(predict(smooth.spline(water,nails,spar=.1),hgrid),
lty=3)
lines(predict(smooth.spline(water,nails,spar=.5),hgrid),
lty=2)
legend(1250,1110,lty=1:3,col=1:3,
legend=paste("Smoothing",c(round(spl$spar,3),.5,.1),
c("Default","","")))
detach(arsenic)
Here hgrid is a set of points at which to calculate the spline fit. As with
most of the smoothing methods, R contains a lines method that will plot
predicted values directly. See Figure 9.5. Smoothing parameters smaller
than optimal yield over-fitted curves.
[Figure 9.5: Smoothing spline fits to the arsenic data with smoothing parameters 0.934 (default), 0.5, and 0.1.]
When comparing (9.11) with the standard least-squares criterion (9.3), note
the absence of the square in (9.11).
Equivalently, one might minimize
$$\sum_j \left(e_j^+ + e_j^-\right), \qquad (9.12)$$
for
$$e_j^+ \ge 0, \quad e_j^- \ge 0, \quad e_j^+ - e_j^- = Y_j - \beta_1 - \beta_2 X_j \quad \forall j. \qquad (9.13)$$
The objective function is not differentiable; the optimum of (9.12) occurs
either at a kink point or along a flat segment, and in the latter case the
optimizer is not unique.
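Because (9.12) and (9.13) together form a linear program, the fit may be computed with a generic linear programming solver; the following is a minimal sketch, assuming the lpSolve package, with l1fit an illustrative name:
#L1 regression as a linear program: minimize sum(e+ + e-) subject to
#beta1+beta2*x_j+e+_j-e-_j=y_j; free betas split into +/- parts.
library(lpSolve)
l1fit<-function(x,y){
   n<-length(y)
   obj<-c(0,0,0,0,rep(1,2*n)) #costs only on the e's
   con<-cbind(1,-1,x,-x,diag(n),-diag(n))
   sol<-lp("min",obj,con,rep("=",n),y)
   c(intercept=sol$solution[1]-sol$solution[2],
     slope=sol$solution[3]-sol$solution[4])
}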
[Figure 9.6: Sum of absolute residual values as a function of the regression slope parameter, with the intercept fixed at its optimum.]
TABLE 9.1: Achieved coverage and average interval length for 90% confidence
intervals for L1 and L2 regression, with various error distributions
ables with the smaller sample size, and longer for Cauchy variables and for
Laplace and exponential variables with the larger sample size.
One can adapt this technique to fit quantiles of the errors other than the
median. One can replace the objective (9.12) by
$$\gamma \sum_j e_j^+ + (1-\gamma)\sum_j e_j^-, \qquad (9.14)$$
still subject to constraints (9.13). This change causes the regression line to
run through the 1 − γ quantile of Y |X.
Example 9.5.1 Consider again the arsenic data of Example 9.2.1. The
quantile regression fitter is found in R in library quantreg. The following
commands fit this model:
library(quantreg)#Need for rq
rq(arsenic$nails~arsenic$water)
to give the results
Call: rq(formula = nails ~ water)
Coefficients:
coefficients lower bd upper bd
(Intercept) 0.11420 0.10254 0.14093
water 15.60440 3.63161 18.07802
Hence the best estimate of the slope of the linear relationship of arsenic
in nails to arsenic in water is 15.604, with 90% confidence interval
(3.632,18.078). This confidence level is the default for rq. Figure 9.6
is drawn using
attach(arsenic)
rqo<-summary(rq(nails~water))$coef
regp<-rqo[2,2]+diff(rqo[2,2:3])*(0:100)/100
m<-apply(abs(outer(nails,rep(1,length(regp)))
- outer(water,regp)-rqo[1,1]),2,sum)
plot(regp,m,type="l",xlab="Regression Parameter",
ylab="Sum of absolute values of residuals",
main="L1 fit for Arsenic Large Scale",
sub="Intercept fixed at optimum")
detach(arsenic)
and demonstrates the estimation of the slope; in this figure the intercept
is held at its optimal value.
Compare this to the Theil-Sen estimator:
attach(arsenic)
library(deming); theilsen(nails~water,conf=.9)$coef
yielding
(Intercept) water
0.1167158 14.2156504
If the linear model fits, and if the variance of the errors does not depend on
the explanatory variable, then the lines representing various quantiles of the
distribution of the response conditional on the explanatory variable will be
parallel, but this parallelism need not hold for the estimates in any particular
data set, as can be seen in the next example.
Example 9.5.2 Consider the blood pressure data set of Example 6.4.2.
Quantile regression can be used to model the median systolic blood pres-
sure after treatment in terms of systolic blood pressure before treatment.
#Expect warnings about nonunique estimators. This is OK.
rqout<-rq(bp$spa~bp$spb); summary(rqout)
yielding
coefficients lower bd upper bd
(Intercept) -11.41026 -34.76720 29.00312
bp$spb 0.94872 0.69282 1.11167
Hence blood pressure after treatment increases with blood pressure before
treatment; the best estimate indicates that a one-for-one increase is plau-
sible, in that 1 sits inside the confidence interval. One can also fit the .2
quantile:
rqoutt<-rq(bp$spa~bp$spb,tau=.2); summary(rqoutt)
to obtain
Coefficients:
coefficients lower bd upper bd
(Intercept) 17.25000 -68.48262 23.34294
bp$spb 0.75000 0.40129 1.20640
One can plot these relationships:
plot(bp$spb,bp$spa,main="Systolic Blood Pressure",
xlab="Before Treatment",ylab="After Treatment")
abline(rqout);abline(rqoutt,lty=2)
legend(150,200,legend=c("Median Regression",
".2 Quantile Regression"),lty=1:2)
Results are plotted in Figure 9.7. Note the extreme lack of parallelism.
[Figure 9.7: Systolic blood pressure after versus before treatment, with median and 0.2 quantile regression lines.]
The Huber, median regression, and ordinary least-squares fits to the original data may be compared:
library(MASS); rlmout<-rlm(spa~spb,data=bp)
library(quantreg); rqout<-rq(spa~spb,data=bp)
plot(bp$spb,bp$spa,
main="Blood Pressure After and Before Treatment",
xlab="Systolic Blood Pressure Before Treatment (mm HG)",
ylab="Systolic Blood Pressure After Treatment (mm HG)",
sub="Linear Fits to Original Data")
abline(rlmout,lty=2)
abline(rqout,lty=3)
abline(lm(spa~spb,data=bp),lty=4)
legend(min(bp$spb),max(bp$spa),
legend=c("Huber","Median","OLS"),lty=2:4)
Compare this with results for a contaminated data set. Change the
response variable for the first observation to an outlier, and refit.
bp$spa[1]<-120
rqout<-rq(bp$spa~bp$spb);rlmout<-rlm(bp$spa~bp$spb)
plot(bp$spb,bp$spa,
xlab="Systolic Blood Pressure Before Treatment (mm HG)",
ylab="Systolic Blood Pressure After Treatment (mm HG)",
sub="Linear Fits to Perturbed Data")
abline(rlmout,lty=2);abline(rqout,lty=3);
abline(lm(spa~spb,data=bp),lty=4)
legend(min(bp$spb),max(bp$spa),
legend=c("Huber","Median","OLS"),lty=2:4)
These fits are shown in Figure 9.8. The median regression line is slightly
different in the two plots, although if one does not look closely one might
miss this. The least squares fit is radically different as a result of the shift
in the data point. The Huber fit is moved noticeably, but not radically.
FIGURE 9.8: Blood Pressure after and before Treatment, Original and Per-
turbed Data
9.7 Exercises
1. The data at
http://lib.stat.cmu.edu/datasets/CPS 85 Wages
reflects wages from 1985. The first 42 lines of this file contain a
description of the data set, and an explanation of variables; delete
these lines first, or skip them when you read the file. The first six
fields are numeric. The first is educational level, and the sixth is
hourly wage; you can skip everything else.
a. Fit a relationship of wage on educational level, in such a way as
to minimize the sum of absolute value of residuals. Plot this, and
compare with the regular least squares regression.
b. Fit a relationship of wage on educational level, in such a way as
to enforce monotonicity. Superimpose a plot of this relationship on
a plot of the data.
c. Fit a kernel smoother to estimate the dependence of wage on edu-
cational level, and compare it to the results for the loess smoother.
d. Fit a smoothing spline to estimate the dependence of wage
on educational level, and compare it to the results for the loess
smoother and the kernel smoother.
2. Repeat the analysis of Example 9.2.1 for arsenic levels on the square
root scale. As an intermediate step, calculate the optimal bandwidth
for the local linear smoother. Plot your results. Compare with the
results in Figure 9.1.
10
Resampling Techniques
tions about the parametric shape of its distribution. Let G(θ̂; F ) represent
the desired, but unobservable, distribution of θ̂ computed from independent
random vectors Z1 , . . . , Zn , with each random vector Zi having a distribu-
tion function F . Assume that this distribution depends on F only via the
parameter of interest, so that one might write G(θ̂; F ) = H(θ̂; θ) for some
function H.
Consider constructing confidence intervals using the argument of §1.2.2.1.
Using (1.18) with T = θ̂, a 1 − α confidence interval for θ satisfies H(θ̂; θL ) =
1 − α/2 and H(θ̂; θU ) = α/2. The function H, as a function of θ̂, is unknown,
and will be estimated from the data. Since observed data all arise from a
distribution governed by a single value of θ, the dependence of H(θ̂; θ) on
θ cannot be estimated. An assumption is necessary in order to produce a
confidence interval. Assume that
H is to θ as H ∗ is to θ̂. (10.2)
Then the standard deviation estimate $\hat\varsigma$ is the sample standard deviation of the
bootstrap samples (10.5) (Efron, 1981), and a $1-\alpha$ confidence interval is
$\hat\theta \pm \hat\varsigma z_{1-\alpha/2}$.
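A minimal sketch, with theta.hat the estimate and theta.star an illustrative vector of bootstrap replicates:
#Normal-theory bootstrap interval theta-hat +/- sd(replicates)*z.
normal.ci<-function(theta.hat,theta.star,alpha=.05)
   theta.hat+c(-1,1)*qnorm(1-alpha/2)*sd(theta.star)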
Then
$$P^*\left[u_L - \hat\theta \le \theta_{B,i} - \hat\theta \le u_U - \hat\theta\right] = 1-\alpha, \qquad (10.7)$$
leading to the interval
$$\hat\theta - v_U \le \theta \le \hat\theta - v_L.$$
This is the basic bootstrap confidence interval (Davison and Hinkley, 1997, p. 29).
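A minimal sketch of the basic interval, reflecting the empirical quantiles of the replicates about the estimate (names illustrative):
#Basic bootstrap interval (2*theta-hat - upper, 2*theta-hat - lower).
basic.ci<-function(theta.hat,theta.star,alpha=.05){
   q<-quantile(theta.star,c(alpha/2,1-alpha/2))
   unname(c(2*theta.hat-q[2],2*theta.hat-q[1]))
}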
This method is referred to as the percentile method (Efron, 1981), or as the
usual method (Shao and Tu, 1995, p. 132f). If the parameter $\theta$ is transformed
to a new parameter $\vartheta$ by a monotonic transformation, then the bootstrap
samples transform in the same way, and so the percentile method is valid whenever
some transformation to a symmetric distribution exists, regardless of
whether one knows or can apply this transformation.
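The percentile interval uses the replicate quantiles directly; a minimal sketch (names illustrative):
#Percentile bootstrap interval.
percentile.ci<-function(theta.star,alpha=.05)
   quantile(theta.star,c(alpha/2,1-alpha/2))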
Example 10.2.2 Return again to the nail arsenic values of the previous
example. We again generate a confidence interval for the median. The
BCa method, and the previous two methods, may be obtained using the
package boot.
library(boot)#gives boot and boot.ci.
#Define the function to be applied to data sets to get
#the parameter to be bootstrapped.
boot.ci(boot(arsenic$nails,function(x,index)
return(median(x[index])),9999))
to give
In this case, the normal and percentile intervals are suspect, because of
the asymmetry of the distribution. The more reliable interval is the bias
corrected and accelerated interval.
Recall that an exact confidence interval may be constructed using
library(MultNonParam); exactquantileci(arsenic$nails)
to obtain the interval (0.118, 0.354). Efron (1981) notes that this exact
interval will generally agree closely with the percentile bootstrap approach.
One can use the bootstrap to generate intervals for more complicated statis-
tics. The bootstrap techniques described so far, except for the percentile
method, presume that parameter values over the entire real line are possi-
ble. One can account for this restriction through transformation. For example, with
logscale<-function(x,index) return(log(sd(x[index])))#log of sample SD
the call
boot.ci(boot(arsenic$nails,logscale,99999))$bca
gives the BCa 0.95 interval (-1.647,-0.096) for the log of the standard devi-
ation of nail arsenic.
FIGURE 10.1: Bootstrap Samples for Log of Nail Arsenic Standard Deviation
Pearson correlation between first and second twin brain volumes. First,
define the correlation function
rho<-function(d,idx) return(cor(d[idx,1],d[idx,2]))
and calculate the bootstrap samples
bootout<-boot(brainpairs[,c("v1","v2")],rho,9999)
for pairs of data. Figure 10.2 presents a histogram of the sample, with
vertical lines marking the estimate and percentile confidence interval end
points:
hist(bootout$t,freq=FALSE)
legend(-.4,6,lty=1:2,
legend=c("Estimate","Confidence Bounds"))
abline(v=bootout$t0)
abline(v=sort(bootout$t)[(bootout$R+1)*c(.025,.975)],lty=2)
Confidence intervals may be calculated using
boot.ci(bootout)
to give
Level Normal Basic
95% ( 0.7463, 1.1092 ) ( 0.8516, 1.1571 )
confidence intervals for exp(θ) are not the exponentiation of Studentized boot-
strap confidence intervals for θ, and, furthermore, the quality of the Studen-
tized bootstrap confidence intervals depends on the effectiveness of Studenti-
zation.
Example 10.3.3 Refer again to the brain size data of Example 5.2.1.
In order to apply the fixed-X bootstrap, first calculate fitted values and
residuals:
residuals<-resid(lm(brainpairs$v2~brainpairs$v1))
fittedv<-brainpairs$v2-residuals
Example 10.3.4 Refer again to the brain size data of Example 5.2.1.
Modify the approach of Example 10.3.3:
Example 10.3.5 Refer again to the chicken weight gain data of Exam-
ple 5.4.2. Use fixed-X bootstrap techniques to bound the largest group
mean minus the smallest group mean, when data are split by protein
level.
meandiff<-function(data,indices){
fitv<-fitted(lm(data[,1]~as.factor(data[,2])))
residv<-data[,1]-fitv
y<-fitv+residv[indices]
return(diff(range(unlist(lapply(split(y,data[,2]),
"mean")))))
}
attach(chicken)
cat("Bootstrap CI for maximal difference in means\n")
boot.ci(boot(cbind(weight,lev),meandiff,9999))
Because the design is balanced, with an equal number of chickens at each
level of diet, no adjustment for different standard deviations of the resid-
uals is needed. Because the fixed-X bootstrap is employed, Studentization
of the resulting mean differences is not necessary. The resulting BCa
confidence interval is (85.6, 769.5).
A similar approach might have been employed for inference about quantities
like the R2 .
and
$$E[g(W_n)] \approx g(\omega) + g''(\omega)\mathrm{Var}[W_n]/2 + g'''(\omega)E\left[(W_n-\omega)^3\right]/6 + g''''(\omega)E\left[(W_n-\omega)^4\right]/24.$$
Then (10.10) holds, and (10.11) holds if the skewness of $W_n$ times $\sqrt{n}$ con-
verges to 0.
$$\bar T^*_{n-1} = \sum_{i=1}^{n}\sum_{j=1,\,j\ne i}^{n} X_j/(n(n-1)) = \bar X.$$
Hence
$$\bar T^*_{n-1} - T_n = \frac{n+1}{4n} X_{((n-1)/2)} + \frac{n+1}{4n} X_{((n+3)/2)} + \frac{n-1}{2n} X_{((n+1)/2)} - X_{((n+1)/2)}
= \frac{n+1}{4n}\left(X_{((n-1)/2)} + X_{((n+3)/2)} - 2X_{((n+1)/2)}\right), \qquad (10.12)$$
and the bias estimate is
$$B = (n-1)\left(\bar T^*_{n-1} - T_n\right) = \frac{n^2-1}{4n}\left(X_{((n-1)/2)} + X_{((n+3)/2)} - 2X_{((n+1)/2)}\right).$$
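A minimal sketch of this bias estimate for a general statistic; jack.bias is an illustrative name, and applied to the nail arsenic data with the median it should reproduce the value reported in Example 10.4.1:
#Jackknife bias: (n-1)*(mean of leave-one-out statistics - T_n).
jack.bias<-function(x,stat=median){
   n<-length(x)
   loo<-sapply(seq_len(n),function(i) stat(x[-i]))
   (n-1)*(mean(loo)-stat(x))
}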
Example 10.4.1 Consider again the nail arsenic data of Example 2.3.2.
Calculate the Jackknife estimate of bias for the mean, median, and
trimmed mean for these data. Jackknifing is done using the bootstrap
library, using the function jackknife.
library(bootstrap)#gives jackknife
jackknife(arsenic$nails,median)
This function produces the 21 values, each with one observation omitted:
$jack.values
[1] 0.2220 0.2220 0.2220 0.2220 0.1665 0.1665 0.2220 0.2220
[9] 0.1665 0.2220 0.2220 0.1665 0.1665 0.1665 0.1665 0.1665
[17] 0.1665 0.2220 0.1665 0.2220 0.2135
Each of these values is the median of 20 observations. Ten of them, corre-
sponding to the omission of one of the lowest ten values, are the average of $X_{(11)}$
and $X_{(12)}$. Ten of them, corresponding to the omission of one of the highest ten
values, are the average of $X_{(10)}$ and $X_{(11)}$. The last, corresponding to the
omission of the middle value, is the average of $X_{(10)}$ and $X_{(12)}$. The mean
of the jackknife observations is 0.1952. The sample median is 0.1750, and
the bias adjustment is 20 × (0.1952 − 0.1750) = 20 × 0.0202 = 0.404, as
is given by R:
$jack.bias
[1] 0.4033333
This bias estimate for the median seems remarkably large. From (10.12),
the jackknife bias estimate for the median is governed by the difference
between the middle value and the average of its neighbors. This data set
features an unusually large gap between the middle observation and the
one above it.
Applying the jackknife to the mean via
jackknife(arsenic$nails,mean)
gives the bias correction 0:
$jack.bias
[1] 0
as predicted above. One can also jackknife the trimmed mean:
jackknife(arsenic$nails,mean,trim=0.25)
The above mean is defined as the mean of the middle half of the data, with
0.25 proportion trimmed from each end. Although the conventional mean
is unbiased, this unbiasedness does not extend to the trimmed mean:
$jack.bias
[1] -0.02450216
In contrast to the arsenic example with an odd number of data points,
consider applying the jackknife to the median of the ten brain volume
differences, from Example 5.2.1:
attach(brainpairs);jackknife(diff,median)
to give 0 as the bias correction.
The average effects of such corrections may be investigated via simulation.
Table 10.2 contains the results of a simulation based on 100,000 random sam-
ples for data sets of size 11. In this exponential case, the jackknife bias cor-
rection over-corrects the median, but appears to address the trimmed mean
exactly.
Under some more restrictive conditions, one can also use this idea to esti-
mate the variance of T .
10.5 Exercises
1. The data set
http://ftp.uni-bayreuth.de/math/statlib/datasets/lupus
http://ftp.uni-bayreuth.de/math/statlib/datasets/federalistpapers.txt
(also at stat.rutgers.edu/home/kolassa/Data/federalistpapers.txt).
A
Analysis Using the SAS System
Example 2.3.3: The proc univariate call above gives intervals for a variety
of quantiles, including quartiles, but not for arbitrary user-selected quantiles.
The following code gives arbitrary quantiles. Manually edit the code below to
replace 21 with the actual number of observations, 0.025 with half the desired
complement of the confidence, and .75 with the desired quantile.
proc sort data=arsenic; by nails; run;
data ci; set arsenic;
a=quantile("binomial",.025,.75,21);
b=21+1-quantile("binomial",.025,1-.75,21);
if _N_=a then output; if _N_=b then output; run;
title ’Upper quartile .95 CIs for nail arsenic’;
proc print data=ci; run;
Example 3.4.1: This example applied the Wilcoxon rank sum test to investi-
gating differences between the strengths of yarn types. At present, analysis is
restricted to bobbin 3, since this data set did not have ties. These score tests
are calculated using proc npar1way , with the option wilcoxon. The first has
continuity correction turned off, with the option correct=no. Correction is on
by default, and so no such option appears in the second run, with correction.
Exact values are given using the exact statement, followed by the test whose
exact p-values are to be calculated.
title ’Wilcoxon Rank Sum Test, yarn strength by type, bobbin 3’;
title2 ’Approximate values, no continuity correction’;
proc npar1way data=yarn1 wilcoxon correct=no ;
class type; var strength; run;
title2 ’Approximate values, continuity correction’;
proc npar1way data=yarn1 wilcoxon ; class type;
var strength; run;
title2 ’Exact values’;
proc npar1way data=yarn1 wilcoxon ; class type;
exact wilcoxon; var strength; run;
Example 3.4.3: Permutation testing may be done using proc npar1way and
scores=data:
title ’Permutation test for Arsenic Data’;
proc npar1way data=arsenic scores=data ;
class sex; exact scores=data; var nails; run;
Tables 3.4 and 3.5: Test sizes and powers may be calculated using the macro
test2 in the file common.sas, as above. First and second arguments are the
group sizes. The third argument is the Monte Carlo sample size. The last
argument is the offset between groups.
title ’Monte Carlo Assessment of Test Sizes’;
%include "/folders/myfolders/common.sas";
Example 3.9.1: The npar1way procedure also performs the Siegel-Tukey and
Ansari-Bradley tests, using the keywords ab and st in the statement.
proc npar1way data=yarn ab st ; class type; var strength; run;
Examples 4.3.1 and 4.4.1: The Kruskal-Wallis test, and the variants using
the Savage and the van der Waerden scores, can be performed using proc
npar1way.
data maize; infile ’/folders/myfolders/T58.1’ ;
input exno tabno lineno loc $ block $ plot
treat $ ears weight;
nitrogen=substr(treat,1,1); if weight<0 then weight=.; run;
data tean; set maize; if loc="TEAN" then; else delete;
if weight=. then delete; run;
title ’Kruskal Wallis H Test for Maize Data’;
proc npar1way wilcoxon savage vw data=tean plots=none;
class treat; var weight; /* exact;*/ run;
The option exact is commented out above; depending on the configura-
tion of the example, exact computation can be very slow, and resource-
intensive enough to make the calculations fail. The option wilcoxon triggers
the Kruskal-Wallis test, since it uses Wilcoxon scores. These calculations may
be compared with the standard Analysis of Variance approach.
proc anova data=tean; class treat; model weight=treat; run;
Example 4.4.2: Here are two calls to npar1way that give in principle the same
answer. The exact command causes exact p-values to be computed. If you
specify scores (in this case data) in the exact statement, you’ll get that exact
p-value. If you don’t specify scores, you’ll get the exact p-value for the scores
specified in the proc npar1way statement. If you don’t specify either, you’ll
get ranks without scoring. As we saw before, exact computations are hard
enough that SAS quits. The second proc npar1way presents a compromise:
random samples from the exact distribution. Give the sample size you want
after /mc . The two calls give different answers, since one is approximate.
Successive calls will give slightly different answers.
Example 4.7.5: Here we test for an ordered effect of nitrogen, and compare
with the unordered version.
title ’Jonckheere-Terpstra Test for Maize’;
proc freq data=tean noprint; tables weight*nitrogen/jt;
output out=jttab jt; run;
proc print data=jttab noobs; run;
title ’K-W Test for Maize, to compare with JT’;
proc npar1way data=tean plots=none; class nitrogen;
var weight; run;
Examples 5.4.1 and 5.6.1: Perform a paired test on brain volume, assuming
symmetry. First, read the data:
data expensesd; infile ’/folders/myfolders/friedman.dat’;
input cat $ g1 g2 g3 g4 g5 g6 g7; run;
Convert to a set with one observation per line, with level l as well:
Example 8.2.1: SAS does density estimation via proc univariate. Only graph-
ical output is required, and so noprint is specified. The histogram is plotted
by default.
proc univariate data=arsenic noprint;
histogram nails/kernel (k=normal )
kernel(k=triangular )
kernel(k=quadratic ) ; run;
title ’Kernel density est. with an excessively small bandwidth’;
proc univariate data=arsenic noprint;
histogram nails/kernel (k=normal color=blue c=.001) ; run;
The following is another way to draw density estimates. This method allows
only the normal kernel. Syntax has changed with versions of SAS, and so
this will run only with recent versions. There is no reason to see the text
output, but suppressing text output also suppresses graphics. There is no
overall noprint option.
title1 ’Kernel smoother for arsenic via kde’;
proc kde data=arsenic ; univar nails/plots=density; run;
Example 9.3.1: Various sources suggest that isotonic regression can be done
with the macro at
http://www.bios.unc.edu/distrib/redmon/SASv8/redmon2.sas .
Include the macro using
%include "redmon2.sas"
I was not able to make this work.
Example 9.4.1: Fit the spline using
Example 9.5.2: Fit the median and 0.2 quantile regression estimates:
%boot(data=arsenic,samples=999);
and after creating the bootstrap samples, evaluate the confidence interval
using %bootci. This macro has at least two arguments. The first is a keyword.
Use Hybrid for the residual method, pctl for the percentile method, t for
the Studentized method, and BCa for BCa. Do not put quotes around the
keyword. Second is the name of the variable in the data set passed from the
sampler to the interval creator giving the name of the statistic on which to
do calculations. Third (if present, and using argument name student) is the
divisor for Studentization. This quantity must be a variable in the data set
created by %analyze.
Example 10.4.1: Again we’ll use the SAS input file as above. This works sim-
ilarly to the R jackknife. You need to tell the macro what statistic to use. In
R this is via a function, but in SAS this is via a macro.
%macro analyze(data=,out=);
proc means noprint median data=&data vardef=n;
output out=&out(drop=_freq_ _type_) median=med ;
var nails ;
%bystmt;
run;
%mend;
%jack(data=arsenic,chart=0)
B
Construction of Heuristic Tables and
Figures Using R
Some of the preceding tables and figures were presented, not as tools in the
analysis of specific data sets, but as tools for comparing and describing various
statistical methods. Calculations producing these tables and figures in R are
given below. Some other of the preceding tables and figures were specific to
certain data sets, but are of primarily heuristic value and are not required
for the data set to which they apply. Commands to produce these tables
and figures are also given below. Before running these commands, load two
R packages, MultNonParam from CRAN, and NonparametricHeuristic from
github:
install.packages(c("MultNonParam","devtools"))
library(devtools)
install_github("kolassa-dev/NonparametricHeuristic")
library(MultNonParam); library(NonparametricHeuristic)
Figure 1.1:
fun.comparedensityplot()
Table 2.1:
library(VGAM)#vgam gives laplace distribution
library(BSDA)#Library gives z test
#Below the shift by -.5 for the uniform centers it at 0.
level<-fun.comparepower(samp1sz=c(10,17,40),nsamp=100000,
dist=list("rnorm","rcauchy","rlaplace","runif"),
hypoth=c(0,0,0,-.5),alternative=c("two.sided","greater"))
print(level)
Figure 2.1:
fun.studentizedcaucyplot(10,10000)
Table 2.2:
fun.achievable()
Figure 2.2:
drawccplot()
Table 2.3:
mypower<-array(NA,c(4,3,3))
nobsv<-c(10,17,40)
for(jj in seq(length(nobsv))){
temp<-fun.comparepower(samp1sz=nobsv[jj], nsamp=100000,
dist=list("rnorm","rcauchy","rlaplace"),
hypoth=(1.96+.85)*c(1,sqrt(2),1)/sqrt(nobsv[jj]))
if(jj==1) dimnames(mypower)<-list(dimnames(temp)[[1]],
dimnames(temp)[[2]],as.character(nobsv))
mypower[,,jj]<-temp[,,1,1,1]
}
cat("\nPower for T and Sign Tests \n")
print(mypower)
Table 2.4:
library(VGAM); testare(.5)
Table 3.4:
fun.comparepower(samp1sz=10,samp2sz=10,altvalue=0)
Table 3.5:
fun.comparepower(samp1sz=10,samp2sz=10,altvalue=1)
Figure 4.3:
powerplot()
Figure 5.1:
hodgeslehmannexample()
Figure 6.2:
x<-(-10):10; y<-x ; z<-5*atan(x)
plot(range(x),range(c(y,z)),type="n")
points(x,y,pch=1); points(x,z,pch=2)
legend(0,-5,pch=1:2,legend=c("y=x","y=5 atan(x)"))
Figure 7.1:
y<-x<-rnorm(100)
coin<-rbinom(100,1,.5)
y[coin==0]<--x[coin==0]
par(oma=c(0,0,3,0))
par(mfrow=c(2,2))
p1<-qqnorm(x,main="Marginal Distribution for X")
p2<-qqnorm(y,main="Marginal Distribution for Y")
p3<-qqnorm((x+y)/sqrt(2),
main="Marginal Distribution for (X+Y)/sqrt(2)")
p4<-qqnorm((x-y)/sqrt(2),
main="Marginal Distribution for (X-Y)/sqrt(2)")
Table 9.1:
distv<-c("rnorm","rcauchy","rlaplace","runif","rexp")
t1<-fun.testreg(dists=distv)
t2<-fun.testreg(dists=distv,npergp=50)
Warnings in the calculation of t1 note non-uniqueness of the quantile regres-
sion solutions.
Table 10.1:
fun.testboot(function(x,index) return(median(x[index])),
alpha=.1,sampsize=20, mcsamp=1000,
dists=c("rexp","runif","rcauchy"),true=c(log(2),.5,0))
Table 10.2: