Introduction to Biostatistics
Second Edition
DOVER PUBLICATIONS, INC.
Mineola, New York
Copyright
Bibliographical Note
This Dover edition, first published in 2009, is an unabridged republication of
the work originally published in 1969 by W. H. Freeman and Company, New
York. The authors have prepared a new Preface for this edition.
Sokal, Robert R.
Introduction to Biostatistics / Robert R. Sokal and F. James Rohlf.
Dover ed.
p. cm.
Originally published: 2nd ed. New York: W. H. Freeman, 1969.
Includes bibliographical references and index.
ISBN-13: 978-0-486-46961-4
ISBN-10: 0-486-46961-1
1. Biometry. I. Rohlf, F. James, 1936- II. Title.
QH323.5.S633 2009
570.1'5195 dc22
2008048052
Manufactured in the United States of America
Dover Publications, Inc., 31 East 2nd Street, Mineola, N.Y. 11501
to Julie and Janice
Contents
PREFACE
1. INTRODUCTION
1.1 Some definitions
1.2 The development of biostatistics
1.3 The statistical frame of mind
2. DATA IN BIOSTATISTICS
2.1 Samples and populations
2.2 Variables in biostatistics
2.3 Accuracy and precision of data
2.4 Derived variables
2.5 Frequency distributions
2.6 The handling of data
3. DESCRIPTIVE STATISTICS
3.1 The arithmetic mean
3.2 Other means
3.3 The median
3.4 The mode
3.5 The range
3.6 The standard deviation
3.7 Sample statistics and parameters
3.8 Practical methods for computing mean and standard deviation
3.9 The coefficient of variation
4. INTRODUCTION TO PROBABILITY DISTRIBUTIONS:
THE BINOMIAL AND POISSON DISTRIBUTIONS
4.1 Probability, random sampling, and hypothesis testing
4.2 The binomial distribution
4.3 The Poisson distribution
APPENDIXES
A1 Mathematical appendix
A2 Statistical tables
BIBLIOGRAPHY
INDEX
Preface to the Dover Edition
We are pleased and honored to see the re-issue of the second edition of our Introduc-
tion to Biostatistics by Dover Publications. On reviewing the copy, we find there
is little in it that needs changing for an introductory textbook of biostatistics for an
advanced undergraduate or beginning graduate student. The book furnishes an intro-
duction to most of the statistical topics such students are likely to encounter in their
courses and readings in the biological and biomedical sciences.
The reader may wonder what we would change if we were to write this book anew.
Because of the vast changes that have taken place in modalities of computation in the
last twenty years, we would deemphasize computational formulas that were designed
for pre-computer desk calculators (an age before spreadsheets and comprehensive
statistical computer programs) and refocus the reader's attention to structural for-
mulas that not only explain the nature of a given statistic, but are also less prone to
rounding error in calculations performed by computers. In this spirit, we would omit
the equation (3.8) on page 39 and draw the readers' attention to equation (3.7) instead.
Similarly, we would use structural formulas in Boxes 3.1 and 3.2 on pages 41 and 42,
respectively; on page 161 and in Box 8.1 on pages 163/164, as well as in Box 12.1
on pages 278/279.
Secondly, we would put more emphasis on permutation tests and resampling methods.
Permutation tests and bootstrap estimates are now quite practical. We have found this
approach to be not only easier for students to understand but in many cases preferable
to the traditional parametric methods that are emphasized in this book.
Robert R. Sokal
F. James Rohlf
November 2008
Preface
Robert R. Sokal
F. James Rohlf
INTRODUCTION TO
BIOSTATISTICS
CHAPTER 1
Introduction
This chapter sets the stage for your study of biostatistics. In Section 1.1, we
define the field itself. We then cast a necessarily brief glance at its historical
development in Section 1.2. Then in Section 1.3 we conclude the chapter with
a discussion of the attitudes that the person trained in statistics brings to
biological research.
sense as
the scientific study of numerical data based on natural phenomena. All
parts of this definition are important and deserve emphasis:
Scientific study: Statistics must meet the commonly accepted criteria of
validity of scientific evidence. We must always be objective in presentation and
evaluation of data and adhere to the general ethical code of scientific
methodology, or we may find that the old saying that "figures never lie, only
statisticians do" applies to us.
Data: Statistics generally deals with populations or groups of individuals;
hence it deals with quantities of information, not with a single datum. Thus, the
measurement of a single animal or the response from a single biochemical test
will generally not be of interest.
Numerical: Unless data of a study can be quantified in one way or another,
they will not be amenable to statistical analysis. Numerical data can be
measurements (the length or width of a structure or the amount of a chemical in
a body fluid, for example) or counts (such as the number of bristles or teeth).
Natural phenomena: We use this term in a wide sense to mean not only all
those events in animate and inanimate nature that take place outside the control
of human beings, but also those evoked by scientists and partly under their
control, as in experiments. Different biologists will concern themselves with
different levels of natural phenomena; other kinds of scientists, with yet different
ones. But all would agree that the chirping of crickets, the number of peas in
a pod, and the age of a woman at menopause are natural phenomena. The
heartbeat of rats in response to adrenalin, the mutation rate in maize after
irradiation, or the incidence or morbidity in patients treated with a vaccine
may still be considered natural, even though scientists have interfered with the
phenomenon through their intervention. The average biologist would not
consider the number of stereo sets bought by persons in different states in a given
year to be a natural phenomenon. Sociologists or human ecologists, however,
might so consider it and deem it worthy of study. The qualification "natural
phenomena" is included in the definition of statistics mostly to make certain
that the phenomena studied are not arbitrary ones that are entirely under the
will and control of the researcher, such as the number of animals employed in
an experiment.
The word "statistics" is also used in another, though related, way. It can
be the plural of the noun statistic, which refers to any one of many computed
or estimated statistical quantities, such as the mean, the standard deviation, or
the correlation coefficient. Each one of these is a statistic.
Data in Biostatistics
Each biological discipline has its own set of variables, which may include
conventional morphological measurements; concentrations of chemicals in body
fluids; rates of certain biological processes; frequencies of certain events, as in
genetics, epidemiology, and radiation biology; physical readings of optical or
electronic machinery used in biological research; and many more.
We have already referred to biological variables in a general way, but we
have not yet defined them. We shall define a variable as a property with respect
to which individuals in a sample differ in some ascertainable way. If the property
does not differ within a sample at hand or at least among the samples being
studied, it cannot be of statistical interest. Length, height, weight, number of
teeth, vitamin C content, and genotypes are examples of variables in ordinary,
genetically and phenotypically diverse groups of organisms. Warm-bloodedness
in a group of mammals is not, since mammals are all alike in this regard.
2 . 2 / VARIABLES IN BIOSTATISTICS 9
Variables
    Measurement variables
        Continuous variables
        Discontinuous variables
    Ranked variables
    Attributes
Measurement variables are those measurements and counts that are expressed
numerically. Measurement variables are of two kinds. The first kind consists of
continuous variables, which at least theoretically can assume an infinite number
of values between any two fixed points. For example, between the two length
measurements 1.5 and 1.6 cm there are an infinite number of lengths that could
be measured if one were so inclined and had a precise enough method of
calibration. Any given reading of a continuous variable, such as a length of
1.57 mm, is therefore an approximation to the exact reading, which in practice
is unknowable. Many of the variables studied in biology are continuous
variables. Examples are lengths, areas, volumes, weights, angles, temperatures,
periods of time, percentages, concentrations, and rates.
Contrasted with continuous variables are the discontinuous variables, also
known as meristic or discrete variables. These are variables that have only
certain fixed numerical values, with no intermediate values possible in between.
Thus the number of segments in a certain insect appendage may be 4 or 5 or
6 but never 5½ or 4.3. Examples of discontinuous variables are numbers of a
given structure (such as segments, bristles, teeth, or glands), numbers of offspring,
numbers of colonies of microorganisms or animals, or numbers of plants in a
given quadrat.
Some variables cannot be measured but at least can be ordered or ranked
by their magnitude. Thus, in an experiment one might record the rank order
of emergence of ten pupae without specifying the exact time at which each pupa
emerged. In such cases we code the data as a ranked variable, the order of
emergence. Special methods for dealing with such variables have been
developed, and several are furnished in this book. By expressing a variable as a series
of ranks, such as 1, 2, 3, 4, 5, we do not imply that the difference in magnitude
between, say, ranks 1 and 2 is identical to or even proportional to the
difference between ranks 2 and 3.
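The point about ranks can be sketched in a few lines of Python. The emergence times below are hypothetical values made up for illustration (the text records no actual times), and `rank_order` is an illustrative helper that assumes there are no ties.

```python
# Hypothetical emergence times (hours) for ten pupae; illustrative values only.
emergence_hours = [30.5, 12.0, 18.2, 44.0, 12.5, 25.1, 19.9, 60.3, 13.0, 27.7]


def rank_order(values):
    """Return the rank (1 = smallest) of each value, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


ranks = rank_order(emergence_hours)
print("ranks:", ranks)

# Successive ranks always differ by 1, but the underlying time gaps do not.
sorted_times = sorted(emergence_hours)
gaps = [round(b - a, 1) for a, b in zip(sorted_times, sorted_times[1:])]
print("gaps between successive ranked times:", gaps)
```

The ranks keep the order of emergence but discard the unequal gaps between successive times, which is why rank differences should not be read as magnitudes.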
Variables that cannot be measured but must be expressed qualitatively are
called attributes, or nominal variables. These are all properties, such as black
or white, pregnant or not pregnant, dead or alive, male or female. When such
attributes are combined with frequencies, they can be treated statistically. Of
80 mice, we may, for instance, state that four were black, two agouti, and the
rest gray.
Color                   Frequency
Black                   4
Agouti                  2
Gray                    74
Total number of mice    80
Implied limits
Number      Significant digits desired    Answer
26.58       2                             27
133.7137    5                             133.71
0.03725     3                             0.0372
0.03715     3                             0.0372
18,316      2                             18,000
17.3476     3                             17.3
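The rounding shown in these rows can be reproduced with a short sketch. It assumes the round-half-to-even convention, which is consistent with 0.03725 and 0.03715 both giving 0.0372; it also reads the garbled second and fifth entries as 133.7137 and 18,000. The helper name `round_sig` is illustrative.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_sig(value, digits):
    """Round a decimal string to `digits` significant digits (half-to-even)."""
    d = Decimal(value)
    if d == 0:
        return d
    # Exponent of the last digit to keep, e.g. 26.58 with 2 digits -> 10**0.
    exponent = d.adjusted() - digits + 1
    return d.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_EVEN)


examples = [("26.58", 2), ("133.7137", 5), ("0.03725", 3),
            ("0.03715", 3), ("18316", 2), ("17.3476", 3)]
for text, digits in examples:
    print(text, digits, round_sig(text, digits))
```

Decimal strings are used rather than floats so that a value such as 0.03725 is held exactly and the tie is resolved by the stated convention rather than by binary representation error.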
2 . 4 / DERIVED VARIABLES 13
FIGURE 2.1
Sampling from a population of birth weights of infants (a continuous variable). A. A sample of 25.
B. A sample of 100. C. A sample of 500. D. A sample of 2000.
FIGURE 2.2
Bar diagram. Frequency of the sedge Carex flacca in 500 quadrats. Data from Table 2.2;
originally from Archibald (1950). Abscissa: number of plants per quadrat.
Variable    Frequency
Y           f
9           1
8           1
7           4
6           3
5           1
4           1
Phenotype    f
A-           86
aa           32
This tells us that there are two classes of individuals, those identified by the A-
phenotype, of which 86 were found, and those comprising the homozygote
recessive aa, of which 32 were seen in the sample.
An example of a more extensive qualitative frequency distribution is given
in Table 2.1, which shows the distribution of melanoma (a type of skin cancer)
over body regions in men and women. This table tells us that the trunk and
limbs are the most frequent sites for melanomas and that the buccal cavity, the
rest of the gastrointestinal tract, and the genital tract are rarely afflicted by this
TABLE 2.1
TABLE 2.2
A meristic frequency distribution.
Number of plants of the sedge Carex flacca found in 500 quadrats.
Plants per quadrat (Y)    Frequency (f)
0                         181
1                         118
2                         97
3                         54
4                         32
5                         9
6                         5
7                         3
8                         1
Total                     500
Original measurements
Implied limits    Y      f     Implied limits    Class mark    f     Implied limits    Class mark
4.25-4.35         4.3    4     4.25-4.45         4.35          7
4.35-4.45         4.4    3
4.45-4.55         4.5    1     4.45-4.65         4.55          1     4.45-4.75         4.6
4.55-4.65         4.6    0
4.65-4.75         4.7    1     4.65-4.85         4.75          1
Σf                       25                                    25                      25
Histogram of the original frequency distribution shown above and of the grouped distribution with 5 classes. Line below
abscissa shows class marks for the grouped frequency distribution. Shaded bars represent original frequency distribution;
hollow bars represent grouped distribution.
Abscissa: Y (femur length, in units of 0.1 mm); labeled values 3.4, 3.7, 4.0, 4.3, 4.6.
Completed array
Step I Step 2 ... Step 7 .. . (Step )
Ί Ί 7 3
3 3 3 3 67
4 9 4 96 4 96 4 964
5 5 5 5 5 5
6 6 6 4 6 4
7 7 7 7 13
X X X X
9 9 9 1 9 IX
10 10 10 10
11 11 11 11
12 12 12 7 12 7
13 13 13 13
14 14 14 14
15 15 15 15
16 16 16 3 16 3
17 17 17 17
IX IX 18 IX 0
FIGURE 2.3
Frequency polygon. Birth weights of 9465 male infants. Chinese third-class patients in
Singapore, 1950 and 1951. Data from Millis and Seng (1954).
Exercises
2.1 Round the following numbers to three significant figures: 106.55, 0.06819, 3.0495,
7815.01, 2.9149, and 20.1500. What are the implied limits before and after
rounding? Round these same numbers to one decimal place.
ANS. For the first value: 107; 106.545-106.555; 106.5-107.5; 106.6
2.2 Differentiate between the following pairs of terms and give an example of each.
(a) Statistical and biological populations. (b) Variate and individual. (c) Accuracy
and precision (repeatability). (d) Class interval and class mark. (e) Bar diagram
and histogram. (f) Abscissa and ordinate.
2.3 Given 200 measurements ranging from 1.32 to 2.95 mm, how would you group
them into a frequency distribution? Give class limits as well as class marks.
2.4 Group the following 40 measurements of interorbital width of a sample of
domestic pigeons into a frequency distribution and draw its histogram (data from
Olson and Miller, 1958). Measurements are in millimeters.
12.2 12.9 11.8 11.9 11.6 11.1 12.3 12.2 11.8 11.8
10.7 11.5 11.3 11.2 11.6 11.9 13.3 11.2 10.5 11.1
12.1 11.9 10.4 10.7 10.8 11.0 11.9 10.2 10.9 11.6
10.8 11.6 10.4 10.7 12.0 12.4 11.7 11.8 11.3 11.1
2.5 How precisely should you measure the wing length of a species of mosquitoes
in a study of geographic variation if the smallest specimen has a length of about
2.8 mm and the largest a length of about 3.5 mm?
2.6 Transform the 40 measurements in Exercise 2.4 into common logarithms (use a
table or calculator) and make a frequency distribution of these transformed
variates. Comment on the resulting change in the pattern of the frequency
distribution from that found before.
2.7 For the data of Tables 2.1 and 2.2 identify the individual observations, samples,
populations, and variables.
2.8 Make a stem-and-leaf display of the data given in Exercise 2.4.
2.9 The distribution of ages of striped bass captured by hook and line from the East
River and the Hudson River during 1980 was reported as follows (Young, 1981):
Age    f
1      13
2      49
3      96
4      28
5      16
6      8
Show this distribution in the form of a bar diagram.
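As one possible starting point for grouping exercises of this kind, the sketch below tallies the 40 pigeon measurements of Exercise 2.4 into classes and prints a crude text bar for each class. The class interval of 0.3 mm and the function name `group_into_classes` are assumptions chosen for illustration; other intervals are equally defensible.

```python
from collections import Counter

# Interorbital widths (mm) of 40 domestic pigeons, as listed in Exercise 2.4.
widths_mm = [
    12.2, 12.9, 11.8, 11.9, 11.6, 11.1, 12.3, 12.2, 11.8, 11.8,
    10.7, 11.5, 11.3, 11.2, 11.6, 11.9, 13.3, 11.2, 10.5, 11.1,
    12.1, 11.9, 10.4, 10.7, 10.8, 11.0, 11.9, 10.2, 10.9, 11.6,
    10.8, 11.6, 10.4, 10.7, 12.0, 12.4, 11.7, 11.8, 11.3, 11.1,
]

CLASS_WIDTH_TENTHS = 3  # assumed class interval of 0.3 mm


def group_into_classes(values, class_width_tenths):
    """Return rows of (lower limit, upper limit, class mark, frequency)."""
    # Work in whole tenths of a millimeter to avoid float binning errors.
    tenths = [round(v * 10) for v in values]
    lowest = min(tenths)
    counts = Counter((t - lowest) // class_width_tenths for t in tenths)
    table = []
    for k in range(max(counts) + 1):
        lower = lowest + k * class_width_tenths
        upper = lower + class_width_tenths - 1
        class_mark = (lower + upper) / 20  # midpoint, converted back to mm
        table.append((lower / 10, upper / 10, class_mark, counts.get(k, 0)))
    return table


frequency_table = group_into_classes(widths_mm, CLASS_WIDTH_TENTHS)
for lower, upper, mark, f in frequency_table:
    print(f"{lower:4.1f}-{upper:4.1f}  mark {mark:5.2f}  f = {f:2d}  {'*' * f}")
```

The limits printed are class limits in recorded units; the implied limits extend half a unit of measurement (0.05 mm) beyond them on each side.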
CHAPTER 3
Descriptive Statistics
14.9
10.8
12.3
23.3
Sum = 61.3
3.1 / THE ARITHMETIC MEAN 29
Mean = 15.325%
$$Y_1, Y_2, Y_3, Y_4$$
$$Y_1, Y_2, \ldots, Y_n$$
$$\sum_{i=1}^{n} Y_i = Y_1 + Y_2 + \cdots + Y_n$$
$$\sum_{i=1}^{i=n} Y_i = \sum_{i=1}^{n} Y_i = \sum_{i} Y_i = \sum^{n} Y = \sum Y$$
The third symbol might be interpreted as meaning, "Sum the Y_i's over all
available values of i." This is a frequently used notation, although we shall
not employ it in this book. The next, with n as a superscript, tells us to sum n
items of Y; note that the i subscript of the Y has been dropped as
unnecessary. Finally, the simplest notation is shown at the right. It merely says sum
the Y's. This will be the form we shall use most frequently: if a summation sign
precedes a variable, the summation will be understood to be over n items (all
the items in the sample) unless subscripts or superscripts specifically tell us
otherwise.
This formula tells us, "Sum all the (n) items and divide the sum by n."
The mean of a sample is the center of gravity of the observations in the sample.
If you were to draw a histogram of an observed frequency distribution on a
sheet of cardboard and then cut out the histogram and lay it flat against a
blackboard, supporting it with a pencil beneath, chances are that it would be
out of balance, toppling to either the left or the right. If you moved the
supporting pencil point to a position about which the histogram would exactly
balance, this point of balance would correspond to the arithmetic mean.
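The balance-point property can be checked numerically: deviations from the arithmetic mean sum to zero. The sketch below uses the four oxygen percentages quoted earlier in this chapter (14.9, 10.8, 12.3, 23.3); the variable names are illustrative.

```python
# The four oxygen percentages used earlier in the chapter.
oxygen_percent = [14.9, 10.8, 12.3, 23.3]

mean = sum(oxygen_percent) / len(oxygen_percent)

# Deviations to the left of the mean are negative, those to the right positive.
deviations = [y - mean for y in oxygen_percent]

print("mean:", mean)
print("deviations:", deviations)
print("sum of deviations (zero up to float rounding):", sum(deviations))
```

Any other candidate point would leave a nonzero sum of deviations, which is the numerical counterpart of the histogram toppling to one side.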
We often must compute averages of means or of other statistics that may
differ in their reliabilities because they are based on different sample sizes. At
other times we may wish the individual items to be averaged to have different
weights or amounts of influence. In all such cases we compute a weighted
average. A general formula for calculating the weighted average of a set of
values Y_i is as follows:
$$\bar{Y}_w = \frac{\sum^{n} w_i Y_i}{\sum^{n} w_i} \tag{3.2}$$
where n variates, each weighted by a factor w_i, are being averaged. The values
of Y_i in such cases are unlikely to represent variates. They are more likely to
be sample means Ȳ_i or some other statistics of different reliabilities.
The simplest case in which this arises is when the Y_i are not individual
variates but are means. Thus, if the following three means are based on differing
sample sizes, as shown,
Ȳ_i     n_i
3.85    12
5.21    25
4.70    8
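A minimal sketch of Expression (3.2) applied to these three means, using the sample sizes as weights. The third sample size is garbled in the source and is read here as 8; that reading, like the function name `weighted_average`, is an assumption.

```python
def weighted_average(values, weights):
    """Weighted average: sum of w_i * Y_i divided by the sum of w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * y for y, w in zip(values, weights)) / total_weight


sample_means = [3.85, 5.21, 4.70]
sample_sizes = [12, 25, 8]  # third size assumed to be 8 (garbled in the source)

weighted = weighted_average(sample_means, sample_sizes)
unweighted = sum(sample_means) / len(sample_means)

print("weighted average:", round(weighted, 2))
print("unweighted average:", round(unweighted, 2))
```

The weighted result is pulled toward 5.21, the mean based on the largest sample, whereas the unweighted average treats all three means as equally reliable.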
$$GM_Y = \sqrt[n]{\prod_{i=1}^{n} Y_i} \tag{3.4a}$$
$$\frac{1}{H_Y} = \frac{1}{n} \sum \frac{1}{Y_i}$$
You may wish to convince yourself that the geometric mean and the harmonic
mean of the four oxygen percentages are 14.65% and 14.09%, respectively.
Unless the individual items do not vary, the geometric mean is always less than
the arithmetic mean, and the harmonic mean is always less than the geometric
mean.
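One way to convince yourself is to compute the three means directly. This sketch uses Python's standard `statistics` module (Python 3.8 or later for `geometric_mean`) on the four oxygen percentages quoted earlier in the chapter; the variable names are illustrative.

```python
from statistics import geometric_mean, harmonic_mean, mean

# The four oxygen percentages from the arithmetic-mean example.
oxygen_percent = [14.9, 10.8, 12.3, 23.3]

arithmetic = mean(oxygen_percent)
geometric = geometric_mean(oxygen_percent)   # nth root of the product
harmonic = harmonic_mean(oxygen_percent)     # reciprocal of the mean reciprocal

print(f"arithmetic mean: {arithmetic:.3f}")
print(f"geometric mean:  {geometric:.2f}")
print(f"harmonic mean:   {harmonic:.2f}")
```

The ordering harmonic < geometric < arithmetic holds here because the four values are not all equal.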
Some beginners in statistics have difficulty in accepting the fact that
measures of location or central tendency other than the arithmetic mean are
permissible or even desirable. They feel that the arithmetic mean is the "logical"
The median M is a statistic of location occasionally useful in biological research.
It is defined as that value of the variable (in an ordered array) that has an equal
number of items on either side of it. Thus, the median divides a frequency
distribution into two halves. In the following sample of five measurements,
the median would be the midpoint between the second and third items, or 15.5.
Whenever any one value of a variate occurs more than once, problems may
develop in locating the median. Computation of the median item becomes more
involved because all the members of a given class in which the median item is
located will have the same class mark. The median then is the (n/2)th variate
in the frequency distribution. It is usually computed as that point between the
class limits of the median class where the median individual would be located
(assuming the individuals in the class were evenly distributed).
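The interpolation just described can be sketched as follows. The data are the striped bass ages of Exercise 2.9, with the last frequency (garbled in the source) read as 8; the class limits are assumed to lie half a year on either side of each age, and `grouped_median` is an illustrative name rather than a library function.

```python
def grouped_median(class_marks, frequencies, class_width):
    """Locate the (n/2)th variate by linear interpolation within its class."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("empty frequency distribution")
    target = n / 2
    cumulative = 0
    for mark, f in zip(class_marks, frequencies):
        if f > 0 and cumulative + f >= target:
            lower_limit = mark - class_width / 2
            # Fraction of the class needed to reach the (n/2)th individual,
            # assuming individuals are evenly spread within the class.
            return lower_limit + (target - cumulative) / f * class_width
        cumulative += f
    raise ValueError("median class not found")


ages = [1, 2, 3, 4, 5, 6]
counts = [13, 49, 96, 28, 16, 8]  # last count assumed to be 8

median_age = grouped_median(ages, counts, class_width=1)
print(f"median age: {median_age:.3f}")
```

With these inputs the median class is age 3 (limits 2.5 to 3.5), and the result agrees with the median of 2.948 listed in the answer to Exercise 3.7.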
The median is just one of a family of statistics dividing a frequency
distribution into equal areas. It divides the distribution into two halves. The three
quartiles cut the distribution at the 25, 50, and 75% points; that is, at points
dividing the distribution into first, second, third, and fourth quarters by area
(and frequencies). The second quartile is, of course, the median. (There are also
quintiles, deciles, and percentiles, dividing the distribution into 5, 10, and 100
equal portions, respectively.)
Medians are most often used for distributions that do not conform to the
standard probability models, so that nonparametric methods (see Chapter 10)
must be used. Sometimes the median is a more representative measure of
location than the arithmetic mean. Such instances almost always involve asymmetric
3.4 / THE MODE 33
The mode refers to the value represented by the greatest number of individuals.
When seen on a frequency distribution, the mode is the value of the variable
at which the curve peaks. In grouped frequency distributions the mode as a
point has little meaning. It usually suffices to identify the modal class. In biology,
the mode does not have many applications.
Distributions having two peaks (equal or unequal in height) are called
bimodal; those with more than two peaks are multimodal. In those rare
distributions that are U-shaped, we refer to the low point at the middle of the
distribution as an antimode.
In evaluating the relative merits of the arithmetic mean, the median, and
the mode, a number of considerations have to be kept in mind. The mean is
generally preferred in statistics, since it has a smaller standard error than other
statistics of location (see Section 6.2), it is easier to work with mathematically,
and it has an additional desirable property (explained in Section 6.1): it will
tend to be normally distributed even if the original data are not. The mean is
FIGURE 3.1
An asymmetrical frequency distribution (skewed to the right) showing location of the mean, median,
and mode. Percent butterfat in 120 samples of milk (from a Canadian cattle breeders' record book). n = 120.
FIGURE 3.2
Three frequency distributions having identical means and sample sizes but differing in dispersion
pattern.
TABLE 3.1
ΣY = 115.7
Mean Ȳ = 7.713
3.7 / SAMPLE STATISTICS AND PARAMETERS 37
$$\text{Variance} = \frac{\sum y^2}{n} = \frac{308.7770}{15} = 20.5851$$
The variance is a measure of fundamental importance in statistics, and we
shall employ it throughout this book. At the moment, we need only remember
that because of the squaring of the deviations, the variance is expressed in
squared units. To undo the effect of the squaring, we now take the positive
square root of the variance and obtain the standard deviation:
(3.6)
We note that this value is slightly larger than our previous estimate of 4.537.
Of course, the greater the sample size, the less difference there will be between
division by n and by n - 1. However, regardless of sample size, it is good
practice to divide a sum of squares by n - 1 when computing a variance or
standard deviation. It may be assumed that when the symbol s² is encountered,
it refers to a variance obtained by division of the sum of squares by the degrees
of freedom, as the quantity n - 1 is generally referred to.
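The effect of the divisor can be checked directly from the sum of squares 308.7770 (with n = 15) quoted earlier in this section. The sketch below is only an arithmetic check; the variable names are illustrative.

```python
from math import sqrt

sum_of_squares = 308.7770  # sum of squared deviations quoted in the text
n = 15

# Dividing by n describes only the sample at hand.
sd_divide_by_n = sqrt(sum_of_squares / n)

# Dividing by the degrees of freedom, n - 1, gives the usual sample estimate.
sd_divide_by_df = sqrt(sum_of_squares / (n - 1))

print(f"divide by n:     {sd_divide_by_n:.3f}")
print(f"divide by n - 1: {sd_divide_by_df:.3f}")
```

The two results differ by about 3.5% here; the gap shrinks as n grows because n/(n - 1) approaches 1.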
Division of the sum of squares by n is appropriate only when the interest
of the investigator is limited to the sample at hand and to its variance and
3.8 / PRACTICAL METHODS FOR COMPUTING MEAN AND STANDARD DEVIATION 39
$$\sum y^2 = \sum (Y - \bar{Y})^2 \tag{3.7}$$
$$\sum y^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} \tag{3.8}$$
Let us see exactly what this formula represents. The first term on the right side
of the equation, ΣY², is the sum of all individual Y's, each squared, as follows:
$$\sum Y^2 = Y_1^2 + Y_2^2 + Y_3^2 + \cdots + Y_n^2$$
When referred to by name, ΣY² should be called the "sum of Y squared" and
should be carefully distinguished from Σy², "the sum of squares of Y." These
names are unfortunate, but they are too well established to think of amending
them. The other quantity in Expression (3.8) is (ΣY)²/n. It is often called the
correction term (CT). The numerator of this term is the square of the sum of the
Y's; that is, all the Y's are first summed, and this sum is then squared. In general,
this quantity is different from ΣY², which first squares the Y's and then sums
them. These two terms are identical only if all the Y's are equal. If you are not
certain about this, you can convince yourself of this fact by calculating these
two quantities for a few numbers.
The disadvantage of Expression (3.8) is that the quantities ΣY² and (ΣY)²/n
may both be quite large, so that accuracy may be lost in computing their
difference unless one takes the precaution of carrying sufficient significant figures.
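The loss of accuracy can be illustrated with a sketch that mimics a hypothetical calculator carrying only 8 significant figures, using Python's `decimal` module. The two function names and the 8-digit precision are assumptions made for illustration; the large values are the artificial data of Exercise 3.8(d).

```python
from decimal import Decimal, localcontext


def ss_structural(values, prec):
    """Sum of squares by Expression (3.7): squared deviations from the mean."""
    with localcontext() as ctx:
        ctx.prec = prec  # every arithmetic result is rounded to `prec` digits
        ys = [Decimal(v) for v in values]
        mean = sum(ys, Decimal(0)) / Decimal(len(ys))
        return sum(((y - mean) ** 2 for y in ys), Decimal(0))


def ss_computational(values, prec):
    """Sum of squares by Expression (3.8): sum of Y squared minus CT."""
    with localcontext() as ctx:
        ctx.prec = prec
        ys = [Decimal(v) for v in values]
        sum_y = sum(ys, Decimal(0))
        sum_y_squared = sum((y * y for y in ys), Decimal(0))
        correction_term = sum_y * sum_y / Decimal(len(ys))
        return sum_y_squared - correction_term


small = [1, 2, 3, 4, 5]
large = [900001, 900002, 900003, 900004, 900005]

for data in (small, large):
    print(data, ss_structural(data, 8), ss_computational(data, 8))
```

Both data sets have the same true sum of squares, 10, because the second is the first shifted by a constant. With 8 significant figures the structural form recovers it in both cases, while the computational form loses it entirely for the large values, since the two terms being subtracted agree in all the digits that are carried.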
Why is Expression (3.8) identical with Expression (3.7)? The proof of this
identity is very simple and is given in Appendix A1.2. You are urged to work
BOX 3.1
Calculation of Ȳ and s from unordered data.
Neutrophil counts, unordered as shown in Table 3.1.
Computation
$$n = 15$$
$$\sum Y = 115.7$$
$$\bar{Y} = \frac{1}{n} \sum Y = 7.713$$
$$\sum Y^2 = 1201.21$$
$$\sum y^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} = 308.7773$$
$$s^2 = \frac{\sum y^2}{n - 1} = \frac{308.7773}{14} = 22.056$$
$$s = \sqrt{22.056} = 4.696$$
10 3
30 4
100 5
500 6
1000 6
BOX 3.2
Calculation of Ȳ, s, and V from a frequency distribution.
Birth weights of male Chinese in ounces.
(1)             (2)
Class mark Y    f       Coded class mark Y_c
59.5            2       0
67.5            6       1
75.5            39      2
83.5            385     3
91.5            888     4
99.5            1729    5
107.5           2240    6
115.5           2007    7
123.5           1233    8
131.5           641     9
139.5           201     10
147.5           74      11
155.5           14      12
163.5           5       13
171.5           1       14
                9465 = n
$$\sum f Y_c = 59{,}629 \qquad \text{Code: } Y_c = \frac{Y - 59.5}{8}$$
$$s_c^2 = \frac{\sum f y_c^2}{n - 1} = 2.888$$
$$s_c = 1.6991 \qquad \text{To decode } s_c\text{: } s = 8 s_c = 13.593 \text{ oz}$$
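The same quantities can be obtained without coding by applying the structural formula directly to the class marks and frequencies listed in this box. This is a sketch rather than the authors' procedure; the variable names are illustrative, and small differences in the last digit from the boxed values reflect rounding of intermediate results.

```python
from math import sqrt

# Class marks (oz) run from 59.5 to 171.5 in steps of 8, as in the box.
class_marks = [59.5 + 8 * k for k in range(15)]
frequencies = [2, 6, 39, 385, 888, 1729, 2240, 2007,
               1233, 641, 201, 74, 14, 5, 1]

n = sum(frequencies)
mean = sum(f * y for y, f in zip(class_marks, frequencies)) / n

# Structural formula: frequency-weighted squared deviations from the mean.
sum_of_squares = sum(f * (y - mean) ** 2
                     for y, f in zip(class_marks, frequencies))
variance = sum_of_squares / (n - 1)
s = sqrt(variance)
v = 100 * s / mean  # coefficient of variation, in percent

print(f"n = {n}")
print(f"mean = {mean:.3f} oz")
print(f"s = {s:.3f} oz")
print(f"V = {v:.2f}%")
```

Dividing the variance by 8² = 64 gives the coded variance, which can be compared with the 2.888 shown in the box.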
Having obtained the standard deviation as a measure of the amount of variation
in the data, you may be led to ask, "Now what?" At this stage in our
comprehension of statistical theory, nothing really useful comes of the computations
we have carried out. However, the skills just learned are basic to all later
statistical work. So far, the only use that we might have for the standard deviation
is as an estimate of the amount of variation in a population. Thus, we may
wish to compare the magnitudes of the standard deviations of similar
populations and see whether population A is more or less variable than population B.
When populations differ appreciably in their means, the direct comparison
of their variances or standard deviations is less useful, since larger organisms
usually vary more than smaller ones. For instance, the standard deviation of
the tail lengths of elephants is obviously much greater than the entire tail length
of a mouse. To compare the relative amounts of variation in populations having
different means, the coefficient of variation, symbolized by V (or occasionally
CV), has been developed. This is simply the standard deviation expressed as a
percentage of the mean. Its formula is
$$V = \frac{s \times 100}{\bar{Y}}$$
Exercises
3.1 Find Ȳ, s, V, and the median for the following data (mg of glycine per mg of
creatinine in the urine of 37 chimpanzees; from Gartler, Firschein, and
Dobzhansky, 1956). ANS. Ȳ = 0.115, s = 0.10404.
.008 .018 .056 .055 .135 .052 .077 .026 .440 .300
.025 .036 .043 .100 .120 .110 .100 .350 .100 .300
.011 .060 .070 .050 .080 .110 .110 .120 .133 .100
.100 .155 .370 .019 .100 .100 .116
3.2 Find the mean, standard deviation, and coefficient of variation for the pigeon
data given in Exercise 2.4. Group the data into ten classes, recompute Ȳ and s,
and compare them with the results obtained from ungrouped data. Compute
the median for the grouped data.
3.3 The following are percentages of butterfat from 120 registered three-year-old
Ayrshire cows selected at random from a Canadian stock record book.
(a) Calculate Ȳ, s, and V directly from the data.
(b) Group the data in a frequency distribution and again calculate Ȳ, s, and V.
Compare the results with those of (a). How much precision has been lost by
grouping? Also calculate the median.
mode, range? What would be the effect of adding 5.2 and then multiplying the
sums by 8.0? Would it make any difference in the above statistics if we multiplied
by 8.0 first and then added 5.2?
3.5 Estimate μ and σ using the midrange and the range (see Section 3.8) for the data
in Exercises 3.1, 3.2, and 3.3. How well do these estimates agree with the estimates
given by $\bar{Y}$ and s? ANS. Estimates of μ and σ for Exercise 3.2 are 0.224
and 0.1014.
3.6 Show that the equation for the variance can also be written as
$$s^2 = \frac{\sum Y^2 - n\bar{Y}^2}{n - 1}$$
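A numerical check is not a proof, but it can confirm that the identity in Exercise 3.6 has been read correctly. The sketch below, with hypothetical function names, computes the variance once from squared deviations about the mean and once from the sum of squares minus n times the squared mean, using the small data set 1, 2, 3, 4, 5.

```python
def variance_from_deviations(values):
    """Sample variance from the sum of squared deviations about the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)

def variance_from_sums(values):
    """Sample variance from sum(Y^2) - n * mean^2, as in Exercise 3.6."""
    n = len(values)
    mean = sum(values) / n
    return (sum(y * y for y in values) - n * mean ** 2) / (n - 1)

data = [1, 2, 3, 4, 5]
print(variance_from_deviations(data), variance_from_sums(data))
```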
3.7 Using the striped bass age distribution given in Exercise 2.9, compute the following
statistics: $\bar{Y}$, $s^2$, s, V, median, and mode. ANS. $\bar{Y}$ = 3.043, $s^2$ = 1.2661,
s = 1.125, V = 36.98%, median = 2.948, mode = 3.
3.8 Use a calculator and compare the results of using Equations 3.7 and 3.8 to
compute $s^2$ for the following artificial data sets:
(a) 1, 2, 3, 4, 5
(b) 9001, 9002, 9003, 9004, 9005
(c) 90001, 90002, 90003, 90004, 90005
(d) 900001, 900002, 900003, 900004, 900005
Compare your results with those of one or more computer programs. What is
the correct answer? Explain your results.
CHAPTER
Introduction to Probability
Distributions: The Binomial and
Poisson Distributions
we would have to set up a collection or trapping station somewhere on campus.
And to make certain that the sample was truly random with respect to the
entire student population, we would have to know the ecology of students on
campus very thoroughly. We should try to locate our trap at some station
where each student had an equal probability of passing. Few, if any, such places
can be found in a university. The student union facilities are likely to be
frequented more by independent and foreign students, less by those living in
organized houses and dormitories. Fewer foreign and graduate students might
be found along fraternity row. Clearly, we would not wish to place our trap
near the International Club or House, because our probability of sampling a
foreign student would be greatly enhanced. In front of the bursar's window we
might sample students paying tuition. But those on scholarships might not be
found there. We do not know whether the proportion of scholarships among
foreign or graduate students is the same as or different from that among the
American or undergraduate students. Athletic events, political rallies, dances,
and the like would all draw a differential spectrum of the student body; indeed,
no easy solution seems in sight. The time of sampling is equally important, in
the seasonal as well as the diurnal cycle.
Those among the readers who are interested in sampling organisms from
nature will already have perceived parallel problems in their work. If we were
to sample only students wearing turbans or saris, their probability of being
foreign students would be almost 1. We could no longer speak of a random
sample. In the familiar ecosystem of the university these violations of proper
sampling procedure are obvious to all of us, but they are not nearly so obvious
in real biological instances where we are unfamiliar with the true nature of the
environment. How should we proceed to obtain a random sample of leaves
from a tree, of insects from a field, or of mutations in a culture? In sampling
at random, we are attempting to permit the frequencies of various events
occurring in nature to be reproduced unaltered in our records; that is, we
hope that on the average the frequencies of these events in our sample will be
the same as they are in the natural situation. Another way of saying this is that
in a random sample every individual in the population being sampled has an
equal probability of being included in the sample.
We might go about obtaining a random sample by using records representing
the student body, such as the student directory, selecting a page from
it at random and a name at random from the page. Or we could assign an
arbitrary number to each student, write each on a chip or disk, put these
in a large container, stir well, and then pull out a number.
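The chips-in-a-container procedure has a direct software analogue. The minimal sketch below assumes a hypothetical directory of 5000 students numbered 1 to 5000 and uses Python's standard `random.sample`, which draws distinct numbers so that each student has the same chance of inclusion. The function name, population size, and seed are illustrative assumptions only.

```python
import random

def draw_random_sample(population_size, sample_size, seed=None):
    """Draw distinct ID numbers, each equally likely, like chips pulled
    from a well-stirred container."""
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)

# Hypothetical directory of 5000 students numbered 1..5000.
chosen = draw_random_sample(population_size=5000, sample_size=10, seed=1)
print(chosen)
```

The seed is fixed here only so that the sketch is repeatable; in an actual study it would normally be left unset or recorded.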
Imagine now that we sample a single student physically by the trapping
method, after carefully planning the placement of the trap in such a way as to
make sampling random. What are the possible outcomes? Clearly, the student
could be either an AU, AG, FU or FG. The set of these four possible outcomes
exhausts the possibilities of this experiment. This set, which we can represent
as {AU, AG, FU, FG}, is called the sample space. Any single trial of the experiment
described above would result in only one of the four possible outcomes (elements)
A = {AU, A G }
Β = {AG, F G }
{0.70, 0.26, 0.01, 0.03}
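The sample space, its probabilities, and the two events A and B can be written down directly as a small Python sketch. It assumes that the four probabilities are listed in the same order as the outcomes AU, AG, FU, FG; the function name is hypothetical.

```python
# Probabilities of the four outcomes, assumed to be in the order AU, AG, FU, FG.
sample_space = {"AU": 0.70, "AG": 0.26, "FU": 0.01, "FG": 0.03}

def probability(event):
    """Probability of an event, i.e. of a subset of the sample space."""
    return sum(sample_space[outcome] for outcome in event)

A = {"AU", "AG"}
B = {"AG", "FG"}

p_A = probability(A)
p_B = probability(B)
p_A_and_B = probability(A & B)   # intersection: outcomes in both events
p_A_or_B = probability(A | B)    # union: outcomes in either event
print(p_A, p_B, p_A_and_B, p_A_or_B)
```

With these values the union obeys the addition rule P{A or B} = P{A} + P{B} − P{A and B}, because the shared outcome AG must not be counted twice.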
FIGURE 4.1
Sample space for sampling two students from Matchless University.
             AU (0.70)   AG (0.26)   FU (0.01)   FG (0.03)
FG (0.03)    0.0210      0.0078      0.0003      0.0009
FU (0.01)    0.0070      0.0026      0.0001      0.0003
AG (0.26)    0.1820      0.0676      0.0026      0.0078
AU (0.70)    0.4900      0.1820      0.0070      0.0210
{FF, FA, AA}
{$p^2$, $2pq$, $q^2$}
{FFF, FFA, FAA, AAA}
{$p^3$, $3p^2q$, $3pq^2$, $q^3$}
For samples of 1, $(p + q)^1 = p + q$
For samples of 2, $(p + q)^2 = p^2 + 2pq + q^2$
For samples of 3, $(p + q)^3 = p^3 + 3p^2q + 3pq^2 + q^3$
k
1    1 1
2    1 2 1
3    1 3 3 1
4    1 4 6 4 1
5    1 5 10 10 5 1
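The rows of coefficients above follow a simple rule: each interior entry is the sum of the two entries above it in the previous row. A minimal Python sketch of that rule follows; the function name is hypothetical.

```python
def pascal_row(k):
    """Return the binomial coefficients for (p + q)**k as a list."""
    row = [1]
    for _ in range(k):
        # Each interior coefficient is the sum of two adjacent entries above it.
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
    return row

for k in range(1, 6):
    print(k, pascal_row(k))
```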
$p^4q^0 + p^3q^1 + p^2q^2 + p^1q^3 + p^0q^4$
$p^4 + 4p^3q + 6p^2q^2 + 4pq^3 + q^4$
or
TABLE 4.1
Expected frequencies of infected insects in samples of 5 insects sampled from an infinitely large
population with an assumed infection rate of 40%.
Columns: (1) number of infected insects per sample, Y; (2) powers of p = 0.4; (3) powers of q = 0.6;
(4) binomial coefficients; (5) relative expected frequencies, $\hat{f}_{rel}$; (6) absolute expected
frequencies, $\hat{f}$; (7) observed frequencies, f.
FIGURE 4.2
Bar diagram of observed and expected frequencies given in Table 4.1. (Abscissa: number of
infected insects per sample; ordinate: frequency; separate bars for observed and expected frequencies.)
TABLE 4.2
Artificial distributions to illustrate clumping and repulsion. Expected frequencies from Table 4.1.
(1)                (2)           (3)            (4)           (5)           (6)
Number of          Absolute      Clumped        Deviation     Repulsed      Deviation
infected insects   expected      (contagious)   from          frequencies   from
per sample         frequencies   frequencies    expectation                 expectation
Y                  $\hat{f}$     f                            f
5                  24.8          47             +             14            −
4                  186.1         227            +             157           −
$$\mu = kp \qquad \sigma = \sqrt{kpq}$$
Substituting the values k = 5, p = 0.4, and q = 0.6 of the above example, we
obtain μ = 2.0 and σ = 1.095,45, which are identical to the values computed
from column (5) in Table 4.1. Note that we use the Greek parametric notation
here because μ and σ are parameters of an expected frequency distribution, not
sample statistics, as are the mean and standard deviation in column (7). The
proportions p and q are parametric values also, and strictly speaking, they
should be distinguished from sample proportions. In fact, in later chapters we
resort to p and q for parametric proportions (rather than π, which conventionally
is used as the ratio of the circumference to the diameter of a circle). Here,
however, we prefer to keep our notation simple. If we wish to express our
variable as a proportion rather than as a count (that is, to indicate mean
incidence of infection in the insects as 0.4, rather than as 2 per sample of 5), we
can use other formulas for the mean and standard deviation in a binomial
distribution:
$$\mu = p \qquad \sigma = \sqrt{\frac{pq}{k}}$$
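The agreement between the formulas and the expected frequency distribution can be checked numerically. The sketch below, using only the standard library, builds the relative expected frequencies for k = 5 and p = 0.4, computes the mean and standard deviation directly from that distribution, and compares them with kp and the square root of kpq. Variable and function names are hypothetical.

```python
import math

def binomial_relative_frequencies(k, p):
    """Relative expected frequencies of Y = 0..k: C(k, Y) * p**Y * q**(k - Y)."""
    q = 1 - p
    return [math.comb(k, y) * p**y * q**(k - y) for y in range(k + 1)]

k, p = 5, 0.4
rel = binomial_relative_frequencies(k, p)

# Mean and standard deviation computed from the distribution itself.
mean = sum(y * f for y, f in enumerate(rel))
sd = math.sqrt(sum((y - mean) ** 2 * f for y, f in enumerate(rel)))

# The shortcut formulas for counts and for proportions.
mean_formula = k * p
sd_formula = math.sqrt(k * p * (1 - p))
mean_proportion = p
sd_proportion = math.sqrt(p * (1 - p) / k)

print(mean, mean_formula)          # both close to 2.0
print(sd, sd_formula)              # both close to 1.09545
print(mean_proportion, sd_proportion)
```

The standard deviation on the proportion scale is simply the count-scale value divided by k, which is why the two pairs of formulas describe the same distribution.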
$$C(k, Y)\,p^{Y} q^{k-Y} \qquad (4.1)$$
TABLE 4.3
Some expected frequencies of males and females for samples of 17 offspring on the assumption that
the sex ratio is 1:1 [$p_♂ = 0.5$, $q_♀ = 0.5$; $(p_♂ + q_♀)^k = (0.5 + 0.5)^{17}$].
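$$\frac{1}{e^{\mu}},\ \frac{\mu}{1!\,e^{\mu}},\ \frac{\mu^2}{2!\,e^{\mu}},\ \frac{\mu^3}{3!\,e^{\mu}},\ \frac{\mu^4}{4!\,e^{\mu}},\ \ldots \qquad (4.2)$$
0, 1, 2, 3, 4, ...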
BOX 4.1
Calculation of expected Poisson frequencies.
Yeast cells in 400 squares of a hemacytometer: $\bar{Y}$ = 1.8 cells per square; n = 400
squares sampled.
(1)                (2)           (3)                    (4)
Number of          Observed      Absolute               Deviation from
cells per square   frequencies   expected frequencies   expectation
Y                  f             $\hat{f}$              $f - \hat{f}$
0                  75            66.1                   +
1                  103           119.0                  −
2                  121           107.1                  +
3                  54            64.3                   −
4                  30            28.9                   +
5                  13            10.4                   +
6                  2             3.1                    −
7                  1             0.8                    +
8                  0             0.2                    −
9                  1             0.0                    +
5-9 (grouped)      17            14.5                   +
Total              400           399.9
Source: "Student" (1907).
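Computational steps
Flow of computation based on Expression (4.3) multiplied by n, since we wish
to obtain absolute expected frequencies, $\hat{f}$.
4. $\hat{f}_2 = \hat{f}_1\left(\frac{\bar{Y}}{2}\right) = 119.02\left(\frac{1.8}{2}\right) = 107.11$
5. $\hat{f}_3 = \hat{f}_2\left(\frac{\bar{Y}}{3}\right) = 107.11\left(\frac{1.8}{3}\right) = 64.27$
6. $\hat{f}_4 = \hat{f}_3\left(\frac{\bar{Y}}{4}\right) = 64.27\left(\frac{1.8}{4}\right) = 28.92$
7. $\hat{f}_5 = \hat{f}_4\left(\frac{\bar{Y}}{5}\right) = 28.92\left(\frac{1.8}{5}\right) = 10.41$
8. $\hat{f}_6 = \hat{f}_5\left(\frac{\bar{Y}}{6}\right) = 10.41\left(\frac{1.8}{6}\right) = 3.12$
9. $\hat{f}_7 = \hat{f}_6\left(\frac{\bar{Y}}{7}\right) = 3.12\left(\frac{1.8}{7}\right) = 0.80$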
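The stepwise computation in Box 4.1 lends itself to a short loop. The following sketch assumes the starting term is n divided by e raised to the mean (the Poisson term for Y = 0) and then multiplies each term by the mean divided by the next count, as in the steps shown. The function name is hypothetical; the mean of 1.8 and n = 400 are the values given in the box.

```python
import math

def poisson_expected_frequencies(mean, n, max_count):
    """Absolute expected Poisson frequencies for counts 0..max_count."""
    freqs = [n * math.exp(-mean)]            # expected frequency for Y = 0
    for y in range(1, max_count + 1):
        freqs.append(freqs[-1] * mean / y)   # previous term times mean / Y
    return freqs

expected = poisson_expected_frequencies(mean=1.8, n=400, max_count=9)
rounded = [round(f, 1) for f in expected]
for y, f in enumerate(rounded):
    print(y, f)
```

Carrying full precision through the loop and rounding only at the end avoids the small discrepancies that accumulate when each rounded value is reused by hand.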
ence of the first one, but is higher than the probability for the first cell. This
would result in a clumping of the items in the classes at the tails of the distribution
so that there would be some squares with larger numbers of cells than expected,
others with fewer numbers.
The biological interpretation of the dispersion pattern varies with the
problem. The yeast cells seem to be randomly distributed in the counting
chamber, indicating thorough mixing of the suspension. Red blood cells, on the
other hand, will often stick together because of an electrical charge unless the
proper suspension fluid is used. This so-called rouleaux effect would be indicated
by clumping of the observed frequencies.
Note that in Box 4.1, as in the subsequent tables giving examples of the
application of the Poisson distribution, we group the low frequencies at one
tail of the curve, uniting them by means of a bracket. This tends to simplify
the patterns of distribution somewhat. However, the main reason for this grouping
is related to the G test for goodness of fit (of observed to expected frequencies),
which is discussed in Section 13.2. For purposes of this test, no expected
frequency $\hat{f}$ should be less than 5.
Before we turn to other examples, we need to learn a few more facts about
the Poisson distribution. You probably noticed that in computing expected
frequencies, we needed to know only one parameter: the mean of the distribution.
By comparison, in the binomial distribution we needed two parameters,
p and k. Thus, the mean completely defines the shape of a given Poisson distribution.
From this it follows that the variance is some function of the mean. In
a Poisson distribution, we have a very simple relationship between the two:
$\mu = \sigma^2$, the variance being equal to the mean. The variance of the number of
yeast cells per square based on the observed frequencies in Box 4.1 equals 1.965,
not much larger than the mean of 1.8, indicating again that the yeast cells are
distributed in Poisson fashion, hence randomly. This relationship between variance
and mean suggests a rapid test of whether an observed frequency distribution
is distributed in Poisson fashion even without fitting expected frequencies
to the data. We simply compute a coefficient of dispersion
$$CD = \frac{s^2}{\bar{Y}}$$
This value will be near 1 in distributions that are essentially Poisson distributions,
will be > 1 in clumped samples, and will be < 1 in cases of repulsion. In
the yeast cell example, CD = 1.092.
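The coefficient of dispersion for the yeast cell example can be recomputed from the observed frequencies of Box 4.1. The sketch below takes those frequencies as 75, 103, 121, 54, 30, 13, 2, 1, 0, 1 for counts 0 through 9 (a reading of the box that sums to the stated 400 squares) and forms the ratio of the sample variance to the mean. Variable names are hypothetical.

```python
counts = list(range(10))                             # cells per square, Y
observed = [75, 103, 121, 54, 30, 13, 2, 1, 0, 1]    # observed frequencies, f

n = sum(observed)
mean = sum(y * f for y, f in zip(counts, observed)) / n

# Sample variance of a frequency distribution: (sum f*Y^2 - n*mean^2) / (n - 1).
variance = (sum(f * y**2 for y, f in zip(counts, observed)) - n * mean**2) / (n - 1)

cd = variance / mean    # coefficient of dispersion
print(f"n = {n}, mean = {mean:.3f}, variance = {variance:.3f}, CD = {cd:.3f}")
```

A value this close to 1 is what the text describes as consistent with a Poisson, hence random, distribution of cells.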
The shapes of five Poisson distributions of different means are shown in
Figure 4.3 as frequency polygons (a frequency polygon is formed by the line
connecting successive midpoints in a bar diagram). We notice that for the low
value of μ = 0.1 the frequency polygon is extremely L-shaped, but with an
increase in the value of μ the distributions become humped and eventually
nearly symmetrical.
We conclude our study of the Poisson distribution with a consideration of
two examples. The first example (Table 4.5) shows the distribution of a number
FIGURE 4.3
Frequency polygons of the Poisson distribution for various values of the mean.
TABLE 4.5
        Observed      Poisson expected   Deviation from
        frequencies   frequencies        expectation
Y       f             $\hat{f}$          $f - \hat{f}$
0       447           406.3              +
1       132           189.0              −
2       42            44.0               −
3
4
5+
Total   647           647.0
TABLE 4.6
(1)             (2)           (3)           (4)
Number of                     Poisson       Deviation
weevils         Observed      expected      from
emerging        frequencies   frequencies   expectation
per bean
Y               f             $\hat{f}$     $f - \hat{f}$
0               61            70.4          −
1               50            32.7          +
2               1             7.6
3               0             1.2
4               0             0.1
2-4 (grouped)   1             8.9           −
Total           112           112.0
Exercises
4.1 The two columns below give fertility of eggs of the CP strain of Drosophila
melanogaster raised in 100 vials of 10 eggs each (data from R. R. Sokal). Find
the expected frequencies on the assumption of independence of mortality for
each egg in a vial. Use the observed mean. Calculate the expected variance and
compare it with the observed variance. Interpret results, knowing that the eggs
of each vial are siblings and that the different vials contain descendants from
different parent pairs. ANS. $\sigma^2$ = 2.417, $s^2$ = 6.636. There is evidence that
mortality rates are different for different vials.
Number of eggs
hatched Number of vials
Y f
0 1
1 3
2 8
3 10
4 6
5 15
6 14
7 12
8 13
9 9
10 9
4.2 In human beings the sex ratio of newborn infants is about 100 ♀♀ : 105 ♂♂. Were
we to take 10,000 random samples of 6 newborn infants from the total population
of such infants for one year, what would be the expected frequency of groups
of 6 males, 5 males, 4 males, and so on?
4.3 The Army Medical Corps is concerned over the intestinal disease X. From
previous experience it knows that soldiers suffering from the disease invariably
harbor the pathogenic organism in their feces and that to all practical purposes
every stool specimen from a diseased person contains the organism. However,
the organisms are never abundant, and thus only 20% of all slides prepared by
the standard procedure will contain some. (We assume that if an organism is
present on a slide it will be seen.) How many slides should laboratory technicians
be directed to prepare and examine per stool specimen, so that in case a specimen
is positive, it will be erroneously diagnosed negative in fewer than 1% of
the cases (on the average)? On the basis of your answer, would you recommend
that the Corps attempt to improve its diagnostic methods? ANS. 21 slides.
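The stated answer can be checked with a few lines of Python. The sketch assumes that slides from the same positive specimen are independent, so the chance that all n slides miss the organism is 0.8 raised to the power n; it then finds the smallest n for which that chance falls below 1%. The function and parameter names are hypothetical.

```python
def slides_needed(p_detect_per_slide, max_false_negative):
    """Smallest number of slides for which the chance that every slide
    misses the organism is below max_false_negative (slides assumed independent)."""
    n = 1
    while (1 - p_detect_per_slide) ** n >= max_false_negative:
        n += 1
    return n

n_slides = slides_needed(p_detect_per_slide=0.20, max_false_negative=0.01)
print(n_slides, 0.8 ** n_slides)
```

Twenty slides would still leave a false-negative chance slightly above 1%, which is why the answer is 21.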
4.4 Calculate Poisson expected frequencies for the frequency distribution given in
Table 2.2 (number of plants of the sedge Carex flacca found in 500 quadrats).
4.5 A cross is made in a genetic experiment in Drosophila in which it is expected
that { of the progeny will have white eyes and 5 will have the trait called "singed
bristles." Assume that the two gene loci segregate independently, (a) What
proportion of the progeny should exhibit both traits simultaneously? (b) If four
flies are sampled at random, what is the probability that they will all be
white-eyed? (c) What is the probability that none of the four flies will have either
white eyes or "singed bristles?" (d) If two flies are sampled, what is the probability
that at least one of the flies will have either white eyes or "singed bristles" or
both traits? ANS. (a) (b) (i) 4 ; (c) [(1 - i)(l - i)] 4 ; (d) 1 - [(1 - i)(l -
4.6 Those readers who have had a semester or two of calculus may wish to try to
prove that Expression (4.1) tends to Expression (4.2) as k becomes indefinitely
large (and p becomes infinitesimal, so that μ = kp remains constant). HINT:
$$\left(1 - \frac{x}{n}\right)^{n} \to e^{-x} \quad \text{as } n \to \infty$$
4.7 If the frequency of the gene A is p and the frequency of the gene a is q, what
are the expected frequencies of the zygotes AA, Aa, and aa (assuming a diploid
zygote represents a random sample of size 2)? What would the expected frequency
be for an autotetraploid (for a locus close to the centromere a zygote can be
thought of as a random sample of size 4)? ANS. P{AA} = $p^2$, P{Aa} = $2pq$,
P{aa} = $q^2$, for a diploid; and P{AAAA} = $p^4$, P{AAAa} = $4p^3q$, P{AAaa} =
$6p^2q^2$, P{Aaaa} = $4pq^3$, P{aaaa} = $q^4$, for a tetraploid.
4.8 Summarize and compare the assumptions and parameters on which the binomial
and Poisson distributions are based.
4.9 A population consists of three types of individuals, $A_1$, $A_2$, and $A_3$, with relative
frequencies of 0.5, 0.2, and 0.3, respectively. (a) What is the probability of obtaining
only individuals of type $A_1$ in samples of size 1, 2, 3, ..., n? (b) What would be
the probabilities of obtaining only individuals that were not of type $A_1$ or $A_2$
in a sample of size n? (c) What is the probability of obtaining a sample containing
at least one representation of each type in samples of size 1, 2, 3, 4, 5, ..., n?
ANS. (a) 1/2, 1/4, 1/8, ..., $1/2^n$. (b) $(0.3)^n$. (c) 0, 0, 0.18, 0.36, 0.507,
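The values listed for part (c) can be reproduced by inclusion and exclusion: start from 1, subtract the probability that each single type is absent from the sample, add back the probability that each pair of types is absent, and so on. The sketch below is one way to organize that calculation; the function name is hypothetical, and the method is offered as a check rather than as the derivation intended by the exercise.

```python
from itertools import combinations

def prob_all_types_present(freqs, n):
    """Probability that a random sample of size n contains every type,
    computed by inclusion and exclusion over the sets of missing types."""
    total = 0.0
    types = range(len(freqs))
    for r in range(len(freqs) + 1):
        for missing in combinations(types, r):
            # Probability that a single draw avoids all the 'missing' types.
            remaining = sum(f for i, f in enumerate(freqs) if i not in missing)
            total += (-1) ** r * remaining ** n
    return total

freqs = [0.5, 0.2, 0.3]
answers = [prob_all_types_present(freqs, n) for n in range(1, 6)]
print([round(a, 3) for a in answers])
```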
4.10 If the average number of weed seeds found in a j o u n c e sample of grass seed is
1.1429, what would you expect the frequency distribution of weed seeds to be
in ninety-eight 4-ounce samples? (Assume there is random distribution of the
weed seeds.)
CHAPTER
The Normal
Probability Distribution
FIGURE 5.1
A probability distribution of a continuous variable.
FIGURE 5.2
Histogram based on relative expected frequencies resulting from expansion of binomial $(0.5 + 0.5)^{10}$.
The Y axis measures the number of pigmentation factors F.
Conversely,
FIGURE 5.4
Areas under the normal probability density function and the cumulative normal distribution
function.
Experiment 5.1. You are asked to sample from two populations. The first one is an
approximately normal frequency distribution of 100 wing lengths of houseflies. The
second population deviates strongly from normality. It is a frequency distribution of the
total annual milk yield of 100 Jersey cows. Both populations are shown in Table 5.1.
You are asked to sample from them repeatedly in order to simulate sampling from an
infinite population. Obtain samples of 35 items from each of the two populations. This
can be done by obtaining two sets of 35 two-digit random numbers from the table of
random numbers (Table I), with which you became familiar in Experiment 4.1. Write
down the random numbers in blocks of five, and copy next to them the value of Y (for
either wing length or milk yield) corresponding to the random number. An example of
such a block of five numbers and the computations required for it are shown in the
TABLE 5.1
Populations of wing lengths and milk yields. Column 1. Rank number. Column 2. Lengths (in
mm × $10^{-1}$) of 100 wings of houseflies arrayed in order of magnitude; μ = 45.5, $\sigma^2$ = 15.21, σ = 3.90;
distribution approximately normal. Column 3. Total annual milk yield (in hundreds of pounds) of
100 two-year-old registered Jersey cows arrayed in order of magnitude; μ = 66.61, $\sigma^2$ = 124.4779,
σ = 11.1597; distribution departs strongly from normality.
(1) (2) (3) (1) (2) (3) (1) (2) (3) (1) (2) (3) (1) (2) (3)
01 36 51 21 42 58 41 45 61 61 47 67 81 49 76
02 37 51 22 42 58 42 45 61 62 47 67 82 49 76
03 38 51 23 42 58 43 45 61 63 47 68 83 49 79
04 38 53 24 43 58 44 45 61 64 47 68 84 49 80
05 39 53 25 43 58 45 45 61 65 47 69 85 50 80
06 39 53 26 43 58 46 45 62 66 47 69 86 50 81
07 40 54 27 43 58 47 45 62 67 47 69 87 50 82
08 40 55 28 43 58 48 45 62 68 47 69 88 50 82
09 40 55 29 43 58 49 45 62 69 47 69 89 50 82
10 40 56 30 43 58 50 45 63 70 48 69 90 50 82
11 41 56 31 43 58 51 46 63 71 48 70 91 51 83
12 41 56 32 44 59 52 46 63 72 48 72 92 51 85
13 41 57 33 44 59 53 46 64 73 48 73 93 51 87
14 41 57 34 44 59 54 46 65 74 48 73 94 51 88
15 41 57 35 44 60 55 46 65 75 48 74 95 52 88
16 41 57 36 44 60 56 46 65 76 48 74 96 52 89
17 42 57 37 44 60 57 46 65 77 48 74 97 53 93
18 42 57 38 44 60 58 46 65 78 49 74 98 53 94
19 42 57 39 44 60 59 46 67 79 49 75 99 54 96
20 42 57 40 44 61 60 46 67 80 49 76 00 55 98
Source: Column 2: Data adapted from Sokal and Hunter (1955). Column 3: Data from Canadian government
records.
Random    Wing length
number    Y
16        41
59        46
99        54
36        44
21        42
$\sum Y$ = 227
$\sum Y^2$ = 10,413
$\bar{Y}$ = 45.4
Those with ready access to a computer may prefer to program this exercise and take
many more samples. These samples and the computations carried out for each sample
will be used in subsequent chapters. Therefore, preserve your data carefully!
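For readers who do want to program the exercise, the following minimal Python sketch sets up the wing-length population of Table 5.1 (entered as value and count pairs in rank order), maps two-digit random numbers to variates with 00 standing for rank 100, and reproduces the block of five shown above. The function names and the fixed seed are hypothetical choices made only for illustration; the standard library generator stands in for the printed table of random numbers.

```python
import random
import statistics

# Wing-length population of Table 5.1 as (value, number of items), in rank order.
WING_COUNTS = [(36, 1), (37, 1), (38, 2), (39, 2), (40, 4), (41, 6), (42, 7),
               (43, 8), (44, 9), (45, 10), (46, 10), (47, 9), (48, 8), (49, 7),
               (50, 6), (51, 4), (52, 2), (53, 2), (54, 1), (55, 1)]
WINGS = [value for value, count in WING_COUNTS for _ in range(count)]

def variate_for(random_number, population=WINGS):
    """Map a two-digit random number (00 stands for rank 100) to its variate."""
    rank = 100 if random_number == 0 else random_number
    return population[rank - 1]

def summarize(sample):
    """Return the sum, the sum of squares, and the mean of a sample."""
    return sum(sample), sum(y * y for y in sample), statistics.mean(sample)

# The block of five random numbers used in the worked example.
book_sample = [variate_for(r) for r in (16, 59, 99, 36, 21)]
print(book_sample, summarize(book_sample))

# A fresh sample of 35 items, drawn with replacement to mimic an infinite population.
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
new_sample = [variate_for(rng.randrange(100)) for _ in range(35)]
print(summarize(new_sample))
```

The milk-yield population can be handled the same way by passing a second list of 100 values as the `population` argument.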
$$\frac{41 - 45.5}{3.90} = -1.1538$$
This means that the first wing length is 1.1538 standard deviations below the
true mean of the population. The deviation from the mean measured in standard
deviation units is called a standardized deviate or standard deviate. The arguments
of Table II, expressing distance from the mean in units of σ, are called
standard normal deviates. Group all 35 variates in a frequency distribution; then
do the same for milk yields. Since you know the parametric mean and standard
deviation, you need not compute each deviate separately, but can simply write
down class limits in terms of the actual variable as well as in standard deviation
form. The class limits for such a frequency distribution are shown in Table
5.2. Combine the results of your sampling with those of your classmates and
study the percentage of the items in the distribution one, two, and three standard
deviations to each side of the mean. Note the marked differences in distribution
between the housefly wing lengths and the milk yields.
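The standardized deviate and the class limits in actual units are one-line computations. The sketch below uses the parametric mean and standard deviation given for the wing lengths (45.5 and 3.90) and assumes class limits spaced half a standard deviation apart from three standard deviations below to three above the mean; the function and variable names are hypothetical.

```python
def standard_deviate(y, mu, sigma):
    """Deviation of y from the mean, in standard deviation units."""
    return (y - mu) / sigma

mu, sigma = 45.5, 3.90
z = standard_deviate(41, mu, sigma)
print(round(z, 4))

# Class limits from -3 sigma to +3 sigma in steps of half a standard deviation,
# expressed in the units of the variable itself.
limits = [mu + half_steps * sigma / 2 for half_steps in range(-6, 7)]
print([round(limit, 2) for limit in limits])
```

With these limits a wing length of 36 or 37 falls between 2.5 and 2 standard deviations below the mean, so each variate can be tallied into its class without computing its deviate individually.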
TABLE 5.2
Table for recording frequency distributions of standard deviates $(Y_i - \mu)/\sigma$ for samples of
Experiment 5.1.
                 Wing lengths (μ = 45.5)           Milk yields (μ = 66.61)
Class limits     Variates falling                  Variates falling
                 between these limits    f         between these limits    f
−∞ to −3σ
−3σ to −2½σ
−2½σ to −2σ      36, 37
−2σ to −1½σ      38, 39
−1½σ to −σ       40, 41                            51-55
−σ to −½σ        42, 43                            56-61
−½σ to μ         44, 45                            62-66
μ to ½σ          46, 47                            67-72
½σ to σ          48, 49                            73-77
σ to 1½σ         50, 51                            78-83
1½σ to 2σ        52, 53                            84-88
2σ to 2½σ        54, 55                            89-94
2½σ to 3σ                                          95-98
3σ to +∞
FIGURE 5.5
Normal distributions. (Axis: cumulative percent in probability scale.)
FIGURE 5.6
Examples of some frequency distributions with their cumulative distributions plotted with the
ordinate in normal probability scale. (See Box 5.1 for explanation.)
BOX 5.1
Graphic test for normality of a frequency distribution and estimate of mean and
standard deviation. Use of arithmetic probability paper.
Birth weights of male Chinese in ounces, from Box 3.2.
Computational steps
1. Prepare a frequency distribution as shown in columns (1), (2), and (3).
2. Form a cumulative frequency distribution as shown in column (4). It is obtained
by successive summation of the frequency values. In column (5) express the
cumulative frequencies as percentages of total sample size n, which is 9465 in
this example. These percentages are 100 times the values of column (4) divided
by 9465.
3. Graph the upper class limit of each class along the abscissa (in linear scale)
against percent cumulative frequency along the ordinate (in probability scale)
on normal probability paper (see Figure 5.7). A straight line is fitted to the points
by eye, preferably using a transparent plastic ruler, which permits all the points
to be seen as the line is drawn. In drawing the line, most weight should be
given to the points between cumulative frequencies of 25% to 75%. This is
because a difference of a single item may make appreciable changes in the
percentages at the tails. We notice that the upper frequencies deviate to the right
of the straight line. This is typical of data that are skewed to the right (see
Figure 5.6D).
4. Such a graph permits the rapid estimation of the mean and standard deviation
of a sample. The mean is approximated by a graphic estimation of the median.
The more normal the distribution is, the closer the mean will be to the median.
The median is estimated by dropping a perpendicular from the intersection of
the 50% point on the ordinate and the cumulative frequency curve to the
abscissa (see Figure 5.7). The estimate of the mean of 110.7 oz is quite close to
the computed mean of 109.9 oz.
5. The standard deviation can be estimated by dropping similar perpendiculars
from the intersections of the 15.9% and the 84.1% points with the cumulative
curve, respectively. These points enclose the portion of a normal curve repre-
sented byzyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA
μ ± σ. By measuring the difference between these perpendiculars and
dividing this by 2, we obtain an estimate of one standard deviation. In this
instance the estimate is s = 13.6, since the difference is 27.2 oz divided by 2. This
is a close approximation to the computed value of 13.59 oz.
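The reasoning behind steps 4 and 5 can be illustrated on an idealized normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library with the computed mean and standard deviation quoted in the box (109.9 oz and 13.59 oz) as assumed parameters; it does not use the birth weight data themselves. It reads off the 50%, 15.9%, and 84.1% points and halves the distance between the last two.

```python
from statistics import NormalDist

# Idealized normal curve with the computed mean and standard deviation from Box 5.1 (oz).
birth_weights = NormalDist(mu=109.9, sigma=13.59)

median_estimate = birth_weights.inv_cdf(0.50)   # 50% point estimates the mean
lower = birth_weights.inv_cdf(0.159)            # 15.9% point, near mu - sigma
upper = birth_weights.inv_cdf(0.841)            # 84.1% point, near mu + sigma
sd_estimate = (upper - lower) / 2

print(round(median_estimate, 1), round(sd_estimate, 2))
```

On a perfectly normal curve the procedure recovers the parameters almost exactly; with real, somewhat skewed data the graphic estimates differ slightly from the computed values, as the box reports for the mean.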
Exercises
5.1 Using the information given in Box 3.2, what is the probability of obtaining an
individual with a negative birth weight? What is this probability if we assume
that birth weights are normally distributed? ANS. The empirical estimate is zero.
If a normal distribution can be assumed, it is the probability that a standard
normal deviate is less than (0 − 109.9)/13.593 = −8.085. This value is beyond
the range of most tables, and the probability can be considered zero for practical
purposes.
5.2 Carry out the operations listed in Exercise 5.1 on the transformed data generated
in Exercise 2.6.
5.3 Assume you know that the petal length of a population of plants of species X
is normally distributed with a mean of μ = 3.2 cm and a standard deviation of
σ = 1.8. What proportion of the population would be expected to have a petal
length (a) greater than 4.5 cm? (b) Greater than 1.78 cm? (c) Between 2.9 and
3.6 cm? ANS. (a) = 0.2353, (b) = 0.7845, and (c) = 0.154.
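The three proportions in Exercise 5.3 can be checked without tables by using the cumulative normal distribution function in the Python standard library. This is a sketch of one way to do the lookup, with hypothetical variable names; the results should land close to the printed answers, and small differences in the fourth decimal place can arise from interpolation in printed tables.

```python
from statistics import NormalDist

petal = NormalDist(mu=3.2, sigma=1.8)   # petal length in cm, as given in Exercise 5.3

p_a = 1 - petal.cdf(4.5)                # (a) greater than 4.5 cm
p_b = 1 - petal.cdf(1.78)               # (b) greater than 1.78 cm
p_c = petal.cdf(3.6) - petal.cdf(2.9)   # (c) between 2.9 and 3.6 cm

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```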
5.4 Perform a graphic analysis of the butterfat data given in Exercise 3.3, using probability
paper. In addition, plot the data on probability paper with the abscissa
in logarithmic units. Compare the results of the two analyses.
5.5 Assume that traits A and B are independent and normally distributed with parameters
$\mu_A$ = 28.6, $\sigma_A$ = 4.8, $\mu_B$ = 16.2, and $\sigma_B$ = 4.1. You sample two individuals
at random. (a) What is the probability of obtaining samples in which both
individuals measure less than 20 for the two traits? (b) What is the probability
that at least one of the individuals is greater than 30 for trait B? ANS.
(a) P{A < 20}P{B < 20} = (0.3654)(0.082,38) = 0.030; (b) 1 − (P{A < 30}) ×
(P{B < 30}) = 1 − (0.6147)(0.9960) = 0.3856.
5.6 Perform the following operations on the data of Exercise 2.4. (a) If you have
not already done so, make a frequency distribution from the data and graph the
results in the form of a histogram. (b) Compute the expected frequencies for each
of the classes based on a normal distribution with μ = $\bar{Y}$ and σ = s. (c) Graph
the expected frequencies in the form of a histogram and compare them with the
observed frequencies. (d) Comment on the degree of agreement between observed
and expected frequencies.
5.7 Let us approximate the observed frequencies in Exercise 2.9 with a normal frequency
distribution. Compare the observed frequencies with those expected when
a normal distribution is assumed. Compare the two distributions by forming
and superimposing the observed and the expected histograms and by using a
hanging histogram. ANS. The expected frequencies for the age classes are: 17.9,
48.2, 72.0, 51.4, 17.5, 3.0. This is clear evidence for skewness in the observed
distribution.
5.8 Perform a graphic analysis on the following measurements. Are they consistent
with what one would expect in sampling from a normal distribution?
The following data are total lengths (in cm) of bass from a southern lake:
29.9 40.2 37.8 19.7 30.0 29.7 19.4 39.2 24.7 20.4
19.1 34.7 33.5 18.3 19.4 27.3 38.2 16.2 36.8 33.1
41.4 13.6 32.2 24.3 19.1 37.4 23.8 33.3 31.6 20.1
17.2 13.3 37.7 12.6 39.6 24.6 18.6 18.0 33.7 38.2
C o m p u t e t h e m e a n , t h e s t a n d a r d d e v i a t i o n , a n d t h e coefficient of v a r i a t i o n . M a k e
a h i s t o g r a m of t h e d a t a . D o t h e d a t a s e e m c o n s i s t e n t w i t h a n o r m a l d i s t r i b u t i o n
o n t h e b a s i s o f a g r a p h i c a n a l y s i s ? If n o t , w h a t t y p e o f d e p a r t u r e is s u g g e s t e d ?
A N S . F = 2 7 . 4 4 7 5 , s = 8 . 9 0 3 5 , V = 3 2 . 4 3 8 . T h e r e is a s u g g e s t i o n o f b i m o d a l i t y .
CHAPTER 6
Estimation and
Hypothesis Testing
in which case the t distribution must be used. We shall introduce the t distribution
in Section 6.4. The application of t to the computation of confidence
limits for statistics of small samples with unknown population standard deviations
is shown in Section 6.5. Another important distribution, the chi-square
distribution, is explained in Section 6.6. Then it is applied to setting confidence
limits for the variance in Section 6.7. The theory of hypothesis testing is introduced
in Section 6.8 and is applied in Section 6.9 to a variety of cases exhibiting
the normal or t distributions. Finally, Section 6.10 illustrates hypothesis testing
for variances by means of the chi-square distribution.
We commence our study of the distribution and variance of means with a sampling
experiment.
Experiment 6.1. You were asked to retain from Experiment 5.1 the means of the seven
samples of 5 housefly wing lengths and the seven similar means of milk yields. We
can collect these means from every student in a class, possibly adding them to the
sampling results of previous classes, and construct a frequency distribution of these means.
For each variable we can also obtain the mean of the seven means, which is a mean
of a sample of 35 items. Here again we shall make a frequency distribution of these means,
although it takes a considerable number of samplers to accumulate a sufficient number
of samples of 35 items for a meaningful frequency distribution.
In Table 6.1 we show a frequency distribution of 1400 means of samples
of 5 housefly wing lengths. Consider columns (1) and (3) for the time being.
Actually, these samples were obtained not by biostatistics classes but by a digital
computer, enabling us to collect these values with little effort. Their mean
and standard deviation are given at the foot of the table. These values are plotted
on probability paper in Figure 6.1. Note that the distribution appears quite
normal, as does that of the means based on 200 samples of 35 wing lengths
shown in the same figure. This illustrates an important theorem: The means of
samples from a normally distributed population are themselves normally distributed
regardless of sample size n. Thus, we note that the means of samples from the
normally distributed housefly wing lengths are normally distributed whether
they are based on 5 or 35 individual readings.
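The sampling experiment described above can be repeated on any computer. The following is only a minimal sketch, not the program used for Table 6.1: it assumes that wing lengths are exactly normal with μ = 45.5 (Table 6.1) and σ = 3.90 (the parametric value quoted in Experiment 6.2), and the function name `sample_means` and the seed are illustrative choices.

```python
import random
import statistics

MU = 45.5     # parametric mean of wing lengths (units of 0.1 mm), from Table 6.1
SIGMA = 3.90  # parametric standard deviation, as quoted in Experiment 6.2


def sample_means(n, n_samples, rng):
    """Return the means of n_samples random samples of size n from N(MU, SIGMA)."""
    return [
        statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(n_samples)
    ]


rng = random.Random(1)  # fixed seed (arbitrary) so that a run can be repeated
means_5 = sample_means(5, 1400, rng)   # 1400 samples of 5, as in Table 6.1
means_35 = sample_means(35, 200, rng)  # 200 samples of 35, as in Figure 6.1

sd_5 = statistics.stdev(means_5)
sd_35 = statistics.stdev(means_35)
print(f"n = 5:  mean of means = {statistics.fmean(means_5):.3f}, sd of means = {sd_5:.3f}")
print(f"n = 35: mean of means = {statistics.fmean(means_35):.3f}, sd of means = {sd_35:.3f}")
```

Both sets of means should center near 45.5, and the means of samples of 35 should scatter much less than the means of samples of 5. Plotting either list on probability paper, or as a histogram, gives a rough check on normality.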
Similarly obtained distributions of means of the heavily skewed milk yields,
as shown in Figure 6.2, appear to be close to normal distributions. However,
the means based on five milk yields do not agree with the normal nearly as
well as do the means of 35 items. This illustrates another theorem of fundamental
importance in statistics: As sample size increases, the means of samples
drawn from a population of any distribution will approach the normal distribution.
This theorem, when rigorously stated (about sampling from populations with
finite variances), is known as the central limit theorem. The importance of this
theorem is that if n is large enough, it permits us to use the normal distri-
6.1 / d i s t r i b u t i o n a n d v a r i a n c e o f means 95
TABLE 6.1
Frequency distribution of means of 1400 random samples of
5 housefly wing lengths. (Data from Table 5.1.) Class marks
chosen to give intervals of ½σ_Ȳ to each side of the parametric
mean μ.

(1)                  (2)                  (3)
Class mark Ȳ         Class mark
(in mm × 10⁻¹)       (in σ_Ȳ units)       f
39.832               −3¼                     1
40.704               −2¾                    11
41.576               −2¼                    19
42.448               −1¾                    64
43.320               −1¼                   128
44.192               −¾                    247
45.064               −¼                    226
          μ = 45.5
45.936                ¼                    259
46.808                ¾                    231
47.680                1¼                   121
48.552                1¾                    61
49.424                2¼                    23
50.296                2¾                     6
51.168                3¼                     3
                                          1400
FIGURE 6.1
Graphic analysis of means of 1400 random samples of 5 housefly wing lengths (from Table 6.1)
and of means of 200 random samples of 35 housefly wing lengths.
[Panels: samples of 5; samples of 35. Abscissa: housefly wing lengths in σ_Ȳ units.]
[Figure 6.2. Panels: samples of 5; samples of 35. Abscissa: milk yields in σ_Ȳ units.
Ordinate: cumulative percent on probability scale.]
$$\bar{Y}_w = \frac{\sum^n w_i Y_i}{\sum^n w_i}$$
for the weighted mean. We shall state without proof that the variance of the
weighted sum of independent items $\sum^n w_i Y_i$ is
$$\mathrm{Var}\left(\sum^n w_i Y_i\right) = \sum^n w_i^2 \sigma_i^2 \qquad (6.1)$$
Since the weights $w_i$ in this case equal 1, $\sum^n w_i = n$, and we can rewrite the above
expression as
$$\sigma_{\bar{Y}}^2 = \frac{\sum^n \sigma_i^2}{n^2}$$
(6.3)
TABLE 6.2
Means, standard deviations, and standard deviations of means
(standard errors) of five random samples of 5 and 35 housefly
wing lengths and Jersey cow milk yields, respectively. (Data
from Table 5.1.) Parametric values for the statistics are given
in the sixth line of each category.
(1)   (2)   (3)
Ȳ     s     s_Ȳ
Wing lengths
Milk yields
6.2 / d i s t r i b u t i o n a n d v a r i a n c e o f o t h e r statistics 101
FIGURE 6.3
Histogram of variances based on 1400 samples of 5 housefly wing lengths from Table 5.1. Abscissa
of V, which is the same as the standard error of V. Used without any qualification,
the term "standard error" conventionally implies the standard error of the
mean. "Standard deviation" used without qualification generally means standard
deviation of items in a sample or population. Thus, when you read that
means, standard deviations, standard errors, and coefficients of variation are
shown in a table, this signifies that arithmetic means, standard deviations of
items in samples, standard deviations of their means (= standard errors of
means), and coefficients of variation are displayed. The following summary
of terms may be helpful:
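BOX 6.1
Standard errors for common statistics.

    Statistic   Estimate of standard error                        df      Comments on applicability
1   Ȳ           $s_{\bar{Y}} = s/\sqrt{n} = \sqrt{s^2/n}$         n − 1   True for any population
                                                                          with finite variance
2   Median      $s_{\mathrm{med}} \approx (1.2533)s_{\bar{Y}}$    n − 1   Large samples from
                                                                          normal populations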
This means that the probability P that the sample means Ȳ will differ by no
more than 1.96 standard errors σ/√n from the parametric mean μ equals 0.95.
The expression between the brackets is an inequality, all terms of which can be
multiplied by σ/√n to yield
$$\left\{-\frac{1.96\sigma}{\sqrt{n}} \le (\bar{Y} - \mu) \le +\frac{1.96\sigma}{\sqrt{n}}\right\}$$
We can rewrite this expression as
$$\left\{-\frac{1.96\sigma}{\sqrt{n}} \le (\mu - \bar{Y}) \le +\frac{1.96\sigma}{\sqrt{n}}\right\}$$
because −a ≤ b ≤ a implies a ≥ −b ≥ −a, which can be written as −a ≤
−b ≤ a. And finally, we can transfer −Ȳ across the inequality signs, just as in an
equation it could be transferred across the equal sign. This yields the final desired
expression:
$$P\left\{\bar{Y} - \frac{1.96\sigma}{\sqrt{n}} \le \mu \le \bar{Y} + \frac{1.96\sigma}{\sqrt{n}}\right\} = 0.95 \qquad (6.4)$$
or
$$P\{\bar{Y} - 1.96\sigma_{\bar{Y}} \le \mu \le \bar{Y} + 1.96\sigma_{\bar{Y}}\} = 0.95 \qquad (6.4a)$$
Experiment 6.2. For the seven samples of 5 housefly wing lengths and the seven similar
samples of milk yields last worked with in Experiment 6.1 (Section 6.1), compute 95%
confidence limits to the parametric mean for each sample and for the total sample based
on 35 items. Base the standard errors of the means on the parametric standard deviations
of these populations (housefly wing lengths σ = 3.90, milk yields σ = 11.1597). Record
how many in each of the four classes of confidence limits (wing lengths and milk yields,
n = 5 and n = 35) are correct (that is, contain the parametric mean of the population).
Pool your results with those of other class members.
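A computer can stand in for the pooled class results of Experiment 6.2. The sketch below is hedged: it treats wing lengths as exactly normal with μ = 45.5 and σ = 3.90, uses the normal deviate 1.96 with the parametric standard error σ/√n, and counts how often the limits contain μ. The function name `coverage`, the number of trials, and the seed are illustrative assumptions.

```python
import math
import random
import statistics

MU, SIGMA = 45.5, 3.90  # parametric values for housefly wing lengths
Z_95 = 1.96             # normal deviate cutting off 2.5% in each tail


def coverage(n, n_trials, rng):
    """Fraction of trials whose 95% limits (known sigma) contain MU."""
    half_width = Z_95 * SIGMA / math.sqrt(n)  # 1.96 parametric standard errors
    hits = 0
    for _ in range(n_trials):
        ybar = statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
        if ybar - half_width <= MU <= ybar + half_width:
            hits += 1
    return hits / n_trials


rng = random.Random(2)  # arbitrary fixed seed
cov_5 = coverage(5, 2000, rng)
cov_35 = coverage(35, 2000, rng)
print(f"n = 5:  {cov_5:.3f} of the intervals contain mu")
print(f"n = 35: {cov_35:.3f} of the intervals contain mu")
```

For both sample sizes the proportion of correct intervals should settle near 0.95; only the width of the intervals differs.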
6.4 Student's t distribution
The deviations Ȳ − μ of sample means from the parametric mean of a normal
distribution are themselves normally distributed. If these deviations are divided
by the parametric standard deviation, the resulting ratios, $(\bar{Y} - \mu)/\sigma_{\bar{Y}}$, are still
normally distributed, with μ = 0 and σ = 1. Subtracting the constant μ from
every $\bar{Y}_i$ is simply an additive code (Section 3.8) and will not change the form
of the distribution of sample means, which is normal (Section 6.1). Dividing each
deviation by the constant $\sigma_{\bar{Y}}$ reduces the variance to unity, but proportionately
so for the entire distribution, so that its shape is not altered and a previously
normal distribution remains so.
If, on the other hand, we calculate the variance $s_i^2$ of each of the samples
and calculate the deviation for each mean $\bar{Y}_i$ as $(\bar{Y}_i - \mu)/s_{\bar{Y}_i}$, where $s_{\bar{Y}_i}$ stands for
the estimate of the standard error of the mean of the ith sample, we will find
the distribution of the deviations wider and more peaked than the normal
distribution. This is illustrated in Figure 6.4, which shows the ratio $(\bar{Y}_i - \mu)/s_{\bar{Y}_i}$ for
the 1400 samples of five housefly wing lengths of Table 6.1. The new distribution
ranges wider than the corresponding normal distribution, because the denominator
is the sample standard error rather than the parametric standard error and
will sometimes be smaller and sometimes greater than expected. This increased
variation will be reflected in the greater variance of the ratio $(\bar{Y} - \mu)/s_{\bar{Y}}$. The
FIGURE 6.4
Distribution of quantity $t_s = (\bar{Y} - \mu)/s_{\bar{Y}}$ along abscissa computed for 1400 samples of 5 housefly wing
lengths presented as a histogram and as a cumulative frequency distribution. Right-hand ordinate
represents frequencies for the histogram; left-hand ordinate is cumulative frequency in probability
scale.
FIGURE 6.5
Frequency curves of t distributions for 1 and 2 degrees
of freedom compared with the normal distribution. [Abscissa: t units.]
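The widening caused by dividing by the sample standard error rather than the parametric one can be seen in a small simulation. This is a sketch under stated assumptions (normal wing lengths with μ = 45.5 and σ = 3.90, samples of 5, an arbitrary seed); the function name `ratios` is illustrative. It computes both ratios for the same samples and counts how often each falls outside ±1.96.

```python
import math
import random
import statistics

MU, SIGMA, N = 45.5, 3.90, 5  # parametric values and sample size


def ratios(n_samples, rng):
    """Return (Ybar - mu)/sigma_Ybar and (Ybar - mu)/s_Ybar for each sample."""
    z_values, t_values = [], []
    for _ in range(n_samples):
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        ybar = statistics.fmean(sample)
        sigma_ybar = SIGMA / math.sqrt(N)                 # parametric standard error
        s_ybar = statistics.stdev(sample) / math.sqrt(N)  # sample standard error
        z_values.append((ybar - MU) / sigma_ybar)
        t_values.append((ybar - MU) / s_ybar)
    return z_values, t_values


rng = random.Random(3)  # arbitrary fixed seed
z_ratios, t_ratios = ratios(1400, rng)
frac_z = sum(abs(v) > 1.96 for v in z_ratios) / len(z_ratios)
frac_t = sum(abs(v) > 1.96 for v in t_ratios) / len(t_ratios)
print(f"beyond +/-1.96 using sigma_Ybar: {frac_z:.3f}")
print(f"beyond +/-1.96 using s_Ybar:     {frac_t:.3f}")
```

The first fraction should be close to 0.05. The second should be clearly larger, which is why a normal deviate cannot be used for confidence limits based on a sample standard error from a small sample.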
6.5 / c o n f i d e n c e l i m i t s b a s e d o n s a m p l e statistics 109
Armed with a knowledge of the t distribution, we are now able to set confidence
limits to the means of samples from a normal frequency distribution whose
parametric standard deviation is unknown. The limits are computed as
$L_1 = \bar{Y} - t_{\alpha[n-1]} s_{\bar{Y}}$ and $L_2 = \bar{Y} + t_{\alpha[n-1]} s_{\bar{Y}}$ for confidence limits of probability
P = 1 − α. Thus, for 95% confidence limits we use values of $t_{0.05[n-1]}$. We can rewrite
Expression (6.4a) as
$$P\{L_1 \le \mu \le L_2\} = P\{\bar{Y} - t_{\alpha[n-1]} s_{\bar{Y}} \le \mu \le \bar{Y} + t_{\alpha[n-1]} s_{\bar{Y}}\} = 1 - \alpha \qquad (6.5)$$
An example of the application of this expression is shown in Box 6.2. We can
BOX 6.2
Confidence limits for μ.
Aphid stem mother femur lengths from Box 2.1: Ȳ = 4.004; s = 0.366; n = 25.
Values for $t_{\alpha[n-1]}$ from a two-tailed t table (Table III), where 1 − α is the proportion
expressing confidence and n − 1 are the degrees of freedom:
$t_{0.05[24]} = 2.064$     $t_{0.01[24]} = 2.797$
The 95% confidence limits for the population mean μ are given by the equations
$L_1$ (lower limit) $= \bar{Y} - t_{0.05[n-1]} \dfrac{s}{\sqrt{n}}$
= 4.004 − 0.151
= 3.853
$L_2$ (upper limit) $= \bar{Y} + t_{0.05[n-1]} \dfrac{s}{\sqrt{n}}$
= 4.004 + 0.151
= 4.155
The 99% confidence limits are
$L_1 = \bar{Y} - t_{0.01[24]} \dfrac{s}{\sqrt{n}}$
= 4.004 − 0.205
= 3.799
$L_2 = \bar{Y} + t_{0.01[24]} \dfrac{s}{\sqrt{n}}$
= 4.004 + 0.205
= 4.209
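The arithmetic of Box 6.2 can be checked with a few lines of code. This is a sketch only: the critical values of t are copied from the box rather than computed, and the function name `t_confidence_limits` is an illustrative choice.

```python
import math


def t_confidence_limits(ybar, s, n, t_crit):
    """Return (L1, L2) = Ybar -/+ t * s / sqrt(n) for a tabled critical value of t."""
    standard_error = s / math.sqrt(n)  # s_Ybar, the sample standard error
    half_width = t_crit * standard_error
    return ybar - half_width, ybar + half_width


# Aphid stem mother femur lengths (Box 6.2): Ybar = 4.004, s = 0.366, n = 25
l1_95, l2_95 = t_confidence_limits(4.004, 0.366, 25, t_crit=2.064)  # t_0.05[24]
l1_99, l2_99 = t_confidence_limits(4.004, 0.366, 25, t_crit=2.797)  # t_0.01[24]
print(f"95% limits: {l1_95:.3f} to {l2_95:.3f}")
print(f"99% limits: {l1_99:.3f} to {l2_99:.3f}")
```

Rounded to three decimals, the results agree with the limits given in the box (3.853 to 4.155, and 3.799 to 4.209).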
Experiment 6.3. Repeat the computations and procedures of Experiment 6.2 (Section 6.3),
but base standard errors of the means on the standard deviations computed for each
sample and use the appropriate t value in place of a standard normal deviate.
FIGURE 6.6
Ninety-five percent confidence intervals of means of 200 samples of 35 housefly wing lengths, based
on sample standard errors $s_{\bar{Y}}$. The heavy horizontal line is the parametric mean μ. The ordinate
represents the variable
[Abscissa: number of trials.]
where $V_P$ stands for the parametric value of the coefficient of variation. Since
the standard error of the coefficient of variation equals approximately
$s_V = V/\sqrt{2n}$, we proceed as follows:
$$V = \frac{100s}{\bar{Y}} = \frac{100(0.3656)}{4.004} = 9.13$$
$$s_V = \frac{V}{\sqrt{2n}} = \frac{9.13}{\sqrt{2 \times 25}} = \frac{9.13}{7.0711} = 1.29$$
$L_1 = V - t_{0.05[24]} s_V$
= 9.13 − (2.064)(1.29)
= 9.13 − 2.66
= 6.47
$L_2 = V + t_{0.05[24]} s_V$
= 9.13 + 2.66
= 11.79
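The same computation for the coefficient of variation can be written as a short sketch. It uses the approximate standard error V/√(2n) from the text and the tabled value t_0.05[24] = 2.064; the function name `cv_confidence_limits` is an illustrative choice.

```python
import math


def cv_confidence_limits(ybar, s, n, t_crit):
    """Return (V, s_V, L1, L2) using the approximation s_V = V / sqrt(2n)."""
    v = 100 * s / ybar          # coefficient of variation, in percent
    s_v = v / math.sqrt(2 * n)  # approximate standard error of V
    return v, s_v, v - t_crit * s_v, v + t_crit * s_v


v, s_v, l1, l2 = cv_confidence_limits(ybar=4.004, s=0.3656, n=25, t_crit=2.064)
print(f"V = {v:.2f}, s_V = {s_v:.2f}, limits = {l1:.3f} to {l2:.3f}")
```

Because the code carries full precision instead of rounding V and s_V at each step, the limits can differ from the hand computation in the last digit.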
FIGURE 6.7
Frequency curves of χ² distribution for 1, 2, 3, and 6 degrees of freedom. [Abscissa: χ².]
6.6 / t h e c h i - s q u a r e distribution 113
Using the definition of $Y_i' = (Y_i - \mu)/\sigma$, we can rewrite $\sum^n Y_i'^2$ as
$$\sum^n \frac{(Y_i - \mu)^2}{\sigma^2} = \frac{1}{\sigma^2} \sum^n (Y_i - \mu)^2 \qquad (6.6)$$
$$\frac{1}{\sigma^2} \sum^n (Y_i - \bar{Y})^2 \qquad (6.7)$$
We saw in the last section that the ratio $(n - 1)s^2/\sigma^2$ is distributed as χ² with
n − 1 degrees of freedom. We take advantage of this fact in setting confidence
limits to variances.
First, we can make the following statement about the ratio $(n - 1)s^2/\sigma^2$:
$$P\left\{\chi^2_{(1-(\alpha/2))[n-1]} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{(\alpha/2)[n-1]}\right\} = 1 - \alpha$$
Rearranging the inequality to isolate σ², and noting that $(n - 1)s^2 = \sum y^2$, we obtain
$$P\left\{\frac{\sum y^2}{\chi^2_{(\alpha/2)[n-1]}} \le \sigma^2 \le \frac{\sum y^2}{\chi^2_{(1-(\alpha/2))[n-1]}}\right\} = 1 - \alpha \qquad (6.10)$$
This still looks like a formidable expression, but it simply means that if we
divide the sum of squares $\sum y^2$ by the two values of $\chi^2_{[n-1]}$ that cut off tails each
amounting to α/2 of the area of the $\chi^2_{[n-1]}$-distribution, the two quotients will
enclose the true value of the variance σ² with a probability of P = 1 − α.
An actual numerical example will make this clear. Suppose we have a sample
of 5 housefly wing lengths with a sample variance of s² = 13.52. If we wish to
set 95% confidence limits to the parametric variance, we evaluate Expression
(6.10) for the sample variance s². We first calculate the sum of squares for this
sample: 4 × 13.52 = 54.08. Then we look up the values for $\chi^2_{0.025[4]}$ and $\chi^2_{0.975[4]}$.
Since 95% confidence limits are required, α in this case is equal to 0.05. These χ²
values span between them 95% of the area under the χ² curve. They correspond
to 11.143 and 0.484, respectively, and the limits in Expression (6.10) then become
$$L_1 = \frac{54.08}{11.143} \quad \text{and} \quad L_2 = \frac{54.08}{0.484}$$
or
$$L_1 = 4.85 \quad \text{and} \quad L_2 = 111.74$$
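The division of the sum of squares by the two tabled χ² values is easily expressed in code. The following sketch takes the critical values 11.143 and 0.484 from the text rather than computing them, and the function name `variance_confidence_limits` is an illustrative choice.

```python
def variance_confidence_limits(s2, n, chi2_upper_tail, chi2_lower_tail):
    """Return (L1, L2) for sigma^2 from Expression (6.10).

    chi2_upper_tail is the tabled chi-square for alpha/2 (the larger value);
    chi2_lower_tail is the tabled chi-square for 1 - alpha/2 (the smaller value).
    """
    sum_of_squares = (n - 1) * s2
    return sum_of_squares / chi2_upper_tail, sum_of_squares / chi2_lower_tail


# Sample of 5 housefly wing lengths with s^2 = 13.52; chi-square values for 4 df
l1, l2 = variance_confidence_limits(13.52, 5, chi2_upper_tail=11.143, chi2_lower_tail=0.484)
print(f"95% limits for the variance: {l1:.2f} to {l2:.2f}")
```

Note that the larger χ² value yields the lower limit and the smaller χ² value the upper limit, so the interval is not symmetrical about s².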
This confidence interval is very wide, but we must not forget that the sample
variance is, after all, based on only 5 individuals. Note also that the interval
6.8 / i n t r o d u c t i o n t o h y p o t h e s i s testing 115
BOX 6.3
Confidence limits for σ². Method of shortest unbiased confidence intervals.
Aphid stem mother femur lengths from Box 2.1: n = 25; s² = 0.1337.
one. The nature of the tests varies with the data and the hypothesis, but the
same general philosophy of hypothesis testing is common to all tests and will
be discussed in this section. Study the material below very carefully, because it
is fundamental to an understanding of every subsequent chapter in this book!
We would like to refresh your memory on the sample of 17 animals of
species A, 14 of which were females and 3 of which were males. These data were
examined for their fit to the binomial frequency distribution presented in Section
4.2, and their analysis was shown in Table 4.3. We concluded from Table 4.3
that if the sex ratio in the population was 1:1 (p♀ = q♂ = 0.5), the probability
of obtaining a sample with 14 females and 3 males would be 0.005,188, making
it very unlikely that such a result could be obtained by chance alone. We learned
that it is conventional to include all "worse" outcomes, that is, all those that
deviate even more from the outcome expected on the hypothesis p♀ = q♂ = 0.5.
Including all worse outcomes, the probability is 0.006,363, still a very small
value. The above computation is based on the idea of a one-tailed test, in which
we are interested only in departures from the 1:1 sex ratio that show a
preponderance of females. If we have no preconception about the direction of the
departures from expectation, we must calculate the probability of obtaining a
sample as deviant as 14 females and 3 males in either direction from expectation.
This requires the probability either of obtaining a sample of 3 females and 14
males (and all worse samples) or of obtaining 14 females and 3 males (and all
worse samples). Such a test is two-tailed, and since the distribution is symmetrical,
we double the previously discussed probability to yield 0.012,726.
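The three probabilities quoted above follow directly from the binomial distribution with 17 animals and p♀ = q♂ = 0.5. The sketch below recomputes them; the helper name `binom_pmf` is an illustrative choice, not a function from any statistical library.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k females in a sample of n when P(female) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 17, 0.5
p_exact = binom_pmf(14, n, p)                                # exactly 14 females
p_one_tail = sum(binom_pmf(k, n, p) for k in range(14, n + 1))  # 14 or more females
p_two_tail = 2 * p_one_tail  # the distribution is symmetrical when p = 0.5

print(f"exactly 14 females:        {p_exact:.6f}")
print(f"14 or more females:        {p_one_tail:.6f}")
print(f"as deviant, either tail:   {p_two_tail:.6f}")
```

Doubling the one-tailed probability is justified here only because the binomial is symmetrical at p = 0.5; for other values of p the two tails would have to be summed separately.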
What does this probability mean? It is our hypothesis that p♀ = q♂ = 0.5.
Let us call this hypothesis H0, the null hypothesis, which is the hypothesis under
test. It is called the null hypothesis because it assumes that there is no real
difference between the true value of p in the population from which we sampled
and the hypothesized value of p = 0.5. Applied to the present example, the null
hypothesis implies that the only reason our sample does not exhibit a 1:1 sex
ratio is because of sampling error. If the null hypothesis p♀ = q♂ = 0.5 is true,
then approximately 13 samples out of 1000 will be as deviant as or more deviant
than this one in either direction by chance alone. Thus, it is quite possible to have
arrived at a sample of 14 females and 3 males by chance, but it is not very
probable, since so deviant an event would occur only about 13 out of 1000 times,
or 1.3% of the time. If we actually obtain such a sample, we may make one
of two decisions. We may decide that the null hypothesis is in fact true (that is,
the sex ratio is 1:1) and that the sample obtained by us just happened to be one
of those in the tail of the distribution, or we may decide that so deviant a sample
is too improbable an event to justify acceptance of the null hypothesis. We may
therefore decide that the hypothesis that the sex ratio is 1:1 is not true. Either
of these decisions may be correct, depending upon the truth of the matter. If
in fact the 1:1 hypothesis is correct, then the first decision (to accept the null
hypothesis) will be correct. If we decide to reject the hypothesis under these
circumstances, we commit an error. The rejection of a true null hypothesis is
called a type I error. On the other hand, if in fact the true sex ratio of the
population is other than 1:1, the first decision (to accept the 1:1 hypothesis) is an
error, a so-called type II error, which is the acceptance of a false null hypothesis.
Finally, if the 1:1 hypothesis is not true and we do decide to reject it, then we
again make the correct decision. Thus, there are two kinds of correct decisions:
accepting a true null hypothesis and rejecting a false null hypothesis, and there
are two kinds of errors: type I, rejecting a true null hypothesis, and type II,
accepting a false null hypothesis. These relationships between hypotheses and
decisions can be summarized in the following table:
                        Statistical decision
Null hypothesis         Accepted             Rejected
True                    Correct decision     Type I error
False                   Type II error        Correct decision
TABLE 6.3
Relative expected frequencies for samples of 17 animals
under two hypotheses. Binomial distribution.
(1)  (2)  (3)                  (4)
♀♀   ♂♂   H0: p♀ = q♂ = ½      H1: p♀ = 2q♂ = ⅔
          f_rel                f_rel
17 0 0.0000076 0.0010150
16 1 0.0001297 0.0086272
15 2 0.0010376 0.0345086
14 3 0.0051880 0.0862715
13 4 0.0181580 0.1509752
12 5 0.0472107 0.1962677
11 6 0.0944214 0.1962677
10 7 0.1483765 0.1542104
9 8 0.1854706 0.0963815
8 9 0.1854706 0.0481907
7 10 0.1483765 0.0192763
6 11 0.0944214 0.0061334
5 12 0.0472107 0.0015333
4 13 0.0181580 0.0002949
3 14 0.0051880 0.0000421
2 15 0.0010376 0.0000042
1 16 0.0001297 0.0000002
0 17 0.0000076 0.0000000
Total 1.0000002 0.9999999
FIGURE 6.8
Expected distributions of outcomes when sampling 17 animals from two hypothetical populations.
(A) H0: p♀ = q♂ = ½. (B) H1: p♀ = 2q♂ = ⅔. Dashed lines separate critical regions from acceptance
region of the distribution of part A. Type I error α equals approximately 0.01.
[Abscissa of both panels: number of females in samples of 17 animals. Panel A marks the critical or
rejection regions (0 to 3 and 14 to 17 females) and the acceptance region, 1 − α; panel B marks 1 − β.]
hypothesis H1: p♀ = 2q♂, which states that the sex ratio is 2:1 in favor of females
so that p♀ = ⅔ and q♂ = ⅓. We now have to calculate expected frequencies for
the binomial distribution $(p_♀ + q_♂)^k = (\tfrac{2}{3} + \tfrac{1}{3})^{17}$ to find the probabilities of the
various outcomes under the alternative hypothesis. These are shown graphically
in Figure 6.8B and are tabulated and compared with expected frequencies of the
earlier distribution in Table 6.3.
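The two columns of relative expected frequencies in Table 6.3 can be regenerated from the binomial formula, and the error probabilities read off from them. The sketch below assumes the acceptance region of 4 to 13 females marked by the dashed lines in Figure 6.8; the helper name `binom_pmf` and the variable names are illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k females in a sample of n when P(female) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


N = 17
# Relative expected frequencies, indexed by the number of females (0 to 17)
f_h0 = [binom_pmf(k, N, 1 / 2) for k in range(N + 1)]  # H0: 1:1 sex ratio
f_h1 = [binom_pmf(k, N, 2 / 3) for k in range(N + 1)]  # H1: 2:1 in favor of females

acceptance_region = range(4, 14)  # 4 to 13 females (assumed from Figure 6.8)
alpha = 1 - sum(f_h0[k] for k in acceptance_region)  # P(reject H0 | H0 true)
beta = sum(f_h1[k] for k in acceptance_region)       # P(accept H0 | H1 true)

for k in range(N, -1, -1):  # same row order as Table 6.3
    print(f"{k:2d} {N - k:2d}  {f_h0[k]:.7f}  {f_h1[k]:.7f}")
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {1 - beta:.4f}")
```

With this acceptance region α is about 0.013, but β is large: most samples drawn from the 2:1 population would still lead to acceptance of the 1:1 hypothesis.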
Suppose we had decided on a type I error of α ≈ 0.01 (≈ means "approximately
equal to") as shown in Figure 6.8A. At this significance level we would
accept the H0 for all samples of 17 having 13 or fewer animals of one sex.
Approximately 99% of all samples will fall into this category. However, what
if H0 is not true and H1 is true? Clearly, from the population represented by
hypothesis H1 we could also obtain outcomes in which one sex was represented
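$$L_1 = \mu_0 - t_{0.05[\infty]}\sigma_{\bar{Y}} = 45.5 - (1.96)(1.744) = 42.08$$
and
$$L_2 = \mu_0 + t_{0.05[\infty]}\sigma_{\bar{Y}} = 45.5 + (1.96)(1.744) = 48.92$$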
FIGURE 6.9
Expected distribution of means of samples of 5 housefly wing lengths from normal populations
specified by μ as shown above curves and $\sigma_{\bar{Y}}$ = 1.744. Center curve represents null hypothesis,
H0: μ = 45.5; curves at sides represent alternative hypotheses, μ = 37 or μ = 54. Vertical lines delimit
5% rejection regions for the null hypothesis (2½% in each tail, shaded).
FIGURE 6.10
Diagram to illustrate increases in type II error β as alternative hypothesis H1 approaches null
hypothesis H0, that is, as μ1 approaches μ. Shading represents β. Vertical lines mark off 5% critical
regions (2½% in each tail) for the null hypothesis. To simplify the graph the alternative distributions
are shown for one tail only. Data identical to those in Figure 6.9.
FIGURE 6.11
Power curves for testing H0: μ = 45.5, H1: μ ≠ 45.5 for n = 5
FIGURE 6.12
One-tailed significance test for the distribution of Figure 6.9. Vertical line now cuts off 5% rejection
region from one tail of the distribution (corresponding area of curve has been shaded).
[Abscissa: wing length (in units of 0.1 mm); critical value marked at 48.37.]
$$t_s = \frac{\bar{Y} - \mu}{s_{\bar{Y}}} \qquad (6.11)$$
* = 0.05 > P > 0.01     ** = 0.01 > P > 0.001     *** = P < 0.001
BOX 6.4
Testing the significance of a statistic, that is, the significance of a deviation of a
sample statistic from a parametric value. For normally distributed statistics.
Computational steps
1. Compute $t_s$ as the following ratio:
$$t_s = \frac{St - St_p}{s_{St}}$$
where St is a sample statistic, $St_p$ is the parametric value against which the
sample statistic is to be tested, and $s_{St}$ is its estimated standard error, obtained
from Box 6.1, or elsewhere in this book.
2. The pertinent hypotheses are
H0: St = St_p     H1: St ≠ St_p
for a two-tailed test, and
H0: St = St_p     H1: St > St_p
or
H0: St = St_p     H1: St < St_p
for a one-tailed test.
3. In the two-tailed test, look up the critical value of $t_{\alpha[\nu]}$, where α is the type I
error agreed upon and ν is the degrees of freedom pertinent to the standard
error employed (see Box 6.1). In the one-tailed test look up the critical value
of $t_{2\alpha[\nu]}$ for a significance level of α.
4. Accept or reject the appropriate hypothesis in 2 on the basis of the $t_s$ value
in 1 compared with critical values of t in 3.
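The computational steps of Box 6.4 can be sketched in a few lines. The example uses the figures of Exercise 6.10 (mean −1.352, s = 12.267, n = 500, tested against zero) and, as a stated assumption, the tabled value $t_{0.05[\infty]}$ = 1.96 as an approximation for 499 degrees of freedom. The function names are illustrative, and the critical value must always be supplied from a t table.

```python
import math


def t_statistic(stat, parametric_value, standard_error):
    """Step 1 of Box 6.4: t_s = (St - St_p) / s_St."""
    return (stat - parametric_value) / standard_error


def reject_two_tailed(t_s, t_crit):
    """Steps 3 and 4 for a two-tailed test: reject H0 when |t_s| exceeds t_crit."""
    return abs(t_s) > t_crit


# Figures from Exercise 6.10: mean position, standard deviation, sample size
ybar, s, n = -1.352, 12.267, 500
s_ybar = s / math.sqrt(n)            # standard error of the mean (Box 6.1)
t_s = t_statistic(ybar, 0.0, s_ybar)  # H0: mu = 0

T_CRIT = 1.96  # assumed: t_0.05[infinity], standing in for 499 degrees of freedom
print(f"t_s = {t_s:.3f}, reject H0 at 5%: {reject_two_tailed(t_s, T_CRIT)}")
```

For a one-tailed test only the sign of the departure named in H1 counts, and the critical value is looked up as described in step 3.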
The method of Box 6.4 can be used only if the statistic is normally distributed.
In the case of the variance, this is not so. As we have seen in Section 6.6, sums
of squares divided by σ² follow the χ² distribution. Therefore, for testing the
hypothesis that a sample variance is different from a parametric variance, we
must employ the χ² distribution.
Let us use the biological preparation of the last section as an example.
We were told that the standard deviation was 11.2 based on 10 samples. Therefore,
the variance must have been 125.44. Suppose the government postulates
that the variance of samples from the preparation should be no greater than
100.0. Is our sample variance significantly above 100.0? Remembering from
$$X^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(9)125.44}{100} = 11.290$$
The values represent chi-squares at points cutting off 2½% rejection regions
at each tail of the χ² distribution. A value of X² < 2.700 or > 19.023 would
have been evidence that the sample variance did not belong to this population.
Our value of X² = 11.290 would again have led to an acceptance of the null
hypothesis.
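The two-tailed test of a variance just described reduces to one ratio and two comparisons. The sketch below takes the critical values 2.700 and 19.023 (9 degrees of freedom) from the text instead of computing them; the function name `chi_square_variance_statistic` is an illustrative choice.

```python
def chi_square_variance_statistic(s2, n, sigma2_hypothesized):
    """Return X^2 = (n - 1) s^2 / sigma^2 for testing a sample variance."""
    return (n - 1) * s2 / sigma2_hypothesized


# Biological preparation: s = 11.2 (s^2 = 125.44), n = 10, hypothesized variance 100
x2 = chi_square_variance_statistic(125.44, 10, 100.0)

# Tabled chi-square values cutting off 2.5% in each tail for 9 degrees of freedom
CHI2_LOWER, CHI2_UPPER = 2.700, 19.023
reject = x2 < CHI2_LOWER or x2 > CHI2_UPPER
print(f"X^2 = {x2:.3f}, reject H0: {reject}")
```

Since X² lies between the two critical values, the null hypothesis is accepted, in agreement with the conclusion in the text.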
In the next chapter we shall see that there is another significance test available
to test the hypotheses about variances of the present section. This is the
mathematically equivalent F test, which is, however, a more general test, allowing
us to test the hypothesis that two sample variances come from populations
with equal variances.
Exercises
6.1 Since it is possible to test a statistical hypothesis with any size sample, why
are larger sample sizes preferred? ANS. When the null hypothesis is false, the
probability of a type II error decreases as η increases.
6.2 Differentiate between type I and type II errors. What do we mean by the power
of a statistical test?
6.3 Set 99% confidence limits to the mean, median, coefficient of variation, and vari-
ance for the birth weight data given in Box 3.2. ANS. The lower limits are
109.540, 109.060, 12.136, and 178.698, respectively.
6.4 The 95% confidence limits for μ as obtained in a given sample were 4.91 and
5.67zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA
g. Is it correct to say that 95 times out of 100 the population mean, //, falls
inside the interval from 4.91 to 5.67 g? If not, what would the correct state-
ment be?
6.5 In a study of mating calls in the tree toad Hyla ewingi, Littlejohn (1965) found
the note duration of the call in a sample of 39 observations from Tasmania to
have a mean of 189 msec and a standard deviation of 32 msec. Set 95% confi-
dence intervals to the mean and to the variance. ANS. The 95% confidence limits
for the mean are from 178.6 to 199.4. The 95% shortest unbiased limits for the
variance are from 679.5 to 1646.6.
6.6 Set 95% confidence limits to the means listed in Table 6.2. Arc these limits all
correct? (That is, do they contain μ ?)
6.7 In Section 4.3 the coefficient of dispersion was given as an index of whether or
not data agreed with a Poisson distribution. Since in a true Poisson distribution,
the mean μ equals the parametric variance σ \ the coefficient of dispersion is anal-
ogous to Expression (6.8). Using the mite data from Table 4.5, test the hypoth-
esis that the true variance is equal to the sample mean — in other words, that
we have sampled from a Poisson distribution (in which the coefficient of disper-
sion should equal unity). Note that in these examples the chi-squarc tabic is not
adequate, so that approximate critical values must be computed using the method
given with Tabic IV. In Section 7.3 an alternative significance test that avoids
this problem will be presented. ANS. A'2 — (η — 1) χ CD = 1308.30, χΐ ~
645.708.
6.8 Using the method described in Exercise 6.7, test the agreement of the observed
distribution with a Poisson distribution by testing the hypothesis that the true
coefficient of dispersion equals unity for the data of Tabic 4.6.
6.9 In a study of bill measurements of the dusky flycatcher, Johnson (1966) found
that the bill length for the males had a mean of 8.14 ± 0.021 and a coefficient
of variation of 4.67%. On the basis of this information, infer how many specimens
must have been used. ANS. Since $V = 100s/\bar{Y}$ and $s_{\bar{Y}} = s/\sqrt{n}$, $\sqrt{n} = V\bar{Y}/(100\,s_{\bar{Y}})$.
Thus n = 328.
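The arithmetic behind the answer to Exercise 6.9 can be checked with a few lines of Python. This is only a sketch; the function name `sample_size_from_cv` and its parameter names are illustrative and do not come from the text.

```python
def sample_size_from_cv(mean, se_mean, cv_percent):
    """Infer n from a mean, its standard error, and the coefficient of variation (%)."""
    s = cv_percent * mean / 100.0   # V = 100 s / mean, so s = V * mean / 100
    root_n = s / se_mean            # s_mean = s / sqrt(n), so sqrt(n) = s / s_mean
    return root_n ** 2

# Figures quoted in Exercise 6.9: mean 8.14, standard error 0.021, V = 4.67%.
n_estimate = sample_size_from_cv(mean=8.14, se_mean=0.021, cv_percent=4.67)
print(round(n_estimate))
```

The unrounded estimate is not an integer because the published mean, standard error, and coefficient of variation are themselves rounded.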
6.10 In direct klinokinetic behavior relating to temperature, animals turn more often
in the warm end of a gradient and less often in the colder end, the direction of
turning being at random, however. In a computer simulation of such behavior,
the following results were found. The mean position along a temperature gradient
was found to be −1.352. The standard deviation was 12.267, and n equaled
500 individuals. The gradient was marked off in units: zero corresponded to the
middle of the gradient, the initial starting point of the animals; minus
corresponded to the cold end; and plus corresponded to the warmer end. Test the
hypothesis that direct klinokinetic behavior did not result in a tendency toward
aggregation in either the warmer or colder end; that is, test the hypothesis that
μ, the mean position along the gradient, was zero.
6.11 In an experiment comparing yields of three new varieties of corn, the following
results were obtained.
Variety
1 2 3
To compare the three varieties the investigator computed a weighted mean of the
three means using the weights 2, −1, −1. Compute the weighted mean and its
95% confidence limits, assuming that the variance of each value for the weighted
mean is zero. ANS. $\bar{Y}_w = -36.05$, $s^2_{\bar{Y}_w} = 34.458$, the 95% confidence limits are
−47.555 to −24.545, and the weighted mean is significantly different from zero
even at the P < 0.001 level.
CHAPTER 7
Introduction to Analysis of Variance
[TABLE 7.1. Housefly wing lengths in seven groups of five items each (a = 7, n = 5).]
a rather low estimate compared with those obtained in the other samples. Since
we have a sum of squares for each group, we could obtain an estimate of the
population variance from each of these. However, it stands to reason that we
would get a better estimate if we averaged these separate variance estimates in
some way. This is done by computing the weighted average of the variances by
Expression (3.2) in Section 3.1. Actually, in this instance a simple average would
suffice, since all estimates of the variance are based on samples of the same size.
However, we prefer to give the general formula, which works equally well for
this case as well as for instances of unequal sample sizes, where the weighted
average is necessary. In this case each sample variance $s_i^2$ is weighted by its
degrees of freedom, $w_i = n_i - 1$, resulting in a sum of squares ($\sum y_i^2$), since
$(n_i - 1)s_i^2 = \sum y_i^2$. Thus, the numerator of Expression (3.2) is the sum of the sums
of squares. The denominator is $\sum^{a}(n_i - 1) = 7 \times 4$, the sum of the degrees of
freedom of each group. The average variance, therefore, is
and hence
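TABLE 7.2
Data arranged for simple analysis of variance, single classification, completely
randomized.

                               Groups
Items      1       2       3      ...   i       ...   a
1          Y_11    Y_21    Y_31   ...   Y_i1    ...   Y_a1
2          Y_12    Y_22    Y_32   ...   Y_i2    ...   Y_a2
3          Y_13    Y_23    Y_33   ...   Y_i3    ...   Y_a3
...
j          Y_1j    Y_2j    Y_3j   ...   Y_ij    ...   Y_aj
...
n          Y_1n    Y_2n    Y_3n   ...   Y_in    ...   Y_an
Sums ΣY    ΣY_1    ΣY_2    ΣY_3   ...   ΣY_i    ...   ΣY_a
Means Ȳ    Ȳ_1     Ȳ_2     Ȳ_3    ...   Ȳ_i     ...   Ȳ_a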
table, and the second subscript changes with each row representing an individual
item. Using this notation, we can compute the variance of sample 1 as
$$\frac{1}{n-1}\sum_{j=1}^{j=n}(Y_{1j}-\bar{Y}_1)^2$$
The variance within groups, which is the average variance of the samples,
is computed as
$$\frac{1}{a(n-1)}\sum_{i=1}^{i=a}\sum_{j=1}^{j=n}(Y_{ij}-\bar{Y}_i)^2$$
Note the double summation. It means that we start with the first group, setting
i = 1 (i being the index of the outer Σ). We sum the squared deviations of all
items from the mean of the first group, changing index j of the inner Σ from 1
to n in the process. We then return to the outer summation, set i = 2, and sum
the squared deviations for group 2 from j = 1 to j = n. This process is continued
until i, the index of the outer Σ, is set to a. In other words, we sum all the
squared deviations within one group first and add this sum to similar sums from
all the other groups.
The variance among groups is computed as
$$\frac{n}{a-1}\sum_{i=1}^{i=a}(\bar{Y}_i-\bar{\bar{Y}})^2$$
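The double summation for the variance within groups and the single summation for the variance among groups can be sketched in Python as nested loops. The three small groups below are hypothetical values chosen for easy arithmetic; they are not the housefly data, and the function names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)

def variance_within_groups(groups):
    """Sum squared deviations from each group mean, then divide by a(n - 1)."""
    a = len(groups)
    n = len(groups[0])
    total = 0.0
    for group in groups:             # outer summation, i = 1 .. a
        group_mean = mean(group)
        for y in group:              # inner summation, j = 1 .. n
            total += (y - group_mean) ** 2
    return total / (a * (n - 1))

def variance_among_groups(groups):
    """n times the variance of the group means around the grand mean."""
    a = len(groups)
    n = len(groups[0])
    group_means = [mean(group) for group in groups]
    grand_mean = mean(group_means)   # equal n, so the mean of means is the grand mean
    return n * sum((m - grand_mean) ** 2 for m in group_means) / (a - 1)

# Hypothetical data: a = 3 groups of n = 4 items each.
groups = [[4, 6, 5, 5], [7, 9, 8, 8], [1, 3, 2, 2]]
within = variance_within_groups(groups)
among = variance_among_groups(groups)
print(within, among)
```

Both functions assume equal sample sizes, as the formulas in this section do.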
Let us devise yet another sampling experiment. This is quite a tedious one without
the use of computers, so we will not ask you to carry it out. Assume that
you are sampling at random from a normally distributed population, such as the
housefly wing lengths with mean μ and variance σ². The sampling procedure
consists of first sampling $n_1$ items and calculating their variance $s_1^2$, followed by
sampling $n_2$ items and calculating their variance $s_2^2$. Sample sizes $n_1$ and $n_2$ may
or may not be equal to each other, but are fixed for any one sampling experiment.
Thus, for example, we might always sample 8 wing lengths for the first sample
($n_1$) and 6 wing lengths for the second sample ($n_2$). After each pair of values ($s_1^2$
and $s_2^2$) has been obtained, we calculate
$$F_s = \frac{s_1^2}{s_2^2}$$
This will be a ratio near 1, because these variances are estimates of the same
quantity. Its actual value will depend on the relative magnitudes of variances
$F_s$ of their variances, the average of these ratios will in fact approach the quantity
$(n_2 - 1)/(n_2 - 3)$, which is close to 1.0 when $n_2$ is large.
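The sampling experiment described above is tedious by hand but short in code. The following Python sketch draws pairs of samples of sizes 8 and 6 from one normal population and averages the variance ratios. The population mean and standard deviation used here (45.5 and 3.9) are assumed values for illustration; the ratio does not depend on them. The seed and the number of replicates are arbitrary choices.

```python
import random

def sample_variance(items):
    m = sum(items) / len(items)
    return sum((y - m) ** 2 for y in items) / (len(items) - 1)

def simulate_variance_ratios(n1, n2, replicates, mu=45.5, sigma=3.9, seed=1):
    """Repeatedly draw two samples from the same normal population and
    return the ratios s1^2 / s2^2."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(replicates):
        s1_sq = sample_variance([rng.gauss(mu, sigma) for _ in range(n1)])
        s2_sq = sample_variance([rng.gauss(mu, sigma) for _ in range(n2)])
        ratios.append(s1_sq / s2_sq)
    return ratios

n1, n2 = 8, 6
ratios = simulate_variance_ratios(n1, n2, replicates=20000)
average_ratio = sum(ratios) / len(ratios)
expected = (n2 - 1) / (n2 - 3)      # the limiting average quoted in the text
print(average_ratio, expected)
```

With $n_2 = 6$ the limiting average is 5/3, noticeably above 1. Individual ratios vary widely because the distribution is strongly skewed to the right, so the simulated average settles near the limit only slowly.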
The distribution of this statistic is called the F distribution, in honor of
R. A. Fisher. This is another distribution described by a complicated
mathematical function that need not concern us here. Unlike the t and χ² distributions,
the shape of the F distribution is determined by two values for degrees of freedom,
$\nu_1$ and $\nu_2$ (corresponding to the degrees of freedom of the variance in the
numerator and the variance in the denominator, respectively). Thus, for every
possible combination of values $\nu_1$, $\nu_2$, each ν ranging from 1 to infinity, there
exists a separate F distribution. Remember that the F distribution is a theoretical
probability distribution, like the t distribution and the χ² distribution. Variance
ratios $s_1^2/s_2^2$ based on sample variances are sample statistics that may or may
not follow the F distribution. We have therefore distinguished the sample
variance ratio by calling it $F_s$, conforming to our convention of separate symbols
for sample statistics as distinct from probability distributions (such as $t_s$ and
$X^2$ contrasted with t and χ²).
We have discussed how to generate an F distribution by repeatedly taking
two samples from the same normal distribution. We could also have generated
it by sampling from two separate normal distributions differing in their mean
but identical in their parametric variances; that is, with $\mu_1 \neq \mu_2$ but $\sigma_1^2 = \sigma_2^2$.
Thus, we obtain an F distribution whether the samples come from the same
normal population or from different ones, so long as their variances are identical.
Figure 7.1 shows several representative F distributions. For very low degrees
of freedom the distribution is L-shaped, but it becomes humped and strongly
skewed to the right as both degrees of freedom increase. Table V in Appendix
FIGURE 7.1
FIGURE 7.2
Frequency curve of the F distribution for 6 and 24 degrees of freedom, respectively. A one-tailed
Thus, the two statistics of significance are closely related and, lacking a χ² table,
we could make do with an F table alone, using the values of $\nu F_{[\nu,\infty]}$ in place
of $\chi^2_{[\nu]}$.
Before we return to analysis of variance, we shall first apply our newly won
knowledge of the F distribution to testing a hypothesis about two sample
variances.
BOX 7.1
Testing the significance of differences between two variances.
Survival in days of the cockroach Blattella vaga when kept without food or water.
The alternative hypothesis is that the two variances are unequal. We have
no reason to suppose that one sex should be more variable than the other.
In view of the alternative hypothesis this is a two-tailed test. Since only
the right tail of the F distribution is tabled extensively in Table V and in
most other tables, we calculate $F_s$ as the ratio of the greater variance over
the lesser one:
Because the test is two-tailed, we look up the critical value $F_{\alpha/2[\nu_1,\nu_2]}$, where
α is the type I error accepted and $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ are the
degrees of freedom for the upper and lower variance, respectively. Whether
we look up $F_{\alpha/2[\nu_1,\nu_2]}$ or $F_{\alpha/2[\nu_2,\nu_1]}$ depends on whether sample 1 or sample
2 has the greater variance and has been placed in the numerator.
From Table V we find $F_{0.025[9,9]} = 4.03$ and $F_{0.05[9,9]} = 3.18$. Because
this is a two-tailed test, we double these probabilities. Thus, the F
value of 4.03 represents a probability of α = 0.05, since the right-hand tail
area of α = 0.025 is matched by a similar left-hand area to the left of
$F_{0.975[9,9]} = 1/F_{0.025[9,9]} = 0.248$. Therefore, assuming the null hypothesis
is true, the probability of observing an F value greater than 4.00 or
smaller than 1/4.00 = 0.25 is 0.10 > P > 0.05. Strictly speaking, the two
sample variances are not significantly different—the two sexes are equally
variable in their duration of survival. However, the outcome is close
enough to the 5% significance level to make us suspicious that possibly
the variances are in fact different. It would be desirable to repeat this
experiment with larger sample sizes in the hope that more decisive results
would emerge.
We shall now modify the data of Table 7.1, discussed in Section 7.1. Suppose
the seven groups of houseflies did not represent random samples from the same
population but resulted from the following experiment. Each sample was reared
in a separate culture jar, and the medium in each of the culture jars was prepared
in a different way. Some had more water added, others more sugar, yet others
more solid matter. Let us assume that sample 7 represents the standard medium
against which we propose to compare the other samples. The various changes
in the medium affect the sizes of the flies that emerge from it; this in turn affects
the wing lengths we have been measuring.
We shall assume the following effects resulting from treatment of the
medium:
The effect of treatment i is usually symbolized as $\alpha_i$. (Please note that this use
of α is not related to its use as a symbol for the probability of a type I error.)
Thus $\alpha_i$ assumes the following values for the above treatment effects.
$\alpha_1 = -5$     $\alpha_4 = 1$
$\alpha_2 = -2$     $\alpha_5 = 1$
$\alpha_3 = 0$      $\alpha_6 = 5$
                    $\alpha_7 = 0$
[TABLE 7.3. Data of Table 7.1 with the fixed treatment effects $\alpha_i$ added to each group.]
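$$\frac{1}{a(n-1)}\sum_{i=1}^{i=a}\sum_{j=1}^{j=n}\left[(Y_{ij}+\alpha_i)-(\bar{Y}_i+\alpha_i)\right]^2 = \frac{1}{a(n-1)}\sum_{i=1}^{i=a}\sum_{j=1}^{j=n}(Y_{ij}-\bar{Y}_i)^2$$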
TABLE 7.4
Data of Table 7.3 arranged in the manner of Table 7.2.

                                        Groups
Items   1              2              3              ...   i              ...   a
1       Y_11 + α_1     Y_21 + α_2     Y_31 + α_3     ...   Y_i1 + α_i     ...   Y_a1 + α_a
2       Y_12 + α_1     Y_22 + α_2     Y_32 + α_3     ...   Y_i2 + α_i     ...   Y_a2 + α_a
3       Y_13 + α_1     Y_23 + α_2     Y_33 + α_3     ...   Y_i3 + α_i     ...   Y_a3 + α_a
...
j       Y_1j + α_1     Y_2j + α_2     Y_3j + α_3     ...   Y_ij + α_i     ...   Y_aj + α_a
...
n       Y_1n + α_1     Y_2n + α_2     Y_3n + α_3     ...   Y_in + α_i     ...   Y_an + α_a
Sums    ΣY_1 + nα_1    ΣY_2 + nα_2    ΣY_3 + nα_3    ...   ΣY_i + nα_i    ...   ΣY_a + nα_a
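$$\frac{1}{a-1}\sum_{i=1}^{i=a}\left[(\bar{Y}_i-\bar{\bar{Y}})+(\alpha_i-\bar{\alpha})\right]^2 = \frac{1}{a-1}\sum_{i=1}^{i=a}(\bar{Y}_i-\bar{\bar{Y}})^2 + \frac{1}{a-1}\sum_{i=1}^{i=a}(\alpha_i-\bar{\alpha})^2 + \frac{2}{a-1}\sum_{i=1}^{i=a}(\bar{Y}_i-\bar{\bar{Y}})(\alpha_i-\bar{\alpha})$$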
The first of these terms we immediately recognize as the previous variance of
the means, $s_{\bar{Y}}^2$. The second is a new quantity, but is familiar by general appearance;
it clearly is a variance or at least a quantity akin to a variance. The third
expression is a new type; it is a so-called covariance, which we have not yet
encountered. We shall not be concerned with it at this stage except to say that
in cases such as the present one, where the magnitude of the treatment effects
$\alpha_i$ is assumed to be independent of the $\bar{Y}_i$ to which they are added, the expected
value of this quantity is zero; hence it does not contribute to the new variance
of means.
The independence of the treatment effects and the sample means is an
important concept that we must understand clearly. If we had not applied
different treatments to the medium jars, but simply treated all jars as controls,
we would still have obtained differences among the wing length means. Those
are the differences found in Table 7.1 with random sampling from the same
population. By chance, some of these means are greater, some are smaller. In
our planning of the experiment we had no way of predicting which sample
means would be small and which would be large. Therefore, in planning our
treatments, we had no way of matching up a large treatment effect, such as that
of medium 6, with the mean that by chance would be the greatest, as that for
sample 2. Also, the smallest sample mean (sample 4) is not associated with the
smallest treatment effect. Only if the magnitude of the treatment effects were
deliberately correlated with the sample means (this would be difficult to do in
the experiment designed here) would the third term in the expression, the
covariance, have an expected value other than zero.
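The split of the new variance of means into a variance of means, a variance-like term in the treatment effects, and a covariance term can be verified numerically. In the Python sketch below the treatment effects are the seven $\alpha_i$ values given in the text; the group means are hypothetical placeholders, since the group means of Table 7.1 are not reproduced here.

```python
def mean(values):
    return sum(values) / len(values)

def decompose_variance_of_means(group_means, effects):
    """Return the three terms: variance of means, treatment term, covariance term."""
    a = len(group_means)
    y_bar = mean(group_means)
    alpha_bar = mean(effects)
    var_means = sum((y - y_bar) ** 2 for y in group_means) / (a - 1)
    treatment_term = sum((al - alpha_bar) ** 2 for al in effects) / (a - 1)
    covariance_term = 2 * sum(
        (y - y_bar) * (al - alpha_bar) for y, al in zip(group_means, effects)
    ) / (a - 1)
    return var_means, treatment_term, covariance_term

effects = [-5, -2, 0, 1, 1, 5, 0]                         # alpha_i from the text
group_means = [45.8, 46.4, 45.0, 44.2, 45.6, 45.2, 45.4]  # hypothetical means

var_means, treatment_term, covariance_term = decompose_variance_of_means(
    group_means, effects
)

# Variance of the treated means computed directly, for comparison.
treated = [y + al for y, al in zip(group_means, effects)]
treated_mean = mean(treated)
direct = sum((t - treated_mean) ** 2 for t in treated) / (len(treated) - 1)
print(direct, var_means + treatment_term + covariance_term)
```

In any single data set the covariance term is generally not exactly zero; the text's claim is only that its expected value is zero when the effects are assigned independently of the means.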
The second term in the expression for the new variance of means is clearly
added as a result of the treatment effects. It is analogous to a variance, but it
cannot be called a variance, since it is not based on a random variable, but
rather on deliberately chosen treatments largely under our control. By changing
the magnitude and nature of the treatments, we can more or less alter the
variance-like quantity at will. We shall therefore call it the added component due
to treatment effects. Since the $\alpha_i$'s are arranged so that $\bar{\alpha} = 0$, we can rewrite
the middle term as
$$\frac{1}{a-1}\sum_{i=1}^{i=a}(\alpha_i-\bar{\alpha})^2 = \frac{1}{a-1}\sum^{a}\alpha^2$$
Thus we see that the estimate of the parametric variance of the population is
increased by the quantity
$$\frac{n}{a-1}\sum^{a}\alpha^2$$
which is n times the added component due to treatment effects. We found the
variance ratio $F_s$ to be significantly greater than could be reconciled with the
null hypothesis. It is now obvious why this is so. We were testing the variance
$$\frac{1}{a-1}\sum_{i=1}^{i=a}(\bar{Y}_i-\bar{\bar{Y}})^2 + \frac{1}{a-1}\sum_{i=1}^{i=a}(A_i-\bar{A})^2 + \frac{2}{a-1}\sum_{i=1}^{i=a}(\bar{Y}_i-\bar{\bar{Y}})(A_i-\bar{A})$$
The first term is the variance of means $s_{\bar{Y}}^2$, as before, and the last term is the
covariance between the group means and the random effects $A_i$, the expected
value of which is zero (as before), because the random effects are independent
of the magnitude of the means. The middle term is a true variance, since $A_i$
is a random variable. We symbolize it by $s_A^2$ and call it the added variance
component among groups. It would represent the added variance component
among females or among medium batches, depending on which of the designs
discussed above we were thinking of. The existence of this added variance
component is demonstrated by the F test. If the groups are random samples, we
may expect F to approximate $\sigma^2/\sigma^2 = 1$; but with an added variance
component, the expected ratio, again displayed lopsidedly, is
$$\frac{\sigma^2 + n\sigma_A^2}{\sigma^2}$$
Note that $\sigma_A^2$, the parametric value of $s_A^2$, is multiplied by n, since we have to
multiply the variance of means by n to obtain an independent estimate of the
variance of the population. In a Model II anova we are interested not in the
magnitude of any $A_i$ or in differences such as $A_1 - A_2$, but in the magnitude
of $\sigma_A^2$ and its relative magnitude with respect to σ², which is generally expressed
as the percentage $100s_A^2/(s^2 + s_A^2)$. Since the variance among groups estimates
$\sigma^2 + n\sigma_A^2$, we can calculate $s_A^2$ as
$$s_A^2 = \frac{1}{n}(\text{variance among groups} - \text{variance within groups})$$
of the sum of the variances among and within groups. Model II will be formally
discussed at the end of this chapter (Section 7.7); the methods of estimating
variance components are treated in detail in the next chapter.
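A small Python sketch of the variance-component arithmetic follows. The first set of mean squares is hypothetical. The second uses the two mean squares quoted in the answer to Exercise 7.5 (4.416 among groups, 5.987 within), where the among-groups mean square is the smaller one; the group size n = 4 passed in that call is an arbitrary placeholder, since the exercise data are not shown here. Reporting a negative estimate as zero is an assumption of this sketch, chosen because it agrees with the $s_A^2 = 0$ given in that answer.

```python
def added_variance_component(ms_among, ms_within, n):
    """Estimate s_A^2 = (MS_among - MS_within) / n, reporting negative estimates as 0."""
    estimate = (ms_among - ms_within) / n
    return max(estimate, 0.0)

def percent_among(ms_among, ms_within, n):
    """Express s_A^2 as a percentage of s^2 + s_A^2."""
    s2_a = added_variance_component(ms_among, ms_within, n)
    return 100.0 * s2_a / (ms_within + s2_a)

# Hypothetical mean squares with n = 5 items per group.
s2_a = added_variance_component(ms_among=30.0, ms_within=10.0, n=5)
pct = percent_among(ms_among=30.0, ms_within=10.0, n=5)

# Mean squares from the answer to Exercise 7.5; n = 4 is a placeholder.
s2_a_ex = added_variance_component(ms_among=4.416, ms_within=5.987, n=4)
print(s2_a, pct, s2_a_ex)
```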
So far we have ignored one other variance that can be computed from the
data in Table 7.1. If we remove the classification into groups, we can consider
the housefly data to be a single sample of an = 35 wing lengths and calculate
the mean and variance of these items in the conventional manner. The various
quantities necessary for this computation are shown in the last column at the
right in Tables 7.1 and 7.3, headed "Computation of total sum of squares." We
obtain a mean of $\bar{Y} = 45.34$ for the sample in Table 7.1, which is, of course,
the same as the quantity $\bar{\bar{Y}}$ computed previously from the seven group means.
The sum of squares of the 35 items is 575.886, which gives a variance of 16.938
when divided by 34 degrees of freedom. Repeating these computations for the
data in Table 7.3, we obtain $\bar{Y} = 45.34$ (the same as in Table 7.1 because
$\sum^{a}\alpha_i = 0$) and $s^2 = 27.997$, which is considerably greater than the
corresponding variance from Table 7.1. The total variance computed from all an items is
another estimate of σ². It is a good estimate in the first case, but in the second
sample (Table 7.3), where added components due to treatment effects or added
variance components are present, it is a poor estimate of the population variance.
However, the purpose of calculating the total variance in an anova is not
for using it as yet another estimate of σ², but for introducing an important
mathematical relationship between it and the other variances. This is best seen
when we arrange our results in a conventional analysis of variance table, as
TABLE 7.5
Anova table for data in Table 7.1.
(1) Source of variation    (2) df    (3) Sum of squares, SS    (4) Mean square, MS
shown in Table 7.5. Such a table is divided into four columns. The first
identifies the source of variation as among groups, within groups, and total (groups
amalgamated to form a single sample). The column headed df gives the degrees
of freedom by which the sums of squares pertinent to each source of variation
must be divided in order to yield the corresponding variance. The degrees of
freedom for variation among groups is a − 1, that for variation within groups
is a(n − 1), and that for the total variation is an − 1. The next two columns
show sums of squares and variances, respectively. Notice that the sums of
squares entered in the anova table are the sum of squares among groups, the
sum of squares within groups, and the sum of squares of the total sample of
an items. You will note that variances are not referred to by that term in anova,
but are generally called mean squares, since, in a Model I anova, they do not
estimate a population variance. These quantities are not true mean squares,
because the sums of squares are divided by the degrees of freedom rather than
sample size. The sum of squares and mean square are frequently abbreviated
SS and MS, respectively.
The sums of squares and mean squares in Table 7.5 are the same as those
obtained previously, except for minute rounding errors. Note, however, an
important property of the sums of squares. They have been obtained
independently of each other, but when we add the SS among groups to the SS within
groups we obtain the total SS. The sums of squares are additive! Another way of
saying this is that we can decompose the total sum of squares into a portion
due to variation among groups and another portion due to variation within
groups. Observe that the degrees of freedom are also additive and that the total
of 34 df can be decomposed into 6 df among groups and 28 df within groups.
Thus, if we know any two of the sums of squares (and their appropriate degrees
of freedom), we can compute the third and complete our analysis of variance.
Note that the mean squares are not additive. This is obvious, since generally
$(a + b)/(c + d) \neq a/c + b/d$.
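The additivity of sums of squares and degrees of freedom, and the non-additivity of mean squares, can be seen in a short Python sketch. The data are hypothetical (three groups of four items), chosen so that the sums are easy to follow by hand; the function name `partition` is illustrative.

```python
def partition(groups):
    """Return (SS among, SS within, SS total) computed from deviations."""
    items = [y for group in groups for y in group]
    grand_mean = sum(items) / len(items)
    ss_total = sum((y - grand_mean) ** 2 for y in items)
    ss_among = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_within += sum((y - group_mean) ** 2 for y in group)
        ss_among += len(group) * (group_mean - grand_mean) ** 2
    return ss_among, ss_within, ss_total

groups = [[4, 6, 5, 5], [7, 9, 8, 8], [1, 3, 2, 2]]   # hypothetical data
a, n = len(groups), len(groups[0])

ss_among, ss_within, ss_total = partition(groups)
df_among, df_within, df_total = a - 1, a * (n - 1), a * n - 1

ms_among = ss_among / df_among
ms_within = ss_within / df_within
ms_total = ss_total / df_total

print(ss_among, ss_within, ss_total)   # SS among + SS within equals SS total
print(df_among, df_within, df_total)   # df among + df within equals df total
print(ms_among, ms_within, ms_total)   # mean squares do not add up
```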
We shall use the computational formula for sum of squares (Expression
(3.8)) to demonstrate why these sums of squares are additive. Although it is an
algebraic derivation, it is placed here rather than in the Appendix because
these formulas will also lead us to some common computational formulas for
analysis of variance. Depending on computational equipment, the formulas we
have used so far to obtain the sums of squares may not be the most rapid pro-
cedure.
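The sum of squares of means in simplified notation is
$$SS_{\text{means}} = \sum^{a}(\bar{Y}-\bar{\bar{Y}})^2 = \sum^{a}\left(\frac{1}{n}\sum^{n}Y-\frac{1}{an}\sum^{a}\sum^{n}Y\right)^2$$
$$= \sum^{a}\left(\frac{1}{n}\sum^{n}Y\right)^2-\frac{1}{a}\left[\sum^{a}\left(\frac{1}{n}\sum^{n}Y\right)\right]^2$$
$$= \frac{1}{n^2}\sum^{a}\left(\sum^{n}Y\right)^2-\frac{1}{an^2}\left(\sum^{a}\sum^{n}Y\right)^2$$
$$SS_{\text{within}} = \sum^{a}\sum^{n}(Y-\bar{Y})^2 = \sum^{a}\left[\sum^{n}Y^2-\frac{1}{n}\left(\sum^{n}Y\right)^2\right] = \sum^{a}\sum^{n}Y^2-\frac{1}{n}\sum^{a}\left(\sum^{n}Y\right)^2$$
$$SS_{\text{total}} = \sum^{a}\sum^{n}(Y-\bar{\bar{Y}})^2 = \sum^{a}\sum^{n}Y^2-\frac{1}{an}\left(\sum^{a}\sum^{n}Y\right)^2$$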
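We now copy the formulas for these sums of squares, slightly rearranged as
follows:
$$SS_{\text{groups}} = n \times SS_{\text{means}} = \frac{1}{n}\sum^{a}\left(\sum^{n}Y\right)^2 - \frac{1}{an}\left(\sum^{a}\sum^{n}Y\right)^2$$
$$SS_{\text{within}} = -\frac{1}{n}\sum^{a}\left(\sum^{n}Y\right)^2 + \sum^{a}\sum^{n}Y^2$$
$$SS_{\text{total}} = \sum^{a}\sum^{n}Y^2 - \frac{1}{an}\left(\sum^{a}\sum^{n}Y\right)^2$$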
$$\sigma^2 + \frac{n}{a-1}\sum^{a}\alpha^2 \quad\text{or}\quad \sigma^2 + n\sigma_A^2$$
TABLE 7.6
Anova table for data in Table 7.3.
(1) Source of variation    (2) df    (3) Sum of squares, SS    (4) Mean square, MS
$Y_{71} - \bar{Y}_7 = 41 - 45.4 = -4.4$
and the deviation of the individual wing length from the grand mean is
$Y_{71} - \bar{\bar{Y}} = 41 - 45.34 = -4.34$
Note that these deviations are additive. The deviation of the item from the group
mean and that of the group mean from the grand mean add to the total deviation
of the item from the grand mean. These deviations are stated algebraically
as $(Y - \bar{Y}) + (\bar{Y} - \bar{\bar{Y}}) = (Y - \bar{\bar{Y}})$. Squaring and summing these deviations for $an$
items will result in
$$\sum^{a}\sum^{n}(Y-\bar{Y})^2 + n\sum^{a}(\bar{Y}-\bar{\bar{Y}})^2 = \sum^{a}\sum^{n}(Y-\bar{\bar{Y}})^2$$
$$2\sum^{a}\sum^{n}(Y-\bar{Y})(\bar{Y}-\bar{\bar{Y}}) = 2\sum^{a}\left[(\bar{Y}-\bar{\bar{Y}})\sum^{n}(Y-\bar{Y})\right]$$
$Y_{ij} = \mu + A_i + \epsilon_{ij}$ (7.3)
Exercises
7.1 In a study comparing the chemical composition of the urine of chimpanzees
and gorillas (Gartler, Firschein, and Dobzhansky, 1956), the following results
were obtained. For 37 chimpanzees the variance for the amount of glutamic acid
in milligrams per milligram of creatinine was 0.01069. A similar study based on
six gorillas yielded a variance of 0.12442. Is there a significant difference
between the variability in chimpanzees and that in gorillas? ANS. $F_s = 11.639$,
$F_{0.025[5,36]} \approx 2.90$.
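The two-tailed variance-ratio test of Box 7.1 can be sketched in Python using the summary figures of Exercise 7.1. The critical value 2.90 is the tabled $F_{0.025[5,36]}$ quoted in the answer; the code does not compute F probabilities itself, and the function name `variance_ratio` is illustrative.

```python
def variance_ratio(var1, n1, var2, n2):
    """Place the greater variance in the numerator.

    Returns (Fs, numerator df, denominator df)."""
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1

# Exercise 7.1: 37 chimpanzees (variance 0.01069), 6 gorillas (variance 0.12442).
fs, df_num, df_den = variance_ratio(0.01069, 37, 0.12442, 6)

critical_f = 2.90            # F_0.025[5,36], as quoted in the answer
significant_at_5_percent = fs > critical_f
print(round(fs, 3), df_num, df_den, significant_at_5_percent)
```

Because the test is two-tailed, comparing $F_s$ with the 0.025 critical value corresponds to a type I error of 0.05.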
7.2 The following data are from an experiment by Sewall Wright. He crossed Polish
and Flemish giant rabbits and obtained 27 F₁ rabbits. These were inbred and
112 F₂ rabbits were obtained. We have extracted the following data on femur
length of these rabbits.
        n      Ȳ       s
F₁      27     83.39   1.65
F₂      112    80.5    3.81
Treatment
A Β C D
Litters
1 2 3 4 5 6 7
ANS. $s^2 = 5.987$, $MS_{\text{among}} = 4.416$, $s_A^2 = 0$, and $F_s = 0.7375$, which is clearly not
significant at the 5% level.
7.6 Show that it is possible to represent the value of an individual variate as follows:
$Y_{ij} = (\bar{\bar{Y}}) + (\bar{Y}_i - \bar{\bar{Y}}) + (Y_{ij} - \bar{Y}_i)$. What does each of the terms in parentheses
estimate in a Model I anova and in a Model II anova?
CHAPTER 8
Single-Classification Analysis of Variance
come especially simple for the two-sample case, as explained in Section 8.4.
In Model I of this case, the mathematically equivalent t test can be applied
as well.
When a Model I analysis of variance has been found to be significant,
leading to the conclusion that the means are not from the same population,
we will usually wish to test the means in a variety of ways to discover which
pairs of means are different from each other and whether the means can be
divided into groups that are significantly different from each other. To this end,
Section 8.5 deals with so-called planned comparisons designed before the test
is run; and Section 8.6, with unplanned multiple-comparison tests that suggest
themselves to the experimenter as a result of the analysis.
We saw in Section 7.5 that the total sum of squares and degrees of freedom
can be additively partitioned into those pertaining to variation among groups
and those to variation within groups. For the analysis of variance proper, we
need only the sum of squares among groups and the sum of squares within
groups. But when the computation is not carried out by computer, it is
simpler to calculate the total sum of squares and the sum of squares among groups,
leaving the sum of squares within groups to be obtained by the subtraction
$SS_{\text{total}} - SS_{\text{groups}}$. However, it is a good idea to compute the individual
variances so we can check for heterogeneity among them (see Section 10.1). This will
also permit an independent computation of $SS_{\text{within}}$ as a check. In Section 7.5
we arrived at the following computational formulas for the total and
among-groups sums of squares:
$$SS_{\text{total}} = \sum^{a}\sum^{n}Y^2 - \frac{1}{an}\left(\sum^{a}\sum^{n}Y\right)^2$$
$$SS_{\text{groups}} = \frac{1}{n}\sum^{a}\left(\sum^{n}Y\right)^2 - \frac{1}{an}\left(\sum^{a}\sum^{n}Y\right)^2$$
These formulas assume equal sample size n for each group and will be modified
in Section 8.3 for unequal sample sizes. However, they suffice in their present
form to illustrate some general points about computational procedures in
analysis of variance.
We note that the second, subtracted term is the same in both sums of
squares. This term can be obtained by summing all the variates in the anova
(this is the grand total), squaring the sum, and dividing the result by the total
number of variates. It is comparable to the second term in the computational
formula for the ordinary sum of squares (Expression (3.8)). This term is often
called the correction term (abbreviated CT).
The first term for the total sum of squares is simple. It is the sum of all
squared variates in the anova table. Thus the total sum of squares, which
describes the variation of a single unstructured sample of an items, is simply
the familiar sum-of-squares formula of Expression (3.8).
8.2 Equal n
Expressions for the expected values of the mean squares are also shown
in the first anova table of Box 8.1. They are the expressions you learned in the
previous chapter for a Model I anova.
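BOX 8.1
Single-classification anova with equal sample sizes.
The effect of the addition of different sugars on length, in ocular units
(× 0.114 = mm), of pea sections grown in tissue culture with auxin present: n = 10
(replications per group). This is a Model I anova.

                                       Treatments (a = 5)
Observations,                   2% Glucose   2% Fructose   1% Glucose +        2% Sucrose
i.e., replications    Control   added        added         1% Fructose added   added
1                     75        57           58            58                  62
2                     67        58           61            59                  66
3                     70        60           56            58                  65
4                     75        59           58            61                  63
5                     65        62           57            57                  64
6                     71        60           56            56                  62
7                     67        60           61            58                  65
8                     67        57           60            57                  65
9                     76        59           57            57                  62
10                    68        61           58            59                  67
ΣY                    701       593          582           580                 641
Ȳ                     70.1      59.3         58.2          58.0                64.1

Preliminary computations
1. Grand total = $\sum^{a}\sum^{n}Y$ = 701 + 593 + 582 + 580 + 641 = 3097
2. Sum of the squared observations = $\sum^{a}\sum^{n}Y^2$ = 193,151
3. Sum of the squared group totals divided by n = $\frac{1}{n}\sum^{a}\left(\sum^{n}Y\right)^2$ = 192,905.50
4. Grand total squared and divided by total sample size = correction term
   CT = $\frac{1}{an}\left(\sum^{a}\sum^{n}Y\right)^2$ = (3097)²/(5 × 10) = 9,591,409/50 = 191,828.18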
BOX 8.1
Continued
5. $SS_{\text{total}} = \sum^{a}\sum^{n}Y^2 - CT$
   = quantity 2 − quantity 4 = 193,151 − 191,828.18 = 1322.82
6. $SS_{\text{groups}} = \frac{1}{n}\sum^{a}\left(\sum^{n}Y\right)^2 - CT$
   = quantity 3 − quantity 4 = 192,905.50 − 191,828.18 = 1077.32
7. $SS_{\text{within}} = SS_{\text{total}} - SS_{\text{groups}}$
   = quantity 5 − quantity 6 = 1322.82 − 1077.32 = 245.50

Source of variation                   df         SS   MS            F_s                    Expected MS
$\bar{Y}-\bar{\bar{Y}}$  Among groups    a − 1      6    6/(a − 1)     MS_groups/MS_within    $\sigma^2 + \frac{n}{a-1}\sum^{a}\alpha^2$
$Y-\bar{Y}$  Within groups               a(n − 1)   7    7/[a(n − 1)]                         $\sigma^2$
$Y-\bar{\bar{Y}}$  Total                 an − 1     5

Substituting the computed values into the above table, we obtain the
following:
Anova table
Source of variation                                        df    SS        MS       F_s
$\bar{Y}-\bar{\bar{Y}}$  Among groups (among treatments)   4     1077.32   269.33   49.33**
$Y-\bar{Y}$  Within groups (error, replicates)             45    245.50    5.46
$Y-\bar{\bar{Y}}$  Total                                   49    1322.82
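The seven computational steps of Box 8.1 translate directly into code. The Python sketch below uses the pea-section data from the box; the dictionary keys and the function name `single_classification_anova` are illustrative labels, not notation from the text.

```python
treatments = {
    "control":                  [75, 67, 70, 75, 65, 71, 67, 67, 76, 68],
    "2% glucose":               [57, 58, 60, 59, 62, 60, 60, 57, 59, 61],
    "2% fructose":              [58, 61, 56, 58, 57, 56, 61, 60, 57, 58],
    "1% glucose + 1% fructose": [58, 59, 58, 61, 57, 56, 58, 57, 57, 59],
    "2% sucrose":               [62, 66, 65, 63, 64, 62, 65, 65, 62, 67],
}

def single_classification_anova(groups):
    """Equal-n anova by the computational formulas (quantities 1 to 7 of Box 8.1)."""
    a = len(groups)
    n = len(groups[0])
    grand_total = sum(sum(g) for g in groups)                  # quantity 1
    sum_squares = sum(y * y for g in groups for y in g)        # quantity 2
    group_totals_term = sum(sum(g) ** 2 for g in groups) / n   # quantity 3
    ct = grand_total ** 2 / (a * n)                            # quantity 4
    ss_total = sum_squares - ct                                # quantity 5
    ss_groups = group_totals_term - ct                         # quantity 6
    ss_within = ss_total - ss_groups                           # quantity 7
    ms_groups = ss_groups / (a - 1)
    ms_within = ss_within / (a * (n - 1))
    return {
        "CT": ct, "SS_total": ss_total, "SS_groups": ss_groups,
        "SS_within": ss_within, "MS_groups": ms_groups,
        "MS_within": ms_within, "Fs": ms_groups / ms_within,
    }

result = single_classification_anova(list(treatments.values()))
for name, value in result.items():
    print(name, round(value, 2))
```

The sums of squares and mean squares agree with the box. The ratio computed from unrounded mean squares comes out slightly above the tabled 49.33, which corresponds to dividing 269.33 by the rounded within-groups mean square 5.46.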
8.3 Unequal n
This time we shall use a Model II analysis of variance for an example. Remember
that up to and including the F test for significance, the computations are exactly
the same whether the anova is based on Model I or Model II. We shall point
out the stage in the computations at which there would be a divergence of
operations depending on the model.
The example is shown in Table 8.1. It concerns a series of morphological
measurements of the width of the scutum (dorsal shield) of samples of tick
larvae obtained from four different host individuals of the cottontail rabbit.
These four hosts were obtained at random from one locality. We know nothing
about their origins or their genetic constitution. They represent a random
sample of the population of host individuals from the given locality. We would
not be in a position to interpret differences between larvae from different hosts,
since we know nothing of the origins of the individual rabbits. Population
biologists are nevertheless interested in such analyses because they provide an
answer to the following question: Are the variances of means of larval characters
among hosts greater than expected on the basis of variances of the characters
within hosts? We can calculate the average variance of width of larval scutum
on a host. This will be our "error" term in the analysis of variance. We then
test the observed mean square among groups and see if it contains an added
component of variance. What would such an added component of variance
represent? The mean square within host individuals (that is, of larvae on any
one host) represents genetic differences among larvae and differences in
environmental experiences of these larvae. Added variance among hosts demonstrates
significant differentiation among the larvae possibly due to differences among
t In, l-wiclt.' -ilTivf inn ill.· I·.™·!,. Il -ilcr» mau ke> rllwa Ι.· ΛΙΙΪ,· r,.|i Ίηι,,η.ι
TABLE 8.1
Data and anova table for a single-classification anova with unequal sample sizes. Width of scutum
(dorsal shield) of larvae of the tick Haemaphysalis leporispalustris in samples from 4 cottontail
rabbits. Measurements in microns. This is a Model II anova.
           Hosts (a = 4)
           1     2     3     4
n_i        8     10    13    6
Anova table
Source of variation    df    SS    MS    F_s
among them. A possible reason for looking at the means would be at the beginning
of the analysis. One might wish to look at the group means to spot outliers,
which might represent readings that for a variety of reasons could be in error.
The computation follows the outline furnished in Box 8.1, except that the
symbol $\sum^{n}$ now needs to be written $\sum^{n_i}$, since sample sizes differ for each group.
Steps 1, 2, and 4 through 7 are carried out as before. Only step 3 needs to be
modified appreciably. It is:
$$\sum^{a} \frac{\left(\sum^{n_i} Y\right)^{2}}{n_i}$$
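The modified step 3 can be sketched in code. The following is a minimal illustration, not the book's own program: the function name `one_way_anova` and the small data set are hypothetical, and the only point is that each squared group sum is divided by its own sample size.

```python
def one_way_anova(groups):
    """Single-classification anova for groups of possibly unequal size."""
    a = len(groups)
    total_n = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    sum_sq_obs = sum(y * y for g in groups for y in g)

    # Modified step 3: each squared group sum is divided by its own n_i.
    quantity_3 = sum(sum(g) ** 2 / len(g) for g in groups)

    correction_term = grand_total ** 2 / total_n
    ss_total = sum_sq_obs - correction_term
    ss_groups = quantity_3 - correction_term
    ss_within = ss_total - ss_groups

    df_groups, df_within = a - 1, total_n - a
    ms_groups = ss_groups / df_groups
    ms_within = ss_within / df_within
    return {
        "df_groups": df_groups, "df_within": df_within,
        "SS_groups": ss_groups, "SS_within": ss_within,
        "MS_groups": ms_groups, "MS_within": ms_within,
        "F": ms_groups / ms_within,
    }


# Hypothetical readings, made up purely to exercise unequal sample sizes.
example = one_way_anova([[4, 6, 5], [7, 9, 8, 8], [5, 5]])
print(example)
```

The resulting F would then be compared with the tabled critical value for `df_groups` and `df_within` degrees of freedom.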
The critical 5% and 1% values of F are shown below the anova table in
Table 8.1 (2.89 and 4.44, respectively). You should confirm them for yourself
in Table V. Note that the argument $\nu_2 = 33$ is not given. You therefore have
to interpolate between the arguments representing 30 and 40 degrees of freedom.
The values shown were computed using harmonic interpolation.
However, again, it was not necessary to carry out such an interpolation. The
conservative value of F, $F_{\alpha[3,30]}$, is 2.92 and 4.51, for $\alpha = 0.05$ and $\alpha = 0.01$,
respectively. The observed value $F_s$ is 5.26, considerably above the interpolated
as well as the conservative value of $F_{0.01}$. We therefore reject the null hypothesis
($H_0$: $\sigma_A^2 = 0$) that there is no added variance component among groups and that
the two mean squares estimate the same variance, allowing a type I error of less
than 1%. We accept, instead, the alternative hypothesis of the existence of an
added variance component $\sigma_A^2$.
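Harmonic interpolation, as used for the critical values above, interpolates linearly in the reciprocal of the degrees of freedom. The sketch below assumes the tabled values for 40 degrees of freedom (2.84 and 4.31) from a standard F table; they are not quoted in the text, and the function name is hypothetical.

```python
def harmonic_interpolate(df, df_low, f_low, df_high, f_high):
    """Interpolate a tabled critical value linearly in 1/df."""
    weight = (1 / df_low - 1 / df) / (1 / df_low - 1 / df_high)
    return f_low + weight * (f_high - f_low)


# F[3,30] values are quoted in the text; F[3,40] values are assumed
# from a standard F table.
f_05 = harmonic_interpolate(33, 30, 2.92, 40, 2.84)
f_01 = harmonic_interpolate(33, 30, 4.51, 40, 4.31)
print(round(f_05, 2), round(f_01, 2))  # 2.89 4.44
```

Under these assumed table entries the interpolated values agree with the 2.89 and 4.44 shown below Table 8.1.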
What is the biological meaning of this conclusion? For some reason, the
ticks on different host individuals differ more from each other than do individual
ticks on any one host. This may be due to some modifying influence of individual
hosts on the ticks (biochemical differences in blood, differences in the skin,
differences in the environment of the host individual, all of them rather unlikely
in this case), or it may be due to genetic differences among the ticks.
Possibly the ticks on each host represent a sibship (that is, are descendants of a
single pair of parents) and the differences in the ticks among host individuals
represent genetic differences among families; or perhaps selection has acted differently
on the tick populations on each host, or the hosts have migrated to the
collection locality from different geographic areas in which the ticks differ in
width of scutum. Of these various possibilities, genetic differences among sibships
seem most reasonable, in view of the biology of the organism.
The computations up to this point would have been identical in a Model I
anova. If this had been Model I, the conclusion would have been that there
is a significant treatment effect rather than an added variance component. Now,
however, we must complete the computations appropriate to a Model II anova.
These will include the estimation of the added variance component and the
calculation of percentage variation at the two levels.
$$n_0 = \frac{1}{a - 1}\left(\sum^{a} n_i - \frac{\sum^{a} n_i^2}{\sum^{a} n_i}\right) \tag{8.1}$$
$$= \frac{1}{4 - 1}\left[(8 + 10 + 13 + 6) - \frac{8^2 + 10^2 + 13^2 + 6^2}{8 + 10 + 13 + 6}\right] = 9.009$$
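As a quick check of Expression (8.1), the average sample size can be computed directly from the four host sample sizes. The function name below is hypothetical; the arithmetic follows the expression term by term.

```python
def average_sample_size(sizes):
    """n_0 of Expression (8.1) for unequal sample sizes."""
    a = len(sizes)
    total = sum(sizes)
    return (total - sum(n * n for n in sizes) / total) / (a - 1)


n_0 = average_sample_size([8, 10, 13, 6])
print(round(n_0, 3))  # 9.009
```

When all sample sizes are equal, the same function returns the common n, which is a convenient sanity check.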
8.4 Two groups
BOX 8.2
Testing the difference in means between two groups.
Average age (in days) at beginning of reproduction in Daphnia longispina (each
variate is a mean based on approximately similar numbers of females). Two series
derived from different genetic crosses and containing seven clones each are
compared; n = 7 clones per series. This is a Model I anova.
               Series (a = 2)
               I          II
               7.2        8.8
               7.1        7.5
               9.1        7.7
               7.2        7.6
               7.3        7.4
               7.2        6.7
               7.5        7.2
$n$            7          7
$\sum Y$       52.6       52.9
$\bar{Y}$      7.5143     7.5571
$\sum Y^2$     398.28     402.23
$s^2$          0.5047     0.4095
Single classification anova with two groups with equal sample sizes
Anova table
Source of variation   df   SS   MS
$F_{0.05[1,12]} = 4.75$
A t test of the hypothesis that two sample means come from a population with
equal $\mu$; also confidence limits of the difference between two means
This test assumes that the variances in the populations from which the two
samples were taken are identical. If in doubt about this hypothesis, test by method
of Box 7.1, Section 7.3.
The appropriate formula for $t_s$ is one of the following:
Expression (8.2), when sample sizes are unequal and $n_1$ or $n_2$ or both sample
sizes are small (< 30): $df = n_1 + n_2 - 2$
Expression (8.3), when sample sizes are identical (regardless of size): $df = 2(n - 1)$
Expression (8.4), when $n_1$ and $n_2$ are unequal but both are large (> 30): $df = n_1 + n_2 - 2$
For the present data, since sample sizes are equal, we choose Expression (8.3):
$$t_s = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{1}{n}\left(s_1^2 + s_2^2\right)}}$$
is very similar for the two series. It would surprise us, therefore, to find that
they are significantly different. However, we shall carry out a test anyway. As
you realize by now, one cannot tell from the magnitude of a difference whether
it is significant. This depends on the magnitude of the error mean square, representing
the variance within series.
The computations for the analysis of variance are not shown. They would
be the same as in Box 8.1. With equal sample sizes and only two groups, there
$$SS_{\text{groups}} = \frac{\left(\sum^{n} Y_1 - \sum^{n} Y_2\right)^2}{2n} = \frac{(52.6 - 52.9)^2}{14} = 0.00643$$
There is only 1 degree of freedom between the two groups. The critical value of
$F_{0.05[1,12]}$ is given underneath the anova table, but it is really not necessary to
consult it. Inspection of the mean squares in the anova shows that $MS_{\text{groups}}$
is much smaller than $MS_{\text{within}}$; therefore the value of $F_s$ is far below unity,
and there cannot possibly be an added component due to treatment effects
between the series. In cases where $MS_{\text{groups}} < MS_{\text{within}}$, we do not usually bother
to calculate $F_s$, because the analysis of variance could not possibly be significant.
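The two-group anova just described can be reproduced from the Daphnia readings in Box 8.2. This is a sketch only; the variable and function names are assumptions, and the between-groups sum of squares uses the two-group shortcut (difference of the group sums, squared, over 2n).

```python
series_1 = [7.2, 7.1, 9.1, 7.2, 7.3, 7.2, 7.5]
series_2 = [8.8, 7.5, 7.7, 7.6, 7.4, 6.7, 7.2]
n = len(series_1)  # equal sample sizes, n = 7


def sum_of_squares(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values)


# Two-group shortcut for the sum of squares between groups (1 df).
ss_groups = (sum(series_1) - sum(series_2)) ** 2 / (2 * n)
ss_within = sum_of_squares(series_1) + sum_of_squares(series_2)

ms_groups = ss_groups / 1
ms_within = ss_within / (2 * (n - 1))
f_s = ms_groups / ms_within
print(ss_groups, ms_within, f_s)
```

Because `ms_groups` is far smaller than `ms_within`, the ratio is well below 1 and cannot exceed the critical value of 4.75.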
There is another method of solving a Model I two-sample analysis of variance.
This is a t test of the differences between two means. This t test is the
traditional method of solving such a problem; it may already be familiar to you
from previous acquaintance with statistical work. It has no real advantage in
either ease of computation or understanding, and as you will see, it is mathematically
equivalent to the anova in Box 8.2. It is presented here mainly for
the sake of completeness. It would seem too much of a break with tradition
not to have the t test in a biostatistics text.
In Section 6.4 we learned about the t distribution and saw that a t distribution
of $n - 1$ degrees of freedom could be obtained from a distribution of
the term $(\bar{Y}_i - \mu)/s_{\bar{Y}_i}$, where $s_{\bar{Y}_i}$ has $n - 1$ degrees of freedom and $\bar{Y}$ is normally
distributed. The numerator of this term represents a deviation of a sample mean
from a parametric mean, and the denominator represents a standard error for
such a deviation. We now learn that the expression
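$$t_s = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\left[\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left(\dfrac{n_1 + n_2}{n_1 n_2}\right)}} \tag{8.2}$$
$$t_s = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{1}{n}\left(s_1^2 + s_2^2\right)}} \tag{8.3}$$
$$t_s = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{1}{n_2}s_1^2 + \dfrac{1}{n_1}s_2^2}} \tag{8.4}$$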
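The t test can be sketched with the summary statistics from Box 8.2. The code assumes the pooled-variance form of the standard error for Expression (8.2) and its equal-n simplification for Expression (8.3), and it tests the null hypothesis that the two parametric means are equal; the function names are hypothetical.

```python
import math


def t_pooled(mean_1, mean_2, var_1, var_2, n_1, n_2):
    """t_s with a pooled variance (assumed form of Expression 8.2)."""
    pooled = ((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled * (n_1 + n_2) / (n_1 * n_2))


def t_equal_n(mean_1, mean_2, var_1, var_2, n):
    """t_s for equal sample sizes (assumed form of Expression 8.3)."""
    return (mean_1 - mean_2) / math.sqrt((var_1 + var_2) / n)


# Means and variances of the two Daphnia series, n = 7 clones each.
t_s = t_equal_n(7.5143, 7.5571, 0.5047, 0.4095, 7)
print(t_s, t_s ** 2)
```

The square of `t_s` is about 0.014, the same as the F ratio of the two-group anova, which is the equivalence between the two methods that the text refers to.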
SS (control versus sugars)
$$= \frac{(701)^2}{10} + \frac{(593 + 582 + 580 + 641)^2}{40} - \frac{(701 + 593 + 582 + 580 + 641)^2}{50}$$
$$= \frac{(701)^2}{10} + \frac{(2396)^2}{40} - \frac{(3097)^2}{50} = 832.32$$
TABLE 8.2
Means, group sums, and sample sizes from the data in Box 8.1. Length of pea sections grown in
tissue culture (in ocular units).
                                            1% glucose +
            Control   glucose   fructose    1% fructose    sucrose   $\sum$
$\bar{Y}$   70.1      59.3      58.2        58.0           64.1      (61.94 = $\bar{\bar{Y}}$)
$\sum Y$    701       593       582         580            641       3097
$n$         10        10        10          10             10        50
between these two groups. Since a comparison between two groups has only 1
degree of freedom, the sum of squares is at the same time a mean square. This
mean square is tested over the error mean square of the anova to give the
following comparison:
$$F_s = \frac{MS\ (\text{control versus sugars})}{MS_{\text{within}}} = \frac{832.32}{5.46} = 152.44$$
$$F_{0.05[1,45]} = 4.05, \qquad F_{0.01[1,45]} = 7.23$$
This comparison is highly significant, showing that the additions of sugars have
significantly retarded the growth of the pea sections.
Next we test whether the mixture of sugars is significantly different from
the pure sugars. Using the same technique, we calculate
SS (mixed sugars versus pure sugars)
$$= \frac{(580)^2}{10} + \frac{(593 + 582 + 641)^2}{30} - \frac{(593 + 582 + 580 + 641)^2}{40} = 48.13$$
$$MS\ (\text{among pure sugars}) = \frac{SS\ (\text{among pure sugars})}{df} = \frac{196.87}{2} = 98.433$$
$$F_s = \frac{MS\ (\text{among pure sugars})}{MS_{\text{within}}} = \frac{98.433}{5.46} = 18.03$$
                                              df
SS (control versus sugars)      =  832.32     1
SS (mixed versus pure sugars)   =   48.13     1
SS (among pure sugars)          =  196.87     2
SS (among treatments)           = 1077.32     4
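The decomposition of the treatment sum of squares can be checked from the group sums alone. The sketch below uses a hypothetical helper, `comparison_ss`, that applies the same technique as the worked equations: each set's squared total over its sample size, minus the squared total of all sets involved over their combined sample size.

```python
def comparison_ss(*sets):
    """SS for a comparison among sets of groups.

    Each set is a list of (group_sum, sample_size) pairs that are pooled.
    """
    pooled = [(sum(s for s, _ in grp), sum(n for _, n in grp)) for grp in sets]
    all_sum = sum(s for s, _ in pooled)
    all_n = sum(n for _, n in pooled)
    return sum(s * s / n for s, n in pooled) - all_sum ** 2 / all_n


# Group sums and sample sizes of the pea section data.
control, glucose, fructose = (701, 10), (593, 10), (582, 10)
mixed, sucrose = (580, 10), (641, 10)

ss_control_vs_sugars = comparison_ss([control], [glucose, fructose, mixed, sucrose])
ss_mixed_vs_pure = comparison_ss([mixed], [glucose, fructose, sucrose])
ss_among_pure = comparison_ss([glucose], [fructose], [sucrose])
ss_treatments = comparison_ss([control], [glucose], [fructose], [mixed], [sucrose])

print(ss_control_vs_sugars, ss_mixed_vs_pure, ss_among_pure, ss_treatments)
```

The three planned comparisons use 1 + 1 + 2 = 4 degrees of freedom and their sums of squares add up to the treatment sum of squares, as in the table above.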
TABLE 8.3
Anova table from Box 8.1, with treatment sum of squares decomposed into
planned comparisons.
$$\alpha' = \frac{\alpha}{k} = 0.01$$
for an experimentwise critical $\alpha = 0.05$. Thus, the critical value for the $F_s$ ratios
of these comparisons is $F_{0.01[1,45]}$ or $F_{0.01[2,45]}$, as appropriate. The first three
tests are carried out as shown above. The last test is computed in a similar
manner:
$$\frac{MS_{\text{groups}}}{MS_{\text{within}}} \ge F_{\alpha[a-1,\,a(n-1)]} \tag{8.5}$$
Since $MS_{\text{groups}}/MS_{\text{within}} = SS_{\text{groups}}/[(a - 1)\,MS_{\text{within}}]$, we can rewrite Expression
(8.5) as
$$SS_{\text{groups}} \ge (a - 1)\,MS_{\text{within}}\,F_{\alpha[a-1,\,a(n-1)]}$$
which is greater than the critical SS. We conclude, therefore, that sucrose retards
growth significantly less than the other sugars tested. We may continue
in this fashion, testing all the differences that look suspicious or even testing
all possible sets of means, considering them 2, 3, 4, and 5 at a time. This latter
approach may require a computer if there are more than 5 means to be compared,
since there are very many possible tests that could be made. This
procedure was proposed by Gabriel (1964), who called it a sum of squares simultaneous
test procedure (SS-STP).
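An unplanned test of this kind compares the sum of squares of a set of means with a single critical SS. The sketch below makes two assumptions that are not stated in the surviving text: that the comparison is sucrose against the three other sugar treatments, and that the tabled value of F at the 5% level for 4 and 45 degrees of freedom is about 2.58.

```python
def critical_ss(a, ms_within, f_critical):
    """Critical sum of squares: (a - 1) * MS_within * F."""
    return (a - 1) * ms_within * f_critical


def two_set_ss(sum_1, n_1, sum_2, n_2):
    """SS for the difference between two pooled sets of groups."""
    return sum_1 ** 2 / n_1 + sum_2 ** 2 / n_2 - (sum_1 + sum_2) ** 2 / (n_1 + n_2)


# a = 5 treatments and MS_within = 5.46 come from the pea section anova;
# the F value 2.58 is an assumed table entry.
threshold = critical_ss(5, 5.46, 2.58)

# Sucrose (sum 641, n = 10) versus glucose, fructose, and the mixture.
ss_sucrose_vs_others = two_set_ss(641, 10, 593 + 582 + 580, 30)
print(ss_sucrose_vs_others, threshold, ss_sucrose_vs_others > threshold)
```

Any set of means whose sum of squares exceeds the threshold would be declared significantly heterogeneous at the experimentwise level.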
In the SS-STP and in the original anova, the chance of making any type I
error at all is α, the probability selected for the critical F value from Table V.
By "making any type I error at all" we mean making such an error in the overall
test of significance of the anova and in any of the subsidiary comparisons among
means or sets of means needed to complete the analysis of the experiment. This
probability α therefore is an experimentwise error rate. Note that though the
probability of any error at all is α, the probability of error for any particular
test of some subset, such as a test of the difference among three or between two
means, will always be less than α. Thus, for the test of each subset one is really
using a significance level α′, which may be much less than the experimentwise
α, and if there are many means in the anova, this actual error rate α′ may be
one-tenth, one one-hundredth, or even one one-thousandth of the experimentwise
α (Gabriel, 1964). For this reason, the unplanned tests discussed above
and the overall anova are not very sensitive to differences between individual
means or differences within small subsets. Obviously, not many differences are
going to be considered significant if α′ is minute. This is the price we pay for
not planning our comparisons before we examine the data: if we were to make
planned tests, the error rate of each would be greater, hence less conservative.
The SS-STP procedure is only one of numerous techniques for multiple
unplanned comparisons. It is the most conservative, since it allows a large
number of possible comparisons. Differences shown to be significant by this
method can be reliably reported as significant differences. However, more sensitive
and powerful comparisons exist when the number of possible comparisons
is circumscribed by the user. This is a complex subject, to which a more complete
introduction is given in Sokal and Rohlf (1981), Section 9.7.
Exercises
8.1 The following is an example with easy numbers to help you become familiar
with the analysis of variance. A plant ecologist wishes to test the hypothesis
that the height of plant species X depends on the type of soil it grows in. He has
measured the height of three plants in each of four plots representing different
soil types, all four plots being contained in an area of two miles square. His
results are tabulated below. (Height is given in centimeters.) Does your analysis
support this hypothesis? ANS. Yes, since $F_s = 6.951$ is larger than
$F_{0.05[3,8]} = 4.07$.
Observation          Plots
number         1     2     3     4
1              15    25    17    10
2               9    21    23    13
3              14    19    20    16
8.2 The following are measurements (in coded micrometer units) of the thorax length
of the aphid Pemphigus populitransversus. The aphids were collected in 28 galls
on the cottonwood Populus deltoides. Four alate (winged) aphids were randomly
selected from each gall and measured. The alate aphids of each gall are isogenic
(identical twins), being descended parthenogenetically from one stem mother.
Thus, any variance within galls can be due to environment only. Variance between
galls may be due to differences in genotype and also to environmental
differences between galls. If this character, thorax length, is affected by genetic
variation, significant intergall variance must be present. The converse is not necessarily
true: significant variance between galls need not indicate genetic variation;
it could as well be due to environmental differences between galls (data by
Sokal, 1952). Analyze the variance of thorax length. Is there significant intergall
variance present? Give estimates of the added component of intergall variance,
if present. What percentage of the variance is controlled by intragall and what
percentage by intergall factors? Discuss your results.
8.3 Millis and Seng (1954) published a study on the relation of birth order to the
birth weights of infants. The data below on first-born and eighth-born infants are
extracted from a table of birth weights of male infants of Chinese third-class
patients at the Kandang Kerbau Maternity Hospital in Singapore in 1950 and
1951.
Birth weight         Birth order
(lb:oz)              1        8
3:0-3:7
3:8-3:15             1
4:0-4:7              3
4:8-4:15             7        4
5:0-5:7              111      5
5:8-5:15             267      19
6:0-6:7              457      52
6:8-6:15             485      55
7:0-7:7              363      61
7:8-7:15             162      48
8:0-8:7              64       39
8:8-8:15             6        19
9:0-9:7              5        4
9:8-9:15
10:0-10:7                     1
10:8-10:15
                     1932     307
                            $n$    $\bar{Y}$    $s_{\bar{Y}}$
24 hours after
methoxychlor injection      5      24.8         0.9
Control                     3      19.7         1.4
                  Strain
                  SL       CS       LL
$n_i$             80       69       33
$\sum Y$          8070     7291     3640
$\sum^{a}\sum^{n_i} Y^2 = 1,994,650$
Note that part of the computation has already been performed for you. Perform
unplanned tests among the three means (short vs. long larval periods and each
against the control). Set 95% confidence limits to the observed differences of
means for which these comparisons are made. ANS. $MS_{(\text{SL vs. LL})} = 2076.6697$.
8.6 These data are measurements of five random samples of domestic pigeons collected
during January, February, and March in Chicago in 1955. The variable
is the length from the anterior end of the narial opening to the tip of the bony
beak and is recorded in millimeters. Data from Olson and Miller (1958).
Samples
1 2 3 4 5
8.7 The following data were taken from a study of blood protein variations in deer
(Cowan and Johnston, 1962). The variable is the mobility of serum protein fraction
II expressed as $10^{-5}$ cm$^2$/volt-seconds.
$\bar{Y}$   $s_{\bar{Y}}$
A, B
A, C
A, D
A, (B + C + D)/3
B, (C + D)/2
CHAPTER 9
Two-Way Analysis of Variance
BOX 9.1
Two-way anova with replication.
[Data table: oxygen consumption of two species of limpets, Acmaea scabra and A. digitalis, at three seawater concentrations (100%, 75%, and 50%); n = 8 readings in each of the six subgroups.]
BOX 9.1
Continued
Preliminary computations
1. Grand total = $\sum^{a}\sum^{b}\sum^{n} Y = 461.74$
2. Sum of the squared observations = $\sum^{a}\sum^{b}\sum^{n} Y^2 = \cdots + (12.30)^2 = 5065.1530$
3. Sum of the squared subgroup (cell) totals, divided by the sample size of the subgroups
   $$= \frac{\sum^{a}\sum^{b}\left(\sum^{n} Y\right)^2}{n} = \frac{(84.49)^2 + \cdots + (98.61)^2}{8} = 4663.6317$$
4. Sum of the squared column totals divided by the sample size of a column
   $$= \frac{\sum^{a}\left(\sum^{b}\sum^{n} Y\right)^2}{bn} = \frac{(245.00)^2 + (216.74)^2}{3 \times 8} = 4458.3844$$
5. Sum of the squared row totals divided by the sample size of a row
   $$= \frac{\sum^{b}\left(\sum^{a}\sum^{n} Y\right)^2}{an} = \frac{(143.92)^2 + (121.82)^2 + (196.00)^2}{2 \times 8} = 4623.0674$$
6. Grand total squared and divided by the total sample size = correction term CT
   $$= \frac{\left(\sum^{a}\sum^{b}\sum^{n} Y\right)^2}{abn} = \frac{(\text{quantity 1})^2}{abn} = \frac{(461.74)^2}{2 \times 3 \times 8} = 4441.7464$$
7. $SS_{\text{total}} = \sum^{a}\sum^{b}\sum^{n} Y^2 - CT$ = quantity 2 - quantity 6 = 5065.1530 - 4441.7464 = 623.4066
8. $SS_{\text{subgr}} = \frac{\sum^{a}\sum^{b}\left(\sum^{n} Y\right)^2}{n} - CT$ = quantity 3 - quantity 6 = 4663.6317 - 4441.7464 = 221.8853
9. $SS_A$ (SS of columns) $= \frac{\sum^{a}\left(\sum^{b}\sum^{n} Y\right)^2}{bn} - CT$ = quantity 4 - quantity 6 = 4458.3844 - 4441.7464 = 16.6380
10. $SS_B$ (SS of rows) $= \frac{\sum^{b}\left(\sum^{a}\sum^{n} Y\right)^2}{an} - CT$ = quantity 5 - quantity 6 = 4623.0674 - 4441.7464 = 181.3210
11. $SS_{A \times B}$ (interaction SS) $= SS_{\text{subgr}} - SS_A - SS_B$ = quantity 8 - quantity 9 - quantity 10
    = 221.8853 - 16.6380 - 181.3210 = 23.9263
12. $SS_{\text{within}}$ (within subgroups; error SS) $= SS_{\text{total}} - SS_{\text{subgr}}$ = quantity 7 - quantity 8
    = 623.4066 - 221.8853 = 401.5213
As a check on your computations, ascertain that the following relations hold for some of the above quantities: $2 \ge 3 \ge 4 \ge 6$;
$3 \ge 5 \ge 6$.
Explicit formulas for these sums of squares suitable for computer programs are as follows:
9a. $SS_A = nb\sum^{a}(\bar{Y}_A - \bar{\bar{Y}})^2$
10a. $SS_B = na\sum^{b}(\bar{Y}_B - \bar{\bar{Y}})^2$
11a. $SS_{AB} = n\sum^{a}\sum^{b}(\bar{Y} - \bar{Y}_A - \bar{Y}_B + \bar{\bar{Y}})^2$
12a. $SS_{\text{within}} = \sum^{a}\sum^{b}\sum^{n}(Y - \bar{Y})^2$
BOX 9.1
Continued
Source of variation                                                    df                  SS    MS                        Expected MS (Model I)
$\bar{Y}_A - \bar{\bar{Y}}$   A (columns)                              $a - 1$             9     $9/(a - 1)$               $\sigma^2 + \frac{nb}{a - 1}\sum^{a}\alpha^2$
$\bar{Y}_B - \bar{\bar{Y}}$   B (rows)                                 $b - 1$             10    $10/(b - 1)$              $\sigma^2 + \frac{na}{b - 1}\sum^{b}\beta^2$
$\bar{Y} - \bar{Y}_A - \bar{Y}_B + \bar{\bar{Y}}$   A × B (interaction)   $(a - 1)(b - 1)$    11    $11/[(a - 1)(b - 1)]$     $\sigma^2 + \frac{n}{(a - 1)(b - 1)}\sum^{a}\sum^{b}(\alpha\beta)^2$
$Y - \bar{Y}$   Within subgroups                                       $ab(n - 1)$         12    $12/[ab(n - 1)]$          $\sigma^2$
$Y - \bar{\bar{Y}}$   Total                                            $abn - 1$           7
Since the present example is a Model I anova for both factors, the expected mean squares above are correct. Below are the corresponding
expressions for other models.
Source of variation    Model II                                        Mixed model (A fixed, B random)
A                      $\sigma^2 + n\sigma^2_{AB} + nb\sigma^2_A$      $\sigma^2 + n\sigma^2_{AB} + \frac{nb}{a - 1}\sum^{a}\alpha^2$
B                      $\sigma^2 + n\sigma^2_{AB} + na\sigma^2_B$      $\sigma^2 + na\sigma^2_B$
A × B                  $\sigma^2 + n\sigma^2_{AB}$                     $\sigma^2 + n\sigma^2_{AB}$
Within subgroups       $\sigma^2$                                      $\sigma^2$
Anova table
Source of variation   df   SS   MS   $F_s$
Since this is a Model I anova, all mean squares are tested over the error MS. For a discussion of significance tests, see Section
9.2.
Conclusions. Oxygen consumption does not differ significantly between the two species of limpets but differs with the salinity.
At 50% seawater, the O$_2$ consumption is increased. Salinity appears to affect the two species equally, for there is insufficient evidence
of a species × salinity interaction.
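The partition in Box 9.1 can be sketched from the marginal totals and the two quantities that depend on the individual readings. This is an illustration under stated assumptions: quantities 2 and 3 are taken as given in the box, the first column total (245.00) is inferred as the grand total minus the second column total, and all variable names are hypothetical.

```python
a, b, n = 2, 3, 8                          # species, salinities, replicates
column_totals = [245.00, 216.74]           # first value inferred: 461.74 - 216.74
row_totals = [143.92, 121.82, 196.00]
sum_sq_observations = 5065.1530            # quantity 2, as given in the box
sum_sq_subgroups_over_n = 4663.6317        # quantity 3, as given in the box

grand_total = sum(column_totals)
ct = grand_total ** 2 / (a * b * n)        # correction term, quantity 6

ss_total = sum_sq_observations - ct
ss_subgr = sum_sq_subgroups_over_n - ct
ss_a = sum(t ** 2 for t in column_totals) / (b * n) - ct
ss_b = sum(t ** 2 for t in row_totals) / (a * n) - ct
ss_ab = ss_subgr - ss_a - ss_b             # interaction obtained by subtraction
ss_within = ss_total - ss_subgr

# Model I: every mean square is tested over the error mean square.
ms_within = ss_within / (a * b * (n - 1))
f_ratios = {}
for name, df, ss in [("A (species)", a - 1, ss_a),
                     ("B (salinities)", b - 1, ss_b),
                     ("A x B", (a - 1) * (b - 1), ss_ab)]:
    ms = ss / df
    f_ratios[name] = ms / ms_within
    print(f"{name:15s} df={df:2d} SS={ss:9.4f} MS={ms:8.4f} F={f_ratios[name]:6.3f}")
print(f"{'Within':15s} df={a * b * (n - 1):2d} SS={ss_within:9.4f} MS={ms_within:8.4f}")
```

Only the ratio for salinities is large, which is consistent with the conclusions stated in the box; the formal decision still requires the tabled critical values.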
192 chapter 9 / two-way analysis of variance
TABLE 9.1
Preliminary anova of subgroups in two-way anova. Data from Box 9.1.
Source of variation   df   SS   MS
quotients we subtract the correction term, computed as quantity 6. These subtractions
are carried out as steps 9 and 10, respectively. Since the rows and
columns are based on equal sample sizes, we do not have to obtain a separate
quotient for the square of each row or column sum but carry out a single division
after accumulating the squares of the sums.
Let us return for a moment to the preliminary analysis of variance in
Table 9.1, which divided the total sum of squares into two parts: the sum of
squares among the six subgroups; and that within the subgroups, the error sum
of squares. The new sums of squares pertaining to row and column effects clearly
are not part of the error, but must contribute to the differences that comprise
the sum of squares among the six subgroups. We therefore subtract row and
column SS from the subgroup SS. The latter is 221.8853. The row SS is 181.3210,
and the column SS is 16.6380. Together they add up to 197.9590, almost but
not quite the value of the subgroup sum of squares. The difference represents
a third sum of squares, called the interaction sum of squares, whose value in
this case is 23.9263.
We shall discuss the meaning of this new sum of squares presently. At the
moment let us say only that it is almost always present (but not necessarily
significant) and generally that it need not be independently computed but may
be obtained as illustrated above by the subtraction of the row SS and the column
SS from the subgroup SS. This procedure is shown graphically in Figure
9.1, which illustrates the decomposition of the total sum of squares into the subgroup
SS and error SS. The former is subdivided into the row SS, column SS,
and interaction SS. The relative magnitudes of these sums of squares will differ
from experiment to experiment. In Figure 9.1 they are not shown proportional
to their actual values in the limpet experiment; otherwise the area representing
the row SS would have to be about 11 times that allotted to the column SS.
Before we can intelligently test for significance in this anova we must understand
the meaning of interaction. We can best explain interaction in a two-way
anova by means of an artificial illustration based on the limpet data we have
just studied. If we interchange the readings for 75% and 50% for A. digitalis
only, we obtain the data table shown in Table 9.2. Only the sums of the subgroups,
rows, and columns are shown. We complete the analysis of variance
in the manner presented above and note the results at the foot of Table 9.2.
The total and error SS are the same as before (Table 9.1). This should not be
9.1 / two-way anova with replication 193
Total SS = 623.4066
    Subgroup SS = 221.8853
        Row SS = 181.3210
        Column SS = 16.6380
        Interaction SS = 23.9263
    Error SS = 401.5213
FIGURE 9.1
Diagrammatic representation of the partitioning of the total sums of squares in a two-way orthogonal
anova. The areas of the subdivisions are not shown proportional to the magnitudes of the sums
of squares.
TABLE 9.2
An artificial example to illustrate the meaning of interaction. The readings
for 75% and 50% seawater concentrations of Acmaea digitalis in Box 9.1
have been interchanged. Only subgroup and marginal totals are given
below.
                     Species
Seawater
concentration    A. scabra   A. digitalis   $\sum$
Completed anova
Source of variation   df   SS   MS
of the second and third rows have been altered appreciably as a result of the
interchange of the readings for 75% and 50% salinity in A. digitalis. The sum
for 75% salinity is now very close to that for 50% salinity, and the difference
between the salinities, previously quite marked, is now no longer so. By contrast,
the interaction SS, obtained by subtracting the sums of squares of rows
and columns from the subgroup SS, is now a large quantity. Remember that
the subgroup SS is the same in the two examples. In the first example we subtracted
sums of squares due to the effects of both species and salinities, leaving
only a tiny residual representing the interaction. In the second example these
two main effects (species and salinities) account only for little of the subgroup
sum of squares, leaving the interaction sum of squares as a substantial residual.
What is the essential difference between these two examples?
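The contrast between the two examples can be reproduced approximately from subgroup means. The sketch below works from the rounded means of the original limpet data and forms the artificial data by interchanging the 75% and 50% means of A. digitalis; because the means are rounded to two decimals, the sums of squares differ slightly from those computed from the full readings. The function name is hypothetical.

```python
def two_way_ss(cell_means, n):
    """Row, column, and interaction SS from a table of subgroup means."""
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in cell_means]
    col_means = [sum(row[j] for row in cell_means) / n_rows for j in range(n_cols)]
    ss_rows = n * n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * n_rows * sum((m - grand) ** 2 for m in col_means)
    ss_subgr = n * sum((y - grand) ** 2 for row in cell_means for y in row)
    return ss_rows, ss_cols, ss_subgr - ss_rows - ss_cols


# Rows: 100%, 75%, 50% seawater; columns: A. scabra, A. digitalis.
original = [[10.56, 7.43], [7.89, 7.34], [12.17, 12.33]]
artificial = [[10.56, 7.43], [7.89, 12.33], [12.17, 7.34]]  # digitalis swapped

rows_o, cols_o, inter_o = two_way_ss(original, 8)
rows_a, cols_a, inter_a = two_way_ss(artificial, 8)
print(rows_o, cols_o, inter_o)
print(rows_a, cols_a, inter_a)
```

The column (species) SS is unchanged by the interchange, while the row (salinity) SS shrinks and the interaction SS absorbs almost all of the subgroup SS.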
In Table 9.3 we have shown the subgroup and marginal means for the
original data from Table 9.1 and for the altered data of Table 9.2. The original
results are quite clear: at 75% salinity, oxygen consumption is lower than at
the other two salinities, and this is true for both species. We note further that
A. scabra consumes more oxygen than A. digitalis at two of the salinities. Thus
our statements about differences due to species or to salinity can be made
largely independent of each other. However, if we had to interpret the artificial
data (lower half of Table 9.3), we would note that although A. scabra still consumes
more oxygen than A. digitalis (since column sums have not changed), this
difference depends greatly on the salinity. At 100% and 50%, A. scabra consumes
considerably more oxygen than A. digitalis, but at 75% this relationship
is reversed. Thus, we are no longer able to make an unequivocal statement
about the amount of oxygen taken up by the two species. We have to qualify
our statement by the seawater concentration at which they are kept. At 100%
and 50%, $\bar{Y}_{\text{scabra}} > \bar{Y}_{\text{digitalis}}$, but at 75%, $\bar{Y}_{\text{scabra}} < \bar{Y}_{\text{digitalis}}$. If we examine the
effects of salinity in the artificial example, we notice a mild increase in oxygen
consumption at 75%. However, again we have to qualify this statement by the
species of the consuming limpet; scabra consumes least at 75%, while digitalis
consumes most at this concentration.
TABLE 9.3
Comparison of means of the data in Box 9.1 and Table 9.2.
                     Species
Seawater
concentration    A. scabra   A. digitalis   Mean
Original data from Box 9.1
100%             10.56       7.43           9.00
75%               7.89       7.34           7.61
50%              12.17      12.33          12.25
Mean             10.21       9.03           9.62
This dependence of the effect of one factor on the level of another factor
is called interaction. It is a common and fundamental scientific idea. It indicates
that the effects of the two factors are not simply additive but that any given
combination of levels of factors, such as salinity combined with any one species,
contributes a positive or negative increment to the level of expression of the
variable. In common biological terminology a large positive increment of this
sort is called synergism. When drugs act synergistically, the result of the interaction
of the two drugs may be above and beyond the sum of the separate effects
of each drug. When levels of two factors in combination inhibit each other's
effects, we call it interference. (Note that "levels" in anova is customarily used
in a loose sense to include not only continuous factors, such as the salinity in
the present example, but also qualitative factors, such as the two species of
limpets.) Synergism and interference will both tend to magnify the interaction
SS.
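$$(\bar{Y} - \bar{\bar{Y}}) - (\bar{R} - \bar{\bar{Y}}) - (\bar{C} - \bar{\bar{Y}}) = \bar{Y} - \bar{\bar{Y}} - \bar{R} + \bar{\bar{Y}} - \bar{C} + \bar{\bar{Y}}$$
$$= \bar{Y} - \bar{R} - \bar{C} + \bar{\bar{Y}}$$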
This somewhat involved expression is the deviation due to interaction. When
we evaluate one such expression for each subgroup, square it, sum the squares,
and multiply the sum by n, we obtain the interaction SS. This partition of the
deviations also holds for their squares. This is so because the sums of the products
of the separate terms cancel out.
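The deviation due to interaction can be evaluated directly for each subgroup. The following sketch uses the rounded subgroup means of the original limpet data, so the resulting interaction SS is close to, but not exactly, the value obtained by subtraction from the full readings. The function name is hypothetical.

```python
def interaction_deviations(cell_means):
    """Subgroup mean minus row mean minus column mean plus grand mean."""
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    row_means = [sum(row) / n_cols for row in cell_means]
    col_means = [sum(row[j] for row in cell_means) / n_rows for j in range(n_cols)]
    grand = sum(row_means) / n_rows
    return [[cell_means[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n_cols)]
            for i in range(n_rows)]


# Rows: 100%, 75%, 50% seawater; columns: A. scabra, A. digitalis.
means = [[10.56, 7.43], [7.89, 7.34], [12.17, 12.33]]
n = 8

deviations = interaction_deviations(means)
# Square each deviation, sum the squares, and multiply by n.
ss_interaction = n * sum(d * d for row in deviations for d in row)
print(deviations)
print(ss_interaction)
```

The deviations in every row and every column sum to zero, which is why the interaction carries only (a - 1)(b - 1) degrees of freedom.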
A simple method for revealing the nature of the interaction present in the
data is to inspect the means of the original data table. We can do this in Table
9.3. The original data, showing no interaction, yield the following pattern of
relative magnitudes:
          Scabra    Digitalis
100%
            ∨          ∨
75%
            ∧          ∧
50%
          Scabra    Digitalis
100%
            ∨          ∧
75%
            ∧          ∨
50%
FIGURE 9.2
Oxygen consumption by two species of limpets at three salinities. Data from Box 9.1.
(Horizontal axis: % seawater, at 50, 75, and 100.)
9.3 / two-way anova without replication 199
                       Factor A: Time (a = 3)
Factor B:              Before alcohol   Immediately        12 hours
Individuals (b = 8)    ingestion        after ingestion    later       $\sum$
The eight sets of three readings are treated as replications (blocks) in this analysis. Time is a fixed treatment effect, while differ-
ences between individuals are considered to be random effects. Hence, this is a mixed-model anova.
Preliminary computations
1. Grand total = $\sum^{a}\sum^{b} Y = 413.40$
2. Sum of the squared observations = $\sum^{a}\sum^{b} Y^2 = (20.00)^2 + \cdots + (30.45)^2 = 8349.4138$
3. Sum of squared column totals divided by sample size of a column
   $$= \frac{\sum^{a}\left(\sum^{b} Y\right)^2}{b} = \frac{(152.88)^2 + (111.68)^2 + (148.84)^2}{8} = 7249.7578$$
4. Sum of squared row totals divided by sample size of a row
   $$= \frac{\sum^{b}\left(\sum^{a} Y\right)^2}{a} = \frac{(49.79)^2 + \cdots + (82.51)^2}{3} = 8127.8059$$
5. Grand total squared and divided by the total sample size = correction term CT
   $$= \frac{\left(\sum^{a}\sum^{b} Y\right)^2}{ab} = \frac{(413.40)^2}{24} = 7120.8150$$
6. $SS_{\text{total}} = \sum^{a}\sum^{b} Y^2 - CT$ = quantity 2 - quantity 5 = 8349.4138 - 7120.8150 = 1228.5988
7. $SS_A$ (SS of columns) $= \frac{\sum^{a}\left(\sum^{b} Y\right)^2}{b} - CT$ = quantity 3 - quantity 5 = 7249.7578 - 7120.8150 = 128.9428
8. $SS_B$ (SS of rows) $= \frac{\sum^{b}\left(\sum^{a} Y\right)^2}{a} - CT$ = quantity 4 - quantity 5 = 8127.8059 - 7120.8150 = 1006.9909
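The computations for the anova without replication can be sketched from the quantities listed above. Quantities 2 and 4 are taken as given, since the individual readings and most row totals are not reproduced here; the remainder SS is obtained by subtraction, and the F ratio for the fixed factor (time) is formed over the remainder mean square. Variable names are hypothetical.

```python
a, b = 3, 8                                   # times, individuals
column_totals = [152.88, 111.68, 148.84]
sum_sq_observations = 8349.4138               # quantity 2, as given
sum_sq_rows_over_a = 8127.8059                # quantity 4, as given

grand_total = sum(column_totals)              # 413.40
ct = grand_total ** 2 / (a * b)               # correction term, quantity 5

ss_total = sum_sq_observations - ct
ss_a = sum(t ** 2 for t in column_totals) / b - ct
ss_b = sum_sq_rows_over_a - ct
ss_remainder = ss_total - ss_a - ss_b         # the only available error term

ms_a = ss_a / (a - 1)
ms_remainder = ss_remainder / ((a - 1) * (b - 1))
f_a = ms_a / ms_remainder
print(ss_a, ss_b, ss_remainder, f_a)
```

With no replication there is no within-subgroup SS, so the remainder serves as the denominator of the test for the fixed factor.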
8
w
•α §
+ +
a, «1
«3 NX
to is
+
*
*
*
ΐί
•G
OS
oo Λ
W> 00
to w-1
Tf 00 vO
5 ci
3 •Ί-
00 Ο OO
rJi οS Vl OO
•Ί- O SO Os
ο·. CJ\ \D v-i
en oo Η 00
<N4
T— ο
Ο Os r^i
π
e
CO
o.
1)
CTJ
3 υ
Ό
'>
yxtsonmljieaYWPNLJIA
•ο
c
•3 ' 3
C Β
β zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONM
ε υ
J2
ο
υ
ο
OQ ω Η
Γ-1 "S χ c .s
i 3 § Ϊ3
χ "S
Ο § - I ι
βο υ ι>? ί».
[Diagram: Total SS = 1228.5988 equals Subgroup SS = 1228.5988, which is
partitioned into Row SS = 1006.9909, Column SS = 128.9428, and Interaction
SS = 92.6651 = remainder; Error SS = 0.]
FIGURE 9.3
Diagrammatic representation of the partitioning of the total sums of squares in a two-way
orthogonal anova without replication. The areas of the subdivisions are not shown proportional
to the magnitudes of the sums of squares.
in this example is the same as the total sum of squares. If this is not immediately
apparent, consult Figure 9.3, which, when compared with Figure 9.1, illustrates
that the error sum of squares based on variation within subgroups is missing
in this example. Thus, after we subtract the sum of squares for columns (factor
A) and for rows (factor B) from the total SS, we are left with only a single sum
of squares, which is the equivalent of the previous interaction SS but which is
now the only source for an error term in the anova. This SS is known as the
remainder SS or the discrepance.
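The subtraction described above can be checked numerically. The following is a minimal sketch in Python, using only quantities 2 through 5 printed in the preliminary computations of Box 9.2; the variable names are ours, chosen for illustration, and the degrees of freedom follow from a = 3 and b = 8.

```python
# Quantities taken from the preliminary computations of Box 9.2.
sum_sq_obs = 8349.4138   # quantity 2: sum of the squared observations
col_term = 7249.7578     # quantity 3: squared column totals / b
row_term = 8127.8059     # quantity 4: squared row totals / a
correction = 7120.8150   # quantity 5: correction term CT
a, b = 3, 8              # columns (times) and rows (individuals)

ss_total = sum_sq_obs - correction
ss_a = col_term - correction
ss_b = row_term - correction
# With one observation per cell there is no within-subgroup SS,
# so whatever is left after removing rows and columns is the remainder.
ss_remainder = ss_total - ss_a - ss_b

df_a, df_b = a - 1, b - 1
df_remainder = df_a * df_b
ms_a = ss_a / df_a
ms_b = ss_b / df_b
ms_remainder = ss_remainder / df_remainder

# Both main effects are tested over the remainder MS,
# which is legitimate only if interaction is assumed absent.
f_a = ms_a / ms_remainder
f_b = ms_b / ms_remainder

print(f"remainder SS = {ss_remainder:.4f}")
print(f"F_s for time = {f_a:.3f}, F_s for individuals = {f_b:.3f}")
```

The remainder SS obtained this way agrees with the 92.6651 shown in Figure 9.3, and the F ratio for time agrees with the value quoted in the text.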
If you refer to the expected mean squares for the two-way anova in Box 9.1,
you will discover why we made the statement earlier that for some models and
tests in a two-way anova without replication we must assume that the
interaction is not significant. If interaction is present, only a Model II anova can
be entirely tested, while in a mixed model only the fixed level can be tested
over the remainder mean square. But in a pure Model I anova, or for the
random factor in a mixed model, it would be improper to test the main effects
over the remainder unless we could reliably assume that no added effect due
to interaction is present. General inspection of the data in Box 9.2 convinces
us that the trends with time for any one individual are faithfully reproduced
for the other individuals. Thus, interaction is unlikely to be present. If, for
example, some individuals had not responded with a lowering of their S-PLP
levels after ingestion of alcohol, interaction would have been apparent, and the
test of the mean square among individuals carried out in Box 9.2 would not
have been legitimate.
Since we assume no interaction, the row and column mean squares are
tested over the error MS. The results are not surprising; casual inspection of
the data would have predicted our findings. Differences with time are highly
significant, yielding an $F_s$ value of 9.741. The added variance among individuals
is also highly significant, assuming there is no interaction.
A common application of two-way anova without replication is the repeated
testing of the same individuals. By this we mean that the same group of individuals
BOX 9.3
Paired comparisons (randomized blocks with a = 2).
Lower face width (skeletal bigonial diameter in cm) for 15 North American white
girls measured when 5 and again when 6 years old.

               (1)            (2)            (3)    (4)
Individuals    5-year-olds    6-year-olds    Σ      $D = Y_{i2} - Y_{i1}$ (difference)

Anova table

Source of variation             df    SS        MS             F_s           Expected MS
Ages (columns; factor A)         1    0.3000    0.3000         388.89**      $\sigma^2 + \sigma^2_{AB} + \frac{b}{a-1}\sum\alpha^2$
Individuals (rows; factor B)    14    2.6367    0.188,34       (244.14)**    $\sigma^2 + a\sigma^2_{B}$
Remainder                       14    0.0108    0.000,771,43                 $\sigma^2 + \sigma^2_{AB}$
Total                           29    2.9475
BOX 9.3
Continued
to assume that the interaction $\sigma^2_{AB}$ is zero, we may test for an added variance
component among individual girls and would find it significant.

$$t_s = \frac{\bar{D} - (\mu_1 - \mu_2)}{s_{\bar{D}}}$$

where $\bar{D}$ is the mean difference between the paired observations,

$$\bar{D} = \frac{\sum D}{b} = \frac{3.00}{15} = 0.20$$

and $s_{\bar{D}} = s_D/\sqrt{b}$ is the standard error of $\bar{D}$ calculated from the observed differences
in column (4):

$$s_D = \sqrt{\frac{\sum D^2 - \left(\sum D\right)^2/b}{b - 1}} = \sqrt{\frac{0.6216 - (3.00^2/15)}{14}} = \sqrt{\frac{0.0216}{14}} = \sqrt{0.001{,}542{,}86} = 0.039{,}279{,}2$$

and thus

$$s_{\bar{D}} = \frac{s_D}{\sqrt{b}} = \frac{0.039{,}279{,}2}{\sqrt{15}} = 0.010{,}141{,}9$$

We assume that the true difference between the means of the two groups, $\mu_1 - \mu_2$,
equals zero:

$$t_s = \frac{\bar{D} - 0}{s_{\bar{D}}} = \frac{0.20 - 0}{0.010{,}141{,}9} = 19.7203 \quad \text{with } b - 1 = 14 \text{ df}$$

This yields $P \ll 0.01$. Also $t_s^2 = 388.89$, which equals the previous $F_s$.
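The arithmetic of the paired-comparisons t test can be retraced from the three summary figures quoted in the box (b = 15 pairs, ΣD = 3.00, ΣD² = 0.6216). The sketch below is ours, with illustrative variable names; it assumes nothing beyond those figures and the null hypothesis μ1 - μ2 = 0.

```python
import math

# Summary figures quoted in Box 9.3 for the difference column (4).
b = 15            # number of pairs (individual girls)
sum_d = 3.00      # sum of the differences D
sum_d_sq = 0.6216 # sum of the squared differences

mean_d = sum_d / b
# Standard deviation of the differences, then standard error of their mean.
s_d = math.sqrt((sum_d_sq - sum_d ** 2 / b) / (b - 1))
se_mean_d = s_d / math.sqrt(b)

# Null hypothesis: true mean difference (mu_1 - mu_2) is zero.
t_s = (mean_d - 0.0) / se_mean_d
df = b - 1

print(f"mean difference = {mean_d:.2f}")
print(f"t_s = {t_s:.4f} with {df} df; t_s squared = {t_s ** 2:.2f}")
```

Squaring the resulting t_s recovers the F_s for ages in the anova table, which is the equivalence between the two methods that the box points out.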
wipe it over the half of the leaf on one side of the midrib, rubbing the other
half of the leaf with a control or standard solution.
Another design leading to paired comparisons is to apply the treatment to
two individuals sharing a common experience, be this genetic or environmental.
Thus, a drug or a psychological test might be given to groups of twins or sibs,
one of each pair receiving the treatment, the other one not.
Finally, the paired-comparisons technique may be used when the two
individuals to be compared share a single experimental unit and are thus subjected
to common environmental experiences. If we have a set of rat cages, each of
which holds two rats, and we are trying to compare the effect of a hormone
injection with a control, we might inject one of each pair of rats with the
hormone and use its cage mate as a control. This would yield a 2 × n anova
for n cages.
One reason for featuring the paired-comparisons test separately is that it
alone among the two-way anovas without replication has an equivalent,
alternative method of analysis: the t test for paired comparisons, which is the
traditional method of analyzing it.
The paired-comparisons case shown in Box 9.3 analyzes face widths of five-
and six-year-old girls, as already mentioned. The question being asked is
whether the faces of six-year-old girls are significantly wider than those of
five-year-old girls. The data are shown in columns (1) and (2) for 15 individual girls.
Column (3) features the row sums that are necessary for the analysis of variance.
The computations for the two-way anova without replication are the same as
those already shown for Box 9.2 and thus are not shown in detail. The anova
table shows that there is a highly significant difference in face width between
the two age groups. If interaction is assumed to be zero, there is a large added
variance component among the individual girls, undoubtedly representing
genetic as well as environmental differences.
The other method of analyzing paired-comparisons designs is the
well-known t test for paired comparisons. It is quite simple to apply and is illustrated
in the second half of Box 9.3. It tests whether the mean of sample differences
between pairs of readings in the two columns is significantly different from a
hypothetical mean, which the null hypothesis puts at zero. The standard error
over which this is tested is the standard error of the mean difference. The
difference column has to be calculated and is shown in column (4) of the data
table in Box 9.3. The computations are quite straightforward, and the
conclusions are the same as for the two-way anova. This is another instance in which
we obtain the value of $F_s$ when we square the value of $t_s$.
Although the paired-comparisons t test is the traditional method of solving
this type of problem, we prefer the two-way anova. Its computation is no more
time-consuming and has the advantage of providing a measure of the variance
component among the rows (blocks). This is useful knowledge, because if there
is no significant added variance component among blocks, one might simplify
the analysis and design of future, similar studies by employing single
classification anova.
Exercises
9.1 Swanson, Latshaw, and Tague (1921) determined soil pH electrometrically for
various soil samples from Kansas. An extract of their data (acid soils) is
shown below. Do subsoils differ in pH from surface soils (assume that there is
no interaction between localities and depth for pH reading)?
       3.74   4.44   3.92   4.29   4.54   5.30   3.40   3.79   4.80   5.75
       4.01   4.37   4.95   5.24   5.18   4.50   3.55   3.66   6.45   5.14
       3.77   4.25   4.47   4.43   5.75   4.59   3.83   3.58   5.18   5.25
       3.78   3.71   4.28   4.00   5.04   5.04   3.95   3.38   4.49   4.76
       4.10   4.08   4.07   4.62   4.64   4.83   4.43   3.71   5.24   5.18
       4.06   3.90   4.10   4.29   4.79   4.55   3.70   3.94   5.70   4.22
       4.27   4.41   4.38   4.85   4.72   4.97   3.30   3.59   5.41   5.98
       3.94   4.11   3.98   4.66   3.88   5.38   3.93   3.55   4.77   4.85
       4.11   4.37   4.46   4.40   5.28   5.39   3.58   3.55   5.18   6.55
       4.25   3.53   5.05   4.33   4.66   5.97   3.54   3.43   5.23   5.72
ΣY    40.03  41.17  43.66  45.11  48.48  50.52  37.21  36.18  52.45  53.40
Ȳ     4.003  4.117  4.366  4.511  4.848  5.052  3.721  3.618  5.245  5.340

$\sum Y^2 = 2059.6109$
9.3 Blakeslee (1921) studied length-width ratios of second seedling leaves of two
types of Jimson weed called globe (G) and nominal (N). Three seeds of each
type were planted in 16 pots. Is there sufficient evidence to conclude that globe
and nominal differ in length-width ratio?

Pot identification number      Types:   G      N
Assumptions of
Analysis of Variance
We shall now examine the underlying assumptions of the analysis of variance,
methods for testing whether these assumptions are valid, the consequences for
an anova if the assumptions are violated, and steps to be taken if the
assumptions cannot be met. We should stress that before you carry out any anova
on an actual research problem, you should assure yourself that the
assumptions listed in this chapter seem reasonable. If they are not, you should carry
out one of several possible alternative steps to remedy the situation.
In Section 10.1 we briefly list the various assumptions of analysis of
variance. We describe procedures for testing some of them and briefly state the
consequences if the assumptions do not hold, and we give instructions on how
to proceed if they do not. The assumptions include random sampling,
independence, homogeneity of variances, normality, and additivity.
In many cases, departure from the assumptions of analysis of variance
can be rectified by transforming the original data by using a new scale. The
ical process of randomly allocating the treatments to the experimental plots
ensures that the ε's will be independent.
Lack of independence of the ε's can result from correlation in time rather
than space. In an experiment we might measure the effect of a treatment by
recording weights of ten individuals. Our balance may suffer from a
maladjustment that results in giving successive underestimates, compensated for by
several overestimates. Conversely, compensation by the operator of the balance
may result in regularly alternating over- and underestimates of the true weight.
Here again, randomization may overcome the problem of nonindependence of
errors. For example, we may determine the sequence in which individuals of
the various groups are weighed according to some random procedure.
There is no simple adjustment or transformation to overcome the lack of
independence of errors. The basic design of the experiment or the way in which
it is performed must be changed. If the ε's are not independent, the validity
of the usual F test of significance can be seriously impaired.
Homogeneity of variances. In Section 8.4 and Box 8.2, in which we
described the t test for the difference between two means, you were told that
the statistical test was valid only if we could assume that the variances of the
two samples were equal. Although we have not stressed it so far, this
assumption that the $\epsilon_{ij}$'s have identical variances also underlies the equivalent anova
test for two samples, and in fact any type of anova. Equality of variances in
a set of samples is an important precondition for several statistical tests.
Synonyms for this condition are homogeneity of variances and homoscedasticity.
This latter term is coined from Greek roots meaning equal scatter; the converse
condition (inequality of variances among samples) is called heteroscedasticity.
Because we assume that each sample variance is an estimate of the same
parametric error variance, the assumption of homogeneity of variances makes
intuitive sense.
We have already seen how to test whether two samples are homoscedastic
prior to a t test of the differences between two means (or the mathematically
equivalent two-sample analysis of variance): we use an F test for the hypotheses
$H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 \neq \sigma_2^2$, as illustrated in Section 7.3 and Box 7.1. For
more than two samples there is a "quick and dirty" method, preferred by many
because of its simplicity. This is the $F_{\max}$ test. This test relies on the tabled
cumulative probability distribution of a statistic that is the variance ratio of the
largest to the smallest of several sample variances. This distribution is shown in
Table VI. Let us assume that we have six anthropological samples of 10 bone
lengths each, for which we wish to carry out an anova. The variances of the
six samples range from 1.2 to 10.8. We compute the maximum variance ratio
$s^2_{\max}/s^2_{\min} = 10.8/1.2 = 9.0$ and compare it with $F_{\max\,\alpha[a,\nu]}$, critical values of which are
found in Table VI. For a = 6 and ν = n - 1 = 9, $F_{\max}$ is 7.80 and 12.1 at the
5% and 1% levels, respectively. We conclude that the variances of the six
samples are significantly heterogeneous.
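The maximum variance ratio is simple enough to sketch in a few lines of Python. The text gives only the smallest and largest of the six variances (1.2 and 10.8); the four intermediate values below are hypothetical placeholders added so that the list has six entries, and they do not affect the ratio. The critical values are the ones quoted above from Table VI for a = 6 and ν = 9.

```python
def f_max_ratio(variances):
    """Ratio of the largest to the smallest sample variance."""
    return max(variances) / min(variances)

# Only 1.2 and 10.8 come from the text; the middle four are hypothetical.
sample_variances = [1.2, 3.5, 4.1, 6.0, 8.7, 10.8]

ratio = f_max_ratio(sample_variances)

# Critical values quoted in the text for a = 6 samples and nu = 9 df.
critical_5_percent = 7.80
critical_1_percent = 12.1

print(f"F_max = {ratio:.1f}")
print("heterogeneous at the 5% level:", ratio > critical_5_percent)
print("heterogeneous at the 1% level:", ratio > critical_1_percent)
```

The observed ratio exceeds the 5% critical value but not the 1% value, so the heterogeneity reported in the text is significant at the 5% level.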
What may cause such heterogeneity? In this case, we suspect that some of
the populations are inherently more variable than others. Some races or species
are relatively uniform for one character, while others are quite variable for the
same character. In an anova representing the results of an experiment, it may
well be that one sample has been obtained under less standardized conditions
than the others and hence has a greater variance. There are also many cases
in which the heterogeneity of variances is a function of an improper choice of
measurement scale. With some measurement scales, variances vary as functions
of means. Thus, differences among means bring about heterogeneous variances.
For example, in variables following the Poisson distribution the variance is in
fact equal to the mean, and populations with greater means will therefore have
greater variances. Such departures from the assumption of homoscedasticity
can often be easily corrected by a suitable transformation, as discussed later in
this chapter.
A rapid first inspection for heteroscedasticity is to check for correlation
between the means and variances or between the means and the ranges of the
samples. If the variances increase with the means (as in a Poisson distribution),
the ratios $s^2/\bar{Y}$ or $s/\bar{Y} = V$ will be approximately constant for the samples.
If means and variances are independent, these ratios will vary widely.
The consequences of moderate heterogeneity of variances are not too
serious for the overall test of significance, but single degree of freedom
comparisons may be far from accurate.
If transformation cannot cope with heteroscedasticity, nonparametric
methods (Section 10.3) may have to be resorted to.
Normality. We have assumed that the error terms $\epsilon_{ij}$ of the variates in each
sample will be independent, that the variances of the error terms of the several
samples will be equal, and, finally, that the error terms will be normally
distributed. If there is serious question about the normality of the data, a graphic
test, as illustrated in Section 5.5, might be applied to each sample separately.
The consequences of nonnormality of error are not too serious. Only a very
skewed distribution would have a marked effect on the significance level of
the F test or on the efficiency of the design. The best way to correct for lack
of normality is to carry out a transformation that will make the data normally
distributed, as explained in the next section. If no simple transformation is
satisfactory, a nonparametric test, as carried out in Section 10.3, should be
substituted for the analysis of variance.
Additivity. In two-way anova without replication it is necessary to assume
that interaction is not present if one is to make tests of the main effects in a
Model I anova. This assumption of no interaction in a two-way anova is
sometimes also referred to as the assumption of additivity of the main effects. By this
we mean that any single observed variate can be decomposed into additive
components representing the treatment effects of a particular row and column
as well as a random term special to it. If interaction is actually present, then
the F test will be very inefficient, and possibly misleading if the effect of the
interaction is very large. A check of this assumption requires either more than
a single observation per cell (so that an error mean square can be computed)
or an independent estimate of the error mean square from previous comparable
experiments.
Interaction can be due to a variety of causes. Most frequently it means
that a given treatment combination, such as level 2 of factor A when
combined with level 3 of factor B, makes a variate deviate from the expected value.
Such a deviation is regarded as an inherent property of the natural system
under study, as in examples of synergism or interference. Similar effects occur
when a given replicate is quite aberrant, as may happen if an exceptional plot
is included in an agricultural experiment, if a diseased individual is included in
a physiological experiment, or if by mistake an individual from a different species
is included in a biometric study. Finally, an interaction term will result if the
effects of the two factors A and B on the response variable Y are multiplicative
rather than additive. An example will make this clear.
In Table 10.1 we show the additive and multiplicative treatment effects
in a hypothetical two-way anova. Let us assume that the expected population
mean μ is zero. Then the mean of the sample subjected to treatment 1 of
factor A and treatment 1 of factor B should be 2, by the conventional additive
model. This is so because each factor at level 1 contributes unity to the mean.
Similarly, the expected subgroup mean subjected to level 3 for factor A and
level 2 for factor B is 8, the respective contributions to the mean being 3 and 5.
However, if the process is multiplicative rather than additive, as occurs in a
variety of physicochemical and biological phenomena, the expected values will
be quite different. For treatment $A_1B_1$, the expected value equals 1, which is
the product of 1 and 1. For treatment $A_3B_2$, the expected value is 15, the
product of 3 and 5. If we were to analyze multiplicative data of this sort by a
conventional anova, we would find that the interaction sum of squares would
be greatly augmented because of the nonadditivity of the treatment effects. In
this case, there is a simple remedy. By transforming the variable into logarithms
(Table 10.1), we are able to restore the additivity of the data. The third item
in each cell gives the logarithm of the expected value, assuming multiplicative
TABLE 10.1
Illustration of additive and multiplicative effects.

                          Factor A
Factor B       α1 = 1    α2 = 2    α3 = 3
β1 = 1           2         3         4       Additive effects
                 1         2         3       Multiplicative effects
                 0         0.30      0.48    Log of multiplicative effects
β2 = 5           6         7         8       Additive effects
                 5         10        15      Multiplicative effects
                 0.70      1.00      1.18    Log of multiplicative effects
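The effect of the logarithmic transformation on the interaction can be verified directly from the factor levels in Table 10.1. The following is a minimal sketch; the helper `interaction_ss` is our own illustrative function, which sums the squared deviations of each cell from the value predicted by its row and column means.

```python
import math

alpha = [1, 2, 3]  # effects of factor A (columns), from Table 10.1
beta = [1, 5]      # effects of factor B (rows), from Table 10.1

additive = [[a + b for a in alpha] for b in beta]
multiplicative = [[a * b for a in alpha] for b in beta]
log_multiplicative = [[math.log10(v) for v in row] for row in multiplicative]

def interaction_ss(table):
    """Sum of squared deviations from a purely additive row-plus-column fit."""
    rows, cols = len(table), len(table[0])
    grand = sum(map(sum, table)) / (rows * cols)
    row_means = [sum(r) / cols for r in table]
    col_means = [sum(table[i][j] for i in range(rows)) / rows
                 for j in range(cols)]
    return sum((table[i][j] - row_means[i] - col_means[j] + grand) ** 2
               for i in range(rows) for j in range(cols))

for name, table in [("additive", additive),
                    ("multiplicative", multiplicative),
                    ("log of multiplicative", log_multiplicative)]:
    print(f"{name}: interaction SS = {interaction_ss(table):.4f}")
```

The additive table and the log-transformed table leave no interaction, whereas the untransformed multiplicative table does, which is the point the text makes about restoring additivity.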
10.2 Transformations
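(1)                (2)                  (3)         (4)
Number of flies    Square root of       Medium A    Medium B
emerging           number of flies
Y                  √Y                   f           f
 0                 0.00                 1           -
 1                 1.00                 5           -
 2                 1.41                 6           -
 3                 1.73                 -           -
 4                 2.00                 3           -
 5                 2.24                 -           -
 6                 2.45                 -           -
 7                 2.65                 -           2
 8                 2.83                 -           1
 9                 3.00                 -           2
10                 3.16                 -           3
11                 3.32                 -           1
12                 3.46                 -           1
13                 3.61                 -           1
14                 3.74                 -           1
15                 3.87                 -           1
16                 4.00                 -           2
                                        15          15

                                        Medium A    Medium B
Untransformed variable
  $\bar{Y}$                             1.933       11.133
  $s^2$                                 1.495       9.410
Square root transformation
  $\overline{\sqrt{Y}}$                 1.297       3.307
  $s^2_{\sqrt{Y}}$                      0.2634      0.2099

Variance ratios
  Untransformed:  $F_s = s_2^2/s_1^2 = 9.410/1.495 = 6.294$**, compared with $F_{0.025[14,14]}$
  Transformed:    $F_s = s^2_{\sqrt{Y_1}}/s^2_{\sqrt{Y_2}} = 0.2634/0.2099 = 1.255$ ns

                                        Medium A    Medium B
Back-transformed (squared) means
  $(\overline{\sqrt{Y}})^2$             1.687       10.937

95% confidence limits
  $L_1 = \overline{\sqrt{Y}} - t_{0.05}\,s_{\overline{\sqrt{Y}}}$:
      Medium A: $1.297 - 2.145\sqrt{0.2634/15} = 1.015$
      Medium B: $3.307 - 2.145\sqrt{0.2099/15} = 3.053$
  $L_2 = \overline{\sqrt{Y}} + t_{0.05}\,s_{\overline{\sqrt{Y}}}$:
      Medium A: 1.583
      Medium B: 3.561
Back-transformed (squared) confidence limits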
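The statistics in this table can be recomputed from the two frequency columns. The sketch below is ours and rests on one assumption: that the frequencies belong to Medium A for 0 through 4 flies and to Medium B for 7 through 16 flies, an assignment that reproduces the printed means of 1.933 and 11.133. The function name `mean_and_variance` is illustrative.

```python
import math

# Frequencies read from columns (3) and (4); keys are numbers of flies.
freq_medium_a = {0: 1, 1: 5, 2: 6, 4: 3}
freq_medium_b = {7: 2, 8: 1, 9: 2, 10: 3, 11: 1,
                 12: 1, 13: 1, 14: 1, 15: 1, 16: 2}

def mean_and_variance(freq, transform=lambda y: y):
    """Mean and sample variance of a (possibly transformed) frequency table."""
    n = sum(freq.values())
    total = sum(f * transform(y) for y, f in freq.items())
    total_sq = sum(f * transform(y) ** 2 for y, f in freq.items())
    mean = total / n
    variance = (total_sq - total ** 2 / n) / (n - 1)
    return mean, variance

mean_a, var_a = mean_and_variance(freq_medium_a)
mean_b, var_b = mean_and_variance(freq_medium_b)
sqrt_mean_a, sqrt_var_a = mean_and_variance(freq_medium_a, math.sqrt)
sqrt_mean_b, sqrt_var_b = mean_and_variance(freq_medium_b, math.sqrt)

# Variance ratios (larger over smaller) before and after transformation.
f_untransformed = var_b / var_a
f_transformed = sqrt_var_a / sqrt_var_b

print(f"untransformed: variances {var_a:.3f}, {var_b:.3f}, F_s = {f_untransformed:.3f}")
print(f"square root:   variances {sqrt_var_a:.4f}, {sqrt_var_b:.4f}, F_s = {f_transformed:.3f}")
```

On the original scale the variance rises with the mean, as expected for counts of this kind; after the square root transformation the two variances are nearly equal. Small differences in the last digit from the printed values arise because the table works from square roots rounded to two decimals.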
BOX 10.1
Mann-Whitney U test for two samples, ranked observations, not paired.
A measure of heart function (left ventricle ejection fraction) measured in two
samples of patients admitted to the hospital under suspicion of heart attack. The
patients were classified on the basis of physical examinations during admission
into different so-called Killip classes of ventricular dysfunction. We compare the
left ventricle ejection fraction for patients classified as Killip classes I and III. The
higher Killip class signifies patients with more severe symptoms. The findings were
already graphed in the source publication, and step 1 illustrates that only a graph
of the data is required for the Mann-Whitney U test. Designate the sample size of
the larger sample as $n_1$ and that of the smaller sample as $n_2$. In this case, $n_1 = 29$,
$n_2 = 8$. When the two samples are of equal size it does not matter which is
designated as $n_1$.
1. Graph the two samples as shown below. Indicate the ties by placing dots at the
same level.
[Graph: left ventricle ejection fraction (vertical axis scaled from 0.1 to 0.8)
plotted as dots for Killip classes I and III. Class I: 0.49 ± 0.13, n = 29;
class III: 0.28 ± 0.08, n = 8.]
2. For each observation in one sample (it is convenient to use the smaller sample),
count the number of observations in the other sample which are lower in value
(below it in this graph). Count ½ for each tied observation. For example, there
are 1½ observations in class I below the first observation in class III. The half
is introduced because of the variate in class I tied with the lowest variate in
class III. There are 2½ observations below the tied second and third
observations in class III. There are 3 observations below the fourth and fifth variates
in class III, 4 observations below the sixth variate, and 6 and 7 observations,
respectively, below the seventh and eighth variates in class III. The sum of these
counts C = 29½. The Mann-Whitney statistic $U_s$ is the greater of the two
quantities C and ($n_1 n_2 - C$), in this case 29½ and [(29 × 8) - 29½] = 202½.
Box 10.1
Continued
Testing the significance of $U_s$
No tied variates in samples (or variates tied within samples only). When $n_1 \le 20$,
compare $U_s$ with critical value for $U_{\alpha[n_1,n_2]}$ in Table XI. The null hypothesis is
rejected if the observed value is too large.
In cases where $n_1 > 20$, calculate the following quantity

$$t_s = \frac{U_s - \dfrac{n_1 n_2}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$

which is approximately normally distributed. The denominator 12 is a constant.
Look up the significance of $t_s$ in Table III against critical values of $t_{\alpha[\infty]}$ for a
one-tailed or two-tailed test as required by the hypothesis. In our case this would yield

$$t_s = \frac{202.5 - (29)(8)/2}{\sqrt{\dfrac{(29)(8)(29 + 8 + 1)}{12}}} = \frac{86.5}{\sqrt{734.667}} = 3.191$$

A further complication arises from observations tied between the two groups.
Our example is a case in point. There is no exact test. For sample sizes $n_1 \le 20$,
use Table XI, which will then be conservative. Larger sample sizes require a more
elaborate formula. But it takes a substantial number of ties to affect the outcome
of the test appreciably. Corrections for ties increase the $t_s$ value slightly; hence
the uncorrected formula is more conservative. We may conclude that the two
samples with a $t_s$ value of 3.191 by the uncorrected formula are significantly
different at P < 0.01.
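The counting procedure of step 2 and the large-sample formula can be retraced from the eight counts stated in the box. The sketch below is ours; it takes the counts as given in the text (the individual ejection fractions are not reproduced here) and applies the uncorrected normal approximation.

```python
import math

n1, n2 = 29, 8  # sizes of the larger (class I) and smaller (class III) samples

# Number of class I observations below each class III observation,
# as stated in step 2 (a tie counts one half).
counts_below = [1.5, 2.5, 2.5, 3, 3, 4, 6, 7]

c = sum(counts_below)
# The Mann-Whitney statistic is the greater of C and n1*n2 - C.
u_s = max(c, n1 * n2 - c)

# Normal approximation (no correction for ties), used when n1 > 20.
expected_u = n1 * n2 / 2
sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
t_s = (u_s - expected_u) / sd_u

print(f"C = {c}, U_s = {u_s}, t_s = {t_s:.3f}")
```

The result reproduces the values of C, U_s, and t_s worked out in the box.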
BOX 10.2
Kolmogorov-Smirnov two-sample test, testing differences in distributions of two
samples of continuous observations. (Both $n_1$ and $n_2 \le 25$.)
Two samples of nymphs of the chigger Trombicula lipovskyi. Variate measured is
length of cheliceral base stated as micrometer units. The sample sizes are $n_1 = 16$,
$n_2 = 10$.
Sample A Sample Β
Y Y
104 100
109 105
112 107
114 107
116 108
118 111
118 116
119 120
121 121
123 123
125
126
126
128
128
128
Computational steps
Box 10.2
Continued
3. Compute d, the absolute value of the difference between the relative cumulative
frequencies in columns (4) and (5), and enter in column (6).
4. Locate the largest unsigned difference D. It is 0.475.
5. Multiply D by $n_1 n_2$. We obtain (16)(10)(0.475) = 76.
6. Compare $n_1 n_2 D$ with its critical value in Table XIII, where we obtain a value
of 84 for P = 0.05. We accept the null hypothesis that the two samples have
been taken from populations with the same distribution. The
Kolmogorov-Smirnov test is less powerful than the Mann-Whitney U test shown in Box 10.1
with respect to the alternative hypothesis of the latter, i.e., differences in location.
However, Kolmogorov-Smirnov tests differences in both shape and location
of the distributions and is thus a more comprehensive test.
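The largest difference D can be recomputed from the two samples listed at the head of the box. The following is a minimal sketch with an illustrative helper of our own, `ks_two_sample_d`, which compares the relative cumulative frequencies of the two samples at every observed value; the critical value 84 is the one quoted in step 6.

```python
sample_a = [104, 109, 112, 114, 116, 118, 118, 119,
            121, 123, 125, 126, 126, 128, 128, 128]
sample_b = [100, 105, 107, 107, 108, 111, 116, 120, 121, 123]

def ks_two_sample_d(x, y):
    """Largest absolute difference between two relative cumulative frequencies."""
    largest = 0.0
    for value in sorted(set(x) | set(y)):
        cum_x = sum(1 for v in x if v <= value) / len(x)
        cum_y = sum(1 for v in y if v <= value) / len(y)
        largest = max(largest, abs(cum_x - cum_y))
    return largest

d = ks_two_sample_d(sample_a, sample_b)
n1, n2 = len(sample_a), len(sample_b)
statistic = n1 * n2 * d

critical_value = 84  # Table XIII entry for P = 0.05, as quoted in step 6

print(f"D = {d:.3f}, n1*n2*D = {statistic:.0f}")
print("reject null hypothesis:", statistic >= critical_value)
```

The maximum difference occurs at a length of 111, where 2 of the 16 variates of sample A but 6 of the 10 variates of sample B have been accumulated.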
BOX 10.3
Wilcoxon's signed-ranks test for two groups, arranged as paired observations.
Mean litter size of two strains of guinea pigs, compared over n = 9 years.

         (1)         (2)          (3)    (4)
Year     Strain B    Strain 13    D      Rank (R)

Procedure
1. Compute the differences between the n pairs of observations. These are entered
in column (3), labeled D.
2. Rank these differences from the smallest to the largest without regard to sign.
3. Assign to the ranks the original signs of the differences.
4. Sum the positive and negative ranks separately. The sum that is smaller in
absolute value, $T_s$, is compared with the values in Table XII for n = 9.
Since $T_s = 1$, which is equal to or less than the entry for one-tailed α = 0.005
in the table, our observed difference is significant at the 1% level. Litter size in
strain B is significantly different from that of strain 13.
For large samples (n > 50) compute

$$t_s = \frac{T_s - \dfrac{n(n + 1)}{4}}{\sqrt{\dfrac{n\left(n + \frac{1}{2}\right)(n + 1)}{12}}}$$
is assigned to the corresponding rank. The sum of the positive or of the negative
ranks, whichever one is smaller in absolute value, is then computed (it is labeled
$T_s$) and is compared with the critical value T in Table XII for the
corresponding sample size. In view of the significance of the rank sum, it is clear
that strain B has a litter size different from that of strain 13.
This is a very simple test to carry out, but it is, of course, not as efficient
as the corresponding parametric t test, which should be preferred if the necessary
assumptions hold. Note that one needs minimally six differences in order to
carry out Wilcoxon's signed-ranks test. With only six paired comparisons, all
differences must be of like sign for the test to be significant at the 5% level.
For a large sample an approximation using the normal curve is available,
which is given in Box 10.3. Note that the absolute magnitudes of the differences
play a role only insofar as they affect the ranks of the differences.
A still simpler test is the sign test, in which we count the number of positive
and negative signs among the differences (omitting all differences of zero). We
then test the hypothesis that the $n$ plus and minus signs are sampled from a
population in which the two kinds of signs are present in equal proportions,
as might be expected if there were no true difference between the two paired
samples. Such sampling should follow the binomial distribution, and the test
of the hypothesis that the parametric frequency of the plus signs is $p = 0.5$ can
be made in a number of ways. Let us learn these by applying the sign test to
the guinea pig data of Box 10.3. There are nine differences, of which eight are
positive and one is negative. We could follow the methods of Section 4.2
(illustrated in Table 4.3) in which we calculate the expected probability of
sampling one minus sign in a sample of nine on the assumption of $p = q = 0.5$.
The probability of such an occurrence and all "worse" outcomes equals 0.0195.
Since we have no a priori notions that one strain should have a greater litter
size than the other, this is a two-tailed test, and we double the probability to
0.0390. Clearly, this is an improbable outcome, and we reject the null hypothesis
that $p = q = 0.5$.
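The exact binomial probability quoted above can be checked directly. The sketch below assumes only the counts given in the text (nine nonzero differences, one of them negative); the helper name `binomial_tail` is introduced here for illustration.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """P(X <= k) for X distributed as Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


n_signs = 9   # nonzero differences
n_minus = 1   # count of the rarer sign

# Probability of one minus sign or a "worse" outcome (zero minus signs).
one_tailed = binomial_tail(n_minus, n_signs)
# No a priori direction, so the probability is doubled.
two_tailed = 2 * one_tailed
print(one_tailed, two_tailed)
```

The one-tailed value is $10/512$, which rounds to the 0.0195 of the text; doubling the unrounded value gives a figure marginally above the 0.0390 obtained by doubling the rounded one.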
Since the computation of the exact probabilities may be quite tedious if no
table of cumulative binomial probabilities is at hand, we may take a second
approach, using Table IX, which furnishes confidence limits for $p$ for various
sample sizes and sampling outcomes. Looking up sample size 9 and $Y = 1$
(number showing the property), we find the 95% confidence limits to be 0.0028
and 0.4751 by interpolation, thus excluding the value $p = q = 0.5$ postulated
by the null hypothesis. At least at the 5% significance level we can conclude
that it is unlikely that the number of plus and minus signs is equal. The
confidence limits imply a two-tailed distribution; if we intend a one-tailed test, we
can infer a 0.025 significance level from the 95% confidence limits and a 0.005
level from the 99% limits. Obviously, such a one-tailed test would be carried
out only if the results were in the direction of the alternative hypothesis. Thus,
if the alternative hypothesis were that strain 13 in Box 10.3 had greater litter
size than strain B, we would not bother testing this example at all, since the
observed proportion of years showing this relation is less than half. For larger
samples, we can use the normal approximation to the binomial distribution as
follows: $t_s = (Y - \mu)/\sigma_Y = (Y - kp)/\sqrt{kpq}$, where we substitute the mean and
standard deviation of the binomial distribution learned in Section 4.2. In
our case, we let $n$ stand for $k$ and assume that $p = q = 0.5$. Therefore,
$t_s = (Y - \frac{1}{2}n)/\sqrt{\frac{1}{4}n} = (Y - \frac{1}{2}n)/\frac{1}{2}\sqrt{n}$. The value of $t_s$ is then compared with
$t_{\alpha[\infty]}$ in Table III, using one tail or two tails of the distribution as warranted.
When the sample size $n > 12$, this is a satisfactory approximation.
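A short sketch of the normal approximation just described follows. The counts used (15 plus signs among 20 differences) are hypothetical, chosen only so that $n > 12$; the two-tailed probability is taken from the standard normal curve through the complementary error function rather than from Table III.

```python
from math import erfc, sqrt


def sign_test_normal(y, n):
    """Normal approximation to the sign test with p = q = 0.5.

    y is the number of differences showing one sign, n the number of
    nonzero differences. Returns (t_s, two-tailed probability).
    """
    t_s = (y - n / 2) / (0.5 * sqrt(n))
    # For a standard normal deviate z, 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_tailed = erfc(abs(t_s) / sqrt(2))
    return t_s, p_two_tailed


# Hypothetical example: 15 plus signs among 20 nonzero differences.
ts, p_two = sign_test_normal(15, 20)
print(ts, p_two)
```

For a one-tailed test in the direction of the alternative hypothesis, the returned probability would be halved.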
A third approach we can use is to test the departure from the expectation
that $p = q = 0.5$ by one of the methods of Chapter 13.
Exercises
10.1 Allee and Bowen (1932) studied survival time of goldfish (in minutes) when placed
in colloidal silver suspensions. Experiment no. 9 involved 5 replications, and
experiment no. 10 involved 10 replicates. Do the results of the two experiments
differ? Addition of urea, NaCl, and Na2S to a third series of suspensions
apparently prolonged the life of the fish.
Colloidal silver
Urea and
Experiment no. 9 Experiment no. 10 salts added
Analyze and interpret. Test equality of variances. Compare anova results with
those obtained using the Mann-Whitney U test for the two comparisons under
study. To test the effect of urea it might be best to pool Experiments 9 and 10,
if they prove not to differ significantly. ANS. Test for homogeneity of
Experiments 9 and 10, $U_s = 33$, ns. For the comparison of Experiments 9 and 10 versus
urea and salts, $U_s = 136$, $P < 0.001$.
10.2 In a study of flower color in butterflyweed (Asclepias tuberosa), Woodson (1964)
obtained the following results:
Geographic
region        $\bar{Y}$        $n$        $s$
The variable recorded was a color score (ranging from 1 for pure yellow to 40
for deep orange-red) obtained by matching flower petals to sample colors in
Maerz and Paul's Dictionary of Color. Test whether the samples are
homoscedastic.
10.3 Test for a difference in surface and subsoil pH in the data of Exercise 9.1, using
Wilcoxon's signed-ranks test. ANS. $T_s = 38$; $P > 0.10$.
10.4 Number of bacteria in 1 cc of milk from three cows counted at three periods
(data from Park, Williams, and Krumwiede, 1924):
(a) Calculate means and variances for the three periods and examine the relation
between these two statistics. Transform the variates to logarithms and com-
pare means and variances based on the transformed data. Discuss.
(b) Carry out an anova on transformed and untransformed data. Discuss your
results.
10.5 Analyze the measurements of the two samples of chigger nymphs in Box 10.2
by the Mann-Whitney U test. Compare the results with those shown in Box 10.2
for the Kolmogorov-Smirnov test. ANS. $U_s = 123.5$, $P < 0.05$.
10.6 Allee et al. (1934) studied the rate of growth of Ameiurus melas in conditioned
and unconditioned well water and obtained the following results for the gain in
average length of a sample fish. Although the original variates are not available,
we may still test for differences between the two treatment classes. Use the sign
test to test for differences in the paired replicates.
Conditioned Unconditioned
Replicate water water
1 2.20 1.06
2 1.05 0.06
3 3.25 3.55
4 2.60 1.00
5 1.90 1.10
6 1.50 0.60
7 2.25 1.30
8 1.00 0.90
9 -0.09 -0.59
10 0.83 0.58
CHAPTER 11
Regression
FIGURE 11.1
Blood pressure of an animal in mmHg as a function of drug concentration in μg per cc of blood.
General form of the line: $Y = a + bX$. Lines shown: $Y = 20 + 15X$ (Drug A on animal P);
$Y = 40 + 7.5X$ (Drug B on animal Q); $Y = 20 + 7.5X$ (Drug B on animal P).
Abscissa: micrograms of drug/cc blood.
FIGURE 11.2
Blood pressure of an animal in mmHg as a function of drug concentration in μg per cc of
blood. Repeated sampling for a given drug concentration.
Abscissa: micrograms of drug/cc blood.
FIGURE 11.3
Weight loss (in mg) of nine batches of 25 Tribolium beetles after six days of
starvation at nine different relative humidities. Data from Table 11.1, after Nelson (1964).
Abscissa: % relative humidity.
$$\hat{Y} = a + bX \tag{11.1}$$
FIGURE 11.4
Deviations from the mean (of $Y$) for the data of Figure 11.3.
which indicates that for given values of $X$, this equation calculates estimated
values $\hat{Y}$ (as distinct from the observed values $Y$ in any actual case). The
deviation of an observation $Y_i$ from the regression line is $(Y_i - \hat{Y}_i)$ and is generally
symbolized as $d_{Y \cdot X}$. These deviations can still be drawn parallel to the $Y$ axis,
but they meet the sloped regression line at an angle (see Figure 11.5). The sum
of these deviations is again zero ($\sum d_{Y \cdot X} = 0$), and the sum of their squares yields
a quantity $\sum (Y - \hat{Y})^2 = \sum d^2_{Y \cdot X}$ analogous to the sum of squares $\sum y^2$. For
reasons that will become clear later, $\sum d^2_{Y \cdot X}$ is called the unexplained sum of squares.
The least squares linear regression line through a set of points is defined as that
straight line which results in the smallest value of $\sum d^2_{Y \cdot X}$. Geometrically, the
basic idea is that one would prefer using a line that is in some sense close to
as many points as possible. For purposes of ordinary Model I regression
analysis, it is most useful to define closeness in terms of the vertical distances from
the points to a line, and to use the line that makes the sum of the squares
of these deviations as small as possible. A convenient consequence of this
criterion is that the line must pass through the point $\bar{X}, \bar{Y}$. Again, it would be
possible but impractical to calculate the correct regression slope by pivoting
a ruler around the point $\bar{X}, \bar{Y}$ and calculating the unexplained sum of squares
$\sum d^2_{Y \cdot X}$ for each of the innumerable possible positions. Whichever position gave
the smallest value of $\sum d^2_{Y \cdot X}$ would be the least squares regression line.
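The "pivoting ruler" idea can be imitated numerically. The sketch below borrows the first four observations of Exercise 11.1 at the end of this chapter as convenient data; the grid of candidate slopes and the names `unexplained_ss` and `candidate_slopes` are choices made here for illustration only.

```python
# First four (time, temperature) pairs of Exercise 11.1.
X = [24, 32, 48, 56]
Y = [102.8, 104.5, 106.5, 107.0]
n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n


def unexplained_ss(slope):
    """Sum of squared vertical deviations from a line through (x_bar, y_bar)."""
    return sum((y - (y_bar + slope * (x - x_bar))) ** 2 for x, y in zip(X, Y))


# "Pivot the ruler": try slopes 0.000, 0.001, ..., 0.300 and keep the best.
candidate_slopes = [i / 1000 for i in range(301)]
best_slope = min(candidate_slopes, key=unexplained_ss)

# Closed-form slope for comparison: sum of products over sum of squares of X.
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sum_x2 = sum((x - x_bar) ** 2 for x in X)
b = sum_xy / sum_x2
print(best_slope, b)
```

The scan settles on the same slope as the ratio of the sum of products to the sum of squares of $X$, to within the spacing of the grid.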
The formula for the slope of a line based on the minimum value of $\sum d^2_{Y \cdot X}$
is obtained by means of the calculus. It is
$$b = \frac{\sum xy}{\sum x^2} \tag{11.2}$$
Let us calculate $b = \sum xy / \sum x^2$ for our weight loss data.
FIGURE 11.5
Deviations from the regression line for the data of Figure 11.3.
Abscissa: relative humidity.

We first compute the deviations from the respective means of $X$ and $Y$,
as shown in columns (3) and (4) of Table 11.1. The sums of these deviations,
$\sum x$ and $\sum y$, are slightly different from their expected value of zero because of
rounding errors. The squares of these deviations yield sums of squares and
variances in columns (5) and (7). In column (6) we have computed the products
$xy$, which in this example are all negative because the deviations are of unlike
sign. An increase in humidity results in a decrease in weight loss. The sum of
these products $\sum xy$ is a new quantity, called the sum of products. This is a poor
but well-established term, referring to $\sum xy$, the sum of the products of the
deviations rather than $\sum XY$, the sum of the products of the variates. You will
recall that $\sum y^2$ is called the sum of squares, while $\sum Y^2$ is the sum of the squared
variates. The sum of products is analogous to the sum of squares. When divided
by the degrees of freedom, it yields the covariance, by analogy with the
variance resulting from a similar division of the sum of squares. You may recall first
having encountered covariances in Section 7.4. Note that the sum of products
can be negative as well as positive. If it is negative, this indicates a negative
slope of the regression line: as $X$ increases, $Y$ decreases. In this respect it differs
from a sum of squares, which can only be positive. From Table 11.1 we find that
$\sum xy = -441.8176$, $\sum x^2 = 8301.3889$, and $b = \sum xy / \sum x^2 = -0.053{,}22$. Thus,
for a one-unit increase in $X$, there is a decrease of 0.053,22 units of $Y$. Relating
it to our actual example, we can say that for a 1% increase in relative humidity,
there is a reduction of 0.053,22 mg in weight loss.
You may wish to convince yourself that the formula for the regression
coefficient is intuitively reasonable. It is the ratio of the sum of products of
deviations for $X$ and $Y$ to the sum of squares of deviations for $X$. If we look
at the product for $X_i$, a single value of $X$, we obtain $x_i y_i$. Similarly, the squared
deviation for $X_i$ would be $x_i^2$, or $x_i x_i$. Thus the ratio $x_i y_i / x_i x_i$ reduces to $y_i / x_i$.
Although $\sum xy / \sum x^2$ only approximates the average of $y_i / x_i$ for the $n$ values of
$X_i$, the latter ratio indicates the direction and magnitude of the change in $Y$
for a unit change in $X$. Thus, if $y_i$ on the average equals $x_i$, $b$ will equal 1. When
$y_i = -x_i$, $b = -1$. Also, when $|y_i| > |x_i|$, $b > |1|$; and conversely, when $|y_i| <
|x_i|$, $b < |1|$.
How can we complete the equation $\hat{Y} = a + bX$? We have stated that the
regression line will go through the point $\bar{X}, \bar{Y}$. At $\bar{X} = 50.39$, $\hat{Y} = 6.022$; that is,
we use $\bar{Y}$, the observed mean of $Y$, as an estimate $\hat{Y}$ of the mean. We can
substitute these means into Expression (11.1):
$$\hat{Y} = a + bX$$
$$\bar{Y} = a + b\bar{X}$$
$$a = \bar{Y} - b\bar{X}$$
$$a = 6.022 - (-0.053{,}22)50.39 = 8.7038$$
Therefore,
$$\hat{Y} = 8.7038 - 0.053{,}22X$$
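The arithmetic leading to this equation can be retraced as a brief sketch. The four input values are those quoted in the text; the function name `predict` is introduced here only for illustration.

```python
# Quantities quoted in the text for the weight loss data.
sum_xy = -441.8176    # sum of products
sum_x2 = 8301.3889    # sum of squares of X
x_bar = 50.39         # mean relative humidity (%)
y_bar = 6.022         # mean weight loss (mg)

b = sum_xy / sum_x2          # regression coefficient (slope)
a = y_bar - b * x_bar        # Y intercept


def predict(x):
    """Estimated weight loss (mg) at relative humidity x (%)."""
    return a + b * x


print(b, a, predict(100))
```

Because the line passes through $\bar{X}, \bar{Y}$, `predict(x_bar)` returns `y_bar`; small differences from the printed values in the last decimal place reflect rounding of $b$ in the text.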
$$\hat{Y} = (\bar{Y} - b\bar{X}) + bX = \bar{Y} + b(X - \bar{X})$$
$$\hat{Y} - \bar{Y} = bx$$
$$\hat{y} = bx \tag{11.3}$$
FIGURE 11.6
Linear regression fitted to data of Figure 11.3.
Abscissa: % relative humidity.
$$b_{Y \cdot X} = \frac{\sum^{n} (X - \bar{X})(Y - \bar{Y})}{\sum^{n} (X - \bar{X})^2} \tag{11.4}$$
BOX 11.1
Relative humidity ($X$): 0   12.0   29.5   43.0   53.0   62.5   75.5   85.0   93.0
Basic computations
1. Compute sample size, sums, sums of the squared observations, and the sum of
the $XY$'s.
$n = 9$    $\sum X = 453.5$    $\sum Y = 54.20$
$\sum X^2 = 31{,}152.75$    $\sum Y^2 = 350.5350$    $\sum XY = 2289.260$
2. The means, sums of squares, and sum of products are
$\bar{X} = 50.389$    $\bar{Y} = 6.022$
$\sum x^2 = 8301.3889$    $\sum y^2 = 24.1306$
$\sum xy = \sum XY - \dfrac{(\sum X)(\sum Y)}{n} = 2289.260 - \dfrac{(453.5)(54.20)}{9} = -441.8178$
6. The unexplained sum of squares is
$\sum d^2_{Y \cdot X} = \sum y^2 - \sum \hat{y}^2 = 24.1306 - 23.5145 = 0.6161$
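The quantities of this box can be regenerated from the sums in step 1 alone. The following sketch does so and carries the computation through to the variance ratio of the regression anova; variable names such as `ss_x` and `sp_xy` are introduced here, and the intercept it yields differs from 8.7038 in the fourth decimal place only because unrounded means are used.

```python
# Sums quoted in step 1 of Box 11.1.
n = 9
sum_X, sum_Y = 453.5, 54.20
sum_X2, sum_Y2, sum_XY = 31152.75, 350.5350, 2289.260

# Step 2: corrected sums of squares and sum of products.
ss_x = sum_X2 - sum_X**2 / n          # sum of squares of X
ss_y = sum_Y2 - sum_Y**2 / n          # sum of squares of Y
sp_xy = sum_XY - sum_X * sum_Y / n    # sum of products

b = sp_xy / ss_x                      # regression coefficient
a = sum_Y / n - b * sum_X / n         # Y intercept
explained = sp_xy**2 / ss_x           # explained sum of squares
unexplained = ss_y - explained        # unexplained sum of squares
ms_unexplained = unexplained / (n - 2)
F_s = explained / ms_unexplained      # explained MS has 1 degree of freedom

print(ss_x, ss_y, sp_xy)
print(b, a, explained, unexplained, ms_unexplained, F_s)
```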
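$$\sum \hat{y}^2 = \sum b^2 x^2 = b^2 \sum x^2 = \frac{(\sum xy)^2}{(\sum x^2)^2} \sum x^2 = \frac{(\sum xy)^2}{\sum x^2}$$
$$\sum d^2_{Y \cdot X} = \sum y^2 - \frac{(\sum xy)^2}{\sum x^2}$$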
BOX 11.2
The variates $Y$ are arcsine transformations of the percentage survival of the
beetle Tribolium castaneum at 4 densities ($X$ = number of eggs per gram of flour
medium).
Density = $X$
($a = 4$)
$\sum n_i = 15$
$\sum \sum Y = 907.81$
Source: Data by Sokal (1967).
Anova table
Source of variation    df    SS    MS    $F_s$
We proceed to test whether the differences among the survival values can be
accounted for by linear regression on density. If $F_s < [1/(a - 1)] F_{\alpha[1, \sum n_i - a]}$, it
is impossible for regression to be significant.
BOX 11.2
Continued
2. Sum of $X^2$ weighted by sample size $= \sum^{a} n_i X^2$
$= 5(5)^2 + 4(20)^2 + 3(50)^2 + 3(100)^2$
$= 39{,}225$
$= 30{,}841.35$
7. Explained sum of squares $= \sum \hat{y}^2 = \dfrac{(\sum xy)^2}{\sum x^2} = \dfrac{(\text{quantity 6})^2}{\text{quantity 5}} = \dfrac{(-2747.62)^2}{18{,}690} = 403.9281$
8. Unexplained sum of squares $= \sum d^2_{Y \cdot X} = SS_{\text{groups}} - \sum \hat{y}^2$
$= SS_{\text{groups}} - \text{quantity 7}$
$= 423.7016 - 403.9281 = 19.7735$
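BOX 11.2
Continued
Completed anova table with regression
Source of variation    df    SS    MS    $F_s$
9. $b_{Y \cdot X} = \dfrac{\sum xy}{\sum x^2} = \dfrac{\text{quantity 6}}{\text{quantity 5}} = \dfrac{-2747.62}{18{,}690} = -0.147{,}01$
10. $Y$ intercept $= a = \bar{Y} - b_{Y \cdot X}\bar{X}$
$= \dfrac{\sum \sum Y}{\sum n_i} - \dfrac{\text{quantity 9} \times \text{quantity 1}}{\sum n_i}$
$= \dfrac{907.81}{15} - \dfrac{(-0.147{,}01)555}{15} = 60.5207 + 5.4394 = 65.9601$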
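The surviving figures of Box 11.2 are enough to retrace its regression computations. In the sketch below the densities and sample sizes are read off the weighted sum in step 2; treating 30,841.35 as the sum of the products of each density with its group total of $Y$ is an assumption made here, supported only by the fact that it reproduces the printed slope, explained sum of squares, and intercept.

```python
# Densities and sample sizes read from step 2 of Box 11.2.
X = [5, 20, 50, 100]      # eggs per gram of flour medium
n_i = [5, 4, 3, 3]        # replicates at each density
sum_Y = 907.81            # grand total of Y
sum_XY = 30841.35         # ASSUMED: sum over groups of X times group total of Y
ss_groups = 423.7016      # sum of squares among groups (from the box)

N = sum(n_i)
sum_nX = sum(n * x for n, x in zip(n_i, X))        # quantity 1 in the box
sum_nX2 = sum(n * x**2 for n, x in zip(n_i, X))    # weighted sum of X squared

sum_x2 = sum_nX2 - sum_nX**2 / N                   # sum of squares of X
sum_xy = sum_XY - sum_nX * sum_Y / N               # sum of products
explained = sum_xy**2 / sum_x2                     # explained sum of squares
unexplained = ss_groups - explained                # deviations from regression
b = sum_xy / sum_x2                                # regression coefficient
a = sum_Y / N - b * sum_nX / N                     # Y intercept

print(sum_nX, sum_nX2, sum_x2, sum_xy)
print(explained, unexplained, b, a)
```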
FIGURE 11.7
Differences among means and linear regression. General trends only are indicated by these figures.
Significance of any of these would depend on the outcomes of appropriate tests.
$$Y_{ij} = \mu + \beta x_i + D_i + \epsilon_{ij}$$
FIGURE 11.8
Abscissa: density (number of eggs/g of medium).
11.5 Tests of significance in regression
Transposed, this yields
$$\sum y^2 = \frac{(\sum xy)^2}{\sum x^2} + \sum d^2_{Y \cdot X}$$
Of course, $\sum y^2$ corresponds to $y$, $\sum d^2_{Y \cdot X}$ to $d_{Y \cdot X}$, and
$\dfrac{(\sum xy)^2}{\sum x^2}$ corresponds to $\hat{y}$.
FIGURE 11.9
Schematic diagram to show relations involved in partitioning the sum of squares of
the dependent variable.
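Source of variation                                                      df         SS                                                               MS                      Expected MS
$\hat{Y} - \bar{Y}$    Explained (estimated $Y$ from mean of $Y$)        $1$        $\sum \hat{y}^2 = \dfrac{(\sum xy)^2}{\sum x^2}$                 $s^2_{\hat{Y}}$         $\sigma^2_{Y \cdot X} + \beta^2 \sum x^2$
$Y - \hat{Y}$    Unexplained, error (observed $Y$ from estimated $Y$)    $n - 2$    $\sum d^2_{Y \cdot X} = \sum y^2 - \sum \hat{y}^2$               $s^2_{Y \cdot X}$       $\sigma^2_{Y \cdot X}$
$Y - \bar{Y}$    Total (observed $Y$ from mean of $Y$)                   $n - 1$    $\sum y^2 = \sum Y^2 - \dfrac{(\sum Y)^2}{n}$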
Source of variation                                df    SS         MS         $F_s$
Explained, due to linear regression                1     23.5145    23.5145    267.18**
Unexplained, error around regression line          7     0.6161     0.08801
Total                                              8     24.1306
BOX 11.4
Significance tests and computation of confidence limits of regression statistics. Single
value of $Y$ for each value of $X$.
Based on standard errors and degrees of freedom of Box 11.3; using example of
Box 11.1.
$n = 9$    $\bar{X} = 50.389$    $\bar{Y} = 6.022$
$b_{Y \cdot X} = -0.053{,}22$    $\sum x^2 = 8301.3889$
$s^2_{Y \cdot X} = \dfrac{\sum d^2_{Y \cdot X}}{(n - 2)} = \dfrac{0.6161}{7} = 0.088{,}01$
$s_{\bar{Y}} = \sqrt{\dfrac{s^2_{Y \cdot X}}{n}} = \sqrt{\dfrac{0.088{,}01}{9}} = 0.098{,}888{,}3$
BOX 11.4
Continued
$= \sqrt{0.088{,}01(0.407{,}60)} = \sqrt{0.035{,}873} = 0.189{,}40$
7. 95% confidence limits for $\mu_{Y_i}$ corresponding to the estimate $\hat{Y}_i = 3.3817$ at
$X_i = 100\%$ relative humidity:
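The limits called for in step 7 can be sketched as follows. Two assumptions are made here: the standard error of an estimated $\hat{Y}$ is taken to be $\sqrt{s^2_{Y \cdot X}[1/n + (X_i - \bar{X})^2 / \sum x^2]}$, which reproduces the 0.189,40 printed above, and the critical value $t_{0.05[7]} = 2.365$ is taken from a standard two-tailed $t$ table rather than from this text.

```python
from math import sqrt

# Values from Boxes 11.1 and 11.4.
n = 9
x_bar = 50.389
sum_x2 = 8301.3889
s2_yx = 0.08801          # unexplained mean square
y_hat = 3.3817           # estimate at 100% relative humidity
x_i = 100.0

# ASSUMPTION: two-tailed 5% critical value of t for n - 2 = 7 degrees of
# freedom, taken from a standard t table.
t_crit = 2.365

# Standard error of the estimated Y at X_i (assumed formula, see lead-in).
s_y_hat = sqrt(s2_yx * (1 / n + (x_i - x_bar) ** 2 / sum_x2))

lower = y_hat - t_crit * s_y_hat
upper = y_hat + t_crit * s_y_hat
print(s_y_hat, lower, upper)
```

The interval widens as $X_i$ moves away from $\bar{X}$, because the term $(X_i - \bar{X})^2 / \sum x^2$ grows with that distance.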
FIGURE 11.10
95% confidence limits to regression line of Figure 11.6.
Abscissa: % relative humidity.

FIGURE 11.11
95% confidence limits to regression estimates for data of Figure 11.6.
Abscissa: % relative humidity.
it may be as important to compare regression coefficients as it is to compare
these other statistics.
The test for the difference between two regression coefficients can be carried
out as an $F$ test. We compute
$$F_s = \frac{(b_1 - b_2)^2}{\dfrac{\sum x_1^2 + \sum x_2^2}{(\sum x_1^2)(\sum x_2^2)}\, \bar{s}^2_{Y \cdot X}}$$
where $\bar{s}^2_{Y \cdot X}$ is the weighted average $s^2_{Y \cdot X}$ of the two groups. Its formula is
160
120
· = 100
ο
80
Ο
50
40
-J Λ'
10 20 0 10 15 20
FIGURE 1 1 . 1 2
L o g a r i t h m i c t r a n s f o r m a t i o n of a d e p e n d e n t v a r i a b l e in r e g r e s s i o n . C h i r p - r a t e a s a f u n c t i o n of t e m -
p e r a t u r e in m a l e s of t h e t r e e c r i c k e t Oecanthus fultoni. Each point represents the m e a n chirp rate/min
f o r all o b s e r v a t i o n s a t a g i v e n t e m p e r a t u r e in " C . O r i g i n a l d a t a in left p a n e l , Y p l o t t e d o n l o g a r i t h m i c
s c a l e in r i g h t p a n e l . ( D a t a f r o m B l o c k , 1966.)
FIGURE 11.13
Logarithmic transformation of the independent variable in regression. This illustrates size of
electrical response to illumination in the cephalopod eye. Ordinate, millivolts; abscissa, relative
brightness of illumination. A proportional increase in $X$ (relative brightness) produces a linear electrical
response $Y$. (Data in Fröhlich, 1921.)
FIGURE 11.14
Abscissa: dose (left panel); dose in log scale (right panel).
Exercises
11.1 The following temperatures ($Y$) were recorded in a rabbit at various times ($X$)
after it was inoculated with rinderpest virus (data from Carter and Mitchell, 1958).
Time after
injection     Temperature
(h)           (°F)
24 102.8
32 104.5
48 106.5
56 107.0
72 103.9
80 103.2
96 103.1
Graph the data. Clearly, the last three data points represent a different
phenomenon from the first four pairs. For the first four points: (a) Calculate $b$. (b)
Calculate the regression equation and draw in the regression line. (c) Test the
hypothesis that $\beta = 0$ and set 95% confidence limits. (d) Set 95% confidence
limits to your estimate of the rabbit's temperature 50 hours after the injection.
ANS. $a = 100$, $b = 0.1300$, $F_s = 59.4288$, $P < 0.05$, $\hat{Y}_{50} = 106.5$.
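The answers quoted for the first four points can be checked with a few lines of arithmetic. This is a sketch of parts (a), (b), and the variance ratio of part (c) only; the confidence limits are left to the reader, and all variable names are introduced here.

```python
# First four (time, temperature) pairs of Exercise 11.1.
times = [24, 32, 48, 56]
temps = [102.8, 104.5, 106.5, 107.0]
n = len(times)

x_bar = sum(times) / n
y_bar = sum(temps) / n
ss_x = sum((x - x_bar) ** 2 for x in times)
ss_y = sum((y - y_bar) ** 2 for y in temps)
sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(times, temps))

b = sp_xy / ss_x                       # (a) regression coefficient
a = y_bar - b * x_bar                  # (b) Y intercept
explained = sp_xy**2 / ss_x
unexplained = ss_y - explained
F_s = explained / (unexplained / (n - 2))   # (c) 1 and n - 2 degrees of freedom
y_hat_50 = a + b * 50                  # estimate at 50 hours

print(b, a, F_s, y_hat_50)
```

The variance ratio agrees with the printed answer to two decimal places; the remaining difference is presumably due to rounding in hand computation.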
11.2 The following table is extracted from data by Sokoloff (1955). Adult weights
of female Drosophila persimilis reared at 24°C are affected by their density as
larvae. Carry out an anova among densities. Then calculate the regression of
weight on density and partition the sums of squares among groups into that
explained and unexplained by linear regression. Graph the data with the
regression line fitted to the means. Interpret your results.
              Mean weight
Larval        of adults       $s$ of weights
density       (in mg)         (not $s_{\bar{Y}}$)     $n$
1 1.356 0.180 9
3 1.356 0.133 34
5 1.284 0.130 50
6 1.252 0.105 63
10 0.989 0.130 83
20 0.664 0.141 144
40 0.475 0.083 24
11.3 Davis (1955) reported the following results in a study of the amount of energy
metabolized by the English sparrow, Passer domesticus, under various constant
temperature conditions and a ten-hour photoperiod. Analyze and interpret.
ANS. $MS_{\hat{Y}} = 657.5043$, $MS_{Y \cdot X} = 8.2186$, $MS_{\text{within}} = 3.9330$, deviations are not
0 24.9 6 1.77
4 23.4 4 1.99
10 24.2 4 2.07
18 18.7 5 1.43
26 15.2 7 1.52
34 13.7 7 2.70
11.4 Using the complete data given in Exercise 11.1, calculate the regression
equation and compare it with the one you obtained for the first four points. Discuss
the effect of the inclusion of the last three points in the analysis. Compute the
residuals from regression.
11.5 The following results were obtained in a study of oxygen consumption (micro-
liters/mg dry weight per hour) in Heliothis zea by Phillips and Newsom (1966)
under controlled temperatures and photoperiods.
Temperature      Photoperiod (h)
(°C)             10        14
18               0.51      1.61
21               0.53      1.64
24               0.89      1.73
Compute regression for each photoperiod separately and test for homogeneity
of slopes. ANS. For 10 hours: $b = 0.0633$, $s^2_{Y \cdot X} = 0.019{,}267$. For 14 hours: $b =
0.020{,}00$, $s^2_{Y \cdot X} = 0.000{,}60$.
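The slopes and residual variances quoted in this answer can be reproduced from the table, and the same quantities feed the $F$ test for the difference between two regression coefficients. In the sketch below the pooled residual variance is assumed to be the two unexplained sums of squares added together and divided by $n_1 + n_2 - 4$; that formula is a common choice supplied here, not one taken from the text, and the helper `fit` is a name introduced for illustration.

```python
def fit(X, Y):
    """Return slope, sum of squares of X, unexplained SS, and sample size."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    ss_x = sum((x - x_bar) ** 2 for x in X)
    ss_y = sum((y - y_bar) ** 2 for y in Y)
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
    return sp_xy / ss_x, ss_x, ss_y - sp_xy**2 / ss_x, n


temperatures = [18, 21, 24]            # degrees C
oxygen_10h = [0.51, 0.53, 0.89]        # 10-hour photoperiod
oxygen_14h = [1.61, 1.64, 1.73]        # 14-hour photoperiod

b1, ss_x1, d1, n1 = fit(temperatures, oxygen_10h)
b2, ss_x2, d2, n2 = fit(temperatures, oxygen_14h)

s2_1 = d1 / (n1 - 2)                   # residual variance, 10 hours
s2_2 = d2 / (n2 - 2)                   # residual variance, 14 hours

# ASSUMPTION: weighted average residual variance of the two groups.
s2_pooled = (d1 + d2) / (n1 + n2 - 4)

# F test for the difference between the two regression coefficients.
F_s = (b1 - b2) ** 2 / ((ss_x1 + ss_x2) / (ss_x1 * ss_x2) * s2_pooled)

print(b1, s2_1, b2, s2_2, F_s)
```

With only three temperatures per photoperiod the test has 1 and 2 degrees of freedom, so the example is useful mainly as a check on the arithmetic.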
11.6 Length of developmental period (in days) of the potato leafhopper, Empoasca
fabae, from egg to adult at various constant temperatures (Kouskolekas and
Decker, 1966). The original data were weighted means, but for purposes of this
analysis we shall consider them as though they were single observed values.
                 Mean length of
                 developmental
Temperature      period in days
(°F)             $Y$
59.8 58.1
67.6 27.3
70.0 26.8
70.4 26.3
74.0 19.1
75.3 19.0
78.0 16.5
80.4 15.9
81.4 14.8
83.2 14.2
88.4 14.4
91.4 14.6
Temperature      Calories
(°C)             $\bar{Y}$      $n$      $s$
0 24.3 6 1.93
10 25.1 7 1.98
18 22.2 8 3.67
26 13.8 10 4.01
34 16.4 6 2.92
Test for the equality of slopes of the regression lines for the 10-hour and 15-hour
photoperiod. ANS. Fs = 0.003.
11.8 Carry out a nonparametric test for regression in Exercises 11.1 and 11.6.
11.9 Water temperature was recorded at various depths in Rot Lake on August 1, 1952,
by Vollenweider and Frei (1953).
Plot the data and then compute the regression line. Compute the deviations
from regression. Does temperature vary as a linear function of depth? What do
the residuals suggest? ANS. $a = 23.384$, $b = -1.435$, $F_s = 45.2398$, $P < 0.01$.
CHAPTER 12
Correlation
In correlation, by contrast, we are concerned largely with whether two
variables are interdependent, or covary, that is, vary together. We do not express
one as a function of the other. There is no distinction between independent
and dependent variables. It may well be that of the pair of variables whose
correlation is studied, one is the cause of the other, but we neither know nor
assume this. A more typical (but not essential) assumption is that the two
variables are both effects of a common cause. What we wish to estimate is the degree
to which these variables vary together. Thus we might be interested in the
correlation between amount of fat in diet and incidence of heart attacks in human
populations, between foreleg length and hind leg length in a population of
mammals, between body weight and egg production in female blowflies, or between
age and number of seeds in a weed. Reasons why we would wish to
demonstrate and measure association between pairs of variables need not concern us
yet. We shall take this up in Section 12.4. It suffices for now to state that when
we wish to establish the degree of association between pairs of variables in a
population sample, correlation analysis is the proper approach.
Thus a correlation coefficient computed from data that have been properly
analyzed by Model I regression is meaningless as an estimate of any
population correlation coefficient. Conversely, suppose we were to evaluate a
regression coefficient of one variable on another in data that had been properly
computed as correlations. Not only would construction of such a functional
dependence for these variables not meet our intentions, but we should point
out that a conventional regression coefficient computed from data in which
both variables are measured with error, as is the case in correlation analysis,
furnishes biased estimates of the functional relation.
Even if we attempt the correct method in line with our purposes we may
run afoul of the nature of the data. Thus we may wish to establish cholesterol
content of blood as a function of weight, and to do so we may take a random
sample of men of the same age group, obtain each individual's cholesterol
content and weight, and regress the former on the latter. However, both these
variables will have been measured with error. Individual variates of the
supposedly independent variable $X$ will not have been deliberately chosen or
controlled by the experimenter. The underlying assumptions of Model I regression
do not hold, and fitting a Model I regression to the data is not legitimate,
although you will have no difficulty finding instances of such improper
practices in the published research literature. If it is really an equation describing
the dependence of $Y$ on $X$ that we are after, we should carry out a Model II
regression. However, if it is the degree of association between the variables
(interdependence) that is of interest, then we should carry out a correlation
analysis, for which these data are suitable. The converse difficulty is trying to
obtain a correlation coefficient from data that are properly computed as a
regression, that is, are computed when $X$ is fixed. An example would be
heartbeats of a poikilotherm as a function of temperature, where several temperatures
have been applied in an experiment. Such a correlation coefficient is easily
obtained mathematically but would simply be a numerical value, not an estimate
TABLE 12.1
The relations between correlation and regression. This table indicates the correct computation for
any combination of purposes and variables, as shown.
of a parametric measure of correlation. There is an interpretation that can be
given to the square of the correlation coefficient that has some relevance to a
regression problem. However, it is not in any way an estimate of a parametric
correlation.
This discussion is summarized in Table 12.1, which shows the relations
between correlation and regression. The two columns of the table indicate the
two conditions of the pair of variables: in one case one random and measured
with error, the other variable fixed; in the other case, both variables random.
In this text we depart from the usual convention of labeling the pair of
variables $Y$ and $X$ or $X_1$, $X_2$ for both correlation and regression analysis. In
regression we continue the use of $Y$ for the dependent variable and $X$ for the
independent variable, but in correlation both of the variables are in fact random
variables, which we have throughout the text designated as $Y$. We therefore
refer to the two variables as $Y_1$ and $Y_2$. The rows of the table indicate the
intention of the investigator in carrying out the analysis, and the four
quadrants of the table indicate the appropriate procedures for a given combination
of intention of investigator and nature of the pair of variables.
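$$r_{12} = \frac{\sum y_1 y_2}{(n-1)\,s_1 s_2} \tag{12.2}$$

$$\sqrt{s_1^2 s_2^2}\,(n-1) = \sqrt{s_1^2 (n-1)\, s_2^2 (n-1)} = \sqrt{\sum y_1^2 \sum y_2^2}$$

$$r_{12} = \frac{\sum y_1 y_2}{\sqrt{\sum y_1^2 \sum y_2^2}} \tag{12.3}$$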
To state Expression (12.2) more generally for variables $Y_j$ and $Y_k$, we can write
it as

$$r_{jk} = \frac{\sum y_j y_k}{(n-1)\,s_j s_k} \tag{12.4}$$

The correlation coefficient $r_{jk}$ can range from +1 for perfect association
to −1 for perfect negative association. This is intuitively obvious when we
consider the correlation of a variable $Y_j$ with itself. Expression (12.4) would then
yield $r_{jj} = \sum y_j y_j / \sqrt{\sum y_j^2 \sum y_j^2} = \sum y_j^2 / \sum y_j^2 = 1$, which yields a perfect
correlation of +1. If deviations in one variable were paired with opposite but equal
FIGURE 12.1
Bivariate normal frequency distribution. The parametric correlation ρ between variables $Y_1$ and
$Y_2$ equals zero. The frequency distribution may be visualized as a bell-shaped mound.

FIGURE 12.2
Bivariate normal frequency distribution. The parametric correlation ρ between variables $Y_1$ and $Y_2$
equals 0.9. The bell-shaped mound of Figure 12.1 has become elongated.
FIGURE 12.3
Random samples from bivariate normal distributions with varying values of the parametric
correlation coefficient ρ. Sample sizes n = 100 in all graphs except G, which has n = 500. (A) ρ = 0.4.
(B) ρ = 0.3. (C) ρ = 0.5. (D) ρ = 0.7. (E) ρ = −0.7. (F) ρ = 0.9. (G) ρ = 0.5.
$$r_{12}^2 = \frac{\left(\sum y_1 y_2\right)^2}{\sum y_1^2 \sum y_2^2} = \frac{\left(\sum y_1 y_2\right)^2}{\sum y_1^2} \cdot \frac{1}{\sum y_2^2}$$

Look at the left term of the last expression. It is the square of the sum of
products of variables $Y_1$ and $Y_2$, divided by the sum of squares of $Y_1$. If this
were a regression problem, this would be the formula for the explained sum of
squares of variable $Y_2$ on variable $Y_1$, $\sum \hat{y}_2^2$. In the symbolism of Chapter 11,
on regression, it would be $\sum \hat{y}^2 = \left(\sum xy\right)^2 / \sum x^2$. Thus, we can write

$$r_{12}^2 = \frac{\sum \hat{y}_2^2}{\sum y_2^2} \tag{12.6}$$

and also

$$r_{12}^2 = \frac{\sum \hat{y}_1^2}{\sum y_1^2} \tag{12.6a}$$
which can be derived just as easily. (Remember that since we are not really
regressing one variable on the other, it is just as legitimate to have $Y_1$ explained
by $Y_2$ as the other way around.) The ratio symbolized by Expressions (12.6) and
(12.6a) is a proportion ranging from 0 to 1. This becomes obvious after a little
contemplation of the meaning of this formula. The explained sum of squares
of any variable must be smaller than its total sum of squares or, maximally, if
all the variation of a variable has been explained, it can be as great as the total
sum of squares, but certainly no greater. Minimally, it will be zero if none of the
variable can be explained by the other variable with which the covariance has
been computed. Thus, we obtain an important measure of the proportion of
the variation of one variable determined by the variation of the other. This
quantity, the square of the correlation coefficient, $r_{12}^2$, is called the coefficient
of determination. It ranges from zero to 1 and must be positive regardless of
whether the correlation coefficient is negative or positive. Incidentally, here is
proof that the correlation coefficient cannot vary beyond −1 and +1. Since
its square is the coefficient of determination and we have just shown that the
bounds of the latter are zero to 1, it is obvious that the bounds of its square
root will be ±1.
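The ratio in Expressions (12.6) and (12.6a) can be checked numerically. The following is a minimal Python sketch, assuming a small set of made-up paired observations (the data and the function names `deviation_sums` and `coefficient_of_determination` are illustrative and do not come from the text). It computes the explained sum of squares of $Y_2$ on $Y_1$ over the total sum of squares of $Y_2$ and compares the result with the square of $r_{12}$.

```python
import math

def deviation_sums(y1, y2):
    """Return (sum y1^2, sum y2^2, sum y1*y2), all as deviations from the means."""
    n = len(y1)
    mean1, mean2 = sum(y1) / n, sum(y2) / n
    ss1 = sum((a - mean1) ** 2 for a in y1)
    ss2 = sum((b - mean2) ** 2 for b in y2)
    sp = sum((a - mean1) * (b - mean2) for a, b in zip(y1, y2))
    return ss1, ss2, sp

def coefficient_of_determination(y1, y2):
    """Explained sum of squares of Y2 (given Y1) over the total sum of squares of Y2."""
    ss1, ss2, sp = deviation_sums(y1, y2)
    explained_ss2 = sp ** 2 / ss1  # same form as (sum xy)^2 / sum x^2 in regression
    return explained_ss2 / ss2

# Hypothetical illustration data (not from the text).
y1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y2 = [2.3, 1.9, 3.6, 3.1, 5.2, 4.8]

ss1, ss2, sp = deviation_sums(y1, y2)
r = sp / math.sqrt(ss1 * ss2)
r_squared = coefficient_of_determination(y1, y2)
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

Swapping the two arguments gives the same proportion, which is the point of the remark that either variable may be treated as the one being explained.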
The coefficient of determination is useful also when one is considering the
relative importance of correlations of different magnitudes. As can be seen by a
reexamination of Figure 12.3, the rate at which the scatter diagrams go from a
distribution with a circular outline to one resembling an ellipse seems to be
more directly proportional to $r^2$ than to $r$ itself. Thus, in Figure 12.3B, with
$\rho^2 = 0.09$, it is difficult to detect the correlation visually. However, by the time
we reach Figure 12.3D, with $\rho^2 = 0.49$, the presence of correlation is very
apparent.
The coefficient of determination is a quantity that may be useful in
regression analysis also. You will recall that in a regression we used anova to partition
the total sum of squares into explained and unexplained sums of squares. Once
such an analysis of variance has been carried out, one can obtain the ratio of
the explained sums of squares over the total SS as a measure of the proportion
of the total variation that has been explained by the regression. However, as
already discussed in Section 12.1, it would not be meaningful to take the square
root of such a coefficient of determination and consider it as an estimate of the
parametric correlation of these variables.
We shall now take up a mathematical relation between the coefficients of
correlation and regression. At the risk of being repetitious, we should stress
again that though we can easily convert one coefficient into the other, this does
not mean that the two types of coefficients can be used interchangeably on the
same sort of data. One important relationship between the correlation
coefficient and the regression coefficient can be derived as follows from Expression
(12.3):
$$r_{12} = \frac{\sum y_1 y_2}{\sqrt{\sum y_1^2 \sum y_2^2}} = \frac{\sum y_1 y_2}{\sqrt{\sum y_1^2}\,\sqrt{\sum y_2^2}}$$

Multiplying numerator and denominator of this expression by $\sqrt{\sum y_1^2}$, we
obtain

$$r_{12} = \frac{\sum y_1 y_2 \sqrt{\sum y_1^2}}{\sqrt{\sum y_1^2}\,\sqrt{\sum y_1^2}\,\sqrt{\sum y_2^2}} = \frac{\sum y_1 y_2}{\sum y_1^2} \cdot \frac{\sqrt{\sum y_1^2}}{\sqrt{\sum y_2^2}}$$

Dividing numerator and denominator of the right term of this expression by
$\sqrt{n-1}$, we obtain

$$r_{12} = \frac{\sum y_1 y_2}{\sum y_1^2} \cdot \frac{\sqrt{\sum y_1^2/(n-1)}}{\sqrt{\sum y_2^2/(n-1)}} = b_{2 \cdot 1}\,\frac{s_1}{s_2} \tag{12.7}$$

Similarly, we could demonstrate that

$$r_{12} = b_{1 \cdot 2}\,\frac{s_2}{s_1} \tag{12.7a}$$

and hence

$$b_{1 \cdot 2} = r_{12}\,\frac{s_1}{s_2} \tag{12.7b}$$
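The variance of a sum of two variables is

$$s^2_{(Y_1+Y_2)} = s_1^2 + s_2^2 + 2r_{12}s_1 s_2 \tag{12.8}$$

and the variance of a difference between two variables is

$$s^2_{(Y_1-Y_2)} = s_1^2 + s_2^2 - 2r_{12}s_1 s_2 \tag{12.9}$$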
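The relations between the correlation coefficient, the two regression coefficients, and the variance of a sum or difference (Expressions (12.7) to (12.9)) can be verified on any paired sample. The sketch below is an illustration only: the data are invented, and the helper names `sample_variance` and `summary` are assumptions, not names used in the text.

```python
import math

def sample_variance(values):
    """Unbiased sample variance: sum of squared deviations over n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def summary(y1, y2):
    """Return r12, b(2.1), b(1.2), s1 and s2 for paired samples y1 and y2."""
    n = len(y1)
    mean1, mean2 = sum(y1) / n, sum(y2) / n
    ss1 = sum((a - mean1) ** 2 for a in y1)
    ss2 = sum((b - mean2) ** 2 for b in y2)
    sp = sum((a - mean1) * (b - mean2) for a, b in zip(y1, y2))
    r = sp / math.sqrt(ss1 * ss2)
    b21 = sp / ss1  # regression coefficient of Y2 on Y1
    b12 = sp / ss2  # regression coefficient of Y1 on Y2
    s1 = math.sqrt(ss1 / (n - 1))
    s2 = math.sqrt(ss2 / (n - 1))
    return r, b21, b12, s1, s2

# Hypothetical illustration data (not from the text).
y1 = [4.1, 5.3, 6.0, 7.2, 8.8, 9.1, 10.4]
y2 = [1.2, 1.9, 1.7, 2.8, 3.1, 2.9, 3.6]

r, b21, b12, s1, s2 = summary(y1, y2)
var_sum = sample_variance([a + b for a, b in zip(y1, y2)])
var_diff = sample_variance([a - b for a, b in zip(y1, y2)])

print(f"r = {r:.4f}; b21 * s1 / s2 = {b21 * s1 / s2:.4f}")
print(f"variance of sum = {var_sum:.4f}; "
      f"s1^2 + s2^2 + 2 r s1 s2 = {s1**2 + s2**2 + 2 * r * s1 * s2:.4f}")
```

Computing the variance of the sums directly and comparing it with the right-hand side of (12.8) makes the role of the added term visible: it vanishes only when the sample correlation is zero.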
What Expression (12.8) indicates is that if we make a new composite
variable that is the sum of two other variables, the variance of this new variable
will be the sum of the variances of the variables of which it is composed plus
an added term, which is a function of the standard deviations of these two
variables and of the correlation between them. It is shown in Appendix A1.8 that
this added term is twice the covariance of $Y_1$ and $Y_2$. When the two variables
BOX 12.1
Computation of the product-moment correlation coefficient.

      (1)               (2)
      Y1                Y2
  Gill weight       Body weight
 in milligrams       in grams

      159              14.40
      179              15.20
      100              11.30
       45               2.50
      384              22.70
      230              14.90
      100               1.41
      320              15.81
       80               4.19
      220              15.39
      320              17.25
      210               9.52

Computation

6. Sum of squares of $Y_1$:

$$\sum y_1^2 = \text{quantity 2} - \frac{(\text{quantity 1})^2}{n} = 583{,}403 - \frac{(2347)^2}{12} = 124{,}368.9167$$

8. Sum of products:

$$\sum y_1 y_2 = 34{,}837.10 - \frac{(2347)(144.57)}{12} = 6561.6175$$

9. Product-moment correlation coefficient (by Expression (12.3)):

$$r_{12} = \frac{\sum y_1 y_2}{\sqrt{\sum y_1^2 \sum y_2^2}} = \frac{\text{quantity 8}}{\sqrt{\text{quantity 6} \times \text{quantity 7}}}$$

$$= \frac{6561.6175}{\sqrt{(124{,}368.9167)(462.4782)}} = \frac{6561.6175}{\sqrt{57{,}517{,}912.7314}} = \frac{6561.6175}{7584.0565} = 0.8652 \approx 0.87$$
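The arithmetic of Box 12.1 can be retraced in a few lines. This is a minimal Python sketch using the twelve pairs of values listed in the box; the variable names are illustrative, and the sums of squares and products are computed with the same "quantity minus correction term" formulas that the box uses.

```python
import math

# Data of Box 12.1: gill weight (mg) and body weight (g) of 12 crabs.
gill_weight = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210]
body_weight = [14.40, 15.20, 11.30, 2.50, 22.70, 14.90,
               1.41, 15.81, 4.19, 15.39, 17.25, 9.52]

n = len(gill_weight)
sum_y1 = sum(gill_weight)
sum_y2 = sum(body_weight)
sum_y1_sq = sum(v * v for v in gill_weight)
sum_y2_sq = sum(v * v for v in body_weight)
sum_y1y2 = sum(a * b for a, b in zip(gill_weight, body_weight))

# Sums of squares and sum of products of deviations from the means.
ss_y1 = sum_y1_sq - sum_y1 ** 2 / n
ss_y2 = sum_y2_sq - sum_y2 ** 2 / n
sp_y1y2 = sum_y1y2 - sum_y1 * sum_y2 / n

# Product-moment correlation coefficient, Expression (12.3).
r = sp_y1y2 / math.sqrt(ss_y1 * ss_y2)
print(f"sum y1^2 = {ss_y1:.4f}, sum y2^2 = {ss_y2:.4f}, sum y1y2 = {sp_y1y2:.4f}")
print(f"r = {r:.4f}")
```

The printed intermediate quantities can be compared directly with quantities 6, 7, and 8 of the box.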
FIGURE 12.4
Scatter diagram for crab data of Box 12.1. Abscissa: body weight in grams.
BOX 12.2
Tests of significance and confidence limits for correlation coefficients.

Test of the null hypothesis $H_0$: ρ = 0 versus $H_1$: ρ ≠ 0

The simplest procedure is to consult Table VIII, where the critical values of r
are tabulated for df = n − 2 from 1 to 1000. If the absolute value of the observed
r is greater than the tabulated value in the column for two variables, we reject
the null hypothesis.

Examples. In Box 12.1 we found the correlation between body weight and
gill weight to be 0.8652, based on a sample of n = 12. For 10 degrees of freedom
the critical values are 0.576 at the 5% level and 0.708 at the 1% level of
significance. Since the observed correlation is greater than both of these, we can reject
the null hypothesis, $H_0$: ρ = 0, at P < 0.01.

Table VIII is based upon the following test, which may be carried out when
the table is not available or when an exact test is needed at significance levels or
at degrees of freedom other than those furnished in the table. The null hypothesis
is tested by means of the t distribution (with n − 2 df) by using the standard error
of r. When ρ = 0,

$$s_r = \sqrt{\frac{1 - r^2}{n - 2}}$$

Therefore,

$$t_s = \frac{r - 0}{\sqrt{(1 - r^2)/(n - 2)}} = r\sqrt{\frac{n - 2}{1 - r^2}}$$

For the data of Box 12.1, this would be

$$t_s = 0.8652\sqrt{\frac{12 - 2}{1 - 0.8652^2}} = 0.8652\sqrt{39.7727} = 5.456$$
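The t test for a correlation coefficient needs only r and n. The function below is a minimal Python sketch of that computation; the name `t_for_correlation` is an assumption made for illustration, and looking up the critical value of t for n − 2 degrees of freedom is left to a table.

```python
import math

def t_for_correlation(r, n):
    """t statistic with n - 2 df for the null hypothesis rho = 0."""
    standard_error = math.sqrt((1.0 - r * r) / (n - 2))
    return r / standard_error

# Crab data of Box 12.1: r = 0.8652 based on n = 12 pairs.
t_s = t_for_correlation(0.8652, 12)
print(f"t_s = {t_s:.3f} with {12 - 2} degrees of freedom")
```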
For large samples, the same null hypothesis can also be tested by means of the
z transformation of r:

$$t_s = \frac{z - 0}{1/\sqrt{n - 3}} = z\sqrt{n - 3}$$

Since z is normally distributed and we are using a parametric standard deviation,
we compare $t_s$ with $t_{\alpha[\infty]}$ or employ Table II, "Areas of the normal curve." If we
had a sample correlation of r = 0.837 between length of right- and left-wing veins
of bees based on n = 500, we would find z = 1.2111 in Table X. Then

$$t_s = 1.2111\sqrt{497} = 26.9997$$

This value, when looked up in Table II, yields a very small probability ($< 10^{-6}$).

To test the null hypothesis that ρ equals a specified value other than zero, we
compute

$$t_s = \frac{z - \zeta}{1/\sqrt{n - 3}} = (z - \zeta)\sqrt{n - 3}$$

where z and ζ are the z transformations of r and ρ, respectively. Again we
compare $t_s$ with $t_{\alpha[\infty]}$ or look it up in Table II. From Table X we find

For r = 0.837    z = 1.2111
For ρ = 0.500    ζ = 0.5493

Therefore

$$t_s = (1.2111 - 0.5493)\left(\sqrt{497}\right) = 14.7538$$

The probability of obtaining such a value of $t_s$ by random sampling is $P < 10^{-6}$
(see Table II). It is most unlikely that the parametric correlation between right-
and left-wing veins is 0.5.
Confidence limits

If n > 50, we can set confidence limits to r using the z transformation. We first
convert the sample r to z, set confidence limits to this z, and then transform these
limits back to the r scale. We shall find 95% confidence limits for the above wing
vein length data.

For r = 0.837, z = 1.2111, α = 0.05.

$$L_1 = z - \frac{t_{0.05[\infty]}}{\sqrt{n - 3}} = 1.2111 - \frac{1.960}{\sqrt{497}} = 1.2111 - 0.0879 = 1.1232$$

$$L_2 = z + \frac{t_{0.05[\infty]}}{\sqrt{n - 3}} = 1.2111 + 0.0879 = 1.2990$$

We retransform these z values to the r scale by finding the corresponding
arguments for the z function in Table X.

$L_1 \approx 0.808$ and $L_2 \approx 0.862$

are the 95% confidence limits around r = 0.837.
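When Table X is not at hand, the z transformation and its inverse can be computed directly. The sketch below is a hedged illustration in Python: it assumes the usual definition of Fisher's z, namely half the natural logarithm of (1 + r)/(1 − r), whose inverse is the hyperbolic tangent, and the function names are invented for this example. Results agree with the tabled values only to about three decimal places, because the box works from rounded table entries.

```python
import math

def fisher_z(r):
    """z transformation of a correlation coefficient."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def z_test(r, n, rho0=0.0):
    """Normal deviate for H0: rho = rho0, using standard error 1/sqrt(n - 3)."""
    return (fisher_z(r) - fisher_z(rho0)) * math.sqrt(n - 3)

def confidence_limits(r, n, t_crit=1.960):
    """Limits set on the z scale and transformed back to the r scale."""
    z = fisher_z(r)
    half_width = t_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Wing vein example of Box 12.2: r = 0.837, n = 500.
z = fisher_z(0.837)
t_zero = z_test(0.837, 500)        # test of rho = 0
t_half = z_test(0.837, 500, 0.5)   # test of rho = 0.5
lower, upper = confidence_limits(0.837, 500)
print(f"z = {z:.4f}, t_s (rho = 0) = {t_zero:.3f}, t_s (rho = 0.5) = {t_half:.3f}")
print(f"95% limits: {lower:.3f} to {upper:.3f}")
```

The default `t_crit=1.960` corresponds to 95% limits; another critical value of the normal deviate would be passed for a different confidence coefficient.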
Test of the difference between two correlation coefficients

$$t_s = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}$$

Since $z_1 - z_2$ is normally distributed and we are using a parametric standard
deviation, we compare $t_s$ with $t_{\alpha[\infty]}$ or employ Table II.
When ρ is close to ±1.0, the distribution of sample values of r is markedly
asymmetrical, and, although a standard error is available for r in such cases,
it should not be applied unless the sample is very large (n > 500), a most
infrequent case of little interest. To overcome this difficulty, we transform r to a
function z, developed by Fisher. The formula for z is

$$z = \frac{1}{2}\ln\left(\frac{1 + r}{1 - r}\right) \tag{12.10}$$
BOX 12.3
Kendall's coefficient of rank correlation, τ.

Computation of a rank correlation coefficient between the blood neutrophil counts
($Y_1$: × $10^{-3}$ per μl) and total marrow neutrophil mass ($Y_2$: × $10^{9}$ per kg) in 15
patients with nonhematological tumors; n = 15 pairs of observations.

Source: Data extracted from Liu, Kesfeld, and Koo (1983).

Computational steps

1. Rank variables $Y_1$ and $Y_2$ separately and then replace the original variates with
the ranks (assign tied ranks if necessary so that for both variables you will
always have n ranks for n variates). These ranks are listed in columns (3) and
(5) above.
2. Write down the n ranks of one of the two variables in order, paired with the
rank values assigned for the other variable (as shown below). If only one
variable has ties, order the pairs by the variable without ties. If both variables have
ties, it does not matter which of the variables is ordered.
3. Obtain a sum of the counts $C_i$ as follows. Examine the first value in the column
of ranks paired with the ordered column. In our case, this is rank 10. Count
all ranks subsequent to it which are higher than the rank being considered.
Thus, in this case, count all ranks greater than 10. There are fourteen ranks
following the 10 and five of them are greater than 10. Therefore, we count a
score of $C_1 = 5$. Now we look at the next rank (rank 8) and find that six of
the thirteen subsequent ranks are greater than it; therefore, $C_2$ is equal to 6.
The third rank is 11, and four following ranks are higher than it. Hence, $C_3 = 4$.
Continue in this manner, taking each rank of the variable in turn and counting
the number of higher ranks subsequent to it. This can usually be done in one's
head, but we show it explicitly below so that the method will be entirely clear.
Whenever a subsequent rank is tied in value with the pivotal rank $R_2$, count
½ instead of 1.
 R1    R2    Subsequent ranks greater than pivotal rank R2    Counts Ci

  1    10    11, 12, 13, 15, 14                                   5
  2     8    11, 9, 12, 13, 15, 14                                6
  3    11    12, 13, 15, 14                                       4
  4     7    9, 12, 13, 15, 14                                    5
  5     9    12, 13, 15, 14                                       4
  6     1    6, 4, 5, 2, 12, 3, 13, 15, 14                        9
  7     6    12, 13, 15, 14                                       4
  8     4    5, 12, 13, 15, 14                                    5
  9     5    12, 13, 15, 14                                       4
 10     2    12, 3, 13, 15, 14                                    5
 11    12    13, 15, 14                                           3
 12     3    13, 15, 14                                           3
 13    13    15, 14                                               2
 14    15                                                         0
 15    14                                                         0

                                                          ΣCi = 59
4. The Kendall coefficient of rank correlation, τ, is computed as

$$\tau = \frac{N}{\sqrt{\left[n(n-1) - \sum^{m} T_1\right]\left[n(n-1) - \sum^{m} T_2\right]}}$$

where $N = 4\sum^{n} C_i - n(n-1)$, and $\sum^{m} T_1$ and $\sum^{m} T_2$ are the sums of correction
terms for ties in the ranks of variable $Y_1$ and $Y_2$, respectively, defined as follows.
A T value equal to t(t − 1) is computed for each group of t tied variates and
summed over m such groups. Thus if variable $Y_2$ had had two sets of ties, one
involving t = 2 variates and a second involving t = 3 variates, one would have
computed $\sum^{m} T_2 = 2(2 - 1) + 3(3 - 1) = 8$. It has been suggested that if the ties
are due to lack of precision rather than being real, the coefficient should be
computed by the simpler formula

$$\tau = \frac{N}{n(n-1)}$$
5. To test significance for sample sizes >40, we can make use of a normal
approximation to test the null hypothesis that the true value of τ = 0:

$$t_s = \frac{\tau}{\sqrt{2(2n + 5)/9n(n - 1)}} \quad \text{compared with } t_{\alpha[\infty]}$$
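The counting procedure of steps 3 and 4 can be sketched in Python as follows. This is an illustration under stated assumptions: the paired ranks are those shown in the table of counts, the ordered variable is taken to have ranks 1 to 15 without ties, the numerator is taken as $N = 4\sum C_i - n(n-1)$, and all function names are invented for this example.

```python
import math
from collections import Counter

def higher_rank_counts(paired_ranks):
    """For each pivotal rank, count later ranks that are higher (ties count 1/2)."""
    counts = []
    for i, pivot in enumerate(paired_ranks):
        count = 0.0
        for later in paired_ranks[i + 1:]:
            if later > pivot:
                count += 1.0
            elif later == pivot:
                count += 0.5
        counts.append(count)
    return counts

def tie_correction(ranks):
    """Sum of t(t - 1) over groups of t tied ranks (untied ranks contribute 0)."""
    return sum(t * (t - 1) for t in Counter(ranks).values())

def kendall_tau(ordered_ranks, paired_ranks):
    """Kendall's tau with the correction terms for ties in the denominator."""
    n = len(paired_ranks)
    numerator = 4 * sum(higher_rank_counts(paired_ranks)) - n * (n - 1)
    denominator = math.sqrt((n * (n - 1) - tie_correction(ordered_ranks))
                            * (n * (n - 1) - tie_correction(paired_ranks)))
    return numerator / denominator

def normal_deviate(tau, n):
    """Normal approximation of step 5 (intended for n > 40)."""
    return tau / math.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))

ordered = list(range(1, 16))  # assumed: ordered variable ranked 1..15, no ties
paired = [10, 8, 11, 7, 9, 1, 6, 4, 5, 2, 12, 3, 13, 15, 14]

counts = higher_rank_counts(paired)
tau = kendall_tau(ordered, paired)
print(f"sum of counts = {sum(counts):.0f}, tau = {tau:.3f}")
```

With only 15 pairs the normal approximation of step 5 would not be appropriate for these data; `normal_deviate` is included only to show the form of the test statistic.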
Y1    1  2  3  4  5
Y2    1  3  2  5  4
Exercises
12.1 Graph the following data in the form of a bivariate scatter diagram. Compute
the correlation coefficient and set 95% confidence intervals to ρ. The data were
collected for a study of geographic variation in the aphid Pemphigus
populitransversus. The values in the table represent locality means based on equal
sample sizes for 23 localities in eastern North America. The variables, extracted
from Sokal and Thomas (1965), are expressed in millimeters. Y1 = tibia length,
Y2 = tarsus length. The correlation coefficient will estimate correlation of these
two variables over localities. ANS. r = 0.910, P < 0.01.
1 0.631 0.140
2 0.644 0.139
3 0.612 0.140
4 0.632 0.141
5 0.675 0.155
6 0.653 0.148
7 0.655 0.146
8 0.615 0.136
9 0.712 0.159
10 0.626 0.140
11 0.597 0.133
12 0.625 0.144
13 0.657 0.147
14 0.586 0.134
15 0.574 0.134
16 0.551 0.127
17 0.556 0.130
18 0.665 0.147
19 0.585 0.138
20 0.629 0.150
21 0.671 0.148
22 0.703 0.151
23 0.662 0.142
12.2 The following data were extracted from a larger study by Brower (1959) on
speciation in a group of swallowtail butterflies. Morphological measurements are in
millimeters coded × 8.

Species    Specimen number    Length of 8th tergite    Length of superuncus
Compute the correlation coefficient separately for each species and test signifi-
cance of each. Test whether the two correlation coefficients differ significantly.
12.3 A pathologist measured the concentration of a toxic substance in the liver and
in the peripheral blood (in μg/kg) in order to ascertain if the liver concentration
is related to the blood concentration. Calculate τ and test its significance.
Liver Blood
0.296 0.283
0.315 0.323
0.022 0.159
0.361 0.381
0.202 0.208
0.444 0.411
0.252 0.254
0.371 0.352
0.329 0.319
0.183 0.177
0.369 0.315
0.199 0.259
0.353 0.353
0.251 0.303
0.346 0.293
ANS. τ = 0.733.
12.4 The following table of data is from an unpublished morphometric study of the
cottonwood Populus deltoides by T. J. Crovello. Twenty-six leaves from one
tree were measured when fresh and again after drying. The variables shown are
fresh-leaf width (Y1) and dry-leaf width (Y2), both in millimeters. Calculate r
and test its significance.
Y1 Y2 Y1 Y2
90 88 100 97
88 87 110 105
55 52 95 90
100 95 99 98
86 83 92 92
90 88 80 82
82 77 110 106
78 75 105 97
115 109 101 98
100 95 95 91
110 105 80 76
84 78 103 97
76 71
12.5 Brown and Comstock (1952) found the following correlations between the length
of the wing and the width of a band on the wing of females of two samples
of the butterfly Heliconius charitonius:
Sample n r
1 100 0.29
2 46 0.70
Test whether the samples were drawn from populations with the same value of
ρ. ANS. No, $t_s = -3.104$, P < 0.01.
12.6 Test for the presence of association between tibia length and tarsus length in
the data of Exercise 12.1 using Kendall's coefficient of rank correlation.
CHAPTER 13
Analysis of Frequencies
Almost all our work so far has dealt with estimation of parameters and tests
of hypotheses for continuous variables. The present chapter treats an important
class of cases, tests of hypotheses about frequencies. Biological variables may
be distributed into two or more classes, depending on some criterion such as
arbitrary class limits in a continuous variable or a set of mutually exclusive
attributes. An example of the former would be a frequency distribution of birth
weights (a continuous variable arbitrarily divided into a number of contiguous
classes); one of the latter would be a qualitative frequency distribution such as
the frequency of individuals of ten different species obtained from a soil sample.
For any such distribution we may hypothesize that it has been sampled from
a population in which the frequencies of the various classes represent certain
parametric proportions of the total frequency. We need a test of goodness of fit
for our observed frequency distribution to the expected frequency distribution
representing our hypothesis. You may recall that we first realized the need for
such a test in Chapters 4 and 5, where we calculated expected binomial, Poisson,
and normal frequency distributions but were unable to decide whether an
observed sample distribution departed significantly from the theoretical one.
In Section 13.1 we introduce the idea of goodness of fit, discuss the types
of significance tests that are appropriate, explain the basic rationale behind such
tests, and develop general computational formulas for these tests.
Section 13.2 illustrates the actual computations for goodness of fit when
the data are arranged by a single criterion of classification, as in a one-way
quantitative or qualitative frequency distribution. This design applies to cases
expected to follow one of the well-known frequency distributions such as the
binomial, Poisson, or normal distribution. It applies as well to expected
distributions following some other law suggested by the scientific subject matter
under investigation, such as, for example, tests of goodness of fit of observed
genetic ratios against expected Mendelian frequencies.
In Section 13.3 we proceed to significance tests of frequencies in two-way
classifications, called tests of independence. We shall discuss the common tests
of 2 × 2 tables in which each of two criteria of classification divides the
frequencies into two classes, yielding a four-cell table, as well as R × C tables with
more rows and columns.
Throughout this chapter we carry out goodness of fit tests by the G statistic.
We briefly mention chi-square tests, which are the traditional way of analyzing
such cases. But as is explained at various places throughout the text, G tests
have general theoretical advantages over chi-square tests, as well as being
computationally simpler, not only by computer, but also on most pocket or
tabletop calculators.
13.1 Tests for goodness of fit: Introduction
The basic idea of a goodness of fit test is easily understood, given the extensive
experience you now have with statistical hypothesis testing. Let us assume that
a geneticist has carried out a crossing experiment between two $F_1$ hybrids and
obtains an $F_2$ progeny of 90 offspring, 80 of which appear to be wild type and
10 of which are the mutant phenotype. The geneticist assumes dominance and
expects a 3:1 ratio of the phenotypes. When we calculate the actual ratios,
however, we observe that the data are in a ratio 80/10 = 8:1. Expected values
for p and q are $\hat{p} = 0.75$ and $\hat{q} = 0.25$ for the wild type and mutant, respectively.
Note that we use the caret (generally called "hat" in statistics) to indicate
hypothetical or expected values of the binomial proportions. However, the observed
proportions of these two classes are p = 0.89 and q = 0.11, respectively. Yet
another way of noting the contrast between observation and expectation is to
state it in frequencies: the observed frequencies are $f_1 = 80$ and $f_2 = 10$ for the
two phenotypes. Expected frequencies should be $\hat{f}_1 = \hat{p}n = 0.75(90) = 67.5$ and
$\hat{f}_2 = \hat{q}n = 0.25(90) = 22.5$, respectively, where n refers to the sample size of
offspring from the cross. Note that when we sum the expected frequencies they
yield 67.5 + 22.5 = n = 90, as they should.
The obvious question that comes to mind is whether the deviation from the
3:1 hypothesis observed in our sample is of such a magnitude as to be
improbable. In other words, do the observed data differ enough from the expected
TABLE 13.1
hypothesis. Note that these expressions yield the probabilities for the observed
outcomes only, not for observed and all worse outcomes. Thus, P = 0.000,551,8
is less than the earlier computed P = 0.000,849, which is the probability of 10
and fewer mutants, assuming $\hat{p} = \tfrac{3}{4}$, $\hat{q} = \tfrac{1}{4}$.
The first probability (0.132,683,8) is greater than the second (0.000,551,754,9),
since the hypothesis is based on the observed data. If the observed proportion
p is in fact equal to the proportion $\hat{p}$ postulated under the null hypothesis, then
the two computed probabilities will be equal and their ratio, L, will equal 1.0.
The greater the difference between p and $\hat{p}$ (the expected proportion under the
null hypothesis), the higher the ratio will be (the probability based on p is
divided by the probability based on $\hat{p}$ or defined by the null hypothesis). This
indicates that the ratio of these two probabilities or likelihoods can be used as
a statistic to measure the degree of agreement between sampled and expected
frequencies. A test based on such a ratio is called a likelihood ratio test. In our
case, L = 0.132,683,8/0.000,551,754,9 = 240.4761.
It has been shown that the distribution of

$$G = 2 \ln L \tag{13.1}$$

can be approximated by the $\chi^2$ distribution when sample sizes are large (for a
definition of "large" in this case, see Section 13.2). The appropriate number of
degrees of freedom in Table 13.1 is 1 because the frequencies in the two cells
for these data add to a constant sample size, 90. The outcome of the sampling
experiment could have been any number of mutants from 0 to 90, but the
number of wild type consequently would have to be constrained so that the
total would add up to 90. One of the cells in the table is free to vary, the other
is constrained. Hence, there is one degree of freedom.
In our case, G = 2 ln L = 2 ln(240.4761) = 10.965,24. The likelihood ratio itself
can be written as

$$L = \frac{C(n, f_1)\,p^{f_1} q^{f_2}}{C(n, f_1)\,\hat{p}^{f_1} \hat{q}^{f_2}} = \left(\frac{p}{\hat{p}}\right)^{f_1} \left(\frac{q}{\hat{q}}\right)^{f_2} \tag{13.2}$$
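Since $f_1 = np$ and $\hat{f}_1 = n\hat{p}$, and similarly $f_2 = nq$ and $\hat{f}_2 = n\hat{q}$,

$$L = \left(\frac{f_1}{\hat{f}_1}\right)^{f_1} \left(\frac{f_2}{\hat{f}_2}\right)^{f_2}$$

and

$$\ln L = f_1 \ln\left(\frac{f_1}{\hat{f}_1}\right) + f_2 \ln\left(\frac{f_2}{\hat{f}_2}\right) \tag{13.3}$$

For a classes, this generalizes to

$$G = 2\sum^{a} f_i \ln\left(\frac{f_i}{\hat{f}_i}\right) \tag{13.4}$$

which, since $\hat{f}_i = n\hat{p}_i$, can also be written as

$$G = 2\left[\sum^{a} f_i \ln\left(\frac{f_i}{\hat{p}_i}\right) - n \ln n\right] \tag{13.5}$$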
frequencies in any problem is fixed, this means that $a - 1$ classes are free to
vary, whereas the $a$th class must constitute the difference between the total sum
and the sum of the previous $a - 1$ classes.
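The two routes to G described in this section, through the likelihood ratio L and through the observed and expected frequencies, can be compared numerically. The following Python sketch does so for the cross with 80 wild-type and 10 mutant offspring. The function names are illustrative assumptions, and the comparison of G with a critical value of chi-square is left to a table.

```python
import math

def binomial_probability(n, f1, p):
    """Probability of exactly f1 individuals of the first class among n."""
    return math.comb(n, f1) * p ** f1 * (1.0 - p) ** (n - f1)

def g_statistic(observed, expected):
    """G = 2 * sum of f * ln(f / f_hat) over all classes."""
    return 2.0 * sum(f * math.log(f / f_hat)
                     for f, f_hat in zip(observed, expected) if f > 0)

observed = [80, 10]                  # wild type, mutant
n = sum(observed)
p_hat = 0.75                         # expected proportion under the 3:1 hypothesis
expected = [p_hat * n, (1.0 - p_hat) * n]
p_observed = observed[0] / n

prob_observed = binomial_probability(n, observed[0], p_observed)
prob_hypothesis = binomial_probability(n, observed[0], p_hat)
likelihood_ratio = prob_observed / prob_hypothesis

g_from_ratio = 2.0 * math.log(likelihood_ratio)
g_from_frequencies = g_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1   # extrinsic hypothesis: a - 1

print(f"L = {likelihood_ratio:.4f}")
print(f"G = {g_from_ratio:.4f} (from L), {g_from_frequencies:.4f} (from frequencies)")
```

The binomial coefficient cancels in the ratio, which is why the frequency form needs no combinatorial term at all.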
In some goodness of fit tests involving more than two classes, we subtract
more than one degree of freedom from the number of classes, a. These are
instances where the parameters for the null hypothesis have been extracted from
the sample data themselves, in contrast with the null hypotheses encountered
in Table 13.1. In the latter case, the hypothesis to be tested was generated on
the basis of the investigator's general knowledge of the specific problem and of
Mendelian genetics. The values of $\hat{p} = 0.75$ and $\hat{q} = 0.25$ were dictated by the
3:1 hypothesis and were not estimated from the sampled data. For this reason,
the expected frequencies are said to have been based on an extrinsic hypothesis,
a hypothesis external to the data. By contrast, consider the expected Poisson
frequencies of yeast cells in a hemacytometer (Box 4.1). You will recall that to
compute these frequencies, you needed values for μ, which you estimated from
the sample mean $\bar{Y}$. Therefore, the parameter of the computed Poisson
distribution came from the sampled observations themselves. The expected Poisson
frequencies represent an intrinsic hypothesis. In such a case, to obtain the correct
number of degrees of freedom for the test of goodness of fit, we would subtract
from a, the number of classes into which the data had been grouped, not only
one degree of freedom for n, the sum of the frequencies, but also one further
degree of freedom for the estimate of the mean. Thus, in such a case, a sample
statistic G would be compared with chi-square for $a - 2$ degrees of freedom.
Now let us introduce you to an alternative technique. This is the traditional
approach with which we must acquaint you because you will see it applied in
the earlier literature and in a substantial proportion of current research
publications. We turn once more to the genetic cross with 80 wild-type and 10
mutant individuals. The computations are laid out in columns (7), (8), and (9)
in Table 13.1.
We first measure $f - \hat{f}$, the deviation of observed from expected
frequencies. Note that the sum of these deviations equals zero, for reasons very similar
to those causing the sum of deviations from a mean to add to zero. Following
our previous approach of making all deviations positive by squaring them, we
square $(f - \hat{f})$ in column (8) to yield a measure of the magnitude of the
deviation from expectation. This quantity must be expressed as a proportion of the
expected frequency. After all, if the expected frequency were 13.0, a deviation of
12.5 would be an extremely large one, comprising almost 100% of $\hat{f}$, but such
a deviation would represent only 10% of an expected frequency of 125.0. Thus,
we obtain column (9) as the quotient of division of the quantity in column (8)
by that in column (4). Note that the magnitude of the quotient is greater for
the second line, in which the $\hat{f}$ is smaller. Our next step in developing our test
statistic is to sum the quotients, which is done at the foot of column (9), yielding
a value of 9.259,26.
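The three steps just described (deviation, squared deviation, quotient by the expected frequency) can be followed in a short Python sketch for the same cross. The function name `chi_square_statistic` is an illustrative assumption; the expected frequencies follow from the 3:1 hypothesis and the sample size of 90.

```python
def chi_square_statistic(observed, expected):
    """Sum over classes of (f - f_hat)^2 / f_hat."""
    return sum((f - f_hat) ** 2 / f_hat for f, f_hat in zip(observed, expected))

observed = [80, 10]                    # wild type, mutant
n = sum(observed)
expected = [0.75 * n, 0.25 * n]        # 67.5 and 22.5 under the 3:1 hypothesis

deviations = [f - f_hat for f, f_hat in zip(observed, expected)]
quotients = [d ** 2 / f_hat for d, f_hat in zip(deviations, expected)]
x_squared = chi_square_statistic(observed, expected)

print("deviations:", deviations)
print("quotients:", [round(q, 5) for q in quotients])
print(f"X^2 = {x_squared:.5f}")
```

The second quotient is the larger of the two because the same absolute deviation is divided by the smaller expected frequency.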
This test is called the chi-square test because the resultant statistic, $X^2$, is
distributed as chi-square with $a - 1$ degrees of freedom. Many persons
inappropriately call the statistic obtained as the sum of column (9) a chi-square.
However, since the sample statistic is not a chi-square, we have followed the
increasingly prevalent convention of labeling the sample statistic $X^2$ rather than
$\chi^2$. The value of $X^2 = 9.259{,}26$ from Table 13.1, when compared with the critical
value of $\chi^2$ (Table IV), is highly significant (P < 0.005). The chi-square test is
always one-tailed. Since the deviations are squared, negative and positive
deviations both result in positive values of $X^2$. Clearly, we reject the 3:1 hypothesis
and conclude that the proportion of wild type is greater than 0.75. The geneticist
must, consequently, look for a mechanism explaining this departure from
expectation. Our conclusions are the same as with the G test. In general, $X^2$ will
be numerically similar to G.
We can apply the chi-square test for goodness of fit to a distribution with
more than two classes as well. The operation can be described by the formula

$$X^2 = \sum^{a} \frac{(f_i - \hat{f}_i)^2}{\hat{f}_i} \tag{13.6}$$
BOX 13.1
G Test for Goodness of Fit: Single Classification.

1. Frequencies divided into a ≥ 2 classes: Sex ratio in 6115 sibships of 12 in Saxony.
The fourth column gives the expected frequencies, assuming a binomial
distribution. These were first computed in Table 4.4 but are here given to
five-decimal-place precision to give sufficient accuracy to the computation of G.

 (1)    (2)     (3)            (4)                      (5)
                                                  Deviation from
 ♂♂     ♀♀       f              f̂                  expectation

 12      0  }    52         2.347,27 }  28.429,73         +
 11      1  }              26.082,46 }
 10      2      181       132.835,70                      +
  9      3      478       410.012,56                      +
  8      4      829       854.246,65                      −
  7      5     1112      1265.630,31                      −
  6      6     1343      1367.279,36                      −
  5      7     1033      1085.210,70                      −
  4      8      670       628.055,01                      +
  3      9      286       258.475,13                      +
  2     10      104        71.803,17                      +
  1     11  }    27        12.088,84 }  13.021,68         +
  0     12  }               0.932,84 }

             6115 = n    6115.000,00

$$G = 2\sum^{a} f_i \ln\left(\frac{f_i}{\hat{f}_i}\right)$$

$$= 2\left[52 \ln\left(\frac{52}{28.429{,}73}\right) + 181 \ln\left(\frac{181}{132.835{,}70}\right) + \cdots + 27 \ln\left(\frac{27}{13.021{,}68}\right)\right]$$

$$= 94.871{,}55$$
Since there are a = 11 classes remaining, the degrees of freedom would be
a − 1 = 10, if this were an example tested against expected frequencies based
on an extrinsic hypothesis. However, because the expected frequencies are based
on a binomial distribution with mean p̂♂ estimated from the sample,
a further degree of freedom is removed, and the sample value of G is compared
with a χ² distribution with a − 2 = 11 − 2 = 9 degrees of freedom. We applied
Williams' correction to G, to obtain a better approximation to χ². In the formula
computed below, ν symbolizes the pertinent degrees of freedom of the
problem. We obtain

$$q = 1 + \frac{a^2 - 1}{6n\nu} = 1 + \frac{11^2 - 1}{6(6115)(9)} = 1.000{,}363{,}4$$

$$G_{\text{adj}} = \frac{G}{q} = \frac{94.871{,}55}{1.000{,}363{,}4} = 94.837{,}09$$

Typically, the following degrees of freedom pertain to G tests for goodness of fit
when the expected frequencies are based on a hypothesis intrinsic to the sample data:

Distribution    Parameters estimated from sample    df
Binomial        p̂                                   a − 2
Normal          μ, σ                                 a − 3
Poisson         μ                                    a − 2
When the parameters for such distributions are estimated from hypotheses
extrinsic to the sampled data, the degrees of freedom are uniformly a — 1.
2. Special case of frequencies divided in a = 2 classes: In an F₂ cross in Drosophila,
the following 176 progeny were obtained, of which 130 were wild-type
flies and 46 ebony mutants. Assuming that the mutant is an autosomal recessive,
one would expect a ratio of 3 wild-type flies to each mutant fly. To test whether
the observed results are consistent with this 3:1 hypothesis, we set up the data
as follows.

Flies            f          Hypothesis      f̂
Wild type        130        p̂ = 0.75        132.0
Ebony mutant      46        q̂ = 0.25         44.0
                 n = 176                     176.0

$$G = 2\left[130\ln\left(\frac{130}{132}\right) + 46\ln\left(\frac{46}{44}\right)\right] = 0.120{,}02$$
Williams' correction for the two-cell case is q = 1 + 1/2n, which is

$$q = 1 + \frac{1}{2(176)} = 1.002{,}84$$

in this example.

$$G_{\text{adj}} = \frac{G}{q} = \frac{0.120{,}02}{1.002{,}84} = 0.1197$$
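The first computation of Box 13.1 can be retraced with a short Python sketch. The helper names `g_statistic` and `williams_q` are assumptions made for this illustration; the frequencies are the 11 lumped classes of the sex ratio data, and the sketch assumes every observed frequency is greater than zero (the logarithm is undefined otherwise).

```python
import math


def g_statistic(observed, expected):
    """G = 2 * sum(f * ln(f / f_hat)), as in Expression (13.4)."""
    return 2.0 * sum(f * math.log(f / f_hat) for f, f_hat in zip(observed, expected))


def williams_q(a, n, df):
    """Williams' correction factor q = 1 + (a^2 - 1) / (6 * n * df)."""
    return 1.0 + (a ** 2 - 1) / (6.0 * n * df)


# Observed and expected frequencies for the 11 lumped classes of Box 13.1.
observed = [52, 181, 478, 829, 1112, 1343, 1033, 670, 286, 104, 27]
expected = [28.42973, 132.83570, 410.01256, 854.24665, 1265.63031, 1367.27936,
            1085.21070, 628.05501, 258.47513, 71.80317, 13.02168]

a = len(observed)   # number of classes after lumping
n = sum(observed)   # total number of sibships
df = a - 2          # one further df lost: the proportion of males was estimated
g = g_statistic(observed, expected)
g_adj = g / williams_q(a, n, df)
print(f"G = {g:.3f}, G_adj = {g_adj:.3f}, df = {df}")
```

Dividing by q changes G only in the second decimal place here, because n is large.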
The case presented in Box 13.1, however, is one in which the expected
frequencies are based on an intrinsic hypothesis. We use the sex ratio data in
sibships of 12, first introduced in Table 4.4, Section 4.2. As you will recall, the
expected frequencies in these data are based on the binomial distribution, with
the parametric proportion of males p♂ estimated from the observed frequencies
of the sample (p̂♂ = 0.519,215). The computation of this case is outlined fully
in Box 13.1.
The G test does not yield very accurate probabilities for small f̂ᵢ. The cells
with f̂ᵢ < 3 (when a ≥ 5) or f̂ᵢ < 5 (when a < 5) are generally lumped with
adjacent classes so that the new f̂ᵢ are large enough. The lumping of classes
results in a less powerful test with respect to alternative hypotheses. By these
criteria the classes of f̂ᵢ at both tails of the distribution are too small. We lump
them by adding their frequencies to those in contiguous classes, as shown in
Box 13.1. Clearly, the observed frequencies must be lumped to match. The
number of classes a is the number after lumping has taken place. In our case,
a = 11.
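The lumping rule can be sketched as follows. This is a minimal illustration, not a general procedure: the function name `lump_tails` is an assumption, and it merges classes only at the two tails, which is all the sex ratio example requires. The threshold of 3 is the one given above for a ≥ 5.

```python
def lump_tails(expected, threshold):
    """Group class indices so that each tail group has expected frequency >= threshold.

    Classes are merged inward from each tail until the lumped expected
    frequency reaches the threshold; interior classes are left alone.
    """
    lo, left_sum = 0, expected[0]
    while left_sum < threshold and lo < len(expected) - 1:
        lo += 1
        left_sum += expected[lo]
    hi, right_sum = len(expected) - 1, expected[-1]
    while right_sum < threshold and hi > lo + 1:
        hi -= 1
        right_sum += expected[hi]
    groups = [list(range(0, lo + 1))]
    groups += [[i] for i in range(lo + 1, hi)]
    if hi > lo:
        groups.append(list(range(hi, len(expected))))
    return groups


# Expected frequencies for the 13 original classes (12 males down to 0 males).
expected = [2.34727, 26.08246, 132.83570, 410.01256, 854.24665, 1265.63031,
            1367.27936, 1085.21070, 628.05501, 258.47513, 71.80317, 12.08884, 0.93284]

groups = lump_tails(expected, threshold=3)
lumped = [sum(expected[i] for i in group) for group in groups]
print(len(groups), [round(v, 5) for v in lumped])
```

The same index groups would then be applied to the observed frequencies, so that observed and expected classes stay matched.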
Because the actual type I error of G tests tends to be higher than the
intended level, a correction for G to obtain a better approximation to the chi-square
distribution has been suggested by Williams (1976). He divides G by a
correction factor q (not to be confused with a proportion) to be computed as
q = 1 + (a² − 1)/6nν. In this formula, ν is the number of degrees of freedom
appropriate to the G test. The effect of this correction is to reduce the observed
value of G slightly.
Since this is an example with expected frequencies based on an intrinsic
hypothesis, we have to subtract more than one degree of freedom from a for
the significance test. In this case, we estimated p♂ from the sample, and therefore
a second degree of freedom is subtracted from a, making the final number of
degrees of freedom a − 2 = 11 − 2 = 9. Comparing the corrected sample value
of G_adj = 94.837,09 with the critical value of χ² at 9 degrees of freedom, we find
it highly significant (P ≪ 0.001, assuming that the null hypothesis is correct).
We therefore reject this hypothesis and conclude that the sex ratios are not
binomially distributed. As is evident from the pattern of deviations, there is an
excess of sibships in which one sex or the other predominates. Had we applied
the chi-square test to these data, the critical value would have been the same
(χ²_α[9]).
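The bookkeeping for the degrees of freedom and the comparison with tabled critical values can be sketched as below. The function name `goodness_of_fit_df` and the dictionary are assumptions made for illustration; the three critical values are copied from the row for 9 degrees of freedom in Table IV.

```python
def goodness_of_fit_df(classes, estimated_parameters=0):
    """Degrees of freedom: a - 1, less one for each parameter estimated from the sample."""
    return classes - 1 - estimated_parameters


# Critical values of chi-square for 9 degrees of freedom (Table IV), keyed by alpha.
CHI2_CRIT_9_DF = {0.05: 16.919, 0.01: 21.666, 0.001: 27.877}

df = goodness_of_fit_df(classes=11, estimated_parameters=1)
g_adj = 94.83709
# Collect every tabled alpha level at which G_adj exceeds the critical value.
significant_at = [alpha for alpha, crit in CHI2_CRIT_9_DF.items() if g_adj > crit]
print(df, significant_at)
```

For an extrinsic hypothesis the second argument is simply left at zero, giving a − 1.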
Next we consider the case for a = 2 cells. The computation is carried out
by means of Expression (13.4), as before. In tests of goodness of fit involving
only two classes, the value of G as computed from this expression will typically
result in type I errors at a level higher than the intended one. Williams' correction
reduces the value of G and results in a more conservative test. An alternative
correction that has been widely applied is the correction for continuity, usually
applied in order to make the value of G or X² approximate the χ² distribution
more closely. We have found the continuity correction too conservative and
therefore recommend that Williams' correction be applied routinely, although
it will have little effect when sample sizes are large. For sample sizes of 25 or
less, work out the exact probabilities as shown in Table 4.3, Section 4.2.
The example of the two-cell case in Box 13.1 is a genetic cross with an
expected 3:1 ratio. The G test is adjusted by Williams' correction. The expected
frequencies differ very little from the observed frequencies, and it is no surprise,
therefore, that the resulting value of G_adj is far less than the critical value of χ²
at one degree of freedom. Inspection of the chi-square table reveals that roughly
80% of all samples from a population with the expected ratio would show
greater deviations than the sample at hand.
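A sketch of the two-cell computation follows. The function names are assumptions. The tail probability uses the identity P(χ²₁ > x) = erfc(√(x/2)), which holds for one degree of freedom only; for other degrees of freedom a table or a statistics library would be needed.

```python
import math


def two_cell_g_adj(f1, f2, p1):
    """Return (G, q, G_adj) for two classes tested against expected proportion p1."""
    n = f1 + f2
    expected = (p1 * n, (1.0 - p1) * n)
    g = 2.0 * sum(f * math.log(f / f_hat) for f, f_hat in zip((f1, f2), expected))
    q = 1.0 + 1.0 / (2.0 * n)  # Williams' correction for the two-cell case
    return g, q, g / q


def chi2_upper_tail_1df(x):
    """P(chi-square with 1 df > x); valid for one degree of freedom only."""
    return math.erfc(math.sqrt(x / 2.0))


# 130 wild-type and 46 ebony flies tested against a 3:1 ratio.
g, q, g_adj = two_cell_g_adj(130, 46, 0.75)
p = chi2_upper_tail_1df(g_adj)
print(f"G = {g:.5f}, q = {q:.5f}, G_adj = {g_adj:.4f}, P = {p:.2f}")
```

The large tail probability agrees with the conclusion in the text that the 3:1 hypothesis cannot be rejected.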
must be rejected, this is taken as evidence that the characters are linked—that
is, located on the same chromosome.
There are numerous instances in biology in which the second hypothesis,
concerning the independence of two properties, is of great interest and the first
hypothesis, regarding the true proportion of one or both properties, is of little
interest. In fact, often no hypothesis regarding the parametric values pᵢ can be
formulated by the investigator. We shall cite several examples of such situations,
which lead to the test of independence to be learned in this section. We employ
this test whenever we wish to test whether two different properties, each occurring
in two states, are dependent on each other. For instance, specimens of a certain
moth may occur in two color phases, light and dark. Fifty specimens of each
phase may be exposed in the open, subject to predation by birds. The number
of surviving moths is counted after a fixed interval of time. The proportion
predated may differ in the two color phases. The two properties in this example
are color and survival. We can divide our sample into four classes: light-colored
survivors, light-colored prey, dark survivors, and dark prey. If the probability
of being preyed upon is independent of the color of the moth, the expected
frequencies of these four classes can be simply computed as independent products
of the proportion of each color (in our experiment, ½) and the overall
proportion preyed upon in the entire sample. Should the statistical test of independence
explained below show that the two properties are not independent,
we are led to conclude that one of the color phases is more susceptible to
predation than the other. In this example, this is the issue of biological importance;
the exact proportions of the two properties are of little interest here. The
proportion of the color phases is arbitrary, and the proportion of survivors is
of interest only insofar as it differs for the two phases.
A second example might relate to a sampling experiment carried out by a
plant ecologist. A random sample is obtained of 100 individuals of a fairly rare
species of tree distributed over an area of 400 square miles. For each tree the
ecologist notes whether it is rooted in a serpentine soil or not, and whether the
leaves are pubescent or smooth. Thus the sample of n = 100 trees can be divided
into four groups: serpentine-pubescent, serpentine-smooth, nonserpentine-pubescent,
and nonserpentine-smooth. If the probability that a tree is or is not
pubescent is independent of its location, our null hypothesis of the independence
of these properties will be upheld. If, on the other hand, the proportion of
pubescence differs for the two types of soils, our statistical test will most probably
result in rejection of the null hypothesis of independence. Again, the expected
frequencies will simply be products of the independent proportions of
the two properties: serpentine versus nonserpentine, and pubescent versus
smooth. In this instance the proportions may themselves be of interest to the
investigator.
An analogous example may occur in medicine. Among 10,000 patients admitted
to a hospital, a certain proportion may be diagnosed as exhibiting disease
X. At the same time, all patients admitted are tested for several blood groups.
A certain proportion of these are members of blood group Y. Is there some
association between membership in blood group Y and susceptibility to disease X?
                          Dead    Alive     Σ
Bacteria and antiserum     13      44      57
Bacteria only              25      29      54
Σ                          38      73     111
Thus 13 mice received bacteria and antiserum but died, as seen in the table.
The marginal totals give the number of mice exhibiting any one property: 57
mice received bacteria and antiserum; 73 mice survived the experiment. Altogether
111 mice were involved in the experiment and constitute the total sample.
In discussing such a table it is convenient to label the cells of the table and
the row and column sums as follows:

  a        b        a + b
  c        d        c + d
a + c    b + d        n
From a two-way table one can systematically compute the expected frequencies
(based on the null hypothesis of independence) and compare them
with the observed frequencies. For example, the expected frequency for cell d
(bacteria, alive) would be

$$\hat{f}_{\text{bact,alv}} = n\hat{p}_{\text{bact,alv}} = n\hat{p}_{\text{bact}} \times \hat{p}_{\text{alv}} = n\left(\frac{c + d}{n}\right)\left(\frac{b + d}{n}\right) = \frac{(c + d)(b + d)}{n}$$

which in our case would be (54)(73)/111 = 35.514, a higher value than the
observed frequency of 29. We can proceed similarly to compute the expected
frequencies for each cell in the table by multiplying a row total by a column total,
and dividing the product by the grand total. The expected frequencies can be
displayed in a two-way table:

                          Dead      Alive       Σ
Bacteria and antiserum   19.514    37.486     57.000
Bacteria only            18.486    35.514     54.000
Σ                        38.000    73.000    111.000
You will note that the row and column sums of this table are identical to those
in the table of observed frequencies, which should not surprise you, since the
expected frequencies were computed on the basis of these row and column
totals. It should therefore be clear that a test of independence will not test
whether any property occurs at a given proportion but can only test whether
or not the two properties are manifested independently.
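The rule of multiplying a row total by a column total and dividing by the grand total can be sketched as follows. The function name `expected_frequencies` is an assumption for this illustration; the counts are those of the mouse experiment.

```python
def expected_frequencies(table):
    """Expected cell frequencies under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: bacteria and antiserum, bacteria only. Columns: dead, alive.
observed = [[13, 44], [25, 29]]
expected = expected_frequencies(observed)
for row in expected:
    print([round(value, 3) for value in row])
```

Because the expected frequencies are built from the marginal totals, their row and column sums necessarily reproduce those totals.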
The statistical test appropriate to a given 2 × 2 table depends on the underlying
model that it represents. There has been considerable confusion on this
subject in the statistical literature. For our purposes here it is not necessary to
distinguish among the three models of contingency tables. The G test illustrated
in Box 13.2 will give at least approximately correct results with moderate- to
large-sized samples regardless of the underlying model. When the test is applied
to the above immunology example, using the formulas given in Box 13.2, one
obtains G_adj = 6.7732. One could also carry out a chi-square test on the deviations
of the observed from the expected frequencies using Expression (13.2).
This would yield X² = 6.7966, using the expected frequencies in the table above.
Let us state without explanation that the observed G or X² should be compared
with χ² for one degree of freedom. We shall examine the reasons for this at the
end of this section. The probability of finding a fit as bad, or worse, to these
data is 0.005 < P < 0.01. We conclude, therefore, that mortality in these mice
is not independent of the presence of antiserum. We note that the percentage
mortality among those animals given bacteria and antiserum is (13)(100)/57 =
22.8%, considerably lower than the mortality of (25)(100)/54 = 46.3% among
the mice to whom only bacteria had been administered. Clearly, the antiserum
has been effective in reducing mortality.
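The chi-square computation and the two mortality percentages for the mouse data can be checked with the sketch below. Variable names are illustrative assumptions. Because the expected frequencies are carried at full precision rather than rounded to three decimals, the statistic may differ from the quoted value in the third decimal place.

```python
# Rows: bacteria and antiserum, bacteria only. Columns: dead, alive.
observed = [[13, 44], [25, 29]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum of (observed - expected)^2 / expected over the four cells.
x2 = 0.0
for i, r in enumerate(row_totals):
    for j, c in enumerate(col_totals):
        f_hat = r * c / n
        x2 += (observed[i][j] - f_hat) ** 2 / f_hat

# Percentage mortality in each treatment group.
mortality = [100.0 * row[0] / sum(row) for row in observed]

# Critical value of chi-square at alpha = 0.05 for one degree of freedom.
significant = x2 > 3.841
print(f"X2 = {x2:.3f}", [round(m, 1) for m in mortality], significant)
```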
In Box 13.2 we illustrate the G test applied to the sampling experiment in
plant ecology, dealing with trees rooted in two different soils and possessing
two types of leaves. With small sample sizes (n < 200), it is desirable to apply
Williams' correction, the application of which is shown in the box. The result
of the analysis shows clearly that we cannot reject the null hypothesis of independence
between soil type and leaf type. The presence of pubescent leaves is
independent of whether the tree is rooted in serpentine soils or not.
Tests of independence need not be restricted to 2 × 2 tables. In the two-way
cases considered in this section, we are concerned with only two properties,
but each of these properties may be divided into any number of classes. Thus
organisms may occur in four color classes and be sampled at five different times
during the year, yielding a 4 × 5 test of independence. Such a test would examine
whether the color proportions exhibited by the marginal totals are independent
of the time of year at which the organisms were sampled.
BOX 13.2
2 × 2 test of independence.

A plant ecologist samples 100 trees of a rare species from a 400-square-mile area.
He records for each tree whether it is rooted in serpentine soils or not, and whether
its leaves are pubescent or smooth.

                       Leaves
Soil              Pubescent    Smooth    Totals
Serpentine           12          22        34
Not serpentine       16          50        66
Totals               28          72       100 = n

In algebraic notation the table is

                         Σ
     a        b        a + b
     c        d        c + d
Σ  a + c    b + d    a + b + c + d = n

Compute the following quantities.
1. Σ f ln f for the cell frequencies = 12 ln 12 + 22 ln 22 + 16 ln 16 + 50 ln 50
= 337.784,38
2. Σ f ln f for the row and column totals = 34 ln 34 + 66 ln 66 + 28 ln 28 + 72 ln 72
= 797.635,16
3. n ln n = 100 ln 100 = 460.517,02
4. Compute G as follows:
G = 2(quantity 1 − quantity 2 + quantity 3)
= 2(337.784,38 − 797.635,16 + 460.517,02)
= 2(0.666,24) = 1.332,49

Williams' correction for a 2 × 2 table is

$$q = 1 + \frac{\left(\dfrac{n}{a+b} + \dfrac{n}{c+d} - 1\right)\left(\dfrac{n}{a+c} + \dfrac{n}{b+d} - 1\right)}{6n}$$

For these data we obtain

$$q = 1 + \frac{\left(\dfrac{100}{34} + \dfrac{100}{66} - 1\right)\left(\dfrac{100}{28} + \dfrac{100}{72} - 1\right)}{6(100)} = 1.022{,}81$$

$$G_{\text{adj}} = \frac{G}{q} = \frac{1.332{,}49}{1.022{,}81} = 1.3028$$

Compare G_adj with the critical value of χ² for one degree of freedom. Since our
observed G_adj is much less than χ²_.05[1] = 3.841, we accept the null hypothesis
that the leaf type is independent of the type of soil in which the tree is rooted.
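The four steps of Box 13.2 and Williams' correction for a 2 × 2 table can be sketched in Python as follows. The function name `g_test_2x2` is an assumption made for this illustration, and the sketch assumes all four cell frequencies are greater than zero.

```python
import math


def g_test_2x2(a, b, c, d):
    """Return (G, q, G_adj) for a 2 x 2 table laid out as [[a, b], [c, d]]."""
    n = a + b + c + d
    # Quantity 1: sum of f ln f over the cell frequencies.
    cells = sum(f * math.log(f) for f in (a, b, c, d))
    # Quantity 2: sum of f ln f over the row and column totals.
    margins = sum(t * math.log(t) for t in (a + b, c + d, a + c, b + d))
    # Quantity 3: n ln n.
    total = n * math.log(n)
    g = 2.0 * (cells - margins + total)
    # Williams' correction for a 2 x 2 table.
    q = 1.0 + (n / (a + b) + n / (c + d) - 1.0) * (n / (a + c) + n / (b + d) - 1.0) / (6.0 * n)
    return g, q, g / q


# Tree data of Box 13.2: serpentine (12, 22), not serpentine (16, 50).
g, q, g_adj = g_test_2x2(12, 22, 16, 50)
print(f"G = {g:.5f}, q = {q:.5f}, G_adj = {g_adj:.4f}")
```

Applied to the mouse table (13, 44, 25, 29), the same function gives the adjusted G quoted for the immunology example in the text.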
BOX 13.3
R × C test of independence using the G test.

Frequencies for the M and N blood groups in six populations from Lebanon.

Populations (b = 6)    Genotypes (a = 3): MM, MN, NN    Totals    %MM    %MN    %NN

= 818 ln 818 + ··· + 494 ln 494 = 5486.213 + ··· + 3064.053 =

Williams' correction:

$$q = 1 + \frac{(a + 1)(b + 1)}{6n} = 1 + \frac{(3 + 1)(6 + 1)}{6(2466)} = 1.001{,}892$$
Such tests are often called R × C tests of independence, R and C standing for the number
of rows and columns in the frequency table. Another case, examined in detail
in Box 13.3, concerns the MN blood groups which occur in human populations
in three genotypes: MM, MN, and NN. Frequencies of these blood groups
can be obtained in samples of human populations and the samples compared
for differences in these frequencies. In Box 13.3 we feature frequencies from six
Lebanese populations and test whether the proportions of the three groups are
independent of the populations sampled, or in other words, whether the frequencies
of the three genotypes differ among these six populations.
As shown in Box 13.3, the following is a simple general rule for computation
of the G test of independence:

G = 2[(Σ f ln f for the cell frequencies) − (Σ f ln f for the row and column totals) + n ln n]

MM genotypes in the third population (Greek Orthodox) and the much lower
frequency of the MN heterozygotes in the last population (Sunni Moslems).
The degrees of freedom for tests of independence are always the same and
can be computed using the rules given earlier (Section 13.2). There are k cells
in the table but we must subtract one degree of freedom for each independent
parameter we have estimated from the data. We must, of course, subtract one
degree of freedom for the observed total sample size, n. We have also estimated
a − 1 row probabilities and b − 1 column probabilities, where a and b are
the number of rows and columns in the table, respectively. Thus, there are
k − (a − 1) − (b − 1) − 1 = k − a − b + 1 degrees of freedom for the test.
But since k = a × b, this expression becomes (a × b) − a − b + 1 = (a − 1) ×
(b − 1), the conventional expression for the degrees of freedom in a two-way
test of independence. Thus, the degrees of freedom in the example of Box 13.3,
a 6 × 3 case, was (6 − 1) × (3 − 1) = 10. In all 2 × 2 cases there is clearly only
(2 − 1) × (2 − 1) = 1 degree of freedom.
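The general rule for G and the degrees of freedom of an R × C table can be sketched as below. Both function names are assumptions made for illustration, and the sketch assumes every cell frequency is greater than zero. The correction factor follows the form used in Box 13.3. Since the Lebanese frequencies themselves are not reproduced here, the 2 × 2 tree data of Box 13.2 serve as the worked input.

```python
import math


def g_test_independence(table):
    """Return (G, df) for an R x C table of positive counts, given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    cells = sum(f * math.log(f) for row in table for f in row)
    margins = sum(t * math.log(t) for t in row_totals + col_totals)
    g = 2.0 * (cells - margins + n * math.log(n))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return g, df


def williams_q_rxc(a, b, n):
    """Correction factor in the form applied in Box 13.3: 1 + (a + 1)(b + 1) / 6n."""
    return 1.0 + (a + 1) * (b + 1) / (6.0 * n)


g, df = g_test_independence([[12, 22], [16, 50]])
q_box_13_3 = williams_q_rxc(3, 6, 2466)
print(f"G = {g:.5f}, df = {df}, q (Box 13.3) = {q_box_13_3:.6f}")
```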
Another name for test of independence is test of association. If two properties
are not independent of each other they are associated. Thus, in the example
testing relative frequency of two leaf types on two different soils, we
can speak of an association between leaf types and soils. In the immunology
experiment there is a negative association between presence of antiserum and
mortality. Association is thus similar to correlation, but it is a more general
term, applying to attributes as well as continuous variables. In the 2 × 2 tests
of independence of this section, one way of looking for suspected lack of
independence was to examine the percentage occurrence of one of the properties
in the two classes based on the other property. Thus we compared the
percentage of smooth leaves on the two types of soils, or we studied the percentage
mortality with or without antiserum. This way of looking at a test of
independence suggests another interpretation of these tests as tests for the
significance of differences between two percentages.
Exercises
13.1 In an experiment to determine the mode of inheritance of a green mutant, 146
wild-type and 30 mutant offspring were obtained when F₁ generation houseflies
were crossed. Test whether the data agree with the hypothesis that the ratio of
wild type to mutants is 3:1. ANS. G = 6.4624, G_adj = 6.441, 1 df, χ²_.05[1] = 3.841.
13.2 Locality A has been exhaustively collected for snakes of species S. An examination
of the 167 adult males that have been collected reveals that 35 of
these have pale-colored bands around their necks. From locality B, 90 miles
away, we obtain a sample of 27 adult males of the same species, 6 of which show
the bands. What is the chance that both samples are from the same statistical
population with respect to frequency of bands?
13.3 Of 445 specimens of the butterfly Erebia epipsodea from mountainous areas,
2.5% have light color patches on their wings. Of 65 specimens from the prairie,
70.8% have such patches (unpublished data by P. R. Ehrlich). Is this difference
significant? Hint: First work backwards to obtain original frequencies. ANS.
G = 175.5163, 1 df, G_adj = 171.4533.
Chromosome CD
Chromosome EF Td/Td 22 96 75
St/Td 8 56 64
St/St 0 6 6
Antibiotic Antibiotic
+ Ninwstilith' + placebo
Negative opinion 1 16
Positive opinion 19 4
Mathematical Appendix
A1.1 Demonstration that the sum of the deviations from the mean is equal
to zero.
We have to learn two common rules of statistical algebra. We can open a
pair of parentheses with a Σ sign in front of them by treating the Σ as though
it were a common factor. We have

$$\sum_{i=1}^{n} (A_i + B_i) = \sum_{i=1}^{n} A_i + \sum_{i=1}^{n} B_i$$

Also, when $\sum_{i=1}^{n} C$ is developed during an algebraic operation, where C is
a constant, this can be computed as follows:

$$\sum_{i=1}^{n} C = C + C + \cdots + C \quad (n \text{ terms}) = nC$$

Since the mean is a constant in a given problem, ΣȲ = nȲ. By definition of the deviations y = Y − Ȳ,

$$\begin{aligned}
\sum y &= \sum (Y - \bar{Y}) \\
&= \sum Y - n\bar{Y} \\
&= \sum Y - \frac{n \sum Y}{n} \qquad \left(\text{since } \bar{Y} = \frac{\sum Y}{n}\right) \\
&= \sum Y - \sum Y
\end{aligned}$$

Therefore, Σy = 0.
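A1.2 Demonstration that Expression (3.8), the computational formula for the
sum of squares, equals Expression (3.7), the expression originally developed for
this statistic.

$$\begin{aligned}
\sum (Y - \bar{Y})^2 &= \sum (Y^2 - 2Y\bar{Y} + \bar{Y}^2) \\
&= \sum Y^2 - 2\bar{Y}\sum Y + n\bar{Y}^2 \\
&= \sum Y^2 - \frac{2\left(\sum Y\right)^2}{n} + \frac{n\left(\sum Y\right)^2}{n^2} \qquad \left(\text{since } \bar{Y} = \frac{\sum Y}{n}\right) \\
&= \sum Y^2 - \frac{2\left(\sum Y\right)^2}{n} + \frac{\left(\sum Y\right)^2}{n}
\end{aligned}$$

Hence,

$$\sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{\left(\sum Y\right)^2}{n}$$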
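Both demonstrations (A1.1 and A1.2) can be checked numerically. The sketch below uses a small hypothetical sample, chosen only so that the arithmetic is easy to follow by hand; it is not taken from the text.

```python
# Hypothetical sample of five observations (mean = 5).
Y = [4.0, 7.0, 1.0, 9.0, 4.0]
n = len(Y)
mean = sum(Y) / n

# A1.1: the deviations from the mean sum to zero.
deviations = [y - mean for y in Y]
sum_dev = sum(deviations)

# A1.2: the sum of squares by definition and by the computational formula.
ss_definition = sum(d ** 2 for d in deviations)
ss_computational = sum(y ** 2 for y in Y) - sum(Y) ** 2 / n

print(sum_dev, ss_definition, ss_computational)
```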
A1.3 Simplified formulas for standard error of the difference between two
means.
The standard error squared from Expression (8.2) is

$$\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left(\frac{n_1 + n_2}{n_1 n_2}\right)$$

When n₁ ≠ n₂ but each is large, so that (n₁ − 1) ≈ n₁ and (n₂ − 1) ≈ n₂, the
standard error squared of Expression (8.2) simplifies to

$$\left[\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}\right]\left(\frac{n_1 + n_2}{n_1 n_2}\right) = \frac{n_1 s_1^2}{n_1 n_2} + \frac{n_2 s_2^2}{n_1 n_2} = \frac{s_1^2}{n_2} + \frac{s_2^2}{n_1}$$
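$$t_s = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\dfrac{1}{n}\left(s_1^2 + s_2^2\right)}} \quad (\text{from Box 8.2}) \quad = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\dfrac{1}{n(n - 1)}\left(\sum y_1^2 + \sum y_2^2\right)}}$$

$$t_s^2 = \frac{n(n - 1)(\bar{Y}_1 - \bar{Y}_2)^2}{\sum y_1^2 + \sum y_2^2}$$

For the single-classification anova of the same two groups of n items each,

$$\begin{aligned}
MS_{\text{means}} &= \frac{1}{2 - 1}\left[(\bar{Y}_1 - \bar{\bar{Y}})^2 + (\bar{Y}_2 - \bar{\bar{Y}})^2\right] \\
&= \left(\bar{Y}_1 - \frac{\bar{Y}_1 + \bar{Y}_2}{2}\right)^2 + \left(\bar{Y}_2 - \frac{\bar{Y}_1 + \bar{Y}_2}{2}\right)^2 \qquad \left(\text{since } \bar{\bar{Y}} = \frac{\bar{Y}_1 + \bar{Y}_2}{2}\right) \\
&= \left(\frac{\bar{Y}_1 - \bar{Y}_2}{2}\right)^2 + \left(\frac{\bar{Y}_2 - \bar{Y}_1}{2}\right)^2 \\
&= \tfrac{1}{2}(\bar{Y}_1 - \bar{Y}_2)^2
\end{aligned}$$

$$MS_{\text{groups}} = n \times MS_{\text{means}} = n\left[\tfrac{1}{2}(\bar{Y}_1 - \bar{Y}_2)^2\right] = \frac{n}{2}(\bar{Y}_1 - \bar{Y}_2)^2$$

$$MS_{\text{within}} = \frac{\sum y_1^2 + \sum y_2^2}{2(n - 1)}$$

$$F_s = \frac{MS_{\text{groups}}}{MS_{\text{within}}} = \frac{\dfrac{n}{2}(\bar{Y}_1 - \bar{Y}_2)^2}{\left(\sum y_1^2 + \sum y_2^2\right)/[2(n - 1)]} = \frac{n(n - 1)(\bar{Y}_1 - \bar{Y}_2)^2}{\sum y_1^2 + \sum y_2^2} = t_s^2$$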
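$$\begin{aligned}
\sum xy &= \sum (X - \bar{X})(Y - \bar{Y}) \\
&= \sum XY - \bar{X}\sum Y - \bar{Y}\sum X + n\bar{X}\bar{Y} \\
&= \sum XY - \bar{X}n\bar{Y} - \bar{Y}n\bar{X} + n\bar{X}\bar{Y} \qquad \left(\text{since } \frac{\sum Y}{n} = \bar{Y} \text{ and } \frac{\sum X}{n} = \bar{X}\right) \\
&= \sum XY - n\bar{X}\bar{Y} \\
&= \sum XY - n\bar{X}\frac{\sum Y}{n} \\
&= \sum XY - \bar{X}\sum Y
\end{aligned}$$

Similarly,

$$\sum xy = \sum XY - \bar{Y}\sum X$$

and

$$\sum xy = \sum XY - \frac{\left(\sum X\right)\left(\sum Y\right)}{n} \tag{11.5}$$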
Therefore,

$$\sum d_{Y \cdot X}^2 = \sum y^2 - 2b\sum xy + b^2\sum x^2 = \sum y^2 - 2\frac{\left(\sum xy\right)^2}{\sum x^2} + \frac{\left(\sum xy\right)^2}{\left(\sum x^2\right)^2}\sum x^2$$

or

$$\sum d_{Y \cdot X}^2 = \sum y^2 - \frac{\left(\sum xy\right)^2}{\sum x^2} \tag{11.6}$$
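A1.7 Demonstration that the sum of squares of the dependent variable in
regression can be partitioned exactly into explained and unexplained sums of
squares, the cross products canceling out.
By definition (Section 11.5),

$$y = \hat{y} + d_{Y \cdot X}$$

so that

$$\sum y^2 = \sum (\hat{y} + d_{Y \cdot X})^2 = \sum \hat{y}^2 + \sum d_{Y \cdot X}^2 + 2\sum \hat{y}\, d_{Y \cdot X}$$

The cross-product term vanishes. With ŷ = bx and d_{Y·X} = y − bx,

$$\begin{aligned}
\sum \hat{y}\, d_{Y \cdot X} &= \sum bx(y - bx) \\
&= b\sum xy - b^2\sum x^2 \\
&= b\sum xy - b\,\frac{\sum xy}{\sum x^2}\sum x^2 \qquad \left(\text{since } b = \frac{\sum xy}{\sum x^2}\right) \\
&= b\sum xy - b\sum xy \\
&= 0
\end{aligned}$$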
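The partition of the sum of squares, together with the computational formula for the sum of products, can be verified numerically. The five data pairs below are hypothetical and were chosen only to keep the numbers simple.

```python
# Hypothetical paired observations.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 1.0, 4.0, 3.0, 6.0]
n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n
x = [xi - mean_x for xi in X]   # deviations of X from its mean
y = [yi - mean_y for yi in Y]   # deviations of Y from its mean

# Sum of products by definition and by the computational formula.
sum_xy = sum(u * v for u, v in zip(x, y))
sum_xy_computational = sum(u * v for u, v in zip(X, Y)) - sum(X) * sum(Y) / n

# Regression coefficient, predicted deviations, and unexplained deviations.
slope = sum_xy / sum(u * u for u in x)
y_hat = [slope * u for u in x]
d = [v - w for v, w in zip(y, y_hat)]

total_ss = sum(v * v for v in y)
explained_ss = sum(w * w for w in y_hat)
unexplained_ss = sum(e * e for e in d)
cross_products = sum(w * e for w, e in zip(y_hat, d))
print(total_ss, explained_ss, unexplained_ss, cross_products)
```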
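where σ₁ and σ₂ are standard deviations of Y₁ and Y₂, respectively, and ρ₁₂ is
the parametric correlation coefficient between Y₁ and Y₂.
If Z = Y₁ + Y₂, then

$$\begin{aligned}
\sigma_Z^2 &= \frac{1}{n}\sum (Z - \bar{Z})^2 = \frac{1}{n}\sum \left[(Y_1 + Y_2) - \frac{1}{n}\sum (Y_1 + Y_2)\right]^2 \\
&= \frac{1}{n}\sum \left[(Y_1 - \bar{Y}_1) + (Y_2 - \bar{Y}_2)\right]^2 = \frac{1}{n}\sum (y_1 + y_2)^2 \\
&= \frac{1}{n}\sum (y_1^2 + y_2^2 + 2y_1 y_2) = \frac{1}{n}\sum y_1^2 + \frac{1}{n}\sum y_2^2 + \frac{2}{n}\sum y_1 y_2 \\
&= \sigma_1^2 + \sigma_2^2 + 2\sigma_{12}
\end{aligned}$$

Since ρ₁₂ = σ₁₂/σ₁σ₂, we have σ₁₂ = ρ₁₂σ₁σ₂.
Therefore

$$\sigma_Z^2 = \sigma_{(Y_1 + Y_2)}^2 = \sigma_1^2 + \sigma_2^2 + 2\rho_{12}\sigma_1\sigma_2$$

Similarly,

$$\sigma_D^2 = \sigma_{(Y_1 - Y_2)}^2 = \sigma_1^2 + \sigma_2^2 - 2\rho_{12}\sigma_1\sigma_2$$

The analogous expressions apply to sample statistics. Thus

$$s_{(Y_1 - Y_2)}^2 = s_1^2 + s_2^2 - 2r_{12}s_1 s_2 \tag{12.9}$$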
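The sample versions of these identities can be confirmed with a short numerical check. The four paired observations are hypothetical, and the helper `variance` is an assumption of this sketch (it uses the n − 1 divisor throughout).

```python
import math

# Hypothetical paired observations.
Y1 = [2.0, 4.0, 6.0, 8.0]
Y2 = [1.0, 3.0, 2.0, 6.0]
n = len(Y1)


def variance(values):
    """Sample variance with divisor n - 1."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


m1, m2 = sum(Y1) / n, sum(Y2) / n
cov12 = sum((u - m1) * (v - m2) for u, v in zip(Y1, Y2)) / (n - 1)
s1, s2 = math.sqrt(variance(Y1)), math.sqrt(variance(Y2))
r12 = cov12 / (s1 * s2)

# Variances of the sum and of the difference, computed directly.
var_sum = variance([u + v for u, v in zip(Y1, Y2)])
var_diff = variance([u - v for u, v in zip(Y1, Y2)])
print(var_sum, s1 ** 2 + s2 ** 2 + 2 * r12 * s1 * s2)
print(var_diff, s1 ** 2 + s2 ** 2 - 2 * r12 * s1 * s2)
```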
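A1.9 Proof that the general expression for the G test can be simplified to
Expressions (13.4) and (13.5).
In general, G is twice the natural logarithm of the ratio of the probability
of the sample with all parameters estimated from the data and the probability
of the sample assuming the null hypothesis is true. Assuming a multinomial
distribution, this ratio is

$$L = \frac{\dfrac{n!}{f_1!\, f_2! \cdots f_a!}\; p_1^{f_1} p_2^{f_2} \cdots p_a^{f_a}}{\dfrac{n!}{f_1!\, f_2! \cdots f_a!}\; \hat{p}_1^{f_1} \hat{p}_2^{f_2} \cdots \hat{p}_a^{f_a}} = \prod_{i=1}^{a} \left(\frac{p_i}{\hat{p}_i}\right)^{f_i}$$

where fᵢ is the observed frequency, pᵢ is the observed proportion, and p̂ᵢ the
expected proportion of class i, while n is sample size, the sum of the observed
frequencies over the a classes.

$$G = 2\ln L = 2\sum^{a} f_i \ln\left(\frac{p_i}{\hat{p}_i}\right)$$

Since fᵢ = npᵢ and f̂ᵢ = np̂ᵢ,

$$G = 2\sum^{a} f_i \ln\left(\frac{f_i}{\hat{f}_i}\right) \tag{13.4}$$

Replacing f̂ᵢ by np̂ᵢ,

$$G = 2\sum^{a} f_i \ln\left(\frac{f_i}{n\hat{p}_i}\right) = 2\left[\sum^{a} f_i \ln\left(\frac{f_i}{\hat{p}_i}\right) - \sum^{a} f_i \ln n\right] = 2\left[\sum^{a} f_i \ln\left(\frac{f_i}{\hat{p}_i}\right) - n\ln n\right] \tag{13.5}$$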
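That the two forms of G agree can be checked numerically. The sketch below uses the Drosophila data of Box 13.1 (130 and 46 flies, expected proportions 0.75 and 0.25); the variable names are illustrative assumptions.

```python
import math

observed = [130, 46]     # wild type, ebony
p_hat = [0.75, 0.25]     # expected proportions under the 3:1 hypothesis
n = sum(observed)

# Expression (13.4): observed against expected frequencies n * p_hat.
g_13_4 = 2.0 * sum(f * math.log(f / (n * p)) for f, p in zip(observed, p_hat))

# Expression (13.5): observed against expected proportions, less n ln n.
g_13_5 = 2.0 * (sum(f * math.log(f / p) for f, p in zip(observed, p_hat)) - n * math.log(n))

print(f"{g_13_4:.5f} {g_13_5:.5f}")
```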
Statistical Tables
TABLE I
Twenty-five hundred random digits.
1 2 3 4 5 6 7 8 9 10
1 48461 14952 72619 73689 52059 37086 60050 86192 67049 64739
2 76534 38149 49692 31366 52093 15422 20498 33901 10319 43397
3 70437 25861 38504 14752 23757 59660 67844 78815 23758 86814
4 59584 03370 42806 11393 71722 93804 09095 07856 55589 46020
5 04285 58554 16085 51555 27501 73883 33427 33343 45507 50063
6 77340 10412 69189 85171 29082 44785 83638 02583 96483 76553
7 59183 62687 91778 80354 23512 97219 65921 02035 59847 91403
8 91800 04281 39979 03927 82564 28777 59049 97532 54540 79472
9 12066 24817 81099 48940 69554 55925 48379 12866 51232 21580
10 69907 91751 53512 23748 65906 91385 84983 27915 48491 91068
11 80467 04873 54053 25955 48518 13815 37707 68687 15570 08890
12 78057 67835 28302 45048 56761 97725 58438 91528 24645 18544
13 05648 39387 78191 88415 60269 94880 58812 42931 71898 61534
14 22304 39246 01350 99451 61862 78688 30339 60222 74052 25740
15 61346 50269 67005 40442 33100 16742 61640 21046 31909 72641
16 66793 37696 27965 30459 91011 51426 31006 77468 61029 57108
17 86411 48809 36698 42453 83061 43769 39948 87031 30767 13953
18 62098 12825 81744 28882 27369 88183 65846 92545 09065 22655
19 68775 06261 54265 16203 23340 84750 16317 88686 86842 00879
20 52679 19595 13687 74872 89181 01939 18447 10787 76246 80072
21 84096 87152 20719 25215 04349 54434 72344 93008 83282 31670
22 63964 55937 21417 49944 38356 98404 14850 17994 17161 98981
23 31191 75131 72386 11689 95727 05414 88727 45583 22568 77700
24 30545 68523 29850 67833 05622 89975 79042 27142 99257 32349
25 52573 91001 52315 26430 54175 30122 31796 98842 37600 26025
26 1658ft 81842 01076 99414 31574 94719 34656 80018 86988 79234
27 81841 88481 61191 25013 30272 23388 22463 65774 10029 58376
28 43563 66829 72838 08074 57080 15446 11034 98143 74989 26885
29 19945 84193 57581 77252 85604 45412 43556 27518 90572 00563
30 79374 23796 16919 99691 80276 32818 62953 78831 54395 30705
31 48503 26615 43980 09810 38289 66679 73799 48418 12647 40044
32 32049 65541 37937 41105 70106 89706 40829 40789 59547 00783
33 18547 71562 95493 34112 76895 46766 96395 31718 48302 45893
34 03180 96742 61486 43305 34183 99605 67803 13491 09243 29557
35 94822 24738 67749 83748 59799 25210 31093 62925 72061 69991
36 34330 60599 85828 19152 68499 27977 35611 96240 62747 89529
37 43770 81537 59527 95674 76692 86420 69930 10020 72881 12532
38 56908 77192 50623 41215 14311 42834 80651 93750 59957 31211
39 32787 07189 80539 75927 75475 73965 11796 72140 48944 74156
40 52441 78392 11733 57703 29133 71164 55355 31006 25526 55790
41 22377 54723 18227 28449 04570 18882 00023 67101 06895 08915
42 18376 73460 88841 39602 34049 20589 05701 08249 74213 25220
43 53201 28610 87957 21497 64729 64983 71551 99016 87903 63875
44 34919 78901 59710 27396 02593 05665 11964 44134 00273 76358
45 33617 92159 21971 16901 57383 34262 41744 60891 57624 06962
46 70010 40964 98780 72418 52571 18415 64362 90636 38034 04909
47 19282 68447 35665 31530 59832 49181 21914 65742 89815 39231
48 91429 73328 13266 54898 68795 40948 80808 63887 89939 47938
49 97637 78393 33021 05867 86520 45363 43066 00988 64040 09803
50 95150 07625 05255 83254 93943 52325 93230 62668 79529 65964
TABLE II
Areas of the normal curve
y/σ 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 y/σ
0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359 0.0
0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0753 0.1
0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141 0.2
0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517 0.3
0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879 0.4
0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224 0.5
0.6 .2257 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2517 .2549 0.6
0.7 .2580 .2611 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852 0.7
0.8 .2881 .2910 .2939 .2967 .2995 .3023 .3051 .3078 .3106 .3133 0.8
0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389 0.9
1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621 1.0
1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830 1.1
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015 1.2
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177 1.3
1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319 1.4
1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441 1.5
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545 1.6
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633 1.7
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706 1.8
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767 1.9
2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817 2.0
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857 2.1
2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890 2.2
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916 2.3
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936 2.4
2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952 2.5
2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964 2.6
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974 2.7
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981 2.8
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986 2.9
3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990 3.0
3.1 .4990 .4991 .4991 .4991 .4992 .4992 .4992 .4992 .4993 .4993 3.1
3.2 .4993 .4993 .4994 .4994 .4994 .4994 .4994 .4995 .4995 .4995 3.2
3.3 .4995 .4995 .4995 .4996 .4996 .4996 .4996 .4996 .4996 .4997 3.3
3.4 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4998 3.4
y/σ Tabled area
3.5 .499767
3.6 .499841
3.7 .499892
3.8 .499928
3.9 .499952
4.0 .499968
4.1 .499979
4.2 .499987
4.3 .499991
4.4 .499995
4.5 .499997
4.6 .499998
4.7 .499999
4.8 .499999
4.9 .500000
Note: The quantity given is the area under the standard normal density function between the mean
and the critical point. The area is generally labeled ½ − α (as shown in the figure). By inverse
interpolation one can find the number of standard deviations corresponding to a given area.
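Inverse interpolation in Table II can be sketched as follows. The function name `inverse_interpolate` is an assumption made for this illustration; the two bracketing entries are the tabled areas for y/σ = 1.64 and 1.65, and the target area of 0.4500 is chosen as an example.

```python
def inverse_interpolate(area, lower, upper):
    """Linearly interpolate y/sigma for a given area between two tabled points.

    lower and upper are (y_over_sigma, tabled_area) pairs that bracket the area.
    """
    (z0, a0), (z1, a1) = lower, upper
    if not a0 <= area <= a1:
        raise ValueError("area must lie between the two tabled areas")
    return z0 + (area - a0) * (z1 - z0) / (a1 - a0)


# Tabled areas from the row for 1.6, columns 0.04 and 0.05.
z = inverse_interpolate(0.4500, (1.64, 0.4495), (1.65, 0.4505))
print(round(z, 3))
```

Linear interpolation is adequate here because adjacent table entries are only 0.01 standard deviations apart.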
TABLE III
Critical values of Student's t distribution
ν 0.9 0.5 0.4 0.2 0.1 0.05 0.02 0.01 0.001 ν
1 .158 1.000 1.376 3.078 6.314 12.706 31.821 63.657 636.619 1
2 .142 .816 1.061 1.886 2.920 4.303 6.965 9.925 31.598 2
3 .137 .765 .978 1.638 2.353 3.182 4.541 5.841 12.924 3
4 .134 .741 .941 1.533 2.132 2.776 3.747 4.604 8.610 4
5 .132 .727 .920 1.476 2.015 2.571 3.365 4.032 6.869 5
6 .131 .718 .906 1.440 1.943 2.447 3.143 3.707 5.959 6
7 .130 .711 .896 1.415 1.895 2.365 2.998 3.499 5.408 7
8 .130 .706 .889 1.397 1.860 2.306 2.896 3.355 5.041 8
9 .129 .703 .883 1.383 1.833 2.262 2.821 3.250 4.781 9
10 .129 .700 .879 1.372 1.812 2.228 2.764 3.169 4.587 10
Note: If a one-tailed test is desired, the probabilities at the head of the table must be halved. For degrees of
freedom ν > 30, interpolate between the values of the argument ν. The table is designed for harmonic
interpolation. Thus, to obtain t_.05[43], interpolate between t_.05[40] = 2.021 and t_.05[60] = 2.000, which are furnished
in the table. Transform the arguments into 120/ν = 120/43 = 2.791 and interpolate between 120/60 = 2.000 and
120/40 = 3.000 by ordinary linear interpolation:
t_.05[43] = (0.791 × 2.021) + [(1 − 0.791) × 2.000] = 2.017
When ν > 120, interpolate between 120/∞ = 0 and 120/120 = 1. Values in this table have been taken from a
more extensive one (table III) in R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and
Medical Research, 5th ed. (Oliver & Boyd, Edinburgh, 1958) with permission of the authors and their publishers.
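The harmonic interpolation described in the note can be sketched in Python. The function name and its parameters are assumptions made for illustration; the tabled values 2.021 (40 df) and 2.000 (60 df) are those quoted in the note.

```python
def harmonic_interpolate(df, df_low, t_low, df_high, t_high, scale=120.0):
    """Interpolate a critical t value linearly in scale/df between two tabled rows."""
    x, x_low, x_high = scale / df, scale / df_low, scale / df_high
    # Weight placed on the entry with the smaller degrees of freedom.
    weight_low = (x - x_high) / (x_low - x_high)
    return weight_low * t_low + (1.0 - weight_low) * t_high


t_43 = harmonic_interpolate(43, 40, 2.021, 60, 2.000)
print(round(t_43, 3))
```

Interpolating in 120/ν rather than in ν itself reflects the fact that critical values of t change roughly linearly with the reciprocal of the degrees of freedom.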
TABLE IV
Critical values of the chi-square distribution
ν .995 .975 .9 .5 .1 .05 .025 .01 .005 .001 ν
6 0.676 1.237 2.204 5.348 10.645 12.592 14.449 16.812 18.548 22.458 6
7 0.989 1.690 2.833 6.346 12.017 14.067 16.013 18.475 20.278 24.322 7
8 1.344 2.180 3.490 7.344 13.362 15.507 17.535 20.090 21.955 26.124 8
9 1.735 2.700 4.168 8.343 14.684 16.919 19.023 21.666 23.589 27.877 9
10 2.156 3.247 4.865 9.342 15.987 18.307 20.483 23.209 25.188 29.588 10
11 2.603 3.816 5.578 10.341 17.275 19.675 21.920 24.725 26.757 31.264 11
12 3.074 4.404 6.304 11.340 18.549 21.026 23.337 26.217 28.300 32.910 12
13 3.565 5.009 7.042 12.340 19.812 22.362 24.736 27.688 29.819 34.528 13
14 4.075 5.629 7.790 13.339 21.064 23.685 26.119 29.141 31.319 36.123 14
15 4.601 6.262 8.547 14.339 22.307 24.996 27.488 30.578 32.801 37.697 15
16 5.142 6.908 9.312 15.338 23.542 26.296 28.845 32.000 34.267 39.252 16
17 5.697 7.564 10.085 16.338 24.769 27.587 30.191 33.409 35.718 40.790 17
18 6.265 8.231 10.865 17.338 25.989 28.869 31.526 34.805 37.156 42.312 18
19 6.844 8.907 11.651 18.338 27.204 30.144 32.852 36.191 38.582 43.820 19
20 7.434 9.591 12.443 19.337 28.412 31.410 34.170 37.566 39.997 45.315 20
21 8.034 10.283 13.240 20.337 29.615 32.670 35.479 38.932 41.401 46.797 21
22 8.643 10.982 14.042 21.337 30.813 33.924 36.781 40.289 42.796 48.268 22
23 9.260 11.688 14.848 22.337 32.007 35.172 38.076 41.638 44.181 49.728 23
24 9.886 12.401 15.659 23.337 33.196 36.415 39.364 42.980 45.556 51.179 24
25 10.520 13.120 16.473 24.337 34.382 37.652 40.646 44.314 46.928 52.620 25
26 11.160 13.844 17.292 25.336 35.563 38.885 41.923 45.642 48.290 54.052 26
27 11.808 14.573 18.114 26.336 36.741 40.113 43.194 46.963 49.645 55.476 27
28 12.461 15.308 18.939 27.336 37.916 41.337 44.461 48.278 50.993 56.892 28
29 13.121 16.047 19.768 28.336 39.088 42.557 45.722 49.588 52.336 58.301 29
30 13.787 16.791 20.599 29.336 40.256 43.773 46.979 50.892 53.672 59.703 30
31 14.458 17.539 21.434 30.336 41.422 44.985 48.232 52.191 55.003 61.098 31
32 15.134 18.291 22.271 31.336 42.585 46.194 49.480 53.486 56.329 62.487 32
33 15.815 19.047 23.110 32.336 43.745 47.400 50.725 54.776 57.649 63.870 33
34 16.501 19.806 23.952 33.336 44.903 48.602 51.966 56.061 58.964 65.247 34
35 17.192 20.569 24.797 34.336 46.059 49.802 53.203 57.342 60.275 66.619 35
36 17.887 21.336 25.643 35.336 47.212 50.998 54.437 58.619 61.582 67.985 36
37 18.586 22.106 26.492 36.335 48.363 52.192 55.668 59.892 62.884 69.346 37
38 19.289 22.878 27.343 37.335 49.513 53.384 56.896 61.162 64.182 70.703 38
39 19.996 23.654 28.196 38.335 50.660 54.572 58.120 62.428 65.476 72.055 39
40 20.707 24.433 29.051 39.335 51.805 55.758 59.342 63.691 66.766 73.402 40
41 21.421 25.215 29.907 40.335 52.949 56.942 60.561 64.950 68.053 74.745 41
42 22.138 25.999 30.765 41.335 54.090 58.124 61.777 66.206 69.336 76.084 42
43 22.859 26.785 31.625 42.335 55.230 59.304 62.990 67.459 70.616 77.419 43
44 23.584 27.575 32.487 43.335 56.369 60.481 64.202 68.710 71.893 78.750 44
45 24.311 28.366 33.350 44.335 57.505 61.656 65.410 69.957 73.166 80.077 45
46 25.042 29.160 34.215 45.335 58.641 62.830 66.617 71.201 74.437 81.400 46
47 25.775 29.956 35.081 46.335 59.774 64.001 67.821 72.443 75.704 82.720 47
48 26.511 30.755 35.949 47.335 60.907 65.171 69.023 73.683 76.969 84.037 48
49 27.249 31.555 36.818 48.335 62.038 66.339 70.222 74.919 78.231 85.351 49
50 27.991 32.357 37.689 49.335 63.167 67.505 71.420 76.154 79.490 86.661 50
For values of ν > 100, compute approximate critical values of χ² by formula as follows:
$\chi^2_{\alpha[\nu]} = \frac{1}{2}\left(t_{2\alpha[\infty]} + \sqrt{2\nu - 1}\right)^2$
where $t_{2\alpha[\infty]}$ can be looked up in Table III. Thus $\chi^2_{0.05[120]}$ is computed as $\frac{1}{2}(t_{0.10[\infty]} + \sqrt{240 - 1})^2 =
\frac{1}{2}(1.645 + \sqrt{239})^2 = \frac{1}{2}(17.10462)^2 = 146.284$. For α > 0.5 employ $-t_{2(1-\alpha)[\infty]}$ in place of $t_{2\alpha[\infty]}$ in the above formula. When α = 0.5,
$t_{2\alpha[\infty]} = 0$. Values of chi-square from 1 to 111 degrees of freedom have been taken from a more extensive table [...]
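The large-ν approximation in the note, χ² ≈ ½(t + √(2ν − 1))², is easy to check numerically. The sketch below is an illustration only; the function name `chi2_critical_approx` is an assumption made for this example, and the caller supplies the two-tailed value of t for infinite degrees of freedom read from the t-table (1.645 for a two-tailed probability of 0.10).

```python
import math


def chi2_critical_approx(t_two_tailed, nu):
    """Approximate upper-tail critical value of chi-square for large nu.

    t_two_tailed: t for infinite degrees of freedom at two-tailed probability
                  2*alpha (pass 0.0 when alpha = 0.5).
    nu:           degrees of freedom (the note recommends this for nu > 100).
    """
    return 0.5 * (t_two_tailed + math.sqrt(2 * nu - 1)) ** 2


# Worked example from the note: alpha = 0.05, nu = 120.
approx = chi2_critical_approx(1.645, 120)
print(f"{approx:.3f}")
```

Because the tabled t for infinite degrees of freedom is a normal deviate, the same call with `t_two_tailed=0.0` gives the approximate median of the distribution.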
TABLE IV
continued
α
ν .995 .975 .9 .5 .1 .05 .025 .01 .005 .001 ν
51 28.735 33.162 38.560 50.335 64.295 68.669 72.616 77.386 80.747 87.968 51
52 29.481 33.968 39.433 51.335 65.422 69.832 73.810 78.616 82.001 89.272 52
53 30.230 34.776 40.308 52.335 66.548 70.993 75.002 79.843 83.253 90.573 53
54 30.981 35.586 41.183 53.335 67.673 72.153 76.192 81.069 84.502 91.872 54
55 31.735 36.398 42.060 54.335 68.796 73.311 77.380 82.292 85.749 93.168 55
56 32.490 37.212 42.937 55.335 69.918 74.468 78.567 83.513 86.994 94.460 56
57 33.248 38.027 43.816 56.335 71.040 75.624 79.752 84.733 88.237 95.751 57
58 34.008 38.844 44.696 57.335 72.160 76.778 80.936 85.950 89.477 97.039 58
59 34.770 39.662 45.577 58.335 73.279 77.931 82.117 87.166 90.715 98.324 59
60 35.534 40.482 46.459 59.335 74.397 79.082 83.298 88.379 91.952 99.607 60
61 36.300 41.303 47.342 60.335 75.514 80.232 84.476 89.591 93.186 100.888 61
62 37.068 42.126 48.226 61.335 76.630 81.381 85.654 90.802 94.419 102.166 62
63 37.838 42.950 49.111 62.335 77.745 82.529 86.830 92.010 95.649 103.442 63
64 38.610 43.776 49.996 63.335 78.860 83.675 88.004 93.217 96.878 104.716 64
65 39.383 44.603 50.883 64.335 79.973 84.821 89.177 94.422 98.105 105.988 65
66 40.158 45.431 51.770 65.335 81.085 85.965 90.349 95.626 99.331 107.258 66
67 40.935 46.261 52.659 66.335 82.197 87.108 91.519 96.828 100.55 108.526 67
68 41.713 47.092 53.548 67.334 83.308 88.250 92.689 98.028 101.78 109.791 68
69 42.494 47.924 54.438 68.334 84.418 89.391 93.856 99.228 103.00 111.055 69
70 43.275 48.758 55.329 69.334 85.527 90.531 95.023 100.43 104.21 112.317 70
71 44.058 49.592 56.221 70.334 86.635 91.670 96.189 101.62 105.43 113.577 71
72 44.843 50.428 57.113 71.334 87.743 92.808 97.353 102.82 106.65 114.835 72
73 45.629 51.265 58.006 72.334 88.850 93.945 98.516 104.01 107.86 116.092 73
74 46.417 52.103 58.900 73.334 89.956 95.081 99.678 105.20 109.07 117.346 74
75 47.206 52.942 59.795 74.334 91.061 96.217 100.84 106.39 110.29 118.599 75
76 47.997 53.782 60.690 75.334 92.166 97.351 102.00 107.58 111.50 119.850 76
77 48.788 54.623 61.586 76.334 93.270 98.484 103.16 108.77 112.70 121.100 77
78 49.582 55.466 62.483 77.334 94.373 99.617 104.32 109.96 113.91 122.348 78
79 50.376 56.309 63.380 78.334 95.476 100.75 105.47 111.14 115.12 123.594 79
80 51.172 57.153 64.278 79.334 96.578 101.88 106.63 112.33 116.32 124.839 80
81 51.969 57.998 65.176 80.334 97.680 103.01 107.78 113.51 117.52 126.082 81
82 52.767 58.845 66.076 81.334 98.780 104.14 108.94 114.69 118.73 127.324 82
83 53.567 59.692 66.976 82.334 99.880 105.27 110.09 115.88 119.93 128.565 83
84 54.368 60.540 67.876 83.334 100.98 106.39 111.24 117.06 121.13 129.804 84
85 55.170 61.389 68.777 84.334 102.08 107.52 112.39 118.24 122.32 131.041 85
86 55.973 62.239 69.679 85.334 103.18 108.65 113.54 119.41 123.52 132.277 86
87 56.777 63.089 70.581 86.334 104.28 109.77 114.69 120.59 124.72 133.512 87
88 57.582 63.941 71.484 87.334 105.37 110.90 115.84 121.77 125.91 134.745 88
89 58.389 64.793 72.387 88.334 106.47 112.02 116.99 122.94 127.11 135.978 89
90 59.196 65.647 73.291 89.334 107.56 113.15 118.14 124.12 128.30 137.208 90
91 60.005 66.501 74.196 90.334 108.66 114.27 119.28 125.29 129.49 138.438 91
92 60.815 67.356 75.101 91.334 109.76 115.39 120.43 126.46 130.68 139.666 92
93 61.625 68.211 76.006 92.334 110.85 116.51 121.57 127.63 131.87 140.893 93
94 62.437 69.068 76.912 93.334 111.94 117.63 122.72 128.80 133.06 142.119 94
95 63.250 69.925 77.818 94.334 113.04 118.75 123.86 129.97 134.25 143.344 95
96 64.063 70.783 78.725 95.334 114.13 119.87 125.00 131.14 135.43 144.567 96
97 64.878 71.642 79.633 96.334 115.22 120.99 126.14 132.31 136.62 145.789 97
98 65.694 72.501 80.541 97.334 116.32 122.11 127.28 133.48 137.80 147.010 98
99 66.510 73.361 81.449 98.334 117.41 123.23 128.42 134.64 138.99 148.230 99
100 67.328 74.222 82.358 99.334 118.50 124.34 129.56 135.81 140.17 149.449 100
TABLE V
Critical values of the F distribution
α 1 2 3 4 5 6 7 8 9 10 11 12 α
1 .05 161 199 216 225 230 234 237 239 241 242 243 244 .05
.025 648 800 864 900 922 937 948 957 963 969 973 977 .025
.01 4050 5000 5400 5620 5760 5860 5930 5980 6020 6060 6080 6110 .01
2 .05 18.5 19.0 19.2 19.2 19.3 19.3 19.4 19.4 19.4 19.4 19.4 19.4 .05
.025 38.5 39.0 39.2 39.2 39.3 39.3 39.4 39.4 39.4 39.4 39.4 39.4 .025
.01 98.5 99.0 99.2 99.2 99.3 99.3 99.4 99.4 99.4 99.4 99.4 99.4 .01
3 .05 10.1 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79 8.76 8.74 .05
.025 17.4 16.0 15.4 15.1 14.9 14.7 14.6 14.5 14.5 14.4 14.3 14.3 .025
.01 34.1 30.8 29.5 28.7 28.2 27.9 27.7 27.5 27.3 27.2 27.1 27.1 .01
4 .05 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.93 5.91 .05
.025 12.2 10.6 9.98 9.60 9.36 9.20 9.07 8.98 8.90 8.84 8.79 8.75 .025
.01 21.2 18.0 16.7 16.0 15.5 15.2 15.0 14.8 14.7 14.5 14.4 14.4 .01
5 .05 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74 4.71 4.68 .05
.025 10.0 8.43 7.76 7.39 7.15 6.98 6.85 6.76 6.68 6.62 6.57 6.52 .025
.01 16.3 13.3 12.1 11.4 11.0 10.7 10.5 10.3 10.2 10.1 9.99 9.89 .01
6 .05 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 4.03 4.00 .05
.025 8.81 7.26 6.60 6.23 5.99 5.82 5.70 5.60 5.52 5.46 5.41 5.37 .025
.01 13.7 10.9 9.78 9.15 8.75 8.47 8.26 8.10 7.98 7.87 7.79 7.72 .01
7 .05 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64 3.60 3.57 .05
.025 8.07 6.54 5.89 5.52 5.29 5.12 4.99 4.89 4.82 4.76 4.71 4.67 .025
.01 12.2 9.55 8.45 7.85 7.46 7.19 6.99 6.84 6.72 6.62 6.54 6.47 .01
8 .05 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35 3.31 3.28 .05
.025 7.57 6.06 5.42 5.05 4.82 4.65 4.53 4.43 4.36 4.30 4.25 4.20 .025
.01 11.3 8.65 7.59 7.01 6.63 6.37 6.18 6.03 5.91 5.81 5.73 5.67 .01
9 .05 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14 3.10 3.07 .05
.025 7.21 5.71 5.08 4.72 4.48 4.32 4.20 4.10 4.03 3.96 3.91 3.87 .025
.01 10.6 8.02 6.99 6.42 6.06 5.80 5.61 5.47 5.35 5.26 5.18 5.11 .01
10 .05 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98 2.94 2.91 .05
.025 6.94 5.46 4.83 4.47 4.24 4.07 3.95 3.85 3.78 3.72 3.67 3.62 .025
.01 10.0 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.94 4.85 4.77 4.71 .01
TABLE V
continued
α 15 20 24 30 40 50 60 120 ∞ α
1 .05 246 248 249 250 251 252 252 253 254 .05
.025 985 993 997 1000 1010 1010 1010 1010 1020 .025
.01 6160 6210 6230 6260 6290 6300 6310 6340 6370 .01
2 .05 19.4 19.4 19.5 19.5 19.5 19.5 19.5 19.5 19.5 .05
.025 39.4 39.4 39.5 39.5 39.5 39.5 39.5 39.5 39.5 .025
.01 99.4 99.4 99.5 99.5 99.5 99.5 99.5 99.5 99.5 .01
3 .05 8.70 8.66 8.64 8.62 8.59 8.58 8.57 8.55 8.53 .05
.025 14.3 14.2 14.1 14.1 14.0 14.0 14.0 13.9 13.9 .025
.01 26.9 26.7 26.6 26.5 26.4 26.3 26.3 26.2 26.1 .01
4 .05 5.86 5.80 5.77 5.75 5.72 5.70 5.69 5.66 5.63 .05
.025 8.66 8.56 8.51 8.46 8.41 8.38 8.36 8.31 8.26 .025
.01 14.2 14.0 13.9 13.8 13.7 13.7 13.7 13.6 13.5 .01
5 .05 4.62 4.56 4.53 4.50 4.46 4.44 4.43 4.40 4.36 .05
.025 6.43 6.33 6.28 6.23 6.18 6.14 6.12 6.07 6.02 .025
.01 9.72 9.55 9.47 9.38 9.29 9.24 9.20 9.11 9.02 .01
6 .05 3.94 3.87 3.84 3.81 3.77 3.75 3.74 3.70 3.67 .05
.025 5.27 5.17 5.12 5.07 5.01 4.98 4.96 4.90 4.85 .025
.01 7.56 7.40 7.31 7.23 7.14 7.09 7.06 6.97 6.88 .01
7 .05 3.51 3.44 3.41 3.38 3.34 3.32 3.30 3.27 3.23 .05
.025 4.57 4.47 4.42 4.36 4.31 4.27 4.25 4.20 4.14 .025
.01 6.31 6.16 6.07 5.99 5.91 5.86 5.82 5.74 5.65 .01
8 .05 3.22 3.15 3.12 3.08 3.04 3.02 3.01 2.97 2.93 .05
.025 4.10 4.00 3.95 3.89 3.84 3.80 3.78 3.73 3.67 .025
.01 5.52 5.36 5.28 5.20 5.12 5.07 5.03 4.95 4.86 .01
9 .05 3.01 2.94 2.90 2.86 2.83 2.81 2.79 2.75 2.71 .05
.025 3.77 3.67 3.61 3.56 3.51 3.47 3.45 3.39 3.33 .025
.01 4.96 4.81 4.73 4.65 4.57 4.52 4.48 4.40 4.31 .01
10 .05 2.85 2.77 2.74 2.70 2.66 2.64 2.62 2.58 2.54 .05
.025 3.52 3.42 3.37 3.31 3.26 3.22 3.20 3.14 3.08 .025
.01 4.56 4.41 4.33 4.25 4.17 4.12 4.08 4.00 3.91 .01
TABLE V
continued
α 1 2 3 4 5 6 7 8 9 10 11 12 α
11 .05 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85 2.82 2.79 .05
.025 6.72 5.26 4.63 4.28 4.04 3.88 3.76 3.66 3.59 3.53 3.48 3.43 .025
.01 9.65 7.21 6.22 5.67 5.32 5.07 4.89 4.74 4.63 4.54 4.46 4.40 .01
12 .05 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75 2.72 2.69 .05
.025 6.55 5.10 4.47 4.12 3.89 3.73 3.61 3.51 3.44 3.37 3.32 3.28 .025
.01 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 4.39 4.30 4.22 4.16 .01
15 .05 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54 2.51 2.48 .05
.025 6.20 4.77 4.15 3.80 3.58 3.41 3.29 3.20 3.12 3.06 3.01 2.96 .025
.01 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80 3.73 3.67 .01
20 .05 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35 2.31 2.28 .05
.025 5.87 4.46 3.86 3.51 3.29 3.13 3.01 2.91 2.84 2.77 2.72 2.68 .025
.01 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 3.46 3.37 3.29 3.23 .01
24 .05 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25 2.22 2.18 .05
.025 5.72 4.32 3.72 3.38 3.15 2.99 2.87 2.78 2.70 2.64 2.59 2.54 .025
.01 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.26 3.17 3.09 3.03 .01
30 .05 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16 2.13 2.09 .05
.025 5.57 4.18 3.59 3.25 3.03 2.87 2.75 2.65 2.57 2.51 2.46 2.41 .025
.01 7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.07 2.98 2.90 2.84 .01
40 .05 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08 2.04 2.00 .05
.025 5.42 4.05 3.46 3.13 2.90 2.74 2.62 2.53 2.45 2.39 2.33 2.29 .025
.01 7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.89 2.80 2.73 2.66 .01
60 .05 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99 1.95 1.92 .05
.025 5.29 3.93 3.34 3.01 2.79 2.63 2.51 2.41 2.33 2.27 2.22 2.17 .025
.01 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72 2.63 2.56 2.50 .01
120 .05 3.92 3.07 2.68 2.45 2.29 2.17 2.09 2.02 1.96 1.91 1.87 1.83 .05
.025 5.15 3.80 3.23 2.89 2.67 2.52 2.39 2.30 2.22 2.16 2.10 2.05 .025
.01 6.85 4.79 3.95 3.48 3.17 2.96 2.79 2.66 2.56 2.47 2.40 2.34 .01
∞ .05 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 1.83 1.79 1.75 .05
.025 5.02 3.69 3.11 2.79 2.57 2.41 2.29 2.19 2.11 2.05 1.99 1.94 .025
.01 6.63 4.61 3.78 3.32 3.02 2.80 2.64 2.51 2.41 2.32 2.25 2.18 .01
TABLE V
continued
α 15 20 24 30 40 50 60 120 ∞ α
11 .05 2.72 2.65 2.61 2.57 2.53 2.51 2.49 2.45 2.40 .05
.025 3.33 3.23 3.17 3.12 3.06 3.02 3.00 2.94 2.88 .025
.01 4.25 4.10 4.02 3.94 3.86 3.81 3.78 3.69 3.60 .01
12 .05 2.62 2.54 2.51 2.47 2.43 2.40 2.38 2.34 2.30 .05
.025 3.18 3.07 3.02 2.96 2.91 2.87 2.85 2.79 2.72 .025
.01 4.01 3.86 3.78 3.70 3.62 3.57 3.54 3.45 3.36 .01
15 .05 2.40 2.33 2.29 2.25 2.20 2.18 2.16 2.11 2.07 .05
.025 2.86 2.76 2.70 2.64 2.59 2.55 2.52 2.46 2.40 .025
.01 3.52 3.37 3.29 3.21 3.13 3.08 3.05 2.96 2.87 .01
20 .05 2.20 2.12 2.08 2.04 1.99 1.97 1.95 1.90 1.84 .05
.025 2.57 2.46 2.41 2.35 2.29 2.25 2.22 2.16 2.09 .025
.01 3.09 2.94 2.86 2.78 2.69 2.64 2.61 2.52 2.42 .01
24 .05 2.11 2.03 1.98 1.94 1.89 1.86 1.84 1.79 1.73 .05
.025 2.44 2.33 2.27 2.21 2.15 2.11 2.08 2.01 1.94 .025
.01 2.89 2.74 2.66 2.58 2.49 2.44 2.40 2.31 2.21 .01
30 .05 2.01 1.93 1.89 1.84 1.79 1.76 1.74 1.68 1.62 .05
.025 2.31 2.20 2.14 2.07 2.01 1.97 1.94 1.87 1.79 .025
.01 2.70 2.55 2.47 2.39 2.30 2.25 2.21 2.11 2.01 .01
40 .05 1.92 1.84 1.79 1.74 1.69 1.66 1.64 1.58 1.51 .05
.025 2.18 2.07 2.01 1.94 1.88 1.83 1.80 1.72 1.64 .025
.01 2.52 2.37 2.29 2.20 2.11 2.06 2.02 1.92 1.80 .01
60 .05 1.84 1.75 1.70 1.65 1.59 1.56 1.53 1.47 1.39 .05
.025 2.06 1.94 1.88 1.82 1.74 1.70 1.67 1.58 1.48 .025
.01 2.35 2.20 2.12 2.03 1.94 1.88 1.84 1.73 1.60 .01
120 .05 1.75 1.66 1.61 1.55 1.50 1.46 1.43 1.35 1.25 .05
.025 1.95 1.82 1.76 1.69 1.61 1.56 1.53 1.43 1.31 .025
.01 2.19 2.03 1.95 1.86 1.76 1.70 1.66 1.53 1.38 .01
∞ .05 1.67 1.57 1.52 1.46 1.39 1.35 1.32 1.22 1.00 .05
.025 1.83 1.71 1.64 1.57 1.48 1.43 1.39 1.27 1.00 .025
.01 2.04 1.88 1.79 1.70 1.59 1.52 1.47 1.32 1.00 .01
TABLE VI
Critical values of F_max
a (number of samples)
ν α 2 3 4 5 6 7 8 9 10 11 12
2 .05 39.0 87.5 142. 202. 266. 333. 403. 475. 550. 626. 704.
.01 199. 448. 729. 1036. 1362. 1705. 2063. 2432. 2813. 3204. 3605.
3 .05 15.4 27.8 39.2 50.7 62.0 72.9 83.5 93.9 104. 114. 124.
.01 47.5 85. 120. 151. 184. 21(6) 24(9) 28(1) 31(0) 33(7) 36(1)
4 .05 9.60 15.5 20.6 25.2 29.5 33.6 37.5 41.1 44.6 48.0 51.4
.01 23.2 37. 49. 59. 69. 79. 89. 97. 106. 113. 120.
5 .05 7.15 10.8 13.7 16.3 18.7 20.8 22.9 24.7 26.5 28.2 29.9
.01 14.9 22. 28. 33. 38. 42. 46. 50. 54. 57. 60.
6 .05 5.82 8.38 10.4 12.1 13.7 15.0 16.3 17.5 18.6 19.7 20.7
.01 11.1 15.5 19.1 22. 25. 27. 30. 32. 34. 36. 37.
7 .05 4.99 6.94 8.44 9.70 10.8 11.8 12.7 13.5 14.3 15.1 15.8
.01 8.89 12.1 14.5 16.5 18.4 20. 22. 23. 24. 26. 27.
8 .05 4.43 6.00 7.18 8.12 9.03 9.78 10.5 11.1 11.7 12.2 12.7
.01 7.50 9.9 11.7 13.2 14.5 15.8 16.9 17.9 18.9 19.8 21.
9 .05 4.03 5.34 6.31 7.11 7.80 8.41 8.95 9.45 9.91 10.3 10.7
.01 6.54 8.5 9.9 11.1 12.1 13.1 13.9 14.7 15.3 16.0 16.6
10 .05 3.72 4.85 5.67 6.34 6.92 7.42 7.87 8.28 8.66 9.01 9.34
.01 5.85 7.4 8.6 9.6 10.4 11.1 11.8 12.4 12.9 13.4 13.9
12 .05 3.28 4.16 4.79 5.30 5.72 6.09 6.42 6.72 7.00 7.25 7.48
.01 4.91 6.1 6.9 7.6 8.2 8.7 9.1 9.5 9.9 10.2 10.6
15 .05 2.86 3.54 4.01 4.37 4.68 4.95 5.19 5.40 5.59 5.77 5.93
.01 4.07 4.9 5.5 6.0 6.4 6.7 7.1 7.3 7.5 7.8 8.0
20 .05 2.46 2.95 3.29 3.54 3.76 3.94 4.10 4.24 4.37 4.49 4.59
.01 3.32 3.8 4.3 4.6 4.9 5.1 5.3 5.5 5.6 5.8 5.9
30 .05 2.07 2.40 2.61 2.78 2.91 3.02 3.12 3.21 3.29 3.36 3.39
.01 2.63 3.0 3.3 3.4 3.6 3.7 3.8 3.9 4.0 4.1 4.2
60 .05 1.67 1.85 1.96 2.04 2.11 2.17 2.22 2.26 2.30 2.33 2.36
.01 1.96 2.2 2.3 2.4 2.4 2.5 2.5 2.6 2.6 2.7 2.7
∞ .05 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
.01 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
Note: Corresponding to each value of a (number of samples) and ν (degrees of freedom) are two critical values
of F_max representing the upper 5% and 1% percentage points. The corresponding probabilities α = 0.05 and 0.01
represent one tail of the F_max distribution. This table was copied from H. A. David (Biometrika 39:422-424, 1952)
with permission of the publisher and author.
TABLE VII
Shortest unbiased confidence limits for the variance
Note: The factors in this table have been obtained by dividing the quantity n − 1 by the values found in a table
prepared by D. V. Lindley, D. A. East, and P. A. Hamilton (Biometrika 47:433-437, 1960).
TABLE VIII
Critical values for correlation coefficients
ν α r ν α r ν α r
Note: Upper value is 5%, lower value is 1% critical value. This table is reproduced by permission from Statistical
Methods, 5th edition, by George W. Snedecor, (c) 1956 by The Iowa State University Press.
TABLE IX
Confidence limits for percentages
This table furnishes confidence limits for percentages based on the binomial
distribution.
The first part of the table furnishes limits for samples up to size n = 30.
The arguments are Y, number of items in the sample that exhibit a given
property, and n, sample size. Argument Y is tabled for integral values between 0 and
15, which yield percentages up to 50%. For each sample size n and number of
items Y with the given property, three lines of numerical values are shown. The
first line of values gives 95% confidence limits for the percentage, the second line
lists the observed percentage incidence of the property, and the third line of
values furnishes the 99% confidence limits for the percentage. For example, for
Y = 8 individuals showing the property out of a sample of n = 20, the second
line indicates that this represents an incidence of the property of 40.00%, the
first line yields the 95% confidence limits of this percentage as 19.10% to 63.95%,
and the third line gives the 99% limits as 14.60% to 70.10%.
Interpolate in this table (up to n = 49) by dividing $L_1^-$ and $L_2^-$, the lower
and upper confidence limits at the next lower tabled sample size $n^-$, by desired
sample size n, and multiplying them by the next lower tabled sample size $n^-$. Thus,
for example, to obtain the confidence limits of the percentage corresponding to
8 individuals showing the given property in a sample of 22 individuals (which
corresponds to 36.36% of the individuals showing the property), compute the
lower confidence limit $L_1 = L_1^- n^-/n = (19.10)20/22 = 17.36\%$ and the upper
confidence limit $L_2 = L_2^- n^-/n = (63.95)20/22 = 58.14\%$.
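The small-sample interpolation rule just described (multiply the tabled limit by the next lower tabled sample size and divide by the desired sample size) can be sketched as follows. The function name `interpolate_small_sample` is an assumption made for this illustration; the numbers are those of the worked example for 8 individuals in a sample of 22.

```python
def interpolate_small_sample(tabled_limit, n_tabled, n):
    """Scale a confidence limit tabled at n_tabled to the desired sample size n.

    tabled_limit: limit (in percent) at the next lower tabled sample size.
    n_tabled:     next lower tabled sample size.
    n:            desired sample size (the text allows this up to n = 49).
    """
    return tabled_limit * n_tabled / n


# 95% limits for Y = 8 at n = 20 are 19.10% and 63.95%; rescale to n = 22.
lower = interpolate_small_sample(19.10, 20, 22)
upper = interpolate_small_sample(63.95, 20, 22)
print(f"{lower:.2f}% to {upper:.2f}%")
```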
The second half of the table is for larger sample sizes (n = 50, 100, 200,
500, and 1000). The arguments along the left margin of the table are percentages
from 0 to 50% in increments of 1%, rather than counts. The 95% and 99%
confidence limits corresponding to a given percentage incidence p and sample
size n are given in two lines in the body of the table. For instance,
the 99% confidence limits of an observed incidence of 12% in a sample of
500 are found to be 8.56-16.19%, in the second of the two lines. Interpolation
in this table between the furnished sample sizes can be achieved by means of the
following formula for the lower limit:
$$L_1 = \frac{L_1^- n^-(n^+ - n) + L_1^+ n^+(n - n^-)}{n(n^+ - n^-)}$$
In the above expression, n is the size of the observed sample, $n^-$ and $n^+$ the
next lower and upper tabled sample sizes, respectively, $L_1^-$ and $L_1^+$ are
corresponding tabled confidence limits for these sample sizes, and $L_1$ is the lower
confidence limit to be found by interpolation. The upper confidence limit, $L_2$,
can be obtained by a corresponding formula by substituting 2 for the subscript
1. By way of an example we shall illustrate setting 95% confidence limits to an
observed percentage of 25% in a sample size of 80. The tabled 95% limits for
n = 50 are 13.84-39.27%. For n = 100, the corresponding tabled limits are
16.88-34.66%. When we substitute the values for the lower limits in the above
formula we obtain
$$L_1 = \frac{(13.84)(50)(100 - 80) + (16.88)(100)(80 - 50)}{80(100 - 50)} = 16.12\%$$
for the lower confidence limit. Similarly, for the upper confidence limit we
compute
$$L_2 = \frac{(39.27)(50)(100 - 80) + (34.66)(100)(80 - 50)}{80(100 - 50)} = 35.81\%$$
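The interpolation between tabled sample sizes can be checked with a short sketch. It assumes the formula has the form L = [L⁻n⁻(n⁺ − n) + L⁺n⁺(n − n⁻)] / [n(n⁺ − n⁻)], which reduces to the tabled limits when n equals either tabled sample size; the function name `interpolate_large_sample` is hypothetical. The example reproduces the case of 25% in a sample of 80, using the tabled 95% limits quoted in the text for n = 50 and n = 100.

```python
def interpolate_large_sample(l_minus, l_plus, n_minus, n_plus, n):
    """Interpolate a confidence limit between two tabled sample sizes.

    l_minus, l_plus: tabled limits (percent) at n_minus and n_plus.
    n_minus, n_plus: next lower and next upper tabled sample sizes.
    n:               observed sample size, with n_minus <= n <= n_plus.
    """
    numerator = l_minus * n_minus * (n_plus - n) + l_plus * n_plus * (n - n_minus)
    return numerator / (n * (n_plus - n_minus))


# 95% limits for 25%: 13.84-39.27 at n = 50 and 16.88-34.66 at n = 100.
lower = interpolate_large_sample(13.84, 16.88, 50, 100, 80)
upper = interpolate_large_sample(39.27, 34.66, 50, 100, 80)
print(lower, upper)  # roughly 16.12 and 35.81
```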
The tabled values in parentheses are limits for percentages that could not be
obtained in any real sampling problem (for example, 25% in 50 items) but are
necessary for purposes of interpolation. For percentages greater than 50% look
up the complementary percentage as the argument. The complements of the
tabled binomial confidence limits are the desired limits.
These tables have been extracted from more extensive ones in D. Mainland,
L. Herrera, and M. I. Sutcliffe, Tables for Use with Binomial Samples
(Department of Medical Statistics, New York University College of Medicine, 1956)
with permission of the publisher. The interpolation formulas cited are also due
to these authors. Confidence limits of odd percentages up to 13% for n = 50
were computed by interpolation. For Y = 0, one-sided (1 − α)100% confidence
limits were computed as $L_2 = 1 - \alpha^{1/n}$ with $L_1 = 0$.
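The one-sided limit for Y = 0, L₂ = 1 − α^(1/n), can be evaluated directly. The sketch below is an illustration with a hypothetical function name, `upper_limit_zero_count`; it returns the limit as a percentage so that it can be compared with the first rows of the table (for example, 45.07% for n = 5 at the 95% level).

```python
def upper_limit_zero_count(n, alpha):
    """One-sided upper confidence limit (percent) when none of n items show the property.

    n:     sample size.
    alpha: one minus the confidence level (0.05 for 95%, 0.01 for 99%).
    """
    return 100.0 * (1.0 - alpha ** (1.0 / n))


for n in (5, 10, 30):
    print(n, f"{upper_limit_zero_count(n, 0.05):.2f}", f"{upper_limit_zero_count(n, 0.01):.2f}")
```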
TABLE IX
Confidence limits for percentages
                                             n
 Y  1-α  5             10            15            20            25            30
 0  95   0.00-45.07    0.00-25.89    0.00-18.10    0.00-13.91    0.00-11.29    0.00-9.50
         0.00          0.00          0.00          0.00          0.00          0.00
    99   0.00-60.19    0.00-36.90    0.00-26.44    0.00-20.57    0.00-16.82    0.00-14.23
 1  95   0.51-71.60    0.25-44.50    0.17-32.00    0.13-24.85    0.10-20.36    0.08-17.23
         20.00         10.00         6.67          5.00          4.00          3.33
    99   0.10-81.40    0.05-54.4     0.03-40.27    0.02-31.70    0.02-26.24    0.02-22.33
 2  95   5.28-85.34    2.52-55.60    1.66-40.49    1.24-31.70    0.98-26.05    0.82-22.09
         40.00         20.00         13.33         10.00         8.00          6.67
    99   2.28-91.72    1.08-64.80    0.71-48.71    0.53-38.70    0.42-32.08    0.35-27.35
 3  95                 6.67-65.2     4.33-48.07    3.21-37.93    2.55-31.24    2.11-26.53
                       30.00         20.00         15.00         12.00         10.00
    99                 3.70-73.50    2.39-56.07    1.77-45.05    1.40-37.48    1.16-32.03
 4  95                 12.20-73.80   7.80-55.14    5.75-43.65    4.55-36.10    3.77-30.74
                       40.00         26.67         20.00         16.00         13.33
    99                 7.68-80.91    4.88-62.78    3.58-50.65    2.83-42.41    2.34-36.39
 5  95                 18.70-81.30   11.85-61.62   8.68-49.13    6.84-40.72    5.64-34.74
                       50.00         33.33         25.00         20.00         16.67
    99                 12.80-87.20   8.03-68.89    5.85-56.05    4.60-47.00    3.79-40.44
 6  95                               16.33-67.74   11.90-54.30   9.35-45.14    7.70-38.56
                                     40.00         30.00         24.00         20.00
    99                               11.67-74.40   8.45-60.95    6.62-51.38    5.43-44.26
 7  95                               21.29-73.38   15.38-59.20   12.06-49.38   9.92-42.29
                                     46.67         35.00         28.00         23.33
    99                               15.87-79.54   11.40-65.70   8.90-55.56    7.29-48.01
 8  95                                             19.10-63.95   14.96-53.50   12.29-45.89
                                                   40.00         32.00         26.67
    99                                             14.60-70.10   11.36-59.54   9.30-51.58
 9  95                                             23.05-68.48   17.97-57.48   14.73-49.40
                                                   45.00         36.00         30.00
    99                                             18.08-74.30   14.01-63.36   11.43-55.00
10  95                                             27.20-72.80   21.12-61.32   17.29-52.80
                                                   50.00         40.00         33.33
    99                                             21.75-78.25   16.80-67.04   13.69-58.35
11  95                                                           24.41-65.06   19.93-56.13
                                                                 44.00         36.67
    99                                                           19.75-70.55   16.06-61.57
12  95                                                           27.81-68.69   22.66-59.39
                                                                 48.00         40.00
    99                                                           22.84-73.93   18.50-64.69
13  95                                                                         25.46-62.56
                                                                               43.33
    99                                                                         21.07-67.72
14  95                                                                         28.35-65.66
                                                                               46.67
    99                                                                         23.73-70.66
15  95                                                                         31.30-68.70
                                                                               50.00
    99                                                                         26.47-73.53
TABLE IX
continued
                                    n
 %  1-α  50             100            200            500            1000
 0  95   .00-7.11       .00-3.62       .00-1.83       .00-0.74       .00-0.37
    99   .00-10.05      .00-5.16       .00-2.62       .00-1.05       .00-0.53
 1  95   (.02-8.88)     .02-5.45       .12-3.57       .32-2.32       .48-1.83
    99   (.00-12.02)    .00-7.21       .05-4.55       .22-2.80       .37-2.13
TABLE IX
continued
                                    n
 %  1-α  50             100            200            500            1000
26  95   14.63-40.34    17.75-35.72    20.08-32.65    22.21-30.08    23.31-28.83
    99   11.98-44.73    15.59-38.76    18.43-34.75    21.10-31.36    22.50-29.73
27  95   (15.45-41.40)  18.62-36.79    20.99-33.70    23.16-31.11    24.27-29.86
    99   (12.71-45.79)  16.42-39.84    19.31-35.81    22.04-32.41    23.46-30.76
28  95   16.23-42.48    19.50-37.85    21.91-34.76    24.11-32.15    25.24-30.89
    99   13.42-46.88    17.25-40.91    20.20-36.88    22.97-33.46    24.41-31.80
29  95   (17.06-43.54)  20.37-38.92    22.82-35.81    25.06-33.19    26.21-31.92
    99   (14.18-47.92)  18.07-41.99    21.08-37.94    23.90-34.51    25.37-32.84
30  95   17.87-44.61    21.24-39.98    23.74-36.87    26.01-34.23    27.17-32.95
    99   14.91-48.99    18.90-43.06    21.97-39.01    24.83-35.55    26.32-33.87
31  95   (18.71-45.65)  22.14-41.02    24.67-37.90    26.97-35.25    28.15-33.97
    99   (15.68-50.02)  19.76-44.11    22.88-40.05    25.78-36.59    27.29-34.90
32  95   19.55-46.68    23.04-42.06    25.61-38.94    27.93-36.28    29.12-34.99
    99   16.46-51.05    20.61-45.15    23.79-41.09    26.73-37.62    28.25-35.92
33  95   (20.38-47.72)  23.93-43.10    26.54-39.97    28.90-37.31    30.09-36.01
    99   (17.23-52.08)  21.47-46.19    24.69-42.13    27.68-38.65    29.22-36.95
34  95   21.22-48.76    24.83-44.15    27.47-41.01    29.86-38.33    31.07-37.03
    99   18.01-53.11    22.33-47.24    25.60-43.18    28.62-39.69    30.18-37.97
35  95   (22.06-49.80)  25.73-45.19    28.41-42.04    30.82-39.36    32.04-38.05
    99   (18.78-54.14)  23.19-48.28    26.51-44.22    29.57-40.72    31.14-39.00
36  95   22.93-50.80    26.65-46.20    29.36-43.06    31.79-40.38    33.02-39.06
    99   19.60-55.13    24.08-49.30    27.44-45.24    30.53-41.74    32.12-40.02
37  95   (23.80-51.81)  27.57-47.22    30.31-44.08    32.76-41.39    34.00-40.07
    99   (20.42-56.12)  24.96-50.31    28.37-46.26    31.49-42.76    33.09-41.03
38  95   24.67-52.81    28.49-48.24    31.25-45.10    33.73-42.41    34.98-41.09
    99   21.23-57.10    25.85-51.32    29.30-47.29    32.45-43.78    34.07-42.05
39  95   (25.54-53.82)  29.41-49.26    32.20-46.12    34.70-43.43    35.97-42.10
    99   (22.05-58.09)  26.74-52.34    30.23-48.31    33.42-44.80    35.04-43.06
40  95   26.41-54.82    30.33-50.28    33.15-47.14    35.68-44.44    36.95-43.11
    99   22.87-59.08    27.63-53.35    31.16-49.33    34.38-45.82    36.02-44.08
41  95   (27.31-55.80)  31.27-51.28    34.12-48.15    36.66-45.45    37.93-44.12
    99   (23.72-60.04)  28.54-54.34    32.11-50.33    35.35-46.83    37.00-45.09
42  95   28.21-56.78    32.21-52.28    35.08-49.16    37.64-46.46    38.92-45.12
    99   24.57-60.99    29.45-55.33    33.06-51.33    36.32-47.83    37.98-46.10
43  95   (29.10-57.76)  33.15-53.27    36.05-50.16    38.62-47.46    39.91-46.13
    99   (25.42-61.95)  30.37-56.32    34.01-52.34    37.29-48.84    38.96-47.10
44  95   30.00-58.74    34.09-54.27    37.01-51.17    39.60-48.47    40.90-47.14
    99   26.27-62.90    31.28-57.31    34.95-53.34    38.27-49.85    39.95-48.11
45  95   (30.90-59.71)  35.03-55.27    37.97-52.17    40.58-49.48    41.89-48.14
    99   (27.12-63.86)  32.19-58.30    35.90-54.34    39.24-50.86    40.93-49.12
46  95   31.83-60.67    35.99-56.25    38.95-53.17    41.57-50.48    42.88-49.14
    99   28.00-64.78    33.13-59.26    36.87-55.33    40.22-51.85    41.92-50.12
47  95   (32.75-61.62)  36.95-57.23    39.93-54.16    42.56-51.48    43.87-50.14
    99   (28.89-65.69)  34.07-60.22    37.84-56.31    41.21-52.85    42.91-51.12
48  95   33.68-62.57    37.91-58.21    40.91-55.15    43.55-52.47    44.87-51.14
    99   29.78-66.61    35.01-61.19    38.80-57.30    42.19-53.85    43.90-52.12
49  95   (34.61-63.52)  38.87-59.19    41.89-56.14    44.54-53.47    45.86-52.14
    99   (30.67-67.53)  35.95-62.15    39.77-58.28    43.18-54.84    44.89-53.12
50  95   35.53-64.47    39.83-60.17    42.86-57.14    45.53-54.47    46.85-53.15
    99   31.55-68.45    36.89-63.11    40.74-59.26    44.16-55.84    45.89-54.11
TABLE X
The z transformation of correlation coefficient r
r z r z
TABLE XI
Critical values of U, the Mann-Whitney statistic
α
n1 n2 0.10 0.05 0.025 0.01 0.005 0.001
3 2 6
3 8 9
4 2 8
3 11 12
4 13 15 16
5 2 9 10
3 13 14 15
4 16 18 19 20
5 20 21 23 24 25
6 2 11 12
3 15 16 17
4 19 21 22 23 24
5 23 25 27 28 29
6 27 29 31 33 34
7 2 13 14
3 17 19 20 21
4 22 24 25 27 28
5 27 29 30 32 34
6 31 34 36 38 39 42
7 36 38 41 43 45 48
8 2 14 15 16
3 19 21 22 24
4 25 27 28 30 31
5 30 32 34 36 38 40
6 35 38 40 42 44 47
7 40 43 46 49 50 54
8 45 49 51 55 57 60
9 1 9
2 16 17 18
3 22 23 25 26 27
4 27 30 32 33 35
5 33 36 38 40 42 44
6 39 42 44 47 49 52
7 45 48 51 54 56 60
8 50 54 57 61 63 67
9 56 60 64 67 70 74
10 1 10
2 17 19 20
3 24 26 27 29 30
4 30 33 35 37 38 40
5 37 39 42 44 46 49
6 43 46 49 52 54 57
7 49 53 56 59 61 65
8 56 60 63 67 69 74
9 62 66 70 74 77 82
10 68 73 77 81 84 90
Note: Critical values are tabulated for two samples of sizes n1 and n2, where n1 ≥ n2, up to n1 = n2 = 20. The
upper bounds of the critical values are furnished so that the sample statistic U_s has to be greater than a given
critical value to be significant. The probabilities at the heads of the columns are based on a one-tailed test and
represent the proportion of the area of the distribution of U in one tail beyond the critical value. For a two-tailed
test use the same critical values but double the probability at the heads of the columns. This table was extracted
from a more extensive one (table 11.4) in D. B. Owen, Handbook of Statistical Tables (Addison-Wesley Publishing
Co., Reading, Mass., 1962): Courtesy of U.S. Atomic Energy Commission, with permission of the publishers.
TABLE XI
continued
α
n1 n2 0.10 0.05 0.025 0.01 0.005 0.001
11 1 11
2 19 21 22
3 26 28 30 32 33
4 33 36 38 40 42 44
5 40 43 46 48 50 53
6 47 50 53 57 59 62
7 54 58 61 65 67 71
8 61 65 69 73 75 80
9 68 72 76 81 83 89
10 74 79 84 88 92 98
11 81 87 91 96 100 106
12 1 12
2 20 22 23
3 28 31 32 34 35
4 36 39 41 42 45 48
5 43 47 49 52 54 58
6 51 55 58 61 63 68
7 58 63 66 70 72 77
8 66 70 74 79 81 87
9 73 78 82 87 90 96
10 81 86 91 96 99 106
11 88 94 99 104 108 115
12 95 102 107 113 117 124
13 1 13
2 22 24 25 26
3 30 33 35 37 38
4 39 42 44 47 49 51
5 47 50 53 56 58 62
6 55 59 62 66 68 73
7 63 67 71 75 78 83
8 71 76 80 84 87 93
9 79 84 89 94 97 103
10 87 93 97 103 106 113
11 95 101 106 112 116 123
12 103 109 115 121 125 133
13 111 118 124 130 135 143
14 1 14
2 24 25 27 28
3 32 35 37 40 41
4 41 45 47 50 52 55
5 50 54 57 60 63 67
6 59 63 67 71 73 78
7 67 72 76 81 83 89
8 76 81 86 90 94 100
9 85 90 95 100 104 111
10 93 99 104 110 114 121
11 102 108 114 120 124 132
12 110 117 123 130 134 143
13 119 126 132 139 144 153
14 127 135 141 149 154 164
TABLE XI
continued
α
n1 n2 0.10 0.05 0.025 0.01 0.005 0.001
15 1 15
2 25 27 29 30
3 35 38 40 42 43
4 44 48 50 53 55 59
5 53 57 61 64 67 71
6 63 67 71 75 78 83
7 72 77 81 86 89 95
8 81 87 91 96 100 106
9 90 96 101 107 111 118
10 99 106 111 117 121 129
11 108 115 121 128 132 141
12 117 125 131 138 143 152
13 127 134 141 148 153 163
14 136 144 151 159 164 174
15 145 153 161 169 174 185
16 1 16
2 27 29 31 32
3 37 40 42 45 46
4 47 50 53 57 59 62
5 57 61 65 68 71 75
6 67 71 75 80 83 88
7 76 82 86 91 94 101
8 86 92 97 102 106 113
9 96 102 107 113 117 125
10 106 112 118 124 129 137
11 115 122 129 135 140 149
12 125 132 139 146 151 161
13 134 143 149 157 163 173
14 144 153 160 168 174 185
15 154 163 170 179 185 197
16 163 173 181 190 196 208
17 1 17
2 28 31 32 34
3 39 42 45 47 49 51
4 50 53 57 60 62 66
5 60 65 68 72 75 80
6 71 76 80 84 87 93
7 81 86 91 96 100 106
8 91 97 102 108 112 119
9 101 108 114 120 124 132
10 112 119 125 132 136 145
11 122 130 136 143 148 158
12 132 140 147 155 160 170
13 142 151 158 166 172 183
14 153 161 169 178 184 195
15 163 172 180 189 195 208
16 173 183 191 201 207 220
17 183 193 202 212 219 232
TABLE XII
nominal α
0.05 0.025 0.01 0.005
n T α T α T α T α
5 0 .0312
1 .0625
6 2 .0469 0 .0156
3 .0781 1 .0312
7 3 .0391 2 .0234 0 .0078
4 .0547 3 .0391 1 .0156
Note: This table furnishes critical values for the one-tailed test of significance of the rank sum Ts obtained in
Wilcoxon's matched-pairs signed-ranks test. Since the exact probability level desired cannot be obtained with
integral critical values of T, two such values and their attendant probabilities bracketing the desired significance
level are furnished. Thus, to find the significant 1% values for n = 19 we note the two critical values of T, 37 and 38,
in the table. The probabilities corresponding to these two values of T are 0.0090 and 0.0102. Clearly a rank sum
of Ts = 37 would have a probability of less than 0.01 and would be considered significant by the stated criterion.
For two-tailed tests in which the alternative hypothesis is that the pairs could differ in either direction, double
the probabilities stated at the head of the table. For sample sizes n > 59 compute
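
The bracketing probabilities described in this note can be checked by direct enumeration. The sketch below is not taken from the book: the function name `signed_rank_cdf` is hypothetical, and it assumes n untied, nonzero differences, so that under the null hypothesis every subset of the ranks 1..n is equally likely to carry the chosen sign. It counts the subsets whose rank sum does not exceed T and divides by 2^n.

```python
def signed_rank_cdf(n, t):
    """Exact one-tailed P(T <= t) for the signed-rank sum of n untied pairs.

    Assumption (not from the source): no ties and no zero differences, so
    each of the 2**n sign assignments is equally likely under the null.
    """
    max_sum = n * (n + 1) // 2
    # counts[s] = number of subsets of the ranks 1..n whose sum equals s
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # iterate downward so each rank is used at most once per subset
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    if t < 0:
        return 0.0
    t = min(int(t), max_sum)
    return sum(counts[: t + 1]) / 2 ** n


# Print the probabilities for a few (n, T) pairs that appear in the table and note.
for n, t in [(5, 0), (5, 1), (6, 2), (6, 3), (19, 37), (19, 38)]:
    print(n, t, round(signed_rank_cdf(n, t), 4))
```

For a two-tailed test the note's rule applies unchanged: double the value returned by the function.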
TABLE XIII
Critical values of the two-sample Kolmogorov-Smirnov statistic.
n2
n1 α 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
2 .05 16 18 20 22 24 26 26 28 30 32 34 36 38 38 40 42 44 46
.025 24 26 28 30 32 34 36 38 40 40 42 44 46 48
.01 38 40 42 44 46 48 50
3 .05 15 18 21 21 24 27 30 30 33 36 36 39 42 45 45 48 51 51 54 57 60
.025 18 21 24 27 30 30 33 36 39 39 42 45 48 51 51 54 57 60 60 63
.01 27 30 33 36 39 42 42 45 48 51 54 57 57 60 63 66 69
4 .05 16 20 20 24 28 28 30 33 36 39 42 44 48 48 50 53 60 59 62 64 68 68
.025 20 24 28 28 32 36 36 40 44 44 45 52 52 54 57 64 63 66 69 72 75
.01 - - - - 24 28 32 36 36 40 44 48 48 52 56 60 60 64 68 72 72 76 80 84
5 .05 15 20 25 24 28 30 35 40 39 43 45 46 55 54 55 60 61 65 69 70 72 76 80
.025 - 20 25 30 30 32 36 40 44 45 47 51 55 59 60 65 66 75 74 78 80 81 90
.01 - 25 30 35 35 40 45 45 50 52 56 60 64 68 70 71 80 80 83 87 90 95
6 .05 18 20 24 30 30 34 39 40 43 48 52 54 57 60 62 72 70 72 75 78 80 90 88
.025 18 24 30 36 35 36 42 44 48 54 54 58 63 64 67 78 76 78 81 86 86 96 96
.01 24 30 36 36 40 45 48 54 60 60 64 69 72 73 84 83 88 90 92 97 102 107
7 .05 21 24 28 30 42 40 42 46 48 53 56 63 62 64 68 72 76 79 91 84 89 92 97
.025 21 28 30 35 42 41 45 49 52 56 58 70 68 73 77 80 84 86 98 96 98 102 105
.01 28 35 36 42 48 49 53 59 60 65 77 75 77 84 87 91 93 105 103 108 112 115
Note: This table furnishes upper critical values of n1n2D, the Kolmogorov-Smirnov test statistic D multiplied
by the two sample sizes n1 and n2. Sample sizes n1 are given at the left margin of the table, while sample sizes
n2 are given across its top at the heads of the columns. The three values furnished at the intersection of two
sample sizes represent the following three two-tailed probabilities: 0.05, 0.025, and 0.01.
For two samples with n1 = 16 and n2 = 10, the 5% critical value of n1n2D is 84. Any value of n1n2D > 84
will be significant at P < 0.05.
When a one-sided test is desired, approximate probabilities can be obtained from this table by doubling
the nominal α values. However, these are not exact, since the distribution of cumulative frequencies is discrete.
This table was copied from table 55 in E. S. Pearson and H. O. Hartley, Biometrika Tables for Statisticians,
Vol. II (Cambridge University Press, London 1972) with permission of the publishers.
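
As an illustration of how the tabled quantity n1n2D is obtained from data, the following sketch computes it in integer arithmetic and applies the comparison rule stated in the note. The function names are hypothetical and the code is only a minimal sketch: it takes D as the largest absolute difference between the two cumulative relative frequency distributions, evaluated at every observed value.

```python
from bisect import bisect_right


def ks_scaled_statistic(sample1, sample2):
    """Return n1 * n2 * D for two samples, as an integer.

    D is the largest absolute difference between the two cumulative
    relative frequencies; multiplying by n1 * n2 keeps everything integral.
    """
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    largest = 0
    for x in s1 + s2:
        below1 = bisect_right(s1, x)  # observations <= x in sample 1
        below2 = bisect_right(s2, x)  # observations <= x in sample 2
        # n1*n2*|below1/n1 - below2/n2| == |n2*below1 - n1*below2|
        largest = max(largest, abs(n2 * below1 - n1 * below2))
    return largest


def exceeds_critical(statistic, critical_value):
    """Apply the note's rule: significant when n1*n2*D > the tabled value."""
    return statistic > critical_value
```

With n1 = 16 and n2 = 10, for example, the result of `ks_scaled_statistic` would be compared against the tabled 5% value of 84.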
TABLE XIII
continued
n1 α 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
12 .05 24 30 36 43 48 53 60 63 66 72 84 81 86 93 96 100 108 108 116 120 124 125 144 138
.025 24 33 40 45 54 56 64 69 72 76 96 84 94 99 104 108 120 120 124 129 134 137 156 150
.01 - 36 44 50 60 60 68 75 80 86 96 95 104 108 116 119 126 130 140 141 148 149 168 165
13 .05 26 33 39 45 52 56 62 65 70 75 81 91 89 96 101 105 110 114 120 126 130 135 140 145
.025 26 36 44 47 54 58 65 72 77 84 84 104 100 104 111 114 120 126 130 137 141 146 151 158
.01 - 39 48 52 60 65 72 78 84 91 95 117 104 115 121 127 131 138 143 150 156 161 166 172
14 .05 26 36 42 46 54 63 64 70 74 82 86 89 112 98 106 111 116 121 126 140 138 142 146 150
.025 28 39 44 51 58 70 70 76 82 87 94 100 112 110 116 122 126 133 138 147 148 154 160 166
.01 - 42 48 56 64 77 76 84 90 96 104 104 126 123 126 134 140 148 152 161 164 170 176 182
15 .05 28 36 44 55 57 62 67 75 80 84 93 96 98 120 114 116 123 127 135 138 144 149 156 160
.025 30 39 45 55 63 68 74 81 90 94 99 104 110 135 119 129 135 141 150 153 154 163 168 175
.01 - 42 52 60 69 75 81 90 100 102 108 115 123 135 133 142 147 152 160 168 173 179 186 195
16 .05 30 39 48 54 60 64 80 78 84 89 96 101 106 114 128 124 128 133 140 145 150 157 168 167
.025 32 42 52 59 64 73 80 85 90 96 104 111 116 119 144 136 140 145 156 157 164 169 184 181
.01 - 45 56 64 72 77 88 94 100 106 116 121 126 133 160 143 154 160 168 173 180 187 200 199
17 .05 32 42 48 55 62 68 77 82 89 93 100 105 111 116 124 136 133 141 146 151 157 163 168 173
.025 34 45 52 60 67 77 80 90 96 102 108 114 122 129 136 153 148 151 160 166 170 179 183 190
.01 - 48 60 68 73 84 88 99 106 110 119 127 134 142 143 170 164 166 175 180 187 196 203 207
18 .05 34 45 50 60 72 72 80 90 92 97 108 110 116 123 128 133 162 142 152 159 164 170 180 180
.025 36 48 54 65 78 80 86 99 100 107 120 120 126 135 140 148 162 159 166 174 178 184 198 196
.01 - 51 60 70 84 87 94 108 108 118 126 131 140 147 154 164 180 176 182 189 196 204 216 216
19 .05 36 45 53 61 70 76 82 89 94 102 108 114 121 127 133 141 142 171 160 163 169 177 183 187
.025 38 51 57 66 76 84 90 98 103 111 120 126 133 141 145 151 159 190 169 180 185 190 199 205
.01 38 54 64 71 83 91 98 107 113 122 130 138 148 152 160 166 176 190 187 199 204 209 218 224
20 .05 38 48 60 65 72 79 88 93 110 107 116 120 126 135 140 146 152 160 180 173 176 184 192 200
.025 40 51 64 75 78 86 96 100 120 116 124 130 138 150 156 160 166 169 200 180 192 199 208 215
.01 40 57 68 80 88 93 104 111 130 127 140 143 152 160 168 175 182 187 220 199 212 219 228 235
21 .05 38 51 59 69 75 91 89 99 105 112 120 126 140 138 145 151 159 163 173 189 183 189 198 202
.025 40 54 63 74 81 98 97 108 116 123 129 137 147 153 157 166 174 180 180 210 203 206 213 220
.01 42 57 72 80 90 105 107 117 126 134 141 150 161 168 173 180 189 199 199 231 223 227 237 244
22 .05 40 51 62 70 78 84 94 101 108 121 124 130 138 141 150 157 164 169 176 183 198 194 204 209
.025 42 57 66 78 86 96 102 110 118 132 134 141 148 154 164 170 178 185 192 203 220 214 222 228
.01 44 60 72 83 92 103 112 122 130 143 148 156 164 173 180 187 196 204 212 223 242 237 242 250
23 .05 42 54 64 72 80 89 98 106 114 119 125 135 142 149 157 163 170 177 184 189 194 230 205 216
.025 44 60 69 80 86 98 106 115 124 131 137 146 154 163 169 179 184 190 199 206 214 230 226 237
.01 46 63 76 87 97 108 115 126 137 142 149 161 170 179 187 196 204 209 219 227 237 253 249 262
24 .05 44 57 68 76 90 92 104 111 118 124 144 140 146 156 168 168 180 183 192 198 204 205 240 225
.025 46 60 72 81 96 102 112 120 128 137 156 151 160 168 184 183 198 199 208 213 222 226 264 238
.01 48 66 80 90 102 112 128 132 140 150 168 166 176 186 200 203 216 218 228 237 242 249 288 262
25 .05 46 60 68 80 88 97 104 114 125 129 138 145 150 160 167 173 180 187 200 202 209 216 225 250
.025 48 63 75 90 96 105 112 123 135 140 150 158 166 175 181 190 196 205 215 220 228 237 238 275
.01 50 69 84 95 107 115 125 135 150 154 165 172 182 195 199 207 216 224 235 244 250 262 262 300
TABLE XIV
Critical values for Kendall's rank correlation coefficient τ
α
n 0.10 0.05 0.01
4 1.000
5 0.800 1.000
Note: This table furnishes 0.10, 0.05, and 0.01 critical values for Kendall's rank correlation coefficient τ. The
probabilities are for a two-tailed test. When a one-tailed test is desired, halve the probabilities at the heads of
the columns.
To test the significance of a correlation coefficient, enter the table with the appropriate sample size and
find the appropriate critical value. For example, for a sample size of 15, the 5% and 1% critical values of τ are
0.390 and 0.505, respectively. Thus, an observed value of 0.498 would be considered significant at the 5% but
not at the 1% level. Negative correlations are considered as positive for purposes of this test. For sample sizes
n > 40 use the asymptotic approximation given in Box 12.3, step 5.
The values in this table have been derived from those furnished in table XI of J. V. Bradley, Distribution-Free
Statistical Tests (Prentice-Hall, Englewood Cliffs, N.J., 1968) with permission of the author and publisher.
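
The lookup procedure in this note can be sketched in a few lines. The code below is an illustration only, with hypothetical function names. It assumes untied observations and computes τ as the number of concordant pairs minus the number of discordant pairs, divided by n(n − 1)/2; it does not reproduce the tie-adjusted computation of Box 12.3. An observed value equal to the tabled value is treated as significant, which is also an assumption.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau for paired observations, assuming no ties (an assumption)."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            score += 1  # concordant pair
        elif product < 0:
            score -= 1  # discordant pair
    return score / (n * (n - 1) / 2)


def significance(tau, crit_05, crit_01):
    """Compare |tau| with the tabled two-tailed 5% and 1% critical values."""
    magnitude = abs(tau)  # negative correlations are treated as positive
    if magnitude >= crit_01:
        return "significant at 1%"
    if magnitude >= crit_05:
        return "significant at 5% but not at 1%"
    return "not significant at 5%"


# The note's example: n = 15, critical values 0.390 and 0.505, observed 0.498.
print(significance(0.498, 0.390, 0.505))
```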
Bibliography
Allee, W. C., and E. Bowen. 1932. Studies in animal aggregations: Mass protection against
colloidal silver among goldfishes. J. Exp. Zool., 61:185-207.
Allee, W. C., E. S. Bowen, J. C. Welty, and R. Oesting. 1934. The effect of homotypic
conditioning of water on the growth of fishes, and chemical studies of the factors
involved. J. Exp. Zool., 68:183-213.
Archibald, E. E. A. 1950. Plant populations. II. The estimation of the number of individuals
per unit area of species in heterogeneous plant populations. Ann. Bot. N.S.,
14:7-21.
Banta, A. M. 1939. Studies on the physiology, genetics, and evolution of some Cladocera.
Carnegie Institution of Washington, Dept. Genetics, Paper 39. 285 pp.
Blakeslee, A. F. 1921. The globe mutant in the jimson weed (Datura stramonium). Genetics,
6:241-264.
Block, B. C. 1966. The relation of temperature to the chirp-rate of male snowy tree
crickets, Oecanthus fultoni (Orthoptera: Gryllidae). Ann. Entomol. Soc. Amer.,
59:56-59.
Brower, L. P. 1959. Speciation in butterflies of the Papilio glaucus group. I. Morphological
relationships and hybridization. Evolution, 13:40-63.
Brown, B. E., and A. W. A. Brown. 1956. The effects of insecticidal poisoning on the level
of cytochrome oxidase in the American cockroach. J. Econ. Entomol., 49:675-679.
Wilkinson, L., and G. E. Dallal. 1977. Accuracy of sample moments calculations among
widely used statistical programs. Amer. Stat., 31:128-131.
Williams, D. A. 1976. Improved likelihood ratio tests for complete contingency tables.
Biometrika, 63:33-37.
Willis, E. R., and N. Lewis. 1957. The longevity of starved cockroaches. J. Econ.
Entomol., 50:438-440.
Willison, J. T., and L. M. Bufa. 1983. Myocardial infarction, 1983. Clinical Res.,
31:364-375.
Woodson, R. E., Jr. 1964. The geography of flower color in butterflyweed. Evolution,
18:143-163.
Young, B. H. 1981. A study of striped bass in the marine district of New York State.
Anadromous Fish Act, P.L. 89-304. Annual report, New York State Dept. of
Environmental Conservation. 21 pp.
Index
Analysis of variance continued
  See also Single-classification analysis of variance
  table, 150-151
  two-way, 185-207.
    See also Two-way analysis of variance
Angular transformation, 218
Anova. See Analysis of variance
Antimode, 33
Archibald, E. E. A., 16, 18, 349
Arcsine transformation, 218
Arithmetic mean, 28-30
Arithmetic probability graph paper, 86
Array, 16
Association, 312
  degree of, 269
  test of. See Test of independence
Assumptions in regression, 233-234
Attributes, 9-10
Average. See Mean
Average variance within groups, 136

b (regression coefficient), 232
bY·X (regression coefficient of variable Y on variable X), 232
β (parametric value for regression coefficient), 233
βj (fixed treatment effect of factor B on jth group), 195
Banta, A. M., 169, 349
Bar diagram, 23
Belt, confidence, 255
Bernoulli, J., 3
Biased estimator, 38
Bimodal distribution, 33, 85
Binomial distribution, 54-64, 296
  clumping in, 58-60
  confidence limits for, 227, Table IX, 333
  general formula for, 61
  parameters of, 60
  repulsion in, 58-60
Binomial probability (p, q), 54
  parametric (p, q), 60
Bioassay, 262
Biological statistics, 1
BIOM computer programs, 25
Biometry, 1
Biostatistics, 1
  history of, 2-4
Bivariate normal distribution, 272
Bivariate sample, 7
Bivariate scattergram, 272
Blakeslee, A. F., 209, 349
Block, B. C., 261, 349
Bonferroni method, 178-179
Bowen, E., 228, 349
Brower, L. P., 290, 349
Brown, A. W. A., 182, 349
Brown, F. M., 293, 350
Bufa, L. M., 221, 352

CD (coefficient of dispersion), 69
CT (correction term), 39, 161
χ² (chi-square), 112
χ²α[ν] (critical chi-square at probability level α, and degrees of freedom ν), 113, Table IV, 324
Calculator, 25
Carter, G. R., 264, 350
Causation, 257
Central limit theorem, 94
Central tendency, measure of, 28
Character, 7
Chi-square (χ²), 112
Chi-square distribution, 112-114
  sample statistic of (X²), 130, 300
Chi-square table, Table IV, 324
Chi-square test, 300-301
  of difference between sample and parametric variance, 129-130
  for goodness of fit, 300-301
Class(es), 134
  grouping of, 18-23
Class interval, 19
Class limits, implied, 11, 19
Class mark, 19
Clumped distribution, 58, 66, 70
Clumping:
  as a departure from binomial distribution, 58
  as a departure from Poisson distribution, 66, 70
Coding of data, 40-43
  additive, 40
  combination, 40
  multiplicative, 40
Coefficient:
  correlation. See Correlation coefficient
  of determination, 276
  of dispersion (CD), 69
  of rank correlation, Kendall's (τ), 286-290
    computation of, Box 12.3, 287-289, Table XIV, 348
  regression. See Regression coefficient
  of variation (V), 43
    standard error of, 102, 110
Combination coding, 40
Comparisons:
  paired, 204-207, 225-228, 277-279.
    See also Paired comparisons
  tests, multiple, 181
Computed variables, 13
Computer, 25
Comstock, W. P., 293, 350
Conditions for normal frequency distributions, 76-78
Difference continued
  between two regression coefficients, 256-257
  between two variances:
    computation of, Box 7.1, 142
    testing significance of, 142-143
Discontinuous variable, 9
Discrepance, 203
Discrete variables, 9
Dispersion:
  coefficient of, 69
  statistics of, 28, 34-43
Distribution:
  bimodal, 33, 85
  binomial, 54-64, 296
  bivariate normal, 272
  chi-square, 112-114, Table IV, 324
  clumped, 58, 66, 70
  contagious, 59, 66
  F, 138-142, Table V, 326
  frequency, 14-24
  function, cumulative and normal, 79
  leptokurtic, 85
  of means, 94-100
  multimodal, 33
  multinomial, 299, 319
  normal, 16, 74-91
  platykurtic, 85
  Poisson, 64-71
  probability, 47, 56
  repulsed, 58-60, 66, 71
  Student's t, 106-108, Table III, 323
Distribution-free methods. See Nonparametric tests
Dobzhansky, T., 44, 158, 350
Dosages, 262
Dosage-mortality curves, 262

εij (random deviation of the jth individual of group i), 155
ED50 (median effective dose), 33
Effects:
  main, 194
  random group, 149, 157
  treatment, 143
Ehrlich, P. R., 312
Empirically fitted curves, 258
Equality of a sample variance and a parametric variance, 129-130
Error(s):
  independence of, 212-213
  mean square, 153
  standard. See Standard error
  type I, 116-121
  type II, 117-125
Error rate, experimentwise, 178
Estimate:
  of added variance component, 167-168
  of mean, 41, Box 5.1, 88-89
  of standard deviation, 41, Box 5.1, 88-89
  of value of Y in regression, 237
Estimators:
  biased, 38
  unbiased, 38, 103
Events, 50
  independence of, 52
Expected frequencies, 56-57
  absolute, 57
  binomial, 56-57
  normal, 79
  Poisson, 68
  relative, 56-57
Expected mean squares, 163-164
Expected value, 98
  for Y, given X, 237
Explained mean square, 251
Explained sum of squares, 241
Extrinsic hypothesis, 300

f (observed frequency), 57
f̂ (absolute expected frequencies), 57
fij (observed frequency in row i and column j), 311
f̂rel (relative expected frequency), 57
F (variance ratio), 138-142
Fs (sample statistics of F distribution), 138
Fα[ν1,ν2] (critical value of the F distribution), 141, Table V, 326
Fmax (maximum variance ratio), 213, Table VI, 330
F distribution, 138-142, Table V, 326
  critical value of (Fα[ν1,ν2]), 141, Table V, 326
  sample statistics of (Fs), 138
F test, one-tailed, 140
F test, two-tailed, 141
Fmax test, 213
Factorial, mathematical operation, 61
Firschein, I. L., 44, 158, 350
Fisher, R. A., 3, 133, 139, 283
Freedom, degrees of, 38, 298-301
Frei, M., 266, 352
French, A. R., 210, 350
Frequencies:
  absolute expected (f̂), 57
  observed (f), 57
  relative expected (f̂rel), 56-57
Frequency distribution, 14-24
  computation of median of, 32
  of continuous variables, 18-24, 75-76
  graphic test for normality of, Box 5.1, 88-89
  L-shaped, 16, 69
  meristic, 18
  normal, 16, 74-91
  preparation of, Box 2.1, 20-21
L (likelihood ratio), 298
L1 (lower confidence limit), 104
L2 (upper confidence limit), 104
LD50 (median lethal dose), 33
Laplace, P. S., 3
Latshaw, W. L., 208, 351
Least squares, 235
Lee, J. A. H., 17, 350
Leinert, J., 200, 350
Leptokurtic curve, 85
Level, significance, 118-121
Lewis, N., 142, 352
Lewontin, R. C., 313, 350
Likelihood ratio test, 298
Limits:
  confidence. See Confidence limits
  implied class, 11, 19
Linear regression. See Regression
Littlejohn, M. J., 131, 350
Liu, Y. K., 36, 287, 350
Location, statistics of, 28-34
Log likelihood ratio test, 298
  sample statistic of (G), 298
Logarithmic transformation, 218, 260

MS (mean square), 151
MSŶ (mean square due to regression), 248
MSY·X (mean square for deviations from regression), 248
μ (parametric mean), 38
  confidence limits for, Box 6.2, 109
μY (expected value for variable Y for any given value of X), 233
μŶi (expected value for Ŷi), 255
Main effects, 194
Mann-Whitney sample statistic (Us), 220
Mann-Whitney statistic (Uα[n1,n2]), 222, Table XI, 339
Mann-Whitney U-test, 220-222
  computation for, Box 10.1, 221-222
  critical values in, 222, Table XI, 339
Mean(s):
  arithmetic (Ȳ), 28-30
  comparison of:
    planned, 173-179
    unplanned, 179-181
  computation of, 39-43
    from a frequency distribution, Box 3.2, 42
    from unordered data, Box 3.1, 41
  confidence limits for, 109-111
  deviation from (y), 36
  difference between two, 168-173
  distribution of, 94-100
  equality of two, 168-173
  estimates of, 38
  geometric (GMY), 31
  graphic estimate of, on probability paper, 87-89, Box 5.1, 88-89
  harmonic, 31
  mean of (Y̿), 136
  of Poisson distribution, 68-69
  parametric (μ), 38
  sample, 38
  of a sample, 30
  and ranges, correlation between, 211
  standard error of, 102
  sum of the deviations from, 37, 314-315
  t test of the difference between two, 169-173
  variance among, 98, 136-137
  and variances, correlation between, 214
  weighted, 30, 98
Mean square(s) (MS), 37, 151
  for deviations from regression (MSY·X), (s²Y·X), 248
  error, 153
  expected value of, 163-164
  explained, 251
  individual, 153
  intragroup, 153
  due to linear regression (MSŶ), (s²Ŷ), 248, 251
  total, 153, 251
  unexplained, 251
Measurement variables, 9
Median, 32-33
  effective dose (ED50), 33
  lethal dose (LD50), 33
  standard error of, 102
Meredith, H. V., 205, 350
Meristic frequency distribution, 18
Meristic variables, 9
Midrange, 41
Miller, L., 278
Miller, R. L., 26, 183, 351
Millis, J., 24, 42, 182, 350
Mitchell, C. A., 264, 350, 355
Mittler, T. E., 313, 350, 356
Mixed model two-way anova, 186, 199
Mode, 33-34
Model I anova, 148, 154-156
Model I regression:
  assumptions for, 233-234, 269-270
  with one Y per X, 235-243
  with several Y's per X, 243-249
Model II anova, 148-150, 157-158
  two-way, 185-207
Model II regression, 234-235, 269-270
Mosimann, J. E., 53, 350
Multimodal distributions, 33
Multinomial distributions, 299, 319
Multiple comparisons tests, 181
Multiplicative coding, 40

n (sample size), 29
n0 (average sample size in analysis of variance), 168
Two-by-two tests of independence, 308-310
  computation for, Box 13.2, 309
Two-tailed F test, 141
Two-tailed test, 64, 122
Two-way analysis of variance:
  with replication, 186-197
    computation of, Box 9.1, 187
  without replication, 199-207
    computation of, Box 9.2, 200
    significance testing for, 197-199
Two-way frequency distributions, 307
Two-way frequency table, 307
  test of independence for, 305-312
Type I error, 116-121
Type II error, 117-125

Us (Mann-Whitney sample statistic), 220
Uα[n1,n2] (Mann-Whitney statistic), 222, Table XI, 339
U-shaped frequency distributions, 16, 33
U-test, Mann-Whitney, 220-222
  computation for, Box 10.1, 221
  critical values in, 222, Table XI, 339
Unbiased estimators, 38, 103
Unexplained mean square, 251
Unexplained sum of squares, 241
Union, 50
Universe, 8
Unordered data, computation of Ȳ and s from, Box 3.1, 41
Unplanned comparisons, 174
  among means, 174-179
Utida, S., 71, 351

  among groups, 136-137
  homogeneity of, 213-214
  among means, 136-137
  of means, 98
  parametric (σ²), 38
  sample, 38
  of sum of two variables, 318
Variance ratio (F), 138-142
  maximum (Fmax), 213, Table VI, 330
Variate, 10
Variation, coefficient of, 43
Vollenweider, R. A., 266, 352

wi (weighting factor), 98
Weber-Fechner law, 260
Weighted average, 30, 98
Weighting factor (wi), 98
Weldon, W. F. R., 3
Welty, J. C., 349
White, M. J. D., 313, 350
Wilcoxon's signed-ranks test, 225-227
  computation for, Box 10.3, 226
  critical value of rank sum, 227, Table XII, 343
Wilkinson, L., 190, 352
Williams, A. W., 229, 351
Williams, D. A., 304, 352
Williams' correction, 304-305, 308
Willis, E. R., 142, 352
Willison, J. T., 221, 352
Woodson, R. E., Jr., 228, 352
Wright, S., 158, 226