Where Can Buy Introduction To Biostatistics Second Edition Robert R. Sokal Ebook With Cheap Price
INTRODUCTION TO
BIOSTATISTICS
SECOND EDITION
DOVER PUBLICATIONS, INC.
Mineola, New York
Copyright
Bibliographical Note
This Dover edition, first published in 2009, is an unabridged republication of
the work originally published in 1969 by W. H. Freeman and Company, New
York. The authors have prepared a new Preface for this edition.
Sokal, Robert R.
Introduction to Biostatistics / Robert R. Sokal and F. James Rohlf.
Dover ed.
p. cm.
Originally published: 2nd ed. New York : W. H. Freeman, 1969.
Includes bibliographical references and index.
ISBN-13: 978-0-486-46961-4
ISBN-10: 0-486-46961-1
1. Biometry. I. Rohlf, F. James, 1936- II. Title.
QH323.5.S633 2009
570.1'5195 dc22
2008048052
Manufactured in the United States of America
Dover Publications, Inc., 31 East 2nd Street, Mineola, N.Y. 11501
to Julie and Janice
Contents
PREFACE xiii
1. INTRODUCTION 1
1.1 Some definitions 1
1.2 The development of biostatistics 2
1.3 The statistical frame of mind 4
2. DATA IN BIOSTATISTICS 6
2.1 Samples and populations 7
2.2 Variables in biostatistics 8
2.3 Accuracy and precision of data 10
2.4 Derived variables 13
2.5 Frequency distributions 14
2.6 The handling of data 24
3. DESCRIPTIVE STATISTICS 27
3.1 The arithmetic mean 28
3.2 Other means 31
3.3 The median 32
3.4 The mode 33
3.5 The range 34
3.6 The standard deviation 36
3.7 Sample statistics and parameters 37
3.8 Practical methods for computing mean and standard deviation 39
3.9 The coefficient of variation 43
4. INTRODUCTION TO PROBABILITY DISTRIBUTIONS:
THE BINOMIAL AND POISSON DISTRIBUTIONS 46
4.1 Probability, random sampling, and hypothesis testing 48
4.2 The binomial distribution 54
4.3 The Poisson distribution 63
APPENDIXES 314
A1 Mathematical appendix 314
A2 Statistical tables 320
BIBLIOGRAPHY 349
INDEX 353
Preface to the Dover Edition
We are pleased and honored to see the re-issue of the second edition of our Introduc-
tion to Biostatistics by Dover Publications. On reviewing the copy, we find there
is little in it that needs changing for an introductory textbook of biostatistics for an
advanced undergraduate or beginning graduate student. The book furnishes an intro-
duction to most of the statistical topics such students are likely to encounter in their
courses and readings in the biological and biomedical sciences.
The reader may wonder what we would change if we were to write this book anew.
Because of the vast changes that have taken place in modalities of computation in the
last twenty years, we would deemphasize computational formulas that were designed
for pre-computer desk calculators (an age before spreadsheets and comprehensive
statistical computer programs) and refocus the reader's attention to structural for-
mulas that not only explain the nature of a given statistic, but are also less prone to
rounding error in calculations performed by computers. In this spirit, we would omit
the equation (3.8) on page 39 and draw the readers' attention to equation (3.7) instead.
Similarly, we would use structural formulas in Boxes 3.1 and 3.2 on pages 41 and 42,
respectively; on page 161 and in Box 8.1 on pages 163/164, as well as in Box 12.1
on pages 278/279.
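The rounding-error point can be illustrated with a minimal Python sketch. Because the equations themselves are not reproduced in this excerpt, it assumes that the structural formula for a sum of squares is the sum of squared deviations from the mean and that the desk-calculator formula is the sum of squared values minus the squared total divided by n. The function names and the sample values are hypothetical.

```python
def ss_structural(values):
    """Sum of squares from deviations about the mean (assumed structural form)."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values)


def ss_computational(values):
    """Sum of squares as sum(Y^2) - (sum Y)^2 / n (assumed desk-calculator form)."""
    n = len(values)
    return sum(y * y for y in values) - sum(values) ** 2 / n


# Hypothetical data: small deviations riding on a very large common offset.
data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]

# The deviations are -6, -3, 3, 6, so the exact sum of squares is 90.
print(ss_structural(data))
# Here the squared terms are near 1e18, beyond what a double can hold exactly,
# so the subtraction cannot recover the small true value.
print(ss_computational(data))
```

With data of ordinary magnitude the two forms usually agree to many digits; the contrast only becomes visible when the spread is tiny relative to the mean.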
Secondly, we would put more emphasis on permutation tests and resampling methods.
Permutation tests and bootstrap estimates are now quite practical. We have found this
approach to be not only easier for students to understand but in many cases preferable
to the traditional parametric methods that are emphasized in this book.
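As a rough illustration of why permutation tests are now practical, here is a minimal Python sketch of a two-sample permutation test. The choice of test statistic (absolute difference in means), the function name, and the default number of permutations are assumptions made for this example, not a procedure taken from the book.

```python
import random


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch)."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_difference(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_difference(pooled)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign observations to the two groups at random
        if mean_difference(pooled) >= observed:
            hits += 1
    # Count the observed arrangement itself so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical measurements for two groups.
p_value = permutation_test([1, 2, 3, 4, 5, 6, 7, 8], [101, 102, 103, 104, 105, 106, 107, 108])
print(p_value)
```

The p-value is simply the share of random regroupings that give a difference at least as large as the one observed, which is the idea students tend to find easier than a parametric derivation.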
Robert R. Sokal
F. James Rohlf
November 2008
Preface
Robert R. Sokal
F. James Rohlf
CHAPTER 1
Introduction
This chapter sets the stage for your study of biostatistics. In Section 1.1, we
define the field itself. We then cast a necessarily brief glance at its historical
development in Section 1.2. Then in Section 1.3 we conclude the chapter with
a discussion of the attitudes that the person trained in statistics brings to
biological research.
sense as the scientific study of numerical data based on natural phenomena. All
parts of this definition are important and deserve emphasis:
Scientific study: Statistics must meet the commonly accepted criteria of
validity of scientific evidence. We must always be objective in presentation and
evaluation of data and adhere to the general ethical code of scientific
methodology, or we may find that the old saying that "figures never lie, only
statisticians do" applies to us.
Data: Statistics generally deals with populations or groups of individuals;
hence it deals with quantities of information, not with a single datum. Thus, the
measurement of a single animal or the response from a single biochemical test
will generally not be of interest.
Numerical: Unless data of a study can be quantified in one way or another,
they will not be amenable to statistical analysis. Numerical data can be
measurements (the length or width of a structure or the amount of a chemical in
a body fluid, for example) or counts (such as the number of bristles or teeth).
Natural phenomena: We use this term in a wide sense to mean not only all
those events in animate and inanimate nature that take place outside the control
of human beings, but also those evoked by scientists and partly under their
control, as in experiments. Different biologists will concern themselves with
different levels of natural phenomena; other kinds of scientists, with yet different
ones. But all would agree that the chirping of crickets, the number of peas in
a pod, and the age of a woman at menopause are natural phenomena. The
heartbeat of rats in response to adrenalin, the mutation rate in maize after
irradiation, or the incidence or morbidity in patients treated with a vaccine
may still be considered natural, even though scientists have interfered with the
phenomenon through their intervention. The average biologist would not
consider the number of stereo sets bought by persons in different states in a given
year to be a natural phenomenon. Sociologists or human ecologists, however,
might so consider it and deem it worthy of study. The qualification "natural
phenomena" is included in the definition of statistics mostly to make certain
that the phenomena studied are not arbitrary ones that are entirely under the
will and control of the researcher, such as the number of animals employed in
an experiment.
The word "statistics" is also used in another, though related, way. It can
be the plural of the noun statistic, which refers to any one of many computed
or estimated statistical quantities, such as the mean, the standard deviation, or
the correlation coefficient. Each one of these is a statistic.
Data in Biostatistics
Each biological discipline has its own set of variables, which may include
conventional morphological measurements; concentrations of chemicals in body
fluids; rates of certain biological processes; frequencies of certain events, as in
genetics, epidemiology, and radiation biology; physical readings of optical or
electronic machinery used in biological research; and many more.
We have already referred to biological variables in a general way, but we
have not yet defined them. We shall define a variable as a property with respect
to which individuals in a sample differ in some ascertainable way. If the property
does not differ within a sample at hand or at least among the samples being
studied, it cannot be of statistical interest. Length, height, weight, number of
teeth, vitamin C content, and genotypes are examples of variables in ordinary,
genetically and phenotypically diverse groups of organisms. Warm-bloodedness
in a group of mammals is not, since mammals are all alike in this regard.
Variables
    Measurement variables
        Continuous variables
        Discontinuous variables
    Ranked variables
    Attributes
Measurement variables are those measurements and counts that are expressed
numerically. Measurement variables are of two kinds. The first kind consists of
continuous variables, which at least theoretically can assume an infinite number
of values between any two fixed points. For example, between the two length
measurements 1.5 and 1.6 cm there are an infinite number of lengths that could
be measured if one were so inclined and had a precise enough method of
calibration. Any given reading of a continuous variable, such as a length of
1.57 mm, is therefore an approximation to the exact reading, which in practice
is unknowable. Many of the variables studied in biology are continuous
variables. Examples are lengths, areas, volumes, weights, angles, temperatures,
periods of time, percentages, concentrations, and rates.
Contrasted with continuous variables are the discontinuous variables, also
known as meristic or discrete variables. These are variables that have only
certain fixed numerical values, with no intermediate values possible in between.
Thus the number of segments in a certain insect appendage may be 4 or 5 or
6 but never 5½ or 4.3. Examples of discontinuous variables are numbers of a
given structure (such as segments, bristles, teeth, or glands), numbers of offspring,
numbers of colonies of microorganisms or animals, or numbers of plants in a
given quadrat.
Some variables cannot be measured but at least can be ordered or ranked
by their magnitude. Thus, in an experiment one might record the rank order
of emergence of ten pupae without specifying the exact time at which each pupa
emerged. In such cases we code the data as a ranked variable, the order of
emergence. Special methods for dealing with such variables have been
developed, and several are furnished in this book. By expressing a variable as a series
of ranks, such as 1, 2, 3, 4, 5, we do not imply that the difference in magnitude
between, say, ranks 1 and 2 is identical to or even proportional to the
difference between ranks 2 and 3.
Variables that cannot be measured but must be expressed qualitatively are
called attributes, or nominal variables. These are all properties, such as black
or white, pregnant or not pregnant, dead or alive, male or female. When such
attributes are combined with frequencies, they can be treated statistically. Of
80 mice, we may, for instance, state that four were black, two agouti, and the
Color Frequency
Black 4
Agouti 2
Gray 74
Total number of mice 80
Implied limits
one, an easy rule to remember is that the number of unit steps from the smallest
to the largest measurement in an array should usually be between 30 and 300.
Thus, if we are measuring a series of shells to the nearest millimeter and the
largest is 8 mm and the smallest is 4 mm wide, there are only four unit steps
between the largest and the smallest measurement. Hence, we should measure
our shells to one more significant decimal place. Then the two extreme
measurements might be 8.2 mm and 4.1 mm, with 41 unit steps between them (counting
the last significant digit as the unit); this would be an adequate number of unit
steps. The reason for such a rule is that an error of 1 in the last significant digit
of a reading of 4 mm would constitute an inadmissible error of 25%, but an error
of 1 in the last digit of 4.1 is less than 2.5%. Similarly, if we measured the height
of the tallest of a series of plants as 173.2 cm and that of the shortest of these
plants as 26.6 cm, the difference between these limits would comprise 1466 unit
steps (of 0.1 cm), which are far too many. It would therefore be advisable to
record the heights to the nearest centimeter, as follows: 173 cm for the tallest
and 27 cm for the shortest. This would yield 146 unit steps. Using the rule we
have stated for the number of unit steps, we shall record two or three digits for
most measurements.
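The unit-step rule is easy to check mechanically. The following Python sketch reproduces the shell and plant examples from the paragraph above; the function names are hypothetical, and measurements are passed as strings so that decimal arithmetic stays exact.

```python
from decimal import Decimal


def unit_steps(smallest, largest, unit):
    """Number of unit steps between the smallest and largest measurement."""
    return int((Decimal(largest) - Decimal(smallest)) / Decimal(unit))


def is_adequate(steps, low=30, high=300):
    """Apply the rule of thumb of 30 to 300 unit steps."""
    return low <= steps <= high


# Shells measured to the nearest millimeter, then to the nearest 0.1 mm.
print(unit_steps("4", "8", "1"), is_adequate(unit_steps("4", "8", "1")))
print(unit_steps("4.1", "8.2", "0.1"), is_adequate(unit_steps("4.1", "8.2", "0.1")))

# Plant heights recorded to 0.1 cm, then to the nearest centimeter.
print(unit_steps("26.6", "173.2", "0.1"), is_adequate(unit_steps("26.6", "173.2", "0.1")))
print(unit_steps("27", "173", "1"), is_adequate(unit_steps("27", "173", "1")))
```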
The last digit should always be significant; that is, it should imply a range
for the true measurement of from half a "unit step" below to half a "unit step"
above the recorded score, as illustrated earlier. This applies to all digits, zero
included. Zeros should therefore not be written at the end of approximate
numbers to the right of the decimal point unless they are meant to be significant
digits. Thus 7.80 must imply the limits 7.795 to 7.805. If 7.75 to 7.85 is implied,
the measurement should be recorded as 7.8.
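The implied limits of a recorded value follow directly from its last written digit. A minimal Python sketch, assuming the value is supplied as a string exactly as it was recorded (the function name is hypothetical):

```python
from decimal import Decimal


def implied_limits(recorded):
    """Return (lower, upper) limits: half a unit step either side of the recorded value."""
    value = Decimal(recorded)
    # One unit in the last recorded digit, e.g. 0.01 for "7.80" and 0.1 for "7.8".
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half = unit / 2
    return value - half, value + half


print(implied_limits("7.80"))  # limits 7.795 and 7.805
print(implied_limits("7.8"))   # limits 7.75 and 7.85
```

Passing a string rather than a float matters here, because a float cannot distinguish 7.80 from 7.8 and the trailing zero is exactly the information the rule depends on.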
When the number of significant digits is to be reduced, we carry out the
process of rounding off numbers. The rules for rounding off are very simple. A
digit to be rounded off is not changed if it is followed by a digit less than 5. If
the digit to be rounded off is followed by a digit greater than 5 or by 5 followed
by other nonzero digits, it is increased by 1. When the digit to be rounded off
is followed by a 5 standing alone or a 5 followed by zeros, it is unchanged if it
is even but increased by 1 if it is odd. The reason for this last rule is that when
such numbers are summed in a long series, we should have as many digits
raised as are being lowered, on the average; these changes should therefore
balance out. Practice the above rules by rounding off the following numbers to
the indicated number of significant digits:
Number      Significant digits desired    Answer
26.58       2                             27
133.7137    5                             133.71
0.03725     3                             0.0372
0.03715     3                             0.0372
18,316      2                             18,000
17.3476     3                             17.3
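The rounding rules above amount to "round half to even" applied at a chosen number of significant digits. A short Python sketch using the standard `decimal` module; the function name `round_sig` is hypothetical, and numbers are passed as strings to avoid binary floating-point artifacts.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_sig(number, digits):
    """Round to `digits` significant digits, sending a lone trailing 5 to the even digit."""
    value = Decimal(number)
    if value == 0:
        return value
    # Exponent of the last digit to keep, e.g. -4 for 0.03725 at 3 significant digits.
    exponent = value.adjusted() - digits + 1
    return value.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_EVEN)


for number, digits in [("26.58", 2), ("133.7137", 5), ("0.03725", 3),
                       ("0.03715", 3), ("18316", 2), ("17.3476", 3)]:
    print(number, digits, round_sig(number, digits))
```

Note that 18316 rounded to two significant digits is returned in exponent form (1.8E+4), which is numerically equal to 18,000.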
FIGURE 2.1
Sampling from a population of birth weights of infants (a continuous variable). A. A sample of 25.
B. A sample of 100. C. A sample of 500. D. A sample of 2000.
FIGURE 2.2
Bar diagram. Frequency of the sedge Carex flacca in 500 quadrats. Data from Table 2.2;
originally from Archibald (1950). Abscissa: number of plants per quadrat (0 to 8).
Variable    Frequency
Y           f
9           1
8           1
7           4
6           3
5           1
4           1
Phenotype    f
A-           86
aa           32
This tells us that there are two classes of individuals, those identified by the A-
phenotype, of which 86 were found, and those comprising the homozygote
recessive aa, of which 32 were seen in the sample.
An example of a more extensive qualitative frequency distribution is given
in Table 2.1, which shows the distribution of melanoma (a type of skin cancer)
over body regions in men and women. This table tells us that the trunk and
limbs are the most frequent sites for melanomas and that the buccal cavity, the
rest of the gastrointestinal tract, and the genital tract are rarely afflicted by this
TABLE 2.1
TABLE 2.2
A meristic frequency distribution.
Number of plants of the sedge Carex
flacca found in 500 quadrats.
Number of plants per quadrat    Frequency
0                               181
1                               118
2                               97
3                               54
4                               32
5                               9
6                               5
7                               3
8                               1
Total                           500
Original measurements
4.25-4.35    4.3    4      4.25-4.45    4.35    7
4.35-4.45    4.4    3
4.45-4.55    4.5    1      4.45-4.65    4.55    1      4.45-4.75    4.6
4.55-4.65    4.6    0
4.65-4.75    4.7    1      4.65-4.85    4.75    1
Σf                  25                          25                          25
Histogram of the original frequency distribution shown above and of the grouped distribution with 5 classes. Line below
abscissa shows class marks for the grouped frequency distribution. Shaded bars represent original frequency distribution;
hollow bars represent grouped distribution.
[Histogram. Abscissa: Y (femur length, in units of 0.1 mm), with class marks 3.4, 3.7, 4.0, 4.3, 4.6.]
Step 1     Step 2     ...   Step 7     ...   Completed array
 2 |        2 |              2 |              2 | 3
 3 |        3 |              3 |              3 | 67
 4 | 9      4 | 96           4 | 96           4 | 964
 5 |        5 |              5 | 5            5 | 5
 6 |        6 |              6 | 4            6 | 4
 7 |        7 |              7 |              7 | 13
 8 |        8 |              8 |              8 |
 9 |        9 |              9 | 1            9 | 18
10 |       10 |             10 |             10 |
11 |       11 |             11 |             11 |
12 |       12 |             12 | 7           12 | 7
13 |       13 |             13 |             13 |
14 |       14 |             14 |             14 |
15 |       15 |             15 |             15 |
16 |       16 |             16 | 3           16 | 3
17 |       17 |             17 |             17 |
18 |       18 |             18 |             18 | 0
FIGURE 2.3
Frequency polygon. Birth weights of 9465
male infants. Chinese third-class patients in
Singapore, 1950 and 1951. Data from Millis
and Seng (1954).
Exercises
2.1 Round the following numbers to three significant figures: 106.55, 0.06819, 3.0495,
7815.01, 2.9149, and 20.1500. What are the implied limits before and after rounding?
Round these same numbers to one decimal place.
ANS. For the first value: 107; 106.545-106.555; 106.5-107.5; 106.6
2.2 Differentiate between the following pairs of terms and give an example of each.
(a) Statistical and biological populations. (b) Variate and individual. (c) Accuracy
and precision (repeatability). (d) Class interval and class mark. (e) Bar diagram
and histogram. (f) Abscissa and ordinate.
2.3 Given 200 measurements ranging from 1.32 to 2.95 mm, how would you group
them into a frequency distribution? Give class limits as well as class marks.
2.4 Group the following 40 measurements of interorbital width of a sample of
domestic pigeons into a frequency distribution and draw its histogram (data from
Olson and Miller, 1958). Measurements are in millimeters.
12.2 12.9 11.8 11.9 11.6 11.1 12.3 12.2 11.8 11.8
10.7 11.5 11.3 11.2 11.6 11.9 13.3 11.2 10.5 11.1
12.1 11.9 10.4 10.7 10.8 11.0 11.9 10.2 10.9 11.6
10.8 11.6 10.4 10.7 12.0 12.4 11.7 11.8 11.3 11.1
2.5 How precisely should you measure the wing length of a species of mosquitoes
in a study of geographic variation if the smallest specimen has a length of about
2.8 mm and the largest a length of about 3.5 mm?
2.6 Transform the 40 measurements in Exercise 2.4 into common logarithms (use a
table or calculator) and make a frequency distribution of these transformed
variates. Comment on the resulting change in the pattern of the frequency
distribution from that found before.
2.7 For the data of Tables 2.1 and 2.2 identify the individual observations, samples,
populations, and variables.
2.8 Make a stem-and-leaf display of the data given in Exercise 2.4.
2.9 The distribution of ages of striped bass captured by hook and line from the East
River and the Hudson River during 1980 was reported as follows (Young, 1981):
Age    f
1      13
2      49
3      96
4      28
5      16
6      8
Show this distribution in the form of a bar diagram.
CHAPTER 3
Descriptive Statistics
14.9
10.8
12.3
23.3
Sum = 61.3
Mean = 15.325%
$Y_1, Y_2, Y_3, Y_4$
$Y_1, Y_2, \ldots, Y_n$
$$\sum_{i=1}^{n} Y_i = Y_1 + Y_2 + \cdots + Y_n$$
$$\sum_{i=1}^{i=n} Y_i = \sum_{i=1}^{n} Y_i = \sum_{i} Y_i = \sum^{n} Y = \sum Y$$
The third symbol might be interpreted as meaning, "Sum the $Y_i$'s over all
available values of i." This is a frequently used notation, although we shall
not employ it in this book. The next, with n as a superscript, tells us to sum n
items of Y; note that the i subscript of the Y has been dropped as
unnecessary. Finally, the simplest notation is shown at the right. It merely says sum
the Y's. This will be the form we shall use most frequently: if a summation sign
precedes a variable, the summation will be understood to be over n items (all
the items in the sample) unless subscripts or superscripts specifically tell us
otherwise.
This formula tells us, "Sum all the (n) items and divide the sum by n."
The mean of a sample is the center of gravity of the observations in the sample.
If you were to draw a histogram of an observed frequency distribution on a
sheet of cardboard and then cut out the histogram and lay it flat against a
blackboard, supporting it with a pencil beneath, chances are that it would be
out of balance, toppling to either the left or the right. If you moved the
supporting pencil point to a position about which the histogram would exactly
balance, this point of balance would correspond to the arithmetic mean.
We often must compute averages of means or of other statistics that may
differ in their reliabilities because they are based on different sample sizes. At
other times we may wish the individual items to be averaged to have different
weights or amounts of influence. In all such cases we compute a weighted
average. A general formula for calculating the weighted average of a set of
values $Y_i$ is as follows:
$$\bar{Y}_w = \frac{\sum^{n} w_i Y_i}{\sum^{n} w_i} \qquad (3.2)$$
where n variates, each weighted by a factor $w_i$, are being averaged. The values
of $Y_i$ in such cases are unlikely to represent variates. They are more likely to
be sample means $\bar{Y}_i$ or some other statistics of different reliabilities.
The simplest case in which this arises is when the $Y_i$ are not individual
variates but are means. Thus, if the following three means are based on differing
sample sizes, as shown,
$\bar{Y}_i$    $n_i$
3.85    12
5.21    25
4.70    8
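A weighted average of these three means, using the sample sizes as weights, can be sketched in Python as follows. The function name is hypothetical, and the third sample size is taken to be 8, which is an assumption because that entry is garbled in this copy.

```python
def weighted_average(values, weights):
    """Weighted average: sum of w_i * Y_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


means = [3.85, 5.21, 4.70]
sample_sizes = [12, 25, 8]  # third entry assumed to be 8

print(round(weighted_average(means, sample_sizes), 2))  # weighted by sample size
print(round(sum(means) / len(means), 2))                # unweighted, for comparison
```

The weighted result is pulled toward 5.21 because that mean rests on the largest sample.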
$$GM_Y = \sqrt[n]{\prod_{i=1}^{n} Y_i} \qquad (3.4a)$$
$$\frac{1}{H_Y} = \frac{1}{n} \sum \frac{1}{Y}$$
You may wish to convince yourself that the geometric mean and the harmonic
mean of the four oxygen percentages are 14.65% and 14.09%, respectively.
Unless the individual items do not vary, the geometric mean is always less than
the arithmetic mean, and the harmonic mean is always less than the geometric
mean.
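One quick way to convince yourself is with Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). The four values are the oxygen percentages listed at the start of the chapter; the variable names are illustrative.

```python
import statistics

oxygen = [14.9, 10.8, 12.3, 23.3]  # the four oxygen percentages

arithmetic = statistics.mean(oxygen)
geometric = statistics.geometric_mean(oxygen)
harmonic = statistics.harmonic_mean(oxygen)

print(round(arithmetic, 3))  # 15.325
print(round(geometric, 2))   # 14.65
print(round(harmonic, 2))    # 14.09
```

The ordering harmonic < geometric < arithmetic seen here holds for any set of positive values that are not all equal.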
Some beginners in statistics have difficulty in accepting the fact that
measures of location or central tendency other than the arithmetic mean are
permissible or even desirable. They feel that the arithmetic mean is the "logical"
the median would be the midpoint between the second and third items, or 15.5.
Whenever any one value of a variate occurs more than once, problems may
develop in locating the median. Computation of the median item becomes more
involved because all the members of a given class in which the median item is
located will have the same class mark. The median then is the (n/2)th variate
in the frequency distribution. It is usually computed as that point between the
class limits of the median class where the median individual would be located
(assuming the individuals in the class were evenly distributed).
The median is just one of a family of statistics dividing a frequency
distribution into equal areas. It divides the distribution into two halves. The three
quartiles cut the distribution at the 25, 50, and 75% points, that is, at points
dividing the distribution into first, second, third, and fourth quarters by area
(and frequencies). The second quartile is, of course, the median. (There are also
quintiles, deciles, and percentiles, dividing the distribution into 5, 10, and 100
equal portions, respectively.)
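The relation between the median and the quartiles can be checked with the standard `statistics` module (`quantiles` requires Python 3.8 or later). The four-item sample below is hypothetical, chosen only so that the midpoint between its second and third items is 15.5.

```python
import statistics

sample = [14, 15, 16, 19]  # hypothetical sample with an even number of items

median = statistics.median(sample)
# Three cut points dividing the distribution into four quarters.
quartiles = statistics.quantiles(sample, n=4)

print(median)        # 15.5, midway between the second and third items
print(quartiles[1])  # the second quartile coincides with the median
```

Different software packages interpolate the first and third quartiles slightly differently for small samples, so only the second quartile is compared here.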
Medians are most often used for distributions that do not conform to the
standard probability models, so that nonparametric methods (see Chapter 10)
must be used. Sometimes the median is a more representative measure of
location than the arithmetic mean. Such instances almost always involve asymmetric
FIGURE 3.1
An asymmetrical frequency distribution (skewed to the right) showing location of the mean, median,
and mode. Percent butterfat in 120 samples of milk (from a Canadian cattle breeders' record book).
FIGURE 3.2
Three frequency distributions having identical means and sample sizes but differing in dispersion
pattern.
TABLE 3.1
$\sum Y = 115.7$
Mean $\bar{Y} = 7.713$
$$\text{Variance} = \frac{\sum y^2}{n} = \frac{308.7770}{15} = 20.5851$$
The variance is a measure of fundamental importance in statistics, and we
shall employ it throughout this book. At the moment, we need only remember
that because of the squaring of the deviations, the variance is expressed in
squared units. To undo the effect of the squaring, we now take the positive
square root of the variance and obtain the standard deviation:
(3.6)
We note that this value is slightly larger than our previous estimate of 4.537.
Of course, the greater the sample size, the less difference there will be between
division by n and by n − 1. However, regardless of sample size, it is good
practice to divide a sum of squares by n − 1 when computing a variance or
standard deviation. It may be assumed that when the symbol $s^2$ is encountered,
it refers to a variance obtained by division of the sum of squares by the degrees
of freedom, as the quantity n − 1 is generally referred to.
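The effect of dividing by n − 1 rather than n can be seen numerically. This Python sketch starts from the sum of squares (308.7770) and sample size (15) quoted earlier in the section; the function name and the `ddof` parameter are illustrative choices, not notation from the book.

```python
import math


def variance_from_ss(sum_of_squares, n, ddof=1):
    """Variance from a sum of squares; ddof=1 divides by the degrees of freedom n - 1."""
    return sum_of_squares / (n - ddof)


ss, n = 308.7770, 15

sd_n = math.sqrt(variance_from_ss(ss, n, ddof=0))    # division by n
sd_df = math.sqrt(variance_from_ss(ss, n, ddof=1))   # division by n - 1

print(round(sd_n, 3))   # 4.537, the earlier estimate
print(round(sd_df, 3))  # slightly larger
```

As n grows, the ratio n/(n − 1) approaches 1 and the two estimates converge.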
Division of the sum of squares by n is appropriate only when the interest
of the investigator is limited to the sample at hand and to its variance and