
Statistics
for technology
A COURSE IN APPLIED STATISTICS
Third edition (Revised)

Christopher Chatfield
Reader in Statistics
Bath University, UK

CHAPMAN & HALL/CRC


Boca Raton London New York Washington, D.C.
Library of Congress Cataloging-in-Publication Data

Catalog record is available from the Library of Congress

This book contains information obtained from authentic and highly regarded sources. Reprinted material
is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable
efforts have been made to publish reliable data and information, but the author and the publisher cannot
assume responsibility for the validity of all materials or for the consequences of their use.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as
permitted under the UK Copyright Designs and Patents Act, 1988, this publication may not be reproduced,
stored or transmitted, in any form or by any means, electronic or mechanical, including photocopying,
microfilming, and recording, or by any information storage or retrieval system, without the prior
permission in writing of the publishers, or in the case of reprographic reproduction only in accordance with
the terms of the licenses issued by the Copyright Licensing Agency in the UK, or in accordance with
the terms of the license issued by the appropriate Reproduction Rights Organization outside the UK.
The consent of CRC Press LLC does not extend to copying for general distribution, for promotion,
for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press
LLC for such copying.
Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Tradem ark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 1983 by Christopher Chatfield

First edition 1970


Reprinted 1975
Second edition 1978
Third edition 1983
Reprinted 1985, 1986, 1988
Reprinted with revisions 1989, 1991 (twice), 1992, 1994
First CRC Press reprint 1999
Originally published by Chapman & Hall

No claim to original U.S. Government works


International Standard Book Number 0-412-25340-2
Printed in the United States of America
Printed on acid-free paper
Contents

Preface 11

Part One Introduction

1 Outline of statistics 15

2 Simple ways of summarizing data 20


2.1 Introduction 20
2.2 Graphical methods 21
2.3 Summary statistics 28

Part Two Theory

3 The concept of probability 37


3.1 Probability and statistics 37
3.2 Some definitions 38
3.3 Types of events 40
3.4 Permutations and combinations 49
3.5 Random variables 51

4 Discrete distributions 56
4.1 The discrete probability distribution 56
4.2 The binomial distribution 57
4.3 The binomial model 59
4.4 Types of distribution 64
4.5 The mean and variance of the binomial distribution 64
4.6 The Poisson distribution 69
4.7 Bivariate discrete distributions 75

5 Continuous distributions 81
5.1 Definitions 81
5.2 The mean and variance of continuous distributions 86
5.3 The normal distribution 87
5.4 Uses of the normal distribution 92
5.5 Normal probability paper 95
5.6 The exponential distribution 98
5.7 Bivariate continuous distributions 103

6 Estimation 106
6.1 Point and interval estimates 106
6.2 Properties of the expected value 107
6.3 The sampling distribution of x 111
6.4 The sampling distribution of s² 116
6.5 Some properties of estimators 118
6.6 General methods of point estimation 121
6.7 Interval estimation 126

7 Significance tests 134


7.1 Introduction 134
7.2 Tests on a sample mean 140
7.3 Comparing two sample means 143
7.4 The t-test applied to paired comparisons 147
7.5 The goodness-of-fit test 148
7.6 The F-test 155
7.7 Distribution-free or non-parametric tests 157
7.8 Power and other considerations 158

8 Regression and correlation 166


8.1 Scatter diagram 166
8.2 Curve fitting 167
8.3 Regression 171
8.4 Confidence intervals and significance tests in linear regression 174
8.5 The coefficient of determination 177
8.6 Multiple and curvilinear regression 178
8.7 Orthogonal polynomials 180
8.8 The design of regression experiments 185
8.9 The correlation coefficient 185
8.10 Estimating the regression lines 191
8.11 The bivariate normal distribution 194
8.12 Interpretation of the correlation coefficient 196

Part Three Applications


9 Planning the experiment 203
9.1 Preliminary remarks 203
9.2 Measurements 204
9.3 The propagation of error 206
9.4 Improving precision with series and parallel arrangements 215
9.5 Combining dissimilar estimates by the method of least squares 216

10 The design and analysis of experiments - 1 Comparative experiments 224


10.1 Some basic considerations in experimental design 224
10.2 A mathematical model for simple comparative experiments 226

10.3 The number of replications 227
10.4 Randomization 230
10.5 The analysis of a randomized comparative experiment 231
10.6 The range test 235
10.7 One-way analysis of variance 237
10.8 Follow-up study of the treatment means 241
10.9 Verifying the model 243
10.10 The randomized block experiment 244
10.11 Two-way analysis of variance 248
10.12 Latin squares 252
10.13 Balanced incomplete block designs 253

11 The design and analysis of experiments - 2 Factorial experiments 257


11.1 Introduction 257
11.2 The advantages of complete factorial experiments 258
11.3 The design of complete factorial experiments 260
11.4 The analysis of a complete factorial experiment 263
11.5 Follow-up procedure 269
11.6 The 2ⁿ factorial design 271
11.7 Fixed effects and random effects 276
11.8 Other topics 277
11.9 The examination of residuals 279
11.10 Determination of optimum conditions 280
11.11 Summary 285

12 Quality control 288


12.1 Acceptance sampling 288
12.2 Operating characteristic curve 289
12.3 Types of sampling schemes 293
12.4 Rectifying schemes 296
12.5 The military standard plan 297
12.6 Sampling by variables 298
12.7 Practical problems 298
12.8 Process control 299
12.9 Control charts for samples 301
12.10 Cusum charts 306
12.11 Prediction, system identification and control 312

13 Life testing 319


13.1 Problems in measuring reliability 319
13.2 The mathematical distribution of failure times 321
13.3 The exponential distribution 323
13.4 Estimating the conditional failure rate 324
13.5 The Weibull distribution 327

Appendix A The relationships between the normal, t- and
F-distributions 332

Appendix B Statistical tables 335


Table 1. Areas under the normal curve 335
Table 2. Percentage points of Student’s t-distribution 336
Table 3. Percentage points of the χ² distribution 337
Table 4. Upper percentage points of the F-distribution 338
Table 5. Values of e⁻ˣ 341
Table 6. Percentage points of the distribution of the Studentized range 342
Table 7. Random numbers 344

Appendix C Further reading 346

Appendix D Some other topics 351


D.1 Calculating the mean and standard deviation of a frequency
distribution 351
D.2 Interpretation of the sample standard deviation 354
D.3 How to round numbers 355
D.4 Stem-and-leaf plots and box plots 360
D.5 Estimating and testing a proportion 363

Appendix E Some general comments on tackling statistical problems 365


E.1 Preliminary questions 365
E.2 Collecting the data 366
E.3 Preliminary data analysis 367
E.4 Choice of a more elaborate method 369
E.5 Using a computer 370
E.6 Using a library 371
E.7 Presenting the results 371

Answers to exercises 373

Index 377

‘It’s perfectly intelligible,’ the captain said, in an offended
tone, ‘to anyone that understands such things.’
Preface

This book provides an introduction to statistics, with particular


emphasis on applications in the applied sciences and engineering. The
book may be used as a text for a basic course in statistics or for self-
tuition. Although the book was originally intended for ‘service’ courses
to scientists and engineers, I have been pleased to find that the book
has also been widely used for applied statistics courses for students in
mathematics and statistics departments. Although ‘technology’ was an
‘in’ word when the book was written, it may be that a better title today
would be ‘A Course in Applied Statistics’.
The book is divided into three sections. Part One includes an
introduction and some material on descriptive statistics. Part Two
deals with the theory of probability and statistics, while Part Three
considers some applications, including the design and analysis of
experiments, quality control and life-testing. The reader who is
specifically interested in one of these last two topics can proceed
directly from Chapter 7 to Chapter 12 or 13 as appropriate.
The favourable reaction to the first and second editions has prompted
me to make relatively few changes for the third edition. I have clarified
and updated the text in many places but without changing the page
numbering. This should be helpful to teachers who are used to earlier
editions. It will also keep costs down and avoid introducing new
typographical errors.
Appendix E was added to the text in the third edition. It makes some
general comments on how to tackle statistical problems and I strongly
recommend the reader to study this Appendix after taking a conventional
introductory Statistics course. The Appendix includes remarks on such
topics as processing the data, using a library and writing a report. These
vital topics are often omitted from conventional courses.
I have kept the mathematics throughout the book as simple as
possible; an elementary knowledge of calculus plus the ability to
manipulate algebraic formulae are all that is required. I have tried to
introduce the theory of statistics in a comprehensible way without

getting too involved in mathematical details. A few results are stated
without proof where this is unlikely to affect the student’s comprehen­
sion. However, I have tried to explain carefully the basic concepts of
the subject, such as probability and sampling distributions; these the
student must understand. The worst abuses of statistics arise in the
‘cook-book’ approach to the subject where scientists try to analyse
their data by substituting measurements into statistical formulae which
they do not understand.
Many readers will have access to a computer, or at least to a
microcomputer or sophisticated pocket calculator. The statistical
software which is readily available has taken much of the drudgery
out of statistics. There are some remarks on using a computer in
Appendix E.5. I have amended the text in a number of places to make
it clear that the reader need no longer worry too much about some
computational details, although there is much to be said for working
through some of the methods in detail at least once in order to
understand fully what is going on.
I am grateful to many people for constructive comments on earlier
editions and I am always pleased to hear from any reader with new
ideas.
I am indebted to Biometrika trustees for permission to publish
extracts from Biometrika Tables for Statisticians (Appendix B, Table 6)
and to Messrs Oliver and Boyd Ltd, Edinburgh for permission to
publish extracts from Statistical Tables for Biological, Agricultural and
Medical Research (Appendix B, Table 7).
The quotations are, of course, from the works of Charles Lutwidge
Dodgson, better known as Lewis Carroll.

Christopher Chatfield
Bath University
May, 1989

Part one
Introduction

‘Surely,’ said the governor, ‘Her Radiancy would admit


that ten is nearer to ten than nine is - and also nearer
than eleven is.’
Chapter 1
Outline of statistics

Statistical methods are useful in many types of scientific investigation.


They constitute the science of collecting, analysing and interpreting data
in the best possible way. Statistics is particularly useful in situations
where there is experimental uncertainty and may be defined as ‘the
science of making decisions in the face of uncertainty’. We begin with
some scientific examples in which experimental uncertainty is present.

Example 1
The thrust of a rocket engine was measured at ten-minute intervals
while being run at the same operating conditions. The following thirty
observations were recorded (in newtons × 10^ ):

999.1 1003.2 1002.1 999.2 989.7 1006.7
1012.3 996.4 1000.2 995.3 1008.7 993.4
998.1 997.9 1003.1 1002.6 1001.8 996.5
992.8 1006.5 1004.5 1000.3 1014.5 998.6
989.4 1002.9 999.3 994.7 1007.6 1000.9

The observations vary between 989.4 and 1014.5 with an average


value of about 1000. There is no apparent reason for this variation
which is of course small compared with the absolute magnitude of each
observation; nor do the variations appear to be systematic in any way.
Any variation in which there is no pattern or regularity is called
random variation. In this case if the running conditions are kept uniform
we can predict that the next observation will also be about a thousand
together with a small random quantity which may be positive or
negative.

Example 2
The numbers of cosmic particles striking an apparatus in forty con­
secutive periods of one minute were recorded as follows.

3 0 0 1 0 2 1 0 1 1
0 3 4 1 2 0 2 0 3 1
1 0 1 2 0 2 1 0 1 2
3 1 0 0 2 1 0 3 1 2

The observations vary between zero and four, with zero and one
observed more frequently than two, three and four. Again there is
experimental uncertainty since we cannot exactly predict what the
next observation would be. However, we expect that it will also be
between zero and four and that it is more likely to be a zero or a one
than anything else. In Chapter 4 we will see that there is indeed a
pattern in this data even though individual observations cannot be
predicted.

Example 3
Twenty refrigerator motors were run to destruction under advanced
stress conditions and the times to failure (in hours) were recorded as
follows.

104.3 158.7 193.7 201.3 206.2
227.8 249.1 307.8 311.5 329.6
358.5 364.3 370.4 380.5 394.6
426.2 434.1 552.6 594.0 691.5

We cannot predict exactly how long an individual motor will last,


but, if possible, we would like to predict the pattern of behaviour of a
batch of motors. For example we might want to know the over-all
proportion of motors which last longer than one week (168 hours).
This problem will be discussed in Chapter 13.
When the scientist or engineer finishes his education and enters
industry for the first time, he must be prepared to be faced frequently
with situations which involve experimental uncertainty. The purpose of
this book is to provide the scientist with methods for treating these
uncertainties. These methods have proved to be very useful in both
industry and research.

A scientific experiment has some or all of the following character­
istics.
(1) The physical laws governing the experiment are not entirely under­
stood.
(2) The experiment may not have been done before, at least success­
fully, in which case the instrumentation and technique are not fully
developed.
(3) There are strong incentives to run the smallest number of the
cheapest tests as quickly as possible.
(4) The experimenter may not be objective, as for example when an
inventor tests his own invention or when a company tests competitive
products.
(5) Experimental results are unexpected or disappointing. (Engineers
explain disappointing results as illustrating Murphy’s law. The
corresponding law for statistics might be phrased ‘if two events are
equally likely to occur, the worse will happen’.)
(6) Although experimental uncertainty may be present, many industrial
situations require decisions to be made without additional testing or
theoretical study.
To illustrate how statistics can help at different stages of an experi­
ment, let us assume that we have been given the task of improving the
performance of a space pump. The basic design of this machine is
illustrated in Figure 1: gas is passing through the pump which is
driven by an electric motor.
The first step is to study the current technology. Statistical methods
are never a substitute for understanding the physical laws governing

Figure 1 Diagram of a space pump (gas passes from inlet temperature and pressure to outlet temperature and pressure)

the problem under consideration; rather statistics is a tool for the
scientist in the same sense as differential equations and digital
computers.
The second step is to define the objective of the test program as
precisely as possible. For example we may be interested in improving
flow, pressure rise or efficiency, or reducing weight or noise. Alterna­
tively we may be interested in improving the reliability of the machine
by increasing the time between breakdowns.
The objective being defined, a list of variables or factors can be
made which will vary or be varied during the test program. In the space-
pump experiment, the variables include power, inlet temperature, inlet
pressure and speed. The test program must be designed to find the best
way of choosing successive values for the different factors. The problems
involved in designing an experiment are discussed in Chapters 10 and
11.
In order to see if the objective is being achieved, we look at the outputs
or responses. In the space-pump example, these include pressure rise,
flow, efficiency and reliability. Note that these responses are probably
interconnected. For example it may be pointless to increase the flow if,
as a result, the efficiency drops or the reliability falls off (that is, the
pump breaks down more frequently).
During the experiment, measurements of the factors and responses
will be made. The article being tested should be instrumented to obtain
the most precise and accurate data. The problems of instrumentation
are discussed in Chapter 9. The analysis of the resulting data should
attempt to determine not only the individual effect of each variable on
the responses, but also the joint effects of several variables, the inter­
actions. The process of estimating these effects is called inference. These
estimates should have known statistical properties such as a lack of
systematic error (called unbiasedness). Finally any conclusions or
recommendations should be given together with a measure of the risk
involved and/or the possible error in the estimates.
In order to carry out the above procedure successfully it is usually
helpful to set up a mathematical model to describe the physical situation.
This model is an attempt to formulate the system in mathematical
terms and may, for example, consist of a series of formulae connecting
the variables of interest. This model may involve unknown coefficients
or parameters which have to be estimated from the data. After this
has been done, the next step is to test the validity of the model. Finally,
if the model is sound, the solution to the problem can be obtained from
it.

The key point to note in this Chapter is that experimental
uncertainty is present in most practical situations and that there is
nothing abnormal about it. Despite its presence, the study of statistics
enables us to collect, analyse and interpret data in a sensible way.

Additional reading
Many other books provide an introduction to statistical techniques.
The references in the list given below have been selected to provide a
non-technical easy-to-read approach. They motivate the subject,
provide examples where statistics has been used successfully, and also
give examples where statistics has been used in a misleading way. The
book by Darrell Huff is particularly amusing and could profitably be
read by everyone. The book by Peter Sprent explains the key prin­
ciples of Applied Statistics, and would serve as a useful complement
to a more formal text book such as this one. The text book by Andrew
Ehrenberg adopts a refreshing non-technical approach with the
emphasis on improving numeracy.

Ehrenberg, A. S. C. (1982), A Primer in Data Reduction, Wiley.


Hopkins, H. (1973), The Numbers Game, Secker and Warburg.
Huff, D. (1973), How to Lie with Statistics, 2nd edn, Penguin Books.
Reichmann, W. J. (1961), Use and Abuse of Statistics, Chapman and Hall; paperback edn (1964), Penguin Books.
Sprent, P. (1977), Statistics in Action, Penguin Books.

Chapter 2
Simple ways of summarizing
data

2.1 Introduction
The subject ‘Statistics’ is concerned with collecting reliable data and then
analysing and interpreting them. We begin by introducing some simple
methods for summarizing data which are often given the collective title
‘Descriptive Statistics’. They include plotting appropriate graphs and
calculating various summary statistics, possibly using a computer package
such as MINITAB. Note that some supplementary material has been
added in Appendices D .l to D.4 and E.3. The idea is to get a ‘feel’ for
the data and pick out their most important features. The quality of the
data should also be assessed by looking for errors, outliers (see section
11.9), missing observations and other peculiarities. This preliminary
analysis is always advisable before choosing a more advanced statistical
technique, and may prove sufficient anyway.
It may seem strange to consider data analysis before data collection but
it turns out that the topics are easier to learn in this order. Nevertheless,
it is clear that the selected sample must be chosen to be representative of
the population from which it is drawn. Statistics will then enable us to
make deductions about the whole population from the information in the
sample.
The data usually consist of a series of measurements on one or more
quantities of interest, called variates. A discrete variate can only take a
sequence of distinct values which are usually integers. For example, the
number of cosmic particles striking a recorder in one minute may be 0, 1,
2, .... A fractional value such as 1½ cannot be observed. On the other hand, a
continuous variate can take any value in a specified interval on a
continuous scale. For example, the thrust of a rocket engine can be any
non-negative number. In practice the recorded values of a continuous
variate will be rounded to a specified number of significant figures and



may look discrete, but will still be regarded as continuous when the
underlying variate is continuous.
Categorical data (e.g. see section 7.5) typically consist of the frequen­
cies with which one of a list of categories occurs (e.g. frequencies of
different types of defect). Other valid types of data include rankings and
go/no-go measurements. The appropriate analysis naturally depends on
the type of measurement.

2.2 Graphical methods


It is always a good idea to plot the data in as many different ways as
possible, as much information can often be obtained just by looking
at the resulting graphs. This section considers data for a single vari­
able, while Section 8.1 describes the scatter diagram for displaying
simultaneous observations on two variables. Two new types of graph,
called stem-and-leaf plots and box plots, are introduced in Appendix
D.4.

2.2.1 The bar chart


Given discrete data as in Example 2, Chapter 1, the first step is to find
the frequency with which each value occurs. Then we find, for example,
that ‘zero’ occurs thirteen times but that ‘four’ only occurs once.
Table 1

Number of
cosmic particles Frequency

0 13
1 13
2 8
3 5
4 1

Total 40

The values in the right hand column form a frequency distribution.


This frequency distribution can be plotted as in Figure 2, where the
height of each line is proportional to the frequency with which the
value occurs.
This diagram is called a bar chart (or line graph or frequency diagram)
and makes it easier to see the general pattern of discrete data.

Figure 2 Bar chart of data of Table 1 (frequency against number of cosmic particles)
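For readers with a computer to hand, the tally in Table 1 is easy to reproduce. The following minimal sketch in Python (an illustrative addition, not part of the original text) counts the forty observations of Example 2, Chapter 1, and prints a crude text bar chart.

```python
from collections import Counter

# Cosmic particles per minute (Example 2, Chapter 1)
counts = [3, 0, 0, 1, 0, 2, 1, 0, 1, 1,
          0, 3, 4, 1, 2, 0, 2, 0, 3, 1,
          1, 0, 1, 2, 0, 2, 1, 0, 1, 2,
          3, 1, 0, 0, 2, 1, 0, 3, 1, 2]

freq = Counter(counts)                    # value -> frequency
for value in sorted(freq):
    # one '*' per occurrence gives a rough bar chart on its side
    print(f"{value}: {'*' * freq[value]} ({freq[value]})")
print("Total:", sum(freq.values()))       # 40
```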

2.2.2 The histogram


The histogram is used to display continuous data. Like the bar chart, it
reveals the general pattern of the data as well as any unusual values (or
outliers). It is best illustrated with an example.

Example 1
The heights of 100 students were measured to the nearest inch and
tabulated as follows.

Table 2

Height Number of
(inches) students

60-62 6
63-65 15
66-68 40
69-71 30
72-74 9

Total 100

The data were divided into five groups as shown and the frequencies
with which the different groups occur form a frequency distribution.
The data can be plotted as a histogram in which rectangles are con­
structed with areas proportional to the frequency of each group.



How to draw a histogram
(1) Allocate the observations to between five and twenty class inter­
vals, which should not overlap but yet cover the whole range. In
Example 1, the first class interval chosen is apparently (60-62)
inches, but more strictly is (59.5-62.5) inches.
(2) The class mark is the midpoint of the class interval. All values
within the interval are considered concentrated at the class mark.
(3) Determine the number of observations in each interval.
(4) Construct rectangles with centres at the class marks and areas
proportional to the class frequencies. If all the rectangles have the same
width then the heights are proportional to the class frequencies.
The choice of the class interval and hence the number of intervals
depends on several considerations. If too many intervals are used
then the histogram will oscillate wildly but if too few intervals are
used then important features of the distribution may be overlooked.
This means that some sort of compromise must be made. As the number
of observations is increased the width of the class intervals can be
decreased as there will be more observations in any particular interval.

Example 2
Plot a histogram of the data given in Example 1, Chapter 1. The
smallest thrust observed is 989.4 and the largest 1014.5. The difference
between them is about twenty-five units so that three units is a reason­
able class interval. Then we will have about ten class intervals. Group
the observations into the ten intervals as in Table 3.

Table 3
Frequency distribution of thrust

Number of
Class interval observations

987-990 2
990-993 1
993-996 3
996-999 5
999-1002 7
1002-1005 6
1005-1008 3
1008-1011 1
1011-1014 1
1014-1017 1

If an observation falls exactly at the division point (for example


990.0) then it is placed in the lower interval. Note that if we take one
unit to be the class interval then there will only be five intervals out of
twenty-eight with more than one observation and this will give a very
flattened histogram, which is very difficult to interpret.
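As an illustrative aside (not in the original text), the grouping of Table 3 can be reproduced in Python. The sketch below applies the rule just stated: an observation falling exactly on a division point goes into the lower interval.

```python
# The thirty thrust observations of Example 1, Chapter 1
thrust = [999.1, 1003.2, 1002.1, 999.2, 989.7, 1006.7,
          1012.3, 996.4, 1000.2, 995.3, 1008.7, 993.4,
          998.1, 997.9, 1003.1, 1002.6, 1001.8, 996.5,
          992.8, 1006.5, 1004.5, 1000.3, 1014.5, 998.6,
          989.4, 1002.9, 999.3, 994.7, 1007.6, 1000.9]

edges = [987 + 3 * i for i in range(11)]   # 987, 990, ..., 1017
freq = [0] * (len(edges) - 1)
for x in thrust:
    for i in range(len(freq)):
        # lower edge exclusive, upper edge inclusive: ties go down
        if edges[i] < x <= edges[i + 1]:
            freq[i] += 1
            break

for i, f in enumerate(freq):               # reproduces Table 3
    print(f"{edges[i]}-{edges[i + 1]}: {f}")
```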



Histogram shapes. The shape of a histogram can be informative. Some
common shapes are illustrated below.

Figure 5 Various histograms


(a) symmetric or bell-shaped
(b) skewed to the right or positively skewed
(c) reverse J-shaped
(d) skewed to the left or negatively skewed

Frequency curve. Where there are a large number of observations
the histogram may be replaced with a smooth curve drawn through
the midpoints of the tops of each box. Such a curve is called a frequency
curve.

2.2.3 The cumulative frequency diagram


Another useful way of plotting data is to construct what is called a
cumulative frequency diagram. If the observations are arranged in
ascending order of magnitude, it is possible to find the cumulative
frequency of observations which are less than or equal to any particular
value. It is usually sufficient to calculate these cumulative frequencies at
a set of equally spaced points. The cumulative frequencies are easier to
interpret if they are expressed as relative frequencies, or proportions,
by dividing by the total number of observations. When these values are
plotted, a step function results which increases from zero to one.
Interest in cumulative frequencies arises if, for example, we wanted to
find the proportion of manufactured items which fall below a particular
standard, but note that the diagram is not particularly helpful as an
exploratory tool.

Example 3
Plot the cumulative frequency diagram of the data given in Example 1,
Chapter 1.
Using Table 3 it is easy to calculate the cumulative frequencies at
the divisions between the class intervals.



Table 4

Cumulative Relative
Thrust frequency frequency

987 0 0.000
990 2 0.067
993 3 0.100
996 6 0.200
999 11 0.367
1002 18 0.600
1005 24 0.800
1008 27 0.900
1011 28 0.933
1014 29 0.967
1017 30 1.000

The relative frequencies have been plotted in Figure 7.

Figure 7 Cumulative frequency diagram of thrust data
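As a further sketch in Python (again, not from the original text), the cumulative relative frequencies of Table 4 follow directly from the Table 3 frequencies.

```python
# Cumulative relative frequencies at the class-interval divisions
freq = [2, 1, 3, 5, 7, 6, 3, 1, 1, 1]      # Table 3
edges = [987 + 3 * i for i in range(11)]

total, running = sum(freq), 0
print(f"{edges[0]}: 0 0.000")
for edge, f in zip(edges[1:], freq):
    running += f                            # cumulative frequency
    print(f"{edge}: {running} {running / total:.3f}")
```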

2.3 Summary statistics
In addition to the graphical techniques, it is useful to calculate
some figures to summarize the data. Any quantity which is calculated
from the data is called a statistic (to be distinguished from the subject
statistics). Thus a statistic is a function of the measurements or observa­
tions.
Most simple statistics can be divided into two types; firstly quantities
which are ‘typical’ of the data and secondly quantities which measure
the variability of the data. The former are usually called measures of
location and the latter are usually called measures of spread.

2.3.1 Measures o f location


There are three commonly used measures of location, of which the
mean is by far the most important.

The mean. Suppose that n measurements have been taken on the variate
under investigation, and these are denoted by x₁, x₂, ..., xₙ. The
(arithmetic) mean of the observations is given by

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i. \qquad (2.1)$$

In everyday language we say that x̄ is the average of the observations.

Example 4
Find the average thrust of the rocket engine from the data in Example 1,
Chapter 1.
We find x̄ = (999.1 + 1003.2 + ... + 1000.9)/30
= 1000.6.

Data are often tabulated as in Tables 1 and 2 and this makes it


somewhat easier to calculate the mean. If the values x₁, x₂, ..., x_N of the
variate occur with frequencies f₁, f₂, ..., f_N, then the mean is given by
the equivalent formula

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_N x_N}{f_1 + f_2 + \dots + f_N} = \frac{\sum_{i=1}^{N} f_i x_i}{\sum_{i=1}^{N} f_i} \qquad \text{(see also Appendix D.1)}$$

Example 5
In Table 1 we have x₁ = 0 occurring with frequency f₁ = 13. Similarly
x₂ = 1 occurs with frequency f₂ = 13, and so on. Then we find
x̄ = (0×13 + 1×13 + 2×8 + 3×5 + 4×1)/(13 + 13 + 8 + 5 + 1)
= 48/40
= 1.2.
Thus an average of 1.2 cosmic particles are emitted every minute.

Example 6
When continuous data are tabulated as in Table 2, the sample mean
can be estimated by assuming that all the observations in a class interval
fall at the class mark.
In this case we find
x̄ = (61×6 + 64×15 + 67×40 + 70×30 + 73×9)/100
= 67.6
= mean height of the students (in inches).
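For illustration (not part of the original text), both of these grouped-mean calculations reduce to one small Python helper implementing x̄ = Σfᵢxᵢ / Σfᵢ.

```python
def freq_mean(values, freqs):
    """Mean of a frequency distribution: sum(f*x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

# Example 5: cosmic-particle counts of Table 1
print(freq_mean([0, 1, 2, 3, 4], [13, 13, 8, 5, 1]))        # 1.2

# Example 6: class marks of the height data of Table 2
print(freq_mean([61, 64, 67, 70, 73], [6, 15, 40, 30, 9]))  # 67.63, quoted as 67.6
```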
At this point it is worth repeating the fact that a set of data is a sample
drawn from the population of all possible measurements. Thus it is
important to realize that the sample mean, x̄, may not be equal to the
true population mean, which is usually denoted by μ. In Chapter 6
we will see that x̄ is an estimate of μ. Similarly the other sample statistics
described here will also be estimates of the corresponding population
characteristics.

The median. This is occasionally used instead of the mean, particularly
when the histogram of the observations is skewed (see Figure 5). It is
obtained by placing the observations in ascending order of magnitude
and then picking out the middle observation. Thus half the observations
are numerically greater than the median and half are smaller.

Example 7
The weekly wages of twelve workers, selected at random from the
payroll of a factory, are (to the nearest pound)
83, 111, 127, 96, 124, 103, 82, 99, 173, 137, 102, 106.
The data give x̄ = £111.9.
Rewriting the observations in ascending order of magnitude we
have
82, 83, 96, 99, 102, 103, 106, 111, 124, 127, 137, 173.
As there are an even number of observations, the median is the average
of the sixth and seventh values, namely £104.5.
As eight of the observations are less than the sample mean, it could
be argued that the median is ‘more typical’ of the data. The outlying
observation, 173, has a considerable effect on the sample mean, but
not much effect on the median.

The mode. This is the value of the variate which occurs with the
greatest frequency. For discrete data the mode can easily be found by
inspection. For continuous data the mode can be estimated by plotting
the results in a histogram and finding the midpoint of the tallest box.
Thus in Example 1 the mode is 67 inches.

Comparison. As we have already remarked, the mean is by far the


most important measure of location. When the distribution of results
is roughly symmetric, the mean, mode and median will be very close
together anyway. But if the distribution is very skewed there may be a
considerable difference between them and then it may be useful to
find the mode and median as well as the mean. Figure 8 shows the
frequency curve for the time taken to repair aircraft faults (R. A. Harvey,
‘Applications of statistics to aircraft project evaluation’, Applied
Statistics, 1967). Typical values are one hour for the mode, two hours
for the median, and four hours for the mean. Obviously extra care is
needed when describing skew distributions.
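As an aside (not in the original text), Python's standard statistics module reproduces the measures of location for the wage data of Example 7.

```python
import statistics

# Weekly wages of Example 7, in pounds
wages = [83, 111, 127, 96, 124, 103, 82, 99, 173, 137, 102, 106]

print(statistics.mean(wages))    # 111.916..., about 111.9
print(statistics.median(wages))  # 104.5, average of the 6th and 7th values
```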



2.3.2 Measures o f spread
It is often equally important to know how spread out the data is. For
example suppose that a study of people affected by a certain disease
revealed that most people affected were under two years old or over
seventy years old; then it would be very misleading to summarize the
data by saying ‘average age of persons affected is thirty-five years’.
At the very least we must add a measure of the variability or spread
of the data. Several such quantities are available.

Figure 8 Frequency curve of aircraft repair times (repair time in hours)

Range. This is the difference between the largest and smallest observa­
tions. It is useful for comparing the variability in samples of roughly
equal size, but unfortunately it depends on the sample size and tends to
get larger as the sample size increases.

Interquartile range. This is the difference between the upper and lower
quartiles (see page 361) and thus includes the middle half of the
observations when put into ascending order of magnitude. Unlike the
range, it does not depend on the sample size and is not affected by a few
outlying values. Thus it is a useful ‘robust’ statistic which is particularly
useful for describing skew distributions, perhaps supplemented by the
values of the median and the highest and lowest observations.
However for distributions which are approximately symmetric, we will
see that there are strong theoretical and practical considerations for
preferring a statistic called the standard deviation, which is essentially the
root-mean-square deviation about the mean. The square of the standard
deviation is called the variance and has many uses, though not as a
descriptive statistic.

Variance and standard deviation. The sample variance s² of n observations
x₁, x₂, ..., xₙ is given by

$$s^2 = \frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \dots + (x_n-\bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}. \qquad (2.2)$$
The standard deviation s of the sample is obtained by taking the square
root of the variance.

The use of (n−1) instead of n in the denominator of this formula
puzzles many students. On the rare occasions when the true population
mean μ is known, the formula for s² would indeed have n in the
denominator as expected:

$$s^2 = \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}.$$

However, since μ is not generally known, the sample mean x̄ is used
instead. Then it can be shown theoretically that it is better to change
the denominator from n to (n−1). (This point is discussed in Section
6.5, where s² is found to be an unbiased estimate of the true population
variance, σ².) It is worth pointing out that the difference between n
and (n−1) is only important for small samples.
The standard deviation is in the same units as the original measure­
ments and for this reason it is preferred to the variance as a descriptive
measure. However it is often easier from a theoretical and computa­
tional point of view to work with variances. Thus the two measures are
complementary (see Appendix D.2 for further remarks on the
interpretation of a standard deviation).
It is sometimes more convenient to rearrange 2.2 in the form

$$s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n-1} \qquad (2.3)$$

and this formula is used in many pocket calculators. But if x̄ is much
bigger than s, then 2.3 may give rounding errors unless the data is
coded (see Exercise 3 below). Thus 2.2 is preferred for use on
computers. The formula for calculating the standard deviation of a
frequency distribution is given in Appendix D.1.



Example 8
Find the range, variance and standard deviation of the following 6
observations:

0.9 1.3 1.4 1.2 0.8 1.0

Range = 1.4 − 0.8 = 0.6.

x̄ = 6.6/6 = 1.1.

Σx² = 7.54.

s² = (7.54 − 7.26)/5 = 0.056,

s = 0.237.
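A minimal Python sketch (not part of the original text) verifies Example 8 with both the defining form 2.2 and the rearranged form 2.3.

```python
x = [0.9, 1.3, 1.4, 1.2, 0.8, 1.0]          # observations of Example 8
n = len(x)
xbar = sum(x) / n                            # 1.1

# Formula 2.2: sum of squared deviations about the mean
s2_def = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
# Formula 2.3: rearranged 'calculator' form
s2_alt = (sum(xi * xi for xi in x) - n * xbar ** 2) / (n - 1)

print(round(s2_def, 3), round(s2_alt, 3))    # 0.056 0.056
print(round(s2_def ** 0.5, 3))               # 0.237
```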

Coefficient of variation. We have seen that the standard deviation is


expressed in the same units as the individual measurements. For some
purposes it is much more useful to measure the spread in relative terms
by dividing the standard deviation by the sample mean. The ratio is
called the coefficient of variation:

(coefficient of variation) = s/x̄.

For example a standard deviation of 10 may be insignificant if the


average observation is around 10,000 but may be substantial if the
average observation is around 100.
Another advantage of the coefficient of variation is that it is in­
dependent of the units in which the variate is measured, provided that
the scales begin at zero. If every observation in a set of data is multiplied
by the same constant, the mean and standard deviation will also be
multiplied by this constant, so that their ratio will be unaffected. Thus
the coefficient of variation of a set of length measurements, for example,
would be the same whether measurements were made in centimetres
or inches. However this is not true, for example, for the centigrade and
Fahrenheit scales of measuring temperature where the scales do not
begin at zero, and the coefficient of variation should not be calculated
when the variable can take negative values.

Example 9
Using 2.2 (or 2.3 with coding - see Exercise 3 below), the standard
deviation of the data from Example 1, Chapter 1, was found to be
s = 6.0.
It is not immediately clear if this is to be judged ‘large’ or ‘small’.
However the coefficient of variation is given by
s/x̄ = 6.0/1000.6 = 0.006

and from this we judge that the variation in the data is relatively small.

Exercises
1. The weight of an object was measured thirty times and the following
observations were obtained.

6.120 6.129 6.116 6.114 6.112
6.119 6.119 6.121 6.124 6.127
6.113 6.116 6.117 6.126 6.123
6.123 6.122 6.118 6.120 6.120
6.121 6.124 6.114 6.121 6.120
6.116 6.113 6.111 6.123 6.124

All measurements are to the nearest 0.001 g. Plot a histogram of the


observations.
2. Find the mean, variance and standard deviation of the following
samples.
(a) 5 2 3 5 8 (b) 105 102 103 105 108
(c) 0.5 0.2 0.3 0.5 0.8.
Can you see how the answers to (b) and (c) follow from those for (a)?
3. The computation of the mean and variance of a set of data can often
be simplified by subtracting a suitable constant from each observation.
This process is called ‘coding’. Find the mean of the coded observations
and then add the constant to get the sample mean. The variance (and
standard deviation) of the observations is not affected by coding.
Calculate x̄ and s² for the following samples.

(a) 997 995 998 992 995 (here 990 is a suitable constant)
(b) 131.1 137.2 136.4 133.2 139.1 140.0
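As an illustrative sketch (not from the original text, and using made-up numbers rather than the exercise data), the effect of coding can be checked in Python: the coded mean shifts back by the constant, while the variance is unchanged.

```python
data = [1004.2, 1003.8, 1004.9, 1003.1, 1004.0]   # hypothetical readings
c = 1004                                           # a convenient constant
coded = [x - c for x in data]

n = len(coded)
mean_coded = sum(coded) / n
var = sum((x - mean_coded) ** 2 for x in coded) / (n - 1)

print(mean_coded + c)   # sample mean of the original data
print(var)              # identical to the variance of the uncoded data
```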

Part two
Theory

‘Mutton first, mechanics afterwards.’


Chapter 3
The concept of probability

3.1 Probability and statistics


We begin our study of the theory of statistics with an introduction to
the concept of probability. Most of us have a good grasp of probability
as illustrated by the popularity of games of chance such as bridge,
poker, roulette, and dice, all of which involve the assessment of proba­
bility. It turns out that the theory of probability is a good base for the
study of statistics - in fact each is the inverse subject of the other. The
relation between probability and statistics may be clarified with an
example.
A sample of size one hundred is taken from a very large batch (or lot) of
valves, which contains a proportion p of defective items. This is another
situation in which experimental uncertainty is involved, because, if
several such samples are taken, the number of defective items will not
be the same in each sample.
Let us hypothetically suppose that we know that p = 0.1. Then we
would expect to get (100 × 0.1) = 10 defectives in the sample. But a
single sample may contain any number of defectives ‘close’ to 10. If
we keep taking samples of size 100 from the batch, then sometimes 8
defectives will be present, sometimes 11, sometimes 9, 10, 12, etc.
Probability theory enables us to calculate the chance or probability of
getting a given number of defectives, and we shall see how this is done
in Chapter 4.
However, in a typical experimental situation we will not know what
p is. Instead, suppose we take a sample of size 100 and get 10 defectives.
Then what is p? The obvious answer is 10/100 = 0.1; but it may be
0.11 or 0.09 or any number ‘close’ to 0.1. Statistical theory enables us
to estimate the value of p.
The duality between probability and statistics can be expressed more
generally in the following way. In both subjects the first step in solving
a problem is to set up a mathematical model for the physical situation
which may involve one or more parameters. In the above example the



proportion of defectives p is a parameter. If the parameters of the model
are known, given or established from past history then we have a
probability problem and we can deduce the behaviour of the system from
the model. However if the parameters are unknown and have to be
estimated from the available data then we have a statistical problem.
But to understand and solve statistical problems, it is necessary to have
some prior knowledge of probability, and so we will devote the next
three chapters to a study of probability and related topics.

Example 1
An air-to-air missile has a ‘kill’ ratio of 2 in 10. If 4 missiles are launched,
what is the probability that the target will not be destroyed? Is this a
statistics or probability problem?
The physical situation in this problem can be described by one
quantity: the ‘kill’ ratio. As this is known from previous tests we can
calculate the probability that a given number of missiles will hit the
target. Thus we have a probability problem. The solution is obtained
by considering the ‘miss’ ratio which must be 8 in 10. The probability
that all 4 missiles will miss the target, assuming that they are indepen­
dently launched, is obviously less than the probability that just one
missile will miss the target. Later in the chapter we will see that the
probability is given by
(0.80)⁴ = 0.4096.

Example 2
A hundred missiles are launched and eleven ‘kills’ are observed. What
is the best estimate of the ‘kill’ ratio?
As the ‘kill’ ratio was unknown before the test was performed, we
have to estimate this quantity from the observations. The ratio of kills
to launches is the intuitive estimate, namely 0.11. This is a statistical
problem.

3.2 Some definitions


We begin our study of probability theory with some definitions.
The sample space is defined as the set of all possible outcomes of an
experiment. For example:
When a die is thrown the sample space is 1,2, 3, 4, 5 and 6.



If two coins are tossed, the sample space is head-head, tail-tail, head-
tail, tail-head.
In testing the reliability of a machine, the sample space is ‘success’ and
‘failure’.
Each possible outcome is a sample point. A collection of sample
points with a common property is called an event.
If a die is thrown and a number less than 4 is obtained, this is an event
containing the sample points 1, 2 and 3.
If two coins are tossed and at least one head is obtained, this is an
event containing the sample points head-head, tail-head, and head-
tail.
If the number of particles emitted by a radioactive source in one minute
is measured, the sample space consists of zero and all the positive
integers, and so is infinite. Obtaining less than five particles is an event
which consists of the sample points 0, 1, 2, 3 and 4.
The probability of a sample point is the proportion of occurrences of
the sample point in a long series of experiments. We will denote the
probability that sample point x will occur by P(x). For example, a
coin is said to be ‘fair’ if heads and tails are equally likely to occur, so
that P(H) = P(T) = ½. By this we mean that if the coin is tossed N
times and f_H heads are observed, then the ratio f_H/N tends to get closer
to ½ as N increases. On the other hand if the coin is ‘loaded’ then the
ratio f_H/N will not tend to ½.
The probability of a sample point always lies between zero and
one. If the sample point cannot occur, then its probability is zero, but
if the sample point must occur, then its probability is one. (Note that
the converse of these two statements is not necessarily true.) Further­
more the sum of all the probabilities of all the sample points is one.
Probability theory is concerned with setting up a list of rules for
manipulating these probabilities and for calculating the probabilities
of more complex events. Most probabilities have to be estimated
from sample data but simple examples deal with equally likely sample
points which are known to all have the same probability.
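This long-run relative-frequency idea is easy to see empirically. The simulation below (an illustrative Python sketch, not part of the original text) tosses a fair coin and prints f_H/N as N grows.

```python
import random

random.seed(1)            # fixed seed so the run is repeatable
heads = 0
for n in range(1, 100001):
    heads += random.random() < 0.5        # True counts as 1
    if n in (10, 100, 1000, 10000, 100000):
        print(n, heads / n)               # the ratio settles near 0.5
```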
Example 3
Toss two fair coins. Denote heads by H and tails by T. There are four
points in the sample space and they are equally likely to occur.
Sample space HH HT TH TT
Probability ¼ ¼ ¼ ¼

Example 4
Throw a fair die.
Sample space 1 2 3 4 5 6
Probability ⅙ ⅙ ⅙ ⅙ ⅙ ⅙
The probability of an event is the sum of the probabilities of the sample
points which constitute the event.

Example 5
If two fair coins are tossed, obtaining at least one head is an event,
denoted by E, which consists of the three sample points (HH), (HT),
and (TH).
Probability of this event = P(E)
= P(HH) + P(HT) + P(TH)
= ¼ + ¼ + ¼ = ¾.

Example 6
If two fair dice are tossed, the sample space consists of the thirty-six
combinations shown below.

1,1 1,2 1,3 1,4 1,5 1,6


2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6

Each of the thirty-six sample points is equally likely to occur and so


each has probability 1/36. By inspection we can see, for example, that
P(sum of the 2 dice is 7) = 6/36 = ⅙,
P(sum is 2) = 1/36,
P(sum is 7 or 11) = 6/36 + 2/36 = 2/9.
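As an aside (not in the original text), these probabilities can be checked by enumerating the thirty-six equally likely outcomes in Python.

```python
from fractions import Fraction

# All 36 equally likely outcomes for two fair dice
space = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    """Probability of an event given as a true/false test on outcomes."""
    return Fraction(sum(event(p) for p in space), len(space))

print(prob(lambda p: sum(p) == 7))         # 1/6
print(prob(lambda p: sum(p) == 2))         # 1/36
print(prob(lambda p: sum(p) in (7, 11)))   # 2/9
```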

3.3 Types of events


3.3.1 Mutually exclusive events
If two events, E₁ and E₂, are mutually exclusive, they have no common



sample points. In a single trial mutually exclusive events cannot both
occur; the probability that one of the mutually exclusive events occurs
being the sum of their respective probabilities. This is the addition law
for mutually exclusive events.

P(E₁ + E₂) = P(E₁) + P(E₂).  (3.1)

The notation (E₁ + E₂) means that at least one of the events occurs;
as applied to mutually exclusive events it means that one event or the
other occurs.

Example 7
From a fair pack of well-shuffled cards one card is dealt. Define the
event E₁ as ‘the card is a spade’ and event E₂ as ‘the card is a heart’.
These two events are mutually exclusive as one card cannot be both a
heart and a spade.
Then P(E₁ + E₂) = 13/52 + 13/52 = ½.

3.3.2 Not mutually exclusive


Two events that are not mutually exclusive contain one or more
common sample points. The probability that at least one of these events
occurs is given by the general addition law

P(E₁ + E₂) = P(E₁) + P(E₂) − P(E₁E₂),  (3.2)

where (E₁E₂) is the event that both E₁ and E₂ occur.

Figure 9 Events E₁ and E₂ are not mutually exclusive

The general addition law may be demonstrated pictorially. Let the
area of a unit square represent the probability of the entire sample space.
Inside this draw two figures with areas equal to the respective probabilities
of E₁ and E₂ and such that the common area is equal to the
probability that both the events will occur.
The area of the event (E₁ + E₂) is the sum of the areas of E₁ and E₂
minus their common area E₁E₂, which is included in both E₁ and E₂.
Thus we have
P(E₁ + E₂) = P(E₁) + P(E₂) − P(E₁E₂).
Note that if E₁ and E₂ are mutually exclusive, the probability of the
joint event (E₁E₂) is zero - that is, P(E₁E₂) = 0 - and then equation 3.2
reduces to equation 3.1. Mutually exclusive events have no common
area and can be depicted as in Figure 10.

Figure 10 Events E₁ and E₂ are mutually exclusive

Example 8
What is the probability of rolling two dice to obtain the sum seven
and/or the number three on at least one die? Let event E₁ be ‘the sum
is 7’ and event E₂ be ‘at least one 3 turns up’. By inspection of the table
in Example 6, we find
P(E₁) = 6/36, P(E₂) = 11/36,
P(E₁E₂) = 2/36.
Thus P(E₁ + E₂) = P(E₁) + P(E₂) − P(E₁E₂) = 6/36 + 11/36 − 2/36 = 15/36.
This can be checked by looking at the table.
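A short enumeration in Python (not part of the original text) confirms both the direct count and the general addition law.

```python
from fractions import Fraction

space = [(i, j) for i in range(1, 7) for j in range(1, 7)]
E1 = {p for p in space if sum(p) == 7}     # 'the sum is 7'
E2 = {p for p in space if 3 in p}          # 'at least one 3 turns up'

P = lambda E: Fraction(len(E), len(space))
print(P(E1 | E2))                          # direct count: 5/12
print(P(E1) + P(E2) - P(E1 & E2))          # addition law: 5/12
```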
Events that are not mutually exclusive may be further classified as
dependent or independent events. Dependence between events is treated
by the notion of conditional probability.



3.3.3 Conditional probability
We defined the probability of an event as the sum of the probabilities
of the sample points in the event: P(E) = Σ_event P(sample points in E). In
a more general sense we could have defined the probability of an event
as follows:

$$P(E) = \frac{\sum_{\text{event}} P(\text{sample points in E})}{\sum_{\text{sample space}} P(\text{all sample points})}. \qquad (3.3)$$

The denominator is of course unity.


Now suppose that we are interested in the probability of an event E₁
and we are told that event E₂ has occurred. The conditional probability
of E₁, given that E₂ has occurred, is written P(E₁|E₂), and read as
P(E₁ given E₂). By analogy with equation 3.3, this conditional probability
can be defined:

$$P(E_1|E_2) = \frac{\sum_{E_1E_2} P(\text{sample points common to } E_1 \text{ and } E_2)}{\sum_{E_2} P(\text{sample points in } E_2)} = \frac{P(E_1E_2)}{P(E_2)}. \qquad (3.4)$$

The effect of the conditional information is to restrict the sample space
to the sample points contained in event E₂.
Example 9
Given that a roll of two fair dice has produced at least one three, what
is the probability that the sum is seven? Event E₁ is ‘the sum is 7’ and
event E₂ is ‘at least one 3 turns up’.

P(E₁|E₂) = P(E₁E₂)/P(E₂)
= (2/36)/(11/36)  (see Example 8)
= 2/11.

This result can be obtained directly from the table in Example 6, by


removing all points in which no three occurs. Of the remaining eleven
points exactly two give a sum which is seven.
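The same restriction of the sample space can be expressed directly in Python (an illustrative sketch, not part of the original text).

```python
from fractions import Fraction

space = [(i, j) for i in range(1, 7) for j in range(1, 7)]
E2 = [p for p in space if 3 in p]            # restricted sample space (11 points)
E1_and_E2 = [p for p in E2 if sum(p) == 7]   # of these, the sum is 7 (2 points)

print(Fraction(len(E1_and_E2), len(E2)))     # 2/11
```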

Example 10

Given that the last roll of two dice produced the sum seven, what is
the probability that the next roll will also produce the sum seven?
Careful! If the dice are fair dice and the roll is preceded by a good
shake in a dice cup, the play is fair, which implies that dice have no
memory. Thus there is no connection between the results of successive
rolls, and they are independent events rather than dependent events.
In other words, the knowledge that the last roll was a seven contributes
nothing to our ability to predict the next roll.

3.3.4 Independent and dependent events


Two events, E₁ and E₂, are said to be independent if P(E₁) = P(E₁|E₂).
Thus the knowledge that event E₂ has occurred has no effect on the
probability of event E₁.
Conversely, two events are said to be dependent if P(E₁) ≠ P(E₁|E₂).

Example 10 (continued)
As two successive rolls are independent, the probability of the sum
seven on the next roll is still ⅙. The last result is irrelevant. This is a
subtle point.
Some people interpret the ‘law of averages’ as meaning that if the
last roll was a seven the next roll is less likely to be a seven; that the
dice will act to ‘even out’ the results. The implication is that the dice
have some kind of memory.
A diametrically opposed notion is that if sevens have been ‘popular’
they will continue to be. This idea is based on the belief that most real
gambling equipment will be slightly biased and therefore some events
will occur more often than expected. For this reason honest gambling
casinos make extreme efforts to keep their games of chance fair. It is
in their best interest. Most serious studies of casino data discredit both
the above theories. Games of chance do not have memories.

3.3.5 Joint events


The probability of the joint event (E₁E₂) can be obtained by
considering

P(E₁|E₂) = P(E₁E₂)/P(E₂),

which can be rearranged to give the general product law
P(E₁E₂) = P(E₂)P(E₁|E₂).
Similarly P(E₁E₂) = P(E₁)P(E₂|E₁).
These relations are general and apply to both dependent and
independent events. Of course if the events are independent then
P(E₁|E₂) = P(E₁), so that the equation simplifies to give
P(E₁E₂) = P(E₁)P(E₂).
This is called the product law for independent events.

Example 11
What is the probability of rolling the sum seven with two fair dice,
one of which will be a three? Event E₁ is ‘the sum is 7’, event E₂ is
‘at least one 3 occurs’.
P(E₁E₂) = P(E₂)P(E₁|E₂)
= (11/36) × (2/11)  (see Example 9)
= 1/18.
Example 12
What is the probability of drawing two aces from a well-shuffled pack?
Event E₁ = ace on the first card
Event E₂ = ace on the second card
P(E₂|E₁) is obtained by noting that if the first card drawn is an ace,
then there will be fifty-one cards left in the pack of which three are
aces.
Thus P(E₁E₂) = P(E₁)P(E₂|E₁)
= (4/52) × (3/51)
= 0.0045.
Although two cards are drawn in the above experiment, the student
may think of the situation as a single trial. We can write
E₁ = ace on first card together with any second card,
E₂ = ace on second card together with any first card,
E₁E₂ = ace on first and second cards.

Alternatively the student may think of the experiment as two trials.
The result of the second trial (the second card) depends on the result of
the first trial (the first card). The laws of probability give the same
result whichever way the problem is studied.

Example 13
A fair die is thrown. What is the probability that it will show 6 on the
first throw and 2 on the second throw?
E₁ = 6 on first throw,
E₂ = 2 on second throw.
But the occurrence or non-occurrence of event E₁ has no bearing on
whether or not E₂ occurs. Thus the two events are independent.

P(E₁E₂) = P(E₁)P(E₂)
= ⅙ × ⅙
= 1/36.

Figure 11 Types of events (two events E₁ and E₂: if mutually exclusive, P(E₁E₂) = 0; if not mutually exclusive, they are either independent, with P(E₁E₂) = P(E₁)P(E₂), or dependent, with P(E₁E₂) = P(E₁)P(E₂|E₁))



3.3.6 Summary of types of events

At first reading the classification of events may be a little confusing, but


Figure 11 may help to clarify the situation.

Example 14
If there are three events then similar reasoning and diagrams will
provide all the required relationships.

Figure 12 Three events

By inspection,
P(E₁ + E₂ + E₃) = P(E₁) + P(E₂) + P(E₃) − P(E₁E₂) − P(E₁E₃) − P(E₂E₃) + P(E₁E₂E₃),

P(E₁|E₂E₃) = P(E₁E₂E₃)/P(E₂E₃),

P(E₁E₂E₃) = P(E₁)P(E₂|E₁)P(E₃|E₁E₂).

If E₁, E₂ and E₃ are independent events then

P(E₁E₂E₃) = P(E₁)P(E₂)P(E₃).

Example 15

A manned rocket vehicle is estimated to have a reliability (probability


of mission success) of 0.81 on the first launch. If the vehicle fails there
is a probability of 0.05 of a catastrophic explosion, in which case the
abort system cannot be used. If an abort can be attempted, the abort
system reliability is 0.90.
Calculate the probability of every possible outcome of the first
launch.

First step. Define the events.

E1 - Mission success
Ē1 - Mission failure
E2 - Non-catastrophic failure
Ē2 - Catastrophic failure
E3 - Successful abort
Ē3 - Abort failure
E4 - Crew survives
Ē4 - Crew does not survive.

In each case the event opposite to E is denoted by Ē. (Note that P(E) + P(Ē) = 1; the events E, Ē are called complementary events. In this example, the events E1 and Ē1 are complementary, as are E4 and Ē4.)

Second step. Classify the events. For example a mission cannot be both a success and a non-catastrophic failure, so that E1 and E2 are mutually exclusive. In fact E1 is mutually exclusive with E2, Ē2, E3, Ē3 and Ē4. Ē2 is mutually exclusive with E3, Ē3 and E4. Ē3 is mutually exclusive with E4. Also each event is mutually exclusive with its opposite event.

Third step. The probabilities of the events can be found in succession.

P(E1) = 0.81,
P(Ē1) = 1 - P(E1)
      = 0.19.

P(Ē1Ē2) = P(Ē2|Ē1)P(Ē1) from equation 3.4. But a catastrophic failure (Ē2) can only occur if a mission failure (Ē1) occurs. Thus we have Ē1Ē2 = Ē2.

Therefore P(Ē2) = P(Ē2|Ē1)P(Ē1)
                = 0.05 × 0.19
                = 0.0095.

By similar reasoning we have

P(E2) = P(E2Ē1)
      = P(E2|Ē1)P(Ē1)
      = 0.95 × 0.19
      = 0.1805.

P(E3) = P(E3E2)
      = P(E3|E2)P(E2)
      = 0.90 × 0.1805
      = 0.16245.

P(E4) = P(E1) + P(E3)
      = 0.81 + 0.16245
      = 0.97245.

P(Ē4) = 1 - P(E4)
      = 0.02755.

P(Ē3) = P(Ē3E2)
      = P(Ē3|E2)P(E2)
      = 0.10 × 0.1805
      = 0.01805.

Check: P(Ē4) = P(Ē2) + P(Ē3);
0.02755 = 0.0095 + 0.01805.

[The calculation can be displayed as a probability tree for the first launch, with branch probabilities 0.81 and 0.19 for success and failure, then 0.95 and 0.05 for non-catastrophic and catastrophic failure, and 0.9 and 0.1 for abort success and failure.]
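The same bookkeeping can also be set out as a few lines of Python; the following minimal sketch simply mirrors the arithmetic above (the variable names are of course arbitrary).

    # Probabilities for the first launch (from the example above).
    p_success = 0.81                    # P(E1)
    p_failure = 1 - p_success           # P(E1-bar) = 0.19
    p_cata = 0.05 * p_failure           # catastrophic failure
    p_noncata = 0.95 * p_failure        # non-catastrophic failure
    p_abort_ok = 0.90 * p_noncata       # successful abort
    p_abort_fail = 0.10 * p_noncata     # abort failure

    p_crew_survives = p_success + p_abort_ok
    p_crew_lost = p_cata + p_abort_fail

    print(round(p_crew_survives, 5))    # 0.97245
    print(round(p_crew_lost, 5))        # 0.02755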

3.4 Permutations and combinations


Many problems in probability require the total number of points in a
sample space to be counted. When there is a very large number of
these, a knowledge of combinatorial theory, and in particular of
permutations and combinations, is useful.



A quantity frequently used in combinatorial theory is factorial n, which is defined as

factorial n = n! = n(n - 1)(n - 2) ... 3 × 2 × 1.

Note that it will be convenient to define 0! to be one.

3.4.1 Permutations
The number of ways in which r items can be selected from n distinct items, taking notice of the order of selection, is called the number of permutations of n items taken r at a time, and is denoted by nPr or P(n, r).

nPr = n(n - 1) ... (n - r + 1)
    = n!/(n - r)!

For example, the number of permutations of the letters a, b, c taken two at a time is

3P2 = 3 × 2 = 6.

These are ab, ac, bc, ba, ca, cb.

3.4.2 Combinations
The number of ways in which r items can be selected from n distinct items, disregarding the order of the selection, is called the number of combinations of n items taken r at a time. This is denoted by nCr, or by the binomial coefficient symbol (n over r).

There are r! permutations which correspond to the same combination. Thus we have

nCr = nPr/r! = n!/[(n - r)! r!].

For example, the number of combinations of the letters a, b, c taken two at a time is

3C2 = 3!/(1! 2!) = 3.

These are ab, ac, bc.

Important note: ab is the same combination as ba but not the same permutation.
The above expressions enable us to solve many problems in probability.
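For larger problems these counts are tedious by hand. Python's standard library provides them directly (math.perm and math.comb, available from Python 3.8 onwards); a minimal sketch:

    import math

    # Permutations and combinations of 3 letters taken 2 at a time.
    print(math.perm(3, 2))   # 6  -> ab, ac, bc, ba, ca, cb
    print(math.comb(3, 2))   # 3  -> ab, ac, bc

    # The general relations nPr = n!/(n-r)! and nCr = nPr/r!
    n, r = 10, 4
    assert math.perm(n, r) == math.factorial(n) // math.factorial(n - r)
    assert math.comb(n, r) == math.perm(n, r) // math.factorial(r)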



Example 16

Consider the probability of being dealt a bridge hand which consists of all thirteen cards of one suit.
The total number of possible hands is the number of combinations of thirteen cards out of fifty-two. (We are not concerned with the order in which the cards are dealt.) This is 52C13. The total number of perfect hands is just four (all hearts, all spades, all diamonds or all clubs). If all possible hands are equally likely then

probability (perfect hand) = 4/52C13.

This is very small indeed (less than 10^-11). Most such events which are recorded in the newspapers are probably hoaxes.

Example 17
An inspector draws a sample of five items from a batch of a hundred valves which are numbered from one to a hundred to distinguish them. How many distinct samples can he choose? If there is one defective item in the batch, how many distinct samples of size five can be drawn which contain the defective item? What is the probability of drawing a sample which contains the defective item?
As we are not concerned with the order in which the valves are selected, the total number of distinct samples is the number of combinations of five from a hundred, that is, 100C5. In the second case one of the items is fixed and we need to find the total number of ways of selecting four more items from the remaining ninety-nine valves. This is 99C4.
Thus the probability of drawing a sample which contains the defective item is

99C4/100C5 = 1/20.

This probability can of course be written down straight away by noting that any valve has a probability of 1/20 of being in any particular sample.
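Both of the last two results are easily confirmed numerically; a minimal Python sketch using math.comb:

    import math
    from fractions import Fraction

    # Example 16: a perfect bridge hand.
    p_perfect = Fraction(4, math.comb(52, 13))
    print(float(p_perfect))   # about 6.3e-12, i.e. less than 10**-11

    # Example 17: a sample of 5 valves containing the one defective valve.
    p_defective = Fraction(math.comb(99, 4), math.comb(100, 5))
    assert p_defective == Fraction(1, 20)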

3.5 Random variables


One of the basic ideas in probability is that of a random variable. In
many cases this is simply the numerical variable under consideration.

For example, if a penny is tossed twice, the number of heads which turn
up can be either 0, 1 or 2 according to the outcome of the experiment.
We say that the number of heads is a random variable, since it expresses
the result of the experiment as a number. Other simple random variables
are the temperature of a chemical process and the resistance of a
thermocouple.
More generally a random variable is a rule for assigning a particular
number to a particular experimental outcome. For example, if the
result of an experiment is recorded as say ‘the experiment was success­
ful’ or ‘the experiment was not successful’, then to obtain a random
variable we must code the results so that, for example, a success corres­
ponds to 1 and a failure to 0. We can also consider more complicated
random variables. For example, if a penny is tossed twice, the random
variable X = (number of heads)² can take the values 0, 1 or 4 according to the outcome of the experiment. The sample variance is another example of a random variable, since for any given series of observations, x1, x2, ..., there is a corresponding number s², the sample variance, which varies from sample to sample.


Mathematically a random variable is a numerically valued function
defined on the sample space. This means that for every possible
experimental outcome (sample point), there is a corresponding number.
This number is one way of expressing the information which results
from the experiment.
We have already discussed in section 2.1 the difference between
discrete and continuous experimental data. Similarly a random variable
is said to be discrete if it can only take a discrete set of values, and is
said to be continuous if it can take any value in some specified range.

Example 18
The items in a sample of ten electrical units are classified as defective or non-defective. Is the number of defectives in the sample, X, a discrete or continuous random variable?
The number of defectives in the sample can be any integer between 0 and 10. Thus X is a discrete random variable (it cannot take the value 1½, for example).



Example 19

Consider the random variable X = length of a screw. This variable can be any positive number, so that it is a continuous variable. However if the length is measured, then the accuracy of the measurements is limited by the accuracy of the measuring instrument, so that in practice the measured length can only take discrete values. Thus the distinction between discrete and continuous data is not clear-cut. If the length is measured to two decimal places and we observe say 2.40 inches, then

probability (measured length = 2.40 inches)
    = probability (2.395 < X < 2.405).
Although all the probability concepts discussed thus far have been
illustrated with discrete examples, the ideas are equally applicable to
continuous variables, with one important distinction. The number of
sample points in a continuous sample space is always infinite and the
probability of a single sample point is zero. As in Example 19 the
probability of a single sample point is replaced by the idea of the
probability of an interval, which contains an infinite number of sample
points.
In Chapter 4 we will continue our discussion of discrete random
variables and will return to continuous variables in Chapter 5.

Exercises
1. Two parts of a machine are manufactured independently. These two parts have probabilities p1 and p2 respectively of failing. Find the
probability that
(a) neither part fails
(b) at least one part fails
(c) exactly one part fails.
2. A component A which has 90 per cent reliability (it will fail one in
ten times) is vital to the running of a machine. An identical component
is fitted in parallel to form system S1, and the machine will work
provided that one of these components functions correctly.

3. In two rolls of a fair die what is the probability that
(a) the two outcomes are the same?
(b) the second outcome is larger than the first?
(c) the second outcome is a five?

4. What is the probability of getting


(a) at least three heads in four tosses of a fair coin?
(b) at least two tails in four tosses of a fair coin? (Assume that all 2^4 = 16 outcomes are equally likely.)

5. A ball is chosen at random out of an urn containing three black balls and two red balls, and then a second ball is chosen from the remaining four balls. Find the probability that a black ball will be selected
(a) the first time,
(b) the second time,
(c) both times (assume all twenty outcomes are equally likely).

6. How many possible pairs consisting of a president and a vice-


president can be formed from a club of sixty members, provided no
member can hold both offices?

7. Special alloys are usually produced in batches called ‘melts’. Any


castings, forgings or machined parts which are made from these alloys
carry an identification of the melt because there may be significant
variation from melt to melt. The usual identification employed is three
letters. How many melt designations are possible?

8. A group of randomly selected people compare the months and days


of their birthdays. How large a group is required to have at least an
even chance of finding two people with the same birthday? (Ignore
leap years.) Hint: Define the event E2 as 'the second birthday does not match the first'; the event E3 as 'the third does not match the first or second'; and so on. Then

P(no match between n people) = P(E2E3 ... En)
    = P(E2)P(E3|E2)P(E4|E3E2) ... P(En|En-1 ... E2).

9. If a batch contains ninety good and ten defective valves, what is the
probability that a sample size five drawn from the batch will not
contain any defective items?



10. (a) In order that the system

S1: --(A)--(B)--

consisting of components A and B should function correctly, both components must function correctly. Assuming that each component functions independently of the other, find the probability that S1 functions, given that A, B have probabilities 0.8, 0.9 respectively of functioning correctly.
(b) The system S2 consists of two such A-B series arrangements connected in parallel, in such a way that if either of the sub-systems A-B functions correctly, S2 functions correctly. Find the probability that S2 functions correctly.
(c) The system S3 consists of a pair of A components in parallel, connected in series with a pair of B components in parallel; it functions correctly if at least one A and at least one B function correctly. Find the probability that S3 functions correctly.

11. You are playing Russian roulette with a gun which has six chambers. What is the probability of someone being killed during the first six pulls of the trigger? (The chamber is spun after each firing and you may assume that the probability of hitting a loaded chamber is 1/6. This 'game' is not recommended!)

Chapter 4
Discrete distributions

4.1 The discrete probability distribution


One of the most important concepts in probability is the idea of a
probability distribution. This chapter deals with discrete distributions,
while continuous distributions are considered in Chapter 5. We begin
with a formal definition. If a discrete random variable, X, can take values x1, x2, ..., xN with probabilities p1, p2, ..., pN, where p1 + p2 + ... + pN = 1 and pi ≥ 0 for all i, then this defines a discrete probability distribution for X.
The probability that X takes a particular value x will be denoted by P(X = x) or simply P(x).

Example 1
Toss two fair coins. Consider the random variable X = number of heads which turn up. Then X can take the values 0, 1 or 2 according to the outcome of the experiment. From Example 3, Chapter 3 we know P(X = 0) = 1/4, P(X = 1) = 1/2, P(X = 2) = 1/4. The sum of these probabilities is one. Thus this random variable follows a discrete probability distribution.

Example 2
Throw two fair dice. Consider the random variable X = (sum of the pips on the two dice). Then X can take the values 2, 3, 4, ..., 11, 12 according to the outcome of the experiment. By inspection of the table in Example 6, Chapter 3, we find

P(2) = 1/36   P(3) = 2/36   P(4) = 3/36   P(5) = 4/36    P(6) = 5/36
P(7) = 6/36   P(8) = 5/36   P(9) = 4/36   P(10) = 3/36   P(11) = 2/36
P(12) = 1/36.

These probabilities sum to unity and we have another discrete distribu­
tion.
The discrete probability distribution is a very important tool in
probability and statistics and in the remainder of the chapter we will
consider two valuable distributions which frequently arise in practice.
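A distribution such as that of Example 2 can be built up by enumerating the sample space; a minimal Python sketch:

    import itertools
    from fractions import Fraction

    # Distribution of the sum of the pips on two fair dice.
    counts = {}
    for d1, d2 in itertools.product(range(1, 7), repeat=2):
        s = d1 + d2
        counts[s] = counts.get(s, 0) + 1

    dist = {s: Fraction(c, 36) for s, c in counts.items()}
    print(dist[7])                   # 1/6
    assert sum(dist.values()) == 1   # the probabilities sum to unity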

4.2 The binomial distribution


This distribution has a wide range of practical applications, ranging
from sampling inspection to the failure of rocket engines.
Suppose that a series of n independent trials is made, each of which
can be a success, with probability p, or a failure, with probability (1 —p).
The number of successes which is observed may be any integer between
0 and n. In section 4.3 we will show that
(probability of getting r successes out of n) = P(r)
    = nCr p^r (1 - p)^(n-r)   (r = 0, 1, ..., n).

These probabilities define a discrete probability distribution customarily called the binomial distribution. The probabilities are obtained when the expression

[p + (1 - p)]^n = Σ nCr p^r (1 - p)^(n-r)   (sum over r = 0 to n)

is expanded by the binomial theorem. In fact this is probably the easiest way to remember the binomial probabilities. It also explains why the distribution is called the binomial distribution and proves that the sum of the probabilities is one.
A discrete distribution can be represented pictorially by a bar chart. Figure 13 shows four examples of the binomial distribution, with different values for n and p, to show the various distribution shapes which can occur.
Figure 13(c) is the mirror image of Figure 13(b), since the values for n are the same and 0.9 = 1 - 0.1. When p = 0.5 the binomial distribution is symmetric about the point n/2; Figure 13(d) is an example of this with n = 20.
The terms of a binomial distribution can readily be calculated on a
pocket calculator for small values of n, but for values of n above about
20 it is often more convenient to approximate the binomial
distribution either with a Poisson distribution when np is ‘small’ (see
Section 4.6) or with a normal distribution (see Section 5.4).



QA (b )

0-2

0^0^.......

:0-4r (c)

0-2

00
9 10 11 12 13 14 15

F ig u ré is Binomial distributions
(a) reverse J-shaped, a? = 3, p = ¿ (b) positively skewed or skewed to the right,
/? = 15, p = 0-1 (c) negatively skewed or skewed to the left, a? = 15, p = 0-9
(d) symmetric, a? = 20, p = 0-5

4.3 The binomial model
The binomial distribution is applicable whenever a series of trials is
made which satisfies the following conditions.
(1) Each trial has only two possible outcomes, which are mutually
exclusive. We will usually call the two outcomes success (denoted by S)
and failure (denoted by F). Other possible pairs of outcomes are
defective or non-defective, go or no-go, heads or tails.
(2) The probability of a ‘success’ in each trial is a constant, usually
denoted by p. Thus the probability of a ‘failure’ is (1 —p).
(3) The outcomes of successive trials are mutually independent.
A typical situation in which these conditions will apply (at least
approximately) occurs when several items are selected at random
from a very large batch and examined to see if there are any defective
items (that is, failures). The number of defectives in a sample size n
is a random variable, denoted by X, which can take any integer value
between 0 and n. In order to find the probability distribution of X,
we first consider the simple case when n = 2.

4.3.1 2 trials
If two trials are made, the sample space consists of the four points SS, SF, FS and FF.

Probability (2 successes) = P(SS)
    = P(S)P(S)   by the assumption of independence
    = p².

Probability (1 success) = P(SF) + P(FS)
    = p(1 - p) + (1 - p)p.

Probability (0 successes) = P(FF)
    = (1 - p)².

Thus the probability distribution of the number of successes in two trials is

P(0) = (1 - p)²   P(1) = 2p(1 - p)   P(2) = p².

Note that P(0) + P(1) + P(2) = 1 as required.



This distribution can be obtained by substituting n = 2 into the
general formula for the binomial distribution which was given in the
previous section.

Example 3
Toss two fair pennies. Let X = number of heads showing. If we think of a head as a success and a tail as a failure we have

p = (probability of a head in one throw) = 1/2.

Substituting this value of p into the above formulae we have

P(0) = 1/4   P(1) = 1/2   P(2) = 1/4

as already obtained in Example 1.

4.3.2 n trials
More generally we will consider the case where n trials are carried out. The probability of getting r successes in n trials can be found as follows:

P(exactly r successes) = P(r successes and (n - r) failures).

There are many possible sequences which can give exactly r successes. For example, we can start with r successes and finish with (n - r) failures. Since successive trials are independent we have

P(SS ... S FF ... F) = p^r (1 - p)^(n-r).

Every possible ordering of r successes and (n - r) failures will have the same probability, p^r (1 - p)^(n-r). The number of possible orderings is the number of combinations of r out of n. Thus we have

P(exactly r successes) = nCr p^r (1 - p)^(n-r)   (r = 0, 1, 2, ..., n)

and so the number of successes follows the binomial distribution.
In order to derive the binomial distribution in the above way, we
have carefully stated certain assumptions about the behaviour of
successive trials. These assumptions constitute what is called the
binomial model. This model is a typical example of a mathematical
model in that it attempts to describe a particular physical situation in
mathematical terms. Such models may depend on one or more para­
meters which specify how the system will behave. In the binomial model

the quantities n and p are parameters. If the values of the model para­
meters are known, it is a simple matter to calculate the probability of
a particular outcome. We will now illustrate the use of the binomial
distribution with some examples.

Example 4
Roll three dice. Find the distribution of the random variable X = (number of twos which turn up).
X can take the values 0, 1, 2 or 3. Now the probability of getting a two for one die is 1/6. An incorrect but popular deduction is that the probability of getting at least one two from three dice is 3/6. Let us call the event 'getting a two' a success, so that the probability of a success is 1/6. As the results from the three dice are independent, the binomial model is applicable and the distribution of the number of twos is a binomial distribution with parameters n = 3, p = 1/6.

P(0) = (5/6)³ = 0.58,
P(1) = 3C1 (1/6)(5/6)² = 0.35,
P(2) = 3C2 (1/6)²(5/6) = 0.07,
P(3) = (1/6)³ = 0.005.

Note that the probability of getting at least one 2 is 0.42, which is considerably less than 3/6.
In effect this means that if we roll 3 dice 100 times we would expect to get the following results.

Number of twos    Number of occurrences
0                 58
1                 35
2                 7
3                 0

Of course if we actually try it we will probably get a slightly different distribution due to sampling fluctuation. The above distribution is what we expect on average.
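A general function for binomial probabilities makes such calculations routine; the following minimal Python sketch reproduces the figures of Example 4 (to three decimal places):

    from math import comb

    def binom_pmf(r, n, p):
        # P(exactly r successes in n independent trials).
        return comb(n, r) * p**r * (1 - p)**(n - r)

    # Number of twos when three dice are rolled: n = 3, p = 1/6.
    for r in range(4):
        print(r, round(binom_pmf(r, 3, 1/6), 3))
    # 0 0.579, 1 0.347, 2 0.069, 3 0.005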

Example 5
On past evidence, an electrical component has a probability of 0.98 of being satisfactory. What is the probability of getting two or more defectives in a sample size five?



The number of defectives in a sample is a binomial random variable with p = 0.02, n = 5, so that

P(0 defectives) = 0.98^5 = 0.904,
P(1 defective) = 5 × 0.98^4 × 0.02 = 0.092.

This means that the probability of getting two or more defectives in a sample size five is

1 - 0.904 - 0.092 = 0.004,

which is very small indeed.
In effect this means that if two or more defectives are observed in
practice then it is likely that something has gone wrong with the
manufacturing process.

Example 6
The latest recoilless rocket launcher has a kill probability of 0.60. How many rockets would you have to fire to have at least a 95 per cent probability of destroying the target? What is the probability distribution of salvos of this size?
Let the required salvo contain n rockets. The probability of success of each rocket is given by p = 0.60. Thus the number of rockets which actually hit the target has a binomial distribution with parameters n and p = 0.60. Thus the probability that no rockets hit the target is given by (1 - p)^n = 0.40^n.
Hence the probability that the target is destroyed (that is, at least one rocket hits the target) is given by

1 - 0.40^n.

We are told that this must be greater than 0.95. Therefore

1 - 0.40^n > 0.95,
0.40^n < 0.05,

giving n = 4 since n must be an integer.
Then the probability distribution for salvos of this size is given by

P(r hits) = 4Cr (0.60)^r (0.40)^(4-r)   (r = 0, 1, 2, 3, 4)

P(0 hits) = 0.026
P(1 hit) = 0.154
P(2 hits) = 0.346
P(3 hits) = 0.346
P(4 hits) = 0.130.
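The search for the smallest adequate salvo, and the resulting distribution, can be programmed directly; a minimal Python sketch:

    from math import comb

    p_hit = 0.60

    # Smallest n with P(at least one hit) = 1 - 0.4**n > 0.95.
    n = 1
    while 1 - (1 - p_hit)**n <= 0.95:
        n += 1
    print(n)   # 4

    # Distribution of hits for a salvo of this size.
    for r in range(n + 1):
        print(r, round(comb(n, r) * p_hit**r * (1 - p_hit)**(n - r), 3))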

Example 7
An inspector takes a random sample of ten items from a very large batch. If none of the items is defective he accepts the batch; otherwise he rejects the batch. What is the probability that a batch is accepted if the fraction defective is 0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1? Plot this probability against the fraction defective.

Figure 14

The number of defectives in a sample is a binomial random variable with parameters n = 10 and p = fraction defective. Thus

P(0 defectives) = (1 - p)^10 = probability that a batch is accepted.

When p = 0 the batch is certain to be accepted; when p = 1 the batch is certain to be rejected. For intermediate values of p the expression (1 - p)^10 must be evaluated. The results are plotted in Figure 14.
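The acceptance probabilities plotted in Figure 14 (often called the operating characteristic of the sampling plan) can be tabulated in the same way; a minimal Python sketch:

    # P(accept batch) = (1 - p)**10 for each fraction defective p.
    for p in [0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1]:
        print(p, round((1 - p)**10, 3))
    # e.g. p = 0.05 gives 0.599, p = 0.2 gives 0.107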



4.4 Types of distribution
Before continuing with our discussion of theoretical discrete distribu­
tions, we will pause briefly to distinguish between the four different
types of distribution which can occur.

Table 5

Discrete Continuous

Theoretical
(parameters known) type I type III

Empirical
(parameters unknown) type II type IV

In Chapter 2 we studied empirical frequency distributions of types II


and IV. For example, the distribution given in Table 1 is an example of
type II, and the distribution given in Table 2 is an example of type IV.
The technologist will often wish to analyse empirical frequency distribu­
tions of types II and IV and once again it is worth emphasizing that
they are easier to analyse when the corresponding theoretical distribu­
tions have been examined and understood. In other words the best
method of analysing empirical frequency distributions is to set up a
model for the physical situation and hence find the probability distribu­
tion which adequately describes the data. There are many theoretical
distributions available, and the binomial distribution is a particularly
useful example of type I. A little later in the chapter, in Table 6, the
student will find an empirical frequency distribution together with the
corresponding binomial distribution which adequately describes the
data. Theoretical continuous distributions of type III will be described
in Chapter 5.
We now return to a discussion of some of the properties of the
binomial distribution.

4.5 The mean and variance of the binomial distribution


In Chapter 2 we saw how to calculate the sample mean of an empirical frequency distribution. If the values x1, x2, ..., xN occur with frequencies f1, f2, ..., fN, then the sample mean is given by

x̄ = Σ fi xi / Σ fi   (sums over i = 1 to N).

This quantity is an estimate of the true population mean, which is usually denoted by μ. In the case of the binomial distribution, the theoretical mean depends on the values of the binomial parameters, n and p. If a series of n trials is made, each of which has a probability p of being a success, then the average or mean number of successes is given by

μ = np.

For example, if a penny is tossed 20 times, the average number of heads observed will be 10 = 20 × 1/2, since n = 20 and p = 1/2. Of course in a single experiment we may observe any integer between 0 and 20. The above result says that in the long run the average number of heads observed will be 10.
It is very important to understand the difference between the theoretical mean, μ, and the sample mean, x̄. In most practical situations the theoretical mean is unknown and the sample mean is used to estimate it (see Chapter 6). But the two quantities will almost always be different.
To illustrate this point we will describe an experiment in which the theoretical mean is actually known. Suppose that each person in a group of people randomly selects ten telephone or car numbers. Then there is a probability p = 1/2 that any number will be even. The actual total of even numbers that one person selects may be any integer between 0 and 10, and the probability of getting exactly r even numbers is given by the binomial distribution with n = 10 and p = 1/2. These probabilities are given in Table 6. The experiment was carried out by 30 people and the number of people who chose r even numbers out of 10 is also recorded in Table 6. These observed frequencies can be compared with the binomial probabilities by multiplying the latter by the sample size, 30, to give a set of expected frequencies. Comparing the two distributions by eye, it can be seen that the binomial distribution gives a reasonably good fit to the observed frequency distribution.
The theoretical mean of this binomial distribution is given by np = 5. Thus a person will choose five even numbers on average.



Table 6
Theoretical and Observed Distributions of Even Telephone Numbers

r     P_r     Expected frequency    Observed     Number of observed
              = P_r × 30            frequency    even numbers
0     0.001   0.0                   0            0
1     0.010   0.3                   0            0
2     0.044   1.3                   2            4
3     0.117   3.5                   3            9
4     0.205   6.2                   5            20
5     0.246   7.4                   9            45
6     0.205   6.2                   3            18
7     0.117   3.5                   6            42
8     0.044   1.3                   1            8
9     0.010   0.3                   1            9
10    0.001   0.0                   0            0

Totals        30.0                  30           155

The sample mean of the observed frequency distribution can be obtained by dividing the total number of observed even numbers by thirty. This gives

x̄ = 155/30 = 5.17.

Thus there is a difference of 0.17 between the sample mean and the theoretical mean. However the values are close enough together to demonstrate that the observed distribution is very similar to the theoretical binomial distribution. In fact in Chapter 6 we shall see that the sample mean, x̄, tends to get closer to the theoretical mean as the sample size is increased. In other words if 1000 people each randomly select ten numbers then the average number of even numbers will probably be much closer to five.
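The behaviour of the sample mean can be imitated by simulation, representing each chosen number by a random final digit (an assumption made purely for this illustration); a minimal Python sketch:

    import random

    def mean_evens(n_people, n_digits=10):
        # Average number of even digits per person, each digit random.
        total = 0
        for _ in range(n_people):
            total += sum(1 for _ in range(n_digits)
                         if random.randint(0, 9) % 2 == 0)
        return total / n_people

    print(mean_evens(30))     # a sample mean; it fluctuates around np = 5
    print(mean_evens(1000))   # usually closer to the theoretical mean 5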

4.5.1 The expected value


The mean of a probability distribution is not always as easy to find
as that of the binomial distribution and so we will give a formal defini­
tion of the mean of a distribution based on the expected value symbol
E. The idea of the expected value of a random variable is an extremely
useful one and can be used to calculate many other quantities (see
section 6.2).

If X is a discrete random variable which can take values x1, x2, ..., xN with probabilities p1, p2, ..., pN, then the average or expected value of X is given by

E(X) = Σ xi pi   (sum over i = 1 to N).    4.1

It may not be immediately obvious that this formula will give a quantity which is the average value of X in a long series of trials. However there is a clear analogy with the formula for the sample mean if we use the fact that Σ pi = 1 and write the above formula as

E(X) = Σ xi pi / Σ pi.

We can check this formula by calculating the expected value of a binomial random variable, which we already know is np. From equation 4.1 we have

E(X) = Σ r nCr p^r (1 - p)^(n-r)   (sum over r = 0 to n)
     = np Σ (n-1)Cs p^s (1 - p)^(n-1-s)   (sum over s = r - 1 = 0 to n - 1)
     = np [p + (1 - p)]^(n-1)
     = np.

Example 8
Toss two coins and let X = number of heads. It is clear that the expected number of heads will be one. This is confirmed by applying the above formula, which gives

E(X) = 0 × P(X = 0) + 1 × P(X = 1) + 2 × P(X = 2)
     = 1 × 1/2 + 2 × 1/4
     = 1.



Example 9
A gambler wins £20 with probability 0-1 and loses £1 with probability
0*9. What is his expected win?
Let X = amount won.
Then A = +20 with probability OT
X = —1 with probability 0-9.
Thus E(X) = 20x0*1-1 xO-9
= M.
Note that the expected value need not be a possible value of the
random variable. Thus in Example 9 the gambler cannot win £1*1 in
any particular game. The result says that in a long series of games he
will win £ 1*1 per game on average.

4.5.2 Variance
The idea of an expected value is also extremely useful when calculating
the theoretical variance, and hence the theoretical standard deviation,
of a probability distribution.
We have already seen in Chapter 2 how to calculate the sample variance of a set of data. If n observations x1, ..., xn are taken from a population which has mean μ, the sample variance is given by

s² = Σ (xi - μ)² / n.

This quantity is an estimate of the true population variance, which is usually denoted by σ². Let us suppose that X is a discrete random variable, with mean μ, which can take values x1, x2, ..., xN with probabilities p1, p2, ..., pN. Then by analogy with the formula for s² we have

variance (X) = σ²
             = Σ (xi - μ)² pi.    4.2

The variance is a measure of the spread of the distribution around its expected value μ. In terms of the expected value symbol we have

variance (X) = E[(X - μ)²].

In other words the variance of X is the average or expected value of (X - μ)² in a long series of trials.
From equation 4.2 the theoretical variance of the binomial distribution is given by

Σ (r - np)² nCr p^r (1 - p)^(n-r)   (sum over r = 0 to n),

which after a lot of algebra gives the simple formula np(1 - p). From this result it follows that the standard deviation of the binomial distribution is given by √[np(1 - p)].

4.6 The Poisson distribution

Another important discrete distribution is the Poisson distribution, which is named after a French mathematician. The probability distribution of a Poisson random variable is given by

P(r) = e^(-μ) μ^r / r!   (r = 0, 1, 2, ...; μ > 0).

These probabilities are non-negative and sum to one:

Σ P(r) = e^(-μ) Σ μ^r / r!   (sums over r = 0 to ∞)
       = e^(-μ) e^μ
       = 1.

Thus the probabilities form a discrete probability distribution. These probabilities are easy to calculate from P(0) = e^(-μ), using the recurrence relationship

P(r + 1) = μ P(r) / (r + 1).
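The recurrence gives a convenient way of generating a whole set of Poisson probabilities without evaluating factorials; a minimal Python sketch:

    import math

    def poisson_pmf(mu, r_max):
        # Poisson probabilities P(0), ..., P(r_max) via the recurrence.
        probs = [math.exp(-mu)]                       # P(0) = e^(-mu)
        for r in range(r_max):
            probs.append(probs[-1] * mu / (r + 1))    # P(r+1) = mu P(r)/(r+1)
        return probs

    print([round(p, 3) for p in poisson_pmf(1.2, 4)])
    # [0.301, 0.361, 0.217, 0.087, 0.026]; close to Table 7 below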



The Poisson distribution has two main applications; firstly for
describing the number of ‘accidents’ which occur in a certain time
interval, and secondly as a useful approximation to the binomial
distribution when the binomial parameter p is small. We begin with the
former situation.

Figure 15 Events occurring at random instants along a time axis (× denotes an event)

Let us suppose that events occur randomly as shown in Figure 15 and that there are λ events on average in a unit time interval. This means that in a time interval of length t there will be λt events on average. But the actual number observed in one particular time interval may be any non-negative integer. A typical situation of this type was presented in Example 2, Chapter 1, where the arrival of a cosmic particle can be thought of as an 'accident' or event. The number of particles striking an apparatus in successive periods of one minute varied between nought and four. The reader must realize that we cannot predict exactly how many particles will arrive in a particular time interval. What we can do is to predict the pattern of arrivals in a large number of such time intervals. We will show that the number of accidents, in a time interval of length t, is a Poisson random variable with parameter μ = λt. The proof is rather sophisticated and the student may prefer to omit the following subsection at the first reading.

4.6.1 The Poisson model

In a very small time interval of length Δt, let us suppose that the probability of observing one accident is given by λΔt and the chance of observing more than one accident is negligible. In addition let us suppose that the numbers of accidents in two different time intervals are independent of one another.
Let P(r, t) = probability of observing exactly r accidents in a time interval of length t. Then

P(0, Δt) = 1 - λΔt   (from the first assumption)

and

P(0, t + Δt) = P(0, t)P(0, Δt)   (by the assumption of independence)
             = P(0, t)(1 - λΔt).

Rearranging this equation we obtain

[P(0, t + Δt) - P(0, t)]/Δt = -λP(0, t).

The left hand side of this equation can be recognized as the definition of dP(0, t)/dt as Δt → 0. This simple first order differential equation for P(0, t) can be solved to give

P(0, t) = e^(-λt),

using the initial condition P(0, 0) = 1.
A similar differential equation for P(1, t) can be obtained by noting that

P(1, t + Δt) = P(1, t)P(0, Δt) + P(0, t)P(1, Δt)
             = P(1, t)(1 - λΔt) + e^(-λt) λΔt.

This gives

[P(1, t + Δt) - P(1, t)]/Δt → dP(1, t)/dt = -λP(1, t) + λe^(-λt),

which can be solved to give

P(1, t) = λt e^(-λt),

using the initial condition P(1, 0) = 0.
In a similar way, we can find P(2, t), P(3, t), .... Generally we find

P(r, t) = e^(-λt)(λt)^r / r!   (r = 0, 1, 2, ...).

If we put

μ = λt = (average number of accidents in time t),

then we have a Poisson distribution as previously defined.
This model for the Poisson distribution depends on a random physical mechanism called a Poisson process, further details of which can be obtained from any book on stochastic (that is, random) processes (see the Further Reading list).



4.6.2 Properties of the Poisson distribution
(i) In the Poisson process, the average number of accidents in time t is given by μ = λt. Thus μ must be the mean of the Poisson distribution. The mean can also be calculated using equation 4.1:

mean = Σ r P(r)   (sum over r = 0 to ∞)
     = Σ r e^(-μ) μ^r / r!   (sum over r = 1 to ∞)
     = μ Σ e^(-μ) μ^(r-1) / (r - 1)!
     = μ.

(ii) An important feature of the Poisson distribution is that the variance is equal to the mean, μ. Thus the standard deviation is given by √μ. We find

variance = E[(X - μ)²]
         = Σ (r - μ)² P(r)   (from equation 4.2)
         = Σ {r(r - 1) + r - 2μr + μ²} P(r)
         = μ² + μ - 2μ² + μ²   (after some algebra)
         = μ.

(iii) The Poisson distribution is a useful approximation to the binomial distribution when the binomial parameter p is small. The binomial distribution has mean np and variance np(1 - p). Thus when p is small we find

variance ≈ np = mean,

as for the Poisson distribution. Then it can be shown (see for example, Hoel, 1962) that if n → ∞ and p → 0 in such a way that the mean, μ = np, stays fixed, then the binomial probabilities tend to the Poisson probabilities. This is of practical importance because it is much easier to calculate Poisson probabilities than the corresponding binomial probabilities. Generally speaking if n is large (>20) and the mean, μ = np, is small (<5) then the approximation will be sufficiently accurate.

Figure 16 Poisson distributions
(a) μ = 0.5 (b) μ = 1 (c) μ = 3

(iv) Figure 16 shows three examples of Poisson distributions, with different values for μ, to show the types of distribution which can occur.

Example 10
Suppose that the number of dust particles per unit volume in a certain mine is randomly distributed with a Poisson distribution and that the average density is μ particles per litre.
A sampling apparatus collects a one-litre sample and counts the number of particles in it. If the true value of μ is six, what is the probability of getting a reading less than two?
Let X = (number of particles in a one-litre sample). Then

P(X = r) = e^(-6) 6^r / r!   (r = 0, 1, 2, ...).

Therefore

P(X < 2) = P(X = 0) + P(X = 1)
         = e^(-6) + 6e^(-6)
         = 0.0174.



Example 11

Suppose that the probability of a defect in a mile of steel wire is 0.01. A steel cable consists of a hundred strands and will support its design load with ninety-nine good strands. What is the probability that a mile-long cable will support its design load?
Consider the random variable X = number of defective strands in a mile-long cable. This will have a binomial distribution with parameters

n = 100 (strands)
and p = 0.01 = (probability that any strand is defective).

Then

P(cable is not defective)
    = P(no bad strands or just one bad strand)
    = 100C0 (0.01)^0 (0.99)^100 + 100C1 (0.01)(0.99)^99
    = 0.366 + 0.370
    = 0.736.

As n is large and p is small it is possible to approximate this binomial distribution with a Poisson distribution which has the same mean, μ = np = 1.
The probability that the cable has r defective strands is given approximately by e^(-1)/r!. Thus the probability that the cable is not defective is given approximately by

e^(-1) + e^(-1) = 0.368 + 0.368
                = 0.736.

The answers obtained from the binomial and Poisson distributions are virtually identical, and since it is much easier to calculate e^(-1) than 0.99^100, it is preferable to use the Poisson distribution approximation.
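The closeness of the two answers can be checked directly on a computer; a minimal Python sketch:

    import math

    n, p = 100, 0.01
    mu = n * p   # = 1

    # Binomial: P(X <= 1).
    binom = sum(math.comb(n, r) * p**r * (1 - p)**(n - r) for r in range(2))

    # Poisson approximation with the same mean.
    poisson = sum(math.exp(-mu) * mu**r / math.factorial(r) for r in range(2))

    print(round(binom, 4), round(poisson, 4))   # 0.7358 0.7358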

Example 12
The number of cosmic particles striking an apparatus in forty consecutive periods of one minute were given in Example 2, Chapter 1. On the assumption that cosmic particles arrive randomly at a constant over-all rate, one expects the Poisson distribution to describe the frequency distribution of the number of particles arriving in a one-minute period.
The total number of particles observed is

13 + (2 × 8) + (3 × 5) + (4 × 1) = 48.

(The average number of particles observed per minute) = 48/40 = 1.2. The Poisson distribution with this mean is given by

P(r) = e^(-1.2)(1.2)^r / r!   (r = 0, 1, 2, ...)
     = (probability of observing r particles in a one-minute period if μ = 1.2).

These probabilities are tabulated in column 3 of Table 7.

Table 7

Number of particles (r)    Number of periods    Poisson          Poisson
in a one-minute period     with r particles     probabilities    frequencies
0                          13                   0.301            12.0
1                          13                   0.361            14.4
2                          8                    0.216            8.7
3                          5                    0.087            3.5
4                          1                    0.026            1.0
5+                         0                    0.011            0.4

Total                      40                   1.0              40.0

Multiplying these probabilities by forty (the sample size) gives the


frequencies which can be expected if the Poisson model is appropriate.
These frequencies are tabulated in column 4 of Table 7. Comparing the
observed and theoretical frequencies by eye, we see that there appears
to be good agreement between the two distributions. This indicates that
the Poisson model really is appropriate in this situation.
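The fitting procedure of Example 12 (estimate the mean, then compute expected frequencies) can be expressed in a few lines; a minimal Python sketch which reproduces Table 7 up to rounding:

    import math

    observed = {0: 13, 1: 13, 2: 8, 3: 5, 4: 1}
    n_periods = sum(observed.values())                          # 40
    mean = sum(r * f for r, f in observed.items()) / n_periods  # 1.2

    for r in range(5):
        p = math.exp(-mean) * mean**r / math.factorial(r)
        # observed frequency vs expected (Poisson) frequency
        print(r, observed[r], round(n_periods * p, 1))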

4.7 Bivariate discrete distributions


So far we have considered the discrete probability distribution for a
single random variable. It is a straightforward matter to extend the



ideas to deal with the situation in which we are interested in two random variables, X and Y, at the same time.
The joint probability that X will take a particular value x, and that Y will take a particular value y, is denoted by

P(x, y) = P(X = x, Y = y).

This function is such that

Σ P(x, y) = 1   (sum over all x, y).

The probability of obtaining a particular value of one random variable without regard to the value of the other random variable is called the marginal probability. Thus

PX(x) = Σ_y P(x, y),
PY(y) = Σ_x P(x, y)

are the marginal probability distributions of the two random variables. The two random variables are said to be independent if

P(x, y) = PX(x)PY(y) for all values of x and y.

The idea of independence can be clarified as follows. Suppose that we know that a particular value of Y, say y, has been observed, and we want to know the probability that a particular value of X will occur. We write

P(x|y) = probability that X = x given that Y = y.

We call this the conditional probability that X equals x given that the value of Y is y. From equation 3.4 we have

P(x|y) = P(x, y)/PY(y).

Thus if X, Y are independent we have

P(x|y) = PX(x),

and then the knowledge that a particular value of Y has been observed does not affect the probability of observing a particular value of X. The conditional distribution of X is then the same for all values of Y (and vice versa).

Example 13

Suppose the joint distribution of X and Y is as given below.

                        X
              0      1      2      Marginal
                                   distribution of Y
     0       1/4    1/4    1/4     3/4
Y    1       1/8    0      0       1/8
     2       1/8    0      0       1/8

Marginal
distribution
of X         1/2    1/4    1/4

The grand total of the joint probabilities is one. The row totals form the marginal distribution of Y; the column totals form the marginal distribution of X.
If Y = 0 the conditional distribution of X is given by

P(0|0) = (1/4)/(3/4) = 1/3,
P(1|0) = (1/4)/(3/4) = 1/3,
P(2|0) = (1/4)/(3/4) = 1/3.

These conditional probabilities also add up to one. The other conditional distributions of both X and Y can be found in a similar fashion. Since the above conditional distribution is not the same as the marginal distribution of X, the two random variables are not independent.
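The marginal and conditional distributions can be extracted mechanically from the joint table; a minimal Python sketch using the probabilities given above, which also re-checks the independence test:

    from fractions import Fraction as F

    # Joint distribution P(x, y) from the table above.
    joint = {(0, 0): F(1, 4), (1, 0): F(1, 4), (2, 0): F(1, 4),
             (0, 1): F(1, 8), (1, 1): F(0),    (2, 1): F(0),
             (0, 2): F(1, 8), (1, 2): F(0),    (2, 2): F(0)}

    px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in range(3)}
    py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in range(3)}

    # Conditional distribution of X given Y = 0.
    cond = {x: joint[(x, 0)] / py[0] for x in range(3)}
    print(cond)   # each value is 1/3

    # Independence would require P(x, y) = PX(x)PY(y) everywhere; it fails here.
    print(joint[(0, 0)] == px[0] * py[0])   # False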
Exercises
1. Packets of food are filled automatically and the proportion of
packets in a very large batch which are underweight is p. A sample size
n is selected randomly from the batch and the probability that the
sample contains exactly r defective packets (r = 0, 1 , 2, . . . , n) follows
a certain probability distribution. Name this distribution and write
down the probability that the sample contains exactly r defective
packets.

For one particular process it has been found in the past that 2 per cent
of the packets are underweight. An inspector takes a random sample of
ten packets. Calculate

(a) the expected number of packets in the sample which are under­
weight,
(b) the probability that none of the packets in the sample are under­
weight.
(c) the probability that more than one of the packets in the sample is
underweight.

2. Compute and plot the binomial distributions when n = 4 for

(a) p = 1/4,
(b) p = 1/2,
(c) p = 3/4.

3. It has been found in the past that 4 per cent of the screws produced
in a certain factory are defective. A sample of ten is drawn randomly
from each hour’s production and the number of defectives is noted.
In what fraction of these hourly samples would there be at least two
defectives? What doubts would you have if a particular sample con­
tained six defectives?

4. One per cent of a certain type of car has a defective tail light. How
many cars must be inspected in order to have a better than even chance
of finding a defective tail light?

5. An electronic component is mass-produced and then tested unit by unit on an automatic testing machine which classifies the unit as 'good' or 'defective'. But there is a probability 0.1 that the machine will mis-classify the unit, so each component is in fact tested five times and regarded as good if so classified three or more times. What now is the probability of a mis-classification?

6. A recent court decision supported a pilot’s decision to continue to


his destination with a four-engine jet aircraft after an engine failure at
midrange on a two-hour flight, rather than land at the midway point.
His argument was that the aircraft will fly on two engines and that the
probability of two additional failures was quite small. If the one-hour reliability of a single engine is 0.9999, do you agree with the pilot's decision? If the aircraft was a three-engine jet that will fly on one engine would you agree? Comment on the comparative reliability of the three- and four-engine jets in this situation.

7. The new 'Tigercat' sports car has an idle loping problem: about 10 per cent of the Tigercats have an unstable fluctuating engine speed when they are idling. An engineering 'fix' is put in a production pilot lot of a hundred cars.
(a) If the fix has no effect on the problem, how many cars would you expect to have the fault?
(b) If only two of the pilot lot have the idling fault, and the other ninety-eight cars are not defective, would you conclude that the fix has a significant effect? (Hint: Show that the probability of getting two or fewer defectives in a sample size of a hundred, given that the fix has no effect, is very small indeed.)

8. The average number of calls that a hospital receives for an ambulance during any half-hour period is 0.3. Considering a reasonable cost per ambulance and crew, and presuming that any ambulance will return to the hospital in half an hour, how many ambulances would you recommend for this hospital? Comment on the idea of ambulance pools which are shared by several hospitals.

9. A manned interplanetary space vehicle has four engines each with reliability 0.99. Each engine has a failure detection system which may itself fail. If the engine does fail there is a conditional probability of 0.02 that a success will be signalled. If an engine fails and is not detected the result is catastrophic. However the mission can be completed with three engines if one engine fails and is detected. What is the probability of mission success? If there is no abort system or escape system, what action would you recommend if two failures are signalled? (Hint: Calculate the probability of no engine failures plus one detected failure.)

10. It has been found in the past that one per cent of electronic com­
ponents produced in a certain factory are defective. A sample of size one
hundred is drawn from each day’s production. What is the probability
of getting no defective components by using
(a) the binomial distribution,
(b) the Poisson distribution approximation.

11. A construction company has a large fleet of bulldozers. The average
number inoperative at a morning inspection due to breakdowns is two.
Two standby bulldozers are available. If a bulldozer can always be
mended within twenty-four hours of the morning inspection find the
probability that at any one inspection
(a) no standby bulldozers will be required,
(b) the number of standby bulldozers will be insufficient.

12. There are many other discrete distributions apart from the binomial and Poisson distributions. For example the hypergeometric distribution arises as follows. A sample size n is drawn without replacement from a finite population size N which contains Np successes and N(1 - p) failures. Show that the probability of getting exactly x successes in the sample is given by

P(x) = NpCx × N(1-p)C(n-x) / NCn   (x = 0, 1, 2, ..., minimum of n and Np).

If N is very large compared with n, it can be shown that this distribution can be approximated by the binomial distribution with

P(x) ≈ nCx p^x (1 - p)^(n-x)   (x = 0, 1, 2, ..., n).

13. A series of items is made by a certain manufacturing process. The probability that any item is defective is a constant p, which does not depend on the quality of previous items. Show that the probability that the rth item is the first defective is given by

P(r) = p(1 - p)^(r-1)   (r = 1, 2, ...).

This probability distribution is called the geometric (or Pascal) distribution. Show that the sum of the probabilities is equal to one as required; also show that the mean of the distribution is equal to 1/p.

Reference
Hoel, P. G. (1962), Introduction to Mathematical Statistics, 3rd edn, Wiley (4th edn published in 1971).

Chapter 5
Continuous distributions

5.1 Definitions
In Chapter 4 we considered discrete distributions where the random
variable can only take a discrete set of values. In this chapter we con­
sider continuous distributions where the random variable can take
any value in some specified interval.
In Section 2.2 we described how observations on a continuous variate can be plotted as a histogram. As more and more observations are taken, and the class interval is made smaller, the histogram tends to a smooth curve called a frequency curve.
If the height of the curve is standardized so that the area underneath it is equal to unity, then the graph is called a probability curve. The height of the probability curve at some point x is usually denoted by f(x), and this function is called the probability density function (often abbreviated p.d.f.). This non-negative function satisfies the condition that the area under the probability curve is unity:

∫ f(x) dx = 1   (integral from -∞ to +∞).

It is important to realize that f(x) is not the probability of observing x. When the variate is continuous we can only find the probability of observing a value in a certain range. For example, f(x0)Δx is the probability of observing a value between x0 and x0 + Δx. In other words it is the area of the shaded strip in Figure 18. Also note that the probability of observing a single value is zero.

Figure 18

More generally the area between the verticals at x1 and x2 gives the probability that an observation will fall between x1 and x2:

probability (x1 < X < x2) = ∫ f(x) dx   (integral from x1 to x2).
Figure 19

Example 1
A random variable is said to be uniformly distributed between a and b if it is equally likely to occur anywhere between a and b. Thus the height of the probability curve of the uniform distribution is a constant, C say, between a and b. Elsewhere the probability density function is zero.

Figure 20 P.D.F. of uniform distribution

The total area under the curve is equal to C(b - a). But this is equal to unity by definition. Thus we have

C = 1/(b - a),

so that

f(x) = 1/(b - a),   a < x < b,
     = 0,           elsewhere.

Another way of describing a probability distribution is to specify a


function called the cumulative distribution function (often abbreviated
c.d.f.). This function, usually denoted by F(x), is defined by
F(x) = probability (X ≤ x)
     = (probability of observing a value less than or equal to x).
This function is the theoretical counterpart of the cumulative frequency
diagram which was described in section 2.2. From a mathematical
point of view the cumulative distribution function is the best way of
describing a distribution since it can be used for both discrete and
continuous distributions. For this reason it is often simply called the
distribution function.

For discrete distributions it is a step function which increases from
zero to one. However, from a practical point of view, the function is
more useful for problems involving continuous variables, which are
the concern of this chapter. Since, for continuous distributions, the
probability of observing a single value is zero, it is immaterial whether
we write P(X ≤ x) or P(X < x) in the continuous case.

Figure 21

For such a distribution we find

F(x0) = ∫ f(x) dx   (integral from -∞ to x0)
      = (area under the probability curve to the left of x0),

and F(x2) - F(x1) = P(x1 < X ≤ x2) = P(x1 < X < x2).

The function must increase from zero to one since we have

F(-∞) = 0   and   F(+∞) = 1.

It is often S-shaped, as in Figure 22.

Example 2
Find the cumulative distribution function of the uniform distribution
(see Example 1). The random variable cannot take a value less than a.

Figure 23

Thus we have

F(x) = 0,   x < a.

In addition the random variable must always be less than or equal to b. Thus we have

F(x) = 1,   x > b.

For values of x between a and b we have

F(x) = (x - a)/(b - a).
Thus we have

F(x) = 0,                 x < a,
     = (x - a)/(b - a),   a ≤ x ≤ b,
     = 1,                 x > b.
This function is illustrated below.

The c.d.f. of a continuous distribution describes it just as completely as the p.d.f. Thus the two functions are complementary, and they can be obtained from one another using the relations

F(x0) = ∫ f(x) dx   (integral from -∞ to x0)

and

f(x) = dF(x)/dx.

(Mathematical note: F(x) may not be differentiable at isolated points and the p.d.f. is not defined at these points.)

5.2 The mean and variance of continuous distributions

By analogy with equation 4.1 the definition of the mean of a continuous distribution is given by

E(X) = ∫ x f(x) dx   (integral from -∞ to +∞).    5.1

This value is the average or expected value of the random variable in a long series of trials. The summation sign in equation 4.1 is replaced with the integral sign.
By analogy with equation 4.2 the definition of the variance of a continuous distribution is given by

variance (X) = E[(X - μ)²]   where μ = E(X)
             = ∫ (x - μ)² f(x) dx   (integral from -∞ to +∞).    5.2

Example 3
Find the mean and variance of the uniform distribution which has p.d.f.

f(x) = 1/2,   -1 < x < 1,
     = 0,     elsewhere.

By inspection the distribution is symmetric about the point x = 0, which must therefore be the mean of the distribution. This can be confirmed by equation 5.1, which gives

E(X) = ∫ x × 1/2 dx = 0   (integral from -1 to 1).

Using equation 5.2 the variance is given by

∫ x² × 1/2 dx = 1/3   (integral from -1 to 1).
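Both values are easy to confirm by simulation; a minimal Python sketch:

    import random

    xs = [random.uniform(-1, 1) for _ in range(100_000)]

    mean = sum(xs) / len(xs)
    var = sum((x - mean)**2 for x in xs) / len(xs)

    print(round(mean, 3))   # close to 0
    print(round(var, 3))    # close to 1/3 = 0.333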

5.3 The normal distribution


The normal or Gaussian distribution is the most important of all the
distributions since it has a wide range of practical applications. It is
sometimes called the bell-shaped distribution, a name which aptly
describes the characteristic shape of many distributions which occur



in practice. For example the histogram and frequency curve, which are
plotted in Figure 6, are symmetric and shaped roughly like a bell.
The normal distribution is a mathematical model which adequately
describes such distributions. The height of the normal probability curve
is given by

f(x) = [1/(σ√(2π))] exp[-(x - μ)²/(2σ²)],   -∞ < x < +∞,

where μ, σ are parameters such that -∞ < μ < ∞ and σ > 0.

Figure 25 The normal distribution

The reader need not remember this rather complicated formula. A graph of the normal curve is given in Figure 25.
The p.d.f. is negligible for values of x which are more than 3σ away from μ.
It is worth pointing out that this p.d.f. has been chosen so that the total area under the normal curve is equal to one for all values of μ and σ. This can be shown by making the transformation z = (x - μ)/σ and using the standard integral

∫ e^(-z²/2) dz = √(2π)   (integral from -∞ to +∞),

which is given in most tables of integrals.
The curve is symmetric about the point x = μ, which must therefore be the mean of the distribution. It can also be shown, using equation 5.2, that the variance of the normal distribution is equal to σ². Thus the standard deviation of the normal distribution is equal to σ.

Figure 26 Two normal distributions

The shape of the normal curve depends on the standard deviation, σ, since the larger this is, the more spread out the distribution will be. Figure 26 shows two normal distributions, both of which have mean zero. However one has σ = 1 and the other has σ = 3.
Whatever the values of μ and σ, the normal distribution is such that about one observation in three will lie more than one standard deviation from the mean, and about one observation in twenty will lie more than two standard deviations from the mean. Less than one observation in 300 will lie more than three standard deviations from the mean.
In practical problems, the height of the normal curve is of little direct interest, and instead interest centres on the cumulative distribution function. For example, if the weights of a batch of screws are known to be approximately normally distributed with mean 2.10 grams and standard deviation 0.15 grams, we might want to find the proportion of screws which weigh less than 2.55 grams. This can be found from the cumulative distribution function

F(x) = probability (X ≤ x)
     = ∫ [1/(σ√(2π))] exp[-(u - μ)²/(2σ²)] du   (integral from -∞ to x).

Unfortunately this integral cannot be evaluated as a simple function


of X. Instead it must be integrated numerically, and the results can
then be tabulated for different values of x. Table 1 in Appendix B
tabulates the c.d.f. of a normal distribution with zero mean and a
standard deviation of one. This particular normal distribution is often



called the standard normal distribution. It turns out that the c.d.f. of any other normal distribution can also be obtained from this table by making the transformation

Z = (X - μ)/σ,

where the random variable X has mean μ and standard deviation σ. It is easy to show that the standardized variable Z has zero mean and a standard deviation of one, as required.
For a particular value, x, of X, the corresponding value of the standardized variable is given by

z = (x - μ)/σ,

and this is the number of standard deviations by which x departs from


μ. The cumulative distribution function of X can now be found by using the relation

F(z) = probability (Z ≤ z)
     = probability ((X - μ)/σ ≤ z)
     = probability (X ≤ μ + zσ).
The full table of F{z) is given in Table 1, Appendix B. Some particularly
useful values are also given in Table 8.

Table 8

z      0     1     2      2.5    3.0

F(z)   0.5   0.84  0.977  0.994  0.9987

Table 1, Appendix B, only tabulates F(z) for positive values of z.


Values of F(z) when z is negative can be found by using the symmetry
of the distribution. The two shaded areas in Figure 27 are equal in area:

probability (Z ≤ −z₀) = probability (Z > +z₀)
                      = 1 − probability (Z ≤ +z₀).

Therefore F(−z₀) = 1 − F(+z₀).

5.3.1 Notation
If a random variable, X, is normally distributed with mean µ and
variance σ², we write

X is N(µ, σ²).

The cumulative distribution function of the standard normal distribution,
N(0, 1), is often denoted by the special notation Φ(z) to distinguish
it from the c.d.f. of any other normal distribution. However in this
book we will use the notation F(z) on the understanding that the c.d.f.
of a non-standard normal distribution will be described by
probability (X ≤ x) and not F(x).

Example 4
Find the probability that an observation from a normal distribution
will be more than two standard deviations from the mean.
If X is N(µ, σ²) then we want to find

probability (X > µ + 2σ) + probability (X < µ − 2σ).

By making the transformation Z = (X − µ)/σ, it can be seen that this
probability is the same as

probability (Z > 2) + probability (Z < −2).

But from Table 8 we have

F(2) = probability (Z ≤ 2) = 0.977,

therefore probability (Z > 2) = 0.023.
By symmetry (see Figure 27), this is equal to probability (Z < −2).

Thus
probability (|Z| > 2) = 2 × 0.023 = 0.046.
The probability that an observation from a normal distribution will be
more than two standard deviations from the mean is slightly less than
one in twenty.

Example 5
The individual weights of a batch of screws are normally distributed
with mean µ = 2.10 grams and standard deviation σ = 0.15 grams.
What proportion of the screws weigh more than 2.55 grams?
In order to evaluate probability (X > 2.55) we make the transformation
Z = (X − µ)/σ. This gives

probability (X > 2.55) = probability [(X − 2.10)/0.15 > (2.55 − 2.10)/0.15]
                       = probability (Z > 3)
                       = 1 − probability (Z ≤ 3)
                       = 1 − F(3)
                       = 1 − 0.9987 (from Table 8)
                       = 0.0013.

Thus 0.13 per cent of the screws weigh more than 2.55 grams.
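The calculations in Examples 4 and 5 are easily checked by computer. The following minimal sketch (not part of the original text) uses only the Python standard library and the identity F(z) = ½[1 + erf(z/√2)]; the helper name phi is our own.

from math import erf, sqrt

def phi(z):
    # F(z), the c.d.f. of the standard normal distribution
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Example 4: P(|Z| > 2)
print(2 * (1 - phi(2.0)))        # about 0.046

# Example 5: P(X > 2.55) when X is N(2.10, 0.15 squared)
z = (2.55 - 2.10) / 0.15         # standardized value, here 3
print(1 - phi(z))                # about 0.0013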

5.4 Uses of the normal distribution


(1) Many physical measurements are closely approximated by the
normal distribution.
Generally speaking such measurements are of two types. Firstly,
those in which the variation in the data is caused by observational
error. If the error in measuring some unknown quantity is the sum of
several small errors which may be positive or negative at random, then
the normal distribution will usually apply. Secondly, measurements
in which there is natural variation. For example, some biological
measurements, such as the heights of different men, are approximately
normally distributed.

Indeed non-normality is so rare that it is a useful clue when it does
occur. Particular care is required when describing non-normal
distributions, especially if the distribution is skewed.
(2) Some physical phenomena are not normally distributed but can
be transformed to normality. For example the fatigue lives of a batch
of electrical motors give a distribution which is skewed to the right.
However if log(fatigue life) is plotted, then the distribution will be
closer to a normal distribution.
(3) It can be shown that the normal distribution is a good approxima­
tion to the binomial distribution for large n, provided that p is not close
to 0 or 1. (For example if n > 20 the approximation is valid for
0.3 < p < 0.7. For larger values of n a wider range for p is permissible.)
For a given binomial distribution, the corresponding normal distribu­
tion is found by putting
µ = np and σ² = np(1 − p).

This approximation is useful as the binomial distribution can be
difficult to evaluate for large n (see Example 7). Thus to evaluate the
binomial distribution
(a) if n is large and p is close to 0, use the Poisson approximation with
µ = np,
(b) if n is large and p is not close to 0 or 1, use the normal approxima­
tion as above,
(c) if n is small, then simply evaluate the binomial distribution.
(4) The normal approximation to the binomial distribution is a special
case of the central limit theorem, which will be considered again in
Chapter 6. Briefly this theorem says that if a series of samples, of
size n, are taken from a population (not necessarily normal) with
mean p and standard deviation a, then the sample means will form a
distribution which tends to the normal distribution as n increases,
whatever the population distribution. In Chapter 6 we will see that the
distribution of sample means also has mean p, but has a smaller
standard deviation given by σ/√n.
(5) Any random variable formed by taking a linear combination of
independent normally distributed random variables will itself be
normally distributed. This property can be applied, for example, to
manufactured items made up of components having normally dis­
tributed dimensions (see section 9.3.1).

Example 6
The strengths of individual bars made by a certain manufacturing
process are known to be approximately normally distributed with
mean 24 and standard deviation 3. The consumer requires at least
95 per cent of the bars to be stronger than 20. Do the bars meet the
consumer’s specifications?
Let X = strength of a bar. In order to calculate probability (X > 20),
we standardize in the usual way to obtain

P(X > 20) = P[(X − 24)/3 > (20 − 24)/3]
          = P[(X − 24)/3 > −1.33].

The random variable (X − 24)/3 is approximately N(0, 1) so that this
probability can be obtained from Table 1, Appendix B. We find

P(X > 20) = 0.91.
Thus less than 95 per cent of the bars are stronger than 20 and so the
bars do not meet the consumer’s specifications.

Example 7
A die is tossed 120 times. Find the probability that a 'four' will turn
up less than fifteen times.
Let X = number of 'fours' which turn up. This random variable will
follow a binomial distribution with parameters n = 120 and p = 1/6.

Thus P(X < 15) = ∑_{r=0}^{14} ¹²⁰Cᵣ (1/6)ʳ (5/6)^(120−r),

which is very tedious to sum.
However, as n is large and p = 1/6, we can use the normal approximation
with

µ = np = 20,
σ² = np(1 − p) = 16.67,
σ = 4.08.

Now for the discrete binomial distribution we have
P(X ≤ 14) = P(X < 15),

which is not of course true for the continuous normal approximation.
We compromise by finding P(X < 14½) with the normal approximation.
(This is often called the continuity correction.)

P(X < 14.5) = P[Z < (14.5 − 20)/4.08]
            = P(Z < −1.35)
            = F(−1.35)
            = 1 − F(+1.35)
            = 0.0885 from Table 1, Appendix B.
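As a check on the continuity correction, the exact binomial sum of Example 7 can be compared with the normal approximation. This is only an illustrative sketch using the Python standard library; the variable names are ours.

from math import comb, erf, sqrt

n, p = 120, 1/6

# Exact value of P(X <= 14) summed from the binomial distribution
exact = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(15))

# Normal approximation with the continuity correction, P(X < 14.5)
mu, sigma = n * p, sqrt(n * p * (1 - p))     # 20 and 4.08
z = (14.5 - mu) / sigma                      # about -1.35
approx = 0.5 * (1 + erf(z / sqrt(2)))

print(exact, approx)   # the two values agree to about two decimal places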

5.5 Normal probability paper


The cumulative distribution function of the standard normal distribu­
tion is illustrated in Figure 28.

It is possible to make a non-linear transformation of the vertical
scale so that F(z) will plot as a straight line.
Graph paper with scales such as those shown in Figure 29 is called
normal probability paper and is used in the following way. Suppose we
have a series of n observations which we suspect is normally distributed.

Figure 29 Normal probability paper showing the c.d.f. of the standard normal
distribution

Arrange them in order of magnitude so that

x₁ ≤ x₂ ≤ ... ≤ xₙ.

At each point calculate the observed cumulative frequency

P(xᵢ) = (number of observations ≤ xᵢ)/(n + 1) = i/(n + 1).

Note that there is a mathematical reason for putting (n+ 1) rather than
n in the denominator. This also makes it possible to plot the point
P(xₙ) = n/(n + 1) on the graph.
After choosing a suitable scale for the variate, x, the values of
P(xᵢ), (i = 1 to n), can be plotted on the graph paper. If the data really
is normal, the points will lie approximately on a straight line. Con­
versely if the data is not normal, as for example if the distribution is
skewed, then the points will not lie on a straight line.
If the data does appear to be normal, the mean of the distribution
can be estimated by fitting a straight line to the data and finding the
value of X whose estimated cumulative frequency is 50 per cent. The
standard deviation of the distribution can be estimated by finding the
difference between the two values of x whose estimated cumulative
frequencies are 50 per cent and 84 per cent respectively.

Example 8
The cumulative frequencies of the data from Example 1, Chapter 1,
are given in Table 9.

Table 9

i    xᵢ      P(xᵢ)    i    xᵢ       P(xᵢ)    i    xᵢ       P(xᵢ)

1    989.4   0.032    11   998.6    0.355    21   1002.9   0.678
2    989.7   0.064    12   999.1    0.386    22   1003.1   0.710
3    992.8   0.097    13   999.2    0.418    23   1003.2   0.742
4    993.4   0.129    14   999.3    0.450    24   1004.5   0.774
5    994.7   0.161    15   1000.2   0.483    25   1006.5   0.805
6    995.3   0.193    16   1000.3   0.516    26   1006.7   0.838
7    996.4   0.225    17   1000.9   0.549    27   1007.6   0.870
8    996.5   0.257    18   1001.8   0.581    28   1008.7   0.902
9    997.9   0.290    19   1002.1   0.613    29   1012.3   0.934
10   998.1   0.322    20   1002.6   0.645    30   1014.5   0.968

These values are plotted on normal probability paper in Figure 30.


The points lie roughly on a straight line and so the data is approximately
normally distributed.
A straight line was fitted to the data by eye. From this line it can be
seen that the value of x whose cumulative frequency is 50 per cent is
1000.6, and the difference between the two values of x whose cumulative
frequencies are 50 per cent and 84 per cent is 6.2. These values, which
are estimates of the mean and standard deviation of the distribution,
are very close to the sample mean and sample standard deviation as
calculated in Chapter 2.

Figure 30 The data of Table 9 plotted on normal probability paper (horizontal axis: thrust)
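Normal probability paper is seldom to hand nowadays; an equivalent computation plots the ordered observations against F⁻¹(i/(n+1)) and fits a straight line. The sketch below assumes SciPy is available and replaces the by-eye line of Figure 30 with a least-squares fit; the intercept and slope estimate the mean and standard deviation.

from scipy.stats import norm

# Ordered thrust data of Table 9
x = [989.4, 989.7, 992.8, 993.4, 994.7, 995.3, 996.4, 996.5, 997.9,
     998.1, 998.6, 999.1, 999.2, 999.3, 1000.2, 1000.3, 1000.9, 1001.8,
     1002.1, 1002.6, 1002.9, 1003.1, 1003.2, 1004.5, 1006.5, 1006.7,
     1007.6, 1008.7, 1012.3, 1014.5]
n = len(x)

# Plotting positions z_i = F^{-1}(i/(n+1)); normal data lie near a line
z = [norm.ppf(i / (n + 1)) for i in range(1, n + 1)]

# Least-squares line x = a + b*z; a estimates the mean, b the standard deviation
zbar, xbar = sum(z) / n, sum(x) / n
b = (sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
     / sum((zi - zbar) ** 2 for zi in z))
a = xbar - b * zbar
print(a, b)   # should be close to the graphical estimates 1000.6 and 6.2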

5.6 The exponential distribution


This is another useful continuous distribution. Its probability density
function is given by

f(x) = λ e^(−λx),  x ≥ 0, where the parameter λ > 0,
f(x) = 0,          x < 0.

A typical example of an exponential distribution is given in Figure 31.

Figure 31 P.D.F. of exponential distribution

It is easy to show that the total area under the exponential curve is
equal to unity. The probability of observing a value between a and b
is given by

∫_a^b λ e^(−λx) dx = e^(−λa) − e^(−λb).

The cumulative distribution function of the exponential distribution
is given by

F(x) = 1 − e^(−λx),  x ≥ 0,
F(x) = 0,            x < 0.

5.6.1 Derivation from the Poisson process


The exponential distribution can be obtained from the Poisson process
(see section 4.6) by considering ‘failure times’ rather than the number
of accidents (failures). If accidents happen at a constant rate λ per unit
time, then the probability of observing no accidents in a given time t is
given by e^(−λt).
Consider the random variable, T, which is the time that elapses after
any time-instant until the next accident occurs. This will be greater than
a particular value t provided that no accidents occur up to this point.

Thus probability (T > t) = e^(−λt),
therefore probability (T ≤ t) = 1 − e^(−λt).
This is the cumulative distribution function of the random variable
T and we recognize that it is the c.d.f. of the exponential distribution.
Thus times between accidents are distributed exponentially.

5.6.2 Mean and variance


If accidents happen at an average rate of λ per unit time then the
average time between accidents is given by 1/λ. This is the mean of the
exponential distribution. It can also be obtained by calculating

E(X) = ∫₀^∞ x λ e^(−λx) dx = 1/λ.

The variance of the exponential distribution is given by

variance (X) = E[(X − 1/λ)²]
             = ∫₀^∞ (x − 1/λ)² λ e^(−λx) dx
             = 1/λ²  (after some algebra).

Example 9
The lifetime of a certain electronic component is known to be exponentially
distributed with a mean life of 100 hours. What proportion of
such components will fail before 50 hours?
As the mean life is 100 hours, we have λ = 1/100.
Thus the c.d.f. of the distribution of lifetimes is given by

F(t) = 1 − e^(−t/100).

The probability that a component will fail before 50 hours is given by

F(50) = 1 − e^(−50/100)
      = 0.393.

Thus the proportion of such components which fail before 50 hours
is about 39 per cent.
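A one-line check of Example 9, using the exponential c.d.f. derived above (Python standard library; illustrative only):

from math import exp

lam = 1 / 100                    # failure rate for a mean life of 100 hours
F50 = 1 - exp(-lam * 50)         # F(50) = 1 - e^(-50/100)
print(F50)                       # about 0.393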

5.6.3 Applications
The exponential distribution is sometimes used to describe the distribu­
tion of failure times when studying the reliability of a product. However
the reader is warned that many products are such that the longer they
survive the more likely they are to fail, in which case the exponential
distribution will not apply. An alternative continuous distribution, called
the Weibull distribution, will be introduced in Chapter 13.
The exponential distribution is also useful in the study of queueing
theory, and a brief introduction to this subject is given here. Many
situations can be described as a queueing process; this makes it
possible to tackle problems in areas such as production engineering
and road traffic flow. Two familiar examples of queues are customers
making purchases at a shop, and calls arriving at a telephone exchange
which only has a limited number of lines. A typical industrial situation
occurs when one or more mechanics are available to repair machine
breakdowns. The mechanic can be thought of as the server and a dis­
abled machine as a customer. At a given time there may be several
machines being repaired or waiting to be repaired, and these machines
form the queue.
The simplest type of queue is a single-server queue, in which
'customers’ arrive randomly at a single service station and receive
attention as soon as all previous arrivals have been serviced. As
customers arrive randomly, the number of arrivals in a given time-
interval is a Poisson random variable. If the average arrival rate is
denoted by λ, and Δt is a very small time interval, we have

probability (new arrival between t and t + Δt) = λΔt,

and this is independent of previous arrivals.
The service time is occasionally constant, but more commonly it is
a random variable and several distributions have been proposed to
describe the distribution of service times. It is often found that the
service times are approximately exponentially distributed. Denoting
the exponential parameter by µ, the average service time is equal to
1/µ. The number of service completions in a given time interval is then
a Poisson random variable, and so the probability that a service
terminates between t and t + Δt is given by µΔt, which is independent of
previous service completions.

Let Pᵣ(t) denote the probability that there are r customers in the
queue at time t (including the one being serviced). These probabilities
will depend on the initial length of the queue at time t = 0. In order
to calculate them it is necessary to solve a series of differential equations
which are obtained as follows. The queue is empty at time t + Δt, either
if it was empty at time t and there were no arrivals in Δt, or if the queue
contained one customer at time t and there was a service completion
in Δt. Using the fact that events in different time periods are independent
and ignoring terms involving (Δt)², we have

P₀(t + Δt) = P₀(t) × probability (no arrivals in Δt)
           + P₁(t) × probability (service completion in Δt)
           = P₀(t)(1 − λΔt) + P₁(t)µΔt.

This equation can be rearranged to give

[P₀(t + Δt) − P₀(t)]/Δt = −λP₀(t) + µP₁(t),

where the left hand side can be recognized as the definition of
dP₀(t)/dt as Δt → 0.

By a similar argument we have

dPᵣ(t)/dt = λP_{r−1}(t) − (λ + µ)Pᵣ(t) + µP_{r+1}(t)   (r = 1, 2, ...).

Although the general solution of this set of differential equations
may be found, it is more important to find how the solution behaves as
t → ∞; in other words to find what is called the limiting distribution of
the queue length. In fact, provided that λ < µ, it is easy to show that
this limiting distribution does exist and does not depend on the initial
queue length. Then dPᵣ(t)/dt → 0 as t → ∞ and we find

Pᵣ = lim_{t→∞} Pᵣ(t) = (1 − λ/µ)(λ/µ)ʳ   (r = 0, 1, 2, ...).

This is an example of a geometric distribution (see exercise 13,

Chapter 4). The ratio λ/µ, which must be less than one to give a stable
solution, is often called the traffic intensity. If λ/µ > 1, the queue will
tend to get longer and longer.
The above result enables us to calculate such useful characteristics
of the system as the average queue size, the probability that the server
is busy and the average waiting time of a customer. For example, the
average queue size is given by

∑_{r=0}^∞ r Pᵣ = (λ/µ)/(1 − λ/µ) = λ/(µ − λ),

and the probability that the server is busy is given by

1 − P₀ = λ/µ.
5.7 Bivariate continuous distributions


As in section 4.7, it is a straightforward matter to extend the ideas of
a single continuous random variable to the situation in which we are
interested in two random variables, X and Y, at the same time.
The joint probability density function, f(x, y), is given by the following
relationship:

probability (x < X < x + Δx, y < Y < y + Δy) = f(x, y) Δx Δy.

This function is such that

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.

The probability density function of one random variable without
regard to the value of the other random variable is called the marginal
probability density function. Thus

f_X(x) = ∫_{−∞}^{∞} f(x, y) dy,

f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx

are the marginal probability density functions of the two random
variables.
The two random variables are said to be independent if

f(x, y) = f_X(x) f_Y(y)   for all x, y.

Exercises
1. Find the area under the standard normal curve (µ = 0, σ = 1)
(a) outside the interval (−1, +1),
(b) between −0.5 and +0.5,
(c) to the right of 1.8.

2. If X is N(3, σ² = 4) find
(a) P(X < 3),
(b) P(X ≤ 5),
(c) P(X ≤ 1).

3. The mean weight of 500 students at a certain college is 150 lb and the
standard deviation 15 lb. Assuming that the weights are approximately
normally distributed, estimate the proportion of students who weigh
(a) between 120 and 180 lb,
(b) more than 180 lb.

4. The lengths of a batch of steel rods are approximately normally
distributed with mean 3.1 ft and standard deviation 0.15 ft. Estimate
the proportion of rods which are longer than 3.42 ft.

5. If X is normally distributed with µ = 10 and σ = 2, find numbers
x₀, x₁ such that
(a) P(X > x₀) = 0.05,
(b) P(X > x₁) = 0.01.
Also find k such that
P(|X − µ| > k) = 0.05.

6. Suppose that the lifetimes of a batch of radio components are known
to be approximately normally distributed with mean 500 hours and
standard deviation 50 hours. A purchaser requires at least 95 per cent
of them to have a lifetime greater than 400 hours. Will the batch meet
the purchaser’s specifications?

7. An examination paper consists of twenty questions in each of which
the candidate is required to tick as correct one of the three possible
answers. Assume that a candidate’s knowledge about any question
may be represented as either (i) complete ignorance in which case he
ticks at random or, (ii) complete knowledge in which case he ticks the
correct answer.
How many questions should a candidate be required to answer
correctly if not more than 1 per cent of candidates who do not know
the answer to any question are to be allowed to pass? (Use the normal
approximation to the binomial distribution.)
8. The lengths of bolts produced in a certain factory may be taken to be
normally distributed. The bolts are checked on two ‘go-no go’ gauges
so that those shorter than 2.983 in. or longer than 3.021 in. are rejected
as 'too-short' or 'too-long' respectively.
A random sample of N = 300 bolts is checked. If they have mean
length 3.007 in. and standard deviation 0.011 in., what values would
you expect for n₁, the number of 'too-short' bolts, and n₂, the number of
'too-long' bolts?
A sample of N = 600 bolts from another factory is also checked. If
for this sample we find n₁ = 20 and n₂ = 15, find estimates for the
mean and standard deviation of the length of these bolts.

Chapter 6
Estimation

In Chapters 4 and 5 a number of discrete and continuous distributions
were studied. The importance of these theoretical distributions is
that many physical situations can be described, at least approximately,
by a mathematical model based on one of these distributions. A
difficulty many students have at first is that of recognizing the appro­
priate model for a particular situation. This can only come with
practice and the student is recommended to try as many exercises as
possible.
We will now turn our attention to the major problem of statistical
inference which is concerned with getting information from a sample
of data about the population from which the sample is drawn, and in
setting up a mathematical model to describe this population.
The first step in this procedure is to specify the type of mathematical
model to be employed. This step is vital since any deductions will
depend upon the validity of the model. The choice of the model depends
upon a number of considerations, including any relevant theoretical
facts, any prior knowledge the experimenter has, and also a preliminary
examination of the data. The model often assumes that the population
of possible observations can be described by a particular theoretical
distribution and we are now in a position to consider such models.
Statistical inference can be divided into two closely related types of
problems; the estimation of the unknown parameters of the mathe­
matical model and the testing of hypotheses about the mathematical
model. The second of these problems will be considered in Chapter 7.

6.1 Point and interval estimates


It is most important to distinguish between the true population
parameters and the sample estimates. For example, suppose that a
sample of n observations x₁, x₂, ..., xₙ, has a symmetric distribution
similar to the distribution of the data in Example 1, Chapter 1. Then
a suitable mathematical model is that each observation is randomly
selected from a normal distribution of mean µ and standard deviation
σ. Then the sample mean x̄ is an intuitive estimate of the population
mean µ and the sample standard deviation,

s = √[∑(xᵢ − x̄)²/(n − 1)],

is an intuitive estimate of the population standard deviation σ.


There are two types of estimates in common use. An estimate of a
population parameter expressed by a single number is called a point
estimate. Thus in the above example x̄ is a point estimate of µ.
However, a point estimate gives no idea of the precision of the
estimate. For example, it does not tell us the largest discrepancy
between x̄ and µ which is likely to occur. Thus it is often preferable to
give an estimate expressed by two numbers between which the
population parameter is confidently expected to lie. This is called an interval
estimate.
Before discussing methods of finding point and interval estimates, we
must decide what we mean by a ‘good’ estimate and for this we need to
understand the idea of a sampling distribution.
Given a set of data, x₁, x₂, ..., xₙ, a variety of statistics can be
calculated such as x̄ and s². If we now take another sample of similar
size from the same population then slightly different values of x̄ and
s² will result. In fact if repeated samples are taken from the same
population then it is convenient to regard the statistic of interest as a
random variable and its distribution is called a sampling distribution.
Strictly speaking the distribution of any random variable is a sampling
distribution, but the term is usually reserved for the distribution of
statistics like x̄ and s².

6.2 Properties of the expected value


In order to find the sampling distribution of a statistic like x̄, we shall
have to extend our knowledge of the expected value of a random vari­
able. This concept was introduced in Chapter 4 in order to calculate the
theoretical mean and variance of a given distribution, but it is also
useful for carrying out many other operations.
We begin by repeating the formal definition of the expected value
of a simple random variable. If the discrete random variable X can
take values x₁, x₂, ..., x_N, with probabilities P₁, P₂, ..., P_N, then the
mean or expected value of the random variable is defined as

E(X) = ∑_{i=1}^{N} xᵢPᵢ   (see section 4.5).

The corresponding definition for a continuous random variable is
given by

E(X) = ∫_{−∞}^{∞} x f(x) dx   (see section 5.2).

More generally we are interested in the expected value of random
variables related to X. For example, we might want to know the
expected value of 2X or X². The variance of X is of particular
importance. This is given by

variance (X) = E[(X − µ)²]   where µ = E(X).

These expected values can be found from the following general
definition. If g(X) is a function of the random variable, X, then the expected
value or expectation of g(X) is given by

E[g(X)] = ∑ g(xᵢ)Pᵢ   for the discrete case,

E[g(X)] = ∫_{−∞}^{∞} g(x)f(x) dx   for the continuous case.
The idea of expectation can also be extended to apply to problems
involving more than one random variable. For example, we might
want to calculate the expected value of X + Y, where X and Y are two
random variables. If g(X, Y) is a function of the random variables,
X and Y, then the expected value or expectation of g(X, Y) is given by

E[g(X, Y)] = ∑_{x,y} g(x, y)P(x, y)   for the discrete case,

E[g(X, Y)] = ∫∫ g(x, y)f(x, y) dx dy   for the continuous case.

In the discrete case P(x, y) is the joint probability that X is equal to x
and Y is equal to y, and in the continuous case f(x, y) is the joint
probability density function. The summation (or integral) is taken over
all possible combinations of x and y.
In proving the following results we only consider the discrete case.
The proof in the continuous case follows immediately. The quantities
b and c are constants.
6.2.1 If X is any random variable then

E(bX + c) = bE(X) + c.   6.1

Proof

E(bX + c) = ∑_{i=1}^{N} (bxᵢ + c)Pᵢ
          = b ∑_{i=1}^{N} xᵢPᵢ + c ∑_{i=1}^{N} Pᵢ
          = bE(X) + c.

6.2.2 If X and Y are any two random variables then

E(X + Y) = E(X) + E(Y).   6.2

Proof

E(X + Y) = ∑_{x,y} (x + y)P(x, y)
         = ∑_{x,y} xP(x, y) + ∑_{x,y} yP(x, y)
         = ∑_x x[∑_y P(x, y)] + ∑_y y[∑_x P(x, y)]
         = ∑_x xP_X(x) + ∑_y yP_Y(y),

where P_X(x), P_Y(y) are the marginal probabilities of x and y respectively
(see section 4.7).
This completes the proof as

∑ xP_X(x) = E(X) and ∑ yP_Y(y) = E(Y).

This result can be extended to any number of random variables. We
find

E(X₁ + X₂ + ... + X_k) = E(X₁) + ... + E(X_k).

6.2.3 If X is any random variable then

variance (cX) = c² variance (X).   6.3

Proof. Let E(X) = µ, so that E(cX) = cµ by 6.1. Then

variance (cX) = E[(cX − cµ)²]
              = E[c²(X − µ)²]
              = c²E[(X − µ)²]
              = c² variance (X).

6.2.4 If X and Y are independent random variables, then


E(XY) = E(X)E(Y).   6.4

Proof

E(XY) = ∑_{x,y} xyP(x, y).

But X, Y are independent, so P(x, y) can be factorized to give
P(x, y) = P_X(x)P_Y(y). Thus

E(XY) = ∑_{x,y} xyP_X(x)P_Y(y)
      = [∑_x xP_X(x)][∑_y yP_Y(y)]
      = E(X)E(Y).

6.2.5 If X and Y are independent random variables, then

variance (X + Y) = variance (X) + variance (Y).   6.5

This result is one of the many reasons why statisticians prefer the
variance (or its square root, the standard deviation) as a measure of the
spread of a distribution in most situations.

Proof. Let E(X) = µ₁ and E(Y) = µ₂. Then

E(X + Y) = µ₁ + µ₂   using 6.2.

Variance (X + Y) = E[(X + Y − µ₁ − µ₂)²]
                 = E[(X − µ₁)²] + E[(Y − µ₂)²] + 2E[(X − µ₁)(Y − µ₂)].

But the third term in this expression is given by

2E[(X − µ₁)(Y − µ₂)] = 2{E(XY) − µ₁E(Y) − µ₂E(X) + µ₁µ₂}
                     = 2{E(X)E(Y) − µ₁µ₂ − µ₂µ₁ + µ₁µ₂}   using 6.4
                     = 0.

Thus variance (X + Y) = variance (X) + variance (Y).
This result can be extended to any number of independent random
variables. We find

variance (X₁ + X₂ + ... + X_k) = variance (X₁) + ... + variance (X_k).

Note particularly that 6.2.4 and 6.2.5 only apply to independent
random variables but that the other results apply to any random
variables.
The following results are also useful. The proofs are left to the reader.

Variance (X) = E(X²) − [E(X)]².   6.6

Variance (X + c) = variance (X).   6.7
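Results 6.5 and 6.6 can be verified empirically by simulation. The following sketch (Python standard library; the choice of distributions and seed is arbitrary) draws two independent random variables and compares both sides of each identity.

import random

random.seed(1)
N = 100_000

# X uniform on the integers 0..9, Y exponential; X and Y independent
xs = [random.randint(0, 9) for _ in range(N)]
ys = [random.expovariate(0.5) for _ in range(N)]

def var(v):
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v) / len(v)

# 6.5: variance(X + Y) = variance(X) + variance(Y)
print(var([a + b for a, b in zip(xs, ys)]), var(xs) + var(ys))

# 6.6: variance(X) = E(X^2) - [E(X)]^2
print(var(xs), sum(a * a for a in xs) / N - (sum(xs) / N) ** 2)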

6.3 The sampling distribution of x̄


Given n observations, their sample mean, x̄, is the intuitive estimate of
the underlying population mean. How accurate is this estimate? What
are its properties? To answer these questions we consider what would
happen if we were to take repeated samples of the same size from the
same population. The sample means would vary somewhat from
sample to sample and form what is called the sampling distribution of
x̄. Of course, in practice we normally take just one sample, and the
properties of the sampling distribution of x̄ are found, not by taking a
real or hypothetical series of samples, but by using the theoretical
properties of the expected value as demonstrated below. In this
derivation, the quantity x̄ is regarded, not as a particular value, but as
a random variable. In previous sections we have typically denoted a
random variable by (capital) X, and a particular value of it by (lower
case) x. Here it is convenient to denote the sample mean by x̄ whether
or not it is a random variable and the reader should bear this in mind
throughout this section.
We now state the following important theorem.

Theorem 1
If random samples, size n, were to be taken from a distribution with
mean µ and standard deviation σ, then the sample means would form a
distribution having the same mean µ but with a smaller standard
deviation given by σ/√n.
The quantity σ/√n, which is the standard deviation of the sampling
distribution of x̄, is often called the standard error of x̄ to distinguish it
from the standard deviation, σ, of the original distribution. As n
increases, σ/√n decreases and this confirms the intuitive idea that the
more observations taken, the more accurate will be the sample mean.

Proof
(a) The mean. We want to find the average value of the sample mean, x̄,
in a long series of trials. It is convenient to use the concept of the
expected value. By definition we know that the expected value of any
observation is equal to the population mean µ. Thus

E(x̄) = E[(x₁ + x₂ + ... + xₙ)/n]
     = E(x₁/n) + ... + E(xₙ/n)   using 6.2
     = µ/n + ... + µ/n           using 6.1
     = µ.

Thus the mean value of the sampling distribution of x̄ is equal to µ.
Note that when we write E(x̄) or E(xᵢ), the quantity in the brackets is
a random variable and not a particular sample value.
(b) The standard error.

Variance (x̄) = variance [(x₁ + x₂ + ... + xₙ)/n]
             = (1/n²) variance (∑ xᵢ)   using 6.3
             = (1/n²)[variance (x₁) + ... + variance (xₙ)]   using 6.5,

since successive observations are independent.
But variance (xᵢ) = σ² for all i. Thus

variance (x̄) = (1/n²) · nσ² = σ²/n.

Thus the standard deviation or standard error of x̄ is equal to σ/√n.


This completes the proof.

Example 1
The percentage of copper in a certain chemical is to be estimated by
taking a series of measurements on small random quantities of the
chemical and using the sample mean percentage to estimate the true
percentage. From previous experience individual measurements of this
type are known to have no systematic error and to have a standard
deviation of 2 per cent. How many measurements must be made so that
the standard error of the estimated percentage is less than 0.6 per cent?
Assume that n measurements are made. The standard error of the
sample mean will then be 2/√n per cent. If the required precision is
achieved we must have

2/√n < 0.6,

giving n > 11.1.
As n must be an integer, at least twelve measurements must be made to
achieve the required precision.
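The sample-size calculation of Example 1 amounts to rounding (σ/0.6)² up to the next integer, as this short Python sketch shows:

from math import ceil

sigma, target = 2.0, 0.6
# sigma/sqrt(n) < target  is equivalent to  n > (sigma/target)**2 = 11.1
print(ceil((sigma / target) ** 2))    # 12 measurements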

The student must remember Theorem 1, as it is one of the most
important results in statistics. It is worth emphasizing that this theorem
is independent of the parent distribution and holds whether it is
normal, Poisson or any other. The question which now arises is what
type of distribution will the sample mean follow? In Theorem 2 we
consider the situation in which the parent distribution is normal.

Theorem 2
If random samples, size n, are taken from a normal distribution, the
sampling distribution of x̄ will also be normal, with the same mean and
a standard error given by σ/√n.
What happens if the parent population is not normal? The rather
surprising result is that provided reasonably large samples are taken

(for example, n > 30), the sampling distribution of x̄ will be approxi-
mately normal whatever the distribution of the parent population. This
remarkable result known as the central limit theorem has already been
mentioned in section 5.4.

Central limit theorem


If random samples, size n, are taken from a distribution with mean µ
and standard deviation σ, the sampling distribution of x̄ will be
approximately normal with mean µ and standard deviation σ/√n, the
approximation improving as n increases.
The proofs of both Theorem 2 and the central limit theorem are
beyond the scope of this book, but can be found in books on
probability or mathematical statistics (see Appendix C).

Example 2
The diameters of shafts made by a certain manufacturing process are
known to be normally distributed with mean 2.500 cm and standard
deviation 0.009 cm. What is the distribution of the sample mean of
nine such diameters selected at random? Calculate the proportion of
such sample means which can be expected to exceed 2.505 cm.
From Theorem 2 the sampling distribution of x̄ will also be normal
with the same mean 2.500 cm but with a standard deviation (or standard
error) equal to 0.009/√9 = 0.003 cm.
In order to calculate probability (x̄ > 2.505) we standardize in the
usual way to obtain

probability [(x̄ − 2.500)/0.003 > (2.505 − 2.500)/0.003]
  = probability [(x̄ − 2.500)/0.003 > 1.66].

The random variable (x̄ − 2.500)/0.003 will be N(0, 1) so that the
required probability can be obtained from Table 1, Appendix B. The
proportion of sample means which can be expected to exceed 2.505 cm
is 0.048.

Example 3
The sampling distribution of x̄ is illustrated by constructing a distribu-
tion from successive samples of ten random numbers. The mean of

each sample is calculated and the frequency distribution of the sample
means is obtained.
In this example, a random number is such that it is equally likely
to be any integer between zero and nine. Thus the population proba­
bility distribution is given by
Pᵣ = 1/10   (r = 0, 1, ..., 8, 9).
This distribution is sometimes called the discrete uniform distribution.
Random numbers of this type have been tabulated in a number of
books, or can easily be generated on a computer. Alternatively they
can be generated with the simple device shown in Figure 32.

Figure 32 Random number generator

The circumference of the circle is divided into ten equal sections,
numbered 0 to 9, so that when the arrow is spun it is equally likely to
come to rest pointing at any one of the sections.
The arrow was spun ten times and the resulting values were 0, 8, 3,
7, 2, 1, 5, 6, 9, 9. The sample mean of these numbers is 5.0. The arrow
was then spun ten more times and the resulting values were 2, 8, 7, 8,
1, 4, 5, 0, 3, 1, giving a sample mean equal to 3.9. This process was
repeated forty times, giving forty sample means which are given below.

5.0 3.9 5.2 4.6 4.1 3.1 4.8 4.9 4.5 4.2
5.1 3.3 4.2 4.3 5.0 4.0 4.5 3.5 5.4 4.7
5.3 4.3 3.3 5.6 4.1 4.9 4.4 3.9 4.6 5.8
7.1 4.8 5.1 2.8 4.3 6.0 5.0 4.8 5.3 4.5

It is easier to inspect this data if we form the grouped sampling
distribution as in the table below.

Grouped sampling distribution


Sample mean   Frequency
0.0–1.0        0
1.1–2.0        0
2.1–3.0        1
3.1–4.0        7
4.1–5.0       23
5.1–6.0        8
6.1–7.0        0
7.1–8.0        1
8.1–9.0        0
It is clear that this distribution has a smaller standard deviation
than the parent distribution. In fact thirty-eight of the observations lie
between 3.1 and 6.0. The mean of the parent population is 4.5. By
inspection the mean value of the sampling distribution of x̄ is also
close to 4.5. Moreover the sampling distribution of x̄ is much closer
to a normal distribution than might have been expected with such
relatively small samples.
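Example 3 is easy to reproduce by simulation; a computer takes the place of the spinner of Figure 32. The sketch below (Python standard library; the seed is an arbitrary choice of ours) draws forty samples of ten random digits and summarizes the resulting sample means.

import random, statistics

random.seed(42)

# Forty samples of ten random digits, as in Example 3
means = [statistics.mean(random.randint(0, 9) for _ in range(10))
         for _ in range(40)]

print(statistics.mean(means))    # close to the population mean 4.5
print(statistics.stdev(means))   # close to sigma/sqrt(10) = 2.87/3.16 = 0.91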

6.4 The sampling distribution of s²


Thus far we have concentrated on the sampling distribution of x̄, but
any other sample statistic will also have a sampling distribution. For
example, if repeated random samples, size n, are taken from a normal
distribution with variance σ², the statistic s² will vary from sample to
sample and it can be shown that the statistic

χ² = (n − 1)s²/σ² = [(x₁ − x̄)² + ... + (xₙ − x̄)²]/σ²

follows a distribution called the chi-squared or χ² distribution. The
Greek letter χ is pronounced 'kigh' and spelt 'chi'; χ² is used rather
than χ to emphasize that the statistic cannot be negative. This
distribution is related to the normal distribution (see Appendix) and
depends upon a parameter called the 'number of degrees of freedom'.
depends upon a parameter called the ‘number of degrees of freedom’.
The term degrees of freedom (abbreviated d.f.) occurs repeatedly in
statistics. Here it is simply a parameter which defines the particular
X^ distribution. In the above situation, the distribution will have
{n— l)d.f. This is the same as the denominator in the formula

116 Estimation
2 ^
n- 1 ■

and we say that the estimate of has (n— l)d.f. We will say more
about this in section 6.5.
The p.d.f. of the χ² distribution is too complicated to give here, but
the percentage points of the distribution are tabulated in Appendix B,
Table 3, for different values of ν. The distribution is always skewed to
the right and has a mean value equal to the number of degrees of
freedom. For large values of ν, the χ² distribution tends towards the
normal distribution.
An example of a χ² distribution is shown in Figure 33. The percentage
point χ²_{α,ν} is chosen so that the proportion of the χ² distribution,
with ν d.f., which lies above it, is equal to α.

Figure 33 The χ² distribution

As χ² = (n − 1)s²/σ², we have

E[(n − 1)s²/σ²] = n − 1,

so that E(s²) = σ². Thus the expected value of the sample variance is
equal to the true population variance.

6.5 Some properties of estimators
In many situations it is possible to find several statistics which could be
used to estimate some unknown parameter. For example if a random
sample of n observations is taken from N(µ, σ²) then three possible point
estimates of µ are the sample mean, the sample median and the average
of the largest and smallest observations. In order to decide which of
them are 'good' estimates, we have to look at the properties of the
sampling distributions of the different statistics. The statistic is called
an estimator and is a random variable which varies from sample to
sample. The word estimate, as opposed to estimator, is usually reserved
for a particular value of the statistic.
One desirable property for an estimator is that of unbiasedness. An
estimator is said to be unbiased or accurate if the mean of its sampling
distribution is equal to the unknown parameter. Denote the unknown
parameter by µ and the estimator by µ̂. (The ^ or 'hat' over µ is the
usual way of denoting an estimator.) Thus µ̂ is an unbiased estimator
of µ if

E(µ̂) = µ.

For example the sample mean, x̄, is an unbiased estimator of the
population mean µ, because by Theorem 1 on p. 112 we have

E(x̄) = µ.

Unbiasedness by itself is not enough to ensure that an estimator is
'good'. In addition we would like the sampling distribution to be
clustered closely round the true value. Thus given two unbiased
estimators, µ̂₁ and µ̂₂, we would choose the one with the smaller
variance or standard error. This brings in the idea of efficiency. If
var(µ̂₁) > var(µ̂₂) and µ̂₁, µ̂₂ are both unbiased estimators, then the
relative efficiency of µ̂₁ with respect to µ̂₂ is given by

var(µ̂₂)/var(µ̂₁),

which is a number between 0 and 1. The unbiased estimator whose
standard error is the smallest possible is sometimes called the minimum
variance unbiased estimator.

But it is important to realize that there are many good estimators
which are not unbiased. An estimator with a small bias and a small
standard error may be better than an unbiased estimator with a large
standard error. The idea of efficiency can be extended to cover biased
estimators by calculating the expected mean square error, E[(µ̂ − µ)²].
This is equal to the variance of µ̂ only when µ̂ is unbiased. The relative
efficiency of µ̂₁ with respect to µ̂₂ is given quite generally by

E[(µ̂₂ − µ)²]/E[(µ̂₁ − µ)²].

Figure 34 Types of estimators (unbiased and biased, efficient and inefficient)

Four types of sampling distribution are shown in Figure 34. An
unbiased efficient estimator is clearly preferable to a biased inefficient
estimator.
One further desirable property of a good estimator is that of
consistency, which, roughly speaking, says that the larger the sample size n
the closer the statistic will be to the true value. An estimator, µ̂, is said
to be consistent if
(a) E(µ̂) → µ as n → ∞,
(b) var(µ̂) → 0 as n → ∞.

Property (a) clearly holds for unbiased estimators. Moreover the sample
mean, for example, is consistent as var(x̄) = σ²/n tends to zero as n → ∞.
In some situations there is an estimator which is ‘best’ on all counts.
But in other situations the best unbiased estimator may have a larger
mean square error than some biased estimator and then the choice of
the estimator to be used depends on practical considerations.

Example 4
If a random sample of n observations is taken from N(µ, σ²), two
possible estimates of µ are the sample mean and the sample median.
We know

var (x̄) = σ²/n.

It can also be shown that the median is unbiased and that

var (median) = πσ²/2n.

Thus the efficiency of the median is 2/π or about 64 per cent. Thus it
is better to use the sample mean.
Note that for skewed distributions the median will give a biased
estimate of the population mean. Moreover this bias does not tend
to zero as the sample size increases and so the median is not a consistent
estimate of the mean for skewed distributions.
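The relative efficiency of the median in Example 4 can be checked by simulation. This sketch (Python standard library only; the sample size, number of repetitions and seed are arbitrary choices of ours) compares the sampling variances of the mean and median for normal samples.

import random, statistics

random.seed(0)
n, reps = 25, 2000

means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Relative efficiency var(mean)/var(median): roughly 2/pi = 0.64
print(statistics.variance(means) / statistics.variance(medians))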
We now state the following important results. If a random sample,
size n, is taken from a normal distribution with mean µ and variance
σ², then

(1) x̄ is the minimum variance unbiased estimate of µ,
(2) s² is the minimum variance unbiased estimate of σ².

We have already proved that E(x̄) = µ, so x̄ is an unbiased estimate
of µ. Similarly it can be shown that

E(s²) = E[∑(xᵢ − x̄)²/(n − 1)] = σ²,

so that s² is an unbiased estimate of σ². This is the reason that we
chose this formula for s² in Chapter 2. If we put n instead of (n − 1)
in the denominator of the formula for s², then we will get a biased
estimate of σ². This fact, which puzzles many students, arises because
the sample values are 'closer' on average to x̄ than they are to µ. However,
in the rare case where the population mean µ is known, but the
variance is not, the unbiased estimate of σ² would indeed be
∑(xᵢ − µ)²/n, with n in the denominator. The denominator in the
formula for s² is called the number of degrees of freedom of the estimate.
When µ is unknown and the observations are compared with x̄, there
is one linear constraint on the values of (xᵢ − x̄), since ∑(xᵢ − x̄) = 0, and
so one degree of freedom is 'lost'.
More generally in Chapter 10 we will see that the number of degrees
of freedom can be thought of as the number of independent comparisons
available. With a random sample size n, each observation can be
independently compared with the (n − 1) other observations and so
there are (n − 1) d.f. This is the same as the number of degrees of freedom
of the estimate ∑(xᵢ − x̄)²/(n − 1). If in addition we know the true mean
µ, then each of the n observations can be independently compared with
µ and so there are n d.f. This is the same as the number of degrees of
freedom of the estimate ∑(xᵢ − µ)²/n.
The fact that x̄ and s² are minimum variance estimates of µ and σ²
requires a mathematical proof which is beyond the scope of this book.

6.6 General methods of point estimation


We have seen that the mean of a sample from a normal distribution
is the best estimate of the population mean. Fortunately there are many
cases where such intuitively obvious estimates are indeed the best.
However, other situations exist where it is not obvious how to find a
good estimate of an unknown parameter. Thus we will describe two
general methods of finding point estimates.

6.6.1 The method of moments


Suppose that n observations, x₁, x₂, ..., xₙ, are made on a random
variable, X, whose distribution depends on one or more unknown
parameters. The kth sample moment of the data is defined to be

m'_k = (1/n) ∑_{i=1}^{n} xᵢᵏ.

The kth population moment is defined to be

µ'_k = E(Xᵏ),

and this will depend on the unknown parameters.


The method of moments consists of equating the first few sample
moments with the corresponding population moments, to obtain as
many equations as there are unknown parameters. These can then be
solved to obtain the required estimates. This procedure usually gives
fairly simple estimates which are consistent. However they are some­
times biased and sometimes rather inefficient.

Example 5
Suppose we suspect that a variable has a distribution with p.d.f.

f(x) = (k + 1)xᵏ   (0 < x < 1, k > −1),
f(x) = 0           (otherwise),

and that the following values are observed:

0.2 0.4 0.5 0.7 0.8 0.8 0.9 0.9.

Estimate the unknown parameter k.
As there is only one unknown parameter, the method of moments
consists simply of equating the sample mean with the population mean.

The sample mean = m'₁ = ∑x/8 = 0.65.

The population mean = ∫₀¹ (k + 1)xᵏ · x dx = (k + 1)/(k + 2).

Thus the estimate of k is given by

(k̂ + 1)/(k̂ + 2) = 0.65,

giving k̂ = 0.86.
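The algebra of Example 5 is easily checked numerically; solving (k + 1)/(k + 2) = m'₁ for k gives k = (2m'₁ − 1)/(1 − m'₁), as this short Python sketch confirms.

data = [0.2, 0.4, 0.5, 0.7, 0.8, 0.8, 0.9, 0.9]
m1 = sum(data) / len(data)            # first sample moment, 0.65

# Equating (k+1)/(k+2) = m1 and solving for k
k_hat = (2 * m1 - 1) / (1 - m1)
print(k_hat)                          # about 0.86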

6.6.2 The method of maximum likelihood
This second method of estimation gives estimates which, besides being
consistent, have the valuable property that for large n they are the
most efficient. However, the estimates may be biased, as may those
obtained by the method of moments.
We introduce the idea of the likelihood function by means of an
example. Suppose that we want to estimate the binomial parameter p,
the probability of a success in a single trial, and that ten trials are
performed resulting in seven successes. We know that the probability
of observing x successes in ten trials is ¹⁰Cₓ pˣ(1 − p)¹⁰⁻ˣ. This is a
probability function in which p is fixed and x varies between 0 and 10.
However, in this particular experiment we observe x = 7 and would
like to estimate p. If the value x = 7 is inserted in the probability
function, it can now be thought of as a function of p in which case it
is called a likelihood function. We write

L(p) = ¹⁰C₇ p⁷(1 − p)³.
The value of p, say p̂, which maximizes this function, is called the
maximum likelihood estimate. Thus the method of maximum likeli­
hood selects the value of the unknown parameter for which the proba­
bility of obtaining the observed data is a maximum. In other words
the method selects the ‘most likely’ value for p. The true value of p
is of course fixed and so it may be a trifle misleading to think of p as a
variable in the likelihood function. This difficulty can be avoided by
calling the likelihood a function of the unknown parameter.
In order to maximize the likelihood function it is convenient to take
logs, thus giving

log_e L(p) = log_e ¹⁰C₇ + 7 log_e p + 3 log_e(1 − p),

d log_e L(p)/dp = 7/p − 3/(1 − p).

The likelihood function is maximized when d log L(p)/dp is zero. Thus
we have

7/p̂ − 3/(1 − p̂) = 0,

giving p̂ = 7/10 = 0.7.
Thus the intuitive estimate is also the maximum likelihood estimate.

More generally with a series of observations, x₁, x₂, ..., xₙ, the
likelihood function is obtained by writing down the joint probability of
observing these values in terms of the unknown parameter. With a
continuous distribution, the likelihood function is obtained by writing
down the joint probability density function.
The method of maximum likelihood can also be used when the
distribution depends on more than one parameter. For example, the
normal distribution depends on two parameters, µ and σ², so that the
likelihood is a function of two unknown parameters. To obtain the
maximum value of this function we differentiate partially, first with
respect to µ, and then with respect to σ; equate the derivatives to
zero and solve to obtain the maximum likelihood estimates. With a
sample size n, the maximum likelihood estimates turn out to be

µ̂ = x̄,
σ̂² = ∑(xᵢ − x̄)²/n.

Notice that the estimate of σ² is biased, and so the usual estimate

s² = ∑(xᵢ − x̄)²/(n − 1)

is preferred for small samples. (Actually s² is the marginal likelihood
estimate of σ², but this method of estimation will not be discussed in
this book.)

Example 6
The Poisson distribution is given by

Pᵣ = e^(−λ) λʳ / r!   (r = 0, 1, 2, ...).

A sample of n observations, x₁, x₂, ..., xₙ, is taken from this
distribution. Derive the maximum likelihood estimate of the Poisson
parameter λ.
The joint probability of observing x₁, x₂, ..., xₙ for a particular
value of λ is given by the product of the probabilities

∏_{i=1}^{n} e^(−λ) λ^(xᵢ) / xᵢ!

(where ∏ denotes a product).
Thus the likelihood function is given by

L(λ) = ∏_{i=1}^{n} e^(−λ) λ^(xᵢ) / xᵢ!,

where the observations are known and λ is unknown. Then

log_e L(λ) = (−λ + x₁ log_e λ − log_e x₁!) + (−λ + x₂ log_e λ − log_e x₂!)
           + ... + (−λ + xₙ log_e λ − log_e xₙ!)
           = −nλ + log_e λ ∑xᵢ − ∑ log_e xᵢ!,

d log_e L(λ)/dλ = −n + ∑xᵢ/λ.

When this is zero we have λ̂ = x̄.

Thus the intuitive estimate, x̄, is also the maximum likelihood estimate
of λ. (It is also the method of moments estimate.)

Example 7
Find the maximum likelihood estimate for the problem in Example 5.
For a particular value of k, the joint probability density function
of observing a series of values x₁, x₂, ..., x₈ is given by

(k + 1)x₁ᵏ × (k + 1)x₂ᵏ × ... × (k + 1)x₈ᵏ.

Thus the likelihood function is given by

L(k) = (k + 1)⁸ (x₁ x₂ ... x₈)ᵏ,

log_e L(k) = 8 log_e(k + 1) + k ∑_{i=1}^{8} log_e xᵢ,

d log_e L(k)/dk = 8/(k + 1) + ∑ log_e xᵢ.

When this is zero we have

k̂ + 1 = −8 / ∑ log_e xᵢ = 8/4.24 = 1.89,

k̂ = 0.89.
This estimate is very close to the estimate obtained by the method of
moments.
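The closed-form solution of Example 7 is equally simple to evaluate by computer:

from math import log

data = [0.2, 0.4, 0.5, 0.7, 0.8, 0.8, 0.9, 0.9]

# Maximum likelihood: k + 1 = -8 / sum(log x_i)
s = sum(log(xi) for xi in data)       # about -4.24
k_hat = -len(data) / s - 1
print(k_hat)                          # about 0.89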

6.7 Interval estimation


We have described two general methods of deriving point estimates of
an unknown parameter. However it is usually preferable to find an
interval estimate. This estimate is usually constructed in such a way
that we have a certain confidence that the interval does contain the
unknown parameter. The interval estimate is then called a confidence
interval. We will concentrate on the important problem of finding a
confidence interval for the population mean.

6.7.1 Confidence interval for µ with σ known


Suppose a random sample, size n, is taken from a distribution with
unknown mean µ but whose standard deviation σ is known. This
situation is somewhat uncommon but is a useful introduction to the
case where σ is also unknown.
If the distribution is normal we have seen that x̄ is N(µ, σ²/n), so

z = (x̄ − µ)/(σ/√n) is N(0, 1).

Now 95 per cent of the standard normal distribution lies between
±1.96. Therefore

probability [−1.96 < (x̄ − µ)/(σ/√n) < +1.96] = 0.95.

But

(x̄ − µ)/(σ/√n) < 1.96 implies x̄ − 1.96σ/√n < µ,

and

−1.96 < (x̄ − µ)/(σ/√n) implies µ < x̄ + 1.96σ/√n.

Thus the expression can be rearranged to give

probability (x̄ − 1.96σ/√n < µ < x̄ + 1.96σ/√n) = 0.95.

The interval between x̄ − 1.96σ/√n and x̄ + 1.96σ/√n is called the
95 per cent confidence interval for µ. In other words given the sample
mean x̄, we are 95 per cent confident that this interval will contain µ.
The two endpoints of the confidence interval are called the confidence
limits for the unknown parameter.
The meaning of the above probability statement must be clearly
understood. If x̄ is a random variable then this statement is certainly
true. In practice only one value is available and the confidence interval
either will or will not contain µ, in which case the probabilities are one
or nought respectively. Then the above statement must be interpreted
as follows: if for each experiment like this we were to claim that µ lay
within such a confidence interval, then 95 per cent of these claims
would be true in the long run. For one particular experiment the
probability or confidence of 95 per cent expresses the odds at which
we would be prepared to bet that the confidence interval does contain µ.
Because of the central limit theorem a similar confidence interval is
obtained when the sample is taken from a non-normal distribution
provided that a reasonably large sample is taken.

Example 8
If the sample mean, w̄, of the twelve measurements in Example 1 were
found to be 12.91 per cent, give a 95 per cent confidence interval for the
true percentage w.
The 95 per cent confidence interval for w is given by

w̄ ± 1.96σ/√n = 12.91 ± 1.96 × 0.58
             = 12.91 ± 1.14.
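A quick numerical check of Example 8 (σ = 2 from Example 1 and n = 12 measurements):

from math import sqrt

wbar, sigma, n = 12.91, 2.0, 12
half = 1.96 * sigma / sqrt(n)    # about 1.13 (1.14 with the rounding used above)
print(wbar - half, wbar + half)  # the 95 per cent confidence limits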
By a similar argument we can obtain the confidence intervals
corresponding to stronger or weaker degrees of confidence. For

example, 99 per cent of the normal distribution lies within 2.58 standard
deviations from the mean. Thus the 99 per cent confidence interval is
given by

x̄ ± 2.58σ/√n.

Figure 35

Generally to obtain the 100(1 − α) per cent confidence interval, we
find the value z_{α/2} such that a proportion α/2 of the standard normal
distribution lies above it. By symmetry a proportion α/2 lies below
−z_{α/2}, so that the 100(1 − α) per cent confidence interval is given by

x̄ ± z_{α/2} σ/√n.
6.7.2 Confidence interval for µ with σ unknown

In practice we usually find that the true standard deviation, σ, is
unknown and that the sample standard deviation, s, is used to estimate
σ. Then if a random sample size n is drawn, an estimate of the standard
error of the sample mean x̄ is given by s/√n.
Then we might expect the 95 per cent confidence interval for µ to be
of the form x̄ ± 1.96s/√n, simply replacing σ by s in our previous
formula. For large samples (n > 30) we can in fact get a good
approximation with this formula. However for small samples (n < 30) we must
take a rather wider interval, since s is no longer such a good estimate
of σ, and there will be appreciable variation in s from sample to sample.

If we denote the 95 per cent confidence interval for µ, for a sample
size n, by

x̄ ± t_c s/√n,   6.8

then some typical values for t_c are given in Table 10.

Table 10
Values of t_c for a
95 per cent Confidence Interval

n     t_c
4     3.18
8     2.36
12    2.20
20    2.09
∞     1.96

Thus although t_c increases quite rapidly for very small n, a useful
approximation for n > 25, which is worth remembering, is that the
95 per cent confidence interval is x̄ ± 2 × (estimated standard error of
the sample mean).

Example 9
A sample of eight observations is taken from a distribution with
unknown mean µ. The sample mean x̄ is 7.91 and the sample standard
deviation s is 0.67. Thus an estimate of the standard error of the sample
mean is 0.67/√8 = 0.237. From Table 10 we find that for a sample
size 8 the 95 per cent confidence interval for the true mean µ is given
by x̄ ± 2.36 × s/√n.
Thus the 95 per cent confidence interval is given by 7.91 ± 0.56.
In equation 6.8, t_c is actually the percentage point of a distribution
called the t-distribution. We have already seen that if random samples,
size n, are drawn from a normal distribution, mean µ and variance σ²,
then the sampling distribution of the random variable

z = (x̄ − µ)/(σ/√n) is N(0, 1).

However, if σ is replaced with its sample estimate s, then the random
variable

t = (x̄ − µ)/(s/√n)

has a sampling distribution which is somewhat more spread out than
the standard normal distribution. It can be shown that the statistic
t follows a distribution called the t-distribution, which is related to the
normal and χ² distributions (see Appendix A). This distribution
depends on a parameter called the number of degrees of freedom,
which is the same as the number of degrees of freedom of the estimate s².
In the above situation, where s² was calculated from samples size n,
there are (n − 1) degrees of freedom. An example of a t-distribution,
with three degrees of freedom, is given in Figure 36 together with a
standard normal distribution for comparison.

Figure 36

The t-distribution is sometimes called the Student t-distribution after
W. S. Gossett who studied the distribution and published papers on it
under the pen name 'Student'. Like the standard normal distribution
it is symmetric with mean zero. The percentage point, t_{α,ν}, is chosen so
that a proportion α of the t-distribution, with ν d.f., lies above it.
These percentage points are tabulated in Table 2, Appendix B.
The percentage points which are required to establish confidence
intervals for µ are such that

P(|t| > t_c) = α.

Thus t_c = t_{α/2,ν} with the appropriate number of degrees of freedom (see
Figure 37).

We are now in a position to find a confidence interval for µ for any
degree of confidence we wish. If x̄ and s² are calculated from a sample
size n, the 100(1 − α) per cent confidence interval for µ is given by

x̄ ± t_{α/2, n−1} s/√n.

Example 10
Use the data given in Example 9 to establish a 99 per cent confidence
interval for the true mean µ.
There are eight observations, so that the sample estimate s is based
on (8 − 1) = 7 d.f. From Table 2, Appendix B, we find

t_{0.005,7} = 3.50.

From Example 9 we have s/√n = 0.237. Thus the 99 per cent
confidence interval for µ is given by

7.91 ± 3.50 × 0.237 = 7.91 ± 0.83.

This is about half as wide again as the 95 per cent confidence interval
for µ.

6.7.3 Confidence interval for σ²


Interval estimates for other parameters can be derived in a similar way. For example we know that (n-1)s²/σ² follows a χ² distribution with (n-1) degrees of freedom. Thus

P[χ²_{1-½α,n-1} < (n-1)s²/σ² < χ²_{½α,n-1}] = (1-α).

This can be rearranged to give

(n-1)s²/χ²_{½α,n-1} < σ² < (n-1)s²/χ²_{1-½α,n-1}.

Thus the 100(1-α) per cent confidence interval for σ² lies between

(n-1)s²/χ²_{½α,n-1}    and    (n-1)s²/χ²_{1-½α,n-1}.
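For illustration only, here is a short Python sketch of this interval, reusing the sample of Example 9 (n = 8, s = 0.67, an assumption made purely for the example) and assuming SciPy supplies the χ² percentage points:

from scipy import stats

n, s2, alpha = 8, 0.67**2, 0.05
lower = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)  # divide by upper point
upper = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, df=n - 1)      # divide by lower point
print(f"95% interval for the variance: ({lower:.3f}, {upper:.3f})")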

Exercises
1. A random sample is drawn from a population with a known standard deviation of 2.0. Find the standard error of the sample mean if the sample is of size (a) 9; (b) 25 and (c) 100. What sample size would give a standard error equal to 0.5?
2. If X, Y are independent random variables show that

variance (X - Y) = variance (X) + variance (Y).

Two random samples of size n₁ and n₂ are taken from a population with variance σ². The two sample means are x̄₁ and x̄₂. Show that the standard error of x̄₁ - x̄₂ is given by

σ√(1/n₁ + 1/n₂).

3. A random sample, size 10, is taken from a standard normal distribution. Find the values a and b such that

P(a < x̄) = 0.95
P(x̄ < b) = 0.95

i.e.

P(a < x̄ < b) = 0.90.

Also use the χ² distribution to find the values c and d such that

P(c < s²) = 0.95
P(s² < d) = 0.95

i.e.

P(c < s² < d) = 0.90.

Also find the values of a, b, c and d if the sample was taken from a normal distribution with a mean of six and a variance of four.

4. The probability density function of the exponential distribution is given by

f(x) = λe^{-λx},   x > 0.

A random sample of n observations is taken from this distribution. Show that the method of moments estimate of λ is given by

λ̂ = 1/x̄,   where x̄ is the sample mean.

Also show that the maximum likelihood estimate of λ is the same as the method of moments estimate.
5. The percentage of copper in a certain chemical is measured six times. The standard deviation of repeated measurements is known to be 2.5; the sample mean is 14.1. Give a 95 per cent confidence interval for the true percentage of copper, assuming that the observations are approximately normally distributed.
6. The percentage of copper in a certain chemical is measured six times. The sample mean and sample standard deviation are found to be 14.1 and 2.1 respectively. Give a 95 per cent confidence interval for the true percentage of copper, assuming that the observations are approximately normally distributed.
7. A random sample, size n₁, is taken from N(μ₁, σ²) and a second random sample, size n₂, is taken from N(μ₂, σ²). If the sample means are denoted by x̄₁, x̄₂, it can be shown that

[(x̄₁ - x̄₂) - (μ₁ - μ₂)] / [σ√(1/n₁ + 1/n₂)]    is N(0, 1).

Show that the 100(1-α) per cent confidence interval for (μ₁ - μ₂), assuming σ is known, is given by

(x̄₁ - x̄₂) ± z_{½α} σ√(1/n₁ + 1/n₂).

The corresponding formula when σ is unknown is given by

(x̄₁ - x̄₂) ± t_{½α,n₁+n₂-2} s√(1/n₁ + 1/n₂),

where s denotes the combined estimate of σ from the two samples (see section 7.3.2).

Chapter 7
Significance tests

7.1 Introduction
We have seen that statistics is concerned with making deductions from
a sample of data about the population from which the sample is drawn.
In Chapter 6 we considered the problem of estimating the unknown
parameters of the population. In this chapter we shall see how to carry
out a significance test in order to test some theory about the population.
The commonest situation is when the population can be described by a probability distribution which depends on a single unknown parameter μ. On the basis of the experimental results we might wish, for example, to accept or reject the theory that μ has a particular value μ₀. A numerical method for testing such a theory or hypothesis is called a significance test.

Example 1
One of the commonest problems facing the engineer is that of trying
to improve an industrial process in some respect. In particular he may
wish to compare a new process with an existing process. However,
because of changeover costs, he will need to be reasonably certain that
the new process really is better before making the necessary changes.
As an example let us suppose that the strength of steel wire made by an existing process is normally distributed with mean μ₀ = 1250 and standard deviation σ = 150. A batch of wire is made by a new process, and a sample of 25 measurements gives an average strength of x̄ = 1312. (We assume that the standard deviation of the measurements does not change.) Then the engineer must decide if the difference between x̄ and μ₀ is strong enough evidence to justify changing to the new process. The method of testing the data to see if the results are considerably or significantly better is called a significance test.



In any such situation the experimenter has to weigh the evidence and, if possible, decide between two rival possibilities. The hypothesis which we want to test is called the null hypothesis, and is denoted by H₀. Any other hypothesis is called the alternative hypothesis, and is denoted by H₁.
In the case of the engineer in Example 1, he must decide whether to accept or reject the hypothesis that there is no difference between the new and existing processes. This theory is known as the null hypothesis because it assumes there is no difference between the two processes. Let us denote the mean strength of wire produced by the new process by μ, so that x̄ is the sample estimate of μ. Then the null hypothesis is given by

H₀: μ = μ₀.

The second possible theory in this situation is that the new process is better than the existing process, and this is the alternative hypothesis. It is also useful to formulate H₁ precisely, as follows:

H₁: μ > μ₀.

At first the student may have difficulty in deciding which of two theories is the null hypothesis and which is the alternative hypothesis. The important point to remember is that the null hypothesis must be assumed to be true until the data indicates otherwise, in much the same way that a prisoner is assumed innocent until proved guilty. Thus the burden of proof is on H₁, and the experimenter is interested in departures from H₀ rather than from H₁.

Example 2
If a new industrial process is compared with an existing process, the choice of the null hypothesis depends on a number of considerations and several possibilities arise.
(a) If the existing process is reliable and changeover costs are high, the burden of proof is on the new process to show that it really is better. Then we choose

H₀: the existing process is as good as or better than the new process,
and H₁: the new process is an improvement.

This is the situation discussed in Example 1.

(b) If the existing process is unreliable and changeover costs are low, the burden of proof is on the existing process. Then we choose
H₀: the new process is as good as or better than the existing process,
and H₁: the existing process is better than the new process.

In these two extreme cases there is no difficulty in choosing the correct null hypothesis. But in practice the situation may be somewhere in between. Then, in addition to choosing the more natural null hypothesis, it is also essential to make a careful choice of the sample size, as described in section 7.8, in order to ensure that the risk of making a wrong decision is acceptably small. Alternatively a decision theory approach may be possible by allocating costs to the different eventualities, but this approach will not be described in this book.

Example 3
A new drug is tested to see if it is effective in curing a certain disease.
The Thalidomide tragedy has emphasized that a drug must not be put
on the market until it has been rigorously tested. Thus we must assume
that the drug is not effective, or is actually harmful, until the tests
indicate otherwise. The null hypothesis is that the drug is not effective.
The alternative hypothesis is that the drug is effective.
A second factor to bear in mind when choosing the null hypothesis is that it should nearly always be precise, or be easily reduced to a precise hypothesis. For example when testing

H₀: μ ≤ μ₀
against H₁: μ > μ₀,

the null hypothesis does not specify the value of μ exactly and so is not precise. But in practice we would proceed as if we were testing

H₀: μ = μ₀
against H₁: μ > μ₀,

and here the null hypothesis is precise.
Further comments on choosing a precise null hypothesis are included at the end of section 7.8.

7.1.1 Test statistic

Having decided on the null and alternative hypotheses, the next step is to calculate a statistic which will show up any departure from the null hypothesis.



In the case of Example 1, common sense suggests that the larger the difference between x̄ and μ₀, the more likely it is that the new process has increased the strength. If we divide this difference by the standard error of x̄, which is σ/√25, we have the test statistic

z = (x̄ - μ₀)/(σ/√25).

However we must remember that even if there is no difference between the two processes (that is, H₀ is true), we cannot expect x̄ to be exactly equal to μ₀. If H₀ is true, we have seen in section 6.3 that x̄ will have a sampling distribution with mean μ₀ and standard error σ/√25. Thus z will follow a standard normal distribution, N(0, 1). Thus if H₀ is true we are unlikely to get a value of z bigger than about +2, so that if (x̄ - μ₀)/(σ/√25) is larger than this, then doubt is thrown on the null hypothesis.

7.1.2 Level of significance

This is the probability of getting a result which is as extreme, or more extreme, than the one obtained. A result which is unlikely to occur if H₀ is true (that is, which has a low level of significance) is called a significant result.
Using the data of Example 1, we first calculate the observed value of the test statistic

z₀ = (1312 - 1250)/(150/√25) = 2.06.

From normal tables we find

probability (z > 2.06) = 1 - F(2.06) = 0.0197.

Thus there is about a 2 per cent chance of getting a more extreme result. As this result is so unlikely, we are inclined to reject the null hypothesis and accept the alternative hypothesis that there has been an improvement.
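This calculation is easily checked by machine. The following sketch (an illustration added here, assuming SciPy is available) evaluates the test statistic and its one-tailed level of significance:

from math import sqrt
from scipy import stats

mu0, sigma, n, xbar = 1250, 150, 25, 1312
z0 = (xbar - mu0) / (sigma / sqrt(n))     # observed test statistic
p = 1 - stats.norm.cdf(z0)                # P(z > z0), one-tailed
print(f"z0 = {z0:.2f}, level of significance = {p:.3f}")
# gives z0 = 2.07 (rounded to 2.06 in the text) and a level of about 0.02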

7.1.3 The interpretation of a significant result

It is important to realize from the outset that we can rarely prove with absolute certainty that H₀ is or is not true. If the level of significance of the test statistic is fairly low, then doubt is thrown on the null hypothesis and we will be inclined to reject it. Nevertheless there is still a small possibility that the observed data could have resulted from the given null hypothesis.
If the level of significance of the test statistic is less than 5 per cent, we say that the result is significant at the 5 per cent level. This is generally taken to be reasonable evidence that H₀ is untrue. If the result is significant at the one per cent level of significance, it is generally taken to be fairly conclusive evidence that H₀ is untrue.
On the other hand if the level of significance is fairly high (> 5 per cent), then this simply shows that the data is quite consistent with the null hypothesis. This does not mean that H₀ is definitely true, as the sample size may not have been large enough to spot a fairly small departure from H₀. But, for the time being at least, we have no evidence that H₀ is untrue and so will accept it.
Thus if we had obtained the result x̄ = 1290 in Example 1, the level of significance would be greater than 5 per cent and we would not have enough evidence to reject H₀, as such a result is quite likely to occur if H₀ is true. But, if the difference of forty between x̄ and μ₀ is judged to be a worthwhile improvement, then a larger sample should be taken in order to try to detect a smaller improvement than was possible with a sample of twenty-five. Where possible the sample size should be chosen beforehand so that the smallest significant difference, which is about two standard errors (2σ/√n), is much less than the smallest improvement considered to be of practical importance (see section 7.8 and Exercise 4).
At this point it is worth emphasizing the distinction between
statistical and practical significance. For example, suppose that the
result x̄ = 1270 was obtained in Example 1, and that the engineer
decided that the difference of twenty between this value and μ₀ was not
sufficiently large to justify changing to the new process. In other words,
the difference of twenty was judged not to be of practical significance.
Then there is no point in doing a significance test to see if the result is
statistically significant, since the engineer’s actions will not be affected
by the result of such a test. Thus we must remember that a result may be
statistically significant but not practically significant.



7.1.4 One-tailed and two-tailed tests

In Example 1 we were only interested in values of x̄ significantly higher than μ₀. Any such test which only takes account of departures from the null hypothesis in one direction is called a one-tailed test (or one-sided test).
However, other situations exist in which departures from H₀ in two directions are of interest. For example, suppose that a chemical process has optimum temperature T and that measurements are made at regular intervals. The chemist is interested in detecting a significant increase or decrease in the observed temperature and so a two-tailed test is appropriate. It is important that the scientist should decide if a one-tailed or two-tailed test is required before the observations are taken.

Example 4
According to a certain chemical theory, the percentage of iron in a certain compound should be 12.1. In order to test this theory it was decided to analyse nine different samples of the compound to see if the measurements would differ significantly from 12.1 per cent.
Before carrying out the analyses, we must specify the null and alternative hypotheses and also decide if a one-tailed or two-tailed test is appropriate.
Denote the unknown mean percentage of iron by μ. Then the null hypothesis is given by

H₀: μ = 12.1%.

The alternative hypothesis is given by

H₁: μ ≠ 12.1%.

As we are interested in significantly high or low results, a two-tailed test is appropriate. The actual measurements and the resulting analysis will be given later in the chapter.
For tests based on the normal distribution, the level of significance of a result in a two-tailed test can be obtained by doubling the level of significance which would be obtained if a one-tailed test was carried out on the same result (see section 7.2).

7.1.5 Critical values

We have seen that the lower the level of significance of a particular result, the less likely it is that the null hypothesis is true. If the observed level of significance is very small, then a decision can be made to reject the null hypothesis. If it is very large then a decision can be made to accept the null hypothesis. But if the observed level of significance is around 5 per cent, the results are rather inconclusive and the experimenter may decide to take some more measurements.
However, if a definite decision has to be taken, the experimenter should choose a particular level of significance and reject the null hypothesis if the observed level of significance is less than this. The 5 per cent and the 1 per cent levels are commonly used. This value should be chosen before the observations are taken. For example, if the 5 per cent level is chosen and the observed level of significance is less than this, then we say the result is significant at the 5 per cent level.
A critical value of the test statistic can be calculated which will correspond to the chosen significance level. Thus in Example 1 we could choose to reject H₀ if the observed level of significance is less than 5 per cent. But probability (z > 1.64) = 0.05. Thus the critical value of the test statistic, z, is 1.64. If the observed value of z is greater than 1.64, then H₀ must be rejected.
A variety of significance tests exist each of which is appropriate to a
particular situation. In the remainder of this chapter we will describe
some of the more important tests.

7.2 Tests on a sample mean

In this section we assume that a random sample, size n, is taken from a normal distribution with unknown mean μ. The sample mean is denoted by x̄. We are interested in a particular value μ₀ for μ, and ask the question 'does x̄ differ significantly from μ₀?'. Thus the null hypothesis is given by

H₀: μ = μ₀.

The method of analysis depends on whether or not the population standard deviation σ is known.

7.2.1 σ known
This situation has been discussed in detail in Example 1. The test statistic is given by

z = (x̄ - μ₀)/(σ/√n).



If H₀ is true then z is a standard normal variable. Denote the observed value of z by z₀. If a two-tailed test is appropriate the level of significance is obtained by calculating

P(|z| ≥ |z₀|) = 2 × P(z ≥ |z₀|)

from normal tables (see Figure 38).
Figure 38  Shaded area gives level of significance: (a) two-tailed test, H₁: μ ≠ μ₀, for z₀ positive and z₀ negative; (b) one-tailed test, H₁: μ < μ₀

If a one-tailed test is appropriate we find P(z ≥ z₀) if we are interested in significantly high values, or P(z ≤ z₀) if we are interested in significantly low values. Note that if z₀ is negative we have

P(z ≤ z₀) = P(z ≥ |z₀|)

by the symmetry of the normal distribution.
The 95 per cent and 99 per cent critical values for the test statistic z can be found from normal tables and are given in Table 11.

Table 11

       Two-tailed   One-tailed
95%    1.96         1.64
99%    2.58         2.33



7.2.2 σ unknown
When σ is unknown it is natural to replace it with the sample standard deviation, s, to obtain the test statistic

t = (x̄ - μ₀)/(s/√n).

(i) Large samples. When n is more than about twenty-five, the sample standard deviation is a good estimate of σ and, if the null hypothesis is true, the random variable t will be approximately N(0, 1). Then the analysis proceeds as above where σ is known.

(ii) Small samples - the t-test. When n is less than about twenty-five the sample standard deviation is not such a good estimate of σ, so that, if the null hypothesis is true, the distribution of t is more spread out than a standard normal distribution. In section 6.7.2 we saw that the random variable t will follow a t-distribution with (n-1) degrees of freedom. If we denote the observed value of t by t₀, the level of significance is found as follows:

two-tailed test   P(|t| ≥ |t₀|) = 2P(t ≥ |t₀|)   (for H₁: μ ≠ μ₀);
one-tailed test   P(t ≥ t₀)   (for H₁: μ > μ₀),
                  P(t ≤ t₀)   (for H₁: μ < μ₀).

These probabilities can be found from the table of percentage points of the t-distribution which is given in Appendix B. As described in section 6.7.2, the percentage point t_{α,ν} is chosen so that there is a probability α of getting a larger observation from a t-distribution with ν degrees of freedom. For a two-tailed test, if |t₀| is larger than t_{½α,ν}, then the result is significant at the 100α per cent significance level. For a one-tailed test, if t₀ is larger than t_{α,ν} for H₁: μ > μ₀, or if t₀ is less than -t_{α,ν} for H₁: μ < μ₀, then the result is significant at the 100α per cent significance level.

Example 4 continued
The analysis of the nine different samples of the compound gave the following results:

11.7  12.2  10.9  11.4  11.3  12.0  11.1  10.7  11.6

Is the sample mean of these measurements significantly different from 12.1 per cent?



From the observations we compute

x̄ = 11.43
s² = 0.24
s = 0.49.

The population standard deviation is unknown, so we compute the test statistic

t₀ = (x̄ - μ₀)/(s/√n) = (11.43 - 12.1)/(0.49/√9) = -4.1.

As the sample standard deviation is computed from only nine measurements, we must use a t-test. If the null hypothesis is true the test statistic should follow a t-distribution with eight degrees of freedom.
From Table 2, Appendix B,

t_{0.025,8} = 2.31.

As |t₀| is larger than t_{0.025,8}, the probability of observing a result which is as extreme or more extreme than this is certainly less than 2 × 0.025 = 0.05. Thus the result is significant at the 5 per cent level. Moreover we have t_{0.005,8} = 3.36, so that the result is also significant at the 1 per cent level. Thus we have fairly conclusive evidence that the percentage of iron is not 12.1 per cent.
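The whole test can be carried out in one step with SciPy's ttest_1samp, which computes exactly the statistic t = (x̄ - μ₀)/(s/√n); this sketch is an added illustration, not part of the original text:

from scipy import stats

x = [11.7, 12.2, 10.9, 11.4, 11.3, 12.0, 11.1, 10.7, 11.6]
t0, p = stats.ttest_1samp(x, popmean=12.1)   # two-tailed P-value by default
print(f"t0 = {t0:.1f}, two-tailed P-value = {p:.4f}")
# t0 = -4.1 with P well below 0.01, confirming the conclusion above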

7.3 Comparing two sample means

A common problem in statistics is that of comparing the means of two samples of data. Let us assume that a random sample, size n₁, is taken from a population having unknown mean μ₁, and that a second sample, size n₂, is taken from a population having unknown mean μ₂. The observations in the first sample will be denoted by x₁, x₂, ..., x_{n₁} and in the second sample by x′₁, x′₂, ..., x′_{n₂}. The two sample means will be denoted by x̄₁ and x̄₂.
This section is concerned with answering the question 'does x̄₁ differ significantly from x̄₂?'. In other words the problem is to test the hypothesis H₀: μ₁ = μ₂. The alternative hypothesis may be of the form H₁: μ₁ ≠ μ₂, H₁: μ₁ > μ₂ or H₁: μ₁ < μ₂. A two-tailed test is appropriate in the first case and a one-tailed test in the second and third.



The significance test proposed here depends on the following assumptions: firstly, that both sets of observations are normally distributed; secondly, that the populations have the same variance σ²; and thirdly, that the two samples are independent.
The method of analysis again depends on whether or not the population standard deviation σ is known.

7.3.1 σ known
Common sense suggests that the larger the difference between x̄₁ and x̄₂, the less likely H₀ is to be true. A standardized test statistic can be obtained by dividing (x̄₁ - x̄₂) by its standard error, which can be found as follows. We know variance (x̄₁) = σ²/n₁ and variance (x̄₂) = σ²/n₂. In addition we know that the variance of the sum or difference of two independent random variables is equal to the sum of the variances of the two variables (see section 6.2). Thus

variance (x̄₁ - x̄₂) = σ²/n₁ + σ²/n₂.

Thus the standard error of (x̄₁ - x̄₂) is given by

√(σ²/n₁ + σ²/n₂) = σ√(1/n₁ + 1/n₂).

The test statistic is given by

z = (x̄₁ - x̄₂) / [σ√(1/n₁ + 1/n₂)].

If H₀ is true it can be shown that the random variable z is a standard normal variable. The level of significance of a particular result z₀ can be found as in section 7.2.

7.3.2 σ unknown
Following the method adopted in section 7.2 we replace σ with the sample standard deviation, s, to obtain the test statistic

t = (x̄₁ - x̄₂) / [s√(1/n₁ + 1/n₂)].

The standard deviation of the first sample is given by

s₁ = √[Σ(xᵢ - x̄₁)²/(n₁ - 1)].


The standard deviation of the second sample is given by

s₂ = √[Σ(x′ᵢ - x̄₂)²/(n₂ - 1)].

Then the combined unbiased estimate of σ² is given by

s² = [(n₁-1)s₁² + (n₂-1)s₂²] / (n₁ + n₂ - 2)
   = [Σ(xᵢ - x̄₁)² + Σ(x′ᵢ - x̄₂)²] / (n₁ + n₂ - 2).

This result follows from the fact that if X, Y are independent χ² random variables with ν₁, ν₂ degrees of freedom, then X + Y is also a χ² random variable with (ν₁ + ν₂) degrees of freedom. Thus

(n₁-1)s₁²/σ² + (n₂-1)s₂²/σ²    is χ² with (n₁-1) + (n₂-1) d.f.

As the mean value of the χ² distribution is equal to the number of degrees of freedom, we find

E{[(n₁-1)s₁² + (n₂-1)s₂²] / (n₁ + n₂ - 2)} = σ².

The denominator (n₁ + n₂ - 2) represents the number of degrees of freedom of this estimate of the variance, and is equal to the number of degrees of freedom of the estimate s₁² plus the number of degrees of freedom of the estimate s₂². Note that if n₁ = n₂ then we have

s² = (s₁² + s₂²)/2.

If the null hypothesis is true, it can be shown that the test statistic t follows a t-distribution with (n₁ + n₂ - 2) degrees of freedom. Thus the level of significance of a particular result, t₀, can be found as before.

Example 5
Two batches of a certain chemical were delivered to a factory. For each batch ten determinations were made of the percentage of manganese in the chemical. The results were as follows:

Batch 1:  3.3  3.7  3.5  4.1  3.4  3.5  4.0  3.8  3.2  3.7
Batch 2:  3.2  3.6  3.1  3.4  3.0  3.4  2.8  3.1  3.3  3.6



Is there a significant difference between the two sample means?
From the data we find

                 Batch 1    Batch 2
Σxᵢ              36.2       32.5
Σxᵢ²             131.82     106.23
Σ(xᵢ - x̄)²       0.776      0.605
n                10         10

Hence

x̄₁ = 3.62        x̄₂ = 3.25
s₁² = 0.776/9 = 0.0862
s₂² = 0.605/9 = 0.0672.

It is reasonable to assume that both sets of observations are normally distributed and that both sample variances are estimates of the same population variance (see Example 10). The combined estimate of this variance is given by

s² = (s₁² + s₂²)/2 = 0.0767
s = 0.277.

If the true percentages of manganese in batch 1 and batch 2 are denoted by μ₁ and μ₂ respectively, the null hypothesis is given by H₀: μ₁ = μ₂, and the alternative hypothesis is given by H₁: μ₁ ≠ μ₂. Thus a two-tailed test is appropriate. The test statistic is given by

t₀ = (x̄₁ - x̄₂) / [s√(1/10 + 1/10)] = 3.0.

If H₀ is true the sampling distribution of t will be a t-distribution with eighteen degrees of freedom, as the estimate s is based on eighteen degrees of freedom. But from Table 2, Appendix B, we find t_{0.005,18} = 2.88, so that

probability (|t| > 2.88) = 0.01.

Thus the result is significant at the 1 per cent level and we have strong evidence that H₀ is untrue.
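For comparison, SciPy's pooled-variance two-sample t-test reproduces this analysis directly (an added illustration; equal_var=True selects the combined estimate s² used above):

from scipy import stats

batch1 = [3.3, 3.7, 3.5, 4.1, 3.4, 3.5, 4.0, 3.8, 3.2, 3.7]
batch2 = [3.2, 3.6, 3.1, 3.4, 3.0, 3.4, 2.8, 3.1, 3.3, 3.6]
t0, p = stats.ttest_ind(batch1, batch2, equal_var=True)
print(f"t0 = {t0:.1f} on 18 d.f., two-tailed P-value = {p:.4f}")
# t0 = 3.0 with P about 0.008, significant at the 1 per cent level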



7.4 The t-test applied to paired comparisons
When comparing two different methods, it often happens that experiments are carried out in pairs. Then it is the difference between each pair of measurements which is of interest.

Example 6
In order to compare two methods for finding the percentage of iron in a compound, ten different compounds were analysed by both methods and the results are given below.

Compound    Method           Compound    Method
number      A       B        number      A       B
1           13.3    13.4     6           3.7     4.0
2           17.6    17.9     7           5.1     5.1
3           4.1     4.1      8           7.9     8.0
4           17.2    17.0     9           8.7     8.8
5           10.1    10.3     10          11.6    12.0

We ask the question 'is there a significant difference between the two methods of analysis?'
Note that it would be wrong to calculate the average percentage for method A and for method B and proceed as in section 7.3, because the variation between compounds will swamp any difference there may be between the two methods. Instead we compute the difference for each compound as below.

Compound                Compound
number     Difference   number     Difference
1          0.1          6          0.3
2          0.3          7          0.0
3          0.0          8          0.1
4          -0.2         9          0.1
5          0.2          10         0.4

If the two methods give similar results, the above differences should be a sample of ten observations from a population with mean zero.
Generally we have k pairs of measurements (x₁ⱼ, x₂ⱼ), (j = 1, 2, ..., k), which are independent observations from populations with means μ₁ⱼ, μ₂ⱼ. The null hypothesis is that each pair of means are equal:

H₀: μ₁ⱼ = μ₂ⱼ   (for all j).



Then the differences

dⱼ = x₁ⱼ - x₂ⱼ   (j = 1, ..., k)

will be a sample, size k, from a population with mean zero. Furthermore, if the populations are approximately normally distributed, the differences will also be approximately normally distributed. If the observed average difference is denoted by d̄ and the standard deviation of the observed differences by s_d, then the standard error of d̄ is given by s_d/√k. We now apply a t-test, as in section 7.2, by calculating the test statistic

t = d̄ / (s_d/√k).

If H₀ is true, the distribution of t will be a t-distribution with (k-1) degrees of freedom, as the estimate s_d is calculated from k differences.

Example 6 continued

d̄ = average difference = 0.13

s_d² = Σ(dⱼ - d̄)²/(k - 1) = 0.031,   s_d = 0.176

t₀ = d̄ / (s_d/√10) = 2.33.

The alternative hypothesis is of the form H₁: μ₁ⱼ ≠ μ₂ⱼ for all j, and so a two-tailed test is required. From Table 2, Appendix B, we find

t_{0.025,9} = 2.26.

Thus P(|t| > 2.26) = 0.05.

As t₀ is greater than 2.26, the result is significant at the 5 per cent level, and so we have reasonable evidence that H₀ is untrue.
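The paired analysis corresponds to SciPy's ttest_rel, which applies the one-sample t-test to the differences (an added sketch for illustration):

from scipy import stats

a = [13.3, 17.6, 4.1, 17.2, 10.1, 3.7, 5.1, 7.9, 8.7, 11.6]   # method A
b = [13.4, 17.9, 4.1, 17.0, 10.3, 4.0, 5.1, 8.0, 8.8, 12.0]   # method B
t0, p = stats.ttest_rel(b, a)             # t-test on the differences b - a
print(f"t0 = {t0:.2f} on 9 d.f., two-tailed P-value = {p:.3f}")
# t0 = 2.33 with P about 0.045, significant at the 5 per cent level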

7.5 The χ² goodness-of-fit test

We now turn our attention to a different type of problem. Data can often be classified into k mutually exclusive classes or categories. Then we need a test to see if the observed frequencies in each category are significantly different from those which could be expected if some hypothesis were true.



Let us suppose that this hypothesis suggests that p₁, p₂, ..., p_k are the respective probabilities of the k categories, where Σⱼ pⱼ = 1. That is,

H₀: pᵢ = probability that the ith outcome will occur (i = 1, ..., k).

If n experiments are performed, the expected number of occurrences of the ith outcome is given by eᵢ = npᵢ. Denote the observed frequency of the ith outcome by oᵢ. Then we want to know if o₁, ..., o_k are compatible with e₁, ..., e_k.

Theorem
If o₁, ..., o_k and np₁, ..., np_k are the observed and expected frequencies for the k possible outcomes of an experiment, then, for large n, the distribution of the quantity

χ² = Σᵢ₌₁ᵏ (oᵢ - npᵢ)²/npᵢ

is approximately that of a χ² random variable with (k-1) degrees of freedom. One degree of freedom is lost because of the constraint

Σᵢ₌₁ᵏ oᵢ = n = Σᵢ₌₁ᵏ npᵢ.

The χ² distribution was introduced in section 6.4 and percentage points are given in Table 3, Appendix B. The point χ²_{α,ν} is such that there is a probability α of observing a larger value of χ² with ν degrees of freedom. The observed value of the χ² test statistic will be denoted by χ₀². If the null hypothesis is not true, then we expect χ₀² to be 'large', and as we are only interested in significantly large values of the test statistic, we have a one-tailed test in which the level of significance of the observed result is given by P(χ² > χ₀²).

Example 7
Assuming that a die is fair we have

pᵢ = probability that an i turns up = 1/6   (i = 1, 2, 3, 4, 5, 6).



A die was tossed 120 times and the following frequencies occurred:

o₁ = 17   o₂ = 18   o₃ = 24
o₄ = 26   o₅ = 21   o₆ = 14.

Test the hypothesis that the die is fair.
If the hypothesis is true, the expected frequency of each outcome is given by

eᵢ = 120 × 1/6 = 20   (i = 1, ..., 6).

Thus we have

χ₀² = (17-20)²/20 + (18-20)²/20 + (24-20)²/20 + (26-20)²/20 + (21-20)²/20 + (14-20)²/20
    = 5.1.

From Table 3, Appendix B, we have

χ²_{0.05,5} = 11.07.

As χ₀² is less than 11.07, the result is not significant at the 5 per cent level and we can accept the null hypothesis that the die is fair.
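As an added illustration, here is the same test with SciPy's chisquare function, which assumes equal expected frequencies unless told otherwise:

from scipy import stats

observed = [17, 18, 24, 26, 21, 14]
chi2_0, p = stats.chisquare(observed)     # expected frequency 120/6 = 20 per cell
print(f"chi-squared = {chi2_0:.1f} on 5 d.f., P-value = {p:.2f}")
# gives 5.1 with P about 0.40, so the null hypothesis is accepted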
We have noted that the distribution of the test statistic is only approximately that of a χ² random variable. In order to ensure that the approximation is adequate, sufficient observations should be taken so that eᵢ ≥ 5 for all i. However with less than five categories it is better to have the expected frequencies somewhat larger. If the expected number in any category is too small, the category should be combined with one or more neighbouring categories. (If there are more than about ten categories, then the approximation is valid provided that less than 20 per cent of the values of eᵢ are less than five, and provided that none is less than one.)
The χ² test can also be used to test goodness-of-fit when the null hypothesis depends on unknown parameters which must be estimated from the data. One degree of freedom is deducted for each parameter estimated from the data. Note that it is preferable to use maximum likelihood estimates. The test statistic is often written

χ² = Σ (observed - expected)²/expected.



Example 8
In Example 12, Chapter 4, we saw how to fit a Poisson distribution to the data of Example 2, Chapter 1. At the time we simply noted that there appeared to be good agreement between observed and expected frequencies. We can now confirm this with a goodness-of-fit test.
The expected Poisson frequencies for r = 4 and r = 5+ are less than five, and so are too small to be treated separately. They are therefore combined with the results for r = 3 to obtain the following observed and expected frequencies.

       Observed     Poisson
r      frequency    frequency
0      13           12.0
1      13           14.4
2      8            8.7
3+     6            4.9

The test statistic is given by

χ₀² = (13-12.0)²/12.0 + (13-14.4)²/14.4 + (8-8.7)²/8.7 + (6-4.9)²/4.9
    = 0.52.

We now have two linear restrictions on the frequencies. The sums of the observed and expected frequencies are both equal to forty. In addition the means of the observed distribution and of the fitted Poisson distribution are both equal to 1.2. Therefore, since there are four cells, the number of degrees of freedom is given by (4 - 2) = 2. From Table 3, Appendix B, we have

χ²_{0.05,2} = 5.99.

The observed value of χ² is much smaller than the critical value so we can accept the null hypothesis that the Poisson distribution gives a good fit.
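An added sketch of the same calculation; the ddof argument removes one further degree of freedom for the Poisson mean estimated from the data, leaving 4 - 1 - 1 = 2 d.f.:

from scipy import stats

observed = [13, 13, 8, 6]               # r = 0, 1, 2, 3+ after combining cells
expected = [12.0, 14.4, 8.7, 4.9]       # fitted Poisson frequencies from the text
chi2_0, p = stats.chisquare(observed, f_exp=expected, ddof=1)
print(f"chi-squared = {chi2_0:.2f} on 2 d.f., P-value = {p:.2f}")
# gives 0.52 with P about 0.77: a good fit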

7.5.1 Testing independence in a two-way table

A series of observations can often be classified by two types of characteristic into a two-way table.



Example 9
A company manufactures a washing machine at two different factories. A survey is taken on a batch of machines from each factory and a record is kept of whether or not each machine requires a service call during the first six months.

              No service    Service call
              call          required
Factory A     80            32
Factory B     63            33

Is there evidence that one factory produces more trouble-free appliances than the other?
The above is an example of a 2 × 2 two-way table. The general two-way table will have r rows and c columns. If n observations are taken, let nᵢⱼ be the number of observations which fall in the ith row and jth column.

Table 12
r × c Two-way Table

n₁₁   n₁₂   ...   n₁c   | n₁.
n₂₁   n₂₂   ...   n₂c   | n₂.
...
n_r₁  n_r₂  ...   n_rc  | n_r.
---------------------------------
n.₁   n.₂   ...   n.c   | n

Let

nᵢ. = Σⱼ nᵢⱼ = number of observations in the ith row,
n.ⱼ = Σᵢ nᵢⱼ = number of observations in the jth column.

Thus

n = Σᵢ nᵢ. = Σⱼ n.ⱼ.
"r
Generally speaking we are interested in testing the independence of the two types of classification. For this reason two-way tables are often called contingency tables, because we may ask if the presence of one characteristic is contingent on the presence of another. Thus, in Example 9, if factory A produces a higher proportion of trouble-free appliances than factory B, then a machine is more likely to be 'no call' if it comes from factory A than if it comes from factory B. In such a case the rows and columns are not independent.
We can formalize the null hypothesis as follows. Let pᵢⱼ be the probability that an item selected at random will be in the ith row and jth column. Let pᵢ. be the probability that an item will be in the ith row and p.ⱼ the probability that an item will be in the jth column. Then the null hypothesis that rows and columns are independent is given by

H₀: pᵢⱼ = pᵢ. p.ⱼ   for all i, j.

It is easy to show that the maximum likelihood estimates of pᵢ. and p.ⱼ are given by the intuitive estimates nᵢ./n and n.ⱼ/n. Thus if H₀ is true an estimate of the expected frequency in the cell in the ith row and jth column is given by

n p̂ᵢⱼ = n p̂ᵢ. p̂.ⱼ = n (nᵢ./n)(n.ⱼ/n) = nᵢ. n.ⱼ/n.

This can be compared with the observed value nᵢⱼ. The test statistic is given by

χ₀² = Σᵢ₌₁ʳ Σⱼ₌₁ᶜ (nᵢⱼ - nᵢ.n.ⱼ/n)² / (nᵢ.n.ⱼ/n).

The number of degrees of freedom is obtained as follows. As the sums of the observed and expected frequencies are equal, this results in the loss of one degree of freedom. In addition the parameters p₁., p₂., ..., p_r., p.₁, p.₂, ..., p.c are estimated from the data. However, since Σᵢ p̂ᵢ. = Σⱼ p̂.ⱼ = 1, only (r + c - 2) independent estimates have to be made. Thus the number of degrees of freedom is given by

rc - 1 - (r + c - 2) = (r-1)(c-1).



Example 9 continued
The row and column sums are given by

n₁. = 112   n₂. = 96   n.₁ = 143   n.₂ = 65.

The total number of observations is given by n = 208.
Let eᵢⱼ denote the expected number of observations in the ith row and jth column. Thus

eᵢⱼ = nᵢ.n.ⱼ/n.

We find

e₁₁ = 77   e₁₂ = 35   e₂₁ = 66   e₂₂ = 30.

Thus

χ₀² = (80-77)²/77 + (32-35)²/35 + (63-66)²/66 + (33-30)²/30
    = 0.82.†

Since the number of degrees of freedom is (2-1)(2-1) = 1,

χ²_{0.05,1} = 3.84.

Thus the result is not significant at the 5 per cent level and we have no real evidence that factory A produces more trouble-free appliances than factory B.
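As an added illustration, SciPy's chi2_contingency computes the expected frequencies nᵢ.n.ⱼ/n and the degrees of freedom automatically:

from scipy import stats

table = [[80, 32],
         [63, 33]]
chi2_0, p, df, expected = stats.chi2_contingency(table, correction=False)
print(f"chi-squared = {chi2_0:.2f} on {df} d.f., P-value = {p:.2f}")
# gives 0.81 on 1 d.f. with P about 0.37, not significant at the 5 per cent level
# (correction=True applies a continuity correction, slightly more accurate for a 2 x 2 table)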
We have not, as yet, specified an alternative hypothesis when testing independence in a two-way table. If the value of χ² for a two-way table is found to be significantly large, then the null hypothesis must be rejected. Occasionally we will indeed have a specific alternative hypothesis in mind; but more generally a common procedure is simply to look at the data and see where large discrepancies between observed and expected frequencies occur. This may suggest a suitable hypothesis to describe the data.

7.5.2 Some remarks on the χ² test

The above comments on the alternative hypothesis also apply in other situations where it is not clear what H₁ is. Thus in Example 8, if we had found that the Poisson distribution did not give a good fit, then we would have had to consider other models for the data, bearing in mind the observed discrepancies from the Poisson model.

† With a 2 × 2 table, it is slightly more accurate to calculate χ₀² with a continuity correction, replacing each (observed - expected)² by (|observed - expected| - ½)².


In all applications of the χ² test the number of degrees of freedom is given by (k - m), where there are k cells and m linear constraints between the observed and expected frequencies. It is often useful to combine the results of successive experiments made at different times on the same problem. This is done by adding the observed values of χ² and also adding the numbers of degrees of freedom from each individual experiment. The total value of χ² is then tested with the total number of degrees of freedom. This may give a significant value even though some of the individual values are not significant.

7.6 The F-test
This significance test is widely used for comparing different estimates of variance, particularly in the analysis of variance which is described in Chapter 10.
Suppose that we have a normal distribution with variance σ². Two random samples, sizes n₁ and n₂, are drawn from this population and the two sample variances s₁² and s₂² are calculated in the usual way. As s₁² and s₂² are both estimates of the same quantity we expect the ratio s₁²/s₂² to be 'close' to unity, provided that the samples are reasonably large. If we take repeated pairs of samples, size n₁ and n₂, it can be shown that the ratio F = s₁²/s₂² will follow a distribution called the F-distribution.
A random variable which follows the F-distribution can be obtained from two independent χ² random variables (see Appendix A). The distribution depends on two parameters, ν₁ and ν₂, which are the numbers of degrees of freedom of s₁² and s₂² respectively. Note that the estimates s₁² and s₂² must be independent. In the above situation we have ν₁ = n₁ - 1 and ν₂ = n₂ - 1.
An F-test is carried out in the following way. Let s₁² and s₂² be independent estimates of σ₁² and σ₂² respectively, and assume that the observations in the two samples are normally distributed. Then we are often interested in testing the hypothesis that s₁² and s₂² are both estimates of the same variance σ². In other words we want to test the null hypothesis H₀: σ₁² = σ₂² = σ². For an alternative hypothesis of the form H₁: σ₁² > σ₂², a one-tailed test would be appropriate. If the ratio s₁²/s₂² is much greater than one then we will be inclined to reject H₀. In order to see if the observed ratio s₁²/s₂² is significantly large, it is compared with the upper percentage points of the F-distribution which are given in Table 4, Appendix B. The point F_{α,ν₁,ν₂} is the point on the F-distribution, with ν₁ and ν₂ d.f., such that a proportion α of the distribution lies above it. If, for example, we find s₁²/s₂² > F_{0.05,ν₁,ν₂}, where s₁², s₂² are based on ν₁, ν₂ d.f. respectively, then the result is significant at the 5 per cent level and we have reasonable evidence that H₀ is untrue.

Figure 39 An F-distribution

Occasionally, as in Example 10, the alternative hypothesis is of the form H₁: σ₁² ≠ σ₂², in which case a two-tailed test is appropriate. In this case the test statistic is chosen so that the larger sample variance is in the numerator, as this enables us to compare it with the upper percentage points of the F-distribution. If this is not done and the test statistic is less than one, then it must be compared with the lower percentage points of the F-distribution, which can be found using the relationship

F_{1-½α,ν₁,ν₂} = 1/F_{½α,ν₂,ν₁}.

Example 10
In Example 5 we assumed that the two sample variances were estimates of the same population variance. We can now verify this statement with an F-test.
Let s₁² be an estimate of σ₁², and let s₂² be an estimate of σ₂². Then the two hypotheses are given by

H₀: σ₁² = σ₂²,
H₁: σ₁² ≠ σ₂².



A two-tailed test is required. We must assume that both populations are normally distributed. From the data we find s₁² = 0.0862 and s₂² = 0.0672.
The test statistic is given by

F₀ = s₁²/s₂²   (where s₁² > s₂²)
   = 1.28.

(If we had found s₁² < s₂², the test statistic would have been F₀ = s₂²/s₁².) The number of degrees of freedom of both s₁² and s₂² is nine. From Table 4, Appendix B, we find F_{0.025,9,9} = 4.03, so if s₁²/s₂² were greater than 4.03, the result would be significant at the 5 per cent level. (As we are running a two-tailed test, the level of significance is 2 × 0.025.) In fact, the observed test statistic is much smaller than 4.03 and we conclude that it is reasonable to accept the null hypothesis.
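SciPy has no single variance-ratio test function, but the comparison above is easily sketched by hand (an added illustration):

from scipy import stats

s1_sq, s2_sq, df1, df2 = 0.0862, 0.0672, 9, 9
F0 = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)    # larger sample variance on top
critical = stats.f.ppf(1 - 0.025, df1, df2)   # upper 2.5% point, for a two-tailed 5% test
print(f"F0 = {F0:.2f}, critical value = {critical:.2f}")
# F0 = 1.28 against a critical value of 4.03, so H0 is accepted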

7.7 Distribution-free or non-parametric tests

In order to apply the z-test or t-test described in section 7.2, it is necessary to assume that the observations are approximately normally distributed. Occasionally it will not be possible to make this assumption; for example, when the distribution of observations is clearly skewed. Thus a group of tests have been devised in which no assumptions are made about the distribution of the observations. For this reason the tests are called distribution-free. Since distributions are compared without the use of parameters, the tests are sometimes called non-parametric, though this term can be misleading. The simplest example of a distribution-free test is the sign test, which will be applied to the data of Example 6.

Example 11
The ten differences between the results of method A and method B are

0.1  0.3  0.0  -0.2  0.2  0.3  0.0  0.1  0.1  0.4

Thus seven of the differences have a positive sign, one has a negative sign and two are zero.
Applying the null hypothesis that the two methods give similar results, the differences are equally likely to have a positive or negative sign. If we disregard the two zero differences, the number of positive differences in the remaining eight should follow a binomial distribution with n = 8 and p = ½. The expected number of positive differences is given by np = 4, but in actual fact there are seven. The level of significance of this result is the probability of observing a result which is as extreme or more extreme. We would be equally suspicious of seven negative differences and even more suspicious of eight positive or eight negative differences. Thus the level of significance is given by

P(0, 1, 7 or 8 positive differences) = (1 + 8 + 8 + 1)(½)⁸ = 0.07.

Thus the result is not significant at the 5 per cent level.
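An added sketch of this binomial calculation (under H₀ the number of positive differences among the eight non-zero ones is binomial with n = 8, p = ½):

from scipy import stats

p = stats.binom.cdf(1, 8, 0.5) + stats.binom.sf(6, 8, 0.5)   # P(0, 1, 7 or 8 positives)
print(f"level of significance = {p:.3f}")                    # about 0.07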
The result in Example 11 contrasts with the result of Example 6, where a significant result was obtained by using a t-test. Clearly, the conclusions obtained from different significance tests need not always be the same. A general method of comparing significance tests is given in section 7.8. However it is clear that the sign test is not very discriminatory, since it takes no account of the magnitude of the differences. If a distribution-free test is required, it is better to use the Wilcoxon signed rank test, which is equivalent to the Mann-Whitney U-test. This test is widely used in the social sciences and a description can be found in many books such as Gibbons (1971).
In practice, although the true distribution of the observations is seldom known, a preliminary examination of the data is often sufficient to make the assumption that the observations are approximately normally distributed. Moreover it can be shown that small departures from normality do not seriously affect the tests based on the normal assumption which have been described in this chapter. For this reason, these tests are often called robust. The tests based on the normal assumption are used much more frequently in the applied sciences than distribution-free tests.

7.8 Power and other considerations

After carrying out a significance test, we have some evidence on which we have to decide whether or not to reject the null hypothesis. Generally speaking H₀ is rejected if the observed value of the test statistic is larger (or smaller) than a particular critical value. This critical value should be chosen before the observations are taken. It is often chosen so that there is at most a 5 per cent chance of rejecting H₀ when it is actually true.



The student is often dismayed by the fact that however the critical value is chosen, it is still possible to make a mistake in two different ways. Firstly it is possible to get a significant result when the null hypothesis is true. This is called an error of type I. Secondly it is possible to get a non-significant result when the null hypothesis is false. This is called an error of type II.

              H₀ is true          H₀ is false
Accept H₀     Correct decision    Type II error
Reject H₀     Type I error        Correct decision

Example 12
In Example 1 we showed that in order to test

H₀: μ = 1250
against H₁: μ > 1250

it was necessary to consider the test statistic

z = (x̄ - 1250)/(150/√25).

If H₀ is true, this is a standard normal variable and

probability (z > 1.64) = 0.05.

Thus H₀ is rejected at the 5 per cent level if the observed value of z is greater than 1.64. Therefore 1.64 is the critical value of z. This is equivalent to rejecting H₀ if x̄ > 1250 + (1.64 × 150)/√25 = 1299.2. So the corresponding critical value of x̄ is 1299.2. Here we have chosen the critical value in such a way that the probability of an error of type I is 5 per cent (see Figure 40).
In general the probability of getting a significant result when H₀ is true is denoted by α. The above choice of α = 0.05 was quite arbitrary. If changeover costs are high, so that the experimenter requires strong evidence of an improvement, then it would be better to choose α = 0.01. If H₀ is true we have

probability [(x̄ - 1250)/(150/√25) < 2.33] = 0.99,

so that the critical value of z would then be 2.33.



Figure 40  Type I and type II errors
(a) H₀ true: sampling distribution of x̄ is N(1250, 30²)
(b) H₁ true: sampling distribution of x̄ is N(1320, 30²); β = 0.24

If we have a specific alternative hypothesis and a given value of α, we can also find the probability of an error of type II. This is often denoted by β. For example, let us suppose that the new process really is better and that μ = 1320. Then x̄ would be normally distributed with mean 1320 and standard deviation 150/√25 = 30. But if H₀ is tested with α = 0.05, we have seen that the critical value of x̄ is 1299.2. If the observed value of x̄ exceeds the critical value, then H₀ will be correctly rejected. But a type II error will occur if the observed value of x̄ is less than the critical value, so that H₀ is accepted when it is false. If the above alternative hypothesis is true we have

probability (x̄ < 1299.2) = probability [z < (1299.2 - 1320)/30]
                         = probability (z < -0.69)
                         = 0.24,

and this is the probability of an error of type II.
We can also find the probability of an error of type II for any other value of α. For example if α is chosen to be 0.01, the critical value of x̄ is 1250 + 2.33 × 30 = 1319.9. If the above alternative hypothesis is actually true, the chance of getting a non-significant result is given by

probability (x̄ < 1319.9) = probability [z < (1319.9 - 1320)/30]
                         = probability (z < 0.0)
                         = 0.5.

Thus, with the lower level of significance, there is a much higher chance
of making an error of type II.
From Example 12 it can be seen that the two types of error are dependent on one another. For example, if the critical value is increased in order to reduce the probability of a type I error, then the probability of a type II error will increase. A 'good' test of significance is one which minimizes the probabilities of these two types of error in some way. For a given value of α we want to choose the significance test which minimizes β.
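The two error probabilities of Example 12 can be traced numerically; the following added sketch uses the same rounded percentage points as the text:

from math import sqrt
from scipy import stats

mu0, mu1, sigma, n = 1250, 1320, 150, 25
se = sigma / sqrt(n)                                   # 30
for alpha, z_alpha in ((0.05, 1.64), (0.01, 2.33)):
    x_crit = mu0 + z_alpha * se                        # critical value of the sample mean
    beta = stats.norm.cdf((x_crit - mu1) / se)         # P(accept H0 when mu = 1320)
    print(f"alpha = {alpha}: critical value = {x_crit:.1f}, beta = {beta:.2f}")
# alpha = 0.05 gives 1299.2 and beta = 0.24; alpha = 0.01 gives 1319.9 and beta = 0.50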
An alternative way of looking at this problem is to look at the power of the test. For a specific alternative hypothesis, the power of a significance test is obtained by calculating the probability that H₀ is rejected when it is false (that is, the correct decision is made). Then

power = 1 - probability (error of type II) = 1 - β.

For a given value of α, we would like to choose the significance test which has the maximum power.
In general H₁ is often non-specific. For example, if we want to test

H₀: μ = μ₀
against H₁: μ > μ₀,

it is necessary to construct a power curve by calculating the probability of rejecting H₀ for different values of μ, and plotting this against μ. The critical value is chosen so that this probability is equal to α when μ = μ₀.


Suppose we wanted to compare the t-test with some other significance test, such as the sign test. This can be done by constructing the power curves for both tests after choosing the critical values in each test so that each has the same value for α. If the assumptions for the t-test are valid (that is, the sample is randomly selected from a normal population), it can be shown that the power curve of the t-test is always above the power curve of any other test for μ > μ₀. In other words the t-test is then the most powerful test.
We will not attempt to prove this result, nor will we consider the power of any of the other tests considered in this chapter. In fact it can be shown that all the tests described in this chapter which are based on the normal distribution are most powerful under the stated assumptions. However the χ² goodness-of-fit test can have rather poor power properties against certain types of alternative. A more detailed discussion of these topics may be found in many books such as Wine (1964) and Snedecor and Cochran (1980).
By now the reader should be well aware of the fact that one cannot be certain of making the correct decision as a result of a significance test. It is possible to make an error of type I or of type II. The importance of statistics is that it enables the experimenter to come to a decision in an objective way when faced with experimental uncertainty. However it is always a good idea to give a full statement of the results of an analysis rather than simply to say that a result is significant or non-significant.
We will conclude this chapter by commenting on a topic which was mentioned briefly earlier. So far we have considered situations in which the sample size has been chosen arbitrarily. However this will sometimes mean that the risk associated with a decision is unacceptably large. Thus it is a good idea to choose the sample size in a scientific way.
The following is a typical situation. A random sample is taken from N(μ, σ²), where σ² is known but μ is unknown. It is required to test H₀: μ = μ₀ against H₁: μ > μ₀. If the sample is of size n the test statistic is given by z = (x̄ - μ₀)/(σ/√n). The critical value of this statistic can be chosen so that the probability of an error of type I is equal to α. Suppose we also want to ensure that the probability of an error of type II is less than β if μ is actually equal to μ₁ (> μ₀). Then it can be shown that the size of the sample must be at least

n = σ²(z_α + z_β)²/(μ₁ - μ₀)².
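For instance (an added illustration, with numbers borrowed from Example 12 purely by way of assumption), with σ = 150, μ₀ = 1250 and a requirement that α = β = 0.05 when μ₁ = 1320:

from math import ceil
from scipy import stats

sigma, delta, alpha, beta = 150, 70, 0.05, 0.05        # delta = mu1 - mu0
z_a = stats.norm.ppf(1 - alpha)
z_b = stats.norm.ppf(1 - beta)
n = sigma**2 * (z_a + z_b)**2 / delta**2
print(ceil(n))          # about 50 observations, double the sample of Example 1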

Further discussion of the choice of sample size is given for example by


Snedecor and Cochran (1980).



A scientific choice of the sample size is also desirable when H₀ is not precise or when the choice of a precise H₀ conflicts with the idea that the burden of proof should be on H₁. For example suppose that a chemist wishes to decide if a certain method of analysis gives unbiased measurements by testing it on several standard solutions of known concentration. There are two rival theories in this situation, namely:
(a) the method gives unbiased results;
(b) the method gives biased results.
Here the burden of proof is on hypothesis (a), suggesting that hypothesis (b) should be chosen as H₀. But most statisticians would choose hypothesis (a) as H₀ because it is precise. This doubtful procedure can be remedied by choosing the sample size large enough to give a guaranteed power for a specific bias.

Exercises
1. Test the hypothesis that the random sample

12.1  12.3  11.8  11.9  12.8  12.4

came from a normal population with mean 12.0. The standard deviation of the measurements is known to be 0.4.
Also construct a 95 per cent confidence interval for the true mean, μ.
2. Repeat question one without assuming that the standard deviation is known to be 0.4. In other words estimate the population variance from the sample measurements and use a t-test.
3. A manufacturer claims that the percentage of phosphorus in a fertilizer is at least 3 per cent. Ten small samples are taken from a batch and the percentage of phosphorus in each is measured. The ten measurements have a sample mean of 2.5 per cent and a sample standard deviation of 0.5 per cent. Is this sample mean significantly below the claimed value? State the null hypothesis and the alternative hypothesis and say if a one- or two-tailed test is required.
4. The strength of paper used by a certain company is approximately
normally distributed with mean 30 p.s.i. and standard deviation 3 p.s.i.
The company decides to test a new source of paper made by a different
manufacturer. If this paper is significantly stronger, then the company
will switch its trade to this new manufacturer. A batch of paper is

obtained from this manufacturer and a series of measurements are to be
made on the strength of different pieces of paper from the batch.
Assuming that the standard deviation of these measurements is also
3 p.s.i., how large a sample size should be chosen so as to be 95 per cent
certain of detecting a mean increase in strength of 2 p.s.i. with a one-
tailed test at the 5 per cent level?
5. For a certain chemical product it is thought that the true percentage of phosphorus is 3 per cent. Ten analyses give x̄ = 3.3 per cent and s = 0.2 per cent. Is the sample mean significantly different from 3 per cent? (This question differs from question 3 because we are interested in departures from 3 per cent in either direction.)
6. One sample of fifteen observations has x̄₁ = 82 and s₁ = 5. A second sample of ten observations taken by a different scientist has x̄₂ = 88 and s₂ = 7. Is there a significant difference between the two sample means at the (a) 0.05 and (b) 0.01 level of significance? (You may assume that the two populations have equal variances.)
7. Test the hypothesis that the following set of 200 numbers are 'random digits', that is, each number is equally likely to be 0, 1, 2, ..., 9.

r           0   1   2   3   4   5   6   7   8   9
Frequency   22  16  15  18  16  25  23  17  24  24
8. The following figures show the number of accidents to 647 women in a period of five weeks while working on the manufacture of shells. (Source: M. Greenwood and G. U. Yule, Journal of the Royal Statistical Society, 1920.)

Number of accidents   0    1    2   3   4   5   6+
Frequency             447  132  42  21  3   2   0

Find the Poisson distribution with the same mean. Test the hypothesis that the Poisson distribution gives a good fit to the data.
9. In order to test the effectiveness of a new drug in treating a particular
disease, seventy patients suffering from the disease were randomly
divided into two groups. The first group was treated with the drug and
the second group was treated in the standard way. The results were as
follows.
           Recover   Die
Drug       20        15
No drug    13        22

Test the hypothesis that the drug has no effect.



References

Gibbons, J. D. (1971), Nonparametric Statistical Inference, McGraw-Hill.
Snedecor, G. W., and Cochran, W. G. (1980), Statistical Methods, 7th edn, Iowa State University Press.
Wine, R. L. (1964), Statistics for Scientists and Engineers, Prentice-Hall.

Addendum
Significance tests can sometimes be very helpful, but I am disturbed at the present
tendency of non-statisticians to overdo significance tests. For example there is no point
in testing null hypotheses which are obviously untrue, as when the author was recently
asked to analyse some biological data and test the null hypothesis that water has no
effect on plant growth. The danger here is that the test may be carried out incorrectly
or that an inadequate sample size may lead to the null hypothesis being incorrectly
accepted.
There is also no point in testing a null hypothesis which is obviously going to be
accepted, as when the author was recently asked to test the null hypothesis that μ ≤ μ₀,
where μ was an unknown population mean and μ₀ a specific value, and the sample
mean was less than μ₀.
Another danger with significance tests is that the assumptions on which they are
based may not be satisfied, if for example the data are correlated, rather than
independent.
In some journals, particularly medical journals, it has become virtually impossible to
present any results without carrying out a test and 'giving a P-value'. This is very
dangerous. It should be realised that in general (though not always), estimation is more
important than significance testing.

Two tips, which are well worth remembering, are that:

(1) A non-significant difference is not necessarily the same thing as no difference.
(2) A significant difference is not necessarily the same thing as an interesting difference.

Chapter 8
Regression and correlation

In previous chapters we have been mainly concerned with the behaviour


of one variable without reference to the behaviour of any other variable.
In this chapter we consider the situation in which simultaneous
measurements are taken on two (or more) variables.

8.1 Scatter diagram


Let us suppose that n pairs of measurements, (x₁, y₁), (x₂, y₂), ...,
(xₙ, yₙ), are made on two variables x and y. The first step in the investigation
is to plot the data on a scatter diagram in order to get a rough idea
of the relationship (if any) between x and y.

Example 1
An experiment was set up to investigate the variation of the specific
heat of a certain chemical with temperature. Two measurements of
the specific heat were taken at each of a series of temperatures. The
following results were obtained.

Temperature °C   50    60    70    80    90    100

Specific heat    1.60  1.63  1.67  1.70  1.71  1.71
                 1.64  1.65  1.67  1.72  1.72  1.74

Plot the results on a scatter diagram.

Figure 41 Scatter diagram of data of Example 1

8.2 Curve fitting


It is often possible to see, by looking at the scatter diagram, that a
smooth curve can be fitted to the data. In particular if a straight line
can be fitted to the data then we say that a linear relationship exists
between the two variables. Otherwise the relationship is non-linear.
Situations sometimes occur, particularly in physics and chemistry,
in which there is an exact functional relationship between the two
variables and in addition the measurement error is very small. In
such a case it will usually be sufficiently accurate to draw a smooth
curve through the observed points by eye. Here there is very little
experimental uncertainty and no statistical analysis is really required.
However, most data do not give such a clear relationship, and an
objective statistical approach is required. In the first part of this chapter,
we discuss the situation where the values of one variable, called the
response or dependent variable, depend on the selected values of one (or
more) variables which can be determined by the experimenter. Such
variables are variously called controlled, regressor, predictor, explanatory
or independent variables, though the latter adjective is best avoided (see
page 199). The problem is usually complicated by the fact that the
dependent variable is subject to a certain amount of experimental
variation or scatter.
Thus, in Example 1, the temperature is the controlled variable and
the specific heat is the dependent variable. At a fixed temperature, the
two observations on the specific heat vary somewhat. Nevertheless it
can be seen that the average value of the specific heat increases with
the temperature.
The problem now is to fit a line or curve to the data in order to
predict the mean value of the dependent variable for a given value
of the controlled variable. If the dependent variable is denoted by y
and the controlled variable by x, this curve is called the regression
curve, or line, of y on x.
We will begin by considering the problem of fitting a straight line

to n pairs of measurements, (x₁, y₁), ..., (xₙ, yₙ), where the yᵢ are subject
to scatter but the xᵢ are not. A straight line can be represented by the
equation

y = a₀ + a₁x.

Our task is to find estimates of a₀ and a₁ such that the line gives a good
fit to the data. One way of doing this is by the 'method of least squares'.
At any point xᵢ the corresponding point on the line is given by
a₀ + a₁xᵢ, so the difference between the observed value of y and the
predicted value is given by

eᵢ = yᵢ − (a₀ + a₁xᵢ).

Figure 42

The least squares estimates of a₀ and a₁ are obtained by choosing
the values which minimize the sum of squares of these deviations.
The sum of the squared deviations is given by

S = Σᵢ₌₁ⁿ eᵢ² = Σᵢ₌₁ⁿ [yᵢ − (a₀ + a₁xᵢ)]².


This quantity is a function of the unknown parameters a₀ and a₁.
It can be minimized by calculating ∂S/∂a₀ and ∂S/∂a₁, setting both
these partial derivatives equal to zero, and solving the two simultaneous
equations to obtain the least squares estimates, â₀ and â₁, of a₀ and a₁.
We have

∂S/∂a₀ = Σ 2(yᵢ − a₀ − a₁xᵢ)(−1),

∂S/∂a₁ = Σ 2(yᵢ − a₀ − a₁xᵢ)(−xᵢ).

When these are both zero we have

Σ(yᵢ − â₀ − â₁xᵢ)(−1) = 0,

Σ(yᵢ − â₀ − â₁xᵢ)(−xᵢ) = 0.

These can be rearranged to give

nâ₀ + â₁Σxᵢ = Σyᵢ,
â₀Σxᵢ + â₁Σxᵢ² = Σxᵢyᵢ.                                          8.1

These two simultaneous equations in â₀ and â₁ are often called the
normal equations. They can be solved to give

â₀ = ȳ − â₁x̄,

â₁ = Σxᵢ(yᵢ − ȳ)/Σxᵢ(xᵢ − x̄) = Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(xᵢ − x̄)²,
since Σx̄(yᵢ − ȳ) = 0 = Σx̄(xᵢ − x̄).

It is easy to see that the least squares regression line passes through
the centroid of the data, (x̄, ȳ), with slope â₁. Thus it is often convenient
to write the equation in the equivalent form

y − ȳ = â₁(x − x̄).

Many pocket calculators with a facility for regression use an
algebraically equivalent formula for â₁ given by

â₁ = [nΣxᵢyᵢ − ΣyᵢΣxᵢ] / [nΣxᵢ² − (Σxᵢ)²].

Each of the required quantities, namely Σxᵢ, Σyᵢ, Σxᵢ² and Σxᵢyᵢ,
can be easily computed from the data.



The formula for â₁ is also equivalent to

â₁ = (Σxᵢyᵢ − nx̄ȳ) / (Σxᵢ² − nx̄²),                              8.2

which can be used if x̄ and ȳ have already been calculated.


After the least squares regression line has been calculated, it is
possible to predict values of the dependent variable. At a particular
value, x₀, of the controlled variable, the point estimate of y is given by
â₀ + â₁x₀, provided x₀ lies in the range covered by the experiment.
Extrapolation may be dangerous.

Example 2
Fit a straight line to the data of Example 1 and estimate the specific
heat when the temperature is 75°C.
By inspection we have x̄ = 75 = average temperature.

By calculation we have Σᵢ₌₁¹² yᵢ = 20.16, giving ȳ = 1.68.

We also have Σyᵢ² = 33.8894, Σxᵢyᵢ = 1519.9 and Σxᵢ² = 71,000. Note
particularly in the last summation that each value of x must be considered
twice as there are two measurements on the specific heat at
each temperature.
From equation 8.2 we find

â₁ = 0.00226.

Hence â₀ = 1.510.
Thus the estimated regression line of y on x is given by

y = 1.51 + 0.00226x.

When x = 75 the prediction of the specific heat is given by

1.51 + 0.00226 × 75 = 1.68.
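As a check on this arithmetic, here is a minimal sketch in a modern language such as Python (an addition to the text, not part of the original book); it applies equation 8.2 to the data of Example 1.

import statistics  # not strictly needed; sums are computed directly below

# Sketch: least squares straight line for the data of Example 1.
# Each temperature appears twice, once for each specific-heat measurement.
x = [50, 50, 60, 60, 70, 70, 80, 80, 90, 90, 100, 100]
y = [1.60, 1.64, 1.63, 1.65, 1.67, 1.67, 1.70, 1.72, 1.71, 1.72, 1.71, 1.74]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
a1 = (sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar) / \
     (sum(xi ** 2 for xi in x) - n * xbar ** 2)     # equation 8.2
a0 = ybar - a1 * xbar
print(a0, a1, a0 + a1 * 75)   # about 1.51, 0.00226 and 1.68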
In many cases an inspection of the scatter diagram is sufficient to
see that the regression equation is not a straight line. One way of
using linear theory in the non-linear case is to transform the variables
in such a way that a linear relationship results. For example, if two
variables are related by the formula
y = a₀xᶜ,
then we have



log y = log a₀ + c log x,

so that if log y is plotted against log x, the points will lie on a straight
line. One advantage of such a transformation is that it is easier to fit
a straight line than any other type of curve. A more detailed account of
such transformations is given by Wetherill (1981, Chapter 8).
In general it will more often be necessary to try and fit a non-linear
curve to the data. Before describing how this is done, we will discuss
the problem of regression in more general terms.
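As a small illustration of this device (an addition to the text; the data are invented for the purpose), the following Python fragment fits y = a₀xᶜ by applying the straight-line formulae of this section to the points (log x, log y):

import math

# Sketch: linearizing y = a0 * x**c by regressing log y on log x.
# The slope estimates c and the intercept estimates log a0.
x = [1.0, 2.0, 3.0, 4.0, 5.0]         # hypothetical controlled variable
y = [2.1, 8.3, 17.9, 31.5, 50.2]      # hypothetical responses, roughly 2x^2
u = [math.log(xi) for xi in x]
v = [math.log(yi) for yi in y]
n = len(u)
ubar, vbar = sum(u) / n, sum(v) / n
slope = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v)) / \
        sum((ui - ubar) ** 2 for ui in u)
intercept = vbar - slope * ubar
print("c ~", slope, " a0 ~", math.exp(intercept))   # near 2 and 2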

8.3 Regression
We have seen that if several measurements are made on the dependent
variable, y, at the same value of the controlled variable, x, then the
results will form a distribution. The curve which joins the mean values
of these distributions is called the regression curve of y on x, and an
example is given in Figure 43. The problem of finding the most suitable
form of equation to predict one variable from the values of one, or more,
other variables is called the problem of regression.

Figure 43 Regression curve. The locus of the mean values of the y-distributions

In order to estimate the regression curve of y on x we must first


specify the functional form of the curve. Some examples are
y = a₀ + a₁x          (linear regression),
y = a₀ + a₁x + a₂x²   (quadratic regression),
y = a₀cˣ.

The functional form may be selected from theoretical considerations.
For example, the experiment may have been designed to verify a
particular relationship between the variables. Alternatively the
functional form may be selected after inspecting the scatter diagram,
as it would be pointless for example to try and fit a straight line to some
data if the relationship was clearly non-linear. In practice the experi­
menter may find it necessary to fit several different types of curve to the
data and to choose the one which gives the best fit (see section 8.7).
After the functional form has been selected, the next problem is to
estimate the unknown parameters of the curve. If, for example, a
quadratic relationship is thought to be appropriate, the regression
curve is given by

y = a₀ + a₁x + a₂x²

and then the quantities a₀, a₁ and a₂ must be estimated from the data.
A general method of estimating the parameters of a regression curve
is by the method of least squares. We have already described how this
is done when the regression curve is a straight line. A similar technique
can be adopted with other regression curves. For example, let us
suppose that the regression curve of y on x is given by

y = a₀ + a₁x + a₂x².

At any point xᵢ, the corresponding point on the curve is given by
a₀ + a₁xᵢ + a₂xᵢ², so the difference between the observed value of y and
the predicted value is

eᵢ = yᵢ − (a₀ + a₁xᵢ + a₂xᵢ²).

The sum of squared deviations is given by

S = Σeᵢ² = Σ[yᵢ − (a₀ + a₁xᵢ + a₂xᵢ²)]².

This quantity can be minimized by calculating ∂S/∂a₀, ∂S/∂a₁ and
∂S/∂a₂, and setting the partial derivatives equal to zero. Then the
least squares estimates can be obtained by solving the resulting simultaneous
equations which are

nâ₀ + â₁Σxᵢ + â₂Σxᵢ² = Σyᵢ,
â₀Σxᵢ + â₁Σxᵢ² + â₂Σxᵢ³ = Σxᵢyᵢ,
â₀Σxᵢ² + â₁Σxᵢ³ + â₂Σxᵢ⁴ = Σxᵢ²yᵢ.

These are the normal equations for quadratic regression. They can be
solved to give the least squares estimates â₀, â₁ and â₂.
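To make the procedure concrete, here is a short Python sketch (an addition, assuming the numpy library is available); it solves the normal equations in the matrix form (X'X)a = X'y, where the columns of the design matrix X are 1, x and x², using the data analysed later in Example 4 of section 8.7.

import numpy as np

# Sketch: quadratic regression by solving the normal equations (X'X)a = X'y.
x = np.array([0., 1., 2., 3., 4., 5., 6.])
y = np.array([6.3, 5.7, 6.3, 7.3, 9.9, 12.5, 18.1])   # data of Example 4
X = np.column_stack([np.ones_like(x), x, x ** 2])     # columns 1, x, x^2
a = np.linalg.solve(X.T @ X, X.T @ y)
print(a)   # about [6.44, -1.28, 0.53]; compare the fit found in section 8.7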



The next problem that arises is to find the conditions under which the
least squares estimates are 'good' estimates of the unknown parameters
of the regression equation. In the case of linear regression it can be
shown that the least squares estimates are maximum likelihood
estimates if the following conditions apply:
(1) For a fixed value of the controlled variable, x₀ say, the dependent
variable follows a normal distribution with mean a₀ + a₁x₀.
(2) The conditional variance of the distribution of y for a fixed value
of x is a constant, usually denoted by σ²_y|x. Thus the conditional variance
of the dependent variable does not depend on the value of x.
A preliminary examination of the data should be made to see if
it is reasonable to assume that these conditions do apply. Together
these assumptions constitute what is called the linear regression model.
This model states that for a fixed value of x, say x₀, the dependent
variable, y, is a random variable such that

E(y|x) = a₀ + a₁x.

Thus the line

y = a₀ + a₁x

joins the mean values of the y-distributions and is the true regression
line. In order to distinguish the true regression line from the estimated
regression line we will denote the latter by

ŷ = â₀ + â₁x.

The assumptions made in the above model are also necessary to


establish confidence intervals for the true values of a₀ and a₁ and for
the mean values of the y-distributions. This will be described in the
next section.
In general the properties of normality and constant variance are
often important assumptions in many other types of model. For
example, in quadratic regression, suppose that the dependent variable
follows a normal distribution with mean a₀ + a₁x + a₂x², where x is
the value of the controlled variable. If the variance of the y-distributions
is a constant which does not depend on the value of x, then the
least squares estimates of a₀, a₁ and a₂ are in fact the maximum likelihood
estimates.
Note that if the variance of the y-distributions does vary with x,
then the least squares procedure must be modified by giving more
weight to those observations which have the smaller variance. This

problem is considered by Weisberg (1985) Chapter 4. A somewhat
similar problem is also considered in this book in section 9.5.

8.4 Confidence intervals and significance tests in linear regression


The next two sections will be concerned with completing the discussion
of linear regression. Once we have made the assumptions of normality
and constant variance, as described in the previous section, we are in a
position to obtain confidence intervals for a₀, a₁ and a₀ + a₁x. The
results will be stated without formal proofs.
Given n pairs of observations, (x₁, y₁), ..., (xₙ, yₙ), the least squares
estimates of a₀ and a₁ can be obtained in the manner described in
section 8.2. These estimates can be expected to vary somewhat from
sample to sample. However it can be shown that both estimates are
unbiased and hence that â₀ + â₁x is an unbiased estimate of a₀ + a₁x.
In addition it can be shown that

variance(â₀) = σ²_y|x [1/n + x̄²/Σ(xᵢ − x̄)²],

variance(â₁) = σ²_y|x / Σ(xᵢ − x̄)²,

and that both â₀ and â₁ are normally distributed, as each is a linear
combination of the observed values of y which are themselves normally
distributed. Thus in order to obtain confidence intervals for a₀, a₁ and
a₀ + a₁x we must first obtain an estimate of the residual variance, σ²_y|x.
flo 4-^1 X we must first obtain an estimate of the residual variance,
The sum of squared deviations of the observed points from the
estimated regression line is given by

Σ(yᵢ − â₀ − â₁xᵢ)².

This quantity is often called the residual sum of squares. It can be
shown that an unbiased estimate of σ²_y|x can be obtained by dividing
this residual sum of squares by n − 2:

s²_y|x = Σ(yᵢ − â₀ − â₁xᵢ)² / (n − 2).

The denominator, n − 2, shows that two degrees of freedom have been
lost. This is because the two quantities â₀ and â₁ were estimated from
the data, so there are two linear restrictions on the values of
yᵢ − â₀ − â₁xᵢ. A computer will routinely calculate s_y|x, but, for a



pocket calculator, a more convenient formula is

s²_y|x = (Σyᵢ² − â₀Σyᵢ − â₁Σxᵢyᵢ) / (n − 2).

Then it can be shown that the 100(1 − α) per cent confidence interval
for a₁ is given by

â₁ ± t_α/2,n−2 × s_y|x / √[Σ(xᵢ − x̄)²],

for a₀ by

â₀ ± t_α/2,n−2 × s_y|x × √[1/n + x̄²/Σ(xᵢ − x̄)²],

and for a₀ + a₁x₀ by

â₀ + â₁x₀ ± t_α/2,n−2 × s_y|x × √[1/n + (x₀ − x̄)²/Σ(xᵢ − x̄)²].

The confidence interval for a₀ + a₁x is illustrated in Figure 44. Note
that it is shortest when x₀ = x̄.

Figure 44 Confidence intervals for (a₀ + a₁x)



Most engineers believe that the best test procedure for establishing
a relation between two variables is to take observations at equally
spaced values of the controlled variable. This approach is in fact quite
correct if there is no prior information about the relationship,
particularly if it is intended to use the method of orthogonal polynomials
to analyse the data, as shown in section 8.7. However if the experimenter
already has convincing evidence that the relationship is linear, then the
above results indicate that it is better to take more observations at the
ends and 'starve' the middle. This will increase the value of Σ(xᵢ − x̄)²
and so decrease the standard error of all the above estimates.
One question which frequently arises is whether or not the slope of
the regression line is significantly different from zero. In other words
it is desired to test the hypothesis H₀: a₁ = 0 against the alternative
hypothesis H₁: a₁ ≠ 0. The test statistic is given by

t₀ = â₁√[Σ(xᵢ − x̄)²] / s_y|x

and follows the t-distribution with n − 2 degrees of freedom if H₀ is true.

Example 3
In Example 2 the estimated slope of the regression line is given by

â₁ = 0.00226.

Is this value significantly different from zero?
We have

s²_y|x = (33.8894 − 1.510833 × 20.16 − 0.00225556 × 1519.9)/10
       = 0.00034,

therefore s_y|x = 0.018, and

Σ(xᵢ − x̄)² = 3500.

Thus

t₀ = 0.00226 × 59.1/0.018 = 7.3.

But t_0.025,10 = 2.23.

Thus the result is significant at the 5 per cent level and so the slope of
the regression line is significantly different from zero, even though it
appears, at first sight, to be very small.
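The calculation of Example 3 can be reproduced with the following Python sketch (an addition to the text, not the author's own):

import math

# Sketch: the slope t-test of Example 3, from the data of Example 1.
x = [50, 50, 60, 60, 70, 70, 80, 80, 90, 90, 100, 100]
y = [1.60, 1.64, 1.63, 1.65, 1.67, 1.67, 1.70, 1.72, 1.71, 1.72, 1.71, 1.74]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
a1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a0 = ybar - a1 * xbar
# residual variance on n - 2 degrees of freedom
s2 = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
t0 = a1 * math.sqrt(sxx) / math.sqrt(s2)
print(t0)   # about 7.3, to be compared with t(0.025, 10) = 2.23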



We have seen how to find a confidence interval for the mean value of
y, for a given value of x. However we are often more interested in the
spread of the observations around this mean value. Thus it would be
useful to find an interval in which future values of y will probably lie.
It can be shown that there is a probability 1 − α that a future observation
on y, at the point x₀, will lie between

â₀ + â₁x₀ ± t_α/2,n−2 × s_y|x × √[1 + 1/n + (x₀ − x̄)²/Σ(xᵢ − x̄)²].

This interval is often called a prediction interval.

8.5 The coefficient of determination


Another important consideration is to see how well the estimated
regression line fits the data. In order to achieve this we will use the
following important relationship. If the least squares line is given by

ŷ = â₀ + â₁x,

it can be shown that

Σᵢ₌₁ⁿ (yᵢ − ȳ)² = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² + Σᵢ₌₁ⁿ (ŷᵢ − ȳ)².

The quantity Σ(yᵢ − ȳ)² measures the sum of squared deviations of the
observed y-values from ȳ. This is often called the total corrected sum
of squares of y or the 'total variation' in y. The quantity Σ(yᵢ − ŷᵢ)² is
the residual sum of squares and was introduced in section 8.4. It is
sometimes called the 'unexplained variation'. The quantity Σ(ŷᵢ − ȳ)²
represents the variation of the points ŷᵢ on the estimated regression
line and is often called the 'explained variation'. The important point
is that the total variation can be partitioned into two components,
the explained and unexplained variation.
Total variation = explained variation + unexplained variation.
The ratio of the explained variation to the total variation measures
how well the straight line fits the data. This ratio is called the coefficient
of determination and must lie between nought and one. If it is equal to
one, then all the observed points lie exactly on a straight line. If it is
equal to nought, then ŷᵢ = ȳ for all i, so the slope of the regression line
is zero. Therefore the closer the coefficient is to one, the closer the


points lie to an exact straight line. The coefficient can also be calculated
in a similar way for a non-linear curve, which is fitted to a set of data
by the method of least squares. The total variation is partitioned into
the explained variation and the residual sum of squares in a similar way.
The coefficient of determination for linear regression turns out to be
the square of a quantity called the correlation coefficient which is
considered in section 8.9.
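As an added numerical illustration (not in the original text), the coefficient of determination for the line fitted in Example 2 can be computed as follows; the value of about 0.84 indicates a close fit.

# Sketch: coefficient of determination for the straight line of Example 2.
x = [50, 50, 60, 60, 70, 70, 80, 80, 90, 90, 100, 100]
y = [1.60, 1.64, 1.63, 1.65, 1.67, 1.67, 1.70, 1.72, 1.71, 1.72, 1.71, 1.74]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
a1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
a0 = ybar - a1 * xbar
total = sum((yi - ybar) ** 2 for yi in y)                        # total variation
residual = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y)) # unexplained
print(1 - residual / total)   # explained/total variation, about 0.84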

8.6 Multiple and curvilinear regression

The analysis of the linear regression model can be extended in a


straightforward way to cover situations in which the dependent variable
is affected by several controlled variables, or in which it is affected
non-linearly by one controlled variable.
For example, suppose that there are three controlled variables,
x₁, x₂ and x₃. A linear regression equation is of the form

y = a₀ + a₁x₁ + a₂x₂ + a₃x₃.

Given n sets of measurements, (y₁, x₁₁, x₂₁, x₃₁), ..., (yₙ, x₁ₙ, x₂ₙ, x₃ₙ),
the least squares estimates of a₀, a₁, a₂ and a₃ can be obtained in a
similar way to that previously described. The sum of squared deviations
of the observed values of y from the predicted values is given by

S = Σᵢ₌₁ⁿ (yᵢ − a₀ − a₁x₁ᵢ − a₂x₂ᵢ − a₃x₃ᵢ)².

This quantity can be minimized by setting ∂S/∂a₀, ∂S/∂a₁, ∂S/∂a₂ and
∂S/∂a₃ equal to zero, to obtain four simultaneous equations in â₀, â₁, â₂ and â₃:

nâ₀ + â₁Σx₁ᵢ + â₂Σx₂ᵢ + â₃Σx₃ᵢ = Σyᵢ,
â₀Σx₁ᵢ + â₁Σx₁ᵢ² + â₂Σx₂ᵢx₁ᵢ + â₃Σx₃ᵢx₁ᵢ = Σyᵢx₁ᵢ,
â₀Σx₂ᵢ + â₁Σx₁ᵢx₂ᵢ + â₂Σx₂ᵢ² + â₃Σx₃ᵢx₂ᵢ = Σyᵢx₂ᵢ,                8.4
â₀Σx₃ᵢ + â₁Σx₁ᵢx₃ᵢ + â₂Σx₂ᵢx₃ᵢ + â₃Σx₃ᵢ² = Σyᵢx₃ᵢ.
These four equations, called the normal equations, can be solved to
give the least squares estimates of a₀, a₁, a₂ and a₃. We shall not
discuss numerical methods for solving these equations as multiple
regression programs are now available with the majority of computers
and they have taken most of the hard work out of regression. Some
remarks on the dangers of multiple regression and the selection of
variables are given in the addendum at the end of the chapter.
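The following Python sketch (an addition to the text; the data are simulated and the numpy library is assumed) shows what such a program does: numpy's least squares routine solves the normal equations 8.4 numerically.

import numpy as np

# Sketch: multiple regression of y on three controlled variables,
# fitted to simulated data with known coefficients.
rng = np.random.default_rng(0)
n = 20
x1, x2, x3 = rng.uniform(0, 10, size=(3, n))
y = 2.0 + 0.5 * x1 - 1.2 * x2 + 0.8 * x3 + rng.normal(0, 0.3, size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)   # close to the simulated values 2.0, 0.5, -1.2, 0.8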



To derive confidence intervals for the regression parameters and for
the predicted values, it is necessary to make the assumptions of normal­
ity and constant variance as in linear regression. The reader is referred
for example to Weisberg (1985), Wetherill (1981, 1986) or Draper
and Smith (1981).
Curvilinear regression, which has already been referred to in Section
8.3, is tackled in a similar way to multiple regression and can, if desired,
be considered as a special case of multiple regression. We will now
discuss some special cases.

8,6.1 Polynomial regression

Let us suppose that the dependent variable is a polynomial function


of a single controlled variable. For example, in cubic regression, the
regression equation is given by

y = a₀ + a₁x + a₂x² + a₃x³.

This type of regression can be approached in the same way as multiple


regression. In the case of cubic regression we can substitute x₁ = x,
x₂ = x² and x₃ = x³. The least squares estimates of a₀, a₁, a₂ and a₃
can then be obtained by solving the normal equations 8.4.
If the observations are taken in such a way that there are an equal
number of observations on y at a series of equally spaced values of x,
then it is computationally more efficient to use the method of
orthogonal polynomials, which is described in Section 8.7.

8.6.2 Mixtures

Some regression models involve a combination of multiple and curvilin­


ear regression. Two examples of this are

y = a₀ + a₁x + a₂x² + a₃z,
y = a₀ + a₁x + a₂z + a₃xz.

In the second equation the term a₃xz implies an interaction between


the two controlled variables x and z. Both situations can be analysed
in the manner previously described. In the first case we set x₁ = x,
x₂ = x² and x₃ = z; in the second case we set x₁ = x, x₂ = z and
x₃ = xz.



8.6.3 Transformations
Theoretical considerations may lead to a regression model which
depends on a transformation of the controlled variables. For example,
if the regression equation is of the form
y = a₀ + a₁ log x + a₂ log z,

where x and z are the controlled variables, then estimates of a₀, a₁
and a₂ can be obtained by setting x₁ = log x and x₂ = log z.

8.7 Orthogonal polynomials


If an experiment on polynomial regression is planned so that the
values of the controlled variable are equally spaced, with an equal
number of observations at each point, then it is computationally more
efficient to use a technique called the method of orthogonal
polynomials to fit the regression equation. Although the method is still
of some interest, it is much less important now that fast computing
facilities are widely available, and so some readers may wish to omit
this section.
For polynomial regression of degree k the regression equation is
usually written in the form

y = a₀ + a₁x + ... + aₖxᵏ.

To use the method of orthogonal polynomials this equation is rewritten
in the following form:

y = a₀' + a₁'f₁(x) + ... + aₖ'fₖ(x),

where fᵣ(x) is a polynomial in x of degree r, and the constants aᵣ' depend
on the values of the aᵢ.
It can be shown that it is always possible to find a set of polynomials
which satisfy the following conditions:

Σₓ fᵣ(x)fₛ(x) = 0   (r ≠ s),

Σₓ fᵣ(x) = 0.

These polynomials are called orthogonal polynomials. They depend
on the average value, x̄, of the controlled variable and on d, the distance
between successive values of x.
The method of calculating these polynomials is rather complicated.
In practice it is much easier to work with a standardized (coded)
controlled variable which is given by

z = (x − x̄)/d.

The corresponding regression equation is of the form

y = a₀' + a₁'f₁(z) + ... + aₖ'fₖ(z).

These standardized orthogonal polynomials are tabulated in both
Fisher and Yates (1963) and Pearson and Hartley (1966). The standardized
controlled variable is symmetric about zero and there is a unit
distance between successive values of z.
The next step is to obtain least squares estimates of a₀', a₁', ..., aₖ'.
First of all we consider the case where one observation is made on y
at n different values of x (or z). If the normal equations are derived as
in the previous section, it is found that they reduce to the following
simple equations:

nâ₀' = Σyᵢ,
â₁'Σf₁(zᵢ)² = Σf₁(zᵢ)yᵢ,
. . .
âₖ'Σfₖ(zᵢ)² = Σfₖ(zᵢ)yᵢ.

All the other terms are zero because the polynomials are orthogonal;
all summations are for i = 1 to n. The quantities Σfᵣ(zᵢ)² are given in
the above tables. Thus the only quantities which have to be calculated
are Σfᵣ(zᵢ)yᵢ, r = 1 to k. This can easily be done as the numerical
values of fᵣ(zᵢ) are also given in Fisher and Yates (1963) and Pearson and
Hartley (1966). The estimates follow immediately.
In general if c observations on y are made at each of n different values
of x then the normal equations are modified by multiplying the terms on
the left-hand side of the equations by c. Finally the regression equation
can be obtained as a polynomial in x by substituting

z = (x − x̄)/d.
Another advantage of orthogonal polynomials is that the estimates
of the regression parameters are independent. This fact is particularly
useful when the order of the polynomial is not known beforehand.
The problem then is to find the lowest order polynomial which fits the
data adequately. In this case it is best to adopt a sequential approach,
by first fitting a straight line, then fitting a quadratic curve, then a
cubic curve, and so on. At each stage it is only necessary to estimate
one additional parameter as the earlier estimates do not change.



At each stage an F-test is performed to test the adequacy of the fit.
In order to describe this we will again assume that one observation is
made at n different values of x. The results are stated without proof.
After fitting a straight line, the residual sum of squares is given by

R₁ = Σ[y − â₀' − â₁'f₁(z)]².

After fitting a quadratic curve, this is reduced to

R₂ = Σ[y − â₀' − â₁'f₁(z) − â₂'f₂(z)]².

It can be shown that this reduction is given by

R₁ − R₂ = â₂'²Σf₂(z)².

In general, after fitting a polynomial of degree k, the residual sum of
squares is given by

Rₖ = Σ[y − â₀' − â₁'f₁(z) − ... − âₖ'fₖ(z)]².

The residual sum of squares involves (k + 1) parameters which have
been estimated from the data. In order to estimate the residual variance
we would divide this quantity by n − (k + 1), which is the number of
degrees of freedom corresponding to Rₖ. We say that Rₖ is on n − (k + 1)
degrees of freedom. Now at stage j the residual sum of squares is
reduced by âⱼ'²Σfⱼ(z)², a quantity which is on one degree of freedom.
If a polynomial of degree k − 1 fits the data adequately then
Rₖ₋₁/(n − k) will be an estimate of the residual variance. But Rₖ₋₁
can be split up into two components, Rₖ and âₖ'²Σfₖ(z)², so that
Rₖ/(n − k − 1) and âₖ'²Σfₖ(z)² will be independent estimates of the residual variance.
Then the ratio of these two quantities will follow an F-distribution
with one and n − k − 1 degrees of freedom:

F = âₖ'²Σfₖ(z)² / [Rₖ/(n − k − 1)].

On the other hand if a polynomial of degree k does fit the data better
than a polynomial of degree k − 1, then Rₖ will be substantially less
than Rₖ₋₁, and the F-ratio may be significantly large.
This sequential process is continued until two non-significant
F-ratios in a row are obtained. This is necessary because even-order
polynomials may give non-significant results even though odd-order
polynomials do give significant results - and vice versa.
The method is illustrated in Example 4.



Example 4
Find the polynomial of the lowest degree which adequately describes
the following hypothetical data in which x is the controlled variable.
x   0    1    2    3    4    5     6

y   6.3  5.7  6.3  7.3  9.9  12.5  18.1

The standardized controlled variable is given by

z = x − 3.

There are n = 7 values of the controlled variable. From Pearson and
Hartley (1966), the first four orthogonal polynomials are the following:

z     f₁(z)   f₂(z)   f₃(z)   f₄(z)
−3     −3       5      −1       3
−2     −2       0       1      −7
−1     −1      −3       1       1
 0      0      −4       0       6
+1     +1      −3      −1       1
+2     +2       0      −1      −7
+3     +3       5       1       3

Σfᵣ(z)²  28     84       6     154

Then we find

Σᵢ₌₁⁷ f₁(zᵢ)yᵢ = 52.6,
Σf₂(zᵢ)yᵢ = 44.2,
Σf₃(zᵢ)yᵢ = 1.4,
Σf₄(zᵢ)yᵢ = 5.8,
Σyᵢ = 66.1.

Thus we have

â₀' = ȳ = 9.44,
â₁' = 52.6/28 = 1.878,
â₂' = 44.2/84 = 0.526,
â₃' = 1.4/6 = 0.233,
â₄' = 5.8/154 = 0.0377.

The next stage is to compute a series of F-ratios to see how many of
these parameters are required. At each stage two mean squares are
obtained by dividing the appropriate sum of squares by the appropriate
number of degrees of freedom. The ratio of the mean squares, the
F-ratio, is then compared with F_0.05,1,n−k−1.

Table 13

Type of variation         Sum of squares           d.f.  Mean square  F-ratio  F_0.05
Residual from mean        Σ(y − ȳ)² = 122.86        6
Explained by linear       â₁'²Σf₁(z)² = 98.81       1      98.81       20.5     6.6
Residual from linear      24.04                     5       4.81
Explained by quadratic    â₂'²Σf₂(z)² = 23.26       1      23.26      116       7.7
Residual from quadratic   0.79                      4       0.20
Explained by cubic        â₃'²Σf₃(z)² = 0.33        1       0.33       2.2     10.1
Residual from cubic       0.46                      3       0.15
Explained by quartic      â₄'²Σf₄(z)² = 0.22        1       0.22       1.8     18.5
Residual from quartic     0.24                      2       0.12

Neither the cubic nor the quartic terms give a significant F-ratio.
In any case, the residual sum of squares is so small after fitting linear
and quadratic terms that it is really unnecessary to try higher order
terms in this case. Thus a quadratic polynomial describes the data
adequately.
The next stage is to compute the regression equation in terms of the
original controlled variable, x. For this we need to know the orthogonal
polynomials as functions of z. These are also given in Fisher and Yates
(1963) and Pearson and Hartley (1966). We find

f₁(z) = λ₁z with λ₁ = 1

and

f₂(z) = λ₂(z² − 4) with λ₂ = 1.

Thus the estimated regression equation is given by

ŷ = 9.44 + 1.88z + 0.53(z² − 4)
  = 9.44 + 1.88(x − 3) + 0.53[(x − 3)² − 4]
  = 6.45 − 1.30x + 0.53x².
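The arithmetic of Example 4 is easily checked; the following Python fragment (an addition to the text) uses the tabulated polynomial values and the simple normal equations of this section.

# Sketch: orthogonal-polynomial estimates for Example 4.
# Tabulated values of f1(z), ..., f4(z) for the n = 7 points above.
f = [[-3, -2, -1, 0, 1, 2, 3],
     [5, 0, -3, -4, -3, 0, 5],
     [-1, 1, 1, 0, -1, -1, 1],
     [3, -7, 1, 6, 1, -7, 3]]
y = [6.3, 5.7, 6.3, 7.3, 9.9, 12.5, 18.1]

a0 = sum(y) / len(y)                         # 9.44
# a_r' = sum f_r(z_i) y_i / sum f_r(z_i)^2
a = [sum(fi * yi for fi, yi in zip(fr, y)) / sum(fi ** 2 for fi in fr)
     for fr in f]
print(a0, [round(ar, 4) for ar in a])        # 1.8786, 0.5262, 0.2333, 0.0377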

8.8 The design of regression experiments


So far, we have said little about how to plan a regression experiment,
although this is a very important topic. Here we will only make a few
preliminary remarks as the design of experiments is considered in some
detail in Chapters 10 and 11. Nevertheless it is important to realise
that a little foresight while the data is being collected can lead to a
substantial decrease in the amount of computation required. A good
design is also necessary to ensure that the conclusions from the experi­
ment are valid.
In polynomial regression, we have already seen that if successive
values of the controlled variable are an equal distance apart, then the
method of orthogonal polynomials can be used. Even in linear regres­
sion a similar restriction on the values of the controlled variable will
reduce the amount of arithmetic (see exercise 2). In multiple regression
we will see that the best design is one in which observations are made at
a rectangular grid of points. This experiment is called a complete
factorial experiment. After the values of the controlled variables have
been standardized, the cross product terms in the normal equations
turn out to be zero and so the equations become much easier to solve
(see Example 3, Chapter 11).
The above remarks have been concerned with reducing the amount
of arithmetic. Another consideration is to carry out the experiment
in such a way that nuisance factors do not affect the results of the
experiment. This can usually be achieved by randomizing the order of
the experiments. A full discussion of this technique is given in Chapter
10.

8.9 The correlation coefficient


The first part of this chapter has been concerned with the problem of
regression. We now turn our attention to a different type of situation in



which measurements are made simultaneously on two variables,
neither of which can be controlled. In other words they are both
random variables.
Some typical data of this type is given in Example 5.

Example 5
The following pairs of (coded) measurements were taken of the tem­
perature and thrust of a rocket engine while it was being run under the
same operating conditions. Plot the results on a scatter diagram.
x    y     x    y     x    y
19   1.2   33   2.1   45   2.2
15   1.5   30   2.5   39   2.2
35   1.5   57   3.2   25   1.9
52   3.3   49   2.8   40   1.8
35   2.5   26   1.5   40   2.8

Data of this type occur frequently in biology and the social sciences.
For example measurements are often made on two different character­
istics of the same human being, such as his height and weight. Both
these variables are subject to considerable random fluctuation.
Nevertheless we expect to find some relationship between them as a
man who is taller than average is also likely to be heavier than average.
The problem in this sort of situation is to see if the two variables are
inter-related, and, if so, to find a measure of the degree of association or
correlation between them. We will only be concerned with linear
correlation, when the relationship between the variables appears to be
linear.


Figure 45 Scatter diagram of data of Example 5



The correlation is said to be positive if ‘large’ values of both variables
tend to occur together, and is said to be negative if ‘large’ values of one
variable tend to occur with ‘small’ values of the other variable.
The correlation is said to be high if the observations lie close to a
straight line and is said to be low if the observations are widely scattered.
Finally the variables are said to be uncorrelated if there does not
appear to be any relationship between them. The different types of
correlation are illustrated below.

Figure 46 (a) High positive correlation (b) Low positive correlation
(c) Negative correlation (d) No correlation

The most important measure of the degree of correlation between


two variables is a quantity called the (product moment) correlation
coefficient. Let us suppose that n pairs of measurements, (xᵢ, yᵢ), are
made on two random variables X and Y. Then the observed correlation
coefficient is given by

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[Σ(xᵢ − x̄)²Σ(yᵢ − ȳ)²].                 8.5

The mathematical justification for this statistic depends on the bivariate


normal distribution, which is considered in section 8.11. Here we will
suggest some intuitive justification for it.



Let us consider a statistic, called the covariance of X and Y, which is
defined by

cov(X, Y) = [1/(n − 1)] Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ).

In order to understand this statistic, it is necessary to consider the
scatter diagram with its origin at the centroid of the data, (x̄, ȳ), and to
divide the plane into four quadrants. The data of Example 5 has its
centroid at the point (36, 2.2), and has been plotted in Figure 47. The
four quadrants are labelled I, II, III and IV.


Figure 47

Now any point in quadrant II will be such that both (xᵢ − x̄) and
(yᵢ − ȳ) are positive. Thus the product (xᵢ − x̄)(yᵢ − ȳ) is positive. In
quadrant III both (xᵢ − x̄) and (yᵢ − ȳ) are negative so that the product
(xᵢ − x̄)(yᵢ − ȳ) is positive. But any point in quadrants I or IV will be
such that the product of (xᵢ − x̄) and (yᵢ − ȳ) is negative.
The covariance is obtained by summing all the products and dividing
by (n − 1). If the variables are positively correlated, as in Figure 47,
then most of the observed points will lie in quadrants II and III, so that
the covariance will be 'large' and positive. On the other hand if the
variables are negatively correlated then most of the observed points will
lie in quadrants I and IV, so that the covariance will be 'large' and
negative. But if there is no correlation between the variables, then the
number of points in each quadrant will be approximately the same, so
that the covariance will be close to zero. Thus the covariance is a
measure of the association between the two variables.



The next problem is to standardize the covariance in such a way that
it does not depend on the scales in which the measurements were made.
Thus in Example 5 we would like to get the same measure of correlation
if the temperature is measured in degrees Fahrenheit, centigrade, or
any other scale. A convenient way of doing this is to express (xᵢ − x̄) and
(yᵢ − ȳ) in units of their respective standard deviations, sₓ and s_y,
where

sₓ = √[Σ(xᵢ − x̄)²/(n − 1)]   and   s_y = √[Σ(yᵢ − ȳ)²/(n − 1)].

Then the required measure of correlation is given by

r = [1/(n − 1)] Σ [(xᵢ − x̄)/sₓ][(yᵢ − ȳ)/s_y],

which can be rewritten as equation 8.5.
Many pocket calculators which have a facility to calculate r use the
algebraically equivalent formula

r = [nΣxᵢyᵢ − ΣxᵢΣyᵢ] / √{[nΣxᵢ² − (Σxᵢ)²][nΣyᵢ² − (Σyᵢ)²]}.

It can be shown that the value of r must lie between −1 and +1.
For r = +1, all the observed points lie on a straight line which has a
positive slope; for r = −1, all the observed points lie on a straight line
which has a negative slope.

Example 6
Calculate the correlation coefficient of the data from Example 5.

n = 15, Σxᵢyᵢ = 1276.1, Σxᵢ = 540, Σyᵢ = 33,
Σxᵢ² = 21426, Σyᵢ² = 78.44.

Hence

r = 88.1/√(1986 × 5.84) = 0.82.

Thus the correlation is high and positive; this was to be expected after
inspecting Figure 47.
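As an added check (not part of the original text), the following Python fragment computes r by the pocket calculator formula, together with the t-statistic used in Example 7 below.

import math

# Sketch: correlation coefficient and t-statistic for the data of Example 5.
x = [19, 15, 35, 52, 35, 33, 30, 57, 49, 26, 45, 39, 25, 40, 40]
y = [1.2, 1.5, 1.5, 3.3, 2.5, 2.1, 2.5, 3.2, 2.8, 1.5, 2.2, 2.2, 1.9, 1.8, 2.8]
n = len(x)
num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
den = math.sqrt((n * sum(a * a for a in x) - sum(x) ** 2) *
                (n * sum(b * b for b in y) - sum(y) ** 2))
r = num / den
t0 = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
print(r, t0)   # about 0.82 and 5.1; compare t(0.025, 13) = 2.16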



It is often useful to perform a significance test to see if the observed
correlation coefficient is significantly different from zero. If there is
really no correlation between the two variables, it is still possible that
a spuriously high (positive or negative) sample correlation value may
occur by chance. When the true correlation coefficient is zero, it can be
shown that the statistic r√(n − 2)/√(1 − r²) has a t-distribution with n − 2
degrees of freedom, provided that the variables are bivariate
normal. If we are interested in positive or negative correlation then a
two-tailed test is appropriate. The correlation is significantly different
from zero at the α level of significance if

|r|√(n − 2)/√(1 − r²) > t_α/2,n−2.

Example 7
Is the correlation coefficient in Example 6 significantly different from
zero?

t₀ = 0.82√13/√(1 − 0.82²) = 5.1.

But t_0.025,13 = 2.16.
Thus the correlation between temperature and thrust is significant
at the 5 per cent level.
Table 14 gives the 95 per cent critical points for the absolute value
of the correlation coefficient for different sample sizes. When the
sample size is small a fairly large absolute value of r is required to show
significant correlation.

Table 14

Sample size   Critical value      Sample size   Critical value
  5               0.88               25             0.39
 10               0.63               30             0.36
 15               0.51               50             0.28
 20               0.44              100             0.20

Confidence intervals for the true correlation coefficient are given in


Pearson and Hartley (1966).



8.10 Estimating the regression lines
If the size of the sample correlation coefficient indicates that the random
variables are interdependent, then we may want to predict the value of
one of the variables from a given value of the other variable. In this
situation it is important to realise that there are two regression lines,
one to predict y from x and one to predict x from y.
If the variables are linearly related, then the regression line of y on x
can be denoted by
y = a₀ + a₁x.

Estimates of a₀ and a₁ can be obtained by using the method of least
squares, as described in section 8.2. We find

â₀ = ȳ − â₁x̄,

â₁ = Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(xᵢ − x̄)².

We can denote the regression line of x on y by


x = b₀ + b₁y.

Figure 48 Finding the regression line of x on y



This line will generally be different from the regression line of y on x.
Estimates of b₀ and b₁ can again be obtained by the method of least
squares, but in this case they are obtained by minimizing the sum of
squared deviations parallel to the x-axis and not parallel to the y-axis
as in previous regression problems.
For any point (xᵢ, yᵢ) the deviation is given by

εᵢ = xᵢ − b₀ − b₁yᵢ.

Minimizing S = Σεᵢ² with respect to b₀ and b₁, we find

b̂₀ = x̄ − b̂₁ȳ,

b̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(yᵢ − ȳ)².
The two regression lines will only coincide when the observations all
lie on a straight line. Both lines pass through the centroid of the data,
but their slopes are different. If there is no correlation between the
variables then the two regression lines will be at right angles. In this case
the best prediction of y for any value of x is simply y. Conversely the best
prediction of x for any value of y is simply x.

Figure 49 Regression lines when there is no correlation

Example 8
Find the regression lines for the data of Example 5.
We have x̄ = 36, ȳ = 2.2,

Σ(xᵢ − x̄)² = 1986,
Σ(xᵢ − x̄)(yᵢ − ȳ) = 88.1,
Σ(yᵢ − ȳ)² = 5.84.

Thus the estimated regression line of y on x is given by

y − 2.2 = (88.1/1986)(x − 36)
        = 0.044(x − 36).

The estimated regression line of x on y is given by

x − 36 = (88.1/5.84)(y − 2.2)
       = 15.1(y − 2.2).

These regression lines are plotted in Figure 50.

Figure 50 Regression lines for the data of Example 5
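A short numerical check of Example 8 (an addition to the text), starting from the summary quantities already computed:

# Sketch: the two regression lines of Example 8 from summary quantities.
sxy, sxx, syy = 88.1, 1986.0, 5.84
xbar, ybar = 36.0, 2.2
slope_y_on_x = sxy / sxx              # about 0.044
slope_x_on_y = sxy / syy              # about 15.1
print(slope_y_on_x, slope_x_on_y)
print(slope_y_on_x * slope_x_on_y)    # about 0.67

A useful check: by equation 8.5 the product of the two slopes equals r², which ties in with the result proved in the next paragraphs.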



We are now in a position to show that the square of the correlation
coefficient, r, is equal to the coefficient of determination which was
discussed in section 8.5. The estimated regression line of y on x is
given by

ŷ − ȳ = [Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(xᵢ − x̄)²](x − x̄)
      = r(s_y/sₓ)(x − x̄),   using equation 8.5.

Thus

Σ(ŷᵢ − ȳ)² = r²(s_y²/sₓ²)Σ(xᵢ − x̄)²
           = r²s_y²(n − 1)
           = r²Σ(yᵢ − ȳ)².

But

Σ(ŷᵢ − ȳ)² = ('explained variation')

and

Σ(yᵢ − ȳ)² = ('total variation').

Thus

r² = (coefficient of determination).

This quantity gives the proportion of the total variation in y which is
accounted for by the linear variation with x. Thus in Example 6 we
find that 0.82² = 0.67 of the total variation in thrust is accounted
for by the linear relationship with temperature.

8.11 The bivariate normal distribution


For some procedures (e.g. tests on the correlation coefficient - see
section 8.9) it is convenient to assume that pairs of measurements follow
a bivariate normal distribution. This is the natural extension of the
normal distribution to the case of two variables, and describes fairly well
many distributions of pairs of measurements which arise in practice.
The formula for the joint probability density function of this distribu­
tion is rather complicated. Denote the two random variables by X and



Y. If a pair of observations is taken at random from a bivariate normal
distribution, the probability that the value of X lies between x and
x + dx and that the value of Y lies between y and y + dy is given by

f(x, y) dx dy = {1/[2πσₓσ_y√(1 − ρ²)]}
  × exp{−[((x − μₓ)/σₓ)² − 2ρ((x − μₓ)/σₓ)((y − μ_y)/σ_y)
           + ((y − μ_y)/σ_y)²]/[2(1 − ρ²)]} dx dy,

where μₓ, σₓ and μ_y, σ_y denote the mean and standard deviation of X
and Y respectively. The parameter ρ is the theoretical correlation
coefficient and is defined by

ρ = E[(X − μₓ)(Y − μ_y)]/(σₓσ_y).

It is easy to show that ρ always lies between −1 and +1, and that it
is zero when X and Y are independent. The sample correlation coefficient,
r, introduced in section 8.9 is a point estimate of ρ, and Example 7
shows how to test the hypothesis H₀: ρ = 0.
Given particular values for the five parameters, μₓ, σₓ, μ_y, σ_y and ρ,
we can compute f(x, y) at all values of (x, y). This will give a three-dimensional
surface which has a maximum at (μₓ, μ_y) and which
decreases down to zero in all directions. The total volume under the
surface is one. The points (x, y) for which f(x, y) is a constant form an
ellipse with centre at (μₓ, μ_y). The major axes of these ellipses have a
positive slope when ρ is positive and a negative slope when ρ is negative.
In the special case when ρ = 0 the major axes of the ellipses are parallel
to the x-axis if σₓ > σ_y and parallel to the y-axis if σₓ < σ_y.
An important property of the bivariate normal distribution is that
the conditional distribution of Y, for a given value x of X, is normal
with mean

μ_y|x = μ_y + ρ(σ_y/σₓ)(x − μₓ)

and variance

σ²_y|x = σ_y²(1 − ρ²).

This important result, which will not be proved here, says that the
conditional means of the y-distributions lie on the straight line

y = μ_y + ρ(σ_y/σₓ)(x − μₓ),



which goes through the point (μₓ, μ_y) and which has a slope of ρσ_y/σₓ.
Thus the regression curve of y on x is a straight line.
This regression line can be estimated by the method of least squares.
However since σ²_y|x is constant, these estimates will in fact be maximum
likelihood estimates. The least squares estimate of the slope of the
regression line is given by

â₁ = Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(xᵢ − x̄)²   (see section 8.2)
   = r s_y/sₓ,   from equation 8.5.

This quantity is in fact the intuitive estimate of ρσ_y/σₓ, obtained by
substituting the sample estimates of ρ, σ_y and σₓ.
For a bivariate normal distribution it can also be shown that the
conditional distribution of X, for a given value y of Y, is normal with
mean

μ_x|y = μₓ + ρ(σₓ/σ_y)(y − μ_y).

Thus the regression curve of x on y is also a straight line, and the
least squares estimates of the parameters of this line can again be
obtained by substituting the sample estimates of μₓ, μ_y, σₓ, σ_y and ρ.

8.12 Interpretation of the correlation coefficient


We will conclude this chapter with a few general remarks about the
correlation coefficient. Firstly it is worth emphasizing again that it


Figure 51 Data for which the correlation coefficient should not be calculated



should only be calculated when the relationship between two random
variables is thought to be linear. If the scatter diagram indicates a
non-linear relationship, as in Figure 51, then the correlation coefficient
will be misleading and should not be calculated. The data in Figure 51
would give a value of r close to zero even though the variables are
clearly dependent.
It is also important to realise that a high correlation coefficient
between two variables does not necessarily indicate a causal relation­
ship. There may be a third variable which is causing the simultaneous
change in the first two variables, and which produces a spuriously high
correlation coefficient. In order to establish a causal relationship it is
necessary to run a carefully controlled experiment. Unfortunately it is
often impossible to control all the variables which could possibly be
relevant to a particular experiment, so that the experimenter should
always be on the lookout for spurious correlation (see also section
10.4).

Exercises
1. The following measurements of the specific heat of a certain chemical
were made in order to investigate the variation in specific heat with
temperature.

Temperature °C   0     10    20    30    40

Specific heat    0.51  0.55  0.57  0.59  0.63

Plot the points on a scatter diagram and verify that the relationship
is approximately linear. Estimate the regression line of specific heat
on temperature, and hence estimate the value of the specific heat when
the temperature is 25°C.

2. When the values of the controlled variable are equally spaced, the
calculation of the regression line can be considerably simplified by
coding the data in integers symmetrically about zero. When the values
of the controlled variable sum to zero, the estimates of the regression
coefficients are given by

â₁ = Σxᵢyᵢ/Σxᵢ²,
â₀ = ȳ.

For example, with five equally spaced values of the controlled variable
the coded values could be −2, −1, 0, +1, +2; with six values the
coded measurements could be −5, −3, −1, +1, +3, +5.
Use this technique to find the regression line in the following case.

Output (1000 tons)  IM    12.3  13.7  14.6  15.6

Year                1960  1961  1962  1963  1964

Estimate the output of the company in 1965.

3. The following are the measurements of the height and weight of


ten men.

Height (inches) 63 71 72 68 75 66 68 76 71 70

Weight (pounds) 145 158 156 148 163 155 153 158 150 154

(a) Calculate the correlation coefficient and show that it is significantly


different from zero.
(b) Find the linear regression of height on weight.
(c) Find the linear regression of weight on height.

4. In this example we have coded measurements on a dependent


variable y, and two controlled variables x₁ and x₂.

Test    y    x₁   x₂
 1     1.6    1    1
 2     2.1    1    2
 3     2.4    2    1
 4     2.8    2    2
 5     3.6    2    3
 6     3.8    3    2
 7     4.3    2    4
 8     4.9    4    2
 9     5.7    4    3
10     5.0    3    4

Find the linear regression of y on x₁ and x₂.



References

A comprehensive discussion of regression may be found in many books such as those
by Chatterjee and Price (1977), Weisberg (1985), Wetherill (1981, 1986) and Draper
and Smith (1981).

Chatterjee, S., and Price, B. (1977), Regression Analysis by Example, Wiley.
Draper, N. R., and Smith, H. (1981), Applied Regression Analysis, 2nd edn, Wiley.
Fisher, R. A., and Yates, F. (1963), Statistical Tables for Biological, Agricultural
and Medical Research, 6th edn, Oliver & Boyd.
Pearson, E. S., and Hartley, H. O. (1966), Biometrika Tables for Statisticians,
3rd edn, Cambridge University Press.
Weisberg, S. (1985), Applied Linear Regression, 2nd edn, Wiley.
Wetherill, G. B. (1981), Intermediate Statistical Methods, Chapman and Hall.
Wetherill, G. B. (1986), Regression Analysis with Applications, Chapman and Hall.

Addendum

The method of multiple regression is one of the most widely used (and misused)
techniques in the field of statistics. This addendum is to warn the reader of the dangers of
the technique. In the text we have assumed that the independent variables can be
controlled, and suggested that the values of these variables should preferably be chosen
to be orthogonal. However the technique is often used when the so-called independent
variables are not independent at all but highly correlated. For example, in economics
one might try to predict inflation from changes in wages, prices, public spending, etc. In
this sort of situation, the problem may become 'ill-conditioned', the coefficients in the
fitted equation cease to have any real meaning, and the fitted model may be spurious.
Unfortunately this drawback is not always recognised and many regression equations
have been reported which appear to give a good fit to a given set of data but which have
poor predictive performance.
Another possible danger in multiple regression is the temptation to include a large
number of regressor variables which appear to improve the fit for a given set of data but
which actually give a spurious fit in that the fitted model has poor predictive
performance. The author has even seen scientists trying to fit more variables than there
were observations! As a crude rule-of-thumb, I generally suggest that the number of
variables should not exceed one quarter of the number of observations and should
preferably not exceed about 4 or 5.
After fitting a multiple regression, it sometimes arises that one or more of the regressor
variables contribute little or nothing to the accuracy of the predictions of the response
variable. In this case we want to select a subset of the x-variables to include in the fitted
regression equation. This can be done by a variety of methods including backward
elimination (where the least important variable is successively removed until all the
remaining variables are significant), and forward selection (where the procedure begins
with no x-variables included and the most important variable is successively added until
the next variable is not significant) - see, for example, Draper and Smith (1981).

Part three
Applications

He had forty-two boxes, all carefully packed,
With his name painted clearly on each:
But, since he omitted to mention the fact,
They were all left behind on the beach.
Chapter 9
Planning the experiment

The next three chapters will be concerned with the design and analysis
of experiments. This chapter, entitled the planning of experiments,
deals with preliminary considerations, particularly that of finding the
precision of the response variable. Chapter 10 covers the design and
analysis of comparative experiments, while Chapter 11 deals with the
design and analysis of factorial experiments.

9.1 Preliminary remarks


Probably more experimental effort is wasted because problems have
been poorly defined than for any other reason. When designing an
experiment the first task of the engineer or scientist is to study the
physical aspects of the experiment to the point where he is confident
that there is no relevant knowledge of which he is not aware. The next
task is to make a precise definition of the problem. Many people
underestimate the time required to accomplish these two tasks.
Experience suggests that although the final experimental programme
may be completed in less than a month, it may take as long as a year
to reach an understanding of the problem and false starts are to be
expected. The engineer should always be prepared to do more research,
more analysis and more literature search before plunging into a poorly
defined test programme.
Furthermore a scientist with little knowledge of statistics should not
attempt to design an experiment by himself but should call in the
services of a statistician. In industry the technologist who has studied
statistics with some thoroughness will often act as consultant to other
engineers and scientists. The following remarks are addressed to such
a statistician. Firstly he should make it quite clear that his objective is
to assist the experimenter and not replace him. Secondly he should
be prepared to act as a restraining influence. When an experimental
result is disappointing, the new invention malfunctions, or an attractive
hypothesis is rejected by a significance test, there is an urgent desire to
‘test something’, to ‘get going’. The statistician must always be prepared



to encourage the experimenter to make more thorough preparations.
Thirdly there is the problem of what to do with prior data. The
statistician is often consulted only after considerable testing has been
completed in vain without solving the problem. Although this data
may be worthless due to the absence of good statistical procedure, it
may contain some information and it would be stupid to discard it
without a glance. An inspection may reveal that some worthwhile
analysis has not been done.

9.2 Measurements
Once the problem has been well defined, the next task is to determine
the quantities to be measured. The response variables in which we are
really interested cannot always be measured directly but may be a
function of the measurements. These measurements are basic quantities
such as length, weight, temperature or the resistance of a thermo­
couple. But the response variables may be more complex quantities
such as efficiency, fuel consumption per unit of output etc. If the
functional relationship between the response variable and the measure­
ments is known, the measurements can be transformed by this function
to obtain the required quantities.
Unfortunately the measurements are almost certain to contain an
error component. By now you should be convinced that no two experi­
ments are identical except in an approximate sense. There are an
infinite number of reasons why small variations will occur in repeated
tests. The power supply changes, the technician is tired, weather
conditions vary; and so on. If the variation is small the experiment
is said to be precise or repeatable. Precision is expressed quantitatively
as the standard deviation of results from repeated trials under identical
conditions. This precision is often known from past experiments;
however, if it is not known, then one of the experimental test objectives
should be to obtain an estimate of it.
Another useful property of ‘good’ measurements is that they should
be accurate or free from bias. An experiment is said to be accurate or
unbiased if the expected value of the measurement is equal to the true
value. In other words there is no systematic error. Measuring instru­
ments should be checked for accuracy by relating them to some defined
standard, which is accepted, for example, by the National Bureau of
Standards in the U.S.A.
If the measured values are plotted against the standard reference
values, a calibration curve results. This curve enables us to estimate the
true value by multiplying the measured value by a correction factor.
Calibration problems are discussed in Mandel (1964). It may not be
possible to eliminate all bias entirely, but it can often be made as small
as we wish. It is customary to estimate the bias of an instrument in the
form of an upper bound (‘not greater than —’). This is often an educated
guess based on experience.

Figure 52 Precision and accuracy: (a) accurate and precise; (b) not accurate
but precise; (c) accurate but not precise; (d) not accurate, not precise

It is clear from the above that ‘good’ measurements have the same
properties as ‘good’ estimators, namely precision and accuracy (see
section 6.5). Four types of distributions of measurements are illustrated
in Figure 52. Because there is variation in the measurements, there will
also be variation in the corresponding values of the response variable.
The effect of the functional relationship with the measurements on the
error in the response variable is sometimes called the propagation of
error. The size of the error in the response variable will depend on the
precision and accuracy of the measurements. In the following sections
we will usually make the assumption that the measurements are
accurate. This is reasonable provided that all meters are recalibrated at
regular intervals.

9.3 The propagation of error
We have seen that the response variable is often some function of the
observed measured variables. Our primary concern is to find the
distribution of the response variable so that, for example, we can
judge if the variation is unacceptably large. This section will be concerned
with methods for finding the distribution of the response
variable (see also Box et al., 1978, Section 17.2).

9.3.1 Linear functions


The simplest case occurs when the response variable, z, is the sum of
several independent measured variables:

z = x_1 + x_2 + \dots + x_n.

Denote the true values of these measured variables by \mu_1, \dots, \mu_n. If
the measurements are accurate and have respective precisions
\sigma_1, \sigma_2, \dots, \sigma_n, then, using the results of section 6.2, we have

E(z) = \mu_1 + \mu_2 + \dots + \mu_n = \mu_z

and

variance(z) = \sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2 = \sigma_z^2.

More generally, the response variable may be any linear function
of the measured variables,

z = a_1 x_1 + a_2 x_2 + \dots + a_n x_n,

where the a_i are constants. Then we have

E(z) = a_1\mu_1 + a_2\mu_2 + \dots + a_n\mu_n = \mu_z

and

variance(z) = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \dots + a_n^2\sigma_n^2 = \sigma_z^2.

Thus having found the mean and variance of the distribution of z, the
next question that arises is to find the type of distribution. The following
important result is stated without proof. If the response variable is a
linear function of several measured variables, each of which is normally
distributed, then the response variable will also be normally distributed.
So given observations on the measured variables, the corresponding
value of the response variable can be found. This value, denoted by z_0,
is a point estimate of \mu_z. Since the response variable is normally distributed,
the 100(1 - \alpha) per cent confidence interval for \mu_z is given by
z_0 \pm z_{\alpha/2}\sigma_z, where z_{\alpha/2} is the appropriate percentage point of the
standard normal distribution.

Tolerances. One important application of the above results is in the
study of tolerances, where the dimensions of manufactured products
have to be carefully controlled. The tolerance limits of a particular
dimension are defined to be those values between which nearly all the
manufactured items will lie. If measurements on this dimension are
found to be normally distributed with mean \mu and precision \sigma, then the
tolerance limits are usually taken to be \mu \pm 3\sigma (see also section 12.9).
Then only 0.27 per cent of all items can be expected to fall outside the
tolerance limits.
It often happens that a product is made by assembling several
parts and so one dimension of the product may be the sum of the
dimensions of the constituent parts. In this case the above formulae
can be used to find the tolerance limits of the dimension of the product.

Example 1
An item is made by adding together three components whose lengths
are x_1, x_2 and x_3. The over-all length is denoted by z:

z = x_1 + x_2 + x_3.

Figure 53 Three components of lengths x_1, x_2 and x_3 assembled to give the over-all length z

The tolerance limits (in inches) of the lengths of the three components
are known to be 1.960 \pm 0.030, 0.860 \pm 0.030 and 1.865 \pm 0.015, respectively.
Thus the respective precisions of x_1, x_2 and x_3 are 0.010,
0.010 and 0.005 in.



If the lengths of the three components are independently normally
distributed, then the over-all length is also normally distributed with

E(z) = 1.960 + 0.860 + 1.865 = 4.685

and

variance(z) = \sigma_z^2 = 0.010^2 + 0.010^2 + 0.005^2 = 0.000225,

giving \sigma_z = 0.015.

Thus the tolerance limits for the over-all length are 4.685 \pm 0.045.
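These calculations are easily checked by computer. The following short program (our sketch, not part of the original text; Python is used for all the illustrations added in this chapter) adds the component variances and prints the combined tolerance limits.

import math

# Half-widths of the component tolerance limits; precision = half-width / 3
half_widths = [0.030, 0.030, 0.015]
sigmas = [h / 3 for h in half_widths]             # 0.010, 0.010, 0.005

mean_z = 1.960 + 0.860 + 1.865                    # 4.685
sigma_z = math.sqrt(sum(s ** 2 for s in sigmas))  # 0.015
print("tolerance limits: %.3f +/- %.3f" % (mean_z, 3 * sigma_z))

Running this prints 4.685 +/- 0.045, in agreement with the example.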

Example 2
Suppose that the tolerance limits for the over-all length, which were
calculated in Example 1, were judged to be too wide. If the tolerance
limits for x_2 and x_3 cannot be narrowed, find reduced tolerance limits
for x_1 so that the over-all tolerance limits are 4.685 \pm 0.036.
Denote the revised precision of x_1 by \sigma_1. The revised precision of the
over-all length is given by 0.036/3 = 0.012. Thus we have

0.012^2 = \sigma_1^2 + 0.010^2 + 0.005^2,

giving \sigma_1 = 0.0044.

The revised tolerance limits for x_1 are 1.960 \pm 0.013. This, of course,
means that a considerable improvement must be made in the manufacturing
standards of this component in order to achieve the required
precision.

9.3.2 Non-linear functions


Thus far we have considered the case where the response variable
is a linear function of the measured variables. We now turn our attention
to non-linear functions. For simplicity let us begin by assuming that
the response variable, z, is a known function of just one measured
variable, x, that is, z = f(x), and let us denote the true value of x by \mu.
If the measurements are accurate and have precision \sigma, then successive
measurements will form a distribution with mean \mu and standard
deviation \sigma. This distribution is sometimes called the error distribution
of x.



The true value of the response variable is given by \mu_z = f(\mu). If all
the measurements are transformed by the given function, then these
transformed values form the error distribution of the response variable.
We would like to answer three questions about this distribution.
Firstly we would like to know if the mean value of this distribution is
equal to \mu_z. In other words, given a measurement x, we would like to
know if f(x) is an unbiased estimate of f(\mu). Secondly we would like to
find the standard deviation of this distribution. In other words we
would like to find the precision of the response variable. Thirdly, if x
is normally distributed, we would like to find the distribution of z.
These questions can be answered in the following way.
Provided that f(x) is a continuous function near \mu, it can be expanded
as a Taylor series about the point x = \mu:

f(x) = f(\mu) + (x - \mu)(df/dx) + (1/2)(x - \mu)^2(d^2f/dx^2) + R_3,

where df/dx and d^2f/dx^2 are evaluated at x = \mu. R_3 is the remainder
term, which can be disregarded if higher derivatives are small or zero
and the coefficient of variation of x is reasonably small.
The above equation can be rewritten in the equivalent form

z = \mu_z + (x - \mu)(dz/dx) + (1/2)(x - \mu)^2(d^2z/dx^2).

Then

E(z) \approx \mu_z + E(x - \mu)(dz/dx) + (1/2)E[(x - \mu)^2](d^2z/dx^2).

But

E(x - \mu) = 0 and E[(x - \mu)^2] = \sigma^2,

so that

E(z) \approx \mu_z + (1/2)\sigma^2(d^2z/dx^2)_\mu.



The expression (1/2)(d^2z/dx^2)_\mu\sigma^2 is the bias resulting from using f(x) as
an estimate of \mu_z = f(\mu). Fortunately this bias term will often be small or
zero, in which case we have

E[f(x)] \approx f(\mu)

or

E(z) \approx \mu_z.

Example 3
The measurement x, which is an unbiased estimate of \mu, is transformed
by the equation z = \sqrt{x}. Find the bias in using z as an estimate of \sqrt{\mu}.

dz/dx = 1/(2\sqrt{x})    and    d^2z/dx^2 = -1/(4x^{3/2}).

Thus E(z) \approx \mu_z - \sigma^2/(8\mu^{3/2}), where \mu_z = \sqrt{\mu}.

Hence E(z)/\mu_z \approx 1 - \sigma^2/(8\mu^2) = 1 - (1/8)(\sigma/\mu)^2.

Thus the percentage bias in estimating \sqrt{\mu} can be calculated for different
values of the coefficient of variation of x, \sigma/\mu.

\sigma/\mu    Percentage bias
0.1        0.125%
0.2        0.5%
0.3        1.125%

Thus even when the coefficient of variation of x is as high as 30 per cent,
the percentage bias in the transformed values is only 1.125 per cent.
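The bias formula can also be checked by simulation. The sketch below (ours, not the book's; it assumes normally distributed measurements and uses the arbitrary value \mu = 100) compares the predicted and simulated percentage bias.

import numpy as np

rng = np.random.default_rng(0)
mu = 100.0                           # arbitrary true value for the check
for cv in (0.1, 0.2, 0.3):
    sigma = cv * mu
    x = rng.normal(mu, sigma, size=1_000_000)
    z = np.sqrt(np.abs(x))           # abs() guards against rare negative draws
    predicted = 100 * cv ** 2 / 8    # per cent, from the formula above
    simulated = 100 * (np.sqrt(mu) - z.mean()) / np.sqrt(mu)
    print(cv, round(predicted, 3), round(simulated, 3))

For \sigma/\mu = 0.1 and 0.2 the simulated bias agrees closely with the predicted values; at 0.3 the neglected remainder term begins to show, which is one reason why the approximation is recommended only for fairly small coefficients of variation.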
In the remainder of this section we will assume that the bias term,
(1/2)(d^2z/dx^2)_\mu\sigma^2, is relatively small, so that we can write

z \approx \mu_z + (x - \mu)(dz/dx).

The important point to notice here is that this is now a linear function
of x. In other words we have approximated f(x) by a linear function
in the area of interest. This means that if x is normally distributed,



then z will also be approximately normally distributed. Furthermore
the precision of z can be found as follows. Since E(z) \approx \mu_z and

z - \mu_z \approx (dz/dx)(x - \mu),

we have

variance(z) = \sigma_z^2 = E[(z - \mu_z)^2] \approx (dz/dx)^2\sigma^2.

Hence

\sigma_z = (precision of the response variable) \approx |dz/dx|\,\sigma,

where the derivative is evaluated at x = \mu.

Example 4
It is instructive to check the above formula when f(x) is a linear
function. Let z = ax + b; then dz/dx = a. Therefore

variance(z) = a^2\sigma^2;

this was the result obtained earlier.

Example 5
The measurement x, which is an unbiased estimate of \mu, is transformed
by the equation z = \sqrt{x}. Find the coefficient of variation of z in terms
of the coefficient of variation of x.

dz/dx = 1/(2\sqrt{x}),

variance(z) \approx \sigma^2/(4\mu), where \sigma is the precision of x.

(coefficient of variation of z) = \sigma_z/\mu_z
                                = [\sigma/(2\sqrt{\mu})]/\sqrt{\mu}
                                = \sigma/(2\mu)
                                = (1/2)(coefficient of variation of x).


It often happens that several measurements are made on the measured
variable. Then it is better to find the average measurement, \bar{x}, and
transform this value, rather than transforming each measurement and
finding the average of the transformed measurements. This procedure
will reduce the bias in the estimation of the true value of the response
variable. If there are n observations, the precision of \bar{x} is given by
\sigma/\sqrt{n}, and then the precision of the response variable will be given by
(df/dx)_{\bar{x}}\,\sigma/\sqrt{n}.

Example 6

Four measurements, 4.01, 4.08, 4.14 and 4.09, are made on a measured
variable whose unknown true value is denoted by \mu. These measurements
are known to be accurate and to have precision \sigma = 0.06. Find
a 95 per cent confidence interval for \sqrt{\mu}.
The observed average measurement, denoted by \bar{x}_0, is equal to 4.08.
Thus a point estimate of \mu_z = \sqrt{\mu} is given by

\sqrt{\bar{x}_0} = 2.02.

The precision of this estimate depends on (dz/dx)_\mu, where z = \sqrt{x}.
Unfortunately this cannot be evaluated exactly as we do not know the
true value of \mu. However it is sufficiently accurate to evaluate (dz/dx)
at the point \bar{x}_0. Thus the precision of the estimate of \sqrt{\mu} is given by

\sigma/(2\sqrt{n\bar{x}_0}) = 0.0075.

Thus a 95 per cent confidence interval for \sqrt{\mu} is given by
2.02 \pm 1.96 \times 0.0075.
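The arithmetic of this example is reproduced by the following sketch (ours, not the book's).

import math

x = [4.01, 4.08, 4.14, 4.09]
sigma = 0.06                             # known precision of one measurement
n = len(x)
xbar = sum(x) / n                        # 4.08

z0 = math.sqrt(xbar)                     # point estimate of sqrt(mu), 2.02
se = sigma / (2 * math.sqrt(n * xbar))   # (dz/dx) * sigma/sqrt(n) = 0.0075
print("95%% CI: %.3f to %.3f" % (z0 - 1.96 * se, z0 + 1.96 * se))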
More generally, we can consider the situation where the response
variable, z, is a function of several independent measured variables,
x_1, \dots, x_n:

z = f(x_1, x_2, \dots, x_n).

Let \mu_1, \mu_2, \dots, \mu_n denote the true values of the measured variables.
Assume that all the measurements are accurate and have precisions
\sigma_1, \sigma_2, \dots, \sigma_n respectively. The true value of z is given by
\mu_z = f(\mu_1, \mu_2, \dots, \mu_n). As before, we can expand f(x_1, x_2, \dots, x_n) in a Taylor
series about the point (\mu_1, \mu_2, \dots, \mu_n). If the measured variables have
fairly small coefficients of variation, and if the second order partial
derivatives (\partial^2 z/\partial x_i^2) are small or zero at (\mu_1, \mu_2, \dots, \mu_n), then we find

E(z) \approx \mu_z

and

\sigma_z^2 \approx (\partial z/\partial x_1)^2\sigma_1^2 + (\partial z/\partial x_2)^2\sigma_2^2 + \dots + (\partial z/\partial x_n)^2\sigma_n^2,

where the partial derivatives are evaluated at (\mu_1, \dots, \mu_n). When these
true values are unknown it is usually sufficiently accurate to estimate
\sigma_z^2 by evaluating the partial derivatives at the observed values of the
measured variables.
The above formulae are strictly accurate only for linear functions,
but can also be used for functions involving products and quotients
provided that the coefficients of variation of the measured variables
are less than about 15 per cent.

Example 7
Assume that x and y are independent measured variables with means
\mu_x, \mu_y and precisions \sigma_x, \sigma_y respectively. Also assume that \sigma_x/\mu_x and
\sigma_y/\mu_y are less than about 0.15. Calculate the expected value and precision
of the response variable, z, in the following cases:

(a) A linear function

z = ax + by, where a, b are positive or negative constants.

\partial z/\partial x = a,    \partial z/\partial y = b.

All higher partial derivatives are zero. Hence

E(z) = a\mu_x + b\mu_y,
variance(z) = a^2\sigma_x^2 + b^2\sigma_y^2.

These results are the same as those obtained earlier.

(b) Products

z = xy,

E(z) \approx \mu_x\mu_y,

\partial z/\partial x = y,    \partial z/\partial y = x.

If these are evaluated at (\mu_x, \mu_y), we find

\sigma_z^2 \approx \mu_y^2\sigma_x^2 + \mu_x^2\sigma_y^2.

Dividing through by \mu_z^2 = \mu_x^2\mu_y^2 gives

\sigma_z^2/\mu_z^2 \approx \sigma_x^2/\mu_x^2 + \sigma_y^2/\mu_y^2

or

C_z^2 \approx C_x^2 + C_y^2,

where C_x, C_y and C_z are the coefficients of variation of x, y and z
respectively.

(c) Ratios

z = x/y,

E(z) \approx \mu_z = \mu_x/\mu_y,

\partial z/\partial x = 1/y,    \partial z/\partial y = -x/y^2.

If these are evaluated at (\mu_x, \mu_y) we find

\sigma_z^2 \approx \sigma_x^2/\mu_y^2 + \mu_x^2\sigma_y^2/\mu_y^4,

so that again

C_z^2 \approx C_x^2 + C_y^2.
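These approximations are easy to verify by simulation. The sketch below (ours; the means and coefficients of variation are arbitrary choices) checks the product rule C_z^2 \approx C_x^2 + C_y^2.

import numpy as np

rng = np.random.default_rng(1)
mu_x, mu_y, cx, cy = 50.0, 20.0, 0.05, 0.10    # arbitrary example values
x = rng.normal(mu_x, cx * mu_x, size=1_000_000)
y = rng.normal(mu_y, cy * mu_y, size=1_000_000)
z = x * y

print(z.std() / z.mean())              # simulated coefficient of variation
print(np.sqrt(cx ** 2 + cy ** 2))      # predicted value, about 0.112

The two values agree to about three decimal places, as expected when the coefficients of variation are well below 15 per cent.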



We are now in a position to find the precision of the response variable
if we are given the precisions of the measured variables. The latter are
usually known from past experiments; but if they are not known then
they should be estimated by taking repeated measurements under
controlled conditions.
Should the response error be found to be unacceptably large there
are several possible courses of action. Firstly an attempt could be made
to improve the precision of the measured variables, either by improving
the instrumentation or by improving the design of the system. Secondly
it may be possible to use a different function to calculate the response
from the measurements. Thirdly it may be possible to combine the
information from several imprecise measurements to obtain an
acceptable estimate of the response variable. This last method is
possible when the system includes some redundant meters. The re­
mainder of the chapter will be concerned with obtaining the most
precise estimate of the response variable when this is the case.

9.4 Improving precision with series and parallel arrangements


A common way of improving the precision of a particular measurement
is to add redundant meters in series and parallel arrangements. For
example, suppose that we wanted to estimate the total flow through a
pipe-line. Instead of just taking one observation, we could make three
separate measurements, x_1, x_2 and x_3, on three meters in series, as
shown in Figure 54. If the meters have the same precision \sigma and are
unbiased, then the best unbiased estimate of the total flow is the average
of the three measurements,

z = (x_1 + x_2 + x_3)/3.

Figure 54 Meters in series

The precision of z is given by \sigma/\sqrt{3}, which is of course smaller than the
precision of the individual measurements.



Alternatively, if there are different size meters available, or if the
precision is a function of the flow through the meter, then a parallel
arrangement, or a combination of meters in series and parallel, may be
used. Two such arrangements are illustrated in Figure 55.

For the parallel arrangement, the total flow is estimated by

z = x_1 + x_2 + x_3.
But for the series-parallel arrangement it is not clear what the best
estimate of the total flow should be. Nor is it clear what the best
estimate of the total flow would be in the series arrangement if the
precisions of the three meters were not the same. Thus in the next
section we will outline a general method for combining different
measurements to obtain the best estimate of the response variable. This
method can be used in the above type of situation or wherever informa­
tion is available from several sources.

9.5 Combining dissimilar estimates by the method of least squares


We have seen that the true value of the response variable can often
be estimated in several different ways using the information obtained
from the different measuring instruments. A common procedure is to
take the reading from the most precise instrument and use the redundant
meters as checks. However we can combine all the information and
obtain a single estimate by applying the method of least squares. This
estimate will be better than the individual estimates because it will be
more precise.



We have already described in Chapter 8 how to fit a straight line
to data by the method of least squares. It is a relatively easy matter
to adapt this method in order to estimate parameters in other situations.
For example, suppose that n observations x_1, x_2, \dots, x_n are made on
an unknown quantity, \mu. We have

x_i = \mu + e_i    (i = 1, \dots, n),

where e_1, e_2, \dots, e_n are the measurement errors.

The sum of squared errors is given by

S = \sum_{i=1}^{n} (x_i - \mu)^2.

The least squares estimate of \mu is obtained by minimizing S with respect
to \mu; this is achieved by putting dS/d\mu = 0. Then we find

\hat{\mu} = \sum_{i=1}^{n} x_i / n = \bar{x}.

Thus the sample mean is also the least squares estimate of \mu.
The least squares method can be used whenever we have a set of
measurements, each of which has an expected value which is a function
of the unknown parameters. The sum of squared differences between
the measurements and their expected values is formed, and then
minimized with respect to the unknown parameters. The basic assump­
tion necessary to apply the least squares method in this way is that the
precision of the observations should be the same. Later on we shall
see how to modify the method when the precisions are unequal. The
procedure is best illustrated with an example.

Example 8
Suppose that we want to determine the flow in a system with one input
and two output streams, all three being metered. This situation is
similar to the series-parallel arrangements depicted in Figure 56.

Figure 56 A system with one input stream and two output streams

We have four pieces of information: the three meter readings, m_1,
m_2 and m_3, all subject to precision error, plus the knowledge that the
input is equal to the sum of the output streams. Let

\mu_1 = (true flow in output 1),
\mu_2 = (true flow in output 2),

then \mu_1 + \mu_2 = (true input flow).

The relations between the true values and the measurements are given
by

m_1 = \mu_1 + e_1,
m_2 = \mu_2 + e_2,
m_3 = \mu_1 + \mu_2 + e_3,

where e_1, e_2, e_3 are the measurement errors associated with the three
meters. We begin by assuming that the meters have the same precision,
so that

var(m_1) = var(m_2) = var(m_3) = \sigma^2.

The method of least squares can then be used to estimate \mu_1 and \mu_2.
For particular values of \mu_1 and \mu_2, the sum of squared errors is
given by

S = (m_1 - \mu_1)^2 + (m_2 - \mu_2)^2 + (m_3 - \mu_1 - \mu_2)^2.

The least squares estimates of \mu_1 and \mu_2 are obtained by minimizing
this expression with respect to \mu_1 and \mu_2:

\partial S/\partial\mu_1 = -2(m_1 - \mu_1) - 2(m_3 - \mu_1 - \mu_2),

\partial S/\partial\mu_2 = -2(m_2 - \mu_2) - 2(m_3 - \mu_1 - \mu_2).

Equating the two partial derivatives to zero and solving the two simultaneous
equations, we get

\hat{\mu}_1 = (2m_1 - m_2 + m_3)/3,
\hat{\mu}_2 = (2m_2 - m_1 + m_3)/3,
\hat{\mu}_1 + \hat{\mu}_2 = (least squares estimate of total flow)
          = (m_1 + m_2 + 2m_3)/3.

If we assume that the measurement errors are independent, then the
precision of the least squares estimates can be found from

\sigma_{\hat{\mu}_1}^2 = (1/9)[4\sigma^2 + \sigma^2 + \sigma^2] = 2\sigma^2/3,

\sigma_{\hat{\mu}_2}^2 = (1/9)[4\sigma^2 + \sigma^2 + \sigma^2] = 2\sigma^2/3,

\sigma_{\hat{\mu}_1+\hat{\mu}_2}^2 = (1/9)[\sigma^2 + \sigma^2 + 4\sigma^2] = 2\sigma^2/3.

Thus, by using \hat{\mu}_1, \hat{\mu}_2 and \hat{\mu}_1 + \hat{\mu}_2 instead of m_1, m_2 and m_3 to estimate
the respective flow rates, the precision has been improved from \sigma to
\sqrt{(2/3)}\,\sigma in each case.
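In matrix terms this is an ordinary least squares problem, so the closed-form estimates can be checked numerically. In the sketch below (ours; the meter readings are invented for illustration) the design matrix expresses the expected value of each reading in terms of \mu_1 and \mu_2.

import numpy as np

m = np.array([10.1, 5.3, 15.7])        # hypothetical readings m1, m2, m3

# E(m1) = mu1, E(m2) = mu2, E(m3) = mu1 + mu2
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

mu_hat, *_ = np.linalg.lstsq(A, m, rcond=None)
print(mu_hat)                          # least squares estimates of mu1, mu2
print((2 * m[0] - m[1] + m[2]) / 3)    # closed form for mu1, as a check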

9.5.1 Unequal precision errors


We have seen how to apply the method of least squares when the
precision of the different observations is the same. However situations
often arise where the instruments are not equally precise and then the
method must be modified by giving more weight to the observations
with the smaller precision.
Let us assume that readings are taken from n instruments and that
\sigma_i is the precision of the ith instrument. Then a set of constants
P_1, P_2, \dots, P_n must be chosen so that

\sigma_1^2 P_1 = \sigma_2^2 P_2 = \dots = \sigma_n^2 P_n.

A convenient set of constants is defined by

P_i = \sigma_{min}^2 / \sigma_i^2    (i = 1, \dots, n),

where \sigma_{min}^2 is the square of the smallest precision. Thus the constant
corresponding to the most precise instrument is the largest and is equal to one.
measurement will depend on certain unknown parameters. In order to
obtain the least squares estimates, the difference between each measure­
ment and its expected value is squared, and then multiplied by the
appropriate P constant. The adjusted sum of squares is formed by
adding up these quantities, and the least squares estimates of the
unknown parameters found by minimizing the adjusted sum of squares.
The method is best illustrated with an example.

Example 9
Two chemists each make three measurements on the percentage of an
important chemical in a certain material. The observations are x_{11},
x_{12}, x_{13} and x_{21}, x_{22}, x_{23}. Suppose that both chemists make unbiased
measurements but that the variance of measurements made by the
second chemist is four times the variance of measurements made by
the first chemist. Find the least squares estimate of the true percentage
of the chemical.
Denote the true percentage of the chemical by \mu. Because the
variances are unequal we do not take the simple average of the six
observations. Instead we want to give more weight to the first three
observations.
The constants P_{1j}, P_{2j} are given by

P_{1j} = 1  (j = 1, 2, 3)    and    P_{2j} = 1/4  (j = 1, 2, 3).

The adjusted sum of squared deviations is given by

S = \sum_{j=1}^{3} (x_{1j} - \mu)^2 + (1/4)\sum_{j=1}^{3} (x_{2j} - \mu)^2,

dS/d\mu = -2\sum_{j=1}^{3} (x_{1j} - \mu) - (1/2)\sum_{j=1}^{3} (x_{2j} - \mu).

Putting this derivative equal to zero we obtain the least squares
estimate of \mu:

\hat{\mu} = \left[4\sum_{j=1}^{3} x_{1j} + \sum_{j=1}^{3} x_{2j}\right] / 15.

Thus four times as much weight is given to the first three observations
as to the other three observations.

Example 10
The system depicted in Figure 56 is similar to the system used to
measure the flow of liquid oxygen through a rocket engine. There are
three meters which measure the total flow and its two components,
pre-burner flow and main burner flow. The precisions of these three
meters are known to be the following:

pre-burner flow (m_1):    2.85 gallons per minute
main burner flow (m_2):  20.37 gallons per minute
total flow (m_3):        25.82 gallons per minute.

Denote the true flows through M_1, M_2 and M_3 by \mu_1, \mu_2 and \mu_1 + \mu_2
respectively, and the meter readings by m_1, m_2 and m_3. Then the
weighted sum of squares is given by

S = P_1(m_1 - \mu_1)^2 + P_2(m_2 - \mu_2)^2 + P_3(m_3 - \mu_1 - \mu_2)^2,

where

P_1 = (2.85)^2/(2.85)^2 = 1.0,
P_2 = (2.85)^2/(20.37)^2 = 0.0196,
P_3 = (2.85)^2/(25.82)^2 = 0.0122.

\partial S/\partial\mu_1 = -2P_1(m_1 - \mu_1) - 2P_3(m_3 - \mu_1 - \mu_2),

\partial S/\partial\mu_2 = -2P_2(m_2 - \mu_2) - 2P_3(m_3 - \mu_1 - \mu_2).

By equating these partial derivatives to zero we obtain two simultaneous
equations in \mu_1 and \mu_2 which can be solved to give

\hat{\mu}_1 = [m_1(P_1 P_2 + P_1 P_3) + P_2 P_3(m_3 - m_2)] / (P_1 P_2 + P_1 P_3 + P_2 P_3),

\hat{\mu}_2 = [m_2(P_1 P_2 + P_2 P_3) + P_1 P_3(m_3 - m_1)] / (P_1 P_2 + P_1 P_3 + P_2 P_3).

Note that if P_1 = P_2 = P_3 = 1, the precisions are equal and the
equations are the same as those obtained in Example 8.
Substituting the values of P_1, P_2 and P_3, we find

\hat{\mu}_1 = 0.9925 m_1 + 0.0075(m_3 - m_2),
\hat{\mu}_2 = 0.6193 m_2 + 0.3808(m_3 - m_1),
\hat{\mu}_1 + \hat{\mu}_2 = 0.6118(m_1 + m_2) + 0.3883 m_3.

The precisions of these three estimates turn out to be

\sigma_{\hat{\mu}_1} = 2.84,
\sigma_{\hat{\mu}_2} = 16.03,
\sigma_{\hat{\mu}_1+\hat{\mu}_2} = 16.09.

The precision of the least squares estimate of \mu_1 is only fractionally
less than the precision of m_1, but the precision of the least squares
estimates of \mu_2 and (\mu_1 + \mu_2) is substantially less than the precisions
of m_2 and m_3, and this makes the least squares procedure worthwhile.
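The same estimates can be obtained numerically as a weighted least squares fit. In this sketch (ours; the readings are invented) each row of the problem is scaled by 1/\sigma_i, which is equivalent to using the P constants up to a common factor.

import numpy as np

sigma = np.array([2.85, 20.37, 25.82])   # the three meter precisions
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
m = np.array([100.0, 900.0, 1010.0])     # hypothetical readings m1, m2, m3

W = np.diag(1.0 / sigma)                 # row weights 1/sigma_i
mu_hat, *_ = np.linalg.lstsq(W @ A, W @ m, rcond=None)
print(mu_hat)    # first element equals 0.9925*m1 + 0.0075*(m3 - m2)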

Exercises
1. The measurement, x, is an unbiased estimate of \mu and has precision
\sigma. Expand z = x^2 in a Taylor series about the point x = \mu. Hence show
that, if the coefficient of variation of x is small, the expected value
of z = x^2 is given by \mu^2 + \sigma^2, and that the coefficient of variation of z is
approximately twice as large as that of x.
2. The three measured variables x_1, x_2 and x_3 have small independent
errors. The precisions are related by

\sigma_{x_1} = \sigma_{x_2} = 2\sigma_{x_3}.

The response variable, z, is given by

z = (x_1 + x_2)/3x_3.

Estimate the precision of z in terms of the precision of x_3 when the
measurements are x_1 = 2.37, x_2 = 3.1 and x_3 = 2.08.

Figure 57

3. A system has one input and three output streams. Let m_1, m_2, m_3
and m_4 denote the readings of the respective meters. Assuming that
the meters are unbiased and have equal precision, find the least squares
estimate of the total flow.
4. Two flowmeters are connected in series. Let m_1, m_2 be the readings
of meters A, B respectively. Assuming that both meters are unbiased
and that the precision of meter A is \pm 3 lb/hr and that of meter B is
\pm 5 lb/hr, find the least squares estimate of the total flow.
5. The tolerance limits for the measured variables x and y are 1.06 \pm 0.12
and 2.31 \pm 0.15 respectively. Find the tolerance limits of the response
variable z = xy.

Figure 58

References
Box, G. E. P., Hunter, W. G., and Hunter, J. S. (1978), Statistics for
Experimenters, Wiley.
Mandel, J. (1964), The Statistical Analysis of Experimental Data, Interscience.
Topping, J. (1962), Errors of Observation and Their Treatment, 3rd edn, Chapman
and Hall.

Chapter 10
The design and analysis of experiments:
1. Comparative experiments
10.1 Some basic considerations in experimental design
The design and analysis of experiments is an extensive subject to which
numerous books have been entirely devoted (e.g. Box et al., 1978,
Davies, 1956, Cochran and Cox, 1957 and Cox, 1958). The problems of
design are of course inseparable from those of analysis and it is worth
emphasizing from the outset that, unless a sensible design is employed,
it may be very difficult or even impossible to obtain valid conclusions
from the resulting data. In the next two chapters we will try to illustrate
the basic principles of experimental design and try to provide an intro­
duction to the maze of designs which have been proposed. In addition
the main methods of analysis will be discussed, including the analysis
of variance. Nevertheless the scientist should still be prepared to seek
the advice of a statistician if the experiment is at all complicated, not
only regarding the analysis of the results but also to select the appropriate
design.
Much of the early work on the subject was connected with agricul­
tural experiments, such as comparing the yield from several varieties
of wheat. However so many other factors are involved, such as weather
conditions, type of soil, position in field, and so on, that unless the
experiment is carefully designed, it will be impossible to separate the
effects of the different varieties of wheat from the effects of the other
variables. The reader is referred to the pioneering work of Sir Ronald
Fisher (1951).
In general the techniques described in the next two chapters are
useful whenever the effects being investigated are liable to be masked by
experimental variation which is outside the control of the scientist.
The physicist or chemist working in a laboratory can often keep
unwanted variation to a minimum by working with pure materials and
by the careful control of variables other than those of direct interest.
In this case the methods described here may be of little value. However
for large scale industrial experiments complete control is impossible.
There is inevitably considerable variation in repeated experiments
because the raw materials vary in quality, and changes occur in the
time of day or other environmental conditions. Thus it is highly
desirable to use an experimental design which will enable us to separate
the effects of interest from the uncontrolled or residual variation. It is
also important to ensure that there is no systematic error in the results.
Important techniques for avoiding systematic error and for increasing
the precision of the results are:
(a) Randomization,
(b) Replication,
(c) Blocking,
(d) The analysis of covariance.
In Chapter 9 it was stated that the first step in planning an experi­
ment is to formulate a clear statement of the objectives of the test
program; the second step is to choose a suitable response variable.
We must then make a list of the factors which may affect the value of
the response variable. We must also decide how many observations
should be taken and what values should be chosen for each factor in
each individual test run.
In Chapter 11 we will consider experiments involving several different
factors but in this chapter we will concentrate our attention on com­
parative experiments of which the following is an example.
One stage of the turbo-jet cycle consists of burning fuel in the high
pressure discharge from the compressor. The efficiency of the burner is
of crucial importance and depends, among other things, on the burner
discharge temperature. Assume that we have been given the task of
designing an experiment to compare two new burners (B_1 and B_2)
with a standard burner (B_3).
A similar type of situation would occur if we wanted to compare
several different processes for hardening steel. The objective of the
experimental programme is to compare the burners in the first situation
and to compare the different processes in the second situation. Such
experiments are called comparative experiments. Generally we shall
refer to the burners, processes or whatever are being compared as
treatments. In any individual test run only one treatment is present.
Note that an individual test run is sometimes called a trial, and that
the term experiment is used to denote the whole set of trials and does
not refer to an individual test run.
The term experimental unit will be used to denote the object on
which a particular trial is carried out. In some experiments it is different
for each individual test run - this occurs when different drugs are
tried on different animals. In these cases we must decide how to assign
the different treatments to the different experimental units. In other
experiments several tests may be made on the same experimental unit -
this occurs when different burners are compared using the same engine.
In these cases the individual tests must be carried out sequentially and
then it is important to make a careful choice of the order in which the
tests should be performed.
Finally, the analysis of the experimental results should give us
estimates of the treatment and factor effects and also of the residual
variation. In order to carry out such an analysis, it is essential to write
down a mathematical model to describe the particular experimental
situation. This mathematical model will involve a number of assump­
tions concerning the relationship between the response variable and the
treatments or factors, and, before we go on to consider some experi­
mental designs in greater detail, it will be useful to become familiar
with some of the assumptions which are commonly made in formulating
mathematical models.

10.2 A mathematical model for simple comparative experiments


We begin by discussing the mathematical model for a simple compara­
tive experiment involving several different treatments. It is usually
reasonable to make what is called the assumption of additivity. This
says that the value of the response variable is equal to

(a quantity depending on the treatment used) + (an ‘error’ term depending
on environmental conditions, the experimental unit, etc.).

If in fact these effects are thought to be multiplicative, then we can
make a similar assumption by working with the logarithms of the
observations.
If there are c treatments, and n observations are made on each treatment,
then a suitable mathematical model for this comparative
experiment is given by the following:

x_{ij} = \mu_i + e_{ij}    (i = 1, \dots, c; j = 1, \dots, n),

where

x_{ij} = jth observation on treatment i,
\mu_i = average value of observations made on treatment i,
e_{ij} = random error.



In addition to the assumption of additivity, it is often necessary to
make further assumptions about the distribution of the random errors.
Firstly we assume that the errors are normally distributed with mean
zero. We further assume that the error variance does not depend on
the treatment involved; in other words the error variance is homogeneous.
This variance is usually denoted by \sigma^2. Finally we assume that successive
random errors are uncorrelated.
cessive random errors are uncorrelated.
Similar assumptions to the above will be made in most of the models
discussed in the next two chapters. Of course these assumptions
will not always be satisfied exactly. However they usually hold suffi­
ciently well to justify the methods of analysis which will be described
here. In section 10.9 we shall see how to check that the assumptions
are reasonably valid.

10.3 The number of replications


In a comparative experiment the number of observations on each
treatment is called the number of replications.
A fundamental principle of such experiments is that it is essential
to carry out more than one test on each treatment in order to estimate
the size of the experimental error and hence to get some idea of the
precision of the estimates of the treatment effects.
This can be illustrated by considering an experiment to compare two
methods of increasing the tensile strength of steel beams. Use method A
(the first treatment) on one beam and method B (the second treatment)
on a similar beam. Suppose the coded results (p.s.i. x 10^) are the
following:

A_1 = strength of beam treated by method A = 180,
B_1 = strength of beam treated by method B = 168.

Then we would suspect that method A was better than B. But we
really have no idea if the difference of 12 between A_1 and B_1 is due to
the difference in treatments or is simply due to the natural variability
of the strength of steel beams. Even if there is no difference between
the two treatments it is highly unlikely that the results will be exactly
the same.



Suppose the experiment is repeated and that the following results
are obtained:

A_2 = 176,    B_2 = 171.

Then an estimate of the difference between the two treatments is given
by

(1/2)(A_1 + A_2 - B_1 - B_2) = 8.5.    (10.1)

But now we also have two estimates of the residual variation, namely
(A_1 - A_2) and (B_1 - B_2). These can be combined in two ways to give

(1/2)(A_1 - A_2 + B_1 - B_2) = 0.5,    (10.2)

(1/2)(A_1 - A_2 - B_1 + B_2) = 3.5.    (10.3)

The treatment comparison, 8.5, is larger than the other two comparisons,
0.5 and 3.5, and this is a definite indication that treatment
A is better than treatment B. Since there are only two treatments we
can compare the treatment effect with the residual variation by means
of a two-tailed t-test, provided that it is reasonable to assume that
successive observations on a particular treatment are normally distributed.
It can be shown that the estimate of the residual standard
deviation is given by

s = \sqrt{[(A_1 - A_2)^2 + (B_1 - B_2)^2]/4} = 2.5.

The standard error of the estimate of the treatment difference is
given by

s\sqrt{(1/n_A) + (1/n_B)} = 2.5,

where n_A and n_B are the numbers of observations on A and B (here
both equal to 2).

Thus the value of the t-test statistic is 8.5/2.5 = 3.4. But the estimate
s is based on just two degrees of freedom (see below) and t_{0.025,2} = 4.30,
so that the result is not significant at the 5 per cent level. Nevertheless we
still suspect that there is a difference between the two methods and so it
would be advisable to make some more observations in order to improve
the power of the test.
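As a check, the same t statistic can be obtained from a standard two-sample t-test routine; the sketch below (ours, not the book's) uses scipy.

from scipy import stats

a = [180, 176]                     # method A
b = [168, 171]                     # method B
t, p = stats.ttest_ind(a, b)       # pooled-variance two-sample t-test
print(round(t, 2), round(p, 3))    # t = 3.4 on 2 d.f., p about 0.08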
It is difficult to give general realistic advice about the number of
replications required in a specific situation. The larger the number of
replications, the smaller will be the standard error of the difference
between two treatment means (\sqrt{2}\,\sigma/\sqrt{n}) and hence the larger the
power of the resulting significance test. Clearly if it is desired to detect
a ‘small’ difference between two treatments then a ‘large’ number of
replications will be required, and so it may be possible to determine
the required number of replications by considering the power of the
test. Some general comments on this problem are given in Cochran
and Cox (1957).
In the above example there were just two treatments. More generally,
three or more treatments may be compared and then it is necessary
to use the technique called the analysis of variance rather than a whole
series of t-tests. This technique is described in section 10.7.
At this point it is convenient to re-introduce the concept of degrees of
freedom, which, in experimental design, can be thought of as the number
of comparisons available. In the example discussed above, there were
four measurements A_1, A_2, B_1 and B_2, and these can be combined to
give exactly three independent comparisons as given in equations
10.1, 10.2 and 10.3. Any other comparison of the four observations, for
which the sum of the coefficients is zero, can be obtained from these
comparisons. For example (A_1 - A_2) can be obtained by adding 10.2
to 10.3. The number of independent comparisons is equal to the number
of degrees of freedom of the observations; in this case just three. This,
of course, is the same as the number of degrees of freedom of the
standard deviation of all four observations.
One of the degrees of freedom corresponds to the comparison
between the two treatments, namely equation 10.1. The two remaining
degrees of freedom correspond to the residual variation and this is
the number of degrees of freedom of the residual standard deviation, s.
The two comparisons are (A_1 - A_2) and (B_1 - B_2), or alternatively the
equations 10.2 and 10.3. The over-all degrees of freedom (abbreviated
d.f.) can be tabulated as follows.

              d.f.
Treatments     1
Residual       2
Total          3

More generally, if n observations are made on each of c treatments,
the total number of degrees of freedom can be broken down as follows.

              d.f.
Treatments    c - 1
Residual      c(n - 1)
Total         cn - 1



The notion of degrees of freedom will occur repeatedly throughout
the next two chapters, and it is to be hoped that the student is familiar
with the idea by now.
10.4 Randomization
This section is concerned with the most important basic principle of
good experimentation, namely randomization. We have already seen
that there is often a substantial amount of uncontrolled or residual
variation in any experiment, so no test can ever be repeated exactly.
This residual variation is caused by factors which have not been (or
cannot be) controlled by the experimenter. These factors are called
uncontrolled or nuisance factors. Some of these nuisance factors are
functions of time, place and the experimental units, and they are liable
to produce trends in the data quite apart from the effects we are looking
for. This would mean that successive error terms would be correlated
so that one of the assumptions mentioned in section 10.2 would not
apply. Fortunately there is a simple but powerful solution which
ensures as far as possible that there is no systematic error in the results.
This technique is called randomization and should be used wherever
possible.
If the tests are performed sequentially in time, then the order of the
tests should be randomized. If a different experimental unit is used in
each test, then the unit should be selected randomly. Both these
operations can be done by using a table of random numbers.
For example, let us look once again at the experiment for comparing
three burners for a turbo-jet engine. If several tests are made on the
same engine, the engine may gradually deteriorate, resulting in a
gradual decrease in performance. If we begin with several tests on B_1,
continue with several tests on B_2 and finish with several tests on B_3,
burner B_3 will then be at a serious disadvantage compared with B_1.
We may mistakenly attribute the decline in performance to the burners
used later in the test programme, rather than to the ageing of the engine.
It is easy to think of other factors which might produce similar trends
in the data. The meters and gauges may develop calibration shifts later
in the test programme; the fuel supply may be replenished with fuel with
slightly different properties; the engine performance is sensitive to the
ambient temperature and pressure and this effect can only be partially
removed by a correction factor, so there may be spurious differences
between day and night and from hot days to cold days. All these effects
may result in a built-in systematic error unless the experiment is
randomized. This is done as follows.



Suppose we decide to make six observations on each burner. Assign
numbers 1 to 6 to B_1, 7 to 12 to B_2 and 13 to 18 to B_3. Enter a table
of random numbers at any point and take numbers from the table by
moving in a pre-arranged direction, taking two digits at a time. If the
number is between 19 and 99 it is discarded. However if we obtain
say 16, then the first test is carried out on B_3, and if the second number
is say 04, the second test is carried out on B_1. And so on.
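On a computer the same randomization can be carried out directly; the sketch below (ours, not the book's) shuffles the eighteen test runs into a random order instead of using a random-number table.

import random

runs = ['B1'] * 6 + ['B2'] * 6 + ['B3'] * 6
random.shuffle(runs)    # a random order for the eighteen trials
print(runs)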
Considering another example, suppose that we wanted to compare
three anti-corrosion paints. Then it would be stupid to paint six
bridges in England with paint A, six bridges in Wales with paint B
and six bridges in Scotland with paint C. If paint A appeared to give the
best results we would be unable to tell if this was because paint A
was ‘best’, or simply because the weather in England is better than the
weather in Wales or Scotland. Instead the three treatments A, B and
C could be randomly assigned to the eighteen experimental units (the
bridges), or better still a randomized block experiment could be
employed as described in section 10.10.
Of course the use of randomization is sometimes restricted by the
experimental situation. It may force additional assembly and dis­
assembly of the machinery, and it may consume more time. The
engineering statistician must weigh the risk of error in his conclusions
against the ease of testing. The risk of error without some form of
randomization may be very great unless the experimental procedures
are extremely well established from countless experiments.
Randomization may sometimes be impossible. For example, the
significantly high correlation between smoking and lung cancer does
not necessarily prove a causal relationship between the two. One method
of proving or disproving this would be to carry out the following ran­
domized experiment. Select several thousand children and randomly
divide them into two groups, one to smoke and one not to smoke.
If the smoking group developed a significantly higher incidence of
lung cancer over the years, it would prove statistically that there really
is a causal relationship. But an experiment of this type is of course
completely out of the question. Therefore without randomization we
can only say that the case is not proven, but that it would still be sensible
to give up smoking.

10.5 The analysis of a randomized comparative experiment


Let us suppose that n observations are taken on each of c treatments.



The appropriate mathematical model is given by

x_{ij} = \mu_i + e_{ij}    (i = 1, \dots, c; j = 1, \dots, n).

We will call this model 10A. The resulting data can be tabulated as
follows.

Table 15

             Observations                  Total   Sample     Population   Sample
                                                   mean       mean         variance
Treatment 1  x_{11}, x_{12}, ..., x_{1n}    T_1    \bar{x}_1   \mu_1        s_1^2
Treatment 2  x_{21}, x_{22}, ..., x_{2n}    T_2    \bar{x}_2   \mu_2        s_2^2
...
Treatment c  x_{c1}, x_{c2}, ..., x_{cn}    T_c    \bar{x}_c   \mu_c        s_c^2

We have

T_i = \sum_{j=1}^{n} x_{ij}

and

s_i^2 = \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2 / (n - 1).
It is easy to show that the average observation on treatment i is the
best point estimate of \mu_i; thus \hat{\mu}_i = \bar{x}_i.
Our over-all aim is to compare these observed treatment means.
If they are ‘close together’, the differences may be simply due to the
residual variation. On the other hand if they are widely separated, then
there probably is a significant difference between the treatments. We
will describe two methods of testing whether all
the theoretical treatment means, \mu_i, are equal. However it will sometimes
be ‘obvious’ just by looking at the data that one treatment is
much better than all the other treatments. If we are simply concerned
with finding the ‘best’ treatment, rather than with finding good esti­
mates of the treatment differences, then there may be no point in
carrying out further analysis. Conversely if there are only small
differences between the observed treatment means, and if these are
judged not to be of practical importance, then there may also be no
point in carrying out further analysis. (It is to be hoped that by now
the reader appreciates the distinction between statistical and practical
significance.)
For many purposes it is more convenient to work with the following
model, which is equivalent to model 10A:

x_{ij} = \mu + t_i + e_{ij}    (i = 1, \dots, c; j = 1, \dots, n),

where

\mu = over-all average,
t_i = effect of ith treatment,
e_{ij} = random error.

We will call this model 10B. In terms of model 10A we have

\mu_i = \mu + t_i.

Model 10B appears to involve one more unknown parameter. However,
as \mu is the over-all average, we have the restriction that \sum_{i=1}^{c} t_i = 0, so
both models have the same number of independent parameters.
The best point estimates of these new parameters are as follows.
Let

T = \sum_{i=1}^{c} T_i = (grand total of the observations)

and

\bar{x} = T/cn = (over-all observed mean).

Then the best point estimate of \mu is given by \hat{\mu} = \bar{x}, and the best point
estimate of t_i is given by

\hat{t}_i = \bar{x}_i - \bar{x}.

Example 1
Steel wire was made by four manufacturers A, B, C and D. In order to
compare their products, ten samples were randomly drawn from a
batch of wire made by each manufacturer and the strength of each piece
of wire was measured. The (coded) values are given below and are
plotted in Figure D.2 in Appendix D.4 as box plots.

            A      B      C      D
           55     70     70     90
           50     80     60    115
           80     85     65     80
           60    105     75     70
           70     65     90     95
           75    100     40    100
           40     90     95    105
           45     95     70     90
           80    100     65    100
           70     70     75     60
Total     625    860    705    905
Mean     62.5   86.0   70.5   90.5
Variance 212.5  204.4  235.8  274.7

T = 3095,    \bar{x} = 77.4.

In terms of model 10A we will denote the mean strengths of wire made
by A, B, C and D by \mu_1, \mu_2, \mu_3 and \mu_4 respectively. The estimates of
these parameters are given by \hat{\mu}_1 = 62.5, \hat{\mu}_2 = 86.0, \hat{\mu}_3 = 70.5 and
\hat{\mu}_4 = 90.5.
In terms of the equivalent model 10B we will denote the over-all
mean strength by \mu and the effects of A, B, C and D by t_1, t_2, t_3 and t_4.
The estimates of these parameters are given by

\hat{\mu} = 77.4,  \hat{t}_1 = -14.9,  \hat{t}_2 = 8.6,  \hat{t}_3 = -6.9,  \hat{t}_4 = 13.1.

Note that the sum of these treatment effects is zero, except for a
round-off error of 0.1.
We have now completed the initial stages of the analysis which
consists of setting up a mathematical model to describe the particular
situation and obtaining point estimates of the unknown parameters.
The next problem is to decide whether or not there is a significant
difference between the observed treatment means. In other words we
want to test the hypothesis

H_0: \mu_1 = \mu_2 = \dots = \mu_c    in terms of model 10A

or

H_0: all t_i = 0    in terms of model 10B.
If there are just two treatments then we can use a t-test to compare
\bar{x}_1 with \bar{x}_2. Therefore with more than two treatments it might be
supposed that we should perform a whole series of t-tests by testing
the difference between each pair of means. But this is not so, as the
over-all level of significance is affected by the number of tests which are
performed. If the significance level is 5 per cent and there are say eight
treatments, then we have to perform twenty-eight tests (= {}^8C_2), and the
probability that at least one test will give a significant result if H_0 is
true is approximately 1 - (1 - 0.05)^{28} = 0.76. This result is not unexpected
when we consider that, even if H_0 is true, the larger the number
of treatments the larger the average difference between the largest and
smallest observed treatment means will be. In other words the range of
the treatment means increases with the number of treatments.
We will describe two methods of testing H_0. The first method is
based on the range of the treatment means. The second method is a
much more general technique called the analysis of variance, which will
also be used when testing similar hypotheses in many other types of
experimental design.

10.6 The range test*


We begin by defining the studentized range. Let x_1, x_2, \dots, x_c be a
random sample of size c from a normal distribution with variance \sigma^2.
The average value of the sample range will be directly proportional to
\sigma, and so the sampling distribution of the range can be obtained by
considering

range(x_1, x_2, \dots, x_c)/\sigma.

In most practical problems the parameter \sigma is unknown. However let us
suppose that we have an independent estimate, s^2, of \sigma^2, which is
based on \nu degrees of freedom. Then the studentized range is given by

q = range(x_1, x_2, \dots, x_c)/s.

The sampling distribution of q has been evaluated for different values
of \nu and c, and tables of the percentage points are given in Appendix B.
The point q_\alpha(c, \nu) is such that

probability(q > q_\alpha) = \alpha.

We will now describe how the studentized range is used in a comparative
experiment. We have c treatment means, each of which is the
mean of n observations. The null hypothesis, which we wish to test,
is given by

H_0: all t_i = 0    in terms of model 10B.

*Many readers may prefer to proceed directly to section 10.7.



If H_0 is true, the observed treatment means will be a random sample of
size c from a normal distribution with variance \sigma^2/n. We can obtain
an independent estimate of \sigma^2 by considering the observed treatment
variances. The estimated variance of observations on treatment i is
given by

s_i^2 = \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2 / (n - 1),

and this is an estimate of \sigma^2. Thus the combined estimate of \sigma^2 from
the variation within groups is given by

s^2 = \sum_{i=1}^{c} s_i^2 / c.

This estimate of \sigma^2 has c(n - 1) degrees of freedom. Note that this
estimate only depends on the variation within each group and does not
depend on the variation between treatments (between groups). Thus
this quantity is an estimate of \sigma^2 whether or not H_0 is true. An estimate
of the standard error of the treatment means is given by s/\sqrt{n}, and so the
ratio

q = range(\bar{x}_1, \bar{x}_2, \dots, \bar{x}_c) / (s/\sqrt{n})

will have the same distribution as the studentized range provided that
H_0 is true. However if H_0 is not true we expect the range of the treatment
means to be larger than it would otherwise be, giving a ‘large’ value of q. If the
observed value of q, which will be denoted by q_0, is higher than
q_\alpha\{c, c(n-1)\}, then the result is significant at the \alpha level of significance.

Example 1 continued
We have the following information.

                 Sample mean   Sample variance
Manufacturer A      62.5           212.5
Manufacturer B      86.0           204.4
Manufacturer C      70.5           235.8
Manufacturer D      90.5           274.7

Largest treatment mean = 90.5, smallest treatment mean = 62.5,

range = 90.5 - 62.5 = 28.0,

s^2 = (212.5 + 204.4 + 235.8 + 274.7)/4 = 231.8.

This is based on thirty-six degrees of freedom.

q_0 = 28.0 / \sqrt{231.8/10} = 5.82,

q_{0.05}(4, 36) = 3.80.

Thus the result is significant at the 5 per cent level. The data suggest
that there really is a significant difference between the strength of wire
made by the four manufacturers.
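The critical value can also be obtained in software rather than from tables. The sketch below (ours, not the book's) uses scipy's studentized_range distribution, which we believe is available from SciPy version 1.7 onwards.

import math
from scipy.stats import studentized_range

s2, n = 231.8, 10
q0 = (90.5 - 62.5) / math.sqrt(s2 / n)        # 5.82
q_crit = studentized_range.ppf(0.95, 4, 36)   # about 3.81 (the tables give 3.80)
print(round(q0, 2), round(q_crit, 2))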

10.7 One-way analysis of variance


This section describes a second method of testing the hypothesis that
there is no difference between a number of treatments. The total
variation of the observations is partitioned into two components, one
measuring the variability between the group means, \bar{x}_1, \bar{x}_2, \dots, \bar{x}_c, and
the other measuring the variation within each group. These two components
are compared by means of an F-test.
The procedure of comparing different components of variation is
called the analysis of variance. In the above situation the observations
are divided into c mutually exclusive categories and this is called a
one-way classification. We then have a one-way analysis of variance.
It is a little more complicated than the range test but is often more
efficient and has the advantage that a similar technique can be applied
to more complex situations where the observations are classified by
two or more criteria.
We have already seen that the combined estimate of \sigma^2 from the
variation within groups is given by

s^2 = \sum_{i=1}^{c} \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2 / [c(n - 1)],

and this is based on c(n - 1) degrees of freedom.


We now look at the variation between groups. The observed variance
of the treatment means is given by

\sum_{i=1}^{c} (\bar{x}_i - \bar{x})^2 / (c - 1).

If the null hypothesis is true, this is an estimate of \sigma^2/n, since the
standard error of the treatment means will be \sigma/\sqrt{n}. Thus

s_B^2 = n \sum_{i=1}^{c} (\bar{x}_i - \bar{x})^2 / (c - 1)

is an estimate of \sigma^2 based on c - 1 degrees of freedom.

If H_0 is true, both s_B^2 and s^2 are estimates of \sigma^2, and the ratio
F = s_B^2/s^2 will follow an F-distribution with c - 1 and c(n - 1) degrees of freedom.
On the other hand, if H_0 is not true, s^2 will still be an estimate of \sigma^2,
but s_B^2 will be increased by the treatment differences and so the F-ratio
may be significantly large. In this case we reject H_0 and conclude that
there is evidence of a difference between the treatment effects.
The usual way to obtain the F-ratio is to calculate the following
quantities and enter them in what is called an analysis of variance
(Anova) table.

Table 16  One-way Anova

Source of variation     Sum of squares                                  d.f.      Mean square
Between groups
(between treatments)    n \sum_{i=1}^{c} (\bar{x}_i - \bar{x})^2          c - 1     s_B^2
Within groups
(residual variation)    \sum_{i=1}^{c} \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2   c(n - 1)  s^2
Total variation         \sum_{i=1}^{c} \sum_{j=1}^{n} (x_{ij} - \bar{x})^2     cn - 1

The two mean squares, s_B^2 and s^2, are obtained by dividing the
appropriate sums of squares by the appropriate numbers of degrees of
freedom.
An important feature of the Anova table is that the total sum of
squares is equal to the sum of the between-group and within-group sums
of squares. This can be shown as follows. We have

x_{ij} - \bar{x} = (x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x}).

Squaring both sides and summing over all values of i and j, we find

\sum_{ij} (x_{ij} - \bar{x})^2 = \sum_{ij} (x_{ij} - \bar{x}_i)^2 + \sum_{ij} (\bar{x}_i - \bar{x})^2
                          + 2\sum_{ij} (x_{ij} - \bar{x}_i)(\bar{x}_i - \bar{x}).

However the sum of the cross-product terms is zero, since

\sum_{i=1}^{c} \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)(\bar{x}_i - \bar{x}) = \sum_{i=1}^{c} (\bar{x}_i - \bar{x}) \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)

and \sum_{j=1}^{n} (x_{ij} - \bar{x}_i) = 0 for each i. Also

\sum_{i=1}^{c} \sum_{j=1}^{n} (\bar{x}_i - \bar{x})^2 = n \sum_{i=1}^{c} (\bar{x}_i - \bar{x})^2,

and we have the required result.
Another important feature of the Anova table is that the total
number of degrees of freedom is equal to the sum of the between-group
and within-group degrees of freedom. The details of the computation
are of some interest, although most analyses are now carried out using
a computer package. Firstly calculate the group totals T_i (i = 1 to c)
and the over-all total T. Hence find the group means \bar{x}_i (i = 1 to c) and
the over-all mean \bar{x}. Also calculate \sum_{ij} x_{ij}^2, which is sometimes called the
total uncorrected sum of squares. Secondly calculate the total sum of
squares (sometimes called the total corrected sum of squares), which is
given by

\sum_{ij} (x_{ij} - \bar{x})^2 = \sum_{ij} x_{ij}^2 - T^2/cn.

The quantity cn\bar{x}^2 or T^2/cn is sometimes called the correction factor.
Thirdly calculate the between-group sum of squares, which is given by

n \sum_{i=1}^{c} (\bar{x}_i - \bar{x})^2 = \sum_{i=1}^{c} T_i^2/n - T^2/cn.

Note that the same correction factor is present. Lastly the residual
sum of squares can be obtained by subtraction.


Example 1 continued
We have \bar{x}_1 = 62.5, \bar{x}_2 = 86.0, \bar{x}_3 = 70.5, \bar{x}_4 = 90.5 and \bar{x} = 77.4. Also

\sum_{ij} x_{ij}^2 = 252975    and    n \sum_{i=1}^{4} \bar{x}_i^2 = 244627,

while the correction factor is

T^2/cn = 3095^2/40 = 239476.

(The total corrected sum of squares) = 252975 - 239476 = 13499.
(The between-group sum of squares) = 244627 - 239476 = 5151.
(The residual sum of squares) = 13499 - 5151 = 8348.

We can now construct the Anova table.

Source       Sum of squares   d.f.   Mean square
Treatments        5151          3       1717
Residual          8348         36       231.8
Total            13499         39

F-ratio = 1717/231.8 = 7.41,

F_{0.01,3,36} = 4.39.

Thus the result is significant at the 1 per cent level and we reject the
hypothesis that there is no difference between wire made by the four
manufacturers. This, of course, is the same conclusion that was obtained
with the range test. The reader will also notice that the residual
mean square is the same as the estimate of \sigma^2 which was obtained in
the range test. However the way it is computed is completely different.
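The whole calculation can be verified with a standard routine; the sketch below (ours, not the book's) runs a one-way Anova on the wire data using scipy.

from scipy.stats import f_oneway

a = [55, 50, 80, 60, 70, 75, 40, 45, 80, 70]
b = [70, 80, 85, 105, 65, 100, 90, 95, 100, 70]
c = [70, 60, 65, 75, 90, 40, 95, 70, 65, 75]
d = [90, 115, 80, 70, 95, 100, 105, 90, 100, 60]

F, p = f_oneway(a, b, c, d)
print(round(F, 2), round(p, 4))    # F = 7.41, p well below 0.01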
It is useful to know the average or expected values of the treatment
and residual mean squares. Since s^2 is an unbiased estimate of \sigma^2, we
have

E(residual mean square) = E(s^2) = \sigma^2,

whether or not H_0 is true. It can also be shown (see Exercise 1) that

E(treatment mean square) = E(s_B^2) = \sigma^2 + n \sum_{i=1}^{c} t_i^2 / (c - 1).

Thus E(s_B^2) = \sigma^2 only if H_0 is true. If H_0 is not true, then the larger the
treatment effects are, the larger we can expect the treatment mean
square and hence the F-ratio to be.

10.8 Follow-up study of the treatment means


The next problem is to decide what to do when a one-way Anova (or
range test) indicates that there is a significant difference between the
treatments. Some typical questions which we might ask are the follow­
ing.
(1) Is one treatment much better than all the other treatments?
(2) Is one treatment much worse than all the other treatments?
(3) Can the treatments be divided into several homogeneous groups in
each of which the treatments are not significantly different?
One way of answering these questions is to calculate a quantity
called the least significant difference. The variance of the difference
between two treatment means is given by

var(x̄_i − x̄_j) = σ²/n + σ²/n = 2σ²/n.

Thus an estimate of the standard error of the difference between two
treatment means is given by s√(2/n), where s is the square root of the
residual mean square. Thus the least difference between two means
which is significant at the α level is

s√(2/n) × t_{α/2, c(n−1)},

since the estimate s of σ is based on c(n − 1) d.f. The treatment means
are arranged in order of magnitude and the difference between any pair



of means can be compared with the least significant difference. If the
gap between any two means is less than the least significant difference
then the treatments are not significantly different. However if the
gap between two successive means is larger than the least significant
difference then we have a division between two groups of treatments.
One way of recording the results is to draw lines underneath the
means so that any two means which have the same underlining are not
significantly different.

Example 1 continued
The four means are

x̄_A = 62.5    x̄_B = 86.0
x̄_C = 70.5    x̄_D = 90.5.

The estimate s of σ is given by

s = √231.8 = 15.2

and is based on thirty-six degrees of freedom.
Choose a significance level of 5 per cent. Then the least significant
difference is given by

15.2 × √(2/10) × t_{0.025,36} = 15.2 × 0.447 × 2.03 = 13.8.

Arranging the treatment means in order of magnitude we have

x̄_A = 62.5    x̄_C = 70.5    x̄_B = 86.0    x̄_D = 90.5,

but (x̄_D − x̄_B) and (x̄_C − x̄_A) are both less than 13.8, whereas (x̄_B − x̄_C) is
greater than 13.8. Thus we conclude that manufacturers B and D are
better than A and C but that there is not a significant difference between
B and D. We can express these results as follows.

x̄_A  x̄_C      x̄_B  x̄_D
________      ________

The astute reader will notice that the above procedure is nothing
more than a multiple t-test. Earlier in the chapter, we decided that this
could not be used to test the hypothesis that the treatment means are
not significantly different. However it is often used as a follow-up
technique to see how the treatments are grouped, once the null
hypothesis has been rejected by an analysis of variance or range test.
The method will still tend to give too many significant differences,
but it is simple to use and for this reason is often preferred to such



methods as Tukey’s T-method (see, for example, Wetherill, 1981,
Chapter 13) which arguably give more reliable results.
If so desired we can also use the t-distribution to calculate confidence
intervals for the true treatment means. The 100(1 − α) per cent confi­dence
interval for μ_i is given by x̄_i ± t_{α/2, c(n−1)} s/√n.

Example 1 continued
t_{0.025,36} = 2.03, s = 15.2 and n = 10.
The 95 per cent confidence intervals for μ_A, μ_B, μ_C and μ_D are given by

62.5 ± 9.8    86.0 ± 9.8
70.5 ± 9.8    90.5 ± 9.8.
In the above statements the risk that a particular μ_i will be outside
its confidence interval is α. But we have formed c such intervals. Thus
the expected number of mistakes will be cα. In other words the risk
that at least one of the above statements is false will be much higher
than α and is in fact close to 1 − (1 − α)^c. In order to form confidence
intervals so that the overall risk of making one or more incorrect
statements is say α′, we must calculate the corresponding value of α
from the relation

α′ = 1 − (1 − α)^c.
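
These follow-up calculations are easily scripted. The sketch below (ours, not from the book; it assumes scipy is available and reuses s = 15.2 on 36 d.f. with n = 10 observations per treatment) computes the least significant difference, the individual confidence intervals and the overall error rate for c = 4 intervals.

```python
# Sketch: least significant difference and confidence intervals,
# using s = 15.2 on 36 d.f. and n = 10 observations per treatment.
from scipy import stats

s, df, n, c, alpha = 15.2, 36, 10, 4, 0.05
means = {'A': 62.5, 'C': 70.5, 'B': 86.0, 'D': 90.5}
t = stats.t.ppf(1 - alpha / 2, df)          # 2.03

print(round(t * s * (2 / n) ** 0.5, 1))     # least significant difference, 13.8

half = t * s / n ** 0.5                     # half-width of each interval, 9.8
for name, m in means.items():
    print(name, round(m - half, 1), round(m + half, 1))

print(1 - (1 - alpha) ** c)                 # overall risk, about 0.19
```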

10.9 Verifying the model


A number of assumptions have been incorporated in the mathematical
model 10A or 10B. If any of these assumptions are false, some or all
of the preceding analysis may be invalidated. Some of the assumptions
can be checked by looking at the residuals. In a simple comparative
experiment the residual variation is the variation not accounted for
by the treatment effects. Thus the residual of a particular observation
is the difference between the observation and the average observation
on that particular treatment:

x_ij − x̄_i    (i = 1, ..., c; j = 1, ..., n).
One of the worst possibilities is that the residuals may not be random.
For example, we may get a series of negative residuals followed by a
series of positive residuals. This would mean that an uncontrolled
factor was systematically affecting the results which would make any



conclusions suspect. Fortunately the possibility of non-randomness
can often be overcome by randomization. Alternatively it can be over­
come by a technique called blocking which will be described in the next
section.
Another possibility is that the errors may not be normally distributed.
Fortunately the F-test and the t-test are both robust to departures
from the normality assumption. However it will occasionally be neces­
sary to transform the data in order to avoid a distribution of residuals
which is markedly skew. The logarithmic transformation is commonly
used.
A further possibility is that the error variance does vary from treat­
ment to treatment. In this case the treatment means should be compared
not with a combined estimate of the variance but with the individual
variance estimates of the particular treatments. Alternatively it may be
possible to transform the data so that the error variance is constant.
In particular if we suspect that the treatment standard deviation is
directly proportional to the treatment mean then we can make the
variance homogeneous with a logarithmic transformation. Visual
inspection of the group variances is usually sufficient and in Example 1
there is no evidence that the variance is not homogeneous. The
standard method of testing for homogeneity of variance is by Bartlett’s
test. This test and other aspects of residuals and departures from
assumptions are described by Wetherill (1981).
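
As an illustration only, the sketch below (ours; the four samples are invented, not data from this chapter) shows how Bartlett's test can be applied with scipy.

```python
# Sketch: Bartlett's test for homogeneity of variance.
# The four samples below are invented, purely for illustration.
from scipy import stats

groups = [
    [62, 71, 55, 60, 66],
    [88, 80, 91, 84, 87],
    [72, 65, 75, 70, 68],
    [93, 85, 96, 89, 90],
]
stat, p = stats.bartlett(*groups)
# A small p-value would suggest non-homogeneous variances, in which
# case a transformation (e.g. logarithmic) may be appropriate.
print(stat, p)
```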

10.10 The randomized block experiment


We have seen that randomization is a simple but effective way of
eliminating systematic error in many experiments. But the experimenter
may still be aware of one particular environmental factor which
contributes substantially to the uncontrolled variation. For example, if
the experiment extends over several days, then observations made on
the same day will show better agreement than those made on different
days. Thus there is a danger of introducing systematic error unless the
randomization procedure happens to give an equal number of tests
on each treatment in each day. We can avoid this danger by dividing
the tests into groups which are ‘close together’ in some way; for
example, measurements made by the same operator or measurements
made on the same batch of raw material. These groups are called
blocks. If an equal number of measurements is made on each treatment
in each block and if the order of tests within a block is randomized,
then the experiment is called a randomized block experiment.



The technique of blocking is a very useful method of increasing the
precision of comparative experiments. All comparisons are made
within a block and not from block to block. For example, let us look
once again at the experiment for comparing the three anti-corrosion
paints. A fully randomized design might happen to assign the treat­
ments in the following way.
England:   4 tests on A    1 test on B     1 test on C,
Wales:     1 test on A     3 tests on B    2 tests on C,
Scotland:  1 test on A     2 tests on B    3 tests on C.
Thus if the weather is indeed better in England, then it would again
be true that paint A has an unfair advantage over the other two paints.
The ‘obvious’ way to carry out the experiment is to make two observa­
tions on each paint in England, Scotland and Wales. Within each
country the paints will be assigned randomly to the six bridges. Here
the countries are the blocks and we can now compare the observations
within a particular country.
Let us also reconsider the experiment in which three burners are
compared in a turbo-jet engine. Suppose that four observations are
to be made on each burner, but that only six observations can be made
in one day. If we suspect a significant difference from day to day, we must
try to design the experiment so that an equal number of observations
are made on each burner in each day. In other words on day 1 two
observations should be made on each of B₁, B₂ and B₃, and similarly
on day 2. Within each day the order of the six experiments is randomized
and we then have a randomized block experiment. A similar situation
would arise if the tests were to be carried out on two different engines.
Since there is likely to be a difference between the two engines, an
equal number of observations must be made on each burner in each
engine.
It is easy to think of many other experimental situations in which
blocking is necessary and the randomized block experiment is prob­
ably the most important type of comparative experiment. If there
is just one observation on each of c treatments in each of r blocks, a
suitable mathematical model is given by the following:

x_ij = μ + b_i + t_j + ε_ij    (i = 1, ..., r; j = 1, ..., c),

where
x_ij = observation on treatment j in block i,
μ = over-all average of the response variable,



b_i = effect of the ith block,
t_j = effect of the jth treatment,
ε_ij = random error.
Since μ is the over-all mean, the treatment and block effects must
be such that

Σ_{i=1}^{r} b_i = Σ_{j=1}^{c} t_j = 0.

Notice that the model assumes that the treatment and block effects
are additive. It is also convenient to assume that the errors are normally
distributed with mean zero and constant variance σ², and that the
errors are independent.
The data can be tabulated as follows.
The data can be tabulated as follows.

Table 17

                  Treatment 1  ...  Treatment c   Row total   Row average
Block 1           x_11         ...  x_1c          T_1.        x̄_1.
Block 2           x_21         ...  x_2c          T_2.        x̄_2.
  ⋮
Block r           x_r1         ...  x_rc          T_r.        x̄_r.

Column total      T.1          ...  T.c
Column average    x̄.1          ...  x̄.c

Grand total of the observations = T = Σ_i T_i. = Σ_j T.j.

(It is a good idea to check that the sum of the row totals is equal to
the sum of the column totals.)

(over-all average) = x̄ = T/rc.

The best point estimates of the unknown parameters are

μ̂ = x̄,
b̂_i = x̄_i. − x̄    (i = 1 to r),
t̂_j = x̄.j − x̄    (j = 1 to c).



Example 2
In order to compare three burners, B₁, B₂ and B₃, one observation is
made on each burner on each of four successive days. The data is
tabulated below.

                  B₁     B₂     B₃     Row total   Row average
Day 1 (Block 1)   21     23     24     68          22.67
Day 2 (Block 2)   18     17     23     58          19.33
Day 3 (Block 3)   18     21     20     59          19.67
Day 4 (Block 4)   17     20     22     59          19.67

Column total      74     81     89
Column average    18.50  20.25  22.25

T = 244,    x̄ = 20.33 = μ̂.


t̂₁ = (effect of B₁) = 18.50 − 20.33 = −1.83,
t̂₂ = 20.25 − 20.33 = −0.08,
t̂₃ = 22.25 − 20.33 = 1.92,
b̂₁ = (effect of day 1) = 22.67 − 20.33 = 2.34,
b̂₂ = 19.33 − 20.33 = −1.00,
b̂₃ = 19.67 − 20.33 = −0.66,
b̂₄ = 19.67 − 20.33 = −0.66.
Notice that the sum of the treatment effects and the sum of the block
effects are both zero, except for round-off errors of 0-01 and 0-02
respectively.
It looks as though burner B₃ gives the best results but this is by no
means certain from a visual inspection of the data since results from B₃
are not always larger than those of B₂, even within a particular block.
In the next section we shall see how to carry out a two-way analysis
of variance in order to see if the differences between the burners are
significantly large.
It is important to realise that the estimates of the treatment effects
do not depend on the variation between blocks. In order to convince
himself of this fact the reader should try adding a constant to all the
observations in say block 1 of Example 2. This will alter the block



effects and the over-all mean but the estimates of the treatment effects
will not change.
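
The invariance just described can be demonstrated numerically. The sketch below (ours; it assumes numpy is available) estimates the effects for Example 2 and then repeats the calculation after adding a constant to every observation in block 1.

```python
# Sketch: treatment and block effects for Example 2, plus a check that
# the treatment effects are unaffected by a shift applied to one block.
import numpy as np

x = np.array([[21, 23, 24],    # rows = blocks (days 1-4)
              [18, 17, 23],    # columns = burners B1, B2, B3
              [18, 21, 20],
              [17, 20, 22]], dtype=float)

def effects(data):
    mu = data.mean()
    return mu, data.mean(axis=0) - mu, data.mean(axis=1) - mu

mu, t, b = effects(x)
print(t, b)    # t approx (-1.83, -0.08, 1.92); b approx (2.33, -1.00, -0.67, -0.67)

shifted = x.copy()
shifted[0] += 5.0              # add a constant to block 1 only
print(effects(shifted)[1])     # treatment effects are unchanged
```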

10.11 Two-way analysis of variance


In a randomized block experiment the data is classified according to
two characteristics in a two-way table. With one observation on each
treatment in each block, we have proposed the mathematical model
x_ij = μ + b_i + t_j + ε_ij    (i = 1 to r; j = 1 to c)
and have seen how to obtain point estimates of the model parameters.
The next problem is to see if there is a significant difference between
the observed treatment effects. In other words we want to test the
hypothesis
H₀: all t_j = 0.
The total corrected sum of squares is given by Σ_{ij} (x_ij − x̄)², and this
can be split up into three components in a somewhat similar way to
that described in the one-way analysis of variance. One of these
components measures the variation between treatments, one measures
the variation between blocks and the third measures the residual
variation. The treatment variation is then compared with the residual
variation by means of an F-test. Since the data is classified according
to two characteristics (treatment and block), this procedure is called a
two-way analysis of variance.
The required algebraic identity is

Σ_{i=1}^{r} Σ_{j=1}^{c} (x_ij − x̄)² = c Σ_{i=1}^{r} (x̄_i. − x̄)² + r Σ_{j=1}^{c} (x̄.j − x̄)² + Σ_{ij} (x_ij − x̄_i. − x̄.j + x̄)²,

and the student is asked to verify this equation in Exercise 3. The
first component on the right-hand side of this equation measures the
block variation, the second component measures the treatment varia­tion
and the third component measures the residual variation.
The corresponding degrees of freedom are as follows.

                        d.f.
Blocks (rows)           r − 1
Treatments (columns)    c − 1
Residual                (r − 1)(c − 1)
Total                   rc − 1



Each sum of squares is now divided by the appropriate number of
degrees of freedom to give the corresponding mean square. It can be
shown that the average values of these mean squares are as follows.

E(block mean square) = σ² + c/(r − 1) Σ_{i=1}^{r} b_i²,

E(treatment mean square) = σ² + r/(c − 1) Σ_{j=1}^{c} t_j²,

E(residual mean square) = σ².
Neither the treatment mean square nor the residual mean square is
affected by whether or not there is a significant variation between
blocks. Thus the ratio
treatment mean square
F =
residual mean square
will follow an F-distribution with (c − 1) and (r − 1)(c − 1) degrees of
freedom if the null hypothesis is true. The observed F-ratio can then
be compared with the chosen upper percentage point of this F-distribution.
tion. The important point to grasp is that, if there is a block effect, the
residual mean square will certainly be smaller in a randomized block
experiment than it would have been in a simple randomized experiment.
Thus the blocking technique enables us to carry out an F-test which
is more sensitive to treatment effects.
The Anova table is as follows.

Table 18
Two-way Anova

Source of variation     Sum of squares                       d.f.             E(mean square)
Treatments (columns)    r Σ_{j=1}^{c} (x̄.j − x̄)²             c − 1            σ² + r Σ t_j²/(c − 1)
Blocks (rows)           c Σ_{i=1}^{r} (x̄_i. − x̄)²            r − 1            σ² + c Σ b_i²/(r − 1)
Residual                Σ_{ij} (x_ij − x̄_i. − x̄.j + x̄)²      (r − 1)(c − 1)   σ²
Total                   Σ_{ij} (x_ij − x̄)²                   rc − 1



The details of the computation are of some interest although most
analyses are now executed by a computer package. First calculate the
row and column totals and the over-all total. Hence find the row and
column averages and the over-all average; also calculate Σ_{ij} x_ij² and the
correction factor T²/rc (= rcx̄²). Then the total corrected sum of
squares is given by

Σ_{ij} (x_ij − x̄)² = Σ_{ij} x_ij² − T²/rc.

Secondly the treatment (column) sum of squares is given by

r Σ_{j=1}^{c} (x̄.j − x̄)² = Σ_{j=1}^{c} T.j²/r − T²/rc.

Thirdly the block (row) sum of squares is given by

c Σ_{i=1}^{r} (x̄_i. − x̄)² = Σ_{i=1}^{r} T_i.²/c − T²/rc.

Notice that the correction factor appears in all three equations. The
residual sum of squares can now be obtained by subtraction.
Each sum of squares is now divided by the appropriate number of
degrees of freedom to give the required mean squares. Two F-ratios
can now be calculated by dividing first the treatment mean square,
and secondly the block mean square, by the residual mean square. The
observed treatment F-ratio can then be compared with the upper
percentage point of the appropriate F-distribution in order to test the
hypothesis that all the treatment effects are zero. If the block effect is of
interest we can also test the hypothesis that all the block effects are
zero by considering the observed block F-ratio.

Example 2 continued

Σ_{ij} x_ij² = 5026,    Σ_j T.j²/r = 4989.50,    Σ_i T_i.²/c = 4983.33,

correction factor = T²/rc = 244²/12 = 4961.33.

Source       Sum of squares   d.f.   Mean square   F-ratio
Treatments   28.17            2      14.08         5.8
Blocks       22.00            3      7.33          3.0
Residual     14.50            6      2.42
Total        64.67            11

But F_{0.05,2,6} = 5.14. Thus the treatment effects are significant at the
5 per cent level and so we have reasonable evidence that there is a
difference between the burners.
We also have F_{0.05,3,6} = 4.76, so the block effect is not significant
at the 5 per cent level. Thus we have no real evidence that there is a
difference between days. Nevertheless the block mean square is three
times as large as the residual mean square and so we have increased
the precision of the experiment by blocking.
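
The whole two-way Anova can be verified with a few lines of code. The sketch below (ours; assuming numpy and scipy are available) applies the computing formulae of this section to the Example 2 data.

```python
# Sketch: two-way Anova for Example 2 (4 blocks x 3 treatments).
import numpy as np
from scipy import stats

x = np.array([[21, 23, 24],
              [18, 17, 23],
              [18, 21, 20],
              [17, 20, 22]], dtype=float)         # rows = blocks, cols = treatments
r, c = x.shape
cf = x.sum()**2 / (r * c)                          # correction factor, 4961.33
total_ss = (x**2).sum() - cf                       # 64.67
treat_ss = (x.sum(axis=0)**2).sum() / r - cf       # 28.17
block_ss = (x.sum(axis=1)**2).sum() / c - cf       # 22.00
resid_ss = total_ss - treat_ss - block_ss          # 14.50

F = (treat_ss / (c - 1)) / (resid_ss / ((r - 1) * (c - 1)))
print(round(F, 2))                                 # 5.83
print(stats.f.sf(F, c - 1, (r - 1) * (c - 1)))     # p approx 0.04
```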

In general, if n observations are made on each treatment in each
block, a suitable mathematical model is given by

x_ijk = μ + b_i + t_j + ε_ijk,

where i = 1 to r, j = 1 to c, k = 1 to n and x_ijk = kth observation on
the jth treatment in block i. The analysis begins as before by finding
the average observation on each treatment, the average observation in
each block and the over-all average. This gives estimates of the treat­ment
and block effects as found before. The Anova table is shown
below.

Table 19

Source        Sum of squares             d.f.             E(mean square)
Treatments    nr Σ_j (x̄.j − x̄)²         c − 1            σ² + nr Σ t_j²/(c − 1)
Blocks        nc Σ_i (x̄_i. − x̄)²        r − 1            σ² + nc Σ b_i²/(r − 1)
Residual      by subtraction             by subtraction   σ²
Total         Σ_{ijk} (x_ijk − x̄)²       nrc − 1



An F-test can now be carried out to test the hypothesis that the
treatment effects are zero. Finally the assumptions contained in the
model proposed for the randomized block experiment can be checked
in a similar way to that described in section 10.9. A full discussion of the
examination of residuals is given in section 11.9.

10.12 Latin squares


The randomized block experiment is by far the most important type
of comparative experiment. However we will briefly mention two
types of modification; namely Latin square designs and balanced
incomplete block designs.
Sometimes it is possible to think of two different ways of dividing
the experiment into blocks. For example if we decide to use three
engines to compare three burners B₁, B₂ and B₃ and to make three
tests on each burner then a possible design is the following.

           Time of day
           Time 1   Time 2   Time 3
Engine 1   B₁       B₂       B₃
Engine 2   B₃       B₁       B₂
Engine 3   B₂       B₃       B₁

The engines and times of day form the two types of block. The significant
point is that each burner only appears once in each row and in each
column. A design in which each treatment appears exactly once in each
row and each column is called a Latin square design.
It is easy to construct a Latin square. For example, the following
is a 4 × 4 Latin square.

A B C D
D A B C
C D A B
B C D A
The experiment can be randomized by randomly ordering the rows and
columns. The disadvantage of Latin square designs is that the number
of both types of block must equal the number of treatments, a condition
which is rather restrictive.
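
A minimal sketch of this construction (ours, not from the book): build a cyclic m × m square, in which each treatment automatically appears once per row and column, then shuffle the rows and columns to randomize.

```python
# Sketch: construct and randomize an m x m Latin square.
import random

def latin_square(m, seed=None):
    rng = random.Random(seed)
    # Cyclic square: row i contains the treatments shifted left by i.
    square = [[(i + j) % m for j in range(m)] for i in range(m)]
    rng.shuffle(square)                      # randomly order the rows
    cols = list(range(m))
    rng.shuffle(cols)                        # randomly order the columns
    return [[row[k] for k in cols] for row in square]

for row in latin_square(4, seed=1):
    print(' '.join(chr(ord('A') + v) for v in row))
```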
The analysis of results from a Latin square design is an extension of
the two-way analysis of variance. In addition to calculating the row



and column totals it is also necessary to calculate the sum of observa­tions
for each treatment:

T_i = sum of observations on treatment i.

The numbers of rows, columns and treatments are all equal, and will be
denoted by m. The row, column and total sums of squares are calculated,
as in Table 18, by substituting m for r and c. However we can now
extract one more component from the total sum of squares to measure
the treatment variation. The treatment sum of squares is given by

(1/m) Σ_{i=1}^{m} T_i² − T²/m².

The residual sum of squares can then be obtained by subtraction. The
degrees of freedom are as follows.

             d.f.
Treatments   m − 1
Rows         m − 1
Columns      m − 1
Residual     m² − 3m + 2
Total        m² − 1

The treatment mean square can now be compared with the residual
mean square by means of an F-test.

10.13 Balanced incomplete block designs


Sometimes the number of tests which form a homogeneous group
is smaller than the number of treatments. Then it is not possible to test
each treatment in each block and we have what is called an incomplete
block design.
For example, suppose that we want to compare three burners in
the same engine but that it is only possible to make two test runs in
one day. Then the following is a possible design.
Day 1: Bj and B2 ,
Day 2: Bi and B 3,
Day 3: B2 and B 3,
Day 4: B^ and B2,
Day 5: B^ and B 3,
Day 6 : B2 and B3.
The experiment is randomized by tossing a coin to decide which
treatment comes first in each block.



An incomplete block design is called balanced if each pair of treat­
ments occur together the same number of times. By inspection the
above design is seen to be balanced as each burner occurs twice with
every other burner. Two treatments can be compared by considering
the differences within each block. To illustrate, we will consider the
above problem in which there are just two observations in each block.
Let d₁ = difference between the observations on B₁ and B₂ on day 1;
similarly for d₂, ..., d₆.
Let b_i denote the effect of the ith burner. Then d₁, d₄ are both
estimates of (b₁ − b₂); d₂, d₅ are both estimates of (b₁ − b₃); and d₃, d₆
are both estimates of (b₂ − b₃).
But we can obtain better estimates of these differences by combining
all the observations. For example, to estimate (b₁ − b₂) we note that

d₁ + d₄ + d₂ + d₅

is an estimate of 4b₁ − 2b₂ − 2b₃, or 6b₁ − (2b₁ + 2b₂ + 2b₃), and that

−d₁ − d₄ + d₃ + d₆

is an estimate of 4b₂ − 2b₁ − 2b₃, or 6b₂ − (2b₁ + 2b₂ + 2b₃).
By subtraction we have:

2d₁ + 2d₄ + d₂ + d₅ − d₃ − d₆

is an estimate of 6(b₁ − b₂). Thus the estimate of the difference between
burners B₁ and B₂ is given by

(1/6)(2d₁ + 2d₄ + d₂ + d₅ − d₃ − d₆).
The general estimation of treatment effects and the analysis of
variance for a balanced incomplete block design is considered, for
example, in Cochran and Cox (1957).
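
To make the estimator concrete, the short sketch below (ours) evaluates both the naive estimate, based on days 1 and 4 only, and the combined estimate of (b₁ − b₂); the six differences are invented values, purely for illustration.

```python
# Sketch: combined estimate of (b1 - b2) in the balanced incomplete
# block design above; the differences d1..d6 are invented values.
d1, d2, d3, d4, d5, d6 = 2.1, 4.0, 1.8, 2.5, 4.4, 2.2

naive = (d1 + d4) / 2                               # days 1 and 4 only
combined = (2*d1 + 2*d4 + d2 + d5 - d3 - d6) / 6    # uses all six days
print(naive, combined)
```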

Exercises
1. In a simple comparative experiment, show that the expected value of

n Σ_{i=1}^{c} (x̄_i − x̄)²

is equal to (c − 1)σ² + n Σ_{i=1}^{c} t_i². (Hint: var(x̄) = σ²/nc,
var(x̄_i) = σ²/n. Expand Σ_{i=1}^{c} (x̄_i − x̄)² and show that Σ_{i=1}^{c} x̄_i x̄ = cx̄².) Hence
show that the expected value of the treatment mean square is given by

σ² + n Σ_{i=1}^{c} t_i²/(c − 1).
2. If the number of observations on each treatment in a one-way
analysis of variance is unequal and there are n_i observations on treat­ment
i, show that

Σ_{i=1}^{c} Σ_{j=1}^{n_i} (x_ij − x̄)² = Σ_{i=1}^{c} Σ_{j=1}^{n_i} (x_ij − x̄_i)² + Σ_{i=1}^{c} n_i (x̄_i − x̄)².

Also verify that the numbers of degrees of freedom corresponding to
the total sum of squares, the treatment sum of squares and the residual
sum of squares are N − 1, c − 1 and N − c respectively, where

N = Σ_{i=1}^{c} n_i.
3. In a two-way analysis of variance show that

Σ_{i=1}^{r} Σ_{j=1}^{c} (x_ij − x̄)² = c Σ_{i=1}^{r} (x̄_i. − x̄)² + r Σ_{j=1}^{c} (x̄.j − x̄)²
    + Σ_{i=1}^{r} Σ_{j=1}^{c} (x_ij − x̄_i. − x̄.j + x̄)².

4. Test the hypothesis that the following four sets of data are homogeneous.

Set 1 : 7, 8, 7, 10.
Set 2 : 6, 8, 9, 7.
Set 3: 11 , 10, 12, 9.
Set 4: 12, 10, 9, 10.

5. The following data are the results from a randomized block experi­
ment, which was set up to compare four treatments.

           Treatment
           1   2   3   4
Block 1    6   8   7   11
Block 2    5   6   6   8
Block 3    9   9   11  12

Test the hypothesis that there is no difference between the treatments.

6. The following data resulted from an experiment to compare three
burners, B₁, B₂ and B₃. A Latin square design was used as the tests
were made on three engines and were spread over three days.

        Engine 1   Engine 2   Engine 3
Day 1   B₁ 16      B₂ 17      B₃ 20
Day 2   B₂ 16      B₃ 21      B₁ 15
Day 3   B₃ 15      B₁ 12      B₂ 13

Test the hypothesis that there is no difference between the burners.


7. Set up a balanced incomplete block design to compare five treatments
in blocks of size three.

References
Box, G. E. P., Hunter, W. G., and Hunter, J. S. (1978), Statistics for Experimenters, Wiley.
Cochran, W. G., and Cox, G. M. (1957), Experimental Designs, Wiley, 2nd edn.
Cox, D. R. (1958), Planning of Experiments, Wiley.
Davies, O. L. (ed.) (1956), Design and Analysis of Industrial Experiments, Oliver & Boyd, 2nd edn.
Fisher, R. A. (1951), Design of Experiments, Oliver & Boyd, 6th edn.
John, J. A., and Quenouille, M. H. (1977), Experiments: Design and Analysis, Griffin.
Wetherill, G. B. (1981), Intermediate Statistical Methods, Chapman and Hall.



Chapter 11
The design and analysis of
experiments —
2 Factorial experiments

11.1 Introduction
We now turn our attention to experiments where the response variable
depends on several different factors. For example, the yield of a chemical
reaction may be affected by changes in the temperature, the pressure
and the concentration of one of the chemicals. It may also be affected
by whether or not the mixture is agitated during the reaction and by the
type of catalyst employed.
A factor is any feature of the experimental conditions which is of
interest to the experimenter. Factors are of two types. Firstly a quantita­
tive factor is one where possible values can be arranged in order of
magnitude. Any continuous variable, such as temperature or pressure,
is a quantitative factor. Conversely a qualitative factor is one whose
possible values cannot be arranged in order of magnitude. In the above
example the type of catalyst employed is a qualitative factor.
The value that a factor takes in a particular test is called the level
of the factor. Except for nuisance factors, we will assume that the levels
of the factors of interest can be determined by the experimenter. By
analogy with comparative experiments, a specific combination of factor
levels is called a treatment or treatment combination.
It is clear that comparative experiments can be thought of as a
special case of factorial experiments. In a simple comparative experi­
ment the individual treatments are the different levels of the one and
only factor. A randomized block comparative experiment can also be
thought of as a factorial experiment in which two factors are involved.
The individual treatments form one factor and the blocks form a
second factor.
The over-all objective of a factorial experiment may be to get a
general picture of how the response variable is affected by changes in
the different factors or to find the over-all combination of factor levels
which gives a maximum (or minimum) value of the response variable.
The latter problem is discussed in section 11.10.

A common type of test programme is the classical one-at-a-time experi­
ment in which the value of the response variable is found for a particular
treatment combination, after which the factors are altered one-at-a-time
while keeping the other factors at their initial values. Such experiments
suffer from several serious defects and we shall see that it is often
preferable to run what is called a complete factorial experiment, in which
one or more trials is made at every possible combination of factor levels.
Such an experiment is often simply called a factorial experiment.

11.2 The advantages of complete factorial experiments


The most serious defect of one-at-a-time experiments is that they are
unable to detect interactions between factors, whereas complete
factorial experiments can. For example, let us suppose that we are
interested in finding the yield of a chemical reaction at two temperatures,
T₀ and T₁, and at two pressures, P₀ and P₁. In a one-at-a-time
experiment we would begin by taking one or more observations at
(T₀P₀), (T₀P₁) and (T₁P₀). Suppose the average observation is 100 at
(T₀P₀), 108 at (T₀P₁) and 105 at (T₁P₀). The question then arises as
to what value the response variable will take at (T₁P₁).

Figure 59  One-at-a-time experiment (the average observations 100 and 108 at T₀, and 105 at T₁, plotted against pressure)

If the average observation at (T₁P₁) happens to be 113, then the
effects of the two factors are said to be additive. In other words the
difference between the average observation at T₁ and at T₀ is the same
for both values of P (and vice versa). But if the average observation at
(T₁P₁) is anything but 113, then the two factors are said to interact.



Usually the experimenter will not know if there is an interaction
between the two factors and then it is necessary to take measurements
at (T₁P₁) in order to find this out. When this is done we have a complete
factorial experiment. Two possible sets of results are shown in Figure
60. If there is no interaction between the two factors then the response
curves at different levels of the pressure will be parallel.

Figure 60  Two possible sets of results, each plotted against temperature:
(a) lines parallel, no interaction; (b) lines not parallel, interaction

The results from a complete factorial experiment can be combined to
estimate the effects of the individual factors, called the main effects,
and also to estimate the interactions between the factors. For example,
let us suppose the average observation at (T₁P₁) turns out to be 109.
The main effect of pressure is defined to be the average difference
between the yield at P₁ and at P₀, which is 108.5 − 102.5 = 6.0. Similarly
the main effect of temperature is given by 107 − 104 = 3. The interac­tion
between the two factors is defined to be half the difference between
the temperature effect at high pressure and at low pressure. This is
½(1 − 5) = −2. Note that this is the same as half the difference between
the pressure effect at high temperature and at low temperature.
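
These definitions reduce to one-line calculations, as the sketch below (ours) shows for the four cell means just quoted.

```python
# Sketch: main effects and interaction for the 2 x 2 example above.
# Cell means: y[temperature][pressure], with levels (T0, T1) and (P0, P1).
y = [[100, 108],     # T0P0, T0P1
     [105, 109]]     # T1P0, T1P1

pressure = (y[0][1] + y[1][1]) / 2 - (y[0][0] + y[1][0]) / 2      # 6.0
temperature = (y[1][0] + y[1][1]) / 2 - (y[0][0] + y[0][1]) / 2   # 3.0
interaction = ((y[1][1] - y[0][1]) - (y[1][0] - y[0][0])) / 2     # -2.0
print(pressure, temperature, interaction)
```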
The second major advantage of complete factorial experiments is
that they are the most efficient way of estimating main effects even if no
interaction is present. Suppose we are interested in the effects of two
factors, A and B, on a given response variable. Factor A is tested at levels
A₀ and A₁, and factor B at B₀ and B₁. The simplest one-at-a-time
experiment consists of one test at (A₀B₀), (A₀B₁) and (A₁B₀). Denoting
the observations by the appropriate small letters, (a₁b₀ − a₀b₀)
is an estimate of the main effect of A and (a₀b₁ − a₀b₀) is an estimate of
the main effect of B. Each of these estimates is based on just two



observations and so, since there may be experimental error, it is
advisable to duplicate each observation and find the average differences.
We then have a total of six observations and each estimate of a main
effect is based on four of these.
The simplest factorial experiment consists of one observation at
each of (A₀B₀), (A₀B₁), (A₁B₀) and (A₁B₁) — a total of four observations.
However both (a₁b₀ − a₀b₀) and (a₁b₁ − a₀b₁) are estimates of the
main effect of A, and the average of these two quantities will have the
same precision as the estimate from the one-at-a-time experiment,
since both estimates are based on four observations. Similarly the
average of (a₀b₁ − a₀b₀) and (a₁b₁ − a₁b₀) will have the same precision
as the earlier estimate of the main effect of B. Thus the complete fac­torial
experiment achieves the same precision as the one-at-a-time
experiment but with two less observations.
Scientists are usually induced to perform one-at-a-time experi­
ments through a desire to perform as few tests as possible. But this is
obviously a false economy if one misses an important interaction, or if
one could have estimated the main effects more efficiently with a
complete factorial experiment. However, we should point out that
there will be occasions when certain treatment combinations are silly
or dangerous so that a complete factorial experiment cannot be
performed. Then a one-at-a-time experiment may be a sensible
approach.

11.3 The design of complete factorial experiments


We will concentrate here on experiments involving two factors since
it is a fairly straightforward matter to extend the discussion to a
greater number of factors. If factor A is investigated at r levels and
factor B at c levels, we have an r × c complete factorial experiment. If
each treatment combination is replicated n times, the total number of
tests required is given by r × c × n. As in comparative experiments,
randomization should be used whenever possible, either by assigning
the experimental units randomly to the treatment combinations or by
performing the tests in a random order. If blocking is required, it is
often convenient to let one replication of all treatment combinations
form a block.
In order to illustrate some of the practical problems involved we will
consider an extended version of one of the experiments considered in
Chapter 10. Suppose we want to design an experiment to compare two
new burners B₁ and B₂ with a standard burner B₃, and also to test the
effect of a new fuel additive on the efficiency of a turbo-jet engine.



The fuel will be tested in two different proportions, low and high.
These two proportions, denoted by F₂ and F₃, together with the absence
of the fuel additive, denoted by F₁, are the three possible levels for
the fuel factor. The instrumentation has been developed over several
years and it is known that if two tests are made on each treatment
combination, then the average of the two results will be sufficiently
precise to detect the smallest difference considered to be of practical
significance.
We have already seen that a complete factorial experiment is
superior to a classical one-at-a-time test. Nevertheless it will be
instructive to see the dangers of the one-at-a-time test in this particular
case. The test plan for such an experiment is shown below.

Design one
One-at-a-time test

Test number   Burner          Fuel
1, 2          Standard (B₃)   Standard (F₁)
3, 4          Burner 1 (B₁)   Standard (F₁)
5, 6          Burner 2 (B₂)   Standard (F₁)
7, 8          Best burner     Low additive (F₂)
9, 10         Best burner     High additive (F₃)

Each pair of results is averaged and the best of the three burners with
no fuel additive is selected for the last four tests in order to try out the
fuel additive.
Two important criticisms of this design can be made. Firstly,
because of the sequential nature of the experiment, it is not possible
to randomize the order of the tests and so a nuisance factor may
systematically affect the results. Secondly this design cannot detect
interactions between the burners and the fuel, so there is a distinct
possibility that the experiment will not find the best burner-fuel
combination.
Two possible relationships between the burner efficiency, the burners
and the fuel additive are shown in Figure 61. In the first graph the
response curves are parallel so that there is no burner-fuel interac­
tion. In this case the one-at-a-time test would find the best burner-fuel
combination. But in the second graph it is clear that the effect of the
fuel additive depends on the type of burner employed. In this case the
one-at-a-time test would select another combination as the best, while
in fact F₂B₂ is better. Since it is well known that there is usually an



Figure 61  Two possible types of relationship between efficiency, burner and fuel additive:
(a) no interaction; (b) interaction

optimum burner for a particular type of fuel (in other words the design
of the burner interacts with the type of fuel), it would be quite wrong to
employ a one-at-a-time test in this case.
The second design we will consider is a randomized complete fac­
torial experiment. With two factors each at three levels there are nine
treatment combinations. Two observations are made at each treatment
combination so that eighteen tests are required. The order of these tests
is completely randomized and a typical test sequence is shown below.
Design two
Fully randomized factorial experiment

             Burner
             B₁       B₂       B₃
Fuel   F₁    5, 2     18, 3    11, 14
       F₂    15, 13   1, 10    6, 16
       F₃    4, 7     12, 9    17, 8

(Numbers indicate the test sequence.)

In order to insure against severe trends in the data caused by uncon­


trolled factors, it is a good idea to plot the results in the order in which
they are taken. This gives what is called a time-sequence plot. If an



obvious trend is visible in the results its cause should be found and if
possible removed, after which the experiment can be repeated. For
example, eighteen tests at maximum turbine inlet temperature may
be beyond the life-span of a newly designed experimental engine, so
as the engine deteriorates the values of the response variable will
systematically decrease.
In Design Two the reader will notice that both measurements on
B₁F₁ are performed early in the experiment while both experiments
on B₁F₂ happen to be performed much later in the experiment. If
there is a trend in the results, this would mean that B₁F₁ has an unfair
advantage over B₁F₂. This potential difficulty can be partly overcome
by blocking. Divide the experiment into two blocks of nine tests and
make one observation on each treatment combination in each block.
A design of this type is called a randomized block factorial experiment.
A typical test sequence is shown below.

Design three
Randomized block factorial experiment

             Burner
             B₁       B₂       B₃
Fuel   F₁    5, 13    3, 18    7, 14
       F₂    2, 15    1, 10    6, 16
       F₃    4, 11    9, 12    8, 17

Tests 1 to 9 are in the first block and tests 10 to 18 in the second block.
A similar design may also be used if the test programme is carried out on
two engines or with two batches of fuel. Within each block the order of
the tests is randomized and we have a randomized block factorial experi­
ment. This is the best design for the particular experiment we have
described, though a simple randomized factorial experiment will
sometimes be adequate in other situations.
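
One way of generating such a test sequence is sketched below (our code, not from the book): list the nine treatment combinations and shuffle them independently within each of the two blocks.

```python
# Sketch: generate a randomized block test sequence for the 3 x 3
# burner/fuel experiment, one replicate of all nine combinations per block.
import random

random.seed(7)   # any seed; fixed here only to make the plan reproducible
combos = [(b, f) for b in ('B1', 'B2', 'B3') for f in ('F1', 'F2', 'F3')]

test = 1
for block in (1, 2):
    order = combos[:]
    random.shuffle(order)            # randomize the order within the block
    for burner, fuel in order:
        print(f'test {test:2d}  block {block}  {burner} {fuel}')
        test += 1
```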
The data from the above experiment is given later in Example 1, and
will be investigated after we have discussed the analysis of a factorial
experiment in general terms.

11.4 The analysis of a complete factorial experiment


In this section we discuss the analysis of complete factorial experiments,
concentrating, as in section 11.3, on two-factor experiments. The first
step in such an analysis is to plot the results as shown in Figure 61,
since useful information can often be obtained simply by looking at the
resulting graph. For example, if the lines are approximately parallel,



then there is no evidence of an interaction between the factors, whereas
if the lines are skewed to one another then an interaction is probably
present. Note that if one of the factors is quantitative (for example, the
amount of fuel additive), this factor should be chosen as the horizontal
axis in order to show up a functional relationship with the response
variable.
If there is no evidence of an interaction, it is useful to calculate the
main effects of the two factors by tabulating the results in a two-way
table and calculating the row and column averages. These can be
compared with the over-all average. However if an interaction is
present, these main effects cease to have much meaning by themselves
as the effect of one factor will depend on the level of the other factor.
Occasionally the above plotting and tabulation will be sufficient
since the conclusions will be ‘obvious’. For example, if the difference
between two observations on the same treatment combination is ‘small’
compared with the differences between observations on different
treatment combinations, then it may be possible to find the best burner-
fuel combination by inspection. However the residual variation will
usually be sufficiently large to make it impossible to come to such a
clear-cut conclusion. In any case the experimenter is often concerned
with estimating the treatment differences rather than just finding the
best treatment combination. Thus a mathematical model for a
factorial experiment will be proposed, after which the factor effects can
be estimated. Then these effects will be tested to see if they are signifi­
cantly large.
Let us suppose that factor A is investigated at r levels and factor B
at c levels and that the experiment is replicated n times. The following
model is proposed to describe this situation:

x_ijk = μ + A_i + B_j + (A × B)_ij + ε_ijk    (i = 1, ..., r; j = 1, ..., c; k = 1, ..., n),

where μ = over-all average,
A_i = effect of A at ith level,
B_j = effect of B at jth level,
(A × B)_ij = joint influence of A at ith level and B at jth level, that is,
the interaction effect,
ε_ijk = random error.

Since μ is the over-all average it can be shown that

Σ_i A_i = Σ_j B_j = 0.

It can also be shown that

Σ_i (A × B)_ij = 0    (for all j)

and

Σ_j (A × B)_ij = 0    (for all i).

It is again convenient to assume that the errors are normally distributed
with mean zero and constant variance σ², and that successive errors are
independent.
The data can be tabulated in the following way.
Table 20

                           Factor B
Factor A     Level 1             Level 2             ...   Level c             Row total   Row average
Level 1      x_111, ..., x_11n   x_121, ..., x_12n   ...   x_1c1, ..., x_1cn   T_1.        x̄_1.
Level 2      x_211, ..., x_21n   x_221, ..., x_22n   ...   x_2c1, ..., x_2cn   T_2.        x̄_2.
  ⋮
Level r      x_r11, ..., x_r1n   x_r21, ..., x_r2n   ...   x_rc1, ..., x_rcn   T_r.        x̄_r.

Column total     T.1             T.2                 ...   T.c                 T
Column average   x̄.1             x̄.2                 ...   x̄.c

The following quantities are calculated as shown:

x̄_i. = T_i./nc,    x̄.j = T.j/nr,

T = Σ_i T_i. = Σ_j T.j,    x̄ = T/nrc,

T_ij = (sum of observations in (i, j)th cell),

x̄_ij = (average observation in (i, j)th cell) = T_ij/n.


It can be shown that the best unbiased estimates of the model para­meters
are given by

μ̂ = x̄,    Â_i = x̄_i. − x̄,    B̂_j = x̄.j − x̄,

with the estimate of (A × B)_ij given by x̄_ij − x̄_i. − x̄.j + x̄.

The next step is to test the main effects of the two factors and the
interaction effect to see if any of them are significantly large. The
best way of doing this is by an analysis of variance. The total sum of
squares, Σ_{ijk} (x_ijk − x̄)², is partitioned into four components. Two of these
components, the row sum of squares and the column sum of squares,
have the same value as in the two-way Anova described in section 10.11.
In addition the interaction sum of squares is calculated. The required
formulae are given below.

Table 21
Two-factor Anova with interaction

Source                  Sum of squares                        d.f.             E(mean square)
Main effect A (rows)    nc Σ_i (x̄_i. − x̄)²                    r − 1            σ² + nc Σ A_i²/(r − 1)
Main effect B (columns) nr Σ_j (x̄.j − x̄)²                     c − 1            σ² + nr Σ B_j²/(c − 1)
Interaction             n Σ_ij (x̄_ij − x̄_i. − x̄.j + x̄)²       (r − 1)(c − 1)   σ² + n Σ (A × B)_ij²/[(r − 1)(c − 1)]
Residual                Σ_ijk (x_ijk − x̄_ij)²                  rc(n − 1)        σ²
Total                   Σ_ijk (x_ijk − x̄)²                     rcn − 1

The observed mean squares of the different effects are obtained by


dividing the appropriate sum of squares by the appropriate number of
degrees of freedom. The expected values of these mean squares are
shown above. It is clear from these quantities that the A, B, and interac­
tion mean squares should be compared with the residual mean square
by means of an F-test. For example, to test the hypothesis
H₀₁: all A_i = 0,



calculate

F = (A mean square)/(residual mean square).

If this exceeds F_{0.05, r−1, rc(n−1)}, then H₀₁ is rejected at the 5 per cent
level. A similar procedure is adopted to test the hypotheses

H₀₂: all B_j = 0    and    H₀₃: all (A × B)_ij = 0.
We will not attempt to derive the algebraic quantities shown in
Table 21. However it is worth stressing that the total sum of squares is
equal to the sum of the other sums of squares. Similarly the total
number of degrees of freedom is equal to the sum of the constituent
degrees of freedom.
The computation would normally be performed with a computer
package and involves calculating the row and column totals, the
individual cell totals, the over-all total, Σ_{ijk} x_ijk² and the correction
factor T²/nrc. Then

(total corrected sum of squares) = Σ_{ijk} x_ijk² − T²/nrc,

(row sum of squares) = nc Σ_{i=1}^{r} (x̄_i. − x̄)² = Σ_{i=1}^{r} T_i.²/nc − T²/nrc,

(column sum of squares) = nr Σ_{j=1}^{c} (x̄.j − x̄)² = Σ_{j=1}^{c} T.j²/nr − T²/nrc,

(between-cells sum of squares) = Σ_{ij} T_ij²/n − T²/nrc.

The interaction sum of squares can then be obtained by subtraction.



It is now a simple matter to calculate the mean squares and the
F-ratios and see if the main effects and the interaction effect are
significant.

Example 1
Two replications were made in the experiment to compare three
burners and to investigate a new fuel additive. The data is given below.

                       Burners
       B₁   Cell    B₂   Cell    B₃   Cell    Row
            total        total        total   total
F₁     16   34      19   36      23   43      113
       18           17           20
F₂     19   39      25   48      19   37      124
       20           23           18
F₃     22   43      21   45      19   36      124
       21           24           17

Column 116          129          116          T = 361
total

n = 2, r = 3, c = 3, Σ_{ijk} x_ijk² = 7351 and T²/18 = 7240.0.


The Anova table is given below.

Source                  Sum of squares   d.f.   Mean square   F
Main effect (Fuel)      13.5             2      6.8           3.3
Main effect (Burners)   18.8             2      9.4           4.5
Interaction             60.2             4      15.0          7.3
Residual                18.5             9      2.1
Total                   111              17

F_{0.05,2,9} = 4.26,    F_{0.05,4,9} = 3.63.



The interaction and burner effects are both significant at the 5 per cent
level but the main effect of fuel is not significantly large. This does not
mean that the fuel factor does not affect the response variable. Since
the interaction is significantly large, the effect of a particular burner
depends on the level of the fuel factor. We shall see how to interpret
these results in the next section.
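
The sketch below (ours; assuming numpy and scipy are available) reproduces this Anova from the raw data, using the computing formulae of section 11.4.

```python
# Sketch: two-factor Anova with interaction for Example 1.
# x[i, j, :] holds the n = 2 replicates for fuel level i and burner j.
import numpy as np
from scipy import stats

x = np.array([[[16, 18], [19, 17], [23, 20]],      # F1
              [[19, 20], [25, 23], [19, 18]],      # F2
              [[22, 21], [21, 24], [19, 17]]],     # F3
             dtype=float)
r, c, n = x.shape
cf = x.sum()**2 / x.size                                 # 7240.06
total_ss = (x**2).sum() - cf                             # 110.9
fuel_ss = (x.sum(axis=(1, 2))**2).sum() / (n*c) - cf     # 13.4
burner_ss = (x.sum(axis=(0, 2))**2).sum() / (n*r) - cf   # 18.8
cells_ss = (x.sum(axis=2)**2).sum() / n - cf
inter_ss = cells_ss - fuel_ss - burner_ss                # 60.2
resid_ms = (total_ss - cells_ss) / (r * c * (n - 1))     # 18.5 / 9

for ss, df in [(fuel_ss, r - 1), (burner_ss, c - 1), (inter_ss, (r-1)*(c-1))]:
    F = (ss / df) / resid_ms
    print(round(F, 1), round(stats.f.sf(F, df, r * c * (n - 1)), 3))
```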
In the above example the block effect is included in the residual
variation since there is no evidence of block to block variation. Let
T..k = sum of observations in the kth block. Then in Example 1, T..₁ = 183
and T..₂ = 178, and these values are close together. Generally with n
replications, the quantity Σ_{ijk} (x_ijk − x̄_ij)² can be split into two components

Σ_{ijk} (x_ijk − x̄_ij − x̄..k + x̄)² + rc Σ_{k=1}^{n} (x̄..k − x̄)².

The first component has (rc − 1)(n − 1) d.f. and measures the residual
variation. The second component has (n − 1) d.f. and measures the
block effect, and can be calculated with the following formula:

rc Σ_{k=1}^{n} (x̄..k − x̄)² = Σ_{k=1}^{n} T..k²/rc − T²/nrc.

Then the true residual sum of squares can be calculated by subtracting
this quantity from Σ_{ijk} (x_ijk − x̄_ij)². Note that once we have separated
the block effect, we have actually analysed a three-factor situation
since the different blocks can be thought of as the levels of a third
factor.
One other point of interest is that if only one replication is made
(that is, n = 1), then, by referring to Table 21, it can be seen that the
total degrees of freedom are exhausted by the main effect and interac­
tion degrees of freedom. The formulae are then exactly the same as in a
randomized block experiment (see Table 18), except that the residual
sum of squares becomes the interaction sum of squares. Thus it is
necessary to replicate the experiment in order to estimate the residual
variation.
11.5 Follow-up procedure
The analysis of variance is a straightforward method for deciding if
the main effects and/or the interactions are significantly large. But
the follow-up procedure of interpreting the results is by no means so
clear cut.



We begin by discussing the situation where there is no interaction.
In this case the two factors can be examined separately. If one of the
factors is qualitative and has a significant main effect, it is possible to
find the level or levels which give the largest values of the response
variable. This can be done, as in Chapter 10, with a least significant
difference test. If one of the factors is continuous and has a significant
main effect, the problem is to find the functional relationship between
the response variable and the factor. In other words we have a regression
problem.
The true type of functional relationship is usually unknown but it is
often convenient to fit a polynomial to the data of the form

y = β₀ + β₁x + β₂x² + ⋯ .
If the factor is investigated at c levels, it is possible to fit a polynomial
up to degree (c —1), although linear and quadratic terms are often all
that are required. Note that the number of independent parameters
involved in a polynomial model of degree (c − 1) is the same as the
number of independent parameters involved in a model of type 10A
or 10B. However if a polynomial whose degree is less than (c − 1) fits
the data adequately then fewer parameters will be required in the
polynomial model. In any case the polynomial representation will be
more meaningful for a continuous variable.
One way of finding the lowest degree polynomial which adequately
fits a set of data is by the method of orthogonal polynomials (see
section 8.7). The total sum of squares is split up into a linear component,
a quadratic component, and so on. In contrast, the main effect sum of
squares given in Table 21 does not take into account the order of the
levels of the factor. This means, for example, that a linear trend in the
response variable may be overlooked if the non-linear effects are small,
since the main effect sum of squares in Table 21 is divided by (c − 1)
rather than one, and so may not give a significant result. The reader
should be aware of such a possibility and be prepared to evaluate the
different components of the main effect sum of squares (see, for example,
Davies, 1956).
We now turn our attention to the situation where there is an interac­
tion between the two factors. In this case the main effects cease to have
much meaning by themselves, since the effect of one factor depends on
the level of the other factor. In particular a factor cannot be discounted
because its main effect is small if an interaction is present. If one of the
factors is qualitative the results should be examined at each level of this
factor. If the other factor is continuous, the functional relationship



between the response variable and this continuous variable will
depend on the level of the qualitative variable. Alternatively if both
factors are continuous variables, then we have a multiple regression
problem with two controlled variables.

11.6 The 2ⁿ factorial design


A special type of complete factorial experiment is one in which n
factors are each investigated at just two levels. There are then 2ⁿ
possible treatment combinations. Such an experiment is useful when a
large number of factors have to be considered since it would require too
many tests to run each factor at more than two levels. An experiment
such as this which picks out the important factors from a number of
possibilities is often called a screening experiment.
For simplicity we begin by considering an experiment involving
just three factors A, B and C, each of which is investigated at two
levels; high and low. It is useful to have a systematic method of designa­
ting the treatment combinations in such an experiment. We will use
the appropriate small letter a, b and c when the corresponding factor
is at the high level. The absence of a letter means that the corresponding
factor is at the low level. Thus the treatment combination ab is the one
in which A and B are at the high level but C is at the low level. The
symbol (1) is used when all factors are at the low level. The eight
possible treatment combinations are shown in Figure 62.

Figure 62  The eight possible treatment combinations in a 2³ experiment

If just one observation is made at each treatment combination


we have a total of eight observations. These results can be combined in



seven different ways to estimate different effects. For example, there are
four observations at the high level of A and four observations at the low
level. The average difference between them is an estimate of the main
effect of A. Denoting the observations by the corresponding treatment
combination we have

(main effect of A) = ¼[a + ab + ac + abc − (1) − b − c − bc].

Similarly the main effect of B is given by

¼[b + ab + bc + abc − (1) − a − c − ac].

An estimate of the AB interaction is given by half the difference
between the main effect of A, at the high level of B, and the main effect
of A, at the low level of B. Thus we have

(AB interaction) = ½[½(ab + abc − b − bc) − ½(a + ac − (1) − c)]
                 = ¼[ab + abc + (1) + c − b − bc − a − ac].
Similar expressions can be found to estimate the main effect C, the
AC and BC interactions and also the ABC interaction. Each of these
estimates, which are called contrasts, is a simple linear combination of
the eight observations.
It is instructive to tabulate these contrasts in the following way.

Table 22
                              Effect
          Average   A    B    AB   C    AC   BC   ABC
(1)       +         −    −    +    −    +    +    −
a         +         +    −    −    −    −    +    +
b         +         −    +    −    −    +    −    +
ab        +         +    +    +    −    −    −    −
c         +         −    −    +    +    −    −    +
ac        +         +    −    −    +    +    −    −
bc        +         −    +    −    +    −    +    −
abc       +         +    +    +    +    +    +    +

Factor    1/8       1/4  1/4  1/4  1/4  1/4  1/4  1/4

Each effect is obtained by adding all the observations which have


a plus in the appropriate column and subtracting all the observations
which have a minus in this column. The result is divided by four. The
calculations are completed, as shown in the first column of the table,
by adding up all the observations to obtain the over-all average and
dividing by eight.



Students with a knowledge of matrix theory will recognize that the
columns are orthogonal. In simple terms this means that each of the
last seven columns contains four plus and four minus signs, and if any
two columns are cross-multiplied in pairs, using the rule
(+) × (+) = (−) × (−) = + and (+) × (−) = −, then four plus and four minus
signs will result. The practical result of this orthogonality is that the
estimate of one effect will not be affected by changes in any of the
other effects.
Generally it is always a good idea to see that an experimental design
has the property of orthogonality. In particular a complete factorial
experiment is orthogonal if each treatment combination is tested the
same number of times so that the experiment is symmetric with regard
to all the factors. Orthogonal designs have several advantages. Firstly
the resulting calculations are simplified. Secondly the estimate of one
effect is not affected by changes in one or more of the other effects.
Thirdly they are efficient in the sense that for a given number of tests
the effects can be estimated with greater precision using an orthogonal
design than with any other type of design. The commonest form of non-
orthogonal data arises when one or more observations are missing
from a complete factorial experiment. Such data is much more difficult
to analyse.
Another interesting point about Table 22 is that all the interaction
columns can be obtained from the A, B and C columns by multiplying
the appropriate columns. For example, the AB column can be obtained
by multiplying the A and B columns.
An important feature of the 2ⁿ experiment is that an analysis of
variance is easy to carry out once the effect totals have been calculated.
Each effect total is squared and divided by 2ⁿ, and this gives the sum of
squares corresponding to that effect. A systematic method of estimating
the effects and of performing the analysis of variance has been proposed
by Yates. List the treatment combinations in a systematic way (see
Table 22 and Table 23) and beside them list the corresponding observa­tions.
With n factors, n columns have to be calculated. Each column
is generated from the preceding column in the same way. The first 2ⁿ⁻¹
numbers in a column are the sums of successive pairs of numbers in the
preceding column. The next 2ⁿ⁻¹ numbers are the differences of
successive pairs in the preceding column, the first number in a pair
being subtracted from the second number. Then the final column gives
the effect totals corresponding to the particular treatment combinations.
The effects can be estimated by dividing the effect totals by 2ⁿ⁻¹. The
sum of squares for each effect is obtained by squaring the effect total
and dividing by 2ⁿ.



After calculating the sum of squares for each effect, the next problem
is to see which of the effects are significantly large. In order to do this
we need an estimate of the residual variation. Unfortunately we have
already seen that if a factorial experiment is only replicated once, then
the main effects and interactions account for all the degrees of freedom.
Thus in a 2^4 experiment there are fifteen degrees of freedom which are
allocated as follows.

                            d.f.
Main effects                  4
Two-factor interactions       6
Three-factor interactions     4
Four-factor interactions      1

Total                        15

One way round this difficulty is to use the fact that interactions involv­
ing more than two factors are rarely of practical significance. Thus the
residual mean square can be estimated by combining the sums of
squares of interactions involving three or more factors and dividing
by the corresponding number of degrees of freedom. The main effects
and two-factor interactions each have one d.f., and so the mean square of
each of these effects is the same as the corresponding sum of squares.
A series of F-ratios can now be obtained by dividing the main effect
and two-factor interaction mean squares by the residual mean square.
This enables us to determine which effects are significantly large.

Example 2
The following data are the results from a 2^4 factorial experiment.

                  A1            A2
               B1    B2      B1    B2

 C1     D1     18    16      12    10
        D2     14    15       8     9

 C2     D1     16    19      12    13
        D2     14    11       7     8

Use Yates's method to estimate the effects. Hence perform an analysis of variance and see which effects are significantly large.



Table 23

Observation        I     II    III   Effect   Effect   Effect
                                     total             S.S.

(1)         18    30     56    116     202     12.6       -
a           12    26     60     86     -44     -5.5     121.0
b           16    28     46    -22       0      0.0       0.0
ab          10    32     40    -22       2      0.25      0.25
c           16    22    -12      0      -2     -0.25      0.25
ac          12    24    -10      0       4      0.5       1.0
bc          19    21    -12     -2       4      0.5       1.0
abc         13    19    -10      4       2      0.25      0.25
d           14    -6     -4      4     -30     -3.75     56.25
ad           8    -6      4     -6       0      0.0       0.0
bd          15    -4      2      2       0      0.0       0.0
abd          9    -6     -2      2       6      0.75      2.25
cd          14    -6      0      8     -10     -1.25      6.25
acd          7    -6     -2     -4       0      0.0       0.0
bcd         11    -7      0     -2     -12     -1.5       9.0
abcd         8    -3      4      4       6      0.75      2.25

The three- and four-factor interaction sums of squares are combined to give

(residual sum of squares) = 0.25 + 2.25 + 0.0 + 9.0 + 2.25 = 13.75.
This quantity is based on five degrees of freedom, so that

(residual mean square) = 13.75/5 = 2.75.
The F-ratios are as follows:

Effect    F-ratio
A          44.0
B           0.0
C           0.1
D          20.5
AB          0.1
AC          0.4
AD          0.0
BC          0.4
BD          0.0
CD          2.3

F(0.05; 1, 5) = 6.61        F(0.01; 1, 5) = 16.26



We conclude that the A and D effects are highly significant but that
none of the other effects are significant.
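The whole of Table 23 can be reproduced by machine. The following minimal Python sketch, not part of the text, applies Yates's algorithm to the sixteen observations of Example 2 and prints the effect totals, effects and sums of squares; the F-ratios then follow on dividing each main-effect and two-factor interaction sum of squares by the residual mean square of 2.75.

import numpy as np

# Observations of Example 2 in standard order:
# (1), a, b, ab, c, ac, bc, abc, d, ad, bd, abd, cd, acd, bcd, abcd
y = np.array([18, 12, 16, 10, 16, 12, 19, 13,
              14,  8, 15,  9, 14,  7, 11,  8], dtype=float)
n = 4                                      # number of factors

col = y.copy()
for _ in range(n):                         # n passes of Yates's algorithm
    pairs = col.reshape(-1, 2)
    # first half: sums of successive pairs;
    # second half: second member of each pair minus the first
    col = np.concatenate([pairs.sum(axis=1), pairs[:, 1] - pairs[:, 0]])

labels = ['mean', 'A', 'B', 'AB', 'C', 'AC', 'BC', 'ABC',
          'D', 'AD', 'BD', 'ABD', 'CD', 'ACD', 'BCD', 'ABCD']
for i, (name, total) in enumerate(zip(labels, col)):
    divisor = 2 ** n if i == 0 else 2 ** (n - 1)   # the over-all mean uses 2^n
    ss = total ** 2 / 2 ** n                       # SS = (effect total)^2 / 2^n
    print(f'{name:5s} total {total:6.0f}  effect {total / divisor:7.3f}  SS {ss:6.2f}')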

11.7 Fixed effects and random effects


So far we have only considered models which are often referred to as
fixed-effects models. For example, in the experiment to compare three
burner designs, the effects of these burners can be considered to be
fixed since the experimenter is interested in these particular designs and
in no others. However there are some situations in which it is desirable
to try and draw conclusions about a wider population than that covered
by the experiment. For example, suppose we want to compare the per­
formance of a chemical plant on different days. Then it is reasonable to
consider the days on which tests were made as a random sample from
the population of all possible days. In this case the factor ‘days’ is
called a random factor and the corresponding mathematical model is
called a random-effects model, or components-of-variance model. Using
a one-way classification the random-effects model is as follows:
x_ij = μ + t_i + ε_ij    (i = 1, ..., c; j = 1, ..., n),

where E(t_i) = 0 and the experimental values of the t_i are a random sample from a normal distribution with mean zero and variance σ_t². Also assume that the ε_ij are independent N(0, σ²). The fixed-effects model looks somewhat similar, but the treatment effects, t_i, are fixed and subject to the restriction Σ t_i = 0.
The fixed-effects model is concerned with testing the hypothesis

H_0: all t_i = 0,

whereas the random-effects model is concerned with testing the hypothesis

H_0: σ_t² = 0.
With a one-way classification it turns out that the method of analysing the data is the same. In the one-way ANOVA it can be shown that

E(treatment mean square) = σ² + n Σ t_i²/(c − 1)    (fixed effects)
                         = σ² + n σ_t²              (random effects).

In the fixed-effects model the quantity Σ t_i²/(c − 1) can be thought of as measuring the spread of the t_i's, and is therefore analogous to σ_t².



However although the analysis is the same in both cases, the distinction
between the two types of model is still important because the hypotheses
tested are not the same.
The distinction between the two types of model is even more impor­
tant with more than one factor, since the method of analysing the
data may be different. For example, if there is a significant interaction
in a two-way analysis of variance, the factor mean squares must be
compared, not with the residual mean square, but with the interaction
mean square. The reader is referred for example to Wetherill (1981,
Chapter 14). Also note that it is possible to have a mixed model in
which some of the factors are fixed and some are random.
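The components-of-variance result quoted above is easily illustrated by simulation. Here is a small sketch, assuming arbitrary parameter values of my own choosing, which generates repeated sets of one-way data from the random-effects model and confirms that the average treatment mean square is close to σ² + nσ_t².

import numpy as np

rng = np.random.default_rng(1)
c, n = 8, 5                      # c groups with n observations each
sigma_t, sigma = 2.0, 1.0        # assumed values for the simulation

ms_treat = []
for _ in range(2000):
    t = rng.normal(0.0, sigma_t, size=c)                  # random group effects
    x = t[:, None] + rng.normal(0.0, sigma, size=(c, n))  # mu = 0 for simplicity
    group_means = x.mean(axis=1)
    ms_treat.append(n * np.sum((group_means - x.mean()) ** 2) / (c - 1))

print(np.mean(ms_treat))               # close to the theoretical value below
print(sigma ** 2 + n * sigma_t ** 2)   # sigma^2 + n*sigma_t^2 = 21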

11.8 Other topics


The material discussed up to this point covers the most important
aspects of experimental design. To round off our discussion we will
briefly mention several other topics, the details of which can be obtained
from the references.

11.8.1 Nested designs


In a complete factorial experiment a test is made at every possible treat­
ment combination, and for this reason such a design is often called a
crossed design. However it is sometimes impossible or impractical to do
this and then a nested design may be appropriate. For example, suppose
that samples of a particular chemical are sent to four different labora­
tories. Within each laboratory two technicians each make two measure­
ments on the percentage of iron in the chemical. This situation can be
represented diagrammatically by a tree, with the four laboratories at the top, two technicians within each laboratory, and two measurements under each technician. (L stands for laboratory, T for technician and M for measurement.)


It is obviously impractical to insist that each technician should make
two measurements in each laboratory and so we say that the factor
'technicians’ is nested within the factor 'laboratories’.



The following is a suitable model:

x_ijk = μ + l_i + t_ij + ε_ijk    (i = 1, 2, 3, 4; j = 1, 2; k = 1, 2),

where x_ijk = kth observation by the jth technician in the ith lab, l_i = effect of the ith lab, and t_ij = effect of the jth technician within the ith lab. One result of this nested design is that it is impossible to detect a technician-laboratory interaction.
11.8.2 Confounding
In a complete factorial experiment, the number of tests which form a
homogeneous group may be less than the number of tests required to
perform one replicate of the experiment. If higher order interactions
are of little importance, a technique called confounding can be used to
divide the tests into smaller blocks in such a way that information
about the differences between blocks is ‘mixed up’ or confounded with
the information about high-order interactions.

11.8.3 Fractional factorials


Sometimes, a complete factorial experiment may require too many tests. For example, with seven factors, the simplest complete factorial requires 2^7 = 128 tests. If only main effects and low order interactions are of interest (as in a screening experiment), then a carefully chosen subset of the possible treatment combinations may be adequate. For example a quarter replicate of a 2^7 experiment requires 128/4 = 32 tests. Such designs, called fractional factorials (e.g. Box et al., 1978; Daniel, 1976), are used in the so-called Taguchi methods (see p. 317).

11.8.4 Analysis of covariance


This technique is a combination of the methods of regression and the
analysis of variance. Suppose that a randomized block experiment is
carried out in order to estimate a number of treatment effects. It
sometimes happens that the experimenter wishes to make these
estimates after adjusting the observations for the effects of one (or
more) continuous variables which have been measured at the same
time. For example, the efficiency of the burner of a turbo-jet engine
depends on the ambient temperature, and so it would be wise to estimate
the differences between burners after taking the ambient temperatures
into account. The additional variable is often called a concomitant
variable. If an analysis of variance is performed on the original data
without adjustment, and if the response variable really does depend



on the concomitant variable, then the results of the analysis will be
inaccurate.
The simplest case is one in which the response variable, y, depends linearly on the concomitant variable, x, in which case a suitable model is as follows:

y_ij = μ + t_i + b_j + a(x_ij − x̄) + ε_ij,

where y_ij = observation on treatment i in block j, b_j = effect of block j, and x_ij = corresponding value of the concomitant variable.

11.9 The examination of residuals


It should be clear by now that the standard procedure for analysing
data is to propose a mathematical model to describe the physical
situation, to estimate the unknown parameters of this model, and hence
to draw conclusions from the data. In order that the conclusions should
be valid, it is important that the model should be reasonably accurate.
Thus the assumptions on which the model is based should be carefully
checked. This can be done by looking at the residuals. The residual of
an observation is the difference between the observation and the
value predicted after fitting the model:

residual = observation − fitted value.
In section 10.9 we discussed how to check the assumptions involved
in the mathematical model which was proposed to describe a simple
comparative experiment. A similar technique can be applied to many
other situations, including analysis of variance and regression models.
A preliminary examination of the residuals should be sufficient to
detect any gross errors in the models. For example, if a straight line is
mistakenly fitted to pairs of observations which actually follow a
quadratic relationship, then a series of positive residuals will be followed
by a series of negative residuals and vice versa. When the residuals are
clearly non-random it may be that the type of model is incorrect or
that the parameters of the model have been estimated incorrectly.
For example, if an arithmetical mistake has been made we might find
that all the residuals have the same sign.
Another useful procedure is to plot the residuals against external
uncontrolled variables. For example, if the tests are made sequentially,
the residuals can be plotted against time. If they do not appear to be
random, the experimenter should try to find the cause of the non­
randomness and if possible remove it.



Figure 63 The result of mistakenly fitting a straight line to data showing a non-linear relationship

Another important use of residuals is to detect outliers in a set of data. An outlier is a 'wild' observation which does not appear to be
consistent with the rest of the data. There are various procedures for
detecting outliers (e.g. see Barnett and Lewis, 1984) including tests
on large residuals. When a suspect value has been detected, the
analyst must decide what to do about it. If possible the original
records should be checked in case there is a mistake. However, a
large residual need not indicate an error. It may result from fitting the
wrong model or from a genuine extreme observation. An outlier can
be removed completely (called trimming) or replaced with some
imputed value such as the overall mean or the next largest or smallest
observation (called Winsorization). It may be desirable to use robust
methods of analysis which automatically downweight extreme
observations. Alternatively, it may be helpful to carry out an analysis
with and without an outlier to see if the results depend crucially on
just one or two observations. Further remarks on checking data
quality are given by Chatfield (1988, section 6.4).

11.10 Determination of optimum conditions


Up to this point we have considered situations in which the objective
has been to obtain a general picture of how the system is affected by
changes in the controlled factors. However it sometimes happens,
particularly in the chemical industry, that the prime objective is simply
to find the conditions which maximize (or minimize) some performance
criterion. For example, we might want to minimize the total cost per
unit yield. Since a minimization problem can be converted to a maxi­
mization problem, we will only consider the latter.



Firstly it is important to understand what is meant by a response surface. The response variable, y, is an unknown function of the controlled variables, x_1, ..., x_k:

y = f(x_1, ..., x_k).

Figure 64 A contour diagram

We will consider the case where all the variables are continuous. In the case of one or two controlled variables, it is convenient to think of this function in geometrical terms. With just one variable, x_1, the relation between y and x_1 can be represented by a curve, whereas with two variables, x_1 and x_2, the relation between y, x_1 and x_2 can be
represented by a surface whose height, at particular values of the
controlled variables, is equal to the corresponding value of the response
variable. These values of the response variable generate what is called
a response surface. It is often convenient to represent this response
surface by a contour diagram, similar to a geographical map. On each
contour the value of the response variable is a constant.
The problem of maximizing y is complicated by several factors.
Firstly there may be restrictions on the values which the controlled
variables can take; secondly we may not know the type of functional
relationship between the response variable and the controlled variables,
and thirdly there will be measurement errors in the observed values of
the response variable. However in chemical processes the coefficient
of variation of observations made under identical conditions is usually
reasonably small.
Methods of maximizing y are discussed in Davies (1956). A straight­
forward though laborious procedure is as follows. With a single
controlled variable, x, measure the response variable over a wide range



of values of x, and plot the results on a scatter diagram. If a smooth
curve is drawn through the observations, an estimate of the maximum
will usually be visible. Alternatively a suitable function can be fitted
to the data by the method of least squares. For example, we could assume that there is a quadratic relationship between y and x of the form

y = a_0 + a_1 x + a_2 x².

Estimates of a_0, a_1 and a_2 can be found as shown in Chapter 8. Then an estimate of the optimum value of x can be found by setting dy/dx equal to zero. This gives

x̂ = −â_1/(2â_2).
If there are k controlled variables, the response variable can be mea­
sured at a grid of points throughout the region of interest. In other
words a complete factorial experiment is performed. A response surface
can be fitted to the data and this enables us to estimate the optimum
conditions. Unfortunately this procedure may require a large number
of tests.
An alternative, and perhaps more efficient, approach is to proceed iteratively (e.g. Cochran and Cox, 1957; Box and Draper, 1987). With one controlled variable, the response variable is measured at what is considered to be a reasonable starting point, say x_0, and also at a somewhat higher value x_0 + a. The position of the third measurement depends on the first two results. It is made above x_0 + a, if the second observation is the larger, and below x_0, if the first observation is the larger. However if the first two observations are about the same, the third measurement can be made between x_0 and x_0 + a. This step procedure continues until a maximum is found.
In the case of two controlled variables a procedure called the method of steepest ascent can be employed. If the initial conditions at the point P are not too close to the maximum, the response surface can be approximated locally by a plane,

y = a_0 + a_1 x_1 + a_2 x_2.

If the experimental error is small, these parameters can be estimated with a 2² experiment around P. Then an estimate of the path of steepest ascent can be obtained by making changes in the x_i which are proportional to the estimated coefficients â_i. The reader is warned that this path will depend on the scales in which the controlled variables are measured.



Figure 65 (a) Path of steepest ascent goes close to Q (b) Path of steepest ascent does not go close to Q (P denotes initial conditions, Q denotes optimum conditions)

Best results are obtained if the contour lines are approximately circular;
if the contour lines are elongated ellipses then the results will be rather
poor. The reason for this is that the path of steepest ascent is not
preserved when a linear transformation is made on one of the controlled
variables. The best procedure is to standardize the controlled variable
in such a way that unit changes in each variable can be expected to have
about the same effect. This is best illustrated by an example.

Example 3
The percentage of the theoretical yield of a chemical process, y, depends on the temperature, x_1, and the percentage of one of the chemicals, x_2. The objective of the experimental programme is to maximize y. A reasonable starting point is known to be x_1 = 425°C and x_2 = 11 per cent. A change of 25°C in x_1 and a change of 1 per cent in x_2 are thought to be roughly equivalent. A 2² experiment was carried out and the results are given below.

                x_1
            400      450

     10      50       52
x_2
     12      54       56

It is possible to fit a plane to the data as it stands, but it is much better to standardize the controlled variables with the following linear transformations:

x'_1 = (x_1 − 425)/25,        x'_2 = (x_2 − 11)/1.



This will not only give a better path of steepest ascent but will also simplify the arithmetic considerably.

              x'_1
           −1      +1

   −1      50      52
x'_2
   +1      54      56

Assume that the response surface can be represented locally by the plane

y = a_0 + a_1 x'_1 + a_2 x'_2.

Then estimates of a_0, a_1 and a_2 can be obtained by the method of least squares, as shown in section 8.6. Since the experiment is orthogonal and the controlled variables have been standardized we have

Σ x'_1i = Σ x'_2i = Σ x'_1i x'_2i = 0.    [Note: i = 1 to 4 in all summations.]

This considerably simplifies the least squares normal equations to give

â_0 = ȳ = 53,

â_1 = Σ x'_1i y_i / 4 = (52 + 56 − 54 − 50)/4 = 1,

â_2 = Σ x'_2i y_i / 4 = (54 + 56 − 52 − 50)/4 = 2.

Thus the estimated plane is given by

ŷ = 53 + x'_1 + 2x'_2.
The path of steepest ascent can be obtained by changing x'_1 and x'_2 in the ratio 1:2. But a change of one unit in x'_1 is equivalent to a 25°C change in x_1, and a change of two units in x'_2 is equivalent to a change of 2 per cent in x_2. Thus the path of steepest ascent can be obtained by starting at x_1 = 425°C and x_2 = 11 per cent, and changing x_1 and x_2 in the ratio 25°C : 2 per cent. Some possible points for further analysis are given below.

                         x_1      x_2
Starting point           425      11
                         437.5    12
Possible points          450      13
                         462.5    14
                         475      15



Observations are made along this line until a maximum is found, then
the procedure can be repeated.
Note that if the variables are not standardized, the least squares
equations will be much more difficult to solve. In addition the response
contours are very elongated ellipses in the original units whereas they
will be closer to circles in the standardized units. This means that the
method of steepest ascent will get to the maximum more quickly by
using the standardized units.
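Readers who wish to check the least squares arithmetic by machine can do so with the following sketch, which uses numpy's least-squares routine in place of the hand calculation:

import numpy as np

# Design matrix [1, x1', x2'] for the 2^2 experiment in standardized units
X = np.array([[1.0, -1.0, -1.0],
              [1.0,  1.0, -1.0],
              [1.0, -1.0,  1.0],
              [1.0,  1.0,  1.0]])
y = np.array([50.0, 52.0, 54.0, 56.0])

a, *_ = np.linalg.lstsq(X, y, rcond=None)
print(a)   # [53. 1. 2.], i.e. y = 53 + x1' + 2*x2'
# The path of steepest ascent changes (x1', x2') in the ratio a[1] : a[2] = 1 : 2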
In the second stage of the optimization procedure, as the values of the
controlled variables get closer to the optimum conditions, the second
order effects will become ‘large’ compared with first order effects.
Then it will no longer be sufficient to approximate the response surface
with a plane. Instead it will be necessary to perform a 3² experiment or
to use a design called a composite design (see Davies, 1956).
We will conclude this section by briefly mentioning another optimiza­
tion procedure called evolutionary operation (see Box and Draper,
1969; Lowe, 1974). This should be adopted when the process is
already operating at an acceptable level, and when it would be
impractical to interrupt production to carry out one of the experi­
ments already described. Evolutionary operation is often abbreviated
to Evop.
Suppose there are k controlled variables—usually two or three.
Then a 2^k experiment can be carried out by making small changes in
these variables. These changes should be chosen so that they do. not
seriously affect the manufacturing process. For example, if a change of
20°C in the process temperature is known to have a significant effect
then a much smaller change of say 2°C should be made. Since the
changes are so small, it will be necessary to replicate the experiment
several times in order to detect significant changes in the response
variable. At least three replicates or cycles of the 2* experiment will be
required. Then the local properties of the response surface (main
effects and first-order interactions) can be estimated. Once a significant
effect has been detected, the operating conditions can be changed in the
required direction, after which a new phase will be started. This pro­
cedure can be continued indefinitely.
11.11 Summary
The choice of the appropriate experimental design will depend upon
a number of factors, but the most important is the state of knowledge
of the phenomenon or mechanism which is being tested. Many indus-
trial experiments are carried out in situations in which there is already

a considerable amount of information available. In other words the
physical laws governing the behaviour of the system are well under­
stood. In this sort of situation the statistician is usually only required
to improve the precision and accuracy of the data. Thus when the
state of knowledge is high, the experimental uncertainty is minimized
and the need for statistics is not so acute.
Experiments which do not fall in this category are fewer in number
but much more important. The true research experiment is by defini­
tion not founded on well established knowledge. For example, there
are many instances in the history of science in which an experiment has
given a surprising result or has led to an accidental discovery. This
type of experiment, which has a high degree of experimental uncertainty,
will require all the skill of both the engineer and the statistician. The
experimental designs, which have been described in the last two
chapters, will then be of tremendous value and considerable care should
be taken to choose the most suitable design.

Exercises

1. A complete factorial experiment is set up to investigate factor A at r levels and factor B at c levels. The experiment is replicated n times. Show that

Σ_{i=1}^{r} Σ_{j=1}^{c} Σ_{k=1}^{n} (x_ijk − x̄)² = nc Σ_{i=1}^{r} (x̄_i. − x̄)² + nr Σ_{j=1}^{c} (x̄_.j − x̄)²

        + n Σ_{i=1}^{r} Σ_{j=1}^{c} (x̄_ij − x̄_i. − x̄_.j + x̄)² + Σ_{i=1}^{r} Σ_{j=1}^{c} Σ_{k=1}^{n} (x_ijk − x̄_ij)².

2. An experiment was set up to investigate the effect of four different catalysts, A, B, C and D, and the effect of agitating the mixture on the yield from a chemical reaction. No agitation will be denoted by E_1, medium agitation by E_2 and high agitation by E_3. A 4 × 3 factorial
experiment, twice replicated, was carried out. One replication formed
a block and the order of experiments within a block was randomized.
The results were as follows (percentage of theoretical yield).

        E1    E2    E3

A       44    49    50
        46    48    50

B       46    51    51
        45    47    52

C       50    53    55
        52    55    54

D       46    47    50
        45    50    49

Set up the ANOVA table. Graph the results, using 'amount of agitation' as the horizontal variate. Discuss the results of the analysis.
3. A singly replicated 2³ experiment is carried out with three factors, A, B and C. Use Yates's method to calculate the main effects and interactions from the following data.

            A1              A2

      17    18        16    14
      10    13              6

Calculate the sum of squares for each effect. Test the main effects by
comparing them with the mean square obtained by combining the two-
and three-factor interactions.

References
Barnett, V. and Lewis, T. (1984), Outliers in Statistical Data, 2nd edn, Wiley.
Box, G. E. P. and Draper, N. R. (1969), Evolutionary Operation, Wiley.
Box, G. E. P. and Draper, N. R. (1987), Empirical Model-Building and Response Surfaces, Wiley.
Box, G. E. P., Hunter, W. G. and Hunter, J. S. (1978), Statistics for Experimenters, Wiley.
Chatfield, C. (1988), Problem-Solving: A Statistician's Guide, Chapman and Hall.
Cochran, W. G. and Cox, G. M. (1957), Experimental Designs, 2nd edn, Wiley.
Daniel, C. (1976), Applications of Statistics to Industrial Experimentation, Wiley.
Davies, O. L. (ed.) (1956), The Design and Analysis of Industrial Experiments, 2nd edn, Oliver and Boyd.
Lowe, C. W. (1974), 'Evolutionary operation in action', Applied Statistics, vol. 23, pp. 218-26.
Wetherill, G. B. (1981), Intermediate Statistical Methods, Chapman and Hall.

Chapter 12
Quality control

This chapter is concerned with some of the problems involved in


controlling the quality of a manufactured product. The first part of the
chapter (sections 12.1-7) deals with acceptance sampling, which is
concerned with monitoring the quality of manufactured items supplied
by the manufacturer to the consumer in batches. The problem is to
decide whether the batch should be accepted or rejected on the basis
of a sample randomly drawn from the batch.
The second part of the chapter (sections 12.8-11) is concerned with
process control. The problem is to detect changes in the performance
of the manufacturing process and to take appropriate action when
necessary to control the process.
The material in this chapter (except for section 12.11) constitutes the
traditional type of statistical quality control. Recently the success of
Japanese industry has led to an important change in emphasis towards
'Total Quality Control' (e.g. see Ishikawa, 1985; Neave, 1987) and
‘Quality Improvement’ (e.g. Box, 1989), where management and work­
ers co-operate in a harmonious environment. In quality circles all levels
of employees get together to discuss improvements. The emphasis is on
good product design, and on preventing faults rather than just monitoring
them. Even so, some knowledge of traditional methods is still desirable.

12.1 Acceptance sampling


Any manufacturing process will inevitably produce some defective
items. The manufactured items will often be supplied by the manu­
facturer to the consumer in batches or lots, which may be examined by
the manufacturer before shipment or by the consumer before accep­
tance. The inspection often consists of drawing a sample from each
batch and then deciding whether to accept or reject the batch on the
evidence provided by the sample. A variety of sampling schemes exist
and we will describe the more important of these.




A simple type of sampling scheme is one in which a single sample is taken and the batch is accepted if there are not more than a certain
number of defective items. For example, we could take a sample size
100 from each batch and reject the batch if there is more than one
defective item. Otherwise the batch is accepted.
Acceptance sampling is used when the cost of inspecting an item is
such that it is uneconomic to look at every item in a batch. For example,
it must be used in the case where the manufactured item is destroyed by
the inspection technique. In contrast, in precision engineering it is
more common to inspect every item, in which case there are few
statistical problems and most of the following remarks do not apply.
When a batch is rejected by a sampling scheme, it may be returned
to the manufacturer, it may be purchased at a lower price or it may even
be destroyed. Alternatively rejected batches may be subjected to
100 per cent inspection so that all defective items in the batch are
replaced by good items. A sampling plan of this type, called a rectifying
scheme, is considered in section 12.4.
Acceptance sampling plans can be divided into two classes. If
the items in a sample are classed simply as ‘good’ or ‘defective’, then
the sampling scheme is said to be sampling by attributes. This qualitative
approach contrasts with sampling by variables, in which a quantitative
measurement is involved. In other words an attribute scheme does not
say how good or how defective an item is. Sometimes this is inevitable.
For example, a light bulb will either work or it will not work. There is
no in-between state and we must use sampling by attributes.
We shall concentrate our attention on attribute sampling schemes.
Fortunately many of the general principles involved also apply to
sampling by variables.

12.2 Operating characteristic curve


The performance of a particular sampling plan may be described by
the operating characteristic curve or O-C curve. Let p denote the
proportion of defectives in a batch. If p is very small, we would like
the batch to be accepted ; on the other hand if p is large, we would like



the batch to be rejected. The O-C curve is a graph of the probability
of accepting a batch plotted against p.
Let L(p) = probability of accepting a batch. When p = 0, there are no defectives in the batch and so the batch is certain to be accepted. Thus L(0) = 1. However when p = 1, all the items are defective and so the batch is certain to be rejected. Thus L(1) = 0.

The problem is to design a sampling plan so that batches of 'good' quality are likely to be accepted whereas batches of 'bad' quality are likely to be rejected. The producer and consumer should get together
likely to be rejected. The producer and consumer should get together
and decide on a sampling plan which is fair to both. Firstly the con­
sumer should specify the quality level which he would like the producer
to achieve. The proportion of defectives in a batch which is acceptable
to the consumer is called the acceptable quality level (abbreviated
A.Q.L.) and will be denoted by p^. Ideally we would like to be certain
of accepting such a batch but this is not possible without 100 per cent
inspection. The probability that a ‘good’ batch will be rejected because
of a pessimistic looking sample is called the producer's risk and is
denoted by a. The sampling scheme is often chosen so that a has some
agreed value such as 5 per cent.
The consumer must also decide on a quality level which is definitely
unacceptable. The proportion of defective items in a batch which is
considered ‘bad’ is called the unacceptable quality level and will be
denoted by p_2. The corresponding percentage, namely 100p_2, is often
called the lot tolerance percentage defective (abbreviated L.T.P.D.).
Ideally we would like to be certain of rejecting a batch of this quality,
but this is also not possible without 100 per cent inspection. The
probability that a bad batch will be accepted because of an optimistic



looking sample is called the consumer's risk and is denoted by β. The sampling scheme is often chosen so that β has some agreed value such
as 10 per cent.
The producer’s and consumer’s risk are illustrated in Figure 66.
We have said that values of p above p_2 are definitely unacceptable. Nevertheless the manufacturer should aim to do better than this and his quality control programme should attempt to keep p below p_1. If the fraction defective increases above p_1 towards p_2, the batch becomes
increasingly likely to be rejected by the sampling scheme and this is
financially harmful to the producer.
The observant reader will note that the graph of 1 − L(p) is none other than the power curve when testing the hypothesis

H_0: p = p_1

against the alternative hypothesis

H_1: p > p_1.

If the batch is rejected by the sampling scheme this is equivalent to rejecting the above null hypothesis. If the batch is rejected when H_0 is actually true, we have an error of type I and the probability of such an error has been specified as α. Conversely if the batch is accepted when it should be rejected, we have an error of type II. When p = p_2 the probability of a type II error has been specified as β.

Example 1
What is the O-C curve for a sampling scheme such as that quoted in
section 12.1 in which a single sample is taken and the batch accepted if
there is not more than one defective item?
Let p denote the proportion of defective items in a batch. Let us
also assume that the sample size is small compared with the batch size.
Then the number of defective items in a sample will be a random
variable which follows the binomial distribution with parameters
n = 100 and p. (Strictly speaking it will follow a hypergeometric
distribution, see exercise 12, Chapter 4.)
Thus P(0 defectives) = (1 − p)^100,
     P(1 defective)  = 100p(1 − p)^99.

Thus the probability of accepting a batch is given by

L(p) = (1 − p)^100 + 100p(1 − p)^99.



This was calculated for various values of p and the O -C curve is shown
in Figure 67.

Note that it would be easier to use the Poisson approximation to the binomial distribution with μ = np = 100p. Then the probability of accepting a batch is given approximately by

e^(−100p) + 100p e^(−100p).
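The O-C curve of Example 1 is easily computed by machine. The following minimal sketch, assuming the scipy library is available, evaluates L(p) both exactly and with the Poisson approximation:

from scipy.stats import binom, poisson

# O-C curve for the plan of Example 1: sample n = 100 items and
# accept the batch if at most c = 1 of them are defective
n, c = 100, 1

for p in [0.005, 0.01, 0.02, 0.05, 0.07]:
    exact = binom.cdf(c, n, p)        # (1-p)^100 + 100*p*(1-p)^99
    approx = poisson.cdf(c, n * p)    # Poisson approximation with mean np
    print(f'p = {p:5.3f}   L(p) = {exact:.3f}   Poisson approximation = {approx:.3f}')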

The O-C curve can be used to explain what is meant by an ideal sampling scheme. This scheme would be such that any batches with less than a proportion p_3 of defectives would be accepted, whereas any batches with more than a proportion p_3 of defectives would be rejected; the consumer would presumably specify p_3 somewhere between p_1



and p_2. The O-C curve would then be z-shaped as in Figure 68.

Figure 68 An ideal O-C curve

But this O-C curve can only be realized with 100 per cent inspection. However the larger the sample size the closer the O-C curve will approach the ideal z-shaped curve. Generally speaking, if there is only a small difference between the specified values of p_1 and p_2, then large samples will have to be taken to get an O-C curve which is sufficiently z-shaped.
It is usually a fairly straightforward task to calculate the O-C curve for a particular sampling scheme, such as that in Example 1. However in practice we may want to design a sampling scheme for given values of p_1, α, p_2 and β. For example, the consumer may want the manufacturer to aim at producing material which is 2 per cent defective or better. If a producer's risk of 5 per cent is specified, we have p_1 = 0.02 and α = 0.05. In addition the consumer may decide that batches containing more than 7 per cent defective items are definitely unacceptable. If a consumer's risk of 10 per cent is specified, we have p_2 = 0.07 and β = 0.10. Thus two points on the O-C curve have been
selected and the problem is to find a sampling scheme whose O-C
curve goes through these two points.
We will now describe several types of sampling scheme and show how these are constructed given the specified values of p_1, p_2, α and β.

12.3 Types of sampling schemes


12.3.1 Single sampling
This is the simplest type of sampling plan. It consists of taking a sample
size n from each batch and accepting the batch as satisfactory provided
the number of defective items in the sample does not exceed a given
number c. The quantity c is called the acceptance number. Example 1
deals with a single sampling scheme.
It may be impossible to find a plan which meets our requirements
exactly, because the sample size and acceptance number must be
integers. However a reasonable approximation may be found by
consulting the appropriate tables (see Duncan, 1974, Chapter 7;
Wetherill, 1977, Section 2.4).
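As an illustration of how such plans can be constructed, the following sketch, assuming scipy is available, searches for the smallest single sampling plan meeting the specification p_1 = 0.02, α = 0.05, p_2 = 0.07, β = 0.10 quoted earlier:

from scipy.stats import binom

# Specification: p1 = 0.02 with producer's risk alpha = 0.05,
# p2 = 0.07 with consumer's risk beta = 0.10
p1, alpha, p2, beta = 0.02, 0.05, 0.07, 0.10

def find_plan(max_n=1000):
    for n in range(1, max_n + 1):
        for c in range(n + 1):
            if binom.cdf(c, n, p2) > beta:
                break          # a larger c only raises the consumer's risk
            if binom.cdf(c, n, p1) >= 1 - alpha:
                return n, c    # both risks are satisfied
    return None

print(find_plan())   # smallest sample size n with a suitable acceptance number c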

12.3.2 Double sampling


A simple extension of the single sampling scheme is obtained by a two-stage sampling procedure. A sample size n_1 is drawn, but the batch need not be accepted or rejected as a result of this first sample if it leaves some doubt as to the quality of the batch. Instead a second sample size n_2 can be drawn and the results of the two samples combined before a decision is made. The scheme depends upon three constants c_1, c_2 and c_3: the batch is accepted if the first sample contains at most c_1 defectives and rejected if it contains more than c_2; otherwise the second sample is drawn, and the batch is then rejected if the combined number of defectives exceeds c_3.

The plan is usually simplified by taking c_2 = c_3. As in the case of the single sampling plan, we would like to find values for n_1, n_2, c_1 and c_2 given the specified values for p_1, p_2, α and β. In fact we need a further restriction on the parameters to derive a unique double sampling plan. This is usually achieved by fixing the ratio n_2 : n_1. For example, we can take n_2 = 2n_1. The reader is referred to the tables in Duncan (1974, Chapter 8).
A feature of double sampling schemes is that a very good lot will be
accepted and a very bad lot rejected as a result of the first sample.
This first sample is smaller than the number of items inspected in the
equivalent single sampling, which has the same O -C curve. Thus
double sampling enables sampling costs to be reduced as well as
providing the psychological advantage of giving a batch a second
chance.

12.3.3 Sequential sampling


The principle of double sampling can be extended to sequential
sampling. Using this technique the sample is built up item by item, and
after each observation a decision is taken on whether the batch should
be accepted, rejected or whether another observation should be taken.
As before we must specify the four quantities p_1, α, p_2 and β. Then three constants h_1, h_2 and s, which characterize the chart on which
the results are plotted, can be calculated. These constants are given by



h_1 = log[(1 − α)/β] / log[p_2(1 − p_1)/(p_1(1 − p_2))],

h_2 = log[(1 − β)/α] / log[p_2(1 − p_1)/(p_1(1 − p_2))],

s = log[(1 − p_1)/(1 − p_2)] / log[p_2(1 − p_1)/(p_1(1 − p_2))].

After each item is inspected, the cumulative sample size n and the
cumulative number of defective items d are known. If d > sn + h_2, then the batch is rejected, and if d < sn − h_1, the batch is accepted. Otherwise
another item is inspected. Continue sampling until the batch is accepted
or rejected.
Sequential sampling was developed in the 1940’s by A. Wald and G.
A. Barnard; for further information see for example Duncan (1974,
Chapter 8). Note that a sequential sampling scheme is more efficient
than the equivalent double sampling scheme which has the same
specifications.
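The three constants are easily evaluated by machine. A minimal Python sketch, using the specification p_1 = 0.02, α = 0.05, p_2 = 0.07, β = 0.10 quoted earlier in this chapter:

import math

# Constants of the sequential plan for given p1, alpha, p2, beta
p1, alpha, p2, beta = 0.02, 0.05, 0.07, 0.10

g = math.log(p2 * (1 - p1) / (p1 * (1 - p2)))    # common denominator
h1 = math.log((1 - alpha) / beta) / g
h2 = math.log((1 - beta) / alpha) / g
s = math.log((1 - p1) / (1 - p2)) / g

print(f'h1 = {h1:.3f}, h2 = {h2:.3f}, s = {s:.4f}')
# After n items with d defectives: accept if d < s*n - h1,
# reject if d > s*n + h2, otherwise inspect another item.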

Figure 69 Sequential sampling chart



12.4 Rectifying schemes

The first major contribution to the theory of sampling inspection was made by H. F. Dodge and H. G. Romig, who considered a somewhat different situation. Any batch which is rejected by the sampling scheme
is subjected to 100 per cent inspection and rectification. Thus all
defective items are replaced with good items.

The consumer will receive two types of batches. The first type will
contain some defective items, but have been accepted by the sampling
inspection. The second type will contain no defective items as they
have been subjected to 100 per cent inspection and rectification. From
these two types of batches we can calculate the average outgoing
quality (abbreviated A.O.Q.), which is the average proportion of
defectives in batches received by the consumer.
As before, let p denote the proportion of defective items in a batch
before the sampling inspection. When p = 0 it is clear that A.O.Q. = 0.
When p = 1, all the batches will be subjected to 100 per cent inspection
and rectification so that again we have A.O.Q. = 0. In between these
two values, the A.O.Q. will have a maximum which is called the average
outgoing quality limit (abbreviated A.O.Q.L.).
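Under the usual approximation that an accepted batch passes on a proportion p of defectives while a rectified batch passes on none, the A.O.Q. is roughly pL(p). The following sketch, an illustration assuming scipy and the single sampling plan of Example 1, locates the A.O.Q.L. numerically:

import numpy as np
from scipy.stats import binom

n, c = 100, 1                      # the single sampling plan of Example 1
p = np.linspace(0.0, 0.10, 1001)   # incoming proportion defective
L = binom.cdf(c, n, p)             # probability of accepting a batch
aoq = p * L                        # A.O.Q., approximately p*L(p)

i = np.argmax(aoq)
print(f'A.O.Q.L. is about {aoq[i]:.4f}, attained near p = {p[i]:.3f}')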

Figure 70



The advantage of a rectifying scheme is that the consumer knows
that the A.O.Q. must be less than the A.O.Q.L. whatever the propor­
tion of defectives supplied by the producer.
Dodge and Romig (1959) give tables for deriving sampling schemes
in two different ways. In both types of scheme it is necessary to
specify the average quality of material produced by the process.
The average proportion of defectives is called the process average
percentage defective. If this is close to the unacceptable quality level,
then a large amount of sampling will be required; on the other hand
if it is relatively low, then less sampling will be required. It can be
estimated from past data by dividing the total number of defectives by
the total number of items inspected. In the first type of sampling
scheme it is necessary to specify the process average and the L.T.P.D.
with a consumer’s risk of 10 per cent, and in the second type of sampling
scheme it is necessary to specify the process average and the A.O.Q.L.
The tables mentioned above give the sampling schemes which require
the minimum over-all amount of inspection to satisfy these specifica­
tions.

12.5 The military standard plan


These tables (U.S. Department of Defense, 1963) were originally developed during the last war but are now available for civilian use and can be obtained from the U.S. Government Printing Office. They combine many of the features of earlier plans with some new ones.
Firstly defects are grouped into three different classes; critical, major
and minor. This is because some defects are much more serious than
others and should be weighted accordingly. For example, a paint
scratch on a car is not comparable to faulty steering.
Secondly three levels of inspection are available. They are
(i) normal,
(ii) reduced, smaller samples than usual are taken,
(iii) tightened, larger samples than usual are taken.
The choice of the level of inspection depends on how close the estimated
process average is to the A.Q.L. Thus the scheme adopts the sensible
approach of taking into account the quality of recent batches. If the
production line has hit a bad patch then it is sensible to take larger
samples than usual. On the other hand if the process has been producing
good batches for a long period then reduced sampling can be employed.
The sampling scheme is chosen in such a way that the producer’s risk
is much smaller for large lots than for small lots. The reason for this



is that it is much more serious to reject a large batch when it is ‘good’
than it is to reject a small batch. A description of this plan is given in
Hald (1981, Chapter 4) together with tables.

12.6 Sampling by variables


If a quality characteristic is a continuous variable, it may be possible
to use a variables sampling scheme rather than an attribute scheme.
A quantitative measurement provides more information than a simple
statement that an item is or is not defective, and so a variables sampling
scheme can give the same protection, with a smaller sample size, as
an attribute sampling scheme. For example, suppose that the upper
specification limit for some quality characteristic is given by U. Then
an attribute scheme would simply note the number of items in a sample
whose value exceeded U. A variables scheme would involve finding the
exact measurement for each item in the sample and calculating the
sample mean x̄. Thus if the standard deviation, σ, of the measurements is known, and it is reasonable to assume that the measurements are normally distributed, then it is easy to estimate the proportion of measurements which will exceed U. The batch will be rejected if (U − x̄)/σ falls below a certain constant. If σ is unknown it must be estimated either from the sample standard deviation s or from the sample range R.
(1974, Chapter 11).
One disadvantage of a variables plan is that if several features of a
product are of interest then it will be necessary to make several measure­
ments, whereas in an attribute scheme it would still be relatively simple
to decide whether the item was defective.
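The variables calculation described above is easily sketched in Python. The numbers here are made up, and the acceptance constant k is an assumed value, not one taken from the text:

from scipy.stats import norm

# Illustrative variables scheme: upper specification limit U, known sigma,
# and an assumed acceptance constant k
U, sigma, k = 1.120, 0.003, 2.33

sample = [1.112, 1.114, 1.111, 1.113, 1.115]
xbar = sum(sample) / len(sample)

z = (U - xbar) / sigma
p_exceed = 1 - norm.cdf(z)      # estimated proportion of items above U
print(f'z = {z:.2f}, estimated fraction defective = {p_exceed:.4f},',
      'accept' if z >= k else 'reject')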

12.7 Practical problems


We have so far said little about the practical problems involved in
setting up an acceptance sampling scheme. For example, we have
seen that a sequential sampling scheme is more efficient than a double
sampling scheme which in turn is more efficient than a single sampling
scheme. However if we consider ease of operation the order is reversed.
For this reason a single sampling scheme may give more reliable
results if operated by untrained personnel. More information about
setting up acceptance sampling schemes can be obtained from the
references.



12.8 Process control

The second major branch of statistical quality control is that of process control. This is concerned with the important problem of keeping a
manufacturing process at a specified stable level, and in consequence
has received widespread application in industry. It has three main
ingredients:
(1) detecting changes in process performance
(2) finding the cause of these changes
(3) making appropriate adjustments to the process.
A process which has been running at an acceptable level may suddenly move off target. The job of the sampling inspector is to detect this
change as quickly as possible so that appropriate corrective action can
be taken.

Figure 71 A control chart

The most commonly used tool for detecting changes is the control
chart or Shewhart chart. Choose a variable which is characteristic of
the quality of the process and plot this against time. For example, a
chemical process may have a certain optimum temperature. In this
case the actual temperature can be measured at regular intervals and
compared with the target value by plotting it on a control chart. Of
course successive measurements will vary to a greater or lesser degree
even when the process is on target. Some of the causes of variation are
outside the control of the manufacturer and these combine to form
what is called the residual variation. However some causes of variation
can be identified and removed and these are called assignable causes.
A process is said to be under control if all the assignable causes of
variation have been removed.



The use of control charts to detect changes in process performance
was pioneered by W. A. Shewhart amongst others in the 1930’s. The
biggest impact of the control chart is visual. In other words the
experienced inspector can get a lot of information simply by looking at
the chart. In addition a variety of statistical techniques have been
proposed to help him decide when a change has occurred.

12.8.1 Action lines


One simple device is to draw action lines on the control chart at T ± 3σ, where T is the target value and σ is the residual standard deviation, that is, the standard deviation of successive observations when the process is under control. The action lines are sometimes called control limits. They are chosen so that if the process is on target there is only a small probability that an observation will be outside them. If the observations are approximately normally distributed, this probability should be about 0.003. However these action lines are also used when the observations are not normally distributed, in which case this probability will be somewhat different. The upper action line or control limit is often abbreviated U.C.L. Similarly the lower limit is often abbreviated L.C.L.
If an observation does fall outside the control limits, it is an indica­
tion that the process has moved off target and some form of action is
required. This action may consist of turning a knob, resetting a machine,
replacing a faulty piece of equipment or simply taking several more
measurements to check the first measurement.
Action lines are a useful guide if used intelligently. However, because the observations are treated independently, the method is insensitive to small changes in the mean. For example, if the process mean moves a distance σ off target, an average of forty-four results will be plotted before one falls outside the control limits.
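The forty-four figure is an average run length calculation: when the mean shifts to T + σ, each point independently falls outside T ± 3σ with probability P(Z > 2) + P(Z < −4), and the expected number of results until the first one falls outside is the reciprocal of this probability. A quick check, assuming scipy:

from scipy.stats import norm

# Probability that a single point falls outside T +/- 3*sigma
# when the process mean has shifted to T + sigma
p_signal = (1 - norm.cdf(2)) + norm.cdf(-4)
print(1 / p_signal)   # average run length, roughly 44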

12.8.2 Warning lines


The above method can be modified by inserting warning lines on the control chart at T ± 2σ. Action is taken if two consecutive results lie outside the warning lines.



12.8.3 Rule of seven

This is a useful rule of thumb which is often used by quality control engineers. A run of seven observations on the same side of the target value is taken to indicate a change in the process mean.

12.9 Control charts for samples


So far we have considered the situation where just one observation
is taken at regular intervals; however control charts can also be used
when small samples are taken at regular intervals. If a manufacturing
process makes a large number of items, one way of controlling the
quality of the goods as they are made is to take relatively small samples
in spot checks. The problems involved are distinct from those involved
in acceptance sampling where much larger samples are involved. Of
course a small sample is not so reliable as a large sample, but neverthe­
less it can provide valuable information, especially if carried out by
an experienced inspector who goes round the shop floor in a fairly
systematic way and who can make adjustments on the spot where
necessary.
In a chemical process it may not be possible to take several observa­
tions at the same time and then it is often convenient to divide the
data into what are called rational subgroups. Subgroups of four or
five are commonly used, and within each subgroup the variation is
assumed to be random. These subgroups can now be treated as samples.

Variables. Suppose that a sample of n measurements is taken at regular intervals on a continuous variable. Let T denote the target value and σ denote the residual standard deviation. We want to control not only the average quality of the process but also the variability within successive samples. The first of these objectives can be achieved by plotting successive sample means on a control chart called an x̄-chart. The standard error of the sample mean is given by σ/√n, so that if σ is known we can draw action lines at T ± 3σ/√n. The second objective is achieved by plotting the sample range, R, on a control chart called an R-chart. The sample range is much easier to calculate than the sample standard deviation, and in section 2.5 we noted that the range is useful for comparing the variability in samples of equal size. Thus the range is usually preferred to the standard deviation in this situation.



If the observations are normally distributed, it can be shown that the sampling distribution of the range has mean k_1σ and upper percentage points R_0.025 and R_0.001 given by k_2σ and k_3σ respectively, where k_1, k_2 and k_3 depend on the sample size. The R-chart is constructed by placing a warning line at k_2σ and an action line at k_3σ. The target value is at k_1σ. Values of k_1, k_2 and k_3 are given in Table 14 of Pearson (1960) and are reprinted by permission in Table 24.

Table 24

n     2     3     4     5     6     7     8     9     10    11    12

k1   1.13  1.69  2.06  2.33  2.53  2.70  2.85  2.97  3.08  3.17  3.26
k2   3.17  3.68  3.98  4.20  4.36  4.49  4.61  4.70  4.79  4.86  4.92
k3   4.65  5.06  5.31  5.48  5.62  5.73  5.82  5.90  5.97  6.04  6.09

So far we have assumed that σ is known; however it is usually necessary to estimate it from past data. Rather than calculate the average sample standard deviation, it will usually be simpler to calculate the average sample range, R̄. An estimate of σ is given by R̄/k_1, and the R-chart consists of a target value at R̄, an upper warning line at k_2R̄/k_1 and an upper action line at k_3R̄/k_1.

Attributes. Suppose that a sample of n items is inspected at regular intervals. Then the number of defective items in the sample can be plotted on a control chart. The action and warning lines are calculated
as follows.

Figure 72 A typical control chart for sampling by attributes

Suppose that on past evidence the average proportion of
defective items is p and that this is a satisfactory quality level. If this
quality level does not change, the number of defective items in succes­
sive samples will be a binomial variable with parameters n and p. The
probability of getting r defective items is given by (n choose r) p^r (1 − p)^(n−r) or, using the Poisson approximation, by

e^(−np)(np)^r / r!

The action limit, A, is chosen so that the probability of observing A or more defectives, when the process is at its satisfactory level, is about 1/1000. Similarly the warning limit, W, is chosen so that the probability of observing W or more defectives is a somewhat larger value, such as 1/40.
The values of A and W can easily be found by evaluating the binomial
probabilities, or, if the Poisson approximation is appropriate, they can
be found by evaluating the Poisson probabilities for which tables are
readily available. So long as the number of defectives in a sample is less
than W, it can be assumed that the process is in control. However one
observation greater than or equal to A, or two consecutive results
greater than or equal to W would mean that some action must be
taken.
Control charts based on the number of defective items in a sample
are sometimes called p-charts. However, with a complex product,
it is possible that several defects may occur in the same item. Then the
total number of defects in a sample can be calculated and plotted on a
control chart which is sometimes called a c-chart (see, for example,
Duncan, 1974, Chapter 20).
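The search for A and W is easily mechanized. The following sketch assumes scipy, together with illustrative values of n and p and the tail probabilities 1/1000 and 1/40 mentioned above:

from scipy.stats import binom

# Illustrative values: samples of n = 50 items, satisfactory level p = 0.02,
# with assumed tail probabilities of 1/1000 (action) and 1/40 (warning)
n, p = 50, 0.02

def upper_limit(tail):
    k = 0
    while binom.sf(k - 1, n, p) > tail:   # P(r >= k) = sf(k - 1)
        k += 1
    return k

print('action limit A =', upper_limit(0.001))
print('warning limit W =', upper_limit(0.025))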

12.9.1 Tolerance limits and specification limits


Control charts can also be used in a different type of situation where
there is a zone of acceptable quality rather than a single target value.
For example, the upper and lower limits for the design specifications
for the width of a bolt may be 1.120 inches and 1.100 inches. Here a
possible target value would be half-way between the specification
limits but it would not matter if the over-all average measurement was
somewhat different from this provided that most of the measurements
still lay within the specification limits.



Suppose that a series of samples, size n, are taken and that a quantitative measurement is made on each item in the sample. An estimate of the population mean, μ, can be obtained by finding the over-all average of the sample means, which will be denoted by x̄. An estimate of the population standard deviation, σ, can be obtained from the average sample range, R̄, by the formula σ̂ = R̄/k_1 (see Table 24). Tolerance limits for the actual measurements are usually given by μ ± 3σ, so that, if the observations are normally distributed, only 0.27 per cent of the observations will fall outside them. With the process under control, the tolerance limits can be estimated by x̄ ± 3R̄/k_1. If these values do not fall inside the specification limits, then action should be taken to adjust the population mean or to reduce the population standard deviation. Note particularly that if the difference between the upper and lower specification limits is less than about 6σ, then the process is bound to produce some defective items even when it is under control. In such a case it may be necessary to revise the specification limits if it is impossible to reduce σ.
If the tolerance limits do fall inside the specification limits, then the
process is operating at an acceptable level. Control charts for the
sample mean and range can then be constructed as before except that
the target value for the sample mean is replaced by x̄. Here we assume
that the tolerance limits are not substantially smaller than the specifica­
tion limits. If they are, it may be that ‘too good’ a product is being
made and that production costs could be reduced by lowering standards
somewhat. Alternatively it may be sensible to reduce the specification
limits. If this is not done then it is possible for results to fall outside the
control limits when the product is still well within the design specifica­
tions. This would go against the general rule that control charts should
always be constructed so that it is practicable to investigate every
point which falls outside the control limits.

Example 2

The upper and lower specification limits for the width of a bolt are 1.120 inches and 1.100 inches. Fifteen samples of five measurements give the following readings for x̄ and R. (Range values are multiplied by 10³.) Set up control charts and see if the process is under control. (Note that in practice it is advisable to have at least twenty-five samples in order to set up control charts.)



Sample    x̄      R    Sample    x̄      R    Sample    x̄      R
  1     1.115   18       6    1.112    5      11    1.113    6
  2     1.116   17       7    1.114    5      12    1.114    4
  3     1.114    8       8    1.112    7      13    1.111    3
  4     1.112    6       9    1.113    3      14    1.113    5
  5     1.114    7      10    1.111    4      15    1.111    7

This gives x̄ = 1.1130 and R̄ = 7.0. Action lines for the x̄-chart are placed at

x̄ ± 3R̄/(k_1√5) = 1.1130 ± 0.0040.

All the observed sample means are inside these control limits. The upper action line for the R-chart is placed at

k_3R̄/k_1 = 16.4.

The values of R from samples 1 and 2 are greater than this and so the
process was then out of control.
The values of x and R were recomputed from samples 3-15. These
were x = 1•1126 and R = 5-4. The control limits for the x and K-charts

upper action line


^ 1•116
-5 1•115
•E 1 •114
•113
S 1•112
E 1 •111
®1•110 lower action line
•109
CD 10 11 12 13 14 15
sample number

15r■ upper action line

Q. 10
E • •

5 • •

0
3 4 5 6 7 8 9 10 11 12 13 14 15
sample number
3 (a) x-chart (b) /?-chart

305 Control charts for samples


were also recomputed. The action lines for the x-chart are placed at
M126±0-0031.
The upper action line for the K-chart is placed at 12-7. The values of
X and R from samples 3 to 15 are plotted in Figure 73. Since none of
the points fall outside the control limits, we can assume that the
process is now under control.
An estimate of the tolerance limits is given by x±3R/ k^ = 1T126 +
0 0070. These are inside the specification limits and so the production
material will come up to the required standards provided that the
process remains in control at the above levels.

12.10 Cusum charts


We have seen that control charts are a valuable aid in detecting changes
in process performance. However the procedure is rather crude in
that small changes in the process mean are often obscured by the
residual variation. This is because each observation is independently
compared with the control limits whereas it would appear sensible to
combine successive results in some way (as in the rule of seven). It
turns out that changes in the process mean are often easier to detect
with the aid of cusum charts, which are a valuable device in statistical
quality control (see for example Wetherill, 1977, Chapter 4).
Let xₚ denote the process variable or quality characteristic at time p, and T the target value. Then at time t the cumulative sum of deviations about T is given by

Sₜ = (x₁ − T) + (x₂ − T) + … + (xₜ − T).
If this cumulative sum is plotted against time then a cumulative sum
chart results. The expression ‘cumulative sum’ is customarily shortened
to ‘cusum’. We will see that the gradient or slope of this graph enables
us to estimate the process mean.
An example of a cusum chart is given in Figure 74 where the same data is plotted on both a control chart and a cusum chart. The process mean increased somewhat part-way through the series but, as no point lies outside the action lines, the control chart gives no definite evidence that the mean has changed. In contrast there is a clear change on the cusum chart, since the cusum increases steadily from that point on. [Note that the vertical scale on the cusum chart is half that on the control chart.]



Figure 74 A control chart compared with a cusum chart

The calculation of the cusums is a very simple matter. We have

S₁ = x₁ − T,
S₂ = (x₁ − T) + (x₂ − T)
   = S₁ + (x₂ − T).

In general

Sₜ = Sₜ₋₁ + (xₜ − T).
The next problem is to show that the local process mean depends upon the slope of the cusum graph. We will show that the slope of the line joining Sₘ to Sₙ measures the average difference from T of xₘ₊₁ to xₙ.
The local process mean, x̄_L, is given by

x̄_L = [xₘ₊₁ + … + xₙ]/(n − m)
    = T + [(xₘ₊₁ − T) + … + (xₙ − T)]/(n − m)
    = T + (Sₙ − Sₘ)/(n − m)
    = T + (change in cusum)/(number of observations)
    = T + k × slope,

where k is a constant which depends upon the scales chosen for the cusum chart. If the slope is positive, the local mean is above target.
Conversely if the slope is negative, the local mean is below target.
The visual impact of the cusum chart depends in part on choosing 'good' scales for the axes of the cusum chart. It is advisable to keep all slopes below 60°, at the same time making sure that a change in mean really does give a clearly visible change of slope. A suitable compromise is to choose the scales so that a series of observations, with residual standard deviation σ, whose mean moves 2σ off target will give a cusum graph which makes 45° with the horizontal.
Occasionally there will be no obvious target value and then the
cusums must be calculated with respect to some carefully chosen
reference value. This reference value must be close to the average of
the observations as it is much easier to spot changes in the mean
when the slope of the cusum graph changes sign.
It may be possible to detect changes from one positive slope to a
different positive slope but this is much more difficult. In any case if
the reference value is less than most of the observations, then the
cusum will increase rapidly and will quickly run off the graph paper.

12.10.1 Detection of changes


It is often possible to detect when a change in the process has occurred
by visual inspection of the cusum chart. However we will describe two objective methods which have been proposed: the first by Barnard and the second by Ewan and Kemp. These two methods have been shown to be equivalent.

V-mask technique. A V-shaped mask is placed on the cusum chart at a distance d ahead of the latest observation. The angle between the limbs of the V-mask and the horizontal is denoted by θ. If all the previous observations lie inside the V-mask, as in Figure 75, then the process is assumed to be in control. Conversely if one or more observations lie outside the V-mask then the process mean is assumed to have changed.

Figure 75 A V-mask

The properties of the V-mask depend on the choice of d and θ. If d and θ are 'large' then the V-mask will rarely indicate a change in process mean. Conversely if d and θ are 'small' then there will be lots of interruptions. We want to choose d and θ so as to detect any real changes quickly but in such a way that an interruption is unlikely if no real change has occurred.
The values of d and θ can be chosen by trying different V-masks on past data and selecting one which gives reliable results. Alternatively they can be obtained by considering average run lengths. For a particular value of the process mean the average run length is the average number of observations which are taken before the V-mask indicates that a change has occurred. The average run length should be high when the process is on target (to avoid false alarms) but low when the process is off target. These run lengths have been calculated by Monte Carlo methods for different values of d and θ and are tabulated in I.C.I. (1964). The calculations assume that the observations follow a normal distribution with variance σ² and that the scales have been standardized, as mentioned before, so that the horizontal distance between successive points is equal to 2σ on the vertical axis. Results are given for displacements of the process mean varying between zero and 3σ. For example, if d = 2 and tan θ = 0·5, the average run length is 140 when the process is on target but only three when the process is 2σ off target. The manufacturer should choose two such run lengths and hence find the appropriate V-mask.



Decision interval. This scheme depends on two quantities called the
reference value and the decision interval. The reference value, R, is
chosen midway between the target value and an unsatisfactory level.
Every time a reading exceeds the reference value a cusum is started in
which the calculations are made with respect to R. If the cusum returns
to zero the process is under control. But if the process mean has
changed then the cusum will increase and will eventually exceed the
decision interval, at which point the process is assumed to be out of
control. Note that no plotting is necessary for this scheme.

Corrective action. When the V-mask or decision interval scheme indicates that a change has occurred in the process performance, then some form of corrective action must be taken. The decision interval approach can be adapted to give an estimate of the current process mean whenever an 'action' signal occurs. The change in mean will correspond to a known change in the setting of one of the process input variables.

12.10.2 Post-mortems
Cusum charts can also be used to analyse data retrospectively. Instead
of deducting some target value, we deduct the mean of the observations
over the period in question. This means that the cusum starts and
finishes at zero. The main point of interest is to determine if and when
changes in the local mean occurred. There are several ways of doing this.
The method we describe is suitable for computers.
First any freak values are removed from the data and replaced by the average of the two adjacent values. The residual variance may be estimated by

σ̂² = Σ (xⱼ₊₁ − xⱼ)² / [2(n − 1)],

where the sum runs over j = 1, …, n − 1, with (n − 1) d.f. Then Student's t-test is used to locate the turning points; that is the points at which the local mean changes.
This is done in the following way. Move along the series by connecting each observation with the last turning point (or initially with the first observation), and find the maximum absolute distance between this chord and the intervening cusum readings. The position of this largest difference is used to divide the intervening readings into two groups, size n₁ and n₂ respectively. The means x̄₁ and x̄₂ of the two groups are calculated. If the process mean is unchanged throughout the interval then the standard error of the difference between x̄₁ and x̄₂ is

σ̂ √(1/n₁ + 1/n₂).

The required test statistic is given by

t = (difference between the means)/(standard error of the difference).

If the value is significantly large then we reject the null hypothesis that the process mean is unchanged and assume that we have found a turning point. However if the value is not significant, then we move to the next observation on the cusum chart and repeat the process. Turning points can also be found by moving backwards along the series. The two lists of turning points are amalgamated.
This type of repetitive analysis is ideal for a computer to perform.
In addition, by suitably choosing the scales, the cusum chart can be
printed out together with a Manhattan diagram of the local means,
which, as the name implies, is a block diagram as illustrated in Figure 76.

12.10.3 When to use cusum charts


Given a series of observations which are sequentially arranged in time,
it is usually more efficient to use a cusum chart rather than a control

Figure 76 A Manhattan diagram



chart to detect changes in the local mean. However if we are only interested in changes which are 'large' compared with the residual variation, then a control chart may be adequate. In this case the control chart is preferred on grounds of simplicity, as the non-statistician finds it easier to understand than a cusum chart.

12.11 Prediction, system identification and control


Quality control charts, such as Shewhart and cusum charts, are one
type of method for controlling industrial processes. We will now
consider the problem in more general terms. The uncontrolled system
can be represented diagrammatically as follows:

(diagram: a box labelled 'process' with inputs x₁, x₂ and output y)

x₁ represents the input variables which can be controlled;
x₂ represents the input variables which cannot be controlled, though it may be possible to measure them; and
y represents the output variables.

In addition the output variables will be affected by a series of random disturbances, which is called noise by electrical engineers.
The purpose of control is to maximize (or minimize) some performance criterion such as cost, output or efficiency. One type of approach is to feed back the information gained from measuring the output in order to adjust the x₁ variables. Such a system is called a feedback control system. It can be represented diagrammatically as follows:

(diagram: the output of the process is measured and fed back through a controller, which adjusts the x₁ inputs)

This is sometimes called a closed loop system for an obvious reason. The controller may be



(a) A human being. Then we have manual control. Statistical control charts may be used, or the controller may have a formula indicating the required changes in the x₁ input variables for a given deviation from target of the output.
(b) An analogue or digital computer. In this case we have an automatic control procedure.
Another type of control procedure is feed-forward control. Here the information gained from measuring changes in the x₂ input variables (which can be measured but not controlled) is used to predict the likely effect on the output so that compensating adjustments in the x₁ input variables can be made to keep the process on target.
Statisticians in the past have dealt mainly with the problem of
detecting changes in process performance (usually with control charts)
and have paid little attention to the problem of applying continuous
automatic control. In contrast, the engineering literature in particular
has made substantial advances in control theory. Deterministic control
(e.g. Jacobs, 1974) is appropriate when the system under study is
subject to no (or very little) random disturbance. This has proved
useful, for example, in space technology and the study of servome­
chanisms. The theory requires fairly sophisticated mathematics
including the study of differential equations and optimization
methods. But many systems are affected by random disturbances and
then we have what is called stochastic control theory. Linear stochastic
control theory (e.g. Astrom, 1970, Chapter 8) has two main
components: firstly, prediction and filtering; secondly, system
identification. The relationship between prediction and control is
clear, in that if, for example, we predict that the output will be a certain
distance off target, then the input variables can be adjusted by an
appropriate amount to bring the process back on target. In order to
make such an adjustment, we need to know the relationship between
input and output, that is, to know the structure of the system.
In control engineering the measured output may consist of a signal
with added noise which is sufficiently large not to be neglected. The
term filtering is often used to denote procedures for estimating the
present noise-free value of the signal, while the term prediction is used
when future values of the signal are required. The pioneering work of
Norbert Wiener and A. N. Kolmogorov in the 1940’s derived the
optimal predictor for the signal given knowledge of the properties of
the signal and of the noise process. A more practical, parametric,
approach to filtering, called Kalman filtering, has been developed since



about 1960 (see for example Chatfield, 1989). This approach, based on a
class of models called state-space models, provides a recursive method of
estimating the present state of the system, and has the advantage of being
able to cope with non-stationary series. Having estimated the present
state of the system, we then need to find a feedback law to give the
control signal as a function of the estimated state, this function often
being linear.
The major contribution by statisticians to the design of control
schemes has been made by Box and Jenkins (1970). In contrast to the
engineering literature, which often assumes considerable knowledge
of the system and noise structure, the approach of Box and Jenkins is
more empirical in that they show how to compute forecasts for a given
series, how to identify a suitable model for the system from the
observed input and output, and how to devise a suitable control
scheme.
Box and Jenkins begin by considering the problem of finding forecasts for a single time series. Given a series of observations x₁, x₂, …, x_N, let us denote the prediction made at time N of the reading q steps ahead by x̂(N, q). There are theoretical reasons for restricting attention to forecasts consisting of a linear function of past data, namely

x̂(N, q) = w₀x_N + w₁x_{N−1} + …,

where {wᵢ} are a set of weights. For example the procedure called exponential smoothing, which can be used for series showing no obvious trend, is such that

x̂(N, 1) = αx_N + α(1 − α)x_{N−1} + … + α(1 − α)ᵏ x_{N−k} + …,

where α is a constant such that 0 < α < 1. The weights are intuitively appealing as they sum to one and give progressively less weight to observations further in the past. Furthermore the formula can be rewritten in the simple recursive form

x̂(N, 1) = αx_N + (1 − α)x̂(N − 1, 1).
Box and Jenkins (1970) propose a general class of linear models
called autoregressive integrated moving average models (abbreviated
ARIMA models). We do not have space to introduce these here but a
description is given elsewhere by the author (Chatfield, 1989). The
Box-Jenkins approach includes exponential smoothing as a special
case. We also note that the approach can cope with non-stationary
series by differencing the data until any trend has been removed. Box



and Jenkins show how to identify an appropriate model and estimate
its parameters. The method has probably been more widely used on
straightforward forecasting problems, such as forecasting sales figures
and economic indicators, than on control problems which are our main
concern here.
When a prediction of the output has been made, the next problem
is to calculate the necessary adjustments to the input variables which
will keep the process on target. The problem is complicated by the
fact that adjustments may not become effective immediately. If a step
change is made to an input variable, some typical changes in the output
are as shown in Figure 77.

Figure 77 (a) Exponential response (b) Oscillatory response (c) Delayed exponential

There are various ways of describing the dynamics of a process. An important class of systems are those which can be adequately approximated over the range of interest by a linear model. An introduction to linear systems is given by Chatfield (1989, Chapter 9). One general
class of linear models is described by Box and Jenkins (1970, 1979).
Given a series of input and output readings, they show how to identify
an appropriate linear model and also how to estimate the parameters
of the model. There have also been many other contributions to system
identification, particularly in the engineering literature, and these are
reviewed by Harris (1976) and Ljung (1987). When an adequate
model has been found for the given process, it should then be possible
to set up an appropriate control procedure.

Exercises
1. The target value for a particular process is given by T = 50. Samples of size five are taken at regular intervals and the average sample range when the process is under control is given by 5·0. Set up control charts for the process.
2. When a new process is put into operation the true target value may be unknown. If the process is under control, the grand over-all mean, denoted by x̄, can be used as a target value. Set up control charts for samples of size four when x̄ = 10·80 and R̄ = 0·46.
3. In the past the average proportion of defective items produced
by a certain process has been 3 per cent and this is a satisfactory quality
level. If samples of size thirty are taken at regular intervals, set up an
attributes control scheme.
4. The temperature of a chemical process is read every ten minutes. The target value is T and the residual standard deviation is σ. The observations are plotted on a control chart which has action lines at T ± 3σ. If the process mean shifts to T + σ, find the probability that an observation lies outside the action lines. Hence find the average number of observations which will be taken before an observation lies outside the action lines. [Hint: use the geometric distribution which is given by P(r) = θ(1 − θ)^(r−1) for r = 1, 2, … and which has mean 1/θ.]
5. Warning lines are placed at T ± 2σ on the control chart discussed in question 4. Find the probability that two points in succession lie outside the warning lines when the process is on target. Also calculate this probability if the process mean shifts to T + σ.
6. The specifications for a certain quality characteristic are 15·0 ± 6·0 (in coded values). Fifteen samples of four readings gave the following values for x̄ and R.



Sample   x̄      R     Sample   x̄      R     Sample   x̄      R
1        16·1   3·0   6        15·7   1·1   11       15·3   3·8
2        15·2   2·1   7        15·2   2·3   12       17·8   4·2
3        14·2   5·6   8        15·0   3·8   13       15·9   4·8
4        13·9   2·4   9        16·5   0·5   14       14·5   0·6
5        15·4   1·4   10       14·9   2·9   15       15·2   2·2

Use the results from all fifteen samples to set up x̄ and R control charts. Hence decide if the process is in control. If not, remove the doubtful samples, and recompute values for x̄ and R̄, for use in succeeding samples. Estimate the tolerance limits and see if the process will meet the required specifications.

References

An introduction to traditional quality control is given by Wetherill (1977) and more detailed general accounts by Duncan (1974) and Grant and Leavenworth (1980). An introduction to sampling inspection is given by Guenther (1977) and a more advanced, comprehensive account by Hald (1981).
The paper by Bissell and Pridmore (1981) is of particular interest to British readers as it reviews the effect of the 1979 Weights-and-Measures Act which provides for the change-over in the United Kingdom from a predominantly 'minimum' system to an 'average' system as practised elsewhere in the Common Market. For example, a box of matches will now state the average number of matches included rather than the minimum number, and this has many implications for packers, trading standards inspectors and consumers.
Throughout the literature on quality control, it is usually assumed that quality inspectors are 'perfect'. However, a comparison of inspectors (Bissell, 1977) revealed an unacceptably large variation in inspection standards. Some inspectors assessed much lower defect rates than others on the same batch of material. It is also my experience that inspection standards vary considerably, either because of negligence or because the inspectors genuinely apply different standards as to what constitutes a defective. This should be borne in mind in setting up a quality control scheme. It appears that quality control of inspectors is required as well as quality control of the process!
Two references on 'modern' quality control procedures are Ishikawa (1985) and Box (1989). The latter emphasizes the revolutionary shift from quality control to quality improvement, the use of very simple tools, such as check-sheets and graphs, to continually improve a system, and the use of experimental design to improve product design. In particular, Box critically reviews the use of Taguchi methods which provide a systematic approach, using designed experiments, to set a product's characteristics at their target values (or optimize some desirable quality characteristic) at the same time as minimizing variability around these targets and making the system insensitive to environmental changes. Another key figure in modern quality control is W. E. Deming whose '14 points for success' are reviewed by Neave (1987). They include such maxims as 'Drive out fear, so that everyone may work effectively for the company', and emphasize the importance of working as a team and avoiding the traditional authoritarian and hierarchical style of management structure. Other features of modern quality control are that constant improvement should be a way of life and that the statistician should act as a quality detective rather than as a policeman. While there may be little that is statistically new in all this, there is a fundamental difference in philosophy and in attitude, particularly by management, and this needs to be clearly recognized.

Bissell, A. F. (1977), 'Inconsistency of inspection standards - a case study', Bulletin in Applied Statistics, vol. 4, pp. 16-27.
Bissell, A. F. and Pridmore, W. A. (1981), 'The UK average quantity system and its statistical implications' (with discussion), Journal of the Royal Statistical Society, Series A, vol. 144, pp. 389-418.
Box, G. E. P. (1989), 'Quality improvement: an expanding domain for the application of scientific method', Phil. Trans. of the Roy. Soc., A, 327, 617-630.
Box, G. E. P. and Jenkins, G. M. (1970), Time Series Analysis, Forecasting and Control, Holden-Day (revised edn published in 1976).
Box, G. E. P. and Jenkins, G. M. (1979), Practical Experiences with Modelling and Forecasting Time Series, Gwilym Jenkins and Partners (Overseas) Ltd.
Chatfield, C. (1989), The Analysis of Time Series: An Introduction, 4th edn, Chapman and Hall.
Dodge, H. F. and Romig, H. G. (1959), Sampling Inspection Tables, 2nd edn, Wiley.
Duncan, A. J. (1974), Quality Control and Industrial Statistics, 4th edn, Irwin.
Grant, E. L. and Leavenworth, R. S. (1980), Statistical Quality Control, 5th edn, McGraw-Hill.
Guenther, W. C. (1977), Sampling Inspection in Statistical Quality Control, Griffin.
Hald, A. (1981), Statistical Theory of Sampling Inspection by Attributes, Academic Press.
Harris, C. J. (1976), 'Problems in system identification and control', Bulletin of the Institute of Mathematics, vol. 12, pp. 139-150.
I.C.I. (1964), Cumulative Sum Techniques, I.C.I. Monograph no. 3, Oliver and Boyd.
Ishikawa, K. (1985), What is Total Quality Control? The Japanese Way, translated by D. J. Lu, Prentice-Hall.
Jacobs, O. L. R. (1974), Introduction to Control Theory, Clarendon Press.
Ljung, L. (1987), System Identification: Theory for the User, Prentice-Hall.
Neave, H. R. (1987), 'Deming's 14 points for success', The Statistician, 36, 561-570.
Pearson, E. S. (1960), Application of Statistical Methods to Industrial Standardization and Quality Control, B.S. 600, British Standards Institution (reprinted).
U.S. Department of Defense (1963), Military Standard 105D, U.S. Government Printing Office.
Wetherill, G. B. (1977), Sampling Inspection and Quality Control, 2nd edn, Chapman and Hall.



Chapter 13
Life testing

13.1 Problems in measuring reliability


The topic of reliability is of major importance in all manufacturing
industries, whether the product concerned is a washing machine, a
camera, a rocket engine, a car, an electron tube or any other durable
consumer product. The reliability of a product is a measure of its
quality and has a variety of meanings depending on the particular
situation. For example, it may be the probability that a device will
function successfully for a certain period of time, or it may be the
expected fraction of time that a system functions correctly. The differ­
ent definitions of reliability are discussed in Barlow and Proschan
(1965), Chapter 1.
The problem of measuring reliability depends on a number of
considerations. Firstly there are many products, like cars, which are
repaired when they break down. Here we are not only interested in the
time before the first failure but also in the times between subsequent
failures. Secondly there are products which are not repaired (or cannot
be repaired) and then the first failure time is also the life time of the
product. In this introductory chapter we will only be concerned with
some of the problems involved in finding a mathematical model to
describe the distribution of the first failure times. However this is
applicable to both the above types of product as the car manufacturer,
for example, is more concerned with the first failure time than with
subsequent breakdowns because of the guarantee which is given with
the product.
One method of measuring the reliability of a product is to test a
batch of items over an extended period of time and to note the failure
times. This process is called life-testing. Thus in Example 3, Chapter 1,
we have the failure times of a batch of refrigerator motors. The life­
testing programme should provide answers to the following questions.

(1) What is the mean life of the product?


(2) If a guarantee is issued with each item, what proportion of the
manufactured items will fail before the guarantee expires?



(3) What mathematical model will satisfactorily describe the distribu-
tion of failure times?
(4) If there is a change in the manufacturing process, how does the
distribution of failure times alter?
Various types of life test are available.

1. Non-replacement. N items are tested simultaneously under identical conditions, but an item is not replaced when it fails. The test continues either until all the items have failed, or until a specified time has elapsed (time-truncated), or until a specified number of failures have occurred (sample-truncated).

2. Replacement. N items are tested simultaneously under identical conditions. When an item fails, it is repaired or replaced. The test continues until a stopping rule is satisfied or it may continue indefinitely.

3. Sequential. When a new product is being tested a sequential operation can be useful. If a lot of failures occur early on and the product is obviously unsatisfactory, there may be no point in continuing the test. Conversely, if very few failures have occurred after a long period of testing, the product is obviously satisfactory and there may also be no point in continuing the test. Thus after each failure there are three possibilities.
(a) Product life time has been proved satisfactory: stop testing.
(b) Product life time has been proved unsatisfactory: stop testing.
(c) Not enough information is available: continue testing.
Before setting up the test programme the engineer must answer the
following questions.
(1) How many items should be tested?
(2) How long should the test last?
(3) What constitutes a failure? Many products do not fail suddenly
but deteriorate gradually, and it is essential to decide beforehand the
decrease in performance which constitutes a failure. For commercial
products like a washing machine, the failure point could be at the time
when a service call would be expected.
(4) What precautions should be taken to ensure that the items chosen
for the test are representative?
(5) Will the test be carried out under normal operating conditions?



It is obviously desirable, if possible, to carry out the tests under the
same conditions as would be met in ordinary use. Unfortunately this
would often require an unrealistic waiting time before useful results
become available. For example, the guarantee on a refrigerator is
usually one or two years and so tests under normal conditions would
last several years.
There are two ways of accelerating life testing. The first method,
called compressed-time testing, requires that the product should be
used more intensively than usual but without changing the stress levels
to which the product is subjected. For example, a washing machine is
often used just once a week. By running the machine several times a
day, the machine can be made to do as much work in a few weeks as
it would normally do in a year or two, so that useful results would
become available much quicker than would otherwise be the case.
The second method of accelerated life-testing is called advanced-
stress testing and consists of subjecting the product to higher stress
levels than would normally apply. Such tests will induce failures in a
much shorter time. For example, a refrigerator is normally in use all
the time so that it cannot be subjected to compressed-time testing.
When failures do occur they are usually caused by the refrigerator
motor. By running the motor at higher speeds than usual it is found
that failures occur within a few weeks. Past experience may suggest
that say one week under advanced stress conditions is equivalent to
one year under normal operating conditions, and this enables informa­
tion to be derived from the data obtained under advanced stress condi­
tions.
Data obtained from accelerated life testing is of course much less
reliable than that obtained under normal conditions and should always
be treated with suspicion. However it may often be the only way of
getting information in the time available.

13.2 The mathematical distribution of failure times


Since the failure time of an item may be any positive number, the distribution of failure times is continuous. Thus it can be described either by its probability density function, f(t), or by its cumulative distribution function, F(t). Then the probability that an item will fail in the time interval from t to t + Δt is given by f(t)Δt. Also the probability that an item will fail at any time up to t is given by

F(t) = ∫₀ᵗ f(u) du.



It is also convenient to introduce a function R(t) called the reliability function which gives the probability that an item will survive until time t. Thus we have

R(t) = 1 − F(t).

After testing a sample of items, we will have a series of observed failure times denoted by

0 ≤ t₁ ≤ t₂ ≤ … ≤ t_N.

Thus it appears that our problem is to estimate the related functions f(t), F(t) and R(t) from the observed failure times. However if the type of mathematical distribution is unknown it is often expedient to estimate another function called the conditional failure rate function (or hazard function or instantaneous failure rate function) which is given by

Z(t) = f(t)/R(t).

Then the probability that an item will fail in the interval from t to t + Δt, given that it has survived until time t, is given by Z(t)Δt. It is most important to realize that this is a conditional probability and that Z(t) is always larger than f(t), because R(t) is always less than 1. For example, the proportion of cars which fail between fifteen and sixteen years is very small because few cars last as long as fifteen years. But of the cars which do last fifteen years a substantial proportion will fail in the following year. In fact the function Z(t) describes the distribution of failure times just as completely as f(t) or F(t), and is perhaps
the most basic measure of reliability.
It is useful to establish a direct relation between Z(t) and R(t). This is done in the following way. The functions f(t) and F(t) are related by the equation

f(t) = dF(t)/dt (see section 5.1)

and R(t) = 1 − F(t), so that

f(t) = −dR(t)/dt.

Thus Z(t) = f(t)/R(t)
          = −[dR(t)/dt]/R(t)
          = −d ln R(t)/dt.

Conversely R(t) = exp[−∫₀ᵗ Z(u) du].     13.1

13.3 The exponential distribution


This distribution was introduced in section 5.6 and is the simplest
theoretical distribution for describing failure times.
Suppose a product is such that its age has no effect on its probability of failing, so that, if it has survived until time t, the probability that it will fail in the interval t to (t + Δt) is given by λΔt, where λ is a constant for all t. In other words the conditional failure rate of the product is constant:

Z(t) = λ for t > 0.

Then R(t) = exp[−∫₀ᵗ Z(u) du]
          = exp[−λt],
F(t) = 1 − exp[−λt],
f(t) = dF(t)/dt
     = λ exp[−λt].
We notice that f(t) is the probability density function of the exponential
distribution.
The exponential model states that chance alone dictates when a
failure occurs and that the product’s age does not affect the issue.
There are a few products of this type such as electron tubes and electric
fuses. In addition the exponential distribution describes the times
between failures of a complex piece of equipment (see Barlow and
Proschan, 1965, Chapter 2). However the reader is warned that many
products deteriorate with age and then the exponential model will not
apply.



The mean failure time for the exponential distribution is given by

∫₀^∞ t λ e^(−λt) dt = 1/λ.

Estimation. If the exponential model is thought to be appropriate, the next problem is to use the sample data to estimate λ, or its reciprocal θ = 1/λ, which is the mean life. If all the items in the sample are run to destruction without replacement then the mean sample failure time is the best estimate of θ. However if the test programme is time-truncated or sample-truncated, or if a replacement programme is used, then an alternative method of estimation is required.
If N items are tested without replacement until n failures have occurred and if tᵢ denotes the ith failure time, then an estimate of θ is given by

θ̂ = [t₁ + t₂ + … + tₙ + (N − n)tₙ]/n.

If N items are tested with replacement until n failures have occurred, we will denote the ith failure time by tᵢ, whether it applies to an original or a repaired item. Each of these times is measured from the beginning of the experiment. Then an estimate of θ is given by

θ̂ = Ntₙ/n.

The reader is referred, for example, to Mann et al. (1974) for a discussion of these results.

13.4 Estimating the conditional failure rate


The conditional failure rate, Z(t), is one of the basic measures of the reliability of a product. In the previous section we considered the case where Z(t) is constant and the failure times follow an exponential distribution. But many situations exist in which Z(t) is not constant and, in the absence of prior information, it is always a good idea to estimate the shape of the function. This will provide information about



the way the product fails and give insight into the type of mathematical
model which will be appropriate. We will describe a method of doing
this when a non-replacement programme is used.
N items are tested simultaneously. Let tᵢ denote the time at which the ith failure occurs. Divide the interval (0, t_N) into between about N/5 and N/3 intervals, each of a constant width which we denote by T. Now Z(t)Δt is the conditional probability of failure in the interval t to t + Δt, given that the item has survived until time t. This is strictly valid only for a very small interval of length Δt, but if we consider an interval of length T, then Z(cT) × T can be crudely estimated by dividing the observed number of failures in the interval cT to (c + 1)T by the number of items which last at least as long as cT.

Example 1
Estimates of Z(t) were obtained for the data of Example 3, Chapter 1, and the results are given below. Since N is 20 and t_N is 691·5 hours, the 'obvious' choice for T is 100 hours, giving 7 intervals. Then we estimate Z(t) at t = 0, 100, 200, …, 600 hours, as shown below.

c    Interval     No. of failures    No. of items surviving until cT    Ẑ(100c) × 100
0    (0, 100)     0                  20                                 0/20 = 0·0
1    (100, 200)   3                  20                                 3/20 = 0·15
2    (200, 300)   4                  17                                 4/17 = 0·24
3    (300, 400)   8                  13                                 8/13 = 0·61
4    (400, 500)   2                  5                                  2/5 = 0·40
5    (500, 600)   2                  3                                  2/3 = 0·67
6    (600, 700)   1                  1                                  1/1 = 1·0

The estimates of (Z(t) × 100) range from zero to one and are plotted in Figure 78. These estimates are fairly crude, particularly when the remaining sample size sinks to less than 5. Nevertheless it seems clear from the graph that Z(t) is not constant but rather increases with t. This means that the exponential model will not apply. We will describe a more suitable model in the next section.
Figure 78 Estimated conditional failure rate

An increase in Z(t) is a common phenomenon and means that the product is more likely to fail the older it gets. Another common type

of conditional failure rate function is depicted in Figure 79. Teething troubles due to faulty manufacture are quite likely to occur so that Z(t) is relatively large for low values of t. Then there is a period when the product is likely to be trouble free, so that Z(t) is relatively low. Finally, as the product ages, more failures occur and Z(t) rises steadily.

Figure 79 A common type of conditional failure rate



It is unrealistic to expect one model to describe all possible situa­
tions. However a model based on the Weibull distribution is widely
applicable and will be described in the next section.

13.5 The Weibull distribution


The Weibull distribution has been used successfully to describe, among other things, fatigue failures, vacuum tube failures and ball
bearing failures, and is probably the most popular model for describing
failure times. The conditional failure rate for the Weibull model is given
by
Z(t) = mλtᵐ⁻¹.

This function depends on two parameters, m and λ, which are called the Weibull shape parameter and the Weibull scale parameter respectively. For the special case when m = 1, we find Z(t) = λ, so the exponential model is a special case of the Weibull model. Various types of conditional failure rates are shown in Figure 80. For values of m greater than one, Z(t) increases with t. Thus the Weibull model may describe the data which was analysed in Example 1. However the Weibull model will not give a function similar to that depicted in Figure 79 for any value of m, and so the model will not apply in this case.

Using formula 13.1 we can find the reliability function of the Weibull model,

R(t) = exp[−λtᵐ].

Hence F(t) = 1 − exp[−λtᵐ]
and f(t) = mλtᵐ⁻¹ exp[−λtᵐ].



The last two functions are the cumulative distribution function and
the probability density function of the Weibull distribution.
If the Weibull distribution is thought to be appropriate, then both
Weibull parameters must be estimated from the data. Efficient methods
of estimation are available (e.g. Gross and Clark, 1975; Mann et al., 1974), and these should be used if such calculations are to be made regularly. Here we describe a simple graphical method of estimation which is particularly useful in the early stages of an investigation as it can also be used to see if the Weibull model is appropriate.

The Weibull reliability function is given by

R(t) = exp[−λtᵐ],

giving ln R(t) = −λtᵐ

and ln[−ln R(t)] = m ln t + ln λ.

N items are tested without replacement. Let tᵢ denote the ith failure time. Then it can be shown that an unbiased estimate of F(tᵢ) is given by i/(N + 1). Thus an unbiased estimate of R(tᵢ) is given by (N + 1 − i)/(N + 1). But

−ln[(N + 1 − i)/(N + 1)] = ln[(N + 1)/(N + 1 − i)].

Thus if ln[(N + 1)/(N + 1 − i)] is plotted against tᵢ on log-log graph paper (on which both scales are logarithmic), the points will lie approximately on a straight line if the Weibull model is appropriate. It will usually be sufficient to fit a straight line by eye. The slope of the line is an estimate of m. The other parameter λ can be estimated at the value of t where ln[(N + 1)/(N + 1 − i)] is equal to one.
Before giving an example of this method of estimation we will generalize the model somewhat to include a third parameter. This quantity specifies the minimum life time of the product and is sometimes called the Weibull location parameter. It will be denoted by L. The conditional failure rate of the three-parameter Weibull model is given by

Z(t) = mλ(t − L)ᵐ⁻¹ for L ≤ t < ∞,
Z(t) = 0 otherwise.

Thus no failures occur before L. Then we have

R(t) = exp[−λ(t − L)ᵐ]
F(t) = 1 − exp[−λ(t − L)ᵐ]     for L ≤ t < ∞.
f(t) = mλ(t − L)ᵐ⁻¹ exp[−λ(t − L)ᵐ]

A crude estimate of L can be obtained from the smallest failure time. This is subtracted from all the failure times and then the remaining observations can be analysed as before, in order to obtain estimates of m and λ. This method will be illustrated with the data of Example 3, Chapter 1, although the introduction of the location parameter, L, is probably not really necessary in this case.

Example 2
The smallest failure time is 104·3 hours, so L̂ = 104·3. This is subtracted from all the other observations and the analysis is carried out on the remaining nineteen observations. Thus we take N = 19, and tᵢ will be the (i − 1)th failure after L. The values of ln[20/(21 − i)] are given below and the resulting values are plotted on log-log paper in Figure 81.

i     tᵢ − 104·3    20/(21 − i)    ln[20/(21 − i)]
1     0             1·00           0
2     54·4          1·05           0·049
3     89·4          1·11           0·104
4     97·0          1·17           0·156
5     101·9         1·25           0·222
6     123·5         1·33           0·285
7     144·8         1·43           0·358
8     203·5         1·54           0·432
9     207·2         1·67           0·513
10    225·3         1·82           0·600
11    254·2         2·00           0·691
12    260·0         2·22           0·798
13    266·1         2·50           0·912
14    276·2         2·86           1·05
15    290·3         3·33           1·20
16    321·9         4·00           1·38
17    329·8         5·00           1·61
18    448·3         6·67           1·90
19    489·7         10·00          2·30
20    587·2         20·00          3·00



The points lie approximately on a straight line, indicating that the Weibull model really is appropriate. A straight line is fitted to the data by eye. From this we find

m̂ = 1·72 (slope of the line)

and when ln[20/(21 − i)] = 1, the corresponding value of t enables us to estimate λ by

m ln(t − L) = −ln λ.

From the graph we see t − L = 290 hours when ln[20/(21 − i)] = 1.

Hence λ̂ = 290^(−1·72)
        = 0·000059.
The exponential and Weibull distributions are only two of the many
possible types of model for describing the failure times of manufactured
products. Other distributions which have been suggested include the
gamma and log-normal distributions. The choice of a suitable distribu­
tion is by no means automatic, and the reader is referred to Gross and
Clark (1975, especially Chapter 1), Buckland (1964) and Mann et al. (1974). The book by Gross and Clark is concerned mainly with
problems in the biological sciences, where one is concerned with
survival times for animals and humans, but the problems which arise
there are in many ways similar to the problems arising in life-testing.



References

A discussion of probability models in reliability theory is given by Barlow and Proschan (1965, 1975). Mann et al. (1974) discuss the fitting of statistical failure models, accelerated life-testing and system reliability. Gross and Clark (1975) are concerned with fitting survival distributions to biomedical life-time data. Kalbfleisch and Prentice (1980) are mainly concerned with fitting models of survival times in the presence of explanatory variables, when the regression methods discussed in Chapter 8 of this book are not applicable because survival times are generally not normally distributed.

Barlow, R. E. and Proschan, F. (1965), Mathematical Theory of Reliability, Wiley.
Barlow, R. E. and Proschan, F. (1975), Statistical Theory of Reliability and Life Testing, Holt, Rinehart and Winston.
Buckland, W. R. (1964), Statistical Assessment of the Life Characteristic, Griffin.
Gross, A. J. and Clark, V. A. (1975), Survival Distributions: Reliability Applications in the Biomedical Sciences, Wiley.
Kalbfleisch, J. D. and Prentice, R. L. (1980), The Statistical Analysis of Failure Time Data, Wiley.
Mann, N. R., Schafer, R. E. and Singpurwalla, N. D. (1974), Methods for Statistical Analysis of Reliability and Life Data, Wiley.



Appendix A
The relationship between the normal, χ², t- and F-distributions

The normal distribution was discussed in detail in Chapter 5. In Chapters 6 and 7 three other continuous distributions, which are related directly or indirectly to the normal distribution, were introduced. These are the χ², t- and F-distributions, and here we will state the mathematical relationships between them. The results will not be proved as the mathematical techniques involved are beyond the scope of this book. The reader is referred to any book on mathematical statistics for the proofs.

The χ² distribution
If X is N(0, 1) then the random variable Y = X² is said to be a χ² random variable with one degree of freedom. In the general case, if X₁, X₂, …, Xₙ are independent N(0, 1), then

Y = X₁² + X₂² + … + Xₙ²

is said to be a χ² random variable with n degrees of freedom. Thus, if X₁, X₂, …, Xₙ are independent N(μ, σ²), then

Y = Σ [(Xᵢ − μ)/σ]²

is also a χ² random variable with n degrees of freedom.
This distribution is useful in many situations including the following: if x₁, …, xₙ is a random sample size n from N(μ, σ²), where μ is unknown, then the sampling distribution of

Σ [(xᵢ − x̄)/σ]² = (n − 1)s²/σ²

is χ² with (n − 1) degrees of freedom. One degree of freedom is 'lost' by substituting x̄ for μ, as this places one linear constraint on the values of (xᵢ − x̄).

The p.d.f. of a χ² distribution with ν d.f. is given by

f(y) = y^((ν−2)/2) e^(−y/2) / [2^(ν/2) Γ(½ν)] for y ≥ 0,
f(y) = 0 otherwise.

This distribution, in which ν must be a positive integer, is a special case of the gamma distribution. Its mean and variance are equal to ν and 2ν respectively. It is always skewed to the right, but tends to the normal distribution as ν → ∞. Percentage points are given in Table 3, Appendix B.

The t-distribution
If X is N(0, 1), Y is χ² with ν d.f., and X and Y are independent, then the random variable

t = X/√(Y/ν)

is said to have a t-distribution with ν degrees of freedom.
This distribution is useful in a number of situations, including the following: a random sample, size n, is taken from a normal distribution, mean μ and variance σ². The sample mean, x̄, and sample variance, s², are calculated in the usual way. We know that the sampling distribution of

(x̄ − μ)/(σ/√n)

is N(0, 1), and that the sampling distribution of (n − 1)s²/σ² is χ² with (n − 1) d.f. Moreover it can be shown that the random variables x̄ and s² are independent, even though the values are calculated from the same sample. Thus the statistic

t = (x̄ − μ)/(σ/√n) ÷ √{[(n − 1)s²/σ²]/(n − 1)}
  = (x̄ − μ)/(s/√n)

has a t-distribution with (n − 1) d.f.

The p.d.f. of a t-distribution with ν d.f. is given by

f(t) = Γ(½(ν + 1)) / [√(πν) Γ(½ν)] × (1 + t²/ν)^(−(ν+1)/2), (−∞ < t < +∞).

The distribution is symmetric about t = 0. Its mean and variance are 0 and ν/(ν − 2) respectively. It tends to the standard normal distribution as ν → ∞. Percentage points are given in Table 2, Appendix B.

The F-distribution
If χ₁², χ₂² are independent χ² random variables with ν₁, ν₂ d.f. respectively, then the random variable

F = (χ₁²/ν₁)/(χ₂²/ν₂)

is said to have an F-distribution with ν₁ and ν₂ d.f. respectively.
This distribution is useful in a number of situations including the following: two random samples, size n₁ and n₂ respectively, are taken from a normal distribution, mean μ and variance σ². The two sample variances, s₁² and s₂², are calculated in the usual way. Then the sampling distribution of (n₁ − 1)s₁²/σ² is χ² with (n₁ − 1) d.f., and the sampling distribution of (n₂ − 1)s₂²/σ² is χ² with (n₂ − 1) d.f. Thus the statistic

F = {[(n₁ − 1)s₁²/σ²]/(n₁ − 1)} ÷ {[(n₂ − 1)s₂²/σ²]/(n₂ − 1)} = s₁²/s₂²

has an F-distribution with (n₁ − 1) and (n₂ − 1) d.f.
The mean of the F-distribution is equal to ν₂/(ν₂ − 2) for ν₂ > 2. Thus the mean is very close to one for fairly large values of ν₂, and the distribution is always skewed to the right. Upper percentage points of the distribution are given in Table 4, Appendix B. Lower percentage points can be found using the equation

F(1 − α; ν₁, ν₂) = 1/F(α; ν₂, ν₁).
Appendix B Statistical tables
Table 1 Areas under the normal curve

z      ·00    ·01    ·02    ·03    ·04    ·05    ·06    ·07    ·08    ·09

0·0    ·5000  ·5040  ·5080  ·5120  ·5160  ·5199  ·5239  ·5279  ·5319  ·5359
0·1    ·5398  ·5438  ·5478  ·5517  ·5557  ·5596  ·5636  ·5675  ·5714  ·5753
0·2    ·5793  ·5832  ·5871  ·5910  ·5948  ·5987  ·6026  ·6064  ·6103  ·6141
0·3    ·6179  ·6217  ·6255  ·6293  ·6331  ·6368  ·6406  ·6443  ·6480  ·6517
0·4    ·6554  ·6591  ·6628  ·6664  ·6700  ·6736  ·6772  ·6808  ·6844  ·6879
0·5    ·6915  ·6950  ·6985  ·7019  ·7054  ·7088  ·7123  ·7157  ·7190  ·7224
0·6    ·7257  ·7291  ·7324  ·7357  ·7389  ·7422  ·7454  ·7486  ·7517  ·7549
0·7    ·7580  ·7611  ·7642  ·7673  ·7703  ·7734  ·7764  ·7793  ·7823  ·7852
0·8    ·7881  ·7910  ·7939  ·7967  ·7995  ·8023  ·8051  ·8078  ·8106  ·8133
0·9    ·8159  ·8186  ·8212  ·8238  ·8264  ·8289  ·8315  ·8340  ·8365  ·8389
1·0    ·8413  ·8438  ·8461  ·8485  ·8508  ·8531  ·8554  ·8577  ·8599  ·8621
1·1    ·8643  ·8665  ·8686  ·8708  ·8729  ·8749  ·8770  ·8790  ·8810  ·8830
1·2    ·8849  ·8869  ·8888  ·8906  ·8925  ·8943  ·8962  ·8980  ·8997  ·9015
1·3    ·9032  ·9049  ·9066  ·9082  ·9099  ·9115  ·9131  ·9147  ·9162  ·9177
1·4    ·9192  ·9207  ·9222  ·9236  ·9251  ·9265  ·9279  ·9292  ·9306  ·9319
1·5    ·9332  ·9345  ·9357  ·9370  ·9382  ·9394  ·9406  ·9418  ·9429  ·9441
1·6    ·9452  ·9463  ·9474  ·9484  ·9495  ·9505  ·9515  ·9525  ·9535  ·9545
1·7    ·9554  ·9564  ·9573  ·9582  ·9591  ·9599  ·9608  ·9616  ·9625  ·9633
1·8    ·9641  ·9648  ·9656  ·9664  ·9671  ·9678  ·9686  ·9693  ·9699  ·9706
1·9    ·9713  ·9719  ·9726  ·9732  ·9738  ·9744  ·9750  ·9756  ·9761  ·9767
2·0    ·9772  ·9778  ·9783  ·9788  ·9793  ·9798  ·9803  ·9808  ·9812  ·9817
2·1    ·9821  ·9826  ·9830  ·9834  ·9838  ·9842  ·9846  ·9850  ·9854  ·9857
2·2    ·9861  ·9864  ·9868  ·9871  ·9875  ·9878  ·9881  ·9884  ·9887  ·9890
2·3    ·9893  ·9896  ·9898  ·9901  ·9904  ·9906  ·9909  ·9911  ·9913  ·9916
2·4    ·9918  ·9920  ·9922  ·9924  ·9927  ·9929  ·9930  ·9932  ·9934  ·9936
2·5    ·9938  ·9940  ·9941  ·9943  ·9945  ·9946  ·9948  ·9949  ·9951  ·9952
2·6    ·9953  ·9955  ·9956  ·9957  ·9959  ·9960  ·9961  ·9962  ·9963  ·9964
2·7    ·9965  ·9966  ·9967  ·9968  ·9969  ·9970  ·9971  ·9972  ·9973  ·9974
2·8    ·9974  ·9975  ·9976  ·9977  ·9977  ·9978  ·9979  ·9979  ·9980  ·9981
2·9    ·9981  ·9982  ·9982  ·9983  ·9984  ·9984  ·9985  ·9985  ·9986  ·9986
3·0    ·9986  ·9987  ·9987  ·9988  ·9988  ·9989  ·9989  ·9989  ·9990  ·9990
3·1    ·9990  ·9991  ·9991  ·9991  ·9992  ·9992  ·9992  ·9992  ·9993  ·9993
3·2    ·9993  ·9993  ·9994  ·9994  ·9994  ·9994  ·9994  ·9995  ·9995  ·9995
3·3    ·9995  ·9995  ·9995  ·9996  ·9996  ·9996  ·9996  ·9996  ·9996  ·9996
3·4    ·9997  ·9997  ·9997  ·9997  ·9997  ·9997  ·9997  ·9997  ·9997  ·9998
3·5    ·9998  ·9998  ·9998  ·9998  ·9998  ·9998  ·9998  ·9998  ·9998  ·9998
3·6    ·9998  ·9998  ·9998  ·9999  ·9999  ·9999  ·9999  ·9999  ·9999  ·9999

Table 2 Percentage points of Student's t-distribution

ν\α    ·10     ·05     ·025    ·01     ·005    ·001

1      3·078   6·314   12·706  31·821  63·657  318·310
2      1·886   2·920   4·303   6·965   9·925   22·327
3      1·638   2·353   3·182   4·541   5·841   10·215
4      1·533   2·132   2·776   3·747   4·604   7·173
5      1·476   2·015   2·571   3·365   4·032   5·893
6      1·440   1·943   2·447   3·143   3·707   5·208
7      1·415   1·895   2·365   2·998   3·499   4·785
8      1·397   1·860   2·306   2·896   3·355   4·501
9      1·383   1·833   2·262   2·821   3·250   4·297
10     1·372   1·812   2·228   2·764   3·169   4·144
11     1·363   1·796   2·201   2·718   3·106   4·025
12     1·356   1·782   2·179   2·681   3·055   3·930
13     1·350   1·771   2·160   2·650   3·012   3·852
14     1·345   1·761   2·145   2·624   2·977   3·787
15     1·341   1·753   2·131   2·602   2·947   3·733
16     1·337   1·746   2·120   2·583   2·921   3·686
17     1·333   1·740   2·110   2·567   2·898   3·646
18     1·330   1·734   2·101   2·552   2·878   3·610
19     1·328   1·729   2·093   2·539   2·861   3·579
20     1·325   1·725   2·086   2·528   2·845   3·552
21     1·323   1·721   2·080   2·518   2·831   3·527
22     1·321   1·717   2·074   2·508   2·819   3·505
23     1·319   1·714   2·069   2·500   2·807   3·485
24     1·318   1·711   2·064   2·492   2·797   3·467
25     1·316   1·708   2·060   2·485   2·787   3·450
26     1·315   1·706   2·056   2·479   2·779   3·435
27     1·314   1·703   2·052   2·473   2·771   3·421
28     1·313   1·701   2·048   2·467   2·763   3·408
29     1·311   1·699   2·045   2·462   2·756   3·396
30     1·310   1·697   2·042   2·457   2·750   3·385
40     1·303   1·684   2·021   2·423   2·704   3·307
60     1·296   1·671   2·000   2·390   2·660   3·232
120    1·289   1·658   1·980   2·358   2·617   3·160
∞      1·282   1·645   1·960   2·326   2·576   3·090

Table 3 Percentage points of the χ² distribution

ν\α   ·995    ·99     ·975    ·95     ·50     ·20     ·10     ·05     ·025    ·01     ·005

1     0·000   0·0002  0·001   0·0039  0·45    1·64    2·71    3·84    5·02    6·63    7·88
2     0·010   0·020   0·051   0·103   1·39    3·22    4·61    5·99    7·38    9·21    10·60
3     0·072   0·115   0·216   0·352   2·37    4·64    6·25    7·81    9·35    11·34   12·84
4     0·207   0·30    0·484   0·71    3·36    5·99    7·78    9·49    11·14   13·28   14·86
5     0·412   0·55    0·831   1·15    4·35    7·29    9·24    11·07   12·83   15·09   16·75
6     0·676   0·87    1·24    1·64    5·35    8·56    10·64   12·59   14·45   16·81   18·55
7     0·989   1·24    1·69    2·17    6·35    9·80    12·02   14·07   16·01   18·48   20·28
8     1·34    1·65    2·18    2·73    7·34    11·03   13·36   15·51   17·53   20·09   21·95
9     1·73    2·09    2·70    3·33    8·34    12·24   14·68   16·92   19·02   21·67   23·59
10    2·16    2·56    3·25    3·94    9·34    13·44   15·99   18·31   20·48   23·21   25·19
11    2·60    3·05    3·82    4·57    10·34   14·63   17·28   19·68   21·92   24·72   26·76
12    3·07    3·57    4·40    5·23    11·34   15·81   18·55   21·03   23·34   26·22   28·30
13    3·57    4·11    5·01    5·89    12·34   16·98   19·81   22·36   24·74   27·69   29·82
14    4·07    4·66    5·63    6·57    13·34   18·15   21·06   23·68   26·12   29·14   31·32
15    4·60    5·23    6·26    7·26    14·34   19·31   22·31   25·00   27·49   30·58   32·80
16    5·14    5·81    6·91    7·96    15·34   20·47   23·54   26·30   28·85   32·00   34·27
17    5·70    6·41    7·56    8·67    16·34   21·61   24·77   27·59   30·19   33·41   35·72
18    6·26    7·01    8·23    9·39    17·34   22·76   25·99   28·87   31·53   34·81   37·16
19    6·84    7·63    8·91    10·12   18·34   23·90   27·20   30·14   32·85   36·19   38·58
20    7·43    8·26    9·59    10·85   19·34   25·04   28·41   31·41   34·17   37·57   40·00
21    8·03    8·90    10·28   11·59   20·34   26·17   29·62   32·67   35·48   38·93   41·40
22    8·64    9·54    10·98   12·34   21·34   27·30   30·81   33·92   36·78   40·29   42·80
23    9·26    10·20   11·69   13·09   22·34   28·43   32·01   35·17   38·08   41·64   44·18
24    9·89    10·86   12·40   13·85   23·34   29·55   33·20   36·42   39·36   42·98   45·56
25    10·52   11·52   13·12   14·61   24·34   30·68   34·38   37·65   40·65   44·31   46·93
26    11·16   12·20   13·84   15·38   25·34   31·79   35·56   38·89   41·92   45·64   48·29
27    11·81   12·88   14·57   16·15   26·34   32·91   36·74   40·11   43·19   46·96   49·64
28    12·46   13·57   15·31   16·93   27·34   34·03   37·92   41·34   44·46   48·28   50·99
29    13·12   14·26   16·05   17·71   28·34   35·14   39·09   42·56   45·72   49·59   52·34
30    13·79   14·95   16·79   18·49   29·34   36·25   40·26   43·77   46·98   50·89   53·67
40    20·71   22·16   24·43   26·51   39·34   47·27   51·81   55·76   59·34   63·69   66·77
50    27·99   29·71   32·36   34·76   49·33   58·16   63·17   67·50   71·42   76·15   79·49
60    35·53   37·48   40·48   43·19   59·33   68·97   74·40   79·08   83·30   88·38   91·95
70    43·28   45·44   48·76   51·74   69·33   79·71   85·53   90·53   95·02   100·43  104·21
80    51·17   53·54   57·15   60·39   79·33   90·41   96·58   101·88  106·63  112·33  116·32
90    59·20   61·75   65·65   69·13   89·33   101·05  107·57  113·15  118·14  124·12  128·30
100   67·33   70·06   74·22   77·93   99·33   111·67  118·50  124·34  129·56  135·81  140·17

Table 4 Upper percentage points of the F-distribution

(a) α = 0·01

ν₂ \ ν₁  1      2      3      4      5      6      7      8      9      10     12     15     20     24     30

1    4052·2 4999·5 5403·4 5624·6 5763·6 5859·0 5928·4 5981·1 6022·5 6055·8 6106·3 6157·3 6208·7 6234·6 6260·6
2    98·50  99·00  99·17  99·25  99·30  99·33  99·36  99·37  99·39  99·40  99·42  99·43  99·45  99·46  99·47
3    34·12  30·82  29·46  28·71  28·24  27·91  27·67  27·49  27·35  27·23  27·05  26·87  26·69  26·60  26·50
4    21·20  18·00  16·69  15·98  15·52  15·21  14·98  14·80  14·66  14·55  14·37  14·20  14·02  13·93  13·84
5    16·26  13·27  12·06  11·39  10·97  10·67  10·46  10·29  10·16  10·05  9·89   9·72   9·55   9·47   9·38
6    13·75  10·92  9·78   9·15   8·75   8·47   8·26   8·10   7·98   7·87   7·72   7·56   7·40   7·31   7·23
7    12·25  9·55   8·45   7·85   7·46   7·19   6·99   6·84   6·72   6·62   6·47   6·31   6·16   6·07   5·99
8    11·26  8·65   7·59   7·01   6·63   6·37   6·18   6·03   5·91   5·81   5·67   5·52   5·36   5·28   5·20
9    10·56  8·02   6·99   6·42   6·06   5·80   5·61   5·47   5·35   5·26   5·11   4·96   4·81   4·73   4·65
10   10·04  7·56   6·55   5·99   5·64   5·39   5·20   5·06   4·94   4·85   4·71   4·56   4·41   4·33   4·25
11   9·65   7·21   6·22   5·67   5·32   5·07   4·89   4·74   4·63   4·54   4·40   4·25   4·10   4·02   3·94
12   9·33   6·93   5·95   5·41   5·06   4·82   4·64   4·50   4·39   4·30   4·16   4·01   3·86   3·78   3·70
13   9·07   6·70   5·74   5·21   4·86   4·62   4·44   4·30   4·19   4·10   3·96   3·82   3·66   3·59   3·51
14   8·86   6·51   5·56   5·04   4·69   4·46   4·28   4·14   4·03   3·94   3·80   3·66   3·51   3·43   3·35
15   8·68   6·36   5·42   4·89   4·56   4·32   4·14   4·00   3·89   3·80   3·67   3·52   3·37   3·29   3·21
16   8·53   6·23   5·29   4·77   4·44   4·20   4·03   3·89   3·78   3·69   3·55   3·41   3·26   3·18   3·10
17   8·40   6·11   5·18   4·67   4·34   4·10   3·93   3·79   3·68   3·59   3·46   3·31   3·16   3·08   3·00
18   8·29   6·01   5·09   4·58   4·25   4·01   3·84   3·71   3·60   3·51   3·37   3·23   3·08   3·00   2·92
19   8·18   5·93   5·01   4·50   4·17   3·94   3·77   3·63   3·52   3·43   3·30   3·15   3·00   2·92   2·84
20   8·10   5·85   4·94   4·43   4·10   3·87   3·70   3·56   3·46   3·37   3·23   3·09   2·94   2·86   2·78
21   8·02   5·78   4·87   4·37   4·04   3·81   3·64   3·51   3·40   3·31   3·17   3·03   2·88   2·80   2·72
22   7·95   5·72   4·82   4·31   3·99   3·76   3·59   3·45   3·35   3·26   3·12   2·98   2·83   2·75   2·67
23   7·88   5·66   4·76   4·26   3·94   3·71   3·54   3·41   3·30   3·21   3·07   2·93   2·78   2·70   2·62
24   7·82   5·61   4·72   4·22   3·90   3·67   3·50   3·36   3·26   3·17   3·03   2·89   2·74   2·66   2·58
25   7·77   5·57   4·68   4·18   3·85   3·63   3·46   3·32   3·22   3·13   2·99   2·85   2·70   2·62   2·54
26   7·72   5·53   4·64   4·14   3·82   3·59   3·42   3·29   3·18   3·09   2·96   2·81   2·66   2·58   2·50
27   7·68   5·49   4·60   4·11   3·78   3·56   3·39   3·26   3·15   3·06   2·93   2·78   2·63   2·55   2·47
28   7·64   5·45   4·57   4·07   3·75   3·53   3·36   3·23   3·12   3·03   2·90   2·75   2·60   2·52   2·44
29   7·60   5·42   4·54   4·04   3·73   3·50   3·33   3·20   3·09   3·00   2·87   2·73   2·57   2·49   2·41
30   7·56   5·39   4·51   4·02   3·70   3·47   3·30   3·17   3·07   2·98   2·84   2·70   2·55   2·47   2·39

40   7·31   5·18   4·31   3·83   3·51   3·29   3·12   2·99   2·89   2·80   2·66   2·52   2·37   2·29   2·20
60   7·08   4·98   4·13   3·65   3·34   3·12   2·95   2·82   2·72   2·63   2·50   2·35   2·20   2·12   2·03
120  6·85   4·79   3·95   3·48   3·17   2·96   2·79   2·66   2·56   2·47   2·34   2·19   2·03   1·95   1·86
∞    6·63   4·61   3·78   3·32   3·02   2·80   2·64   2·51   2·41   2·32   2·18   2·04   1·88   1·79   1·70

Table 4 (continued)

(b) α = 0·025

ν₂ \ ν₁  1      2      3      4      5      6      7      8      9      10     12     15     20     24     30

1    647·79 799·50 864·16 899·58 921·85 937·11 948·22 956·66 963·28 968·63 976·71 984·87 993·10 997·25 1001·4
2    38·51  39·00  39·17  39·25  39·30  39·33  39·36  39·37  39·39  39·40  39·41  39·43  39·45  39·46  39·46
3    17·44  16·04  15·44  15·10  14·88  14·73  14·62  14·54  14·47  14·42  14·34  14·25  14·17  14·12  14·08
4    12·22  10·65  9·98   9·60   9·36   9·20   9·07   8·98   8·90   8·84   8·75   8·66   8·56   8·51   8·46
5    10·01  8·43   7·76   7·39   7·15   6·98   6·85   6·76   6·68   6·62   6·52   6·43   6·33   6·28   6·23

6    8·81   7·26   6·60   6·23   5·99   5·82   5·70   5·60   5·52   5·46   5·37   5·27   5·17   5·12   5·07
7    8·07   6·54   5·89   5·52   5·29   5·12   4·99   4·90   4·82   4·76   4·67   4·57   4·47   4·41   4·36
8    7·57   6·06   5·42   5·05   4·82   4·65   4·53   4·43   4·36   4·30   4·20   4·10   4·00   3·95   3·89
9    7·21   5·71   5·08   4·72   4·48   4·32   4·20   4·10   4·03   3·96   3·87   3·77   3·67   3·61   3·56
10   6·94   5·46   4·83   4·47   4·24   4·07   3·95   3·85   3·78   3·72   3·62   3·52   3·42   3·37   3·31

11   6·72   5·26   4·63   4·28   4·04   3·88   3·76   3·66   3·59   3·53   3·43   3·33   3·23   3·17   3·12
12   6·55   5·10   4·47   4·12   3·89   3·73   3·61   3·51   3·44   3·37   3·28   3·18   3·07   3·02   2·96
13   6·41   4·97   4·35   4·00   3·77   3·60   3·48   3·39   3·31   3·25   3·15   3·05   2·95   2·89   2·84
14   6·30   4·86   4·24   3·89   3·66   3·50   3·38   3·29   3·21   3·15   3·05   2·95   2·84   2·79   2·73
15   6·20   4·77   4·15   3·80   3·58   3·41   3·29   3·20   3·12   3·06   2·96   2·86   2·76   2·70   2·64

16   6·12   4·69   4·08   3·73   3·50   3·34   3·22   3·12   3·05   2·99   2·89   2·79   2·68   2·63   2·57
17   6·04   4·62   4·01   3·66   3·44   3·28   3·16   3·06   2·98   2·92   2·82   2·72   2·62   2·56   2·50
18   5·98   4·56   3·95   3·61   3·38   3·22   3·10   3·01   2·93   2·87   2·77   2·67   2·56   2·50   2·44
19   5·92   4·51   3·90   3·56   3·33   3·17   3·05   2·96   2·88   2·82   2·72   2·62   2·51   2·45   2·39
20   5·87   4·46   3·86   3·51   3·29   3·13   3·01   2·91   2·84   2·77   2·68   2·57   2·46   2·41   2·35

21   5·83   4·42   3·82   3·48   3·25   3·09   2·97   2·87   2·80   2·73   2·64   2·53   2·42   2·37   2·31
22   5·79   4·38   3·78   3·44   3·22   3·05   2·93   2·84   2·76   2·70   2·60   2·50   2·39   2·33   2·27
23   5·75   4·35   3·75   3·41   3·18   3·02   2·90   2·81   2·73   2·67   2·57   2·47   2·36   2·30   2·24
24   5·72   4·32   3·72   3·38   3·15   2·99   2·87   2·78   2·70   2·64   2·54   2·44   2·33   2·27   2·21
25   5·69   4·29   3·69   3·35   3·13   2·97   2·85   2·75   2·68   2·61   2·51   2·41   2·30   2·24   2·18

26   5·66   4·27   3·67   3·33   3·10   2·94   2·82   2·73   2·65   2·59   2·49   2·39   2·28   2·22   2·16
27   5·63   4·24   3·65   3·31   3·08   2·92   2·80   2·71   2·63   2·57   2·47   2·36   2·25   2·19   2·13
28   5·61   4·22   3·63   3·29   3·06   2·90   2·78   2·69   2·61   2·55   2·45   2·34   2·23   2·17   2·11
29   5·59   4·20   3·61   3·27   3·04   2·88   2·76   2·67   2·59   2·53   2·43   2·32   2·21   2·15   2·09
30   5·57   4·18   3·59   3·25   3·03   2·87   2·75   2·65   2·57   2·51   2·41   2·31   2·20   2·14   2·07

40   5·42   4·05   3·46   3·13   2·90   2·74   2·62   2·53   2·45   2·39   2·29   2·18   2·07   2·01   1·94
60   5·29   3·93   3·34   3·01   2·79   2·63   2·51   2·41   2·33   2·27   2·17   2·06   1·94   1·88   1·82
120  5·15   3·80   3·23   2·89   2·67   2·52   2·39   2·30   2·22   2·16   2·05   1·94   1·82   1·76   1·69
∞    5·02   3·69   3·12   2·79   2·57   2·41   2·29   2·19   2·11   2·05   1·94   1·83   1·71   1·64   1·57

Table 4 (continued)

(c) α = 0·05

ν₂ \ ν₁  1      2      3      4      5      6      7      8      9      10     12     15     20     24     30

1    161·45 199·50 215·71 224·58 230·16 233·99 236·77 238·88 240·54 241·88 243·91 245·95 248·01 249·05 250·10
2    18·51  19·00  19·16  19·25  19·30  19·33  19·35  19·37  19·38  19·40  19·41  19·43  19·45  19·45  19·46
3    10·13  9·55   9·28   9·12   9·01   8·94   8·89   8·85   8·81   8·79   8·74   8·70   8·66   8·64   8·62
4    7·71   6·94   6·59   6·39   6·26   6·16   6·09   6·04   6·00   5·96   5·91   5·86   5·80   5·77   5·75
5    6·61   5·79   5·41   5·19   5·05   4·95   4·88   4·82   4·77   4·74   4·68   4·62   4·56   4·53   4·50
6    5·99   5·14   4·76   4·53   4·39   4·28   4·21   4·15   4·10   4·06   4·00   3·94   3·87   3·84   3·81
7    5·59   4·74   4·35   4·12   3·97   3·87   3·79   3·73   3·68   3·64   3·57   3·51   3·44   3·41   3·38
8    5·32   4·46   4·07   3·84   3·69   3·58   3·50   3·44   3·39   3·35   3·28   3·22   3·15   3·12   3·08
9    5·12   4·26   3·86   3·63   3·48   3·37   3·29   3·23   3·18   3·14   3·07   3·01   2·94   2·90   2·86
10   4·96   4·10   3·71   3·48   3·33   3·22   3·14   3·07   3·02   2·98   2·91   2·84   2·77   2·74   2·70
11   4·84   3·98   3·59   3·36   3·20   3·09   3·01   2·95   2·90   2·85   2·79   2·72   2·65   2·61   2·57
12   4·75   3·89   3·49   3·26   3·11   3·00   2·91   2·85   2·80   2·75   2·69   2·62   2·54   2·51   2·47
13   4·67   3·81   3·41   3·18   3·03   2·92   2·83   2·77   2·71   2·67   2·60   2·53   2·46   2·42   2·38
14   4·60   3·74   3·34   3·11   2·96   2·85   2·76   2·70   2·65   2·60   2·53   2·46   2·39   2·35   2·31
15   4·54   3·68   3·29   3·06   2·90   2·79   2·71   2·64   2·59   2·54   2·48   2·40   2·33   2·29   2·25
16   4·49   3·63   3·24   3·01   2·85   2·74   2·66   2·59   2·54   2·49   2·42   2·35   2·28   2·24   2·19
17   4·45   3·59   3·20   2·96   2·81   2·70   2·61   2·55   2·49   2·45   2·38   2·31   2·23   2·19   2·15
18   4·41   3·55   3·16   2·93   2·77   2·66   2·58   2·51   2·46   2·41   2·34   2·27   2·19   2·15   2·11
19   4·38   3·52   3·13   2·90   2·74   2·63   2·54   2·48   2·42   2·38   2·31   2·23   2·16   2·11   2·07
20   4·35   3·49   3·10   2·87   2·71   2·60   2·51   2·45   2·39   2·35   2·28   2·20   2·12   2·08   2·04

21   4·32   3·47   3·07   2·84   2·68   2·57   2·49   2·42   2·37   2·32   2·25   2·18   2·10   2·05   2·01
22   4·30   3·44   3·05   2·82   2·66   2·55   2·46   2·40   2·34   2·30   2·23   2·15   2·07   2·03   1·98
23   4·28   3·42   3·03   2·80   2·64   2·53   2·44   2·37   2·32   2·27   2·20   2·13   2·05   2·01   1·96
24   4·26   3·40   3·01   2·78   2·62   2·51   2·42   2·36   2·30   2·25   2·18   2·11   2·03   1·98   1·94
25   4·24   3·39   2·99   2·76   2·60   2·49   2·40   2·34   2·28   2·24   2·16   2·09   2·01   1·96   1·92
26   4·23   3·37   2·98   2·74   2·59   2·47   2·39   2·32   2·27   2·22   2·15   2·07   1·99   1·95   1·90
27   4·21   3·35   2·96   2·73   2·57   2·46   2·37   2·31   2·25   2·20   2·13   2·06   1·97   1·93   1·88
28   4·20   3·34   2·95   2·71   2·56   2·45   2·36   2·29   2·24   2·19   2·12   2·04   1·96   1·91   1·87
29   4·18   3·33   2·93   2·70   2·55   2·43   2·35   2·28   2·22   2·18   2·10   2·03   1·94   1·90   1·85
30   4·17   3·32   2·92   2·69   2·53   2·42   2·33   2·27   2·21   2·16   2·09   2·01   1·93   1·89   1·84
40   4·08   3·23   2·84   2·61   2·45   2·34   2·25   2·18   2·12   2·08   2·00   1·92   1·84   1·79   1·74
60   4·00   3·15   2·76   2·53   2·37   2·25   2·17   2·10   2·04   1·99   1·92   1·84   1·75   1·70   1·65
120  3·92   3·07   2·68   2·45   2·29   2·18   2·09   2·02   1·96   1·91   1·83   1·75   1·66   1·61   1·55
∞    3·84   3·00   2·60   2·37   2·21   2·10   2·01   1·94   1·88   1·83   1·75   1·67   1·57   1·52   1·46

Table 5 Values of e^(−x)

x     e^(−0·01x)  e^(−0·1x)  e^(−x)        x     e^(−0·01x)  e^(−0·1x)  e^(−x)

0·0   1·0000      1·0000     1·00000


•1 9990 •9900 •90484 5-1 •9503 •6005 •00610
2 •9980 9802 •81873 5-2 •9493 •5945 •00552
3 •9970 •9704 •74082 5-3 •9484 5886 •00499
•4 •9960 •9608 •67032 5-4 •9474 •5827 00452
•5 •9950 ■9512 •60653 5-5 •9465 •5769 •00409

6 •9940 •9418 •54881 5-6 •9455 ■5712 •00370


■7 •9930 •9324 •49659 5-7 •9446 •5655 00335
■8 9920 •9231 •44933 5-8 •9436 •5599 •00303
9 •9910 •9139 •40657 5-9 ■9427 ■5543 •00274
1-0 •9900 •9048 •36788 60 •9418 •5488 •00248

1-1 •9891 •8958 •33287 6-1 •9408 •5434 •00224


12 9881 •8869 •30119 6-2 •9399 •5379 •00203
13 •9871 •8781 •27253 6-3 9389 5326 •00184
1-4 •9861 •8694 •24660 6-4 •9380 •5273 •00166
1-5 •9851 ■8607 •22313 6-5 ■9371 •5220 •00150

1-6 •9841 •8521 •20190 6-6 •9361 •5169 ■00136


1-7 •9831 ■8437 •18268 6-7 9352 •5117 •00123
1-8 •9822 •8353 •16530 6-8 •9343 5066 •00111
1·9  ·9812  ·8270  ·14957    6·9  ·9333  ·5016  ·00101
2·0  ·9802  ·8187  ·13534    7·0  ·9324  ·4966  ·00091

2-1 •9792 •8106 •12246 7-1 ■9315 •4916 •00083


2-2 •9782 8025 •11080 7-2 •9305 •4868 •00075
2-3 •9773 •7945 •10026 7-3 •9296 •4819 •00068
2-4 •9763 •7866 •09072 7-4 9287 •4771 •00061
2-5 •9753 ■7788 •08208 7-5 •9277 •4724 00055

2-6 •9743 •7711 •07427 7-6 9268 •4677 •00050


2-7 •9734 •7634 •06721 7-7 9259 •4630 •00045
28 •9724 •7558 •06081 7-8 ■9250 •4584 •00041
2·9  ·9714  ·7483  ·05502    7·9  ·9240  ·4538  ·00037
3·0  ·9704  ·7408  ·04979    8·0  ·9231  ·4493  ·00034

3-1 9695 •7334 •04505 8-1 •9222 •4449 •00030


3-2 •9685 •7261 •04076 8-2 •9213 •4404 00027
3-3 ■9675 •7189 •03688 8-3 •9204 ■4360 •00025
3-4 •9666 •7118 •03337 8-4 •9194 •4317 00022
3-5 •9656 •7047 •03020 8-5 •9185 •4274 00020
3-6 9646 •6977 •02732 8-6 •9176 •4232 •00018
3-7 9637 •6907 •02472 8-7 •9167 •4190 •00017
3-8 •9627 •6839 •02237 8-8 •9158 •4148 •00015
3·9  ·9618  ·6771  ·02024    8·9  ·9148  ·4107  ·00014
4·0  ·9608  ·6703  ·01832    9·0  ·9139  ·4066  ·00012
4-1 •9598 •6637 •01657 9-1 9130 ■4025 •00011
4-2 •9589 •6570 •01500 9-2 •9121 3985 •00010
4-3 •9579 •6505 •01357 9-3 •9112 •3946 •00009
4-4 •9570 •6440 •01228 9-4 •9103 3906 •00008
4-5 •9560 •6376 ■01111 9-5 •9094 •3867 •00007
4-6 •9550 •6313 •01005 9-6 9085 3829 ■00007
4-7 •9541 •6250 •00910 9-7 •9076 •3791 00006
4-8 ■9531 •6188 •00823 9-8 9066 •3753 •00006
4·9  ·9522  ·6126  ·00745    9·9   ·9057  ·3716  ·00005
5·0  ·9512  ·6065  ·00674    10·0  ·9048  ·3679  ·00005

Table 6 Percentage points of the distribution of the Studentized range

(a) α = 0·01

ν \ c   2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20

1   90·0  135   164   186   202   216   227   237   246   253   260   266   272   277   282   286   290   294   298
2   14·0  19·0  22·3  24·7  26·6  28·2  29·5  30·7  31·7  32·6  33·4  34·1  34·8  35·4  36·0  36·5  37·0  37·5  37·9
3   8·26  10·6  12·2  13·3  14·2  15·0  15·6  16·2  16·7  17·1  17·5  17·9  18·2  18·5  18·8  19·1  19·3  19·5  19·8
4   6·51  8·12  9·17  9·96  10·6  11·1  11·5  11·9  12·3  12·6  12·8  13·1  13·3  13·5  13·7  13·9  14·1  14·2  14·4
5   5·70  6·97  7·80  8·42  8·91  9·32  9·67  9·97  10·24 10·48 10·70 10·89 11·08 11·24 11·40 11·55 11·68 11·81 11·93

6 5-24 633 703 7-56 797 832 8-61 8-87 9-10 930 9-49 9-65 9-81 9-95 10-08 1021 10 32 10 43 10-54
7 4-95 592 6-54 r-01 7-37 768 7-94 8 17 8-37 8-55 871 8-86 900 9-12 9-24 9-35 946 9-55 965
8 4-74 563 620 663 696 724 7-47 768 7-87 803 8 18 8-31 8-44 8-55 8-66 8-76 8-85 894 9-03
9 4-60 5-43 5-96 635 6-66 6-91 7-13 7 32 7-49 7 65 7 78 7-91 803 8-13 823 832 841 849 8-57
10 4-48 5 27 577 6 14 6-43 6-67 687 705 7-21 736 748 7 60 7 71 7-81 7 91 7 99 8-07 8 15 822

11 439 5-14 562 5-97 625 6-48 667 6-84 6-99 713 725 7-36 7-46 7-56 7-65 7 73 781 788 7-95
12 432 5-04 5-50 584 6-10 632 651 6-67 6-81 6-94 706 7 17 726 736 7-44 752 7 59 766 7-73
13 426 496 5-40 5-73 598 6-19 637 6-53 6-67 679 6-90 7-01 7-10 7-19 727 734 7-42 748 7-55
14 421 4-89 532 5-63 588 608 626 6-41 6 54 666 6-77 6-87 696 7-05 7 12 7 20 7 27 733 739
15 4 17 483 525 5-56 580 599 6 16 631 6-44 6-55 666 6 76 684 693 7 00 707 7 14 720 7 26

16 4-13 4-78 5-19 5-49 5-72 5-92 608 622 6-35 6-46 6-56 666 674 682 690 697 703 7-09 7-15
17 4-10 4-74 5-14 5-43 5-66 585 6-01 6-15 6-27 6-38 648 6-57 6-66 673 680 687 694 700 705
18 4-07 470 500 5-38 5-60 579 5-94 6-08 6-20 631 641 6-50 6-58 6-65 6 72 6 79 685 691 696
19 405 467 505 5-33 5-55 5-73 5-89 6-02 6-14 6-25 6-34 6-43 651 658 665 672 678 684 689
20 402 4-64 502 529 5-51 569 584 5-97 6-09 6 19 629 637 645 652 659 665 6 71 6 76 682

24 396 454 4-91 5-17 5-37 5-54 5-69 5-81 5-92 602 6-11 6-19 626 6-33 639 6-45 651 656 6-61
30 3-89 445 480 5-05 524 5-40 5-54 5-65 5-76 585 5-93 6 01 608 6-14 6-20 626 631 636 6-41
40 382 437 470 493 5-11 5-27 530 5-50 5-60 569 5-77 5-84 590 5-96 602 607 6-12 6 17 621
60 3 76 428 4-60 482 499 5 13 525 536 5-45 5 53 560 567 5 73 5 79 584 589 593 5 98 602

120 370 420 4-50 4-71 4-87 5-01 5 12 521 5-30 538 5-44 5-51 556 5-61 566 571 5 75 579 583
∞   3·64  4·12  4·40  4·60  4·76  4·88  4·99  5·08  5·16  5·23  5·29  5·35  5·40  5·45  5·49  5·54  5·57  5·61  5·65

c is the size of the sample from which the range is obtained and v is the number of degrees of freedom of s.
(b) α = 0·05

ν \ c   2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20

1   18·0  27·0  32·8  37·1  40·4  43·1  45·4  47·4  49·1  50·6  52·0  53·2  54·3  55·4  56·3  57·2  58·0  58·8  59·6
2   6·09  8·3   9·8   10·9  11·7  12·4  13·0  13·5  14·0  14·4  14·7  15·1  15·4  15·7  15·9  16·1  16·4  16·6  16·8
3 4-50 5-91 682 7-50 8-04 8-48 8-85 9-18 9-46 972 9-95 1015 10-35 10-52 10 69 10 84 10 98 11-11 11 24
4 393 504 576 629 6-71 705 7-35 760 783 803 821 837 852 866 879 891 9-03 9 13 923
5 364 460 522 5-67 603 633 658 680 690 7 17 7 32 747 760 7 72 7 83 7 93 803 8 12 821

6 3-46 4-34 4-90 5-31 563 589 6-12 6-32 649 665 679 6-92 703 7 14 724 734 743 7-51 7 59
7 334 4 16 468 5-06 5-36 5 61 5-82 600 6 16 630 643 655 666 676 685 6-94 702 709 7 17
8 326 404 453 4-89 5 17 540 5-60 577 592 605 6 18 629 639 648 657 665 673 680 687
9 320 395 442 4-76 502 524 5-43 560 5 74 587 598 609 6 19 628 636 644 651 658 6-64
10 3 15 388 433 4-65 491 5 12 530 546 560 5 72 583 593 603 6-11 620 627 634 640 647

11 3 11 382 4-26 457 482 503 5-20 5-35 5-49 5-61 5-71 581 590 599 606 6 14 620 626 633
12 3-08 377 420 4-51 4’75 495 5-12 527 540 5-51 5-62 5-71 5-80 588 5-95 603 609 6 15 621
13 306 3-73 415 4-45 469 488 5-05 5 19 532 543 553 563 5-71 579 586 593 600 605 6-11
14 3-03 3 70 4-11 4-41 4-64 4-83 4-99 5-13 525 536 5-46 5-55 5-64 572 579 585 592 597 603
15 301 367 408 437 460 478 4-94 5-08 520 531 540 549 558 565 572 579 585 590 596
16 3-00 365 4-05 433 456 4-74 4-90 503 5-15 526 535 5-44 552 5-59 566 572 579 584 590
17 2-98 363 4-02 430 4-52 4-71 4-86 499 5-11 5 21 5-31 539 5-47 5-55 5 61 568 574 5 79 584
18 297 3-61 4-00 428 4-49 4-67 4-82 4-96 5-07 5-17 527 535 543 550 557 5-63 569 574 5-79
19 296 359 3-98 4-25 4-47 4-65 4-79 492 5-04 5-14 523 532 539 5-46 553 559 565 5 70 575
20 295 3 58 396 423 4-45 462 4-77 490 5 01 5-11 520 528 536 5-43 549 555 561 566 5 71

24 292 353 3-90 4-17 4-37 454 4-68 4 81 4-92 5-01 5 10 5-18 525 532 538 5-44 550 554 559
30 289 349 3-84 4-10 430 4-46 4-60 472 4-83 492 500 508 5 15 5-21 527 533 538 543 5-48
40 286 3-44 379 4-04 423 439 4-52 463 4-74 4-82 4-91 498 5-05 5-11 5-16 522 5-27 531 536
60 283 340 374 398 4 16 431 444 455 4-65 473 481 488 494 500 506 5 11 5 16 520 524

120 280 336 369 3-92 4-10 4-24 4-36 448 456 4-64 4-72 478 484 4-90 495 500 5-05 509 5-13
∞   2·77  3·31  3·63  3·86  4·03  4·17  4·29  4·39  4·47  4·55  4·62  4·68  4·74  4·80  4·85  4·89  4·93  4·97  5·01

c is the size of the sample from which the range is obtained and v is the number of degrees of freedom of s.
Table 7 Random numbers

19211 73336 80586 08681 28012 48881 34321 40156 03776 45150
94520 44451 07032 36561 41311 28421 95908 91280 74627 86359
70986 03817 40251 61310 25940 92411 34796 85416 00993 99487
65249 79677 03155 09232 96784 17126 50350 86469 41300 62715
82102 03098 01785 00653 39438 43660 02406 08404 24540 80000

91600 94635 35392 81737 01505 04967 91097 02011 26642 38540
20559 85361 20093 46000 83304 96624 62541 41722 79676 98970
53305 79544 99937 87727 32210 19438 58250 77265 02998 02973
57108 86498 14158 60697 41673 18087 46088 11238 82135 79035
08270 11929 92040 37390 71190 58952 98702 41638 95725 22798

90119 23206 75634 60053 90724 29080 69423 66815 11896 18607
45124 69607 17078 61747 15891 69904 79589 68137 19006 19045
83084 02589 37660 63882 99025 34831 92048 23671 68895 73795
04685 31035 93828 16159 05015 54800 76534 22974 13589 01801
61349 04538 89318 27693 02674 34368 24720 40682 20940 37392

14082 65020 49956 01336 41685 01758 49242 52122 01030 60378
82615 53477 58014 62229 72640 32042 73521 14166 45850 02372
50942 78633 16588 19275 62258 20773 67601 93065 69002 03985
76381 77455 81218 02520 22900 80130 61554 98901 26939 78732
05645 35063 85932 22410 31357 54790 39707 94348 11969 89755

76591 83750 46137 74989 39931 33068 35155 49486 28156 04556
31945 87960 04852 41411 63105 44116 95250 04046 59211 67270
08648 89822 04170 38365 23842 61917 57453 03495 61430 20154
32511 07999 18920 77045 44299 85057 51395 17457 24207 02730
79348 56194 58145 88645 84867 41594 28148 84985 89949 26689

61973 03660 32988 70689 17794 61340 58311 32569 23949 85626
92032 60127 34066 28149 22352 12907 53788 86648 57649 07887
74609 71072 63958 58336 67814 40598 12626 30754 75895 42194
98668 76074 25634 56913 88254 41647 05398 69463 49778 31382
65248 72078 58634 88678 21764 67940 45666 84664 35714 43081

82002 96916 94138 74739 99122 03904 46052 97277 60243 37424
79100 55938 23211 10111 17115 90577 94202 01063 85522 64378
30923 71710 70257 05596 42310 02449 31211 50025 99744 78084
90513 50966 78981 70391 45932 13535 21681 66589 94915 08855
94474 79356 16098 95806 79252 14190 88722 39887 15553 58386

65236 62948 19968 22071 49898 96140 80264 57580 56775 63138
80502 04192 84287 32589 50664 63846 71590 67220 71503 27942
01315 04632 50202 89148 41556 11584 35916 13979 25016 32511
81525 76670 88714 28681 56540 84963 85543 69715 86192 79373
19500 41720 79214 20079 42053 29844 02294 11306 78537 65098

25812 77090 45198 98162 13782 60596 99092 50188 65405 63227
80859 94220 92309 01998 45090 24815 13415 01677 39092
41107 33561 04376 40072 78909 61042 04098 73304 21892 63112
00465 00858 22774 80730 07098 80515 09970 40476 10314 24792
58137 02454 15657 24957 48401 02940 92828 26372 31071 58192

32013 97147 69725 78867 73329 74935 69276 46001 04181 38838
17048 84788 12531 01773 43551 34586 61239 87927 03232 31312
33935 07944 98456 11922 96174 24100 00307 85697 06527 34381
47633 33561 38673 22281 68096 76599 38462 16662 81959 03358
82161 92521 10712 58839 18546 32920 89220 90493 737^5 22327

Table 7 (continued)

99 P50 30876 80821 14955 11495 25666 37656 91874 93051 64664
08090 84688 36332 86858 73763 62534 93378 54809 97076 09077
67619 00352 32735 56954 97851 57350 33068 35393 75938 86086
63779 66008 02516 93878 67930 38445 44166 20168 55128 65337
03259 72119 04797 95593 02754 87120 68167 04455 75318 93127

92914 02066 97320 00328 51685 89729 27446 32599 82486 01718
80001 70542 01530 63033 64348 01306 75419 90348 34717 05147
38715 09824 86504 14817 74434 80450 95086 73824 40550 14266
15987 74578 12779 69608 76893 94840 36853 00568 35697 00783
06193 94893 24598 02714 69670 06153 97835 71087 58193 97912

40134 12803 33942 46660 05681 35209 65980 77899 38988 75580
88480 27598 48458 65369 81066 02000 68719 90488 50062 10428
49989 94369 80429 97152 67032 62342 96496 91274 71264 45271
62089 52111 92190 85413 95362 33400 03488 84666 99974 01459
01675 12741 94334 86069 71353 85566 16632 97577 18708 99550

04529 19798 47711 63262 06316 00287 86718 33705 31645 70615
63895 63087 91886 43467 55559 35912 39429 18933 75931 18924
17709 21642 56384 85699 24310 85043 00405 59820 54228 58645
11727 83872 22553 17012 02849 39794 50662 32647 67676 95488
02838 03160 92864 29385 63585 46055 41356 96398 70904 87103

62210 02385 73776 03171 83842 94602 31540 96071 55024 87629
16825 05535 99451 81864 99410 81211 62781 55121 62268 48522
05985 62766 58215 61900 53065 85082 88200 74393 24100 88379
14184 86400 41788 82932 27183 44744 14964 71718 76499 37364
95315 04537 85490 90542 42519 35659 87983 51941 20420 56828

65578 64820 95644 98074 72032 53443 92722 96373 36030 78053
18444 28477 01846 95805 91166 74383 55926 92971 99743 04905
03577 99361 21047 21971 71191 70493 70210 87051 94715 88924
49752 47015 09472 20089 90924 03674 73181 81104 95411 00656
32489 04936 30628 99512 40891 39832 25101 71757 77503 82112

76548 92824 53738 65890 78297 50705 96792 56841 41063 92875
26545 68726 06476 57444 35455 46706 40388 79728 99747 75076
67651 97346 75509 50270 27943 71144 15397 04565 95265 52236
67879 04880 01478 97239 32611 85024 37275 46399 59303 28341
96329 85824 79954 96263 91873 37394 45728 32769 72930 82361

87421 32587 32890 79171 54734 60628 53702 06741 98558 19167
22447 888^3 21866 39773 26018 28765 01876 03776 51523 89095
79589 92914 06964 43330 01726 30504 24797 52657 44098 22006
92123 79976 31751 68549 06147 38138 58792 80966 59767 24564
85909 35590 89231 75271 34409 48770 08980 54457 26022 29742

43162 44793 39006 76661 02000 14571 73986 96351 02276 47746
47549 41709 52412 40595 40397 38883 20843 90121 74897 96286
71711 75690 50441 41322 16497 36962 88880 45374 29836 82096
51091 24078 13706 27315 69918 06628 99964 09477 59496 90825
94981 73799 35590 58944 36581 94509 17508 31203 97030 28541

23778 02351 44843 28005 63835 69611 91360 20756 70188 02554
36324 01285 47959 40386 10284 03089 95441 77955 70381 04689
31710 55804 18079 15172 27321 93535 81303 97488 94531 61924
84106 55010 57902 09150 59719 52718 96632 22555 72411 85957
27527 60618 02688 95261 20022 88691 20488 93189 33658 49237

Appendix C
Further reading

This appendix provides a list of general reference books on statistics,
on operations research and on some other related topics which have not
been covered in this book.

General statistics reference texts


The following selection of books is intended to act as a complement to
this book, by providing an alternative, more extended, view of the
topics discussed herein.

Probability
Chapter 3 of this book provides an introduction to probability theory.
Those readers who wish to pursue this topic will find a more complete
treatment of it in many books, including the following:
Chung, K. L. (1979), Elementary Probability Theory with Stochastic
Processes, 3rd edn, Springer-Verlag.
Feller, W. (1968), An Introduction to Probability Theory and its
Applications, vol. 1, 3rd edn, Wiley.
Hoel, P. G., Port, S. C. and Stone, C. J. (1971), Introduction to
Probability Theory, Houghton Mifflin.
Meyer, P. L. (1970), Introductory Probability and Statistical Applications,
2nd edn, Addison-Wesley.

Statistical methods
Statistical methods are discussed in many books, ranging in quality
from the very good to the not-so-good. Some of the older texts give
more detail regarding computation than is necessary today. The
following list gives some of the better applied texts. The author has
found the comprehensive book by Snedecor and Cochran particularly
useful. The books edited by Davies, and by Davies and Goldsmith, are
especially relevant to problems in the chemical industry.
Bowker, A. H. and Lieberman, G. J. (1972), Engineering Statistics, 2nd
edn, Prentice-Hall.

Box, G. E. P., Hunter, W. G. and Hunter, J. S. (1978), Statistics for
Experimenters, Wiley.
Davies, O. L. (ed.) (1956), Design and Analysis of Industrial Experiments,
2nd edn, Oliver and Boyd.
Davies, O. L. and Goldsmith, P. L. (eds) (1972), Statistical Methods in
Research and Production, 4th edn, Oliver and Boyd.
Snedecor, G. W. and Cochran, W. G. (1980), Statistical Methods, 7th
edn, Iowa State University Press.
Stoodley, K. D. C., Lewis, T. and Stainton, C. L. S. (1980), Applied
Statistical Techniques, Ellis Horwood.
Wetherill, G. B. (1981), Intermediate Statistical Methods, Chapman and
Hall.
Wine, R. L. (1964), Statistics for Scientists and Engineers, Prentice-Hall.

Statistical theory
There are many books on statistical theory which provide a more
rigorous treatment of statistics than that attempted in this book. These
books include the following:
Hoel, P. G. (1984), Introduction to Mathematical Statistics, 5th edn,
Wiley.
Lindgren, B. W. (1976), Statistical Theory, 3rd edn, Macmillan.

Statistical tables
Most text books provide the more commonly used statistical tables
(see Appendix B), but there are many other sets of tables which are
readily available in libraries. Some of these books of tables are for one
specific purpose (e.g. Tables of the Binomial probability distribution),
while others are collections of different tables, which include the
following:
Fisher, R. A. and Yates, F. (1963), Statistical Tables for Biological,
Agricultural and Medical Research, 6th edn, Oliver and Boyd.
Neave, H. R. (1978), Statistics Tables, George Allen and Unwin.
Pearson, E. S. and Hartley, H. O. (1966), Biometrika Tables for
Statisticians, vol. 1, 3rd edn, Cambridge University Press.

Operations research
In industrial and commercial organizations, many decisions have to be
made in the presence of competing demands and of uncertainty.
Operations research is an approach to problem-solving which helps in
choosing an effective course of action. The general approach is as

follows. First, an approximate mathematical model of the system
under study is developed. Then a measure of the effectiveness of the
system has to be chosen. Finally, the system variables are varied in the
model until an ‘optimum’ solution to the problem is found.
Both statisticians and technologists will find it useful to know
something about operations research. In addition to a knowledge of
probability and statistics, the study of operations research involves a
variety of other topics, particularly optimization techniques. Linear
programming is a mathematical technique for maximizing (or
minimizing) a linear function of several variables, when these variables
are subject to linear constraints. This technique is useful, for
example, in allocating limited resources to competing demands in the
most effective way. Non-linear programming and dynamic
programming are two other optimization methods.
Simulation is another important technique used in operations
research. This involves the construction of a mathematical model of
the relevant portion of the system under study. Then the model is used
to examine the effect of altering the variables which describe the
system without having to make costly changes to the real system. These
experiments are usually carried out on a computer.
Other topics of interest to the operations researcher include stock
control (or inventory control), queueing theory (see Section 5.6.3),
critical path scheduling, and game theory.
The following books are general texts on operations research:
Hillier, F. S. and Lieberman, G. J. (1980), Introduction to Operations
Research, 3rd edn, Holden-Day.
Taha, H. A. (1979), Operations Research - An Introduction, 2nd edn,
Prentice-Hall.
Wagner, H. M. (1975), Principles of Operational Research, 2nd edn,
Prentice-Hall.
The following books deal with specialized aspects of operations
research:
Bazaraa, M. S. and Shetty, C. M. (1979), Non-linear Programming -
Theory and Algorithms, Wiley.
Cooper, L. and Cooper, M. W. (1981), Introduction to Dynamic
Programming, Pergamon.
Cox, D. R. and Smith, W. L. (1961), Queues, Chapman and Hall.
Hastings, N. A. J. (1973), Dynamic Programming with Management
Applications, Butterworth.
Kolman, B. and Beck, R. E. (1980), Elementary Linear Programming with
Applications, Academic Press.

Lewis, C. D. (1970), Scientific Inventory Control, Butterworth.
Morgan, B. J. T. (1983), Elements of Simulation, Chapman and Hall (to be
published).
Price, W. L. (1971), Graphs and Networks: An Introduction, Butterworth.
Williams, H. P. (1978), Model-Building in Mathematical Programming,
Wiley.

Some other topics


A number of statistical topics, of interest to scientists and engineers,
have not hitherto been mentioned. Data are often collected sequentially
in time, giving rise to what are called time series. An introduction
to time-series analysis is given elsewhere by this author. The books by
Jenkins and Watts (1968) and by Bloomfield (1976) are particularly
concerned with the frequency characteristics of time series. There are
many books on the related topic of signal processing.

Bendat, J. S. and Piersol, A. G. (1986), Random Data, Revised 2nd
edn, Wiley.
Bloomfield, P. (1976), Fourier Analysis of Time Series: An Introduction,
Wiley.
Chatfield, C. (1989), The Analysis of Time Series: An Introduction, 4th
edn, Chapman and Hall.
Jenkins, G. M. and Watts, D. G. (1968), Spectral Analysis and its
Applications, Holden-Day.

The study of time series, and also of control theory (see Section
12.11), requires a knowledge of stochastic processes, which are physical
processes whose structure involves a random mechanism.

Cox, D. R. and Miller, H. D. (1965), The Theory of Stochastic Processes,
Wiley.

Another group of statistical methods, called multivariate analysis, is
increasingly used in all branches of science. These methods are
appropriate when measurements are made on several variables for
each of a large number of individuals or objects. Cluster analysis is the
term used to describe a number of techniques which are used to
subdivide the individuals into groups or clusters. The methods assume
no underlying model and are of an exploratory nature. Discriminant
analysis is the term used for allocating the individuals to one group of a
given set of groups by means of a decision rule based on a linear

function of the data. The linear function is derived from data for a set
of individuals where it is known which group each individual belongs
to. Principal component analysis makes a linear transformation of the
variables to a new uncorrelated set of variables which are obtained in
order of decreasing importance from first to last, the aim being to
reduce the dimensionality of the situation. Factor analysis has a similar
aim but assumes an underlying model with a specified number of
factors which are linear combinations of the original variables.
Multidimensional scaling is also concerned with reducing the number of
dimensions but is appropriate when the data arise directly or indirectly
in the form of ‘distances’ between the individuals. This technique can
cope with situations where the ‘distances’ are calculated from ordinal
data. Multivariate methods can be helpful but need to be used with
caution. For example factor analysis, in the author’s experience, rarely
gives results which can be interpreted in a useful way and is widely
misused in the social sciences.
An introduction to multivariate analysis is given in the following
books:

Chatfield, C. and Collins, A. J. (1980), Introduction to Multivariate
Analysis, Chapman and Hall.
Kendall, Sir Maurice (1980), Multivariate Analysis, 2nd edn, Griffin.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979), Multivariate Analysis,
Academic Press.

Appendix D
Some additional topics

This appendix considers a few topics which were omitted from the first
edition of this book. Most of them come broadly under the heading of
‘Descriptive Statistics’ and extend the work of Chapter 2. First, we
show how to calculate the mean and standard deviation of a frequency
distribution. Second, we discuss the relationship between sample
range and sample standard deviation, and its help in interpreting the
sample standard deviation as a measure of variability. Third, we
discuss how to round off numbers. Fourth, we introduce stem-and-leaf
and box plots.
The final topic discussed in this appendix is that of estimating and
testing a proportion, which extends the work of Chapters 6 and 7.

D.1 Calculating the mean and standard deviation of a frequency
distribution
In Section 2.3, we showed how to calculate the mean, variance and
standard deviation of a set of observations, denoted by x₁, . . ., xₙ. The
sample mean is given by

x̄ = Σᵢ₌₁ⁿ xᵢ/n    (D.1)

and the sample variance by

s² = Σᵢ₌₁ⁿ (xᵢ − x̄)²/(n − 1)    (D.2)

Now data are often presented in the form of a frequency distribution.
Then the above equations need to be modified. The calculation of the
mean of a frequency distribution was considered briefly on p. 29. If the
values x₁, . . ., x_N occur with respective frequencies f₁, . . ., f_N, then
the sample mean is given by

x̄ = Σᵢ₌₁ᴺ fᵢxᵢ / Σᵢ₌₁ᴺ fᵢ    (D.3)

Note that the summations in (D.3) are from 1 to capital N, and not
lower case n as in (D.1). The denominator in (D.3) is of course the
sample size, n.
It is easy to show that the sample variance of a frequency distribution
is given by

s² = Σᵢ₌₁ᴺ fᵢ(xᵢ − x̄)²/(n − 1)    (D.4)
   = [Σᵢ₌₁ᴺ fᵢxᵢ² − nx̄²]/(n − 1)    (D.5)

where

n = Σᵢ₌₁ᴺ fᵢ.

In the discrete case, x̄ and s² can be calculated by adding two extra
columns to the frequency distribution table to form the sums Σ fᵢxᵢ and
Σ fᵢxᵢ², as illustrated using the data of Table 1, Chapter 2. Here there
are N = 5 values which can occur, namely 0, 1, 2, 3 and 4 cosmic
particles. The computations are illustrated in Table D.1.

Table D.1.

Number of
cosmic particles   Frequency
xᵢ                 fᵢ        fᵢxᵢ    fᵢxᵢ²

0                  13         0       0
1                  13        13      13
2                   8        16      32
3                   5        15      45
4                   1         4      16

Totals             40        48     106

Then

x̄ = 48/40 = 1·2

s² = (106 − 40 × 1·2²)/39
   = 1·24

s = 1·1.

In the continuous case, x̄ and s² are calculated by assuming that all
the observations in a class interval fall at the mid-point of the interval.
As in the discrete case, two extra columns are added to the frequency
table to calculate the sums Σ fᵢxᵢ and Σ fᵢxᵢ², where {xᵢ} are now the class
marks. However this is usually achieved by employing a technique
called 'coding', which is an extension of the technique described in
Exercise 3 of Chapter 2.
First the mean of the given frequency distribution is guessed by eye.
The mid-point of the selected class interval is coded zero. If the class
intervals are all of equal width, then the mid-points of the two
neighbouring class intervals are coded +1 and −1 respectively.
Subsequent class intervals are coded ±2 and so on. Then the mean and
standard deviation of the coded values are calculated. From these it is
easy to see that the mean and standard deviation of the original data
can be calculated by the following formulae:
mean = (mid-point of the class interval which is coded zero)
+ (coded mean × class interval width),
standard deviation = coded standard deviation × class interval width.
Note that variance = coded variance × (class interval width)². Also
note that the above formulae assume the class intervals are of equal
width. If they are not, then the coding technique needs to be modified
in an 'obvious' way by letting the most common class width correspond
to a code change of unity.
As an example let us consider the height data of Example 1, Chapter
2 (see p. 22). The distribution is roughly symmetric about the centre
class interval of (66-68) in. We therefore guess 67 in to be the trial
mean and code it zero. The mid-points of the other class intervals are
coded as shown in Table D.2, as the class intervals are of equal width.
Then two columns are needed to calculate Σ r fᵣ and Σ r² fᵣ, where fᵣ is
the frequency in the class interval whose mid-point is coded with the
value r. The coded mean is given by

Σ r fᵣ / Σ fᵣ = 21/100
             = 0·21
Now the class interval is the distance between successive class marks,
which is three inches in this case. Thus

mean of original data = 67 + (0·21 × 3)
                      = 67·6 in.

The coded variance, on rewriting (D.5), is given by

s²(coded) = [105 − (21)²/100]/99
          = 1·02.

Thus the coded standard deviation is √1·02 = 1·01. Then the standard
deviation of the original data is given by (1·01 × 3) = 3·0 in.
Note that, although we have carried two decimal places in the
calculations, there is no point in presenting the final values for the
mean and standard deviation to more than one decimal place, as the
original data were measured to the nearest inch. (See the remarks in
Section D.3 on rounding.)

Table D.2.

Height     Code   Frequency
(inches)   r      fᵣ          rfᵣ    r²fᵣ

60-62      −2      6          −12    24
63-65      −1     15          −15    15
66-68       0     40            0     0
69-71      +1     30           30    30
72-74      +2      9           18    36

Totals            100          21   105

D.2 Interpretation of the sample standard deviation


The sample standard deviation, s, is a measure of the spread or
variability of a set of data. At first, students may have difficulty in
understanding this statistic, which is the root-mean-square deviation
of the observations about their mean.
First we repeat that s is measured in the same units as the obser-
vations, whereas the variance, s², is not. Thus the variance is usually
not useful directly as a descriptive measure. However, it turns out that
it is usually easier from a theoretical and computational point of view
to work with variances and take the square root at the end of the
computations to get a standard deviation.
Now in calculating descriptive statistics, such as the mean and
standard deviation, one should always have some idea of the likely
magnitude of the results, so that one can see if the calculated values

look reasonable. This gives a check against arithmetical slips. Most
people can guess the mean of a set of data fairly accurately, but may
have trouble assessing the standard deviation. One way to do this is to
look at the range.
In Chapter 2 we pointed out that the range tends to get larger as the
sample size increases so that it does not provide a reliable measure of
spread by itself, particularly as it depends on just two outlying values.
However, it does provide a quick guide to the likely value of the
standard deviation. In particular, s must always be less than the range.
For a distribution which is roughly symmetric, the following guide
can be used:

s ≈ range/√n   for n < about 12
s ≈ range/4    for 20 < n < 40
s ≈ range/5    for n about 100
s ≈ range/6    for n above 400.

where the symbol ≈ means 'is approximately equal to'. If, for
example, the sample size is 15, then interpolating in the above guide,
we have

s ≈ range/3·7

For the height data considered in Example 1 of Chapter 2, and in
Section D.1, the range is 14 inches (from 60 to 74 inches) and the
sample size is 100. Thus using the above guide we expect to find s
about 14/5 = 2·8 in. The exact value turned out to be 3·0 in.
For distributions which are skewed, s may be a somewhat smaller
proportion of the range.

D.3 How to round numbers


Consider the following problems:

Problem 1
The Annual Report of a certain Building Society shows that their
assets rose from £697 432 057 in 1975 to £723 486 872 in 1976. Is it
really possible that a Building Society can specify its assets correctly to
the nearest pound, and would it not, in any case, be clearer to round the
numbers off in some way?

Problem 2
Seven observations are made on the time to failure, in hours, of a
certain type of machine. The results are as follows:
104-6, 152-3, 146-5, 130-7, 163-9, 115-6, 121-8.
The mean was calculated on a pocket calculator and was found to be

x̄ = 133·62857

as shown on the display. How many of the decimal places are worth
recording?
These two examples highlight a topic which gives trouble to
many students and also to more advanced workers. That is the
question of deciding how many significant digits are needed for any
given number. This problem has grown worse in recent years with the
advent of pocket calculators, as these usually display far more digits
than are actually required. Yet surprisingly few statistics text-books
give any guidance on how to round off numbers. One exception is the
book by A. S. C. Ehrenberg (Data Reduction, 1975, Wiley), whose
work we draw on here.
The rounding problem may arise at three separate stages of a
statistical investigation, namely:
(a) Collecting the data. How many digits should be recorded?
(b) Carrying out the calculations. How many digits need to be ‘carried’
in order to get the results to a specific accuracy?
(c) Presenting the results. How many digits need to be given when
summarizing the analysis of the results?
When presenting data, Ehrenberg (Journal of the Royal Statistical
Society, Series A, 1977, p. 277) has stressed the need to give clear
summary tables and in particular has suggested the two-variable-digits
rule for rounding numbers. This says that data in general and summary
statistics in particular should be rounded to two variable digits, where a
variable digit is defined as one which varies in the kind of data under
consideration.
Let us see how this rule works in practice. Consider the number
181-633. This has six significant figures and three decimal places. But
how many variable digits does it contain? In fact it is impossible to
answer this question in isolation. Rather we must look at the other
observations in the sample. Suppose we have a sample consisting of the
following five observations:
181-633, 182-796, 189-132, 186-239, 191-151.

Here the first digit is always ‘1’, and so is not a variable digit. The
second digit is ‘8’ in four cases and ‘9 ’ in the fifth observation.
Although it varies a little, it does not vary over the whole range from ‘0 ’
to ‘9’ and so will not be counted as a variable digit. But the remaining 4
digits in each number could be anything from ‘0’ to ‘9’ and so are
variable digits. Using the two-variable-digit rule, this suggests that the
mean value should be given with just one decimal place, even though
the original data are recorded to three decimal places. Given the
variability in the data (a range of about 10), the two-variable-digit rule
is effectively saying that the digits in the second and third decimal
places give us very little information as we are unlikely to be interested
in variation of smaller magnitude than the range divided by 100.
Applying the rule to Problem 1, the first digit of the Building
Society’s assets would probably not be regarded as a variable digit as it
will change slowly over the years, but all subsequent digits are variable.
This suggests that the numbers should be rounded to the nearest
million pounds for presentation in the annual report. This is probably
what the reader did in the first place when he looked at the figures.
Admittedly some people may find the ‘exact’ figures more comforting,
but one should realise that it is highly unlikely that they are exactly
correct anyway. In addition, since nearly everyone reading the report
will mentally round the figures, it seems sensible to present the data in
rounded format to start with as this makes it easier for the reader. The
original ‘exact’ figures can be stored just in case they are needed.
Applying the two-variable-digit rule to Problem 2, we see that the
first digit is fixed, while the second varies between ‘0’ and ‘6’. The third
and fourth digits are certainly variable, but it is a matter of subjective
judgement as to whether the second digit is regarded as variable or not.
The mean should therefore be given with either zero or one decimal
place depending on one’s judgement. In the matter of rounding, a
degree of subjective judgement is inevitable and all guidelines will
have exceptions. Indeed some people may argue that it is impossible to
give general rules for rounding as there will always be occasional
exceptions. But surely it is better to have general guidelines, which one
can apply with judgement, than to have no guidelines at all. In problem
2 many people will feel ‘safer’ if they give the mean to one decimal
place, but, as we see below, a strong case can be made for rounding the
mean to the nearest integer.
The two-variable digit rule can also be applied to probabilities, as
these should normally be quoted to two decimal places. Occasionally,
additional digits may be desirable, as for example for probabilities

close to 0 or 1 when one may wish to see more accurately how close the
values are. Correlation coefficients should also usually be quoted to
two decimal places.
One situation where the two-variable-digit rule does not apply
concerns the presentation of test statistics such as t-values, F-ratios
and χ²-values. There is no point in giving more than one or two
decimal places as this is the accuracy of the published tables of
percentage points. The observed test statistic is usually compared with
the tabulated percentage points to get an approximate level of
significance.
Having dealt with the presentation of data, let us now consider the
collection of data. I suggest that the two-variable-digit rule can also be
generally (though not always) used. Consider again the sample:

181-633, 182-796, 189-132, 186-239, 191-151.

We have already suggested that the mean of the data should be given to
one decimal place. If the numbers were rounded to one decimal place
before carrying out any calculations, one would lose virtually no
information. In addition the data would be somewhat easier to handle
and much easier to understand if presented in the form:

181-6, 182-8, 189-1, 186-2, 191-2.

Consider again the data of Problem 2. It is arguable here if one wants
zero or one decimal place. If we round the data to whole numbers, they
become:
105, 152, 146, 131, 164, 116, 122

It is now much easier to understand the data as mental arithmetic finds
two digits much easier to handle than three. We can now easily see
which number is highest, which is lowest, and how the data spread
round the mean. The mean of the rounded data is 133-7 instead of
133-6 for the original data, but both are 134 to the nearest integer. The
loss in accuracy is negligible when one considers the width of a
confidence interval for the mean. If the mean is presented to one
decimal place, most people will mentally round it anyway to 134. This
suggests that it may well be adequate to round both the data and their
mean to the nearest integer.
Third we consider carrying out the calculations. Here the two-
variable-digit rule does not apply. If, for example, one wants to

calculate the standard deviation of a sample correct to one decimal
place, then one needs to calculate the sample variance correct to two
decimal places. In other words, we will generally need extra working
digits, so that we can round off accurately at the end of the calculations.
This does not mean that one will arbitrarily add half-a-dozen digits ‘to
be on the safe side’, but rather consider the computational aspects of
each problem as in Example 2 below. Usually one or two extra
significant digits will be adequate.
Let us ‘round off’ this section with two more examples.

Example 1
The cosmic particle data of Example 2, Chapter 1, consists of integers
between 0 and 4. Here there is only one recorded variable digit. The
mean and standard deviation need to be calculated to one decimal
place in order to give two-variable-digit accuracy.

Example 2
Consider a more sophisticated type of analysis, such as the regression
example on page 170. Here we want to estimate the two coefficients of
the regression line with appropriate accuracy. Although we want to see
how many significant digits are required, this is not really a rounding
problem but rather a numerical analysis problem. Nevertheless, it is
convenient to consider it here. The approximate argument presented
below is intended to clarify the general approach rather than be precise
numerical analysis.
The response variable, y, is recorded to two decimal places in the
data and contains less than two variable digits, so we want the fitted
regression line to give an estimate of the mean value of y which is
correct to at least two decimal places. The regression line has two
components: the intercept, â₀, and 'slope (â₁) × controlled variable
(x)'. We require each component to be accurate to three decimal places
in order that the sum is accurate to at least two decimal places. The
intercept, â₀, should therefore be calculated to three decimal places.
Now values of the controlled variable, x, range as high as 100. So we
need to calculate the slope correct to five decimal places in order that
its product with x is accurate to three decimal places. The slope itself is
a ratio of a sum of products and a sum of squares (see Equation 8.2). As
the denominator is exactly 3500, the numerator need only be cal-
culated correct to two decimal places. In fact the numerator is exactly
7·9, so there is only one decimal place.

D.4 Stem-and-leaf plots and box plots

In this section we briefly describe two methods of plotting data which
are currently fashionable. Both plots can be used to display the
distribution of an observed variable.
The stem-and-leaf plot has similarities to the histogram but arguably
contains more information although it looks more like a table than a
plot. We demonstrate its use by means of an example and will use the
thrust data of Example 1, Chapter 1.
To get a stem-and-leaf plot, we first assess the range of the data,
which for the thrust data is (1014·5 − 989·4) = 25·1. We divide this
range into intervals of fixed length in the same sort of way that we did
for the histogram, where in Figure 4, Chapter 2, we selected a class
interval of width 3. This meant that the data were spread over ten class
intervals. For a stem-and-leaf plot we normally have to make the class
interval either 0·5, 1 or 2 times a power of 10, which is something of a
restriction in trying to get between about 6 and 15 intervals. Here we
choose a class interval of width 5 (= 0·5 × 10). We then draw a vertical
line, called the stem, on the left of which we mark the interval
boundaries in increasing order noting only those digits which are
common to the observations within the interval. For example there are
two intervals starting with 99, the first being observations between 990
and 995 and the second between 995 and 1000. No overlap is allowed
and the user must decide how to allocate the interval points. For
example, we decide here to allocate 995·0 (if it occurs) to the upper
interval. Then we go through the observations one by one, noting
down the next significant digit in each observation on the right-hand
side of the stem. These digits constitute what are called the leaves of the
plot. For example, the first observation, 999·1, goes into the third line of
Table D.3 as it is in the higher of the two intervals starting with 99. The
third digit of the number is written in the leaf and in this case the
fourth digit is ignored.
After all the observations have been entered, a final version of the
plot can be found by ordering the digits within each class interval. The
stem-and-leaf plot then looks like a histogram on its side. However, we
also have information about the observations within each class
interval, which can be very useful. For example, it is easy to find the
median of a set of data from a stem-and-leaf plot by picking out the
middle observation as the data are in ascending order.
There are many variations on the stem-and-leaf plot and the reader
is referred, for example, to Erickson and Nosanchuck (1977) and
McNeil (1977).

Table D.3
A stem-and-leaf plot of the thrust data of Example 1, Chapter 1

 98 | 9 9
 99 | 3 2 4
 99 | 9 9 6 5 8 7 6 8 9
100 | 3 2 0 3 2 1 4 0 2 0
100 | 6 8 6 7
101 | 2 4

THE STEM: units = 10 (newtons × 10^)    THE LEAVES: units = 1 (newtons × 10^)

The box plot is another method of plotting a set of observations. It is
particularly useful for comparing several groups of observations, by
constructing a box plot for each group. One normally wants at least
ten observations in each group to get reasonable plots.
For each group we find the largest and smallest observations. We
also find the median and the upper and lower quartiles. The lower
quartile is the value below which ¼ of the observations lie, while the
upper quartile is the value above which ¼ of the observations lie. The
distance between the upper and lower quartiles is called the inter-
quartile range.
The box plot is constructed by drawing a rectangular box whose
length is equal to the inter-quartile range and which is divided into two
parts at the median. From each end of the box, a line is drawn to the
highest and lowest observations in the group. These lines are
sometimes called whiskers, hence the alternative name box-and-
whisker plot.
As an example consider the sample consisting of the ten
observations:
55, 50, 80, 60, 70, 75, 40, 45, 80, 70.
Arranging them in ascending order of magnitude we have:
40, 45, 50, 55, 60, 70, 70, 75, 80, 80.
With a large sample, the best way to do this ordering is to construct
a stem-and-leaf plot.
The median is the average of the 5th and 6th observations, namely
65. The lower quartile is the average of the second and third


observations, namely 47·5. The upper quartile is 77·5. The box plot is
then as shown in Figure D.1.

Figure D.1 A box plot (horizontal scale from 40 to 80 units)
This plot demonstrates that the distribution may have rather short
‘tails’ in that the inter-quartile range is over half the range. The real
value of the plot is in comparing several groups. The sample given
above is the first of four groups given in Example 1 of Chapter 10
where we compared steel wire made by four manufacturers A, B, C and
D. The four box plots are shown in Figure D.2.

Figure D.2 Four box plots to compare the four manufacturers of steel wire
(horizontal axis: strength of wire, from 40 to 110)

The box plots demonstrate clearly that manufacturers B and D
generally produce stronger wire than manufacturer A (and C?) and this
result has in fact already been confirmed by the use of ANOVA and
least significant differences in Chapter 10.
The stem-and-leaf plot and the box plot were both introduced by
Tukey (1977) as part of an approach to data called ‘exploratory data
analysis’. This has much in common with ‘preliminary data analysis’ as

outlined in Appendix E.3. However, Tukey’s book contains much
material which is hard to follow or which seems too complicated for a
preliminary analysis. The reader is therefore advised to consult
Erickson and Nosanchuck (1977), McNeil (1977) or Tukey and
Mosteller (1977) for further details of these methods.
References
Erickson, B. H. and Nosanchuck, T. A. (1977), Understanding Data,
McGraw-Hill Ryerson.
McNeil, D. R. (1977), Interactive Data Analysis, Wiley.
Tukey, J. W. (1977), Exploratory Data Analysis, Addison-Wesley.
Tukey, J. W. and Mosteller, F. (1977), Data Analysis and Regression,
Addison-Wesley.

D.5 Estimating and testing a proportion


A common problem is that of estimating the proportion of items in a
population which have a certain characteristic. For example we may be
interested in the proportion of items in a batch which are defective, or
the proportion of adults in the British electorate who say they intend to
vote Conservative at the next election.
Let us denote the population proportion by p. In order to estimate p,
we take a random sample, size n, and see how many items in the sample
have the characteristic of interest. If the observed number of items
with the given characteristic is denoted by x, then the sample
proportion, x/n, is the obvious point estimate of p.
To get a confidence interval for p, we note that if repeated samples of
size n are taken, the number of items with the given characteristic will
be a random variable. If we denote this random variable by X , then we
see that X will have a binomial distribution with parameters n and p
(see Chapter 4). If n is reasonably large, then we can approximate this
distribution by a normal distribution with the same mean, μ = np, and
the same variance, σ² = np(1 − p) (see Section 5.4). Thus, standardiz-
ing, we see that (X − np)/√[np(1 − p)] will approximately follow a
standard normal distribution. But

(X − np)/√[np(1 − p)] = (X/n − p)/√[p(1 − p)/n].

Thus (X/n − p)/√[p(1 − p)/n] also approximately follows a standard
normal distribution.
By a similar argument to that used on p. 126 to derive a confidence
interval for a mean, we find that a 100(1 − α)% confidence interval for
p is given approximately by

x/n ± z_{α/2} √[p(1 − p)/n].

A difficulty with this formula is that the standard error of x/n, namely
√[p(1 − p)/n], involves the quantity p which we are trying to estimate.
For p not close to 0 or 1, we usually replace p by its point estimate, x/n,
to give the confidence interval

x/n ± z_{α/2} √[x(n − x)/n³].

Example
In a random sample of 400 British voters, 160 people say that they
intend to vote Conservative at the next election. Find a 95%
confidence interval for the proportion in the whole population.
The sample proportion is 160/400 = 0·4.
Its estimated standard error is √(160 × 240/400³) ≈ 0·025.
As n is large and p is not close to 0 or 1, we can use the normal
approximation. Thus an approximate 95% confidence interval for p is
given by

0·4 ± 1·96 × 0·025.

The interval is approximately 0·35 to 0·45.
In order to test the hypothesis that p takes a particular value, say po,
we again use the normal approximation if n is large. The test statistic is
given by

z = (x − np₀)/√[np₀(1 − p₀)].
It can easily be shown that, if the null hypothesis is true, then z will
have a standard normal distribution. It is now straightforward to carry
out an appropriate test, depending on whether a one- or two-tailed
alternative is specified (see Chapter 7).
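
Continuing the sketch above, the test statistic is equally easy to
compute; the hypothesised value p₀ = 0.5 used here is invented purely
for illustration.

    # Test H0: p = p0 against a two-tailed alternative, using the same
    # sample as before (x = 160, n = 400) and an illustrative p0 = 0.5.
    import math

    x, n, p0 = 160, 400, 0.5
    z = (x - n * p0) / math.sqrt(n * p0 * (1 - p0))
    print(f"z = {z:.2f}")  # z = -4.00, well beyond ±1.96: reject H0 at 5%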

Appendix E
Some general comments on
tackling statistical problems

The reader who has studied the theory and methods described in this
book will now hope to be ready to tackle real statistical problems.
However, the reader will soon discover that life is not as simple as it
may appear in an introductory statistics course, where students are
usually shown only how to apply standard techniques to prepared
data. While such exercises are an essential step in learning statistics,
they do not prepare the student for the possibility that data may not
arrive in a neatly-arranged form with precise instructions as to the
appropriate form of analysis. In practice the data are often ‘messier’
than one would like, and the appropriate method of analysis may not
be obvious. Now that the computer has taken away much of the
drudgery in applying statistical techniques, the analyst can afford to
concentrate more on general strategy, such as selecting the most
appropriate method of analysis.
In this final Appendix, I will try to make some general comments on
how to tackle statistical problems. Of course experience is the best
teacher, but I hope that these remarks will be helpful. I also mention
topics such as ‘Presenting the Results’, which, while indispensable, are
often not covered in conventional statistics courses - see Chatfield (1988)
for more detailed coverage.

E.l Preliminary questions


The first step in any statistical investigation is to clarify the objectives
(see Section 9.1). These are often unclear, and may turn out to be
different from those initially suggested. Giving the right answer to the
wrong question is a more common error than might be expected.
At the preliminary stage it is most important to ask questions of
everyone else involved in the project, not only to clarify the objectives
but also to see if there is any other relevant external information.
The possible use of prior information should be investigated. If the
researcher is unaware of prior information, a literature search should

be carried out in the library. When similar studies have been done
before, a new study may be unnecessary or alternatively may be useful
so that established results can be generalized or updated.
The question of costs must be borne in mind at all stages.

E.2 Collecting the data


It is essential to ensure that the data we collect are reliable, accurate
and representative. The observations generally constitute a sample
from the population consisting of all possible outcomes which could be
observed. A sample is taken because it is too expensive and time-
consuming to take all possible measurements or because the
population is infinite anyway. Provided that the sample is genuinely
representative of the population, the analysis of the sample should
enable us to make inferences about the population.
The general principles of designing experiments have been discussed
in Chapters 10 and 11. The other main method of collecting data is by
means of sample surveys. These have not been discussed in this book
because of its technological bias, and the reader is referred, for example,
to Moser and Kalton (1971). Nevertheless, most of us are familiar with
sample surveys through activities such as opinion polls and market
research. In contrast to experimental designs, we simply observe what
is going on, and the sample selected must be representative of the
population as a whole. Random samples, where every member of the
population has an equal chance of being selected in the sample, are
generally preferable, although another type of sampling, called quota
sampling, is also widely used. The novice is often surprised that one
can make predictions about several million people from a sample size
of perhaps one thousand. Yet this can be done by selecting a random
sample. An obvious analogy is to tasting soup. When the soup is being
cooked, one gives it a vigorous stir and takes a tiny teaspoonful which
one knows will be representative of the ‘population’ because of the
‘randomizing’ effect of the stir.
When data have already been collected by other scientists, it is
essential to examine closely the data-collection procedure to see if the
data are likely to be reliable. If not, there is little point in spending
much, if any, effort on the data analysis. Perhaps the commonest
failing in data collection is a lack of randomization and this can have
an effect in all sorts of unforeseen ways. To take just one example,
suppose a scientist carries out a series of experiments sequentially
through time by gradually increasing the value of an explanatory
variable and measuring the corresponding value of a response variable.

Then, because the order of the experiments has not been randomized,
the effect of the explanatory variable on the response cannot be
separated from any time effect (the two effects are said to be
confounded). If time has no effect, then this does not matter, but if later
observations are systematically higher or lower, the estimates of the
regression equation will be biased. Unfortunately it is very easy for bias
effects to creep in unnoticed.

E.3 Preliminary data analysis


When a set of data has been collected, it is advisable to start with some
sort of preliminary data analysis to clarify the more important
features of the data and help the analyst get a ‘feel’ for the data. This
analysis consists mainly of calculating summary statistics and plotting
whatever graphs are appropriate, as described in Chapter 2 and
Appendix D. However, this stage of the analysis also includes
processing the data and checking its quality, and this we have hitherto
said little about.
The first step in the preliminary analysis is to assess the number of
variables and the number of observations and then to tabulate the data
in a suitable format for analysis. The quality of the data should be
assessed, particularly if a statistician was not consulted before they were
collected. Are there any missing observations, and if so, why are they
missing and what can be done about them? Have too many significant
figures been recorded for any of the variables and if so, do they need to
be rounded (see Appendix D.3)? Alternatively, have too few significant
figures been recorded in which case the data will be too coarse to give
much information?
If the data are going to be analysed using a computer, then the data
must first be processed by coding, punching and verifying them. Some
useful hints on processing data are given by Chatfield (1988, Chapter 6).
The key point to realize is that errors will inevitably creep in when there
are a large number of observations and some sort of data screening must
be carried out. Various checks can easily be made by the computer to
investigate the credibility, consistency and completeness of the data. For
example, one can carry out range tests for each variable by specifying
upper and lower limits such that it would be virtually impossible to get
values outside the range. Any values found outside this range would be
checked and corrected or discarded. Such observations may be misre-
corded or may be genuine extreme observations (or outliers) which are

best omitted or treated separately because they are out of line with the
rest of the data. It is also worth getting a complete printout of the data,
as the human eye is very efficient at spotting suspect values in a data array.
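
As a small illustration of the range tests just described, the following
Python sketch flags any value outside limits chosen in advance; the
variable, the data and the limits are all invented for the example.

    # Flag observations outside limits chosen so that legitimate values
    # would be virtually impossible to exclude. Data here are invented.
    heights_cm = [172.1, 168.4, 17.5, 181.0, 1690.0, 175.3]
    LOWER, UPPER = 100.0, 230.0  # assumed plausible range for adult heights

    for i, h in enumerate(heights_cm):
        if not LOWER <= h <= UPPER:
            print(f"observation {i}: {h} lies outside [{LOWER}, {UPPER}]")
    # flags 17.5 and 1690.0 (probably misplaced decimal points)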
Summary statistics may then be calculated for the data as a whole
and for important subgroups of the data. The statistics include the
mean and standard deviation of each measured variable and the
correlation between each pair of variables. The distribution of each
variable may be examined by means of histograms or stem-and-leaf
plots to see what distributional assumptions are reasonable. For
example, an observed distribution which is approximately symmetric
and bell-shaped may be approximated by a normal distribution. Box
plots are useful for comparing the variability in different groups.
Scatter plots for pairs of variables should also be examined, as these
may help the analyst spot obvious relationships between variables,
detect outliers and detect any clusters of observations. A variety of more
complicated graphical procedures are available for multivariate data
(see Chatfield and Collins, 1980, Section 3.3). The possibility of
transforming one or more of the variables should also be considered.
Unfortunately the various motives for making a transformation, such
as stabilizing variance, making effects additive and achieving normality,
may be conflicting.
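
For instance, the first of these calculations might be sketched in Python
as follows; the two short series are invented simply to make the example
self-contained.

    # Mean, standard deviation and correlation for an invented sample.
    import statistics as st

    x = [2.1, 2.4, 1.9, 2.8, 2.3, 2.6]
    y = [5.0, 5.6, 4.4, 6.3, 5.3, 6.1]

    mx, my = st.mean(x), st.mean(y)
    sx, sy = st.stdev(x), st.stdev(y)
    # Sample correlation coefficient, computed directly from deviations.
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((len(x) - 1) * sx * sy)
    print(f"mean(x) = {mx:.2f}, s.d.(x) = {sx:.2f}, r = {r:.2f}")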
Summary statistics and graphs are important not only in a
preliminary analysis but also in the presentation of the conclusions.
Some obvious, but much ignored, rules are that summary statistics
should not contain too many significant figures (see Appendix D.3),
that graphs and tables should have a clear title and not be too spread
out, that all axes in graphs should be labelled, and that when plotting
one variable against another, the scales should be chosen so that the
variability in one direction is comparable to the variability in the other
direction.
After the preliminary data analysis has been carried out, the analyst
should have a good ‘feel’ for the data and perhaps a better idea as to
what statistical techniques are appropriate. Indeed, in the author’s
experience, the preliminary data analysis is often all that is required.
To take just one example, the author was once asked to carry out
regression and analysis of variance on the data shown in Figure E.1.
Two treatments were being compared at different values of a regressor
variable, x, and the response variable, y, was measured in each case.
The scientist who had collected the data wanted to demonstrate that
treatment A was significantly different from treatment B. I pointed out
that this was obvious in Figure E.1 as there was no overlap at any

value of x. This made a significance test unnecessary (see the
addendum to Chapter 7). Of course, not all situations are as clear-cut
as this, but it is surprising how many are.

Figure E.1 A graph showing the observations on treatments A and B:
response variable (y) plotted against predictor variable (x).
(×, observation on A; o, observation on B; the lines join group means)

E.4 Choice of a more elaborate method


Although preliminary data analysis is very important, and possibly all
that is required, there will of course be many occasions when a more
sophisticated analysis is needed. Given the bewildering array of
statistical techniques which are available, ranging from regression and
analysis of variance to multivariate methods and non-parametric
procedures, the reader will often wonder how the ‘right’ method can be
selected. Sometimes the choice of method is relatively obvious given the
nature of the data, but sometimes it is not. The reader must realize that
he/she is bound to come across situations which may not be covered in
this book and for which he/she has little or no idea how to proceed.
There is nothing shameful about this! The correct procedure when this
happens is to carry out a literature search in the library (looking at
books, journals and reference systems - see Section E.6) or to consult
an appropriate colleague or expert. In choosing a method of analysis,
the reader should be prepared for the fact that ad hoc modifications
may need to be made to standard procedures in order to cope with
particular situations.
Of course, in theory it is desirable to have some idea of a suitable
method of analysis before collecting the data, but this is not always
possible before seeing the data. Sometimes the data will show
unexpected features of obvious importance. Alternatively, having
analysed the data, some discrepancies may result which question the
original choice of model or analysis procedure.
The main objective in many statistical analyses is the construction of
a probability model. The main motives for building probability models
are that they enable the analyst to make inferences from a sample to a
larger population and assess the uncertainty in any conclusions, and
also that they may provide a general parsimonious description of
different sets of data. When setting up a first attempt at a model, prior
knowledge and preliminary data analysis are both very useful. It
should be emphasized that all models are tentative and approximate:
tentative in the sense that they may be modified in the light of
subsequent information, and approximate in that no model is exactly
true. Having fitted a model and looked at the residuals, the model may
well have to be modified, sometimes by complicating it, sometimes by
simplifying it, and sometimes by making major changes in structure.
Indeed there are often a number of cycles in model-fitting as defects in
the original model assumptions are recognized, further data are
acquired, and the model is gradually improved.
One approach to statistical inference which has not been covered at
all in this book is Bayesian inference. This approach requires the
analyst to put his prior information about the problem into a
mathematical form and then use a probability theorem, called Bayes’
theorem, to update this prior information in the light of the data.
While some statisticians are strong advocates of this approach, this
author has rarely found the approach to be practicable. It is often
difficult or impossible to put one’s prior information into the
formalized mathematical form required for this approach. The
approach has therefore been deliberately omitted, although I recognize
that some specialists will not agree with this.

E.5 Using a computer


It is difficult to make general remarks about computer programs, as
facilities vary considerably from place to place and are changing
rapidly anyway. When the analyst decides to use a computer to analyse

a set of data, he or she must decide whether to use a computer package,
to augment a published algorithm, or to write a special program. In the
latter case a choice must be made between a standard programming
language, such as FORTRAN, and a specialized language such as
GENSTAT which is particularly appropriate for complex statistical
problems. A choice must also be made between making the program
suitable for the given set of data only, or making it more general so
that it will cope with similar subsequent sets of data.
As regards computer packages, there are many available including
the SPSS and BMDP packages. If you have interactive facilities, then I
particularly recommend the MINITAB package which is easy to use and
covers many routine statistical techniques. It is worth stressing that most
packages are not intelligent enough to tell the user when the data are
unsuitable for the technique being tried. It is also worth pointing out that
even well-established packages may still have ‘bugs’ in them or do things
which are controversial to say the least.

E.6 Using a library


When carrying out a literature search, it is worth stressing that
libraries contain much more than just books. Much recent research
work is published in journals, and the more useful statistical journals
include Applied Statistics, Technometrics, and the Bulletin in Applied
Statistics, amongst many others. The reader may also find it useful to
consult the abstract journals, the Science Citation Index and various
computerized reference systems.
As regards statistical data, most libraries periodically receive tables
of national and international statistics covering an enormous variety of
activities from population and crime to economics and education.

E.7 Presenting the results


The good scientist must be able to communicate his work effectively
both verbally and by means of a written report. The art of report­
writing is most important and the reader may be wise to consult one of
the many books which have been written on this topic. It is certainly
true that many reports are poorly presented and that this reflects badly
on the writer even when the statistics has been carried out correctly.
Some ‘obvious’ but often ignored points to bear in mind when
writing a report are:

1. Begin by planning the structure of the report by dividing the work
into sections arranged in a suitable order. Each section should be
given an appropriate title.
2. Include a clear summary.
3. Use simple English with appropriate punctuation.
4. Devote extra care to the clear presentation of graphs, tables and
computer output. It is amazing, for example, how often the scales of
graphs are not clearly labelled. Masses of computer output should
generally not be presented. Rather it is the writer’s job to extract the
important features of the output and present them in a clear way.
5. Be prepared to revise the report several times.
6. Leave the completed report for at least twenty-four hours and then
read it through again trying to put yourself in the place of a reader
who has not seen it before; if possible, get someone else to read the
report as well.
7. If the report is typed, check it carefully for errors as it is futile to try
to blame the secretary for any misprints.
References
Chatfield, C. (1988), Problem-Solving: A Statistician's Guide, Chapman
and Hall.
Chatfield, C. and Collins, A. J. (1980), Introduction to Multivariate
Analysis, Chapman and Hall.
Moser, C. A. and Kalton, G. (1971), Survey Methods in Social
Investigation, Heinemann Educational Books.

Answers to exercises

Chapter 2
2. (a) 4.6, 5.3, 2.3. (b) 104.6, 5.3, 2.3. (c) 0.46, 0.053, 0.23.
3. (a) 995.4, 5.3 (b) 130 is a suitable constant; x̄ = 136.2, s² = 11.8

Chapter 3
1. (a) (1 − p₁)(1 − p₂) (b) 1 − (1 − p₁)(1 − p₂)
(c) 1 − (1 − p₁)(1 − p₂) − p₁p₂
2. 0.99
3. (a) 5 (b) ^ (c) 5 (look at sample space)
4. (a) -j^ (b) (write out all possible outcomes)
5. (a) f (b) i (c )
6. 60 × 59
7. 26-’
8. 23
9. 0.583
10. (a) 0.72 (b) 0.921 (c) 0.950
11. 1 − … = 0.67

Chapter 4
1. Binomial distribution; (a) 0.2 (b) 0.817 (c) 0.016
2. (a) P(0) = ( |) \ F (l) = 4 (|f(i). etc.
3. Number of defectives follows a binomial distribution with n = 10
and p = 0.04; 5.8%; very grave doubts - take another sample
immediately.
4. n ≈ 69
5. 0.0085
6. Yes, P(2 additional failures) ≈ 0.00000003
Yes, P(2 additional failures) ≈ 0.00000001
So three-engine jet is three times as reliable in this situation.



7. (a) 10 cars (b) Yes
8. Two; third ambulance would be used once in 150 hours: a good idea.
9. 0.9986; ignore second signal and pray.
10. (a) 0.366 (b) 0.368
11. (a) 0.135 (b) 0.324

Chapter 5
1. (a) 0.317 (b) 0.383 (c) 0.036
2. (a) 0.5 (b) 0.841 (c) 0.159
3. (a) 0.954 (b) 0.023
4. 0.017
5. (a) 13.3 (b) 14.7; k = 3.92
6. 97.7% will have a life-time > 400 hours. Therefore purchaser will
be satisfied.
7. 13 questions (11.58 + 0.5, rounded up) using normal approxim­
ation. The pass mark is also 13 using exact binomial values.
8. Expected proportion of bolts less than 2.983 in. = 0.015; E(n₁) = 4.5,
E(n₂) ≈ 30.6. In second case solve μ − 1.84σ = 2.983 and
μ + 1.96σ = 3.021, giving μ = 3.001, σ = 0.010.

Chapter 6
1. (a) 0.66 (b) 0.4 (c) 0.2; sample size = 16
3. a = −0.52, b = 0.52, c = 0.37, d = 1.88
a = 4.95, b = 7.05, c = 1.48, d = 7.52
5. 14.1 ± 2.0
6. 14.1 ± 2.2

Chapter 7
1. z₀ = 1.33; two-tailed level of significance > 5%; therefore accept
H₀. 95% confidence interval for μ is 12.22 ± 1.96 × 0.4/√6; note that
this interval contains 12.0.
2. s = 0.37, t₀ = 1.45; two-tailed level of significance > 5%; accept
H₀. 95% confidence interval for μ is 12.22 ± 2.57 × 0.37/√6.
3. H₀: μ = 3%, H₁: μ < 3%; one-tailed test; t₀ = 3.16. But
t₀.₀₁,₉ = 2.82; therefore reject H₀ at 1% level.
4. n is smallest integer above 3²(1.64 + 1.64)²/(32 − 30)²; n = 25



5. t₀ = 4.74; significant at 1% level
6. s² = 34.5, t₀ = 2.49; (a) Yes (b) No
7. χ₀² = 7.0; 9 degrees of freedom; accept hypothesis
8. Combine three or more accidents; χ₀² = 7 with 2 d.f.; reject
hypothesis. (Some women are accident prone so that the Poisson
parameter is not the same for all members of the population. See
M. G. Kendall and A. Stuart, Advanced Theory of Statistics, Vol. 1,
Griffin, 2nd edn, 1963.)
9. χ₀² = 2.82 with 1 d.f.; not significantly large; effectiveness of drug
has not been proved.

Chapter 8
1. ŷ = 0.514 + 0.0028x; ŷ = 0.584
2. ŷ = 13.46 + 1.13 × coded year; ŷ = 16.85
3. (a) 0.780 (b) height = −19.2 + 0.579 × weight
(c) weight = 81.0 + 1.04 × height
4. y = −0.074 + 0.941x₁ + 0.598x₂

Chapter 9

3^2-1 + = 0-381<
^3/
(mj + ^ 2 + ^3 + 3 ^ 4)
3.
^ 25/ti, 9^2
34 34
5. E(z) ≈ 1.06 × 2.31; σ_z² ≈ (0.04)² × (2.31)² + (1.06)² × (0.05)²;
tolerance limits are 2.45 ± 0.32

Chapter 10

4. F = 5.4; significant at 5% level
5. F = 11.4; significant at 1% level
6. F = 19.8; significant at 5% level
7. Ten blocks are necessary: ABC, ABD, ABE, ACD, ACE, ADE,
BCD, BCE, BDE, CDE; randomize the order within each block.



Chapter 11
2. Anova table

Source        Sum of squares   d.f.   Mean square      F
Catalysts          117.8         3       39.27        20.1
Agitation           90.2         2       45.1         23.1
Interaction          4.1         6        0.68         0.4
Blocks               0.04        1        0.04         0.02
Residual            21.5        11        1.95

Total              233.6        23

Catalyst and agitation effects are highly significant; interaction not
significant. By inspection CE 3 is best combination, but this is only
marginally better than CE 2 which would probably be preferred on
economic grounds.
3. A: −3½, B: 0, C: −7, AB: −2, AC: −1, ABC: −1; A
effect significant at 5% level, B effect not significant, C effect significant
at 1% level

Chapter 12
1. x̄-chart has action lines at 50 ± 15/(2.33 × √5); R-chart has action
limit at 5 × 5.48/2.32.
2. x̄-chart has action lines at 10.80 ± 3 × 0.46/(2.06 × 2); R-chart has
action limit at 0.46 × 5.31/2.06.
3. P(r defectives in sample of size 30) ≈ e^(−0.9)(0.9)^r/r!; warning limit = 4
defectives, action limit = 6 defectives.
4. 0.023;
P(r observations taken before one lies outside action lines) = (0.977)^(r−1)
× 0.023;
mean = 1/0.023 = 43.5
5. (0.046)² = 0.0021; (0.160)² = 0.026
6. Remove samples 11 and 12; revised values: x̄-chart: 15.2 ± 2.5,
R-chart: 3.5. R_0.001 ≈ 9.0; tolerance limits: 15.2 ± 5.1.


Index

Acceptable quality level 290
Acceptance number 293
Acceptance sampling 288
Accurate measurements 204
Action lines 300
Addition law 41
Additivity assumption 226, 246
Advanced stress conditions 16, 321
Alternative hypothesis 135
Analysis of covariance 225, 278
Analysis of variance
  in a factorial experiment 266
  in a 2ⁿ experiment 273
  in a randomized block experiment 251
  one-way 237-41
  two-way 248-52
Assignable causes 299
Attributes, sampling by 289, 302
Automatic control 313
Average 28
Average outgoing quality 296
  limit 296
Average run length 309
Backward elimination 199
Balanced incomplete block design 253
Bar chart 21, 57
Bartlett's test of homogeneity of variances 244
Batch 37, 51, 288
Bayesian inference 370
Bias 205
Binomial distribution 57-68
  mean of 64-8
  normal approximation to 57, 93-5
  Poisson approximation to 57, 72, 74
  variance of 64-9
Bivariate distribution
  continuous 103
  discrete 75
  normal 187, 194-6
Blocking 225, 244
  in a factorial experiment 260, 269
Box-Jenkins control 314
Box plots 21, 361
Calibration curve 204
Categorical data 21
Causal relationship 197, 231
Central limit theorem 93, 114
Centroid 169, 188
Chi-square distribution 116-17
  confidence interval for 131
  degrees of freedom of 117, 145, 149, 155
  goodness-of-fit test 148-51, 162
  relationship with normal distribution 332
  testing independence in a contingency table 151-4
Class
  interval 23
  mark 23
Coding 34, 197, 353
Coefficient of determination 177, 194
Coefficient of variation 33, 210
Combinations 49-50
Comparative experiment 225
Complete factorial experiment 185, 258
  advantages of 258-60
  analysis of 263-9
  design of 260-63
Components of variance model 276
Composite design 285
Compressed time testing 321
Computer packages 371
Concomitant variable 278
Conditional distribution 76
Conditional failure rate function 322
Conditional probability 42, 76
Confidence interval
  for population mean (σ known) 126
  for population mean (σ unknown) 128
  for population variance 131
  for slope of line 175
Confidence limits 127
Confounding 278
Consistent estimator 119
Consumer's risk 291
Contingency table 152
Continuity correction 95
Continuous
  data 20
  distribution 64, 81
Contour diagram 281
Contrast 272
Control chart 299
Control limits 300
Controlled variable 167
Correlation coefficient 185-90
  interpretation of 196
  theoretical 195
Covariance 188
Critical path scheduling 348
Critical value 140, 158
Crossed design 277
Cumulative distribution function 83
Cumulative frequency diagram 26-7
Cusum chart 306-12
Curve-fitting 167-71
Data
  continuous 20
  discrete 20
  presentation of 20-34
Decision interval 310
Degrees of freedom 116, 121, 229
  for χ² distribution 116, 145, 149, 155
  for F-distribution 155
  for t-distribution 130
  in analysis of variance 238, 248, 266, 274
  of s² 117, 121, 145
Dependent variable 167
Design of experiments 185, 224, 260
Descriptive statistics 20
Discrete
  data 20
  distribution 56, 64
Distribution function 83
Distribution
  binomial 57-68
  chi-square, see Chi-square distribution
  continuous 64, 81
  discrete uniform 115
  exponential 98-103, 323
  F- 155-7
  gamma 333
  geometric 80, 102, 316
  hypergeometric 80, 291
  marginal 76
  normal, see Normal distribution
  Poisson, see Poisson distribution
  t- 129-31, 333
  uniform 83, 87
  Weibull 327
Distribution-free tests 157
Double sampling 293
Dynamic programming 348
Efficiency 118
Error
  of type I 159, 162
  of type II 159, 162
Estimate or estimator 29, 118
  consistent 119
  interval 107, 126
  point 107, 121
  unbiased 32, 118-19
Events 39
  dependent 42-6
  independent 42-6
  joint 44
  mutually exclusive 40
Evolutionary operation (Evop) 285
Expectation or expected value 66-8, 86, 107-11
Experimental unit 225
Exploratory data analysis 362
Exponential distribution 98-103, 323
Exponential smoothing 314
F-distribution 155-7
  relationship with χ² distribution 334
F-test 155-7
Factor 225, 257
Factorial experiment, see Complete factorial experiment
Factorial n 50
Feedback control system 312
Fixed effects model 276
Forward selection 199
Fractional factorial design 278
Frequency curve 26, 30, 81
Frequency distribution 21, 66, 75, 351
Gain 315
Game theory 348
Gamma distribution 333
Gaussian distribution, see Normal distribution
Geometric distribution 80, 102, 316
Goodness-of-fit test 148-55, 162
Graphical methods 20
Hazard function 322
Histogram 22-5
Homogeneity of variances 244
Hypergeometric distribution 80, 291
Hypothesis testing, see Significance tests
Incomplete block design 253
Independent events 42-6
Independent variable 167, 199
Inference 18, 106
Interaction 18, 179, 258, 262, 272
Interquartile range 31, 361
Interval estimate 107, 126
Joint probability 76
Latin square 252
Least significant difference method 241
Least squares, method of 168, 172, 216-22
Level 257, 271
Level of significance 137
Life testing 319
Likelihood function 123
Linear programming 348
Linear regression 177
  confidence intervals and significance tests 174-7
  fitting a straight line 168-70
  model 171, 173
  with two random variables 191-4
Lot 37, 288
  tolerance percentage defective 290
Main effect 259, 272
Manhattan diagram 311
Mann-Whitney U-test 158
Marginal distribution 76
Marginal probability density function 103
Mathematical model 18, 37, 60, 106, 226
Maximum likelihood, method of 123
Mean
  arithmetic 28
  population 65, 86
  sample, see Sample mean
Median 30
Military standard plan 297
Minimum variance unbiased estimate 118, 120
MINITAB 371
Mixed effects model 277
Mode 30
Model, see Mathematical model
Moments, method of 121
Multiple regression 178-80
Multivariate analysis 349
Mutually exclusive events 40
Nested design 277
Noise 312
Non-parametric tests 157
Normal distribution 87-98, 332-4
  bivariate 187, 194-6
  mean of 88
  standard 90, 130
  uses of 92-5
  variance of 88
Normal equations 169, 172, 178
Normal probability paper 95-8, 280
Nuisance factor 230
Null hypothesis 135
One-at-a-time experiment 258
One-tailed (one-sided) test 139
One-way analysis of variance 237-41
Operating characteristic curve 289-93
Operations research 347
Optimization 280-85
Orthogonal design 273
Orthogonal polynomials 176, 180-85
Outlier 280
Paired comparisons test 147
Parameter 18, 37, 60
Permutations 49-50
Point estimate 107, 121
Poisson distribution 69-75, 124, 151
  mean of 72
  variance of 72
Poisson process 71, 99
Polynomial regression 179-85
Population 20
Power 158-63
  curve 161, 291
Precise measurements 204
Prediction interval 177
Probability 39
  curve 81
  distribution 56
Probability density function 81
Probability paper (normal) 95-8, 280
Process average percentage defective 297
Process control 288, 299
Producer's risk 290
Product law 45
Propagation of error 205-15
Proportion 289, 363
Qualitative factor 257
Quality circles 288
Quality control 288-318
Quantitative factor 257
Queueing theory 101, 348
Random effects model 276
Random number 115
  generator 115
Random variable 51-3
Randomization 185, 230, 260
Randomized block experiment 244
Range 31, 235, 301, 355
Rectangular distribution, see Uniform distribution
Rectifying scheme 289, 296
Redundant meter 215-16
Regression 171
  curve 167, 171
  linear, see Linear regression
  multiple 178-80
  polynomial 179-85
Regressor variable 167
Reliability 18, 319
  function 322
Replications, number of 227
Residuals 243, 279
Response surface 281
Response variable 167
Robustness 158, 244
Rounding 21, 355-9
Sample 20
  mean 28-9, 64-5, 351
    distribution of 111-16
    standard error of 112
  point 39
  size, choice of 162, 227-30
  space 38
Sampling distribution 107
  of the sample mean 111-16
  of the sample variance 116-17
Scatter diagram 166
Screening experiment 271
Sequential sampling 294
Service time 101
Shewhart chart 299
Sign test 157
Significance tests 134
  comparing two sample means 143-6
  distribution-free 157-8
  goodness-of-fit 148-55
  on a sample mean 140-43
  paired comparisons 147-8
Significant result 137-8
Simulation 348
Single sampling 293
Skew distributions 25, 30
Specification limits 304
Standard deviation 31-3, 351-5
  population 68
Standard error 112
Standard normal distribution 90, 130
Statistic 28
Steepest ascent, method of 282
Stem-and-leaf plots 21, 360
Stochastic process 71, 349
Studentized range test 235
Student's t-distribution 129-31, 333
Summary statistics 28
Survival times 330
System identification 312-16
t-Distribution 129-31, 333
t-Test 142, 147-8, 162
Taguchi methods 278, 317
Taylor series 209
Test of significance, see Significance tests
Test statistic 136
Time-series analysis 349
Tolerance limits 207, 304
Total quality control 288
Traffic intensity 103
Transformations 93, 95, 170, 180, 244
Treatment 225
Treatment combination 257
Tukey's T-method 242
Two-sample tests 143-6
Two-tailed (two-sided) test 139
Two-way analysis of variance 248-52
Unbiased estimate 32, 118
Uniform distribution 83, 87
V-mask 308
Variables, sampling by 289, 298, 301
Variance 31-3
  of a sum of independent random variables 110
  of the sample mean 112
  population 68, 87
Variate 20
Warning lines 300
Weibull distribution 327
Weights-and-Measures Act 317
Wilcoxon signed rank test 158
Winsorization 280
Yates's method 273-5