Statistics: Problems and Solutions
by
J. Murdoch and J. A. Barnes
MACMILLAN
© J. Murdoch and J. A. Barnes 1973
Published by
THE MACMILLAN PRESS LTD
London and Basingstoke
Associated companies in New York Toronto
Melbourne Dublin Johannesburg and Madras
the reader to perform to help him understand the concepts involved. In the
chapters of this book references to Statistical Tables for Science, Engineering
and Management are followed by an asterisk to distinguish them from
references to tables in this book.
The problems and examples given represent work by the authors over many
years and every attempt has been made to select a representative range to
illustrate the basic concepts and application of the techniques. The authors
would like to apologise if inadvertently examples which they have used have
previously been published. It is extremely difficult in collating a problem book
such as this to avoid some cases of duplication.
It is hoped that this new book, together with its two companion books, will
form the basis of an effective approach to the teaching of statistics, and
certainly the results from its trials at Cranfield have proved very stimulating.
J. Murdoch
Cranfield J. A. Barnes
Contents
List of symbols
Probability theory
1.2.1 Introduction
1.2.2 Measurement of probability
1.2.3 Experimental measurement of probability
1.2.4 Basic laws of probability
1.2.5 Conditional probability
1.2.6 Theory of groups
1.2.7 Mathematical expectation
1.2.8 Geometric probability
1.2.9 Introduction to the hypergeometric law
1.2.10 Introduction to the binomial law
1.2.11 Management decision theory
1.3 Problems
1.4 Worked solutions
1.5 Practical experiments
Appendix 1—specimen experimental results
Theory of distributions 32
2.2.1 Introduction 32
2.2.2 Frequency distributions 33
2.2.3 Probability distributions 35
2.2.4 Populations
2.2.5 Moments of distribution 37
2.2.6 Summary of terms 38
2.2.7 Types of distribution 40
2.2.8 Computation of moments 42
2.2.9 Sheppard's correction 45
2.3 Problems 45
2.4 Worked solutions 48
Normal distribution 80
4.2.1 Introduction 80
4.2.2 Equation of normal curve 80
4.2.3 Standardised variate 81
4.2.4 Area under normal curve 81
4.2.5 Percentage points of the normal distribution 82
4.2.6 Ordinates of the normal curve 82
4.2.7 Fitting a normal distribution to data 82
4.2.8 Arithmetic probability paper 82
4.2.9 Worked examples 83
4.3 Problems 89
4.4 Worked solutions 92
4.5 Practical experiments 102
Appendix 1—Experiment 10 of Laboratory Manual* 102
Appendix 2—Experiment 11 of Laboratory Manual* 105
Sampling theory and significance testing (II)—'t', 'F' and χ² tests 170
8.2.1 Unbiased estimate of population variance 170
8.2.2 Degrees of freedom 171
8.2.3 The 'u'-test with small samples 171
8.2.4 The 't'-test of significance 173
8.2.5 The 'F'-test of significance 174
8.2.6 The 'χ²'-test of significance 175
8.2.7 One- and two-tailed tests 177
8.2.8 Worked examples 177
8.3 Problems 182
8.4 Worked solutions 184
8.5 Practical experiments 192
Appendix 1—Experiment 14 of Laboratory Manual* 193
Greek Symbols
μ    population mean
σ²   population variance
χ²   sum of the squares of standardised normal deviates
ν    number of degrees of freedom
α    magnitude of risk of 1st kind or significance level
β    magnitude of risk of 2nd kind; (1 − β) is the power of the test
π    proportion of a population having a given attribute
σ_x̄  standard error
Note: α and β are also used as parameters of the population regression line
η = α + β(x_i − x̄), but again no confusion should arise.
Mathematical Symbols
nCx or (n over x)   number of different combinations of size x from a group of size n
x!                  factorial x = x(x − 1)(x − 2) . . . 3 × 2 × 1
nPx                 number of permutations of n objects taken x at a time

Note: The authors use (n over x) but, in order to avoid any confusion, both forms are given
in the definitions.
1 Probability theory
1.2.1 Introduction
Probability or chance is a concept which enters all activities. We speak of the
chance of it raining today, the chance of winning the football pools, the chance
of getting on the bus in the mornings when the queues are of varying size, the
chance of a stock item going out of stock, etc. However, in most of these uses of
probability, it is very seldom that we attempt to measure or quantify the
statements. Most of our ideas about probability are intuitive and in fact
probability is a quantity rather like length or time and therefore not amenable
to simple definition. However, probability (like length or time) can be measured
and various laws set up to govern its use.
The following sections outline the measurement of probability and the rules
used for combining probabilities.
1/2 or 0.5 Probability that an unbiased coin shows 'heads' after one toss
1/6 or 0.167 Probability that a die shows 'six' on one roll
0 Probability that you will live forever (absolute impossibility)
It will be seen that on this continuous scale, only the two end points are
concerned with deductive logic (although even here, there are certain logical
difficulties with the particular example quoted).
On this scale absolute certainty is represented by p = 1 and an impossible
event has probability of zero. However, it is between these two extremes that
the majority of practical problems lie. For instance, what is the chance that a
machine will produce defective items? What is the probability that a machine
will find the overhead service crane available when required? What is the
probability of running out of stock of any item? Or again, in insurance, what
is the chance that a person of a given age will survive for a further year?
For example, what is the probability of an item’s going out of stock in a given
period?
Measurement showed that 189 items ran out in the period out of a total
number of stock items of 2000, therefore the estimate of probability of a stock
running out is
P(A) = 189/2000 = 0.0945
Again, if out of a random sample of 1000 men, 85 were found to be over
1.80 m tall, then the estimated probability that a man is over 1.80 m tall is 85/1000 = 0.085.
P(at least one goal) = P(1) + P(2) + P(3 or more) = 0.30 + 0.15 + 0.05 = 0.50
Any event either occurs or does not occur on a given occasion. From the
definition of probability and the addition law, the probabilities of these two
alternatives must sum to unity. Thus the probability that an event does not
occur is equal to
1 —(probability that the event does occur)
3. Independent Events
Events are defined as independent if the probability of the occurrence of either
is not affected by the occurrence or not of the other. Thus if A and B are
independent events, then the law states that the probability of the combined
occurrence of the events A and B is the product of their individual
probabilities. That is
P(A and B) = P(A) × P(B)
Examples
1. In the throw of two dice, what is the probability of obtaining two sixes?
One of the dice must show a six and the other must also show a six. Thus the
required probability (independent events) is
P(six and six) = 1/6 × 1/6 = 1/36
2. In the throw of two dice, what is the probability of a score of 9 points?
Here we must consider the number of mutually exclusive ways in which the
score 9 can occur. These ways are listed below
Dice A    3    4    5    6
Dice B    6    5    4    3
(i.e. 3 and 6, or 4 and 5, or 5 and 4, or 6 and 3)
Each of these four mutually exclusive ways has probability 1/6 × 1/6 = 1/36, so by the addition law the required probability is 4/36 = 1/9.
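As a check on these two worked examples, the following short enumeration (a Python sketch added here, not part of the original text) lists all 36 equally likely results of throwing two dice and counts those giving two sixes and those giving a total score of 9.

    from itertools import product

    outcomes = list(product(range(1, 7), repeat=2))        # all 36 equally likely results
    p_two_sixes = sum(1 for a, b in outcomes if a == 6 and b == 6) / len(outcomes)
    p_score_9 = sum(1 for a, b in outcomes if a + b == 9) / len(outcomes)
    print(p_two_sixes)      # 1/36 = 0.0278
    print(p_score_9)        # 4/36 = 0.1111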
Figure 1.2
This result is valid on the assumption that the number of doors that a car has
is not dependent on its colour.
Figure 1.2 illustrates the situation. The areas of the four rectangles within the
square represent the proportion of all cars having the given combination of
colour and number of doors. The total area of the three shaded rectangles is
equal to 0.58, the proportion of cars that are either blue or have two doors or
are two-door blue cars.
If one card is drawn at random from a full pack of 52 playing cards, the
probability that it is red is 26/52. Random selection of a card means that each
of the 52 cards is as likely as any of the others to be the sampled one.
If a second card is selected at random from the pack (without replacing the
first), the probability that it is red depends on the colour of the first card drawn.
There are only 51 cards that could be selected as the second card, all of them
having an equal chance. If the first had been black, there are 26 red cards
available and the probability that the second card is red is therefore 26/51 (i.e.,
conditional upon the first card being black).
Similarly, if the first card is red, the probability that the second is also
red is 25/51.
The process can be continued; the probability that a third card is red being
26/50, 25/50 or 24/50 depending on whether the previous two cards drawn were
both black, one of each colour (drawn in either order) or both red.
P(first card red) × P(second card red given that first was red) = 26/52 × 25/51 = 25/102
This result applies whether the two cards are taken one after the other or
both at the same time.
The same result can be obtained if a burnt-out bulb is found on the second
test, the first bulb being good. The two situations are mutually exclusive.
Then P(first bulb good) = 10/12
P(second bulb burnt out, given the first was good) = 2/11
Thus P(first bulb good and second burnt out) = 10/12 × 2/11 = 10/66
Either of these two situations satisfies (b) and the probability of exactly one
of the burnt-out bulbs being among the first two tested is given by their sum
10/66 + 10/66 = 20/66
(c) The probability of at least one burnt-out bulb being found in two tests
is equal to the sum of the answers to parts (a) and (b), namely
1/66 + 20/66 = 21/66 = 7/22
As a check on this result, the only other possibility is that neither of the
faulty bulbs will be picked out for the first two tests. The probability of this,
using the multiplication law with the appropriate conditional probability, is
10/12 × 9/11 = 45/66
The situation in part (c) therefore has probability 1 − 45/66 = 21/66, as given by
the direct calculation.
Consider a box containing r red balls and w white balls. A random sample of
two balls is drawn. What is the probability of the sample containing two red
balls?
r = red balls
w = white balls
If the first ball is red (event A), the probability of this event occurring is
P(A) = r/(r + w)
The probability of the second ball being red (event B) given that the first was red
is thus
P(B|A) = (r − 1)/(r + w − 1)
since there are now only (r − 1) red balls in the box, which contains (r + w − 1) balls.
Therefore the probability of the sample containing two red balls is
[r/(r + w)] × [(r − 1)/(r + w − 1)]
In a similar manner, the probability of the sample containing two white balls is
[w/(r + w)] × [(w − 1)/(r + w − 1)]
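The result can be checked numerically. The sketch below (Python, added for illustration; the values r = 3, w = 4 are arbitrary) computes the product-rule answer and confirms it by listing every equally likely pair of balls.

    from fractions import Fraction
    from itertools import combinations

    def p_two_red(r, w):
        # first ball red, then second ball red given the first was red
        return Fraction(r, r + w) * Fraction(r - 1, r + w - 1)

    r, w = 3, 4                                   # arbitrary illustrative numbers
    balls = ['R'] * r + ['W'] * w
    pairs = list(combinations(range(len(balls)), 2))
    direct = Fraction(sum(1 for i, j in pairs if balls[i] == 'R' and balls[j] == 'R'),
                      len(pairs))
    print(p_two_red(r, w), direct)                # both give 1/7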
Examples
1. In a group of ten people where six are male and four are female, what is the
chance that a committee of four, formed from the group with random selection,
comprises (a) four females, or (b) three females and one male?
1st member   M    F    F    F
2nd member   F or M or F or F
3rd member   F    F    M    F
4th member   F    F    F    M

For the first column the probability is 6/10 × 4/9 × 3/8 × 2/7 = 0.0286, and similarly for the
second, third and fourth columns, the position of the numbers in the numerator being
different in each of the four cases. The required probability is thus 4 × 0.0286 = 0.114.
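The same answer is obtained by counting committees directly. The short Python sketch below (added here as a check, not part of the original text) lists all 210 equally likely committees of four.

    from itertools import combinations
    from math import comb

    people = ['M'] * 6 + ['F'] * 4
    committees = list(combinations(range(10), 4))             # C(10, 4) = 210 committees
    n_4f = sum(1 for c in committees if all(people[i] == 'F' for i in c))
    n_3f1m = sum(1 for c in committees if sum(people[i] == 'F' for i in c) == 3)
    print(n_4f / len(committees))                             # (a) 1/210  = 0.0048
    print(n_3f1m / len(committees))                           # (b) 24/210 = 0.114
    print(comb(4, 3) * comb(6, 1) / comb(10, 4))              # the same value by direct counting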
(a) Probability of no defective items in the sample only arises in one way.
.. Probability of no defective items
P(0) = 90/100 × 89/99 × 88/98 × . . . × 81/91 = 0.33
(b) Exactly one defective item in the sample can arise in 10 mutually exclusive
ways as shown below
D = defective item
G = good item
D G G G G G G G G G;  G D G G G G G G G G;  . . . ;  G G G G G G G G G D
(the defective item may occupy any one of the ten positions in the sample)
Permutations
Groups form different permutations if they differ in any or all of the following
aspects.
ABB, BAB are different permutations (3); AA, BAA are different permutations
(1) and (2); CAB, CAAB are different permutations (1) and (2); BAABA, BABBA
are different permutations because of (2).
Thus distinct arrangements differing in (1) and/or (2) and/or (3) form different
permutations.
Here n = 4, x = 2
nPx = 4!/(4 − 2)! = 4!/2! = 12

nCx = (n over x) = n!/[x!(n − x)!]
As an example, a committee of three is to be formed from five department
heads. How many different committees can be formed?
5C3 = 5!/(3! × 2!) = (5 × 4)/(2 × 1) = 10
In gambling, for the game to be fair the expectation should equal the charge for
playing the game. This concept is also used in insurance plans, etc. Use of this
concept of expected value is illustrated in the following example.
Example
The probability that a man aged 55 will live for another year is 0.99. How large
a premium should he pay for a £2000 life insurance policy for one year?
(Ignore insurance company charges for administration, profit, etc.)
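On the expectation argument the fair premium is simply the probability of a claim multiplied by the sum assured, as the minimal sketch below shows (Python, added for illustration).

    p_death = 1 - 0.99          # probability that the man dies within the year
    sum_assured = 2000          # policy pays 2000 pounds on death
    fair_premium = p_death * sum_assured
    print(fair_premium)         # 20.0, i.e. a fair premium of 20 pounds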
Figure 1.3
Given that the lines are 1 mm thick, the sides of the squares are 60 mm and
the diameter of the coin is 20 mm what is
Figure 1.4 (one square of the board, 60 mm side with 1 mm lines)
Considering one square (figure 1.4), total possible area (ignoring small
edge-effects of line thickness) = 61² = 3721 mm²
For one square, the probability that the coin does not touch a line is
40²/61² = 1600/3721 = 0.43
Thus if the coin falls at random on the board
(a) the chance that it falls completely within the '4' square = 1/9 × 0.43 = 0.048
(b) the chance that it falls completely within a '2' square = 4/9 × 0.43 = 0.191
(c) the expected payout per trial is (4 × 0.048) + (2 × 0.191) + (1 × 0.191) = 0.76
Since it costs one coin to play, the player will lose 0.24 of a coin per turn in the
long run.
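The geometric argument can be checked by simulation. The sketch below (Python, added here; it assumes the layout reconstructed above, a 61 mm cell of which a 1 mm strip along two edges is line) drops the centre of a 20 mm coin at random in one cell and estimates the probability that it clears the lines.

    import random

    PITCH, LINE, COIN = 61.0, 1.0, 20.0      # mm: cell pitch, line width, coin diameter

    def clear(c):
        # the centre must lie at least half a coin width inside the 60 mm square
        return LINE + COIN / 2 < c < PITCH - COIN / 2

    random.seed(1)
    trials = 200_000
    hits = sum(clear(random.uniform(0, PITCH)) and clear(random.uniform(0, PITCH))
               for _ in range(trials))
    print(hits / trials)                     # close to (40/61)**2 = 0.43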
Sample size n = 10
For (a) x = 0:
P(0) = [10C0 × 90C10]/100C10 = [90!/(10! 80!)] × [10! 90!]/100!
     = (90 × 89 × . . . × 81)/(100 × 99 × . . . × 91) = 0.33
For (b) x = 1,
Both results are the same as before but are obtained more easily.
P(x) = nCx p^x (1 − p)^(n−x)
To illustrate the use of the binomial law consider the following example. A
firm has 10 lorries in service distributing its goods. Given that each lorry spends
10% of its time in the repair depot, what is the probability of (a) no lorry in the
depot for repair, and (b) more than one in for repair?
(b) The probability of more than one lorry being in for repair, P(>1), can best
be obtained by:
P(>1) = 1—P(0)—P(1)
Probability of exactly one lorry being in for repair
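For completeness, the three probabilities for this example can be computed directly from the binomial law with n = 10 and p = 0.1, as in the Python sketch below (added here, not part of the original text).

    from math import comb

    n, p = 10, 0.10

    def b(x):                                # binomial point probability
        return comb(n, x) * p ** x * (1 - p) ** (n - x)

    p0, p1 = b(0), b(1)
    print(round(p0, 4))                      # 0.3487  no lorry in for repair
    print(round(p1, 4))                      # 0.3874  exactly one in for repair
    print(round(1 - p0 - p1, 4))             # 0.2639  more than one in for repair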
Examples
1. Consider, as a simplification of the practical case, that a person wishing to
sell his car has the following alternatives: (a) to go to a dealer with complete
certainty of selling for £780, (b) to advertise in the press at a cost of £50, in
order to sell the car for £850.
Under alternative (b), he estimates that the probability of selling the car for
£850 is 0.60. If he does not sell through the advertisement for £850, he will take
it to the dealer and sell for £780. (Note that a more realistic solution would
allow for different selling prices each with their associated probability of
occurrence.) Should he try for a private sale?
If he advertises the car there is a chance of 0.6 of obtaining £850 and
therefore a chance of 0.4 of having to go to the dealer and accept £780.
The expected return on the sale
= £850 x 0.6 + £780 x 0.4 = £822
The long-run criterion may not be relevant to his once-only decision. Compared with the
guaranteed price, by advertising, he will either lose £50 or be £20 in pocket with
probabilities of 0.4 and 0.6 respectively. He would probably make his decision
by assessment of the risk of 40% of losing money. In practice, he could probably
increase the chances of a private sale by bargaining and allowing the price to drop
as low as £830 before being out of pocket.
As a further note, the validity of the estimate (usually subjective) of a 60%
chance of selling privately at the price asked should be carefully examined as
well as the sensitivity of any solution to errors in the magnitude of the
probability estimate.
2. A firm is facing the chance of a strike occurring at one of its main plants.
Considering only two points (normally more would be used), management
assesses the following:
(a) An offer of 5% pay increase has only a 10% chance of being accepted
outright. If a strike occurs:
(a) Considering expected costs for the offer of 5%. Expected loss due to strike
= 0.90[(0.20 x 1) + (0.50 x 2) + (0.30 x 3)] x £1 000 000 = £1 890 000
= £10 000 x 12 x 5 x 10
= £6 000 000
Total (expected) cost of decision = £6 102 000
Thus, management should clearly go for the lower offer and the possible
strike with its consequences, although many other factors would be considered
in practice before a final decision was made.
(a) What is the chance of both drill and lathe not being used at any instant
of time?
(b) What is the chance of all machines being in use?
(c) What is the chance of all machines being idle?
3. A man fires shots at a target, the probability of each shot scoring a hit being
1/4 independently of the results of previous shots. What is the probability that
in three successive shots
4. Five per cent of the components in a large batch are defective. If five are
taken at random and tested
(a) What is the probability that no defective components will appear?
(b) What is the probability that the test sample will contain one defective
component?
(c) What is the probability that the test sample will contain two defective
components?
6. A certain type of seed has a 90% germination rate. If six seeds are planted,
what is the chance that
(a) exactly five seeds will germinate?
(b) at least five seeds will germinate?
7. A bag contains 7 white, 3 red, and 5 black balls. Three are drawn at random
without replacement. Find the probabilities that (a) no ball is red, (b) exactly
one is red, (c) at least one is red, (d) all are of the same colour, (e) no two are
of the same colour.
9. If the probability that any person 30 years old will be dead within a year is
0.01, find the probability that out of a group of eight such persons, (a) none,
(b) exactly one, (c) not more than one, (d) at least one will be dead within a
year.
10. A and B arrange to meet between 3 p.m. and 4 p.m., but that each should
wait no longer than 5 min for the other. Assuming all arrival times between
3 o'clock and 4 o’clock to be equally likely, find the probability that they meet.
11. A manufacturer has to decide whether or not to produce and market a new
Christmas novelty toy. If he decides to manufacture he will have to purchase a
special plant and scrap it at the end of the year. If a machine costing £10 000
is bought, the cost of manufacture will be £1 per unit; if he buys a
machine costing £20 000 the cost will be 50p per unit. The selling
price will be £4.50 per unit.
Given the following probabilities of sales as:
12. Three men arrange to meet one evening at the “Swan Inn’ in a certain town.
There are, however, three inns called ‘The Swan’ in the town. Assuming that each
man is equally likely to go to any one of these inns
Figure 1.5 (assembly unit)
14. A marketing director has just launched four new products onto the market.
A market research survey showed that the chance of any given retailer adopting
the products was
Product A 0.95 Product C 0.80
Product B 0.50 Product D 0.30
What proportion of retailers will (a) take all four new products, (b) take
A, B and C but not D?
= 0.000 003 7
(a) By the multiplication law, the probability of drill and lathe being
idle = 0.50 x 0.60 = 0.30
(b) By the multiplication law, the probability of all machines
being busy = 0.50 x 0.40 x 0.70 x 0.80 = 0.112
(c) Probability of all machines being idle at any
instant = 0.5 x 0.6 x 0.3 x 0.2 = 0.018
By the multiplication law, probability of selecting five good items from the
large batch = 0.95 x 0.95 x 0.95 x 0.95 x 0.95 = 0.77
(b) In a sample of five, one defective item can arise in the following five
ways:
D A A A A;  A D A A A;  A A D A A;  A A A D A;  A A A A D
(D = defective part, A = acceptable part)
The probability of each one of these mutually exclusive ways occurring is
0.05 × 0.95⁴ = 0.0407, so the required probability is 5 × 0.0407 = 0.2035
(c) Ina sample of five, two defective items can occur in the following ways:
7. Conditional probability:
(a) Probability that no ball is red = 12/15 × 11/14 × 10/13 = 0.4835
(b) Probability that 1 ball is red = 3 × (3/15 × 12/14 × 11/13) = 0.4352
(c) Probability that at least 1 is red = 1—0.4835 = 0.5165
(d) Probability that all are the same colour
= P(all white) + P(all red) + P(all black)
By the multiplication law, probability that all 8 will be alive = 0.99⁸ = 0.92
10. At the present stage this is best done geometrically, as in figure 1.6,
A and B will meet if the point representing their two arrival times is in the
shaded area.
P(meet) = 1 − P(point in unshaded area) = 1 − (55/60)² = 23/144
Figure 1.6 (B's arrival time plotted against A's arrival time; the shaded band, 5 min wide about the diagonal, contains the points for which they meet)
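The geometric answer can be confirmed by simulating the two arrival times, as in the Python sketch below (added for illustration).

    import random

    random.seed(2)
    trials = 200_000
    meet = sum(abs(random.uniform(0, 60) - random.uniform(0, 60)) <= 5
               for _ in range(trials))
    print(meet / trials)                     # close to 23/144 = 0.16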
11. There are three possibilities: (a) to produce the toys on machine costing
£10 000; (b) to produce the toys on machine costing £20 000; (c) not to
produce the toys at all.
The solution is obtained by calculating the expected profits for each
possibility.
(a) Profit on sales of 2000 = £[4.50 − (1 + 10 000/2000)] per unit
    Profit on sales of 5000 = £[4.50 − (1 + 10 000/5000)] per unit
(b) As before: profit on sales of 2000 = £[4.50 − (0.50 + 20 000/2000)] per unit
=1x3x4=3
(b) P(all three men meet) = P(1st goes to any inn) × P(2nd goes to the same inn) × P(3rd goes to the same inn)
= 1 × 1/3 × 1/3 = 1/9
13. (a) There will be no defective components in the assembly if all five
components selected are acceptable ones. The chance of such an occurrence is
given by the product of the individual probabilities and is
0.90 × 0.90 × 0.98 × 0.95 × 0.95 = 0.716
(b) If the assembly contains one defective component, any one (but only
one) of the five components could be the defective. There are thus five mutually
exclusive ways of getting the required result, each of these ways having its
probability determined by multiplying the appropriate individual probabilities
together.
                  1st way   2nd way   3rd way   4th way   5th way
1st x component      D         A         A         A         A
2nd x component      A         D         A         A         A
y component          A         A         D         A         A
1st z component      A         A         A         D         A
2nd z component      A         A         A         A         D
(D = defective part, A = acceptable part)
The probability of there being just one defective component in the assembly
is given by
2 × (0.10 × 0.90 × 0.98 × 0.95 × 0.95) + (0.90 × 0.90 × 0.02 × 0.95 × 0.95)
+ 2 × (0.90 × 0.90 × 0.98 × 0.05 × 0.95) = 0.1592 + 0.0146 + 0.0754 = 0.2492
14. Assume the products to be independent of each other. Then
(a) proportion of retailers taking all four products = 0.95 × 0.50 × 0.80 × 0.30 = 0.114
(b) proportion taking A, B and C but not D = 0.95 × 0.50 × 0.80 × 0.70 = 0.266
1.5.1 Experiment 1
This experiment, in being the most comprehensive of the experiments in the
book, is unfortunately also the longest as far as data collection goes. However,
as will be seen from the points made, the results more than justify the time.
Should time be critical it is possible to miss experiment 1 and carry out
experiments 2 and 3 which are much speedier. In experiment 1 the data
collection time is relatively long since the three dice have to be thrown 100
times (this cannot be reduced without drastically affecting the results).
Appendix 1 contains full details of the analysis of eight groups’ results for
the first experiment, and the following points should be observed in summarising
the experiment:
1.5.2 Experiment 2
This gives a speedy demonstration of Bernoulli's law. As n, the number of
trials, increases, the estimate of p, the probability, gets closer to the true
population value. For n = 1 the estimate is either p = 1 or 0 and, as n increases,
the estimates tend to get closer to p = 0.5. Figure 1.7 shows a typical result.
Figure 1.7 (typical result: the estimate of p plotted against the number of trials n, settling towards p = 0.5)
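The behaviour shown in figure 1.7 is easy to reproduce without apparatus. The Python sketch below (added here, not from the original text) tosses a fair coin and prints the running estimate of p at a few sample sizes; the estimates settle towards 0.5 as n grows.

    import random

    random.seed(3)
    heads = 0
    for n in range(1, 1001):
        heads += random.random() < 0.5       # one toss of a fair coin
        if n in (1, 10, 100, 1000):
            print(n, heads / n)              # estimate of p after n tosses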
1.5.3 Experiment 3
Again this is a simple demonstration of probability laws and sampling errors.
Four coins are tossed 50 times and in each toss the number of heads is
recorded. See table 6 of the laboratory manual.
Note
It is advisable to use the specially designed shakers or something similar.
Otherwise the coins will roll or bias in the tossing will occur. The results
of this experiment are summarised in table 8 of the laboratory manual and the
variation in groups’ results are stressed as is the fact that the results based on all
groups’ readings are closer to the theoretical than those for one group only.
to obey the theoretical law exactly, they will have been shown that in statistics
all samples vary, but an underlying pattern emerges. The larger the samples
used the closer this pattern tends to be to results predicted by theory. The
basic laws—those of addition and multiplication—and other concepts of
probability theory, have been illustrated.
Other experiments with decimal dice can be designed.†
Number of persons: 2 or 3.
Object
The experiment is designed to illustrate
Method
Throw three dice (2 white, 1 coloured) a hundred times. For each throw,
record in table 1
(a) the number of ones
(b) the number of sixes
(c) the score of the coloured die
(d) the total score of the three dice.
Draw up these results, together with those of other groups, into tables
(2, 3, and 4).
Analysis
1. For each set of 100 results and for the combined figures of all groups,
calculate the probabilities that, in a throw of three dice:
† Details from: Technical Prototypes (Sales) Limited, 1A West Holme Street, Leicester.
Table 1 (specimen record sheet for the 100 throws of the three dice)
Table 2 (experimental and theoretical probabilities for the total score of the three dice, together with the probability of a score of more than 13)
2. Compare these results with those expected from theory and comment on
the agreement both for individual groups and for the combined observations.
3. Draw probability histograms both for the score of the coloured die and for
the total score of the three dice, on page 27. Do this for your own group's
readings and for the combined results of all groups.
Table 3 (numbers of throws in which given numbers of ones and of sixes occur, with each group's experimental probabilities)
Table 4 (each group's experimental probabilities, and the theoretical probabilities, for the score of the coloured die)
2 Theory of distributions
Referring to these data, it will be seen that the figures vary one from the
other; the first is 7.8 h, the next 8.4 h and so on; there is one as low as 7.1 h and
one as high as 8.7 h.
In statistics the basic logic is inductive, and the data must be looked at as a
whole and not as a collection of individual readings.
It is often surprising to the non-statistician or deterministic scientist how
often regularities appear in these statistical counts.
The process of grouping data consists of two steps usually carried out
together.
(1) The range of the observations (largest value minus smallest value) is found.
(2) The range is then sub-divided into a series of steps called class intervals.
These class intervals are usually of equal size, although in certain cases unequal
class intervals are used. For usual sample sizes, the number of class intervals is
chosen to be between 8 and 20, although this should be regarded as a general
rule only. For table 2.1, class intervals of size 0.2 h were chosen, i.e.,
7.1–7.3, 7.3–7.5, 7.5–7.7, . . ., 8.7–8.9
(3) More precise definition of the boundaries of the class intervals is however
required, otherwise readings which fall say at 7.3 can be placed in either of two
class intervals.
Since in practice the reading recorded as 7.3 h could have any value between
7.25 h and 7.35 h (normal technique of rounding off), the class boundaries will
now be taken as:
7.05-7.25, 7.25-7.45, .. ., 8.45-8.65, 8.65-8.85
Note: Since an extra digit is used there is no possibility of any reading’s falling
on the boundary of a class.
The summarising of the data of table 2.1 into a distribution is shown in
table 2.2. For each observation in table 2.1 a stroke is put opposite the sub-range
into which the reading falls. The strokes are made in groups of five for easy
summation.
Class interval (h)   Frequency   Proportion
7.05–7.25                 1          0.01
7.25–7.45                 5          0.05
7.45–7.65                10          0.11
7.65–7.85                19          0.20
7.85–8.05                27          0.28
8.05–8.25                22          0.23
8.25–8.45                 6          0.06
8.45–8.65                 3          0.03
8.65–8.85                 2          0.02
                   Total = 95   Total = 1.00

Table 2.2
The last operation is to total the strokes and enter the totals in the next to
last column in table 2.2 obtaining what is called a frequency distribution. There
is, for example, one reading in class interval 7.05–7.25, five readings in the
next, ten in the next, and so on. Such a table is called a frequency distribution
since it shows how the individuals are distributed between the groups or class
intervals. Diagrams are more easily assimilated so it is normal to plot the
Figure 2.1 (frequency histogram of the distribution of table 2.2; frequency against time in hours)
Figure 2.2. The effect of the sample size on the histogram shape.
here being taken from an experiment in a laboratory. A sample size of 100 gives
an irregular shape similar to those obtained from the data of output times.
However, with increasing sample size, narrower class intervals can be used and
the frequency distribution becomes more uniform in shape until with a sample of
10 000 it is almost smooth. The limit as the sample size becomes infinite is also
shown. Thus with small samples, irregularities are to be expected in the frequency
distributions, even when the population gives a smooth curve.
It is the assumption that the population from which the data was obtained
has a smooth curve (although not all samples have), that enables the statistician
to use the mathematics of statistics.
Figure 2.3
Consider now the 1st moment of the distribution about the origin

Σ (i = 1 to N) p_i x_i = x̄   (the arithmetical average)

Thus the 1st statistic or measure is the arithmetical average x̄. Higher moments
are now taken about this arithmetical average rather than the origin.
Thus, the 2nd moment about the arithmetical average

Σ (i = 1 to N) p_i (x_i − x̄)²

This 2nd moment is called the variance in statistics, and its square root is called
the standard deviation.
Thus the standard deviation of the distribution

= √[ Σ (i = 1 to N) p_i (x_i − x̄)² ]

4th moment about the average = Σ (i = 1 to N) p_i (x_i − x̄)⁴

or in general the kth moment about the average = Σ (i = 1 to N) p_i (x_i − x̄)^k
The first two moments, the mean and the variance, are by far the most
important.
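The definitions above translate directly into a few lines of code. The Python sketch below (added for illustration) computes the mean, variance and 4th moment of a discrete probability distribution, using the score of a single fair die as the example.

    def moments(xs, ps):
        mean = sum(p * x for x, p in zip(xs, ps))
        var = sum(p * (x - mean) ** 2 for x, p in zip(xs, ps))
        m4 = sum(p * (x - mean) ** 4 for x, p in zip(xs, ps))
        return mean, var, m4

    xs = [1, 2, 3, 4, 5, 6]                  # score of one fair die
    ps = [1 / 6] * 6
    mean, var, m4 = moments(xs, ps)
    print(mean, var, var ** 0.5)             # 3.5, 2.917, 1.708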
Random Sample
A random sample is a sample selected without bias, i.e., one for which every
member of the population has an equal chance of being included in the sample.
Population or Universe
This is the total number of possible observations. This concept of a population
is fundamental to statistics. All data studied are in sample form and the
statistician’s sample is regarded as having been drawn from the population of all
possible events. A population may be finite or infinite. In practice, many finite
populations are so large they can be conveniently considered as infinite in size.
3.95-4.95 8
4.95-5.95 7
5.95-6.95 5
The class boundaries shown in this example are suitable for measurements
recorded to the nearest 0.1 of a unit. The boundaries chosen are convenient for
easy summary of the raw data since the first class shown contains all
measurements whose integer part is 4, the next class all measurements starting
with 5 and so on.
It would have been valid but less convenient to choose the class as, say,
3.25–4.25, 4.25–5.25, . . .
In grouping, any group is called a class and the number of values falling in
the class is the class frequency. The magnitude of the range of the group is
called the class interval, i.e., 3.95-4.95 or 1.
Number of Groups
For simplicity of calculation, the number of intervals chosen should not be too
large, preferably not more than twenty. Again, in order that the results obtained
may be sufficiently accurate, the number must not be too small, preferably
not less than eight.
Types of Variable
Continuous. A continuous variable is one in which the variable can take every
value between certain limits a and b, say.
Discrete. A discrete variable is one which takes certain values only—frequently
part or all of the set of positive integers. For example, each member of a
sample may or may not possess a certain attribute and the observation recorded
(the value of the variable) might be the number of sample members which possess
the given attribute.
Frequency Histogram
A frequency distribution shows the number of observations falling into each class
interval when a sample is grouped according to the magnitude of the values. If the
class frequency is plotted as a rectangular block on the class interval, the
diagram is called a frequency histogram. Note: Area is proportional to frequency.
Probability Histograms
A probability histogram is the graphical picture obtained when the grouped
sample data are plotted, the class probability being erected as a rectangular
block on the class interval. The area above any class interval is equal to the
probability of an observation being in that class since the total area under the
histogram is equal to one.
Variate
A variate is a variable which possesses a probability distribution.
Type 1: Unimodal
Examples of this variation pattern are: intelligence quotients of children,
heights (and/or weights) of people, nearly all man-made objects when produced
under controlled conditions (length of bolts mass-produced on capstans, etc.).
A simple example of this type of distribution can be illustrated if one
assumes that the aim is to make each item or product alike but that there
exists a very large number of small independent forces deflecting the aim, and
under such conditions, a unimodal distribution arises. For example, consider
a machine tool mass-producing screws. The setter sets the machine up as
correctly as he can and then passes it over to the operator and the screws
produced form a pattern of variation of type 1. The machine is set to produce
each screw exactly the same, but, because of a large number of deflecting
forces present, such as small particles of grit in the cooling oil, vibrations in
the machine, slight variation in the metal—manufacturing conditions are not
constant, hence there is variation in the final product. (See simple quincunx
unit on page 61.)
Type 4: Bimodal
This type cannot be classified as a separate form unless more evidence of
measures conforming to this pattern of variation are discovered. In most cases
this type arises from the combination of two distributions of type 1 (see
figure 2.5).
Figure 2.5. Bimodal distribution arising from two type-1 distributions with
different means m₁ and m₂.
Type 6: U-Shaped
This type is fascinating in that its pattern is the opposite to type 1. A variable
where the least probable values are those around the average would not be
expected intuitively and it is rare when it occurs in practice. One example,
however, is the degree of cloudiness of the sky—at certain times of the year
the sky is more likely to be completely clear or completely cloudy than anything
in between.
The 1st moment (arithmetic average) = Σ f_i x_i / Σ f_i = x̄
or
Σ p_i x_i = x̄
The 2nd moment about the average = Σ p_i (x_i − x̄)²
To simplify the arithmetic the variable is coded, working from an arbitrary origin x₀ in units of the class width c:
u_i = (x_i − x₀)/c   or   x_i = c u_i + x₀
1st moment   x̄ = x₀ + c (Σ f_i u_i / Σ f_i)
2nd moment   (s′)² = c² [ Σ f_i u_i² − (Σ f_i u_i)²/Σ f_i ] / Σ f_i
Example
The values given in table 2.3 have been calculated using the data from table 2.2.
Class interval   Mid-point   f    u    uf    u²f
7.05–7.25          7.15      1   −4    −4    16
7.25–7.45          7.35      5   −3   −15    45
7.45–7.65          7.55     10   −2   −20    40
7.65–7.85          7.75     19   −1   −19    19
7.85–8.05          7.95     27    0     0     0
8.05–8.25          8.15     22   +1   +22    22
8.25–8.45          8.35      6   +2   +12    24
8.45–8.65          8.55      3   +3    +9    27
8.65–8.85          8.75      2   +4    +8    32
                       Σf = 95   Σuf = −7   Σu²f = 225

Table 2.3
arithmetic average = 7.95 + 0.20 × (−7/95) = 7.94 h
Table 2.4
Thus the 1st moment calculation is unbiased while the answer given for the
2nd moment should be reduced by c?/12.
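The coded calculation of table 2.3 is easily mechanised. The Python sketch below (added here, not part of the original text) repeats it for the grouped data of table 2.2, reproduces the mean of 7.94 h and also gives the variance and standard deviation.

    mid = [7.15, 7.35, 7.55, 7.75, 7.95, 8.15, 8.35, 8.55, 8.75]   # class mid-points
    freq = [1, 5, 10, 19, 27, 22, 6, 3, 2]                          # frequencies from table 2.2
    x0, c = 7.95, 0.20                                              # arbitrary origin and class width

    u = [round((x - x0) / c) for x in mid]
    n = sum(freq)
    suf = sum(ui * fi for ui, fi in zip(u, freq))                   # sum of uf  = -7
    su2f = sum(ui * ui * fi for ui, fi in zip(u, freq))             # sum of u2f = 225

    mean = x0 + c * suf / n                                         # 7.94 h
    var = c * c * (su2f - suf ** 2 / n) / n                         # before Sheppard's correction
    print(round(mean, 2), round(var, 4), round(var ** 0.5, 3))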
4. The number of defects per shift from a large indexing machine are given
below for the last 52 shifts, summarised as a frequency distribution:

Defects per shift     0   1   2   3   4   5   6   7   8   9
Number of shifts      3   7   9  12   9   6   3   2   0   1
5. The crane handling times, in minutes, for a sample of 100 jobs lifted and
moved by an outside yard mobile crane are given below, summarised as a frequency distribution:

Time (min)       0–9.99  10–19.99  20–29.99  30–39.99  40–49.99  50–69.99  70–99.99  100–139.99  140–199.99
Number of jobs      35       30        15         6         3         4         2          3           2
6. The lives, in hours, of a sample of 100 light bulbs are given below:
1067 919 1196 785 1126 936 918 1156 920 1192
855 1092 1162 1170 929 950 905 972 1035 922
1022 978 832 1009 1157 1 151 1009 765 958 1039
923 1333 811 1217 1085 896 958 1311 1037 1083
999 932 1035 944 1049 940 lig2 1145 1026 1040
901 1324 818 1250 1203 1 078 890 1303 1147 1289
1187 1067 1118 1037 958 760 1101 949 883 699
824 643 980 935 878 934 910 1058 867 1083
844 814 1103 1000 788 1 143 935 1069 990 880
1037 Mey 863 990 1035 1 112 93 970 1258 1029
7. The number of goals scored in 57 English and Scottish league matches for
Saturday 23rd September, 1969, was:
Goals per match      0   1   2   3   4   5   6   7   8
Number of matches    2   9  11  15   8   5   5   1   1
9. The sales value for the last 30 periods of a non-seasonal product are given
below in units of £100:
43 41 74 61 79 60 71 69 63 c
70 66 64 71 71 74 56 74 41 71
63 57 57 68 64 62 a oe 40 76
10. The records of the total score of three dice in 100 throws are given below:
16 + 9 12 11 8 15 13 12 13
8 7 6 13 10 1] 16 14 iy i?
14 14 ~ 13 9 13 8 10 12 14
8 = 10 6 9 10 13 12 13 13
16 ij 13 12 9 8 10 11 12 10
15 12 4 16 10 9 13 10 9 12
9 - 14 13 7 6 11 9 15 8
5 £2 7 6 7 13 13 iia| US. 14
12 i 10 12 12 12 13 3 16 4
† These data were taken from Facts from Figures by M. J. Moroney, Pelican.
Figure 2.6 (frequency histogram; frequency against basic minutes)
Transforming x = x₀ + cu
Let x₀ = 0.09, c = 0.01

Average   x̄ = x₀ + c(Σuf/Σf) = 0.09 + 0.01 × (−18/60) = 0.087 min

Variance of sample
(s′)² = c²[Σu²f − (Σuf)²/Σf]/Σf = 0.01² × [104 − (−18)²/60]/60
      = 0.01² × (104 − 5.4)/60 = 1.64 × 10⁻⁴
Standard deviation s′ = √(1.64 × 10⁻⁴) = 0.013 min
2. Range = 0.46 — 0.17 = 0.29 min; size of class interval = 0.03 min, giving
9-10 class intervals.
Class interval (min)   Mid-point    f    u    uf    u²f
0.165–0.195              0.18       3   −3    −9    27
0.195–0.225              0.21       5   −2   −10    20
0.225–0.255              0.24      25   −1   −25    25
0.255–0.285              0.27      26    0     0     0
0.285–0.315              0.30      19   +1   +19    19
0.315–0.345              0.33      10   +2   +20    40
0.345–0.375              0.36       3   +3    +9    27
0.375–0.405              0.39       0   +4     0     0
0.405–0.435              0.42       1   +5    +5    25
0.435–0.465              0.45       1   +6    +6    36
                             Σf = 93   Σuf = +15   Σu²f = 219

Table 2.6
(For histogram see figure 2.7.)
x₀ = 0.27, c = 0.03
Average time
x̄ = x₀ + c(Σuf/Σf) = 0.27 + 0.03 × (15/93) = 0.275 min
Variance of sample
(s′)² = c²[Σu²f − (Σuf)²/Σf]/Σf = (0.03²/93) × [219 − (+15)²/93]
      = (0.03²/93) × 216.58 = 0.0021
Figure 2.7 (frequency histogram; frequency against basic minutes)
3. Range = 4.60 —0.01 = 4.59 min; width of class interval = 0.5 min.
Class (min)    f    u    uf    u²f
0–0.499       19   −2   −38    76
0.50–0.999    11   −1   −11    11
1.00–1.499     7    0     0     0
1.50–1.999     6   +1    +6     6
2.00–2.499     4   +2    +8    16
2.50–2.999     3   +3    +9    27
3.00–3.499     2   +4    +8    32
3.50–3.999     3   +5   +15    75
4.00–4.499     0   +6     0     0
4.50–4.999     1   +7    +7    49
        Σf = 56   Σuf = +4   Σu²f = 292

Table 2.7
(For histogram, see figure 2.8.)
Transform x = x₀ + cu
Let x₀ = 1.25, c = 0.50
x̄ = x₀ + c(Σuf/Σf) = 1.25 + 0.50 × (4/56) = 1.29 min
Number of defects    f    u    uf    u²f
0                    3   −3    −9    27
1                    7   −2   −14    28
2                    9   −1    −9     9
3                   12    0     0     0
4                    9   +1    +9     9
5                    6   +2   +12    24
6                    3   +3    +9    27
7                    2   +4    +8    32
8                    0   +5     0     0
9                    1   +6    +6    36
              Σf = 52   Σuf = +12   Σu²f = 192

Table 2.8
(For histogram see figure 2.9.)
Class (min)    Mid-point    f      u       uf       u²f
0–9.99              5      35     −3     −105      315
10–19.99           15      30     −2      −60      120
20–29.99           25      15     −1      −15       15
30–39.99           35       6      0        0        0
40–49.99           45       3     +1       +3        3
50–69.99           60       4   +2.5      +10       25
70–99.99           85       2     +5      +10       50
100–139.99        120       3   +8.5    +25.5    216.75
140–199.99        170       2  +13.5      +27     364.5
                     Σf = 100   Σuf = −104.5   Σu²f = 1108.5

Table 2.9
(For histogram, see figure 2.10.)
Figure 2.10 (frequency histogram of the crane handling times)
Transform x = x₀ + cu
Let x₀ = 35 min, c = 10 min
Class (h)          f    u    uf    u²f
549.5–649.5        1   −4    −4    16
649.5–749.5        1   −3    −3     9
749.5–849.5       10   −2   −20    40
849.5–949.5       26   −1   −26    26
949.5–1049.5      26    0     0     0
1049.5–1149.5     18   +1   +18    18
1149.5–1249.5     11   +2   +22    44
1249.5–1349.5      7   +3   +21    63
            Σf = 100   Σuf = +8   Σu²f = 216

Table 2.10
(For histogram, see figure 2.11.)
Transforming x = x₀ + uc
where c = 100 h, x₀ = 1000 h
Average life of bulbs,
x̄ = 1000 + 100 × (8/100) = 1008 h
Number of goals   Frequency (f)    u    uf    u²f
0                       2         −4    −8    32
1                       9         −3   −27    81
2                      11         −2   −22    44
3                      15         −1   −15    15
4                       8          0     0     0
5                       5         +1    +5     5
6                       5         +2   +10    20
7                       1         +3    +3     9
8                       1         +4    +4    16
                Σf = 57    Σuf = −50    Σu²f = 222

Table 2.11
(For histogram, see figure 2.12.)
x₀ = 4, c = 1
Average goals/match
x̄ = 4 + 1 × (−50/57) = 3.12
Variance of sample
(s′)² = 1² × [222 − (−50)²/57]/57 = 3.13
Figure 2.12. Number of goals scored in soccer matches.
Class            f    u    uf    u²f
54.5–64.5        1   −4    −4    16
64.5–74.5        2   −3    −6    18
74.5–84.5        9   −2   −18    36
84.5–94.5       22   −1   −22    22
94.5–104.5      33    0     0     0
104.5–114.5     22   +1   +22    22
114.5–124.5      8   +2   +16    32
124.5–134.5      2   +3    +6    18
134.5–144.5      1   +4    +4    16
          Σf = 100   Σuf = −2   Σu²f = 180

Table 2.12
Transforming x = x₀ + cu (For histogram, see figure 2.13.)
where x₀ = 99.5, c = 10
Average intelligence quotient
x̄ = 99.5 + 10 × (−2/100) = 99.3
Variance of sample
(s′)² = 10² × [180 − (−2)²/100]/100 = 180
Table 2.13
(For histogram, see figure 2.14.)
where x₀ = 61.5, c = 4
Average sales/period
x̄ = 61.5 + 4 × (11/30) = 63
Variance of sample
(s′)² = 4² × [207 − (11)²/30]/30 = 108.3
Standard deviation of sample s′ = 10.4
Figure 2.14. Sales value of a product over 30 time periods.
Table 2.14
where x₀ = 10.5, c = 2
Average score
x̄ = 10.5 + 2 × (0/100) = 10.5
Variance of sample
(s′)² = 2² × [250 − (0)²/100]/100 = 10
Figure 2.16
For the cases illustrated in figure 2.16:
(a) The cutter has held the standard and produced a bell-shaped curve.
(b) Here, either consciously or not, the standard has been changing.
(c) Here the negative skew distribution has arisen from the cutter, again either
consciously or not, placing control on the short end of the straw.
Laboratory Equipment
Shove-halfpenny board or specially designed board (available from Technical
Prototypes (Sales) Ltd).
Method
After one trial, carry out 50 further trials, measuring the distance travelled each
time, the object being to send the disc the same distance at each trial.
Analysis
Summarise the data into a distribution, draw a histogram and calculate the mean
and standard deviation.
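The analysis asked for can be carried out in a few lines of code. The Python sketch below (added here; the distance readings are hypothetical, for illustration only) groups a set of measurements and gives the mean and standard deviation.

    from collections import Counter
    from math import sqrt

    distances = [42, 45, 44, 47, 43, 46, 44, 45, 48, 44]   # hypothetical readings (cm)

    freq = Counter(distances)                 # frequency distribution
    for value in sorted(freq):
        print(value, freq[value])

    n = len(distances)
    mean = sum(distances) / n
    sd = sqrt(sum((d - mean) ** 2 for d in distances) / n)
    print(round(mean, 2), round(sd, 2))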
P(x) = nCx p^x (1 − p)^(n−x)
3.2.3 The Poisson Law
If the chance of an event occurring at any instant is constant in a continuum of
time, and if the average number of successes in time t is m, then the probability
of x successes in time t is

P(x) = m^x e^(−m) / x!
Tutors must stress the relationship between these distributions so that students
can understand the type to use for any given situation.
Tutors can introduce students to the use of binomial distribution in place of
hypergeometric distribution in sampling theory when n/N < 0.10.
Students should be introduced to the use of statistical tables at this stage. For
all examples and problems, the complementary set of tables, namely Statistical
Tables by Murdoch and Barnes, published by Macmillan, has been used. As
mentioned in the preface, references to these tables will be followed by an
asterisk.
Note: The first and second moments of the binomial and Poisson distributions
are given below.
                                          Binomial      Poisson
1st moment (mean) μ                          np             m
2nd moment about the mean (variance) σ²    np(1 − p)        m
2. A distribution firm has 50 lorries in service delivering its goods; given that
lorries break down randomly and that each lorry utilisation is 90%, what
proportion of the time will
(a) exactly three lorries be broken down?
(b) more than five lorries be broken down?
(c) less than three lorries be broken down?
This is the binomial distribution since the probability of success, i.e. the
probability of a lorry breaking down, is p = 0.10 and this probability is constant.
The number of trials n = 50.
(a) Probability of exactly three lorries being broken down, from table 1*,
P(3) = 50C3 × 0.10³ × 0.90⁴⁷ = 0.139
(c) P(x < 3) = 1 − Σ (x = 3 to 50) 50Cx × 0.10^x × 0.90^(50−x) = 1 − 0.8883 = 0.1117
3. How many times must a die be rolled in order that the probability of 5
occurring is at least 0.75?
This can be solved using the binomial distribution. Probability of
success, i.e. a 5 occurring, is p = 1/6.
Let k be the unknown number of rolls required, then the probability of x
fives in k rolls is
P(x) = kCx (1/6)^x (5/6)^(k−x)
The probability required is
Σ (x = 1 to k) kCx (1/6)^x (5/6)^(k−x) ≥ 0.75
Now
1 − Σ (x = 1 to k) kCx (1/6)^x (5/6)^(k−x) = probability of not getting a 5 in k throws = (5/6)^k
so that
(5/6)^k ≤ 1 − 0.75 = 0.25
giving k ≥ 7.6; the die must therefore be rolled at least 8 times.
4. A firm receives very large consignments of nuts from its supplier. A random
sample of 20 is taken from each consignment. If the consignment is in fact 30%
defective, what is
(a) probability of finding no defective nuts in the sample?
(b) probability of finding five or more defective nuts in the sample?
This is strictly a hypergeometric problem but it can be solved by using the
binomial distribution since probability of success, i.e. of obtaining a defect, is
p = 0.30 which can be assumed constant, the consignment being large enough for the
removal of the sample not to alter this proportion appreciably.

P(x ≥ 5) = Σ (x = 5 to 20) 20Cx × 0.30^x × (1 − 0.30)^(20−x)
5. The average usage of a spare part is one per month. Assuming that all machines
using the part are independent and that breakdowns occur at random, what is
(a) the probability of using three spares in any month?
(b) the level of spares which must be carried at the beginning of each month
so that the probability of running out of stock in any month is at most 1 in
100?
This is the Poisson distribution.
The expected usage m = 1.0
(a) .. Probability of using three spares in any month
P(3) = 1.0³ e^(−1.0)/3!
from table 2*, P(3) = 0.0803 − 0.0190 = 0.0613
(b) This question is equivalent to: what demand in a month has a probability
of at most 0.01 of being equalled or exceeded?
From table 2*, P(x ≥ 5) = 0.0037 and P(x ≥ 4) = 0.0190, so a stock of four spares
at the beginning of each month meets the requirement.
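Both parts of this example can be reproduced without tables, as in the Python sketch below (added for illustration).

    from math import exp, factorial

    m = 1.0                                          # average demand per month

    def poisson(x):
        return m ** x * exp(-m) / factorial(x)

    print(round(poisson(3), 4))                      # 0.0613, as in part (a)

    # part (b): smallest stock s with P(demand > s) no more than 0.01
    s, tail = 0, 1 - poisson(0)
    while tail > 0.01:
        s += 1
        tail -= poisson(s)
    print(s, round(tail, 4))                         # 4 spares, P(demand > 4) = 0.0037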
random, and calculate and draw the probability distribution of the number of
failures per six months per machine over 100 machines.
Calculate the average and the standard deviation of the distribution.
This is the Poisson distribution.
Expected number of failures per machine per six months, m = 2.
Number of failures   Probability P_i   Expected number of failures f_i     u     uf_i    u²f_i
0                        0.1353                  13.5                     −2    −27.0     54.0
1                        0.2707                  27.1                     −1    −27.1     27.1
2                        0.2707                  27.1                      0       0        0
3                        0.1804                  18.0                     +1    +18.0     18.0
4                        0.0902                   9.0                     +2    +18.0     36.0
5                        0.0361                   3.6                     +3    +10.8     32.4
6                        0.0121                   1.2                     +4     +4.8     19.2
7 or over                0.0045                   0.5                     +5     +2.5     12.5
                         1.0000            Σf_i = 100                          Σuf_i = 0   Σu²f_i = 199.2

Table 3.1. The values have been calculated from table 2* of statistical tables.
Transform x = uc + x₀
x₀ = 2, c = 1
x̄ = x₀ + c(Σuf/Σf) = 2 + 0/100 = 2
Variance
(s′)² = c²[Σu²f − (Σuf)²/Σf]/Σf = (199.2 − 0)/100 = 1.992
Standard deviation = 1.41
Students will be introduced here to some of the logic used later so that they
can see, even at this introductory stage, something of the overall analysis using
statistical methods.
Table 3.2
Table 3.3
The agreement will be seen to be fairly close and when tested (see chapter 8), is
a good fit. It is interesting to see that the greater part of the variation is due to this
basic law of variation. However, larger samples tend to show that the Poisson
does not give a correct fit in this particular context.
Number of deaths/corps/year     0     1     2     3     4
Frequency                     109    65    22     3     1
Table 3.4
From this table the average number of deaths/corps/year, m = 0.61
Setting up the null hypothesis, namely, that the probability of a death has been
constant over the years and is the same for each corps, is equivalent to
postulating that this pattern of variation follows the Poisson law. Fitting a
Poisson distribution to these data and comparing the fit, gives a method of
testing this hypothesis. Using table 2* of statistical tables, and without
interpolating, i.e. use m = 0.60, gives the results shown in table 3.5
Table 3.5
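The fit can be checked directly. The Python sketch below (added here; it uses the exact mean 0.61 rather than the rounded table value 0.60) computes the expected Poisson frequencies for the 200 corps-years and sets them beside the observed figures.

    from math import exp, factorial

    observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}              # 200 corps-years
    n = sum(observed.values())
    m = sum(x * f for x, f in observed.items()) / n            # 0.61 deaths/corps/year

    for x in range(5):
        expected = n * m ** x * exp(-m) / factorial(x)
        print(x, observed[x], round(expected, 1))
    # expected: 108.7, 66.3, 20.2, 4.1, 0.6 against observed 109, 65, 22, 3, 1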
3. Outbreaks of War
The data in table 3.6 (from Mathematical Statistics by J. F. Ractliffe, O.U.P.) give
the number of outbreaks of war each year between the years 1500 and 1931
inclusive.
Table 3.6
Setting up a hypothesis that war was equally likely to break out at any instant
of time during this 432-year period would give rise to a Poisson distribution. The
fitting of this Poisson distribution to the data gives a method of testing this
hypothesis.
The average number of outbreaks/year = 0.69, taken as 0.70
Using table 2* of statistical tables, table 3.7 gives a comparison of the actual
variation with that of the Poisson. Again comparison shows the staggering fact
that life has closely followed this basic law of variation.
Number of outbreaks of war      0     1     2     3     4     5 or more     Total
Table 3.7
The actual demands at the MacDill Airforce Base per week for three spares
for B47 airframe over a period of 65 weeks are given in table 3.8.
The Poisson frequencies are obtained by using the statistical tables and
table 3.8 gives a comparison of the actual usage distribution with that of the
Poisson distribution.
The theoretical elements assuming the Poisson distribution are shown in the
table also. It will be seen that these distributions agree fairly well with actual
demands.
Demand per week   Actual frequency   Poisson frequency
0                        75                74.2
1                        90                90.1
2                        54                54.8
3                        22                22.2
4                         6                 6.8
5                         2                 1.6
6 or more                 1                 0.4
2. If the chance that any one of ten telephone lines is busy at any instant is 0.2,
what is the chance that five of the lines are busy?
(a) 10% and (b) 20% what percentage of the batches will be rejected?
5. In a quality control scheme, samples of five are taken from the production at
regular intervals of time.
What number of defectives in the samples will be exceeded 1/20 times if the
process average defective rate is (a) 10%, (b) 20%, (c) 30%?
6. In a process running at 20% defective, how often would you expect in a
sample of 20 that the rejects would exceed four?
7. From a group of eight male operators and five female operators a committee
of five is to be formed. What is the chance of
(a) all five being male?
(b) all five being female?
(c) how many ways can the committee be formed if there is exactly one
female on it?
8. In 1000 readings of the results of trials for an event of small probability, the
frequencies f; and the numbers x; of successes were:
Number of successes x_i     0     1     2     3     . . .
Frequency f_i             305   365   210    80    . . .
Show that the expected number of successes is 1.2 and calculate the expected
frequencies assuming Poisson distribution.
Calculate the variance of the distribution.
2. P(5 lines busy) = 10C5 × 0.2⁵ × 0.8⁵ = 0.0264, from table 1* in statistical tables.
x     Probability
0        0.33
1        0.41
2        0.20
3        0.05 approximately
4        0.01
5        0

Table 3.10
5. (a) n=5
p=0.10
From table 1*
Probability of exceeding 1 = 0.0815
Probability of exceeding 2 = 0.0086
1 in 20 times is a probability of 0.05
Number of defectives exceeded 1 in 20 times is greater than 1 but less
than 2.
(b) n=5
p= 0.20
From table 1*
Probability of more than 2 = 0.0579
Number of defectives exceeded 1 in 20 times (approximately) is 2
(c) n=5
p = 0.30
From table 1*
Probability of more than 3 = 0.0318
Number of defectives exceeded 1 in 20 times is nearly 3
6. n= 20
p= 0.20
From table 1*
Probability of more than four rejects = 0.3704
Four will be exceeded 37 times in 100
oe
Probability
Probability
-({)
Total number of ways
25 (Coury
Varirianc oak ae
>f se 1000 :
Binomial Distribution
Number of persons: 2 or 3.
Object
The experiment is designed to demonstrate the basic properties of the binomial
law.
Method
Using the binomial sampling box, take 50 samples of size 10 from the population,
recording in table 18, the number of coloured balls found in each sample.
(Note: Proportion of coloured (i.e. other than white) balls is 0.15.)
Analysis
1. Group the data of table 18 into the frequency distribution, using the top
part of table 19.
2. Obtain the experimental probability distribution of the number of coloured
balls found per sample and compare it with the theoretical probability
distribution.
3. Combine the frequencies for all groups, using the lower part of table 19,
and obtain the experimental probability distribution for these combined results.
Again, compare the observed and theoretical probability distributions.
4. Enter, in table 20, the total frequencies obtained by combining individual
groups’ results. Calculate the mean and standard deviation of this distribution
and compare them with the theoretical values given by np and √[np(1 − p)]
respectively where, in the present case, n = 10 and p= 0.15.
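Where the sampling box is not available the experiment can be imitated by simulation, as in the Python sketch below (added here, not part of the laboratory manual).

    import random
    from collections import Counter
    from math import comb, sqrt

    random.seed(4)
    n, p, samples = 10, 0.15, 50
    counts = [sum(random.random() < p for _ in range(n)) for _ in range(samples)]
    freq = Counter(counts)

    for x in range(n + 1):
        theory = comb(n, x) * p ** x * (1 - p) ** (n - x)
        print(x, freq.get(x, 0) / samples, round(theory, 4))    # experimental v. theoretical

    mean = sum(counts) / samples
    sd = sqrt(sum((c - mean) ** 2 for c in counts) / samples)
    print(round(mean, 2), round(sd, 2))      # compare with np = 1.5 and sqrt(np(1-p)) = 1.13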
Sample Results
(Specimen results: tables 18 and 19 of the laboratory manual; the tally of coloured balls per sample for trials 1–10, 11–20, 21–30, 31–40 and 41–50, the experimental frequency distribution, and the experimental and theoretical probability distributions.)
Table 3.14 (Table 20 of the laboratory manual)
Observed mean = Σxf/Σf; theoretical mean = np = 1.5 and theoretical standard
deviation = √[np(1 − p)] = √1.275 = 1.13.
4 Normal distribution
This equation is
y = [1/(σ√(2π))] e^(−(x − μ)²/2σ²)

where μ is the mean of the variable x
σ is the standard deviation of x
e is the well-known mathematical constant (= 2.718 approximately)
π is another well-known mathematical constant (= 3.142 approximately)
This equation can be used to derive various properties of the normal distribution.
A useful one is the relation between area under the curve and deviation from the
mean, but before looking at this we need to refer to a standardised variable.
Figure 4.1
Figure 4.2 (area under the normal curve between a and b)
values between a and b. This is equal to the probability that a single random
value of x will be bigger than a but less than b.
By standardising the variable and using the symmetry of the distribution,
table 3* can be used to find this probability as well as the unshaded areas in each
tail.
Figure 4.3 (shaded areas under the standardised normal curve for cases (a) to (d))
(a) u=1.0
Area = 0.1587
(b) u=2.0
Area in right tail = 0.02275
Thus shaded area = 1 — 0.02275 = 0.97725
(c) By symmetry the area to the left of u = −2 is the same as the area to the right
of u = +2.
Thus the shaded area = 0.02275
(d) Area above u = +0.5 is 0.3085
Area below u = —1.5 is 0.0668
Total unshaded area= 0.3753
shaded area = 0.6247
2. Jam is packed in tins of nominal net weight 1 kg. The actual weight of jam
delivered to a tin by the filling machine is normally distributed about the set
weight with standard deviation of 12 g.
(a) If the set, or average, filling of jam is 1 kg what proportion of tins
contain
(i) less than 985 g?
(ii) more than 1030 g?
(iii) between 985 and 1030 g?
(b) If not more than one tin in 100 is to contain less than the advertised
net weight, what must be the minimum setting of the filling machine in order to
achieve this requirement?
Figure 4.4
(i) u = (985 − 1000)/12 = −1.25
Using table 3* and the symmetry of the curve, the required proportion is 0.1056
(ii) u = (1030 − 1000)/12 = +2.5, giving a proportion of 0.0062 in the upper tail
(iii) To find a shaded area as in this case, the tail areas are found directly
from tables* and then subtracted from the total curve area (unity).
The lower and upper tail areas have already been found in (i) and (ii) and
thus the solution is
1 − (0.1056 + 0.0062) = 0.888

Figure 4.5 (the machine setting must place no more than 1% of fillings below 1000 g)
From table 4* (or table 3* working from the body of the table outwards),
1% of a normal distribution is cut off beyond 2.33 standard deviations from the
mean.
The required minimum value for the mean is thus
1000 + 2.33 x 12 = 1028 g= 1.028 kg
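All four answers in this example can be reproduced from the normal distribution function itself, as in the Python sketch below (added for illustration; it uses the error function in place of table 3*).

    from math import erf, sqrt

    def phi(u):                                       # standard normal cumulative probability
        return 0.5 * (1 + erf(u / sqrt(2)))

    mu, sigma = 1000.0, 12.0
    p_low = phi((985 - mu) / sigma)                   # (i)   0.1056
    p_high = 1 - phi((1030 - mu) / sigma)             # (ii)  0.0062
    print(round(p_low, 4), round(p_high, 4), round(1 - p_low - p_high, 4))   # (iii) 0.888

    setting = 1000 + 2.33 * 12                        # part (b): minimum machine setting
    print(setting, round(phi((1000 - setting) / 12), 4))   # 1028 g, about 1% below 1000 g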
3. The data from problem 1, chapter 2 (page 46), can be used to show the
fitting of a normal distribution. The observed and fitted distributions are also
shown plotted on arithmetic probability paper.
The mean of the distribution was 0.087 min and the standard deviation
0.013 min. The method of finding the proportion falling in each class of a
normal distribution with these parameters is shown in table 4.1. The expected
class frequencies are found by multiplying each class proportion by the total
observed frequency. Notice that the total of the expected normal frequencies is
not 60. The reason is that about ¼% of the fitted distribution lies outside the
range (0.045 to 0.125) that has been considered.
Table 4.2 shows the observed and expected normal class frequencies in
cumulative form as a percentage of the total frequency. Figure 4.6 shows these
two sets of data superimposed on the same piece of normal (or arithmetic)
probability paper.
The dots in figure 4.6 represent the observed points and the crosses represent
the fitted normal frequencies. Note that the plot of the cumulative normal
percentage frequencies does not quite give a straight line. The reason for this
is that the 7% of the normal distribution having values less than 0.045 has not
been included. If this 7% were added to each of the cumulative percentages in
the right-hand column of table 4.2 then a straight-line plot would be obtained.
Table 4.1 (standardised class boundaries, area of the fitted normal curve in each class, and the expected and observed class frequencies)
Table 4.2 (observed and expected normal class frequencies in cumulative percentage form)
Figure 4.6
A further point to note is that the cumulative frequencies are plotted against
the upper class boundaries (not the mid-points of the classes) since those are the
values below which lie the appropriate cumulative frequencies.
In addition, if the plotted points fall near enough on a straight line, which
implies approximate normality of the distribution, the mean and standard
deviation can be estimated graphically from the plot. To do this the best
straight line is drawn through the points (by eye is good enough). This straight
line will intersect the 16%, 50% and 84% lines on the frequency scale at three
points on the scale of the variable.
The value of the variable corresponding to the 50% point gives an estimation
of the median, which is the same as the mean if the distribution being plotted
is approximately symmetrical.
The horizontal separation between the 84% and 16% intercepts is equal to
2o for a straight line (normal) plot and so half of this distance gives an estimate
of the standard deviation.
Applying this to the fitted normal points, the mean is estimated as 0.087
and the standard deviation comes out as 0.5 × (0.100 − 0.074) = 0.013, the
figures used to derive the fitted frequencies in the first place. The small bias
referred to earlier, caused by omitting the bottom 7% of the distribution in the
plot, has had very little influence on the estimate in this case.
6. The data summarised in table 4.3 come from the analysis of 53 samples of
rock taken every few feet during a tin-mining operation. The original data for
each sample were obtained in terms of pounds of tin per ton of host rock but
since the distribution of such a measurement from point to point is quite skew,
the data were transformed by taking the ordinary logarithms of each sample
value and summarising the 53 numbers so obtained into the given frequency
distribution.
Fit a normal distribution to the data.
**7. The individual links used in making chains have a normal distribution of
strength with mean of 1000 kg and standard deviation of 50 kg.
If chains are made up of 20 randomly chosen links
(a) what is the probability that such a chain will fail to support a load of
900 kg?
0.6-0.799    1
0.8-0.999    3
1.0-1.199    6
1.2-1.399    8
1.4-1.599   12
1.6-1.799   11
1.8-1.999    6
2.0-2.199    4
2.2-2.399    2
Total       53
Table 4.3
(b) what should the minimum mean link strength be for 99.9% of all chains
to support a load of 900 kg?
(c) what is the median strength of a chain?
**8. The standardised normal variate, u, having mean of 0 and variance of 1, has
probability density function

φ(u) = [1/√(2π)] e^−u²/2
(a) The proportion beyond two standard deviations above the mean is 0.02275 (from the table).
(b) From the symmetry of the normal distribution, 0.3085 of the area is
further than 0.5 standard deviations below the mean.
(c) 0.0668 of the distribution is beyond one and a half standard deviations
from the mean in each tail. Thus the proportion within 1.5 standard deviations is
1 —(0.0668 + 0.0668) = 0.8664
2. (a) u = (68 − 56)/10 = 1.2
Thus, 0.1151 of the area exceeds 68
Figure 4.8 56 68
Figure 4.9 40 56
Figure 4.10 56 65
Figure 4.12 52 56 65
(iii) I.Q. = 130, u = (130 − 99.3)/13.4 = 2.29
I.Q. = 70, u = (70 − 99.3)/13.4 = −2.19
Figure 4.15
(b) (i) For all normal distributions, 1% in the tail occurs at a point 2.33 standard
deviations from the mean. (See table 4* or use table 3* in reverse.)
Thus, 1% of all children will have an I.Q. value greater than 99.3 + 2.33 × 13.4, i.e. about 131.
(iii) Ten per cent of children will have I.Q. values less than the value which
90% exceed.
The u-value corresponding to this point is −1.28 and converting this into the
scale of I.Q. gives 99.3 − 1.28 × 13.4, i.e. about 82.
(c) We need to find the lower and upper limits such that the shaded area is 95%
of the total. There are a number of ways of doing this, depending on how the
remaining 5% is split between the two tails of the distribution. It is usual to
divide them equally. On this basis, each tail will contain 0.025 of the total area
and here the required limits will be 1.96 standard deviations below and above
the mean respectively.
Thus, 95% of children will have I.Q. values between
99.3 − 1.96 × 13.4 and 99.3 + 1.96 × 13.4, i.e.
99.3 − 26.2 and 99.3 + 26.2
We have assumed that the original sample of 100 children was taken randomly
and representatively from the whole population of children about whom the
above probability statements have been made. This kind of assumption should
always be carefully checked for validity in practice.
In addition, the mean and standard deviation of the sample were used as
though they were the corresponding values for the population. In general, they
will not be numerically equal, even for samples as large as 100, and this will
introduce errors into the statements made. However, the answers will be of the
right order of magnitude, which is often all that is required in practice.
The assumption of normality of the population has already been mentioned.
4. (a) If the mean length is 1.45 m then the maximum deviation allowed for a
stocking to be acceptable is
±0.020 m, i.e. ±0.020/0.013 standard deviations, i.e. u = ±1.54.
The percentage of unacceptable output is represented by the two shaded
areas in figure 4.20 and is 2 x 0.0618 x 100 = 12.36%.
(b) This time the two shaded areas are each specified to be 0.025 (2½%).
Therefore the tolerance that can be worked to corresponds to u = ±1.96,
i.e. to ±1.96 × 0.013 = ±0.025 m, or ±25 mm.
(c) The lower and upper lengths allowed are 1.425 m and 1.475 m respectively.
The shaded area gives the proportion of stockings that do not meet the standard
when the process mean length is 1.46 m.
For 1.475 m, u = (1.475 − 1.460)/0.013 = 1.15; area = 0.1251
For 1.425 m, u = (1.425 − 1.460)/0.013 = −2.69; area = 0.0036
so the proportion of stockings not meeting the standard is 0.1251 + 0.0036 = 0.1287.
5. (a) For 1.83 m, u = 1.56 (σ = 0.050)
(c) Men shorter than 1.83 — 0,13 = 1.70 will have a clearance of at least
0.13 m.
Corresponding u = 0.47
(d) The frame height which is exceeded by one man in a thousand will be
3.09 standard deviations above the mean height of men, i.e. at
Figure 4.26
The problem can be extended by allowing some people to wear hats as well
as shoes with different heights of heel.
This problem was intended to give practice in using normal tables of area. Any
practical consideration of the setting of standard frame heights would need to
take account of the physiological and psychological needs of human door users,
of economics and of the requirements of the rest of the building system.
x            f    Coded variable (u)    fu     fu²
0.6-0.799    1          −4              −4     16
0.8-0.999    3          −3              −9     27
1.0-1.199    6          −2             −12     24
1.2-1.399    8          −1              −8      8
1.4-1.599   12           0               0      0
1.6-1.799   11           1              11     11
1.8-1.999    6           2              12     24
2.0-2.199    4           3              12     36
2.2-2.399    2           4               8     32
Totals      53              (+43, −33) = 10   178
Table 4.4

mean = 1.5 + 0.2 × (10/53) = 1.54
standard deviation = 0.2 × √[178/53 − (10/53)²] = 0.36
Using these two values, the areas under the fitted normal curve falling in each
class are found using table 3* of the statistical tables. This operation is carried
out in table 4.5. Note that the symbol u in the table refers to the standardised
normal variate corresponding to the class boundary, whereas in table 4.4 it
represents the coded variable (formed for ease of computation) obtained by
subtracting 1.5 from each class midpoint and dividing the result by 0.2, the
class width.
Class      Class boundary      u       Area above u            Area in each class    Expected normal frequency
0.6-0.8        0.6          −2.58    1 − 0.0049 = 0.9951           0.0163                    0.86
0.8-1.0        0.8          −2.03    1 − 0.0212 = 0.9788           0.0482                    2.55
1.0-1.2        1.0          −1.48    1 − 0.0694 = 0.9306           0.1068                    5.66
1.2-1.4        1.2          −0.93    1 − 0.1762 = 0.8238           0.1758                    9.32
1.4-1.6        1.4          −0.38    1 − 0.3520 = 0.6480           0.2155                   11.42
1.6-1.8        1.6           0.17    0.4325                        0.1967                   10.43
1.8-2.0        1.8           0.72    0.2358                        0.1338                    7.09
2.0-2.2        2.0           1.27    0.1020                        0.0676                    3.58
2.2-2.4        2.2           1.82    0.0344                        0.0255                    1.35
Total                                                                                        52.3
Table 4.5
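The calculation of tables 4.4 and 4.5 can be checked by computer. The following is a minimal sketch (not part of the original text): it computes the mean and standard deviation of the grouped data of table 4.3, dividing by n as in the coded calculation above, and then the expected normal class frequencies.

```python
from statistics import NormalDist

# grouped data of table 4.3: class boundaries 0.6, 0.8, ..., 2.4 and frequencies
boundaries = [0.6 + 0.2 * i for i in range(10)]
freq = [1, 3, 6, 8, 12, 11, 6, 4, 2]
n = sum(freq)                                      # 53

# mean and standard deviation from class mid-points (equivalent to table 4.4)
mids = [b + 0.1 for b in boundaries[:-1]]
mean = sum(f * x for f, x in zip(freq, mids)) / n
sd = (sum(f * (x - mean) ** 2 for f, x in zip(freq, mids)) / n) ** 0.5
print(round(mean, 3), round(sd, 3))                # ~1.538 and ~0.365

# expected normal class frequencies as in table 4.5
fitted = NormalDist(mu=mean, sigma=sd)
for lo, hi, f in zip(boundaries, boundaries[1:], freq):
    expected = n * (fitted.cdf(hi) - fitted.cdf(lo))
    print(f"{lo:.1f}-{hi:.1f}  observed {f:2d}  expected {expected:5.2f}")
```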
7. (a) Since a chain is as strong as its weakest link, the chain will fail to support
a load of 900 kg if one or more of its links is weaker than 900 kg.
The probability that a single link is weaker than 900 kg is given by the area
in the tail of the normal curve below u = (900 − 1000)/50 = −2.0, i.e. 0.02275.
The probability that all 20 links are stronger than 900 kg is therefore 0.97725²⁰ ≈ 0.63, so the probability that the chain fails to support the load is about 0.37.
Figure 4.27 Single link strength
(b)
Let p be the probability that an individual link is stronger than 900 kg.
Then we have that
p²⁰ = 0.999
p = 0.99995 (using 5-figure logarithms)
(c) In the long run, one chain out of every two will be stronger than the
median chain strength.
Let p be the probability that an individual link exceeds the median chain
strength.
Then from p²⁰ = 0.5
p = 0.96594 (using 5-figure logarithms)
and the probability that an individual link is less than the median chain strength
is (1 —p) = 0.0341.
Figure 4.29 Single link strength
The corresponding u-value is −1.82, so the median chain strength is 1000 − 1.82 × 50 = 909 kg.
The mean of the distribution truncated at u = u_a is

∫(u_a to ∞) u φ(u) du / ∫(u_a to ∞) φ(u) du = [1/(1 − α)] ∫(u_a to ∞) u [1/√(2π)] e^−u²/2 du = φ(u_a)/(1 − α)
Since the mean was previously at u = 0 (i.e. when a = 0), the above
expression also represents the shift in mean.
φ(u_a) is the ordinate (from table 5* of statistical tables) of the normal
distribution corresponding to u = u_a.
The result just obtained can be used to solve the numerical part of the
problem.
The bottle contents are distributed normally but if the segregation process
operates perfectly (which it will not do in practice), the distribution of bottle
contents offered for sale will correspond to the unshaded part of figure 4.31.
991 1000
Figure 4.31 Bottle contents (ml)
u_a = (991 − 1000)/5 = −1.8, for which α = 0.0359 and φ(u_a) = 0.0790, so the shift in mean is (0.0790/0.9641) × 5 = 0.41 ml.
Note: The change in mean is positive since the truncation occurs in the lower
tail instead of the upper tail.
The mean volume of bottle contents is therefore 1000 + 0.41 = 1000.4 ml.
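The truncated-mean result can be verified numerically. This is a hedged sketch assuming, as in the worked figures above, a fill distribution with mean 1000 ml and standard deviation 5 ml truncated below 991 ml.

```python
from statistics import NormalDist

z = NormalDist()
mu, sigma, cutoff = 1000.0, 5.0, 991.0     # assumed mean, sd and truncation point (ml)

u_a = (cutoff - mu) / sigma                # -1.8
alpha = z.cdf(u_a)                         # proportion removed, ~0.0359

# mean of the standardised variate after truncating the lower tail:
# E[u | u > u_a] = phi(u_a) / (1 - alpha)
shift_u = z.pdf(u_a) / (1 - alpha)         # ~0.082
print(mu + shift_u * sigma)                # ~1000.4 ml
```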
Appendix 1—Experiment 10
Normal Distribution
Number of persons: 2 or 3.
Object
To give practice in fitting a normal distribution to an observed frequency
distribution.
Method
The frequency distribution of total score of three dice obtained by combining
all groups’ results in table 2, experiment 1, should be re-listed in table 26
(Table 4.6).
Analysis
1. In table 26, calculate the mean and standard deviation of the observed
frequency distribution.
2. Using table 27, fit a normal distribution, having the same mean and standard
deviation as the data, to the observed distribution.
3. Draw the observed and normal frequency histograms on page 46 and comment
on the agreement.
Notes ;
1. It is not implied in this experiment, that the distribution of the total score
of three dice should be normal in form.
2. The total score of three dice is a discrete variable, but the method of fitting
a normal distribution is exactly the same for this case as for a frequency
distribution of grouped values of a continuous variable.
Table 4.6 (Table 26 of the laboratory manual) and Table 4.7 (Table 27 of the laboratory manual)

The standard deviation, s′, of the sample is given by s′ = √(variance).
Appendix 2—Experiment 11
Normal Distribution
Object
To calculate the mean and standard deviation of a sample from a normal
population and to demonstrate the effect of random sampling fluctuations.
Method
From the red rod population M6/1 (Normally distributed with a mean of 6.0
and standard deviation of 0.2) take a random sample of 50 rods and measure
their lengths to the nearest tenth of a unit using the scale provided. The rods
should be selected one at a time and replaced after measurement, before the
next one is drawn (a simulation of this sampling procedure is sketched after the analysis steps below).
Record the measurements in table 28.
Care should be taken to ensure good mixing in order that the sample is
random. The rod population should be placed in a box and stirred-up well
during sampling.
Analysis
1. Summarise the observations into a frequency distribution using table 29.
2. Calculate the mean and standard deviation of the sample data using table 30.
3. Compare, in table 31, the sample estimates of mean and standard deviation
obtained by each group. Observe how the estimates vary about the actual
population parameters.
4. Summarise the observed frequencies of all groups in table 32. On page 51,
draw, to the same scale, the probability histograms for your own results and
for the combined results of all groups. Observe the shapes of the histograms
and comment.
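The experiment can also be imitated on a computer. The sketch below (an illustration, not part of the laboratory manual) simulates one group's sample of 50 rods from a normal population with mean 6.0 and standard deviation 0.2, measured to the nearest tenth of a unit, and then repeats the exercise for several groups to show the sampling fluctuation of the estimates.

```python
import random
from statistics import mean, stdev

random.seed(1)

# one group's sample of 50 rods from the red rod population M6/1
sample = [round(random.gauss(6.0, 0.2), 1) for _ in range(50)]
print(round(mean(sample), 3), round(stdev(sample), 3))   # vary around 6.0 and 0.2

# several "groups", as summarised in table 31
for group in range(1, 6):
    s = [round(random.gauss(6.0, 0.2), 1) for _ in range(50)]
    print(group, round(mean(s), 3), round(stdev(s), 3))
```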
Table 4.8 (Table 28 of the laboratory manual)
Summarise these observations into class intervals of width 0.1 unit with the
measured lengths at the mid points using the ‘tally-mark’ method and table 29.
Class interval (units)    Class mid point    'Tally-marks'    Frequency
5.35-5.45                      5.4
5.45-5.55                      5.5
5.55-5.65                      5.6
5.65-5.75                      5.7
5.75-5.85                      5.8
5.85-5.95                      5.9
5.95-6.05                      6.0
6.05-6.15                      6.1
6.15-6.25                      6.2
6.25-6.35                      6.3
6.35-6.45                      6.4
6.45-6.55                      6.5
6.55-6.65                      6.6
Total frequency
Table 4.9 (Table 29 of the laboratory manual)
Table 4.10 (Table 30 of the laboratory manual)

s′ = √(variance)
Table 4.11 (Table 31 of the laboratory manual—summary of data: each group's sample size, mean and standard deviation, with the population parameters 6.00 and 0.2)
(Table 32 of the laboratory manual: total frequencies of all groups)
5 Relationship between the basic distributions

(Summary table: the hypergeometric, binomial, Poisson and normal distributions, their probability functions, means and variances, with notes on when probabilities are readily obtained from tables* and when one distribution may be used as an approximation to another.)
The hypergeometric distribution may be approximated by the binomial, putting p = M/N, if n/N < 0.10.
The binomial distribution may be approximated by the Poisson, putting m = np, if p < 0.10.
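A small numerical comparison makes these approximation rules concrete. The sketch below is illustrative only; the batch size N = 1000, M = 20 defectives (2%) and sample size n = 50 are assumed values chosen so that both n/N and p are below 0.10.

```python
from math import comb, exp, factorial

def hypergeometric(x, N, M, n):
    # x defectives in a sample of n from a batch of N containing M defectives
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binomial(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson(x, m):
    return exp(-m) * m**x / factorial(x)

N, M, n = 1000, 20, 50          # assumed illustrative values
p, m = M / N, n * M / N
for x in range(4):
    print(x, round(hypergeometric(x, N, M, n), 4),
             round(binomial(x, n, p), 4),
             round(poisson(x, m), 4))
```

The three columns agree to two or three decimal places, which is the practical point of the approximation rules.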
Figure 5.1 Number of deaths
Table 5.3 gives a full comparison of the probabilities of finding x defects in the
sample.
Table 5.3
Table 5.4
From table 2*
Probability of accepting batches with 2% defectives = 0.3679
2. In 50 tosses of an unbiased coin, what is the probability of more than 30
heads occurring?
This requires the binomial distribution which gives
P(>30 heads) = Σ (x = 31 to 50) (50 choose x) (½)⁵⁰

Using the normal approximation, μ = np = 25 and σ = √(50 × ½ × ½) = 3.54, so

u = (30.5 − 25)/3.54 = 1.55
which from table 3* leads to a probability of 0.0606.
Note: Since a continuous distribution is being used to approximate to a
discrete distribution, the value 30.5 and not 30 must be used in calculating the
u value.
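The effect of the continuity correction can be checked against the exact binomial sum. The following sketch (illustrative, not from the original text) computes both.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.5

# exact binomial probability of more than 30 heads
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(31, n + 1))

# normal approximation with the continuity correction (30.5, not 30)
mu, sigma = n * p, sqrt(n * p * (1 - p))          # 25 and 3.54
approx = 1 - NormalDist(mu, sigma).cdf(30.5)

print(round(exact, 4), round(approx, 4))          # both close to 0.06
```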
3. A machine produces screws 10% of which have defects. What is the probability
that, in a sample of 500
(a) more than 35 defects are found?
(b) between 30 and 35 (inclusive) defects are found?
The binomial law: assuming a sample of 500 from a batch of at least 5000.
The normal approximation can be used since p > 0.10, np = 50.
μ = np = 500 × 0.10 = 50
σ = √(500 × 0.10 × 0.90) = √45 = 6.7
(b) Probability of between 30 and 35 defects, use limits 29.5 and 35.5.
o=6.7
P(more than 40) = Σ (x = 41 to ∞) 30ˣ e⁻³⁰/x! = 0.0323
σ = √30 = 5.48
To determine the solution, part of London was divided into 576 equal areas
(¼ km² each) and the number of areas with 0, 1, 2, . . ., hits was tabulated from
the results of 537 bombs which fell on the area. These data in distribution form
are shown in table 5.5.
Number of hits j                 0     1     2     3     4     5
Number of areas with j hits    229   211    93    35     7     1
Table 5.5
In statistical logic, as will be seen later, an essential step in testing in the logic
is the setting up of what is called the null hypothesis.
Here the null hypothesis is that the bombs are falling randomly or that there
is no ability to aim at targets of the order of ¼ km² in area.
Then if the hypothesis is true, the probability of any given bomb falling in
any one given area = 1/576.
Probability of x hits in any area

P(x) = (537 choose x) (1/576)ˣ (575/576)⁵³⁷⁻ˣ

from the binomial law.
However, since the probability of success is very small and the number of
attempts is relatively large, the Poisson law can be used as an approximation to
the binomial thus greatly reducing the computation involved.
Thus, for the Poisson calculation
average number of successes m = np = 537 × 1/576 = 0.93
Number of hits j         0       1       2       3       4       5
Probability of j hits  0.395   0.367   0.171   0.053   0.012   0.002
Table 5.6
Table 5.7 shows the results obtained by comparing the actual frequency
distribution of number of hits per area with the Poisson expected frequencies if
the hypothesis is true.
Table 5.7
Number of defects    Observed frequency    Poisson probability    Expected frequency
 0                          2                    0.0273                  1
 1                          6                    0.0984                  5
 2                          9                    0.1771                  9
 3                         11                    0.2125                 11
 4                          8                    0.1912                 10
 5                          6                    0.1377                  7
 6                          4                    0.0826                  4.5
 7                          3                    0.0425                  2
 8                          2                    0.0191                  1
 9                          1                    0.0076                  0.5
10                          0                    0.0040                  0.2
Total                      52                    1.0000                 51.2
Table 5.9
However, here again the Poisson law gives an excellent approximation to the
binomial, reducing the computation considerably.
It should be noted that in most attribute quality control tables this Poisson
approximation is used.
Using m = 3.6, table 5.9 gives the comparison of the actual pattern of variation
with the Poisson.
Reference to the table indicates that the defects in the period of 52 shifts did
not show any ‘abnormal’ deviations from the expected number.
Thus, this comparison gives the basis for determining whether or not a
process is in control, the basic first step in any quality control investigation.
3. Assuming equal chance of birth of a boy or girl, what is the probability that
in a class of 50 students, less than 30% will be boys?
5. In a hotel, the five public telephones in the lobby are utilised 48% of the time
between 6 p.m. and 7 p.m. in the evening. What is the probability of
(a) all telephones being in use?
(b) four telephones being in use?
6. A city corporation has 24 dustcarts for collection of rubbish in the city. Given
that the dustcarts are 80% utilised or 20% of time broken down, what proportion
of the time will there be more than three dustcarts broken down?
7. A batch of 20 special resistors are delivered to a factory. Four resistors are
defective. Four resistors are selected at random and installed in a control panel.
What is the probability that no defective resistor is installed?
μ = 50, σ = 6.3
(a) u = (60.5 − 50)/6.3 = 1.67
u = (65.5 − 50)/6.3 = 2.46
Probability of more than 65 machines idle P(>65) = 0.0069
Also
u = (59.5 − 50)/6.3 = 1.51
Probability of more than 59 machines idle P(>59) = 0.0655
Probability of between 60 and 65 machines idle (inclusive) = 0.0655 − 0.0069
= 0.0586
u = (31.5 − 50)/6.3 = −2.94
Probability of less than 32 machines idle P(<32) = 0.0016
(a) m = np = 500 × 1/100 = 5
Probability of rejecting batches with 1% defectives
P(>0) = 0.9933
(b) m = np = 500 × 1/1000 = 0.5
P(>0) = 0.3935
3. σ = √(50 × 0.5 × 0.5) = 3.54, μ = 25; for less than 15 boys, u = (14.5 − 25)/3.54 = −2.97
From tables*,
Probability of class of 50 having less than 15 boys = 0.0015
Compare this with the correct answer from binomial tables of 0.0013.
4. This by definition is the Poisson law. However, since m > 15, the normal
approximation can be used. Here μ = 30, σ = √30 = 5.48
u = (40.5 − 30)/5.48 = 1.92
Probability of more than 40 customers arriving in 1 h = 0.0274
(Compare this with the theoretically correct result from Poisson of 0.0323.)
6. Here, n = 24
Probability of dustcart’s being broken down (p) = 0.20. This is the binomial
distribution. Here the normal distribution can be used as an approximation.
Mean yp = np = 24 x 0.20 = 4.8
σ = √(24 × 0.20 × 0.80) = 1.96
u = (3.5 − 4.8)/1.96 = −0.66
Table 3* gives the probability of three or less dustcarts being out of service
as 0.2546
Probability of more than three dustcarts being out of service
P(>3) = 1— 0.2546 = 0.7454 or 74.5%
7. Here this is the hypergeometric distribution and since the sample size 4 is
greater than 10% of population (20) no approximation can be made. Thus the
hypergeometric distribution must be used.
Probability of 0 defects

P(0) = C(4,0) C(16,4)/C(20,4) = (16 × 15 × 14 × 13)/(20 × 19 × 18 × 17) = 0.376
Number of persons: 2 or 3.
Object
To demonstrate that the Poisson law may be used as an approximation to the
binomial law for suitable values of n (sample size) and p (proportion of the
population having a given attribute), and that, for a given sample size n, the
approximation improves as p becomes smaller. (Note: for a given value of p, the
approximation also improves as n increases.)
Method
Using the binomial sampling box, take 100 samples of size 10, recording, in
table 21, the number of red balls in each sample. (Proportion of red balls in the
population = 0.02.)
Analysis
1. Summarise the data into a frequency distribution of number of red balls per
sample in table 22 and compare the experimental probability distribution with
the theoretical binomial (given) and Poisson probability distributions.
Draw both the theoretical Poisson (mean = 0.2) and the experimental
probability histograms on figure 1 below table 22.
2. Using the data of experiment 7 and table 23, compare the observed
probability distribution with the binomial and Poisson (mean = 1.5) probability
distributions.
Also, draw both the theoretical Poisson (mean = 1.5) and the experimental
probability histograms on figure 2 below table 23.
Note: Use different colours for drawing the histograms in order that comparison
may be made more easily.
6 Distribution of linear
functions of variables
wr = xr + yr
w̄ = x̄ + ȳ
σw² = σx² + σy²
or the variance of the sum of two independent variates is equal to the sum of
their variances.
wr = xr − yr
w̄ = x̄ − ȳ
σw² = σx² + σy²
or the variance of the difference of two variates is the sum of their variances.
Note: It should be noted that while this theorem places no restraint on the
form of distribution of variates the following conditions are of prime importance:
Examples
1. In fitting a shaft into a bore of a housing, the shafts have a mean diameter of
50 mm and standard deviation of 0.12 mm. The bores have a mean diameter of
51 mm and standard deviation of 0.25 mm. What is the clearance of the fit?
The mean clearance = 51 —50=1mm
Variance of clearance = 0.12² + 0.25² = 0.0769
Standard deviation of clearance = √0.0769 = 0.277 mm
Figure 6.1
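The clearance calculation can be reproduced directly, and the same distribution gives the chance of an interference fit (negative clearance), a quantity not asked for in the example but often of practical interest. The sketch below assumes the shaft and bore diameters are independent and normal, as in the example.

```python
from math import sqrt
from statistics import NormalDist

shaft_mu, shaft_sd = 50.0, 0.12     # shaft diameter (mm)
bore_mu, bore_sd = 51.0, 0.25       # bore diameter (mm)

clearance_mu = bore_mu - shaft_mu                  # 1 mm
clearance_sd = sqrt(shaft_sd**2 + bore_sd**2)      # ~0.277 mm
print(clearance_mu, round(clearance_sd, 3))

# probability of a negative clearance (interference)
print(NormalDist(clearance_mu, clearance_sd).cdf(0.0))   # ~0.0002
```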
3. (a) The time taken to prepare a certain type of component before assembly
is normally distributed with mean 4 min and standard deviation of 0.5 min. The
time taken for its subsequent assembly to another component is independent of
preparation time and again normally distributed with mean 9 min and standard
deviation of 1.0 min.
What is the distribution of total preparation and assembly time and what
proportion of assemblies will take longer than 15 min to prepare and assemble?
Let w = total preparation and assembly time for the rth unit.
w̄ = 4 + 9 = 13 min, σw² = 0.5² + 1.0² = 1.25, so σw = 1.12 min, and
u = (15 − 13)/1.12 = 1.79, giving a proportion of about 0.037 taking longer than 15 min.
Figure 6.2
(b) In order to show clearly the use of constants a, b, c,.. ., consider the
previous example, but suppose now that each unit must be left to stand for
twice as long as its actual preparation time before assembly is commenced.
What is the distribution of total operation time now?
Here
wr = 3xr + yr
where
yr = assembly time
w̄ = (3 × 4) + 9 = 21 min
σw² = 3² × 0.5² + 1.0² = 3.25
Standard deviation of w = 1.8 min.
(c) To further clarify the use of constants, consider now example 3(a). Here
the unit has to be sent back through the preparation phase twice before passing
on to. assembly.
Assuming that the individual preparation times are independent, what is the
distribution of the total operation time now?
Here
wr = xr1 + xr2 + xr3 + yr (three independent preparation times, which is not the same as 3xr + yr)
σw² = (3 × 0.5²) + 1.0² = 1.75
σw = 1.32 min
Example
Five resistors from a population whose mean resistance is 2.6 kΩ and standard
deviation is 0.1 kΩ are connected in series. What is the mean and standard
deviation of such random assemblies?
Average resistance = 5 × 2.6 = 13 kΩ
Variance of assembly = 5 × 0.1² = 0.05
Standard deviation = 0.225 kΩ
Population: mean μ, variance σ²
x̄ = (1/n)(x₁ + x₂ + … + xₙ) = (1/n)x₁ + (1/n)x₂ + … + (1/n)xₙ

Variance of the distribution of the mean of a sample of size n
= (1/n)²σ² + (1/n)²σ² + … + (1/n)²σ² = σ²/n

so that the standard error of the sample mean is σ/√n.
Figure 6.3. Probability distribution (a) the score of 1 die (b) the score of 3 dice.
sampling distribution of means gets closer to normality and similarly the closer
the original distribution to normal the quicker the approach to true normal form.
However the rapidity of the approach is shown in figure 6.3 which shows the
distribution of the total score of three 6-sided dice thrown 50 times. This is
equivalent to sampling three times from a rectangular population and it will
be seen that the distribution of the sum of the variates has already gone a long
way towards normality.
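The dice experiment of figure 6.3 is easy to imitate. The sketch below (an illustration, not the original data) simulates 50 throws of three dice and prints a crude histogram of the totals; the theoretical mean and standard deviation of a three-dice total are 10.5 and √(3 × 35/12) ≈ 2.96.

```python
import random
from statistics import mean, stdev

random.seed(2)

# total score of three dice: the sum of three samples from a rectangular population
totals = [sum(random.randint(1, 6) for _ in range(3)) for _ in range(50)]
print(round(mean(totals), 2), round(stdev(totals), 2))

# crude frequency histogram of the 50 totals
for t in range(3, 19):
    print(f"{t:2d} {'*' * totals.count(t)}")
```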
The sum (or difference) of the means of two independent samples is distributed with mean
μx + μy (or μx − μy)
and variance
σx²/n₁ + σy²/n₂
In the special case where two samples of size n₁ and n₂ are taken from the
same population with mean μ and variance σ², the moments of the distribution
of the sum (or difference) of the sample averages are mean 2μ for the sum (and 0 for the
difference) and variance
σ²(1/n₁ + 1/n₂)
This theorem is most used for testing the difference between two populations,
but this is left until chapter 7.
Example
A firm calculates each period the total value of sales orders received in £.p. The
average value of an order received is approximately £400, and the average number
of orders per period is 100.
What likely maximum error in estimation will be made if in totalling the
orders, they are rounded off to the nearest pound?
Assuming that each fraction of £1 is equally likely (the problem can,
however, be solved without this restriction) the probability distribution of the
error on each order is rectangular as in figure 6.4, showing that each rounding
off error is equally likely.
Consider the total error involved in adding up 100 orders each rounded off.
Statistically this is equivalent to finding the sum of a sample of 100 taken
from the distribution in figure 6.4.
Figure 6.4 Rounding-off error on each order (−50p to +50p)
From theorem 3 the distribution of this sum will be normal and its mean and
variance are given below.
Average error =0
Variance of sum = 100σ², where σ² is the variance of the distribution of individual
errors
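To carry the calculation through numerically, the variance of a rectangular distribution over ±50p and the resulting standard deviation of the total error can be computed as below. This is a sketch under the stated assumptions; in particular, taking ±3 standard deviations as the "likely maximum" total error is an assumption made here for illustration.

```python
from math import sqrt

n = 100                      # orders per period
half_range = 50.0            # rounding error lies between -50p and +50p

# variance of a rectangular (uniform) distribution of width 2 * half_range
var_single = (2 * half_range) ** 2 / 12          # ~833 (pence squared)
sd_total = sqrt(n * var_single)                  # ~289 pence

print(round(sd_total, 1))
# taking +/- 3 standard deviations as the "likely maximum" total error
print(round(3 * sd_total / 100, 2), "pounds")    # about +/- 8.7 pounds
```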
2. The maximum payload of a light aircraft is 350 kg. If the weight of an adult
is normally distributed (N.D.) with mean and standard deviation of 75 and 15 kg
respectively, and the weight of a child is normally distributed with mean and
standard deviation of 23 and 7 kg respectively, what is the probability that the
plane can take off safely with
In each case, what is the probability that the plane can take off if 40 kg of
baggage is carried?
3. Two spacer pieces are placed on a bolt to take up some of the slack before
a spring washer and nut are added. The bolt (b) is pushed through a plate (p)
and then two spacers (s) added, as in figure 6.5.
(b)
Clearance
Figure 6.5
what is the probability of the clearance being less than 7.2 mm?
4. In a machine fitting caps to bottles, the force (torque) applied is distributed
normally with mean 8 units and standard deviation 1.2 units. The breaking
strength of the caps has a normal distribution with mean 12 units and standard
deviation 1.6 units. What percentage of caps are likely to break on being fitted?
5. Four rods of nominal length 25 mm are placed end to end. If the standard
deviation of each rod is 0.05 mm and they are normally distributed, find the
99% tolerance of the assembled rods.
6. The heights of the men in a certain country have a mean of 1.65 m and
standard deviation of 76 mm.
(a) What proportion will be 1.80 m or over?
(b) How likely is it that a sample of 100 men will have a mean height as
great as 1.68 m. If the sample does have a mean of 1.68 m, to what extent does
it confirm or discredit the initial statement?
7. A bar is assembled in two parts, one 66 mm + 0.3 mm and the other
44 mm + 0.3 mm. These are the 99% tolerances. Assuming normal distributions,
find the 99% tolerance of the assembled bar.
(a) Show that, for 95% of assemblies to satisfy the minimum clearance
condition, the mean plug diameter must be 34.74 mm.
(b) Find the mean plug diameter such that 60% of assemblies will have the
required clearance.
In each case find the percentage of plugs that would fit too loosely (clearance
greater than 0.375 mm).
9. Tests show that the individual maximum temperature that a certain type of
capacitor can stand is distributed normally with mean of 130°C and standard
deviation of 3°C. These capacitors are incorporated into units (one capacitor per
unit), each unit being subjected to a maximum temperature which is distributed
normally with a mean of 118°C and standard deviation of 5°C.
What percentage of units will fail due to capacitor failure?
10. It is known that the area covered by 5 litres of a certain type of paint is
normally distributed with a mean of 88 m? and a standard deviation of 3 m?. An
area of 3500 m? is to be painted and the painters are supplied with 40 5-litre tins
of paint. Assuming that they do not adjust their application of paint according
to the area still to be painted, find the probability that they will not have
sufficient paint to complete the job.
11. A salesman has to make 15 calls a day. Including journey time, his time
spent per customer is 30 min on average with a standard deviation of 6 min.
(a) If his working day is of 8 h, what is the chance that he will have to work
overtime on any given day?
(b) In any 5-day week, between what limits is his ‘free’ time likely to be?
12. A van driver is allowed to work for a maximum of 10h per day. His
journey time per delivery is 30 min on average with a standard deviation of 8
min.
In order to ensure that he has only a small chance (1 in 1000) of exceeding
the 10 h maximum, how many deliveries should he be scheduled for each day?
1. At least 95% of individual packets must weigh more than 0.475 kg. Thus the
process average weight must be set above 0.475 kg by 1.645 times the standard
deviation (see figure 6.6; 5% of the tail of a normal distribution is cut off
beyond 1.645 standard deviations), i.e. at 0.475 + 1.645 × 0.01 kg (the figures take the standard deviation of an individual packet as 0.01 kg).
Figure 6.6 Individual packets    Figure 6.7 Weight of 4 packs
If individual packages are packed four at a time, the distribution of total net
weight and the probability requirements are shown in figure 6.7.
The mean weight of 4 packages must be
2. (a) The weight of four adult passengers will be normally distributed with
mean of 4 × 75 (= 300) kg and standard deviation of √4 × 15 (= 30) kg. The
shaded area in figure 6.9 gives the probability that the plane is within its
maximum payload.
The standardised normal variate,
u = (350 − 300)/30 = 1.67
so the probability of a safe take-off with four adults is about 0.95.
Figure 6.10 Child weight and adult weight
As before,
u = (350 − 323)/30.8 = 0.88
Thus probability of safe take-off = 1— 0.1894 = 0.81
Figure 6.11
For a clearance of 7.2 mm, u = 2.22, for which the tail area is 0.0132;
thus the probability that the clearance is less than 7.2 mm is 1 − 0.0132 = 0.987.
Figure 6.13 (σ = 0.090)
4. A cap will break if the applied force is greater than its breaking strength.
The mean excess of breaking strength is 12 − 8 = 4 units, while the standard
deviation of the excess of breaking strength is √(1.6² + 1.2²) = √4.00 = 2.0.
When the excess of cap strength is less than zero the cap will break and the
proportion of caps doing so will be equal to the shaded area of figure 6.14, i.e.
the area below
u = (0 − 4)/2.0 = −2.0, which is 0.02275, or about 2.3% of caps.
Figure 6.14 Excess of breaking strength
5. The distribution of the total length of four rods will be normal with a mean
of 4 × 25 = 100 mm and standard deviation of √4 × 0.05 = 0.10 mm.
Ninety-nine per cent of all assemblies of four rods will have their overall
length within the range
100 ± 2.58 × 0.10 mm, i.e. 100 ± 0.26 mm
metre, the required answer is equal to the shaded area in figure 6.15.
u = (1.80 − 1.65)/0.076 = 1.97, giving a proportion of about 0.024.
(b) u = (1.68 − 1.65)/(0.076/√100) = 3.95
The shaded area is about 0.00004.
Figure 6.16 Mean of 100 heights
Possible alternative conclusions are that this particular sample is a very unusual
one or that the assumed mean height of 1.65 m is wrong (being an underestimate)
or that the standard deviation is actually higher than the assumed value of
76 mm.
u = (0.375 − 0.256)/0.125 = 0.95
Table 3* shows that the probability of exceeding a standardised normal variate
of 0.95 is 0.1711, i.e. approximately 17% of plugs would be too loose a fit.
(b) For 60% of assemblies to have clearance greater than 0.05 mm, the mean
clearance must be
0.05 + 0.253 x 0.125 = 0.082 mm
and the mean plug diameter must be less than 35 mm by this amount, i.e.
34.92 mm.
u = (0.375 − 0.082)/0.125 = 2.34, so only about 1% of plugs would fit too loosely.
(Figure: max. applied temperature, mean 118°C, and capacitor max. temperature, mean 130°C)
10. The area covered by 5 litres of paint is normally distributed with mean and
standard deviation of 88 m² and 3 m², respectively. Thus the area covered by
40 tins is normally distributed with mean 40 × 88 = 3520 m² and standard deviation 3√40 ≈ 19 m².
u = (3500 − 3520)/19 = −1.05, so the probability that the paint will not be sufficient is about 0.15.
Figure 6.22 Area covered by 40 × 5 litres of paint
σ = 6√15 ≈ 23.2
Figure 6.23 Time for 15 visits
The probability that 15 calls take longer than 8 h is represented by the shaded
area in figure 6.23.
480 min (8 h) corresponds to
u = (480 − 450)/23.2 = 1.29
so the chance that he will have to work overtime on any given day is about 0.10.
Figure 6.24 Working time
(Figure: time per delivery and time for n deliveries)
In order that there is only 1 chance in 1000 that n journeys take longer than
10h (600 min), n must be such that
30n + 3.09 × 8√n ≤ 600
The largest value of n that satisfies the inequality can be found by systematic
trial and error. However, a more general approach is to solve the equality as a
quadratic in./n, taking the integral part of the admissible solution as the number
of deliveries to be scheduled.
Thus √n = 4.08, i.e. n = 16.6; the other root of the quadratic (√n = −4.90, corresponding on squaring to n ≈ 24) is
inadmissible since the average total journey time would be 12 h, violating the
probability condition.
The number of deliveries to be scheduled is therefore 16.
If 16 deliveries were scheduled, the probability of exceeding 10 h would
actually be less than 0.001—in fact about 1 in 10 000.
Appendix 1—Experiment 12
Object
To demonstrate that the distribution of the means of samples of size n, taken
from a rectangular population, with standard deviation o tends towards the
normal with standard deviation σ/√n.
Method
From the green rod population M6/3 (rectangularly distributed with mean of
6.0 and standard deviation of 0.258), take 50 random samples each of size 4,
replacing the rods after each sample and mixing them, before drawing the next
sample of 4 rods.
Measure the lengths of the rods in the sample and record them in table 33.
Analysis
1. Calculate, to 3 places of decimals, the means of the 50 samples and summarise
them into a grouped frequency distribution using table 34.
2. Also in table 34, calculate the mean and standard deviation of the sample
means and record these estimates along with those of other groups in table 35.
Observe how they vary amongst themselves around the theoretically
expected values.
3. In table 36, summarise the frequencies obtained by all groups and draw, on
page 57, the frequency histogram for the combined results. Observe the shape
of the histogram.
Table 6.1 (Table 33 of the laboratory manual: record of the 50 samples of 4 rod lengths, with the total and average of each sample)
Table 6.2 (Table 34 of the laboratory manual)
x̄ = 6.000 + 0.075 × (Σfu/Σf)

The standard deviation, s′, of the distribution is given by

s′ = 0.075 √[(Σfu² − (Σfu)²/Σf)/Σf]
* Strictly the class intervals should read 5.5875-5.6625 and the next 5.6625-5.7375 etc.
but the present tabulation makes summarising simpler.
7 Estimation and significance testing (I)

7.1 Syllabus
Point and interval estimates; hypothesis testing; risks in sampling; tests for
means and proportions; sample sizes; practical significance; exact and approximate
tests.
(1) In testing whether a coin is biased, the hypothesis would be set up that it
was fair, i.e. the probability of a ‘head’ on one toss is 0.5.
(2) In testing the efficiency of a new drug, it would be assumed as a hypothesis
that it was no different in cure potential from the standard drug in current use.
(3) A new teaching method has been introduced; to assess whether it gives an
improvement in its end product compared with the previous method, the
hypothesis set up would be that it made no difference, i.e. that it was of the
same effectiveness.
(4) To determine whether an overall 100 k.p.h. speed limit on previously
unrestricted roads reduces accidents, the hypothesis would be set up that it
makes no difference. The same method would be used to assess the effect of
breathalyser tests.
Table 7.1 summarises the requirements for tackling these four problems.
The notation used is:
Sample size n (n₁ and n₂ for problems II and IV)
Table 7.1 Summary of the test statistics and approximate 100(1 − α)% confidence intervals for the four problems (I-IV), for both variables and attributes (proportions, for large n)
Variables     Population mean μ (μ₁ and μ₂ for problems II and IV)
              Population standard deviation σ (σ₁ and σ₂ for problems II and IV), assumed known
              Sample mean x̄ (x̄₁ and x̄₂ for II and IV)
Proportions   Population proportion π (π₁ and π₂ for problems II and IV)
              Sample proportion p (p₁ and p₂ for problems II and IV)
The sample proportion is approximately normal for large n (and preferably with π neither small nor large), see chapter 5.
allowance must be made for sampling fluctuations; this is done by using the
standard error of the sample mean to determine a confidence interval for the
population mean.
For 95% confidence, the interval (conventionally symmetric in tail
probability) is
x̄ − 1.96σ/√n and x̄ + 1.96σ/√n, i.e.
53 − 1.96 × 1.5 and 53 + 1.96 × 1.5
53 − 2.94 and 53 + 2.94
50.06 and 55.94, say 50.1 and 55.9
Notice that the interval does not include the previously assumed mean of 50.0.
In this respect, the two procedures (hypothesis testing and interval estimation)
are equivalent since the test hypothesis will be rejected at the 5% level of
significance if the observed sample mean is more than 1.96 standard errors on
either side of the assumed mean, and if this is the case the 95% confidence
interval cannot include the assumed mean. This argument applies in the two-
sided case for any significance level a and associated confidence probability
(1—a).
Also note, that in this example, the standard deviation was known and the
test and confidence interval estimation was perfectly valid for any size of
sample.
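The interval and the equivalent significance test can be reproduced in a few lines. The sketch below uses the figures of this example (sample mean 53, known standard deviation 6); the sample size of 16 is implied by the standard error of 1.5 quoted above.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 53.0, 6.0, 16        # sample mean, known sd, sample size
se = sigma / sqrt(n)                   # standard error = 1.5

u = NormalDist().inv_cdf(0.975)        # 1.96 for a 95% two-sided interval
print(x_bar - u * se, x_bar + u * se)  # ~50.1 and ~55.9

# equivalent significance test of the hypothesis mu = 50.0
print((x_bar - 50.0) / se)             # u = 2.0, just outside +/-1.96
```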
2. u = (0.103 − 0.100)/(s/√160)
This is significant at the 1 in 1000 level (|u| > 3.29), the actual type I error being
less than 6 parts in 100 000 (table 3*).
Ninety-nine per cent confidence limits for the real mean piecing-up time
under the conditions applying during the sampling of the 160 readings are
Thus, the evidence suggests that the synthesis of the mean operation time
tends to underestimate the actual time by something between 1% and 5%.
Whether this is of any practical importance depends on what use is going to be
made of the synthetic time. Perhaps the method of synthesising the time may be
worth review in order to bring it into line with reality.
3. In special trials carried out on two furnaces each given a different basic mix,
furnace A in 200 trials gave an average output time of 7.10 h while 100 trials
with furnace B gave an average output time of 7.15 h.
Given that, from previous records, the variance of furnace A is 0.09 h? and
of B is 0.07 h? and an assurance that these variances did not change during the
trials, is furnace A more efficient than B?
First of all, set up the test hypothesis that there is no difference in furnace
efficiencies (i.e. average output times). The test is two-sided since there is no
reason to suppose that if one is more efficient then it is known which one it will
be.
Set up
which becomes on substituting the observed data and the test assumptions
regarding (ua — Mp)
u = [(7.10 − 7.15) − 0] / √(0.09/200 + 0.07/100) = −0.05/0.034 = −1.47
Since this is numerically less than 1.96, or any higher value of wu corresponding
to a smaller a, the difference in mean output times has not been shown to be
statistically significant at any reasonable type I error level.
Note: Even if a very highly significant value of u had been obtained (say
|u| > 4.0) then the question could still not have been answered because of the
way the trials had been set up. The two furnaces may have been different in
mean output times (efficiencies) but because different basic mixes had always
been used in the furnaces, it is not apparent how much of the efficiency
difference was due to the different mixes and how much was due to inherent
properties of the furnaces (including, perhaps, the crews who operate them). To
determine whether the mix differences, furnace differences or a combination of
both are responsible for differing mean output times would require a properly
designed experiment (this experiment is not designed to answer the question
posed). E
In addition, it was assumed that the variances of the output times would be
unchanged during the special trials. This may often be a questionable assumption
and is unnecessary in this example since the sample variances of the 200 and
100 trials respectively could be substituted for σA² and σB² with very little effect
on the significance test.
This is almost significant at the 1% level and suggests that the mean yield for the
whole population of farms is greater in the second year.
As a word of warning, such a conclusion may not really be valid since the two
samples may not cover in the same way the whole range of climatic conditions,
soil fertility, farming methods, etc. The significant result may be due as much
to the samples’ being non-representative as to a real change in mean yield for the
whole population. The extent of each would be impossible to determine without
proper design of the survey. There are many methods of overcoming this, one
of which would be to choose a representative sample of farms and use the same
farms in both years.
5. A further test of the types illustrated in examples 3 and 4 can be made
when the population variances are unknown but there is a strong a priori
suggestion that they are equal. In this case, the two sample variances can be
pooled to make the test more efficient, i.e. to reduce 6 for given a and total
sample size (n₁ + n₂).
A group of boys and girls were given an intelligence test by a personnel
officer. The mean scores, standard deviations and the numbers of each sex
are given in table 7.2. Were the boys significantly more intelligent than the
girls?
Boys Girls
Table 7.2
The question as stated is trivial. If the test really does measure that which is
termed ‘intelligence’, then on average that group of boys was more intelligent
than that group of girls, although as a group they were more variable than the
girls.
However, if the boys are a random sample from some defined population of
boys and similarly for the girls, then any difference in average intelligence
between the populations may be tested for.
Assuming that there is a valid reason for saying that the two populations have
the same variances, the two sample variances can be pooled by taking a weighted
average using the sample sizes as weights (strictly the degrees of freedom—see
chapter 8—are used as weights but this depends on whether the degrees of
freedom were used in calculating the quoted standard deviations; in any case,
since the sample sizes here are large the error introduced will be negligible).
Pooled estimate of variance of individual scores

s² = [(72 × 11²) + (50 × 10²)] / (72 + 50) = 112
(Note: The variances are pooled, not the standard deviations.)
u = [(x̄B − x̄G) − (μB − μG)] / √[s²(1/nB + 1/nG)]

where s² is the pooled variance

= [(124 − 121) − 0] / √[112(1/72 + 1/50)] = (3 × 60)/√(112 × 122) = 180/117 = 1.54
Thus there is no evidence that the populations of boys and girls differ in average
intelligence. This conclusion does not mean that there is not a difference,
merely that if there is one we have not sufficient evidence to demonstrate it,
and even if we did, it may be so small as to be of no practical importance at all.
Confidence limits for the difference between two population means can be
set in the same way as in examples (1) and (2) above.
Thus 95% confidence limits for the difference in mean intelligence are given
by
(x̄B − x̄G) ± 1.96 √[s²(1/nB + 1/nG)]
= 3 ± 1.96 × 1.95 = 3 ± 3.82, i.e.
−0.82 to 6.82
including the null value, 0, as it must from the significance test.
Note: The use of 1.96 instead of 2.0 is somewhat pedantic in practical terms;
it is retained in this chapter to serve as a reminder that the appropriate u-factor
is found from the tables* of the normal distribution in conjunction with the
choice of a.
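A short sketch of the pooled two-sample test follows. The figures (boys: mean 124, standard deviation 11, n = 72; girls: mean 121, standard deviation 10, n = 50) are those appearing in the working above; they are repeated here only for illustration.

```python
from math import sqrt

xb, sb, nb = 124.0, 11.0, 72        # boys
xg, sg, ng = 121.0, 10.0, 50        # girls

# pool the variances (not the standard deviations), weighting by sample size
pooled_var = (nb * sb**2 + ng * sg**2) / (nb + ng)      # ~112
se = sqrt(pooled_var * (1 / nb + 1 / ng))

u = (xb - xg) / se
print(round(u, 2))                                      # ~1.54, not significant
print(xb - xg - 1.96 * se, xb - xg + 1.96 * se)         # 95% limits ~ -0.8 to 6.8
```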
u = (0.88 − 0.80) / √(0.8 × 0.2/50) = 0.08/0.0566 = 1.41
This is not numerically large enough to reject the test hypothesis—the type I
error would correspond to just under 16%.
A slight improvement can be made in the adequacy of the normal approximation
by making the so-called correction for continuity. However, with the large
sample sizes generally required for use of the normality condition, this refinement
will not usually be worth incorporating. It is given here as an example.
u = (x ± ½ − nπ) / √[nπ(1 − π)]

which on dividing top and bottom by n gives

u = (p ± 1/2n − π) / √[π(1 − π)/n]
The exact test can be carried out for this example since the appropriate
Set up the null hypothesis that both germination rates are the same, i.e.
H₀: πA = πB    H₁: πA ≠ πB
An approximately normal test statistic can be set up (see summary table 7.1) as
Under the null hypothesis, πA = πB = some value π, say, and the test statistic
becomes

u = (pA − pB) / √[π(1 − π)(1/nA + 1/nB)]
9. Example (1) of this section was concerned with a normal variable with
standard deviation of 6.0 units, this being assumed constant whatever the mean
of the population. The null hypothesis was set up that the mean was 50.0 units.
(a) If a two-sided test of this hypothesis is carried out at the 1% level of
significance, what will be the type II error, if a sample of size 16 is taken and the
population mean is actually equal to (i) 51.0 units, (ii) 53.0 units?
(b) What size of sample would be necessary to reject the test hypothesis
with probability 90% when the population mean is actually 48.0 units? The
significance level (type I error) remains at 1%.
(a) (i) Figure 7.1 shows the essence of the solution. The solid distribution
is how xX is assumed to be distributed under the null hypothesis, the critical
Figure 7.1
region being given by the shaded area in its two tails. The boundaries of the
acceptance region for a 1% significance level are at
50 ± 2.58 × 6/√16, i.e. 46.13 and 53.87
The tail areas corresponding to these values are 0.0268 and 0.0006
approximately.
The type II error is therefore equal to
1 —(0.0268 + 0.0006) = 0.9726
(ii) The solution to this part is the same as that for part (i) except that the
actual distribution of the sample mean will be centred around 53.0.
The values of u corresponding to the limits of the acceptance region are
u = (53.87 − 53.0)/(6/√16) = 0.58
and
u = (46.13 − 53.0)/(6/√16) = −4.58
so the type II error is approximately 0.719.
The dotted distribution shows how the means of samples of size n will be
distributed when the population average is actually 48.0 units. The extreme
part of the right-hand tail of this distribution will lie above x2 but it will be
such a minute proportion in this case as to be negligible.
The following equations can be set up:
x̄* = 50.0 − 2.58 × 6/√n        (7.1)
x̄* = 48.0 + 1.28 × 6/√n        (7.2)
where x̄* is the lower boundary of the acceptance region.
Subtracting one equation from the other leads to
2.0 = (2.58 + 1.28) × 6/√n = 3.86 × 6/√n
or
n = (3.86 × 6/2.0)² = 11.58² = 134
The boundaries of the acceptance region for samples of size 134 are
50.0 ± 2.58 × 6/√134, i.e.
48.66 and 51.34
Part (b) of this example postulated the requirement that if the mean is 48.0
units (or more generally, if it differs from the test value by more than 2.0 units),
the chance of detecting such a difference should be 90%. This requirement
would have been determined by the practical aspects of the problem. However,
if the actual population mean were less than 48.0 (or bigger than 52.0), the
probability of committing a type II error with a sample size of 134 would be less
than 10%; and if the population mean were actually between 48.0 and 50.0, this
probability would be greater than 10%.
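The sample-size formula used above, n = [(u_α + u_β)σ/δ]², is easy to evaluate directly. The sketch below reproduces the figures of part (b): σ = 6, a difference of 2 units to be detected, a two-sided 1% significance level and 90% power.

```python
from math import ceil
from statistics import NormalDist

z = NormalDist()
sigma = 6.0          # known population standard deviation
delta = 2.0          # difference from the test value to be detected
alpha, power = 0.01, 0.90

u_alpha = z.inv_cdf(1 - alpha / 2)   # ~2.58 (two-sided 1% level)
u_beta = z.inv_cdf(power)            # ~1.28

n = ((u_alpha + u_beta) * sigma / delta) ** 2
print(ceil(n))                       # 134
```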
10. What is the smallest random sample of seeds necessary for it to be asserted,
with a probability of at least 0.95, that the observed sample germination
proportion deviates from the population germination rate by less than 0.03?
The standard error of a sample proportion is √[π(1 − π)/n] where π is the
population proportion and n the sample size. Assuming that n will be large
enough for the approximation of normality to apply reasonably well to the
distribution of p, the problem requires that
1.96 √[π(1 − π)/n] ≤ 0.03
giving
n ≥ (1.96/0.03)² π(1 − π)
Since π(1 − π) cannot exceed ¼, a sample of
n = (1.96/0.03)² × ¼ ≈ 1067
would certainly satisfy the conditions of the problem (whatever the value of 7).
Alternatively if an estimate is available of the likely value of π, this can be
used instead of π as an approximation. Such an estimate may come from previous
experience of the population or perhaps from a pilot random sample; the pilot
sample estimate can be used to determine the total size necessary. If the pilot
sample is at least as big as this, no further sampling is needed. If it was not, the
extra number of observations required can be found approximately. If such
extra sampling is not possible for some reason (too costly, not enough time),
the confidence probabilities of types I and II errors will be modified (adversely).
For this example, if the seed population germination rate is usually about
80%, then the required value of sample size for at most a deviation of 0.03
(i.e. 3%) with probability of 0.95 is
n = (1.96/0.03)² × 0.8 × 0.2 = 680
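Both sample-size calculations, the conservative one and the one using a prior estimate of the germination rate, are shown in the short sketch below (illustrative only).

```python
from math import ceil
from statistics import NormalDist

u = NormalDist().inv_cdf(0.975)        # 1.96 for 95% confidence
d = 0.03                               # required maximum deviation

# worst case: pi(1 - pi) is largest at pi = 0.5
print(ceil((u / d) ** 2 * 0.25))       # ~1070

# with a prior estimate of the germination rate of about 0.8
print(ceil((u / d) ** 2 * 0.8 * 0.2))  # ~680
```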
Does this result show that the firm’s product is used more in the country than in
town?
7. If men’s heights are normally distributed with mean of 1.73 m and standard
deviation of 0.076 m and women’s heights are normally distributed with mean
of 1.65 m and standard deviation of 0.064 m, and if, in a random sample of 100
married couples, 0.05 m was the average value of the difference between
husband’s height and wife’s height, is the choice of partner in marriage influenced
by consideration of height?
8. For the data of problem (3) (page 46), chapter 2, estimate 99% confidence
limits for the mean time interval between customer arrivals. Also find the
number of observations necessary to estimate the mean time to within 0.2 min.
then
u = (x̄ − μ₀)/(σ/√6) = (1.005 − 1.000)/(0.025/√6) = 0.49
The probability of such a deviation is about 62% and so there is no real
evidence that the process average is not 1.00 kg, i.e. the sample data are quite
consistent with a setting of 1.00 kg, although a type II error could be committed
in deciding not to re-set the process.
(b) Confidence limits for the actual current process average are, for two levels
of confidence
2. Losing 50p in 200 throws means that there must have been 125 odd numbers
(losing results) and 75 even numbers (winners) in 200 throws. Set up the null
hypothesis that the dice is unbiased.
H₀: π = 0.5 (π = the probability of an odd number)
H₁: π ≠ 0.5
The total number of odd numbers will be binomially distributed and since
n = 200 and π = 0.5 we know that
3. 95% confidence limits for the proportional utilisation of the machine are
approximately
p ± 1.96 √[p(1 − p)/n]
which gives
and
Also, since p is near to 0.5, for 95% confidence estimation, the required number
of spot checks is given by
4. 99% confidence limits for the population proportional support for candidate
A are
1100/2000 ± 2.58 √[(1100 × 900)/2000³] = 0.55 ± 0.029
Then
u = [(0.10 − 0.09) − 0] / √[0.095 × 0.905 × (1/2000 + 1/2000)] = 1.08
There is no evidence that the proportion of people in the country area using the
product is any different from that in the town.
6. Assume that the percentage of sub-assemblies which are defective is the
same in the long run for both suppliers.
Thus
H₀: πA − πB = 0    H₁: πA − πB ≠ 0

π̂ = (200 × 0.05 + 300 × 0.03)/(200 + 300) = 19/500 = 0.038
Thus
This value of u nearly reaches its critical value for a 5% (two-sided) level of
significance; the actual level is about 6%. There is thus some suspicion that B is
better than A but what action is taken depends on the consequences of the
possible alternative decisions.
7. Set up the test hypothesis that the choice of marriage partner is not influenced
by the height of either. In this case, in a married couple, the height of a man and
of a woman is each a random selection from the distributions of men’s and
women’s heights respectively.
Figure 7.3 Average height excess for 100 couples
The excess of the man’s height over the woman’s height will be a normal
variable with mean of (1.73 − 1.65) m and variance of (0.076² + 0.064²) m².
The average difference (excess) of height taken over a random sample of 100
such married couples will be normally distributed (i.e. from one sample of 100
to another) with mean of 1.73 — 1.65 = 0.08 m and variance of
(0.076² + 0.064²)/100, i.e. a standard error of √0.00987/√100 = 0.0099 m.
The observed average difference was 0.05 m corresponding to a u value of
u = (0.05 − 0.08)/0.0099 = −3.03
Time between successive customers Average of 56 time intervals
99% confidence limits for the mean time between arrivals are
The denominator being the standard error of the difference of two sample
means based on samples of size nA and nB respectively.
Thus
u = (x̄A − x̄B)/√(σA²/nA + σB²/nB)  with nA = 100 and nB = 80
and, substituting the sample variances for the population variances,
u = −0.3/√0.0605 = −0.3/0.246 = −1.22
Since this value is not numerically larger than 2.58, there is no evidence of a
difference in mean lifetimes between A and B.
10. (a) Assume there is no systematic difference between the analysts, i.e. the
means of an infinitely large number of analyses of the same material would be
equal for A and B.
Under such a null hypothesis we may use the test statistic
u = (x̄A − x̄B)/√[σ²(1/nA + 1/nB)] = (28.4 − 28.2)/√[0.1²(1/4 + 1/5)] = 2.98
This is significant at the 1% level (i.e. |u| > 2.58) and we can conclude (with
only a small type I error) that there is a systematic difference between the
analysts, A giving a higher result than B on average. Thus at least one of them,
and possibly both, gives a biased estimate of the actual percentage composition.
99% confidence limits for the extent of this systematic difference are given
by
(28.4 − 28.2) ± 2.58 √[0.1²(1/4 + 1/5)] = 0.2 ± 0.173, i.e. 0.027% and 0.373%
(b) Figure 7.6 shows the requirements of this problem.
2.58 σ√(2/n) = c   (7.3)
0.3 − 2.33 σ√(2/n) = c   (7.4)
Figure 7.6
Note: In writing down equations (7.3) and (7.4), the minute part of the left-
hand tail of the dotted distribution falling in the lower part of the critical
region has been ignored.
n = (2.58 + 2.33)² × 2 × 0.1²/0.3² = 4.91² × 0.02/0.09 = 5.35
Thus each analyst should do six tests, the probability of detecting a systematic
difference of 0.3% between them (if it exists) being greater than the required
minimum of 99%. :
In fact the required minimum power would still be achieved if one analyst
took six tests and the other five in order to reduce the total cost or effort
involved.
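As a present-day aside (not part of the original text), the sample-size arithmetic above can be checked numerically. The sketch below is written in Python and assumes the SciPy library is available; it simply evaluates n = (uα + uβ)² · 2σ²/δ² with the figures of this example.

    from scipy.stats import norm

    sigma = 0.1    # standard deviation of a single analysis
    delta = 0.3    # systematic difference to be detected (%)
    u_alpha = norm.ppf(1 - 0.01 / 2)   # 2.58, two-sided 1% significance level
    u_beta = norm.ppf(1 - 0.01)        # 2.33, for 99% power

    n = (u_alpha + u_beta) ** 2 * 2 * sigma ** 2 / delta ** 2
    print(round(n, 2))   # about 5.3, in line with the 5.35 above; take six tests per analyst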
s² = Σ(xᵢ − x̄)²/(n − 1)   (8.1)
where x̄ = sample average.
s² = [Σ(x₁ᵢ − x̄₁)² + Σ(x₂ᵢ − x̄₂)²]/(n₁ + n₂ − 2)   (8.2)
The denominators in both equations (8.1) and (8.2) are called the degrees of
freedom of the variance estimate.
Testing the Hypothesis that the Mean of a Normal Population has a Specific
Value μ₀ — Population Variance Known
Here, providing the population variance is known (and therefore the sample
estimate of variance is not used), then the ‘u’ test is appropriate whatever the
sample size.
Thus
u = (x̄ − μ₀)/(σ/√n)
Example
In an intelligence test on ten pupils the following scores were obtained: 105,
120, 90, 85, 130, 110, 120, 115, 125, 100.
Given that the average score for the class, before the special tuition for the
test, was 105 with standard deviation 8.0, has the special tuition improved the
performance?
Here, since the standard deviation is given, and if the assumption is made that
the tuition method does not change this variation, then the u test is applicable.
Here a one-tailed test can be used if it is again assumed that tuition could not
have worsened the performance.
Thus
u = (110 − 105)/(8/√10) = 1.98
From table 3*,
for 5% (one-tailed) u = 1.64, and for 1% u = 2.33.
The observed value of 1.98 exceeds 1.64 but not 2.33, so the improvement is significant at the 5% level but not at the 1% level.
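The same u calculation can be reproduced directly; a minimal sketch (Python with NumPy and SciPy assumed, illustrative only):

    import numpy as np
    from scipy.stats import norm

    scores = np.array([105, 120, 90, 85, 130, 110, 120, 115, 125, 100])
    mu0, sigma = 105, 8.0   # class average and standard deviation before tuition

    u = (scores.mean() - mu0) / (sigma / np.sqrt(len(scores)))
    print(round(u, 2))                                          # 1.98
    print(round(norm.ppf(0.95), 2), round(norm.ppf(0.99), 2))   # one-tailed 5% and 1% points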
s² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1)   (8.3)
The null hypothesis is set up that the sample has come from a normal population
with mean μ₀.
W. S. Gosset under the nom de plume of ‘Student’ examined the following
statistic
t = (x̄ − μ₀)/(s/√n)   (8.4)
and showed that it is not distributed normally but in a form which depends on
the degrees of freedom (v) if the null hypothesis is true. Table 7* sets out the
various percentage points of the ‘t’ distribution for a range of degrees of freedom.
Obviously t tends to the statistic u in the limit as ν → ∞, i.e. t is
approximately normally distributed for large degrees of freedom ν. Reference
to the table* shows that, as most textbooks assert, where the degrees of freedom
exceed 30, the normal approximation can be used, or the 't' test can be replaced
by the simpler 'u' test.
Again, one-tailed tests are only used when a priori logic clearly shows that
the alternative population mean must be on one side of the hypothesised value μ₀.
See section 8.2.7.
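The way the 't' percentage points approach the normal value can also be seen directly; the following sketch (Python, using scipy.stats, not part of the original text) prints the two-sided 5% points for increasing degrees of freedom against the normal 1.96.

    from scipy.stats import t, norm

    for v in (5, 10, 20, 30, 60, 120):
        print(v, round(t.ppf(0.975, v), 3))   # two-sided 5% point of 't' on v degrees of freedom
    print("normal", round(norm.ppf(0.975), 3))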
Testing the Hypothesis that the Means of Two Normal Populations are μx and μy
Respectively—Variances Equal but Unknown
Note: The assumption must hold that the variances of the two populations
are the same (i.e. σx² = σy²), since we are going to pool two sample variances and
this only makes sense if they are both estimates of the same thing—a common
population variance. If σx² does not equal σy², then the statistic given below is not
distributed like t.
The two sample variances sx² and sy² are pooled to give a best estimate of the
common population variance:
s² = [(nx − 1)sx² + (ny − 1)sy²]/(nx + ny − 2)   (8.5)
t = [(x̄ − ȳ) − (μx − μy)]/[s√(1/nx + 1/ny)]   (8.6)
with (nx + ny − 2) degrees of freedom. The usual test hypothesis is that the
populations have equal means; under this assumption (μx − μy) = 0 and the
test statistic reduces to
t = (x̄ − ȳ)/[s√(1/nx + 1/ny)]   (8.7)
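A sketch of the pooled calculation of equations (8.5) to (8.7) in Python is given below; the two sample arrays are hypothetical and are there only to show the steps.

    import numpy as np
    from scipy.stats import t as t_dist

    def pooled_t(x, y):
        """Two-sample t statistic with a pooled variance estimate (cf. equation 8.7)."""
        nx, ny = len(x), len(y)
        s2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
        return (x.mean() - y.mean()) / np.sqrt(s2 * (1 / nx + 1 / ny)), nx + ny - 2

    x = np.array([10.1, 9.8, 10.4, 10.0, 9.9])      # hypothetical sample from population x
    y = np.array([9.6, 9.9, 9.5, 10.0, 9.7, 9.4])   # hypothetical sample from population y
    t_obs, df = pooled_t(x, y)
    print(round(t_obs, 2), df, round(t_dist.ppf(0.975, df), 3))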
Note: This approach is only legitimate provided that there is a valid reason
for pairing the observations. This validity is determined by the way in which
the experimental observations are obtained.
Let the number of paired readings = n
Let the difference of the ith pair = xᵢ
Then
x̄ = Σxᵢ/n  and  s² = Σ(xᵢ − x̄)²/(n − 1)
and
t = (x̄ − μ₀)/(s/√n)   (8.8)
The test hypothesis is usually of the null type where there is assumed to be no
difference on average in the paired readings, i.e. μ₀ = 0. In this case the test
statistic t is given by
t = x̄/(s/√n)   (8.9)
For 99% confidence limits: x̄ ± t₀.₀₀₅,ν s/√n.
This is similar to the large-sample case except that t is used instead of u.
Then
F = s₁²/s₂² (with the larger variance estimate in the numerator).
If F is greater than F₀.₀₂₅ (see table 9*) for (n₁ − 1) degrees of freedom of the
numerator and (n₂ − 1) degrees of freedom of the denominator, then the
difference is significant at the 5% level (α = 0.05). For F to be significant at the 1% level
use F₀.₀₀₅ (actually F₀.₀₁ will have to be used, giving a 2% significance level
of F).
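The variance-ratio test is easily sketched in the same way; the two variance estimates below are hypothetical, and scipy.stats.f supplies the F₀.₀₂₅ point (an illustrative present-day sketch only).

    from scipy.stats import f

    s1_sq, n1 = 8.4, 21   # hypothetical larger variance estimate and its sample size
    s2_sq, n2 = 3.1, 16   # hypothetical smaller variance estimate and its sample size

    F = s1_sq / s2_sq                              # larger estimate in the numerator
    F_crit = f.ppf(1 - 0.025, n1 - 1, n2 - 1)      # F(0.025) point for a two-sided 5% test
    print(round(F, 2), round(F_crit, 2), F > F_crit)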
the suffix n denoting the number of degrees of freedom. Obviously the larger n
is, the larger χ² tends to be, and the percentage points of the sampling distribution of χ² are
given in table 8*.
For example, where n = 1 the numerical value of the standardised normal
deviate u exceeds 1.96 with 5% probability and 2.58 with 1% probability (i.e.
with half the probability in each tail). Consequently χ² with one degree of
freedom has 5% and 1% points of 1.96² and 2.58², or 3.841 and 6.635.
However, for higher degrees of freedom the distribution of χ² is much more
difficult to calculate, but it is fully tabulated in table 8*.
χ² = Σᵢ (Oᵢ − Eᵢ)²/Eᵢ
is distributed like χ² with (k − r) degrees of freedom, where r = number of
parameters used to fit the distribution.
For the use of this test all the Eᵢ values must be greater than 5. If any are less,
then the data must be grouped.
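A goodness-of-fit calculation of this kind can be carried through as in the sketch below (Python; the observed frequencies and the fitted Poisson mean are hypothetical, and all the expected values exceed 5).

    import numpy as np
    from scipy.stats import poisson, chi2

    observed = np.array([12, 28, 35, 24, 13, 8])   # hypothetical counts for 0,1,2,3,4 and '5 or more'
    n = observed.sum()
    m = (observed * np.arange(6)).sum() / n        # mean fitted from the data

    p = poisson.pmf(np.arange(5), m)               # Poisson probabilities for 0..4
    expected = n * np.append(p, 1 - p.sum())       # last class collects '5 or more'

    chi_sq = ((observed - expected) ** 2 / expected).sum()
    df = len(observed) - 1 - 1                     # totals agree and the mean was fitted
    print(round(chi_sq, 2), df, round(chi2.ppf(0.95, df), 2))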
                              Factor 1
                 1       2       3     ...     a        Row totals
Factor 2
   1            O11     O21     O31    ...    Oa1       Σi Oi1
   2            O12     O22                   Oa2       Σi Oi2
   .
   b            O1b                           Oab       Σi Oib
Column
totals        Σj O1j                        Σj Oaj      ΣΣ Oij

Table 8.1
Σi Σj Oij = total frequency
Σj Oij = total frequency at the ith level of factor 1 (column total)
Σi Oij = total frequency at the jth level of factor 2 (row total)
These tables are generally used to test the hypothesis that the factors are
independent.
If this hypothesis is true, then the expected cell frequency is
Eij = (Σj Oij)(Σi Oij)/Σi Σj Oij
i.e. (column total × row total)/total frequency.
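For a contingency table the whole procedure, including the expected frequencies Eij, is available ready-made in scipy.stats.chi2_contingency; the 2 × 3 table below is hypothetical and the sketch is illustrative only.

    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[18, 25, 12],     # hypothetical observed frequencies:
                         [22, 31, 30]])    # 2 levels of factor 2 (rows) by 3 of factor 1 (columns)

    chi_sq, p_value, df, expected = chi2_contingency(observed)
    print(round(chi_sq, 2), df, round(p_value, 3))
    print(np.round(expected, 1))           # Eij = (row total)(column total)/grand total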
x̄ = Σxᵢ/n = 2.6
Estimated variance of population
s² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1)
t = 2.67
2. A weaving firm has been employing two methods of training weavers. The
first is the method of ‘accelerated training’, the second, ‘the traditional’ method.
Although it is accepted that the former method enables weavers to be trained
more quickly, it is desired to test the long-term effects on weaver efficiency. For
this purpose the varying efficiency of the weavers who have undergone training
during a period of years has been calculated, and is given in table 8.2.
Is there any significant difference between training methods?
           Specialised training    Traditional method
                    A                       B              Total
Total              52                      43                95
χ² = 10.76
Degrees of freedom = (3 − 1)(2 − 1) = 2
Since χ²₀.₀₁ for 2 degrees of freedom is 9.21, the result is significant at the 1% level; there is evidence of a real difference between the training methods.
1  Above average
2  Below average
3  Insufficient data
Total

Table 8.3
                      Process 1    Process 2
Sample size               50           60
Mean                     10.2          …
Standard deviation        …            2.1

Table 8.4
F = 8.83/4.41 = 2.00
From table 9*:
Degrees of freedom of greater estimate = 149, read as ∞
Degrees of freedom of lesser estimate = 59, read as 60
5% significance level = 1.48
Since the observed value of F exceeds 1.48, the difference in variability is significant at the 5% level.
4. For example (1), page 69, in chapter 3, for goals scored per soccer match,
test whether this distribution agrees with the Poisson law.
Null hypothesis: the distribution agrees with the Poisson law.
Table 8.5
In table 8.5 the last three class intervals must be grouped to give each class
interval an expected value greater than 5; the first two class intervals must be grouped for the same reason.
Degrees of freedom = 6 − 1 − 1 = 4, since the totals are made the same and the
Poisson distribution is fitted with the same mean as the actual distribution.
Referring to table 8*
5. In a mixed sixth form the marks of eight boys and eight girls in a subject
were :
Boys: 25, 30, 42, 44, 59, 73, 82, 85; boys’ average = 55
Girls: 32, 36, 40, 41, 46, 47, 54, 72; girls’ average = 46
Do these figures support the theory that boys are better than girls in this
subject?
Null hypothesis—that boys and girls are equally good at the subject.
From the sample of boys: x̄₁ = 55, s₁² = 540.57
From the sample of girls: x̄₂ = 46, s₂² = 156.86
Applying the F test to check that the population variances are not different gives
F = 540.57/156.86 = 3.45 (not significant),   ν₁ = 7, ν₂ = 7
Pooling the variances gives s² = 348.7 and
t = (55 − 46)/√[348.7(1/8 + 1/8)] = 0.96
with 14 degrees of freedom, well short of the one-tailed 5% point of 1.76.
There is no evidence from these data that boys are better than girls.
(see discussion of example 5, chapter 7, p 153).
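For checking, the F and t calculations of this example can be reproduced from the data as given; scipy.stats.ttest_ind with equal_var=True performs the pooled test (a present-day sketch, not part of the original text).

    import numpy as np
    from scipy.stats import ttest_ind

    boys = np.array([25, 30, 42, 44, 59, 73, 82, 85])
    girls = np.array([32, 36, 40, 41, 46, 47, 54, 72])

    F = boys.var(ddof=1) / girls.var(ddof=1)                   # 540.6/156.9 = 3.45
    t_obs, p_two_sided = ttest_ind(boys, girls, equal_var=True)
    print(round(F, 2), round(t_obs, 2), round(p_two_sided / 2, 2))   # one-tailed p about 0.18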
Previous machine standard deviation before conversion = 2.8 mm. For the new
process, calculated from sample of 20, standard deviation = 1.7 mm.
What is the significance of this test?
It can be assumed that the process change could not have increased the
variation of the process.
Null hypothesis—that no change has occurred in process variation. Thus, a
one-tailed test can be used
F = 2.8²/1.7² = 2.71,   ν₁ = ∞, ν₂ = 19 (use 18)
Therefore, the result is highly significant and the change can be assumed to have
reduced the process variation.
2. Table 8.6 gives the data obtained in an analysis of the labour turnover
records of the departments of a factory. Is there any evidence that departmental
factors affect labour turnover and if so, which departments?
Dept.    Average labour force    Number of leavers/year
A                60                        15
B               184                        16
C               162                        15
D                56                        12
E                30                         4
F               166                        25
G               182                        25
H               204                        18

Table 8.6
3. Table 8.7 gives the data obtained on process times of two types of machine.
Is machine A more variable than machine B?
Machine A Machine B
Table 8.7
6. The number of cars per hour passing an intersection, counted from 11 p.m.
to 12 p.m. over nine days was 7, 10, 5, 1, 0, 6, 11, 4, 9.
Does this represent an increase over the previous average per hour of three
cars?
8. A coin is tossed 200 times and heads appear only 83 times. Is the coin
biased?
those of the six areas before the special campaign. The data are given in table 8.8.
Has the new campaign had any effect on the sales?
Table 8.8
1. The null hypothesis is set up—that the advanced typing course will not
affect the speed of typists.
This is a paired ‘t’ test since by considering the differences only, the variation
due to varying basic efficiency of the individuals is eliminated.
Typist    Difference xᵢ      xᵢ²
  1             5             25
  2             8             64
  3             2              4
            Σxᵢ = 15      Σxᵢ² = 93
t = (x̄ − 0)/(s/√n) = (5 − 0)/(3/√3) = 2.89
with 2 degrees of freedom.
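The paired calculation is equivalent to a one-sample test on the differences; a minimal sketch (Python/NumPy/SciPy, illustrative only) with the three differences above:

    import numpy as np
    from scipy.stats import t as t_dist

    d = np.array([5, 8, 2])                        # differences for the three typists
    t_obs = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
    print(round(t_obs, 2))                         # 2.89 on 2 degrees of freedom
    print(round(t_dist.ppf(0.95, 2), 3))           # one-tailed 5% point, 2.920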
Dept.     Average labour force    Observed leavers    Expected leavers    Contribution to χ²
A                  60                    15                  7.5                 7.5
B                 184                    16                 23.0                 2.1
C                 162                    15                 20.0                 1.2
D + E              86                    16                 10.8                 2.5
F                 166                    25                 20.8                 0.9
G                 182                    25                 22.8                 0.2
H                 204                    18                 25.5                 2.2

Table 8.10
Since in department E, the expected number of leavers is less than five, it has
to be grouped with another department. It is logical to group it with a similar
department or one whose effect on the number of leavers would be expected to be
similar. Having no other a priori logic here, and since there is little difference between
the observed and expected frequencies for department E (so that it has little effect on the total),
it is combined with department D, the next smallest.
For example, the expected number of leavers for department A is
EA = 60 × 130/1044 = 7.5
and so on.
Thus χ² = 16.6 with (7 − 1) or 6 degrees of freedom, since only the total was
used to set up the hypothesis.
Reference to table 8* gives, for 6 degrees of freedom,
χ²₀.₀₅ = 12.592    χ²₀.₀₁ = 16.812    χ²₀.₀₀₁ = 22.457
Thus the result is significant at the 1/20 (5%) level, i.e. there is evidence of differences between
departments.
When such a result is obtained it is usually possible to isolate the
heterogeneous departments by locating the department with the largest
contribution to χ². Providing this contribution is significant on 1 degree of freedom,
i.e. exceeds 3.841 at the 1/20 significance level, this department should be excluded
from the data and the analysis repeated until χ² is not significant. If the result
is significant with no single contribution greater than 3.841, the only conclusion
that can be drawn is that the heterogeneity is not due to one or two specific
departments but to general variation between them all.
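The departmental contributions can be recomputed as a check; the sketch below (Python/NumPy, illustrative only) uses the figures of table 8.6 with departments D and E already combined.

    import numpy as np

    labour = np.array([60, 184, 162, 86, 166, 182, 204])     # D and E combined: 56 + 30
    leavers = np.array([15, 16, 15, 16, 25, 25, 18])          # observed leavers, 12 + 4 for D + E

    expected = labour * leavers.sum() / labour.sum()          # overall leaving rate applied to each dept
    contributions = (leavers - expected) ** 2 / expected
    print(np.round(expected, 1))
    print(np.round(contributions, 1), round(contributions.sum(), 1))   # about 16.8; 16.6 above after rounding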
The results of repeating the analysis excluding department A are given in
table 8.11.
Dept.    Average labour force/year    Number of leavers/year    Expected leavers/year    Contribution to χ²

Table 8.11
F = sA²/sB² = 6.25,   ν₁ = 99, ν₂ = 79
Referring to table 9*, use ν₁ = 24, ν₂ = 60 (on the safe side).
Clearly the present value is highly significant; the product from machine A is more
variable than the product from machine B.
4. Null hypothesis—the change to the process has not affected the time.
Let x₁ = time of new process
    x₂ = time of old process
Mean of new process x̄₁ = 35 s; variance of new process estimated from sample
s₁² = 16.44
Mean of old process x̄₂ = 37 s; variance of old process estimated from sample
s₂² = 42.67
In order to apply the ‘t’ test, the variances of the two populations must be
the same.
Using the 'F' test to check that the population variances are the same gives
F = 42.67/16.44 = 2.59   with ν₁ = 9 and ν₂ = 9 degrees of freedom
which is not significant, so the two sample variances may be pooled.
s² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²]/(n₁ + n₂ − 2) = 29.6
t = [(37 − 35) − 0]/√[29.6(1/10 + 1/10)] = 0.82
with 18 degrees of freedom.
From table 7*
t₀.₀₂₅ = 2.101    t₀.₀₀₅ = 2.878
so the result is not significant at the 0.05 level—there is no evidence that the change has
reduced the time.
The result is significant at the 5% level but not at the 1% level. Some further
sampling would probably be in order so as to reduce the errors involved in
reaching a decision.
Strictly, the one-sided F test (used because there is, say, prior knowledge
that the conversion cannot possibly increase product variation but may reduce
it) should be applied as follows.
Observed
F = 1.7²/2.8² = 0.37
The lower 5% point of F is
F₀.₉₅ = 1/F₀.₀₅ = 1/2.30 = 0.435
Since the observed value of F is lower than this, the reduction in variation is
significant (statistically) at the 5% level.
The lower 1% point of F is
1/3.36 = 0.30
and the observed F is not significantly low at this level.
Σxᵢ = 53    Σxᵢ² = 429
Sample mean
x̄ = 53/9 = 5.9
Variance
s² = [429 − 53²/9]/8 = 14.6
Standard deviation
s = 3.82
t = (5.9 − 3)/(3.82/√9) = 2.28
t₀.₀₂₅ = 2.306 (8 degrees of freedom)
Thus the result is not quite significant at the 5% level. On the present data no
real increase in mean traffic flow is shown.
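This is simply a one-sample 't' test of the nine counts against a mean of 3, and can be verified with SciPy (a present-day sketch; the two-sided p-value is printed).

    import numpy as np
    from scipy.stats import ttest_1samp

    cars = np.array([7, 10, 5, 1, 0, 6, 11, 4, 9])
    t_obs, p_two_sided = ttest_1samp(cars, 3.0)
    print(round(cars.mean(), 2), round(t_obs, 2), round(p_two_sided, 3))   # 5.89, 2.27, about 0.053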
7. Sample mean
x̄ = Σxᵢ/n = 0.136 min
Estimate of population variance
s² = Σ(xᵢ − x̄)²/(n − 1) = 0.000212,   s = 0.0146 min
Let μ₀ = unknown true population average. Then for 95% confidence
x̄ − t₀.₀₂₅ s/√n ≤ μ₀ ≤ x̄ + t₀.₀₂₅ s/√n
0.136 − 2.11 × 0.0146/√18 ≤ μ₀ ≤ 0.136 + 2.11 × 0.0146/√18
i.e. 0.129 min ≤ μ₀ ≤ 0.143 min
8. This problem will be solved using two alternative methods—the 'u' test and
the χ² test.
For the 'u' test, the number of heads in 200 tosses of an unbiased coin is approximately
normally distributed with mean nπ = 100 and standard deviation √[nπ(1 − π)] = 7.07.
Using a continuity correction, u = (83.5 − 100)/7.07 = −2.33, which is numerically greater
than 1.96, so the result is significant at the 5% level.
Figure 8.1
              Heads    Tails
Observed O      83      117
Expected E     100      100

Table 8.12

With the continuity correction of ½:
              Heads    Tails
      O       83.5     116.5
      E      100       100
χ² = (83.5 − 100)²/100 + (116.5 − 100)²/100 = 5.45
with 1 degree of freedom.
Since 5.45 exceeds 3.841, the 5% point of χ² for 1 degree of freedom, there is evidence that
the coin is biased. (Note that, apart from rounding, χ² here is simply u².)
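The equivalence of the two methods can also be seen numerically; a sketch (Python/SciPy, not part of the original text) using the continuity-corrected figures:

    import numpy as np
    from scipy.stats import norm, chi2

    n, heads = 200, 83
    u = (heads + 0.5 - 0.5 * n) / np.sqrt(n * 0.25)                 # continuity-corrected deviate
    chi_sq = (83.5 - 100) ** 2 / 100 + (116.5 - 100) ** 2 / 100
    print(round(u, 2), round(u ** 2, 2), round(chi_sq, 2))          # -2.33, 5.45, 5.45
    print(round(2 * norm.cdf(u), 3), round(chi2.sf(chi_sq, 1), 3))  # the two p-values agree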
9. Here, since from a priori knowledge it can be stated that the new campaign
can only increase the sales rate, a one-tailed test can be used for extra
'power' in the test.
Again the paired ‘t’ test is applicable. Null hypothesis—new campaign has
not increased the sales.
Area       Difference in sales
 1              +500
 2              −600
 3              +600
 4              −200
 5              +600
 6              +300
Average         +200

Table 8.14
Code the data in units of 100, i.e. xᵢ = (difference in sales)/100.
Thus x̄ = 2
s² = 24.4,   s = 4.93
t = (2 − 0)/(4.93/√6) = 0.99
with 5 degrees of freedom
t₀.₀₅ = 2.015 (for a one-tailed test)
Since the observed value of t is well below 2.015, there is no evidence that the new
campaign has increased the sales.
Red rod population      Yellow rod population

Table 8.15
The population parameters are given in table 8.15. These parameters are
chosen so that for the first part of the experiment with sample sizes n = 10,
approximately half the groups will establish a significant difference between the
populations while the other half will show no significant difference at the 5%
probability level. Since each group summarises the results of all the groups, this
experiment brings out much more clearly than any lecture could do, the
concept of significance.
In the second part of the experiment where each sample size is increased to
30, the probability is such that all groups generally establish (95% probability)
a significant difference. The experiment demonstrates that there is a connection
between the two types of error inherent in hypothesis testing by sampling and
the amount of sampling carried out. To complete this experiment, including the
full analysis, takes approximately 40 min.
Appendix 1
Object
To test whether the means of two normal populations are significantly different
and to demonstrate the effect of sample size on the result of the test.
Method
Take a random sample of size 10 from each of the two populations (red and
yellow rods) and record the lengths in table 1. Return the rods to the
appropriate population.
Also take a random sample of size 30 (a few rods at a time) from each of the
two populations (red and yellow rods) and record the lengths in table 2.
Analysis
(1) Code the data, as indicated, in tables 1 and 2.
(2) Calculate the observed value of 't' for the two samples of size 10 and again
for the samples of size 30.
(3) Summarise your results with those of other groups in tables 3 and 4.
Observe whether a significant difference is obtained more often with the
samples of size 30 than with the smaller samples.
Notes
The ‘t’ test used is only valid provided the variances of the two populations
are equal. This requirement is, in fact, satisfied in the present experiment.
Table 1. Specimen results: samples of 10 rods from the yellow and red populations (coded values).

In order to reduce the subsequent arithmetic, and to keep all numbers positive, the coded
values, x′ and y′, are used in the calculation.

If b is the length of the shortest red rod in the sample, the mean, ȳ, of the red sample is
ȳ = b + Σy′/10 = 5.97
The variance, sy², of the red sample is
sy² = [Σy′² − (Σy′)²/10]/9 = 0.0667
The pooled estimate of variance, s², is
s² = {[Σx′² − (Σx′)²/10] + [Σy′² − (Σy′)²/10]}/18 = 0.0512
so that
t = (x̄ − ȳ)/√[s²(1/10 + 1/10)] = (6.17 − 5.97)/√[0.0512(1/10 + 1/10)] = 1.96
Table 2. Specimen results: samples of 30 rods from the yellow and red populations (coded values).

The calculation follows the same steps as for table 1: the coded sample means and variances
are computed, the two variances are pooled, and the observed value of t is obtained on
30 + 30 − 2 = 58 degrees of freedom.
Summary of the group results for the samples of size 10: for each group the table records
the sample means x̄ and ȳ, their difference (x̄ − ȳ), the observed value of t, and whether
the difference is significant at the 5% level (two-tail test).
The value of |t| which must be exceeded for the observed difference to be significant
at the 5% level = 2.101
Table 4
9.1 Syllabus
Assumption for use of regression theory; least squares; standard errors;
confidence limits; prediction limits; correlation coefficient and its meaning in
regression analysis; transformations to give linear regression.
a priori knowledge or assumptions that this relationship will hold for other
values of x.
η = α + β(x − x̄)
then the best estimate of this line is given by
Y = a + b(x − x̄)
where
a = Σyᵢ/n = ȳ
and
b = Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(xᵢ − x̄)²
Since this book is concerned with giving an introduction to the theory, the
examples given will be for this case of paired variables. It should be stressed,
however, that for cases where f; > 1 a more rigorous theory can be developed
and, in fact, a test for linearity can be incorporated into the analysis. Details
of this more advanced analysis can be found in most mathematical statistics
textbooks.
This omission of an independent test of linearity usually requires a priori
knowledge of linearity, and the assumption should in all cases be examined by drawing a
scatter diagram.
standard error of a
ea = s/√n
standard error of b
eb = s/√[Σ(xᵢ − x̄)²]
where s² = residual variance about the regression line
s² = Σ(yᵢ − Yᵢ)²/(n − 2)
where Yᵢ = estimate from the regression line.
The significance of a and b can, therefore, be tested by the ‘t’ test (see
chapter 8) or, alternatively, as shown in some textbooks, by an ‘F’ test, (see for
example Weatherburn, A First Course in Mathematical Statistics, C.U.P., pages
193 and 224, example 8).
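As a practical aside (not part of the original text), all of these quantities can be obtained in one step from scipy.stats.linregress; in the sketch below the x, y data are hypothetical, and 'stderr' is the standard error of b, so t = b/eb follows directly.

    import numpy as np
    from scipy.stats import linregress, t as t_dist

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])   # hypothetical x values
    y = np.array([2.1, 2.9, 3.6, 4.8, 5.2, 6.1, 6.8, 8.2])   # hypothetical y values

    fit = linregress(x, y)
    t_b = fit.slope / fit.stderr            # t test of b against beta = 0
    df = len(x) - 2
    print(round(fit.slope, 3), round(fit.intercept, 3), round(fit.rvalue, 3))
    print(round(t_b, 2), round(t_dist.ppf(0.975, df), 3))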
t = (b − β)/eb
and, in particular, setting β = 0, the value of
t = b/eb
given by the data can be referred to table 7* of the Statistical Tables and if it is
significantly large, judged usually on a two-sided basis, there is thus evidence of a
linear relationship between y and x.
s² = sy²(1 − r²)
−1 ≤ r ≤ +1
Figure 9.1
a = ȳ = Σyᵢ/n
b = [Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n]/[Σxᵢ² − (Σxᵢ)²/n]
The correlation coefficient
r = [Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n]/√{[Σxᵢ² − (Σxᵢ)²/n][Σyᵢ² − (Σyᵢ)²/n]}
9.2.8 Transformations
In some problems the relationship between the variables, when plotted or from
a priori knowledge, is found not to be linear. In many of these cases it is possible
to transform the variables to make use of linear regression theory.
For example, in his book Statistical Theory with Engineering Applications
(Wiley), Hald discusses the problem of the relationship between tensile strength
of cement (y) and its curing time (x).
From a priori knowledge a relationship of the form y = A e^(−B/x) is to be
expected.
The simple logarithmic transformation therefore gives
log₁₀ y = log₁₀ A − (B/x) log₁₀ e
which is linear in the transformed variables log₁₀ y and 1/x.
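After such a transformation the fitting is still ordinary linear regression of log₁₀ y on 1/x; a sketch of the idea (Python, with hypothetical x, y values following roughly y = A e^(−B/x)):

    import numpy as np
    from scipy.stats import linregress

    x = np.array([1.0, 2.0, 4.0, 7.0, 14.0, 28.0])        # hypothetical curing times
    y = np.array([13.0, 21.5, 29.0, 34.0, 38.5, 41.5])    # hypothetical tensile strengths

    fit = linregress(1.0 / x, np.log10(y))                # regress log10(y) on 1/x
    A = 10 ** fit.intercept                               # intercept = log10(A)
    B = -fit.slope / np.log10(np.e)                       # slope = -B * log10(e)
    print(round(A, 1), round(B, 2), round(fit.rvalue, 3))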
Student                          1    2    3    4    5    6    7    8    9    10
Numeracy test score (pts)       200  175  385  300  350  125  440  315  275  230
Final exam performance (%)

Table 9.1
What is the best relationship between test score and final performance?
Before any analysis is started the scatter diagram must be plotted to test the
assumption that the relationship is linear. This diagram shows no evidence of
non-linearity (figure 9.2).
Figure 9.2. Regression line with 95% confidence limits and prediction limits (score in final exam, %, against numeracy test score).
Total variance of x
sx² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1) = [868 325 − (2795)²/10]/9 = 87 122.5/9 = 9680.3
Total variance of y
Correlation Coefficient
Regression Coefficients
However, in addition, in this example n (=10) is not very large and the
approximation used will lead to an underestimate of the actual residual variance.
Using the exact expression gives
s² = sy²(1 − r²)(n − 1)/(n − 2) = 14.4 × 9/8 = 16.2
and this will be used in the remaining calculations since, without this correction,
the bias of the estimator is −11% of the true value.
The residual standard deviation, s, is √16.2 = 4.02
Yᵢ ± t(α/2, n−2) eYᵢ
For 95% limits, the appropriate value of t (table 7*) is 2.306; table 9.2 shows
the derivation of the actual limits for a range of values of x.
The scatter diagram (drawn before any computations were carried out, in
order to check that the basic regression assumptions were not obviously violated),
the fitted regression line and 95% confidence limits are shown in figure 9.2.
From figure 9.2 or table 9.2, there is 95% confidence that the average final
examination percentage for all candidates who score 330 points in their initial
numeracy test will lie between 61.3% and 67.9%.
Table 9.2
Yᵢ ± t √(s² + eYᵢ²)
These limits are calculated in table 9.3 and are also drawn in figure 9.2.
Table 9.3
From the figures in table 9.2, it can be expected, for example, that 95% of
candidates scoring 330 points in their numeracy test will achieve a final
examination mark between 55% and 74% inclusive, 5% of candidates gaining
marks outside this range.
Note: Such predictions are only likely to be at all valid if the sampled data
used to calculate the regression relation are representative of the same population
of students (and examination standards) for which the prediction is being made.
In other words, care must be taken to see that inferences really do apply to the
population or conditions for which they are made.
The danger of extrapolation has been mentioned. The regression equation
indicates that students scoring zero in the test, on average, gain a final mark of
35.6%. This may be so but it is very likely that the relation between the two
examination performances is not linear over all values of x. Conclusions on the
given data should only be made for x in the range 125 to 440.
Thickness (mm)    Shear strength of weld (kg)
0.2                        102
0.3                        129
0.4                        201
0.5                        342
0.6                        420
0.7                        591
0.8                        694
0.9                        825
1.0                       1014
1.1                       1143
1.2                       1219

Table 9.4
Farm size    Income ($)      Farm size    Income ($)
    60           960             70          1020
   220           830            120          1080
   180          1260            240           960
    80           610            160           700
   120           590             90           800
   100           900            110          1130
   170           820            220           760
   110           880            110           740
   160           860            160           980
   230           760             80           800

Table 9.5
state the limits of error in using this relationship to predict farm income from
farm size.
(a) Calculate the regression coefficients and thus the regression equation
which will enable the manufacturer to predict the unit cost of these lenses in
terms of the number of lenses contained in each order.
(b) Estimate the unit cost of an order for eight lenses.
5. The work of wrapping parcels of similar boxes was broken down into eight
elements. The sum of the basic seconds per parcel (i.e. of these eight elements)
together with the number of boxes in each parcel is given in table 9.5.
Number of boxes    Basic seconds      Number of boxes    Basic seconds
      1                 130                  22                260
      6                 200                  27                190
     13                 150                  34                290
     19                 200                  42                270

Table 9.5
(a) Calculate the constant basic seconds per parcel and the basic seconds for
each additional box in the parcel.
Calculate the linear regression and test its significance.
(b) What would be the best estimate of the basic seconds for wrapping a
parcel of 18 boxes?
Estimate the best linear time trend and calculate confidence limits for
forecasting.
Variance of x
sx² = [6.49 − (7.7)²/11]/10 = 1.10/10 = 0.110
Total variance of y
sy² = [Σyᵢ² − (6680)²/11]/10 = 1 636 376.2/10 = 163 637.6
Correlation Coefficient
r = [Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n]/√(1.10 × 1 636 376.2) = 1332/1341.6 = 0.9928
The proportion of the total variance of y 'explained' by the linear regression
relation between y and x is approximately 0.9928², or 98.6%.
Regression Line
a = ȳ = 607.3
b = r √(sy²/sx²) = 0.9928 × √(163 637.6/0.110) = 1210.9
so that Y = 607.3 + 1210.9(x − 0.7)
Standard Errors
The estimated residual variance about the regression line is
s² = sy²(1 − r²)(n − 1)/(n − 2) = 2609
so that the standard error of b is
eb = √(2609/1.10) = 48.7
and
t = b/eb = 1210.9/48.7 = 24.9
a very highly significant value of t for 9 degrees of freedom.
eY = √[s² + ea² + eb²(x − x̄)²]
Table 9.7 shows some values of these two standard errors for particular
values of x, together with the 95% confidence and prediction limits using the
appropriate ¢-value of 2.26 (9 degrees of freedom).
The information in this table, as well as the observed data are plotted in
figure 9.3.
Notice that the fitted ‘best’ line does not go through the origin. In fact the
origin is not contained within the 95% confidence interval for the ‘true’
regression line—which is equivalent to saying that the intercept of the fitted
line is significantly (5% level) different from zero. From inspection of the
observed data, there is a suggestion that the true relation curves towards the
origin for low values of sheet thickness. In short, do not extrapolate for
thickness values below 0.2 mm and bear in mind that the calculated relationship
for sheet thicknesses of 0.2 mm and just above may underestimate the average
shear strength of welds.
Table 9.7
Figure 9.3. Regression line with 95% confidence limits and prediction limits (shear strength of sheets, kg, against sheet thickness).
In the following solutions, since the calculations are all similar to that of
problem 1, the detailed computations are not given.
2. Here the scatter diagram (figure 9.4) shows little evidence of a relationship
but, on the other hand, it does not offer any evidence against the linearity
assumption so the computation is as follows.
n = 20             x̄ = 139.5
Σx = 2790          ȳ = 872.0
Σy = 17 440        sx² = 3194.47
Σx² = 449 900      sy² = 28 711.58
Σy² = 15 753 200   r = +0.0078
Σxy = 2 434 300    sx = 56.5
                   sy = 169.4
b = r (sy/sx) = 0.0078 × (169.4/56.5) = +0.0234
Figure 9.4. Scatter diagram of income ($) against farm size.
Regression Line
a = ȳ = 872.0 and b = +0.0234, so that Y = 872.0 + 0.0234(x − 139.5)
Significance of b
From inspection of the scatter diagram (figure 9.4) and the low value of r (the
significance of which can be tested using table 10*), the observed value of b is
not expected to differ significantly from zero.
Residual variance
s² = sy²(1 − r²)(n − 1)/(n − 2) = 28 711.58 × (1 − 0.0078²) × 19/18 = 30 305
Standard error of b
eb = √(30 305/60 695) = 0.7
t = (0.0234 − 0)/0.7 = 0.033
which is clearly not significant. (For the slope of the fitted regression line to be
significantly different from zero, at the 5% level, the observed value of t would
have to be numerically larger than 2.101.)
Thus, until further evidence to the contrary is obtained, farm income can be
assumed to be independent of farm size, at least for the population of farms
covered by the sample of 20 farms.
Since the data show no evidence of a relation between farm size and income,
there is little point in retaining the fitted regression equation. The best estimate
of the mean income of farms in the given population is therefore $872.
Ninety-five per cent confidence limits for this mean income are given by
ȳ ± t₀.₀₂₅ sy/√n = 872 ± 2.09 × 169.4/√20, i.e. approximately $793 to $951.
3. The scatter diagram (figure 9.5) shows that the relationship is not linear, so the analysis cannot be continued.
Figure 9.5
4. The scatter diagram (figure 9.6) indicates quite a strong relationship between
unit cost and order size, and a simple linear relation would probably be adequate,
at least in the range of order size considered. Such a simple model would be
inadequate for extrapolation purposes since the cost per unit would be expected
to tend towards a fixed minimum value as order size was increased indefinitely
and therefore some sort of exponential relation would be a better fit for such
purposes.
Figure 9.6. Regression line with 95% confidence limits (unit cost, £, against number of units in order).
Regression Line
a = ȳ = 42.4
Significance of b
Residual variance
s² = 213.3[1 − (−0.9459)²] × 4/3 = 29.94
Standard error of b
eb = 0.587
Observed value of t
t = (−2.97 − 0)/0.587 = −5.06
Reference to table 7* for 3 degrees of freedom shows that the value of |¢|
for significance at the 1% level is 5.841 and at the 2% level is 4.541. The
observed value of t falls between the two and it may reasonably be inferred that
the slope of the ‘true’ regression line is different from zero and is negative, the
best estimate of its value being —2.97.
95% confidence limits for the regression estimate at several values of x; are
derived in table 9.8, figure 9.6 showing these limits plotted on the scatter diagram.
Table 9.8
(b) To estimate the unit cost of an order for eight lenses, substitution of
x = 8 can be made in the regression equation, giving a unit cost of £37.0.
This figure is the ‘best’ estimate of the average over all possible orders of eight
lenses, of the cost per lens in an order of eight lenses.
The uncertainty of this figure (£37.0) is given by the interval (at 95%
confidence) £28.50 to £45.50.
If required, the cost per lens for a randomly selected order for eight lenses
is likely to be (95% probability) in the interval, £17.64 to £56.36, a very wide
range indeed.
5. The scatter diagram (figure 9.7) does not show any evidence against the
assumption of linearity and in this example, a priori logic suggests that it would
be a reasonable model of the situation.
Let x = the number of boxes in a parcel and y = the number of basic seconds
per parcel.
Figure 9.7. Basic seconds per parcel against number of boxes in the parcel.
The following totals are obtained from the data (without coding)
n = 8
Σx = 164          x̄ = 20.5
Σx² = 4700        sx² = 191.14
Σy = 1690         ȳ = 211.25
Σy² = 380 100     sy² = 3298.21
Σxy = 39 130      r = +0.8069
Regression Line
a = ȳ = 211.25
b = r √(sy²/sx²) = 0.8069 × √(3298.21/191.14) = 3.35
Significance of b
Residual variance
s² = 3298.21(1 − 0.8069²) × 7/6 = 1342.5
Standard error of b
eb = √(1342.5/1338) = 1.002
Observed value of t
t = (3.35 − 0)/1.002 = 3.34
Reference to table 7* shows that this value, having 6 degrees of freedom, falls
between the 2% and 1% levels of significance (3.143 and 3.707 respectively). The
slope of the regression line can therefore be assumed to be different from zero
with b = 3.35 as its best estimate.
Table 9.9 shows values of eYᵢ for certain xᵢ together with 95% confidence
limits for the regression estimate at that point. The scatter diagram (figure 9.7)
also has 95% confidence limits drawn on it.
Table 9.9
6. Here, in order to reduce the computation slightly, all the basic data have
been coded into units of $100; i.e. $1300 becomes 13 etc.
The scatter diagram (figure 9.8) illustrates the case of ‘fliers’ or ‘outliers’, i.e.
readings which do not appear to belong to the bivariate distribution. These
suspect readings are marked as A and B in figure 9.8. Whenever such observations
occur in practice, a decision has to be made as to whether or not to exclude
them. Special tests to assist in this are available but are beyond the level of this
book and all that can be said here is that the source of the readings should be
carefully examined and if any reason is found for their not being homogeneous
with the others, they should then be rejected. In many cases, a commonsense
approach will indicate what should be done.
In this example, the two points, A and B, clearly do not conform and a closer
examination of the situation would probably isolate a reason so that the points
could validly be excluded. However, to demonstrate their strong effect on the
analysis, the points A and B have been retained in fitting the regression line.
Figure 9.8. Total sales ($); the regression line (not significant) and the suspect points A and B are marked.
Regression Line
a = ȳ = 23.64
b = r (sy/sx) = 0.898
(Y − 23.64) = 0.898(x − 10.64), i.e. Y = 14.09 + 0.898x (in $100)
or, converting back to the original units, Y = 1409 + 0.898x (in $)
Significance of b
The residual variance about the line is
s² = 43.45(1 − 0.3566²) × 10/9 = 42.14
Observed value of t
t = (0.898 − 0)/0.784 = 1.15
which is not significant.
Ly? = 5266
Lxy = 2353
leading to
r=0.9854 and Y=66.15 + 2.29x (in $)
The fact that just two points have obscured the relationship should be noted,
as should the assistance given by the scatter diagram towards interpretation of the
situation.
Figure 9.9. Total sales ($) against year, with the fitted regression line.
Σy = 2740       sx² = 6.0
Regression Line
a = ȳ = 342.5
b = r √(sy²/sx²) = 0.4403 × √(5307.1/6.0) = 13.095
(Y − 342.5) = 13.095(x − 4.5), i.e. Y = 283.57 + 13.09x
Significance of b
The residual variance about the line is
s² = 5307.1[1 − (0.4403)²] × 7/6 = 4991.3
The standard error of b is
eb = √(4991.3/42) = 10.90
and
t = (b − 0)/eb = 13.09/10.90 = 1.20
with 6 degrees of freedom.
Reference to table 7* shows that this is not significantly different from
zero, that is, there is no evidence of a relationship between sales and time. In
this case there is no point in using the regression equation above to estimate sales
for 1968 (Year 9) or beyond. The average yearly sales figure of 342 is probably
as good a figure as any to use for making a short-term forecast on the basis of the
information given.
The first two chapters of this book provide a detailed treatment of two of
the basic concepts of statistics—Probability and Distribution—and thereafter,
apart from brief summarised introductions to other topics, the work rests
mainly on worked and unworked examples. Every attempt has been made to
use examples which will stimulate interest and to demonstrate practical
applications throughout the whole range of an introductory course in statistics.
The book will be especially useful to students of engineering and to all those
seeking an elementary introduction to the subject. Although the book can be
used as an independent text, the attention of students is drawn to two other
books by the same authors which may in certain cases be used to advantage
alongside the present volume. These are Basic Statistics, Laboratory Instruction
Manual, and Statistical Tables. Further details are available from the
publishers.
SBN 333 12017 5