1 Statistics PDF
Heinemann Educational Australia
a division of the Octopus Publishing Group Australia Pty Ltd
22 Salmon Street, Port Melbourne, Victoria 3207
Offices in Sydney, Brisbane and Adelaide. Associated companies,
branches and representatives throughout the world.
©J.B.Fitzpatrick and P. L. Galbraith 1990
First published 1990
Reprinted 1991
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system or transmitted in any form by any means
whatsoever without the prior permission of the copyright owner.
Apply in writing to the publishers.
Edited by Scharlaine Cairns, Charlie C. Editorial Pty Ltd
Designed by Tom Kurema
Illustrations by Gavin Mount
Keying and preparation of disks by Tricia Randle
Typeset in Times Roman by Savage Type Pty Ltd, Brisbane
Printed in Singapore by Chong Moh Offset Printing
National Library of Australia
Cataloguing-in-publication data:
Fitzpatrick, J. B. (John Bernard).
Reasoning and data
Includes index.
ISBN 0 85859 527 3.
1. Mathematics. I. Galbraith, P. (Peter). II. Henry, Bruce. III.
Title. (Series: Heinemann senior mathematics).
510
Contents
(Projects and Investigations are identified by [P] and [I] respectively.)
Acknowledgements
Preface
Chapter 1 Statistics
1.1 Graphical representation of data
[P] 1.2 Limited-over cricket
1.3 Continuous and discrete data
1.4 Frequency distribution
1.5 Histograms
1.6 Frequency polygons
1.7 Measures of central tendency
1.8 Measurement of dispersion
[P] 1.9 Multi-lingual 'Scrabble'
Chapter 2 Probability
3.9 Combinations
3.10 The symbol $\binom{n}{r}$ or $^nC_r$
3.11 Probability associated with permutations and combinations
8.10 Analysis of a time series
8.11 Measures of trend
8.12 Measurement of seasonal variations
8.13 Forecasting using single moving average
Chapter 14 Calculus (extension)
14.1 Power series for �
14.2 Antidifferentiation by parts
14.3 Other density functions
14.4 Measures of location for probability distributions
14.5 The mean (expected value) of g(X)
14.6 Variance and standard deviation
Summary
Answers
Index
Acknowledgements
The authors wish to express their thanks to Mr Ted Byrt, formerly of State College Rusden
Campus, for his contribution and helpful suggestions in the area of statistics.
The authors and publisher would like to thank the following individuals and organisations
for their assistance in providing photographs and for their permission to reproduce
copyright material:
Charles Ciurleo, pp. 77 (a, b, c), 137 (b) and 219; D. A. Heffernan, p. 401; The Herald
& Weekly Times Ltd, Melbourne, pp. 77 (d), 263 and 423; Tattersall Sweep Consultation,
p. 205; Tubemakers of Australia Ltd, p. 167.
Every effort has been made to trace and acknowledge copyright material and the authors
and publisher would welcome any information from people who believe they own copyright
material used in this book.
Preface
Reasoning and Data provides a comprehensive coverage of the compulsory sections of the
unit, together with detailed coverage of eight of the content clusters. The book also provides
for study of Reasoning and Data at the extension level, with coverage of the probability,
statistics, and algebra requirements together with two selections (calculus and geometry)
from the additional study areas.
With respect to the work requirements, essential content in the area of probability is
contained within Chapters 2, 4, 5 and 6. The compulsory statistics material is contained in
Chapters 1, 4, 5 and 6. Chapter 1 is an introduction, consolidating aspects of data
representation that will have been studied to varying degrees in past years. The other
chapters systematically introduce discrete and continuous distributions together with their
special features, and related calculations of statistical measures and estimates of parameters.
The logic requirements are provided for within Chapters 2, 10, 12 and 13. Set diagrams are
utilised in probability work (Chapter 2) and also in the chapters on logic and reasoning
(Chapter 11) and Boolean algebra (Chapter 13). The concept and application of proof
appears in the chapters on logic and reasoning (Chapter 11), graphs and optimisation
(Chapter 10), methods of proof (Chapter 12) and Boolean algebra (Chapter 13). 'Graphs
and optimisation' (Chapter 10) contains all the material necessary for the study of
undirected graphs.
The algebra section is well covered. Chapter 3 contains applications of combinations; basic
equation solving and formula manipulation is required regularly throughout almost all
chapters; set algebra is used widely in Chapters 3, 11 and 13; sequences and series are
applied in Chapters 8 and 12, and Chapter 8 also includes work on non-linear relationships.
The companion volumes Space and Number and Change and Approximation contain
additional material that systematically addresses analytical and numerical methods for
solving equations and inequations.
Clusters of content
The following chapters contain material that enables comprehensive coverage of the
nominated clusters.
Combinations Chapters 3 and 4
Sampling processes Chapter 4
Probability distributions - geometric, Poisson and exponential Chapter 5
Time series analysis and economic statistics Chapter 8
Correlation and regression Chapter 8
Non-parametric statistics Chapter 9
Logic and proof Chapters 11 and 12
Boolean Algebra Chapter 13
In addition, substantial amounts of material pertaining to the Clusters (Random sampling,
Estimation and confidence intervals, and Directed graphs) are also included.
For the extension course, Chapters 2, 3, 4 and 6 contain extension material for probability;
Chapter 6 provides extension material for statistics; and Chapter 3 provides extension
material for algebra.
Within the additional area of study, two options are provided; 'Calculus extension'
(Chapter 14) and 'Euclidean geometry extension' (Chapter 15).
The treatment of the subject matter emphasises coherence so that, where relevant, extension
material appears as a natural development of the core material. Chapter 7 'Problem solving
and investigations' provides material particularly geared to problem solving, modelling, and
project work.
Features of the presentation include:
• a systematic and thorough introduction to, and consolidation of, content material to
promote concept understanding and facility in skills and standard applications. Numerous
worked examples and sets of exercises are included to this end, including sets of revision
exercises.
• provision of problem-solving examples, modelling situations, investigations and project
material integrated through the chapters, in addition to those provided in Chapter 7.
• integration of the electronic calculator throughout, and provision of computer-based
learning tasks for concept learning, applications and investigation and project work.
Project material is defined in terms of its nature rather than its length. School projects of
varying lengths may be obtained by combining one or more text-based projects. Text-based
Projects and Investigations are frequently presented in a sequential fashion so that
variations between students can be provided for, e.g. not every student may be required to
complete every part of such an activity. A computer application often forms the final section
of a Project/Investigation and can be retained or omitted without otherwise affecting the
structure.
The authors endorse the spirit and intent of the general course structure and its work
requirements. It is expected that many effective modelling situations, investigations and
projects will be designed with the local school environment in mind. This book provides a
supporting base upon which such local emphasis can be built, while at the same time
containing more than sufficient material to meet the work requirements in all areas.
CHAPTER 1
Statistics
Statistical information presented by courtesy of Toshiba (Aust) Information Systems
Division.
2 STATISTICS
'Statistics', in a broad sense, deals with scientific methods of collecting, recording and
summarising data from which future trends can be predicted, or which can be used as a
basis for making decisions and drawing valid conclusions.
Government departments use data collected by statisticians to observe trends in such areas
as population growth, urban development and employment, so that provision can be made
for public transport, schools, hospitals, playgrounds, and so on. Can you think of other
uses by government departments of statistical data?
Sporting commentators use statistics to compare the performances of individuals and teams,
for example cricketers' batting and bowling averages.
The school uses statistics in the form of class lists and student subject choices to help
determine the number of teachers needed, classroom allocations, number of desks and
lockers required, and so on. Your teachers are using statistics when they analyse and
interpret your assessment results to determine your progress or to obtain the class average
in various subjects.
Industry and commerce use statistics, for example, to help reduce the number of defective
items produced by machines. If records show that a particular machine is constantly
producing inferior quality articles, management uses this information to decide if the
machine should be repaired or replaced. Can you think of other uses of statistics in industry
and commerce?
Example 1
In 1987, 705 people died on Victorian roads. These people were either drivers, passengers,
pedestrians, motorcyclists (and pillion passengers) or bicyclists as shown in the following
table:
The bar graph, in Figure 1-2, is similar to the column graph, but bars or rectangles of any
width replace the vertical lines of the column graph. The bars are equally spaced and of
the same width. Frequently, in a bar graph, the bars are placed horizontally instead of
vertically.
Figure 1-1: Column graph (number killed: D 310, Pa 166, Pe 137, M 67, B 25)
Figure 1-2: Bar graph (number killed: D 310, Pa 166, Pe 137, M 67, B 25)
The pie chart in Figure 1-3 shows the number of each type of road user killed, expressed
as a proportion or percentage. The steps used in drawing the pie chart are:
a Express the number killed as a percentage.
Drivers: $\frac{310}{705} \approx 44.0\%$
Passengers: $\frac{166}{705} \approx 23.5\%$, etc.
b Convert each percentage into its equivalent part of a circle, in degrees.
Drivers: 44.0% of 360° ≈ 158°
Passengers: 23.5% of 360° ≈ 85°, etc.
c Draw a circle and, with the aid of a protractor, mark accurately the sector representing
each type of road user.
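Steps a and b can be checked with a short calculation. The following is a minimal sketch in Python: the counts are those read from Figures 1-1 and 1-2, while the full category names (matched to the labels D, Pa, Pe, M and B in the order the text lists them) and the function name `pie_sectors` are assumptions made for illustration.

```python
# Road deaths in Victoria, 1987, as read from Figures 1-1 and 1-2.
# The pairing of names with the labels D, Pa, Pe, M, B is assumed from the text order.
deaths = {
    "Drivers": 310,
    "Passengers": 166,
    "Pedestrians": 137,
    "Motorcyclists": 67,
    "Bicyclists": 25,
}


def pie_sectors(counts):
    """Return {category: (percentage, sector angle in degrees)}."""
    total = sum(counts.values())
    return {
        name: (100 * n / total, 360 * n / total)
        for name, n in counts.items()
    }


sectors = pie_sectors(deaths)
for name, (pct, angle) in sectors.items():
    print(f"{name:<14} {pct:5.1f}%  {angle:5.1f} degrees")
```

Working from the raw counts rather than from rounded percentages avoids carrying a rounding error into the angles, and the angles then add up to 360° exactly.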
Example 2
The bar graph in Figure 1-4 shows the profit, before and after tax was paid, of a chain store
operating throughout Australia. The information is given in the table below.
The full height of each rectangle in Figure 1-4 represents the profit before tax.
The information given in Example 2 can also be illustrated by means of a line graph, as
in Figure 1-5. It should be noted that only the position of the dots represents the
information given. The steepness of the lines joining these dots indicates the degree of
increase or decrease. For example, the profit after tax rose more sharply from 1985 to 1986
than in any other year.
Figure 1-4: Bar graph of profit ($ million), with each bar divided into tax and profit after tax
Figure 1-5: Line graph of profit ($ million) before tax and after tax
Exercises 1a
(Most of these questions are based on data supplied by the Australian Bureau of Statistics.)
1 The table below shows the percentage of imports into Victoria from various countries,
and the exports from Victoria to other countries, in 1986-87.
Represent the data by means of bar graphs and make any comments you consider to be
relevant.
Country      Imports   Exports
USA          25.7      14.2
Japan        19.2      14.6
Germany      9.7       4.0
UK           7.2       -
China        6.4       8.8
NZ           3.9       7.9
Italy        2.9       -
Hong Kong    -         5.5
Singapore    -         4.3
Other        23.1      40.8
2 The following table shows the rainfall (mm) and the number of days of rain in
Melbourne for each of the twelve months in 1987.
Month Jan Feb Mar Apr May June July Aug Sept Oct Nov Dec
Rainfall mm 57 57 50 19 85 51 69 23 38 39 82 85
Days of rain 9 7 10 8 17 16 11 16 14 10 13 10
Represent the data by means of line graphs and make any comments you consider to
be relevant.
3 The following table shows the number of divorces (to the nearest thousand) granted in
Australia in the years 1973 to 1983.
Year                      1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983
Petitions granted ('000)  16   18   24   63   45   40   38   39   41   44   44
Injured ('000) 17 20 24 18 20 20 23
a Using the same scale and axes, draw line graphs to represent the data.
b In December 1970, legislation requiring compulsory wearing of seat belts was
introduced. Explain how your graphs illustrate the effectiveness of this legislation.
c Express the number of people injured as a percentage of the number of registered
vehicles and comment on the effectiveness of the legislation.
5 The bar graphs below show the average number of fatal accidents in 1986-1987 on
Victorian roads for different times of day and different days of the week. Comment on
the data provided.
[Bar graph: average 1986-1987, by day of the week (SUN to SAT), for the periods 4 AM-4 PM and 4 PM-4 AM; vertical scale 0 to 100]
6 The three bar graphs below show the Federal Government revenue from company tax,
PAYE tax and sales tax for each financial year ending June 1984 to 1988.
[Three bar graphs for the years '84 to '88]
a Draw a single bar graph to represent the total tax collected from the three sources.
b Use the bar graphs to estimate the likely revenue from each source for 1989.
7 The following table shows the working days lost per thousand employees due to
industrial disputes in each of the Australian States in 1983 and 1984.
Represent the data on two bar graphs, drawn side-by-side, and make any relevant
comments.
8 The number of thousands of people working on building jobs in New South Wales in
a particular year was:
Carpenters 10.0 Bricklayers 4.4
Painters 2.6 Electricians 3.0
Plumbers 4.0 Builders' labourers 4.8
Others 7.2
Represent this information on a pie chart.
9 The pie chart on the right shows the percentage
of world production of tin from selected
countries in a particular year. In that year,
Australia produced 10 200 tonnes. How many
tonnes were produced by each of the other
countries in the chart?
[Pie chart: Malaysia 38%]
10 The bar graphs below show the income and overhead expenses of a mining company
in the years 1984 to 1988. Assume that profit = income - expenses.
[Bar graph: income and overhead expenses ($ million)]
11 a Study the following graphs and state how they tend to misrepresent the data.
(i) [Bar graph: average 1986-1987, 4 AM-4 PM and 4 PM-4 AM; vertical scale from 40 to 100]
(ii) [Bar graph: Sales ($ million), 1987 and 1988]
(iii) [Bar graph: Profit ($'000), '85 to '88; vertical scale from 100 to 115]
(iv) [Graph]
(The above details were provided by courtesy of the Victorian Cricket Association.)
The heights range from 150 to 185 cm. This range is called the sample range. This range
is divided into intervals of 5 cm, called the class intervals. There are two students whose
heights range from 150 cm up to, but less than, 155 cm; eight students with heights ranging
from 155 cm up to, but less than, 160 cm; and so on.
To allocate the heights into their various classes, it is advisable to mark them off
systematically in bundles of five, as shown in the tally column of the frequency distribution
in Table 1.1. This enables the frequencies to be easily and quickly totalled.
Height in cm   Tally                                 Number of students (frequency)
150-           II                                    2
155-           IIIII III                             8
160-           IIIII IIIII IIIII II                  17
165-           IIIII IIIII IIIII IIIII IIIII III     28
170-           IIIII IIIII IIII                      14
175-           IIIII I                               6
180-<185       IIIII                                 5
Total                                                80
Alternatively, we may use the stem and leaf technique to classify the data. Each number
may be considered as consisting of two parts, the stem and the leaf. In the data in Example
3, in which the heights range from 150 to 185 cm, we may consider the first two digits of
each measurement as the stem and the units digit as the leaf. For example, for the first
observation (153), we may consider 15 as the stem and 3 as the leaf. However, this would
provide us with only four stems, namely, 15, 16, 17 and 18. Since we are dividing the data
into class intervals of 5 cm, it would be better to consider two stems for each of 15, 16,
17 and 18 and attach the units 0 to 4 with the first 15 and the units 5 to 9 with the second
15, and so on as follows:
Stem   Leaf                           Total
15     34                             2
15     65999758                       8
16     01143023024332410              17
16     5779976589799568585979688659   28
17     13402401412024                 14
17     568975                         6
18     34024                          5
                                      80
We may now place the leaves in ascending order, if desired. This then arranges the 80
observations in ascending order.
Stem   Leaf
15     34
15     55678999
16     00001112223333444
16     5555556666777778888899999999
17     00011122234444
17     556789
18     02344
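The split-stem arrangement can also be produced mechanically. The sketch below, in Python, rebuilds the 80 heights from the ordered stem and leaf display above and then regroups them with two stems per ten (leaves 0 to 4, then 5 to 9). The function name `split_stem_and_leaf` is chosen here for illustration only.

```python
from collections import defaultdict

# The 80 heights (cm), rebuilt from the ordered stem and leaf display.
rows = [
    (15, "34"), (15, "55678999"),
    (16, "00001112223333444"), (16, "5555556666777778888899999999"),
    (17, "00011122234444"), (17, "556789"),
    (18, "02344"),
]
heights = [10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves]


def split_stem_and_leaf(data):
    """Group values by (stem, half): half 0 holds leaves 0-4, half 1 holds leaves 5-9."""
    groups = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        groups[(stem, leaf // 5)].append(leaf)
    return groups


display = split_stem_and_leaf(heights)
for (stem, _half), leaves in sorted(display.items()):
    print(f"{stem:>3} | {''.join(map(str, leaves)):<28} {len(leaves):>3}")

# The row totals are the class frequencies for 5 cm class intervals.
frequencies = [len(leaves) for _key, leaves in sorted(display.items())]
```

Because each row covers exactly one 5 cm class, the row totals reproduce the frequency column of the tally table.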
The above frequency distribution shows the distribution of heights of a sample of 80
students. This sample may or may not be representative of the population of students of
this particular age group. We cannot generalise from the result of this sample that two out
of every 80 students of this age group would have a height in the range 150-. The number
will vary from sample to sample.
If, however, our sample were considerably larger and, in place of the actual numbers of
students of the large sample, we put the percentage of students, the frequency distribution
would then be called a percentage frequency distribution or relative frequency distribution
or a probability distribution; in which case, we could then say that 2.5 per cent of students
in this age group have heights from 150 cm up to, but less than, 155 cm or that the
probability of any student, randomly selected from this age group, having a height in this
range is 0.025. The probability of a student having a height of less than 160 cm is:
$\frac{10}{80} = 0.125$.
In Table 1.2, the actual frequencies are expressed as percentage frequencies and
proportionate frequencies. For example, in the 150- class range, there are two students out
of 80, i.e. 2.5% or 0.025.
Table 1.2: Proportionate frequency distribution
Height in cm   Percentage frequency   Proportionate frequency
Total          100                    1
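The conversion from actual frequencies to percentage and proportionate frequencies is a single division by the sample size. A minimal sketch in Python, using the class frequencies of Table 1.1 (the variable names are illustrative):

```python
classes = ["150-", "155-", "160-", "165-", "170-", "175-", "180-<185"]
frequencies = [2, 8, 17, 28, 14, 6, 5]  # from Table 1.1

total = sum(frequencies)
proportions = [f / total for f in frequencies]
percentages = [100 * p for p in proportions]

for label, pct, prop in zip(classes, percentages, proportions):
    print(f"{label:<9} {pct:6.2f} {prop:7.4f}")
print(f"{'Total':<9} {sum(percentages):6.2f} {sum(proportions):7.4f}")

# Probability of a height below 160 cm: the first two classes combined.
p_below_160 = sum(frequencies[:2]) / total
```

The last line reproduces the calculation 10/80 = 0.125 quoted in the text.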
Frequency distributions, then, are sometimes of importance in themselves but they are
mainly important in providing information about the population from which the sample is
drawn.
Note:
The word 'population' does not necessarily refer to the entire population of a country or
even a State. It could refer to the population in a certain area or even in a particular school.
Furthermore, it does not necessarily refer to a population of people. We speak of, say, the
population of mass-produced electric light globes having varying lengths of life, or the
population of metal rods having varying diameters.
1.5 Histograms
A histogram is a diagram representing the frequency distribution of a continuous
variable.
The class limits are marked off on a horizontal axis, and a rectangle is constructed on each
class interval so that the area of the rectangle is proportional to the corresponding class
frequency. If the class intervals are equal, the heights of the rectangles are proportional to
the frequencies.
Figure 1-6: Histogram (horizontal axis: 150 to 185 in steps of 5; vertical axis: 0 to 30)
In the histogram in Figure 1-6, the height (and also the area) of each rectangle is
proportional to the frequency, because each class interval is the same.
However, consider the following example. Table 1.3 gives the distribution of the diameters
of a large number of mass-produced wheels. Draw a histogram for these data.
Table 1.3
Diameter (cm)   Percentage of wheels
4.86-           4
4.90-           6
4.92-           9
4.94-           20
4.96-           31
4.98-           20
5.02-           6
5.08-5.12       4
Total           100
In Table 1.3 we notice that the class intervals are unequal. The most common class interval
is 0.02 cm. Taking this as the unit, and remembering that the area of each rectangle is
proportional to its corresponding class frequency, we notice that the 4.86- range has an
interval of 0.04, and so the height for this rectangle will be halved. Similarly for the 4.98-
and 5.08-5.12 ranges. The range 5.02- has a class interval of 0.06, and therefore
we will have to take ⅓ of the height of this rectangle (Figure 1-7).
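The adjustment for unequal class intervals amounts to dividing each frequency by the class width measured in units of 0.02 cm, so that area rather than height represents frequency. The following Python sketch illustrates this for Table 1.3; the boundaries are stored in hundredths of a centimetre (an implementation choice made here to keep the arithmetic exact), and the function name `rectangle_heights` is hypothetical.

```python
# Class boundaries in hundredths of a centimetre (integers avoid rounding error),
# and the percentage of wheels in each class, from Table 1.3.
boundaries = [486, 490, 492, 494, 496, 498, 502, 508, 512]
percentages = [4, 6, 9, 20, 31, 20, 6, 4]
UNIT = 2  # the most common class width, 0.02 cm


def rectangle_heights(bounds, freqs, unit):
    """Height of each rectangle so that height x (width in units) equals the frequency."""
    widths = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
    return [f / (w / unit) for f, w in zip(freqs, widths)]


heights = rectangle_heights(boundaries, percentages, UNIT)
for lo, h in zip(boundaries, heights):
    print(f"{lo / 100:.2f}-  height {h:g}")
```

The 0.04 cm classes are two units wide, so their frequencies are divided by 2; the 0.06 cm class is three units wide, so its frequency is divided by 3.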
Figure 1-7: Histogram (horizontal axis: Diameter (centimetres))
Exercises 1b
1 State which of the following are discrete variables (D) and which are continuous
variables (C).
a The ages of the students in your class.
b The number of goals scored by a soccer team.
c The lengths of the lives of electric light globes.
d The number of accidents in a factory per month.
e The number of errors per page in a book.
f The speed of a car in km/h.
g The diameter of mass-produced metal rods.
2 The following numbers represent the heights, in cm, of 50 students.
152, 160, 168, 163, 170, 173, 151, 162, 166, 174, 165, 155, 166, 170, 169, 179,
165, 166, 176, 167, 167, 172, 169, 162, 156, 169, 169, 163, 166, 168, 160, 165,
171, 161, 167, 165, 157, 168, 175, 155, 171, 159, 158, 172, 163, 182, 162, 167,
168, 164
a Construct a frequency table using 5 cm as the class interval.
b Represent the data by means of:
(i) a histogram (ii) a frequency polygon.
6
Length of life (hours)   Frequency   Percentage frequency   Relative frequency
300-                     28
400-                     60
500-                     72
600-                     92
700-                     76
800-                     50
900-<1000                22
The frequency table above gives the lengths of the lives of 400 electric light bulbs tested
in a factory.
Answer the following questions:
a Complete the percentage frequency column and the relative frequency column.
b What percentage of bulbs lasted for at least 700 hours?
8 Draw histograms for the population of Victoria in 1959 and 1982, as tabulated below,
and make some relevant comments. Populations are given to the nearest thousand. (Set
out your histograms horizontally as in Question 7.)
9 Use a histogram and a frequency polygon to represent the following data relating to
the civilian labour force, by age, in Victoria in 1987. The number of people is expressed
to the nearest thousand in this table:
Age group        15-17 18-19 20-24 25-34 35-44 45-54 55-59 60-64 ≥65
Persons ('000)   90    111   302   555   494   309   110   57    27
The mode
The mode of a distribution is the most frequent or most popular value of the variable.
For the observations 6, 7, 7, 5, 8, 6, 7, 9, 7, 4, 7, the mode is 7 because this number occurs
more frequently than any other number.
In Table 1.1, the mode of the distribution lies approximately in the middle of the class
interval 165-, i.e. at the value 167.5 cm. The 165- class is called the modal class.
In Table 1.3, the mode is 4.97 cm approximately, and the 4.96- class is the modal class.
Some distributions may have more than one mode. If their histograms have two well defined
humps, the distribution is said to be bimodal. If the histogram has one hump only, as in
Figures 1-6 and 1-7, the distribution is unimodal.
The median
Discrete data
The median of a set of observations is the middle number when the numbers are arranged
in order of magnitude, or is halfway between the two middle numbers if there is an even
number of observations.
Example 4
Find the median of
a 6, 7, 7, 5, 8, 6, 7, 9, 7, 4, 7
b 8, 4, 10, 2, 6, 9, 8, 5
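The two cases in the definition (an odd and an even number of observations) can be checked with Python's standard `statistics` module. This is a sketch for verification only, not a substitute for arranging the numbers by hand; `multimode` requires Python 3.8 or later.

```python
from statistics import median, multimode

data_a = [6, 7, 7, 5, 8, 6, 7, 9, 7, 4, 7]   # 11 observations: the middle one is the median
data_b = [8, 4, 10, 2, 6, 9, 8, 5]           # 8 observations: halfway between the two middle ones

for name, data in (("a", data_a), ("b", data_b)):
    print(name, sorted(data), "median:", median(data), "mode(s):", multimode(data))
```

For set b the sorted values are 2, 4, 5, 6, 8, 8, 9, 10, so the median lies halfway between 6 and 8.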
Continuous data
The frequency distribution in Table 1.1 can be converted to a cumulative frequency
distribution by adding each frequency to the total of its predecessors.
Table 1.4: Cumulative frequency distribution
Height in cm   Cumulative frequency
< 150          0
< 155          2
< 160          10
< 165          27
< 170          55
< 175          69
< 180          75
< 185          80
Table 1.4 shows, for example, that: 55 out of the 80 students have heights less than 170 cm;
11 have heights equal to or greater than 175 cm; and so on.
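Adding each frequency to the total of its predecessors is a running sum. A minimal Python sketch using `itertools.accumulate` on the frequencies of Table 1.1 (variable names are illustrative):

```python
from itertools import accumulate

upper_bounds = [155, 160, 165, 170, 175, 180, 185]
frequencies = [2, 8, 17, 28, 14, 6, 5]  # from Table 1.1

cumulative = list(accumulate(frequencies))
print("< 150", 0)
for bound, running_total in zip(upper_bounds, cumulative):
    print(f"< {bound}", running_total)

# Students with heights equal to or greater than 175 cm.
at_least_175 = cumulative[-1] - cumulative[upper_bounds.index(175)]
```

Subtracting a cumulative frequency from the total gives the number of observations at or above that boundary, which is how the figure of 11 students is obtained.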
Figure 1-9: Cumulative frequency curve (horizontal axis: Height (cm), 150 to 185; the value 167.5 is marked at a cumulative frequency of 40)
The 0.5 quantile, for example, is the value of the variable below which ½ of the distribution
falls.
The median can be found from the cumulative frequency curve. Its value is approximately
167.5 cm. So one half of the 80 students have heights of less than 167.5 cm.
Quantiles, when expressed as a percentage, are called percentiles. The 0.8 quantile is the
80th percentile and, from the cumulative curve, its value is 173 cm. So 80 per cent of the
students have heights less than 173 cm. It is advisable to draw the cumulative curve on graph
paper so that the quantiles may be read with reasonable accuracy. The 25th percentile or
0.25 quantile is called the lower quartile, below which ¼ of the observations lie. The 75th
percentile or 0.75 quantile is called the upper quartile, below which ¾ of the observations
lie.
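Reading a quantile from the cumulative curve can be imitated numerically. The sketch below, in Python, joins the plotted points of Table 1.4 with straight lines; this linear interpolation is an assumption made here, whereas the text reads values from a smooth hand-drawn curve, so the results agree only approximately (about 167.3 for the median and 173.2 for the 80th percentile). The function name `quantile` is illustrative.

```python
boundaries = [150, 155, 160, 165, 170, 175, 180, 185]
cumulative = [0, 2, 10, 27, 55, 69, 75, 80]  # from Table 1.4


def quantile(q, bounds, cum):
    """Estimate the q quantile by straight-line interpolation between plotted points."""
    target = q * cum[-1]
    for i in range(1, len(cum)):
        if cum[i] >= target:
            lo, hi = bounds[i - 1], bounds[i]
            below, inside = cum[i - 1], cum[i] - cum[i - 1]
            return lo + (hi - lo) * (target - below) / inside
    raise ValueError("q must lie between 0 and 1")


for q in (0.25, 0.5, 0.75, 0.8):
    print(q, round(quantile(q, boundaries, cumulative), 1))
```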
Box plots
A useful method of illustrating the range, the median and the upper and lower quartiles
is by means of a box plot, sometimes called a box-and-whisker diagram.
In Example 4a, the 11 observations range in value from 4 to 9. The median, m, is 7, below
which there are five observations, namely:
4 5 6 6 7
The middle of this set is 6. This is the lower quartile, L. The upper quartile, U, is the middle
of the upper five observations:
7 7 7 8 9
The upper quartile is, therefore, 7. The interquartile range is from 6 to 7.
In Example 4b the eight observations range in value from 2 to 10. The median is 7, below
which there are four observations: 2, 4, 5 and 6. The middle of this set is 4.5. This is the
lower quartile, L. The upper quartile, U, is the middle of the upper four observations: 8,
8, 9 and 10. The upper quartile is 8.5.
The interquartile range is from 4.5 to 8.5. In a box plot, the interquartile range is boxed
as shown in Figure 1-10. The lines drawn from the box to the extreme values are the
whiskers.
Figure 1-10: Box plots a and b drawn on a common scale from 2 to 10, with L, M and U marked on each
The two distributions can be compared and contrasted by drawing their box plots together
and noting the comparative lengths of the boxes and the whiskers. What conclusions can
you draw from box plots a and b in Figure 1-10?
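The values needed to draw each box plot can be computed directly. The Python sketch below follows the method used for Examples 4a and 4b, taking each quartile as the middle of the lower or upper half of the ordered data and leaving the median itself out of both halves when the number of observations is odd. Other quartile conventions exist and may give slightly different values; the function name `five_number_summary` is chosen here for illustration.

```python
from statistics import median


def five_number_summary(data):
    """Return (least value, L, M, U, greatest value).

    L and U are the medians of the lower and upper halves of the ordered data;
    the middle observation is excluded from both halves when n is odd.
    """
    values = sorted(data)
    half = len(values) // 2
    lower, upper = values[:half], values[-half:]
    return values[0], median(lower), median(values), median(upper), values[-1]


summary_a = five_number_summary([6, 7, 7, 5, 8, 6, 7, 9, 7, 4, 7])   # Example 4a
summary_b = five_number_summary([8, 4, 10, 2, 6, 9, 8, 5])           # Example 4b
print(summary_a)
print(summary_b)
```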
In Example 3, the heights, to the nearest centimetre, of 80 VCE students were given. Their
heights ranged from 153 cm to 184 cm, as shown in the stem and leaf method of arranging
the data in ascending order. The lower and upper quartiles are 162 and 171 respectively,
and the median is 167. Check these from the cumulative frequency curve (Figure 1-9) or
from the stem and leaf presentation of the data. The box plot is shown in Figure 1-11.
Figure 1-11: Box plot of the heights, with L, M and U marked
The whiskers appear to be long compared with the length of the box. What conclusions can
be drawn?
Arithmetic mean
The mode and quantiles are typical values of a distribution. Some of these typical values,
e.g. the mode and the median, are measures of central tendency. Another measure of central
tendency is the arithmetic mean.
The arithmetic mean, or simply the mean, is the average of a set of observations.
Ungrouped data
The mean of a set of n observations $x_1, x_2, \ldots, x_n$ is denoted by $\bar{x}$ (read 'x bar') and is
defined by:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x}{n}$$
Example 5
The mean of the numbers 3, 4, 8, 9, 11 is:
$$\bar{x} = \frac{3 + 4 + 8 + 9 + 11}{5} = \frac{35}{5} = 7$$
Grouped data
If the numbers $x_1, x_2, x_3, \ldots, x_k$ occur $f_1, f_2, f_3, \ldots, f_k$ times respectively, the arithmetic
mean is:
$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + x_3 f_3 + \cdots + x_k f_k}{f_1 + f_2 + f_3 + \cdots + f_k} = \frac{\sum xf}{\sum f} = \frac{\sum xf}{n}, \quad \text{where } n = \sum f$$
Example 6
Calculate the mean height of the 80 students in Table 1.1.
Since two students have heights in the range from 150 cm up to, but less than,
155 cm, we take the middle of this class range, 152.5, as representing the average
height of the two students, and so on for the others, as shown in Table 1.5 below.
Table 1.5
Height in cm, x   Number of students, f   xf
152.5             2                       305
157.5             8                       1260
162.5             17                      2762.5
167.5             28                      4690
172.5             14                      2415
177.5             6                       1065
182.5             5                       912.5
Total             80                      13 410
$$\bar{x} = \frac{\sum xf}{\sum f} = \frac{13\,410}{80} \approx 167.6 \text{ (cm)}$$
It is obvious from the symmetry of this distribution that the mean is around 167.5,
in which case, the arithmetical calculation can be simplified by making 167.5 the
origin and creating a new variable, V. The relation between x and V is given by
the equation x = a + kV, where a is the origin, 167.5, and k the class interval, 5.
Table 1.6
V       f     Vf
-3      2     -6
-2      8     -16
-1      17    -17
0       28    0
1       14    14
2       6     12
3       5     15
Total   80    2
$$\bar{x} = a + k\bar{V} = 167.5 + 5 \times \frac{2}{80} = 167.5 + 0.125 \approx 167.6 \text{ (cm)}$$
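Both routes to the grouped mean can be checked against each other. The following Python sketch computes the mean directly from the class midpoints and again through the coded variable V with origin a = 167.5 and class interval k = 5; the variable names are illustrative.

```python
midpoints = [152.5, 157.5, 162.5, 167.5, 172.5, 177.5, 182.5]
frequencies = [2, 8, 17, 28, 14, 6, 5]
n = sum(frequencies)

# Direct method: mean = sum(x * f) / sum(f).
sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))
direct_mean = sum_xf / n

# Coded method: V = (x - a) / k, then mean of x = a + k * (mean of V).
a, k = 167.5, 5
coded = [round((x - a) / k) for x in midpoints]          # -3, -2, ..., 3
sum_vf = sum(v * f for v, f in zip(coded, frequencies))  # sum of V * f
coded_mean = a + k * sum_vf / n

print(direct_mean, coded_mean)
```

The two results are identical (167.625 cm before rounding), which is the point of the coding: it changes the arithmetic, not the answer.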
The mean, mode and median are the most commonly used statistics to denote a
numerical characteristic of a set of observations and, although in many cases they
are numerically very close, they measure different characteristics of the set of
observations.
Example 7
The marks out of 10 gained by the two top students, Gwen and Nick, for a series of ten
maths tests throughout the year are as follows:
Test 1 2 3 4 5 6 7 8 9 10 Total
Gwen 9 10 9 10 7 10 9 10 2 10 86
Nick 9 9 10 8 8 9 8 7 9 10 87
Who should receive the maths prize for the best maths student of the class?
Nick argues that he should receive the prize because his total score, and therefore his mean
score, for the ten tests is higher than Gwen's.
Gwen argues that she should receive the prize because both her mode score and her median
score are higher. Furthermore her score is equal to or greater than Nick's in seven out of
the ten tests - equal in two tests and greater in five tests. Why should she lose the prize
because of a mark of 2 in the ninth test?
The mean is the most commonly used measure of central tendency because it takes all of
the observations into account. However, it can be seriously influenced by any extreme values
such as Gwen's score of 2 in the ninth test. The mode and median are not altered by any
extreme values and in some situations are better measures of central tendency. A
manufacturer, for example, would consider the mode to be the most important. Why?
In a survey of incomes, for example, the median income would give a clearer indication of
the situation than the mean because of the few people who would have very high incomes
compared with the majority. In situations like this, it is a common practice to eliminate the
upper and lower quarter of the distribution and calculate the mean for the inter-quartile
range.
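The competing claims in Example 7 can be laid side by side with a few lines of Python. This is a sketch using the standard `statistics` module; the helper `interquartile_mean`, which drops the lowest and highest quarter of the marks (rounded down to whole observations) before averaging, is one possible reading of the practice described above and is an assumption, not a method given in the text.

```python
from statistics import mean, median, mode

gwen = [9, 10, 9, 10, 7, 10, 9, 10, 2, 10]
nick = [9, 9, 10, 8, 8, 9, 8, 7, 9, 10]

for name, marks in (("Gwen", gwen), ("Nick", nick)):
    print(name, "mean", mean(marks), "median", median(marks), "mode", mode(marks))


def interquartile_mean(marks):
    """Mean of the marks left after removing the lowest and highest quarter."""
    values = sorted(marks)
    cut = len(values) // 4
    return mean(values[cut:len(values) - cut])


print("Gwen", interquartile_mean(gwen), "Nick", interquartile_mean(nick))
```

Nick leads on the mean, while Gwen leads on the median and the mode; once the extreme marks are removed, Gwen's mean is also the higher of the two.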
both have the same mean, 9, but the second set has more spread. It is important, then, to
have some method (or methods) of measuring spread or dispersion.
1 The range
The range is the difference between the greatest and least values in a distribution. It has a
limited usefulness, since it takes into account only the two extreme observations and ignores
a possible concentration of values around a typical value.
2 The inter-quartile range
The inter-quartile range is the difference between the upper and lower quartiles. It is used
to indicate the spread of the middle half of the observations and is useful in many situations
in which we wish to ignore extreme values.
3 Variance and standard deviation
The variance and standard deviation are the measures of dispersion most frequently used.
(i) Ungrouped data
The variance of a set of n observations $x_1, x_2, x_3, \ldots, x_n$ is denoted by $s^2$ and is defined by:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
The standard deviation is the positive square root of the variance and is defined by:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
where $\sum (x - \bar{x})^2$ represents the sum of the squares of the deviations of the n observations
from the mean, $\bar{x}$.
(ii) Grouped data
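If the numbers $x_1, x_2, x_3, \ldots, x_k$ occur with frequencies $f_1, f_2, f_3, \ldots, f_k$, the variance
is defined by:
$$s^2 = \frac{\sum (x - \bar{x})^2 f}{n - 1}, \quad \text{where } n = \sum f$$
and the standard deviation by:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$$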
Example 8
Calculate the standard deviation of the observations 1, 2, 6, 7, 9.
  x      x - x̄    (x - x̄)²
  1       -4        16
  2       -3         9
  6        1         1
  7        2         4
  9        4        16
Total 25            46

Mean, $\bar{x} = \frac{25}{5} = 5$
The standard deviation, s, is given by:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{46}{4}} = \sqrt{11.5} = 3.391$$
The variance, $s^2 = 11.5$
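The arithmetic of Example 8 can be checked with a short sketch. The function name sample_variance is hypothetical; the divisor n - 1 follows the definition used in this section, and the result is compared with the standard library's statistics.stdev, which uses the same divisor.

```python
from math import sqrt
from statistics import stdev


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


observations = [1, 2, 6, 7, 9]
s2 = sample_variance(observations)
s = sqrt(s2)

print(s2)                   # variance
print(round(s, 3))          # standard deviation to three decimal places
print(stdev(observations))  # library value for comparison
```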
Example 9
Calculate the standard deviation of the following frequency distribution.
x     0    1    2    3    4    5
f    12   29   26   18   10    5

  x      f     xf    x - x̄   (x - x̄)²f
  0     12      0     -2        48
  1     29     29     -1        29
  2     26     52      0         0
  3     18     54      1        18
  4     10     40      2        40
  5      5     25      3        45
Total  100    200              180

$$\bar{x} = \frac{\sum xf}{n} = \frac{200}{100} = 2$$
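The standard deviation, s, is given by:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{180}{99}} = \sqrt{1.818} = 1.348$$
The variance, $s^2 = 1.818$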
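The grouped-data calculation of Example 9 can be sketched in the same way. The function name grouped_mean_and_variance is hypothetical; it weights each squared deviation by its frequency and divides by n - 1, where n is the total frequency.

```python
from math import sqrt


def grouped_mean_and_variance(values, frequencies):
    """Mean and variance (divisor n - 1) of a frequency distribution."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(values, frequencies)) / n
    squared_deviations = sum(
        (x - mean) ** 2 * f for x, f in zip(values, frequencies)
    )
    return mean, squared_deviations / (n - 1)


x_values = [0, 1, 2, 3, 4, 5]
f_values = [12, 29, 26, 18, 10, 5]

x_bar, s2 = grouped_mean_and_variance(x_values, f_values)
print(x_bar)                # mean
print(round(s2, 3))         # variance
print(round(sqrt(s2), 3))   # standard deviation
```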
Note:
The divisor in the formula for the standard deviation is n - 1 and not n. The reason for
this could, perhaps, be given at this stage by stating that the first observation clearly tells
us nothing about the variability in the sample, so that only n - 1 of the n observations
are available for estimation of this variability. Furthermore, only n - 1 of the n deviations
from the mean are independent.
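One way to see the effect of the divisor is a small enumeration, offered here as an illustration rather than a proof and not taken from the text. It assumes a made-up population {1, 2, 3} and lists every ordered sample of size 2 drawn with replacement. Averaged over all such samples, the estimate with divisor n - 1 equals the population variance, while the estimate with divisor n falls short. The function name average_estimate is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Made-up population used only for this illustration.
population = [1, 2, 3]
mu = Fraction(sum(population), len(population))
population_variance = sum((x - mu) ** 2 for x in population) / len(population)


def average_estimate(divisor_offset, sample_size=2):
    """Average variance estimate over all ordered samples (with replacement).

    divisor_offset = 1 gives the divisor n - 1; divisor_offset = 0 gives n.
    """
    estimates = []
    for sample in product(population, repeat=sample_size):
        sample_mean = Fraction(sum(sample), sample_size)
        squared_deviations = sum((x - sample_mean) ** 2 for x in sample)
        estimates.append(squared_deviations / (sample_size - divisor_offset))
    return sum(estimates) / len(estimates)


print(population_variance)   # variance of the whole population
print(average_estimate(1))   # divisor n - 1
print(average_estimate(0))   # divisor n
```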
Many distributions are approximately of the normal type, with a fairly well defined
symmetrical tendency and with:
(i) practically all of the observations in the range $\bar{x} \pm 3s$
(ii) about 95 per cent of the observations in the range $\bar{x} \pm 2s$
(iii) about two-thirds of the observations in the range $\bar{x} \pm s$.
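For an exactly normal distribution these proportions can be computed directly. The sketch below uses the standard library's statistics.NormalDist (Python 3.8 or later); the function name proportion_within is hypothetical. Real data that are only approximately normal will match these figures only roughly.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

The proportion within one standard deviation is roughly 68 per cent (a little over two-thirds), within two about 95 per cent, and within three more than 99 per cent.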
Exercises 1c
1 A survey of the number of children per family of 20 families in a particular area
produced the following results:
0, 4, 1, 0, 2, 5, 3, 3, 2, 1, 4, 6, 2, 3, 1, 3, 4, 3, 1, 3
a Calculate the mean, mode and median.
b Draw a box plot of the data.
2 A proofreader recorded the number of errors per page in a 40-page document as
follows:
Number of errors 0 1 2 3 4 5
Weekly rent($) 60- 65- 70- 75- 80- 85- 90- 95-100
Frequency 6 8 9 20 30 15 8 4
4 The following table shows the age distribution of 60 female workers in a clothing
factory.
Number of hours lost per accident   0-   5-   10-   15-   20-   25-   30-35
Number of accidents                 17   25   38    20    9     7     4
Number of cartons 57 30 10 2 1
Calculate:
a the mean number defective per carton
b the mode
c the median
7 The table below shows the percentage distribution of deaths from scarlet fever among
the various age groups:
Age in years 0- 1- 2- 3- 4- 5-
Percentage of deaths 6 14 17 20 12 8
Percentage of deaths 7 7 2 2 4 1
8 The following table gives the distribution of the diameters of a large number of mass
produced wheels.
Percentage of wheels 2 6 9 20
Percentage of wheels 31 20 8 4
Number of candidates   5     30    60    105   130
Number of candidates   100   75    50    30    15
Construct a cumulative percentage frequency curve and answer the following questions:
a What is the interquartile range?
b If the pass mark is 45, what percentage of the candidates passed?
c If honour passes were given to the top 20 per cent of candidates, what would be the
lowest mark required to obtain an honour?
10 For the following set of observations, calculate:
a the mean
b the standard deviation:
9, 6, 8, 6, 7, 7, 6, 4, 7, 7, 8, 9.
11 Calculate the standard deviation of these observations:
45, 40, 42, 40, 38
12 In testing two modifications of an existing eyepiece design in a microscope, an observer
took 10 readings of a fixed length with each eyepiece. The results were as follows:
Design A: 295, 278, 289, 304, 293, 307, 293, 290, 296, 300.
Design B: 276, 266, 273, 286, 276, 268, 238, 252, 290, 242.
It is required to know whether:
a readings with one design are more consistent than with the other.
b one eyepiece produces readings that are generally lower than those obtained by using
the other.
Calculate the mean and the standard deviation for each set of readings and give your
opinion of considerations a and b above.
13 The egg production from two pens of fowls, taken from a total hatching of 1000 fowls,
was recorded over a period of 90 days. The first pen contained 50 birds and produced
an average of 36.4 eggs per day. The birds in the second pen, which numbered 80,
produced an average of 69.3 eggs each in the period. Estimate the total production from
the 1000 birds over the period.
Under what conditions is this estimate justified?
14 Each of 26 students in a class measured the lengths of the six sides of a regular
hexagon, to the nearest 0.05 cm. The results were as follows:
Length      Frequency
3.35 cm     5
3.40 cm     30
3.45 cm     61
3.50 cm     43
3.55 cm     17
a Calculate the mean length, x, and the standard deviation, s, both to the nearest
0.01 cm.
b Plot the histogram for the distribution and mark on the horizontal axis the points
x, x ± s, x ± 2s.
c If a student had obtained a reading of 3.60 cm for the length of one side, how would
you judge this measurement? Give reasons for your answer.
15 Three similar school classes had 20, 30 and 40 pupils respectively and their respective
class pass rates in an examination were 80 per cent, 70 per cent and 60 per cent. What
is the mean percentage pass rate for the 90 students?
16 In a limited-over cricket match (limit of 50 overs) between Australia and Pakistan,
Australia finished its innings with an average run rate of 4.28 runs per over after batting
for 50 overs. Pakistan had an average run rate of 3.8 runs per over after the first 25
overs. What minimum run rate per over did Pakistan require for the next 25 overs to
win the game?
17 A biologist has 21 animals with weights having the frequency distribution shown in
the table below. To secure comparability with another group of animals in the
experiment, the biologist wishes to discard one of these animals in such a way that the
mean weight of the rest is 12 units.
Unit of weight    Number of animals
10                3
11                4
12                7
13                5
14                2
Total             21
a From which class should the animal be discarded? Justify your answer.
b Calculate the standard deviation of weight for the remaining sample of 20 animals.
a Construct a cumulative percentage frequency distribution for each route and draw
the cumulative curve.
b On what proportion of occasions would a driver using route A reach his/her
destination no later than 11:06 a.m. if he/she starts at 9 a.m.?
c At what time would a driver using route B need to leave to reach his/her destination
no later than 11:06 a.m. on the same proportion of occasions as a driver using route A?
19 In a survey of milk consumption in households, interviewers called at a representative
sample of 400 households. In 320 of these they found someone at home, and obtained
an average consumption of 1.6 litres per day for the 320 households. They called back
at a random sample of 20 of the remaining 80, and found that the combined total daily
consumption in this sample was 21.5 litres. Estimate the average milk consumption in
households. Justify your estimate.
20 Sand particles are graded for size according to diameter, by passing large quantities
through sieves, each of which has uniform circular holes. The proportions of the
particles remaining in sieves of various hole sizes are given in the following table:
0.125 0.960
0.250 0.900
0.500 0.700
1.000 0.290
2.000 0.070
4.000 0.000
a Use the data above to plot points on the cumulative proportion graph of the
distribution of particle diameters. Draw a smooth curve through these points.
b Use your graph to help you estimate the proportions of particles with diameters in
the intervals (0-0.5), (0.5-1.0), ..., (3.5-4.0) (mm) and present your results
graphically.
c Estimate the mean and the mode of the distribution.