Statistics Notes
CONDUCTING A SURVEY
Surveys can take place in the street, in homes, in workplaces or at school. Surveys for market research often take
place near shopping complexes. A survey can involve an interviewer asking questions and noting the responses. A
set of questions used in a survey is called a questionnaire.
QUESTIONNAIRES
Open questions allow the interviewee to respond in any way they like. Open questions may begin with
phrases such as
Responses to open questions are difficult to compile and analyze. They often need to be interpreted and
therefore can be misunderstood.
Closed questions allow the interviewee to respond in a limited number of ways. Answers are restricted
and some interviewees may feel none of the available answers is suitable.
Responses to closed questions can be assigned scores to make compiling and analysis easier, e.g. yes = 10,
no = 1; agree strongly = 10, agree = 5, disagree = 2.
TYPES OF A VARIABLE
A variable is something that can change or take different values. A variable can be categorized into one
of the four types –
Quantitative
Qualitative
Discrete
Continuous
QUANTITATIVE AND QUALITATIVE VARIABLES
QUANTITATIVE variables are measured, counted or observed. They have numerical value and can be
ranked or ordered from small to large or vice versa.
EXAMPLES
QUALITATIVE variables have qualities or characteristics, and are described in words rather than
numbers. Generally they can only be ranked alphabetically.
EXAMPLES
DISCRETE variables are those that can take only certain or specific values within a given range, and the
number of possible values is countable. There is always a gap between one value and the next value.
EXAMPLES
DISCRETE QUANTITATIVE variables can take values which are not whole numbers or integers. For
example, shoe sizes can be 4.5 or 7.5. The values may appear at regular or irregular intervals within a given
range.
Within a given range, the following DISCRETE AND QUALITATIVE variables have a countable number of
possible qualities or characteristics
CONTINUOUS VARIABLES are those that can take any value within a given range or ranges. Values of
CONTINUOUS QUANTITATIVE VARIABLES are usually measured and the number of possible values
within a range is uncountable. Because measurement is required, values of continuous variables are
given to a specific degree of accuracy. It is therefore not possible to give exact values for a continuous
variable.
EXAMPLES
the heights of people who, to the nearest metre, are 2 metres tall: any value with 1.5 ≤ height < 2.5
the ages of trees which, to the nearest 10 years, are 30 years old: any value with 25 ≤ age < 35
the masses of objects which, correct to 1 decimal place, are 7.3 kg: any value with 7.25 ≤ mass < 7.35
Within a given range, CONTINUOUS QUALITATIVE VARIABLES have an
uncountable number of possible characteristics or qualities.
CLASS EXERCISE
1. Eight variables are listed: shape, height, price, altitude, volume, attitude, duration, number of cars. i) Which
of these variables are qualitative? ii) Which of the quantitative variables are continuous?
2. Four variables W, X, Y and Z are defined: W = {the number of lions seen on game drives in Chobe} X = {all
integers between 3 and 9, inclusive} Y = {any number such that 3 < Y < 9} Z = {all possible shoe sizes}
State whether each of the variables is discrete or continuous.
3. State which three of the following are not discrete quantitative variables and in each case give a reason.
A: the types of food eaten by Rre Mogapi’s cat.
B: the number of white cars purchased each month in Gaborone.
C: the number of eggs in a box containing one dozen eggs.
D: the mass, measured to the nearest kilogram, of the customers entering a supermarket.
E: the heights of the trees in a forest.
4. A student was asked to name a discrete quantitative variable and answered “My score in the
Mathematics test that we wrote yesterday”.
Explain why his answer is not correct and suggest how he could correct it by changing just a few words.
5. When examination results are announced, candidates are awarded either a qualitative variable or a
Describe these two variables and explain how they are related.
GROUPED DISCRETE DATA
Grouping data is convenient if a variable has many values or if the total frequency is large. Grouping causes some
information about the variable to be lost, as values are no longer shown individually. There are a number of reasons
why data has to be grouped
Grouped discrete data can be tabulated as a frequency distribution or displayed in a variety of diagrams.
A frequency distribution table consists of CLASSES of grouped values of a variable. The number of values in each
class is shown as a frequency.
CLASS MEASURES
A class is a group of values. For classes of discrete data a gap will always appear between one class and the next
class. Each class has the following measures associated with it.
CLASS LIMITS
The lower limit and upper limit are the lowest and highest actual values that can exist in a class.
CLASS BOUNDARIES
These are the two values in the middle of the gaps between a class and those on either side of it. They are found
halfway between the upper limit of one class and the lower limit of the next class.
CLASS INTERVAL
The class interval is the difference between the upper boundary and the lower boundary of a class.
CLASS MID VALUE
The class mid value is the value that is halfway between the limits of a class, and for grouped discrete data is the
same as halfway between the class boundaries.
EXAMPLES
10,11,1,4,13,9,11,8,14,8,1,11,9,6,14,12,3,6,9,13,12,5,0,7,10,4,6,9,13,9,10,2,11,3,12,13,9,14,13,7
Class                    0 - 2   3 - 5   6 - 8   9 - 11   12 - 14
Number of chickens (f)   4       5       7       13       11
Note the data is discrete and there are no gaps between the classes.
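The grouping in this example can be checked with a short Python sketch. The classes 0 - 2, 3 - 5, 6 - 8, 9 - 11 and 12 - 14 are an assumption (the class row is not legible in the notes), chosen because they reproduce the frequencies 4, 5, 7, 13 and 11; the function name is hypothetical.

```python
# Raw data from the example above (40 values).
data = [10, 11, 1, 4, 13, 9, 11, 8, 14, 8, 1, 11, 9, 6, 14, 12, 3, 6, 9, 13,
        12, 5, 0, 7, 10, 4, 6, 9, 13, 9, 10, 2, 11, 3, 12, 13, 9, 14, 13, 7]

# Assumed classes as (lower limit, upper limit); both limits are inclusive.
classes = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14)]


def frequency_table(values, class_limits):
    """Count how many values fall inside each class."""
    return [sum(lo <= v <= hi for v in values) for lo, hi in class_limits]


frequencies = frequency_table(data, classes)
for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo:>2} - {hi:<2}  f = {f}")
```

Once the data is grouped, only the class counts remain, which is the loss of individual values described earlier.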
Discrete data also become grouped when values are rounded off to a certain degree of accuracy.
Suppose you counted the number of chips in each packet and the numbers were
15,18,20,22,25,27,31,32,33,and 34 and if these values are rounded to the nearest 10 the data can be grouped
into classes 4 packets of 20 and 6 packets of 30
Actual number of chips    15 - 24    25 - 34
Number of packets         4          6
Lower class limit         15         25
Upper class limit         24         34
Lower class boundary      14.5       24.5
Upper class boundary      24.5       34.5
Class interval            10         10
Class mid value           19.5       29.5
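The class measures listed for the chips example can be reproduced with a small Python sketch. It assumes discrete data counted in whole numbers, so the gap between one class and the next is 1; the function name and the `gap` parameter are illustrative only.

```python
def class_measures(lower_limit, upper_limit, gap=1):
    """Return boundaries, interval and mid value for one class of discrete data.

    gap is the distance between the upper limit of one class and the
    lower limit of the next (1 for whole-number counts).
    """
    lower_boundary = lower_limit - gap / 2
    upper_boundary = upper_limit + gap / 2
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "interval": upper_boundary - lower_boundary,
        "mid_value": (lower_limit + upper_limit) / 2,
    }


chips_15_24 = class_measures(15, 24)
chips_25_34 = class_measures(25, 34)
print(chips_15_24)
print(chips_25_34)
```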
1. The table below shows badminton games won by 90 players last year.
A) Why is it not possible to find the number of players who won 10 games last year
B) State the mid value of the class with the second largest number of players in it
C) Write the class intervals of all the classes.
2. The table below shows the number of people who entered 25 different banks between 8 am and 10 pm
corrected to the nearest 10.
i) What is the smallest possible number of people who entered the bank any one of these days
ii) What is the largest possible number of people who entered the bank any one of these days
iii) Write down the upper class boundary of the class containing the largest number of people
iv) Calculate the largest possible and smallest possible number of people that could have entered
the bank between 9 am and 10 pm.
3. Correct to the nearest 100, the number of pupils attending 10 different schools is given below.
STEM-AND-LEAF DIAGRAMS
Stem-and-leaf diagrams are used to illustrate discrete data in equal interval classes. They are used for displaying small to medium
amounts of data, and they always allow us to see the original (raw) data after grouping. There are no restrictions
on the class intervals used but they must be equal. A useful feature is that two sets of related data can be shown
back to back for making comparisons.
The diagrams are similar to bar charts, only the bars are made from the final digits of each piece of quantitative data,
called the leaves, and the stem is made from the remaining digits, which can be in ascending or descending order. A key
must be included to explain what the values in the diagram represent.
EXAMPLE
43,31,23,37,58,61,72,70,77,68,82,39,67,53,61,55,45,59,91,52,83,27,61,45,30,64,46,59,62,41
The scores are classified and ranked in equal interval groups, 20 – 29, 30 – 39, 40 – 49 and so on. The tens digit
appears as the stem and the leaves are the units.
Key: 2 | 3 represents a score of 23% for a male
Note that the leaves are aligned vertically so as to produce a bar-like shape. If we now have marks for 30 female
students as
73,51,86,40, 61,64,75,73,80,71,85,86,70,79,64,58,48,62,94,55,83,30,88,58,82,78,97,62,65,74
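A stem-and-leaf diagram for the first list of 30 scores can be sketched in Python as below. This is only an illustration of the idea (tens digit as stem, units digit as leaf) with a hypothetical function name; the layout of the printed diagram in the original notes may differ.

```python
from collections import defaultdict

# The first set of 30 scores from the example above.
male_scores = [43, 31, 23, 37, 58, 61, 72, 70, 77, 68, 82, 39, 67, 53, 61,
               55, 45, 59, 91, 52, 83, 27, 61, 45, 30, 64, 46, 59, 62, 41]


def stem_and_leaf(values):
    """Map each tens digit (stem) to its sorted units digits (leaves)."""
    diagram = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        diagram[stem].append(leaf)
    return dict(diagram)


male_diagram = stem_and_leaf(male_scores)
for stem in sorted(male_diagram):
    print(stem, "|", " ".join(str(leaf) for leaf in male_diagram[stem]))
print("Key: 2 | 3 represents a score of 23%")
```

Because every leaf is kept, the raw scores can still be read back from the diagram.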
DATA REPRESENTATION
When displaying data it is very important that the presentation should be clear and easy to understand.
Consideration should be given to who the data is being presented to and what purpose it serves.
One method of displaying data is to use tables. The columns and rows should have headings and totals should
be included when appropriate.
EXAMPLE
EXERCISE
1. The table below gives information about 100 students who wrote a physics examination.
a) the percentage pass-rate for boys, b) the percentage pass-rate for girls v) Which group performed
better, boys or girls?
2. A woman sells red and yellow flowers at a market; she has small, medium and large flowers of each
colour. There are 120 flowers altogether and 75 of them are red. She has 20 small red flowers, 14
medium yellow flowers and 48 large flowers. Large red and large yellow flowers are in the ratio 2:1.
3. Joan wants to tabulate data showing the number of students who passed or failed each of the three final
papers in Geography at her school in 2002 and in 2003. There were 322 students who sat for each of the three
papers in 2002; this was 23 less than in 2003. The numbers passing papers 1, 2 and 3 in 2002 were 310, 303 and
305, respectively. Fifteen more students failed paper 2 in 2003 than in 2002; equal numbers passed paper 1 in
both years and three times as many students failed paper 3 in 2003 than in 2002, which was 23 less than in
2003.
ii) What is the greatest possible number of students that failed all three papers in 2003?
PIE CHARTS
A pie chart shows how a whole of something is divided up, and this is done by dividing a circle into
sectors.
Key features
Each sector should be labeled with the name of the part it represents
Sector angles must be in the same ratio as the parts they represent
Two or more sets of related data can be shown or compared using comparative pie charts. The areas of the two charts
must be proportional to the totals they represent. Since the area of a circle is proportional to the square of its radius, for charts of radii r and R representing totals t and T:
$$r^2 : R^2 = t : T$$
therefore
$$T r^2 = t R^2$$
and
$$R = \sqrt{\frac{T r^2}{t}}$$
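The radius relation can be turned into a short Python sketch. The function name and the numbers are made up for illustration: a chart of radius 5 cm for a total of 100, compared with a total of 400.

```python
import math


def comparative_radius(r, t, T):
    """Radius R of the chart for total T, given radius r for total t.

    Uses r^2 : R^2 = t : T, which rearranges to R = r * sqrt(T / t).
    """
    return r * math.sqrt(T / t)


R = comparative_radius(r=5, t=100, T=400)
print(R)  # four times the total gives twice the radius
```

Note that the radius grows with the square root of the total, not with the total itself.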
1. At Deepdale School, 400 students sat for an English Language examination. A pie chart was drawn to
illustrate the results of these students using a radius of 6cm. At Shallowvale School, 576 students sat for the
same examination. Find the correct radius for a comparative pie chart showing the results of the Shallowvale
students.
2. A man surveyed 350 shoppers at CutPrice supermarket and 686 shoppers at BargainBin supermarket. The 350
shoppers at CutPrice were represented in a pie chart of radius 7cm. What is the correct radius to be used in a
comparative pie to represent the 686 shoppers at BargainBin?
3. A census was taken on the populations of the villages of Bristow and Chisholm. A pie chart of radius 8cm was
used to represent Bristow’s population of 6800 and a pie chart of radius 10cm was used to represent the
population of Chisholm. What was the population of Chisholm?
4. Kay earns £27 000 per annum. She has drawn a pie chart of radius 6.3cm to show how she spends her salary
and her partner Joshua has drawn a comparative pie chart with radius 5.67cm to show how he spends his
salary.
ii) Calculate, correct to 3 significant figures, the correct radius for another comparative pie chart
that could be drawn to show how Kay and Joshua spend their combined salaries.
5. The number of employees at a clothing manufacturer’s in 2003 was 19% less than in 2002.
A pie chart of radius 12.5 cm was drawn to represent the employees in 2002.
i) Calculate the radius of a comparative pie chart that could be drawn to represent the employees in
2003.
ii) If there were 486 employees in 2003, how many employees were there in 2002?
6. The table gives the number of males and the number of females who applied to join the Armed Forces at an office in
1993 and in 2003.
           1993   2003
Males      423    424
Females    137    189
A pie chart was drawn to represent those applying in 2003 and the sector
area for females was 62 cm². Taking π = 3.142, find
i) The area of the pie chart used for 2003.
ii) The radius of the pie chart used for 2003.
iii) The correct radius to be used for a comparative pie chart to represent those who applied in 1993.
iv) The area of the sector for males who applied in 1993.
PICTORIAL REPRESENTATION OF DATA
Data is represented by the use of symbols with a key to indicate what each symbol represents. Care
must be taken when drawing symbols. If the two symbols are different in shape and size they will not
represent the same item or the same number of items. It is recommended that we use symbols with
simple shapes as it may be necessary to draw fractions of a symbol.
1. Forty-five students were asked which sports they play. All students play at least one sport.
Their responses are illustrated in the pictogram.
Football i) Find
iii) Express the number that play softball to the number that do not play table tennis as a simple ratio.
iv) If a student is randomly selected from the group, what is the probability that this student plays table
tennis?
2. A farmer grows five different types of vegetable on his farm. The area used for growing each is given below:
A CHANGE CHART
A change chart shows relative changes in a variable. The changes, which are given as percentages, can be positive or
negative.
$$\text{Percentage change} = \frac{\text{change}}{\text{original value}} \times 100$$
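As a quick illustration of the percentage change formula, here is a minimal Python sketch. The function name and the values are hypothetical; the change is taken as new value minus original value, so a fall gives a negative result.

```python
def percentage_change(original, new):
    """Percentage change = change / original value x 100."""
    return (new - original) / original * 100


increase = percentage_change(200, 250)  # a rise from 200 to 250
decrease = percentage_change(80, 60)    # a fall from 80 to 60
print(increase, decrease)
```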
Example 1
Example 2
BAR CHARTS
Bar charts are columns of equal widths representing different numerical values. The heights represent
the frequencies and equal gaps should be drawn between the bars. Bars should not be shaded using
bright colors and complicated patterns as this could misrepresent the frequencies and distort the data.
Comparative bar charts are useful if two or more pieces of related data are to be compared. A dual bar
chart is used to show two sets of related data. It consists of pairs of bars, with one bar in each pair for
each set of data. Equal-width gaps should be left between the pairs and all bars should be of equal
width.
A key should be used to distinguish between the two data sets. Example
Last month two travel companies, TourWell and TravelSafe organized holidays for customers to
different destinations and the numbers are given below and represented in a dual bar chart
1. The results of a survey in two classes on the students’ favourite flavours in potato chips are shown.
Potato chip flavour   Class A   Class B
Smokey bacon          6         14
Barbecue              8         6
Cheese & onion        1         0
Salt & vinegar        11        8
Hot chilli            4         2
2. (Bar chart: number of patients admitted (f) to Hospital A, Hospital B and Hospital C.)
a) how many patients were admitted with malaria to the three hospitals altogether during these 6 weeks,
b) during which week the greatest number of patients were admitted into the three hospitals altogether
c) between which two consecutive weeks was the greatest change in the number of patients admitted to hospital C.
iii) Express the total number of patients admitted to hospitals A, B and C as a simple ratio. iv) What percentage of the total number of patients
admitted during this period was admitted to hospital B?
3. A company has recorded the numbers of its full-time and part-time employees for the years 2001 to 2003.
Type of employee 2001 2002 2003
ii) Express, in simple form, the ratio of the two types
iv) Over the three-year period, what percentage of the employees has been part-time?
In a sectional bar chart, each bar represents a total and is divided into sections showing how that total is made up. The size of
each total can then be easily compared with the others, and the proportions of each component in the bars can also be
compared.
For example
The sectional bar chart below shows the number of men, women and children living in Hill Street.
(Sectional bar chart: number of trucks, cars and motor bikes at Mike’s and Jomo’s garages.)
PERCENTAGE SECTIONAL BAR CHART
Each bar represents a total and is drawn to a height of 100%. The bars are divided into sections
and each section represents a percentage of the total. One advantage of this chart is that components
can be compared easily both within bars and between bars. The contribution of each component must be calculated, and
this can only be done when the totals are known.
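The calculation behind a percentage sectional bar chart can be sketched in Python as follows. The component names and counts are invented for illustration, and the function name is hypothetical.

```python
def section_percentages(components):
    """Convert the components of one bar into percentages of the bar's total."""
    total = sum(components.values())
    return {name: value * 100 / total for name, value in components.items()}


# Hypothetical street with 30 men, 50 women and 20 children.
bar = section_percentages({"men": 30, "women": 50, "children": 20})
print(bar)
```

The percentages in each bar add up to 100, so the actual totals are no longer visible on the chart.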
3. A company has recorded the numbers of its full-time and part-time employees for the years 2001 to 2003.
Type of employee   2001   2002   2003
Full-time          78     77     95
Part-time          26     22     19
ii) illustrate the data in a sectional bar chart
iii) illustrate the data in a percentage sectional chart
2. Mrs. Mafela owns three filling stations.
The volume of petrol, diesel and paraffin sold, in litres, at each of these stations last week is shown.
i) a) calculate the total volume of each type of fuel that was sold,
b) what percentage of the petrol was sold at the North filling station?
c) draw a sectional percentage bar chart with one bar for each of the three types of fuel.
ii) a) calculate the total volume of fuel sold at each of the three filling stations,
b) what percentage of the fuel sold at the South filling station was diesel?
c) draw a sectional percentage bar chart with one bar for each of the three filling stations.
VENN DIAGRAMS
3 people can speak all the languages.
5 people can speak both Urdu and French
13 people speak only Arabic.
DATA REPRESENTATION: ADVANTAGES AND DISADVANTAGES
PICTOGRAMS
Advantages:
they offer an attractive visual impact
they are easy to read
they can handle large volumes of data using keyed symbols
Disadvantages:
they can be hard to draw
fractional pictures (such as 2/3 of a symbol) are hard to read
they can only represent a few categories of data
PIE CHARTS
Advantages:
they show relative proportions of items in relation to the whole
they offer easy comparison of each item as a part of a whole
Disadvantages:
at times it is tedious to calculate sector angles
the actual frequencies are not shown and might need to be calculated
accuracy depends on the accuracy of the angles
the answers in most cases must be approximated
SIMPLE BAR CHARTS
Advantages:
it is easy to see quantities
they can be drawn easily
Disadvantages:
it is difficult to see proportions
they can easily be manipulated
they can only be used for discrete data
DUAL BAR CHARTS
Advantages:
they show comparisons between variables
Disadvantages:
they can only represent a few variables
COMPONENT BAR CHARTS
Advantages:
proportions of items can easily be seen in bars
they can be used to see trends by looking at the heights of bars
Disadvantages:
they can only represent a few variables
PERCENTAGE SECTIONAL BAR CHARTS
Advantages:
they show proportions of various items
they show relative differences between categories
Disadvantages:
they do not show actual quantities
CHANGE CHARTS
Advantages:
they show relative changes in items and quantities
Disadvantages:
only changes are shown, therefore they can be hard to read
f 21 27 83 19
2. The times taken by 76 cyclists to complete a 700 meter sprint race are given.
No cyclists ( f ) 14 32 30
4. Some students investigated the departure times of 200 commercial aircraft. They recorded the
number of minutes by which each aircraft’s departure was delayed. Their results are shown below.
Delay time (t minutes)   No. of aircraft (f)
0 ≤ t < 10               48
10 ≤ t < 20              98
20 ≤ t < 30              46
30 ≤ t < 40              8
ii) On graph paper, draw and label suitable axes for delay time and for
number of aircraft and, by plotting the points from your table, construct a
histogram.
iii) Use the histogram to estimate a) modal delay time,
b) the number of aircraft that were delayed by less than 8 minutes,
c) the number of aircraft that were delayed by 35 minutes or more.
d) Calculate an estimate of the mean and median
HISTOGRAMS OF UNEQUAL CLASS INTERVALS
The process of constructing histograms of unequal class intervals is different from that for equal class
intervals, as some calculations are required. On the vertical axis FREQUENCY DENSITY is plotted, and it has to be calculated first:
$$\text{FREQUENCY DENSITY} = \frac{\text{FREQUENCY}}{\text{CLASS INTERVAL}}$$
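The frequency density calculation can be sketched in Python as below. The classes and frequencies are invented for illustration, and the function name is hypothetical.

```python
def frequency_densities(classes):
    """classes: list of (lower boundary, upper boundary, frequency)."""
    return [f / (upper - lower) for lower, upper, f in classes]


# Hypothetical classes with intervals of 10, 5 and 15.
example = [(0, 10, 20), (10, 15, 30), (15, 30, 45)]
densities = frequency_densities(example)
print(densities)
```

Although the last class has the largest frequency, the middle class has the tallest bar because its interval is the narrowest.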
EXAMPLE
The numbers of megabytes used for storing documents on 339 computers at a college are given below.
DRAW A HISTOGRAM TO SHOW THIS DATA
THIS IS HOW THE HISTOGRAM WILL LOOK. NOTE THAT FREQUENCY DENSITY IS ON THE VERTICAL AXIS
CLASS EXERCISE
1. A biologist planted 684 seeds and recorded the times taken for the seeds to germinate.
i) Explain why the first class (containing 120 seeds) has an interval of 4 hours, not 3 hours.
iii) Calculate an estimate of a) the number of seeds that germinated in less than 31 hours,
c) the percentage of the seeds that took between 1 and 2 days to germinate
3. The departure delay-times of some buses last month are represented in the histogram below.
Buses per 2 minutes (density)
i) How many buses are represented?
MEASURES OF CENTRAL TENDENCY FOR GROUPED DATA
In grouped data distributions, the mean, the mode and median can only be estimated.
For example the table below shows the heights in meters reached by mountain climbers in a day.
Height (H metres) 200 - 500 600 - 1000 1100 - 1300 1400 - 1700 1800 - 2100
f 40 50 42 24 44
Answers
THE MEAN
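As a rough illustration for the mountain climbers table, the Python sketch below estimates the mean using class mid values weighted by frequency. This is the usual method for grouped data and is assumed here, since the worked answer is not shown in these notes; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) from the heights table above.
classes = [(200, 500, 40), (600, 1000, 50), (1100, 1300, 42),
           (1400, 1700, 24), (1800, 2100, 44)]

total_frequency = sum(f for _, _, f in classes)

# Each class is represented by its mid value, so the result is an estimate.
estimated_mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / total_frequency
print(total_frequency, estimated_mean)
```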
MEASURES OF DISPERSION
A data distribution cannot be described by measures of central tendency alone. For example, knowing
that the mean mark in a test is 50% is not informative enough, as it gives no idea how varied the marks
are. Therefore we need to know how values are spread, and this is the purpose of finding measures of
dispersion.
Range
Interquartile range
Standard deviation
Mean deviation
1. RANGE
This is the simplest measure of spread and is commonly used. The range is the difference
between the largest and smallest possible values in a data distribution.
EXAMPLES
Classes in a certain school have 22, 34, 36, 41, 45, 37 students.
The range will be 45 – 22, which is equal to 23.
In a company 10 people earn P20.00 an hour, 5 people earn P9.00 and two earn P3.50.
The range in hourly pay will be P20.00 – P3.50 = P16.50
The lengths of 30 pencils in a grouped frequency table are as follows
The range cannot be found exactly in this case as the actual lengths are not given. The lower
limit of the first class and the upper limit of the last class are used to find the lower limit and
upper limit of the range.
Lower limit of the range = 13 – 9 = 4
Upper limit of the range = 17 – 8 = 9
So the range is between 4 and 9.
EXERCISE
Find the range of the following sets of data
(a) 4,7,7,9,13,21 (b) -3,5,18,24,29,37
(c)
z           60   65   70   75   80   85
frequency   17   23   28   59   21   12
(d)
x           10   11   12   13
frequency   5    7    9    4
2. (a) The times taken, to the nearest minute, by students to travel to school are summarized in the table
below. Find the lower and upper limits of the range.
Time taken            20 - 21   22 - 24   25 - 30
No. of students (f)   6         39        5
(b)The grouped frequency table below shows total amounts spent by customers in a
supermarket. Find the lower and upper limits of the range in the amount spent by customers.
The actual positions of the lower and upper quartiles depend on whether the number of values
in a distribution is odd or even.
QUARTILE POSITIONS FOR AN ODD DISTRIBUTION
EXERCISE
1. For each of the following sets of data find (a) the lower quartile (b) the upper quartile and (c)
the interquartile range.
(i) 20,6,28,34,16 (ii) 15,25,29,37,71,43,17,43,71,15,7,14
(ii) 43,31,23,37,58,61,72,70,77,68,82,39,67,53,61
(iii)
p 10 20 30 40 50 60
frequency 2 13 5 11 13 15
(ii)
score       0   1   2   3   4    5    6    7    8   9   10
frequency   1   1   6   9   10   18   27   11   9   5   2
(iii)
The table shows the number of sons and daughters that each of 259 men has.
                         SONS
DAUGHTERS    0     1     2     3     TOTALS
0            4     41    19    7     71
1            31    58    11    5     105
2            22    19    10    6     57
3            7     8     7     4     26
TOTALS       64    126   47    22    259
(a) Find the inter quartile range for the number of (i) daughters
(ii) sons
(b) Draw a frequency table showing how many children(sons and daughters) these men have
(c) Find the inter quartile range of the number of children.
CUMULATIVE FREQUENCY DIAGRAMS
Estimates for values at certain positions within a distribution (median and quartiles) can be read
from cumulative frequency diagrams. By estimating quartiles we can also estimate the interquartile range of
a data distribution given in frequency tables and diagrams.
As well as estimating quartiles, we can also estimate deciles and percentiles from a cumulative
frequency diagram. The position of a decile is a number of tenths of the total frequency, and the position of a
percentile is a number of hundredths of the total frequency. For a total frequency N:
$$\text{position of the } x\text{th decile} = \frac{x}{10} \times N \qquad \text{position of the } x\text{th percentile} = \frac{x}{100} \times N$$
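Linear interpolation is a method used to calculate an estimate of the value at any particular
position in a distribution, such as the median and the quartiles.
$$\text{Estimated value} = LCB + \frac{P - CF}{F_m} \times W$$
where LCB is the lower class boundary of the class containing the value, P is the position of the value, CF is the
cumulative frequency up to that class, F_m is the frequency of that class and W is the class width.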
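The interpolation formula can be sketched in Python as below. The data is invented, the function names are hypothetical, and the positions N/2 and N/4 for the median and lower quartile are an assumption that is commonly used for grouped data.

```python
def interpolate(position, lcb, cf_before, class_freq, width):
    """Estimated value = LCB + (P - CF) / Fm x W."""
    return lcb + (position - cf_before) / class_freq * width


def estimate_at_position(position, classes):
    """classes: ordered list of (lower boundary, upper boundary, frequency)."""
    cumulative = 0
    for lower, upper, f in classes:
        # The value lies in the first class whose cumulative frequency reaches it.
        if f > 0 and position <= cumulative + f:
            return interpolate(position, lower, cumulative, f, upper - lower)
        cumulative += f
    raise ValueError("position exceeds the total frequency")


classes = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]
n = sum(f for _, _, f in classes)
median = estimate_at_position(n / 2, classes)
lower_quartile = estimate_at_position(n / 4, classes)
print(median, lower_quartile)
```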
EXERCISE
The standard deviation is the positive square root of the variance. The variance of a distribution
is, in simple terms, the difference between two quantities, which are:
The mean of the squares of the values
The square of the mean of the values
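FOR UNGROUPED SETS OF DATA
$$\text{VARIANCE} = \frac{\sum X^2}{N} - \left[\frac{\sum X}{N}\right]^2$$
$$\text{STANDARD DEVIATION} = \sqrt{\frac{\sum X^2}{N} - \left[\frac{\sum X}{N}\right]^2}$$
When each value X occurs with frequency f (so that N = ∑f):
$$\text{STANDARD DEVIATION} = \sqrt{\frac{\sum fX^2}{N} - \left[\frac{\sum fX}{N}\right]^2}$$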
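The two forms of the formula can be compared in a short Python sketch. The data set is made up for illustration and the function names are hypothetical; the frequency table version holds the same eight values.

```python
import math


def variance(values):
    """Mean of the squares minus the square of the mean."""
    n = len(values)
    mean_of_squares = sum(x * x for x in values) / n
    square_of_mean = (sum(values) / n) ** 2
    return mean_of_squares - square_of_mean


def variance_from_table(xs, fs):
    """Same formula for values xs with frequencies fs (N is the sum of f)."""
    n = sum(fs)
    mean_of_squares = sum(f * x * x for x, f in zip(xs, fs)) / n
    square_of_mean = (sum(f * x for x, f in zip(xs, fs)) / n) ** 2
    return mean_of_squares - square_of_mean


values = [2, 4, 4, 4, 5, 5, 7, 9]
var = variance(values)
sd = math.sqrt(var)
table_var = variance_from_table([2, 4, 5, 7, 9], [1, 3, 2, 1, 1])
print(var, sd, table_var)
```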
EXERCISE
(d)
x 10 20 30 40 50 60 70 80
f 3 4 2 11 7 9 4 5
STANDARD DEVIATION AND VARIANCE FOR GROUPED DATA VALUES
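In grouped data values the standard deviation can only be estimated. Class mid values are used instead
of X or individual values.
$$\text{STANDARD DEVIATION} = \sqrt{\frac{\sum fX^2}{N} - \left[\frac{\sum fX}{N}\right]^2}$$
where X is now the class mid value and N = ∑f.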
Class mid values for grouped data should be calculated carefully, especially when values have been
given to a certain degree of accuracy
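An estimate of the standard deviation for grouped data can be sketched in Python as below. The classes are invented and given by their boundaries, so each mid value is halfway between the boundaries; the function name is hypothetical.

```python
import math


def grouped_sd(classes):
    """Estimate the standard deviation from (lower, upper, frequency) classes."""
    n = sum(f for _, _, f in classes)
    mids = [((lower + upper) / 2, f) for lower, upper, f in classes]
    mean_of_squares = sum(f * m * m for m, f in mids) / n
    mean = sum(f * m for m, f in mids) / n
    return math.sqrt(mean_of_squares - mean ** 2)


# Mid values 5, 15 and 25 with frequencies 1, 2 and 1.
estimate = grouped_sd([(0, 10, 1), (10, 20, 2), (20, 30, 1)])
print(round(estimate, 3))
```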
EXAMPLE
The capacities of 85 containers were recorded in the table below. Find the standard deviation.
CAPACITY       FREQUENCY
20 < C ≤ 24    7
24 < C ≤ 28    15
28 < C ≤ 30    29
30 < C ≤ 32    22
32 < C ≤ 35    12
The times taken for seeds to germinate are recorded below. Calculate the standard deviation.
Time taken         24 - 26   26 - 30   30 - 35   35 - 50   50 - 60   60 - 70
No. of seeds (f)   1         3         7         72        192       55
The heights of 200 male students and 100 females are summarized in a table below. Find the standard
deviation for (a) Males (b) females
ADVANTAGES AND DISADVANTAGES OF MEASURES OF DISPERSION
RANGE
Advantages:
Simple to calculate
Commonly used to compare spread between similar sets of data
Disadvantages:
Based on only two values of the data
Easily affected by extreme values
To calculate the mean value for combined sets of data we need to find the sum of all values in both sets
and also the total number of values in those sets. Similarly, to find the standard deviation we also need the sum
of the squares of all the values in those sets.
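Combining two sets of data from their totals can be sketched in Python as below. The totals are invented, and the function name and argument order are illustrative only.

```python
def combined_mean_variance(n1, sum1, sumsq1, n2, sum2, sumsq2):
    """Mean and variance of two sets combined, from their counts,
    sums of values and sums of squared values."""
    n = n1 + n2
    mean = (sum1 + sum2) / n
    var = (sumsq1 + sumsq2) / n - mean ** 2
    return mean, var


# Hypothetical set A: 4 values, sum 20, sum of squares 120.
# Hypothetical set B: 6 values, sum 30, sum of squares 180.
mean, var = combined_mean_variance(4, 20, 120, 6, 30, 180)
print(mean, var)
```

The combined standard deviation is then the square root of the combined variance.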
EXERCISE
1. Given that ∑ p = 420 and ∑ q = 1290, find the mean of P and that of Q if there are 25 values
of P and 75 values of Q.
2. Given that ∑ x = 72 and ∑ y = 240, find the mean of x and that of y if there are 12 values of
x and 12 values of y.
3.
For the combined distribution P and Q find (a) the mean (b) the variance
4.
Calculate p,q,r,s,t,u
For the combined distribution M and N find (a) the mean (b) the standard deviation
LINEAR TRANSFORMATION OF DATA
Linear transformation is a process by which one set of numbers is mapped onto another set by
addition and/or multiplication.
For example, the linear function y = 3x + 2 maps values of x (1,2,3) onto the values of y (5,8,11). We say
the distribution 5,8,11 has been derived from 1,2,3.
Example
If measures of central tendency for a particular distribution are known, then measures of central
tendency for a distribution derived by addition or multiplication can be found directly.
All three measures of central tendency are affected by the same operations that are used to
derive the new distribution, whether addition, multiplication or a combination of these. Identical
operations must be performed on all original values.
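The effect of y = 3x + 2 on the three measures can be checked with a short Python sketch using the standard `statistics` module. The data set is invented for illustration.

```python
from statistics import mean, median, mode

x = [1, 2, 2, 3, 7]
y = [3 * v + 2 for v in x]  # derived distribution: 5, 8, 8, 11, 23

# Pair each measure of the original data with the same measure of the derived data.
results = {
    "mean": (mean(x), mean(y)),
    "median": (median(x), median(y)),
    "mode": (mode(x), mode(y)),
}
for name, (original, derived) in results.items():
    print(name, original, derived, derived == 3 * original + 2)
```

Each derived measure equals three times the original measure plus two, without recalculating from the derived values.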
EXERCISE
1. A set of values ( p, 13,18,29,29) has mode = 29, mean = 20, and median = 18. Without
calculating the value of p, calculate the three measures of central tendency for derived
distributions
(a) (p+2), 15,20,31,31
(b) (p – 5),8,13,24,24
(c) 2p,26,36,58,58
(d) p/2, 6.5, 9, 14.5, 14.5
2. Variable G has mode = 6.2, mean = 5.6 and median = 5.85. Find:
(a) The mean of G/5 – 1
(b) The median of 2(G + 5)
(c) The mode of 1.7 + 1.5G
3. Variable R shown on the table has mean = 4.8, variables W and V are derived from R.
Variable R
R 3 4 5 6
f a b c d
Variable W
W 7 9 11 13
f a b c d
Variable V
P 6 14 22 30 38
f a b c d e
W 5 - 13 13 - 21 21 - 29 29 - 37 37 - 45
f a b c d e
X 0≤x˂4 4≤x˂8 8 ≤ x ˂ 12 12 ≤ x ˂ 16 16 ≤ x ˂ 20
f a b c d e
Y 6.8 – 8.4 8.4 – 25.2 25.2 – 27.6 27.6 – 44.4 44.4 – 46.8
f a b c d e
MEASURES OF DISPERSION FOR DERIVED DISTRIBUTIONS
EXAMPLES
If the smallest value in a distribution is 3 and the largest value is 10 then the range is 10 – 3 = 7. If both
values are increased by 20 then (10 + 20) – (3 + 20) = 30 – 23 = 7. The range was NOT AFFECTED. The
range will also not be affected if we subtract the same value from both.
If the smallest value, 3, is multiplied by 11, and the largest value, 10, is similarly multiplied by 11, the
range will be (10 x 11) – (3 x 11) = 110 – 33 = 77. The range has changed drastically! But note that the initial
range of 7 has also been multiplied by 11 to become 77. Division has the same kind of effect on the
range.
If the range is affected by multiplication, and not addition, we should expect that the standard deviation,
as a measure of dispersion, is similarly affected.
EXAMPLE
The tables below show values for distribution X and a derived distribution 3X + 1
Standard deviation of X = 0.758; standard deviation of 3X + 1 = 2.274
The standard deviation of 3X + 1 is three times the standard deviation of X. The variance of 3X + 1
is 3² = 9 times the variance of X.
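The effect of the transformation 3X + 1 on the measures of dispersion can be checked with a Python sketch using the standard `statistics` module. The values of X here are invented, not the ones from the tables above.

```python
from statistics import pstdev, pvariance

x = [1, 2, 3, 4, 5]
y = [3 * v + 1 for v in x]  # derived distribution 3X + 1

# Adding 1 changes nothing; multiplying by 3 scales the spread.
range_ratio = (max(y) - min(y)) / (max(x) - min(x))
sd_ratio = pstdev(y) / pstdev(x)
variance_ratio = pvariance(y) / pvariance(x)
print(range_ratio, sd_ratio, variance_ratio)
```

The range and standard deviation are multiplied by 3, while the variance is multiplied by 9.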
EXERCISE
1. The distribution of a variable P has a range of 13 and a standard deviation of 5.4. Find the range
and standard deviation of the derived distributions
(i) P + 7 (ii) 3P
(iii) 5P + 4 (iv) P/5 + 5
2. Each value in a distribution is reduced by 25%. If the new standard deviation is 3.4, find the
original standard deviation.
3. The masses of a group of children have a mean of 46.9 kg and a standard deviation of
9.1 kg. Find the new mean and the new standard deviation if the mass of each child is reduced
by:
(i) 900g
(ii) 5%
(iii) 900g and then by 5%
4. A set of numbers 6, 10, t, 14, 15 has a mean M and a standard deviation S. Find the mean and the
standard deviation for the derived distributions
(i) 10,14, t + 4, 18,19
(ii) 18,30,3t,42,45
(iii) 2,4, 0.5t – 1,6,6.5
SCALING
STANDARD SCORES
Standardized scores are used to compare two sets of unrelated data. We cannot compare oranges and
apples unless we have a common language for them, that is, unless we standardize them. We give them
something in common (a Z score). Only then can we make some judgment about them.
A Z SCORE MEANS HOW FAR AWAY A PARTICULAR VALUE IS FROM THE MEAN
A Z score of 0 means a data value fell exactly at the mean. A positive Z score means a value is above the
mean while a negative Z score means a value is below the mean. For example, a Z score of 2 means that
a particular value was 2 standard deviations above the mean. A Z score of negative 3 means the
value was 3 standard deviations below the mean value.
$$\text{STANDARD SCORE} = \frac{\text{OBSERVED VALUE} - \text{MEAN}}{\text{STANDARD DEVIATION}}$$
$$Z = \frac{X - \bar{X}}{S}$$
EXAMPLE
Katherine scores 75% in English and 62% in Mathematics. In which test did she do better? At face
value we might assume that it is English. However, we need more information about the two tests
to arrive at a conclusion. Suppose we are given that the mean for English is 70% with a standard deviation of 5,
while the mean for Mathematics is 50% with a standard deviation of 3.
$$Z = \frac{X - \bar{X}}{S}$$
For English:
$$Z = \frac{75 - 70}{5} = \frac{5}{5} = 1$$
For Mathematics:
$$Z = \frac{62 - 50}{3} = \frac{12}{3} = 4$$
These results indicate that Katherine actually did much better in Mathematics than in English.
However, it is important to note that a high Z score does not always indicate high achievement.
A lower Z score for time taken to complete a task will indicate that a particular participant was
significantly better than others.
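Katherine's two standard scores can be reproduced with a minimal Python sketch. The function name is hypothetical; the marks, means and standard deviations are those given in the example.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / sd


english = z_score(75, 70, 5)
maths = z_score(62, 50, 3)
print(english, maths)
```

The larger Z score for Mathematics shows the stronger performance relative to the rest of the class, even though the raw mark is lower.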
EXERCISE
1. Given that the mean of a data distribution is 60 and the standard deviation is 8, write the
following values as standardized scores
i. 68 ii. 52 iii. 44 iv. 80 v. 34
2. Complete the following table in terms of X and
FATIMA 80
PONTSHO 34
3. Complete the following table for standard scores in terms of X and using a mean of 50 and a
standard deviation of 16
SCALED MARKS
Scaling is a linear transformation of one set of numbers to another set that has a chosen mean and a
chosen standard deviation. This process can be used to compare different activities such as;
Examination results
Athletic races
Test results
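$\frac{X - \bar{X}}{s_x} = \frac{S_y - \bar{S_y}}{S_s}$
where X is the raw mark, $\bar{X}$ the raw mean, $s_x$ the raw standard deviation, $S_y$ the scaled mark,
$\bar{S_y}$ the scaled mean and $S_s$ the scaled standard deviation.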
EXAMPLE
In a test, the raw mean was $\bar{X}$ = 60 and the raw standard deviation was $s_x$ = 10. A raw mark of 80 is
to be scaled to a mean of 65 and a standard deviation of 12. Find the scaled mark.
$\frac{X - \bar{X}}{s_x} = \frac{S_y - \bar{S_y}}{S_s}$
$\frac{80 - 60}{10} = \frac{S_y - 65}{12}$
$\frac{20}{10} = \frac{S_y - 65}{12}$
$24 = S_y - 65$
$S_y = 89$
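The scaling calculation can be sketched in Python as follows. The function name scale_mark and its parameter names are illustrative assumptions. The method is the one used above: find the standard score of the raw mark, then place it the same number of scaled standard deviations from the scaled mean.

```python
def scale_mark(raw_mark, raw_mean, raw_sd, scaled_mean, scaled_sd):
    """Linearly scale a raw mark to a chosen mean and standard deviation."""
    # The standard score is unchanged by scaling
    z = (raw_mark - raw_mean) / raw_sd
    return scaled_mean + z * scaled_sd


# Figures from the example above
scaled = scale_mark(80, 60, 10, 65, 12)
print(scaled)  # 89.0
```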
BOX AND WHISKER PLOTS
A box and whisker plot is a summary plot based on the median and the interquartile range. The box
spans the interquartile range, which contains the middle 50% of the values, and the whiskers extend to
the highest and lowest values, excluding outliers. A line across the box indicates the median.
EXAMPLE
3 5 5 6 6 7 8 10 11 12 12
The median Q2 is at the $\frac{n + 1}{2} = \frac{11 + 1}{2} = 6$th position.
The median is 7.
BOX AND WHISKER PLOTS
CLASS EXERCISE
Question 1
a) 2 51 43 54 53 51 62 49 50 63 60
b) 45 58 34 42 52 49 50 45 51
c) 75 65 78 79 76 79 72 82
d) 110 98 91 102 89 75 108 118 152
Question 2
a) 25 12 31 26 27 29 32
b) 35 46 50 32 54 44 60
c) 57 53 52 31 48 58 64 86 56 54 55
d) 34 42 45 45 49 50 51 52 58
e) 65 72 75 76 78 79 79 82
Question 3
Draw box and whisker plots for the above indicating outliers if any.
COMPARING DATA BY BOX AND WHISKER PLOTS
EXAMPLE
Two campus book stores are having a price war on their first year maths books. James, a first year
maths major, goes into each store to establish the cheapest price he can find. He looks at the
prices of five randomly chosen maths books in each store and collects the following data.
                     Store A                       Store B
Minimum value        75                            60
Maximum value        110                           120
Q1                   (80 + 75)/2 = 77.5            (60 + 84)/2 = 72
Q2                   95                            84
Q3                   (100 + 110)/2 = 105           (100 + 120)/2 = 110
IQR = Q3 – Q1        105 – 77.5 = 27.5             110 – 72 = 38
OUTLIERS
LOWER BOUNDARY
Q1 – 1.5IQR          77.5 – 1.5(27.5) = 36.25      72 – 1.5(38) = 15
UPPER BOUNDARY
Q3 + 1.5IQR          105 + 1.5(27.5) = 146.25      110 + 1.5(38) = 167
                     NO OUTLIERS                   NO OUTLIERS
James should buy from store B, as the prices there are generally lower, as shown by the lower median.
Neither data set has outliers, but book prices in store B are more varied, as shown by the higher
IQR.
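The working for the two stores can be reproduced with a minimal Python sketch. The raw prices are not listed in the notes, so the two lists below are an assumption inferred from the working shown (for example, Q1 for store A is the mean of 75 and 80). The function names are illustrative. Quartiles are found as the medians of the lower and upper halves, leaving out the middle value when the number of values is odd, which matches the working above.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Leave out the middle value when n is odd
    upper = data[half + 1:] if n % 2 else data[half:]
    return min(data), median(lower), median(data), median(upper), max(data)


def outlier_fences(q1, q3):
    """Return the lower and upper boundaries Q1 - 1.5 IQR and Q3 + 1.5 IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Assumed prices, inferred from the working in the example
store_a = [75, 80, 95, 100, 110]
store_b = [60, 84, 84, 100, 120]

for name, prices in (("Store A", store_a), ("Store B", store_b)):
    low, q1, q2, q3, high = five_number_summary(prices)
    lower_fence, upper_fence = outlier_fences(q1, q3)
    outliers = [p for p in prices if p < lower_fence or p > upper_fence]
    print(name, "Q1 =", q1, "Q2 =", q2, "Q3 =", q3, "IQR =", q3 - q1,
          "boundaries =", (lower_fence, upper_fence), "outliers =", outliers)
```

Other quartile conventions exist and may give slightly different values for Q1 and Q3.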
CLASS EXERCISE
1. EMPLOYEES ARRIVING AT WORK BY CAB OR THEIR OWN CAR, WHICH MODE OF TRANSPORT
IS MORE RELIABLE
CAR 14 18 16 22 25 12 32 16 15 10
CAB 12 10 13 14 9 17 11 10 8 11
2. STUDENTS MARKS IN TWO TESTS
SKEWNESS
One of the main reasons for drawing a stem and leaf diagram or a box and whisker plot is to quickly
spot the trend in the spread of the data.
SYMMETRICAL DISTRIBUTION
The plot is symmetrical. The box is in the middle and the whiskers are of equal length.
EXAMPLE 2
3.9 4.1 4.2 4.3 4.3 4.4 4.4 4.4 4.4 4.5 4.5 4.6 4.7 4.8 4.9 5.0 5.1
STEM DIAGRAM
Cumulative frequency diagrams for ungrouped data are sometimes referred to as STEM diagrams or
STEP POLYGONS because of their appearance.
EXAMPLE
The number of eggs laid each day by 7 hens for a period of 21 days was recorded in a table as
follows;
NUMBER OF EGGS F
4 1
5 4
6 7
7 5
8 4
total 21
GROUPED FREQUENCY TABLE
NUMBER OF EGGS   F    CF
X < 4            0    0
4 ≤ X < 5        1    1
5 ≤ X < 6        4    5
6 ≤ X < 7        7    12
7 ≤ X < 8        5    17
8 ≤ X < 9        4    21
The points to plot are (4,0), (4,1), (5,1), (5,5), (6,5), (6,12), (7,12), (7,17), (8,17), (8,21).
These are then plotted on a pair of axes and joined, in order, by straight lines to give the steps.
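The list of points for the step polygon can be generated from the frequency table with a short Python sketch. The function name step_polygon_points is an illustrative assumption. For each value, the cumulative frequency before and after that value gives the bottom and top of one step.

```python
def step_polygon_points(frequency_table):
    """frequency_table: list of (value, frequency) pairs sorted by value."""
    points = []
    cumulative = 0
    for value, freq in frequency_table:
        points.append((value, cumulative))   # bottom of the step
        cumulative += freq
        points.append((value, cumulative))   # top of the step
    return points


# Egg data from the example above
eggs = [(4, 1), (5, 4), (6, 7), (7, 5), (8, 4)]
points = step_polygon_points(eggs)
print(points)
```

Joining these points in order gives alternating vertical and horizontal segments, which is where the stepped appearance comes from.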
EXERCISE
6 5 3 3 5 3 1 2 4 2 2 5 1 5 2 2 3 8
2 4 5 3 3 0 2 5 0 1 0 1 3 0 3 12 7 1
MEASURES OF ASSOCIATION
Correlation is a description of a relationship between variables. Some variables are related while others
are not. For example
As we seek to answer these questions, we are simply trying to establish a CORRELATION between
variables.
TYPES OF CORRELATION
POSITIVE CORRELATION
The points slope upwards. The values of one set of data increase as the values of the other set increase.
If the points are very near the trend line, then we have a very strong positive correlation.
NEGATIVE CORRELATION
The points slope downwards. As one set of data increases the other set decreases. The values are
inversely related.
NO CORRELATION
The plotted points are scattered all over. The variables show no relationship.
If two variables are compared, the value of one of them (the dependent variable) is usually controlled
by the value of the other (the independent variable).
Sometimes there is no straight line passing through all the points, but you can still draw the line of
best fit, which comes as close as possible to fitting all the points. The closer the points are to the
line, the stronger the correlation.
Calculate the mean of the X values and the mean of the Y values to obtain the average point $(\bar{X}, \bar{Y})$.
Divide the points into two groups. Group 1 contains the points whose X values are less than $\bar{X}$, while
Group 2 contains the points whose X values are greater than $\bar{X}$.
Calculate the arithmetic mean for group 1 to have (x1, y1), the first semi-average.
Calculate the arithmetic mean for group 2 to have (x2, y2), the second semi-average.
Draw the line of best fit. It should pass through $(\bar{X}, \bar{Y})$ and as close as possible to the two
semi-averages.
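The steps above can be sketched in Python. The data set is hypothetical and chosen only for illustration, and the function names are assumptions. The sketch takes the gradient from the two semi-averages and makes the line pass through the average point. On graph paper the line is drawn by eye, so this is one reasonable way to turn the steps into a calculation.

```python
def mean_point(points):
    """Return the point (mean of x values, mean of y values)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def semi_average_line(points):
    """Return the average point, both semi-averages, gradient and y-intercept."""
    x_bar, y_bar = mean_point(points)
    # Points with x exactly equal to the mean belong to neither group
    group1 = [p for p in points if p[0] < x_bar]
    group2 = [p for p in points if p[0] > x_bar]
    semi1 = mean_point(group1)
    semi2 = mean_point(group2)
    gradient = (semi2[1] - semi1[1]) / (semi2[0] - semi1[0])
    # Force the line through the average point
    intercept = y_bar - gradient * x_bar
    return (x_bar, y_bar), semi1, semi2, gradient, intercept


# Hypothetical data, for illustration only
data = [(1, 2), (2, 4), (3, 3), (4, 6), (5, 5), (6, 7)]
average, semi1, semi2, gradient, intercept = semi_average_line(data)
print("average point:", average)
print("semi-averages:", semi1, semi2)
print("line of best fit: y =", gradient, "x +", intercept)
```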
1. A group of students sat for two papers in science. The percentage marks for ten of the students were:
Student          A    B    C    D    E    F    G    H    I    J
Paper 1 (X %)    54   40   74   62   80   38   36   44   84   68
Paper 2 (Y %)    52   32   68   62   68   52   30   38   72   56
i) On graph paper, draw axes for X and for Y from 0 to 100 using 1 cm to represent a mark of 10%.
ii) Draw a scatter diagram for the data.
iii) Showing your working, calculate the average point and the two semi-average points.
iv) Plot the three average points onto your scatter diagram and draw a line of best fit through them.
v) Using the average point and one other point, find the gradient of the line to 2 decimal
places.
vi) Find, to the nearest integer, the value of the Y-intercept and write down the equation of the line of
best fit.
The point for student F is furthest from the line.
vii) What does this tell you about student F?
2. The recommended daily dose of a brand of wormer powder for domestic cats is given.
Mass of cat ( M kg) 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
Daily dose (D g) 2.0 3.0 4.0 5.0 6.0 8.0 9.0 10.0 11.5 12.0 14.0 15.5
3. The masses and volumes of 13 specimens of stone, collected from a dry riverbed, are given.
Specimen        A     B     C     D     E     F     G     H     I     J     K     L     M
Mass (g)        26.0  17.0  14.0  8.0   14.0  29.0  23.5  35.0  10.0  45.5  29.5  51.5  25.0
Volume (cm3)    15.0  22.0  7.5   11.0  18.0  16.0  30.5  20.0  14.0  25.0  38.5  29.0  33.5
i) Draw and label axes for volume horizontally from 0 to 40 cm3 and for mass vertically from 0 to 60g.
ii) Draw a scatter diagram to illustrate these data.
iii) Suggest a practical reason to explain why the points fall into two distinct groups.
iv) For each group of points, find the co-ordinates of the average point and plot these onto the
diagram.
The lines of best fit should intersect at a certain point.
v) Write down the co-ordinates of this point and explain why both lines should pass through it.
vi) Draw a line of best fit for each of the two groups of points.
vii) Calculate, correct to 2 decimal places, the gradients of the two lines and explain their meaning in
the context of this question.
4. A group of tourists from France visited Thailand on a holiday in June 2003.
The visitors exchanged different amounts of French Francs at various banks in Thailand.
The amount that each exchanged, in Francs, and the amount that each received, in Thai Bhats, were:
Tourist      Jacques  Emile  Danny  Xavier  Fifi   Eugene  Marie  Edith  Andre  Clement
Bhats (B)    2800     4610   7150   11240   15000  4960    5700   12945  10300  13600
Francs (F)   400      650    1000   1530    2000   700     820    1780   1440   1860
i) Draw an axis for the independent variable using 1 cm to represent 100 units and an axis for the
dependent variable using 1 cm to represent 500 units.
ii) Draw a scatter diagram to illustrate the data.
iii) By calculating appropriate average values and plotting, draw a line of best fit onto the scatter
diagram.
iv) Calculate the gradient of your line of best fit and explain its meaning in the context of this question.
v) Estimate how many Francs would have been needed to purchase 58 000 Thai Bhats in June 2003.
INDEX NUMBERS
WEIGHTED AVERAGES
It is much easier to work with a single set of data than with a combination of two or more sets of
related data. The question that arises is whether the sets carry the same importance or whether one is
more significant than the other. When sets of data are combined, measures of average (mode, mean,
median) are WEIGHTED (that is, the respective ratios of the data are taken into consideration).
The most commonly used measure of average is the mean, which is found by dividing the sum of the values
by the number of values. If two sets of data are combined and carry different weights, then the weights
have to be used to find the final mean.
FOR EXAMPLE
Find the mean age of a 40 year old man and his 10 year old twins.
NOT
$\frac{40 + 10}{2} = 25$
BUT RATHER
$\frac{(1 \times 40) + (2 \times 10)}{1 + 2} = 20$
$\text{Weighted Average} = \frac{\sum (\text{weight} \times \text{number})}{\sum (\text{weights})}$
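The weighted average formula can be written as a short Python sketch. The function name weighted_average is an illustrative choice. The check uses the man and his twins from the example, with weights 1 and 2.

```python
def weighted_average(values, weights):
    """Return sum(weight * value) / sum(weights)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# One 40 year old man (weight 1) and two 10 year old twins (weight 2)
mean_age = weighted_average([40, 10], [1, 2])
print(mean_age)  # 20.0
```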
Exercise
2. Use weights of 11 and 14 respectively to find a weighted average of 2.4 and 3.8
3. In a business studies examination, which was taken by 382 form 4 students and 418 form 5 students,
the form 4s obtained a mean score of 45% and the form 5s obtained a mean score of 62%. Calculate, to
1 decimal place, the mean score of all the students who took the examination.
4. A group of 134 girls and 166 boys sat for a literature in English examination. The boys obtained a
mean score of 68.5% and the mean score for all the students was 64.35%. Calculate the mean score for
the girls.
5. Students sat two mathematics papers, A and B. The teacher decided that paper B was twice as
important as paper A and so calculated the students’ weighted average scores by assigning a weight of 1
to paper A and a weight of 2 to paper B. The table below shows some of the scores of four students
Student   Paper A   Paper B
Daniel    62        47
Eva       38        77
Aesop               60
Rina      55
i) Find the weighted average score awarded to a) Daniel and b) Eva.
ii) Aesop was awarded a weighted average score of 56. Find his score on paper A.
iii) Rina was awarded a weighted average score of 85. Find her score on paper B.
He walked for 1 hour 30 minutes at an average speed of 6 km/h and then cycled 12 km at 16 km/h.
INDEX NUMBERS
Index numbers are used to show proportional changes in values and costs over a period of time. They
are used in relation to:
PRICE RELATIVES
Price relatives are index numbers showing proportional changes in the prices of items between years.
An index number of 100 is assigned to the price of an item in a chosen base year and all past and future
prices for that item are given index numbers relative to 100.
KEY POINTS
The price of an item and its price relative are always in the same proportion.
Price relatives show a percentage change in the price of an item since the base year.
OR
$\text{PRICE RELATIVE IN YEAR A} = \frac{\text{PRICE IN YEAR A}}{\text{PRICE IN BASE YEAR}} \times 100$
EXAMPLES
1. A bottle of shampoo cost $4.00 in 2018. In 2019 the price had increased to $4.60. Find the price
relative for 2019 based on the price in 2018.
$\text{PRICE RELATIVE IN YEAR A} = \frac{\text{PRICE IN YEAR A}}{\text{PRICE IN BASE YEAR}} \times 100$
$\text{PRICE RELATIVE IN 2019} = \frac{\$4.60}{\$4.00} \times 100 = 115$
Note that the price relative of 115 shows that the price of the item increased by 15% between 2018 and
2019.
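The price relative calculation can be sketched in Python. The function name price_relative is an illustrative choice. The result is rounded before printing because decimal prices are not stored exactly in floating point.

```python
def price_relative(price, base_price):
    """Return the price as a percentage of the base year price."""
    return price / base_price * 100


# Shampoo example: $4.00 in 2018 (base year), $4.60 in 2019
pr_2019 = price_relative(4.60, 4.00)
print(round(pr_2019, 1))        # 115.0
print(round(pr_2019 - 100, 1))  # 15.0, the percentage increase since the base year
```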
2. The table below shows bus fares from Gaborone to Harare for the years 2015, 2016 and 2017
(i) Find the price relative of 2016 based on the price of 2015
(ii) Find the price in 2017 using 2015 as a base year
EXERCISE
1. In 2001 a coat cost £80. The price relatives, based on the 2001 price, were 110 in 2002 and 115 in
2003. Find the price of the coat in i) 2002, ii) 2003.
i) Using 2001 as the base year, find a) the price relative for 2002 b) the price relative for 2003.
ii) Using 2001 as the base year, the price relative for the computer was 87.5 in 2004.What was the
price of the computer in 2004?
3. The price relatives of three types of fuel are given for the years 1993, 1998 and 2003.
           1993   1998    2003
Petrol     100    112.5   135
Diesel     100    109     109
Paraffin   100    105     106
i) What is the significance of the three 100s in the column for 1993?
ii) What is the significance of the two price relatives of 109 in the row for diesel?
iii) In 1993 one unit of paraffin cost £2; how much did one unit of paraffin cost in 1998?
iv) In 1998 one litre of petrol cost £0.81; how much did one litre of petrol cost in 1993?
SIMPLE COMBINED INDEX
A simple combined index, commonly called a COST OF LIVING INDEX, can be calculated to compare
the cost of items in one year with the cost of the same items in the base year.
$\text{SIMPLE COMBINED INDEX FOR YEAR A} = \frac{\text{TOTAL COST IN YEAR A}}{\text{TOTAL COST IN BASE YEAR}} \times 100$
The index will indicate a percentage change in the cost of the items. This will only be relevant if exactly
the same items are bought in the two years.
EXAMPLE
The table shows the costs and quantities of pens, pencils and erasers used by a student in the two years
2015 and 2016
Item     Average number purchased (weight)   Cost per item in 2015   Cost per item in 2016
Pen      25                                  80 cents                95 cents
Pencil   12                                  35 cents                45 cents
Eraser   6                                   25 cents                30 cents
Based on 2015, a combined index for 2016 = $\frac{\text{TOTAL COST IN 2016}}{\text{TOTAL COST IN BASE YEAR}} \times 100$
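The example can be carried through numerically with a short Python sketch. It assumes that the column of 25, 12 and 6 gives the quantities bought in both years, and it works in cents. The function names are illustrative.

```python
def total_cost(quantities, unit_costs):
    """Return the total cost of buying the given quantities at the given unit costs."""
    return sum(q * c for q, c in zip(quantities, unit_costs))


def simple_combined_index(quantities, base_costs, year_costs):
    """Return total cost in the chosen year as a percentage of total cost in the base year."""
    return total_cost(quantities, year_costs) / total_cost(quantities, base_costs) * 100


quantities = [25, 12, 6]      # pens, pencils, erasers (assumed the same in both years)
costs_2015 = [80, 35, 25]     # cents per item in 2015 (base year)
costs_2016 = [95, 45, 30]     # cents per item in 2016

print(total_cost(quantities, costs_2015))  # 2570
print(total_cost(quantities, costs_2016))  # 3095
index_2016 = simple_combined_index(quantities, costs_2015, costs_2016)
print(round(index_2016, 1))                # 120.4
```

Under these assumptions the index of about 120.4 indicates that the same basket of items cost roughly 20% more in 2016 than in 2015.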
EXERCISE
1.Last year Tsaone kept 3 dogs, 7 cats and 25 tropical fish at her house as pets. The annual cost of
feeding a dog was P800, a cat was P350 and a tropical fish was P10.
i) Calculate how much it cost her to feed all of her pets last year.
This year the annual cost of feeding a dog is P1000, a cat is P400 and a tropical fish is P15.
ii) Calculate the amount that you would expect Tsaone to spend on pet food this year.
iii) Use your answers in i) and ii) to find a simple combined index, to 1 decimal place, for the costs this
year, based on last year’s costs.
2. To keep his car in good condition, a man spent the following amounts in 2003:P200 on servicing the
car every 3 months; P40 on cleaning every month; one new tyre (costing P500 each) every 4 months.
In 2004 the cost of servicing the car had increased by 10%; the cost of cleaning had increased by 20%
ii) For 2004 write down a) the cost of each service, b) how much he paid each time the car was
cleaned.
iii) Hence find the amount that you would expect him to pay altogether in 2004.
iv) Use your answers to i) and iii) to find a ‘cost of car-care’ index for the man in 2004, based on 2003.
3. An average family buys 150 kg of potatoes, 60 kg of cabbages and 85 kg of onions per year.
The price of 1 kg of each item is given (in £) for 2001, 2002 and 2003.
Item      2001   2002   2003
Potato    0.28   0.34   0.40
Cabbage   0.18   0.22   0.25
Onion     0.32   0.35   0.38
i) Using 2001 as the base year, find, to 1 decimal place, a simple combined index for these items in
a) 2002,
b) 2003.
ii) Find a simple combined index for the items in 2003 using 2002 as the base year.
The predicted simple combined index for 2005, using 2001 as the base year, is 150.
For 2005 the predicted cost for 1 kg potatoes is £0.46 and for 1 kg cabbage it is £0.28.
Costs are made up of spending on a combination of items. An aggregate index is a weighted average of
the price relatives of a combination of items. Suitable weights can be found from base year quantities or
from base year expenditure. An aggregate index for any chosen year will only be accurate if the base
year weights are valid in the chosen year.
Jane runs a small business from a small office. Last year her business costs were;
This year:
Telephone = $0.40 × 7500 = $3000
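Office rental: $\frac{159}{150} \times 100 = 106$
Electricity: $\frac{55}{50} \times 100 = 110$
Telephone: $\frac{42}{40} \times 100 = 105$
$\text{Aggregate Index} = \frac{\sum (\text{weight} \times \text{price relative})}{\sum (\text{weights})}$
$\text{Aggregate Index} = \frac{(3 \times 106) + (2 \times 110) + (5 \times 105)}{3 + 2 + 5} = \frac{1063}{10} = 106.3$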
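The aggregate index is a weighted average of price relatives, so it can be checked with a short Python sketch. The function name aggregate_index is an illustrative choice. The price relatives and weights are those used in the working above.

```python
def aggregate_index(price_relatives, weights):
    """Return the weighted average of the price relatives."""
    if len(price_relatives) != len(weights):
        raise ValueError("price relatives and weights must have the same length")
    return sum(w * r for r, w in zip(price_relatives, weights)) / sum(weights)


# Office rental, electricity, telephone
relatives = [106, 110, 105]
weights = [3, 2, 5]

index = aggregate_index(relatives, weights)
print(index)  # 106.3
```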
1. Calculate, correct to 2 decimal places, an aggregate index using i) Price relatives of 116, 103 and 96
with weights of 15, 7 and 8, respectively. ii) Price relatives of 112.5, 88.75 and 146 with weights of 11, 23
and 16 respectively. iii) Price relatives of 92, 105.5, 117 and 98 with weights of 25, 40, 23 and 12,
respectively.
Price (£) in year i) Calculate the price relative for each item in 2004,
C 6.00 5.76
3. During 2003 a primary school’s budget was spent on salaries, equipment and maintenance materials.
The amounts spent were £100 000, £25 000 and £12 500, respectively.
i) Suggest, using the figures given above, suitable weights that could be used to calculate an aggregate
index for the cost of running the school. Give the weights as a simple ratio.
In 2004 all school staff received salary increases of 9%, the cost of equipment rose by 13% and the cost
ii) Write down the price relatives for the three items in 2004, based on 2003 prices.
iii) Using your answers to i) and ii), calculate an aggregate index for running the school in 2004, using
2003 as the base year. Give your answer correct to 2 decimal places.
iv) The cost of running the school in 2003 was £137 500.
Use the index that you have calculated to find an estimate of the cost of running the school in 2004.
There are several reasons why the index that you have calculated may not give an accurate reflection
of the cost of running the school in 2004. The reasons are that, in calculating, certain assumptions have
been made. For example: the school may not have undertaken the same amount of maintenance work
in 2004 as in 2003 and so may not have purchased the same quantity of maintenance materials.
v) Suggest two more detailed reasons why the index may not be accurate.
4. In order to calculate an aggregate cost of housing index, a woman used her housing expenses in 2003
as the base for her calculations. In 2003 she spent the following: £480 on maintenance and £280 per
month on mortgage repayments. She also used 1280 units of electricity at £0.75 per unit.
ii) Use the information given and your answers to i) to suggest, in simplified form, suitable weights
that the woman could use to calculate an aggregate cost of housing index.
From 2003 to 2004 the cost of each maintenance job increased by 12%, her monthly mortgage
repayments decreased by £8.40 and the cost of one unit of electricity increased to £0.81.
iii) Calculate the price relatives for maintenance, mortgage repayments and electricity in 2004, using
2003 as the base year.
iv) Use the weights and price relatives in ii) and iii) to find an aggregate cost of housing index for 2004.
v) If the woman used only 1250 units of electricity in 2004 and if the index that you have calculated is
actually correct, how much did she spend on maintenance in 2004?
Public health
Policing and employment
Allocation of resources to where they are needed
CRUDE RATES
These are simple rates that give the total number of events occurring in a population without reference
to the individuals or sub groups (strata) within the population. Crude birth rate and death rate measure
the number of births and deaths per thousand (‰) with no reference to the ages of those in the
population.
$\text{CRUDE BIRTH RATE / FERTILITY RATE} = \frac{\text{NUMBER OF BIRTHS}}{\text{ORIGINAL POPULATION}} \times 1000$ ‰
$\text{CRUDE DEATH RATE / MORTALITY RATE} = \frac{\text{NUMBER OF DEATHS}}{\text{ORIGINAL POPULATION}} \times 1000$ ‰
NOTE, crude birth rates can be calculated per thousand of the female population.
Example
A city has 125 000 people at the start of the year. There were 1425 deaths during the year. Calculate the
crude death rate.
$\frac{1425}{125\,000} \times 1000 = 11.4$ ‰
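The crude rate calculation can be sketched in Python. The function name crude_rate and the parameter per are illustrative choices. The figures are those of the city in the example.

```python
def crude_rate(events, population, per=1000):
    """Return the number of events per `per` people in the original population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events * per / population


# 1425 deaths in a city of 125 000 people
death_rate = crude_rate(1425, 125_000)
print(death_rate)  # 11.4 deaths per thousand
```

The same function gives a crude birth rate when the number of births is used in place of the number of deaths.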
EXERCISE
1. The population of a small village was 4400 and during that year 33 people died. Find the crude death
rate.
2. Last year, in a small town whose population numbered 20 000, the death rate was 5.6‰.
3. In a location, where the crude death rate was 7.44‰, there were 125 deaths last year. Find, to the
nearest hundred, the population of the location at the beginning of last year.
4. During 2003 the crude death rate of a city, in which 5100 died, was 8‰. The crude birth rate in
the city was 12‰.
ii) Assuming there was no migration in or out of the city, find the population at the end of 2003.
5. The table shows information about the three classes of employees at two chemical companies
Fertilog and Drainmaster. Crude accidents rates are measured per thousand. The crude accident
rates for class C employees in the two companies were identical.
         FERTILOG                                          DRAINMASTER
CLASS    NO. OF      NO. OF      CRUDE ACCIDENT            NO. OF      NO. OF      CRUDE ACCIDENT
         EMPLOYEES   ACCIDENTS   RATE                      EMPLOYEES   ACCIDENTS   RATE
A        10          a           100                       b           3           150
B        24          2           c                         55          11          d
C        64          4           e                         f           8           e
STANDARDISED RATES
A rate becomes standardized when the crude values used in its calculation are weighted. In the case of
birth and death rates this is done by taking into account the different age groups that exist in the
population. The weights are called standard population figures, and they reflect the proportion of each
age group in the population.
Standard death rates give measures of the healthiness of different environments. The lower the
standardized death rate the healthier the environment.
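A standardised rate can be sketched in Python as a weighted average of the group rates, with the standard population figures as the weights. This follows the description above. The group death rates and standard population percentages below are hypothetical, chosen only for illustration, and the function name is an assumption.

```python
def standardised_rate(group_rates, standard_population):
    """Weight each group's rate by its share of the standard population."""
    if len(group_rates) != len(standard_population):
        raise ValueError("each group needs a rate and a standard population figure")
    weighted_sum = sum(r * s for r, s in zip(group_rates, standard_population))
    return weighted_sum / sum(standard_population)


# Hypothetical figures for three age groups
group_death_rates = [2.0, 5.0, 20.0]     # deaths per thousand in each age group
standard_population = [40, 35, 25]       # percentage of the standard population

rate = standardised_rate(group_death_rates, standard_population)
print(rate)  # 7.55 deaths per thousand
```

Because both towns in a comparison are weighted by the same standard population, differences in their age structure no longer distort the comparison.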
$\text{CRUDE DEATH RATE} = \frac{\text{NUMBER OF DEATHS}}{\text{ORIGINAL POPULATION}} \times 1000$ ‰
$\frac{126}{15\,000} \times 1000 = 8.4$ ‰
STANDARDISED DEATH RATE = 9.2‰
1. The table below gives information on the populations of two towns, Northside and Southlake.
             Northside                     Southlake
Age group    Population   No. of deaths    Population   Group death rate (‰)   Standard population
i) Calculate the crude death rate for Northside. ii) Calculate a standardized death rate for Southlake.
iii) Calculate the crude death rate for Southlake. iv) Calculate a standardised death rate for
Northside. v) Giving a reason, state which of the two towns appears to be a healthier place in
which to live.
2. The data in the table below relate to the coastal towns of Blymouth and Mounton.
             Blymouth                                       Mounton
Age group    Population   No. of deaths   Death rate (‰)    Population   Standard population
30 - 49      b            45              6                 7000         25%
i) Find each of the numbers represented by the letters a, b and c. ii) Calculate the crude death rate of
Blymouth. iii) Calculate a standardized death rate for Blymouth.
The crude death rates for the total populations of Mounton and Blymouth are identical.
iv) Calculate the total number of deaths that occurred in Mounton.
In Mounton, the death rate of the under 30s is 8.5‰ and there were 110 deaths in the under 50s age
group.
               BLYMOUTH                                     MOUNTON
AGE GROUP      POPULATION   NO. OF BIRTHS   FERTILITY RATE  POPULATION   STANDARD POPULATION
0 – 19         11 000       99              A               8000         42%
20 – 29        B            450             60              7000         25%
30 AND OVER    5500         C               12              5000         33%
The crude fertility rate for the population of Mounton is twice that of Blymouth.
4.
5.A family wished to investigate changes in their cost of living. They
chose five items, as given in the table below, from a normal week’s
groceries, and recorded the price per unit of each item every three
months for a year. The price relatives obtained, taking the prices on
January 1st as base, are given in the following table, together with the
weights for each item.
Item     Weight   March 31st   June 30th   September 30th   December 31st
Meat     6        106          108         107              109
Bread    4        103          104         105              107
Milk     5        103          109         110              113
Coffee   2        105          107         109              110
Tea      3        102          105         107              106
(i) (a) Calculate a simple average of relatives index for December 31st, taking
January 1st as base.
(b) State one disadvantage of using this as an index number.
(ii) Calculate, to the nearest integer, a weighted aggregate price index for
December 31st, using January 1st as a base.
PROBABILITY AND EXPECTATION
Probability can be defined as the numerical value, 0 ≤ P ≤ 1, that represents
the likelihood of a given event occurring. Probability refers to chance. The
probability of a single event occurring (or of two or more events) may be
theoretical or based on observations of what has happened in the past. In
probability an EVENT refers to something that takes place. An OUTCOME is a
result of an event. The SAMPLE SPACE, often denoted 'S', is the set of all
possible outcomes of an experiment. The study of probability is important
because it is used in fields such as:
Insurance and risk management companies
Stock market investors
Speculators
Weather forecasting
Games ( like lottery and casino)
PROBABILITY SCALE
The probability of an event is measured on a scale of 0 to 1. The number assigned to an event E is
known as the probability of event E, written as P(E), and it takes values 0 ≤ P(E) ≤ 1. A probability
of 0 means that the event is impossible, and a probability of 1 means that the event is certain.
0          1/2          1
The probability of an event, written as P(OUTCOME), can be given as a fraction, decimal or percentage.
$P(\text{OUTCOME}) = \frac{\text{NUMBER OF EVENTS FAVOURABLE TO THAT OUTCOME}}{\text{TOTAL NUMBER OF OUTCOMES}}$
EXAMPLE.
A fair die is rolled. What is the probability of (i) getting an even number, (ii) a number less than 3,
(iii) a 4 or a 5?
Possible outcomes are 1, 2, 3, 4, 5 and 6
(i) P(EVEN NUMBER) = $\frac{3}{6} = \frac{1}{2}$
(ii) P(NUMBER LESS THAN 3) = $\frac{2}{6} = \frac{1}{3}$
(iii) P(4 OR 5) = $\frac{2}{6} = \frac{1}{3}$
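The die example can be checked by counting favourable outcomes in Python. This is a minimal sketch. The function name probability is an illustrative choice, and exact fractions are used so that the answers appear in their simplest form.

```python
from fractions import Fraction


def probability(sample_space, is_favourable):
    """Return (number of favourable outcomes) / (total number of outcomes)."""
    favourable = [o for o in sample_space if is_favourable(o)]
    return Fraction(len(favourable), len(sample_space))


die = [1, 2, 3, 4, 5, 6]

p_even = probability(die, lambda n: n % 2 == 0)
p_less_than_3 = probability(die, lambda n: n < 3)
p_4_or_5 = probability(die, lambda n: n in (4, 5))

print(p_even)         # 1/2
print(p_less_than_3)  # 1/3
print(p_4_or_5)       # 1/3
```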
Randomly selecting a SPADE, CLUB, HEART OR DIAMOND from a pack of 52 playing cards are 4 equally
likely outcomes. The aim of selecting objects or people randomly is to give each particular object or
person the same chance of being selected, thus avoiding bias or favoritism. If X objects are randomly
selected (without replacement), from N objects then the probability of selecting any particular object is
$P(\text{OUTCOME}) = \frac{X}{N}$
where X is the number of events and N is the total number of possible outcomes.
EXAMPLE
One student is selected at random from a group of 12 boys and 8 girls.
OUTCOME                          PROBABILITY
Selecting a particular student   1/20
Selecting a particular boy       1/20
Selecting a particular girl      1/20
Selecting a boy                  12/20
Selecting a girl                 8/20
EXHAUSTIVE OUTCOMES
The outcomes of an event are said to be exhaustive if they describe a complete set of possible results.
Some examples of exhaustive outcomes are:
For any event, one of the exhaustive outcomes is certain. From the examples given in the table above;
(i) P(HEAD) + P(TAIL) = $\frac{1}{2} + \frac{1}{2} = 1$
(ii) P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = $\frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = 1$
(iii) P(HEART) + P(NOT HEART) = $\frac{13}{52} + \frac{39}{52} = 1$
(iv) P(EVEN NUMBER) + P(ODD NUMBER) = $\frac{3}{6} + \frac{3}{6} = 1$
(v) P(1) + P(2) + P(MORE THAN 2) = $\frac{1}{6} + \frac{1}{6} + \frac{4}{6} = 1$
FOR EXHAUSTIVE OUTCOMES P(A) + P(NOT A) = 1
EXAMPLE
If the probability that a student will be late for class is 0.3, then it follows that the probability
that the student will not be late for class is 0.7.
P(not late) = 1 – P(late)
1 – 0.3 = 0.7
An outcome and its opposite (A and NOT A) are often referred to as complementary events. Either the
event occurs or it does not.
EXERCISE
1. A class list contains the names of 15 boys and 20 girls. A teacher randomly selects a name from
the list.
(i) How many names are on the list?
(ii) What is the probability that a particular student is selected?
(iii) What is the probability that a particular boy is selected?
(iv) Find as a simple fraction the probability that a teacher selects a boy.
(v) Is the teacher more likely to select a boy or girl?
2. The probability that it rains on any particular day in Moistville is 0.66.
(i) What is the probability that it does not rain on any particular day?
(ii) On how many days is rain not expected in Moistville in 30 days?
3. A bag contains 12 coloured balls of which three are red, four are blue and five are green. A girl
selects one ball at random from the bag.
(i) What is the probability that a particular ball is selected?
(ii) What is the probability that a particular red ball is selected?
(iii) Find the probability that the selected ball is (a) red (b) blue (c) green (d) not red
GENERALLY: AND = ×, OR = +
Venn diagrams use regions to represent sets of outcomes that are favorable to particular events.
Outcomes favorable to a particular event are shown inside the region labeled for that event. Outcomes
favorable to both events are shown in the region where the events overlap (see diagram 3). Outcomes
favorable to neither event are shown in the region outside the sets (see diagram 4).
TRIALS AND EXPECTATION
If an event is repeated, we can estimate the number of times that any outcome will occur. Each
repetition of the event is a trial. If there are N trials then outcome A is expected to occur N × P(A) times.
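The rule N × P(A) can be illustrated with a short Python sketch. The number of trials (a fair die rolled 60 times) is a hypothetical figure chosen for illustration, and the function name is an assumption.

```python
from fractions import Fraction


def expected_occurrences(trials, probability):
    """Return the expected number of times an outcome occurs in `trials` trials."""
    return trials * probability


# Hypothetical illustration: a fair die is rolled 60 times
expected_sixes = expected_occurrences(60, Fraction(1, 6))
# 1 and 4 are the square numbers on a die, so P(square number) = 2/6
expected_squares = expected_occurrences(60, Fraction(2, 6))

print(expected_sixes)    # 10
print(expected_squares)  # 20
```

The expected number is an estimate of the long-run average, so the actual count in any one set of trials may differ from it.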
EXAMPLE
(i) A six
(ii) A square number
EXERCISE
SINGLE EVENTS
MUTUALLY EXCLUSIVE AND NON-MUTUALLY EXCLUSIVE OUTCOMES OF A SINGLE EVENT
When looking at a single event, it may be useful to know the probability that “either this or that”
happens, that is the probability of outcome A or outcome B, written P(A or B). The rule
OUTCOMES OF A SINGLE EVENT ARE MUTUALLY EXCLUSIVE IF THEY HAVE NO COMMON FAVOURABLE
RESULT
The common favorable result of two outcomes must not be counted twice when looking for the total
number of favorable results.
EXAMPLE
The table below shows the number of animals Edwin has at the cattle post;
EXAMPLE 2
The five cards shown below are laid face down on a table. One card is picked at random.
1 2 4 5 9
Consider the outcomes: (a) an odd number is selected (b) a prime number is selected
The probabilities of these outcomes are (a) P(ODD) = $\frac{3}{5}$ (b) P(PRIME) = $\frac{2}{5}$
If we use the rule P(A or B) = P(A) + P(B) then we find P(odd) + P(prime) = $\frac{3}{5} + \frac{2}{5} = 1$.
This answer claims that all five cards are odd or prime, but 4 is neither prime nor odd. The
card 5 has been counted twice as it is favorable to both outcomes. Outcomes A and B are not mutually
exclusive, so P(A or B) ≠ P(A) + P(B). Three results are favorable to A (1, 5 and 9); two to B (2 and
5); one to both A and B (5).
P(A or B) = P(A) + P(B) – P(A and B) = $\frac{3}{5} + \frac{2}{5} - \frac{1}{5} = \frac{4}{5}$
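The card example can be checked with Python sets. This is a minimal sketch in which the variable names are illustrative. The set of primes is written out by hand for these five cards.

```python
from fractions import Fraction

cards = {1, 2, 4, 5, 9}
odd = {c for c in cards if c % 2 == 1}   # {1, 5, 9}
prime = {2, 5}                           # the prime numbers among the five cards


def p(event):
    """Probability of picking a card that belongs to `event`."""
    return Fraction(len(event), len(cards))


# Subtract P(A and B) so that card 5 is not counted twice
p_odd_or_prime = p(odd) + p(prime) - p(odd & prime)

print(p(odd), p(prime), p(odd & prime))  # 3/5 2/5 1/5
print(p_odd_or_prime)                    # 4/5
```

Counting the cards in the union of the two sets directly gives the same answer, which confirms that only card 4 is neither odd nor prime.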
In summary, if outcomes A and B are mutually exclusive we have
P(A and B) = 0
EXERCISE
1.The grades awarded to 40 students in a Mathematics test are given in a table below;
GRADE                A   B   C    D   E   F
NUMBER OF STUDENTS   2   8   10   9   6   5
Grades A to C are credits, grade D and E are passes. Grade F is failure. One of the students is
randomly selected. Write as a simple fraction the probability of selecting a student who
obtained;
(i) Grade A (ii) grade F (iii) a credit (iv) not an E or F
2. The table below gives information on the results of an examination taken by 80 students.
         Pass   Fail   Totals
Boys     32     4      36
Girls    39     5      44
Totals   71     9      80
i) One student is chosen at random. Find the probability that the student
a) passed, c) failed, e) passed or is a boy,
ii) A boy is selected at random. What is the probability that he failed? iii) A girl is selected at random.
What is the probability that she passed?
3. The letters A, B, B, B, C, D, D, and E are each written onto squares of card and are placed into a bag.
One card is randomly selected. What is the probability that the letter written on the card is
i) A. ii) Not D. iii) In the word CADET or BADGE. iv) A vowel or in the word DONKEY.
i) Red. ii) A red card or a picture card. iii) Black or a picture card.
iv) A heart. v) A heart or picture card. vi) A red card or a non-picture card.
iii) A picture card. vii) A red picture card or a Queen. xi) Neither red nor an ace.
EXERCISE
1. The table gives details of the number of items a boy is carrying in a box.
(i) What fraction of the items are (a) red (b) not green (c ) pencils (d ) not pens
(ii) One of the items is selected at random from the box find the probability that it is (a ) a red
pencil ( b) neither green nor pen (c ) either red or a pencil ( d) either a sweet or blue (e)
not a green pen (f) either a pencil or not blue
(iii) A pencil is selected at random, what is the probability that it is red
(iv) A blue item is selected at random, what is the probability that it is not a pencil.
2. Forty compounds were sampled to find the number of donkeys and dogs that were kept by the
occupants.
[Table of the numbers of dogs and donkeys per compound is missing; only the fragment "2 5 3 2 1" survives.]
ii) What is the probability that there were no donkeys in the compound?
iii) What is the probability that there were not equal numbers of dogs and donkeys in the compound?
iv) What is the probability that there were no donkeys in the compound?
v) What is the probability that there were not equal numbers of dogs and donkeys in the compound?
TREE DIAGRAMS
When more than two events are being considered, two-way tables cannot be used, so another
method of representing the information diagrammatically is needed. Tree diagrams are a good
way of doing this. A tree diagram has branches which show the different outcomes. To find the
probability of a particular outcome, multiply the probabilities on the branches that lead to it.
The probabilities on each set of branches add up to 1.
EXAMPLE
A bag contains three red buttons and two blue buttons. A button is taken at random, replaced,
and then another button is selected at random.
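The tree for this example can also be written out as a small calculation. The Python sketch below is an illustration only, with made-up names (branch, paths): it multiplies the probabilities along each path of the tree, assuming the button is replaced so that both sets of branches carry the same probabilities.

```python
from fractions import Fraction
from itertools import product

# Probabilities on each set of branches: 3 red and 2 blue buttons, with replacement
branch = {"R": Fraction(3, 5), "B": Fraction(2, 5)}

# Each path through the tree is (first button, second button);
# the probability of a path is the product of the probabilities on its branches.
paths = {
    first + second: branch[first] * branch[second]
    for first, second in product(branch, repeat=2)
}

for outcome, p in paths.items():
    print(outcome, p)

# The probabilities at the ends of all the paths add up to 1
total = sum(paths.values())

# An outcome made up of several paths is found by adding those paths
p_same_colour = paths["RR"] + paths["BB"]
print(total, p_same_colour)
```

The four paths are RR, RB, BR and BB. An outcome such as "one button of each colour" is found in the same way, by adding the paths RB and BR.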
EXERCISE
1. A bag contains six coloured balls of equal size: 3 are red, 2 are blue and 1 is green. One ball is
randomly selected and then replaced; another ball is then randomly selected.
i) Write down the probability that a) the first ball is red, b) the first ball is not blue, c) the second ball
is green.
ii) Calculate the probability that a) the first ball is red and the second ball is green, b) the first ball is not
blue and the second ball is red c) both balls are red, d) neither ball is blue, e) one of the balls is red and
one of the balls is green, f) just one of the balls is red g) at least one of the balls is green, h) one of the
balls is green and the other isn’t red, i) At most one of the balls is blue, j) one ball is yellow and the other
ball is neither green nor red.
2. A bag contains 5 balls of equal size: 3 balls are white and 2 are purple.
One ball is selected at random and, without replacement, a second ball is randomly selected.
i) Find, as a simple fraction, the probability that (a) two white balls are selected, (b) two purple balls
are selected, (c) balls of the same colour are selected, (d) balls of different colours are selected.
3. Arnold has 3 tins of beans and 6 tins of peas. The tins are identical in shape and size but all the labels
have been removed. If he opens two tins at random find the probability that
i) Both contain beans. ii) Neither contains beans. iii) Just one contains peas.
3. Two cards are randomly selected from a normal pack of 52 playing cards. Find the probability that
the two cards are
i) Both red. ii) Both spades. iii) Both Queens. iv) Both picture cards.
v) Both the 7 of diamonds. vi) One of each colour. vii) One black card and one red picture card.
viii) One Heart and one black King. ix) Two aces or two red Jacks. x) Of the same colour.
xi) Of the same suit. xii) Identical.
4. The grouped frequency table below gives the heights of 40 children.
Height (h cm)         150 ≤ h < 155   155 ≤ h < 160   160 ≤ h < 165   165 ≤ h < 170   170 ≤ h < 175
No. of children (f)   6               13              15              5               1
Two of these children are selected at random. Find the probability that
i) Both are less than 155 cm. ii) Both are less than 160 cm. iii) Both are 160 cm or more.
iv) Just one is less than 155 cm. v) At least one is less than 155 cm. vi) Both are 170 cm or more.
5. The table shows the flavours, sizes and numbers of the different fruit drinks that a boy has in his cool
box.
        Orange   Lemon
Small   5        4        9
i) If two drinks are randomly selected, find the probability that
ii) If two orange drinks are selected, find the probability that both are large.
iii) If two small drinks are selected, find the probability that they are of the same flavour.
DEPENDENT EVENTS AND CONDITIONAL PROBABILITY
MUTUALLY EXCLUSIVE EVENTS
Two or more events are mutually exclusive if they cannot occur at the same time. For two events A and
B to be mutually exclusive then;
P (A AND B) = P (A ∩ B) = 0
DEPENDENT EVENTS
Events are mutually dependent if one event has an effect on the probabilities of the outcome of the
other event. Probabilities in such cases are said to be conditional as they depend on the outcome of
another event. This is typical when selections are made without replacement. Examples of experiments
which will produce mutually dependent events are:
Selecting a card from a pack of cards, not replacing it, and selecting another card from the same
pack.
Selecting two balls from a bag at the same time or one after the other.
Selecting two students from a class.
EXAMPLE 1
Two students are randomly selected from a class of 18 girls and 22 boys. Find the probability that;
I) They are both girls ii) a girl and a boy are selected iii) a particular student is selected
ANSWERS
I) P (GG) = 18/40 × 17/39 = 51/260
II) P (A GIRL AND A BOY) = P (GB) OR P (BG)
    = (18/40 × 22/39) + (22/40 × 18/39) = 33/65
III) P (A PARTICULAR STUDENT) = P (P, NP) OR P (NP, P)
    = (1/40 × 39/39) + (39/40 × 1/39) = 1/20
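The three answers can be checked with a short calculation. The Python sketch below is illustrative only. The helper p_sequence and the labels "P" and "N" (the particular student and the 39 others) are made up for this note. It multiplies the probabilities along one branch of the tree, reducing the counts after each selection because the selection is without replacement.

```python
from fractions import Fraction

def p_sequence(counts, sequence):
    """Probability of selecting the given sequence of categories without replacement."""
    remaining = dict(counts)          # copy, so the original counts are not changed
    total = sum(remaining.values())
    p = Fraction(1)
    for category in sequence:
        p *= Fraction(remaining[category], total)
        remaining[category] -= 1      # the selected student is not put back
        total -= 1
    return p

# Class of 18 girls (G) and 22 boys (B)
class_counts = {"G": 18, "B": 22}
p_gg = p_sequence(class_counts, "GG")
p_girl_and_boy = p_sequence(class_counts, "GB") + p_sequence(class_counts, "BG")

# One particular student (P) and the 39 other students (N)
named = {"P": 1, "N": 39}
p_particular = p_sequence(named, "PN") + p_sequence(named, "NP")

print(p_gg, p_girl_and_boy, p_particular)
```

The second probability in each product uses a denominator of 39, not 40, which is what makes the two selections dependent.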
TREE DIAGRAM FOR A BOY/GIRL SELECTION
EXERCISE
1. A box contains 4 toffee sweets and 8 chocolate sweets; two sweets are randomly selected at the
same time.
i) Find the probability that a) two toffees are selected, b) two chocolates are selected,
c) first toffee then chocolate is selected, d) one of each type is selected.
2. Two students are randomly chosen from a group of 10 boys and 18 girls. Find the probability of
selecting i) Two boys. ii) Students of the same sex. iii) One girl and one boy. iv) At least one girl.
3. There are 120 women attending a conference. Sixty percent of the women are married; 25% are single
and 15% are widows.
i) How many of the women at the conference are currently not married?
ii) Two of the women at the conference are selected at random. Find the probability that
a) a particular woman is selected, b) a particular married woman is selected, c) both the selected
women are widows, d) one is married and the other is single, e) just one is married,
f) one is single and the other is currently not married.
4. The diagram below is a cumulative frequency polygon illustrating the numbers of hours that 1000
doctors worked last week.
[Cumulative frequency polygon not reproduced; vertical axis: No. of doctors (cf), with surviving scale marks at 800 and 1000.]
To relieve the stress of long working hours, the hospital management decided to offer a free holiday to
two randomly selected doctors. ii) Find the probability that a) both the selected doctors worked for 80
hours or more, b) at least one of the selected doctors worked for 85 hours or more.
iii) Calculate an estimate of the probability that one of the selected doctors worked at least 30 hours in
1. A player throws a normal six-sided die in a game and is awarded a prize of £12 if he obtains a square
2. A game consists of a player, having paid a stake of £6.50, drawing a card at random from a normal
pack of 52 cards. If the player draws a picture card, she wins a prize of £26.
i) Calculate a player’s expected winnings and explain why the game is not fair.
ii) How much profit can the organiser expect to make if 40 people play the game?
3. A stake of £2.75 is paid by a player to roll a normal dice. A prize of £3 is awarded if a player obtains a 6
ii) If the organiser expects a profit of £20 per day, how many people is he expecting to play each day?
4. A square spinner numbered 2, 4, 6, 8 is spun and the number scored is equal to the prize in Pounds.
i) Calculate the stake for playing this game, if it is known to be a fair game.
ii) The organiser increased the stake by 20% and replaced the number 8 to make his expected profit
when twenty people play the game become £15. With which number did he replace 8?
5. A man in a shopping mall has three small containers that are turned upside-down on a board; there is
a peanut under one of the containers.
The man shuffles the three containers very quickly and members of the public are asked to pay P50 to
guess which container the peanut is under.
If a person guesses correctly, the man will give him or her a prize of P100.
Explain why this game is not fair and find the prize that should be awarded to make it a fair game.
In a game, for which the stake is £9, a player randomly throws one dart at
the board.
Assuming that the dart that is thrown sticks within the perimeter
ii) The probability that a player wins a) the £20 prize, b) the £30 prize.
The organiser was encouraged to make the game fair by increasing the smaller prize.
iv) By what percentage should she increase the smaller prize to make the game fair?