Statistics Notes

stats

Uploaded by

quintrish9
Copyright
© © All Rights Reserved

CONDUCTING A SURVEY

Surveys can take place in the street, in homes, at workplaces or at school. Surveys for market research often take
place near shopping complexes. A survey can involve an interviewer asking questions and noting the answers. A
set of questions used in a survey is called a questionnaire.

QUESTIONNAIRES

A survey has a clear purpose, so the questionnaire must:

• Be relevant to the purpose of the survey
• Be simple to understand and easy to complete
• Not take too long to complete

OPEN AND CLOSED QUESTIONS

Open questions allow the interviewee to respond in any way they like. Open questions may begin with
phrases such as

• What do you think about…
• What is your opinion on…

Responses to open questions are difficult to compile and analyze. They often need to be interpreted and
therefore can be misunderstood.

Closed questions allow the interviewee to respond in a limited number of ways. Answers are restricted
and some interviewees may feel none of the available answers is suitable.

Closed questions contain phrases such as

• Answer yes or no to the following…
• Tick the box
• Do you always, sometimes, or never…

Responses to closed questions can be assigned scores to make compiling and analysis easier, e.g. yes = 10,
no = 1; or agree strongly = 10, agree = 5, disagree = 2.
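
As a rough illustration of why scored responses are easier to compile, the sketch below totals and averages a handful of closed-question answers. The score values follow the example above; the list of responses and the names `SCORES` and `score_responses` are hypothetical and only serve the illustration.

```python
# Scoring scheme taken from the example scores in the notes.
SCORES = {"agree strongly": 10, "agree": 5, "disagree": 2}


def score_responses(responses):
    """Convert closed-question answers to scores and summarise them."""
    scores = [SCORES[answer] for answer in responses]
    return {"total": sum(scores), "mean": sum(scores) / len(scores)}


# Made-up answers from five interviewees (hypothetical data).
responses = ["agree", "agree strongly", "disagree", "agree", "agree strongly"]
summary = score_responses(responses)
print(summary)
```

An open question could not be summarised this way without first interpreting each answer by hand.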

TYPES OF VARIABLE

A variable is something that can change or take different values. A variable is either quantitative or
qualitative, and either discrete or continuous, so four terms are used to describe its type:

• Quantitative
• Qualitative
• Discrete
• Continuous

QUANTITATIVE AND QUALITATIVE VARIABLES

QUANTITATIVE variables are measured, counted or observed. They have a numerical value and can be
ranked or ordered from small to large or vice versa.

EXAMPLES

• Masses of students in a class
• Test scores of students
• Amount of money owned by individuals

QUALITATIVE variables have qualities or characteristics, and are described in words rather than
numbers. Generally they can only be ranked alphabetically.

EXAMPLES

• Colours of shoes worn by women
• Surnames of teachers at school
• Types of food eaten by cats and dogs

DISCRETE AND CONTINUOUS VARIABLES

DISCRETE variables are those that can take only certain specific values within a given range, and the
number of possible values is countable. There is always a gap between one value and the next.

EXAMPLES

• The number of people travelling in buses through town
• Shoe sizes worn by people
• The number of sides a polygon has

DISCRETE QUANTITATIVE variables can take values which are not whole numbers. For
example, shoe sizes can be 4.5 or 7.5. The values may appear at regular or irregular intervals within a given
range.

Within a given range, the following DISCRETE QUALITATIVE variables have a countable number of
possible qualities or characteristics:

• The surnames of policemen (each surname is specific, so the surnames are countable)
• The manufacturers of new cars
• The days of the week on which a supermarket can open

CONTINUOUS variables are those that can take any value within a given range or ranges. Values of
CONTINUOUS QUANTITATIVE variables are usually measured, and the number of possible values
within a range is uncountable. Because measurement is required, values of continuous variables are
given to a specific degree of accuracy. It is therefore not possible to give exact values for a continuous
variable.

EXAMPLES

• The heights of people who are 2 metres tall to the nearest metre: any value with 1.5 ≤ height < 2.5
• The ages of trees that are 30 years old to the nearest 10 years: any value with 25 ≤ age < 35
• The masses of objects that are 7.3 kg correct to 1 decimal place: any value with 7.25 ≤ mass < 7.35
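
The intervals in these examples all come from the same rule: a value rounded to a given unit may lie up to half a unit either side of the stated figure. A minimal sketch of that rule follows; the function name `measurement_bounds` is hypothetical.

```python
def measurement_bounds(rounded_value, unit):
    """Return (lower, upper) so that lower <= true value < upper.

    `unit` is the accuracy of the measurement, e.g. 1 for "nearest metre"
    or 10 for "nearest 10 years".
    """
    half = unit / 2
    return rounded_value - half, rounded_value + half


print(measurement_bounds(2, 1))    # height of 2 m to the nearest metre
print(measurement_bounds(30, 10))  # age of 30 years to the nearest 10 years
```

The lower bound is included in the interval and the upper bound is not, which is why the inequalities are written with ≤ on the left and < on the right.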

Within a given range, CONTINUOUS QUALITATIVE variables have an
uncountable number of possible characteristics or qualities:

• The eye colour of brown-eyed people: there are different shades of colour.
• The texture of school jerseys: there are various degrees of roughness.
• The feelings of happy people: moods can be described in an uncountable number of ways.

CLASS EXERCISE

1. Eight variables are listed: shape, height, price, altitude, volume, attitude, duration, number of cars.
i) Which of these variables are qualitative?
ii) Which of the quantitative variables are continuous?

2. Four variables W, X, Y and Z are defined:
W = {the number of lions seen on game drives in Chobe}
X = {all integers between 3 and 9, inclusive}
Y = {any number such that 3 < Y < 9}
Z = {all possible shoe sizes}
State whether each of the variables is discrete or continuous.

3. State which three of the following are not discrete quantitative variables and in each case give a reason.
A: the types of food eaten by Rre Mogapi’s cat.
B: the number of white cars purchased each month in Gaborone.
C: the number of eggs in a box containing one dozen eggs.
D: the mass, measured to the nearest kilogram, of the customers entering a supermarket.
E: the heights of the trees in a forest.

4. A student was asked to name a discrete quantitative variable and answered “My score in the
Mathematics test that we wrote yesterday”.
Explain why his answer is not correct and suggest how he could correct it by changing just a few words.

5. When examination results are announced, candidates are awarded either a qualitative variable or a
quantitative variable or both.
Describe these two variables and explain how they are related.

GROUPED DISCRETE DATA

Grouping data is convenient if a variable has many values or if the total frequency is large. Some information
about the variable is lost as a result, because values are no longer shown individually. Data is usually grouped:

• when a variable has a long list of values
• to derive more meaning from the data

Grouped discrete data can be tabulated as a frequency distribution or displayed in a variety of diagrams.

FREQUENCY DISTRIBUTION TABLES

A frequency distribution table consists of CLASSES of grouped values of a variable. The number of values in each
class is shown as a frequency.

CLASS MEASURES

A class is a group of values. For classes of discrete data a gap will always appear between one class and the next
class. Each class has the following measures associated with it.

LOWER LIMIT AND UPPER LIMIT

A lower limit and upper limit are the lowest and highest actual values that exist in a class

LOWER BOUNDARY AND UPPER BOUNDARY

These are the two values in the middle of the gaps between a class and those on either side of it. They are found
half way between the upper limit of one class and the lower limit of the next class

CLASS INTERVAL

Class interval is the difference between a class upper boundary and a lower boundary.

CLASS INTERVAL =UPPER CLASS BOUNDARY – LOWER CLASS BOUNDARY

CLASS MID VALUE

Class mid value is the value that is halfway between the limits of a class, and for grouped discrete data is the
same as halfway between class boundaries.

$\text{CLASS MID VALUE} = \dfrac{\text{LOWER CLASS LIMIT} + \text{UPPER CLASS LIMIT}}{2}$

EXAMPLES

Eggs laid by 40 chickens on a farm were recorded as follows:

10, 11, 1, 4, 13, 9, 11, 8, 14, 8, 1, 11, 9, 6, 14, 12, 3, 6, 9, 13, 12, 5, 0, 7, 10, 4, 6, 9, 13, 9, 10, 2, 11, 3, 12, 13, 9, 14, 13, 7

This data can be summarized as follows:

Number of eggs            0 - 2   3 - 5   6 - 8   9 - 11   12 - 14
Number of chickens (f)      4       5       7       13       11

Note that the data is discrete, so there is a gap of 1 between the upper limit of one class and the lower limit
of the next. The class boundaries lie in the middle of these gaps.

• Lower class limits are 0, 3, 6, 9, 12
• Upper class limits are 2, 5, 8, 11, 14
• Class boundaries are -0.5, 2.5, 5.5, 8.5, 11.5, 14.5
• The classes have equal class intervals of 3
• Class mid values are 1, 4, 7, 10, 13
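
The class measures listed above can be checked with a short script. This is a minimal sketch: the helper name `class_measures` and its `gap` parameter are hypothetical, and it assumes whole-number data so that neighbouring class limits are 1 apart.

```python
eggs = [10, 11, 1, 4, 13, 9, 11, 8, 14, 8, 1, 11, 9, 6, 14, 12, 3, 6, 9, 13,
        12, 5, 0, 7, 10, 4, 6, 9, 13, 9, 10, 2, 11, 3, 12, 13, 9, 14, 13, 7]

# Classes given as (lower limit, upper limit), as in the table.
classes = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14)]


def class_measures(lower, upper, gap=1):
    """Boundaries, interval and mid value of one class of discrete data.

    `gap` is the distance between one class's upper limit and the next
    class's lower limit (assumed to be 1 for whole-number counts).
    """
    lower_boundary = lower - gap / 2
    upper_boundary = upper + gap / 2
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "interval": upper_boundary - lower_boundary,
        "mid_value": (lower + upper) / 2,
    }


# Tally how many recorded values fall inside each class.
frequencies = [sum(lo <= x <= hi for x in eggs) for lo, hi in classes]
measures = [class_measures(lo, hi) for lo, hi in classes]

for (lo, hi), f, m in zip(classes, frequencies, measures):
    print(f"{lo}-{hi}: f={f}, boundaries={m['boundaries']}, "
          f"interval={m['interval']}, mid value={m['mid_value']}")
```

Note that the class interval comes from the boundaries (2.5 - (-0.5) = 3), not from the limits (2 - 0 = 2).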

APPROXIMATION AND ROUNDING OFF

Discrete data also become grouped when values are rounded off to a certain degree of accuracy.

Suppose you counted the number of chips in each of 10 packets and the numbers were
15, 18, 20, 22, 25, 27, 31, 32, 33 and 34. If these values are rounded to the nearest 10, the data can be grouped
into two classes: 4 packets of 20 and 6 packets of 30.

Number of chips (to nearest 10)   20   30
Number of packets                  4    6

The first value of 20 represents 15 to 24 inclusive and the second value of 30 represents 25 to 34 inclusive.

Actual number of chips   15 - 24   25 - 34
Number of packets           4         6

• Lower class limits: 15 and 25
• Upper class limits: 24 and 34
• Lower class boundaries: 14.5 and 24.5
• Upper class boundaries: 24.5 and 34.5
• Class interval: 10
• Class mid values: 19.5 and 29.5
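
A short sketch of how rounding produces these two classes is given below. The helper `round_half_up_to_10` is hypothetical; it rounds halves upwards (so 15 becomes 20 and 25 becomes 30), which is the convention used in the example. Python's built-in `round` is not used because it rounds halves to the nearest even multiple instead.

```python
from collections import Counter

chips = [15, 18, 20, 22, 25, 27, 31, 32, 33, 34]


def round_half_up_to_10(n):
    """Round a non-negative whole number to the nearest 10, halves going up."""
    return (n + 5) // 10 * 10


# Count how many packets fall on each rounded value.
grouped = Counter(round_half_up_to_10(n) for n in chips)
print(dict(grouped))
```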

1. The table below shows badminton games won by 90 players last year.

NUMBER OF GAMES WON            0 - 4   5 - 9   10 - 14   15 - 19   20 - 24   25 - 29
NUMBER OF BADMINTON PLAYERS      25      19       15        21         8         2

A) Why is it not possible to find the number of players who won 10 games last year?
B) State the mid value of the class with the second largest number of players in it.
C) Write the class intervals of all the classes.

2. The table below shows the number of people who entered 25 different banks between 8 am and 10 pm,
corrected to the nearest 10.

No of people (nearest 10)   30 - 40   50 - 60   70 - 80   90 - 100
No of banks (f)                8        12         1          4

i) What is the smallest possible number of people who entered the bank on any one of these days?
ii) What is the largest possible number of people who entered the bank on any one of these days?
iii) Write down the upper class boundary of the class containing the largest number of people.
iv) Calculate the largest possible and smallest possible number of people that could have entered
the bank between 9 am and 10 pm.

3. Correct to the nearest 100, the number of pupils attending 10 different schools is given below.

No of pupils (nearest 100)   100 - 300   400 - 600   700 - 900
No of schools (f)                3           5           2

Of these 10 schools, Westdale has the lowest number of pupils attending.

i) What is the smallest number of pupils attending Westdale?
ii) Highport has the highest number of pupils attending. What is the largest number of pupils
attending Highport?
iii) Write down the mid value and class interval of a school with 400 – 600 pupils attending.
iv) Calculate the largest number of pupils altogether.

STEM AND LEAF DIAGRAMS ( STEM PLOTS)

These are used to illustrate discrete data in equal-interval classes. They are used for displaying small to medium
amounts of data, and they always allow us to see the original (raw) data after grouping. There are no restrictions
on the class intervals used, but they must be equal. A useful feature is that two sets of related data can be shown
back to back for making comparisons.

The diagrams are similar to bar charts, except that the bars are made from the final digit of each piece of
quantitative data, called the leaves. The stem is made from the remaining digits, which can be listed in ascending
or descending order. A key must be included to explain what the values in the diagram represent.

EXAMPLE

Test scores in percentages of 30 male students in Physics are shown below.

43, 31, 23, 37, 58, 61, 72, 70, 77, 68, 82, 39, 67, 53, 61, 55, 45, 59, 91, 52, 83, 27, 61, 45, 30, 64, 46, 59, 62, 41

The scores are classified and ranked in equal-interval groups: 20 – 29, 30 – 39, 40 – 49 and so on. The tens digit
appears as the stem and the units digits are the leaves.

Males (30)

2 | 3 7
3 | 0 1 7 9
4 | 1 3 5 5 6
5 | 2 3 5 8 9 9
6 | 1 1 1 2 4 7 8
7 | 0 2 7
8 | 2 3
9 | 1

Key: 2 | 3 represents a score of 23% for a male

Note that the leaves are aligned vertically so as to produce a bar-like shape. If we now have the marks of 30 female
students, as listed below, the two sets can be shown back to back on the same stem.

73, 51, 86, 40, 61, 64, 75, 73, 80, 71, 85, 86, 70, 79, 64, 58, 48, 62, 94, 55, 83, 30, 88, 58, 82, 78, 97, 62, 65, 74
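
The sketch below shows one way the stem-and-leaf diagram for the male scores could be built, and how the female scores might be placed back to back with it. The function name `stem_and_leaf` and the printed layout are hypothetical choices rather than a prescribed format.

```python
from collections import defaultdict

males = [43, 31, 23, 37, 58, 61, 72, 70, 77, 68, 82, 39, 67, 53, 61,
         55, 45, 59, 91, 52, 83, 27, 61, 45, 30, 64, 46, 59, 62, 41]
females = [73, 51, 86, 40, 61, 64, 75, 73, 80, 71, 85, 86, 70, 79, 64,
           58, 48, 62, 94, 55, 83, 30, 88, 58, 82, 78, 97, 62, 65, 74]


def stem_and_leaf(scores):
    """Map each tens digit (stem) to its sorted units digits (leaves)."""
    plot = defaultdict(list)
    for score in sorted(scores):
        plot[score // 10].append(score % 10)
    return dict(plot)


male_plot = stem_and_leaf(males)
female_plot = stem_and_leaf(females)

# Back-to-back layout: female leaves are reversed so that they read
# outwards from the stem, towards the left.
print(f"{'Females (30)':>20} |   | Males (30)")
for stem in sorted(set(male_plot) | set(female_plot)):
    left = " ".join(str(d) for d in reversed(female_plot.get(stem, [])))
    right = " ".join(str(d) for d in male_plot.get(stem, []))
    print(f"{left:>20} | {stem} | {right}")
print("Key: 2 | 3 represents a score of 23% for a male")
```

Because every leaf is kept, the original scores can still be read off the diagram after grouping.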

DATA REPRESENTATION

SUMMARY AND DISPLAY OF DATA

When displaying data it is very important that the presentation should be clear and easy to understand.
Consider who the data is being presented to and what purpose it serves.

TWO WAY TABLES

One method of displaying data is to use tables. The columns and rows should have headings and totals should
be included when appropriate.

EXAMPLE

The table shows the results of a Geography test taken by 45 students.

          Pass   Fail   Total
Boys       22      4      26
Girls      17      2      19
Totals     39      6      45

• 26 boys took the test
• 2 girls failed the test
• 33% of those who failed were girls
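
The totals and the percentage quoted above can be reproduced from the four cell counts. The sketch below is only illustrative; the dictionary layout and variable names are hypothetical.

```python
# Counts from the Geography test table.
table = {
    "boys": {"pass": 22, "fail": 4},
    "girls": {"pass": 17, "fail": 2},
}

# Row totals: add across each row.
row_totals = {group: sum(cells.values()) for group, cells in table.items()}

# Column totals: add down each column.
column_totals = {
    outcome: sum(cells[outcome] for cells in table.values())
    for outcome in ("pass", "fail")
}
grand_total = sum(row_totals.values())

# Percentage of the students who failed that were girls.
girls_share_of_fails = 100 * table["girls"]["fail"] / column_totals["fail"]

print(row_totals, column_totals, grand_total)
print(f"{girls_share_of_fails:.0f}% of those who failed were girls")
```

The row totals and the column totals must both add up to the same grand total, which is a useful check when completing a two-way table.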

EXERCISE

1. The table below gives information about 100 students who wrote a physics examination.

         Pass   Fail
Boys      45     15
Girls     28     12

i) How many students passed?
ii) What fraction of the students failed?
iii) What fraction of the girls passed?
iv) Calculate
a) the percentage pass-rate for boys,
b) the percentage pass-rate for girls.
v) Which group performed better, boys or girls?

2. A woman sells red and yellow flowers at a market; she has small, medium and large flowers of each
colour. There are 120 flowers altogether and 75 of them are red. She has 20 small red flowers, 14
medium yellow flowers and 48 large flowers. Large red and large yellow flowers are in the ratio 2:1.

i) Copy and complete the table.

          Small   Medium   Large   Total
Red         20                       75
Yellow               14
Total                         48    120

ii) What fraction of the flowers are a) yellow, b) medium?
iii) What fraction of the red flowers is large?
iv) What percentage of the small flowers is yellow? [Answer to 1 decimal place]
v) If one flower is selected randomly, what is the probability that it is red?

3. Joan wants to tabulate data showing the number of students who passed or failed each of the three final
papers in Geography at her school in 2002 and in 2003. There were 322 students who sat for each of the three
papers in 2002; this was 23 less than in 2003. The numbers passing papers 1, 2 and 3 in 2002 were 310, 303 and
305, respectively. Fifteen more students failed paper 2 in 2003 than in 2002; equal numbers passed paper 1 in
both years and three times as many students failed paper 3 in 2003 than in 2002, which was 23 less than in
2003.

i) Tabulate the data.

ii) What is the greatest possible number of students that failed all three papers in 2003?

PIE CHARTS

A pie chart shows how a whole is divided into parts; this is done by dividing a circle into
sectors.

Key features

• Each sector should be labelled with the name of the part it represents
• Sector angles must be in the same ratio as the parts they represent

COMPARATIVE PIE CHARTS

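Two or more sets of related data can be shown or compared using comparative pie charts. The areas of the charts
must be proportional to the totals they represent. Because the area of a circle is proportional to the square of its
radius, the squares of the radii are in the same ratio as the totals. For two charts with radii r and R representing
totals t and T:

$r^2 : R^2 = t : T$

therefore

$T r^2 = t R^2$

and

$R^2 = \dfrac{T r^2}{t}, \qquad R = r\sqrt{\dfrac{T}{t}}$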
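
The relation between the radii and the totals can be turned into a small calculation. This is a minimal sketch with a hypothetical function name and made-up totals, assuming only that the areas of the charts are proportional to the totals.

```python
import math


def comparative_radius(r, t, T):
    """Radius R of a pie chart for total T, given radius r for total t.

    Areas are proportional to totals, so R**2 / r**2 = T / t.
    """
    return r * math.sqrt(T / t)


# Hypothetical example: a chart of radius 5 cm represents 100 items.
# A chart for 400 items needs four times the area, so twice the radius.
print(comparative_radius(5, 100, 400))
```

A common mistake is to scale the radius in direct proportion to the total; that would make the second chart's area 16 times larger here instead of 4 times.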

1. At Deepdale School, 400 students sat for an English Language examination. A pie chart was drawn to
illustrate the results of these students using a radius of 6cm. At Shallowvale School, 576 students sat for the
same examination. Find the correct radius for a comparative pie chart showing the results of the Shallowvale
students.

2. A man surveyed 350 shoppers at CutPrice supermarket and 686 shoppers at BargainBin supermarket. The 350
shoppers at CutPrice were represented in a pie chart of radius 7cm. What is the correct radius to be used in a
comparative pie to represent the 686 shoppers at BargainBin?

3. A census was taken on the populations of the villages of Bristow and Chisholm. A pie chart of radius 8cm was
used to represent Bristow’s population of 6800 and a pie chart of radius 10cm was used to represent the
population of Chisholm. What was the population of Chisholm?

4. Kay earns £27 000 per annum. She has drawn a pie chart of radius 6.3cm to show how she spends her salary
and her partner Joshua has drawn a comparative pie chart with radius 5.67cm to show how he spends his
salary.

i) Calculate Joshua’s annual salary.

ii) Calculate, correct to 3 significant figures, the correct radius for another comparative pie chart
that could be drawn to show how Kay and Joshua spend their combined salaries.

5. The number of employees at a clothing manufacturer’s in 2003 was 19% less than in 2002.
A pie chart of radius 12.5 cm was drawn to represent the employees in 2002.

i) Calculate the radius of a comparative pie chart that could be drawn to represent the employees in 2003.

ii) If there were 486 employees in 2003, how many employees were there in 2002?

6. The table gives the number of males and the number of females who applied to join the Armed Forces at an office in
1993 and in 2003.

           1993   2003
Males       423    424
Females     137    189

A pie chart was drawn to represent those applying in 2003 and the sector area for females was 62 cm². Taking
π = 3.142, find

i) The area of the pie chart used for 2003.
ii) The radius of the pie chart used for 2003.
iii) The correct radius to be used for a comparative pie chart to represent those who applied in 1993.
iv) The area of the sector for males who applied in 1993.

PICTORIAL REPRESENTATION OF DATA

Data is represented by the use of symbols with a key to indicate what each symbol represents. Care
must be taken when drawing symbols. If the two symbols are different in shape and size they will not
represent the same item or the same number of items. It is recommended that we use symbols with
simple shapes as it may be necessary to draw fractions of a symbol.

1. Forty-five students were asked which sports they play. All students play at least one sport.
Their responses are illustrated in the pictogram.

[Pictogram not reproduced. Rows: Softball, Football, Volleyball, Table tennis, Badminton, Tennis. Key: one symbol represents two students.]

i) Find
a) the most popular sport,
b) the least popular sport,
c) what fraction plays volleyball,
d) what percentage plays tennis.
iii) Express the number that plays softball to the number that does not play table tennis as a simple ratio.
iv) If a student is randomly selected from the group, what is the probability that this student plays table
tennis?

2. A farmer grows five different types of vegetable on his farm. The area used for growing each is given below:

Vegetable   Area (m²)
Cabbage        32
Carrot         64
Onion          48
Tomato         80
Potato         40

i) Using a suitable symbol to represent 16 m², show this information in a pictogram.
ii) What is the total area used for growing vegetables?
iii) What fraction of the total area is used for onions?
iv) What percentage, to 1 decimal place, is used for tomatoes?

A CHANGE CHART
A change chart shows relative changes in a variable. The changes, which are given as percentages, can be positive or
negative.

$\text{Percentage change} = \dfrac{\text{change}}{\text{original value}} \times 100$
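
As a brief illustration of the formula, the sketch below computes one positive and one negative percentage change. The function name and the sales figures are hypothetical.

```python
def percentage_change(original, new):
    """Percentage change from `original` to `new` (negative for a decrease)."""
    change = new - original
    return change / original * 100


# Hypothetical values: sales rise from 200 to 250, then fall back to 200.
rise = percentage_change(200, 250)
fall = percentage_change(250, 200)
print(rise, fall)
```

The rise and the fall are the same size in absolute terms but give different percentages, because each is measured against its own original value.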
Example 1 (chart not reproduced)

Example 2 (chart not reproduced)

BAR CHARTS

Bar charts are columns of equal width representing different numerical values. The heights represent
the frequencies, and equal gaps should be left between the bars. Bars should not be shaded using
bright colours and complicated patterns as this could misrepresent the frequencies and distort the data.

There are several types of bar chart:

• Simple bar chart
• Dual comparative bar chart
• Composite or sectional bar chart
• Percentage sectional bar chart

DUAL COMPARATIVE BAR CHART

Comparative bar charts are useful if two or more pieces of related data are to be compared. A dual bar
chart is used to show two sets of related data. It consists of pairs of bars, with one bar in each pair for
each set of data. Equal-width gaps should be left between the pairs, and all bars should be of equal
width.

A key should be used to distinguish between the two data sets.

Example

Last month two travel companies, TourWell and TravelSafe, organized holidays for customers to
different destinations. The numbers are represented in a dual bar chart (table and chart not reproduced).

1. The results of a survey in two classes on the students’ favourite flavours in potato chips are shown.

Potato chip flavour   Class A   Class B
Smokey bacon              6        14
Barbecue                  8         6
Cheese & onion            1         0
Salt & vinegar           11         8
Hot chilli                4         2

i) Illustrate these data in a clearly labelled dual bar chart.

2. [Bar chart not reproduced: No patients admitted (f) to Hospital A, Hospital B and Hospital C in each of weeks one to six.]

i) Tabulate the data that is shown in the bar chart.
ii) Use the bar chart and your table to find
a) how many patients were admitted with malaria to the three hospitals altogether during these 6 weeks,
b) during which week the greatest number of patients were admitted into the three hospitals altogether,
c) between which two consecutive weeks there was the greatest change in the number of patients admitted to hospital C.
iii) Express the total number of patients admitted to hospitals A, B and C as a simple ratio.
iv) What percentage of the total number of patients admitted during this period was admitted to hospital B?

3. A company has recorded the numbers of its full-time and part-time employees for the years 2001 to 2003.

                          Year
Type of employee   2001   2002   2003
Full-time            78     77     95
Part-time            26     22     19

i) Illustrate the data in a dual bar chart.
ii) Express, in simple form, the ratio of the two types of employee in each year.
iii) In which year was the highest proportion of employees full-time?
iv) Over the three-year period, what percentage of the employees has been part-time?

SECTIONAL/ COMPONENT BAR CHART

In a sectional bar chart, each bar represents a total and is divided into sections showing how that total is made up.
The size of each total can then be easily compared with the others, and the proportions of each component in the
bars can also be compared.

For example

The sectional bar chart below shows the number of men, women and children living in Hill Street.

[Sectional bar chart not reproduced. Scale marked 0, 10, 20, 30, 40. Key: Men, Women, Children.]

i) Copy and complete the table

No men   No women   No children

2. The numbers of cars, trucks and motorbikes serviced at Mike’s and Jomo’s garages last month are shown.

[Sectional bar chart not reproduced. Vertical axis: No vehicles (f), marked at 80 and 120. One bar for each garage, Mike’s and Jomo’s. Key: Trucks, Cars, Motor bikes.]

PERCENTAGE SECTIONAL BAR CHART

Each bar represents a total and is drawn to a height of 100%. The bars are divided into sections,
and each section represents a percentage of the total. One advantage of this chart is that components
can easily be compared both within a bar and between bars. The percentage contribution of each component
must be calculated, and this can only be done when the totals are known.
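
The calculation needed before drawing such a chart is sketched below: each component is converted to a percentage of its own bar's total. The function name and the shop figures are hypothetical.

```python
def percentage_sections(components):
    """Convert the components of one bar into percentages of the bar's total."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}


# Hypothetical bars: items sold by two shops.
bars = {
    "Shop A": {"bread": 30, "milk": 50, "eggs": 20},
    "Shop B": {"bread": 45, "milk": 90, "eggs": 45},
}

for name, components in bars.items():
    print(name, percentage_sections(components))
```

Both bars are drawn to the same height of 100%, so the chart shows that milk makes up the same share in each shop even though Shop B sells more of it.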

3. A company has recorded the numbers of its full-time and part-time employees for the years 2001 to 2003.

                          Year
Type of employee   2001   2002   2003
Full-time            78     77     95
Part-time            26     22     19

i) Illustrate the data in a dual bar chart.
ii) Illustrate the data in a sectional bar chart.
iii) Illustrate the data in a percentage sectional bar chart.

2. Mrs. Mafela owns three filling stations.

The volume of petrol, diesel and paraffin sold, in litres, at each of these stations last week is shown.

Filling station   Volume of petrol (l)   Volume of diesel (l)   Volume of paraffin (l)
North                    21000                  12000                    1500
South                    18000                  16500                    6000
West                     11000                  11500                    2500

i) a) Calculate the total volume of each type of fuel that was sold.
b) What percentage of the petrol was sold at the North filling station?
c) Draw a sectional percentage bar chart with one bar for each of the three types of fuel.

ii) a) Calculate the total volume of fuel sold at each of the three filling stations.
b) What percentage of the fuel sold at the South filling station was diesel?
c) Draw a sectional percentage bar chart with one bar for each of the three filling stations.

VENN DIAGRAMS

What is a Venn diagram?


A Venn diagram uses overlapping circles or other shapes to illustrate the logical relationships
between two or more sets of items. Often, they serve to graphically organize things, highlighting
how the items are similar and different. Venn diagrams show relationships even if a set is
empty. Venn diagrams are commonly used in school to teach basic math concepts such
as sets, unions and intersections
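
The set operations that a Venn diagram pictures can be tried out directly. The sketch below uses two made-up sets of students; the names and membership are hypothetical.

```python
# Hypothetical sets of students who play each sport.
football = {"Ama", "Ben", "Chen", "Dudu"}
tennis = {"Chen", "Dudu", "Efe"}

both = football & tennis           # intersection: the overlap of the circles
either = football | tennis         # union: everything inside either circle
football_only = football - tennis  # one circle with the overlap removed

print(sorted(both), sorted(either), sorted(football_only))
```

Each region of a two-circle Venn diagram corresponds to one of these results: the overlap, the whole shaded area, and the part of one circle that lies outside the other.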

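EXAMPLE OF A VENN DIAGRAM

[Venn diagram not reproduced: it shows the numbers of people who can speak Arabic, Urdu and French.]

From the diagram:

• 3 people can speak all the languages.
• 5 people can speak both Urdu and French.
• 13 people speak only Arabic.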

Venn diagram purpose and benefits

• To visually organize information: to see the relationship between sets of items, such as
commonalities and differences. Students and professionals can use them to think through the logic
behind a concept and to depict the relationships for visual communication.
• To compare two or more choices and clearly see what they have in common
versus what might distinguish them. This might be done when selecting an
important product or service to buy.
• To compare data sets, find correlations and predict probabilities of certain occurrences.

DATA REPRESENTATION

PICTOGRAMS
Advantages
• they offer an attractive visual impact
• they are easy to read
• they can handle large volumes of data using keyed symbols
Disadvantages
• they can be hard to draw
• fractional pictures are hard to read
• they can only represent a few categories of data

PIE CHARTS
Advantages
• they show the relative proportions of items in relation to the whole
• they offer easy comparison of each item as a part of a whole
Disadvantages
• at times it is tedious to calculate sector angles
• the actual frequencies are not shown and might need to be calculated
• their accuracy depends on the accuracy of the angles
• the answers in most cases must be approximated

SIMPLE BAR CHARTS
Advantages
• it is easy to see quantities
• they can be drawn easily
Disadvantages
• it is difficult to see proportions
• they can easily be manipulated
• they can only be used for discrete data

DUAL BAR CHARTS
Advantages
• they show comparisons between variables
Disadvantages
• they can only represent a few variables

COMPONENT BAR CHARTS
Advantages
• proportions of items can easily be seen in bars
• they can be used to see trends by looking at the heights of bars
Disadvantages
• they can only represent a few variables

PERCENTAGE SECTIONAL BAR CHARTS
Advantages
• they show proportions of various items
• they show relative differences between categories
Disadvantages
• they do not show actual quantities

CHANGE CHARTS
Advantages
• they show relative changes in items and quantities
Disadvantages
• only changes are shown, therefore they can be hard to read

ESTIMATING THE MODE FROM HISTOGRAMS

1. The lengths of a sample of ‘sausages’ from a moporoto tree are given.

Length (L cm)   20 ≤ L < 45   45 ≤ L < 60   60 ≤ L < 70   70 ≤ L < 90
f                    21            27            83            19

i) Calculate an estimate of the mean length.
ii) Draw a histogram to represent this data.
iii) Find, using the drawn histogram, an estimate of the mode.

2. The times taken by 76 cyclists to complete a 700 meter sprint race are given.

Time taken (seconds)   50 - 53   54 - 58   59 - 64
No cyclists (f)           14        32        30

i) Calculate an estimate of the mean time taken.
ii) Calculate the class densities and, on a sheet of graph paper, construct a histogram to illustrate these data.
iii) Find an estimate of the modal time taken.
iv) Calculate an estimate of the median time taken.

4. Some students investigated the departure times of 200 commercial aircraft. They recorded the
number of minutes by which each aircraft’s departure was delayed. Their results are shown below.

Delay time (t minutes)   No aircraft (f)
0 ≤ t < 10                     48
10 ≤ t < 20                    98
20 ≤ t < 30                    46
30 ≤ t < 40                     8

i) Copy and complete the cumulative frequency table.
ii) On graph paper, draw and label suitable axes for delay time and for number of aircraft and, by plotting
the points from your table, construct a histogram.
iii) Use the histogram to estimate
a) the modal delay time,
b) the number of aircraft that were delayed by less than 8 minutes,
c) the number of aircraft that were delayed by 35 minutes or more.
d) Calculate an estimate of the mean and median.

MEASURES OF CENTRAL TENDENCY

TYPES OF DISTRIBUTIONS
1. NORMAL/SYMMETRICAL DISTRIBUTION
The mean, the mode and the median are all located at the same point.
2. POSITIVELY SKEWED DISTRIBUTION (SKEWED TO THE RIGHT)
Observations are mostly concentrated towards smaller values, and there are some extremely high
values.
3. NEGATIVELY SKEWED DISTRIBUTION (SKEWED TO THE LEFT)
Observations are mostly concentrated towards larger values, and there are some extremely low values.

HISTOGRAMS OF UNEQUAL CLASS INTERVALS

The process of constructing histograms with unequal class intervals differs from that for equal class
intervals, as some calculations are required. For the vertical axis, the FREQUENCY DENSITY of each class has to
be calculated:

$\text{FREQUENCY DENSITY} = \dfrac{\text{FREQUENCY}}{\text{CLASS INTERVAL}}$

EXAMPLE

The numbers of megabytes used for storing documents on 339 computers at a college are given below.
Draw a histogram to show this data.

STORAGE USED   NUMBER OF COMPUTERS (F)
0 - 10                  51
10 - 30                 84
30 - 60                 96
60 - 100               108

THE CALCULATIONS ARE

CLASS      FREQUENCY   CLASS INTERVAL   FREQUENCY DENSITY (FD = FREQUENCY ÷ CLASS INTERVAL)
0 - 10         51            10           51 ÷ 10 = 5.1
10 - 30        84            20           84 ÷ 20 = 4.2
30 - 60        96            30           96 ÷ 30 = 3.2
60 - 100      108            40           108 ÷ 40 = 2.7
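
The frequency densities for this example can be checked with a few lines of code. This is a minimal sketch; the variable names are hypothetical and each class is written as (lower boundary, upper boundary, frequency).

```python
# (lower boundary, upper boundary, frequency) for each class in the example.
classes = [(0, 10, 51), (10, 30, 84), (30, 60, 96), (60, 100, 108)]

densities = []
for lower, upper, frequency in classes:
    interval = upper - lower
    # Frequency density = frequency / class interval.
    densities.append(round(frequency / interval, 2))
    print(f"{lower} - {upper}: {frequency} / {interval} = {densities[-1]}")

total_frequency = sum(frequency for _, _, frequency in classes)
```

Although the last class has the highest frequency, it has the lowest frequency density, so it is drawn as the shortest (but widest) bar.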

The histogram (not reproduced) is drawn with FREQUENCY DENSITY on the vertical axis.

CLASS EXERCISE

1. A biologist planted 684 seeds and recorded the times taken for the seeds to germinate.

Time taken (hours)   21 - 24   25 - 28   29 - 33   34 - 39   40 - 47   48 - 50
No seeds (f)           120       160       130        84       160        30

i) Explain why the first class (containing 120 seeds) has an interval of 4 hours, not 3 hours.
ii) Illustrate the data in a histogram.
iii) Calculate an estimate of
a) the number of seeds that germinated in less than 31 hours,
b) the number of seeds that germinated in less than 26 hours,
c) the percentage of the seeds that took between 1 and 2 days to germinate.

2. The heights of a group of 130 children are given in the table.

Height (cm)   No children (f)
110 - 120          40
130 - 160          60
170 - 190          30

i) Illustrate the data in a histogram.
ii) Calculate the percentage of these children whose heights are less than 1.65 metres.
iii) Calculate an estimate of the number of these children whose heights are less than 118 cm.

3. The departure delay-times of some buses last month are represented in the histogram below.

[Histogram not reproduced. Vertical axis: Buses per 2 minutes (density), marked at 0 and 20. Horizontal axis marked at 2, 4, 12, 18 and 20.]

i) How many buses are represented?
ii) Calculate an estimate of the percentage of these buses that departed
a) less than 10 minutes late,
b) at least a quarter of an hour late.
iii) The bus company manager said that only

MEASURES OF CENTRAL TENDENCY FOR GROUPED DATA

In grouped data distributions, the mean, the mode and the median can only be estimated.

For example, the table below shows the heights in metres reached by mountain climbers in a day.

Calculate an estimate of
(a) the mean
(b) the mode
(c) the median

Height (H metres)   200 - 500   600 - 1000   1100 - 1300   1400 - 1700   1800 - 2100
f                       40          50            42            24            44

Answers

THE MEAN

For the mean we will use

$\text{mean} = \bar{x} = \dfrac{\sum fm}{n}$

where m is the class mid value, f is the class frequency and n is the total frequency.
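
A sketch of how the formula applies to the mountain-climber table is given below. It assumes the class mid value m is taken halfway between the class limits, as defined earlier for grouped discrete data; the variable names are hypothetical.

```python
# (lower limit, upper limit, frequency) from the mountain-climber table.
classes = [(200, 500, 40), (600, 1000, 50), (1100, 1300, 42),
           (1400, 1700, 24), (1800, 2100, 44)]

# n is the total frequency.
n = sum(f for _, _, f in classes)

# Sum of f * m, where m is the mid value of each class.
sum_fm = sum(f * (lower + upper) / 2 for lower, upper, f in classes)

estimated_mean = sum_fm / n
print(n, sum_fm, estimated_mean)
```

The result is only an estimate because every climber in a class is treated as if they had reached the class mid value.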

MEASURES OF DISPERSION

A data distribution cannot be described by measures of central tendency alone. For example, knowing
that the mean mark in a test is 50% is not informative enough, as it gives no idea of how varied the marks
are. We therefore need to know how the values are spread, and this is the purpose of finding measures of
dispersion.

Measures of dispersion commonly used to describe the spread of data are:

• Range
• Interquartile range
• Standard deviation
• Mean deviation
1. RANGE
This is the simplest measure of spread and is commonly used. The range is the difference
between the largest and smallest possible values in a data distribution.
EXAMPLES
 Classes in a certain school have 22, 34, 36,41,45,37 students.
The range will be 45 – 22 which is equal to 23.
 In a company 10 people earn P20 an hour, 5 people earn P9.00 while two earn 3.50.
The range in salaries will be P20 – P3.50 = P 16.50
 The lengths of 30 pencils in a grouped frequency table are as follows

Length (cm)          8 ≤ l ˂ 9   9 ≤ l ˂ 11   11 ≤ l ˂ 13   13 ≤ l ˂ 17
No of pencils (f)    7           12           6             6

The range cannot be found exactly in this case as the actual lengths are not given. Instead, the
boundaries of the first class and the last class are used to find the lower limit and
upper limit of the range.
Lower limit of the range = lower boundary of last class – upper boundary of first class = 13 – 9 = 4
Upper limit of the range = upper boundary of last class – lower boundary of first class = 17 – 8 = 9
So the range is between 4 cm and 9 cm.
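The pencil example can be checked with a short Python sketch. It assumes the classes are listed in ascending order as (lower boundary, upper boundary) pairs; the function name `range_limits` is illustrative only.

```python
def range_limits(classes):
    """Return (lower limit, upper limit) of the range for grouped data.

    classes: list of (lower_boundary, upper_boundary) pairs in ascending order.
    """
    first_low, first_high = classes[0]
    last_low, last_high = classes[-1]
    smallest_possible_range = last_low - first_high   # values as close together as possible
    largest_possible_range = last_high - first_low    # values as far apart as possible
    return smallest_possible_range, largest_possible_range

# Class boundaries from the pencil lengths table
limits = range_limits([(8, 9), (9, 11), (11, 13), (13, 17)])
print(limits)
```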
EXERCISE
1. Find the range of the following sets of data
(a) 4, 7, 7, 9, 13, 21     (b) -3, 5, 18, 24, 29, 37
(c)
z            60   65   70   75   80   85
frequency    17   23   28   59   21   12
(d)
x            10   11   12   13
frequency    5    7    9    4
2. (a) The times taken, to the nearest minute, by students to travel to school are summarized in the table
below. Find the lower and upper limits of the range.

Time taken (minutes)    20 - 21   22 - 24   25 - 30
No of students (f)      6         39        5

(b) The grouped frequency table below shows the total amounts spent by customers in a
supermarket. Find the lower and upper limits of the range in the amount spent by customers.

Amount spent (t)        5 ≤ t ˂ 15   15 ≤ t ˂ 25   25 ≤ t ˂ 40   40 ≤ t ˂ 100
No of customers (f)     35           66            108           41

2. INTERQUARTILE RANGE FOR UNGROUPED DATA

If a data distribution contains even one extreme value, the range will not be representative of
the spread of the majority of values. The inter quartile range is a measure of dispersion that gives
the range of the middle half of the values; therefore it is not affected by extreme values. The
inter quartile range is the difference between the upper quartile and the lower quartile. The
median divides the data into two equal parts. The lower quartile divides the first half into two
equal parts while the upper quartile divides the last half into two equal parts. In fact the lower
quartile, median and upper quartile divide the data into four equal parts.

The actual positions of the lower and upper quartiles depend on whether the number of values
in a distribution is odd or even.
QUARTILE POSITIONS FOR AN ODD DISTRIBUTION

LOWER QUARTILE          MEDIAN               UPPER QUARTILE
$\frac{1}{4}(N+1)$      $\frac{N+1}{2}$      $\frac{3}{4}(N+1)$

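QUARTILE POSITIONS FOR AN EVEN DISTRIBUTION

LOWER QUARTILE          MEDIAN               UPPER QUARTILE
$\frac{N+2}{4}$         $\frac{N+1}{2}$      $\frac{3N+2}{4}$

A position ending in .5 means the value midway between the two neighbouring values. For an even
distribution the median therefore lies midway between the values in positions N/2 and N/2 + 1.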
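The position rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not a standard library routine: the median is taken at position (N + 1)/2 in both cases, a position ending in .5 is read as the midpoint of its two neighbours, and the function names are hypothetical. The eleven values used are those of the box plot example later in these notes.

```python
def value_at(ordered, position):
    """Value at a 1-based position; a position ending in .5 gives the
    midpoint of the two neighbouring values."""
    below = int(position)                    # whole part of the position
    if position == below:
        return ordered[below - 1]
    return (ordered[below - 1] + ordered[below]) / 2

def quartiles(values):
    """Return (lower quartile, median, upper quartile) using the position rules above."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:                           # odd number of values
        q1_pos, q3_pos = (n + 1) / 4, 3 * (n + 1) / 4
    else:                                    # even number of values
        q1_pos, q3_pos = (n + 2) / 4, (3 * n + 2) / 4
    q2_pos = (n + 1) / 2
    return value_at(ordered, q1_pos), value_at(ordered, q2_pos), value_at(ordered, q3_pos)

q1, q2, q3 = quartiles([3, 5, 5, 6, 6, 7, 8, 10, 11, 12, 12])
print(q1, q2, q3, "IQR =", q3 - q1)
```

Other textbooks and software packages use slightly different quartile conventions, so results may differ from theirs for small data sets.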

EXERCISE
1. For each of the following sets of data find (a) the lower quartile (b) the upper quartile and (c)
the inter quartile range.
(i) 20, 6, 28, 34, 16
(ii) 15, 25, 29, 37, 71, 43, 17, 43, 71, 15, 7, 14
(iii) 43, 31, 23, 37, 58, 61, 72, 70, 77, 68, 82, 39, 67, 53, 61
(iv)

p            10   20   30   40   50   60
frequency    2    13   5    11   13   15

(v)

score        0   1   2   3   4    5    6    7    8   9   10
frequency    1   1   6   9   10   18   27   11   9   5   2

2. The table shows the number of sons and daughters that each of 259 men has.

                     SONS
                 0     1     2     3     TOTALS
DAUGHTERS   0    4     41    19    7     71
            1    31    58    11    5     105
            2    22    19    10    6     57
            3    7     8     7     4     26
TOTALS           64    126   47    22    259

(a) Find the inter quartile range for the number of (i) daughters
(ii) sons

(b) Draw a frequency table showing how many children (sons and daughters) these men have

CUMULATIVE FREQUENCY DIAGRAMS

Estimates of the values at certain positions within a distribution (such as the median and quartiles) can be read
from cumulative frequency diagrams. By estimating the quartiles we can also estimate the interquartile range of
a data distribution given in a frequency table or diagram.

QUARTILES, DECILES AND PERCENTILES

As well as estimating quartiles, we can also estimate deciles and percentiles from a cumulative
frequency diagram. The position of a decile is a number of tenths of the total frequency N, and the
position of a percentile is a number of hundredths of N.

Position of the xth decile       Position of the xth percentile
$\frac{x}{10} \times N$          $\frac{x}{100} \times N$

LINEAR INTERPOLATION FROM A CUMULATIVE FREQUENCY TABLE

Linear interpolation is a method used to calculate an estimate of the value at any particular
position in a distribution, such as the median and the quartiles.

$$\text{Estimated value} = LCB + \frac{P - CF}{F_m} \times W$$

 LCB = lower class boundary of the class containing the value
 P = position of the value
 CF = cumulative frequency before the class containing the value
 Fm = frequency of the class containing the value
 W = class width
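The interpolation formula can be sketched in Python as below. The sketch assumes a "less than" cumulative frequency table whose first entry has cumulative frequency 0, and takes Fm to be the frequency of the class containing the required position; the function name `interpolate` is illustrative. The figures used are the passenger times from the exercise that follows, and the position 46 (half of the total frequency 92, i.e. the median) is chosen here purely as a demonstration.

```python
def interpolate(upper_bounds, cum_freqs, position):
    """Estimate the value at a given position by linear interpolation.

    upper_bounds[i] is the boundary below which cum_freqs[i] values lie;
    the first entry should have cumulative frequency 0.
    """
    for i in range(1, len(cum_freqs)):
        if position <= cum_freqs[i]:
            lcb = upper_bounds[i - 1]        # lower class boundary
            cf = cum_freqs[i - 1]            # cumulative frequency before the class
            fm = cum_freqs[i] - cf           # frequency of the class
            w = upper_bounds[i] - lcb        # class width
            return lcb + (position - cf) / fm * w
    raise ValueError("position exceeds the total frequency")

bounds = [0, 3, 8, 15, 18, 20]               # T < 0, T < 3, ...
cum_freqs = [0, 15, 40, 70, 88, 92]
median_estimate = interpolate(bounds, cum_freqs, 46)
print(round(median_estimate, 2))
```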

EXERCISE

TIME TAKEN (minutes)    NO OF PASSENGERS (Cf)
T ˂ 0                   0
T ˂ 3                   15
T ˂ 8                   40
T ˂ 15                  70
T ˂ 18                  88
T ˂ 20                  92

Estimate (a) the IQR
(b) the 9th decile
(c) the 60th percentile
of the times taken by passengers to reach their work place.

STANDARD DEVIATION AND VARIANCE

The standard deviation is the positive square root of the variance. The variance of a distribution
is, in simple terms, the difference between two quantities, which are:
 The mean of the squares of the values
 The square of the mean of the values

FOR UNGROUPED SETS OF DATA

For a set of N numbers denoted as X

VARIANCE = MEAN OF SQUARES – SQUARE OF MEAN

$$\text{VARIANCE} = \frac{\sum X^2}{N} - \left[\frac{\sum X}{N}\right]^2$$

STANDARD DEVIATION = + √VARIANCE

$$\text{STANDARD DEVIATION} = \sqrt{\frac{\sum X^2}{N} - \left[\frac{\sum X}{N}\right]^2}$$

For values given in a frequency table, where N = ∑f:

$$\text{STANDARD DEVIATION} = \sqrt{\frac{\sum fX^2}{N} - \left[\frac{\sum fX}{N}\right]^2}$$
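The "mean of squares minus square of mean" rule can be sketched in Python as follows. The data set below is hypothetical and chosen only because it gives round answers; the function names are illustrative.

```python
from math import sqrt

def variance(values):
    n = len(values)
    mean_of_squares = sum(x * x for x in values) / n   # sum of X squared, divided by N
    square_of_mean = (sum(values) / n) ** 2            # (sum of X divided by N), squared
    return mean_of_squares - square_of_mean

def standard_deviation(values):
    # Positive square root of the variance
    return sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values, not from the notes
print(variance(data), standard_deviation(data))
```

Here the mean of the squares is 29 and the square of the mean is 25, so the variance is 4 and the standard deviation is 2.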

EXERCISE

Find the standard deviation for each of these sets of numbers

(a) 5, 7, 11, 14, 18     (b) 7, 3, 15, 22, 25, 31, 32, 25     (c) 10, 45, 77, 63, 88, 85, 90

(d)

x    10   20   30   40   50   60   70   80
f    3    4    2    11   7    9    4    5

STANDARD DEVIATION AND VARIANCE FOR GROUPED DATA VALUES

For grouped data the standard deviation can only be estimated. Class mid-values (m) are used instead
of the individual values X.

$$\text{STANDARD DEVIATION} = \sqrt{\frac{\sum fm^2}{\sum f} - \left[\frac{\sum fm}{\sum f}\right]^2}$$

Class mid-values for grouped data should be calculated carefully, especially when values have been
given to a certain degree of accuracy.

EXAMPLE

The capacities of 85 containers were recorded in the table below. Find the standard deviation

CAPACITY FREQUENCY
20˂ C ≤ 24 7
24˂ C ≤ 28 15
28˂ C ≤ 30 29
30˂ C ≤ 32 22
32˂ C ≤ 35 12
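One way to carry out this estimate is sketched below in Python. It assumes that each class mid-value is the average of the two class boundaries shown in the capacity table; the function name `grouped_sd` is illustrative.

```python
from math import sqrt

# (lower boundary, upper boundary, frequency) for each class in the capacity table
classes = [(20, 24, 7), (24, 28, 15), (28, 30, 29), (30, 32, 22), (32, 35, 12)]

def grouped_sd(classes):
    total_f = sum(f for _, _, f in classes)                                # sum of f
    sum_fm = sum(f * (low + high) / 2 for low, high, f in classes)         # sum of f x m
    sum_fm2 = sum(f * ((low + high) / 2) ** 2 for low, high, f in classes) # sum of f x m squared
    return sqrt(sum_fm2 / total_f - (sum_fm / total_f) ** 2)

sd_estimate = grouped_sd(classes)
print(round(sd_estimate, 2))
```

With these mid-values, ∑f = 85, ∑fm = 2469 and ∑fm² = 72 526, which gives an estimated standard deviation of about 3.08.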

The times taken for seeds to germinate are recorded below. Calculate the standard deviation

Time taken          24 - 26   26 - 30   30 - 35   35 - 50   50 - 60   60 - 70
No of seeds (f)     1         3         7         72        192       55

The heights of 200 male students and 100 females are summarized in a table below. Find the standard
deviation for (a) Males (b) females

height 145 ≤ h ˂ 155 155 ≤ h ˂ 170 170 ≤ h ˂ 185 185 ≤ h ˂ 120


Males (f) 37 102 59 2
Females (f) 24 66 10 0

ADVANTAGES AND DISADVANTAGES OF MEASURES OF DISPERSION

RANGE
ADVANTAGES
 Simple to calculate
 Commonly used to compare spread between similar sets of data
DISADVANTAGES
 Based only on two values
 Easily affected by extreme values

INTER QUARTILE RANGE
ADVANTAGES
 Unlikely to be affected by extreme values
 Can be calculated even when other values are not recorded correctly
DISADVANTAGES
 Depends on the median being an accurate value
 Based only on two values

STANDARD DEVIATION
ADVANTAGES
 Takes into account all values
 Can be used in further calculations
 Has an application when values deviate from the mean
DISADVANTAGES
 Easily affected by errors and extreme values
 Depends on the mean being an accurate measure

MEASURES OF DISPERSION FOR COMBINED SETS OF DATA

To calculate the mean value for combined sets of data we need to find the sum of all values in both sets
and also the total number in those sets. Similarly to find the standard deviation we need to find the sum
of all the squares in those sets.

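S of A and B = √(mean of squares of A and B – square of the mean of A and B)

$$S_{A \text{ and } B} = \sqrt{\frac{\sum A^2 + \sum B^2}{N_A + N_B} - \left[\frac{\sum A + \sum B}{N_A + N_B}\right]^2}$$

where N_A and N_B are the numbers of values in A and in B.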
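The combined calculation needs only the number of values, the sum and the sum of squares of each set. A minimal Python sketch follows; the two small sets are hypothetical (A = 1, 2, 3 and B = 4, 5) and the function name is illustrative.

```python
from math import sqrt

def combined_mean_sd(n_a, sum_a, sum_sq_a, n_b, sum_b, sum_sq_b):
    """Mean and standard deviation of two sets combined, from their summary totals."""
    n = n_a + n_b
    mean = (sum_a + sum_b) / n
    variance = (sum_sq_a + sum_sq_b) / n - mean ** 2   # mean of squares - square of mean
    return mean, sqrt(variance)

# Hypothetical sets: A = 1, 2, 3 (sum 6, sum of squares 14); B = 4, 5 (sum 9, sum of squares 41)
mean_ab, sd_ab = combined_mean_sd(3, 6, 14, 2, 9, 41)
print(mean_ab, round(sd_ab, 3))
```

Note that the combined standard deviation cannot be found by averaging the two separate standard deviations; the sums of squares must be combined first.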

EXERCISE

1. Given that ∑p = 420 and ∑q = 1290, find the mean of P and that of Q if there are 25 values
of P and 75 values of Q.

If ∑p² = 9000 and ∑q² = 25 000, calculate the variance and standard deviation.

2. Given that ∑x = 72 and ∑y = 240, find the mean of x and that of y if there are 12 values of
x and 12 values of y.

If ∑x² = 2400 and ∑y² = 24 720, calculate the variance and standard deviation.
3.

     NUMBER OF   SUM OF    MEAN    SUM OF     VARIANCE   STANDARD
     VALUES      VALUES            SQUARES               DEVIATION
P    56          2352      a       100 800    b          c
Q    44          d         53.5    e          49         f

Calculate the values a, b, c, d, e, f

For the combined distribution P and Q find (a) the mean (b) the variance

4.

     NUMBER OF   SUM OF    MEAN    SUM OF     VARIANCE   STANDARD
     VALUES      VALUES            SQUARES               DEVIATION
M    40          p         -1.2    208.144    q          r
N    80          48        s       t          u          2.4

Calculate the values p, q, r, s, t, u

For the combined distribution M and N find (a) the mean (b) the standard deviation

LINEAR TRANSFORMATION OF DATA

Linear transformation is a process by which one set of numbers is mapped onto another set by
addition and/or multiplication.

For example, the linear function y = 3x + 2 maps the values of x (1, 2, 3) onto the values of y (5, 8, 11). We say
the distribution 5, 8, 11 has been derived from 1, 2, 3.

Example

From the distribution 5, 8, 20, 35 many others can be derived

Original distribution   Operation required                    Derived distribution
5, 8, 20, 35            Adding 1                              6, 9, 21, 36
                        Subtracting 2 (adding -2)             3, 6, 18, 33
                        Multiplying by 2                      10, 16, 40, 70
                        Dividing by 4 (multiplying by 1/4)    1.25, 2, 5, 8.75
                        Multiplying by 2 and adding 1         11, 17, 41, 71

MEASURES OF CENTRAL TENDENCY FOR DERIVED DISTRIBUTION

If measures of central tendency for a particular distribution are known, then measures of central
tendency for a distribution derived by addition or multiplication can be found directly.

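All three measures of central tendency are affected by the same operations that are used to
derive the distribution, whether it is addition, multiplication or a combination of these. For example,
if every value is multiplied by 2 and then increased by 1, the mean, median and mode are each multiplied
by 2 and then increased by 1. Identical operations must be performed on all original values.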
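This rule can be checked with a short Python sketch using the distribution 5, 8, 20, 35 and the operation "multiply by 2 and add 1" from the table above. The sketch uses the standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

original = [5, 8, 20, 35]
derived = [2 * x + 1 for x in original]        # multiply by 2, then add 1

# Found directly from the measures of the original distribution
mean_direct = 2 * mean(original) + 1
median_direct = 2 * median(original) + 1

# Found by working from the derived values themselves
mean_recalculated = mean(derived)
median_recalculated = median(derived)

print(derived, mean_direct, mean_recalculated, median_direct, median_recalculated)
```

Both routes give a mean of 35 and a median of 29, so the derived values never need to be listed in order to find their averages.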

EXERCISE

1. A set of values ( p, 13,18,29,29) has mode = 29, mean = 20, and median = 18. Without
calculating the value of p, calculate the three measures of central tendency for derived
distributions
(a) (p + 2), 15, 20, 31, 31
(b) (p – 5), 8, 13, 24, 24
(c) 2p, 26, 36, 58, 58
(d) p/2, 6.5, 9, 14.5, 14.5

2. Variable G has mode = 6.2, mean = 5.6 and median = 5.85. Find:
(a) The mean of G/5 – 1
(b) The median of 2(G + 5)
(c) The mode of 1.7 + 1.5G

3. Variable R, shown in the table, has mean = 4.8. Variables W and V are derived from R.
Variable R

R    3     4     5     6
f    a     b     c     d

Variable W

W    7     9     11    13
f    a     b     c     d

Variable V

V    2.0   2.5   3.0   3.5
f    a     b     c     d

(a) Find the mean of W
(b) Find the mean of V

4. The ungrouped distribution P shown in the table has a mean of 17.4

P 6 14 22 30 38
f a b c d e

Calculate an estimated mean for derived distributions W,X,Y

W 5 - 13 13 - 21 21 - 29 29 - 37 37 - 45
f a b c d e

X 0≤x˂4 4≤x˂8 8 ≤ x ˂ 12 12 ≤ x ˂ 16 16 ≤ x ˂ 20
f a b c d e

Y 6.8 – 8.4 8.4 – 25.2 25.2 – 27.6 27.6 – 44.4 44.4 – 46.8
f a b c d e

MEASURES OF DISPERSION FOR DERIVED DISTRIBUTIONS

All measures of dispersion are


 NOT AFFECTED BY ADDITION AND SUBTRACTION
 AFFECTED BY MULTIPLICATION AND DIVISION BY THE SAME FACTOR

EXAMPLES

EFFECT OF ADDITION AND SUBTRACTION ON THE RANGE

If the smallest value in a distribution is 3 and the largest value is 10 then the range is 10 – 3 = 7. If both
values are increased by 20 then the range is (10 + 20) – (3 + 20) = 30 – 23 = 7. The range was NOT AFFECTED. The
range will also not be affected if the same value is subtracted from both.

EFFECT OF MULTIPLICATION ON THE RANGE

If the smallest value, 3, is multiplied by 11, and the largest value, 10, is similarly multiplied by 11, the
range will be (10 x 11) – (3 x 11) = 110 – 33 = 77. The range has changed drastically! But note that the initial
range of 7 has also been multiplied by 11 to become 77. Division has the same kind of effect on the
range.

EFFECT ON THE STANDARD DEVIATION

If the range is affected by multiplication, and not by addition, we should expect that the standard deviation,
as a measure of dispersion, is similarly affected.

EXAMPLE

The tables below show values for a distribution X and the derived distribution 3X + 1

X       f     fX    fX²        3X + 1   f(3X + 1)   f(3X + 1)²
3       4     12    36         10       40          400
4       7     28    112        13       91          1183
5       6     30    150        16       96          1536
total   17    70    298        total    227         3119

Standard deviation of X is 0.758          Standard deviation of 3X + 1 is 2.274

The standard deviation of 3X + 1 is three times the standard deviation of X. The variance of 3X + 1
is therefore 3² = 9 times the variance of X. Adding 1 has no effect on either measure.

CHECK FOR YOURSELF BY USING FORMULAS.
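One way to carry out this check is sketched below in Python, using the frequency table for X above. The function name `sd_from_table` is illustrative.

```python
from math import sqrt

def sd_from_table(values, freqs):
    """Standard deviation from a frequency table: sqrt(sum fX^2 / N - (sum fX / N)^2)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return sqrt(sum_fx2 / n - (sum_fx / n) ** 2)

x_values = [3, 4, 5]
freqs = [4, 7, 6]

sd_x = sd_from_table(x_values, freqs)
sd_derived = sd_from_table([3 * x + 1 for x in x_values], freqs)   # distribution 3X + 1

print(round(sd_x, 3), round(sd_derived, 3), round(sd_derived / sd_x, 3))
```

The ratio of the two standard deviations is 3, and the ratio of the two variances is 9.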

EXERCISE

1. The distribution of a variable P has a range of 13 and a standard deviation of 5.4. Find the range
and standard deviation of the derived distributions
(i) P + 7     (ii) 3P
(iii) 5P + 4     (iv) P/5 + 5
2. Each value in a distribution is reduced by 25%. If the new standard deviation is 3.4, find the
original standard deviation.
3. The masses of a group of children have a mean of 46.9 kg and a standard deviation of
9.1 kg. Find the new mean and the new standard deviation if the mass of each child is reduced
by:
(i) 900 g
(ii) 5%
(iii) 900 g and then by 5%
4. A set of numbers 6, 10, t, 14, 15 has a mean M and a standard deviation S. Find the mean and the
standard deviation for the derived distributions
(i) 10, 14, t + 4, 18, 19
(ii) 18, 30, 3t, 42, 45
(iii) 2, 4, 0.5t – 1, 6, 6.5

PERCENTAGE INCREASES AND DECREASES SHOULD BE TREATED AS MULTIPLICATION, THEREFORE
THEY HAVE AN EFFECT ON MEASURES OF DISPERSION.

SCALING

STANDARD SCORES

Standardized scores are used to compare two sets of unrelated data. We cannot compare oranges and
apples unless we have a common language for them, that is, unless we standardize them. We give them
something in common (a z score). Only then can we make some judgment about them.

A Z SCORE SHOWS HOW MANY STANDARD DEVIATIONS A PARTICULAR VALUE IS FROM THE MEAN

[Figures: a normal curve (bell shaped distribution) and a z score curve]

A z score of 0 means a data value fell exactly at the mean. A positive z score means a value is above the
mean while a negative z score means a value is below the mean. For example, a z score of 2 means that
a particular value is 2 standard deviations above the mean. A z score of negative 3 means the
value is 3 standard deviations below the mean.

$$\text{STANDARD SCORE} = \frac{\text{OBSERVED VALUE} - \text{MEAN}}{\text{STANDARD DEVIATION}}$$

$$Z = \frac{X - \bar{X}}{S}$$

EXAMPLE

Katherine scores 75% in English and 62% in Mathematics. In which test did she do better? At face
value we might assume that it is English. However, we need more information about the two tests
to arrive at a conclusion. Suppose we are given that the mean for English is 70% and the standard deviation is 5,
while for Mathematics the mean is 50% and the standard deviation is 3.

The standard score for English

$$Z = \frac{X - \bar{X}}{S} = \frac{75 - 70}{5} = \frac{5}{5} = 1$$

The standard score for Mathematics

$$Z = \frac{X - \bar{X}}{S} = \frac{62 - 50}{3} = \frac{12}{3} = 4$$

These results indicate that, relative to the other students, Katherine actually did much better in Mathematics than in English.

However it is important to note that a high Z score does not always indicate high achievement.

 A lower Z score for the time taken to complete a task indicates that a particular participant was
faster, and therefore better, than the others.
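Katherine's two standard scores can be reproduced with a short Python sketch. The function name `z_score` is illustrative.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

z_english = z_score(75, 70, 5)   # English: mark 75, mean 70, standard deviation 5
z_maths = z_score(62, 50, 3)     # Mathematics: mark 62, mean 50, standard deviation 3

print(z_english, z_maths)
```

The larger z score (4 for Mathematics against 1 for English) marks the better performance here, because for test marks a higher value is better.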

EXERCISE

1. Given that the mean of a data distribution is 60 and the standard deviation is 8, write the
following values as standardized scores
i. 68     ii. 52     iii. 44     iv. 80     v. 34
2. Complete the following table in terms of X̄ and S

STUDENT     RAW MARK   STANDARD SCORE
PAT         68
NIKKY       52
JANE        44
FATIMA      80
PONTSHO     34

3. Complete the following table of standard scores using a mean of 50 and a
standard deviation of 16

STUDENT     RAW MARK   STANDARD SCORE
KABO        66
LORATO      34
TEBOGO      50
SHAUN       42
LESH        58
OPELO       90
MOREMI      26

SCALED MARKS
Scaling is a linear transformation of one set of numbers to another set that has a chosen mean and a
chosen standard deviation. This process can be used to compare different activities such as;

 Examination results
 Athletic races
 Test results

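A raw mark can be transformed to a scaled mark by using the formula

$$\frac{\text{raw mark} - \text{raw mean}}{\text{raw standard deviation}} = \frac{\text{scaled mark} - \text{scaled mean}}{\text{scaled standard deviation}}$$

$$\frac{X - \bar{X}}{s_x} = \frac{S_y - \bar{S}_y}{S_s}$$

where X is the raw mark, X̄ the raw mean, s_x the raw standard deviation, S_y the scaled mark,
S̄_y the scaled mean and S_s the scaled standard deviation.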

EXAMPLE

In a test, the raw mean was X̄ = 60 and the raw standard deviation was s_x = 10. A raw mark of 80 is to be
scaled to a mean of 65 and a standard deviation of 12. Find the scaled mark S_y.

$$\frac{X - \bar{X}}{s_x} = \frac{S_y - \bar{S}_y}{S_s}$$

$$\frac{80 - 60}{10} = \frac{S_y - 65}{12}$$

$$\frac{20}{10} = \frac{S_y - 65}{12}$$

24 = S_y – 65

89 = S_y
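The same scaling can be sketched in Python. The idea is that a mark keeps its standard score when it is scaled; the function name `scale_mark` is illustrative.

```python
def scale_mark(raw_mark, raw_mean, raw_sd, scaled_mean, scaled_sd):
    """Scale a raw mark so that its standard score is unchanged."""
    z = (raw_mark - raw_mean) / raw_sd       # standard score of the raw mark
    return scaled_mean + z * scaled_sd       # mark with the same z score on the new scale

scaled = scale_mark(80, 60, 10, 65, 12)
print(scaled)
```

A raw mark equal to the raw mean always scales to the scaled mean, which is a quick way to check a calculation.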

EXERCISE

1.

2.

BOX AND WHISKER PLOTS

A box and whisker plot is a summary plot based on the median and the inter quartile range. The box
contains the middle 50% of the values, and the whiskers extend to the highest and lowest values,
excluding outliers. A line across the box indicates the median.

DRAWING A BOX AND WHISKER PLOT

1. ARRANGE THE DATA IN NUMERICAL ORDER
2. FIND THE MEDIAN, LOWER QUARTILE AND UPPER QUARTILE
3. FIND THE LOWEST AND HIGHEST VALUES IN THE DISTRIBUTION

EXAMPLE

For the following ordered data distribution construct a box plot

3 5 5 6 6 7 8 10 11 12 12

The median Q2 is in the $\frac{n+1}{2} = \frac{11+1}{2} = 6$th position

The median is 7

The lower quartile is 5

The upper quartile is 11

The inter quartile range is 11 – 5 = 6

The least value is 3

The largest value is 12

The box plot can hence be represented as follows


CLASS EXERCISE

Represent the following in a box and whisker diagram

Question 1

a) 2 51 43 54 53 51 62 49 50 63 60
b) 45 58 34 42 52 49 50 45 51
c) 75 65 78 79 76 79 72 82
d) 110 98 91 102 89 75 108 118 152

Question 2

Use the IQR to identify any outliers in each of the following

a) 25 12 31 26 27 29 32
b) 35 46 50 32 54 44 60
c) 57 53 52 31 48 58 64 86 56 54 55
d) 34 42 45 45 49 50 51 52 58
e) 65 72 75 76 78 79 79 82

Question 3

Draw box and whisker plots for the above indicating outliers if any.

COMPARING DATA BY BOX AND WHISKER PLOTS

EXAMPLE

Two campus book stores are having a price war over their first year maths books. James, a first year
maths major, goes into each store to find the cheapest prices. He
looks at the prices of five randomly chosen maths course books in each store and collects the
following data.

Store A price    95    75    110   100   80
Store B price    120   60    89    84    100

Show this data in box and whisker plots on the same diagram.

                    Store A                       Store B
Ordered data        75, 80, 95, 100, 110          60, 84, 89, 100, 120
Minimum value       75                            60
Maximum value       110                           120
Q1                  (75 + 80)/2 = 77.5            (60 + 84)/2 = 72
Q2                  95                            89
Q3                  (100 + 110)/2 = 105           (100 + 120)/2 = 110
IQR = Q3 – Q1       105 – 77.5 = 27.5             110 – 72 = 38
OUTLIERS
LOWER BOUNDARY
Q1 – 1.5IQR         77.5 – 1.5(27.5) = 36.25      72 – 1.5(38) = 15
UPPER BOUNDARY
Q3 + 1.5IQR         105 + 1.5(27.5) = 146.25      110 + 1.5(38) = 167
                    NO OUTLIERS                   NO OUTLIERS

James should buy from store B, as the prices there are generally lower, as shown by the lower median value.
Neither data set has any outliers, but book prices in store B are more varied, as shown by the higher
IQR.
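The outlier boundaries used in this comparison can be sketched in Python. The quartiles are passed in as already calculated above; the function names are illustrative.

```python
def outlier_bounds(q1, q3):
    """Lower and upper boundaries: Q1 - 1.5 x IQR and Q3 + 1.5 x IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values lying outside the two boundaries."""
    low, high = outlier_bounds(q1, q3)
    return [v for v in values if v < low or v > high]

store_a = [95, 75, 110, 100, 80]
store_b = [120, 60, 89, 84, 100]

print(outlier_bounds(77.5, 105), find_outliers(store_a, 77.5, 105))
print(outlier_bounds(72, 110), find_outliers(store_b, 72, 110))
```

Both lists of outliers are empty, which agrees with the table.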


CLASS EXERCISE

BY CONSTRUCTING A DOUBLE BOX AND WHISKER PLOT, MAKE A REASONABLE COMPARISON BETWEEN
THESE DATA SETS.

1. EMPLOYEES ARRIVING AT WORK BY CAB OR THEIR OWN CAR. WHICH MODE OF TRANSPORT
IS MORE RELIABLE?

CAR    14   18   16   22   25   12   32   16   15   10
CAB    12   10   13   14   9    17   11   10   8    11

2. STUDENTS' MARKS IN TWO TESTS

          MIN VALUE   Q1   Q2   Q3   MAX VALUE
TEST 1    40          55   60   75   100
TEST 2    15          50   75   85   95

Using a scale of 2 cm to 10 units, construct a double box and whisker plot and make a comparison of the students'
performance.

SKEWNESS
One of the main reasons for drawing a stem and leaf diagram or a box and whisker plot is to quickly spot the trend
in the spread of the data.
SYMMETRICAL DISTRIBUTION
The plot is symmetrical. The box is in the middle and the whiskers are of equal length.

EXAMPLE 2

3.9 4.1 4.2 4.3 4.3 4.4 4.4 4.4 4.4 4.5 4.5 4.6 4.7 4.8 4.9 5.0 5.1

The minimum value is 3.9 while the highest value is 5.1

Q1 is $\frac{4.3 + 4.3}{2} = 4.3$ and Q3 is $\frac{4.7 + 4.8}{2} = 4.75$

IQR is 4.75 – 4.3 = 0.45

STEP DIAGRAM

Cumulative frequency diagrams for ungrouped data are sometimes referred to as STEP diagrams or
STEP POLYGONS because of their appearance.

EXAMPLE

The number of eggs laid each day by 7 hens for a period of 21 days was recorded in a table as
follows;

NUMBER OF EGGS F
4 1
5 4
6 7
7 5
8 4
total 21

GROUPED FREQUENCY TABLE

NUMBER OF EGGS F CF
X ˂4 0 0
4 ≤X˂5 1 1
5 ≤X˂6 4 5
6 ≤X˂7 7 12
7 ≤X˂8 5 17
8 ≤X˂9 4 21

COORDINATES (NUMBER OF EGGS, CUMULATIVE FREQUENCY)

(4,0) (4,1) (5,1) (5,5) (6,5) (6,12) (7,12) (7,17) (8,17) (8,21)

These points are then plotted on a pair of axes and joined, in order, by straight lines to form the steps.
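The list of coordinates can be generated from the frequency table with a short Python sketch. Each value contributes two points: one at the cumulative frequency reached before it and one at the cumulative frequency reached after it. The function name `step_coordinates` is illustrative.

```python
def step_coordinates(values, freqs):
    """Coordinates (value, cumulative frequency) for a step diagram."""
    coords = []
    cumulative = 0
    for value, f in zip(values, freqs):
        coords.append((value, cumulative))   # bottom of the vertical step
        cumulative += f
        coords.append((value, cumulative))   # top of the vertical step
    return coords

eggs = [4, 5, 6, 7, 8]
freqs = [1, 4, 7, 5, 4]
coords = step_coordinates(eggs, freqs)
print(coords)
```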

EXERCISE

Display the data given in a step diagram.

6 5 3 3 5 3 1 2 4 2 2 5 1 5 2 2 3 8
2 4 5 3 3 0 2 5 0 1 0 1 3 0 3 12 7 1

MEASURES OF ASSOCIATION
Correlation is a description of a relationship between variables. Some variables are related while others
are not. For example

 Is there a connection between passing and revising?
 Are shoe sizes related to the heights of students?
 Is body weight connected to height?

As we seek to answer these questions, we are simply trying to establish a CORRELATION between
variables.

TYPES OF CORRELATION

POSITIVE CORRELATION

The points slope upwards. The values of one set of data increase as the values of the other set increase. If the points
are very near the trend line, then we have a very strong positive correlation.

NEGATIVE CORRELATION

The points slope downwards. As the values of one set of data increase, the values of the other set decrease.
The variables move in opposite directions.

NO CORRELATION OR ZERO CORRELATION

The plotted points are scattered all over. The variables show no relationship.

INDEPENDENT AND DEPENDENT VARIABLES

If two variables are compared, the value of one of them is usually controlled by the value of the other.

 The controlled variable is the dependent variable


 The variable that is in control is the independent variable
 It is conventional to plot the independent variable on the horizontal axis of the scatter plot.

THE LINE OF BEST FIT/ REGRESSION LINE

Sometimes there is no straight line passing through the points but you can still draw the line of best fit
which comes as close as possible to fitting all the points. The closer the points are to the line, the
stronger the correlation.

DRAWING THE LINE OF BEST FIT (THE SEMI AVERAGE METHOD)

 Calculate the mean of the X values and the mean of the Y values to obtain the average point (X̄, Ȳ)
 Divide the points into two groups. Group 1 contains the points with X values less than X̄ while Group 2
contains the points with X values greater than X̄
 Calculate the arithmetic mean for group 1 to have (x1,y1), the first semi average.
 Calculate the arithmetic mean for group 2 to have (x2,y2), the second semi average.
 Draw the line of best fit. It should pass through (X̄, Ȳ) and as close as possible to the two semi
averages.
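The steps above can be sketched in Python as follows. The points are hypothetical, and the sketch assumes the groups are split on the X values and that no point has an X value exactly equal to the mean; the function name `semi_average_line` is illustrative.

```python
def mean_point(points):
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n

def semi_average_line(points):
    """Return the average point, the two semi averages, and the gradient and
    Y-intercept of the line through them."""
    mean_x, mean_y = mean_point(points)
    group1 = [(x, y) for x, y in points if x < mean_x]   # X below the mean
    group2 = [(x, y) for x, y in points if x > mean_x]   # X above the mean
    x1, y1 = mean_point(group1)                          # first semi average
    x2, y2 = mean_point(group2)                          # second semi average
    gradient = (y2 - y1) / (x2 - x1)
    intercept = mean_y - gradient * mean_x               # line passes through the average point
    return (mean_x, mean_y), (x1, y1), (x2, y2), gradient, intercept

# Hypothetical (X, Y) points, not taken from the exercises
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6), (6, 8)]
average, semi1, semi2, gradient, intercept = semi_average_line(points)
print(average, semi1, semi2, round(gradient, 2), round(intercept, 2))
```

When every point falls into one of the two groups, the average point lies exactly on the line joining the two semi averages, so a single straight line passes through all three.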

1. A group of students sat for two papers in science. The percentage marks for ten of the students were:

Student          A    B    C    D    E    F    G    H    I    J
Paper 1 (X %)    54   40   74   62   80   38   36   44   84   68
Paper 2 (Y %)    52   32   68   62   68   52   30   38   72   56

i) On graph paper, draw axes for X and for Y from 0 to 100 using 1 cm to represent a mark of 10%.
ii) Draw a scatter diagram for the data.
iii) Showing your working, calculate the average point and the two semi-average points.
iv) Plot the three average points onto your scatter diagram and draw a line of best fit through them.
v) Using the average point and one other point, find the gradient of the line to 2 decimal
places.
vi) Find, to the nearest integer, the value of the Y-intercept and write down the equation of the line of
best fit.
The point for student F is furthest from the line.
vii) What does this tell you about student F?
2. The recommended daily dose of a brand of wormer powder for domestic cats is given.

Mass of cat ( M kg) 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0

Daily dose (D g) 2.0 3.0 4.0 5.0 6.0 8.0 9.0 10.0 11.5 12.0 14.0 15.5

i) Which is the independent variable?


ii) On graph paper, draw axes for mass from 0 to 8 kg and for daily dose from 0 to 16 g.
iii) Draw a scatter diagram for these data.
iv) Find the co-ordinates of the average point for these data.
v) Explain why the line of best fit should pass through the point (0, 0).
vi) Plot the average point onto the scatter diagram and draw a line of best fit.
vii) Calculate the gradient of the line, giving your answer to 2 places of decimals, and write down the
equation of the line. [Use M and D, not X and Y]
viii) Use your equation to find the value of D when M = 50.
ix) Explain why the answer that you have obtained in viii) has no practical meaning.

3. The masses and volumes of 13 specimens of stone, collected from a dry riverbed, are given.
Specimen        A      B      C      D      E      F      G      H      I      J      K      L      M
Mass (g)        26.0   17.0   14.0   8.0    14.0   29.0   23.5   35.0   10.0   45.5   29.5   51.5   25.0
Volume (cm³)    15.0   22.0   7.5    11.0   18.0   16.0   30.5   20.0   14.0   25.0   38.5   29.0   33.5

i) Draw and label axes for volume horizontally from 0 to 40 cm3 and for mass vertically from 0 to 60g.
ii) Draw a scatter diagram to illustrate these data.
iii) Suggest a practical reason to explain why the points fall into two distinct groups.

iv) For each group of points, find the co-ordinates of the average point and plot these onto the
diagram.
The lines of best fit should intersect at a certain point.
v) Write down the co-ordinates of this point and explain why both lines should pass through it.
vi) Draw a line of best fit for each of the two groups of points.
vii) Calculate, correct to 2 decimal places, the gradients of the two lines and explain their meaning in
the context of this question.
4. A group of tourists from France visited Thailand on a holiday in June 2003.
The visitors exchanged different amounts of French Francs at various banks in Thailand.
The amount that each exchanged, in Francs, and the amount that each received, in Thai Baht, were:

Tourist       Jacques   Emile   Danny   Xavier   Fifi    Eugene   Marie   Edith   Andre   Clement
Baht (B)      2800      4610    7150    11240    15000   4960     5700    12945   10300   13600
Francs (F)    400       650     1000    1530     2000    700      820     1780    1440    1860

i) Draw an axis for the independent variable using 1 cm to represent 100 units and an axis for the
dependent variable using 1 cm to represent 500 units.
ii) Draw a scatter diagram to illustrate the data.
iii) By calculating appropriate average values and plotting, draw a line of best fit onto the scatter
diagram.
iv) Calculate the gradient of your line of best fit and explain its meaning in the context of this question.
v) Estimate how many Francs would have been needed to purchase 58 000 Thai Baht in June 2003.

INDEX NUMBERS

WEIGHTED AVERAGES
It is much easier to work with a single set of data than with a combination of two or more sets of
related data. The question that arises is whether these sets of data are equally significant or whether one is
more significant than the other. When sets of data are combined, measures of average such as the mean are
WEIGHTED (that is, the relative sizes or importance of the sets of data are taken into consideration).

COMBINING SETS OF DATA

The most commonly used measure of average is the mean which is found by dividing the sum of values
by the number of values. If two sets of data are combined and carry different weights then the weights
have to be used to find the final mean.

FOR EXAMPLE

Find the mean age of a 40 year old man and his 10 year old twins.

The answer is not

$$\frac{40 + 10}{2} = 25$$

BUT RATHER

$$\frac{(1 \times 40) + (2 \times 10)}{1 + 2} = 20$$

The ages 40 and 10 have weights 1 and 2 respectively.

In summary, for a set of weighted numbers

$$\text{Weighted Average} = \frac{\sum (\text{weight} \times \text{number})}{\sum (\text{weights})}$$
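The weighted average formula can be sketched in Python using the ages from the example above. The function name `weighted_average` is illustrative.

```python
def weighted_average(numbers, weights):
    """Sum of (weight x number) divided by the sum of the weights."""
    total = sum(w * x for x, w in zip(numbers, weights))
    return total / sum(weights)

# One man aged 40 (weight 1) and twins aged 10 (weight 2)
mean_age = weighted_average([40, 10], [1, 2])
print(mean_age)
```

If all the weights are equal, the weighted average is the same as the ordinary mean.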

Exercise

1. Use weights of 3 and 2 respectively to find a weighted average of 16 and 21.

2. Use weights of 11 and 14 respectively to find a weighted average of 2.4 and 3.8

3. In a business studies examination, which was taken by 382 form 4 students and 418 form 5 students,
the form 4s obtained a mean score of 45% and the form 5s obtained a mean score of 62%. Calculate, to
1 decimal place, the mean score of all the students who took the examination.

4. A group of 134 girls and 166 boys sat for a literature in English examination. The boys obtained a
mean score of 68.5% and the mean score for all the students was 64.35%. Calculate the mean score for
the girls.

5. Students sat two mathematics papers, A and B. The teacher decided that paper B was twice as
important as paper A and so calculated the students’ weighted average scores by assigning a weight of 1
to paper A and a weight of 2 to paper B. The table below shows some of the scores of four students

Name      Score on paper A   Score on paper B
Daniel    62                 47
Eva       38                 77
Aesop                        60
Rina      55

i) Find the weighted average score awarded to a) Daniel and b) Eva.

ii) Aesop was awarded a weighted average score of 56. Find his score on paper A.

iii) Rina was awarded a weighted average score of 85. Find her score on paper B.

6. A boy made a journey on foot and then by bicycle.

He walked for 1 hour 30 minutes at an average speed of 6 km/h and then cycled 12 km at 16 km/h.

i) How far did he walk?

ii) For what length of time did he cycle?

iii) Calculate his average speed for the whole journey.

INDEX NUMBERS

Index numbers are used to show proportional changes in values and costs over a period of time. They
are used in relation to:

 Prices of goods and services


 Cost of living
 Costs involved in running a business
 Values of stocks and shares

PRICE RELATIVES

Price relatives are index numbers showing proportional changes in the prices of items between years.
An index number of 100 is assigned to the price of an item in a chosen base year and all past and future
prices for that item are given index numbers relative to 100.

KEY POINTS

 The price of an item and its price relative are always in the same proportion.
 Price relatives show the percentage change in the price of an item since the base year.

$$\frac{\text{PRICE RELATIVE IN YEAR A}}{100} = \frac{\text{PRICE IN YEAR A}}{\text{PRICE IN BASE YEAR}}$$

OR

$$\text{PRICE RELATIVE IN YEAR A} = \frac{\text{PRICE IN YEAR A}}{\text{PRICE IN BASE YEAR}} \times 100$$

EXAMPLES

1. A bottle of shampoo cost $4.00 in 2018. In 2019 the price had increased to $4.60. Find the price
relative for 2019 based on the price in 2018.

$$\text{PRICE RELATIVE IN YEAR A} = \frac{\text{PRICE IN YEAR A}}{\text{PRICE IN BASE YEAR}} \times 100$$

$$\text{PRICE RELATIVE IN 2019} = \frac{\$4.60}{\$4.00} \times 100 = 115$$

Note, the price relative of 115 shows that the price of the item increased by 15% between 2018 and
2019.

2. The table below shows bus fares from Gaborone to Harare for the years 2015, 2016 and 2017

YEAR COST ($) PRICE RELATIVE


2015 500 100
2016 485 a
2017 b 108

(i) Find the price relative of 2016 based on the price of 2015
(ii) Find the price in 2017 using 2015 as a base year
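Both directions of the price relative calculation can be sketched in Python: from a price to its price relative, and from a price relative back to a price. The function names are illustrative, and the figures are those of the two examples above.

```python
def price_relative(price, base_price):
    """Price relative = price in year A / price in base year x 100."""
    return price / base_price * 100

def price_from_relative(relative, base_price):
    """Price in year A = price relative / 100 x price in base year."""
    return relative / 100 * base_price

shampoo_2019 = price_relative(4.60, 4.00)      # example 1
fare_relative_2016 = price_relative(485, 500)  # example 2 (i): the value a
fare_2017 = price_from_relative(108, 500)      # example 2 (ii): the value b

print(round(shampoo_2019, 1), round(fare_relative_2016, 1), round(fare_2017, 2))
```

For the bus fares this gives a price relative of 97 for 2016 (a fall of 3% since 2015) and a fare of $540 in 2017.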

EXERCISE

1. In 2001 a coat cost £80. The price relatives, based on the 2001 price, were 110 in 2002 and 115 in
2003. Find the price of the coat in i) 2002, ii) 2003.

2. The table shows the prices of a computer over a three-year period.

Year 2001 2002 2003

Price (£) 1000 1050 910

i) Using 2001 as the base year, find a) the price relative for 2002 b) the price relative for 2003.

ii) Using 2001 as the base year, the price relative for the computer was 87.5 in 2004.What was the
price of the computer in 2004?

3. The price relatives of three types of fuel are given for the years 1993, 1998 and 2003.

            1993   1998    2003
Petrol      100    112.5   135
Diesel      100    109     109
Paraffin    100    105     106

i) What is the significance of the three 100s in the column for 1993?

ii) What is the significance of the two price relatives of 109 in the row for diesel?

iii) In 1993 one unit of paraffin cost £2; how much did one unit of paraffin cost in 1998?

iv) In 1998 one litre of petrol cost £0.81; how much did one litre of petrol cost in 1993?

v) Calculate the cost of 5 litres of petrol in 2003.

SIMPLE COMBINED INDEX

A simple combined index, commonly called a COST OF LIVING INDEX, can be calculated to compare
the cost of items in one year with the cost of the same items in the base year.

SIMPLE COMBINED INDEX FOR YEAR A = (TOTAL COST IN YEAR A / TOTAL COST IN BASE YEAR) × 100

The index will indicate a percentage change in the cost of the items. This will only be relevant if exactly
the same items are bought in the two years.

EXAMPLE

The table shows the costs and quantities of pens, pencils and erasers used by a student in the two years
2015 and 2016.

Item     Average number purchased (weight)   Cost per item in 2015   Cost per item in 2016
Pen      25                                  80 cents                95 cents
Pencil   12                                  35 cents                45 cents
Eraser   6                                   25 cents                30 cents

Based on 2015, the combined index for 2016 = (TOTAL COST IN 2016 / TOTAL COST IN BASE YEAR) × 100

= [(25 × 95) + (12 × 45) + (6 × 30)] / [(25 × 80) + (12 × 35) + (6 × 25)] × 100

= (3095 / 2570) × 100

= 120.4 (the index has no units)

The index shows that the cost increased by 20.4% between the two years, but only if the same quantity of
items is bought in the two years.
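
As a check on the stationery example, here is a minimal Python sketch. The dictionary layout and the function names total_cost and simple_combined_index are assumptions made for illustration; the quantities and costs (in cents) are those in the table above.

```python
# Quantities (weights) and unit costs in cents, taken from the example table.
quantities = {"pen": 25, "pencil": 12, "eraser": 6}
cost_2015 = {"pen": 80, "pencil": 35, "eraser": 25}
cost_2016 = {"pen": 95, "pencil": 45, "eraser": 30}


def total_cost(quantities, unit_costs):
    """Total cost of buying the given quantities at the given unit costs."""
    return sum(quantities[item] * unit_costs[item] for item in quantities)


def simple_combined_index(quantities, base_costs, year_costs):
    """(Total cost in chosen year / total cost in base year) x 100."""
    return total_cost(quantities, year_costs) / total_cost(quantities, base_costs) * 100


print(total_cost(quantities, cost_2016))   # 3095
print(total_cost(quantities, cost_2015))   # 2570
print(round(simple_combined_index(quantities, cost_2015, cost_2016), 1))  # 120.4
```

The same quantities are used for both years, which is exactly the assumption the index relies on.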

EXERCISE

1.Last year Tsaone kept 3 dogs, 7 cats and 25 tropical fish at her house as pets. The annual cost of
feeding a dog was P800, a cat was P350 and a tropical fish was P10.

i) Calculate how much it cost her to feed all of her pets last year.

This year the annual cost of feeding a dog is P1000, a cat is P400 and a tropical fish is P15.

ii) Calculate the amount that you would expect Tsaone to spend on pet food this year.

iii) Use your answers in i) and ii) to find a simple combined index, to 1 decimal place, for the costs this
year, based on last year’s costs.

iv) What assumptions have you made in calculating the index?

2. To keep his car in good condition, a man spent the following amounts in 2003: P200 on servicing the
car every 3 months; P40 on cleaning every month; one new tyre (costing P500 each) every 4 months.

i) Calculate the total amount that he spent during 2003.

In 2004 the cost of servicing the car had increased by 10%; the cost of cleaning had increased by 20%

and a new tyre cost P575.

ii) For 2004 write down a) the cost of each service, b) how much he paid each time the car was
cleaned.

iii) Hence find the amount that you would expect him to pay altogether in 2004.

iv) Use your answers to i) and iii) to find a ‘cost of car-care’ index for the man in 2004, based on 2003.

v) What three assumptions have you made in calculating the index?

3. An average family buys 150 kg of potatoes, 60 kg of cabbages and 85 kg of onions per year.

The price of 1 kg of each item is given (in £) for 2001, 2002 and 2003.

Item       2001   2002   2003
Potato     0.28   0.34   0.40
Cabbage    0.18   0.22   0.25
Onion      0.32   0.35   0.38

i) Using 2001 as the base year, find, to 1 decimal place, a simple combined index for these items in
a) 2002,
b) 2003.

ii) Find a simple combined index for the items in 2003 using 2002 as the base year.

The predicted simple combined index for 2005, using 2001 as the base year, is 150.

For 2005 the predicted cost for 1 kg potatoes is £0.46 and for 1 kg cabbage it is £0.28.

iii) Calculate, to 2 decimal places, the predicted cost of 10 kg of onions in 2005.

WEIGHTED AGGREGATE INDEX

Costs are made up of spending on a combination of items. An aggregate index is a weighted average of
the price relatives of a combination of items. Suitable weights can be found from base year quantities or
from base year expenditure. An aggregate index for any chosen year will only be accurate if the base
year weights are valid in the chosen year.

Aggregate Index = ∑ (weight × price relative) / ∑ (weights)

EXAMPLE

Jane runs a small business from a small office. Last year her business costs were:

 Office rental at $150 per month
 2400 units of electricity at $0.50 per unit
 7500 telephone units at $0.40 per unit

This year:

 Office rental has increased by $9 per month
 Electricity prices have increased to $0.55 per unit
 Telephone charges have increased by $0.02 per unit

To calculate an aggregate index for Jane we need

(i) Suitable weights for the three items


(ii) Price relatives for the three items

Weights based on her expenditure are:

Office rental = $150 × 12 = $1800

Electricity = $0.50 × 2400 = $1200

Telephone = $0.40 × 7500 = $3000

The weights are 1800 : 1200 : 3000 = 3 : 2 : 5

Price relatives are:

Office rental = (159 / 150) × 100 = 106

Electricity = (55 / 50) × 100 = 110

Telephone = (42 / 40) × 100 = 105

Aggregate Index = ∑ (weight × price relative) / ∑ (weights)

Aggregate Index = [(3 × 106) + (2 × 110) + (5 × 105)] / (3 + 2 + 5)

AGGREGATE INDEX = 106.3
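
Jane's index can be reproduced with the sketch below. The function name aggregate_index is an assumption for illustration. It also shows that the simplified weights 3 : 2 : 5 and the unsimplified expenditure weights give the same answer, since only the ratio of the weights matters.

```python
def aggregate_index(weights, price_relatives):
    """Weighted average of price relatives: sum(w x r) / sum(w)."""
    weighted_sum = sum(w * r for w, r in zip(weights, price_relatives))
    return weighted_sum / sum(weights)


# Price relatives for office rental, electricity and telephone.
relatives = [106, 110, 105]

simplified = aggregate_index([3, 2, 5], relatives)
unsimplified = aggregate_index([1800, 1200, 3000], relatives)

print(round(simplified, 1))     # 106.3
print(round(unsimplified, 1))   # 106.3
```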

1. Calculate, correct to 2 decimal places, an aggregate index using i) Price relatives of 116, 103 and 96
with weights of 15, 7 and 8, respectively. ii) Price relatives of 112.5, 88.75 and 146 with weights of 11, 23
and 16 respectively. iii) Price relatives of 92, 105.5, 117 and 98 with weights of 25, 40, 23 and 12,
respectively.

2. The prices of three items in 2003 and 2004 are shown.

Item   Price (£) in 2003   Price (£) in 2004
A      10.00               12.00
B      8.50                9.52
C      6.00                5.76

i) Calculate the price relative for each item in 2004, using 2003 as the base year.

ii) Using weights of 5 for item A, 11 for item B and 4 for item C, calculate an aggregate
index for 2004, using 2003 as the base year.

3. During 2003 a primary school’s budget was spent on salaries, equipment and maintenance materials.

The amounts spent were £100 000, £25 000 and £12 500, respectively.

i) Suggest, using the figures given above, suitable weights that could be used to calculate an aggregate

index for the cost of running the school. Give the weights as a simple ratio.

In 2004 all school staff received salary increases of 9%, the cost of equipment rose by 13% and the cost

of maintenance materials decreased by 2%.

ii) Write down the price relatives for the three items in 2004, based on 2003 prices.

iii) Using your answers to i) and ii), calculate an aggregate index for running the school in 2004, using
2003 as the base year. Give your answer correct to 2 decimal places.

iv) The cost of running the school in 2003 was £137 500.

Use the index that you have calculated to find an estimate of the cost of running the school in 2004.

There are several reasons why the index that you have calculated may not give an accurate reflection

of the cost of running the school in 2004. The reasons are that, in calculating, certain assumptions have
been made. For example: the school may not have undertaken the same amount of maintenance work
in 2004 as in 2003 and so may not have purchased the same quantity of maintenance materials.

v) Suggest two more detailed reasons why the index may not be accurate.

4. In order to calculate an aggregate cost of housing index, a woman used her housing expenses in 2003
as the base for her calculations. In 2003 she spent the following: £480 on maintenance and £280 per
month on mortgage repayments. She also used 1280 units of electricity at £0.75 per unit.

i) Calculate the amount that she spent on a) Mortgage repayments in 2003.


b) Electricity in 2003.

ii) Use the information given and your answers to i) to suggest, in simplified form, suitable weights
that the woman could use to calculate an aggregate cost of housing index.

From 2003 to 2004 the cost of each maintenance job increased by 12%, her monthly mortgage
repayments decreased by £8.40 and the cost of one unit of electricity increased to £0.81.

iii) Calculate the price relatives for maintenance, mortgage repayments and electricity in 2004, using
2003 as the base year.

iv) Use the weights and price relatives in ii) and iii) to find an aggregate cost of housing index for 2004.

v) If the woman used only 1250 units of electricity in 2004 and if the index that you have calculated is
actually correct, how much did she spend on maintenance in 2004?

CRUDE AND STANDARDISED RATES

A rate measures one quantity in relation to another. The second quantity is most commonly a length of time or
the size of a population. It is important to know about population rates as these assist governments
and local authorities in areas such as

 Public health
 Policing and employment
 Allocation of resources to where they are needed

Examples of rates that can be calculated are

 Death rates per thousand per year (mortality rate)
 Incidence of diseases such as malaria per ten thousand per year
 Unemployment per hundred of the working population
 Birth rates per thousand of the female population per year (fertility rate)
 Defective items per thousand items in a factory

CRUDE RATES
These are simple rates that give the total number of events occurring in a population without reference
to the individuals or sub groups (strata) within the population. Crude birth rate and death rate measure
the number of births and deaths per thousand (‰) with no reference to the ages of those in the
population.

CRUDE BIRTH RATE / FERTILITY RATE = (NUMBER OF BIRTHS / ORIGINAL POPULATION) × 1000‰

CRUDE DEATH RATE / MORTALITY RATE = (NUMBER OF DEATHS / ORIGINAL POPULATION) × 1000‰

NOTE, crude birth rates can be calculated per thousand of the female population.

Assuming no immigration and emigration:

New population = original population + number of births – number of deaths

Example

A city has 125 000 people at the start of the year. There were 1425 deaths during the year. Calculate the
crude death rate.

CRUDE DEATH RATE = (1425 / 125 000) × 1000 = 11.4‰
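
The crude rate formula translates directly into code. In this sketch the function name crude_rate and its per parameter are illustrative assumptions; per defaults to 1000 because the notes measure birth and death rates per thousand.

```python
def crude_rate(events, population, per=1000):
    """Number of events per `per` members of the original population."""
    return events / population * per


# City example: 125 000 people at the start of the year, 1425 deaths.
city_death_rate = crude_rate(1425, 125_000)
print(round(city_death_rate, 1))   # 11.4 (per thousand)
```

Changing per to 100 or 10 000 gives rates such as unemployment per hundred or disease incidence per ten thousand.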

EXERCISE

1. The population of a small village was 4400 and during that year 33 people died. Find the crude death
rate.

2. Last year, in a small town whose population numbered 20 000, the death rate was 5.6‰.

How many people died in this town during the year?

3. In a location where the crude death rate was 7.44‰, there were 125 deaths last year. Find, to the
nearest hundred, the population of the location at the beginning of last year.

4. During 2003 the crude death rate of a city, in which 5100 people died, was 8‰. The crude birth rate in
the city was 12‰.

i) Find the population of the city at the beginning of 2003.

ii) Assuming there was no migration in or out of the city, find the population at the end of 2003.

5. The table shows information about the three classes of employees at two chemical companies,
Fertilog and Drainmaster. Crude accident rates are measured per thousand. The crude accident
rates for class C employees in the two companies were identical.

        FERTILOG                                               DRAINMASTER
CLASS   NO. EMPLOYEES   NO. ACCIDENTS   CRUDE ACCIDENT RATE    NO. EMPLOYEES   NO. ACCIDENTS   CRUDE ACCIDENT RATE
A       10              A               100                    B               3               150
B       24              2               C                      55              11              D
C       64              4               E                      F               8               E

Calculate the exact values of A, B, C, D, E and F.

STANDARDISED RATES
A rate becomes standardized when the crude values used in its calculation are weighted. In the case of
birth and death rates this is done by taking into account the different age groups that exist in the
population. The weights are called standard population figures, and they reflect the proportion of each
age group in the population.

Standard death rates give measures of the healthiness of different environments. The lower the
standardized death rate the healthier the environment.

STANDARDISED RATE = ∑ (standard population × group crude rate) / ∑ (standard population)

Example

AGE GROUP   GROUP POPULATION   NO. DEATHS   GROUP CRUDE DEATH RATE (‰)   STANDARD POPULATION
UNDER 20    6000               36           6                            30%
20 TO 60    7000               56           8                            50%
OVER 60     2000               34           17                           20%
TOTALS      15000              126                                       100%

The table above gives information about the village of Kwena. The standard population figures refer to the whole
country in which Kwena is located. Find the standardised death rate per 1000 of the population.

CRUDE DEATH RATE = (NUMBER OF DEATHS / ORIGINAL POPULATION) × 1000‰

= (126 / 15 000) × 1000 = 8.4‰

STANDARDISED DEATH RATE = ∑ (standard population × group crude death rate) / ∑ (standard population)

= [(30 × 6) + (50 × 8) + (20 × 17)] / (30 + 50 + 20)

= 920 / 100 = 9.2‰
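
The Kwena figures can be verified with the following sketch. The list-of-dictionaries layout and the function names crude_rate and standardised_rate are assumptions chosen for illustration; the data are those in the table above, with the standard population given as percentages.

```python
# One entry per age group: population, deaths, standard population (%).
kwena = [
    {"group": "Under 20", "population": 6000, "deaths": 36, "standard": 30},
    {"group": "20 to 60", "population": 7000, "deaths": 56, "standard": 50},
    {"group": "Over 60", "population": 2000, "deaths": 34, "standard": 20},
]


def crude_rate(deaths, population, per=1000):
    """Deaths per `per` members of the population."""
    return deaths / population * per


def standardised_rate(groups):
    """Sum(standard population x group crude rate) / sum(standard population)."""
    weighted = sum(
        g["standard"] * crude_rate(g["deaths"], g["population"]) for g in groups
    )
    return weighted / sum(g["standard"] for g in groups)


total_deaths = sum(g["deaths"] for g in kwena)
total_population = sum(g["population"] for g in kwena)

print(round(crude_rate(total_deaths, total_population), 1))   # 8.4
print(round(standardised_rate(kwena), 1))                     # 9.2
```

The standardised rate is higher than the crude rate here because the standard population gives more weight to the over 60 group than Kwena's own population does.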

1. The table below gives information on the populations of two towns, Northside and Southlake.

            Northside                   Southlake
Age group   Population   No. deaths     Population   Group death rate (‰)   Standard population
0 - 20      24 500       196            18 000       9                      30%
20 - 50     19 500       234            22 000       11                     30%
50 - 75     16 500       231            11 500       14                     25%
Over 75     9 000        147            18 000       15                     15%

i) Calculate the crude death rate for Northside. ii) Calculate a standardized death rate for Southlake.
iii) Calculate the crude death rate for Southlake. iv) Calculate a standardised death rate for
Northside. v) Giving a reason, state which of the two towns appears to be a healthier place in
which to live.

2. The data in the table below relate to the coastal towns of Blymouth and Mounton.

             Blymouth                                     Mounton
Age group    Population   No. deaths   Death rate (‰)     Population   Standard population
0 - 29       11000        99           a                  8000         42%
30 - 49      b            45           6                  7000         25%
50 or over   5500         c            12                 5000         33%

i) Find each of the numbers represented by the letters a, b and c. ii) Calculate the crude death rate of
Blymouth. iii) Calculate a standardized death rate for Blymouth.

The crude death rates for the total populations of Mounton and Blymouth are identical.

iv) Calculate the total number of deaths that occurred in Mounton.

In Mounton, the death rate of the under 30s is 8.5‰ and there were 110 deaths in the under 50s age
group.

v) Calculate a standardised death rate for Mounton.

3. FERTILITY RATES IN BLYMOUTH AND MOUNTON

              BLYMOUTH                                      MOUNTON
AGE GROUP     POPULATION   NO. OF BIRTHS   FERTILITY RATE   POPULATION   STANDARD POPULATION
0 – 19        11 000       99              A                8000         42%
20 – 29       B            450             60               7000         25%
30 AND OVER   5500         C               12               5000         33%

(i) Find each of the numbers A,B and C.


(ii) Calculate the crude fertility rate of Blymouth
(iii) Calculate a standardized fertility rate for Blymouth.

The crude fertility rate for the population of Mounton is twice that of Blymouth.

(iv) Calculate the total number of births that occurred in Mounton.


(v) Calculate a standardized fertility rate for Mounton, where the under – 20 fertility rate is
18.0‰ and there were 756 births in the under 30 age group.

4.

5. A family wished to investigate changes in their cost of living. They
chose five items, as given in the table below, from a normal week’s
groceries, and recorded the price per unit of each item every three
months for a year. The price relatives obtained, taking the prices on
January 1st as base, are given in the following table, together with the
weights for each item.
Item     Weight   March 31st   June 30th   September 30th   December 31st
Meat     6        106          108         107              109
Bread    4        103          104         105              107
Milk     5        103          109         110              113
Coffee   2        105          107         109              110
Tea      3        102          105         107              106
(i) (a) Calculate a simple average of relatives index for December 31st, taking
January 1st as base.
(b) State one disadvantage of using this as an index number.
(ii) Calculate, to the nearest integer, a weighted aggregate price index for
December 31st, using January 1st as a base.

PROBABILITY AND EXPECTATION
Probability can be defined as the numerical value, 0 ≤ P ≤ 1, that represents
the likelihood of a given event occurring. Probability refers to chance. The
probability of a single event occurring (or of two or more events) may be
theoretical or based on observations of what has happened in the past. In
probability an EVENT refers to something that takes place. An OUTCOME is a
result of an event. The SAMPLE SPACE, often denoted ‘S’, is the set of all
possible outcomes of an experiment. The study of probability is important
because it is used in fields such as:
 Insurance and risk management companies
 Stock market investors
 Speculators
 Weather forecasting
 Games ( like lottery and casino)
PROBABILITY SCALE
The probability of an event is measured on a scale of 0 to 1. The number assigned to an event E
is known as the probability of E, written P(E), and takes values 0 ≤ P(E) ≤ 1. In addition:

 If E is impossible then P(E) = 0
 If E is certain then P(E) = 1
 Intermediate values of P(E) have a natural interpretation: P(E) = 0.5 means E is equally likely to
happen or not happen, P(E) = 0.0001 means E is very unlikely to happen (close to 0, impossible) and
P(E) = 0.9999 means E is very likely to happen (close to 1, certain)

0                    1/2                    1

Impossible     likely     very likely     certain

The probability of an event, written as P(OUTCOME) can be given as a fraction, decimal or percentage.

P (OUTCOME) = NUMBER OF EVENTS FAVOURABLE TO THAT OUTCOME / TOTAL NUMBER OF OUTCOMES

EXAMPLE.

A fair die is rolled. What is the probability of getting (i) an even number, (ii) a number less than 3, (iii) a 4
or a 5?
Possible outcomes are 1, 2, 3, 4, 5 and 6.

(i) P(EVEN NUMBER) = 3/6 = 1/2
(ii) P(NUMBER LESS THAN 3) = 2/6 = 1/3
(iii) P(4 OR 5) = 2/6 = 1/3
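
The die example can be checked by counting favourable outcomes in the sample space. This sketch uses Python's standard fractions module so that answers stay as exact fractions; the helper name probability is an assumption for illustration. Note that only two of the six numbers (1 and 2) are less than 3.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]   # equally likely outcomes of a fair die


def probability(is_favourable, outcomes):
    """Favourable outcomes divided by total outcomes (equally likely case)."""
    favourable = [o for o in outcomes if is_favourable(o)]
    return Fraction(len(favourable), len(outcomes))


p_even = probability(lambda n: n % 2 == 0, sample_space)
p_less_than_3 = probability(lambda n: n < 3, sample_space)
p_4_or_5 = probability(lambda n: n in (4, 5), sample_space)

print(p_even, p_less_than_3, p_4_or_5)   # 1/2 1/3 1/3
```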

EQUALLY LIKELY AND RANDOM SELECTION

Some events produce a set of equally likely outcomes. Each of the equally likely outcomes has the same
chance of occurring, so their probabilities are equal. Obtaining a HEAD or a TAIL when a coin is tossed
are two equally likely outcomes. Scoring 1, 2, 3, 4, 5 or 6 when a fair die is rolled are six equally likely
outcomes.

Randomly selecting a SPADE, CLUB, HEART or DIAMOND from a pack of 52 playing cards are 4 equally
likely outcomes. The aim of selecting objects or people randomly is to give each particular object or
person the same chance of being selected, thus avoiding bias or favoritism. If X objects are randomly
selected (without replacement) from N objects, then the probability of selecting any particular object is
X / N, where X is the number of events and N is the total number of possible outcomes.

P (OUTCOME) = X / N

EXAMPLE
One student is selected at random from a group of 12 boys and 8 girls.

OUTCOME                          PROBABILITY
Selecting a particular student   1/20
Selecting a particular boy       1/20
Selecting a particular girl      1/20
Selecting a boy                  12/20
Selecting a girl                 8/20
EXHAUSTIVE OUTCOMES

The outcomes of an event are said to be exhaustive if they describe a complete set of possible results.
Some examples of exhaustive outcomes are:

EVENT EXHAUSTIVE OUTCOME


Tossing a coin Head or tail
Rolling a die 1,2,3,4,5,6
Selecting a card from a pack Heart, not heart
Rolling a normal die Even number, odd number
Rolling a normal die 1,2 and more than 2

THE SUM OF ALL PROBABILITIES OF A SET OF EXHAUSTIVE OUTCOMES = 1

For any event, one of the exhaustive outcomes is certain. From the examples given in the table above:

(i) P(HEAD) + P(TAIL) = 1/2 + 1/2 = 1
(ii) P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1
(iii) P(HEART) + P(NOT HEART) = 13/52 + 39/52 = 1
(iv) P(EVEN NUMBER) + P(ODD NUMBER) = 3/6 + 3/6 = 1
(v) P(1) + P(2) + P(MORE THAN 2) = 1/6 + 1/6 + 4/6 = 1

FOR EXHAUSTIVE OUTCOMES P (A) + P (NOT A) = 1

EXAMPLE

If the probability that a student will be late for class is 0.3, then it follows that the probability that the
student will not be late for class is 0.7.

P (late) = 0.3

Then P (not late) = 1 – P (late)

= 1 – 0.3 = 0.7

An outcome and its opposite (A and NOT A) are often referred to as complementary events. Either the event
occurs or it does not.

EXERCISE

1. A class list contains the names of 15 boys and 20 girls. A teacher randomly selects a name from
the list.
(i) How many names are on the list?
(ii) What is the probability that a particular student is selected?

(iii) What is the probability that a particular boy is selected?
(iv) Find as a simple fraction the probability that a teacher selects a boy.
(v) Is the teacher more likely to select a boy or girl?
2. The probability that it rains on any particular day in Moistville is 0.66.
(i) What is the probability that it does not rain on any particular day?
(ii) On how many days is rain not expected in Moistville in 30 days?
3. A bag contains 12 coloured balls of which three are red, four are blue and five are green. A girl
selects one ball at random from the bag.
(i) What is the probability that a particular ball is selected?
(ii) What is the probability that a particular red ball is selected?
(iii) Find the probability that the selected ball is (a) red (b) blue (c) green (d) not red

GENERALLY: AND = ×, OR = +

FOR EXHAUSTIVE OUTCOMES P (A) + P (NOT A) = 1

VENN DIAGRAMS AND BASIC SETS THEORY

Venn diagrams use regions to represent sets of outcomes that are favorable to particular events.
Outcomes favorable to a particular event are shown inside the region labeled for that event. Outcomes
favorable to both events are shown in the region where the events overlap (see diagram 3). Outcomes
favorable to neither event are shown in the region outside the sets. (See diagram 4)

TRIALS AND EXPECTATION
If an event is repeated, we can estimate the number of times that any outcome will occur. Each repetition of
the event is a trial. If there are N trials then outcome A is expected to occur N × P(A) times.
EXAMPLE

A fair die is rolled 120 times. How many times do we expect to obtain

(i) a six
(ii) a square number
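
The expectation rule N × P(A) can be applied to this example as follows. This is a minimal sketch; the function name expected_count is an assumption for illustration. The square numbers on a die are 1 and 4, so P(square number) = 2/6.

```python
from fractions import Fraction


def expected_count(trials, probability):
    """Expected number of occurrences = number of trials x P(outcome)."""
    return trials * probability


p_six = Fraction(1, 6)
p_square = Fraction(2, 6)   # favourable results are 1 and 4

print(expected_count(120, p_six))      # 20
print(expected_count(120, p_square))   # 40
```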

EXERCISE

SINGLE EVENTS
MUTUALLY EXCLUSIVE AND NON-MUTUALLY EXCLUSIVE OUTCOMES OF A SINGLE EVENT

When looking at a single event, it may be useful to know the probability that “either this or that”
happens, that is the probability of outcome A or outcome B, written P(A or B). The rule

P(A or B) = P(A) + P(B)

is misleading because it only applies when the outcomes are mutually exclusive. If the outcomes are not
mutually exclusive, then this rule cannot be applied.

OUTCOMES OF A SINGLE EVENT ARE MUTUALLY EXCLUSIVE IF THEY HAVE NO COMMON FAVOURABLE
RESULT

The common favorable result of two outcomes must not be counted twice when looking for the total
number of favorable results.

EXAMPLE

The table below shows the number of animals Edwin has at the cattle post;

sheep goats total


Male 5 8 13
Female 14 22 36
total 19 30 49
One animal is selected at random. Find the probability that it is:
(i) a sheep or female
19/49 + 36/49 - 14/49 = 41/49

(ii) male or a goat
13/49 + 30/49 - 8/49 = 35/49

(iii) neither female nor a sheep
49/49 - 41/49 = 8/49

(iv) female or a goat
36/49 + 30/49 - 22/49 = 44/49

(v) male or a sheep
13/49 + 19/49 - 5/49 = 27/49

EXAMPLE 2

The five cards shown below are laid face down on a table. One card is picked at random.

1 2 4 5 9

Consider the outcomes: (a) an odd number is selected, (b) a prime number is selected.

The probabilities of these outcomes are (a) P (ODD) = 3/5, (b) P (PRIME) = 2/5.

If we use the rule P (A or B) = P (A) + P (B) then we find P (odd) + P (prime) = 3/5 + 2/5 = 1.
This answer claims that all five cards are odd or prime, but 4 is neither prime nor odd. The
card 5 has been counted twice as it is favorable to both outcomes. Outcomes A and B are not mutually
exclusive, so P (A or B) ≠ P (A) + P (B). Three results are favorable to A (1, 5 and 9); two to B (2 and
5); one to both A and B (5).

P (A or B) = P (A) + P (B) – P (both A and also B)

= 3/5 + 2/5 - 1/5 = 4/5

4 of the 5 cards are odd or prime.

In summary we have

P (A or B) = P (A) + P (B) – P (both A and also B)

If A and B are mutually exclusive then

P (A and B) = 0
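
The addition rule can be checked on the five cards 1, 2, 4, 5 and 9 from Example 2 using Python sets. In this sketch the helper names is_prime and p are assumptions for illustration; the set operators & and | give the cards favourable to "both A and B" and to "A or B".

```python
from fractions import Fraction

cards = {1, 2, 4, 5, 9}


def is_prime(n):
    """True if n is a prime number (1 is not prime)."""
    return n > 1 and all(n % d != 0 for d in range(2, n))


def p(event):
    """Probability of an event given as a set of favourable cards."""
    return Fraction(len(event), len(cards))


odd = {c for c in cards if c % 2 == 1}      # {1, 5, 9}
prime = {c for c in cards if is_prime(c)}   # {2, 5}

# Addition rule: subtract the overlap so that card 5 is not counted twice.
p_odd_or_prime = p(odd) + p(prime) - p(odd & prime)

print(p_odd_or_prime)    # 4/5
print(p(odd | prime))    # 4/5, counted directly from the union
```

When two events are mutually exclusive the overlap set is empty, so the subtracted term is 0 and the rule reduces to P(A) + P(B).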

EXERCISE

1.The grades awarded to 40 students in a Mathematics test are given in a table below;

GRADE A B C D E F
NUMBER OF 2 8 10 9 6 5
STUDENTS
Grades A to C are credits, grade D and E are passes. Grade F is failure. One of the students is
randomly selected. Write as a simple fraction the probability of selecting a student who
obtained;
(i) Grade A (ii) grade F (iii) a credit (iv) not an E or F

2. The table below gives information on the results of an examination taken by 80 students.

         Pass   Fail   Totals
Boys     32     4      36
Girls    39     5      44
Totals   71     9      80

i) One student is chosen at random. Find the probability that the student

a) passed, b) is a boy, c) failed, d) is a girl, e) passed or is a boy, f) failed or is a girl.

ii) A boy is selected at random. What is the probability that he failed? iii) A girl is selected at random.
What is the probability that she passed?

3. The letters A, B, B, B, C, D, D, and E are each written onto squares of card and are placed into a bag.

One card is randomly selected. What is the probability that the letter written on the card is

i) A. ii) Not D. iii) In the word CADET or BADGE. iv) A vowel or in the word DONKEY.

v) B. vi) B or D. vii) In the word TABLE or the word CHAIR.

4. Diane selects a card at random from a normal pack of 52 playing cards.

Find, as a simple fraction, the probability that the card is

i) Red. ii) A heart. iii) A picture card.

iv) A red card or a picture card. v) A heart or picture card. vi) A red picture card or a Queen.

vii) Black or a picture card. viii) A red card or a non-picture card. ix) Neither red nor an ace.

EXERCISE

1. The table gives details of the number of items a boy is carrying in a box.

pens pencils sweets totals


Blue 12 5 3 20
Red 7 9 2 18
Green 4 4 4 12
totals 23 18 9 50

(i) What fraction of the items are (a) red (b) not green (c ) pencils (d ) not pens
(ii) One of the items is selected at random from the box find the probability that it is (a ) a red
pencil ( b) neither green nor pen (c ) either red or a pencil ( d) either a sweet or blue (e)
not a green pen (f) either a pencil or not blue
(iii) A pencil is selected at random, what is the probability that it is red
(iv) A blue item is selected at random, what is the probability that it is not a pencil.

2. Forty compounds were sampled to find the number of donkeys and dogs that were kept by the
occupants.

               No. donkeys
               0    1    2    3
No. dogs  0    1    4    2    0
          1    3    2    3    0
          2    5    3    2    1
          3    3    2    6    3

i) Find the number of compounds in which there were
a) one donkey and two dogs,
b) two donkeys,
c) a total of three of these animals altogether.

A compound with one dog was selected.

ii) What is the probability that there were no donkeys in the compound?

A compound with at least one donkey was selected.

iii) What is the probability that there were not equal numbers of dogs and donkeys in the compound?

TREE DIAGRAMS
When more than two events are being considered, two-way tables cannot be used,
so another method of representing the information diagrammatically is needed. Tree
diagrams are a good way of doing this. A tree diagram has branches which show the different
outcomes. To find the probability of a particular combined outcome, multiply the probabilities on the
branches that lead to it. The probabilities on each set of branches add up to 1.
EXAMPLE
A bag contains three red buttons and two blue buttons. A button is taken at random, replaced,
and then another button is selected at random.
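
The branches of the tree diagram for this example can be listed with a short sketch. Because the first button is replaced, the second set of branches carries the same probabilities as the first. The variable names single_pick and paths are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

# Three red and two blue buttons; the first button is replaced before the second pick.
single_pick = {"red": Fraction(3, 5), "blue": Fraction(2, 5)}

# Each path through the tree is (first pick, second pick); multiply along the branches.
paths = {
    (first, second): single_pick[first] * single_pick[second]
    for first, second in product(single_pick, repeat=2)
}

for path, prob in paths.items():
    print(path, prob)
# ('red', 'red') 9/25
# ('red', 'blue') 6/25
# ('blue', 'red') 6/25
# ('blue', 'blue') 4/25

print(sum(paths.values()))   # 1
```

The four path probabilities add up to 1, which is a useful check that no branch has been missed.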

EXERCISE

1. A bag contains six coloured balls of equal size: 3 are red, 2 are blue and 1 is green. One ball is
randomly selected and then replaced; another ball is then randomly selected.

i) Write down the probability that a) the first ball is red, b) the first ball is not blue, c) the second ball
is green.

ii) Calculate the probability that a) the first ball is red and the second ball is green, b) the first ball is not
blue and the second ball is red c) both balls are red, d) neither ball is blue, e) one of the balls is red and
one of the balls is green, f) just one of the balls is red g) at least one of the balls is green, h) one of the
balls is green and the other isn’t red, i) At most one of the balls is blue, j) one ball is yellow and the other
ball is neither green nor red.

2. A bag contains 5 balls of equal size: 3 balls are white and 2 are purple.
One ball is selected at random and, without replacement, a second ball is randomly selected.

i) Find, as a simple fraction, the probability that a) two white balls are selected, b) two purple balls
are selected, c) balls of the same colour are selected, d) balls of different colours are selected.

ii) Show that the outcomes described in i) c) and i) d) are exhaustive.

3. Arnold has 3 tins of beans and 6 tins of peas. The tins are identical in shape and size but all the labels
have been removed. If he opens two tins at random find the probability that

i) Both contain beans. ii) Neither contains beans. iii) Just one contains peas.

3. Two cards are randomly selected from a normal pack of 52 playing cards.

Find the probability that the selected cards are

i) Both red. ii) Both spades. iii) Both Queens. iv) Both picture cards.

v) Both the 7 of diamonds. vi) One of each colour. vii) One black card and one red picture card.

viii) One Heart and one black King. ix) Two aces or two red Jacks. x) Of the same colour.

xi) Of the same suit. xii) Identical.
4. The grouped frequency table below gives the heights of 40 children.

Height (h cm)      150 ≤ h < 155   155 ≤ h < 160   160 ≤ h < 165   165 ≤ h < 170   170 ≤ h < 175
No. children (f)   6               13              15              5               1

Two of these children are selected at random. Find the probability that

i) Both are less than 155 cm. ii) Both are less than 160 cm. iii) Both are 160 cm or more.

iv) Just one is less than 155 cm. v) At least one is less than 155 cm. vi) Both are 170 cm or more.

5. The table shows the flavours, sizes and numbers of the different fruit drinks that a boy has in his cool
box.

         Orange   Lemon   Totals
Small    5        4       9
Large    3        8       11
Totals   8        12      20

i) If two drinks are randomly selected, find the probability that
a) both are small lemon, b) both are orange, c) one is orange and one is lemon, d) one of each size is selected.

ii) If two orange drinks are selected, find the probability that both are large.

iii) If two small drinks are selected, find the probability that they are of the same flavour.

DEPENDENT EVENTS AND CONDITIONAL PROBABILITY
MUTUALLY EXCLUSIVE EVENTS

Two or more events are mutually exclusive if they cannot occur at the same time. For two events A and
B to be mutually exclusive then;

P (A AND B) = P (A ∩ B) = 0

DEPENDENT EVENTS

Events are mutually dependent if one event has an effect on the probabilities of the outcome of the
other event. Probabilities in such cases are said to be conditional as they depend on the outcome of
another event. This is typical when selections are made without replacement. Examples of experiments
which will produce mutually dependent events are:

 Selecting a card from a pack of cards, not replacing it, and selecting another card from the same
pack.
 Selecting two balls from a bag at the same time or one after the other.
 Selecting two students from a class.

EXAMPLE 1

Two students are randomly selected from a class of 18 girls and 22 boys. Find the probability that;

i) they are both girls, ii) a girl and a boy are selected, iii) a particular student is selected.

ANSWERS

i) P (GG) = 18/40 × 17/39 = 51/260

ii) P (a girl and a boy) = P (GB) or P (BG)
= (18/40 × 22/39) + (22/40 × 18/39) = 33/65

iii) P (a particular student) = P (P, NP) or P (NP, P), where P is the particular student and NP is any other student
= (1/40 × 39/39) + (39/40 × 1/39) = 1/20
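
The three answers above can be reproduced with exact fractions. This is a sketch only, and the variable names are assumptions for illustration. The key point is that the second fraction in each product uses the reduced totals (39 students left) because the first student is not replaced.

```python
from fractions import Fraction

girls, boys = 18, 22
total = girls + boys   # 40 students

# i) Both girls: the second pick has one fewer girl and one fewer student.
p_both_girls = Fraction(girls, total) * Fraction(girls - 1, total - 1)

# ii) A girl and a boy in either order: (girl, boy) or (boy, girl).
p_girl_and_boy = (
    Fraction(girls, total) * Fraction(boys, total - 1)
    + Fraction(boys, total) * Fraction(girls, total - 1)
)

# iii) A particular student is picked first, or is picked second.
p_particular = (
    Fraction(1, total) * Fraction(total - 1, total - 1)
    + Fraction(total - 1, total) * Fraction(1, total - 1)
)

print(p_both_girls)     # 51/260
print(p_girl_and_boy)   # 33/65
print(p_particular)     # 1/20
```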
TREE DIAGRAM FOR A BOY/GIRL SELECTION

EXERCISE

1. A box contains 4 toffee sweets and 8 chocolate sweets. Two sweets are randomly selected at the
same time.
i) Find the probability that a) two toffees are selected, b) two chocolates are selected,
c) first toffee then chocolate is selected, d) one of each type is selected.

2. Two students are randomly chosen from a group of 10 boys and 18 girls. Find the probability of
selecting i) Two boys. ii) Students of the same sex. iii) One girl and one boy. iv) At least one girl.

3. There are 120 women attending a conference. Sixty percent of the women are married; 25% are single
and 15% are widows.
i) How many of the women at the conference are currently not married?
ii) Two of the women at the conference are selected at random. Find the probability that
a) a particular woman is selected, b) a particular married woman is selected, c) both the selected
women are widows, d) one is married and the other is single, e) just one is married,
f) one is single and the other is currently not married.

4. The diagram below is a cumulative frequency polygon illustrating the numbers of hours that 1000
doctors worked last week.

[Cumulative frequency polygon. Vertical axis: number of doctors (cf), 0 to 1000. Horizontal axis: hours worked, 50 to 90.]

i) Find the probability that a doctor, selected at random, worked for a) less than 5 hours,
b) less than 65 hours, c) 75 hours or more, d) between 60 and 70 hours.

To relieve the stress of long working hours, the hospital management decided to offer a free holiday to
two randomly selected doctors.

ii) Find the probability that a) both the selected doctors worked for 80 hours or more, b) at least one
of the selected doctors worked for 85 hours or more.

iii) Calculate an estimate of the probability that one of the selected doctors worked at least 30 hours in
excess of the other.

FAIR GAMES AND EXPECTED WINNINGS
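
The sketch below illustrates the idea behind the questions that follow. It assumes that "expected winnings" means the average prize per game before the stake is deducted, and that a game is fair when this equals the stake (some texts subtract the stake first). The coin game, its prize and its stake are hypothetical and are not taken from the questions.

```python
from fractions import Fraction

def expected_prize(outcomes):
    """Sum of prize × probability over (prize, probability) pairs."""
    return sum(prize * p for prize, p in outcomes)

# Hypothetical game: a fair coin pays £10 for heads and nothing for tails.
outcomes = [(10, Fraction(1, 2)), (0, Fraction(1, 2))]
stake = 6

mean_prize = expected_prize(outcomes)  # average prize per game
net_per_game = mean_prize - stake      # player's average gain after paying the stake
is_fair = (net_per_game == 0)          # fair only if the stake equals the average prize

print(mean_prize, net_per_game, is_fair)  # 5 -1 False
```

In this hypothetical game the player loses £1 per game on average, so the organiser expects a profit of £1 per game. A stake of £5 would make the game fair.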

1. A player throws a normal six-sided die in a game and is awarded a prize of £12 if he obtains a square
number. The stake required to play this game is £5.

i) Calculate a player’s expected winnings.

ii) State, with a reason, whether or not the game is fair.

2. A game consists of a player, having paid a stake of £6.50, drawing a card at random from a normal
pack of 52 cards. If the player draws a picture card, she wins a prize of £26.

i) Calculate a player’s expected winnings and explain why the game is not fair.

ii) How much profit can the organiser expect to make if 40 people play the game?

3. A stake of £2.75 is paid by a player to roll a normal die. A prize of £3 is awarded if the player obtains
a 6, or a prize of £4 is awarded if the player obtains a prime number.

i) Calculate a player’s expected winnings.

ii) If the organiser expects a profit of £20 per day, how many people is he expecting to play each day?

4. A square spinner numbered 2, 4, 6, 8 is spun and the number scored is equal to the prize in pounds.

i) Calculate the stake for playing this game, if it is known to be a fair game.

ii) The organiser increased the stake by 20% and replaced the number 8 to make his expected profit
when twenty people play the game become £15. With which number did he replace 8?

5. A man in a shopping mall has three small containers that are turned upside-down on a board; there is
a peanut under one of the containers.

The man shuffles the three containers very quickly and members of the public are asked to pay P50 to
guess which container the peanut is under.

If a person guesses correctly, the man will give him or her a prize of P100.

Explain why this game is not fair and find the prize that should be awarded to make it a fair game.

6. A 12 cm by 14 cm board has three playing cards, each measuring 3.6 cm by 5 cm, attached to it, as shown.

[Diagram: a 12 cm by 14 cm board with three cards attached, one spade (♠) and two clubs (♣).]

In a game, for which the stake is £9, a player randomly throws one dart at the board.

If a dart pierces a club (♣), then a prize of £20 is awarded.

If a dart pierces the spade (♠), then a prize of £30 is awarded.

i) Calculate the area of a) the board, b) one of the cards.

Assuming that the dart that is thrown sticks within the perimeter of the board, use your answers to i) to find

ii) the probability that a player wins a) the £20 prize, b) the £30 prize.

iii) Hence calculate a player’s expected winnings.

The organiser was encouraged to make the game fair by increasing the smaller prize.

iv) By what percentage should she increase the smaller prize to make the game fair?
