
Single Variable data-MA4-20SP, MA5.1-12SP


Analysing Data

Range
 It is defined as the difference between the highest and lowest scores.
Range = Highest Score – Lowest score

Mode
 The mode is the outcome that occurs most often; that is, the outcome with the highest frequency.

Median
 After a set of scores has been arranged in order, the median is the ‘middle score’. This is only strictly true if
there is an odd number of scores.
For an even number of scores, the median is the average of the middle two scores.
Mean
 The mean or average of a set of scores is the sum of all the scores divided by the number of scores.
Mean = (Total of scores) / (Number of scores)
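The four definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the small data list are assumptions and are not part of the worksheet.

```python
from collections import Counter


def data_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)


def modes(scores):
    """Return every score that has the highest frequency (there may be more than one)."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(score for score, freq in counts.items() if freq == highest)


def median(scores):
    """Middle score of the ordered data; average of the middle two if the count is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mean(scores):
    """Mean = total of scores / number of scores."""
    return sum(scores) / len(scores)


# Illustrative data only (an assumption, not taken from the worksheet).
sample = [2, 4, 4, 5, 7, 8]
print(data_range(sample), modes(sample), median(sample), mean(sample))
```

The `modes` function returns a list so that bimodal data sets can report both modes.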

Example1. Explain which statistical measure is referred to in these statements.


a) The majority of people surveyed prefer Activ-8 sports drink. - Mode
b) The ages of fans at the Rolling Stones concert varied from 8 to 80. - Range
c) The average Australian family has 2.1 children. - Mean
Q1. Explain which statistical measure is referred to in these statements.
a) There was a 15° temperature variation during the day.
b) Children at this school are absent 3.4 days per semester, on average.
c) Most often you have to pay $79.95 for those sports shoes.
d) The average Australian worker earns about $470 per week.
e) A middle-income family earns about $35 000 per annum.
Example2. A class of 20 students scored the following marks (out of 10) in a mathematics test:

5 1 7 6 7 9 8 7 6 3
2 3 5 3 5 4 7 9 7 2
Find the mean, mode, median and range.

Mean = (5+1+7+6+7+9+8+7+6+3+2+3+5+3+5+4+7+9+7+2) / 20 = 106 / 20 = 5.3

Mode = 7

Median = average of the 10th and 11th ordered scores = (5 + 6) / 2 = 5.5

Range = 9 – 1 = 8

Q2. Find the range, mean, median and mode for these simple ordered data sets.
a) 1, 2, 2, 2, 4, 4, 6
b) 1, 4, 8, 8, 9, 10, 10, 10, 12
c) 1, 5, 7, 7, 8, 10, 11
d) 3, 3, 6, 8, 10, 12
e) 7, 11, 14, 18, 20, 20
f) 2, 2, 2, 4, 10, 10, 12, 14
Q3. For the given data sets, find the:
i) mean ii) median iii) mode iv) range
a) 5,2, 4,1, 0, 6, 1, 2, 9, 6 b) 1,7, 1, 3, 2,6, 1,5, 9,10
Example3. Elio’s batting scores in last year’s cricket series were 65, 30, 0, 0, 0, and 80; while Gaetano’s scores
were 0, 30, 30, 80, 25 and 20 in the same matches.
a) Calculate the mean score for each player.
b) Calculate the median score for each player.
c) Which of the mean and median is the better measure of each player’s ability?
Q4. Frank scored 5, 7, 6, 8, 7 in a series of spelling tests, while Erica scored 8, 8, 6, 1, 9 in the same tests.
a) Calculate the mean for each.
b) Find the median for each.
c) Which is the better measure of their abilities?
Q5. The following scores were made by four teams in sports matches.
Jackals: 4, 0, 5, 9, 4, 8
Panthers: 7, 10, 10, 11, 10, 9
Wallabies: 2, 15, 1, 17, 10, 3
Tigers: 9, 10, 20, 25, 0, 14
a) Which team has the highest mean?
b) Which team shows the greatest range of scores?
c) Compare modal scores for Jackals and Panthers.
d) Find the median score for each team.
Q6. The hours a shop assistant spends cleaning the store in eight successive weeks are:
8, 9, 12, 10, 10, 8, 5, 10
a) Calculate the mean for this set of data.
b) Determine the score that needs to be added to this data to make the mean equal to 10.
Q7. Decide if the following data sets are bimodal.
a) 2, 7, 9, 5, 6, 2, 8, 7, 4
b) 1, 6, 2, 3, 3, 1, 5, 4, 1, 9
c) 10, 15, 12, 11, 18, 13, 9, 16, 17
Q8. A netball player scored the following number of goals in her 10 most recent games:
15, 14, 16, 14, 15, 12, 16, 17, 16, 15
a) What is her mean score?
b) What number of goals does she need to score in the next game for the mean of her scores to be 16?
Q9. Write down a set of 5 numbers which has the following values:
a) Mean of 5, median of 6 and mode of 7
b) Mean of 5, median of 4 and mode of 8
c) Mean of 4, median of 4 and mode of 4
d) Mean of 4.5, median of 3 and mode of 2.5
e) Mean of 1, median of 0 and mode of 0
Q10. This dot plot shows the frequency of households with 0, 1, 2 or 3 pets.
a) How many households were surveyed?
b) Find the mean number of pets correct to one decimal place.
c) Find the median number of pets.
d) Find the mode.
e) Another household with 7 pets is added to the list. Does this change the median? Explain.
Q11. Eight numbers have a mean of 9. Seven of the numbers are 9, 7, 10, 6, 11, 6 and 10.
Find the eighth number.
Grouped Data
Example4. For this set of scores, find the:
i) Mean
ii) Median
iii) Mode
iv) Range.

i) Mean = (14 + 40 + 54 + 80 + 44) / 25 = 232 / 25 = 9.28
ii) Median is the 13th score = 9
iii) Mode = 10
iv) Range = 11 – 7 = 4
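The frequency table for this example is not reproduced in the text. The sketch below assumes the table that the worked solution implies: with scores from 7 to 11, the products 14, 40, 54, 80 and 44 correspond to frequencies 2, 5, 6, 8 and 4. Under that assumption, the same four measures can be computed directly from the table.

```python
# Scores and frequencies inferred from the worked solution (an assumption):
# the products 14, 40, 54, 80, 44 with scores 7 to 11 imply frequencies 2, 5, 6, 8, 4.
table = {7: 2, 8: 5, 9: 6, 10: 8, 11: 4}

n = sum(table.values())                                      # number of scores
total = sum(score * freq for score, freq in table.items())   # sum of score x frequency
mean = total / n

# Median: walk the cumulative frequencies until the middle position is reached.
# (This simple walk assumes an odd number of scores.)
middle = (n + 1) // 2
cumulative = 0
for score in sorted(table):
    cumulative += table[score]
    if cumulative >= middle:
        median = score
        break

mode = max(table, key=table.get)        # score with the highest frequency
data_range = max(table) - min(table)    # highest score - lowest score

print(n, total, mean, median, mode, data_range)
```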

Q9. Find the mean, median, mode and range of these scores.

[Frequency tables for a), b) and c) are not reproduced here.]

Example5. Find the median, mode and range of the data presented in the following stem-and-leaf plots.

Median = middle score = 9th score = 169


Mode = 172
Range = 185 – 142 = 43
Q10. Find the median, mode and range of the data presented in the following stem-and-leaf plots.
Clusters, gaps and outliers
Example1. Identify any clusters, outliers or gaps in the following sets of data.
a) Monthly rainfall: 25 mm, 16 mm, 6 mm, 27 mm, 28 mm, 96 mm
96 mm is an outlier. It is much larger than all the other data values. (There is a large gap between 96 and the
other scores.)
[Plots for b) and c) are not reproduced here.]

b) There is a cluster of scores in the ‘fifties’.

c) This data has a gap between 2 and 5. No students made 3 or 4 mistakes.

Q1. Identify any clusters, outliers or gaps in the following sets of data.
a) 13, 14, 15, 15, 17, 104

Example2. a) Find the mean, median, mode and range of each set of scores.
i) 3, 5, 5, 7, 9              ii) 3, 5, 5, 7, 90
Mean = 29 / 5 = 5.8           Mean = 110 / 5 = 22
Median = 5                    Median = 5
Mode = 5                      Mode = 5
Range = 9 – 3 = 6             Range = 90 – 3 = 87
b) Draw a dot plot for each set of data and mark the position of the mean, median and mode.

c) Compare and discuss the use of the mean, median and mode as measures of central tendency for these data sets.
In the first data set, the mean, median and mode are all central and typical values of the scores.
In the second data set, the mean is no longer a central value as it is larger than 4 of the 5 scores.
The two sets of data are the same except for the last score. As the mean is calculated using the value
of every score, it is greatly affected by outliers. When the 9 in the first set is replaced by 90, much
larger than the other scores, the mean changes significantly from 5.8 to 22. The mean is not an
appropriate measure of central tendency if the data has an outlier.
Note: The median and mode remain unchanged despite the presence of the outlier in the second set of
data and are appropriate to use as measures of central tendency.
d) Discuss the use of range as a measure of spread.
i) The range is a good measure of the spread of the scores.
ii) The range is greatly affected by the outlier and is not a useful measure of the spread of this set of
scores.
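The effect of the outlier in this example can be checked with Python's standard `statistics` module. This is a small sketch using the two data sets from the example; the variable names are illustrative.

```python
from statistics import mean, median, mode

set_1 = [3, 5, 5, 7, 9]
set_2 = [3, 5, 5, 7, 90]   # same scores, but the 9 is replaced by the outlier 90

for name, scores in (("i", set_1), ("ii", set_2)):
    spread = max(scores) - min(scores)   # range = highest - lowest
    print(name, mean(scores), median(scores), mode(scores), spread)
```

Only the mean and the range change between the two sets; the median and mode stay at 5.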
Q2. a) Find the mean, median and mode of the scores in each data set.
i) 7, 9, 9, 10, 12 ii) 7, 9, 9, 10, 80
b) Draw a dot plot for each set of data and mark the position of the mean, median and mode.
c) Compare and discuss the use of the mean, median and mode as measures of central tendency for
these data sets.
d) Discuss the use of the range as a measure of spread.
Example3. a) The heights of students in a school netball team were measured and recorded as 166 cm,
170 cm, 168 cm, 67 cm, 170 cm, and 169 cm.
i) Calculate the mean, median and mode of this data.
Mean = (166 + 170 + 168 + 67 + 170 + 169) / 6 = 910 / 6 ≈ 151.7 Note: The mean is not central or typical of the data.
Arrange the data in order: 67, 166, 168, 169, 170, 170
Median = (168 + 169) / 2 = 168.5 cm
Mode = 170
ii) Identify the outlier in this set of data.
Outlier = 67 cm
iii) Ignore the outlier and calculate the mean, median and mode of the remaining 5 scores.
166 cm, 170 cm, 168 cm, 170 cm, and 169 cm.
Mean = (166 + 170 + 168 + 170 + 169) / 5 = 843 / 5 = 168.6
Arrange the data in order: 166, 168, 169, 170, 170
Median = 169
Mode = 170
iv) Should the outlier be included when reporting the mean, median and mode for this data? Why or
why not?
In this case it is reasonably obvious that the value 67 cm is a measurement or recording error, as it is not
likely that any girl in the netball team would be 67 cm tall. In this case, the outlier could be ignored and
the mean, median and mode would then all be central and typical of the data.
b) The scores of 5 students on a mechanical aptitude test were recorded as 18, 23, 21, 20, 52.
i) Calculate the mean, median and mode of this data.
Mean = (18 + 23 + 21 + 20 + 52) / 5 = 134 / 5 = 26.8 Note: The mean is not central or typical of the data.
Arrange the data in order: 18, 20, 21, 23, 52
Median = 21
There is no mode.
ii) Identify the outlier in this set of data.
Outlier = 52
iii) Ignore the outlier and calculate the mean, median and mode of the remaining 4 scores.
Mean = (18 + 20 + 21 + 23) / 4 = 82 / 4 = 20.5
Median = (20 + 21) / 2 = 20.5
There is no mode.
iv) Should the outlier be included when reporting the mean, median and mode for this data? Why or
why not?
In this case the outlier could be the result of one of the students having an exceptionally high
mechanical aptitude compared with the others, so the outlier should be included in the reporting even
though, by including it, the mean is not a central or typical value.
Note: The median is the best measure of central tendency with or without the outlier included.
Q3. A metal rod was measured by 6 students and the results were recorded as 112mm, 111mm, 110 mm, 13
mm, 112 mm, 112 mm.
a) Calculate the mean, median and mode of this data.
b) Identify the outlier in this set of data.
c) Ignore the outlier and calculate the mean, median and mode of the remaining 5 scores.
d) Should the outlier be included in reporting the mean, median and mode for this data? Why or why not?
Q4. The times (in minutes) taken to travel to work in a 5 day week were recorded as 17, 15, 16, 18, 55.
a) Calculate the mean, median and mode of this data.
b) Identify the outlier in this set of data.
c) Ignore the outlier and calculate the mean, median and mode of the remaining 4 scores.
d) Should the outlier be included in reporting the mean, median and mode for this data? Why or why not?
Variation of sample mean and proportion
A key factor in the use of samples is the determination of how large the sample should be in order to give a
good estimate of the properties of the whole population. Consider the following results, recorded in groups of 5,
when a normal six-sided die is rolled 200 times. The results are summarised in the table below.
34664, 14624, 31362, 51242, 63611, 42553, 63144, 45213,
56443, 54346, 52415, 33663, 55244, 65132, 63514, 62453,
12646, 35236, 24546, 13251, 43356, 64132, 21634, 46323,
55651, 26435, 53142, 25145, 26513, 42214, 26563, 21264,
23245, 61224, 32616, 11326, 21621, 16231, 21652, 31643

Mean = 699 / 200 ≈ 3.5
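The population figures quoted in this section (a total of 699 over 200 rolls, and a proportion of 6s of 18%) can be reproduced from the recorded groups. The sketch below simply re-enters the 40 groups listed above; the variable names are illustrative.

```python
# The 40 recorded groups of 5 rolls, copied from the list above.
groups = """
34664 14624 31362 51242 63611 42553 63144 45213
56443 54346 52415 33663 55244 65132 63514 62453
12646 35236 24546 13251 43356 64132 21634 46323
55651 26435 53142 25145 26513 42214 26563 21264
23245 61224 32616 11326 21621 16231 21652 31643
""".split()

# Flatten the groups into the 200 individual rolls.
rolls = [int(digit) for group in groups for digit in group]

population_mean = sum(rolls) / len(rolls)
proportion_of_sixes = rolls.count(6) / len(rolls)

print(len(rolls), sum(rolls), population_mean, proportion_of_sixes)
```

The exact mean is 3.495, which rounds to the 3.5 used as the population mean in the examples that follow.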

Example1. Consider the proportion of 6s in samples of size 5 from the previous results for die rolls. Compare
these with the population proportion.
Proportion of 6s in sample of 5 = (Number of 6s) / 5

a) Complete the following table.

b) i) What is the lowest proportion of 6s in these 5 samples?
Lowest = 0%
ii) What is the highest proportion of 6s in these 5 samples?
Highest = 40%
c) In how many of these samples is the proportion in the sample approximately the same as in the population?
There are 2 sample proportions (20% and 20%) that are approximately the same as the population proportion.
d) Do you think that a sample of size 5 is big enough to provide a good estimate of the proportion of 6s in the
population? Give a reason.
No. Only 2 of the 5 samples have proportions close to the population proportion of 18%.
Q1. Consider the proportion of 6s in some samples of size 10 for the data given in the introduction to this
section.
a) Complete the following table.

b) Complete the following.


i) The lowest proportion of 6s in these 5 samples is ___.
ii) The highest proportion of 6s in these 5 samples is ___.
c) In how many of these samples is the proportion in the sample approximately the same as in the population?
d) Do you think that a sample of size 10 is big enough to provide a good estimate of the proportion of 6s in the
population? Give a reason for your answer.

Q2. Consider the proportion of 6s in some samples of size 20 for the data given in the introduction to this
section.
a) Complete the following table.

b) i) What is the lowest proportion of 6s in these 5 samples?


ii) What is the highest proportion of 6s in these 5 samples?
c) In how many of these samples is the proportion in the sample approximately the same as in the population?
d) Do you think that a sample of size 20 is big enough to provide a good estimate of the proportion of 6s in the
population? Give a reason for your answer.
Example2. Combine the information from the 5 samples in Example 1 into one of size 25. Is the proportion of
6s in this sample a good estimate of the population proportion?
Combining the information from our 5 samples:
Proportion of 6s = (2 + 1 + 0 + 0 + 1) / (5 + 5 + 5 + 5 + 5) = 4 / 25 = 16%
Yes, this is close to the population proportion of 18%.
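The pooling step in this example can be written out as a short calculation. The counts of 6s below are the ones used in the worked solution; the variable names are illustrative.

```python
sixes_per_sample = [2, 1, 0, 0, 1]   # counts of 6s used in the worked solution
sample_size = 5

# Proportion of 6s in each individual sample of 5.
sample_proportions = [count / sample_size for count in sixes_per_sample]

# Proportion of 6s when the 5 samples are combined into one sample of 25.
pooled = sum(sixes_per_sample) / (sample_size * len(sixes_per_sample))

print(sample_proportions)
print(pooled)
```

The individual proportions range from 0% to 40%, while the pooled sample gives 16%, much closer to the population proportion of 18%.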
Q3. Combine the information from the 5 samples in question 1 into one of size 50. Complete the following to
find if the proportion of 6s in this sample is a good estimate of the population proportion.

Proportion of 6s = (___ + ___ + ___ + ___ + ___) / (10 + 10 + 10 + 10 + 10) = ___ / 50 = ___%

Q4. Combine the information from the 5 samples in question 2 into one of size 100. Is the proportion of 6s in
this sample a good estimate of the population proportion?
Example3. Consider the means of samples of size 5 taken from the data at the beginning of Section I and
compare this with the population mean.
a) Complete the table.
b) i) What is the lowest sample mean? = 2.4
ii) What is the highest sample mean? = 4.6
c) In how many of these samples is the mean of the sample approximately the same as that of the population?
Population mean = 3.5
Let’s take ‘within 10% of’ to indicate ‘approximately the same as’.
10% of population mean = (10 / 100) × 3.5 = 0.35
Now 3.5 − 0.35 = 3.15 and 3.5 + 0.35 = 3.85, hence we will consider any sample means between 3.15
and 3.85 to be ‘approximately the same as’ the population mean.
There are two sample means (3.4 and 3.8) that are approximately the same as the population mean.
d) Do you think that a sample of size 5 is big enough to provide a good estimate of the mean of the population?
Give a reason.
No. Only 2 of the 5 sample means are approximately the same as the population mean.
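The ‘within 10%’ test used in part c can be sketched as follows. The five sample means are the ones that appear in the worked solutions (4.6, 3.4, 3.8, 3 and 2.4); the variable names are illustrative.

```python
population_mean = 3.5
sample_means = [4.6, 3.4, 3.8, 3, 2.4]   # sample means from the worked examples

# 'Approximately the same as' is taken to mean within 10% of the population mean.
tolerance = 0.10 * population_mean
low = population_mean - tolerance
high = population_mean + tolerance

close = [m for m in sample_means if low <= m <= high]
print(round(low, 2), round(high, 2), close)
```

Changing the 10% tolerance changes which sample means count as close, so the conclusion depends on that choice.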
Q5. Consider the means of samples of size 10 and compare this with the population mean.
a) Complete the following table.

b) Complete the following.


i) The lowest sample mean is ___.
ii) The highest sample mean is ___.
c) In how many of these samples is the mean approximately the same as that of the population? Complete the
following:
Number of sample means that lie between 3.15 and 3.85 = ___
d) Do you think that a sample of size 10 is big enough to provide a good estimate of the mean of the
population? Give a reason for your answer.
Q6. Consider the means of samples of size 20 and compare them with the population mean.
a) Complete the following table.

b) i) What is the lowest sample mean?


ii) What is the highest sample mean?
c) In how many of these samples is the mean of the sample approximately the same as that of the population?
d) Do you think that a sample of size 20 is big enough to provide a good estimate of the mean of the
population? Give a reason for your answer.
Example4. Use the information in Example 3 as listed in the table below.
a) Find the mean of the sample means for the first:
i) 3 samples ii) 4 samples iii) 5 samples

Mean of first 3 sample means = (sum of sample means) / 3 = (4.6 + 3.4 + 3.8) / 3 = 11.8 / 3 ≈ 3.9

Mean of first 4 sample means = (sum of sample means) / 4 = (4.6 + 3.4 + 3.8 + 3) / 4 = 14.8 / 4 = 3.7

Mean of first 5 sample means = (sum of sample means) / 5 = (4.6 + 3.4 + 3.8 + 3 + 2.4) / 5 = 17.2 / 5 ≈ 3.4
b) Is the mean of the sample means in part a approximately the same as the population mean? As the number of
samples increases, the mean of the sample means gets closer to the mean of the population.
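The trend described in part b can be checked by computing the mean of the first 3, 4 and 5 sample means and its distance from the population mean of 3.5. This is a sketch using the sample means from this example; the variable names are illustrative.

```python
sample_means = [4.6, 3.4, 3.8, 3, 2.4]   # sample means from this example
population_mean = 3.5

results = {}
for k in (3, 4, 5):
    mean_of_means = sum(sample_means[:k]) / k          # mean of the first k sample means
    gap = abs(mean_of_means - population_mean)         # distance from the population mean
    results[k] = (round(mean_of_means, 2), round(gap, 2))
    print(k, results[k])
```

For these five samples the gap shrinks each time another sample mean is included.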
Q7. Use the information in question 5.
a) Find the mean of the sample means for the first:
i) 3 samples ii) 4 samples iii) 5 samples.
b) Is the mean of the sample means approximately the same as the population mean?
Q8. Use the information in question 6.
a) Find the mean of the sample means for the first:
i) 3 samples ii) 4 samples iii) 5 samples.
b) Is the mean of the sample means approximately the same as the population mean?
Stem-and-leaf plots
A stem-and-leaf plot uses a stem number and leaf number to represent data.
– The data is shown in two parts: a stem and a leaf.
– The ‘key’ tells you how the plot is to be read.
– The graph is similar to a histogram on its side or a bar graph, but there is no loss of detail of the original data.
Back-to-back stem-and-leaf plots can be used to compare two sets of data. The stem is drawn in the middle,
with the leaves on either side.
 Symmetrical data will produce a graph that is symmetrical about the centre.
 Skewed data will produce a graph that lacks symmetry.
Example1. The times taken for the students from two classes to travel to and from school are given below.
Class 9P: 19, 49, 25, 25, 22, 55, 26, 38, 54, 22, 33, 44, 15, 86, 31, 18, 33, 67, 34, 42, 49, 29, 45, 65, 29
Class 9W: 22, 34, 48, 18, 58, 67, 74, 66, 53, 31, 57, 25, 58, 49,
35, 47, 50, 65, 54, 49, 38, 23, 58, 19, 42
a) Draw a back-to-back stem-and-leaf plot for this data.
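The plot for this example is not reproduced in the text. As one possible way of building it, the sketch below splits each value into a stem (tens digit) and a leaf (units digit) and prints the two classes on either side of a shared stem. The function name and the layout are assumptions, not the worksheet's own solution.

```python
from collections import defaultdict

class_9p = [19, 49, 25, 25, 22, 55, 26, 38, 54, 22, 33, 44, 15, 86, 31, 18,
            33, 67, 34, 42, 49, 29, 45, 65, 29]
class_9w = [22, 34, 48, 18, 58, 67, 74, 66, 53, 31, 57, 25, 58, 49,
            35, 47, 50, 65, 54, 49, 38, 23, 58, 19, 42]


def stem_leaves(data):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    plot = defaultdict(list)
    for value in sorted(data):
        plot[value // 10].append(value % 10)
    return plot


left, right = stem_leaves(class_9p), stem_leaves(class_9w)
stems = range(min(min(left), min(right)), max(max(left), max(right)) + 1)

print("9P leaves | stem | 9W leaves     (key: 1 | 5 means 15)")
for stem in stems:
    # Left-hand leaves are written in reverse so the smallest sits next to the stem.
    left_text = " ".join(str(leaf) for leaf in reversed(left.get(stem, [])))
    right_text = " ".join(str(leaf) for leaf in right.get(stem, []))
    print(f"{left_text:>20} | {stem} | {right_text}")
```

Stems with no leaves on one side (for example, stem 7 for Class 9P) are still printed so that gaps in the data remain visible.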

Q1. Two groups of Year 9 students were asked to unscramble a seven-letter word. Their times, in seconds, are
shown below. Draw a back-to-back stem-and-leaf plot for this data.
Group 1: 11, 16, 39, 23, 51, 24, 31, 4, 29, 16, 27, 40, 13, 23, 30, 29, 6, 22, 34, 38, 13
Group 2: 12, 27, 46, 17, 26, 32, 18, 15, 21, 41, 37, 36, 23, 8, 25, 43, 34, 7, 36, 12, 7
Q2. The scores for a class of 16 students on two tests are given below. Draw a back-to-back stem-and-leaf plot
for this data.
Test 1: 22, 42, 34, 30, 19, 39, 46, 41, 38, 35, 47, 39, 24, 45, 27, 32
Test 2: 13, 18, 21, 6, 40, 16, 26, 24, 35, 12, 20, 26, 31, 13, 15, 19
The shape of displays
 The general shape of a distribution can provide information about the
scores. Here is the graph of a symmetric distribution, also referred to as a
bell-shaped curve or a normal distribution.
 If the distribution is not symmetric then it is said to be skewed.
 A distribution is positively skewed if most of the data is on the left-hand
side of the distribution. The data has a ‘tail’ to the right, as shown in the
diagram below.
 A distribution is negatively skewed if most of the data is on the right-hand side of the distribution. The
data has a ‘tail’ to the left, as in the diagram below.

 The mode is the score with the highest frequency. The modes of the skewed distributions above are shown
below.
 Some distributions have two modes. This is called a bimodal distribution. A distribution is said to be
bimodal as long as it has two distinct humps, not necessarily of the same frequency (height). Two examples
are shown below.
Q1. Describe the shape of the following distributions as symmetric, positively skewed, negatively skewed or
bimodal.

Comparing like sets of numerical data


Example2. The scores of two groups of university students on a mechanical aptitude test are given at right.

a) Draw a back-to-back histogram and a side-by-side histogram for this data.
Q3. The parallel dot plot shows the number of goals scored by two soccer teams in a 16-match competition.
a) Describe the shape of each distribution.
b) Use the mean, median and range to compare the data.

Q4. This back-to-back dot plot shows the number of saves made by goalkeepers for the two teams in question 3.
a) Comment on the shape of each distribution.
b) Use the mean, median and range to compare the data.

Frequency Histogram and Polygons

Example: Constructing histograms from tables


Represent the frequency distribution table below as a histogram and a polygon. Ensure an appropriate scale and
label are chosen for each axis.

Solution:

Exercise
4. For the following sets of data:
i create a frequency distribution table
ii hence draw a histogram and a polygon
a 1, 2, 5, 5, 3, 4, 4, 4, 5, 5, 5, 1, 3, 4, 1
b 5, 1, 1, 2, 3, 2, 2, 3, 3, 4, 3, 3, 1, 1, 3
c 4, 3, 8, 9, 7, 1, 6, 3, 1, 1, 4, 6, 2, 9, 7, 2, 10, 5, 5, 4
d 60, 52, 60, 59, 56, 57, 54, 53, 58, 56, 58, 60, 51, 52, 59, 59, 52, 60, 50, 52
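As a rough guide to step i, a frequency distribution table can be built by counting how often each score occurs. The sketch below uses data set a and prints a row of `#` symbols as a stand-in for each histogram column; the variable names and the text layout are illustrative assumptions.

```python
from collections import Counter

data_a = [1, 2, 5, 5, 3, 4, 4, 4, 5, 5, 5, 1, 3, 4, 1]
frequency = Counter(data_a)   # maps each score to the number of times it occurs

print("Score | Frequency")
for score in range(min(data_a), max(data_a) + 1):
    # A text bar stands in for the histogram column of this score.
    print(f"{score:>5} | {frequency[score]:>2} {'#' * frequency[score]}")
```

Looping over every score from the lowest to the highest, rather than only the scores that occur, keeps a zero-frequency score in the table so that the histogram shows the gap.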
5 Edwin records the results for his spelling tests out of 10. They are 3, 9, 3, 2, 7, 2, 9, 1, 5, 7, 10, 6, 2, 6, 4.
a Draw a histogram for his results.
b Is he a better or a worse speller generally than Fred, whose results are given by the histogram shown below?

6 Some tennis players count the number of aces served in different matches. Match up the histograms with the
descriptions.
7 A car dealership records the number of sales each salesperson makes per day over three weeks.

a Which salesperson holds the record for the greatest number of sales in one day?
b Which salesperson made a sale every day?
c Over the whole period, which salesperson made the most sales in total?
d Over the whole period, which salesperson made the fewest sales in total?
e On the same set of axes, draw the frequency polygons for all four salespeople (draw only the lines joining
the tops of the columns, not the columns themselves).
f Explain why a frequency polygon can be more useful for comparing data than a histogram.
(Hint: consider what multiple histograms would look like on the same axes.)
Answers
