
2 - Comparing Data Sets PDF

The document discusses different methods for comparing two data sets including parallel boxplots, back-to-back stem plots, two-way frequency tables, and segmented bar charts. It provides examples of comparing the results of two groups on an obstacle course and the number of promotional materials handed out by two companies using these graphical methods and summary statistics.

Uploaded by

Nhi
Copyright
© © All Rights Reserved
2 Comparing data sets

2.1 Kick off with CAS

2.2 Back-to-back stem plots


2.3 Parallel boxplots and dot plots
2.4 Two-way (contingency) frequency tables and segmented bar charts
2.5 Review

2.1 Kick off with CAS


Exploring parallel boxplots with CAS
Parallel boxplots can be used to compare and contrast key information about two
different numerical data sets.
1 Use CAS to draw parallel boxplots of the following two data sets, which detail
the time it takes for two groups of individuals to complete an obstacle course
(rounded to the nearest minute).

Group A: 18, 22, 24, 17, 22, 27, 15, 20, 25, 19, 26, 19, 23, 26, 18, 20, 27, 24, 16
Group B: 21, 22, 19, 21, 17, 21, 18, 24, 21, 20, 18, 24, 35, 22, 19, 17, 23, 20, 19

2 Use your parallel boxplots from question 1 to answer the following questions.

a Which group has the larger range of data?
b Which group has the larger interquartile range of data?
c Which group has the higher median value?

3 One of the data sets has an outlier.

a How is this marked on the parallel boxplot?


b State the value of the outlier.

4 a Use CAS to draw a parallel boxplot of the following two data sets.

Set A: 41, 46, 38, 44, 49, 39, 50, 47, 47, 42, 53, 44, 46, 35, 39
Set B: 35, 31, 39, 41, 37, 43, 29, 40, 36, 38, 42, 33, 34, 30, 37
b Which data set has the larger range?
c Which data set has an interquartile range of 8?

Please refer to the Resources tab in the Prelims section of your eBookPLUS for a comprehensive
step-by-step guide on how to use your CAS technology.
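
For readers without CAS, the quantities that questions 2 and 3 read off a boxplot can be sketched in Python. This is only an illustration under two assumptions that the text above does not state: quartiles are taken as the medians of the lower and upper halves of the sorted data (leaving out the overall median when the count is odd), and a value is flagged as an outlier when it lies more than 1.5 x IQR beyond Q1 or Q3. The function names are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule (an assumption)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # When n is odd the overall median is excluded from both halves.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

def outliers(data):
    """Values more than 1.5 x IQR below Q1 or above Q3 (assumed fence rule)."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

group_a = [18, 22, 24, 17, 22, 27, 15, 20, 25, 19, 26, 19, 23, 26, 18, 20, 27, 24, 16]
group_b = [21, 22, 19, 21, 17, 21, 18, 24, 21, 20, 18, 24, 35, 22, 19, 17, 23, 20, 19]

for name, data in (("Group A", group_a), ("Group B", group_b)):
    q1, med, q3 = quartiles(data)
    print(name, "range:", max(data) - min(data), "IQR:", q3 - q1,
          "median:", med, "outliers:", outliers(data))
```

A CAS may use a different quartile convention, so its values can differ slightly from this sketch.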

2.2 Back-to-back stem plots
Unit 3, AOS DA, Topic 6, Concept 2

In topic 1, we saw how to construct a stem plot for a set of univariate data.
We can also extend a stem plot so that it compares two sets of univariate data.
Specifically, we shall create a stem plot that displays the relationship between a
numerical variable and a categorical variable. We shall limit ourselves in this section
to categorical variables with just two categories, for example, gender. The two
categories are used to provide two back-to-back leaves of a stem plot.

A back-to-back stem plot is used to display two sets of univariate data, involving
a numerical variable and a categorical variable with 2 categories.

WORKED EXAMPLE 1

The girls and boys in Grade 4 at Kingston Primary School submitted projects
on the Olympic Games. The marks they obtained out of 20 are as shown.

Girls' marks   16  17  19  15  12  16  17  19  19  16
Boys' marks    14  15  16  13  12  13  14  13  15  14

Display the data on a back-to-back stem plot.

THINK
1 Identify the highest and lowest scores in order to decide on the stems.
WRITE
Highest score = 19
Lowest score = 12
Use a stem of 1, divided into fifths.

THINK
2 Create an unordered stem plot first. Put the boys' scores on the left, and the
girls' scores on the right.
WRITE
     Leaf                Leaf
     Boys      Stem      Girls
   3 2 3 3      1        2
 4 5 4 5 4      1        5
         6      1        6 7 6 7 6
                1        9 9 9
Key: 1 | 2 = 12

THINK
3 Now order the stem plot. The scores on the left should increase in value from
right to left, while the scores on the right should increase in value from left to right.
WRITE
     Leaf                Leaf
     Boys      Stem      Girls
   3 3 3 2      1        2
 5 5 4 4 4      1        5
         6      1        6 6 6 7 7
                1        9 9 9
Key: 1 | 2 = 12

Interactivity: Back-to-back stem plots (int-6252)

The back-to-back stem plot allows us to make some visual comparisons of the two
distributions. In Worked example 1, the centre of the distribution for the girls is
higher than the centre of the distribution for the boys. The spread of each of the
distributions seems to be about the same. For the boys, the scores are grouped around
the 12–15 mark; for the girls, they are grouped around the 16–19 mark. On the whole,
we can conclude that the girls obtained better scores than the boys did.
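
The ordering step in Worked example 1 can be sketched in Python as follows. This is a minimal illustration, not a CAS procedure: it assumes two-digit scores, splits each stem into fifths (leaves 0–1, 2–3, 4–5, 6–7, 8–9) as in the example, and the function name stem_rows is hypothetical.

```python
def stem_rows(left, right):
    """Return ordered rows (left_leaves, stem, right_leaves) of a back-to-back
    stem plot, with each stem split into fifths."""
    def row_key(x):
        # (tens digit, which fifth of the stem the units digit falls in)
        return (x // 10, (x % 10) // 2)

    keys = [row_key(x) for x in left + right]
    key, last = min(keys), max(keys)
    rows = []
    while key <= last:
        # Left leaves are written so that they increase from right to left.
        l = sorted((x % 10 for x in left if row_key(x) == key), reverse=True)
        r = sorted(x % 10 for x in right if row_key(x) == key)
        rows.append((" ".join(map(str, l)), key[0], " ".join(map(str, r))))
        stem, fifth = key
        key = (stem, fifth + 1) if fifth < 4 else (stem + 1, 0)
    return rows

girls = [16, 17, 19, 15, 12, 16, 17, 19, 19, 16]
boys = [14, 15, 16, 13, 12, 13, 14, 13, 15, 14]

rows = stem_rows(boys, girls)
width = max(len(l) for l, _, _ in rows)
print(f"{'Boys':>{width}} | Stem | Girls")
for l, stem, r in rows:
    print(f"{l:>{width}} |  {stem}   | {r}")
print("Key: 1 | 2 = 12")
```

Printing the left leaves right-aligned against the stem column keeps the two sets of leaves growing outwards from the centre, which is what makes the shapes easy to compare.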

MATHS QUEST 12 FURTHER MATHEMATICS VCE Units 3 and 4

To get a more precise picture of the centre and spread of each of the distributions, we
can use the summary statistics discussed in topic 1. Specifically, we are interested in:
1. the mean and the median (to measure the centre of the distributions), and
2. the interquartile range and the standard deviation (to measure the spread of the
distributions).
We saw in topic 1 that the calculation of these summary statistics is very
straightforward using CAS.
WORKED EXAMPLE 2

The number of 'how to vote' cards handed out by various Australian Labor
Party and Liberal Party volunteers during the course of a polling day is as
shown.

Labor     180  233  246  252  263  270  193  202  210  222
          257  247  229  238  226  211  234  226  214  204
Liberal   204  215  226  253  263  272  287  273  266  233
          244  285  245  267  275  250  272  280  279  261

Display the data using a back-to-back stem plot and use this, together with
summary statistics, to compare the distributions of the number of cards
handed out by the Labor and Liberal volunteers.

THINK
1 Construct the stem plot.
WRITE
     Leaf              Leaf
    Labor     Stem     Liberal
        0      18
        3      19
      4 2      20      4
    4 1 0      21      5
  9 6 6 2      22      6
    8 4 3      23      3
      7 6      24      4 5
      7 2      25      0 3
        3      26      1 3 6 7
        0      27      2 2 3 5 9
               28      0 5 7
Key: 18 | 0 = 180

THINK
2 Use CAS to obtain summary statistics for each party. Record the mean,
median, IQR and standard deviation in the table. (IQR = Q3 − Q1)
WRITE
                      Labor    Liberal
Mean                  227.9    257.5
Median                227.5    264.5
IQR                   36       29.5
Standard deviation    23.9     23.4

THINK
3 Comment on the relationship.
WRITE
From the stem plot we see that the Labor distribution is
symmetric and therefore the mean and the median are
very close, whereas the Liberal distribution is negatively
skewed.
Since the distribution is skewed, the median is a better
indicator of the centre of the distribution than the mean.
Comparing the medians therefore, we have the median
number of cards handed out for Labor at 228 and for
Liberal at 265, which is a big difference.
The standard deviations were similar, as were the
interquartile ranges. There was not a lot of difference in
the spread of the data.
In essence, the Liberal party volunteers handed
out more 'how to vote' cards than the Labor party
volunteers did.
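
The summary statistics in Worked example 2 can be checked without CAS. The sketch below is illustrative only: the two lists are the counts read back from the stem plot (in sorted order), the IQR uses the median-of-halves quartile rule, and the standard deviation is the sample standard deviation. Both conventions are assumptions here, chosen because they agree with the tabulated values; the helper names are hypothetical.

```python
from statistics import mean, median, stdev

labor = [180, 193, 202, 204, 210, 211, 214, 222, 226, 226,
         229, 233, 234, 238, 246, 247, 252, 257, 263, 270]
liberal = [204, 215, 226, 233, 244, 245, 250, 253, 261, 263,
           266, 267, 272, 272, 273, 275, 279, 280, 285, 287]

def iqr(data):
    """Interquartile range, with Q1 and Q3 as medians of the two halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(upper) - median(lower)

def summary(data):
    """Centre (mean, median) and spread (IQR, sample standard deviation)."""
    return {
        "mean": mean(data),
        "median": median(data),
        "IQR": iqr(data),
        "std dev": stdev(data),
    }

for party, data in (("Labor", labor), ("Liberal", liberal)):
    stats = summary(data)
    print(party, ", ".join(f"{k} = {v:.2f}" for k, v in stats.items()))
```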

EXERCISE 2.2 Back-to-back stem plots

PRACTISE

1 WE1 Boys and girls submitted an assignment on the history of the ANZACs.
The results out of 40 are shown.

Girls' results   30  35  31  32  38  33  35  30
Boys' results    34  33  37  39  31  32  39  36

Display the data on a back-to-back stem plot.

2 The marks obtained out of 50 by students in Physics and Chemistry are shown.
Display the data on a back-to-back stem plot.

32  45  48  32  24  30  41  29  44  45  36  34  28  49
Chemistry
46  31  38  28  45  49  34  45  47  33  30  21  32  28
Physics
3 WE2 The number of promotional pamphlets handed out for company A and
company B by a number of their reps is shown.

Company A   144 156 132 138 148 160 141 134 132 142 132 134 168 149
Company B   146 131 138 155 145 153 134 153 138 133 130 162 148 160

Display the data using a back-to-back stem plot and use this, together with
summary statistics, to compare the number of pamphlets handed out by
each company.

4 A comparison of student achievements (out of 100) in History and English was
recorded and the results shown.

History   75  78  42  92  59  67  78  82  84  64  77  98
English   78  80  57  96  58  71  74  87  79  62  75  100

a Draw a back-to-back stem plot.
b Use summary statistics and the stem plot to comment on the two subjects.
CONSOLIDATE

5 The marks out of 50 obtained for the end-of-term test by the students in German
and French classes are given as shown. Display the data on a back-to-back
stem plot.

20 38 45 21 30 39 41 22 27 33 30 21 25 32 37 42 26 31 25 37
French
23 25 36 46 44 39 38 24 25 42 38 34 28 31 44 30 35 48 43 34
German

6 The birth masses of 10 boys and 10 girls (in kilograms, correct to the nearest
100 grams) are recorded in the table. Display the data on a back-to-back
stem plot.

3.4 5.0 4.2 3.7 4.9 3.4 3.8 4.8 3.6 4.3
Girls
3.0 2.7 3.7 3.3 4.0 3.1 2.6 3.2 3.6 3.1
Boys

7 The number of delivery trucks making deliveries to a supermarket each day over a
2-week period was recorded for two neighbouring supermarkets, supermarket A
and supermarket B. The data are shown in the table.

11 15 20 25 12 16 21 27 16 17 17 22 23 24
10 15 20 25 30 35 16 31 32 21 23 26 28 29

a Display the data on a back-to-back stem plot.
b Use the stem plot, together with some summary statistics, to compare the
distributions of the number of trucks delivering to supermarkets A and B.


8 The marks out of 20 obtained by males and females for a science test in a
Year 10 class are given.

12  13  14  14  15  15  16  17
Males
10  12  13  14  14  15  17  19
Females

a Display the data on a back-to-back stem plot.
b Use the stem plot, together with some summary statistics, to compare the
distributions of the marks of the males and the females.


9 The end-of-year English marks for 10 students in an English class were
compared over 2 years. The marks for 2011 and for the same students in 2012
are as shown.

2011   30  31  35  37  39  41  41  42  43  46
2012   22  26  27  28  30  31  31  33  34  36

a Display the data on a back-to-back stem plot.
b Use the stem plot, together with some summary statistics, to compare the
distributions of the marks obtained by the students in 2011 and 2012.

10 The age and gender of a group of people attending a fitness class are recorded.

Female   23  24  25  26  27  28  30  31
Male     22  25  30  31  36  37  42  46

a Display the data on a back-to-back stem plot.
b Use the stem plot, together with some summary statistics, to compare the
distributions of the ages of the female members to the male members of the
fitness class.
11 The scores on a board game for a group of kindergarten children and for a group
of children in a preparatory school are given as shown.

3  13  14  25  28  32  36  41  47  50
Prep. school
12  17  25  27  32  35  44  46  52
Kindergarten

a Display the data on a back-to-back stem plot.
b Use the stem plot, together with some summary statistics, to compare the
distributions of the scores of the kindergarten children compared to the
preparatory school children.
12 A pair of variables that could be displayed on a back-to-back stem plot is:
A the height of a student and the number of people in the student's household
B the time put into completing an assignment and a pass or fail score on the
assignment
C the weight of a businessman and his age
D the religion of an adult and the person's head circumference
E the income of an employee and the time the employee has worked for
the company
13 A back-to-back stem plot is a useful way of displaying the relationship between:
A the proximity to markets in kilometres and the cost of fresh foods on average
per kilogram
B height and head circumference
C age and attitude to gambling (for or against)
D weight and age
E the money spent during a day of shopping and the number of shops visited
on that day
MASTER

14 The score out of 100 a group of males and females received when going for their
licence are shown. Construct a back-to-back stem plot of the data.

Male     86  92  100  90  94  82  72  90  88  94  76  80
Female   94  96  72   80  84  92  83  88  90  70  81  83

15 A back-to-back stem plot is used to display two sets of data, involving which two
variables?
A increasing variables
B discrete and numerical variables
C continuous and categorical variables
D numerical and categorical variables
E numerical and continuous variables

16 The study scores (out of 50) of students who studied both Mathematical Methods
and Further Mathematics are shown.

Methods   28  34  41  36  33  39  44  40  39  42  36  31  29  44
Further   30  37  38  41  35  43  44  46  43  48  37  31  28  48

a Display the data in a back-to-back stem plot.
b Use the stem plot, together with some summary statistics, to compare the
distributions of the scores for Mathematical Methods compared to Further
Mathematics.

2.3 Parallel boxplots and dot plots
Unit 3, AOS DA, Topic 6, Concept 3

We saw in the previous section that we could display relationships between a
numerical variable and a categorical variable with just two categories, using a
back-to-back stem plot.
When we want to display a relationship between a numerical variable and a
categorical variable with two or more categories, parallel boxplots or parallel dot
plots can be used.
Parallel boxplots are obtained by constructing individual boxplots for each distribution
and positioning them on a common scale.
Parallel dot plots are obtained by constructing individual dot plots for each
distribution and positioning them on a common scale.
Construction of individual boxplots was discussed in detail in topic 1. In this section
we concentrate on comparing distributions represented by a number of boxplots (that
is, on the interpretation of parallel boxplots).

Interactivity: Parallel boxplots (int-6248)

WORKED EXAMPLE 3

The four Year 7 classes at Western Secondary College complete the same
end-of-year maths test. The marks, expressed as percentages for the four
classes, are given as shown.

7A 40 43 45 47 50 52 53 54 57 60 69 63 63 68 70 75 80 85 89 90
7B 60 62 63 64 70 73 74 76 77 77 78 82 85 87 89 90 92 95 97 97
7C 50 51 53 55 57 60 63 65 67 69 70 72 73 74 76 80 82 82 85 89
7D 40 42 43 45 50 53 55 59 60 61 69 73 74 75 80 81 82 83 84 90

Display the data using parallel boxplots. Use this to describe any similarities
or differences in the distributions of the marks between the four classes.

THINK
1 Use CAS to determine the five-number summary for each data set.
WRITE/DRAW
               7A      7B      7C      7D
Min.           40      60      50      40
Q1             51      71.5    58.5    51.5
Median = Q2    61.5    77.5    69.5    65
Q3             72.5    89.5    78      80.5
Max.           90      97      89      90

THINK
2 Draw the boxplots, labelling each class. All four boxplots share a common scale.
WRITE/DRAW
[Figure: parallel boxplots for 7A, 7B, 7C and 7D on a common scale; horizontal
axis: Maths mark (%), 30 to 100]

THINK
3 Describe the similarities and differences between the four distributions.
WRITE/DRAW
Class 7B had the highest median mark and the range
of the distribution was only 37. The lowest mark in 7B
was 60.
We notice that the median of 7A's marks is 61.5. So, 50%
of students in 7A received less than 61.5. This means that
about half of 7A had scores that were less than the lowest
score in 7B.
The range of marks in 7A was the same as that of 7D with
the highest scores in each equal (90), and the lowest scores
in each equal (40). However, the median mark in 7D (65)
was slightly higher than the median mark in 7A (61.5) so,
despite the same range, more students in 7D received a
higher mark than in 7A.
While 7D had a top score that was higher than that of 7C,
the median score in 7C (69.5) was higher than that of 7D
and almost 25% of scores in 7D were less than the lowest
score in 7C. In summary, 7B did best, followed by 7C, then
7D and finally 7A.
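
The five-number summaries in Worked example 3 can also be reproduced with a short Python sketch. It assumes the median-of-halves rule for Q1 and Q3 (the overall median is left out of both halves when the count is odd); that rule is an assumption rather than something the text specifies, although it agrees with the values tabulated above. The function name is hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

classes = {
    "7A": [40, 43, 45, 47, 50, 52, 53, 54, 57, 60, 69, 63, 63, 68, 70, 75, 80, 85, 89, 90],
    "7B": [60, 62, 63, 64, 70, 73, 74, 76, 77, 77, 78, 82, 85, 87, 89, 90, 92, 95, 97, 97],
    "7C": [50, 51, 53, 55, 57, 60, 63, 65, 67, 69, 70, 72, 73, 74, 76, 80, 82, 82, 85, 89],
    "7D": [40, 42, 43, 45, 50, 53, 55, 59, 60, 61, 69, 73, 74, 75, 80, 81, 82, 83, 84, 90],
}

for name, marks in classes.items():
    lo, q1, med, q3, hi = five_number_summary(marks)
    print(f"{name}: min={lo} Q1={q1} median={med} Q3={q3} max={hi} range={hi - lo}")
```

Each summary gives the five positions needed to draw one boxplot; placing the four boxplots against the same axis is what makes them parallel.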

EXERCISE 2.3 Parallel boxplots and dot plots

PRACTISE

1 WE3 The times run for a 100 m race in grade 6 are shown for both boys and girls.
The times are expressed in seconds.

Boys    15.5 16.1 14.5 16.9 18.1 14.3 13.8 15.9 16.4 17.3 18.8 17.9 16.1
Girls   16.7 18.4 19.4 20.1 16.3 14.8 17.3 20.3 19.6 18.4 16.5 17.2 16.0

Display the data using parallel boxplots and use this to describe any similarities or
differences between the boys' and girls' performances.
2 A teacher taught two Year 10 maths classes and wanted to see how they compared
on the end of year examination. The marks are expressed as percentages.

10A   67  73  45  59  67  89  42  56  68  75  94  80  98
10D   76  82  62  58  40  55  69  71  89  95  100  84  70  66  87

Display the data using parallel boxplots and parallel dot plots. Use this to describe
any similarities or differences between the two classes.

CONSOLIDATE

3 The heights (in cm) of students in 9A, 10A and 11A were recorded and are shown
in the table.

9A    120 126 131 138 140 143 146 147 150 156 157 158 158 160 162 164 165 170
10A   140 143 146 147 149 151 153 156 162 164 165 167 168 170 173 175 176 180
11A   151 153 154 158 160 163 164 166 167 169 169 172 175 180 187 189 193 199

a Construct parallel boxplots to show the data.
b Use the boxplots to compare the distributions of height for the 3 classes.
4 The amounts of money contributed annually to superannuation schemes by people
in 3 different age groups are as shown.

20–29   2000    3100    5000    5500    6200    6500    6700    7000    9200    10 000
30–39   4000    5200    6000    6300    6800    7000    8000    9000    10 300  12 000
40–49   10 000  11 200  12 000  13 300  13 500  13 700  13 900  14 000  14 300  15 000

a Construct parallel boxplots to show the data.
b Use the boxplots to comment on the distributions.
5 The daily share price of two companies was recorded over a period of one month.
The results are presented as parallel boxplots.
[Figure: parallel boxplots for Company A and Company B; horizontal axis: Price per
share (cents), 65 to 105]
State whether each of the following statements is true or false.
a The distribution of share prices for Company A is symmetrical.
b On 25% of all occasions, share prices for Company B equalled or exceeded the
highest price recorded for Company A.
c The spread of the share prices was the same for both companies.
d 75% of share prices for Company B were at least as high as the median share
price for Company A.
6 Last year, the spring season at the Sydney Opera House included two
major productions: The Pearlfishers and Orlando. The number of
A-reserve tickets sold for each performance of the two operas is
shown as parallel boxplots.
[Figure: parallel boxplots for The Pearlfishers and Orlando; horizontal axis: Number
of A-reserve tickets sold, 400 to 950]
a Which of the two productions proved to be more popular with the
public, assuming A-reserve ticket sales reflect total ticket sales? Explain
your answer.
b Which production had a larger variability in the number of patrons purchasing
A-reserve tickets? Support your answer with the necessary calculations.
7 The results for a maths test given to classes in two different year levels, one in
Year 8 and the other in Year 10, are given by the parallel boxplots.
[Figure: parallel boxplots for Year 8 and Year 10; horizontal axis: Result out of 100,
25 to 100]
The percentage of Year 10 students who obtained a mark greater than 87 was:
A 2%
B 5%
C 20%
D 25%
E 75%
8 From the parallel boxplots in question 7, it can be concluded that:
A the Year 8 results were similar to the Year 10 results
B the Year 8 results were lower than the Year 10 results and less variable
C the Year 8 results were lower than the Year 10 results and more variable
D the Year 8 results were higher than the Year 10 results and less variable
E the Year 8 results were higher than the Year 10 results and more variable
9 The scores of 10 competitors on two consecutive days of a diving competition are
recorded below:

Day 2
5.4 5.6 4.9 5.2 5.4 5.6 4.9
Day 1
4.1
5.4 6.0 5.8 6.0 5.1 5.3 5.8 5.7 5.8 5.4 5.5 6.0

Construct parallel dot plots to show the data and comment on the divers'
results over the two days.
The following figure relates to questions 10 to 12.
The ages of customers in different areas of a department store are as shown.
[Figure: parallel boxplots for Area A, Area B and Area C; horizontal axis: Ages of
people in various areas in a store, 5 to 50]

10 Which area has the largest range from Q2 (the median) to Q3: Area A, Area B
or Area C?
11 Which area has the largest range: Area A, Area B or Area C?
12 Which area has the highest median age: Area A, Area B or Area C?

MASTER

13 The numbers of jars of vitamin A, B, C and multi-vitamins sold per week by a
local chemist are shown in the table.

Vitamin A        11 13 14
Vitamin B        10 10 11 12
                 14 15 15 15 17 19
Vitamin C        10 11 12 12 13
Multi-vitamins   12 13 13 15 16 16 17 19 19 20

Construct parallel boxplots to display the data and use it to compare the
distributions of sales for the 4 types of vitamin.
14 Eleven golfers in a golf tournament play 18 holes each day. The scores for
each of the golfers on the four days are given below. Display this data using
parallel boxplots.

Friday 77
Saturday 81
71 78 83 75 81 84 79 82 84 80 83 81 83 83 85 83 85 90
89 87 90 87 90 85 87 91 88 88 93 89 89 94 86
88
81 88 85 81 86
84
Sunday 70
Thursday 70

2.4 Two-way (contingency) frequency tables and segmented bar charts
Unit 3, AOS DA, Topic 6, Concept 4

When we are examining the relationship between two categorical variables, two-way
(or contingency) tables are an excellent tool. Consider the following example.
Once the two-way table is formed, marginal distributions and conditional
distributions can both be found. Marginal distributions are the sums (totals) of the row
or the column and are found in the margins of the table. The conditional distribution is
the sub-population (sample) and this is found in the middle of the table.
If we were to look at mobile phone preference as shown in the table, the marginal
distributions are the totals (the Total row and the Total column).

         Apple   Samsung   Nokia   Total
Men      13      9         3       25
Women    17      7         1       25
Total    30      16        4       50

Interactivity: Two-way tables and segmented bar graphs (int-6249)

The conditional distribution is the sub-population, so if we are looking at
people who prefer Samsung, the conditional distribution is given by the Samsung
column (9 men and 7 women).

         Apple   Samsung   Nokia   Total
Men      13      9         3       25
Women    17      7         1       25
Total    30      16        4       50

WORKED EXAMPLE 4

At a local shopping centre, 34 females and 23 males were asked which of the
two major political parties they preferred. Eighteen females and 12 males
preferred Labor. Display these data in a two-way (contingency) table, and
calculate the party preference for males and females.

THINK
1 Draw a table. Record the respondents' sex in the columns and party preference
in the rows of the table.
WRITE
Party preference   Female   Male   Total
Labor
Liberal
Total

THINK
2 We know that 34 females and 23 males were asked. Put this information into the
table and fill in the total.
We also know that 18 females and 12 males preferred Labor. Put this information in
the table and find the total of people who preferred Labor.
WRITE
Party preference   Female   Male   Total
Labor              18       12     30
Liberal
Total              34       23     57

THINK
3 Fill in the remaining cells. For example, to find the number of females who
preferred the Liberals, subtract the number of females preferring Labor from the
total number of females asked: 34 − 18 = 16.
WRITE
Party preference   Female   Male   Total
Labor              18       12     30
Liberal            16       11     27
Total              34       23     57

THINK
4 Marginal distributions for party preference for males and females refers to
percentage (probability) of each party. For Labor there are 30 out of a total of 57.
WRITE
Labor: 30/57 = 0.53

THINK
5 For Liberal there are 27 out of a total of 57.
WRITE
Liberal: 27/57 = 0.47
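
Steps 4 and 5 of Worked example 4 amount to dividing each row total by the grand total. A minimal Python sketch of that calculation follows; the dictionary layout and variable names are illustrative choices, not part of the worked example.

```python
# Two-way table from Worked example 4: rows are party preference, columns are sex.
table = {
    "Labor": {"Female": 18, "Male": 12},
    "Liberal": {"Female": 16, "Male": 11},
}

# Row totals sit in the right-hand margin of the table.
row_totals = {party: sum(cells.values()) for party, cells in table.items()}
grand_total = sum(row_totals.values())

# Marginal distribution of party preference: each row total as a proportion
# of everyone surveyed.
marginal = {party: total / grand_total for party, total in row_totals.items()}

for party, proportion in marginal.items():
    print(f"{party}: {row_totals[party]}/{grand_total} = {proportion:.2f}")
```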

In Worked example 4, we have a very clear breakdown of data. We know how many
females preferred Labor, how many females preferred the Liberals, how many males
preferred Labor and how many males preferred the Liberals.
If we wish to compare the number of females who prefer Labor with the
number of males who prefer Labor, we must be careful. While 12 males
preferred Labor compared to 18 females, there were fewer males than females
being asked. That is, only 23 males were asked for their opinion, compared to
34 females.
To overcome this problem, we can express the figures in the table as percentages.
WORKED EXAMPLE 5

Fifty-seven people in a local shopping centre were asked whether they
preferred the Australian Labor Party or the Liberal Party. The results are as
shown.

Party preference   Female   Male   Total
Labor              18       12     30
Liberal            16       11     27
Total              34       23     57

Convert the numbers in this table to percentages.

THINK
Draw the table, omitting the total column.
Fill in the table by expressing the number in each cell
as a percentage of its column's total. For example,
to obtain the percentage of males who prefer Labor,
divide the number of males who prefer Labor by the
total number of males and multiply by 100%.
12/23 × 100% = 52.2% (correct to 1 decimal place)
WRITE
Party preference   Female   Male
Labor              52.9     52.2
Liberal            47.1     47.8
Total              100.0    100.0
LI
N

We could have also calculated percentages from the table rows, rather than columns.
To do that we would, for example, have divided the number of females who preferred
Labor (18) by the total number of people who preferred Labor (30) and so on. The
table shows this:

Party preference   Female   Male   Total
Labor              60.0     40.0   100
Liberal            59.3     40.7   100

By doing this we have obtained the percentage of Labor supporters who were
female (60%), and the percentage of Labor supporters who were male
(40%), and so on. This highlights facts different from those shown in the
previous table. In other words, different results can be obtained by calculating
percentages from a table in different ways.
In the above example, the respondent's gender is referred to as the explanatory
variable, and the party preference as the response variable.
As a general rule, when the explanatory variable is
placed in the columns of the table, the percentages
should be calculated in columns.
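
The difference between column percentages and row percentages can be made concrete with a short Python sketch using the counts from Worked example 5. The function names are hypothetical and the nested-dictionary layout is just one convenient way to hold the table.

```python
counts = {
    "Labor": {"Female": 18, "Male": 12},
    "Liberal": {"Female": 16, "Male": 11},
}

def column_percentages(counts):
    """Each cell as a percentage of its column total (explanatory variable in columns)."""
    col_totals = {}
    for cells in counts.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {row: {col: 100 * n / col_totals[col] for col, n in cells.items()}
            for row, cells in counts.items()}

def row_percentages(counts):
    """Each cell as a percentage of its row total."""
    return {row: {col: 100 * n / sum(cells.values()) for col, n in cells.items()}
            for row, cells in counts.items()}

by_column = column_percentages(counts)
by_row = row_percentages(counts)

for party in counts:
    print(party,
          "column %:", {k: round(v, 1) for k, v in by_column[party].items()},
          "row %:", {k: round(v, 1) for k, v in by_row[party].items()})
```

The same counts answer two different questions: column percentages describe how each gender splits between the parties, while row percentages describe how each party's supporters split by gender.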

Comparing percentages in each row of a two-way table allows us to establish whether
a relationship exists between the two categorical variables that are being examined.
As we can see from the table in Worked example 5, the percentage of females who
preferred Labor is about the same as that of males. Likewise, the percentages of
females and males preferring the Liberal Party are almost equal. This indicates that
for the group of people participating in the survey, party preference is not related
to gender.

Segmented bar charts

When comparing two categorical variables, it can be useful to represent
the results from a two-way table (in percentage form) graphically. We can
do this using segmented bar charts.
A segmented bar chart consists of two or more columns, each of which matches
one column in the two-way table. Each column is subdivided into segments,
corresponding to each cell in that column.
For example, the data from Worked example 5 can be displayed using the
segmented bar chart shown.
[Figure: segmented bar chart of party preference (Labor, Liberal) by gender (Female,
Male); vertical axis: Percentage, 0 to 100]
The segmented bar chart is a powerful visual aid for comparing and examining the
relationship between two categorical variables.

WORKED EXAMPLE 6

Sixty-seven primary and 47 secondary school students were asked about
their attitude to the number of school holidays which should be given.
They were asked whether there should be fewer, the same number, or more
school holidays. Five primary students and 2 secondary students wanted
fewer holidays, 29 primary and 9 secondary students thought they had
enough holidays (that is, they chose the same number) and the rest thought
they needed to be given more holidays.
Present these data in percentage form in a two-way table and a
segmented bar chart. Compare the opinions of the primary and the
secondary students.

THINK
1 Put the data in a table. First, fill in the given information, then find the missing
information by subtracting the appropriate numbers from the totals.
WRITE/DRAW
Attitude   Primary   Secondary   Total
Fewer      5         2           7
Same       29        9           38
More       33        36          69
Total      67        47          114

THINK
2 Calculate the percentages. Since the explanatory variable (the level of the
student: primary or secondary) has been placed in the columns of the table, we
calculate the percentages in columns.
For example, to obtain the percentage of primary students who wanted fewer
holidays, divide the number of such students by the total number of primary
students and multiply by 100%.
That is, 5/67 × 100% = 7.5%.
WRITE/DRAW
Attitude   Primary   Secondary
Fewer      7.5       4.3
Same       43.3      19.1
More       49.2      76.6
Total      100.0     100.0

THINK
3 Rule out the set of axes. (The vertical axis shows percentages from 0 to 100,
while the horizontal axis represents the categories from the columns of the table.)
Draw two columns to represent each category, primary and secondary. Columns
must be the same width and height (up to 100%).
Divide each column into segments so that the height of each segment is equal to the
percentage in the corresponding cell of the table. Add a legend to the graph.
WRITE/DRAW
[Figure: segmented bar chart of attitude (Fewer, Same, More) by school level
(Primary, Secondary); vertical axis: Percentage, 0 to 100]

THINK
4 Comment on the results.
WRITE/DRAW
Secondary students were much keener on having
more holidays than were primary students.
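
The segment heights for a chart like the one in Worked example 6 can be worked out before drawing. The Python sketch below is illustrative only: it stacks the categories from the bottom of each column upwards in the order Fewer, Same, More (an assumed stacking order), and the function name segments is hypothetical.

```python
counts = {
    "Fewer": {"Primary": 5, "Secondary": 2},
    "Same": {"Primary": 29, "Secondary": 9},
    "More": {"Primary": 33, "Secondary": 36},
}

def segments(counts, column):
    """Return (category, bottom %, top %) for each segment of one column,
    stacked from the bottom of the bar upwards."""
    total = sum(cells[column] for cells in counts.values())
    bottom = 0.0
    result = []
    for category, cells in counts.items():
        height = 100 * cells[column] / total
        result.append((category, bottom, bottom + height))
        bottom += height
    return result

for level in ("Primary", "Secondary"):
    print(level)
    for category, bottom, top in segments(counts, level):
        print(f"  {category}: from {bottom:.1f}% to {top:.1f}% (height {top - bottom:.1f}%)")
```

Because every column is scaled to 100%, the two bars have equal height even though 67 primary and 47 secondary students were surveyed, which is what allows the segments to be compared directly.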

EXERCISE 2.4 Two-way (contingency) frequency tables and segmented bar charts

PRACTISE

1 WE4 A group of 60 people, 38 females and 22 males, were asked whether they
prefer an Apple or Samsung phone. Twenty-three females and 15 males said they
preferred an Apple phone. Display this data in a two-way (contingency) table and
calculate the marginal distribution for phone preference for males and females.

2 A group of 387 females and 263 males were asked their preference from Coke
and Pepsi. Two hundred and twenty-one females preferred Coke, whereas
108 males preferred Pepsi. Display this data in a two-way (contingency) table
and calculate the conditional distribution of drink preference among females.

3 WE5 A group of 60 people were asked their preferences on phones. The results
are shown.

Phone     Females   Males   Total
Apple     23        15      38
Samsung   15        7       22
Total     38        22      60

Convert the numbers in this table to percentages.

4 A group of 650 people were asked their preferences on soft drink. The results
are shown.

Drink   Females   Males   Total
Pepsi   221       155     376
Coke    166       108     274
Total   387       263     650

Convert the numbers in this table to percentages.

5 WE6 Sixty-one females and 57 males were asked which they prefer off the menu:
entrée, main or dessert. Seven males and 18 females preferred entrée, while
31 males and 16 females said they preferred the main course, with the remainder
preferring dessert.
Present these data in percentage form in a two-way table and a segmented bar
chart. Compare the opinions of the males and females on their preferences.

6 Ninety-three people less than 40 years of age and 102 people aged 40 and over
were asked to name their financial priority from three options: mortgage,
superannuation or investing. Eighteen people in the 40 and over category and
42 people in the less than 40 years category identified mortgage as their priority,
whereas 21 people under 40 years of age and 33 people aged 40 and over said
investment was most important. The rest suggested superannuation was their
most important priority.
Present these data in percentage form in a two-way table and segmented bar chart.
Compare the opinions of the under 40s to the people aged 40 and over on their
financial priorities.

CONSOLIDATE

7 In a survey, 139 women and 102 men were asked whether they approved or
disapproved of a proposed freeway. Thirty-seven women and 79 men approved
of the freeway. Display these data in a two-way table (not as percentages),
and calculate the approval or disapproval of the proposed freeway for men
and women.

8 Students at a secondary school were asked whether the length of lessons should
be 45 minutes or 1 hour. Ninety-three senior students (Years 10–12) were asked
and it was found 60 preferred 1-hour lessons, whereas of the 86 junior students
(Years 7–9), 36 preferred 1-hour lessons. Display these data in a two-way table
(not as percentages), and calculate the conditional distribution of lesson
length for senior students.


9 For each of the following two-way frequency tables, complete the missing
entries.
a
Attitude   Female   Male   Total
For          25       i      47
Against      ii      iii     iv
Total        51       v      92

b
Attitude   Female   Male   Total
For           i      ii      21
Against      iii     21      iv
Total         v      30      63

c
Party preference   Female   Male
Labor                 i      42%
Liberal              53%     ii
Total                iii     iv

10 Sixty single men and women were asked whether they prefer to rent by
themselves, or to share accommodation with friends. The results are shown below.

Preference            Men   Women   Total
Rent by themselves     12     23      35
Share with friends      9     16      25
Total                  21     39      60

Convert the numbers in this table to percentages.

The information in the following two-way frequency table relates to
questions 11 and 12.
The data show the reactions of administrative staff and technical staff to an upgrade
of the computer systems at a large corporation.

Attitude   Administrative staff   Technical staff   Total
For                 53                   98           151
Against             37                   31            68
Total               90                  129           219

11 From the previous table, we can conclude that:
A 53% of administrative staff were for the upgrade
B 37% of administrative staff were for the upgrade
C 37% of administrative staff were against the upgrade
D 59% of administrative staff were for the upgrade
E 54% of administrative staff were against the upgrade
12 From the previous table, we can conclude that:
A 98% of technical staff were for the upgrade
B 65% of technical staff were for the upgrade
C 76% of technical staff were for the upgrade
D 31% of technical staff were against the upgrade
E 14% of technical staff were against the upgrade
13 Delegates at the respective Liberal Party and Australian Labor Party conferences
were surveyed on whether or not they believed that marijuana should be legalised.
Sixty-two Liberal delegates were surveyed and 40 of them were against
legalisation. Seventy-one Labor delegates were surveyed and 43 were against
legalisation.
Present the data in percentage form in a two-way frequency table. Comment on
any differences between the reactions of the Liberal and Labor delegates.
14 Use the results in question 13 to draw a segmented bar chart.

The information in the following table relates to questions 15–18.
The amount of waste recycled by 100 townships across Australia was rated as low,
medium or high and the size of the town as small, mid-sized or large.
The results of the ratings are:
Type of town

Amount of
waste recycled

Mid-sized

Large

Low

Medium

31

High

16

18


Small

MASTER

15 The percentage of mid-sized towns rated as having a high level of waste recycling
is closest to:
A 41%
B 25%
C 30%
D 17%
E 50%
16 The variables, Amount of waste recycled and Type of town, as used in this
rating are:
A both categorical variables
B both numerical variables
C numerical and categorical respectively
D categorical and numerical respectively
E neither categorical nor numerical variables
17 Calculate the conditional distribution for amount of waste and large towns.
18 Calculate the percentage of small towns rated as having a high level of waste
recycling.


ONLINE ONLY

2.5 Review

The Maths Quest Review is available in a customisable
format for you to demonstrate your knowledge of this
topic.
The Review contains:
• Multiple-choice questions providing you with the
  opportunity to practise answering questions using
  CAS technology
• Short-answer questions providing you with the
  opportunity to demonstrate the skills you have
  developed to efficiently answer questions using the
  most appropriate methods
• Extended-response questions providing you with
  the opportunity to practise exam-style questions.

A summary of the key points covered in this topic is
also available as a digital document.

REVIEW QUESTIONS
Download the Review questions document from
the links found in the Resources section of your
eBookPLUS.

ONLINE ONLY

Activities
To access eBookPLUS activities, log on to
www.jacplus.com.au

Interactivities
A comprehensive set of relevant interactivities
to bring difficult mathematical concepts to life
can be found in the Resources section of your
eBookPLUS.

studyON is an interactive and highly visual online
tool that helps you to clearly identify strengths
and weaknesses prior to your exams. You can
then confidently target areas of greatest need,
enabling you to achieve your best results.


2 Answers
1 Key 3 | 1 = 31

History
Leaf
2
9
7 4
8 8 7 5
4 2
8 2

Leaf
Girls
0 0 1
2 3
5 5

Stem
3
3
3
3
3

2 Key 2 | 4 = 24

Stem
2
2*
3
3*
4
4*

Chemistry
Leaf
1
8 8
0 1 2 3 4
8

IQR

Median
IQR

3 3
5
0 2

Mean

5 6 8

Standard deviation

83 65.5 = 17.5

83.5 66.5 = 17

Company B

143.57

144.71

141.5

145.5

149 134 = 15 153 134 = 19


11.42

15.07

Company A

10.87

They are both positively skewed. The median is a better


indicator of the centre of the distribution than the mean.
This shows Company B handing out more pamphlets,
taking into account that the IQR and the standard
deviations are quite similar.


76.5


15
15*
16
16*

77.5

5 Key: 2 | 3 = 23

6
0
8

Company B
Leaf
0 1 3 4
8 8


Stem
13
13*
14
14*

76.42

13.60

History has a slightly higher median; however,
English has a slightly higher mean. Their standard
deviations are similar, so overall the results are
quite similar.

3 Key 13 | 0 = 130

Company A
Leaf
4 4 2 2 2
8
4 2 1
9 8

74.67

Standard
deviation

5 5 6 7 9

7 8
2
1 4 5 8 9
0 7
6
0

English

Mean
Median

English
Leaf

History


Physics
Leaf
4
9 8
4 2 2 0
6
4 1
9 8 5 5

Stem
4
5
6
7
8
9
10

Leaf
Boys
1
3 2
4
7 6
9 9

Key 5 | 7 = 57


4a

EXERCISE 2.2


Leaf
German
2 1 1 0
7 6 5 5
3 2 1 0 0
9 8 7 7
2 1
5

Stem
2
2*
3
3*
4
4*

Leaf
French
3 4
5 5 8
0 1 4 4
5 6 8 8 9
2 3 4 4
6 8

6 Key: 2* | 7 = 2.7 (kg)

Leaf
Boys
4
8 7
3
9

4
6
2
8
0

Stem
2*
3
3*
4
4*
5

Leaf
Girls
6 7
0 1 1 2 3
6 7
0

7 a Key: 2* | 5 = 25 trucks

Leaf
A
2 1
7 7 6 6 5
4 3 2 1 0
7 5

Stem

The spread of each of the distributions is much the


same, but the centre of each distribution is quite
different with the centre of the 2012 distribution
lower. The work may have become a lot harder!

Leaf
B
0
5 6
0 1 3
5 6 8 9
0 1 2
5

1
1*
2
2*
3
3*

10 a Key: 3* | 6 = 36 years old

Leaf
Female
4 3
8 7 6 5
1 0

Leaf
Males
0
2 3
4 4 5
7
9

11 a Key: 5 | 0 = 50 points

3 2
5 5 4 4
7 6

Stem
1
1
1
1
1

Leaf
Females

the median is 26.5, the standard deviation is 2.8 and


the interquartile range is 4.5.
For the distribution of the males, the mean is 33.6, the
median is 33.5, the standard deviation is 8.2 and the
interquartile range is 12.
The centre of the distributions is very different: it is
much higher for the males. The spread of the ages of
the females who attend the fitness class is very small
but very large for males.


8 a Key: 1 | 2 = 12 marks

b For the distribution of the females, the mean is 26.75,

the standard deviation is 4.9 and the interquartile


range is 7. The distribution is symmetric.
For supermarket B the mean is 24.4, the median is
25.5, the standard deviation is 7.2 and the interquartile
range is 10. The distribution is symmetric.
The centre and spread of the distribution of
supermarket B is higher than that of supermarket A.
There is greater variation in the number of trucks
arriving at supermarket B.

b For the marks of the females, the mean is 14.5, the

9 a Key: 2* | 6 = 26 marks

Stem
2
2*
3
3*
4
4*

1 0
9 7 5
3 2 1 1
6

Leaf
2012
2
6 7 8
0 1 1 3 4
6

b The distribution of marks for 2011 and for 2012 are

each symmetric.
For the 2011 marks, the mean is 38.5, the median
is40, the standard deviation is 5.2 and the
interquartile range is 7. The distribution is symmetric.
For the 2012 marks, the mean is 29.8, the median
is30.5, the standard deviation is 4.2 and the
interquartile range is 6.

Stem
0
1
2
3
4
5

Leaf
Prep.
5
2 7
5 7
2 5
4 6
2

children, the mean is 28.9, the median is 30, the


standard deviation is 15.4 and the interquartile
range is 27.
For the distribution of scores for the prep. children,
the mean is 29.5, the median is 29.5, the standard
deviation is 15.3 and the interquartile range is 27.
The distributions are very similar. There is not a lot of
difference between the way the kindergarten children
and the prep. children scored.


Leaf
Kindergarten
3
4 3
8 5
6 2
7 1
0

b For the distribution of scores of the kindergarten


median is 14.5, the standard deviation is 1.6 and the


interquartile range is 2. The distribution is symmetric.
For the marks of the males, the mean is 14.25, the
median is 14, the standard deviation is 2.8 and the
interquartile range is 3.5. The distribution is symmetric.
The centre of each distribution is about the same. The
spread of marks for the boys is greater, however. This
means that there is a wider variation in the abilities of
the boys compared to the abilities of the girls.
Leaf
2011

Leaf
Male
2
5
0 1
6 7
2
6


b For supermarket A the mean is 19, the median is 18.5,

Stem
2
2*
3
3*
4
4*

12 B
13 C

14 Key: 7 | 2 = 72

Male
Leaf
2
6
2 0
8 6
4 4 2 0 0
0

Stem
7
7*
8
8*
9*
9*
10

Female
Leaf
0 2
0 1 3 3 4
8
0 2 4
6


15 D

Further Mathematics
Leaf
8
0 1
5 7 7 8
1 3 3 4
6 8 8

                     Mathematical Methods   Further Mathematics
Mean                        36.86                  39.21
Median                      37.5                   39.5
IQR                      41 − 33 = 8            44 − 35 = 9
Standard deviation          5.29                   6.58

Stem
2*
3
3*
4
4*

3a

11A
10A
9A


Mathematical Methods
Leaf
9 8
4 3 1
9 9 6 6
4 4 2 1 0

120 130 140 150 160 170 180 190 200


Height (cm)

b Clearly, the median height increases from Year 9 to
Year 11. There is greater variation in 9A's distribution
than in 10A's. There is a wide range of heights in the
lower 25% of 9A's distribution.
There is a greater variation in 11A's distribution than
in 10A's, with a wide range of heights in the top 25%
of the 11A distribution.

EXERCISE 2.3


Note: When comparing and contrasting data sets, answers


will naturally vary. It is good practice to discuss your
conclusions in a group to consider different viewpoints.

4a

b Mathematical Methods has a slightly lower IQR
and standard deviation. It was found that Further
Mathematics had a greater mean (39.21) as compared
to Mathematical Methods (36.86), as well as a greater
median: 39.5 as compared to 37.5. This suggests
that students do better in Further Mathematics as
compared to Mathematical Methods by an average of
two study scores.

16 a Key: 3 | 1 = 31

From the boxplots you can see the medians are the same
but 10D has a higher mean. 10D also has the highest
score of 100%, but 10D also has the lowest score. Since
Q1 and Q3 are closer together for 10D, their results
are more consistent around the median. The parallel
dot plot confirms this but doesn't give you any further
information.

1 Boys

14 16 18 20
Time (seconds)


Boys: 13.8, 15, 16.1, 17.6, 18.8

Girls: 14.8, 16.4, 17.3, 19.5, 20.3

5a
c
6a

From the boxplots we can see that the boys have a
significantly lower median. The boys' median is lower
than Q1 of the girls' times; that is, the lowest 25% of
times for the girls are greater than the lowest 50% of times
for the boys.
2

10A
10D
30

20–29 age
group
2
4
6
8 10 12 14 16
Annual superannuation contribution
( $1000)

superannuation for people in their 40s. The spread


of contributions for that age group is smaller than
for people in their 20s or 30s, suggesting that a high
proportion of people in their 40s are conscious of
superannuation. For people in their 20s and 30s,
the range is greater, indicating a range of interest in
contributing to super.
5 a True
b True
c False
d True
6 a The Pearlfishers, which had a significantly higher
median number of A-reserve tickets sold, as well as a
higher minimum and maximum number of A-reserve
tickets sold.
b Orlando, which had both a larger range and IQR of
A-reserve tickets sold.

7D
40

50 60 70 80
Exam mark (%)

90 100

10A: 42, 59, 70.5, 87, 98

8C
9 Day 1
4.0 4.2 4.4 4.6 4.8 5.0 5.2 5.4 5.6 5.8 6.0

Diving
score

10D: 40, 62, 70.5, 84, 100

Day 2

10A

The dives on day 1 were more consistent than the dives


on day 2 with most of the dives between 5.4 and 6.0

Exam
mark
40 45 50 55 60 65 70 75 80 85 90 95 100 (%)

10D


30–39 age
group

b Clearly, there is a great jump in contributions to

Girls

40–49 age
group


(inclusive), despite two lower dives. Day 2 was more


spread with dives from 4.9 to 6.0 (inclusive). It must be
noted that there were no very low scoring dives on the
second day.
10 B
11 B
12 C

the answers.

Multi-vitamin
10 12 14 16
Number of jars sold

18

20

Overall, the biggest sales were of multi-vitamins,
followed by vitamin B, then C and finally vitamin A.
14 For all four days, the median is the 6th score.
For all four days, Q1 is the 3rd score. For all four days,
Q3 is the 9th score.
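The positions quoted in answer 14 can be checked with a short calculation. The sketch below is an illustration only: it assumes each day has 11 scores (inferred from the median being the 6th score) and uses the median-of-halves convention, in which the median is left out of both halves when the number of scores is odd. The function name `quartile_positions` is hypothetical.

```python
def quartile_positions(n):
    """Return 1-based positions (Q1, median, Q3) in a sorted list of n scores.

    Uses the median-of-halves convention: for odd n the median itself is
    left out of both halves. A position ending in .5 means "average the
    two neighbouring scores".
    """
    half = n // 2            # number of scores in each half
    median = (n + 1) / 2     # middle of the whole list
    q1 = (half + 1) / 2      # middle of the lower half
    q3 = (n - half) + q1     # same offset, measured into the upper half
    return q1, median, q3


print(quartile_positions(11))  # (3.0, 6.0, 9.0)
```

With 11 scores the lower half is scores 1 to 5 and the upper half is scores 7 to 11, so the quartiles sit at the 3rd and 9th scores.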

Drink      Female   Male   Total
Coke         221     155     376
Pepsi        166     108     274
Total        387     263     650

3
Phone      Female   Male
Apple       60.5%   68.2%
Samsung     39.5%   31.8%
Total        100%    100%

4
Drink      Female   Male
Coke        57.1%   58.9%
Pepsi       42.9%   41.1%
Total        100%    100%

Choice

Male

Female

Total

Entre

18

26

Main

31

16

47

Dessert

19

27

46

Total

58

61

118

Choice

Male

Female


Range

Median

Thursday

70

90

20

81

Friday

77

89

12

83

Saturday

81

89

86

Sunday

70

94

24

Max.

89


Sunday

Phone

Total

Min.

14

30

Main

53

26

Friday

Dessert

33

44

Total

100

100

Entre

Saturday


Thursday

70

Coke

Pepsi

Day

Total


Male

Female

Conditional distribution: Females who prefer Coke = 0.57


Females who prefer Pepsi = 0.43
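As a check on these values, a conditional distribution divides each count in one column by that column's total. The sketch below is illustrative only; it assumes the counts from the soft-drink table in question 4, and the function name `conditional_distribution` is hypothetical.

```python
def conditional_distribution(table, given):
    """Proportion of each response within one column (the 'given' group)."""
    counts = table[given]
    total = sum(counts.values())
    return {response: n / total for response, n in counts.items()}


# Counts assumed from the soft-drink table (one inner dict per column).
drink = {
    "Female": {"Coke": 221, "Pepsi": 166},
    "Male": {"Coke": 155, "Pepsi": 108},
}

female = conditional_distribution(drink, "Female")
print({response: round(p, 2) for response, p in female.items()})
# {'Coke': 0.57, 'Pepsi': 0.43}
```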

Drink

75 80 85 90 95 100
Golf scores on 4 days
in a tournament

EXERCISE 2.4

1 Note that black data is given in the question; red data are
the answers.

Phone      Female   Male   Total
Apple         23      15      38
Samsung       15       7      22
Total         38      22      60

Percentage

13

2 Note that black data is given in the question; red data are

100
90
80
70
60
50
40
30
20
10
0

Dessert
Main
Entre

Male
Female
Gender

Males prefer the main meal the most, whereas females
prefer dessert the most.

Marginal distribution: Apple = 0.63 Samsung = 0.37
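The marginal distribution uses only the row totals and the grand total. The sketch below is an illustration under the assumption that the counts are those of question 1 (with 7 males preferring Samsung, from 22 − 15); the function name `marginal_distribution` is hypothetical.

```python
def marginal_distribution(table):
    """Proportion of the grand total that falls in each row of the table."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    grand_total = sum(row_totals.values())
    return {row: total / grand_total for row, total in row_totals.items()}


# Counts assumed from question 1 (one inner dict per row).
phone_rows = {
    "Apple": {"Female": 23, "Male": 15},
    "Samsung": {"Female": 15, "Male": 7},
}

marginal = marginal_distribution(phone_rows)
print({row: round(p, 2) for row, p in marginal.items()})
# {'Apple': 0.63, 'Samsung': 0.37}
```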

6
Choice            <40    40+   Total
Mortgage           42     18      60
Superannuation     30     51      81
Investment         21     33      54
Total              93    102     195

Choice            <40 (%)   40+ (%)
Mortgage             45        18
Superannuation       32        50
Investment           23        32
Total               100       100

v 41
b i 12   ii 9   iii 21   iv 42   v 33
c i 47%   ii 58%   iii 100%   iv 100%

10
Preference            Men    Women
Rent by themselves    57%     59%
Share with friends    43%     41%
Total                100%    100%

11 D
12 C
13

Attitude
For

35.5

Against

64.5

14
x

7
Attitude   Female   Male   Total
For           37      79     116
Against      102      23     125
Total        139     102     241

45 minutes
1 hour
Total

Junior

100
90
80
70
60
50
40
30
20
10
0

100.0
Attitude
For
Against

Liberal
Labor
Delegates

There is not a lot of difference in the reactions.


15 C

Senior

Total

33

83

17 Conditional distribution:
Large town and low waste = 0.15
Large town and medium waste = 0.19
Large town and high waste = 0.67
Note: rounding causes the total to be greater than 100%.
18 26.32%

50

Lesson length

100.0

60.6


Marginal distribution: For = 0.48, Against = 0.52

39.4

40+

Labor


Total

<40

Liberal

Investment
Superannuation
Mortgage


Women

The under 40s have a focus on their mortgage, whereas
the 40 and overs prioritise their superannuation.

36

60

96

86

93

179

Conditional distribution:
Senior students prefer 45 mins = 0.35,
Senior students prefer an hour = 0.65


iv 45

iii 19

Men

100
90
80
70
60
50
40
30
20
10
0

ii 26

Preference

Choice

9 a i 22

Percentage


16 A
