Chapter 1 Exam Review - Graphical Displays of Data SOLUTIONS
1 Exam Review MDM4U Jensen
Section 1.2 – Displaying Categorical Data
1) A researcher asked 150 high school students what their favourite fast food restaurant was. The results are in the table below:
Restaurant     Number of Students   Relative Frequency
McDonald’s     22                   14.7%
Wendy’s        38                   25.3%
Subway         22                   14.7%
Harvey’s       11                   7.3%
Pizza Pizza    29                   19.3%
A&W            6                    4%
KFC            9                    6%
Other          13                   8.7%
a) What type of variable is ‘favourite fast food restaurant’?
Categorical, nominal
b) Would it be more appropriate to make a histogram or bar graph to display this data?
Bar graphs are more appropriate for categorical data.
c) Complete the relative frequency column
d) Display the relative frequencies using a bar graph or histogram.
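The relative frequency column in question 1 is each count divided by the 150 students surveyed. A minimal Python sketch of that arithmetic follows; the variable names are illustrative and not part of the original worksheet.

```python
# Counts copied from the table in question 1.
counts = {
    "McDonald's": 22, "Wendy's": 38, "Subway": 22, "Harvey's": 11,
    "Pizza Pizza": 29, "A&W": 6, "KFC": 9, "Other": 13,
}

total = sum(counts.values())  # number of students surveyed

# Relative frequency = count / total, shown as a percentage to 1 decimal place.
relative_frequency = {
    name: round(100 * n / total, 1) for name, n in counts.items()
}

for name, pct in relative_frequency.items():
    print(f"{name:12s} {counts[name]:3d} {pct:5.1f}%")
```

The percentages only sum to roughly 100% because each one is rounded separately.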
2) A student is interested in whether there is a relationship between gender and major at her college. She randomly sampled some men and women on campus and asked them if their major was part of the natural sciences (NS), social sciences (SS), or humanities (H). Her results appear in the table below.
                        Major
Gender    NS            SS          H             Total
Women     15 = 27.3%    22 = 40%    18 = 32.7%    55
Men       13 = 52%      8 = 32%     4 = 16%       25
Total     28            30          22            80
a) Complete the totals
b) Determine if major depends on gender by calculating the conditional distribution of major based on gender (row percentages).
c) Use your conditional distribution to describe the relationship between the variables.
Based on the conditional distribution of major based on gender, it appears that major does depend on gender. A higher percentage of males are enrolled in natural sciences, while there is a higher percentage of females in social sciences and humanities.
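The row percentages in question 2 come from dividing each cell by its row total. The following Python sketch reproduces them under that assumption; the dictionary layout and names are illustrative only.

```python
# Two-way table of counts from question 2 (rows: gender, columns: major).
table = {
    "Women": {"NS": 15, "SS": 22, "H": 18},
    "Men": {"NS": 13, "SS": 8, "H": 4},
}

# Conditional distribution of major given gender: cell / row total.
row_percentages = {}
for gender, row in table.items():
    row_total = sum(row.values())
    row_percentages[gender] = {
        major: round(100 * n / row_total, 1) for major, n in row.items()
    }

# Column totals for the "Total" row of the table.
column_totals = {
    major: sum(table[gender][major] for gender in table)
    for major in ("NS", "SS", "H")
}

for gender, pcts in row_percentages.items():
    print(gender, pcts)
print("Totals", column_totals)
```

If major did not depend on gender, the two rows of percentages would be roughly the same; here they differ noticeably (52% of men versus 27.3% of women in NS).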
Section 1.3 – Displaying Quantitative Data
3) The number of goals by Jaromir Jagr in each of his 21 NHL seasons is recorded below:
27, 32, 34, 32, 32, 72, 47, 35, 44, 42, 52, 31, 36, 31, 54, 30, 25, 19, 16, 24, 17
4) The heights of the 2013 Toronto Raptors (in centimeters) are listed below:
201, 183, 191, 211, 201, 201, 203, 213, 206, 206, 183, 208, 198, 198, 211
a) Determine the range of the data.
Range = 213 − 183 = 30
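b) Determine an appropriate bin width that will divide the data into 7 intervals.
Bin width = rounded range / number of intervals = 35 / 7 = 5
(The range of 30 is rounded up to 35, the next multiple of 7, so the bin width is a whole number.)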
c) Create a frequency table for the data
Starting point = 183 − (35 − 30)/2 = 183 − 2.5 = 180.5

Height Interval    Frequency
180.5 – 185.5      2
185.5 – 190.5      0
190.5 – 195.5      1
195.5 – 200.5      2
200.5 – 205.5      4
205.5 – 210.5      3
210.5 – 215.5      3
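The range, bin width, starting point, and frequency counts in question 4 can be checked with a short Python sketch. It assumes the range is rounded up to the next multiple of the number of intervals, which matches the worksheet's 30 → 35 but is not stated there as a general rule; all names are illustrative.

```python
import math

# Heights (cm) from question 4.
heights = [201, 183, 191, 211, 201, 201, 203, 213, 206, 206, 183, 208, 198, 198, 211]
num_intervals = 7

data_range = max(heights) - min(heights)

# Assumption: round the range up to the next multiple of the interval count.
rounded_range = math.ceil(data_range / num_intervals) * num_intervals
bin_width = rounded_range // num_intervals

# Split the extra width evenly so the bins are centred on the data.
start = min(heights) - (rounded_range - data_range) / 2

frequencies = []
for i in range(num_intervals):
    lower = start + i * bin_width
    upper = lower + bin_width
    # Boundaries end in .5, so no whole-number height can fall on an edge.
    count = sum(1 for h in heights if lower <= h < upper)
    frequencies.append(count)
    print(f"{lower} - {upper}: {count}")
```

Starting at 180.5 rather than 183 keeps every height strictly inside an interval, so there is no ambiguity about which bin a value belongs to.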
d) Create a histogram of the data
Section 1.5 – Linear Regression Using Technology
5) Two variables have a correlation coefficient of r = 0.9. This indicates
6) If two variables have no correlation, their correlation coefficient would have a value of
a. 1 c. 100
b. 1 d. 0
7) Two variables have a coefficient of determination of 0.64. The correlation coefficient could be
a. 0.64 c. -0.8
b. 0.41 d. 0.36
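For question 7, the coefficient of determination is the square of the correlation coefficient, so r can be recovered only up to its sign. A minimal Python sketch of that step, with illustrative variable names:

```python
import math

r_squared = 0.64  # coefficient of determination given in question 7

# r^2 = 0.64 means |r| = sqrt(0.64); the sign of r cannot be determined from r^2.
magnitude = math.sqrt(r_squared)
possible_r = (magnitude, -magnitude)

print(possible_r)
```

Of the listed options, only -0.8 has a magnitude of 0.8.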
8) A relationship in which all data values lie on the regression line has a correlation coefficient of
a. 1 c. 1
b. 0 d. +1 or -1
9) The regression line shown would have a correlation coefficient closest to
a. 1 c. -1
b. 0.5 d. 0
11) If a set of data has a very strong correlation, the residual values will be
16) What type of linear correlation is represented when the correlation coefficient is -0.7?
a. Strong negative
b. Moderate negative
c. Weak negative
d. No correlation
17) What type of correlation is represented when the correlation coefficient is 0.41?
a. Strong positive
b. Moderate positive
c. Weak positive
d. No correlation
18) This table shows the data for the full-time employees of a small company.

Age (year)                      33       25       19       44       50        54       38       29
Annual Income (in thousands)    33       31       18       52       56        60       44       35
Residuals                       -4.099   3.1044   -2.993   2.2472   -0.6551   -1.257   1.1494   2.5028
a) Construct a scatterplot using your calculator
b) Find the equation of the regression line and interpret the slope and y-intercept in context.
Regression equation: predicted income = −0.864 + 1.150(age)
Slope = 1.15; for every 1 year increase in age, the model predicts a $1150 increase in annual income.
y-intercept = −0.864; at age 0, the model predicts an annual income of −$864.
c) Find and interpret the correlation coefficient, r.
r = 0.98; this tells us there is a strong, positive, linear correlation between age and annual income.
d) Find the coefficient of determination, r². Interpret it in the context of this data.
r² = 0.965; this tells us that about 96.5% of the variation in annual income can be explained by its approximate linear relationship with age.
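The worksheet obtains the regression line, r, and r² from a calculator. As a cross-check, here is a minimal Python sketch using the standard least-squares formulas; this is an assumed equivalent of the calculator routine, and the variable names are illustrative.

```python
import math

# Data from question 18: age in years, annual income in thousands of dollars.
ages = [33, 25, 19, 44, 50, 54, 38, 29]
incomes = [33, 31, 18, 52, 56, 60, 44, 35]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(incomes) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in ages)
syy = sum((y - mean_y) ** 2 for y in incomes)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, incomes))

slope = sxy / sxx                      # change in predicted income per year of age
intercept = mean_y - slope * mean_x    # predicted income at age 0
r = sxy / math.sqrt(sxx * syy)         # correlation coefficient
r_squared = r ** 2                     # coefficient of determination

print(round(intercept, 3), round(slope, 3), round(r, 2), round(r_squared, 3))
```

Because income is recorded in thousands, a slope of about 1.15 corresponds to roughly $1150 per year of age.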
e) Calculate the residual values, record them and analyze them using the residual plot to help. Is a linear model a good fit?
There is no distinguishable pattern in the residual plot and the residual values are relatively small. This tells us that the linear regression is a good model for the data.
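Each residual is the observed income minus the income predicted by the regression line. The Python sketch below recomputes them from the unrounded least-squares fit, which is why they match the tabulated values more closely than the rounded equation would; the names are illustrative.

```python
# Data from question 18: age in years, annual income in thousands of dollars.
ages = [33, 25, 19, 44, 50, 54, 38, 29]
incomes = [33, 31, 18, 52, 56, 60, 44, 35]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(incomes) / n

# Least-squares slope and intercept, kept at full precision.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, incomes)) / sum(
    (x - mean_x) ** 2 for x in ages
)
intercept = mean_y - slope * mean_x

# Residual = observed value - predicted value.
residuals = [y - (intercept + slope * x) for x, y in zip(ages, incomes)]

for x, res in zip(ages, residuals):
    print(f"age {x}: residual {res:.4f}")
```

The residuals of a least-squares line sum to zero, and here they alternate in sign with no visible trend across ages, which is consistent with a linear model being a reasonable fit.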
f) Using the linear regression equation, what would you predict the annual income of a 40 year old to be?
predicted income = −0.864 + 1.150(40) = 45.136
The predicted annual income of a 40 year old is $45 136