Module 2 Examining Relationship Quiz Assignment
2.2 Price versus size. You visit a local Starbucks to buy a Mocha Frappuccino©. The barista
explains that this blended coffee beverage comes in three sizes and asks if you want a Tall, a
Grande, or a Venti. The prices are $3.50, $4.00, and $4.50, respectively.
(a) What are the variables and cases?
The cases are the three drinks: Tall, Grande, and Venti. The variables are the size of the drink and its price.
(b) Which variable is the explanatory variable? Which is the response variable?
Explain your answers.
The explanatory variable is the size of the beverage and the response variable is the
price ($3.50, $4.00, or $4.50), because the size you choose determines the price you pay,
not the other way around.
(c) The Tall contains 12 ounces of beverage, the Grande contains 16 ounces, and the
Venti contains 20 ounces. Answer parts (a) and (b) with ounces in place of the
names for the sizes.
Botnet     Bots (thousands)   Spams per day (billions)
Srizbi     315                60
Bobax      185                9
Rustock    150                30
Cutwail    125                16
Storm      85                 3
Grum       50                 2
Ozdok      35                 10
Nucrypt    20                 5
Wopla      20                 1
Spamthru   12                 0
(c) Describe the labels, variables, and values that you used.
The labels are the botnet names. The variables are (a) the number of bots and (b) the number of
spam messages per day. A botnet is a collection of networked computers that is controlled remotely
and silently and used to send unwanted commercial email, called spam. A bot is a software
application that is programmed to carry out certain tasks automatically. Spam is unsolicited junk
email that is sent in bulk to users over the internet. The values are the corresponding quantities
for each botnet: bots in thousands and spam messages per day in billions.
2.13 Is the cost too high? Because it is so costly, many individuals and families cannot afford
to purchase health insurance. The Current Population Survey collected data on the
characteristics of the uninsured. Below are the numbers of uninsured and the total number of
people classified by age. The units are thousands of people.
(b) Find the total number of uninsured persons, and use this total to compute the percent of the
uninsured who are in each age group.
18 to 24 years 29.30%
25 to 34 years 26.87%
35 to 44 years 18.75%
45 to 64 years 14.19%
(d) Explain how the plot you produced in part (c) differs from
the plot that you made in part (a).
The plot in part (a) plots the numbers of uninsured and total number of population for
each age group, while in part (c), only the percentages of uninsured are being plotted.
2.15 Compare the two percents. In the previous two exercises, you computed percents in two
different ways and generated plots versus age group. Describe the difference between the two
ways with an emphasis on what kinds of conclusions can be drawn from each.
Plot (a) is more informative because it shows two variables, the total population and the
number of uninsured, for each age group. Plot (c) shows only the percentages, so it conveys
less information than the first plot.
2.25 Spam botnets. In Exercise 2.3 you made a data set for the botnet data. Use that data set
to compute the correlation between the number of bots and the number of spam messages per
day.
X Values
∑ = 997
Mean = 99.7
∑(X - Mx)² = SSx = 84068.1
Y Values
∑ = 136
Mean = 13.6
∑(Y - My)² = SSy = 3126.4
X and Y Combined
N = 10
∑(X - Mx)(Y - My) = 14330.8
R Calculation
r = ∑((X - Mx)(Y - My)) / √((SSx)(SSy))
r = 14330.8 / √((84068.1)(3126.4)) = 0.884
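The sums of squares and the correlation above can be reproduced with a short script. This is a minimal sketch that takes the bots (thousands) and spam (billions) columns from the botnet table; the function name `pearson_r` is an illustrative choice, not something defined in the exercise.

```python
from math import sqrt

# Botnet data from the table above: bots in thousands, spam per day in billions.
bots = [315, 185, 150, 125, 85, 50, 35, 20, 20, 12]
spam = [60, 9, 30, 16, 3, 2, 10, 5, 1, 0]


def pearson_r(x, y):
    """Return (SSx, SSy, SP, r) for two equal-length lists of numbers."""
    n = len(x)
    mx = sum(x) / n  # mean of X (Mx)
    my = sum(y) / n  # mean of Y (My)
    ss_x = sum((xi - mx) ** 2 for xi in x)  # sum of squared deviations of X
    ss_y = sum((yi - my) ** 2 for yi in y)  # sum of squared deviations of Y
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # sum of cross-products
    return ss_x, ss_y, sp, sp / sqrt(ss_x * ss_y)


ss_x, ss_y, sp, r = pearson_r(bots, spam)
print(f"SSx = {ss_x:.1f}, SSy = {ss_y:.1f}, SP = {sp:.1f}, r = {r:.3f}")
```

The printed SSx, SSy, and cross-product sum should agree with the hand calculation, and r comes out at about 0.884.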
2.26 Change the units. In the previous exercise bots were given in thousands and spam messages
per day were recorded in billions. In Exercise 2.6 you created a data set using the actual values. For
example, Srizbi has 315,000 bots and generates 60,000,000,000 spam messages per day.
(a) Find the correlation between bots and spam messages using this data set.
X Values
∑ = 997000
Mean = 99700
∑(X - Mx)² = SSx = 84068100000
Y Values
∑ = 135950000000
Mean = 13595000000
∑(Y - My)² = SSy = 3.12724225E+21
X and Y Combined
N = 10
∑(X - Mx)(Y - My) = 1.4331985E+16
R Calculation
r = ∑((X - Mx)(Y - My)) / √((SSx)(SSy))
r = 1.4331985E+16 / √((84068100000)(3.12724225E+21)) = 0.884
(b) Compare this correlation with the one that you computed in the previous exercise.
The two correlations are the same. The only difference is that the computation is longer
because the values are much larger.
(c) What can you say in general about the effect of changing units in this way on the
size of the correlation?
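In general, changing the units of measurement in this way does not change the correlation,
because r is computed from standardized values. The first data set is simply quicker to work with.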
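One way to see what a change of units does to r is to compute the correlation twice, once on the table values and once after rescaling. The sketch below assumes the actual counts are the rounded table values multiplied by 1,000 (bots) and 1,000,000,000 (spam messages); the data set from Exercise 2.6 may differ slightly from these rounded figures.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sp / sqrt(ss_x * ss_y)


bots_thousands = [315, 185, 150, 125, 85, 50, 35, 20, 20, 12]
spam_billions = [60, 9, 30, 16, 3, 2, 10, 5, 1, 0]

# Assumption: actual counts are the rounded table values times their unit.
bots_actual = [b * 1_000 for b in bots_thousands]
spam_actual = [s * 1_000_000_000 for s in spam_billions]

r_small = pearson_r(bots_thousands, spam_billions)
r_large = pearson_r(bots_actual, spam_actual)
print(f"table units: r = {r_small:.4f}, actual counts: r = {r_large:.4f}")
```

Multiplying every x by one positive constant and every y by another scales the numerator and the denominator of r by the same factor, so the two printed values agree.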
2.27 Correlation for debt. Figure 2.6 (page 84) is a scatterplot of 2007 debt versus 2006 debt for 24
countries. Is the correlation r for these data near −1, clearly negative but not near −1, near 0, clearly positive
but not near 1, or near 1? Explain your answer.
X Values
∑ = 10400
Mean = 2080
∑(X - Mx)² = SSx = 9920096
Y Values
∑ = 11316
Mean = 2263.2
∑(Y - My)² = SSy = 10253300.8
X and Y Combined
N = 5
∑(X - Mx)(Y - My) = 10074160
R Calculation
r = ∑((X - Mx)(Y - My)) / √((SSx)(SSy))
r = 10074160 / √((9920096)(10253300.8)) = 0.9989
r = 0.9989
x 20 30 40 50 60
y 10 30 50 30 10
(a) Make a scatterplot of y versus x.
X Values
∑ = 200
Mean = 40
∑(X - Mx)² = SSx = 1000
Y Values
∑ = 130
Mean = 26
∑(Y - My)² = SSy = 1120
X and Y Combined
N = 5
∑(X - Mx)(Y - My) = 0
R Calculation
r = ∑((X - Mx)(Y - My)) / √((SSx)(SSy))
r = 0 / √((1000)(1120)) = 0
(d) What important point about correlation does this exercise illustrate?
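This exercise illustrates that correlation measures only the strength of a linear relationship.
Here r = 0 even though y is clearly related to x: y rises as x goes from 20 to 40 and then falls,
so the scatterplot shows a strong curved pattern that r does not detect.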
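The r = 0 result for these five points can be checked directly. The sketch below also notes that, for this particular data set, y happens to equal 50 - 2|x - 40| at every point; that formula is an observation about the five listed pairs, not something stated in the exercise.

```python
from math import sqrt

x = [20, 30, 40, 50, 60]
y = [10, 30, 50, 30, 10]

n = len(x)
mx = sum(x) / n  # 40
my = sum(y) / n  # 26
ss_x = sum((xi - mx) ** 2 for xi in x)
ss_y = sum((yi - my) ** 2 for yi in y)
sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
r = sp / sqrt(ss_x * ss_y)

# y is completely determined by x through a non-linear (tent-shaped) rule,
# yet the positive and negative cross-products cancel exactly.
follows_rule = all(yi == 50 - 2 * abs(xi - 40) for xi, yi in zip(x, y))
print(f"r = {r:.4f}, y follows 50 - 2|x - 40|: {follows_rule}")
```

The cross-products on the rising side of the pattern cancel those on the falling side, which is why the sum in the numerator is zero.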
2.32 First test and final exam. How strong is the relationship between the score on the first test
and the score on the final exam in an elementary statistics course? Here are data for eight students
from such a course:
First-test score 153 144 162 149 127 158 158 153
Final-exam score 145 140 145 170 145 175 170 160
(a) Do you think that one of these variables should be an explanatory variable and the other a
response variable? Give reasons for your answer.
Yes, that's possible. One explanation for the final scores being higher or lower than the first
score is that the students carried their knowledge of the subjects covered on the initial test over to
the final exam. Thus, the final exam results are influenced by how well they understood the subjects
covered on the first test. In this case, the explanatory variable could be the first-test score and the
response variable is the final-exam score.
Although the correlation is technically positive, the relationship between the variables is weak:
the nearer r is to zero, the weaker the linear relationship.
Y Values
∑ = 1250
Mean = 156.25
∑(Y - My)² = SSy = 1387.5
X and Y Combined
N = 8
∑(X - Mx)(Y - My) = 445
R Calculation
r = ∑((X - Mx)(Y - My)) / √((SSx)(SSy))
2.33 Second test and final exam. Refer to the previous exercise. Here are the data for the second
test and the final exam for the same students.
Second-test score 158 162 144 162 136 158 175 153
Final-exam score 145 140 145 170 145 175 170 160
(a) Explain why you should use the second-test score as the explanatory variable.
The second-test score is the explanatory variable because it affects or causes the results in
the final-exam score.
The scatterplot shows a weak to moderate positive relationship between the variables.
Y Values
∑ = 1250
Mean = 156.25
∑(Y - My)² = SSy = 1387.5
X and Y Combined
N = 8
∑(X - Mx)(Y - My) = 610
R Calculation
r = ∑((X - Mx)(Y - My)) / √((SSx)(SSy))
(d) Why do you think the relationship between the second-test score and the final-exam score is
stronger than the relationship between the first-test score and the final-exam score?
The relationship between the second-test score and the final-exam score is stronger, possibly
because the second test is taken closer in time to the final exam than the first test is, so it
better reflects what the students know at the end of the course.
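The comparison in part (d) can be checked numerically. The sketch below computes r for each test against the final exam, using the eight students' scores listed in Exercises 2.32 and 2.33; the helper name `pearson_r` is an illustrative choice.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sp / sqrt(ss_x * ss_y)


first_test = [153, 144, 162, 149, 127, 158, 158, 153]
second_test = [158, 162, 144, 162, 136, 158, 175, 153]
final_exam = [145, 140, 145, 170, 145, 175, 170, 160]

r_first = pearson_r(first_test, final_exam)
r_second = pearson_r(second_test, final_exam)
print(f"first test vs final:  r = {r_first:.4f}")
print(f"second test vs final: r = {r_second:.4f}")
```

With these scores r is about 0.409 for the first test and about 0.519 for the second, so the second test is the somewhat better linear predictor of the final exam, although neither relationship is strong.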
2.34 Add an outlier. Refer to the previous exercise. Add a ninth student whose scores on the
second test and final exam would lead you to classify the additional data point as an outlier.
(b) Find the correlation and describe the effect of the outlier on the correlation.
X Values
∑ = 1438
Mean = 159.778
∑(X - Mx)² = SSx = 2021.556
Y Values
∑ = 1445
Mean = 160.556
∑(Y - My)² = SSy = 2722.222
X and Y Combined
N = 9
∑(X - Mx)(Y - My) = 1781.111
R Calculation
r = ∑((X - Mx)(Y - My)) / √((SSx)(SSy))
r = 1781.111 / √((2021.556)(2722.222)) = 0.7593
This is a strong positive correlation, which means that high X variable scores go with high Y variable
scores (and vice versa).
(c) Describe the performance of the student on the second exam and final exam and why that leads
to the conclusion that the result is an outlier. Give a possible reason for the performance of this
student.
The ninth student scored very high on both exams, well above the other eight students, which
makes this point an outlier. Because it lies far from the other points but in the direction of the
overall trend, it pulls the correlation up from r = 0.5194 to r = 0.7593.
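The effect of the added point can be reproduced as follows. The ninth student's scores are not listed in the answer; the values 190 (second test) and 195 (final exam) used here are inferred from the nine-student sums ∑X = 1438 and ∑Y = 1445 minus the eight-student totals, so treat them as an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sp / sqrt(ss_x * ss_y)


second_test = [158, 162, 144, 162, 136, 158, 175, 153]
final_exam = [145, 140, 145, 170, 145, 175, 170, 160]

# Assumed ninth student, inferred from the sums 1438 and 1445.
outlier_x, outlier_y = 190, 195

r_without = pearson_r(second_test, final_exam)
r_with = pearson_r(second_test + [outlier_x], final_exam + [outlier_y])
print(f"8 students: r = {r_without:.4f}")
print(f"9 students: r = {r_with:.4f}")
```

A single point that is extreme in both x and y and lies along the trend can raise r noticeably, here from about 0.519 to about 0.759.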
(c) Find the predicted values of y for x = 10, for x = 20, and for x = 30.
x = 10
y = 10 + 20 (x)
y = 10 + 20 (10)
y = 10 + 200
y = 210
x = 20
y = 10 + 20 (x)
y = 10 + 20 (20)
y = 10 + 400
y = 410
x = 30
y = 10 + 20 (x)
y = 10 + 20 (30)
y = 10 + 600
y = 610
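The three substitutions above can be written as one small function. This sketch assumes the regression line y = 10 + 20x used in the worked steps; the function name `predict` is illustrative.

```python
def predict(x, intercept=10, slope=20):
    """Predicted y on the line y = intercept + slope * x."""
    return intercept + slope * x


# Predicted values for the three x-values asked for in part (c).
predictions = {x: predict(x) for x in (10, 20, 30)}
for x, y_hat in predictions.items():
    print(f"x = {x}: y = {y_hat}")
```

The same function can be evaluated at x = 0 and x = 50 to get the two endpoints needed to draw the line in part (d).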
(d) Plot the regression line for values of x between 0 and 50.
2.59 The “January effect.” Some people think that the behavior of the stock market in January
predicts its behavior for the rest of the year. Take the explanatory variable x to be the percent
change in a stock market index in January and the response variable y to be the change in the index
for the entire year. We expect a positive correlation between x and y because the change during
January contributes to the full year’s change. Calculation based on 38 years of data gives
(b) What is the equation of the least-squares line for predicting full-year change from January
change?
b = r × s_y / s_x
b = 1.707
ŷ = 6.08% + 1.707x
(c) The mean change in January is x̄ = 1.75%. Use your regression line to predict change in the
index in a year in which the index rises 1.75% in January. Why could you have given this result (up
to round off error) without doing the calculation?
ŷ = 6.08% + 1.707(1.75%) = 9.07%
The least-squares regression line passes through the point (x̄, ȳ). Thus, we would predict
ŷ = ȳ = 9.07% when x = x̄ = 1.75%.
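The reason no calculation is needed can be seen from how the intercept is built. The sketch below takes the slope 1.707 and the means 1.75% and 9.07% from the answer above and assumes the usual least-squares intercept a = ȳ - b·x̄; the variable names are illustrative.

```python
# Values taken from the answer above (all in percent).
b = 1.707      # slope of the least-squares line
x_bar = 1.75   # mean January change
y_bar = 9.07   # mean full-year change

# Least-squares intercept: chosen so the line passes through (x_bar, y_bar).
a = y_bar - b * x_bar

# Prediction at the mean of x simply returns the mean of y.
y_hat_at_mean = a + b * x_bar
print(f"intercept a = {a:.2f}%")
print(f"prediction at x = {x_bar}%: {y_hat_at_mean:.2f}%")
```

Because a is defined as ȳ - b·x̄, substituting x = x̄ cancels the slope term and leaves ȳ, up to rounding in the reported coefficients.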