Module 2 Examining Relationship Quiz Assignment

Jeffer Sol E. Cortado
BSA - 3rd Year
STT041

APPLY YOUR KNOWLEDGE

2.2 Price versus size. You visit a local Starbucks to buy a Mocha Frappuccino©. The barista
explains that this blended coffee beverage comes in three sizes and asks if you want a Tall, a
Grande, or a Venti. The prices are $3.50, $4.00, and $4.50, respectively.
(a) What are the variables and cases?
The variables are the size of the beverage and its price. The cases are the three drinks:
Tall, Grande, and Venti.

(b) Which variable is the explanatory variable? Which is the response variable?
Explain your answers.
The explanatory variable is the size (Tall, Grande, or Venti) and the response variable is
the price ($3.50, $4.00, or $4.50), because the size you choose determines the price you pay,
not the other way around.

(c) The Tall contains 12 ounces of beverage, the Grande contains 16 ounces, and the
Venti contains 20 ounces. Answer parts (a) and (b) with ounces in place of the
names for the sizes.

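Coffee Beverage    Price    Ounces
Tall               $3.50    12 oz.
Grande             $4.00    16 oz.
Venti              $4.50    20 oz.

The cases are still the three drinks. The variables are now the size in ounces and the price.
Size in ounces is the explanatory variable and price is the response variable.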

2.3 Make a data set.


(a) Create a spreadsheet that contains the spam botnet data.

BOTNET      BOTS (thousands)    SPAMS PER DAY (billions)

Srizbi      315                 60
Bobax       185                 9
Rustock     150                 30
Cutwail     125                 16
Storm       85                  3
Grum        50                  2
Ozdok       35                  10
Nucrypt     20                  5
Wopla       20                  1
Spamthru    12                  0

(b) How many cases are in your data set?


There are ten cases in the data set, one for each botnet: Srizbi, Bobax, Rustock, Cutwail,
Storm, Grum, Ozdok, Nucrypt, Wopla, and Spamthru.

(c) Describe the labels, variables, and values that you used.
The label is the botnet name. The variables are (a) bots, in thousands, and (b) spams per day,
in billions. A botnet is a collection of networked computers that is remotely and silently
controlled and used to send unwanted commercial email, called spam. A bot is a software
application programmed to carry out certain tasks automatically. Spam is unsolicited junk email
sent out in bulk to users over the internet. The values are the respective quantities recorded
for each botnet.

(d) Which columns give quantitative variables?


The columns BOTS and SPAMS PER DAY give quantitative variables.

2.5 Make a scatterplot.


(a) Make a scatterplot similar to Figure 2.1 for the spam botnet data.

(b) Mark the location of the botnet Bobax on your plot.


The green arrow points to Bobax.

2.6 Change the units.
(a) Create a spreadsheet with the spam botnet data using the actual values. In other
words, for Srizbi use 315,000 for the number of bots and 60,000,000,000 for the
number of spam messages per day.

BOTNET      BOTS       SPAMS PER DAY

Srizbi      315,000    60,000,000,000
Bobax       185,000    9,000,000,000
Rustock     150,000    30,000,000,000
Cutwail     125,000    16,000,000,000
Storm       85,000     3,000,000,000
Grum        50,000     2,000,000,000
Ozdok       35,000     10,000,000,000
Nucrypt     20,000     5,000,000,000
Wopla       20,000     600,000,000
Spamthru    12,000     350,000,000

(b) Make a scatterplot for the data coded in this way.

(c) Describe how this scatterplot differs from Figure 2.1.


The pattern of points is the same in both scatterplots; only the axis scales differ. Figure 2.1
is much easier to read because its axes cover a narrow range of small numbers, while the second
plot uses much larger numbers (up to 60,000,000,000), which makes the values harder to estimate
and plot.

2.13 Is the cost too high? Because it is so costly, many individuals and families cannot afford
to purchase health insurance. The Current Population Survey collected data on the
characteristics of the uninsured. Below are the numbers of uninsured and the total number of
people classified by age. The units are thousands of people.

(a) Plot the number of uninsured versus age group.

(b) Find the total number of uninsured persons, and use this total to compute the percent of the
uninsured who are in each age group.

Age Group             Percentage of Uninsured

Under 18 years        11.69%
18 to 24 years        29.30%
25 to 34 years        26.87%
35 to 44 years        18.75%
45 to 64 years        14.19%
65 years and older    1.50%

(c) Plot the percentages versus age group.

(d) Explain how the plot you produced in part (c) differs from
the plot that you made in part (a).
The plot in part (a) shows the number of uninsured and the total population for each age
group, while the plot in part (c) shows only the percentages of uninsured.

(e) Summarize what you can conclude from these plots.


The plot in part (a) is useful for readers who want to compare the number of uninsured in each
age group with the total number of people in that group. The plot in part (c), on the other
hand, is useful for readers who only want to see the percentages of uninsured.

2.15 Compare the two percents. In the previous two exercises, you computed percents in two
different ways and generated plots versus age group. Describe the difference between the two
ways with an emphasis on what kinds of conclusions can be drawn from each.
Plot (a) is more informative because it shows two variables, the total population and the
number of uninsured. Plot (c) shows only the percentages, so it gives a briefer summary than
the first plot.

2.25 Spam botnets. In Exercise 2.3 you made a data set for the botnet data. Use that data set
to compute the correlation between the number of bots and the number of spam messages per
day.

X Values
∑ = 997
Mean = 99.7
∑(X − Mx)² = SSx = 84068.1

Y Values
∑ = 136
Mean = 13.6
∑(Y − My)² = SSy = 3126.4

X and Y Combined
N = 10
∑(X − Mx)(Y − My) = 14330.8

R Calculation
r = ∑((X − Mx)(Y − My)) / √((SSx)(SSy))

r = 14330.8 / √((84068.1)(3126.4)) = 0.884
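
The steps above can be reproduced with a short script. This is only a sketch: the helper name `correlation` is an illustrative choice, and the data are the values from the Exercise 2.3 table (bots in thousands, spam per day in billions).

```python
from math import sqrt

# Data from the Exercise 2.3 table.
bots = [315, 185, 150, 125, 85, 50, 35, 20, 20, 12]   # thousands
spam = [60, 9, 30, 16, 3, 2, 10, 5, 1, 0]             # billions per day


def correlation(x, y):
    """Correlation r computed from SSx, SSy and the sum of cross products."""
    n = len(x)
    mx = sum(x) / n                       # Mx, the mean of X
    my = sum(y) / n                       # My, the mean of Y
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return s_xy / sqrt(ss_x * ss_y)


r = correlation(bots, spam)
print(round(r, 3))
```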

2.26 Change the units. In the previous exercise bots were given in thousands and spam messages
per day were recorded in billions. In Exercise 2.6 you created a data set using the actual values. For
example, Srizbi has 315,000 bots and generates 60,000,000,000 spam messages per day.

(a) Find the correlation between bots and spam messages using this data set.

X Values
∑ = 997000
Mean = 99700
∑(X − Mx)² = SSx = 84068100000

Y Values
∑ = 135950000000
Mean = 13595000000
∑(Y − My)² = SSy = 3.12724225E+21

X and Y Combined
N = 10
∑(X − Mx)(Y − My) = 1.4331985E+16

R Calculation
r = ∑((X − Mx)(Y − My)) / √((SSx)(SSy))

r = 1.4331985E+16 / √((84068100000)(3.12724225E+21)) = 0.8839

(b) Compare this correlation with the one that you computed in the previous exercise.
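The two correlations are the same. The small difference in the last digit (0.884 versus 0.8839)
comes from the first data set rounding the Wopla and Spamthru spam counts to 1 and 0 billion.
The only other difference is that the computation involves much larger numbers.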

(c) What can you say in general about the effect of changing units in this way on the
size of the correlation?
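Changing the units of measurement does not change the correlation, because r is computed from
standardized values of x and y. I would still choose the first way of computing, since the
smaller numbers take less time to work with.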
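
As a quick check on the effect of units, the sketch below computes r twice from the Exercise 2.6 values, once in thousands and billions and once in actual counts. The helper `correlation` and the variable names are illustrative assumptions, not part of the exercise.

```python
from math import isclose, sqrt


def correlation(x, y):
    """Correlation r computed from SSx, SSy and the sum of cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return s_xy / sqrt(ss_x * ss_y)


# Exercise 2.6 values expressed in thousands of bots and billions of messages
# (for example, 600,000,000 messages = 0.6 billion).
bots_thousands = [315, 185, 150, 125, 85, 50, 35, 20, 20, 12]
spam_billions = [60, 9, 30, 16, 3, 2, 10, 5, 0.6, 0.35]

# The same data in actual counts.
bots_actual = [b * 1_000 for b in bots_thousands]
spam_actual = [s * 1_000_000_000 for s in spam_billions]

r_scaled = correlation(bots_thousands, spam_billions)
r_actual = correlation(bots_actual, spam_actual)

print(round(r_scaled, 4), round(r_actual, 4))
print(isclose(r_scaled, r_actual))  # True: rescaling the units leaves r unchanged
```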

2.27 Correlation for debt. Figure 2.6 (page 84) is a scatterplot of 2007 debt versus 2006 debt for 24
countries. Is the correlation r for these data near −1, clearly negative but not near −1, near 0, clearly positive
but not near 1, or near 1? Explain your answer.

X Values
∑ = 10400
Mean = 2080
∑(X − Mx)² = SSx = 9920096

Y Values
∑ = 11316
Mean = 2263.2
∑(Y − My)² = SSy = 10253300.8

X and Y Combined
N = 5
∑(X − Mx)(Y − My) = 10074160

R Calculation
r = ∑((X − Mx)(Y − My)) / √((SSx)(SSy))

r = 10074160 / √((9920096)(10253300.8)) = 0.9989

Thus, the correlation r is near 1.


2.29 Strong association but no correlation. Here is a data set that illustrates an important point
about correlation:

x 20 30 40 50 60
y 10 30 50 30 10
(a) Make a scatterplot of y versus x.

(b) Describe the relationship between y and x. Is it weak or strong? Is it linear?

The relationship is strong but not linear. As x increases from 20 to 40, y rises from 10 to 50,
and then y falls back to 10 as x increases to 60, so the points follow a clear curved pattern
rather than a straight line.

(c) Find the correlation between y and x.

X Values
∑ = 200
Mean = 40
∑(X − Mx)² = SSx = 1000

Y Values
∑ = 130
Mean = 26
∑(Y − My)² = SSy = 1120

X and Y Combined
N = 5
∑(X − Mx)(Y − My) = 0

R Calculation
r = ∑((X − Mx)(Y − My)) / √((SSx)(SSy))

r = 0 / √((1000)(1120)) = 0

(d) What important point about correlation does this exercise illustrate?

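The important point is that correlation measures only the strength of the linear relationship
between two quantitative variables. Here x and y have a strong curved relationship, yet r = 0,
so a correlation of 0 does not mean that there is no association.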
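
The zero correlation for this data set can be checked numerically. The sketch below uses the five (x, y) pairs from the exercise; the helper `correlation` is an illustrative name.

```python
from math import sqrt

x = [20, 30, 40, 50, 60]
y = [10, 30, 50, 30, 10]


def correlation(x, y):
    """Correlation r computed from SSx, SSy and the sum of cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return s_xy / sqrt(ss_x * ss_y)


r = correlation(x, y)
print(r)

# The cross products on the two sides of x = 40 cancel exactly,
# even though y clearly depends on x.
mx, my = sum(x) / len(x), sum(y) / len(y)
cross_products = [(xi - mx) * (yi - my) for xi, yi in zip(x, y)]
print(cross_products)
```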

2.32 First test and final exam. How strong is the relationship between the score on the first test
and the score on the final exam in an elementary statistics course? Here are data for eight students
from such a course:
First-test score 153 144 162 149 127 158 158 153
Final-exam score 145 140 145 170 145 175 170 160

(a) Do you think that one of these variables should be an explanatory variable and the other a
response variable? Give reasons for your answer.
Yes. Students carry their knowledge of the subjects covered on the first test over to the final
exam, so the final-exam results are influenced by how well they understood the material on the
first test. In this case, the explanatory variable is the first-test score and the response
variable is the final-exam score.

(b) Make a scatterplot and describe the relationship.

The scatterplot shows a weak positive relationship: students with higher first-test scores tend
to have somewhat higher final-exam scores, but the points are widely scattered.

(c) Find the correlation.


X Values
∑ = 1204
Mean = 150.5
∑(X − Mx)² = SSx = 854

Y Values
∑ = 1250
Mean = 156.25
∑(Y − My)² = SSy = 1387.5

X and Y Combined
N = 8
∑(X − Mx)(Y − My) = 445

R Calculation
r = ∑((X − Mx)(Y − My)) / √((SSx)(SSy))

r = 445 / √((854)(1387.5)) = 0.4088

(d) Give some possible reasons why this relationship is so weak.


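Possible reasons are that the first test is taken early in the course and may cover only part
of the material on the final exam, and that students may change how they study afterward. With
only eight students, one or two unusual scores also have a large effect on r.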

2.33 Second test and final exam. Refer to the previous exercise. Here are the data for the second
test and the final exam for the same students.

Second-test score 158 162 144 162 136 158 175 153
Final-exam score 145 140 145 170 145 175 170 160

(a) Explain why you should use the second-test score as the explanatory variable.
The second-test score is the explanatory variable because the second test is taken before the
final exam, so it can help explain or predict the final-exam score, not the other way around.

(b) Make a scatterplot and describe the relationship.

The scatterplot shows a weak to moderate positive relationship between the variables.

(c) Find the correlation.


X Values
∑ = 1248
Mean = 156
∑(X − Mx)² = SSx = 994

Y Values
∑ = 1250
Mean = 156.25
∑(Y − My)² = SSy = 1387.5

X and Y Combined
N = 8
∑(X − Mx)(Y − My) = 610

R Calculation
r = ∑((X − Mx)(Y − My)) / √((SSx)(SSy))

r = 610 / √((994)(1387.5)) = 0.5194
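
The two test-score correlations can be compared side by side. The sketch below uses the scores listed in Exercises 2.32 and 2.33; the helper `correlation` and the variable names are illustrative.

```python
from math import sqrt

first_test = [153, 144, 162, 149, 127, 158, 158, 153]
second_test = [158, 162, 144, 162, 136, 158, 175, 153]
final_exam = [145, 140, 145, 170, 145, 175, 170, 160]


def correlation(x, y):
    """Correlation r computed from SSx, SSy and the sum of cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return s_xy / sqrt(ss_x * ss_y)


r_first = correlation(first_test, final_exam)
r_second = correlation(second_test, final_exam)

print(f"first test vs final:  r = {r_first:.4f}")
print(f"second test vs final: r = {r_second:.4f}")
```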

(d) Why do you think the relationship between the second-test score and the final-exam score is
stronger than the relationship between the first-test score and the final-exam score?

The relationship between the second-test score and the final-exam score is stronger because
the second test is taken closer in time to the final exam than the first test is.

2.34 Add an outlier. Refer to the previous exercise. Add a ninth student whose scores on the
second test and final exam would lead you to classify the additional data point as an outlier.

(a) Highlight the outlier on your scatterplot.

(b) Find the correlation and describe the effect of the outlier on the correlation.
X Values
∑ = 1438
Mean = 159.778
∑(X − Mx)² = SSx = 2021.556

Y Values
∑ = 1445
Mean = 160.556
∑(Y − My)² = SSy = 2722.222

X and Y Combined
N = 9
∑(X − Mx)(Y − My) = 1781.111

R Calculation
r = ∑((X − Mx)(Y − My)) / √((SSx)(SSy))

r = 1781.111 / √((2021.556)(2722.222)) = 0.7593

This is a strong positive correlation, which means that high X variable scores go with high Y variable
scores (and vice versa).

(c) Describe the performance of the student on the second exam and final exam and why that leads
to the conclusion that the result is an outlier. Give a possible reason for the performance of this
student.
The ninth student got very high scores on both exams, well above those of the other eight
students, which is why the point is classified as an outlier. Because the point lies far from
the rest in the direction of the overall pattern, it increases the correlation from r = 0.5194
to r = 0.7593.
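
The effect of the added point can be illustrated with a short script. The ninth student's scores are not listed explicitly; the values 190 and 195 used below are an inference from the change in the column sums (1438 − 1248 and 1445 − 1250). The helper `correlation` is an illustrative name.

```python
from math import sqrt

second_test = [158, 162, 144, 162, 136, 158, 175, 153]
final_exam = [145, 140, 145, 170, 145, 175, 170, 160]

# Assumed outlier, inferred from the sums in part (b): 1438 - 1248 and 1445 - 1250.
outlier = (190, 195)


def correlation(x, y):
    """Correlation r computed from SSx, SSy and the sum of cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return s_xy / sqrt(ss_x * ss_y)


r_without = correlation(second_test, final_exam)
r_with = correlation(second_test + [outlier[0]], final_exam + [outlier[1]])

print(f"eight students: r = {r_without:.4f}")
print(f"with outlier:   r = {r_with:.4f}")
```

A single point that is extreme in both x and y and lies along the general trend can raise r noticeably when the data set is this small.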

2.57 A regression line. A regression equation is y = 10 + 20x.

(a) What is the slope of the regression line? 20

(b) What is the intercept of the regression line? 10

(c) Find the predicted values of y for x = 10, for x = 20, and for x = 30.
x = 10
y = 10 + 20 (x)
y = 10 + 20 (10)
y = 10 + 200
y = 210

x = 20
y = 10 + 20 (x)
y = 10 + 20 (20)
y = 10 + 400
y = 410

x = 30
y = 10 + 20 (x)
y = 10 + 20 (30)
y = 10 + 600
y = 610
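
The same substitutions can be written as a small function. This is a minimal sketch; the name `predict` is an illustrative choice.

```python
SLOPE = 20       # slope of the regression line y = 10 + 20x
INTERCEPT = 10   # intercept of the regression line


def predict(x):
    """Predicted y on the line y = 10 + 20x."""
    return INTERCEPT + SLOPE * x


predictions = {x: predict(x) for x in (10, 20, 30)}
for x, y in predictions.items():
    print(f"x = {x}: y = {y}")
```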

(d) Plot the regression line for values of x between 0 and 50.

2.59 The “January effect.” Some people think that the behavior of the stock market in January
predicts its behavior for the rest of the year. Take the explanatory variable x to be the percent
change in a stock market index in January and the response variable y to be the change in the index
for the entire year. We expect a positive correlation between x and y because the change during
January contributes to the full year’s change. Calculation based on 38 years of data gives

x̄ = 1.75%    sx = 5.36%    r = 0.596
ȳ = 9.07%    sy = 15.35%

(a) What percent of the observed variation in yearly changes in the index is explained by a straight-
line relationship with the change during January?
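r² = 0.596² = 0.3552
r² × 100 = 35.52%

About 35.5% of the observed variation in yearly changes is explained by the straight-line
relationship with the change during January.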

(b) What is the equation of the least-squares line for predicting full-year change from January
change?
b = r × sy/sx = 0.596 × 15.35/5.36 = 1.707
a = ȳ − b x̄ = 9.07 − (1.707)(1.75) = 6.08
ŷ = 6.08% + 1.707x

(c) The mean change in January is x̄ = 1.75%. Use your regression line to predict change in the
index in a year in which the index rises 1.75% in January. Why could you have given this result (up
to round off error) without doing the calculation?

ŷ = 6.08% + 1.707(1.75%) = 9.07%

The least-squares regression line always passes through the point (x̄, ȳ). Thus, we would
predict ŷ = ȳ = 9.07% when x = x̄ = 1.75%, without doing the calculation.
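
The calculations in parts (a) to (c) can be reproduced from the summary statistics alone. The sketch below assumes only the five values given in the exercise; the variable and function names are illustrative.

```python
# Summary statistics given in the exercise (all in percent).
x_bar, s_x = 1.75, 5.36    # January change: mean and standard deviation
y_bar, s_y = 9.07, 15.35   # full-year change: mean and standard deviation
r = 0.596

r_squared = r ** 2                  # fraction of variation explained
slope = r * s_y / s_x               # b = r * sy / sx
intercept = y_bar - slope * x_bar   # a = y-bar - b * x-bar


def predict(x):
    """Predicted full-year change for a January change of x percent."""
    return intercept + slope * x


print(f"r^2 = {r_squared:.4f}")
print(f"y-hat = {intercept:.2f} + {slope:.3f} x")
print(f"prediction at x-bar = {predict(x_bar):.2f}")
```

Because the intercept is defined as ȳ − b x̄, the prediction at x̄ equals ȳ by construction.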
