2023 Tutorial 11
1. Does a high value of R² imply that two variables are causally related? Explain.
2. The Dow Jones Industrial Average (DJIA) and the Standard & Poor’s 500 (S&P 500)
indexes are used as measures of overall movement in the stock market. The DJIA is
based on the price movements of 30 large companies; the S&P 500 is an index composed
of 500 stocks. Some say the S&P 500 is a better measure of stock market performance
because it is broader based. The closing prices for the DJIA and the S&P 500 for 10
weeks of a previous year follow (Barron’s website). R output is given below.
a. Develop the estimated regression equation with DJIA as the independent variable.
b. Test for a significant relationship. Use α = .05.
c. Did the estimated regression equation provide a good fit? Explain.
d. Suppose that the closing price for the DJIA is 13,500. Predict the closing price for the
S&P 500.
e. Should we be concerned that the DJIA value of 13,500 used to predict the S&P 500
value in part (d) is beyond the range of the data used to develop the estimated regression
equation?
Call:
lm(formula = SP ~ DJIA, data = closingprice)
Residuals:
   Min     1Q Median     3Q    Max
-9.575 -7.074 -2.090  6.856 12.849
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -422.92020  280.25708  -1.509 0.169724
DJIA           0.13853    0.02155   6.430 0.000203 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
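The printed output stops before the R² line, so parts b to d can be worked from the coefficient table alone. The following is a minimal Python sketch, not a prescribed solution. It assumes n = 10 weekly observations (so 8 residual degrees of freedom) and uses the simple-regression identity R² = t² / (t² + df). The variable names are illustrative.

```python
# Values copied from the lm() output above.
intercept = -422.92020
slope = 0.13853
t_slope = 6.430
p_slope = 0.000203

alpha = 0.05
n = 10        # assumption: 10 weekly closing prices
df = n - 2    # residual degrees of freedom in simple regression

# Part b: reject H0 (slope = 0) when the p-value is below alpha.
significant = p_slope < alpha

# Part c: with one predictor, R^2 = t^2 / (t^2 + df).
r_squared = t_slope**2 / (t_slope**2 + df)

# Part d: point prediction at DJIA = 13,500.
djia = 13_500
sp_hat = intercept + slope * djia

print(f"significant at alpha = {alpha}: {significant}")
print(f"R^2 = {r_squared:.4f}")
print(f"predicted S&P 500 close = {sp_hat:.2f}")
```

Whether a DJIA of 13,500 lies inside the observed range cannot be read from this output. That check needs the data table.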
3. Is the number of square feet of living space a good predictor of a house’s selling price?
The following data, collected in April 2015, show the square footage and selling price for
fifteen houses in Winston-Salem, North Carolina (Zillow.com).
a. Develop a scatter diagram with square feet of living space as the independent variable
and selling price as the dependent variable. What does the scatter diagram indicate about
the relationship between the size of a house and the selling price?
b. Develop the estimated regression equation that could be used to predict the selling
price given the number of square feet of living space.
c. At the .05 level, is there a significant relationship between the two variables?
d. Use the estimated regression equation to predict the selling price of a 2,000-square-foot
house in Winston-Salem, North Carolina.
e. Do you believe the estimated regression equation developed in part (b) will provide a
good prediction of the selling price of a particular house in Winston-Salem, North Carolina?
Explain.
f. Would you be comfortable using the estimated regression equation developed in part
(b) to predict the selling price of a particular house in Seattle, Washington? Why or why
not?
Call:
lm(formula = SellingPrice ~ Size, data = housesale)
Residuals:
    Min      1Q  Median      3Q     Max
-30.953 -25.149   7.078  20.335  31.539
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -59.01      21.27  -2.775   0.0158 *
Size          115.06      10.78  10.676 8.38e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
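For part d, the point prediction depends on the units of the two variables, which the output does not state. The Python sketch below assumes Size is recorded in thousands of square feet and SellingPrice in thousands of dollars. That reading is consistent with residuals of about ±30, but it should be confirmed against the data table.

```python
# Coefficients copied from the lm() output above.
intercept = -59.01
slope = 115.06

# Assumption: Size is in thousands of square feet and
# SellingPrice is in thousands of dollars.
size_sq_ft = 2000
size = size_sq_ft / 1000

price_hat = intercept + slope * size  # in thousands of dollars
print(f"predicted selling price: about ${price_hat * 1000:,.0f}")
```

If the variables turn out to be in different units, only the conversion line changes; the prediction is still the intercept plus the slope times Size.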
4. In a manufacturing process the assembly line speed (feet per minute) was thought to
affect the number of defective parts found during the inspection process. To test this
theory, managers devised a situation in which the same batch of parts was inspected
visually at a variety of line speeds. They collected the following data.
a. Develop the estimated regression equation that relates line speed to the number of
defective parts found.
b. At a .05 level of significance, determine whether line speed and number of defective
parts found are related.
c. Did the estimated regression equation provide a good fit to the data?
d. Develop a 95% confidence interval to predict the mean number of defective parts for a
line speed of 50 feet per minute.
Call:
lm(formula = NoDefects ~ LineSpeed, data = defects)
Residuals:
      1       2       3       4       5       6
 1.7826 -0.2174 -1.2609 -1.7391  0.6957  0.7391
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.17391    1.65275  13.416 0.000179 ***
LineSpeed   -0.14783    0.04391  -3.367 0.028135 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
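Part d needs the residual standard error s, the mean line speed, and Sxx = Σ(x − x̄)², none of which is printed. One way to proceed without the data table is to back these out of the output. This is a Python sketch under that assumption, not necessarily the intended method. It uses se(b1) = s / √Sxx and se(b0) = s · √(1/n + x̄² / Sxx), with the critical value 2.776 taken from a t table for 4 degrees of freedom.

```python
import math

# Values copied from the lm() output above.
residuals = [1.7826, -0.2174, -1.2609, -1.7391, 0.6957, 0.7391]
b0, se_b0 = 22.17391, 1.65275
b1, se_b1 = -0.14783, 0.04391

n = len(residuals)
df = n - 2

# Residual standard error: s = sqrt(SSE / (n - 2)).
sse = sum(e**2 for e in residuals)
s = math.sqrt(sse / df)

# Back out Sxx and x-bar from the two standard errors.
sxx = (s / se_b1) ** 2
x_bar = math.sqrt(sxx * ((se_b0 / s) ** 2 - 1 / n))  # assumes x-bar > 0

# 95% confidence interval for the mean response at x0 = 50.
x0 = 50
t_crit = 2.776  # t table: upper-tail area 0.025, 4 df
y_hat = b0 + b1 * x0
se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
lower = y_hat - t_crit * se_mean
upper = y_hat + t_crit * se_mean

print(f"s = {s:.4f}, x-bar = {x_bar:.2f}, Sxx = {sxx:.1f}")
print(f"95% CI for mean defects at {x0} ft/min: {lower:.2f} to {upper:.2f}")
```

With the original data available, x̄ and Sxx would be computed directly, and the interval should agree up to rounding.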
5. One of the biggest changes in higher education in recent years has been the growth of
online universities. The Online Education Database is an independent organization whose
mission is to build a comprehensive list of the top accredited online colleges. The
following table shows the retention rate (%) and the graduation rate (%) for 29 online colleges
(Online Education Database website, January 2009).
a. Develop a scatter diagram with retention rate as the independent variable. What does
the scatter diagram indicate about the relationship between the two variables?
b. Develop the estimated regression equation.
c. Test for a significant relationship. Use α = .05.
d. Did the estimated regression equation provide a good fit?
e. Suppose you were the president of South University. After reviewing the results, would
you have any concerns about the performance of your university as compared to other
online universities?
f. Suppose you were the president of the University of Phoenix. After reviewing the
results, would you have any concerns about the performance of your university as
compared to other online universities?
Call:
lm(formula = GR ~ RR, data = OnlineEdu)
Residuals:
     Min       1Q   Median       3Q      Max
-14.9337  -6.4945   0.9448   4.8067  13.9198
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 25.42290    3.74628   6.786 2.74e-07 ***
RR           0.28453    0.06063   4.693 6.95e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.456 on 27 degrees of freedom
Multiple R-squared: 0.4492, Adjusted R-squared: 0.4288
F-statistic: 22.02 on 1 and 27 DF, p-value: 6.955e-05
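The last three lines of this output are linked: with a single predictor, the F statistic equals the squared t statistic of the slope, and R² = F / (F + residual df). The short Python check below illustrates how the reported figures fit together when judging significance and fit. It is a consistency check on the printed numbers, not a re-fit of the model.

```python
# Values copied from the lm() output above.
t_slope = 4.693
f_stat = 22.02
df_resid = 27
r_squared_reported = 0.4492
p_value = 6.955e-05
alpha = 0.05

# With one predictor, F equals the squared slope t statistic.
f_from_t = t_slope**2

# R^2 can be recovered from F and the residual degrees of freedom.
r2_from_f = f_stat / (f_stat + df_resid)

print(f"t^2 = {f_from_t:.2f} (reported F = {f_stat})")
print(f"F / (F + df) = {r2_from_f:.4f} (reported R^2 = {r_squared_reported})")
print(f"significant at alpha = {alpha}: {p_value < alpha}")
```

An R² of about 0.45 means retention rate accounts for a little under half of the variation in graduation rate, so a statistically significant relationship here coexists with only a moderate fit.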
a. Develop an estimated regression equation showing how total points earned is related to
hours spent studying.
b. Test the significance of the model with α = .05.
c. Predict the total points earned by Mark Sweeney. He spent 95 hours studying.
d. Develop a 95% prediction interval for the total points earned by Mark Sweeney.
Call:
lm(formula = Points ~ Hours, data = HoursPts)
Residuals:
   Min     1Q Median     3Q    Max
-9.767 -4.923 -3.006  6.909  9.494
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.8470     7.9717   0.733    0.484
Hours         0.8295     0.1095   7.577 6.44e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.523 on 8 degrees of freedom
Multiple R-squared: 0.8777, Adjusted R-squared: 0.8624
F-statistic: 57.42 on 1 and 8 DF, p-value: 6.44e-05
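For the point prediction and the 95% prediction interval at 95 hours, the output supplies s = 7.523 but not the mean of Hours or Sxx = Σ(x − x̄)². The Python sketch below backs both out of the coefficient standard errors, which is an assumption-laden shortcut to use only when the data table is unavailable. It takes the critical value 2.306 from a t table for 8 degrees of freedom.

```python
import math

# Values copied from the lm() output above.
b0, se_b0 = 5.8470, 7.9717
b1, se_b1 = 0.8295, 0.1095
s = 7.523          # residual standard error
df = 8
n = df + 2         # simple regression: df = n - 2

# Back out Sxx and x-bar from:
#   se(b1) = s / sqrt(Sxx)
#   se(b0) = s * sqrt(1/n + xbar^2 / Sxx)
sxx = (s / se_b1) ** 2
x_bar = math.sqrt(sxx * ((se_b0 / s) ** 2 - 1 / n))  # assumes x-bar > 0

# Point prediction at 95 hours of study.
x0 = 95
y_hat = b0 + b1 * x0

# 95% prediction interval for an individual student: note the extra "1 +".
t_crit = 2.306  # t table: upper-tail area 0.025, 8 df
se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
lower = y_hat - t_crit * se_pred
upper = y_hat + t_crit * se_pred

print(f"predicted points = {y_hat:.2f}")
print(f"95% prediction interval: {lower:.2f} to {upper:.2f}")
```

The interval is wide because it covers one student's score rather than the mean score of all students who study 95 hours.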