2023 Tutorial 11

This document contains 6 tutorials with statistical analyses of relationships between variables. Each tutorial provides sample data, R code to run a linear regression, and questions to interpret the results. Questions assess whether variables are correlated, the fit of the regression model, use of the model to predict new values, and implications for real-world scenarios. The tutorials cover a range of applications from stock prices and home values to manufacturing defects and educational outcomes.

Tutorial 12

1. Does a high value of R² imply that two variables are causally related? Explain.

2. The Dow Jones Industrial Average (DJIA) and the Standard & Poor’s 500 (S&P 500)
indexes are used as measures of overall movement in the stock market. The DJIA is
based on the price movements of 30 large companies; the S&P 500 is an index composed
of 500 stocks. Some say the S&P 500 is a better measure of stock market performance
because it is broader based. The closing prices for the DJIA and the S&P 500 for 10
weeks of a previous year follow (Barron’s website). R output is given below.

a. Develop the estimated regression equation with DJIA as the independent variable.
b. Test for a significant relationship. Use α = .05.
c. Did the estimated regression equation provide a good fit? Explain.
d. Suppose that the closing price for the DJIA is 13,500. Predict the closing price for the
S&P 500.
e. Should we be concerned that the DJIA value of 13,500 used to predict the S&P 500
value in part (d) is beyond the range of the data used to develop the estimated regression
equation?

Call:
lm(formula = SP ~ DJIA, data = closingprice)

Residuals:
   Min     1Q Median     3Q    Max
-9.575 -7.074 -2.090  6.856 12.849

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -422.92020  280.25708  -1.509 0.169724
DJIA           0.13853    0.02155   6.430 0.000203 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.15 on 8 degrees of freedom
Multiple R-squared:  0.8379, Adjusted R-squared:  0.8176
F-statistic: 41.34 on 1 and 8 DF, p-value: 0.0002027
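
The t test in part (b) and the prediction in part (d) can be checked by hand from the coefficient table above. The following is a minimal Python sketch, not part of the original tutorial. The variable and function names are illustrative, and the critical value 2.306 is the standard two-tailed t table value for 8 degrees of freedom at α = .05, which is an assumption brought in from outside the R output.

```python
# Coefficients copied from the R output for SP ~ DJIA.
b0 = -422.92020   # intercept
b1 = 0.13853      # slope on DJIA
se_b1 = 0.02155   # standard error of the slope

# t statistic for H0: slope = 0. R prints 6.430 because it divides the
# unrounded values; the rounded inputs here give a slightly different figure.
t_stat = b1 / se_b1

# ASSUMPTION: two-tailed t critical value from a standard table, alpha = .05, 8 df.
T_CRIT_8DF = 2.306
significant = abs(t_stat) > T_CRIT_8DF

def predict_sp(djia):
    """Point prediction of the S&P 500 close for a given DJIA close."""
    return b0 + b1 * djia

print(f"t = {t_stat:.3f}, significant at .05: {significant}")
print(f"Predicted S&P 500 at DJIA 13,500: {predict_sp(13500):.2f}")
```

The point prediction comes out near 1447.23. Whether that figure can be trusted depends on whether 13,500 lies inside the observed DJIA range, which the R output alone does not show.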

3. Is the number of square feet of living space a good predictor of a house’s selling price?
The following data collected in April, 2015, show the square footage and selling price for
fifteen houses in Winston Salem, North Carolina (Zillow.com).

a. Develop a scatter diagram with square feet of living space as the independent variable
and selling price as the dependent variable. What does the scatter diagram indicate about
the relationship between the size of a house and the selling price?
b. Develop the estimated regression equation that could be used to predict the selling
price given the number of square feet of living space.
c. At the .05 level, is there a significant relationship between the two variables?
d. Use the estimated regression equation to predict the selling price of a 2000 square foot
house in Winston Salem, North Carolina.
e. Do you believe the estimated regression equation developed in part (b) will provide a
good prediction of selling price of a particular house in Winston Salem, North Carolina?
Explain.
f. Would you be comfortable using the estimated regression equation developed in part
(b) to predict the selling price of a particular house in Seattle, Washington? Why or why
not?

Call:
lm(formula = SellingPrice ~ Size, data = housesale)

Residuals:
    Min      1Q  Median      3Q     Max
-30.953 -25.149   7.078  20.335  31.539

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -59.01      21.27  -2.775   0.0158 *
Size          115.06      10.78  10.676 8.38e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 24.58 on 13 degrees of freedom
Multiple R-squared:  0.8976, Adjusted R-squared:  0.8897
F-statistic: 114 on 1 and 13 DF, p-value: 8.378e-08
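
The prediction in part (d) follows directly from the fitted coefficients. The sketch below, which is not part of the original tutorial, assumes that Size is recorded in thousands of square feet and SellingPrice in thousands of dollars. The data table is not reproduced here, so those units are inferred from the size of the slope and should be checked against the data. It also cross-checks the reported F statistic in two ways that hold for any simple linear regression.

```python
# Coefficients copied from the R output for SellingPrice ~ Size.
b0 = -59.01
b1 = 115.06
t_slope = 10.676
r_squared = 0.8976
df_resid = 13

# ASSUMPTION: Size is in thousands of square feet and SellingPrice is in
# thousands of dollars; the data table is not available to confirm this.
def predict_price(size_thousand_sqft):
    """Point prediction of selling price (assumed $1000s)."""
    return b0 + b1 * size_thousand_sqft

# With one predictor, F equals the squared slope t statistic, and it also
# equals R^2 / (1 - R^2) times the residual degrees of freedom.
f_from_t = t_slope ** 2
f_from_r2 = r_squared / (1 - r_squared) * df_resid

print(f"Predicted price for 2,000 sq ft: {predict_price(2.0):.2f}")
print(f"F from t^2: {f_from_t:.1f}, F from R^2: {f_from_r2:.1f}")
```

Under the stated unit assumption the prediction is 171.11, that is, about $171,110. Both routes to F agree with the reported value of 114 up to rounding.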

4. In a manufacturing process the assembly line speed (feet per minute) was thought to
affect the number of defective parts found during the inspection process. To test this
theory, managers devised a situation in which the same batch of parts was inspected
visually at a variety of line speeds. They collected the following data.

a. Develop the estimated regression equation that relates line speed to the number of
defective parts found.
b. At a .05 level of significance, determine whether line speed and number of defective
parts found are related.
c. Did the estimated regression equation provide a good fit to the data?
d. Develop a 95% confidence interval to predict the mean number of defective parts for a
line speed of 50 feet per minute.

Call:
lm(formula = NoDefects ~ LineSpeed, data = defects)

Residuals:
      1       2       3       4       5       6
 1.7826 -0.2174 -1.2609 -1.7391  0.6957  0.7391

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.17391    1.65275  13.416 0.000179 ***
LineSpeed   -0.14783    0.04391  -3.367 0.028135 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.489 on 4 degrees of freedom
Multiple R-squared:  0.7391, Adjusted R-squared:  0.6739
F-statistic: 11.33 on 1 and 4 DF, p-value: 0.02813
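
The confidence interval in part (d) needs the mean of the line speeds and their sum of squared deviations, neither of which appears in the output. One way around this, sketched below in Python as an illustration rather than as the tutorial's intended method, is to back them out of the two reported standard errors. The names are illustrative, the critical value 2.776 is the standard two-tailed t table value for 4 degrees of freedom, and the recovered quantities inherit the rounding of the printed output.

```python
import math

# Values copied from the R output for NoDefects ~ LineSpeed.
b0, se_b0 = 22.17391, 1.65275
b1, se_b1 = -0.14783, 0.04391
s = 1.489   # residual standard error
n = 6       # 4 residual df + 2 estimated coefficients

# Recover the predictor summaries from the standard errors:
#   se_b1 = s / sqrt(Sxx)
#   se_b0 = s * sqrt(1/n + xbar^2 / Sxx)
sxx = (s / se_b1) ** 2
# ASSUMPTION: the mean line speed is positive, so take the positive root.
xbar = math.sqrt(((se_b0 / s) ** 2 - 1 / n) * sxx)

# ASSUMPTION: two-tailed t critical value from a standard table, alpha = .05, 4 df.
T_CRIT_4DF = 2.776

def mean_confidence_interval(x0):
    """95% confidence interval for the mean response at line speed x0."""
    y_hat = b0 + b1 * x0
    se_mean = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
    margin = T_CRIT_4DF * se_mean
    return y_hat - margin, y_hat, y_hat + margin

lower, y_hat, upper = mean_confidence_interval(50)
print(f"Sxx ~ {sxx:.0f}, xbar ~ {xbar:.1f}")
print(f"Estimate {y_hat:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

With the original data available, the same interval would normally come straight from the fitted model. This reconstruction is only useful when the summary output is all that is at hand.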

5. One of the biggest changes in higher education in recent years has been the growth of
online universities. The Online Education Database is an independent organization whose
mission is to build a comprehensive list of the top accredited online colleges. The following
table shows the retention rate (%) and the graduation rate (%) for 29 online colleges
(Online Education Database website, January 2009).

a. Develop a scatter diagram with retention rate as the independent variable. What does
the scatter diagram indicate about the relationship between the two variables?
b. Develop the estimated regression equation.
c. Test for a significant relationship. Use α = .05.
d. Did the estimated regression equation provide a good fit?
e. Suppose you were the president of South University. After reviewing the results, would
you have any concerns about the performance of your university as compared to other
online universities?
f. Suppose you were the president of the University of Phoenix. After reviewing the
results, would you have any concerns about the performance of your university as
compared to other online universities?

Call:
lm(formula = GR ~ RR, data = OnlineEdu)
 
Residuals:
     Min       1Q   Median       3Q      Max 
-14.9337  -6.4945   0.9448   4.8067  13.9198 
 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 25.42290    3.74628   6.786 2.74e-07 ***
RR           0.28453    0.06063   4.693 6.95e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 7.456 on 27 degrees of freedom
Multiple R-squared:  0.4492, Adjusted R-squared:  0.4288 
F-statistic: 22.02 on 1 and 27 DF,  p-value: 6.955e-05
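
For part (d), the goodness of fit can be read off R-squared. The short Python sketch below is an illustration, not part of the original tutorial. It converts R-squared to the sample correlation and confirms that the reported F statistic and R-squared are consistent with each other. The variable names are illustrative.

```python
import math

# Values copied from the R output for GR ~ RR.
b1 = 0.28453
r_squared = 0.4492
f_stat = 22.02
df_resid = 27

# In simple regression the correlation is the square root of R^2,
# carrying the sign of the slope.
r = math.copysign(math.sqrt(r_squared), b1)

# Cross-check: with one predictor, R^2 = F / (F + residual df).
r_squared_from_f = f_stat / (f_stat + df_resid)

unexplained = 1 - r_squared

print(f"r = {r:.3f}")
print(f"R^2 from F: {r_squared_from_f:.4f}")
print(f"Share of variation left unexplained: {unexplained:.1%}")
```

A significant slope together with more than half of the variation in graduation rate left unexplained is the pattern to weigh when answering part (d).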

6. A marketing professor at Givens College is interested in the relationship between hours
spent studying and total points earned in a course. Data are collected on 10 students who
took the course last quarter. R output is given below.

a. Develop an estimated regression equation showing how total points earned is related to
hours spent studying.
b. Test the significance of the model with α = .05.
c. Predict the total points earned by Mark Sweeney. He spent 95 hours studying.
d. Develop a 95% prediction interval for the total points earned by Mark Sweeney.

Call:
lm(formula = Points ~ Hours, data = HoursPts)
 
Residuals:
   Min     1Q Median     3Q    Max 
-9.767 -4.923 -3.006  6.909  9.494 
 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5.8470     7.9717   0.733    0.484    
Hours         0.8295     0.1095   7.577 6.44e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 7.523 on 8 degrees of freedom
Multiple R-squared:  0.8777, Adjusted R-squared:  0.8624 
F-statistic: 57.42 on 1 and 8 DF,  p-value: 6.44e-05
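
Parts (c) and (d) can be worked from the summary output alone. The Python sketch below is an illustration, not the tutorial's prescribed method. It recovers the mean and spread of the study hours from the two standard errors, then builds a prediction interval for an individual student. The names are illustrative, the critical value 2.306 is the standard two-tailed t table value for 8 degrees of freedom, and the recovered quantities are approximate because the printed output is rounded.

```python
import math

# Values copied from the R output for Points ~ Hours.
b0, se_b0 = 5.8470, 7.9717
b1, se_b1 = 0.8295, 0.1095
s = 7.523   # residual standard error
n = 10      # number of students

# Recover the predictor summaries from the standard errors:
#   se_b1 = s / sqrt(Sxx)
#   se_b0 = s * sqrt(1/n + xbar^2 / Sxx)
sxx = (s / se_b1) ** 2
# ASSUMPTION: mean hours studied is positive, so take the positive root.
xbar = math.sqrt(((se_b0 / s) ** 2 - 1 / n) * sxx)

# ASSUMPTION: two-tailed t critical value from a standard table, alpha = .05, 8 df.
T_CRIT_8DF = 2.306

def prediction_interval(x0):
    """95% prediction interval for one new observation at x0 hours."""
    y_hat = b0 + b1 * x0
    # The leading 1 accounts for the variance of an individual outcome.
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)
    margin = T_CRIT_8DF * se_pred
    return y_hat - margin, y_hat, y_hat + margin

lower, y_hat, upper = prediction_interval(95)
print(f"Predicted points: {y_hat:.2f}")
print(f"95% prediction interval: ({lower:.2f}, {upper:.2f})")
```

The interval is wide relative to the point prediction because it covers a single student rather than the mean of all students who study 95 hours.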
