
3 Linear Regression 2

This document discusses assessing the accuracy of regression models through residual standard error (RSE) and the R2 statistic. It provides details on: 1) RSE measures how much data deviates from the regression line on average. For advertising data, RSE was 3,260 units, or 23% of the mean sales. 2) R2 measures the proportion of variability in the response explained by the model. For advertising data, R2 was 0.612, meaning the model explained just under two-thirds of sales variability. 3) Multiple linear regression extends simple linear regression to incorporate multiple predictor variables and their interactions, providing a more accurate prediction than separate simple regressions.

Uploaded by

neuro.ultragod

9/24/2021

Assessing the Accuracy of the Model

• Once we have rejected the null hypothesis in favor of the
alternative hypothesis, it is natural to want to quantify the extent
to which the model fits the data.

• The quality of a linear regression fit is typically assessed using
two related quantities:
1. The residual standard error (RSE) and
2. The R² statistic.


1. Residual Standard Error

• Recall from the population line equation that associated with
each observation is an error term ε.
• Due to the presence of these error terms, even if we knew the
true regression line, we would not be able to perfectly predict Y
from X.
• The RSE is an estimate of the standard deviation of ε.
• Roughly speaking, it is the average amount that the response will
deviate from the true regression line.
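As a minimal sketch of how the RSE can be computed, the snippet below uses the standard simple-regression definition RSE = sqrt(RSS / (n - 2)), where RSS is the residual sum of squares. The formula is not written out on the slide, and the four data points are made up for illustration; they are not the advertising data.

```python
import numpy as np

def residual_standard_error(x, y):
    """Fit y = b0 + b1*x by least squares and return (b0, b1, RSE)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # np.polyfit returns the highest-degree coefficient first: slope, then intercept
    b1, b0 = np.polyfit(x, y, deg=1)
    residuals = y - (b0 + b1 * x)
    rss = np.sum(residuals ** 2)      # residual sum of squares
    rse = np.sqrt(rss / (n - 2))      # n - 2 because two coefficients are estimated
    return b0, b1, rse

# Made-up data: y = 1 + 2x plus errors (+1, -1, -1, +1)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 2.0, 4.0, 8.0])

b0, b1, rse = residual_standard_error(x, y)
print(b0, b1, rse)
```

The RSE comes out in the same units as y, which is why it has to be judged against the scale of the response.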


The least squares model for the regression of number of units
sold on TV advertising budget.

• Residual standard error = 3.26 (sales are recorded in thousands of units)
• In other words, actual sales in each market deviate from the true
regression line by approximately 3,260 units, on average.
• In the advertising data set, the mean value of sales over all markets is
approximately 14,000 units, and so the percentage error is 3,260/14,000 ≈
23%.
• RSE is considered a measure of the lack of fit of the model to the data.
• Since the RSE is measured in the units of Y, it is not always clear
what constitutes a good RSE.


2. Coefficient of Determination: R²
• Some of the variation in Y can be explained by variation in the
X’s and some cannot.

• R² measures the proportion of variability in Y that can be
explained using X.

$$R^2 = 1 - \frac{\mathrm{RSS}}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

• R² is always between 0 and 1. Zero means no variance has been
explained. One means it has all been explained (perfect fit to the
data).
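The R² formula can be sketched in a few lines. The function name `r_squared` and the data are illustrative assumptions, not part of the slides; the fitted values used here correspond to the line y = 1 + 2x.

```python
import numpy as np

def r_squared(y, y_hat):
    """Return R^2 = 1 - RSS/TSS for observed y and fitted values y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    rss = np.sum((y - y_hat) ** 2)       # variability left unexplained by the fit
    tss = np.sum((y - y.mean()) ** 2)    # total variability in the response
    return 1.0 - rss / tss

# Made-up observations and fitted values from the line y = 1 + 2x
y = np.array([2.0, 2.0, 4.0, 8.0])
y_hat = np.array([1.0, 3.0, 5.0, 7.0])

r2 = r_squared(y, y_hat)                               # 1 - 4/24
r2_perfect = r_squared(y, y)                           # fitted values equal the data
r2_mean_only = r_squared(y, np.full_like(y, y.mean())) # predicting the mean everywhere
print(r2, r2_perfect, r2_mean_only)
```

The two extra calls illustrate the end points described above: a perfect fit gives 1, and a model that only predicts the mean of Y gives 0.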


• An R² statistic that is close to 1 indicates that a large proportion
of the variability in the response has been explained by the regression.

• A number near 0 indicates that the regression did not explain much of
the variability in the response; this might occur because the linear
model is wrong, or the inherent error σ² is high, or both.

• For the advertising data (sales regressed on TV), R² = 0.612: just
under two-thirds of the variability in sales is explained by a linear
regression on TV.

• In the simple linear regression setting, R² = r², where r is the
sample correlation between X and Y.
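A quick numerical check of the identity R² = r² for simple linear regression, using made-up data (not the advertising data) and NumPy's `polyfit` and `corrcoef`:

```python
import numpy as np

# Made-up data for illustration
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 2.0, 4.0, 8.0])

# R^2 from the least squares fit
b1, b0 = np.polyfit(x, y, deg=1)     # slope first, then intercept
y_hat = b0 + b1 * x
rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - rss / tss

# Squared sample correlation between x and y
r = np.corrcoef(x, y)[0, 1]

print(r2, r ** 2)   # the two values agree up to floating-point error
```

This equality holds only with a single predictor; with several predictors there is no single correlation r to square.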


Multiple Linear Regression (MLR)

• Simple linear regression is a useful approach for predicting a
response on the basis of a single predictor variable. However, in
practice we often have more than one predictor.

• How can we extend our analysis to accommodate additional
predictors?

• One option is to run a separate simple linear regression for each
predictor, but this is not entirely satisfactory:
1. It is difficult to make a single prediction from the separate
regression equations.
2. Each regression ignores the other predictors. If the predictors
are correlated, this can lead to very misleading estimates of
the individual predictors' effects on the response.


Multiple Linear Regression (2)

Population line:
$$Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$

Least squares line:
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \cdots + \hat{\beta}_p X_p$$

• The parameters in the linear regression model are very easy to
interpret.
• β0 is the intercept (i.e. the average value for Y if all the X’s are
zero), βj is the slope for the jth variable Xj.
• βj is the average increase in Y when Xj is increased by one unit
and all other X’s are held constant.
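The following sketch fits a multiple regression with two predictors using `numpy.linalg.lstsq` and illustrates the "held constant" interpretation of a slope. The data are made up and noise-free (generated from y = 3 + 2·x1 - 1·x2), and the names `x1`, `x2`, and `predict` are illustrative assumptions, not part of the slides.

```python
import numpy as np

# Made-up, noise-free data generated from y = 3 + 2*x1 - 1*x2
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 3.0 + 2.0 * x1 - 1.0 * x2

# Design matrix: a column of ones for the intercept, then one column per predictor
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares estimates (beta0_hat, beta1_hat, beta2_hat)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(x1_value, x2_value):
    """Prediction from the fitted least squares line."""
    return beta_hat[0] + beta_hat[1] * x1_value + beta_hat[2] * x2_value

# Raise x1 by one unit while holding x2 fixed: the prediction changes by beta1_hat
effect_of_x1 = predict(2.0, 5.0) - predict(1.0, 5.0)
print(beta_hat, effect_of_x1)
```

Because the data contain no noise, the estimates recover the generating coefficients; with real data they would only approximate the population values.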


MLR on advertising data


Observations
• Simple and multiple regression coefficients can be quite different.
• Does it make sense for the multiple regression to suggest no
relationship between sales and newspaper while the simple
linear regression implies the opposite?
• It does if newspaper and radio advertising are correlated: newspaper
advertising is then a surrogate for radio advertising, and
newspaper gets “credit” for the effect of radio on sales.
• Almost all the explaining that newspaper could do in the simple
regression has already been done by TV and radio in the multiple
regression.
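The surrogate effect can be reproduced on simulated data. The sketch below does not use the advertising data: the variables `radio`, `newspaper`, and `sales` are synthetic, `sales` is generated from `radio` alone, and `newspaper` is constructed to be correlated with `radio`. The helper `least_squares` is a hypothetical name introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible
n = 500

# Synthetic predictors: newspaper is correlated with radio by construction
radio = rng.normal(size=n)
newspaper = radio + rng.normal(size=n)

# Synthetic response: depends on radio only, never on newspaper
sales = 2.0 * radio + rng.normal(size=n)

def least_squares(predictors, response):
    """Return least squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(response))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    return coef

simple = least_squares([newspaper], sales)           # sales on newspaper alone
multiple = least_squares([radio, newspaper], sales)  # sales on radio and newspaper

print("simple regression, newspaper slope:  ", simple[1])
print("multiple regression, radio slope:    ", multiple[1])
print("multiple regression, newspaper slope:", multiple[2])
```

In the simple regression the newspaper slope is clearly positive even though newspaper has no direct effect in the simulation; once radio is included, the newspaper coefficient shrinks toward zero.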
