Lecture 11

Simple Linear Regression


Lecture Outline
Simple Linear Regression Model
Least Squares Method
Coefficient of Determination
Model Assumptions
Testing for Significance
Using the Estimated Regression Equation for Estimation and Prediction
Computer Solution
Residual Analysis: Validating Model Assumptions
Residual Analysis: Outliers and Influential Observations

Slide 2
The Simple Linear Regression Model
Simple Linear Regression Model

$y = \beta_0 + \beta_1 x + \epsilon$

Simple Linear Regression Equation

$E(y) = \beta_0 + \beta_1 x$

Estimated Simple Linear Regression Equation

$\hat{y} = b_0 + b_1 x$

Slide 3
Least Squares Method
Least Squares Criterion

$\min \sum (y_i - \hat{y}_i)^2$

where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation

Slide 4
The Least Squares Method
Slope for the Estimated Regression Equation

$b_1 = \frac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}$

y-Intercept for the Estimated Regression Equation

$b_0 = \bar{y} - b_1 \bar{x}$

where:
$x_i$ = value of independent variable for ith observation
$y_i$ = value of dependent variable for ith observation
$\bar{x}$ = mean value for independent variable
$\bar{y}$ = mean value for dependent variable
n = total number of observations

Slide 5
Example: Reed Auto Sales
Simple Linear Regression
Reed Auto periodically has a special week-long
sale. As part of the advertising campaign Reed
runs one or more television commercials during
the weekend preceding the sale. Data from a
sample of 5 previous sales are shown below.
Number of TV Ads    Number of Cars Sold
       1                    14
       3                    24
       2                    18
       1                    17
       3                    27

Slide 6
Example: Reed Auto Sales
Slope for the Estimated Regression Equation

$b_1 = \frac{220 - (10)(100)/5}{24 - (10)^2/5} = \frac{20}{4} = 5$

y-Intercept for the Estimated Regression Equation

$b_0 = 20 - 5(2) = 10$

Estimated Regression Equation

$\hat{y} = 10 + 5x$
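
The slope and intercept above can be reproduced with a short script. This is a minimal sketch that applies the computational formulas from the Least Squares Method slide to the five Reed Auto observations; the function name `least_squares_fit` and the variable names are illustrative choices, not part of the lecture.

```python
# Reed Auto sample from the slides: TV ads run (x) and cars sold (y).
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]


def least_squares_fit(x, y):
    """Return (b0, b1) using the computational least squares formulas."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # Slope: [sum(xy) - sum(x)sum(y)/n] / [sum(x^2) - (sum(x))^2/n]
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: y-bar minus slope times x-bar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares_fit(tv_ads, cars_sold)
print(f"y-hat = {b0:g} + {b1:g}x")
```

With the slide data the intermediate sums are 220, 10, 100 and 24, so the script arrives at the same estimated regression equation.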

Slide 7
Example: Reed Auto Sales
Scatter Diagram

[Figure: scatter diagram of Cars Sold (vertical axis, 0 to 30) against TV Ads (horizontal axis, 0 to 4), with the fitted line y = 5x + 10]

Slide 8
The Coefficient of Determination
Relationship Among SST, SSR, SSE

SST = SSR + SSE

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

Coefficient of Determination

$r^2 = SSR/SST$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

Slide 9
Example: Reed Auto Sales
Coefficient of Determination

$r^2 = SSR/SST = 100/114 = .8772$

The regression relationship is very strong: about 88% of the variation in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
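
The three sums of squares behind this result are not shown on the slide. A small sketch, assuming the fitted equation with intercept 10 and slope 5 from the earlier slide (variable names are illustrative), recovers them directly from the data:

```python
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0  # estimated regression equation from the slides

y_bar = sum(cars_sold) / len(cars_sold)
y_hat = [b0 + b1 * x for x in tv_ads]

# Total, regression, and error sums of squares
sst = sum((y - y_bar) ** 2 for y in cars_sold)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(cars_sold, y_hat))

r_squared = ssr / sst
print(f"SST = {sst:g}, SSR = {ssr:g}, SSE = {sse:g}, r^2 = {r_squared:.4f}")
```

The script also confirms the identity SST = SSR + SSE for this sample (114 = 100 + 14).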

Slide 10
The Correlation Coefficient
Sample Correlation Coefficient

$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}}$

$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$

where:
$b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$

Slide 11
Example: Reed Auto Sales
Sample Correlation Coefficient

$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$

The sign of $b_1$ in the equation $\hat{y} = 10 + 5x$ is "+".

$r_{xy} = +\sqrt{.8772} = +.9366$
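
As a cross-check, the same value can be obtained two ways. The sketch below takes the signed square root described on the slide and compares it with the usual product-moment formula computed from the raw data; that second formula is not on the slide and is included here only as an independent check.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b1 = 5.0               # slope from the slides
r_squared = 100 / 114  # SSR/SST from the slides

# Route 1: signed square root of the coefficient of determination
r_from_r2 = math.copysign(math.sqrt(r_squared), b1)

# Route 2 (cross-check): sum of cross-products over the root of the
# product of the two sums of squared deviations
n = len(tv_ads)
x_bar = sum(tv_ads) / n
y_bar = sum(cars_sold) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
syy = sum((y - y_bar) ** 2 for y in cars_sold)
r_direct = sxy / math.sqrt(sxx * syy)

print(f"{r_from_r2:.4f} {r_direct:.4f}")
```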

Slide 12
Model Assumptions
Assumptions About the Error Term $\epsilon$
The error $\epsilon$ is a random variable with mean of zero.
The variance of $\epsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
The values of $\epsilon$ are independent.
The error $\epsilon$ is a normally distributed random variable.

Slide 13
Testing for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.
Two tests are commonly used:
t Test
F Test
Both tests require an estimate of $\sigma^2$, the variance of $\epsilon$ in the regression model.

Slide 14
Testing for Significance
An Estimate of $\sigma^2$
The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.

$s^2 = MSE = SSE/(n - 2)$

where:
$SSE = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$

Slide 15
Testing for Significance
An Estimate of $\sigma$
To estimate $\sigma$ we take the square root of $s^2$.
The resulting s is called the standard error of the estimate.

$s = \sqrt{MSE} = \sqrt{\frac{SSE}{n - 2}}$
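
For the Reed Auto sample these two estimates work out as follows. This is a small sketch assuming the fitted equation with intercept 10 and slope 5 from the earlier example; the variable names are illustrative.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0  # estimated regression equation from the slides

n = len(tv_ads)
sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(tv_ads, cars_sold))

mse = sse / (n - 2)  # s^2, the estimate of sigma^2
s = math.sqrt(mse)   # standard error of the estimate

print(f"SSE = {sse:g}, MSE = {mse:.3f}, s = {s:.3f}")
```

The divisor is n - 2 = 3 because two parameters, the intercept and the slope, are estimated from the data.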

Slide 16
Testing for Significance: t Test
Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Test Statistic
$t = \frac{b_1}{s_{b_1}}$
Rejection Rule
Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$,
where $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom.

Slide 17
Example: Reed Auto Sales
t Test
Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Rejection Rule
For $\alpha = .05$ and d.f. = 3, $t_{.025} = 3.182$
Reject $H_0$ if $t < -3.182$ or $t > 3.182$
Test Statistic
$t = b_1 / s_{b_1} = 5/1.08 = 4.63$
Conclusion
Reject $H_0$, since 4.63 > 3.182
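
The slide uses 1.08 as the standard error of the slope without showing where it comes from. A minimal sketch, assuming the standard formula for the estimated standard deviation of the slope (s divided by the square root of the sum of squared deviations of x, which is not stated in these slides) and taking the critical value 3.182 from the slide rather than computing it:

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0  # estimated regression equation from the slides
t_critical = 3.182  # t_.025 with 3 d.f., as given on the slide

n = len(tv_ads)
x_bar = sum(tv_ads) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(tv_ads, cars_sold))
s = math.sqrt(sse / (n - 2))

# Assumed formula (not on the slides): s_b1 = s / sqrt(sum((x - x_bar)^2))
s_b1 = s / math.sqrt(sxx)
t_stat = b1 / s_b1
reject_h0 = abs(t_stat) > t_critical  # two-tailed rejection rule

print(f"s_b1 = {s_b1:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```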

Slide 18
Confidence Interval for $\beta_1$
We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.
$H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.

Slide 19
Confidence Interval for $\beta_1$
The form of a confidence interval for $\beta_1$ is:

$b_1 \pm t_{\alpha/2} s_{b_1}$

where
$b_1$ is the point estimate
$t_{\alpha/2} s_{b_1}$ is the margin of error
$t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with n - 2 degrees of freedom

Slide 20
Example: Reed Auto Sales
Rejection Rule
Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.
95% Confidence Interval for $\beta_1$
$b_1 \pm t_{\alpha/2} s_{b_1} = 5 \pm 3.182(1.08) = 5 \pm 3.44$, or 1.56 to 8.44
Conclusion
Reject $H_0$
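
The interval arithmetic can be wrapped in a small helper. This is a sketch only: the function name `slope_confidence_interval` is hypothetical, and the inputs are the rounded values quoted on the slides.

```python
def slope_confidence_interval(b1, s_b1, t_value):
    """Return (lower, upper) for the interval b1 +/- t_value * s_b1."""
    margin = t_value * s_b1
    return b1 - margin, b1 + margin


# Values as quoted on the slides: b1 = 5, s_b1 = 1.08, t_.025 (3 d.f.) = 3.182
lower, upper = slope_confidence_interval(5.0, 1.08, 3.182)

# H0: beta_1 = 0 is rejected when 0 falls outside the interval
reject_h0 = not (lower <= 0.0 <= upper)

print(f"{lower:.2f} to {upper:.2f}, reject H0: {reject_h0}")
```

Because the interval uses the same critical value as the two-tailed t test, the two procedures lead to the same conclusion at the .05 level.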

Slide 21
Testing for Significance: F Test
Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Test Statistic
F = MSR/MSE
Rejection Rule
Reject $H_0$ if $F > F_\alpha$,
where $F_\alpha$ is based on an F distribution with 1 d.f. in the numerator and n - 2 d.f. in the denominator.

Slide 22
Example: Reed Auto Sales
F Test
Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Rejection Rule
For $\alpha = .05$ and d.f. = 1, 3: $F_{.05} = 10.13$
Reject $H_0$ if F > 10.13.
Test Statistic
F = MSR/MSE = 100/4.667 = 21.43
Conclusion
We can reject $H_0$, since 21.43 > 10.13.
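
The F statistic follows from the sums of squares computed earlier. In this sketch MSR is taken as SSR divided by its single degree of freedom (one independent variable), and the critical value 10.13 is copied from the slide rather than computed.

```python
ssr = 100.0         # sum of squares due to regression, from the slides
sse = 14.0          # sum of squares due to error (SST - SSR = 114 - 100)
n = 5               # observations in the Reed Auto sample
f_critical = 10.13  # F_.05 with 1 and 3 d.f., as given on the slide

msr = ssr / 1        # one independent variable, so 1 numerator d.f.
mse = sse / (n - 2)  # n - 2 denominator d.f.
f_stat = msr / mse
reject_h0 = f_stat > f_critical

print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```

With a single independent variable the F statistic equals the square of the t statistic, so 21.43 agrees with 4.63 squared up to rounding.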

Slide 23
Some Cautions about the
Interpretation of Significance Tests
Rejecting $H_0: \beta_1 = 0$ and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
Just because we are able to reject $H_0: \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.

Slide 24
Using the Estimated Regression
Equation for Estimation and Prediction
Confidence Interval Estimate of $E(y_p)$

$\hat{y}_p \pm t_{\alpha/2} s_{\hat{y}_p}$

Prediction Interval Estimate of $y_p$

$\hat{y}_p \pm t_{\alpha/2} s_{ind}$

where the confidence coefficient is $1 - \alpha$ and $t_{\alpha/2}$ is based on a t distribution with n - 2 d.f.

Slide 25
Example: Reed Auto Sales
Point Estimation
If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:
$\hat{y} = 10 + 5(3) = 25$ cars
Confidence Interval for $E(y_p)$
95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:
$25 \pm 4.61$ = 20.39 to 29.61 cars
Prediction Interval for $y_p$
95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:
$25 \pm 8.28$ = 16.72 to 33.28 cars
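
The margins 4.61 and 8.28 depend on two standard errors that the slides name but do not define. The sketch below assumes the standard simple-regression formulas for them (stated in the comments; they are not given in this lecture) and takes the critical value 3.182 from the earlier t test slide.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0  # estimated regression equation from the slides
t_value = 3.182     # t_.025 with 3 d.f., as given on the slides
x_p = 3             # number of TV ads for the estimate

n = len(tv_ads)
x_bar = sum(tv_ads) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(tv_ads, cars_sold))
s = math.sqrt(sse / (n - 2))

y_hat_p = b0 + b1 * x_p  # point estimate

# Assumed standard formulas (not shown on the slides):
#   s_yhat = s * sqrt(1/n + (x_p - x_bar)^2 / sum((x - x_bar)^2))
#   s_ind  = s * sqrt(1 + 1/n + (x_p - x_bar)^2 / sum((x - x_bar)^2))
distance_term = 1 / n + (x_p - x_bar) ** 2 / sxx
ci_margin = t_value * s * math.sqrt(distance_term)
pi_margin = t_value * s * math.sqrt(1 + distance_term)

print(f"mean:       {y_hat_p - ci_margin:.2f} to {y_hat_p + ci_margin:.2f}")
print(f"individual: {y_hat_p - pi_margin:.2f} to {y_hat_p + pi_margin:.2f}")
```

The prediction interval is wider because it accounts for the variability of an individual week around the mean, in addition to the uncertainty in the estimated mean itself.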

Slide 26
Residual Analysis
Residual for Observation i

$y_i - \hat{y}_i$

Standardized Residual for Observation i

$\frac{y_i - \hat{y}_i}{s_{y_i - \hat{y}_i}}$

where:
$s_{y_i - \hat{y}_i} = s\sqrt{1 - h_i}$

Slide 27
Example: Reed Auto Sales
Residuals

Observation    Predicted Cars Sold    Residuals
     1                 15                -1
     2                 25                -1
     3                 20                -2
     4                 15                 2
     5                 25                 2
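
The residuals in this table, and the standardized residuals defined on the previous slide, can be computed as follows. The slides do not define the leverage h_i, so this sketch assumes the usual simple-regression expression for it, noted in the code comment.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0  # estimated regression equation from the slides

n = len(tv_ads)
x_bar = sum(tv_ads) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)

residuals = [y - (b0 + b1 * x) for x, y in zip(tv_ads, cars_sold)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Assumed leverage formula (not on the slides):
#   h_i = 1/n + (x_i - x_bar)^2 / sum((x - x_bar)^2)
leverages = [1 / n + (x - x_bar) ** 2 / sxx for x in tv_ads]
standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverages)]

for i, (e, r) in enumerate(zip(residuals, standardized), start=1):
    print(f"obs {i}: residual = {e:+.0f}, standardized = {r:+.3f}")
```

None of the five standardized residuals falls outside the range -2 to +2.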

Slide 28
Example: Reed Auto Sales
Residual Plot

[Figure: TV Ads Residual Plot, showing Residuals (vertical axis, -3 to 3) against TV Ads (horizontal axis, 0 to 4)]

Slide 29
Residual Analysis
Detecting Outliers
An outlier is an observation that is unusual in comparison with the other data.
Minitab classifies an observation as an outlier if its standardized residual value is < -2 or > +2.
This standardized residual rule sometimes fails to identify an unusually large observation as being an outlier.
This rule's shortcoming can be circumvented by using studentized deleted residuals.
If the ith observation is an outlier, the absolute value of the ith studentized deleted residual will be larger than the absolute value of the ith standardized residual.
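
The slides do not give a formula for studentized deleted residuals. The sketch below assumes the common definition: the residual for observation i divided by s(i) times the square root of 1 - h_i, where s(i) is the standard error of the estimate obtained after refitting the regression without observation i. The helper `fit_line` and the leverage formula are illustrative assumptions, not part of the lecture.

```python
import math


def fit_line(x, y):
    """Least squares fit; returns (b0, b1, SSE)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse


tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]

n = len(tv_ads)
b0, b1, sse = fit_line(tv_ads, cars_sold)
s = math.sqrt(sse / (n - 2))
x_bar = sum(tv_ads) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)

standardized, deleted = [], []
for i in range(n):
    h = 1 / n + (tv_ads[i] - x_bar) ** 2 / sxx  # assumed leverage formula
    e = cars_sold[i] - (b0 + b1 * tv_ads[i])    # residual from the full fit
    standardized.append(e / (s * math.sqrt(1 - h)))
    # Refit without observation i to get s(i), with (n - 1) - 2 d.f.
    _, _, sse_i = fit_line(tv_ads[:i] + tv_ads[i + 1:],
                           cars_sold[:i] + cars_sold[i + 1:])
    s_i = math.sqrt(sse_i / (n - 3))
    deleted.append(e / (s_i * math.sqrt(1 - h)))

for i, (r, t) in enumerate(zip(standardized, deleted), start=1):
    print(f"obs {i}: standardized = {r:+.3f}, studentized deleted = {t:+.3f}")
```

Removing an observation with a large residual shrinks s(i), which is what makes the deleted version more sensitive to outliers. In this small sample no observation is flagged, and the deleted residual exceeds the standardized one in absolute value only for the observations whose standardized residual is above 1 in absolute value.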

Slide 30
