Chapter 13 Linear Regression
Learning Objectives
In this chapter, you learn:
To use regression analysis to predict the value of a
dependent variable based on an independent variable
The meaning of the regression coefficients b0 and b1
To evaluate the assumptions of regression analysis and
the relationship
No causal effect is implied with correlation
Correlation was first presented in Chapter 3
Statistics for Managers Using Microsoft Excel, 5e 2008 Prentice-Hall, Inc.
Regression Analysis
Regression analysis is used to:
Predict the value of a dependent variable based on the value of an independent variable
by a linear function
Changes in Y are related to changes in X
Types of Relationships
Linear relationships
Curvilinear relationships
[Figure: scatter plots of Y versus X illustrating linear and curvilinear relationships]
Types of Relationships
Strong relationships
Weak relationships
[Figure: scatter plots of Y versus X illustrating strong and weak relationships]
Types of Relationships
No relationship
[Figure: scatter plots of Y versus X illustrating no relationship]
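$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where $\beta_0$ is the population intercept, $\beta_1$ is the population slope coefficient, $X_i$ is the independent variable, and $\varepsilon_i$ is the random error term
Linear component: $\beta_0 + \beta_1 X_i$
Random error component: $\varepsilon_i$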
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
[Figure: the population regression line plotted against X, marking the observed value of Y for $X_i$, the predicted value of Y for $X_i$, the random error $\varepsilon_i$ for this $X_i$ value, slope = $\beta_1$, and intercept = $\beta_0$]
$\hat{Y}_i = b_0 + b_1 X_i$
where $b_0$ is the estimate of the regression intercept, $b_1$ is the estimate of the regression slope, and $X_i$ is the value of X for observation i
Interpretation of the
Intercept and the Slope
b0 is the estimated mean value of Y when the
value of X is zero
b1 is the estimated change in the mean value of Y as a result of a one-unit change in X
Dependent variable (Y) = house price in $1000s
Independent variable (X) = square feet
House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df   SS            MS            F         Significance F
Regression    1    18934.9348    18934.9348    11.0848   0.01039
Residual      8    13665.5652    1708.1957
Total         9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
Slope
= 0.10977
Intercept
= 98.248
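The slope and intercept reported here can be reproduced from the ten house observations with the usual least-squares formulas. The following is a minimal sketch in plain Python; the function name `fit_line` and the variable names are illustrative choices, not part of the original slides.

```python
# House data from the example: price in $1000s (Y) and square feet (X).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of X and sum of cross-products of X and Y.
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(sqft, prices)
print(f"intercept b0 = {b0:.5f}, slope b1 = {b1:.5f}")
```

Rounded to five decimals, the result should agree with the Intercept and Square Feet coefficients in the Excel output.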
Do not try to extrapolate beyond the range of observed Xs
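One way to respect this warning in practice is to refuse predictions outside the observed X range. The sketch below assumes the coefficients from the Excel output and the smallest and largest square footage in the sample (1100 and 2450); the function name `predict_price` is hypothetical.

```python
B0, B1 = 98.24833, 0.10977   # intercept and slope from the Excel output
X_MIN, X_MAX = 1100, 2450    # smallest and largest observed square footage


def predict_price(square_feet):
    """Predicted house price in $1000s, only inside the observed X range."""
    if not X_MIN <= square_feet <= X_MAX:
        raise ValueError(
            f"{square_feet} sq ft is outside the observed range "
            f"[{X_MIN}, {X_MAX}]; the fitted line is not reliable there"
        )
    return B0 + B1 * square_feet


print(round(predict_price(2000), 2))
```

With the unrounded coefficients the prediction for 2,000 square feet is about 317.79; rounding the coefficients to 98.25 and 0.1098 gives 317.85.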
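Measures of Variation
Total variation is made up of two parts:
$SST = SSR + SSE$
SST = Total Sum of Squares: $SST = \sum (Y_i - \bar{Y})^2$
SSR = Regression Sum of Squares: $SSR = \sum (\hat{Y}_i - \bar{Y})^2$
SSE = Error Sum of Squares: $SSE = \sum (Y_i - \hat{Y}_i)^2$
where:
$\bar{Y}$ = mean value of the dependent variable
$Y_i$ = observed value of the dependent variable
$\hat{Y}_i$ = predicted value of Y for the given $X_i$ value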
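The three sums of squares can be computed directly from their definitions. This is a small sketch using the house data from this chapter; the variable names are illustrative.

```python
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]   # Y, in $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X

n = len(prices)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n

# Least-squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in sqft]

sst = sum((y - y_bar) ** 2 for y in prices)               # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)               # explained variation
sse = sum((y - f) ** 2 for y, f in zip(prices, fitted))   # unexplained variation

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
```

The printed values should match the SS column of the ANOVA table, and SSR + SSE should equal SST up to floating-point rounding.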
Measures of Variation
SST = total sum of squares
Measures the variation of the $Y_i$ values around their mean $\bar{Y}$
SSR = regression sum of squares
Explained variation attributable to the relationship between X and Y
Measures of Variation
[Figure: Y plotted against X, marking the observed value $Y_i$ at $X_i$, the mean $\bar{Y}$, and $SSE = \sum (Y_i - \hat{Y}_i)^2$]
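Coefficient of Determination, $r^2$
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$
$0 \le r^2 \le 1$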
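As a quick arithmetic check, the ratio can be evaluated with the sums of squares from the house-price ANOVA table. The variable names below are illustrative.

```python
import math

ssr = 18934.9348   # regression sum of squares from the ANOVA table
sst = 32600.5000   # total sum of squares from the ANOVA table

r_squared = ssr / sst
multiple_r = math.sqrt(r_squared)   # reported by Excel as "Multiple R"

print(f"r^2 = {r_squared:.5f}, Multiple R = {multiple_r:.5f}")
```

About 58% of the variation in house prices is explained by variation in square feet in this sample.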
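Coefficient of Determination, $r^2$
$r^2 = 1$
[Figure: scatter plots of Y versus X showing a perfect linear relationship; $r^2 = 1$ in each panel, including the panel where $r = -1$]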
Coefficient of Determination, $r^2$
$0 < r^2 < 1$
[Figure: scatter plots of Y versus X for $0 < r^2 < 1$]
Coefficient of Determination, $r^2$
$r^2 = 0$
$r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$
This is the R Square entry in the Regression Statistics section of the Excel output; SSR and SST are the Regression and Total entries of the SS column in the ANOVA table.
$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$
Where
SSE = error sum of squares
n = sample size
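Plugging the house-price values into this formula gives the Standard Error figure reported by Excel. A minimal sketch, with illustrative variable names:

```python
import math

sse = 13665.5652   # error (residual) sum of squares from the ANOVA table
n = 10             # sample size

# Standard error of the estimate: residual variation per degree of freedom.
s_yx = math.sqrt(sse / (n - 2))
print(f"S_YX = {s_yx:.5f}")
```

The units are those of Y, so here the standard error is roughly 41.33 thousand dollars.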
$S_{YX} = 41.33032$
This is the Standard Error entry in the Regression Statistics section of the Excel output.
[Figure: scatter plots comparing a small $S_{YX}$ with a large $S_{YX}$]
Assumptions of Regression
L.I.N.E
Linearity
The relationship between X and Y is linear
Independence of Errors
Error values are statistically independent
Normality of Error
Error values are normally distributed for any given value of X
Equal Variance (also called homoscedasticity)
The probability distribution of the errors has constant variance
Residual Analysis
$e_i = Y_i - \hat{Y}_i$
The residual for observation i, $e_i$, is the difference between its observed and predicted value
[Figure: residuals plotted against x for two cases: Not Linear and Linear]
[Figure: residual plots for two cases: Independent and Not Independent]
Residuals
Examine the Box-and-Whisker Plot of the Residuals
Examine the Histogram of the Residuals
Construct a Normal Probability Plot of the Residuals
[Figure: residuals plotted against x for two cases: Unequal variance and Equal variance]
Observation   Predicted Y    Residuals
1             251.92316      -6.923162
2             273.87671      38.12329
3             284.85348      -5.853484
4             304.06284      3.937162
5             218.99284      -19.99284
6             268.38832      -49.38832
7             356.20251      48.79749
8             367.17929      -43.17929
9             254.6674       64.33264
10            284.85348      -29.85348
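The predicted values and residuals in this output can be regenerated from the house data, which is a useful check before plotting them. The sketch below uses plain Python with illustrative variable names.

```python
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]   # Y, in $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X

n = len(prices)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * x for x in sqft]
# Residual = observed value minus predicted value.
residuals = [y - p for y, p in zip(prices, predicted)]

for i, (p, e) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i:>2}  {p:10.5f}  {e:10.5f}")
```

Least-squares residuals from a model with an intercept sum to zero, apart from floating-point rounding, which gives a simple sanity check on the table.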
Measuring Autocorrelation: The Durbin-Watson Statistic
Used when data are collected over time to detect whether autocorrelation is present
Autocorrelation
Autocorrelation is correlation of the errors (residuals) over time
It violates the regression assumption that the errors are statistically independent
$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$
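The Durbin-Watson statistic is straightforward to compute from a list of residuals in time order. The function below is a minimal sketch; the name `durbin_watson` and the toy residuals are illustrative and are not the data behind the chapter's example.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    # Numerator: squared differences between successive residuals.
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    # Denominator: sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Toy residuals that alternate in sign (negative autocorrelation).
print(durbin_watson([1, -1, 1, -1]))
# Toy residuals that never change (extreme positive autocorrelation).
print(durbin_watson([2, 2, 2]))
```

Values well below 2 point toward positive autocorrelation and values well above 2 toward negative autocorrelation; a value near 2 is consistent with independent errors.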
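[Figure: decision regions for the Durbin-Watson statistic on an axis marked at $d_L$, $d_U$, and 2: Inconclusive between $d_L$ and $d_U$, Do not reject $H_0$ above $d_U$]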
$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} = \frac{3296.18}{3279.98} = 1.00494$
Durbin-Watson Statistic: 1.00494
[Figure: Durbin-Watson decision regions with $d_L = 1.29$ and $d_U = 1.45$: Inconclusive between $d_L$ and $d_U$, Do not reject $H_0$ above $d_U$]
$H_0: \beta_1 = 0$
$H_1: \beta_1 \ne 0$
Test statistic
$t = \frac{b_1 - \beta_1}{S_{b_1}}$, with d.f. $= n - 2$
where:
$b_1$ = regression slope coefficient
$\beta_1$ = hypothesized slope
$S_{b_1}$ = standard error of the slope
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039

From the Square Feet row, $b_1 = 0.10977$ and $S_{b_1} = 0.03297$:
$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$
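The same test can be written as a short calculation. This sketch takes the slope and its standard error from the Excel output and uses 2.3060, the two-tail critical value this chapter quotes for a 5% level of significance with 8 degrees of freedom; the variable names are illustrative.

```python
b1 = 0.10977         # estimated slope from the Excel output
s_b1 = 0.03297       # standard error of the slope
beta1_null = 0.0     # hypothesized slope under H0

t_stat = (b1 - beta1_null) / s_b1

# Critical value quoted in the chapter (alpha = .05, two-tail, d.f. = 8).
t_critical = 2.3060
reject_h0 = abs(t_stat) > t_critical

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

Because the inputs are rounded to five decimals, the computed statistic can differ from Excel's t Stat in the last digit or two.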
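$H_0: \beta_1 = 0$
$H_1: \beta_1 \ne 0$
d.f. = 10 - 2 = 8
[Figure: t distribution with $\alpha/2 = .025$ in each tail; reject $H_0$ below $-t_{\alpha/2} = -2.3060$ or above $t_{\alpha/2} = 2.3060$, do not reject $H_0$ in between; the test statistic 3.329 lies in the upper rejection region]
Decision: Reject $H_0$
There is sufficient evidence that square footage affects house price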
$H_0: \beta_1 = 0$
$H_1: \beta_1 \ne 0$

              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
$F = \frac{MSR}{MSE}$
where
$MSR = \frac{SSR}{k}$
$MSE = \frac{SSE}{n - k - 1}$
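To see the formula at work, the sketch below computes F from the house-price sums of squares, taking k as the number of independent variables (k = 1 in simple linear regression). The variable names are illustrative.

```python
ssr = 18934.9348   # regression sum of squares
sse = 13665.5652   # error sum of squares
n = 10             # sample size
k = 1              # number of independent variables (one X here)

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

# With a single independent variable, F equals the square of the slope's t statistic.
t_stat = 3.32938
print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```

The agreement between F and the squared t statistic holds only when there is one independent variable, which is why the two tests give the same p-value in this chapter.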
$F = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848$
MSR and MSE are the Regression and Residual entries of the MS column in the ANOVA table.
P-value for the F-Test: Significance F = 0.01039
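$H_0: \beta_1 = 0$
$H_1: \beta_1 \ne 0$
$\alpha = .05$
$df_1 = 1$, $df_2 = 8$
Critical Value: $F_{.05} = 5.32$
Test Statistic: $F = \frac{MSR}{MSE} = 11.08$
[Figure: F distribution with $\alpha = .05$ in the upper tail; do not reject $H_0$ below $F_{.05} = 5.32$, reject $H_0$ above it]
Decision: Reject $H_0$ at $\alpha = 0.05$, since F = 11.08 exceeds the critical value 5.32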
$b_1 \pm t_{n-2} S_{b_1}$
d.f. = n - 2

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
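The Lower 95% and Upper 95% entries for Square Feet follow from the interval formula. This sketch uses the slope and standard error from the Excel output and 2.3060, the critical t value quoted in this chapter for 8 degrees of freedom; the variable names are illustrative.

```python
b1 = 0.10977         # estimated slope
s_b1 = 0.03297       # standard error of the slope
t_critical = 2.3060  # t value for 95% confidence with d.f. = 10 - 2 = 8

margin = t_critical * s_b1
lower, upper = b1 - margin, b1 + margin

print(f"95% CI for the slope: ({lower:.5f}, {upper:.5f})")
```

Since prices are in $1000s, the interval corresponds to an estimated increase of roughly $33.74 to $185.80 in mean price per additional square foot, and it does not contain 0.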
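$H_0: \rho = 0$ (no correlation)
$H_1: \rho \ne 0$ (correlation exists)
Test statistic
$t = \frac{r - \rho}{\sqrt{\frac{1 - r^2}{n - 2}}}$, with d.f. $= n - 2$
where
$r = +\sqrt{r^2}$ if $b_1 > 0$
$r = -\sqrt{r^2}$ if $b_1 < 0$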
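$H_0: \rho = 0$ (No correlation)
$H_1: \rho \ne 0$ (correlation exists)
$\alpha = .05$, df = 10 - 2 = 8
$t = \frac{r - \rho}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{.762 - 0}{\sqrt{\frac{1 - .762^2}{10 - 2}}} = 3.329$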
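The calculation can be sketched in a few lines, including the rule that r takes the sign of the slope. The inputs are the R Square and slope values from the Excel output; the variable names are illustrative.

```python
import math

r_squared = 0.58082   # R Square from the Excel output
b1 = 0.10977          # estimated slope, used only for its sign
n = 10                # sample size
rho_null = 0.0        # hypothesized population correlation

# r has the same sign as the slope b1.
r = math.copysign(math.sqrt(r_squared), b1)

t_stat = (r - rho_null) / math.sqrt((1 - r_squared) / (n - 2))
print(f"r = {r:.5f}, t = {t_stat:.3f}")
```

Using r rounded to .762 instead of the unrounded value changes the third decimal of t slightly, so small differences from 3.329 are expected.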
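d.f. = 10 - 2 = 8
[Figure: t distribution with $\alpha/2 = .025$ in each tail; reject $H_0$ below $-t_{\alpha/2} = -2.3060$ or above $t_{\alpha/2} = 2.3060$, do not reject $H_0$ in between; the test statistic 3.329 lies in the upper rejection region]
Conclusion:
There is evidence of a linear association at the 5% level of significance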
$\hat{Y} = b_0 + b_1 X_i$
[Figure: the fitted line plotted against X, with $X_i$ marked on the X axis]
$h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX} = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}$
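The quantity $h_i$ grows as $X_i$ moves away from the mean of X, which is why intervals widen toward the edges of the data. The sketch below evaluates it for the house data; the function name `h_value` is an illustrative choice.

```python
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X

n = len(sqft)
x_bar = sum(sqft) / n
ssx = sum((x - x_bar) ** 2 for x in sqft)   # sum of squares of X


def h_value(x_i):
    """h_i = 1/n + (X_i - X_bar)^2 / SSX for a given X value."""
    return 1 / n + (x_i - x_bar) ** 2 / ssx


print(f"X_bar = {x_bar}, SSX = {ssx}")
print(f"h at the mean of X: {h_value(x_bar):.5f}")
print(f"h at 2000 sq ft:    {h_value(2000):.5f}")
```

At the mean of X the value reduces to 1/n, its minimum; summed over the observed X values it equals 2, the number of estimated coefficients in simple linear regression.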
Find the 95% confidence interval for the mean price of 2,000 square-foot houses
$\hat{Y} \pm t_{n-2} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 37.12$
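The half-width of 37.12 can be reproduced from quantities that appear elsewhere in this chapter. The sketch assumes $S_{YX} = 41.33032$, the critical value 2.3060 for 8 degrees of freedom, and the predicted price 317.85 stated above; the variable names are illustrative.

```python
import math

sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X
n = len(sqft)
x_bar = sum(sqft) / n
ssx = sum((x - x_bar) ** 2 for x in sqft)

s_yx = 41.33032      # standard error of the estimate
t_critical = 2.3060  # t value for 95% confidence, d.f. = 8
y_hat = 317.85       # predicted price ($1000s) at 2,000 sq ft, as given above
x_i = 2000

# Half-width of the confidence interval for the MEAN of Y at X = x_i.
half_width = t_critical * s_yx * math.sqrt(1 / n + (x_i - x_bar) ** 2 / ssx)

print(f"{y_hat} +/- {half_width:.2f}")
print(f"interval: ({y_hat - half_width:.2f}, {y_hat + half_width:.2f})")
```

In dollar terms the interval runs from about $280,730 to about $354,970 for the mean price of 2,000 square-foot houses.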
Estimation of Individual Values: Example
Prediction Interval Estimate for $Y_{X=X_i}$
$\hat{Y} \pm t_{n-2} S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 102.28$
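The prediction interval for an individual house differs from the interval for the mean only by the extra 1 under the square root. The sketch below reuses the same assumed inputs ($S_{YX} = 41.33032$, critical value 2.3060 with n - 2 = 8 degrees of freedom, predicted price 317.85); the variable names are illustrative.

```python
import math

sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X
n = len(sqft)
x_bar = sum(sqft) / n
ssx = sum((x - x_bar) ** 2 for x in sqft)

s_yx = 41.33032      # standard error of the estimate
t_critical = 2.3060  # t value for 95% confidence, d.f. = n - 2 = 8
y_hat = 317.85       # predicted price ($1000s) at 2,000 sq ft
x_i = 2000

h_i = 1 / n + (x_i - x_bar) ** 2 / ssx
# The leading 1 accounts for the variation of an individual Y around its mean.
pi_half_width = t_critical * s_yx * math.sqrt(1 + h_i)

print(f"{y_hat} +/- {pi_half_width:.2f}")
```

The result matches 102.28 with the 8-degree-of-freedom critical value, and it is much wider than the interval for the mean because it must cover a single house rather than an average.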
possible relationship
Perform residual analysis to check the assumptions
Plot the residuals vs. X to check for violations
Chapter Summary
In this chapter, we have
Introduced types of regression models
Reviewed assumptions of regression and correlation
Discussed determining the simple linear regression equation
Described measures of variation
Discussed residual analysis
Addressed measuring autocorrelation
Described inference about the slope
Discussed correlation: measuring the strength of the association