Introduction To Linear Regression and Correlation Analysis
Scatter Plot Examples
[Figure: scatter plots of y against x, with panels labeled "Strong relationships", "Weak relationships", and "No relationship".]
Correlation Coefficient
[Figure: scatter plots of y against x illustrating r = -1, r = -0.6, r = 0, r = +0.3, and r = +1.]
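Calculating the Correlation Coefficient
Sample correlation coefficient:
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} $$
Test statistic for the significance of r (with n - 2 degrees of freedom):
$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$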
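The two formulas above can be sketched in plain Python as follows. The function names `sample_correlation` and `correlation_t_statistic` are illustrative choices, and the five data pairs are made up for the example; they do not come from the slides.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for testing r, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# Made-up illustrative data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = sample_correlation(xs, ys)
t = correlation_t_statistic(r, len(xs))
print(f"r = {r:.4f}, t = {t:.4f} with {len(xs) - 2} degrees of freedom")
```

The t value would then be compared with a critical value from a t table at n - 2 degrees of freedom.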
Introduction to Regression Analysis
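Population regression model:
$$ y = \beta_0 + \beta_1 x + \varepsilon $$
[Figure: the population regression line y = β0 + β1x + ε plotted against x, with Intercept = β0 and Slope = β1. At a given xi it marks the observed value of y, the predicted value of y, and the random error εi for this x value.]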
Estimated Regression Model
The sample regression line provides an estimate of
the population regression line
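$$ \hat{y}_i = b_0 + b_1 x $$
where b0 and b1 are the sample estimates of β0 and β1, and x is the independent variable.
b0 and b1 are the values that minimize the sum of the squared residuals:
$$ \sum e^2 = \sum (y - \hat{y})^2 = \sum \left(y - (b_0 + b_1 x)\right)^2 $$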
The Least Squares Equation
The formulas for b1 and b0 are:
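$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$
algebraic equivalent:
$$ b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} $$
and
$$ b_0 = \bar{y} - b_1 \bar{x} $$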
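As a rough illustration, the algebraic form of the least squares equations can be coded directly. The function name `least_squares` is hypothetical. The ten (square feet, price) pairs are an assumption: the raw data are not listed in this section, and these values were chosen because they reproduce the slope (0.10977) and intercept (98.248) reported for the house price example.

```python
def least_squares(xs, ys):
    """Return (b0, b1) using the algebraic-equivalent formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n  # b0 = y_bar - b1 * x_bar
    return b0, b1


# ASSUMPTION: sample data not shown in this section; chosen to match
# the reported coefficients. Prices are in $1,000s.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

b0, b1 = least_squares(square_feet, price)
print(f"b0 = {b0:.3f}, b1 = {b1:.5f}")
```

The deviation form, with sums of (x - x̄)(y - ȳ) and (x - x̄)², gives the same b1; the algebraic form only avoids computing the means first.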
Interpretation of the Slope and the Intercept
ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 18934.9348 | 18934.9348 | 11.0848 | 0.01039 |
| Residual | 8 | 13665.5652 | 1708.1957 | | |
| Total | 9 | 32600.5000 | | | |

[Figure: fitted regression line plotted against Square Feet (x axis from 0 to 3000, y axis from 0 to 350), annotated with Slope = 0.10977 and Intercept = 98.248.]
Coefficient of Determination, R2
$$ R^2 = \frac{SSR}{SST}, \qquad \text{where } 0 \le R^2 \le 1 $$
Coefficient of determination:
$$ R^2 = \frac{SSR}{SST} = \frac{\text{sum of squares explained by regression}}{\text{total sum of squares}} $$
With a single independent variable:
$$ R^2 = r^2 $$
where:
R2 = Coefficient of determination
r = Simple correlation coefficient
Excel Output

| Regression Statistics | |
|---|---|
| Multiple R | 0.76211 |
| R Square | 0.58082 |
| Adjusted R Square | 0.52842 |
| Standard Error | 41.33032 |
| Observations | 10 |

$$ R^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082 $$
58.08% of the variation in house prices is explained by variation in square feet.
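A short check of the arithmetic, using only the SSR and SST values from the ANOVA table. The variable names are illustrative.

```python
import math

# Values taken from the ANOVA table in the Excel output.
ssr = 18934.9348  # regression sum of squares
sst = 32600.5000  # total sum of squares

r_squared = ssr / sst
# With one independent variable, R^2 = r^2, so |r| is the square root.
# The sign of r matches the sign of the slope (positive here).
r_magnitude = math.sqrt(r_squared)

print(f"R^2 = {r_squared:.5f}")
print(f"|r| = {r_magnitude:.5f}")
```

The two printed values should agree with the R Square and Multiple R entries in the Regression Statistics.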
Standard error of the estimate:
$$ s_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}} $$
where:
SSE = Sum of squares error
n = Sample size
k = Number of independent variables in the model
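The formula can be checked against the house price output with a few lines of Python. The SSE and sample size come from the ANOVA table and Regression Statistics; the function name `standard_error_of_estimate` is an illustrative choice.

```python
import math


def standard_error_of_estimate(sse, n, k):
    """s_eps = sqrt(SSE / (n - k - 1))."""
    return math.sqrt(sse / (n - k - 1))


sse = 13665.5652  # residual sum of squares from the ANOVA table
n = 10            # observations
k = 1             # one independent variable (square feet)

s_eps = standard_error_of_estimate(sse, n, k)
print(f"s_eps = {s_eps:.5f}")
```

With k = 1 the divisor is n - 2 = 8, which is the residual degrees of freedom shown in the ANOVA table.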
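The Standard Deviation of the Regression Slope
The standard error of the regression slope coefficient (b1) is estimated by
$$ s_{b_1} = \frac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}} = \frac{s_\varepsilon}{\sqrt{\sum x^2 - \dfrac{(\sum x)^2}{n}}} $$
where:
sb1 = Estimate of the standard error of the least squares slope
sε = Sample standard error of the estimate:
$$ s_\varepsilon = \sqrt{\frac{SSE}{n - 2}} $$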
Excel Output
Standard Error (from the Regression Statistics): sε = 41.33032
Standard error of the slope: sb1 = 0.03297
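The slope's standard error needs Σ(x - x̄)², which is not reported in this section. One way to sketch the calculation anyway is to recover that sum from the identity SSR = b1² Σ(x - x̄)², which holds in simple linear regression. Because the slope is rounded to five decimals, the result is approximate. Variable names are illustrative.

```python
import math

# Reported values from the Excel output.
ssr = 18934.9348   # regression sum of squares
b1 = 0.10977       # estimated slope (rounded)
s_eps = 41.33032   # standard error of the estimate

# Recover sum((x - x_bar)^2) from SSR = b1^2 * sum((x - x_bar)^2).
# This is approximate because b1 is rounded.
sxx = ssr / b1 ** 2

s_b1 = s_eps / math.sqrt(sxx)
print(f"sum of squared x deviations ~ {sxx:,.0f}")
print(f"s_b1 ~ {s_b1:.5f}")
```

With the raw x values available, Σ(x - x̄)² would be computed directly instead of being backed out of SSR.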
Inference about the Slope: t Test
$$ t = \frac{b_1 - \beta_1}{s_{b_1}}, \qquad \text{d.f.} = n - 2 $$
where:
b1 = Sample regression slope coefficient
β1 = Hypothesized slope
sb1 = Estimator of the standard error of the slope

d.f. = 10 - 2 = 8
[Figure: t distribution with rejection regions of area α/2 = .025 in each tail. Reject H0 below -tα/2 or above tα/2; do not reject H0 in between.]
Decision: Reject H0
Conclusion: There is sufficient evidence to reject H0.
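The decision can be reproduced from the reported slope and its standard error. Two inputs below are assumptions rather than values stated in this section: the hypothesized slope is taken to be 0 (the usual "no linear relationship" null), and the critical value 2.3060 is the standard t-table entry for α/2 = .025 with 8 degrees of freedom.

```python
b1 = 0.10977     # sample slope from the Excel output
s_b1 = 0.03297   # standard error of the slope from the Excel output
n = 10           # observations

beta1_h0 = 0.0   # ASSUMPTION: hypothesized slope under H0
t_crit = 2.3060  # ASSUMPTION: t-table value, alpha/2 = .025, 8 d.f.

df = n - 2
t_stat = (b1 - beta1_h0) / s_b1
reject_h0 = abs(t_stat) > t_crit  # two-tailed test

print(f"d.f. = {df}, t = {t_stat:.4f}, critical value = +/-{t_crit}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

As a cross-check, the square of this t statistic is roughly 11.08, in line with the F value in the ANOVA table, as expected when there is one independent variable.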
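Confidence interval estimate for the mean of y given a particular xp:
$$ \hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} $$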
Confidence Interval for an Individual y, Given x
Confidence interval estimate for an individual value of y given a particular xp:
$$ \hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} $$
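[Figure: the fitted line ŷ = b0 + b1x plotted against x, with x̄ and xp marked on the x axis. It shows the confidence interval for the mean of y, given xp, and the wider prediction interval for an individual y, given xp.]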
Example: House Prices
$$ \hat{y} = 98.25 + 0.1098(2000) = 317.85 $$
The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850.
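The same point prediction, written as a small helper. The function name `predict_price` is hypothetical; the coefficients are the rounded values used in the example.

```python
def predict_price(square_feet, b0=98.25, b1=0.1098):
    """Predicted house price in $1,000s from the fitted line."""
    return b0 + b1 * square_feet


predicted = predict_price(2000)
print(f"{predicted:.2f} ($1,000s) = ${predicted * 1000:,.0f}")
```

The fitted line is only supported by the range of square footage in the sample, so predictions far outside that range should be treated with caution.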
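Estimation of Mean Values: Example
Confidence Interval Estimate for E(y)|xp
Find the 95% confidence interval for the average price of 2,000 square-foot houses.
Predicted price ŷ = 317.85 ($1,000s)
$$ \hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} = 317.85 \pm 37.12 $$
The corresponding 95% interval for the price of an individual 2,000 square-foot house:
$$ \hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} = 317.85 \pm 102.28 $$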
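Both half-widths can be reproduced with a short sketch. The sample size, standard error, and predicted value come from this section. The mean square footage (1715) and Σ(x - x̄)² (1,571,500) are assumptions: they are not stated here and were chosen because they reproduce the reported 37.12 and 102.28. The critical value 2.3060 is the standard t-table entry for 95% confidence with 8 degrees of freedom. The function name is illustrative.

```python
import math


def interval_half_widths(x_p, n, x_bar, sxx, s_eps, t_crit):
    """Half-widths of the mean-value and individual-value intervals at x_p."""
    core = 1 / n + (x_p - x_bar) ** 2 / sxx
    mean_hw = t_crit * s_eps * math.sqrt(core)            # for E(y) | x_p
    individual_hw = t_crit * s_eps * math.sqrt(1 + core)  # for one y | x_p
    return mean_hw, individual_hw


y_hat = 317.85     # predicted price in $1,000s at x_p = 2000
n = 10
s_eps = 41.33032
t_crit = 2.3060    # ASSUMPTION: t-table value, 95% confidence, 8 d.f.
x_bar = 1715       # ASSUMPTION: sample mean of square feet
sxx = 1571500      # ASSUMPTION: sum of squared deviations of x

mean_hw, individual_hw = interval_half_widths(2000, n, x_bar, sxx, s_eps, t_crit)
print(f"Mean price:       {y_hat} +/- {mean_hw:.2f}")
print(f"Individual price: {y_hat} +/- {individual_hw:.2f}")
```

The individual-value interval is wider because the extra 1 under the square root adds the variability of a single observation around the regression line.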