Chapter 12
and Statistics
Twelfth Edition
Linear Regression and Correlation
Some graphic screen captures from Seeing Statistics ® Copyright ©2006 Brooks/Cole
Some images © 2001-(current year) www.arttoday.com A division of Thomson Learning, Inc.
Introduction
• In Chapter 11, we used ANOVA to investigate the
effect of various factor-level combinations
(treatments) on a response x.
• Our objective was to see whether the treatment
means were different.
• In Chapters 12 and 13, we investigate a response y
which is affected by various independent variables,
$x_i$.
• Our objective is to use the information provided by
the $x_i$ to predict the value of y.
Source       df     SS         MS                 F
Regression   1      SSR        MSR = SSR/1        MSR/MSE
Error        n-2    SSE        MSE = SSE/(n-2)
Total        n-1    Total SS
Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  1450.0  1450.0  19.14  0.002
Residual Error   8   606.0    75.8
Total            9  2056.0
Callouts on the output: regression coefficients, a and b; MSE; $t^2 = F$.
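The entries of the analysis of variance table follow directly from the sums of squares. The sketch below recomputes them from the SS values printed in the Minitab output; the variable names are illustrative only, and the last line relies on the fact that in simple linear regression the F statistic equals the square of the t statistic for the slope.

```python
import math

# Sums of squares and sample size read from the Minitab output.
n = 10
ssr = 1450.0   # regression sum of squares
sse = 606.0    # residual (error) sum of squares

total_ss = ssr + sse         # Total SS = SSR + SSE
msr = ssr / 1                # regression has 1 degree of freedom
mse = sse / (n - 2)          # error has n - 2 degrees of freedom
f_stat = msr / mse           # F = MSR / MSE
t_abs = math.sqrt(f_stat)    # magnitude of the slope t statistic

print(f"Total SS = {total_ss:.1f}")
print(f"MSR = {msr:.1f}, MSE = {mse:.2f}")
print(f"F = {f_stat:.2f}, |t| = {t_abs:.2f}")
```

Minitab rounds the mean squares to one decimal place, so the unrounded MSE computed here differs slightly in its printed digits from the 75.8 shown in the table.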
Measuring the Strength
of the Relationship
• If the independent variable x is useful in
predicting y, you will want to know how well
the model fits.
• The strength of the relationship between x and y
can be measured using:
Correlation coefficient: $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
Coefficient of determination: $r^2 = \dfrac{S_{xy}^2}{S_{xx} S_{yy}} = \dfrac{\text{SSR}}{\text{Total SS}}$
Measuring the Strength
of the Relationship
• Since Total SS = SSR + SSE, $r^2$ measures
  • the proportion of the total variation in the
  responses that can be explained by using the
  independent variable x in the model.
  • the percent reduction in the total variation achieved by
  using the regression equation rather than just
  using the sample mean y-bar to estimate y.
$r^2 = \dfrac{\text{SSR}}{\text{Total SS}}$
For the calculus problem, $r^2 = .705$ or 70.5%. The model is working well!
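As a quick check of the two readings of the coefficient of determination, the sketch below computes it both as SSR / Total SS and as the proportional reduction in unexplained variation, 1 - SSE / Total SS. The SS values are those from the Minitab analysis of variance for the calculus problem; the variable names are illustrative.

```python
# Sums of squares from the Minitab analysis of variance table.
ssr = 1450.0
sse = 606.0
total_ss = ssr + sse   # Total SS = SSR + SSE

# Proportion of total variation explained by the regression.
r_squared = ssr / total_ss

# Same quantity viewed as the reduction in variation relative to
# using only the sample mean y-bar to estimate y.
reduction = 1 - sse / total_ss

print(f"r^2 = {r_squared:.3f}")
print(f"reduction in total variation = {reduction:.1%}")
```

The two expressions agree because Total SS = SSR + SSE.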
Interpreting a
Significant Regression
• Even if you do not reject the null hypothesis
that the slope of the line equals 0, it does not
necessarily mean that y and x are unrelated.
• Type II error—falsely declaring that the slope is
0 and that x and y are unrelated.
• It may happen that y and x are perfectly related
in a nonlinear way.
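A small made-up example illustrates the last point. In the sketch below, y is an exact function of x (y = x squared), yet the linear correlation is zero, so a test of the slope would find no linear relationship. The data and the helper function `pearson_r` are hypothetical and are not taken from the text.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: y is perfectly determined by x, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.4f}")
```

Because the x values are symmetric about zero, the positive and negative cross-products cancel and the fitted slope is zero even though the relationship is exact.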
Some Cautions
• You may have fit the wrong model.
• If the normality assumption does not hold, you will often see the pattern fail in the tails of the graph.
[Figure: normal probability plot of the residuals; x-axis: Residual, y-axis: Percent]
Residuals versus Fits
If the equal variance assumption is valid, the plot should appear as a random scatter around the zero center line.
If not, you will see a pattern in the residuals.
[Figure: Residuals Versus the Fitted Values (response is y); x-axis: Fitted Value, y-axis: Residual]
Estimating the
average value of y
when x = x0
Calculate $\hat{y} = 40.78424 + .76556(50) = 79.06$
$\hat{y} \pm 2.306\sqrt{75.7532\left(\dfrac{1}{10} + \dfrac{(50-46)^2}{2474}\right)}$
$= 79.06 \pm 6.55$ or 72.51 to 85.61.
The Calculus Problem
• Estimate the calculus grade for a particular
student whose achievement score is 50 with a
95% confidence interval.
Calculate $\hat{y} = 40.78424 + .76556(50) = 79.06$
$\hat{y} \pm 2.306\sqrt{75.7532\left(1 + \dfrac{1}{10} + \dfrac{(50-46)^2}{2474}\right)}$
$= 79.06 \pm 21.11$ or 57.95 to 100.17.
Notice how much wider this interval is!
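Both intervals can be reproduced from the summary quantities that appear in the calculations above. The sketch below is one way to do it; the variable names are illustrative, and the critical value 2.306 is the tabled t value with n - 2 = 8 degrees of freedom used in the slides.

```python
import math

# Summary values taken from the worked calculations for the calculus problem.
a, b = 40.78424, 0.76556   # fitted intercept and slope
n = 10                     # sample size
x_bar = 46                 # mean achievement score
s_xx = 2474                # sum of squares of x about its mean
mse = 75.7532              # mean square error
t_crit = 2.306             # t value for 95% confidence, 8 df
x0 = 50                    # achievement score of interest

y_hat = a + b * x0

# Term shared by both intervals: 1/n + (x0 - x_bar)^2 / Sxx
shared = 1 / n + (x0 - x_bar) ** 2 / s_xx

# Confidence interval for the average value of y at x0.
ci_half = t_crit * math.sqrt(mse * shared)
ci = (y_hat - ci_half, y_hat + ci_half)

# Prediction interval for a particular value of y at x0 (extra "1 +" term).
pi_half = t_crit * math.sqrt(mse * (1 + shared))
pi = (y_hat - pi_half, y_hat + pi_half)

print(f"fit = {y_hat:.2f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The only difference between the two half-widths is the extra 1 inside the square root, which accounts for the variability of a single new observation around the line.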
Minitab Output
Confidence and prediction
intervals when x = 50
Predicted Values for New Observations
New Obs Fit SE Fit 95.0% CI 95.0% PI
1 79.06 2.84 (72.51, 85.61) (57.95,100.17)
[Figure: fitted line plot of y versus x with confidence and prediction bands]
• The prediction bands are always wider than the confidence bands.
• Both sets of bands are narrowest when x = x-bar.
Correlation Analysis
• The strength of the relationship between x and y is
measured using the coefficient of correlation:
Correlation coefficient: $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
• Recall from Chapter 3 that
(1) $-1 \le r \le 1$
(2) r and b have the same sign
(3) $r \approx 0$ means no linear relationship
(4) $r \approx 1$ or $-1$ means a strong (+) or (-) relationship
Example
The table shows the heights and weights of
n = 10 randomly selected college football
players.
Player       1    2    3    4    5    6    7    8    9   10
Height, x   73   71   75   72   72   75   67   69   71   69
Weight, y  185  175  200  210  190  195  150  170  180  175
Use your calculator to find the sums and sums of squares.
$S_{xy} = 328 \quad S_{xx} = 60.4 \quad S_{yy} = 2610$
$r = \dfrac{328}{\sqrt{(60.4)(2610)}} = .8261$
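The same sums of squares can be obtained without a calculator. The sketch below applies the computing formulas to the heights and weights in the table; the variable names are illustrative.

```python
import math

# Heights (x) and weights (y) of the n = 10 players from the table.
heights = [73, 71, 75, 72, 72, 75, 67, 69, 71, 69]
weights = [185, 175, 200, 210, 190, 195, 150, 170, 180, 175]
n = len(heights)

# Computing formulas: S = sum of products - (product of sums) / n
s_xx = sum(x * x for x in heights) - sum(heights) ** 2 / n
s_yy = sum(y * y for y in weights) - sum(weights) ** 2 / n
s_xy = sum(x * y for x, y in zip(heights, weights)) - sum(heights) * sum(weights) / n

# Correlation coefficient r = Sxy / sqrt(Sxx * Syy)
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"Sxy = {s_xy:.1f}, Sxx = {s_xx:.1f}, Syy = {s_yy:.1f}")
print(f"r = {r:.4f}")
```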
Football Players
[Figure: Scatterplot of Weight vs Height; x-axis: Height, y-axis: Weight]
r = .8261
Strong positive correlation
As the player's height increases, so does his weight.
r = 1; Linear relationship
r = -.67; Weaker negative correlation