Chapter 11: Simple Linear Regression
Learning Objectives for Chapter 11
After careful study of this chapter, you should be able to do
the following:
1. Use simple linear regression to build empirical models for engineering and scientific data.
2. Understand how the method of least squares is used to estimate the
parameters in a linear regression model.
3. Analyze residuals to determine if the regression model is an
adequate fit to the data or to see if any underlying assumptions are
violated.
4. Test the statistical hypotheses and construct confidence intervals on
the regression model parameters.
5. Apply the correlation model.
Regression Assumptions
4. Uncertain Relationship Between Variables
– Addition of a stochastic term (error or
disturbance term)
– Account for omitted variables
– Account for measurement errors
5. Disturbance Term Independent of X and
Expected Value Zero
6. Disturbance Terms Not Autocorrelated
– Disturbances are independent across
observations
7. Regressors and Disturbance Uncorrelated
– Exogeneity of the regressors
– Y does not directly influence the value of a
regressor
8. Disturbances Approximately Normally
Distributed
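Taken together, assumptions 4 through 8 describe the usual simple linear regression model $Y = \beta_0 + \beta_1 x + \epsilon$. The Python sketch below is a minimal illustration of data generated under these assumptions; all parameter values are arbitrary and not taken from any example in this chapter.

```python
# A minimal sketch of data satisfying the regression assumptions listed above:
# a linear mean function plus independent, zero-mean, constant-variance,
# approximately normal disturbances that do not depend on x.
# All numeric values here are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(seed=1)

beta0, beta1, sigma = 2.0, 0.5, 0.3        # hypothetical true parameters
x = rng.uniform(0.0, 10.0, size=50)        # regressor values
eps = rng.normal(0.0, sigma, size=50)      # disturbance term
y = beta0 + beta1 * x + eps                # observed responses
```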
11-2: Simple Linear Regression
Least Squares Estimates
The least-squares estimates of the intercept and slope in the simple
linear regression model are
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \qquad (11\text{-}1)$$

$$\hat{\beta}_1 = \frac{\displaystyle\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\displaystyle\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} = \frac{S_{xy}}{S_{xx}} \qquad (11\text{-}2)$$

where

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}, \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

The fitted or estimated regression line is

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \qquad (11\text{-}3)$$

and each observation satisfies

$$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i, \qquad i = 1, 2, \ldots, n$$

where $e_i = y_i - \hat{y}_i$ is called the residual.
Therefore, for the oxygen purity data of Example 11-1 (with $S_{xy} = 10.17744$ and $S_{xx} = 0.68088$), the least squares estimates of the slope and intercept are

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{10.17744}{0.68088} = 14.94748$$

and

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 74.283$$

The fitted simple linear regression model (with the coefficients reported to three decimal places) is

$$\hat{y} = 74.283 + 14.947\,x$$
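As a computational companion to Equations 11-1 through 11-3, the following Python sketch evaluates the least-squares formulas directly. The data arrays are arbitrary placeholders, not the oxygen purity observations of Example 11-1.

```python
# A minimal sketch of the least-squares computations in Equations 11-1 to 11-3.
# The x and y arrays are arbitrary placeholder data.
import numpy as np

x = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5])
y = np.array([2.1, 2.9, 3.2, 4.1, 4.8, 5.3, 5.9, 6.8])
n = len(x)

Sxx = np.sum(x**2) - np.sum(x)**2 / n              # S_xx
Sxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n    # S_xy

beta1_hat = Sxy / Sxx                              # Equation 11-2: slope estimate
beta0_hat = y.mean() - beta1_hat * x.mean()        # Equation 11-1: intercept estimate

y_hat = beta0_hat + beta1_hat * x                  # Equation 11-3: fitted values
residuals = y - y_hat                              # e_i = y_i - y_hat_i
print(f"fitted model: y_hat = {beta0_hat:.3f} + {beta1_hat:.3f} x")
```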
[Figure 11-2]
EXAMPLE 11-1 Oxygen Purity (continued)
[Figure 11-4]
11-2: Simple Linear Regression
Estimating σ²

The error sum of squares is

$$SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

An unbiased estimator of $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{SS_E}{n-2} \qquad (11\text{-}4)$$

where $SS_E$ can be easily computed using

$$SS_E = SS_T - \hat{\beta}_1 S_{xy} \qquad (11\text{-}5)$$
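A quick numerical check of Equations 11-4 and 11-5 is sketched below, using the oxygen purity summary quantities quoted later in this section ($SS_T = 173.38$, $\hat{\beta}_1 = 14.947$, $S_{xy} = 10.17744$, $n = 20$).

```python
# A minimal sketch of Equations 11-4 and 11-5 using summary quantities from
# the oxygen purity example (SS_T = 173.38, beta1_hat = 14.947,
# S_xy = 10.17744, n = 20).
SST, beta1_hat, Sxy, n = 173.38, 14.947, 10.17744, 20

SSE = SST - beta1_hat * Sxy        # Equation 11-5: SS_E = SS_T - beta1_hat * S_xy
sigma2_hat = SSE / (n - 2)         # Equation 11-4: estimate of sigma^2
print(f"SS_E = {SSE:.2f}, sigma^2_hat = {sigma2_hat:.2f}")   # roughly 21.3 and 1.18
```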
• Slope Properties

$$E(\hat{\beta}_1) = \beta_1 \qquad V(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}}$$

• Intercept Properties

$$E(\hat{\beta}_0) = \beta_0 \qquad \text{and} \qquad V(\hat{\beta}_0) = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]$$
To test hypotheses about the slope and intercept, we consider the two-sided alternatives

$$H_0\colon \beta_1 = \beta_{1,0} \qquad H_1\colon \beta_1 \neq \beta_{1,0}$$

$$H_0\colon \beta_0 = \beta_{0,0} \qquad H_1\colon \beta_0 \neq \beta_{0,0}$$

For the intercept, an appropriate test statistic would be

$$T_0 = \frac{\hat{\beta}_0 - \beta_{0,0}}{\sqrt{\hat{\sigma}^2\left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right]}} = \frac{\hat{\beta}_0 - \beta_{0,0}}{se(\hat{\beta}_0)} \qquad (11\text{-}7)$$

which follows a $t$ distribution with $n - 2$ degrees of freedom under $H_0$.
An important special case is the test for significance of regression:

$$H_0\colon \beta_1 = 0 \qquad H_1\colon \beta_1 \neq 0$$
Practical Interpretation: Since the reference value of $t$ is $t_{0.005,18} = 2.88$, the value of the test statistic is very far into the critical region, implying that $H_0\colon \beta_1 = 0$ should be rejected. There is strong evidence that oxygen purity depends on the hydrocarbon level. The $P$-value for this test is $P \cong 1.23 \times 10^{-9}$, obtained manually with a calculator.
Table 11-2 presents the Minitab output for this problem. Notice that the $t$-statistic value for the slope is computed as 11.35 and that the reported $P$-value is $P = 0.000$. Minitab also reports the $t$-statistic for testing the hypothesis $H_0\colon \beta_0 = 0$. This statistic is computed from Equation 11-7, with $\beta_{0,0} = 0$, as $t_0 = 46.62$. Clearly, then, the hypothesis that the intercept is zero is rejected.
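The reported $t$-statistics can be reproduced from the summary quantities in this example. The sketch below is an illustration, not the Minitab computation itself; the value $\bar{x} = 1.1960$ is taken from the mean-response confidence interval example later in this section.

```python
# A minimal sketch reproducing the t-statistics for the slope and intercept
# (test statistics of the form of Equation 11-7 with hypothesized value 0),
# using beta0_hat = 74.283, beta1_hat = 14.947, sigma^2_hat = 1.18,
# S_xx = 0.68088, n = 20, and x_bar = 1.1960.
from scipy import stats

beta0_hat, beta1_hat = 74.283, 14.947
sigma2_hat, Sxx, n, x_bar = 1.18, 0.68088, 20, 1.1960

se_beta1 = (sigma2_hat / Sxx) ** 0.5                         # standard error of the slope
se_beta0 = (sigma2_hat * (1 / n + x_bar**2 / Sxx)) ** 0.5    # standard error of the intercept

t_slope = beta1_hat / se_beta1                     # about 11.35
t_intercept = beta0_hat / se_beta0                 # about 46.6
p_slope = 2 * stats.t.sf(abs(t_slope), df=n - 2)   # about 1.2e-9
print(t_slope, t_intercept, p_slope)
```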
The significance of regression can also be tested with an analysis of variance. Symbolically, the analysis of variance identity partitions the total variability as $SS_T = SS_R + SS_E$, where $SS_T = \sum_{i=1}^{n}(y_i - \bar{y})^2$ is the total corrected sum of squares and $SS_R = \hat{\beta}_1 S_{xy}$ is the regression sum of squares.
EXAMPLE 11-3 Oxygen Purity ANOVA We will use the analysis of variance approach to test for significance of regression using the oxygen purity data model from Example 11-1. Recall that $SS_T = 173.38$, $\hat{\beta}_1 = 14.947$, $S_{xy} = 10.17744$, and $n = 20$. The regression sum of squares is

$$SS_R = \hat{\beta}_1 S_{xy} = (14.947)(10.17744) = 152.13$$
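A sketch of the remaining ANOVA arithmetic for this example is shown below; the $F$-statistic step is the standard significance-of-regression test with 1 and $n - 2$ degrees of freedom.

```python
# A minimal sketch of the ANOVA test for significance of regression in
# Example 11-3, using SS_T = 173.38, beta1_hat = 14.947, S_xy = 10.17744, n = 20.
from scipy import stats

SST, beta1_hat, Sxy, n = 173.38, 14.947, 10.17744, 20

SSR = beta1_hat * Sxy              # regression sum of squares, about 152.13
SSE = SST - SSR                    # error sum of squares
F0 = (SSR / 1) / (SSE / (n - 2))   # F-statistic with 1 and n - 2 degrees of freedom
p_value = stats.f.sf(F0, 1, n - 2)
print(f"F0 = {F0:.2f}, P = {p_value:.2e}")   # F0 roughly 129, P roughly 1.2e-9
```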
11-5: Confidence Intervals

Under the assumption that the observations are normally and independently distributed, a 100(1 − α)% confidence interval on the slope $\beta_1$ is

$$\hat{\beta}_1 - t_{\alpha/2,\,n-2}\sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \;\leq\; \beta_1 \;\leq\; \hat{\beta}_1 + t_{\alpha/2,\,n-2}\sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \qquad (11\text{-}11)$$

Similarly, a 100(1 − α)% confidence interval on the intercept $\beta_0$ is

$$\hat{\beta}_0 - t_{\alpha/2,\,n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]} \;\leq\; \beta_0 \;\leq\; \hat{\beta}_0 + t_{\alpha/2,\,n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]} \qquad (11\text{-}12)$$
11-5: Confidence Intervals
EXAMPLE 11-4 Oxygen Purity Confidence Interval on the Slope We will
find a 95% confidence interval on the slope of the regression line using the data
in Example 11-1. Recall that $\hat{\beta}_1 = 14.947$, $S_{xx} = 0.68088$, and $\hat{\sigma}^2 = 1.18$ (see Table 11-2). Then, from Equation 11-11 we find

$$\hat{\beta}_1 - t_{0.025,18}\sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \;\leq\; \beta_1 \;\leq\; \hat{\beta}_1 + t_{0.025,18}\sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}$$

or

$$14.947 - 2.101\sqrt{\frac{1.18}{0.68088}} \;\leq\; \beta_1 \;\leq\; 14.947 + 2.101\sqrt{\frac{1.18}{0.68088}}$$

This simplifies to

$$12.181 \;\leq\; \beta_1 \;\leq\; 17.713$$
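The interval in Example 11-4 can be reproduced numerically; the sketch below uses the quoted values of $\hat{\beta}_1$, $\hat{\sigma}^2$, and $S_{xx}$ and obtains the $t$ percentile from scipy rather than a table.

```python
# A minimal sketch of the 95% confidence interval on the slope (Equation 11-11)
# for Example 11-4, using beta1_hat = 14.947, sigma^2_hat = 1.18, S_xx = 0.68088.
from scipy import stats

beta1_hat, sigma2_hat, Sxx, n = 14.947, 1.18, 0.68088, 20

t_crit = stats.t.ppf(0.975, df=n - 2)             # t_{0.025,18}, about 2.101
half_width = t_crit * (sigma2_hat / Sxx) ** 0.5   # t * sqrt(sigma^2_hat / S_xx)
print(f"{beta1_hat - half_width:.3f} <= beta1 <= {beta1_hat + half_width:.3f}")
# roughly 12.18 <= beta1 <= 17.71
```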
Definition
A 100(1 − α)% confidence interval about the mean response at the value of $x = x_0$, say $\mu_{Y|x_0}$, is given by

$$\hat{\mu}_{Y|x_0} - t_{\alpha/2,\,n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]} \;\leq\; \mu_{Y|x_0} \;\leq\; \hat{\mu}_{Y|x_0} + t_{\alpha/2,\,n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]} \qquad (11\text{-}13)$$

where $\hat{\mu}_{Y|x_0} = \hat{\beta}_0 + \hat{\beta}_1 x_0$ is computed from the fitted regression model.
We will construct a 95% confidence interval about the mean response for the data in Example 11-1. The fitted model is $\hat{\mu}_{Y|x_0} = 74.283 + 14.947\,x_0$, and the 95% confidence interval on $\mu_{Y|x_0}$ is found from Equation 11-13 as

$$\hat{\mu}_{Y|x_0} \pm 2.101\sqrt{1.18\left[\frac{1}{20} + \frac{(x_0 - 1.1960)^2}{0.68088}\right]}$$
Suppose that we are interested in predicting the mean oxygen purity when the hydrocarbon level is $x_0 = 1.00\%$. Then

$$\hat{\mu}_{Y|x_0 = 1.00} = 74.283 + 14.947(1.00) = 89.23$$
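The corresponding 95% confidence limits at $x_0 = 1.00$ can be computed directly from Equation 11-13, as sketched below with the summary quantities quoted in this example.

```python
# A minimal sketch of the 95% confidence interval on the mean oxygen purity at
# x0 = 1.00 (Equation 11-13), using the fitted model and quantities from the
# example above.
from scipy import stats

beta0_hat, beta1_hat = 74.283, 14.947
sigma2_hat, Sxx, n, x_bar = 1.18, 0.68088, 20, 1.1960
x0 = 1.00

mu_hat = beta0_hat + beta1_hat * x0              # about 89.23
t_crit = stats.t.ppf(0.975, df=n - 2)            # t_{0.025,18}, about 2.101
half_width = t_crit * (sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / Sxx)) ** 0.5
print(f"{mu_hat - half_width:.2f} <= mu_Y|x0 <= {mu_hat + half_width:.2f}")
# an interval of roughly 88.5 to 90.0
```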
Table 11-4 presents the observed and predicted values of y at each value
of x from this data set, along with the corresponding residual. These values
were computed using Minitab and show the number of decimal places
typical of computer output.
A normal probability plot of the residuals is shown in Fig. 11-10. Since the
residuals fall approximately along a straight line in the figure, we conclude
that there is no severe departure from normality.
The residuals are also plotted against the predicted value ŷi in Fig. 11-11
and against the hydrocarbon levels xi in Fig. 11-12. These plots do not
indicate any serious model inadequacies.
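For completeness, residual diagnostics of the kind described above can be produced with a few lines of Python. The sketch below uses arbitrary placeholder data rather than the Table 11-4 values, and the plots correspond in spirit to Figs. 11-10 and 11-11.

```python
# A minimal sketch of the residual diagnostics discussed above: a normal
# probability plot of the residuals and a plot of residuals versus fitted
# values. The data arrays are arbitrary placeholders, not Table 11-4.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5])
y = np.array([2.1, 2.9, 3.2, 4.1, 4.8, 5.3, 5.9, 6.8])

# Fit the simple linear regression model and compute residuals.
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x
residuals = y - y_hat

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
stats.probplot(residuals, dist="norm", plot=ax1)   # normal probability plot (cf. Fig. 11-10)
ax1.set_title("Normal probability plot of residuals")
ax2.scatter(y_hat, residuals)                      # residuals versus fitted values (cf. Fig. 11-11)
ax2.axhline(0.0, linestyle="--")
ax2.set_xlabel("Fitted value")
ax2.set_ylabel("Residual")
plt.show()
```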