
Chapter 11: Simple Linear Regression

CHAPTER OUTLINE

11-1 Empirical Models


11-2 Simple Linear Regression
11-3 Properties of the Least Squares Estimators
11-4 Hypothesis Test in Simple Linear Regression
11-5 Confidence Intervals
11-7 Adequacy of the Regression Model

Chapter 11 Title and Outline
Copyright © 2014 John Wiley & Sons, Inc. All rights reserved.
Learning Objectives for Chapter 11
After careful study of this chapter, you should be able to do
the following:
1. Use simple linear regression for building empirical models of
engineering and scientific data.
2. Understand how the method of least squares is used to estimate the
parameters in a linear regression model.
3. Analyze residuals to determine if the regression model is an
adequate fit to the data or to see if any underlying assumptions are
violated.
4. Test the statistical hypotheses and construct confidence intervals on
the regression model parameters.
5. Apply the correlation model.

11-1: Empirical Models
• Many problems in engineering and science involve
exploring the relationships between two or more
variables.
• Regression analysis is a statistical technique that
is very useful for these types of problems.
• For example, in a chemical process, suppose that
the yield of the product is related to the process-
operating temperature.
• Regression analysis can be used to build a model
to predict yield at a given temperature level.

11-2: Simple Linear Regression

• Simple linear regression considers a single regressor or
predictor x and a dependent or response variable Y.
• At each level of x, Y is a random variable whose expected
value is

$$E(Y|x) = \beta_0 + \beta_1 x$$

• We assume that each observation Y can be described by the
model

$$Y = \beta_0 + \beta_1 x + \epsilon$$

where $\epsilon$ is a random error term.

Regression Assumptions
1. Continuous Dependent Variable Y
2. Linear-in-Parameters Relationship between Y
and X
– Linear relationship
– May need transformation
– May not be valid for all data
3. Observations Independently and Randomly
Sampled

Regression Assumptions
4. Uncertain relationship between Variables
– Addition of a stochastic term (error or
disturbance term)
– Account for omitted variables
– Account for measurement errors
5. Disturbance Term Independent of X and
Expected Value Zero

Regression Assumptions
6. Disturbance Terms Not Autocorrelated
– Disturbances are independent across
observations
7. Regressors and Disturbance Uncorrelated
– Exogeneity of the regressors
– Y does not directly influence the value of a
regressor
8. Disturbances Approximately Normally
Distributed

11-2: Simple Linear Regression
Least Squares Estimates

[Figure 11-3]
11-2: Simple Linear Regression
Least Squares Estimates
The least-squares estimates of the intercept and slope in the simple
linear regression model are

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x \qquad (11\text{-}1)$$

$$\hat\beta_1 = \frac{\displaystyle\sum_{i=1}^n x_i y_i - \frac{\left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n y_i\right)}{n}}{\displaystyle\sum_{i=1}^n x_i^2 - \frac{\left(\sum_{i=1}^n x_i\right)^2}{n}} \qquad (11\text{-}2)$$

where $\bar y = (1/n)\sum_{i=1}^n y_i$ and $\bar x = (1/n)\sum_{i=1}^n x_i$.


11-2: Simple Linear Regression
Notation
$$S_{xx} = \sum_{i=1}^n (x_i - \bar x)^2 = \sum_{i=1}^n x_i^2 - \frac{\left(\sum_{i=1}^n x_i\right)^2}{n}$$

$$S_{xy} = \sum_{i=1}^n y_i (x_i - \bar x) = \sum_{i=1}^n x_i y_i - \frac{\left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n y_i\right)}{n}$$

With this notation, the slope estimate can be written compactly as $\hat\beta_1 = S_{xy}/S_{xx}$.

11-2: Simple Linear Regression
The fitted or estimated regression line is

$$\hat y = \hat\beta_0 + \hat\beta_1 x \qquad (11\text{-}3)$$

Note that each pair of observations satisfies the relationship

$$y_i = \hat\beta_0 + \hat\beta_1 x_i + e_i, \qquad i = 1, 2, \ldots, n$$

where $e_i = y_i - \hat y_i$ is called the residual. The residual describes the
error in the fit of the model to the ith observation $y_i$.

EXAMPLE 11-1 Oxygen Purity
We will fit a simple linear regression model to the oxygen purity data in
Table 11-1. The following quantities may be computed:

$$n = 20 \qquad \sum_{i=1}^{20} x_i = 23.92 \qquad \sum_{i=1}^{20} y_i = 1843.21$$

$$\bar x = 1.1960 \qquad \bar y = 92.1605$$

$$\sum_{i=1}^{20} y_i^2 = 170{,}044.5321 \qquad \sum_{i=1}^{20} x_i^2 = 29.2892 \qquad \sum_{i=1}^{20} x_i y_i = 2214.6566$$

$$S_{xx} = \sum_{i=1}^{20} x_i^2 - \frac{\left(\sum_{i=1}^{20} x_i\right)^2}{20} = 29.2892 - \frac{(23.92)^2}{20} = 0.68088$$

$$S_{xy} = \sum_{i=1}^{20} x_i y_i - \frac{\left(\sum_{i=1}^{20} x_i\right)\left(\sum_{i=1}^{20} y_i\right)}{20} = 2214.6566 - \frac{(23.92)(1843.21)}{20} = 10.17744$$
EXAMPLE 11-1 Oxygen Purity - continued

Therefore, the least squares estimates of the slope and intercept are
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{10.17744}{0.68088} = 14.94748$$

and

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 92.1605 - (14.94748)(1.196) = 74.28331$$

The fitted simple linear regression model (with the coefficients reported to
three decimal places) is

$$\hat y = 74.283 + 14.947x$$

EXAMPLE 11-1 Oxygen Purity - continued

[Figure 11-2]
EXAMPLE 11-1 Oxygen Purity - continued

[Figure 11-4]
11-2: Simple Linear Regression
Estimating σ²
The error sum of squares is

$$SS_E = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - \hat y_i)^2$$

It can be shown that the expected value of the error sum of squares is
$E(SS_E) = (n-2)\sigma^2$.

11-2: Simple Linear Regression
Estimating σ²
An unbiased estimator of σ² is

$$\hat\sigma^2 = \frac{SS_E}{n-2} \qquad (11\text{-}4)$$

where $SS_E$ can be easily computed using

$$SS_E = SS_T - \hat\beta_1 S_{xy} \qquad (11\text{-}5)$$

11-3: Properties of the Least Squares Estimators

• Slope properties:

$$E(\hat\beta_1) = \beta_1 \qquad V(\hat\beta_1) = \frac{\sigma^2}{S_{xx}}$$

• Intercept properties:

$$E(\hat\beta_0) = \beta_0 \qquad V(\hat\beta_0) = \sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right]$$

11-4: Hypothesis Tests in Simple Linear Regression

11-4.1 Use of t-Tests

Suppose we wish to test

$$H_0: \beta_1 = \beta_{1,0}$$
$$H_1: \beta_1 \ne \beta_{1,0}$$

An appropriate test statistic would be

$$T_0 = \frac{\hat\beta_1 - \beta_{1,0}}{\sqrt{\hat\sigma^2/S_{xx}}} \qquad (11\text{-}6)$$

11-4: Hypothesis Tests in Simple Linear Regression

11-4.1 Use of t-Tests


The test statistic could also be written as

$$T_0 = \frac{\hat\beta_1 - \beta_{1,0}}{se(\hat\beta_1)}$$

We would reject the null hypothesis if

$$|t_0| > t_{\alpha/2,\,n-2}$$

11-4: Hypothesis Tests in Simple Linear Regression

11-4.1 Use of t-Tests


Suppose we wish to test

$$H_0: \beta_0 = \beta_{0,0}$$
$$H_1: \beta_0 \ne \beta_{0,0}$$

An appropriate test statistic would be

$$T_0 = \frac{\hat\beta_0 - \beta_{0,0}}{\sqrt{\hat\sigma^2\left[\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}\right]}} = \frac{\hat\beta_0 - \beta_{0,0}}{se(\hat\beta_0)} \qquad (11\text{-}7)$$

11-4: Hypothesis Tests in Simple Linear Regression

11-4.1 Use of t-Tests

We would reject the null hypothesis if

$$|t_0| > t_{\alpha/2,\,n-2}$$

11-4: Hypothesis Tests in Simple Linear Regression

11-4.1 Use of t-Tests


An important special case of these hypotheses is

$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \ne 0$$

These hypotheses relate to the significance of regression.
Failure to reject H0 is equivalent to concluding that
there is no linear relationship between x and Y.

11-4: Hypothesis Tests in Simple Linear Regression
EXAMPLE 11-2 Oxygen Purity Tests of Coefficients We will test for significance of
regression using the model for the oxygen purity data from Example 11-1. The hypotheses are

$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \ne 0$$

and we will use α = 0.01. From Example 11-1 and Table 11-2 we have

$$\hat\beta_1 = 14.947 \qquad n = 20 \qquad S_{xx} = 0.68088 \qquad \hat\sigma^2 = 1.18$$

so the t-statistic in Equation 11-6 becomes

$$t_0 = \frac{\hat\beta_1}{\sqrt{\hat\sigma^2/S_{xx}}} = \frac{\hat\beta_1}{se(\hat\beta_1)} = \frac{14.947}{\sqrt{1.18/0.68088}} = 11.35$$

Practical Interpretation: Since the reference value of t is t0.005,18 = 2.88, the value of the test
statistic is very far into the critical region, implying that H0: β1 = 0 should be rejected. There is
strong evidence to support this claim. The P-value for this test is P ≈ 1.23 × 10⁻⁹. This was
obtained manually with a calculator.

Table 11-2 presents the Minitab output for this problem. Notice that the t-statistic value for the
slope is computed as 11.35 and that the reported P-value is P = 0.000. Minitab also reports the
t-statistic for testing the hypothesis H0: β0 = 0. This statistic is computed from Equation 11-7,
with β0,0 = 0, as t0 = 46.62. Clearly, then, the hypothesis that the intercept is zero is rejected.
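A minimal sketch of this t-test in Python, using scipy for the reference value and the P-value (the numbers are those quoted above):

```python
# t-test for significance of regression (Example 11-2), with beta_1,0 = 0.
from math import sqrt
from scipy import stats

n, Sxx, sigma2_hat, b1 = 20, 0.68088, 1.18, 14.947

se_b1 = sqrt(sigma2_hat / Sxx)             # standard error of the slope
t0 = b1 / se_b1                            # Eq. 11-6
t_crit = stats.t.ppf(1 - 0.01 / 2, n - 2)  # reference value t_{0.005,18}
p_value = 2 * stats.t.sf(abs(t0), n - 2)   # two-sided P-value
print(round(t0, 2), round(t_crit, 2))  # 11.35 2.88
```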

11-4: Hypothesis Tests in Simple Linear Regression

11-4.2 Analysis of Variance Approach to Test


Significance of Regression
The analysis of variance identity is

$$\sum_{i=1}^n (y_i - \bar y)^2 = \sum_{i=1}^n (\hat y_i - \bar y)^2 + \sum_{i=1}^n (y_i - \hat y_i)^2 \qquad (11\text{-}8)$$

Symbolically,

$$SS_T = SS_R + SS_E \qquad (11\text{-}9)$$

11-4: Hypothesis Tests in Simple Linear Regression

11-4.2 Analysis of Variance Approach to Test


Significance of Regression
If the null hypothesis H0: β1 = 0 is true, the statistic

$$F_0 = \frac{SS_R/1}{SS_E/(n-2)} = \frac{MS_R}{MS_E} \qquad (11\text{-}10)$$

follows the $F_{1,n-2}$ distribution, and we would reject H0 if $f_0 > f_{\alpha,1,n-2}$.

11-4: Hypothesis Tests in Simple Linear Regression
11-4.2 Analysis of Variance Approach to Test
Significance of Regression
The quantities MS_R and MS_E are called mean squares.
Analysis of variance table:

Source of Variation   Sum of Squares            Degrees of Freedom   Mean Square   F0
Regression            SS_R = β̂1·S_xy            1                    MS_R          MS_R/MS_E
Error                 SS_E = SS_T − β̂1·S_xy     n − 2                MS_E
Total                 SS_T                      n − 1

Note that MS_E = σ̂².

11-4: Hypothesis Tests in Simple Linear Regression

EXAMPLE 11-3 Oxygen Purity ANOVA We will use the analysis of variance
approach to test for significance of regression using the oxygen purity data
model from Example 11-1. Recall that SS_T = 173.38, β̂1 = 14.947, S_xy = 10.17744,
and n = 20. The regression sum of squares is

$$SS_R = \hat\beta_1 S_{xy} = (14.947)(10.17744) = 152.13$$

and the error sum of squares is

$$SS_E = SS_T - SS_R = 173.38 - 152.13 = 21.25$$

The analysis of variance for testing H0: β1 = 0 is summarized in the Minitab
output in Table 11-2. The test statistic is f0 = MS_R/MS_E = 152.13/1.18 = 128.86,
for which we find that the P-value is P ≈ 1.23 × 10⁻⁹, so we conclude that β1 is not
zero.
There are frequently minor differences in terminology among computer
packages. For example, sometimes the regression sum of squares is called the
“model” sum of squares, and the error sum of squares is called the “residual”
sum of squares.
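The ANOVA computation can be sketched as below; note that f0 comes out near 128.8–128.9 depending on how much the inputs are rounded (the text's 128.86 uses Minitab's full precision).

```python
# ANOVA for significance of regression (Example 11-3).
from scipy import stats

n, SST = 20, 173.38
b1, Sxy = 14.94748, 10.17744

SSR = b1 * Sxy              # regression sum of squares
SSE = SST - SSR             # error sum of squares
MSR, MSE = SSR / 1, SSE / (n - 2)
f0 = MSR / MSE              # Eq. 11-10
p_value = stats.f.sf(f0, 1, n - 2)
print(round(SSR, 2), round(SSE, 2))  # 152.13 21.25
```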

11-5: Confidence Intervals
11-5.1 Confidence Intervals on the Slope and Intercept
Definition
Under the assumption that the observations are normally and independently
distributed, a 100(1 − α)% confidence interval on the slope β1 in simple linear
regression is

$$\hat\beta_1 - t_{\alpha/2,n-2}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} \le \beta_1 \le \hat\beta_1 + t_{\alpha/2,n-2}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} \qquad (11\text{-}11)$$

Similarly, a 100(1 − α)% confidence interval on the intercept β0 is

$$\hat\beta_0 - t_{\alpha/2,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right]} \le \beta_0 \le \hat\beta_0 + t_{\alpha/2,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right]} \qquad (11\text{-}12)$$
11-5: Confidence Intervals
EXAMPLE 11-4 Oxygen Purity Confidence Interval on the Slope We will
find a 95% confidence interval on the slope of the regression line using the data
in Example 11-1. Recall that β̂1 = 14.947, S_xx = 0.68088, and σ̂² = 1.18 (see Table
11-2). Then, from Equation 11-11 we find

$$\hat\beta_1 - t_{0.025,18}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} \le \beta_1 \le \hat\beta_1 + t_{0.025,18}\sqrt{\frac{\hat\sigma^2}{S_{xx}}}$$

or

$$14.947 - 2.101\sqrt{\frac{1.18}{0.68088}} \le \beta_1 \le 14.947 + 2.101\sqrt{\frac{1.18}{0.68088}}$$

This simplifies to

$$12.181 \le \beta_1 \le 17.713$$

Practical Interpretation: This CI does not include zero, so there is strong
evidence (at α = 0.05) that the slope is not zero. The CI is reasonably narrow
(half-width 2.766) because the error variance is fairly small.
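A sketch of the same interval using scipy rather than a t-table:

```python
# 95% CI on the slope (Equation 11-11, Example 11-4).
from math import sqrt
from scipy import stats

n, b1, Sxx, sigma2_hat = 20, 14.947, 0.68088, 1.18

half_width = stats.t.ppf(0.975, n - 2) * sqrt(sigma2_hat / Sxx)
lo, hi = b1 - half_width, b1 + half_width
print(round(lo, 3), round(hi, 3))  # 12.181 17.713
```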

11-5: Confidence Intervals
11-5.2 Confidence Interval on the Mean Response

$$\hat\mu_{Y|x_0} = \hat\beta_0 + \hat\beta_1 x_0$$

Definition
A 100(1 − α)% confidence interval about the mean response at the value of
x = x0, say $\mu_{Y|x_0}$, is given by

$$\hat\mu_{Y|x_0} - t_{\alpha/2,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right]} \le \mu_{Y|x_0} \le \hat\mu_{Y|x_0} + t_{\alpha/2,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right]} \qquad (11\text{-}13)$$

where $\hat\mu_{Y|x_0} = \hat\beta_0 + \hat\beta_1 x_0$ is computed from the fitted regression model.

11-5: Confidence Intervals
Example 11-5 Oxygen Purity Confidence Interval on the Mean Response

We will construct a 95% confidence interval about the mean response for the
data in Example 11-1. The fitted model is $\hat\mu_{Y|x_0} = 74.283 + 14.947x_0$, and the
95% confidence interval on $\mu_{Y|x_0}$ is found from Equation 11-13 as

$$\hat\mu_{Y|x_0} \pm 2.101\sqrt{1.18\left[\frac{1}{20} + \frac{(x_0 - 1.1960)^2}{0.68088}\right]}$$

Suppose that we are interested in predicting mean oxygen purity when the
hydrocarbon level is x0 = 1.00%. Then

$$\hat\mu_{Y|1.00} = 74.283 + 14.947(1.00) = 89.23$$

and the 95% confidence interval is

$$89.23 \pm 2.101\sqrt{1.18\left[\frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088}\right]}$$

or

$$89.23 \pm 0.75$$

Therefore, the 95% CI on $\mu_{Y|1.00}$ is

$$88.48 \le \mu_{Y|1.00} \le 89.98$$

This is a reasonably narrow CI.
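A sketch of this interval in code; with these rounded inputs the half-width comes out near 0.74, which agrees with the ±0.75 above to within rounding.

```python
# 95% CI on the mean response at x0 = 1.00 (Equation 11-13, Example 11-5).
from math import sqrt
from scipy import stats

n, xbar, Sxx, sigma2_hat = 20, 1.1960, 0.68088, 1.18
b0, b1 = 74.283, 14.947
x0 = 1.00

mu_hat = b0 + b1 * x0                       # point estimate of mean response
half = stats.t.ppf(0.975, n - 2) * sqrt(
    sigma2_hat * (1 / n + (x0 - xbar) ** 2 / Sxx)
)
print(round(mu_hat, 2), round(mu_hat - half, 1), round(mu_hat + half, 1))
```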

11-7: Adequacy of the Regression Model
• Fitting a regression model requires several
assumptions.
1. Errors are uncorrelated random variables
with mean zero;
2. Errors have constant variance; and
3. Errors are normally distributed.
• The analyst should always consider the validity
of these assumptions to be doubtful and
conduct analyses to examine the adequacy of
the model.

11-7: Adequacy of the Regression Model
11-7.1 Residual Analysis
• The residuals from a regression model are ei = yi - ŷi ,
where yi is an actual observation and ŷi is the corresponding
fitted value from the regression model.

• Analysis of the residuals is frequently helpful in checking


the assumption that the errors are approximately normally
distributed with constant variance, and in determining
whether additional terms in the model would be useful.
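The residual checks described above can be sketched as follows; the data here are simulated under an assumed model of the same form, since Table 11-1 is not reproduced in these slides.

```python
# Residual analysis sketch: fit a line, compute residuals, check normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = np.linspace(0.9, 1.5, 20)                       # hypothetical x levels
y = 74.3 + 14.9 * x + rng.normal(0.0, 1.1, x.size)  # assumed model + noise

b1, b0 = np.polyfit(x, y, 1)                        # least squares fit
residuals = y - (b0 + b1 * x)

# Normal probability plot check: the correlation between ordered residuals
# and theoretical normal quantiles should be near 1 if errors are normal.
(_, _), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
print(round(float(r), 2))
```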

11-7: Adequacy of the Regression Model
EXAMPLE 11-7 Oxygen Purity Residuals
The regression model for the oxygen purity data in Example 11-1 is

$$\hat y = 74.283 + 14.947x$$

Table 11-4 presents the observed and predicted values of y at each value
of x from this data set, along with the corresponding residual. These values
were computed using Minitab and show the number of decimal places
typical of computer output.

A normal probability plot of the residuals is shown in Fig. 11-10. Since the
residuals fall approximately along a straight line in the figure, we conclude
that there is no severe departure from normality.

The residuals are also plotted against the predicted value ŷi in Fig. 11-11
and against the hydrocarbon levels xi in Fig. 11-12. These plots do not
indicate any serious model inadequacies.

11-7: Adequacy of the Regression Model
Example 11-7

[Table 11-4]
11-7: Adequacy of the Regression Model
Example 11-7

Figure 11-10: Normal probability plot of residuals, Example 11-7.

11-7: Adequacy of the Regression Model
Example 11-7

Figure 11-11: Plot of residuals versus predicted oxygen purity, ŷ, Example 11-7.

11-7: Adequacy of the Regression Model
11-7.2 Coefficient of Determination (R2)
• The quantity

$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T}$$

is called the coefficient of determination and is
often used to judge the adequacy of a regression
model.
• 0 ≤ R² ≤ 1.
• We often refer (loosely) to R² as the amount of
variability in the data explained or accounted for by the
regression model.

11-7: Adequacy of the Regression Model
11-7.2 Coefficient of Determination (R2)

• For the oxygen purity regression model,

$$R^2 = \frac{SS_R}{SS_T} = \frac{152.13}{173.38} = 0.877$$

• Thus, the model accounts for 87.7% of the
variability in the data.
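The same computation in code:

```python
# Coefficient of determination for the oxygen purity model.
SSR, SST = 152.13, 173.38
R2 = SSR / SST
print(round(R2, 3))  # 0.877
```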
