Linear Regression

Chapter 11: Simple Linear Regression and Correlation

11-1 Empirical Models
11-2 Simple Linear Regression
11-3 Properties of the Least Squares Estimators
11-4 Hypothesis Tests in Simple Linear Regression
11-4.1 Use of t-tests
11-4.2 Analysis of variance approach to test significance of regression
11-5 Confidence Intervals
11-5.1 Confidence intervals on the slope and intercept
11-5.2 Confidence interval on the mean response
11-6 Prediction of New Observations
11-7 Adequacy of the Regression Model
11-7.1 Residual analysis
11-7.2 Coefficient of determination (R²)
11-8 Correlation
11-9 Regression on Transformed Variables
11-10 Logistic Regression

Chapter Learning Objectives


After careful study of this chapter you should be able to:
1. Use simple linear regression to build empirical models of
engineering and scientific data
2. Understand how the method of least squares is used to estimate
the parameters in a linear regression model
3. Analyze residuals to determine if the regression model is an
adequate fit to the data or to see if any underlying assumptions
are violated
4. Test the statistical hypotheses and construct confidence
intervals on the regression model parameters
5. Use the regression model to make a prediction of a future
observation and construct an appropriate prediction interval on
the future observation
6. Apply the correlation model
7. Use simple transformations to achieve a linear regression
model
Empirical Models
• Many problems in engineering and science involve
exploring the relationships between two or more
variables.
• Regression analysis is a statistical technique that is
very useful for these types of problems.
• For example, in a chemical process, suppose that the
yield of the product is related to the process-operating
temperature.
• Regression analysis can be used to build a model to
predict yield at a given temperature level.

Empirical Model - Example Data

(Table 11-1: oxygen purity and hydrocarbon level data; see Figure 11-1.)
Empirical Model - Example Plot

Figure 11-1: Scatter diagram of oxygen purity versus hydrocarbon level from Table 11-1.

Simple Linear Regression


Based on the scatter diagram, it is probably reasonable to
assume that the mean of the random variable Y is related to x by
the following straight-line relationship:

E(Y|x) = β0 + β1x

where the slope β1 and intercept β0 of the line are called
regression coefficients.

The simple linear regression model is given by

Y = β0 + β1x + ε

where ε is the random error term.
Variance of Y = Variance of ε
We think of the regression model as an empirical model.
Suppose that the mean and variance of ε are 0 and σ²,
respectively. Then the mean of Y given x is

E(Y|x) = β0 + β1x

and the variance of Y given x is

V(Y|x) = V(β0 + β1x + ε) = σ²
Model of True Regression Line


• The true regression model is a line of mean values:

μY|x = β0 + β1x

where the slope β1 can be interpreted as the change in the
mean of Y for a unit change in x.
• The variability of Y at a particular value of x is
determined by the error variance, σ².
• This implies there is a distribution of Y-values at
each x and that the variance of this distribution is
the same at each x.
Distribution of Y along Line

Figure 11-2: The distribution of Y for a given value of x for the oxygen purity-hydrocarbon data.

Predictor and Response Variables


• The case of simple linear regression considers
a single regressor or predictor x and a
dependent or response variable Y.
• The expected value of Y at each level of x is

E(Y|x) = β0 + β1x

• We assume that each observation, Y, can be
described by the model

Y = β0 + β1x + ε
Method of Least Squares
• Suppose that we have n pairs of observations (x1,
y1), (x2, y2), …, (xn, yn). The method of least squares
is used to estimate the parameters β0 and β1 by
minimizing the sum of the squares of the vertical
deviations.

Figure 11-3: Deviations of the data from the estimated regression model.

Sum of Square Deviations


• The n observations in the sample can be expressed as

yi = β0 + β1xi + εi,  i = 1, 2, …, n

• The sum of the squares of the deviations (errors) of
the observations from the true regression line is

L = Σεi² = Σ(yi − β0 − β1xi)²
Least Squares Normal Equations

Minimizing L with respect to β0 and β1 gives two normal
equations, whose solution is the pair of least squares
estimates β̂0 and β̂1:

nβ̂0 + β̂1Σxi = Σyi
β̂0Σxi + β̂1Σxi² = Σxiyi

Simple Linear Regression Coefficients

Solving the normal equations gives the least squares estimates
of the intercept and slope:

β̂0 = ȳ − β̂1x̄
β̂1 = Σ(yi − ȳ)(xi − x̄) / Σ(xi − x̄)²

Fitted Regression Line

The fitted or estimated regression line is therefore

ŷ = β̂0 + β̂1x

Sums of Squares
The following notation may also be used (all sums run from i = 1 to n):

Sxx = Σ(xi − x̄)² = Σxi² − (Σxi)²/n        (11-10)

Sxy = Σ(yi − ȳ)(xi − x̄) = Σxiyi − (Σxi)(Σyi)/n        (11-11)

Then,

β̂1 = Sxy/Sxx  and  β̂0 = ȳ − β̂1x̄
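As a quick sketch of Equations 11-10 and 11-11 in code, the following computes the least squares estimates from a small illustrative data set (not the Table 11-1 data):

```python
# Least squares estimates via the sums of squares Sxx and Sxy
# (Equations 11-10 and 11-11). The data below are illustrative only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 8.0]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Sxx = sum(xi^2) - (sum(xi))^2 / n
Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
# Sxy = sum(xi*yi) - sum(xi)*sum(yi) / n
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

b1 = Sxy / Sxx          # slope estimate, beta1-hat
b0 = ybar - b1 * xbar   # intercept estimate, beta0-hat
print(b1, b0)           # 2.0 -0.5 for this data set
```

For these four points, Sxx = 5 and Sxy = 10, so the fitted line is ŷ = −0.5 + 2x.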
Simple Linear Regression - Example
Example 11-1


Example 11-1 (continued)

Example 11-1 (continued)

Figure 11-4: Scatter plot of oxygen purity y versus hydrocarbon level x and regression model ŷ = 74.20 + 14.97x.

Computing σ²
The error sum of squares is

SSE = Σei² = Σ(yi − ŷi)²

It can be shown that the expected value of the error
sum of squares is E(SSE) = (n − 2)σ². An unbiased
estimator of σ² is

σ̂² = SSE/(n − 2)        (11-13)

where SSE can be easily computed using

SSE = SST − β̂1Sxy,  with SST = Σ(yi − ȳ)²
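As a sketch, σ̂² = SSE/(n − 2) can be computed from the oxygen purity summary numbers that appear elsewhere in these slides (SST = 173.377 from the Excel output, β̂1 = 14.947 and Sxy = 10.17744 from Example 11-3):

```python
# Unbiased estimate of sigma^2 for the oxygen purity example.
# Summary statistics are taken from the slides (Excel output, Example 11-3).
n = 20
SST = 173.377        # total sum of squares
b1 = 14.947          # estimated slope, beta1-hat
Sxy = 10.17744

SSE = SST - b1 * Sxy         # error sum of squares, SSE = SST - b1*Sxy
sigma2_hat = SSE / (n - 2)   # unbiased estimator of sigma^2
print(round(SSE, 2), round(sigma2_hat, 2))  # 21.25 1.18
```

The result matches the Residual line of the Excel ANOVA table (SS = 21.250, MS = 1.181).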

Excel® Data Analysis Tool Regression Output


SUMMARY OUTPUT

Regression Statistics
Multiple R 0.937
R Square 0.877
Adjusted R Square 0.871
Standard Error 1.087
Observations 20.000

ANOVA
df SS MS F Significance F
Regression 1 152.127 152.127 128.862 0.000
Residual 18 21.250 1.181
Total 19 173.377

Coefficients Standard Error t Stat P-value


Intercept 74.283 1.593 46.617 0.000
X Variable 1 14.947 1.317 11.352 0.000

Properties of Least Squares Estimators

• Slope properties for the mean and variance:

E(β̂1) = β1        (11-15)

V(β̂1) = σ²/Sxx        (11-16)

• Intercept properties for the mean and variance:

E(β̂0) = β0  and  V(β̂0) = σ²[1/n + x̄²/Sxx]        (11-17)

• Estimated Standard Errors

In simple linear regression, the estimated standard
error of the slope and the estimated standard
error of the intercept are

se(β̂1) = √(σ̂²/Sxx)  and  se(β̂0) = √(σ̂²[1/n + x̄²/Sxx])

respectively, where the estimated variance σ̂² is
computed using Equation 11-13.
Hypothesis Test for the Slope
If we wish to test whether the slope equals some value β1,0:

H0: β1 = β1,0
H1: β1 ≠ β1,0        (11-18)

An appropriate test statistic is

T0 = (β̂1 − β1,0)/√(σ̂²/Sxx) = (β̂1 − β1,0)/se(β̂1)        (11-19)

We would reject the null hypothesis if

|t0| > tα/2,n−2        (11-20)
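The significance test for the slope in the oxygen purity example can be sketched as follows; Sxx is inferred from the slide values as Sxy/β̂1, σ̂² ≈ 1.18 is the MSE, and t0.025,18 = 2.101:

```python
import math

# t-test for H0: beta1 = 0 using oxygen purity summary statistics.
# Values are from the slides; Sxx is inferred as Sxy / beta1-hat.
n = 20
b1 = 14.947
Sxy = 10.17744
Sxx = Sxy / b1               # about 0.681
sigma2_hat = 1.18            # MSE from the slides

se_b1 = math.sqrt(sigma2_hat / Sxx)   # estimated standard error of slope
t0 = (b1 - 0.0) / se_b1               # test statistic, Eq. 11-19

t_crit = 2.101               # t_{0.025, 18}
print(round(t0, 2), t0 > t_crit)      # 11.35 True -> reject H0
```

This agrees with the Excel output (t Stat = 11.352): the slope is highly significant.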

Hypothesis Test for the Intercept

If we wish to test whether the intercept equals some value β0,0:

H0: β0 = β0,0
H1: β0 ≠ β0,0        (11-21)

An appropriate test statistic is

T0 = (β̂0 − β0,0)/√(σ̂²[1/n + x̄²/Sxx]) = (β̂0 − β0,0)/se(β̂0)        (11-22)

We would reject the null hypothesis if |t0| > tα/2,n−2.
Significance of Regression
An important special case of these hypotheses is

H0: β1 = 0
H1: β1 ≠ 0        (11-23)

Failure to reject H0 is equivalent to concluding that
there is no linear relationship between x and Y.
In other words, if we conclude the slope could be 0,
then knowing x tells us nothing about the variation
in the response, Y.

Figure 11-5: The hypothesis H0: β1 = 0 is not rejected.

Figure 11-6: The hypothesis H0: β1 = 0 is rejected.
Hypothesis Testing - Example
Example 11-2


Analysis of Variance (ANOVA)

The analysis of variance identity is

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²,  that is,  SST = SSR + SSE

If the null hypothesis, H0: β1 = 0, is true, the statistic

F0 = (SSR/1)/(SSE/(n − 2)) = MSR/MSE

follows the F1,n−2 distribution, and we would reject H0 if
f0 > fα,1,n−2.
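The F-statistic for the oxygen purity example can be sketched from the sums of squares given in the slides:

```python
# ANOVA F-test for significance of regression, oxygen purity example.
# SSR and SSE are taken from the slides (Example 11-3 / Excel output).
n = 20
SSR = 152.13
SSE = 21.25

MSR = SSR / 1          # regression mean square (1 degree of freedom)
MSE = SSE / (n - 2)    # error mean square (n - 2 degrees of freedom)
f0 = MSR / MSE
print(round(f0, 1))    # 128.9, matching the Excel output (128.862)
```

Note that f0 ≈ t0² = (11.35)² ≈ 128.8, illustrating the equivalence of the t-test and the ANOVA approach.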
The ANOVA Table
The quantities MSR and MSE are called the mean squares of
the regression and the errors, respectively.
Analysis of variance (ANOVA) table:

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F0
Regression            SSR              1                    MSR           MSR/MSE
Error                 SSE              n − 2                MSE
Total                 SST              n − 1

Analysis of Variance - Example

Example 11-3
For the oxygen purity data:

SSR = β̂1Sxy = (14.947)(10.17744) = 152.13
SSE = SST − SSR = 173.38 − 152.13 = 21.25
Equivalence of t-tests and ANOVA
For testing H0: β1 = 0, the two procedures are equivalent:
the ANOVA F-statistic is the square of the t-statistic,
f0 = t0², and both lead to the same conclusion at a given α.

Confidence Intervals on Regression Model Parameters
The following are 100(1 − α)% confidence intervals for the slope
and intercept of a regression model:

β̂1 − tα/2,n−2 se(β̂1) ≤ β1 ≤ β̂1 + tα/2,n−2 se(β̂1)

β̂0 − tα/2,n−2 se(β̂0) ≤ β0 ≤ β̂0 + tα/2,n−2 se(β̂0)
Example 11–4 (Confidence Interval on the Slope)

12.181 ≤ β1 ≤ 17.713
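The Example 11-4 interval can be reproduced from the slide values (β̂1 = 14.947 and se(β̂1) ≈ 1.317 from the Excel output; t0.025,18 = 2.101); the small differences from 12.181 and 17.713 come from rounding the standard error:

```python
# 95% confidence interval on the slope, Example 11-4.
# se(beta1-hat) is taken from the Excel output; t_{0.025,18} = 2.101.
b1 = 14.947
se_b1 = 1.317
t_crit = 2.101

lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1
print(round(lower, 2), round(upper, 2))  # 12.18 17.71
```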


Confidence Interval on the Mean Response
The point estimate of the mean response at a given x0 is

μ̂Y|x0 = β̂0 + β̂1x0

The 100(1 − α)% confidence interval for the mean response is then

μ̂Y|x0 ± tα/2,n−2 √(σ̂²[1/n + (x0 − x̄)²/Sxx])
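A sketch of the Example 11-5 computation; x̄ = 1.196 and Sxx ≈ 0.681 are assumed from the Example 11-1 summary statistics (they do not appear on these slides), and σ̂² ≈ 1.18 is the MSE:

```python
import math

# 95% CI on mean oxygen purity at x0 = 1.00 (Example 11-5).
# xbar and Sxx are assumed from the Example 11-1 summary statistics.
n = 20
b0, b1 = 74.283, 14.947
xbar, Sxx, sigma2 = 1.196, 0.681, 1.18
t_crit = 2.101                       # t_{0.025, 18}

x0 = 1.00
mu_hat = b0 + b1 * x0                # point estimate, 89.23
half = t_crit * math.sqrt(sigma2 * (1/n + (x0 - xbar)**2 / Sxx))
print(round(mu_hat, 2), round(mu_hat - half, 2), round(mu_hat + half, 2))
```

The half-width is about 0.74, so the interval is roughly 88.49 to 89.97 percent purity.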
Example 11–5 (Confidence Interval on the Mean Response)

μ̂Y|x0 = 74.283 + 14.947(1.00) = 89.23


Example 11–5 (continued)

Example 11–5 (continued)

Figure 11-7: Scatter diagram of oxygen purity data from Example 11-1 with fitted regression line and 95% confidence limits on μY|x0.

Prediction of New Observations

The point estimate of the response for a new observation at x0 is

ŷ0 = β̂0 + β̂1x0

The 100(1 − α)% prediction interval for the new response, Y0, is then

ŷ0 ± tα/2,n−2 √(σ̂²[1 + 1/n + (x0 − x̄)²/Sxx])
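A sketch of the Example 11-6 computation, with the same assumed Example 11-1 summary statistics as before; note the extra "1 +" term inside the square root compared with the confidence interval on the mean response:

```python
import math

# 95% prediction interval for a new observation at x0 = 1.00 (Example 11-6).
# xbar and Sxx are assumed from the Example 11-1 summary statistics.
n = 20
b0, b1 = 74.283, 14.947
xbar, Sxx, sigma2 = 1.196, 0.681, 1.18
t_crit = 2.101                       # t_{0.025, 18}

x0 = 1.00
y0_hat = b0 + b1 * x0                # point prediction, 89.23
half = t_crit * math.sqrt(sigma2 * (1 + 1/n + (x0 - xbar)**2 / Sxx))
print(round(y0_hat - half, 2), round(y0_hat + half, 2))
```

The half-width is about 2.40, giving an interval of roughly 86.83 to 91.63, noticeably wider than the confidence interval on the mean response at the same x0.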
Example 11–6 (Prediction Interval)


Example 11–6 (continued)

Example 11–6 (continued)

Figure 11-8: Scatter diagram of oxygen purity data from Example 11-1 with fitted regression line, 95% prediction limits (outer lines), and 95% confidence limits on μY|x0.

Adequacy of Regression Models


• Fitting a regression model requires several
assumptions:
1. Errors are uncorrelated random variables with
mean zero;
2. Errors have constant variance; and
3. Errors are normally distributed.
• The analyst should always consider the validity of
these assumptions to be doubtful and conduct
analyses to examine the adequacy of the model.
Residual (Error) Analysis
• The residuals from a regression model are ei =
yi - ŷi , where yi is an actual observation and ŷi is
the corresponding fitted value from the
regression model.
• Analysis of the residuals is frequently helpful in
checking the assumption that the errors are
approximately normally distributed with
constant variance, and in determining whether
additional terms in the model would be useful.
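The residual computation can be sketched as follows with a small illustrative data set; a useful sanity check is that least squares residuals always sum to (numerically) zero:

```python
# Residuals e_i = y_i - yhat_i for a fitted line; data are illustrative.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 8.0]
b0, b1 = -0.5, 2.0            # least squares fit for this data set

y_hat = [b0 + b1 * xi for xi in x]          # fitted values
e = [yi - yh for yi, yh in zip(y, y_hat)]   # residuals
print(e, sum(e))              # [0.5, -0.5, -0.5, 0.5] 0.0
```

These residuals would then be examined with a normal probability plot and a plot against the fitted values, as in the figures that follow.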

Residual Plots

Figure 11-9: Patterns


for residual plots. (a)
satisfactory, (b)
funnel, (c) double
bow, (d) nonlinear.

Residual Analysis - Example

Example 11-7


Example 11-7 (continued)

Example 11-7 (continued)

Figure 11-10: Normal probability plot of residuals, Example 11-7.

Example 11-7 (continued)

Figure 11-11: Plot of residuals versus predicted oxygen purity, ŷ, Example 11-7.
Coefficient of Determination (R²)
• The quantity

R² = SSR/SST = 1 − SSE/SST        (11-34)

is called the coefficient of determination and is often
used to judge the adequacy of a regression model.
• 0 ≤ R² ≤ 1.
• We often refer (loosely) to R² as the amount of
variability in the data explained or accounted for by
the regression model.

R² Computations - Example
• For the oxygen purity regression model,

R² = SSR/SST = 152.13/173.38 = 0.877

• Thus, the model accounts for 87.7% of the
variability in the data.
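A minimal check of the arithmetic above, using both equivalent forms of Equation 11-34:

```python
# Coefficient of determination for the oxygen purity model (Eq. 11-34).
# SSR, SSE, SST are taken from the ANOVA numbers in the slides.
SSR, SSE, SST = 152.13, 21.25, 173.38

r2 = SSR / SST
r2_alt = 1 - SSE / SST   # equivalent form, since SST = SSR + SSE
print(round(r2, 3), round(r2_alt, 3))  # 0.877 0.877
```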
Regression on Transformed Variables
In many cases a plot of the dependent variable, y,
against the independent variable, x, may show that the
relationship is not linear.
Performing a linear regression would then lead to a poor
fit, and residual analysis would show the model is
inadequate.
However, we can often transform the independent
variable first. This transformed variable, x′, may
have a linear relationship with y.

Therefore, we can perform a linear regression
between x′ and y.
However, note that any use of the new equation for
prediction at a given x requires first transforming
x into x′.
Transformations can take many forms. Typical
ones include:

x′ = logarithm(x)
x′ = square root(x)
x′ = inverse(x) = 1/x
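A minimal sketch of the idea, using the reciprocal transformation x′ = 1/x on noiseless illustrative data (not the windmill data): since y = a + b/x is linear in x′, an ordinary least squares fit on x′ recovers a and b, and prediction at a new x first transforms it to 1/x.

```python
# Regression on a transformed variable: y = a + b/x is linear in x' = 1/x.
# Data are generated exactly from a = 3, b = -7 (illustrative only).
x = [2.0, 4.0, 5.0, 10.0]
y = [3.0 - 7.0 / xi for xi in x]   # noiseless data from the true model

xp = [1.0 / xi for xi in x]        # transformed regressor x' = 1/x
n = len(xp)
xbar = sum(xp) / n
ybar = sum(y) / n
Sxx = sum(v**2 for v in xp) - sum(xp)**2 / n
Sxy = sum(v * w for v, w in zip(xp, y)) - sum(xp) * sum(y) / n

b1 = Sxy / Sxx           # slope in x' (recovers -7)
b0 = ybar - b1 * xbar    # intercept (recovers 3)

# Predicting at a new x0 requires transforming x0 first:
x0 = 8.0
y_pred = b0 + b1 * (1.0 / x0)
print(round(b1, 3), round(b0, 3), round(y_pred, 3))  # -7.0 3.0 2.125
```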
Example 11-9

An engineer has collected data on the DC output from a windmill
under different wind speed conditions. He wishes to develop a
model describing output in terms of wind speed.

The table below shows the data collected for output, y, as the
response and wind speed, x, as the independent variable. The
final column shows the transformed value, x′ = 1/x.

Obs.   Output (y)   Velocity (x)   x′ = 1/x
1      1.582         5.00          0.200
2      1.822         6.00          0.167
3      1.057         3.40          0.294
4      0.5           2.70          0.370
5      2.236        10.00          0.100
6      2.386         9.70          0.103
7      2.294         9.55          0.105
8      0.558         3.05          0.328
9      2.166         8.15          0.123
10     1.866         6.20          0.161
11     0.653         2.90          0.345
12     1.93          6.35          0.157
13     1.562         4.60          0.217
14     1.737         5.80          0.172
15     2.088         7.40          0.135
16     1.137         3.60          0.278
17     2.179         7.85          0.127
18     2.112         8.80          0.114
19     1.8           7.00          0.143
20     1.501         5.45          0.183
21     2.303         9.10          0.110
22     2.31         10.20          0.098
23     1.194         4.10          0.244
24     1.144         3.95          0.253
25     0.123         2.45          0.408

Example 11-9 (continued)

Figure: Scatter plot of DC output versus wind velocity, x (original data).

Regression Equation (Original Data):
y = 0.1309 + 0.2411x
R² = 0.875
Example 11-9 (continued)

Figure: Scatter plot of DC output versus transformed wind velocity, 1/x.

Regression Equation (Transformed Data):
y = 2.9789 − 6.9345x′
R² = 0.980

THE END OF ENGG 319 CLASS NOTES
