Regression and Correlation Analysis
Tapia
▪ Many problems in engineering and science involve exploring the relationships
between two or more variables.
▪ Regression analysis is a statistical technique that is very useful for these types of
problems.
▪ For example, in a chemical process, suppose that the yield of the product is related
to the process-operating temperature.
▪ Regression analysis can be used to build a model to predict yield at a given
temperature level.
Simple Linear Regression
Example 1: Oxygen Purity vs Hydrocarbon Level
Example 2: Selling Price vs Taxes
Estimators in Linear Regression
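A minimal sketch of the estimators, assuming the usual least-squares formulas $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The function name `least_squares_fit` and the five-point data set are hypothetical and are not the Example 1 data.

```python
from typing import Sequence, Tuple


def least_squares_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (beta0_hat, beta1_hat) for the model y = beta0 + beta1 * x + error."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sum of squares of x and corrected sum of cross-products
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1_hat = s_xy / s_xx                # slope estimate
    beta0_hat = y_bar - beta1_hat * x_bar  # intercept estimate
    return beta0_hat, beta1_hat


# Hypothetical toy data (not the oxygen purity data of Example 1)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = least_squares_fit(x, y)
print(f"fitted line: y_hat = {b0:.3f} + {b1:.3f} x")
```

For these toy values the fitted line is ŷ = 2.2 + 0.6x.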
Hypothesis Tests in Linear Regression
Example: We will test for significance of regression using the model for the oxygen purity data from
Example 1. The hypotheses are
H0 : β1 = 0 and H1 : β1 ≠ 0
and we will use 𝛼 = 0.01.
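The significance test compares $t_0 = \hat\beta_1/\mathrm{se}(\hat\beta_1)$, where $\mathrm{se}(\hat\beta_1) = \sqrt{\hat\sigma^2/S_{xx}}$ and $\hat\sigma^2 = SS_E/(n-2)$, against $t_{\alpha/2,\,n-2}$. A minimal sketch follows. The Python standard library has no t quantile function, so the critical value is passed in from a t table. The helper name `slope_t_test` and the toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def slope_t_test(x: Sequence[float], y: Sequence[float], t_crit: float) -> Tuple[float, bool]:
    """Test H0: beta1 = 0 vs H1: beta1 != 0; return (t0, reject_H0).

    t_crit is the upper alpha/2 point of the t distribution with n - 2
    degrees of freedom, looked up in a table by the caller.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Error sum of squares and the unbiased estimate of sigma^2
    ss_e = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ss_e / (n - 2)
    se_b1 = math.sqrt(sigma2_hat / s_xx)  # estimated standard error of the slope
    t0 = b1 / se_b1
    return t0, abs(t0) > t_crit


# Hypothetical toy data; n = 5 gives 3 degrees of freedom,
# and t_{0.005,3} = 5.841 from a t table for alpha = 0.01.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
t0, reject = slope_t_test(x, y, t_crit=5.841)
print(f"t0 = {t0:.3f}, reject H0: {reject}")
```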
Analysis of Variance Approach to Test Significance of Regression
Example: We will test for significance of regression using the model for the oxygen purity data from
Example 1. The hypotheses are
H0 : β1 = 0 and H1 : β1 ≠ 0
and we will use 𝛼 = 0.01.
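In the analysis-of-variance form, $SS_T = SS_R + SS_E$ with $SS_R = \hat\beta_1 S_{xy}$. The statistic $f_0 = (SS_R/1)/(SS_E/(n-2))$ is compared with $f_{\alpha,1,n-2}$. A sketch under the same assumptions as the t test: hypothetical helper name, toy data, and a critical value that would come from an F table. In simple linear regression $f_0$ equals the square of the slope t statistic, so the two tests reach the same conclusion.

```python
from typing import Sequence, Tuple


def anova_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (SS_R, SS_E, f0) for testing H0: beta1 = 0 by analysis of variance."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    ss_t = sum((yi - y_bar) ** 2 for yi in y)  # total corrected sum of squares
    ss_r = b1 * s_xy                           # regression sum of squares
    ss_e = ss_t - ss_r                         # error sum of squares
    ms_r = ss_r / 1                            # 1 degree of freedom for regression
    ms_e = ss_e / (n - 2)                      # n - 2 degrees of freedom for error
    return ss_r, ss_e, ms_r / ms_e


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical toy data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
ss_r, ss_e, f0 = anova_regression(x, y)
print(f"SS_R = {ss_r:.3f}, SS_E = {ss_e:.3f}, f0 = {f0:.3f}")
```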
Confidence Intervals and Prediction Intervals
Example: Using Example 1, find a 95% confidence interval on the slope of the regression line.
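A 100(1 − α)% confidence interval on the slope is $\hat\beta_1 \pm t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2/S_{xx}}$. The sketch below uses a hypothetical helper `slope_ci` and toy data, and takes the t critical value as an argument. For 95% confidence with 3 degrees of freedom the table value is $t_{0.025,3} = 3.182$.

```python
import math
from typing import Sequence, Tuple


def slope_ci(x: Sequence[float], y: Sequence[float], t_crit: float) -> Tuple[float, float]:
    """Confidence interval on beta1; t_crit = t_{alpha/2, n-2} from a table."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sigma2_hat = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    half_width = t_crit * math.sqrt(sigma2_hat / s_xx)
    return b1 - half_width, b1 + half_width


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical toy data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
lo, hi = slope_ci(x, y, t_crit=3.182)
print(f"95% CI on beta1: ({lo:.3f}, {hi:.3f})")
```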
Example: Using Example 1, find a 95% confidence interval on the mean response at 𝑥 = 1.00
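At a chosen $x_0$ the estimated mean response is $\hat\mu_{Y|x_0} = \hat\beta_0 + \hat\beta_1 x_0$. A 100(1 − α)% confidence interval on it is $\hat\mu_{Y|x_0} \pm t_{\alpha/2,n-2}\sqrt{\hat\sigma^2\left[1/n + (x_0-\bar x)^2/S_{xx}\right]}$. The sketch uses a hypothetical helper name and toy data, and the caller again supplies the t value.

```python
import math
from typing import Sequence, Tuple


def mean_response_ci(x: Sequence[float], y: Sequence[float], x0: float,
                     t_crit: float) -> Tuple[float, float, float]:
    """Return (mu_hat, lower, upper) for the mean of Y at x = x0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sigma2_hat = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    mu_hat = b0 + b1 * x0
    # Narrowest at x0 = x_bar; widens as x0 moves away from the data center
    half_width = t_crit * math.sqrt(sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / s_xx))
    return mu_hat, mu_hat - half_width, mu_hat + half_width


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical toy data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
mu_hat, lo, hi = mean_response_ci(x, y, x0=3.0, t_crit=3.182)
print(f"mu_hat = {mu_hat:.3f}, 95% CI: ({lo:.3f}, {hi:.3f})")
```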
Example: Using Example 1, find a 95% prediction interval on the next observation of oxygen purity at 𝑥 = 1.00.
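A prediction interval for a single future observation also includes the variance of that new observation:
$\hat y_0 \pm t_{\alpha/2,n-2}\sqrt{\hat\sigma^2\left[1 + 1/n + (x_0-\bar x)^2/S_{xx}\right]}$.
It is therefore always wider than the confidence interval on the mean response at the same $x_0$. The sketch uses a hypothetical helper name, toy data, and a tabulated t value.

```python
import math
from typing import Sequence, Tuple


def prediction_interval(x: Sequence[float], y: Sequence[float], x0: float,
                        t_crit: float) -> Tuple[float, float, float]:
    """Return (y0_hat, lower, upper) for a new observation of Y at x = x0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sigma2_hat = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y0_hat = b0 + b1 * x0
    # The leading "1 +" accounts for the variance of the new observation itself
    half_width = t_crit * math.sqrt(sigma2_hat * (1 + 1 / n + (x0 - x_bar) ** 2 / s_xx))
    return y0_hat, y0_hat - half_width, y0_hat + half_width


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical toy data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y0_hat, lo, hi = prediction_interval(x, y, x0=3.0, t_crit=3.182)
print(f"y0_hat = {y0_hat:.3f}, 95% PI: ({lo:.3f}, {hi:.3f})")
```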
▪ Fitting a regression model requires several assumptions.
▪ Errors are uncorrelated random variables with mean zero;
▪ Errors have constant variance; and,
▪ Errors are normally distributed.
▪ The analyst should always treat the validity of these assumptions as doubtful
and conduct analyses to examine the adequacy of the model.
Residual analysis
• The residuals from a regression
model are ei = yi − ŷi, where yi is an
actual observation and ŷi is the
corresponding fitted value from the
regression model.
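Residuals are cheap to compute once the line is fitted. The sketch below uses a hypothetical helper name and toy data. It returns the raw residuals $e_i = y_i - \hat y_i$ and also, as one common diagnostic, the standardized residuals $d_i = e_i/\sqrt{\hat\sigma^2}$. With an intercept in the model, the raw residuals sum to zero up to rounding.

```python
import math
from typing import List, Sequence, Tuple


def residuals(x: Sequence[float], y: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (raw residuals e_i, standardized residuals d_i) for a least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # e_i = y_i - y_hat_i
    sigma_hat = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))
    d = [ei / sigma_hat for ei in e]                    # standardized residuals
    return e, d


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical toy data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
e, d = residuals(x, y)
for xi, ei, di in zip(x, e, d):
    print(f"x = {xi:.1f}  e = {ei:+.3f}  d = {di:+.3f}")
```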
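Here $\mu_Y$ and $\sigma_Y^2$ are the mean and variance of Y, $\mu_X$ and $\sigma_X^2$ are the mean and variance of X, and $\rho$ is the correlation coefficient between Y and X. Recall that the correlation coefficient is defined as

$$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \qquad (11\text{-}15)$$

where $\sigma_{XY}$ is the covariance between Y and X.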
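The conditional distribution of Y for a given value of X = x is

$$f_{Y|x}(y) = \frac{1}{\sqrt{2\pi}\,\sigma_{Y|x}} \exp\left[-\frac{1}{2}\left(\frac{y - \beta_0 - \beta_1 x}{\sigma_{Y|x}}\right)^2\right] \qquad (11\text{-}16)$$

where

$$\beta_0 = \mu_Y - \mu_X \rho \frac{\sigma_Y}{\sigma_X} \qquad (11\text{-}17)$$

$$\beta_1 = \frac{\sigma_Y}{\sigma_X}\rho \qquad (11\text{-}18)$$

and $\sigma_{Y|x}^2 = \sigma_Y^2(1 - \rho^2)$ is the conditional variance of Y given X = x.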
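Equations (11-17) and (11-18) turn the bivariate normal parameters into the intercept and slope of the conditional mean line. The sketch below uses hypothetical parameter values and a hypothetical function name. The final check relies on the fact that the line passes through $(\mu_X, \mu_Y)$, which follows directly from (11-17).

```python
from typing import Tuple


def conditional_line(mu_x: float, mu_y: float, sigma_x: float, sigma_y: float,
                     rho: float) -> Tuple[float, float]:
    """Return (beta0, beta1) of E(Y | X = x) = beta0 + beta1 * x for a bivariate normal."""
    beta1 = (sigma_y / sigma_x) * rho              # Eq. (11-18)
    beta0 = mu_y - mu_x * rho * sigma_y / sigma_x  # Eq. (11-17)
    return beta0, beta1


# Hypothetical population parameters
beta0, beta1 = conditional_line(mu_x=10.0, mu_y=50.0, sigma_x=2.0, sigma_y=5.0, rho=0.8)
print(f"E(Y | x) = {beta0:.2f} + {beta1:.2f} x")
# The conditional mean line passes through the point of means
assert abs(beta0 + beta1 * 10.0 - 50.0) < 1e-9
```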
Sec 11-8 Correlation 18
Copyright © 2014 John Wiley & Sons, Inc. All rights reserved.
11-8: Correlation
It is possible to draw inferences about the correlation coefficient ρ in this model. The estimator of ρ is the sample correlation coefficient

$$R = \frac{\sum_{i=1}^{n} Y_i (X_i - \bar{X})}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2\right]^{1/2}} = \frac{S_{XY}}{(S_{XX}\, SS_T)^{1/2}} \qquad (11\text{-}19)$$
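Equation (11-19) is a direct computation from the corrected sums. The sketch below uses a hypothetical helper name and toy data.

```python
import math
from typing import Sequence


def sample_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample correlation R = S_xy / sqrt(S_xx * SS_T), Eq. (11-19)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares of y
    return s_xy / math.sqrt(s_xx * ss_t)


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical toy data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = sample_correlation(x, y)
print(f"r = {r:.4f}")
```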
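Note that

$$\hat{\beta}_1 = \left(\frac{SS_T}{S_{XX}}\right)^{1/2} R \qquad (11\text{-}20)$$

so that

$$R^2 = \hat{\beta}_1^2 \frac{S_{XX}}{S_{YY}} = \frac{\hat{\beta}_1 S_{XY}}{SS_T} = \frac{SS_R}{SS_T}$$

where $S_{YY} = SS_T$.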
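The relation (11-20) and the expressions for $R^2$ are easy to confirm numerically. The sketch below uses toy data and hypothetical variable names. It computes each quantity separately and checks that the three forms of $R^2$ agree.

```python
import math

# Hypothetical toy data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_t = sum((yi - y_bar) ** 2 for yi in y)  # equals S_YY

b1 = s_xy / s_xx                           # least-squares slope
r = s_xy / math.sqrt(s_xx * ss_t)          # Eq. (11-19)
ss_r = b1 * s_xy                           # regression sum of squares

# Eq. (11-20): the slope is a rescaled correlation coefficient
assert math.isclose(b1, math.sqrt(ss_t / s_xx) * r)

# Three equivalent forms of R^2
forms = [b1 ** 2 * s_xx / ss_t, b1 * s_xy / ss_t, ss_r / ss_t]
assert all(math.isclose(f, r ** 2) for f in forms)
print(f"R^2 = {r ** 2:.4f}")
```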
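It is often useful to test the hypotheses
H0: ρ = 0
H1: ρ ≠ 0
The appropriate test statistic is

$$T_0 = \frac{R\sqrt{n-2}}{\sqrt{1-R^2}} \qquad (11\text{-}21)$$

which has the t distribution with n − 2 degrees of freedom if H0: ρ = 0 is true. Reject H0 if |t0| > t_{α/2,n−2}.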
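Equation (11-21) translates directly into code. The sketch below uses a hypothetical helper name and toy data, and the critical value comes from a t table. For the same data it returns the same value as the slope t statistic, because testing ρ = 0 is equivalent to testing β1 = 0 in simple linear regression.

```python
import math
from typing import Sequence, Tuple


def correlation_t_test(x: Sequence[float], y: Sequence[float],
                       t_crit: float) -> Tuple[float, bool]:
    """Test H0: rho = 0 vs H1: rho != 0 with Eq. (11-21); return (t0, reject_H0).

    t_crit is t_{alpha/2, n-2}, looked up in a t table by the caller.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    r = s_xy / math.sqrt(s_xx * ss_t)  # sample correlation, Eq. (11-19)
    t0 = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t0, abs(t0) > t_crit


# Hypothetical toy data; alpha = 0.01 with 3 degrees of freedom gives t_{0.005,3} = 5.841
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
t0, reject = correlation_t_test(x, y, t_crit=5.841)
print(f"t0 = {t0:.3f}, reject H0: {reject}")
```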
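We may also wish to test the hypotheses
H0: ρ = ρ0
H1: ρ ≠ ρ0
where ρ0 ≠ 0. For moderately large samples, arctanh R is approximately normal with mean arctanh ρ and variance 1/(n − 3), so the test statistic is

$$Z_0 = (\operatorname{arctanh} R - \operatorname{arctanh} \rho_0)(n-3)^{1/2} \qquad (11\text{-}22)$$

and H0 is rejected if |z0| > z_{α/2}.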
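The test of H0: ρ = ρ0 uses the statistic $Z_0 = (\operatorname{arctanh} R - \operatorname{arctanh}\rho_0)\sqrt{n-3}$, which is referred to the standard normal distribution. The sketch below is a minimal version. It uses `statistics.NormalDist` (Python 3.8+) for $z_{\alpha/2}$ and a two-sided p-value. The function name and the sample summary (r = 0.90 from n = 30 pairs) are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def fisher_z_test(r: float, n: int, rho0: float, alpha: float) -> Tuple[float, float, bool]:
    """Test H0: rho = rho0 with Z0 = (arctanh r - arctanh rho0) * sqrt(n - 3).

    Returns (z0, two-sided p-value, reject_H0). Requires n > 3 and |r|, |rho0| < 1.
    """
    z0 = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    std_normal = NormalDist()                    # mean 0, standard deviation 1
    p_value = 2 * (1 - std_normal.cdf(abs(z0)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return z0, p_value, abs(z0) > z_crit


# Hypothetical sample summary: r = 0.90 from n = 30 pairs, testing rho0 = 0.5
z0, p, reject = fisher_z_test(r=0.90, n=30, rho0=0.5, alpha=0.05)
print(f"z0 = {z0:.3f}, p = {p:.2e}, reject H0: {reject}")
```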
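An approximate 100(1 − α)% confidence interval on ρ is

$$\tanh\left(\operatorname{arctanh} r - \frac{z_{\alpha/2}}{\sqrt{n-3}}\right) \le \rho \le \tanh\left(\operatorname{arctanh} r + \frac{z_{\alpha/2}}{\sqrt{n-3}}\right) \qquad (11\text{-}23)$$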
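Equation (11-23) builds the interval on the arctanh scale and maps it back with tanh. The sketch below uses the standard-library `math.atanh`, `math.tanh` and `statistics.NormalDist`. The function name and the sample summary are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def correlation_ci(r: float, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Approximate CI on rho from Eq. (11-23); requires n > 3 and |r| < 1."""
    z_half = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    center = math.atanh(r)
    spread = z_half / math.sqrt(n - 3)
    # Interval on the arctanh scale, mapped back to the correlation scale
    return math.tanh(center - spread), math.tanh(center + spread)


# Hypothetical sample summary: r = 0.90 from n = 30 pairs
lo, hi = correlation_ci(0.90, 30)
print(f"approximate 95% CI on rho: ({lo:.3f}, {hi:.3f})")
```

Unlike the interval on the slope, this interval is not symmetric about r. The tanh transformation compresses values near ±1, so the upper limit lies closer to r than the lower limit does when r is large and positive.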