3 Simple Linear Regression
Suppose there are n data points {(x_i, y_i)}, i = 1, 2, …, n, which are the i-th realizations of the random variables X and Y respectively. The goal is to find the equation of the straight line

$$ y = a + bx + \varepsilon $$

which would provide a "best" fit for the data points. In the above model, the intercept a and the slope b are unknown constants and $\varepsilon$ is a random error component. The errors are assumed to have mean zero and unknown variance $\sigma^2$. Additionally, we usually assume that the errors are uncorrelated; that is, the value of one error does not depend on the value of any other error. So, we assume that

$$ E(y \mid x) = a + bx, $$
$$ \operatorname{Var}(y \mid x) = \operatorname{Var}(a + bx + \varepsilon) = \sigma^2. $$
Thus, the mean of y is a linear function of x although the variance
of y does not depend on the value of x. Furthermore, because the
errors are uncorrelated, the responses are also uncorrelated.
The least squares estimates of a and b minimize

$$ Q = \sum_{i=1}^{n} (y_i - a - b x_i)^2, $$

so they must satisfy

$$ \left.\frac{\partial Q}{\partial a}\right|_{\hat a, \hat b} = -2\sum_{i=1}^{n} (y_i - \hat a - \hat b x_i) = 0, \qquad \left.\frac{\partial Q}{\partial b}\right|_{\hat a, \hat b} = -2\sum_{i=1}^{n} (y_i - \hat a - \hat b x_i)\, x_i = 0. $$

On simplification, they give us

$$ \sum_{i=1}^{n} y_i = n\hat a + \hat b \sum_{i=1}^{n} x_i, $$
$$ \sum_{i=1}^{n} x_i y_i = \hat a \sum_{i=1}^{n} x_i + \hat b \sum_{i=1}^{n} x_i^2. $$

These equations are known as the least squares normal equations.
From the first normal equation, we have

$$ \hat a = \frac{\sum_{i=1}^{n} y_i - \hat b \sum_{i=1}^{n} x_i}{n} = \bar y - \hat b \bar x. $$

Now, putting the expression for $\hat a$ in the second normal equation and simplifying, we have

$$ \hat b = \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} = \frac{S_{xy}}{S_{xx}}. $$
To confirm that these estimates minimize Q, note that the Hessian matrix of second partial derivatives is

$$ H(\hat a, \hat b) = 2\begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix}, $$

and the above is clearly positive definite, since its leading entry $2n > 0$ and its determinant $4\left(n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right) = 4n\sum_{i=1}^{n}(x_i - \bar x)^2 > 0$.
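Since the estimates have closed form, they are straightforward to compute. Below is a minimal Python sketch (NumPy assumed; the helper name `fit_line` is ours, not from the notes) implementing $\hat b = S_{xy}/S_{xx}$ and $\hat a = \bar y - \hat b \bar x$:

```python
import numpy as np

def fit_line(x, y):
    """Least squares estimates for y = a + b*x via the normal equations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum(x**2) - np.sum(x)**2 / n             # S_xx
    sxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n   # S_xy
    b_hat = sxy / sxx                                 # slope
    a_hat = y.mean() - b_hat * x.mean()               # intercept
    return a_hat, b_hat
```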
Example 1

Consider the following data on hours studied (x) and test score (y).

Hours Studied, x   Test Score, y   x^2    xy     y^2
4                  31              16     124    961
9                  58              81     522    3364
10                 65              100    650    4225
14                 73              196    1022   5329
4                  37              16     148    1369
7                  44              49     308    1936
12                 60              144    720    3600
22                 91              484    2002   8281
1                  21              1      21     441
17                 84              289    1428   7056
TOTALS: 100        564             1376   6945   36562
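As a quick numerical check of the totals above (the code and the rounded results are our own, using the `fit_line` sketch from earlier):

```python
x = [4, 9, 10, 14, 4, 7, 12, 22, 1, 17]
y = [31, 58, 65, 73, 37, 44, 60, 91, 21, 84]
a_hat, b_hat = fit_line(x, y)   # uses the sketch defined above
print(a_hat, b_hat)             # approx. 21.69 and 3.47
```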
PROPERTIES OF REGRESSION COEFFICIENTS
Linearity
$$ \hat b = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar x)\, y_i}{S_{xx}} = \sum_{i=1}^{n} w_i y_i, \quad \text{where } w_i = \frac{x_i - \bar x}{S_{xx}}. $$

Thus $\hat b$ is a linear combination of the observations $y_i$.
Unbiasedness
Let us now investigate the bias and variance properties of these
estimates.
$$ \hat b = \sum_{i=1}^{n} w_i y_i = \sum_{i=1}^{n} w_i (a + b x_i + \varepsilon_i) = a \sum_i w_i + b \sum_i w_i x_i + \sum_i w_i \varepsilon_i. $$

Now, clearly,

$$ \sum_i w_i = 0 \quad \text{and} \quad \sum_i w_i x_i = 1. $$

Therefore,

$$ E(\hat b) = E\left(b + \sum_i w_i \varepsilon_i\right) = E(b) + \sum_i w_i E(\varepsilon_i) = b, \quad \because E(\varepsilon_i) = 0. $$
Similarly,

$$ \hat a = \bar y - \hat b \bar x = \frac{1}{n}\sum_i (a + b x_i + \varepsilon_i) - \hat b \bar x = a + b\bar x + \bar\varepsilon - \hat b \bar x = a - (\hat b - b)\bar x + \bar\varepsilon. $$

Since $E(\hat b) = b$ and $E(\bar\varepsilon) = 0$, we get $E(\hat a) = a$. Thus, $\hat a$ is also an unbiased estimate of the true intercept a.
$$ \operatorname{Var}(\hat a) = E\left[\left(\hat a - E(\hat a)\right)^2\right] = E\left[\left(\bar\varepsilon - (\hat b - b)\bar x\right)^2\right] $$
$$ = E(\bar\varepsilon^2) + \bar x^2 E\left[(\hat b - b)^2\right] - 2\bar x E\left[\bar\varepsilon (\hat b - b)\right] = E(\bar\varepsilon^2) + \bar x^2 \operatorname{Var}(\hat b) - 2\bar x E\left[\bar\varepsilon (\hat b - b)\right]. $$

Now, $E(\bar\varepsilon^2) = \sigma^2/n$, $\operatorname{Var}(\hat b) = \operatorname{Var}\left(\sum_i w_i y_i\right) = \sigma^2 \sum_i w_i^2 = \sigma^2/S_{xx}$, and

$$ E\left[\bar\varepsilon (\hat b - b)\right] = E\left[\frac{1}{n}\sum_i \varepsilon_i \cdot \sum_i w_i \varepsilon_i\right] = E\left[\frac{1}{n}\sum_i w_i \varepsilon_i^2 + \text{cross product terms}\right] = \frac{1}{n}\sum_i w_i E(\varepsilon_i^2) = \frac{\sigma^2}{n}\sum_i w_i = 0. $$

Therefore,

$$ \operatorname{Var}(\hat a) = \frac{\sigma^2}{n} + \frac{\bar x^2 \sigma^2}{S_{xx}} = \sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right). $$
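A small simulation, our own illustration rather than part of the notes, can corroborate the unbiasedness and variance results derived above:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, sigma = 2.0, 0.5, 1.0
x = np.linspace(0, 10, 20)
n = len(x)
sxx = np.sum((x - x.mean())**2)

a_hats, b_hats = [], []
for _ in range(20000):
    y = a + b * x + rng.normal(0, sigma, n)
    bh = np.sum((x - x.mean()) * y) / sxx    # b_hat = S_xy / S_xx
    ah = y.mean() - bh * x.mean()
    a_hats.append(ah); b_hats.append(bh)

# empirical mean/variance vs. theory: E(b_hat)=b, Var(b_hat)=sigma^2/S_xx
print(np.mean(b_hats), np.var(b_hats), sigma**2 / sxx)
# E(a_hat)=a, Var(a_hat)=sigma^2 (1/n + x_bar^2 / S_xx)
print(np.mean(a_hats), np.var(a_hats),
      sigma**2 * (1/n + x.mean()**2 / sxx))
```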
To show that the OLS estimates are best (i.e. have the least variance), we will show that if there exists another linear unbiased estimator other than $\hat b$, then its variance must be greater than or equal to that of $\hat b$.
Let $\hat b^* = \sum_i k_i y_i$ be any other linear estimator, and write $k_i = w_i + c_i$. Then

$$ \hat b^* = \sum_i (w_i + c_i)(a + b x_i + \varepsilon_i) = a\sum_i w_i + a\sum_i c_i + b\sum_i w_i x_i + b\sum_i c_i x_i + \sum_i (w_i + c_i)\varepsilon_i $$
$$ = a\sum_i c_i + b + b\sum_i c_i x_i + \sum_i (w_i + c_i)\varepsilon_i. $$

For $\hat b^*$ to be unbiased we need $\sum_i c_i = 0$ and $\sum_i c_i x_i = 0$, so that

$$ \hat b^* = b + \sum_i (w_i + c_i)\varepsilon_i. $$

Now,

$$ \operatorname{Var}(\hat b^*) = \operatorname{Var}\left(b + \sum_i (w_i + c_i)\varepsilon_i\right) = \sum_i (w_i + c_i)^2 \operatorname{Var}(\varepsilon_i) = \sigma^2 \sum_i (w_i + c_i)^2 $$
$$ = \sigma^2 \sum_i w_i^2 + \sigma^2 \sum_i c_i^2 = \operatorname{Var}(\hat b) + \sigma^2 \sum_i c_i^2 \ge \operatorname{Var}(\hat b), $$

since $\sum_i w_i c_i = \frac{1}{S_{xx}}\left(\sum_i c_i x_i - \bar x \sum_i c_i\right) = 0$.
The above establishes that, for the family of linear and unbiased estimators $\hat b^*$, each of the alternative estimators has variance that is greater than or equal to that of the least squares estimator $\hat b$. The only time that $\operatorname{Var}(\hat b^*) = \operatorname{Var}(\hat b)$ is when all the $c_i = 0$, in which case $\hat b^* = \hat b$. Thus, there is no other linear and unbiased estimator of b that is better than $\hat b$. Hence the OLS estimate $\hat b$ is BLUE.
ESTIMATION OF $\sigma^2$

The residuals are $e_i = y_i - \hat y_i = y_i - \hat a - \hat b x_i$, and the residual (error) sum of squares is

$$ SS_E = \sum_i e_i^2 = \sum_i \left(y_i - \hat a - \hat b x_i\right)^2 = \sum_i \left[(y_i - \bar y) - \hat b (x_i - \bar x)\right]^2 $$
$$ = S_{yy} + \hat b^2 S_{xx} - 2\hat b S_{xy} = S_{yy} + \hat b S_{xy} - 2\hat b S_{xy} \quad \left[\because \hat b = \frac{S_{xy}}{S_{xx}}\right] $$
$$ = S_{yy} - \hat b S_{xy}. $$

The quantity

$$ MS_E = \frac{SS_E}{n - 2} $$

gives an unbiased estimate of $\sigma^2$.
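The shortcut $SS_E = S_{yy} - \hat b S_{xy}$ avoids accumulating residuals one at a time; a sketch (helper name ours):

```python
import numpy as np

def mse(x, y):
    """Unbiased estimate of sigma^2 via SS_E = S_yy - b_hat * S_xy."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum((x - x.mean())**2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    syy = np.sum((y - y.mean())**2)
    b_hat = sxy / sxx
    sse = syy - b_hat * sxy          # residual sum of squares
    return sse / (n - 2)             # MS_E
```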
In simple linear regression the estimated standard error of the slope is

$$ se(\hat b) = \sqrt{\frac{\hat\sigma^2}{S_{xx}}} $$

and the estimated standard error of the intercept is

$$ se(\hat a) = \sqrt{\hat\sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)}, \quad \text{where } \hat\sigma^2 = MS_E. $$

To test $H_0: b = b_0$ against $H_1: b \ne b_0$, we use the statistic

$$ t_0 = \frac{\hat b - b_0}{\sqrt{\hat\sigma^2 / S_{xx}}} = \frac{\hat b - b_0}{\sqrt{MS_E / S_{xx}}}, $$

that has a t distribution with n − 2 degrees of freedom under the null hypothesis. Thus we would reject the null hypothesis if

$$ |t_0| > t_{\alpha/2,\, n-2}. $$
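A sketch of this test, assuming SciPy is available for the t reference distribution (the helper `slope_t_test` is ours):

```python
import numpy as np
from scipy import stats

def slope_t_test(x, y, b0=0.0, alpha=0.05):
    """t test of H0: b = b0 against a two-sided alternative."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum((x - x.mean())**2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    syy = np.sum((y - y.mean())**2)
    b_hat = sxy / sxx
    ms_e = (syy - b_hat * sxy) / (n - 2)
    t0 = (b_hat - b0) / np.sqrt(ms_e / sxx)
    p = 2 * stats.t.sf(abs(t0), df=n - 2)                  # two-sided p-value
    reject = abs(t0) > stats.t.ppf(1 - alpha / 2, n - 2)   # compare to t_{alpha/2, n-2}
    return t0, p, reject
```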
Similarly, to test

$$ H_0: a = a_0 \quad \text{vs} \quad H_1: a \ne a_0, $$

we would use the statistic

$$ t_0 = \frac{\hat a - a_0}{\sqrt{MS_E\left(\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}\right)}}. $$
An important special case is the test for significance of regression:

$$ H_0: b = 0 \quad \text{vs} \quad H_1: b \ne 0. $$
Example 2

Mean square error is given by $MS_E = \dfrac{SS_E}{n-2}$.
ANOVA APPROACH FOR TESTING SIGNIFICANCE OF REGRESSION

$$ \sum_i (y_i - \bar y)^2 = \sum_i (\hat y_i - \bar y)^2 + \sum_i (y_i - \hat y_i)^2, $$

i.e.,

$$ S_{yy} = SS_R + SS_E, $$

where $S_{yy} = \sum_{i=1}^{n} (y_i - \bar y)^2$ is the total corrected sum of squares of y. Now, we have already noted that $SS_E = S_{yy} - \hat b S_{xy}$, or equivalently, $SS_R = \hat b S_{xy} = S_{yy} - SS_E$. The total SS has n − 1 degrees of freedom; $SS_R$ and $SS_E$ have 1 and n − 2 degrees of freedom, respectively.
The test statistic for significance of regression is

$$ F_0 = \frac{SS_R/1}{SS_E/(n-2)} = \frac{MS_R}{MS_E}, $$

and we would reject $H_0$ if $f_0 > f_{\alpha,\,1,\,n-2}$. The test procedure is usually represented in an ANOVA table, as given below.

Source        DF     SS       MS                   F0
Regression    1      SS_R     MS_R = SS_R/1        MS_R/MS_E
Error         n-2    SS_E     MS_E = SS_E/(n-2)
TOTAL         n-1    S_yy

Note: $\hat\sigma^2 = MS_E$. The t and F tests of $H_0: b = 0$ are equivalent, since

$$ T_0^2 = \frac{\hat b^2 S_{xx}}{MS_E} = \frac{\hat b S_{xy}}{MS_E} = \frac{SS_R}{MS_E} = \frac{MS_R}{MS_E} = F_0. $$
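A sketch assembling these ANOVA quantities (helper name ours; SciPy assumed for the F p-value):

```python
import numpy as np
from scipy import stats

def anova_regression(x, y):
    """Return SS_R, SS_E, F0 and its p-value for H0: b = 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    sxx = np.sum((x - x.mean())**2)
    syy = np.sum((y - y.mean())**2)
    ss_r = sxy**2 / sxx                      # = b_hat * S_xy
    ss_e = syy - ss_r
    f0 = (ss_r / 1) / (ss_e / (n - 2))       # MS_R / MS_E
    return ss_r, ss_e, f0, stats.f.sf(f0, 1, n - 2)
```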
CONFIDENCE INTERVALS

A 100(1 − α)% confidence interval on the slope b is

$$ \hat b - t_{\alpha/2,\,n-2}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} \le b \le \hat b + t_{\alpha/2,\,n-2}\sqrt{\frac{\hat\sigma^2}{S_{xx}}}, $$

and on the intercept a is

$$ \hat a - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)} \le a \le \hat a + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)}. $$
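Both intervals can be computed in a few lines; a sketch (helper name ours, SciPy assumed):

```python
import numpy as np
from scipy import stats

def coef_intervals(x, y, alpha=0.05):
    """100(1-alpha)% confidence intervals for intercept a and slope b."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum((x - x.mean())**2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    syy = np.sum((y - y.mean())**2)
    b_hat = sxy / sxx
    a_hat = y.mean() - b_hat * x.mean()
    ms_e = (syy - b_hat * sxy) / (n - 2)
    t = stats.t.ppf(1 - alpha / 2, n - 2)
    hb = t * np.sqrt(ms_e / sxx)                           # slope half-width
    ha = t * np.sqrt(ms_e * (1 / n + x.mean()**2 / sxx))   # intercept half-width
    return (a_hat - ha, a_hat + ha), (b_hat - hb, b_hat + hb)
```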
The point estimate of the mean response at $x = x_0$ is

$$ \hat E(y \mid x_0) = \hat\mu_{y|x_0} = \hat a + \hat b x_0, $$

with

$$ V(\hat\mu_{y|x_0}) = \sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right). $$

Hence

$$ \frac{\hat\mu_{y|x_0} - E(y \mid x_0)}{\sqrt{\hat\sigma^2\left(\dfrac{1}{n} + \dfrac{(x_0 - \bar x)^2}{S_{xx}}\right)}} $$

has a t-distribution with n − 2 degrees of freedom. This leads to the following confidence interval definition:

$$ \hat\mu_{y|x_0} \pm t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right)}, $$
i.e.,

$$ \hat\mu_{y|x_0} - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right)} \le E(y \mid x_0) \le \hat\mu_{y|x_0} + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right)}. $$
Again, the actual observed value $y_0$ will vary about the mean response with variance $\sigma^2$. So the variance of the prediction error $y_0 - \hat y_0$ at $x = x_0$ is given by

$$ \operatorname{var}(y_0 - \hat y_0) = \sigma^2\left(1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right). $$

Hence

$$ \frac{y_0 - \hat y_0}{\sqrt{\hat\sigma^2\left(1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar x)^2}{S_{xx}}\right)}} $$

has a t-distribution with n − 2 degrees of freedom, and the 100(1 − α)% prediction interval is

$$ \hat y_0 - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left(1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right)} \le y_0 \le \hat y_0 + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left(1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right)}. $$
The prediction interval is of minimum width at $x_0 = \bar x$ and widens as $|x_0 - \bar x|$ increases.
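A sketch computing both intervals at a given $x_0$ (helper name ours; note the prediction interval differs from the mean-response interval only by the extra "1 +" inside the square root):

```python
import numpy as np
from scipy import stats

def intervals_at(x, y, x0, alpha=0.05):
    """CI for the mean response and PI for a new observation at x0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum((x - x.mean())**2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    syy = np.sum((y - y.mean())**2)
    b_hat = sxy / sxx
    a_hat = y.mean() - b_hat * x.mean()
    ms_e = (syy - b_hat * sxy) / (n - 2)
    y0 = a_hat + b_hat * x0
    t = stats.t.ppf(1 - alpha / 2, n - 2)
    lev = 1 / n + (x0 - x.mean())**2 / sxx
    ci = t * np.sqrt(ms_e * lev)          # mean-response half-width
    pi = t * np.sqrt(ms_e * (1 + lev))    # prediction half-width (extra "1 +")
    return (y0 - ci, y0 + ci), (y0 - pi, y0 + pi)
```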
1. Errors are
a) normally distributed,
b) distributed with mean 0 and constant variance $\sigma^2$, and
c) uncorrelated.
The residuals, unlike the errors, do not all have the same variance: the variance depends on how far the corresponding x-value is from the average x-value. Because the variances of the residuals differ, even though the variances of the true errors are all equal to each other, it does not make sense to compare residuals at different data points without some sort of standardization.
Standardized Residuals
One may also standardize the residuals by computing

$$ d_i = \frac{e_i}{\sqrt{\hat\sigma^2}} = \frac{e_i}{\sqrt{MS_E}}, \quad i = 1, 2, \ldots, n. $$
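A sketch of this standardization (helper name ours):

```python
import numpy as np

def standardized_residuals(x, y):
    """d_i = e_i / sqrt(MS_E)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum((x - x.mean())**2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    b_hat = sxy / sxx
    a_hat = y.mean() - b_hat * x.mean()
    e = y - (a_hat + b_hat * x)           # raw residuals
    ms_e = np.sum(e**2) / (n - 2)
    return e / np.sqrt(ms_e)
```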
[Figure: residual plots illustrating homoscedastic and heteroscedastic error patterns]
It is frequently helpful to plot the residuals (1) against the $\hat y_i$ and (2) against the $x_i$. If the plot is evenly and randomly distributed around the zero-residual line, we will assume that there is no abnormal pattern in the residuals. If the plot is funnel-shaped around the zero-residual line, the variance of the observations is not constant over the magnitude of $y_i$ or $x_i$.
[Figure: residuals-versus-x plots illustrating independent and non-independent error patterns]
The quantity

$$ R^2 = \frac{SS_R}{S_{yy}} = 1 - \frac{SS_E}{S_{yy}} $$

is called the coefficient of determination, and is often used to judge the adequacy of the regression model. It should be noted that $R^2$ represents the proportion of the variability in the data explained or accounted for by the regression model, and since $0 \le SS_R \le S_{yy}$, we have $0 \le R^2 \le 1$.
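A one-function sketch of $R^2$ (helper name ours):

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination R^2 = SS_R / S_yy."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    sxx = np.sum((x - x.mean())**2)
    syy = np.sum((y - y.mean())**2)
    return (sxy**2 / sxx) / syy    # SS_R = S_xy^2 / S_xx
```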
Lack of Fit Test

Here we will test for the goodness of fit of the regression model. Specifically, we wish to test

$$ H_0: \text{the straight-line model adequately fits the data} \quad \text{vs} \quad H_1: \text{it does not.} $$

The SS for pure error is obtained by summing, over those levels of x that contain repeat observations, the squared deviations of the responses about their level means:

$$ SS_{PE} = \sum_{i=1}^{m} \sum_{u=1}^{n_i} (y_{iu} - \bar y_i)^2, $$

where m is the number of distinct levels of x and $n_i$ is the number of observations at the i-th level.
The degrees of freedom associated with the pure error SS is

$$ \sum_{i=1}^{m} (n_i - 1) = n - m. $$

The lack of fit SS is simply $SS_{LOF} = SS_E - SS_{PE}$, with m − 2 degrees of freedom, and the test statistic is

$$ F_0 = \frac{SS_{LOF}/(m-2)}{SS_{PE}/(n-m)} = \frac{MS_{LOF}}{MS_{PE}}. $$
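A sketch of the whole procedure, assuming SciPy for the F p-value (the helper name and the grouping-by-level approach are ours):

```python
import numpy as np
from collections import defaultdict
from scipy import stats

def lack_of_fit_test(x, y):
    """Partition SS_E into pure error and lack of fit; return F0 and p-value."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    groups = defaultdict(list)            # responses grouped by distinct x level
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    m = len(groups)
    ss_pe = sum(np.sum((np.array(g) - np.mean(g))**2) for g in groups.values())
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    sxx = np.sum((x - x.mean())**2)
    syy = np.sum((y - y.mean())**2)
    ss_e = syy - (sxy / sxx) * sxy        # SS_E = S_yy - b_hat * S_xy
    ss_lof = ss_e - ss_pe
    f0 = (ss_lof / (m - 2)) / (ss_pe / (n - m))
    return f0, stats.f.sf(f0, m - 2, n - m)
```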
Example 2

Test the fitted straight line for lack of fit, given the data below.

x     y
1.0   2.3, 1.8
2.0   2.8
3.3   1.8, 3.7
4.0   2.6, 2.6, 2.2
5.0   2.0
5.6   3.5, 2.8, 2.1
6.0   3.4, 3.2
6.5   3.4
6.9   5.0
The pure error SS is computed from the levels of x with repeat observations:

Level of x    $\sum_{u=1}^{n_i}(y_{iu} - \bar y_i)^2$    Degrees of freedom
1.0           0.1250                                      1
3.3           1.8050                                      1
4.0           0.1066                                      2
5.6           0.9800                                      2
6.0           0.0200                                      1
Totals        3.0366                                      7
So, the lack of fit SS is $SS_{LOF} = SS_E - SS_{PE} = 7.3372 - 3.0366 = 4.3006$.

Source          DF    SS        MS       F0     P-value
Regression      1     3.4930    3.4930   6.66   0.0218
Error           14    7.3372    0.5241
(Lack of Fit)   7     4.3006    0.6144   1.42   0.3276
(Pure Error)    7     3.0366    0.4338
Total           15    10.8302

Since the P-value for lack of fit (0.3276) is large, there is no evidence of lack of fit, while the regression itself is significant (P = 0.0218).
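The numbers in this table can be reproduced with the `lack_of_fit_test` sketch above (the flattened data lists are our own transcription of Example 2):

```python
x = [1.0, 1.0, 2.0, 3.3, 3.3, 4.0, 4.0, 4.0, 5.0,
     5.6, 5.6, 5.6, 6.0, 6.0, 6.5, 6.9]
y = [2.3, 1.8, 2.8, 1.8, 3.7, 2.6, 2.6, 2.2, 2.0,
     3.5, 2.8, 2.1, 3.4, 3.2, 3.4, 5.0]
print(lack_of_fit_test(x, y))   # F0 ~ 1.42, p ~ 0.33, as in the table
```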
CORRELATION
The correlation coefficient in such cases is defined as

$$ \rho = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}. $$

The estimate of $\rho$ is the simple correlation coefficient and can be given by

$$ r = \frac{\sum_{i=1}^{n} y_i (x_i - \bar x)}{\sqrt{\sum_{i=1}^{n} (x_i - \bar x)^2 \cdot \sum_{i=1}^{n} (y_i - \bar y)^2}} = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}. $$

Note that

$$ r^2 = \frac{S_{xy}^2}{S_{xx}\, S_{yy}} = \frac{S_{xy}}{S_{xx}} \cdot \frac{S_{xy}}{S_{yy}} = \frac{\hat b\, S_{xy}}{S_{yy}} = \frac{SS_R}{S_{yy}} = R^2. $$
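A sketch (helper name ours); NumPy's built-in `np.corrcoef` returns the same value:

```python
import numpy as np

def corr(x, y):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    sxx = np.sum((x - x.mean())**2)
    syy = np.sum((y - y.mean())**2)
    return sxy / np.sqrt(sxx * syy)

# np.corrcoef(x, y)[0, 1] gives the same value
```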
Several non-linear models can be fitted by least squares after a suitable transformation:

Non-linear form             Transformed linear form       Remark
Y = a e^{bx} ε              ln Y = ln a + bx + ln ε       ln ε should be NID(0, σ²)
Y = a + b(1/x) + ε          Y = a + bz + ε                Using z = 1/x
Y = 1/exp(a + bx + ε)       ln Y* = a + bx + ε            Using Y* = 1/Y
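As an illustration of the first row, a sketch that fits $Y = a e^{bx}\varepsilon$ by regressing ln Y on x; the synthetic data and parameter values are our own:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 5, 40)
y = 2.0 * np.exp(0.3 * x) * np.exp(rng.normal(0, 0.05, x.size))  # Y = a e^{bx} eps

ln_y = np.log(y)
b_hat, ln_a_hat = np.polyfit(x, ln_y, 1)   # slope and intercept of ln Y on x
a_hat = np.exp(ln_a_hat)                   # back-transform the intercept
print(a_hat, b_hat)                        # approx. 2.0 and 0.3
```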
Example 1

Obs #        1      2      3      4      5      6      7      8      9
HC Level %   0.99   1.02   1.15   1.29   1.46   1.36   0.87   1.23   1.55
Purity %     90.01  89.05  91.43  93.74  96.73  94.45  87.59  91.77  99.42

Obs #        10     11     12     13     14     15     16     17     18
HC Level %   1.40   1.19   1.15   0.98   1.01   1.11   1.20   1.26   1.32
Purity %     93.65  93.54  92.52  90.56  89.54  89.85  90.39  93.25  93.41
a) Calculate the least squares estimates of the slope and intercept.
b) What percentage of the total variability in Purity % is accounted for by the model?
c) Test the significance of the model thus obtained using ANOVA.
d) Obtain 95% confidence intervals on i) the slope and ii) the intercept.
e) Construct a 95% confidence interval on the mean purity level at an HC level of 1.01.
f) Construct a 95% prediction interval at an HC level % of 1.00.
Soln.

c) ANOVA table

Source of Variation    DF    SS        MS        f0        Remark
Regression             1     152.085   152.085   128.559   Significant
Error                  18    21.292    1.183
Total                  19    173.377
d) 95% confidence interval on

i) slope: $\hat b - t_{0.025,18}\sqrt{\dfrac{MS_E}{S_{xx}}} \le b \le \hat b + t_{0.025,18}\sqrt{\dfrac{MS_E}{S_{xx}}}$
Exercise 1