Linear Regression

1 Introduction

In many problems, there are two or more variables that are related, and it is
important to model and explore this relationship.
For example, in a chemical process, the yield of product is related to the
operating temperature. It may be of interest to build a model relating yield to
temperature and then use the model for prediction, process optimization or
process control.
In general, suppose that there is a single dependent variable or response Y
that depends on k independent or regressor variables, e.g. X1, X2, ..., Xk.
The relationship between these variables is characterized by a mathematical
model called a regression equation.
The regression model is fit to a set of sample data. In some instances, the
experimenter knows the exact form of the true functional relationship between Y
and X1, X2, ..., Xk, say

Y = φ(X1, X2, ..., Xk)

However, in most cases, the true functional relationship is unknown and the
experimenter chooses an appropriate function to approximate φ.
Generally, the analysis of variance in a designed experiment helps to identify
which factors are important, and regression is used to build a quantitative model
relating the important factors to the response.

2 The general linear model


General linear models (GLM) are widely used in data analysis in almost
all fields of science. Some examples of GLMs are:

(1) Simple linear regression
- one response, one predictor.

(2) Multiple regression
- note that multiple regression is not the same as multivariate regression:
(a) Multiple regression: only one response and several predictors.
(b) Multivariate regression: more than one response variable, and the predictors
could be one or more.

3 Assumptions of the Linear regression model


There are four key assumptions:

• Linearity of residuals

• Independence of residuals

• Normal distribution of residuals

• Homoscedasticity: equal variance of residuals across all levels of the predictors.

4 How to check the above four assumptions


Linearity: We draw a scatter plot of the residuals against the y values. The y values
are taken on the vertical axis, and the standardized residuals (SPSS calls them ZRESID)
are plotted on the horizontal axis.
If the scatter plot follows a linear pattern (i.e. not a curvilinear pattern), the
linearity assumption is met.

Independence: We worry about this when we have a longitudinal dataset.

A longitudinal dataset is one where we collect observations from the same entity
over time; for instance, with stock price data we collect price information on the same
stock, i.e. the same entity, over time.
We generally have two types of data: cross-sectional and longitudinal. Cross-sectional
datasets are those where we collect data on entities only once. For
example, we collect IQ and GPA information from students at one given
time (think: camera snapshot).
A longitudinal dataset is one where we collect GPA information from the
same student over time (think: video).
In cross-sectional datasets we do not need to worry about the independence
assumption; it is assumed to be met.

Normality: We draw a histogram of the residuals and examine whether they look
normal. If the residuals are not skewed, the assumption is satisfied.

Equality of variance: We also use a scatter plot to check equality of variance.
The scatter plot should have y on the vertical axis and ZRESID (standardized
residuals) on the horizontal axis. If the residuals do not fan out in a triangular
fashion, the equal variance assumption is met. A computational sketch of these
checks is given at the end of this section.

NB: What does the general linear model mean?
Linearity in the unknown parameters, the β's, which are fixed constants.
The X's could be squared, cubed or exponentiated; it does not matter.
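As an illustration of these checks, here is a minimal Python sketch (assuming numpy and matplotlib are available; it uses the stirring-rate data from Example 1 below). The first panel is the y-versus-ZRESID plot described above, the second is a histogram of the residuals, and the third plots residuals against fitted values, a common alternative view of the equal-variance check (see also Section 10).

# Minimal sketch of the assumption checks (illustrative; assumes numpy and matplotlib).
import numpy as np
import matplotlib.pyplot as plt

# Stirring-rate data from Example 1 below.
x = np.array([20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42], dtype=float)
y = np.array([8.4, 9.5, 11.8, 10.4, 13.3, 14.8, 13.2, 14.7,
              16.4, 16.5, 18.9, 18.5])

b1, b0 = np.polyfit(x, y, 1)          # least squares slope and intercept
fitted = b0 + b1 * x
resid = y - fitted
z_resid = resid / resid.std(ddof=1)   # simple standardization, used here as a stand-in for ZRESID

fig, ax = plt.subplots(1, 3, figsize=(12, 3.5))
ax[0].scatter(z_resid, y)             # linearity: y (vertical) vs standardized residuals (horizontal)
ax[0].set(xlabel="ZRESID", ylabel="y", title="Linearity")
ax[1].hist(resid, bins=6)             # normality: histogram of residuals should not be skewed
ax[1].set(title="Normality of residuals")
ax[2].scatter(fitted, resid)          # equal variance: no fanning-out pattern
ax[2].axhline(0, color="grey")
ax[2].set(xlabel="fitted value", ylabel="residual", title="Homoscedasticity")
plt.tight_layout()
plt.show()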

5 Simple linear regression model


We wish to determine the relationship between a single regressor variable X
and a response variable Y.
The regressor variable X is usually assumed to be a continuous variable that is
controllable by the experimenter. Now, the expected value of Y for each value of
X is

E(Y | X) = β0 + β1 X    (1)

where the parameters of the straight line, β0 and β1, are unknown constants. We
assume that each observation Y can be described by the model

Y = β0 + β1 X + ε    (2)

where ε is a random error with mean zero and variance σ², i.e. ε ∼ N(0, σ²).

NB: If ε is a random variable, so is Y.

The {εj} are also assumed to be uncorrelated random variables.
The regression model (2) involving only a single regressor variable X is often
called the simple linear regression model.
If we have n pairs of data (Y1 , X1 ), (Y2 , X2 ), ...(Yn , Xn ), we may estimate the
model parameters β0 and β1 by least squares.
From equation (2), we may write

Yj = β0 + β1 Xj + εj ,   j = 1, 2, ..., n

and the least squares function is

L = Σ_{j=1}^{n} εj² = Σ_{j=1}^{n} (Yj − β0 − β1 Xj)²    (3)

Minimising the least squares function is simplified if we write the model equation
(2) as

Y = β0′ + β1 (X − X̄) + ε    (4)

where

X̄ = (1/n) Σ_{j=1}^{n} Xj

and

β0′ = β0 + β1 X̄

Equation (4) is frequently called the transformed simple linear regression
model or simply the transformed model.
Employing the transformed model, the least squares function becomes

L = Σ_{j=1}^{n} [Yj − β0′ − β1 (Xj − X̄)]²    (5)

The least squares estimators of β0′ and β1, say β̂0′ and β̂1, must satisfy

∂L/∂β0′ |_{β̂0′, β̂1} = −2 Σ_{j=1}^{n} [Yj − β̂0′ − β̂1 (Xj − X̄)] = 0

∂L/∂β1 |_{β̂0′, β̂1} = −2 Σ_{j=1}^{n} (Xj − X̄) [Yj − β̂0′ − β̂1 (Xj − X̄)] = 0

Simplifying these two equations yields

n β̂0′ = Σ_{j=1}^{n} Yj

⇒ β̂0′ = (1/n) Σ_{j=1}^{n} Yj = Ȳ    (6)

β̂1 Σ_{j=1}^{n} (Xj − X̄)² = Σ_{j=1}^{n} Yj (Xj − X̄)    (7)

Equations (6) and (7) are called the least squares normal equations and
the solutions are:

β̂0′ = (1/n) Σ_{j=1}^{n} Yj = Ȳ    (8)

β̂1 = Σ_{j=1}^{n} Yj (Xj − X̄) / Σ_{j=1}^{n} (Xj − X̄)²    (9)

β̂0′ and β̂1 are the least squares estimators of the intercept and slope respectively.
The fitted simple linear regression model is

Ŷ = β̂0′ + β̂1 (X − X̄)    (10)

If we wish to represent our results in terms of the original intercept, β0, then

β̂0 = β̂0′ − β̂1 X̄

and the fitted model is

Ŷ = β̂0 + β̂1 X

In equation (9), let

Sxx = Σ_{j=1}^{n} (Xj − X̄)² = Σ_{j=1}^{n} Xj² − (Σ_{j=1}^{n} Xj)²/n    (11)

and

Sxy = Σ_{j=1}^{n} Yj (Xj − X̄) = Σ_{j=1}^{n} Xj Yj − (Σ Xj)(Σ Yj)/n    (12)

Sxx is called the corrected sum of squares of X.

Sxy is the corrected sum of cross-products of X and Y.
Equations (11) and (12) are the usual computational formulas. Now

β̂1 = Sxy / Sxx    (13)
Example 1
A study was made to determine the effect of stirring rate on the amount of
impurity in paint produced by a chemical process. The study yielded the following
data.

Stirring rate (rpm), x:   20   22   24   26   28   30   32   34   36   38   40   42
Impurity (%), y:         8.4  9.5 11.8 10.4 13.3 14.8 13.2 14.7 16.4 16.5 18.9 18.5

A scatter diagram is very important in identifying the relationship between
two variables.
The model

Y = β0 + β1 X + ε

is proposed and the following quantities are computed:

n = 12 ;  Σ_{j=1}^{12} Xj = 372 ;  Σ_{j=1}^{12} Yj = 166.4

Σ_{j=1}^{12} Xj² = 12104 ;  Ȳ = 13.87 ;  X̄ = 31

Σ_{j=1}^{12} Yj² = 2435.14 ;  Σ_{j=1}^{12} Yj Xj = 5419.60

Sxx = Σ_{j=1}^{12} Xj² − (Σ Xj)²/n = 12104 − (372)²/12 = 572

Sxy = Σ_{j=1}^{12} Xj Yj − (Σ Xj)(Σ Yj)/n = 5419.60 − (372)(166.4)/12 = 261.20

Thus

β̂1 = Sxy / Sxx = 261.20 / 572 = 0.4566

β̂0′ = Ȳ = 13.8667

and the fitted model is

Ŷ = β̂0′ + β̂1 (X − X̄) = 13.8667 + 0.4566(X − 31)

If we express the model in terms of the original intercept, then

β̂0 = β̂0′ − β̂1 X̄ = 13.8667 − (0.4566)(31) = −0.2879

and since Ŷ = β̂0 + β̂1 X, we have

Ŷ = −0.2879 + 0.4566X

N.B: The residuals ej = Yj − Ŷj, where the Ŷj are the fitted values, are useful in
examining the adequacy of the least squares fit.
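The following minimal Python sketch (assuming numpy is available) reproduces the computations of Example 1: Sxx, Sxy, the slope, the two intercepts and the residuals.

# Sketch of the least-squares computations in Example 1 (assumes numpy).
import numpy as np

x = np.array([20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42], dtype=float)
y = np.array([8.4, 9.5, 11.8, 10.4, 13.3, 14.8, 13.2, 14.7,
              16.4, 16.5, 18.9, 18.5])
n = len(x)

Sxx = np.sum(x**2) - x.sum()**2 / n          # 572.0
Sxy = np.sum(x * y) - x.sum() * y.sum() / n  # 261.20

b1 = Sxy / Sxx                 # slope ~ 0.4566
b0_prime = y.mean()            # transformed intercept = Ybar ~ 13.8667
b0 = b0_prime - b1 * x.mean()  # original intercept ~ -0.2879

y_hat = b0 + b1 * x            # fitted values
residuals = y - y_hat          # e_j = Y_j - Yhat_j
print(f"fitted model: Y^ = {b0:.4f} + {b1:.4f} X")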

6 Bias and variance properties of the estimators

Consider the expected value of β̂1:

E(β̂1) = E(Sxy / Sxx)

       = (1/Sxx) E[ Σ_{j=1}^{n} Yj (Xj − X̄) ]

       = (1/Sxx) E[ Σ_{j=1}^{n} (β0′ + β1 (Xj − X̄) + εj)(Xj − X̄) ]

       = (1/Sxx) { E[ β0′ Σ_{j=1}^{n} (Xj − X̄) ] + E[ β1 Σ (Xj − X̄)² ] + E[ Σ εj (Xj − X̄) ] }

But Σ_{j=1}^{n} (Xj − X̄) = 0 and E(εj) = 0, so

E(β̂1) = (1/Sxx) β1 Sxx

⇒ E(β̂1) = β1

Thus β̂1 is an unbiased estimator of the true slope β1.

Variance of β̂1
We have assumed that V(εj) = σ², so it follows that V(Yj) = σ². Since β̂1 =
Σ (Xj − X̄) Yj / Sxx is a linear combination of the Yj, it follows that

V(β̂1) = σ² / Sxx

Finding the estimate of σ²

This estimate can be obtained from the residuals ej = Yj − Ŷj.
The sum of squares of the residuals, or the residual sum of squares, is

SSE = Σ_{j=1}^{n} ej² = Σ_{j=1}^{n} (Yj − Ŷj)²    (14)

A more convenient computing formula for SSE may be found by substituting
the estimated model

Ŷj = Ȳ + β̂1 (Xj − X̄)

into equation (14), giving

SSE = Σ_{j=1}^{n} [Yj − Ȳ − β̂1 (Xj − X̄)]²

    = Σ_{j=1}^{n} [Yj² + Ȳ² + β̂1² (Xj − X̄)² − 2Ȳ Yj − 2β̂1 Yj (Xj − X̄) + 2β̂1 Ȳ (Xj − X̄)]    (15)

Note that

Σ_{j=1}^{n} 2Ȳ Yj = 2nȲ²    (i)

β̂1² Sxx = β̂1 (Sxy / Sxx) Sxx = β̂1 Sxy    (ii)

Σ_{j=1}^{n} 2β̂1 Ȳ (Xj − X̄) = 0    (iii)

Equation (15) becomes

SSE = Σ_{j=1}^{n} Yj² − nȲ² − β̂1 Sxy

But

Σ_{j=1}^{n} Yj² − nȲ² = Σ_{j=1}^{n} (Yj − Ȳ)² = Syy

i.e. the corrected sum of squares of the Y's. Thus, the sum of squares of the
residuals becomes

SSE = Syy − β̂1 Sxy    (16)

By taking the expectation of SSE, it can be shown that

E(SSE) = (n − 2)σ²

Therefore

σ̂² = SSE / (n − 2) ≡ MSE    (17)

is an unbiased estimator of σ².
MSE is the error or residual mean square.

Task
Prove (17).
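The task above asks for an algebraic proof; purely as a numerical illustration (not a proof), the following Python sketch simulates many samples from an assumed model (the true β0, β1 and σ chosen here are arbitrary) and shows that β̂1 and MSE average out close to β1 and σ².

# Simulation sketch: the averages of beta1_hat and MSE over many samples
# should be close to the true slope and to sigma^2 (illustrative values assumed).
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 0.5, 1.0          # assumed "true" values
x = np.linspace(0, 10, 20)
n = len(x)

b1_hats, mses = [], []
for _ in range(5000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    Sxx = np.sum((x - x.mean())**2)
    Sxy = np.sum((x - x.mean()) * y)
    b1 = Sxy / Sxx
    b0p = y.mean()                           # transformed intercept
    sse = np.sum((y - b0p - b1 * (x - x.mean()))**2)
    b1_hats.append(b1)
    mses.append(sse / (n - 2))               # MSE, the estimator of sigma^2

print(np.mean(b1_hats))   # ~ 0.5  (unbiased for beta1)
print(np.mean(mses))      # ~ 1.0  (unbiased for sigma^2)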

Remark
• Regression models should never be used for extrapolation.
• Regression relationships are valid only for values of the regressor variable
within the range of the original data.
• As we move beyond the original range of X, we become less certain about
the validity of the assumed model.

7 Hypothesis testing in simple linear regression
To test hypotheses about the slope and intercept of the regression model, we
make an additional assumption about the error terms, namely

εj ∼ N(0, σ²)

i.e. they are independent and normally distributed with mean zero and variance σ².

Slope
Suppose the experimenter wishes to test the hypothesis that the slope equals
some value, for example β1,0. The appropriate hypotheses are:

H0 : β1 = β1,0
H1 : β1 ≠ β1,0    (18)

If the εj are N(0, σ²), then the Yj are N(β0 + β1 Xj, σ²).
Consequently β̂1 is N(β1, σ²/Sxx).
Also, β̂1 is independent of MSE.
Then, as a result of the normality assumption, the statistic

t0 = (β̂1 − β1,0) / √(MSE / Sxx)    (19)

follows a t distribution with n − 2 degrees of freedom.

We would reject H0 if

| t0 | > t_{α/2, n−2}    (20)

where t0 is computed from equation (19).

Intercept
To test the hypotheses

H0 : β0 = β0,0
H1 : β0 ≠ β0,0    (21)

we would use the statistic

t0 = (β̂0 − β0,0) / √( MSE (1/n + X̄²/Sxx) )    (22)

and reject the null hypothesis if

| t0 | > t_{α/2, n−2}
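As an illustration of equations (19) and (22), the following Python sketch (assuming numpy and scipy are available) computes the two t statistics for the Example 1 quantities, with β1,0 = 0 and β0,0 = 0.

# Sketch: t statistics for H0: beta1 = 0 and H0: beta0 = 0 using the
# Example 1 quantities (scipy is used only for the critical value).
import numpy as np
from scipy import stats

n, Sxx, x_bar = 12, 572.0, 31.0
b0, b1, MSE = -0.2879, 0.4566, 0.847

t_slope = b1 / np.sqrt(MSE / Sxx)                         # ~ 11.9
t_intercept = b0 / np.sqrt(MSE * (1/n + x_bar**2 / Sxx))  # small in magnitude
t_crit = stats.t.ppf(0.975, n - 2)                        # t_{0.025,10} ~ 2.228

print(t_slope, t_intercept, t_crit)
print("reject H0: beta1 = 0" if abs(t_slope) > t_crit else "fail to reject")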

Important case
A very important special case of the hypotheses in equation (18) is

H0 : β1 = 0
H1 : β1 ≠ 0    (23)

The hypothesis H0 : β1 = 0 relates to the significance of regression.

Failing to reject H0 is equivalent to concluding that there is no linear
relationship between X and Y, that is, the best estimator of Yj
for any Xj is Ŷj = Ȳ.

This means either that there is no causal relationship between X and Y, or that
the true relationship is not linear.
The test procedure for H0 : β1 = 0 is developed from two approaches.

Partitioning of the total corrected sum of squares for Y:

Syy = Σ_{j=1}^{n} (Yj − Ȳ)²

    = Σ_{j=1}^{n} (Ŷj − Ȳ)² + Σ_{j=1}^{n} (Yj − Ŷj)²    (24)

= (variability accounted for by the regression line) + (residual variation unexplained
by the regression line)

Also,

SSE = Σ_{j=1}^{n} (Yj − Ŷj)²

is the error or residual sum of squares and

SSR = Σ_{j=1}^{n} (Ŷj − Ȳ)²

is the regression sum of squares. Equation (24) may be written as:

Syy = SSR + SSE    (25)

From equation (16), i.e. SSE = Syy − β̂1 Sxy,

we obtain the computing formula for SSR as

SSR = β̂1 Sxy

Syy has n − 1 degrees of freedom.
SSR has 1 degree of freedom and
SSE has n − 2 degrees of freedom.

Thus, if H0 : β1 = 0 is true, the test statistic

F0 = (SSR / 1) / (SSE / (n − 2)) = MSR / MSE    (26)

follows the F distribution with 1 and n − 2 degrees of freedom, and we would
reject H0 if F0 > F_{α, 1, n−2}.

The test procedure is usually arranged in an analysis of variance (ANOVA) table.

8 Analysis of variance for testing significance of regression

Source of variation   Sum of squares         Degrees of freedom   Mean square   F0
Regression            SSR = β̂1 Sxy           1                    MSR           MSR/MSE
Error or Residual     SSE = Syy − β̂1 Sxy     n − 2                MSE
Total                 Syy                    n − 1

Remark: A test for significance of regression may also be developed from equation
(19) with β1,0 = 0, namely

t0 = β̂1 / √(MSE / Sxx)    (27)

By squaring both sides of equation (27), we obtain

t0² = β̂1² Sxx / MSE = β̂1 Sxy / MSE = MSR / MSE    (28)

Note that t0² in equation (28) is identical to F0 in equation (26).

It is true in general that the square of a t random variable with f degrees of
freedom is an F random variable with one and f degrees of freedom in the
numerator and denominator respectively.

Example 2
For the data given in Example 1, test for the significance of regression using the
fitted model

Ŷ = −0.2879 + 0.4566X

Solution
Syy = Σ_{j=1}^{n} Yj² − (Σ_{j=1}^{n} Yj)²/n

    = 2435.14 − (166.4)²/12 = 127.73

The regression sum of squares is

SSR = β̂1 Sxy = (0.4566)(261.20) = 119.26

Thus, the error sum of squares is

SSE = Syy − SSR

    = 127.73 − 119.26 = 8.47

The analysis of variance for testing H0 : β1 = 0 is summarized in the table
below.

ANOVA For Testing Significance of Regression

Source of variation   Sum of squares   Degrees of freedom   Mean square   F0
Regression            119.26           1                    119.26        140.80
Residual              8.47             10                   0.847
Total                 127.73           11

H0 : β1 = 0
H1 : β1 ≠ 0
F(0.01, 1, 10) = 10
Since F0 > F(0.01, 1, 10), we reject H0 and conclude that β1 ≠ 0.

N.B: The error mean square (residual mean square) is the estimate of σ².
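The ANOVA computations of Example 2 can be reproduced with the following Python sketch (assuming numpy and scipy are available; scipy is used only for the critical value).

# Sketch of the Example 2 ANOVA computations.
import numpy as np
from scipy import stats

n = 12
Syy, Sxy, b1 = 127.73, 261.20, 0.4566

SSR = b1 * Sxy            # ~ 119.26, 1 degree of freedom
SSE = Syy - SSR           # ~ 8.47,  n - 2 = 10 degrees of freedom
MSR, MSE = SSR / 1, SSE / (n - 2)
F0 = MSR / MSE            # ~ 140.8

F_crit = stats.f.ppf(1 - 0.01, 1, n - 2)   # F_{0.01,1,10}
print(F0, F_crit, F0 > F_crit)             # reject H0: beta1 = 0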

9 Interval estimation in simple linear regression


In addition to point estimates of the slope and intercept, it is possible to obtain
interval estimates of these parameters.
If the εj are normally and independently distributed, then

(β̂1 − β1) / √(MSE / Sxx)

and

(β̂0 − β0) / √( MSE (1/n + X̄²/Sxx) )

are both distributed as t with n − 2 degrees of freedom. Then

a 100(1 − α)% confidence interval on β1 is given by:

β̂1 ± t_{α/2, n−2} √(MSE / Sxx)    (29)

Similarly, a 100(1 − α)% confidence interval on β0 is given by:

β̂0 ± t_{α/2, n−2} √( MSE (1/n + X̄²/Sxx) )    (30)

From Example 1, a 95% confidence interval for β1 is given by (from equation (29)):

β̂1 ± t_{α/2, n−2} √(MSE / Sxx)

= 0.4566 ± (2.228) √(0.847 / 572.0)

= 0.4566 ± 0.0857

or

0.3709 ≤ β1 ≤ 0.5423
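A Python sketch of this interval calculation (assuming numpy and scipy are available):

# Sketch: 95% confidence interval for the slope from Example 1.
import numpy as np
from scipy import stats

n, Sxx = 12, 572.0
b1, MSE = 0.4566, 0.847

t_val = stats.t.ppf(0.975, n - 2)            # ~ 2.228
half_width = t_val * np.sqrt(MSE / Sxx)      # ~ 0.0857
print(b1 - half_width, b1 + half_width)      # ~ (0.371, 0.542)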

Exercise
Find the 95% confidence interval for β0 using the data from Example 1.
ANS: (−3.033 ≤ β0 ≤ 2.4375)

10 Model Adequacy Checking: Residual Analysis

As in fitting any linear model, analysis of the residuals from a regression model
is necessary to determine the adequacy of the least squares fit.
It is helpful to examine:

(i) A normal probability plot (test for normality)
• If the plot resembles a straight line, the normality assumption is reasonable.
• A histogram of the residuals can also be plotted.

(ii) Residuals versus fitted values (test of independence)
• This is adequate to test for independence of the residuals.
• The plot must be structureless.

(iii) Test for constant mean and variance of the residuals
• Plotting the residuals against the regressor variable (or against the order of the data)
is best for checking homogeneity of the mean and variance of the residuals.
• The plot of the residuals against the regressor should show that the mean
stays close to zero with a relatively constant variance. (missing diagram)

11 The lack-of-fit test


• Regression models are often fitted to data when the true functional relationship
is unknown.
• Naturally, we would like to know whether the order of the model tentatively
assumed is correct.
• We present a test for the "goodness of fit" of a regression model.
• The diagram below illustrates the use of a regression model that is a
poor approximation of the true functional relationship.

(missing diagram)

• A polynomial of degree two or greater should have been used for this
hypothetical situation.
• The procedure generalises easily to k regressor variables.
• The hypotheses we wish to test are:

H0 : The model adequately fits the data.
H1 : The model does not fit the data.

• The test involves partitioning the error or residual sum of squares into the
following two components:

SSE = SSPE + SSLOF

where SSPE is the sum of squares attributable to "pure" experimental error,
and SSLOF is the sum of squares attributable to the lack of fit of the model.

• To compute SSPE, we require repeated observations on Y for at least one level of
X, i.e.

Y11, Y12, Y13, ..., Y1n1 = repeated observations at X1
Y21, Y22, Y23, ..., Y2n2 = repeated observations at X2
.
.
.
Ym1, Ym2, Ym3, ..., Ymnm = repeated observations at Xm

• We see that there are m distinct levels of X.
• The contribution to the pure error sum of squares at X1, say, would be

Σ_{u=1}^{n1} (Y1u − Ȳ1)²    (31)

The total sum of squares for pure error is obtained by summing
equation (31) over all levels of X as:

SSPE = Σ_{j=1}^{m} Σ_{u=1}^{nj} (Yju − Ȳj)²

• There are n − m degrees of freedom associated with the pure-error sum of squares.
• The sum of squares for lack of fit is simply:

SSLOF = SSE − SSPE

with n − 2 − ne = m − 2 degrees of freedom, where ne = Σ_{j=1}^{m} (nj − 1) = n − m.
• The test statistic for lack of fit would then be:

F0 = (SSLOF / (m − 2)) / (SSPE / (n − m)) = MSLOF / MSPE    (32)

and we would reject the hypothesis of model adequacy if

F0 > F_{α, m−2, n−m}

Remark
• This test procedure may be easily introduced into the analysis of variance
conducted for the significance of regression.
• If the null hypothesis of model adequacy is rejected, then the model must
be abandoned and attempts must be made to find a more appropriate
model.

• If H0 is not rejected, then there is no apparent reason to doubt the adequacy
of the model.

*** If the lack of fit is not significant, MSPE and MSLOF are often combined (pooled) to estimate σ².

Example 3
Given the data below,
(i) Carry out the lack-of-fit test at the 25% level of significance.

(ii) Test the significance of the model at the 5% level of significance.

X: 1.0 1.0 2.0 3.3 3.3 4.0 4.0 4.0 4.7 5.0 5.6 5.6 5.6 6.0 6.0 6.5 6.9
Y: 2.3 1.8 2.8 1.8 3.7 2.6 2.6 2.2 3.2 2.0 3.5 2.8 2.1 3.4 3.2 3.4 5.0

Solution
Syy = 10.96 ;  Sxy = 13.62 ;  Sxx = 52.32
ȳ = 2.847 ;  x̄ = 4.382
The regression model is ŷ = 1.708 + 0.260x

SSR = β̂1 Sxy = (0.260)(13.62) = 3.541

The pure-error sum of squares is computed as follows:

Level of x   Σ(y − ȳ)²   Degrees of freedom
1.0          0.1250      1
3.3          1.8050      1
4.0          0.1066      2
5.6          0.9800      2
6.0          0.0200      1
Totals       3.0366      7

N.B:
We have 17 data points, so n = 17, of which m = 10 are distinct values of x.
Thus n − m = 17 − 10 = 7. The degrees of freedom for the various sources are as follows:

Source of variation   Degrees of freedom
Regression            1
Residual error        n − 2
Lack of fit           m − 2
Pure error            n − m
Total                 n − 1

Analysis of variance

Source of variation   Sum of squares   Degrees of freedom   Mean square   F0
Regression            3.541            1                    3.541         7.15
Residual              7.429            15                   0.4952
(Lack of fit)         4.3924           8                    0.5491        1.27
(Pure error)          3.0366           7                    0.4338
Total                 10.970           16

(i) H0 : The model fits the data.
    H1 : The model does not fit the data.

Test statistic:

F0 = MSLOF / MSPE = 0.5491 / 0.4338 = 1.27

Critical region:

Fc = F_{0.25, 8, 7} = 1.70

Now, F0 < Fc (1.27 < 1.70).
Therefore we fail to reject the null hypothesis that the tentative model
adequately describes the data at the 25% level of significance.

(ii) H0 : β1 = 0
     H1 : β1 ≠ 0

Test statistic:

F0 = MSR / MSE = 3.541 / 0.4952 = 7.15

Critical region:

Fc = F_{0.05, 1, 15} = 4.54

Since F0 > Fc (7.15 > 4.54), we reject H0 at the 5% level of significance and
conclude that β1 ≠ 0.
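The following Python sketch (assuming numpy is available) reproduces the pure-error and lack-of-fit computations of Example 3 directly from the data.

# Sketch of the Example 3 lack-of-fit computations.
import numpy as np

x = np.array([1.0, 1.0, 2.0, 3.3, 3.3, 4.0, 4.0, 4.0, 4.7,
              5.0, 5.6, 5.6, 5.6, 6.0, 6.0, 6.5, 6.9])
y = np.array([2.3, 1.8, 2.8, 1.8, 3.7, 2.6, 2.6, 2.2, 3.2,
              2.0, 3.5, 2.8, 2.1, 3.4, 3.2, 3.4, 5.0])
n = len(x)

Sxx = np.sum((x - x.mean())**2)
Sxy = np.sum((x - x.mean()) * y)
Syy = np.sum((y - y.mean())**2)
b1 = Sxy / Sxx
SSE = Syy - b1 * Sxy                        # residual sum of squares, n - 2 df

# Pure error: squared deviations of y about its mean at each repeated x level.
levels = np.unique(x)
m = len(levels)
SSPE = sum(np.sum((y[x == lv] - y[x == lv].mean())**2) for lv in levels)
SSLOF = SSE - SSPE

MSLOF = SSLOF / (m - 2)                     # 8 df here
MSPE = SSPE / (n - m)                       # 7 df here
F0 = MSLOF / MSPE                           # ~ 1.27, compare with F_{0.25,8,7}
print(SSPE, SSLOF, F0)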

12 The Coefficient of Determination

The quantity

R² = SSR / Syy = Σ_{j=1}^{n} (ŷj − ȳ)² / Σ_{j=1}^{n} (yj − ȳ)²    (33)

is called the coefficient of determination and is often used to judge the
adequacy of a regression model (0 ≤ R² ≤ 1).
• We often refer loosely to R² as the proportion of variability in the data
explained or accounted for by the regression model.

• If the regressor x is a random variable, so that y and x may be viewed as
jointly distributed random variables, then R is just the simple correlation
between y and x.
• In Example 1, we have

R² = SSR / Syy = 119.26 / 127.73 = 0.9337

that is, 93.37% of the variability in the data is accounted for by the model.
Alternatively, this can be written as:

R² = 1 − SSE / Syy

The range of R² is 0 ≤ R² ≤ 1.
– If R² = 1, we say that the fitted model is perfect, that is, all residuals
are zero. What is an acceptable value of R²?
– This depends on the scientific field from which the data are collected.
E.g. a chemist charged with doing a linear calibration of a high
precision piece of equipment would be happy with a very high value
of R², say 0.999.
– A behavioural scientist, collecting data that reflect human behaviour,
would be content to get an R² of 0.7.
• Normally, values of R² ≥ 0.80 are considered to indicate a good fit.
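A small Python sketch of the two equivalent computations of R² for Example 1:

# Sketch: coefficient of determination for Example 1 (plain Python).
SSR, SSE, Syy = 119.26, 8.47, 127.73

R2 = SSR / Syy          # ~ 0.9337
R2_alt = 1 - SSE / Syy  # same value, computed from the residual sum of squares
print(R2, R2_alt)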

13 Confidence Interval About The Regression Line

• A confidence interval may be constructed for the mean response at a specified
X, for example X0.
• This is a confidence interval about E(Y | X0) and is often called a confidence
interval about the regression line.
Since E(Y | X0) = β0′ + β1 (x0 − x̄), we may obtain a point estimate of
E(Y | X0) from the fitted model as:

Ê(Y | X0) ≡ Ŷ0 = β̂0′ + β̂1 (x0 − x̄)

• It is clear that E(Ŷ0) = β0′ + β1 (x0 − x̄), since β̂0′ and β̂1 are unbiased, and
furthermore that

Var(Ŷ0) = σ² (1/n + (x0 − x̄)²/Sxx)

• Also, Ŷ0 is normally distributed, since β̂0′ and β̂1 are normally distributed and
Cov(β̂0′, β̂1) = 0 (prove this).

A 100(1 − α)% confidence interval about the true regression line at X = X0
may be computed from:

ŷ0 ± t_{α/2, n−2} √( MSE (1/n + (x0 − x̄)²/Sxx) )    (34)

Example 4
Construct a 95% confidence interval about the regression line for the data in
Example 1 at x0 = 26, where ŷ0 = −0.2879 + 0.4566 x0.

Solution
At x0 = 26,

ŷ0 = −0.2879 + 0.4566(26) = 11.5837

Therefore

ŷ0 ± 2.228 √( (0.847)(1/12 + (26 − 31)²/572.00) )

11.5837 − 0.73 ≤ E(Y | X0 = 26) ≤ 11.5837 + 0.73

or

10.85 ≤ E(Y | X0 = 26) ≤ 12.31
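A Python sketch of the Example 4 calculation (assuming numpy and scipy are available):

# Sketch: 95% confidence interval about the regression line at x0 = 26,
# using the Example 1 quantities.
import numpy as np
from scipy import stats

n, Sxx, x_bar, MSE = 12, 572.0, 31.0, 0.847
b0, b1 = -0.2879, 0.4566

x0 = 26.0
y0_hat = b0 + b1 * x0                                    # ~ 11.58
half = stats.t.ppf(0.975, n - 2) * np.sqrt(
    MSE * (1/n + (x0 - x_bar)**2 / Sxx))                 # ~ 0.73
print(y0_hat - half, y0_hat + half)                      # ~ (10.85, 12.31)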

14 Prediction Interval
A prediction interval is an estimate of an interval in which a future observation
will fall, with a certain probability, given what has already been observed.
Prediction intervals are often used in regression analysis.
Another useful concept in simple linear regression is the prediction interval, an
interval estimate on the mean of k future observations at a particular value of
X, say X0 .
A 100(1 − α)% prediction interval on the mean of k future observations at X0
is

ŷ0 ± t_{α/2, n−2} √( MSE (1/k + 1/n + (x0 − x̄)²/Sxx) )    (35)
Remark
• the prediction interval is of minimum width at X0 = X̄ and widens as
| X0 − X̄ | increases.
• If k = 1, then equation 35 yields a prediction interval on a single future
observation at X0 .

• By comparing equation 35 with equation 34, we observe that the


prediction interval at X0 is wider than the confidence interval at X0 .

Example 5
Using the data in Example 1, find a 95% prediction interval on
the mean impurity of the next two batches of paint produced at X0 = 34.

Solution
Now we have:

15.2365 ± 2.228 √( (0.847)(1/2 + 1/12 + (34 − 31)²/572.00) )

This calculation yields

15.2365 ± 1.5870

Thus the 95% prediction interval for k = 2 at X0 = 34 is

13.6495 ≤ ȳ0 ≤ 16.8235
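A Python sketch of the Example 5 calculation (assuming numpy and scipy are available):

# Sketch: 95% prediction interval on the mean of k = 2 future observations
# at x0 = 34, using the Example 1 quantities.
import numpy as np
from scipy import stats

n, Sxx, x_bar, MSE = 12, 572.0, 31.0, 0.847
b0, b1 = -0.2879, 0.4566

k, x0 = 2, 34.0
y0_hat = b0 + b1 * x0                                    # ~ 15.24
half = stats.t.ppf(0.975, n - 2) * np.sqrt(
    MSE * (1/k + 1/n + (x0 - x_bar)**2 / Sxx))           # ~ 1.59
print(y0_hat - half, y0_hat + half)                      # ~ (13.65, 16.82)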

15 Matrix form: Simple linear regression


Writing the simple linear regression model for each observation, we have

yi = β0 + β1 xi + εi ,   i = 1, 2, ..., N    (36)

Then, in matrix form, we have:

Y = Xβ + ε ,   ε ∼ N(0, σ²I)    (37)

X is the design matrix, and the columns of X are a column of 1's and a column
of the xi.
We need to estimate β, which has two components:

β0 − intercept ,   β1 − slope

For N individual subjects, we have

y1 = β0 + β1 x1 + ε1
y2 = β0 + β1 x2 + ε2
y3 = β0 + β1 x3 + ε3
.
.
yN = β0 + β1 xN + εN

In matrix notation this is written as

[ y1 ]   [ 1  x1 ]            [ ε1 ]
[ y2 ]   [ 1  x2 ]            [ ε2 ]
[ y3 ] = [ 1  x3 ]  [ β0 ]  + [ ε3 ]
[ .  ]   [ .  .  ]  [ β1 ]    [ .  ]
[ yN ]   [ 1  xN ]            [ εN ]

  Y          X        β          ε

Therefore we have to solve for β:

[ N       Σ_{i=1}^{N} xi  ]  [ β0 ]   [ Σ_{i=1}^{N} yi    ]
[ Σ xi    Σ_{i=1}^{N} xi² ]  [ β1 ] = [ Σ_{i=1}^{N} xi yi ]

        XᵀX                    β             Xᵀy
Equivalently we could write them as:

N β0 + β1 Σ_{i=1}^{N} xi = Σ_{i=1}^{N} yi    (38)

β0 Σ_{i=1}^{N} xi + β1 Σ_{i=1}^{N} xi² = Σ_{i=1}^{N} xi yi    (39)

And the estimates are given by the following equations:

β̂0 = ȳ − β̂1 x̄    (40)

β̂1 = Σ_{i=1}^{N} (xi − x̄) yi / Σ_{i=1}^{N} (xi − x̄)²    (41)

In matrix form, the estimate of β is given by the equation:

β̂ = (XᵀX)⁻¹ XᵀY

Going through the math and deriving these quantities, XᵀX β is:

[ 1   1   1  ...  1  ]  [ 1  x1 ]            [ N      Σ_{i=1}^{N} xi  ]  [ β0 ]
[ x1  x2  x3 ...  xN ]  [ 1  x2 ]  [ β0 ]  = [ Σ xi   Σ_{i=1}^{N} xi² ]  [ β1 ]
                        [ .  .  ]  [ β1 ]
                        [ 1  xN ]

and XᵀY is:

[ 1   1   1  ...  1  ]  [ y1 ]     [ Σ_{i=1}^{N} yi    ]
[ x1  x2  x3 ...  xN ]  [ y2 ]  =  [ Σ_{i=1}^{N} xi yi ]
                        [ .  ]
                        [ yN ]
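The following Python sketch (assuming numpy is available) forms the design matrix for the Example 1 data and solves the normal equations; np.linalg.solve is applied to XᵀX rather than forming the inverse explicitly.

# Sketch of the matrix solution beta_hat = (X'X)^(-1) X'Y for the Example 1 data.
import numpy as np

x = np.array([20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42], dtype=float)
y = np.array([8.4, 9.5, 11.8, 10.4, 13.3, 14.8, 13.2, 14.7,
              16.4, 16.5, 18.9, 18.5])

X = np.column_stack([np.ones_like(x), x])   # design matrix: column of 1's, column of x
XtX = X.T @ X                               # [[N, sum x], [sum x, sum x^2]]
Xty = X.T @ y                               # [sum y, sum x*y]

beta_hat = np.linalg.solve(XtX, Xty)        # solves the normal equations
print(beta_hat)                             # ~ [-0.2879, 0.4566]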
