Mungadze Linear
In many problems, there are two or more variables that are related and it is
important to model and explore this relationship.
For example, in a chemical process, the yield of product is related to the operating temperature. It may be of interest to build a model relating yield to temperature and then use the model for prediction, process optimization or process control.
In general, suppose that there is a single dependent variable or response $Y$ that depends on $k$ independent or regressor variables, e.g. $X_1, X_2, \ldots, X_k$.
The relationship between these variables is characterized by a mathematical model called a regression equation.
The regression model is fit to a set of sample data. In some instances, the experimenter knows the exact form of the true functional relationship between $Y$ and $X_1, X_2, \ldots, X_k$, say
\[
Y = \phi(X_1, X_2, \ldots, X_k).
\]
However, in most cases, the true functional relationship is unknown and the
experimenter chooses an appropriate function to approximate φ.
Generally, the analysis of variance in a designed experiment helps to identify
which factors are important and regression is used to build a quantitative model
relating the important factors to the response.
• Homoscedasticity: equal variance of the residuals across all levels of the predictors.
where the parameters of the straight line, $\beta_0$ and $\beta_1$, are unknown constants. We assume that each observation $Y$ can be described by the model
\[
Y = \beta_0 + \beta_1 X + \varepsilon \qquad (2)
\]
or, for the $n$ observations,
\[
Y_j = \beta_0 + \beta_1 X_j + \varepsilon_j, \qquad j = 1, 2, \ldots, n \qquad (3)
\]
Minimising the least squares function is simplified if we write the model equation (2) as
\[
Y = \beta_0' + \beta_1 (X - \bar{X}) + \varepsilon \qquad (4)
\]
where
\[
\bar{X} = \frac{1}{n}\sum_{j=1}^{n} X_j
\]
and
\[
\beta_0' = \beta_0 + \beta_1 \bar{X}.
\]
Equation (4) is frequently called the transformed simple linear regression
model or simply the transformed model.
Employing the transformed model, the least squares function becomes
\[
L = \sum_{j=1}^{n}\bigl[Y_j - \beta_0' - \beta_1 (X_j - \bar{X})\bigr]^2 \qquad (5)
\]
The least squares estimators of $\beta_0'$ and $\beta_1$, say $\hat{\beta}_0'$ and $\hat{\beta}_1$, must satisfy
\[
\left.\frac{\partial L}{\partial \beta_0'}\right|_{\hat{\beta}_0', \hat{\beta}_1}
= -2\sum_{j=1}^{n}\bigl[Y_j - \hat{\beta}_0' - \hat{\beta}_1 (X_j - \bar{X})\bigr] = 0
\]
\[
\left.\frac{\partial L}{\partial \beta_1}\right|_{\hat{\beta}_0', \hat{\beta}_1}
= -2\sum_{j=1}^{n}(X_j - \bar{X})\bigl[Y_j - \hat{\beta}_0' - \hat{\beta}_1 (X_j - \bar{X})\bigr] = 0
\]
Simplifying these two equations yields
\[
n\hat{\beta}_0' = \sum_{j=1}^{n} Y_j
\;\Rightarrow\;
\hat{\beta}_0' = \frac{1}{n}\sum_{j=1}^{n} Y_j = \bar{Y} \qquad (6)
\]
\[
\hat{\beta}_1 \sum_{j=1}^{n}(X_j - \bar{X})^2 = \sum_{j=1}^{n} Y_j (X_j - \bar{X}) \qquad (7)
\]
Equations (6) and (7) are called the least squares normal equations and the solutions are:
\[
\hat{\beta}_0' = \frac{1}{n}\sum_{j=1}^{n} Y_j = \bar{Y} \qquad (8)
\]
\[
\hat{\beta}_1 = \frac{\sum_{j=1}^{n} Y_j (X_j - \bar{X})}{\sum_{j=1}^{n}(X_j - \bar{X})^2} \qquad (9)
\]
$\hat{\beta}_0'$ and $\hat{\beta}_1$ are the least squares estimators of the intercept and slope respectively.
The fitted simple linear regression model is
\[
\hat{Y} = \hat{\beta}_0' + \hat{\beta}_1 (X - \bar{X}) = \bar{Y} + \hat{\beta}_1 (X - \bar{X}) \qquad (10)
\]
If we let
\[
S_{xx} = \sum_{j=1}^{n}(X_j - \bar{X})^2 = \sum_{j=1}^{n} X_j^2 - \frac{\bigl(\sum X_j\bigr)^2}{n} \qquad (11)
\]
and
\[
S_{xy} = \sum_{j=1}^{n} Y_j (X_j - \bar{X}) = \sum_{j=1}^{n} X_j Y_j - \frac{\bigl(\sum X_j\bigr)\bigl(\sum Y_j\bigr)}{n} \qquad (12)
\]
then
\[
\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \qquad (13)
\]
Example 1
A study was made to determine the effect of stirring rate on the amount of impurity in paint produced by a chemical process. The study yielded the following data.
From the data (with $n = 12$), $S_{xx} = 572$, $S_{xy} = 261.20$, $\bar{X} = 31$ and $\bar{Y} = 13.8667$. Thus
\[
\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{261.20}{572} = 0.4566, \qquad \hat{\beta}_0' = \bar{Y} = 13.8667,
\]
and, in terms of the original intercept $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$, the fitted model is
\[
\hat{Y} = -0.2879 + 0.4566X
\]
N.B: The residuals $e_j = Y_j - \hat{Y}_j$, where the $\hat{Y}_j$ are the fitted values, are useful in examining the adequacy of the least squares fit.
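The fit above can be checked with a short Python sketch. The data arrays below are illustrative placeholders only, since the original table for Example 1 is not reproduced here; any paired x, y vectors of equal length will work.

```python
import numpy as np

# Hypothetical stirring-rate (x) and impurity (y) values, for illustration only;
# they are NOT the data of Example 1.
x = np.array([20.0, 22.0, 24.0, 26.0, 28.0, 30.0, 32.0, 34.0, 36.0, 38.0, 40.0, 42.0])
y = np.array([8.4, 9.5, 11.8, 10.4, 13.3, 14.8, 13.2, 14.7, 16.4, 16.5, 18.9, 18.5])

n = len(x)
x_bar, y_bar = x.mean(), y.mean()

# Corrected sums of squares and cross products, equations (11) and (12)
Sxx = np.sum((x - x_bar) ** 2)
Sxy = np.sum(y * (x - x_bar))

# Slope from equation (13); intercept of the original model, beta0 = Ybar - beta1*Xbar
beta1_hat = Sxy / Sxx
beta0_hat = y_bar - beta1_hat * x_bar

# Fitted values, residuals and residual mean square
y_hat = beta0_hat + beta1_hat * x
residuals = y - y_hat
SSE = np.sum(residuals ** 2)
MSE = SSE / (n - 2)

print(f"beta0_hat = {beta0_hat:.4f}, beta1_hat = {beta1_hat:.4f}, MSE = {MSE:.4f}")
```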
6 Bias and variance properties of the estimators
Consider the expected value of $\hat{\beta}_1$:
\[
E(\hat{\beta}_1) = E\!\left(\frac{S_{xy}}{S_{xx}}\right)
= \frac{1}{S_{xx}}\,E\!\left[\sum_{j=1}^{n} Y_j (X_j - \bar{X})\right]
= \frac{1}{S_{xx}}\,E\!\left[\sum_{j=1}^{n}\bigl(\beta_0' + \beta_1 (X_j - \bar{X}) + \varepsilon_j\bigr)(X_j - \bar{X})\right]
\]
\[
= \frac{1}{S_{xx}}\left\{E\!\left[\beta_0' \sum_{j=1}^{n}(X_j - \bar{X})\right]
+ E\!\left[\beta_1 \sum_{j=1}^{n}(X_j - \bar{X})^2\right]
+ E\!\left[\sum_{j=1}^{n}\varepsilon_j (X_j - \bar{X})\right]\right\}
\]
But $\sum_{j=1}^{n}(X_j - \bar{X}) = 0$ and $E(\varepsilon_j) = 0$, so
\[
E(\hat{\beta}_1) = \frac{1}{S_{xx}}\,\beta_1 S_{xx} = \beta_1.
\]
Thus $\hat{\beta}_1$ is an unbiased estimator of the true slope $\beta_1$.
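The unbiasedness result (and the variance result that follows) can be verified numerically. Below is a minimal simulation sketch; the values of $\beta_0$, $\beta_1$, $\sigma$ and the design points are illustrative choices, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed true parameters and design points (illustrative only)
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = np.linspace(10, 40, 12)
Sxx = np.sum((x - x.mean()) ** 2)

# Simulate many samples from the model and collect the slope estimates
estimates = []
for _ in range(20_000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    estimates.append(np.sum(y * (x - x.mean())) / Sxx)

print(np.mean(estimates))                   # close to beta1 = 0.5 (unbiasedness)
print(np.var(estimates), sigma**2 / Sxx)    # close to sigma^2 / Sxx
```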
Variance of $\hat{\beta}_1$
We have assumed that $V(\varepsilon_j) = \sigma^2$; it follows that $V(Y_j) = \sigma^2$. Since the $Y_j$ are independent,
\[
V(\hat{\beta}_1) = V\!\left(\frac{S_{xy}}{S_{xx}}\right)
= \frac{1}{S_{xx}^2}\sum_{j=1}^{n}(X_j - \bar{X})^2\,V(Y_j)
= \frac{\sigma^2}{S_{xx}}.
\]
To estimate $\sigma^2$, consider the residual (error) sum of squares. Expanding
\[
SS_E = \sum_{j=1}^{n}(Y_j - \hat{Y}_j)^2 = \sum_{j=1}^{n}\bigl[Y_j - \bar{Y} - \hat{\beta}_1 (X_j - \bar{X})\bigr]^2
\]
gives
\[
SS_E = \sum_{j=1}^{n}\Bigl[Y_j^2 + \bar{Y}^2 + \hat{\beta}_1^2 (X_j - \bar{X})^2
- 2\bar{Y}Y_j - 2\hat{\beta}_1 Y_j (X_j - \bar{X}) + 2\hat{\beta}_1 \bar{Y}(X_j - \bar{X})\Bigr] \qquad (15)
\]
Note that
\[
\sum_{j=1}^{n} 2\bar{Y}Y_j = 2n\bar{Y}^2 \qquad \text{(i)}
\]
\[
\hat{\beta}_1^2 S_{xx} = \hat{\beta}_1 \frac{S_{xy}}{S_{xx}} S_{xx} = \hat{\beta}_1 S_{xy} \qquad \text{(ii)}
\]
and
\[
\sum_{j=1}^{n} 2\hat{\beta}_1 \bar{Y}(X_j - \bar{X}) = 0. \qquad \text{(iii)}
\]
Equation (15) becomes
\[
SS_E = \sum_{j=1}^{n} Y_j^2 - n\bar{Y}^2 - \hat{\beta}_1 S_{xy}.
\]
But
\[
\sum_{j=1}^{n} Y_j^2 - n\bar{Y}^2 = \sum_{j=1}^{n}(Y_j - \bar{Y})^2 = S_{yy},
\]
i.e. the corrected sum of squares of the $Y$'s. Thus, the sum of squares of the residuals becomes
\[
SS_E = S_{yy} - \hat{\beta}_1 S_{xy} \qquad (16)
\]
By taking the expectation of $SS_E$, it can be shown that
\[
E(SS_E) = (n-2)\sigma^2,
\]
and therefore
\[
\hat{\sigma}^2 = \frac{SS_E}{n-2} \equiv MS_E \qquad (17)
\]
is an unbiased estimator of $\sigma^2$.
$MS_E$ is the error or residual mean square.
Task
Prove (17).
Remark
• Regression models should never be used for extrapolation.
• Regression relationships are valid only for values of the regressor variable
within the range of original data.
• As we move beyond the original range of X, we become less certain about
the validity of the assumed model.
7 Hypothesis testing in simple linear regression
To test hypotheses about the slope and intercept of the regression model, we make an additional assumption about the error term, namely
\[
\varepsilon_j \sim N(0, \sigma^2),
\]
i.e. the errors are independent and normally distributed with mean zero and variance $\sigma^2$.
Slope
Suppose the experimenter wishes to test the hypothesis that the slope equals some value, for example $\beta_{1,0}$. The appropriate hypotheses are:
\[
H_0 : \beta_1 = \beta_{1,0}
\]
\[
H_1 : \beta_1 \neq \beta_{1,0} \qquad (18)
\]
If the $\varepsilon_j$ are $N(0, \sigma^2)$, then the $Y_j$ are $N(\beta_0 + \beta_1 X_j,\, \sigma^2)$.
Consequently $\hat{\beta}_1$ is $N\!\left(\beta_1, \dfrac{\sigma^2}{S_{xx}}\right)$.
Also, $\hat{\beta}_1$ is independent of $MS_E$.
Then, as a result of the normality assumption, the statistic
\[
t_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{\sqrt{\dfrac{MS_E}{S_{xx}}}} \qquad (19)
\]
follows the $t$ distribution with $n-2$ degrees of freedom under $H_0$, and we reject $H_0$ if $|t_0| > t_{\alpha/2,\,n-2}$.
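A sketch of this test in Python, using the summary quantities from Example 1 and scipy.stats for the $t$ distribution; taking $\beta_{1,0} = 0$ here is an illustrative choice (the significance-of-regression case discussed below).

```python
import numpy as np
from scipy import stats

# Summary quantities from Example 1; beta1_0 is the hypothesised slope under H0
beta1_hat, MSE, Sxx, n = 0.4566, 0.847, 572.0, 12
beta1_0 = 0.0

t0 = (beta1_hat - beta1_0) / np.sqrt(MSE / Sxx)
p_value = 2 * stats.t.sf(abs(t0), df=n - 2)   # two-sided test with n-2 df
print(t0, p_value)
```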
Intercept
To test the hypotheses
\[
H_0 : \beta_0 = \beta_{0,0}
\]
\[
H_1 : \beta_0 \neq \beta_{0,0} \qquad (21)
\]
we would use the statistic
\[
t_0 = \frac{\hat{\beta}_0 - \beta_{0,0}}{\sqrt{MS_E\!\left(\dfrac{1}{n} + \dfrac{\bar{X}^2}{S_{xx}}\right)}} \qquad (22)
\]
Important case
A very important special case of the hypotheses in equation (18) is
\[
H_0 : \beta_1 = 0
\]
\[
H_1 : \beta_1 \neq 0 \qquad (23)
\]
The hypothesis H0 : β1 = 0 relates to the significance of regression.
$S_{yy}$ has $n-1$ degrees of freedom, $SS_R$ has 1 degree of freedom and $SS_E$ has $n-2$ degrees of freedom. Moreover,
\[
t_0^2 = \frac{\hat{\beta}_1^2 S_{xx}}{MS_E} = \frac{\hat{\beta}_1 S_{xy}}{MS_E} = \frac{MS_R}{MS_E} \qquad (28)
\]
Example 2
For the data given in Example 1, test for the significance of regression, given the fitted model
\[
\hat{Y} = -0.2879 + 0.4566X
\]
Solution
\[
S_{yy} = \sum_{j=1}^{n} Y_j^2 - \frac{\bigl(\sum_{j=1}^{n} Y_j\bigr)^2}{n}
= 2435.14 - \frac{(166.4)^2}{12} = 127.73
\]
The regression sum of squares is
\[
SS_R = \hat{\beta}_1 S_{xy} = (0.4566)(261.20) = 119.26,
\]
so that $SS_E = S_{yy} - SS_R = 127.73 - 119.26 = 8.47$ and $MS_E = SS_E/(n-2) = 0.847$. For
\[
H_0 : \beta_1 = 0
\]
\[
H_1 : \beta_1 \neq 0
\]
the test statistic is
\[
F_0 = \frac{MS_R}{MS_E} = \frac{119.26}{0.847} \approx 140.8,
\]
and the critical value is $F_{0.01,1,10} = 10.04$.
Since $F_0 > F_{0.01,1,10}$, we reject $H_0$ and conclude that $\beta_1 \neq 0$.
N.B: The error mean square (residual mean square) is the estimate of σ 2 .
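A quick numerical check of this F test; the $SS_R$ value follows from $\hat{\beta}_1 S_{xy}$ as above, and scipy.stats supplies the F critical value.

```python
from scipy import stats

# ANOVA quantities for Example 2
SSR = 0.4566 * 261.20     # regression sum of squares, 1 df
Syy = 127.73
n = 12
SSE = Syy - SSR           # residual sum of squares, n-2 df
MSR, MSE = SSR / 1, SSE / (n - 2)

F0 = MSR / MSE
F_crit = stats.f.ppf(0.99, dfn=1, dfd=n - 2)   # F_{0.01, 1, 10}
print(F0, F_crit, F0 > F_crit)
```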
Confidence intervals on $\beta_1$ and $\beta_0$ can be constructed from the quantities
\[
\frac{\hat{\beta}_1 - \beta_1}{\sqrt{\dfrac{MS_E}{S_{xx}}}}
\qquad \text{and} \qquad
\frac{\hat{\beta}_0 - \beta_0}{\sqrt{MS_E\!\left(\dfrac{1}{n} + \dfrac{\bar{X}^2}{S_{xx}}\right)}},
\]
each of which follows the $t$ distribution with $n-2$ degrees of freedom.
From Example 1, a 95% confidence interval for $\beta_1$ is given by (from equation (29))
\[
\hat{\beta}_1 \pm t_{\alpha/2,\,n-2}\sqrt{\frac{MS_E}{S_{xx}}}
= 0.4566 \pm (2.228)\sqrt{\frac{0.847}{572.0}}
= 0.4566 \pm 0.08581
= [0.37089,\ 0.54231],
\]
or
\[
0.37089 \leq \beta_1 \leq 0.54231.
\]
Exercise
Find the 95% confidence interval for $\beta_0$ using the data from Example 1.
ANS: $-3.033 \leq \beta_0 \leq 2.4375$
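Both intervals can be reproduced as follows, with the Example 1 summary quantities and scipy.stats.t.ppf for the $t$ critical value.

```python
import numpy as np
from scipy import stats

# Summary quantities from Example 1
beta0_hat, beta1_hat = -0.2879, 0.4566
MSE, Sxx, x_bar, n = 0.847, 572.0, 31.0, 12
t_crit = stats.t.ppf(0.975, df=n - 2)   # t_{0.025, 10}

# 95% CI for the slope
half_width_slope = t_crit * np.sqrt(MSE / Sxx)
print(beta1_hat - half_width_slope, beta1_hat + half_width_slope)

# 95% CI for the intercept (the exercise above)
half_width_int = t_crit * np.sqrt(MSE * (1 / n + x_bar**2 / Sxx))
print(beta0_hat - half_width_int, beta0_hat + half_width_int)
```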
• The plot must resemble a straight line; if this is the outcome, it is sufficient to conclude that the residuals are approximately normal.
• Plotting a histogram of the residuals provides a further check of normality.
(ii) Residuals versus fitted values (test of independence)
• This plot is adequate to test for independence of the residuals.
• The plot must be structureless.
(iii) Test for constant mean and variance of the residuals.
• Plotting the residuals against the regressor variable (or against the order of the data) best checks the homogeneity of the mean and variance of the residuals.
• The plot of the residuals against the regressor should show that the mean stays close to zero with a relatively constant variance. (Diagram missing.)
• A polynomial of degree two or greater should have been used for this hypothetical situation.
• The model or procedure generalises easily to $k$ regressor variables.
• The hypotheses we wish to test are:
$H_0$: the simple linear regression model adequately fits the data
$H_1$: the simple linear regression model does not fit the data (lack of fit)
• To compute $SS_{PE}$, we require repeated observations on $Y$ for at least one level of $X$, i.e.
$Y_{11}, Y_{12}, Y_{13}, \ldots, Y_{1n_1}$ = repeated observations at $X_1$,
$Y_{21}, Y_{22}, Y_{23}, \ldots, Y_{2n_2}$ = repeated observations at $X_2$,
$\vdots$
$Y_{m1}, Y_{m2}, Y_{m3}, \ldots, Y_{mn_m}$ = repeated observations at $X_m$.
• We see that there are $m$ distinct levels of $X$.
• The contribution to the pure error sum of squares at $X_1$, say, would be
\[
\sum_{u=1}^{n_1}(Y_{1u} - \bar{Y}_1)^2 \qquad (31)
\]
The total sum of squares for pure error is obtained by summing equation (31) over all levels of $X$:
\[
SS_{PE} = \sum_{j=1}^{m}\sum_{u=1}^{n_j}(Y_{ju} - \bar{Y}_j)^2
\]
The lack-of-fit sum of squares $SS_{LOF} = SS_E - SS_{PE}$ is compared with the pure error, and the model is rejected for lack of fit if
\[
F_0 = \frac{SS_{LOF}/(m-2)}{SS_{PE}/(n-m)} > F_{\alpha,\,m-2,\,n-m}.
\]
Remark
• This test procedure may be easily introduced into the analysis of variance
conducted for the significance of regression.
• If the null hypothesis of model adequacy is rejected, then the model must
be abandoned and attempts must be made to find a more appropriate
model.
• If $H_0$ is not rejected, then there is no apparent reason to doubt the adequacy of the model.
Example 3
Given the data below,
(i) carry out the lack-of-fit test at the 25% level of significance;
(ii) test for the significance of regression at the 5% level of significance.
X 1.0 1.0 2.0 3.3 3.3 4.0 4.0 4.0 4.7 5.0 5.6 5.6 5.6 6.0 6.0 6.5 6.9
Y 2.3 1.8 2.8 1.8 3.7 2.6 2.6 2.2 3.2 2.0 3.5 2.8 2.1 3.4 3.2 3.4 5.0
Solution
$S_{yy} = 10.96$, $S_{xy} = 13.62$, $S_{xx} = 52.32$, $\bar{y} = 2.847$, $\bar{x} = 4.382$.
The fitted regression model is $\hat{y} = 1.708 + 0.260x$.
Analysis of variance

Source of variation   Sum of squares   Degrees of freedom   Mean square   F0
Regression            3.541            1                    3.541         7.15
Residual              7.429            15                   0.4952
  (Lack of fit)       4.3924           8                    0.5491        1.27
  (Pure error)        3.0366           7                    0.4338
Total                 10.970           16
(ii)
\[
H_0 : \beta_1 = 0
\]
\[
H_1 : \beta_1 \neq 0
\]
Test statistic:
\[
F_0 = \frac{MS_R}{MS_E} = \frac{3.541}{0.4952} = 7.15
\]
Critical region:
\[
F_c = F_{0.05,1,15} = 4.54
\]
Since $F_0 > F_c$ ($7.15 > 4.54$), we reject $H_0$ at the 5% level of significance and conclude that $\beta_1 \neq 0$.
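A sketch of the whole lack-of-fit computation for the Example 3 data. Repeated observations are grouped by distinct x level, and the lack-of-fit sum of squares is taken as the remainder of the residual sum of squares after pure error is removed, as in the ANOVA table above.

```python
import numpy as np
from collections import defaultdict
from scipy import stats

# Example 3 data
x = np.array([1.0, 1.0, 2.0, 3.3, 3.3, 4.0, 4.0, 4.0, 4.7, 5.0,
              5.6, 5.6, 5.6, 6.0, 6.0, 6.5, 6.9])
y = np.array([2.3, 1.8, 2.8, 1.8, 3.7, 2.6, 2.6, 2.2, 3.2, 2.0,
              3.5, 2.8, 2.1, 3.4, 3.2, 3.4, 5.0])
n = len(x)

# Fit the straight line by least squares
Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum(y * (x - x.mean()))
b1 = Sxy / Sxx
b0 = y.mean() - b1 * x.mean()
SSE = np.sum((y - (b0 + b1 * x)) ** 2)

# Pure error: pool squared deviations about the mean at each distinct x level
groups = defaultdict(list)
for xi, yi in zip(x, y):
    groups[xi].append(yi)
SS_PE = sum(np.sum((np.array(v) - np.mean(v)) ** 2) for v in groups.values())
m = len(groups)                       # number of distinct x levels

# Lack of fit is what remains of the residual sum of squares
SS_LOF = SSE - SS_PE
F0 = (SS_LOF / (m - 2)) / (SS_PE / (n - m))
F_crit = stats.f.ppf(0.75, dfn=m - 2, dfd=n - m)   # 25% level of significance
print(SS_LOF, SS_PE, F0, F0 > F_crit)
```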
• If the regressor x is a random variable so that y and x may be viewed as
jointly distributed random variables, then R is just the simple correlation
between y and x.
• In Example 1, we have
\[
R^2 = \frac{SS_R}{S_{yy}} = \frac{119.26}{127.73} = 0.9337,
\]
that is, 93.37% of the variability in the data is accounted for by the model. Alternatively, this can be written as
\[
R^2 = 1 - \frac{SS_E}{S_{yy}}
\]
• The range of $R^2$ is $0 \leq R^2 \leq 1$.
– If $R^2 = 1$, we say that the fitted model is perfect; that is, all residuals are zero.
– What is an acceptable value of $R^2$? This depends on the scientific field from which the data are collected. For example, a chemist charged with the linear calibration of a high precision piece of equipment would expect a very high value of $R^2$, say 0.999, whereas a behavioural scientist collecting data reflecting human behaviour would be quite content with an $R^2$ of 0.7.
• Normally, values of $R^2 \geq 0.80$ are considered to indicate a good fit.
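As a quick check of the two equivalent formulas, using the Example 1 and Example 2 quantities:

```python
# Regression, residual and total (corrected) sums of squares from Example 1
SSR, SSE, Syy = 119.26, 8.47, 127.73
print(SSR / Syy, 1 - SSE / Syy)   # both give R^2 of about 0.9337
```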
• It is clear that $E(\hat{Y}_0) = \beta_0' + \beta_1 (x_0 - \bar{x})$ since $\hat{\beta}_0'$ and $\hat{\beta}_1$ are unbiased, and furthermore that
\[
Var(\hat{Y}_0) = \sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]
\]
• Also, $\hat{Y}_0$ is normally distributed, as $\hat{\beta}_0'$ and $\hat{\beta}_1$ are normally distributed and $Cov(\hat{\beta}_0', \hat{\beta}_1) = 0$ (prove this). Hence a $100(1-\alpha)\%$ confidence interval on the mean response at $X = x_0$ is
\[
\hat{y}_0 \pm t_{\alpha/2,\,n-2}\sqrt{MS_E\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]}
\]
Example 4
Construct a 95% confidence interval about the regression line for the data in Example 1 at $x_0 = 26$, where $\hat{y}_0 = -0.2879 + 0.4566x_0$.
Solution
At $x_0 = 26$,
\[
\hat{y}_0 = -0.2879 + 0.4566(26) = 11.5837.
\]
Therefore
\[
\hat{y}_0 \pm 2.228\sqrt{(0.847)\left[\frac{1}{12} + \frac{(26 - 31)^2}{572.00}\right]}
\]
gives
\[
11.5837 - 0.73 \leq E(Y \mid x_0 = 26) \leq 11.5837 + 0.73,
\]
or
\[
10.85 \leq E(Y \mid x_0 = 26) \leq 12.31.
\]
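The interval can be reproduced as follows, again using the Example 1 summary quantities.

```python
import numpy as np
from scipy import stats

# Example 1 summary quantities
MSE, Sxx, x_bar, n = 0.847, 572.0, 31.0, 12
b0, b1 = -0.2879, 0.4566

x0 = 26.0
y0_hat = b0 + b1 * x0
half_width = stats.t.ppf(0.975, df=n - 2) * np.sqrt(
    MSE * (1 / n + (x0 - x_bar) ** 2 / Sxx))
print(y0_hat - half_width, y0_hat + half_width)   # about (10.85, 12.31)
```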
14 Prediction Interval
A prediction interval is an estimate of an interval in which a future observation will fall, with a certain probability, given what has already been observed. Prediction intervals are often used in regression analysis.
Another useful concept in simple linear regression is the prediction interval, an interval estimate on the mean of $k$ future observations at a particular value of $X$, say $X_0$.
A $100(1-\alpha)\%$ prediction interval on the mean of $k$ future observations at $X_0$ is
\[
\hat{y}_0 \pm t_{\alpha/2,\,n-2}\sqrt{MS_E\left[\frac{1}{k} + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]} \qquad (35)
\]
Remark
• The prediction interval is of minimum width at $X_0 = \bar{X}$ and widens as $|X_0 - \bar{X}|$ increases.
• If $k = 1$, then equation (35) yields a prediction interval on a single future observation at $X_0$.
Example 5
Using the data in Example 1, find a 95% prediction interval on the mean impurity of the next two batches of paint produced at $X_0 = 34$.
Solution
At $X_0 = 34$, $\hat{y}_0 = -0.2879 + 0.4566(34) = 15.2365$, and with $k = 2$ we have
\[
15.2365 \pm 2.228\sqrt{(0.847)\left[\frac{1}{2} + \frac{1}{12} + \frac{(34 - 31)^2}{572.00}\right]}
= 15.2365 \pm 1.587,
\]
so the mean impurity of the next two batches satisfies $13.65 \leq \bar{y}_0 \leq 16.82$ with 95% confidence.
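The corresponding computation for this prediction interval, with $k = 2$ future observations:

```python
import numpy as np
from scipy import stats

# Example 1 summary quantities
MSE, Sxx, x_bar, n = 0.847, 572.0, 31.0, 12
b0, b1 = -0.2879, 0.4566

x0, k = 34.0, 2                        # next two batches at X0 = 34
y0_hat = b0 + b1 * x0                  # 15.2365
half_width = stats.t.ppf(0.975, df=n - 2) * np.sqrt(
    MSE * (1 / k + 1 / n + (x0 - x_bar) ** 2 / Sxx))
print(y0_hat - half_width, y0_hat + half_width)
```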
In matrix notation, the simple linear regression model for $N$ observations is written
\[
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, N \qquad (36)
\]
so that
\[
Y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I) \qquad (37)
\]
where $X$ is the design matrix whose columns are a column of 1's and a column of the $x_i$.
We need to estimate $\beta$, which has two components: $\beta_0$ (intercept) and $\beta_1$ (slope).
\[
\begin{aligned}
y_1 &= \beta_0 + \beta_1 x_1 + \varepsilon_1\\
y_2 &= \beta_0 + \beta_1 x_2 + \varepsilon_2\\
y_3 &= \beta_0 + \beta_1 x_3 + \varepsilon_3\\
&\;\;\vdots\\
y_N &= \beta_0 + \beta_1 x_N + \varepsilon_N
\end{aligned}
\]
This will be written as follows in matrix notation:
\[
\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{pmatrix}}_{Y}
=
\underbrace{\begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}}_{X}
\underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}}_{\beta}
+
\underbrace{\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_N \end{pmatrix}}_{\varepsilon}
\]
Therefore we have to solve for $\beta$:
\[
\underbrace{\begin{pmatrix} N & \sum_{i=1}^{N} x_i \\ \sum_{i=1}^{N} x_i & \sum_{i=1}^{N} x_i^2 \end{pmatrix}}_{X^T X}
\underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}}_{\beta}
=
\underbrace{\begin{pmatrix} \sum_{i=1}^{N} y_i \\ \sum_{i=1}^{N} x_i y_i \end{pmatrix}}_{X^T y}
\]
Equivalently, we could write these as:
\[
N\beta_0 + \beta_1 \sum_{i=1}^{N} x_i = \sum_{i=1}^{N} y_i \qquad (38)
\]
\[
\beta_0 \sum_{i=1}^{N} x_i + \beta_1 \sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} x_i y_i \qquad (39)
\]
The least squares solution is
\[
\hat{\beta} = (X^T X)^{-1} X^T Y
\]
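A numerical sketch of the matrix approach with illustrative data (the x, y values below are not from the notes). The normal equations are solved with np.linalg.solve rather than forming $(X^T X)^{-1}$ explicitly, which is the numerically preferable route and gives the same $\hat{\beta}$.

```python
import numpy as np

# Illustrative data; any paired x, y vectors of equal length will do.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.8, 5.1, 5.9, 7.2])

# Design matrix: a column of 1's and a column of the x_i
X = np.column_stack([np.ones_like(x), x])

# Normal equations X'X beta = X'y, solved without an explicit inverse
XtX = X.T @ X
Xty = X.T @ y
beta_hat = np.linalg.solve(XtX, Xty)   # [beta0_hat, beta1_hat]
print(beta_hat)
```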
Going through the mathematics and deriving these quantities:
\[
X^T X =
\begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ x_1 & x_2 & x_3 & \cdots & x_N \end{pmatrix}
\begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}
=
\begin{pmatrix} N & \sum_{i=1}^{N} x_i \\ \sum_{i=1}^{N} x_i & \sum_{i=1}^{N} x_i^2 \end{pmatrix}
\]
Let us compute $X^T Y$:
\[
X^T Y =
\begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ x_1 & x_2 & x_3 & \cdots & x_N \end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{pmatrix}
=
\begin{pmatrix} \sum_{i=1}^{N} y_i \\ \sum_{i=1}^{N} x_i y_i \end{pmatrix}
\]