Stat 473-573 Notes
[email protected]
Office: 361 F
815-753-6829
1. The prerequisites for this course are STAT 350, STAT 301 and
MATH 211.
2. The prescribed text for this course is
APPLIED LINEAR STATISTICAL MODELS, 5th EDITION, by
Kutner et al.
3. For this course we plan on covering chapters 1, 2, 3, 4, 6, 7,
9, 10 and 11. However, if time permits, we can start chapter 8
or 12.
4. The first midterm exam will be on Sep 28 and the second one
will be on Nov 18.
5. Reading Exercise: Please go through Sections 1.1, 1.2, 1.3,
1.4, 1.5 on your own.
A brief Introduction
Experiment
Probability
Problem: Study how the Lot Size explains the Work Hours.
That is, study the change in Work Hours when the Lot Size
changes.
Most people use statistics the way a drunk uses a lamp post.
More for support than illumination.
Linear trend
NonLinear trend
No trend
Linear Trend
[Figure: scatter plot of y versus x showing a linear trend]
Linear Trend
[Figure: two scatter plots of y versus x, each showing a linear trend]
Nonlinear Trend
[Figure: scatter plot of y versus x showing a nonlinear trend]
Nonlinear Trend
[Figure: two scatter plots of y versus x, each showing a nonlinear trend]
No Trend
[Figure: scatter plot of y versus x showing no trend]
If the scatter plot does not show a linear pattern, does it mean that
we can never use a linear model there?
Scatter Plot: Refrigeration Equipment Data
How do we construct the model?
Y = \beta_0 + \beta_1 X would be a deterministic model, which
would not be able to explain the scattering.
Y = \beta_0 + \beta_1 X + \varepsilon, where \varepsilon is a random variable.
Why the epsilon?
Our model is
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad for i = 1, ..., n
Here Y_i is the i-th observation of the response variable.
\beta_0 and \beta_1 are the model parameters.
The X_i's are known constants, i.e., X_i is the i-th value of the predictor variable.
The \varepsilon_i are i.i.d. N(0, \sigma^2). Is \sigma^2 known?
The fitted model
What is best?
The sum of squares of the distances between the observed and the fitted values is:
\sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right)^2
The fitted model
The least squares estimates minimize
\sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right)^2
and are given by
b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
b_0 = \bar{Y} - b_1 \bar{X}
Now that we have the estimated values of the parameters we use the
data to compute them and plot the fitted regression line on the scatter
plot. (See reg2.png)
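As a rough illustration (not from the text), the estimates above can be computed directly in R and compared with lm(). The vectors lotsize and workhrs below are made-up placeholders standing in for the Lot Size / Work Hours data, not the textbook data set.

    # Made-up placeholder data standing in for the Lot Size / Work Hours example
    lotsize <- c(80, 30, 50, 90, 70, 60, 120, 80, 100, 50)
    workhrs <- c(399, 121, 221, 376, 361, 224, 546, 352, 421, 180)

    # Least squares estimates from the formulas above
    b1 <- sum((lotsize - mean(lotsize)) * (workhrs - mean(workhrs))) /
          sum((lotsize - mean(lotsize))^2)
    b0 <- mean(workhrs) - b1 * mean(lotsize)

    # Same estimates via lm(), plus the fitted line on the scatter plot
    fit <- lm(workhrs ~ lotsize)
    coef(fit)      # should match c(b0, b1)
    plot(lotsize, workhrs, xlab = "Lot Size", ylab = "Work Hours")
    abline(fit)

Later sketches in these notes reuse the objects fit, lotsize and workhrs defined here.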
The Gauss Markov Theorem
Unbiased
When X_i = 0 we find E(Y_i) = \beta_0. So \beta_0 is the population mean
of the observed variable Y when X = 0.
\hat{Y}_i = b_0 + b_1 X_i
An example: page 23.
What is E(\hat{Y}_i)? What is E(Y_i - \hat{Y}_i)?
\sum_{i=1}^{n} e_i^2
Estimation of \sigma^2
\hat{\sigma}^2 = MSE = \frac{SSE}{n-2}
E(MSE) = \sigma^2
\sqrt{MSE} = \sqrt{\frac{SSE}{n-2}}
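Continuing the placeholder example above (fit and workhrs from the earlier sketch), MSE can be computed from the residuals and checked against what R reports.

    e   <- residuals(fit)               # e_i = Y_i - Yhat_i
    SSE <- sum(e^2)
    MSE <- SSE / (length(workhrs) - 2)  # divide by n - 2
    MSE
    summary(fit)$sigma^2                # same quantity from lm()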
Chapter 1: Important Terms and Concepts:
1. explanatory variable/covariates/independent variable
2. response variable/observations/dependent variable
3. scatter plot
4. Linear Model
5. Simple linear regression
6. slope \beta_1, its estimate b_1 and its interpretation
7. intercept \beta_0, its estimate b_0 and its interpretation
8. method of least squares
9. Fitted model and its properties (7 of them). Fitted value.
10. Gauss Markov Theorem
11. residual
12. error sum of squares or residual sum of squares
13. error mean square or residual mean square
14. estimate of \sigma^2
The section in Chapter 1 which will not be included in the syllabus
is 1.8.
Chapter 2
[Figure: scatter plots of y versus x]
We need a H_0 and a H_1.
Distribution of T under H_0.
Check to see if T belongs to R (the rejection region).
b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \, Y_i
This b_1 shall be our test statistic.
E(b_1) = ?
Var(b_1) = ?
Estimated variance of b_1, that is, the estimate of Var(b_1), is s(b_1)^2.
\frac{Z}{\sqrt{\chi^2_n / n}} \sim \; ?
where Z (\sim N(0,1)) and \chi^2_n are independent.
If we know (n-2)\hat{\sigma}^2 / \sigma^2 \sim \chi^2_{n-2}, then
\frac{b_1 - \beta_1}{s(b_1)} \sim t(n-2).
Now
P\left( t(\alpha/2;\, n-2) \le \frac{b_1 - \beta_1}{s(b_1)} \le t(1-\alpha/2;\, n-2) \right) = 1 - \alpha
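A minimal sketch of this inference in R, reusing the placeholder fit from the earlier example:

    summary(fit)$coefficients   # b1, s(b1), t = b1/s(b1), and the p-value
    confint(fit, level = 0.95)  # b1 -/+ t(1 - alpha/2; n - 2) * s(b1)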
\sum_{i=1}^{n} (Y_i - \bar{Y})^2
Is s^2 always \ge 0? When will it be 0?
A part explained by the X_i's.
Random error, the \varepsilon_i's.
Y_i = \bar{Y} + (Y_i - \bar{Y})
What is \bar{Y}?
Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)
Thus
(Y_i - \bar{Y})^2 = \left[ (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i) \right]^2
Summing over i (the cross-product term vanishes), we have
\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
that is, SSTO = SSR + SSE.
SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \qquad MSR = SSR/df(SSR)
SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2, \qquad MSE = SSE/df(SSE)
E(MSR) = \sigma^2 + \beta_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2
E(MSE) = \sigma^2
\frac{E(MSR)}{E(MSE)} = \frac{\sigma^2 + \beta_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} = 1 \text{ when } \beta_1 = 0.
\sum_{i=1}^{n} Z_i^2 \sim \; ? \quad where the Z_i are i.i.d. N(0,1)
\frac{Z}{\sqrt{\chi^2_n / n}} \sim t_n, where Z and \chi^2_n are independent.
F = \frac{\chi^2_m / m}{\chi^2_n / n}, where \chi^2_m and \chi^2_n are independent.
Now
F = \frac{SSR/\sigma^2}{SSE/(\sigma^2 (n-2))}
If we know that, under H_0: \beta_1 = 0, SSR/\sigma^2 and SSE/\sigma^2 are independent
and have \chi^2_1 and \chi^2_{n-2} distributions, then
F \sim F(1, n-2)
Recall that
\frac{E(MSR)}{E(MSE)} = \frac{\sigma^2 + \beta_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} = 1 \text{ when } \beta_1 = 0.
Notice that
\{X^2 > 4\} = \{X > 2\} \cup \{X < -2\}
Since F = T^2, therefore
\{F > c\} = \{T > \sqrt{c}\} \cup \{T < -\sqrt{c}\}
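This relationship can be checked numerically on the placeholder fit from the earlier sketch: the ANOVA F statistic for the slope equals the square of its t statistic.

    anova(fit)                                         # F statistic for lotsize
    summary(fit)$coefficients["lotsize", "t value"]^2  # same number as the F above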
Recall
b_0 = \bar{Y} - b_1 \bar{X} = \frac{1}{n}\sum_{i=1}^{n} Y_i - \bar{X} \sum_{i=1}^{n} k_i Y_i = \sum_{i=1}^{n} \left( \frac{1}{n} - \bar{X} k_i \right) Y_i
where k_i = (X_i - \bar{X}) \big/ \sum_{j=1}^{n} (X_j - \bar{X})^2.
E(b_0) = \beta_0
\sigma^2(b_0) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]
Estimated variance of b_0, that is, the estimate of Var(b_0), is s(b_0)^2:
s(b_0)^2 = MSE \left[ \frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]
Recall that
\frac{Z}{\sqrt{\chi^2_n / n}} \sim t_n
where Z (\sim N(0,1)) and \chi^2_n are independent.
If we know (n-2)\hat{\sigma}^2 / \sigma^2 \sim \chi^2_{n-2}, then
\frac{b_0 - \beta_0}{s(b_0)} \sim t(n-2).
Now
P\left( t(\alpha/2;\, n-2) \le \frac{b_0 - \beta_0}{s(b_0)} \le t(1-\alpha/2;\, n-2) \right) = 1 - \alpha
Recall Y_h = \beta_0 + \beta_1 X_h + \varepsilon_h. Thus
E(Y_h) = \beta_0 + \beta_1 X_h
\hat{Y}_h = b_0 + b_1 X_h would be an estimator of E(Y_h).
E(\hat{Y}_h) = \beta_0 + \beta_1 X_h
Variance of \hat{Y}_h
\sigma^2(\hat{Y}_h) = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]
Estimate of \sigma^2(\hat{Y}_h) = ?
MSE \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]
Sampling distribution of \hat{Y}_h:
\frac{\hat{Y}_h - (\beta_0 + \beta_1 X_h)}{\sqrt{Var(\hat{Y}_h)}}
Estimated variance of \hat{Y}_h, that is, the estimate of Var(\hat{Y}_h), is s(\hat{Y}_h)^2:
s(\hat{Y}_h)^2 = MSE \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]
Recall that
\frac{Z}{\sqrt{\chi^2_n / n}} \sim t_n
where Z (\sim N(0,1)) and \chi^2_n are independent, and that (n-2)\hat{\sigma}^2/\sigma^2 \sim \chi^2_{n-2}. Hence
\frac{\hat{Y}_h - (\beta_0 + \beta_1 X_h)}{\sqrt{MSE \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]}} \sim t(n-2)
Confidence interval for E(Y_h)
Now
P\left( t(\alpha/2;\, n-2) \le \frac{\hat{Y}_h - (\beta_0 + \beta_1 X_h)}{s(\hat{Y}_h)} \le t(1-\alpha/2;\, n-2) \right) = 1 - \alpha
which gives the interval
\left( b_0 + b_1 X_h - t(1-\alpha/2;\, n-2)\, s(\hat{Y}_h),\;\; b_0 + b_1 X_h + t(1-\alpha/2;\, n-2)\, s(\hat{Y}_h) \right)
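A hedged sketch of this interval in R, reusing the placeholder fit; the value X_h = 85 is an arbitrary choice, not from the text.

    # 95% confidence interval for E(Y_h) at X_h = 85
    predict(fit, newdata = data.frame(lotsize = 85),
            interval = "confidence", level = 0.95)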
Prediction of new observation
Now suppose we have X = X_{new} and want to find the
corresponding value of Y_{new}. Since
Y_{new} = \beta_0 + \beta_1 X_{new} + \varepsilon_{new}
and \varepsilon_{new} is unobserved, we shall not be able to get the exact
value of Y_{new}. So here we propose an interval estimate of Y_{new}.
Recall \hat{Y}_{new} = b_0 + b_1 X_{new}.
Now consider Y_{new} - \hat{Y}_{new}.
E(Y_{new} - \hat{Y}_{new}) = ?
Var(Y_{new} - \hat{Y}_{new}) = \sigma^2 + \sigma^2 \left[ \frac{1}{n} + \frac{(X_{new} - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]
If \sigma^2 is unknown we shall replace it by the MSE:
s^2(Y_{new} - \hat{Y}_{new}) = MSE + MSE \left[ \frac{1}{n} + \frac{(X_{new} - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]
\frac{Y_{new} - \hat{Y}_{new}}{\sqrt{MSE \left[ 1 + \frac{1}{n} + \frac{(X_{new} - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]}} \sim t(n-2)
so the prediction interval is
\hat{Y}_{new} \pm t(1-\alpha/2;\, n-2)\, s(Y_{new} - \hat{Y}_{new})
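The corresponding prediction interval in R, again with the placeholder fit and an arbitrary X_new = 85; note it is wider than the confidence interval above because of the extra MSE term.

    # 95% prediction interval for a new observation at X_new = 85
    predict(fit, newdata = data.frame(lotsize = 85),
            interval = "prediction", level = 0.95)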
R^2 and r
\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \quad SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2, \quad SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2
Thus,
1 = \frac{SSR}{SSTO} + \frac{SSE}{SSTO}
\frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}
R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}
R^2 = 0. What does it imply? See the explanation on page 74.
A brief note on R^2 and r.
R^2 is defined as the proportionate reduction of the total variation
associated with the use of the predictor variable X.
R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO} = \frac{SSTO - SSE}{SSTO}
Thus the larger the value of SSR (or the smaller the value of SSE),
the closer R^2 is to 1.
SSTO = \sum_{i=1}^{n} (y_i - \bar{y})^2 and SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, so
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
R^2 = 0 when \hat{y}_i = \bar{y} for all i, that is, when
\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2
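As a quick check on the placeholder fit from the earlier sketch, R^2 computed from SSE and SSTO agrees with the value reported by lm().

    SSTO <- sum((workhrs - mean(workhrs))^2)
    SSE  <- sum(residuals(fit)^2)
    1 - SSE / SSTO          # R^2 = 1 - SSE/SSTO
    summary(fit)$r.squared  # same value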
Limitations of R^2
In the text book, page 75, the authors speak of 3
misunderstandings. Here we go over them.
E(\hat{Y}_h) = \beta_0 + \beta_1 X_h. This
implies that the fitted regression equation \hat{Y}_h = b_0 + b_1 X_h is
an unbiased estimator of the mean value of Y when the level
of X is fixed at X_h. Now merely being unbiased does not
mean much. We would want to make sure that the variance
of this estimator is not too large. We have
s.e.^2(\hat{Y}_h) = MSE \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]
So if X_h = 100, then \hat{Y}_h = b_0 + b_1 \cdot 100, which would be an
unbiased estimator of \beta_0 + \beta_1 \cdot 100, but the s.e.^2 would be
203.72, which we can see is quite large. (See page 55,
Example 2.)
Using a 100(1-\alpha)% C.I. for testing a hypothesis at level \alpha
That is,
P\left( b_1 - t(1-\alpha/2,\, n-2)\, se(b_1) \le \beta_1 \le b_1 + t(1-\alpha/2,\, n-2)\, se(b_1) \right) = 1 - \alpha
Suppose we have such a confidence interval for \beta_1 and we want to use it to test the null hypothesis
H_0: \beta_1 = \beta_{10} against the two-sided alternative; then what do we
do?
We check to see if \beta_{10} lies outside the interval; if so, then we
reject the null.
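A small sketch of this confidence-interval test on the placeholder fit; the hypothesized slope beta10 = 3 is an arbitrary illustrative value.

    beta10 <- 3                                  # hypothesized value of beta_1
    ci <- confint(fit, "lotsize", level = 0.95)  # 95% CI for beta_1
    beta10 < ci[1] | beta10 > ci[2]              # TRUE means beta10 is outside the CI: reject H0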
Chapter 2
Sections not included for the midterm: 2.6, 2.8 and 2.11
T test
Prediction interval of Y_{new}
R^2 and r
Chapter 3
Recall that
e_i = Y_i - \hat{Y}_i \quad and \quad \hat{Y}_i = b_0 + b_1 X_i
\sum_{i=1}^{n} e_i = 0
\sum_{i=1}^{n} X_i e_i = 0
Now E(e_i) = 0 for all i = 1, ..., n.
Var(e_i) = \sigma^2 (1 - h_{ii})
What is h_{ii}? It is the i-th diagonal element of the Hat matrix
H. What is the Hat matrix? Later.
So consider
e_i^* = \frac{e_i - \bar{e}}{\sqrt{Var(e_i)}} = \frac{e_i}{\sqrt{Var(e_i)}} = \frac{e_i}{\sqrt{\sigma^2 (1 - h_{ii})}}
This is going to make it mean 0 and variance 1. But what if the
variance \sigma^2 is unknown?
r_i = \frac{e_i - \bar{e}}{\sqrt{\widehat{Var}(e_i)}} = \frac{e_i}{\sqrt{\widehat{Var}(e_i)}} = \frac{e_i}{\sqrt{MSE\,(1 - h_{ii})}}
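A minimal sketch of these residual quantities in R, reusing the placeholder fit; rstandard() gives the same studentized residuals as the last formula.

    e   <- residuals(fit)      # raw residuals e_i
    h   <- hatvalues(fit)      # diagonal elements h_ii of the hat matrix
    MSE <- summary(fit)$sigma^2
    e / sqrt(MSE * (1 - h))    # studentized residuals from the formula above
    rstandard(fit)             # same values via the built-in function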
Nonlinearity of Regression Function
[Figure: scatter plot of the dependent variable versus the independent variable]
Semi-Studentised Residual and Independent Variable Plot
[Figure: semi-studentised residuals plotted against the predictor]
Semi-Studentised Residual and Fitted Equation Plot
[Figure: semi-studentised residuals plotted against the fitted values and against the predictor]
Semi-Studentised Residual and Fitted Equation Plot
[Figure: semi-studentised residuals plotted against the fitted values]
From the scatter plot we observe that it is not linear at all, so
we can guess that a linear fit won't work.
Notice that the plot shows a distinct pattern (in this case a
quadratic pattern), implying that the functional form of the
population regression equation is not linear.
[Figure: residuals plotted against the predictor]
Semi-Studentised Residual and Fitted Equation Plot
[Figure: residuals plotted against the fitted values]
In the past examples it was pretty obvious from the scatter plot
itself what to expect.
[Figure: residuals plotted against the predictor]
Semi-Studentised Residual and Fitted Equation Plot
[Figure: residuals plotted against the fitted values]
b0 -1.727 0.13500
b1 6.484 0.00064 ***
R-squared: 0.8751
Scatter Plot
[Figure: scatter plot of the dependent variable versus the independent variable]
Semi-Studentised Residual and Independent Variable Plot
[Figure: semi-studentised residuals plotted against the predictor]
Scatter Plot and the Regression Equation
[Figure: scatter plot of the dependent variable versus the independent variable with the fitted regression line]
\varepsilon_i = \rho\, \varepsilon_{i-1} + u_i, \quad where the u_i are i.i.d. N(0, \sigma^2)
Observe the plot. If it does not display any pattern then the
errors are independent.
H_0: The errors are not autocorrelated
H_1: The errors are autocorrelated
Test Statistic (Durbin-Watson):
d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}
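The statistic d can be computed directly from the residuals of the placeholder fit; as a rule of thumb, values of d near 2 are consistent with no first-order autocorrelation.

    e <- residuals(fit)
    d <- sum(diff(e)^2) / sum(e^2)   # Durbin-Watson statistic
    d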
Q-Q Plot
Why z_{(i)}? Because E(e_{(i)}) is very close to \sqrt{MSE}\; z_{(i)}.
Outliers
Extreme Observations
MODEL ASSUMPTIONS
OUTLIER DETECTION
Transforming X.
If we transform Y, then the distribution of the error shall change and the variances shall not
necessarily be constant any more.
These are issues related to the error term. So the only way to
fix it would be transforming the Y values. Usually both these
problems are addressed together by using only one
transformation.
We could just guess from the scatter plot like we did for X,
but there is another method which is less heuristic.
We fit Y^{\lambda} instead of Y, that is, now we have to fit
Y_i^{\lambda} = \beta_0 + \beta_1 X_i + \varepsilon_i
Box Cox Transformation
W_i = K_1 (Y_i^{\lambda} - 1) \quad if \lambda \ne 0
W_i = K_2 \log_e(Y_i) \quad if \lambda = 0
where K_2 = \left( \prod_{i=1}^{n} Y_i \right)^{1/n} (the geometric mean of the Y_i) and K_1 = 1/(\lambda K_2^{\lambda-1}).
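A hedged sketch of choosing lambda with the Box-Cox profile likelihood; boxcox() is in the MASS package that ships with R, and it is applied here to the placeholder fit (the response must be positive).

    library(MASS)
    bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.1))  # plots the profile log-likelihood
    bc$x[which.max(bc$y)]                             # lambda maximizing the likelihood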
P(R) = P(A \cup B) \le 1
Regression through the origin
E(Y_i) = \beta_1 X_i
The LSE of \beta_1 is
b_1 = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}
MSE = \frac{\sum_{i=1}^{n} e_i^2}{n-1}
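In R, regression through the origin is obtained by dropping the intercept from the formula; a quick check on the placeholder data:

    fit0 <- lm(workhrs ~ 0 + lotsize)          # no-intercept model
    coef(fit0)
    sum(lotsize * workhrs) / sum(lotsize^2)    # b_1 from the formula above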
Y_i = \beta_0 + \sum_{j=1}^{p-1} \beta_j X_{ji} + \varepsilon_i \quad for all i = 1 to n
Y_{n\times 1} = X_{n\times p}\, \beta_{p\times 1} + \varepsilon_{n\times 1}
where
Y_{n\times 1} = vector of n observations, Y = (Y_1, ..., Y_n)'
X_{n\times p} = the design matrix; the i-th row of the above matrix is
X_i = (1, X_{1i}, X_{2i}, ..., X_{(p-1)i})
\varepsilon_{n\times 1} = vector of n errors, that is, \varepsilon = (\varepsilon_1, ..., \varepsilon_n)'
and the \varepsilon_i are i.i.d. N(0, \sigma^2)
Now that we can write
Y = X\beta + \varepsilon
we have
E(Y_{n\times 1}) = X\beta
Note we are taking the expectation of a vector.
V(Y) = \sigma^2 I_{n\times n}
We derived the variance of a vector. The Least Squares
Estimates, here, are
\hat{\beta} = b = (X'X)^{-1} X'Y
E(\hat{\beta}) = \beta
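A minimal matrix-algebra check of b = (X'X)^{-1} X'Y, still using the placeholder data (here X has just an intercept column and lotsize, but the same code works for any design matrix).

    X <- cbind(1, lotsize)             # design matrix with a column of 1s
    Y <- workhrs
    solve(t(X) %*% X) %*% t(X) %*% Y   # b = (X'X)^{-1} X'Y
    coef(lm(workhrs ~ lotsize))        # same estimates via lm()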
Fitted value
\hat{Y} = X\hat{\beta}
\hat{Y} = X(X'X)^{-1}X'Y
or
\hat{Y} = HY, \quad where \quad H = X(X'X)^{-1}X'
SSTO = Y'Y - \frac{1}{n} Y'JY = Y'\left[ I - \frac{1}{n} J \right] Y
As usual it has n-1 degrees of freedom.
SSE = e'e = (Y - Xb)'(Y - Xb) = Y'(I - H)Y
It has n-p degrees of freedom.
SSR = b'X'Y - \frac{1}{n} Y'JY = Y'\left[ H - \frac{1}{n} J \right] Y
It has p-1 degrees of freedom, where J is an n\times n matrix of 1s.
(For a numerical example see page 243; for the ANOVA table see page 225.)
Thus we have
MSE = SSE/(n-p)
(n-p)\, MSE / \sigma^2 \sim \chi^2_{n-p}
MSR = SSR/(p-1)
As before it can be shown that, if all the \beta_i's are zero, then
E(MSR) = \sigma^2
otherwise
E(MSR) > \sigma^2
Now that we have MSE, MSR we can construct an ANOVA table
as before. Now here,
H_0: \beta_1 = \beta_2 = ... = \beta_{p-1} = 0
H_a: at least one \beta_i is non-zero
The test statistic is
F^* = \frac{MSR}{MSE}
The rejection region is
F^* \ge F(1-\alpha;\, p-1,\, n-p)
R^2: Coefficient of multiple determination
As before
R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}
0 \le R^2 \le 1. Now R^2 = 0 when all b_k = 0 for
k = 1, ..., p-1, and R^2 = 1 when \hat{Y}_i = Y_i for all
i = 1, ..., n.
\sigma^2(b) = \sigma^2 (X'X)^{-1}. Why?
s^2(b) = MSE\, (X'X)^{-1}
We know that
b \sim MVN\left( \beta,\; \sigma^2 (X'X)^{-1} \right)
So we know that each b_k, where k = 0, 1, ..., (p-1), is normally
distributed. Hence, as before,
\frac{b_k - \beta_k}{s\{b_k\}} \sim t(n-p) \quad for all k = 0, 1, 2, ..., p-1
Hence the interval estimate of \beta_k with (1-\alpha) confidence
coefficient is
\left[ b_k - t(1-\alpha/2,\, n-p)\, s\{b_k\},\;\; b_k + t(1-\alpha/2,\, n-p)\, s\{b_k\} \right]
Tests for \beta_k, where k = 0, 1, 2, ..., p-1. In order to test
H_0: \beta_k = 0
H_a: \beta_k \ne 0
we use
t^* = \frac{b_k}{s\{b_k\}}
as our test statistic and our critical region is
|t^*| \ge t(1-\alpha/2;\, n-p)
\hat{\beta} = (X'X)^{-1} X'Y
In order to test
H_0: \beta_j = 0
H_a: \beta_j \ne 0
we use the t-test, where the test statistic is
\frac{\hat{\beta}_j - \beta_j}{\sqrt{MSE\; C_{jj}}} \sim t_{(n-p)}
where C_{jj} is the j-th diagonal element of (X'X)^{-1}.
Also we had
SST = SSR + SSE
and as p increases SSE decreases (or SSR increases) and vice versa.
Here
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2
and it does not depend on either p, the number of parameters in
the model, or the values of the covariates in the model (i.e., the
actual values of the X_i's).
Now let's, for a moment, get back to the SLR model setting, that
is,
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad when i = 1, ..., n
Now in this setting the full model is
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
and the reduced model is
y_i = \beta_0 + \varepsilon_i
Under the full model the
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2
Under the reduced model
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2
Observe that, under the reduced model, SSE = SST. Since
SSE decreases as p increases, whether adding any new variable had
any effect can be found out by comparing
SSE(F) with SSE(R)
We also know that
SSE(F) \le SSE(R)
So in order to test
H_0: \beta_1 = 0
H_a: \beta_1 \ne 0
we use the following test statistic
F^* = \frac{\left( SSE(R) - SSE(F) \right) / (df_R - df_F)}{SSE(F) / df_F}
If H_0 is true we know
F^* = \frac{\left( SST - SSE \right) / \left( (n-1) - (n-2) \right)}{SSE / (n-2)} = \frac{MSR}{MSE} \sim F(1, n-2)
Here (that is, the case when p = 2) we find that the General Linear
Test is identical to the ANOVA test.
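The general linear test is what R's anova() performs when given nested models; on the placeholder data, comparing the intercept-only model with the SLR model reproduces the ANOVA F test.

    full    <- lm(workhrs ~ lotsize)   # full model
    reduced <- lm(workhrs ~ 1)         # reduced model: intercept only
    anova(reduced, full)               # F* = [(SSE(R)-SSE(F))/(df_R-df_F)] / [SSE(F)/df_F]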
When p = 2, we have
SST = SSR(X_1) + SSE(X_1)
When p = 3, we have
SST = SSR(X_1, X_2) + SSE(X_1, X_2)
When p = 4, we have
SST = SSR(X_1, X_2, X_3) + SSE(X_1, X_2, X_3)
EXTRA SUM OF SQUARES:
SSR(X_2 | X_1) = SSR(X_1, X_2) - SSR(X_1)
= SSE(X_1) - SSE(X_1, X_2)
The EXTRA SUM OF SQUARES is the measure of the marginal
effect of adding the new variable to the existing model. Similarly
we can define:
SSR(X_3 | X_1, X_2) = SSE(X_1, X_2) - SSE(X_1, X_2, X_3)
or
SSR(X_3, X_2 | X_1) = SSE(X_1) - SSE(X_1, X_2, X_3)
SST = SSR(X_1) + SSE(X_1)
SSR(X_2 | X_1) = SSE(X_1) - SSE(X_1, X_2)
SST = SSR(X_1) + SSR(X_2 | X_1) + SSE(X_1, X_2)
SST = SSR(X_1, X_2) + SSE(X_1, X_2)
Comparing the two we get
SSR(X_1, X_2) = SSR(X_1) + SSR(X_2 | X_1)
Since
SSR(X_2 | X_1) = SSE(X_1) - SSE(X_1, X_2)
SSR(X_3 | X_1, X_2) = SSE(X_1, X_2) - SSE(X_1, X_2, X_3)
We can write
SST = SSR(X_1) + SSE(X_1)
= SSR(X_1) + SSR(X_2 | X_1) + SSE(X_1, X_2)
= SSR(X_1) + SSR(X_2 | X_1) + SSR(X_3 | X_1, X_2) + SSE(X_1, X_2, X_3)
Comparing this with
SST = SSR(X_1, X_2, X_3) + SSE(X_1, X_2, X_3)
we can write
SSR(X_1, X_2, X_3) = SSR(X_1) + SSR(X_2 | X_1) + SSR(X_3 | X_1, X_2)
The df of SSR is p-1, so the df of SSR(X_1 | X_2, X_3) is 1, and that
of SSR(X_1, X_2 | X_3) is 2. Now that we have SSR we can also define
the MSR:
MSR(X_2, X_3 | X_1) = SSR(X_2, X_3 | X_1)/2
Thus we decompose the total SSR into smaller components. What
is the use of all this? Well, it gives us an idea as to how the
reduction in variation takes place and how each covariate is
responsible for bringing about this change; in other words, the
contribution of each covariate becomes more explicit.
Consider the following (full) model
Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i
H_0: \beta_3 = 0
H_a: \beta_3 \ne 0
SSE(F) = SSE(X_1, X_2, X_3)
SSE(R) = SSE(X_1, X_2)
F^* = \frac{\left( SSE(R) - SSE(F) \right) / (df_R - df_F)}{SSE(F) / df_F}
= \frac{\left( SSE(X_1, X_2) - SSE(X_1, X_2, X_3) \right) / \left( (n-3) - (n-4) \right)}{SSE(X_1, X_2, X_3) / (n-4)}
= \frac{SSR(X_3 | X_1, X_2) / 1}{SSE(X_1, X_2, X_3) / (n-4)}
= \frac{MSR(X_3 | X_1, X_2)}{MSE(X_1, X_2, X_3)}
This is known as the partial F-test.
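A sketch of the partial F-test with anova() on nested models; x1, x2, x3 and y below are simulated, made-up data used only to show the mechanics.

    set.seed(1)
    n  <- 30
    x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
    y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n)
    full    <- lm(y ~ x1 + x2 + x3)
    reduced <- lm(y ~ x1 + x2)
    anova(reduced, full)   # F* = MSR(X3 | X1, X2) / MSE(X1, X2, X3)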
H_0: \beta_2 = \beta_3 = 0
H_a: H_0 is not true.
Full Model: Same as before, thus SSE(F) = SSE(X_1, X_2, X_3)
Reduced Model: Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i, thus SSE(R) = SSE(X_1)