
ANOVA

Dr. Frank Wood

Linear Regression Models, Lecture 6


ANOVA
• ANOVA is nothing new; rather, it is a way of
organizing the parts of linear regression so as
to yield simple recipes for inference.
• We will return to ANOVA when discussing
multiple regression and other types of linear
statistical models.



Partitioning Total Sum of Squares
• “The ANOVA approach is based on the
partitioning of sums of squares and degrees
of freedom associated with the response
variable Y”
• We start with the observed deviations of Yi
around the observed mean Ȳ

Yi − Ȳ



Partitioning of Total Deviations

[Figure: partitioning of the total deviation into SSTO, SSR, and SSE components]



Measure of Total Variation
• The measure of total variation is denoted by

SSTO = Σ(Yi − Ȳ)²
• SSTO stands for total sum of squares
• If all Yi’s are the same, SSTO = 0
• The greater the variation of the Yi’s, the
greater the SSTO



Variation after predictor effect
• The measure of variation of the Yi’s that is still
present when the predictor variable X is taken
into account is the sum of the squared
deviations

SSE = Σ(Yi − Ŷi)²

• SSE denotes error sum of squares



Regression Sum of Squares
• The difference between SSTO and SSE is
SSR


SSR = Σ(Ŷi − Ȳ)²

• SSR stands for regression sum of squares



Partitioning of Sum of Squares

Yi − Ȳ  =  (Ŷi − Ȳ)  +  (Yi − Ŷi)

Total deviation  =  deviation of the fitted regression value around the mean  +  deviation of the observation around the fitted regression line



Remarkable Property
• The sums of the squared deviations satisfy
the same partitioning!

Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ(Yi − Ŷi)²

or SSTO = SSR + SSE

• Proof: next slide.



Remarkable Property
• Proof: Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ(Yi − Ŷi)²

Σ(Yi − Ȳ)² = Σ[(Ŷi − Ȳ) + (Yi − Ŷi)]²
           = Σ[(Ŷi − Ȳ)² + (Yi − Ŷi)² + 2(Ŷi − Ȳ)(Yi − Ŷi)]
           = Σ(Ŷi − Ȳ)² + Σ(Yi − Ŷi)² + 2 Σ(Ŷi − Ȳ)(Yi − Ŷi)

but

Σ(Ŷi − Ȳ)(Yi − Ŷi) = Σ Ŷi(Yi − Ŷi) − Ȳ Σ(Yi − Ŷi) = 0

by the properties previously demonstrated (recalled on the next two slides).
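To make the decomposition concrete, here is a minimal numerical sketch (not from the original slides; the data and seed are made up purely for illustration) that fits a simple linear regression with numpy and checks that SSTO = SSR + SSE:

```python
import numpy as np

# Simulated illustrative data (not from the lecture)
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 30)
Y = 2.0 + 0.5 * X + rng.normal(scale=1.0, size=X.size)

# Least-squares estimates for Yi = b0 + b1*Xi + ei
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

SSTO = np.sum((Y - Y.mean()) ** 2)      # total sum of squares
SSE = np.sum((Y - Y_hat) ** 2)          # error sum of squares
SSR = np.sum((Y_hat - Y.mean()) ** 2)   # regression sum of squares

print(SSTO, SSR + SSE)  # the two numbers agree up to floating-point rounding
```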



Remember: Lecture 3
• The ith residual is defined to be

ei = Yi − Ŷi
• The sum of the residuals is zero:

Σi ei = Σi (Yi − b0 − b1 Xi)
      = Σ Yi − n b0 − b1 Σ Xi
      = 0        (by the first normal equation)



Remember: Lecture 3
• The sum of the weighted residuals is zero
when the residual in the ith trial is weighted by
the fitted value of the response variable for
the ith trial:

Σi Ŷi ei = Σi (b0 + b1 Xi) ei
         = b0 Σi ei + b1 Σi ei Xi
         = 0        (by the previous properties)



Breakdown of Degrees of Freedom
• SSTO
– 1 linear constraint due to the calculation and
inclusion of the mean
• n-1 degrees of freedom
• SSE
– 2 linear constraints arising from the estimation of
β and β
• n-2 degrees of freedom
• SSR
– Two degrees of freedom in the regression
parameters; one is lost due to a linear constraint
• 1 degree of freedom



Mean Squares
• A sum of squares divided by its associated
degrees of freedom is called a mean square
– The regression mean square is
MSR = SSR / 1 = SSR

– The error mean square is


MSE = SSE / (n − 2)



ANOVA table for simple lin. regression
Source of     SS                     df      MS                 E{MS}
Variation

Regression    SSR = Σ(Ŷi − Ȳ)²       1       MSR = SSR/1        σ² + β1² Σ(Xi − X̄)²

Error         SSE = Σ(Yi − Ŷi)²      n − 2   MSE = SSE/(n−2)    σ²

Total         SSTO = Σ(Yi − Ȳ)²      n − 1



E{MSE} = σ²
• We know from earlier lectures that
– SSE/σ² ~ χ²(n − 2)
• That means that E{SSE/σ²} = n − 2
• And thus that E{SSE/(n − 2)} = E{MSE} = σ²



E{MSR} = σ² + β1² Σ(Xi − X̄)²
• To begin, we take an alternative but
equivalent form for SSR

SSR = b1² Σ(Xi − X̄)²

• And note that, by definition of variance, we
can write

σ²{b1} = E{b1²} − (E{b1})²



E{MSR} = σ² + β1² Σ(Xi − X̄)²
• But we know that b1 is an unbiased estimator
of β1, so E{b1} = β1
• We also know (from previous lectures) that

σ²{b1} = σ² / Σ(Xi − X̄)²

• So we can rearrange terms and plug in

σ²{b1} = E{b1²} − (E{b1})²
E{b1²} = σ² / Σ(Xi − X̄)² + β1²



E{MSR} = σ² + β1² Σ(Xi − X̄)²
• From the previous slide

E{b1²} = σ² / Σ(Xi − X̄)² + β1²

• Which brings us to this result

E{MSR} = E{SSR/1} = E{b1²} Σ(Xi − X̄)² = σ² + β1² Σ(Xi − X̄)²
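As a sanity check (not part of the slides), a small Monte Carlo sketch can verify both expectations numerically; the values of β0, β1, σ, and the design points X below are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 1.0, 0.7, 2.0   # assumed "true" parameters
X = np.linspace(0, 10, 25)            # fixed design points
Sxx = np.sum((X - X.mean()) ** 2)

mse_vals, msr_vals = [], []
for _ in range(10000):
    Y = beta0 + beta1 * X + rng.normal(scale=sigma, size=X.size)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
    b0 = Y.mean() - b1 * X.mean()
    Y_hat = b0 + b1 * X
    mse_vals.append(np.sum((Y - Y_hat) ** 2) / (X.size - 2))  # MSE
    msr_vals.append(np.sum((Y_hat - Y.mean()) ** 2))          # MSR = SSR/1

print(np.mean(mse_vals), sigma**2)                    # ≈ σ²
print(np.mean(msr_vals), sigma**2 + beta1**2 * Sxx)   # ≈ σ² + β1² Σ(Xi − X̄)²
```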



Comments and Intuition
• The mean of the sampling distribution of MSE
is σ² regardless of whether X and Y are
linearly related (i.e. whether β1 = 0)
• The mean of the sampling distribution of MSR
is also σ² when β1 = 0.
– When β1 = 0, the values of MSR and MSE
therefore tend to be of the same magnitude



F Test of β = 0 vs. β ≠ 0
• ANOVA provides a battery of useful tests.
For example, ANOVA provides an easy test
for Test statistic from before
– Two-sided test
• H0 : β = 0 ∗ b1 −0
• Ha : β ≠ 0
t = s{b1 }
• Test statistic
ANOVA test statistic

∗ M SR
F = M SE



Sampling distribution of F*
• The sampling distribution of F* when H0 (β1 =
0) holds can be derived starting from
Cochran’s theorem
• Cochran’s theorem
– If all n observations Yi come from the same
normal distribution with mean µ and variance σ²,
and SSTO is decomposed into k sums of squares
SSr, each with degrees of freedom dfr, then the
SSr/σ² terms are independent χ² variables with dfr
degrees of freedom if

Σ(r = 1..k) dfr = n − 1
The F Test
• We have decomposed SSTO into two sums
of squares SSR and SSE and their degrees
of freedom are additive, hence, by Cochran’s
theorem:
– If β = 0 so that all Yi have the same mean µ = β
and the same variance σ, SSE/σ and SSR/σ
are independent χ variables



F* Test Statistic
• F* can be written as follows

F* = MSR / MSE = [(SSR/σ²) / 1] / [(SSE/σ²) / (n − 2)]

• But by Cochran’s theorem, we have when H0
holds

F* ~ [χ²(1) / 1] / [χ²(n − 2) / (n − 2)]



F Distribution
• The F distribution is the ratio of two
independent χ² random variables, each divided
by its degrees of freedom.
• The test statistic F* follows the distribution
– F* ~ F(1, n−2)



Hypothesis Test Decision Rule
• Since F* is distributed as F(1,n-2) when H0
holds, the decision rule to follow when the risk
of a Type I error is to be controlled at α is:
– If F* ≤ F(1-α; 1, n-2), conclude H0
– If F* > F(1-α; 1, n-2), conclude Ha
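A minimal sketch of this decision rule (not from the slides; the data are simulated for illustration), using scipy.stats.f for the F(1, n−2) critical value and p-value:

```python
import numpy as np
from scipy import stats

# Simulated illustrative data
rng = np.random.default_rng(2)
X = np.linspace(0, 10, 30)
Y = 2.0 + 0.5 * X + rng.normal(scale=1.0, size=X.size)
n = X.size

# Fit the regression and form the ANOVA quantities
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X
SSR = np.sum((Y_hat - Y.mean()) ** 2)
SSE = np.sum((Y - Y_hat) ** 2)
F_star = (SSR / 1) / (SSE / (n - 2))        # F* = MSR / MSE

alpha = 0.05
F_crit = stats.f.ppf(1 - alpha, 1, n - 2)   # F(1 - α; 1, n - 2)
p_value = stats.f.sf(F_star, 1, n - 2)
print("conclude Ha" if F_star > F_crit else "conclude H0", F_star, p_value)
```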



F distribution
• PDF, CDF, Inverse CDF of F distribution
[Figure: three panels showing the F(1,100) density (fpdf(X,1,100)), cumulative distribution function (fcdf(X,1,100)), and inverse CDF]

• Note that MSR/MSE must be large in order to
reject the null hypothesis.
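A short sketch (an illustration, not part of the slides) that reproduces plots of this kind with scipy and matplotlib; the (1, 100) degrees of freedom match the panel labels above:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(0.01, 10, 500)
p = np.linspace(0.01, 0.99, 500)
dist = stats.f(1, 100)   # F distribution with 1 and 100 degrees of freedom

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].plot(x, dist.pdf(x)); axes[0].set_title("F Density Function")
axes[1].plot(x, dist.cdf(x)); axes[1].set_title("F Cumulative Distribution Function")
axes[2].plot(p, dist.ppf(p)); axes[2].set_title("F Inverse CDF")
plt.show()
```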



Partitioning of Total Deviations
• Does this make sense? When is MSR/MSE
big?

[Figure: partitioning of the total deviation into SSTO, SSR, and SSE components]



General Linear Test
• The test of β = 0 versus β ≠ 0 is but a single
example of a general test for a linear
statistical models.
• The general linear test has three parts
– Full Model
– Reduced Model
– Test Statistic



Full Model Fit
• The standard full simple linear regression
model is first fit to the data
Yi = β0 + β1 Xi + εi

• Using this model, the error sum of squares is
obtained:

SSE(F) = Σ[Yi − (b0 + b1 Xi)]² = Σ(Yi − Ŷi)² = SSE



Fit Reduced Model
• For instance, so far we have considered
– H0 : β = 0
– Ha : β ≠ 0
• The model when H0 holds is called the
reduced or restricted model. Here this results
in β = 0

Yi = β0 + ǫi

• The SSE for the reduced model is obtained


 2

SSE(R) = (Yi − b0 ) = (Yi − Ȳ )2 = SST O
Test Statistic
• The idea is to compare the two error sums of
squares SSE(F) and SSE(R).
• Because F has more parameters than R
– SSE(F) ≤ SSE(R) always
• The relevant test statistic is

F* = [(SSE(R) − SSE(F)) / (dfR − dfF)] / [SSE(F) / dfF]

which follows the F(dfR − dfF, dfF) distribution when H0 holds.


• dfR and dfF are the degrees of freedom associated with the
reduced and full model error sums of squares, respectively
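A brief sketch (simulated, illustrative data only) of the general linear test for H0: β1 = 0, comparing SSE(R) with SSE(F):

```python
import numpy as np
from scipy import stats

# Simulated illustrative data
rng = np.random.default_rng(3)
X = np.linspace(0, 10, 30)
Y = 2.0 + 0.5 * X + rng.normal(scale=1.0, size=X.size)
n = X.size

# Full model: Yi = b0 + b1*Xi + ei
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
SSE_F, df_F = np.sum((Y - (b0 + b1 * X)) ** 2), n - 2

# Reduced model under H0 (beta1 = 0): Yi = b0 + ei, so b0 = Ybar and SSE(R) = SSTO
SSE_R, df_R = np.sum((Y - Y.mean()) ** 2), n - 1

F_star = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)
p_value = stats.f.sf(F_star, df_R - df_F, df_F)
print(F_star, p_value)
```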
R²
• SSTO measures the variation in the
observations Yi when X is not considered
• SSE measures the variation in the Yi after a
predictor variable X is employed
• A natural measure of the effect of X in
reducing variation in Y is to express the
reduction in variation (SSTO-SSE = SSR) as
a proportion of the total variation

R² = SSR / SSTO = 1 − SSE / SSTO
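Continuing the same illustrative setup used in the earlier sketches (simulated data, not from the lecture), R² follows directly from the sums of squares:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 30)
Y = 2.0 + 0.5 * X + rng.normal(scale=1.0, size=X.size)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

SSTO = np.sum((Y - Y.mean()) ** 2)
SSE = np.sum((Y - Y_hat) ** 2)
SSR = SSTO - SSE

R2 = SSR / SSTO      # equivalently 1 - SSE / SSTO
print(R2)
```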
