
Correlation and Regression

Fathers’ and daughters’ heights

[Histograms of fathers’ heights (mean = 67.7 inches, SD = 2.8) and daughters’ heights (mean = 63.8 inches, SD = 2.7), each on a scale from 55 to 75 inches.]

Reference: Pearson and Lee (1906) Biometrika 2:357-462, 1376 pairs


Fathers’ and daughters’ heights

[Scatterplot of daughter’s height (inches) versus father’s height (inches); corr = 0.52.]

Reference: Pearson and Lee (1906) Biometrika 2:357-462, 1376 pairs

Covariance and correlation

Let X and Y be random variables with


µX = E(X), µY = E(Y), σX = SD(X), σY = SD(Y)

For example, sample a father/daughter pair and let


X = the father’s height and Y = the daughter’s height.

Covariance:    cov(X, Y) = E{(X − µX)(Y − µY)}
               −→ cov(X, Y) can be any real number

Correlation:   cor(X, Y) = cov(X, Y) / (σX σY)
               −→ −1 ≤ cor(X, Y) ≤ 1


Examples

[Nine scatterplots of Y versus X illustrating different degrees of linear association:
corr = 0, 0.1, −0.1 (top row); corr = 0.3, 0.5, −0.5 (middle row); corr = 0.7, 0.9, −0.9 (bottom row).]

Estimated correlation

Consider n pairs of data: (x1, y1), (x2, y2), (x3, y3), . . . , (xn, yn)

We consider these as independent draws from some


bivariate distribution.

We estimate the correlation in the underlying distribution by:


r = Σi (xi − x̄)(yi − ȳ) / √[ Σi (xi − x̄)² · Σi (yi − ȳ)² ]

This is sometimes called the correlation coefficient.
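As a quick illustration, here is a minimal R sketch (the x and y vectors are made up for the example, not the Pearson–Lee data) that computes r from the formula above and checks it against the built-in cor():

x <- c(65, 67, 70, 72, 68)     # hypothetical fathers' heights
y <- c(63, 62, 66, 67, 64)     # hypothetical daughters' heights

# correlation coefficient from the formula above
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# built-in estimate; the two agree
r_builtin <- cor(x, y)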


Correlation measures linear association

[Three scatterplots with very different shapes of association.]

−→ All three plots have correlation ≈ 0.7!

Correlation versus regression

−→ Covariance / correlation:

◦ Quantifies how two random variables X and Y co-vary.


◦ There is typically no particular order between the two random variables (e.g., fathers’ versus daughters’ heights).

−→ Regression

◦ Assesses the relationship between predictor X and response Y: we model E[Y|X].
◦ The values for the predictor are often deliberately chosen,
and are therefore not random quantities.
◦ We typically assume that we observe the values for the
predictor(s) without error.
Example

Measurements of degradation of heme with different concentrations of hydrogen peroxide (H2O2), for different types of heme.

[Two panels of OD versus H2O2 concentration (0, 10, 25, 50): the left panel shows heme type A alone; the right panel shows types A and B together.]

Linear regression

[Plot of several straight lines over X from 0 to 12, illustrating different intercepts and slopes: Y = 20 + 15X, Y = 40 + 8X, Y = 70 + 0X, and Y = 0 + 5X.]
Linear regression

[Plot of a single line, with the intercept β0 and the slope β1 indicated.]
The regression model

Let X be the predictor and Y be the response. Assume we have n


observations (x1, y1), . . . , (xn, yn) from X and Y.

The simple linear regression model is

yi = β0 + β1xi + εi,    εi ∼ iid N(0, σ²).

This implies:
E[Y|X] = β0 + β1X.

Interpretation:
For two subjects that differ by one unit in X, we expect the responses to differ by β1 .

−→ How do we estimate β0, β1, σ 2 ?


Fitted values and residuals

We can write

εi = yi − β0 − β1xi

For a pair of estimates (β̂0, β̂1) for the pair of parameters (β0, β1)
we define the fitted values as

ŷi = β̂0 + β̂1xi

The residuals are

ε̂i = yi − ŷi = yi − β̂0 − β̂1xi

Residuals

[Scatterplot of Y versus X with the fitted line Ŷ; the vertical distances ε̂ from the points to the line are the residuals.]
Residual sum of squares

For every pair of values for β0 and β1 we get a different value for
the residual sum of squares.

RSS(β0, β1) = Σi (yi − β0 − β1xi)²

We can look at RSS as a function of β0 and β1, and we try to minimize this function, i.e. we try to find

(β̂0, β̂1) = argmin(β0, β1) RSS(β0, β1)

Not surprisingly, this method is called least squares estimation.

Residual sum of squares

[Surface plot of RSS as a function of b0 and b1; the least squares estimates sit at the minimum of this surface.]
Notation

Assume we have n observations: (x1, y1), . . . , (xn, yn).

x̄ = Σi xi / n
ȳ = Σi yi / n

SXX = Σi (xi − x̄)² = Σi xi² − n x̄²
SYY = Σi (yi − ȳ)² = Σi yi² − n ȳ²
SXY = Σi (xi − x̄)(yi − ȳ) = Σi xi yi − n x̄ ȳ

RSS = Σi (yi − ŷi)² = Σi ε̂i²

Parameter estimates

The function

RSS(β0, β1) = Σi (yi − β0 − β1xi)²

is minimized by

β̂1 = SXY / SXX

β̂0 = ȳ − β̂1 x̄
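A minimal R sketch of these closed-form estimates (with made-up x and y vectors), checked against R's built-in least squares fit:

x <- c(0, 10, 25, 50)             # hypothetical predictor values
y <- c(0.34, 0.31, 0.26, 0.16)    # hypothetical responses

SXX <- sum((x - mean(x))^2)
SXY <- sum((x - mean(x)) * (y - mean(y)))

beta1_hat <- SXY / SXX                        # slope estimate
beta0_hat <- mean(y) - beta1_hat * mean(x)    # intercept estimate

coef(lm(y ~ x))   # same estimates from lm()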
Useful to know

Using the parameter estimates, our best guess for the response at any given x is

ŷ = β̂0 + β̂1 x

Hence

β̂0 + β̂1x̄ = ȳ − β̂1x̄ + β̂1x̄ = ȳ

That means every regression line goes through the point (x̄, ȳ).

Variance estimates

As variance estimate we use

σ̂² = RSS / (n − 2)

This quantity is called the residual mean square. It has the following property:

(n − 2) × σ̂² / σ² ∼ χ²(n − 2)

In particular, this implies

E(σ̂²) = σ²
Example

H2O2 concentration:      0        10       25       50
OD measurements:      0.3399   0.3168   0.2460   0.1535
                      0.3563   0.3054   0.2618   0.1613
                      0.3538   0.3174   0.2848   0.1525

We get
x̄ = 21.25,  ȳ = 0.27,  SXX = 4256.25,  SXY = −16.48,  RSS = 0.0013.

Therefore

β̂1 = −16.48 / 4256.25 = −0.0039
β̂0 = 0.27 − (−0.0039) × 21.25 = 0.353
σ̂ = √( 0.0013 / (12 − 2) ) = 0.0115
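The same example fit in R, as a hedged sketch (data typed in from the table above; summary() reports the residual standard error σ̂):

h2o2 <- rep(c(0, 10, 25, 50), each = 3)
od   <- c(0.3399, 0.3563, 0.3538,
          0.3168, 0.3054, 0.3174,
          0.2460, 0.2618, 0.2848,
          0.1535, 0.1613, 0.1525)

fit <- lm(od ~ h2o2)
coef(fit)              # intercept about 0.353, slope about -0.0039
summary(fit)$sigma     # residual standard error, about 0.0115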

Example

[Scatterplot of OD versus H2O2 concentration (0, 10, 25, 50) with the fitted regression line Y = 0.353 − 0.0039X.]
Comparing models

We want to test whether β1 = 0:

H0 : yi = β0 + εi    versus    Ha : yi = β0 + β1xi + εi

[Illustration: the fit under Ha is the least squares line; the fit under H0 is the horizontal line at ȳ.]

Example

[Scatterplot of OD versus H2O2 concentration with both fits: the regression line Y = 0.353 − 0.0039X (fit under Ha) and the horizontal line Y = 0.271 (fit under H0).]
Sum of squares

Under Ha:

RSS = Σi (yi − ŷi)² = SYY − (SXY)²/SXX = SYY − β̂1² × SXX

Under H0:

Σi (yi − β̂0)² = Σi (yi − ȳ)² = SYY

Hence

SSreg = SYY − RSS = (SXY)²/SXX

ANOVA

Source                     df     SS      MS                  F
regression on X            1      SSreg   MSreg = SSreg/1     MSreg/MSE
residuals for full model   n−2    RSS     MSE = RSS/(n−2)
total                      n−1    SYY


Example

Source                     df     SS        MS        F
regression on X            1      0.06378   0.06378   484.1
residuals for full model   10     0.00131   0.00013
total                      11     0.06509
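A hedged R sketch of how such an ANOVA table can be produced, reusing od and h2o2 from the earlier sketch:

fit <- lm(od ~ h2o2)   # least squares fit of the example data
anova(fit)             # SS for h2o2, residual SS, mean squares, and the F statistic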

Parameter estimates

One can show that

E(β̂0) = β0                              E(β̂1) = β1

Var(β̂0) = σ² ( 1/n + x̄²/SXX )           Var(β̂1) = σ²/SXX

Cov(β̂0, β̂1) = −σ² x̄ / SXX               Cor(β̂0, β̂1) = −x̄ / √( x̄² + SXX/n )

−→ Note: We’re thinking of the x’s as fixed.


Parameter estimates

One can even show that the distribution of β̂0 and β̂1 is a bivariate normal distribution:

(β̂0, β̂1)ᵀ ∼ N(β, Σ)

where

β = (β0, β1)ᵀ    and    Σ = σ² [ 1/n + x̄²/SXX    −x̄/SXX
                                 −x̄/SXX           1/SXX  ]

Simulation: coefficients

[Scatterplot of the estimated slope versus the estimated y-intercept across simulated data sets, showing the negative correlation between β̂0 and β̂1.]
Possible outcomes

[Plot of OD versus H2O2 concentration showing many fitted regression lines from simulated data sets, illustrating the sampling variability of the fitted line.]

Confidence intervals

We know that

β̂0 ∼ N( β0, σ² (1/n + x̄²/SXX) )

β̂1 ∼ N( β1, σ²/SXX )

−→ We can use those distributions for hypothesis testing and to construct confidence intervals!
Statistical inference

We want to test H0 : β1 = β1*  versus  Ha : β1 ≠ β1*  (generally, β1* is 0).

We use

t = (β̂1 − β1*) / se(β̂1) ∼ t(n−2),    where    se(β̂1) = √( σ̂²/SXX )

Also,

( β̂1 − t(1−α/2), n−2 × se(β̂1) ,  β̂1 + t(1−α/2), n−2 × se(β̂1) )

is a (1 − α) × 100% confidence interval for β1.

Results

The calculations in the test H0 : β0 = β0*  versus  Ha : β0 ≠ β0*  are analogous, except that we have to use

se(β̂0) = √( σ̂² × (1/n + x̄²/SXX) )

For the example we get the 95% confidence intervals

(0.342, 0.364) for the intercept
(−0.0043, −0.0035) for the slope

Testing whether the intercept (slope) is equal to zero, we obtain 70.7 (−22.0) as test statistic.
This corresponds to a p-value of 7.8 × 10⁻¹⁵ (8.4 × 10⁻¹⁰).
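In R, these intervals and test statistics come directly from the fitted object; a minimal hedged sketch, reusing fit from the earlier example:

summary(fit)$coefficients    # estimates, standard errors, t statistics, p-values
confint(fit, level = 0.95)   # 95% confidence intervals for intercept and slope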
Now how about that

Testing for the slope being equal to zero, we use

t = β̂1 / se(β̂1)

For the squared test statistic we get

t² = ( β̂1 / se(β̂1) )² = β̂1² / (σ̂²/SXX) = β̂1² × SXX / σ̂² = [(SYY − RSS)/1] / [RSS/(n−2)] = MSreg / MSE = F

−→ The squared t statistic is the same as the F statistic from the ANOVA!

Joint confidence region

A 95% joint confidence region for the two parameters is the set of all values (β0, β1) that fulfill

( Δβ0 )ᵀ ( n       Σi xi  ) ( Δβ0 )
( Δβ1 )  ( Σi xi   Σi xi² ) ( Δβ1 )   /  (2 σ̂²)   ≤   F(0.95), 2, n−2

where Δβ0 = β0 − β̂0 and Δβ1 = β1 − β̂1.
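A hedged R sketch of this check for a single candidate pair (b0, b1), written directly from the quadratic form above and reusing fit and h2o2 from the earlier example:

# test whether a candidate (b0, b1) lies inside the 95% joint confidence region
in_joint_region <- function(b0, b1, fit, x, level = 0.95) {
  n  <- length(x)
  d  <- c(b0, b1) - coef(fit)                     # (delta beta0, delta beta1)
  M  <- rbind(c(n, sum(x)), c(sum(x), sum(x^2)))  # matrix of the quadratic form
  s2 <- summary(fit)$sigma^2
  as.numeric(t(d) %*% M %*% d) / (2 * s2) <= qf(level, 2, n - 2)
}

in_joint_region(0.35, -0.004, fit, h2o2)   # TRUE if the pair is inside the region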


Joint confidence region

[Elliptical 95% joint confidence region for (β0, β1), centered at (β̂0, β̂1).]

Notation

Assume we have n observations: (x1, y1), . . . , (xn, yn).

We previously defined

SXX = Σi (xi − x̄)² = Σi xi² − n x̄²
SYY = Σi (yi − ȳ)² = Σi yi² − n ȳ²
SXY = Σi (xi − x̄)(yi − ȳ) = Σi xi yi − n x̄ ȳ

We also define

rXY = SXY / ( √SXX √SYY )      (called the sample correlation)
Coefficient of determination

We previously wrote

SSreg = SYY − RSS = (SXY)²/SXX

Define

R² = SSreg / SYY = 1 − RSS / SYY

R² is often called the coefficient of determination. Notice that

R² = SSreg / SYY = (SXY)² / (SXX × SYY) = r²XY
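A quick hedged R check of this identity, reusing od and h2o2 from the earlier example fit:

r2_from_fit <- summary(fit)$r.squared   # coefficient of determination
r2_from_cor <- cor(h2o2, od)^2          # squared sample correlation
# the two quantities agree (up to floating-point rounding)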

The Anscombe Data

[Four scatterplots of the Anscombe data sets. All four give identical least squares results, β̂0 = 3.0, β̂1 = 0.5, σ̂² = 13.75, R² = 0.667, despite looking completely different.]
Fathers’ and daughters’ heights

[Scatterplot of daughter’s height versus father’s height (inches); corr = 0.52.]

Linear regression

[Scatterplots of daughter’s height versus father’s height (inches), illustrating the least squares fit to these data.]

Regression line

[Scatterplot of daughter’s height versus father’s height with the fitted regression line.]

−→ Slope = r × SD(Y) / SD(X)


SD line

[Scatterplot of daughter’s height versus father’s height with the SD line.]

−→ Slope = SD(Y) / SD(X)

SD line vs regression line

[Scatterplot showing both the SD line and the regression line.]

−→ Both lines go through the point (X̄, Ȳ).


Predicting father’s ht from daughter’s ht

[Scatterplots of daughter’s height versus father’s height, illustrating that predicting father’s height from daughter’s height uses a different line: the regression of x on y.]

There are two regression lines!

[Scatterplot showing both lines: the regression of daughter’s height on father’s height and the regression of father’s height on daughter’s height.]


The equations

Regression of y on x (for predicting y from x):

Slope = r SD(y)/SD(x);  goes through the point (x̄, ȳ)

ŷ − ȳ = r [SD(y)/SD(x)] (x − x̄)

−→ ŷ = β̂0 + β̂1 x  where  β̂1 = r SD(y)/SD(x)  and  β̂0 = ȳ − β̂1 x̄

Regression of x on y (for predicting x from y):

Slope = r SD(x)/SD(y);  goes through the point (ȳ, x̄)

x̂ − x̄ = r [SD(x)/SD(y)] (y − ȳ)

−→ x̂ = β̂0′ + β̂1′ y  where  β̂1′ = r SD(x)/SD(y)  and  β̂0′ = x̄ − β̂1′ ȳ
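A small hedged R sketch of the two regression lines (illustrative father/daughter vectors, not the Pearson–Lee data):

father   <- c(65, 67, 70, 72, 68, 66, 71)   # hypothetical heights (inches)
daughter <- c(63, 62, 66, 67, 64, 63, 65)

r <- cor(father, daughter)

# slope of the regression of daughter on father: r * SD(y)/SD(x)
slope_y_on_x <- r * sd(daughter) / sd(father)
coef(lm(daughter ~ father))[2]   # same value

# slope of the regression of father on daughter: r * SD(x)/SD(y)
slope_x_on_y <- r * sd(father) / sd(daughter)
coef(lm(father ~ daughter))[2]   # same value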

Estimating the mean response

[Scatterplot of OD versus H2O2 concentration with the fitted line Y = 0.353 − 0.0039X; the estimated mean response at a new concentration of 35 is 0.218.]

−→ We can use the regression results to predict the expected response for a new concentration of hydrogen peroxide. But what is its variability?
Variability of the mean response

Let ŷ be the predicted mean response at some x, i.e.

ŷ = β̂0 + β̂1 x

Then

E(ŷ) = β0 + β1 x

var(ŷ) = σ² ( 1/n + (x − x̄)²/SXX )

where β0 + β1 x is the true mean response at x.

Why?

E(ŷ) = E(β̂0 + β̂1 x)
     = E(β̂0) + x E(β̂1)
     = β0 + x β1

var(ŷ) = var(β̂0 + β̂1 x)
       = var(β̂0) + var(β̂1 x) + 2 cov(β̂0, β̂1 x)
       = var(β̂0) + x² var(β̂1) + 2 x cov(β̂0, β̂1)
       = σ² ( 1/n + x̄²/SXX ) + σ² x²/SXX − 2 x x̄ σ²/SXX
       = σ² ( 1/n + (x − x̄)²/SXX )
Confidence intervals

Hence

ŷ ± t(1−α/2), n−2 × σ̂ × √( 1/n + (x − x̄)²/SXX )

is a (1 − α) × 100% confidence interval for the mean response given x.
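In R, this interval is available from predict(); a hedged sketch using the earlier fit and a new concentration of 35:

new_x <- data.frame(h2o2 = 35)
predict(fit, newdata = new_x, interval = "confidence", level = 0.95)
# returns the estimated mean response with its lower and upper confidence limits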

Confidence limits

[Scatterplot of OD versus H2O2 concentration with the fitted line and the 95% confidence limits for the mean response.]
Prediction

Now assume that we want to calculate an interval for the predicted response y* for a value of x.

There are two sources of uncertainty:

(a) the mean response
(b) the natural variation σ²

The variance of ŷ* is

var(ŷ*) = σ² ( 1/n + (x − x̄)²/SXX ) + σ² = σ² ( 1 + 1/n + (x − x̄)²/SXX )

Prediction intervals

Hence

ŷ* ± t(1−α/2), n−2 × σ̂ × √( 1 + 1/n + (x − x̄)²/SXX )

is a (1 − α) × 100% prediction interval for the predicted response given x.

−→ When n is very large, we get roughly

ŷ* ± t(1−α/2), n−2 × σ̂
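The corresponding R call, as a hedged sketch, just switches the interval type:

predict(fit, newdata = data.frame(h2o2 = 35), interval = "prediction", level = 0.95)
# wider than the confidence interval: it also accounts for the natural variation sigma^2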
Prediction intervals

[Scatterplot of OD versus H2O2 concentration with the fitted line, the 95% confidence limits for the mean response, and the wider 95% prediction limits.]

Span and height

[Scatterplot of height (inches) versus arm span (inches).]
With just 100 individuals

[The same scatterplot of height versus span, restricted to 100 individuals.]

Regression for calibration

That prediction interval is for the case that the x’s are known without error while

y = β0 + β1 x + ε,    where ε = error

−→ Another common situation:

◦ We have a number of pairs (x, y) to get a calibration line/curve.

◦ The x’s are basically without error; the y’s have measurement error.

◦ We obtain a new value, y*, and want to estimate the corresponding x*:

y* = β0 + β1 x* + ε
Example

[Scatterplot of Y versus X (X from 0 to 35) with the fitted calibration line.]
Another example

[A second scatterplot of Y versus X with its fitted calibration line.]
Regression for calibration

−→ Data:  (xi, yi) for i = 1, . . . , n
          with yi = β0 + β1 xi + εi,   εi ∼ iid Normal(0, σ²)

          y*j for j = 1, . . . , m
          with y*j = β0 + β1 x* + ε*j,   ε*j ∼ iid Normal(0, σ²),   for some x*

−→ Goal:
Estimate x* and give a 95% confidence interval.

−→ The estimate:
Obtain β̂0 and β̂1 by regressing the yi on the xi.
Let x̂* = (ȳ* − β̂0) / β̂1   where   ȳ* = Σj y*j / m
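A minimal hedged R sketch of the point estimate; the calibration data (x, y) and the new replicate measurements y_new are made-up illustrative values:

# calibration fit from the (x, y) pairs
x <- c(0, 5, 10, 15, 20, 25, 30, 35)
y <- c(102, 110, 121, 128, 139, 151, 160, 170)
cal <- lm(y ~ x)

# new measurements at an unknown x*, here m = 3 replicates
y_new <- c(133, 136, 131)

b <- coef(cal)
x_star_hat <- (mean(y_new) - b[1]) / b[2]   # x̂* = (ȳ* − β̂0) / β̂1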

95% CI for x̂*

Let T denote the 97.5th percentile of the t distribution with n−2 d.f.

Let g = T / [ |β̂1| / (σ̂/√SXX) ] = (T σ̂) / (|β̂1| √SXX)

−→ If g ≥ 1, we would fail to reject H0 : β1 = 0!
   In this case, the 95% CI for x̂* is (−∞, ∞).

−→ If g < 1, our 95% CI is the following:

x̂* ± [ (x̂* − x̄) g² + (T σ̂ / |β̂1|) √( (x̂* − x̄)²/SXX + (1 − g²)(1/m + 1/n) ) ] / (1 − g²)

For very large n, this reduces to approximately x̂* ± (T σ̂) / (|β̂1| √m)
Example

[Scatterplot of Y versus X with the calibration line and the interval estimate for x*.]

Another example

[A second scatterplot of Y versus X with its calibration line and interval estimate for x*.]

Infinite m

[The same picture in the limiting case of infinitely many new measurements y* (m → ∞): only the uncertainty in the fitted line remains.]

Infinite n

[The same picture in the limiting case of infinitely many calibration pairs (n → ∞): only the uncertainty in ȳ* remains.]
Multiple linear regression

[Plot of OD versus H2O2 concentration (0, 10, 25, 50) for heme types A and B together, suggesting a model with one line per type.]

Multiple linear regression

[Four possible two-line models: general (different intercepts and slopes), parallel (same slope), concurrent (same intercept), and coincident (same line).]
Multiple linear regression

[The same plot of OD versus H2O2 concentration for heme types A and B, with separate fitted lines.]

More than one predictor

 #      Y     X1   X2          #      Y     X1   X2
 1   0.3399    0    0         13   0.3332    0    1
 2   0.3563    0    0         14   0.3414    0    1
 3   0.3538    0    0         15   0.3299    0    1
 4   0.3168   10    0         16   0.2940   10    1
 5   0.3054   10    0         17   0.2948   10    1
 6   0.3174   10    0         18   0.2903   10    1
 7   0.2460   25    0         19   0.2089   25    1
 8   0.2618   25    0         20   0.2189   25    1
 9   0.2848   25    0         21   0.2102   25    1
10   0.1535   50    0         22   0.1006   50    1
11   0.1613   50    0         23   0.1031   50    1
12   0.1525   50    0         24   0.1452   50    1

The model with two parallel lines can be described as

Y = β0 + β1X1 + β2X2 + ε

In other words (or, equations):

Y = β0 + β1X1 + ε            if X2 = 0
Y = (β0 + β2) + β1X1 + ε     if X2 = 1
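A hedged R sketch of fitting this parallel-lines model to the table above (x2 is kept as a 0/1 numeric indicator, as in the table):

x1 <- rep(rep(c(0, 10, 25, 50), each = 3), times = 2)
x2 <- rep(c(0, 1), each = 12)
y  <- c(0.3399, 0.3563, 0.3538, 0.3168, 0.3054, 0.3174,
        0.2460, 0.2618, 0.2848, 0.1535, 0.1613, 0.1525,
        0.3332, 0.3414, 0.3299, 0.2940, 0.2948, 0.2903,
        0.2089, 0.2189, 0.2102, 0.1006, 0.1031, 0.1452)

parallel_fit <- lm(y ~ x1 + x2)   # common slope beta1, intercepts beta0 and beta0 + beta2
coef(parallel_fit)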
Multiple linear regression

A multiple linear regression model has the form

Y = β0 + β1X1 + · · · + βkXk + ε,    ε ∼ N(0, σ²)

The predictors (the X’s) can be categorical or numerical.

Often, all predictors are numerical or all are categorical.

And actually, categorical variables are converted into a group of numerical ones.

Interpretation

Let X1 be the age of a subject (in years).

E[Y] = β0 + β1 X1

−→ Comparing two subjects who differ by one year in age, we


expect the responses to differ by β1.

−→ Comparing two subjects who differ by five years in age, we


expect the responses to differ by 5β1.
Interpretation

Let X1 be the age of a subject (in years), and let X2 be an indicator for the treatment arm (0/1).

E[Y] = β0 + β1 X1 + β2 X2

−→ Comparing two subjects from the same treatment arm who differ by one year in age, we expect the responses to differ by β1.

−→ Comparing two subjects of the same age from the two different treatment arms (X2=1 versus X2=0), we expect the responses to differ by β2.

Interpretation

Let X1 be the age of a subject (in years), and let X2 be an indicator


for the treatment arm (0/1).

E[Y] = β0 + β1 X1 + β2 X2 + β3 X1X2

−→ E[Y] = β0 + β1 X1 (if X2=0)

−→ E[Y] = β0 + β1 X1 + β2 + β3 X1 = β0 + β2 + (β1 + β3) X1 (if X2 =1)

−→ Comparing two subjects who differ by one year in age, we


expect the responses to differ by β1 if they are in the control
arm (X2=0), and expect the responses to differ by β1 + β3 if
they are in the treatment arm (X2=1).
Estimation

We have the model

yi = β0 + β1xi1 + · · · + βkxik + εi,    εi ∼ iid Normal(0, σ²)

−→ We estimate the β’s by the values for which

RSS = Σi (yi − ŷi)²

is minimized, where ŷi = β̂0 + β̂1xi1 + · · · + β̂kxik (aka “least squares”).

−→ We estimate σ by σ̂ = √( RSS / (n − (k + 1)) )

FYI

Calculation of the β̂’s (and their SEs and correlations) is not that complicated, but without matrix algebra, the formulas are nasty.

Here is what you need to know:

◦ The SEs of the β̂’s involve σ and the x’s.

◦ The β̂’s are normally distributed.

◦ Obtain confidence intervals for the β’s using β̂ ± t × ŜE(β̂), where t is a quantile of the t dist’n with n−(k+1) d.f.

◦ Test H0 : β = 0 using |β̂| / ŜE(β̂).
  Compare this to a t distribution with n−(k+1) d.f.
The example: a full model

x1 = [H2O2]
x2 = 0 or 1, indicating type of heme
y = the OD measurement

The model:  y = β0 + β1X1 + β2X2 + β3X1X2 + ε

i.e.,

y = β0 + β1X1 + ε                    if X2 = 0
y = (β0 + β2) + (β1 + β3)X1 + ε      if X2 = 1

β2 = 0       −→ Same intercepts.
β3 = 0       −→ Same slopes.
β2 = β3 = 0  −→ Same lines.
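A hedged sketch of how a fit like the one reported on the next slide is typically obtained in R, reusing x1, x2, y from the earlier data table:

full_fit <- lm(y ~ x1 * x2)   # expands to x1 + x2 + x1:x2
summary(full_fit)             # coefficient table, residual SE, R-squared, overall F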

Results

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.35305 0.00544 64.9 < 2e-16
x1 -0.00387 0.00019 -20.2 8.86e-15
x2 -0.01992 0.00769 -2.6 0.0175
x1:x2 -0.00055 0.00027 -2.0 0.0563

Residual standard error: 0.0125 on 20 degrees of freedom


Multiple R-Squared: 0.98, Adjusted R-squared: 0.977
F-statistic: 326.4 on 3 and 20 DF, p-value: < 2.2e-16
Testing many parameters

We have the model

yi = β0 + β1xi1 + · · · + βkxik + εi,    εi ∼ iid Normal(0, σ²)

We seek to test H0 : βr+1 = · · · = βk = 0.

In other words, do we really have just:

yi = β0 + β1xi1 + · · · + βrxir + εi,    εi ∼ iid Normal(0, σ²)

What to do. . .

1. Fit the “full” model (with all k x’s).

2. Calculate the residual sum of squares, RSSfull.

3. Fit the “reduced” model (with only r x’s).

4. Calculate the residual sum of squares, RSSred.

5. Calculate F = [ (RSSred − RSSfull) / (dfred − dffull) ] / [ RSSfull / dffull ],
   where dfred = n − r − 1 and dffull = n − k − 1.

6. Under H0, F ∼ F(dfred − dffull, dffull).
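A hedged R sketch of this reduced-versus-full comparison, matching the example that closes this section (x1, x2, y as defined in the earlier data table):

reduced_fit <- lm(y ~ x1)                # reduced model: r = 1 predictor
full_fit    <- lm(y ~ x1 + x2 + x1:x2)   # full model: k = 3 predictors

anova(reduced_fit, full_fit)   # F test of H0: beta2 = beta3 = 0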


In particular. . .

Assume the model

yi = β0 + β1xi1 + · · · + βkxik + εi,    εi ∼ iid Normal(0, σ²)

We seek to test H0 : β1 = · · · = βk = 0 (i.e., none of the x’s are related to y).

−→ Full model: all the x’s

−→ Reduced model: y = β0 + ε,    RSSred = Σi (yi − ȳ)²

−→ F = [ ( Σi(yi − ȳ)² − Σi(yi − ŷi)² ) / k ] / [ Σi(yi − ŷi)² / (n − k − 1) ]
   Compare this to an F(k, n − k − 1) dist’n.

The example

To test β2 = β3 = 0

Analysis of Variance Table

Model 1: y ~ x1
Model 2: y ~ x1 + x2 + x1:x2

Res.Df RSS Df Sum of Sq F Pr(>F)


1 22 0.00975
2 20 0.00312 2 0.00663 21.22 1.1e-05
