
Chapter 11 Lecture Notes

The research context is that two variables have been observed for each of $n$ participants. The research team then has a spreadsheet with $n$ pairs of observations $(x_i, y_i)$, $i = 1, \ldots, n$. One of the variables (here $y$) is the outcome variable or dependent variable. This is the variable hypothesized to be affected by the other variable in scientific research. The other variable (here $x$) is the independent variable. It may be hypothesized to predict the outcome variable or to cause a change in the outcome variable. The research task is to document the association between the independent and dependent variables. An example of a research project seeking to document a causal association would be a clinical trial in which $x_i$ was the dosage of a medicine randomly assigned to a participant (say simvastatin) and $y_i$ was the participant's response after a specified period taking the medicine (say cholesterol reduction after 3 months). An example of a study seeking to document the value of a predictive association would be an observational study in which $x_i$ was the score of a statistics student on the first examination in a course and $y_i$ was the student's score on the final examination in the course.

A recommended first step is to create the scatterplot of the observations, with the vertical axis representing the dependent variable and the horizontal axis representing the independent variable. The "pencil test" is to hold a pencil up to the scatterplot and examine whether a straight line describes the data well. If so, then it is reasonable to assume that a linear model (such as $\beta_0 + \beta_1 x$) describes the data. The linear model is reasonable for many data sets in observational studies. A more objective procedure is to use a "nonlinear smoother" such as LOWESS to estimate the association. If the LOWESS curve is not well approximated by a line, then the assumption of linearity is not reasonable.
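As a concrete illustration, here is a minimal sketch in Python (assuming matplotlib and statsmodels are available; the arrays x and y are hypothetical stand-ins for the observed data) that draws the scatterplot and overlays a LOWESS curve. If the LOWESS curve hugs a straight line, the linear model is a reasonable description.

```python
# Sketch: scatterplot plus LOWESS smoother to check linearity.
# Assumes numpy arrays x (independent) and y (dependent) of equal length.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)                 # hypothetical independent variable
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)   # hypothetical dependent variable

smooth = lowess(y, x, frac=2/3)             # columns: sorted x, smoothed y

plt.scatter(x, y, s=15, label="observations")
plt.plot(smooth[:, 0], smooth[:, 1], color="red", label="LOWESS")
plt.xlabel("independent variable x")
plt.ylabel("dependent variable y")
plt.legend()
plt.show()
```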

Estimating the Linear Model Parameters (section 11.2)

OLS (ordinary least squares) is the most commonly used method to estimate the parameters of the linear model. An arbitrary linear model $b_0 + b_1 x$ is used as a fit for the dependent variable values. The method uses the residuals $y_i - b_0 - b_1 x_i$. The fitted model is judged by how small the set of residuals is. OLS uses each residual and focuses on the magnitude of the residuals by examining the sum of squares function
$$SS(b_0, b_1) = \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2.$$
The OLS method is to find the arguments $(\hat\beta_0, \hat\beta_1)$ that make $SS(b_0, b_1)$ as small as possible. This minimization is a standard calculus problem. Step 1 is to calculate the partial derivatives of $SS(b_0, b_1)$ with respect to each argument. First, the partial with respect to $b_0$:
$$\frac{\partial SS(b_0, b_1)}{\partial b_0} = \frac{\partial}{\partial b_0}\sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2 = \sum_{i=1}^n 2(y_i - b_0 - b_1 x_i)\frac{\partial (y_i - b_0 - b_1 x_i)}{\partial b_0} = \sum_{i=1}^n (-2)(y_i - b_0 - b_1 x_i).$$

Second, the partial with respect to $b_1$:
$$\frac{\partial SS(b_0, b_1)}{\partial b_1} = \frac{\partial}{\partial b_1}\sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2 = \sum_{i=1}^n 2(y_i - b_0 - b_1 x_i)\frac{\partial (y_i - b_0 - b_1 x_i)}{\partial b_1} = \sum_{i=1}^n (-2 x_i)(y_i - b_0 - b_1 x_i).$$
Step 2 is to find the arguments $(\hat\beta_0, \hat\beta_1)$ that make the two partial derivatives zero. The resulting equations are called the normal equations:
$$\sum_{i=1}^n (-2)(y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0 \quad\text{and}\quad \sum_{i=1}^n (-2)(y_i - \hat\beta_0 - \hat\beta_1 x_i)\,x_i = 0.$$
These equations have a very important interpretation. Let $r_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$, $i = 1, \ldots, n$. The first normal equation is equivalent to $\sum_{i=1}^n r_i = 0$, and the second is equivalent to $\sum_{i=1}^n r_i x_i = 0$. That is, there are two constraints on the $n$ residuals. The OLS residuals must sum to zero, and the OLS residuals are orthogonal to the independent variable values. The $n$ residuals then have $n-2$ degrees of freedom.
Step 3 is to solve this system of two linear equations in two unknowns. Start by using the first normal equation to solve for $\hat\beta_0$:
$$\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i) = \sum_{i=1}^n y_i - \sum_{i=1}^n \hat\beta_0 - \sum_{i=1}^n \hat\beta_1 x_i = n\bar{y}_n - n\hat\beta_0 - \hat\beta_1 (n\bar{x}_n) = 0.$$
Solving for $\hat\beta_0$ yields $\hat\beta_0 = \bar{y}_n - \hat\beta_1 \bar{x}_n$. Next, insert the solution for $\hat\beta_0$ into the second normal equation and solve for $\hat\beta_1$:
$$0 = \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)x_i = \sum_{i=1}^n \{[y_i - (\bar{y}_n - \hat\beta_1\bar{x}_n) - \hat\beta_1 x_i]\,x_i\} = \sum_{i=1}^n (y_i - \bar{y}_n)x_i - \sum_{i=1}^n \hat\beta_1 (x_i - \bar{x}_n)x_i.$$
The solution is
$$\hat\beta_1 = \frac{\sum_{i=1}^n (y_i - \bar{y}_n)\,x_i}{\sum_{i=1}^n (x_i - \bar{x}_n)\,x_i}.$$
There are a number of modifications of this formula that are helpful. The first results from noting that
$$\sum_{i=1}^n (x_i - \bar{x}_n)^2 = \sum_{i=1}^n (x_i - \bar{x}_n)x_i - \sum_{i=1}^n (x_i - \bar{x}_n)\bar{x}_n = \sum_{i=1}^n (x_i - \bar{x}_n)x_i$$
and
$$\sum_{i=1}^n (y_i - \bar{y}_n)(x_i - \bar{x}_n) = \sum_{i=1}^n (y_i - \bar{y}_n)x_i - \sum_{i=1}^n (y_i - \bar{y}_n)\bar{x}_n = \sum_{i=1}^n (y_i - \bar{y}_n)x_i.$$
The OLS solution is then
$$\hat\beta_1 = \frac{\sum_{i=1}^n (y_i - \bar{y}_n)(x_i - \bar{x}_n)}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}.$$
This is a very commonly quoted formula.
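A minimal sketch of these formulas, assuming NumPy and hypothetical simulated arrays x and y: it computes the slope from the centered cross-products, the intercept from $\hat\beta_0 = \bar{y}_n - \hat\beta_1\bar{x}_n$, and checks the result against numpy.polyfit.

```python
# Sketch: OLS estimates from the closed-form normal-equation solution.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(50, 10, 200)                  # hypothetical predictor
y = 10 + 0.8 * x + rng.normal(0, 5, 200)     # hypothetical response

xbar, ybar = x.mean(), y.mean()
beta1_hat = np.sum((y - ybar) * (x - xbar)) / np.sum((x - xbar) ** 2)
beta0_hat = ybar - beta1_hat * xbar

# Equivalent form derived in the next paragraph: slope = r(x, y) * s_Y / s_X
r_xy = np.corrcoef(x, y)[0, 1]
slope_via_r = r_xy * y.std(ddof=1) / x.std(ddof=1)

print(beta0_hat, beta1_hat)
print(np.allclose(beta1_hat, slope_via_r))                        # True
print(np.allclose(np.polyfit(x, y, 1), [beta1_hat, beta0_hat]))   # True
```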
The second modification shows the relation of $\hat\beta_1$ and the Pearson product moment correlation. The Pearson product moment correlation is a dimensionless measure of association. The formula is
$$r(x, y) = \frac{\sum_{i=1}^n (y_i - \bar{y}_n)(x_i - \bar{x}_n)}{\sqrt{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\,\sqrt{\sum_{i=1}^n (y_i - \bar{y}_n)^2}}.$$
The Cauchy-Schwarz inequality shows that $|r(x, y)| \le 1$. A correlation of $+1$ or $-1$ shows a perfect linear association. A correlation of $0$ means no linear association. The numerators of $\hat\beta_1$ and $r(x, y)$ are the same. Starting with $\hat\beta_1$,
$$\hat\beta_1 = \frac{\sum_{i=1}^n (y_i - \bar{y}_n)(x_i - \bar{x}_n)}{\sum_{i=1}^n (x_i - \bar{x}_n)^2} = \frac{\sum_{i=1}^n (y_i - \bar{y}_n)(x_i - \bar{x}_n)}{\sqrt{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\,\sqrt{\sum_{i=1}^n (y_i - \bar{y}_n)^2}} \cdot \frac{\sqrt{\sum_{i=1}^n (y_i - \bar{y}_n)^2}}{\sqrt{\sum_{i=1}^n (x_i - \bar{x}_n)^2}}.$$
That is,
$$\hat\beta_1 = r(x, y) \cdot \frac{\sqrt{(n-1)s_Y^2}}{\sqrt{(n-1)s_X^2}} = r(x, y) \cdot \frac{s_Y}{s_X}.$$
The second formula is then $\hat\beta_1 = \dfrac{s_Y}{s_X}\cdot r(x, y)$.

The next variation will be used in calculating the distributional properties of $\hat\beta_1$ and uses the identity
$$\sum_{i=1}^n (y_i - \bar{y}_n)(x_i - \bar{x}_n) = \sum_{i=1}^n (x_i - \bar{x}_n)y_i - \sum_{i=1}^n (x_i - \bar{x}_n)\bar{y}_n = \sum_{i=1}^n (x_i - \bar{x}_n)y_i.$$
Then
$$\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x}_n)\,y_i}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}.$$

Fisher’s Decomposition of the Total Sum of Squares

The total sum of squares of the dependent variable is defined to be
$$TSS = \sum_{i=1}^n (y_i - \bar{y}_n)^2,$$
with $n-1$ degrees of freedom. The $i$th residual was defined above to be $r_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$, $i = 1, \ldots, n$. After substituting for $\hat\beta_0$, $r_i = y_i - \bar{y}_n - \hat\beta_1 (x_i - \bar{x}_n)$, $i = 1, \ldots, n$.

Fisher's decomposition is a fundamental tool for the analysis of the linear model. It starts with
$$TSS = \sum_{i=1}^n (y_i - \bar{y}_n)^2 = \sum_{i=1}^n [y_i - \bar{y}_n - \hat\beta_1 (x_i - \bar{x}_n) + \hat\beta_1 (x_i - \bar{x}_n)]^2 = \sum_{i=1}^n [r_i + \hat\beta_1 (x_i - \bar{x}_n)]^2.$$
Next,
$$TSS = \sum_{i=1}^n [r_i + \hat\beta_1 (x_i - \bar{x}_n)]^2 = \sum_{i=1}^n [r_i^2 + \hat\beta_1^2 (x_i - \bar{x}_n)^2 + 2\hat\beta_1 r_i (x_i - \bar{x}_n)],$$
and
$$TSS = \sum_{i=1}^n r_i^2 + \sum_{i=1}^n \hat\beta_1^2 (x_i - \bar{x}_n)^2 + 2\hat\beta_1 \sum_{i=1}^n r_i (x_i - \bar{x}_n).$$
The first sum, $\sum_{i=1}^n r_i^2 = SSE$, is the sum of squared errors and has $n-2$ degrees of freedom. The second sum, $\sum_{i=1}^n \hat\beta_1^2 (x_i - \bar{x}_n)^2$, is called the regression sum of squares and has 1 degree of freedom. It can be simplified:
$$RegSS = \sum_{i=1}^n \hat\beta_1^2 (x_i - \bar{x}_n)^2 = \hat\beta_1^2 \sum_{i=1}^n (x_i - \bar{x}_n)^2 = [r(x, y)]^2 \left[\frac{\sqrt{\sum_{i=1}^n (y_i - \bar{y}_n)^2}}{\sqrt{\sum_{i=1}^n (x_i - \bar{x}_n)^2}}\right]^2 \sum_{i=1}^n (x_i - \bar{x}_n)^2$$
and
$$RegSS = [r(x, y)]^2 \frac{\sum_{i=1}^n (y_i - \bar{y}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2} \sum_{i=1}^n (x_i - \bar{x}_n)^2 = [r(x, y)]^2 \sum_{i=1}^n (y_i - \bar{y}_n)^2 = [r(x, y)]^2\, TSS.$$
Finally, the third sum is
$$2\hat\beta_1 \sum_{i=1}^n r_i (x_i - \bar{x}_n) = 2\hat\beta_1 \Big(\sum_{i=1}^n r_i x_i - \sum_{i=1}^n r_i \bar{x}_n\Big) = 2\hat\beta_1 (0 - 0) = 0.$$

This is conventionally displayed in an Analysis of Variance Table as below:

Analysis of Variance Table: One Predictor Linear Regression

Source       DF     Sum of Squares                   Mean Square                            F
Regression   1      $RegSS = [r(x,y)]^2\,TSS$        $[r(x,y)]^2\,TSS$                      $\dfrac{(n-2)[r(x,y)]^2}{1-[r(x,y)]^2}$
Error        n-2    $SSE = \{1-[r(x,y)]^2\}\,TSS$    $\dfrac{\{1-[r(x,y)]^2\}\,TSS}{n-2}$
Total        n-1    $TSS = (n-1)s_{DV}^2$
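A brief numerical check of the decomposition, under the same kind of assumptions as the earlier sketch (NumPy, hypothetical data): it verifies that TSS = SSE + RegSS, that RegSS = [r(x, y)]^2 TSS, and that the residuals satisfy the two normal-equation constraints.

```python
# Sketch: verify Fisher's decomposition TSS = SSE + RegSS numerically.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 150)                    # hypothetical predictor
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, 150)  # hypothetical response

xbar, ybar = x.mean(), y.mean()
beta1 = np.sum((y - ybar) * (x - xbar)) / np.sum((x - xbar) ** 2)
beta0 = ybar - beta1 * xbar
resid = y - beta0 - beta1 * x

TSS = np.sum((y - ybar) ** 2)
SSE = np.sum(resid ** 2)
RegSS = beta1 ** 2 * np.sum((x - xbar) ** 2)
r2 = np.corrcoef(x, y)[0, 1] ** 2

print(np.allclose(TSS, SSE + RegSS))    # True
print(np.allclose(RegSS, r2 * TSS))     # True
print(np.allclose(resid.sum(), 0), np.allclose(resid @ x, 0))  # residual constraints
```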

11.3 Inferences
There must be a probabilistic model for the data so that researchers can make inferences and find confidence intervals. The model for one-predictor linear regression is $Y_i = \beta_0 + \beta_1 x_i + \sigma_{Y|x} Z_i$. The outcome or dependent (random) variables $Y_i$, $i = 1, \ldots, n$, are each assumed to be the sum of the linear regression expected value $\beta_0 + \beta_1 x_i$ and a random error term $\sigma_{Y|x} Z_i$. The random variables $Z_i$, $i = 1, \ldots, n$, are assumed to be independent standard normal random variables. The parameter $\beta_0$ is the intercept parameter and is fixed but unknown. The parameter $\beta_1$ is the slope parameter and is also fixed but unknown; this parameter is the focus of the statistical analysis. The parameter $\sigma_{Y|x}$ is also fixed but unknown. Another description of this model is that $Y_i$, $i = 1, \ldots, n$, are independent normally distributed random variables with $Y_i$ having the distribution $N(\beta_0 + \beta_1 x_i, \sigma^2_{Y|x})$. That is, $E(Y_i \mid X = x_i) = \beta_0 + \beta_1 x_i$, and $var(Y_i \mid X = x_i) = \sigma^2_{Y|x}$. The assumption that $var(Y_i \mid X = x_i) = \sigma^2_{Y|x}$ for every $i$ is called the homoscedasticity assumption.

There are four assumptions. Two are the most important: the outcome variables $Y_i$, $i = 1, \ldots, n$, are independent, and $E(Y_i \mid X = x_i) = \beta_0 + \beta_1 x_i$ for $i = 1, \ldots, n$. Homoscedasticity is less important. The assumption that $Y_i$, $i = 1, \ldots, n$, are normally distributed random variables is the least important.

Variance Calculations

The most complex variance formula in this course so far is
$$var(aX + bY) = a^2\,var(X) + b^2\,var(Y) + 2ab\,cov(X, Y).$$
More complex calculations are required for the variance-covariance matrix of the OLS estimates. The easiest way is to use the variance-covariance matrix of a random vector. Let $Y$ be an $n\times 1$ vector of random variables $(Y_1, Y_2, \ldots, Y_n)^T$. That is, each component of the vector is a random variable. Then the expected value of the vector $Y$ is the $n\times 1$ vector whose components are the respective means of the random variables; that is, $E(Y) = (EY_1, EY_2, \ldots, EY_n)^T$. The variance-covariance matrix of the random vector $Y$ is the $n\times n$ matrix whose diagonal entries are the respective variances of the random variables and whose off-diagonal elements are the covariances of the random variables. That is,
$$vcv(Y) = \begin{bmatrix} var(Y_1) & cov(Y_1, Y_2) & \cdots & cov(Y_1, Y_n)\\ cov(Y_2, Y_1) & var(Y_2) & \cdots & cov(Y_2, Y_n)\\ \vdots & \vdots & \ddots & \vdots\\ cov(Y_n, Y_1) & cov(Y_n, Y_2) & \cdots & var(Y_n) \end{bmatrix}.$$
In terms of expectation operator calculations, $vcv(Y) = E[(Y - EY)(Y - EY)^T] = \Sigma$.

Variance of a Set of Linear Combinations

Let $W$ be the $m\times 1$ random vector of linear combinations of $Y$ given by $W = MY$, where $M$ is a matrix of constants having $m$ rows and $n$ columns. Then $E(W) = E(MY) = M\,E(Y)$. The definition of the variance-covariance matrix of $W$ is
$$vcv(W) = E[(W - EW)(W - EW)^T] = E[(MY - MEY)(MY - MEY)^T],$$
and
$$vcv(W) = E[(MY - MEY)(MY - MEY)^T] = E\{M(Y - EY)[M(Y - EY)]^T\}.$$
From matrix algebra, when $A$ is an $n\times m$ matrix and $B$ is an $m\times p$ matrix, then $(AB)^T = B^T A^T$. Then $[M(Y - EY)]^T = (Y - EY)^T M^T$, and
$$vcv(W) = vcv(MY) = E\{M(Y - EY)(Y - EY)^T M^T\} = M\{E[(Y - EY)(Y - EY)^T]\}M^T$$
from the linear operator property of $E$. Since $vcv(Y) = E[(Y - EY)(Y - EY)^T] = \Sigma$,
$$vcv(W) = vcv(MY) = M\times vcv(Y)\times M^T = M\Sigma M^T.$$
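The identity can be checked numerically. The sketch below (NumPy assumed; the matrices M and Sigma are arbitrary illustrative choices, not from the text) compares M Sigma M^T with the empirical covariance of simulated draws of W = MY.

```python
# Sketch: check vcv(MY) = M Sigma M^T against simulated draws.
import numpy as np

rng = np.random.default_rng(3)
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])          # illustrative vcv(Y)
M = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])              # illustrative 2x3 matrix of constants

theory = M @ Sigma @ M.T                     # vcv(W) from the identity

Y = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=200_000)
W = Y @ M.T                                  # each row is one draw of W = M Y
empirical = np.cov(W, rowvar=False)

print(theory)
print(empirical)                             # close to the theoretical matrix
```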

Examples

The first use of this result is to find the variance of a linear combination of values from $Y$, an $n\times 1$ vector of random variables. Let $a$ be an $n\times 1$ vector of constants, and let $W = a^T Y$. Then $var(a^T Y) = a^T\times vcv(Y)\times (a^T)^T = a^T\times vcv(Y)\times a$. This is the completely general form of $var(aX + bY) = a^2\,var(X) + b^2\,var(Y) + 2ab\,cov(X, Y)$.

The second example is fundamental to this chapter. The OLS estimates of the parameters are always the same functions of the observed data: $\hat\beta_0 = \bar{y}_n - \hat\beta_1 \bar{x}_n$ and
$$\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x}_n)\,y_i}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}.$$
It is then reasonable to study the random variables


$$\bar{Y}_n = \frac{\sum_{i=1}^n Y_i}{n} = \frac{1}{n}Y_1 + \frac{1}{n}Y_2 + \cdots + \frac{1}{n}Y_n$$
and
$$\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x}_n)\,Y_i}{\sum_{i=1}^n (x_i - \bar{x}_n)^2} = \frac{(x_1 - \bar{x}_n)}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}Y_1 + \frac{(x_2 - \bar{x}_n)}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}Y_2 + \cdots + \frac{(x_n - \bar{x}_n)}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}Y_n.$$
Let
$$w_i = \frac{x_i - \bar{x}_n}{\sum_{j=1}^n (x_j - \bar{x}_n)^2}, \quad i = 1, \ldots, n.$$
Then
$$\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x}_n)\,Y_i}{\sum_{i=1}^n (x_i - \bar{x}_n)^2} = w_1 Y_1 + w_2 Y_2 + \cdots + w_n Y_n.$$

Let
$$\begin{pmatrix}\bar{Y}_n\\ \hat\beta_1\end{pmatrix} = \begin{bmatrix}1/n & 1/n & \cdots & 1/n\\ w_1 & w_2 & \cdots & w_n\end{bmatrix}\begin{bmatrix}Y_1\\ Y_2\\ \vdots\\ Y_n\end{bmatrix},$$
which has the form $MY$, where
$$M = \begin{bmatrix}1/n & 1/n & \cdots & 1/n\\ w_1 & w_2 & \cdots & w_n\end{bmatrix}.$$
In the model $Y_i = \beta_0 + \beta_1 x_i + \sigma_{Y|x} Z_i$, $vcv(Y) = \sigma^2_{Y|x} I_{n\times n}$. Then
$$vcv\begin{pmatrix}\bar{Y}_n\\ \hat\beta_1\end{pmatrix} = \begin{bmatrix}1/n & 1/n & \cdots & 1/n\\ w_1 & w_2 & \cdots & w_n\end{bmatrix}\times \sigma^2_{Y|x} I_{n\times n}\times \begin{bmatrix}1/n & w_1\\ 1/n & w_2\\ \vdots & \vdots\\ 1/n & w_n\end{bmatrix} = \sigma^2_{Y|x}\begin{bmatrix}1/n & 1/n & \cdots & 1/n\\ w_1 & w_2 & \cdots & w_n\end{bmatrix}\begin{bmatrix}1/n & w_1\\ 1/n & w_2\\ \vdots & \vdots\\ 1/n & w_n\end{bmatrix}.$$

Then,
$$vcv\begin{pmatrix}\bar{Y}_n\\ \hat\beta_1\end{pmatrix} = \sigma^2_{Y|x}\begin{bmatrix}\displaystyle\sum_{i=1}^n \frac{1}{n^2} & \displaystyle\sum_{i=1}^n \frac{w_i}{n}\\[2ex] \displaystyle\sum_{i=1}^n \frac{w_i}{n} & \displaystyle\sum_{i=1}^n w_i^2\end{bmatrix}.$$
Now
$$\sum_{i=1}^n \frac{w_i}{n} = \sum_{i=1}^n \frac{1}{n}\,\frac{x_i - \bar{x}_n}{\sum_{j=1}^n (x_j - \bar{x}_n)^2} = \frac{1}{n\sum_{j=1}^n (x_j - \bar{x}_n)^2}\sum_{i=1}^n (x_i - \bar{x}_n) = 0,$$
and
$$\sum_{i=1}^n w_i^2 = \sum_{i=1}^n \left[\frac{x_i - \bar{x}_n}{\sum_{j=1}^n (x_j - \bar{x}_n)^2}\right]^2 = \frac{1}{\left[\sum_{j=1}^n (x_j - \bar{x}_n)^2\right]^2}\sum_{i=1}^n (x_i - \bar{x}_n)^2 = \frac{1}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}.$$

The final result is that
$$vcv\begin{pmatrix}\bar{Y}_n\\ \hat\beta_1\end{pmatrix} = \sigma^2_{Y|x}\begin{bmatrix}\displaystyle\sum_{i=1}^n \frac{1}{n^2} & \displaystyle\sum_{i=1}^n \frac{w_i}{n}\\[2ex] \displaystyle\sum_{i=1}^n \frac{w_i}{n} & \displaystyle\sum_{i=1}^n w_i^2\end{bmatrix} = \begin{bmatrix}\dfrac{\sigma^2_{Y|x}}{n} & 0\\[2ex] 0 & \dfrac{\sigma^2_{Y|x}}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\end{bmatrix}.$$
To summarize this result, $var(\bar{Y}_n) = \dfrac{\sigma^2_{Y|x}}{n}$, $var(\hat\beta_1) = \dfrac{\sigma^2_{Y|x}}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}$, and $cov(\bar{Y}_n, \hat\beta_1) = 0$.
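These three facts can also be checked by simulation. The following sketch (NumPy assumed; the design points x and the parameters beta0, beta1, sigma are illustrative) repeatedly generates Y from the model, recomputes $\bar{Y}_n$ and $\hat\beta_1$, and compares the empirical variances and covariance with $\sigma^2/n$, $\sigma^2/\sum(x_i - \bar{x}_n)^2$, and 0.

```python
# Sketch: Monte Carlo check of var(Ybar), var(beta1_hat), and their covariance.
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 40)            # fixed design points (illustrative)
beta0, beta1, sigma = 3.0, 1.2, 2.0   # illustrative parameters
Sxx = np.sum((x - x.mean()) ** 2)

ybars, slopes = [], []
for _ in range(20_000):
    y = beta0 + beta1 * x + sigma * rng.standard_normal(x.size)
    b1 = np.sum((x - x.mean()) * y) / Sxx
    ybars.append(y.mean())
    slopes.append(b1)

print(np.var(ybars), sigma**2 / x.size)   # ~ sigma^2 / n
print(np.var(slopes), sigma**2 / Sxx)     # ~ sigma^2 / sum of squared deviations
print(np.cov(ybars, slopes)[0, 1])        # ~ 0
```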

Testing a null hypothesis about $\beta_1$

The last detail before deriving tests and confidence intervals for the slope of the regression function is to find $E(\hat\beta_1)$. Then
$$E(\hat\beta_1) = E\Big(\sum_{i=1}^n w_i Y_i\Big) = \sum_{i=1}^n w_i E(Y_i) = \sum_{i=1}^n w_i(\beta_0 + \beta_1 x_i) = \beta_0\Big(\sum_{i=1}^n w_i\Big) + \beta_1\Big(\sum_{i=1}^n w_i x_i\Big).$$
Now, from above,
$$\sum_{i=1}^n w_i = \frac{1}{\sum_{j=1}^n (x_j - \bar{x}_n)^2}\sum_{i=1}^n (x_i - \bar{x}_n) = 0.$$
The second sum is
$$\sum_{i=1}^n w_i x_i = \sum_{i=1}^n \frac{(x_i - \bar{x}_n)\,x_i}{\sum_{j=1}^n (x_j - \bar{x}_n)^2} = \frac{\sum_{i=1}^n (x_i - \bar{x}_n)\,x_i}{\sum_{j=1}^n (x_j - \bar{x}_n)^2} = 1.$$
Then $E(\hat\beta_1) = \beta_1$. Under the data model, the distribution of $\hat\beta_1$ is
$$N\Big(\beta_1,\ \frac{\sigma^2_{Y|x}}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\Big).$$

The key null hypothesis is $H_0: \beta_1 = 0$, and the alternative hypothesis is $H_1: \beta_1 \ne 0$. The test statistic is $\hat\beta_1$, and its null distribution is
$$N\Big(0,\ \frac{\sigma^2_{Y|x}}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\Big).$$
The standard score form of the statistic is
$$Z = \frac{\hat\beta_1 - 0}{\sqrt{\dfrac{\sigma^2_{Y|x}}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}}}.$$
When the level of significance is $\alpha$ and $\sigma^2_{Y|x}$ is known, then $H_0: \beta_1 = 0$ is rejected when $|Z| \ge |z_{\alpha/2}|$. When $\sigma^2_{Y|x}$ is not known, it is estimated by $\hat\sigma^2_{Y|x} = MSE$. This requires the use of the Student's t distribution. The studentized form of the statistic is
$$T_{n-2} = \frac{\hat\beta_1 - 0}{\sqrt{\dfrac{MSE}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}}}.$$
Then $H_0: \beta_1 = 0$ is rejected when $|T_{n-2}| \ge |t_{\alpha/2,\,n-2}|$. An equivalent approach is to use
$$TS = \frac{MS_{REG}}{MSE} = F.$$
Under $H_0: \beta_1 = 0$, the null distribution of $F$ is a central F with 1 numerator and $n-2$ denominator degrees of freedom.
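A minimal sketch of the studentized test, assuming NumPy and SciPy and hypothetical arrays x and y: it forms MSE from the residuals, computes $T_{n-2}$ and its two-sided p-value, and confirms that the squared statistic equals the F ratio.

```python
# Sketch: t test of H0: beta1 = 0 using the studentized statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(0, 1, 60)                    # hypothetical predictor
y = 0.5 + 0.4 * x + rng.normal(0, 1, 60)    # hypothetical response

n = x.size
xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - ybar)) / Sxx
b0 = ybar - b1 * xbar
resid = y - b0 - b1 * x
MSE = np.sum(resid ** 2) / (n - 2)

T = b1 / np.sqrt(MSE / Sxx)
p_value = 2 * stats.t.sf(abs(T), df=n - 2)        # two-sided p-value
print(T, p_value)
print(np.isclose(T ** 2, (b1 ** 2 * Sxx) / MSE))  # T^2 equals the F ratio
```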

Confidence interval for $\beta_1$

When $\sigma^2_{Y|x}$ is known, the $100(1-\alpha)\%$ confidence interval for $\beta_1$ is
$$\hat\beta_1 \pm |z_{\alpha/2}|\sqrt{\frac{\sigma^2_{Y|x}}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}}.$$
When $\sigma^2_{Y|x}$ is not known, the $100(1-\alpha)\%$ confidence interval for $\beta_1$ is
$$\hat\beta_1 \pm |t_{\alpha/2,\,n-2}|\sqrt{\frac{MSE}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}}.$$

Distribution of the Estimated Intercept

Since $\hat\beta_0 = \bar{Y}_n - \hat\beta_1\bar{x}_n$, $\hat\beta_0$ is normally distributed with
$$E(\hat\beta_0) = E(\bar{Y}_n - \hat\beta_1\bar{x}_n) = E(\bar{Y}_n) - E(\hat\beta_1\bar{x}_n) = E(\bar{Y}_n) - \bar{x}_n E(\hat\beta_1) = E(\bar{Y}_n) - \beta_1\bar{x}_n,$$
because $E(\hat\beta_1) = \beta_1$. Now
$$E(\bar{Y}_n) = E\Big(\frac{Y_1 + Y_2 + \cdots + Y_n}{n}\Big) = \frac{E(Y_1) + E(Y_2) + \cdots + E(Y_n)}{n} = \frac{(\beta_0 + \beta_1 x_1) + \cdots + (\beta_0 + \beta_1 x_n)}{n},$$
with
$$E(\bar{Y}_n) = \frac{(\beta_0 + \beta_1 x_1) + \cdots + (\beta_0 + \beta_1 x_n)}{n} = \frac{n\beta_0 + \beta_1\sum_{i=1}^n x_i}{n} = \beta_0 + \beta_1\bar{x}_n.$$
Then $E(\hat\beta_0) = E(\bar{Y}_n) - \beta_1\bar{x}_n = \beta_0 + \beta_1\bar{x}_n - \beta_1\bar{x}_n = \beta_0$. Finally,
$$var(\hat\beta_0) = var(\bar{Y}_n - \hat\beta_1\bar{x}_n) = var(\bar{Y}_n) + \bar{x}_n^2\,var(\hat\beta_1) - 2\bar{x}_n\,cov(\bar{Y}_n, \hat\beta_1) = \frac{\sigma^2_{Y|x}}{n} + \frac{\bar{x}_n^2\,\sigma^2_{Y|x}}{\sum_{i=1}^n (x_i - \bar{x}_n)^2} - 2\cdot 0.$$
In summary, $\hat\beta_0$ is
$$N\Big(\beta_0,\ \frac{\sigma^2_{Y|x}}{n} + \frac{\bar{x}_n^2\,\sigma^2_{Y|x}}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\Big).$$

Confidence Interval for $\hat{Y}(x)$

Since $\hat{Y}(x) = \hat\beta_0 + \hat\beta_1 x$, $E[\hat{Y}(x)] = E(\hat\beta_0 + \hat\beta_1 x) = E(\hat\beta_0) + x\,E(\hat\beta_1) = \beta_0 + \beta_1 x$.

For the variance,
$$var[\hat{Y}(x)] = var[\bar{Y}_n + \hat\beta_1(x - \bar{x}_n)] = var(\bar{Y}_n) + (x - \bar{x}_n)^2\,var(\hat\beta_1) + 2(x - \bar{x}_n)\,cov(\bar{Y}_n, \hat\beta_1),$$
with
$$var[\hat{Y}(x)] = \frac{\sigma^2_{Y|x}}{n} + \frac{(x - \bar{x}_n)^2\,\sigma^2_{Y|x}}{\sum_{i=1}^n (x_i - \bar{x}_n)^2} + 2(x - \bar{x}_n)\cdot 0 = \sigma^2_{Y|x}\Big(\frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\Big).$$
In summary, $\hat{Y}(x)$ has the normal distribution
$$N\Big(\beta_0 + \beta_1 x,\ \sigma^2_{Y|x}\Big(\frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\Big)\Big).$$
When $\sigma^2_{Y|x}$ is known, the 95% confidence interval for $E[\hat{Y}(x)] = \beta_0 + \beta_1 x$ is
$$\hat{Y}(x) \pm 1.960\sqrt{\sigma^2_{Y|x}\Big(\frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\Big)}.$$
When $\sigma^2_{Y|x}$ is not known, an estimate of $\sigma^2_{Y|x}$ is used, and the t-percentile is used rather than the z-percentile, here 1.960. If the four assumptions are met, then $E(MSE) = \sigma^2_{Y|x}$. The 95% confidence interval for $E[\hat{Y}(x)] = \beta_0 + \beta_1 x$ is then
$$\hat{Y}(x) \pm t_{1.960,\,n-2}\sqrt{MSE\Big(\frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\Big)}.$$

Prediction Interval for a Future Observation $Y_F(x)$

Let $Y_F(x)$ be the future value observed with the independent variable value set to $x$. That is, $Y_F(x)$ is $N(\beta_0 + \beta_1 x, \sigma^2_{Y|x})$. Its distribution is independent of $Y_i$, $i = 1, \ldots, n$.

At a time before $Y_i$, $i = 1, \ldots, n$, have been observed, $\hat{Y}(x)$ has the normal distribution
$$N\Big(\beta_0 + \beta_1 x,\ \sigma^2_{Y|x}\Big(\frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\Big)\Big).$$
The distribution of $Y_F(x) - \hat{Y}(x)$ is normal. The expected value is
$$E[Y_F(x) - \hat{Y}(x)] = E[Y_F(x)] - E[\hat{Y}(x)] = \beta_0 + \beta_1 x - (\beta_0 + \beta_1 x) = 0,$$
and
$$var[Y_F(x) - \hat{Y}(x)] = var[Y_F(x)] + var[\hat{Y}(x)] - 2\,cov[Y_F(x), \hat{Y}(x)] = \sigma^2_{Y|x} + \sigma^2_{Y|x}\Big(\frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\Big) - 2\cdot 0.$$
In summary, $Y_F(x) - \hat{Y}(x)$ is
$$N\Big(0,\ \sigma^2_{Y|x}\Big(1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\Big)\Big).$$
To set up a 99% prediction interval, one starts with
$$\Pr\Big\{0 - 2.576\,\sigma_{Y|x}\sqrt{1 + \tfrac{1}{n} + \tfrac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}} \le Y_F(x) - \hat{Y}(x) \le 0 + 2.576\,\sigma_{Y|x}\sqrt{1 + \tfrac{1}{n} + \tfrac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}}\Big\} = 0.99.$$
Then,
$$\Pr\Big\{\hat{Y}(x) - 2.576\,\sigma_{Y|x}\sqrt{1 + \tfrac{1}{n} + \tfrac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}} \le Y_F(x) \le \hat{Y}(x) + 2.576\,\sigma_{Y|x}\sqrt{1 + \tfrac{1}{n} + \tfrac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}}\Big\} = 0.99.$$
Assuming $\sigma^2_{Y|x}$ known, a 99% prediction interval for $Y_F(x)$ is the interval between
$$\hat{Y}(x) - 2.576\,\sigma_{Y|x}\sqrt{1 + \tfrac{1}{n} + \tfrac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}} \quad\text{and}\quad \hat{Y}(x) + 2.576\,\sigma_{Y|x}\sqrt{1 + \tfrac{1}{n} + \tfrac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}}.$$
There are two problems with this interval. The first is that $\hat{Y}(x)$ is a random variable; in other words, it will be observed in the future. The resolution to this is to set the time of the prediction to be after the collection of the regression data but before the future observation is made. Then the 99% prediction interval is the interval between
$$\hat{y}(x) - 2.576\,\sigma_{Y|x}\sqrt{1 + \tfrac{1}{n} + \tfrac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}} \quad\text{and}\quad \hat{y}(x) + 2.576\,\sigma_{Y|x}\sqrt{1 + \tfrac{1}{n} + \tfrac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}},$$
where $\hat{y}(x)$ is the fitted value based on the regression data using the independent variable setting $x$. The second problem is that $\sigma^2_{Y|x}$ is not known. As usual, we estimate $\sigma^2_{Y|x}$ using MSE and stretch 2.576 by the t-distribution with $n-2$ degrees of freedom. The 99% prediction interval is then the interval between
$$\hat{y}(x) - t_{2.576,\,n-2}\sqrt{MSE\Big(1 + \tfrac{1}{n} + \tfrac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\Big)} \quad\text{and}\quad \hat{y}(x) + t_{2.576,\,n-2}\sqrt{MSE\Big(1 + \tfrac{1}{n} + \tfrac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\Big)}.$$
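For routine work, both intervals are available from standard software. A sketch using statsmodels (the data arrays are again hypothetical): get_prediction returns the confidence interval for the mean response and, with obs=True, the prediction interval for a future observation.

```python
# Sketch: 99% confidence interval for the mean response and 99% prediction
# interval for a future observation, via statsmodels OLS.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(400, 100, 120)                  # hypothetical predictor
y = 190 + 0.8 * x + rng.normal(0, 95, 120)     # hypothetical response

X = sm.add_constant(x)                         # columns: intercept, x
fit = sm.OLS(y, X).fit()

x_new = np.array([[1.0, 550.0]])               # intercept term plus x = 550
pred = fit.get_prediction(x_new)
print(pred.conf_int(alpha=0.01))               # 99% CI for E[Y | x = 550]
print(pred.conf_int(obs=True, alpha=0.01))     # 99% prediction interval
```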

Problem 1

A research team collected data on $n = 450$ students in a statistics course. The observed average final examination score was 524, with an observed standard deviation of 127.6 (the divisor in the estimated variance was $n-1$). The average first examination score was 397, with an observed standard deviation of 96.4. The correlation coefficient between the first examination score and the final examination score was 0.63.
a. Report the analysis of variance table and result of the test of the null
hypothesis that the slope of the regression line of final exam score on
first exam score is zero against the alternative that it is not. Use the 0.10,
0.05, and 0.01 levels of significance.
b. Determine the least-squares fitted equation and give the 99% confidence
interval for the slope of the regression of final examination score on first
examination score.
c. Use the least-squares prediction equation to estimate the final
examination score of students who scored 550 on the first examination.
Give the 99% confidence interval for the expected final examination
score of these students.
d. Use the least-squares prediction equation to predict the final examination
score of a student who scored 550 on the first examination. Give the 99%
prediction interval for the final examination score of this student.

Solution:

For part a, the first task is to identify which variable is dependent and which is independent. The question asks for the "regression line of final exam score on first exam score." This phrasing identifies the first exam score as the independent variable and the final exam score as the dependent variable. This also matches the logic of regression analysis. Then $TSS = (n-1)s_{DV}^2 = 449\cdot 127.6^2 = 7310510.2$, and $RegSS = [r(DV, IV)]^2\cdot TSS = (0.63)^2\cdot 7310510.2 = 2901541.5$. One can obtain SSE by subtraction or $SSE = \{1 - [r(DV, IV)]^2\}\cdot TSS = [1 - (0.63)^2]\cdot 7310510.2 = 4408968.7$. The degrees of freedom for error is $n-2 = 448$, and $MSE = 4408968.7/448 = 9841.4$. Then
$$F = \frac{MS_{REG}}{MSE} = \frac{2901541.5}{9841.4} = 294.8$$
with (1, 448) degrees of freedom. These values are conventionally displayed in the Analysis of Variance Table below:

Analysis of Variance Table: Problem 1

Source   DF    SS           MS           F
Reg.     1     2901541.5    2901541.5    294.8
Res.     448   4408968.7    9841.4
Total    449   7310510.2

For $\alpha = 0.10$, the critical value of an F distribution with (1, 448) degrees of freedom is 2.717; for $\alpha = 0.05$ the critical value is 3.862; and for $\alpha = 0.01$ the critical value is 6.692. Reject the null hypothesis that the slope of the regression line is zero at the 0.01 level of significance (and also at the 0.05 and 0.10 levels).

For part b, $\hat\beta_1 = \dfrac{s_Y}{s_X}\cdot r(x, y) = \dfrac{127.6}{96.4}\cdot 0.63 = 0.834$. The intercept is $\hat\beta_0 = 524 - 0.834\cdot 397 = 192.9$, so that $\hat{Y}(x) = 192.9 + 0.834x$. The 99% confidence interval for the slope is
$$\hat\beta_1 \pm |t_{\alpha/2,\,n-2}|\sqrt{\frac{MSE}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}} = 0.834 \pm 2.587\sqrt{\frac{9841.4}{(n-1)s_{IV}^2}} = 0.834 \pm 2.587\sqrt{\frac{9841.4}{449\cdot (96.4)^2}}.$$
This is the interval (0.71, 0.96).

For part c, the 99% confidence interval for $E(Y \mid x = 550)$ is centered on $\hat{Y}(550) = \hat\beta_0 + \hat\beta_1\cdot 550 = 192.9 + 0.834\cdot 550 = 651.6$. The 99% confidence interval is
$$\hat{Y}(550) \pm t_{2.576,\,n-2}\sqrt{MSE\Big(\frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\Big)} = 651.6 \pm 2.587\sqrt{9841.4\Big(\frac{1}{450} + \frac{(550 - 397)^2}{449\cdot (96.4)^2}\Big)}.$$
This is
$$651.6 \pm 2.587\sqrt{9841.4\Big(0.002222 + \frac{23409}{4172539.04}\Big)} = 651.6 \pm 2.587\sqrt{9841.4\,(0.002222 + 0.005610)},$$
which reduces to $651.6 \pm 22.7$, which is the interval (628.9, 674.3).

Part d specifies the prediction interval for the final exam score of a student whose first exam score was 550. The center of the prediction interval is still $\hat{y}(550) = 651.6$. The prediction interval is
$$\hat{y}(x) \pm t_{2.576,\,n-2}\sqrt{MSE\Big(1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^n (x_i - \bar{x}_n)^2}\Big)} = 651.6 \pm 2.587\sqrt{9841.4\,(1 + 0.002222 + 0.005610)}.$$
This reduces to
$$651.6 \pm 2.587\cdot 99.20\sqrt{1 + 0.002222 + 0.005610} = 651.6 \pm 2.587\cdot 99.20\sqrt{1.00783} = 651.6 \pm 257.6.$$
The 99% prediction interval is (394.0, 909.2).
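The arithmetic in parts a through d can be reproduced from the summary statistics alone. A sketch (NumPy and SciPy assumed) follows; the inputs are the numbers given in the problem statement, and the printed values match the hand calculations up to rounding.

```python
# Sketch: Problem 1 computations from the published summary statistics.
import numpy as np
from scipy import stats

n, ybar, s_y = 450, 524.0, 127.6     # final exam: mean and SD
xbar, s_x = 397.0, 96.4              # first exam: mean and SD
r = 0.63

TSS = (n - 1) * s_y ** 2             # part a: ANOVA table entries
RegSS = r ** 2 * TSS
SSE = TSS - RegSS
MSE = SSE / (n - 2)
F = RegSS / MSE
print(TSS, RegSS, SSE, MSE, F)       # TSS, RegSS, SSE, MSE, F as in the table

b1 = r * s_y / s_x                   # part b: slope, intercept, 99% CI for slope
b0 = ybar - b1 * xbar
Sxx = (n - 1) * s_x ** 2
t = stats.t.ppf(0.995, df=n - 2)     # about 2.587 for a 99% interval
print(b1, b0, b1 - t * np.sqrt(MSE / Sxx), b1 + t * np.sqrt(MSE / Sxx))

x0 = 550.0                           # parts c and d at x = 550
yhat = b0 + b1 * x0
se_mean = np.sqrt(MSE * (1 / n + (x0 - xbar) ** 2 / Sxx))
se_pred = np.sqrt(MSE * (1 + 1 / n + (x0 - xbar) ** 2 / Sxx))
print(yhat, yhat - t * se_mean, yhat + t * se_mean)   # 99% CI for mean response
print(yhat - t * se_pred, yhat + t * se_pred)         # 99% prediction interval
```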

Problem 2

The correlation matrix of the random variables $Y_1, Y_2, Y_3, Y_4$ is
$$\begin{pmatrix}1 & \rho & \rho & \rho\\ \rho & 1 & \rho & \rho\\ \rho & \rho & 1 & \rho\\ \rho & \rho & \rho & 1\end{pmatrix},$$
with $0 < \rho < 1$, and each random variable has variance $\sigma^2$. Let $W_1 = Y_1 + Y_2 + Y_3$, and let $W_2 = Y_2 + Y_3 + Y_4$. Find the variance-covariance matrix of $(W_1, W_2)$.

Solution: The solution requires the application of the result that $vcv(W) = vcv(MY) = M\times vcv(Y)\times M^T$, where
$$\begin{bmatrix}W_1\\ W_2\end{bmatrix} = \begin{bmatrix}1 & 1 & 1 & 0\\ 0 & 1 & 1 & 1\end{bmatrix}\begin{bmatrix}Y_1\\ Y_2\\ Y_3\\ Y_4\end{bmatrix}.$$
That is, $M_{2\times 4} = \begin{bmatrix}1 & 1 & 1 & 0\\ 0 & 1 & 1 & 1\end{bmatrix}$, with
$$M\times vcv(Y) = \begin{bmatrix}1 & 1 & 1 & 0\\ 0 & 1 & 1 & 1\end{bmatrix}\times\sigma^2\begin{bmatrix}1 & \rho & \rho & \rho\\ \rho & 1 & \rho & \rho\\ \rho & \rho & 1 & \rho\\ \rho & \rho & \rho & 1\end{bmatrix} = \sigma^2\begin{bmatrix}1+2\rho & 1+2\rho & 1+2\rho & 3\rho\\ 3\rho & 1+2\rho & 1+2\rho & 1+2\rho\end{bmatrix}.$$
Then
$$M\times vcv(Y)\times M^T = \sigma^2\begin{bmatrix}1+2\rho & 1+2\rho & 1+2\rho & 3\rho\\ 3\rho & 1+2\rho & 1+2\rho & 1+2\rho\end{bmatrix}\begin{bmatrix}1 & 0\\ 1 & 1\\ 1 & 1\\ 0 & 1\end{bmatrix} = \sigma^2\begin{bmatrix}3+6\rho & 2+7\rho\\ 2+7\rho & 3+6\rho\end{bmatrix}.$$
That is, $var(W_1) = var(W_2) = (3+6\rho)\sigma^2$, and $cov(W_1, W_2) = (2+7\rho)\sigma^2$.
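A quick numerical check of this result (NumPy assumed; the values of rho and sigma^2 are arbitrary choices for the check, not part of the problem):

```python
# Sketch: verify var(W1) = var(W2) = (3 + 6*rho)*sigma^2 and
# cov(W1, W2) = (2 + 7*rho)*sigma^2 for arbitrary rho and sigma^2.
import numpy as np

rho, sigma2 = 0.4, 2.5                               # illustrative values
Sigma = sigma2 * (np.full((4, 4), rho) + (1 - rho) * np.eye(4))
M = np.array([[1, 1, 1, 0],
              [0, 1, 1, 1]], dtype=float)

vcv_W = M @ Sigma @ M.T
print(vcv_W)
print((3 + 6 * rho) * sigma2, (2 + 7 * rho) * sigma2)  # diagonal / off-diagonal
```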

Fisher’s Transformation of the Correlation Coefficient

Your text uses Fisher's transformation of the correlation coefficient to get a confidence interval for a correlation coefficient. It is more useful in calculating Type II error rates and sample size calculations. The transformation is applied to the Pearson product moment correlation coefficient $R_{xy}$ calculated using $n$ observations $(X_i, Y_i)$ from a bivariate normal random variable with population correlation coefficient $\rho = corr(X, Y)$. Fisher's result is that
$$F(R_{xy}) = \frac{1}{2}\ln\Big(\frac{1+R_{xy}}{1-R_{xy}}\Big)$$
is approximately distributed as $N\Big(\frac{1}{2}\ln\big(\frac{1+\rho}{1-\rho}\big),\ \frac{1}{n-3}\Big)$.

Confidence Interval for a Correlation Coefficient

The 99% confidence interval for $F(\rho) = \frac{1}{2}\ln\big(\frac{1+\rho}{1-\rho}\big)$ is $F(R_{xy}) \pm 2.576\sqrt{\frac{1}{n-3}}$. Readers, however, want to know the confidence interval for $\rho = corr(X, Y)$. This requires solving for $R_{xy}$ as a function of $F(R_{xy}) = \frac{1}{2}\ln\big(\frac{1+R_{xy}}{1-R_{xy}}\big)$. The solution requires some algebra: $2F(R_{xy}) = \ln\big(\frac{1+R_{xy}}{1-R_{xy}}\big)$, so that $\exp[2F(R_{xy})] = \exp\big[\ln\big(\frac{1+R_{xy}}{1-R_{xy}}\big)\big] = \frac{1+R_{xy}}{1-R_{xy}}$. Using the first and third parts of the equation, $(1-R_{xy})\exp[2F(R_{xy})] = 1+R_{xy}$. Putting $R_{xy}$ on one side of the equation yields $\exp[2F(R_{xy})] - 1 = (1+\exp[2F(R_{xy})])R_{xy}$. Solving for $R_{xy}$,
$$R_{xy} = \frac{\exp[2F(R_{xy})] - 1}{\exp[2F(R_{xy})] + 1}.$$

Modification of problem 1 above:

A research team collected data on $n = 450$ students in a statistics course. The correlation coefficient between the first examination score and the final examination score was 0.63. Find the 99% confidence interval for the population correlation of the first examination score and the final examination score.

Solution: Fisher's transformation of the observed correlation is
$$F(0.63) = \frac{1}{2}\ln\Big(\frac{1+0.63}{1-0.63}\Big) = \frac{1}{2}\ln\Big(\frac{1.63}{0.37}\Big) = \frac{1}{2}\ln(4.405) = \frac{1}{2}(1.483) = 0.741.$$
Since the sampling margin of error is
$$2.576\sqrt{\frac{1}{n-3}} = 2.576\sqrt{\frac{1}{450-3}} = 0.122,$$
the 99% confidence interval for $F(\rho) = \frac{1}{2}\ln\big(\frac{1+\rho}{1-\rho}\big)$ is $0.741 \pm 0.122$, which is the interval from 0.619 to 0.863. Using the inversion formula
$$R_{xy} = \frac{\exp[2F(R_{xy})] - 1}{\exp[2F(R_{xy})] + 1},$$
the left endpoint of the confidence interval is
$$\frac{\exp[2\times 0.619] - 1}{\exp[2\times 0.619] + 1} = \frac{3.449 - 1}{3.449 + 1} = \frac{2.449}{4.449} = 0.55.$$
The right endpoint is
$$\frac{\exp[2\times 0.863] - 1}{\exp[2\times 0.863] + 1} = \frac{5.618 - 1}{5.618 + 1} = \frac{4.618}{6.618} = 0.70.$$
Even with 450 observations, the 99% confidence interval for $\rho = corr(X, Y)$ is rather wide: from 0.55 to 0.70.

Example Sample Size Calculations

A research team wishes to test the null hypothesis $H_0: \rho = 0$ at $\alpha = 0.005$ against the alternative $H_1: \rho > 0$ using Fisher's transformation of the Pearson product moment correlation coefficient $R_{xy}$ as the test statistic. They have asked their consulting statistician for a sample size $n$ such that $\beta = 0.01$ when $\rho = 0.316$ (that is, $\rho^2 = 0.10$).

Solution: The null distribution of the Fisher transformation of the Pearson product moment correlation $R_{xy}$ is approximately $N\Big(\frac{1}{2}\ln\big(\frac{1+0}{1-0}\big),\ \frac{1}{n-3}\Big)$, which is $N\Big(0,\ \frac{1}{n-3}\Big)$. The null hypothesis $H_0: \rho = 0$ is rejected at $\alpha = 0.005$ against the alternative $H_1: \rho > 0$ when $F(R_{xy}) \ge 0 + 2.576\sqrt{\frac{1}{n-3}}$. For the alternative specified, the test statistic (Fisher's transformation of the Pearson product moment correlation) is approximately $N\Big(\frac{1}{2}\ln\big(\frac{1+0.316}{1-0.316}\big),\ \frac{1}{n-3}\Big)$. Since $\frac{1}{2}\ln\big(\frac{1+0.316}{1-0.316}\big) = \frac{1}{2}\ln(1.924) = 0.327$, the approximate alternative distribution is $N\Big(0.327,\ \frac{1}{n-3}\Big)$. A useful fact for checking your work is that $\frac{1}{2}\ln\big(\frac{1+\rho}{1-\rho}\big) \simeq \rho$ for small values of $\rho$.

The probability of a Type II error is
$$\beta = \Pr_1\{\text{Accept } H_0\} = \Pr_1\Big\{F(R_{xy}) < 0 + 2.576\sqrt{\tfrac{1}{n-3}}\Big\}.$$
Then
$$\beta = \Pr_1\Big\{F(R_{xy}) < 0 + 2.576\sqrt{\tfrac{1}{n-3}}\Big\} = \Pr\Bigg\{\frac{F(R_{xy}) - E_1[F(R_{xy})]}{\sigma_1[F(R_{xy})]} < \frac{0 + 2.576\sqrt{\tfrac{1}{n-3}} - 0.327}{\sqrt{\tfrac{1}{n-3}}}\Bigg\}.$$

As before, to select $n$ so that $\beta = 0.01 = \Pr\{Z \le -2.326\} = \Phi(-2.326)$ requires that
$$\frac{0 + 2.576\sqrt{\tfrac{1}{n-3}} - 0.327}{\sqrt{\tfrac{1}{n-3}}} = -2.326.$$
This is essentially the same setup as for the sample size calculations in the one and two sample problems, so the solution follows the same steps:
$$0 + 2.576\sqrt{\tfrac{1}{n-3}} - 0.327 = -2.326\sqrt{\tfrac{1}{n-3}},$$
$$2.576\sqrt{\tfrac{1}{n-3}} + 2.326\sqrt{\tfrac{1}{n-3}} = 0.327 - 0,$$
and
$$\sqrt{n-3} \ge \frac{2.576 + 2.326}{|0.327 - 0|} = 14.991.$$
That is, $n \ge 228$. Researchers need on the order of 230 observations to detect reliably (that is, two-sided level of significance $\alpha = 0.01$, $\beta = 0.01$) an association that explains 10% of the variation of the dependent variable.

This result is easily generalized. The fundamental design equation then gives us that
$$\sqrt{n-3} \ge \frac{|z_\alpha|\sigma_0 + |z_\beta|\sigma_1}{|E_1 - E_0|},$$
where $|z_\alpha| = 2.576$, $|z_\beta| = 2.326$, $\sigma_0 = \sigma_1 = 1$, $E_1 = F(\rho_1)$, and $E_0 = 0$.
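The design equation is easy to wrap in a small function. This sketch (NumPy and SciPy assumed; fisher_n is a hypothetical helper name) reproduces the worked example.

```python
# Sketch: sample size for detecting correlation rho1 with a one-sided
# Fisher-transform test of H0: rho = 0, via the fundamental design equation.
import numpy as np
from scipy import stats

def fisher_n(rho1, alpha=0.005, beta=0.01):
    """Hypothetical helper: smallest n with power 1 - beta at rho = rho1."""
    z_alpha = stats.norm.ppf(1 - alpha)          # 2.576 for alpha = 0.005
    z_beta = stats.norm.ppf(1 - beta)            # 2.326 for beta = 0.01
    E1 = 0.5 * np.log((1 + rho1) / (1 - rho1))   # F(rho1); E0 = 0
    root = (z_alpha + z_beta) / abs(E1)          # sigma0 = sigma1 = 1
    return int(np.ceil(root ** 2 + 3))

print(fisher_n(0.316))   # about 228, matching the worked example
```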
