Chapter 11 Lecture Notes
The research context is that two variables have been observed for each of $n$ participants. The research team then has a spreadsheet with $n$ pairs of observations $(x_i, y_i)$, $i = 1, \ldots, n$. One of the variables (here $y$) is the outcome variable or dependent variable. This is the variable hypothesized to be affected by the other variable in scientific research. The other variable (here $x$) is the independent variable. It may be hypothesized to predict the outcome variable or to cause a change in the outcome variable. The research task is to document the association between the independent and dependent variables. An example of a research project seeking to document a causal association would be a clinical trial in which $x_i$ was the dosage of a medicine randomly assigned to a participant (say simvastatin) and $y_i$ was the participant's response after a specified period taking the medicine (say cholesterol reduction after 3 months). An example of a study seeking to document the value of a predictive association would be an observational study in which $x_i$ was the score of a statistics student on the first examination in a course and $y_i$ was the student's score on the final examination in the course.
OLS (ordinary least squares) is the most commonly used method to estimate the parameters of the linear model. An arbitrary linear model $b_0 + b_1 x$ is used as a fit for the dependent variable values. The method uses the residuals $y_i - b_0 - b_1 x_i$. The fitting model is judged by how small the set of residuals is. OLS uses each residual and focuses on the magnitude of the residuals by examining the sum of squares function
$$SS(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2.$$
The OLS method is to find the arguments $(\hat{\beta}_0, \hat{\beta}_1)$ that make $SS(b_0, b_1)$ as small as possible. This minimization is a standard calculus problem. Step 1 is to compute the two partial derivatives:
$$\frac{\partial SS(b_0, b_1)}{\partial b_0} = \sum_{i=1}^{n} (-2)(y_i - b_0 - b_1 x_i)$$
and
$$\frac{\partial SS(b_0, b_1)}{\partial b_1} = \sum_{i=1}^{n} (-2 x_i)(y_i - b_0 - b_1 x_i).$$
Step 2 is to find the arguments $(\hat{\beta}_0, \hat{\beta}_1)$ that make the two partial derivatives zero. The resulting equations are called the normal equations:
$$\sum_{i=1}^{n} (-2)(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \quad \text{and} \quad \sum_{i=1}^{n} (-2)(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)\, x_i = 0.$$
These equations have a very important interpretation. Let $r_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$, $i = 1, \ldots, n$. The first normal equation is equivalent to $\sum_{i=1}^{n} r_i = 0$, and the second is equivalent to $\sum_{i=1}^{n} r_i x_i = 0$. That is, there are two constraints on the $n$ residuals: the OLS residuals must sum to zero, and the OLS residuals are orthogonal to the independent variable values. The $n$ residuals then have $n-2$ degrees of freedom.
Step 3 is to solve this system of two linear equations in two unknowns. Start by using the first normal equation to solve for $\hat{\beta}_0$:
$$\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \hat{\beta}_0 - \sum_{i=1}^{n} \hat{\beta}_1 x_i = n\bar{y}_n - n\hat{\beta}_0 - \hat{\beta}_1 (n\bar{x}_n) = 0.$$
Solving for $\hat{\beta}_0$ yields $\hat{\beta}_0 = \bar{y}_n - \hat{\beta}_1 \bar{x}_n$. Next, insert the solution for $\hat{\beta}_0$ into the second normal equation and solve for $\hat{\beta}_1$:
$$0 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)\, x_i = \sum_{i=1}^{n} \{[\,y_i - (\bar{y}_n - \hat{\beta}_1 \bar{x}_n) - \hat{\beta}_1 x_i\,]\, x_i\} = \sum_{i=1}^{n} (y_i - \bar{y}_n)\, x_i - \hat{\beta}_1 \sum_{i=1}^{n} (x_i - \bar{x}_n)\, x_i.$$
The solution is
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (y_i - \bar{y}_n)\, x_i}{\sum_{i=1}^{n} (x_i - \bar{x}_n)\, x_i}.$$
There are a number of modifications of this formula that are helpful. The first results from noting that
$$\sum_{i=1}^{n} (x_i - \bar{x}_n)^2 = \sum_{i=1}^{n} (x_i - \bar{x}_n)\, x_i - \sum_{i=1}^{n} (x_i - \bar{x}_n)\, \bar{x}_n = \sum_{i=1}^{n} (x_i - \bar{x}_n)\, x_i$$
and
$$\sum_{i=1}^{n} (y_i - \bar{y}_n)(x_i - \bar{x}_n) = \sum_{i=1}^{n} (y_i - \bar{y}_n)\, x_i - \sum_{i=1}^{n} (y_i - \bar{y}_n)\, \bar{x}_n = \sum_{i=1}^{n} (y_i - \bar{y}_n)\, x_i,$$
since $\sum_{i=1}^{n} (x_i - \bar{x}_n) = 0$ and $\sum_{i=1}^{n} (y_i - \bar{y}_n) = 0$. The OLS solution is then
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (y_i - \bar{y}_n)(x_i - \bar{x}_n)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}.$$
This is a very commonly quoted formula.
The second modification shows the relation between $\hat{\beta}_1$ and the Pearson product moment correlation. The Pearson product moment correlation is a dimensionless measure of association. The formula is
$$r(x, y) = \frac{\sum_{i=1}^{n} (y_i - \bar{y}_n)(x_i - \bar{x}_n)}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}\, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y}_n)^2}}.$$
The Cauchy-Schwarz inequality shows that $|r(x, y)| \le 1$. A correlation of $+1$ or $-1$ shows a perfect linear association. A correlation of $0$ means no linear association. The numerators of $\hat{\beta}_1$ and $r(x, y)$ are the same. Starting with $\hat{\beta}_1$,
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (y_i - \bar{y}_n)(x_i - \bar{x}_n)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} = \frac{\sum_{i=1}^{n} (y_i - \bar{y}_n)(x_i - \bar{x}_n)}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}\, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y}_n)^2}} \cdot \frac{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y}_n)^2}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}}.$$
That is,
$$\hat{\beta}_1 = r(x, y) \cdot \frac{\sqrt{(n-1)\, s_Y^2}}{\sqrt{(n-1)\, s_X^2}} = r(x, y) \cdot \frac{s_Y}{s_X}.$$
The second formula is then
$$\hat{\beta}_1 = \frac{s_Y}{s_X} \cdot r(x, y).$$
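As a numerical check, here is a minimal Python sketch (assuming NumPy is available; the data vectors are hypothetical illustrations, not course data) that computes $\hat{\beta}_0$ and $\hat{\beta}_1$, verifies the two equivalent slope formulas, and confirms the two normal-equation constraints on the residuals:

```python
import numpy as np

# Hypothetical illustration data; any paired sample works here.
rng = np.random.default_rng(0)
x = rng.normal(500, 100, size=50)
y = 200 + 0.8 * x + rng.normal(0, 100, size=50)

xbar, ybar = x.mean(), y.mean()

# Slope from the commonly quoted deviation formula.
beta1 = np.sum((y - ybar) * (x - xbar)) / np.sum((x - xbar) ** 2)
# Intercept from the first normal equation.
beta0 = ybar - beta1 * xbar

# Equivalent form: beta1 = (s_Y / s_X) * r(x, y).
r = np.corrcoef(x, y)[0, 1]
beta1_alt = (y.std(ddof=1) / x.std(ddof=1)) * r
print(beta0, beta1, beta1_alt)  # beta1 and beta1_alt agree

# The OLS residuals satisfy the two normal-equation constraints.
resid = y - beta0 - beta1 * x
print(np.isclose(resid.sum(), 0), np.isclose((resid * x).sum(), 0))
```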
Fisher’s decomposition is a fundamental tool for the analysis of the linear model. It starts with
$$TSS = \sum_{i=1}^{n} (y_i - \bar{y}_n)^2 = \sum_{i=1}^{n} [\,y_i - \bar{y}_n - \hat{\beta}_1 (x_i - \bar{x}_n) + \hat{\beta}_1 (x_i - \bar{x}_n)\,]^2 = \sum_{i=1}^{n} [\,r_i + \hat{\beta}_1 (x_i - \bar{x}_n)\,]^2,$$
since $r_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i = y_i - \bar{y}_n - \hat{\beta}_1 (x_i - \bar{x}_n)$ after substituting $\hat{\beta}_0 = \bar{y}_n - \hat{\beta}_1 \bar{x}_n$. Next,
$$TSS = \sum_{i=1}^{n} [\,r_i + \hat{\beta}_1 (x_i - \bar{x}_n)\,]^2 = \sum_{i=1}^{n} [\,r_i^2 + \hat{\beta}_1^2 (x_i - \bar{x}_n)^2 + 2 \hat{\beta}_1 r_i (x_i - \bar{x}_n)\,],$$
and
$$TSS = \sum_{i=1}^{n} r_i^2 + \sum_{i=1}^{n} \hat{\beta}_1^2 (x_i - \bar{x}_n)^2 + 2 \hat{\beta}_1 \sum_{i=1}^{n} r_i (x_i - \bar{x}_n).$$
The first sum, $\sum_{i=1}^{n} r_i^2 = SSE$, is the sum of squared errors and has $n-2$ degrees of freedom. The second sum, $\sum_{i=1}^{n} \hat{\beta}_1^2 (x_i - \bar{x}_n)^2$, is called the regression sum of squares and has 1 degree of freedom. It can be simplified using $\hat{\beta}_1 = r(x, y) \sqrt{\sum_{i=1}^{n} (y_i - \bar{y}_n)^2} \big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}$:
$$RegSS = \sum_{i=1}^{n} \hat{\beta}_1^2 (x_i - \bar{x}_n)^2 = \hat{\beta}_1^2 \sum_{i=1}^{n} (x_i - \bar{x}_n)^2 = [r(x, y)]^2\, \frac{\sum_{i=1}^{n} (y_i - \bar{y}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2 = [r(x, y)]^2 \sum_{i=1}^{n} (y_i - \bar{y}_n)^2 = [r(x, y)]^2\, TSS.$$
Finally, the third sum vanishes by the two normal-equation constraints:
$$2 \hat{\beta}_1 \sum_{i=1}^{n} r_i (x_i - \bar{x}_n) = 2 \hat{\beta}_1 \left( \sum_{i=1}^{n} r_i x_i - \sum_{i=1}^{n} r_i \bar{x}_n \right) = 2 \hat{\beta}_1 (0 - 0) = 0.$$
Hence $TSS = SSE + RegSS$.
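The decomposition can also be verified numerically. A minimal self-contained sketch, using the same hypothetical data as the earlier block:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(500, 100, size=50)            # same hypothetical data as above
y = 200 + 0.8 * x + rng.normal(0, 100, size=50)

r = np.corrcoef(x, y)[0, 1]
beta1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
resid = y - (y.mean() - beta1 * x.mean()) - beta1 * x

TSS = np.sum((y - y.mean()) ** 2)
SSE = np.sum(resid ** 2)
RegSS = beta1 ** 2 * np.sum((x - x.mean()) ** 2)

print(np.isclose(TSS, SSE + RegSS))          # Fisher's decomposition
print(np.isclose(RegSS, r ** 2 * TSS))       # RegSS = [r(x, y)]^2 * TSS
print(np.isclose(2 * beta1 * np.sum(resid * (x - x.mean())), 0))  # cross term = 0
```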
11.3 Inferences
There must be a probabilistic model for the data so that researchers can make inferences and find confidence intervals. The model for one-predictor linear regression is $Y_i = \beta_0 + \beta_1 x_i + \sigma_{Y|x} Z_i$. The outcome or dependent (random) variables $Y_i$, $i = 1, \ldots, n$, are each assumed to be the sum of the linear regression expected value $\beta_0 + \beta_1 x_i$ and a random error term $\sigma_{Y|x} Z_i$. The random variables $Z_i$, $i = 1, \ldots, n$, are assumed to have mean 0 and variance 1. There are four assumptions, two of which are most important: the outcome variables $Y_i$, $i = 1, \ldots, n$, are independent, and $E(Y_i | X = x_i) = \beta_0 + \beta_1 x_i$ for $i = 1, \ldots, n$. Homoscedasticity (common variance $\sigma_{Y|x}^2$) is less important. The assumption that $Y_i$, $i = 1, \ldots, n$, are normally distributed random variables is least important.
Variance Calculations
More complex calculations are required for the variance-covariance matrix of the OLS estimates. The easiest way is to use the variance-covariance matrix of a random vector. Let $Y$ be an $n \times 1$ vector of random variables $(Y_1, Y_2, \ldots, Y_n)^T$. That is, each component of the vector is a random variable. Then the expected value of the vector $Y$ is the $n \times 1$ vector whose components are the respective means of the random variables; that is, $E(Y) = (EY_1, EY_2, \ldots, EY_n)^T$. The variance-covariance matrix of the random vector $Y$ is the $n \times n$ matrix whose diagonal entries are the respective variances of the random variables and whose off-diagonal elements are the covariances of the random variables. That is,
$$vcv(Y) = \begin{bmatrix} \mathrm{var}(Y_1) & \mathrm{cov}(Y_1, Y_2) & \cdots & \mathrm{cov}(Y_1, Y_n) \\ \mathrm{cov}(Y_2, Y_1) & \mathrm{var}(Y_2) & \cdots & \mathrm{cov}(Y_2, Y_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(Y_n, Y_1) & \mathrm{cov}(Y_n, Y_2) & \cdots & \mathrm{var}(Y_n) \end{bmatrix}.$$
In terms of expectation operator calculations, $vcv(Y) = E[(Y - EY)(Y - EY)^T] = \Sigma$.
From matrix algebra, when $A$ is an $n \times m$ matrix and $B$ is an $m \times p$ matrix, then $(AB)^T = B^T A^T$. Let $W = MY$ for a constant matrix $M$. Then $[M(Y - EY)]^T = (Y - EY)^T M^T$, and
$$vcv(W) = vcv(MY) = E\{M(Y - EY)(Y - EY)^T M^T\} = M\, \{E[(Y - EY)(Y - EY)^T]\}\, M^T$$
from the linear operator property of $E$. Since $vcv(Y) = E[(Y - EY)(Y - EY)^T] = \Sigma$,
$$vcv(W) = vcv(MY) = M \times vcv(Y) \times M^T = M \Sigma M^T.$$
Examples
The first use of this result is to find the variance of a linear combination of the values in $Y$, an $n \times 1$ vector of random variables. Let $a$ be an $n \times 1$ vector of constants, and let $W = a^T Y$. Then $\mathrm{var}(a^T Y) = a^T \times vcv(Y) \times (a^T)^T = a^T \times vcv(Y) \times a$. This is the completely general form of $\mathrm{var}(aX + bY) = a^2\, \mathrm{var}(X) + b^2\, \mathrm{var}(Y) + 2ab\, \mathrm{cov}(X, Y)$.
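A small NumPy sketch (with a hypothetical $\Sigma$) illustrates both $\mathrm{var}(a^T Y) = a^T\, vcv(Y)\, a$ and the general rule $vcv(MY) = M \Sigma M^T$:

```python
import numpy as np

# Hypothetical 3x3 variance-covariance matrix (symmetric, positive definite).
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 9.0, 2.0],
                  [0.5, 2.0, 16.0]])

# var(a' Y): the general form of var(aX + bY) for any number of components.
a = np.array([2.0, -1.0, 3.0])
print(a @ Sigma @ a)

# vcv of two linear combinations at once: rows of M are the coefficient vectors.
M = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
print(M @ Sigma @ M.T)  # 2x2 variance-covariance matrix of MY
```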
The second use of this result is to find the variance-covariance matrix of $(\bar{Y}_n, \hat{\beta}_1)$. Since $\sum_{i=1}^{n} (x_i - \bar{x}_n)(Y_i - \bar{Y}_n) = \sum_{i=1}^{n} (x_i - \bar{x}_n)\, Y_i$ and $\sum_{i=1}^{n} (x_i - \bar{x}_n)\, x_i = \sum_{i=1}^{n} (x_i - \bar{x}_n)^2$,
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)\, Y_i}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} = \frac{(x_1 - \bar{x}_n)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}\, Y_1 + \frac{(x_2 - \bar{x}_n)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}\, Y_2 + \cdots + \frac{(x_n - \bar{x}_n)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}\, Y_n.$$
Let
$$w_i = \frac{x_i - \bar{x}_n}{\sum_{j=1}^{n} (x_j - \bar{x}_n)^2}, \quad i = 1, \ldots, n.$$
Then
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)\, Y_i}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} = w_1 Y_1 + w_2 Y_2 + \cdots + w_n Y_n.$$
Let
$$\begin{pmatrix} \bar{Y}_n \\ \hat{\beta}_1 \end{pmatrix} = \begin{bmatrix} 1/n & 1/n & \cdots & 1/n \\ w_1 & w_2 & \cdots & w_n \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix},$$
which has the form $MY$, where
$$M = \begin{bmatrix} 1/n & 1/n & \cdots & 1/n \\ w_1 & w_2 & \cdots & w_n \end{bmatrix}.$$
In the model $Y_i = \beta_0 + \beta_1 x_i + \sigma_{Y|x} Z_i$, $vcv(Y) = \sigma_{Y|x}^2 I_{n \times n}$. Then
$$vcv \begin{pmatrix} \bar{Y}_n \\ \hat{\beta}_1 \end{pmatrix} = M \times \sigma_{Y|x}^2 I_{n \times n} \times M^T = \sigma_{Y|x}^2 \begin{bmatrix} 1/n & 1/n & \cdots & 1/n \\ w_1 & w_2 & \cdots & w_n \end{bmatrix} \begin{bmatrix} 1/n & w_1 \\ 1/n & w_2 \\ \vdots & \vdots \\ 1/n & w_n \end{bmatrix}.$$
Then,
$$vcv \begin{pmatrix} \bar{Y}_n \\ \hat{\beta}_1 \end{pmatrix} = \sigma_{Y|x}^2 \begin{bmatrix} \sum_{i=1}^{n} \frac{1}{n^2} & \sum_{i=1}^{n} \frac{w_i}{n} \\ \sum_{i=1}^{n} \frac{w_i}{n} & \sum_{i=1}^{n} w_i^2 \end{bmatrix}.$$
The off-diagonal entries vanish:
$$\sum_{i=1}^{n} \frac{w_i}{n} = \sum_{i=1}^{n} \frac{1}{n} \cdot \frac{x_i - \bar{x}_n}{\sum_{j=1}^{n} (x_j - \bar{x}_n)^2} = \frac{1}{n \sum_{j=1}^{n} (x_j - \bar{x}_n)^2} \sum_{i=1}^{n} (x_i - \bar{x}_n) = 0.$$
Also, $\sum_{i=1}^{n} \frac{1}{n^2} = \frac{1}{n}$ and
$$\sum_{i=1}^{n} w_i^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}{\left[ \sum_{j=1}^{n} (x_j - \bar{x}_n)^2 \right]^2} = \frac{1}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}.$$
The final result is
$$vcv \begin{pmatrix} \bar{Y}_n \\ \hat{\beta}_1 \end{pmatrix} = \sigma_{Y|x}^2 \begin{bmatrix} \dfrac{1}{n} & 0 \\ 0 & \dfrac{1}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} \end{bmatrix}.$$
To summarize this result, $\mathrm{var}(\bar{Y}_n) = \dfrac{\sigma_{Y|x}^2}{n}$, $\mathrm{var}(\hat{\beta}_1) = \dfrac{\sigma_{Y|x}^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}$, and $\mathrm{cov}(\bar{Y}_n, \hat{\beta}_1) = 0$.
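These three facts can be checked by simulation. The sketch below uses hypothetical parameter values and a fixed design, draws repeated samples from the model, and compares the empirical moments of $(\bar{Y}_n, \hat{\beta}_1)$ with the formulas:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 10.0, 2.0, 3.0       # hypothetical model parameters
x = np.linspace(0, 9, 10)                  # fixed design points
Sxx = np.sum((x - x.mean()) ** 2)
w = (x - x.mean()) / Sxx                   # weights: beta1_hat = sum(w_i * Y_i)

reps = 200_000
Y = beta0 + beta1 * x + sigma * rng.standard_normal((reps, len(x)))
ybar_hat = Y.mean(axis=1)
beta1_hat = Y @ w

print(beta1_hat.var(), sigma ** 2 / Sxx)    # var(beta1_hat) ~ sigma^2 / Sxx
print(ybar_hat.var(), sigma ** 2 / len(x))  # var(Ybar) ~ sigma^2 / n
print(np.cov(ybar_hat, beta1_hat)[0, 1])    # covariance ~ 0
```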
The last detail before deriving tests and confidence intervals for the slope is $E(\hat{\beta}_1)$. Since $\hat{\beta}_1 = \sum_{i=1}^{n} w_i Y_i$,
$$E(\hat{\beta}_1) = \sum_{i=1}^{n} w_i\, E(Y_i) = \sum_{i=1}^{n} w_i (\beta_0 + \beta_1 x_i) = \beta_0 \sum_{i=1}^{n} w_i + \beta_1 \sum_{i=1}^{n} w_i x_i.$$
As shown above, $\sum_{i=1}^{n} w_i = 0$, and
$$\sum_{i=1}^{n} w_i x_i = \sum_{i=1}^{n} \frac{(x_i - \bar{x}_n)\, x_i}{\sum_{j=1}^{n} (x_j - \bar{x}_n)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)\, x_i}{\sum_{j=1}^{n} (x_j - \bar{x}_n)^2} = 1,$$
since $\sum_{i=1}^{n} (x_i - \bar{x}_n)\, x_i = \sum_{i=1}^{n} (x_i - \bar{x}_n)^2$. Then $E(\hat{\beta}_1) = \beta_1$. Under the data model, the distribution of $\hat{\beta}_1$ is
$$N\!\left( \beta_1,\ \frac{\sigma_{Y|x}^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} \right).$$
The key null hypothesis is $H_0: \beta_1 = 0$, and the alternative hypothesis is $H_1: \beta_1 \ne 0$. The test statistic is $\hat{\beta}_1$, and the null distribution is
$$N\!\left( 0,\ \frac{\sigma_{Y|x}^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} \right).$$
The standard score form of the statistic is
$$Z = \frac{\hat{\beta}_1 - 0}{\sqrt{\dfrac{\sigma_{Y|x}^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}}}.$$
When the level of significance is $\alpha$ and $\sigma_{Y|x}^2$ is known, $H_0: \beta_1 = 0$ is rejected when $|Z| \ge |z_{\alpha/2}|$. When $\sigma_{Y|x}^2$ is not known, it is estimated by $\hat{\sigma}_{Y|x}^2 = MSE$. This requires the use of the Student's t distribution. The studentized form of the statistic is
$$T_{n-2} = \frac{\hat{\beta}_1 - 0}{\sqrt{\dfrac{MSE}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}}}.$$
Then, $H_0: \beta_1 = 0$ is rejected when $|T_{n-2}| \ge |t_{\alpha/2,\, n-2}|$. An equivalent approach is to use
$$TS = \frac{MS_{REG}}{MSE} = F.$$
Under $H_0: \beta_1 = 0$, the null distribution of $F$ is a central $F$ with 1 numerator and $n-2$ denominator degrees of freedom.
When $\sigma_{Y|x}^2$ is known, the $100(1-\alpha)\%$ confidence interval for $\beta_1$ is
$$\hat{\beta}_1 \pm |z_{\alpha/2}| \sqrt{\frac{\sigma_{Y|x}^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}}.$$
When $\sigma_{Y|x}^2$ is not known, the $100(1-\alpha)\%$ confidence interval for $\beta_1$ is
$$\hat{\beta}_1 \pm |t_{\alpha/2,\, n-2}| \sqrt{\frac{MSE}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}}.$$
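Here is a sketch of the studentized test and confidence interval for the slope, assuming SciPy is available for the t percentile (the function name `slope_inference` is an illustrative choice, not a library routine):

```python
import numpy as np
from scipy import stats

def slope_inference(x, y, alpha=0.01):
    """t test of H0: beta1 = 0 and a 100(1 - alpha)% CI for beta1 (sketch)."""
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)
    beta1 = np.sum((y - ybar) * (x - xbar)) / Sxx
    beta0 = ybar - beta1 * xbar
    MSE = np.sum((y - beta0 - beta1 * x) ** 2) / (n - 2)  # estimates sigma^2
    se = np.sqrt(MSE / Sxx)
    T = (beta1 - 0) / se                         # studentized statistic, n-2 df
    tcrit = stats.t.ppf(1 - alpha / 2, n - 2)    # |t_{alpha/2, n-2}|
    reject = abs(T) >= tcrit
    return beta1, T, reject, (beta1 - tcrit * se, beta1 + tcrit * se)
```

The equivalent F statistic of the $MS_{REG}/MSE$ approach is the square of the returned $T$, compared with the central $F(1, n-2)$ critical value.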
Since $\hat{\beta}_0 = \bar{Y}_n - \hat{\beta}_1 \bar{x}_n$, $\hat{\beta}_0$ is normally distributed with
$$E(\hat{\beta}_0) = E(\bar{Y}_n - \hat{\beta}_1 \bar{x}_n) = E(\bar{Y}_n) - \bar{x}_n E(\hat{\beta}_1) = E(\bar{Y}_n) - \beta_1 \bar{x}_n,$$
because $E(\hat{\beta}_1) = \beta_1$. Now
$$E(\bar{Y}_n) = E\!\left( \frac{Y_1 + Y_2 + \cdots + Y_n}{n} \right) = \frac{E(Y_1) + E(Y_2) + \cdots + E(Y_n)}{n} = \frac{(\beta_0 + \beta_1 x_1) + \cdots + (\beta_0 + \beta_1 x_n)}{n},$$
with
$$E(\bar{Y}_n) = \frac{(\beta_0 + \beta_1 x_1) + \cdots + (\beta_0 + \beta_1 x_n)}{n} = \frac{n \beta_0 + \beta_1 \sum_{i=1}^{n} x_i}{n} = \beta_0 + \beta_1 \bar{x}_n.$$
Then
$$E(\hat{\beta}_0) = E(\bar{Y}_n) - \beta_1 \bar{x}_n = \beta_0 + \beta_1 \bar{x}_n - \beta_1 \bar{x}_n = \beta_0.$$
Finally,
$$\mathrm{var}(\hat{\beta}_0) = \mathrm{var}(\bar{Y}_n - \hat{\beta}_1 \bar{x}_n) = \mathrm{var}(\bar{Y}_n) + (\bar{x}_n)^2\, \mathrm{var}(\hat{\beta}_1) - 2 \bar{x}_n\, \mathrm{cov}(\bar{Y}_n, \hat{\beta}_1) = \frac{\sigma_{Y|x}^2}{n} + \frac{(\bar{x}_n)^2\, \sigma_{Y|x}^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} - 2 \bar{x}_n \cdot 0.$$
In summary, $\hat{\beta}_0$ is
$$N\!\left( \beta_0,\ \frac{\sigma_{Y|x}^2}{n} + \frac{(\bar{x}_n)^2\, \sigma_{Y|x}^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} \right).$$
Confidence Interval for $\hat{Y}(x)$
The fitted value at $x$ is $\hat{Y}(x) = \hat{\beta}_0 + \hat{\beta}_1 x = \bar{Y}_n + \hat{\beta}_1 (x - \bar{x}_n)$, with $E[\hat{Y}(x)] = \beta_0 + \beta_1 x$. Since $\mathrm{cov}(\bar{Y}_n, \hat{\beta}_1) = 0$, $\mathrm{var}[\hat{Y}(x)] = \sigma_{Y|x}^2 \left( \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} \right)$. When $\sigma_{Y|x}^2$ is known, the 95% confidence interval for $E[\hat{Y}(x)] = \beta_0 + \beta_1 x$ is
$$\hat{Y}(x) \pm 1.960\, \sigma_{Y|x} \sqrt{\frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}}.$$
When $\sigma_{Y|x}^2$ is not known, an estimate of $\sigma_{Y|x}^2$ is used, and the t-percentile is used rather than the z-percentile, here 1.960. If the four assumptions are met, then $E(MSE) = \sigma_{Y|x}^2$. The 95% confidence interval for $E[\hat{Y}(x)] = \beta_0 + \beta_1 x$ is then
$$\hat{Y}(x) \pm t_{1.960,\, n-2} \sqrt{MSE \left( \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} \right)}.$$
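A matching sketch for the mean-response interval, under the same assumptions (the helper name `mean_response_ci` is hypothetical):

```python
import numpy as np
from scipy import stats

def mean_response_ci(x0, x, beta0, beta1, MSE, alpha=0.05):
    """Confidence interval for E[Y | x = x0] using the MSE estimate (sketch)."""
    n = len(x)
    Sxx = np.sum((x - x.mean()) ** 2)
    yhat = beta0 + beta1 * x0                    # fitted value at x0
    half = stats.t.ppf(1 - alpha / 2, n - 2) * np.sqrt(
        MSE * (1.0 / n + (x0 - x.mean()) ** 2 / Sxx))
    return yhat - half, yhat + half
```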
Prediction Interval for a Future Observation $Y_F(x)$
Let $Y_F(x)$ be the future value observed with the independent variable value set to $x$. That is, $Y_F(x)$ is $N(\beta_0 + \beta_1 x, \sigma_{Y|x}^2)$. Its distribution is independent of $Y_i$, $i = 1, \ldots, n$. By that independence, $\mathrm{var}[Y_F(x) - \hat{Y}(x)] = \sigma_{Y|x}^2 + \mathrm{var}[\hat{Y}(x)]$, and $E[Y_F(x) - \hat{Y}(x)] = 0$. In summary, $Y_F(x) - \hat{Y}(x)$ is
$$N\!\left( 0,\ \sigma_{Y|x}^2 \left( 1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} \right) \right).$$
To set up a 99% prediction interval, one starts with
$$\Pr\left\{ 0 - 2.576\, \sigma_{Y|x} \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}} \le Y_F(x) - \hat{Y}(x) \le 0 + 2.576\, \sigma_{Y|x} \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}} \right\} = 0.99.$$
Then,
$$\Pr\left\{ \hat{Y}(x) - 2.576\, \sigma_{Y|x} \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}} \le Y_F(x) \le \hat{Y}(x) + 2.576\, \sigma_{Y|x} \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}} \right\} = 0.99.$$
Assuming $\sigma_{Y|x}^2$ known, a 99% prediction interval for $Y_F(x)$ is the interval between
$$\hat{Y}(x) - 2.576\, \sigma_{Y|x} \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}} \quad \text{and} \quad \hat{Y}(x) + 2.576\, \sigma_{Y|x} \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}}.$$
There are two problems with this interval. The first is that $\hat{Y}(x)$ is a random variable; in other words, it will be observed in the future. The resolution to this is to set the time of the prediction to be after the collection of the regression data but before the future observation is made. Then the 99% prediction interval is the interval between
$$\hat{y}(x) - 2.576\, \sigma_{Y|x} \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}} \quad \text{and} \quad \hat{y}(x) + 2.576\, \sigma_{Y|x} \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}},$$
where $\hat{y}(x)$ is the fitted value based on the regression data using the independent variable setting $x$. The second problem is that $\sigma_{Y|x}^2$ is not known. As usual, we estimate $\sigma_{Y|x}^2$ using $MSE$ and stretch 2.576 by the t-distribution with $n-2$ degrees of freedom. The 99% prediction interval is the interval between
$$\hat{y}(x) - t_{2.576,\, n-2} \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} \right)} \quad \text{and} \quad \hat{y}(x) + t_{2.576,\, n-2} \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} \right)}.$$
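The prediction interval differs from the mean-response interval only by the extra 1 under the square root; a sketch under the same assumptions (hypothetical helper name):

```python
import numpy as np
from scipy import stats

def prediction_interval(x0, x, beta0, beta1, MSE, alpha=0.01):
    """Prediction interval for a future observation Y_F(x0) (sketch)."""
    n = len(x)
    Sxx = np.sum((x - x.mean()) ** 2)
    yhat = beta0 + beta1 * x0
    half = stats.t.ppf(1 - alpha / 2, n - 2) * np.sqrt(
        MSE * (1.0 + 1.0 / n + (x0 - x.mean()) ** 2 / Sxx))  # extra 1 for Y_F
    return yhat - half, yhat + half
```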
Problem 1
Solution:
For part a, the first task is to identify which variable is dependent and which independent. The question asks for the "regression line of final exam score on first exam score." This phrasing identifies the first exam score as the independent variable and the final exam score as the dependent variable. This also matches the logic of regression analysis. Then $TSS = (n-1)\, s_{DV}^2 = 449 \cdot 127.6^2 = 7310510.2$, and $RegSS = [r(DV, IV)]^2 \cdot TSS = (0.63)^2 \cdot 7310510.2 = 2901541.5$. One can obtain $SSE$ by subtraction or $SSE = \{1 - [r(DV, IV)]^2\} \cdot TSS = [1 - (0.63)^2] \cdot 7310510.2 = 4408968.7$. The degrees of freedom for error is $n - 2 = 448$, and $MSE = 4408968.7 / 448 = 9841.4$. Then
$$F = \frac{MS_{REG}}{MSE} = \frac{2901541.5}{9841.4} = 294.8$$
with (1, 448) degrees of freedom. These values are conventionally displayed in the Analysis of Variance table below:

Source        df    Sum of Squares    Mean Square    F
Regression     1    2901541.5         2901541.5      294.8
Error        448    4408968.7         9841.4
Total        449    7310510.2

For $\alpha = 0.10$, the critical value of an F distribution with (1, 448) degrees of freedom is 2.717; for $\alpha = 0.05$ the critical value is 3.862; and for $\alpha = 0.01$ the critical value is 6.692. Reject the null hypothesis that the slope of the regression line is zero at the 0.01 level of significance (and also at the 0.05 and 0.10 levels).
For part b,
$$\hat{\beta}_1 = \frac{s_Y}{s_X} \cdot r(x, y) = \frac{127.6}{96.4} \cdot 0.63 = 0.834.$$
The intercept is $\hat{\beta}_0 = 524 - 0.834 \cdot 397 = 192.9$, so that $\hat{Y}(x) = 192.9 + 0.834\, x$. The 99% confidence interval for the slope is
$$\hat{\beta}_1 \pm |t_{\alpha/2,\, n-2}| \sqrt{\frac{MSE}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}} = 0.834 \pm 2.587 \sqrt{\frac{9841.4}{(n-1)\, s_{IV}^2}} = 0.834 \pm 2.587 \sqrt{\frac{9841.4}{449 \cdot (96.4)^2}}.$$
This is the interval (0.71, 0.96).
For part c, the 99% confidence interval for $E(Y | x = 550)$ is centered on $\hat{Y}(550) = \hat{\beta}_0 + \hat{\beta}_1 \cdot 550 = 192.9 + 0.834 \cdot 550 = 651.6$. The 99% confidence interval is
$$\hat{Y}(550) \pm t_{2.576,\, n-2} \sqrt{MSE \left( \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} \right)} = 651.6 \pm 2.587 \sqrt{9841.4 \left( \frac{1}{450} + \frac{(550 - 397)^2}{449 \cdot (96.4)^2} \right)}.$$
This is
$$651.6 \pm 2.587 \sqrt{9841.4 \left( 0.002222 + \frac{23409}{4172539.04} \right)} = 651.6 \pm 2.587 \sqrt{9841.4\, (0.002222 + 0.005610)} = 651.6 \pm 22.7,$$
the interval (628.9, 674.3).
Part d specifies the prediction interval for the final exam score of a student whose first exam score was 550. The center of the prediction interval is still $\hat{y}(550) = 651.6$. The interval is
$$\hat{y}(x) \pm t_{2.576,\, n-2} \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} \right)} = 651.6 \pm 2.587 \sqrt{9841.4\, (1 + 0.002222 + 0.005610)}.$$
This reduces to
$$651.6 \pm 2.587 \cdot 99.20 \sqrt{1 + 0.002222 + 0.005610} = 651.6 \pm 2.587 \cdot 99.20 \sqrt{1.00783} = 651.6 \pm 257.6.$$
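All of the arithmetic in parts a through d follows from the summary statistics alone ($n = 450$, means 397 and 524, standard deviations 96.4 and 127.6, $r = 0.63$). A sketch that reproduces the numbers above, assuming SciPy for the t percentile:

```python
import numpy as np
from scipy import stats

n, xbar, ybar, sx, sy, r = 450, 397.0, 524.0, 96.4, 127.6, 0.63

TSS = (n - 1) * sy ** 2              # 7310510.2
RegSS = r ** 2 * TSS                 # 2901541.5
SSE = TSS - RegSS                    # 4408968.7
MSE = SSE / (n - 2)                  # 9841.4
F = RegSS / MSE                      # 294.8

beta1 = (sy / sx) * r                # 0.834
beta0 = ybar - beta1 * xbar          # 192.9
Sxx = (n - 1) * sx ** 2

t99 = stats.t.ppf(0.995, n - 2)      # 2.587, the t analog of z = 2.576
print(beta1 - t99 * np.sqrt(MSE / Sxx), beta1 + t99 * np.sqrt(MSE / Sxx))

x0 = 550.0
yhat = beta0 + beta1 * x0                                            # 651.6
ci_half = t99 * np.sqrt(MSE * (1 / n + (x0 - xbar) ** 2 / Sxx))      # ~22.7
pi_half = t99 * np.sqrt(MSE * (1 + 1 / n + (x0 - xbar) ** 2 / Sxx))  # ~257.6
print(yhat, ci_half, pi_half)
```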
Problem 2
The correlation matrix of the random variables $Y_1, Y_2, Y_3, Y_4$ is
$$\begin{pmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{pmatrix},$$
$0 < \rho < 1$, and each random variable has variance $\sigma^2$. Let $W_1 = Y_1 + Y_2 + Y_3$, and let $W_2 = Y_2 + Y_3 + Y_4$. Find the variance-covariance matrix of $(W_1, W_2)$.
Solution: Since each $Y_i$ has variance $\sigma^2$, $vcv(Y)$ is $\sigma^2$ times the correlation matrix. Write
$$\begin{bmatrix} W_1 \\ W_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix},$$
and use $vcv(W) = vcv(MY) = M \times vcv(Y) \times M^T$, where
$$M_{2 \times 4} = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix},$$
with
$$M \times vcv(Y) = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix} \times \sigma^2 \begin{pmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{pmatrix} = \sigma^2 \begin{bmatrix} 1+2\rho & 1+2\rho & 1+2\rho & 3\rho \\ 3\rho & 1+2\rho & 1+2\rho & 1+2\rho \end{bmatrix}.$$
Then
$$M \times vcv(Y) \times M^T = \sigma^2 \begin{bmatrix} 1+2\rho & 1+2\rho & 1+2\rho & 3\rho \\ 3\rho & 1+2\rho & 1+2\rho & 1+2\rho \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 1 \\ 0 & 1 \end{bmatrix} = \sigma^2 \begin{bmatrix} 3+6\rho & 2+7\rho \\ 2+7\rho & 3+6\rho \end{bmatrix}.$$
That is, $\mathrm{var}(W_1) = \mathrm{var}(W_2) = (3 + 6\rho)\, \sigma^2$, and $\mathrm{cov}(W_1, W_2) = (2 + 7\rho)\, \sigma^2$.
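A quick numeric check of this algebra (the values of $\rho$ and $\sigma^2$ below are hypothetical):

```python
import numpy as np

rho, sigma2 = 0.3, 2.0                      # hypothetical rho and sigma^2
R = np.full((4, 4), rho) + (1 - rho) * np.eye(4)   # the correlation matrix
Sigma = sigma2 * R                          # vcv(Y): common variance sigma^2
M = np.array([[1, 1, 1, 0],
              [0, 1, 1, 1]], dtype=float)

print(M @ Sigma @ M.T)                      # computed vcv of (W1, W2)
print(sigma2 * np.array([[3 + 6 * rho, 2 + 7 * rho],
                         [2 + 7 * rho, 3 + 6 * rho]]))  # matches the formula
```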
Fisher's transformation of the Pearson product moment correlation, $F(R_{xy}) = \frac{1}{2} \ln\!\left( \frac{1 + R_{xy}}{1 - R_{xy}} \right)$, is approximately distributed as
$$N\!\left( \frac{1}{2} \ln\!\left( \frac{1+\rho}{1-\rho} \right),\ \frac{1}{n-3} \right).$$
A 99% confidence interval for $F(\rho)$ is
$$F(R_{xy}) \pm 2.576 \sqrt{\frac{1}{n-3}}.$$
Readers, however, want to know the confidence interval for $\rho = \mathrm{corr}(X, Y)$. This requires solving for $R_{xy}$ as a function of $F(R_{xy}) = \frac{1}{2} \ln\!\left( \frac{1 + R_{xy}}{1 - R_{xy}} \right)$. The solution requires some algebra: $2 F(R_{xy}) = \ln\!\left( \frac{1 + R_{xy}}{1 - R_{xy}} \right)$, so that $\exp[2 F(R_{xy})] = \exp\!\left[ \ln\!\left( \frac{1 + R_{xy}}{1 - R_{xy}} \right) \right] = \frac{1 + R_{xy}}{1 - R_{xy}}$. Using the first and third parts of the equation, $(1 - R_{xy}) \exp[2 F(R_{xy})] = 1 + R_{xy}$. Putting $R_{xy}$ on one side of the equation yields $\exp[2 F(R_{xy})] - 1 = (1 + \exp[2 F(R_{xy})])\, R_{xy}$. Solving for $R_{xy}$,
$$R_{xy} = \frac{\exp[2 F(R_{xy})] - 1}{\exp[2 F(R_{xy})] + 1}.$$
For a sample with $R_{xy} = 0.63$ and $n = 450$, the 99% confidence interval for $F(\rho) = \frac{1}{2} \ln\!\left( \frac{1+\rho}{1-\rho} \right)$ is $0.741 \pm 0.122$, which is the interval from 0.619 to 0.863. Using the inversion formula $R_{xy} = \frac{\exp[2 F(R_{xy})] - 1}{\exp[2 F(R_{xy})] + 1}$, the left endpoint of the confidence interval for $\rho$ is
$$\frac{\exp[2 \times 0.619] - 1}{\exp[2 \times 0.619] + 1} = \frac{3.449 - 1}{3.449 + 1} = \frac{2.449}{4.449} = 0.55,$$
and the right endpoint is
$$\frac{\exp[2 \times 0.863] - 1}{\exp[2 \times 0.863] + 1} = \frac{5.617 - 1}{5.617 + 1} = 0.70.$$
The 99% confidence interval for $\rho$ is then 0.55 to 0.70.
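Fisher's transformation is the inverse hyperbolic tangent, so the interval can be computed directly with `arctanh` and inverted with `tanh`; a sketch reproducing the interval from 0.55 to 0.70:

```python
import numpy as np

r, n = 0.63, 450
F = np.arctanh(r)                    # 0.5 * ln((1 + r)/(1 - r)) = 0.741
half = 2.576 * np.sqrt(1 / (n - 3))  # 0.122
lo, hi = np.tanh(F - half), np.tanh(F + half)
print(lo, hi)                        # about 0.55 and 0.70
```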
A research team wishes to test the null hypothesis $H_0: \rho = 0$ at $\alpha = 0.005$ against the alternative $H_1: \rho > 0$ using Fisher's transformation of the Pearson product moment correlation coefficient $R_{xy}$ as the test statistic. They have asked their consulting statistician for a sample size $n$ such that $\beta = 0.01$ when $\rho = 0.316$ (that is, $\rho^2 = 0.10$).
The test rejects $H_0$ in favor of $H_1: \rho > 0$ when $F(R_{xy}) \ge 0 + 2.576 \sqrt{\frac{1}{n-3}}$. For the alternative specified, the test statistic (Fisher's transformation of the Pearson product moment correlation) is approximately
$$N\!\left( \frac{1}{2} \ln\!\left( \frac{1 + 0.316}{1 - 0.316} \right),\ \frac{1}{n-3} \right).$$
Since $\frac{1}{2} \ln\!\left( \frac{1 + 0.316}{1 - 0.316} \right) = \frac{1}{2} \ln(1.924) = 0.327$, the approximate alternative distribution is $N\!\left( 0.327, \frac{1}{n-3} \right)$. A useful fact for checking your work is that $\frac{1}{2} \ln\!\left( \frac{1+\rho}{1-\rho} \right) \simeq \rho$ for small values of $\rho$. The probability of a Type II error is
$$\beta = \Pr_1\{\text{Accept } H_0\} = \Pr_1\left\{ F(R_{xy}) < 0 + 2.576 \sqrt{\frac{1}{n-3}} \right\}.$$
Standardizing under the alternative, where $\sigma_1(F(R_{xy})) = \sqrt{\frac{1}{n-3}}$, the requirement $\beta = 0.01$ becomes
$$\frac{2.576 \sqrt{\frac{1}{n-3}} - 0.327}{\sqrt{\frac{1}{n-3}}} = -2.326.$$
This is essentially the same setup as the sample size calculations in the one- and two-sample problems, so the solution follows the same steps:
$$0 + 2.576 \sqrt{\frac{1}{n-3}} - 0.327 = -2.326 \sqrt{\frac{1}{n-3}},$$
$$2.576 \sqrt{\frac{1}{n-3}} + 2.326 \sqrt{\frac{1}{n-3}} = 0.327 - 0,$$
and
$$\sqrt{n-3} = \frac{2.576 + 2.326}{0.327} = 14.99,$$
so that $n - 3 = 224.7$ and the required sample size is $n = 228$.
This result is easily generalized. The fundamental design equation then gives us that
$$\sqrt{n-3} \ge \frac{|z_\alpha| \sigma_0 + |z_\beta| \sigma_1}{|E_1 - E_0|},$$
where $|z_\alpha| = 2.576$, $|z_\beta| = 2.326$, $\sigma_0 = \sigma_1 = 1$, $E_1 = F(\rho_1)$, and $E_0 = 0$.
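A sketch of the design equation solved for $n$ with this problem's values, assuming SciPy for the normal percentiles:

```python
import numpy as np
from scipy import stats

alpha, beta, rho1 = 0.005, 0.01, 0.316
z_alpha = stats.norm.ppf(1 - alpha)   # 2.576
z_beta = stats.norm.ppf(1 - beta)     # 2.326
E1 = np.arctanh(rho1)                 # F(rho1) = 0.327; E0 = 0

# Fundamental design equation: sqrt(n - 3) >= (z_alpha + z_beta) / |E1 - E0|.
n = np.ceil(((z_alpha + z_beta) / E1) ** 2 + 3)
print(n)                              # 228
```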