
Chapter 9: Multiple Regression: Random x’s

In the random-x case, the k + 1 variables y, x_1, x_2, ..., x_k are measured on each of the n subjects, and we have
$$\mathrm{Cov}\big((y, x_1, \ldots, x_k)^T\big) = \Sigma,$$
where $\Sigma$ is not a diagonal matrix.

1 Multivariate Normal Regression Model


Assume that $(y, \mathbf{x}^T)^T$ is distributed as $N_{k+1}(\boldsymbol{\mu}, \Sigma)$ with
$$\boldsymbol{\mu} = \begin{pmatrix} \mu_y \\ \boldsymbol{\mu}_x \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} \sigma_{yy} & \boldsymbol{\sigma}_{yx}^T \\ \boldsymbol{\sigma}_{yx} & \Sigma_{xx} \end{pmatrix}. \tag{1}$$
By the properties of the multivariate Gaussian distribution, we have
$$E(y \mid \mathbf{x}) = \mu_y + \boldsymbol{\sigma}_{yx}^T \Sigma_{xx}^{-1}(\mathbf{x} - \boldsymbol{\mu}_x) = \beta_0 + \boldsymbol{\beta}_1^T \mathbf{x},$$
where $\beta_0 = \mu_y - \boldsymbol{\sigma}_{yx}^T \Sigma_{xx}^{-1} \boldsymbol{\mu}_x$ and
$$\boldsymbol{\beta}_1 = \Sigma_{xx}^{-1} \boldsymbol{\sigma}_{yx}. \tag{2}$$
We also have
$$\mathrm{Var}(y \mid \mathbf{x}) = \sigma_{yy} - \boldsymbol{\sigma}_{yx}^T \Sigma_{xx}^{-1} \boldsymbol{\sigma}_{yx} =: \sigma^2.$$
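As a quick numerical illustration of (1) and (2), here is a minimal sketch assuming numpy; the parameter values for k = 2 predictors are entirely made up.

```python
import numpy as np

# Hypothetical parameters for (y, x1, x2) ~ N_3(mu, Sigma), partitioned as in (1).
mu_y = 1.0
mu_x = np.array([2.0, 3.0])
sigma_yy = 4.0
sigma_yx = np.array([1.5, 1.0])      # Cov(y, x)
Sigma_xx = np.array([[2.0, 0.5],
                     [0.5, 1.0]])    # Cov(x), positive definite

# beta_1 = Sigma_xx^{-1} sigma_yx, as in (2).
beta1 = np.linalg.solve(Sigma_xx, sigma_yx)
# beta_0 = mu_y - sigma_yx^T Sigma_xx^{-1} mu_x.
beta0 = mu_y - sigma_yx @ np.linalg.solve(Sigma_xx, mu_x)
# Var(y|x) = sigma_yy - sigma_yx^T Sigma_xx^{-1} sigma_yx.
sigma2 = sigma_yy - sigma_yx @ beta1

print(beta0, beta1, sigma2)
```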

2 Estimation and Testing in Multivariate Normal Regression
Theorem 2.1. If $(y_1, \mathbf{x}_1^T)$, $(y_2, \mathbf{x}_2^T)$, ..., $(y_n, \mathbf{x}_n^T)$ is a random sample from $N_{k+1}(\boldsymbol{\mu}, \Sigma)$, the maximum likelihood estimators are
$$\hat{\boldsymbol{\mu}} = \begin{pmatrix} \bar{y} \\ \bar{\mathbf{x}} \end{pmatrix}, \qquad
\hat{\Sigma} = \frac{n-1}{n} S = \frac{n-1}{n} \begin{pmatrix} s_{yy} & \mathbf{s}_{yx}^T \\ \mathbf{s}_{yx} & S_{xx} \end{pmatrix}. \tag{3}$$

Theorem 2.2. The MLE of a function of one or more parameters is the same function of the
corresponding estimators; that is, if θ̂ is the MLE of the vector or matrix of parameters θ, then
g(θ̂) is the MLE of g(θ).

Example 2.1. We illustrate the use of the invariance property in Theorem 2.2 by showing that the sample correlation matrix $R$ is the MLE of the population correlation matrix $P_\rho$ when sampling from the multivariate Gaussian distribution.

Theorem 2.3. If $(y_1, \mathbf{x}_1^T)$, ..., $(y_n, \mathbf{x}_n^T)$ is a random sample from $N_{k+1}(\boldsymbol{\mu}, \Sigma)$, the MLEs of $\beta_0$, $\boldsymbol{\beta}_1$ and $\sigma^2$ are given by
$$\hat\beta_0 = \bar{y} - \mathbf{s}_{yx}^T S_{xx}^{-1} \bar{\mathbf{x}},$$
$$\hat{\boldsymbol{\beta}}_1 = S_{xx}^{-1} \mathbf{s}_{yx},$$
$$\hat\sigma^2 = \frac{n-1}{n} s^2,$$
where $s^2 = s_{yy} - \mathbf{s}_{yx}^T S_{xx}^{-1} \mathbf{s}_{yx}$.
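The estimators in Theorems 2.1 and 2.3 are direct to compute. The following sketch assumes numpy and uses a simulated sample whose (illustrative) parameters match the earlier example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Simulated sample from N_3(mu, Sigma); the parameter values are illustrative only.
data = rng.multivariate_normal([1.0, 2.0, 3.0],
                               [[4.0, 1.5, 1.0],
                                [1.5, 2.0, 0.5],
                                [1.0, 0.5, 1.0]], size=n)
y, X = data[:, 0], data[:, 1:]

# Theorem 2.1: mu_hat = (ybar, xbar)^T and Sigma_hat = (n-1)/n * S.
mu_hat = data.mean(axis=0)
S = np.cov(data, rowvar=False)       # sample covariance S, divisor n-1
Sigma_hat = (n - 1) / n * S

# Theorem 2.3: partition S and plug in.
s_yy, s_yx, S_xx = S[0, 0], S[0, 1:], S[1:, 1:]
beta1_hat = np.linalg.solve(S_xx, s_yx)
beta0_hat = y.mean() - s_yx @ np.linalg.solve(S_xx, X.mean(axis=0))
s2 = s_yy - s_yx @ beta1_hat
sigma2_hat = (n - 1) / n * s2
```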

3 Standardized Regression Coefficients


We now show that the regression coefficient vector $\hat{\boldsymbol{\beta}}_1$ can be expressed in terms of sample correlations. The sample correlation matrix can be written in partitioned form as
$$R = \begin{pmatrix} 1 & \mathbf{r}_{yx}^T \\ \mathbf{r}_{yx} & R_{xx} \end{pmatrix}, \tag{4}$$

where $\mathbf{r}_{yx}$ is the vector of correlations between $y$ and the $x$'s, and $R_{xx}$ is the correlation matrix of the $x$'s. In particular, we have
$$r_{y,x_j} = \frac{\sum_{i=1}^n (y_i - \bar{y})(x_{ij} - \bar{x}_j)}{\sqrt{\sum_{i=1}^n (y_i - \bar{y})^2 \sum_{i=1}^n (x_{ij} - \bar{x}_j)^2}}.$$

$R$ can be converted to $S$ by
$$S = D R D,$$
where $D = [\mathrm{diag}(S)]^{1/2} = \mathrm{diag}(s_y, \sqrt{s_{11}}, \ldots, \sqrt{s_{kk}})$. Therefore
$$S = \begin{pmatrix} s_{yy} & \mathbf{s}_{yx}^T \\ \mathbf{s}_{yx} & S_{xx} \end{pmatrix}
= \begin{pmatrix} s_y^2 & s_y \mathbf{r}_{yx}^T D_x \\ s_y D_x \mathbf{r}_{yx} & D_x R_{xx} D_x \end{pmatrix},$$
where $D_x = \mathrm{diag}(s_1, s_2, \ldots, s_k)$ and $s_i = \sqrt{s_{ii}}$ for $i = 1, 2, \ldots, k$. By the partition of $S$, we have
$$S_{xx} = D_x R_{xx} D_x, \qquad \mathbf{s}_{yx} = s_y D_x \mathbf{r}_{yx}.$$

Therefore,
$$\hat{\boldsymbol{\beta}}_1 = S_{xx}^{-1} \mathbf{s}_{yx} = s_y D_x^{-1} R_{xx}^{-1} \mathbf{r}_{yx}.$$
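Both the decomposition $S = DRD$ and the two expressions for $\hat{\boldsymbol{\beta}}_1$ are exact sample identities, so they can be checked numerically. A sketch assuming numpy, with a simulated sample as before:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.multivariate_normal([1.0, 2.0, 3.0],
                               [[4.0, 1.5, 1.0],
                                [1.5, 2.0, 0.5],
                                [1.0, 0.5, 1.0]], size=200)
S = np.cov(data, rowvar=False)
R = np.corrcoef(data, rowvar=False)

D = np.diag(np.sqrt(np.diag(S)))
assert np.allclose(S, D @ R @ D)     # S = D R D

s_yx, S_xx = S[0, 1:], S[1:, 1:]
r_yx, R_xx = R[0, 1:], R[1:, 1:]
s_y, D_x = np.sqrt(S[0, 0]), D[1:, 1:]

beta1_from_S = np.linalg.solve(S_xx, s_yx)                              # S_xx^{-1} s_yx
beta1_from_R = s_y * np.linalg.solve(D_x, np.linalg.solve(R_xx, r_yx))  # s_y D_x^{-1} R_xx^{-1} r_yx
assert np.allclose(beta1_from_S, beta1_from_R)
```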

We illustrate the formula for $k = 2$. Consider the centered model
$$\hat{y}_i = \bar{y} + \hat\beta_1 (x_{i1} - \bar{x}_1) + \hat\beta_2 (x_{i2} - \bar{x}_2),$$
which can be expressed in terms of standardized variables as
$$\frac{\hat{y}_i - \bar{y}}{s_y} = \hat\beta_1 \frac{s_1}{s_y}\left(\frac{x_{i1} - \bar{x}_1}{s_1}\right) + \hat\beta_2 \frac{s_2}{s_y}\left(\frac{x_{i2} - \bar{x}_2}{s_2}\right).$$
We thus define the standardized coefficients as
$$\hat\beta_j^* = \frac{s_j}{s_y}\,\hat\beta_j.$$
In matrix-vector form,
$$\hat{\boldsymbol{\beta}}_1^* = \frac{1}{s_y} D_x \hat{\boldsymbol{\beta}}_1,$$
which can be further written as
$$\hat{\boldsymbol{\beta}}_1^* = R_{xx}^{-1} \mathbf{r}_{yx}.$$
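The standardized coefficients can be checked the same way. A short sketch assuming numpy and the same simulated sample:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.multivariate_normal([1.0, 2.0, 3.0],
                               [[4.0, 1.5, 1.0],
                                [1.5, 2.0, 0.5],
                                [1.0, 0.5, 1.0]], size=200)
S = np.cov(data, rowvar=False)
R = np.corrcoef(data, rowvar=False)
s_yx, S_xx = S[0, 1:], S[1:, 1:]
r_yx, R_xx = R[0, 1:], R[1:, 1:]
s_y = np.sqrt(S[0, 0])
D_x = np.diag(np.sqrt(np.diag(S_xx)))

beta1_hat = np.linalg.solve(S_xx, s_yx)
beta_star = (1.0 / s_y) * D_x @ beta1_hat                    # (1/s_y) D_x beta1_hat
assert np.allclose(beta_star, np.linalg.solve(R_xx, r_yx))   # equals R_xx^{-1} r_yx
```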

4 R^2 in Multivariate Normal Regression


The population multiple correlation coefficient $\rho_{y|x}$ is defined as the correlation between $y$ and the linear function $w = \mu_y + \boldsymbol{\sigma}_{yx}^T \Sigma_{xx}^{-1}(\mathbf{x} - \boldsymbol{\mu}_x)$:
$$\rho_{y|x} = \mathrm{corr}(y, w) = \frac{\sigma_{yw}}{\sigma_y \sigma_w},$$
where w is equal to E(y|x). As x varies randomly, w becomes a random variable.
It is easy to establish that
$$\mathrm{Cov}(y, w) = \mathrm{Var}(w) = \boldsymbol{\sigma}_{yx}^T \Sigma_{xx}^{-1} \boldsymbol{\sigma}_{yx},$$
using the fact that $y = w + e$, where $e$ denotes a random error uncorrelated with $w$, so that $\mathrm{Cov}(y, w) = \mathrm{Cov}(w + e, w) = \mathrm{Var}(w)$.


Then the population multiple correlation $\rho_{y|x}$ becomes
$$\rho_{y|x} = \frac{\mathrm{Cov}(y, w)}{\sqrt{\mathrm{Var}(y)\,\mathrm{Var}(w)}} = \sqrt{\frac{\boldsymbol{\sigma}_{yx}^T \Sigma_{xx}^{-1} \boldsymbol{\sigma}_{yx}}{\sigma_{yy}}}.$$

The population coefficient of determination, or population squared multiple correlation, $\rho^2_{y|x}$ is given by
$$\rho^2_{y|x} = \frac{\boldsymbol{\sigma}_{yx}^T \Sigma_{xx}^{-1} \boldsymbol{\sigma}_{yx}}{\sigma_{yy}}.$$
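Given a partitioned $\Sigma$, this is one line of code. A minimal sketch assuming numpy and the made-up parameter values used earlier:

```python
import numpy as np

sigma_yy = 4.0
sigma_yx = np.array([1.5, 1.0])
Sigma_xx = np.array([[2.0, 0.5],
                     [0.5, 1.0]])

# rho^2_{y|x} = sigma_yx^T Sigma_xx^{-1} sigma_yx / sigma_yy
rho2 = sigma_yx @ np.linalg.solve(Sigma_xx, sigma_yx) / sigma_yy
rho = np.sqrt(rho2)
```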
In what follows, we list some properties of $\rho_{y|x}$ and $\rho^2_{y|x}$:

1. $\rho_{y|x}$ is the maximum correlation between $y$ and any linear function of $\mathbf{x}$; that is, $\rho_{y|x} = \max_{\mathbf{a}} \rho_{y, \mathbf{a}^T \mathbf{x}}$.

2. $\rho^2_{y|x}$ can be expressed in terms of determinants:
$$\rho^2_{y|x} = 1 - \frac{|\Sigma|}{\sigma_{yy}\,|\Sigma_{xx}|},$$
where $\Sigma$ and $\Sigma_{xx}$ are as defined in (1); see the numerical check after this list.

3. $\rho^2_{y|x}$ is invariant to linear transformations on $y$ or on the $x$'s; that is, if $u = ay$ and $\mathbf{v} = B\mathbf{x}$, where $B$ is nonsingular, then $\rho^2_{u|v} = \rho^2_{y|x}$.

4. Using $\mathrm{Var}(w) = \boldsymbol{\sigma}_{yx}^T \Sigma_{xx}^{-1} \boldsymbol{\sigma}_{yx}$, $\rho^2_{y|x}$ can be written in the form
$$\rho^2_{y|x} = \frac{\mathrm{Var}(w)}{\mathrm{Var}(y)}.$$

5. $\mathrm{Var}(y|\mathbf{x})$ can be expressed in terms of $\rho^2_{y|x}$:
$$\mathrm{Var}(y|\mathbf{x}) = \sigma_{yy} - \boldsymbol{\sigma}_{yx}^T \Sigma_{xx}^{-1} \boldsymbol{\sigma}_{yx} = \sigma_{yy} - \sigma_{yy}\rho^2_{y|x} = \sigma_{yy}(1 - \rho^2_{y|x}).$$

6. If we consider $y - w$ as a residual or error term, then $y - w$ is uncorrelated with the $x$'s:
$$\mathrm{Cov}(y - w, \mathbf{x}) = \mathbf{0}.$$
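Properties 2, 4, and 5 are algebraic identities in $\Sigma$ and can be verified numerically. A sketch assuming numpy, with the same hypothetical $\Sigma$ as above:

```python
import numpy as np

Sigma = np.array([[4.0, 1.5, 1.0],
                  [1.5, 2.0, 0.5],
                  [1.0, 0.5, 1.0]])   # partitioned as in (1)
sigma_yy, sigma_yx, Sigma_xx = Sigma[0, 0], Sigma[0, 1:], Sigma[1:, 1:]

var_w = sigma_yx @ np.linalg.solve(Sigma_xx, sigma_yx)   # Var(w)
rho2 = var_w / sigma_yy                                  # property 4: Var(w)/Var(y)

# Property 2: determinant form.
assert np.isclose(rho2, 1 - np.linalg.det(Sigma) / (sigma_yy * np.linalg.det(Sigma_xx)))
# Property 5: Var(y|x) = sigma_yy (1 - rho^2).
assert np.isclose(sigma_yy - var_w, sigma_yy * (1 - rho2))
```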

We can obtain an MLE for $\rho^2_{y|x}$, which is
$$R^2 = \frac{\mathbf{s}_{yx}^T S_{xx}^{-1} \mathbf{s}_{yx}}{s_{yy}}.$$
$R$ is called the sample multiple correlation coefficient.
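Computing $R^2$ from a sample is a direct translation of this formula. A sketch assuming numpy, with a simulated sample as before:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.multivariate_normal([1.0, 2.0, 3.0],
                               [[4.0, 1.5, 1.0],
                                [1.5, 2.0, 0.5],
                                [1.0, 0.5, 1.0]], size=200)
S = np.cov(data, rowvar=False)
s_yy, s_yx, S_xx = S[0, 0], S[0, 1:], S[1:, 1:]

# R^2 = s_yx^T S_xx^{-1} s_yx / s_yy
R2 = s_yx @ np.linalg.solve(S_xx, s_yx) / s_yy
```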


We now list several properties of $R$ and $R^2$, some of which are analogous to the properties of $\rho^2_{y|x}$ above.

1. R is equal to the correlation between y and ŷ.

2. $R$ is equal to the maximum correlation between $y$ and any linear combination $\mathbf{a}^T\mathbf{x}$:
$$R = \max_{\mathbf{a}} r_{y, \mathbf{a}^T \mathbf{x}}.$$

3. $R^2$ can be expressed in terms of correlations:
$$R^2 = \mathbf{r}_{yx}^T R_{xx}^{-1} \mathbf{r}_{yx},$$
where $\mathbf{r}_{yx}$ and $R_{xx}$ are from the sample correlation matrix $R$ partitioned as in (4); several of these identities are checked numerically in the sketch after this list.

4. $R^2$ can be obtained from $R^{-1}$:
$$R^2 = 1 - \frac{1}{r^{yy}},$$
where $r^{yy}$ is the first diagonal element of $R^{-1}$.

5. $R^2$ can be expressed in terms of determinants:
$$R^2 = 1 - \frac{|S|}{s_{yy}\,|S_{xx}|} = 1 - \frac{|R|}{|R_{xx}|},$$
where $S_{xx}$ and $R_{xx}$ are defined in (3) and (4).

6. If $\rho^2_{y|x} = 0$, the expected value of $R^2$ is given by
$$E(R^2) = \frac{k}{n-1},$$
so $R^2$ is a biased estimator when $\rho^2_{y|x} = 0$.

7. $R^2 \ge \max_j r_{yj}^2$, where $r_{yj}$ is an element of $\mathbf{r}_{yx} = (r_{y1}, r_{y2}, \ldots, r_{yk})^T$.

8. $R^2$ is invariant to full linear transformations on $y$ or on the $x$'s.
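Properties 1, 3, 4, and 5 are exact algebraic identities in the sample and can be confirmed numerically. A sketch assuming numpy, with a simulated sample as before:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.multivariate_normal([1.0, 2.0, 3.0],
                               [[4.0, 1.5, 1.0],
                                [1.5, 2.0, 0.5],
                                [1.0, 0.5, 1.0]], size=200)
y, X = data[:, 0], data[:, 1:]
S = np.cov(data, rowvar=False)
R = np.corrcoef(data, rowvar=False)
s_yy, s_yx, S_xx = S[0, 0], S[0, 1:], S[1:, 1:]
r_yx, R_xx = R[0, 1:], R[1:, 1:]

R2 = s_yx @ np.linalg.solve(S_xx, s_yx) / s_yy

# Property 1: R equals the correlation between y and y_hat.
beta1 = np.linalg.solve(S_xx, s_yx)
y_hat = y.mean() + (X - X.mean(axis=0)) @ beta1
assert np.isclose(np.sqrt(R2), np.corrcoef(y, y_hat)[0, 1])
# Property 3: R^2 = r_yx^T R_xx^{-1} r_yx.
assert np.isclose(R2, r_yx @ np.linalg.solve(R_xx, r_yx))
# Property 4: R^2 = 1 - 1/r^{yy}, with r^{yy} the first diagonal element of R^{-1}.
assert np.isclose(R2, 1 - 1 / np.linalg.inv(R)[0, 0])
# Property 5: determinant form.
assert np.isclose(R2, 1 - np.linalg.det(R) / np.linalg.det(R_xx))
```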

5 Tests and Confidence Intervals for R^2


Note that $\rho^2_{y|x} = 0$ becomes
$$\rho^2_{y|x} = \frac{\boldsymbol{\sigma}_{yx}^T \Sigma_{xx}^{-1} \boldsymbol{\sigma}_{yx}}{\sigma_{yy}} = 0,$$
which leads to $\boldsymbol{\sigma}_{yx} = \mathbf{0}$ since $\Sigma_{xx}$ is positive definite. Further, since $\boldsymbol{\beta}_1 = \Sigma_{xx}^{-1} \boldsymbol{\sigma}_{yx}$ by (2), $H_0: \rho^2_{y|x} = 0$ is equivalent to $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$.
The $F$ statistic for fixed $x$'s is given by
$$F = \frac{(\hat{\boldsymbol{\beta}}^T X^T \mathbf{y} - n\bar{y}^2)/k}{(\mathbf{y}^T \mathbf{y} - \hat{\boldsymbol{\beta}}^T X^T \mathbf{y})/(n - k - 1)} = \frac{R^2/k}{(1 - R^2)/(n - k - 1)}. \tag{5}$$

The test statistic in (5) can be obtained by the likelihood ratio approach in the case of random
x’s.

Theorem 5.1. If $(y_1, \mathbf{x}_1^T)$, $(y_2, \mathbf{x}_2^T)$, ..., $(y_n, \mathbf{x}_n^T)$ is a random sample from $N_{k+1}(\boldsymbol{\mu}, \Sigma)$, the likelihood ratio test for $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$, or equivalently $H_0: \rho^2_{y|x} = 0$, can be based on $F$ in (5). We reject $H_0$ if $F > F_{\alpha, k, n-k-1}$.
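In practice the test is a few lines. A sketch using the $R^2$ form of (5), with made-up sample quantities and scipy assumed for the $F$ quantile:

```python
from scipy.stats import f

# Illustrative test of H0: beta_1 = 0; n, k, and R2 are made up.
n, k, R2 = 200, 2, 0.42
F = (R2 / k) / ((1 - R2) / (n - k - 1))

alpha = 0.05
reject = F > f.ppf(1 - alpha, k, n - k - 1)   # reject H0 if F > F_{alpha,k,n-k-1}
```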

When $k = 1$, $F$ in (5) reduces to $F = (n-2)r^2/(1 - r^2)$. Hence,
$$t = \frac{\sqrt{n-2}\, r}{\sqrt{1 - r^2}}$$
has a $t$-distribution with $n - 2$ degrees of freedom when $(y, x)$ has a bivariate normal distribution with $\rho = 0$.
If $(y, x)$ is bivariate normal and $\rho \neq 0$, then $\mathrm{Var}(r) = (1 - \rho^2)^2/n$ and the function
$$u = \frac{\sqrt{n}\,(r - \rho)}{1 - \rho^2}$$
is approximately standard normal for large $n$. However, the distribution of $u$ approaches normality very slowly as $n$ increases. Fisher (1921) found that a function of $r$,
$$z = \frac{1}{2}\log\frac{1+r}{1-r} = \tanh^{-1}(r),$$
approaches normality much faster than $u$ does. The approximate mean and variance of $z$ are
$$E(z) \approx \frac{1}{2}\log\frac{1+\rho}{1-\rho}, \qquad \mathrm{Var}(z) \approx \frac{1}{n-3}.$$
To test hypotheses such as $H_0: \rho = \rho_0$ vs. $H_1: \rho \neq \rho_0$ with Fisher's transformation, we calculate
$$v = \frac{z - \tanh^{-1}(\rho_0)}{\sqrt{1/(n-3)}},$$
which is approximately distributed as standard normal, $N(0, 1)$.
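A sketch of the Fisher $z$ test, assuming numpy and scipy, with made-up values of $n$, $r$, and $\rho_0$:

```python
import numpy as np
from scipy.stats import norm

n, r, rho0 = 50, 0.6, 0.4                     # made-up sample size, r, and rho_0
z = np.arctanh(r)                             # 0.5 * log((1 + r) / (1 - r))
v = (z - np.arctanh(rho0)) / np.sqrt(1 / (n - 3))
p_value = 2 * (1 - norm.cdf(abs(v)))          # two-sided test of H0: rho = rho0
```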

6 Sample Partial Correlations


The population partial correlation $\rho_{ij \cdot rs\cdots q}$ is the correlation between $y_i$ and $y_j$ in the conditional distribution of $\mathbf{y}$ given $\mathbf{x}$, where $y_i$ and $y_j$ are in $\mathbf{y}$ and the subscripts $r, s, \ldots, q$ represent all the variables in $\mathbf{x}$:
$$\rho_{ij \cdot rs\cdots q} = \frac{\sigma_{ij \cdot rs\cdots q}}{\sqrt{\sigma_{ii \cdot rs\cdots q}\,\sigma_{jj \cdot rs\cdots q}}},$$
where $\sigma_{ij \cdot rs\cdots q}$ is the $(ij)$th element of $\Sigma_{y \cdot x} = \mathrm{Cov}(\mathbf{y}|\mathbf{x})$.
To simplify the exposition, we illustrate with $r_{12 \cdot 3}$. The sample partial correlation of $y_1$ and $y_2$ with $y_3$ held fixed is usually given as
$$r_{12 \cdot 3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}},$$
where $r_{12}$, $r_{13}$ and $r_{23}$ are the ordinary correlations between $y_1$ and $y_2$, $y_1$ and $y_3$, and $y_2$ and $y_3$, respectively.
Theorem 6.1. The expression for $r_{12 \cdot 3}$ is equal to $r_{y_1 - \hat{y}_1,\, y_2 - \hat{y}_2}$, where $y_1 - \hat{y}_1$ and $y_2 - \hat{y}_2$ are the residuals from the regressions of $y_1$ on $y_3$ and $y_2$ on $y_3$.
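Theorem 6.1 is an exact sample identity, so the two computations agree numerically. A sketch assuming numpy, with a simulated trivariate sample whose correlations are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated sample of (y1, y2, y3); the correlation values are made up.
Y = rng.multivariate_normal([0.0, 0.0, 0.0],
                            [[1.0, 0.6, 0.4],
                             [0.6, 1.0, 0.5],
                             [0.4, 0.5, 1.0]], size=300)
y1, y2, y3 = Y[:, 0], Y[:, 1], Y[:, 2]
C = np.corrcoef(Y, rowvar=False)
r12, r13, r23 = C[0, 1], C[0, 2], C[1, 2]

# Sample partial correlation of y1 and y2 with y3 held fixed.
r12_3 = (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))

def resid(u, v):
    """Residuals from the least-squares simple regression of u on v."""
    b = np.cov(u, v)[0, 1] / np.var(v, ddof=1)
    return u - (u.mean() + b * (v - v.mean()))

# Theorem 6.1: r12.3 equals the correlation of the two residual series.
assert np.isclose(r12_3, np.corrcoef(resid(y1, y3), resid(y2, y3))[0, 1])
```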
