Lect 9
Theorem 2.2. The MLE of a function of one or more parameters is the same function of the
corresponding estimators; that is, if θ̂ is the MLE of the vector or matrix of parameters θ, then
g(θ̂) is the MLE of g(θ).
Example 2.1. We illustrate the use of the invariance property in Theorem 2.2 by showing that the sample correlation matrix $\mathbf{R}$ is the MLE of the population correlation matrix $P_\rho$ when sampling from the multivariate Gaussian distribution.
Theorem 2.3. If $(y_1, \mathbf{x}_1^T), \ldots, (y_n, \mathbf{x}_n^T)$ is a random sample from $N_{k+1}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, the MLEs of $\beta_0$, $\boldsymbol{\beta}_1$ and $\sigma^2$ are given by
$$\hat{\beta}_0 = \bar{y} - \mathbf{s}_{yx}^T \mathbf{S}_{xx}^{-1} \bar{\mathbf{x}},$$
$$\hat{\boldsymbol{\beta}}_1 = \mathbf{S}_{xx}^{-1} \mathbf{s}_{yx},$$
$$\hat{\sigma}^2 = \frac{n-1}{n}\, s^2,$$
where $s^2 = s_{yy} - \mathbf{s}_{yx}^T \mathbf{S}_{xx}^{-1} \mathbf{s}_{yx}$.
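As a quick numerical check of Theorem 2.3 (a sketch with made-up parameters; the choices of `mu`, `Sigma`, and the sample size are assumptions, not from the notes), the estimates built from the partitioned sample covariance matrix should coincide with the usual least squares fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: (y, x1, x2) ~ N_3(mu, Sigma)
mu = np.array([1.0, 0.0, 0.0])
Sigma = np.array([[2.0, 0.8, 0.5],
                  [0.8, 1.0, 0.3],
                  [0.5, 0.3, 1.0]])
n = 500
data = rng.multivariate_normal(mu, Sigma, size=n)
y, X = data[:, 0], data[:, 1:]

# Partitioned sample covariance matrix (divisor n - 1)
S = np.cov(data, rowvar=False)
s_yy, s_yx, S_xx = S[0, 0], S[1:, 0], S[1:, 1:]

beta1 = np.linalg.solve(S_xx, s_yx)         # beta1-hat = S_xx^{-1} s_yx
beta0 = y.mean() - beta1 @ X.mean(axis=0)   # beta0-hat = ybar - s_yx' S_xx^{-1} xbar
s2 = s_yy - s_yx @ beta1                    # s^2 = s_yy - s_yx' S_xx^{-1} s_yx
sigma2_hat = (n - 1) / n * s2               # MLE of sigma^2

# The MLE intercept and slopes coincide with the least squares estimates
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(coef, np.concatenate([[beta0], beta1]))
```

The divisor in $\mathbf{S}$ cancels between $\mathbf{S}_{xx}^{-1}$ and $\mathbf{s}_{yx}$, which is why the MLE slopes equal the least squares slopes exactly.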
where $\mathbf{r}_{yx}$ is the vector of correlations between $y$ and the $x$'s, and $\mathbf{R}_{xx}$ is the correlation matrix for the $x$'s. In particular, we have
$$r_{y,x_j} = \frac{\sum_{i=1}^n (y_i - \bar{y})(x_{ij} - \bar{x}_j)}{\sqrt{\sum_{i=1}^n (y_i - \bar{y})^2 \sum_{i=1}^n (x_{ij} - \bar{x}_j)^2}}.$$
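A quick check of this formula on simulated (hypothetical) data: computing each $r_{y,x_j}$ from centered sums should reproduce the $y$-row of the sample correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=50)
x = rng.normal(size=(50, 3)) + 0.5 * y[:, None]   # three predictors correlated with y

# r_{y,xj} from the definition: centered cross-products over root sums of squares
yc = y - y.mean()
xc = x - x.mean(axis=0)
r_yx = (yc @ xc) / np.sqrt((yc**2).sum() * (xc**2).sum(axis=0))

# Agrees with the first row (off-diagonal part) of the full correlation matrix
R = np.corrcoef(np.column_stack([y, x]), rowvar=False)
assert np.allclose(r_yx, R[0, 1:])
```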
$\mathbf{R}$ can be converted to $\mathbf{S}$ by
$$\mathbf{S} = \mathbf{D}\mathbf{R}\mathbf{D},$$
where $\mathbf{D} = [\operatorname{diag}(\mathbf{S})]^{1/2} = \operatorname{diag}(s_y, \sqrt{s_{11}}, \ldots, \sqrt{s_{kk}})$. Therefore
$$\mathbf{S} = \begin{pmatrix} s_{yy} & \mathbf{s}_{yx}^T \\ \mathbf{s}_{yx} & \mathbf{S}_{xx} \end{pmatrix} = \begin{pmatrix} s_y^2 & s_y \mathbf{r}_{yx}^T \mathbf{D}_x \\ s_y \mathbf{D}_x \mathbf{r}_{yx} & \mathbf{D}_x \mathbf{R}_{xx} \mathbf{D}_x \end{pmatrix},$$
where $\mathbf{D}_x = \operatorname{diag}(s_1, s_2, \ldots, s_k)$ and $s_i = \sqrt{s_{ii}}$ for $i = 1, 2, \ldots, k$. By the partition of $\mathbf{S}$, we have
$$\mathbf{S}_{xx} = \mathbf{D}_x \mathbf{R}_{xx} \mathbf{D}_x, \qquad \mathbf{s}_{yx} = s_y \mathbf{D}_x \mathbf{r}_{yx}.$$
Therefore,
$$\hat{\boldsymbol{\beta}}_1 = \mathbf{S}_{xx}^{-1} \mathbf{s}_{yx} = s_y \mathbf{D}_x^{-1} \mathbf{R}_{xx}^{-1} \mathbf{r}_{yx}.$$
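A numeric sanity check of $\mathbf{S} = \mathbf{D}\mathbf{R}\mathbf{D}$ and of the two forms of $\hat{\boldsymbol{\beta}}_1$, sketched on simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(100, 4))   # column 0 is y; columns 1-3 are the x's

S = np.cov(data, rowvar=False)
R = np.corrcoef(data, rowvar=False)

# S = D R D with D = [diag(S)]^{1/2}
D = np.diag(np.sqrt(np.diag(S)))
assert np.allclose(S, D @ R @ D)

# The covariance and correlation forms of beta1-hat agree
s_yx, S_xx = S[1:, 0], S[1:, 1:]
r_yx, R_xx = R[1:, 0], R[1:, 1:]
s_y = np.sqrt(S[0, 0])
D_x = np.diag(np.sqrt(np.diag(S_xx)))

lhs = np.linalg.solve(S_xx, s_yx)                              # S_xx^{-1} s_yx
rhs = s_y * np.linalg.inv(D_x) @ np.linalg.solve(R_xx, r_yx)   # s_y D_x^{-1} R_xx^{-1} r_yx
assert np.allclose(lhs, rhs)
```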
$$\rho_{y|\mathbf{x}} = \operatorname{corr}(y, w) = \frac{\sigma_{yw}}{\sigma_y \sigma_w},$$
where $w = E(y \mid \mathbf{x})$. As $\mathbf{x}$ varies randomly, $w$ becomes a random variable.
It is easy to establish that the population coefficient of determination, or population squared multiple correlation, $\rho^2_{y|\mathbf{x}}$ is given by
$$\rho^2_{y|\mathbf{x}} = \frac{\boldsymbol{\sigma}_{yx}^T \boldsymbol{\Sigma}_{xx}^{-1} \boldsymbol{\sigma}_{yx}}{\sigma_{yy}}.$$
In what follows, we list some properties of $\rho_{y|\mathbf{x}}$ and $\rho^2_{y|\mathbf{x}}$:
1. $\rho_{y|\mathbf{x}}$ is the maximum correlation between $y$ and any linear function of $\mathbf{x}$; that is, $\rho_{y|\mathbf{x}} = \max_{\mathbf{a}} \rho_{y,\mathbf{a}^T\mathbf{x}}$.
2. $\rho^2_{y|\mathbf{x}}$ can be written in terms of determinants:
$$\rho^2_{y|\mathbf{x}} = 1 - \frac{|\boldsymbol{\Sigma}|}{\sigma_{yy}|\boldsymbol{\Sigma}_{xx}|}.$$
3. $\rho^2_{y|\mathbf{x}}$ is invariant to linear transformations on $y$ or on the $x$'s; that is, if $u = ay$ and $\mathbf{v} = \mathbf{B}\mathbf{x}$, where $\mathbf{B}$ is nonsingular, then $\rho^2_{u|\mathbf{v}} = \rho^2_{y|\mathbf{x}}$.
4. $\rho^2_{y|\mathbf{x}} = \dfrac{\operatorname{Var}(w)}{\operatorname{Var}(y)}$, where $w = E(y \mid \mathbf{x})$.
5. $\operatorname{Cov}(y - w, \mathbf{x}) = \mathbf{0}$.
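These properties can be verified numerically for a specific covariance matrix (the matrix below is made up for illustration). The determinant form in property 2 follows from the Schur-complement identity $|\boldsymbol{\Sigma}| = |\boldsymbol{\Sigma}_{xx}|(\sigma_{yy} - \boldsymbol{\sigma}_{yx}^T \boldsymbol{\Sigma}_{xx}^{-1} \boldsymbol{\sigma}_{yx})$:

```python
import numpy as np

# Hypothetical population covariance matrix, with y first and then x1, x2
Sigma = np.array([[4.0, 1.2, 0.9],
                  [1.2, 2.0, 0.4],
                  [0.9, 0.4, 1.5]])
sigma_yy = Sigma[0, 0]
sigma_yx = Sigma[1:, 0]
Sigma_xx = Sigma[1:, 1:]

# Definition: rho^2 = sigma_yx' Sigma_xx^{-1} sigma_yx / sigma_yy
rho2 = sigma_yx @ np.linalg.solve(Sigma_xx, sigma_yx) / sigma_yy

# Property 2: the determinant form gives the same value
rho2_det = 1 - np.linalg.det(Sigma) / (sigma_yy * np.linalg.det(Sigma_xx))
assert np.allclose(rho2, rho2_det)

# Property 4: Var(w)/Var(y), since w = E(y|x) has
# Var(w) = sigma_yx' Sigma_xx^{-1} sigma_yx
var_w = sigma_yx @ np.linalg.solve(Sigma_xx, sigma_yx)
assert np.allclose(rho2, var_w / sigma_yy)
```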
The sample squared multiple correlation $R^2$ is given by
$$R^2 = \frac{\mathbf{s}_{yx}^T \mathbf{S}_{xx}^{-1} \mathbf{s}_{yx}}{s_{yy}}.$$
Moreover, $R$ is the maximum sample correlation between $y$ and any linear combination of the $x$'s:
$$R = \max_{\mathbf{a}} r_{y,\mathbf{a}^T\mathbf{x}}.$$
In terms of correlations,
$$R^2 = \mathbf{r}_{yx}^T \mathbf{R}_{xx}^{-1} \mathbf{r}_{yx},$$
where $\mathbf{r}_{yx}$ and $\mathbf{R}_{xx}$ are from the sample correlation matrix $\mathbf{R}$ partitioned as in (4).
4. $R^2$ can be obtained from $\mathbf{R}^{-1}$:
$$R^2 = 1 - \frac{1}{r^{yy}},$$
where $r^{yy}$ is the first diagonal element of $\mathbf{R}^{-1}$.
5. In terms of determinants,
$$R^2 = 1 - \frac{|\mathbf{S}|}{s_{yy}|\mathbf{S}_{xx}|} = 1 - \frac{|\mathbf{R}|}{|\mathbf{R}_{xx}|}.$$
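The several expressions for $R^2$ above should all agree; a sketch on simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(size=60)
x = rng.normal(size=(60, 3)) + 0.4 * y[:, None]
data = np.column_stack([y, x])

S = np.cov(data, rowvar=False)
R = np.corrcoef(data, rowvar=False)
s_yy, s_yx, S_xx = S[0, 0], S[1:, 0], S[1:, 1:]
r_yx, R_xx = R[1:, 0], R[1:, 1:]

R2_cov = s_yx @ np.linalg.solve(S_xx, s_yx) / s_yy    # s_yx' S_xx^{-1} s_yx / s_yy
R2_cor = r_yx @ np.linalg.solve(R_xx, r_yx)           # r_yx' R_xx^{-1} r_yx
R2_inv = 1 - 1 / np.linalg.inv(R)[0, 0]               # 1 - 1/r^{yy}
R2_det = 1 - np.linalg.det(R) / np.linalg.det(R_xx)   # 1 - |R|/|R_xx|
assert np.allclose([R2_cor, R2_inv, R2_det], R2_cov)
```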
6. When $\rho^2_{y|\mathbf{x}} = 0$,
$$E(R^2) = \frac{k}{n-1},$$
so $R^2$ is biased when $\rho^2_{y|\mathbf{x}} = 0$.
7. $R^2 \ge \max_j r_{yj}^2$, where $r_{yj}$ is an element of $\mathbf{r}_{yx} = (r_{y1}, r_{y2}, \ldots, r_{yk})^T$.
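Property 6 can be checked by simulation (the sample size, number of predictors, and replication count below are arbitrary choices): with $y$ generated independently of the $x$'s, the average $R^2$ should sit near $k/(n-1)$ rather than 0:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, reps = 20, 3, 4000

r2 = np.empty(reps)
for b in range(reps):
    y = rng.normal(size=n)
    x = rng.normal(size=(n, k))   # independent of y, so rho^2_{y|x} = 0
    R = np.corrcoef(np.column_stack([y, x]), rowvar=False)
    r_yx, R_xx = R[1:, 0], R[1:, 1:]
    r2[b] = r_yx @ np.linalg.solve(R_xx, r_yx)

# The Monte Carlo mean should be close to k/(n-1) = 3/19, not to 0
assert abs(r2.mean() - k / (n - 1)) < 0.02
```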
The test statistic in (5) can be obtained by the likelihood ratio approach in the case of random
x’s.
Theorem 5.1. If $(y_1, \mathbf{x}_1^T), (y_2, \mathbf{x}_2^T), \ldots, (y_n, \mathbf{x}_n^T)$ is a random sample from $N_{k+1}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, the likelihood ratio test for $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$, or equivalently $H_0: \rho^2_{y|\mathbf{x}} = 0$, can be based on $F$ in (5). We reject $H_0$ if $F > F_{\alpha,k,n-k-1}$.
When $k = 1$, $F$ in (5) reduces to $F = (n-2)r^2/(1-r^2)$. Hence,
$$t = \frac{\sqrt{n-2}\, r}{\sqrt{1-r^2}}$$
has a $t$-distribution with $n - 2$ degrees of freedom when $(y, x)$ has a bivariate normal distribution with $\rho = 0$.
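A small check that $t^2 = F$ when $k = 1$ (the simulated bivariate sample is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 30
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(size=n)   # a correlated bivariate sample

r = np.corrcoef(y, x)[0, 1]
F = (n - 2) * r**2 / (1 - r**2)
t = np.sqrt(n - 2) * r / np.sqrt(1 - r**2)
assert np.allclose(t**2, F)        # the squared t statistic reproduces F
```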
If $(y, x)$ is bivariate normal and $\rho \neq 0$, then $\operatorname{Var}(r) \approx (1-\rho^2)^2/n$ and the function
$$u = \frac{\sqrt{n}\,(r - \rho)}{1 - \rho^2}$$
is approximately standard normal for large $n$. However, the distribution of $u$ approaches normality very slowly as $n$ increases. Fisher (1921) found a function of $r$,
$$z = \frac{1}{2}\log\frac{1+r}{1-r} = \tanh^{-1}(r),$$
which approaches normality much faster than $u$ does. The approximate mean and variance of $z$ are
$$E(z) \approx \frac{1}{2}\log\frac{1+\rho}{1-\rho}, \qquad \operatorname{Var}(z) \approx \frac{1}{n-3}.$$
We can use Fisher's transformation to test hypotheses such as $H_0: \rho = \rho_0$ vs. $H_1: \rho \neq \rho_0$: we calculate
$$v = \frac{z - \tanh^{-1}(\rho_0)}{\sqrt{1/(n-3)}},$$
which is approximately distributed as the standard normal $N(0, 1)$.
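A worked sketch of this test (the values $r = 0.55$, $n = 28$, $\rho_0 = 0.3$ are made up for illustration):

```python
import math

n, r, rho0 = 28, 0.55, 0.3   # hypothetical sample correlation and null value

z = math.atanh(r)            # z = (1/2) log((1+r)/(1-r))
v = (z - math.atanh(rho0)) / math.sqrt(1 / (n - 3))

# Compare |v| with the standard normal critical value 1.96 at alpha = 0.05
print(round(v, 3), abs(v) > 1.96)   # v ≈ 1.544, so H0 is not rejected at the 5% level
```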