007 Multivariate Linear Regression
Let $Y_{j1}, Y_{j2}, \ldots, Y_{jm}$ and $\varepsilon_{j1}, \varepsilon_{j2}, \ldots, \varepsilon_{jm}$ be the responses and errors for the $j$th trial. Thus we have an $n \times (r+1)$ design matrix

$$Z = \begin{bmatrix} z_{10} & z_{11} & \cdots & z_{1r} \\ z_{20} & z_{21} & \cdots & z_{2r} \\ \vdots & \vdots & & \vdots \\ z_{n0} & z_{n1} & \cdots & z_{nr} \end{bmatrix}$$
If we now set

$$Y = \begin{bmatrix} Y_{11} & Y_{12} & \cdots & Y_{1m} \\ Y_{21} & Y_{22} & \cdots & Y_{2m} \\ \vdots & \vdots & & \vdots \\ Y_{n1} & Y_{n2} & \cdots & Y_{nm} \end{bmatrix} = \big[\,Y_{(1)} \mid Y_{(2)} \mid \cdots \mid Y_{(m)}\,\big],$$

$$\beta = \begin{bmatrix} \beta_{01} & \beta_{02} & \cdots & \beta_{0m} \\ \beta_{11} & \beta_{12} & \cdots & \beta_{1m} \\ \vdots & \vdots & & \vdots \\ \beta_{r1} & \beta_{r2} & \cdots & \beta_{rm} \end{bmatrix} = \big[\,\beta_{(1)} \mid \beta_{(2)} \mid \cdots \mid \beta_{(m)}\,\big],$$

$$\varepsilon = \begin{bmatrix} \varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1m} \\ \varepsilon_{21} & \varepsilon_{22} & \cdots & \varepsilon_{2m} \\ \vdots & \vdots & & \vdots \\ \varepsilon_{n1} & \varepsilon_{n2} & \cdots & \varepsilon_{nm} \end{bmatrix} = \big[\,\varepsilon_{(1)} \mid \varepsilon_{(2)} \mid \cdots \mid \varepsilon_{(m)}\,\big] = \begin{bmatrix} \varepsilon_1' \\ \varepsilon_2' \\ \vdots \\ \varepsilon_n' \end{bmatrix}$$
the multivariate linear regression model is

$$Y = Z\beta + \varepsilon$$

with

$$E\big[\varepsilon_{(i)}\big] = 0 \quad\text{and}\quad \mathrm{Cov}\big(\varepsilon_{(i)}, \varepsilon_{(k)}\big) = \sigma_{ik}\,I, \qquad i, k = 1, \ldots, m$$
Note also that the $m$ observed responses on the $j$th trial have covariance matrix

$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2m} \\ \vdots & \vdots & & \vdots \\ \sigma_{m1} & \sigma_{m2} & \cdots & \sigma_{mm} \end{bmatrix}$$
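To make this error structure concrete, here is a minimal simulation sketch in NumPy (the dimensions, coefficients, and the particular $\Sigma$ are all illustrative, not taken from the example later in these notes): each row of $\varepsilon$ is an independent $N_m(0, \Sigma)$ draw, which is exactly the condition $\mathrm{Cov}(\varepsilon_{(i)}, \varepsilon_{(k)}) = \sigma_{ik} I$.

```python
import numpy as np

rng = np.random.default_rng(0)

n, r, m = 50, 2, 2                       # trials, predictors, responses
Z = np.column_stack([np.ones(n),         # n x (r+1) design matrix
                     rng.uniform(60, 90, size=(n, r))])
beta = np.array([[1.0, 2.0],             # (r+1) x m coefficient matrix
                 [0.5, 0.3],
                 [0.2, 0.4]])
Sigma = np.array([[4.0, 3.0],            # m x m error covariance (illustrative)
                  [3.0, 5.0]])

# Rows of eps are iid N_m(0, Sigma); across rows the errors are independent,
# so Cov(eps_(i), eps_(k)) = sigma_ik * I, as in the model statement.
eps = rng.multivariate_normal(np.zeros(m), Sigma, size=n)
Y = Z @ beta + eps                       # n x m response matrix
```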
The ordinary least squares estimates $\hat\beta$ are found in a manner analogous to the univariate case. We begin by taking, for each response,

$$\hat\beta_{(i)} = (Z'Z)^{-1} Z' Y_{(i)}$$

Collecting the univariate least squares estimates yields

$$\hat\beta = \big[\,\hat\beta_{(1)} \mid \hat\beta_{(2)} \mid \cdots \mid \hat\beta_{(m)}\,\big] = (Z'Z)^{-1} Z' \big[\,Y_{(1)} \mid Y_{(2)} \mid \cdots \mid Y_{(m)}\,\big] = (Z'Z)^{-1} Z' Y$$
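In code, all $m$ columns of $\hat\beta$ come from a single least squares solve; a minimal sketch (the function name is ours):

```python
import numpy as np

def fit_multivariate_ols(Z, Y):
    """Return beta_hat = (Z'Z)^{-1} Z'Y, solving all m responses at once."""
    # lstsq solves the least squares problem column-by-column in a numerically
    # stable way; when Z has full column rank r + 1 it is equivalent to
    # np.linalg.solve(Z.T @ Z, Z.T @ Y).
    beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return beta_hat
```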
Now for any choice of parameters $B = \big[\,b_{(1)} \mid b_{(2)} \mid \cdots \mid b_{(m)}\,\big]$, the resulting matrix of errors is

$$Y - ZB$$

The resulting Error Sums of Squares and Crossproducts matrix is

$$(Y - ZB)'(Y - ZB), \qquad\text{with } i\text{th diagonal element } \big(Y_{(i)} - Zb_{(i)}\big)'\big(Y_{(i)} - Zb_{(i)}\big)$$

Selecting $b_{(i)} = \hat\beta_{(i)}$ minimizes each diagonal element individually, so at $B = \hat\beta$ the generalized variance

$$\big|(Y - ZB)'(Y - ZB)\big| \qquad\text{and}\qquad \mathrm{tr}\big[(Y - ZB)'(Y - ZB)\big]$$

are both minimized.
So we have the matrix of predicted values

$$\hat Y = Z\hat\beta = Z(Z'Z)^{-1}Z'Y$$

and a resulting matrix of residuals

$$\hat\varepsilon = Y - \hat Y = \big[I - Z(Z'Z)^{-1}Z'\big]Y$$
Note that the orthogonality conditions among residuals, predicted values, and columns of the design matrix which hold in the univariate case are also true in the multivariate case, because

$$Z'\big[I - Z(Z'Z)^{-1}Z'\big] = Z' - Z' = 0$$

which means the residuals are perpendicular to the columns of the design matrix,

$$Z'\hat\varepsilon = Z'\big[I - Z(Z'Z)^{-1}Z'\big]Y = 0$$

and to the predicted values,

$$\hat Y'\hat\varepsilon = \hat\beta' Z'\big[I - Z(Z'Z)^{-1}Z'\big]Y = 0$$
Furthermore, because

$$Y = \hat Y + \hat\varepsilon$$

we have

$$\underbrace{Y'Y}_{\substack{\text{total sums of squares}\\ \text{and crossproducts}}} \;=\; \underbrace{\hat Y'\hat Y}_{\substack{\text{predicted sums of squares}\\ \text{and crossproducts}}} \;+\; \underbrace{\hat\varepsilon'\hat\varepsilon}_{\substack{\text{residual (error) sums of}\\ \text{squares and crossproducts}}}$$
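These identities are easy to verify numerically; a small sketch with simulated data of the right shapes (names ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, m = 20, 2, 2
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])
Y = rng.normal(size=(n, m))

beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
Y_hat = Z @ beta_hat                  # predicted values
resid = Y - Y_hat                     # residuals

# Orthogonality: residuals are perpendicular to the design matrix
# columns and to the predicted values.
assert np.allclose(Z.T @ resid, 0)
assert np.allclose(Y_hat.T @ resid, 0)

# SSCP decomposition: Y'Y = Yhat'Yhat + resid'resid.
assert np.allclose(Y.T @ Y, Y_hat.T @ Y_hat + resid.T @ resid)
```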
Example: suppose we had the following six sample observations on two independent variables (palatability and texture) and two dependent variables (overall quality and purchase intent):

Palatability   Texture   Overall Quality   Purchase Intent
     65           71            63                67
     72           77            70                70
     77           73            72                70
     68           78            75                72
     81           76            89                88
     73           87            76                77
We can estimate the coefficient vectors for the two responses individually and then collect them jointly. For the first response (overall quality),

$$Z'y_1 = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 65 & 72 & 77 & 68 & 81 & 73 \\ 71 & 77 & 73 & 78 & 76 & 87 \end{bmatrix} \begin{bmatrix} 63 \\ 70 \\ 72 \\ 75 \\ 89 \\ 76 \end{bmatrix} = \begin{bmatrix} 445 \\ 32536 \\ 34345 \end{bmatrix}$$

so

$$\hat\beta_{(1)} = (Z'Z)^{-1} Z' y_1 = \begin{bmatrix} 62.560597030 & -0.378268027 & -0.453330568 \\ -0.378268027 & 0.005988412 & -0.000738830 \\ -0.453330568 & -0.000738830 & 0.006584661 \end{bmatrix} \begin{bmatrix} 445 \\ 32536 \\ 34345 \end{bmatrix} = \begin{bmatrix} -37.501205460 \\ 1.134583728 \\ 0.379499410 \end{bmatrix}$$
and for the second response (purchase intent),

$$Z'y_2 = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 65 & 72 & 77 & 68 & 81 & 73 \\ 71 & 77 & 73 & 78 & 76 & 87 \end{bmatrix} \begin{bmatrix} 67 \\ 70 \\ 70 \\ 72 \\ 88 \\ 77 \end{bmatrix} = \begin{bmatrix} 444 \\ 32430 \\ 34260 \end{bmatrix}$$

so

$$\hat\beta_{(2)} = (Z'Z)^{-1} Z' y_2 = \begin{bmatrix} 62.560597030 & -0.378268027 & -0.453330568 \\ -0.378268027 & 0.005988412 & -0.000738830 \\ -0.453330568 & -0.000738830 & 0.006584661 \end{bmatrix} \begin{bmatrix} 444 \\ 32430 \\ 34260 \end{bmatrix} = \begin{bmatrix} -21.432293350 \\ 0.940880634 \\ 0.351449792 \end{bmatrix}$$
so

$$\hat\beta = \big[\,\hat\beta_{(1)} \mid \hat\beta_{(2)}\,\big] = \begin{bmatrix} -37.501205460 & -21.432293350 \\ 1.134583728 & 0.940880634 \\ 0.379499410 & 0.351449792 \end{bmatrix}$$
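A short sketch reproducing this fit from the six observations (array names are ours):

```python
import numpy as np

palatability = np.array([65, 72, 77, 68, 81, 73])
texture      = np.array([71, 77, 73, 78, 76, 87])
quality      = np.array([63, 70, 72, 75, 89, 76])   # y1: overall quality
intent       = np.array([67, 70, 70, 72, 88, 77])   # y2: purchase intent

Z = np.column_stack([np.ones(6), palatability, texture])
Y = np.column_stack([quality, intent])

beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
print(beta_hat)
# approximately:
# [[-37.50120546 -21.43229335]
#  [  1.13458373   0.94088063]
#  [  0.37949941   0.35144979]]
```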
The least squares estimator has several useful properties:

- $E\big[\hat\beta_{(i)}\big] = \beta_{(i)}$, i.e., $E\big[\hat\beta\big] = \beta$
- $\mathrm{Cov}\big(\hat\beta_{(i)}, \hat\beta_{(k)}\big) = \sigma_{ik}\,(Z'Z)^{-1}, \qquad i, k = 1, \ldots, m$
- $E\big[\hat\varepsilon\big] = 0$ and $E\Big[\dfrac{1}{n-r-1}\,\hat\varepsilon'\hat\varepsilon\Big] = \Sigma$

if the model is of full rank, i.e., $\mathrm{rank}(Z) = r + 1 < n$. Note that $\hat\varepsilon$ and $\hat\beta$ are also uncorrelated.
This means that, for any observation $z_0$,

$$z_0'\hat\beta = z_0'\big[\,\hat\beta_{(1)} \mid \hat\beta_{(2)} \mid \cdots \mid \hat\beta_{(m)}\,\big] = \big[\,z_0'\hat\beta_{(1)} \mid z_0'\hat\beta_{(2)} \mid \cdots \mid z_0'\hat\beta_{(m)}\,\big]$$

is an unbiased estimator, i.e., $E\big[z_0'\hat\beta\big] = z_0'\beta$.
We can also determine from these properties that the estimation errors $z_0'\hat\beta_{(i)} - z_0'\beta_{(i)}$ have covariances

$$E\Big[z_0'\big(\beta_{(i)} - \hat\beta_{(i)}\big)\big(\beta_{(k)} - \hat\beta_{(k)}\big)'z_0\Big] = z_0'\,E\Big[\big(\beta_{(i)} - \hat\beta_{(i)}\big)\big(\beta_{(k)} - \hat\beta_{(k)}\big)'\Big]z_0 = \sigma_{ik}\, z_0'(Z'Z)^{-1}z_0$$
Furthermore, we can easily ascertain that $\hat Y_0 = \hat\beta' z_0$, i.e., the forecast vector $\hat Y_0$ associated with the values of the predictor variables $z_0$ is an unbiased predictor of $Y_0$. The forecast errors have covariances

$$E\Big[\big(Y_{0i} - z_0'\hat\beta_{(i)}\big)\big(Y_{0k} - z_0'\hat\beta_{(k)}\big)\Big] = \sigma_{ik}\big(1 + z_0'(Z'Z)^{-1}z_0\big)$$

where the extra 1 reflects the variance of the new trial's own error.
Thus, for the multivariate regression model with full $\mathrm{rank}(Z) = r + 1$, $n \geq r + 1 + m$, and normally distributed errors $\varepsilon$,

$$\hat\beta = (Z'Z)^{-1}Z'Y$$

is the maximum likelihood estimator of $\beta$, and $\hat\beta$ is normally distributed with

$$E\big[\hat\beta\big] = \beta \qquad\text{and}\qquad \mathrm{Cov}\big(\hat\beta_{(i)}, \hat\beta_{(k)}\big) = \sigma_{ik}\,(Z'Z)^{-1}, \qquad i, k = 1, \ldots, m$$
Also, the maximum likelihood estimator of $\beta$ is independent of the maximum likelihood estimator of the positive definite matrix $\Sigma$, given by

$$\hat\Sigma = \frac{1}{n}\,\hat\varepsilon'\hat\varepsilon = \frac{1}{n}\big(Y - Z\hat\beta\big)'\big(Y - Z\hat\beta\big)$$

and

$$n\hat\Sigma \sim W_{m,\,n-r-1}(\Sigma)$$

a Wishart distribution with $n - r - 1$ degrees of freedom.
All of which provide additional support for using the least squares estimates: when the errors are normally distributed, $\hat\beta$ and $n^{-1}\hat\varepsilon'\hat\varepsilon$ are the maximum likelihood estimators of $\beta$ and $\Sigma$, respectively.
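For the worked example, a sketch of the error SSCP matrix and the two covariance estimators (the biased MLE $\hat\Sigma = \hat\varepsilon'\hat\varepsilon/n$ and the unbiased version with divisor $n - r - 1$):

```python
import numpy as np

Z = np.column_stack([np.ones(6),
                     [65, 72, 77, 68, 81, 73],    # palatability
                     [71, 77, 73, 78, 76, 87]])   # texture
Y = np.column_stack([[63, 70, 72, 75, 89, 76],    # overall quality
                     [67, 70, 70, 72, 88, 77]])   # purchase intent
n, r = 6, 2

beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
resid = Y - Z @ beta_hat
E = resid.T @ resid          # error SSCP matrix, equal to n * Sigma_hat
print(E)                     # approx [[114.313,  99.335],
                             #         [ 99.335, 108.509]]
Sigma_mle = E / n            # maximum likelihood estimator of Sigma
Sigma_unbiased = E / (n - r - 1)
```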
These results can be used to develop likelihood ratio tests for the multivariate regression parameters. Partitioning the design matrix as

$$Z = \big[\,\underset{n \times (q+1)}{Z_1} \mid \underset{n \times (r-q)}{Z_2}\,\big]$$

we can write the general model as

$$E[Y] = Z\beta = \big[\,Z_1 \mid Z_2\,\big]\begin{bmatrix} \beta_{(1)} \\ \beta_{(2)} \end{bmatrix} = Z_1\beta_{(1)} + Z_2\beta_{(2)}$$
The extra sum of squares and crossproducts associated with $\beta_{(2)}$ is

$$n\big(\hat\Sigma_1 - \hat\Sigma\big) = \big(Y - Z_1\hat\beta_{(1)}\big)'\big(Y - Z_1\hat\beta_{(1)}\big) - \big(Y - Z\hat\beta\big)'\big(Y - Z\hat\beta\big)$$

where

$$\hat\beta_{(1)} = (Z_1'Z_1)^{-1}Z_1'Y \qquad\text{and}\qquad \hat\Sigma_1 = \frac{1}{n}\big(Y - Z_1\hat\beta_{(1)}\big)'\big(Y - Z_1\hat\beta_{(1)}\big)$$
The likelihood ratio for the test of the hypothesis $H_0\!: \beta_{(2)} = 0$ is

$$\Lambda = \frac{\displaystyle\max_{\beta_{(1)},\,\Sigma} L\big(\beta_{(1)}, \Sigma\big)}{\displaystyle\max_{\beta,\,\Sigma} L\big(\beta, \Sigma\big)} = \frac{L\big(\hat\beta_{(1)}, \hat\Sigma_1\big)}{L\big(\hat\beta, \hat\Sigma\big)} = \left(\frac{|\hat\Sigma|}{|\hat\Sigma_1|}\right)^{n/2}$$
which is often converted to Wilks' Lambda statistic

$$\Lambda^{2/n} = \frac{|\hat\Sigma|}{|\hat\Sigma_1|}$$
Finally, for the multivariate regression model with full $\mathrm{rank}(Z) = r + 1$, $n \geq r + 1 + m$, normally distributed errors $\varepsilon$, and the null hypothesis true (so that $n(\hat\Sigma_1 - \hat\Sigma) \sim W_{m,\,r-q}(\Sigma)$),

$$-\left[n - r - 1 - \frac{1}{2}(m - r + q + 1)\right]\ln\!\left(\frac{|\hat\Sigma|}{|\hat\Sigma_1|}\right) \;\sim\; \chi^2_{m(r-q)}$$

approximately, when $n - r$ and $n - m$ are both large.
If we again refer to the Error Sum of Squares and Crossproducts matrix as $E = n\hat\Sigma$, and to the hypothesis (extra) Sum of Squares and Crossproducts matrix as $H = n(\hat\Sigma_1 - \hat\Sigma)$, then

$$\Lambda^{2/n} = \frac{|\hat\Sigma|}{|\hat\Sigma_1|} = \frac{|E|}{|E + H|} = \prod_{i=1}^{s} \frac{1}{1 + \eta_i}$$

where $\eta_1 \geq \eta_2 \geq \cdots \geq \eta_s$ are the ordered eigenvalues of $HE^{-1}$ and $s = \min(m, r - q)$.
There are other similar tests (as we have seen in our discussion of MANOVA):

Pillai's Trace: $\displaystyle\sum_{i=1}^{s} \frac{\eta_i}{1 + \eta_i} = \mathrm{tr}\big[H(H + E)^{-1}\big]$

Hotelling-Lawley Trace: $\displaystyle\sum_{i=1}^{s} \eta_i = \mathrm{tr}\big[HE^{-1}\big]$
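Given $H$ and $E$, all of these statistics come from the eigenvalues of $HE^{-1}$; a minimal sketch (function name ours):

```python
import numpy as np

def manova_statistics(H, E):
    """Wilks, Pillai, and Hotelling-Lawley statistics from SSCP matrices."""
    eta = np.linalg.eigvals(H @ np.linalg.inv(E)).real  # eigenvalues of H E^{-1}
    wilks = np.prod(1.0 / (1.0 + eta))        # equals det(E) / det(E + H)
    pillai = np.sum(eta / (1.0 + eta))        # equals tr[H (H + E)^{-1}]
    hotelling_lawley = np.sum(eta)            # equals tr[H E^{-1}]
    return wilks, pillai, hotelling_lawley
```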
Returning to the worked example, first consider testing whether palatability contributes to the model over and above texture, i.e., $H_0$: the palatability coefficients are zero for both responses. Let $\hat\beta_2$ and $\hat\Sigma_2$ denote the estimates from the reduced model that retains only the intercept and texture. The likelihood ratio is

$$\Lambda = \frac{\displaystyle\max_{\beta_2,\,\Sigma} L\big(\beta_2, \Sigma\big)}{\displaystyle\max_{\beta,\,\Sigma} L\big(\beta, \Sigma\big)} = \frac{L\big(\hat\beta_2, \hat\Sigma_2\big)}{L\big(\hat\beta, \hat\Sigma\big)} = \left(\frac{|\hat\Sigma|}{|\hat\Sigma_2|}\right)^{n/2}$$
For ease of computation, we'll use the Wilks' lambda statistic

$$\Lambda^{2/n} = \frac{|\hat\Sigma|}{|\hat\Sigma_2|} = \frac{|E|}{|E + H|}$$
The error and hypothesis sums of squares and crossproducts matrices are

$$E = \begin{bmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{bmatrix} \qquad\text{and}\qquad H = \begin{bmatrix} 214.96186763 & 178.26225891 \\ 178.26225891 & 147.82823253 \end{bmatrix}$$

so the calculated value of the Wilks' lambda statistic is

$$\Lambda^{2/n} = \frac{|E|}{|E + H|} = \frac{\begin{vmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{vmatrix}}{\begin{vmatrix} 329.27489178 & 277.59740259 \\ 277.59740259 & 256.33766233 \end{vmatrix}} = \frac{2536.570299}{7345.238098} = 0.34533534$$
The transformation to a Chi-square distributed statistic (which is actually valid only when $n - r$ and $n - m$ are both large) is

$$-\left[n - r - 1 - \frac{1}{2}(m - r + q + 1)\right]\ln\!\left(\frac{|\hat\Sigma|}{|\hat\Sigma_2|}\right) = -\left[6 - 2 - 1 - \frac{1}{2}(2 - 2 + 1 + 1)\right]\ln(0.34533534) = 2.12647$$

which is compared against a $\chi^2$ distribution with $m(r - q) = 2$ degrees of freedom.
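A sketch reproducing this test; note the natural logarithm, and that `scipy.stats.chi2.sf` supplies the approximate p-value:

```python
import numpy as np
from scipy.stats import chi2

E = np.array([[114.31302415,  99.335143683],
              [ 99.335143683, 108.5094298 ]])
H = np.array([[214.96186763, 178.26225891],   # hypothesis SSCP for palatability
              [178.26225891, 147.82823253]])

wilks = np.linalg.det(E) / np.linalg.det(E + H)              # 0.34533534
n, r, m, q = 6, 2, 2, 1
stat = -(n - r - 1 - 0.5 * (m - r + q + 1)) * np.log(wilks)  # 2.12647
df = m * (r - q)                                             # 2
p_value = chi2.sf(stat, df)                                  # approx 0.345
print(wilks, stat, p_value)
```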
Now consider testing whether texture contributes to the model over and above palatability, i.e., $H_0$: the texture coefficients are zero for both responses. Let $\hat\beta_1$ and $\hat\Sigma_1$ denote the estimates from the reduced model that retains only the intercept and palatability. The likelihood ratio is

$$\Lambda = \frac{\displaystyle\max_{\beta_1,\,\Sigma} L\big(\beta_1, \Sigma\big)}{\displaystyle\max_{\beta,\,\Sigma} L\big(\beta, \Sigma\big)} = \frac{L\big(\hat\beta_1, \hat\Sigma_1\big)}{L\big(\hat\beta, \hat\Sigma\big)} = \left(\frac{|\hat\Sigma|}{|\hat\Sigma_1|}\right)^{n/2}$$
Again we use the Wilks' lambda statistic

$$\Lambda^{2/n} = \frac{|\hat\Sigma|}{|\hat\Sigma_1|} = \frac{|E|}{|E + H|}$$
The error and hypothesis sums of squares and crossproducts matrices are

$$E = \begin{bmatrix} 114.31302415 & 99.335143683 \\ 99.335143683 & 108.5094298 \end{bmatrix} \qquad\text{and}\qquad H = \begin{bmatrix} 21.872015222 & 20.255407498 \\ 20.255407498 & 18.758286731 \end{bmatrix}$$

so the calculated value of the Wilks' lambda statistic is

$$\Lambda^{2/n} = \frac{|E|}{|E + H|} = \frac{2536.570299}{3030.059055} = 0.837135598$$
The transformation to a Chi-square distributed statistic (again valid only when $n - r$ and $n - m$ are both large) is

$$-\left[n - r - 1 - \frac{1}{2}(m - r + q + 1)\right]\ln\!\left(\frac{|\hat\Sigma|}{|\hat\Sigma_1|}\right) = -\left[6 - 2 - 1 - \frac{1}{2}(2 - 2 + 1 + 1)\right]\ln(0.837135598) = 0.35554$$

again compared against a $\chi^2$ distribution with $m(r - q) = 2$ degrees of freedom. Neither test rejects its null hypothesis at conventional levels.
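The same sketch with the hypothesis matrix for texture:

```python
import numpy as np
from scipy.stats import chi2

E = np.array([[114.31302415,  99.335143683],
              [ 99.335143683, 108.5094298 ]])
H = np.array([[21.872015222, 20.255407498],   # hypothesis SSCP for texture
              [20.255407498, 18.758286731]])

wilks = np.linalg.det(E) / np.linalg.det(E + H)              # 0.837135598
n, r, m, q = 6, 2, 2, 1
stat = -(n - r - 1 - 0.5 * (m - r + q + 1)) * np.log(wilks)  # 0.35554
p_value = chi2.sf(stat, m * (r - q))                         # approx 0.837
```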
SAS output for a Multivariate Linear Regression Analysis:

Dependent Variable: y1 Overall Quality

                                      Sum of
Source                   DF          Squares      Mean Square     F Value    Pr > F
Model                     2      256.5203092      128.2601546        3.37    0.1711
Error                     3      114.3130241       38.1043414
Corrected Total           5      370.8333333

                                   Standard
Parameter          Estimate           Error     t Value     Pr > |t|
Intercept      -37.50120546     48.82448511       -0.77       0.4984
z1               1.13458373      0.47768661        2.38       0.0980
z2               0.37949941      0.50090335        0.76       0.5037
SAS output for a Multivariate Linear Regression Analysis:
Dependent Variable: y2 Purchase Intent

                                      Sum of
Source                   DF          Squares      Mean Square     F Value    Pr > F
Model                     2      181.4905702       90.7452851        2.51    0.2289
Error                     3      108.5094298       36.1698099
Corrected Total           5      290.0000000

                                   Standard
Parameter          Estimate           Error     t Value     Pr > |t|
Intercept      -21.43229335     47.56894895       -0.45       0.6829
z1               0.94088063      0.46540276        2.02       0.1364
z2               0.35144979      0.48802247        0.72       0.5235
SAS output for a Multivariate Linear Regression Analysis:

The GLM Procedure
Multivariate Analysis of Variance

Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|

DF = 3              y1              y2
y1            1.000000        0.891911
                                0.1081
y2            0.891911        1.000000
                0.1081
If the model $Y = Z\beta + \varepsilon$ has normal errors, then

$$\hat\beta' z_0 \sim N_m\big(\beta' z_0,\; z_0'(Z'Z)^{-1}z_0\,\Sigma\big)$$

and, independently,

$$n\hat\Sigma \sim W_{m,\,n-r-1}(\Sigma)$$

so

$$T^2 = \left(\frac{\hat\beta' z_0 - \beta' z_0}{\sqrt{z_0'(Z'Z)^{-1}z_0}}\right)' \left(\frac{n}{n-r-1}\,\hat\Sigma\right)^{-1} \left(\frac{\hat\beta' z_0 - \beta' z_0}{\sqrt{z_0'(Z'Z)^{-1}z_0}}\right)$$

has a Hotelling $T^2$ distribution.
Thus the $100(1 - \alpha)\%$ confidence region for the predicted mean value of $Y_0$ associated with $z_0$, that is, for $\beta' z_0$, is given by

$$\big(\hat\beta' z_0 - \beta' z_0\big)' \left(\frac{n}{n-r-1}\,\hat\Sigma\right)^{-1} \big(\hat\beta' z_0 - \beta' z_0\big) \;\leq\; z_0'(Z'Z)^{-1}z_0 \left[\frac{m(n-r-1)}{n-r-m}\right] F_{m,\,n-r-m}(\alpha)$$

with simultaneous $100(1 - \alpha)\%$ confidence intervals for the individual mean responses $z_0'\beta_{(i)}$ given by

$$z_0'\hat\beta_{(i)} \;\pm\; \sqrt{\left[\frac{m(n-r-1)}{n-r-m}\right] F_{m,\,n-r-m}(\alpha)}\, \sqrt{z_0'(Z'Z)^{-1}z_0 \left(\frac{n}{n-r-1}\right)\hat\sigma_{ii}}\,, \qquad i = 1, \ldots, m$$
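A sketch computing these simultaneous intervals for the worked example; the new observation $z_0 = (1, 70, 75)'$ is our illustrative choice, not from the notes:

```python
import numpy as np
from scipy.stats import f

Z = np.column_stack([np.ones(6),
                     [65, 72, 77, 68, 81, 73],    # palatability
                     [71, 77, 73, 78, 76, 87]])   # texture
Y = np.column_stack([[63, 70, 72, 75, 89, 76],    # overall quality
                     [67, 70, 70, 72, 88, 77]])   # purchase intent
n, r, m, alpha = 6, 2, 2, 0.05

beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
resid = Y - Z @ beta_hat
Sigma_hat = resid.T @ resid / n                   # MLE of Sigma

z0 = np.array([1.0, 70.0, 75.0])                  # illustrative new trial
center = z0 @ beta_hat                            # point estimate, one per response
leverage = z0 @ np.linalg.solve(Z.T @ Z, z0)      # z0' (Z'Z)^{-1} z0

crit = (m * (n - r - 1) / (n - r - m)) * f.ppf(1 - alpha, m, n - r - m)
half_width = np.sqrt(crit) * np.sqrt(leverage * (n / (n - r - 1))
                                     * np.diag(Sigma_hat))
for i in range(m):
    print(f"response {i + 1}: {center[i]:.2f} +/- {half_width[i]:.2f}")
```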
Finally, we can build prediction intervals for a new response vector $Y_0$ associated with $z_0$; here the relevant quantity is the prediction error $Y_0 - \hat\beta' z_0$. If the model $Y = Z\beta + \varepsilon$ has normal errors, then, as before,

$$\hat\beta' z_0 \sim N_m\big(\beta' z_0,\; z_0'(Z'Z)^{-1}z_0\,\Sigma\big)$$

independently of

$$n\hat\Sigma \sim W_{m,\,n-r-1}(\Sigma)$$

and the $T^2$ statistic for prediction carries an extra unit of variance contributed by the new trial's own error:

$$T^2 = \left(\frac{Y_0 - \hat\beta' z_0}{\sqrt{1 + z_0'(Z'Z)^{-1}z_0}}\right)' \left(\frac{n}{n-r-1}\,\hat\Sigma\right)^{-1} \left(\frac{Y_0 - \hat\beta' z_0}{\sqrt{1 + z_0'(Z'Z)^{-1}z_0}}\right)$$
so the $100(1 - \alpha)\%$ prediction region associated with $z_0$ is given by

$$\big(Y_0 - \hat\beta' z_0\big)' \left(\frac{n}{n-r-1}\,\hat\Sigma\right)^{-1} \big(Y_0 - \hat\beta' z_0\big) \;\leq\; \big(1 + z_0'(Z'Z)^{-1}z_0\big) \left[\frac{m(n-r-1)}{n-r-m}\right] F_{m,\,n-r-m}(\alpha)$$

with simultaneous $100(1 - \alpha)\%$ prediction intervals for the individual responses $Y_{0i}$ given by

$$z_0'\hat\beta_{(i)} \;\pm\; \sqrt{\left[\frac{m(n-r-1)}{n-r-m}\right] F_{m,\,n-r-m}(\alpha)}\, \sqrt{\big(1 + z_0'(Z'Z)^{-1}z_0\big) \left(\frac{n}{n-r-1}\right)\hat\sigma_{ii}}\,, \qquad i = 1, \ldots, m$$
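The prediction intervals differ from the confidence intervals above only by the extra 1 in the leverage term; a sketch, again with our illustrative $z_0$:

```python
import numpy as np
from scipy.stats import f

Z = np.column_stack([np.ones(6),
                     [65, 72, 77, 68, 81, 73],    # palatability
                     [71, 77, 73, 78, 76, 87]])   # texture
Y = np.column_stack([[63, 70, 72, 75, 89, 76],    # overall quality
                     [67, 70, 70, 72, 88, 77]])   # purchase intent
n, r, m, alpha = 6, 2, 2, 0.05

beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
resid = Y - Z @ beta_hat
Sigma_hat = resid.T @ resid / n

z0 = np.array([1.0, 70.0, 75.0])                  # illustrative new trial
center = z0 @ beta_hat
crit = (m * (n - r - 1) / (n - r - m)) * f.ppf(1 - alpha, m, n - r - m)

# Note the "1 +" relative to the confidence-interval leverage.
pred_leverage = 1.0 + z0 @ np.linalg.solve(Z.T @ Z, z0)
half_width = np.sqrt(crit) * np.sqrt(pred_leverage * (n / (n - r - 1))
                                     * np.diag(Sigma_hat))
for i in range(m):
    print(f"Y0[{i + 1}]: {center[i]:.2f} +/- {half_width[i]:.2f}")
```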