Solution To Exercises On MVN
Richard Gill
March 16, 2009
Abstract
This is the solution to the two exercises on the multivariate normal
distribution in my lecture notes for Statistics II.
Question 1 (i)
Question 1 (ii)
Question 1 (iii)
Suppose now $\Sigma$ is non-singular. As in part (ii) we can construct the distribution of $X$ as $X = AV + \mu$, where $A$ is $n \times n$ and of full rank, $\Sigma = AA^\top$, and $V \sim N_n(0, I)$. By the formula for transformation of multivariate densities under smooth transformations, $X$ has a probability density on $\mathbb{R}^n$, which is equal to
\[
|\det(A)|^{-1} (2\pi)^{-n/2} \, e^{-\frac{1}{2}\|A^{-1}(x-\mu)\|^2} .
\]
A simple rewriting produces the required formula
\[
\det(2\pi\Sigma)^{-1/2} \exp\bigl(-\tfrac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\bigr) .
\]
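Spelled out, the rewriting rests on two identities that follow from $\Sigma = AA^\top$ (this intermediate step is added here for completeness; it is left implicit in the solution above):
\[
|\det(A)|^{-1} = \det(AA^\top)^{-1/2} = \det(\Sigma)^{-1/2},
\qquad
\|A^{-1}(x-\mu)\|^2 = (x-\mu)^\top (AA^\top)^{-1} (x-\mu) = (x-\mu)^\top \Sigma^{-1}(x-\mu),
\]
together with $(2\pi)^{-n/2}\det(\Sigma)^{-1/2} = \det(2\pi\Sigma)^{-1/2}$, since $\Sigma$ is $n \times n$.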
Finally, if $X \sim N(\mu, \Sigma)$, with $\mu$ and $\Sigma$ partitioned as
\[
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},
\]
then $X$ has the same distribution as the just constructed $Y$. It follows that $X_1 \sim N(\mu_1, \Sigma_{11})$, whilst conditional on $X_1 = x_1$, $X_2 \sim N\bigl(\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1),\ \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\bigr)$.
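As a concrete illustration (a standard special case, added here rather than taken from the original): for $n = 2$ with scalar components, write $\Sigma_{11} = \sigma_1^2$, $\Sigma_{22} = \sigma_2^2$, $\Sigma_{12} = \Sigma_{21} = \rho\sigma_1\sigma_2$. The general formula then reduces to the familiar
\[
X_2 \mid X_1 = x_1 \;\sim\; N\!\left(\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x_1 - \mu_1),\ \sigma_2^2(1 - \rho^2)\right).
\]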
To return to the question of whether or not $\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ is a covariance matrix, suppose $Y \sim N(0, \Sigma)$ and consider the problem of determining $A_{21}$ to minimize $\mathrm{E}\|Y_2 - A_{21}Y_1\|^2$. The choice $A_{21} = \Sigma_{21}\Sigma_{11}^{-1}$ makes $Y_2 - A_{21}Y_1$ uncorrelated with $Y_1$. So replacing $A_{21}$ with anything else, if anything, only makes this expected sum of squares larger, and the given choice solves the minimization problem. Anyway, by direct computation again, the variance of $Y_2 - A_{21}Y_1$ with the optimal choice of $A_{21}$ is nothing else than $\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, which consequently is indeed a covariance matrix (i.e., a positive semi-definite symmetric matrix).
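The "direct computation" referred to can be spelled out as follows (added here for completeness; the abbreviation $\hat A_{21} = \Sigma_{21}\Sigma_{11}^{-1}$ is introduced only for this display). Using $\Sigma_{12}^\top = \Sigma_{21}$ and the symmetry of $\Sigma_{11}$,
\[
\mathrm{Var}\bigl(Y_2 - \hat A_{21} Y_1\bigr)
= \Sigma_{22} - \hat A_{21}\Sigma_{12} - \Sigma_{21}\hat A_{21}^\top + \hat A_{21}\Sigma_{11}\hat A_{21}^\top
= \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12},
\]
since each of the last three terms equals $\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ up to sign. Being the covariance matrix of the random vector $Y_2 - \hat A_{21}Y_1$, this matrix is automatically symmetric and positive semi-definite.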
Concluding remarks
For those who are not familiar with characteristic functions, let me remark that by Fourier theory, very many functions $g$ can be expressed as the Fourier transform of another function $\hat g$, by the formula $g(x) = \int_{t=-\infty}^{\infty} \hat g(t)\, e^{itx} \, dt$. Therefore if we know $\phi_X(t) = \mathrm{E}\, e^{itX}$ for all $t$, and if $g$ has the expression just given, then $\mathrm{E}\, g(X) = \int_x \int_t \hat g(t)\, e^{itx} \, dt \, P_X(dx)$. If we can exchange the two integrations, we have $\mathrm{E}\, g(X) = \int \hat g(t)\, \phi_X(t) \, dt$. Thus knowledge of the characteristic function of the distribution of $X$ entails knowledge of the expectation of a very large class of functions of $X$. If the indicators of intervals are in this class (or if they can be approximated arbitrarily well by functions in this class), then the probabilities of $X$ lying in any interval are determined, and thus the distribution of $X$ is determined.
All this can be made precise, and the result is that the characteristic function of the probability distribution of a real random variable does indeed characterise that distribution. The natural generalization to random vectors is also true.
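For reference, the characteristic function of the multivariate normal distribution itself is the following standard formula (stated here for convenience; it is presumably what the earlier parts of the exercise compute): if $X \sim N_n(\mu, \Sigma)$, then for every $t \in \mathbb{R}^n$
\[
\phi_X(t) = \mathrm{E}\, e^{i t^\top X} = \exp\bigl(i\, t^\top \mu - \tfrac{1}{2}\, t^\top \Sigma\, t\bigr).
\]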
Alternatively one can prove all the statements in the exercises without any use of characteristic functions. Probably one should then start by computing the probability density of the vector $X = AV + b$ in the case when $A$ is non-singular. In particular, if $A$ is an orthonormal matrix, $A^\top A = AA^\top = I$, one obtains that $X = AV$ has the same distribution as $V$.
In general, if some rows of $A$ are linear combinations of others, one can delete some elements of $X$, since these are affine functions of the others. This brings us to the case $X = AV + b$ where $A$ is an $n \times p$ matrix of rank $n$. So $n \le p$. If $n = p$ we are done. If on the other hand $n < p$ we should apply an orthonormal transformation $B$ to $V$, writing $X = AB^\top BV + b$, such that the last $p - n$ columns of $AB^\top$ are identically equal to zero, so that we can delete superfluous elements from $BV$ and superfluous columns from $AB^\top$. This is the case exactly when the $n$ rows of $A$ are orthogonal to the last $p - n$ rows of $B$. Thus the orthonormal matrix $B$ has to be such that its last $p - n$ rows span the null space of $A$. And that is easy to arrange.
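To make the last reduction explicit (a small elaboration beyond the original text; the notation $W_{1:n}$ for the first $n$ coordinates of $W$ is introduced here for convenience): put $W = BV$, which is again $N_p(0, I)$ by the orthonormal-transformation remark above, and write $AB^\top = (C \;\; 0)$ with $C$ an $n \times n$ matrix of full rank. Then
\[
X = AB^\top W + b = C\, W_{1:n} + b, \qquad W_{1:n} \sim N_n(0, I),
\]
which is precisely the non-singular case just mentioned.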