Wishart Distribution
If $x_1, x_2, \ldots, x_n$ is a random sample from $N(\mu, \sigma^2)$ and $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, then it is well known that $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{(n-1)}$. The multivariate analogue of $(n-1)s^2$ is the matrix $A = \sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})'$, called the Wishart matrix, and the corresponding generalization of the $\chi^2$ distribution is the Wishart distribution.
Definition: In random sampling from a $p$-variate normal population $N_p(\mu, \Sigma)$, the $p \times p$ symmetric matrix of sums of squares and cross products of deviations about the mean of the sample observations,
$$A = \sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})' = \sum_{\alpha=1}^{N} x_\alpha x_\alpha' - N\bar{x}\bar{x}' = XX' - N\bar{x}\bar{x}',$$
where $X$ denotes the $p \times N$ matrix whose columns are $x_1, x_2, \ldots, x_N$, is called the Wishart matrix, and the distribution of the matrix $A$ is called the Wishart distribution. Since $A$ is symmetric, the Wishart distribution is the joint distribution of its $\frac{p(p+1)}{2}$ distinct elements.
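As an illustration (not part of the original notes), the following Python sketch forms the Wishart matrix $A$ from a simulated sample of $N$ observations drawn from $N_p(\mu, \Sigma)$; the particular $\Sigma$, sample size and variable names are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)
p, N = 3, 50
mu = np.zeros(p)
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

X = rng.multivariate_normal(mu, Sigma, size=N)     # N x p data matrix (rows = observations)
xbar = X.mean(axis=0)

# A = sum_alpha (x_alpha - xbar)(x_alpha - xbar)'.  With observations stored as rows,
# the equivalent matrix form is X'X - N * xbar xbar'.
D = X - xbar
A = D.T @ D
A_alt = X.T @ X - N * np.outer(xbar, xbar)
assert np.allclose(A, A_alt)

print(A)            # p x p symmetric Wishart matrix
print(A / (N - 1))  # the usual unbiased estimate of Sigma
```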
The probability density function of the Wishart matrix $A$ is
$$f(A) = \frac{|A|^{\frac{1}{2}(n-p-1)}\, e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}A)}}{2^{np/2}\, \pi^{p(p-1)/4}\, |\Sigma|^{n/2}\, \prod_{i=1}^{p}\Gamma\!\left(\frac{n+1-i}{2}\right)}; \qquad n = N - 1,$$
and we write $A \sim W(\Sigma, N-1)$.
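To make the density concrete, here is a minimal Python sketch (my own construction; the helper name wishart_logpdf is illustrative) that evaluates $\log f(A)$ directly from the formula above and cross-checks it against scipy.stats.wishart, whose parameterization matches the one used here.

```python
import numpy as np
from scipy.stats import wishart
from scipy.special import multigammaln

def wishart_logpdf(A, Sigma, n):
    """log f(A) for A ~ W(Sigma, n), written straight from the density above."""
    p = A.shape[0]
    _, logdet_A = np.linalg.slogdet(A)
    _, logdet_S = np.linalg.slogdet(Sigma)
    quad = np.trace(np.linalg.solve(Sigma, A))      # tr(Sigma^{-1} A)
    # multigammaln(n/2, p) = (p(p-1)/4) log pi + sum_i log Gamma((n+1-i)/2)
    return (0.5 * (n - p - 1) * logdet_A
            - 0.5 * quad
            - 0.5 * n * p * np.log(2)
            - 0.5 * n * logdet_S
            - multigammaln(0.5 * n, p))

p, N = 3, 21
n = N - 1
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
A = wishart(df=n, scale=Sigma).rvs(random_state=1)  # one W(Sigma, n) draw

print(wishart_logpdf(A, Sigma, n))
print(wishart(df=n, scale=Sigma).logpdf(A))         # should agree to rounding error
```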
Theorem 1: Let $A = \sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'$, where $y_\alpha;\ \alpha = 1, 2, \ldots, (N-1)$ are independently distributed, each following $N_p(0, \Sigma)$, so that $A$ is a $p \times p$ matrix following $W(\Sigma, N-1)$. Then the characteristic function of $A$ is given by
$$\Phi_A(\Theta) = |I - 2i\,\Sigma\Theta|^{-\frac{n}{2}}; \qquad n = N - 1.$$
Proof: Consider the characteristic function of the matrix $A$:
$$\begin{aligned}
\Phi_A(\Theta) &= E\left(e^{i\,\mathrm{tr}(A\Theta)}\right)\\
&= E\left(e^{i\,\mathrm{tr}\left(\left(\sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'\right)\Theta\right)}\right)\\
&= E\left(e^{i \sum_{\alpha=1}^{N-1} y_\alpha' \Theta y_\alpha}\right)\\
&= E\left(\prod_{\alpha=1}^{N-1} e^{i\, y_\alpha' \Theta y_\alpha}\right)\\
&= \prod_{\alpha=1}^{N-1} E\left(e^{i\, y_\alpha' \Theta y_\alpha}\right) \quad \text{as the } y_\alpha \text{ are independently distributed.}
\end{aligned}$$
Since the $y_\alpha;\ \alpha = 1, 2, \ldots, (N-1)$ are also identically distributed, each following $N_p(0, \Sigma)$,
$$\Phi_A(\Theta) = \left[E\left(e^{i\, y' \Theta y}\right)\right]^{n}; \qquad n = N - 1, \quad y \sim N_p(0, \Sigma).$$
Now, since $\Theta$ is a $p \times p$ real positive definite (hence non-singular) matrix, it can be reduced to a diagonal matrix by a non-singular matrix $P$ such that
$$P'\Theta P = D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p),$$
and, as $\Sigma$ is symmetric positive definite, the same matrix $P$ can be chosen to reduce $\Sigma^{-1}$ to the identity,
$$P'\Sigma^{-1}P = I, \quad \text{so that } \Sigma = PP'.$$
Use this matrix $P$ to transform the vector $y$ by $y = Pz$, so that $E(z) = P^{-1}E(y) = 0$ and $\mathrm{Var}(z) = P^{-1}\Sigma(P^{-1})' = I$, that is, $z \sim N_p(0, I)$, while
$$y'\Theta y = z'P'\Theta P z = z'Dz = \sum_{j=1}^{p} \lambda_j z_j^2.$$
As each $z_j \sim N(0,1)$, $j = 1, 2, \ldots, p$, its square $z_j^2$ follows a $\chi^2_1$ distribution, whose characteristic function gives $E\left(e^{i\lambda_j z_j^2}\right) = (1 - 2i\lambda_j)^{-1/2}$.
Thus,
$$\Phi_A(\Theta) = \left(\prod_{j=1}^{p}(1 - 2i\lambda_j)^{-\frac{1}{2}}\right)^{n} = \left(|I - 2iD|^{-\frac{1}{2}}\right)^{n}.$$
Finally, since $P'\Sigma^{-1}P = I$ and $P'\Theta P = D$,
$$|I - 2iD| = |P'(\Sigma^{-1} - 2i\Theta)P| = |PP'|\,|\Sigma^{-1} - 2i\Theta| = |\Sigma|\,|\Sigma^{-1} - 2i\Theta| = |I - 2i\,\Sigma\Theta|,$$
and hence $\Phi_A(\Theta) = |I - 2i\,\Sigma\Theta|^{-\frac{n}{2}}$, which completes the proof.
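The closed form can be checked numerically. The sketch below (my own construction, assuming nothing beyond NumPy) estimates $E\left[e^{i\,\mathrm{tr}(A\Theta)}\right]$ by Monte Carlo from draws of $A = \sum_\alpha y_\alpha y_\alpha'$ and compares it with $|I - 2i\,\Sigma\Theta|^{-n/2}$; the chosen $\Sigma$, $\Theta$ and replication count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 2, 10, 200_000
Sigma = np.array([[1.0, 0.4],
                  [0.4, 0.8]])
Theta = np.array([[0.10, 0.03],     # small symmetric argument matrix
                  [0.03, 0.05]])

# A = sum_{alpha=1}^{n} y_alpha y_alpha' with y_alpha ~ N_p(0, Sigma), replicated reps times
Y = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, n))   # reps x n x p
A = np.einsum('rni,rnj->rij', Y, Y)                               # reps x p x p

empirical = np.mean(np.exp(1j * np.einsum('rij,ji->r', A, Theta)))   # E[exp(i tr(A Theta))]
closed_form = np.linalg.det(np.eye(p) - 2j * Sigma @ Theta) ** (-n / 2)

print(empirical)      # close to closed_form, up to Monte Carlo error
print(closed_form)
```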
Theorem 2: If $A_1, A_2, \ldots, A_k$ are independently distributed with $A_j \sim W(\Sigma, n_j)$, $j = 1, 2, \ldots, k$, then $\sum_{j=1}^{k} A_j \sim W\left(\Sigma, \sum_{j=1}^{k} n_j\right)$.
Proof: Using Theorem 1,
$$\begin{aligned}
\Phi_{\sum_{j=1}^{k} A_j}(\Theta) &= E\left(e^{i\,\mathrm{tr}\,\Theta\left(\sum_{j=1}^{k} A_j\right)}\right)\\
&= E\left(\prod_{j=1}^{k} e^{i\,\mathrm{tr}(A_j\Theta)}\right)\\
&= \prod_{j=1}^{k} E\left(e^{i\,\mathrm{tr}(A_j\Theta)}\right) \quad \text{as the } A_j \text{ are independently distributed}\\
&= \prod_{j=1}^{k} |I - 2i\,\Sigma\Theta|^{-\frac{n_j}{2}} = |I - 2i\,\Sigma\Theta|^{-\frac{1}{2}\sum_{j=1}^{k} n_j},
\end{aligned}$$
which is the characteristic function of $W\left(\Sigma, \sum_{j=1}^{k} n_j\right)$.
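A simulation sanity check of this additivity property (my own sketch, using scipy.stats.wishart for sampling): the distribution of a scalar functional of $A_1 + A_2$ is compared with that of direct $W(\Sigma, n_1 + n_2)$ draws through a two-sample Kolmogorov-Smirnov test.

```python
import numpy as np
from scipy.stats import wishart, ks_2samp

rng = np.random.default_rng(3)
p, n1, n2, reps = 3, 5, 8, 5000
Sigma = 0.3 * np.ones((p, p)) + 0.7 * np.eye(p)   # a simple positive definite scale matrix

A1 = wishart(df=n1, scale=Sigma).rvs(size=reps, random_state=rng)
A2 = wishart(df=n2, scale=Sigma).rvs(size=reps, random_state=rng)
A12 = wishart(df=n1 + n2, scale=Sigma).rvs(size=reps, random_state=rng)

# Compare the scalar functional tr(.) of A1 + A2 with that of direct W(Sigma, n1+n2) draws
tr_sum = np.trace(A1 + A2, axis1=1, axis2=2)
tr_ref = np.trace(A12, axis1=1, axis2=2)

print(ks_2samp(tr_sum, tr_ref))   # a large p-value is consistent with Theorem 2
```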
Theorem 3: Let $A \sim W(\Sigma, n)$ and let $H$ be a $q \times p$ matrix of rank $q$. Then $HAH' \sim W(H\Sigma H', n)$.
Proof: Write $A = \sum_{\alpha=1}^{n} y_\alpha y_\alpha'$ with the $y_\alpha$ independently distributed as $N_p(0, \Sigma)$, and put $z_\alpha = H y_\alpha$, so that
$$z_\alpha \sim N_q(0, H\Sigma H').$$
We also notice that
$$HAH' = H\left(\sum_{\alpha=1}^{n} y_\alpha y_\alpha'\right)H' = \sum_{\alpha=1}^{n} (Hy_\alpha)(Hy_\alpha)' = \sum_{\alpha=1}^{n} z_\alpha z_\alpha',$$
so $HAH' \sim W(H\Sigma H', n)$.
Marginal Distribution
Theorem 4: Let $A \sim W(\Sigma, n)$ and let both $A$ and $\Sigma$ be partitioned into $q$ and $p - q$ rows and columns as follows:
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$
where $A_{11}$ and $\Sigma_{11}$ are $q \times q$. Then the marginal distribution of $A_{11}$ is a Wishart distribution with variance-covariance matrix $\Sigma_{11}$ and $n$ degrees of freedom.
Proof: Choose the $q \times p$ matrix $H = \begin{pmatrix} I & 0 \end{pmatrix}$, where $I$ is the $q \times q$ identity and $0$ is the $q \times (p-q)$ zero block, and use it in Theorem 3. We notice that
$$HAH' = \begin{pmatrix} I & 0 \end{pmatrix}\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}\begin{pmatrix} I \\ 0 \end{pmatrix} = A_{11}.$$
Similarly,
$$H\Sigma H' = \begin{pmatrix} I & 0 \end{pmatrix}\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\begin{pmatrix} I \\ 0 \end{pmatrix} = \Sigma_{11}.$$
Therefore, $A_{11} \sim W(\Sigma_{11}, n)$; $n = N - 1$.
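Theorem 4 can likewise be checked by simulation. The sketch below (my own construction) extracts the leading $q \times q$ block from draws of $W(\Sigma, n)$ and compares the distribution of its determinant with that of direct $W(\Sigma_{11}, n)$ draws.

```python
import numpy as np
from scipy.stats import wishart, ks_2samp

rng = np.random.default_rng(4)
p, q, n, reps = 4, 2, 12, 5000
Sigma = 0.4 * np.ones((p, p)) + 0.6 * np.eye(p)
Sigma11 = Sigma[:q, :q]

A = wishart(df=n, scale=Sigma).rvs(size=reps, random_state=rng)     # reps x p x p
A11 = A[:, :q, :q]                                                  # leading q x q blocks
B = wishart(df=n, scale=Sigma11).rvs(size=reps, random_state=rng)   # reps x q x q

# Compare a scalar functional (the determinant) of the two collections of matrices
print(ks_2samp(np.linalg.det(A11), np.linalg.det(B)))
```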
Theorem 5: Let $A \sim W(\Sigma, n)$ and let both $A$ and $\Sigma$ be partitioned into $q$ and $p - q$ rows and columns as before. Then the marginal distribution of $A_{11.2} = A_{11} - A_{12}A_{22}^{-1}A_{21}$ is a Wishart distribution with variance-covariance matrix $\Sigma_{11.2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $n - (p - q)$ degrees of freedom.
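A rough numerical check of Theorem 5 (my own sketch, not a proof): since $E[W(\Sigma, m)] = m\Sigma$, the average of $A_{11.2}$ over many draws should be close to $(n - (p - q))\,\Sigma_{11.2}$.

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(5)
p, q, n, reps = 4, 2, 12, 20000
Sigma = 0.4 * np.ones((p, p)) + 0.6 * np.eye(p)
S11, S12, S21, S22 = Sigma[:q, :q], Sigma[:q, q:], Sigma[q:, :q], Sigma[q:, q:]
Sigma112 = S11 - S12 @ np.linalg.solve(S22, S21)             # Sigma_{11.2}

A = wishart(df=n, scale=Sigma).rvs(size=reps, random_state=rng)
A11, A12 = A[:, :q, :q], A[:, :q, q:]
A21, A22 = A[:, q:, :q], A[:, q:, q:]
A112 = A11 - A12 @ np.linalg.solve(A22, A21)                 # batched A_{11.2}

print(A112.mean(axis=0))              # ~ (n - (p - q)) * Sigma_{11.2}
print((n - (p - q)) * Sigma112)
```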
Generalized Variance
The multivariate analogue of the variance $\sigma^2$ of a univariate distribution, apart from the covariance matrix $\Sigma$ itself, is the determinant of the variance-covariance matrix, $|\Sigma|$, and it is termed the generalized variance of the multivariate distribution. Similarly, the generalized variance of a sample of vectors $x_1, x_2, \ldots, x_N$ is defined as
$$|S| = \left|\frac{1}{N-1}\sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})'\right| = \frac{|A|}{(N-1)^p}.$$
The generalized variance has a geometric interpretation in terms of the deviation vectors $y_\alpha = x_\alpha - \bar{x}$: for $p = 1$, $|A|$ is simply the sum of squares of the distances from the points to the origin, while in general the value of $|A|$ is the sum of squares of the volumes of all parallelotopes formed by taking as principal edges $p$ vectors from the set $y_1, \ldots, y_N$.
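The following short Python sketch (my own construction) computes the generalized variance $|S|$ of a simulated sample and verifies the identity $|S| = |A|/(N-1)^p$ used above.

```python
import numpy as np

rng = np.random.default_rng(6)
p, N = 3, 40
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

X = rng.multivariate_normal(np.zeros(p), Sigma, size=N)
D = X - X.mean(axis=0)
A = D.T @ D                      # Wishart matrix of the sample
S = A / (N - 1)                  # sample variance-covariance matrix

print(np.linalg.det(S))                    # generalized variance |S|
print(np.linalg.det(A) / (N - 1) ** p)     # identical, since |cB| = c^p |B|
print(np.linalg.det(Sigma))                # population generalized variance |Sigma|
```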
In order to derive the distribution of the generalized variance, the following Lemma is used:
Lemma: Let $y_\alpha;\ \alpha = 1, 2, \ldots, (N-1)$ be independently distributed, each following $N_p(0, I)$, and let $A = \sum_{\alpha=1}^{N-1} y_\alpha y_\alpha' = TT'$, where $T = (t_{ij})$ is a lower triangular matrix with $t_{ii} > 0$, $i = 1, \ldots, p$, and $t_{ij} = 0$ for $i < j$. Then the elements of the matrix $T$ are independently distributed, with $t_{ij} \sim N(0, 1)$ for $i > j$ and $t_{ii}^2$ following a $\chi^2$ distribution with $N - i$ degrees of freedom.
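The Lemma suggests a direct sampling scheme for Wishart matrices (the Bartlett decomposition). The sketch below is my own construction: the function name wishart_bartlett is illustrative, a general $\Sigma$ is handled through its Cholesky factor $C$ with $\Sigma = CC'$, and the draws are checked against the known mean $n\Sigma$.

```python
import numpy as np

def wishart_bartlett(Sigma, n, rng):
    """One draw from W(Sigma, n) built from the Lemma's triangular matrix T."""
    p = Sigma.shape[0]
    T = np.zeros((p, p))
    for i in range(p):                                # i = 0, ..., p-1 is row i+1 of the Lemma
        T[i, i] = np.sqrt(rng.chisquare(df=n - i))    # t_ii^2 ~ chi^2 with N-(i+1) = n-i d.f.
        T[i, :i] = rng.standard_normal(i)             # t_ij ~ N(0, 1) for i > j
    C = np.linalg.cholesky(Sigma)                     # Sigma = C C'
    return C @ T @ T.T @ C.T

rng = np.random.default_rng(7)
p, N = 3, 20
n = N - 1
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

draws = np.array([wishart_bartlett(Sigma, n, rng) for _ in range(20000)])
print(draws.mean(axis=0))   # should be close to E[A] = n * Sigma
print(n * Sigma)
```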