
Wishart Distribution

If $x_1, x_2, \dots, x_n$ is a random sample from $N(\mu, \sigma^2)$, and if $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, then it is well known that $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{(n-1)}$. The multivariate analogue of $(n-1)s^2$ is the matrix $A = \sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})'$, which is called the Wishart matrix. The generalization of the $\chi^2$ distribution to the multivariate case is called the Wishart distribution.

Definition: In random sampling from a $p$-variate normal population $N_p(\mu, \Sigma)$, the $p \times p$ symmetric matrix of sums of squares and cross products of deviations about the mean of the sample observations, given by

$$A = \sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})' = \sum_{\alpha=1}^{N} x_\alpha x_\alpha' - N\bar{x}\bar{x}' = XX' - N\bar{x}\bar{x}'$$

is called the Wishart matrix, and the distribution of the matrix $A$ is called the Wishart distribution.

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{bmatrix}_{(p \times p)}$$

By the definition of the Wishart matrix, it is clear that the Wishart distribution is the joint distribution of the $\frac{p(p+1)}{2}$ distinct elements of the matrix $A$, as $A$ is symmetric.
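As a concrete illustration of the definition, the sketch below computes $A$ from a simulated sample in both forms, $\sum_\alpha (x_\alpha - \bar{x})(x_\alpha - \bar{x})'$ and $XX' - N\bar{x}\bar{x}'$, and checks that they agree (a minimal sketch; the dimension, sample size, and variable names are illustrative):

```python
import numpy as np

# Columns of X are the N observations of a p-variate sample.
rng = np.random.default_rng(0)
p, N = 3, 20
X = rng.multivariate_normal(mean=np.zeros(p), cov=np.eye(p), size=N).T  # (p, N)
xbar = X.mean(axis=1, keepdims=True)            # sample mean vector, shape (p, 1)

A_dev = (X - xbar) @ (X - xbar).T               # sum of (x - xbar)(x - xbar)'
A_alt = X @ X.T - N * (xbar @ xbar.T)           # X X' - N xbar xbar'
print(np.allclose(A_dev, A_alt))                # True: the two forms agree
```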

Probability Density Function of the Wishart Distribution

In Lemma 2, we also demonstrated that $A = \sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'$, and we proved in the Theorem that the distribution of $A = N\hat{\Sigma}$ is the same as the distribution of $\sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'$, where the $y_\alpha$; $\alpha = 1, 2, \dots, (N-1)$ are independently distributed, each following $N_p(0, \Sigma)$.

The probability density function of the Wishart matrix $A$ is given by

$$f(A) = \frac{|A|^{\frac{1}{2}(n-p-1)}\, e^{-\frac{1}{2}\operatorname{tr}(\Sigma^{-1}A)}}{2^{np/2}\, \pi^{p(p-1)/4}\, |\Sigma|^{n/2} \prod_{i=1}^{p} \Gamma\!\left(\frac{n+1-i}{2}\right)}\,; \qquad n = N-1,$$

and we write $A \sim W(\Sigma, N-1)$.
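As a sanity check on this density, the sketch below codes the formula directly on the log scale (using `scipy.special.multigammaln` for the $\pi^{p(p-1)/4}\prod_i \Gamma(\cdot)$ factor) and compares it with `scipy.stats.wishart.logpdf` at a random Wishart draw; the particular $\Sigma$, $n$, and seed are illustrative:

```python
import numpy as np
from scipy.stats import wishart
from scipy.special import multigammaln

def wishart_logpdf(A, Sigma, n):
    """Log of f(A) coded directly from the formula above."""
    p = A.shape[0]
    _, logdet_A = np.linalg.slogdet(A)
    _, logdet_S = np.linalg.slogdet(Sigma)
    log_num = 0.5 * (n - p - 1) * logdet_A - 0.5 * np.trace(np.linalg.solve(Sigma, A))
    # multigammaln(n/2, p) = log[ pi^{p(p-1)/4} * prod_i Gamma((n+1-i)/2) ]
    log_den = 0.5 * n * p * np.log(2) + 0.5 * n * logdet_S + multigammaln(0.5 * n, p)
    return log_num - log_den

p, n = 3, 10                                   # n = N - 1 degrees of freedom
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = wishart.rvs(df=n, scale=Sigma, random_state=0)   # one draw of A ~ W(Sigma, n)
print(wishart_logpdf(A, Sigma, n))             # formula above
print(wishart.logpdf(A, df=n, scale=Sigma))    # scipy's value -- should agree
```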

Corollary: Given $p$-variate random vectors $x_1, x_2, \dots, x_N$ from a $p$-variate normal distribution $N_p(\mu, \Sigma)$, the distribution of the sample variance-covariance matrix $S$ is $W\!\left(\frac{1}{N-1}\Sigma,\; N-1\right)$.
The Characteristic Function

The characteristic function of a $p \times p$ random matrix $A$ is given by

$$\Phi_A(\Theta) = E\!\left(e^{i\,\operatorname{tr}(A\Theta)}\right)$$

where $\Theta$ is a $p \times p$ real-valued positive definite matrix.
Theorem 1: Let $A = \sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'$, where the $y_\alpha$; $\alpha = 1, 2, \dots, (N-1)$ are independently distributed, each following $N_p(0, \Sigma)$, be a $p \times p$ matrix following $W(\Sigma, N-1)$. Then the characteristic function of $A$ is given by

$$\Phi_A(\Theta) = |I - 2i\,\Sigma\Theta|^{-n/2}; \qquad n = N-1$$
Proof: Consider the characteristic function of the matrix $A$:

$$\Phi_A(\Theta) = E\!\left(e^{i\,\operatorname{tr}(A\Theta)}\right) = E\!\left(e^{i\,\operatorname{tr}\left(\sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'\right)\Theta}\right) = E\!\left(e^{i \sum_{\alpha=1}^{N-1} y_\alpha' \Theta y_\alpha}\right)$$

using $\operatorname{tr}(y_\alpha y_\alpha' \Theta) = y_\alpha' \Theta y_\alpha$. Hence

$$\Phi_A(\Theta) = E\!\left(\prod_{\alpha=1}^{N-1} e^{i\, y_\alpha' \Theta y_\alpha}\right) = \prod_{\alpha=1}^{N-1} E\!\left(e^{i\, y_\alpha' \Theta y_\alpha}\right)$$

as the $y_\alpha$ are independently distributed. Since the $y_\alpha$; $\alpha = 1, 2, \dots, (N-1)$ are also identically distributed, each following $N_p(0, \Sigma)$,

$$\Phi_A(\Theta) = \left[E\!\left(e^{i\, y' \Theta y}\right)\right]^n; \qquad n = N-1$$
Now, since $\Theta$ is a $p \times p$ real, non-singular, positive definite matrix, it can be reduced to a diagonal matrix by a non-singular matrix $P$ such that

$$P'\Theta P = D = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_p)$$

As $\Sigma$ is positive definite and symmetric, the same matrix $P$ simultaneously reduces $\Sigma^{-1}$ to the identity,

$$P'\Sigma^{-1}P = I$$

(this is the simultaneous diagonalization of the pair $(\Sigma^{-1}, \Theta)$). Use this matrix $P$ to transform the vector $y$ by setting $y = Pz$, so that $E(z) = P^{-1}E(y) = 0$ and

$$V(z) = V(P^{-1}y) = P^{-1}\Sigma(P^{-1})' = (P'\Sigma^{-1}P)^{-1} = I$$

Therefore $z \sim N_p(0, I)$. Using $y = Pz$,

$$\Phi_A(\Theta) = \left[E\!\left(e^{i\, z'P'\Theta P z}\right)\right]^n = \left[E\!\left(e^{i\, z'Dz}\right)\right]^n$$
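The existence of such a $P$ is a standard simultaneous-diagonalization fact, and it can be checked numerically: `scipy.linalg.eigh` solves the generalized eigenproblem $\Theta v = \lambda\, \Sigma^{-1} v$ and normalizes the eigenvectors exactly so that $P'\Sigma^{-1}P = I$ and $P'\Theta P = D$. The sketch below is purely illustrative (random positive definite matrices stand in for $\Sigma^{-1}$ and $\Theta$):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
p = 3

# Two symmetric positive definite matrices standing in for Sigma^{-1} and Theta.
M = rng.standard_normal((p, p)); Sigma_inv = M @ M.T + p * np.eye(p)
M = rng.standard_normal((p, p)); Theta = M @ M.T + p * np.eye(p)

# Generalized eigenproblem Theta v = lambda * Sigma_inv v; eigenvectors are
# normalized so that P' Sigma_inv P = I, which is exactly the P of the proof.
lam, P = eigh(Theta, Sigma_inv)

print(np.allclose(P.T @ Sigma_inv @ P, np.eye(p)))   # P' Sigma^{-1} P = I
print(np.allclose(P.T @ Theta @ P, np.diag(lam)))    # P' Theta P = D
```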
Consider

$$z'Dz = \begin{bmatrix} z_1 & z_2 & \cdots & z_p \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_p \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_p \end{bmatrix} = \sum_{j=1}^{p} \lambda_j z_j^2$$
so that

$$\Phi_A(\Theta) = \left[E\!\left(e^{i \sum_{j=1}^{p} \lambda_j z_j^2}\right)\right]^n = \left[E\!\left(\prod_{j=1}^{p} e^{i \lambda_j z_j^2}\right)\right]^n = \left[\prod_{j=1}^{p} E\!\left(e^{i \lambda_j z_j^2}\right)\right]^n = \left(\prod_{j=1}^{p} \varphi_{z_j^2}(\lambda_j)\right)^n$$

where the expectation factorizes because the components $z_j$ of $z \sim N_p(0, I)$ are independent.

As each $z_j \sim N(0,1)$, $j = 1, 2, \dots, p$, its square follows a chi-square distribution with one degree of freedom. The characteristic function of $z_j^2$ is therefore $\varphi_{z_j^2}(\lambda_j) = (1 - 2i\lambda_j)^{-1/2}$.

Thus,

$$\Phi_A(\Theta) = \left(\prod_{j=1}^{p} (1 - 2i\lambda_j)^{-1/2}\right)^n = \left(|I - 2iD|^{-\frac{1}{2}}\right)^n$$

Using $P'\Theta P = D$ and $P'\Sigma^{-1}P = I$ in the above (and noting that $|P'\Sigma^{-1}P| = 1$ implies $|PP'| = |\Sigma|$), we can write

$$\Phi_A(\Theta) = |P'\Sigma^{-1}P - 2i\,P'\Theta P|^{-\frac{n}{2}} = \left(|\Sigma^{-1} - 2i\Theta|\,|PP'|\right)^{-\frac{n}{2}} = \left(|\Sigma^{-1} - 2i\Theta|\,|\Sigma|\right)^{-\frac{n}{2}} = |I - 2i\,\Sigma\Theta|^{-\frac{n}{2}}; \qquad n = N-1$$

Sum of Wishart Matrices


Theorem 2: Let $A_1, A_2, \dots, A_k$ be $p \times p$ matrices following independent Wishart distributions, i.e. $A_j \sim W(\Sigma, n_j)$, where each $A_j = \sum_{\alpha=1}^{n_j} y_\alpha y_\alpha'$; $j = 1, 2, \dots, k$ and $y_\alpha \sim N_p(0, \Sigma)$. Then $\sum_{j=1}^{k} A_j \sim W\!\left(\Sigma, \sum_{j=1}^{k} n_j\right)$.

Proof: As $A_j \sim W(\Sigma, n_j)$, its characteristic function is given by

$$\Phi_{A_j}(\Theta) = |I - 2i\,\Sigma\Theta|^{-\frac{n_j}{2}}; \qquad j = 1, 2, \dots, k$$

Therefore,

$$\Phi_{\sum_{j=1}^{k} A_j}(\Theta) = E\!\left(e^{i\,\operatorname{tr}\,\Theta\left(\sum_{j=1}^{k} A_j\right)}\right) = E\!\left(\prod_{j=1}^{k} e^{i\,\operatorname{tr}(A_j\Theta)}\right) = \prod_{j=1}^{k} E\!\left(e^{i\,\operatorname{tr}(A_j\Theta)}\right)$$

by the independence of the $A_j$, so that

$$\Phi_{\sum_{j=1}^{k} A_j}(\Theta) = \prod_{j=1}^{k} |I - 2i\,\Sigma\Theta|^{-\frac{n_j}{2}} = |I - 2i\,\Sigma\Theta|^{-\frac{1}{2}\sum_{j=1}^{k} n_j}; \qquad n_j = N_j - 1$$

which is the characteristic function of $W\!\left(\Sigma, \sum_{j=1}^{k} n_j\right)$.
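A quick Monte Carlo illustration of Theorem 2, comparing first moments only (recall $E[A] = n\Sigma$ for $A \sim W(\Sigma, n)$, so the sum should average $(n_1 + n_2)\Sigma$); the dimensions, degrees of freedom, and replication count below are illustrative:

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(2)
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])
n1, n2, reps = 5, 7, 200_000

# Sums of independent W(Sigma, n1) and W(Sigma, n2) draws.
A_sum = wishart.rvs(df=n1, scale=Sigma, size=reps, random_state=rng) \
      + wishart.rvs(df=n2, scale=Sigma, size=reps, random_state=rng)

print(A_sum.mean(axis=0))      # sample mean of the sums
print((n1 + n2) * Sigma)       # E[W(Sigma, n1 + n2)] -- should be close
```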

Transformation of Wishart Matrix

Theorem 3: Let $A = \sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'$, where the $y_\alpha$; $\alpha = 1, 2, \dots, (N-1)$ are independently distributed, each following $N_p(0, \Sigma)$. If $Y \to Z$ via a $p \times p$ non-singular matrix $H$, so that $z_\alpha = Hy_\alpha$; $\alpha = 1, 2, \dots, n$; $n = N-1$, then $HAH' \sim W(H\Sigma H', n)$.

Proof: $E(z_\alpha) = H\,E(y_\alpha) = 0$ and $V(z_\alpha) = V(Hy_\alpha) = H\Sigma H'$, so

$$z_\alpha \sim N_p(0, H\Sigma H')$$

We also notice that

$$HAH' = H\!\left(\sum_{\alpha=1}^{n} y_\alpha y_\alpha'\right)\!H' = \sum_{\alpha=1}^{n} (Hy_\alpha)(Hy_\alpha)' = \sum_{\alpha=1}^{n} z_\alpha z_\alpha'$$

Hence $HAH' = \sum_{\alpha=1}^{n} z_\alpha z_\alpha' \sim W(H\Sigma H', n)$.
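A first-moment check of Theorem 3 along the same lines as above: if $A \sim W(\Sigma, n)$ then $E[HAH'] = n\,H\Sigma H'$, matching the mean of $W(H\Sigma H', n)$. The particular $H$, $\Sigma$, and $n$ are illustrative:

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(3)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
H = np.array([[2.0, 1.0],
              [0.0, 1.0]])      # any non-singular H
n, reps = 6, 100_000

A = wishart.rvs(df=n, scale=Sigma, size=reps, random_state=rng)  # (reps, 2, 2)
HAH = H @ A @ H.T               # matmul broadcasts over the first axis

print(HAH.mean(axis=0))         # sample mean of H A H'
print(n * H @ Sigma @ H.T)      # n * H Sigma H' -- should be close
```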

Marginal Distribution

Theorem 4: Let $A \sim W(\Sigma, n)$, and let both $A$ and $\Sigma$ be partitioned into $q$ and $p - q$ rows and columns as follows:

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

where $A_{11}$ and $\Sigma_{11}$ are $q \times q$. Then the marginal distribution of $A_{11}$ follows a Wishart distribution with variance-covariance matrix $\Sigma_{11}$ and $n$ degrees of freedom.

Proof: Choose the $q \times p$ matrix $H = \begin{bmatrix} I & 0 \end{bmatrix}$, where $I$ is the $q \times q$ identity and $0$ is $q \times (p-q)$; the argument of Theorem 3 applies equally to this rectangular $H$, since $z_\alpha = Hy_\alpha \sim N_q(0, H\Sigma H')$. Using it, we notice that

$$HAH' = \begin{bmatrix} I & 0 \end{bmatrix} \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} I \\ 0 \end{bmatrix} = A_{11}$$

Similarly,

$$H\Sigma H' = \begin{bmatrix} I & 0 \end{bmatrix} \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \begin{bmatrix} I \\ 0 \end{bmatrix} = \Sigma_{11}$$

Therefore $A_{11} \sim W(\Sigma_{11}, n)$; $n = N-1$.
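For a distributional (rather than first-moment) illustration of Theorem 4, take $q = 1$: the $(1,1)$ element of $A \sim W(\Sigma, n)$ should follow $W(\sigma_{11}, n)$, which in the scalar case is $\sigma_{11}\chi^2_n$. The sketch below tests this with a Kolmogorov-Smirnov check (all numbers illustrative):

```python
import numpy as np
from scipy.stats import wishart, chi2, kstest

rng = np.random.default_rng(5)
Sigma = np.array([[2.0, 0.7, 0.1],
                  [0.7, 1.0, 0.4],
                  [0.1, 0.4, 1.3]])
n, reps = 8, 50_000

A = wishart.rvs(df=n, scale=Sigma, size=reps, random_state=rng)
a11 = A[:, 0, 0] / Sigma[0, 0]          # should be chi^2 with n df
print(kstest(a11, chi2(df=n).cdf))      # large p-value: consistent with chi^2_n
```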

Corollary 1: All the diagonal submatrices of the matrix $A$ have Wishart distributions.

Corollary 2: $\Sigma^{-\frac{1}{2}} A\, \Sigma^{-\frac{1}{2}} \sim W(I, n)$

Theorem 5: Let $A \sim W(\Sigma, n)$ and let both $A$ and $\Sigma$ be partitioned into $q$ and $p - q$ rows and columns as before. Then the distribution of $A_{11.2} = A_{11} - A_{12}A_{22}^{-1}A_{21}$ follows a Wishart distribution with variance-covariance matrix $\Sigma_{11.2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $n - (p - q)$ degrees of freedom.

Generalized Variance

The multivariate analogue of the variance $\sigma^2$ of a univariate distribution, apart from the covariance matrix $\Sigma$ itself, is the determinant of the variance-covariance matrix, $|\Sigma|$, termed the generalized variance of the multivariate distribution. Similarly, the generalized variance of the sample vectors $x_1, x_2, \dots, x_N$ is defined as

$$|S| = \left|\frac{1}{N-1}\sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})'\right| = \frac{|A|}{(N-1)^p}$$
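A small numerical sketch of this identity, reusing the earlier computation of $A$ (sample size and dimension illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
p, N = 3, 15
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=N).T   # (p, N)
xbar = X.mean(axis=1, keepdims=True)

A = (X - xbar) @ (X - xbar).T
S = A / (N - 1)                 # sample variance-covariance matrix

# |S| = |A| / (N-1)^p, since det(cM) = c^p det(M) for a p x p matrix M.
print(np.isclose(np.linalg.det(S), np.linalg.det(A) / (N - 1)**p))
```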

The generalized variance is also a measure of spread. A geometric interpretation of the sample generalized variance comes from considering the $p$ rows of the matrix $X$ as $p$ vectors in $N$-dimensional space. Geometrically, the value of $|A|$ may also be interpreted in terms of $N$ points in $p$-dimensional space. Let the columns of the matrix $X$ be written as points in $p$-dimensional space as follows:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pN} \end{bmatrix}_{(p \times N)} = \begin{bmatrix} \boldsymbol{x}_1 & \boldsymbol{x}_2 & \cdots & \boldsymbol{x}_N \end{bmatrix}$$

Clearly, when $p = 1$, the value of $|A| = \sum_{\alpha=1}^{N}(x_\alpha - \bar{x})^2$, which is the sum of the squared distances from the points to their mean. In general, the value of $|A|$ is the sum of squares of the volumes of all parallelotopes formed by taking as principal edges $p$ vectors from the set of deviation vectors $y_1, \dots, y_N$, where $y_\alpha = x_\alpha - \bar{x}$.

Distribution of Generalized Variance

In order to derive the distribution of the generalized variance, the following lemma is used:

Lemma: Let $y_\alpha$; $\alpha = 1, 2, \dots, (N-1)$ be independently distributed, each following $N_p(0, I)$ (the general $N_p(0, \Sigma)$ case reduces to this one by Corollary 2 above), and let

$$A = \sum_{\alpha=1}^{N-1} y_\alpha y_\alpha' = TT'$$

where $T = (t_{ij})$ is a lower triangular matrix with $t_{ii} > 0$, $i = 1, \dots, p$, and $t_{ij} = 0$ for $i < j$. Then the elements of $T$ are independently distributed, with $t_{ij} \sim N(0, 1)$ for $i > j$, and $t_{ii}^2$ following a $\chi^2$ distribution with $N - i$ degrees of freedom.
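This lemma (the Bartlett decomposition) also gives a cheap way to simulate $W(I, n)$ draws directly, without generating the underlying normal vectors. A minimal sketch, with $n = N - 1$ and the diagonal entries drawn as square roots of $\chi^2_{n-i+1}$ variates (note $N - i = n - i + 1$):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 4, 12                                        # n = N - 1

T = np.tril(rng.standard_normal((p, p)), k=-1)      # t_ij ~ N(0,1) for i > j
df = n - np.arange(p)                               # n, n-1, ..., n-p+1
T[np.diag_indices(p)] = np.sqrt(rng.chisquare(df))  # t_ii = sqrt(chi^2_{n-i+1})

A = T @ T.T                                         # one draw of A ~ W(I, n)
print(np.allclose(A, A.T))                          # A is symmetric by construction
```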
