The Multivariate Normal Distribution
• While real data are never exactly multivariate normal, the normal density
is often a useful approximation to the “true” population distribution because
of a central limit effect.
• One advantage of the multivariate normal distribution stems from the fact
that it is mathematically tractable and “nice” results can be obtained.
To summarize, many real-world problems fall naturally within the framework
of normal theory. The importance of the normal distribution rests on its dual
role as both population model for certain natural phenomena and approximate
sampling distribution for many statistics.
3.2 The Multivariate Normal Density and Its Properties
• Recall that the univariate normal distribution, with mean $\mu$ and variance $\sigma^2$, has the probability density function
$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-[(x-\mu)/\sigma]^2/2}, \qquad -\infty < x < \infty $$
• The term
$$ \left(\frac{x-\mu}{\sigma}\right)^2 = (x-\mu)(\sigma^2)^{-1}(x-\mu) $$
measures the squared distance from $x$ to $\mu$ in standard deviation units; the multivariate density generalizes it to $(x-\mu)'\Sigma^{-1}(x-\mu)$.
• A $p$-dimensional normal density for the random vector $X' = [X_1, X_2, \ldots, X_p]$ has the form
$$ f(x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\, e^{-(x-\mu)'\Sigma^{-1}(x-\mu)/2}, \qquad -\infty < x_i < \infty,\ i = 1, 2, \ldots, p. $$
We denote this $p$-dimensional normal density by $N_p(\mu, \Sigma)$.
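As a sanity check on this formula, the sketch below (all numerical values are assumed, purely for illustration) evaluates the density directly and compares it with scipy's built-in implementation:

```python
# Minimal sketch: evaluate the p-dimensional normal density explicitly and
# compare with scipy.stats.multivariate_normal (assumed mu, Sigma, x).
import numpy as np
from scipy.stats import multivariate_normal

p = 2
mu = np.array([1.0, -1.0])                      # assumed mean vector
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])      # assumed covariance matrix
x = np.array([0.5, 0.0])                        # point at which to evaluate

# (2*pi)^(-p/2) |Sigma|^(-1/2) exp(-(x-mu)' Sigma^{-1} (x-mu) / 2)
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)      # (x-mu)' Sigma^{-1} (x-mu)
dens = np.exp(-quad / 2) / np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma))

print(dens)
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should agree
```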
Example 3.1 (Bivariate normal density) Let us evaluate the $p = 2$ variate normal density in terms of the individual parameters $\mu_1 = E(X_1)$, $\mu_2 = E(X_2)$, $\sigma_{11} = \operatorname{Var}(X_1)$, $\sigma_{22} = \operatorname{Var}(X_2)$, and $\rho_{12} = \sigma_{12}/(\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}) = \operatorname{Corr}(X_1, X_2)$.
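Substituting $\sigma_{12} = \rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}$ and inverting the $2\times2$ covariance matrix gives the familiar bivariate form:
$$ f(x_1, x_2) = \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}(1-\rho_{12}^2)}} \exp\left\{ -\frac{1}{2(1-\rho_{12}^2)} \left[ \left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2 + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2 - 2\rho_{12}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right) \right] \right\} $$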
$$ \Sigma e = \lambda e \quad\text{implies}\quad \Sigma^{-1} e = \frac{1}{\lambda}\, e $$
Constant probability density contour:
$$ \{\text{all } x \text{ such that } (x-\mu)'\Sigma^{-1}(x-\mu) = c^2\} = \text{surface of an ellipsoid centered at } \mu $$
These ellipsoids are centered at $\mu$ and have axes $\pm c\sqrt{\lambda_i}\, e_i$, where $\Sigma e_i = \lambda_i e_i$ for $i = 1, 2, \ldots, p$.
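A minimal sketch, with an assumed bivariate covariance matrix, computing the half-axes $\pm c\sqrt{\lambda_i}\, e_i$ of such a contour from the eigendecomposition:

```python
# Sketch: axes of the constant-density ellipse (x-mu)' Sigma^{-1} (x-mu) = c^2.
import numpy as np

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # assumed covariance matrix
c = 1.0                                      # assumed contour level
lam, E = np.linalg.eigh(Sigma)               # eigenvalues lam[i], eigenvectors E[:, i]
for i in range(len(lam)):
    half_axis = c * np.sqrt(lam[i]) * E[:, i]
    print(f"axis {i + 1}: +/- {half_axis}")
```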
Example 3.2 (Contours of the bivariate normal density) Obtain the axes of constant probability density contours for a bivariate normal distribution when $\sigma_{11} = \sigma_{22}$.
The solid ellipsoid of $x$ values satisfying
$$ (x-\mu)'\Sigma^{-1}(x-\mu) \le \chi_p^2(\alpha) $$
has probability $1-\alpha$, where $\chi_p^2(\alpha)$ is the upper $(100\alpha)$th percentile of a chi-square distribution with $p$ degrees of freedom.
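A simulation sketch (assumed $\mu$, $\Sigma$, and $\alpha$) confirming that the ellipsoid captures about $1-\alpha$ of the probability:

```python
# Sketch: estimate P[(x-mu)' Sigma^{-1} (x-mu) <= chi2_p(alpha)] by simulation.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
p, alpha = 2, 0.05
mu = np.zeros(p)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # assumed covariance matrix
cutoff = chi2.ppf(1 - alpha, df=p)           # upper (100*alpha)th percentile

X = rng.multivariate_normal(mu, Sigma, size=100_000)
diff = X - mu
d2 = np.einsum('ij,ij->i', diff, np.linalg.solve(Sigma, diff.T).T)
print(np.mean(d2 <= cutoff))                 # approximately 1 - alpha = 0.95
```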
Additional Properties of the Multivariate Normal Distribution
The following are true for a random vector X having a multivariate normal distribution:
Result 3.2 If $X$ is distributed as $N_p(\mu, \Sigma)$, then any linear combination of variables $a'X = a_1X_1 + a_2X_2 + \cdots + a_pX_p$ is distributed as $N(a'\mu, a'\Sigma a)$. Also, if $a'X$ is distributed as $N(a'\mu, a'\Sigma a)$ for every $a$, then $X$ must be $N_p(\mu, \Sigma)$.
Result 3.3 If $X$ is distributed as $N_p(\mu, \Sigma)$, the $q$ linear combinations
$$ A_{(q\times p)} X_{(p\times 1)} = \begin{bmatrix} a_{11}X_1 + \cdots + a_{1p}X_p \\ a_{21}X_1 + \cdots + a_{2p}X_p \\ \vdots \\ a_{q1}X_1 + \cdots + a_{qp}X_p \end{bmatrix} $$
are distributed as $N_q(A\mu, A\Sigma A')$. Also $X_{(p\times 1)} + d_{(p\times 1)}$, where $d$ is a vector of constants, is distributed as $N_p(\mu + d, \Sigma)$.
Example 3.4 (The distribution of two linear combinations of the components of a normal random vector) For $X$ distributed as $N_3(\mu, \Sigma)$, find the distribution of
$$ \begin{bmatrix} X_1 - X_2 \\ X_2 - X_3 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} = AX $$
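By Result 3.3 the answer is $N_2(A\mu, A\Sigma A')$. A minimal numerical sketch, with assumed values for $\mu$ and $\Sigma$ (the example itself leaves them general):

```python
# Sketch for Example 3.4: compute A mu and A Sigma A' for assumed parameters.
import numpy as np

A = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
mu = np.array([2.0, 1.0, 0.0])               # assumed mean vector
Sigma = np.array([[3.0, 1.0, 1.0],
                  [1.0, 2.0, 0.0],
                  [1.0, 0.0, 1.0]])          # assumed positive definite covariance

print(A @ mu)            # mean vector of AX
print(A @ Sigma @ A.T)   # covariance matrix of AX
```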
Result 3.4 All subsets of $X$ are normally distributed. If we respectively partition $X$, its mean vector $\mu$, and its covariance matrix $\Sigma$ as
$$ X_{(p\times1)} = \begin{bmatrix} X_1\ (q\times1) \\ X_2\ ((p-q)\times1) \end{bmatrix}, \qquad \mu_{(p\times1)} = \begin{bmatrix} \mu_1\ (q\times1) \\ \mu_2\ ((p-q)\times1) \end{bmatrix} $$
and
$$ \Sigma_{(p\times p)} = \begin{bmatrix} \Sigma_{11}\ (q\times q) & \Sigma_{12}\ (q\times(p-q)) \\ \Sigma_{21}\ ((p-q)\times q) & \Sigma_{22}\ ((p-q)\times(p-q)) \end{bmatrix} $$
then $X_1$ is distributed as $N_q(\mu_1, \Sigma_{11})$.
Result 3.5
(c) If $X_1$ and $X_2$ are independent and are distributed as $N_{q_1}(\mu_1, \Sigma_{11})$ and $N_{q_2}(\mu_2, \Sigma_{22})$, respectively, then $\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$ has the multivariate normal distribution
$$ N_{q_1+q_2}\!\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix},\ \begin{bmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} \end{bmatrix} \right) $$
Example 3.6 (The equivalence of zero covariance and independence for normal variables) Let $X_{(3\times1)}$ be $N_3(\mu, \Sigma)$ with
$$ \Sigma = \begin{bmatrix} 4 & 1 & 0 \\ 1 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix} $$
Which variables are independent? Since $\sigma_{13} = \sigma_{23} = 0$, the subvector $(X_1, X_2)'$ is independent of $X_3$; but $X_1$ and $X_2$ are not independent, since $\sigma_{12} = 1 \ne 0$.
Result 3.8 Let $X_1, X_2, \ldots, X_n$ be mutually independent with $X_j$ distributed as $N_p(\mu_j, \Sigma)$. (Note that each $X_j$ has the same covariance matrix $\Sigma$.) Then
$$ V_1 = c_1X_1 + c_2X_2 + \cdots + c_nX_n $$
is distributed as $N_p\big(\sum_{j=1}^n c_j\mu_j,\ \big(\sum_{j=1}^n c_j^2\big)\Sigma\big)$. Moreover, $V_1$ and $V_2 = b_1X_1 + b_2X_2 + \cdots + b_nX_n$ are jointly multivariate normal. Consequently, $V_1$ and $V_2$ are independent if $b'c = \sum_{j=1}^n c_jb_j = 0$.
Example 3.8 (Linear combinations of random vectors) Let $X_1, X_2, X_3$ and $X_4$ be independent and identically distributed $3\times1$ random vectors with
$$ \mu = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix} \quad\text{and}\quad \Sigma = \begin{bmatrix} 3 & -1 & 1 \\ -1 & 1 & 0 \\ 1 & 0 & 2 \end{bmatrix} $$
(a) Find the mean and variance of the linear combination $a'X_1$ of the three components of $X_1$, where $a = [a_1\ a_2\ a_3]'$.
(b) Consider the two linear combinations of random vectors
$$ \tfrac{1}{2}X_1 + \tfrac{1}{2}X_2 + \tfrac{1}{2}X_3 + \tfrac{1}{2}X_4 $$
and
$$ X_1 + X_2 + X_3 - 3X_4. $$
Find the mean vector and covariance matrix for each linear combination of vectors, and also the covariance between them.
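A numerical sketch of part (b) using Result 3.8 with $c = (\tfrac12, \tfrac12, \tfrac12, \tfrac12)$ and $b = (1, 1, 1, -3)$; the cross-covariance $(b'c)\Sigma$ is the quantity whose vanishing gives independence in Result 3.8:

```python
# Sketch for Example 3.8(b): means and covariances via Result 3.8.
import numpy as np

mu = np.array([3.0, -1.0, 1.0])
Sigma = np.array([[3.0, -1.0, 1.0],
                  [-1.0, 1.0, 0.0],
                  [1.0, 0.0, 2.0]])
c = np.array([0.5, 0.5, 0.5, 0.5])
b = np.array([1.0, 1.0, 1.0, -3.0])

print(c.sum() * mu, (c @ c) * Sigma)   # mean 2*mu and covariance 1*Sigma of V1
print(b.sum() * mu, (b @ b) * Sigma)   # mean 0 and covariance 12*Sigma of V2
print((b @ c) * Sigma)                 # cross-covariance (b'c) Sigma -> zero matrix
```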
3.3 Sampling from a Multivariate Normal Distribution and
Maximum Likelihood Estimation
The Multivariate Normal Likelihood
• Likelihood
When the numerical values of the observations become available, they may be substituted for the $x_j$ in the joint density of the sample. The resulting expression, now considered as a function of $\mu$ and $\Sigma$ for the fixed set of observations $x_1, x_2, \ldots, x_n$, is called the likelihood.
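For reference, the likelihood is the joint density of the $n$ independent observations, viewed as a function of $\mu$ and $\Sigma$:
$$ L(\mu, \Sigma) = \prod_{j=1}^{n} \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\, e^{-(x_j-\mu)'\Sigma^{-1}(x_j-\mu)/2} = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\, e^{-\sum_{j=1}^n (x_j-\mu)'\Sigma^{-1}(x_j-\mu)/2} $$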
(b) $\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i$, where the $\lambda_i$ are the eigenvalues of $A$.
Maximum Likelihood Estimate of µ and Σ
Result 3.10 Given a $p\times p$ symmetric positive definite matrix $B$ and a scalar $b > 0$, it follows that
$$ \frac{1}{|\Sigma|^b}\, e^{-\operatorname{tr}(\Sigma^{-1}B)/2} \le \frac{1}{|B|^b}\, (2b)^{pb}\, e^{-bp} $$
for all positive definite $\Sigma_{(p\times p)}$, with equality holding only for $\Sigma = \frac{1}{2b}B$.
Applying Result 3.10 to the likelihood yields the maximum likelihood estimators
$$ \hat\mu = \bar X \quad\text{and}\quad \hat\Sigma = \frac{1}{n}\sum_{j=1}^{n} (X_j - \bar X)(X_j - \bar X)' = \frac{n-1}{n}\, S $$
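A minimal sketch, using simulated data under assumed parameters, verifying the relation $\hat\Sigma = \frac{n-1}{n}S$ numerically:

```python
# Sketch: MLEs from a sample matrix X (n rows, p columns).
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)   # assumed sample

mu_hat = X.mean(axis=0)                        # mu-hat = x-bar
Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / n  # divides by n, not n - 1
S = np.cov(X, rowvar=False)                    # sample covariance (divides by n - 1)
print(np.allclose(Sigma_hat, (n - 1) / n * S)) # Sigma-hat = ((n-1)/n) S -> True
```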
For example:
2. The maximum likelihood estimator of $\sqrt{\sigma_{ii}}$ is $\sqrt{\hat\sigma_{ii}}$ (by the invariance property of maximum likelihood estimators), where
$$ \hat\sigma_{ii} = \frac{1}{n}\sum_{j=1}^{n} (X_{ij} - \bar X_i)^2 $$
$$ \bar X \quad\text{and}\quad S = \frac{1}{n-1}\sum_{j=1}^{n} (X_j - \bar X)(X_j - \bar X)' $$
are sufficient statistics.
• Since many multivariate techniques begin with sample means and covariances,
it is prudent to check on the adequacy of the multivariate normal assumption.
– For the sample mean, recall that $\bar X$ is normal with mean $\mu$ and variance
$$ \frac{1}{n}\sigma^2 = \frac{\text{population variance}}{\text{sample size}} $$
– For the sample variance, recall that $(n-1)s^2 = \sum_{j=1}^{n}(X_j - \bar X)^2$ is distributed as $\sigma^2$ times a chi-square variable having $n-1$ degrees of freedom (d.f.).
– The chi-square is the distribution of a sum of squares of independent standard normal random variables. That is, $(n-1)s^2$ is distributed as $\sigma^2(Z_1^2 + \cdots + Z_{n-1}^2) = (\sigma Z_1)^2 + \cdots + (\sigma Z_{n-1})^2$. The individual terms $\sigma Z_i$ are independently distributed as $N(0, \sigma^2)$.
• Wishart distribution
The Wishart distribution is the multivariate analog of the chi-square distribution: with $m$ degrees of freedom, it is the distribution of $\sum_{j=1}^{m} Z_j Z_j'$, where the $Z_j$ are independent $N_p(0, \Sigma)$ random vectors.
• The Sampling Distribution of X̄ and S
Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a $p$-variate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. Then
1. $\bar X$ is distributed as $N_p(\mu, \frac{1}{n}\Sigma)$.
2. $(n-1)S$ is distributed as a Wishart random matrix with $n-1$ d.f.
3. $\bar X$ and $S$ are independent.
3.5 Large-Sample Behavior of X̄ and S
Result 3.12 (Law of large numbers) Let $Y_1, Y_2, \ldots, Y_n$ be independent observations from a population with mean $E(Y_i) = \mu$. Then
$$ \bar Y = \frac{Y_1 + Y_2 + \cdots + Y_n}{n} $$
converges in probability to $\mu$ as $n$ increases without bound.
Large-Sample Behavior of X̄ and S
By the central limit theorem, $\sqrt{n}(\bar X - \mu)$ is approximately $N_p(0, \Sigma)$, and
$$ n(\bar X - \mu)'S^{-1}(\bar X - \mu) \ \text{is approximately}\ \chi_p^2 $$
for $n - p$ large.
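A simulation sketch, drawing from an assumed non-normal (exponential) population, suggesting how the statistic approaches its chi-square limit:

```python
# Sketch: simulate n (xbar - mu)' S^{-1} (xbar - mu) from a non-normal
# population and compare its quantiles with chi-square(p).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n, p, reps = 100, 3, 2000
mu = np.ones(p)                               # mean of Exponential(1) components
stats = []
for _ in range(reps):
    X = rng.exponential(1.0, size=(n, p))     # assumed non-normal population
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    d = xbar - mu
    stats.append(n * d @ np.linalg.solve(S, d))

print(np.quantile(stats, 0.95), chi2.ppf(0.95, df=p))   # should be close
```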
3.6 Assessing the Assumption of Normality
• In situations where the sample size is large and the techniques depend solely on the behavior of $\bar X$, or on distances involving $\bar X$ of the form $n(\bar X - \mu)'S^{-1}(\bar X - \mu)$, the assumption of normality for the individual observations is less crucial.
Therefore, we address these questions:
1. Do the marginal distributions of the elements of $X$ appear to be normal?
2. Do scatter plots of pairs of observations appear to be elliptical?
3. Are there any “wild” observations that should be checked for accuracy?
Evaluating the Normality of the Univariate Marginal Distributions
• Dot diagrams for smaller $n$ and histograms for $n > 25$ or so help reveal situations where one tail of a univariate distribution is much longer than the other.
• Let $\hat p_{i1}$ denote the proportion of observations of the $i$th variable lying within one standard deviation of the mean, i.e., in $(\bar x_i - \sqrt{s_{ii}},\ \bar x_i + \sqrt{s_{ii}})$, and $\hat p_{i2}$ the proportion within two standard deviations. Under normality these should be near 0.683 and 0.954, so either
$$ |\hat p_{i1} - 0.683| > 3\sqrt{\frac{(0.683)(0.317)}{n}} = \frac{1.396}{\sqrt{n}} $$
or
$$ |\hat p_{i2} - 0.954| > 3\sqrt{\frac{(0.954)(0.046)}{n}} = \frac{0.628}{\sqrt{n}} $$
would indicate departures from an assumed normal distribution for the $i$th characteristic.
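Both checks in code, for an assumed univariate sample:

```python
# Sketch: one- and two-standard-deviation proportion checks for normality.
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=60)                       # assumed sample for variable i
n = len(x)
xbar, s = x.mean(), x.std(ddof=1)

p1 = np.mean(np.abs(x - xbar) <= s)           # p-hat_i1
p2 = np.mean(np.abs(x - xbar) <= 2 * s)       # p-hat_i2
flag1 = abs(p1 - 0.683) > 1.396 / np.sqrt(n)
flag2 = abs(p2 - 0.954) > 0.628 / np.sqrt(n)
print(p1, p2, flag1, flag2)                   # True flags suggest non-normality
```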
• Plots are always useful devices in any data analysis. Special plots called Q-Q plots can be used to assess the assumption of normality.
Let $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ represent the observations $x_1, x_2, \ldots, x_n$ after they are ordered according to magnitude. For a standard normal distribution, the quantiles $q_{(j)}$ are defined by the relation
$$ P[Z \le q_{(j)}] = \int_{-\infty}^{q_{(j)}} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz = p_{(j)} = \frac{j - \frac{1}{2}}{n} $$
Here $p_{(j)}$ is the probability of getting a value less than or equal to $q_{(j)}$ in a single drawing from a standard normal population.
• The idea is to look at the pairs of quantiles $(q_{(j)}, x_{(j)})$ with the same associated cumulative probability $(j - \frac{1}{2})/n$. If the data arise from a normal population, the pairs $(q_{(j)}, x_{(j)})$ will be approximately linearly related, since $\sigma q_{(j)} + \mu$ is nearly the expected sample quantile.
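A minimal sketch of the construction, using an assumed (hypothetical) ordered sample:

```python
# Sketch: construct Q-Q plot pairs (q_(j), x_(j)) at probability levels (j - 1/2)/n.
import numpy as np
from scipy.stats import norm

x = np.array([-1.00, -0.10, 0.16, 0.41, 0.62,
              0.80, 1.26, 1.54, 1.71, 2.30])   # assumed sample, n = 10
n = len(x)
x_sorted = np.sort(x)
probs = (np.arange(1, n + 1) - 0.5) / n
q = norm.ppf(probs)                            # standard normal quantiles q_(j)
for qj, xj in zip(q, x_sorted):
    print(f"{qj:7.3f}  {xj:7.2f}")             # plot these pairs; look for a line
```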
Example 3.9 (Constructing a Q-Q plot) A sample of $n = 10$ observations gives the values in the following table:
Example 3.10 (A Q-Q plot for radiation data) The quality-control department of a manufacturer of microwave ovens is required by the federal government to monitor the amount of radiation emitted when the doors of the ovens are closed. Observations of the radiation emitted through closed doors of $n = 42$ randomly selected ovens were made. The data are listed in the following table.
The straightness of the Q-Q plot can be measured by calculating the correlation coefficient of the points in the plot. The correlation coefficient for the Q-Q plot is defined by
$$ r_Q = \frac{\sum_{j=1}^{n} (x_{(j)} - \bar x)(q_{(j)} - \bar q)}{\sqrt{\sum_{j=1}^{n} (x_{(j)} - \bar x)^2}\ \sqrt{\sum_{j=1}^{n} (q_{(j)} - \bar q)^2}} $$
and a powerful test of normality can be based on it. Formally, we reject the hypothesis of normality at level of significance $\alpha$ if $r_Q$ falls below the appropriate value in the following table.
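A small sketch computing $r_Q$ for an assumed sample (the critical values themselves must come from the table):

```python
# Sketch: the Q-Q plot correlation coefficient r_Q.
import numpy as np
from scipy.stats import norm

def r_Q(x):
    """Correlation between ordered data x_(j) and normal quantiles q_(j)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    q = norm.ppf((np.arange(1, n + 1) - 0.5) / n)
    return np.corrcoef(x, q)[0, 1]

sample = np.array([-1.00, -0.10, 0.16, 0.41, 0.62,
                   0.80, 1.26, 1.54, 1.71, 2.30])   # assumed sample
print(r_Q(sample))   # compare with the tabled critical value for n and alpha
```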
Example 3.11 (A correlation coefficient test for normality) Let us calculate the correlation coefficient $r_Q$ from the Q-Q plot of Example 3.9 and test for normality.
Linear combinations of more than one characteristic can be investigated. Many statisticians suggest plotting $\hat e_1' x_j$, in which $\hat e_1$ is the eigenvector of $S$ corresponding to its largest eigenvalue $\hat\lambda_1$. Here $x_j' = [x_{j1}, x_{j2}, \ldots, x_{jp}]$ is the $j$th observation on the $p$ variables $X_1, X_2, \ldots, X_p$. The linear combination $\hat e_p' x_j$ corresponding to the smallest eigenvalue is also frequently singled out for inspection.
Evaluating Bivariate Normality
• For a bivariate normal distribution, the set of $x$ values satisfying
$$ (x-\mu)'\Sigma^{-1}(x-\mu) \le \chi_2^2(0.5) $$
has probability 0.5.
• Thus we should expect roughly the same percentage, 50%, of sample observations to lie in the ellipse given by
$$ \{\text{all } x \text{ such that } (x - \bar x)'S^{-1}(x - \bar x) \le \chi_2^2(0.5)\} $$
where $\mu$ is replaced by $\bar x$ and $\Sigma^{-1}$ by its estimate $S^{-1}$. If not, the normality assumption is suspect.
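A simulation sketch of this check, with assumed bivariate data:

```python
# Sketch: proportion of points whose squared generalized distance falls
# inside the 50% chi-square contour (should be roughly 0.5 under normality).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 2.0]], size=50)  # assumed

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
diff = X - xbar
d2 = np.einsum('ij,ij->i', diff, np.linalg.solve(S, diff.T).T)
print(np.mean(d2 <= chi2.ppf(0.5, df=2)))   # compare with 0.5
```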
Example 3.13 (Constructing a chi-square plot) Let us construct a chi-square plot of the generalized distances given in Example 3.12. The ordered distances and the corresponding chi-square percentiles for $p = 2$ and $n = 10$ are listed in the following table:
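How such a table is formed, sketched with assumed (hypothetical) distances rather than the example's own values:

```python
# Sketch: pair ordered squared distances with chi-square percentiles
# q_{c,2}((j - 1/2)/n) -- the ingredients of a chi-square plot.
import numpy as np
from scipy.stats import chi2

d2 = np.sort([0.30, 0.62, 1.16, 1.30, 1.61,
              2.32, 2.88, 3.45, 4.34, 5.75])   # assumed distances, n = 10
n, p = len(d2), 2
q = chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)
for qj, dj in zip(q, d2):
    print(f"{qj:6.3f}  {dj:5.2f}")   # plot (qj, dj); near a line under normality
```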
Example 3.14 (Evaluating multivariate normality for a four-variable data set) The data in Table 4.3 were obtained by taking four different measures of stiffness, $x_1, x_2, x_3$, and $x_4$, of each of $n = 30$ boards. The first measurement involves sending a shock wave down the board, the second measurement is determined while vibrating the board, and the last two measurements are obtained from static tests. The squared distances $d_j^2 = (x_j - \bar x)'S^{-1}(x_j - \bar x)$ are also presented in the table.
3.7 Detecting Outliers and Cleaning Data
• For a single random variable, the problem is one-dimensional, and we look for observations that are far from the others.
• In the bivariate case, the situation is more complicated. Figure 4.10 shows a
situation with two unusual observations.
Steps for Detecting Outliers
1. Make a dot plot for each variable.
2. Make a scatter plot for each pair of variables.
3. Calculate the standardized values $z_{jk} = (x_{jk} - \bar x_k)/\sqrt{s_{kk}}$ for $j = 1, 2, \ldots, n$ and each column $k = 1, 2, \ldots, p$. Examine these standardized values for large or small values.
4. Calculate the generalized squared distances $(x_j - \bar x)'S^{-1}(x_j - \bar x)$. Examine these distances for unusually large values; in a chi-square plot, these would be the points farthest from the origin. (Steps 3 and 4 are sketched in code below.)
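Steps 3 and 4 in code, for an assumed data matrix X:

```python
# Sketch: standardized values and generalized squared distances.
import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal(np.zeros(4), np.eye(4), size=30)  # assumed data

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
Z = (X - xbar) / np.sqrt(np.diag(S))      # z_jk = (x_jk - xbar_k) / sqrt(s_kk)
diff = X - xbar
d2 = np.einsum('ij,ij->i', diff, np.linalg.solve(S, diff.T).T)

print(np.argwhere(np.abs(Z) > 3.0))       # flags unusually large standardized values
print(np.argsort(d2)[-3:])                # indices of the 3 largest distances
```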
Example 3.15 (Detecting outliers in the data on lumber) Table 4.4 contains the data in Table 4.3, along with the standardized observations. These data consist of four different measurements of stiffness $x_1, x_2, x_3$, and $x_4$ on each of $n = 30$ boards. Detect outliers in these data.
3.8 Transformations to Near Normality
If normality is not a viable assumption, what is the next step?
• Ignore the findings of a normality check and proceed as if the data were normally distributed. (Not recommended.)
• Make nonnormal data more nearly normal by transforming them. Appropriate transformations are suggested by
1. theoretical considerations, or
2. the data themselves.
• Helpful Transformations to Near Normality

Original Scale: Transformed Scale
1. Counts, $y$: $\sqrt{y}$
2. Proportions, $\hat p$: $\operatorname{logit}(\hat p) = \frac{1}{2}\log\frac{\hat p}{1-\hat p}$
3. Correlations, $r$: Fisher's $z(r) = \frac{1}{2}\log\frac{1+r}{1-r}$
Given the observations $x_1, x_2, \ldots, x_n$, the Box-Cox choice of an appropriate power $\lambda$ is the value that maximizes the expression
$$ \ell(\lambda) = -\frac{n}{2}\ln\Big[\frac{1}{n}\sum_{j=1}^{n} \big(x_j^{(\lambda)} - \overline{x^{(\lambda)}}\big)^2\Big] + (\lambda - 1)\sum_{j=1}^{n} \ln x_j $$
where
$$ \overline{x^{(\lambda)}} = \frac{1}{n}\sum_{j=1}^{n} x_j^{(\lambda)}, \qquad x_j^{(\lambda)} = \frac{x_j^{\lambda} - 1}{\lambda}\ (\lambda \ne 0), \qquad x_j^{(\lambda)} = \ln x_j\ (\lambda = 0). $$
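A minimal sketch that maximizes $\ell(\lambda)$ over a grid for assumed positive readings (illustrative values, not the example's table), cross-checked against scipy's Box-Cox MLE:

```python
# Sketch: grid-maximize the Box-Cox criterion l(lambda) for positive data x.
import numpy as np
from scipy import stats

def box_cox(x, lam):
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x**lam - 1) / lam

def ell(x, lam):
    x = np.asarray(x, dtype=float)
    n = len(x)
    y = box_cox(x, lam)
    # -(n/2) ln[(1/n) sum (y_j - ybar)^2] + (lambda - 1) sum ln x_j
    return -n / 2 * np.log(np.mean((y - y.mean())**2)) + (lam - 1) * np.log(x).sum()

x = np.array([0.15, 0.09, 0.18, 0.10, 0.05,
              0.12, 0.08, 0.05, 0.08, 0.10])   # assumed positive readings
grid = np.linspace(-1.0, 1.5, 251)
best = max(grid, key=lambda lam: ell(x, lam))
print(best)
print(stats.boxcox(x)[1])   # scipy's MLE of lambda, for comparison
```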
Example 3.16 (Determining a power transformation for univariate data) We gave readings of the microwave radiation emitted through the closed doors of $n = 42$ ovens in Example 3.10. The Q-Q plot of these data in Figure 4.6 indicates that the observations deviate from what would be expected if they were normally distributed. Since all the observations are positive, let us perform a power transformation of the data which, we hope, will produce results that are more nearly normal. We must find the value of $\lambda$ that maximizes the function $\ell(\lambda)$.
Transforming Multivariate Observations
• With multivariate observations, a power transformation must be selected for each variable; a starting point is to apply the univariate procedure to each marginal distribution. If the resulting observations are not approximately jointly normal, the values $\hat\lambda_1, \hat\lambda_2, \ldots, \hat\lambda_p$ obtained from the preceding transformations can be used as a starting point, iterating toward the set of values $\lambda' = [\lambda_1, \lambda_2, \ldots, \lambda_p]$, which collectively maximizes
$$ \ell(\lambda_1, \lambda_2, \ldots, \lambda_p) = -\frac{n}{2}\ln|S(\lambda)| + (\lambda_1 - 1)\sum_{j=1}^{n}\ln x_{j1} + (\lambda_2 - 1)\sum_{j=1}^{n}\ln x_{j2} + \cdots + (\lambda_p - 1)\sum_{j=1}^{n}\ln x_{jp} $$
where $S(\lambda)$ is the sample covariance matrix computed from the transformed observations.
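A sketch that maximizes this criterion jointly over $(\lambda_1, \ldots, \lambda_p)$ with a general-purpose optimizer, for assumed positive data; $S(\lambda)$ is computed with divisor $n$:

```python
# Sketch: jointly choose (lambda_1, ..., lambda_p) by maximizing l(lambda).
import numpy as np
from scipy.optimize import minimize

def box_cox(x, lam):
    return np.log(x) if abs(lam) < 1e-12 else (x**lam - 1) / lam

def neg_ell(lams, X):
    n, p = X.shape
    Y = np.column_stack([box_cox(X[:, k], lams[k]) for k in range(p)])
    S_lam = np.cov(Y, rowvar=False, bias=True)          # divisor n
    _, logdet = np.linalg.slogdet(S_lam)
    val = -n / 2 * logdet
    val += sum((lams[k] - 1) * np.log(X[:, k]).sum() for k in range(p))
    return -val                                         # minimize the negative

rng = np.random.default_rng(4)
X = rng.lognormal(size=(50, 3))                         # assumed positive data
res = minimize(neg_ell, x0=np.ones(3), args=(X,), method='Nelder-Mead')
print(res.x)                                            # lambda-hat vector
```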
If the data include some large negative values and have a single long tail, a more general transformation should be applied:
$$ x^{(\lambda)} = \begin{cases} \{(x+1)^{\lambda} - 1\}/\lambda & x \ge 0,\ \lambda \ne 0 \\ \ln(x+1) & x \ge 0,\ \lambda = 0 \\ -\{(-x+1)^{2-\lambda} - 1\}/(2-\lambda) & x < 0,\ \lambda \ne 2 \\ -\ln(-x+1) & x < 0,\ \lambda = 2 \end{cases} $$
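This piecewise family coincides with the Yeo-Johnson transformation, which scipy implements; a direct implementation compared against it, for assumed data containing negatives:

```python
# Sketch: the piecewise transformation above; it matches scipy.stats.yeojohnson.
import numpy as np
from scipy.stats import yeojohnson

def transform(x, lam):
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if lam != 0:
        out[pos] = ((x[pos] + 1) ** lam - 1) / lam
    else:
        out[pos] = np.log(x[pos] + 1)
    if lam != 2:
        out[~pos] = -((-x[~pos] + 1) ** (2 - lam) - 1) / (2 - lam)
    else:
        out[~pos] = -np.log(-x[~pos] + 1)
    return out

x = np.array([-3.0, -0.5, 0.0, 1.2, 4.0])   # assumed data with negative values
print(transform(x, 0.5))
print(yeojohnson(x, lmbda=0.5))             # should match
```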