
Multivariate Normal

The document discusses the bivariate normal distribution, defining its joint probability density function and properties such as the mean vector and covariance matrix. It explains the marginal and conditional distributions, providing examples and simulations to illustrate these concepts. Additionally, it covers the implications of correlation and the behavior of quadratic forms in the context of the bivariate normal distribution.

Uploaded by

Asmamaw Getnet

Lecture 5: The multivariate normal distribution
The bivariate normal distribution

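Suppose $\mu_x$, $\mu_y$, $\sigma_x \ge 0$, $\sigma_y \ge 0$ and $-1 \le \rho \le 1$ are constants.

Define the 2 × 2 matrix $\Sigma$ by

$$\Sigma = \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}.$$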

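Then define a joint probability density function by

$$f_{X,Y}(x, y) = \frac{1}{2\pi\sqrt{\det\Sigma}} \exp\left(-\frac{1}{2} Q(x, y)\right)$$

where

$$Q(x, y) = (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})$$

and

$$\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \qquad \boldsymbol{\mu} = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}.$$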
If random variables $(X, Y)$ have joint probability density given by $f_{X,Y}$ above, then we say that $(X, Y)$ have a bivariate normal distribution and write

$$(X, Y)^T \sim N_2(\boldsymbol{\mu}, \Sigma).$$

It can be proved that the function $f_{X,Y}(x, y)$ integrates to 1 and therefore defines a valid joint pdf.
The notes contain expansions of $Q(x, y)$ and $\det\Sigma$.
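
For reference, the expansions can be worked out directly from the definition of $\Sigma$. The following is a sketch under the assumption that $\sigma_x, \sigma_y > 0$ and $|\rho| < 1$, so that $\Sigma$ is invertible; it is not copied from the notes themselves.

```latex
\begin{aligned}
\det\Sigma &= \sigma_x^2\sigma_y^2 - \rho^2\sigma_x^2\sigma_y^2
            = \sigma_x^2\sigma_y^2(1-\rho^2), \\[4pt]
\Sigma^{-1} &= \frac{1}{\sigma_x^2\sigma_y^2(1-\rho^2)}
  \begin{pmatrix} \sigma_y^2 & -\rho\sigma_x\sigma_y \\
                  -\rho\sigma_x\sigma_y & \sigma_x^2 \end{pmatrix}, \\[4pt]
Q(x, y) &= \frac{1}{1-\rho^2}\left[
  \frac{(x-\mu_x)^2}{\sigma_x^2}
  - \frac{2\rho\,(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}
  + \frac{(y-\mu_y)^2}{\sigma_y^2}\right].
\end{aligned}
```

Setting $\rho = 0$ removes the cross term, and the density then factorises into the product of two univariate normal densities.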
Remarks

1 The vector $\boldsymbol{\mu} = (\mu_x, \mu_y)^T$ is called the mean vector and the matrix $\Sigma$ is called the covariance matrix (or sometimes variance-covariance matrix).
2 Functions of the form $F(\mathbf{x}) = \mathbf{x}^T \Sigma^{-1} \mathbf{x}$ are called quadratic forms. Quadratic forms are functions $\mathbb{R}^n \to \mathbb{R}$ which satisfy certain properties. They crop up in several areas of mathematics and statistics.
3 The matrix $\Sigma$ and its inverse $\Sigma^{-1}$ are positive definite (provided $\sigma_x, \sigma_y > 0$ and $|\rho| < 1$). A matrix $A$ is positive definite if

$$\mathbf{x}^T A \mathbf{x} > 0$$

for all non-zero vectors $\mathbf{x}$.

4 It follows that when $\mu_x = \mu_y = 0$, $Q(x, y)$ is a positive definite quadratic form.
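
Remark 3 can be checked numerically. The sketch below builds $\Sigma$ for some made-up parameter values (not taken from the lecture), tests positive definiteness through the eigenvalues, and evaluates the quadratic form at one non-zero vector. The name `quad_form` is a hypothetical helper.

```r
# Hypothetical example values (not from the lecture)
sigma_x <- 2; sigma_y <- 1; rho <- 0.5
Sigma <- matrix(c(sigma_x^2,               rho * sigma_x * sigma_y,
                  rho * sigma_x * sigma_y, sigma_y^2),
                nrow = 2, byrow = TRUE)

# A symmetric matrix is positive definite exactly when all eigenvalues are > 0
eig   <- eigen(Sigma, symmetric = TRUE)$values
is_pd <- all(eig > 0)

# Quadratic form F(x) = x^T Sigma^{-1} x evaluated at a non-zero vector
quad_form <- function(x, Sigma) as.numeric(t(x) %*% solve(Sigma, x))
F_val <- quad_form(c(1, -1), Sigma)
```

Trying `rho <- 1` instead makes the smallest eigenvalue zero (up to rounding), which is the degenerate case in which $\Sigma$ is no longer positive definite.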
Pictures

[Figure: contour plots of the bivariate normal density, one panel per parameter setting, each with contours labelled 80%, 90%, 95% and 99% and with x and y axes running from −3 to 3. Panels: σx = σy, ρ = 0; σx = 2σy, ρ = 0; 2σx = σy, ρ = 0; σx = σy, ρ = 0.75; σx = σy, ρ = −0.75; 2σx = σy, ρ = 0.75.]
Comments

1 $Q(x, y) \ge 0$ with equality only when $\mathbf{x} = \boldsymbol{\mu}$. It follows that the density function has its mode at $\mathbf{x} = \boldsymbol{\mu}$.
2 Changing the values of $\mu_x$, $\mu_y$ does not change the shape of the plots, but corresponds to a translation of the xy-plane, i.e. changing $\mu_x$, $\mu_y$ just shifts the contours / surface to a new mode position.
3 The contours of equal density are circular when $\sigma_x = \sigma_y$ and $\rho = 0$, and elliptical when $\sigma_x \ne \sigma_y$ or $\rho \ne 0$.
4 $\sigma_x$ and $\sigma_y$ control the extent to which the distribution is dispersed.
5 The parameter $\rho$ is the correlation of $X$, $Y$, i.e. $\mathrm{Cor}(X, Y) = \rho$. Thus for non-zero $\rho$, the contours are at an angle to the axes.
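
Comments 1 and 2 can be illustrated with a short sketch. It evaluates the density on a grid and locates the highest grid point, which should coincide with the mean. The function name `dbvn` and the parameter values are assumptions made for illustration. The density is written using the expanded form of $Q$ and $\det\Sigma = \sigma_x^2\sigma_y^2(1-\rho^2)$.

```r
# Bivariate normal density, written out from the expanded quadratic form Q(x, y)
dbvn <- function(x, y, mu_x, mu_y, sigma_x, sigma_y, rho) {
  zx <- (x - mu_x) / sigma_x
  zy <- (y - mu_y) / sigma_y
  Q  <- (zx^2 - 2 * rho * zx * zy + zy^2) / (1 - rho^2)
  exp(-Q / 2) / (2 * pi * sigma_x * sigma_y * sqrt(1 - rho^2))
}

# Hypothetical settings: mean shifted to (1, -1), unit variances, rho = 0.75
gx   <- seq(-3, 3, by = 0.05)
gy   <- seq(-3, 3, by = 0.05)
dens <- outer(gx, gy, dbvn, mu_x = 1, mu_y = -1,
              sigma_x = 1, sigma_y = 1, rho = 0.75)

# Grid point with the highest density; it should sit at the mean vector
idx     <- which(dens == max(dens), arr.ind = TRUE)
mode_xy <- c(gx[idx[1, 1]], gy[idx[1, 2]])

# contour(gx, gy, dens) would draw the tilted elliptical contours
```

Changing only `mu_x` and `mu_y` moves `mode_xy` without altering the shape that `contour()` draws.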
Marginals and conditionals

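Suppose $(X, Y)^T \sim N_2(\boldsymbol{\mu}, \Sigma)$. Then:

1 The marginal distributions are normal:
$$X \sim N(\mu_x, \sigma_x^2) \quad\text{and}\quad Y \sim N(\mu_y, \sigma_y^2).$$
2 The conditional distributions are normal:
$$X \mid Y = y \sim N\left(\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y - \mu_y),\; \sigma_x^2(1 - \rho^2)\right)$$
and
$$Y \mid X = x \sim N\left(\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x),\; \sigma_y^2(1 - \rho^2)\right).$$
3 When $\rho = 0$, $X$ and $Y$ are independent.
4 Linear combinations of $X$ and $Y$ are also normally distributed:
$$aX + bY \sim N\left(a\mu_x + b\mu_y,\; a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\rho\sigma_x\sigma_y\right)$$
where $a$, $b$ are constants.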
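
Results 2 and 4 translate directly into small helper functions. The sketch below is illustrative only: the names `cond_y_given_x` and `lin_comb` and the parameter values are hypothetical, not part of the lecture.

```r
# Mean and variance of Y | X = x, following result 2
cond_y_given_x <- function(x, mu_x, mu_y, sigma_x, sigma_y, rho) {
  list(mean = mu_y + rho * (sigma_y / sigma_x) * (x - mu_x),
       var  = sigma_y^2 * (1 - rho^2))
}

# Mean and variance of aX + bY, following result 4
lin_comb <- function(a, b, mu_x, mu_y, sigma_x, sigma_y, rho) {
  list(mean = a * mu_x + b * mu_y,
       var  = a^2 * sigma_x^2 + b^2 * sigma_y^2 +
              2 * a * b * rho * sigma_x * sigma_y)
}

# Hypothetical values chosen for illustration
cy <- cond_y_given_x(x = 1, mu_x = 0, mu_y = 0, sigma_x = 1, sigma_y = 2, rho = 0.5)
lc <- lin_comb(a = 1, b = -1, mu_x = 0, mu_y = 0, sigma_x = 1, sigma_y = 2, rho = 0.5)
```

Note that the conditional variance does not depend on the observed value `x`; only the conditional mean does.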
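Example 5.1

Suppose $(X, Y)^T \sim N_2(\boldsymbol{\mu}, \Sigma)$ where $\mu_x = 2$, $\mu_y = 3$, $\sigma_x = 1$, $\sigma_y = 1$ and $\rho = 0.5$.
Simulate a sample of size 500 from this distribution and draw a scatter plot.
Use simulation to find $\Pr(X^2 + Y^2 < 9)$.

Solution
The marginal distribution of $X$ is $X \sim N(2, 1^2)$.
Using the formula for the conditional distribution,
$$Y \mid X = x \sim N\left(\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x),\; \sigma_y^2(1 - \rho^2)\right) = N\left(3 + 0.5(x - 2),\; 0.75\right).$$
So we can simulate $X$ from its marginal distribution and then simulate $Y$ from its conditional distribution given the simulated value of $X$.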
Simulation results

# Simulate X from its marginal, then Y from its conditional given X = x
npts = 500
x = rnorm(npts, mean = 2, sd = 1)
y = rnorm(npts, mean = 3 + 0.5 * (x - 2), sd = sqrt(0.75))
plot(x, y)  # scatter plot of the simulated sample


[Figure: scatter plot of the 500 simulated points, y against x, with the x axis running from 0 to 5 and the y axis from 0 to 6.]
Probability calculation

To find $\Pr(X^2 + Y^2 < 9)$ approximately, compute the proportion of simulated points that fall in the region:

npts = 10000
x = rnorm(npts, mean = 2, sd = 1)
y = rnorm(npts, mean = 3 + 0.5 * (x - 2), sd = sqrt(0.75))
f = x^2 + y^2        # squared distance of each point from the origin
sum(f < 9) / npts    # proportion of points with x^2 + y^2 < 9

Answer ≈ 0.2776
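
Because the answer comes from a random sample, it changes from run to run. One way to judge how far it can be trusted is to attach a standard error to it, as in the sketch below. The seed and the larger sample size are arbitrary choices, not from the lecture. The standard error uses the usual binomial formula $\sqrt{\hat p(1-\hat p)/n}$.

```r
set.seed(1)  # arbitrary seed, only so that the run is reproducible
npts <- 100000
x <- rnorm(npts, mean = 2, sd = 1)
y <- rnorm(npts, mean = 3 + 0.5 * (x - 2), sd = sqrt(0.75))

# Proportion of simulated points inside the circle x^2 + y^2 < 9
p_hat <- mean(x^2 + y^2 < 9)

# Binomial standard error of the Monte Carlo estimate
se <- sqrt(p_hat * (1 - p_hat) / npts)
c(estimate = p_hat, std_error = se)
```

Increasing `npts` by a factor of 100 shrinks the standard error by roughly a factor of 10.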


Extra example

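Suppose
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N_2\left(\begin{pmatrix} 4 \\ 1 \end{pmatrix}, \begin{pmatrix} 8 & 2 \\ 2 & 5 \end{pmatrix}\right).$$
The random variable $Z$ is defined by $Z = X + 3Y$. What is the distribution of $Z$?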
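Solution

We have $Z = X + 3Y$. Using result 4 on linear combinations (see Marginals and conditionals), we have

$$E[Z] = 1 \times \mu_x + 3 \times \mu_y = 1 \times 4 + 3 \times 1 = 7.$$

Now from the variance-covariance matrix, we have $\sigma_x^2 = 8$, $\sigma_y^2 = 5$ and $\rho\sigma_x\sigma_y = 2$.

Thus

$$\begin{aligned}
\mathrm{Var}(Z) &= 1^2 \times \sigma_x^2 + 3^2 \times \sigma_y^2 + 2 \times 1 \times 3 \times (\rho\sigma_x\sigma_y) \\
&= 1 \times 8 + 9 \times 5 + 2 \times 1 \times 3 \times 2 \\
&= 65.
\end{aligned}$$

Therefore $Z \sim N(7, 65)$.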
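
The same numbers can be checked in matrix form, since for $Z = \mathbf{a}^T (X, Y)^T$ the mean is $\mathbf{a}^T\boldsymbol{\mu}$ and the variance is $\mathbf{a}^T \Sigma \mathbf{a}$, which expands to the expression in result 4. A minimal sketch (the variable names are illustrative):

```r
mu    <- c(4, 1)
Sigma <- matrix(c(8, 2,
                  2, 5), nrow = 2, byrow = TRUE)
a <- c(1, 3)   # coefficients of Z = 1*X + 3*Y

mean_z <- sum(a * mu)                        # a^T mu
var_z  <- as.numeric(t(a) %*% Sigma %*% a)   # a^T Sigma a
```

Writing the calculation this way carries over unchanged to linear combinations of more than two variables.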


The multivariate normal distribution

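The multivariate normal distribution is defined on vectors in $\mathbb{R}^n$.

Suppose that $\mathbf{X}$ is a random vector with $n$ entries, i.e. $\mathbf{X} = (X_1, \ldots, X_n)^T$.
Then
$$\mathbf{X} \sim N_n(\boldsymbol{\mu}, \Sigma)$$
if $X_1, \ldots, X_n$ have joint PDF given by
$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\sqrt{\det\Sigma}} \exp\left(-\frac{1}{2} Q(\mathbf{x})\right)$$
where
$$Q(\mathbf{x}) = (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}).$$
When $n = 2$ the normalising constant reduces to the $1 / (2\pi\sqrt{\det\Sigma})$ of the bivariate case.

This definition makes sense for any column vector $\boldsymbol{\mu} \in \mathbb{R}^n$ and any positive definite $n \times n$ matrix $\Sigma$.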
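
The definition can be turned into a few lines of R. The sketch below assumes the standard normalising constant $(2\pi)^{n/2}\sqrt{\det\Sigma}$, which equals $2\pi\sqrt{\det\Sigma}$ when $n = 2$. The function name `dmvn` and the test values are hypothetical. As a sanity check it compares the joint density with the product of univariate normal densities in a case where $\Sigma$ is diagonal.

```r
# Density of N_n(mu, Sigma) at a point x, written directly from the definition
dmvn <- function(x, mu, Sigma) {
  n <- length(mu)
  d <- x - mu
  Q <- as.numeric(t(d) %*% solve(Sigma, d))   # (x - mu)^T Sigma^{-1} (x - mu)
  exp(-Q / 2) / ((2 * pi)^(n / 2) * sqrt(det(Sigma)))
}

# Hypothetical check: with a diagonal Sigma the components are independent,
# so the joint density should equal the product of univariate densities
mu    <- c(0, 1, -1)
Sigma <- diag(c(1, 4, 9))
x     <- c(0.5, 0, 1)
joint   <- dmvn(x, mu, Sigma)
product <- prod(dnorm(x, mean = mu, sd = sqrt(diag(Sigma))))
```

Using `solve(Sigma, d)` avoids forming $\Sigma^{-1}$ explicitly, which is a common choice for numerical stability.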
Remarks

1 The vector $\boldsymbol{\mu}$ is the mean of the distribution and $\Sigma$ is called the covariance matrix.
2 All the marginal distributions of $\mathbf{X}$ are normal. (We do not specify their parameters here, however).
3 Similarly, all the conditional distributions of $\mathbf{X}$ are normal. (Again, we do not specify the parameters of these distributions here).
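
Remark 1 can be explored by simulation. The sketch below is one possible approach, not necessarily the one used in the course. It draws from $N_3(\boldsymbol{\mu}, \Sigma)$ by transforming independent standard normals with a Cholesky factor of $\Sigma$, then compares the sample mean and sample covariance with $\boldsymbol{\mu}$ and $\Sigma$. The values of `mu` and `Sigma` are made up for illustration.

```r
set.seed(1)  # arbitrary seed for reproducibility
mu    <- c(1, -2, 0)                        # hypothetical mean vector
Sigma <- matrix(c(4, 1, 0,
                  1, 2, 1,
                  0, 1, 3), nrow = 3, byrow = TRUE)  # hypothetical covariance

n_sim <- 100000
R <- chol(Sigma)                            # upper triangular, t(R) %*% R = Sigma
Z <- matrix(rnorm(n_sim * 3), ncol = 3)     # rows of independent N(0, 1) draws
X <- sweep(Z %*% R, 2, mu, "+")             # each row is one draw from N_3(mu, Sigma)

colMeans(X)   # should be close to mu
cov(X)        # should be close to Sigma
# hist(X[, 2]) would show the bell-shaped marginal of the second component
```

The transformation works because a row `z %*% R` has covariance `t(R) %*% R`, which is `Sigma` by construction.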
