Lecture 22: Bivariate Normal Distribution
Statistics 104
Colin Rundel
April 11, 2012

Bivariate Normal Distribution

Let $Z_1, Z_2 \sim N(0, 1)$, which we will use to build a general bivariate normal distribution.

\[ f(z_1, z_2) = \frac{1}{2\pi} \exp\left[ -\frac{1}{2}\left( z_1^2 + z_2^2 \right) \right] \]

We want to transform these unit normal distributions to have the following arbitrary parameters: $\mu_X, \mu_Y, \sigma_X, \sigma_Y, \rho$.

\[ X = \sigma_X Z_1 + \mu_X \]
\[ Y = \sigma_Y \left[ \rho Z_1 + \sqrt{1 - \rho^2}\, Z_2 \right] + \mu_Y \]
First, let's examine the marginal distributions of X and Y. Starting from the definition of Y,

\[ Y = \sigma_Y \left[ \rho Z_1 + \sqrt{1 - \rho^2}\, Z_2 \right] + \mu_Y = \sigma_Y \left[ N(0, \rho^2) + N(0, 1 - \rho^2) \right] + \mu_Y = \sigma_Y N(0, 1) + \mu_Y = N(\mu_Y, \sigma_Y^2) \]

and similarly $X = \sigma_X Z_1 + \mu_X = N(\mu_X, \sigma_X^2)$.

Second, we can find $Cov(X, Y)$ and $\rho(X, Y)$:

\[ \rho(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} = \rho \]
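To make the construction concrete, here is a small simulation sketch (Python with NumPy is an assumed tool, not part of the lecture; the parameter values are arbitrary) that draws independent $Z_1, Z_2$, applies the transformation, and checks the sample means, standard deviations, and correlation.

```python
import numpy as np

# Assumed illustrative parameters (not from the lecture)
mu_x, mu_y = 1.0, -2.0
sigma_x, sigma_y = 2.0, 0.5
rho = 0.7

rng = np.random.default_rng(0)
n = 200_000

# Independent standard normals
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)

# The transformation from the lecture
x = sigma_x * z1 + mu_x
y = sigma_y * (rho * z1 + np.sqrt(1 - rho**2) * z2) + mu_y

print(x.mean(), x.std())        # ~ mu_x, sigma_x
print(y.mean(), y.std())        # ~ mu_y, sigma_y
print(np.corrcoef(x, y)[0, 1])  # ~ rho
```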
To find the joint density of X and Y we use the multivariate change of variables result: if the n functions $r_1, \ldots, r_n$ define a one-to-one differentiable transformation from S to T, and the inverse of this transformation is written $z_j = s_j(x_1, \ldots, x_n)$ for $j = 1, \ldots, n$, then the joint density of the transformed variables is the original joint density evaluated at the inverse, multiplied by $|J|$, the absolute value of the determinant of the matrix of partial derivatives $\partial s_i / \partial x_j$.
The first thing we need to find are the inverses of the transformation. If $x = r_1(z_1, z_2)$ and $y = r_2(z_1, z_2)$, we need to find functions $s_1$ and $s_2$ such that $Z_1 = s_1(X, Y)$ and $Z_2 = s_2(X, Y)$.

\[ X = \sigma_X Z_1 + \mu_X \quad\Longrightarrow\quad Z_1 = \frac{X - \mu_X}{\sigma_X} \]

\[ Y = \sigma_Y \left[ \rho Z_1 + \sqrt{1 - \rho^2}\, Z_2 \right] + \mu_Y \quad\Longrightarrow\quad \frac{Y - \mu_Y}{\sigma_Y} = \rho \frac{X - \mu_X}{\sigma_X} + \sqrt{1 - \rho^2}\, Z_2 \quad\Longrightarrow\quad Z_2 = \frac{1}{\sqrt{1 - \rho^2}} \left( \frac{Y - \mu_Y}{\sigma_Y} - \rho \frac{X - \mu_X}{\sigma_X} \right) \]

Therefore,

\[ s_1(x, y) = \frac{x - \mu_X}{\sigma_X} \qquad\qquad s_2(x, y) = \frac{1}{\sqrt{1 - \rho^2}} \left( \frac{y - \mu_Y}{\sigma_Y} - \rho \frac{x - \mu_X}{\sigma_X} \right) \]

Next we calculate the Jacobian,

\[ J = \det \begin{bmatrix} \partial s_1 / \partial x & \partial s_1 / \partial y \\ \partial s_2 / \partial x & \partial s_2 / \partial y \end{bmatrix} = \det \begin{bmatrix} \frac{1}{\sigma_X} & 0 \\ \frac{-\rho}{\sigma_X \sqrt{1 - \rho^2}} & \frac{1}{\sigma_Y \sqrt{1 - \rho^2}} \end{bmatrix} = \frac{1}{\sigma_X \sigma_Y \sqrt{1 - \rho^2}} \]

The joint density of X and Y is then given by

\[ f(x, y) = f(z_1, z_2)\, |J| = \frac{1}{2\pi} \exp\left[ -\frac{1}{2}\left( z_1^2 + z_2^2 \right) \right] |J| = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left[ -\frac{1}{2}\left( z_1^2 + z_2^2 \right) \right] \]

\[ = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left[ -\frac{1}{2} \left( \left( \frac{x - \mu_X}{\sigma_X} \right)^2 + \frac{1}{1 - \rho^2} \left( \frac{y - \mu_Y}{\sigma_Y} - \rho \frac{x - \mu_X}{\sigma_X} \right)^2 \right) \right] \]

\[ = \frac{1}{2\pi \sigma_X \sigma_Y (1 - \rho^2)^{1/2}} \exp\left[ \frac{-1}{2(1 - \rho^2)} \left( \frac{(x - \mu_X)^2}{\sigma_X^2} - 2\rho \frac{(x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y} + \frac{(y - \mu_Y)^2}{\sigma_Y^2} \right) \right] \]
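As a sanity check on the algebra, the closed-form density above can be compared with a library implementation at a few points. This is a sketch assuming SciPy is available; the parameter values and test points are arbitrary.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, -2.0, 2.0, 0.5, 0.7

def f_xy(x, y):
    """Hand-derived bivariate normal density from the lecture."""
    a = (x - mu_x) / sigma_x
    b = (y - mu_y) / sigma_y
    q = (a**2 + b**2 - 2 * rho * a * b) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho**2))

cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
mvn = multivariate_normal(mean=[mu_x, mu_y], cov=cov)

for x, y in [(0.0, 0.0), (1.0, -2.0), (3.0, -1.5)]:
    assert np.isclose(f_xy(x, y), mvn.pdf([x, y]))
```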
General Bivariate Normal - Density (Matrix Notation)

Obviously, the density for the bivariate normal is ugly, and it only gets worse when we consider higher dimensional joint densities of normals. We can write the density in a more compact form using matrix notation,

\[ \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} \qquad \mu = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix} \qquad \Sigma = \begin{pmatrix} \sigma_X^2 & \rho \sigma_X \sigma_Y \\ \rho \sigma_X \sigma_Y & \sigma_Y^2 \end{pmatrix} \]

\[ f(\mathbf{x}) = \frac{1}{2\pi} (\det \Sigma)^{-1/2} \exp\left[ -\frac{1}{2} (\mathbf{x} - \mu)^T \Sigma^{-1} (\mathbf{x} - \mu) \right] \]

We can confirm our results by checking the value of $(\det \Sigma)^{-1/2}$ and $(\mathbf{x} - \mu)^T \Sigma^{-1} (\mathbf{x} - \mu)$ for the bivariate case.

\[ (\det \Sigma)^{-1/2} = \left( \sigma_X^2 \sigma_Y^2 - \rho^2 \sigma_X^2 \sigma_Y^2 \right)^{-1/2} = \frac{1}{\sigma_X \sigma_Y (1 - \rho^2)^{1/2}} \]

Recall for a 2 × 2 matrix,

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \qquad A^{-1} = \frac{1}{\det A} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \]

Then,

\[ (\mathbf{x} - \mu)^T \Sigma^{-1} (\mathbf{x} - \mu) = \frac{1}{\sigma_X^2 \sigma_Y^2 (1 - \rho^2)} \begin{pmatrix} x - \mu_X \\ y - \mu_Y \end{pmatrix}^T \begin{pmatrix} \sigma_Y^2 & -\rho \sigma_X \sigma_Y \\ -\rho \sigma_X \sigma_Y & \sigma_X^2 \end{pmatrix} \begin{pmatrix} x - \mu_X \\ y - \mu_Y \end{pmatrix} \]

\[ = \frac{1}{\sigma_X^2 \sigma_Y^2 (1 - \rho^2)} \begin{pmatrix} \sigma_Y^2 (x - \mu_X) - \rho \sigma_X \sigma_Y (y - \mu_Y) \\ -\rho \sigma_X \sigma_Y (x - \mu_X) + \sigma_X^2 (y - \mu_Y) \end{pmatrix}^T \begin{pmatrix} x - \mu_X \\ y - \mu_Y \end{pmatrix} \]

\[ = \frac{\sigma_Y^2 (x - \mu_X)^2 - 2\rho \sigma_X \sigma_Y (x - \mu_X)(y - \mu_Y) + \sigma_X^2 (y - \mu_Y)^2}{\sigma_X^2 \sigma_Y^2 (1 - \rho^2)} \]

\[ = \frac{1}{1 - \rho^2} \left( \frac{(x - \mu_X)^2}{\sigma_X^2} - 2\rho \frac{(x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y} + \frac{(y - \mu_Y)^2}{\sigma_Y^2} \right) \]
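These matrix identities can also be spot-checked numerically. A minimal sketch with NumPy, using arbitrary illustrative parameter values:

```python
import numpy as np

mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, -2.0, 2.0, 0.5, 0.7

mu = np.array([mu_x, mu_y])
Sigma = np.array([[sigma_x**2, rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2]])

# (det Sigma)^(-1/2) should equal 1 / (sigma_x * sigma_y * sqrt(1 - rho^2))
assert np.isclose(np.linalg.det(Sigma) ** -0.5,
                  1 / (sigma_x * sigma_y * np.sqrt(1 - rho**2)))

# Quadratic form (x - mu)^T Sigma^{-1} (x - mu) vs the expanded expression
v = np.array([0.5, -1.0])               # an arbitrary test point (x, y)
d = v - mu
quad_matrix = d @ np.linalg.solve(Sigma, d)
a, b = d[0] / sigma_x, d[1] / sigma_y
quad_expanded = (a**2 - 2 * rho * a * b + b**2) / (1 - rho**2)
assert np.isclose(quad_matrix, quad_expanded)
```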
[Figures: plots of the bivariate normal density for various marginal distributions, with X ∼ N(0, 1) or N(0, 2) and Y ∼ N(0, 1) or N(0, 2).]
Matrix notation allows us to easily express the density of the multivariate normal distribution for an arbitrary number of dimensions. We express the k-dimensional multivariate normal distribution as follows,

\[ \mathbf{X} \sim N_k(\mu, \Sigma) \]

where $\mu$ is the k × 1 column vector of means and $\Sigma$ is the k × k covariance matrix where $\{\Sigma\}_{i,j} = Cov(X_i, X_j)$.

The density of the distribution is

\[ f(\mathbf{x}) = \frac{1}{(2\pi)^{k/2}} (\det \Sigma)^{-1/2} \exp\left[ -\frac{1}{2} (\mathbf{x} - \mu)^T \Sigma^{-1} (\mathbf{x} - \mu) \right] \]

In the bivariate case, we had a nice transformation such that we could generate two independent unit normal values and transform them into a sample from an arbitrary bivariate normal distribution. There is a similar method for the multivariate normal distribution that takes advantage of the Cholesky decomposition of the covariance matrix.

The Cholesky decomposition is defined for a symmetric, positive definite matrix X as

\[ L = \text{Chol}(X) \]

where L is a lower triangular matrix such that $L L^T = X$.
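For example, NumPy's np.linalg.cholesky returns exactly this lower-triangular factor. A short sketch with an arbitrary symmetric, positive definite matrix (the values are illustrative, not from the lecture):

```python
import numpy as np

# An arbitrary symmetric, positive definite matrix
X = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])

L = np.linalg.cholesky(X)            # lower triangular factor
assert np.allclose(np.tril(L), L)    # L is lower triangular
assert np.allclose(L @ L.T, X)       # and L L^T recovers X
```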
Let $Z_1, \ldots, Z_k \sim N(0, 1)$ and $\mathbf{Z} = (Z_1, \ldots, Z_k)^T$, then

\[ \mu + \text{Chol}(\Sigma)\, \mathbf{Z} \sim N_k(\mu, \Sigma) \]

This is offered without proof in the general k-dimensional case, but we can check that it gives the same transformation we started with in the bivariate case, which should justify how we knew to use that particular transformation.

We need to find the Cholesky decomposition of $\Sigma$ for the general bivariate case, where

\[ \Sigma = \begin{pmatrix} \sigma_X^2 & \rho \sigma_X \sigma_Y \\ \rho \sigma_X \sigma_Y & \sigma_Y^2 \end{pmatrix} \]

We need to solve the following for a, b, c:

\[ \begin{pmatrix} a & 0 \\ b & c \end{pmatrix} \begin{pmatrix} a & b \\ 0 & c \end{pmatrix} = \begin{pmatrix} a^2 & ab \\ ab & b^2 + c^2 \end{pmatrix} = \begin{pmatrix} \sigma_X^2 & \rho \sigma_X \sigma_Y \\ \rho \sigma_X \sigma_Y & \sigma_Y^2 \end{pmatrix} \]

This gives us three (unique) equations and three unknowns to solve for,

\[ a^2 = \sigma_X^2 \qquad ab = \rho \sigma_X \sigma_Y \qquad b^2 + c^2 = \sigma_Y^2 \]

\[ a = \sigma_X \qquad b = \rho \sigma_X \sigma_Y / a = \rho \sigma_Y \qquad c = \sqrt{\sigma_Y^2 - b^2} = \sigma_Y (1 - \rho^2)^{1/2} \]
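A sketch of both claims in Python/NumPy (arbitrary assumed parameters): np.linalg.cholesky applied to the bivariate Σ reproduces the factor with rows (σX, 0) and (ρσY, σY(1−ρ²)^{1/2}), and µ + L Z produces samples with approximately the right mean and covariance.

```python
import numpy as np

mu = np.array([1.0, -2.0])
sigma_x, sigma_y, rho = 2.0, 0.5, 0.7
Sigma = np.array([[sigma_x**2, rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2]])

# The closed-form Cholesky factor derived above
L_closed = np.array([[sigma_x, 0.0],
                     [rho * sigma_y, sigma_y * np.sqrt(1 - rho**2)]])
assert np.allclose(np.linalg.cholesky(Sigma), L_closed)

# Sampling: mu + L Z with Z a vector of independent N(0, 1) draws
rng = np.random.default_rng(1)
Z = rng.standard_normal((2, 100_000))
samples = mu[:, None] + np.linalg.cholesky(Sigma) @ Z

print(samples.mean(axis=1))   # ~ mu
print(np.cov(samples))        # ~ Sigma
```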
Conditional Variance of the Bivariate Normal

Using $X = \mu_X + \sigma_X Z_1$ and $Y = \mu_Y + \sigma_Y [\rho Z_1 + (1 - \rho^2)^{1/2} Z_2]$ where $Z_1, Z_2 \sim N(0, 1)$, we can find $Var(Y \mid X)$.

\[ Var[Y \mid X = x] = Var\left[ \mu_Y + \sigma_Y \left( \rho Z_1 + (1 - \rho^2)^{1/2} Z_2 \right) \,\middle|\, X = x \right] \]
\[ = Var\left[ \mu_Y + \sigma_Y \left( \rho \frac{x - \mu_X}{\sigma_X} + (1 - \rho^2)^{1/2} Z_2 \right) \,\middle|\, X = x \right] \]
\[ = Var\left[ \sigma_Y (1 - \rho^2)^{1/2} Z_2 \,\middle|\, X = x \right] \]
\[ = \sigma_Y^2 (1 - \rho^2) \]

By symmetry,

\[ Var[X \mid Y = y] = \sigma_X^2 (1 - \rho^2) \]

Example - Husbands and Wives (Example 5.10.6, deGroot)

Suppose that the heights of married couples can be explained by a bivariate normal distribution. The wives have a mean height of 66.8 inches and a standard deviation of 2 inches, while the husbands have a mean height of 70 inches and a standard deviation of 2 inches. The correlation between the heights is 0.68. What is the probability that, for a randomly selected couple, the wife is taller than her husband?
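The example is not worked out in this extract, but a sketch of one way to finish it: if (W, H) is bivariate normal then D = W − H is normal with mean µW − µH and variance σW² + σH² − 2ρσWσH, so the answer is a single normal tail probability. Assuming SciPy for the normal CDF:

```python
from math import sqrt
from scipy.stats import norm

mu_w, sigma_w = 66.8, 2.0   # wives
mu_h, sigma_h = 70.0, 2.0   # husbands
rho = 0.68

# D = W - H is normal since (W, H) is bivariate normal
mu_d = mu_w - mu_h                                                   # -3.2
sd_d = sqrt(sigma_w**2 + sigma_h**2 - 2 * rho * sigma_w * sigma_h)   # 1.6

p = 1 - norm.cdf(0, loc=mu_d, scale=sd_d)   # P(W > H) = P(D > 0)
print(p)                                    # ~ 0.023
```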