Lectures Week 5
Let $X$ and $Y$ be discrete random variables. The conditional probability mass function of $X$ given $Y$ is defined as
$$p_{X\mid Y}(x\mid y) = P(X = x \mid Y = y) = \frac{p_{X,Y}(x,y)}{p_Y(y)}, \qquad \text{for } x \in \operatorname{supp}(X),\ y \in \operatorname{supp}(Y).$$
Hence, pX|Y (x|y) is the probability that X = x, given that Y = y. Let us make some comments
about this definition:
• Note that, for all $y \in \operatorname{supp}(Y)$,
$$\sum_x p_{X\mid Y}(x\mid y) = \sum_x \frac{p_{X,Y}(x,y)}{p_Y(y)} = \frac{p_Y(y)}{p_Y(y)} = 1.$$
So, for each fixed $y$, the function that maps $x$ to $p_{X\mid Y}(x\mid y)$ is a probability mass function. We refer to the distribution associated with this probability mass function as the distribution of $X$ given that $Y = y$.
• Recall that two discrete random variables $X$ and $Y$ are independent if and only if $p_{X,Y}(x,y) = p_X(x)\,p_Y(y)$ for all $x, y$. This is equivalent to saying that $p_{X\mid Y}(x\mid y) = p_X(x)$ for all $x$ and all $y \in \operatorname{supp}(Y)$.
Example. Let X and Y be discrete with joint probability mass function given by the following
table:
$$\begin{array}{c|ccc}
x \backslash y & 1 & 2 & 3 \\ \hline
0 & 1/12 & 1/12 & 1/12 \\
1 & 0 & 1/2 & 1/4
\end{array}$$
Let us find $p_{X\mid Y}(x\mid y)$ for all choices of $x$ and $y$. Note first that
$$p_Y(1) = \tfrac{1}{12} + 0 = \tfrac{1}{12}, \qquad p_Y(2) = \tfrac{1}{12} + \tfrac{1}{2} = \tfrac{7}{12}, \qquad p_Y(3) = \tfrac{1}{12} + \tfrac{1}{4} = \tfrac{1}{3}.$$
Then,
$$p_{X\mid Y}(0\mid 1) = \frac{1/12}{1/12} = 1, \qquad p_{X\mid Y}(1\mid 1) = \frac{0}{1/12} = 0,$$
$$p_{X\mid Y}(0\mid 2) = \frac{1/12}{7/12} = \frac{1}{7}, \qquad p_{X\mid Y}(1\mid 2) = \frac{1/2}{7/12} = \frac{6}{7},$$
$$p_{X\mid Y}(0\mid 3) = \frac{1/12}{1/3} = \frac{1}{4}, \qquad p_{X\mid Y}(1\mid 3) = \frac{1/4}{1/3} = \frac{3}{4}.$$
Note that $p_{X\mid Y}(x\mid y)$ is the proportion that $p_{X,Y}(x,y)$ represents of the total mass given by $p_Y(y)$ (in the table: the proportion that the entry in position $(x, y)$ represents of the total mass of its column).
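To make this column-proportion interpretation concrete, here is a minimal Python sketch (the dictionary layout and variable names are illustrative choices, not from the notes) that recomputes the conditional probabilities of the example directly from the joint table:

```python
# Compute p_{X|Y}(x|y) from the joint table: divide each entry by its column mass.
from fractions import Fraction as F

# Joint pmf p_{X,Y}(x, y) from the table above.
p_XY = {
    (0, 1): F(1, 12), (0, 2): F(1, 12), (0, 3): F(1, 12),
    (1, 1): F(0),     (1, 2): F(1, 2),  (1, 3): F(1, 4),
}

# Marginal p_Y(y): the total mass of each column.
p_Y = {}
for (x, y), p in p_XY.items():
    p_Y[y] = p_Y.get(y, F(0)) + p

# Conditional pmf: each entry as a proportion of its column mass.
for (x, y), p in sorted(p_XY, key=lambda t: (t[1], t[0])) and sorted(p_XY.items(), key=lambda t: (t[0][1], t[0][0])):
    print(f"p_X|Y({x}|{y}) = {p / p_Y[y]}")
```

Running this prints $1, 0, 1/7, 6/7, 1/4, 3/4$, matching the computation above.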
Example. Let $Y \sim \mathrm{Bin}(n, p)$. Let $q \in (0,1)$ and suppose that $X$ is a random variable with $\operatorname{supp}(X) = \{0, \dots, n\}$ and such that, for each $y \in \{0, \dots, n\}$,
$$p_{X\mid Y}(x\mid y) = \binom{y}{x} \cdot q^x \cdot (1-q)^{y-x} \quad \text{for } x \in \{0, \dots, y\}, \text{ and } 0 \text{ otherwise}.$$
Let us find the marginal probability mass function of $X$. For $x \in \{0, \dots, n\}$,
$$p_X(x) = \sum_{y=x}^{n} p_{X\mid Y}(x\mid y)\, p_Y(y) = \sum_{y=x}^{n} \binom{y}{x} q^x (1-q)^{y-x} \binom{n}{y} p^y (1-p)^{n-y}.$$
Using the identity $\binom{n}{y}\binom{y}{x} = \binom{n}{x}\binom{n-x}{y-x}$ and then the binomial theorem, this becomes
$$p_X(x) = \binom{n}{x} (pq)^x \sum_{y=x}^{n} \binom{n-x}{y-x} \big(p(1-q)\big)^{y-x} (1-p)^{n-y} = \frac{n!}{x!(n-x)!} \cdot (pq)^x \cdot \big(p(1-q) + (1-p)\big)^{n-x}$$
$$= \frac{n!}{x!(n-x)!} \cdot (pq)^x \cdot (1-pq)^{n-x}.$$
Hence $X \sim \mathrm{Bin}(n, pq)$.
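As a sanity check on this conclusion, the following simulation sketch (with arbitrarily chosen parameters) compares the empirical distribution of $X$ against the $\mathrm{Bin}(n, pq)$ pmf:

```python
# Check: if Y ~ Bin(n, p) and, given Y = y, X ~ Bin(y, q), then X ~ Bin(n, pq).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, q = 20, 0.6, 0.3
trials = 200_000

y = rng.binomial(n, p, size=trials)   # Y ~ Bin(n, p)
x = rng.binomial(y, q)                # X | Y = y ~ Bin(y, q), elementwise

# Compare empirical frequencies of X with the Bin(n, pq) pmf.
for k in range(6):
    emp = np.mean(x == k)
    theo = stats.binom.pmf(k, n, p * q)
    print(f"P(X = {k}): empirical {emp:.4f}, Bin(n, pq) {theo:.4f}")
```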
Example. Let $X$ and $Y$ be independent with $X \sim \mathrm{Poi}(\lambda)$ and $Y \sim \mathrm{Poi}(\mu)$. Let $Z = X + Y$. Find $p_{X\mid Z}(x\mid z)$.
Solution. Recall from the Week 3 lecture notes (first proposition) that $Z \sim \mathrm{Poi}(\lambda + \mu)$. Note that, given that $Z = z$, $X$ can take any value in $\{0, \dots, z\}$. Hence, $p_{X\mid Z}(x\mid z)$ is defined for all pairs $(x, z)$ such that $z \in \mathbb{N}_0$ and $x \in \{0, \dots, z\}$. We compute
$$p_{X\mid Z}(x\mid z) = \frac{P(X = x,\ Z = z)}{P(Z = z)} = \frac{P(X = x,\ Y = z - x)}{P(Z = z)} = \frac{P(X = x)\, P(Y = z - x)}{P(Z = z)},$$
where the last equality follows from the independence between $X$ and $Y$. The right-hand side equals
$$\frac{\frac{\lambda^x}{x!} e^{-\lambda} \cdot \frac{\mu^{z-x}}{(z-x)!} e^{-\mu}}{\frac{(\lambda+\mu)^z}{z!} e^{-(\lambda+\mu)}} = \frac{z!}{x!(z-x)!} \cdot \left(\frac{\lambda}{\lambda+\mu}\right)^x \cdot \left(\frac{\mu}{\lambda+\mu}\right)^{z-x}.$$
Hence, conditionally on $Z = z$, the distribution of $X$ is $\mathrm{Bin}\!\left(z, \frac{\lambda}{\lambda+\mu}\right)$.
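This can also be checked by simulation; the sketch below (parameters chosen arbitrarily for illustration) conditions on the event $Z = z$ by discarding samples and compares the result with the binomial pmf:

```python
# Check: given Z = X + Y = z, X should follow Bin(z, lambda / (lambda + mu)).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
lam, mu, z = 2.0, 3.0, 5
trials = 500_000

x = rng.poisson(lam, size=trials)
y = rng.poisson(mu, size=trials)
x_given_z = x[x + y == z]            # keep only the samples with Z = z

for k in range(z + 1):
    emp = np.mean(x_given_z == k)
    theo = stats.binom.pmf(k, z, lam / (lam + mu))
    print(f"p_X|Z({k}|{z}): empirical {emp:.4f}, binomial {theo:.4f}")
```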
We now move to the continuous case. For jointly continuous random variables $X$ and $Y$, the conditional probability density function of $X$ given $Y = y$ is defined as
$$f_{X\mid Y}(x\mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}, \qquad \text{for } y \text{ such that } f_Y(y) > 0.$$
As in the discrete case, let us make some comments about this definition.
• As a function of $x$ for fixed $y$, $f_{X\mid Y}(x\mid y)$ is a probability density function, since
$$\int_{-\infty}^{\infty} f_{X\mid Y}(x\mid y)\, dx = \int_{-\infty}^{\infty} \frac{f_{X,Y}(x,y)}{f_Y(y)}\, dx = \frac{1}{f_Y(y)} \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dx = 1.$$
We refer to the distribution that corresponds to this probability density function as the distribution of $X$ given that $Y = y$. Note that this is merely a figure of speech: formally, $\{Y = y\}$ is an event of probability zero, so we are not really allowed to condition on it.
• While in the discrete case we had $p_{X\mid Y}(x\mid y) = \frac{P(X = x,\ Y = y)}{P(Y = y)}$, it does not make sense to write such a quotient for $f_{X\mid Y}(x\mid y)$ (both the numerator and the denominator would be zero!).
• Assume that $A \subseteq \mathbb{R}^2$ is a set of the form
$$A = \{(x, y) : x_1 \le x \le x_2,\ a(x) \le y \le b(x)\}.$$
Then, we have
$$P((X, Y) \in A) = \int_{x_1}^{x_2} \int_{a(x)}^{b(x)} f_{X,Y}(x,y)\, dy\, dx = \int_{x_1}^{x_2} f_X(x) \int_{a(x)}^{b(x)} f_{Y\mid X}(y\mid x)\, dy\, dx.$$
• As in the discrete case, $X$ and $Y$ are independent if and only if $f_{X\mid Y}(x\mid y) = f_X(x)$ for all $x$ and all $y$ such that $f_Y(y) > 0$.
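The iterated-integral identity above can be checked numerically. The sketch below uses an example density chosen purely for illustration (it is not from the notes): $f_{X,Y}(x,y) = x + y$ on $[0,1]^2$, with $A = \{(x,y) : 0 \le x \le 1/2,\ 0 \le y \le x\}$.

```python
# Verify P((X,Y) in A) = int f_X(x) int f_{Y|X}(y|x) dy dx for f(x,y) = x + y.
from scipy.integrate import dblquad

f_XY = lambda x, y: x + y                       # joint density on [0,1]^2
f_X = lambda x: x + 0.5                         # marginal: int_0^1 (x+y) dy
f_Y_given_X = lambda y, x: (x + y) / (x + 0.5)  # conditional density

# dblquad integrates func(y, x), with y as the inner variable.
lhs, _ = dblquad(lambda y, x: f_XY(x, y), 0, 0.5, 0, lambda x: x)
rhs, _ = dblquad(lambda y, x: f_X(x) * f_Y_given_X(y, x), 0, 0.5, 0, lambda x: x)
print(lhs, rhs)   # both approximately 1/16 = 0.0625
```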
We say that $(X, Y)$ has the bivariate normal distribution with parameters $\mu_X, \sigma_X^2, \mu_Y, \sigma_Y^2, \rho$, written $(X, Y) \sim N(\mu_X, \sigma_X^2, \mu_Y, \sigma_Y^2, \rho)$, if its joint probability density function is
$$f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 - 2\rho \frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X \sigma_Y} \right] \right\}.$$
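This density agrees with the one implemented in scipy when the covariance matrix is $\begin{pmatrix}\sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2\end{pmatrix}$; the sketch below (parameters chosen arbitrarily) evaluates both at a point:

```python
# Compare the displayed formula with scipy's bivariate normal density.
import numpy as np
from scipy.stats import multivariate_normal

mx, sx, my, sy, rho = 1.0, 2.0, -1.0, 0.5, 0.3

def f_XY(x, y):
    u, v = (x - mx) / sx, (y - my) / sy
    const = 2 * np.pi * sx * sy * np.sqrt(1 - rho**2)
    return np.exp(-(u**2 + v**2 - 2 * rho * u * v) / (2 * (1 - rho**2))) / const

cov = [[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]]
mvn = multivariate_normal(mean=[mx, my], cov=cov)
print(f_XY(0.3, 0.7), mvn.pdf([0.3, 0.7]))   # the two values should match
```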
We will now see several properties of this joint distribution, starting with the marginals.
Proposition 1. If $(X, Y) \sim N(\mu_X, \sigma_X^2, \mu_Y, \sigma_Y^2, \rho)$, then $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$.
Proof. This proof is not examinable. We will only prove the statement for $X$ (the statement for $Y$ is treated in the same way, or, even better, by symmetry). In order to render the expression for $f_{X,Y}(x,y)$ more manageable, we let $w = \frac{y - \mu_Y}{\sigma_Y}$, so that
$$\left(\frac{x-\mu_X}{\sigma_X}\right)^2 + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 - 2\rho \frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X \sigma_Y} = \left(\frac{x-\mu_X}{\sigma_X}\right)^2 + w^2 - \frac{2\rho w (x-\mu_X)}{\sigma_X}. \tag{2}$$
We now complete the square as follows:
$$w^2 - \frac{2\rho w (x-\mu_X)}{\sigma_X} = \left( w - \frac{\rho(x-\mu_X)}{\sigma_X} \right)^2 - \rho^2 \left( \frac{x-\mu_X}{\sigma_X} \right)^2.$$
With this at hand, the right-hand side of (2) becomes
$$\left( w - \frac{\rho(x-\mu_X)}{\sigma_X} \right)^2 + (1-\rho^2) \left( \frac{x-\mu_X}{\sigma_X} \right)^2.$$
We then have
$$f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left( w - \frac{\rho(x-\mu_X)}{\sigma_X} \right)^2 + (1-\rho^2) \left( \frac{x-\mu_X}{\sigma_X} \right)^2 \right] \right\}$$
$$= \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)} \left( w - \frac{\rho(x-\mu_X)}{\sigma_X} \right)^2 - \frac{(x-\mu_X)^2}{2\sigma_X^2} \right\}. \tag{3}$$
Now, we compute the marginal probability density function of $X$ using
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dy,$$
replacing the expression for $f_{X,Y}(x,y)$ by what we obtained in (3), and using the substitution $w = \frac{y - \mu_Y}{\sigma_Y}$ (which gives $dy = \sigma_Y\, dw$). We then obtain
$$f_X(x) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \cdot \exp\left\{ -\frac{(x-\mu_X)^2}{2\sigma_X^2} \right\} \cdot \sigma_Y \int_{-\infty}^{\infty} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left( w - \frac{\rho(x-\mu_X)}{\sigma_X} \right)^2 \right\} dw$$
$$= \frac{1}{\sqrt{2\pi}\,\sigma_X} \cdot \exp\left\{ -\frac{(x-\mu_X)^2}{2\sigma_X^2} \right\} \times \frac{1}{\sqrt{2\pi(1-\rho^2)}} \int_{-\infty}^{\infty} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left( w - \frac{\rho(x-\mu_X)}{\sigma_X} \right)^2 \right\} dw$$
$$= \frac{1}{\sqrt{2\pi}\,\sigma_X} \cdot \exp\left\{ -\frac{(x-\mu_X)^2}{2\sigma_X^2} \right\},$$
where the last equality holds because the integral, together with its normalising factor, is the integral of the density of $N\!\left(\frac{\rho(x-\mu_X)}{\sigma_X}, 1-\rho^2\right)$ over the whole line, which equals 1.
As you may have guessed, the value $\rho \in (-1, 1)$ is the correlation coefficient between $X$ and $Y$, but we will not prove that now. Next, we consider conditional density functions:
Proposition 2. Assume that $(X, Y) \sim N(\mu_X, \sigma_X^2, \mu_Y, \sigma_Y^2, \rho)$. Then, conditionally on $Y = y$, the distribution of $X$ is
$$N\!\left( \mu_X + \rho \frac{\sigma_X}{\sigma_Y}(y - \mu_Y),\ (1-\rho^2)\sigma_X^2 \right).$$
Proof. This proof is not examinable. Throughout this proof, we will denote expressions that depend on constants and on $y$, but not on $x$, by $C_1, C_2$, etc. With this convention, we can write
$$f_{X\mid Y}(x\mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = C_1 \cdot f_{X,Y}(x,y)$$
$$= C_2 \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 - 2\rho \frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X \sigma_Y} \right] \right\}$$
$$= C_2 \cdot \exp\Bigg\{ -\frac{1}{2(1-\rho^2)} \Bigg[ \frac{x^2}{\sigma_X^2} - \frac{2x\mu_X}{\sigma_X^2} + \underbrace{\frac{\mu_X^2}{\sigma_X^2}}_{\text{no } x} + \underbrace{\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2}_{\text{no } x} - 2\rho \frac{x(y-\mu_Y)}{\sigma_X \sigma_Y} + \underbrace{2\rho \frac{\mu_X(y-\mu_Y)}{\sigma_X \sigma_Y}}_{\text{no } x} \Bigg] \Bigg\}$$
$$= C_3 \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)\sigma_X^2} \left[ x^2 - 2x\mu_X - 2\rho \frac{\sigma_X}{\sigma_Y}\, x(y-\mu_Y) \right] \right\}$$
$$= C_3 \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)\sigma_X^2} \left[ x^2 - 2x\left( \mu_X + \rho \frac{\sigma_X}{\sigma_Y}(y-\mu_Y) \right) \right] \right\}.$$
We now complete the square:
$$x^2 - 2x\left( \mu_X + \rho \frac{\sigma_X}{\sigma_Y}(y-\mu_Y) \right) = \left( x - \mu_X - \rho \frac{\sigma_X}{\sigma_Y}(y-\mu_Y) \right)^2 + \tilde{C},$$
where $\tilde{C}$ does not depend on $x$. Hence,
$$f_{X\mid Y}(x\mid y) = C_4 \cdot \exp\left\{ -\frac{\left( x - \mu_X - \rho \frac{\sigma_X}{\sigma_Y}(y-\mu_Y) \right)^2}{2(1-\rho^2)\sigma_X^2} \right\}.$$
On the other hand, integrating the density of $N\!\left( \mu_X + \rho \frac{\sigma_X}{\sigma_Y}(y - \mu_Y),\ (1-\rho^2)\sigma_X^2 \right)$, we have
$$1 = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi(1-\rho^2)\sigma_X^2}} \exp\left\{ -\frac{\left( x - \mu_X - \rho \frac{\sigma_X}{\sigma_Y}(y-\mu_Y) \right)^2}{2(1-\rho^2)\sigma_X^2} \right\} dx. \tag{6}$$
Since $f_{X\mid Y}(\cdot\mid y)$ is also a probability density function, comparing these two expressions gives $C_4 = \frac{1}{\sqrt{2\pi(1-\rho^2)\sigma_X^2}}$, so $f_{X\mid Y}(\cdot\mid y)$ is exactly the density of $N\!\left( \mu_X + \rho \frac{\sigma_X}{\sigma_Y}(y - \mu_Y),\ (1-\rho^2)\sigma_X^2 \right)$, as claimed.
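Proposition 2 can also be illustrated by simulation: among samples whose $Y$-coordinate falls close to a fixed $y$, the $X$-coordinates should have approximately the stated conditional mean and variance. The sketch below uses arbitrarily chosen parameters and a small conditioning window:

```python
# Check Proposition 2 by conditioning on Y being near a fixed value y0.
import numpy as np

rng = np.random.default_rng(2)
mx, sx, my, sy, rho = 0.0, 1.0, 1.0, 2.0, 0.7
cov = [[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]]

samples = rng.multivariate_normal([mx, my], cov, size=2_000_000)
y0 = 3.0
x_cond = samples[np.abs(samples[:, 1] - y0) < 0.05, 0]   # samples with Y near y0

print("mean:", x_cond.mean(), "vs", mx + rho * (sx / sy) * (y0 - my))
print("var: ", x_cond.var(),  "vs", (1 - rho**2) * sx**2)
```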
Proposition 3. Assume that $(X, Y) \sim N(\mu_X, \sigma_X^2, \mu_Y, \sigma_Y^2, \rho)$. Then $X$ and $Y$ are independent if and only if $\rho = 0$.
Proof. Recall that $X$ and $Y$ are independent if and only if $f_{X,Y}(x,y) = f_X(x) f_Y(y)$ holds for all $x, y$. Since $f_Y(y)$ is (strictly) positive when $Y$ is normally distributed, we have $f_{X,Y}(x,y) = f_X(x) f_Y(y)$ for all $x, y$ if and only if $f_{X\mid Y}(x\mid y) = f_X(x)$ for all $x, y$. By the above proposition, this holds if and only if $\rho = 0$.
Remark. We have seen earlier that for two random variables $X$ and $Y$, independence implies $\rho_{X,Y} = 0$, but the converse does not hold in general. We now see that, if we also know that $(X, Y)$ follows a bivariate normal distribution, then both directions hold, that is, $X$ and $Y$ are independent if and only if $\rho_{X,Y} = 0$.