
ST119: Probability 2 Lecture notes for Week 5

Conditional distributions: discrete case


Recall that, for two events A, B with P(B) > 0, we define the conditional probability of A
given B as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Also recall that this can be interpreted as our “updated degree of belief that A occurs, after
we are told that B occurs”. For example, when rolling a fair die, the conditional probability
that the result is 2 conditional on the result being an even number is 1/3.
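To make this concrete, here is a minimal Python sketch that recovers the value 1/3 in the die example by enumerating the six equally likely outcomes (the variable names and the use of Python are ours, purely for illustration).

```python
from fractions import Fraction

# Fair die: six equally likely outcomes, each with probability 1/6.
A = {2}            # event "the result is 2"
B = {2, 4, 6}      # event "the result is an even number"

P_B = Fraction(len(B), 6)
P_A_and_B = Fraction(len(A & B), 6)

# P(A | B) = P(A ∩ B) / P(B)
print(P_A_and_B / P_B)   # prints 1/3
```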
We now want to extend the notion of conditioning to random variables, starting with the
discrete case.
Definition. Let X and Y be discrete random variables. The conditional probability mass
function of X given Y is the function

$$p_{X|Y}(x \mid y) = P(X = x \mid Y = y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}, \quad \text{for } x \in \operatorname{supp}(X),\ y \in \operatorname{supp}(Y).$$
Hence, pX|Y (x|y) is the probability that X = x, given that Y = y. Let us make some comments
about this definition:
• Note that, for all y ∈ supp(Y ),
$$\sum_x p_{X|Y}(x \mid y) = \sum_x \frac{p_{X,Y}(x, y)}{p_Y(y)} = \frac{p_Y(y)}{p_Y(y)} = 1.$$

So, for each fixed y, the function that maps x into pX|Y (x|y) is a probability mass
function. We refer to the distribution associated to this probability mass function as the
distribution of X given that Y = y.

• The following equalities are often very useful:
$$p_X(x) = \sum_y p_{X,Y}(x, y) = \sum_y p_{X|Y}(x \mid y) \cdot p_Y(y). \tag{1}$$

• Recall that two discrete random variables X and Y are independent if and only if pX,Y (x, y) =
pX (x)pY (y) for all x, y. This is equivalent to saying that pX|Y (x|y) = pX (x) for all x and
all y ∈ supp(Y ).

Example. Let X and Y be discrete with joint probability mass function given by the following
table:
 x\y |  1     2     3
 ----+-----------------
  0  | 1/12  1/12  1/12
  1  |  0    1/2   1/4

Let us find pX|Y (x|y) for all choices of x and y. Note first that

$$p_Y(1) = \tfrac{1}{12} + 0 = \tfrac{1}{12}, \qquad p_Y(2) = \tfrac{1}{12} + \tfrac{1}{2} = \tfrac{7}{12}, \qquad p_Y(3) = \tfrac{1}{12} + \tfrac{1}{4} = \tfrac{1}{3}.$$

Then,

$$p_{X|Y}(0 \mid 1) = \frac{1/12}{1/12} = 1, \qquad p_{X|Y}(1 \mid 1) = \frac{0}{1/12} = 0,$$
$$p_{X|Y}(0 \mid 2) = \frac{1/12}{7/12} = \frac{1}{7}, \qquad p_{X|Y}(1 \mid 2) = \frac{1/2}{7/12} = \frac{6}{7},$$
$$p_{X|Y}(0 \mid 3) = \frac{1/12}{1/3} = \frac{1}{4}, \qquad p_{X|Y}(1 \mid 3) = \frac{1/4}{1/3} = \frac{3}{4}.$$

Note that pX|Y (x|y) is the proportion that pX,Y (x, y) represents of the total mass given
by pY (y) (in the table: the proportion that the entry in position (x, y) represents of the
total mass of its column).
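The computation above can be checked mechanically. The following Python sketch (illustrative only; exact fractions are used to avoid rounding) stores the joint pmf from the table, recovers the marginal pY as the column sums as in (1), and divides each entry by the total mass of its column.

```python
from fractions import Fraction as F

# Joint pmf from the table above: p_joint[(x, y)] = p_{X,Y}(x, y).
p_joint = {
    (0, 1): F(1, 12), (0, 2): F(1, 12), (0, 3): F(1, 12),
    (1, 1): F(0),     (1, 2): F(1, 2),  (1, 3): F(1, 4),
}

# Marginal p_Y(y): sum over x of the entries in column y, as in equation (1).
p_Y = {y: sum(p for (x, yy), p in p_joint.items() if yy == y) for y in (1, 2, 3)}

# Conditional pmf: each entry divided by the total mass of its column.
p_cond = {(x, y): p / p_Y[y] for (x, y), p in p_joint.items()}

print(p_Y)             # {1: Fraction(1, 12), 2: Fraction(7, 12), 3: Fraction(1, 3)}
print(p_cond[(1, 2)])  # 6/7, matching the value computed above
```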

Example. Let Y ∼ Bin(n, p). Let q ∈ (0, 1) and suppose that X is a random variable
with supp(X) = {0, . . . , n} and such that, for each y ∈ {0, . . . , n},
 
$$p_{X|Y}(x \mid y) = \binom{y}{x} \cdot q^x \cdot (1-q)^{y-x} \quad \text{for } x \in \{0, \ldots, y\}, \text{ and } 0 \text{ otherwise.}$$

We can understand the random variable X as a result of a two-step procedure:

• we first sample Y ∼ Bin(n, p); say its value is y;

• we use this y to now sample a Bin(y, q) random variable.

Let us find pX . For each x ∈ {0, . . . , n}, as in (1) we have


$$p_X(x) = \sum_{y=0}^{n} p_{X|Y}(x \mid y) \cdot p_Y(y).$$

Noting that pX|Y (x|y) = 0 if y < x, the above becomes


$$\sum_{y=x}^{n} p_{X|Y}(x \mid y) \cdot p_Y(y) = \sum_{y=x}^{n} \frac{y!}{x!(y-x)!} \cdot q^x \cdot (1-q)^{y-x} \cdot \frac{n!}{y!(n-y)!} \cdot p^y \cdot (1-p)^{n-y}$$
$$= \frac{n!}{x!(n-x)!} \cdot (pq)^x \cdot \sum_{y=x}^{n} \frac{(n-x)!}{(y-x)!(n-y)!} \cdot (1-q)^{y-x} \cdot p^{y-x} \cdot (1-p)^{n-y}.$$

Changing variables to ℓ = y − x, this becomes
$$\frac{n!}{x!(n-x)!} \cdot (pq)^x \cdot \sum_{\ell=0}^{n-x} \frac{(n-x)!}{\ell!(n-x-\ell)!} \cdot (1-q)^{\ell} \cdot p^{\ell} \cdot (1-p)^{n-\ell-x}$$
$$= \frac{n!}{x!(n-x)!} \cdot (pq)^x \cdot \bigl(p(1-q) + (1-p)\bigr)^{n-x}$$
$$= \frac{n!}{x!(n-x)!} \cdot (pq)^x \cdot (1-pq)^{n-x},$$
where the sum over ℓ was evaluated using the binomial theorem.

This shows that X ∼ Bin(n, pq).
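The identity X ∼ Bin(n, pq) can also be checked by simulation. The sketch below (illustrative; the parameter values n = 20, p = 0.6, q = 0.3 are arbitrary) carries out the two-step procedure described above many times and compares the empirical frequencies of X with the pmf of Bin(n, pq).

```python
import numpy as np
from scipy import stats

n, p, q = 20, 0.6, 0.3          # arbitrary illustrative parameters
rng = np.random.default_rng(0)

# Two-step procedure: first Y ~ Bin(n, p), then X | Y = y ~ Bin(y, q).
Y = rng.binomial(n, p, size=200_000)
X = rng.binomial(Y, q)

# The empirical pmf of X should be close to that of Bin(n, pq).
for x in range(6):
    print(x, round((X == x).mean(), 4), round(stats.binom.pmf(x, n, p * q), 4))
```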

Example. Let X and Y be independent with X ∼ Poi(λ) and Y ∼ Poi(µ). Let Z = X + Y .
Find pX|Z (x|z).
Solution. Recall from the Week 3 lecture notes (first proposition) that Z ∼ Poi(λ + µ). Note
that, given that Z = z, X can take any value in {0, . . . , z}. Hence, pX|Z (x|z) is defined for
all pairs (x, z) such that z ∈ N0 and all x ∈ {0, . . . , z}. We compute

$$p_{X|Z}(x \mid z) = \frac{p_{X,Z}(x, z)}{p_Z(z)} = \frac{p_{X,Y}(x, z-x)}{p_Z(z)} = \frac{p_X(x) \cdot p_Y(z-x)}{p_Z(z)},$$

where the last equality follows from the independence between X and Y . The right-hand side
equals
$$\frac{\dfrac{\lambda^x}{x!} e^{-\lambda} \cdot \dfrac{\mu^{z-x}}{(z-x)!} e^{-\mu}}{\dfrac{(\lambda+\mu)^z}{z!} e^{-(\lambda+\mu)}} = \frac{z!}{x!(z-x)!} \cdot \left(\frac{\lambda}{\lambda+\mu}\right)^{x} \cdot \left(\frac{\mu}{\lambda+\mu}\right)^{z-x}.$$

This shows that, conditionally on Z = z, the distribution of X is Bin(z, λ/(λ + µ)).
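Again, a quick simulation can support the calculation. The following sketch (with arbitrary illustrative values λ = 3, µ = 5 and z = 7) samples independent Poisson variables, keeps only the samples with X + Y = z, and compares the empirical conditional frequencies with the Bin(z, λ/(λ + µ)) pmf.

```python
import numpy as np
from scipy import stats

lam, mu, z = 3.0, 5.0, 7        # arbitrary illustrative parameters
rng = np.random.default_rng(1)

X = rng.poisson(lam, size=1_000_000)
Y = rng.poisson(mu, size=1_000_000)
Z = X + Y

# Conditional frequencies of X on the slice {Z = z} vs the Bin(z, lam/(lam+mu)) pmf.
X_given_z = X[Z == z]
for x in range(z + 1):
    print(x, round((X_given_z == x).mean(), 3),
          round(stats.binom.pmf(x, z, lam / (lam + mu)), 3))
```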

Conditional distributions: continuous case


Definition. Let X and Y be jointly continuous random variables. The conditional
probability density function of X given Y is the function
$$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)},$$
defined for all x ∈ R and all y such that fY (y) > 0.

As in the discrete case, let us make some comments about this definition.
• As a function of x for fixed y, fX|Y (x|y) is a probability density function since
$$\int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\, dx = \int_{-\infty}^{\infty} \frac{f_{X,Y}(x, y)}{f_Y(y)}\, dx = \frac{1}{f_Y(y)} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx = 1.$$
We refer to the distribution that corresponds to this probability density function as the
distribution of X given that Y = y. Note that this is just a manner of speaking, since
formally {Y = y} is an event of probability zero, so we are not really allowed to condition
on it.
• While in the discrete case we had pX|Y (x|y) = P(X = x, Y = y)/P(Y = y), it does not make
sense to write such a quotient for fX|Y (x|y) (both the numerator and the denominator are zero!).
• Assume that A ⊆ R² is a set of the form
$$A = \{(x, y) : x_1 \le x \le x_2,\ a(x) \le y \le b(x)\},$$
the region between the graphs of two functions a(x) ≤ b(x) over the interval [x1, x2]. Then, we have
$$P((X, Y) \in A) = \int_{x_1}^{x_2} \int_{a(x)}^{b(x)} f_{X,Y}(x, y)\, dy\, dx = \int_{x_1}^{x_2} f_X(x) \int_{a(x)}^{b(x)} f_{Y|X}(y \mid x)\, dy\, dx$$
(a numerical illustration is given after this list).

• As in the discrete case, X and Y are independent if and only if fX|Y (x|y) = fX (x) for
all x and all y such that fY (y) > 0.
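As an illustration of the double-integral identity above, the sketch below takes the joint density fX,Y (x, y) = x + y on [0, 1]² (a standard example, not taken from these notes) and the region A = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ x}, and evaluates both sides by numerical integration.

```python
from scipy import integrate

# Illustrative joint density: f_{X,Y}(x, y) = x + y on [0, 1]^2 (it integrates to 1).
def f_joint(y, x):              # dblquad expects the inner variable (y) first
    return x + y

def f_X(x):                     # marginal: integral of x + y over y in [0, 1]
    return x + 0.5

def f_Y_given_X(y, x):          # conditional density of Y given X = x
    return f_joint(y, x) / f_X(x)

# Region A = {(x, y) : 0 <= x <= 1, 0 <= y <= x}, i.e. x1 = 0, x2 = 1, a(x) = 0, b(x) = x.
lhs, _ = integrate.dblquad(f_joint, 0, 1, lambda x: 0, lambda x: x)
rhs, _ = integrate.dblquad(lambda y, x: f_X(x) * f_Y_given_X(y, x), 0, 1,
                           lambda x: 0, lambda x: x)
print(lhs, rhs)                 # both approximately 0.5
```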

The bivariate normal distribution


We will now study an important kind of distribution, namely a two-dimensional version of the
normal distribution. Remembering this joint p.d.f. is not required for the exam.
Definition. Two random variables X and Y have a bivariate normal distribution with
parameters
$$\mu_X \in \mathbb{R}, \quad \mu_Y \in \mathbb{R}, \quad \sigma_X^2 > 0, \quad \sigma_Y^2 > 0, \quad \rho \in (-1, 1)$$
if the joint probability density function of X and Y is
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 - 2\rho\,\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} \right] \right\}$$
for x, y ∈ R (the notation exp{z} means e^z). We write
$$(X, Y) \sim N(\mu_X, \sigma_X^2, \mu_Y, \sigma_Y^2, \rho).$$
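For readers who want to experiment with this density, here is a small Python sketch that evaluates it directly from the formula above and cross-checks the value against scipy's multivariate normal with the covariance matrix [[σX², ρσXσY], [ρσXσY, σY²]] corresponding to these parameters (the numerical parameter values are arbitrary).

```python
import numpy as np
from scipy import stats

def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Joint density from the definition above (sigma_x and sigma_y are standard deviations)."""
    zx, zy = (x - mu_x) / sigma_x, (y - mu_y) / sigma_y
    const = 2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho**2)
    return np.exp(-(zx**2 + zy**2 - 2 * rho * zx * zy) / (2 * (1 - rho**2))) / const

# Cross-check against scipy, using the covariance matrix that matches these parameters.
mu_x, mu_y, sx, sy, rho = 1.0, -2.0, 1.5, 0.5, 0.7
cov = [[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]]
print(bivariate_normal_pdf(0.3, -1.8, mu_x, mu_y, sx, sy, rho))
print(stats.multivariate_normal.pdf([0.3, -1.8], mean=[mu_x, mu_y], cov=cov))
```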

We will now see several properties of this joint distribution, starting with the marginals.
Proposition 1. If (X, Y ) ∼ N(µX, σX², µY, σY², ρ), then X ∼ N(µX, σX²) and Y ∼ N(µY, σY²).
Proof. This proof is not examinable. We will only prove the statement for X (since the
statement for Y is treated in the same way, or even better, by symmetry). In order to render
the expression for fX,Y (x, y) more manageable, we let w = (y − µY)/σY, so that
$$\left(\frac{x-\mu_X}{\sigma_X}\right)^2 + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 - 2\rho\,\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} = \left(\frac{x-\mu_X}{\sigma_X}\right)^2 + w^2 - \frac{2\rho w(x-\mu_X)}{\sigma_X}. \tag{2}$$
We now complete the square as follows:
$$w^2 - \frac{2\rho w(x-\mu_X)}{\sigma_X} = \left(w - \frac{\rho(x-\mu_X)}{\sigma_X}\right)^2 - \rho^2\left(\frac{x-\mu_X}{\sigma_X}\right)^2.$$
With this at hand, the right-hand side of (2) becomes
$$\left(w - \frac{\rho(x-\mu_X)}{\sigma_X}\right)^2 + (1-\rho^2)\left(\frac{x-\mu_X}{\sigma_X}\right)^2.$$
We then have
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(w - \frac{\rho(x-\mu_X)}{\sigma_X}\right)^2 + (1-\rho^2)\left(\frac{x-\mu_X}{\sigma_X}\right)^2 \right] \right\}$$
$$= \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)}\left(w - \frac{\rho(x-\mu_X)}{\sigma_X}\right)^2 - \frac{(x-\mu_X)^2}{2\sigma_X^2} \right\}. \tag{3}$$

Now, we compute the marginal probability density function of X using
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy,$$
replacing the expression for fX,Y (x, y) by what we obtained in (3), and using the substitution
w = (y − µY)/σY (which gives dy = σY dw). We then obtain

$$f_X(x) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \cdot \exp\left\{-\frac{(x-\mu_X)^2}{2\sigma_X^2}\right\} \cdot \sigma_Y \int_{-\infty}^{\infty} \exp\left\{ -\frac{1}{2(1-\rho^2)}\left(w - \frac{\rho(x-\mu_X)}{\sigma_X}\right)^2 \right\} dw$$
$$= \frac{1}{\sqrt{2\pi}\,\sigma_X} \cdot \exp\left\{-\frac{(x-\mu_X)^2}{2\sigma_X^2}\right\} \times \frac{1}{\sqrt{2\pi(1-\rho^2)}} \int_{-\infty}^{\infty} \exp\left\{ -\frac{1}{2(1-\rho^2)}\left(w - \frac{\rho(x-\mu_X)}{\sigma_X}\right)^2 \right\} dw$$
$$= \frac{1}{\sqrt{2\pi}\,\sigma_X} \cdot \exp\left\{-\frac{(x-\mu_X)^2}{2\sigma_X^2}\right\},$$
where the last equality holds because the remaining factor is the integral of the density of a normal
distribution with mean ρ(x − µX)/σX and variance 1 − ρ², which equals 1.

As you may have guessed, the value ρ ∈ (−1, 1) is the correlation coefficient between X and Y ,
but we will not prove that now. Next, we consider conditional density functions:
Proposition 2. Assume that (X, Y ) ∼ N(µX, σX², µY, σY², ρ). Then, conditionally on Y = y,
the distribution of X is
$$N\left(\mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(y - \mu_Y),\; (1-\rho^2)\sigma_X^2\right).$$

Proof. This proof is not examinable. Throughout this proof, we will denote expressions
that depend on constants and on y, but not on x, by C1 , C2 , etc. With this convention, we
can write
$$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = C_1 \cdot f_{X,Y}(x, y)$$
$$= C_2 \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 - 2\rho\,\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} \right] \right\}$$
$$= C_2 \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{x^2}{\sigma_X^2} - \frac{2x\mu_X}{\sigma_X^2} + \underbrace{\frac{\mu_X^2}{\sigma_X^2}}_{\text{no } x} + \underbrace{\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2}_{\text{no } x} - 2\rho\,\frac{x(y-\mu_Y)}{\sigma_X\sigma_Y} + \underbrace{2\rho\,\frac{\mu_X(y-\mu_Y)}{\sigma_X\sigma_Y}}_{\text{no } x} \right] \right\}$$
$$= C_3 \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)\sigma_X^2} \left[ x^2 - 2x\mu_X - 2\rho\,\frac{\sigma_X}{\sigma_Y}\,x(y-\mu_Y) \right] \right\}$$
$$= C_3 \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)\sigma_X^2} \left[ x^2 - 2x\left(\mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right) \right] \right\}.$$

We now complete the square:
$$x^2 - 2x\left(\mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right) = \left(x - \left(\mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right)\right)^2 + \tilde{C},$$

where C̃ again does not depend on x. We thus obtained


   2 
 σ
 x − µX + ρ σ (y − µY )
X 

Y
fX|Y (x|y) = C4 · exp − 2
. (4)

 2(1 − ρ2 )σX 

We now observe that


   2 
σ
 x − µX + ρ σ (y − µY )
X
∞ ∞
Z Z  

Y
1= fX|Y (x|y) dx = C4 exp − 2
dx. (5)
−∞ −∞ 
 2(1 − ρ2 )σX 

 
On the other hand, integrating the density of N(µX + ρ(σX/σY)(y − µY), (1 − ρ²)σX²), we have
$$1 = \frac{1}{\sqrt{2\pi(1-\rho^2)\sigma_X^2}} \int_{-\infty}^{\infty} \exp\left\{ -\frac{\left(x - \left(\mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right)\right)^2}{2(1-\rho^2)\sigma_X^2} \right\} dx. \tag{6}$$

Comparing (5) and (6) shows that
$$C_4 = \frac{1}{\sqrt{2\pi(1-\rho^2)\sigma_X^2}},$$

which together with (4) completes the proof.
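Proposition 2 lends itself to a simulation check: conditioning on {Y = y} can be approximated by keeping only the samples with Y in a thin band around y. The sketch below (arbitrary parameter values, band half-width 0.01) compares the empirical mean and variance of those samples of X with µX + ρ(σX/σY)(y − µY) and (1 − ρ²)σX².

```python
import numpy as np

mu_x, mu_y, sx, sy, rho = 1.0, -2.0, 1.5, 0.5, 0.7   # arbitrary illustrative parameters
y0 = -1.5
rng = np.random.default_rng(2)

cov = [[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]]
samples = rng.multivariate_normal([mu_x, mu_y], cov, size=2_000_000)
X, Y = samples[:, 0], samples[:, 1]

# Approximate conditioning on {Y = y0} by keeping samples with Y in a thin band around y0.
X_cond = X[np.abs(Y - y0) < 0.01]
print(X_cond.mean(), mu_x + rho * (sx / sy) * (y0 - mu_y))   # conditional mean
print(X_cond.var(), (1 - rho**2) * sx**2)                    # conditional variance
```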


Proposition 3. Assume that (X, Y ) ∼ N(µX, σX², µY, σY², ρ). Then, X and Y are independent
if and only if ρ = 0.

Proof. Recall that X and Y are independent if and only if fX,Y (x, y) = fX (x)fY (y) holds for
all x, y. Since fY (y) is (strictly) positive when Y is normally distributed, we have fX,Y (x, y) =
fX (x)fY (y) for all x, y if and only if fX|Y (x|y) = fX (x) for all x, y. By the above proposition,
we can see that this holds if and only if ρ = 0.
Remark. We have seen earlier that for two random variables X and Y , we have that

X, Y independent implies ρX,Y = 0, but ρX,Y = 0 does not imply X, Y independent.

We now see that, if we also know that (X, Y ) follows a bivariate normal distribution, then both
implications hold, that is, X, Y are independent if and only if ρX,Y = 0.
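For ρ = 0, the factorisation fX,Y (x, y) = fX (x)fY (y) can be checked numerically at a few points, as in the following sketch (arbitrary parameter values).

```python
from scipy import stats

mu_x, mu_y, sx, sy = 1.0, -2.0, 1.5, 0.5   # arbitrary illustrative parameters
cov = [[sx**2, 0.0], [0.0, sy**2]]         # rho = 0: the off-diagonal entries vanish

for x, y in [(0.0, -2.0), (1.3, -1.1), (-0.5, -2.7)]:
    joint = stats.multivariate_normal.pdf([x, y], mean=[mu_x, mu_y], cov=cov)
    product = stats.norm.pdf(x, mu_x, sx) * stats.norm.pdf(y, mu_y, sy)
    print(joint, product)                  # equal up to floating-point error
```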
