
ST119: Probability 2 Lecture notes for Week 5

Conditional distributions: discrete case


Recall that, for two events A, B with P(B) > 0, we define the conditional probability of A
given B as
\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)}.
\]
Also recall that this can be interpreted as our “updated degree of belief that A occurs, after
we are told that B occurs”. For example, when rolling a fair die, the conditional probability
that the result is 2 conditional on the result being an even number is 1/3.
We now want to extend the notion of conditioning to random variables, starting with the dis-
crete case.
Definition. Let X and Y be discrete random variables. The conditional probability mass
function of X given Y is the function

\[
p_{X|Y}(x|y) = P(X = x \mid Y = y) = \frac{p_{X,Y}(x,y)}{p_Y(y)}, \qquad \text{for } x \in \mathrm{supp}(X),\ y \in \mathrm{supp}(Y).
\]
Hence, pX|Y (x|y) is the probability that X = x, given that Y = y. Let us make some comments
about this definition:
• Note that, for all y ∈ supp(Y ),
\[
\sum_x p_{X|Y}(x|y) = \sum_x \frac{p_{X,Y}(x,y)}{p_Y(y)} = \frac{p_Y(y)}{p_Y(y)} = 1.
\]

So, for each fixed y, the function that maps x into pX|Y (x|y) is a probability mass func-
tion. We refer to the distribution associated to this probability mass function as the
distribution of X given that Y = y.

• The following equalities are often very useful:


\[
p_X(x) = \sum_y p_{X,Y}(x,y) = \sum_y p_{X|Y}(x|y) \cdot p_Y(y). \tag{1}
\]

• Recall that two discrete random variables X and Y are independent if and only if pX,Y (x, y) =
pX (x)pY (y) for all x, y. This is equivalent to saying that pX|Y (x|y) = pX (x) for all x and
all y ∈ supp(Y ).

Example. Let X and Y be discrete with joint probability mass function given by the following
table:
x\y     1       2       3
0       1/12    1/12    1/12
1       0       1/2     1/4

Let us find pX|Y (x|y) for all choices of x and y. Note first that

\[
p_Y(1) = \tfrac{1}{12} + 0 = \tfrac{1}{12}, \qquad p_Y(2) = \tfrac{1}{12} + \tfrac{1}{2} = \tfrac{7}{12}, \qquad p_Y(3) = \tfrac{1}{12} + \tfrac{1}{4} = \tfrac{1}{3}.
\]

Then,

\[
p_{X|Y}(0|1) = \frac{1/12}{1/12} = 1, \qquad p_{X|Y}(1|1) = \frac{0}{1/12} = 0,
\]
\[
p_{X|Y}(0|2) = \frac{1/12}{7/12} = \frac{1}{7}, \qquad p_{X|Y}(1|2) = \frac{1/2}{7/12} = \frac{6}{7},
\]
\[
p_{X|Y}(0|3) = \frac{1/12}{1/3} = \frac{1}{4}, \qquad p_{X|Y}(1|3) = \frac{1/4}{1/3} = \frac{3}{4}.
\]

Note that pX|Y (x|y) is the proportion that pX,Y (x, y) represents of the total mass given
by pY (y) (in the table: the proportion that the entry in position (x, y) represents of the
total mass of its column).
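To make the column-proportion interpretation concrete, here is a small Python sketch (not part of the original notes) that recomputes these conditional probabilities directly from the joint table:

```python
from fractions import Fraction as F

# Joint pmf p_{X,Y}(x, y) from the table above: rows x = 0, 1 and columns y = 1, 2, 3.
p_XY = {
    (0, 1): F(1, 12), (0, 2): F(1, 12), (0, 3): F(1, 12),
    (1, 1): F(0),     (1, 2): F(1, 2),  (1, 3): F(1, 4),
}

# Marginal of Y: sum over each column.
p_Y = {y: sum(p_XY[(x, y)] for x in (0, 1)) for y in (1, 2, 3)}

# Conditional pmf of X given Y: each entry divided by its column total.
p_X_given_Y = {(x, y): p_XY[(x, y)] / p_Y[y] for (x, y) in p_XY}

for y in (1, 2, 3):
    print(f"y = {y}:", p_X_given_Y[(0, y)], p_X_given_Y[(1, y)])
# y = 1: 1 0    y = 2: 1/7 6/7    y = 3: 1/4 3/4
```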

Example. Let Y ∼ Bin(n, p). Let q ∈ (0, 1) and suppose that X is a random variable
with supp(X) = {0, . . . , n} and such that, for each y ∈ {0, . . . , n},
\[
p_{X|Y}(x|y) = \binom{y}{x}\, q^x (1-q)^{y-x} \quad \text{for } x \in \{0, \dots, y\}, \text{ and } 0 \text{ otherwise}.
\]

We can understand the random variable X as a result of a two-step procedure:

• we first sample Y ∼ Bin(n, p); say its value is y;

• we use this y to now sample a Bin(y, q) random variable.

Let us find pX . For each x ∈ {0, . . . , n}, as in (1) we have


\[
p_X(x) = \sum_{y=0}^{n} p_{X|Y}(x|y) \cdot p_Y(y).
\]

Noting that pX|Y (x|y) = 0 if y < x, the above becomes


\[
\sum_{y=x}^{n} p_{X|Y}(x|y) \cdot p_Y(y)
= \sum_{y=x}^{n} \frac{y!}{x!(y-x)!}\, q^x (1-q)^{y-x} \cdot \frac{n!}{y!(n-y)!}\, p^y (1-p)^{n-y}
\]
\[
= \frac{n!}{x!(n-x)!}\,(pq)^x \sum_{y=x}^{n} \frac{(n-x)!}{(y-x)!(n-y)!}\,(1-q)^{y-x}\, p^{y-x}\, (1-p)^{n-y}.
\]

Changing ℓ = y − x, this becomes
\[
\frac{n!}{x!(n-x)!}\,(pq)^x \sum_{\ell=0}^{n-x} \frac{(n-x)!}{\ell!(n-x-\ell)!}\,(1-q)^{\ell}\, p^{\ell}\, (1-p)^{n-\ell-x}
\]
\[
= \frac{n!}{x!(n-x)!}\,(pq)^x \big(p(1-q) + (1-p)\big)^{n-x}
= \frac{n!}{x!(n-x)!}\,(pq)^x (1-pq)^{n-x}.
\]

This shows that X ∼ Bin(n, pq).
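The two-step description of X also suggests a quick simulation check of this result (a sketch, not part of the original notes; the values n = 10, p = 0.6, q = 0.3 are arbitrary illustrative choices):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
n, p, q = 10, 0.6, 0.3           # hypothetical parameter values for illustration
N = 200_000                      # number of simulated repetitions

Y = rng.binomial(n, p, size=N)   # step 1: Y ~ Bin(n, p)
X = rng.binomial(Y, q)           # step 2: given Y = y, sample X ~ Bin(y, q)

# Compare the empirical pmf of X with the Bin(n, pq) pmf derived above.
for x in range(4):
    empirical = np.mean(X == x)
    exact = comb(n, x) * (p * q) ** x * (1 - p * q) ** (n - x)
    print(x, round(empirical, 4), round(exact, 4))
```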

Example. Let X and Y be independent with X ∼ Poi(λ) and Y ∼ Poi(µ). Let Z = X + Y .
Find pX|Z (x|z).
Solution. Recall from the Week 3 lecture notes (first proposition) that Z ∼ Poi(λ + µ). Note
that, given that Z = z, X can take any value in {0, . . . , z}. Hence, pX|Z (x|z) is defined for
all pairs (x, z) such that z ∈ N0 and all x ∈ {0, . . . , z}. We compute

\[
p_{X|Z}(x|z) = \frac{p_{X,Z}(x,z)}{p_Z(z)} = \frac{p_{X,Y}(x, z-x)}{p_Z(z)} = \frac{p_X(x)\, p_Y(z-x)}{p_Z(z)},
\]

where the last equality follows from the independence between X and Y . The right-hand side
equals
\[
\frac{\dfrac{\lambda^x}{x!}e^{-\lambda} \cdot \dfrac{\mu^{z-x}}{(z-x)!}e^{-\mu}}{\dfrac{(\lambda+\mu)^z}{z!}e^{-(\lambda+\mu)}}
= \frac{z!}{x!(z-x)!} \cdot \left(\frac{\lambda}{\lambda+\mu}\right)^{x} \cdot \left(\frac{\mu}{\lambda+\mu}\right)^{z-x}.
\]

This shows that, conditionally on Z = z, the distribution of X is Bin(z, λ/(λ + µ)).
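As before, this can be checked by simulation (a sketch, not from the original notes; the values λ = 2, µ = 3, z = 5 are arbitrary):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(1)
lam, mu, z = 2.0, 3.0, 5         # hypothetical parameter values for illustration
N = 500_000

X = rng.poisson(lam, size=N)
Y = rng.poisson(mu, size=N)
Z = X + Y

X_given_Z = X[Z == z]            # keep only the samples where Z = z
r = lam / (lam + mu)             # success probability of the claimed Bin(z, r)

for x in range(z + 1):
    empirical = np.mean(X_given_Z == x)
    exact = comb(z, x) * r**x * (1 - r) ** (z - x)
    print(x, round(empirical, 4), round(exact, 4))
```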

Conditional distributions: continuous case


Definition. Let X and Y be jointly continuous random variables. The conditional prob-
ability density function of X given Y is the function

\[
f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)},
\]

defined for all x ∈ R and all y such that fY (y) > 0.

As in the discrete case, let us make some comments about this definition.
• As a function of x for fixed y, fX|Y (x|y) is a probability density function since
\[
\int_{-\infty}^{\infty} f_{X|Y}(x|y)\, dx = \int_{-\infty}^{\infty} \frac{f_{X,Y}(x,y)}{f_Y(y)}\, dx = \frac{1}{f_Y(y)} \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dx = 1.
\]
We refer to the distribution that corresponds to this probability density function as the
distribution of X given that Y = y. Note that this is only a manner of speaking: formally,
{Y = y} is an event of probability zero, so we are not actually allowed to condition on it.
• While in the discrete case we had $p_{X|Y}(x|y) = \frac{P(X = x,\, Y = y)}{P(Y = y)}$, it does not make sense to
write such a quotient for $f_{X|Y}(x|y)$ (both the numerator and the denominator are zero!).
• Assume that $A \subseteq \mathbb{R}^2$ is a set of the form $A = \{(x,y) : x_1 \le x \le x_2,\ a(x) \le y \le b(x)\}$,
that is, the region between the graphs of two functions $a$ and $b$ over the interval $[x_1, x_2]$.
Then, we have
\[
P((X,Y) \in A) = \int_{x_1}^{x_2}\!\int_{a(x)}^{b(x)} f_{X,Y}(x,y)\, dy\, dx = \int_{x_1}^{x_2} f_X(x) \int_{a(x)}^{b(x)} f_{Y|X}(y|x)\, dy\, dx.
\]

• As in the discrete case, X and Y are independent if and only if fX|Y (x|y) = fX (x) for
all y such that fY (y) > 0.
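As an illustration of the continuous definition (an invented example, not from the original notes), take the joint density fX,Y(x, y) = 2 for 0 < y < x < 1 and 0 otherwise. Then fY(y) = 2(1 − y) and fX|Y(x|y) = 1/(1 − y) for y < x < 1, i.e. given Y = y, X is uniform on (y, 1). Since {Y = y} has probability zero, a simulation can only approximate the conditioning by a thin band around y:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000

X = np.sqrt(rng.uniform(size=N))   # X has marginal density f_X(x) = 2x on (0, 1)
Y = rng.uniform(size=N) * X        # given X = x, Y is uniform on (0, x), so f_{X,Y} = 2 on the triangle

# Approximate conditioning on {Y = y0} by a thin band around y0.
y0, eps = 0.4, 0.005
X_band = X[np.abs(Y - y0) < eps]

# If X | Y = y0 is uniform on (y0, 1), its mean should be close to (y0 + 1) / 2 = 0.7.
print(X_band.mean())
```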

The bivariate normal distribution


We will now study an important distribution, which is a two-dimensional version of the
normal distribution. Remembering its joint p.d.f. is not required for the exam.
Definition. Two random variables X and Y have a bivariate normal distribution with
parameters
\[
\mu_X \in \mathbb{R}, \quad \mu_Y \in \mathbb{R}, \quad \sigma_X^2 > 0, \quad \sigma_Y^2 > 0, \quad \rho \in (-1, 1)
\]
if the joint probability density function of X and Y is:

\[
f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 - 2\rho\,\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} \right] \right\}
\]

for x, y ∈ R (the notation exp{z} means e^z). We write
\[
(X, Y) \sim N(\mu_X, \sigma_X^2, \mu_Y, \sigma_Y^2, \rho).
\]

We will now see several properties of this joint distribution, starting with the marginals.
Proposition 1. If (X, Y) ∼ N(µX, σX², µY, σY², ρ), then X ∼ N(µX, σX²) and Y ∼ N(µY, σY²).
Proof. This proof is not examinable. We will only prove the statement for X (since the
statement for Y is treated in the same way, or even better, by symmetry). In order to render
the expression for fX,Y (x, y) more manageable, we let w = (y − µY)/σY, so that
\[
\left(\frac{x-\mu_X}{\sigma_X}\right)^2 + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 - 2\rho\,\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} = \left(\frac{x-\mu_X}{\sigma_X}\right)^2 + w^2 - \frac{2\rho w (x-\mu_X)}{\sigma_X}. \tag{2}
\]
We now complete the square as follows:
\[
w^2 - \frac{2\rho w (x-\mu_X)}{\sigma_X} = \left( w - \frac{\rho(x-\mu_X)}{\sigma_X} \right)^2 - \rho^2 \left( \frac{x-\mu_X}{\sigma_X} \right)^2.
\]
With this at hand, the right-hand side of (2) becomes
\[
\left( w - \frac{\rho(x-\mu_X)}{\sigma_X} \right)^2 + (1-\rho^2) \left( \frac{x-\mu_X}{\sigma_X} \right)^2.
\]
We then have
\[
f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left( w - \frac{\rho(x-\mu_X)}{\sigma_X} \right)^2 + (1-\rho^2)\left( \frac{x-\mu_X}{\sigma_X} \right)^2 \right] \right\}
\]
\[
= \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)} \left( w - \frac{\rho(x-\mu_X)}{\sigma_X} \right)^2 - \frac{(x-\mu_X)^2}{2\sigma_X^2} \right\}. \tag{3}
\]

Now, we compute the marginal probability density function of X using
\[
f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dy,
\]
replacing the expression for fX,Y (x, y) by what we obtained in (3), and using the substitution
w = (y − µY)/σY (which gives dy = σY dw). We then obtain

\[
f_X(x) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \cdot \exp\left\{ -\frac{(x-\mu_X)^2}{2\sigma_X^2} \right\} \cdot \sigma_Y \int_{-\infty}^{\infty} \exp\left\{ -\frac{1}{2(1-\rho^2)}\left( w - \frac{\rho(x-\mu_X)}{\sigma_X} \right)^2 \right\} dw
\]
\[
= \frac{1}{\sqrt{2\pi}\,\sigma_X} \cdot \exp\left\{ -\frac{(x-\mu_X)^2}{2\sigma_X^2} \right\} \times \frac{1}{\sqrt{2\pi(1-\rho^2)}} \int_{-\infty}^{\infty} \exp\left\{ -\frac{1}{2(1-\rho^2)}\left( w - \frac{\rho(x-\mu_X)}{\sigma_X} \right)^2 \right\} dw
\]
\[
= \frac{1}{\sqrt{2\pi}\,\sigma_X} \cdot \exp\left\{ -\frac{(x-\mu_X)^2}{2\sigma_X^2} \right\},
\]
where the last equality holds because the remaining factor is the integral of the N(ρ(x − µX)/σX, 1 − ρ²) density, which equals 1.

As you may have guessed, the value ρ ∈ (−1, 1) is the correlation coefficient between X and Y ,
but we will not prove that now. Next, we consider conditional density functions:
Proposition 2. Assume that (X, Y) ∼ N(µX, σX², µY, σY², ρ). Then, conditionally on Y = y,
the distribution of X is
\[
N\!\left( \mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y),\; (1-\rho^2)\,\sigma_X^2 \right).
\]

Proof. This proof is not examinable. Throughout this proof, we will denote expressions
that depend on constants and on y, but not on x, by C1 , C2 , etc. With this convention, we
can write
\[
f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = C_1 \cdot f_{X,Y}(x,y)
\]
\[
= C_2 \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 - 2\rho\,\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} \right] \right\}
\]
\[
= C_2 \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{x^2}{\sigma_X^2} - \frac{2x\mu_X}{\sigma_X^2} + \underbrace{\frac{\mu_X^2}{\sigma_X^2}}_{\text{no } x} + \underbrace{\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2}_{\text{no } x} - 2\rho\,\frac{x(y-\mu_Y)}{\sigma_X\sigma_Y} + \underbrace{2\rho\,\frac{\mu_X(y-\mu_Y)}{\sigma_X\sigma_Y}}_{\text{no } x} \right] \right\}
\]
\[
= C_3 \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)\sigma_X^2} \left[ x^2 - 2x\mu_X - 2\rho\,\frac{\sigma_X}{\sigma_Y}\, x(y-\mu_Y) \right] \right\}
\]
\[
= C_3 \cdot \exp\left\{ -\frac{1}{2(1-\rho^2)\sigma_X^2} \left[ x^2 - 2x\left(\mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right) \right] \right\}.
\]

We now complete the square:
\[
x^2 - 2x\left(\mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right) = \left( x - \left(\mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right) \right)^2 + \tilde{C},
\]

where C̃ again does not depend on x. We have thus obtained
\[
f_{X|Y}(x|y) = C_4 \cdot \exp\left\{ -\frac{\left( x - \left(\mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right) \right)^2}{2(1-\rho^2)\sigma_X^2} \right\}. \tag{4}
\]

We now observe that
\[
1 = \int_{-\infty}^{\infty} f_{X|Y}(x|y)\, dx = C_4 \int_{-\infty}^{\infty} \exp\left\{ -\frac{\left( x - \left(\mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right) \right)^2}{2(1-\rho^2)\sigma_X^2} \right\} dx. \tag{5}
\]
On the other hand, integrating the density of N(µX + ρ(σX/σY)(y − µY), (1 − ρ²)σX²), we have
\[
1 = \frac{1}{\sqrt{2\pi(1-\rho^2)\sigma_X^2}} \int_{-\infty}^{\infty} \exp\left\{ -\frac{\left( x - \left(\mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right) \right)^2}{2(1-\rho^2)\sigma_X^2} \right\} dx. \tag{6}
\]

Comparing (5) and (6) shows that
\[
C_4 = \frac{1}{\sqrt{2\pi(1-\rho^2)\sigma_X^2}},
\]

which together with (4) completes the proof.
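As with the marginal, Proposition 2 can be checked numerically by approximating the conditioning on {Y = y} with a thin band around y (a sketch with arbitrary parameter values, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(4)
mu_X, mu_Y, sig_X, sig_Y, rho = 1.0, -2.0, 2.0, 0.5, 0.7   # hypothetical parameters
cov = [[sig_X**2,             rho * sig_X * sig_Y],
       [rho * sig_X * sig_Y,  sig_Y**2]]

XY = rng.multivariate_normal([mu_X, mu_Y], cov, size=2_000_000)
X, Y = XY[:, 0], XY[:, 1]

# Approximate conditioning on {Y = y0} by a thin band around y0.
y0, eps = -1.5, 0.01
X_band = X[np.abs(Y - y0) < eps]

cond_mean = mu_X + rho * sig_X / sig_Y * (y0 - mu_Y)   # claimed conditional mean (= 2.4 here)
cond_var = (1 - rho**2) * sig_X**2                      # claimed conditional variance (= 2.04 here)
print(X_band.mean(), cond_mean)
print(X_band.var(), cond_var)
```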


Proposition 3. Assume that (X, Y) ∼ N(µX, σX², µY, σY², ρ). Then, X and Y are independent if and only if ρ = 0.

Proof. Recall that X and Y are independent if and only if fX,Y (x, y) = fX (x)fY (y) holds for
all x, y. Since fY (y) is (strictly) positive when Y is normally distributed, we have fX,Y (x, y) =
fX (x)fY (y) for all x, y if and only if fX|Y (x|y) = fX (x) for all x, y. By the above proposition,
we can see that this holds if and only if ρ = 0.
Remark. We have seen earlier that for two random variables X and Y , we have that

X, Y independent implies ρX,Y = 0, but ρX,Y = 0 does not imply X, Y independent.

We now see that, if we also know that (X, Y) follows a bivariate normal distribution, then both
directions hold, that is, X, Y are independent if and only if ρX,Y = 0.
