
§9 Conditional distribution

§9.1 Introduction
9.1.1 Let f(x, y), f_X(x) and f_Y(y) be the joint probability (density or mass) function of (X, Y) and the marginal probability functions of X and Y, respectively. Then the conditional probability function of X given Y = y is

f_{X|Y}(x|y) = f(x, y) / f_Y(y),

and the corresponding conditional cdf is

F_{X|Y}(x|y) = P(X ≤ x | Y = y) =
  Σ_{u: u ≤ x} f_{X|Y}(u|y) = Σ_{u: u ≤ x} P(X = u | Y = y)   (discrete case),
  ∫_{−∞}^{x} f_{X|Y}(u|y) du   (continuous case).
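To make the discrete case concrete, here is a minimal Python sketch (the joint pmf table below is a made-up toy example, not taken from the notes) that computes f_{X|Y}(·|y) and F_{X|Y}(·|y) directly from the definitions above:

```python
# Toy joint pmf of (X, Y) on a small grid (hypothetical values; they sum to 1).
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.20,
}

def marginal_Y(y):
    # f_Y(y) = sum over x of f(x, y)
    return sum(p for (_, yy), p in joint.items() if yy == y)

def cond_pmf_X_given_Y(x, y):
    # f_{X|Y}(x|y) = f(x, y) / f_Y(y)
    return joint.get((x, y), 0.0) / marginal_Y(y)

def cond_cdf_X_given_Y(x, y):
    # F_{X|Y}(x|y) = sum over u <= x of f_{X|Y}(u|y)
    return sum(cond_pmf_X_given_Y(u, y)
               for u in {xx for (xx, _) in joint} if u <= x)

print(cond_pmf_X_given_Y(1, 0))   # 0.30 / 0.45 ≈ 0.667
print(cond_cdf_X_given_Y(1, 0))   # (0.10 + 0.30) / 0.45 ≈ 0.889
```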

9.1.2 By conditioning on {Y = y}, we limit our scope to those outcomes of X that are possible
when Y is observed to be y.
Example. Let X = number of casualties at a road accident and Y = total weight (in tons) of the vehicles involved. Then the conditional distributions of X given different values of Y may be very different; e.g. the distribution of X | Y = 10 may differ from that of X | Y = 1.

9.1.3 Special case: Y = 1{A} (for some event A). For brevity, we write f(x|A) = f_{X|Y}(x|1) and f(x|A^c) = f_{X|Y}(x|0), and similarly for conditional cdf's.

9.1.4 X, Y are independent if and only if f_{X|Y}(x|y) = f_X(x) for all x, y.

9.1.5 Conditional distributions may similarly be defined for groups of random variables.
For example, for random variables X = (X_1, ..., X_r) and Y = (Y_1, ..., Y_s), let

f(x_1, ..., x_r, y_1, ..., y_s), f_X(x_1, ..., x_r) and f_Y(y_1, ..., y_s)

be the joint probability functions of (X, Y), X and Y, respectively. Then

(a) the conditional (joint) probability function of X given Y = (y_1, ..., y_s) is

f_{X|Y}(x_1, ..., x_r | y_1, ..., y_s) = f(x_1, ..., x_r, y_1, ..., y_s) / f_Y(y_1, ..., y_s);

(b) the conditional (joint) cdf of X given Y = (y_1, ..., y_s) is

F_{X|Y}(x_1, ..., x_r | y_1, ..., y_s) = P(X_1 ≤ x_1, ..., X_r ≤ x_r | Y_1 = y_1, ..., Y_s = y_s).

9.1.6 X, Y are independent if and only if

f_{X|Y}(x_1, ..., x_r | y_1, ..., y_s) = f_X(x_1, ..., x_r), for all x_1, ..., x_r, y_1, ..., y_s.

9.1.7 Concepts previously established for “unconditional” distributions can be obtained analogously
for conditional distributions by substituting conditional probabilities P(·|·) for P(·).

9.1.8 CONDITIONAL INDEPENDENCE


X_1, X_2, ... are conditionally independent given Y iff

P(X_1 ≤ x_1, ..., X_n ≤ x_n | Y = y) = ∏_{i=1}^{n} P(X_i ≤ x_i | Y = y)

for all x_1, ..., x_n, y ∈ (−∞, ∞) and any n ∈ {1, 2, ...}. The latter condition is equivalent to

f_{(X_1,...,X_n)|Y}(x_1, ..., x_n | y) = ∏_{i=1}^{n} f_{X_i|Y}(x_i | y)

for all x_1, ..., x_n, y ∈ (−∞, ∞) and any n ∈ {1, 2, ...}, where f_{(X_1,...,X_n)|Y}(x_1, ..., x_n | y) denotes the joint probability function of (X_1, ..., X_n) conditional on Y = y.

9.1.9 Examples.

(i) Toss a coin N times, where N ∼ Poisson(λ). Suppose the coin has probability p of turning up "head". Let X = no. of heads and Y = N − X = no. of tails. Then

f_{X|N}(x|n) = \binom{n}{x} p^x (1 − p)^{n−x} 1{x ∈ {0, 1, ..., n}},
f_{Y|N}(y|n) = \binom{n}{y} p^{n−y} (1 − p)^y 1{y ∈ {0, 1, ..., n}}.

Conditional joint mass function of (X, Y ) given N = n:

f_{(X,Y)|N}(x, y|n) = P(X = x, Y = y | N = n)
  = \binom{n}{x} p^x (1 − p)^{n−x} 1{x = n − y ∈ {0, 1, ..., n}}
  ≠ f_{X|N}(x|n) f_{Y|N}(y|n)

in general; hence X, Y are not conditionally independent given N.


Joint mass function of (X, Y ):

f(x, y) = Σ_{n=0}^{∞} f_{(X,Y)|N}(x, y|n) P(N = n)
  = Σ_{n=0}^{∞} \binom{n}{x} p^x (1 − p)^{n−x} 1{x = n − y ∈ {0, 1, ..., n}} λ^n e^{−λ} / n!
  = \binom{x+y}{x} p^x (1 − p)^y λ^{x+y} e^{−λ} / (x + y)!
  = [ (pλ)^x e^{−pλ} / x! ] [ ((1 − p)λ)^y e^{−(1−p)λ} / y! ],

so that X and Y are independent Poisson random variables with means pλ and (1 − p)λ,
respectively.
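This Poisson "thinning" result is easy to check by simulation; a minimal sketch (the sample size and parameter values are arbitrary choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, p, trials = 5.0, 0.3, 200_000

N = rng.poisson(lam, size=trials)   # number of tosses
X = rng.binomial(N, p)              # heads: X | N ~ Binomial(N, p)
Y = N - X                           # tails

# Sample means should be close to p*lam = 1.5 and (1-p)*lam = 3.5,
# and the sample correlation close to 0, consistent with independence.
print(X.mean(), Y.mean(), np.corrcoef(X, Y)[0, 1])
```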
(ii) Joint pdf:
f (x, y, z) = 40 xz 1 {x, y, z ≥ 0, x + z ≤ 1, y + z ≤ 1}.
It has been derived in Example §7.1.11(c) that, for x, y, z ∈ [0, 1],
f_X(x) = (20/3) x(1 − x)^2 (1 + 2x),   f_Y(y) = (5/3)(1 − 4y^3 + 3y^4),   f_Z(z) = 20z(1 − z)^3.
Thus, for x, y, z ∈ [0, 1], conditional pdf’s can be obtained as follows.
– Given Z

f_{(X,Y)|Z}(x, y|z) = f(x, y, z) / f_Z(z) = [ 2x 1{x ≤ 1 − z} / (1 − z)^2 ] × [ 1{y ≤ 1 − z} / (1 − z) ].

The above decomposition shows that X and Y are conditionally independent given Z (a numeric check of this factorisation appears after this example), and therefore

f_{X|(Y,Z)}(x|y, z) = f_{X|Z}(x|z) = 2x 1{x ≤ 1 − z} / (1 − z)^2,
f_{Y|(X,Z)}(y|x, z) = f_{Y|Z}(y|z) = 1{y ≤ 1 − z} / (1 − z)  ⇒  Y | (X, Z) = (x, z) ∼ U[0, 1 − z].

– Given Y

f_{(X,Z)|Y}(x, z|y) = f(x, y, z) / f_Y(y) = 24xz 1{x + z ≤ 1, z ≤ 1 − y} / (1 − 4y^3 + 3y^4)

cannot be expressed as a product of a function of x and a function of z, which implies that X and Z are not conditionally independent given Y.

From f_{(X,Z)|Y}(x, z|y) we may also derive:

f_{X|Y}(x|y) = ∫_0^1 f_{(X,Z)|Y}(x, z|y) dz = 12x (1 − max{x, y})^2 / (1 − 4y^3 + 3y^4),
f_{Z|Y}(z|y) = ∫_0^1 f_{(X,Z)|Y}(x, z|y) dx = 12z (1 − z)^2 1{z ≤ 1 − y} / (1 − 4y^3 + 3y^4),
f_{Z|(X,Y)}(z|x, y) = f_{(X,Z)|Y}(x, z|y) / f_{X|Y}(x|y) = 2z 1{z ≤ 1 − max{x, y}} / (1 − max{x, y})^2.

– Given X

f_{(Y,Z)|X}(y, z|x) = f(x, y, z) / f_X(x) = 6z 1{z ≤ 1 − x, y + z ≤ 1} / [ (1 − x)^2 (1 + 2x) ]

cannot be expressed as a product of a function of y and a function of z, which implies that Y and Z are not conditionally independent given X.

From f_{(Y,Z)|X}(y, z|x) we may also derive:

f_{Y|X}(y|x) = ∫_0^1 f_{(Y,Z)|X}(y, z|x) dz = 3 (1 − max{x, y})^2 / [ (1 − x)^2 (1 + 2x) ],
f_{Z|X}(z|x) = ∫_0^1 f_{(Y,Z)|X}(y, z|x) dy = 6z (1 − z) 1{z ≤ 1 − x} / [ (1 − x)^2 (1 + 2x) ].
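As a sanity check on this example, the factorisation given Z can be tested pointwise, and the conditional pdf's given X should each integrate to 1. A minimal sketch (the test points and the conditioning value x = 0.3 are arbitrary choices):

```python
import numpy as np
from scipy.integrate import quad

def f(x, y, z):                       # joint pdf of the example
    return 40 * x * z * ((x + z <= 1) and (y + z <= 1))

def fZ(z):                            # marginal pdf of Z
    return 20 * z * (1 - z) ** 3

# Factorisation given Z: f(x,y,z)/f_Z(z) = [2x/(1-z)^2] * [1/(1-z)] on the support.
for x, y, z in [(0.2, 0.3, 0.4), (0.1, 0.5, 0.3), (0.4, 0.1, 0.5)]:
    lhs = f(x, y, z) / fZ(z)
    rhs = (2 * x * (x <= 1 - z) / (1 - z) ** 2) * ((y <= 1 - z) / (1 - z))
    print(np.isclose(lhs, rhs))       # True at each point

# The conditional pdf's given X = 0.3 integrate to 1.
x = 0.3
fY_given_X = lambda y: 3 * (1 - max(x, y)) ** 2 / ((1 - x) ** 2 * (1 + 2 * x))
fZ_given_X = lambda z: 6 * z * (1 - z) * (z <= 1 - x) / ((1 - x) ** 2 * (1 + 2 * x))
print(quad(fY_given_X, 0, 1)[0], quad(fZ_given_X, 0, 1)[0])   # ≈ 1, 1
```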

§9.2 Conditional expectation


9.2.1 Let f_{X|Y}(x|y) be the conditional probability function of X given Y = y. Then, for any function g(·),

E[g(X) | Y = y] =
  Σ_{x ∈ X(Ω)} g(x) f_{X|Y}(x|y)   (discrete case),
  ∫_{−∞}^{∞} g(x) f_{X|Y}(x|y) dx   (continuous case).
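Continuing the toy joint pmf from the sketch in §9.1.1, here is a minimal computation of a conditional expectation in the discrete case (the pmf values remain hypothetical):

```python
# Reusing the toy joint pmf from the earlier sketch (hypothetical values).
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.20,
}

def cond_expectation(g, y):
    # E[g(X) | Y = y] = sum over x of g(x) f_{X|Y}(x|y)
    fY = sum(p for (_, yy), p in joint.items() if yy == y)
    return sum(g(x) * p / fY for (x, yy), p in joint.items() if yy == y)

print(cond_expectation(lambda x: x, 0))      # E[X | Y=0] = (0.30 + 2*0.05)/0.45 ≈ 0.889
print(cond_expectation(lambda x: x**2, 0))   # E[X^2 | Y=0]
```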

 
9.2.2 Let ψ(y) = E[g(X) | Y = y], a function of y. The random variable ψ(Y) is usually written as E[g(X) | Y] for brevity, so that E[g(X) | Y = y] is a realisation of E[g(X) | Y] when Y is observed to be y.

9.2.3 E[g(X) | X] = g(X).

9.2.4 X, Y independent ⇒ E[X|Y] = E[X] and E[Y|X] = E[Y].
Proof: For all y, E[X | Y = y] = ∫ x f_{X|Y}(x|y) dx = ∫ x f_X(x) dx = E[X] (continuous case; the discrete case is similar).
 
9.2.5 Proposition. E[E[X|Y]] = E[X].
Proof: Consider the continuous case (the discrete case is similar).

E[E[X|Y]] = ∫ E[X | Y = y] f_Y(y) dy = ∫ ( ∫ x f_{X|Y}(x|y) dx ) f_Y(y) dy
  = ∫ x ( ∫ f(x, y) dy ) dx = ∫ x f_X(x) dx = E[X].
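Proposition §9.2.5 (the "tower property") can be illustrated by simulation; a minimal sketch with an arbitrary concrete model (Y ∼ Exp(1) and X | Y = y ∼ N(y, 1), so E[X|Y] = Y; none of this is from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

Y = rng.exponential(1.0, size=n)    # Y ~ Exp(1), so E[Y] = 1
X = rng.normal(loc=Y, scale=1.0)    # X | Y = y ~ N(y, 1), so E[X|Y] = Y

# E[E[X|Y]] = E[Y] should match E[X]; both sample means ≈ 1.
print(Y.mean(), X.mean())
```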

 
9.2.6 Proposition. For any event A, E[P(A|Y)] = P(A).
Proof: Note that

E[1{A} | Y] = 1 × P(1{A} = 1 | Y) + 0 × P(1{A} = 0 | Y) = P(A|Y).

Similarly, E[1{A}] = P(A). The result follows by applying Proposition §9.2.5 with X = 1{A}.

9.2.7 Standard properties of E[·] still hold for conditional expectations E[· | Y]:

• X_1 ≥ X_2 given Y ⇒ E[X_1|Y] ≥ E[X_2|Y].
• For any functions α(Y), β(Y) of Y,
  E[α(Y)X_1 + β(Y)X_2 | Y] = α(Y) E[X_1|Y] + β(Y) E[X_2|Y].
• |E[X|Y]| ≤ E[|X| | Y].
• g(X_1, ..., X_n) = h(Y) ⇒ E[g(X_1, ..., X_n) | Y] = h(Y).
• X_1, X_2 conditionally independent given Y ⇒ E[X_1X_2|Y] = E[X_1|Y] E[X_2|Y].

9.2.8 Concepts derived from E[ · ] can be extended to a conditional version. For example,

• CONDITIONAL VARIANCE

– Var(X|Y) = E[(X − E[X|Y])^2 | Y] = E[X^2|Y] − (E[X|Y])^2.
  Note: Var(X) = E[Var(X|Y)] + Var(E[X|Y]) (a numeric check of this identity follows after this list).
– For any functions a(Y), b(Y) of Y, Var(a(Y)X + b(Y) | Y) = a(Y)^2 Var(X|Y).
– Var(X|Y) ≥ 0.
– Var(X|Y) = 0 iff P(X = h(Y) | Y) = 1 for some function h(Y) of Y.
– X_1, ..., X_n conditionally independent given Y ⇒ Var(Σ_i X_i | Y) = Σ_i Var(X_i|Y).

• CONDITIONAL COVARIANCE / CORRELATION COEFFICIENT

Cov(X_1, X_2 | Y) = E[(X_1 − E[X_1|Y])(X_2 − E[X_2|Y]) | Y] = E[X_1X_2|Y] − E[X_1|Y] E[X_2|Y],
ρ(X_1, X_2 | Y) = Cov(X_1, X_2 | Y) / √(Var(X_1|Y) Var(X_2|Y)).

• CONDITIONAL QUANTILE
The conditional αth quantile of X given Y is inf{x ∈ R : F_{X|Y}(x|Y) > α}.
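Here is the numeric check of the total-variance identity promised above, reusing the arbitrary model from the §9.2.5 sketch (Y ∼ Exp(1), X | Y ∼ N(Y, 1), so Var(X|Y) = 1 and E[X|Y] = Y, giving Var(X) = 1 + Var(Y) = 2):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

Y = rng.exponential(1.0, size=n)    # E[X|Y] = Y and Var(X|Y) = 1 in this model
X = rng.normal(loc=Y, scale=1.0)

# E[Var(X|Y)] + Var(E[X|Y]) = 1 + Var(Y) = 2; compare with the sample Var(X).
print(X.var(), 1 + Y.var())
```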

9.2.9 The results of §9.2.1 to §9.2.8 can be extended to the multivariate case where X is replaced
by (X1 , . . . , Xr ) and Y replaced by (Y1 , . . . , Ys ).

9.2.10 Examples — (cont’d from §9.1.9)

(i) X|N ∼ Binomial(N, p) ⇒ E[X|N] = Np.


Then

E[E[X|N]] = E[Np] = p E[N] = pλ.

But, unconditionally, X ∼ Poisson(pλ), which implies E[X] = pλ. This confirms E[E[X|N]] = E[X].
(ii) Consider conditional expectations of g(X, Y, Z) = XY Z given X, Y, Z, respectively.
– Given Z
Since X and Y are conditionally independent given Z,

E[XYZ | Z] = Z E[XY | Z] = Z E[X|Z] E[Y|Z]
  = Z ∫_0^{1−Z} [2x^2 / (1 − Z)^2] dx ∫_0^{1−Z} [y / (1 − Z)] dy = Z(1 − Z)^2 / 3.

– Given Y
E[XYZ | Y] = Y E[XZ | Y] = Y ∫_0^1 ∫_0^1 xz f_{(X,Z)|Y}(x, z|Y) dx dz
  = [24Y / (1 − 4Y^3 + 3Y^4)] ∫_0^{1−Y} ∫_0^{1−z} z^2 x^2 dx dz
  = 2Y(1 − Y)(1 + 3Y + 6Y^2 + 10Y^3) / [15(1 + 2Y + 3Y^2)].

– Given X
E[XYZ | X] = X E[YZ | X] = X ∫_0^1 ∫_0^1 yz f_{(Y,Z)|X}(y, z|X) dy dz
  = [6X / ((1 − X)^2 (1 + 2X))] ∫_0^{1−X} ∫_0^{1−z} z^2 y dy dz
  = X(1 − X)(1 + 3X + 6X^2) / [10(1 + 2X)].

It has been derived in Example §7.1.11(c) that, for x, y, z ∈ [0, 1],

f_X(x) = (20/3) x(1 − x)^2 (1 + 2x),   f_Y(y) = (5/3)(1 − 4y^3 + 3y^4),   f_Z(z) = 20z(1 − z)^3.
Thus,

E[E[XYZ|X]] = ∫_0^1 E[XYZ | X = x] f_X(x) dx
  = ∫_0^1 [x(1 − x)(1 + 3x + 6x^2) / (10(1 + 2x))] × (20/3) x(1 − x)^2 (1 + 2x) dx = 5/126.

Similarly,

E[E[XYZ|Y]] = ∫_0^1 [2y(1 − y)(1 + 3y + 6y^2 + 10y^3) / (15(1 + 2y + 3y^2))] × (5/3)(1 − 4y^3 + 3y^4) dy = 5/126,
E[E[XYZ|Z]] = ∫_0^1 [z(1 − z)^2 / 3] × 20z(1 − z)^3 dz = 5/126.

As expected, the above results agree with those derived in Example §8.1.4(iii):

E[E[XYZ|X]] = E[E[XYZ|Y]] = E[E[XYZ|Z]] = E[XYZ] = 5/126.
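The three iterated expectations can also be confirmed by numerical integration; a minimal sketch (each printed value should be close to 5/126 ≈ 0.0396825):

```python
from scipy.integrate import quad

gX = lambda x: (x*(1-x)*(1+3*x+6*x**2) / (10*(1+2*x))
               * (20/3) * x * (1-x)**2 * (1+2*x))
gY = lambda y: (2*y*(1-y)*(1+3*y+6*y**2+10*y**3) / (15*(1+2*y+3*y**2))
               * (5/3) * (1 - 4*y**3 + 3*y**4))
gZ = lambda z: (z*(1-z)**2 / 3) * 20*z*(1-z)**3

for g in (gX, gY, gZ):
    print(quad(g, 0, 1)[0])   # each ≈ 0.0396825 = 5/126
```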

9.2.11 Proposition. For a random variable X and an event A,

E[X 1{A}] = E[X|A] P(A).

Proof: Consider

E[X 1{A}] = E[ E[X 1{A} | 1{A}] ] = E[ 1{A} E[X | 1{A}] ]
  = E[X | 1{A} = 1] P(1{A} = 1) = E[X|A] P(A).

9.2.12 Proposition. Suppose Ω = A_1 ∪ A_2 ∪ ⋯, where the A_j's are mutually exclusive. Then

E[X] = E[X|A_1] P(A_1) + E[X|A_2] P(A_2) + ⋯.

The expectation of X can thus be treated as a weighted average of the conditional expectations of X given disjoint sectors of the sample space, with the weights determined by the probabilities of the sectors. The special case X = 1{B} reduces to the "law of total probability".

Proof: Clearly, X = X 1{A_1} + X 1{A_2} + ⋯. The result follows from Proposition §9.2.11.

9.2.13 Example. A person is randomly selected from an adult population and his/her height X
measured. It is known that the mean height of a man is 1.78m, and that of a woman is 1.68m.
Men account for 48% of the population. Calculate the mean height of the adult population,
E[X].
Answer:

E[X] = E[X|{man}] P(man) + E[X|{woman}] P(woman) = 1.78m × 0.48 + 1.68m × 0.52 = 1.728m.

§9.3 *** More challenges ***


9.3.1 Let X and Y be independent continuous random variables with joint density function

f(x, y) = Cy              for x, y ≥ 0, x + y ≤ 1,
        = C(x + y)^{−3}   for x, y ≥ 0, x + y > 1,
        = 0               otherwise,

for some constant C > 0.

(a) Find C.
(b) Find the marginal pdf’s of X and Y .
(c) Find the conditional pdf’s fX|Y and fY |X .

9.3.2 Let X_1, X_2, ... be a sequence of independent and identically distributed random variables with a common mean μ and common variance σ^2. Let N be a random positive integer, independent of X_1, X_2, ..., with E[N] = ν and Var(N) = τ^2. Define S = Σ_{i=1}^{N} X_i, e(n) = E[S | N = n] and v(n) = Var(S | N = n), for n = 1, 2, ....

(a) Write down explicit expressions for the functions e(n) and v(n).
(b) Show that e(N ) has mean µν and variance µ2 τ 2 .
(c) Show that v(N ) has mean σ 2 ν.
(d) Deduce from (b) and (c), or otherwise, that S has mean µν and variance µ2 τ 2 + σ 2 ν.
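Before attempting the algebra, the claims in (b)-(d) can be sanity-checked by simulation. A minimal sketch with an arbitrary concrete choice of distributions (N = 1 + Poisson(2), so ν = 3 and τ² = 2, and X_i ∼ N(1.5, 2²); these choices are illustrative, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 1.5, 2.0
nu, tau2 = 3.0, 2.0                 # N = 1 + Poisson(2) has mean 3, variance 2
trials = 100_000

N = 1 + rng.poisson(2.0, size=trials)
S = np.array([rng.normal(mu, sigma, size=n).sum() for n in N])

# (d): E[S] = mu*nu and Var(S) = mu^2*tau2 + sigma^2*nu.
print(S.mean(), mu * nu)                        # ≈ 4.5
print(S.var(), mu**2 * tau2 + sigma**2 * nu)    # ≈ 16.5
```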

9.3.3 Dreydel is an ancient game played by Jews at the Chanukah festival. It can be played by any
number, m say, of players. Each player takes turns to spin a four-sided top — the dreydel —
marked with letters N, G, H and S respectively. Before the game starts, each player contributes one
unit to the pot which thus contains m units. Depending on the outcome of his spin, the spinning
player

• receives no payoff if N turns up,


• receives the entire pot if G turns up,
• receives half the pot if H turns up,
• contributes 1 unit to the pot if S turns up.

If G turns up, all m players must each contribute one unit to the pot to start the game again.

(a) Show that in the long run, the pot contains 2(m + 1)/3 units on average.
(b) Is Dreydel a fair game, i.e. no player has advantages over the others?
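Part (a) lends itself to a quick simulation preview; a minimal sketch (assuming a fair four-sided top and a divisible pot, so "half the pot" means pot/2 even when the pot holds an odd number of units; this reading of the rules is an assumption):

```python
import random

random.seed(4)
m, spins = 5, 1_000_000
pot, total = m, 0.0

for _ in range(spins):
    outcome = random.choice("NGHS")
    if outcome == "G":        # spinner takes the pot; all m players re-ante
        pot = m
    elif outcome == "H":      # spinner takes half the pot
        pot /= 2
    elif outcome == "S":      # spinner contributes one unit
        pot += 1
    total += pot              # 'N': no payoff, pot unchanged

print(total / spins, 2 * (m + 1) / 3)   # long-run average ≈ 2(m+1)/3 = 4
```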

9.3.4 A mysterious organism is capable of resurrections, so that it can start a new life immediately after death and repeat this cycle indefinitely. Let X_i be the duration of the ith life of the organism, so that S_n = Σ_{i=1}^{n} X_i gives the time of its nth resurrection. Define S_0 = 0 by convention. Assume that X_1, X_2, ... are independent unit-rate exponential random variables with the density function

f(x) = e^{−x} 1{x > 0}.

(a) Find the mean lifetime, that is E[X_1], of the organism.

(b) It is known that S_n has the Gamma(n, 1) density function

g_n(x) = x^{n−1} e^{−x} / (n − 1)! · 1{x > 0}.

Show that Σ_{n=1}^{∞} g_n(x) = 1 for x > 0.
(c) Suppose that the organism is living its Nth life at time t, so that N is a positive random integer.

(i) Show that

P(X_N ≤ x) = Σ_{n=1}^{∞} P(X_n ≤ x, S_{n−1} < t ≤ S_n).

(ii) Deduce from (i) that

P(X_N ≤ x) = P(t ≤ X_1 ≤ x) + Σ_{n=2}^{∞} ∫_0^t P(t − s ≤ X_n ≤ x) g_{n−1}(s) ds.

(iii) Deduce from (b) and (c)(ii) that X_N has the density function

h(x) = x e^{−x} 1{0 < x ≤ t} + (1 + t) e^{−x} 1{x > t}.

(iv) Show that E[X_N] = 2 − e^{−t}.


[Hint: You may find the following integrals useful:

∫_0^u x e^{−x} dx = 1 − (1 + u) e^{−u}   and   ∫_0^u x^2 e^{−x} dx = 2 − (2 + 2u + u^2) e^{−u},

for any u > 0.]


(v) Do your answers to (a) and (c)(iv) contradict each other? Explain.
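The value in (c)(iv), an instance of the so-called "inspection paradox", can be checked by simulating the life that straddles time t; a minimal sketch (t = 2.0 is an arbitrary choice):

```python
import math
import random

random.seed(5)
t, trials = 2.0, 200_000
samples = []

for _ in range(trials):
    s = 0.0
    while True:
        x = random.expovariate(1.0)    # a unit-rate exponential lifetime
        if s + x >= t:                 # this life straddles time t: X_N = x
            samples.append(x)
            break
        s += x                         # organism resurrected before time t

# E[X_N] = 2 - e^{-t} ≈ 1.8647, noticeably larger than E[X_1] = 1.
print(sum(samples) / len(samples), 2 - math.exp(-t))
```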

