
Introduction to Probability Theory

K. Suresh Kumar
Department of Mathematics
Indian Institute of Technology Bombay

October 29, 2017



LECTURES 24-25

Now we state a useful result whose proof is beyond the scope of this course.

Theorem 0.1 (Continuity Theorem) Let $X_n$, $X$ be random variables on $(\Omega, \mathcal{F}, P)$ such that
$$\lim_{n \to \infty} \Phi_{X_n}(t) = \Phi_X(t), \quad t \in \mathbb{R}.$$

Then $F_{X_n}(x) \to F_X(x)$ for all $x \in \mathbb{R}$ such that $F_X$ is continuous at $x$.

Chapter 9: Conditional distribution and expectation
The notion of conditional densities is intended to quantify the dependence of one random variable on another when the random variables are not independent. One of the main uses of conditional expectations and probabilities is the 'conditioning argument' for computing expectations that involve multiple random variables. Our approach is to define conditional distributions (pmf/pdf), conditional probabilities and finally conditional expectations for various cases: discrete, continuous and a combination of both. This is an elementary treatment and hence doesn't cover conditional probabilities in their full generality.

0.1 Conditional pmf of discrete random variables


In this case, the definition of the conditional pmf follows directly from the definition of conditional probabilities of events.

Definition 9.1. Let $X, Y$ be two discrete random variables with joint pmf $f$. Then the conditional pmf of $Y$ given $X$, denoted by $f_{Y|X}$, is defined as
$$f_{Y|X}(y|x) = \begin{cases} P(Y = y \mid X = x) & \text{if } P(X = x) \neq 0 \\ 0 & \text{if } P(X = x) = 0. \end{cases}$$
Intuitively, $f_{Y|X}$ is the pmf of $Y$ given the information about $X$. Here information about $X$ means knowledge about the occurrence (or non-occurrence) of $\{X = x\}$ for each $x$. One can rewrite $f_{Y|X}$ in terms of the pmfs as follows:
$$f_{Y|X}(y|x) = \begin{cases} \dfrac{f(x, y)}{f_X(x)} & \text{if } f_X(x) \neq 0 \\ 0 & \text{if } f_X(x) = 0. \end{cases}$$

The definition of $f_{X|Y}$ is similar. Also, one can relate $f_{Y|X}$ and $f_{X|Y}$ through Bayes' theorem as follows:
$$f_{Y|X}(y|x) = \frac{f_{X|Y}(x|y)\, f_Y(y)}{f_X(x)},$$
where $f_Y$ denotes the marginal pmf of $Y$.
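As a quick numerical illustration (an addition to the notes, assuming Python with numpy is available), the following sketch computes $f_{Y|X}$ from a joint pmf stored as a table and checks the Bayes relation above; the joint pmf values are invented for the example.

```python
import numpy as np

# Joint pmf of (X, Y) as a table: f[i, j] = P(X = x_i, Y = y_j).
# The values are invented for illustration; they sum to 1.
f = np.array([[0.10, 0.20, 0.10],
              [0.05, 0.25, 0.30]])

f_X = f.sum(axis=1)               # marginal pmf of X
f_Y = f.sum(axis=0)               # marginal pmf of Y

f_Y_given_X = f / f_X[:, None]    # row i is f_{Y|X}(. | x_i)
f_X_given_Y = f / f_Y[None, :]    # column j is f_{X|Y}(. | y_j)

print(f_Y_given_X.sum(axis=1))    # each row sums to 1
# Bayes' relation: f_{Y|X}(y|x) = f_{X|Y}(x|y) f_Y(y) / f_X(x)
assert np.allclose(f_Y_given_X, f_X_given_Y * f_Y[None, :] / f_X[:, None])
```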

0.2 Conditional pdf of continuous random variables
When both $X$ and $Y$ are continuous random variables with a joint pdf $f$, we define the conditional pdf of $Y$ given $X$ using the motivation from the discrete case. Thus we have the following definition.

Definition 9.2. Let $X, Y$ be continuous random variables with joint pdf $f$. The conditional pdf of $Y$ given $X$ is defined as
$$f_{Y|X}(y|x) = \begin{cases} \dfrac{f(x, y)}{f_X(x)} & \text{if } 0 < f_X(x) < \infty \\ 0 & \text{otherwise.} \end{cases}$$

The definition of $f_{X|Y}$ is similar, and we can see that for $x, y$ with $f_X(x), f_Y(y) \neq 0$,
$$f_{Y|X}(y|x) = \frac{f_{X|Y}(x|y)\, f_Y(y)}{f_X(x)}, \tag{0.1}$$
which can be treated as a variation of Bayes' theorem for densities.

0.3 Conditional pmf/pdf: for a combination of random variables
Now we discuss the case when $X$ is discrete and $Y$ is a continuous random variable. Let $f_X$ denote the pmf of $X$, let $Y$ have pdf $f_Y$, and let $F$ denote the joint distribution function of $X$ and $Y$. Consider the probabilities
$$G(x, y) = P(\{X = x, Y \le y\}), \quad x \in D,\ y \in \mathbb{R},$$
where $D$ is the set of discontinuities of the distribution function $F_X$ of $X$. Then it is easy to see that
$$F(x, y) = \sum_{x' \in D,\ x' \le x} G(x', y).$$

Definition 9.3. The conditional distribution function of $Y$ given $X = x$, denoted by $F_{Y|X}(y|x)$, is defined as
$$F_{Y|X}(y|x) := P\big(\{Y \le y\} \mid \{X = x\}\big) = \frac{G(x, y)}{f_X(x)}, \quad x \in D,\ y \in \mathbb{R}.$$
The conditional pdf of $Y$ given $X = x$ is then defined as the function $f_{Y|X}(y|x)$ satisfying
$$F_{Y|X}(y|x) = \int_{-\infty}^{y} f_{Y|X}(z|x)\, dz, \quad y \in \mathbb{R}.$$

Clearly, if $G$ is differentiable with respect to $y$ for each $x \in D$, then
$$f_{Y|X}(y|x) = \frac{1}{f_X(x)}\, \frac{dG(x, y)}{dy}.$$
Motivated by the density version of Bayes' theorem (0.1), we define the conditional pmf of $X$ given $Y$ as follows:
$$f_{X|Y}(x|y) = \begin{cases} \dfrac{f_{Y|X}(y|x)\, f_X(x)}{f_Y(y)} & \text{if } 0 < f_Y(y) < \infty \\ 0 & \text{otherwise.} \end{cases}$$

Example 0.1 Let $Y$ be a symmetric random variable with pdf $f$ and $X = I_{\{Y > 0\}}$. Let us compute the conditional pmf/pdf.

Note that $X$ is Bernoulli($\frac{1}{2}$) and $f_Y(y) = f_Y(-y)$, since $Y$ is symmetric. Now
$$G(0, y) = P(X = 0, Y \le y) = P(Y \le 0, Y \le y) = \begin{cases} \int_{-\infty}^{y} f(z)\, dz & \text{if } y \le 0 \\ \frac{1}{2} & \text{if } y > 0. \end{cases}$$
Similarly
$$G(1, y) = \begin{cases} 0 & \text{if } y \le 0 \\ \int_{0}^{y} f(z)\, dz & \text{if } y > 0. \end{cases}$$
Now
$$\frac{dG(0, y)}{dy} = \begin{cases} f(y) & \text{if } y < 0 \\ 0 & \text{if } y > 0, \end{cases} \qquad \frac{dG(1, y)}{dy} = \begin{cases} 0 & \text{if } y < 0 \\ f(y) & \text{if } y > 0. \end{cases}$$

Hence
$$f_{Y|X}(y|x) = 2\, \frac{dG(x, y)}{dy} = \begin{cases} 2 f(y) & \text{if } y < 0,\ x = 0 \text{ or } y > 0,\ x = 1 \\ 0 & \text{otherwise.} \end{cases}$$
Also
$$f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x)\, f_X(x)}{f_Y(y)} = \frac{f_{Y|X}(y|x)}{2 f(y)} = \begin{cases} 1 & \text{if } y < 0,\ x = 0 \text{ or } y > 0,\ x = 1 \\ 0 & \text{otherwise.} \end{cases}$$
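A Monte Carlo sanity check of this example (an addition to the notes, assuming numpy), taking $Y$ to be standard normal as one particular symmetric density $f$: given $X = 1$, the pdf of $Y$ should be $2f(y)$ on $y > 0$, so $P(Y \le t \mid X = 1) = 2(\Phi(t) - \frac{1}{2})$ for $t > 0$.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
y = rng.standard_normal(1_000_000)   # Y symmetric, pdf f = standard normal
x = (y > 0).astype(int)              # X = indicator of {Y > 0}

t = 1.0
empirical = np.mean(y[x == 1] <= t)                  # P(Y <= t | X = 1)
theoretical = 2 * (0.5 * (1 + erf(t / sqrt(2))) - 0.5)
print(empirical, theoretical)        # both approx 0.6827
```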
Example 0.2 Consider the random experiment given as follows.
• Pick a point at random from the interval $(0, 1)$, say $x$.
• Now pick another point at random from the interval $(0, x)$.
Find the distribution of the second point.

Let $X$ denote the position of the first point and $Y$ denote the position of the second point. Then $X$ is a uniform random variable over $(0, 1)$, i.e.
$$f_X(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise,} \end{cases}$$
and the pdf of $Y$ given $X = x$ is $U(0, x)$, i.e.
$$f_{Y|X}(y|x) = \begin{cases} \dfrac{1}{x} & 0 < y < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$
Also
$$f(x, y) = f_X(x)\, f_{Y|X}(y|x) = \begin{cases} \dfrac{1}{x} & 0 < y < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$
Hence
$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx = \int_{y}^{1} \frac{1}{x}\, dx = -\ln y, \quad 0 < y < 1.$$
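The density $f_Y(y) = -\ln y$ can be checked by simulation (a sketch, not part of the notes, assuming numpy): compare the empirical cdf of $Y$ at a point $t$ with $\int_0^t (-\ln y)\, dy = t - t \ln t$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 1_000_000)   # first point, uniform on (0, 1)
y = rng.uniform(0.0, x)                # second point, uniform on (0, x)

# F_Y(t) = integral_0^t (-ln y) dy = t - t ln t for 0 < t < 1
t = 0.5
print(np.mean(y <= t), t - t * np.log(t))   # both approx 0.8466
```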

0.4 Conditional probabilities


Using conditional pmfs/pdfs, we define conditional probabilities as follows. A remark about notation: if $Y$ is a discrete random variable, we denote by $D_Y$ the set of discontinuities of its distribution function $F_Y$. The meaning of $f_{Y|X}$ is understood from the context, i.e. if $Y$ is discrete, then it denotes the conditional pmf of $Y$ given $X$, and when $Y$ is continuous, then $f_{Y|X}$ denotes the conditional pdf given $X$.

Definition 9.4. Let $X, Y$ be random variables. Then for $-\infty < a < b < \infty$, we define
$$P(a \le Y \le b \mid X = x) = \begin{cases} \displaystyle\sum_{y \in D_Y \cap [a, b]} f_{Y|X}(y|x) & \text{if } Y \text{ is discrete} \\[2ex] \displaystyle\int_{a}^{b} f_{Y|X}(y|x)\, dy & \text{if } Y \text{ is continuous.} \end{cases}$$

In the above, when $Y$ is continuous, it is assumed that $f_{Y|X}(y|x)$ exists.

WARNING: When $X$ is a continuous random variable, $P(X = x) = 0$, hence the LHS above has no meaning under the elementary definition of conditional probability. So NEVER WRITE the LHS as
$$\frac{P(a \le Y \le b,\ X = x)}{P(X = x)}.$$

Example 0.3 Let $(X, Y)$ be uniformly distributed on $B(0, R)$, i.e. the open ball centered at $(0, 0)$ with radius $R$. Evaluate $P(Y > 0 \mid X = x)$.

Note that
$$f(x, y) = \frac{1}{\pi R^2}, \quad x^2 + y^2 < R^2, \qquad = 0 \text{ otherwise.}$$
Also, for $-R < x < R$,
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy = \int_{-\sqrt{R^2 - x^2}}^{\sqrt{R^2 - x^2}} \frac{1}{\pi R^2}\, dy = \frac{2\sqrt{R^2 - x^2}}{\pi R^2}.$$

For $x^2 + y^2 < R^2$,
$$f_{Y|X}(y|x) = \frac{f(x, y)}{f_X(x)} = \frac{1}{2\sqrt{R^2 - x^2}}.$$
Hence for $-R < x < R$,
$$P(Y > 0 \mid X = x) = \int_{0}^{\sqrt{R^2 - x^2}} \frac{1}{2\sqrt{R^2 - x^2}}\, dy = \frac{1}{2}.$$
Example 0.4 Let $X, Y$ be i.i.d. exp($\lambda$). Find $P(X > 1 \mid X + Y = 2)$.

Let $f$ denote the joint pdf of $X, Y$ and $g$ denote the joint pdf of $X$ and $Z = X + Y$. Then
$$f(x, y) = \lambda^2 e^{-\lambda(x + y)}, \quad x, y > 0, \qquad = 0 \text{ otherwise.}$$
Recall that
$$g(u, v) = \frac{1}{|\det A|}\, f\big((u, v) A^{-1}\big),$$
when $(U, V) = (X, Y) A$ and $f, g$ denote respectively the pdfs of $(X, Y)$ and $(U, V)$. Using this, it follows that
$$g(x, z) = \lambda^2 e^{-\lambda z}, \quad 0 < x < z, \qquad = 0 \text{ otherwise.}$$
Hence, using the definition of $f_{X|Z}$, we get for $0 < x < z$,
$$f_{X|Z}(x|z) = \frac{g(x, z)}{f_Z(z)} = \frac{\lambda^2 e^{-\lambda z}}{\int_0^z g(x, z)\, dx} = \frac{\lambda^2 e^{-\lambda z}}{z \lambda^2 e^{-\lambda z}} = \frac{1}{z},$$
and $f_{X|Z}(x|z) = 0$ otherwise. Now
$$P(X > 1 \mid X + Y = 2) = \int_{1}^{\infty} f_{X|Z}(x|2)\, dx = \int_{1}^{2} \frac{1}{2}\, dx = \frac{1}{2}.$$
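A Monte Carlo check (an addition to the notes, assuming numpy). Since $\{X + Y = 2\}$ has probability zero, the simulation conditions on a thin window $|X + Y - 2| < \varepsilon$ instead, which is exactly the caveat in the WARNING above; $\lambda = 1$ is an arbitrary choice and the answer $\frac{1}{2}$ does not depend on it.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, eps = 1.0, 0.01
x = rng.exponential(1 / lam, 2_000_000)
y = rng.exponential(1 / lam, 2_000_000)

# {X + Y = 2} is a null event, so condition on a thin window around it.
window = np.abs(x + y - 2.0) < eps
print(np.mean(x[window] > 1.0))   # approx 1/2
```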

Example 0.5 (De Finetti's Theorem - Reading exercise) Consider $n$ independent Uniform$[0, a)$ random variables $X_1, X_2, \dots, X_n$, and let $L_1^a, L_2^a, \dots, L_{n+1}^a$ denote the lengths of the corresponding partition of $[0, a)$. When $n = 1$, $L_1^a = X_1$, $L_2^a = a - L_1^a$; for $n = 2$, $L_1^a = \min\{X_1, X_2\}$, $L_2^a = \max\{X_1, X_2\} - L_1^a$, $L_3^a = a - L_1^a - L_2^a$; and so on. Then
$$P(L_1^a > x_1, \dots, L_{n+1}^a > x_{n+1}) = \left(1 - \frac{x_1}{a} - \frac{x_2}{a} - \cdots - \frac{x_{n+1}}{a}\right)_+^n,$$
where $z_+ = \max\{z, 0\}$.

When $n = 1$, for $0 \le x_1, x_2 \le a$ such that $x_1 + x_2 \le a$,
$$P(L_1^a > x_1, L_2^a > x_2) = P(X_1 > x_1,\ a - X_1 > x_2) = P(x_1 < X_1 < a - x_2) = \frac{a - x_2 - x_1}{a}.$$

Now for $x_1 + x_2 > a$, it is clear that $P(L_1^a > x_1, L_2^a > x_2) = 0$. Hence
$$P(L_1^a > x_1, L_2^a > x_2) = \left(1 - \frac{x_1}{a} - \frac{x_2}{a}\right)_+.$$

Now we use induction. Assume the formula is true for $n = m - 1$, for all $a > 0$. Consider
$$P(L_1^a > x_1, \dots, L_{m+1}^a > x_{m+1} \mid L_1^a = x) = P(L_1^{a-x} > x_2, \dots, L_m^{a-x} > x_{m+1}) = \left(1 - \frac{x_2}{a - x} - \cdots - \frac{x_{m+1}}{a - x}\right)_+^{m-1},$$
where $L_1^{a-x}, \dots, L_m^{a-x}$ denote the partition of the interval $[0, a - x)$.

Now, using $L_1^a = \min\{X_1, \dots, X_m\}$, it follows that the pdf of $L_1^a$ is given by
$$f(t) = \frac{m}{a}\left(1 - \frac{t}{a}\right)^{m-1}, \quad 0 \le t \le a.$$
a a

Hence
$$\begin{aligned}
P(L_1^a > x_1, \dots, L_{m+1}^a > x_{m+1}) &= \int_0^a P(L_1^a > x_1, \dots, L_{m+1}^a > x_{m+1} \mid L_1^a = x)\, \frac{m}{a}\left(1 - \frac{x}{a}\right)^{m-1} dx \\
&= \int_{x_1}^a \left(1 - \frac{x_2}{a - x} - \cdots - \frac{x_{m+1}}{a - x}\right)_+^{m-1} \frac{m}{a}\left(1 - \frac{x}{a}\right)^{m-1} dx \\
&= \frac{m}{a^m} \int_{x_1}^a \left(a - x - x_2 - \cdots - x_{m+1}\right)_+^{m-1} dx \\
&= \frac{m}{a^m} \int_0^{(a - x_1 - x_2 - \cdots - x_{m+1})_+} t^{m-1}\, dt \\
&= \frac{1}{a^m} \left(a - x_1 - x_2 - \cdots - x_{m+1}\right)_+^m.
\end{aligned}$$
This completes the proof by induction.

The above example gives the distribution of the random partition (using the uniform distribution) of the interval $[0, a)$. Also note that $L_k^a$ is the $k$th spacing, i.e. the gap between the $(k-1)$th and $k$th order statistics of the uniform sample on $[0, a)$.

0.5 Conditional Expectation


Now we define the conditional expectation, denoted by $E[Y \mid X = x]$, of the random variable $Y$ given the information about the random variable $X$. If $Y$ is a Bernoulli($p$) random variable and $X$ any discrete random variable, then we expect $E[Y \mid X = x]$ to be $P(Y = 1 \mid X = x)$, since we know that $EY = p = P(Y = 1)$. I.e.
$$E[Y \mid X = x] = 1 \times f_{Y|X}(1|x) + 0 \times f_{Y|X}(0|x),$$
where $f_{Y|X}(y|x)$ is the conditional pmf of $Y$ given $X$. Since we expect conditional expectation to be linear and any discrete random variable can be written as a linear combination of Bernoulli random variables, we arrive at the following definition.

Definition 9.5. Let $X, Y$ be such that $Y$ is a discrete random variable. Then the conditional expectation of $Y$ given $X = x$ is defined as
$$E[Y \mid X = x] = \sum_{y \in D_Y} y\, f_{Y|X}(y|x).$$

Similarly, when $Y$ is a continuous random variable such that $f_{Y|X}$ exists, we define
$$E[Y \mid X = x] = \int_{-\infty}^{\infty} y\, f_{Y|X}(y|x)\, dy.$$

Example 0.6 Let $X, Y$ be independent random variables with geometric distribution of parameter $p > 0$. Calculate $E[Y \mid X + Y = n]$, where $n = 0, 1, 2, \dots$

Set $Z = X + Y$. Then
$$P(Y = y \mid Z = n) = 0 \quad \text{if } y \ge n + 1.$$
For $y = 0, 1, 2, \dots, n$,
$$P(Y = y \mid Z = n) = \frac{P(Y = y,\ X + Y = n)}{P(X + Y = n)}.$$
Now
$$P(X + Y = n) = \sum_{x=0}^{n} P(X = x, Y = n - x) = (n + 1)\, p^2 (1 - p)^n,$$
and
$$P(Y = y,\ X + Y = n) = P(Y = y,\ X = n - y) = P(Y = y)\, P(X = n - y) = p(1 - p)^y \cdot p(1 - p)^{n - y}.$$
Therefore
$$P(Y = y \mid Z = n) = \frac{1}{n + 1},$$
i.e.
$$f_{Y|Z}(y|z) = \begin{cases} \dfrac{1}{z + 1} & \text{if } y = 0, 1, 2, \dots, z \\ 0 & \text{otherwise.} \end{cases}$$
Now
$$E[Y \mid Z = n] = \sum_{y} y\, f_{Y|Z}(y|n) = \sum_{k=1}^{n} \frac{k}{n + 1} = \frac{1}{n + 1} \cdot \frac{n(n + 1)}{2} = \frac{n}{2}.$$
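A quick simulation check (not in the notes, assuming numpy); $p = 0.3$ and $n = 7$ are arbitrary choices. Note that numpy's geometric is supported on $\{1, 2, \dots\}$ (number of trials), while the notes use the version supported on $\{0, 1, 2, \dots\}$, so we subtract 1.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 0.3, 7                        # arbitrary parameter and target sum
x = rng.geometric(p, 2_000_000) - 1  # shift to the {0, 1, 2, ...} version
y = rng.geometric(p, 2_000_000) - 1
print(y[x + y == n].mean(), n / 2)   # both approx 3.5
```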

When $X$ and $Y$ are discrete random variables, $E[Y \mid X = x]$ is defined using the conditional pmf of $Y$ given $X$; when $X$ and $Y$ are continuous random variables with joint pdf $f$, it is defined in a similar way using the conditional pdf. Now we state a very useful theorem.

Theorem 0.2 Let $X$ and $Y$ be random variables and $\varphi : \mathbb{R} \to \mathbb{R}$ be continuous. Then
$$E[\varphi(Y) \mid X = x] = \begin{cases} \displaystyle\sum_{y \in D_Y} \varphi(y)\, f_{Y|X}(y|x) & \text{if } Y \text{ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} \varphi(y)\, f_{Y|X}(y|x)\, dy & \text{if } Y \text{ is continuous,} \end{cases}$$
provided the RHS converges absolutely.

We now state the final theorem of this chapter; it concerns the conditioning method mentioned at the beginning of the chapter and is our destination for this chapter.

Theorem 0.3 (i) Let $X, Y$ be random variables and $\varphi : \mathbb{R}^2 \to \mathbb{R}$ a continuous function such that $E\varphi(X, Y)$ is finite. Then
$$E[\varphi(X, Y)] = \begin{cases} \displaystyle\sum_{x \in D_X} E[\varphi(x, Y) \mid X = x]\, f_X(x) & \text{if } X \text{ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} E[\varphi(x, Y) \mid X = x]\, f_X(x)\, dx & \text{if } X \text{ is continuous.} \end{cases}$$

Example 0.7 Let $X, Y$ be continuous random variables with joint pdf given by
$$f(x, y) = \begin{cases} 6(y - x) & \text{if } 0 \le x \le y \le 1 \\ 0 & \text{otherwise.} \end{cases}$$
Find $E[Y \mid X = x]$ and hence calculate $EY$. Note that
$$f_X(x) = \int_{x}^{1} 6(y - x)\, dy = 3(x - 1)^2, \quad 0 \le x \le 1,$$
and $f_X(x) = 0$ elsewhere. Hence for $0 \le x \le 1$,
$$E[Y \mid X = x] = \int_{-\infty}^{\infty} y\, f_{Y|X}(y|x)\, dy = \frac{2}{(x - 1)^2} \int_{x}^{1} y(y - x)\, dy = \frac{x^3 - 3x + 2}{3(x - 1)^2}.$$

Also $E[Y \mid X = x] = 0$ elsewhere. Therefore
$$EY = \int_{-\infty}^{\infty} E[Y \mid X = x]\, f_X(x)\, dx = \int_{0}^{1} (x^3 - 3x + 2)\, dx = \frac{3}{4}.$$

Example 0.8 Let $X \sim$ uniform$(0, 1)$ and $Y \sim$ uniform$(0, X)$. Find $E(X + Y)^2$.

Note that what is given is $f_X(x) = 1$ for $0 < x < 1$, $= 0$ otherwise, and the conditional pdf
$$f_{Y|X}(y|x) = \frac{1}{x}, \quad 0 < y < x, \qquad = 0 \text{ otherwise.}$$
Hence
$$\begin{aligned}
E(X + Y)^2 &= \int_{0}^{1} E[(x + Y)^2 \mid X = x]\, dx \\
&= \int_{0}^{1} \int_{-\infty}^{\infty} (x + y)^2 f_{Y|X}(y|x)\, dy\, dx \\
&= \int_{0}^{1} \int_{0}^{x} \frac{(x + y)^2}{x}\, dy\, dx \\
&= \int_{0}^{1} \frac{1}{x} \int_{x}^{2x} z^2\, dz\, dx = \frac{7}{9}.
\end{aligned}$$
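A Monte Carlo check of $E(X + Y)^2 = \frac{7}{9}$ (a sketch added to the notes, assuming numpy), sampling $X$ and then $Y$ exactly as the conditioning argument suggests.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 1_000_000)
y = rng.uniform(0.0, x)               # Y | X = x  ~  uniform(0, x)
print(np.mean((x + y) ** 2), 7 / 9)   # both approx 0.7778
```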
