
Jointly Distributed Random Variables

Jeff Chak Fu WONG

Department of Mathematics
The Chinese University of Hong Kong

MATH3280
Introductory Probability
Joint Distribution Functions
Thus far, we have concerned ourselves only with probability distributions for single random variables. However, we are often interested in probability statements concerning two or more random variables.
In order to deal with such probabilities, we make the following definition.
Definition 1
For any two random variables X and Y, the joint cumulative probability distribution function (joint CDF) of X and Y is defined by

F(a, b) = P{X ≤ a, Y ≤ b},   −∞ < a, b < ∞

As usual, the comma means "and", that is, the intersection of the two events.


so we can write, for a, b ∈ R,

F(a, b) = P{X ≤ a, Y ≤ b}
        = P{{X ≤ a} and {Y ≤ b}}
        = P{{X ≤ a} ∩ {Y ≤ b}}
The distribution of X can be obtained from the joint distribution
of X and Y as follows:

FX(a) = P{X ≤ a}
      = P{X ≤ a, Y < ∞}
      = P( lim_{b→∞} {X ≤ a, Y ≤ b} )     (use the continuity of P)
      = lim_{b→∞} P{X ≤ a, Y ≤ b}
      = lim_{b→∞} F(a, b)
      ≡ F(a, ∞)
Note that, in the preceding set of equalities, we have once again
made use of the fact that probability is a continuous set (that is,
event) function.
Similarly, the cumulative distribution function of Y is given by

FY(b) = P{Y ≤ b}
      = lim_{a→∞} F(a, b)
      ≡ F(∞, b)

The distribution functions FX and FY are sometimes referred to as the marginal distributions of X and Y.
All joint probability statements about X and Y can, in theory,
be answered in terms of their joint distribution function.
For instance, suppose we wanted to compute the joint probability
that X is greater than a and Y is greater than b.
This could be done as follows:

P{X > a, Y > b} = 1 − P({X > a, Y > b}^c)
                = 1 − P({X > a}^c ∪ {Y > b}^c)
                = 1 − P({X ≤ a} ∪ {Y ≤ b})          (1)
                = 1 − [P{X ≤ a} + P{Y ≤ b} − P{X ≤ a, Y ≤ b}]     (using P(E ∪ F) = P(E) + P(F) − P(E ∩ F))
                = 1 − FX(a) − FY(b) + F(a, b)
Equation (1) is a special case of the following equation:

P{a1 < X ≤ a2, b1 < Y ≤ b2}
    = F(a2, b2) + F(a1, b1) − F(a1, b2) − F(a2, b1)          (2)

whenever a1 < a2, b1 < b2.

Equation (2) says that to find the probability that an outcome is in a rectangle, it is necessary to evaluate the joint CDF at all four corners. When the probability of interest corresponds to a nonrectangular area, the joint CDF is much harder to use.
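For concreteness, here is a minimal Python sketch of Equation (2). It assumes, purely for illustration, that X and Y are independent exponential random variables with rates 1 and 2, so that the joint CDF F(a, b) = FX(a)FY(b) has a closed form; this assumption is not part of the notes.

```python
# A minimal sketch (not from the notes): checking Equation (2) numerically.
# We assume X ~ Exp(rate 1) and Y ~ Exp(rate 2) are independent, so the joint
# CDF is available in closed form: F(a, b) = (1 - e^{-a})(1 - e^{-2b}).
import math

def F(a, b):
    """Assumed joint CDF for a, b >= 0."""
    return (1.0 - math.exp(-a)) * (1.0 - math.exp(-2.0 * b))

a1, a2, b1, b2 = 0.5, 1.5, 0.2, 1.0

# Equation (2): evaluate the joint CDF at the four corners of the rectangle.
rect = F(a2, b2) + F(a1, b1) - F(a1, b2) - F(a2, b1)

# Independent coordinates give an easy cross-check: the probability factors.
check = (math.exp(-a1) - math.exp(-a2)) * (math.exp(-2 * b1) - math.exp(-2 * b2))

print(rect, check)   # the two values agree
```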
In the case when X and Y are both discrete random variables, it
is convenient to define:
Definition 2
The joint probability mass function (joint PMF) of X and Y is

p(x, y) = P {X = x, Y = y}.

The probability mass function of X can be obtained from p(x, y) by

pX(x) = P{X = x} = Σ_{y: p(x,y)>0} P{X = x, Y = y} = Σ_{y: p(x,y)>0} p(x, y)

Similarly,

pY(y) = Σ_{x: p(x,y)>0} p(x, y)

In particular,

F(a, b) = Σ_{(x,y): x ≤ a, y ≤ b} p(x, y)
The figure below shows a sketch of what the joint PMF of two discrete r.v.s could look like.
The height of a vertical bar at (x, y) represents the probability P(X = x, Y = y).
For the joint PMF to be valid, the total height of the vertical bars must be 1.
From the joint distribution of X and Y, we can get the distribution of X alone by summing over the possible values of Y. This gives us the familiar PMF of X that we have seen in previous lectures.
In the context of joint distributions, we will call it the marginal
or unconditional distribution of X, to make it clear that we are
referring to the distribution of X alone, without regard for the
value of Y .
The marginal pmf of X is the pmf of X, viewing X individually
rather than jointly with Y , i.e.,
pX(x) = P{X = x} = Σ_{y: p(x,y)>0} P{X = x, Y = y} = Σ_{y: p(x,y)>0} p(x, y)

The above equation follows from the axioms of probability (we are summing over disjoint cases).
The operation of summing over the possible values of Y in order
to convert the joint PMF into the marginal PMF of X is known
as marginalizing out Y .
The process of obtaining the marginal PMF from the joint PMF
is illustrated in the following figure.

Each column of the joint PMF corresponds to a fixed x and each row corresponds to a fixed y.
For any x, the probability P (X = x) is the total height of the bars
in the corresponding column of the joint PMF: we can imagine
taking all the bars in that column and stacking them on top of
each other to get the marginal probability. Repeating this for all
x, we arrive at the marginal PMF, depicted in bold.
Similarly, the marginal PMF of Y is obtained by summing over all
possible values of X. So given the joint PMF, we can marginalize
out Y to get the PMF of X, or marginalize out X to get the PMF
of Y. But if we only know the marginal PMFs of X and Y , there
is no way to recover the joint PMF without further assumptions.
It is clear how to stack the bars in the above figure, but very
unclear how to unstack the bars after they have been stacked!
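A minimal sketch of this marginalization in code, with the joint PMF stored as a Python dictionary; the toy probabilities below are made up purely for illustration and are not from the lecture.

```python
# A minimal sketch of "marginalizing out Y": the joint PMF is stored as a
# dictionary keyed by (x, y); summing over y for each fixed x "stacks the
# bars" in that column.  The toy probabilities are made up for illustration.
from collections import defaultdict

joint_pmf = {(0, 0): 0.10, (0, 1): 0.30, (1, 0): 0.25, (1, 1): 0.35}

marginal_X = defaultdict(float)
for (x, y), p in joint_pmf.items():
    marginal_X[x] += p

print(dict(marginal_X))   # marginal PMF of X: P(X=0) = 0.4, P(X=1) = 0.6
```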
Example 1
Suppose that 3 balls are randomly selected from an urn containing 3 red, 4 white, and 5 blue balls. If we let X and Y denote, respectively, the number of red and white balls chosen, then the joint probability mass function of X and Y, p(i, j) = P{X = i, Y = j}, is given by

p(0, 0) = C(5,3)/C(12,3) = 10/220
p(0, 1) = C(4,1)C(5,2)/C(12,3) = 40/220
p(0, 2) = C(4,2)C(5,1)/C(12,3) = 30/220
p(0, 3) = C(4,3)/C(12,3) = 4/220
p(1, 0) = C(3,1)C(5,2)/C(12,3) = 30/220
p(1, 1) = C(3,1)C(4,1)C(5,1)/C(12,3) = 60/220
p(1, 2) = C(3,1)C(4,2)/C(12,3) = 18/220
p(2, 0) = C(3,2)C(5,1)/C(12,3) = 15/220
p(2, 1) = C(3,2)C(4,1)/C(12,3) = 12/220
p(3, 0) = C(3,3)/C(12,3) = 1/220

where C(n, k) denotes the binomial coefficient "n choose k".
Note that the probability mass function of X is obtained by computing the row sums, whereas the probability mass function of Y is obtained by computing the column sums.

Table 1: P{X = i, Y = j}

                          j = 0     j = 1     j = 2     j = 3     Row sum = P{X = i}
 i = 0                   10/220    40/220    30/220     4/220      84/220
 i = 1                   30/220    60/220    18/220     0         108/220
 i = 2                   15/220    12/220     0         0          27/220
 i = 3                    1/220     0         0         0           1/220
 Column sum = P{Y = j}   56/220   112/220    48/220     4/220

Because the individual probability mass functions of X and Y thus appear in the margin of such a table, they are often referred to as the marginal probability mass functions of X and Y, respectively. ■
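The entries of Table 1 and both marginals can be checked by direct counting; here is a short sketch using math.comb (not part of the original example; exact fractions print in lowest terms, e.g. 84/220 appears as 21/55).

```python
# A sketch reproducing Table 1 by counting: joint PMF of Example 1 via
# math.comb, then the marginal PMFs as row and column sums.
from math import comb
from fractions import Fraction

total = comb(12, 3)                       # number of ways to draw 3 of 12 balls
p = {}                                    # joint PMF: p[(i, j)] = P{X = i, Y = j}
for i in range(4):                        # i red balls drawn
    for j in range(4 - i):                # j white balls drawn, i + j <= 3
        p[(i, j)] = Fraction(comb(3, i) * comb(4, j) * comb(5, 3 - i - j), total)

pX = {i: sum(q for (r, w), q in p.items() if r == i) for i in range(4)}  # row sums
pY = {j: sum(q for (r, w), q in p.items() if w == j) for j in range(4)}  # column sums

print(p[(1, 1)])        # 3/11, i.e. 60/220, matching the table
print(pX[0], pY[0])     # 21/55 (= 84/220) and 14/55 (= 56/220)
```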
Definition 3
1. We say that X and Y are jointly continuous if there exists a
   function f(x, y), defined for all real x and y (that is, a
   function f : R² → [0, ∞)), having the property that

       P{(X, Y) ∈ C} = ∫∫_{(x,y)∈C} f(x, y) dx dy          (3)

   for every "measurable" set C of pairs of real numbers (that is,
   every measurable set C ⊆ R² in the two-dimensional plane).
   ("Measurable" sets include, for instance, countable
   unions/intersections of rectangles [a, b] × [c, d].)
2. The function f(x, y) is called the joint probability density
   function (joint PDF) of X and Y.
In particular,

P{X ≤ a, Y ≤ b} = P{(X, Y) ∈ (−∞, a] × (−∞, b]}
                = ∫_{−∞}^{b} ∫_{−∞}^{a} f(x, y) dx dy

If A and B are any sets of real numbers, i.e., A, B ⊆ R, then, by
defining C = {(x, y) : x ∈ A, y ∈ B} = A × B, we see from
Equation (3) that

P{X ∈ A, Y ∈ B} = ∫_B ∫_A f(x, y) dx dy          (4)

(viewed as an iterated integral).
Because

F(a, b) = P{X ∈ (−∞, a], Y ∈ (−∞, b]}
        = ∫_{−∞}^{b} ∫_{−∞}^{a} f(x, y) dx dy

it follows, upon differentiation, that

f(a, b) = ∂²F(a, b) / ∂a∂b
wherever the partial derivatives are defined. Another interpretation of the joint density function, obtained from Equation (4), is

P{a < X < a + da, b < Y < b + db} = ∫_b^{b+db} ∫_a^{a+da} f(x, y) dx dy ≈ f(a, b) da db

when da and db are small and f(x, y) is continuous at (a, b). Hence, f(a, b) is a measure of how likely it is that the random vector (X, Y) will be near (a, b).
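A quick numerical sketch of this approximation, using for concreteness the density f(x, y) = 2e^{−x}e^{−2y} of Example 2 below; its joint CDF works out to F(a, b) = (1 − e^{−a})(1 − e^{−2b}) for a, b > 0 (this closed form is computed here, not stated in the notes).

```python
# A numerical sketch: the probability of a small rectangle is close to
# f(a, b) * da * db when da and db are small.
import math

def f(x, y):
    return 2.0 * math.exp(-x) * math.exp(-2.0 * y)

def F(a, b):
    return (1.0 - math.exp(-a)) * (1.0 - math.exp(-2.0 * b))

a, b, da, db = 0.7, 0.3, 1e-3, 1e-3

# Exact probability of the small rectangle, via the four-corner formula (2).
exact = F(a + da, b + db) + F(a, b) - F(a, b + db) - F(a + da, b)

print(exact, f(a, b) * da * db)   # nearly identical for small da, db
```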
By Equation (3), with C = {(x, y) : x ∈ A, y ∈ B} (viewed as an iterated integral), we have

P{X ∈ A, Y ∈ B} = ∫_B ∫_A f(x, y) dx dy

and in turn, taking B = R,

P{X ∈ A} = ∫_A fX(x) dx

where the PDF of X is

fX(x) = ∫_{B=R} f(x, y) dy = ∫_{−∞}^{∞} f(x, y) dy.          (5)
Deduction for (5): We have the following facts:

{X ∈ A, Y ∈ B = R = (−∞, ∞)} = {X ∈ A} ∩ {Y ∈ R} = {X ∈ A}.

If X and Y are jointly continuous, they are individually continuous, and their probability density functions (PDFs) can be obtained as follows:

P{X ∈ A} = P{X ∈ A, Y ∈ (−∞, ∞)}
         = ∫_A ( ∫_{−∞}^{∞} f(x, y) dy ) dx
         = ∫_A fX(x) dx

where

fX(x) = ∫_{−∞}^{∞} f(x, y) dy

is thus the probability density function (PDF) of X. Similarly, the probability density function of Y is given by

fY(y) = ∫_{−∞}^{∞} f(x, y) dx
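As a small sketch of Equation (5) in code, a marginal density can be recovered by numerically integrating the joint density over the other variable. The toy density f(x, y) = x + y on the unit square below is not from the lecture and is used only for illustration; its exact marginal is fX(x) = x + 1/2 for 0 < x < 1.

```python
# A sketch of Equation (5): recover the marginal f_X by integrating the joint
# density over y with a simple midpoint Riemann sum.
def f(x, y):
    return x + y if (0.0 < x < 1.0 and 0.0 < y < 1.0) else 0.0

def marginal_x(x, n=100_000):
    """Approximate f_X(x) = integral of f(x, y) over y."""
    h = 1.0 / n
    return sum(f(x, (k + 0.5) * h) for k in range(n)) * h

print(marginal_x(0.3))   # approximately 0.8 = 0.3 + 1/2
```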
Example 2
The joint density function of X and Y is given by
f(x, y) = 2e^{−x} e^{−2y} for 0 < x < ∞, 0 < y < ∞, and f(x, y) = 0 otherwise.
Compute
(a) P {X > 1, Y < 1},
(b) P {X < Y }, and
(c) P {X < a}.
Solution.
(a)

P{X > 1, Y < 1} = ∫_0^1 ∫_1^∞ 2e^{−x} e^{−2y} dx dy
                = ∫_0^1 2e^{−2y} [−e^{−x}]_{x=1}^{∞} dy
                = e^{−1} ∫_0^1 2e^{−2y} dy
                = e^{−1} (1 − e^{−2})
(b)

P{X < Y} = ∫∫_{(x,y): x<y} 2e^{−x} e^{−2y} dx dy
         = ∫_0^∞ ∫_0^y 2e^{−x} e^{−2y} dx dy
         = ∫_0^∞ 2e^{−2y} (1 − e^{−y}) dy
         = ∫_0^∞ 2e^{−2y} dy − ∫_0^∞ 2e^{−3y} dy
         = 1 − 2/3
         = 1/3
(c)

P{X < a} = ∫_0^a ∫_0^∞ 2e^{−2y} e^{−x} dy dx
         = ∫_0^a [−e^{−2y}]_{y=0}^{∞} e^{−x} dx
         = ∫_0^a e^{−x} dx
         = 1 − e^{−a}
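A Monte Carlo sketch of the three answers (not part of the original solution): since the density factors as 2e^{−x} · e^{−2y}, one may sample X and Y as independent exponentials with rates 1 and 2 and compare empirical frequencies with the computed probabilities.

```python
# Monte Carlo check of Example 2 (random.expovariate takes a rate parameter).
import math
import random

random.seed(0)
N = 200_000
a = 2.0
hits_a = hits_b = hits_c = 0
for _ in range(N):
    x = random.expovariate(1.0)
    y = random.expovariate(2.0)
    hits_a += (x > 1 and y < 1)
    hits_b += (x < y)
    hits_c += (x < a)

print(hits_a / N, math.exp(-1) * (1 - math.exp(-2)))   # (a): about 0.318
print(hits_b / N, 1 / 3)                               # (b): about 0.333
print(hits_c / N, 1 - math.exp(-a))                    # (c): about 0.865
```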


Example 3
The joint density of X and Y is given by
f(x, y) = e^{−(x+y)} for 0 < x < ∞, 0 < y < ∞, and f(x, y) = 0 otherwise.

Find the density function of the random variable X/Y.
Solution. Since f(x, y) = 0 if (x, y) ∉ (0, ∞) × (0, ∞), we may assume that X and Y always take positive values, and hence so does X/Y.
For a > 0,

F_{X/Y}(a) = P{X/Y ≤ a} = P{X ≤ aY}
           = ∫∫_{x ≤ ay} f(x, y) dx dy
           = ∫∫_{x ≤ ay} e^{−(x+y)} dx dy
           = ∫∫_{(x,y)∈(0,∞)×(0,∞): x ≤ ay} e^{−(x+y)} dx dy
           = ∫_0^∞ ∫_0^{ay} e^{−(x+y)} dx dy
           = ∫_0^∞ (1 − e^{−ay}) e^{−y} dy
           = [ −e^{−y} + e^{−(a+1)y}/(a + 1) ]_{y=0}^{∞}
           = 1 − 1/(a + 1)

Differentiation shows that the density function of X/Y is given by

f_{X/Y}(a) = 1/(a + 1)² for 0 < a < ∞, and f_{X/Y}(a) = 0 otherwise.
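A simulation sketch of this result (not part of the original example): since e^{−(x+y)} = e^{−x}e^{−y}, X and Y may be sampled as independent Exp(1) variables, and the empirical CDF of X/Y can be compared with 1 − 1/(a + 1).

```python
# Simulation check of Example 3.
import random

random.seed(0)
N = 200_000
ratios = [random.expovariate(1.0) / random.expovariate(1.0) for _ in range(N)]

for a in (0.5, 1.0, 3.0):
    empirical = sum(r <= a for r in ratios) / N
    exact = 1.0 - 1.0 / (a + 1.0)
    print(a, round(empirical, 3), round(exact, 3))   # e.g. a = 1.0 gives about 0.5
```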


Proposition 1
Suppose X and Y have a joint density function f . Then the
marginal densities of X and Y are given by
fX(a) = ∫_{−∞}^{∞} f(a, y) dy,   a ∈ R

and

fY(b) = ∫_{−∞}^{∞} f(x, b) dx,   b ∈ R

Proof. Notice that

FX(a) = P{X ≤ a} = ∫_{−∞}^{a} ( ∫_{−∞}^{∞} f(x, y) dy ) dx

Let g(x) = ∫_{−∞}^{∞} f(x, y) dy. Then

FX(a) = ∫_{−∞}^{a} g(x) dx

Taking the derivative gives

fX(a) = d/da FX(a) = d/da ∫_{−∞}^{a} g(x) dx = g(a) = ∫_{−∞}^{∞} f(a, y) dy

(assuming that g is continuous at a)


Similarly,

FY(b) = P{Y ≤ b} = ∫_{−∞}^{b} ( ∫_{−∞}^{∞} f(x, y) dx ) dy

Let g(y) = ∫_{−∞}^{∞} f(x, y) dx. Then

FY(b) = ∫_{−∞}^{b} g(y) dy

Taking the derivative gives

fY(b) = d/db FY(b) = d/db ∫_{−∞}^{b} g(y) dy = g(b) = ∫_{−∞}^{∞} f(x, b) dx

(under some regularity assumptions on f)

We can also define joint probability distributions for n random variables in exactly the same manner as we did for n = 2.
Generalization:
The joint cumulative probability distribution function F(a1, a2, ..., an) of the n random variables X1, X2, ..., Xn is defined by

F(a1, a2, ..., an) = P{X1 ≤ a1, X2 ≤ a2, ..., Xn ≤ an}

Further, the n random variables are said to be jointly continuous if there exists a function f(x1, x2, ..., xn), called the joint probability density function, such that, for any set C in n-space, i.e., C ⊆ R^n,

P{(X1, X2, ..., Xn) ∈ C} = ∫ ··· ∫_{(x1,...,xn)∈C} f(x1, ..., xn) dx1 dx2 ··· dxn

In particular, for any n sets of real numbers A1, A2, ..., An, i.e., Aj ⊆ R, and C = A1 × A2 × ··· × An ⊆ R^n,

P{X1 ∈ A1, X2 ∈ A2, ..., Xn ∈ An} = P{(X1, ..., Xn) ∈ C}
    = ∫_{An} ∫_{An−1} ··· ∫_{A1} f(x1, ..., xn) dx1 dx2 ··· dxn

(viewed as an iterated integral).


Independent Random Variables
Definition 4
The random variables X and Y are said to be independent if, for
any two sets of real numbers A and B,

P {X ∈ A, Y ∈ B} = P {X ∈ A}P {Y ∈ B} (6)

or in terms of the joint distribution function F,

F (a, b) = FX (a)FY (b) a, b ∈ R. (7)

In other words, X and Y are independent if, for all A and B, the
events EA = {X ∈ A} and FB = {Y ∈ B} are independent.
It can be shown by using the three axioms of probability that Equa-
tion (6) will follow if and only if, for all a, b,

P {X ≤ a, Y ≤ b} = P {X ≤ a}P {Y ≤ b}
Hence, in terms of the joint distribution function F of X and Y, X
and Y are independent if

F (a, b) = FX (a)FY (b) for all a, b

When X and Y are discrete random variables, the condition of independence (6) is equivalent to

p(x, y) = pX(x) pY(y)   for all x, y          (8)

The equivalence follows because, if Equation (6) is satisfied, then we obtain Equation (8) by letting A and B be, respectively, the one-point sets A = {x} and B = {y}.
Furthermore, if Equation (8) is valid, then, for any sets A, B,

P{X ∈ A, Y ∈ B} = Σ_{y∈B} Σ_{x∈A} p(x, y)
                = Σ_{y∈B} Σ_{x∈A} pX(x) pY(y)
                = Σ_{y∈B} pY(y) Σ_{x∈A} pX(x)
                = P{Y ∈ B} P{X ∈ A}

and Equation (6) is established.
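As a small illustration of condition (8), the joint PMF of Example 1 fails the factorization test; a self-contained sketch (not part of the original notes):

```python
# Apply condition (8) to Example 1: rebuild the urn joint PMF and test whether
# p(x, y) = p_X(x) p_Y(y) at every point of {0, 1, 2, 3}^2.
from math import comb
from fractions import Fraction

total = comb(12, 3)
p = {(i, j): Fraction(comb(3, i) * comb(4, j) * comb(5, 3 - i - j), total)
     for i in range(4) for j in range(4 - i)}
pX = {i: sum(q for (r, w), q in p.items() if r == i) for i in range(4)}
pY = {j: sum(q for (r, w), q in p.items() if w == j) for j in range(4)}

factorizes = all(p.get((i, j), Fraction(0)) == pX[i] * pY[j]
                 for i in range(4) for j in range(4))
print(factorizes)   # False: e.g. p(3, 1) = 0 while p_X(3) p_Y(1) > 0
```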


In the jointly continuous case, the condition of independence is equivalent to

f(x, y) = fX(x) fY(y)   for all x, y

Thus, loosely speaking, X and Y are independent if knowing the value of one does not change the distribution of the other. Random variables that are not independent are said to be dependent.
Proof.
X and Y are independent ⟺ F(a, b) = FX(a) FY(b), ∀a, b ∈ R. Differentiating both sides gives

∂²F(a, b)/∂a∂b = (dFX(a)/da)(dFY(b)/db),   i.e.,   f(a, b) = fX(a) fY(b)          (9)

Conversely, if Equation (9) holds, then

F(a, b) = ∫_{−∞}^{b} ∫_{−∞}^{a} f(x, y) dx dy
        = ∫_{−∞}^{b} ∫_{−∞}^{a} fX(x) fY(y) dx dy
        = ( ∫_{−∞}^{b} fY(y) dy )( ∫_{−∞}^{a} fX(x) dx )
        = FY(b) FX(a)

Hence, X and Y are independent.
Example 4
Suppose X and Y have a joint density

f (x, y) = 24xy 0 < x < 1, 0 < y < 1, 0 < x + y < 1

and is equal to 0 otherwise.


Determine whether X and Y are independent.
Solution
We first calculate the marginal densities fX(x) and fY(y).
Note that

fX(a) = ∫_{−∞}^{∞} f(a, y) dy
      = ∫_0^{1−a} 24ay dy
      = 24a [y²/2]_{y=0}^{1−a}
      = 12a(1 − a)²,   if 0 < a < 1

and

fY(b) = ∫_{−∞}^{∞} f(x, b) dx
      = ∫_0^{1−b} 24xb dx
      = 24b [x²/2]_{x=0}^{1−b}
      = 12b(1 − b)²,   if 0 < b < 1

Clearly, f(a, b) ≠ fX(a) fY(b). Hence, X and Y are not independent. ■
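A numerical sketch of this conclusion (not part of the original solution): it recovers the marginal 12a(1 − a)² by a simple Riemann sum and shows that f(a, b) differs from the product of the marginals at a sample point.

```python
# Numerical check of Example 4.
def f(x, y):
    return 24.0 * x * y if (x > 0 and y > 0 and x + y < 1) else 0.0

def marginal(a, n=100_000):
    """Approximate f_X(a) = integral of f(a, y) over y by a midpoint Riemann sum."""
    h = 1.0 / n
    return sum(f(a, (k + 0.5) * h) for k in range(n)) * h

a, b = 0.3, 0.2
print(marginal(a), 12 * a * (1 - a) ** 2)     # both about 1.764
product = 12 * a * (1 - a) ** 2 * 12 * b * (1 - b) ** 2
print(f(a, b), product)                       # 1.44 versus about 2.71: not equal
```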
Proposition 2
The continuous (discrete) random variables X and Y are independent if and only if their joint PDF (PMF) can be written in the product form f(x, y) = h(x)g(y), x, y ∈ R.
