ST 610 Lect 4

Chapter 4

Multiple Random Variables

4.1 Joint and Marginal Distributions

Definition 4.1.1 An n-dimensional random vector is a function from a sample space S into Rⁿ, n-dimensional Euclidean space.

Suppose, for example, that with each point in a sample space we associate an ordered pair of numbers, that is, a point (x, y) ∈ R², where R² denotes the plane. Then we have defined a two-dimensional (or bivariate) random vector (X, Y).

Example 4.1.1 (Sample space for dice) Consider the experiment


of tossing two fair dice. The sample space for this experiment has
36 equally likely points. Let

X = sum of the two dice and Y = |difference of the two dice|.

In this way we have defined the bivariate random vector (X, Y).



The random vector (X, Y ) defined above is called a discrete random


vector because it has only a countable (in this case, finite) number of
possible values. The probabilities of events defined in terms of X and
Y are just defined in terms of the probabilities of the corresponding
events in the sample space S. For example,
P(X = 5, Y = 3) = P({(4, 1), (1, 4)}) = 2/36 = 1/18.

Definition 4.1.2 Let (X, Y ) be a discrete bivariate random vec-


tor. Then the function f (x, y) from R2 into R defined by f (x, y) =
P (X = x, Y = y) is called the joint probability mass function or
joint pmf of (X, Y ). If it is necessary to stress the fact that f is
the joint pmf of the vector (X, Y ) rather than some other vector,
the notation fX,Y (x, y) will be used.

The joint pmf can be used to compute the probability of any event
defined in terms of (X, Y ). Let A be any subset of R2. Then
P((X, Y) ∈ A) = Σ_{(x,y)∈A} f(x, y).

Expectations of functions of random vectors are computed just as


with univariate random variables. Let g(x, y) be a real-valued function
defined for all possible values (x, y) of the discrete random vector
(X, Y ). Then g(X, Y ) is itself a random variable and its expected
value Eg(X, Y ) is given by
Eg(X, Y) = Σ_{(x,y)∈R²} g(x, y) f(x, y).

Example 4.1.2 (Continuation of Example 4.1.1) For the (X, Y )


whose joint pmf is given in the following table
                                   X
           2     3     4     5     6     7     8     9    10    11    12
    0   1/36     0  1/36     0  1/36     0  1/36     0  1/36     0  1/36
    1      0  1/18     0  1/18     0  1/18     0  1/18     0  1/18     0
Y   2      0     0  1/18     0  1/18     0  1/18     0  1/18     0     0
    3      0     0     0  1/18     0  1/18     0  1/18     0     0     0
    4      0     0     0     0  1/18     0  1/18     0     0     0     0
    5      0     0     0     0     0  1/18     0     0     0     0     0

Letting g(x, y) = xy, we have


EXY = (2)(0)(1/36) + · · · + (7)(5)(1/18) = 245/18 = 13 11/18.
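As a quick numerical check of the two calculations above (not part of the original notes), the following Python sketch enumerates the 36 equally likely outcomes, builds the joint pmf of (X, Y), and reproduces P(X = 5, Y = 3) and EXY.

```python
# Build the joint pmf of (X, Y) for two fair dice and check the computations above.
from fractions import Fraction
from itertools import product

joint = {}  # joint pmf f(x, y) stored as {(x, y): probability}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (d1 + d2, abs(d1 - d2))          # X = sum, Y = |difference|
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

print(joint[(5, 3)])                        # P(X = 5, Y = 3) = 1/18
exy = sum(x * y * p for (x, y), p in joint.items())
print(exy)                                  # EXY = 245/18 (= 13 11/18)
```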

The expectation operator continues to have the properties listed in


Theorem 2.2.5 (textbook). For example, if g1(x, y) and g2(x, y) are
two functions and a, b and c are constants, then

E(ag1(X, Y ) + bg2(X, Y ) + c) = aEg1(X, Y ) + bEg2(X, Y ) + c.

For any (x, y), f (x, y) ≥ 0 since f (x, y) is a probability. Also,


since (X, Y) is certain to be in R²,

Σ_{(x,y)∈R²} f(x, y) = P((X, Y) ∈ R²) = 1.

Theorem 4.1.1 Let (X, Y ) be a discrete bivariate random vector


with joint pmf fXY (x, y). Then the marginal pmfs of X and Y ,
fX (x) = P (X = x) and fY (y) = P (Y = y), are given by
fX(x) = Σ_{y∈R} fX,Y(x, y)  and  fY(y) = Σ_{x∈R} fX,Y(x, y).

Proof: For any x ∈ R, let Ax = {(x, y) : −∞ < y < ∞}. That is,
Ax is the line in the plane with first coordinate equal to x. Then, for
any x ∈ R,

fX (x) = P (X = x)

= P (X = x, −∞ < Y < ∞) (P (−∞ < Y < ∞) = 1)

= P ((X, Y ) ∈ Ax) (definition of Ax)


= Σ_{(x,y)∈Ax} fX,Y(x, y)

= Σ_{y∈R} fX,Y(x, y).

The proof for fY (y) is similar. □



Example 4.1.3 (Marginal pmf for dice) Using the table given in
Example 4.1.2, compute the marginal pmf of Y . Using Theorem
4.1.1, we have
fY(0) = fX,Y(2, 0) + · · · + fX,Y(12, 0) = 1/6.

Similarly, we obtain

fY(1) = 5/18,  fY(2) = 2/9,  fY(3) = 1/6,  fY(4) = 1/9,  fY(5) = 1/18.

Notice that Σ_{i=0}^{5} fY(i) = 1.
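The same enumeration used earlier can check Theorem 4.1.1 numerically; the sketch below (Python standard library only, not part of the original notes) recomputes the marginal pmf of Y by summing the joint pmf over x.

```python
# Recompute the marginal pmf of Y in Example 4.1.3 by summing the joint pmf over x.
from fractions import Fraction
from itertools import product

joint = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (d1 + d2, abs(d1 - d2))          # X = sum, Y = |difference|
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

fY = {}
for (x, y), p in joint.items():
    fY[y] = fY.get(y, Fraction(0)) + p     # fY(y) = sum over x of f(x, y)

for y in sorted(fY):
    print(y, fY[y])                        # 1/6, 5/18, 2/9, 1/6, 1/9, 1/18
print(sum(fY.values()))                    # 1
```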

The marginal distributions of X and Y do not completely describe


the joint distribution of X and Y . Indeed, there are many different
joint distributions that have the same marginal distribution. Thus, it
is hopeless to try to determine the joint pmf from the knowledge of
only the marginal pmfs. The next example illustrates the point.

Example 4.1.4 (Same marginals, different joint pmf) Consider the following two joint pmfs:

f(0, 0) = 1/12,  f(1, 0) = 5/12,  f(0, 1) = f(1, 1) = 3/12,  f(x, y) = 0 for all other (x, y),

and

f(0, 0) = f(0, 1) = 1/6,  f(1, 0) = f(1, 1) = 1/3,  f(x, y) = 0 for all other (x, y).

It is easy to verify that they have the same marginal distributions. The marginal of X is

fX(0) = 1/3,  fX(1) = 2/3.

The marginal of Y is

fY(0) = 1/2,  fY(1) = 1/2.

In the following we consider random vectors whose components are


continuous random variables.

Definition 4.1.3 A function f (x, y) from R2 into R is called a


joint probability density function or joint pdf of the continuous
bivariate random vector (X, Y) if, for every A ⊂ R²,

P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy.

If g(x, y) is a real-valued function, then the expected value of


g(X, Y ) is defined to be
Eg(X, Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy.

The marginal probability density functions of X and Y are defined


as
fX(x) = ∫_{−∞}^{∞} f(x, y) dy,  −∞ < x < ∞,

fY(y) = ∫_{−∞}^{∞} f(x, y) dx,  −∞ < y < ∞.

Any function f(x, y) satisfying f(x, y) ≥ 0 for all (x, y) ∈ R² and

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1

is the joint pdf of some continuous bivariate random vector (X, Y ).



Example 4.1.5 (Calculating joint probabilities-I) Define a joint


pdf by

f(x, y) = 6xy²  if 0 < x < 1 and 0 < y < 1,
f(x, y) = 0     otherwise.
Now, consider calculating a probability such as P (X + Y ≥ 1).
Let A = {(x, y) : x + y ≥ 1}; we can re-express A as

A = {(x, y) : x + y ≥ 1, 0 < x < 1, 0 < y < 1} = {(x, y) : 1 − y ≤ x < 1, 0 < y < 1}.

Thus, we have
P(X + Y ≥ 1) = ∫∫_A f(x, y) dx dy = ∫_0^1 ∫_{1−y}^1 6xy² dx dy = 9/10.
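A numerical check of this probability (not part of the original notes), assuming scipy is available; note that scipy.integrate.dblquad takes the inner integration variable as the first argument of the integrand.

```python
# Verify P(X + Y >= 1) = 9/10 by numerical integration.
from scipy.integrate import dblquad

# Outer variable: y in (0, 1); inner variable: x in (1 - y, 1).
prob, err = dblquad(lambda x, y: 6 * x * y**2, 0, 1,
                    lambda y: 1 - y, lambda y: 1.0)
print(prob)   # approximately 0.9
```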
The joint cdf is the function F (x, y) defined by
F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(s, t) dt ds.

4.2 Conditional Distributions and Independence

Definition 4.2.1 Let (X, Y ) be a discrete bivariate random vec-


tor with joint pmf f (x, y) and marginal pmfs fX (x) and fY (y).
For any x such that P (X = x) = fX (x) > 0, the conditional pmf
of Y given that X = x is the function of y denoted by f (y|x) and
defined by
f(y|x) = P(Y = y|X = x) = f(x, y)/fX(x).
For any y such that P (Y = y) = fY (y) > 0, the conditional pmf
of X given that Y = y is the function of x denoted by f (x|y) and
defined by
f(x|y) = P(X = x|Y = y) = f(x, y)/fY(y).
It is easy to verify that f (y|x) and f (x|y) are indeed distributions.
First, f (y|x) ≥ 0 for every y since f (x, y) ≥ 0 and fX (x) > 0.
Second,

Σ_y f(y|x) = Σ_y f(x, y)/fX(x) = fX(x)/fX(x) = 1.

Example 4.2.1 (Calculating conditional probabilities) Define the


joint pmf of (X, Y) by

f(0, 10) = f(0, 20) = 2/18,  f(1, 10) = f(1, 30) = 3/18,  f(1, 20) = f(2, 30) = 4/18.

The conditional probability is

fY|X(10|0) = f(0, 10)/fX(0) = f(0, 10)/(f(0, 10) + f(0, 20)) = 1/2.
Definition 4.2.2 Let (X, Y ) be a continuous bivariate random
vector with joint pdf f (x, y) and marginal pdfs fX (x) and fY (y).
For any x such that fX (x) > 0, the conditional pdf of Y given
that X = x is the function of y denoted by f (y|x) and defined by
f(y|x) = f(x, y)/fX(x).
For any y such that fY (y) > 0, the conditional pdf of X given
that Y = y is the function of x denoted by f (x|y) and defined by
f(x|y) = f(x, y)/fY(y).
If g(Y ) is a function of Y , then the conditional expected value of
g(Y ) given that X = x is denoted by E(g(Y )|x) and is given by
E(g(Y)|x) = Σ_y g(y) f(y|x)  and  E(g(Y)|x) = ∫_{−∞}^{∞} g(y) f(y|x) dy

in the discrete and continuous cases, respectively.



Example 4.2.2 (Calculating conditional pdfs) Let the continu-


ous random vector (X, Y ) have joint pdf

f(x, y) = e^{−y},  0 < x < y < ∞.

The marginal of X is

fX(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_x^{∞} e^{−y} dy = e^{−x}.

Thus, marginally, X has an exponential distribution. The conditional distribution of Y given X = x is

f(y|x) = f(x, y)/fX(x) = e^{−y}/e^{−x} = e^{−(y−x)}  if y > x,
f(y|x) = 0/e^{−x} = 0                                 if y ≤ x.

The mean of the conditional distribution is


E(Y|X = x) = ∫_x^{∞} y e^{−(y−x)} dy = 1 + x.

The variance of the conditional distribution is

Var(Y|x) = E(Y²|x) − (E(Y|x))²
         = ∫_x^{∞} y² e^{−(y−x)} dy − (∫_x^{∞} y e^{−(y−x)} dy)²
         = 1.
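A numerical check of the conditional mean and variance (not part of the original notes) for one arbitrary value of x (x = 1.7 is chosen only for illustration), assuming scipy is available:

```python
# Check E(Y | X = x) = 1 + x and Var(Y | X = x) = 1 for f(y|x) = exp(-(y - x)), y > x.
import numpy as np
from scipy.integrate import quad

x = 1.7  # arbitrary value used for the check
mean, _ = quad(lambda y: y * np.exp(-(y - x)), x, np.inf)
second, _ = quad(lambda y: y**2 * np.exp(-(y - x)), x, np.inf)
print(mean, second - mean**2)   # approximately 2.7 and 1.0
```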

In all the previous examples, the conditional distribution of Y given


X = x was different for different values of x. In some situations, the
knowledge that X = x does not give us any more information about
Y than we already had. This important relationship between X and
Y is called independence.

Definition 4.2.3 Let (X, Y ) be a bivariate random vector with


joint pdf or pmf f (x, y) and marginal pdfs or pmfs fX (x) and
fY (y). Then X and Y are called independent random variables
if, for EVERY x ∈ R and y ∈ R,

f (x, y) = fX (x)fY (y).

If X and Y are independent, the conditional pdf of Y given X = x is


f(y|x) = f(x, y)/fX(x) = fX(x)fY(y)/fX(x) = fY(y)
regardless of the value of x.

Lemma 4.2.1 Let (X, Y ) be a bivariate random vector with joint


pdf or pmf f (x, y). Then X and Y are independent random vari-
ables if and only if there exist functions g(x) and h(y) such that,
for every x ∈ R and y ∈ R,

f (x, y) = g(x)h(y).

Proof: The “only if” part is proved by defining g(x) = fX (x) and
h(y) = fY (y). To prove the “if” part for continuous random vari-
ables, suppose that f (x, y) = g(x)h(y). Define
∫_{−∞}^{∞} g(x) dx = c  and  ∫_{−∞}^{∞} h(y) dy = d,

where the constants c and d satisfy


cd = (∫_{−∞}^{∞} g(x) dx)(∫_{−∞}^{∞} h(y) dy)
   = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) dx dy
   = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.

Furthermore, the marginal pdfs are given by


fX(x) = ∫_{−∞}^{∞} g(x)h(y) dy = g(x)d

and

fY(y) = ∫_{−∞}^{∞} g(x)h(y) dx = h(y)c.

Thus, we have

f(x, y) = g(x)h(y) = g(x)h(y)cd = (g(x)d)(h(y)c) = fX(x)fY(y),

showing that X and Y are independent. Replacing integrals with


sums proves the lemma for discrete random vectors. □

Example 4.2.3 (Checking independence) Consider the joint pdf


f(x, y) = (1/384) x² y⁴ e^{−y−x/2},  x > 0 and y > 0. If we define

g(x) = x² e^{−x/2}  if x > 0,   g(x) = 0  if x ≤ 0,

and

h(y) = y⁴ e^{−y}/384  if y > 0,   h(y) = 0  if y ≤ 0,
then f (x, y) = g(x)h(y) for all x ∈ R and all y ∈ R. By Lemma
4.2.1, we conclude that X and Y are independent random vari-
ables.

Theorem 4.2.1 Let X and Y be independent random variables.

(a) For any A ⊂ R and B ⊂ R, P (X ∈ A, Y ∈ B) = P (X ∈


A)P (Y ∈ B); that is, the events {X ∈ A} and {Y ∈ B} are
independent events.

(b) Let g(x) be a function only of x and h(y) be a function only


of y. Then

E(g(X)h(Y )) = (Eg(X))(Eh(Y )).

Proof: For continuous random variables, part (b) is proved by not-


ing that
E(g(X)h(Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) f(x, y) dx dy
            = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) fX(x)fY(y) dx dy
            = (∫_{−∞}^{∞} g(x)fX(x) dx)(∫_{−∞}^{∞} h(y)fY(y) dy)
            = (Eg(X))(Eh(Y)).
The result for discrete random variables is proved by replacing integrals by sums.
Part (a) can be proved similarly. Let g(x) be the indicator function of the set A. Let h(y) be the indicator function of the set B. Note that g(x)h(y) is the indicator function of the set C ⊂ R² defined by

C = {(x, y) : x ∈ A, y ∈ B}. Also note that for an indicator function


such as g(x), Eg(X) = P (X ∈ A). Thus,

P (X ∈ A, Y ∈ B) = P ((X, Y ) ∈ C) = E(g(X)h(Y ))

= (Eg(X))(Eh(Y )) = P (X ∈ A)P (Y ∈ B).


□

Example 4.2.4 (Expectations of independent variables) Let X


and Y be independent exponential(1) random variables. So

P(X ≥ 4, Y ≤ 3) = P(X ≥ 4)P(Y ≤ 3) = e^{−4}(1 − e^{−3}).

Letting g(x) = x2 and h(y) = y, we have

E(X 2Y ) = E(X 2)E(Y ) = (2)(1) = 2.

Theorem 4.2.2 Let X and Y be independent random variables


with moment generating functions MX (t) and MY (t). Then the
moment generating function of the random variable Z = X + Y
is given by
MZ (t) = MX (t)MY (t).

Proof:

MZ(t) = E e^{t(X+Y)} = E(e^{tX} e^{tY}) = (E e^{tX})(E e^{tY}) = MX(t)MY(t), where the factorization of the expectation uses Theorem 4.2.1(b).

□

Theorem 4.2.3 Let X ∼ N (µ, σ 2) and Y ∼ N (γ, τ 2) be inde-


pendent normal random variables. Then the random variable
Z = X + Y has a N (µ + γ, σ 2 + τ 2) distribution.

Proof: Using Theorem 4.2.2, we have

MZ(t) = MX(t)MY(t) = exp{µt + σ²t²/2} exp{γt + τ²t²/2} = exp{(µ + γ)t + (σ² + τ²)t²/2}.

Hence, Z ∼ N(µ + γ, σ² + τ²). □
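A simulation sketch of Theorem 4.2.3 (not part of the original notes), using numpy with arbitrary parameter values:

```python
# The sum of independent N(mu, sigma^2) and N(gamma, tau^2) draws should behave
# like N(mu + gamma, sigma^2 + tau^2).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, gamma, tau = 1.0, 2.0, -3.0, 0.5
z = rng.normal(mu, sigma, 10**6) + rng.normal(gamma, tau, 10**6)
print(z.mean(), z.var())   # approximately -2.0 and 4.25
```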

4.3 Bivariate Transformations

Let (X, Y ) be a bivariate random vector with a known probability


distribution. Let U = g1(X, Y ) and V = g2(X, Y ), where g1(x, y)
and g2(x, y) are some specified functions. If B is any subset of R2,
then (U, V ) ∈ B if and only if (X, Y ) ∈ A, where A = {(x, y) :
(g1(x, y), g2(x, y)) ∈ B}. Thus P ((U, V ) ∈ B) = P ((X, Y ) ∈ A),
and the probability distribution of (U, V ) is completely determined by the proba-
bility distribution of (X, Y ).
If (X, Y ) is a discrete bivariate random vector, then
fU,V(u, v) = P(U = u, V = v) = P((X, Y) ∈ Au,v) = Σ_{(x,y)∈Au,v} fX,Y(x, y),

where Au,v = {(x, y) : g1(x, y) = u, g2(x, y) = v}.



Example 4.3.1 (Distribution of the sum of Poisson variables)


Let X and Y be independent Poisson random variables with pa-
rameters θ and λ, respectively. Thus, the joint pmf of (X, Y )
is
fX,Y(x, y) = (θ^x e^{−θ}/x!)(λ^y e^{−λ}/y!),  x = 0, 1, 2, . . . ,  y = 0, 1, 2, . . .

Now define U = X + Y and V = Y ; thus,

fU,V(u, v) = fX,Y(u − v, v) = (θ^{u−v} e^{−θ}/(u − v)!)(λ^v e^{−λ}/v!),  v = 0, 1, 2, . . . ,  u = v, v + 1, v + 2, . . .

The marginal of U is

fU(u) = Σ_{v=0}^{u} (θ^{u−v} e^{−θ}/(u − v)!)(λ^v e^{−λ}/v!) = e^{−(θ+λ)} Σ_{v=0}^{u} θ^{u−v} λ^v/((u − v)! v!)

      = (e^{−(θ+λ)}/u!) Σ_{v=0}^{u} (u choose v) λ^v θ^{u−v} = (e^{−(θ+λ)}/u!)(θ + λ)^u,  u = 0, 1, 2, . . .
This is the pmf of a Poisson random variable with parameter θ+λ.

Theorem 4.3.1 If X ∼ Poisson(θ) and Y ∼ Poisson(λ) and X and Y are independent, then X + Y ∼ Poisson(θ + λ).
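A simulation sketch of Theorem 4.3.1 (not part of the original notes), comparing the empirical pmf of X + Y with the Poisson(θ + λ) pmf at a few points; the parameter values are arbitrary and numpy/scipy are assumed available.

```python
# Compare simulated X + Y with the Poisson(theta + lambda) pmf.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
theta, lam = 2.0, 3.5
u = rng.poisson(theta, 10**6) + rng.poisson(lam, 10**6)
for k in range(5):
    print(k, (u == k).mean(), poisson.pmf(k, theta + lam))
```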

If (X, Y ) is a continuous random vector with joint pdf fX,Y (x, y),
then the joint pdf of (U, V ) can be expressed in terms of fX,Y (x, y) in
a similar way. As before, let A = {(x, y) : fX,Y (x, y) > 0} and B =
{(u, v) : u = g1(x, y) and v = g2(x, y) for some (x, y) ∈ A}. For the
simplest version of this result, we assume the transformation u =
g1(x, y) and v = g2(x, y) defines a one-to-one transformation of A
to B. For such a one-to-one, onto transformation, we can solve the
equations u = g1(x, y) and v = g2(x, y) for x and y in terms of u and
v. We will denote this inverse transformation by x = h1(u, v) and
y = h2(u, v). The role played by a derivative in the univariate case is
now played by a quantity called the Jacobian of the transformation.
It is defined by

J = | ∂x/∂u  ∂x/∂v |
    | ∂y/∂u  ∂y/∂v |  = (∂x/∂u)(∂y/∂v) − (∂x/∂v)(∂y/∂u),

where ∂x/∂u = ∂h1(u, v)/∂u, ∂x/∂v = ∂h1(u, v)/∂v, ∂y/∂u = ∂h2(u, v)/∂u, and ∂y/∂v = ∂h2(u, v)/∂v.

We assume that J is not identically 0 on B. Then the joint pdf of


(U, V ) is 0 outside the set B and on the set B is given by

fU,V (u, v) = fX,Y (h1(u, v), h2(u, v))|J|,



where |J| is the absolute value of J.

Example 4.3.2 (Sum and difference of normal variables) Let X


and Y be independent, standard normal variables. Consider the
transformation U = X + Y and V = X − Y . The joint pdf of X
and Y is, of course,

fX,Y(x, y) = (2π)^{−1} exp(−x²/2) exp(−y²/2),  −∞ < x < ∞, −∞ < y < ∞,

so the set A = R². Solving the equations

u = x + y  and  v = x − y

for x and y, we have

x = h1(u, v) = (u + v)/2  and  y = h2(u, v) = (u − v)/2.

Since the solution is unique, the transformation is a one-to-one, onto transformation from A to B = R². The Jacobian is

J = (∂x/∂u)(∂y/∂v) − (∂x/∂v)(∂y/∂u) = (1/2)(−1/2) − (1/2)(1/2) = −1/2.

So the joint pdf of (U, V ) is

fU,V(u, v) = fX,Y(h1(u, v), h2(u, v))|J| = (1/(2π)) e^{−((u+v)/2)²/2} e^{−((u−v)/2)²/2} (1/2)

for −∞ < u < ∞ and −∞ < v < ∞. After some simplification


and rearrangement we obtain
fU,V(u, v) = ((1/(√(2π) √2)) e^{−u²/4})((1/(√(2π) √2)) e^{−v²/4}).
The joint pdf has factored into a function of u and a function of
v. That implies U and V are independent.
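A simulation sketch of this conclusion (not part of the original notes): U and V should each have variance 2 and be uncorrelated (indeed independent), as the factored joint pdf shows.

```python
# Check that U = X + Y and V = X - Y are N(0, 2) and uncorrelated for standard normal X, Y.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(10**6), rng.standard_normal(10**6)
u, v = x + y, x - y
print(u.var(), v.var())          # both approximately 2
print(np.corrcoef(u, v)[0, 1])   # approximately 0
```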

Theorem 4.3.2 Let X and Y be independent random variables.


Let g(x) be a function only of x and h(y) be a function only of
y. Then the random variables U = g(X) and V = h(Y ) are
independent.

Proof: We will prove the theorem assuming U and V are continuous


random variables. For any u ∈ R and v ∈ R, define

Au = {x : g(x) ≤ u}  and  Bv = {y : h(y) ≤ v}.

Then the joint cdf of (U, V ) is

FU,V (u, v) = P (U ≤ u, V ≤ v)

= P (X ∈ Au, Y ∈ Bv )

= P (X ∈ Au)P (Y ∈ Bv ).

The joint pdf of (U, V ) is


fU,V(u, v) = (∂²/∂u∂v) FU,V(u, v) = ((d/du) P(X ∈ Au))((d/dv) P(Y ∈ Bv)),
where the first factor is a function only of u and the second factor is
a function only of v. Hence, U and V are independent. □

In many situations, the transformation of interest is not one-to-one.


Just as Theorem 2.1.8 (textbook) generalized the univariate method
to many-to-one functions, the same can be done here. As before,
A = {(x, y) : fX,Y (x, y) > 0}. Suppose A0, A1, . . . , Ak form a
partition of A with these properties. The set A0, which may be empty,
satisfies P ((X, Y ) ∈ A0) = 0. The transformation U = g1(X, Y ) and
V = g2(X, Y ) is a one-to-one transformation from Ai onto B for each
i = 1, 2, . . . , k. Then for each i, the inverse function from B to Ai can
be found. Denote the ith inverse by x = h1i(u, v) and y = h2i(u, v).
Let Ji denote the Jacobian computed from the ith inverse. Then
assuming that these Jacobians do not vanish identically on B, we
have
fU,V(u, v) = Σ_{i=1}^{k} fX,Y(h1i(u, v), h2i(u, v))|Ji|.

Example 4.3.3 (Distribution of the ratio of normal variables)


Let X and Y be independent N(0, 1) random variables. Consider
the transformation U = X/Y and V = |Y |. (U and V can be
defined to be any value, say (1,1), if Y = 0 since P (Y = 0) = 0.)
This transformation is not one-to-one, since the points (x, y) and
(−x, −y) are both mapped into the same (u, v) point. Let

A1 = {(x, y) : y > 0}, A2 = {(x, y) : y < 0}, A0 = {(x, y) : y = 0}.

A0, A1 and A2 form a partition of A = R2 and P (A0) = 0. The


inverse transformations from B to A1 and B to A2 are given by

x = h11(u, v) = uv, y = h21(u, v) = v,

and

x = h12(u, v) = −uv, y = h22(u, v) = −v.

The Jacobians from the two inverses are J1 = J2 = v. Using

fX,Y(x, y) = (1/(2π)) e^{−x²/2} e^{−y²/2},

we have

fU,V(u, v) = (1/(2π)) e^{−(uv)²/2} e^{−v²/2} |v| + (1/(2π)) e^{−(−uv)²/2} e^{−(−v)²/2} |v|
           = (v/π) e^{−(u²+1)v²/2},  −∞ < u < ∞, 0 < v < ∞.

From this the marginal pdf of U can be computed to be

fU(u) = ∫_0^{∞} (v/π) e^{−(u²+1)v²/2} dv
      = (1/(2π)) ∫_0^{∞} e^{−(u²+1)z/2} dz   (z = v²)
      = 1/(π(u² + 1)).

So we see that the ratio of two independent standard normal random variables is a Cauchy random variable.
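A simulation sketch (not part of the original notes) comparing the empirical cdf of X/Y with the standard Cauchy cdf F(u) = 1/2 + arctan(u)/π at a few points, assuming numpy is available:

```python
# The ratio of two independent standard normals should follow a standard Cauchy law.
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(10**6) / rng.standard_normal(10**6)
for c in (-2.0, -0.5, 0.0, 1.0, 3.0):
    print(c, (u <= c).mean(), 0.5 + np.arctan(c) / np.pi)
```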

4.4 Hierarchical Models and Mixture Distributions

Example 4.4.1 (Binomial-Poisson hierarchy) Perhaps the most


classic hierarchical model is the following. An insect lays a large
number of eggs, each surviving with probability p. On the average,
how many eggs will survive?
The large number of eggs laid is a random variable, often taken
to be Poisson(λ). Furthermore, if we assume that each egg’s sur-
vival is independent, then we have Bernoulli trials. Therefore, if we let X = number of survivors and Y = number of eggs laid, we have

X|Y ∼ binomial(Y, p),  Y ∼ Poisson(λ),

a hierarchical model.

The advantage of the hierarchy is that a complicated process may


be modeled by a sequence of relatively simple models placed in a
hierarchy.

Example 4.4.2 (Continuation of Example 4.4.1) The random


variable X has the distribution given by

P(X = x) = Σ_{y=0}^{∞} P(X = x, Y = y) = Σ_{y=0}^{∞} P(X = x|Y = y)P(Y = y)

          = Σ_{y=x}^{∞} [(y choose x) p^x (1 − p)^{y−x}] [e^{−λ} λ^y/y!]   (the conditional probability is 0 for y < x)

          = ((λp)^x e^{−λ}/x!) Σ_{y=x}^{∞} ((1 − p)λ)^{y−x}/(y − x)!

          = ((λp)^x e^{−λ}/x!) e^{(1−p)λ}

          = ((λp)^x/x!) e^{−λp},
so X ∼ Poisson(λp). Thus, any marginal inference on X is with
respect to a Poisson(λp) distribution, with Y playing no part at
all. Introducing Y in the hierarchy was mainly to aid our under-
standing of the model. On the average,

EX = λp

eggs will survive.
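A simulation sketch of the binomial-Poisson hierarchy (not part of the original notes), with arbitrary values λ = 10 and p = 0.3; both the mean and the variance of X should be close to λp.

```python
# Simulate the hierarchy Y ~ Poisson(lambda), X | Y ~ binomial(Y, p).
import numpy as np

rng = np.random.default_rng(0)
lam, p = 10.0, 0.3
y = rng.poisson(lam, 10**6)        # number of eggs laid
x = rng.binomial(y, p)             # number of survivors given Y = y
print(x.mean(), lam * p)           # both approximately 3.0
print(x.var())                     # also approximately 3.0, consistent with Poisson(lam * p)
```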



Sometimes, calculations can be greatly simplified by using the fol-


lowing theorem.

Theorem 4.4.1 If X and Y are any two random variables, then

EX = E(E(X|Y )),

provided that the expectations exist.

Proof: Let f (x, y) denote the joint pdf of X and Y . By definition,


we have
EX = ∫∫ x f(x, y) dx dy = ∫ [∫ x f(x|y) dx] fY(y) dy
   = ∫ E(X|y) fY(y) dy = E(E(X|Y)).

Replacing integrals by sums proves the discrete case. □

Using Theorem 4.4.1, we have

EX = E(E(X|Y )) = E(pY ) = pλ

for Example 4.4.2.



Definition 4.4.1 A random variable X is said to have a mixture


distribution if the distribution of X depends on a quantity that
also has a distribution.

Thus, in Example 4.4.1 the Poisson(λp) distribution is a mixture


distribution since it is the result of combining a binomial(Y, p) with
Y ∼ Poisson(λ).

Theorem 4.4.2 (Conditional variance identity) For any two ran-


dom variables X and Y ,

VarX = E(Var(X|Y )) + Var(E(X|Y )),

provided that the expectations exist.

Proof: By definition, we have


VarX = E([X − EX]²) = E([X − E(X|Y) + E(X|Y) − EX]²)

     = E([X − E(X|Y)]²) + E([E(X|Y) − EX]²) + 2E([X − E(X|Y)][E(X|Y) − EX]).


The last term in this expression is equal to 0, however, which can
easily be seen by iterating the expectation:

E([X − E(X|Y)][E(X|Y) − EX]) = E(E{[X − E(X|Y)][E(X|Y) − EX] | Y}).

In the conditional distribution X|Y , X is the random variable. Con-


ditional on Y, E(X|Y) and EX are constants. Thus,

E{[X − E(X|Y)][E(X|Y) − EX] | Y} = (E(X|Y) − E(X|Y))(E(X|Y) − EX) = 0.



Since

E([X − E(X|Y)]²) = E(E{[X − E(X|Y)]²|Y}) = E(Var(X|Y))

and

E([E(X|Y) − EX]²) = Var(E(X|Y)),

Theorem 4.4.2 is proved. □



Example 4.4.3 (Beta-binomial hierarchy) One generalization of


the binomial distribution is to allow the success probability to vary
according to a distribution. A standard model for this situation
is

X|P ∼ binomial(n, P ),

P ∼ beta(α, β).
The mean of X is then

EX = E[E(X|P)] = E[nP] = nα/(α + β).
Since P ∼ beta(α, β),

Var(E(X|P)) = Var(nP) = n²αβ/((α + β)²(α + β + 1)).
Also, since X|P is binomial(n, P ), Var(X|P ) = nP (1 − P ). We
then have
E[Var(X|P)] = nE[P(1 − P)] = n (Γ(α + β)/(Γ(α)Γ(β))) ∫_0^1 p(1 − p) p^{α−1}(1 − p)^{β−1} dp

            = n (Γ(α + β)/(Γ(α)Γ(β))) (Γ(α + 1)Γ(β + 1)/Γ(α + β + 2)) = nαβ/((α + β)(α + β + 1)).
Adding together the two pieces, we get
VarX = nαβ(α + β + n)/((α + β)²(α + β + 1)).
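A simulation sketch of this variance formula (not part of the original notes) for one arbitrary choice of n, α, and β, assuming numpy is available:

```python
# Simulate the beta-binomial hierarchy and compare Var(X) with the formula above.
import numpy as np

rng = np.random.default_rng(0)
n, alpha, beta = 20, 2.0, 3.0
p = rng.beta(alpha, beta, 10**6)
x = rng.binomial(n, p)
formula = n * alpha * beta * (alpha + beta + n) / ((alpha + beta) ** 2 * (alpha + beta + 1))
print(x.var(), formula)     # both approximately 20
```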

4.5 Covariance and Correlation

In earlier sections, we have discussed the absence or presence of a


relationship between two random variables, independence or nonin-
dependence. But if there is a relationship, the relationship may be
strong or weak. In this section, we discuss two numerical measures
of the strength of a relationship between two random variables, the
covariance and correlation.
Throughout this section, we will use the notation EX = µX, EY = µY, VarX = σX², and VarY = σY².

Definition 4.5.1 The covariance of X and Y is the number de-


fined by
Cov(X, Y ) = E((X − µX )(Y − µY )).

Definition 4.5.2 The correlation of X and Y is the number de-


fined by
ρXY = Cov(X, Y)/(σX σY).
The value ρXY is also called the correlation coefficient.

Theorem 4.5.1 For any random variables X and Y ,

Cov(X, Y ) = EXY − µX µY .

Theorem 4.5.2 If X and Y are independent random variables,


then Cov(X, Y ) = 0 and ρXY = 0.

Theorem 4.5.3 If X and Y are any two random variables and


a and b are any two constants, then

Var(aX + bY ) = a2VarX + b2VarY + 2abCov(X, Y ).

If X and Y are independent random variables, then

Var(aX + bY ) = a²VarX + b²VarY.

Theorem 4.5.4 For any random variables X and Y ,

a. −1 ≤ ρXY ≤ 1.

b. |ρXY | = 1 if and only if there exist numbers a 6= 0 and b such


that P (Y = aX + b) = 1. If ρXY = 1, then a > 0, and if
ρXY = −1, then a < 0.

Proof: Consider the function h(t) defined by

h(t) = E((X − µX)t + (Y − µY))²
     = t²σX² + 2tCov(X, Y) + σY².

Since h(t) ≥ 0 and it is a quadratic function, its discriminant satisfies

(2Cov(X, Y))² − 4σX²σY² ≤ 0.

This is equivalent to

−σX σY ≤ Cov(X, Y ) ≤ σX σY .

That is,
−1 ≤ ρXY ≤ 1.

Also, |ρXY | = 1 if and only if the discriminant is equal to 0, that is, if


and only if h(t) has a single root. But since ((X −µX )t+(Y −µY ))2 ≥
0, h(t) = 0 if and only if

P ((X − µX )t + (Y − µY ) = 0) = 1.

That is, P(Y = aX + b) = 1 with a = −t and b = µX t + µY, where t is the root of h(t). Using the quadratic formula, we see that this root is t = −Cov(X, Y)/σX². Thus a = −t has the same sign as ρXY, proving the final assertion. □

Example 4.5.1 (Correlation-I) Let X have a uniform(0,1) dis-


tribution and Z have a uniform(0,0.1) distribution. Suppose X
and Z are independent. Let Y = X + Z and consider the random
vector (X, Y ). The joint pdf of (X, Y ) is

f (x, y) = 10, 0 < x < 1, x < y < x + 0.1

Note f (x, y) can be obtained from the relationship f (x, y) = f (y|x)f (x).
Then

Cov(X, Y) = EXY − (EX)(EY)
          = E[X(X + Z)] − (EX)(E(X + Z))
          = σX² = 1/12.

The variance of Y is σY² = VarX + VarZ = 1/12 + 1/1200. Thus

ρXY = (1/12)/(√(1/12) √(1/12 + 1/1200)) = √(100/101).
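A simulation sketch of Example 4.5.1 (not part of the original notes); the sample correlation should be close to √(100/101) ≈ 0.995.

```python
# Sample correlation of (X, Y) with Y = X + Z, X ~ uniform(0,1), Z ~ uniform(0,0.1).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10**6)
z = rng.uniform(0, 0.1, 10**6)
y = x + z
print(np.corrcoef(x, y)[0, 1], np.sqrt(100 / 101))   # both approximately 0.995
```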
The next example illustrates that there may be a strong relationship
between X and Y , but if the relationship is not linear, the correlation
may be small.

Example 4.5.2 (Correlation-II) Let X ∼ Unif(−1, 1), Z ∼ Unif(0, 0.1), and X and Z be independent. Let Y = X² + Z and consider the random vector (X, Y). Given X = x, Y ∼ Unif(x², x² + 0.1).
The joint pdf of X and Y is
f(x, y) = 5,  −1 < x < 1,  x² < y < x² + 1/10.
Cov(X, Y) = E(X(X² + Z)) − (EX)(E(X² + Z))
          = EX³ + E(XZ) − 0 · E(X² + Z)
          = 0.

Thus, ρXY = Cov(X, Y)/(σX σY) = 0.
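A short simulation of this example (not part of the original notes): despite the strong quadratic relationship, the sample correlation is near 0.

```python
# Y is almost a deterministic function of X, yet the correlation is approximately 0.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10**6)
y = x**2 + rng.uniform(0, 0.1, 10**6)
print(np.corrcoef(x, y)[0, 1])   # approximately 0
```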

Definition 4.5.3 Let −∞ < µX < ∞, −∞ < µY < ∞, 0 < σX ,


0 < σY, and −1 < ρ < 1 be five real numbers. The bivariate normal pdf with means µX and µY, variances σX² and σY², and correlation ρ is the bivariate pdf given by

f(x, y) = (1/(2πσXσY√(1 − ρ²))) exp{ −(1/(2(1 − ρ²))) [((x − µX)/σX)² − 2ρ((x − µX)/σX)((y − µY)/σY) + ((y − µY)/σY)²] }
for −∞ < x < ∞ and −∞ < y < ∞.

The many nice properties of this distribution include these:


a. The marginal distribution of X is N(µX, σX²).

b. The marginal distribution of Y is N(µY, σY²).

c. The correlation between X and Y is ρXY = ρ.

d. For any constants a and b, the distribution of aX + bY is N(aµX + bµY, a²σX² + b²σY² + 2abρσXσY).

Assuming (a) and (b) are true, we will prove (c). Let
s = ((x − µX)/σX)((y − µY)/σY)  and  t = (x − µX)/σX.

Then x = σX t + µX, y = (σY s/t) + µY, and the Jacobian of the transformation is J = σXσY/t. With this change of variables, we obtain

ρXY = ∫_{−∞}^{∞} ∫_{−∞}^{∞} s f(σX t + µX, (σY s/t) + µY) |σXσY/t| ds dt

    = ∫_{−∞}^{∞} ∫_{−∞}^{∞} s (2π√(1 − ρ²) |t|)^{−1} exp{ −(1/(2(1 − ρ²))) (t² − 2ρs + (s/t)²) } ds dt

    = ∫_{−∞}^{∞} (1/√(2π)) exp(−t²/2) [ ∫_{−∞}^{∞} (s/(√(2π) √((1 − ρ²)t²))) exp{ −(s − ρt²)²/(2(1 − ρ²)t²) } ds ] dt.

The inner integral is ES, where S is a normal random variable with ES = ρt² and VarS = (1 − ρ²)t². Thus,

ρXY = ∫_{−∞}^{∞} (ρt²/√(2π)) exp{−t²/2} dt = ρ.
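A simulation sketch (not part of the original notes) checking properties (a)-(d) for one arbitrary choice of parameters, using numpy's multivariate normal generator:

```python
# Sample from a bivariate normal and check the marginal moments, the correlation,
# and the variance of a linear combination aX + bY.
import numpy as np

rng = np.random.default_rng(0)
muX, muY, sX, sY, rho = 1.0, -2.0, 2.0, 0.5, 0.7
cov = [[sX**2, rho * sX * sY], [rho * sX * sY, sY**2]]
xy = rng.multivariate_normal([muX, muY], cov, size=10**6)
x, y = xy[:, 0], xy[:, 1]
print(x.mean(), x.var())              # approximately muX and sX^2
print(y.mean(), y.var())              # approximately muY and sY^2
print(np.corrcoef(x, y)[0, 1])        # approximately rho
a, b = 3.0, -1.0
w = a * x + b * y
print(w.var(), a**2 * sX**2 + b**2 * sY**2 + 2 * a * b * rho * sX * sY)
```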

4.6 Multivariate Distributions

The random vector X = (X1, . . . , Xn) has a sample space that is


a subset of Rⁿ. If X is a discrete random vector, then the joint pmf of X is the function defined by f(x) = f(x1, . . . , xn) = P(X1 = x1, . . . , Xn = xn) for each (x1, . . . , xn) ∈ Rⁿ. Then for any A ⊂ Rⁿ,
P(X ∈ A) = Σ_{x∈A} f(x).
If X is a continuous random vector, the joint pdf of X is a function
f (x1, . . . , xn) that satisfies
P(X ∈ A) = ∫ · · · ∫_A f(x) dx = ∫ · · · ∫_A f(x1, . . . , xn) dx1 · · · dxn.

Let g(x) = g(x1, . . . , xn) be a real-valued function defined on the


sample space of X. Then g(X) is a random variable and the expected
value of g(X) is
Eg(X) = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} g(x) f(x) dx

and

Eg(X) = Σ_{x∈Rⁿ} g(x) f(x)
in the continuous and discrete cases, respectively.
The marginal distribution of (X1, . . . , Xk), the first k coordinates

of (X1, . . . , Xn), is given by the pdf or pmf


f(x1, . . . , xk) = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f(x1, . . . , xn) dxk+1 · · · dxn

or

f(x1, . . . , xk) = Σ_{(xk+1,...,xn)∈R^{n−k}} f(x1, . . . , xn)

for every (x1, . . . , xk) ∈ R^k.


If f (x1, . . . , xk ) > 0, the conditional pdf or pmf of (Xk+1, . . . , Xn)
given X1 = x1, . . . , Xk = xk is the function of (xk+1, . . . , xn) defined
by
f(xk+1, . . . , xn | x1, . . . , xk) = f(x1, . . . , xn)/f(x1, . . . , xk).

Example 4.6.1 (Multivariate pdfs) Let n = 4 and

f(x1, x2, x3, x4) = (3/4)(x1² + x2² + x3² + x4²)  if 0 < xi < 1, i = 1, 2, 3, 4,
f(x1, x2, x3, x4) = 0                              otherwise.

The joint pdf can be used to compute probabilities such as


P(X1 < 1/2, X2 < 3/4, X4 > 1/2)

= ∫_{1/2}^{1} ∫_{0}^{1} ∫_{0}^{3/4} ∫_{0}^{1/2} (3/4)(x1² + x2² + x3² + x4²) dx1 dx2 dx3 dx4 = 171/1024.
The marginal pdf of (X1, X2) is
f(x1, x2) = ∫_0^1 ∫_0^1 (3/4)(x1² + x2² + x3² + x4²) dx3 dx4 = (3/4)(x1² + x2²) + 1/2
for 0 < x1 < 1 and 0 < x2 < 1.
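A numerical check of the four-dimensional probability above (not part of the original notes), assuming scipy is available; since the limits are constants, the order of integration does not matter.

```python
# Integrate the joint pdf over the box (0, 1/2) x (0, 3/4) x (0, 1) x (1/2, 1).
from scipy.integrate import nquad

f = lambda x1, x2, x3, x4: 0.75 * (x1**2 + x2**2 + x3**2 + x4**2)
prob, err = nquad(f, [[0, 0.5], [0, 0.75], [0, 1], [0.5, 1]])
print(prob, 171 / 1024)   # both approximately 0.167
```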

Definition 4.6.1 Let n and m be positive integers and let p1, . . . , pn


be numbers satisfying 0 ≤ pi ≤ 1, i = 1, . . . , n, and Σ_{i=1}^{n} pi = 1. Then the random vector (X1, . . . , Xn) has a multinomial distribution with m trials and cell probabilities p1, . . . , pn if the joint pmf of (X1, . . . , Xn) is

f(x1, . . . , xn) = (m!/(x1! · · · xn!)) p1^{x1} · · · pn^{xn} = m! Π_{i=1}^{n} pi^{xi}/xi!

on the set of (x1, . . . , xn) such that each xi is a nonnegative integer and Σ_{i=1}^{n} xi = m.

Example 4.6.2 (Multivariate pmf ) Consider tossing a six-sided


die 10 times. Suppose the die is unbalanced so that the probability
of observing an i is i/21. Now consider the vector (X1, . . . , X6),
where Xi counts the number of times i comes up in the 10 tosses.
Then (X1, . . . , X6) has a multinomial distribution with m = 10 and cell probabilities p1 = 1/21, . . . , p6 = 6/21. For example, the probability of the vector (0, 0, 1, 2, 3, 4) is

f(0, 0, 1, 2, 3, 4) = (10!/(0! 0! 1! 2! 3! 4!)) (1/21)^0 (2/21)^0 (3/21)^1 (4/21)^2 (5/21)^3 (6/21)^4 = 0.0059.
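This probability can also be checked against scipy's multinomial distribution (a check, not part of the original notes), assuming scipy is available:

```python
# Evaluate the multinomial pmf for the unbalanced die example.
import numpy as np
from scipy.stats import multinomial

p = np.arange(1, 7) / 21.0                               # cell probabilities 1/21, ..., 6/21
print(multinomial.pmf([0, 0, 1, 2, 3, 4], n=10, p=p))    # approximately 0.0059
```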
The factor m!/(x1! · · · xn!) is called a multinomial coefficient. It is the num-
ber of ways that m objects can be divided into n groups with x1 in
the first group, x2 in the second group, . . ., and xn in the nth group.

Theorem 4.6.1 (Multinomial Theorem) Let m and n be positive


integers. Let A be the set of vectors x = (x1, . . . , xn) such that each xi is a nonnegative integer and Σ_{i=1}^{n} xi = m. Then, for any real numbers p1, . . . , pn,

(p1 + · · · + pn)^m = Σ_{x∈A} (m!/(x1! · · · xn!)) p1^{x1} · · · pn^{xn}.

Definition 4.6.2 Let X1, . . . , Xn be random vectors with joint pdf


or pmf f (x1, . . . , xn). Let fX i (xi) denote the marginal pdf or pmf
of X i. Then X1, . . . , Xn are called mutually independent random
vectors if, for every (x1, . . . , xn),
f(x1, . . . , xn) = fX1(x1) · · · fXn(xn) = Π_{i=1}^{n} fXi(xi).

If the Xi’s are all one dimensional, then X1, . . . , Xn are called
mutually independent random variables.

Mutually independent random variables have many nice properties.


The proofs of the following theorems are analogous to the proofs of
their counterparts in Sections 4.2 and 4.3.

Theorem 4.6.2 (Generalization of Theorem 4.2.1) Let X 1, . . . , X n


be mutually independent random variables. Let g1, . . . , gn be real-
valued functions such that gi(xi) is a function only of xi, i =
1, . . . , n. Then

E(g1(X1) · · · gn(Xn)) = (Eg1(X1)) · · · (Egn(Xn)).



Theorem 4.6.3 (Generalization of Theorem 4.2.2) Let X 1, . . . , X n


be mutually independent random variables with mgfs MX1 (t), . . . , MXn (t).
Let Z = X1 + · · · + Xn. Then the mgf of Z is

MZ (t) = MX1 (t) · · · MXn (t).

In particular, if X1, . . . , Xn all have the same distribution with


mgf MX (t), then
MZ (t) = (MX (t))n.

Example 4.6.3 (Mgf of a sum of gamma variables) Suppose X1, . . . , Xn


are mutually independent random variables, and the distribution
of Xi is gamma(αi, β). Thus, if Z = X1 + . . . + Xn, the mgf of Z
is

MZ(t) = MX1(t) · · · MXn(t) = (1 − βt)^{−α1} · · · (1 − βt)^{−αn} = (1 − βt)^{−(α1+···+αn)}.

This is the mgf of a gamma(α1 + · · · + αn, β) distribution. Thus,


the sum of independent gamma random variables that have a
common scale parameter β also has a gamma distribution.

Example 4.6.4 Let X1, . . . , Xn be mutually independent random


variables with Xi ∼ N (µi, σi2). Let a1, . . . , an and b1, . . . , bn be
fixed constants. Then
Z = Σ_{i=1}^{n} (aiXi + bi) ∼ N(Σ_{i=1}^{n} (aiµi + bi), Σ_{i=1}^{n} ai²σi²).

Theorem 4.6.4 (Generalization of Lemma 4.2.1) Let X 1, . . . , X n


be random vectors. Then X 1, . . . , X n are mutually independent
random vectors if and only if there exist functions gi(xi), i =
1, . . . , n, such that the joint pdf or pmf of (X 1, . . . , X n) can be
written as
f (x1, . . . , xn) = g1(x1) · · · gn(xn).

Theorem 4.6.5 (Generalization of Theorem 4.3.2) Let X 1, . . . , X n


be random vectors. Let gi(xi) be a function only of xi, i =
1, . . . , n. Then the random vectors Ui = gi(X i), i = 1, . . . , n,
are mutually independent.

Let (X1, . . . , Xn) be a random vector with pdf fX (x1, . . . , xn). Let
A = {x : fX (x) > 0}. Consider a new random vector (U1, . . . , Un),
defined by U1 = g1(X1, . . . , Xn), . . ., Un = gn(X1, . . . , Xn). Suppose
that A0, A1, . . . , Ak form a partition of A with these properties. The
set A0, which may be empty, satisfies P ((X1, . . . , Xn) ∈ A0) = 0.
The transformation (U1, . . . , Un) = (g1(X), . . . , gn(X)) is a one-to-
one transformation from Ai onto B for each i = 1, 2, . . . , k. Then for
each i, the inverse functions from B to Ai can be found. Denote the
ith inverse by x1 = h1i(u1, . . . , un), . . . , xn = hni(u1, . . . , un). Let
Ji denote the Jacobian computed from the ith inverse. That is,
Ji = | ∂h1i(u)/∂u1  ∂h1i(u)/∂u2  · · ·  ∂h1i(u)/∂un |
     | ∂h2i(u)/∂u1  ∂h2i(u)/∂u2  · · ·  ∂h2i(u)/∂un |
     |      ···           ···     · · ·       ···    |
     | ∂hni(u)/∂u1  ∂hni(u)/∂u2  · · ·  ∂hni(u)/∂un |,
the determinant of an n × n matrix. Assuming that these Jacobians
do not vanish identically on B, we have the following representation

of the joint pdf, fU (u1, . . . , un), for u ∈ B:


fU(u1, . . . , un) = Σ_{i=1}^{k} fX(h1i(u1, . . . , un), . . . , hni(u1, . . . , un))|Ji|.
