MTH 514 Notes CH 4,5,6,10
For pairs of random variables X and Y the PDF is called bivariate, and for a continuous bivariate pair (X, Y) the joint probability density function (joint PDF) is
\[
f_{XY}(x,y) = \frac{\partial^2 F_{XY}(x,y)}{\partial x\,\partial y},
\]
which is related to the joint CDF by
\[
F_{XY}(x,y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{XY}(u,v)\,du\,dv.
\]
The joint CDF has properties analogous to the single-variable case. It represents the cumulative probability that the random pair (X, Y) lies below and to the left of (x, y), as in the figure above. Here is a list of properties of the CDF, PDF and PMF. This is a long list, but the properties should look familiar. Most of them are either direct translations of properties from the single-variable case, or generalizations of the properties we learned for the probabilities of events in general. After the list, we shall illustrate bivariate RVs through examples.
4.1.1 The Joint CDF
Let S be the sample space of a random experiment. Let X and Y be two RVs defined on S. The ordered pair (X, Y) is called a bivariate RV. (X, Y) is either a continuous, discrete or mixed RV depending on the characters of X and Y.

A) The joint cumulative distribution function is defined as
\[
F_{XY}(x,y) = P[X \le x,\ Y \le y].
\]
Here the event $\{X \le x, Y \le y\}$ is the intersection of the events $A = \{X \le x\}$ and $B = \{Y \le y\}$, so
\[
F_{XY}(x,y) = P[A \cap B],
\]
where $F_X(x) = P[A]$ and $F_Y(y) = P[B]$. Note that X and Y are statistically independent if $F_{XY}(x,y) = F_X(x)F_Y(y)$ for every x and y. The portion of the sample space covered by the joint CDF is pictured in the figure above.
B) Properties
a) $0 \le F_{XY}(x,y) \le 1$

b) If $x_1 \le x_2$ and $y_1 \le y_2$ then ($F_{XY}(x,y)$ is non-decreasing):
\[
F_{XY}(x_1,y_1) \le F_{XY}(x_2,y_1) \le F_{XY}(x_2,y_2), \qquad
F_{XY}(x_1,y_1) \le F_{XY}(x_1,y_2) \le F_{XY}(x_2,y_2)
\]

c) $F_{XY}(\infty,\infty) = 1$ \quad ($P[S] = 1$)

d) $F_{XY}(x,-\infty) = F_{XY}(-\infty,y) = 0$ \quad ($P[\emptyset] = 0$)
e) (Right continuity)
\[
\lim_{x \to a^+} F_{XY}(x,y) = F_{XY}(a,y), \qquad \lim_{y \to b^+} F_{XY}(x,y) = F_{XY}(x,b)
\]
f) (Rectangular domains) If $x_1 \le x_2$ and $y_1 \le y_2$ then:
\[
P[x_1 < X \le x_2,\ Y \le y] = F_{XY}(x_2,y) - F_{XY}(x_1,y)
\]
\[
P[X \le x,\ y_1 < Y \le y_2] = F_{XY}(x,y_2) - F_{XY}(x,y_1)
\]
\[
P[x_1 < X \le x_2,\ y_1 < Y \le y_2] = F_{XY}(x_2,y_2) - F_{XY}(x_1,y_2) - F_{XY}(x_2,y_1) + F_{XY}(x_1,y_1)
\]
C) Marginal distribution functions (one of the variables is completely unrestricted):
\[
\lim_{y \to \infty} F_{XY}(x,y) = F_X(x) \qquad \text{and} \qquad \lim_{x \to \infty} F_{XY}(x,y) = F_Y(y)
\]
4.1.2 Joint Probability Density Functions
For a continuous bivariate pair (X, Y) the joint probability density function (joint PDF) is
\[
f_{XY}(x,y) = \frac{\partial^2 F_{XY}(x,y)}{\partial x\,\partial y},
\]
which is related to the joint CDF by
\[
F_{XY}(x,y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{XY}(u,v)\,du\,dv.
\]
Properties
a) $f_{XY}(x,y) \ge 0$

b) $\displaystyle \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(u,v)\,du\,dv = 1$

c) $\displaystyle P[(X,Y) \in A] = \iint_{R_A} f_{XY}(u,v)\,du\,dv$

d) $\displaystyle P[a < X \le b,\ c < Y \le d] = \int_a^b\!\int_c^d f_{XY}(x,y)\,dy\,dx$
Marginal probability density

The marginal PDFs are
\[
f_X(x) = \frac{dF_X(x)}{dx} = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy
\]
and
\[
f_Y(y) = \frac{dF_Y(y)}{dy} = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dx.
\]
Note that if X and Y are statistically independent then $f_{XY}(x,y) = f_X(x)f_Y(y)$ for every x and y.
Conditional probability density

\[
f_{Y|X}(y|x) = \frac{f_{XY}(x,y)}{f_X(x)}, \quad f_X(x) > 0; \qquad
f_{X|Y}(x|y) = \frac{f_{XY}(x,y)}{f_Y(y)}, \quad f_Y(y) > 0.
\]
Note that if X and Y are statistically independent then $f_{X|Y}(x|y) = f_X(x)$ and $f_{Y|X}(y|x) = f_Y(y)$.

In the above we have assumed continuous RVs. The same structure appears for discrete RVs, although the analog of the cumulative distribution function is not often used.
4.1.3 Joint Probability Mass Function
Let (X, Y) be a discrete bivariate RV taking values in the discrete set $\{(x_i, y_j),\ i = 1,\dots,n_1,\ j = 1,\dots,n_2\}$. Let
\[
p_{XY}(x_i, y_j) = P[X = x_i,\ Y = y_j];
\]
then $p_{XY}(x_i, y_j)$ is called the joint probability mass function of (X, Y).
Properties

a) $p_{XY}(x_i, y_j) \ge 0$

b) $\displaystyle \sum_{i=1}^{n_1}\sum_{j=1}^{n_2} p_{XY}(x_i, y_j) = 1$

c) $\displaystyle P[(X,Y) \in A] = \sum_{(x_i, y_j) \in R_A} p_{XY}(x_i, y_j)$
Marginal Probability Mass Function

The idea is to remove all restriction on one variable:
\[
p_X(x_i) = \sum_{j=1}^{n_2} p_{XY}(x_i, y_j)
\]
and
\[
p_Y(y_j) = \sum_{i=1}^{n_1} p_{XY}(x_i, y_j).
\]
X and Y are independent if $p_{XY}(x_i, y_j) = p_X(x_i)\,p_Y(y_j)$ for all i, j.
Conditional Probability Mass Functions

\[
p_{X|Y}(x_i|y_j) = \frac{p_{XY}(x_i, y_j)}{p_Y(y_j)}, \quad p_Y(y_j) > 0; \qquad
p_{Y|X}(y_j|x_i) = \frac{p_{XY}(x_i, y_j)}{p_X(x_i)}, \quad p_X(x_i) > 0.
\]
X and Y are independent if $p_{X|Y}(x_i|y_j) = p_X(x_i)$ and $p_{Y|X}(y_j|x_i) = p_Y(y_j)$.

In the following we explore some of these properties through examples.
4.2 Sample Problems
Example 4.1 PDF Properties
Consider the joint PDF given by:
\[
f_{XY}(x,y) = \begin{cases} k(x+y) & 0 < x < 2,\ 0 < y < 2, \\ 0 & \text{otherwise.} \end{cases}
\]
Find the value of k, the marginal PDFs, and determine whether X and Y are independent.

Sol: Integrate the PDF over its domain and require the integral to equal 1:
\[
\iint f_{XY}(x,y)\,dx\,dy = \int_0^2\!\!\int_0^2 k(x+y)\,dx\,dy
= k\int_0^2 \left(\frac{x^2}{2} + xy\right)\Big|_0^2\,dy
= k\int_0^2 (2 + 2y)\,dy = 8k.
\]
Since this integral must be 1, k = 1/8.
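As a quick numerical sanity check (not part of the original notes), the following Python sketch approximates the normalization integral on a grid; the grid size N is an arbitrary choice.

```python
import numpy as np

# Numerical check that f(x, y) = (1/8)(x + y) integrates to 1 over (0,2) x (0,2).
N = 400                                   # grid resolution (arbitrary choice)
x = np.linspace(0.0, 2.0, N)
y = np.linspace(0.0, 2.0, N)
X, Y = np.meshgrid(x, y)
f = (X + Y) / 8.0                         # candidate PDF with k = 1/8

# Integrate with the trapezoidal rule in y, then in x.
total = np.trapz(np.trapz(f, y, axis=0), x)
print(total)                              # approximately 1.0
```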
Figure 4.2: A sketch of the Bivariate PDF.
The marginal distribution of X is
\[
f_X(x) = \int f_{XY}(x,y)\,dy = \int_0^2 \tfrac{1}{8}(x+y)\,dy = \tfrac{1}{4}(x+1),
\]
where this last result holds for $0 < x < 2$. So
\[
f_X(x) = \begin{cases} \tfrac{1}{4}(x+1) & 0 < x < 2, \\ 0 & \text{otherwise.} \end{cases}
\]
The marginal for Y has the same form, with x replaced by y. Note that:
\[
f_X(x)f_Y(y) = \begin{cases} \tfrac{1}{16}(x+1)(y+1) & 0 < x < 2,\ 0 < y < 2, \\ 0 & \text{otherwise.} \end{cases}
\]
This is not the same as $f_{XY}(x,y)$, and so the two RVs are not independent.
In the following example we integrate the PDF to get the CDF.
Example 4.2 PDF to CDF
Use the PDF above to find the corresponding CDF.
Sol:
The PDF is defined to be non-zero in a rectangular region, and zero everywhere else. To define the CDF we shall have to be a bit careful about the "everywhere else" region. In the figure to the right we see that there are 5 different areas we shall have to distinguish.
Figure 4.3: A sketch of the areas underlying the CDF.
Given the PDF
\[
f_{XY}(x,y) = \begin{cases} \tfrac{1}{8}(x+y) & 0 < x < 2,\ 0 < y < 2, \\ 0 & \text{otherwise,} \end{cases}
\]
the region where either $x < 0$ or $y < 0$ (pink area in the figure) is such that both the PDF and CDF must be zero. In the region $0 < x < 2$, $0 < y < 2$, the PDF and CDF are both non-zero and we will have to integrate the PDF to get the CDF. In the region $x > 2$, $y > 2$ the PDF is zero but the CDF must be 1, since all random pairs (X, Y) have both X and Y less than 2. Finally, the two regions $x > 2,\ 0 < y < 2$ and $0 < x < 2,\ y > 2$ correspond to regions where the PDF is zero, but where the CDF is non-zero and depends only on a single variable. So here is the CDF:
\[
F_{XY}(x,y) = \begin{cases}
0 & x \le 0 \text{ or } y \le 0, \\[4pt]
\displaystyle\int_0^x\!\!\int_0^y \tfrac{1}{8}(s+t)\,ds\,dt = \tfrac{1}{16}(xy^2 + x^2 y) & 0 < x \le 2,\ 0 < y \le 2, \\[4pt]
\tfrac{1}{16}(2y^2 + 2^2 y) = y^2/8 + y/4 & x > 2,\ 0 < y \le 2, \\[4pt]
\tfrac{1}{16}(2x^2 + 2^2 x) = x^2/8 + x/4 & y > 2,\ 0 < x \le 2, \\[4pt]
1 & x > 2,\ y > 2.
\end{cases}
\]
Figure 4.4: A sketch of the Bivariate CDF.
Note the continuity of the CDF: there are no jumps when going from one region to another. In the figure to the right we see some general features of bivariate CDFs. When both x and y are arbitrarily large, the CDF must approach 1. This is the analog of the feature of one-dimensional CDFs that there must be a horizontal asymptote at height 1 as $x \to \infty$. Notice that there are no local minima; the CDF cannot be decreasing anywhere. Finally, notice that as either x or y goes to $-\infty$, the CDF must go to zero.
Example 4.3 Bivariate CDF. LG4.9
Let X and Y denote the amplitudes of noise signals at two antennas. The random vector (X, Y) has the joint PDF:
\[
f(x,y) = a x\,e^{-a x^2/2}\; b y\,e^{-b y^2/2}, \qquad x > 0,\ y > 0,\quad a > 0,\ b > 0.
\]
a) Find the joint CDF.
b) Find P[X > Y].
c) Find the marginal PDFs of X and Y.
Sol: The CDF is zero except in the region $x > 0,\ y > 0$, where

a)
\[
F_{XY}(x,y) = \int_0^x\!\!\int_0^y a u\,e^{-a u^2/2}\; b v\,e^{-b v^2/2}\,dv\,du
= \left(1 - e^{-a x^2/2}\right)\left(1 - e^{-b y^2/2}\right)
\]

b)
\[
P[X > Y] = \int_0^{\infty}\!\!\int_0^{x} a x\,e^{-a x^2/2}\; b y\,e^{-b y^2/2}\,dy\,dx
= \int_0^{\infty} a x\,e^{-a x^2/2}\left(1 - e^{-b x^2/2}\right)dx
= 1 - \frac{a}{a+b}
\]

c) For $x > 0$, $F_X(x) = \lim_{y \to \infty} F_{XY}(x,y) = 1 - e^{-a x^2/2}$. The PDF is the derivative of this, viz. $f_X(x) = a x\,e^{-a x^2/2}$. Similarly $f_Y(y) = b y\,e^{-b y^2/2}$.
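A quick Monte Carlo sketch of part b), not from the notes: the parameter values a, b and the seed are arbitrary choices, and the samples are drawn by inverting the marginal CDFs found in part c).

```python
import numpy as np

# Monte Carlo check of P[X > Y] = 1 - a/(a+b) = b/(a+b) for the antenna-noise PDF
# f(x, y) = a x e^{-a x^2/2} * b y e^{-b y^2/2},  x, y > 0.
rng = np.random.default_rng(0)
a, b = 2.0, 3.0                       # arbitrary positive parameters for the test
n = 1_000_000

u, v = rng.random(n), rng.random(n)
# Inverse-CDF sampling: F_X(x) = 1 - exp(-a x^2/2)  =>  x = sqrt(-2 ln(1-u)/a)
X = np.sqrt(-2.0 * np.log1p(-u) / a)
Y = np.sqrt(-2.0 * np.log1p(-v) / b)

print("simulated:", np.mean(X > Y))
print("formula  :", b / (a + b))
```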
Example 4.4 Marginals and integrals. LG4.11
The random vector (X, Y) is uniformly distributed inside the regions shown in the figure (and zero elsewhere).
1) Find the value of k in each case.
2) Find the marginal PDFs of X and Y in each case.

Figure 4.5: Three areas

Sol:
1) Since the random vector is uniformly distributed, the probability of the sample space is the constant integrated over the respective area. Thus the constant must be one divided by the area. The unit circle has area $\pi$, and thus $k = 1/\pi$ for the circle. Similarly the constant is $k = 1/2$ and $k = 2$ for the square and triangle respectively.
2) To find the marginals we just integrate the PDF over one of the variables.

a) For the circle the PDF is
\[
f_{XY}(x,y) = \begin{cases} 1/\pi & x^2 + y^2 \le 1, \\ 0 & \text{otherwise.} \end{cases}
\]
Thus, for $|x| < 1$,
\[
f_X(x) = \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} \frac{1}{\pi}\,dy = \frac{2}{\pi}\sqrt{1-x^2}.
\]
Similarly $f_Y(y) = \dfrac{2}{\pi}\sqrt{1-y^2}$.
The joint PDF is
\[
f_{X,Y}(x,y) = \left(\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\right)\left(\frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}\right) = \frac{1}{2\pi}\,e^{-(x^2+y^2)/2}
\]
from the definition of the PDF.
Now we wish to integrate the PDF over the unit disk. To do this we change from rectangular to polar coordinates. Thus we let $x = r\cos\theta$, $y = r\sin\theta$ so $x^2 + y^2 = r^2$, and the area element becomes $dx\,dy \to r\,dr\,d\theta$. The region R is then just the polar rectangle $R = \{(r,\theta) \mid 0 \le \theta < 2\pi,\ r \le 1\}$. The probability is then:
\[
\iint_R f_{X,Y}(x,y)\,dx\,dy = \int_0^{2\pi}\!\!\int_0^1 \frac{1}{2\pi}\,e^{-r^2/2}\,r\,dr\,d\theta = \int_0^1 e^{-r^2/2}\,r\,dr.
\]
The last integral is easily seen to be:
\[
\int_0^1 -\,d\!\left(e^{-r^2/2}\right) = -e^{-r^2/2}\Big|_0^1 = 1 - e^{-1/2} \approx 0.3935.
\]
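A short simulation sketch of this result (my addition, not the notes'); the seed and sample size are arbitrary.

```python
import numpy as np

# Check P[X^2 + Y^2 <= 1] = 1 - e^{-1/2} for independent standard normals X, Y.
rng = np.random.default_rng(1)
n = 1_000_000
X = rng.standard_normal(n)
Y = rng.standard_normal(n)

print("simulated:", np.mean(X**2 + Y**2 <= 1.0))
print("exact    :", 1.0 - np.exp(-0.5))    # about 0.3935
```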
Example 4.7 PMFs and Tables. LG4.5
a) Find the marginal PMFs for the pairs of random variables with the indicated joint PMFs.
b) Find the probabilities of the events $A = \{X \le 0\}$, $B = \{X \le Y\}$ and $C = \{X = Y\}$.
c) Are X and Y independent?

Figure 4.7: Three PMFs arranged in tabular form.

Sol:
a) Find the marginals by adding the columns and rows.

Figure 4.8: The three PMFs with their marginal distributions. Note that the marginals are the same even though the PMFs are different.

b) By inspection, $P[A] = \tfrac{2}{3}$ in all cases. $P[B]$ is the sum of the probabilities on and below the diagonal from top left to bottom right. This gives $\tfrac{5}{6}$, $\tfrac{2}{3}$, and $\tfrac{2}{3}$ respectively. Now $P[C] = P[(X,Y) \in \{(-1,-1), (0,0), (1,1)\}]$. These are respectively $\tfrac{2}{3}$, $\tfrac{1}{3}$, and 1.

c) X and Y are independent iff the PMF is the product of the marginals. This is the case for (b) but not for the other two.
4.3 Functions of Two Random Variables
Just as we wanted to find the distribution of a derived random variable Y = g(X) when we were considering the single random variable X, so we shall often want to find the distribution of a derived random variable W = g(X, Y) when we have a pair of random variables (X, Y).

If X and Y are discrete with joint PMF $p(x_i, y_j)$ then the PMF of W is:
\[
P[W = k] = \sum_{g(x_i, y_j) = k} p(x_i, y_j).
\]
If (X, Y) is continuous with PDF $f_{X,Y}(x,y)$ then we first find the CDF of W as
\[
F_W(w) = \iint_{g(x,y) \le w} f_{X,Y}(x,y)\,dx\,dy
\]
and then differentiate this to get the PDF.
Example 4.8 PMF Uniform RDY4.6.4
Let X and Y be discrete RVs with joint PMF
\[
P_{X,Y}(x,y) = \begin{cases} 0.01 & x = 1, 2, \dots, 10,\ \ y = 1, 2, \dots, 10, \\ 0 & \text{otherwise.} \end{cases}
\]
Find the PMF of $W = \min(X, Y)$.

Sol: The sample space for W is just $\{1, 2, \dots, 10\}$ since the minimum of X and Y can be anything from 1 to 10. Now $P[W \le w] = F_W(w) = 1 - P[W > w]$.

Figure 4.9: $W = \min(X, Y)$

From the figure we see that
\[
P[W > w] = 0.01\,(10 - w)^2,
\]
since the number of elements in the shaded square is $(10-w)^2$, so
\[
F_W(w) = \begin{cases} 0 & w \le 0, \\ 1 - 0.01\,(10 - w)^2 & w = 1, 2, \dots, 10, \\ 1 & w = 11, 12, \dots \end{cases}
\]
Thus the PMF, for $w = 1, 2, \dots, 10$, is
\[
P_W(w) = F_W(w) - F_W(w-1) = 0.01\big((10 - (w-1))^2 - (10-w)^2\big) = 0.01(21 - 2w).
\]
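Here is a small simulation sketch of this PMF (not part of the original notes); the seed and sample size are arbitrary.

```python
import numpy as np

# Check the PMF of W = min(X, Y) for X, Y independent and uniform on {1, ..., 10}:
# the claim is P[W = w] = 0.01 * (21 - 2w).
rng = np.random.default_rng(2)
n = 1_000_000
X = rng.integers(1, 11, size=n)
Y = rng.integers(1, 11, size=n)
W = np.minimum(X, Y)

for w in range(1, 11):
    print(w, round(np.mean(W == w), 4), 0.01 * (21 - 2 * w))
```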
Here is a common problem, phrased as an exercise: finding the PDF of a sum of two random variables.
Example 4.9 Sum of RVs. Let Z = X + Y be the sum of two random variables that have joint PDF $f_{X,Y}(x,y)$. Find $f_Z(z)$ and $F_Z(z)$.

Sol: The region $Z \le z$ is sketched below. We have $P[Z \le z] = P[X + Y \le z]$.

Figure 4.10: Adding two random variables.

The region may be covered by vertical strips where y runs from $-\infty$ to $z - x$ and x is unrestricted. This gives us the double integral
\[
F_Z(z) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{z-x} f_{X,Y}(x,y)\,dy.
\]
Differentiating this with respect to z gives us the PDF
\[
f_Z(z) = \frac{dF_Z(z)}{dz} = \int_{-\infty}^{\infty} f_{X,Y}(x,\,z - x)\,dx.
\]
This is a form of convolution. Note that in the specific case that X and Y are independent and $f_{X,Y}(x,y) = f_X(x)f_Y(y)$ this is:
\[
f_Z(z) = \frac{dF_Z(z)}{dz} = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx.
\]
Thus the distribution of the sum of two independent random variables is the convolution of the individual PDFs!
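The convolution can also be seen numerically. The sketch below (my addition, with an arbitrary step size) convolves two Uniform(0,1) PDFs and compares the result with the exact triangular density of their sum.

```python
import numpy as np

# The PDF of Z = X + Y for independent X, Y is the convolution of the marginal PDFs.
# Sketch: two Uniform(0,1) PDFs convolved numerically give the triangular PDF on (0,2).
dx = 0.001
x = np.arange(0.0, 1.0, dx)
fX = np.ones_like(x)                 # Uniform(0,1) PDF
fY = np.ones_like(x)

fZ = np.convolve(fX, fY) * dx        # discrete approximation of the convolution integral

# Compare to the exact triangular density: f_Z(z) = z on (0,1), 2 - z on (1,2).
for zq in (0.25, 0.75, 1.25, 1.75):
    i = int(round(zq / dx))
    exact = zq if zq <= 1 else 2.0 - zq
    print(zq, round(fZ[i], 3), exact)
```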
Example 4.10 Sum Example
Suppose we have a pair of identical systems, one of which we use as a backup. The first system is turned on and runs until it fails. Then the second system is turned on until it fails. If the lifetimes of both systems are identically distributed, independent random variables with exponential PDF
\[
f_T(t) = \begin{cases} \lambda e^{-\lambda t} & t \ge 0, \\ 0 & \text{otherwise,} \end{cases}
\]
what is the lifetime of the combined system?

Figure 4.11: A backup system.

Sol:
The lifetime of the system is the sum of two random variables, $Z = T_1 + T_2$, with the given PDF. For $z \le 0$ the PDF is clearly zero. Otherwise we have, from the previous problem:
\[
f_Z(z) = \int f_T(w)\,f_T(z-w)\,dw
= \int_0^z \lambda^2 e^{-\lambda w} e^{-\lambda(z-w)}\,dw
= \lambda^2 e^{-\lambda z} \int_0^z 1\,dw
= \lambda^2 z\,e^{-\lambda z}.
\]
Notice this is the Erlang PDF for m = 2.

The above example is a special case of a more general result: the sum of m independent identically distributed exponential($\lambda$) random variables is an Erlang RV with parameters $\lambda$ and m.
Another common function of two random variables is the maximum function. Let us find its PDF as an example.

Example 4.11 Maximum Function
If X and Y are random variables with PDF $f_{X,Y}(x,y)$, what is the PDF of the random variable $Z = \max(X, Y)$?

Sol:

Figure 4.12: $\max(X,Y) \le z$

The region $\max(X, Y) \le z$ is sketched in the figure. The CDF of Z is then
\[
P[Z \le z] = F_Z(z) = \int_{-\infty}^{z}\!\!\int_{-\infty}^{z} f_{X,Y}(x,y)\,dx\,dy = F_{X,Y}(z, z).
\]
In the special case that X and Y are independent, so $f_{X,Y}(x,y) = f_X(x)f_Y(y)$, this is
\[
F_Z(z) = \int_{-\infty}^{z}\!\!\int_{-\infty}^{z} f_X(x)f_Y(y)\,dx\,dy = F_X(z)\,F_Y(z).
\]
To get the PDF we differentiate with respect to z:
\[
f_Z(z) = f_X(z)F_Y(z) + F_X(z)f_Y(z).
\]
4.4 Expectation
As in the single-variable case, the expected value of any function of a pair of random variables is the sum/integral of the function times the PMF/PDF over the entire sample space. Thus if W = g(X, Y) then

Discrete RVs: \quad $\displaystyle E[W] = \sum_{x_i \in S_X}\sum_{y_j \in S_Y} p(x_i, y_j)\,g(x_i, y_j)$

Continuous RVs: \quad $\displaystyle E[W] = \iint f_{X,Y}(x,y)\,g(x,y)\,dx\,dy$

As before, expectation is linear, so $E[a\,g(X,Y) + b\,h(X,Y)] = aE[g(X,Y)] + bE[h(X,Y)]$. In particular, if X and Y are any two random variables, we have $E[X+Y] = E[X] + E[Y]$, so the expected value of the sum of two random variables is the sum of the expected values. Let us check whether this rule works for variance.
\[
\begin{aligned}
\mathrm{Var}[X+Y] &= E\big[(X + Y - (\mu_X + \mu_Y))^2\big] \\
&= E\big[((X - \mu_X) + (Y - \mu_Y))^2\big] \\
&= E\big[(X - \mu_X)^2 + 2(X - \mu_X)(Y - \mu_Y) + (Y - \mu_Y)^2\big] \\
&= E[(X - \mu_X)^2] + 2E[(X - \mu_X)(Y - \mu_Y)] + E[(Y - \mu_Y)^2] \\
&= \mathrm{Var}[X] + \mathrm{Var}[Y] + 2E[(X - \mu_X)(Y - \mu_Y)],
\end{aligned}
\]
where we have used the linearity property of expectation. The residual term above,
\[
\mathrm{Cov}[X,Y] = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)],
\]
is called the covariance of X and Y, and we have $\mathrm{Var}[X+Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$ only if the covariance is zero. Notice
\[
E[(X - \mu_X)(Y - \mu_Y)] = E[XY - X\mu_Y - Y\mu_X + \mu_X\mu_Y] = E[XY] - \mu_X\mu_Y.
\]
$E[XY]$ is called the correlation of X and Y and is denoted $r_{X,Y} = E[XY]$. If $r_{X,Y} = 0$ we say X and Y are orthogonal. We have: $\mathrm{Cov}[X,Y] = r_{X,Y} - \mu_X\mu_Y$.
The covariance of two random variables is a measure of how they co-vary, or move in a related fashion. One might expect that if they are independent then their covariance should be zero. This is indeed the case: if $f_{X,Y}(x,y) = f_X(x)f_Y(y)$ then
\[
\mathrm{Cov}[X,Y] = \iint (x - \mu_X)(y - \mu_Y)\,f_X(x)f_Y(y)\,dx\,dy = E[X - \mu_X]\,E[Y - \mu_Y] = 0.
\]
If the covariance is zero we say X and Y are uncorrelated. A standard measure of correlation is the correlation coefficient:
\[
\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X\,\sigma_Y}.
\]
The correlation coefficient is bounded between $-1$ and $1$. A completely linear relationship between X and Y gives one of the extremes $\pm 1$, and the weaker the relationship between the two, the smaller the absolute value of the coefficient.

Here is an exercise where we calculate some of these parameters.
Figure 4.13: The PDF $f_{XY}(x,y) = \tfrac{1}{2}\sin(x+y)$.

Example 4.12 LG-4.99
The random variables X and Y have joint PDF
\[
f_{XY}(x,y) = c\,\sin(x+y), \qquad 0 \le x \le \frac{\pi}{2},\ 0 \le y \le \frac{\pi}{2}.
\]
a) Find c.
b) Find the joint CDF of X and Y.
c) Find the marginal PDFs of X and Y.
d) Find the mean, variance and covariance of X and Y.

Sol: This is an exercise requiring use of definitions and trig integrals.

a) The probability of the sample space must be 1, so:
\[
\int_0^{\pi/2}\!\!\int_0^{\pi/2} c\,\sin(x+y)\,dx\,dy = 1.
\]
In fact this is just $F(\tfrac{\pi}{2}, \tfrac{\pi}{2})$, and since we have to calculate F in part (b) we postpone the determination of c until we finish part (b).

b)
\[
F(x,y) = \int_0^y\!\!\int_0^x c\,\sin(x' + y')\,dx'\,dy' = c\,\big(\sin x + \sin y - \sin(x+y)\big),
\]
so $F(\tfrac{\pi}{2}, \tfrac{\pi}{2}) = c(1 + 1 - 0) = 2c$, which gives $c = 1/2$.

c)
\[
f_X(x) = \int_0^{\pi/2} \tfrac{1}{2}\sin(x+y)\,dy = \tfrac{1}{2}(\cos x + \sin x),
\]
and similarly
\[
f_Y(y) = \int_0^{\pi/2} \tfrac{1}{2}\sin(x+y)\,dx = \tfrac{1}{2}(\cos y + \sin y).
\]
d) Here it is easiest to use the marginals for the means and variances. The integrations may be done using integration by parts.
\[
E[X] = \int_0^{\pi/2} \frac{x}{2}(\cos x + \sin x)\,dx = \frac{\pi}{4} = E[Y].
\]
Similarly
\[
E[X^2] = \int_0^{\pi/2} \frac{x^2}{2}(\cos x + \sin x)\,dx = \frac{\pi^2}{8} + \frac{\pi}{2} - 2 = E[Y^2].
\]
Thus
\[
\mathrm{Var}[X] = E[X^2] - E[X]^2 = \frac{\pi^2}{16} + \frac{\pi}{2} - 2 \approx 0.188 = \mathrm{Var}[Y].
\]
Also, by definition,
\[
E[XY] = \int_0^{\pi/2}\!\!\int_0^{\pi/2} \frac{xy}{2}\sin(x+y)\,dx\,dy = \frac{\pi}{2} - 1 \approx 0.57,
\]
so
\[
\mathrm{Cov}(X,Y) = E[XY] - E[X]E[Y] = \frac{\pi}{2} - 1 - \left(\frac{\pi}{4}\right)^2 \approx -0.046
\]
and
\[
\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X\,\sigma_Y} \approx \frac{-0.046}{0.188} \approx -0.25.
\]
The modest negative value of $\rho_{X,Y}$ indicates a weak negative dependence between X and Y.
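These moments can be confirmed numerically. The sketch below is my addition (not part of the notes); the grid size is an arbitrary choice.

```python
import numpy as np

# Numerical check of the moments of f(x, y) = 0.5*sin(x + y) on [0, pi/2]^2.
N = 2000
x = np.linspace(0.0, np.pi / 2, N)
y = np.linspace(0.0, np.pi / 2, N)
X, Y = np.meshgrid(x, y)
f = 0.5 * np.sin(X + Y)

def integrate(g):
    # double trapezoidal rule over the square
    return np.trapz(np.trapz(g, y, axis=0), x)

EX  = integrate(X * f)
EX2 = integrate(X**2 * f)
EXY = integrate(X * Y * f)
varX = EX2 - EX**2
cov  = EXY - EX * EX          # E[Y] = E[X] by symmetry
rho  = cov / varX             # sigma_X = sigma_Y

print("E[X]   :", EX, "(pi/4 =", np.pi / 4, ")")
print("Var[X] :", varX)
print("Cov    :", cov)
print("rho    :", rho)        # about -0.25
```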
4.5 Conditional Distributions
Conditional bivariate distributions behave exactly as one might expect from the single-variable and event/set cases. The notation was introduced in the first section. We provide a couple of examples.

Example 4.13 Conditional PMFs. LG4.30
Find the conditional PMFs of X given Y = 1 for example (4.2.32).
Sol: We use $P[A|B] = P[A \cap B]/P[B]$. In this case the $P[A \cap B]$ are the table entries in the first row and the $P[B]$ are the marginals at the end of the first row. See the figure.

Figure 4.14: The top tables are the PMFs, the middle tables are $P[A \cap B]$, and the bottom tables contain the respective conditional probabilities.
Example 4.14 Conditional PDF LG4.10/30
The random vector (X, Y) has the joint PDF $f_{X,Y}(x,y) = k(x+y)$ on the region $R = \{0 < x < 1,\ 0 < y < 1\}$.
a) Find k.
b) Find the joint CDF of (X, Y).
c) Find the marginal PDFs of X and Y.
d) Find $f_{Y|X}(y|x)$ for $0 \le y \le 1$.
e) Find $E[Y|X]$.

Sol:
a) To find k, integrate over the sample space and set the result to 1. However, since we shall need the CDF anyway, to avoid integrating twice we find the CDF first. In the region R we have
\[
F(x,y) = \int_0^x\!\!\int_0^y k(x' + y')\,dy'\,dx'
= k \int_0^x \left( x' y + \frac{y^2}{2} \right) dx'
= k\,\frac{xy(x+y)}{2}.
\]
Since $F(1,1) = 1$ this gives $k = 1$. Thus the PDF is
\[
f_{X,Y}(x,y) = \begin{cases} x + y & (x,y) \in R, \\ 0 & \text{otherwise.} \end{cases}
\]
b) The above calculation yielded the CDF for $\{0 < x < 1,\ 0 < y < 1\}$. Thus in this region $F(x,y) = \dfrac{xy(x+y)}{2}$. In quadrants 2, 3, and 4 the CDF is identically 0. For $\{x > 1,\ y > 1\}$ the CDF is 1. The other two areas correspond to evaluating the CDF at its maximum extent in x or y. So in the region $\{0 < x < 1,\ y > 1\}$ we have $F(x,y) = F(x,1) = \dfrac{x(x+1)}{2}$. Similarly, in the region $\{0 < y < 1,\ x > 1\}$ we have $F(x,y) = F(1,y) = \dfrac{y(y+1)}{2}$.
c) We find the marginal CDFs and then differentiate to find the marginal PDFs.
\[
F_X(x) = \lim_{y \to \infty} F_{XY}(x,y) = F_{XY}(x,1) = \frac{x(x+1)}{2},
\]
so $f_X(x) = \dfrac{dF_X}{dx} = x + \tfrac{1}{2}$. Similarly $f_Y(y) = y + \tfrac{1}{2}$.
d) We have, for $0 \le x \le 1$,
\[
f_{Y|X}(y|x) = \frac{f_{XY}(x,y)}{f_X(x)} = \frac{x+y}{x + \tfrac{1}{2}}, \qquad 0 \le y \le 1.
\]

e) Since we have the PDF $f_{Y|X}(y|x)$ we can calculate this directly as:
\[
E[Y|X = x] = \int f_{Y|X}(y|x)\,y\,dy = \int_0^1 \frac{y(x+y)}{x + \tfrac{1}{2}}\,dy
= \frac{x/2}{x + \tfrac{1}{2}} + \frac{1/3}{x + \tfrac{1}{2}}.
\]
The conditional expectation of a function of a random variable is itself a random variable. We can calculate its expected value! It turns out that:
\[
E\big[E[g(X)|Y]\big] = E[g(X)].
\]
Let us verify this for the above example.
Example 4.15 Verify nested expectation
In the above problem find $E[E[Y|X]]$.
Sol: By definition
\[
E\big[E[Y|X]\big] = \int E[Y|X=x]\,f_X(x)\,dx
= \int_0^1 \left(\frac{x/2 + 1/3}{x + \tfrac{1}{2}}\right)\left(x + \tfrac{1}{2}\right)dx
= \int_0^1 \left(\frac{x}{2} + \frac{1}{3}\right)dx = \frac{7}{12}.
\]
Now
\[
E[Y] = \int y\,f_Y(y)\,dy = \int_0^1 y\left(y + \tfrac{1}{2}\right)dy = \left(\frac{y^3}{3} + \frac{y^2}{4}\right)\Big|_0^1 = \frac{7}{12}.
\]
Thus $E[E[Y|X]] = E[Y]$ in this case, as claimed.
4.6 Bivariate Gaussian RV
A very important bivariate distribution is the joint Gaussian:
\[
f_{X,Y}(x,y) = \frac{\exp\left\{ -\dfrac{1}{2(1-\rho^2)}\left[ \left(\dfrac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\dfrac{x-\mu_X}{\sigma_X}\right)\left(\dfrac{y-\mu_Y}{\sigma_Y}\right) + \left(\dfrac{y-\mu_Y}{\sigma_Y}\right)^2 \right] \right\}}{2\pi\,\sigma_X\,\sigma_Y\,\sqrt{1-\rho^2}}
\tag{4.1}
\]
Here $\mu_X$, $\mu_Y$, $\sigma_X^2$, and $\sigma_Y^2$ are the means and variances of X and Y respectively, and $\rho$ is the correlation coefficient $\rho_{X,Y}$. The marginal distributions for X and Y are independent of the correlation coefficient and are both normal: respectively $N(\mu_X, \sigma_X)$ and $N(\mu_Y, \sigma_Y)$. The random variables X and Y are independent when $\rho = 0$. Let us explore this case first.
When $\rho = 0$, the PDF becomes:
\[
f_{X,Y}(x,y) = \frac{\exp\left\{ -\dfrac{1}{2}\left[ \left(\dfrac{x-\mu_X}{\sigma_X}\right)^2 + \left(\dfrac{y-\mu_Y}{\sigma_Y}\right)^2 \right] \right\}}{2\pi\,\sigma_X\,\sigma_Y}
= \left[ \frac{1}{\sqrt{2\pi}\,\sigma_X} \exp\!\left( -\frac{1}{2}\left(\frac{x-\mu_X}{\sigma_X}\right)^2 \right) \right]\left[ \frac{1}{\sqrt{2\pi}\,\sigma_Y} \exp\!\left( -\frac{1}{2}\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 \right) \right]
= f_X(x)\,f_Y(y),
\]
where $f_X(x)$ and $f_Y(y)$ are the marginals, and both are Gaussian. Notice that if we define new variables $Z_1 = \dfrac{X - \mu_X}{\sigma_X}$ and $Z_2 = \dfrac{Y - \mu_Y}{\sigma_Y}$ then, changing variables,
Figure 4.15: The PDF of the bivariate standard normal. On the left is a sketch of the surface, on the right a contour plot. Notice the circular symmetry.

\[
f_{Z_1,Z_2}(z_1,z_2) = \frac{\exp\left\{ -\tfrac{1}{2}\left(z_1^2 + z_2^2\right) \right\}}{2\pi}
= \left[ \frac{1}{\sqrt{2\pi}} \exp\!\left(-z_1^2/2\right) \right]\left[ \frac{1}{\sqrt{2\pi}} \exp\!\left(-z_2^2/2\right) \right]
= f_{Z_1}(z_1)\,f_{Z_2}(z_2),
\]
and we see we have a product of two standard normal PDFs. In the figure we see a sketch of the bivariate PDF surface and a contour plot. The contour plot shows the circular symmetry of the distribution and suggests that, in fact, polar rectangles would be more natural for this PDF than Cartesian rectangles. Let us test this out in an example.
Example 4.16 Standard Bivariate Normal
Consider the bivariate unit normal above and define the random variables R and $\Theta$ via $X = R\cos\Theta$ and $Y = R\sin\Theta$. Find the PDF, CDF and marginals of this pair of variables.

Sol: Assume $0 \le \Theta < 2\pi$ and let W be the region $W = \{0 \le R < r,\ 0 \le \Theta < \theta\}$; then
\[
P[0 \le R < r,\ 0 \le \Theta < \theta] = \iint_W f_{Z_1,Z_2}(z_1,z_2)\,dz_1\,dz_2.
\]
In polar coordinates $dz_1\,dz_2 \to r\,dr\,d\theta$, and this latter integral becomes:
\[
F_{R,\Theta}(r,\theta) = \int_0^{\theta}\!\!\int_0^r \frac{1}{2\pi}\,e^{-r'^2/2}\,r'\,dr'\,d\theta' = \frac{\theta}{2\pi}\left(1 - e^{-r^2/2}\right).
\]
Then
\[
F_{\Theta}(\theta) = \lim_{r \to \infty} F_{R,\Theta}(r,\theta) = \frac{\theta}{2\pi},
\]
so
\[
f_{\Theta}(\theta) = \begin{cases} \dfrac{1}{2\pi} & 0 \le \theta < 2\pi, \\ 0 & \text{otherwise,} \end{cases}
\]
and we see that the marginal distribution for $\Theta$ is uniform. Similarly
\[
F_R(r) = \lim_{\theta \to 2\pi} F_{R,\Theta}(r,\theta) = 1 - e^{-r^2/2} \qquad \text{for } r \ge 0,
\]
so
\[
f_R(r) = \begin{cases} r\,e^{-r^2/2} & r > 0, \\ 0 & \text{otherwise.} \end{cases}
\]
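A quick empirical check of these marginals (my addition, with arbitrary seed and sample size):

```python
import numpy as np

# Empirical check: for independent standard normals (X, Y), the angle Theta is uniform on
# [0, 2*pi) and the radius R has the Rayleigh CDF 1 - exp(-r^2/2).
rng = np.random.default_rng(3)
n = 1_000_000
X = rng.standard_normal(n)
Y = rng.standard_normal(n)

R = np.hypot(X, Y)
Theta = np.arctan2(Y, X) % (2 * np.pi)

print("P[Theta < pi/2]:", np.mean(Theta < np.pi / 2), "(expect 0.25)")
print("P[R < 1]       :", np.mean(R < 1.0), "(expect", 1 - np.exp(-0.5), ")")
print("mean of R      :", R.mean(), "(expect", np.sqrt(np.pi / 2), ")")
```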
The transformation to polar coordinates above considerably simplified the picture of what the PDF and CDF look like. However, to obtain the circular symmetry that allowed the simple picture, we assumed that $\rho$ was zero. If $\rho$ is non-zero, X and Y are not independent: the distribution of X depends on the value of Y and vice versa. The algebra is a bit cumbersome, but an appropriate change of variables transforms the general PDF (4.1) into a product of two Gaussian distributions, one of which is conditional. That is, given the random variables (X, Y) of (4.1), we can write the PDF as:
\[
f_{X,Y}(x,y) = \frac{\exp\left\{ -\dfrac{1}{2(1-\rho^2)}\left[ \left(\dfrac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\dfrac{x-\mu_X}{\sigma_X}\right)\left(\dfrac{y-\mu_Y}{\sigma_Y}\right) + \left(\dfrac{y-\mu_Y}{\sigma_Y}\right)^2 \right] \right\}}{2\pi\,\sigma_X\,\sigma_Y\,\sqrt{1-\rho^2}}
= \left[ \frac{1}{\sqrt{2\pi}\,\sigma_X}\,e^{-\frac{1}{2}\left(\frac{x-\mu_X}{\sigma_X}\right)^2} \right]\left[ \frac{1}{\sqrt{2\pi}\,\sigma_2}\,e^{-\frac{1}{2}\left(\frac{y-\mu_2}{\sigma_2}\right)^2} \right]
= f_X(x)\,f_{Y|X}(y|x)
\tag{4.2}
\]
where $\mu_2 = \mu_Y + \rho\,\dfrac{\sigma_Y}{\sigma_X}(x - \mu_X)$ and $\sigma_2^2 = \sigma_Y^2(1 - \rho^2)$. Notice here that we have expressed the PDF as a product of two PDFs. The first is a marginal PDF, the second is a conditional PDF. Both are Gaussian, and we notice that as $\rho \to 0$, $f_{Y|X}(y|x) \to f_Y(y)$ and the two variables become independent, as we expect. We shall return to the joint Gaussian in the next chapter on random vectors.
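The factorization $f_X \cdot f_{Y|X}$ also gives a simple way to generate correlated Gaussian pairs. The following sketch is my addition; the means, standard deviations, $\rho$ and seed are arbitrary test values.

```python
import numpy as np

# Sample a correlated bivariate Gaussian using the factorization f_X(x) * f_{Y|X}(y|x):
# X ~ N(mu_X, sigma_X), then Y | X = x ~ N(mu_Y + rho*(sigma_Y/sigma_X)*(x - mu_X),
#                                           sigma_Y^2 * (1 - rho^2)).
rng = np.random.default_rng(4)
mu_X, mu_Y = 1.0, -2.0            # arbitrary parameters for the test
sig_X, sig_Y, rho = 2.0, 3.0, 0.6
n = 500_000

X = mu_X + sig_X * rng.standard_normal(n)
cond_mean = mu_Y + rho * (sig_Y / sig_X) * (X - mu_X)
cond_std = sig_Y * np.sqrt(1.0 - rho**2)
Y = cond_mean + cond_std * rng.standard_normal(n)

print("sample correlation:", np.corrcoef(X, Y)[0, 1], "(target", rho, ")")
print("sample std of Y   :", Y.std(), "(target", sig_Y, ")")
```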
4.7 Summary
This chapter extends the results of previous chapters to cover pairs of random variables. For pairs of random variables, events correspond to regions in a plane. Thus, the integrals and sums of the previous chapter become double integrals and double sums. The CDF still contains all the information about the random variables, but is less useful than in the single-variable case. Instead the PDF and PMF are generally used, as these allow integration/summation over more general regions.

The joint CDF is $F_{X,Y}(x,y) = P[X \le x, Y \le y]$ and the marginal CDFs are $F_X(x) = F_{X,Y}(x,\infty) = P[X \le x]$ and $F_Y(y) = F_{X,Y}(\infty, y) = P[Y \le y]$. If X and Y are independent then $F_{X,Y}(x,y) = F_X(x)\,F_Y(y)$.

The joint probability density $f_{X,Y}(x,y)$ is proportional to the probability that the bivariate random variable lies in a small region about the point (x, y). Actual probabilities are found by integrating the joint PDF over regions of the (x, y) plane. X and Y are independent if $f_{X,Y}(x,y) = f_X(x)f_Y(y)$.

Expected values of functions of random variables are calculated as double integrals or double sums over the sample spaces. Two new quantities, $\mathrm{Cov}[X,Y]$ and $r_{X,Y}$, provide information on the relation between X and Y.

Conditional probability is defined in accordance with previous uses. For example $f_{Y|X}(y|x) = f_{XY}(x,y)/f_X(x)$ defines the PDF of Y conditioned on a particular value of X.
4.8 Sample Test Questions
T-1 The CDF of a pair of random variables (X, Y) is given by:
\[
F_{X,Y}(x,y) = \begin{cases} x^2 y^2 & 0 \le x \le 1,\ 0 \le y \le 2, \\ 0 & \text{otherwise.} \end{cases}
\]
In the region $\{0 < x < 1,\ 0 < y < 2\}$ the PDF must be:
A) $x$ \quad B) $\dfrac{y}{2}$ \quad C) $\dfrac{x^3 y^2}{12}$ \quad D) $\dfrac{x}{2}$ \quad E) None of the above.
Figure 4.17: Figure for question 2.

T-2 Let X and Y be two continuous random variables with CDF $F_{X,Y}$. Let A be the event $A = \{1 < X \le 3,\ 2 < Y \le 4\}$ that is sketched in Fig. 4.17. The probability $P[A]$ is:
A) $F_{X,Y}(3,4) - F_{X,Y}(1,2)$
B) $F_{X,Y}(3,4) - F_{X,Y}(1,2) - F_{X,Y}(3,2) - F_{X,Y}(1,4)$
C) $F_{X,Y}(3,4) + F_{X,Y}(1,2) - F_{X,Y}(3,2) - F_{X,Y}(1,4)$
D) $F_{X,Y}(3,4) + F_{X,Y}(2,1) - F_{X,Y}(3,2) - F_{X,Y}(1,4)$
E) None of the above.
T-3 The bivariate PDF of a random pair (X, Y) is given by:
\[
f_{X,Y}(x,y) = \begin{cases} 2\,e^{-x}\,e^{-2y} & x \ge 0,\ y \ge 0, \\ 0 & \text{otherwise.} \end{cases}
\]
Consider the following statements.
i) X and Y are dependent random variables.
ii) X and Y are identically distributed random variables.
iii) The probability that Y < 4 given that X > 1 is 0.435.
Choose the correct statement.
A) Only i) is true. B) Only ii) is true. C) Only iii) is true. D) Only i) and ii) are true. E) i), ii) and iii) are all true.
T-4 Suppose (X, Y) has joint PMF $P_{X,Y}(x,y)$ where X and Y are non-negative integers: $0 \le X \le +\infty$ and $0 \le Y \le +\infty$. Consider the following statements.
i) $P[a \le Y \le X \le b] = \sum_{x=a}^{b}\left(\sum_{y=a}^{b} P_{X,Y}(x,y)\right)$
ii) $P[a \le X \le b] = \sum_{x=a}^{b} P_{X,Y}(x,y)$
iii) $P[a \le Y \le b,\ X \le c] = \sum_{x=0}^{c}\left(\sum_{y=a}^{b} P_{X,Y}(x,y)\right)$
Choose the correct statement.
A) Only i) is true. B) Only ii) is true. C) Only iii) is true. D) Only i) and ii) are true. E) i), ii) and iii) are all true.
T-5 For two independent flips of a fair coin, let X be the random variable representing the total number of tails and let Y equal the number of heads on the last flip.
a) List the elements of the sample space as a set of ordered pairs (X, Y).
b) In the table below, enter the values of the joint probability mass function (PMF). Also add the values of the marginal PMFs in the rows/columns labelled $\Sigma$.

Y \ X : 0, 1, 2, $\Sigma$
Y = 0 :
Y = 1 :
$\Sigma$ :

(c) Show that X and Y are independent/dependent, whichever happens to be the case.
(d) Find the probability that X = 1 given Y = 0.
(e) Find the probability that $Y \le X$.
T-6 Robin Hood shoots arrows at a circular target. The impact point of his arrows is a bivariate random variable (X, Y) with PDF
\[
f_{XY}(x,y) = \frac{1}{2\pi}\exp\!\big[-(x^2+y^2)/2\big],
\]
with x and y measured in inches from the centre of the target. (Hint: polar coordinates are very useful here!)
(a) What is the probability that his arrow lands within one inch of the centre of the target?
(b) What is the probability that his arrow is more than 3 inches from the centre?
T-7 The random vector (X, Y) has the joint PDF
\[
f(x,y) = \begin{cases} k(x+y) & 0 < x < 1,\ 0 < y < 1, \\ 0 & \text{otherwise.} \end{cases}
\]
A) Find k.
B) Find the joint cumulative distribution function (CDF) of (X, Y).
C) Find the marginal PDFs.
T-8 Let X and Y denote the amplitudes of noise signals at two antennas. The random vector (X, Y) has the joint PDF:
\[
f(x,y) = a x\,e^{-a x^2/2}\; b y\,e^{-b y^2/2}, \qquad x > 0,\ y > 0, \text{ where } a > 0 \text{ and } b > 0.
\]
(a) Find the joint CDF.
(b) Find P[X > Y].
(c) Find the marginal PDFs of X and Y.
(d) Are X and Y independent?
T-9 The amplitudes of two signals X and Y have joint PDF:
\[
f_{X,Y}(x,y) = y\,\exp\!\left[-x^2 - y^2\right] \qquad \text{for } x > 0,\ y > 0.
\]
(a) Find the joint CDF.
(b) Find the two marginal PDFs.
(c) Find $P[X > Y]$.
(d) Find $P(Y > X \mid X = x)$.
(e) Find $E[X \mid Y = y]$.
(f) Are X and Y independent? Explain.
Chapter 5
Random Vectors
5.1 About Random Vectors
In the previous chapter we discussed bivariate distributions, where the random variables occurred in pairs. We can think of bivariate random variables as vectors with two components. The generalization to vectors with n components, n > 2, requires only that we specify PDFs and CDFs that are functions of n variables rather than just two. However, if we keep the notation we used for the single and bivariate cases by listing components, just writing down functions becomes cumbersome. Although, for clarity, we sometimes have to revert to this notation, when possible we shall default to a vector notation that treats vectors and matrices as single objects. For example, if we have a vector of n random variables such as $(X_1, X_2, \dots, X_n)'$ (a column vector, the prime here standing for transpose), we shall denote it by a boldface symbol such as:
\[
\mathbf{X} = (X_1, X_2, \dots, X_n)'.
\]
This not only allows for compact notation, so that we can write, for example, $F_{\mathbf{X}}$ instead of $F_{X_1, X_2, \dots, X_n}$; it also allows us to bring in the power of general linear algebra. For example, consider our previous formula for the bivariate Gaussian PDF,
\[
f_{X,Y}(x,y) = \frac{\exp\left\{ -\dfrac{1}{2(1-\rho^2)}\left[ \left(\dfrac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\dfrac{x-\mu_X}{\sigma_X}\right)\left(\dfrac{y-\mu_Y}{\sigma_Y}\right) + \left(\dfrac{y-\mu_Y}{\sigma_Y}\right)^2 \right] \right\}}{2\pi\,\sigma_X\,\sigma_Y\,\sqrt{1-\rho^2}},
\tag{5.1}
\]
not a formula that is going to win any prizes for transparency or compactness. However, using vector notation this is
\[
f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,(\det(C_{\mathbf{X}}))^{1/2}}\,\exp\!\left[ -\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}})'\,C_{\mathbf{X}}^{-1}\,(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}}) \right],
\tag{5.2}
\]
where in the bivariate case n is two. Recall that the single-variable case, written suggestively, is
\[
f_X(x) = \frac{1}{(2\pi)^{1/2}\,\sigma}\,\exp\!\left[ -\frac{1}{2}(x - \mu_X)\,\frac{1}{\sigma^2}\,(x - \mu_X) \right].
\tag{5.3}
\]
By comparing (5.2) and (5.3), we can see that the covariance matrix $C_{\mathbf{X}}$ is a matrix analog of the variance $\sigma^2$, and that the square root of its determinant, $(\det(C_{\mathbf{X}}))^{1/2}$, serves to normalize the multivariate Gaussian as $\sigma$ did in the single-variable case. This similarity between the two formulas (5.2) and (5.3) is of course not accidental, but a byproduct of the definition of matrix multiplication and the definition of $C_{\mathbf{X}}$. It will be particularly useful when considering the multivariate Gaussian.
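As a minimal sketch of how (5.2) is evaluated in practice (my addition, not the notes'), the function below computes the multivariate Gaussian density with numpy; the mean and covariance are test values that anticipate Example 5.4 later in this chapter.

```python
import numpy as np

# Evaluate the vector-form Gaussian PDF of equation (5.2) at a point x.
def gaussian_pdf(x, mu, C):
    n = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(C))
    quad = diff @ np.linalg.inv(C) @ diff        # (x - mu)' C^{-1} (x - mu)
    return np.exp(-0.5 * quad) / norm

mu = np.array([50.0, 62.0])
C = np.array([[16.0, 12.8],
              [12.8, 16.0]])
x = np.array([52.0, 60.0])
print(gaussian_pdf(x, mu, C))
```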
5.2 Distribution Functions and Notation
Figure 5.1: The CDF of a 3-component vector accumulates the probability of finding the random variable in an octant. On the right is a sketch of such an octant, where the region enclosed by the shaded planes is $\{X \le x\} \cap \{Y \le y\} \cap \{Z \le z\}$. On the left is a sketch of the 8 octants defined by a single point (x, y, z).

The joint CDF $F_{\mathbf{X}}(\mathbf{x})$ of the random vector $\mathbf{X} = (X_1, X_2, \dots, X_n)'$ is defined as the probability that $\mathbf{X}$ lies in the semi-infinite region associated with the point $\mathbf{x}$. As for the bivariate case, the CDF is useful for formulating problems, and we list some properties here. In cases where the region of interest is not rectangular, the PDF or PMF are preferred, and we shall discuss them subsequently.
5.2.1 The Joint CDF
The joint cumulative distribution function is defined as
\[
F_{\mathbf{X}}(\mathbf{x}) = P[X_1 \le x_1, \dots, X_n \le x_n].
\]
Here the event $\{X_1 \le x_1, \dots, X_n \le x_n\}$ is the intersection of the events $A_1 = \{X_1 \le x_1\}$, $A_2 = \{X_2 \le x_2\}$, ..., $A_n = \{X_n \le x_n\}$, so
\[
F_{\mathbf{X}}(x_1, x_2, \dots, x_n) = F_{\mathbf{X}}(\mathbf{x}) = P[A_1 \cap A_2 \cap \dots \cap A_n].
\]
Notice there are n univariate marginals, $F_{X_1}(x_1) = P[A_1],\ \dots,\ F_{X_n}(x_n) = P[A_n]$. These are all obtained from the original CDF by setting all but one variable to $+\infty$.
B) Properties

a) $0 \le F_{\mathbf{X}}(\mathbf{x}) \le 1$

b) $F_{\mathbf{X}}(\infty, \dots, \infty) = 1$

c) $\displaystyle \lim_{x_k \to -\infty} F_{\mathbf{X}}(x_1, x_2, \dots, x_n) = 0$

d) There are $2^n - 2$ marginal CDFs obtained by setting appropriate $x_i$'s to infinity. So for example $F_{\mathbf{X}}(x_1, x_2, \dots, x_{n-1}, \infty)$ is an $(n-1)$-variate marginal and $F_{\mathbf{X}}(x_1, \infty, \dots, \infty)$ is a univariate marginal. (Why are there $2^n - 2$ distinct marginals in total?)

e) The random variables $X_1, \dots, X_n$ are statistically independent if
\[
F_{\mathbf{X}}(\mathbf{x}) = F_{X_1}(x_1)\,F_{X_2}(x_2) \cdots F_{X_n}(x_n).
\]

f) $F_{\mathbf{X}}(\mathbf{x})$ is right-continuous in each $x_k$.
5.2.2 Joint Probability Density Functions
For a continuous multivariate $F_{\mathbf{X}}(\mathbf{x})$ the joint probability density function (joint PDF) is
\[
f_{\mathbf{X}}(\mathbf{x}) = \frac{\partial^n F_{\mathbf{X}}(\mathbf{x})}{\partial x_1 \cdots \partial x_n},
\]
which is related to the joint CDF by
\[
F_{\mathbf{X}}(\mathbf{x}) = \int_{-\infty}^{x_1}\!\!\cdots\int_{-\infty}^{x_n} f_{\mathbf{X}}(u_1, \dots, u_n)\,du_1 \cdots du_n.
\]

Properties

a) $f_{\mathbf{X}}(\mathbf{x}) \ge 0$

b) $\displaystyle \int \cdots \int f_{\mathbf{X}}(\mathbf{u})\,du_1 \cdots du_n = 1$

c) $\displaystyle P[\mathbf{X} \in A] = \int \cdots \int_{R_A} f_{\mathbf{X}}(\mathbf{u})\,du_1 \cdots du_n$, where $R_A$ is the region corresponding to the event A.
Marginal probability densities

The univariate marginal PDFs are
\[
f_{X_k}(x_k) = \frac{dF_{X_k}(x_k)}{dx_k} = \int \cdots \int f_{\mathbf{X}}(x_1, \dots, x_n)\,dx_1 \cdots dx_{k-1}\,dx_{k+1} \cdots dx_n.
\]
The bivariate and higher-order marginals are similarly defined. The random variables $X_1, \dots, X_n$ are statistically independent if
\[
f_{\mathbf{X}}(\mathbf{x}) = f_{X_1}(x_1)\,f_{X_2}(x_2) \cdots f_{X_n}(x_n).
\]

Pairs of random vectors

Just as we considered pairs of random variables (X, Y) after considering univariate random variables, we may also consider pairs of random vectors $(\mathbf{X}, \mathbf{Y})$. The properties of PDFs and CDFs follow from the fact that the pair $(\mathbf{X}, \mathbf{Y})$ is just a convenient way of writing $(\mathbf{X}, \mathbf{Y}) = (X_1, X_2, \dots, X_n, Y_1, \dots, Y_n)$. The PDF and CDF properties follow as a result.
5.2.3 Joint Probability Mass functions
In the above we have assumed continuous RVs. The same structure appears for discrete RVs, although the analog of the cumulative distribution function is seldom used; instead we use a probability mass function.

Let $(X_1, \dots, X_n)$ be a discrete n-variate RV. Let
\[
P_{X_1, \dots, X_n}(x_1, \dots, x_n) = P[X_1 = x_1, \dots, X_n = x_n];
\]
then $P_{X_1, \dots, X_n}(x_1, \dots, x_n) = P_{\mathbf{X}}$ is called the joint probability mass function of $(X_1, \dots, X_n) = \mathbf{X}$.

Properties

a) $P_{\mathbf{X}} \ge 0$

b) $\displaystyle \sum_{\text{Sample Space}} P_{\mathbf{X}} = 1$

c) $\displaystyle P[(X_1, \dots, X_n) \in A] = \sum_{(x_1, \dots, x_n) \in R_A} P_{X_1, \dots, X_n}(x_1, \dots, x_n)$
Marginal Probability Mass Functions
There are several of these, depending on which variable(s) are retained. For example, there are n univariate marginal distributions of the form:
\[
P_{X_i}(x_i) = \sum_{x_1} \cdots \sum_{x_{i-1}} \sum_{x_{i+1}} \cdots \sum_{x_n} P_{X_1, \dots, X_n}(x_1, \dots, x_n).
\]

Example 5.1 Let $\mathbf{X} = (X_1, \dots, X_n)$ have the joint PDF $f_{\mathbf{X}}(\mathbf{x}) = 1$ for $0 \le x_i \le 1$, $i = 1, \dots, n$, and 0 otherwise; that is, $\mathbf{X}$ is uniformly distributed on the unit cube. Among other things we want: b) the joint CDF, c) the univariate marginal CDFs, d) whether the $X_i$ are independent, and, for n = 3, e) $P[\max_i X_i \le 3/4]$ and f) $P[\min_i X_i \le 3/4]$.

Sol:
b) Since the PDF is defined differently in different regions, we need to figure out how to write the CDF integral in such a way that we do not have to specify each region separately. First notice that the probability that any $X_i$ is less than zero is zero. Thus in the multiple integral for the CDF, each lower limit may be set to zero. Now if any of the $x_i$ are greater than 1, the PDF changes from 1 to 0 in the region of integration, so we really want to stop the upper limits of integration at 1 when a variable exceeds 1. Notice that $P[X_i \le 1 + \epsilon] = P[X_i \le 1] = 1$ for every positive $\epsilon$. Thus we can say that for any $x > 0$, $P[X_i \le x] = P[X_i \le \min(x, 1)]$. This allows us to rewrite the CDF as
\[
F_{\mathbf{X}}(\mathbf{x}) = \int_0^{\min(x_1,1)}\!\!\cdots\int_0^{\min(x_n,1)} f_{\mathbf{X}}(u_1, \dots, u_n)\,du_1 \cdots du_n
= \min(x_1,1)\min(x_2,1)\cdots\min(x_n,1)
= \prod_{i=1}^{n} \min(x_i,1),
\]
where it is assumed that all $x_i \ge 0$. Or, explicitly,
\[
F_{\mathbf{X}}(\mathbf{x}) = \begin{cases} \displaystyle\prod_{i=1}^{n} \min(x_i,1) & \text{all } x_i > 0, \\ 0 & \text{otherwise.} \end{cases}
\]
c) The univariate marginal CDF for $X_1$ is, for $x > 0$,
\[
F_{X_1}(x) = \int_0^{\min(x,1)}\!\!\int_0^1 \cdots \int_0^1 1\,du_1 \cdots du_n = \int_0^{\min(x,1)} du_1 = \min(x, 1).
\]
Similarly we find $F_{X_1}(x) = F_{X_2}(x) = \dots = F_{X_n}(x) = \min(x, 1)$.

d) From the above formula for the CDF we see that it is just the product of the n univariate marginal CDFs. The $X_i$ are therefore independent.
e) For n = 3,
\[
P[\max_i X_i \le 3/4] = P[X_1 \le 3/4,\ X_2 \le 3/4,\ X_3 \le 3/4]
= F_{\mathbf{X}}(3/4, 3/4, 3/4)
= \int_0^{3/4}\!\!\int_0^{3/4}\!\!\int_0^{3/4} dx_1\,dx_2\,dx_3
= (3/4)^3 = 27/64.
\]

f) For n = 3,
\[
P[\min_i X_i \le 3/4] = 1 - P[X_1 \ge 3/4,\ X_2 \ge 3/4,\ X_3 \ge 3/4]
= 1 - \int_{3/4}^1\!\!\int_{3/4}^1\!\!\int_{3/4}^1 dx_1\,dx_2\,dx_3
= 1 - (1/4)^3 = 63/64.
\]
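A quick Monte Carlo sketch of parts e) and f) (my addition; seed and sample size arbitrary):

```python
import numpy as np

# For three iid Uniform(0,1) variables, P[max <= 3/4] = 27/64 and P[min <= 3/4] = 63/64.
rng = np.random.default_rng(5)
U = rng.random((1_000_000, 3))

print("P[max <= 3/4]:", np.mean(U.max(axis=1) <= 0.75), "(exact", 27 / 64, ")")
print("P[min <= 3/4]:", np.mean(U.min(axis=1) <= 0.75), "(exact", 63 / 64, ")")
```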
The following is an exercise with a familiar theme.

Example 5.2 (Trivariate Properties) Let $\mathbf{X} = (X_1, X_2, X_3)'$ be a random vector with joint PDF $f_{\mathbf{X}}(\mathbf{x}) = k\,\exp[-(a_1 x_1 + a_2 x_2 + a_3 x_3)]$ for $\mathbf{x} > 0$¹ and zero otherwise, where the $a_i > 0$.
a) Find k.
b) Find the marginal CDF and PDF of $(X_1, X_2)$.
c) Find the marginal CDF and PDF of $X_1$.
d) Are the $X_i$ independent?

Sol:
a) The joint CDF is
\[
F_{\mathbf{X}}(\mathbf{x}) = \int_0^{x_1}\!\!\int_0^{x_2}\!\!\int_0^{x_3} k\,e^{-(a_1 u_1 + a_2 u_2 + a_3 u_3)}\,du_3\,du_2\,du_1
= k \prod_{i=1}^{3} \frac{1}{a_i}\left(1 - e^{-a_i x_i}\right).
\]
Setting $P[S] = 1$ we have:
\[
F_{\mathbf{X}}(\infty, \infty, \infty) = \frac{k}{a_1 a_2 a_3} = 1,
\]
so we must have $k = a_1 a_2 a_3$.
b) The marginal CDF for $(X_1, X_2)$ is
\[
F_{X_1,X_2}(x_1,x_2) = F_{X_1,X_2,X_3}(x_1, x_2, \infty) = (1 - e^{-a_1 x_1})(1 - e^{-a_2 x_2}).
\]
The marginal PDF is the mixed partial derivative of this with respect to $x_1$ and $x_2$:
\[
f_{X_1,X_2}(x_1,x_2) = \frac{\partial^2 F_{X_1,X_2}(x_1,x_2)}{\partial x_1\,\partial x_2} = a_1 a_2\,e^{-(a_1 x_1 + a_2 x_2)}
\]
for $x_1, x_2 > 0$.

¹ We use the abbreviated notation $\mathbf{x} > 0$ to mean that all components of $\mathbf{x}$ are greater than zero.
c) Similarly, the marginal CDF of $X_1$ is
\[
F_{X_1}(x_1) = F_{X_1,X_2,X_3}(x_1, \infty, \infty) = 1 - e^{-a_1 x_1},
\]
with the marginal PDF being the derivative of this, namely:
\[
f_{X_1}(x_1) = a_1\,e^{-a_1 x_1}.
\]

d) Notice we can write the PDF as the product of the marginal PDFs:
\[
f_{\mathbf{X}}(\mathbf{x}) = a_1 a_2 a_3\,e^{-(a_1 x_1 + a_2 x_2 + a_3 x_3)} = f_{X_1}(x_1)\,f_{X_2}(x_2)\,f_{X_3}(x_3),
\]
so the $X_i$ are independent.
It often happens that the components of a random vector are independently distributed and have similar univariate marginal PDFs. A special and very important example of this is when the components of the vector are independent and identically distributed. That is, $(X_1, \dots, X_n)$ are independent and identically distributed (iid) random variables if
\[
f_{\mathbf{X}}(\mathbf{x}) = \prod_{i=1}^{n} f_X(x_i).
\]
Example 5.1 gave an instance of this: there the PDF was a product of univariate uniform distributions.

Independence may be further generalized to the factoring of vector PDFs. Two random vectors $(\mathbf{X}, \mathbf{Y})$ are independent if $f_{(\mathbf{X},\mathbf{Y})}(\mathbf{x},\mathbf{y}) = f_{\mathbf{X}}(\mathbf{x})\,f_{\mathbf{Y}}(\mathbf{y})$. For example, if both the vectors in the pair $(\mathbf{X}, \mathbf{Y})$ are taken from the distribution of Example 5.1 then the vectors themselves are statistically independent and the bi-vector PDF is a product of two individual PDFs.
5.3 Functions of Vector Random Variables
When considering functions of vector random variables, the general procedure is the same as in the single and bivariate cases. There are two levels of detail:

a) We require only distribution parameters like means or variances.
b) We require the full distribution (PDF or CDF) of the derived random variable.

Suppose $\mathbf{X} = (X_1, \dots, X_n)$ is a vector random variable and $\mathbf{Y} = g(\mathbf{X})$ is a vector function of $\mathbf{X}$. In case a) we can use:
\[
E[\mathbf{Y}] = \int \cdots \int g(\mathbf{x})\,f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x},
\]
or its equivalent for a discrete RV. In case b) we usually look for the CDF of $\mathbf{Y}$. That is,
\[
P[\mathbf{Y} \le \mathbf{y}] = F_{\mathbf{Y}}(\mathbf{y}) = \int \cdots \int_R f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x},
\]
where R is the region $R = \{\mathbf{x} \text{ such that } g(\mathbf{x}) \le \mathbf{y}\}$. To get the new PDF we differentiate this. Here is an example where we get some practice with expectation in an n-variable case.
Example 5.3 (n-variable expectation) Let $X_1, \dots, X_n$ be identically distributed RVs, each with mean 0 and variance 1. Suppose the covariance of each pair is $\mathrm{Cov}[X_i, X_j] = \rho$ for $i \ne j$. Find the expected value and variance of the sum $Y = X_1 + \dots + X_n$.

Sol:
1) $E[Y] = E[X_1 + \dots + X_n] = E[X_1] + \dots + E[X_n] = 0$.

2) $\mathrm{Var}[Y] = E[Y^2] - E[Y]^2 = E[(X_1 + \dots + X_n)^2]$, so
\[
\mathrm{Var}[Y] = \sum_{i=1}^{n} E[X_i^2] + \sum_{i=1}^{n}\sum_{j \ne i} E[X_i X_j].
\]
But $E[X_i^2] = \mathrm{Var}[X_i] = 1$ and $E[X_i X_j] = E[(X_i - \mu_i)(X_j - \mu_j)] = \mathrm{Cov}[X_i, X_j] = \rho$, so
\[
\mathrm{Var}[Y] = \sum_{i=1}^{n} E[X_i^2] + \sum_{i=1}^{n}\sum_{j \ne i} E[X_i X_j] = n + n(n-1)\rho.
\]

Here is an exercise in navigating between component and vector notation.
Example 5.4 (Normal vector) Let $X_1, X_2$ be Gaussian random variables with means $\mu_1 = 50$, $\mu_2 = 62$ and covariance matrix
\[
C_{\mathbf{X}} = \begin{pmatrix} 16.0 & 12.8 \\ 12.8 & 16.0 \end{pmatrix}.
\]
A) Write the PDF in vector form.
B) Write the PDF in component form.

Sol:
A) From the covariance matrix we see that $(\det(C_{\mathbf{X}}))^{1/2} = 9.6$, so
\[
2\pi\,(\det(C_{\mathbf{X}}))^{1/2} = 60.3.
\]
This is the normalization factor for the Gaussian expressed in vector form. Furthermore
\[
C_{\mathbf{X}}^{-1} = \begin{pmatrix} 0.1736 & -0.1389 \\ -0.1389 & 0.1736 \end{pmatrix}
\]
and $(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}})' = (x_1 - 50,\ x_2 - 62)$, so
\[
f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{60.3}\,\exp\!\left[ -\frac{1}{2}(x_1 - 50,\ x_2 - 62)\,C_{\mathbf{X}}^{-1}\begin{pmatrix} x_1 - 50 \\ x_2 - 62 \end{pmatrix} \right].
\]
B) From the diagonal elements of $C_{\mathbf{X}}$ we see that $X_1$ and $X_2$ both have variance 16, so
\[
\rho_{X_1 X_2} = \frac{12.8}{16} = 0.8.
\]
Then $2\pi\,\sigma_{x_1}\sigma_{x_2}\sqrt{1-\rho^2} = 2\pi(16)(0.6) = 60.3$ and $2(1-\rho^2)\sigma_{x_1}\sigma_{x_2} = 2(0.36)(16) = 11.52$, and in component form:
\[
f_{\mathbf{X}}(\mathbf{x}) = \frac{\exp\left\{ -\dfrac{1}{2(1-\rho^2)}\left[ \left(\dfrac{x_1-\mu_1}{\sigma_{x_1}}\right)^2 - 2\rho\left(\dfrac{x_1-\mu_1}{\sigma_{x_1}}\right)\left(\dfrac{x_2-\mu_2}{\sigma_{x_2}}\right) + \left(\dfrac{x_2-\mu_2}{\sigma_{x_2}}\right)^2 \right] \right\}}{2\pi\,\sigma_{x_1}\sigma_{x_2}\sqrt{1-\rho^2}}
= \frac{\exp\left\{ -\dfrac{1}{11.52}\left[ (x_1 - 50)^2 - 1.6\,(x_1 - 50)(x_2 - 62) + (x_2 - 62)^2 \right] \right\}}{60.3}
\]
In the following we practice finding expectation values.
Example 5.5 (Two-Variable Expectation) The joint PDF of a bivariate RV (X, Y) is given by:
\[
f_{X,Y}(x,y) = \frac{1}{\sqrt{3}\,\pi}\,\exp\!\left[ -\frac{2}{3}\left( x^2 - xy + y^2 \right) \right].
\]
a) Find the means and variances of X and Y.
b) Find the correlation coefficient of X and Y.

Sol: A) This problem involves integration of a Gaussian. Let us first check to make sure the PDF is properly normalized. To do this we integrate the PDF over all x and y:
\[
\iint f_{X,Y}(x,y)\,dx\,dy = \iint \frac{1}{\sqrt{3}\,\pi}\,\exp[-q(x,y)]\,dx\,dy,
\]
where $q(x,y) = \frac{2}{3}(x^2 - xy + y^2)$. To perform the integration we complete the square and write
\[
q(x,y) = \frac{2}{3}\left[ \left(y - \frac{x}{2}\right)^2 + \frac{3x^2}{4} \right] = \frac{2}{3}\left(y - \frac{x}{2}\right)^2 + \frac{x^2}{2},
\]
so
\[
\exp[-q(x,y)] = \exp\!\left[-\frac{2}{3}\left(y - \frac{x}{2}\right)^2\right]\exp[-x^2/2].
\]
Our strategy for the integration is to integrate over y first, using the change of variables:
\[
\frac{2}{3}\left(y - \frac{x}{2}\right)^2 = \frac{z^2}{2}; \qquad z = \frac{2}{\sqrt{3}}\left(y - \frac{x}{2}\right), \quad dz = \frac{2}{\sqrt{3}}\,dy.
\]
We choose this change of variables in order to use the result for the standard normal:
\[
\int_{-\infty}^{\infty} \exp[-z^2/2]\,dz = \sqrt{2\pi}.
\]
Now
\[
\int_{-\infty}^{\infty} \exp[-q(x,y)]\,dy = \exp[-x^2/2]\int_{-\infty}^{\infty} \exp[-z^2/2]\,\frac{\sqrt{3}}{2}\,dz = \exp[-x^2/2]\,\frac{\sqrt{3}}{2}\sqrt{2\pi}.
\]
This gives us the marginal distribution of X as
\[
f_X(x) = \left(\frac{1}{\sqrt{3}\,\pi}\right)\exp[-x^2/2]\left(\frac{\sqrt{3}}{2}\sqrt{2\pi}\right) = \frac{1}{\sqrt{2\pi}}\,\exp[-x^2/2],
\]
so
\[
\frac{1}{\sqrt{3}\,\pi}\iint \exp[-q(x,y)]\,dy\,dx = \int \frac{1}{\sqrt{2\pi}}\,\exp[-x^2/2]\,dx = 1,
\]
and we see that the PDF is correctly normalized. To find the expected value of X we note that:
\[
E[X] = \frac{1}{\sqrt{3}\,\pi}\iint x\,\exp[-q(x,y)]\,dx\,dy = \int \frac{x}{\sqrt{2\pi}}\,\exp[-x^2/2]\,dx = 0,
\]
where the integral is 0 by symmetry. Similarly
\[
\mathrm{Var}[X] = E[(X - \mu_X)^2] = E[X^2] = \int \frac{x^2}{\sqrt{2\pi}}\,\exp[-x^2/2]\,dx = 1.
\]
The value of the last integral follows from the fact that the marginal PDF is the standard Gaussian with zero mean and unit variance. From the complete symmetry between x and y in the PDF it is clear that $E[Y] = E[X] = 0$ and $\mathrm{Var}[Y] = \mathrm{Var}[X] = 1$.
B) To find the correlation coefficient we need to find E[XY], i.e.:
\[
E[XY] = \frac{1}{\sqrt{3}\,\pi}\iint xy\,\exp[-q(x,y)]\,dx\,dy.
\]
Using the same change of variables as above, $y = \frac{\sqrt{3}}{2}z + \frac{x}{2}$, this is:
\[
\begin{aligned}
E[XY] &= \frac{1}{\sqrt{3}\,\pi}\int\!\left[ x\,e^{-x^2/2}\int y\,e^{-\frac{2}{3}(y - x/2)^2}\,dy \right]dx \\
&= \frac{1}{\sqrt{3}\,\pi}\int\!\left[ x\,e^{-x^2/2}\int\left( \frac{\sqrt{3}}{2}z + \frac{x}{2} \right)e^{-z^2/2}\,\frac{\sqrt{3}}{2}\,dz \right]dx \\
&= \frac{1}{\sqrt{3}\,\pi}\cdot\frac{\sqrt{3}}{2}\int\!\left[ x\,e^{-x^2/2}\,\frac{x}{2}\,\sqrt{2\pi} \right]dx \\
&= \frac{1}{2}\int \frac{x^2}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx = \frac{1}{2}.
\end{aligned}
\]
Thus $\rho = \dfrac{\mathrm{Cov}[X,Y]}{\sigma_X\,\sigma_Y} = \dfrac{1/2}{1 \cdot 1} = 1/2$.
In the special case of an invertible linear transformation between random vectors, we have the following theorem.

Theorem 5.3.4: If $\mathbf{X}$ is a continuous random vector and A an invertible matrix, then $\mathbf{Y} = A\mathbf{X} + \mathbf{b}$ has PDF:
\[
f_{\mathbf{Y}}(\mathbf{y}) = \frac{1}{|\det(A)|}\,f_{\mathbf{X}}\big(A^{-1}(\mathbf{y} - \mathbf{b})\big).
\]
Notice that this is just a vector version of our invertible-transformation equation for a single variable (3.14). Thus for linear transformations, we have a formula to relate PDFs. In the particular case of the PDF of a Gaussian random vector,
\[
f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,(\det(C_{\mathbf{X}}))^{1/2}}\,\exp\!\left[ -\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}})'\,C_{\mathbf{X}}^{-1}\,(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}}) \right],
\]
where the elements of the covariance matrix are
\[
[C_{\mathbf{X}}]_{i,j} = E[(X_i - \mu_i)(X_j - \mu_j)],
\]
the change of variables to $\mathbf{Y}$ gives a new Gaussian PDF with mean $\boldsymbol{\mu}_{\mathbf{Y}} = A\boldsymbol{\mu}_{\mathbf{X}} + \mathbf{b}$ and covariance matrix $C_{\mathbf{Y}} = A\,C_{\mathbf{X}}\,A'$.

Here is an example of a linear transformation of variables.
Example 5.6 (Linear Transformation) Find the joint PDF of the random variables $U = X_1$, $V = X_1 + X_2$ and $W = X_1 + X_2 + X_3$,
a) assuming a general PDF $f_{\mathbf{X}}(\mathbf{x})$;
b) assuming the $X_i$ are iid N(0, 1) RVs.

Sol: It is actually easier to find $\mathbf{X}$ as a function of U, V, W directly here; however, for practice we will find $A^{-1}$ first. Writing the transformation in matrix form $\mathbf{Y} = A\mathbf{X}$ we have
\[
\mathbf{Y} = \begin{pmatrix} U \\ V \\ W \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix}.
\]
The matrix A is
\[
A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}
\]
and $\det(A) = 1$. We can invert A using Gauss-Jordan elimination:
\[
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 \end{array}\right)
\to
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 1 & 0 \\ 0 & 1 & 1 & -1 & 0 & 1 \end{array}\right)
\to
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 1 & 0 \\ 0 & 0 & 1 & 0 & -1 & 1 \end{array}\right),
\]
where we have used only elementary row operations to find
\[
A^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix}.
\]
Thus
\[
\begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix}\begin{pmatrix} U \\ V \\ W \end{pmatrix} = \begin{pmatrix} U \\ V - U \\ W - V \end{pmatrix},
\]
so, since $\det(A) = 1$,
\[
f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(u,\ v - u,\ w - v).
\]

b) If
\[
f_{\mathbf{X}}(x_1, x_2, x_3) = \frac{1}{(2\pi)^{3/2}}\,\exp[-(x_1^2 + x_2^2 + x_3^2)/2],
\]
then
\[
f_{\mathbf{Y}}(\mathbf{y}) = \frac{1}{(2\pi)^{3/2}}\,\exp[-(u^2 + (v-u)^2 + (w-v)^2)/2].
\]
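The covariance relation $C_{\mathbf{Y}} = A\,C_{\mathbf{X}}\,A'$ from Theorem 5.3.4 can be checked by simulation. This sketch is my addition; the seed and sample size are arbitrary, and $C_{\mathbf{X}} = I$ matches part b) of the example.

```python
import numpy as np

# Check that Y = A X has covariance A C_X A' when X is a Gaussian random vector.
rng = np.random.default_rng(6)
A = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)

X = rng.standard_normal((3, 1_000_000))     # columns are samples of X, so C_X = I
Y = A @ X

print("sample covariance of Y:\n", np.round(np.cov(Y), 3))
print("A C_X A' = A A':\n", A @ A.T)
```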
5.4 Summary
In this chapter we have extended the idea of bivariate random variables to vectors of random variables. Thus instead of having a sample space that is a subset of the plane, we have sample spaces that are subsets of n-dimensional space. Vector notation generally simplifies expressions and in some cases shows how the bivariate formulas extend directly to the n-dimensional case. Some pertinent generalizations are:

The concept of marginal distributions was generalized to the multivariable case. Instead of just two single-variable marginal distributions there are $2^n - 2$ marginal distributions.

The PMF/PDF and CDF of n independent random variables are products of n univariate distributions.

The formulas for expected values of functions of a random vector are straightforward generalizations of the bivariate case.

The covariance matrix of a random vector contains the covariances of all pairs of random variables in the random vector.

A linear function of a Gaussian random vector is also a Gaussian random vector.
5.5 Sample Test Questions
T-1 Suppose $\mathbf{R} = (X, Y, Z)$ is a three-component random vector with joint PDF $f_{XYZ}(x,y,z)$. Consider the following statements:
i) There are 6 marginal distributions for $\mathbf{R}$.
ii) $f_{XZ}(x,z) = \int f_{XYZ}(x,y,z)\,dy$ is a marginal distribution for $\mathbf{R}$.
iii) If X, Y and Z are independent then $E[XY] = E[X]E[Y]$.
(a) Only i) is true.
(b) Only ii) is true.
(c) Only iii) is true.
(d) Only i) and iii) are true.
(e) i), ii) and iii) are all true.
Chapter 6
Sums of Random Variables
6.1 Expected Values of Sums

For the sum $W_n = X_1 + X_2 + \dots + X_n$ of n random variables:

a) $\displaystyle E[W_n] = \sum_{i=1}^{n} E[X_i]$

b) $\displaystyle \mathrm{Var}[W_n] = \sum_{i=1}^{n}\sum_{j=1}^{n} \mathrm{Cov}(X_i, X_j) = \sum_{i=1}^{n} \mathrm{Var}[X_i] + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \mathrm{Cov}(X_i, X_j)$

The full PDF of $W_n$ will depend on the full PDF $f_{\mathbf{X}}(\mathbf{x})$; however, there are many special cases of interest. Let us consider an example.
Example 6.1 Let the random variables $X_1, X_2, \dots, X_n$ be iid from the exponential distribution with parameter $\lambda$. Let $W_n = X_1 + X_2 + \dots + X_n$ be the sum of n of these RVs.
a) Find the PDF of $W_2$.
b) What are the mean and variance of $W_2$?
c) Find the PDF of $W_3$.
d) Use induction to find the PDF of $W_n$.

Sol:
a) The PDF of X is
\[
f_X(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0, \\ 0 & \text{otherwise.} \end{cases}
\]
To find the PDF of $W_2$ we first find the CDF and then differentiate:
\[
F_W(w) = P[W \le w] = P[X_1 + X_2 \le w] = P[X_1 \le w - X_2] = \iint_{x_1 \le w - x_2} f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2.
\]
Now $X_1$ and $X_2$ are non-negative so the lower limits of integration are both 0. We have
\[
F_W(w) = \int_0^{w}\!\!\int_0^{w - x_2} \lambda^2 e^{-\lambda(x_1 + x_2)}\,dx_1\,dx_2,
\]
and differentiating this with respect to w gives us the PDF:
\[
\frac{dF_W(w)}{dw} = f_W(w) = \int_0^w (\lambda e^{-\lambda x})(\lambda e^{-\lambda(w-x)})\,dx
= \begin{cases} \lambda^2 w\,e^{-\lambda w} & w \ge 0, \\ 0 & \text{otherwise.} \end{cases}
\]
Notice this is the Erlang PDF with parameters $\lambda$ and n = 2.
b) We could use the calculated PDF directly to find the mean and variance; however, it is easier to use the formulas for the mean and variance of sums of RVs. In this case
\[
E[W_2] = E[X_1] + E[X_2] = \frac{2}{\lambda}.
\]
Similarly, since $X_1$ and $X_2$ are independent, the variance of the sum is the sum of the variances, so
\[
\mathrm{Var}[W_2] = \frac{2}{\lambda^2}.
\]
c) To find the PDF of $W_3$ we first find the CDF, using the above PDF for $W_2$, and then differentiate:
\[
F_W(w) = P[W_3 \le w] = P[W_2 + X_3 \le w] = P[X_3 \le w - W_2] = \iint_{x_3 \le w - w_2} f_{X_3,W_2}(x_3, w_2)\,dx_3\,dw_2.
\]
Thus
\[
f_W(w) = \int_0^w f_{X_3}(x)\,f_{W_2}(w-x)\,dx
= \int_0^w (\lambda e^{-\lambda x})\big(\lambda^2 (w - x)\,e^{-\lambda(w-x)}\big)\,dx
= \lambda^3 e^{-\lambda w}\int_0^w (w - x)\,dx,
\]
so
\[
f_W(w) = \begin{cases} \dfrac{\lambda^3 w^2\,e^{-\lambda w}}{2!} & w \ge 0, \\ 0 & \text{otherwise.} \end{cases}
\]
This is the Erlang distribution of order 3.
d) If we assume that the PDF of $W_k$ is given by the Erlang distribution up to $k = n$, we have, for $W_{n+1}$:
\[
f_{W_{n+1}}(w) = \int_0^w (\lambda e^{-\lambda x})\left( \frac{\lambda^n}{(n-1)!}(w-x)^{n-1} e^{-\lambda(w-x)} \right)dx
= e^{-\lambda w}\,\frac{\lambda^{n+1}}{(n-1)!}\int_0^w (w-x)^{n-1}\,dx,
\]
so
\[
f_{W_{n+1}}(w) = \begin{cases} \dfrac{w^n \lambda^{n+1}}{n!}\,e^{-\lambda w} & w \ge 0, \\ 0 & \text{otherwise.} \end{cases}
\]
This is the Erlang distribution of order n + 1, and the result follows.
In the above example, notice that calculating the PDF of a sum of two independent random variables always involves a convolution integral. If $f_X(x)$ and $f_Y(y)$ are the PDFs of X and Y respectively, the PDF of X + Y is
\[
f_{X+Y}(w) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(w - x)\,dx.
\]
Recall from your experience with Laplace transforms that such integrals are well handled by transform methods. In probability theory, the analog of the Laplace transform is the moment generating function.
6.2 Moment Generating Functions
The MGF of the random variable X is defined as:
\[
\phi_X(s) = E[e^{sX}] = \int_{-\infty}^{\infty} e^{sx}\,f_X(x)\,dx.
\]
The reason for the name "moment generating function" is easily seen:
\[
\phi_X(0) = \int f_X(x)\,dx = 1,
\]
\[
\left.\frac{d\phi_X(s)}{ds}\right|_{s=0} = \int x\,f_X(x)\,dx = E[X],
\]
\[
\left.\frac{d^2\phi_X(s)}{ds^2}\right|_{s=0} = \int x^2\,f_X(x)\,dx = E[X^2],
\]
and so on.

As an exercise let us calculate the MGFs for the exponential and Erlang PDFs above.
Example 6.2
a) Calculate the MGF of the exponential distribution.
b) Calculate the MGF of the Erlang distribution of order 2.

Sol:
a) The PDF of the exponential RV X is
\[
f_X(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0, \\ 0 & \text{otherwise.} \end{cases}
\]
We have
\[
E[e^{sX}] = \phi_X(s) = \int_0^{\infty} \lambda e^{-\lambda x}\,e^{sx}\,dx = \lambda\int_0^{\infty} e^{-(\lambda - s)x}\,dx = \frac{\lambda}{\lambda - s} \qquad (\lambda > s).
\]

b) Similarly, for the Erlang with n = 2,
\[
f_X(x) = \begin{cases} \lambda^2 x\,e^{-\lambda x} & x \ge 0, \\ 0 & \text{otherwise,} \end{cases}
\]
and
\[
E[e^{sX}] = \phi_X(s) = \int_0^{\infty} \lambda^2 x\,e^{-\lambda x}\,e^{sx}\,dx = \int_0^{\infty} \lambda^2 x\,e^{-(\lambda - s)x}\,dx = \frac{\lambda^2}{(\lambda - s)^2} \qquad (\lambda > s).
\]
This last integral may be calculated using integration by parts, or by noticing that we obtain the second integral from the integral in part a) through differentiation with respect to s. Notice that the MGF of the sum of two independent exponential random variables is the product of the two MGFs.
Like the Laplace transform of a function, the MGF contains the same information as the original PDF, organized in a different way. There is a table of common MGFs in your text on page 249. The main utility of the MGF is through the following theorem.

Theorem 6.2.6: For a set of independent random variables $X_1, \dots, X_n$, the MGF of the sum $W = X_1 + \dots + X_n$ is the product:
\[
\phi_W(s) = \phi_{X_1}(s)\,\phi_{X_2}(s) \cdots \phi_{X_n}(s).
\]
If the $X_i$ are identically distributed then this becomes:
\[
\phi_W(s) = \phi_X^n(s).
\]
In practice it is often easier to use the MGF for sums of random variables than to use the PDF. It also helps to identify families of random variables. To see this let us modify Example 6.1 to use the MGF.
Example 6.3 Let the random variables $X_1, X_2, \dots, X_n$ be iid from the exponential distribution with parameter $\lambda$. Let $W_n = X_1 + X_2 + \dots + X_n$ be the sum of n of these RVs.
a) Find the MGF of $W_2$.
b) Use the MGF to identify the PDF from the table on pg. 249 of your text.
c) Calculate the mean and variance of $W_2$ using the MGF.
d) Find the MGF of $W_n$.
e) What are the mean and variance of $W_n$?

Sol:
a) The MGF of X is:
\[
\phi(s) = E[e^{sX}] = \int_0^{\infty} e^{sx}\,\lambda e^{-\lambda x}\,dx = \frac{\lambda}{\lambda - s}.
\]
From the above theorem, the MGF of $W_2 = X_1 + X_2$ is just:
\[
\phi_2(s) = \left( \frac{\lambda}{\lambda - s} \right)^2.
\]

b) From the table we see that the PDF corresponding to $\phi_2(s)$ is just the Erlang PDF for n = 2.

c) The mean of $W_2$, using the MGF, is
\[
\mu_2 = \left.\frac{d\phi_2(s)}{ds}\right|_{s=0} = \left.\frac{2\lambda^2}{(\lambda - s)^3}\right|_{s=0} = \frac{2}{\lambda}.
\]
The variance can be calculated by first finding the second moment:
\[
E[W_2^2] = \left.\frac{d^2\phi_2(s)}{ds^2}\right|_{s=0} = \left.\frac{6\lambda^2}{(\lambda - s)^4}\right|_{s=0} = \frac{6}{\lambda^2},
\]
so
\[
\mathrm{Var}(W_2) = E[W_2^2] - E[W_2]^2 = \frac{6}{\lambda^2} - \left(\frac{2}{\lambda}\right)^2 = \frac{2}{\lambda^2}.
\]

d) From the convolution theorem, the MGF of $W_n$ is just
\[
\phi_n(s) = \phi_1^n(s) = \left( \frac{\lambda}{\lambda - s} \right)^n.
\]

e) We could calculate the mean and variance of $W_n$ as we did for $W_2$, but since $W_n$ is the sum of n iid RVs we can also use the fact that the expected value of the sum is the sum of the expected values, i.e. $\mu_n = n\mu = n/\lambda$, and that the variance of the sum is the sum of the variances (because of independence), so $\mathrm{Var}(W_n) = n/\lambda^2$.
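Part c) can also be reproduced symbolically. The sketch below (my addition, not the notes') differentiates the MGF with sympy; the symbol `lam` stands for λ.

```python
import sympy as sp

# Recover the mean and variance of W_2 from its MGF phi_2(s) = (lam/(lam - s))^2
# by differentiating at s = 0.
s, lam = sp.symbols('s lam', positive=True)
phi2 = (lam / (lam - s))**2

m1 = sp.diff(phi2, s).subs(s, 0)          # first moment
m2 = sp.diff(phi2, s, 2).subs(s, 0)       # second moment

print("E[W2]   =", sp.simplify(m1))          # 2/lam
print("Var[W2] =", sp.simplify(m2 - m1**2))  # 2/lam**2
```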
6.3 The Central Limit Theorem
The Central Limit Theorem, as its name suggests, is a result that lies at the heart of many applications of probability theory. The content of the theorem is simply that the PDF of a sum of n iid RVs X, i.e. $W_n = X_1 + X_2 + \dots + X_n$, approaches the Gaussian distribution with mean $n\mu_X$ and variance $n\,\mathrm{Var}(X)$ as n gets large, regardless of the details of $f_X(x)$.

This is a remarkable result. It means that the Gaussian distribution may often be used to approximate the distribution of the average of a sample of n iid random variables. For example, suppose we define:
\[
\bar{X} = W_n/n = (X_1 + X_2 + \dots + X_n)/n.
\]
Then $E[\bar{X}] = n\mu_X/n = \mu_X$ and $\mathrm{Var}[\bar{X}] = \frac{1}{n^2}\mathrm{Var}[W_n] = \mathrm{Var}[X]/n$. $\bar{X}$, besides being the usual estimate of the population mean $\mu_X$, is also a random variable with its own PDF. Calculating the PDF of $\bar{X}$ could be a complicated procedure if the PDF of X is complicated. (We could of course find the MGF of $\bar{X}$ easily once we have the MGF of X; the difficulty would occur when we attempted to invert the MGF.) However, if n is large, the central limit theorem tells us that the random variable
\[
Z = \frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}}
\]
is approximately standard normal, with zero mean and unit variance! In practice this can save a lot of work. Here is an example.
Example 6.4 Consider an experiment of tossing a fair coin 1000 times. Find the probability of obtaining more than 520 heads.

Sol: Let X be a Bernoulli random variable, X = 1 for heads and X = 0 for tails. Then $p = P[X = 1] = 1/2$ and $W_{1000} = X_1 + \dots + X_{1000}$ is a binomial random variable with mean $\mu = np = 1000 \cdot \tfrac{1}{2} = 500$ and variance $\mathrm{Var}[W_{1000}] = np(1-p) = 250$. Now
\[
P[W_n > 520] = \sum_{k=521}^{1000} \binom{1000}{k}\left(\frac{1}{2}\right)^{1000},
\]
and although we could in principle calculate this sum, in practice it is easier to approximate it using the CLT. Notice that
\[
P[W_n > 520] = P\!\left[ \frac{W_n - 500}{\sqrt{250}} > \frac{520 - 500}{\sqrt{250}} \right] = P\!\left[ \frac{W_n - 500}{\sqrt{250}} > \frac{20}{\sqrt{250}} \right].
\]
Now by the CLT the random variable $Z = \frac{W_n - 500}{\sqrt{250}}$ is approximately standard normal, so that:
\[
P[W_n > 520] \approx 1 - \Phi(1.265) \approx 1 - 0.897 = 0.103.
\]
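A short sketch comparing the exact tail with the CLT approximation (my addition, not the notes'); the half-unit continuity correction in the last line anticipates the De Moivre-Laplace formula discussed next.

```python
from math import comb, erf, sqrt

# Compare the exact binomial tail P[W > 520] for 1000 fair-coin flips with the CLT
# approximation 1 - Phi(20/sqrt(250)).
n, p = 1000, 0.5
exact = sum(comb(n, k) for k in range(521, n + 1)) / 2**n

Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))      # standard normal CDF
clt = 1 - Phi((520 - n * p) / sqrt(n * p * (1 - p)))
clt_cc = 1 - Phi((520.5 - n * p) / sqrt(n * p * (1 - p)))  # with continuity correction

print("exact     :", round(exact, 4))
print("CLT       :", round(clt, 4))
print("CLT + 0.5 :", round(clt_cc, 4))
```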
Here is the text's statement of the Central Limit Theorem.

Theorem 6.3.7: Given $X_1, X_2, \dots, X_n$, a sequence of iid random variables with expected value $\mu_X$ and variance $\sigma_X^2$, the CDF of
\[
Z_n = \frac{X_1 + X_2 + \dots + X_n - n\mu_X}{\sqrt{n\sigma_X^2}}
\]
has the property
\[
\lim_{n \to \infty} F_{Z_n}(z) = \Phi(z).
\]
The CLT as stated above addresses the CDF of the random variable $Z_n$. If X is discrete then so is $Z_n$, so the CDF of $Z_n$ is a step function in which the size of the steps goes down with n. Fig. 6.1 is an example of this. The actual CDF in the figure is for the binomial distribution. Notice that the normal approximation to the binomial CDF is a continuous curve that tends to pass through the actual binomial CDF halfway between the integer values. Noticing this fact leads to a slightly more accurate approximation for the binomial distribution called the De Moivre-Laplace formula. It is an example of a continuity correction for a discrete random variable. For the binomial (n, p) random variable K we have
\[
P[k_1 \le K \le k_2] \approx \Phi\!\left( \frac{k_2 + 0.5 - np}{\sqrt{np(1-p)}} \right) - \Phi\!\left( \frac{k_1 - 0.5 - np}{\sqrt{np(1-p)}} \right)
\qquad \text{(De Moivre-Laplace formula)}
\]
This formula is frequently used as an approximation for binomial probabilities, and even works in the event that $k_1 = k_2 = K$. Here is an example of its use.
Example 6.5 Let K be a binomial random variable with n = 20 and p = 0.4. What is P[K = 8]?

Sol: The exact formula gives
\[
P[K = 8] = \binom{20}{8}(0.4)^8 (0.6)^{12} \approx 0.1797.
\]
The De Moivre-Laplace formula gives:
\[
P[K = 8] \approx \Phi\!\left( \frac{0.5}{\sqrt{4.8}} \right) - \Phi\!\left( \frac{-0.5}{\sqrt{4.8}} \right) \approx 0.1803,
\]
with an error of less than 1%.
Figure 6.1: The CDFs of the binomial distribution for n = 4, 8, 16 and 32 respectively, with the CLT approximation superimposed. Notice that as n gets larger, the step size gets smaller and the CLT approximation gets better.
6.4 Chapter Summary
Many problems in probability and statistics involve the sum of independent random variables. This chapter noted that:

• The expected value of the sum of any random variables is the sum of the expected values.

• The variance of the sum of independent random variables is the sum of the variances. If the variables are not independent, the variance of the sum is the sum of all the covariances Cov[X_i, X_j] (including the variance terms Cov[X_i, X_i] = Var[X_i]).

• The PDF of the sum of independent random variables is the convolution of the individual PDFs.

• The moment generating function (MGF) provides a transform domain method for calculating moments through differentiation.
• The MGF of the sum of independent random variables is the product of the individual MGFs.

• Certain sums of independent random variables W = X_1 + X_2 + · · · + X_n occur in families.

  If X_i is Bernoulli (p), then W is Binomial (n, p).
  If X_i is Poisson (α), then W is Poisson (nα).
  If X_i is Geometric (p), then W is Pascal (n, p).
  If X_i is Exponential (λ), then W is Erlang (n, λ).
  If X_i is Gaussian (μ, σ), then W is Gaussian (nμ, √n σ).

• The Central Limit Theorem (CLT) states that the CDF of the sum of n independent random variables converges to a Gaussian CDF as n gets very large.

• As a result of the CLT, the variable Z = (W_n − nμ_X) / (√n σ_X) is (approximately) a standard normal random variable for large n.

• The De Moivre-Laplace formula gives a good CLT approximation for the binomial random variable for large n.
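To make the last two bullets concrete, here is a small Monte Carlo sketch (our illustration, with arbitrary parameter choices, not part of the original notes). It standardizes sums of iid Exponential(λ) random variables and compares the empirical CDF of Z at a few points with Φ.

```python
import random
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

random.seed(1)
lam, n, trials = 2.0, 50, 20_000
mu, sigma = 1 / lam, 1 / lam                 # mean and std dev of Exponential(lam)

# Standardize W = X_1 + ... + X_n as Z = (W - n*mu)/(sqrt(n)*sigma)
zs = []
for _ in range(trials):
    w = sum(random.expovariate(lam) for _ in range(n))
    zs.append((w - n * mu) / (sqrt(n) * sigma))

for z0 in (-1.0, 0.0, 1.0):
    empirical = sum(z <= z0 for z in zs) / trials
    print(f"P[Z <= {z0:+.0f}]   empirical {empirical:.3f}   Phi {phi(z0):.3f}")
```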
6.5 Sample Test Questions
T-1 K is a Binomial (n = 20, p = 0.4) random variable. The De Moivre-Laplace approximation to P[K = 8] is closest to: A) 0.180 B) 0.300 C) 0.170 D) 0.413 E) 0.018
T-2 The moment generating function (MGF) of a random variable X is φ_X(s) = (0.7 + 0.3e^s)^9. The expected value of X is closest to: A) 2.3 B) 7.2 C) 9.0 D) 2.7 E) 1.0
T-3 Suppose M_n is the average of n independent identically distributed random variables, M_n = (1/n) Σ_{i=1}^{n} X_i, where the X_i have mean μ and variance σ², and n is large. Consider the following statements.

i) M_n is approximately a Gaussian random variable.
ii) M_n has variance nσ².
iii) The expected value of M_n is μ.

Which of the above statements are true?
T-4 Let K_1, K_2, . . . , K_n denote a sequence of iid Bernoulli (p) random variables. Let M = K_1 + K_2 + · · · + K_n.

(a) Find the MGF φ_K(s).
(b) Find the MGF φ_M(s).
(c) Use the MGF φ_M(s) to find the expected value E[M] and the variance Var[M].
Chapter 7
Stochastic Processes
7.1 Introduction
In this chapter we add a new feature to the idea of random variables. In previous chapters
we have assumed that outcomes of random experiments may be mapped onto random
variables, or perhaps vectors of random variables, but without any explicit consideration
that outcomes might map onto functions of time. In this chapter we consider this extension.
Figure 7.1: Four kinds of stochastic processes: continuous-value, continuous-time W(t); continuous-value, discrete-time X(n); discrete-value, continuous-time Y(t); and discrete-value, discrete-time Z(n).
The distinguishing feature of a Stochastic Process is that there is an ordering of random variables in a natural sequence. We generally think of the sequencing variable as a parameter associated with time. Typically we would denote a stochastic process by X(t) with t as an independent variable. At any fixed t = t_0, X(t_0) is just a random variable, but the random variable, and possibly the PDF of the random variable, changes with t. For example, the temperature or pressure in a room as a function of time is a stochastic process. At each instant of time the temperature/pressure is slightly different depending on the local environment. In this case the time variable might be the time of day. However we could also think of temperature/pressure as a function of altitude in the atmosphere. In this case the natural t-variable would be altitude.
There are four dierent kinds of stochastic processes depending on whether the process
is, or is not, continuous in value or time. For example a stochastic process is deemed a
discrete-value stochastic process only if X(t) is a discrete-valued random variable for all
values of t, otherwise the process is considered continuous-valued. Fig. 7.1 illustrates the
four types. A sample path or sample function is an actual realization of a stochastic
process. In our temperature as a function of time example, an actual recorded trace of the
temperature over, say, a whole day would be a sample path. The set of all possible such
paths is called an ensemble. The ensemble of paths for a stochastic process is the sample
space for the process.
If the time-variable of a stochastic process is discrete, we can replace the time variable by an index and define a Random Sequence {X_n} = {X_0, X_1, X_2, . . .} where X_0 = X(t_0), X_1 = X(t_1), . . .
For example, your weight as a function of time is a stochastic process. If you weigh
yourself each morning from the beginning of term to the end of term, you can order the
measurements in a random sequence. In principle, your weight as a function of time is
continuous-time, continuous-value (it changes infinitesimally with every breath you take).
However, sampling your weight each morning will give you a discrete-time continuous-
value process, or if your morning weigh-in is on a digital scale, your observations will form
a random sequence of discrete values.
7.2 Independent, Identically Distributed Random Sequences
Figure 7.2: Two sample paths of length 50 for the Bernoulli
Process. The two random sequences are plotted with lines
joining successive elements.
Possibly the simplest stochastic process is the iid random sequence. In such sequences, all elements in the sequence are random variables from the same distribution. There is no correlation between any elements of the sequence. Two sample paths from an iid random sequence are plotted in Fig. 7.2. In the figure, each X_n in the sample path is a Bernoulli random variable that is 1 with probability 1/2 and 0 with probability 1/2.
For an iid random sequence, if the PDF of the first element is f_X(x), then the joint PDF of the entire sequence is

f_{X_1,...,X_n}(x_1, x_2, . . . , x_n) = f_X(x_1) f_X(x_2) · · · f_X(x_n)
Notice that if we define the expected value of the n-th term of the sequence as μ_n = E[X_n], we see that μ_n = E[X_n] = E[X] is independent of n. Furthermore, if we calculate the covariance between two elements of the sequence¹:

Cov[X_m, X_n] = E[(X_m − μ_m)(X_n − μ_n)]

we find this is zero whenever m ≠ n, since the RVs X_m and X_n are independent. When the autocovariance of a process is 0, it means that deviations of sample paths from their expected values are uncorrelated from one step to another. There is no time-varying signal guiding the stochastic process. As a result, knowing a sample path up to step n gives us no extra information that will allow us to better predict the result at step n + 1.
¹ In Stochastic Processes this is usually called the autocovariance because it is the covariance of a signal with itself at another time.

Figure 7.3: The Bernoulli process again, with p = 1/2. The first graph is an average over four sample paths in the ensemble of all paths. The second graph is an average over 20 sample paths. Note the individual elements are clustering around 1/2. If we averaged over the entire ensemble of paths, the graph would be a straight line of values at y = 1/2.

The Bernoulli process, although pretty simple as a stochastic process, is also quite ubiquitous in signals that involve
analog to digital conversion. Typically, if you are measuring something to one decimal place on an analog instrument and your reading is, say, somewhere between 2.1 and 2.2, you round down to 2.1 if the reading looks less than 2.15 and round up to 2.2 if the reading looks greater than 2.15. That is, you add either 0.0 or 0.1 to the reading of 2.1 depending on whether the actual reading is past the half-way mark or not. If the reading has a stochastic component, your conversion to a single decimal place may be modeled as a Bernoulli process.
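The following simulation sketch (ours, with an arbitrary seed and path sizes, not from the original notes) generates Bernoulli(1/2) sample paths like those in Figs. 7.2 and 7.3, averages them across the ensemble, and estimates the covariance between two different steps, which should be near zero for an iid sequence.

```python
import random

random.seed(0)
n_steps, n_paths = 50, 2000

# Each path is an iid Bernoulli(1/2) sequence X_0, ..., X_{49}
paths = [[random.randint(0, 1) for _ in range(n_steps)] for _ in range(n_paths)]

# Ensemble average at each step: should hover around 1/2
avg = [sum(path[k] for path in paths) / n_paths for k in range(n_steps)]
print("ensemble averages at steps 0, 10, 20:", avg[0], avg[10], avg[20])

# Sample covariance between steps m = 3 and n = 7: should be near 0
m, n = 3, 7
mean_m = sum(p[m] for p in paths) / n_paths
mean_n = sum(p[n] for p in paths) / n_paths
cov = sum((p[m] - mean_m) * (p[n] - mean_n) for p in paths) / n_paths
print("Cov[X_3, X_7] approximately", round(cov, 4))
```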
The iid random sequence is a
case where the stochastic process
is in a sense all noise and no signal.
Here is an example of a case where
there is a genuine, time varying
signal, but with noise determined
by initial conditions.
Example 7.1 Random Phase
Consider a stochastic process in which a sample path is given by X(t) = A cos(ωt + Θ), where A and ω are fixed parameters and Θ is a random variable that is uniform on [0, 2π).
a) Sketch a few sample paths.
b) Find E[X(t)].
c) Find Var[X].
d) Find E[X(t)X(t + τ)].
Sol:

a) A few sample paths are sketched in the figure.

b) Since f_Θ(θ) = 1/(2π) for 0 ≤ θ < 2π, and 0 otherwise, we have

E[X(t)] = ∫_0^{2π} (1/(2π)) A cos(ωt + θ) dθ = 0.

c) Similarly

Var[X] = E[(X(t))²] = ∫_0^{2π} (1/(2π)) A² cos²(ωt + θ) dθ = (A²/(2π)) ∫_0^{2π} (1/2)(1 + cos 2(ωt + θ)) dθ = A²/2

d) Similarly

E[X(t)X(t + τ)] = ∫_0^{2π} (1/(2π)) A² cos(ωt + θ) cos(ω(t + τ) + θ) dθ
               = (A²/(4π)) ∫_0^{2π} [cos(ω(2t + τ) + 2θ) + cos(ωτ)] dθ
               = (A²/2) cos(ωτ)

Notice that the calculated covariance extracts the fact that there is a recognizable signal in the stochastic process, even though the mean E[X(t)] fails to demonstrate this.
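A short Monte Carlo sketch (ours, not the text's) can confirm these two calculations numerically; the values of A, ω, t and τ below are arbitrary choices.

```python
import random
from math import cos, pi

random.seed(3)
A, omega = 2.0, 2 * pi          # arbitrary amplitude and angular frequency
t, tau = 0.7, 0.3               # arbitrary times
trials = 200_000

mean_x, corr = 0.0, 0.0
for _ in range(trials):
    theta = random.uniform(0, 2 * pi)        # the random phase
    x_t = A * cos(omega * t + theta)
    x_t_tau = A * cos(omega * (t + tau) + theta)
    mean_x += x_t
    corr += x_t * x_t_tau

print("E[X(t)] approx", round(mean_x / trials, 3), " (theory 0)")
print("E[X(t)X(t+tau)] approx", round(corr / trials, 3),
      " (theory", round(A ** 2 / 2 * cos(omega * tau), 3), ")")
```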
Figure 7.4: Three different sample paths for a sinusoid with a random phase.
Another type of stochastic process is a counting process. A counting process is one that counts the number of (discrete) events occurring over time. For example, the number of phone calls or emails you get throughout the day is a counting process. The number of calls/emails at any given time is integer valued and it cannot decrease with time.

More formally, a stochastic process N(t) is a counting process if, for every sample function, N(t) = 0 for t < 0 and N(t) is integer valued and non-decreasing with time. Counting processes are extremely common, and the objects being counted are many and varied. The generic term for the counted objects is arrivals.
A primary example of a counting process is the Poisson process.
7.3 The Poisson Process
N(t) is a Poisson Process of rate λ if:

a) The number of arrivals in any interval (t_0, t_1], N(t_1) − N(t_0), is a Poisson random variable with expected value λ(t_1 − t_0).

b) For any pair of non-overlapping intervals (t_0, t_1] and (t_2, t_3], the numbers of arrivals in each interval, N(t_1) − N(t_0) and N(t_3) − N(t_2) respectively, are independent random variables.

We call λ the arrival rate for the process because the expected number of arrivals per unit time is E[N(t)]/t = λ.

By definition of the Poisson RV we can write the PMF of M = N(t_1) − N(t_0) as:

P_M[m] = [λ(t_1 − t_0)]^m / m! · exp[−λ(t_1 − t_0)] for m = 0, 1, . . . , and 0 otherwise.
The second property of the Poisson process (b) above) allows us to immediately write down the joint PMF of N(t) at any ordered set of times t_1 < t_2 < . . . < t_n.

Theorem: 7.3.8 For a Poisson process N(t) of rate λ, the joint PMF of N = [N(t_1), N(t_2), . . . , N(t_k)], for the sequence of times t_1 < t_2 < . . . < t_k, is:

P_N(n) = ( α_1^{n_1} e^{−α_1} / n_1! ) ( α_2^{n_2−n_1} e^{−α_2} / (n_2 − n_1)! ) · · · ( α_k^{n_k−n_{k−1}} e^{−α_k} / (n_k − n_{k−1})! ) for 0 ≤ n_1 ≤ n_2 ≤ . . . ≤ n_k, and 0 otherwise,    (7.1)

where α_1 = λt_1, α_2 = λ(t_2 − t_1), . . . , α_k = λ(t_k − t_{k−1}).
The above theorem expresses the fact that successive non-overlapping time intervals
have independent arrival events. The resulting PMF is then a product of the marginal
distributions. Let us show this for the n = 2 case as an exercise.
Example 7.2 For a Poisson process N(t) of rate λ, the joint PMF of N = [N(t_1), N(t_2)] is

P_N(n_1, n_2) = ( α_1^{n_1} e^{−α_1} / n_1! ) ( α_2^{n_2−n_1} e^{−α_2} / (n_2 − n_1)! ) for 0 ≤ n_1 ≤ n_2, and 0 otherwise.    (7.2)

Show that the PMF is the product of the marginals P_{N(t_1)}(n_1) P_{N(t_2)−N(t_1)}(n_2 − n_1).
Sol: To find the marginal P_{N(t_1)}(n_1) we have to sum P_N(n_1, n_2) over all possible values of n_2. The minimum value of n_2 is n_1; this occurs when there are no events between t_1 and t_2. The difference between n_2 and n_1 can then be any non-negative integer, so the required sum is:

P_{N(t_1)}(n_1) = Σ_{n_2=n_1}^{∞} P_N(n_1, n_2) = Σ_{k=0}^{∞} ( α_1^{n_1} e^{−α_1} / n_1! ) ( α_2^{k} e^{−α_2} / k! ) = ( α_1^{n_1} e^{−α_1} / n_1! ) e^{−α_2} Σ_{k=0}^{∞} α_2^{k} / k!

However the last sum is just the Maclaurin expansion of e^{α_2} and the result is

P_{N(t_1)}(n_1) = α_1^{n_1} e^{−α_1} / n_1!

Similarly, to find the marginal P_{N(t_2)−N(t_1)}(n_2 − n_1) we have to sum P_N(n_1, n_2) over all possible values of n_1, keeping the value of n_2 − n_1 fixed. Thus, writing n_2 − n_1 = n so that n_2 = n_1 + n, we have

P_{N(t_2)−N(t_1)}(n) = Σ_{n_1=0}^{∞} P_N(n_1, n_1 + n) = ( α_2^{n} e^{−α_2} / n! ) e^{−α_1} Σ_{n_1=0}^{∞} α_1^{n_1} / n_1! = α_2^{n} e^{−α_2} / n!

The product is then

P_{N(t_1)}(n_1) P_{N(t_2)−N(t_1)}(n) = ( α_1^{n_1} e^{−α_1} / n_1! ) ( α_2^{n} e^{−α_2} / n! ) = ( α_1^{n_1} e^{−α_1} / n_1! ) ( α_2^{n_2−n_1} e^{−α_2} / (n_2 − n_1)! )

This is just the PMF in Eqn. 7.2.
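As a numerical sanity check (ours, with arbitrary hypothetical values of α_1, α_2 and n_1, not from the original notes), truncating the sum over n_2 at a large cutoff should reproduce the Poisson(α_1) PMF at n_1:

```python
from math import exp, factorial

alpha1, alpha2, n1 = 1.3, 0.8, 2       # arbitrary test values

def joint_pmf(n1, n2):
    """Eqn. 7.2 for 0 <= n1 <= n2."""
    return (alpha1 ** n1 * exp(-alpha1) / factorial(n1)
            * alpha2 ** (n2 - n1) * exp(-alpha2) / factorial(n2 - n1))

# Marginal of N(t_1): sum the joint PMF over n2 >= n1 (truncated at a large cutoff)
marginal = sum(joint_pmf(n1, n2) for n2 in range(n1, n1 + 60))

# Compare with the Poisson(alpha_1) PMF at n1
poisson = alpha1 ** n1 * exp(-alpha1) / factorial(n1)
print("summed marginal :", round(marginal, 6))
print("Poisson(alpha_1):", round(poisson, 6))
```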
Figure 7.5: The relationship between the Poisson Process, that counts arrivals, and the interarrival times that have an exponential distribution. (The figure shows a sample path of N(t) with interarrival times X_1, . . . , X_5 and arrival times S_1, . . . , S_5.)

The PMF Eqn. 7.1 gives us the probability that there were n_1 arrivals in the time interval (0, t_1], n_2 − n_1 arrivals in the interval (t_1, t_2], and so on, up to n_k − n_{k−1} arrivals in the interval (t_{k−1}, t_k]. Now the Poisson process is a counting process: we are counting the number of events in a fixed time interval. However we can also look at sample paths in another way. The sequence of events breaks any time interval into a sequence of interarrival times. The interarrival times themselves are random variables and have an exponential distribution. That is
Theorem: 7.3.9 For a Poisson process of rate λ, the interarrival times X_1, X_2, . . . are an iid random sequence with the exponential PDF:

f_X(x) = λ e^{−λx} for x > 0, and 0 otherwise.
To see this, suppose that the interarrival times are X_1 = x_1, . . . , X_{n−1} = x_{n−1}, so that arrival n − 1 occurs at time t_{n−1} = x_1 + · · · + x_{n−1}. We then notice that, for x > 0, X_n > x if and only if there are no arrivals in the interval (t_{n−1}, t_{n−1} + x]. Now arrivals in (t_{n−1}, t_{n−1} + x] are independent of the past history X_1, . . . , X_{n−1} so

P[X_n > x | X_1 = x_1, . . . , X_{n−1} = x_{n−1}] = P[N(t_{n−1} + x) − N(t_{n−1}) = 0] = e^{−λx}

Thus P[X_n ≤ x] = F_X(x) = 1 − e^{−λx}, so the PDF is f_X(x) = λ e^{−λx} for x > 0.
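Here is a simulation sketch (ours, not the text's, with arbitrary parameter values) that builds a counting process by accumulating iid Exponential(λ) interarrival times and checks that the count of arrivals in (0, t] behaves like a Poisson(λt) random variable.

```python
import random
from math import exp

random.seed(7)
lam, t, trials = 0.25, 10.0, 50_000    # rate 1/4 per second, 10-second window

def arrivals_by(t, lam):
    """Count arrivals in (0, t] by summing exponential interarrival times."""
    count, clock = 0, 0.0
    while True:
        clock += random.expovariate(lam)
        if clock > t:
            return count
        count += 1

counts = [arrivals_by(t, lam) for _ in range(trials)]
mean_count = sum(counts) / trials
p3_empirical = sum(c == 3 for c in counts) / trials
p3_poisson = (lam * t) ** 3 / 6 * exp(-lam * t)   # Poisson PMF at m = 3

print("E[N(t)] approx", round(mean_count, 3), " (theory", lam * t, ")")
print("P[N(t)=3] approx", round(p3_empirical, 4), " (Poisson", round(p3_poisson, 4), ")")
```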
Notice that a counting process built from a collection of independent interarrival times X_1, X_2, . . . drawn from the exponential (λ) distribution defines a Poisson process. Here is an exercise working with Poisson processes and interarrival times.
Example 7.3 Inquiries arrive at a recorded message device according to a Poisson process of rate 15 inquiries per minute.

a) Find the probability that in a 1-minute interval, 3 inquiries arrive during the first 10 seconds and 2 inquiries arrive during the last 15 seconds.

b) Find the mean and variance of the time until the arrival of the 10th inquiry.

Sol: The arrival rate is 15 inquiries per minute, or λ = 15/60 = 1/4 arrivals per second.

a) Arrival probabilities over disjoint time intervals are independent and depend only on the lengths of the time intervals in question, so the probability that in a 1-minute interval 3 inquiries arrive during the first 10 seconds and 2 inquiries arrive during the last 15 seconds is just the probability of 3 events in 10 seconds times the probability of 2 events in 15 seconds. Thus, if N(t) is the Poisson RV then

P[both events] = P[N(10) = 3] P[N(15) = 2] = ( (10/4)^3 e^{−10/4} / 3! ) ( (15/4)^2 e^{−15/4} / 2! )

b) The interarrival times X are iid random variables from the exponential distribution with rate λ = 1/4, so E[X] = 1/λ = 4. The time of the tenth arrival is T = Σ_{i=1}^{10} X_i, so

E[T] = E[ Σ_{i=1}^{10} X_i ] = 10 E[X] = 40 (sec.)

The variance is

Var[T] = 10 Var[X] = 10/λ² = 160 (sec².)
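For completeness, a few lines of Python (ours, not part of the notes) evaluate the probability in part a) numerically and restate the mean and variance found in part b).

```python
from math import exp, factorial

lam = 15 / 60                          # 1/4 arrivals per second

def poisson_pmf(m, mean):
    """P[N = m] for a Poisson random variable with the given expected value."""
    return mean ** m * exp(-mean) / factorial(m)

# Part a): independent counts over the 10-second and 15-second intervals
p_both = poisson_pmf(3, lam * 10) * poisson_pmf(2, lam * 15)
print("P[both events] =", round(p_both, 4))

# Part b): T is the sum of 10 iid Exponential(lam) interarrival times
print("E[T] =", 10 / lam, "sec   Var[T] =", 10 / lam ** 2, "sec^2")
```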
7.4 Brownian Motion Process
A Brownian Motion Process W(t) is such that W(0) = 0 and, for τ > 0, W(t + τ) − W(t) is a Gaussian random variable (with mean 0 and variance ατ, where α is a fixed constant) that is independent of W(t′) for all t′ ≤ t. In particular, W(t) itself is Gaussian with mean 0 and variance αt, so its PDF is

f_{W(t)}(w) = (1/√(2παt)) exp[−w²/(2αt)]    (7.3)
To interpret this in terms of Brownian motion, let W(t) represent the position of a particle on a line. For a small time increment δ we can write:

W(t + δ) = W(t) + [W(t + δ) − W(t)]

This equation, taken literally, says:

The position of the particle at time t + δ is the position at time t plus the small change [W(t + δ) − W(t)].

But from the definition of the Brownian Process, the small change [W(t + δ) − W(t)] is a Gaussian random variable that is independent of W(t)! Thus the small steps are random and Gaussian.
The independent steps lead to a joint PDF that is a product of marginals. For the Brownian process W(t), the joint PDF of W = [W(t_1), W(t_2), . . . , W(t_n)] is:

f_W(w) = Π_{k=1}^{n} ( 1/√(2πα(t_k − t_{k−1})) ) exp[ −(w_k − w_{k−1})² / (2α(t_k − t_{k−1})) ]    (7.4)
The following problem connects the Brownian Motion Process with a discrete time
Process.
Example 7.4 For a Brownian Motion Process X(t) with variance αt, let X_0 = X(0), X_1 = X(1), X_2 = X(2), . . . be a sequence of samples from the process. The discrete-time, continuous-value process Y_1, Y_2, . . . defined by Y_1 = X_1 − X_0, Y_2 = X_2 − X_1, . . . is called an increments process. Show that this sequence Y_1, Y_2, . . . is an iid random sequence.
Sol:
From the definition of the Brownian process, the increments Y_k = X_k − X_{k−1} are all Gaussian with zero mean and variance α. Thus the Y_k are identically distributed. But the definition of the Brownian process requires that Y_k be independent of X(t) for all t ≤ k − 1, so Y_k must be independent of Y_m for all m ≤ k − 1. Thus the Y_k are all independent and consequently iid.
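The following sketch (ours, with α and the grid size chosen arbitrarily, not from the original notes) builds Brownian sample paths by accumulating small Gaussian steps of variance α·dt, extracts the unit-time increments Y_k, and checks that they have mean near 0, variance near α, and negligible correlation from one step to the next.

```python
import random

random.seed(11)
alpha, dt, n_units, n_paths = 2.0, 0.05, 20, 2000
steps_per_unit = int(1 / dt)

def unit_increments():
    """Simulate one path of X(t) on a grid of width dt and return Y_k = X(k) - X(k-1)."""
    x, samples = 0.0, [0.0]
    for _ in range(n_units):
        for _ in range(steps_per_unit):
            x += random.gauss(0.0, (alpha * dt) ** 0.5)   # Gaussian step, variance alpha*dt
        samples.append(x)
    return [samples[k + 1] - samples[k] for k in range(n_units)]

paths = [unit_increments() for _ in range(n_paths)]

# Pool all unit-time increments: mean should be near 0 and variance near alpha
flat = [y for path in paths for y in path]
mean = sum(flat) / len(flat)
var = sum((y - mean) ** 2 for y in flat) / len(flat)
print("mean of Y_k approx", round(mean, 3), "  variance approx", round(var, 3))

# Covariance between consecutive increments Y_1 and Y_2: should be near 0
y1 = [p[0] for p in paths]
y2 = [p[1] for p in paths]
m1, m2 = sum(y1) / n_paths, sum(y2) / n_paths
cov = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / n_paths
print("Cov[Y_1, Y_2] approx", round(cov, 3))
```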
Example 7.5 The price X(t) of a volatile stock is known to be a Brownian process. The standard deviation of the stock price over an 8-hour day is σ = 4. The opening price on a particular day is X(0) = 100.

a) What is the parameter α for the process if t is in hours?

b) Suppose that after the first hour the stock is trading at X(1) = 99. What is the probability that the closing price will be 100 or above?

c) If X(0) = 100, what is a safe price m such that the stock price will close above m with probability 0.9?
Sol:

a) The stock has a standard deviation of σ = 4 over an 8-hour day, so the 8-hour variance is σ² = 16. For a Brownian process this is αt, so 8α = 16 or α = 2.
b) Since X(t) is Brownian, the price change X(8) − X(1) is a Gaussian RV that is independent of the past (in particular of X(1)) and depends only on the time difference of 7 hours: X(8) − X(1) is Gaussian with mean 0 and variance α · 7 = 14. We have:

P[X(8) ≥ 100 | X(1) = 99] = P[X(8) − X(1) ≥ 1] = P[(X(8) − X(1))/√14 ≥ 1/√14]

But Z = (X(8) − X(1))/√14 is standard normal so

P[(X(8) − X(1))/√14 ≥ 1/√14] = 1 − Φ(1/√14) ≈ 0.39
The standard deviation Δx = √(αt) represents our uncertainty in the displacement of the particle after a time t, while Δv = Δx/t = √(α/t) represents our uncertainty in the speed of the particle. Note that both of these uncertainties depend on t. The apparent speed of the particle seems to decrease with time, although the apparent displacement increases. The product is

Δx Δv = α.
Thus the product of the uncertainties is independent of time! This is a classical counterpart to the famous Heisenberg Uncertainty Principle of Quantum Mechanics (see, for example, the Wikipedia article on Heisenberg's Uncertainty Principle). In this case, the interpretation is that the sample paths are so rough that the Brownian particles never settle down to a smooth velocity, no matter how small a time interval one chooses to use in a measurement. The sample paths are non-differentiable curves (fractals!) and Brownian particles do not have instantaneous velocities!
7.5 Sample Test Questions
T-1 Suppose N(t) is a Poisson process with rate λ. Consider the following statements:

i) At a fixed time t_0, N(t_0) is a continuous random variable with the exponential distribution.

ii) A sample path N(t) plotted as a function of t will be a smooth curve.

iii) P[N(2) = 2, N(5) = 6] = ( (2λ)² e^{−2λ} / 2! ) ( (3λ)⁴ e^{−3λ} / 4! ).

Choose the correct statement. A) Only i) is true. B) Only ii) is true. C) Only iii) is true. D) Only i) and ii) are true. E) i), ii) and iii) are all false.
T-2 Let X(t) be a stochastic process in which E[X(t)] = m(t) = 0. Consider the following statements:

i) The autocovariance C_X(t_1, t_2) is equal to the autocorrelation R_X(t_1, t_2).

ii) Var[X(t)] = C_X(t, t)

iii) −1 ≤ C_X(t_1, t_2) ≤ 1

Choose the correct statement.
A) Only i) is true. B) Only ii) is true. C) Only iii) is true. D) Only i) and ii) are true. E) i), ii) and iii) are all true.
T-3 Let A be a uniformly distributed random variable from the interval [0, 1]. The random process X(t), t ≥ 0, is defined as

X(t) = A e^{−t}

(a) Sketch 3 typical sample paths.
(b) Find the CDF of X(t).
(c) Find the PDF of X(t).
(d) Find μ_X(t) = E[X(t)].
T-4 [10 marks] The number of bicycles that arrive at the engineering building in t minutes
is a Poisson process X(t)
with expected value E[X(t)] = t/10.
A) What is the probability mass function (PMF) of X at t = 20.
B) What is the probability that in a ve-minute interval, three bicycles will arrive?
C) What is the probability of no bicycles arriving in a 10-minute interval?
D) How much time should you allow so that with probability 0.99, at least one bicycle
arrives?
E) What is the PDF of waiting times between bicycles?
T-5 [12 marks] Let g(t) = u(t) − u(t − 1) be the rectangular pulse shown in the figure (u(t) being the unit-step function), so that g(t) = 1 for 0 ≤ t < 1 and 0 otherwise. The random process X(t), t ≥ 0, is defined as

X(t) = A g(t)

where A assumes the values ±1 with equal probability.

(a) Sketch typical sample paths.
(b) Find the pmf of X(t).
(c) Find m_X(t) = E[X(t)].
(d) Find the joint pmf of X(t) and X(t + d), d > 0.
(e) Find the autocovariance function C_X(t, t + d), d > 0.
T-6 [12 marks] Patients arrive at a hospital emergency room (ER) in a Poisson process at an average rate of one patient every 10 minutes. Let N(t) denote the number of patients who have arrived at the ER after t hours. [Hint: In some parts of this problem, the Central Limit Theorem may be helpful!]

(a) What is the PMF of N(t)?
(b) What is the probability that after three hours, 20 or more patients have arrived?
(c) What is the probability that 6 patients arrive between 9:00 and 10:00 AM and 4 patients arrive between 1:00 and 2:00 PM?
(d) Over a 100-hour period, what is the probability that over 661 patients visit the ER?