Bayes rule for random variables
There are many situations where we want to know X, but can only
measure a related random variable Y or observe a related event A.
Bayes gives us a systematic way to update the pdf for X given this
observation.
We will look at four different versions of Bayes rule for random variables. They all say essentially the same thing, but each is tailored to a situation where we are observing or inferring some mixture of continuous random variables, discrete random variables, and events.
Bayes rule for continuous random variables
If X and Y are both continuous random variables with joint pdf f_{X,Y}(x, y), we know that

    f_{X,Y}(x, y) = f_{Y|X}(y|x)\, f_X(x) = f_{X|Y}(x|y)\, f_Y(y).

Thus we can turn a conditional pdf in y, f_{Y|X}(y|x), into one for X using

    f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x)\, f_X(x)}{f_Y(y)}.
For a fixed observation of Y = y,
• f_X(x) is a function of x (a pdf),
• f_{Y|X}(y|x) is a particular function of x determined by y (although not a pdf),
• f_Y(y) is a number.
ECE 3077 Notes by M. Davenport, J. Romberg and C. Rozell. Last updated 21:27, June 25, 2014
Using the fact that

    f_Y(y) = \int_{-\infty}^{\infty} f_{Y|X}(y|u)\, f_X(u)\, du,

we will often find it useful to rewrite the denominator above to get

    f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x)\, f_X(x)}{\int_{-\infty}^{\infty} f_{Y|X}(y|u)\, f_X(u)\, du}.
Example. iPhones are known to have an exponentially distributed
lifetime Y . However, the manufacturing plant has had some quality
control problems lately. On any given day, the parameter λ of the
pdf of Y is itself a random variable uniformly distributed on [1/2, 1].
We test an iPhone and record its lifetime. What can we say about
the underlying parameter λ?
We have

    f_\Lambda(\lambda) = 2, \quad \text{for } \tfrac{1}{2} \le \lambda \le 1,

and

    f_{Y|\Lambda}(y|\lambda) = \lambda e^{-\lambda y}, \quad y \ge 0.

Given a particular observation Y = y, we update the distribution for \lambda as

    f_{\Lambda|Y}(\lambda|y) = \frac{f_{Y|\Lambda}(y|\lambda)\, f_\Lambda(\lambda)}{\int_{1/2}^{1} f_{Y|\Lambda}(y|u)\, f_\Lambda(u)\, du} = \frac{2\lambda e^{-\lambda y}}{\int_{1/2}^{1} 2u\, e^{-uy}\, du}, \quad \tfrac{1}{2} \le \lambda \le 1,

and zero outside [1/2, 1].
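As a numerical sketch of this update (the observed lifetime y = 2 is a hypothetical value chosen only for illustration, and the integral is approximated by a plain midpoint Riemann sum rather than any library routine):

```python
import math

y = 2.0          # hypothetical observed lifetime (an assumption, for illustration)
n = 100_000
du = 0.5 / n

# Denominator f_Y(y) = integral_{1/2}^{1} 2 u e^{-u y} du, by midpoint Riemann sum
f_y = sum(2.0 * (0.5 + (i + 0.5) * du) * math.exp(-(0.5 + (i + 0.5) * du) * y) * du
          for i in range(n))

def posterior_lambda(lam):
    """f_{Lambda|Y}(lam | y): posterior pdf of the exponential parameter."""
    if not (0.5 <= lam <= 1.0):
        return 0.0
    return 2.0 * lam * math.exp(-lam * y) / f_y

# Sanity check: the posterior integrates to 1 over [1/2, 1]
total = sum(posterior_lambda(0.5 + (i + 0.5) * du) * du for i in range(n))
print(total)   # approximately 1
```

For y = 2 the posterior puts more weight on small \lambda, matching the intuition that a long observed lifetime suggests a slow failure rate.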
Bayes rule for discrete random variables
If X and Y are both discrete random variables, then we can simply
replace the pdfs above with pmfs,
    p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\, p_X(x)}{p_Y(y)}.
This really just follows from Bayes rule for events A and B (which
we looked at in Section I of the notes),
    P(A|B) = \frac{P(B|A)\, P(A)}{P(B)},
where A is the event {X = x} and B is the event {Y = y}.
Again, using the law of total probability,
    p_Y(y) = \sum_k p_{Y|X}(y|k)\, p_X(k),
we can rewrite the denominator above to get this version of Bayes
rule:
    p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\, p_X(x)}{\sum_k p_{Y|X}(y|k)\, p_X(k)}.
Exercise:
Suppose that X is the number of 1s that appear in a binary string of
length L; each bit in the string is equal to zero or one with probability
1/2, and the bits are independent. Given L = \ell, we know that X has the binomial distribution when k \le \ell (the probability is zero otherwise):

    p_{X|L}(k|\ell) = \binom{\ell}{k} (0.5)^k (0.5)^{\ell-k} = \binom{\ell}{k} 2^{-\ell}.
Suppose that the length of the string is also random and uniformly
distributed between 1 and 10:
    p_L(\ell) = \begin{cases} \frac{1}{10}, & \ell = 1, \ldots, 10, \\ 0, & \text{otherwise.} \end{cases}
We learn that the binary string contains 4 ones. How can we use this
information to update the pmf for the length of the string L?
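The exercise can be checked numerically with the formulas above (a minimal sketch; the observed count k = 4 and the uniform prior over lengths 1 through 10 come from the problem statement):

```python
from math import comb

def posterior_length(k=4, lengths=range(1, 11)):
    """p_{L|X}(l|k): posterior pmf of the string length after seeing k ones."""
    prior = 1.0 / len(lengths)                     # uniform prior p_L(l)
    # Likelihood p_{X|L}(k|l) = C(l, k) 2^{-l}, zero when k > l
    joint = {l: (comb(l, k) * 2.0 ** (-l) * prior if k <= l else 0.0)
             for l in lengths}
    p_x = sum(joint.values())                      # p_X(k), by total probability
    return {l: j / p_x for l, j in joint.items()}

post = posterior_length()
for length, pr in post.items():
    print(length, round(pr, 4))
```

The posterior is zero for \ell < 4, and its largest values occur at \ell = 7 and \ell = 8, which tie exactly since \binom{7}{4} 2^{-7} = \binom{8}{4} 2^{-8} = 35/128.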
Inference about a discrete event/random variable from a
continuous observation
What does the observation of a random variable tell us about whether
an event has occurred?
Recall that if A and B are events, then
    P(A|B) = \frac{P(B|A)\, P(A)}{P(B)}, \quad \text{for } P(B) > 0.
Now suppose that Y is a continuous random variable, and we observe
Y = y. What can we say about P (A|Y = y)?
We have to be a little careful here, since strictly speaking P(Y = y) = 0 (since Y is continuous). But using densities for Y instead of probabilities, we can arrive at (see the derivation below)

    P(A|Y = y) = \frac{f_{Y|A}(y)\, P(A)}{f_Y(y)},

where f_{Y|A}(y) is a conditional density for Y evaluated at y, and f_Y(y) is the pdf for Y evaluated at y.
Again, we can use a version of the law of total probability to expand
the denominator above:
    f_Y(y) = f_{Y|A}(y)\, P(A) + f_{Y|A^c}(y)\, P(A^c),

and so

    P(A|Y = y) = \frac{f_{Y|A}(y)\, P(A)}{f_{Y|A}(y)\, P(A) + f_{Y|A^c}(y)\, P(A^c)}.
We can derive this expression using Bayes rule for events, and then
taking limits. For any δ > 0, the event B = {y ≤ Y ≤ y + δ} is
well-defined, and assuming that fY (y) > 0, P (B) will be positive.
Then, using Bayes rule for events,

    P(A|y \le Y \le y+\delta) = \frac{P(y \le Y \le y+\delta|A)\, P(A)}{P(y \le Y \le y+\delta)} = \frac{P(A) \int_y^{y+\delta} f_{Y|A}(y')\, dy'}{\int_y^{y+\delta} f_Y(y')\, dy'}.
As δ → 0,
    P(A|y \le Y \le y+\delta) \to P(A|Y = y),

    \int_y^{y+\delta} f_{Y|A}(y')\, dy' \to f_{Y|A}(y)\,\delta,

    \int_y^{y+\delta} f_Y(y')\, dy' \to f_Y(y)\,\delta,
and so taking the limit as δ → 0 on both sides above, the Bayes
expression becomes
    P(A|Y = y) = \frac{f_{Y|A}(y)\, P(A)}{f_Y(y)}.
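This limiting argument can also be checked numerically. The model below is an assumption chosen only for illustration: f_{Y|A} is a Normal(1, 1) density, f_{Y|A^c} is Normal(0, 1), and P(A) = 0.3.

```python
import math

def normal_pdf(x, mu):
    """pdf of Normal(mu, 1) at x."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def normal_cdf(x, mu):
    """cdf of Normal(mu, 1) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))

p_a, y = 0.3, 0.8   # assumed prior P(A) and observation point

def cond_prob_interval(delta):
    """P(A | y <= Y <= y + delta), via Bayes rule for events."""
    p_b_given_a = normal_cdf(y + delta, 1.0) - normal_cdf(y, 1.0)
    p_b_given_ac = normal_cdf(y + delta, 0.0) - normal_cdf(y, 0.0)
    p_b = p_a * p_b_given_a + (1.0 - p_a) * p_b_given_ac
    return p_b_given_a * p_a / p_b

# Limit expression f_{Y|A}(y) P(A) / f_Y(y)
f_y = p_a * normal_pdf(y, 1.0) + (1.0 - p_a) * normal_pdf(y, 0.0)
limit = normal_pdf(y, 1.0) * p_a / f_y

for delta in (0.5, 0.1, 0.001):
    print(delta, cond_prob_interval(delta))   # approaches `limit` as delta shrinks
print("limit:", limit)
```

Shrinking \delta drives the interval probability toward the density-ratio expression, just as the limit argument above predicts.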
The Bayes expression on the previous two pages easily extends to
the case where X is a discrete random variable, as events of the form
{X = x} are well defined. Thus a mixed continuous-discrete version
of Bayes rule is
    p_{X|Y}(x|y) = P(X = x|Y = y) = \frac{f_{Y|X}(y|x)\, p_X(x)}{f_Y(y)} = \frac{f_{Y|X}(y|x)\, p_X(x)}{\sum_k f_{Y|X}(y|k)\, p_X(k)}.
Exercise:
S is a binary signal, with P (S = 1) = p and P (S = −1) = 1 − p.
Suppose that we transmit S and the received signal is Y = S + N ,
where N is normal noise with zero mean and unit variance, N ∼
Normal(0, 1), and is independent of S. What is the probability that
S = 1 as a function of the observed value y of Y ?
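A sketch of one way to compute the answer, using the mixed Bayes rule above with f_{Y|S}(y|1) = \varphi(y - 1) and f_{Y|S}(y|{-1}) = \varphi(y + 1), where \varphi is the standard normal pdf (the default prior p = 1/2 below is an assumption for the demo):

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def prob_s_is_one(y, p=0.5):
    """P(S = 1 | Y = y) via f_{Y|S}(y|1) = phi(y - 1), f_{Y|S}(y|-1) = phi(y + 1)."""
    num = phi(y - 1.0) * p
    return num / (num + phi(y + 1.0) * (1.0 - p))

print(prob_s_is_one(0.0))   # 0.5 by symmetry when p = 1/2
print(prob_s_is_one(2.0))   # close to 1: a large y strongly suggests S = 1
```

For p = 1/2 the expression simplifies algebraically to the logistic function 1/(1 + e^{-2y}).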
Inference about a continuous random variable based on
discrete observations
What does the observation of a discrete event tell us about a related
continuous random variable?
We have observed that an event A has occurred, and want to use
this information to update our probability model for a continuous
random variable Y . In a manner similar to what we did in the
previous section, we can derive the following version of Bayes rule
that mixes continuous random variables and discrete events:
    f_{Y|A}(y) = \frac{P(A|Y = y)\, f_Y(y)}{P(A)} = \frac{P(A|Y = y)\, f_Y(y)}{\int_{-\infty}^{\infty} P(A|Y = u)\, f_Y(u)\, du}.
Exercise:
A bag of candy is filled with red and white jelly beans. The color of
each jelly bean is independent, and the probability of pulling out a
red jelly bean is p, while the probability of pulling out a white jelly
bean is 1 − p.
We have no idea what p is, so we model it as a uniformly distributed
random variable P ,
    f_P(p) = \begin{cases} 1, & 0 \le p \le 1 \\ 0, & \text{otherwise.} \end{cases}
We pull out a jelly bean and see that it is red. How does this observation change the pdf for P?
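A quick numerical check for this exercise: with P(A|P = p) = p and f_P(p) = 1, the formula above gives P(A) = \int_0^1 p\, dp = 1/2 and hence f_{P|A}(p) = 2p on [0, 1]. The sketch below verifies the normalizing constant with a midpoint Riemann sum.

```python
n = 100_000
dp = 1.0 / n
# P(A) = integral_0^1 P(A|P=p) f_P(p) dp = integral_0^1 p dp = 1/2
p_red = sum((i + 0.5) * dp * dp for i in range(n))   # midpoint Riemann sum

def posterior_p(p):
    """f_{P|A}(p) where A = {first bean is red} and P ~ Uniform(0, 1)."""
    if not (0.0 <= p <= 1.0):
        return 0.0
    return p * 1.0 / p_red    # P(A|P=p) f_P(p) / P(A)

print(posterior_p(0.25))  # approximately 0.5 (= 2 * 0.25)
print(posterior_p(0.75))  # approximately 1.5
```

Seeing one red bean tilts the pdf linearly toward large p, from flat to f_{P|A}(p) = 2p.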