Bayes’ Rule
Foundations of Data Analysis
February 3, 2022
Brain Teaser: Trick Coin
I have four coins. Three are normal, with one side heads
and one side tails. One is a trick coin with heads on both
sides. I pick one coin at random and flip it. If it shows
heads, what is the probability that it is the trick coin?
Bayes’ Rule
Lets us “flip” a conditional:

P(B | A) = P(A | B)P(B) / P(A)
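In code, the rule is a one-liner. A minimal Python sketch (the function and argument names are my own, for illustration):

```python
def bayes(p_a_given_b, p_b, p_a):
    """Flip a conditional: return P(B | A) = P(A | B) P(B) / P(A)."""
    return p_a_given_b * p_b / p_a
```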
Deriving Bayes’ Rule
Multiplication rule:
P(A ∩ B) = P(A | B)P(B)
P(B ∩ A) = P(B | A)P(A)
Since A ∩ B = B ∩ A, the two right-hand sides must be equal:

P(B | A)P(A) = P(A | B)P(B)

Dividing both sides by P(A) gives us:

P(B | A) = P(A | B)P(B) / P(A)
Trick Coin Example
A = “heads”, B = “trick coin”
P(A | B) = 1.0
P(B) = 0.25
P(A) = P(A | B)P(B) + P(A | Bᶜ)P(Bᶜ)
     = 1.0 × 0.25 + 0.5 × 0.75 = 5/8

P(B | A) = P(A | B)P(B) / P(A) = (1.0 × 0.25) / (5/8) = 2/5 = 0.4
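As a sanity check on the brain teaser, here is a small Monte Carlo sketch (my own setup, not from the slides); the printed estimate should land near 0.4:

```python
import random

# Three fair coins plus one two-headed trick coin.
COINS = [("H", "T"), ("H", "T"), ("H", "T"), ("H", "H")]

heads_flips = 0   # flips that showed heads
trick_heads = 0   # ... of those, how many used the trick coin

for _ in range(100_000):
    coin = random.choice(COINS)   # pick a coin at random
    side = random.choice(coin)    # flip it
    if side == "H":
        heads_flips += 1
        if coin == ("H", "H"):
            trick_heads += 1

# Estimate of P(trick coin | heads); converges to 2/5 = 0.4.
print(trick_heads / heads_flips)
```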
Random Variables
Definition
A random variable is a function defined on a sample
space, Ω. Notation: X : Ω → R
- A random variable is neither random nor a variable.
- Just think of a random variable as assigning a number to every possible outcome.
- For example, in a coin flip, we might assign “tails” as 0 and “heads” as 1:
  X(T) = 0, X(H) = 1
Dice Example
Let (Ω, F, P) be the probability space for rolling a pair of
dice, and let X be the random variable that gives the
sum of the numbers on the two dice. So,
X[(1, 2)] = 3, X[(4, 4)] = 8, X[(6, 5)] = 11
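To make the “a random variable is a function” idea concrete, here is the dice example as literal Python (a sketch; the names are mine):

```python
from itertools import product

# Sample space Omega: all 36 ordered outcomes of rolling two dice.
omega = list(product(range(1, 7), repeat=2))

# The random variable X maps each outcome to the sum of the dice.
def X(outcome):
    return outcome[0] + outcome[1]

print(X((1, 2)), X((4, 4)), X((6, 5)))  # 3 8 11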
Even Simpler Example
Most of the time the random variable X will just be the
identity function. For example, if the sample space is the
real line, Ω = R, the identity function
X : R → R,
X(s) = s
is a random variable.
Defining Events via Random Variables
Setting a real-valued random variable to a value or range
of values defines an event.
[X = x] = {s ∈ Ω : X(s) = x}
[X < x] = {s ∈ Ω : X(s) < x}
[a < X < b] = {s ∈ Ω : a < X(s) < b}
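Continuing the dice sketch, each of these events is literally a subset of Ω that we can compute by filtering (names are mine):

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))
X = lambda s: s[0] + s[1]  # sum of the two dice

# [X = 3]: every outcome that X maps to 3.
event_eq = {s for s in omega if X(s) == 3}       # {(1, 2), (2, 1)}

# [X < 4] and [4 < X < 6] are subsets of omega too.
event_lt = {s for s in omega if X(s) < 4}
event_between = {s for s in omega if 4 < X(s) < 6}

# With equally likely outcomes, P(event) = |event| / |Omega|.
print(len(event_eq) / len(omega))  # 2/36 ≈ 0.056
```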
Joint Probabilities
Two binary random variables:
C = cold / no cold = (1/0)
R = runny nose / no runny nose = (1/0)
Event [C = 1]: “I have a cold”
Event [R = 1]: “I have a runny nose”
Joint event
[C = 1] ∩ [R = 1]: “I have a cold and a runny nose”
Notation for joint probabilities:
P(C = 1, R = 1) = P([C = 1] ∩ [R = 1])
Cold Example: Probability Tables
Two binary random variables:
C = cold / no cold = (1/0)
R = runny nose / no runny nose = (1/0)
Joint probabilities:
         C = 0   C = 1
R = 0     0.50    0.05
R = 1     0.20    0.25
Cold Example: Marginals
         C = 0   C = 1
R = 0     0.50    0.05
R = 1     0.20    0.25
Marginals:
P(R = 0) = 0.55, P(R = 1) = 0.45
P(C = 0) = 0.70, P(C = 1) = 0.30
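The marginals are just row and column sums of the joint table. A NumPy sketch (the array orientation is my choice):

```python
import numpy as np

# Rows index R (runny nose: 0, 1); columns index C (cold: 0, 1).
joint = np.array([[0.50, 0.05],
                  [0.20, 0.25]])

p_R = joint.sum(axis=1)   # marginal over C: [0.55, 0.45]
p_C = joint.sum(axis=0)   # marginal over R: [0.70, 0.30]

print(p_R, p_C, joint.sum())  # the joint table must total 1.0
```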
Cold Example: Conditional Probabilities
         C = 0   C = 1
R = 0     0.50    0.05    0.55
R = 1     0.20    0.25    0.45
          0.70    0.30

Conditional Probabilities:

P(C = 0 | R = 0) = P(C = 0, R = 0) / P(R = 0) = 0.50 / 0.55 ≈ 0.91

P(C = 1 | R = 1) = P(C = 1, R = 1) / P(R = 1) = 0.25 / 0.45 ≈ 0.56
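In the same array layout as above, conditioning on R means dividing each row of the joint table by its row marginal:

```python
import numpy as np

joint = np.array([[0.50, 0.05],   # rows: R = 0, 1; cols: C = 0, 1
                  [0.20, 0.25]])
p_R = joint.sum(axis=1)

# P(C | R): each row of the joint divided by P(R = r).
cond_C_given_R = joint / p_R[:, None]

print(cond_C_given_R[0, 0])  # P(C = 0 | R = 0) ≈ 0.91
print(cond_C_given_R[1, 1])  # P(C = 1 | R = 1) ≈ 0.56
```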
Cold Example
         C = 0   C = 1
R = 0     0.50    0.05    0.55
R = 1     0.20    0.25    0.45
          0.70    0.30

Remember (writing P(C) for P(C = 1) and P(R) for P(R = 1)):

P(C) = 0.3
P(C | R) = 0.56
What if I didn’t give you the full table, but just:
P(R | C) = 0.83 > P(R) = 0.45
What can you say about the corresponding increase, P(C | R) > P(C)?
Cold Example
Notice that having a cold increases my chance of a runny
nose by the factor

P(R | C) / P(R) = 0.83 / 0.45 ≈ 1.85

What happens to this ratio if I flip the conditional?

P(C | R) / P(C) = P(C ∩ R) / (P(R)P(C))
                = P(R | C) / P(R)
                = 1.85
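A quick numeric check of this symmetry from the joint table (using exact values rather than the rounded 0.83):

```python
import numpy as np

joint = np.array([[0.50, 0.05],   # rows: R = 0, 1; cols: C = 0, 1
                  [0.20, 0.25]])
p_R  = joint.sum(axis=1)[1]   # P(R = 1) = 0.45
p_C  = joint.sum(axis=0)[1]   # P(C = 1) = 0.30
p_CR = joint[1, 1]            # P(C = 1, R = 1) = 0.25

# Both ratios equal P(C ∩ R) / (P(C) P(R)) ≈ 1.85.
print((p_CR / p_C) / p_R)  # P(R | C) / P(R)
print((p_CR / p_R) / p_C)  # P(C | R) / P(C)
```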