Slide set: Conditional Probability
∙ Conditional probability
∙ Chain rule
∙ Sequential calculation of probabilities
∙ Law of total probability
∙ Bayes rule
∙ Independence
∙ Summary
© Copyright Abbas El Gamal
Conditional probability
∙ We are often interested in the probability of event A given that event B occurred
∙ Examples:
Alice tests positive for a certain disease; what is the probability that she has this disease?
A bit is read out of a memory chip; what is the probability that the same bit value was actually stored?
∙ Definition: The conditional probability of A given B (with P(B) ≠ 0) is
P(A | B) = P(A ∩ B)/P(B), also written as P(A, B)/P(B)
∙ Interpretation: Given B, only the outcomes of A that are in A ∩ B can occur
Dividing by P(B) then normalizes the probability that A ∩ B occurs,
which we call the conditional probability of A given B
Example
∙ Roll a fair 4-sided die twice, so the sample space is {1, 2, 3, 4}²
Assume each sample point has probability 1/16
Let B be the event that the minimum of the two die rolls is 2, and
A be the event that the maximum of the two die rolls is 3
Find P(A | B)
P(A | B) = P(A, B)/P(B)
P(B) = 5/16 (the five outcomes with minimum 2)
P(A, B) = 2/16
P(A | B) = 2/5
Compared to P(A) = 5/16
(Figure: 4 × 4 grid of outcomes, 1st roll vs. 2nd roll, with the events A and B marked)
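This kind of conditional probability is easy to check by brute-force enumeration. Below is a minimal Python sketch, assuming the fair 4-sided die and the events A = {max = 3} and B = {min = 2} described above:

```python
from itertools import product
from fractions import Fraction

# Two rolls of a fair 4-sided die: 16 equally likely outcomes
omega = list(product(range(1, 5), repeat=2))
P = lambda event: Fraction(len(event), len(omega))

B = [w for w in omega if min(w) == 2]              # minimum of the two rolls is 2
A = [w for w in omega if max(w) == 3]              # maximum of the two rolls is 3
A_and_B = [w for w in omega if w in A and w in B]

print(P(B), P(A_and_B), P(A_and_B) / P(B), P(A))
# 5/16, 1/8, 2/5, 5/16  ->  P(A | B) = 2/5 vs. the unconditional P(A) = 5/16
```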
P(⋅ | B) is a probability measure
∙ For a given B, P(⋅ | B) constitutes a probability measure on Ω
Positivity: P(A | B) ≥ 0
Normalization: P(Ω | B) = P(Ω ∩ B)/P(B) = P(B)/P(B) = 1
Additivity: if A1 and A2 are disjoint events, then
P(A1 ∪ A2 | B) = P((A1 ∪ A2) ∩ B)/P(B) = P((A1 ∩ B) ∪ (A2 ∩ B))/P(B) = (P(A1 ∩ B) + P(A2 ∩ B))/P(B)
= P(A1 | B) + P(A2 | B)
∙ For the 4-sided die example:
The unconditional probability measure assigns probability 1/16 to each outcome
Conditioned on B, the conditional probability measure assigns probability
1/5 to (2, 2), (2, 3), (2, 4), (3, 2), (4, 2) and zero to all other outcomes
(Figure: the 4 × 4 grid of outcomes, 1st roll vs. 2nd roll, with the five outcomes in B highlighted)
Chain rule
∙ Probability measures are often specified via conditional probabilities
∙ Two methods to calculate probability of an event from conditional probabilities:
the sequential method and
a “divide and conquer” method using the law of total probability
∙ Both use the chain rule:
By the definition of conditional probability, P(A ∩ B) = P(A | B) P(B)
Suppose that A1, A2, . . . , An are events, then
P(A1 ∩ A2 ∩ ⋯ ∩ An) = P(A1 ∩ ⋯ ∩ An−1) ⋅ P(An | A1 ∩ ⋯ ∩ An−1)
= P(A1 ∩ ⋯ ∩ An−2) ⋅ P(An−1 | A1 ∩ ⋯ ∩ An−2) ⋅ P(An | A1 ∩ ⋯ ∩ An−1)
⋮
= P(A1) ⋅ P(A2 | A1) ⋅ P(A3 | A1 ∩ A2) ⋅⋅⋅ P(An | A1 ∩ ⋯ ∩ An−1)
In general: P(∩ⁿᵢ₌₁ Ai) = ∏ⁿᵢ₌₁ P(Ai | A1, A2, . . . , Ai−1), where A0 = Ω
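As a quick sanity check of the chain rule, here is a small sketch (not from the slides) that verifies it by enumeration for a sequential experiment: drawing 3 balls without replacement from an urn with 3 red and 2 blue balls, with Ai = {draw i is red}:

```python
from itertools import permutations
from fractions import Fraction

# Urn with 3 red (R) and 2 blue (B) balls; draw 3 without replacement.
balls = ["R", "R", "R", "B", "B"]
outcomes = list(permutations(balls, 3))          # 60 equally likely ordered draws

def prob(pred):
    return Fraction(sum(pred(o) for o in outcomes), len(outcomes))

direct = prob(lambda o: o.count("R") == 3)                      # P(A1 ∩ A2 ∩ A3)
p_a1 = prob(lambda o: o[0] == "R")                              # P(A1) = 3/5
p_a12 = prob(lambda o: o[0] == "R" and o[1] == "R")             # P(A1 ∩ A2)
p_a2_given_a1 = p_a12 / p_a1                                    # 1/2
p_a3_given_a12 = direct / p_a12                                 # 1/3

print(direct, p_a1 * p_a2_given_a1 * p_a3_given_a12)            # both 1/10
```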
Sequential calculation of probabilities
1. Construct a tree description of the sample space for the sequential experiment
2. Assign conditional probabilities to the corresponding edges
3. By the chain rule, the probability of an outcome is the product of the
conditional probabilities along the path from the root to the outcome’s leaf node
(Figure: two-stage tree. The root splits into A1 and A1ᶜ with edge probabilities P(A1) and P(A1ᶜ); each branch then splits according to the conditional probabilities of A2 and A2ᶜ given that branch. The four leaves are:)
P(A1, A2) = P(A1) P(A2 | A1)
P(A1, A2ᶜ) = P(A1) P(A2ᶜ | A1)
P(A1ᶜ, A2) = P(A1ᶜ) P(A2 | A1ᶜ)
P(A1ᶜ, A2ᶜ) = P(A1ᶜ) P(A2ᶜ | A1ᶜ)
Diagnosing a disease
∙ Let A be the event that a person has a certain disease and
B be the event that he tested positive for the disease. Assume that
P(A) = 0.01, P(B | A) = 0.99, and P(B | Ac) = 0.01. What is the probability of:
False negative (misdiagnosis), i.e., P(A ∩ Bc)
False positive (false diagnosis), i.e., P(Ac ∩ B)
∙ We use the sequential method:
(Figure: two-stage tree, disease status first, then test result. The leaves are:)
P(A, B) = P(A) P(B | A) = 0.01 × 0.99 = 0.0099
P(A, Bc) = P(A) P(Bc | A) = 0.01 × 0.01 = 0.0001 (misdiagnosis)
P(Ac, B) = P(Ac) P(B | Ac) = 0.99 × 0.01 = 0.0099 (false diagnosis)
P(Ac, Bc) = P(Ac) P(Bc | Ac) = 0.99 × 0.99 = 0.9801
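A short sketch that reproduces the two error probabilities above, using the prevalence and accuracy values given on this slide (0.01 and 0.99):

```python
# Disease diagnosis via the sequential (tree) method.
p_A = 0.01           # P(A): has the disease
p_B_given_A = 0.99   # P(B | A): positive test given disease
p_B_given_Ac = 0.01  # P(B | A^c): positive test given no disease

false_negative = p_A * (1 - p_B_given_A)       # P(A ∩ B^c) = 0.0001
false_positive = (1 - p_A) * p_B_given_Ac      # P(A^c ∩ B) = 0.0099
print(false_negative, false_positive)
```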
Example
∙ Cards: Three cards are randomly drawn (without replacement) from
a 52-card deck. What is the probability of not drawing a heart?
∙ We can use counting to find this probability:
The sample space for picking 3 cards has (52 choose 3) equally likely choices
The number of ways of picking 3 cards with no heart is (39 choose 3)
The probability of no hearts is therefore (39 choose 3)/(52 choose 3) = (39 × 38 × 37)/(52 × 51 × 50) ≈ 0.41
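The counting answer can be checked directly with Python's math.comb:

```python
from math import comb

# P(no hearts in 3 cards drawn from a standard 52-card deck):
# choose all 3 cards from the 39 non-hearts, out of all ways to choose 3 from 52.
p_no_hearts = comb(39, 3) / comb(52, 3)
print(p_no_hearts)   # ≈ 0.4135
```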
Example
∙ We can also use the sequential method
∙ Let Ai be the event “not heart” at draw i = 1, 2, 3; we want P(A1 ∩ A2 ∩ A3)
We construct a (partial) tree with conditional probabilities on its edges
(Figure: partial tree with edges P(A1) = 39/52, P(A1ᶜ) = 13/52, P(A2 | A1) = 38/51, P(A3 | A1, A2) = 37/50)
P(A1, A2, A3) = P(A1) P(A2 | A1) P(A3 | A1, A2) = (39/52) × (38/51) × (37/50) ≈ 0.41
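For comparison, a small Monte Carlo sketch of the same sequential experiment (illustrative only; the estimate fluctuates around 0.41):

```python
import random

# Estimate P(no hearts in 3 draws without replacement) by simulation.
deck = ["heart"] * 13 + ["other"] * 39
trials = 100_000
no_heart = sum("heart" not in random.sample(deck, 3) for _ in range(trials))
print(no_heart / trials)   # ≈ 0.41
```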
Recap
∙ Conditional probability of A given B (with P(B) ≠ 0) is: P(A | B) = P(A ∩ B)/P(B)
P(⋅ | B) is a probability measure
∙ Chain rule: For two events, P(A ∩ B) = P(A | B) P(B)
In general: P(∩ⁿᵢ₌₁ Ai) = ∏ⁿᵢ₌₁ P(Ai | A1, A2, . . . , Ai−1), where A0 = Ω
∙ The sequential method for computing probabilities
∙ Up next:
Computing probabilities via the law of total probability
Bayes rule (basis for Bayesian statistical inference)
Law of total probability
∙ Let A1, . . . , An be events that partition Ω, i.e., they are disjoint and ∪ⁿᵢ₌₁ Ai = Ω
Then for any event B: P(B) = ∑ⁿᵢ₌₁ P(Ai ∩ B) = ∑ⁿᵢ₌₁ P(Ai) P(B | Ai)
Remark: this law also holds for n = ∞
∙ Chess tournament. Alice plays an opponent from one of 3 classes
For each class i = 1, 2, 3, the probability P(Class i) that her opponent is from that class
and the conditional probability P(Alice wins | Class i) are given
What is the probability that Alice wins the game?
∙ By the law of total probability,
P(Alice wins) = ∑³ᵢ₌₁ P(Class i, Alice wins) = ∑³ᵢ₌₁ P(Class i) P(Alice wins | Class i),
and substituting the given probabilities yields the answer (see the sketch below)
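A minimal sketch of this computation; the class and win probabilities below are illustrative placeholders, not the slide's values, since only the weighted-sum structure matters here:

```python
# Law of total probability: P(win) = sum_i P(Class i) * P(win | Class i).
# Placeholder numbers for illustration only.
p_class = [0.5, 0.25, 0.25]            # must sum to 1 (a partition of Ω)
p_win_given_class = [0.3, 0.4, 0.5]

p_win = sum(pc * pw for pc, pw in zip(p_class, p_win_given_class))
print(p_win)   # 0.375 for these placeholder numbers
```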
Binary communication link
∙ Consider a communication link in which a 0 or a 1 is sent and a 0 or a 1 is received
Assume that a 0 is sent with a given probability P(0) (and a 1 with probability P(1) = 1 − P(0))
The link is noisy: if a 0 is sent, a 1 is received with a given crossover probability, and
if a 1 is sent, a 0 is received with a given (possibly different) crossover probability
What is the probability that the received bit is in error?
∙ We can represent the link model by a probability transition diagram
(Figure: transition diagram from bit sent to bit received; the input probabilities P(0) and P(1) label the two inputs, and the transition probabilities P(0 | 0), P(1 | 0), P(0 | 1), P(1 | 1) label the edges)
Binary communication link
∙ We can represent the link model by a probability transition diagram
(Figure: the same probability transition diagram, repeated from the previous slide)
∙ Define the events: A = {0 is sent}, B = {0 is received}, E = {received bit is in error}
∙ To find the probability of error, we use the law of total probability:
P(E) = P(A) P(E | A) + P(Ac) P(E | Ac)
= P(A) P(Bc | A) + P(Ac) P(B | Ac),
and substituting the transition probabilities from the diagram gives the numerical value of P(E)
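A compact sketch of the error-probability computation, with illustrative placeholder link parameters (assumed values, not the slide's):

```python
# Binary link error probability via the law of total probability.
# Placeholder parameters for illustration only.
p0 = 0.2              # P(0 sent)
p_1_given_0 = 0.1     # P(1 received | 0 sent)
p_0_given_1 = 0.05    # P(0 received | 1 sent)

p_error = p0 * p_1_given_0 + (1 - p0) * p_0_given_1
print(p_error)   # 0.06 for these placeholders
```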
Bayes rule
∙ Let A1, A2, . . . , An be nonzero probability events (the causes) that partition Ω,
and let B be a nonzero probability event (the effect/observation)
∙ Given the a priori probabilities P(Ai) and
the conditional probabilities P(B | Ai), i = 1, . . . , n,
find the a posteriori probabilities P(Aj | B), j = 1, . . . , n
∙ From the definition of conditional probability,
P(Aj | B) = P(B, Aj)/P(B) = P(Aj) P(B | Aj)/P(B)
By the law of total probability,
P(B) = ∑ⁿᵢ₌₁ P(Ai) P(B | Ai)
Substituting, we obtain Bayes rule:
P(Aj | B) = P(Aj) P(B | Aj) / ∑ⁿᵢ₌₁ P(Ai) P(B | Ai), for j = 1, 2, . . . , n
(Portrait: Thomas Bayes, c. 1701–1761)
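A small generic helper implementing Bayes rule over a partition (a sketch; the function name and example numbers are illustrative):

```python
def posterior(priors, likelihoods):
    """Bayes rule: return P(A_j | B) for all j, given priors P(A_j)
    and likelihoods P(B | A_j) over a partition A_1, ..., A_n."""
    p_b = sum(p * l for p, l in zip(priors, likelihoods))   # law of total probability
    return [p * l / p_b for p, l in zip(priors, likelihoods)]

# Example: two causes with priors 0.3 / 0.7 and likelihoods 0.9 / 0.2.
print(posterior([0.3, 0.7], [0.9, 0.2]))   # [~0.658, ~0.342]
```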
Disease diagnosis
∙ Recall that A is the event a person has the disease and B is the event that he tests
positive, where P(A) = 0.01, P(B | A) = 0.99, P(B | Ac) = 0.01
What is the probability that the person has the disease given he tested positive?
∙ We want to find P(A | B), so we use Bayes rule:
P(A | B) = P(A) P(B | A) / (P(A) P(B | A) + P(Ac) P(B | Ac))
= 0.01 × 0.99 / (0.01 × 0.99 + 0.99 × 0.01) = 0.5
Versus the a priori probability of 0.01, so the test increased the probability a lot
But a positive result is only as informative as flipping a fair coin, even though the test is 99% accurate!
What test accuracy would we need to make a positive result a reliable diagnosis?
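One way to explore that last question numerically; the 0.95 threshold below is just an assumed target for illustration:

```python
# Posterior P(A | positive test) as a function of test accuracy,
# keeping the prevalence fixed at 0.01. The 0.95 target is an assumption.
prevalence = 0.01

def posterior_positive(accuracy):
    tp = prevalence * accuracy                 # P(A) P(B | A)
    fp = (1 - prevalence) * (1 - accuracy)     # P(A^c) P(B | A^c)
    return tp / (tp + fp)

for acc in (0.99, 0.999, 0.9995):
    print(acc, round(posterior_positive(acc), 3))
# 0.99 -> 0.5, 0.999 -> ~0.91, 0.9995 -> ~0.953 (crosses the assumed 0.95 target)
```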
Disease diagnosis
∙ Recall that A is the event a person has the disease and B is the event that he tests
positive, where P(A) = 0.01, P(B | A) = 0.99, P(B | Ac) = 0.01
What is the probability that the person has the disease given he tested negative?
∙ We want to find P(A | Bc), so we again use Bayes rule:
P(A | Bc) = P(A) P(Bc | A) / (P(A) P(Bc | A) + P(Ac) P(Bc | Ac))
= 0.01 × 0.01 / (0.01 × 0.01 + 0.99 × 0.99) ≈ 0.0001
Versus the a priori probability of 0.01, so the test decreased the probability a lot
Independence
∙ Sometimes knowing that B occurred doesn’t change the probability of A, i.e.,
P(A | B) = P(A)
In this case we say that the two events are statistically independent
∙ Examples:
Flip a fair coin and get heads; the probability of heads on the next flip is still 1/2
You are told Alice's birthday; the probability that Bob was born on that same day is still 1/365
∙ From the definition of conditional probability,
P(A | B) = P(A, B)/P(B)
Hence, an equivalent definition of independence is
P(A, B) = P(A) P(B)
It also follows that P(B | A) = P(B)
Independence
∙ Suppose A and B are independent, i.e., P(A, B) = P(A) P(B),
are Ac and Bc also independent?
∙ Intuitively, the answer should be YES
∙ Let’s verify:
P(Ac, Bc) = P((A ∪ B)c)    (De Morgan’s law)
= 1 − P(A ∪ B)
= 1 − (P(A) + P(B) − P(A, B))    (inclusion–exclusion)
= 1 − (P(A) + P(B) − P(A) P(B))    (independence)
= (1 − P(A)) (1 − P(B)) = P(Ac) P(Bc),
hence Ac and Bc are indeed independent
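The algebra can also be spot-checked numerically for any pair of independent events; a tiny sketch with arbitrary example probabilities:

```python
# Spot-check: if P(A,B) = P(A)P(B), then P(A^c,B^c) = P(A^c)P(B^c).
p_a, p_b = 0.3, 0.6
p_ab = p_a * p_b                              # independence assumed
p_acbc = 1 - (p_a + p_b - p_ab)               # De Morgan + inclusion-exclusion
print(p_acbc, (1 - p_a) * (1 - p_b))          # both 0.28
```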
∙ Two experiments are said to be independent if for every event A from the first
experiment and B from the second experiment, P(A, B) = P(A) P(B)
Independence for more than two events
∙ In general, A1, . . . , An are mutually independent if for every subset S ⊆ {1, . . . , n},
P(∩i∈S Ai) = ∏i∈S P(Ai)
∙ For example, for events A1, A2, A3 this requires:
P(A1, A2) = P(A1) P(A2),
P(A1, A3) = P(A1) P(A3),
P(A2, A3) = P(A2) P(A3), and
P(A1, A2, A3) = P(A1) P(A2) P(A3)
Independence for more than two events
∙ Note that P(A1, . . . , An) = ∏ⁿᵢ₌₁ P(Ai) alone is not sufficient for independence
∙ Example: Roll two fair dice independently. Define the events
A = {1st die = 1, 2, or 3}, B = {1st die = 3, 4, or 5}, C = {Sum of outcomes = 9}
Are A, B, and C independent?
∙ Since the dice are fair, P(A) = P(B) = 1/2
The event C = {(3, 6), (4, 5), (5, 4), (6, 3)}. Since the two rolls are independent, P(C) = 4/36 = 1/9
Since A ∩ B ∩ C = {(3, 6)}, P(A, B, C) = 1/36 = P(A) P(B) P(C)
But P(A, B) = 6/36 = 1/6 ≠ 1/4 = P(A) P(B), thus A, B, and C are not independent!
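This can be verified by enumerating the 36 equally likely outcomes (a minimal sketch using the events as stated above):

```python
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))      # two fair dice, 36 outcomes
P = lambda ev: Fraction(len(ev), len(omega))

A = [w for w in omega if w[0] in (1, 2, 3)]
B = [w for w in omega if w[0] in (3, 4, 5)]
C = [w for w in omega if sum(w) == 9]
ABC = [w for w in omega if w in A and w in B and w in C]
AB = [w for w in omega if w in A and w in B]

print(P(ABC) == P(A) * P(B) * P(C))   # True:  1/36 = 1/2 * 1/2 * 1/9
print(P(AB) == P(A) * P(B))           # False: 1/6 != 1/4
```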
Independence for more than two events
∙ Also, independence of all strict subsets of the events does not necessarily imply mutual independence
∙ Example: Flip a fair coin twice independently. Define the events:
A: First toss is H
B: Second toss is H
C: First and second tosses have different outcomes
Are A, B, C independent?
∙ Clearly, A, B are independent, A, C are independent, and B, C are independent
But P(A, B, C) = 0 ≠ P(A) P(B) P(C) = 1/8,
since if you know A and B, you know that C could not have occurred
Coupon collector problem
∙ A food company includes a baseball player card in each cereal box it sells.
Assume k players and each box contains a randomly and independently chosen
card of one of them (with replacement). A fan buys n cereal boxes,
what is the probability that she gets Babe Ruth’s card?
Let Ei be the event that she gets Ruth’s card in cereal box i ∈ {1, 2, . . . , n}
We want to find P(∪ⁿᵢ₌₁ Ei)
Since the card selection is random and with replacement, P(Ei) = 1/k
This is not enough information to find the probability of the union
Let’s try to find the probability of the complementary event P(∩ⁿᵢ₌₁ Eiᶜ) (De Morgan’s law)
Since the Ei’s are independent, so are the Eiᶜ’s, hence
P(∩ⁿᵢ₌₁ Eiᶜ) = ∏ⁿᵢ₌₁ P(Eiᶜ) = ∏ⁿᵢ₌₁ (1 − P(Ei)) = (1 − 1/k)ⁿ
Hence, P(∪ⁿᵢ₌₁ Ei) = 1 − P(∩ⁿᵢ₌₁ Eiᶜ) = 1 − (1 − 1/k)ⁿ
For example, given k and a desired value of P(∪ⁿᵢ₌₁ Ei), we can solve this expression for the number of boxes n needed (see the sketch below)
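A short sketch that evaluates this formula and solves for the required n; the values of k and the target probability below are assumed for illustration:

```python
from math import ceil, log

# P(get the card in n boxes) = 1 - (1 - 1/k)^n.
# k and the target probability below are illustrative assumptions.
k, target = 100, 0.5

p = lambda n: 1 - (1 - 1 / k) ** n
n_needed = ceil(log(1 - target) / log(1 - 1 / k))
print(n_needed, p(n_needed))   # 69 boxes give probability ≈ 0.5 when k = 100
```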
Summary
∙ Conditional probability captures how prior information (the occurrence of another event) changes the probability of an event
∙ The chain rule provides two important methods for computing probabilities:
Sequential (tree) method
Law of total probability method
∙ Bayes rule, basis of statistical inference
∙ Independence of events