01-Bayes-All-Handout Prob
Conditional probability
Bayes’ Theorem
Independence
Rough syllabus:
Introduction to probability: 1 lecture
Discrete and continuous random variables: 6 lectures
Moments and limit theorems: 3 lectures
Applications/statistics: 2 lectures
Recommended reading:
Ross, S.M. (2014). A First Course in Probability. Pearson (9th ed.).
Dekking, F.M., et al. (2005). A Modern Introduction to Probability and Statistics. Springer.
Bertsekas, D.P. & Tsitsiklis, J.N. (2008). Introduction to Probability. Athena Scientific.
Grimmett, G. & Welsh, D. (2014). Probability: An Introduction. Oxford University Press (2nd ed.).
Machine learning: use probability to compute predictions about and from data.
Other fields that rely on probability: finance, medicine, computer science, mathematics, ...
Prerequisite background
Set theory
Counting: product rule, sum rule, inclusion-exclusion
Combinatorics: permutations
Probability space: sample space, event space
Axioms
Union bound
Conditional probability
Consider an experiment with sample space S, and two events E and F with P[F] > 0.
Then the (conditional) probability of event E given that F has occurred,
denoted P[E∣F], is defined by

P[E∣F] = P[E ∩ F] / P[F] = P[EF] / P[F]
Example
Two dice are rolled, yielding values D1 and D2. Let E be the event that
D1 + D2 = 4.
1. What is P [ E ]?
2. Let event F be D1 = 2. What is P [ E∣F ]?
Answer
1. E = {(1, 3), (2, 2), (3, 1)}, so P[E] = 3/36 = 1/12.
2. P[E∣F] = P[EF] / P[F] = (1/36) / (6/36) = 1/6.
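The example can be checked by enumerating all 36 equally likely outcomes of two fair dice; a quick sketch:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

E = {(d1, d2) for d1, d2 in outcomes if d1 + d2 == 4}  # D1 + D2 = 4
F = {(d1, d2) for d1, d2 in outcomes if d1 == 2}       # D1 = 2

p_E = Fraction(len(E), len(outcomes))        # 3/36 = 1/12
p_F = Fraction(len(F), len(outcomes))        # 6/36 = 1/6
p_EF = Fraction(len(E & F), len(outcomes))   # 1/36

p_E_given_F = p_EF / p_F  # definition of conditional probability

print(p_E, p_E_given_F)   # 1/12 1/6
```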
Chain rule
Rearranging the definition of conditional probability gives us:
P [ EF ] = P [ E∣F ] P [ F ]
Multiplication rule

P[E1 E2 ⋯ En] = P[E1] P[E2∣E1] P[E3∣E1 E2] ⋯ P[En∣E1 ⋯ En−1]

For the events above:

P[E3∣E1 E2] = 1 − 24/50 = 26/50
P[E4∣E1 E2 E3] = 1 − 36/49 = 13/49

Thus:

P[E1 E2 E3 E4] = (39 ⋅ 26 ⋅ 13) / (51 ⋅ 50 ⋅ 49) ≈ 0.105
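As a sanity check, the chained product can be evaluated with exact fractions. Only the conditional probabilities stated above are used; P[E1] = 1 and P[E2∣E1] = 39/51 are inferred from the factors of the final product:

```python
from fractions import Fraction

# Multiplication rule applied step by step
# (P[E1] = 1 and P[E2|E1] = 39/51 inferred from the final product).
p2 = 1 - Fraction(12, 51)   # P[E2 | E1]       = 39/51
p3 = 1 - Fraction(24, 50)   # P[E3 | E1 E2]    = 26/50
p4 = 1 - Fraction(36, 49)   # P[E4 | E1 E2 E3] = 13/49

p = 1 * p2 * p3 * p4        # chain the conditional probabilities
print(p, float(p))          # 2197/20825, about 0.105
```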
Intuition:
We want to know the probability of E. There are two scenarios, F and Fᶜ. If we
know their probabilities and the probability of E conditioned on each scenario,
we can compute the probability of E.
Let event E = "dead bulb is picked", and F1 = "bulb is picked from first
box", F2 = "bulb is picked from second box" and F3 = "bulb is picked
from third box". We know:
P[E∣F1] = 4/10,  P[E∣F2] = 1/6,  P[E∣F3] = 3/8

We need to compute P[E], and we know that P[Fi] = 1/3:

P[E] = ∑_{i=1}^{3} P[E∣Fi] P[Fi] = (4/10)(1/3) + (1/6)(1/3) + (3/8)(1/3) = 113/360 ≈ 0.31

using the Law of Total Probability. Note that all events Fi must be
mutually exclusive (non-overlapping) and exhaustive (their union is the
complete sample space).
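The Law of Total Probability here is just a weighted sum, which can be computed exactly:

```python
from fractions import Fraction

# Law of Total Probability over the three boxes.
p_E_given_F = [Fraction(4, 10), Fraction(1, 6), Fraction(3, 8)]
p_F = [Fraction(1, 3)] * 3  # each box equally likely to be picked

p_E = sum(l * p for l, p in zip(p_E_given_F, p_F))
print(p_E, float(p_E))      # 113/360, about 0.31
```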
Example
60% of all email in 2022 is spam. 20% of spam contains the word
"Dear". 1% of non-spam contains the word "Dear". What is the
probability that an email is spam given it contains the word "Dear"?
Answer
Let F be the event that an email is spam and E the event that it contains the
word "Dear". Then P[F] = 0.6, P[E∣F] = 0.2 and P[E∣Fᶜ] = 0.01.
Compute P[F∣E]:

P[F∣E] = P[E∣F] P[F] / (P[E∣F] P[F] + P[E∣Fᶜ] P[Fᶜ])
       = (0.2)(0.6) / ((0.2)(0.6) + (0.01)(0.4)) ≈ 0.968
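The same computation in code, with the denominator expanded via the Law of Total Probability:

```python
# Bayes' theorem for the spam example.
p_spam = 0.6             # P[F]
p_dear_given_spam = 0.2  # P[E|F]
p_dear_given_ham = 0.01  # P[E|F^c]

# P[E] by the Law of Total Probability.
p_dear = p_dear_given_spam * p_spam + p_dear_given_ham * (1 - p_spam)
p_spam_given_dear = p_dear_given_spam * p_spam / p_dear

print(round(p_spam_given_dear, 3))  # 0.968
```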
P[F∣E] = P[E∣F] ⋅ P[F] / P[E]

(posterior = likelihood ⋅ prior / normalisation constant)

F: hypothesis, E: evidence
P[F]: "prior probability" of hypothesis
P[E∣F]: probability of evidence given hypothesis (likelihood)
P[E]: normalisation constant, calculated by making sure that probabilities of
all outcomes sum to 1 (they are "normalised")
True condition:

Total population       Condition positive (F)   Condition negative (Fᶜ)
Predicted condition    True positive            False positive
positive (E)           P[E∣F]                   P[E∣Fᶜ]
Predicted condition    False negative           True negative
negative (Eᶜ)          P[Eᶜ∣F]                  P[Eᶜ∣Fᶜ]
33% chance of having COVID-19 after testing positive may seem surprising.
But the space of facts is now conditioned on a positive test result (people who test
positive and have COVID-19 and people who test positive and don’t have
COVID-19).
              F (yes disease)     Fᶜ (no disease)
E (test +)    True positive       False positive
              P[E∣F] = 0.98       P[E∣Fᶜ] = 0.01
Eᶜ (test −)   False negative      True negative
              P[Eᶜ∣F] = 0.02      P[Eᶜ∣Fᶜ] = 0.99
But what is the chance of having COVID-19 if you take the test and it comes back negative?

P[F∣Eᶜ] = P[Eᶜ∣F] P[F] / (P[Eᶜ∣F] P[F] + P[Eᶜ∣Fᶜ] P[Fᶜ]) ≈ 0.0001
We update our beliefs with Bayes’ theorem:
I have a 0.5% chance of having COVID-19. I take the test:
Test is positive: I now have a 33% chance of having COVID-19.
Test is negative: I now have a 0.01% chance of having COVID-19.
So it makes sense to take the test.
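Both updates can be reproduced numerically, using the prior P[F] = 0.005 and the test characteristics from the table:

```python
# Posterior probability of disease after a positive / negative test.
prior = 0.005  # P[F]: 0.5% prior chance of having COVID-19
sens = 0.98    # P[E|F]: true positive rate
fpr = 0.01     # P[E|F^c]: false positive rate

# Bayes' theorem for each test outcome.
post_pos = sens * prior / (sens * prior + fpr * (1 - prior))
post_neg = (1 - sens) * prior / ((1 - sens) * prior + (1 - fpr) * (1 - prior))

print(round(post_pos, 2), round(post_neg, 6))  # 0.33 0.000102
```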
Independence
Two events E and F are independent if and only if

P[EF] = P[E] P[F]

More generally, events E1, …, En are (mutually) independent if for every subset
{Ea, Eb, …, Er} of them

P[Ea Eb ⋯ Er] = P[Ea] P[Eb] ⋯ P[Er]

For three events E, F, G this requires:

P[EFG] = P[E] P[F] P[G]
P[EF] = P[E] P[F]
P[EG] = P[E] P[G]
P[FG] = P[F] P[G]

Equivalently (for P[F] > 0), E and F are independent if and only if

P[E∣F] = P[E]

Proof:

P[E∣F] = P[EF] / P[F] = P[E] P[F] / P[F] = P[E]
Independence of complement
If events E and F are independent, then E and Fᶜ are independent:

P[EFᶜ] = P[E] P[Fᶜ]

Proof:

P[EFᶜ] = P[E] − P[EF] = P[E] − P[E] P[F] = P[E](1 − P[F]) = P[E] P[Fᶜ]
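The identity can be checked concretely with the two-dice events used in the next example (E: D1 = 1, F: D2 = 6):

```python
from fractions import Fraction
from itertools import product

# If E and F are independent, so are E and F^c: verify by enumeration.
outcomes = set(product(range(1, 7), repeat=2))
E = {o for o in outcomes if o[0] == 1}   # D1 = 1
F = {o for o in outcomes if o[1] == 6}   # D2 = 6
Fc = outcomes - F                        # complement of F

def prob(A):
    return Fraction(len(A), len(outcomes))

assert prob(E & F) == prob(E) * prob(F)    # E, F independent
assert prob(E & Fc) == prob(E) * prob(Fc)  # E, F^c independent too
```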
Example
Each roll of a die is an independent trial. Two rolls yield values D1 and
D2. Let E: D1 = 1, F: D2 = 6 and G: D1 + D2 = 7 (thus
G = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}).
1. Are E and F independent?
2. Are E and G independent?
3. Are E, F , G independent?
Answer
1. Yes, since P[E] = 1/6, P[F] = 1/6 and P[EF] = 1/36.
2. Yes, since P[E] = 1/6, P[G] = 1/6 and P[EG] = 1/36.
3. No, since P[EFG] = 1/36 ≠ (1/6)(1/6)(1/6).
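Enumeration confirms that E, F, G are pairwise independent but not mutually independent:

```python
from fractions import Fraction
from itertools import product

# Pairwise vs. mutual independence for the dice events.
outcomes = set(product(range(1, 7), repeat=2))
E = {o for o in outcomes if o[0] == 1}        # D1 = 1
F = {o for o in outcomes if o[1] == 6}        # D2 = 6
G = {o for o in outcomes if sum(o) == 7}      # D1 + D2 = 7

def prob(A):
    return Fraction(len(A), len(outcomes))

print(prob(E & F) == prob(E) * prob(F))                 # True  (pairwise)
print(prob(E & G) == prob(E) * prob(G))                 # True  (pairwise)
print(prob(E & F & G) == prob(E) * prob(F) * prob(G))   # False (not mutual)
```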
Conditional independence
Two events E and F are called conditionally independent given a third
event G if
P [ EF ∣G ] = P [ E∣G ] P [ F ∣G ]
Or equivalently (when P[F∣G] > 0),
P[E∣FG] = P[E∣G]
Notice that:
Dependent events can become conditionally independent.
Independent events can become conditionally dependent.
Knowing when conditioning breaks or creates independence is a big part
of building complex probabilistic models.
Example
Each roll of a die is an independent trial. We have two rolls of D1 and
D2 . Let event E ∶ D1 = 1, F ∶ D2 = 6 and event G ∶ D1 + D2 = 7 (thus
G = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}).
1. Are E and F independent?
2. Are E and F independent given G?
Answer
1. Yes, since P[E] = 1/6, P[F] = 1/6 and P[EF] = 1/36.
2. No, since P[E∣G] = 1/6 and P[F∣G] = 1/6, but
P[EF∣G] = 1/6 ≠ P[E∣G] P[F∣G].
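The conditional probabilities can be computed by counting within G, showing that conditioning on G makes the independent events E and F dependent:

```python
from fractions import Fraction
from itertools import product

# Independent events can become conditionally dependent.
outcomes = set(product(range(1, 7), repeat=2))
E = {o for o in outcomes if o[0] == 1}     # D1 = 1
F = {o for o in outcomes if o[1] == 6}     # D2 = 6
G = {o for o in outcomes if sum(o) == 7}   # D1 + D2 = 7

def cond(A, B):
    # P[A | B] by counting outcomes within B
    return Fraction(len(A & B), len(B))

print(cond(E, G), cond(F, G))  # 1/6 1/6
print(cond(E & F, G))          # 1/6, not (1/6)*(1/6)
```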
Conditioning on event G: