
Introduction to Probability

Lecture 1: Conditional probabilities and Bayes’ theorem


Mateja Jamnik, Thomas Sauerwald

University of Cambridge, Department of Computer Science and Technology


email: {mateja.jamnik,thomas.sauerwald}@cl.cam.ac.uk
Outline

Logistics, motivation, background

Conditional probability

Bayes’ Theorem

Independence



Lecturers

Mateja Jamnik Thomas Sauerwald



Course logistics

Rough syllabus:
Introduction to probability: 1 lecture
Discrete and continuous random variables: 6 lectures
Moments and limit theorems: 3 lectures
Applications/statistics: 2 lectures

Recommended reading:
Ross, S.M. (2014). A First course in probability. Pearson (9th ed.).
Dekking, F.M., et al. (2005). A modern introduction to probability and
statistics. Springer.
Bertsekas, D.P. & Tsitsiklis, J.N. (2008). Introduction to probability. Athena
Scientific.
Grimmett, G. & Welsh, D. (2014). Probability: an Introduction. Oxford
University Press (2nd ed.).



Why probability?

Gives us mathematical tools to deal with uncertain events.

It is used everywhere, especially in applications of machine learning.

Machine learning: use probability to compute predictions about and from data.

Probability is not statistics:


Both about random processes.
Probability: logically self-contained, few rules for computing, one correct answer.
Statistics: messier, more art, get experimental data and try to draw
probabilistic conclusions, no single correct answer.



Applications of probability

Ranking websites, e.g. using a link (adjacency) matrix such as

A =
⎛ 0 0 0 0 0 1 1 0 0 1 ⎞
⎜ 0 0 0 0 0 0 1 1 0 1 ⎟
⎜ 0 0 0 0 0 1 1 0 1 0 ⎟
⎜ 0 0 0 0 0 1 0 1 1 0 ⎟
⎜ 0 0 0 0 0 0 0 1 1 1 ⎟
⎜ 1 0 1 1 0 0 0 0 0 0 ⎟
⎜ 1 1 1 0 0 0 0 0 0 0 ⎟
⎜ 0 1 0 1 1 0 0 0 0 0 ⎟
⎜ 0 0 1 1 1 0 0 0 0 0 ⎟
⎝ 1 1 0 0 1 0 0 0 0 0 ⎠

Matching [figure omitted]

[Figure: fields building on probability: finance, medicine, computer science, mathematics, biology, physics, data mining, deep learning, particle processes, ...]
Prerequisite background

Set theory
Counting: product rule, sum rule, inclusion-exclusion
Combinatorics: permutations
Probability space: sample space, event space
Axioms
Union bound

Revision material for the above can be found on the course website:


https://www.cl.cam.ac.uk/teaching/2324/IntroProb/



Outline

Logistics, motivation, background

Conditional probability

Bayes’ Theorem

Independence



Definition

Conditional probability
Consider an experiment with sample space S, and two events E and F .
Then, provided P [ F ] > 0, the (conditional) probability of event E given that F has
occurred (denoted P [ E∣F ]) is defined by
P [ E∣F ] = P [ E ∩ F ] / P [ F ] = P [ EF ] / P [ F ]

Sample space: all possible outcomes consistent with F (i.e., S ∩ F = F )


Event space: all outcomes in E consistent with F (i.e., E ∩ F )
Note: we assume that all outcomes are equally likely
P [ E∣F ] = (# outcomes in E ∩ F) / (# outcomes in F)
          = ((# outcomes in E ∩ F) / (# outcomes in S)) / ((# outcomes in F) / (# outcomes in S))
          = P [ E ∩ F ] / P [ F ]



Example

Example
Two dice are rolled, yielding values D1 and D2 . Let E be the event that
D1 + D2 = 4.
1. What is P [ E ]?
2. Let event F be D1 = 2. What is P [ E∣F ]?
Answer

1. ∣S∣ = 36, E = {(1, 3), (2, 2), (3, 1)}, thus P [ E ] = 3/36 = 1/12.
2. Conditioned on F, S = {(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6)}, E = {(2, 2)}, thus
P [ E∣F ] = 1/6.
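
Both answers can be checked by enumerating the 36 equally likely outcomes; the short Python sketch below is illustrative only and not part of the original handout.

from fractions import Fraction

# All 36 equally likely outcomes (D1, D2) of rolling two dice.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

E = [o for o in outcomes if o[0] + o[1] == 4]      # D1 + D2 = 4
F = [o for o in outcomes if o[0] == 2]             # D1 = 2
EF = [o for o in E if o in F]                      # E ∩ F

print(Fraction(len(E), len(outcomes)))             # P[E]   = 1/12
print(Fraction(len(EF), len(F)))                   # P[E|F] = 1/6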



Rules revisited

Chain rule
Rearranging the definition of conditional probability gives us:

P [ EF ] = P [ E∣F ] P [ F ]

Generalisation of the Chain rule:

Multiplication rule

P [ E1 E2 ⋯En ] = P [ E1 ] P [ E2 ∣E1 ] P [ E3 ∣E2 E1 ] ⋯P [ En ∣E1 ⋯En−1 ]



Example
Example
An ordinary deck of 52 playing cards is randomly divided into 4 piles of
13 cards each. What is the probability that each pile has exactly 1 ace?
Answer
Define:
E1 = ace♥ is in any one pile
E2 = ace♥ and ace♠ are in different piles
E3 = ace♥, ace♠ and ace♣ are in different piles
E4 = all aces are in different piles
P [ E1 E2 E3 E4 ] = P [ E1 ] P [ E2 ∣E1 ] P [ E3 ∣E1 E2 ] P [ E4 ∣E1 E2 E3 ]
We have P [ E1 ] = 1. For the rest we consider the complement, namely that the
next ace lands in the same pile as one of the previous aces, and thus have:
P [ E2 ∣E1 ] = 1 − 12/51
P [ E3 ∣E1 E2 ] = 1 − 24/50
P [ E4 ∣E1 E2 E3 ] = 1 − 36/49
Thus:
P [ E1 E2 E3 E4 ] = (39 ⋅ 26 ⋅ 13) / (51 ⋅ 50 ⋅ 49) ≈ 0.105



Outline

Logistics, motivation, background

Conditional probability

Bayes’ Theorem

Independence



Law of total probability

The law of total probability (a.k.a. Partition theorem)


For any event E and any event F with 0 < P [ F ] < 1,

P [ E ] = P [ EF ] + P [ EF^c ] = P [ E∣F ] P [ F ] + P [ E∣F^c ] P [ F^c ]

In general, for disjoint events F1, F2, . . . , Fn s.t. F1 ∪ F2 ∪ ⋯ ∪ Fn = S,

P [ E ] = ∑_{i=1}^{n} P [ E∣Fi ] P [ Fi ]

Intuition:
We want the probability of E. There are two scenarios, F and F^c. If we know the
probability of each scenario and the probability of E conditioned on each scenario,
we can compute the probability of E.



Lightbulb example
Example
There are 3 boxes each containing a different number of light bulbs.
The first box has 10 bulbs of which 4 are dead, the second has 6 bulbs
of which 1 is dead, and the third box has 8 bulbs of which 3 are dead.
What is the probability of a dead bulb being selected when a bulb is
chosen at random from one of the 3 boxes (each box has equal chance
of being picked)?
Answer

Let event E = "dead bulb is picked", and F1 = "bulb is picked from first
box", F2 = "bulb is picked from second box" and F3 = "bulb is picked
from third box". We know:
P [ E∣F1 ] = 4/10, P [ E∣F2 ] = 1/6, P [ E∣F3 ] = 3/8
We need to compute P [ E ], and we know that P [ Fi ] = 1/3:

P [ E ] = ∑_{i=1}^{3} P [ E∣Fi ] P [ Fi ] = (4/10)(1/3) + (1/6)(1/3) + (3/8)(1/3) = 113/360 ≈ 0.31
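
The exact value can be reproduced with a few lines of Python (a minimal sketch assuming, as in the question, that each box is chosen with probability 1/3; not part of the original handout).

from fractions import Fraction

# (dead bulbs, total bulbs) per box; each box is picked with probability 1/3.
boxes = [(4, 10), (1, 6), (3, 8)]
p_box = Fraction(1, 3)

p_dead = sum(p_box * Fraction(dead, total) for dead, total in boxes)
print(p_dead, float(p_dead))    # 113/360 ≈ 0.3139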



Bayes’ theorem
How many spam emails contain the word "Dear"?
P [ E∣F ] = P [ "Dear"∣spam ]
But what is the probability that an email containing "Dear" is spam?
P [ F ∣E ] = P [ spam∣"Dear" ]
Bayes’ theorem
For any events E and F where P [ E ] > 0 and P [ F ] > 0,

P [ F ∣E ] = P [ E∣F ] P [ F ] / P [ E ]

and in expanded form,

P [ F ∣E ] = P [ E∣F ] P [ F ] / (P [ E∣F ] P [ F ] + P [ E∣F^c ] P [ F^c ]) = P [ E∣F ] P [ F ] / ∑_{i=1}^{n} P [ E∣Fi ] P [ Fi ]

using the law of total probability. Note that the events Fi must be mutually
exclusive (non-overlapping) and exhaustive (their union is the complete sample
space).



Example

Example
60% of all email in 2022 is spam. 20% of spam contains the word
"Dear". 1% of non-spam contains the word "Dear". What is the
probability that an email is spam given it contains the word "Dear"?
Answer

Let event E ="Dear", event F = spam.


P [ F ] = 0.6, thus P [ F^c ] = 0.4.
P [ E∣F ] = 0.2.
P [ E∣F^c ] = 0.01.
Compute P [ F ∣E ]:

P [ F ∣E ] = P [ E∣F ] P [ F ] / (P [ E∣F ] P [ F ] + P [ E∣F^c ] P [ F^c ])
          = (0.2)(0.6) / ((0.2)(0.6) + (0.01)(0.4)) ≈ 0.968
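
The same computation, written as a small reusable function (an illustrative sketch; the function and variable names are ours, not from the lecture):

def bayes_posterior(prior, likelihood, likelihood_complement):
    """P[F|E] for a binary hypothesis F: Bayes' theorem with the
    law of total probability in the denominator."""
    evidence = likelihood * prior + likelihood_complement * (1 - prior)
    return likelihood * prior / evidence

# P[spam] = 0.6, P["Dear" | spam] = 0.2, P["Dear" | not spam] = 0.01
print(bayes_posterior(prior=0.6, likelihood=0.2, likelihood_complement=0.01))  # ≈ 0.968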



Bayes’ terminology

P [ F ∣E ] = P [ E∣F ] ⋅ P [ F ] / P [ E ]

where P [ F ∣E ] is the posterior, P [ E∣F ] the likelihood, P [ F ] the prior, and P [ E ] the normalisation constant.

F : hypothesis, E: evidence
P [ F ]: "prior probability" of hypothesis
P [ E∣F ]: probability of evidence given hypothesis (likelihood)
P [ E ]: calculated by making sure that probabilities of all
outcomes sum to 1 (they are "normalised")



Confusion matrix (error matrix)

Used in classification tasks to summarise prediction errors.

                             True condition
Total population             Condition positive (F)       Condition negative (F^c)
Predicted condition          True positive                False positive
positive (E)                 P [ E∣F ]                    P [ E∣F^c ]
Predicted condition          False negative               True negative
negative (E^c)               P [ E^c∣F ]                  P [ E^c∣F^c ]
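
As a hypothetical illustration, the four cells can be generated from just the two conditional probabilities P [ E∣F ] and P [ E∣F^c ] (a sketch with our own names; the numbers are chosen to match the medical testing example that follows):

def confusion_matrix(p_pos_given_f, p_pos_given_not_f):
    """Return the four conditional cell probabilities of the 2x2 table."""
    return {
        "true positive   P[E|F]":     p_pos_given_f,
        "false positive  P[E|F^c]":   p_pos_given_not_f,
        "false negative  P[E^c|F]":   1 - p_pos_given_f,
        "true negative   P[E^c|F^c]": 1 - p_pos_given_not_f,
    }

for cell, p in confusion_matrix(0.98, 0.01).items():
    print(f"{cell}: {round(p, 2)}")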



Medical testing example
Example
A test is 98% effective at detecting the disease COVID-19 ("true
positive").
The test has a "false positive" rate of 1%.
0.5% of the population has COVID-19.
What is the likelihood you have COVID-19 if you test positive?
Answer

Let E: test positive, F : actually have COVID-19.


Need to find P [ F ∣E ].
We know:
P [ E∣F ] = 0.98
P [ E∣F^c ] = 0.01
P [ F ] = 0.005, thus P [ F^c ] = 0.995
Thus

P [ F ∣E ] = P [ E∣F ] P [ F ] / (P [ E∣F ] P [ F ] + P [ E∣F^c ] P [ F^c ])
          = (0.98)(0.005) / ((0.98)(0.005) + (0.01)(0.995)) ≈ 0.33
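
A sketch of the same arithmetic (illustrative only; the variable names are ours):

sensitivity = 0.98    # P[E|F], true positive rate
false_pos   = 0.01    # P[E|F^c], false positive rate
prevalence  = 0.005   # P[F]

p_pos = sensitivity * prevalence + false_pos * (1 - prevalence)   # P[E], total probability
print(sensitivity * prevalence / p_pos)                           # P[F|E] ≈ 0.33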



Bayesian intuition

A 33% chance of having COVID-19 after testing positive may seem surprisingly low.
But the sample space is now conditioned on a positive test result: it contains only
people who test positive and have COVID-19 and people who test positive and don’t
have COVID-19.
                      F (yes disease)                      F^c (no disease)
E (test positive)     True positive: P [ E∣F ] = 0.98      False positive: P [ E∣F^c ] = 0.01
E^c (test negative)   False negative: P [ E^c∣F ] = 0.02   True negative: P [ E^c∣F^c ] = 0.99

But what is the chance of having COVID-19 if you take the test and it comes back negative?

P [ F ∣E^c ] = P [ E^c∣F ] P [ F ] / (P [ E^c∣F ] P [ F ] + P [ E^c∣F^c ] P [ F^c ]) ≈ 0.0001
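
And the corresponding check for a negative result (continuing the sketch above, with the same assumed variable names):

sensitivity = 0.98    # P[E|F]
false_pos   = 0.01    # P[E|F^c]
prevalence  = 0.005   # P[F]

# P[E^c] = P[E^c|F] P[F] + P[E^c|F^c] P[F^c]
p_neg = (1 - sensitivity) * prevalence + (1 - false_pos) * (1 - prevalence)
print((1 - sensitivity) * prevalence / p_neg)                     # P[F|E^c] ≈ 0.0001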
We update our beliefs with Bayes’ theorem:
I have a 0.5% chance of having COVID-19. I take the test:
Test is positive: I now have a 33% chance of having COVID-19.
Test is negative: I now have a 0.01% chance of having COVID-19.
So it makes sense to take the test.



Outline

Logistics, motivation, background

Conditional probability

Bayes’ Theorem

Independence



Independent events

Independence
Two events E and F are independent if and only if

P [ EF ] = P [ E ] P [ F ]

Otherwise, they are called dependent events.


In general, n events E1 , E2 , . . . , En are mutually independent if for every
subset of these events with r elements (where r ≤ n) it holds that

P [ Ea Eb ⋯Er ] = P [ Ea ] P [ Eb ] ⋯P [ Er ]

Therefore for 3 events E, F , G to be independent, we must have

P [ EFG ] = P [ E ] P [ F ] P [ G ]
P [ EF ] = P [ E ] P [ F ]
P [ EG ] = P [ E ] P [ G ]
P [ FG ] = P [ F ] P [ G ]



Independence of complement

Notice an equivalent definition for independent events E and F (P [ F ] > 0)

P [ E∣F ] = P [ E ]

Proof:
P [ E∣F ] = P [ EF ] / P [ F ] = P [ E ] P [ F ] / P [ F ] = P [ E ]

Independence of complement
If events E and F are independent, then E and F^c are independent:

P [ EF^c ] = P [ E ] P [ F^c ]

Proof:
P [ EF^c ] = P [ E ] − P [ EF ] = P [ E ] − P [ E ] P [ F ] = P [ E ] (1 − P [ F ]) = P [ E ] P [ F^c ]



Example

Example
Each roll of a die is an independent trial. We have two rolls, with values D1 and
D2 . Let event E ∶ D1 = 1, F ∶ D2 = 6 and event G ∶ D1 + D2 = 7 (thus
G = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}).
1. Are E and F independent?
2. Are E and G independent?
3. Are E, F , G independent?
Answer

1. Yes, since P [ E ] = 1/6, P [ F ] = 1/6 and P [ EF ] = 1/36.
2. Yes, since P [ E ] = 1/6, P [ G ] = 1/6 and P [ EG ] = 1/36.
3. No, since P [ EFG ] = 1/36 ≠ (1/6)(1/6)(1/6).
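
These checks can be confirmed by brute-force enumeration of the 36 outcomes (an illustrative sketch, not part of the handout):

from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 equally likely rolls (D1, D2)

def prob(event):
    """Probability of an event under the uniform distribution on outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

E = lambda o: o[0] == 1           # D1 = 1
F = lambda o: o[1] == 6           # D2 = 6
G = lambda o: o[0] + o[1] == 7    # D1 + D2 = 7

print(prob(lambda o: E(o) and F(o)) == prob(E) * prob(F))                       # True
print(prob(lambda o: E(o) and G(o)) == prob(E) * prob(G))                       # True
print(prob(lambda o: E(o) and F(o) and G(o)) == prob(E) * prob(F) * prob(G))    # False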



Conditional independence

Conditional independence
Two events E and F are called conditionally independent given a third
event G if
P [ EF ∣G ] = P [ E∣G ] P [ F ∣G ]
Or equivalently (provided P [ FG ] > 0),
P [ E∣FG ] = P [ E∣G ]

Notice that:
Dependent events can become conditionally independent.
Independent events can become conditionally dependent.
Knowing when conditioning breaks or creates independence is a big part
of building complex probabilistic models.



Example revisited

Example
Each roll of a die is an independent trial. We have two rolls, with values D1 and
D2 . Let event E ∶ D1 = 1, F ∶ D2 = 6 and event G ∶ D1 + D2 = 7 (thus
G = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}).
1. Are E and F independent?
2. Are E and F independent given G?
Answer

1. Yes, since P [ E ] = 1/6, P [ F ] = 1/6 and P [ EF ] = 1/36.
2. No, since P [ E∣G ] = 1/6 and P [ F ∣G ] = 1/6, but
P [ EF ∣G ] = 1/6 ≠ P [ E∣G ] P [ F ∣G ] = 1/36.
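
Conditioning on G can be checked the same way by enumerating only the outcomes in G (a short illustrative sketch):

from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 equally likely rolls (D1, D2)

def cond_prob(event, given):
    """P[event | given] by counting outcomes in the reduced sample space."""
    reduced = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in reduced if event(o)), len(reduced))

E = lambda o: o[0] == 1
F = lambda o: o[1] == 6
G = lambda o: o[0] + o[1] == 7

lhs = cond_prob(lambda o: E(o) and F(o), G)       # P[EF|G] = 1/6
rhs = cond_prob(E, G) * cond_prob(F, G)           # (1/6)(1/6) = 1/36
print(lhs, rhs, lhs == rhs)                       # 1/6 1/36 False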



Summary of conditional probability

Conditioning on event G:

Name of rule               Original rule                               Conditional rule
1st axiom of probability   0 ≤ P [ E ] ≤ 1                             0 ≤ P [ E∣G ] ≤ 1
Complement                 P [ E ] = 1 − P [ E^c ]                     P [ E∣G ] = 1 − P [ E^c∣G ]
Chain rule                 P [ EF ] = P [ E∣F ] P [ F ]                P [ EF ∣G ] = P [ E∣FG ] P [ F ∣G ]
Bayes’ theorem             P [ F ∣E ] = P [ E∣F ] P [ F ] / P [ E ]    P [ F ∣EG ] = P [ E∣FG ] P [ F ∣G ] / P [ E∣G ]

