07 Probability Review
• Events
– discrete random variables, continuous random variables, compound events
• Axioms of probability
– What defines a reasonable theory of uncertainty
• Independent events
• Conditional probabilities
• Bayes rule and beliefs
• Joint probability distribution
• Expectations
• Independence, Conditional independence
Random Variables
• Informally, A is a random variable if
– A denotes something about which we are uncertain
– perhaps the outcome of a randomized experiment
• Examples
A = True if a randomly drawn person from our class is female
A = The hometown of a randomly drawn person from our class
A = True if two randomly drawn persons from our class have same birthday
Sample space
[Figure: the sample space of all possible worlds; A is true in the worlds inside the reddish oval and ~A holds in the rest, so P(A) = area of the oval.]
A useful theorem
• From the axioms 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, and
  P(A or B) = P(A) + P(B) - P(A and B), it follows that:
  P(A) = P(A ^ B) + P(A ^ ~B)
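The identities above can be checked numerically by enumerating a small sample space. The four worlds and their probabilities below are an illustrative assumption, not from the slides:

```python
# Check P(A or B) = P(A) + P(B) - P(A and B) and
# P(A) = P(A ^ B) + P(A ^ ~B) on a toy sample space of four worlds.
worlds = [
    # (A, B, probability of this world)
    (True,  True,  0.2),
    (True,  False, 0.3),
    (False, True,  0.1),
    (False, False, 0.4),
]

p_a = sum(p for a, b, p in worlds if a)
p_b = sum(p for a, b, p in worlds if b)
p_a_and_b = sum(p for a, b, p in worlds if a and b)
p_a_and_not_b = sum(p for a, b, p in worlds if a and not b)
p_a_or_b = sum(p for a, b, p in worlds if a or b)

# inclusion-exclusion
assert abs(p_a_or_b - (p_a + p_b - p_a_and_b)) < 1e-12
# P(A) = P(A ^ B) + P(A ^ ~B)
assert abs(p_a - (p_a_and_b + p_a_and_not_b)) < 1e-12
```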
Definition of Conditional Probability
P(A ^ B)
P(A|B) = -----------
P(B)
Bayes rule:

P(A|B) = P(B|A) P(A) / [ P(B|A) P(A) + P(B|~A) P(~A) ]

Conditioning on background knowledge X:

P(A|B, X) = P(B|A, X) P(A|X) / P(B|X)
Applying Bayes Rule
P(A|B) = P(B|A) P(A) / [ P(B|A) P(A) + P(B|~A) P(~A) ]
A = you have covid, B = you just coughed
Assume:
P(A) = 0.05
P(B|A) = 0.80
P(B| ~A) = 0.20
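Plugging the slide's numbers into Bayes rule, with P(B) obtained from the law of total probability:

```python
# Bayes rule with the slide's numbers (A = you have covid, B = you coughed).
p_a = 0.05              # P(A): prior probability of covid
p_b_given_a = 0.80      # P(B|A)
p_b_given_not_a = 0.20  # P(B|~A)

# P(B) by the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# P(A|B) by Bayes rule
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # 0.174
```

So even after the cough, the posterior probability of covid is only about 0.17, because the prior P(A) = 0.05 is small.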
Assume:
P(A) = 0.1
P(B|A) = 0.60
Instead of learning a function F: X → Y, learn the conditional distribution P(Y | X).
The Joint Distribution
Example: Boolean variables A, B, C

Recipe for making a joint distribution of M variables:

A B C  Prob
0 0 0  0.30
0 0 1  0.05
0 1 0  0.10
0 1 1  0.05
1 0 0  0.05
1 0 1  0.10
1 1 0  0.25
1 1 1  0.10
P(E1 | E2) = P(E1 ^ E2) / P(E2)
           = [ sum of P(row) over rows matching E1 and E2 ]
             / [ sum of P(row) over rows matching E2 ]
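Answering a conditional query from the joint is just two row-sums. A sketch using the A, B, C table above, with P(A=1 | B=1) as an example query:

```python
# Conditional probabilities by summing rows of the joint distribution
# (the Boolean A, B, C table from the slide).
joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05,
    (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10,
    (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}

def prob(match):
    """Sum P(row) over the rows satisfying the predicate `match`."""
    return sum(p for row, p in joint.items() if match(row))

# P(A=1 | B=1) = P(A=1 ^ B=1) / P(B=1)
p = prob(lambda r: r[0] == 1 and r[1] == 1) / prob(lambda r: r[1] == 1)
print(round(p, 2))  # 0.7
```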
Equivalently, P(W | G, H)
Are we done?
This sounds like the solution to learning F: X → Y, or P(Y | X).
Maximum likelihood estimate:

θ̂ = α1 / (α1 + α0)

where α1 is the number of observed X=1 outcomes and α0 the number of X=0 outcomes.
Estimating θ = P(X=1)
Case A:
100 flips: 51 Heads (X=1), 49 Tails (X=0)
Case B:
3 flips: 2 Heads (X=1), 1 Tails (X=0)
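Plugging the counts from Cases A and B into the maximum-likelihood estimate α1 / (α1 + α0):

```python
# MLE of theta = P(X=1): the fraction of observed 1s (heads).
def mle(heads, tails):
    return heads / (heads + tails)

print(mle(51, 49))           # Case A: 0.51
print(round(mle(2, 1), 3))   # Case B: 0.667
```

Note that Case B yields an estimate just as confidently as Case A, even though it rests on only 3 flips; this is the motivation for bringing in a prior.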
Estimating θ = P(X=1)
Case C: (online learning)
• keep flipping, want single learning algorithm
that gives reasonable estimate after each flip
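One simple scheme for Case C is to keep running counts and re-estimate θ after every flip. A minimal sketch (the class name and interface are illustrative, not from the slides):

```python
# Online estimate of theta = P(X=1): update running counts after each flip
# and return the current estimate every time.
class OnlineEstimator:
    def __init__(self):
        self.ones = 0
        self.total = 0

    def update(self, x):
        """Observe one flip (x is 0 or 1); return the current estimate."""
        self.ones += x
        self.total += 1
        return self.ones / self.total

est = OnlineEstimator()
for flip in [1, 1, 0, 0, 1]:   # e.g. the data D = {1 1 0 0 1} from the slides
    theta_hat = est.update(flip)
print(theta_hat)  # 0.6
```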
Principles for Estimating Probabilities
Data D: {1 1 0 0 1}
[C. Guestrin]
and MAP estimate is therefore
Some terminology
• Likelihood function: P(data | θ )
• Prior: P(θ)
• Posterior: P(θ | data)
Example:
X   P(X)
0   0.3
1   0.2
2   0.5
Expected values
Given a discrete random variable X, the expected value of X, written E[X], is

E[X] = Σ_x x · P(X = x)
Remember:
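Applying the definition of E[X] to the example distribution over X ∈ {0, 1, 2} above:

```python
# Expected value E[X] = sum over x of x * P(X=x),
# using the example distribution from the slide.
pX = {0: 0.3, 1: 0.2, 2: 0.5}

EX = sum(x * p for x, p in pX.items())
print(EX)  # 0*0.3 + 1*0.2 + 2*0.5 = 1.2
```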