6 Probabilities
So far
• Binary classification
  • Linear classifier
  • E.g., infected with COVID-19 (y/n) based on symptoms
  • Train using the perceptron algorithm
    • Guaranteed to converge in the separable case
• Multiclass classification
  • Linear classifier
  • E.g., most likely disease based on symptoms
  • Train using the multiclass perceptron algorithm
    • Guaranteed to converge in the separable case
      (http://proceedings.mlr.press/v97/beygelzimer19a/beygelzimer19a-supp.pdf)
Generative vs Discriminative Models
• So far: discriminative classifiers
  • Assume some functional form for P(Y | X)
  • Estimate the parameters of P(Y | X) directly from training data
  • Find a decision boundary between the classes
  • E.g., k-NN, linear classifiers (perceptron)
  • Limited by model assumptions (curse of dimensionality, linear separability)
• Next: generative classifiers
  • Assume some functional form for P(X | Y), P(Y)
  • Estimate the parameters of P(X | Y), P(Y) directly from training data
  • Use Bayes' rule to calculate P(Y | X)
  • Find the actual distribution of each class
Estimating probability
• Empirical estimation of probability: relative frequency, i.e., (# times the event occurred) / (# trials)
• E.g., 10 coin flips: estimate P(heads) as (# heads) / 10
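A minimal sketch of this in code; the simulated coin and its true bias are assumptions for illustration, not from the slides:

  import random

  random.seed(0)
  flips = [random.random() < 0.7 for _ in range(10)]  # 10 flips of a coin with assumed true P(heads) = 0.7
  p_heads = sum(flips) / len(flips)                   # empirical estimate: relative frequency of heads
  print(p_heads)                                      # noisy with only 10 flips; converges as the sample grows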
Probability of observing n_H heads out of n coin flips (the likelihood of the data D under parameter θ):
  P(D | θ) = θ^(n_H) · (1 − θ)^(n − n_H)
Maximizing over θ gives the maximum likelihood estimate θ_MLE = n_H / n.
Sidestep: PDF
• Probability density function (PDF)
  • The probability of the random variable falling within a particular range of values is given by the integral of the variable's PDF over that range
  • The probability of taking on any single specific value is 0
  • The PDF is nonnegative everywhere, and its integral over the entire space equals 1 (equivalently, the CDF approaches 1)
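A small numerical check of both properties, using a standard Gaussian as the assumed PDF:

  import numpy as np

  x = np.linspace(-10.0, 10.0, 200001)          # dense grid standing in for the real line
  dx = x[1] - x[0]
  pdf = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # standard normal density

  print(pdf.sum() * dx)                         # integral over (essentially) the whole space: ~1.0
  mask = (x >= 0) & (x <= 1)
  print(pdf[mask].sum() * dx)                   # P(0 <= X <= 1): ~0.3413
  # any single point contributes pdf(x) * dx -> 0 as dx -> 0, so P(X = x) = 0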
Estimating probability
• Log likelihood (important, pay attention):
  • Log is monotonically increasing, so the MLE is invariant under this transformation
  • Common trick for breaking up a factored objective function:
    log ∏ᵢ 𝑓(xᵢ) = Σᵢ log 𝑓(xᵢ)
  • But log is not defined over negative values
    • Not a problem here: probabilities can't be negative
• Find the max with respect to θ

[Figure: 𝑓(𝑥) and log(𝑓(𝑥)) reach their maximum at the same 𝑥]
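A sketch of why the trick is safe, using the coin-flip likelihood above (the counts are assumed for illustration): a grid search over θ shows the likelihood and the log likelihood peak at the same place.

  import numpy as np

  n, n_heads = 10, 7                       # assumed counts
  theta = np.linspace(0.001, 0.999, 999)   # candidate parameter values

  likelihood = theta**n_heads * (1 - theta)**(n - n_heads)
  log_likelihood = n_heads * np.log(theta) + (n - n_heads) * np.log(1 - theta)

  # log is monotonically increasing, so both objectives peak at theta = n_heads / n = 0.7
  print(theta[np.argmax(likelihood)], theta[np.argmax(log_likelihood)])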
Unseen Events
Maximum a posteriori probability
Empirical probability estimation summary
• MLE: θ_MLE = argmax_θ P(D | θ). θ is the set of model parameters.
• MAP: θ_MAP = argmax_θ P(θ | D) = argmax_θ P(D | θ) P(θ). θ is a set of random variables.
• MAP only adds the prior term P(θ)
  • Independent of the data; it penalizes the parameters θ if they deviate too much from our prior belief
• We will later revisit this as a form of regularization, where P(θ) will be interpreted as a measure of classifier complexity
Generative model
• Can be trained with either MLE or MAP approaches
• Provides a distribution over labels, P(Y | X)
• Why is this powerful?
  • Allows the use of statistical inference (probabilistic reasoning)!
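A tiny sketch of that reasoning: given an assumed class prior P(Y) and class-conditional P(X | Y), Bayes' rule yields P(Y | X). All numbers here are made up for illustration.

  p_y = {"disease": 0.01, "healthy": 0.99}         # assumed prior P(Y)
  p_x_given_y = {"disease": 0.9, "healthy": 0.05}  # assumed P(symptom | Y)

  # Bayes' rule: P(Y | symptom) is proportional to P(symptom | Y) P(Y)
  joint = {y: p_x_given_y[y] * p_y[y] for y in p_y}
  z = sum(joint.values())                          # P(symptom), the normalizer
  posterior = {y: p / z for y, p in joint.items()}
  print(posterior)                                 # {'disease': ~0.15, 'healthy': ~0.85}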
Notation clarification

(Quiz 1 now available. Complete by Tuesday, Sep 28.)
Inference from probabilities
• A ghost is in the grid somewhere
• Sensor readings tell how close a square is to the ghost
  • On the ghost: red
  • 1 or 2 away: orange
  • 3 or 4 away: yellow
  • 5+ away: green
• Weather:

  W       P
  sun     0.6
  rain    0.1
  fog     0.3
  meteor  0.0

• Temperature:

  T     P
  hot   0.5
  cold  0.5
Probability Distributions
• Random variables are associated with distributions
• Shorthand notation: P(hot) = P(T = hot), P(sun) = P(W = sun), etc.

  T     P          W       P
  hot   0.5        sun     0.6
  cold  0.5        rain    0.1
                   fog     0.3
                   meteor  0.0
Joint Distributions

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

• Must obey: P(x, y) ≥ 0 and Σ over all (x, y) of P(x, y) = 1

Quiz: Joint Distributions

  X   Y   P
  +x  +y  0.2
  +x  -y  0.3
  -x  +y  0.4
  -x  -y  0.1

• P(+x, +y) ?
• P(+x) ?
• P(-y OR +x) ?
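Worked answers, reading the values off the joint table:

  P(+x, +y) = 0.2
  P(+x) = P(+x, +y) + P(+x, -y) = 0.2 + 0.3 = 0.5
  P(-y OR +x) = P(+x) + P(-y) − P(+x, -y) = 0.5 + 0.4 − 0.3 = 0.6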
Marginal Distributions
• Marginal distributions are sub-tables which eliminate variables
• Marginalization (summing out): combine collapsed rows by adding

  T     W     P          T     P
  hot   sun   0.4        hot   0.5
  hot   rain  0.1        cold  0.5
  cold  sun   0.2
  cold  rain  0.3        W     P
                         sun   0.6
                         rain  0.4
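A sketch of summing out in code, using the joint table above:

  from collections import defaultdict

  joint = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
           ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

  p_t = defaultdict(float)   # P(T): sum out W
  p_w = defaultdict(float)   # P(W): sum out T
  for (t, w), p in joint.items():
      p_t[t] += p
      p_w[w] += p
  print(dict(p_t), dict(p_w))  # {'hot': 0.5, 'cold': 0.5} {'sun': 0.6, 'rain': 0.4}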
Quiz: Marginal Distributions

  X   Y   P          X   P
  +x  +y  0.2        +x
  +x  -y  0.3        -x
  -x  +y  0.4
  -x  -y  0.1        Y   P
                     +y
                     -y
Conditional Probabilities
• Derived from the joint probability P(a, b)
• The definition of a conditional probability: P(a | b) = P(a, b) / P(b)

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3
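A sketch of applying the definition to the joint table, conditioning on T = hot:

  joint = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
           ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

  # P(W | T = hot) = P(hot, W) / P(hot)
  p_hot = sum(p for (t, w), p in joint.items() if t == "hot")                  # 0.5
  p_w_given_hot = {w: p / p_hot for (t, w), p in joint.items() if t == "hot"}
  print(p_w_given_hot)  # {'sun': 0.8, 'rain': 0.2}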
Quiz: Conditional Probabilities

  X   Y   P
  +x  +y  0.2
  +x  -y  0.3
  -x  +y  0.4
  -x  -y  0.1

• P(+x | +y) ?
• P(-x | +y) ?
• P(-y | +x) ?
Conditional Distributions
• Conditional distributions are probability distributions over some variables given fixed values of others
Conditional Distributions
• Joint distribution:

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

• P(W | T = hot):

  W     P
  sun   0.8
  rain  0.2

• P(W | T = cold):

  W     P
  sun   0.4
  rain  0.6
Normalization Trick
• Example 1:

  W     P         Normalize       W     P
  sun   0.2       (Z = 0.5)       sun   0.4
  rain  0.3                       rain  0.6

• Example 2:

  T     W     P        Normalize      T     W     P
  hot   sun   20       (Z = 50)       hot   sun   0.4
  hot   rain  5                       hot   rain  0.1
  cold  sun   10                      cold  sun   0.2
  cold  rain  15                      cold  rain  0.3
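A sketch of the trick in code: divide each entry by the normalizer Z, the sum of the selected entries.

  counts = {("hot", "sun"): 20, ("hot", "rain"): 5,
            ("cold", "sun"): 10, ("cold", "rain"): 15}

  z = sum(counts.values())                          # Z = 50
  normalized = {k: v / z for k, v in counts.items()}
  print(normalized)  # hot/sun 0.4, hot/rain 0.1, cold/sun 0.2, cold/rain 0.3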
Quiz: Normalization Trick
• P(X | Y = -y) ?
• SELECT the joint probabilities matching the evidence, then NORMALIZE the selection (make it sum to one)

  X   Y   P          X   P
  +x  +y  0.2        +x
  +x  -y  0.3        -x
  -x  +y  0.4
  -x  -y  0.1
Probabilistic Inference
• Probabilistic inference: compute a desired probability from other known probabilities (e.g., conditional from joint)
• We generally compute conditional probabilities
  • P(on time | no reported accidents) = 0.90
  • These represent the agent's beliefs given the evidence
• Probabilities change with new evidence:
  • P(on time | no accidents, 5 a.m.) = 0.95
  • P(on time | no accidents, 5 a.m., raining) = 0.80
  • Observing new evidence causes beliefs to be updated
Inference by Enumeration
• General case:
  • Evidence variables: E_1 … E_k = e_1 … e_k
  • Query* variable: Q
  • Hidden variables: H_1 … H_r
  (together: all the variables)
• We want: P(Q | e_1 … e_k)
  * Works fine with multiple query variables, too
• Step 1: Select the entries consistent with the evidence
• Step 2: Sum out H to get the joint of the query and evidence
• Step 3: Normalize
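A minimal sketch of the three steps, using the season/temperature/weather table from the next slide. Only the summer rows survive on that slide, so the winter-row values here are assumed for illustration, and the helper infer is my own naming.

  joint = {
      ("summer", "hot", "sun"): 0.30, ("summer", "hot", "rain"): 0.05,
      ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
      ("winter", "hot", "sun"): 0.10, ("winter", "hot", "rain"): 0.05,   # assumed values
      ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20, # assumed values
  }

  def infer(query_index, evidence):
      """P(Q | evidence): select consistent entries, sum out hidden vars, normalize."""
      selected = {k: p for k, p in joint.items()
                  if all(k[i] == v for i, v in evidence.items())}  # Step 1: select
      summed = {}
      for k, p in selected.items():                                # Step 2: sum out H
          q = k[query_index]
          summed[q] = summed.get(q, 0.0) + p
      z = sum(summed.values())                                     # Step 3: normalize
      return {q: p / z for q, p in summed.items()}

  print(infer(2, {}))             # P(W)
  print(infer(2, {0: "winter"}))  # P(W | S = winter)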
Inference by Enumeration
• P(W)?
• P(W | winter)?

  S       T     W     P
  summer  hot   sun   0.30
  summer  hot   rain  0.05
  summer  cold  sun   0.10
  …

  W     P          W     P
  sun              sun
  rain             rain
The Product Rule
• P(x, y) = P(x | y) P(y)
• Example:

  P(W)            P(D | W)            P(D, W)
  W     P         D    W     P        D    W     P
  sun   0.8       wet  sun   0.1      wet  sun   0.08
  rain  0.2       dry  sun   0.9      dry  sun   0.72
                  wet  rain  0.7      wet  rain  0.14
                  dry  rain  0.3      dry  rain  0.06
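A sketch of the product rule in code, building the joint from the two given tables:

  p_w = {"sun": 0.8, "rain": 0.2}
  p_d_given_w = {("wet", "sun"): 0.1, ("dry", "sun"): 0.9,
                 ("wet", "rain"): 0.7, ("dry", "rain"): 0.3}

  # product rule: P(D, W) = P(D | W) * P(W)
  joint = {(d, w): p * p_w[w] for (d, w), p in p_d_given_w.items()}
  print(joint)  # wet/sun 0.08, dry/sun 0.72, wet/rain 0.14, dry/rain 0.06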
The Chain Rule
• More generally, any joint distribution can always be written as an incremental product of conditional distributions:
  P(x_1, x_2, …, x_n) = ∏ᵢ P(x_i | x_1, …, x_{i−1})
• Proof: repeatedly apply the definition of conditional probability, e.g.
  P(x_1, x_2, x_3) = P(x_1) P(x_2 | x_1) P(x_3 | x_1, x_2)
Bayes’ Rule
• Two ways to factor a joint distribution over two variables:
  P(x, y) = P(x | y) P(y) = P(y | x) P(x)
• Dividing, we get:
  P(x | y) = P(y | x) P(x) / P(y)
• Example:
  • M: meningitis, S: stiff neck
Example
• Givens: P(+m) = 0.0001, P(+s | +m) = 0.8, P(+s | -m) = 0.01

  P(+m | +s) = P(+s | +m) P(+m) / P(+s)
             = (0.8 × 0.0001) / (0.8 × 0.0001 + 0.01 × 0.9999)
             = 0.0079

• Note: posterior probability of meningitis still very small
• Note: you should still get stiff necks checked out!
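A quick numerical check of the example, using the given values as reconstructed above:

  p_m = 0.0001            # P(+m)
  p_s_given_m = 0.8       # P(+s | +m)
  p_s_given_not_m = 0.01  # P(+s | -m)

  p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)  # law of total probability
  print(round(p_s_given_m * p_m / p_s, 4))               # 0.0079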
Quiz: Bayes’ Rule
• Given:

  W     P          D    W     P
  sun   0.8        wet  sun   0.1
  rain  0.2        dry  sun   0.9
                   wet  rain  0.7
                   dry  rain  0.3