
CSCE-421 Machine Learning

4. Inference from Probabilities

Instructor: Guni Sharon, classes: TR 3:55-5:10, HRBB 124


1
Based on slides by: Pieter Abbeel, Dan Klein
Announcements
• Lecture recordings “may be synchronous”
• The student must be admitted by faculty into the Zoom session (instead of joining freely), so that it is used only by students in a quarantine situation
• The student must agree to have their name visible to the students in class; by joining the Zoom session the student gives up some privacy and discloses private medical information
• No class or office hours on Tuesday, Sep-28
• Overdue:
• Written assignment 1: K Nearest Neighbors + Linear algebra (due Wednesday, Sep-15)
• Due:
• Programming assignment 1: Perceptron (due Wednesday, Sep-22)

2
So far
• Binary classification,
• Linear classifier,
• E.g., infected with COVID-19 (y/n) based on symptoms
• Train using the perceptron algorithm
• Guaranteed to converge in the separable case
• Multiclass classification,
• Linear classifier,
• E.g., most likely disease based on symptoms
• Train using the multiclass perceptron algorithm
• Guaranteed to converge in the separable case
(http://proceedings.mlr.press/v97/beygelzimer19a/beygelzimer19a-supp.pdf)

3
Generative vs Discriminative Models
• So far: Discriminative classifiers
• Assume some functional form for $P(Y \mid X)$
• Estimate the parameters of $P(Y \mid X)$ directly from training data
• Find a decision boundary between the classes
• E.g., K-NN, linear classifier (perceptron)
• Limited by model assumptions (curse of dimensionality, linear separability)
• Next: Generative classifiers
• Assume some functional form for $P(X \mid Y)$, $P(Y)$
• Estimate the parameters of $P(X \mid Y)$, $P(Y)$ directly from training data
• Use Bayes' rule to calculate $P(Y \mid X)$
• Find the actual distribution of each class
4
Estimating probability
• Empirical estimation of probability
• 10 coin flips: estimate $P(\text{Head})$ by the relative frequency of heads, $\hat{p} = \frac{\#\text{Heads}}{10}$
• Maximum Likelihood Estimation
• Assume some underlying parametrized distribution $P(x \mid \theta)$ that the data $D$ comes from
• Find parameters $\theta$ that maximize the probability of sampling $D$: $\hat{\theta} = \arg\max_\theta P(D \mid \theta)$
5
Estimating probability
• Probability of observing $x$ "Head" samples out of $n$ samples in total, where the probability of "Head" in a single observation is $p$:
$P(x) = \binom{n}{x}\, p^x (1-p)^{n-x}$
• Assume some underlying parametrized distribution $P(x \mid \theta)$ that the data comes from
• Find parameters $\theta$ that maximize the probability of sampling the data
• For the coin case, we have a Binomial distribution
6
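To make the coin example concrete, here is a minimal sketch (not from the slides) that simulates coin flips, computes the maximum-likelihood estimate of the bias, and evaluates the binomial likelihood; the variable names and numbers are illustrative.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
true_p = 0.7                      # "true" coin bias, used only to simulate data
flips = rng.random(10) < true_p   # 10 coin flips: True = Head
x, n = int(flips.sum()), len(flips)

# The MLE for a Bernoulli/Binomial model is the relative frequency of heads
p_mle = x / n

def binom_likelihood(p, x, n):
    """Binomial probability of observing x heads out of n flips for a candidate p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(f"x={x}, n={n}, MLE p_hat={p_mle:.2f}")
print("likelihood at the MLE:", binom_likelihood(p_mle, x, n))
```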
Sidestep: PDF
• Probability density function (PDF)
• The probability of the random variable falling within a particular range of values
• Given by the integral of this variable's PDF over that range: $P(a \le X \le b) = \int_a^b f_X(x)\, dx$
• Taking on any specific value has probability 0
• The PDF is nonnegative everywhere, and its integral over the entire space (the CDF evaluated over the full range) is equal to 1

7
Estimating probability
• Maximize the log likelihood instead: $\hat{\theta} = \arg\max_\theta \log P(D \mid \theta)$
• Find the max with respect to $\theta$ (e.g., set $\frac{\partial}{\partial \theta} \log P(D \mid \theta) = 0$)

Log likelihood (important, pay attention):
• MLE is invariant under this transformation
• Common trick for breaking up a factored objective function (a product of probabilities becomes a sum of logs)
• Log is not defined over negative values, but a probability can't be negative, so this is not a problem
• Log is monotonically increasing, so $f(x)$ and $\log(f(x))$ attain their maximum at the same $x$
8
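A minimal numerical sketch (not from the slides) of the same idea: maximizing the Bernoulli log likelihood over a grid of candidate parameters recovers the closed-form MLE, because log is monotonically increasing. The counts are illustrative.

```python
import numpy as np

x, n = 7, 10  # e.g., 7 heads out of 10 flips (illustrative numbers)

def log_likelihood(p, x, n):
    # Bernoulli/Binomial log likelihood up to a constant (the log binomial coefficient)
    return x * np.log(p) + (n - x) * np.log(1 - p)

# Grid search over candidate parameters: the argmax of the log likelihood
# matches the closed-form MLE x/n.
grid = np.linspace(0.01, 0.99, 9801)
p_hat = grid[np.argmax(log_likelihood(grid, x, n))]
print(p_hat, x / n)  # both approximately 0.7
```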
Unseen Events
• MLE for P(outcome=5)?
• If outcome 5 was never observed in the data, the MLE assigns it probability 0

Laplace Smoothing
• Laplace's estimate:
• Pretend you saw every outcome once more than you actually did
• $P_{LAP}(x) = \frac{c(x) + 1}{N + |X|}$
• Example: estimate the probability of drawing 'red' given 3 observations: r, r, b
• $P_{LAP}(red) = \frac{2+1}{3+2} = \frac{3}{5}$, $P_{LAP}(blue) = \frac{1+1}{3+2} = \frac{2}{5}$
Laplace Smoothing
• Laplace's estimate (extended):
• Pretend you saw every outcome k extra times: $P_{LAP,k}(x) = \frac{c(x) + k}{N + k|X|}$
• Example (r, r, b): $P_{LAP,k}(red) = \frac{2+k}{3+2k}$
• What's Laplace with k = 0? The plain MLE (relative frequencies)
• k is the strength of the uniformity prior
• Laplace for conditionals:
• Smooth each condition independently: $P_{LAP,k}(x \mid y) = \frac{c(x, y) + k}{c(y) + k|X|}$
• A short code sketch of these estimates follows below
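A minimal sketch (not from the slides) of the MLE and Laplace-smoothed estimates for the r/r/b example; the function and variable names are mine.

```python
from collections import Counter

def laplace_estimate(observations, domain, k=1):
    """Laplace-smoothed estimate: pretend every outcome in the domain
    was seen k extra times. k=0 recovers the plain MLE."""
    counts = Counter(observations)
    n = len(observations)
    return {x: (counts[x] + k) / (n + k * len(domain)) for x in domain}

obs = ["r", "r", "b"]
print(laplace_estimate(obs, domain=["r", "b"], k=0))  # MLE: {'r': 0.667, 'b': 0.333}
print(laplace_estimate(obs, domain=["r", "b"], k=1))  # Laplace: {'r': 0.6, 'b': 0.4}
```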
Maximum Likelihood?
• Relative frequencies are the maximum likelihood estimates for a given data set:
$\hat{\theta}_{MLE} = \arg\max_\theta P(D \mid \theta)$
• Another option is to consider the maximum a posteriori probability given the data:
$\hat{\theta}_{MAP} = \arg\max_\theta P(\theta \mid D)$
• We need to estimate a distribution over $\theta$, which is now treated as a random variable instead of a parameter
Written assignment 2: generative models
• Assume a data set with a given number of "Head" events and a given number of "Tail" events
• Define one estimator as the MLE with Laplace smoothing with some smoothing parameter
• Define a second estimator as the MAP estimate under a Beta distribution prior (whose density includes a normalizing constant)
• Prove that, in this case, the two estimators are equal
• Due Friday, Sep-24

13
Maximum a posteriori probability
• Model $\theta$ as a random variable, drawn from a distribution $P(\theta)$
• Note that $\theta$ is not a random variable associated with an event in a sample space
• $P(\theta)$ is the prior distribution over the parameter(s) $\theta$, before we see any data
• $P(D \mid \theta)$ is the likelihood of the data given the parameter(s)
• $P(\theta \mid D)$ is the posterior distribution over the parameter(s) after we have observed the data
14
Maximum a posteriori probability
• Choose the parameters that maximize the posterior $P(\theta \mid D)$:
$\hat{\theta}_{MAP} = \arg\max_\theta P(\theta \mid D) = \arg\max_\theta \frac{P(D \mid \theta)\, P(\theta)}{P(D)} = \arg\max_\theta P(D \mid \theta)\, P(\theta)$
15
Empirical probability estimation summary
• MLE: $\hat{\theta} = \arg\max_\theta P(D \mid \theta)$, where $\theta$ is the set of model parameters
• MAP: $\hat{\theta} = \arg\max_\theta P(D \mid \theta)\, P(\theta)$, where $\theta$ is a set of random variables
• MAP only adds the term $P(\theta)$
• It is independent of the data and penalizes parameters $\theta$ that deviate too much from our prior belief
• We will later revisit this as a form of regularization, where $P(\theta)$ will be interpreted as a measure of classifier complexity
16
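To make the MLE/MAP distinction concrete, here is a minimal sketch (not part of the slides) for the coin example, assuming a Beta prior over the coin bias; the counts and prior parameters are illustrative.

```python
heads, tails = 7, 3          # illustrative counts
a, b = 3.0, 3.0              # assumed Beta(a, b) prior over the coin bias theta

# MLE: maximize P(D | theta)  ->  relative frequency of heads
theta_mle = heads / (heads + tails)

# MAP: maximize P(D | theta) * P(theta) with a Beta prior.
# For a Beta(a, b) prior the posterior mode has a closed form:
theta_map = (heads + a - 1) / (heads + tails + a + b - 2)

print(f"MLE: {theta_mle:.3f}, MAP with Beta({a},{b}) prior: {theta_map:.3f}")
# The prior pulls the MAP estimate toward 0.5, like Laplace smoothing with k = a - 1.
```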
Generative model
• Can be trained with either MLE or MAP approaches
• Provides a distribution over labels
• Why is this powerful?
• Allows the use of statistical inference (probabilistic reasoning)!

17
Notation clarification
(Quiz 1 now available. Complete by Tuesday, Sep-28)

• $P(x \mid y)$: the probability of observing $x$ given event $y$
• $P(x \mid \theta)$: the probability of sampling $x$ given a distribution that is defined by the parameters $\theta$
• E.g., a multivariate normal distribution with $\theta = (\mu, \Sigma)$
• $P(D \mid \theta)$: the probability of sampling the entire training set $D$ given a distribution that is defined by $\theta$
• $P(D \mid \theta)$ with $\theta \sim P(\theta)$: the probability of sampling the entire training set given a sampled distribution (sampled from a distribution of distributions)
18
So far
• Discriminative model
• Maps features to labels
• Generative model
• Maps features to a distribution over labels (probability per class)
• Do we really need generative models?
• Seems like overkill
• Allows the use of statistical inference (probabilistic reasoning)!

19
Inference from probabilities
• A ghost is in the grid
somewhere
• Sensor readings tell how close a
square is to the ghost
• On the ghost: red
• 1 or 2 away: orange
• 3 or 4 away: yellow
• 5+ away: green

 Sensors are noisy, but we know P(Color | Distance)


P(red | 3) P(orange | 3) P(yellow | 3) P(green | 3)
0.05 0.15 0.5 0.3
[Demo: Ghostbuster – no probability (L12D1) ]
Video of Demo Ghostbuster – No probability
Uncertainty
• General situation:
• Observed variables (evidence, X): Agent knows
certain things about the state of the world (e.g.,
sensor readings or symptoms)
• Unobserved variables (Y): Agent needs to
reason about other aspects (e.g. where an object
is or what disease is present)
• Generative model ($P(X \mid Y)$): Agent knows something about how the known variables relate to the unknown variables
• Probabilistic reasoning gives us a framework
for managing our beliefs and knowledge
Random Variables
• A random variable is some aspect of the world about
which we (may) have uncertainty
• R = Is it raining?
• T = Is it hot or cold?
• D = How long will it take to drive to work?
• L = Where is the ghost?
• We denote random variables with capital letters
• Random variables have domains
• R in {true, false} (often write as {+r, -r})
• T in {hot, cold}
• D in [0, )
• L in possible locations, maybe {(0,0), (0,1), …}
Probability Distributions
• Associate a probability with each value

• Temperature:
T P
hot 0.5
cold 0.5

• Weather:
W P
sun 0.6
rain 0.1
fog 0.3
meteor 0.0
Probability Distributions
• Random variables are affiliated with distributions

T P
hot 0.5
cold 0.5

W P
sun 0.6
rain 0.1
fog 0.3
meteor 0.0

• Shorthand notation: $P(hot) = P(T = hot)$, OK if all domain entries are unique
• A distribution is a TABLE of probability per value
• An outcome probability is a single number
• Must have: $P(X = x) \ge 0$ for all $x$, and $\sum_x P(X = x) = 1$

Joint Distributions
• A joint distribution over a set of random variables $X_1, \dots, X_n$ specifies a real number for each assignment (or outcome): $P(X_1 = x_1, \dots, X_n = x_n)$

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

• Must obey: $P(x_1, \dots, x_n) \ge 0$ and $\sum_{(x_1, \dots, x_n)} P(x_1, \dots, x_n) = 1$
• Size of distribution if n variables with domain sizes d? $d^n$
• For all but the smallest distributions, impractical to write out!
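A minimal sketch (illustrative, not from the slides) representing the T/W joint distribution as a Python dict keyed by assignments; the sketches after the next few slides reuse this same `joint` table.

```python
# Joint distribution P(T, W) from the slide, keyed by (t, w) assignments
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

# Sanity checks: nonnegative entries that sum to one
assert all(p >= 0 for p in joint.values())
assert abs(sum(joint.values()) - 1.0) < 1e-9
```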
Events
• An event is a set E of outcomes: $P(E) = \sum_{(x_1, \dots, x_n) \in E} P(x_1, \dots, x_n)$
• From a joint distribution, we can calculate the probability of any event

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

• Probability that it's hot AND sunny?
• Probability that it's hot?
• Probability that it's hot OR sunny?
• Typically, the events we care about are partial assignments, like P(T=hot)
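A sketch (mine, not the slides') of answering the three event questions by summing the matching outcomes of the illustrative `joint` dict.

```python
# Joint P(T, W) from the slide (same illustrative dict as above)
joint = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1, ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def event_prob(joint, condition):
    """Probability of an event: sum the joint entries whose outcome satisfies the condition."""
    return sum(p for outcome, p in joint.items() if condition(outcome))

print(event_prob(joint, lambda o: o == ("hot", "sun")))             # P(hot AND sunny) = 0.4
print(event_prob(joint, lambda o: o[0] == "hot"))                   # P(hot) = 0.5
print(event_prob(joint, lambda o: o[0] == "hot" or o[1] == "sun"))  # P(hot OR sunny) = 0.7
```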
Quiz: Events

X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1

• P(+x, +y) ?
• P(+x) ?
• P(-y OR +x) ?
Marginal Distributions
• Marginal distributions are sub-tables which eliminate variables
• Marginalization (summing out): combine collapsed rows by adding, e.g., $P(t) = \sum_w P(t, w)$

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

T P
hot 0.5
cold 0.5

W P
sun 0.6
rain 0.4
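A sketch of marginalization on the same illustrative `joint` dict: sum out the variable you don't care about.

```python
from collections import defaultdict

joint = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1, ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def marginalize(joint, keep_index):
    """Sum out every variable except the one at position keep_index of each outcome tuple."""
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[outcome[keep_index]] += p
    return dict(marginal)

print(marginalize(joint, 0))  # P(T): {'hot': 0.5, 'cold': 0.5}
print(marginalize(joint, 1))  # P(W): {'sun': ~0.6, 'rain': ~0.4}
```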
Quiz: Marginal Distributions

X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1

X P
+x
-x

Y P
+y
-y
Conditional Probabilities
• Derived from the joint probability P(a,b)
• The definition of a conditional probability: $P(a \mid b) = \frac{P(a, b)}{P(b)}$

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

• E.g., $P(W = sun \mid T = cold) = \frac{P(W = sun, T = cold)}{P(T = cold)} = \frac{0.2}{0.5} = 0.4$
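A sketch of the definition applied to the illustrative `joint` dict (the helper name and indexing scheme are mine).

```python
joint = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1, ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def conditional(joint, a_index, a_value, b_index, b_value):
    """P(A = a | B = b) = P(a, b) / P(b), both computed from the joint table."""
    p_ab = sum(p for o, p in joint.items() if o[a_index] == a_value and o[b_index] == b_value)
    p_b = sum(p for o, p in joint.items() if o[b_index] == b_value)
    return p_ab / p_b

# P(W = sun | T = cold) = 0.2 / 0.5 = 0.4
print(conditional(joint, a_index=1, a_value="sun", b_index=0, b_value="cold"))
```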
Quiz: Conditional Probabilities

X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1

• P(+x | +y) ?
• P(-x | +y) ?
• P(-y | +x) ?
Conditional Distributions
• Conditional distributions are probability distributions over
some variables given fixed values of others
Conditional Distributions

Joint Distribution
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

P(W | T = hot)
W P
sun 0.8
rain 0.2

P(W | T = cold)
W P
sun 0.4
rain 0.6
Conditional Distributions

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

P(W | T = cold)
W P
sun 0.4
rain 0.6
Normalization Trick
• Compute P(W | T = cold) in two steps:
• SELECT the joint probabilities matching the evidence
• NORMALIZE the selection (make it sum to one)

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Selected (T = cold)
T W P
cold sun 0.2
cold rain 0.3

Normalized: P(W | T = cold)
W P
sun 0.4
rain 0.6
To Normalize
• (Dictionary) To bring or restore to a normal condition: all entries sum to ONE
• Procedure:
• Step 1: Compute Z = sum over all entries
• Step 2: Divide every entry by Z

• Example 1 (Z = 0.5):
Before normalizing:
W P
sun 0.2
rain 0.3
After normalizing:
W P
sun 0.4
rain 0.6

• Example 2 (Z = 50):
Before normalizing:
T W P
hot sun 20
hot rain 5
cold sun 10
cold rain 15
After normalizing:
T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3
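A sketch of the two-step procedure (compute Z, divide by Z) on the same illustrative dict representation used above.

```python
joint = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1, ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def normalize(table):
    """Step 1: compute Z = sum over all entries. Step 2: divide every entry by Z."""
    z = sum(table.values())
    return {k: v / z for k, v in table.items()}

# SELECT the entries matching the evidence T = cold, then NORMALIZE the selection
selected = {o: p for o, p in joint.items() if o[0] == "cold"}
print(normalize(selected))  # P(W | T=cold): {('cold', 'sun'): ~0.4, ('cold', 'rain'): ~0.6}
```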
Quiz: Normalization Trick
• P(X | Y = -y) ?
• SELECT the joint probabilities matching the evidence, then NORMALIZE the selection (make it sum to one)

X Y P
+x +y 0.2
+x -y 0.3
-x +y 0.4
-x -y 0.1

Selected
X P
+x
-x

Normalized
X P
+x
-x
Probabilistic Inference
• Probabilistic inference: compute a desired
probability from other known probabilities (e.g.
conditional from joint)
• We generally compute conditional probabilities
• P(on time | no reported accidents) = 0.90
• These represent the agent’s beliefs given the
evidence
• Probabilities change with new evidence:
• P(on time | no accidents, 5 a.m.) = 0.95
• P(on time | no accidents, 5 a.m., raining) = 0.80
• Observing new evidence causes beliefs to be updated
Inference by Enumeration
• General case:
• Evidence variables: $E_1, \dots, E_k = e_1, \dots, e_k$
• Query* variable: $Q$
• Hidden variables: $H_1, \dots, H_r$
(together: all the variables)
• We want: $P(Q \mid e_1, \dots, e_k)$
(* works fine with multiple query variables, too)

 Step 1: Select the entries consistent with the evidence
 Step 2: Sum out H to get the joint of the query and evidence: $P(Q, e_1, \dots, e_k) = \sum_{h_1, \dots, h_r} P(Q, h_1, \dots, h_r, e_1, \dots, e_k)$
 Step 3: Normalize: $P(Q \mid e_1, \dots, e_k) = \frac{P(Q, e_1, \dots, e_k)}{P(e_1, \dots, e_k)}$
Inference by Enumeration

S T W P
summer hot sun 0.30
summer hot rain 0.05
summer cold sun 0.10
summer cold rain 0.05
winter hot sun 0.10
winter hot rain 0.05
winter cold sun 0.15
winter cold rain 0.20

• P(W)?
W P
sun
rain

• P(W | winter)?
W P
sun
rain

• P(W | winter, hot)?
W P
sun
rain

(The three queries are worked in the code sketch below.)
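A sketch (not from the slides) of inference by enumeration over the season/temperature/weather table, answering the three queries; the helper names are mine.

```python
from collections import defaultdict

# Joint P(S, T, W) from the slide, keyed by (season, temperature, weather)
joint_stw = {
    ("summer", "hot", "sun"): 0.30, ("summer", "hot", "rain"): 0.05,
    ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot", "sun"): 0.10, ("winter", "hot", "rain"): 0.05,
    ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20,
}
VARS = ("S", "T", "W")

def enumerate_inference(joint, query, evidence):
    """P(query | evidence): select consistent entries, sum out hidden variables, normalize."""
    qi = VARS.index(query)
    answer = defaultdict(float)
    for outcome, p in joint.items():
        if all(outcome[VARS.index(v)] == val for v, val in evidence.items()):
            answer[outcome[qi]] += p                       # Steps 1-2: select and sum out
    z = sum(answer.values())
    return {val: p / z for val, p in answer.items()}       # Step 3: normalize

print(enumerate_inference(joint_stw, "W", {}))                           # P(W): sun 0.65, rain 0.35
print(enumerate_inference(joint_stw, "W", {"S": "winter"}))              # P(W | winter): sun 0.5, rain 0.5
print(enumerate_inference(joint_stw, "W", {"S": "winter", "T": "hot"}))  # sun 2/3, rain 1/3
```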
Inference by Enumeration
 Obvious problems:
 Worst-case time complexity O(d^n)
 Space complexity O(d^n) to store the joint distribution
The Product Rule
• Sometimes have conditional distributions but want the joint:
$P(x, y) = P(x \mid y)\, P(y)$

The Product Rule
• Example: $P(D, W) = P(D \mid W)\, P(W)$

P(W)
W P
sun 0.8
rain 0.2

P(D | W)
D W P
wet sun 0.1
dry sun 0.9
wet rain 0.7
dry rain 0.3

P(D, W)
D W P
wet sun 0.08
dry sun 0.72
wet rain 0.14
dry rain 0.06
The Chain Rule
• More generally, we can always write any joint distribution as an incremental product of conditional distributions:
$P(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \dots, x_{i-1})$
• Proof: apply the product rule repeatedly, e.g.,
$P(x_1, x_2, x_3) = P(x_3 \mid x_1, x_2)\, P(x_1, x_2) = P(x_3 \mid x_1, x_2)\, P(x_2 \mid x_1)\, P(x_1)$
Bayes’ Rule

Bayes’ Rule
• Two ways to factor a joint distribution over two variables:
$P(x, y) = P(x \mid y)\, P(y) = P(y \mid x)\, P(x)$
• Dividing, we get:
$P(x \mid y) = \frac{P(y \mid x)\, P(x)}{P(y)}$
• Why is this at all helpful?
• Lets us build one conditional from its reverse
• Often one conditional is tricky but the other one is simple
• Foundation of many systems we’ll see later
• In the running for most important AI equation!
Inference with Bayes’ Rule
• Example: diagnostic probability from causal probability:
$P(\text{cause} \mid \text{effect}) = \frac{P(\text{effect} \mid \text{cause})\, P(\text{cause})}{P(\text{effect})}$
• Example:
• M: meningitis, S: stiff neck
• Givens: $P(+m) = 0.0001$, $P(+s \mid +m) = 0.8$, $P(+s \mid -m) = 0.01$
$P(+m \mid +s) = \frac{P(+s \mid +m)\, P(+m)}{P(+s \mid +m)\, P(+m) + P(+s \mid -m)\, P(-m)} = \frac{0.8 \times 0.0001}{0.8 \times 0.0001 + 0.01 \times 0.9999} \approx 0.0079$
• Note: posterior probability of meningitis still very small
• Note: you should still get stiff necks checked out!
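A sketch verifying the meningitis posterior numerically, using the givens as reconstructed above (the standard textbook values consistent with the 0.0079 result on the slide).

```python
def bayes_posterior(p_cause, p_effect_given_cause, p_effect_given_not_cause):
    """P(cause | effect) via Bayes' rule, with the total-probability denominator."""
    p_effect = (p_effect_given_cause * p_cause
                + p_effect_given_not_cause * (1 - p_cause))
    return p_effect_given_cause * p_cause / p_effect

# P(+m | +s) with P(+m)=0.0001, P(+s|+m)=0.8, P(+s|-m)=0.01
print(round(bayes_posterior(0.0001, 0.8, 0.01), 4))  # ~0.0079
```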
Quiz: Bayes’ Rule
• Given:

P(W)
W P
sun 0.8
rain 0.2

P(D | W)
D W P
wet sun 0.1
dry sun 0.9
wet rain 0.7
dry rain 0.3

• What is P(W | dry)?

$P(W \mid dry) = \frac{P(dry \mid W)\, P(W)}{P(dry)}$, where $P(D, W) = P(D \mid W)\, P(W)$:

P(D, W)
D W P
wet sun 0.08
dry sun 0.72
wet rain 0.14
dry rain 0.06

$P(dry) = 0.72 + 0.06 = 0.78$
Ghostbusters, Revisited
• Let’s say we have two distributions:
• Prior distribution over ghost location: P(G)
• Let’s say this is uniform
• Sensor reading model: P(R | G)
• Given: we know what our sensors do
• R = reading color measured at (1,1)
• E.g., P(R = yellow | G=(1,1)) = 0.1
• We can calculate the posterior distribution P(G | r) over ghost locations given a reading using Bayes’ rule:
$P(g \mid r) \propto P(r \mid g)\, P(g)$

[Demo: Ghostbuster – with probability (L12D2) ]


Video of Demo Ghostbusters with Probability
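A sketch of this Bayesian update over ghost locations on a small grid, with a uniform prior and a made-up distance-based sensor model; the grid size and all sensor probabilities here are illustrative, not the course's exact numbers.

```python
import itertools

GRID = list(itertools.product(range(3), range(3)))   # a small 3x3 grid of ghost locations
prior = {g: 1 / len(GRID) for g in GRID}             # uniform prior P(G)

def sensor_model(reading, ghost, sensor_pos=(1, 1)):
    """Illustrative P(R = reading | G = ghost): closer ghosts make 'red'/'orange'
    more likely, farther ones 'yellow'/'green'."""
    dist = abs(ghost[0] - sensor_pos[0]) + abs(ghost[1] - sensor_pos[1])
    table = {0: {"red": 0.7, "orange": 0.2, "yellow": 0.05, "green": 0.05},
             1: {"red": 0.1, "orange": 0.6, "yellow": 0.2, "green": 0.1},
             2: {"red": 0.05, "orange": 0.3, "yellow": 0.45, "green": 0.2}}
    return table.get(dist, {"red": 0.02, "orange": 0.08, "yellow": 0.3, "green": 0.6})[reading]

# Bayesian update: P(g | r) is proportional to P(r | g) P(g); then normalize
reading = "yellow"
unnormalized = {g: sensor_model(reading, g) * prior[g] for g in GRID}
z = sum(unnormalized.values())
posterior = {g: p / z for g, p in unnormalized.items()}
print(max(posterior, key=posterior.get), round(max(posterior.values()), 3))
```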
What did we learn?
• Generative vs Discriminative Models
• MLE and MAP
• Random variables and events
• Maximum Likelihood and Maximum a posteriori estimates
• Probability distribution
• Joint distribution and marginal distribution
• Conditional probabilities
• Probabilistic inference
• Bayes’ rule
• Bayesian inference
What next?
• Class:
• Bayesian networks
• Assignments:
• Programming assignment 1: Perceptron (due Wednesday, Sep-22)
• Written assignment 2: generative models (due Friday, Sep-24)
• Quizzes:
• Quiz 1 (due Tuesday, Sep-28)

52
