
CSC535: Probabilistic Graphical Models

Bayesian Probability and Statistics

Prof. Jason Pacheco


Why Graphical Models?

Data elements often have dependence arising from structure

Examples: Pose Estimation, Protein Structure

Exploit structure to simplify representation and computation


Why “Probabilistic”?

Stochastic processes have many sources of uncertainty

Randomness enters at every stage: the state of nature, the process, and the measurement
PGMs let us represent and reason about these in structured ways
What is Probability?

What does it mean that the probability of heads is ½ ?

Two schools of thought…


Frequentist Perspective
Proportion of successes (heads) in repeated
trials (coin tosses)

Bayesian Perspective
Belief of outcomes based on assumptions
about nature and the physics of coin flips

Neither is better/worse, but we can compare interpretations…


Administrivia

• HW1 due 11:59pm tonight


• Will accept submissions through Friday, -0.5pts per day late
• HW only worth 4pts so maximum score on Friday is 75%
• Late policy only applies to this HW
Frequentist & Bayesian Modeling

We will use the following notation throughout:


θ - unknown quantity (e.g. coin bias)        X - observed data

Frequentist (Conditional Model): p(X | θ)
• θ is a non-random unknown parameter
• p(X | θ) is the sampling / data-generating distribution

Bayesian (Generative Model): p(X | θ) p(θ), i.e. likelihood × prior belief
• θ is a random variable (latent)
• Requires specifying the prior belief p(θ)
Frequentist Inference

Example: Suppose we observe the outcome of N coin flips, X1, …, XN ∈ {0, 1}. What is the probability of heads θ (the coin bias)?

• Coin bias is not random (e.g. there is some true value)


• Uncertainty reported as confidence interval (typically 95%)
Correct Interpretation: Over repeated trials of N coin flips, the true θ will fall inside the confidence interval 95% of the time (in the limit)

• Inferences are valid over multiple trials, never for a single trial


Wrong Interpretation: For this trial there is a 95% chance that θ falls in the confidence interval
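To make the repeated-trials interpretation concrete, here is a small simulation sketch (not from the slides; the true bias, number of flips, and use of a Wald interval are assumptions for illustration):

```python
# Sketch: simulate repeated trials of N coin flips and check how often a 95%
# confidence interval covers the true (fixed, non-random) bias.
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.5    # true coin bias (assumed for the simulation)
N = 100             # flips per trial
trials = 10_000

covered = 0
for _ in range(trials):
    flips = rng.random(N) < theta_true
    theta_hat = flips.mean()
    # Wald (normal-approximation) 95% confidence interval
    se = np.sqrt(theta_hat * (1 - theta_hat) / N)
    lo, hi = theta_hat - 1.96 * se, theta_hat + 1.96 * se
    covered += (lo <= theta_true <= hi)

print(f"Coverage over {trials} trials: {covered / trials:.3f}")   # ~0.95
```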
Bayesian Inference

The posterior distribution is a complete representation of uncertainty


Posterior computed by Bayes' rule:

  p(θ | X) = p(X | θ) p(θ) / p(X)

where p(θ) is the prior belief, p(X | θ) is the likelihood, and p(X) is the marginal likelihood (more on this later)
• Must specify a prior belief p(θ) about the coin bias
• Coin bias θ is a random quantity
• A credible interval can be reported in lieu of the full posterior, and takes an intuitive interpretation for a single trial

Interval Interpretation: For this trial there is a 95% chance that θ lies in the interval
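As a concrete sketch (an assumed example, not from the slides): with a Beta prior on the coin bias and a Bernoulli likelihood, the posterior is another Beta distribution and Bayes' rule reduces to a simple count update:

```python
# Sketch: Beta prior on the coin bias with a Bernoulli likelihood gives a
# Beta posterior (conjugacy). Prior and data values are assumed.
from scipy import stats

a0, b0 = 2.0, 2.0      # Beta(2, 2) prior belief about the bias (assumed)
heads, tails = 7, 3    # observed data from N = 10 flips (assumed)

# Conjugate update: posterior is Beta(a0 + heads, b0 + tails)
posterior = stats.beta(a0 + heads, b0 + tails)

print("Posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```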
Bayesian Inference Example
About 29% of American adults have
high blood pressure (BP). Home test
has 30% false positive rate and no
false negative error.

A recent home test states that you have high


BP. Should you start medication?
Bayesian Inference Example
About 29% of American adults have
high blood pressure (BP). Home test
has 30% false positive rate and no
false negative error.

• Latent quantity of interest is hypertension: θ ∈ {0, 1}
• Measurement of hypertension: home test result X ∈ {0, 1}
• Prior: p(θ = 1) = 0.29
• Likelihood: p(X = 1 | θ = 0) = 0.3 (false positive rate), p(X = 0 | θ = 1) = 0 (no false negatives)
Bayesian Inference Example
About 29% of American adults have
high blood pressure (BP). Home test
has 30% false positive rate and no
false negative error.

Suppose we get a positive measurement X = 1; then the posterior is:

  p(θ = 1 | X = 1) = p(X = 1 | θ = 1) p(θ = 1) / p(X = 1)
                   = (1.0 × 0.29) / (1.0 × 0.29 + 0.3 × 0.71) ≈ 0.58

What conclusions can be drawn from this calculation?
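The same calculation as a short script, using only the numbers stated on the slide:

```python
# Sketch: Bayes' rule for the blood-pressure example.
prior_bp = 0.29            # p(high BP)
p_pos_given_bp = 1.0       # no false negatives
p_pos_given_no_bp = 0.3    # 30% false positive rate

# Marginal likelihood of a positive test
p_pos = p_pos_given_bp * prior_bp + p_pos_given_no_bp * (1 - prior_bp)

posterior_bp = p_pos_given_bp * prior_bp / p_pos
print(f"p(high BP | positive test) = {posterior_bp:.3f}")   # ~0.577
```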


Marginal Likelihood

Posterior calculation requires the marginal likelihood, p(X) = ∫ p(X | θ) p(θ) dθ (a sum when θ is discrete)

• Also called the partition function or evidence


• Key quantity for model learning and selection
• NP-hard to compute in general (actually #P-hard)

Example: Consider a binary latent vector θ = (θ1, …, θN) with θi ∈ {0, 1}; the marginal likelihood p(X) = Σ_θ p(X | θ) p(θ) then sums over 2^N configurations
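A minimal sketch of why this gets expensive (the toy model is assumed for illustration, not from the slides): computing the exact marginal likelihood over a binary latent vector means enumerating all 2^N configurations:

```python
# Sketch: brute-force marginal likelihood p(X) = sum_theta p(X | theta) p(theta)
# over every binary configuration of a latent vector theta (assumed toy model).
import itertools
import numpy as np

rng = np.random.default_rng(0)
N = 12                                   # latent dimension (2^12 = 4096 terms)
X = rng.integers(0, 2, size=N)           # toy binary observation

def log_prior(theta):
    # Assumed prior: each theta_i is an independent fair coin
    return N * np.log(0.5)

def log_likelihood(X, theta):
    # Assumed likelihood: X_i matches theta_i with probability 0.9
    match = (X == theta)
    return np.sum(np.where(match, np.log(0.9), np.log(0.1)))

log_terms = [
    log_likelihood(X, np.array(theta)) + log_prior(np.array(theta))
    for theta in itertools.product([0, 1], repeat=N)
]
# Log-sum-exp over all 2^N configurations
log_pX = np.logaddexp.reduce(log_terms)
print(f"Enumerated {2 ** N} terms, log p(X) = {log_pX:.3f}")
```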
Bayesian Updating

Consider two conditionally independent observations X1 and X2; their joint distribution is:

  p(X1, X2, θ) = p(X2 | θ) p(X1 | θ) p(θ)          (probability chain rule + conditional independence)

So, conditioned on X1:

  p(X2, θ | X1) = p(X2 | θ) p(θ | X1)              (update prior belief after seeing X1)

This is proportional to the full posterior by Bayes' rule:

  p(θ | X1, X2) ∝ p(X2 | θ) p(θ | X1)              (the normalizer involves the marginal likelihood p(X1, X2))

In general, given conditionally independent X1, …, XN:

  p(θ | X1, …, XN) ∝ p(θ) ∏_i p(Xi | θ)
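A quick sketch of this sequential updating in the conjugate Beta-Bernoulli setting (an assumed example, not from the slides): each observation updates the previous posterior, and the final result matches a single batch update:

```python
# Sketch: sequential Bayesian updating of a Beta prior on a coin bias,
# one flip at a time (prior and data values are assumed).
a, b = 1.0, 1.0                  # Beta(1, 1) uniform prior
flips = [1, 0, 1, 1, 0, 1]       # toy observations: 1 = heads

for x in flips:
    # Yesterday's posterior is today's prior
    a, b = a + x, b + (1 - x)

print(f"Sequential posterior: Beta({a:.0f}, {b:.0f})")
# Same as the batch update: Beta(1 + #heads, 1 + #tails)
```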


Exchangeability

We often assume the model is invariant to data ordering

Def: Consider N random variables X1, …, XN and any permutation π of the indices. The variables are exchangeable if every permutation has equal probability,

  p(X1, …, XN) = p(Xπ(1), …, Xπ(N))

• A sequence X1, X2, … is infinitely exchangeable if every finite subsequence is exchangeable
• Being i.i.d. implies exchangeability, but the converse is not true
de Finetti’s Theorem

Simple hierarchical representation for exchangeable models:

  p(X1, …, XN) = ∫ ∏_i p(Xi | θ) p(θ) dθ

• Observe: this is the marginal likelihood for a model with prior p(θ)
• Often used as justification for Bayesian statistics
• Technically only true for infinitely exchangeable sequences, but a reasonable approximation for many finite sequences
Posterior Marginal

In hierarchical models a subset of variables may be of interest

Normal distribution with random parameters: Xi ~ N(μ, σ²), with prior p(μ, σ²)

• Nuisance variable (e.g. the variance σ²)
• Quantity of interest (e.g. the mean μ)

Marginalize out nuisance variables:

  p(μ | X) = ∫ p(μ, σ² | X) dσ²

Use of a conjugate prior ensures an analytic posterior
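A numerical sketch of marginalizing a nuisance variable (the model, grid, and flat prior are assumptions for illustration, not from the slides):

```python
# Sketch: posit X_i ~ N(mu, sigma^2), compute the joint posterior on a grid
# (flat prior assumed), then sum over sigma to get the marginal of mu.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.5, size=30)    # toy data

mu_grid = np.linspace(0.0, 4.0, 200)
sigma_grid = np.linspace(0.5, 3.0, 150)

# Unnormalized log joint posterior on the grid
log_post = np.array([
    [stats.norm.logpdf(X, loc=mu, scale=sig).sum() for sig in sigma_grid]
    for mu in mu_grid
])
post = np.exp(log_post - log_post.max())
post /= post.sum()

# Posterior marginal of mu: sum out the nuisance parameter sigma
mu_marginal = post.sum(axis=1)
print("Posterior mean of mu:", np.sum(mu_grid * mu_marginal))
```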
Prediction

Can make predictions of an unobserved X before seeing any data,

  p(X) = ∫ p(X | θ) p(θ) dθ          (similar calculation to the marginal likelihood)

This is the prior predictive distribution

When we observe data X we can predict future observations Xnew,

  p(Xnew | X) = ∫ p(Xnew | θ) p(θ | X) dθ

This is the posterior predictive distribution
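A short sketch in the Beta-Bernoulli setting (an assumed example, not from the slides): both predictive integrals have closed forms, equal to the prior and posterior means of the bias:

```python
# Sketch: prior and posterior predictive probability of heads on the next flip
# for a Beta-Bernoulli model (prior and data values are assumed).
a0, b0 = 2.0, 2.0      # Beta(2, 2) prior
heads, tails = 7, 3    # observed flips

# Prior predictive: integrate Bernoulli(theta) against the Beta prior,
# which gives the prior mean of theta.
prior_pred_heads = a0 / (a0 + b0)

# Posterior predictive: the same integral against the Beta posterior.
a, b = a0 + heads, b0 + tails
post_pred_heads = a / (a + b)

print(f"Prior predictive p(heads) = {prior_pred_heads:.3f}")
print(f"Posterior predictive p(heads | data) = {post_pred_heads:.3f}")
```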


Prediction Example
About 29% of American adults have
high blood pressure (BP). Home test
has 30% false positive rate and no
false negative error.

What is the likelihood of another positive measurement? Using the posterior predictive:

  p(X2 = 1 | X1 = 1) = p(X2 = 1 | θ = 1) p(θ = 1 | X1 = 1) + p(X2 = 1 | θ = 0) p(θ = 0 | X1 = 1)
                     = 1.0 × 0.58 + 0.3 × 0.42 ≈ 0.70

What conclusions can be drawn from this calculation?
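The same predictive calculation as a script, assuming the two tests are conditionally independent given the true blood-pressure status:

```python
# Sketch: posterior predictive probability of a second positive test,
# using the blood-pressure numbers above.
posterior_bp = 0.29 / (0.29 + 0.3 * 0.71)     # p(high BP | first positive)

p_second_pos = 1.0 * posterior_bp + 0.3 * (1 - posterior_bp)
print(f"p(second positive | first positive) = {p_second_pos:.3f}")   # ~0.70
```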


Model Validation

How do we know if the model is good?

Supervised Learning
Validation set consists of known θ values. Are the true values typically preferred under the posterior?

[Figure: example posteriors, “Good (maybe lucky)” vs. “Not Good (maybe unlucky)”]

Repeat trials over validation set for more certainty


Model Validation

How do we know if the model is good?

Unsupervised Learning
Validation set only contains observable data X. Check validation data against the posterior predictive distribution.

[Figure: example predictive fits, “Good (maybe lucky)” vs. “Not Good (maybe unlucky)”]

Repeat trials over validation set for more certainty


Likelihood and Odds Ratios

Which parameter value, θ1 or θ2, is more likely to have generated the observed data X?

The posterior odds ratio is:

  p(θ1 | X) / p(θ2 | X) = [p(θ1) / p(θ2)] × [p(X | θ1) / p(X | θ2)]
                            (Prior Odds Ratio)   (Likelihood Ratio)

Observe: the marginal likelihood cancels!
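A quick sketch using the blood-pressure example from earlier: the posterior odds follow from the prior odds and the likelihood ratio alone, with no marginal likelihood needed:

```python
# Sketch: posterior odds of high BP vs. no high BP after one positive test.
prior_odds = 0.29 / 0.71           # p(BP) / p(no BP)
likelihood_ratio = 1.0 / 0.3       # p(+ | BP) / p(+ | no BP)

posterior_odds = prior_odds * likelihood_ratio
print(f"Posterior odds (BP vs. no BP): {posterior_odds:.2f}")                   # ~1.36
print(f"Implied posterior prob: {posterior_odds / (1 + posterior_odds):.3f}")   # ~0.577
```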


Bayesian Estimation

Task: produce an estimate θ̂(X) of θ after observing data X.

Bayes estimators minimize an expected loss function:

  θ̂ = argmin_t E[ L(θ, t) | X ]

Example: Minimum mean squared error (MMSE):

  θ̂_MMSE = argmin_t E[ (θ − t)² | X ] = E[ θ | X ]

The posterior mean always minimizes squared error.
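A small sketch (an assumed Beta posterior, not from the slides) checking numerically that the expected squared loss under the posterior is minimized at the posterior mean:

```python
# Sketch: with samples from a posterior, the expected squared error
# E[(theta - t)^2 | X] is minimized at the posterior mean.
import numpy as np
from scipy import stats

posterior = stats.beta(9, 5)     # assumed posterior, e.g. from earlier flips
theta_samples = posterior.rvs(size=100_000, random_state=0)

candidates = np.linspace(0, 1, 501)
expected_sq_loss = [np.mean((theta_samples - t) ** 2) for t in candidates]

best = candidates[np.argmin(expected_sq_loss)]
print("Minimizer of expected squared loss:", best)
print("Posterior mean:", theta_samples.mean())    # essentially the same
```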


Bayes Estimation: More Examples

Minimum absolute error:

  θ̂_MAE = argmin_t E[ |θ − t| | X ] = median of p(θ | X)

Note: Same answer for any loss that is a linear function of |θ − t|.

Maximum a posteriori (MAP):
Very common to produce maximum probability estimates,

  θ̂_MAP = argmax_θ p(θ | X)

The loss function is degenerate (0-1 loss on a vanishingly small window), so MAP is not a Bayes estimator (unless θ is discrete).
Posterior Summarization

Ideally we would report the full posterior distribution as the


result of inference…but this is not always possible

Summary of Posterior Location:


Point estimates: mean (MMSE), mode, median (min. absolute
error)

Summary of Posterior Uncertainty:


Credible intervals / regions, posterior entropy, variance

Bayesian analysis should report uncertainty when possible


Credible Interval

Def. For parameter θ, a (1 − α) credible interval [l, u] satisfies

  p(l ≤ θ ≤ u | X) = 1 − α

i.e., an interval containing a fixed percentage of the posterior probability density.

Note: This is not unique -- consider the 95% intervals below:

[Source: Gelman et al., “Bayesian Data Analysis”]
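A short sketch (an assumed Beta posterior, not from the slides) showing two different but equally valid 95% credible intervals for the same posterior:

```python
# Sketch: the 95% credible interval is not unique; here are two valid choices
# for the same (assumed) Beta posterior.
from scipy import stats

posterior = stats.beta(9, 5)

# Central (equal-tailed) 95% interval
central = posterior.interval(0.95)

# Another valid 95% interval: put all excluded mass in the lower tail
lopsided = (posterior.ppf(0.05), posterior.ppf(1.0))

print(f"Central 95% interval: ({central[0]:.3f}, {central[1]:.3f})")
print(f"One-sided 95% interval: ({lopsided[0]:.3f}, {lopsided[1]:.3f})")
```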


Summary

• Marginal likelihood is required for Bayesian inference, and can be hard to compute: p(X) = ∫ p(X | θ) p(θ) dθ

• One exception is the posterior odds ratio (used in model selection, hypothesis testing, …), where the marginal likelihood cancels

• The posterior predictive can be used to assess model quality in the unsupervised setting: p(Xnew | X) = ∫ p(Xnew | θ) p(θ | X) dθ
Summary

• Bayesian estimation minimizes an expected loss function: θ̂ = argmin_t E[ L(θ, t) | X ]

• Common estimators: Posterior mean → MMSE, Posterior median → MAE


• Posterior uncertainty can be summarized by (not necessarily unique) credible intervals: p(l ≤ θ ≤ u | X) = 1 − α

• Interpretation: For this trial the parameter θ lies in the interval with the specified probability (e.g. 0.95)
