Bayesian Statistics


What is Bayesian Statistics?

Alireza Akhondi-Asl
MSICU Center For Outcomes
Department of Anesthesiology, Critical Care and Pain Medicine
Learning Objectives

• What is Bayes’ Rule?
• How is belief updated in Bayesian statistics?
• What are the differences from frequentist statistics?
Probability Interpretations

• Frequentist
• Subjective
Frequentist
Probability
• Relative frequency of an event in the long run:

$$P(e) = \lim_{n \to \infty} \frac{\#\,\text{times } e \text{ happened}}{n}$$
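As a quick illustration (not on the slides), a short simulation shows the relative frequency of an event stabilizing near its probability as the number of trials grows; the event and its "true" probability of 0.3 are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
p_true = 0.3                      # hypothetical true probability of the event

# Simulate n Bernoulli trials and track the running relative frequency.
n = 100_000
happened = rng.random(n) < p_true
running_freq = np.cumsum(happened) / np.arange(1, n + 1)

for k in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {k:>6} trials: relative frequency = {running_freq[k - 1]:.4f}")
# The relative frequency approaches p_true = 0.3 as n grows,
# which is the frequentist reading of P(e).
```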
Subjective Probability
• It is “inside the head” probability.
• How strongly do you believe that a patient is going to survive?
• The probability of the Democrats winning the 2024 US presidential election.

Subjective Probability
• Sometimes it is hard to quantify our belief.
• Think about a fair bet.
• Compare with other events that have clear probabilities.
• We should be coherent.
Conditional probability
[Venn diagram: sample space S with events A and B]

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Law of total probability
If $A_1, A_2, \ldots, A_4$ is a partition of the sample space, then for any event $B$ we have:

$$P(B) = \sum_{i} P(B \mid A_i)\, P(A_i)$$

[Venn diagram: sample space S partitioned into $A_1, A_2, A_3, A_4$, with event $B$ spanning the partition]
Bayes’ Rule

[Venn diagram: sample space S with events A and B]

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
Bayes’ Rule

[Venn diagram: sample space S partitioned into $A_1, A_2, A_3, A_4$, with event $B$]

$$P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_{j} P(B \mid A_j)\, P(A_j)}$$
Medical Test
• A certain disease affects about 1 out of 1000 people in a population.
  • P(sick) = 0.001
  • P(healthy) = 0.999
• There is a test to check whether a person has the disease. The test has very high sensitivity and specificity. In particular, we know that:
  • P(T+ | sick) = 0.98
  • P(T+ | healthy) = 0.01
Medical Test

If you test positive for this disease, what are the chances that you have the disease?

A) 98 percent
B) Less than 10 percent
Medical Test

$$P(\text{sick} \mid T+) = \frac{P(T+ \mid \text{sick})\, P(\text{sick})}{P(T+)}$$

Medical Test

$$P(\text{sick} \mid T+) = \frac{P(T+ \mid \text{sick})\, P(\text{sick})}{P(T+ \mid \text{sick})\, P(\text{sick}) + P(T+ \mid \text{healthy})\, P(\text{healthy})}$$

Medical Test

$$P(\text{sick} \mid T+) = \frac{0.98 \times 0.001}{0.98 \times 0.001 + 0.01 \times 0.999} = 0.089$$

The test updates your chance of having the disease from 0.001 to 0.089.
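A minimal sketch of this calculation in Python; the numbers are the ones on the slide, and the helper name `posterior_positive` is ours.

```python
def posterior_positive(prior_sick, sens, false_pos):
    """P(sick | T+) via Bayes' rule over a two-state partition (sick / healthy)."""
    p_pos = sens * prior_sick + false_pos * (1 - prior_sick)  # P(T+), law of total probability
    return sens * prior_sick / p_pos

p = posterior_positive(prior_sick=0.001, sens=0.98, false_pos=0.01)
print(f"P(sick | T+) = {p:.3f}")   # ~0.089, i.e. less than 10 percent
```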
Medical Test

If you test positive for this disease, what are the chances that you have the disease?

A) 98 percent
B) Less than 10 percent
Richard Royall’s Three Questions
• What does the present evidence tell us?
• What should we believe?
• What should we do?
Medical Test Paradox
• A second independent test with the same accuracy is done
and it is positive again. What are the chances that you have
the disease?
• A) More than 90 percent
• B) Less than 10 percent
Medical Test Paradox

$$P(\text{sick} \mid T+) = \frac{P(T+ \mid \text{sick})\, P(\text{sick})}{P(T+ \mid \text{sick})\, P(\text{sick}) + P(T+ \mid \text{healthy})\, P(\text{healthy})}$$

• P(sick) = ?
• P(healthy) = ?

Medical Test Paradox
• The first positive test has already updated our belief:
• P(sick) = 0.089
• P(healthy) = 0.911

$$P(\text{sick} \mid T+) = \frac{0.98 \times 0.089}{0.98 \times 0.089 + 0.01 \times 0.911} = 0.906$$
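Continuing the earlier sketch with the same hypothetical helper: yesterday's posterior becomes today's prior, assuming the two tests are conditionally independent given disease status.

```python
def posterior_positive(prior_sick, sens, false_pos):
    """P(sick | T+) from Bayes' rule (same helper as in the earlier sketch)."""
    p_pos = sens * prior_sick + false_pos * (1 - prior_sick)
    return sens * prior_sick / p_pos

p1 = posterior_positive(prior_sick=0.001, sens=0.98, false_pos=0.01)  # after the first positive test
p2 = posterior_positive(prior_sick=p1, sens=0.98, false_pos=0.01)     # second positive test, prior = p1
print(f"after one positive test:  P(sick) = {p1:.3f}")   # ~0.089
print(f"after two positive tests: P(sick) = {p2:.3f}")   # ~0.906
```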
Medical Test Paradox
• A second independent test with the same accuracy is done
and it is positive again. What are the chances that you have
the disease?
• A) More than 90 percent
• B) Less than 10 percent
Statistical Analysis
• Frequentist
• Bayesian
• Likelihoodist
Frequentist
• The most popular method for statistical inference.
• Probabilities are long-run relative frequencies from repeated experiments.
• Parameters are fixed but unknown constants.
• Data is assumed to be random.
• We cannot make any probability statement about the parameters.
• Randomness is due to sampling from a fixed population.
• The uncertainty is due to sampling variation.
Frequentist
• $P(\text{Data} \mid \theta)$
• Maximum likelihood estimation
• p-values: $P(\text{Data} \mid \theta = \theta_0)$
• Confidence intervals, effect size
• No probability statement about $\theta$
Bayesian
• Probability is interpreted as “degree of subjective belief”.
  • The events do not need to be repeatable.
  • Epistemic uncertainty.
• We don’t know the value of the parameters, and therefore we consider them to be random variables.
  • Parameters are probabilistic in nature.
• Since we have observed the data, it is fixed.
• We update our prior belief based on the observed data. The updated belief is called the posterior belief.
  • We use Bayes’ rule to calculate the posterior.
Bayesian
• Update our belief in a parameter using new evidence or data.
• Based on Bayes’ rule:

$$P(\theta \mid \text{Data}) = \frac{P(\text{Data} \mid \theta)\, P(\theta)}{P(\text{Data})}
\qquad \text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}$$

[Diagram: Prior + Data + Model → Posterior]

Bayesian
• The evidence does not depend on $\theta$, so:

$$P(\theta \mid \text{Data}) \propto P(\text{Data} \mid \theta)\, P(\theta)
\qquad \text{Posterior} \propto \text{Likelihood} \times \text{Prior}$$

[Diagram: Prior + Data + Model → Posterior]
Posterior Distribution
• Summarizes everything we know
• Model comparison
• Prediction
  • Posterior predictive distribution
• Hypothesis testing
  • Region of practical equivalence (ROPE)
Example
• A new treatment approach is proposed. We would like to infer the success rate of this treatment.
• We observe the outcomes of treatment for N patients.
Likelihood
• Since the outcome is binary and the samples are independent, for a fixed number of trials, N, we can use the binomial distribution as our data-generating model:

$$p(\text{Data} \mid \theta) = p(E \mid N, \theta) = \binom{N}{E}\, \theta^{E} (1 - \theta)^{N - E}$$
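As a sketch, the likelihood can be evaluated over a grid of candidate θ values with SciPy; N and E here are the slide's first example (6 patients, 5 successes).

```python
import numpy as np
from scipy.stats import binom

N, E = 6, 5                            # trials and observed successes (from the slide)
theta = np.linspace(0.0, 1.0, 1001)    # candidate success rates

likelihood = binom.pmf(E, N, theta)    # P(E | N, theta) for each theta
theta_mle = theta[np.argmax(likelihood)]
print(f"maximum-likelihood estimate ≈ {theta_mle:.3f}")  # close to E/N = 5/6 ≈ 0.833
```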
Example: Frequentist

Our null hypothesis is that $\theta_0 = 0.5$.
Frequentist
• N = 6, E = 5
• $\hat{\theta} = 0.833$
• CI: 0.36–1.0

Frequentist
• N = 18, E = 15
• $\hat{\theta} = 0.833$
• CI: 0.59–0.96
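A hedged sketch of how these numbers could be reproduced with SciPy (≥ 1.7), assuming the slides report exact (Clopper–Pearson) 95% intervals for the test against θ₀ = 0.5.

```python
from scipy.stats import binomtest

for N, E in [(6, 5), (18, 15)]:
    result = binomtest(k=E, n=N, p=0.5)            # exact test against the null theta_0 = 0.5
    ci = result.proportion_ci(confidence_level=0.95, method="exact")
    print(f"N={N:>2}, E={E:>2}: theta_hat={E / N:.3f}, "
          f"p-value={result.pvalue:.3f}, CI=({ci.low:.2f}, {ci.high:.2f})")
```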
Example: Bayesian
• Let’s assume that we believe the success rate is around 50%. This is our prior belief before observing any data.
• We update our belief after observing each outcome.
[Figure: the prior, then the posterior updated after each observation O1..O6 = Y, Y, N, Y, Y, Y]

“Today's posterior is tomorrow's prior”
— Lindley
Higher resolution
• We need a distribution to describe our prior belief such that the posterior has a closed-form distribution.
• The Beta distribution is an excellent option for parameters in the range [0, 1].
• It is the conjugate prior for the binomial distribution.
• Beta prior + binomial likelihood = Beta posterior

$$p(\theta \mid a, b) = \mathrm{Beta}(a, b) \propto \theta^{a-1} (1 - \theta)^{b-1}, \quad 0 \le \theta \le 1$$

Conjugate Prior
• Mean $= \dfrac{a}{a+b}$, Mode $= \dfrac{a-1}{a+b-2}$
• With N samples and E events, the posterior is

$$p(\theta \mid \text{data}) = \mathrm{Beta}(a + E,\; b + N - E)$$

• $a + b$ is the effective sample size of the prior.
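A minimal sketch of the conjugate update in Python. The Beta(2, 2) prior standing in for "around 50%" is our assumption (the slides show several different Beta priors); N = 6 and E = 5 are the earlier example's data.

```python
from scipy.stats import beta

a, b = 2.0, 2.0        # hypothetical prior centered on 50%; other Beta priors work the same way
N, E = 6, 5            # observed patients and successful treatments (from the slide)

# Conjugacy: Beta(a, b) prior + Binomial(N, theta) likelihood -> Beta(a + E, b + N - E) posterior
post = beta(a + E, b + N - E)

print(f"posterior mean = {post.mean():.3f}")
print(f"posterior mode = {(a + E - 1) / (a + b + N - 2):.3f}")
print(f"95% credible interval = {post.interval(0.95)}")
```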


Beta Distribution

[Figure: example Beta densities: beta(1.5, 1.5), beta(9, 9), beta(4.2, 13.8), beta(13.8, 4.2), beta(1.0, 1.0)]

Kruschke, John K. (2014): Doing Bayesian Data Analysis.
Example: Stopping Rules
• N = 24 and E = 7.
• If N is fixed: binomial likelihood.
• If E is fixed: negative binomial likelihood.

[Figure: posteriors with a uniform prior for fixed N vs. fixed E]
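A sketch (ours, not from the slides) of the point this example illustrates, under a uniform Beta(1, 1) prior: the binomial (fixed N) and negative binomial (fixed E) likelihoods differ only by a constant factor in θ, so the normalized posteriors coincide.

```python
import numpy as np
from scipy.stats import binom, nbinom

N, E = 24, 7
theta = np.linspace(0.001, 0.999, 999)   # grid over the success rate

# Fixed N: binomial. Fixed E: negative binomial (failures before the E-th success).
lik_binom = binom.pmf(E, N, theta)
lik_nbinom = nbinom.pmf(N - E, E, theta)

# Uniform prior -> posterior proportional to the likelihood; normalize on the grid.
post_binom = lik_binom / lik_binom.sum()
post_nbinom = lik_nbinom / lik_nbinom.sum()

print(f"posterior mean (fixed N): {(theta * post_binom).sum():.6f}")
print(f"posterior mean (fixed E): {(theta * post_nbinom).sum():.6f}")
# The two stopping rules lead to the same posterior for theta under the same prior.
```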
Problems with Bayesian Inference

Subjectivity
• The most serious objection to Bayesian statistics.
• Two observers/researchers can arrive at different conclusions with the same statistical model but different priors.

The denominator is hard to calculate

$$P(\theta \mid \text{Data}) = \frac{P(\text{Data} \mid \theta)\, P(\theta)}{P(\text{Data})} = \frac{P(\text{Data} \mid \theta)\, P(\theta)}{\int_{\theta'} P(\text{Data} \mid \theta')\, p(\theta')\, d\theta'}$$

• In some cases we can use conjugate priors, but in many cases we cannot.
• If the number of parameters is small, we can use grid approximation.
• However, even with a moderate number of parameters, grid approximation is not practical.
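A small sketch of what grid approximation looks like for the one-parameter treatment example (the Beta(2, 2) prior is again a hypothetical choice, as in the earlier conjugate-update sketch): the evidence P(Data) in the denominator is approximated by summing likelihood × prior over a grid of θ.

```python
import numpy as np
from scipy.stats import binom, beta

N, E = 6, 5
a, b = 2.0, 2.0                       # hypothetical Beta prior, as before

theta = np.linspace(0.001, 0.999, 999)            # grid over the parameter
prior = beta.pdf(theta, a, b)
likelihood = binom.pmf(E, N, theta)

unnormalized = likelihood * prior
evidence = unnormalized.sum() * (theta[1] - theta[0])   # crude numerical integral of P(Data)
posterior = unnormalized / evidence                      # grid approximation of P(theta | Data)

print(f"P(Data) ≈ {evidence:.4f}")
print(f"posterior mean ≈ {(theta * posterior).sum() * (theta[1] - theta[0]):.3f}")
# With a conjugate Beta prior the exact posterior is Beta(a + E, b + N - E); the grid result should agree.
```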
Sampling from the Posterior
• Markov chain Monte Carlo (MCMC)
  • Metropolis–Hastings
  • Gibbs sampling
  • JAGS, BUGS
• Hamiltonian Monte Carlo (HMC)
  • Stan
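A toy random-walk Metropolis–Hastings sampler for the Beta–Binomial posterior from the earlier sketches (for illustration only; in practice one would use Stan, JAGS, or BUGS). The prior Beta(2, 2) and the data N = 6, E = 5 are the same assumptions as before.

```python
import numpy as np
from scipy.stats import binom, beta

N, E = 6, 5
a, b = 2.0, 2.0                               # hypothetical Beta prior, as in the earlier sketches

def log_unnormalized_posterior(theta):
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf                        # outside the support
    return binom.logpmf(E, N, theta) + beta.logpdf(theta, a, b)

rng = np.random.default_rng(0)
theta = 0.5                                   # starting value
samples = []
for _ in range(20_000):
    proposal = theta + rng.normal(scale=0.1)  # random-walk proposal
    log_ratio = log_unnormalized_posterior(proposal) - log_unnormalized_posterior(theta)
    if np.log(rng.random()) < log_ratio:      # Metropolis acceptance step
        theta = proposal
    samples.append(theta)

draws = np.array(samples[5_000:])             # drop burn-in
print(f"posterior mean ≈ {draws.mean():.3f}   "
      f"(exact Beta(a+E, b+N-E) mean = {(a + E) / (a + b + N):.3f})")
```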
Principle of Indifference
• “If you are completely ignorant about which of a set of exclusive and exhaustive propositions is true, you should assign them equal probabilities that sum to one.”
Sober, Elliott (2008): Evidence and Evolution. The Logic Behind the Science. Cambridge University Press.
Bayesian Inference Violates the Principle of Indifference
• Uniform prior: we believe all values of $0 \le \theta \le 1$ have the same prior probability.
• We might think that this prior is “uninformative”.
• Change the metric from the probability $\theta$ to the odds:

$$q = \frac{\theta}{1 - \theta}$$

• The induced prior on $q$ is no longer uniform, so the “indifference” does not survive the change of variable.
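A quick sketch (ours) of the point: draw θ uniformly on [0, 1], map each draw to the odds q = θ/(1−θ), and the implied distribution on q is far from uniform.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 1.0, size=100_000)   # "uninformative" uniform prior on the probability scale
odds = theta / (1.0 - theta)                  # the same belief, expressed on the odds scale

# The implied prior on the odds is heavily skewed, not flat:
print("P(q < 1)   ≈", np.mean(odds < 1))      # equals P(theta < 0.5) ≈ 0.5
print("P(q < 10)  ≈", np.mean(odds < 10))     # ≈ 0.91, most mass below 10
print("P(q > 100) ≈", np.mean(odds > 100))    # ≈ 0.01, even though q is unbounded above
```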
How to set the prior?
• Informative priors
  • Prior studies
  • Moment matching
  • Expert knowledge
• Weakly informative priors
• Objective priors
  • Jeffreys prior
  • Reference prior
Jeffreys Prior
• Jeffreys proposed an “objective” prior that is invariant under monotone transformations of the parameter.
• Based on Fisher information.
• It is not uninformative.
• For example, for the binomial distribution, the Jeffreys prior is beta(0.5, 0.5).
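For the binomial likelihood this is a short, standard derivation (added here for completeness): the Fisher information is $I(\theta) = N / [\theta(1-\theta)]$, and the Jeffreys prior is proportional to its square root,

$$p_J(\theta) \;\propto\; \sqrt{I(\theta)} \;=\; \sqrt{\frac{N}{\theta\,(1-\theta)}} \;\propto\; \theta^{-1/2}(1-\theta)^{-1/2},$$

which is exactly the kernel of a beta(0.5, 0.5) distribution.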
[Figure: Jeffreys prior]
Reading Suggestions
• Kruschke, John K. (2014): Doing Bayesian Data Analysis. A Tutorial with R, JAGS, and Stan. Academic Press.
  • Some of the simulations were based on code from this book.
• Lambert, Ben (2018): A Student's Guide to Bayesian Statistics. 1st ed. Los Angeles: SAGE.
• McElreath, Richard (2020): Statistical Rethinking. A Bayesian Course with Examples in R and Stan. Taylor and Francis CRC Press.
• Sober, Elliott (2008): Evidence and Evolution. The Logic Behind the Science. Cambridge, UK: Cambridge University Press.
Conclusions

• Bayesian statistics is a very flexible approach.
• We update our belief after observing data.
• It makes natural probability statements about the parameters.
• Bayesian inference violates the principle of indifference.

https://xkcd.com/1132/
Thank you!
