
Simulation Method and Bayesian Inference

Chapter 01
Introduction to Bayesian Statistics

Prepared by Nhim Malai


[email protected]

Department of Applied Mathematics and Statistics


Institute of Technology of Cambodia

References

Hoff, P. D. (2009). A first course in Bayesian statistical methods (Vol. 580). New York: Springer.

Lesaffre, E., & Lawson, A. B. (2012). Bayesian biostatistics. John Wiley & Sons.

Srijith, R. Introduction to Computational Statistics for Data Scientists Specialization [MOOC]. Coursera. https://www.coursera.org/specializations/compstats


Contents

1 Modes of Statistical Inference

2 Introduction to Bayes Theorem

3 Common Distributions

4 Priors

5 Computing Posterior Distributions


The Binomial Case
The Gaussian Case
The Poisson Case


Modes of Statistical Inference

The central activity in statistics is inference.


Statistical inference is a procedure, or a collection of activities, with the aim of extracting information from (gathered) data and generalizing the observed results beyond the data at hand, say to a population or to the future.
There are two mainstream paradigms for drawing statistical inference:
1 Frequentist approach (aka Classical approach)
2 Bayesian approach
In between these two paradigms is the (pure) likelihood approach.


Frequentist Approach

The classical statistical approach provides statistical inference based on the classical P-value, the significance level, the power, and the confidence interval (CI).
It is a mix of two approaches (Fisher’s approach & Neyman and Pearson’s approach).
Fisher’s Approach
Inductive approach
Introduction of the null hypothesis (H0), the significance test, the P-value (= evidence against H0), and the significance level. NO alternative hypothesis. NO power.
Neyman and Pearson’s Approach
Deductive approach
Introduction of the alternative hypothesis (HA ), type I error, type II error, power, and
hypothesis test.
In practice the two approaches are mixed.


Likelihood Approach

Inference based purely on the likelihood function has not been developed into a full-blown statistical approach.
It is considered here as a precursor to the Bayesian approach.
Likelihood function = plausibility of the observed data as a function of the parameters of the stochastic model:
L(θ|data) = P(x|θ)
The likelihood is not a valid probability distribution for θ: the function obtained by varying θ with the observed data held fixed does not, in general, integrate to 1.
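As a quick numerical illustration (a sketch with assumed data, not from the original slides), integrating a binomial likelihood over θ shows that the area differs from 1:

# Minimal sketch: integrate a binomial likelihood over theta; the area is not 1.
from scipy.integrate import quad
from scipy.special import comb

n, x = 10, 4  # hypothetical data: 4 successes in 10 trials

def likelihood(theta):
    # P(x | theta), viewed as a function of theta with the data held fixed
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

area, _ = quad(likelihood, 0, 1)
print(area)  # ~0.0909, so the likelihood is not a density in theta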


Bayesian Approach

Central idea of the Bayesian approach: combine the likelihood (data) with your prior knowledge (prior information) to update the information on the parameter, resulting in a revised probability associated with the parameter (the posterior probability).
Examples of Bayesian reasoning in real life:
Tourist: prior views on Cambodians + a visit to Cambodia (data) ⇒ posterior view on Cambodians.
Marketing: the launch of a new energy drink on the market.
Medical: Patients treated for CVA¹ with thrombolytic agents may suffer from a severe bleeding accident (SBA). Historical studies (20% - prior), pilot study (10% - data) ⇒ posterior.

¹ CVA (Cerebral Vascular Accident, a brain attack) is an interruption in the flow of blood to cells in the brain.

Contents

1 Modes of Statistical Inference

2 Introduction to Bayes Theorem

3 Common Distributions

4 Priors

5 Computing Posterior Distributions


The Binomial Case
The Gaussian Case
The Poisson Case


Bayes’ Rule

Bayes’ theorem = theorem on inverse probability

P(A|B) = P(B|A) P(A) / P(B)

Equivalently,

P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|Ā) P(Ā)]


Example 1.1: Sensitivity, Specificity, and Prevalence

A: “diseased” (D+); B: “positive diagnostic test” (T+)

Characteristics of a diagnostic test:
Sensitivity (Se) = P(B|A) = P(T+|D+)
Specificity (Sp) = P(B̄|Ā) = P(T−|D−)
Positive predictive value (pred+) = P(A|B) = P(D+|T+)
Negative predictive value (pred−) = P(Ā|B̄) = P(D−|T−)
Prevalence (prev) = P(A) = P(D+)


Example 1.1: Sensitivity, Specificity, and Prevalence

Bayes’ Rule:

P(D+|T+) = P(T+|D+) P(D+) / [P(T+|D+) P(D+) + P(T+|D−) P(D−)]

In terms of Se, Sp, and prev:

pred+ = Se · prev / [Se · prev + (1 − Sp) · (1 − prev)]


Example 1.1: COVID-19

We are interested in taking a test to determine if we have COVID-19.


From a sample, P(disease) = 5% and P(no disease) = 95%.
For one of the COVID-19 tests, the test sensitivity is reported to be 80%, and the test
specificity is 98.9%.
We want to know: when the test is positive, are we in fact infected?
P(T+|D−) = 1 − P(T−|D−) = 1 − 0.989 = 0.011

P(D+|T+) = (0.8 × 0.05) / (0.8 × 0.05 + 0.011 × 0.95) ≈ 0.79

Not 100% but still high at 79%.
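This calculation is easy to reproduce in code; the short sketch below (an illustration, not part of the original slides) implements the positive-predictive-value formula with the sensitivity, specificity, and prevalence quoted above:

# Minimal sketch: positive predictive value via Bayes' rule.
def positive_predictive_value(sensitivity, specificity, prevalence):
    false_positive_rate = 1 - specificity               # P(T+ | D-)
    numerator = sensitivity * prevalence                 # P(T+ | D+) P(D+)
    denominator = numerator + false_positive_rate * (1 - prevalence)
    return numerator / denominator

print(positive_predictive_value(0.80, 0.989, 0.05))      # ~0.79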


Breaking Down Bayes’ Rule

Bayes’ Rule:

P(A|B) = P(B|A) P(A) / P(B)

Posterior: P(A|B). In Bayesian analysis, we are often looking for the posterior to represent the distribution of the parameter given the data.
Likelihood: P(B|A). Later we will see that the likelihood represents the plausibility of observing the data given the parameters.
Prior: P(A). The prior can represent a belief; it can be informative or vague.
Marginal: P(B). This is a constant and in many analyses may be dropped.


Exercise 1.1

A car repair shop receives a car with reports of a strange noise coming from the engine. The
shop knows 90% of the cars that come in for “noises” have a loose fan belt while the other
10% have a loose muffler. A common description, 95%, of cars having loose mufflers is the
rattle. Less commonly, 8%, fan belt issues can also sound like a rattle. The car owner is
describing the strange noise as a rattle. What is the probability the car has a loose muffler?
1 78%
2 57%
3 95%


Exercise 1.2

It is estimated that 80% of emails are spam. You have developed a new algorithm to detect
spam. Your spam software can detect 99% of spam emails but has a false positive rate of 5%.
Your company receives 1000 emails in a day, how many emails will be incorrectly marked as
spam?
1 10
2 20
3 5
4 200
5 50


Exercise 1.3

You have developed a new algorithm for detecting fraud. It has a sensitivity of 90% with a
specificity of 95%. Choose the correct statement:
1 true positive rate = 90%, true negative rate = 5%
2 true positive rate = 90%, true negative rate = 95%


Contents

1 Modes of Statistical Inference

2 Introduction to Bayes Theorem

3 Common Distributions

4 Priors

5 Computing Posterior Distributions


The Binomial Case
The Gaussian Case
The Poisson Case


Binomial Distribution

To describe the number of successes y in n total events. Each draw is an independent Bernoulli event.

P(y|θ) = C(n, y) θ^y (1 − θ)^(n−y)

Mean = nθ
Variance = nθ(1 − θ)
Example: To model the number of successful outcomes in a drug trial.
Conditions
Discrete data
Two possible outcomes for each trial
Each trial is independent
The probability of success/failure is the same in each trial


Binomial Distribution [Python Code]

from __future__ import print_function


from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
import numpy as np
import scipy
from scipy.special import gamma, factorial, comb
import plotly.express as px
import plotly.offline as pyo
import plotly.graph_objs as go
# Set notebook mode to work in offline
pyo.init_notebook_mode()

INTERACT_FLAG = True


Binomial Distribution [Python Code]

def binomial_vector_over_y(theta, n):


total_events = n
y = np.linspace(0, total_events , total_events + 1)
p_y = [comb(int(total_events), int(yelem)) * theta** yelem *
(1 - theta)**(total_events - yelem) for yelem in y]

fig = px.line(x=y, y=p_y, color_discrete_sequence=["steelblue"],


height=600, width=800,
title=" Binomial distribution for theta = %lf, n = %d" %(theta, n))
fig.data[0].line['width'] = 4
fig.layout.xaxis.title.text = "y"
fig.layout.yaxis.title.text = "P(y)"
fig.show()

if(INTERACT_FLAG):
interact(binomial_vector_over_y, theta=0.5, n=15)
else:
binomial_vector_over_y(theta=0.5, n=10)


Binomial Distributions


Negative Binomial Distribution

To describe the number of successes (r − 1) and x failures in the first (x + r − 1) trials, with the rth success occurring on the (x + r)th trial.

P(x|θ) = C(x + r − 1, r − 1) θ^r (1 − θ)^x

Mean = r(1 − θ)/θ
Variance = r(1 − θ)/θ²
Example: To measure the number of days your car would work before it breaks down for
the 3rd time.
Conditions
Count of discrete events
The events can be non-independent (the events can influence or cause other events)
Variance can exceed the mean


Negative Binomial Distribution [Python Code]


def negative_binomial_vector_over_y(theta, total_events):
# total_events = x + r
# for a fixed number of events, what is the probability of seeing 'x' failures
# number of successes 'r' is therefore total_events - x
# theta is the probability of the success event

x = np.linspace(0, total_events , total_events + 1)


p_x = [comb(int(total_events - 1), int(total_events - xelem - 1)) * theta**
(total_events - xelem) * (1 - theta)**(xelem) for xelem in x]

fig = px.line(x=x, y=p_x, color_discrete_sequence=["steelblue"],


height=600, width=800,
title="Negative Binomial distribution for theta = %lf, total_events = %d"
%(theta, total_events))
fig.data[0].line['width'] = 4
fig.layout.xaxis.title.text = "x = number of failures"
fig.layout.yaxis.title.text = "P(x)"
fig.show()

if(INTERACT_FLAG):
interact(negative_binomial_vector_over_y, theta=0.9, total_events=15)
else:
negative_binomial_vector_over_y(theta=0.9, total_events=15)

Negative Binomial Distribution


Poisson Distribution

To indicate the probability of a number of events occurring in a fixed interval.

P(y|θ) = θ^y e^(−θ) / y!,  y = 0, 1, 2, ...

Mean = Variance = θ
Example: To model the number of accidents at an intersection. To model the number of
Salmonella outbreaks in a year.
Conditions
Discrete non-negative data - count of events, the rate parameter can be a non-integer
positive value
Each event is independent of other events
Each event happens at a fixed rate
A fixed amount of time in which the events occur


Poisson Distribution [Python Code]

def poisson_vector(theta, y_end):


y = np.linspace(0,y_end,y_end+1)

p_theta = (theta**y * np.exp(-theta)) / factorial(y)

# y is the number of events


# y_end is how far you want to compute y values
fig = px.line(x=y, y=p_theta, color_discrete_sequence=["steelblue"],
height=600, width=800, title=" Poisson distribution for theta = %d" %(theta))
fig.data[0].line['width'] = 4
fig.layout.xaxis.title.text = "y"
fig.layout.yaxis.title.text = "P(y)"
fig.show()

if(INTERACT_FLAG == True):
interact(poisson_vector, theta=7, y_end=20)
else:
poisson_vector(theta=7, y_end=20)


Poisson Distribution


Exponential Distribution

To model the duration of (time between) events.

P(x) = λ e^(−λx)

Mean = 1/λ
Variance = 1/λ²
Example: Time to failure for the radiator in a car.
Conditions
Continuous non-negative data
Events are considered to happen at a constant rate
Events are considered to be independent


Exponential Distribution [Python Code]

def exponential_distribution(lambda_rate, x_end):

x = np.linspace(0,x_end,x_end*4)

p_x = lambda_rate * np.exp(-lambda_rate * x)

fig = px.line(x=x, y=p_x, color_discrete_sequence=["steelblue"],


height=600, width=800, title=" Exponential distribution for lambda = %lf" %(lambda_rate))
fig.data[0].line['width'] = 4
fig.layout.xaxis.title.text = "x"
fig.layout.yaxis.title.text = "P(x)"
fig.show()

if(INTERACT_FLAG):
interact(exponential_distribution, lambda_rate = 4, x_end=20)
else:
exponential_distribution(lambda_rate = 0.2, x_end=20)


Exponential Distribution


Gamma Distribution

To model the time taken for n independent events to occur.

P(x) = (β^α / Γ(α)) x^(α−1) e^(−βx)

α > 0 is the shape parameter; β > 0 is the rate parameter (the inverse scale parameter)
Mean = α/β
Variance = α/β²
Example: To model the time taken for 4 bolts in your car to fail.
Conditions
Continuous non-negative data
A generalization of the exponential distribution, but with more parameters to fit
An exponential distribution models the time to the first event; the Gamma distribution models the time to the nth event.


Gamma Distribution [Python Code]

def gamma_individual(a, b, x_max):


x = np.arange(0,x_max,0.1)

term = b**a /gamma(a)


p_x = term * x**(a - 1) * np.exp(-b * x)

fig = px.line(x=x, y=p_x, color_discrete_sequence=["steelblue"],


height=600, width=800,
title=" Gamma distribution for a (num events) = %d, b (rate of events) = %d" %(a, b))
fig.data[0].line['width'] = 4
fig.layout.xaxis.title.text = "x (wait times)"
fig.layout.yaxis.title.text = "P(x)"
fig.show()

if(INTERACT_FLAG):
interact(gamma_individual,a=2,b=1,x_max=10)
else:
gamma_individual(2,1,10)


Gamma Distribution

Figure: Gamma distribution by varying parameters α and β


Normal Distribution (Gaussian)

To model a real-valued random variable.

P(x) = (1 / (σ√(2π))) e^(−(x−µ)²/(2σ²))

Mean = µ
Variance = σ²
Example: The heights of men in your state can be represented by a normal distribution.
Conditions
Continuous
Unbounded distribution
Outliers are minimal


Normal Distribution (Gaussian) [Python Code]

import math

def normal_distribution(mean, sigma):

x = np.linspace(-4*sigma + mean ,4*sigma + mean, 50*sigma)

p_x = np.exp(-(x - mean)**2 / (2*sigma*sigma)) / (sigma * np.sqrt(2.0 * math.pi ))

fig = px.line(x=x, y=p_x, color_discrete_sequence=["steelblue"],


height=600, width=800,
title=" Normal distribution for mean = %lf, sigma = %lf" %(mean, sigma))
fig.data[0].line['width'] = 4
fig.layout.xaxis.title.text = "x"
fig.layout.yaxis.title.text = "P(x)"
fig.show()

if(INTERACT_FLAG):
interact(normal_distribution, mean = 4, sigma = 3)
else:
normal_distribution(mean = 5, sigma = 4)


Normal Distribution


Log-normal Distribution

To model a right-skewed continuous random variable (long tail towards the right).

P(x) = (1 / (xσ√(2π))) e^(−(ln(x)−µ)²/(2σ²))

If X ∼ lognormal(µ, σ²), then log(X) ∼ N(µ, σ²)

Mean = exp(µ + σ²/2)
Variance = exp(2µ + σ²)(exp(σ²) − 1)
Example: To model disease parameters such as the reproduction number for epidemics.
Conditions
Continuous non-negative values
Asymmetric, unlike the Normal distribution


Log-normal Distribution [Python Code]

import math

def lognormal_distribution(mean, sigma):

x = np.linspace(0.1,2.5,100)

p_x = np.exp(-(np.log(x) - mean)**2 / (2*sigma*sigma)) / (x * sigma * np.sqrt(2.0 * math.pi ))

mode = np.exp(mean - sigma**2)


fig = px.line(x=x, y=p_x, color_discrete_sequence=["steelblue"],
height=600, width=800,
title=" Lognormal distribution for mean = %lf, sigma = %lf, mode = %lf" %(mean, sigma, mode))
fig.data[0].line['width'] = 4
fig.layout.xaxis.title.text = "x"
fig.layout.yaxis.title.text = "P(x)"
fig.show()


Log-normal Distribution [Python Code]


# OPTION 1, if you want to provide the lognormal mean_x and std deviation_x
# OPTION 2, if one wants to select a mean based on a desired mode of the lognormal distribution
# and a given standard deviation.

OPTION = 2

if(OPTION == 1):
    mean_x = 2   # CHANGE THIS
    sigma_x = 2  # CHANGE THIS
    # lognormal parameters from a desired mean m and sd s:
    # mu = ln(m^2 / sqrt(m^2 + s^2)), sigma = sqrt(ln(1 + s^2/m^2))
    mean = np.log(mean_x**2 / (np.sqrt(mean_x**2 + sigma_x**2)))
    sigma = np.sqrt(np.log(1 + (sigma_x**2 / mean_x**2)))
else:
    sigma = 0.2  # CHANGE THIS
    mode = 0.8   # CHANGE THIS
    # the mode of a lognormal is exp(mu - sigma^2), hence mu = ln(mode) + sigma^2
    mean = np.log(mode) + sigma**2

#print("Mean %lf, sigma %lf, mode %lf "%(mean, sigma, mode))

if(INTERACT_FLAG):
interact(lognormal_distribution, mean = 1, sigma = 0.25)
else:
lognormal_distribution(mean = mean, sigma = sigma)


Log-normal Distribution


Student’s t-Distribution

Similar to the normal distribution with its bell shape, but with heavier tails.

p(x) = (Γ((ν+1)/2) / Γ(ν/2)) √(λ/(νπ)) (1 + λ(x − µ)²/ν)^(−(ν+1)/2)

Mean = µ (for ν > 1)
Variance = ν/((ν − 2)λ) (for ν > 2)
Example: A distribution of test scores from an exam which has a significant number of
outliers and would not be appropriate for a Normal distribution
Conditions
Continuous data
Unbounded distribution
Considered an overdispersed Normal distribution, a mixture of individual normal distributions
with different variances


Student’s t-Distribution [Python Code]

def studentst_distribution(v):

t = np.linspace(-10,10,100)

term1 = gamma((v + 1)/2) / (np.sqrt(v * math.pi) * gamma(v/2))


term2 = (1 + t**2 / v)**(-(v + 1)/2)
p_t = term1 * term2

fig = px.line(x=t, y=p_t, color_discrete_sequence=["steelblue"],


height=600, width=800, title=" Student's t-distribution for v = %lf" %(v))
fig.data[0].line['width'] = 4
fig.layout.xaxis.title.text = "t"
fig.layout.yaxis.title.text = "P(t)"
fig.show()

if(INTERACT_FLAG == True):
interact(studentst_distribution, v=10)
else:
studentst_distribution(v=10)


Student’s t-Distribution [Python Code]


import plotly.graph_objects as go
t = np.linspace(-10,10,100)
fig = go.Figure()

v = 1
term1 = gamma((v + 1)/2) / (np.sqrt(v * math.pi) * gamma(v/2))
term2 = (1 + t**2 / v)**(-(v + 1)/2)
p_t = term1 * term2
#fig.add_trace(go.Scatter(x=t, y=p_t, line=go.scatter.Line(color="gray"), showlegend=True))
fig.add_scatter(x=t, y=p_t, name="v=1", mode="lines")

v = 4
term1 = gamma((v + 1)/2) / (np.sqrt(v * math.pi) * gamma(v/2))
term2 = (1 + t**2 / v)**(-(v + 1)/2)
p_t2 = term1 * term2
#fig.add_trace(go.Scatter(x=t, y=p_t2, line=go.scatter.Line(color="blue"), showlegend=True))
fig.add_scatter(x=t, y=p_t2, name="v=4", mode="lines")

v = 10
term1 = gamma((v + 1)/2) / (np.sqrt(v * math.pi) * gamma(v/2))
term2 = (1 + t**2 / v)**(-(v + 1)/2)
p_t2 = term1 * term2
#fig.add_trace(go.Scatter(x=t, y=p_t2, line=go.scatter.Line(color="blue"), showlegend=True))
fig.add_scatter(x=t, y=p_t2, name="v=10", mode="lines")

Student’s t-Distribution


Beta Distribution

To model continuous random variables whose range is between 0 and 1.

P(θ|a, b) = (Γ(a + b) / (Γ(a)Γ(b))) θ^(a−1) (1 − θ)^(b−1)

Mean = a/(a + b)
Variance = ab/((a + b)²(a + b + 1))
Example: in Bayesian analyses, the beta distribution is often used as a prior distribution of
the parameter p (which is bounded between 0 and 1) of the binomial distribution.
Conditions
Takes positive values between 0 and 1 as input
Setting a and b to 1 gives you a uniform distribution


Beta Distribution
# Beta posterior with uniform Beta prior, a=1, b=1
def beta_vector_theta(num_p, total, a, b):
alpha = num_p + a
beta = total - num_p + b
theta = np.linspace(0,1,25)

print("Posterior a =",alpha)
print("Posterior b =",beta)

term = gamma(alpha + beta) / ( gamma(alpha) * gamma(beta) )


p_theta = term * theta**(alpha - 1) * (1 - theta)**(beta - 1)

fig = px.line(x=theta, y=p_theta, color_discrete_sequence=["steelblue"],


height=600, width=800,
title=" Beta dist. for total # of events=%d, # of positive events=%d" %(total, num_p))
fig.data[0].line['width'] = 4
fig.layout.xaxis.title.text = "theta"
fig.layout.yaxis.title.text = "P(theta)"
fig.show()

if(INTERACT_FLAG):
interact(beta_vector_theta, num_p = 4, total=10, a=1, b=1)
else:
beta_vector_theta(num_p = 4, total=10, a=1, b=1)

Beta Distribution

Figure: Beta distribution by varying parameters α (a) and β (b)


Contents

1 Modes of Statistical Inference

2 Introduction to Bayes Theorem

3 Common Distributions

4 Priors

5 Computing Posterior Distributions


The Binomial Case
The Gaussian Case
The Poisson Case


Priors
Bayes’ Rule gives us a method to update our beliefs based on prior knowledge.
Prior is the unconditional probability of the parameters before the (new) data.
Prior can come from a number of sources including:
past experiments or experience
some sort of desire for balance or weighting in a decision
non-informative, but objective
mathematical convenience
The choice of prior reflects what is currently known (or believed) about the parameters. It is often subjective and contested. Two broad types of prior:
1 non-informative
2 informative
The prior can be proper, i.e. conform to the rules of probability and integrate to 1, or
improper.
Convenient choice of priors can lead to closed-form solutions for the posterior.

Conjugate priors

Conjugate priors are priors that induce a posterior distribution of a known form (the same family as the prior).
Example:
Data: X ∼ Bern(θ)
Likelihood: f(x|θ) = ∏_{i=1}^n θ^(xi) (1 − θ)^(1−xi) = θ^k (1 − θ)^(n−k), where k = Σ xi.
Posterior ∝ Likelihood × prior:

f(x|θ) · p(θ) ∝ [θ^k (1 − θ)^(n−k)] [θ^(α−1) (1 − θ)^(β−1)] = θ^(α+k−1) (1 − θ)^(β+n−k−1)

which we recognize as the kernel of a Beta(α + k, β + n − k) distribution.

If we are modeling a Bernoulli process and use a Beta prior, the posterior is also a Beta distribution.
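A small sketch of this Beta–Bernoulli update using scipy (the data vector and the Beta(2, 2) prior are assumptions for illustration, not from the slides):

# Minimal sketch of the Beta-Bernoulli conjugate update described above.
import numpy as np
from scipy.stats import beta

x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])   # hypothetical Bernoulli data
k, n = x.sum(), len(x)

a0, b0 = 2, 2                                   # assumed Beta(2, 2) prior
posterior = beta(a0 + k, b0 + n - k)            # Beta(alpha + k, beta + n - k)

print(posterior.mean())                         # posterior mean of theta
print(posterior.interval(0.95))                 # central 95% credible interval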


Conjugate priors (Cont.)

Common conjugate priors by likelihood type: a Beta prior for a binomial (Bernoulli) likelihood, a Gamma prior for a Poisson likelihood, and a Normal prior for the mean of a Normal likelihood with known variance; each of these pairs is used later in this chapter.


Non-informative priors

Non-informative priors are priors that suggest ignorance as to the parameters. These are
sometimes called vague or diffuse priors.
The priors generally cover the region of the parameter space relatively smoothly.
Common non-informative priors: U[−100, 100], N(0, 10^4).


Jeffreys’ non-informative prior

Jeffreys’ prior is a non-informative prior that is derived from the Fisher information.
We do not specify prior information; we use the information in the model for the data to shape the prior.
Fisher’s information I_n(θ) tells us how much information about θ is contained in the data.
Jeffreys’ prior is derived by:

p(θ) ∝ √(I_n(θ)), where I_n(θ) = E_θ[(∂ ln L(θ)/∂θ)²] = −E_θ[∂² ln L(θ)/∂θ²]


Jeffreys’ non-informative prior

Example:
Data: X ∼ gamma(α, β), assuming α is known and β is unknown.
Fisher’s information is I_n(β) = nα/β², leading to the Jeffreys’ prior for β:

p(β) ∝ √(nα/β²) = √(nα)/β ∝ 1/β

Note: Jeffreys’ priors are not guaranteed to be proper priors. Perhaps most importantly, Jeffreys’ priors are invariant under reparameterization.
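The Fisher information quoted above can be checked symbolically; the sketch below (not part of the original slides) differentiates the log of the Gamma density with sympy:

# Minimal sketch: verify I(beta) = alpha/beta^2 for one Gamma(alpha, beta) observation,
# so that I_n(beta) = n*alpha/beta^2 for a sample of size n.
import sympy as sp

x, alpha, b = sp.symbols("x alpha b", positive=True)
log_f = sp.log(b**alpha / sp.gamma(alpha) * x**(alpha - 1) * sp.exp(-b * x))
info = -sp.diff(log_f, b, 2)      # -d^2 log f / d beta^2; here it does not depend on x,
print(sp.simplify(info))          # so no expectation is needed: prints alpha/b**2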


Informative priors

Informative priors are explicitly chosen to represent current knowledge or belief about the
parameter of interest.
When choosing informative priors, one can also choose the form of prior.
Example: Tossing Coins
We were given a new coin and were told it would generate heads with P(heads) = 0.75.
We conduct a new experiment to characterize the distribution of θ.
When dealing with Bernoulli trials, a computationally convenient choice for the prior is beta(a, b).


Example: Tossing Coins

What do we choose for the parameters of the beta distribution?
P(heads) = 0.75
To incorporate the information, we might
use beta with a mean close to 0.75.
We also have the ability to choose the
precision/scale that represents some
amount of disbelief in the unfairness of the
coin.
In this case, we can use beta(6.9, 3) or
beta(16,6).


Example: Tossing Coins [Python Code]

from scipy.stats import beta


import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0,1,100)

y1 = beta.pdf(x, 0.5, 0.5)


y2 = beta.pdf(x, 1, 1)
y3 = beta.pdf(x, 3, 3)
y4 = beta.pdf(x, 6.9, 3)
y5 = beta.pdf(x, 16, 6)

plt.plot(x, y1, "-", label="beta(0.5,0.5)")


plt.plot(x, y2, "r--", label="beta(1,1)")
plt.plot(x, y3, "g--", label="beta(3,3)")
plt.plot(x, y4, "b--", label="beta(6.9,3)")
plt.plot(x, y5, "y--", label="beta(16, 6)")

plt.legend(loc="upper left")
plt.show()


Weakly informative or vague priors

We can tune our prior belief using the mean or even mode to center our belief and
variance as a measure of the strength of belief.
Example: Tossing Coins
For the prior beta(6.9, 3):

E[x] = a/(a + b) = 0.70,  mode(x) = (a − 1)/(a + b − 2) ≈ 0.75,  V(x) = ab/((a + b)²(a + b + 1)) = 0.02

For prior, beta(16,6): E (x) = 0.73, mode(x) = 0.75, V (x) = 0.0086


Tuning the prior to include slightly more confidence in the prior information may suggest a
beta(16,6).
These priors are vague in that the mass of the prior is still diffuse allowing the data to
drive the posterior through the likelihood.
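These summaries are easy to verify numerically; the short sketch below (not part of the original slides) evaluates the closed-form expressions above for both priors:

# Minimal sketch: prior mean, mode, and variance for the two beta priors above.
def beta_summary(a, b):
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2)                 # valid for a, b > 1
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, mode, var

print(beta_summary(6.9, 3))   # ~(0.70, 0.75, 0.019)
print(beta_summary(16, 6))    # ~(0.73, 0.75, 0.0086)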


Sensitivity Analysis of Priors

The general approach to using priors in models is to start with some justification for a
prior, run the analysis, then come up with competing priors and re-examine the
conclusions under the alternative priors.
Many Bayesian experts recommend that a sensitivity analysis should always be conducted.
The process takes place as follows:
The researcher predetermines a set of priors to use for model estimation.
The model is estimated, and convergence is obtained for all model parameters.
The researcher comes up with a set of competing priors to examine.
Results are obtained for the competing priors and then compared with the original results
through a series of visual and statistical comparisons.
The final model results are written up to reflect the original model results (obtained in Item
1, from the original priors), and the sensitivity analysis results are also presented in order to
comment on how robust (or not) the final model results are to different prior settings.
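As an illustration of such a sensitivity check (a sketch with assumed data and priors, not from the slides), the snippet below compares the posterior of a binomial proportion under two competing Beta priors:

# Minimal sketch of a prior sensitivity check for a binomial likelihood.
from scipy.stats import beta

x, n = 10, 50                                   # hypothetical data: 10 events in 50 trials
for a0, b0 in [(1, 1), (9, 93)]:                # competing priors: flat and informative
    post = beta(a0 + x, b0 + n - x)
    print((a0, b0), round(post.mean(), 3),
          [round(q, 3) for q in post.interval(0.95)])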


Contents

1 Modes of Statistical Inference

2 Introduction to Bayes Theorem

3 Common Distributions

4 Priors

5 Computing Posterior Distributions


The Binomial Case
The Gaussian Case
The Poisson Case


The Binomial Case

Example: Stroke Study

First interim analysis ECASS 3: 50 rt-PA patients with 10 SICHs.
Historical data ECASS 2: 100 rt-PA patients with 8 SICHs.
Estimate the risk of SICH in ECASS 3 to construct a stopping rule for rt-PA.

Note: rt-PA is a treatment for stroke; SICH is a side effect.

The Binomial Case (Cont.)

Prior: ECASS 2 Study

n0 = 100 and x0 = 8
Kernel of the binomial likelihood, θ^(x0) (1 − θ)^(n0 − x0) ∝ Beta(α0 = x0 + 1, β0 = n0 − x0 + 1):

p(θ) = (1 / B(α0, β0)) θ^(α0 − 1) (1 − θ)^(β0 − 1),  B(α0, β0) = Γ(α0)Γ(β0) / Γ(α0 + β0)

Likelihood: ECASS 3 Study

n = 50 and x = 10
Likelihood: L(θ|x) = C(n, x) θ^x (1 − θ)^(n − x)


The Binomial Case (Cont.)

Posterior distribution:

p(θ|x) ∝ L(θ|x) p(θ) ∝ Beta(α, β)

p(θ|x) = (1 / B(α, β)) θ^(α − 1) (1 − θ)^(β − 1),  B(α, β) = Γ(α)Γ(β) / Γ(α + β)

with α = α0 + x and β = β0 + n − x


Example: Stroke Study

Example: Stroke Study


Prior: n0 = 100 and x0 = 8, beta(9, 93)
Likelihood: n = 50 and x = 10,
binomial(50, 0.2)
Posterior: beta(19, 133)


Example: Stroke Study [Python Code]

x = np.linspace(0, 1, 100)

# prior: ECASS 2 study


n0 = 8
y0 = 100
alpha0 = n0 + 1 # 8+1=9
beta0 = y0 - n0 + 1 # 100-8+1=93
prior = beta.pdf(x, alpha0, beta0)
plt.plot(x, prior, "r-", label="prior = beta(9, 93)")

# likelihood: ECASS 3 study


n = 50
y = 10
# x play role as p
likelihood = comb(50,10)*(x)**(10)*(1-x)**(50-10)

def f(p):
return p**(10) * (1-p)**(50-10)


Example: Stroke Study [Python Code]

import sympy as sy  # symbolic integration to normalize the likelihood

p = sy.Symbol("p")
I = sy.integrate(f(p), (p, 0, 1))
print(I)
scaled_likelihood = (x)**(10)*(1-x)**(50-10) * (523886186670)

plt.plot(x, likelihood, "b-", label="likelihood of bin(n=50, y=10)")


plt.plot(x, scaled_likelihood, "b--", label="scaled likelihood (AUC=1)")

# posterior: likelihood*prior
alpha1 = alpha0 + y # 9+10=19
beta1 = beta0 + n - y # 93+50-10=133
posterior = beta.pdf(x, alpha1, beta1)
plt.plot(x, posterior, "r--", label="posterior = beta(19, 133)")

plt.legend(loc="upper right")
plt.ylim(-0.1, 20)
plt.xlim(0,0.4)
plt.show()


Characteristics of the Posterior Distribution

Posterior = compromise between prior & likelihood.


Posterior mode: θ̄ = (n0/(n0 + n)) θ0 + (n/(n0 + n)) θ̂ (analogous result for the mean).
Shrinkage: θ0 ≤ θ̄ ≤ θ̂ when x0/n0 ≤ x/n.
Here: posterior more peaked than prior & likelihood (not in general).
The likelihood dominates the prior for large sample sizes.
Posterior = beta distribution = prior (conjugacy).
The posterior estimate of θ = MLE of the combined ECASS 2 data and interim ECASS 3 data.


Equivalence of Prior Information and Extra Data

A Beta(α0, β0) prior corresponds to a binomial experiment with (α0 − 1) successes in (α0 + β0 − 2) trials.
⇒ approximately equivalent to adding extra data to the observed data set: (α0 − 1) successes and (β0 − 1) failures.


The Binomial Case: No Prior Information Is Available

Example: Stroke Study

Suppose no prior information is available.
We need a prior distribution that expresses ignorance = a non-informative (NI) prior.
For the stroke study: NI prior p(θ) = I_[0,1](θ) = flat prior on [0, 1].
Prior: Uniform prior on [0, 1] = beta(1, 1)
Likelihood: n = 50 and x = 10,
binomial(50, 0.2)
Posterior: beta(11, 41)


No Prior Information Is Available [Python Code]


x = np.linspace(0, 1, 100)

# prior: no prior information is available


alpha0 = 1
beta0 = 1
prior = beta.pdf(x, 1, 1)
plt.plot(x, prior, "r-", label="prior = beta(1, 1)")

# likelihood: ECASS 3 study


n = 50
y = 10
# x play role as p
likelihood = comb(50,10)*(x)**(10)*(1-x)**(50-10)

def f(p):
return p**(10) * (1-p)**(50-10)

p = sy.Symbol("p")
I = sy.integrate(f(p), (p, 0, 1))
print(I)
scaled_likelihood = (x)**(10)*(1-x)**(50-10) * (523886186670)


No Prior Information Is Available [Python Code]

#plt.plot(x, likelihood, "b-", label="likelihood of bin(n=50, y=10)")


plt.plot(x, scaled_likelihood, "b--", label="scaled likelihood (AUC=1)")

# posterior: likelihood*prior
alpha1 = alpha0 + y # 1+10=11
beta1 = beta0 + n - y # 1+50-10=41
posterior = beta.pdf(x, alpha1, beta1)
plt.plot(x, posterior, "r--", label="posterior = beta(11, 41)")

plt.legend(loc="upper right")
plt.ylim(-0.1, 20)
plt.xlim(0,0.4)
plt.xlabel('theta')
plt.show()


Exercise: Tossing Coin

Consider you are doing a coin toss experiment. You are given a presumably unfair coin with p(heads) = 0.80, estimated from 20 coin tosses. You are now collecting new data and analyzing the posterior by doing 10 coin tosses and getting 4 heads.
A. Choose the distribution for your prior and construct your posterior distribution.
B. In case no prior information is available, construct your posterior distribution.


The Gaussian Case

Example: Dietary Study


IBBENS Study: a dietary survey in Belgium
Of interest: intake of cholesterol
Monitoring dietary behavior in Belgium: IBBENS-2 study


The Gaussian Case: Prior Distribution

The histogram of the dietary cholesterol intake of 563 bank employees approximately follows a normal distribution.
Y ∼ N(µ, σ²), where the pdf of Y is

f(y) = (1 / (√(2π) σ)) e^(−(y − µ)²/(2σ²))

Given a sample y1, ..., yn, we obtain the likelihood:

L(µ|y) ∝ exp[−(1/(2σ²)) Σ_{i=1}^n (yi − µ)²] ∝ exp[−(1/2) ((µ − ȳ)/(σ/√n))²] ≡ L(µ|ȳ)


Histogram and Likelihood IBBENS Study


The Gaussian Case: Prior Distribution

Denote the sample of n0 IBBENS observations by y0 ≡ {y0,1, y0,2, ..., y0,n0}, with mean ȳ0
Likelihood ∝ N(µ0, σ0²)
µ0 ≡ ȳ0 = 328
σ0 = σ/√n0 = 120.3/√563 = 5.072
IBBENS prior distribution:

p(µ) = (1 / (√(2π) σ0)) exp[−(1/2) ((µ − µ0)/σ0)²]

with µ0 ≡ ȳ0


Constructing the Posterior Distribution

IBBENS-2 Study:
Sample y with n = 50
ȳ = 318 mg/day and s = 119.5 mg/day
The 95% confidence interval = [284.3, 351.9] mg/day ⇒ wide
Combine IBBENS prior distribution and IBBENS-2 Normal likelihood:
IBBENS-2 likelihood: L(µ|ȳ)
IBBENS prior density: N(µ0, σ0²)
Posterior distribution ∝ p(µ) L(µ|ȳ):

p(µ|y) ∝ p(µ|ȳ) ∝ exp{−(1/2)[((µ − µ0)/σ0)² + ((µ − ȳ)/(σ/√n))²]}


Constructing the Posterior Distribution

What is the integration constant needed to obtain a density?

Recognize a standard distribution: the exponent is a quadratic function of µ.
Posterior distribution:

p(µ|y) = N(µ̄, σ̄²)

with

µ̄ = [(1/σ0²) µ0 + (n/σ²) ȳ] / [1/σ0² + n/σ²]  and  σ̄² = 1 / [1/σ0² + n/σ²]

Hence µ̄ = 327.2 and σ̄ = 4.79
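A small numeric sketch of this normal–normal update (not part of the original slides) with the IBBENS numbers quoted above; small differences from the slide's σ̄ = 4.79 presumably come from rounding in the inputs:

# Minimal sketch: normal-normal posterior for the IBBENS / IBBENS-2 example.
import numpy as np

mu0, sigma0 = 328.0, 120.3 / np.sqrt(563)   # IBBENS prior mean and sd
ybar, s, n = 318.0, 119.5, 50               # IBBENS-2 sample mean, sd, size

w0 = 1 / sigma0**2                          # prior precision
w1 = n / s**2                               # sample precision
post_var = 1 / (w0 + w1)
post_mean = (w0 * mu0 + w1 * ybar) * post_var

print(post_mean, np.sqrt(post_var))         # ~327.2 and ~4.9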


IBBENS-2 Posterior Distribution


Characteristics of the Posterior Distribution

Posterior distribution: a compromise between prior and likelihood.


Posterior mean: a weighted average of the prior mean and the sample mean,

µ̄ = (w0/(w0 + w1)) µ0 + (w1/(w0 + w1)) ȳ

with weights

w0 = 1/σ0²  and  w1 = 1/(σ²/n)

The posterior precision = 1/posterior variance:

1/σ̄² = w0 + w1

with w0 = 1/σ0² = prior precision and w1 = 1/(σ²/n) = sample precision.


Characteristics of the Posterior Distribution

The posterior is always more peaked than both the prior and the likelihood.

When n → ∞ or σ0 → ∞: p(µ|y) = N(ȳ, σ²/n)
When the sample size increases, the likelihood dominates the prior.
Posterior = normal = prior ⇒ conjugacy.


Equivalence of Prior Information and Extra Data

If the prior variance σ0² = σ², then σ̄² = σ²/(n + 1) ⇒ the prior information is equivalent to adding one extra observation to the sample.
In general, take σ0² = σ²/n0, with n0 arbitrary; then

µ̄ = (n0/(n0 + n)) µ0 + (n/(n0 + n)) ȳ

and

σ̄² = σ²/(n0 + n)


The Gaussian Case: No Prior Information Available


Non-informative prior: σ0² → ∞ ⇒ Posterior: N(ȳ, σ²/n)


Exercise: Midge Wing Length

Grogan and Wirth (1981) provided data on the wing length in millimeters of nine members of a species of midge (small, two-winged flies). From these nine measurements (1.64, 1.70, 1.72, 1.74, 1.82, 1.82, 1.82, 1.90, 2.08) we wish to make an inference on the population mean θ. Studies from other populations suggest that wing lengths are typically around 1.9 mm and must be positive. We also know that most of the probability (95%) is within two standard deviations of the mean. Obtain and plot the posterior distribution of the midge wing length.


The Poisson Case

Take y ≡ {y1, ..., yn} independent counts ⇒ Poisson distribution, Poisson(θ):

p(y|θ) = θ^y e^(−θ) / y!

Mean and variance = θ
Poisson likelihood:

L(θ|y) ≡ ∏_{i=1}^n p(yi|θ) = ∏_{i=1}^n (θ^(yi) / yi!) e^(−nθ)


Example: Describing Caries Experience in Flanders

The Signal-Tandmobiel (STM) study:


Longitudinal oral health study in Flanders.
Annual examinations from 1996 to 2001.
4468 children (7% of children born in
1989)
Caries experience measured by dmft-index
(min=0, max=20)
Frequentist and Likelihood Calculations
MLE of θ: θ̂ = ȳ = 2.24
Likelihood-based 95% confidence interval
for θ: [2.1984, 2.2875].


The Poisson Case: Specifying Prior Distribution

Information from literature:


Average dmft-index 4.1 (Liège) & 1.39 (Gent,
1994)
Oral hygiene has improved considerably in Flanders
Average dmft-index bounded above by 10.
Candidate for prior: Gamma(α0, β0)

p(θ) = (β0^α0 / Γ(α0)) θ^(α0 − 1) e^(−β0 θ)

α0 = shape parameter & β0 = rate (inverse scale) parameter
E(θ) = α0/β0 & Var(θ) = α0/β0²
STM study: α0 = 3 & β0 = 1.

Constructing the Posterior Distribution

Posterior:

p(θ|y) ∝ L(θ|y) p(θ) ∝ e^(−nθ) ∏_{i=1}^n (θ^(yi)/yi!) · (β0^α0/Γ(α0)) θ^(α0 − 1) e^(−β0 θ) ∝ θ^((Σ yi + α0) − 1) e^(−(n + β0) θ)

We recognize the kernel of a Gamma(Σ yi + α0, n + β0) distribution:

⇒ p(θ|y) ≡ p(θ|ȳ) = (β̄^ᾱ / Γ(ᾱ)) θ^(ᾱ − 1) e^(−β̄ θ)

with ᾱ = Σ yi + α0 = 9758 + 3 = 9761 and β̄ = n + β0 = 4351 + 1 = 4352 ⇒ STM study: the effect of the prior is minimal.
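A short sketch (not from the original slides) of this Gamma–Poisson update with the STM numbers above:

# Minimal sketch: Gamma-Poisson conjugate update for the STM example.
from scipy.stats import gamma

alpha0, beta0 = 3, 1             # Gamma prior
sum_y, n = 9758, 4351            # total dmft count and number of children

alpha_post = alpha0 + sum_y      # 9761
beta_post = beta0 + n            # 4352

posterior = gamma(a=alpha_post, scale=1.0 / beta_post)   # scipy gamma uses a scale parameter
print(posterior.mean())                                   # ~2.24, close to the MLE
print(posterior.interval(0.95))                           # ~(2.20, 2.29)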


Characteristics of the Posterior Distribution

Posterior is a compromise between prior and likelihood.


Posterior mode and mean demonstrate shrinkage.
For the STM study posterior more peaked than prior and likelihood, but not in general.
Prior is dominated by likelihood for large sample size.
Posterior = gamma = prior ⇒ conjugacy.


Equivalence of Prior Information and Extra Data

Prior = equivalent to experiment of size β0 with counts summing up to α0 − 1


STM study: prior corresponds to an experiment of size 1 with a count equal to 2.


The Poisson Case: No Prior Information Available


Gamma with α0 ≈ 1 and β0 ≈ 0 = non-informative prior.


Exercise: Road Accidents in Cambodia

The bar chart on the right shows the number of injuries caused by road accidents in Cambodia from 2014 to 2019. We are interested in making inferences on the average number of fatalities per year. Experts suggest that road accidents kill roughly 2,000 people every year. Construct the posterior distribution of the average number of fatalities.
Figure: Source: UNDP, Road Traffic Accidents in Cambodia
