Chapter 01
Introduction to Bayesian Statistics
References
Hoff, P. D. (2009). A first course in Bayesian statistical methods (Vol. 580). New York:
Springer.
Contents
1 Modes of Statistical Inference
2 Introduction to Bayes Theorem
3 Common Distributions
4 Priors
5 Computing Posterior Distributions
Modes of Statistical Inference
Frequentist Approach
The classical (frequentist) approach provides statistical inference based on the P-value, the significance level, the power, and the confidence interval (CI).
It is a mix of two approaches (Fisher’s approach and Neyman–Pearson’s approach).
Fisher’s Approach
Inductive approach
Introduction of the null hypothesis (H0), significance test, P-value (= evidence against H0), and significance level. NO alternative hypothesis. NO power.
Neyman and Pearson’s Approach
Deductive approach
Introduction of the alternative hypothesis (HA ), type I error, type II error, power, and
hypothesis test.
In practice the two approaches are mixed.
Likelihood Approach
Inference based purely on the likelihood function has not been developed into a full-blown statistical approach.
Considered here as a precursor to the Bayesian approach.
Likelihood function = plausibility of the observed data as a function of the parameters of
the stochastic model.
Likelihood(θ|data) = P(x|θ)
The likelihood is not a valid probability density: viewed as a function of θ with the observed data fixed, it generally does not integrate to 1.
Bayesian Approach
Central idea of the Bayesian approach: combine the likelihood (data) with prior knowledge (prior information) to update what is known about the parameter, yielding a revised probability distribution for the parameter (the posterior).
Example of Bayesian reasoning in real life:
Tourist: prior views on Cambodians + a visit to Cambodia (data) ⇒ posterior views on Cambodians.
Marketing: launch of a new energy drink on the market.
Medical: patients treated for CVA¹ with thrombolytic agents may suffer a severe bleeding accident (SBA). Historical studies (20%, prior) + pilot study (10%, data) ⇒ posterior.
¹ CVA: Cerebrovascular Accident (a brain attack), an interruption in the flow of blood to cells in the brain.
Introduction to Bayes Theorem
Bayes’ Rule
\[ P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)} \]
Equivalently,
\[ P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B\mid A)\,P(A) + P(B\mid \bar A)\,P(\bar A)} \]
Bayes’ Rule
\[ P(D^{+}\mid T^{+}) = \frac{P(T^{+}\mid D^{+})\,P(D^{+})}{P(T^{+}\mid D^{+})\,P(D^{+}) + P(T^{+}\mid D^{-})\,P(D^{-})} \]
In terms of Se, Sp, and prev:
\[ \text{prev}^{+} = \frac{Se \cdot \text{prev}}{Se \cdot \text{prev} + (1 - Sp)\,(1 - \text{prev})} \]
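As a quick numerical illustration of this formula, the positive predictive value can be computed directly; the sensitivity, specificity, and prevalence values below are purely illustrative, not taken from the slides.

def positive_predictive_value(se, sp, prev):
    # Bayes' rule: P(D+ | T+) = Se*prev / (Se*prev + (1 - Sp)*(1 - prev))
    return se * prev / (se * prev + (1 - sp) * (1 - prev))

print(positive_predictive_value(se=0.90, sp=0.95, prev=0.01))  # ~0.15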
Bayes’ Rule:
\[ P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)} \]
Posterior: P(A|B). In Bayesian analysis, we are often looking for the posterior to
represent the distribution of the parameter given the data.
Likelihood: P(B|A). Later we will see that the likelihood represents the probability of observing the data given the parameters.
Prior: P(A). The prior can represent a belief; it can be informative or vague.
Marginal: P(B). This is a normalizing constant and in many analyses may be dropped.
Exercise 1.1
A car repair shop receives a car with reports of a strange noise coming from the engine. The shop knows that 90% of the cars that come in for “noises” have a loose fan belt, while the other 10% have a loose muffler. Cars with a loose muffler are commonly (95%) described as rattling; less commonly (8%), fan-belt issues also sound like a rattle. The car owner describes the strange noise as a rattle. What is the probability that the car has a loose muffler?
1 78%
2 57%
3 95%
Exercise 1.2
It is estimated that 80% of emails are spam. You have developed a new algorithm to detect
spam. Your spam software can detect 99% of spam emails but has a false positive rate of 5%.
Your company receives 1000 emails in a day; how many emails will be incorrectly marked as spam?
1 10
2 20
3 5
4 200
5 50
Exercise 1.3
You have developed a new algorithm for detecting fraud. It has a sensitivity of 90% with a
specificity of 95%. Choose the correct statement:
1 true positive rate = 90%, true negative rate = 5%
2 true positive rate = 90%, true negative rate = 95%
Common Distributions
Binomial Distribution
INTERACT_FLAG = True
if(INTERACT_FLAG):
    interact(binomial_vector_over_y, theta=0.5, n=15)
else:
    binomial_vector_over_y(theta=0.5, n=10)
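The helper binomial_vector_over_y is not defined on these slides; a minimal sketch, assuming it plots the Binomial pmf over y = 0, ..., n for a given theta, could look like this:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

def binomial_vector_over_y(theta, n):
    # P(y | theta, n) for every possible count y
    y = np.arange(0, n + 1)
    plt.bar(y, binom.pmf(y, n, theta))
    plt.xlabel("y")
    plt.ylabel("P(y | theta, n)")
    plt.show()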
Binomial Distributions
Negative Binomial Distribution
Mean = r(1 − θ)/θ
Variance = r(1 − θ)/θ²
Example: To measure the number of days your car would work before it breaks down for
the 3rd time.
Conditions
Count of discrete events
The events can be non-independent (the events can influence or cause other events)
Variance can exceed the mean
if(INTERACT_FLAG):
    interact(negative_binomial_vector_over_y, theta=0.9, total_events=15)
else:
    negative_binomial_vector_over_y(theta=0.9, total_events=15)
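The helper negative_binomial_vector_over_y is likewise not shown; a possible sketch, under the assumption that total_events is the number of events (successes) being waited for and that the function plots the pmf of the number of failures y before reaching them, is:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import nbinom

def negative_binomial_vector_over_y(theta, total_events):
    # P(y) = probability of y failures before the total_events-th success,
    # each trial succeeding with probability theta
    y = np.arange(0, 30)
    plt.bar(y, nbinom.pmf(y, total_events, theta))
    plt.xlabel("y (failures)")
    plt.ylabel("P(y)")
    plt.show()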
Poisson Distribution
Mean = Variance = λ
Example: To model the number of accidents at an intersection. To model the number of
Salmonella outbreaks in a year.
Conditions
Discrete non-negative data - count of events, the rate parameter can be a non-integer
positive value
Each event is independent of other events
Each event happens at a fixed rate
A fixed amount of time in which the events occur
if(INTERACT_FLAG == True):
    interact(poisson_vector, theta=7, y_end=20)
else:
    poisson_vector(theta=7, y_end=20)
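A minimal sketch of the undefined helper poisson_vector, assuming theta is the rate and y_end the largest count displayed:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

def poisson_vector(theta, y_end):
    # Poisson pmf with rate theta over the counts y = 0, ..., y_end
    y = np.arange(0, y_end + 1)
    plt.bar(y, poisson.pmf(y, theta))
    plt.xlabel("y")
    plt.ylabel("P(y | theta)")
    plt.show()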
Exponential Distribution
# x = np.linspace(0, x_end, x_end*4)  # grid of x values, used inside exponential_distribution
if(INTERACT_FLAG):
    interact(exponential_distribution, lambda_rate = 4, x_end=20)
else:
    exponential_distribution(lambda_rate = 0.2, x_end=20)
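The helper exponential_distribution is not defined on the slides; a minimal sketch, assuming lambda_rate is the rate λ and x_end the right edge of the plotted range, could be:

import numpy as np
import matplotlib.pyplot as plt

def exponential_distribution(lambda_rate, x_end):
    # Exponential pdf f(x) = lambda * exp(-lambda * x) on [0, x_end]
    x = np.linspace(0, x_end, x_end * 4)
    plt.plot(x, lambda_rate * np.exp(-lambda_rate * x))
    plt.xlabel("x")
    plt.ylabel("f(x)")
    plt.show()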
Gamma Distribution
α > 0 is the shape parameter; β > 0 is the rate parameter, or the inverse scale parameter
Mean = α/β
Variance = α/β²
Example: To model the time taken for 4 bolts in your car to fail.
Conditions
Continuous non-negative data
A generalization of the exponential distribution, but with more parameters to fit.
An exponential distribution models the time to the first event; a Gamma distribution models the time to the nth event.
if(INTERACT_FLAG):
    interact(gamma_individual, a=2, b=1, x_max=10)
else:
    gamma_individual(2, 1, 10)
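A possible sketch of the undefined helper gamma_individual, assuming a is the shape, b the rate, and x_max the plotted range:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gamma as gamma_dist

def gamma_individual(a, b, x_max):
    # Gamma(a, b) pdf with shape a and rate b (scipy uses scale = 1/b)
    x = np.linspace(0, x_max, 200)
    plt.plot(x, gamma_dist.pdf(x, a, scale=1/b))
    plt.xlabel("x")
    plt.ylabel("f(x)")
    plt.show()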
import math
if(INTERACT_FLAG):
    interact(normal_distribution, mean = 4, sigma = 3)
else:
    normal_distribution(mean = 5, sigma = 4)
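A minimal sketch of the undefined helper normal_distribution, assuming sigma is the standard deviation:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def normal_distribution(mean, sigma):
    # Normal pdf centered at `mean` with standard deviation `sigma`
    x = np.linspace(mean - 4 * sigma, mean + 4 * sigma, 200)
    plt.plot(x, norm.pdf(x, loc=mean, scale=sigma))
    plt.xlabel("x")
    plt.ylabel("f(x)")
    plt.show()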
Normal Distribution
Log-normal Distribution
\[ P(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}\, e^{-(\ln x - \mu)^{2}/(2\sigma^{2})} \]
import math
x = np.linspace(0.1,2.5,100)
OPTION = 2
if(OPTION == 1):
    # Specify the lognormal by its own mean and standard deviation, then convert
    # to the parameters (mean, sigma) of the underlying normal
    mean_x = 2   # CHANGE THIS
    sigma_x = 2  # CHANGE THIS
    mean = np.log(mean_x**2 / (np.sqrt(mean_x**2 + sigma_x**2)))
    sigma = np.sqrt(np.log(1 + (sigma_x**2 / mean_x**2)))
else:
    # Specify the lognormal by its mode and the sigma of the underlying normal
    sigma = 0.2  # CHANGE THIS
    mode = 0.8   # CHANGE THIS
    mean = np.log(mode) + sigma**2
if(INTERACT_FLAG):
    interact(lognormal_distribution, mean = 1, sigma = 0.25)
else:
    lognormal_distribution(mean = mean, sigma = sigma)
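The helper lognormal_distribution is not shown either; a minimal sketch, assuming mean and sigma are the parameters of the underlying normal (as in the conversion code above), could be:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm

def lognormal_distribution(mean, sigma):
    # Log-normal pdf; in scipy's parameterization s = sigma and scale = exp(mean)
    x = np.linspace(0.1, 2.5, 100)
    plt.plot(x, lognorm.pdf(x, s=sigma, scale=np.exp(mean)))
    plt.xlabel("x")
    plt.ylabel("P(x)")
    plt.show()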
Student’s t-Distribution
Similar to the normal distribution with its bell shape but has heavier tails.
\[ p(x) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)} \sqrt{\frac{\lambda}{\nu\pi}} \left(1 + \frac{\lambda (x-\mu)^{2}}{\nu}\right)^{-(\nu+1)/2} \]
Mean = µ
Variance = ν/((ν − 2)λ), for ν > 2
Example: A distribution of exam scores with a significant number of outliers, for which a Normal distribution would not be appropriate.
Conditions
Continuous data
Unbounded distribution
Considered an overdispersed Normal distribution, a mixture of individual normal distributions
with different variances
from scipy import stats
def studentst_distribution(v):
    # Student's t density for v degrees of freedom (scipy shortcut; the
    # manual construction follows on the next slide)
    t = np.linspace(-10, 10, 100)
    plt.plot(t, stats.t.pdf(t, df=v))
    plt.show()

if(INTERACT_FLAG == True):
    interact(studentst_distribution, v=10)
else:
    studentst_distribution(v=10)
# Manual construction of the Student's t density for several degrees of freedom
import math
import numpy as np
import plotly.graph_objects as go
from scipy.special import gamma

t = np.linspace(-10, 10, 100)
fig = go.Figure()
for v in [1, 4, 10]:
    term1 = gamma((v + 1)/2) / (np.sqrt(v * math.pi) * gamma(v/2))
    term2 = (1 + t**2 / v)**(-(v + 1)/2)
    p_t = term1 * term2
    fig.add_scatter(x=t, y=p_t, name=f"v={v}", mode="lines")
fig.show()
Beta Distribution
\[ P(\theta\mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, \theta^{a-1}(1-\theta)^{b-1} \]
Mean = a/(a + b)
Variance = ab/[(a + b)²(a + b + 1)]
Example: in Bayesian analyses, the beta distribution is often used as a prior distribution of
the parameter p (which is bounded between 0 and 1) of the binomial distribution.
Conditions
Takes positive values between 0 and 1 as input
Setting a and b to 1 gives you a uniform distribution
# Beta posterior with uniform Beta prior, a=1, b=1
from scipy.stats import beta as beta_dist

def beta_vector_theta(num_p, total, a, b):
    # Posterior parameters after observing num_p successes out of total trials
    alpha = num_p + a
    beta = total - num_p + b
    theta = np.linspace(0, 1, 25)
    print("Posterior a =", alpha)
    print("Posterior b =", beta)
    plt.plot(theta, beta_dist.pdf(theta, alpha, beta))
    plt.show()

if(INTERACT_FLAG):
    interact(beta_vector_theta, num_p = 4, total=10, a=1, b=1)
else:
    beta_vector_theta(num_p = 4, total=10, a=1, b=1)
Priors
Bayes’ Rule gives us a method to update our beliefs based on prior knowledge.
The prior is the unconditional probability distribution of the parameters before seeing the (new) data.
Prior can come from a number of sources including:
past experiments or experience
some sort of desire for balance or weighting in a decision
non-informative, but objective
mathematical convenience
The choice of prior reflects what is currently known about the parameters. It is often subjective and contested. Two broad types of prior:
1 non-informative
2 informative
The prior can be proper, i.e. conform to the rules of probability and integrate to 1, or
improper.
Convenient choice of priors can lead to closed-form solutions for the posterior.
Conjugate priors
Conjugate priors are priors that induce a known (same family as prior) distribution in the
posterior.
Example:
Data: X ∼ Bern(θ)
Likelihood: \( f(x\mid\theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{k}(1-\theta)^{n-k} \), where \( k = \sum_i x_i \).
Posterior = Likelihood × prior
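To make the conjugacy explicit, here is the standard Beta–Bernoulli update, using the Beta(a, b) prior that the slides introduce later for Bernoulli trials:
\[ p(\theta\mid x) \propto \theta^{k}(1-\theta)^{n-k}\cdot \theta^{a-1}(1-\theta)^{b-1} = \theta^{k+a-1}(1-\theta)^{n-k+b-1}, \]
which is the kernel of a Beta(a + k, b + n − k) distribution, i.e. the posterior is in the same family as the prior.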
Non-informative priors
Non-informative priors are priors that suggest ignorance as to the parameters. These are
sometimes called vague or diffuse priors.
The priors generally cover the region of the parameter space relatively smoothly.
Common non-informative priors: U[−100, 100], N[0, 10⁴].
Jeffreys’ prior is a non-informative prior that is derived from the Fisher information.
We do not specify prior information; we use the information in the data to shape the prior.
The Fisher information I_n(θ) tells us how much information about θ is contained in the data.
Jeffreys’ prior is derived as:
\[ p(\theta) \propto \sqrt{I_n(\theta)}, \qquad I_n(\theta) = E_\theta\!\left[\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^{\!2}\right] = -E_\theta\!\left[\frac{\partial^{2} \ln L(\theta)}{\partial \theta^{2}}\right] \]
Example:
Data: X ∼ gamma(α, β), assuming α is known and β is unknown.
Fisher information I_n(β) = nα/β², leading to the Jeffreys prior for β:
\[ p(\beta) \propto \sqrt{\frac{n\alpha}{\beta^{2}}} = \frac{\sqrt{n\alpha}}{\beta} \propto \frac{1}{\beta} \]
Note: Jeffreys priors are not guaranteed to be proper priors. Perhaps most importantly, Jeffreys priors are invariant under reparameterization.
Informative priors
Informative priors are explicitly chosen to represent current knowledge or belief about the
parameter of interest.
When choosing informative priors, one can also choose the form of prior.
Example: Tossing Coins
We were given a new coin and were told it would generate heads with P(heads) = 0.75.
We conduct a new experiment to characterize the distribution of θ.
When dealing with Bernoulli trials, a computationally convenient choice for the prior is Beta(a, b).
x = np.linspace(0,1,100)
plt.legend(loc="upper left")
plt.show()
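The slide shows only a fragment of the plotting code. A possible completed version, assuming the candidate priors are Beta densities for θ (the Beta(6.9, 3) prior of the next slide plus two purely illustrative alternatives), is:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta as beta_dist

x = np.linspace(0, 1, 100)
# Candidate priors for theta = P(heads); the values other than (6.9, 3) are illustrative
for a, b in [(1, 1), (6.9, 3), (75, 25)]:
    plt.plot(x, beta_dist.pdf(x, a, b), label=f"beta({a},{b})")
plt.xlabel("theta")
plt.legend(loc="upper left")
plt.show()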
We can tune our prior by using the mean (or the mode) to center our belief, and the variance as a measure of the strength of that belief.
Example: Tossing Coins
For the prior Beta(6.9, 3):
\[ E[x] = \frac{a}{a+b} = 0.70, \qquad \text{mode}(x) = \frac{a-1}{a+b-2} \approx 0.75, \qquad V(x) = \frac{ab}{(a+b)^{2}(a+b+1)} = 0.02 \]
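A quick check of these moments with scipy (the mode is computed from its closed-form expression):

from scipy.stats import beta as beta_dist

a, b = 6.9, 3.0
mean, var = beta_dist.stats(a, b, moments="mv")
mode = (a - 1) / (a + b - 2)
print(round(float(mean), 2), round(mode, 2), round(float(var), 3))  # 0.7 0.75 0.019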
The general approach to using priors in models is to start with some justification for a
prior, run the analysis, then come up with competing priors and re-examine the
conclusions under the alternative priors.
Many Bayesian experts recommend that a sensitivity analysis should always be conducted.
The process takes place as follows:
The researcher predetermines a set of priors to use for model estimation.
The model is estimated, and convergence is obtained for all model parameters.
The researcher comes up with a set of competing priors to examine.
Results are obtained for the competing priors and then compared with the original results
through a series of visual and statistical comparisons.
The final model results are written up to reflect the original model results (obtained in Item
1, from the original priors), and the sensitivity analysis results are also presented in order to
comment on how robust (or not) the final model results are to different prior settings.
Computing Posterior Distributions
The Binomial Case
\[ p(\theta) = \frac{1}{B(\alpha_0, \beta_0)}\, \theta^{\alpha_0 - 1}(1-\theta)^{\beta_0 - 1}, \qquad B(\alpha_0, \beta_0) = \frac{\Gamma(\alpha_0)\Gamma(\beta_0)}{\Gamma(\alpha_0 + \beta_0)} \]
Posterior distribution
\[ p(\theta\mid x) \propto L(\theta\mid x)\, p(\theta) \propto \text{Beta}(\alpha, \beta) \]
\[ p(\theta\mid x) = \frac{1}{B(\alpha, \beta)}\, \theta^{\alpha - 1}(1-\theta)^{\beta - 1}, \qquad B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)} \]
where α = α₀ + x and β = β₀ + n − x.
x = np.linspace(0, 1, 100)

# Likelihood kernel for y = 10 successes in n = 50 trials
def f(p):
    return p**(10) * (1-p)**(50-10)
import sympy as sy
from scipy.stats import beta
import matplotlib.pyplot as plt

alpha0, beta0 = 9, 93   # Beta prior parameters (implied by the comments below)
y, n = 10, 50           # observed successes and number of trials

p = sy.Symbol("p")
I = sy.integrate(f(p), (p, 0, 1))
print(I)
scaled_likelihood = (x)**(10)*(1-x)**(50-10) * (523886186670)
# posterior: likelihood*prior
alpha1 = alpha0 + y     # 9+10=19
beta1 = beta0 + n - y   # 93+50-10=133
posterior = beta.pdf(x, alpha1, beta1)
plt.plot(x, posterior, "r--", label="posterior = beta(19, 133)")
plt.legend(loc="upper right")
plt.ylim(-0.1, 20)
plt.xlim(0, 0.4)
plt.show()
def f(p):
    return p**(10) * (1-p)**(50-10)

p = sy.Symbol("p")
I = sy.integrate(f(p), (p, 0, 1))
print(I)
scaled_likelihood = (x)**(10)*(1-x)**(50-10) * (523886186670)
alpha0, beta0 = 1, 1    # uniform Beta(1, 1) prior
# posterior: likelihood*prior
alpha1 = alpha0 + y     # 1+10=11
beta1 = beta0 + n - y   # 1+50-10=41
posterior = beta.pdf(x, alpha1, beta1)
plt.plot(x, posterior, "r--", label="posterior = beta(11, 41)")
plt.legend(loc="upper right")
plt.ylim(-0.1, 20)
plt.xlim(0, 0.4)
plt.xlabel('theta')
plt.show()
Consider you are doing a coin toss experiment. You are given a presumably unfair coin with p(heads) = 0.80, based on 20 previous coin tosses. You now collect new data and analyze the posterior after doing 10 coin tosses and getting 4 heads.
A. Choose the distribution for your prior and construct your posterior distribution.
B. In case no prior information is available, construct your posterior distribution.
The Gaussian Case
\[ f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-\mu)^{2}}{2\sigma^{2}}} \]
For a sample y1, ..., yn we obtain the likelihood:
\[ L(\mu\mid y) \propto \exp\!\left[-\frac{1}{2\sigma^{2}} \sum_{i=1}^{n}(y_i - \mu)^{2}\right] \propto \exp\!\left[-\frac{1}{2}\left(\frac{\mu - \bar y}{\sigma/\sqrt{n}}\right)^{2}\right] \equiv L(\mu\mid \bar y) \]
Denote the sample of n0 IBBENS observations y0 ≡ {y0,1, y0,2, ..., y0,n0} with mean ȳ0.
Likelihood ∝ N(µ0, σ0²)
µ0 ≡ ȳ0 = 328
σ0 = σ/√n0 = 120.3/√563 = 5.072
IBBENS prior distribution:
\[ p(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\!\left[-\frac{1}{2}\left(\frac{\mu - \mu_0}{\sigma_0}\right)^{2}\right] \]
with µ0 ≡ ȳ0.
IBBENS-2 Study:
Sample y with n = 50
ȳ = 318 mg/day and s = 119.5 mg/day
The 95% confidence interval = [284.3, 351.9] mg/day ⇒ wide
Combine IBBENS prior distribution and IBBENS-2 Normal likelihood:
IBBENS-2 likelihood: L(µ|ȳ)
IBBENS prior density: N(µ0, σ0²)
Posterior distribution ∝ p(µ)L(µ|ȳ):
\[ p(\mu\mid y) \propto p(\mu\mid \bar y) \equiv \exp\!\left\{-\frac{1}{2}\left[\left(\frac{\mu - \mu_0}{\sigma_0}\right)^{2} + \left(\frac{\mu - \bar y}{\sigma/\sqrt{n}}\right)^{2}\right]\right\} \]
\[ \frac{1}{\bar\sigma^{2}} = w_0 + w_1 \]
with w0 = 1/σ0² the prior precision and w1 = 1/(σ²/n) the sample precision.
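A small numerical sketch of this precision-weighted update, using the IBBENS prior (µ0 = 328, σ0 = 5.072) and the IBBENS-2 data (ȳ = 318, n = 50); the posterior mean µ̄ = (w0 µ0 + w1 ȳ)/(w0 + w1) is the standard normal–normal result, and using the sample standard deviation s = 119.5 in place of σ is an assumption here:

import numpy as np

mu0, sigma0 = 328.0, 5.072      # IBBENS prior mean and sd
ybar, s, n = 318.0, 119.5, 50   # IBBENS-2 data; s stands in for sigma (assumption)

w0 = 1 / sigma0**2              # prior precision
w1 = 1 / (s**2 / n)             # sample precision
post_var = 1 / (w0 + w1)
post_mean = (w0 * mu0 + w1 * ybar) / (w0 + w1)
print(round(post_mean, 1), round(float(np.sqrt(post_var)), 2))  # ~327.2 and ~4.86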
Prior variance σ0² = σ² ⇒ σ̄² = σ²/(n + 1) ⇒ the prior information amounts to adding one extra observation to the sample.
In general, σ0² = σ²/n0, with n0 arbitrary:
\[ \bar\mu = \frac{n_0}{n_0 + n}\,\mu_0 + \frac{n}{n_0 + n}\,\bar y \qquad\text{and}\qquad \bar\sigma^{2} = \frac{\sigma^{2}}{n_0 + n} \]
The Poisson Case
\[ p(\theta) = \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}\, \theta^{\alpha_0 - 1} e^{-\beta_0 \theta} \]
Posterior
\[ p(\theta\mid y) \propto L(\theta\mid y)\, p(\theta) \propto e^{-n\theta} \prod_{i=1}^{n} \frac{\theta^{y_i}}{y_i!} \cdot \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}\, \theta^{\alpha_0 - 1} e^{-\beta_0 \theta} \propto \theta^{\left(\sum_i y_i + \alpha_0\right) - 1} e^{-(n + \beta_0)\theta} \]
We recognize the kernel of a Gamma(Σ yi + α0, n + β0) distribution:
\[ \Rightarrow\; p(\theta\mid y) \equiv p(\theta\mid \bar y) = \frac{\bar\beta^{\bar\alpha}}{\Gamma(\bar\alpha)}\, \theta^{\bar\alpha - 1} e^{-\bar\beta \theta} \]
with ᾱ = Σ yi + α0 = 9758 + 3 = 9761 and β̄ = n + β0 = 4351 + 1 = 4352 ⇒ STM study: the effect of the prior is minimal.
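As a quick numerical check of this update, using the totals given above (Σ yi = 9758, n = 4351) and the Gamma(3, 1) prior:

alpha0, beta0 = 3, 1           # Gamma prior parameters
sum_y, n = 9758, 4351          # totals from the STM study
alpha_bar = sum_y + alpha0     # 9761
beta_bar = n + beta0           # 4352
print(alpha_bar, beta_bar, round(alpha_bar / beta_bar, 3))  # posterior mean ~2.243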