Chapter 01
Introduction to Bayesian Statistics
References
Hoff, P. D. (2009). A first course in Bayesian statistical methods (Vol. 580). New York:
Springer.
Contents
1 Modes of Statistical Inference
2 Introduction to Bayes Theorem
3 Common Distributions
4 Priors
5 Computing Posterior Distributions
Modes of Statistical Inference
Frequentist Approach
The classical (frequentist) approach provides statistical inference based on the P-value, the significance level, the power, and the confidence interval (CI).
It is a mix of two approaches (Fisher’s approach and Neyman–Pearson’s approach).
Fisher’s Approach
Inductive approach
Introduction of the null hypothesis (H0), significance test, P-value (= evidence against H0), and significance level. NO alternative hypothesis. NO power.
Neyman and Pearson’s Approach
Deductive approach
Introduction of the alternative hypothesis (HA ), type I error, type II error, power, and
hypothesis test.
In practice the two approaches are mixed.
Likelihood Approach
Inference based purely on the likelihood function has not been developed into a full-blown statistical approach.
Considered here as a precursor to the Bayesian approach.
Likelihood function = plausibility of the observed data as a function of the parameters of
the stochastic model.
Likelihood(θ|data) = P(x|θ)
The likelihood is not a valid probability density: viewed as a function of θ with the observed data fixed, it generally does not integrate to 1.
Bayesian Approach
Central idea of the Bayesian approach: combine the likelihood (data) with prior knowledge (prior information) to update what is known about the parameter, yielding a revised probability distribution for the parameter (the posterior).
Example of Bayesian reasoning in real life:
Tourist: prior views on Cambodians + a visit to Cambodia (data) ⇒ posterior views on Cambodians.
Marketing: launch of a new energy drink on the market.
Medical: patients treated for CVA¹ with thrombolytic agents may suffer a severe bleeding accident (SBA). Historical studies (20%, prior) + pilot study (10%, data) ⇒ posterior.
¹ CVA: Cerebrovascular Accident (a brain attack), an interruption in the flow of blood to cells in the brain.
Introduction to Bayes Theorem
Bayes’ Rule
\[ P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)} \]
Equivalently,
\[ P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B\mid A)\,P(A) + P(B\mid \bar A)\,P(\bar A)} \]
Bayes’ Rule
\[ P(D^{+}\mid T^{+}) = \frac{P(T^{+}\mid D^{+})\,P(D^{+})}{P(T^{+}\mid D^{+})\,P(D^{+}) + P(T^{+}\mid D^{-})\,P(D^{-})} \]
In terms of Se, Sp, and prev:
\[ \text{prev}^{+} = \frac{Se \cdot \text{prev}}{Se \cdot \text{prev} + (1 - Sp)\,(1 - \text{prev})} \]
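As a quick numerical illustration of this formula, the positive predictive value can be computed directly; the sensitivity, specificity, and prevalence values below are purely illustrative, not taken from the slides.

def positive_predictive_value(se, sp, prev):
    # Bayes' rule: P(D+ | T+) = Se*prev / (Se*prev + (1 - Sp)*(1 - prev))
    return se * prev / (se * prev + (1 - sp) * (1 - prev))

print(positive_predictive_value(se=0.90, sp=0.95, prev=0.01))  # ~0.15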
Bayes’ Rule:
\[ P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)} \]
Posterior: P(A|B). In Bayesian analysis, we are often looking for the posterior to
represent the distribution of the parameter given the data.
Likelihood: P(B|A). Later we will see that the likelihood represents the probability of observing the data given the parameters.
Prior: P(A). The prior can represent a belief; it can be informative or vague.
Marginal: P(B). This is a normalizing constant and in many analyses may be dropped.
Exercise 1.1
A car repair shop receives a car with reports of a strange noise coming from the engine. The shop knows that 90% of the cars that come in for “noises” have a loose fan belt, while the other 10% have a loose muffler. Cars with a loose muffler are commonly (95%) described as rattling; less commonly (8%), fan-belt issues also sound like a rattle. The car owner describes the strange noise as a rattle. What is the probability that the car has a loose muffler?
1 78%
2 57%
3 95%
Exercise 1.2
It is estimated that 80% of emails are spam. You have developed a new algorithm to detect
spam. Your spam software can detect 99% of spam emails but has a false positive rate of 5%.
Your company receives 1000 emails in a day; how many emails will be incorrectly marked as spam?
1 10
2 20
3 5
4 200
5 50
Exercise 1.3
You have developed a new algorithm for detecting fraud. It has a sensitivity of 90% with a
specificity of 95%. Choose the correct statement:
1 true positive rate = 90%, true negative rate = 5%
2 true positive rate = 90%, true negative rate = 95%
Common Distributions
Binomial Distribution
INTERACT_FLAG = True
if(INTERACT_FLAG):
    interact(binomial_vector_over_y, theta=0.5, n=15)
else:
    binomial_vector_over_y(theta=0.5, n=10)
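The helper binomial_vector_over_y is not defined on these slides; a minimal sketch, assuming it plots the Binomial pmf over y = 0, ..., n for a given theta, could look like this:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

def binomial_vector_over_y(theta, n):
    # P(y | theta, n) for every possible count y
    y = np.arange(0, n + 1)
    plt.bar(y, binom.pmf(y, n, theta))
    plt.xlabel("y")
    plt.ylabel("P(y | theta, n)")
    plt.show()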
Binomial Distributions
Negative Binomial Distribution
Mean = r(1 − θ)/θ
Variance = r(1 − θ)/θ²
Example: To measure the number of days your car would work before it breaks down for
the 3rd time.
Conditions
Count of discrete events
The events can be non-independent (the events can influence or cause other events)
Variance can exceed the mean
if(INTERACT_FLAG):
    interact(negative_binomial_vector_over_y, theta=0.9, total_events=15)
else:
    negative_binomial_vector_over_y(theta=0.9, total_events=15)
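The helper negative_binomial_vector_over_y is likewise not shown; a possible sketch, under the assumption that total_events is the number of events (successes) being waited for and that the function plots the pmf of the number of failures y before reaching them, is:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import nbinom

def negative_binomial_vector_over_y(theta, total_events):
    # P(y) = probability of y failures before the total_events-th success,
    # each trial succeeding with probability theta
    y = np.arange(0, 30)
    plt.bar(y, nbinom.pmf(y, total_events, theta))
    plt.xlabel("y (failures)")
    plt.ylabel("P(y)")
    plt.show()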
Poisson Distribution
Mean = Variance = λ
Example: To model the number of accidents at an intersection. To model the number of
Salmonella outbreaks in a year.
Conditions
Discrete non-negative data - count of events, the rate parameter can be a non-integer
positive value
Each event is independent of other events
Each event happens at a fixed rate
A fixed amount of time in which the events occur
if(INTERACT_FLAG == True):
    interact(poisson_vector, theta=7, y_end=20)
else:
    poisson_vector(theta=7, y_end=20)
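A minimal sketch of the undefined helper poisson_vector, assuming theta is the rate and y_end the largest count displayed:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

def poisson_vector(theta, y_end):
    # Poisson pmf with rate theta over the counts y = 0, ..., y_end
    y = np.arange(0, y_end + 1)
    plt.bar(y, poisson.pmf(y, theta))
    plt.xlabel("y")
    plt.ylabel("P(y | theta)")
    plt.show()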
Exponential Distribution
# x = np.linspace(0, x_end, x_end*4)  # grid of x values, used inside exponential_distribution
if(INTERACT_FLAG):
    interact(exponential_distribution, lambda_rate = 4, x_end=20)
else:
    exponential_distribution(lambda_rate = 0.2, x_end=20)
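The helper exponential_distribution is not defined on the slides; a minimal sketch, assuming lambda_rate is the rate λ and x_end the right edge of the plotted range, could be:

import numpy as np
import matplotlib.pyplot as plt

def exponential_distribution(lambda_rate, x_end):
    # Exponential pdf f(x) = lambda * exp(-lambda * x) on [0, x_end]
    x = np.linspace(0, x_end, x_end * 4)
    plt.plot(x, lambda_rate * np.exp(-lambda_rate * x))
    plt.xlabel("x")
    plt.ylabel("f(x)")
    plt.show()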
Gamma Distribution
α > 0 is the shape parameter; β > 0 is the rate parameter, or the inverse scale parameter
Mean = α/β
Variance = α/β²
Example: To model the time taken for 4 bolts in your car to fail.
Conditions
Continuous non-negative data
A generalization of the exponential distribution, but with more parameters to fit.
An exponential distribution models the time to the first event; a Gamma distribution models the time to the nth event.
if(INTERACT_FLAG):
    interact(gamma_individual, a=2, b=1, x_max=10)
else:
    gamma_individual(2, 1, 10)
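A possible sketch of the undefined helper gamma_individual, assuming a is the shape, b the rate, and x_max the plotted range:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gamma as gamma_dist

def gamma_individual(a, b, x_max):
    # Gamma(a, b) pdf with shape a and rate b (scipy uses scale = 1/b)
    x = np.linspace(0, x_max, 200)
    plt.plot(x, gamma_dist.pdf(x, a, scale=1/b))
    plt.xlabel("x")
    plt.ylabel("f(x)")
    plt.show()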
import math
if(INTERACT_FLAG):
    interact(normal_distribution, mean = 4, sigma = 3)
else:
    normal_distribution(mean = 5, sigma = 4)
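A minimal sketch of the undefined helper normal_distribution, assuming sigma is the standard deviation:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def normal_distribution(mean, sigma):
    # Normal pdf centered at `mean` with standard deviation `sigma`
    x = np.linspace(mean - 4 * sigma, mean + 4 * sigma, 200)
    plt.plot(x, norm.pdf(x, loc=mean, scale=sigma))
    plt.xlabel("x")
    plt.ylabel("f(x)")
    plt.show()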
Normal Distribution
Log-normal Distribution
\[ P(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}\, e^{-(\ln x - \mu)^{2}/(2\sigma^{2})} \]
import math
x = np.linspace(0.1,2.5,100)
OPTION = 2
if(OPTION == 1):
    # Specify the lognormal by its own mean and standard deviation, then convert
    # to the parameters (mean, sigma) of the underlying normal
    mean_x = 2   # CHANGE THIS
    sigma_x = 2  # CHANGE THIS
    mean = np.log(mean_x**2 / (np.sqrt(mean_x**2 + sigma_x**2)))
    sigma = np.sqrt(np.log(1 + (sigma_x**2 / mean_x**2)))
else:
    # Specify the lognormal by its mode and the sigma of the underlying normal
    sigma = 0.2  # CHANGE THIS
    mode = 0.8   # CHANGE THIS
    mean = np.log(mode) + sigma**2
if(INTERACT_FLAG):
    interact(lognormal_distribution, mean = 1, sigma = 0.25)
else:
    lognormal_distribution(mean = mean, sigma = sigma)
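The helper lognormal_distribution is not shown either; a minimal sketch, assuming mean and sigma are the parameters of the underlying normal (as in the conversion code above), could be:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm

def lognormal_distribution(mean, sigma):
    # Log-normal pdf; in scipy's parameterization s = sigma and scale = exp(mean)
    x = np.linspace(0.1, 2.5, 100)
    plt.plot(x, lognorm.pdf(x, s=sigma, scale=np.exp(mean)))
    plt.xlabel("x")
    plt.ylabel("P(x)")
    plt.show()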
Student’s t-Distribution
Similar to the normal distribution with its bell shape but has heavier tails.
\[ p(x) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)} \sqrt{\frac{\lambda}{\nu\pi}} \left(1 + \frac{\lambda (x-\mu)^{2}}{\nu}\right)^{-(\nu+1)/2} \]
Mean = µ
Variance = ν/((ν − 2)λ), for ν > 2
Example: A distribution of exam scores with a significant number of outliers, for which a Normal distribution would not be appropriate.
Conditions
Continuous data
Unbounded distribution
Considered an overdispersed Normal distribution, a mixture of individual normal distributions
with different variances
from scipy import stats
def studentst_distribution(v):
    # Student's t density for v degrees of freedom (scipy shortcut; the
    # manual construction follows on the next slide)
    t = np.linspace(-10, 10, 100)
    plt.plot(t, stats.t.pdf(t, df=v))
    plt.show()

if(INTERACT_FLAG == True):
    interact(studentst_distribution, v=10)
else:
    studentst_distribution(v=10)
# Manual construction of the Student's t density for several degrees of freedom
import math
import numpy as np
import plotly.graph_objects as go
from scipy.special import gamma

t = np.linspace(-10, 10, 100)
fig = go.Figure()
for v in [1, 4, 10]:
    term1 = gamma((v + 1)/2) / (np.sqrt(v * math.pi) * gamma(v/2))
    term2 = (1 + t**2 / v)**(-(v + 1)/2)
    p_t = term1 * term2
    fig.add_scatter(x=t, y=p_t, name=f"v={v}", mode="lines")
fig.show()
Beta Distribution
\[ P(\theta\mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, \theta^{a-1}(1-\theta)^{b-1} \]
Mean = a/(a + b)
Variance = ab/[(a + b)²(a + b + 1)]
Example: in Bayesian analyses, the beta distribution is often used as a prior distribution of
the parameter p (which is bounded between 0 and 1) of the binomial distribution.
Conditions
Takes positive values between 0 and 1 as input
Setting a and b to 1 gives you a uniform distribution
# Beta posterior with uniform Beta prior, a=1, b=1
from scipy.stats import beta as beta_dist

def beta_vector_theta(num_p, total, a, b):
    # Posterior parameters after observing num_p successes out of total trials
    alpha = num_p + a
    beta = total - num_p + b
    theta = np.linspace(0, 1, 25)
    print("Posterior a =", alpha)
    print("Posterior b =", beta)
    plt.plot(theta, beta_dist.pdf(theta, alpha, beta))
    plt.show()

if(INTERACT_FLAG):
    interact(beta_vector_theta, num_p = 4, total=10, a=1, b=1)
else:
    beta_vector_theta(num_p = 4, total=10, a=1, b=1)
Priors
Bayes’ Rule gives us a method to update our beliefs based on prior knowledge.
The prior is the unconditional probability distribution of the parameters before seeing the (new) data.
Prior can come from a number of sources including:
past experiments or experience
some sort of desire for balance or weighting in a decision
non-informative, but objective
mathematical convenience
The choice of prior reflects what is currently known about the parameters. It is often subjective and contested. Two broad types of prior:
1 non-informative
2 informative
The prior can be proper, i.e. conform to the rules of probability and integrate to 1, or
improper.
Convenient choice of priors can lead to closed-form solutions for the posterior.
Conjugate priors
Conjugate priors are priors that induce a known (same family as prior) distribution in the
posterior.
Example:
Data: X ∼ Bern(θ)
Likelihood: \( f(x\mid\theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{k}(1-\theta)^{n-k} \), where \( k = \sum_i x_i \).
Posterior = Likelihood × prior
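To make the conjugacy explicit, here is the standard Beta–Bernoulli update, using the Beta(a, b) prior that the slides introduce later for Bernoulli trials:
\[ p(\theta\mid x) \propto \theta^{k}(1-\theta)^{n-k}\cdot \theta^{a-1}(1-\theta)^{b-1} = \theta^{k+a-1}(1-\theta)^{n-k+b-1}, \]
which is the kernel of a Beta(a + k, b + n − k) distribution, i.e. the posterior is in the same family as the prior.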
Non-informative priors
Non-informative priors are priors that suggest ignorance as to the parameters. These are
sometimes called vague or diffuse priors.
The priors generally cover the region of the parameter space relatively smoothly.
Common non-informative priors: U[−100, 100], N[0, 10⁴].
Jeffreys’ prior is a non-informative prior that is derived from the Fisher information.
We do not specify prior information; we use the information in the data to shape the prior.
The Fisher information I_n(θ) tells us how much information about θ is contained in the data.
Jeffreys’ prior is derived as:
\[ p(\theta) \propto \sqrt{I_n(\theta)}, \qquad I_n(\theta) = E_\theta\!\left[\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^{\!2}\right] = -E_\theta\!\left[\frac{\partial^{2} \ln L(\theta)}{\partial \theta^{2}}\right] \]
Example:
Data: X ∼ gamma(α, β), assuming α is known and β is unknown.
Fisher information I_n(β) = nα/β², leading to the Jeffreys prior for β:
\[ p(\beta) \propto \sqrt{\frac{n\alpha}{\beta^{2}}} = \frac{\sqrt{n\alpha}}{\beta} \propto \frac{1}{\beta} \]
Note: Jeffreys priors are not guaranteed to be proper priors. Perhaps most importantly, Jeffreys priors are invariant under reparameterization.
Informative priors
Informative priors are explicitly chosen to represent current knowledge or belief about the
parameter of interest.
When choosing informative priors, one can also choose the form of prior.
Example: Tossing Coins
We were given a new coin and were told it would generate heads with P(heads) = 0.75.
We conduct a new experiment to characterize the distribution of θ.
When dealing with Bernoulli trials, a computationally convenient choice for the prior is Beta(a, b).
x = np.linspace(0,1,100)
plt.legend(loc="upper left")
plt.show()
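The slide shows only a fragment of the plotting code. A possible completed version, assuming the candidate priors are Beta densities for θ (the Beta(6.9, 3) prior of the next slide plus two purely illustrative alternatives), is:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta as beta_dist

x = np.linspace(0, 1, 100)
# Candidate priors for theta = P(heads); the values other than (6.9, 3) are illustrative
for a, b in [(1, 1), (6.9, 3), (75, 25)]:
    plt.plot(x, beta_dist.pdf(x, a, b), label=f"beta({a},{b})")
plt.xlabel("theta")
plt.legend(loc="upper left")
plt.show()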
We can tune our prior by using the mean (or the mode) to center our belief, and the variance as a measure of the strength of that belief.
Example: Tossing Coins
For the prior Beta(6.9, 3):
\[ E[x] = \frac{a}{a+b} = 0.70, \qquad \text{mode}(x) = \frac{a-1}{a+b-2} \approx 0.75, \qquad V(x) = \frac{ab}{(a+b)^{2}(a+b+1)} = 0.02 \]
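A quick check of these moments with scipy (the mode is computed from its closed-form expression):

from scipy.stats import beta as beta_dist

a, b = 6.9, 3.0
mean, var = beta_dist.stats(a, b, moments="mv")
mode = (a - 1) / (a + b - 2)
print(round(float(mean), 2), round(mode, 2), round(float(var), 3))  # 0.7 0.75 0.019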
The general approach to using priors in models is to start with some justification for a
prior, run the analysis, then come up with competing priors and re-examine the
conclusions under the alternative priors.
Many Bayesian experts recommend that a sensitivity analysis should always be conducted.
The process takes place as follows:
The researcher predetermines a set of priors to use for model estimation.
The model is estimated, and convergence is obtained for all model parameters.
The researcher comes up with a set of competing priors to examine.
Results are obtained for the competing priors and then compared with the original results
through a series of visual and statistical comparisons.
The final model results are written up to reflect the original model results (obtained in Item
1, from the original priors), and the sensitivity analysis results are also presented in order to
comment on how robust (or not) the final model results are to different prior settings.
Computing Posterior Distributions
The Binomial Case
\[ p(\theta) = \frac{1}{B(\alpha_0, \beta_0)}\, \theta^{\alpha_0 - 1}(1-\theta)^{\beta_0 - 1}, \qquad B(\alpha_0, \beta_0) = \frac{\Gamma(\alpha_0)\Gamma(\beta_0)}{\Gamma(\alpha_0 + \beta_0)} \]
Posterior distribution
\[ p(\theta\mid x) \propto L(\theta\mid x)\, p(\theta) \propto \text{Beta}(\alpha, \beta) \]
\[ p(\theta\mid x) = \frac{1}{B(\alpha, \beta)}\, \theta^{\alpha - 1}(1-\theta)^{\beta - 1}, \qquad B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)} \]
where α = α₀ + x and β = β₀ + n − x.
x = np.linspace(0, 1, 100)

# Likelihood kernel for y = 10 successes in n = 50 trials
def f(p):
    return p**(10) * (1-p)**(50-10)
import sympy as sy
from scipy.stats import beta
import matplotlib.pyplot as plt

alpha0, beta0 = 9, 93   # Beta prior parameters (implied by the comments below)
y, n = 10, 50           # observed successes and number of trials

p = sy.Symbol("p")
I = sy.integrate(f(p), (p, 0, 1))
print(I)
scaled_likelihood = (x)**(10)*(1-x)**(50-10) * (523886186670)
# posterior: likelihood*prior
alpha1 = alpha0 + y     # 9+10=19
beta1 = beta0 + n - y   # 93+50-10=133
posterior = beta.pdf(x, alpha1, beta1)
plt.plot(x, posterior, "r--", label="posterior = beta(19, 133)")
plt.legend(loc="upper right")
plt.ylim(-0.1, 20)
plt.xlim(0, 0.4)
plt.show()
def f(p):
    return p**(10) * (1-p)**(50-10)

p = sy.Symbol("p")
I = sy.integrate(f(p), (p, 0, 1))
print(I)
scaled_likelihood = (x)**(10)*(1-x)**(50-10) * (523886186670)
alpha0, beta0 = 1, 1    # uniform Beta(1, 1) prior
# posterior: likelihood*prior
alpha1 = alpha0 + y     # 1+10=11
beta1 = beta0 + n - y   # 1+50-10=41
posterior = beta.pdf(x, alpha1, beta1)
plt.plot(x, posterior, "r--", label="posterior = beta(11, 41)")
plt.legend(loc="upper right")
plt.ylim(-0.1, 20)
plt.xlim(0, 0.4)
plt.xlabel('theta')
plt.show()
Consider you are doing a coin toss experiment. You are given a presumably unfair coin with p(heads) = 0.80, based on 20 previous coin tosses. You now collect new data and analyze the posterior after doing 10 coin tosses and getting 4 heads.
A. Choose the distribution for your prior and construct your posterior distribution.
B. In case no prior information is available, construct your posterior distribution.
The Gaussian Case
\[ f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-\mu)^{2}}{2\sigma^{2}}} \]
For a sample y1, ..., yn we obtain the likelihood:
\[ L(\mu\mid y) \propto \exp\!\left[-\frac{1}{2\sigma^{2}} \sum_{i=1}^{n}(y_i - \mu)^{2}\right] \propto \exp\!\left[-\frac{1}{2}\left(\frac{\mu - \bar y}{\sigma/\sqrt{n}}\right)^{2}\right] \equiv L(\mu\mid \bar y) \]
Denote the sample of n0 IBBENS observations y0 ≡ {y0,1, y0,2, ..., y0,n0} with mean ȳ0.
Likelihood ∝ N(µ0, σ0²)
µ0 ≡ ȳ0 = 328
σ0 = σ/√n0 = 120.3/√563 = 5.072
IBBENS prior distribution:
\[ p(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\!\left[-\frac{1}{2}\left(\frac{\mu - \mu_0}{\sigma_0}\right)^{2}\right] \]
with µ0 ≡ ȳ0.
IBBENS-2 Study:
Sample y with n = 50
ȳ = 318 mg/day and s = 119.5 mg/day
The 95% confidence interval = [284.3, 351.9] mg/day ⇒ wide
Combine IBBENS prior distribution and IBBENS-2 Normal likelihood:
IBBENS-2 likelihood: L(µ|ȳ)
IBBENS prior density: N(µ0, σ0²)
Posterior distribution ∝ p(µ)L(µ|ȳ):
\[ p(\mu\mid y) \propto p(\mu\mid \bar y) \equiv \exp\!\left\{-\frac{1}{2}\left[\left(\frac{\mu - \mu_0}{\sigma_0}\right)^{2} + \left(\frac{\mu - \bar y}{\sigma/\sqrt{n}}\right)^{2}\right]\right\} \]
\[ \frac{1}{\bar\sigma^{2}} = w_0 + w_1 \]
with w0 = 1/σ0² the prior precision and w1 = 1/(σ²/n) the sample precision.
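A small numerical sketch of this precision-weighted update, using the IBBENS prior (µ0 = 328, σ0 = 5.072) and the IBBENS-2 data (ȳ = 318, n = 50); the posterior mean µ̄ = (w0 µ0 + w1 ȳ)/(w0 + w1) is the standard normal–normal result, and using the sample standard deviation s = 119.5 in place of σ is an assumption here:

import numpy as np

mu0, sigma0 = 328.0, 5.072      # IBBENS prior mean and sd
ybar, s, n = 318.0, 119.5, 50   # IBBENS-2 data; s stands in for sigma (assumption)

w0 = 1 / sigma0**2              # prior precision
w1 = 1 / (s**2 / n)             # sample precision
post_var = 1 / (w0 + w1)
post_mean = (w0 * mu0 + w1 * ybar) / (w0 + w1)
print(round(post_mean, 1), round(float(np.sqrt(post_var)), 2))  # ~327.2 and ~4.86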
Prior variance σ0² = σ² ⇒ σ̄² = σ²/(n + 1) ⇒ the prior information amounts to adding one extra observation to the sample.
In general, σ0² = σ²/n0, with n0 arbitrary:
\[ \bar\mu = \frac{n_0}{n_0 + n}\,\mu_0 + \frac{n}{n_0 + n}\,\bar y \qquad\text{and}\qquad \bar\sigma^{2} = \frac{\sigma^{2}}{n_0 + n} \]
The Poisson Case
\[ p(\theta) = \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}\, \theta^{\alpha_0 - 1} e^{-\beta_0 \theta} \]
Posterior
\[ p(\theta\mid y) \propto L(\theta\mid y)\, p(\theta) \propto e^{-n\theta} \prod_{i=1}^{n} \frac{\theta^{y_i}}{y_i!} \cdot \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}\, \theta^{\alpha_0 - 1} e^{-\beta_0 \theta} \propto \theta^{\left(\sum_i y_i + \alpha_0\right) - 1} e^{-(n + \beta_0)\theta} \]
We recognize the kernel of a Gamma(Σ yi + α0, n + β0) distribution:
\[ \Rightarrow\; p(\theta\mid y) \equiv p(\theta\mid \bar y) = \frac{\bar\beta^{\bar\alpha}}{\Gamma(\bar\alpha)}\, \theta^{\bar\alpha - 1} e^{-\bar\beta \theta} \]
with ᾱ = Σ yi + α0 = 9758 + 3 = 9761 and β̄ = n + β0 = 4351 + 1 = 4352 ⇒ STM study: the effect of the prior is minimal.
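As a quick numerical check of this update, using the totals given above (Σ yi = 9758, n = 4351) and the Gamma(3, 1) prior:

alpha0, beta0 = 3, 1           # Gamma prior parameters
sum_y, n = 9758, 4351          # totals from the STM study
alpha_bar = sum_y + alpha0     # 9761
beta_bar = n + beta0           # 4352
print(alpha_bar, beta_bar, round(alpha_bar / beta_bar, 3))  # posterior mean ~2.243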