Single Parameter Models
Yi Yang
Outline
Bayes’ Theorem
Prediction
Prior Elicitation
Bayes’ Theorem
Let A denote an event, and A^c denote its complement. Thus, A ∪ A^c = S
and A ∩ A^c = ∅, where S is the sample space. We have

    P(A) + P(A^c) = P(S) ≡ 1.
Let A and B be two non-empty events, and let P(A|B) denote the probability
of A given that B has occurred. From basic probability, we have

    P(A|B) = P(A ∩ B) / P(B),

and thus P(A ∩ B) = P(A|B)P(B).

Likewise, P(A ∩ B) = P(B|A)P(A) and P(A^c ∩ B) = P(B|A^c)P(A^c).

Observe that

    P(A|B) = P(A ∩ B) / P(B)
           = P(A ∩ B) / [P(A ∩ B) + P(A^c ∩ B)]
           = P(B|A)P(A) / [P(B|A)P(A) + P(B|A^c)P(A^c)].

More generally, for a partition A_1, . . . , A_m of S,

    P(A_j|B) = P(B|A_j)P(A_j) / sum_{i=1}^m P(B|A_i)P(A_i),   j = 1, . . . , m.
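As a numerical sketch of the partition form above (the events and all numbers below are made up for illustration, e.g. a two-event disease/test partition):

```python
# Hypothetical partition: A1 = "has disease", A2 = "healthy"; B = "test positive".
prior = {"A1": 0.01, "A2": 0.99}   # P(Aj), sums to 1
lik = {"A1": 0.95, "A2": 0.05}     # P(B|Aj)

# Denominator: P(B) = sum_i P(B|Ai) P(Ai)
p_b = sum(lik[a] * prior[a] for a in prior)

# Bayes' theorem: P(Aj|B) = P(B|Aj) P(Aj) / P(B)
post = {a: lik[a] * prior[a] / p_b for a in prior}
print(post["A1"])  # ≈ 0.161: a positive test still leaves the disease unlikely
```

Note how the small prior P(A1) dominates the strong likelihood, which is exactly the weighting the denominator makes explicit.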
Bayes’ Theorem Applied to Statistical Models
Suppose we have observed data y, which has a probability
distribution f (y|θ) that depends upon an unknown vector of
parameters θ, and π(θ) is the prior distribution of θ that represents
the experimenter’s opinion about θ.
[Figure: posterior densities of θ; one panel compares posteriors with n = 1 and n = 10, another compares posteriors with τ = 1, 2, and 5.]
    p(θ|y) = f(y|θ)π(θ) / m(y) = f(y|θ)π(θ) / ∫_Θ f(y|θ)π(θ) dθ.

Note that m(y) does NOT depend on θ, and thus is just a constant.
That is,

    p(θ|y) ∝ f(y|θ)π(θ).
Deriving the Posterior
Suppose y|θ ~ N(θ, σ²) with σ² known, and the prior is θ ~ N(µ, τ²). Then

    p(θ|y) = N( θ | σ²/(σ² + τ²) · µ + τ²/(σ² + τ²) · y ,  σ²τ²/(σ² + τ²) ).
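A minimal sketch of this update in code; the inputs (y = 6, σ² = 1, µ = 2, τ² = 1) are illustrative, not from the slides:

```python
# Normal-normal update for a single observation y,
# assuming y | θ ~ N(θ, σ²) with σ² known and prior θ ~ N(µ, τ²).
def normal_posterior(y, sigma2, mu, tau2):
    """Posterior mean and variance of θ given one observation y."""
    post_var = sigma2 * tau2 / (sigma2 + tau2)
    post_mean = (sigma2 * mu + tau2 * y) / (sigma2 + tau2)
    return post_mean, post_var

m, v = normal_posterior(y=6.0, sigma2=1.0, mu=2.0, tau2=1.0)
print(m, v)  # 4.0 0.5
```

With equal prior and data variances, the posterior mean is exactly the average of µ and y, and the posterior variance is half of each.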
Bayes and Sufficiency
Recall that T(y) is sufficient for θ if the likelihood can be factored as

    f(y|θ) = h(y) g(T(y)|θ).

Implication in Bayes:

    p(θ|y) ∝ f(y|θ)π(θ) ∝ g(T(y)|θ)π(θ).

Then p(θ|y) = p(θ|T(y)) ⇒ we may work with T(y) instead of the
entire dataset y.
We know that f(ȳ|θ) = N(θ, σ²/n); this implies that

    p(θ|ȳ) = N( θ | (σ²/n)/((σ²/n) + τ²) · µ + τ²/((σ²/n) + τ²) · ȳ ,  (σ²/n)τ² / ((σ²/n) + τ²) ),

where the posterior variance simplifies to σ²τ²/(σ² + nτ²).
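The same update via the sufficient statistic ȳ can be sketched as follows (illustrative numbers, not from the slides):

```python
# Normal-normal update using the sufficient statistic ȳ ~ N(θ, σ²/n).
def posterior_from_mean(ybar, n, sigma2, mu, tau2):
    """Posterior mean and variance of θ given the sample mean of n observations."""
    s2n = sigma2 / n                        # variance of ȳ
    post_var = s2n * tau2 / (s2n + tau2)    # equals σ²τ² / (σ² + nτ²)
    post_mean = (s2n * mu + tau2 * ybar) / (s2n + tau2)
    return post_mean, post_var

m, v = posterior_from_mean(ybar=5.0, n=10, sigma2=2.0, mu=0.0, tau2=1.0)
```

As n grows, s2n shrinks, so the posterior mean moves toward ȳ and the posterior variance toward 0: the data overwhelm the prior.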
Single Parameter Model: Binomial Data
Example: Estimating the probability of a female birth. The currently
accepted value of the proportion of female births in large European
populations is 0.485. Recent interest has focused on factors that
may influence the sex ratio.
Question: How much evidence does this provide for the claim that
the proportion of female births in the population of placenta previa
births is less than the proportion of female births in the general
population?
Example: Probability of a female birth
given placenta previa
Likelihood: f(x|θ) = Bin(n, θ). Prior: let

    π(θ) = Γ(α + β)/(Γ(α)Γ(β)) · θ^(α−1) (1 − θ)^(β−1),

i.e., θ ~ Beta(α, β).
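For the placenta previa data (x = 437 female births of n = 980), the conjugate update can be checked in a few lines; the resulting posterior mean agrees with the Beta(1, 1) row of the posterior-estimates table on a later slide:

```python
# Beta-binomial conjugate update for the placenta previa data.
x, n = 437, 980
alpha, beta_ = 1.0, 1.0            # uniform Beta(1, 1) prior

# Conjugacy: p(θ|x) = Beta(α + x, β + n − x)
a_post, b_post = alpha + x, beta_ + n - x
post_mean = a_post / (a_post + b_post)
print(round(post_mean, 5))  # 0.44603
```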
Three different beta priors
[Figure: three beta prior densities for θ: Beta(1, 1), Beta(1.485, 1.515), and Beta(5.85, 6.15).]
Bayesian Inference
Now that we know what the posterior is, we can use it to make
inference about θ.
1 Point estimation
2 Interval estimation
3 Hypothesis testing
Bayesian Inference: Point Estimation
Mean has the opposite property, tending to "chase" heavy tails (just
like the sample mean X̄).
Median is probably the best compromise overall, though it can be
awkward to compute, since it is the solution θ_median to

    ∫_{−∞}^{θ_median} p(θ|x) dθ = 1/2.
Posterior estimates
Prior distribution       Posterior mode   Posterior mean   Posterior median
Beta(1, 1)               0.44592          0.44603          0.44599
Beta(1.485, 1.515)       0.44596          0.44607          0.44603
Beta(5.85, 6.15)         0.44631          0.44642          0.44639
The classical point estimate is θ̂_MLE = 437/980 = 0.44592.
Remarks:
1 A Bayes point estimate is a weighted average of a common
frequentist estimate and a parameter estimate obtained only from
the prior distribution.
2 The Bayes point estimate “shrinks” the frequentist estimate toward
the prior estimate.
3 The weight on the frequentist estimate tends to 1 as n goes to
infinity.
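Remark 1 can be verified directly for the Beta(1, 1) prior: the posterior mean is a weighted average of the MLE x/n and the prior mean α/(α + β), with the weight on the MLE going to 1 as n grows.

```python
# Weighted-average form of the Beta posterior mean (remark 1).
x, n, a, b = 437, 980, 1.0, 1.0
w = n / (n + a + b)                 # weight on the MLE; → 1 as n → ∞
post_mean = w * (x / n) + (1 - w) * (a / (a + b))
print(round(post_mean, 5))  # 0.44603, matching the Beta(1, 1) row above
```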
Bayesian Inference: Interval Estimation
HPD Credible Interval
Definition: The 100(1 − α)% highest posterior density (HPD) credible
interval for θ is a subset C of Θ such that

    C = {θ ∈ Θ : p(θ|y) ≥ k(α)},

where k(α) is the largest constant satisfying P(C|y) ≥ 1 − α.
[Figure: two posterior densities of θ with the HPD cutoff k(α); the HPD region is where the posterior density lies above the cutoff.]
An HPD credible interval has the smallest volume of all intervals of the
same α level.
Equal-tail Credible Interval
This interval is usually slightly wider than the HPD interval, but easier
to compute (just two posterior quantiles), and it is also invariant under
monotone transformations of θ.
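A sketch of the equal-tail construction, using a hypothetical normal posterior N(4, 0.5) so that only the standard library is needed:

```python
from statistics import NormalDist

# Equal-tail 100(1 − α)% credible interval: just the α/2 and 1 − α/2
# posterior quantiles. Posterior N(4, 0.5) is an illustrative assumption.
alpha = 0.05
post = NormalDist(mu=4.0, sigma=0.5 ** 0.5)
lo, hi = post.inv_cdf(alpha / 2), post.inv_cdf(1 - alpha / 2)
# (lo, hi) ≈ (2.61, 5.39); symmetric about the posterior mean 4
```

For a symmetric unimodal posterior like this one, the equal-tail and HPD intervals coincide; they differ when the posterior is skewed.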
Interval Estimation: Example
Example: probability of a female birth
f(x|θ) = Bin(980, θ), π(θ) = Beta(1, 1), x_obs = 437

[Figure: posterior density of θ, which is Beta(438, 544).]
Bayes Factor
Bayes Factor vs Likelihood Ratio Test
Interpretation of Bayes Factor
Possible interpretations
BF Strength of evidence
1 to 3 barely worth mentioning
3 to 20 positive
20 to 150 strong
> 150 very strong
Example: Probability of a female birth
Data: x = 437 out of n = 980 placenta previa births were female.
We test the hypothesis that H0 : θ ≥ 0.485 vs. H1 : θ < 0.485.
Choose the uniform prior π(θ) = Beta(1, 1), and the prior probability
of H1 is
P(θ < 0.485) = 0.485.
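A sketch of the computation, stdlib only: the posterior probability of H1 under the Beta(438, 544) posterior is obtained by numerical integration (the standard library has no Beta CDF), and the Bayes factor is the ratio of posterior to prior odds. The Simpson grid size is an arbitrary choice.

```python
import math

a, b = 438.0, 544.0                  # posterior Beta(438, 544): Beta(1,1) prior, x = 437, n = 980
log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

def pdf(t):
    """Beta(438, 544) density, computed on the log scale to avoid overflow."""
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

def beta_cdf(x, m=20000):
    """P(θ < x) by composite Simpson's rule on [0, x]; m must be even."""
    h = x / m
    s = pdf(1e-12) + pdf(x)          # endpoints (density vanishes near 0)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * pdf(i * h)
    return s * h / 3

p_h1 = beta_cdf(0.485)               # posterior P(H1) = P(θ < 0.485 | x)
prior_odds = 0.485 / 0.515           # prior P(H1)/P(H0) under the uniform prior
bf = (p_h1 / (1 - p_h1)) / prior_odds   # Bayes factor in favor of H1
```

The posterior probability of H1 comes out close to 1, so the Bayes factor strongly favors θ < 0.485 on the interpretation scale above.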
Limitations:
NOT well-defined when the prior π(θ|H) is improper
may be sensitive to the choice of prior
Bayesian Prediction
We are often interested in predicting a future observation, yn+1 ,
given the observed data y = (y1 , . . . , yn ). A necessary assumption is
exchangeability.
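A sketch of posterior-predictive simulation for the next observation in the placenta previa example: draw θ from the posterior, then draw y_{n+1} given θ. The Monte Carlo size and seed are arbitrary choices.

```python
import random

# Posterior-predictive simulation: θ | y ~ Beta(438, 544)
# (uniform prior, x = 437 of n = 980), then y_{n+1} | θ ~ Bernoulli(θ).
random.seed(0)
draws = [random.betavariate(438, 544) for _ in range(10000)]
p_next_female = sum(random.random() < t for t in draws) / len(draws)
# p_next_female approximates P(y_{n+1} = 1 | y) = E[θ | y] ≈ 0.446
```

Averaging over posterior draws of θ, rather than plugging in a point estimate, is what propagates parameter uncertainty into the prediction.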
Prior Elicitation
Noninformative Prior
Jeffreys Prior
p(θ) ∝ [I(θ)]^{1/2}, where I(θ) is the Fisher information.
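As a worked instance (not derived on the slide): for a binomial likelihood, the Fisher information gives

```latex
% Binomial likelihood: x \mid \theta \sim \mathrm{Bin}(n, \theta)
I(\theta) = \frac{n}{\theta(1-\theta)}
\quad\Longrightarrow\quad
p(\theta) \propto [I(\theta)]^{1/2} \propto \theta^{-1/2}(1-\theta)^{-1/2},
```

i.e., the Jeffreys prior for the binomial model is the Beta(1/2, 1/2) distribution.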
Conjugate Priors
Another Example of Conjugate Prior
Suppose that X is distributed as Poisson(θ), so that

    f(x|θ) = e^{−θ} θ^x / x!,   x ∈ {0, 1, 2, . . .}, θ > 0.

A reasonably flexible prior for θ is the Gamma(α, β) distribution,

    p(θ) = θ^{α−1} e^{−θ/β} / (Γ(α) β^α),   θ > 0, α > 0, β > 0,

so that

    p(θ|x) ∝ f(x|θ) p(θ) ∝ θ^{x+α−1} e^{−θ(1+1/β)}.

There is one and only one density proportional to this last expression:
the Gamma(x + α, (1 + 1/β)^{−1}) density. Hence the Gamma family is
conjugate for the Poisson likelihood.
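The update can be sketched in code (shape-scale parametrization, matching the density above; the inputs are illustrative):

```python
# Poisson-Gamma conjugate update for a single observation x,
# with prior θ ~ Gamma(α, β) in shape-scale form.
def poisson_gamma_update(x, alpha, beta):
    """Posterior is Gamma(x + α, (1 + 1/β)⁻¹), also in shape-scale form."""
    return x + alpha, 1.0 / (1.0 + 1.0 / beta)

shape, scale = poisson_gamma_update(x=3, alpha=2.0, beta=1.0)
print(shape, scale)  # 5.0 0.5
```

The posterior mean, shape · scale = 2.5, sits between the prior mean αβ = 2 and the observation x = 3, the usual conjugate compromise.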
Common Conjugate Families
Likelihood         Conjugate prior
Binomial(N, θ)     θ ~ Beta(α, λ)
Poisson(θ)         θ ~ Gamma(δ0, γ0)
Exp(λ)             λ ~ Gamma(δ0, γ0)