
Bayes Estimation

February 26-28, 2008

Bayesian estimation begins with an assumed probability distribution on the parameter space Θ. With
this approach θ itself is a random variable and the observations X1 , . . . , Xn are conditionally independent
given the value of θ. Consequently, in Bayesian statistics, (X, θ) is a random variable on the state space
S^n × Θ.
The distribution of θ on Θ is called the prior distribution. We shall denote its density by π. Together,
the prior distribution and the parametric family {Pθ ; θ ∈ Θ} determine the joint distribution of (X, θ).

1 Bayes Formula
For events A and B, recall that the conditional probability is

P (A|B)P (B) = P (A ∩ B) = P (B|A)P (A),


or
    P(A|B) = P(B|A) P(A) / P(B).
Now, if we set
A = {θ = θ0 } and B = {X = x},
then
    P{θ = θ0 | X = x} = P{X = x | θ = θ0} P{θ = θ0} / P{X = x}.
If the appropriate densities exist, then we can write Bayes formula as
    fΘ|X(θ0|x) = [ fX|Θ(x|θ0) / ∫ fX|Θ(x|θ̃) π(θ̃) dθ̃ ] · π(θ0),

to compute the posterior density fΘ|X (θ0 |x) as the product of the Bayes factor and the prior density.
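The formula above can be checked numerically. Here is a minimal Python sketch of the discrete form of Bayes formula; the two parameter values and all probabilities are invented for illustration:

```python
# Discrete illustration of Bayes formula (hypothetical numbers):
# theta takes two values, each with prior probability 1/2.
prior = {"theta0": 0.5, "theta1": 0.5}          # pi(theta)
likelihood = {"theta0": 0.2, "theta1": 0.8}     # P(X = x | theta)

# Marginal P(X = x): sum of likelihood * prior over the parameter space
marginal = sum(likelihood[t] * prior[t] for t in prior)

# Posterior P(theta | X = x) via Bayes formula
posterior = {t: likelihood[t] * prior[t] / marginal for t in prior}
print(posterior)  # {'theta0': 0.2, 'theta1': 0.8}
```

The denominator plays the same normalizing role as the integral in the density form.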
If T is a sufficient statistic and fX|Θ(x|θ) = h(x) g(θ, T(x)), then the Bayes factor

    fX|Θ(x|θ0) / ∫ fX|Θ(x|θ̃) π(θ̃) dθ̃ = h(x) g(θ0, T(x)) / ∫ h(x) g(θ̃, T(x)) π(θ̃) dθ̃
                                        = g(θ0, T(x)) / ∫ g(θ̃, T(x)) π(θ̃) dθ̃

is a function of T.
Example 1 (Normal observations and normal prior). Suppose that

• θ is N(θ0, 1/λ), and

• given θ, X consists of n conditionally independent N(θ, 1) random variables.

Then the prior density is fΘ(θ) = √(λ/(2π)) exp(−(λ/2)(θ − θ0)²), and
    fX|Θ(x|θ) = (2π)^(−n/2) exp(−(1/2) Σ_{i=1}^n (x_i − θ)²)

              = (2π)^(−n/2) exp(−(n/2)(θ − x̄)² − (1/2) Σ_{i=1}^n (x_i − x̄)²).

The posterior density is proportional to


    k(x) exp(−(1/2)(n(θ − x̄)² + λ(θ − θ0)²)) = k̃(x) exp(−((n + λ)/2)(θ − θ̃(x))²),

where
    θ̃(x) = (λθ0 + nx̄) / (λ + n).
Thus, the posterior distribution is N(θ̃(x), 1/(λ + n)). Note that it is a function of the
sufficient statistic T(x) = x1 + · · · + xn. If n is small, then θ̃(x) is near θ0; if n is
large, θ̃(x) is near x̄.
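This normal-normal update is easy to compute directly. The following Python sketch (rather than the R used later in these notes) uses an invented prior and data set purely for illustration:

```python
import statistics

# Normal-normal update from Example 1 (all values are illustrative).
theta0, lam = 0.0, 1.0           # prior is N(theta0, 1/lambda)
x = [1.2, 0.8, 1.1, 0.9, 1.0]    # observations, modeled as N(theta, 1)
n = len(x)
xbar = statistics.fmean(x)

# Posterior is N(theta_tilde, 1/(lambda + n)) with
theta_tilde = (lam * theta0 + n * xbar) / (lam + n)
post_var = 1.0 / (lam + n)
print(theta_tilde, post_var)
```

With only n = 5 observations the estimate is pulled noticeably from x̄ = 1.0 toward the prior mean θ0 = 0; as n grows, the weight on x̄ dominates.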

2 Bayes Action
Recall that, given a loss function L and a decision function d, the risk function R : Θ × D → R is the expected loss for that decision,

    R(θ, d) = Eθ L(θ, d(X)),
and the mean risk, or Bayes risk, is

    r(π, d) = ∫_Θ R(θ, d) π(θ) dθ = ∫_Θ ∫_{R^n} L(θ, d(x)) fX|Θ(x|θ) π(θ) dx dθ.

The decision function that minimizes the Bayes risk is called the Bayes action.
If the loss function is L1(θ, a) = |θ − a|, then the posterior median minimizes the risk, and thus the Bayes action θ̂1(x) satisfies

    1/2 = ∫_{−∞}^{θ̂1(x)} fΘ|X(θ|x) dθ.
If the loss function is L2(θ, a) = (θ − a)², then the posterior mean minimizes the risk, and thus the Bayes action is

    θ̂2(x) = E[θ|X = x] = ∫ θ fΘ|X(θ|x) dθ.

For the example of a normal prior and normal observations, θ̂1 (x) = θ̂2 (x) = θ̃(x).
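These two minimization facts can be illustrated by Monte Carlo in Python; the Beta(3, 7) "posterior" below is an arbitrary stand-in, and the comparison point 0.5 is chosen off-center:

```python
import random
import statistics

random.seed(0)
# Draws standing in for a posterior distribution (Beta(3, 7), illustrative).
draws = [random.betavariate(3, 7) for _ in range(20000)]
post_mean = statistics.fmean(draws)
post_median = statistics.median(draws)

def risk(a, loss):
    """Average loss over the posterior draws for action a."""
    return statistics.fmean(loss(t, a) for t in draws)

sq = lambda t, a: (t - a) ** 2   # L2 loss
ab = lambda t, a: abs(t - a)     # L1 loss

# The sample mean minimizes average squared loss exactly, and the sample
# median minimizes average absolute loss, so both comparisons hold.
print(risk(post_mean, sq) <= risk(0.5, sq))   # True
print(risk(post_median, ab) <= risk(0.5, ab)) # True
```

Since the empirical mean and median minimize the empirical quadratic and absolute losses exactly, the inequalities hold for any sample, not just in the limit.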

Figure 1: Beta posterior distribution with t = 0, 1, · · · , 10 successes in 10 Bernoulli trials based on a uniform
prior

Example 2. Let the prior distribution π on θ be a beta distribution with parameters α and β and consider
Bernoulli observations X1 , . . . , Xn with parameter θ. T (X) = X1 + · · · + Xn is a sufficient statistic. The
posterior density satisfies

    fΘ|X(θ|x) ∝ L(θ|x) π(θ) = θ^{T(x)} (1 − θ)^{n−T(x)} · (Γ(α + β)/(Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1},   0 ≤ θ ≤ 1.
Thus,

    fΘ|X(θ|x) ∝ θ^{T(x)+α−1} (1 − θ)^{n−T(x)+β−1},   0 ≤ θ ≤ 1,
and the posterior distribution is Beta(T (x) + α, n − T (x) + β). If we want to estimate θ using a quadratic
risk function, then

    θ̂(x) = E[θ|X = x] = (T(x) + α) / (n + α + β).
The uniform distribution on [0, 1] is the Beta(1, 1) distribution. In this case,

    θ̂(x) = (T(x) + 1) / (n + 2).
The posterior densities in Figure 1 are graphed in R using
> curve(dbeta(x,1,11),0,1)
> for (i in 2:11){curve(dbeta(x,i,12-i),0,1,add=TRUE)}
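The Bayes action from Example 2 can also be computed directly. Here is a small Python sketch (the function name and the data vector are hypothetical, chosen only to illustrate the formula):

```python
# Posterior mean for Bernoulli data with a Beta(alpha, beta) prior,
# following Example 2; the data are illustrative.
def bayes_estimate(x, alpha=1.0, beta=1.0):
    t = sum(x)            # sufficient statistic T(x)
    n = len(x)
    return (t + alpha) / (n + alpha + beta)

x = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]   # 7 successes in 10 trials
print(bayes_estimate(x))             # uniform prior: (7 + 1)/(10 + 2)
```

Note how the prior parameters act as α pseudo-successes and β pseudo-failures added to the observed counts.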
