STA 114: Statistics Notes 12. The Jeffreys Prior
ξθ (θ) for θ. Now suppose we look at a re-parametrization η = h(θ), given by a smooth
monotone transformation h. The reparametrized model is X ∼ g(x|η), η ∈ E, where g(x|η) =
f (x|h−1 (η)) and E = h(Θ) = {h(θ) : θ ∈ Θ}. Suppose the principle, when applied to the
re-parametrized model, produces a prior pdf ξη (η) on η.
But one could also derive a prior pdf ξ̃_η(η) by starting from the prior pdf ξ_θ(θ) on θ and
using the transformation η = h(θ). This pdf is given by ξ̃_η(η) = ξ_θ(h^{-1}(η)) / |h'(h^{-1}(η))|.
Jeffreys' demand of invariance is the same as saying that the two pdfs ξ_η(η) [found by applying
the principle directly to η] and ξ̃_η(η) [found by applying the principle to θ and then deriving
the corresponding pdf on η] should be the same. A little algebra shows that
ξ_η(η) = ξ̃_η(η), for all η ∈ E
⇐⇒ ξ_η(h(θ)) = ξ̃_η(h(θ)), for all θ ∈ Θ
⇐⇒ ξ_η(h(θ)) = ξ_θ(θ) / |h'(θ)|, for all θ ∈ Θ.
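The change-of-variables formula above can be sanity-checked numerically. The sketch below uses an illustrative Beta(2, 3) prior on θ and the logit map η = h(θ) (both hypothetical choices, not examples from these notes) and confirms that the transformed density still integrates to one:

```python
import math

# Sanity check of the change-of-variables formula
#   xi~_eta(eta) = xi_theta(h^{-1}(eta)) / |h'(h^{-1}(eta))|
# with a hypothetical Beta(2, 3) prior on theta and the logit map.
def xi_theta(t):
    return 12.0 * t * (1.0 - t) ** 2      # Beta(2, 3) pdf on (0, 1)

def h_inv(e):
    return 1.0 / (1.0 + math.exp(-e))     # h^{-1}(eta), the logistic map

def h_prime(t):
    return 1.0 / (t * (1.0 - t))          # h'(theta) for h(theta) = logit(theta)

def xi_eta(e):
    t = h_inv(e)
    return xi_theta(t) / abs(h_prime(t))  # transformed pdf on (-inf, inf)

# A pdf must integrate to one; Riemann sum over a wide grid.
m, lo, hi = 20000, -20.0, 20.0
dx = (hi - lo) / m
total = sum(xi_eta(lo + i * dx) for i in range(m + 1)) * dx
print(round(total, 3))                    # close to 1.0
```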
If the data X has n components (X_1, ..., X_n) and the model is X_i iid ∼ g(x_i|θ), then
f(x|θ) = ∏_{i=1}^n g(x_i|θ) and so

I^F(θ) = − Σ_{i=1}^n E[ (∂²/∂θ²) log g(X_i|θ) | θ ] = n I_1^F(θ)
where I1F (θ) is the single observation Fisher information of Xi ∼ g(xi |θ) at θ.
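The additivity I^F(θ) = n I_1^F(θ) can be verified numerically, for instance for an i.i.d. Bernoulli(θ) sample (an illustrative model, not one used above), where I_1^F(θ) = 1/(θ(1 − θ)):

```python
import math
from itertools import product

# Check I^F(theta) = n * I1^F(theta) for an i.i.d. Bernoulli(theta) sample
# (hypothetical example); here I1^F(theta) = 1 / (theta * (1 - theta)).
theta, n = 0.3, 4

def log_f(x, t):
    # joint log-pdf of the sample x = (x_1, ..., x_n) at parameter t
    s = sum(x)
    return s * math.log(t) + (len(x) - s) * math.log(1.0 - t)

def d2_log_f(x, t, eps=1e-5):
    # central second difference approximation of d^2/dt^2 log f(x|t)
    return (log_f(x, t + eps) - 2.0 * log_f(x, t) + log_f(x, t - eps)) / eps ** 2

# I^F(theta) = -E[ d^2/dtheta^2 log f(X|theta) ], summing over all 2^n outcomes
info = 0.0
for x in product([0, 1], repeat=n):
    s = sum(x)
    prob = theta ** s * (1.0 - theta) ** (n - s)
    info -= prob * d2_log_f(x, theta)

I1 = 1.0 / (theta * (1.0 - theta))
print(round(info, 3), round(n * I1, 3))   # the two numbers agree
```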
The Jeffreys proposal of a non-informative prior pdf for the model X ∼ f (x|θ) is
ξ^J(θ) = const. × √(I^F(θ)).
If ∫_Θ √(I^F(θ)) dθ is a finite number, then the constant is taken to be one over this number, so that
ξ J (θ) defines a pdf over Θ. If this integral is infinite, the constant is left unspecified, and the
corresponding function ξ J (θ) is called an “improper” prior pdf of θ ∈ Θ. An improper prior
pdf is accepted so long as it produces a proper posterior pdf for every possible observation
X = x. That is
ξ^J(θ|x) = f(x|θ) ξ^J(θ) / ∫_Θ f(x|θ') ξ^J(θ') dθ'
must be a pdf on Θ, which means the integral ∫_Θ f(x|θ) ξ^J(θ) dθ must be finite. Even though an
improper pdf is not really a pdf, it still expresses relative plausibility scores through the well-
defined ratios ξ^J(θ_1)/ξ^J(θ_2) (the arbitrary constant cancels from numerator and denominator,
so its exact value does not matter).
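As an illustration of the normalization step, consider the Bernoulli model (a hypothetical example here, not one treated above), whose Jeffreys prior turns out to be the Beta(1/2, 1/2) pdf with normalizing constant 1/π:

```python
import math

# Jeffreys prior for Bernoulli(theta) (hypothetical example):
# I^F(theta) = 1/(theta(1-theta)), so the unnormalized prior is
# theta^(-1/2) * (1 - theta)^(-1/2), whose integral over (0, 1) is pi.
def jeffreys_unnorm(t):
    return 1.0 / math.sqrt(t * (1.0 - t))

# Midpoint rule; the integrand is unbounded at 0 and 1 but integrable.
m = 1_000_000
total = sum(jeffreys_unnorm((i + 0.5) / m) for i in range(m)) / m
print(round(total, 2))   # close to pi, so the normalized prior is Beta(1/2, 1/2)
```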
Below we show that the principle behind the construction of the Jeffreys prior is invariant to
smooth, monotone transformations of the parameter. Here we briefly comment on why it is
“non-informative”. It turns out that the Jeffreys prior is indeed the uniform prior over the
parameter space Θ, but not under the Euclidean geometry (pdfs depend on the geometry,
as they give limits of probability of a set over the volume of the set, and volume calculation
depends on geometry). The geometry that one needs to consider stems from defining a
distance between θ1 , θ2 ∈ Θ in terms of the distance between the two pdfs f (x|θ1 ) and f (x|θ2 ).
An advantage of this definition of distance is that it remains invariant to reparametrization
under monotone transformation.
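The invariance claimed above can be checked numerically on a concrete case. The sketch below uses the Bernoulli model and the logit map (both hypothetical choices, for illustration only) and verifies that the Jeffreys prior computed directly in η matches the change-of-variables push-forward of the Jeffreys prior on θ:

```python
import math

# Invariance check for the Jeffreys prior under eta = h(theta) = logit(theta),
# using the Bernoulli model (hypothetical choices, for illustration only).
def I1_theta(t):
    return 1.0 / (t * (1.0 - t))          # Fisher information in theta

def h(t):
    return math.log(t / (1.0 - t))        # logit

def h_prime(t):
    return 1.0 / (t * (1.0 - t))

def I1_eta(e):
    # Fisher information transforms as I(eta) = I(theta) * (dtheta/deta)^2
    t = 1.0 / (1.0 + math.exp(-e))
    return I1_theta(t) * (t * (1.0 - t)) ** 2

for t in [0.1, 0.37, 0.5, 0.82]:
    direct = math.sqrt(I1_eta(h(t)))                    # Jeffreys computed in eta
    pushed = math.sqrt(I1_theta(t)) / abs(h_prime(t))   # change of variables from theta
    assert abs(direct - pushed) < 1e-9
print("invariance holds on the test grid")
```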
This is a "flat" prior over the parameter space (−∞, ∞). Unfortunately, this does not lead
to a pdf for any value of the constant, as ∫_{−∞}^{∞} dµ = ∞. So this is an improper prior.
The posterior associated with the Jeffreys prior is
ξ^J(µ|x) = exp{−(x̄ − µ)² / (2σ²/n)} / ∫_{−∞}^{∞} exp{−(x̄ − µ')² / (2σ²/n)} dµ' = Normal(x̄, σ²/n)
which is a proper pdf. Thus the Jeffreys prior is an “acceptable one” in this case.
It is an interesting fact that summaries of ξ J (µ|x) numerically match summaries from
classical inference. For example, the posterior mean and median are both x̄, which happens
to be µ̂_MLE(x). Also, a 100(1 − α)% central posterior credible interval is x̄ ∓ σ z(α)/√n, which
matches the 100(1 − α)% confidence interval for µ.
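These summaries can be checked numerically. The sketch below (with hypothetical values x̄ = 2.1, σ = 1.5, n = 25, chosen only for illustration) normalizes the posterior by quadrature and reads off its mean and variance:

```python
import math

# Posterior from the flat prior on mu, normalized by quadrature.
# Hypothetical summaries: xbar = 2.1, sigma = 1.5, n = 25.
xbar, sigma, n = 2.1, 1.5, 25

def unnorm_post(mu):
    # likelihood in mu; the constant Jeffreys prior is absorbed into the constant
    return math.exp(-(xbar - mu) ** 2 / (2.0 * sigma ** 2 / n))

lo, hi, m = xbar - 5.0, xbar + 5.0, 100000
dx = (hi - lo) / m
grid = [lo + i * dx for i in range(m + 1)]
z = sum(unnorm_post(u) for u in grid) * dx
mean = sum(u * unnorm_post(u) for u in grid) * dx / z
var = sum((u - mean) ** 2 * unnorm_post(u) for u in grid) * dx / z
print(round(mean, 3), round(var, 4))   # mean matches xbar, var matches sigma^2/n
```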
So

I^F(µ, σ²) = ( E[n/σ² | µ, σ²]              E[n(X̄ − µ)/σ⁴ | µ, σ²]
               E[n(X̄ − µ)/σ⁴ | µ, σ²]      E[−n/(2σ⁴) + ((n − 1)s_X² + n(X̄ − µ)²)/σ⁶ | µ, σ²] )

           = ( n/σ²       0
               0          n/(2σ⁴) ),
which follow from the facts that (i) X̄ ∼ Normal(µ, σ²/n), which has mean µ and variance σ²/n, and
(ii) (1/σ²) Σ_{i=1}^n (X_i − X̄)² ∼ χ²_{n−1}, which has mean n − 1.
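These two facts can be spot-checked by a small Monte Carlo sketch (the values µ = 1, σ² = 4, n = 10 are hypothetical, chosen only for illustration):

```python
import math, random

# Monte Carlo spot-check of facts (i) and (ii); mu = 1, sigma2 = 4, n = 10
# are hypothetical values chosen only for illustration.
random.seed(0)
mu, sigma2, n, reps = 1.0, 4.0, 10, 50000
sum_xbar, sum_chi = 0.0, 0.0
for _ in range(reps):
    x = [random.gauss(mu, math.sqrt(sigma2)) for _ in range(n)]
    xbar = sum(x) / n
    sum_xbar += xbar                                        # fact (i): E[Xbar] = mu
    sum_chi += sum((xi - xbar) ** 2 for xi in x) / sigma2   # fact (ii): mean is n - 1
print(round(sum_xbar / reps, 2), round(sum_chi / reps, 1))
```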