Chapter 2 B

The document discusses Bayesian inference for the mean and variance of a normal population using a conjugate prior distribution. It introduces Bayes' theorem for multiple parameters and the normal-gamma distribution as a conjugate prior. It then shows that the posterior distribution follows a normal-gamma distribution based on the prior and likelihood.


Chapter 2

Inference for a normal population

This chapter shows how to make inferences for the mean and variance of a normal
population using a conjugate prior distribution. First we need the multi-parameter version
of Bayes Theorem.

2.1 Bayes Theorem for many parameters

Suppose that now the probability (density) function we used to describe the data depends
on many parameters, that is, f (x|θ) where θ = (θ1 , θ2 , . . . , θp )T . After observing the
data, the likelihood function for θ is f (x|θ). Prior beliefs about θ are represented through
a probability (density) function π(θ). Therefore, using Bayes Theorem, the posterior
probability (density) function for θ is

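\[
\pi(\theta|x) = \frac{\pi(\theta)\, f(x|\theta)}{f(x)}
\]
where
\[
f(x) =
\begin{cases}
\displaystyle\int_{\Theta} \pi(\theta)\, f(x|\theta)\, d\theta & \text{if } \theta \text{ is continuous,} \\[2ex]
\displaystyle\sum_{\Theta} \pi(\theta)\, f(x|\theta) & \text{if } \theta \text{ is discrete.}
\end{cases}
\]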

As in Chapter 1, this can be rewritten as

π(θ|x) ∝ π(θ) × f (x|θ)


i.e. posterior ∝ prior × likelihood.
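
As a rough numerical illustration of posterior ∝ prior × likelihood with two parameters, the following sketch evaluates prior × likelihood on a grid and normalises it with a Riemann sum in place of f(x). Everything specific here is an assumption made only for illustration: the data vector, the grid limits, and the independent priors µ ∼ N(0, 10²) and τ ∼ Ga(2, 1).

```r
# Grid approximation to a two-parameter posterior (illustrative sketch).
# Assumptions: made-up data x; theta = (mu, tau) with x_i | mu, tau ~ N(mu, 1/tau);
# independent priors mu ~ N(0, 10^2) and tau ~ Ga(2, 1), chosen only for illustration.
x <- c(1.2, 0.7, 1.9, 1.4, 0.9)

mu   <- seq(-2, 4, length.out = 201)
tau  <- seq(0.01, 8, length.out = 200)
grid <- expand.grid(mu = mu, tau = tau)

# log prior + log likelihood at every grid point
log_prior <- dnorm(grid$mu, 0, 10, log = TRUE) + dgamma(grid$tau, 2, 1, log = TRUE)
log_lik <- sapply(seq_len(nrow(grid)), function(i)
  sum(dnorm(x, grid$mu[i], 1 / sqrt(grid$tau[i]), log = TRUE)))
log_post <- log_prior + log_lik

# normalise: f(x) is approximated by a Riemann sum over the grid
cell <- (mu[2] - mu[1]) * (tau[2] - tau[1])
post <- exp(log_post - max(log_post))
post <- post / (sum(post) * cell)

total_mass   <- sum(post) * cell            # equals 1 by construction
post_mean_mu <- sum(grid$mu * post) * cell  # approximate posterior mean of mu
```

The grid approach only scales to a handful of parameters, which is one reason the conjugate analysis in the rest of this chapter is convenient.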

Next we introduce a new distribution which will be useful later on.


Example 2.1

If X has a generalised $t_a(b, c)$ distribution (see page 101) then show that $Y = (X - b)/\sqrt{c} \sim t_a \equiv t_a(0, 1)$.
Recall the general result: if X is a random variable with probability density function $f_X(x)$
and g is a bijective (1–1) function then the random variable Y = g(X) has probability
density function
\[
f_Y(y) = f_X\{g^{-1}(y)\} \left| \frac{d}{dy}\, g^{-1}(y) \right|. \tag{2.1}
\]

Solution
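Here we take $Y = g(X) = (X - b)/\sqrt{c}$ from which we obtain $X = g^{-1}(Y) = b + \sqrt{c}\,Y$.
Therefore using (2.1) we have
\[
\begin{aligned}
f_Y(y) &= f_X\{g^{-1}(y)\} \left| \frac{d}{dy}\, g^{-1}(y) \right| \\
&= f_X(b + \sqrt{c}\, y) \times \sqrt{c} \\
&= \frac{\Gamma\left(\frac{a+1}{2}\right)}{\sqrt{ac\pi}\;\Gamma\left(\frac{a}{2}\right)} \left(1 + \frac{y^2}{a}\right)^{-\frac{a+1}{2}} \times \sqrt{c}, \qquad y \in \mathbb{R} \\
&= \frac{\Gamma\left(\frac{a+1}{2}\right)}{\sqrt{a\pi}\;\Gamma\left(\frac{a}{2}\right)} \left(1 + \frac{y^2}{a}\right)^{-\frac{a+1}{2}}, \qquad y \in \mathbb{R}.
\end{aligned}
\]

This is the $t_a$ density and so $Y = (X - b)/\sqrt{c} \sim t_a$.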

Comment

Values for the density function fY (y ) and the distribution function FY (y ) can be obtained
by using the R functions dgt and pgt in the package nclbayes.
It is clear that ta (0, 1) ≡ ta by examining their densities. Therefore, it makes sense
to think of the ta distribution as the standard ta –distribution and make all calculations
for the generalised ta (b, c) distribution from this standard distribution. The relationship
between this standard and generalised version of the t-distribution is directly analogous
to that between the standard normal N(0, 1) distribution and its more general version:
the N(b, c) distribution. In both cases the relationship is one of location and scale:

\[
Y \sim N(b, c) \implies \frac{Y - b}{\sqrt{c}} \sim N(0, 1)
\]
\[
Y \sim t_a(b, c) \implies \frac{Y - b}{\sqrt{c}} \sim t_a.
\]
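
The location and scale relationship means that calculations for the generalised t distribution can be reduced to base R's functions for the standard t distribution. The sketch below shows one way to do this. The names dgt_sk, pgt_sk and qgt_sk are hypothetical and are not the nclbayes implementations; the values a = 5, b = 5.41 and c = 0.16 are chosen only as an example.

```r
# Sketch: generalised t_a(b, c) via base R's standard t functions.
# dgt_sk, pgt_sk and qgt_sk are hypothetical helper names (not from nclbayes);
# they only encode the relation (X - b)/sqrt(c) ~ t_a.
dgt_sk <- function(x, a, b, c) dt((x - b) / sqrt(c), df = a) / sqrt(c)
pgt_sk <- function(q, a, b, c) pt((q - b) / sqrt(c), df = a)
qgt_sk <- function(p, a, b, c) b + sqrt(c) * qt(p, df = a)

# Example values (an assumption for illustration): a = 5, b = 5.41, c = 0.16
dens_at_b <- dgt_sk(5.41, 5, 5.41, 0.16)   # density at the location b
half      <- pgt_sk(5.41, 5, 5.41, 0.16)   # 0.5, by symmetry about b
q975      <- qgt_sk(0.975, 5, 5.41, 0.16)  # upper 2.5% point
```

Note the extra factor 1/√c in the density, which is the Jacobian term from (2.1).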

2.2 Prior to posterior analysis

Suppose we have a random sample from a normal distribution in which both the mean µ
and the precision τ are unknown, that is, $X_i|\mu, \tau \sim N(\mu, 1/\tau)$, i = 1, 2, . . . , n (independent). We shall adopt a (joint) prior distribution for µ and τ for which
\[
\mu|\tau \sim N\left(b, \frac{1}{c\tau}\right) \quad \text{and} \quad \tau \sim Ga(g, h)
\]

for known values b, c, g and h. This distribution has density function

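\[
\begin{aligned}
\pi(\mu, \tau) &= \pi(\mu|\tau)\,\pi(\tau) \\
&= \left(\frac{c\tau}{2\pi}\right)^{1/2} \exp\left\{-\frac{c\tau}{2}(\mu - b)^2\right\} \times \frac{h^g \tau^{g-1} e^{-h\tau}}{\Gamma(g)}, \qquad \mu \in \mathbb{R},\ \tau > 0 \\
&\propto \tau^{g - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[c(\mu - b)^2 + 2h\right]\right\}, \qquad \mu \in \mathbb{R},\ \tau > 0.
\end{aligned}
\tag{2.2}
\]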
We will use the notation NGa(b, c, g, h) for this distribution. Thus we take the prior
distribution
\[
\begin{pmatrix} \mu \\ \tau \end{pmatrix} \sim NGa(b, c, g, h).
\]
Determine the posterior distribution for $(\mu, \tau)^T$.

Hint:
\[
c(\mu - b)^2 + n(\bar{x} - \mu)^2 = (c + n)\left\{\mu - \frac{cb + n\bar{x}}{c + n}\right\}^2 + \frac{nc(\bar{x} - b)^2}{c + n}.
\]
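
The hint can be checked by expanding the left-hand side as a quadratic in µ and completing the square. The steps below are one way of setting this out; they use only the symbols already defined.

```latex
\begin{align*}
c(\mu - b)^2 + n(\bar{x} - \mu)^2
  &= (c + n)\mu^2 - 2\mu(cb + n\bar{x}) + cb^2 + n\bar{x}^2 \\
  &= (c + n)\left\{\mu - \frac{cb + n\bar{x}}{c + n}\right\}^2
     + cb^2 + n\bar{x}^2 - \frac{(cb + n\bar{x})^2}{c + n} \\
  &= (c + n)\left\{\mu - \frac{cb + n\bar{x}}{c + n}\right\}^2
     + \frac{cn(b^2 - 2b\bar{x} + \bar{x}^2)}{c + n} \\
  &= (c + n)\left\{\mu - \frac{cb + n\bar{x}}{c + n}\right\}^2
     + \frac{nc(\bar{x} - b)^2}{c + n}.
\end{align*}
```

The third line follows because $(c + n)(cb^2 + n\bar{x}^2) - (cb + n\bar{x})^2 = cn(b^2 - 2b\bar{x} + \bar{x}^2)$.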

Solution

From (1.8), the likelihood function is
\[
f(x|\mu, \tau) = \left(\frac{\tau}{2\pi}\right)^{n/2} \exp\left[-\frac{n\tau}{2}\left\{s^2 + (\bar{x} - \mu)^2\right\}\right].
\]
Using Bayes Theorem, the posterior density is

π(µ, τ |x) ∝ π(µ, τ ) f (x|µ, τ )

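and so, for µ ∈ R, τ > 0
\[
\begin{aligned}
\pi(\mu, \tau|x) &\propto \tau^{g - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[c(\mu - b)^2 + 2h\right]\right\}
\times \tau^{\frac{n}{2}} \exp\left[-\frac{n\tau}{2}\left\{s^2 + (\bar{x} - \mu)^2\right\}\right] \\
&\propto \tau^{g + \frac{n}{2} - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[c(\mu - b)^2 + n(\bar{x} - \mu)^2 + 2h + ns^2\right]\right\} \\
&\propto \tau^{g + \frac{n}{2} - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[(c + n)\left(\mu - \frac{cb + n\bar{x}}{c + n}\right)^2 + \frac{nc(\bar{x} - b)^2}{c + n} + 2h + ns^2\right]\right\}
\end{aligned}
\]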

using the hint. Let
\[
B = \frac{bc + n\bar{x}}{c + n}, \qquad C = c + n, \qquad
G = g + \frac{n}{2}, \qquad H = h + \frac{cn(\bar{x} - b)^2}{2(c + n)} + \frac{ns^2}{2}.
\tag{2.3}
\]

Then the posterior density is
\[
\pi(\mu, \tau|x) \propto \tau^{G - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[C(\mu - B)^2 + 2H\right]\right\}, \qquad \mu \in \mathbb{R},\ \tau > 0.
\]

Notice that this posterior density is of the same form as the prior density (2.2). Therefore,
we can conclude that the posterior distribution is
\[
\begin{pmatrix} \mu \\ \tau \end{pmatrix} \Big|\, x \sim NGa(B, C, G, H).
\]

Thus, the NGa distribution is conjugate to this data model.
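
The conjugacy result can be checked numerically. In the sketch below, the log of prior × likelihood is computed directly from simulated data and compared with the log of the NGa(B, C, G, H) kernel; if the result holds, the two differ by the same constant at every (µ, τ). The data and prior values are made up for illustration, and we assume that s² is the divisor-n form, s² = Σ(xᵢ − x̄)²/n, which is the form consistent with the likelihood quoted from (1.8).

```r
# Numerical check of conjugacy (sketch; simulated data, illustrative prior values).
# Assumption: s^2 = sum((x - xbar)^2) / n, the divisor-n form that matches
# the likelihood quoted from (1.8).
set.seed(1)
x <- rnorm(12, mean = 3, sd = 2)
n <- length(x); xbar <- mean(x); s2 <- mean((x - xbar)^2)

b <- 2; c <- 0.5; g <- 3; h <- 4          # illustrative prior parameters

# posterior parameters from (2.3)
B <- (b * c + n * xbar) / (c + n)
C <- c + n
G <- g + n / 2
H <- h + c * n * (xbar - b)^2 / (2 * (c + n)) + n * s2 / 2

# log of the NGa kernel in (2.2)
log_nga_kernel <- function(mu, tau, b, c, g, h)
  (g - 0.5) * log(tau) - tau / 2 * (c * (mu - b)^2 + 2 * h)

# log prior kernel + log likelihood, evaluated directly from the data
log_prior_times_lik <- function(mu, tau)
  log_nga_kernel(mu, tau, b, c, g, h) +
    sum(dnorm(x, mu, 1 / sqrt(tau), log = TRUE))

# If the posterior is NGa(B, C, G, H), this difference is the same constant
# at every (mu, tau).
pts <- expand.grid(mu = c(1, 3, 5), tau = c(0.1, 0.5, 2))
diffs <- mapply(function(mu_i, tau_i)
  log_prior_times_lik(mu_i, tau_i) - log_nga_kernel(mu_i, tau_i, B, C, G, H),
  pts$mu, pts$tau)
range(diffs)
```

The constant is −(n/2) log(2π), the only factor of the likelihood that does not involve µ or τ.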

2.2.1 Marginal distributions

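Suppose $(\mu, \tau)^T \sim NGa(b, c, g, h)$. From the definition of the NGa distribution we know
that τ ∼ Ga(g, h). This also means that $\sigma = 1/\sqrt{\tau} \sim \text{Inv-Chi}(g, h)$; see page 101.
The (marginal) density for µ is, for µ ∈ R
\[
\begin{aligned}
\pi(\mu) &= \int_0^\infty \pi(\mu, \tau)\, d\tau \\
&\propto \int_0^\infty \tau^{g - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[c(\mu - b)^2 + 2h\right]\right\} d\tau.
\end{aligned}
\]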

Now, as the integral of a gamma density over its entire range is one, we have
\[
\int_0^\infty \frac{b^a \theta^{a-1} e^{-b\theta}}{\Gamma(a)}\, d\theta = 1
\implies \int_0^\infty \theta^{a-1} e^{-b\theta}\, d\theta = \frac{\Gamma(a)}{b^a}.
\]

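Therefore, for µ ∈ R
\[
\begin{aligned}
\pi(\mu) &\propto \int_0^\infty \tau^{g + \frac{1}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[c(\mu - b)^2 + 2h\right]\right\} d\tau \\
&\propto \frac{\Gamma\left(g + \frac{1}{2}\right)}{\left[\left\{c(\mu - b)^2 + 2h\right\}/2\right]^{g + \frac{1}{2}}} \\
&\propto h^{-g - 1/2} \left\{1 + \frac{c(\mu - b)^2}{2h}\right\}^{-g - 1/2} \\
&\propto \left\{1 + \frac{c(\mu - b)^2}{2h}\right\}^{-\frac{2g + 1}{2}}.
\end{aligned}
\]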

Comparing this density with that of the generalised t–distribution (on page 101) gives
\[
\mu \sim t_{2g}\left(b, \frac{h}{gc}\right). \tag{2.4}
\]

Thus, marginally, the prior distribution for µ is a t–distribution.
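
Result (2.4) can also be checked numerically by integrating τ out of the joint density. The sketch below does this with base R's integrate() and compares the answer with the generalised t density written in terms of dt(). The parameter values are illustrative, and dgt_sk is a hypothetical helper name rather than an nclbayes function.

```r
# Sketch: check the marginal prior for mu by integrating tau out numerically.
# Illustrative parameter values; dgt_sk is a hypothetical helper built on dt().
b <- 5.41; c <- 0.25; g <- 2.5; h <- 0.1

# joint NGa density pi(mu, tau) = pi(mu | tau) * pi(tau)
dnga <- function(mu, tau) dnorm(mu, b, 1 / sqrt(c * tau)) * dgamma(tau, g, h)

# marginal density of mu by numerical integration over tau
marg_mu <- function(mu)
  integrate(function(tau) dnga(mu, tau), lower = 0, upper = Inf)$value

# generalised t density t_a(b, c) via the standard t density
dgt_sk <- function(x, a, b, c) dt((x - b) / sqrt(c), df = a) / sqrt(c)

mu_vals   <- c(4.5, 5, 5.41, 6)
numerical <- sapply(mu_vals, marg_mu)
t_density <- dgt_sk(mu_vals, 2 * g, b, h / (g * c))
cbind(mu = mu_vals, numerical, t_density)
```

The two columns should agree to within the tolerance of the numerical integration.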


Similar calculations can be used to determine the (marginal) posterior distributions.

Summary of marginal distributions

The prior $(\mu, \tau)^T \sim NGa(b, c, g, h)$ has marginal distributions

• $\mu \sim t_{2g}\{b, h/(gc)\}$

• $\tau \sim Ga(g, h)$

Also $\sigma = 1/\sqrt{\tau} \sim \text{Inv-Chi}(g, h)$.

The posterior $(\mu, \tau)^T | x \sim NGa(B, C, G, H)$ has marginal distributions

• $\mu|x \sim t_{2G}\{B, H/(GC)\}$

• $\tau|x \sim Ga(G, H)$

Also $\sigma|x \sim \text{Inv-Chi}(G, H)$.


It can be shown that the posterior mean of µ is greater than its prior mean if and only if
the sample mean (likelihood mode) is greater than its prior mean, that is,

E(µ|x) > E(µ) ⇐⇒ x̄ > b.

The relationships between the prior and posterior variance of µ and mean and variance
of τ and of σ are rather more complex.
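
The equivalence for the means follows from a one-line calculation. We assume here that 2g > 1, so that the mean of the prior t distribution exists; then E(µ) = b and E(µ|x) = B, and

```latex
\[
E(\mu|x) - E(\mu) = B - b = \frac{bc + n\bar{x}}{c + n} - b = \frac{n(\bar{x} - b)}{c + n}.
\]
```

Since n > 0 and c + n > 0, this difference is positive exactly when x̄ > b.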

Example 2.2

Recall Example 1.4 on the earth’s density. Previously we assumed that the measurements
followed a $N(\mu, 0.2^2)$ distribution, that is, the standard deviation of the measurements
was known to be 0.2 g/cm³. Now we consider the case where this standard deviation is
unknown and determine posterior distributions using the theory in section 2.2.
Before we can proceed, we must specify the parameters in the NGa(b, c, g, h) prior distribution for (µ, τ). In the previous analysis, we assumed that the population measurement
precision was $\tau = 1/0.2^2 = 25$ and assumed a $N(5.41, 0.4^2)$ prior distribution for the
population mean, that is, $\mu|\tau = 25 \sim N(5.41, 0.4^2)$.
Choice of b and c: the conditional prior distribution for µ is µ|τ ∼ N{b, 1/(cτ )} and so
matching the prior distributions for µ (when τ = 25) gives b = 5.41 and c = 0.25.
Choice of g and h: the marginal prior distribution for τ is τ ∼ Ga(g, h). Previously, we
assumed τ = 25 (with Var(τ) = 0) and so take this value as the prior mean: E(τ) = 25.
Suppose we also decide that Var(τ) = 250. These two requirements give g = 2.5 and
h = 0.1. Therefore, we will assume the prior distribution
\[
\begin{pmatrix} \mu \\ \tau \end{pmatrix} \sim NGa(5.41, 0.25, 2.5, 0.1).
\]

We have seen that if $(\mu, \tau)^T \sim NGa(b, c, g, h)$ then the marginal distribution of µ is
$\mu \sim t_{2g}\{b, h/(gc)\}$. Therefore, with this choice of prior distribution, the marginal prior
distribution for µ is
\[
\mu \sim t_5(5.41, 0.16).
\]
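
The arithmetic behind these choices can be set out in a few lines of R. The sketch uses the standard results E(τ) = g/h and Var(τ) = g/h² for τ ∼ Ga(g, h) with rate h; the variable names are ours.

```r
# Sketch: recover the prior parameters of Example 2.2 by moment matching.
# Uses E(tau) = g/h and Var(tau) = g/h^2 for tau ~ Ga(g, h) (rate h).
tau0     <- 1 / 0.2^2      # previously assumed precision, 25
prior_sd <- 0.4            # previous prior standard deviation for mu
E_tau    <- 25
Var_tau  <- 250

b <- 5.41
c <- 1 / (prior_sd^2 * tau0)   # from 1/(c * tau0) = 0.4^2
h <- E_tau / Var_tau           # rate:  h = E/Var
g <- E_tau * h                 # shape: g = E * h

c(b = b, c = c, g = g, h = h)
t_scale <- h / (g * c)         # scale parameter of the t_{2g} marginal for mu
```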

Figure 2.1 shows the close match between the new (marginal) prior distribution for µ and
that used previously.
[Figure 2.1: Marginal prior density for µ: new version (solid) and previous version (dashed). Vertical axis: density.]

Determine the posterior distribution for $(\mu, \tau)^T$. Also determine the marginal prior distribution for τ and for σ, and the marginal posterior distribution for each of µ, τ and σ.

Solution

We can combine the information in the NGa(5.41, 0.25, 2.5, 0.1) prior distribution
for $(\mu, \tau)^T$ with that in the data (n = 23, x̄ = 5.4848, s = 0.1882) using the results in
section 2.2 to obtain a NGa(B, C, G, H) posterior distribution, where
\[
\begin{aligned}
B &= \frac{bc + n\bar{x}}{c + n} = \frac{(5.41 \times 0.25) + (23 \times 5.4848)}{23.25} = 5.4840, \\
C &= c + n = 23.25, \\
G &= g + \frac{n}{2} = 14, \\
H &= h + \frac{cn(\bar{x} - b)^2}{2(c + n)} + \frac{ns^2}{2}
   = 0.1 + \frac{5.75}{46.5}(5.4848 - 5.41)^2 + 11.5 \times 0.1882^2 = 0.5080.
\end{aligned}
\]

The marginal prior distributions for τ and σ are

τ ∼ Ga(g, h) ≡ Ga(2.5, 0.1)


σ ∼ Inv-Chi(g, h) ≡ Inv-Chi(2.5, 0.1)

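Also the marginal posterior distributions for µ, τ and σ are
\[
\begin{aligned}
\mu|x &\sim t_{2G}\left(B, \frac{H}{GC}\right) \equiv t_{28}(5.4840, 0.001561) \\
\tau|x &\sim Ga(G, H) \equiv Ga(14, 0.5080) \\
\sigma|x &\sim \text{Inv-Chi}(G, H) \equiv \text{Inv-Chi}(14, 0.5080)
\end{aligned}
\]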

Plots of the (marginal) prior and posterior distributions of µ, τ and σ are given in Figure 2.2. Note that the (marginal) prior and posterior distributions for σ can be determined
from that of τ . We can also examine the joint prior and posterior distributions for (µ, τ )T
via the contour plots of their densities to see if there is any change in the dependence
structure; see Figure 2.3. This figure is produced by using the R command NGacontour
in the nclbayes package as follows:

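library(nclbayes)
# grids of mu and tau values over which the densities are evaluated
mu=seq(4.5,6.5,len=1000)
tau=seq(0,71,len=1000)
# prior contours first (lty=3), then posterior contours added to the same plot
NGacontour(mu,tau,b,c,g,h,lty=3)
NGacontour(mu,tau,B,C,G,H,add=TRUE)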

in which the variables b,c,g,h,B,C,G,H have already been set to their prior/posterior
values. A careful look at the values of the contour levels plotted shows that the highest
contour level plotted for the prior density is 0.024 and the lowest level for the posterior
density is 0.05. From this we can conclude that the posterior distribution is far more
concentrated than the prior distribution. Also the contours for the posterior distribution
are much more elliptical than those for the prior distribution. This indicates a change
in the dependence structure. However, the main changes shown by the figure are in the
mean and variability of µ and τ.

[Figure 2.2: Prior (dashed) and posterior (solid) densities for µ, τ and σ. Three panels, each with vertical axis: density.]

Wikipedia tells us that the actual mean density of the earth is 5.515 g/cm³. We can
determine the (posterior) probability that the mean density is within 0.1 of this value as
follows. We already know that $\mu|x \sim t_{28}(5.484, 0.001561)$ and so we can calculate

Pr(5.415 < µ < 5.615|x) = 0.9529

using pgt(5.615,28,5.484,0.001561)-pgt(5.415,28,5.484,0.001561).
Without the data, the only basis for determining the earth’s density is via the prior
distribution. Here the prior distribution is $\mu \sim t_5(5.41, 0.16)$ and so the (prior) probability
that the mean density is within 0.1 of the (now known) true value is

Pr(5.415 < µ < 5.615) = 0.1802,

calculated using pgt(5.615,5,5.41,0.16)-pgt(5.415,5,5.41,0.16).


These probability calculations demonstrate that the data have been very informative and
changed our beliefs about the earth’s density.
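
If the nclbayes package is not to hand, the same two probabilities can be approximated with base R alone, using the location and scale relationship between the generalised and standard t distributions. In this sketch pgt_sk is a hypothetical stand-in for pgt, not its actual implementation.

```r
# Sketch: the prior and posterior probabilities using base R's pt().
# pgt_sk is a hypothetical stand-in for pgt in nclbayes, based on the
# location-scale relation (X - b)/sqrt(c) ~ t_a.
pgt_sk <- function(q, a, b, c) pt((q - b) / sqrt(c), df = a)

post_prob  <- pgt_sk(5.615, 28, 5.484, 0.001561) - pgt_sk(5.415, 28, 5.484, 0.001561)
prior_prob <- pgt_sk(5.615, 5, 5.41, 0.16) - pgt_sk(5.415, 5, 5.41, 0.16)
c(prior = prior_prob, posterior = post_prob)
```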

[Figure 2.3: Contour plot of the prior (dashed) and posterior (solid) densities for $(\mu, \tau)^T$. Horizontal axis: µ; vertical axis: τ.]

2.3 Confidence intervals and regions

Example 2.3

Determine the 100(1 − α)% highest density interval (HDI) for the population mean µ in
terms of quantiles of the standard t-distribution.

Solution
The marginal posterior distribution is $\mu|x \sim t_{2G}\{B, H/(GC)\}$. This is a symmetric
distribution and so the HDI is an equi-tailed interval. Therefore the HDI $(\ell, u)$ for µ
must satisfy
\[
Pr(\mu < \ell\,|\,x) = \alpha/2 \quad \text{and} \quad Pr(\mu > u\,|\,x) = \alpha/2.
\]

Now, given the data x
\[
\frac{\mu - B}{\sqrt{H/(GC)}} \sim t_{2G}
\]
and so
\[
\begin{aligned}
Pr(\mu > u\,|\,x) = \alpha/2
&\implies Pr\left(\frac{\mu - B}{\sqrt{H/(GC)}} > \frac{u - B}{\sqrt{H/(GC)}} \,\Big|\, x\right) = \alpha/2 \\
&\implies \frac{u - B}{\sqrt{H/(GC)}} = t_{2G;\alpha/2}
\end{aligned}
\]
where $t_{2G;p}$ is the upper p point of the $t_{2G}$ distribution. Therefore
\[
u = B + t_{2G;\alpha/2} \sqrt{\frac{H}{GC}}.
\]
Similar calculations give
\[
\ell = B + t_{2G;1-\alpha/2} \sqrt{\frac{H}{GC}} = B - t_{2G;\alpha/2} \sqrt{\frac{H}{GC}}
\]
since the t distribution is symmetric about zero. Thus the 100(1 − α)% HDI for µ is
\[
\left(B - t_{2G;\alpha/2} \sqrt{\frac{H}{GC}},\ \ B + t_{2G;\alpha/2} \sqrt{\frac{H}{GC}}\right).
\]

These intervals can be calculated easily using the R function qgt in the package nclbayes.
For example, the prior and posterior 95% HDIs for µ can be calculated using

# prior 95% HDI for mu
c(qgt(0.025,2*g,b,h/(g*c)),qgt(0.975,2*g,b,h/(g*c)))
# posterior 95% HDI for mu
c(qgt(0.025,2*G,B,H/(G*C)),qgt(0.975,2*G,B,H/(G*C)))
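
The interval formula derived above can also be evaluated directly from quantiles of the standard t distribution in base R. The following sketch does this for Example 2.2; the function name hdi_mu is ours, and the posterior parameters are entered as the rounded values quoted earlier.

```r
# Sketch: 95% HDIs for mu from standard t quantiles (base R only).
b <- 5.41; c <- 0.25; g <- 2.5; h <- 0.1            # prior, Example 2.2
B <- 5.4840; C <- 23.25; G <- 14; H <- 0.5080       # posterior (rounded values)

# loc +/- t_{2g; alpha/2} * sqrt(h / (g c)); the upper alpha/2 point is
# the 1 - alpha/2 quantile
hdi_mu <- function(alpha, loc, cc, gg, hh) {
  half_width <- qt(1 - alpha / 2, df = 2 * gg) * sqrt(hh / (gg * cc))
  c(lower = loc - half_width, upper = loc + half_width)
}

prior_hdi <- hdi_mu(0.05, b, c, g, h)
post_hdi  <- hdi_mu(0.05, B, C, G, H)
rbind(prior_hdi, post_hdi)
```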

Determining a highest density interval (HDI) for the population precision τ or standard
deviation σ is more complicated as their posterior distributions are not symmetric. The
(marginal) posterior for τ is τ |x ∼ Ga(G, H) and the (marginal) posterior for σ is σ|x ∼
Inv-Chi(G, H). HDIs can be found by using the R functions hdiGamma and hdiInvchi
in the package nclbayes. More standard equi-tailed confidence intervals can be found
using the functions qgamma and qinvchi.
For example, the prior and posterior 95% HDIs for τ can be calculated using R commands hdiGamma(0.95,g,h) and hdiGamma(0.95,G,H), and those for σ using commands hdiInvchi(0.95,g,h) and hdiInvchi(0.95,G,H). The 95% equi-tailed confidence intervals are calculated in a similar way to the HDIs for µ above. So for τ, the
prior and posterior intervals are calculated using

# prior, then posterior, 95% equi-tailed interval for tau
c(qgamma(0.025,g,h),qgamma(0.975,g,h))
c(qgamma(0.025,G,H),qgamma(0.975,G,H))

and those for σ using

# prior, then posterior, 95% equi-tailed interval for sigma
c(qinvchi(0.025,g,h),qinvchi(0.975,g,h))
c(qinvchi(0.025,G,H),qinvchi(0.975,G,H))
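
To give an idea of what such functions might do, here is a base R sketch. For a unimodal density the HDI is the shortest interval with the required probability, so hdi_gamma_sk searches over the lower tail probability for the shortest interval; this is one possible approach and not necessarily how hdiGamma is implemented. The equi-tailed interval for σ uses the fact that σ = 1/√τ is a decreasing function of τ. Both function names are hypothetical.

```r
# Sketch: intervals for tau and sigma using base R only.
# hdi_gamma_sk is a hypothetical stand-in for hdiGamma; it finds the shortest
# interval of the required probability, which is the HDI for a unimodal density.
hdi_gamma_sk <- function(p, shape, rate) {
  width <- function(lo) qgamma(lo + p, shape, rate) - qgamma(lo, shape, rate)
  lo <- optimize(width, interval = c(0, 1 - p), tol = 1e-10)$minimum
  c(lower = qgamma(lo, shape, rate), upper = qgamma(lo + p, shape, rate))
}

# Equi-tailed interval for sigma = 1/sqrt(tau): the map is decreasing, so the
# tau quantiles swap ends.
eqt_sigma <- function(p, shape, rate) {
  a <- (1 - p) / 2
  c(lower = 1 / sqrt(qgamma(1 - a, shape, rate)),
    upper = 1 / sqrt(qgamma(a, shape, rate)))
}

hdi_gamma_sk(0.95, 2.5, 0.1)      # prior HDI for tau
hdi_gamma_sk(0.95, 14, 0.5080)    # posterior HDI for tau
eqt_sigma(0.95, 2.5, 0.1)         # prior equi-tailed interval for sigma
eqt_sigma(0.95, 14, 0.5080)       # posterior equi-tailed interval for sigma
```

An HDI for σ cannot be obtained by transforming the HDI for τ in this way, because highest density regions are not preserved under non-linear transformations.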

Parameter   Interval type          Prior                 Posterior
µ           HDI (= equi-tailed)    (4.3818, 6.4382)      (5.4031, 5.5649)
τ           HDI                    (1.4812, 55.9573)     (14.0193, 42.2530)
τ           equi-tailed            (4.1561, 64.1625)     (15.0674, 43.7625)
σ           HDI                    (0.1062, 0.4246)      (0.1466, 0.2505)
σ           equi-tailed            (0.1248, 0.4905)      (0.1512, 0.2576)

Table 2.1: Prior and posterior 95% intervals for the analysis in Example 2.2

The numerical values for the prior and posterior 95% intervals for the analysis in Example 2.2 are given in Table 2.1. Notice that there is little difference between the posterior
HDI and equi-tailed intervals for τ and for σ, whereas the prior intervals are fairly different. This is because the prior distributions are quite skewed but the posterior distributions
are fairly symmetric; see Figure 2.2.
In Bayesian inference it can also be useful to determine (joint) confidence regions for
several parameters, in this case for $(\mu, \tau)^T$. In general this is a difficult problem to solve
mathematically, and this case is no exception.

Example 2.4

Determine a joint confidence region for (µ, τ )T .

Solution

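We know that the (joint) prior distribution for these parameters is
\[
\begin{pmatrix} \mu \\ \tau \end{pmatrix} \sim NGa(b, c, g, h).
\]
Therefore an HDI–type confidence region takes the form
\[
\begin{aligned}
&\left\{ \begin{pmatrix} \mu \\ \tau \end{pmatrix} : \pi(\mu, \tau) > k \right\} \\
&\quad= \left\{ \begin{pmatrix} \mu \\ \tau \end{pmatrix} : \tau^{g - \frac{1}{2}} \exp\left\{-\frac{\tau}{2}\left[c(\mu - b)^2 + 2h\right]\right\} > k' \right\} \\
&\quad= \left\{ \begin{pmatrix} \mu \\ \tau \end{pmatrix} : \left(g - \frac{1}{2}\right) \log \tau - \frac{\tau}{2}\left[c(\mu - b)^2 + 2h\right] > k'' \right\} \\
&\quad= \left\{ \begin{pmatrix} \mu \\ \tau \end{pmatrix} : \frac{\tau c(\mu - b)^2}{2} + h\tau - \left(g - \frac{1}{2}\right) \log \tau < k_\alpha \right\}
\end{aligned}
\]
where $k_\alpha$ will depend on the confidence level of the region. These regions are not difficult
to draw. The difficult part is determining the appropriate value for $k_\alpha$ to get say a 95%
confidence region. If we could determine the distribution of
\[
Y = \frac{\tau c(\mu - b)^2}{2} + h\tau - \left(g - \frac{1}{2}\right) \log \tau
\]
when
\[
\begin{pmatrix} \mu \\ \tau \end{pmatrix} \sim NGa(b, c, g, h)
\]
then we could get the value for $k_\alpha$. Unfortunately it is quite difficult to do this mathematically. However, we can use simulation methods to get a pretty accurate value for $k_\alpha$
(for a given confidence level).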
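
One simple simulation approach, sketched below, is to draw a large sample from the NGa distribution using its definition (τ ∼ Ga(g, h), then µ|τ ∼ N{b, 1/(cτ)}), evaluate Y for each draw, and take the appropriate sample quantile as the estimate of $k_\alpha$. This is only an illustration of the idea and may differ from what NGacontour does internally; the function names rnga and y_stat are hypothetical, and the parameter values are those of the prior in Example 2.2.

```r
# Sketch: Monte Carlo estimate of k_alpha for a 95% HDI-type region.
# Simulates (mu, tau) ~ NGa(b, c, g, h) directly from its definition:
# tau ~ Ga(g, h), then mu | tau ~ N(b, 1/(c * tau)).
# This is one possible simulation approach, not necessarily the one used by NGacontour.
set.seed(2024)
b <- 5.41; c <- 0.25; g <- 2.5; h <- 0.1     # prior of Example 2.2
N <- 1e5

rnga <- function(N, b, c, g, h) {
  tau <- rgamma(N, shape = g, rate = h)
  mu  <- rnorm(N, mean = b, sd = 1 / sqrt(c * tau))
  data.frame(mu = mu, tau = tau)
}

# Y as defined above; the region is {Y < k_alpha}
y_stat <- function(d, b, c, g, h)
  d$tau * c * (d$mu - b)^2 / 2 + h * d$tau - (g - 0.5) * log(d$tau)

draws   <- rnga(N, b, c, g, h)
k_alpha <- quantile(y_stat(draws, b, c, g, h), probs = 0.95, names = FALSE)

# check on an independent set of draws: coverage should be close to 0.95
coverage <- mean(y_stat(rnga(N, b, c, g, h), b, c, g, h) < k_alpha)
```

The accuracy of the estimate improves as the number of draws N increases.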

Using an additional argument in the R function NGacontour produces plots of confidence
regions. For example


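# grids of mu and tau values over which the densities are evaluated
mu=seq(3.5,7.5,len=1000)
tau=seq(0,80,len=1000)
# p gives the confidence levels of the regions; prior first, then posterior added
NGacontour(mu,tau,b,c,g,h,p=c(0.95,0.9,0.8),lty=3)
NGacontour(mu,tau,B,C,G,H,p=c(0.95,0.9,0.8),add=TRUE)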

produces a plot containing the 95%, 90% and 80% prior and posterior confidence regions
for (µ, τ )T for the prior and posterior distributions in Example 2.2; see Figure 2.4. The
upper plot shows contours of both prior and posterior densities. The numbers within
the plot are the contour levels. The largest prior confidence region is the 95% region.
The next largest is the 90% prior confidence region and the smallest is the 80% prior
confidence region. The same ordering holds for the posterior confidence regions. The
posterior contours are so concentrated in the middle of the plot that there is no room to
put in the contour levels. However, these can be seen on the lower plot which also shows
the contours but focuses the parameter range to highlight the contours of the posterior
density. The values of the contours in this lower plot show that the posterior density is
much more peaked, that is, the posterior has a much reduced variability. The location
of the centre of the central contour for both the prior and posterior densities shows that
there has been little change in the mean/mode.

2.4 Predictive distribution

Suppose we sample another value y randomly from the population. What values is it
likely to take? This is described by its predictive distribution. We can determine this
distribution by using the definition of the predictive density
\[
f(y|x) = \int\!\!\int f(y|\mu, \tau)\, \pi(\mu, \tau|x)\, d\mu\, d\tau
\]
or by using Candidate’s formula (as this is a conjugate analysis). However, for this
model and prior, there is a more straightforward method to determine the predictive distribution.

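As Y is a random value from the population, we have that Y|µ, τ ∼ N(µ, 1/τ). We also
know that the posterior distribution is $(\mu, \tau)^T|x \sim NGa(B, C, G, H)$. Therefore, we can
write
\[
Y = \mu + \varepsilon,
\]
where
\[
\varepsilon|\tau \sim N(0, 1/\tau) \quad \text{and} \quad \mu|x, \tau \sim N\left(B, \frac{1}{C\tau}\right).
\]
Hence Y is the sum of two independent normal random quantities, and so
\[
Y|x, \tau \sim N\left(B, \frac{1}{\tau} + \frac{1}{C\tau}\right) \equiv N\left(B, \frac{C + 1}{C\tau}\right).
\]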

[Figure 2.4: 95%, 90% and 80% prior (dashed) and posterior (solid) confidence regions for $(\mu, \tau)^T$. Upper panel: full parameter range; lower panel: range restricted to highlight the posterior contours. Horizontal axis: µ; vertical axis: τ.]


Thus, as τ|x ∼ Ga(G, H)
\[
\begin{pmatrix} Y \\ \tau \end{pmatrix} \Big|\, x \sim NGa\left(B, \frac{C}{C + 1}, G, H\right)
\]
and so, using (2.4)
\[
Y|x \sim t_{2G}\left(B, \frac{H(C + 1)}{GC}\right).
\]

We can determine 100(1 − α)% predictive intervals by noting that the predictive distribution is symmetric about its mean and therefore the HDI is
\[
\left(B - t_{2G;\alpha/2} \sqrt{\frac{H(C + 1)}{GC}},\ \ B + t_{2G;\alpha/2} \sqrt{\frac{H(C + 1)}{GC}}\right).
\]


These predictive intervals can be calculated easily using the R function qgt. For example,
in Example 2.2, the prior and posterior 95% predictive HDIs for a new value Y from the
population are (4.2604, 6.5596) and (5.0855, 5.8825) respectively, calculated using

# prior 95% predictive HDI for a new value Y
c(qgt(0.025,2*g,b,h*(c+1)/(g*c)),qgt(0.975,2*g,b,h*(c+1)/(g*c)))
# posterior 95% predictive HDI for a new value Y
c(qgt(0.025,2*G,B,H*(C+1)/(G*C)),qgt(0.975,2*G,B,H*(C+1)/(G*C)))
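
The predictive distribution can be checked by simulation, following the construction Y = µ + ε used in the derivation: draw τ from its posterior, then µ given τ, then Y given µ and τ. The sketch below compares the simulated 2.5% and 97.5% points of Y with the exact interval from the t distribution, using base R only and the rounded posterior parameters from Example 2.2.

```r
# Sketch: simulation check of the posterior predictive distribution in Example 2.2.
set.seed(42)
B <- 5.4840; C <- 23.25; G <- 14; H <- 0.5080   # posterior parameters (rounded)
N <- 2e5

tau <- rgamma(N, shape = G, rate = H)              # tau | x
mu  <- rnorm(N, mean = B, sd = 1 / sqrt(C * tau))  # mu | x, tau
y   <- rnorm(N, mean = mu, sd = 1 / sqrt(tau))     # Y | mu, tau

simulated <- quantile(y, c(0.025, 0.975), names = FALSE)
exact     <- B + c(-1, 1) * qt(0.975, df = 2 * G) * sqrt(H * (C + 1) / (G * C))
rbind(simulated, exact)
```

The simulated and exact rows should agree to about two decimal places; the agreement improves with larger N.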

2.5 Summary

Suppose we have a normal random sample with $X_i|\mu, \tau \sim N(\mu, 1/\tau)$, i = 1, 2, . . . , n
(independent).

(i) (µ, τ )T ∼ NGa(b, c, g, h) is a conjugate prior distribution.

(ii) The posterior distribution is (µ, τ )T |x ∼ NGa(B, C, G, H) where the posterior pa-
rameters are given by (2.3).

(iii) The marginal prior distributions are $\mu \sim t_{2g}\{b, h/(gc)\}$, $\tau \sim Ga(g, h)$, $\sigma = 1/\sqrt{\tau} \sim \text{Inv-Chi}(g, h)$.

(iv) The marginal posterior distributions are µ|x ∼ t2G {B, H/(GC)}, τ |x ∼ Ga(G, H),
σ|x ∼ Inv-Chi(G, H).

(v) Prior and posterior means and standard deviations for µ, τ and σ can be calculated
from the properties of the t, Gamma and Inv-Chi distributions.

(vi) Prior and posterior probabilities and densities for µ, τ and σ can be calculated using
the R functions pgt, dgt, pgamma, dgamma, pinvchi, dinvchi.

(vii) HDIs or equi-tailed CIs for µ, τ and σ can be calculated using qgt, hdiGamma,
hdiInvchi, qgamma, qinvchi.

(viii) Contour plots of the prior and posterior densities for (µ, τ )T can be plotted using
the NGacontour function.

(ix) Prior and posterior confidence regions for (µ, τ )T can be plotted using the NGacontour
function.

(x) The predictive distribution for a new observation Y from the population is Y |x ∼
t2G {B, H(C + 1)/(GC)} and its HDI can be calculated using the qgt function.

2.6 Why do we have so many different distributions?

So far we have used many distributions, some you will have met before and some will be
new. After a while the variety and sheer number of different distributions can become
overwhelming. Why do we need so many distributions and why do we name so many of
them?
Statistics studies the random variation in experiments, samples and processes. The variety
of applications leads to their randomness being described by many different distributions.
In many applications, bespoke distributions will need to be formulated. However, some
distributions come up time and time again for modelling random variation in data and
for describing prior beliefs. It is helpful for us to be able to refer to these distributions –
and so we give each one a name – and also to be able to quote known results for these
distributions such as their mean and variance. In this chapter you have been introduced
to a generalisation of the t-distribution and the inverse chi distribution, and we have been
able to use results for their mean and variance to study prior and posterior distributions
and have been able to plot these distributions using functions in the R package.
You will meet several other new distributions in the remainder of the module. You won’t
be surprised to hear that it is useful to have a working knowledge of each of these
distributions but perhaps not vital to remember all their properties listed in these notes.
To help in this regard, the exam paper will contain a list of all the distributions used in
the exam, together with their density (or probability function) and any useful results such
as their mean and variance (as needed for the exam); see the specimen exam paper at
the back of this booklet.

2.7 Learning objectives

By the end of this chapter, you should be able to:


• determine the posterior distribution for (µ, τ )T

• determine and use the univariate prior and posterior distributions

• determine confidence intervals, HDIs and confidence regions

• determine the predictive distribution of another value from the population, and its
predictive interval

• determine the predictive distribution of the mean of another random sample from
the population

both in general and for a particular prior and data set. Also you should be able to:

• appreciate the benefit of naming distributions and of having lists of properties for
these distributions