
Lecture 11: Conjugate Distributions

© Daniel Frances 2016

Contents

1 Introduction

2 Continuous Probability Review
  2.1 Variety of Distributions
  2.2 Modeling Continuous Uncertainty
  2.3 Probability Distribution versus Probability Density Function
  2.4 Discrete and Continuous Distributions
  2.5 Distribution Normalizing Constant

3 Eliciting Probabilities over Continuous Events

4 Conjugate Distributions
  4.1 Binomial Sampling with unknown failure rate p
    4.1.1 Example - The machining problem
  4.2 Geometric Sampling with unknown failure rate p
  4.3 -ve Binomial Sampling with unknown failure rate p
  4.4 Poisson Sampling with unknown event rate λ
  4.5 Exponential Sampling with unknown event rate λ
  4.6 Normal Sampling Distn. with Unknown mean and precision
    4.6.1 Example - The student heights problem

5 Conclusion

Page 2 Compiled on 2016/11/09 at 18:01:24



1 Introduction

The change from discrete to continuous priors is quite a large change, requiring new concepts and numerical techniques. Up to the 1990s it was only possible to use continuous prior distributions if they met the necessary conjugacy condition, something we will learn about this week. If the condition is not met, we can always resort to using OpenBUGS.

Solving problems with conjugate prior distributions amounts to no more than deriving simple formulas and applying them to solve decision problems. But the concepts involved are new and take some getting used to. Let's use the previous machinery example as a base to illustrate these concepts.

• Before we were not sure the equipment sent to a customer would be perfectly flat or
not, and it could be tested before shipment with some level of accuracy.

• Suppose now that the customer could return the equipment at no charge if it failed
within the first 30 days of usage.

• The company has no way of testing to determine if it will fail in the first 30 days.

• The DM thinks of this problem as each piece of equipment has some probability of
failing within 30 days and he has a hard time guessing the true value of that probability.

• The only way DM is going to get a better handle on that probability is to make the
best possible product within reason, and then to monitor the number of returns as
sales proceed.

• DM does remember from statistics that if θ is the probability that a unit will fail within the first 30 days, then the probability of observing y returns after sending N units follows a Binomial distribution:

  p(y|θ) = C(N, y) θ^y (1 − θ)^(N−y)

  where C(N, y) is the binomial coefficient N!/(y!(N−y)!).

• DM notes that unknown distribution parameters such as θ are continuous quantities, so prior beliefs about θ will likely take the form of a continuous probability density function pdf(θ).

• DM also recalls Bayes' Rule, which would include pdf(θ) as

  pdf(θ|y) = p(y|θ) pdf(θ) / ∫_Θ p(y|θ) pdf(θ) dθ

• So DM realizes that if some type of prior distribution pdf(θ) can be elicited then, in principle, once the number of returns y is observed, a posterior distribution pdf(θ|y) can be computed to make appropriate business decisions that maximize expected utility.


• He might, for example, stop sales and improve the product if more than a threshold number of units are returned as faulty, or if returns push his profit margin low enough.

• The key problem for the DM is the difficulty of integrating the denominator of Bayes' Rule for a particular continuous prior.

• To allow the DM to proceed, additional tools are required:

  A. Conjugacy: an analytical solution allowing Bayes' Rule to be applied to selected discrete and continuous sampling distributions with unknown parameters. That is the topic here.

  B. MCMC: numerical sampling methods allowing Bayes' Rule to be applied to all continuous distributions; the next topic.


2 Continuous Probability Review

Before moving on we need some soak time regarding continuous probability.

2.1 Variety of Distributions

Figure 1 shows the wide variety of distributions which occur naturally in various settings.

Figure 1: Distribution Palette

We will spend some time becoming familiar with some notable examples such as the Beta,
Exponential, Gamma, Normal, and Weibull distributions.

2.2 Modeling Continuous Uncertainty

For discrete distributions the height of each of the bars - the reading on the y-axis - is the
probability that the value along the x-axis will materialize.


Is the same true for continuous distributions? Is the height of the distribution the probability that the value along the x-axis will materialize? NO.

The probability that any specific value on the x-axis will materialize is zero, just like the probability that a dart will fall exactly on a single infinitesimally small dot on a line is zero. Yet the dart falls somewhere! The resolution of this paradox is that the point of a real dart is not infinitesimally small: the overlap between the dart point and the line has a finite area. Thus we always deal with probabilities that the dart will fall within a given small part of the line. If the distribution describes where the dart falls along the x-axis, then the area under the curve over a small interval is the probability that the dart will fall in that interval.

This also resolves another misconception.

Can the height of a discrete distribution ever be greater than 1? Clearly not.

Can the height of a continuous distribution ever be greater than 1? Yes! How can that be?

Suppose the interval is really small, say 0.01 cm, and there is a 5% chance of falling in that interval. Then the area under the distribution (approximately height × width) must be 0.05. Thus the height must be 5, so that 5 × 0.01 = 0.05.
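The height-versus-area arithmetic can be checked directly; a minimal sketch:

```python
# a 5% chance concentrated in a 0.01 cm interval
interval_width = 0.01
prob_mass = 0.05
height = prob_mass / interval_width   # approximate pdf height on that interval
assert abs(height - 5.0) < 1e-12      # density well above 1...
assert height * interval_width <= 1   # ...but the probability (area) is still <= 1
```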

2.3 Probability Distribution versus Probability Density Function

Because of this qualitative difference between modeling discrete and continuous uncertainty
it is common not to refer to the graphs in the picture as Probability Distributions, but to
name them instead Probability Density Functions, or pdf.

For continuous distributions, the term Probability Distribution usually refers to the Cumulative Distribution Function, or cdf, defined as the probability that the value on the x-axis will be less than or equal to a given value:

  CDF(x) = ∫_{−∞}^{x} pdf(t) dt

In probabilistic modeling we practically never evaluate these integrals with calculus; they often have no closed form. Only approximations exist, which are tabulated or made callable within programs and apps.

BUT the integral sign is used conceptually to determine what needs to be computed in order to solve our problems. So we cannot afford to be intimidated by the math!
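For instance, the Normal CDF has no closed form in elementary functions, but it can be approximated by numerically integrating the pdf. A minimal Python sketch (trapezoidal rule, compared against the standard expression via the error function):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # normal density function
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf_numeric(x, lo=-10.0, n=100_000):
    # crude trapezoidal approximation of CDF(x) = integral of the pdf up to x
    h = (x - lo) / n
    total = 0.5 * (normal_pdf(lo) + normal_pdf(x))
    for i in range(1, n):
        total += normal_pdf(lo + i * h)
    return total * h

# closed form for the standard normal: CDF(1) = 0.5 * (1 + erf(1/sqrt(2)))
exact = 0.5 * (1 + math.erf(1 / math.sqrt(2)))
assert abs(normal_cdf_numeric(1.0) - exact) < 1e-6
```

This is exactly the kind of approximation hiding behind tabulated CDF values.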


2.4 Discrete and Continuous Distributions

Figure 2 shows discrete distributions which occur naturally in various settings. In this part of the course we assume that if the data we are collecting is an integer, it is distributed according to one of these frequently occurring distributions.

  Discrete Distributions
  Distribution        Use                                               Range     Parameters
  Bernoulli           2 outcomes                                        {0,1}     0 < p < 1
  Binomial            Successes in n Bernoulli trials;                  {0,…,n}   n > 0 (int), 0 < p < 1
                      defective items in batch of n
  Geometric           Successes before a failure in Bernoulli trials;   {0,1,…}   0 < p < 1
                      items inspected before a defective
  Negative Binomial   Successes before rth failure in Bernoulli         {0,1,…}   r > 0 (int), 0 < p < 1
                      trials; items inspected before rth defective
  Poisson             Events in interval of time or space               {0,1,…}   λ > 0

Figure 2: Discrete Distributions

Similarly, Figure 3 shows continuous distributions which occur naturally in various settings. In this part of the course we will assume that if the information or data we are collecting is a real number, then it will be distributed according to either the Exponential, Normal or Uniform distributions. The Gamma, Beta and Pareto are for use in the coming Conjugate Distribution section.

  Continuous Distributions
  Distribution   Use                                   Range     Parameters
  Exponential    Inter-arrival times; time to failure  [0,∞)     λ > 0
  Gamma          Conjugate to Exponential and Poisson  [0,∞)     α > 0, β > 0
  Normal         Errors; sums & averages               (−∞,∞)    µ, τ > 0 (τ = precision = 1/σ²)
  Beta           Conjugate to Bernoulli, Binomial      [0,1]     α > 0, β > 0
                 and Negative Binomial
  Uniform        First model, no data                  [a,b]     b − a > 0
  Pareto         Conjugate to Uniform over [0,b]       (x₀,∞)    x₀ > 0, α > 2

Figure 3: Continuous Distributions


2.5 Distribution Normalizing Constant

One useful observation to help understand the concept of conjugacy is that many distribu-
tions pdf (x) include multiplication by a constant to ensure the area under the distribution
equals 1. This constant is commonly referred to as a Normalizing Constant.
The Normalizing Constant for the Exponential distribution is β; for the Gamma, β^α / Γ(α); for the Normal, √(τ/(2π)); for the Beta, Γ(α+β) / (Γ(α)Γ(β)); for the Uniform, 1/(b−a); and for the Pareto, α x₀^α.
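These constants can be verified numerically: multiplying the unnormalized kernel's area by the constant should give 1. A sketch for the Beta case, with the kernel area approximated by a Riemann sum:

```python
import math

# Beta(alpha, beta) normalizing constant: Gamma(a+b) / (Gamma(a) * Gamma(b))
alpha, beta = 2.5, 3.5
norm_const = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))

# Riemann-sum approximation of the unnormalized kernel p^(a-1)(1-p)^(b-1) over [0, 1]
n = 200_000
h = 1.0 / n
area = sum((i * h) ** (alpha - 1) * (1.0 - i * h) ** (beta - 1) for i in range(1, n)) * h

# constant times kernel area should equal 1
assert abs(norm_const * area - 1.0) < 1e-4
```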

3 Eliciting Probabilities over Continuous Events

Clearly the earlier material was focused on discrete events, i.e. the probability that a well-defined event Ai will take place, where i = 1 . . . n. Since this requires eliciting n − 1 probabilities, it can become an onerous task for large n.

But what if we are trying to elicit probabilities over an infinite number of possibilities, e.g. income, temperature, interest rates? One way would be to assume a finite number of discrete possibilities and proceed in the same way. But the DM may be concerned that different discrete versions of the continuous reality will lead to different decisions, and would much rather use a continuous variable model.

In that case we would define events in terms of ranges. Typically we define events so that the resulting elicited probabilities form cumulative probabilities. For example, suppose we wish to assess the probability distribution of the demand for a certain product. Then we might start by finding the DM's indifference point between

A: a bet that the demand will exceed 1500 units.

B: a bet that the spinner falls within a given area of the wheel.

The relative size of the area is then the subjective probability that the demand will exceed 1500 units. Figure 4 below shows how a cumulative curve can be derived based on 16 assessments. Note that assessment 1 is way off; the diagram is marked from repeated assessments. By measuring the slope of the curve at various points, and using a product such as BestFit (used in MIE360), a continuous probability density function can be derived.

Experience favors an alternative approach for assessing continuous quantities, through indifference between the following:

A: a bet that the demand will exceed a given level.

B: a bet that the spinner falls within 50% of the wheel area.


Figure 4: Eliciting Cumulative Probabilities

Next the 50% would be replaced by 25% and 75% and the assessment repeated. If need be, the process would continue by splitting each subinterval into equal halves until there were sufficient points to draw a smooth graph between the points.
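Once a handful of such points has been elicited, a rough CDF can be drawn through them. A sketch using piecewise-linear interpolation (the demand levels and probabilities below are hypothetical, not taken from Figure 4):

```python
# hypothetical elicited (demand level, cumulative probability) pairs
elicited = [(500, 0.05), (1000, 0.25), (1500, 0.50), (2200, 0.75), (3000, 0.95)]

def cdf_interp(x):
    # piecewise-linear CDF through the elicited points (0 below, 1 above)
    pts = sorted(elicited)
    if x < pts[0][0]:
        return 0.0
    if x > pts[-1][0]:
        return 1.0
    for (x0, p0), (x1, p1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return p0 + (p1 - p0) * (x - x0) / (x1 - x0)

assert cdf_interp(1500) == 0.50
assert cdf_interp(400) == 0.0 and cdf_interp(3500) == 1.0
```

A fitting tool like BestFit would smooth such points into a parametric density rather than interpolating linearly.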

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

4 Conjugate Distributions

4.1 Binomial Sampling with unknown failure rate p

Suppose the data collected is the number of failures x in n trials, which we know is Binomially
distributed. The reason we are collecting x is to give us a better idea about the underlying
uncertainty about the failure rate p.

If we had a prior distribution of p then we would be able to use Bayes' Rule in the form

  pdf(p|x) = Pr(x|p) pdf(p) / ∫ Pr(x|p) pdf(p) dp
           = C(n, x) p^x (1 − p)^(n−x) pdf(p) / ∫ C(n, x) p^x (1 − p)^(n−x) pdf(p) dp
           = p^x (1 − p)^(n−x) pdf(p) / ∫ p^x (1 − p)^(n−x) pdf(p) dp

where the integrals are taken over p.

For any arbitrary prior distribution pdf(p) it is very difficult to accurately and efficiently compute the integral in the denominator of Bayes' Rule. It is no coincidence that cumulative distributions CDF(x) often appear as tables; it is because they are difficult to compute.

BUT, if the prior distribution is forced to be a "conjugate" distribution to the sampling distribution, then the math simplifies. For this example the sampling distribution is the Binomial, the uncertainty is about p, which we know lies in the interval [0, 1], and one of the well-known distributions over [0, 1], the Beta distribution, is conjugate to the Binomial. So let's assume the DM believes that the prior distribution of p is Beta with parameters α⁰ and β⁰. Now we see how the math simplifies.

For starters, we take a surprising step which totally removes the need to take the integral in the denominator of Bayes' Rule: we replace the = sign with a proportionality sign ∝. For a given x

  pdf(p|x) ∝ p^x (1 − p)^(n−x) pdf(p)

Replacing pdf(p) with the formula for the Beta, with initial parameters α⁰, β⁰, yields

  pdf(p|x) ∝ p^x (1 − p)^(n−x) · [Γ(α⁰ + β⁰) / (Γ(α⁰) Γ(β⁰))] p^(α⁰−1) (1 − p)^(β⁰−1)

Removing the Normalizing Constant:

  pdf(p|x) ∝ p^x (1 − p)^(n−x) p^(α⁰−1) (1 − p)^(β⁰−1) = p^(x+α⁰−1) (1 − p)^(n−x+β⁰−1)

We now observe that what remains looks like a Beta distribution without its Normalizing Constant, with revised parameters α⁺, β⁺:

  pdf(p|x) ∝ p^((α⁰+x)−1) (1 − p)^((β⁰+n−x)−1) = p^(α⁺−1) (1 − p)^(β⁺−1)

Although it's not necessary, we can reinsert the Normalizing Constant to explicitly state the posterior distribution as

  pdf(p|x) = [Γ(α⁺ + β⁺) / (Γ(α⁺) Γ(β⁺))] p^(α⁺−1) (1 − p)^(β⁺−1)   Voilà!

Thus the posterior distribution pdf(p|x) is again a Beta distribution with parameters α⁺ = α⁰ + x, β⁺ = β⁰ + n − x.

Note that if we collect N Binomial observations x₁, …, x_N with numbers of trials n₁, …, n_N, then the posterior distribution will be a Beta distribution with parameters α⁺ = α⁰ + Σᵢ xᵢ = α⁰ + N x̄ and β⁺ = β⁰ + Σᵢ nᵢ − Σᵢ xᵢ = β⁰ + Σᵢ nᵢ − N x̄.
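The Beta update is simple enough to sketch as a one-line function (the function name is hypothetical; the numbers are the first two months of the machining example in the next subsection):

```python
def beta_binomial_update(alpha0, beta0, x, n):
    # posterior Beta parameters after observing x failures in n trials
    return alpha0 + x, beta0 + n - x

# batches simply accumulate: the posterior after one batch is the prior for the next
a, b = 20, 80
for x_i, n_i in [(12, 98), (16, 132)]:
    a, b = beta_binomial_update(a, b, x_i, n_i)
assert (a, b) == (48, 282)
```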

4.1.1 Example - The machining problem

A machinery supplier (the DM) has a new machine which a customer is willing to buy for
$10,000, but pay only if it still works after a month. Otherwise the broken machine will be
sent back, and the supplier will receive nothing. The supplier initially has no idea about the
chances that it will work fine, except he thinks that on average it will be fine 80% of the
time. It costs the supplier $7,000 for each machine, so on average he seems to be doing fine,
but if the expected profit is less than $500 per unit he plans to discontinue the sales.

If we let p be the probability that a machine fails within 30 days, then the number of returns xᵢ out of a sales volume of nᵢ in month i will be Binomially distributed with unknown parameter p.


Suppose that after 6 months the monthly sales volumes are 98, 132, 121, 117, 108, 143 and the numbers of returns were 12, 16, 22, 32, 15 and 20. Thus there were a total of 117 returns out of total sales of 719. What is the expected profit for the next month?

In order to take advantage of conjugacy, the prior on p has to be Beta distributed. Since the DM mentioned that on average he believes 80% of units are good, his prior parameters α⁰, β⁰ have to be such that

  p⁰ = α⁰ / (α⁰ + β⁰) = 0.2

Suppose that his prior belief is based on thinking that about 20 units out of 100 will fail; then α⁰ = 20, β⁰ = 80.

Recall from before that α⁺ = α⁰ + Σᵢ xᵢ and β⁺ = β⁰ + Σᵢ nᵢ − Σᵢ xᵢ.

Thus α⁺ = 20 + 117 = 137 and β⁺ = 80 + 719 − 117 = 682, so that the expected failure rate for next month is p⁺ = 137/(137 + 682) ≈ 0.17, less than previously believed, and so the business continues with an expected profit per unit of 0.83 × 3000 + 0.17 × (−7000) = $1300.
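The example's numbers can be reproduced directly; a sketch applying the update formulas:

```python
sales = [98, 132, 121, 117, 108, 143]
returns = [12, 16, 22, 32, 15, 20]
alpha0, beta0 = 20, 80

alpha_post = alpha0 + sum(returns)              # 20 + 117 = 137
beta_post = beta0 + sum(sales) - sum(returns)   # 80 + 719 - 117 = 682
p_fail = alpha_post / (alpha_post + beta_post)  # posterior expected failure rate

# profit is $3000 on a kept machine, -$7000 on a returned one
profit = (1 - p_fail) * 3000 + p_fail * (-7000)
assert (alpha_post, beta_post) == (137, 682)
assert profit > 500  # above the $500 threshold, so sales continue
```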

4.2 Geometric Sampling with unknown failure rate p

Suppose the data collected is the number of failures x before a success, which we know is
Geometrically distributed. Again, the reason we are collecting x is to give us a better idea
about the underlying uncertainty about the failure rate p.

If we had a prior distribution of p then we would be able to use Bayes' Rule. Again, for any arbitrary prior distribution pdf(p) it is very difficult to accurately and efficiently compute the integral in the denominator of Bayes' Rule. BUT, if the prior distribution is forced to be conjugate to the Geometric distribution, then the math simplifies. Again the conjugate prior for the Geometric sampling distribution is the Beta distribution. Let α⁰, β⁰ be the DM's initial beliefs.

For a given x

  pdf(p|x) ∝ p^x (1 − p) pdf(p)
           ∝ p^x (1 − p) [Γ(α⁰ + β⁰) / (Γ(α⁰) Γ(β⁰))] p^(α⁰−1) (1 − p)^(β⁰−1)
           ∝ p^((α⁰+x)−1) (1 − p)^((β⁰+1)−1)

Thus the posterior is again a Beta distribution with α⁺ = α⁰ + x, β⁺ = β⁰ + 1.

Note that if we collect N Geometrically distributed observations x₁, …, x_N then the posterior distribution will be a Beta distribution with α⁺ = α⁰ + Σᵢ xᵢ = α⁰ + N x̄, β⁺ = β⁰ + N.


4.3 -ve Binomial Sampling with unknown failure rate p

Suppose the data collected is the number of failures x before r successes, which we know is Negative Binomially distributed, and which again has a Beta conjugate distribution. For a given x

  pdf(p|x) ∝ p^x (1 − p)^r pdf(p)
           ∝ p^x (1 − p)^r [Γ(α⁰ + β⁰) / (Γ(α⁰) Γ(β⁰))] p^(α⁰−1) (1 − p)^(β⁰−1)
           ∝ p^((α⁰+x)−1) (1 − p)^((β⁰+r)−1)

Thus the posterior is again a Beta distribution with α⁺ = α⁰ + x, β⁺ = β⁰ + r.

Note that if we collect N Negative Binomially distributed observations x₁, …, x_N then the posterior distribution will be a Beta distribution with α⁺ = α⁰ + Σᵢ xᵢ = α⁰ + N x̄, β⁺ = β⁰ + rN.
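The Geometric and Negative Binomial updates fit one sketch, since r = 1 recovers the Geometric case (the function name and numbers are hypothetical):

```python
def beta_negbin_update(alpha0, beta0, x, r):
    # posterior Beta parameters after x failures before the r-th success;
    # r = 1 recovers the Geometric case of the previous section
    return alpha0 + x, beta0 + r

assert beta_negbin_update(2, 3, x=5, r=1) == (7, 4)   # Geometric
assert beta_negbin_update(2, 3, x=5, r=4) == (7, 7)   # Negative Binomial
```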

4.4 Poisson Sampling with unknown event rate λ

Suppose the data collected is the number of events x per unit time, which we know is Poisson distributed, for which the Gamma distribution is conjugate. For a given x

  pdf(λ|x) ∝ e^(−λ) λ^x pdf(λ)
           ∝ e^(−λ) λ^x λ^(α⁰−1) e^(−β⁰λ)
           ∝ λ^((α⁰+x)−1) e^(−(β⁰+1)λ)

Thus the posterior is again a Gamma distribution with α⁺ = α⁰ + x, β⁺ = β⁰ + 1.

Note that if we collect N Poisson distributed observations x₁, …, x_N then the posterior distribution will be a Gamma distribution with α⁺ = α⁰ + Σᵢ xᵢ = α⁰ + N x̄, β⁺ = β⁰ + N.
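The Gamma-Poisson update can be sketched the same way (function name and numbers hypothetical):

```python
def gamma_poisson_update(alpha0, beta0, counts):
    # posterior Gamma parameters after N Poisson counts
    return alpha0 + sum(counts), beta0 + len(counts)

# hypothetical prior Gamma(2, 1) and three observed counts
assert gamma_poisson_update(2.0, 1.0, [3, 5, 4]) == (14.0, 4.0)
```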

4.5 Exponential Sampling with unknown event rate λ

Suppose the data collected is a time between (memoryless) events x which we know is
Exponentially distributed, for which the Gamma distribution is again conjugate. For a
given x
  pdf(λ|x) ∝ λ e^(−λx) pdf(λ)
           ∝ λ e^(−λx) λ^(α⁰−1) e^(−β⁰λ)
           ∝ λ^((α⁰+1)−1) e^(−(β⁰+x)λ)

Thus the posterior is again a Gamma distribution with α⁺ = α⁰ + 1, β⁺ = β⁰ + x.

Note that if we collect N Exponentially distributed observations x₁, …, x_N then the posterior distribution will be a Gamma distribution with α⁺ = α⁰ + N, β⁺ = β⁰ + Σᵢ xᵢ = β⁰ + N x̄.
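Note how the roles flip relative to the Poisson case: here α⁺ grows with the number of observations and β⁺ with their sum. A sketch (function name and numbers hypothetical):

```python
def gamma_exponential_update(alpha0, beta0, times):
    # posterior Gamma parameters after N exponential inter-event times
    return alpha0 + len(times), beta0 + sum(times)

# hypothetical prior Gamma(1, 2) and three observed inter-event times
assert gamma_exponential_update(1.0, 2.0, [0.5, 1.5, 1.0]) == (4.0, 5.0)
```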


4.6 Normal Sampling Distn. with Unknown mean and precision

Here we deal with a sampling distribution with more than one parameter, i.e. both the
mean and the precision of the sampling distribution are assumed unknown. We will assume
that the DM can specify a prior on the mean µ given the precision τ , and a prior on the
precision τ so that the joint mean-precision prior is given by p(µ, τ ) = p(µ|τ )p(τ ). Thus we
will be dealing with two conjugate distributions for the normal sampling case with unknown
mean and precision. The result without proof is:

Theorem 4.1. For the normal sampling distribution with mean µ and precision τ the
following pair of prior distributions is conjugate:

• Normal prior for the mean, with mean µ0 and precision τ 0 = n0 τ where n0 is a
measure of strength in prior belief.

• Gamma prior for the precision, with parameters α0 , β 0

Then after observing a sample x₁, …, x_n:

  µ⁺ = ρ x̄ + (1 − ρ) µ⁰, where ρ = n / (n⁰ + n)
  τ⁺ = (n + n⁰) τ
  α⁺ = α⁰ + n/2
  β⁺ = β⁰ + (1/2) (n Sₙ² + n⁰ ρ (µ⁰ − x̄)²)

where x̄ is the sample mean and Sₙ² = (1/n) Σᵢ (xᵢ − x̄)² is the sample variance.
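The four update formulas of Theorem 4.1 can be sketched together (the function name is hypothetical; Sₙ² is taken as the biased sample variance, per the theorem's notation):

```python
def normal_gamma_update(mu0, n0, alpha0, beta0, data):
    # applies the four update formulas of Theorem 4.1
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data) / n   # biased sample variance S_n^2
    rho = n / (n0 + n)
    mu_post = rho * xbar + (1 - rho) * mu0        # posterior mean
    n_post = n0 + n                               # posterior precision is (n0 + n) * tau
    alpha_post = alpha0 + n / 2
    beta_post = beta0 + 0.5 * (n * s2 + n0 * rho * (mu0 - xbar) ** 2)
    return mu_post, n_post, alpha_post, beta_post

# tiny hand-checkable case with a hypothetical prior and two data points
assert normal_gamma_update(0.0, 2, 1.0, 1.0, [1.0, 3.0]) == (1.0, 4, 2.0, 4.0)
```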

4.6.1 Example - The student heights problem

Suppose you observe the heights of 6 students entering a classroom¹ (64, 73, 64, 63, 69 and
71 inches), and you are asked to bet $100 on the height of the next student. In exchange for
your $100, you will be given 200 − 20|d − z|, where d is your guess of the height, and z is
the actual height of the student. If the reward is negative you will not have to worry about
it. Assume you wish to maximize your expected returns, should you play? And if so, what
should be your guess d?

• We should play if the expected reward of playing exceeds the expected reward of not
playing, and if we play we should maximize the expected reward of playing. Thus in
either case we need to determine the d∗ that maximizes the expected reward of playing,
and if the expected reward is positive we play.

• If the reward had been 200 − 20(d − z)², then we would need to select d to minimize the expected square error. It is well known that the mean value, or expected value E(z), minimizes the square error.

¹Based on Biostatistics Workshop Notes by Professor Michael Escobar


• But in our case the reward is maximized if the absolute error |d − z| is minimized. It can be shown² that the median value of z minimizes the absolute error.

• From experience we know that heights are normally distributed with some mean and precision, which in our case are both unknown.

• But fortunately the distribution of heights is symmetrical, thus its mean = median.

• Thus we need to set d* = the posterior mean µ⁺.

• To take advantage of the conjugate formulas, our prior beliefs must be conjugate to the sampling distribution, i.e. the prior distribution of the mean needs to be normal, and the prior distribution of the precision needs to be a gamma distribution.

Since we know neither the mean nor the variance (or precision) of the heights in advance, we usually need to specify 4 parameters which reflect our opinion about the uncertainty of the heights:

1. µ0 - our prior belief of the mean heights


2. n0 - a measure of confidence in this belief
3. α⁰ - one parameter of the prior gamma distribution
4. β 0 - the other parameter

To estimate µ0 we need to have a prior guesstimate of the heights of teenagers. Suppose


that from general experience we think that 5’6” is appropriate. Remember that this mean
needs to predate the observation of the data.

For estimating n0 we will use the concept of pseudo-observations. Observe that τ + = (n0 +
n)τ . We will interpret n0 as the number of representative kids that our 5’6” estimate is
based upon. Suppose that our estimate is based on 4 kids, then n0 = 4.

We have decided that d = µ will maximize our expected reward if µ were known. Based on
the prior estimates, prior to collecting data the DM would use d = µ0 =5’6” to maximize his
expected reward. What we need to compute now is the posterior estimate of the mean µ+ .

From the previous material on conjugate distributions

  µ⁺ = ρ x̄ + (1 − ρ) µ⁰, where ρ = n / (n⁰ + n)

With the prior estimates and observed data provided this means that

  ρ = n / (n⁰ + n) = 6 / (4 + 6) = 0.6 and µ⁺ = ρ x̄ + (1 − ρ) µ⁰ = 0.6 × 67.33 + 0.4 × 66 = 66.80

²See web.uvic.ca/~dgiles/blog/median2.pdf


Thus to maximize the expected reward the DM will select d = 66.8 ≈ 5'7".

For this problem we did not need α0 , β 0 . Generally for estimating α0 , β 0 we could ask the
DM to suggest a 95% confidence interval for the heights before observing the data. Suppose
she settles on a range of 56 to 76 inches, i.e. 2 standard deviations = 10 inches, or one s.d. of
5 inches, or a variance of 25. Thus her estimate of the precision is 1/25. Now the mean value
of a gamma distributed variable is α/β, and since we are dealing with a gamma distributed
precision then α/β = 1/25. Thus perhaps we might use α0 = 1 and β 0 = 25.
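The posterior-mean arithmetic of the heights example can be checked directly:

```python
heights = [64, 73, 64, 63, 69, 71]
mu0, n0 = 66, 4   # prior mean of 5'6", treated as worth 4 pseudo-observations

n = len(heights)
xbar = sum(heights) / n          # 67.33...
rho = n / (n0 + n)               # 0.6
mu_post = rho * xbar + (1 - rho) * mu0
assert rho == 0.6
assert abs(mu_post - 66.8) < 1e-9
```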

5 Conclusion

Here we have covered conjugate distributions for some prior distributions, and note that we have not required any software to solve the problems. Once we assume conjugate prior distributions, the calculations can most often be performed with a simple calculator.
