W9PS

This document provides solutions to practice problems on Bayesian estimation and posterior distributions. [1] With a uniform prior, the posterior for a binomial proportion is Beta(11, 31); the posterior mean is 0.26 with this prior and changes to 0.30 and 0.33 with Beta(5, 5) and Beta(10, 10) priors, respectively. [2] With a Beta(2, 8) prior, the posterior mean of the failure fraction is 0.20. [3] For a Poisson model with a two-point discrete prior on λ, the posterior mode is 10. [4] For a normal model with a Normal(0, 1) prior, the posterior mean is Yn/(n + 1).

Uploaded by polar neckson

Statistics for Data Science - 2

Week 9 Practice Assignment Solution

1. Let p be the proportion of students in the IITM online degree programme who approve of online proctored exams. The students' committee is going to take a random sample of n = 40 students from the IITM online degree programme and ask whether they approve of online proctored exams. Suppose 10 of the 40 students answer yes.
i) Calculate the posterior distribution if we use a continuous Uniform[0, 1] prior.
a) Beta(10, 30)
b) Beta(11, 31)
c) Beta(10, 40)
d) Beta(11, 40)
Solution:
Let $f_p(p)$ denote the prior density of p. Since $p \sim \text{Uniform}[0, 1]$, we have $f_p(p) = 1$ for $0 \le p \le 1$.
If $X_1, X_2, \ldots, X_n \overset{iid}{\sim} \text{Bernoulli}(p)$, then the posterior density is $\text{Beta}(w + 1, n - w + 1)$, where w is the number of successes.
Here n = 40 and w = 10, so the posterior density is Beta(11, 31), which is option (b).
ii) Find the Bayesian estimate (posterior mean) of p. Enter your answer correct to two decimal places.
Solution:
$$\text{posterior mean} = \frac{11}{11 + 31} = \frac{11}{42} \approx 0.26$$

iii) Find the Bayesian estimate (posterior mean) with a Beta(5, 5) prior. Enter your answer correct to two decimal places.
Solution:
If the prior distribution is Beta(α, β) and $X_1, X_2, \ldots, X_n \overset{iid}{\sim} \text{Bernoulli}(p)$, then
⇒ posterior density = Beta(w + α, n − w + β)
Here n = 40, w = 10, α = 5, β = 5
⇒ posterior density = Beta(15, 35)
$$\text{posterior mean} = \frac{15}{15 + 35} = 0.30$$

iv) Find the Bayesian estimate (posterior mean) with a Beta(10, 10) prior. Enter your answer correct to two decimal places.
Solution:
Here n = 40, w = 10, α = 10, β = 10
⇒ posterior density = Beta(20, 40)
$$\text{posterior mean} = \frac{20}{20 + 40} = \frac{1}{3} \approx 0.33$$
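The conjugate update used in parts (i) to (iv) can be checked with a short Python sketch. The helper names `beta_bernoulli_posterior` and `beta_mean` are hypothetical and chosen for illustration. The sketch applies the rule Beta(α, β) → Beta(w + α, n − w + β) and treats the Uniform[0, 1] prior as Beta(1, 1).

```python
def beta_bernoulli_posterior(alpha, beta, n, w):
    """Beta(alpha, beta) prior + w successes in n Bernoulli trials -> posterior Beta parameters."""
    return alpha + w, beta + (n - w)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


n, w = 40, 10  # 10 of the 40 sampled students answered yes

# Uniform[0, 1] is the same distribution as Beta(1, 1).
for prior in [(1, 1), (5, 5), (10, 10)]:
    a, b = beta_bernoulli_posterior(*prior, n, w)
    print(f"prior Beta{prior} -> posterior Beta({a}, {b}), mean = {beta_mean(a, b):.2f}")
# prior Beta(1, 1) -> posterior Beta(11, 31), mean = 0.26
# prior Beta(5, 5) -> posterior Beta(15, 35), mean = 0.30
# prior Beta(10, 10) -> posterior Beta(20, 40), mean = 0.33
```

The output shows how a stronger prior centred at 0.5 pulls the posterior mean away from the sample proportion 10/40 = 0.25.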
2. From prior experience, a new method of screening for a disease fails to detect the disease in 20% of patients who have it. A new random sample of n = 100 patients who are known to have the disease is screened using the new method. Out of these 100 patients, the new method fails to detect the disease in 20 cases. Use a Beta(2, β) prior with a suitable β to estimate the failure fraction. Enter your answer correct to two decimal places.
Solution:
Choose β so that the mean of the Beta(2, β) prior matches the 20% failure rate from prior experience:
$$\frac{2}{2 + \beta} = 0.20 \implies \beta = 8.$$
If $X_1, X_2, \ldots, X_n \overset{iid}{\sim} \text{Bernoulli}(p)$ and the prior is Beta(α, β), then
⇒ posterior density = Beta(w + α, n − w + β)
Here n = 100, w = 20, α = 2, β = 8
⇒ posterior density = Beta(22, 88)
$$\text{posterior mean} = \frac{22}{22 + 88} = 0.20$$
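A minimal sketch of the two steps can be written in Python: first match the prior mean to pick β, then apply the Beta-Bernoulli update. The helper name `beta_for_prior_mean` is hypothetical. `fractions.Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def beta_for_prior_mean(alpha, prior_mean):
    """Solve alpha / (alpha + beta) = prior_mean for beta."""
    return alpha * (1 - prior_mean) / prior_mean


alpha = 2
beta = beta_for_prior_mean(alpha, Fraction(20, 100))  # prior experience: 20% failures

n, w = 100, 20                               # 20 failures among 100 screened patients
a_post, b_post = alpha + w, beta + (n - w)   # Beta(w + alpha, n - w + beta)
post_mean = a_post / (a_post + b_post)

print(f"beta = {beta}, posterior Beta({a_post}, {b_post}), mean = {float(post_mean):.2f}")
# beta = 8, posterior Beta(22, 88), mean = 0.20
```

The posterior mean stays at 0.20 because the prior mean and the observed failure fraction (20/100) coincide.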

3. Suppose that the number of customers arriving at a restaurant in a one-day period follows a Poisson distribution with unknown parameter λ. Previous records suggest that the prior probabilities of λ are P(λ = 10) = 0.4 and P(λ = 8) = 0.6. If 15 people arrive at the restaurant on a particular day, find the posterior mode of λ.
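Solution:
By the law of total probability,
$$P(X = 15) = P(X = 15 \mid \lambda = 10)P(\lambda = 10) + P(X = 15 \mid \lambda = 8)P(\lambda = 8) = \frac{e^{-10}10^{15}}{15!} \times 0.4 + \frac{e^{-8}8^{15}}{15!} \times 0.6$$
Now, by Bayes' theorem (the common factor 1/15! cancels),
$$P(\lambda = 10 \mid X = 15) = \frac{P(X = 15 \mid \lambda = 10)P(\lambda = 10)}{P(X = 15)} = \frac{e^{-10}10^{15} \times 0.4}{e^{-10}10^{15} \times 0.4 + e^{-8}8^{15} \times 0.6}$$
and
$$P(\lambda = 8 \mid X = 15) = \frac{P(X = 15 \mid \lambda = 8)P(\lambda = 8)}{P(X = 15)} = \frac{e^{-8}8^{15} \times 0.6}{e^{-10}10^{15} \times 0.4 + e^{-8}8^{15} \times 0.6}$$
Therefore,
$$\frac{P(\lambda = 10 \mid X = 15)}{P(\lambda = 8 \mid X = 15)} = \frac{e^{-10}10^{15} \times 0.4}{e^{-8}8^{15} \times 0.6} \approx 2.56$$
⇒ P(λ = 10 | X = 15) > P(λ = 8 | X = 15)

Hence, the posterior mode of λ is 10.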
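The same Bayes computation over a two-point prior can be sketched in Python. The sketch normalises prior × likelihood over the support {10, 8} and reads off the mode. The helper `poisson_pmf` is written out here rather than taken from a library.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


prior = {10: 0.4, 8: 0.6}   # P(lambda = 10) = 0.4, P(lambda = 8) = 0.6
x = 15                      # customers observed on the day

# Unnormalised posterior: likelihood * prior for each candidate lambda.
unnormalised = {lam: poisson_pmf(x, lam) * p for lam, p in prior.items()}
evidence = sum(unnormalised.values())            # P(X = 15)
posterior = {lam: u / evidence for lam, u in unnormalised.items()}

mode = max(posterior, key=posterior.get)
ratio = posterior[10] / posterior[8]
print(f"P(lambda=10 | X=15) = {posterior[10]:.3f}, "
      f"P(lambda=8 | X=15) = {posterior[8]:.3f}, ratio = {ratio:.2f}, mode = {mode}")
```

The ratio printed by the sketch is about 2.56, which agrees with the hand calculation. Normalising also yields the individual posterior probabilities, which the ratio alone does not give.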

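4. Consider a Bayesian estimation problem with $X_1, \ldots, X_n \overset{iid}{\sim} \text{Normal}(\mu, 1)$ and a Normal(0, 1) prior on µ. Letting $Y_n = \sum_{i=1}^{n} X_i$, the posterior mean is
a) $\dfrac{Y_n}{n}$
b) $\dfrac{Y_n}{n+1}$
c) $\dfrac{nY_n}{n+1}$
d) $\dfrac{nY_n}{n+2}$
Solution:
If $X_1, \ldots, X_n \overset{iid}{\sim} \text{Normal}(\mu, \sigma^2)$ and the prior is $\text{Normal}(\mu_0, \sigma_0^2)$, then
$$\text{posterior mean} = \bar{X}\,\frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2} + \mu_0\,\frac{\sigma^2}{n\sigma_0^2 + \sigma^2}.$$
Here σ² = 1, µ0 = 0, σ0² = 1, so
$$\text{posterior mean} = \frac{X_1 + \cdots + X_n}{n} \cdot \frac{n \times 1}{n \times 1 + 1} = \frac{Y_n}{n + 1},$$
which is option (b).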
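The general normal-normal formula can be checked against the answer Yn/(n + 1) with exact rational arithmetic. The data values in this sketch are hypothetical and used only for illustration. `normal_posterior_mean` is an illustrative helper name, and the `Fraction` denominator assumes integer or `Fraction` inputs.

```python
from fractions import Fraction


def normal_posterior_mean(xbar, n, sigma2, mu0, sigma0_2):
    """Posterior mean of mu: n iid Normal(mu, sigma2) draws, Normal(mu0, sigma0_2) prior."""
    denom = Fraction(n * sigma0_2 + sigma2)   # exact arithmetic for int/Fraction inputs
    return xbar * (n * sigma0_2) / denom + mu0 * sigma2 / denom


# Hypothetical observations, chosen only to illustrate the identity.
xs = [Fraction(v) for v in ("1.3", "-0.4", "2.5", "0.7")]
n = len(xs)
y_n = sum(xs)   # Y_n = X_1 + ... + X_n

post = normal_posterior_mean(y_n / n, n, sigma2=1, mu0=0, sigma0_2=1)
print(post, y_n / (n + 1), y_n / n)   # 41/50 41/50 41/40
```

The first two printed values agree, which matches option (b). The sample mean Yn/n (option a) differs, because the Normal(0, 1) prior shrinks the estimate toward 0.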
5. The marks of IITM students in the end-semester exam follow a normal distribution with unknown mean µ and variance 20. A random sample of the marks of 8 students is:
60, 60, 65, 65, 70, 70, 72, 75.
i) Assume that the prior distribution is Normal(50, 5) (mean 50, variance 5). Find the posterior mean of µ. Enter your answer correct to two decimal places.
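Solution:
$$\bar{X} = \frac{60 + 60 + 65 + 65 + 70 + 70 + 72 + 75}{8} = \frac{537}{8} = 67.125$$
Here $\bar{X} = 67.125$, σ² = 20, n = 8, µ0 = 50, σ0² = 5.
$$\text{posterior mean} = 67.125\left(\frac{8 \times 5}{8 \times 5 + 20}\right) + 50\left(\frac{20}{8 \times 5 + 20}\right) = 67.125\left(\frac{40}{60}\right) + 50\left(\frac{20}{60}\right) \approx 61.42$$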

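ii) Assume that the prior distribution is Normal(50, 25) (mean 50, variance 25). Find the posterior mean of µ. Enter your answer correct to two decimal places.
Solution:
$$\bar{X} = \frac{60 + 60 + 65 + 65 + 70 + 70 + 72 + 75}{8} = 67.125$$
Here $\bar{X} = 67.125$, σ² = 20, n = 8, µ0 = 50, σ0² = 25.
$$\text{posterior mean} = 67.125\left(\frac{8 \times 25}{8 \times 25 + 20}\right) + 50\left(\frac{20}{8 \times 25 + 20}\right) = 67.125\left(\frac{200}{220}\right) + 50\left(\frac{20}{220}\right) = \frac{14425}{220} \approx 65.57$$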
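Both parts of Question 5 can be reproduced with a small Python sketch of the weighted-average formula. `normal_posterior_mean` is an illustrative helper name, and `statistics.fmean` computes the sample mean.

```python
from statistics import fmean


def normal_posterior_mean(xbar, n, sigma2, mu0, sigma0_2):
    """Posterior mean of mu: n iid Normal(mu, sigma2) draws, Normal(mu0, sigma0_2) prior."""
    denom = n * sigma0_2 + sigma2
    # Weighted average of the sample mean and the prior mean.
    return xbar * (n * sigma0_2) / denom + mu0 * sigma2 / denom


marks = [60, 60, 65, 65, 70, 70, 72, 75]
xbar = fmean(marks)   # 67.125

for sigma0_2 in (5, 25):
    pm = normal_posterior_mean(xbar, len(marks), sigma2=20, mu0=50, sigma0_2=sigma0_2)
    print(f"prior Normal(50, {sigma0_2}): posterior mean = {pm:.2f}")
# prior Normal(50, 5): posterior mean = 61.42
# prior Normal(50, 25): posterior mean = 65.57
```

Raising the prior variance from 5 to 25 weakens the prior, so the posterior mean moves from 61.42 toward the sample mean 67.125.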

6. Suppose X is a discrete random variable taking values {1, 2, 3} with respective probabilities {p, 2(1 − p)/3, (1 − p)/3}, where 0 ≤ p ≤ 1 is a parameter. Consider the samples 1, 1, 3, 1, 3, 2, 1, 2, 3, 2 taken from X.
Use a Uniform[0, 1] prior on p to find the posterior mean. Enter your answer correct to two decimal places.
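Solution:
Let $f_p(p)$ denote the prior density of p. Since $p \sim \text{Uniform}[0, 1]$, we have $f_p(p) = 1$.
Now,
$$\text{posterior density} \propto P(X_1 = x_1, \ldots, X_n = x_n \mid p)\, f_p(p) = p^{n_1}\left(\frac{2(1-p)}{3}\right)^{n_2}\left(\frac{1-p}{3}\right)^{n_3}$$
$$\Rightarrow \text{posterior density} \propto p^{n_1}(1 - p)^{n_2 + n_3},$$
where $n_i$ denotes the number of times i appears in the sample. The factors $(2/3)^{n_2}(1/3)^{n_3}$ do not depend on p and are absorbed into the normalising constant.
Here n1 = 4, n2 = 3, n3 = 3
$$\Rightarrow \text{posterior density} \propto p^{4}(1 - p)^{3+3}$$
⇒ posterior density = Beta(5, 7)
$$\Rightarrow \text{posterior mean} = \frac{5}{5 + 7} = \frac{5}{12} \approx 0.42$$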
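The claim that the constant factors drop out can be checked numerically. This Python sketch integrates the full likelihood, constants included, against the Uniform[0, 1] prior using a midpoint rule and compares the result with the Beta(5, 7) mean. The grid size of 100,000 points is an arbitrary choice for illustration.

```python
from collections import Counter

samples = [1, 1, 3, 1, 3, 2, 1, 2, 3, 2]
counts = Counter(samples)
n1, n2, n3 = counts[1], counts[2], counts[3]   # 4, 3, 3


def likelihood(p):
    # Full likelihood, keeping the constant factors (2/3)^n2 and (1/3)^n3.
    return p**n1 * (2 * (1 - p) / 3) ** n2 * ((1 - p) / 3) ** n3


# Posterior mean under a Uniform[0, 1] prior via a midpoint-rule integral.
m = 100_000
grid = [(i + 0.5) / m for i in range(m)]
weights = [likelihood(p) for p in grid]
numeric_mean = sum(p * w for p, w in zip(grid, weights)) / sum(weights)

a, b = n1 + 1, n2 + n3 + 1   # conjugate result: Beta(5, 7)
print(f"Beta({a}, {b}) mean = {a / (a + b):.4f}, numerical mean = {numeric_mean:.4f}")
```

Both numbers come out at about 0.4167, so the constants indeed cancel in the normalisation.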

7. The following ten samples are taken from Geometric(p):
2, 4, 10, 8, 12, 6, 14, 6, 3, 5.
Find the posterior mean of p using a Uniform[0, 1] prior. Enter your answer correct to two decimal places.
Solution:
If $X_1, \ldots, X_n \overset{iid}{\sim} \text{Geometric}(p)$, with $P(X = x) = (1-p)^{x-1}p$ for x = 1, 2, ..., and the prior is Uniform[0, 1], then the posterior density is $\text{Beta}(n + 1, x_1 + x_2 + \cdots + x_n - n + 1)$.
Here n = 10, $x_1 + x_2 + \cdots + x_n = 2 + 4 + 10 + 8 + 12 + 6 + 14 + 6 + 3 + 5 = 70$
⇒ posterior density = Beta(11, 61)
$$\Rightarrow \text{posterior mean} = \frac{11}{11 + 61} = \frac{11}{72} \approx 0.15$$
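The update in Question 7 can be sketched in a few lines of Python. The sketch assumes, as the formula above does, that Geometric(p) counts the trials up to and including the first success. The helper name `geometric_uniform_posterior` is hypothetical.

```python
def geometric_uniform_posterior(samples):
    """Uniform[0, 1] = Beta(1, 1) prior + iid Geometric(p) data on {1, 2, ...}.

    Likelihood is p^n (1 - p)^(sum - n), so the posterior is Beta(n + 1, sum - n + 1).
    """
    n, total = len(samples), sum(samples)
    return n + 1, total - n + 1


a, b = geometric_uniform_posterior([2, 4, 10, 8, 12, 6, 14, 6, 3, 5])
post_mean = a / (a + b)
print(f"Beta({a}, {b}), mean = {post_mean:.2f}")   # Beta(11, 61), mean = 0.15
```

Under a geometric law starting at 0 (failures before the first success), the second parameter would change, so the support convention matters.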
8. Consider the samples 7, 5, 0, 2, 10, 4, 9, 8, 3 taken from Poisson(λ), where λ is unknown. Using a Gamma(4, 11) prior, find the posterior mean of λ. Enter your answer correct to one decimal place.
Solution:
If $X_1, \ldots, X_n \overset{iid}{\sim} \text{Poisson}(\lambda)$ and the prior is Gamma(α, β) (shape α, rate β), then the posterior density is $\text{Gamma}(x_1 + x_2 + \cdots + x_n + \alpha, \beta + n)$.
Here n = 9, $x_1 + x_2 + \cdots + x_n = 7 + 5 + 0 + 2 + 10 + 4 + 9 + 8 + 3 = 48$, α = 4, β = 11
⇒ posterior density = Gamma(52, 20)
$$\Rightarrow \text{posterior mean} = \frac{52}{20} = 2.6$$
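A minimal Python sketch of the Poisson-Gamma update follows, assuming the shape/rate parameterisation used in the solution (posterior mean = shape / rate). The helper name `poisson_gamma_posterior` is illustrative.

```python
def poisson_gamma_posterior(samples, alpha, beta):
    """Gamma(alpha, beta) prior (shape, rate) + iid Poisson counts -> posterior (shape, rate)."""
    return alpha + sum(samples), beta + len(samples)


a, b = poisson_gamma_posterior([7, 5, 0, 2, 10, 4, 9, 8, 3], alpha=4, beta=11)
print(f"Gamma({a}, {b}), mean = {a / b:.1f}")   # Gamma(52, 20), mean = 2.6
```

The prior acts like 11 extra observations with a total count of 4. This pulls the estimate well below the sample mean 48/9 ≈ 5.3.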
9. The number of defects per 10 meters of cloth produced by a weaving machine has the
Poisson distribution with mean λ. You examine 100 meters of cloth produced by the
machine and observe 61 defects. Your prior belief about λ is that it has mean 6 and
standard deviation 2. Use a Gamma(α, β) prior that matches your prior belief and find
the posterior distribution.
a) Gamma(70, 111.5)
b) Gamma(70, 11.5)
c) Gamma(61, 11.5)
d) Gamma(61, 13)
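Solution:
The prior is Gamma(α, β) with mean 6 and standard deviation 2 (variance 4):
$$\frac{\alpha}{\beta} = 6 \quad\text{and}\quad \frac{\alpha}{\beta^2} = 4 \implies \alpha = 6\beta \quad\text{and}\quad \alpha = 4\beta^2.$$
Solving these two equations gives α = 9 and β = 1.5.
Each observation covers 10 meters of cloth, so 100 meters gives n = 10 observations with $x_1 + x_2 + \cdots + x_n = 61$.
⇒ posterior distribution = Gamma(9 + 61, 1.5 + 10) = Gamma(70, 11.5), which is option (b).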
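The moment-matching step and the update can be sketched in Python. Dividing α/β = mean by α/β² = sd² gives β = mean/sd², and then α = mean · β. The helper name `gamma_from_moments` is hypothetical.

```python
def gamma_from_moments(mean, sd):
    """Shape/rate (alpha, beta) solving alpha/beta = mean and alpha/beta**2 = sd**2."""
    beta = mean / sd**2
    return mean * beta, beta


alpha, beta = gamma_from_moments(6, 2)   # (9.0, 1.5)

n, total = 10, 61   # ten 10-metre stretches, 61 defects in all
post = (alpha + total, beta + n)         # Poisson-Gamma update: (alpha + sum, beta + n)
print(post)                              # (70.0, 11.5)
```

The unit of observation matters here. Counting each 10-metre stretch as one Poisson draw gives n = 10. Treating the 100 metres as a single draw would give rate 1.5 + 1 instead.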

10. Assume that the time that elapses from one call to the next at a 911 call center has the exponential distribution with parameter λ. The times elapsed between ten consecutive calls (in minutes) are: 3, 4, 6, 1, 7, 8, 2, 5, 1. Your prior belief about λ is that it has mean 3.5 and standard deviation 1. Use a Gamma(α, β) prior that matches your prior belief and find the posterior mean. Enter your answer correct to two decimal places.
Solution:
The prior is Gamma(α, β) with mean 3.5 and standard deviation 1 (variance 1):
$$\frac{\alpha}{\beta} = 3.5 \quad\text{and}\quad \frac{\alpha}{\beta^2} = 1 \implies \alpha = 3.5\beta \quad\text{and}\quad \alpha = \beta^2.$$
Solving these two equations gives α = 12.25 and β = 3.5.
If $X_1, \ldots, X_n \overset{iid}{\sim} \text{Exponential}(\lambda)$ and the prior is Gamma(α, β), then the posterior density is $\text{Gamma}(n + \alpha, \beta + x_1 + x_2 + \cdots + x_n)$.
Here n = 9 (ten calls give nine gaps), $x_1 + x_2 + \cdots + x_n = 3 + 4 + 6 + 1 + 7 + 8 + 2 + 5 + 1 = 37$, α = 12.25, β = 3.5
⇒ posterior density = Gamma(21.25, 40.5)
$$\Rightarrow \text{posterior mean} = \frac{21.25}{40.5} \approx 0.52$$
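The Exponential-Gamma version can be sketched the same way in Python. The helper name `gamma_from_moments` is hypothetical, and the rate parameterisation of the Gamma prior is assumed.

```python
def gamma_from_moments(mean, sd):
    """Shape/rate (alpha, beta) solving alpha/beta = mean and alpha/beta**2 = sd**2."""
    beta = mean / sd**2
    return mean * beta, beta


alpha, beta = gamma_from_moments(3.5, 1)   # (12.25, 3.5)
gaps = [3, 4, 6, 1, 7, 8, 2, 5, 1]         # nine gaps between ten calls (minutes)

# Exponential(lambda) likelihood is lambda^n * exp(-lambda * sum(gaps)),
# so a Gamma(alpha, beta) prior updates to Gamma(alpha + n, beta + sum(gaps)).
a_post, b_post = alpha + len(gaps), beta + sum(gaps)
print(f"Gamma({a_post}, {b_post}), mean = {a_post / b_post:.2f}")   # Gamma(21.25, 40.5), mean = 0.52
```

Compared with the Poisson case, the roles are swapped. The number of observations is added to the shape, and the summed data is added to the rate.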

11. The frequency data on the number of deaths per month due to a certain disease are given below:

No. of deaths per month    Frequency
0                          224
1                          102
2                          23
3                          5
4                          1
5+                         0

Table 9.1.P

(i) Fit a Poisson distribution to the given frequency table and find the parameter.
Write your answer correct to two decimal places.
Solution:
Let $\hat{\lambda} = \bar{X}$ be the estimate of λ.
$$\bar{X} = \frac{\sum_i n_i f_i}{\sum_i f_i} = \frac{(0 \times 224) + (1 \times 102) + (2 \times 23) + (3 \times 5) + (4 \times 1)}{224 + 102 + 23 + 5 + 1} = \frac{167}{355} \approx 0.47$$
Therefore, $\hat{\lambda} = 0.47$, and the fitted distribution is Poisson(0.47).

n     Frequency   Poisson fit (expected count)
0     224         355 × e^(−0.47) ≈ 221.88
1     102         355 × e^(−0.47) (0.47)/1! ≈ 104.28
2     23          355 × e^(−0.47) (0.47)²/2! ≈ 24.51
3     5           355 × e^(−0.47) (0.47)³/3! ≈ 3.84
4     1           355 × e^(−0.47) (0.47)⁴/4! ≈ 0.45
5+    0           355 × P(X ≥ 5) ≈ 0.05

As the table shows, the observed counts are close to the expected counts, so Poisson(0.47) is a reasonable fit for the given data.
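The fit table can be generated with a short Python sketch. Following the solution, the sketch rounds λ̂ to 0.47 before computing expected counts. It fills the "5+" row as N · P(X ≥ 5), obtained as N minus the other expected counts.

```python
import math

counts = {0: 224, 1: 102, 2: 23, 3: 5, 4: 1}   # the "5+" row has frequency 0
N = sum(counts.values())                        # 355
lam_hat = sum(k * f for k, f in counts.items()) / N   # 167/355
lam = round(lam_hat, 2)                         # 0.47, as in the solution


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


expected = {k: N * poisson_pmf(k, lam) for k in counts}
expected_5plus = N - sum(expected.values())     # N * P(X >= 5)

for k, f in counts.items():
    print(f"{k:<4}{f:>6}{expected[k]:>10.2f}")
print(f"{'5+':<4}{0:>6}{expected_5plus:>10.2f}")
```

Taking the tail row as N minus the other expected counts ensures the expected column sums to N, as the observed column does.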
(ii) Find an approximate 95% confidence interval using a normal approximation for the sampling distribution.
(Use the following information: sample variance S² = 0.498 and P(−0.07 < N(0, 0.0014) < 0.07) = 0.95)
(a) [0.40, 0.47]
(b) [0.40, 0.54]
(c) [0.44, 0.54]
(d) [0.44, 0.52]

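Solution:
Error: $\hat{\lambda} - \lambda$
$$E[\hat{\lambda} - \lambda] = 0, \qquad \text{Var}(\hat{\lambda} - \lambda) = \text{Var}(\hat{\lambda}) = \frac{\sigma^2}{n} \approx \frac{S^2}{n}$$
Therefore, we take the sampling distribution of the error to be approximately $\text{Normal}\left(0, \frac{S^2}{n}\right)$.
Given the sample variance S² = 0.498 and n = 355, the sampling distribution is Normal(0, 0.498/355) ≈ Normal(0, 0.0014).
If $P(\delta_2 < \hat{\lambda} - \lambda < \delta_1) = 0.95$, then the 95% confidence interval for λ is $[\hat{\lambda} - \delta_1, \hat{\lambda} - \delta_2]$.
It is given that P(−0.07 < N(0, 0.0014) < 0.07) = 0.95, therefore,
$$\delta_1 = 0.07 \quad\text{and}\quad \delta_2 = -0.07$$
Hence the 95% confidence interval for λ is [0.47 − 0.07, 0.47 + 0.07] = [0.40, 0.54], which is option (b).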
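The normal-approximation interval can be computed directly from the frequency table. This Python sketch computes the sample variance with the n − 1 divisor and takes z from `statistics.NormalDist().inv_cdf`. Unlike the hand calculation, it does not round δ to 0.07. The helper name `poisson_mean_ci` is hypothetical.

```python
from statistics import NormalDist


def poisson_mean_ci(counts, level=0.95):
    """Normal-approximation CI for a Poisson mean from a frequency table {value: frequency}."""
    n = sum(counts.values())
    mean = sum(k * f for k, f in counts.items()) / n
    # Sample variance S^2 with the n - 1 divisor.
    s2 = sum(f * (k - mean) ** 2 for k, f in counts.items()) / (n - 1)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    delta = z * (s2 / n) ** 0.5                 # half-width from Normal(0, S^2 / n)
    return mean, s2, (mean - delta, mean + delta)


mean, s2, (lo, hi) = poisson_mean_ci({0: 224, 1: 102, 2: 23, 3: 5, 4: 1})
print(f"mean = {mean:.2f}, S^2 = {s2:.3f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

This reproduces S² ≈ 0.498 and the interval [0.40, 0.54]. Passing the frequencies of Table 9.2.P instead gives [2.68, 3.36].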

12. The number of emails received by Neeti in intervals of one hour is given in Table 9.2.P.

No. of emails per hour    Frequency
0                         5
1                         15
2                         22
3                         22
4                         17
5                         10
6                         5
7                         3
8                         1
9+                        0

Table 9.2.P: Emails received by Neeti in one-hour interval for the last 100 hours.

(i) Fit a Poisson distribution to the given frequency table and find the parameter.
Write your answer correct to two decimal places.

Solution:
Let $\hat{\lambda} = \bar{X}$ be the estimate of λ.
$$\bar{X} = \frac{\sum_i n_i f_i}{\sum_i f_i} = \frac{(1 \times 15) + (2 \times 22) + (3 \times 22) + (4 \times 17) + (5 \times 10) + (6 \times 5) + (7 \times 3) + (8 \times 1)}{5 + 15 + 22 + 22 + 17 + 10 + 5 + 3 + 1} = \frac{302}{100} = 3.02$$
Therefore, $\hat{\lambda} = 3.02$, and the fitted distribution is Poisson(3.02).
We can check the fit the same way we did in the previous question.
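The fit check for Table 9.2.P can be sketched in Python in the same way as for Table 9.1.P. The helper name `poisson_fit_table` is hypothetical. The tail row is taken as N minus the other expected counts, which equals N · P(X ≥ 9).

```python
import math


def poisson_fit_table(counts, lam):
    """Expected counts N * P(X = k) for each listed k, plus N * P(X > max k) as a tail row."""
    n_obs = sum(counts.values())
    expected = {k: n_obs * math.exp(-lam) * lam**k / math.factorial(k) for k in counts}
    tail = n_obs - sum(expected.values())
    return expected, tail


# Table 9.2.P; the "9+" row has frequency 0 and is covered by the tail.
emails = {0: 5, 1: 15, 2: 22, 3: 22, 4: 17, 5: 10, 6: 5, 7: 3, 8: 1}
expected, tail = poisson_fit_table(emails, lam=3.02)

for k, f in emails.items():
    print(f"{k:<4}{f:>5}{expected[k]:>9.2f}")
print(f"{'9+':<4}{0:>5}{tail:>9.2f}")
```

As with the previous question, comparing the two columns row by row shows whether Poisson(3.02) describes the counts reasonably well.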

(ii) Find an approximate 95% confidence interval using a normal approximation for the sampling distribution.
(Use the following information: sample variance S² = 3.05 and P(−0.34 < N(0, 0.0305) < 0.34) = 0.95)
(a) [1.89, 4.15]
(b) [2.08, 4.34]
(c) [2.68, 3.36]
(d) [1.89, 3.35]

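Solution:
Error: $\hat{\lambda} - \lambda$
$$E[\hat{\lambda} - \lambda] = 0, \qquad \text{Var}(\hat{\lambda} - \lambda) = \text{Var}(\hat{\lambda}) = \frac{\sigma^2}{n} \approx \frac{S^2}{n}$$
Therefore, we take the sampling distribution of the error to be approximately $\text{Normal}\left(0, \frac{S^2}{n}\right)$.
Given the sample variance S² = 3.05 and n = 100, the sampling distribution is Normal(0, 3.05/100) = Normal(0, 0.0305).
If $P(\delta_2 < \hat{\lambda} - \lambda < \delta_1) = 0.95$, then the 95% confidence interval for λ is $[\hat{\lambda} - \delta_1, \hat{\lambda} - \delta_2]$.
It is given that P(−0.34 < N(0, 0.0305) < 0.34) = 0.95, therefore,
$$\delta_1 = 0.34 \quad\text{and}\quad \delta_2 = -0.34$$
Hence the 95% confidence interval for λ is [3.02 − 0.34, 3.02 + 0.34] = [2.68, 3.36], which is option (c).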
