PSet6 Solutions
1. Suppose X is drawn from Uniform[0, T ]. The parameter T can be 1 or 2 or 3—these are the only
possibilities.
(a) Consider the following procedure for estimating T based on observing the data X:
• If we observe X ≤ 1, we estimate that T is 1
• If we observe 1 < X ≤ 2, we estimate that T is 2
• If we observe X > 2, we estimate that T is 3
For each possible value of T , compute the probability that we estimate T correctly or incorrectly.
Solution:
• If T = 1, then we are correct with probability 1, and incorrect with probability 0
• If T = 2, then we are correct with probability 1/2, and incorrect with probability 1/2
• If T = 3, then we are correct with probability 1/3, and incorrect with probability 2/3
(b) Consider a Bayesian framework where we place a prior on T — we assume it’s equally likely to
be 1 or 2 or 3. (In other words, T ∼ Uniform{1, 2, 3}, the uniform distribution over a finite set.)
Compute the posterior distribution of T , given the observed data X. As in the lecture, you can
assume that it is okay to combine densities and PMFs for this setting where X is continuous while
T is discrete. The final form of your answer should be very simple — your final answer should give
simple numerical values, without summation notation or anything like that, but you will need to
split into cases.
Solution: Combining the prior pT(t) = 1/3 with the density pX|T(x | t) = (1/t) · 1{x ≤ t}, Bayes' rule gives

    pT|X(t | x) = ( (1/t) · 1{x ≤ t} ) / ( Σ_{t′=1,2,3} (1/t′) · 1{x ≤ t′} ).

Splitting into cases:

• If 0 ≤ x ≤ 1, then T has the posterior PMF

    t             1      2      3
    pT|X(t | x)   6/11   3/11   2/11

  since the denominator is 1 + 1/2 + 1/3 = 11/6.

• If 1 < x ≤ 2, then T has the posterior PMF

    t             1      2      3
    pT|X(t | x)   0      3/5    2/5

  since (1/2)/(1/2 + 1/3) = 3/5 and (1/3)/(1/2 + 1/3) = 2/5.

• If 2 < x ≤ 3, then T has the posterior PMF

    t             1      2      3
    pT|X(t | x)   0      0      1
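As a sanity check on the 1 < x ≤ 2 case, a short simulation can draw (T, X) from the joint model and tabulate T among the draws that land in (1, 2] (a quick sketch; the variable names are our own):

```python
import random

# Monte Carlo check of the 1 < x <= 2 case: draw (T, X) from the joint model,
# keep the draws with 1 < X <= 2, and tabulate T among the kept draws.
random.seed(0)
counts = {1: 0, 2: 0, 3: 0}
kept = 0
for _ in range(200_000):
    t = random.choice([1, 2, 3])   # prior: T ~ Uniform{1, 2, 3}
    x = random.uniform(0, t)       # data: X | T = t ~ Uniform[0, t]
    if 1 < x <= 2:
        counts[t] += 1
        kept += 1

# Empirical posterior; should be close to (0, 3/5, 2/5).
print(counts[1] / kept, counts[2] / kept, counts[3] / kept)
```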
2. Rejection sampling for the Geometric distribution.
(a) Calculate the rejection sampling procedure that we would use if we have access to draws from the
Geometric(0.5) distribution, and would like to simulate draws from the Geometric(0.6) distribu-
tion.
Solution: We have access to the PMF q(x) = 0.5^x, supported on x = 1, 2, 3, . . . , and would like
to sample from the target PMF p∗(x) = 0.6 · 0.4^(x−1). We calculate

    p∗(x)/q(x) = (0.6 · 0.4^(x−1)) / 0.5^x = 1.2 · 0.8^(x−1),

which is largest at x = 1, so we can take C = 1.2. The procedure is then: draw X ∼ Geometric(0.5), and accept it with probability p∗(X)/(C · q(X)) = 0.8^(X−1); otherwise reject, redraw, and repeat.
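This procedure (propose from Geometric(0.5) with C = 1.2, accept with probability 0.8^(X−1)) can be sketched in a few lines of Python; the function names below are our own:

```python
import random

def geometric(p, rng):
    """Number of Bernoulli(p) trials until the first success (support 1, 2, 3, ...)."""
    x = 1
    while rng.random() >= p:
        x += 1
    return x

def geometric_06(rng):
    """Target Geometric(0.6) via rejection from Geometric(0.5):
    C = 1.2, so a proposal X is accepted with probability 0.8**(X - 1)."""
    while True:
        x = geometric(0.5, rng)
        if rng.random() < 0.8 ** (x - 1):
            return x

rng = random.Random(0)
draws = [geometric_06(rng) for _ in range(50_000)]
print(sum(draws) / len(draws))  # should be near the Geometric(0.6) mean 1/0.6 ≈ 1.667
```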
(b) What goes wrong if we instead try to simulate draws from the Geometric(0.4) distribution (again
assuming that Geometric(0.5) is the distribution that we can draw samples from)?
Solution: We would not be able to run rejection sampling because the ratio is not finite:
    max_{x=1,2,3,...} p∗(x)/q(x) = max_{x=1,2,3,...} (0.4 · 0.6^(x−1)) / 0.5^x = max_{x=1,2,3,...} (2/3) · 1.2^x = ∞.
[Figure: two density plots on [0, 1], titled "Distribution A" (left) and "Distribution B" (right); the density axis runs from 0 to 5 in each panel.]
3. Consider two settings:
(1) You have access to samples from Distribution A, and you use rejection sampling to produce
samples from Distribution B.
(2) You have access to samples from Distribution B, and you use rejection sampling to produce
samples from Distribution A.
Which of these two implementations of rejection sampling will be more efficient, and which will be less
efficient? Explain your answer thoroughly. You may use pictures to help explain your solution, but a
picture alone without an explanation is not sufficient.
Solution: Here is what rejection sampling looks like in scenario (1) (on the left) and (2) (on the
right):
[Figure: two panels showing rejection sampling in scenario (1) (left) and scenario (2) (right); in each panel the scaled proposal density C·q(x) (larger, light blue) lies above the target density (lower, dark blue).]
On the left, we use the available distribution q(x) coming from the Distribution A plot, and would
like samples from density h(x) = p∗ (x) from Distribution B. This means that we need to find some
constant C such that C ≥ p∗ (x)/q(x) for all x. We can see that we will have C ≈ 1.25 by looking at
values x ≈ 0.5 in the plot. So we have access to samples from the larger (light blue) histogram, and
discard samples to reduce down to the lower (dark blue) histogram; we keep around 1/1.25 = 80% of the
samples, which is quite efficient (see the figure on the left in the solutions).
On the right, we use the available distribution q(x) coming from the Distribution B plot, and would
like samples from density h(x) = p∗ (x) from Distribution A. This means that we need to find some
constant C such that C ≥ p∗ (x)/q(x) for all x; we can see we have C ≈ 3.25 by looking at values x ≈ 0
(i.e., due to the spike of the Distribution A density around zero). This makes the sampling much less
efficient; most of the samples from the larger (light blue) histogram are discarded in order to reduce
down to the lower (dark blue) histogram, since we keep only around 1/3.25 ≈ 31% of the samples (see
the figure on the right, in the solutions).
So, (1) is more efficient, and (2) is less efficient.
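Distributions A and B are only given as plots, but the key fact here, namely that the long-run acceptance rate of rejection sampling equals 1/C, can be illustrated with two stand-in densities on [0, 1] whose ratio is bounded by a small constant in one direction and a large constant in the other (all densities and names below are our own choices, not the plotted A and B):

```python
import random

# Stand-in linear densities on [0, 1] (our own choices):
# pA is largest exactly where pB is smallest, so the two directions differ.
pA = lambda x: 0.2 + 1.6 * x   # ranges from 0.2 up to 1.8
pB = lambda x: 0.8 + 0.4 * x   # ranges from 0.8 up to 1.2

def draw(density, bound, rng):
    """Sample from `density` on [0, 1] by rejection from Uniform[0, 1]."""
    while True:
        x = rng.random()
        if rng.random() < density(x) / bound:
            return x

def acceptance_rate(target, proposal, bound, C, n, rng):
    """Run rejection sampling until n draws are accepted; return the acceptance rate."""
    accepted, tries = 0, 0
    while accepted < n:
        x = draw(proposal, bound, rng)
        tries += 1
        if rng.random() < target(x) / (C * proposal(x)):
            accepted += 1
    return accepted / tries

rng = random.Random(1)
# Direction 1: propose from pB, target pA; C = max pA/pB = 1.8/1.2 = 1.5.
rate1 = acceptance_rate(pA, pB, 1.2, 1.5, 10_000, rng)
# Direction 2: propose from pA, target pB; C = max pB/pA = 0.8/0.2 = 4.
rate2 = acceptance_rate(pB, pA, 1.8, 4.0, 10_000, rng)
print(rate1, rate2)  # close to 1/C in each direction: about 0.67 and 0.25
```

The same asymmetry drives the answer above: the direction whose constant C is larger wastes a larger fraction of the proposals.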
4. In this problem you will construct a loose confidence interval for Binomial data. This interval won’t be
optimal—there are techniques to compute a more narrow interval, which will be covered later on—but
the goal is to illustrate the idea behind confidence intervals rather than to compute the best possible
answer.
Suppose that X ∼ Binomial(n, p) (e.g., the total number of Heads, if a probability-p coin is flipped n
times). We will write p̂ = X/n, the proportion of Heads in the observed data. We know that E(X) = np
and Var(X) = np(1 − p).
(a) Use Chebyshev's inequality to find an upper bound on P(|p̂ − p| ≥ ϵ), where ϵ > 0 is some small constant. Your upper bound will depend on the unknown p. However,
by calculating max_{p∈[0,1]} {p(1 − p)}, you can construct a looser upper bound that doesn't depend
on p. So, your final answer should be of the form P(|p̂ − p| ≥ ϵ) ≤ (an expression depending only on n and ϵ).

Solution: We know that E(p̂) = p and Var(p̂) = p(1 − p)/n ≤ 1/(4n) (since max_{p∈[0,1]} {p(1 − p)} = 1/4),
so by Chebyshev's inequality,

    P(|p̂ − p| ≥ ϵ) ≤ Var(p̂)/ϵ² ≤ 1/(4nϵ²).
(b) Next, for some desired error level α ∈ (0, 1), find a value for ϵ so that the probability above is
≤ α. Your value of ϵ should depend on n and α but not on p. Once you’ve computed this, you
will have a statement of the form P(|p̂ − p| ≥ ϵ) ≤ α.

Solution: Setting 1/(4nϵ²) = α and solving for ϵ gives ϵ = 1/√(4nα), so that, calculating probability with respect to a draw of the random variable X, it holds that

    P(|p̂ − p| ≥ 1/√(4nα)) ≤ α.

So, the interval is

    ( p̂ − 1/√(4nα), p̂ + 1/√(4nα) ).
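As a sanity check, a short simulation can estimate the coverage of the interval p̂ ± 1/√(4nα): by construction it should be at least 1 − α, and since Chebyshev is a loose bound, it is typically much higher. The function name and parameter values below are our own choices:

```python
import math
import random

def coverage(n, p, alpha, trials=10_000, seed=0):
    """Fraction of simulated datasets whose interval p_hat ± 1/sqrt(4*n*alpha) contains p."""
    rng = random.Random(seed)
    eps = 1 / math.sqrt(4 * n * alpha)
    hits = 0
    for _ in range(trials):
        x = sum(rng.random() < p for _ in range(n))   # one Binomial(n, p) draw
        p_hat = x / n
        hits += (p_hat - eps <= p <= p_hat + eps)
    return hits / trials

print(coverage(n=100, p=0.3, alpha=0.1))  # should be at least 1 - alpha = 0.9
```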
5. For each part of this problem, use the CLT to approximate the probability you need to compute. You
can ignore the issue of continuity corrections for this problem.
(a) There are two games to play at a fair. For the first game, in each round of the game you roll a
fair die, and win $4 if you roll a 6 or lose $1 otherwise. Suppose you play this first game 36 times.
What is the approximate probability that in the end you are ahead (i.e., your total earnings are
positive)?
Solution: Let D1 be the dollars won when playing the dice game once. Then

    E(D1) = 4 · 1/6 − 1 · 5/6 = −1/6,   Var(D1) = E(D1²) − E(D1)² = 16 · 1/6 + 1 · 5/6 − 1/36 = 125/36.
We can write our total earnings from the dice game as Dtotal = D1 + · · · + D36
where each Di is the earnings from game #i. Applying the CLT, we see that the distribution of
Dtotal is approximately
N (36 · −1/6, 36 · 125/36) = N (−6, 125).
Then
    P(Dtotal > 0) = P( (Dtotal − (−6))/√125 > (0 − (−6))/√125 ) = P( (Dtotal + 6)/√125 > 0.53 ) ≈ 1 − Φ(0.53) = 0.298.
(b) For the second game, in each round of the game you throw a football, and the money you win
is equal to 0.1(F − 20), where F (in feet) is the distance that you threw the ball. Assume that
F follows an Exponential(0.1) distribution. Suppose you play the second game 50 times. What
is the approximate probability that you lose no more than $30 in total when playing the second
game?
Solution: Let B1 be the dollars won when playing the ball game once. Then

    E(B1) = 0.1(E(F) − 20) = 0.1(1/0.1 − 20) = −1,   Var(B1) = (0.1)² Var(F) = (0.1)²/(0.1)² = 1.
We can write our total earnings from the ball game as
Btotal = B1 + · · · + B50
where each Bi is the earnings from game #i. Applying the CLT, we see that the distribution of
Btotal is approximately
N (50 · −1, 50 · 1) = N (−50, 50).
Then
    P(Btotal > −30) = P( (Btotal − (−50))/√50 > (−30 − (−50))/√50 ) = P( (Btotal + 50)/√50 > 2.828 ) ≈ 1 − Φ(2.828) = 0.0023.
(c) Now combine all your games—you play the first game 36 times and then the second game 50
times. What is the probability of losing less than $72 in total?
Solution: We can write our total earnings from both games as T = Dtotal + Btotal.
From the work above we know that Dtotal ≈ N (−6, 125) in distribution, and Btotal ≈ N (−50, 50)
in distribution. Furthermore, adding two independent normal random variables yields a nor-
mal random variable. Therefore, since Dtotal and Btotal are independent and are approximately
normal, their sum is approximately normally distributed:
T ≈ N (−56, 175).
Hence
    P(T > −72) = P( (T − (−56))/√175 > (−72 − (−56))/√175 ) = P( (T + 56)/√175 > −1.21 ) ≈ 1 − Φ(−1.21) = 0.887.
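The three normal-approximation numbers above can be reproduced by writing the standard normal CDF Φ in terms of the error function (the helper name `Phi` is our own). Part (a) comes out as ≈ 0.296 here, slightly different from 0.298, because we avoid rounding the z-value to 0.53 first:

```python
import math

def Phi(z):
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# (a) Dtotal is approximately N(-6, 125):  P(Dtotal > 0)
pa = 1 - Phi((0 - (-6)) / math.sqrt(125))
# (b) Btotal is approximately N(-50, 50):  P(Btotal > -30)
pb = 1 - Phi((-30 - (-50)) / math.sqrt(50))
# (c) T is approximately N(-56, 175):      P(T > -72)
pc = 1 - Phi((-72 - (-56)) / math.sqrt(175))

print(round(pa, 3), round(pb, 4), round(pc, 3))  # ≈ 0.296, 0.0023, 0.887
```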
6. Let X ∼ Binomial(60, 0.22) and let Y = X/60 be the proportion of successes in the sample.
(a) What is the normal distribution that approximates the distribution of Y ?
Solution: We can write X = X1 + · · · + X60, where Xi is the indicator variable for success on the ith
trial. For each individual Xi we calculate mean µ = 0.22 and variance σ² = 0.22(1 − 0.22) = 0.1716.
So we have
    E(Y) = E(X̄) = µ = 0.22

and

    Var(Y) = Var(X̄) = σ²/n = 0.1716/60 = 0.00286.
By the CLT, Y ’s distribution is approximately N(0.22, 0.00286).
(b) Calculate (approximately) the probability P(Y ≤ 0.25) (you can ignore issues of continuity cor-
rections etc). (To obtain values of Φ(x), the CDF of the normal distribution, you can use Table
2 in the back of your book or just search online for “standard normal table”. Or, if you have R,
you can use the command pnorm.)
Solution:
    P(Y ≤ 0.25) = P( (Y − 0.22)/√0.00286 ≤ (0.25 − 0.22)/√0.00286 ) = P( (Y − 0.22)/√0.00286 ≤ 0.561 ) ≈ Φ(0.561) ≈ 0.71.
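Since n = 60 is small, the CLT answer can also be compared against the exact Binomial CDF P(X ≤ 15) (note 0.25 · 60 = 15). The sketch below uses `math.erf` for Φ and `math.comb` for the exact sum; the exact value comes out somewhat higher, largely because we ignored the continuity correction:

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 60, 0.22
z = (0.25 - p) / math.sqrt(p * (1 - p) / n)   # standardized cutoff, about 0.561
approx = Phi(z)                               # normal approximation, about 0.71

# Exact Binomial CDF P(X <= 15):
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(16))

print(round(approx, 2), round(exact, 2))
```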