
Stat 244 Winter 2024 — Problem set 6

Due on Gradescope, Thursday Feb 15 (9:30am)

1. Suppose X is drawn from Uniform[0, T ]. The parameter T can be 1 or 2 or 3—these are the only
possibilities.
(a) Consider the following procedure for estimating T based on observing the data X:
• If we observe X ≤ 1, we estimate that T is 1
• If we observe 1 < X ≤ 2, we estimate that T is 2
• If we observe X > 2, we estimate that T is 3
For each possible value of T , compute the probability that we estimate T correctly or incorrectly.

Solution:
• If T = 1, then we are correct with probability 1, and incorrect with probability 0
• If T = 2, then we are correct with probability 1/2, and incorrect with probability 1/2
• If T = 3, then we are correct with probability 1/3, and incorrect with probability 2/3
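These three probabilities can be checked with a quick Monte Carlo simulation. The sketch below is ours (sample size and seed are arbitrary); it implements the decision rule from part (a) and estimates the probability of a correct answer for each T:

```python
import random
random.seed(0)

def estimate(x):
    # the decision rule from part (a)
    if x <= 1:
        return 1
    if x <= 2:
        return 2
    return 3

n = 200_000
rates = {}
for T in (1, 2, 3):
    correct = sum(estimate(random.uniform(0, T)) == T for _ in range(n))
    rates[T] = correct / n
print(rates)  # close to {1: 1.0, 2: 0.5, 3: 0.333...}
```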

(b) Consider a Bayesian framework where we place a prior on T — we assume it’s equally likely to
be 1 or 2 or 3. (In other words, T ∼ Uniform{1, 2, 3}, the uniform distribution over a finite set.)
Compute the posterior distribution of T , given the observed data X. As in the lecture, you can
assume that it is okay to combine densities and PMFs for this setting where X is continuous while
T is discrete. The final form of your answer should be very simple — your final answer should give
simple numerical values, without summation notation or anything like that, but you will need to
split into cases.

Solution: The likelihood for X given T is given by the conditional density

    f_{X|T}(x | t) = (1/t) · 1{x ≤ t},

supported on x ∈ [0, 3]. Then the posterior for T is

    p_{T|X}(t | x) = (1/t) · 1{x ≤ t} / Σ_{t′ = 1,2,3} (1/t′) · 1{x ≤ t′}.

We can simplify this as follows:
• If 0 ≤ x ≤ 1, then all three terms survive and the denominator is 1 + 1/2 + 1/3 = 11/6, so T has the posterior PMF

    p_{T|X}(1 | x) = 1 / (11/6) = 6/11,   p_{T|X}(2 | x) = (1/2) / (11/6) = 3/11,   p_{T|X}(3 | x) = (1/3) / (11/6) = 2/11.

• If 1 < x ≤ 2, then the t = 1 term drops out and the denominator is 1/2 + 1/3 = 5/6, so

    p_{T|X}(1 | x) = 0,   p_{T|X}(2 | x) = (1/2) / (5/6) = 3/5,   p_{T|X}(3 | x) = (1/3) / (5/6) = 2/5.

• If 2 < x ≤ 3, then only t = 3 remains, so

    p_{T|X}(1 | x) = 0,   p_{T|X}(2 | x) = 0,   p_{T|X}(3 | x) = 1.
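As a sanity check, the posterior can be computed with exact rational arithmetic. This is a sketch of ours (the function name is our own choice), directly encoding the uniform prior and the likelihood (1/t) · 1{x ≤ t}:

```python
from fractions import Fraction

def posterior(x):
    # uniform prior on {1, 2, 3}; the posterior weight for t is
    # proportional to the likelihood (1/t) when x <= t, and 0 otherwise
    weights = {t: Fraction(1, t) if x <= t else Fraction(0) for t in (1, 2, 3)}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

print(posterior(0.5))  # weights 1, 1/2, 1/3 -> 6/11, 3/11, 2/11
print(posterior(1.5))  # weights 0, 1/2, 1/3 -> 0, 3/5, 2/5
print(posterior(2.5))  # only t = 3 survives -> 0, 0, 1
```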

2. Rejection sampling for the Geometric distribution.
(a) Calculate the rejection sampling procedure that we would use if we have access to draws from the
Geometric(0.5) distribution, and would like to simulate draws from the Geometric(0.6) distribu-
tion.

Solution: We have access to the proposal PMF q(x) = 0.5^x, supported on x = 1, 2, 3, . . . , and would like to sample from the target PMF p*(x) = 0.6 · 0.4^(x−1). We calculate

    C = max_{x=1,2,3,...} p*(x)/q(x) = max_{x=1,2,3,...} (0.6 · 0.4^(x−1)) / 0.5^x = max_{x=1,2,3,...} 1.5 · 0.8^x = 1.5 · 0.8 = 1.2,

where the maximum is attained at x = 1 since 0.8^x is decreasing in x. So, our acceptance probability function is

    a(x) = p*(x) / (C · q(x)) = (0.6 · 0.4^(x−1)) / (1.2 · 0.5^x) = 1.25 · 0.8^x.
The procedure is:
• Sample X ∼ Geometric(0.5) (e.g., by flipping a fair coin and counting how many times we
need to flip to reach the first Heads)
• Accept X with probability 1.25 · 0.8X , otherwise discard the sample.
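The procedure above can be sketched and checked empirically. In this sketch of ours (sample size and seed arbitrary), the accepted draws should behave like Geometric(0.6), with mean 1/0.6 ≈ 1.667 and P(X = 1) = 0.6:

```python
import random
random.seed(0)

def geometric(p):
    # count flips up to and including the first Heads
    k = 1
    while random.random() >= p:
        k += 1
    return k

def sample_geom06():
    # propose from Geometric(0.5), accept with probability a(x) = 1.25 * 0.8**x
    while True:
        x = geometric(0.5)
        if random.random() < 1.25 * 0.8 ** x:
            return x

n = 100_000
draws = [sample_geom06() for _ in range(n)]
print(sum(draws) / n)      # mean, near 1/0.6
print(draws.count(1) / n)  # P(X = 1), near 0.6
```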

(b) What goes wrong if we instead try to simulate draws from the Geometric(0.4) distribution (again
assuming that Geometric(0.5) is the distribution that we can draw samples from)?

Solution: We would not be able to run rejection sampling because the ratio is not finite:

    max_{x=1,2,3,...} p*(x)/q(x) = max_{x=1,2,3,...} (0.4 · 0.6^(x−1)) / 0.5^x = max_{x=1,2,3,...} (2/3) · 1.2^x = ∞.
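A quick numerical check of this: the ratio p*(x)/q(x) = (2/3) · 1.2^x grows without bound as x increases, so no finite constant C can dominate it.

```python
# p*(x)/q(x) = (0.4 * 0.6**(x-1)) / 0.5**x = (2/3) * 1.2**x, increasing in x
ratios = [(0.4 * 0.6 ** (x - 1)) / 0.5 ** x for x in (1, 10, 50)]
print(ratios)  # strictly increasing, eventually huge
```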

3. Consider the following two density functions:

[Figure: side-by-side plots of the densities of Distribution A and Distribution B on x ∈ [0, 1], vertical axis "density" from 0 to 5; Distribution A has a sharp spike near x = 0.]

Consider two settings:
(1) You have access to samples from Distribution A, and you use rejection sampling to produce
samples from Distribution B.
(2) You have access to samples from Distribution B, and you use rejection sampling to produce
samples from Distribution A.
Which of these two implementations of rejection sampling will be more efficient, and which will be less
efficient? Explain your answer thoroughly. You may use pictures to help explain your solution, but a
picture alone without an explanation is not sufficient.

Solution: Here is what rejection sampling looks like in scenario (1) (on the left) and (2) (on the
right):
[Figure: the two density plots again, each overlaid with the proposal-sample histogram (light blue) and the accepted-sample histogram (dark blue) for the corresponding setting.]

On the left, we use the available distribution q(x) coming from the Distribution A plot, and would like samples from the target density p*(x) of Distribution B. This means that we need to find some constant C such that C ≥ p*(x)/q(x) for all x. We can see that we will have C ≈ 1.25 by looking at values x ≈ 0.5 in the plot. So we have access to samples from the larger (light blue) histogram, and discard samples to reduce down to the lower (dark blue) histogram; we keep around 1/1.25 = 80% of the samples, which is quite efficient (see the figure on the left).
On the right, we use the available distribution q(x) coming from the Distribution B plot, and would like samples from the target density p*(x) of Distribution A. Again we need a constant C such that C ≥ p*(x)/q(x) for all x; we can see we have C ≈ 3.25 by looking at values x ≈ 0 (i.e., due to the spike of the Distribution A density around zero). This makes the sampling much less efficient: most of the samples from the larger (light blue) histogram are discarded in order to reduce down to the lower (dark blue) histogram (see the figure on the right).
So, (1) is more efficient, and (2) is less efficient.
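The efficiency gap can be reproduced with toy densities that mimic the figure. These densities are our own invention, not the actual ones plotted: take B to be Uniform[0, 1] and give A the density 0.8 + 1.8(1 − x)^8 on [0, 1], which spikes near x = 0 (value 2.6) but is bounded below by 0.8. Then sampling B from A needs C = 1/0.8 = 1.25 (80% acceptance), while sampling A from B needs C = max A = 2.6 (about 38% acceptance), since the long-run acceptance rate of rejection sampling is exactly 1/C:

```python
import random
random.seed(0)

def dens_A(x):
    # toy spiky density: 0.8 + 0.2 * 9*(1 - x)**8, integrates to 1 on [0, 1]
    return 0.8 + 1.8 * (1 - x) ** 8

def dens_B(x):
    return 1.0  # Uniform[0, 1]

def sample_A():
    # mixture: w.p. 0.8 draw Uniform[0, 1]; w.p. 0.2 draw from density 9(1-x)^8
    if random.random() < 0.8:
        return random.random()
    return 1 - random.random() ** (1 / 9)

def acceptance_rate(sample_q, dens_q, dens_p, C, n=200_000):
    # fraction of proposals kept when accepting with probability p(x) / (C q(x))
    kept = 0
    for _ in range(n):
        x = sample_q()
        if random.random() < dens_p(x) / (C * dens_q(x)):
            kept += 1
    return kept / n

r1 = acceptance_rate(sample_A, dens_A, dens_B, 1.25)        # setting (1): A available, B target
r2 = acceptance_rate(random.random, dens_B, dens_A, 2.6)    # setting (2): B available, A target
print(r1, r2)  # near 1/1.25 = 0.80 and near 1/2.6 = 0.385
```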

4. In this problem you will construct a loose confidence interval for Binomial data. This interval won’t be
optimal—there are techniques to compute a more narrow interval, which will be covered later on—but
the goal is to illustrate the idea behind confidence intervals rather than to compute the best possible
answer.
Suppose that X ∼ Binomial(n, p) (e.g., the total number of Heads, if a probability-p coin is flipped n times). We will write p̂ = X/n, the proportion of Heads in the observed data. We know that E(X) = np and Var(X) = np(1 − p).

(a) Use Chebyshev’s inequality to find an upper bound on


P(|p̂ − p| ≥ ϵ)

where ϵ > 0 is some small constant. Your upper bound will depend on the unknown p. However, by calculating max_{p ∈ [0,1]} p(1 − p), you can construct a looser upper bound that doesn't depend on p. So, your final answer should be of the form

P(|p̂ − p| ≥ ϵ) ≤ (some function that depends on n and on ϵ but not on p)

Solution: We know that E(p̂) = p and Var(p̂) = p(1 − p)/n ≤ 1/(4n) (since max_{p ∈ [0,1]} p(1 − p) = 0.25), so by Chebyshev's inequality,

    P(|p̂ − p| ≥ ϵ) ≤ Var(p̂)/ϵ² ≤ 1/(4nϵ²).

(b) Next, for some desired error level α ∈ (0, 1), find a value for ϵ so that the probability above is
≤ α. Your value of ϵ should depend on n and α but not on p. Once you’ve computed this, you
will have a statement of the form

P(|p̂ − p| ≥ (some function of n and α)) ≤ α.

Then, based on this answer, compute an interval of the form


    [(a lower bound which is a function of p̂ & n & α), (an upper bound which is a function of p̂ & n & α)]

so that, calculating probability with respect to a draw of the random variable X, it holds that

P(the true parameter p lies in the interval) ≥ 1 − α.

Solution: Choosing ϵ = 1/√(4nα), we get 1/(4nϵ²) = α. So, we get

    P(|p̂ − p| ≥ 1/√(4nα)) ≤ α.

So, the interval is

    [p̂ − 1/√(4nα), p̂ + 1/√(4nα)].
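A minimal sketch checking this interval's coverage by simulation (n, p, and α below are our own arbitrary choices; Chebyshev is conservative, so the empirical coverage should comfortably exceed 1 − α):

```python
import math
import random
random.seed(0)

def chebyshev_interval(x, n, alpha):
    # [p_hat - 1/sqrt(4 n alpha), p_hat + 1/sqrt(4 n alpha)]
    p_hat = x / n
    eps = 1 / math.sqrt(4 * n * alpha)
    return p_hat - eps, p_hat + eps

n, p, alpha = 100, 0.3, 0.1
trials = 20_000
covered = 0
for _ in range(trials):
    x = sum(random.random() < p for _ in range(n))  # one Binomial(n, p) draw
    lo, hi = chebyshev_interval(x, n, alpha)
    covered += (lo <= p <= hi)
print(covered / trials)  # should be at least 1 - alpha = 0.9
```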

5. For each part of this problem, use the CLT to approximate the probability you need to compute. You
can ignore the issue of continuity corrections for this problem.
(a) There are two games to play at a fair. For the first game, in each round of the game you roll a
fair die, and win $4 if you roll a 6 or lose $1 otherwise. Suppose you play this first game 36 times.
What is the approximate probability that in the end you are ahead (i.e., your total earnings are
positive)?

Solution: Let D1 be the dollars won when playing the dice game once. Then

E(D1) = 4 · 1/6 − 1 · 5/6 = −1/6,   Var(D1) = E(D1²) − E(D1)² = 16 · 1/6 + 1 · 5/6 − 1/36 = 125/36.

We can write our total earnings from the dice game as

Dtotal = D1 + · · · + D36

where each Di is the earnings from game #i. Applying the CLT, we see that the distribution of
Dtotal is approximately
N (36 · −1/6, 36 · 125/36) = N (−6, 125).
Then

    P(Dtotal > 0) = P( (Dtotal − (−6))/√125 > (0 − (−6))/√125 ) = P( (Dtotal + 6)/√125 > 0.53 ) ≈ 1 − Φ(0.53) = 0.298.

(b) For the second game, in each round of the game you throw a football, and the money you win
is equal to 0.1(F − 20), where F (in feet) is the distance that you threw the ball. Assume that
F follows an Exponential(0.1) distribution. Suppose you play the second game 50 times. What
is the approximate probability that you lose no more than $30 in total when playing the second
game?

Solution: Let B1 be the dollars won when playing the ball game once. Then

    E(B1) = 0.1 (E(F) − 20) = 0.1 (1/0.1 − 20) = −1,   Var(B1) = (0.1)² Var(F) = (0.1)² / (0.1)² = 1.
We can write our total earnings from the ball game as

Btotal = B1 + · · · + B50

where each Bi is the earnings from game #i. Applying the CLT, we see that the distribution of
Btotal is approximately
N (50 · −1, 50 · 1) = N (−50, 50).
Then

    P(Btotal > −30) = P( (Btotal − (−50))/√50 > (−30 − (−50))/√50 ) = P( (Btotal + 50)/√50 > 2.828 ) ≈ 1 − Φ(2.828) = 0.0023.

(c) Now combine all your games—you play the first game 36 times and then the second game 50
times. What is the probability of losing less than $72 in total?

Solution: Let T be your total earnings,

T = Dtotal + Btotal .

From the work above we know that Dtotal ≈ N (−6, 125) in distribution, and Btotal ≈ N (−50, 50)
in distribution. Furthermore, adding two independent normal random variables yields a nor-
mal random variable. Therefore, since Dtotal and Btotal are independent and are approximately
normal, their sum is approximately normally distributed:

T ≈ N (−56, 175).

Hence

    P(T > −72) = P( (T − (−56))/√175 > (−72 − (−56))/√175 ) = P( (T + 56)/√175 > −1.21 ) ≈ 1 − Φ(−1.21) = 0.887.

6. Let X ∼ Binomial(60, 0.22) and let Y = X/60 be the proportion of successes in the sample.
(a) What is the normal distribution that approximates the distribution of Y ?

Solution: We can write X = X1 + · · · + X60 where Xi is the indicator variable for success on the ith trial. For each individual Xi we calculate mean µ = 0.22 and variance σ² = 0.22(1 − 0.22) = 0.1716. So we have

    E(Y) = E(X̄) = µ = 0.22

and

    Var(Y) = Var(X̄) = σ²/n = 0.1716/60 = 0.00286.

By the CLT, Y's distribution is approximately N(0.22, 0.00286).

(b) Calculate (approximately) the probability P(Y ≤ 0.25) (you can ignore issues of continuity cor-
rections etc). (To obtain values of Φ(x), the CDF of the normal distribution, you can use Table
2 in the back of your book or just search online for “standard normal table”. Or, if you have R,
you can use the command pnorm.)

Solution:

    P(Y ≤ 0.25) = P( (Y − 0.22)/√0.00286 ≤ (0.25 − 0.22)/√0.00286 ) = P( (Y − 0.22)/√0.00286 ≤ 0.561 ) ≈ Φ(0.561) ≈ 0.71.
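The same erf-based Φ works here in place of a table or R's pnorm (a sketch):

```python
import math

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

var_Y = 0.22 * (1 - 0.22) / 60        # 0.00286
z = (0.25 - 0.22) / math.sqrt(var_Y)  # about 0.561
print(round(z, 3), round(Phi(z), 2))  # 0.561 and 0.71
```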
