PSet6 Solutions
1. Suppose X is drawn from Uniform[0, T ]. The parameter T can be 1 or 2 or 3—these are the only
possibilities.
(a) Consider the following procedure for estimating T based on observing the data X:
• If we observe X ≤ 1, we estimate that T is 1
• If we observe 1 < X ≤ 2, we estimate that T is 2
• If we observe X > 2, we estimate that T is 3
For each possible value of T , compute the probability that we estimate T correctly or incorrectly.
Solution:
• If T = 1, then we are correct with probability 1, and incorrect with probability 0
• If T = 2, then we are correct with probability 1/2, and incorrect with probability 1/2
• If T = 3, then we are correct with probability 1/3, and incorrect with probability 2/3
(b) Consider a Bayesian framework where we place a prior on T — we assume it’s equally likely to
be 1 or 2 or 3. (In other words, T ∼ Uniform{1, 2, 3}, the uniform distribution over a finite set.)
Compute the posterior distribution of T , given the observed data X. As in the lecture, you can
assume that it is okay to combine densities and PMFs for this setting where X is continuous while
T is discrete. The final form of your answer should be very simple — your final answer should give
simple numerical values, without summation notation or anything like that, but you will need to
split into cases.
Solution: Combining the prior pT(t) = 1/3 with the density pX|T(x | t) = (1/t) · 1{x ≤ t}, Bayes' rule gives

    pT|X(t | x) = ( (1/t) · 1{x ≤ t} ) / ( Σ_{t′=1,2,3} (1/t′) · 1{x ≤ t′} ).

Splitting into cases:

• If 0 ≤ x ≤ 1, then T has the posterior PMF

    t             1      2      3
    pT|X(t | x)   6/11   3/11   2/11

  since the denominator is 1 + 1/2 + 1/3 = 11/6.

• If 1 < x ≤ 2, then T has the posterior PMF

    t             1      2      3
    pT|X(t | x)   0      3/5    2/5

  since (1/2)/(1/2 + 1/3) = 3/5 and (1/3)/(1/2 + 1/3) = 2/5.

• If 2 < x ≤ 3, then T has the posterior PMF

    t             1      2      3
    pT|X(t | x)   0      0      1
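As a sanity check on the 1 < x ≤ 2 case, a short simulation can draw (T, X) from the joint model and tabulate T among the draws that land in (1, 2] (a quick sketch; the variable names are our own):

```python
import random

# Monte Carlo check of the 1 < x <= 2 case: draw (T, X) from the joint model,
# keep the draws with 1 < X <= 2, and tabulate T among the kept draws.
random.seed(0)
counts = {1: 0, 2: 0, 3: 0}
kept = 0
for _ in range(200_000):
    t = random.choice([1, 2, 3])   # prior: T ~ Uniform{1, 2, 3}
    x = random.uniform(0, t)       # data: X | T = t ~ Uniform[0, t]
    if 1 < x <= 2:
        counts[t] += 1
        kept += 1

# Empirical posterior; should be close to (0, 3/5, 2/5).
print(counts[1] / kept, counts[2] / kept, counts[3] / kept)
```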
2. Rejection sampling for the Geometric distribution.
(a) Calculate the rejection sampling procedure that we would use if we have access to draws from the
Geometric(0.5) distribution, and would like to simulate draws from the Geometric(0.6) distribu-
tion.
Solution: We have access to the PMF q(x) = 0.5^x, supported on x = 1, 2, 3, . . . , and would like
to sample from the target PMF p∗(x) = 0.6 · 0.4^(x−1). We calculate

    p∗(x)/q(x) = (0.6 · 0.4^(x−1)) / 0.5^x = 1.2 · 0.8^(x−1),

which is largest at x = 1, so we can take C = 1.2. The procedure is then: draw X ∼ Geometric(0.5), and accept it with probability p∗(X)/(C · q(X)) = 0.8^(X−1); otherwise reject, redraw, and repeat.
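This procedure (propose from Geometric(0.5) with C = 1.2, accept with probability 0.8^(X−1)) can be sketched in a few lines of Python; the function names below are our own:

```python
import random

def geometric(p, rng):
    """Number of Bernoulli(p) trials until the first success (support 1, 2, 3, ...)."""
    x = 1
    while rng.random() >= p:
        x += 1
    return x

def geometric_06(rng):
    """Target Geometric(0.6) via rejection from Geometric(0.5):
    C = 1.2, so a proposal X is accepted with probability 0.8**(X - 1)."""
    while True:
        x = geometric(0.5, rng)
        if rng.random() < 0.8 ** (x - 1):
            return x

rng = random.Random(0)
draws = [geometric_06(rng) for _ in range(50_000)]
print(sum(draws) / len(draws))  # should be near the Geometric(0.6) mean 1/0.6 ≈ 1.667
```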
(b) What goes wrong if we instead try to simulate draws from the Geometric(0.4) distribution (again
assuming that Geometric(0.5) is the distribution that we can draw samples from)?
Solution: We would not be able to run rejection sampling because the ratio is not finite:
    max_{x=1,2,3,...} p∗(x)/q(x) = max_{x=1,2,3,...} (0.4 · 0.6^(x−1)) / 0.5^x = max_{x=1,2,3,...} (2/3) · 1.2^x = ∞.
[Figure: two density plots on [0, 1], titled "Distribution A" (left) and "Distribution B" (right); the density axis runs from 0 to 5 in each panel.]
3. Consider two settings:
(1) You have access to samples from Distribution A, and you use rejection sampling to produce
samples from Distribution B.
(2) You have access to samples from Distribution B, and you use rejection sampling to produce
samples from Distribution A.
Which of these two implementations of rejection sampling will be more efficient, and which will be less
efficient? Explain your answer thoroughly. You may use pictures to help explain your solution, but a
picture alone without an explanation is not sufficient.
Solution: Here is what rejection sampling looks like in scenario (1) (on the left) and (2) (on the
right):
[Figure: two panels showing rejection sampling in scenario (1) (left) and scenario (2) (right); in each panel the scaled proposal density C·q(x) (larger, light blue) lies above the target density (lower, dark blue).]
On the left, we use the available distribution q(x) coming from the Distribution A plot, and would
like samples from density h(x) = p∗ (x) from Distribution B. This means that we need to find some
constant C such that C ≥ p∗ (x)/q(x) for all x. We can see that we will have C ≈ 1.25 by looking at
values x ≈ 0.5 in the plot. So we have access to samples from the larger (light blue) histogram, and
discard samples to reduce down to the lower (dark blue) histogram; we keep around 1/1.25 = 80% of the
samples, which is quite efficient (see the figure on the left in the solutions).
On the right, we use the available distribution q(x) coming from the Distribution B plot, and would
like samples from density h(x) = p∗ (x) from Distribution A. This means that we need to find some
constant C such that C ≥ p∗ (x)/q(x) for all x; we can see we have C ≈ 3.25 by looking at values x ≈ 0
(i.e., due to the spike of the Distribution A density around zero). This makes the sampling much less
efficient; most of the samples from the larger (light blue) histogram are discarded in order to reduce
down to the lower (dark blue) histogram, since we keep only around 1/3.25 ≈ 31% of the samples (see
the figure on the right, in the solutions).
So, (1) is more efficient, and (2) is less efficient.
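Distributions A and B are only given as plots, but the key fact here, namely that the long-run acceptance rate of rejection sampling equals 1/C, can be illustrated with two stand-in densities on [0, 1] whose ratio is bounded by a small constant in one direction and a large constant in the other (all densities and names below are our own choices, not the plotted A and B):

```python
import random

# Stand-in linear densities on [0, 1] (our own choices):
# pA is largest exactly where pB is smallest, so the two directions differ.
pA = lambda x: 0.2 + 1.6 * x   # ranges from 0.2 up to 1.8
pB = lambda x: 0.8 + 0.4 * x   # ranges from 0.8 up to 1.2

def draw(density, bound, rng):
    """Sample from `density` on [0, 1] by rejection from Uniform[0, 1]."""
    while True:
        x = rng.random()
        if rng.random() < density(x) / bound:
            return x

def acceptance_rate(target, proposal, bound, C, n, rng):
    """Run rejection sampling until n draws are accepted; return the acceptance rate."""
    accepted, tries = 0, 0
    while accepted < n:
        x = draw(proposal, bound, rng)
        tries += 1
        if rng.random() < target(x) / (C * proposal(x)):
            accepted += 1
    return accepted / tries

rng = random.Random(1)
# Direction 1: propose from pB, target pA; C = max pA/pB = 1.8/1.2 = 1.5.
rate1 = acceptance_rate(pA, pB, 1.2, 1.5, 10_000, rng)
# Direction 2: propose from pA, target pB; C = max pB/pA = 0.8/0.2 = 4.
rate2 = acceptance_rate(pB, pA, 1.8, 4.0, 10_000, rng)
print(rate1, rate2)  # close to 1/C in each direction: about 0.67 and 0.25
```

The same asymmetry drives the answer above: the direction whose constant C is larger wastes a larger fraction of the proposals.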
4. In this problem you will construct a loose confidence interval for Binomial data. This interval won’t be
optimal—there are techniques to compute a more narrow interval, which will be covered later on—but
the goal is to illustrate the idea behind confidence intervals rather than to compute the best possible
answer.
Suppose that X ∼ Binomial(n, p) (e.g., the total number of Heads, if a probability-p coin is flipped n
times). We will write p̂ = X/n, the proportion of Heads in the observed data. We know that E(X) = np
and Var(X) = np(1 − p).
(a) Use Chebyshev's inequality to find an upper bound on P(|p̂ − p| ≥ ϵ), where ϵ > 0 is some small constant. Your upper bound will depend on the unknown p. However,
by calculating max_{p∈[0,1]} {p(1 − p)}, you can construct a looser upper bound that doesn't depend
on p. So, your final answer should be of the form P(|p̂ − p| ≥ ϵ) ≤ (an expression depending only on n and ϵ).

Solution: We know that E(p̂) = p and Var(p̂) = p(1 − p)/n ≤ 1/(4n) (since max_{p∈[0,1]} {p(1 − p)} = 1/4),
so by Chebyshev's inequality,

    P(|p̂ − p| ≥ ϵ) ≤ Var(p̂)/ϵ² ≤ 1/(4nϵ²).
(b) Next, for some desired error level α ∈ (0, 1), find a value for ϵ so that the probability above is
≤ α. Your value of ϵ should depend on n and α but not on p. Once you’ve computed this, you
will have a statement of the form P(|p̂ − p| ≥ ϵ) ≤ α.

Solution: Setting 1/(4nϵ²) = α and solving for ϵ gives ϵ = 1/√(4nα), so that, calculating probability with respect to a draw of the random variable X, it holds that

    P(|p̂ − p| ≥ 1/√(4nα)) ≤ α.

So, the interval is

    ( p̂ − 1/√(4nα), p̂ + 1/√(4nα) ).
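As a sanity check, a short simulation can estimate the coverage of the interval p̂ ± 1/√(4nα): by construction it should be at least 1 − α, and since Chebyshev is a loose bound, it is typically much higher. The function name and parameter values below are our own choices:

```python
import math
import random

def coverage(n, p, alpha, trials=10_000, seed=0):
    """Fraction of simulated datasets whose interval p_hat ± 1/sqrt(4*n*alpha) contains p."""
    rng = random.Random(seed)
    eps = 1 / math.sqrt(4 * n * alpha)
    hits = 0
    for _ in range(trials):
        x = sum(rng.random() < p for _ in range(n))   # one Binomial(n, p) draw
        p_hat = x / n
        hits += (p_hat - eps <= p <= p_hat + eps)
    return hits / trials

print(coverage(n=100, p=0.3, alpha=0.1))  # should be at least 1 - alpha = 0.9
```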
5. For each part of this problem, use the CLT to approximate the probability you need to compute. You
can ignore the issue of continuity corrections for this problem.
(a) There are two games to play at a fair. For the first game, in each round of the game you roll a
fair die, and win $4 if you roll a 6 or lose $1 otherwise. Suppose you play this first game 36 times.
What is the approximate probability that in the end you are ahead (i.e., your total earnings are
positive)?
Solution: Let D1 be the dollars won when playing the dice game once. Then

    E(D1) = 4 · 1/6 − 1 · 5/6 = −1/6,   Var(D1) = E(D1²) − E(D1)² = 16 · 1/6 + 1 · 5/6 − 1/36 = 125/36.
We can write our total earnings from the dice game as Dtotal = D1 + · · · + D36
where each Di is the earnings from game #i. Applying the CLT, we see that the distribution of
Dtotal is approximately
N (36 · −1/6, 36 · 125/36) = N (−6, 125).
Then
    P(Dtotal > 0) = P( (Dtotal − (−6))/√125 > (0 − (−6))/√125 ) = P( (Dtotal + 6)/√125 > 0.53 ) ≈ 1 − Φ(0.53) = 0.298.
(b) For the second game, in each round of the game you throw a football, and the money you win
is equal to 0.1(F − 20), where F (in feet) is the distance that you threw the ball. Assume that
F follows an Exponential(0.1) distribution. Suppose you play the second game 50 times. What
is the approximate probability that you lose no more than $30 in total when playing the second
game?
Solution: Let B1 be the dollars won when playing the ball game once. Then

    E(B1) = 0.1(E(F) − 20) = 0.1(1/0.1 − 20) = −1,   Var(B1) = (0.1)² Var(F) = (0.1)²/(0.1)² = 1.
We can write our total earnings from the ball game as
Btotal = B1 + · · · + B50
where each Bi is the earnings from game #i. Applying the CLT, we see that the distribution of
Btotal is approximately
N (50 · −1, 50 · 1) = N (−50, 50).
Then
    P(Btotal > −30) = P( (Btotal − (−50))/√50 > (−30 − (−50))/√50 ) = P( (Btotal + 50)/√50 > 2.828 ) ≈ 1 − Φ(2.828) = 0.0023.
(c) Now combine all your games—you play the first game 36 times and then the second game 50
times. What is the probability of losing less than $72 in total?
Solution: We can write our total earnings from both games as T = Dtotal + Btotal.
From the work above we know that Dtotal ≈ N (−6, 125) in distribution, and Btotal ≈ N (−50, 50)
in distribution. Furthermore, adding two independent normal random variables yields a nor-
mal random variable. Therefore, since Dtotal and Btotal are independent and are approximately
normal, their sum is approximately normally distributed:
T ≈ N (−56, 175).
Hence
    P(T > −72) = P( (T − (−56))/√175 > (−72 − (−56))/√175 ) = P( (T + 56)/√175 > −1.21 ) ≈ 1 − Φ(−1.21) = 0.887.
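The three normal-approximation numbers above can be reproduced by writing the standard normal CDF Φ in terms of the error function (the helper name `Phi` is our own). Part (a) comes out as ≈ 0.296 here, slightly different from 0.298, because we avoid rounding the z-value to 0.53 first:

```python
import math

def Phi(z):
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# (a) Dtotal is approximately N(-6, 125):  P(Dtotal > 0)
pa = 1 - Phi((0 - (-6)) / math.sqrt(125))
# (b) Btotal is approximately N(-50, 50):  P(Btotal > -30)
pb = 1 - Phi((-30 - (-50)) / math.sqrt(50))
# (c) T is approximately N(-56, 175):      P(T > -72)
pc = 1 - Phi((-72 - (-56)) / math.sqrt(175))

print(round(pa, 3), round(pb, 4), round(pc, 3))  # ≈ 0.296, 0.0023, 0.887
```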
6. Let X ∼ Binomial(60, 0.22) and let Y = X/60 be the proportion of successes in the sample.
(a) What is the normal distribution that approximates the distribution of Y ?
Solution: We can write X = X1 + · · · + X60, where Xi is the indicator variable for success on the ith
trial. For each individual Xi we calculate mean µ = 0.22 and variance σ² = 0.22(1 − 0.22) = 0.1716.
So we have
    E(Y) = E(X̄) = µ = 0.22

and

    Var(Y) = Var(X̄) = σ²/n = 0.1716/60 = 0.00286.
By the CLT, Y ’s distribution is approximately N(0.22, 0.00286).
(b) Calculate (approximately) the probability P(Y ≤ 0.25) (you can ignore issues of continuity cor-
rections etc). (To obtain values of Φ(x), the CDF of the normal distribution, you can use Table
2 in the back of your book or just search online for “standard normal table”. Or, if you have R,
you can use the command pnorm.)
Solution:
    P(Y ≤ 0.25) = P( (Y − 0.22)/√0.00286 ≤ (0.25 − 0.22)/√0.00286 ) = P( (Y − 0.22)/√0.00286 ≤ 0.561 ) ≈ Φ(0.561) ≈ 0.71.
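Since n = 60 is small, the CLT answer can also be compared against the exact Binomial CDF P(X ≤ 15) (note 0.25 · 60 = 15). The sketch below uses `math.erf` for Φ and `math.comb` for the exact sum; the exact value comes out somewhat higher, largely because we ignored the continuity correction:

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 60, 0.22
z = (0.25 - p) / math.sqrt(p * (1 - p) / n)   # standardized cutoff, about 0.561
approx = Phi(z)                               # normal approximation, about 0.71

# Exact Binomial CDF P(X <= 15):
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(16))

print(round(approx, 2), round(exact, 2))
```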