
MATH3871

Assignment 1

Robert Tan
School of Mathematics and Statistics
[email protected]

Question 1
Let \(\theta\) be the true proportion of people over the age of 40 in your community with hypertension.
Consider the following thought experiment:

Part (a)
Making an educated guess, suppose we choose an initial point estimate of \(\theta = 0.2\), obtained by taking the expectation of a Beta(2, 8) distribution. We choose this type of distribution since it is the conjugate prior of a binomial distribution, which is the distribution of our data.

Part (b)
If we survey for hypertension within the community, and the first five people randomly selected
have 4 positives, then our posterior distribution can be evaluated as follows:

\[
f(\theta) \propto \theta(1 - \theta)^7
\]
\[
L(x \mid \theta) \propto \theta^4(1 - \theta)
\]
\[
f_{\theta \mid x}(\theta) \propto p_\theta(\theta)\, p_{x \mid \theta}(x) \propto \theta^5(1 - \theta)^8
\]

So our posterior has a Beta(6, 9) distribution, and the new point estimate using the expected value is \(\frac{6}{6 + 9} = 0.4\).

Part (c)
If our final survey results are 400 positives out of 1000 people, we can once again compute the
posterior as follows:

\[
f(\theta) \propto \theta(1 - \theta)^7
\]
\[
L(x \mid \theta) \propto \theta^{400}(1 - \theta)^{600}
\]
\[
f_{\theta \mid x}(\theta) \propto p_\theta(\theta)\, p_{x \mid \theta}(x) \propto \theta^{401}(1 - \theta)^{607}
\]

So our posterior has a Beta(402, 608) distribution, and the new point estimate using the expected value is \(\frac{402}{402 + 608} \approx 0.39802\).
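As a quick check (not required by the question), the conjugate Beta updates above can be reproduced in R; the shape parameters below are taken directly from Parts (a)-(c):

a0 <- 2; b0 <- 8                      # Beta(2, 8) prior
a1 <- a0 + 4;   b1 <- b0 + 1          # update after 4 positives out of 5
a2 <- a0 + 400; b2 <- b0 + 600        # update after 400 positives out of 1000
c(prior = a0 / (a0 + b0), first = a1 / (a1 + b1), final = a2 / (a2 + b2))
# 0.2000000 0.4000000 0.3980198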

1
Robert Tan MATH3871: Assignment 1.

Question 2
Let \(x_1, \ldots, x_n \in \mathbb{R}^d\) be \(n\) iid \(d\)-dimensional vectors. Suppose that we wish to model \(x_i \sim N_d(\mu, \Sigma)\) for \(i = 1, \ldots, n\), where \(\mu \in \mathbb{R}^d\) is an unknown mean vector, and \(\Sigma\) is a known positive semi-definite covariance matrix.

Part (a)
Claim. By adopting the conjugate prior \(\mu \sim N_d(\mu_0, \Sigma_0)\), the resulting posterior distribution for \(\mu \mid x_1, \ldots, x_n\) is \(N_d(\mu^*, \Sigma^*)\), where
\[
\mu^* = \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}\left(\Sigma_0^{-1}\mu_0 + n\Sigma^{-1}\bar{x}\right)
\]
and
\[
\Sigma^* = \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}.
\]

Proof. We have the prior \(\mu \sim N_d(\mu_0, \Sigma_0)\), so
\[
f(\mu) = \frac{1}{(2\pi)^{d/2}|\Sigma_0|^{1/2}} \exp\left(-\frac{1}{2}(\mu - \mu_0)^\top \Sigma_0^{-1}(\mu - \mu_0)\right).
\]
We also have the likelihood function as follows:
\[
L(x_1, \ldots, x_n \mid \mu) = \frac{1}{(2\pi)^{nd/2}|\Sigma|^{n/2}} \exp\left(-\frac{1}{2}\sum_{i=1}^n (x_i - \mu)^\top \Sigma^{-1}(x_i - \mu)\right).
\]

Calculating the posterior:
\[
f_{\mu \mid x_1, \ldots, x_n}(\mu) \propto p_\mu(\mu)\, L(x_1, \ldots, x_n \mid \mu)
\propto \exp\left(-\frac{1}{2}(\mu - \mu_0)^\top \Sigma_0^{-1}(\mu - \mu_0) - \frac{1}{2}\sum_{i=1}^n (x_i - \mu)^\top \Sigma^{-1}(x_i - \mu)\right).
\]

Expanding and eliminating the constant terms due to proportionality:



\[
\propto \exp\left(-\frac{1}{2}\left(\mu^\top \Sigma_0^{-1}\mu - \mu^\top \Sigma_0^{-1}\mu_0 - \mu_0^\top \Sigma_0^{-1}\mu + n\mu^\top \Sigma^{-1}\mu - \sum_{i=1}^n x_i^\top \Sigma^{-1}\mu - \sum_{i=1}^n \mu^\top \Sigma^{-1}x_i\right)\right).
\]

Adding in a constant term to complete the square and factorising (again, we can do this because of proportionality):
\[
\propto \exp\left(-\frac{1}{2}\left(\mu^\top - \left(\mu_0^\top \Sigma_0^{-1} + n\bar{x}^\top \Sigma^{-1}\right)\left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}\right)\left(\Sigma_0^{-1} + n\Sigma^{-1}\right)\left(\mu - \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}\left(\Sigma_0^{-1}\mu_0 + n\Sigma^{-1}\bar{x}\right)\right)\right).
\]


Using \((Ax)^\top = x^\top A^\top\) and the fact that covariance matrices (and their inverses) are symmetric and hence invariant under the transpose, we obtain
\[
\left(\left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}\left(\Sigma_0^{-1}\mu_0 + n\Sigma^{-1}\bar{x}\right)\right)^\top = \left(\mu_0^\top \Sigma_0^{-1} + n\bar{x}^\top \Sigma^{-1}\right)\left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}.
\]

So we have
\[
f_{\mu \mid x_1, \ldots, x_n}(\mu) \propto \exp\left(-\frac{1}{2}\left(\mu - \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}\left(\Sigma_0^{-1}\mu_0 + n\Sigma^{-1}\bar{x}\right)\right)^\top \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)\left(\mu - \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}\left(\Sigma_0^{-1}\mu_0 + n\Sigma^{-1}\bar{x}\right)\right)\right),
\]
which means the posterior distribution is a multivariate normal \(N_d(\mu^*, \Sigma^*)\), where
\[
\mu^* = \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}\left(\Sigma_0^{-1}\mu_0 + n\Sigma^{-1}\bar{x}\right)
\]
and
\[
\Sigma^* = \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}.
\]
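As an illustration (the dimension, hyperparameters and data below are invented purely for this sketch), the posterior formulas can be evaluated numerically in R:

# Sketch of the posterior mean and covariance formulas, with made-up inputs.
set.seed(1)
d <- 2; n <- 50
Sigma  <- matrix(c(1, 0.3, 0.3, 1), d, d)             # known covariance (assumed)
mu0    <- c(0, 0)                                     # prior mean
Sigma0 <- diag(2, d)                                  # prior covariance
x      <- matrix(rnorm(n * d), n, d) %*% chol(Sigma)  # crude sample with mean zero
xbar   <- colMeans(x)
A <- solve(Sigma0) + n * solve(Sigma)                 # posterior precision
Sigma.star <- solve(A)
mu.star    <- Sigma.star %*% (solve(Sigma0) %*% mu0 + n * solve(Sigma) %*% xbar)
mu.star; Sigma.star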


Part (b)
We now derive Jeffreys' prior \(\pi_J(\mu)\) for \(\mu\). We have the likelihood function from above:
\[
L(x_1, \ldots, x_n \mid \mu) = \frac{1}{(2\pi)^{nd/2}|\Sigma|^{n/2}} \exp\left(-\frac{1}{2}\sum_{i=1}^n (x_i - \mu)^\top \Sigma^{-1}(x_i - \mu)\right).
\]

Lemma. If \(x\) is an \(n \times 1\) vector, and \(A\) is an \(n \times n\) matrix, then we have
\[
\frac{d}{dx}\left(x^\top A x\right) = x^\top\left(A^\top + A\right).
\]
Proof. We shall use Einstein's summation convention for this proof for clarity. Let \(x = (x_1, \ldots, x_n)^\top\), let \(e_j\) be the \(j\)th basis column vector, and let \([A]_{ij} = a_{ij}\). Then
\begin{align*}
\frac{d}{dx}\left(x^\top A x\right) &= \frac{d}{dx}\left(x_i a_{ij} e_j^\top x\right) \\
&= \frac{d}{dx}\left(x_i a_{ij} x_j\right) \\
&= \left(2a_{ii}x_i + (a_{ij} + a_{ji})x_j\right)e_i^\top \quad \text{where } j \neq i, \text{ since we have one } x_i^2 \text{ term and the rest are } x_i x_j \text{ terms} \\
&= \left(x_i a_{ii} + a_{ij}x_j\right)e_i^\top + \left(x_i a_{ii} + a_{ji}x_j\right)e_i^\top \\
&= x^\top\left(A^\top + A\right).
\end{align*}
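As a quick numerical sanity check of the lemma (illustrative only; the matrix and vector below are arbitrary), we can compare the claimed gradient with a central-difference approximation in R:

# Check d/dx (x' A x) = x' (A' + A) numerically for a non-symmetric A.
set.seed(2)
n <- 4
A <- matrix(rnorm(n^2), n, n)       # deliberately non-symmetric
x <- rnorm(n)
f <- function(x) as.numeric(t(x) %*% A %*% x)
h <- 1e-6
num.grad <- sapply(1:n, function(i) {
  e <- replace(numeric(n), i, h)
  (f(x + e) - f(x - e)) / (2 * h)   # central difference in coordinate i
})
max(abs(num.grad - as.numeric(t(x) %*% (t(A) + A))))  # ~ 0, up to rounding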


Now, returning to the derivation of Jeffreys' prior:
\[
\log L(x_1, \ldots, x_n \mid \mu) = \log\left(\frac{1}{(2\pi)^{nd/2}|\Sigma|^{n/2}}\right) - \frac{1}{2}\sum_{i=1}^n (x_i - \mu)^\top \Sigma^{-1}(x_i - \mu)
\]
\[
\frac{d}{d\mu}\log L = -\frac{1}{2}\,\frac{d}{d\mu}\left(n\mu^\top \Sigma^{-1}\mu - \sum_{i=1}^n x_i^\top \Sigma^{-1}\mu - \sum_{i=1}^n \mu^\top \Sigma^{-1}x_i + \sum_{i=1}^n x_i^\top \Sigma^{-1}x_i\right)
= -\frac{n}{2}\,\frac{d}{d\mu}\left(\mu^\top \Sigma^{-1}\mu - \bar{x}^\top \Sigma^{-1}\mu - \mu^\top \Sigma^{-1}\bar{x}\right)
\]
Using the above lemma, with the fact that \(\Sigma\) and hence \(\Sigma^{-1}\) are symmetric, and noting that \(\frac{d}{dx}\left(x^\top A\right) = \left(\frac{d}{dx}\left(A^\top x\right)\right)^\top = A\) where \(A\) is an \(n \times k\) matrix, with \(k \in \mathbb{Z}^+\) (a result we can confirm easily using summation notation):
\[
\frac{d}{d\mu}\log L = -n\left(\mu^\top \Sigma^{-1} - \bar{x}^\top \Sigma^{-1}\right)
\]
\[
\frac{d^2}{d\mu^2}\log L = -n\Sigma^{-1}.
\]
Hence
\[
\pi_J(\mu) \propto \left|\,\mathbb{E}\left(-\frac{d^2}{d\mu^2}\log L\right)\right|^{1/2} \propto 1,
\]
since the square root of the determinant of the expectation of a constant matrix is also a constant. We see that Jeffreys' prior for the multivariate normal distribution with fixed covariance matrix and unknown mean vector is simply proportional to a constant. This result is similar to the one for a univariate Gaussian distribution with fixed variance, which also has a constant (improper) distribution for its Jeffreys' prior.

Question 3
Part (a)
We know that \(p\) is an estimate of the ratio of the area of the circle to the area of the square. This ratio's true value is \(\frac{\pi r^2}{(2r)^2} = \frac{\pi}{4}\), so this means \(4p\) is an estimate of \(\pi\).

Part (b)
> n <- 1000
> x1 <- runif(n, -1, 1)
> x2 <- runif(n, -1, 1)
> ind <- ((x1^2 + x2^2) < 1)
> pi.hat <- 4 * (sum(ind) / n)
> pi.hat
[1] 3.156

The above R code gives a one-trial estimate of \(4p = \hat{\pi} = 3.156\).


Part (c)
We know that \(b_i\) is a Bernoulli r.v. with probability \(\pi/4\), so its variance is \(\frac{\pi}{4}\left(1 - \frac{\pi}{4}\right)\). Then the sampling variability of \(\hat{\pi} = 4\sum_{i=1}^n \frac{b_i}{n}\) is simply \(4^2 \cdot n \cdot \frac{\pi}{4}\left(1 - \frac{\pi}{4}\right) \cdot \frac{1}{n^2} = \frac{\pi(4 - \pi)}{n}\), so by the Central Limit Theorem we have
\[
\hat{\pi}_n \;\overset{d}{\approx}\; N\left(\pi, \frac{\pi(4 - \pi)}{n}\right).
\]
Note: we can re-write this in terms of \(p\) as
\[
\hat{\pi}_n \;\overset{d}{\approx}\; N\left(4p, \frac{16p(1 - p)}{n}\right).
\]

Part (d)
n <- 1000
p <- 0.7854                          # true area ratio, pi / 4
pi <- 4 * p                          # note: this masks the built-in constant pi
var <- 16 * p * (1 - p) / n          # CLT variance from Part (c)
pi.hat <- c()
for (i in c(1:1000)) {
  x1 <- runif(n, -1, 1)
  x2 <- runif(n, -1, 1)
  ind <- ((x1^2 + x2^2) < 1)         # indicator: point falls inside the unit circle
  pi.hat[i] <- 4 * (sum(ind) / n)
}
hist(pi.hat, breaks = 20, freq = FALSE)
x <- seq(min(pi.hat), max(pi.hat), length = 100)
y <- dnorm(x, mean = 4 * p, sd = sqrt(var))
points(x, y, type = "l")             # overlay the normal approximation

[Figure: histogram of pi.hat (density scale, horizontal axis pi.hat) with the normal approximation curve overlaid.]

We can see that the histogram fits the overlaid normal density fairly well.
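As a numerical companion to the plot (an optional check, reusing the pi.hat vector and n just simulated), the empirical spread can be compared with the theoretical standard deviation from Part (c):

# Compare the Monte Carlo spread with the CLT value sqrt(pi * (4 - pi) / n).
c(empirical.sd = sd(pi.hat), theoretical.sd = sqrt(base::pi * (4 - base::pi) / n))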


Part (e)
We know that the variance is given by \(\frac{16p(1 - p)}{n}\), which is maximised at \(p = 0.5\), giving \(\frac{4}{n}\). We choose to maximise the variance since this will result in maximal Monte Carlo sampling variability, which is what we need for the most conservative estimate of the sample size \(n\) required to estimate \(\pi\) to within 0.01 with at least 95% probability. Solving for \(n\):
\[
P\left(|\hat{\pi} - \pi| \le 0.01\right) \ge 0.95
\]
We apply the CLT and use a normal approximation to get:
\[
P\left(-\frac{0.01}{2/\sqrt{n}} \le Z \le \frac{0.01}{2/\sqrt{n}}\right) \ge 0.95 \quad \text{where } Z \sim N(0, 1)
\]
\[
2\left(P\left(Z \le \frac{0.01}{2/\sqrt{n}}\right) - 0.5\right) \ge 0.95
\]
\[
P\left(Z \le \frac{0.01}{2/\sqrt{n}}\right) \ge 0.975
\]
\[
\frac{0.01}{2/\sqrt{n}} \ge 1.96
\]
\[
\sqrt{n} \ge 392 \implies n \ge 153664.
\]

So \(n = 153664\) is a conservative sample size for estimating \(\pi\) to within 0.01 with at least 95% probability. To be even more conservative, we could round up to 160000 samples (after all, we are using a normal approximation).
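As a small check, the same bound can be computed in R (using the exact normal quantile rather than the rounded value 1.96):

# Conservative sample size: variance at most 4/n, so sd(pi.hat) <= 2/sqrt(n).
z <- qnorm(0.975)                     # approximately 1.96
n.required <- ceiling((2 * z / 0.01)^2)
n.required                            # 153659 with the exact quantile; 153664 with z = 1.96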
