Assignment 1
Robert Tan
School of Mathematics and Statistics
[email protected]
Question 1
Let $\theta$ be the true proportion of people over the age of 40 in your community with hypertension.
Consider the following thought experiment:
Part (a)
Making an educated guess, suppose we choose an initial point estimate of $\hat{\theta} = 0.2$, obtained by taking the expectation of a Beta(2, 8) distribution. We choose a Beta distribution since it is the conjugate prior for the binomial distribution, which is the distribution of our data.
Part (b)
If we survey for hypertension within the community, and the first five people randomly selected
have 4 positives, then our posterior distribution can be evaluated as follows:
\[
f(\theta) \propto \theta (1 - \theta)^7, \qquad L(x \mid \theta) \propto \theta^4 (1 - \theta),
\]
so that
\[
f(\theta \mid x) \propto f(\theta) \, L(x \mid \theta) \propto \theta^5 (1 - \theta)^8.
\]
So our posterior has a Beta(6, 9) distribution, and the new point estimate using the expected value is $\frac{6}{6+9} = 0.4$.
Part (c)
If our final survey results are 400 positives out of 1000 people, we can once again compute the
posterior as follows:
\[
f(\theta) \propto \theta (1 - \theta)^7, \qquad L(x \mid \theta) \propto \theta^{400} (1 - \theta)^{600},
\]
so that
\[
f(\theta \mid x) \propto \theta^{401} (1 - \theta)^{607}.
\]
So our posterior has a Beta(402, 608) distribution, and the new point estimate using the expected value is $\frac{402}{402+608} \approx 0.39802$.
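As a quick numerical check of these conjugate updates (a sketch only; the variable names are ours, while the prior and survey counts are those stated above):

a0 <- 2; b0 <- 8                 # Beta(2, 8) prior
a0 / (a0 + b0)                   # prior point estimate: 0.2
a1 <- a0 + 4;   b1 <- b0 + 1     # part (b): 4 positives out of 5, Beta(6, 9)
a1 / (a1 + b1)                   # posterior mean: 6/15 = 0.4
a2 <- a0 + 400; b2 <- b0 + 600   # part (c): 400 positives out of 1000, Beta(402, 608)
a2 / (a2 + b2)                   # posterior mean: 402/1010 = 0.39802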
Question 2
Let $x_1, \dots, x_n \in \mathbb{R}^d$ be $n$ iid $d$-dimensional vectors. Suppose that we wish to model $x_i \sim N_d(\mu, \Sigma)$ for $i = 1, \dots, n$, where $\mu \in \mathbb{R}^d$ is an unknown mean vector and $\Sigma$ is a known positive definite covariance matrix.
Part (a)
Claim. By adopting the conjugate prior $\mu \sim N_d(\mu_0, \Sigma_0)$, the resulting posterior distribution for $\mu \mid x_1, \dots, x_n$ is $N_d(\mu_*, \Sigma_*)$, where
\[
\mu_* = \left( \Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1} \mu_0 + n \Sigma^{-1} \bar{x} \right)
\qquad \text{and} \qquad
\Sigma_* = \left( \Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1}.
\]
Proof. The posterior is proportional to the prior times the likelihood, and the exponent of this product is a quadratic in $\mu$. Adding in a constant term to complete the square and factorising (again, we can do this because of proportionality):
\[
\exp\left( -\frac{1}{2} \left( \mu^\top - \left( \Sigma_0^{-1} \mu_0 + n \Sigma^{-1} \bar{x} \right)^{\top} \left( \Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1} \right) \left( \Sigma_0^{-1} + n \Sigma^{-1} \right) \left( \mu - \left( \Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1} \mu_0 + n \Sigma^{-1} \bar{x} \right) \right) \right).
\]
Using $(Ax)^\top = x^\top A^\top$ and the fact that covariance matrices (and their inverses) are symmetric and hence invariant under transposition, we obtain
\[
\left( \left( \Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1} \mu_0 + n \Sigma^{-1} \bar{x} \right) \right)^{\top} = \left( \Sigma_0^{-1} \mu_0 + n \Sigma^{-1} \bar{x} \right)^{\top} \left( \Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1}.
\]
So we have
\[
f_{\mu \mid x_1, \dots, x_n}(\mu) \propto \exp\left( -\frac{1}{2} \left( \mu - \left( \Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1} \mu_0 + n \Sigma^{-1} \bar{x} \right) \right)^{\top} \left( \Sigma_0^{-1} + n \Sigma^{-1} \right) \left( \mu - \left( \Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1} \mu_0 + n \Sigma^{-1} \bar{x} \right) \right) \right),
\]
which means the posterior distribution is a multivariate normal $N_d(\mu_*, \Sigma_*)$, where
\[
\mu_* = \left( \Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1} \mu_0 + n \Sigma^{-1} \bar{x} \right)
\qquad \text{and} \qquad
\Sigma_* = \left( \Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1}.
\]
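As a sanity check, the posterior mean and covariance can be computed directly in R (a minimal sketch, assuming $d = 2$, a made-up true mean $(1, -1)$, and illustrative hyperparameters; all variable names are ours):

set.seed(1)
n <- 50; d <- 2
Sigma <- matrix(c(1, 0.3, 0.3, 1), d, d)     # known covariance
mu0 <- rep(0, d); Sigma0 <- diag(d)          # prior hyperparameters
z <- matrix(rnorm(n * d), n, d)              # N(0, I) draws
x <- z %*% chol(Sigma) + matrix(c(1, -1), n, d, byrow = TRUE)  # rows ~ N((1,-1), Sigma)
xbar <- colMeans(x)
Sigma.star <- solve(solve(Sigma0) + n * solve(Sigma))          # posterior covariance
mu.star <- Sigma.star %*% (solve(Sigma0) %*% mu0 + n * solve(Sigma) %*% xbar)

For large $n$, mu.star is pulled towards xbar, matching the interpretation of $\mu_*$ as a precision-weighted average of prior mean and sample mean.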
Part (b)
We now derive the Jeffreys prior $\pi_J(\mu)$ for $\mu$. We have the likelihood function from above:
\[
L(x_1, \dots, x_n \mid \mu) = \frac{1}{(2\pi)^{nd/2} \, |\Sigma|^{n/2}} \exp\left( -\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu) \right).
\]
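This likelihood can be evaluated numerically as a check (a minimal sketch, assuming the mvtnorm package is installed; the dimensions, seed and variable names are ours):

library(mvtnorm)                             # assumed available; provides rmvnorm/dmvnorm
set.seed(2)
d <- 2; n <- 10
Sigma <- diag(d); mu <- c(0.5, -0.5)
x <- rmvnorm(n, mean = mu, sigma = Sigma)
quad <- sum(apply(x, 1, function(xi) t(xi - mu) %*% solve(Sigma) %*% (xi - mu)))
logL <- -n * d / 2 * log(2 * pi) - n / 2 * log(det(Sigma)) - quad / 2
all.equal(logL, sum(dmvnorm(x, mean = mu, sigma = Sigma, log = TRUE)))  # TRUE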
Before differentiating, we record a lemma on quadratic forms. Writing $x^\top A x = \sum_{i,j} a_{ij} x_i x_j$, the $i$-th coordinate contributes one $x_i^2$ term and the rest are $x_i x_j$ terms with $j \neq i$, so
\[
\frac{\partial}{\partial x_i} \, x^\top A x = 2 a_{ii} x_i + \sum_{j \neq i} (a_{ij} + a_{ji}) x_j,
\qquad \text{i.e.} \qquad
\frac{d}{dx} \, x^\top A x = x^\top \left( A + A^\top \right).
\]
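The lemma is easy to sanity-check numerically against a central-difference gradient (a quick sketch; the test matrix and point are arbitrary choices of ours):

A <- matrix(c(2, 1, 0, 3), 2, 2)             # a non-symmetric test matrix
x <- c(0.5, -1.2)
f <- function(v) as.numeric(t(v) %*% A %*% v)
eps <- 1e-6
grad.num <- sapply(1:2, function(i) {
  e <- replace(numeric(2), i, eps)
  (f(x + e) - f(x - e)) / (2 * eps)          # central difference
})
grad.ana <- as.vector((A + t(A)) %*% x)      # from the lemma
all.equal(grad.num, grad.ana, tolerance = 1e-6)  # TRUE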
Taking logarithms and keeping only the terms that involve $\mu$ (the rest are constant in $\mu$),
\[
\log L = c - \frac{n}{2} \left( \mu^\top \Sigma^{-1} \mu - \bar{x}^\top \Sigma^{-1} \mu - \mu^\top \Sigma^{-1} \bar{x} \right).
\]
Using the above lemma, with the fact that $\Sigma$ and hence $\Sigma^{-1}$ are symmetric, and noting that $\frac{d}{dx} \, x^\top A = \frac{d}{dx} \, A^\top x = A$, where $A$ is an $n \times k$ matrix with $k \in \mathbb{Z}^+$ (a result we can confirm easily using summation notation):
\[
\frac{d}{d\mu} \log L = -n \left( \mu^\top \Sigma^{-1} - \bar{x}^\top \Sigma^{-1} \right),
\qquad
\frac{d^2}{d\mu^2} \log L = -n \Sigma^{-1}.
\]
Hence the Jeffreys prior is
\[
\pi_J(\mu) \propto \left( \det \left( \mathbb{E}\left[ -\frac{d^2}{d\mu^2} \log L \right] \right) \right)^{1/2} = \left( \det \left( n \Sigma^{-1} \right) \right)^{1/2} \propto 1,
\]
since the square root of the determinant of a constant matrix is itself a constant. We see that the Jeffreys prior for the multivariate normal distribution with fixed covariance matrix and unknown mean vector is simply proportional to a constant. This result is similar to the one for a univariate Gaussian distribution with fixed variance, which also has a constant (improper) distribution as its Jeffreys prior.
Question 3
Part (a)
We know that $p$ is an estimate of the ratio of the area of the circle to the area of the square. This ratio's true value is
\[
\frac{\pi r^2}{(2r)^2} = \frac{\pi}{4},
\]
so $4p$ is an estimate of $\pi$.
Part (b)
> n <- 1000
> x1 <- runif(n, -1, 1)          # x-coordinates, uniform on [-1, 1]
> x2 <- runif(n, -1, 1)          # y-coordinates, uniform on [-1, 1]
> ind <- ((x1^2 + x2^2) < 1)     # indicator: point falls inside the unit circle
> pi.hat <- 4 * (sum(ind) / n)   # 4 times the proportion inside estimates pi
> pi.hat
[1] 3.156
Part (c)
We know that $b_i$ is a Bernoulli r.v. with success probability $\frac{\pi}{4}$, so its variance is $\frac{\pi}{4}\left(1 - \frac{\pi}{4}\right)$. Then the sampling variability of $\hat{\pi} = 4 \sum_{i=1}^{n} \frac{b_i}{n}$ is simply
\[
\operatorname{Var}(\hat{\pi}) = 4^2 \cdot n \cdot \frac{\pi}{4}\left(1 - \frac{\pi}{4}\right) \cdot \frac{1}{n^2} = \frac{\pi (4 - \pi)}{n},
\]
so by the Central Limit Theorem we have
\[
\hat{\pi} \xrightarrow{d} N\left( \pi, \; \frac{\pi (4 - \pi)}{n} \right).
\]
Note: we can re-write this in terms of $p$ as
\[
\hat{\pi} \xrightarrow{d} N\left( 4p, \; \frac{16 p (1 - p)}{n} \right).
\]
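The variance formula can be checked by simulation (a sketch; the number of replicates and the seed are our choices):

set.seed(3)
n <- 1000; reps <- 5000
est <- replicate(reps, {
  x1 <- runif(n, -1, 1); x2 <- runif(n, -1, 1)
  4 * mean(x1^2 + x2^2 < 1)                  # one realisation of pi.hat
})
var(est)                                     # empirical sampling variance
pi * (4 - pi) / n                            # theoretical value, about 0.0027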
Part (d)
n <- 1000
p <- 0.7854                          # true value of pi/4 (to four decimal places)
pi.true <- 4 * p                     # CLT mean (renamed so the built-in pi is not masked)
pi.var <- 16 * p * (1 - p) / n       # CLT variance (renamed so base::var is not masked)
pi.hat <- numeric(1000)              # storage for 1000 replicate estimates
for (i in 1:1000) {
  x1 <- runif(n, -1, 1)
  x2 <- runif(n, -1, 1)
  ind <- ((x1^2 + x2^2) < 1)         # indicator: point falls inside the unit circle
  pi.hat[i] <- 4 * (sum(ind) / n)
}
hist(pi.hat, breaks = 20, freq = FALSE)
x <- seq(min(pi.hat), max(pi.hat), length = 100)
y <- dnorm(x, mean = pi.true, sd = sqrt(pi.var))
points(x, y, type = "l")             # overlay the CLT normal density
[Figure: "Histogram of pi.hat" — density-scale histogram of pi.hat with the normal density overlaid.]
We can see that the histogram fits the overlaid normal density fairly well.
Part (e)
We know that the variance is given by $\frac{16 p (1 - p)}{n}$, which is maximised at $p = 0.5$, giving $\frac{4}{n}$. We choose to maximise the variance since this will result in maximal Monte Carlo sampling variability, which is what we need for the most conservative estimate of the sample size $n$ required to estimate $\pi$ to within 0.01 with at least 95\% probability. We require
\[
P\left( |\hat{\pi} - \pi| \leq 0.01 \right) \geq 0.95.
\]
Under the normal approximation, $|\hat{\pi} - \pi| \leq 1.96 \sqrt{4/n}$ with probability 0.95, so it suffices that
\[
1.96 \sqrt{\frac{4}{n}} \leq 0.01
\quad \Longleftrightarrow \quad
\sqrt{n} \geq \frac{2 \times 1.96}{0.01} = 392
\quad \Longleftrightarrow \quad
n \geq 392^2 = 153664.
\]
So $n = 153664$ is a conservative sample size for estimating $\pi$ to within 0.01 with at least 95\% probability. To be even more conservative, we could round up to 160000 samples (after all, we are using a normal approximation).
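The same arithmetic in R (a sketch, using the z-value 1.96 from the derivation above):

z <- 1.96                            # normal quantile used above
tol <- 0.01                          # target accuracy for pi
n.req <- ceiling((z * 2 / tol)^2)    # worst-case sd is 2/sqrt(n); solve z * 2/sqrt(n) <= tol
n.req                                # 153664

Using qnorm(0.975) = 1.959964 instead gives n = 153659, so 153664 is indeed on the conservative side.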