Inference Quals 1992-2019


Theoretical Statistics Monday, August 26 2019

Qualifying Exam

The exam is 4 hours long (from 1:00 PM until 5:00 PM). It is closed book/notes, but you
are allowed to bring up to four pages of cheat-sheets. Using laptops, smart phones, etc. is
also prohibited during the exam. In case you feel that a problem is incorrect or you need
more assumptions to solve it, please clearly explain your reasoning and the assumptions
you need, and then solve that problem.

1. Let θ_1, …, θ_m be i.i.d. Bernoulli(p), where p ∈ (0, 1) is assumed known, and suppose that our
data {X_i}_{i=1}^m follow the model

    X_i | θ_i ~ (1 − θ_i) f_0 + θ_i f_1,   for i = 1, …, m,

where f_0 and f_1 are known density functions. The X_i's are assumed to be drawn
independently. This model is widely used in multiple testing problems, where θ_i is zero
(one) if the i-th null hypothesis is true (false), and X_i can be thought of as the z-score
(which can be assumed to have a normal distribution under the null, i.e., f_0 is N(0, 1)).
We are interested in inference about the unknown θ = (θ_1, …, θ_m) ∈ {0, 1}^m based
on X = (X_1, …, X_m). This involves solving m decision problems simultaneously and
is called a compound decision problem. Let δ = (δ_1, …, δ_m) ∈ {0, 1}^m be a general
decision rule (i.e., δ_i = 0 means we believe that θ_i = 0). This naturally gives rise to a
weighted classification problem with loss function

    L_λ(θ, δ) := (1/m) Σ_{i=1}^m [ λ I(θ_i = 0) δ_i + I(θ_i = 1)(1 − δ_i) ],        (1)

where λ > 0 is the relative weight for a false positive, and I(·) denotes the indicator
function. The weighted classification problem is then to find the δ that minimizes the
classification risk E[L_λ(θ, δ)].

(a) (3 marks) Find the marginal distribution of X_i, for i = 1, …, m.

(b) (3 marks) Find the posterior distribution of θ given X.

(c) (6 marks) Find the Bayes rule and hence the decision rule δ that minimizes the
classification risk, with L_λ given in (1).

(d) (6 marks) Show that the minimum classification risk is

    R*_λ := inf_δ E[L_λ(θ, δ)] = p + ∫_K [ λ(1 − p) f_0(x) − p f_1(x) ] dx,

where K = {x ∈ R : λ(1 − p) f_0(x) < p f_1(x)}.
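
The oracle rule from part (c) and the risk formula from part (d) are easy to check by simulation. Below is a minimal sketch (my own illustration, not part of the exam) with assumed values f_0 = N(0, 1), f_1 = N(2, 1), p = 0.2, and λ = 1: the rule δ_i = 1{p f_1(x_i) > λ(1 − p) f_0(x_i)} is applied to simulated data and its empirical loss is compared with R*_λ evaluated by a Riemann sum.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m, p, lam, mu1 = 200_000, 0.2, 1.0, 2.0     # problem size, P(theta=1), FP weight, alt. mean

# Two-groups model with f0 = N(0,1) and f1 = N(mu1,1) (assumed for illustration).
theta = rng.binomial(1, p, size=m)
x = rng.normal(loc=mu1 * theta, scale=1.0)

f0 = norm.pdf(x)               # null density at the observations
f1 = norm.pdf(x, loc=mu1)      # alternative density at the observations

# Bayes rule from part (c): delta_i = 1 iff p f1(x_i) > lam (1-p) f0(x_i).
delta = (p * f1 > lam * (1 - p) * f0).astype(int)
emp_loss = np.mean(lam * (theta == 0) * delta + (theta == 1) * (1 - delta))

# Minimum risk from part (d): R* = p + int_K [lam(1-p)f0(x) - p f1(x)] dx,
# where K = {x : lam(1-p)f0(x) < p f1(x)}; evaluated by a Riemann sum on a grid.
grid = np.linspace(-10.0, 12.0, 200_001)
dx = grid[1] - grid[0]
integrand = lam * (1 - p) * norm.pdf(grid) - p * norm.pdf(grid, loc=mu1)
risk_star = p + integrand[integrand < 0].sum() * dx

print(f"empirical loss of the Bayes rule: {emp_loss:.4f}")
print(f"R*_lambda from the formula in (d): {risk_star:.4f}")
```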

2. Independent identically-distributed observations X_1, …, X_n take values in the set S = {1, 2},
with a common distribution which under the null hypothesis is of the form

    P(X_1 = k) = π_k,

for some fixed π_k ∈ (0, 1). Under the alternative hypothesis, the probability mass
function of X_i is unrestricted.

(a) (7 marks) Calculate the likelihood ratio test statistic for this problem and call it LR.
(Footnote: for Z ~ p_θ(z) the likelihood ratio statistic is defined as

    LR = 2 log [ max_{θ ∈ Θ} p_θ(z) / max_{θ ∈ Θ_0} p_θ(z) ],

where Θ_0 is the null set and Θ is the entire set of parameters.)

(b) (8 marks) Define N_j = Σ_{i=1}^n I(X_i = j), and prove that under the null hypothesis

    LR − Σ_{j=1}^2 (N_j − n π_j)² / (n π_j) = o_p(1).

(c) (4 marks) Characterize the limiting distribution of Σ_{j=1}^2 (N_j − n π_j)² / (n π_j) under the null
hypothesis.
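
The asymptotic equivalence in part (b) can be seen numerically before proving it. A small sketch (assumed null probabilities π = (0.3, 0.7) and n = 10,000) comparing the LR statistic 2 Σ_j N_j log(N_j/(n π_j)) with the Pearson statistic:

```python
import numpy as np

rng = np.random.default_rng(1)
n, pi = 10_000, np.array([0.3, 0.7])   # assumed null probabilities

x = rng.choice([1, 2], size=n, p=pi)
N = np.array([(x == 1).sum(), (x == 2).sum()])

# Likelihood ratio statistic: the unrestricted MLE of P(X = j) is N_j / n.
with np.errstate(divide="ignore", invalid="ignore"):
    LR = 2 * np.sum(np.where(N > 0, N * np.log(N / (n * pi)), 0.0))

# Pearson chi-square statistic with 2 cells.
pearson = np.sum((N - n * pi) ** 2 / (n * pi))

print(f"LR = {LR:.4f}, Pearson = {pearson:.4f}, difference = {LR - pearson:.2e}")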

3. Let X_1, …, X_n be a simple random sample (i.i.d.) from a parametric family {f_θ : θ ∈ Θ ⊂ R},
which satisfies the usual regularity conditions (e.g., log f_θ is continuously differentiable
with respect to θ and the derivatives have finite moments). Let ρ(u, x) be a twice
differentiable convex loss function satisfying, for any θ, that E_θ ρ(u, X_i), as a function
of u, is minimized at u = θ. Here, E_θ denotes the expectation under the density f_θ. Define
the corresponding M-estimator

    θ̃ = arg min_θ Σ_{i=1}^n ρ(θ, X_i).

Assuming sufficient smoothness and moment conditions (the consistency of θ̃ can be
assumed as well), we have the following representation

    θ̃ − θ_0 = (1/n) Σ_{i=1}^n η(X_i) + o_p(1/√n),        (1)

where θ_0 denotes the true parameter value and η(x) is known as the influence function.

(a) (9 marks) Suppose that θ̃ →_p θ_0, and derive the influence function η in terms of ρ
(and its derivatives).

(b) (10 marks) A special case of the M-estimator is the MLE (ρ = −log f_θ). Let θ̂ denote
the MLE and write, similarly to (1),

    θ̂ − θ_0 = (1/n) Σ_{i=1}^n ξ(X_i) + o_p(1/√n).        (2)

You can make any reasonable assumption about the family of distributions. Show
that E_{θ_0}{ [η(X_i) − ξ(X_i)] ξ(X_i) } = 0.
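
One way to sanity-check the representation (1) is to simulate an M-estimator whose asymptotic variance can also be obtained from the influence function. The sketch below (my own illustration, not part of the exam) uses a Huber-type location loss ρ(u, x) = ρ_c(x − u) for N(0, 1) data; writing ψ = ρ_c′, the representation suggests the influence function ψ(x − θ_0)/E[ψ′(X − θ_0)], so the variance of √n(θ̃ − θ_0) should be close to E[ψ²]/(E[ψ′])².

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(2)
c, n, reps, theta0 = 1.345, 500, 2000, 0.0   # Huber cutoff, sample size, replications, truth

def psi(z):
    # psi(z) = rho_c'(z): the clipped identity.
    return np.clip(z, -c, c)

def m_estimate(x):
    # Solve the estimating equation sum_i psi(x_i - u) = 0 (its left side decreases in u).
    g = lambda u: psi(x - u).sum()
    return brentq(g, x.min() - 1.0, x.max() + 1.0)

est = np.array([m_estimate(rng.normal(theta0, 1.0, n)) for _ in range(reps)])
emp_var = n * est.var()

# Sandwich variance E[psi^2] / (E[psi'])^2, with psi'(z) = 1{|z| < c}, by Monte Carlo.
z = rng.normal(0.0, 1.0, 1_000_000)
sandwich = np.mean(psi(z) ** 2) / np.mean(np.abs(z) < c) ** 2

print(f"empirical variance of sqrt(n)(theta_tilde - theta0): {emp_var:.3f}")
print(f"E[psi^2] / (E[psi'])^2:                              {sandwich:.3f}")
```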

4. Suppose that (X_1, X_2, …, X_n) has a multivariate Gaussian distribution with E(X_i) = θ,
for some unknown θ ∈ (−∞, ∞), and an n × n known covariance matrix Σ.

(a) (6 marks) Suppose that T_n is a sequence of consistent estimators for θ, and U_n is
a sequence of sufficient statistics. Show that there exists a sequence of consistent
estimators which is a function of U_n.
Hint: Note that E_θ(T_n) may not be defined.

(b) (3 marks) Assume that Σ(i, j) = min(i, j), and show that Σ^{-1} 1 = e, where 1 is the
all-ones vector and e is the first canonical vector given by e(i) = I(i = 1); I(·) denotes
the indicator function.

(c) (6 marks) Under the same assumption as in part (b), show that there does not
exist any consistent sequence of estimators for θ.

(d) (7 marks) Now assume that Σ(i, j) := e^{−|i−j|}. Does there exist any consistent
sequence of estimators for θ in this case?
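
For part (b), note that Σ e is the first column of Σ, which equals (min(i, 1))_i = 1, the all-ones vector; hence Σ^{-1} 1 = e. The short sketch below just confirms this numerically for an assumed dimension n = 8.

```python
import numpy as np

n = 8
i, j = np.indices((n, n)) + 1
Sigma = np.minimum(i, j).astype(float)        # Sigma(i, j) = min(i, j)

e1 = np.zeros(n); e1[0] = 1.0
print(np.allclose(np.linalg.solve(Sigma, np.ones(n)), e1))   # True: Sigma^{-1} 1 = e
print(np.allclose(Sigma @ e1, np.ones(n)))                   # equivalently Sigma e = 1
```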

5. Let T_i ∈ Z_+ = {0, 1, 2, 3, …}, for i from 1 to n, be independent and identically distributed
non-negative random variables with probability mass function f_θ(t), where θ is an
unspecified one-dimensional parameter. For a fixed c ∈ Z_+, define the following two
random variables:

    T_i^c = min(T_i, c),        δ_i^c = I(T_i ≤ c).        (2)

Let L_θ^c denote the likelihood for the T_i^c and δ_i^c, i from 1 to n. Let S_θ^c denote the score
for θ in this model: S_θ^c is the derivative with respect to θ of the logarithm of L_θ^c,
evaluated at the true value of θ.

(a) (6 marks) Calculate the conditional distribution of T_i^c given δ_i^c. Simplify the
formula

    L_θ^c = Π_{i=1}^n p_θ^{T_i^c | δ_i^c}(t_i^c | δ_i^c) · p_θ^{δ_i^c}(δ_i^c)

as much as you can.

(b) (6 marks) Calculate the score S_θ^c. Use your formula to calculate the Fisher information.
Simplify the expression as much as you can, and denote it by I_{θ,θ}^c.
Feel free to make reasonable assumptions about f_θ(t) (e.g., smoothness, common support).

(c) (6 marks) Let c_1 ≤ c_2. Calculate E(S_θ^{c_1} S_θ^{c_2}).

(d) (4 marks) Suppose that for c_1 < c_2 we have (1/n) I_{θ,θ}^{c_1} = t_1 and (1/n) I_{θ,θ}^{c_2} = t_2. Characterize
the limiting joint distribution of ( (1/√n) S_θ^{c_1}, (1/√n) S_θ^{c_2} ). Clarify any assumptions you
require to obtain the limiting distribution.

Inference Qual (60 points)
Date: August 27, 2018

1. 2+4+4=10 points
Suppose (X_1, …, X_n) are i.i.d. N(μ, σ²), and given (X_1, …, X_n) we have (Y_1, …, Y_n)
mutually independent, with Y_i ~ N(β X_i, 1), where β ∈ R is an unknown parameter.
You observe both (X_1, …, X_n) and (Y_1, …, Y_n), and you want to test H_0: β = 0
versus H_1: β > 0 at level α, where α ∈ (0, 1).
(a) Suppose μ = 0 and σ = 1 are known. Show that there is no UMP level α test for this
problem.
(b) Does there exist a UMPU level α test for the problem in part (a)?
(c) Now suppose μ ∈ R and σ > 0 are both unknown. Does there exist a UMPU
level α test for this problem?

2. 4+6=10 points
Let X ~ N(θ, 1) and θ ~ π(θ). Our goal is to estimate the parameter θ. Let π ∈ Γ,
where Γ is the class of priors on R with

    ∫ θ π(dθ) = 0,        ∫ θ² π(dθ) = 1.

For any estimator δ, let

    R(π, δ) := E(δ(X) − θ)² = ∫ [ (1/√(2π)) ∫ (δ(x) − θ)² e^{−(x−θ)²/2} dx ] π(dθ)

denote the Bayes risk of the estimator δ.

(a) Setting δ_{a,b}(X) := aX + b, find arg min_{a,b} R(π, δ_{a,b}).

(b) Compute the value of sup_{π ∈ Γ} inf_δ R(π, δ), where the infimum is taken over all
estimators δ(·).
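
For part (a), note that R(π, δ_{a,b}) depends on π only through its first two moments: writing X = θ + Z with Z ~ N(0, 1) independent of θ, one gets E((a − 1)θ + aZ + b)² = (a − 1)² + a² + b² for any π ∈ Γ. A minimal Monte Carlo sketch (my own check, using the assumed prior N(0, 1), which lies in Γ) confirming where this is minimized:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200_000
theta = rng.normal(0.0, 1.0, N)       # an assumed prior in Gamma (mean 0, variance 1)
x = theta + rng.normal(0.0, 1.0, N)   # X | theta ~ N(theta, 1)

def bayes_risk(a, b):
    return np.mean((a * x + b - theta) ** 2)

# Grid search over (a, b); the formula (a-1)^2 + a^2 + b^2 predicts a = 1/2, b = 0, risk 1/2.
a_grid = np.linspace(0.0, 1.0, 21)
b_grid = np.linspace(-0.5, 0.5, 21)
risks = np.array([[bayes_risk(a, b) for b in b_grid] for a in a_grid])
ia, ib = np.unravel_index(risks.argmin(), risks.shape)
print(f"argmin ~ (a, b) = ({a_grid[ia]:.3f}, {b_grid[ib]:.3f}), risk = {risks[ia, ib]:.4f}")
```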

3. 3+3+4=10 points
Let (X_1, Y_1), …, (X_n, Y_n) be i.i.d. bivariate Gaussian random vectors with E(X_i) =
E(Y_i) = 0, Var(X_i) = Var(Y_i) = 1, and Cov(X_i, Y_i) = ρ.
(a) Find, with proof, a minimal sufficient statistic for ρ.
(b) Is the statistic you suggested in part (a) a complete sufficient statistic? Prove
your answer.
(c) Is (1/n) Σ_{i=1}^n X_i Y_i the UMVUE for ρ? Prove your answer.

4. 3+4+3=10 points
Suppose (Y_1, …, Y_n) are i.i.d. N(θ, 1) with θ ∈ R unknown, and Z ~ N(0, 1) is independent
of (Y_1, …, Y_n). Suppose you observe (X_1, …, X_n), where X_i := Y_i + Z. Note that
(X_1, …, X_n) are not i.i.d.

(a) Show that (X_1, …, X_n) is multivariate Gaussian, by computing the moment
generating function E e^{Σ_{i=1}^n t_i X_i}.
(b) Find a minimax estimator for θ.
(c) Is your minimax estimator consistent for θ?

Hint: A multivariate normal distribution with mean vector μ and covariance matrix
Σ has joint density

    (1/√((2π)^n |Σ|)) exp( −(x − μ)′ Σ^{-1} (x − μ)/2 ).

5. 3+2+5=10 points
Suppose (X_1, …, X_n) are i.i.d. N(0, 1). Independently of (X_1, …, X_n), let Y_0 ~ N(0, 1),
and define Y_i = ρ Y_{i−1} + X_i for i = 1, …, n. Here ρ ∈ (−1, 1) is an unknown
parameter. Suppose you only observe (Y_0, …, Y_n).

(a) Starting from the joint density of (Y_0, X_1, …, X_n), find the joint density of
(Y_0, …, Y_n).
(b) Find the MLE for ρ.
(c) Find a non-degenerate asymptotic distribution of the MLE when the true parameter
is ρ = 0.

Hint: Note that here (Y_0, …, Y_n) are not i.i.d. for general ρ ∈ (−1, 1), and so
the result from class on asymptotic normality of the MLE does not apply.
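
A small simulation sketch for parts (b)-(c) (my own check, under the assumption that maximizing the joint density from part (a) over ρ gives the least-squares form ρ̂ = Σ_i Y_{i−1} Y_i / Σ_i Y_{i−1}²): when the true ρ = 0, the standardized MLE √n ρ̂ should look approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, rho = 1000, 2000, 0.0          # sample size, replications, true parameter

def mle_rho(y):
    # rho_hat = sum_i y_{i-1} y_i / sum_i y_{i-1}^2 (least-squares form of the MLE).
    return np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])

stats = []
for _ in range(reps):
    x = rng.normal(size=n)
    y = np.empty(n + 1)
    y[0] = rng.normal()                 # Y_0 ~ N(0, 1)
    for i in range(1, n + 1):
        y[i] = rho * y[i - 1] + x[i - 1]
    stats.append(np.sqrt(n) * mle_rho(y))

stats = np.array(stats)
print(f"mean = {stats.mean():.3f}, sd = {stats.std():.3f}  (compare with N(0, 1))")
```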

6. 4+6=10 points
Suppose that for every n ≥ 1 we have two probability measures P_n and Q_n on R with densities
p_n(·) and q_n(·) respectively, with respect to Lebesgue measure. Assume that p_n(·) and
q_n(·) are strictly positive on the whole of R.

(a) If the random variable log p_n(X_n) − log q_n(X_n) is O_p(1) under both the probability
measures P_n and Q_n, show that P_n and Q_n are mutually contiguous.
(b) Conversely, if P_n and Q_n are mutually contiguous, show that the random variable
log p_n(X_n) − log q_n(X_n) is O_p(1) under both the probability measures P_n and Q_n.

Hint: We say a sequence of real-valued random variables Y_n is O_p(1) if

    lim sup_{K → ∞} lim sup_{n → ∞} P(|Y_n| > K) = 0,

or, equivalently, if for any sequence of non-negative reals K_n converging to ∞ we have

    lim sup_{n → ∞} P(|Y_n| > K_n) = 0.

Statistics Inference Qualifying Exam
Time 1:00 pm-5:00 pm Date: August 21, 2017
1. Suppose {X_ij, 1 ≤ i ≤ n, 1 ≤ j ≤ p} are i.i.d. N(0, 1), and given (X_ij, 1 ≤ i ≤ n, 1 ≤ j ≤ p),
we have Y_i ~ N( Σ_{j=1}^p β_j X_ij, 1 ) with (Y_1, …, Y_n)
mutually independent, where β ∈ R^p is an unknown parameter of interest
and p is fixed. The problem is to estimate β. The proposal is
to construct an estimate by minimizing the function

    L(β) := Σ_{i=1}^n ( Y_i − Σ_{j=1}^p β_j X_ij )² + λ_n Σ_{j=1}^p β_j²,

where λ_n ≥ 0.

(a) Show that there is a unique minimizer β̂ of the function β ↦ L(β), and compute it explicitly.

(b) Find the limit in probability of β̂ when λ_n/n converges to the
following limits: {0, 5, ∞}.

(c) Find the limit in distribution of √n(β̂ − β) when λ_n/√n converges
to the following limits: {0, 5}.

(d) Between the MLE for β and β̂ with λ_n = 5√n, which estimator
would you recommend?
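
A quick numerical sketch consistent with part (a): setting the gradient of L(β) to zero suggests the ridge-type solution β̂ = (XᵀX + λ_n I)⁻¹ XᵀY, which is unique whenever λ_n > 0 (and, almost surely, also when λ_n = 0 and n ≥ p). The code below (assumed illustrative values p = 3 and β = (1, −2, 0.5)) probes the limits in part (b).

```python
import numpy as np

rng = np.random.default_rng(5)
p = 3
beta = np.array([1.0, -2.0, 0.5])           # assumed true parameter for illustration

def ridge(n, lam):
    X = rng.normal(size=(n, p))
    Y = X @ beta + rng.normal(size=n)
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

n = 200_000
for c in [0.0, 5.0]:
    print(f"lambda_n/n = {c}: beta_hat = {np.round(ridge(n, c * n), 3)}")
# Since X'X/n -> I and X'Y/n -> beta, the computation above suggests the limit beta/(1 + c)
# when lambda_n/n -> c; for lambda_n/n -> infinity the estimator is driven to 0.
print(f"lambda_n = n^2 (so lambda_n/n -> inf): beta_hat = {np.round(ridge(n, n**2), 5)}")
```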

2. Suppose X ~ N(θ, 1) where θ ∈ R is an unknown parameter, and
δ_a(X) = a + X for some a ∈ R.

(a) Show that no matter what a ∈ R is, the estimator δ_a(X) is not a
Bayes estimator for any prior on θ.
(b) Let π_n be a sequence of (non-normal) prior densities with respect
to Lebesgue measure, such that the corresponding sequence of
Bayes estimators δ_{π_n} converges in L² to δ_a(X), i.e.

    E(δ_{π_n}(X) − X − a)² = ∫_R [ ∫_R (δ_{π_n}(x) − x − a)² (1/√(2π)) e^{−(x−θ)²/2} dx ] π_n(θ) dθ → 0 as n → ∞.

Find the set of all possible values of a.

3. Suppose that W_1, …, W_N are i.i.d. random elements having a common
distribution P. We assume that P is unknown and θ ≡ θ(P) is a one-dimensional
parameter of interest. Further suppose that a natural
estimator θ̂_N converges in distribution at a rate r_N, i.e.,

    r_N (θ̂_N − θ_0) →_d G,        (1)

where G is non-normal, has mean zero and finite variance 1.
Assume that N is large and write N = n × m, where n is still large and
m relatively smaller (e.g., n = 10,000, m = 100, so that N = 10^6).
With such a large sample size N it might be difficult to compute θ̂_N
directly. We can define a new "averaged" estimator as follows:

(i) Divide the set of samples W_1, …, W_N into m disjoint subsets
S_1, …, S_m.
(ii) For each j = 1, …, m, compute the estimator θ̂_n(j) based on the
data points in S_j.
(iii) Average these estimators together to obtain the final "pooled"
estimator:

    θ̄_N = (1/m) Σ_{j=1}^m θ̂_n(j).        (2)

Assume that m is fixed and n → ∞, so that N → ∞ as well.

(a) Find a scaling under which (θ̄_N − θ) converges in distribution
to a non-degenerate limit distribution, and describe the limit
distribution as much as possible.
(b) Compare the asymptotic risks with respect to squared error loss
of the two estimators θ̂_N and θ̄_N, when r_N = N^γ for some γ > 0.
Assume if necessary that {[N^γ (θ̂_N − θ)]²}_{N ≥ 1} is uniformly integrable.
(c) On the basis of your answer in (b), which estimator would you
choose? (Your choice may depend on γ.)

4. (a) Let X_1, …, X_n be independently and identically distributed with a
specified continuous, strictly increasing distribution function F.
Define the Kolmogorov-Smirnov statistic by setting

    D_n := sup_{x ∈ R} | (1/n) Σ_{i=1}^n 1{X_i ≤ x} − F(x) |,

and show that its distribution does not depend on F.

(b) Suppose we want to test the null hypothesis that X_1, …, X_n are i.i.d.
F, where F = U(0, θ) for some θ > 0 which is unknown, versus
the alternative that F is not uniform. Letting J := arg max_{1 ≤ i ≤ n} X_i,
show that under H_0 the statistic (J, X_J) is sufficient for θ. Show
further that, given {J = j, X_j = t}, the random variables (X_i :
i ≠ j) are independent and have the U[0, t] distribution.

(c) How might you apply the Kolmogorov-Smirnov statistic to test
the hypothesis H_0? Cut-offs for the statistic can be obtained
numerically.
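
Part (a) says the null distribution of D_n is the same for every continuous, strictly increasing F. A small simulation sketch (my own illustration) comparing the empirical quantiles of D_n under two assumed choices of F, N(0, 1) and Exp(1):

```python
import numpy as np
from scipy.stats import norm, expon

rng = np.random.default_rng(6)
n, reps = 50, 20_000
i = np.arange(1, n + 1)

def dn_from_uniforms(u):
    # After the probability-integral transform U = F(X), D_n = sup_x |F_n(x) - F(x)|
    # is attained at the order statistics of the uniforms.
    u = np.sort(u, axis=1)
    return np.maximum(i / n - u, u - (i - 1) / n).max(axis=1)

d_norm = dn_from_uniforms(norm.cdf(rng.normal(size=(reps, n))))        # F = N(0, 1)
d_exp = dn_from_uniforms(expon.cdf(rng.exponential(size=(reps, n))))   # F = Exp(1)

for q in (0.5, 0.9, 0.95, 0.99):
    print(f"q = {q}: N(0,1) {np.quantile(d_norm, q):.4f}   Exp(1) {np.quantile(d_exp, q):.4f}")
```
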
5. Suppose X_1, X_2, X_3 are random variables which take values in {0, 1}.
Under H_0 we have X_1, X_2, X_3 i.i.d. ~ Bin(1, .5). Let θ ∈ (0, .5) be an
unknown parameter. Find a UMP level α test against the following
alternatives as explicitly as possible.

(a) Let I be such that

    P(I = 1) = P(I = 2) = P(I = 3) = 1/3,

and given I = i, the variables {X_1, X_2, X_3} are mutually independent
with X_i ~ Bin(1, .5 + θ) and X_k ~ Bin(1, .5) for k ∈ {1, 2, 3} \ {i}.

(b) Let I and J be mutually independent, such that

    P(I = 1) = P(I = 2) = P(I = 3) = 1/3,        P(J = ±1) = 1/2,

and given I = i, J = j, the variables {X_1, X_2, X_3} are mutually
independent with X_i ~ Bin(1, .5 + jθ) and X_k ~ Bin(1, .5) for
k ∈ {1, 2, 3} \ {i}.

6. Let π and σ be two independent permutations in S_n, each uniformly
distributed. Define

    A_n := (1/(n(n − 1))) Σ_{i ≠ j} 1{ (π(i) − π(j))(σ(i) − σ(j)) > 0 }.

(a) Show that the distribution of A_n is exactly, i.e., non-asymptotically,
the same as that of B_n, where

    B_n := (1/(n(n − 1))) Σ_{i ≠ j} 1{ (X_i − X_j)(Y_i − Y_j) > 0 },

Theoretical Statistics Qualifying Exam
Date: August 22, 2016
1. Suppose X_1, …, X_n are i.i.d. Poi(λ) for some λ > 0.
(a) If the loss is L(δ(X), λ) = [δ(X) − λ]², then find the minimax risk.
(b) If the loss is L(δ(X), λ) = (1/λ)[δ(X) − λ]², then find the minimax risk.

2. Decide whether P_n is contiguous to Q_n (i.e., Q_n dominates P_n) in the following examples.

(a) Suppose {X_i}_{i ≥ 1} and {Y_i}_{i ≥ 1} are mutually independent, with X_i ~ Exp(1) and Y_i ~
Exp(θ_i). Let P_n and Q_n denote the laws of (X_1, …, X_n) and (Y_1, …, Y_n) respectively,
and assume that Σ_{i=1}^∞ [θ_i − 1]² < ∞.

(b) Let P_n be the law of Bin(n, p_n), and Q_n be the law of Poi(λ), where n p_n converges to
λ > 0.

(c) Let P_n be the law of (2 Bin(n, 1/2) − n)/√n, and Q_n be the law of N(0, 1).

3. Suppose that we want to model the survival of twins with a common genetic effect, but with
one of the two twins receiving some treatment. Let X represent the survival time of the
untreated twin and let Y represent the survival time of the treated twin. One (overly simple)
preliminary model might be to assume that X and Y are independent with Exponential(λ)
and Exponential(λμ) distributions, i.e.,

    f_{λ,μ}(x, y) = λ e^{−λx} · λμ e^{−λμy} 1_{(0,∞)}(x) 1_{(0,∞)}(y).

(a) One crude approach to estimation in this problem is to reduce the data to W = X/Y.
Find the distribution of W. Hence compute the Cramér-Rao lower bound for unbiased
estimators of μ based on W.
(b) Find the information bound for estimation of μ based on observation of the (X, Y) pairs,
when λ is known.
(c) Find the information bound for estimation of μ based on observation of the (X, Y) pairs,
when λ is unknown.
Hint: Recall that for a multivariate parameter, the CR lower bound is the corresponding
diagonal element of the inverse information matrix.
(d) Compare the bounds you computed in parts (b) and (c) above, and discuss the pros and
cons of reducing to estimation based on W.

4. Suppose X ~ N(μ_1, 1) and Y ~ N(μ_2, 1) are mutually independent, where μ_1 ≥ μ_2 ≥ 0.

(a) Express the MLE (μ̂_1, μ̂_2) as explicitly as possible in terms of (X, Y).
(b) Find the distribution of the likelihood ratio test statistic for testing H_0: μ_1 = μ_2 = 0.

5. Suppose X_1, …, X_n are i.i.d. f_θ(·), where f_θ(x) := (1/2)[φ(x − θ) + φ(x + θ)], with φ(·) the standard
normal density, and θ ≥ 0.
Let θ̂_n denote the MLE of θ. Assume that the true θ_0 = 0.

(a) Show that θ̂_n converges in probability to 0.

(b) Compute lim_{n→∞} P(θ̂_n = 0).

(c) Find α > 0 such that n^α θ̂_n converges to a non-trivial distribution.

Statistical inference theory Monday, August 17 2015

Qualification Exam

The exam is 4 hours long. It has 5 questions, but you have to answer ONLY FOUR of
them. If you answer more than 4, we will only grade problems 1 to 4. The exam is closed
book/notes. But you are allowed to bring up to three pages of formulas, theorems, etc.
Using laptops is prohibited during the exam.

1. (25 points) Answer the following questions:

(a) (5 points) Let f : R → R be a function such that f(0) = 0. Also, let X_n be a
sequence of random variables such that X_n →_p 0. Prove that if

    f(h) = o(|h|) as h → 0,

then f(X_n) = o_p(|X_n|).

(b) (5 points) Let X_1, X_2, …, X_n ~ N(θ, θ²), θ ∈ R. Either find a complete sufficient
statistic for this family or prove that a complete sufficient statistic does not exist.

(c) (5 points) We have X ~ p_θ, where θ ∈ R. We have a test function φ(x) whose
power function β(θ) = E_θ(φ(X)) has the following properties:
i. β(θ) is a differentiable, convex function on R.
ii. β(θ) takes its minimum at zero and β(0) = α < 1.
What else can you say about this test?

(d) (4 points) Let x ~ p_θ with θ ∈ Θ_0 ∪ Θ_1. Our goal is to test H_0: θ ∈ Θ_0 versus H_1:
θ ∈ Θ_1. For a test φ, let the power function be denoted by β(θ) = ∫ φ(x) dp_θ(x).
We have two priors π_0(θ) and π_1(θ) on Θ_0 and Θ_1 respectively. Our goal is to
find a test φ* that solves

    φ* = arg max_φ E_1 β(θ),        subject to E_0 β(θ) ≤ α,        (1)

where E_i β(θ) is the expected value with respect to θ ~ π_i. Can you derive φ*?
Clarify the assumptions you use in deriving the optimal test.

(e) (6 points) Let X_1, X_2, …, X_n be i.i.d. random variables with distribution F. Define
S as the set of all random variables of the form Σ_{i=1}^n g_i(X_i), where the g_i are arbitrary
measurable functions with E g_i²(X_i) < ∞. Given a random variable T with E(T) = 0
and E(T²) < ∞, we would like to find a random variable Z ∈ S that minimizes
E(Z − T)². Prove that the optimal Z is given by

    Z = Σ_{i=1}^n E(T | X_i).

2. (25 points) A statistician and a sportscaster are having an argument. The statistician
claims that Larry Bird (a basketball player) hits half his shots and that different
attempts are independent. The sportscaster insists that this is nonsense: when Bird
hits a shot he gains momentum and his next shot will go in with a chance θ > 1/2, but
if he misses a shot, his next shot will fall with chance 1 − θ < 1/2. To be specific, let
X_i = 1 if Bird makes his shot on attempt i and X_i = −1 if he misses. The sportscaster
believes that P(X_1 = 1) = 1/2 and that

    P(X_{i+1} = x_{i+1} | X_1 = x_1, …, X_i = x_i) = θ if x_i = x_{i+1},  and  1 − θ if x_i ≠ x_{i+1},

for i ≥ 1 and any choice of x_1, …, x_i ∈ {+1, −1}. The statistician's model is the
same with θ = 1/2.

(a) (4 points) Find the joint mass function of X_1, …, X_n and show that T =
Σ_{i=1}^{n−1} X_i X_{i+1} is a sufficient statistic.
(b) (4 points) Determine the form of the uniformly most powerful level-α test of
H_0: θ = 1/2 versus H_1: θ > 1/2.
(c) (4 points) Find the mean and variance of T under H_0.
(d) (5 points) Find the exact distribution of T when θ = 1/2.
(e) (4 points) Assuming that T is approximately normal with the mean and variance
you derived in part (c) (as it is, provided n is large), find a test with level
approximately 5% when n = 40.
(f) (4 points) As empirical evidence supporting his claim, the sportscaster reveals
the following data from a recent game:

    HHHM HHHH MMMMMM HM MM HM M HHM HHHHHH MMM HM MMM HM.

"The evidence is clear," claims the sportscaster. "Bird started off hitting 7 of his
first 8 shots. Then he had a cold spell, sinking only 2 of his next 13 attempts. He
found a groove and canned 8 of the next 9, but the momentum switched back and
he only made 2 more shots in his last 10 attempts." What do you think? Would
the uniformly most powerful test with level (approximately) 5% reject the null
hypothesis that θ = 1/2, i.e., that his shots are independent?
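
A short sketch for parts (e)-(f): under H_0 the summands X_i X_{i+1} are ±1 with mean zero and are uncorrelated, so E T = 0 and Var T = n − 1, and the approximate 5% UMP test rejects for T > 1.645 √(n − 1). The code below evaluates T for the quoted sequence of 40 shots.

```python
import numpy as np

shots = "HHHMHHHHMMMMMMHMMMHMMHHMHHHHHHMMMHMMMMHM"   # the 40 attempts from part (f)
x = np.array([1 if s == "H" else -1 for s in shots])
n = len(x)                                           # n = 40

T = int(np.sum(x[:-1] * x[1:]))                      # sufficient statistic from part (a)
z = T / np.sqrt(n - 1)                               # standardized: E T = 0, Var T = n - 1

print(f"n = {n}, T = {T}, z = {z:.3f}")
print("reject H0 at ~5%?", z > 1.645)
```

For this sequence T = 9 and z ≈ 1.44, which falls below the 1.645 cutoff, so the approximate 5% test would not reject.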

3. (25 points) Let X ~ N(θ, 1) where θ ∈ Θ = {−2, 2}.

(a) (3 points) Is |X| a sufficient statistic? Justify your answer.

(b) (4 points) Find the maximum likelihood estimator (MLE) of θ.

(c) (9 points) Let

    L(θ, θ̂) = 1 if θ̂ ≠ θ,  and  0 if θ̂ = θ.

Use a prior that puts equal probability on θ = −2 and θ = 2. Find the Bayes
estimator.

(d) (9 points) Find a minimax estimator.

4. (25 points) In this problem, when required, you may assume that limits and integrals,
and derivatives and integrals, may be interchanged. For a pdf f(y) define

    I(f) = ∫ f′(y)² / f(y) dy.

(a) (4 points) Let X ~ N(θ, σ²). For a differentiable function g, prove that if
g(x) e^{−(x−θ)²/(2σ²)} → 0 as x → ∞ or x → −∞, then

    E((X − θ) g(X)) = σ² E(g′(X)).

(b) (8 points) Suppose that θ ~ G(θ). Prove that

    E(θ | X) = X + σ² f′(X)/f(X),

where f(x) = ∫ (1/(√(2π) σ)) e^{−(x−θ)²/(2σ²)} dG(θ).

(c) (7 points) Define the Bayes risk B(G) = E(θ − E(θ | X))², where the expected
value is taken with respect to both X and θ. Prove that

    B(G) = σ² (1 − σ² I(f)).

This identity is due to Larry Brown.

(d) (6 points) Prove that

    B(G) ≤ σ² var(G) / (σ² + var(G)),

where var(G) = ∫ (θ − E(θ))² dG(θ).
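
Parts (b)-(d) can be sanity-checked with a Gaussian prior, where everything is explicit: if G = N(0, τ²) then f is the N(0, σ² + τ²) density, I(f) = 1/(σ² + τ²), Brown's identity gives B(G) = σ² τ²/(σ² + τ²), and the bound in (d) holds with equality. A minimal Monte Carlo sketch (assumed values σ = 1, τ = 2):

```python
import numpy as np

rng = np.random.default_rng(7)
sigma, tau, N = 1.0, 2.0, 2_000_000       # assumed noise sd, prior sd, Monte Carlo size

theta = rng.normal(0.0, tau, N)           # theta ~ G = N(0, tau^2)
x = theta + rng.normal(0.0, sigma, N)     # X | theta ~ N(theta, sigma^2)

# Tweedie / part (b): with f the N(0, sigma^2 + tau^2) density, f'(x)/f(x) = -x/(sigma^2+tau^2),
# so E(theta | X) = X * tau^2 / (sigma^2 + tau^2).
post_mean = x * tau**2 / (sigma**2 + tau**2)

bayes_risk_mc = np.mean((theta - post_mean) ** 2)
brown = sigma**2 * (1 - sigma**2 / (sigma**2 + tau**2))   # sigma^2 (1 - sigma^2 I(f))
bound = sigma**2 * tau**2 / (sigma**2 + tau**2)           # bound from part (d)

print(f"Monte Carlo B(G): {bayes_risk_mc:.4f}")
print(f"Brown identity:   {brown:.4f}")
print(f"Bound in (d):     {bound:.4f}")
```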

5. (25 points) Consider a model for the joint distribution of two variables Y and Z in which
Z has a Bernoulli distribution with success probability η ∈ [0, 1] and the conditional
distribution of Y given Z = z is exponential with failure rate λ e^{βz}. Then Y and Z
have joint density

    f_θ(y, z) = λ e^{βz} exp(−λ e^{βz} y) η^z (1 − η)^{1−z},        z ∈ {0, 1}, y > 0,

where θ = (λ, β, η). This is a parametric version of the Cox proportional hazards
model, and the regression parameter β is of primary interest. Let (Y_i, Z_i), i = 1, …, n,
be i.i.d. observations from this model.

(a) (10 points) Determine the large-sample distribution of β̂, the maximum likelihood
estimator (MLE) of β.

(b) (5 points) Define

    ν = ν(θ) = P_θ(Y ≥ y_0 | Z = 1),

where y_0 is a fixed positive number, and consider estimation of ν using

    ν̂_1 = [ n^{-1} Σ_{i=1}^n 1{Y_i ≥ y_0, Z_i = 1} ] / [ n^{-1} Σ_{i=1}^n 1{Z_i = 1} ].

Why is this a reasonable estimator of ν?

(c) (10 points) Show that

    √n (ν̂_1 − ν) = (1/√n) Σ_{i=1}^n ψ(Y_i, Z_i) + o_p(1),

where

    ψ(y, z) = (1/(E Z)) [ 1{y ≥ y_0, z = 1} − ν 1{z = 1} ].

Determine explicitly the limit distribution of √n (ν̂_1 − ν).

Statistical inference theory Monday, August 18 2014

Qualification Exam

The exam is 5 hours long. It is closed book/notes. But you are allowed to bring up to
four pages of formulas, theorems, etc. Using laptops, smart phones, etc. is also prohibited
during the exam. There is NO optional question in the exam. Please try to answer as
many questions as you can.

1. (Short answer questions) (20 points)

(a) (5 points) Let X_1, X_2, … denote a sequence of random variables. We know that
X_n = O_p(1/√n). Let g : R → R be a continuous function with g(0) = 0. Prove that
g(X_n) = o_p(1). Can you prove a result of the form g(X_n) = o_p(n^{−α}) for some
α > 0? If yes, prove your claim for the largest value of α for which this statement
holds. If no, give a counter-example.

(b) (5 points) Consider the problem of estimating θ (θ ∈ Θ) from X ~ p_θ. Let R̄(Θ)
denote the minimax risk. Let P denote a class of prior distributions whose support is
Θ and, for π ∈ P, let B(π) denote the Bayes risk. Prove that

    sup_{π ∈ P} B(π) ≤ R̄(Θ).

(c) (5 points) Does p_θ = U(0, θ) belong to an exponential family? If yes, write it in
the standard form of exponential families. If no, then prove your answer. As
usual, θ is the parameter.

(d) (5 points) Let X ~ N(θ, 1). We would like to test H_0: |θ| ≤ 1 versus H_1: |θ| > 1.
We consider the following optimization problem:

    max_φ ∫ φ(x) p_θ(x) dμ(x),   for all θ ∈ {θ : |θ| > 1},
    s.t.  ∫ φ(x) p_1(x) dμ(x) = α,
          ∫ φ(x) p_{−1}(x) dμ(x) = α,

where p_θ(x) = (1/√(2π)) e^{−(x−θ)²/2}. Suppose that the above optimization problem has a
solution φ* that satisfies the α-significance-level constraint for every θ ∈ [−1, 1].
Prove that φ* is UMPU for H_0: |θ| ≤ 1 versus H_1: |θ| > 1.

2. Let X_1, …, X_n be independent and identically distributed random variables having
the exponential distribution Exp(λ) with density p(x | λ) = λ exp(−λx) for x, λ > 0.

(a) (3 points) Show that T_n = Σ_{i=1}^n X_i is minimal sufficient and complete for λ.

(b) (4 points) For given x > 0, it is desired to estimate the quantity θ = P(X_1 >
x | λ). Compute the Fisher information for θ.

(c) (6 points) State the Lehmann-Scheffé theorem. Show that the estimator θ̃_n of θ
defined by

    θ̃_n = 0 if T_n < x,   and   θ̃_n = (1 − x/T_n)^{n−1} if T_n ≥ x,

is the minimum-variance unbiased estimator of θ based on (X_1, …, X_n). Without
doing any computations, state whether or not the variance of θ̃_n achieves the
Cramér-Rao lower bound, justifying your answer briefly.

(d) (7 points) Let k ≤ n. Show that E(θ̃_k | T_n, λ) = θ̃_n.
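
A quick Monte Carlo check of part (c) (my own sketch, with assumed values λ = 1.3, x = 0.7, n = 5): the average of θ̃_n over many samples should match θ = P(X_1 > x) = e^{−λx}, while the naive plug-in estimator exp(−x · n/T_n) is biased.

```python
import numpy as np

rng = np.random.default_rng(8)
lam, x, n, reps = 1.3, 0.7, 5, 1_000_000    # assumed values for illustration

T = rng.exponential(scale=1.0 / lam, size=(reps, n)).sum(axis=1)   # T_n = sum of Exp(lam)

# theta_tilde_n = (1 - x/T_n)^{n-1} on {T_n >= x}, and 0 otherwise.
umvue = np.where(T >= x, np.clip(1.0 - x / T, 0.0, None) ** (n - 1), 0.0)
plug_in = np.exp(-x * n / T)                 # plug-in exp(-lambda_hat * x), for contrast

print(f"target  P(X1 > x) = exp(-lam*x) = {np.exp(-lam * x):.4f}")
print(f"mean of the unbiased estimator:   {umvue.mean():.4f}")
print(f"mean of the plug-in estimator:    {plug_in.mean():.4f}")
```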

3. A parameter θ has a prior distribution over R_+.

(a) (5 points) Find the general form of the Bayes estimate θ* and the Bayes loss b*
for the loss function ℓ(θ, θ̂) = (θ − θ̂)²/θ.
(b) (11 points) Suppose that X ~ Bin(n, θ). The loss function is ℓ(θ, a) = (θ −
a)²/{θ(1 − θ)}. Calculate the Bayes rule d*(X) for the prior θ ~ Unif[0, 1]. Find
its risk function. Is it minimax? Is it admissible?

4. Let X_1, X_2, …, X_n be a random sample from f(x | α, β) = β α^{−β} x^{β−1} I(0 ≤ x ≤ α),
where α > 0 and β > 0 are parameters.

(a) (10 points) Find the maximum likelihood estimates of α and β and show that
they are consistent.
(b) (8 points) Characterize the limiting distributions of α̂_MLE and β̂_MLE.
(c) (6 points) Construct an asymptotically valid 95% confidence interval for β and
show that the coverage probability will indeed converge to 0.95 as n → ∞.
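
A simulation sketch for part (a) (my own check, under the assumption that maximizing the likelihood gives α̂ = max_i X_i and β̂ = n / Σ_i log(α̂/X_i), with assumed true values α = 2, β = 3): both estimates should approach the truth as n grows.

```python
import numpy as np

rng = np.random.default_rng(9)
alpha, beta = 2.0, 3.0                         # assumed true values for illustration

for n in (100, 10_000, 1_000_000):
    x = alpha * rng.uniform(size=n) ** (1.0 / beta)   # CDF is (x/alpha)^beta on (0, alpha]
    a_hat = x.max()                                   # alpha_hat = max X_i
    b_hat = n / np.log(a_hat / x).sum()               # beta_hat = n / sum log(alpha_hat / X_i)
    print(f"n = {n:>9}: alpha_hat = {a_hat:.5f}, beta_hat = {b_hat:.5f}")
```
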
5. Consider X_1, X_2, …, X_n i.i.d. ~ N(θ, I), where X_i ∈ R^k. We know that θ belongs to the
unit ℓ_∞ sphere, i.e.,

    θ ∈ {v ∈ R^k : sup_i |v_i| = 1}.

(a) (10 points) Suppose that θ = (1, 1/2, 1/2, …, 1/2), and prove that θ̂_MLE is a (weakly)
consistent estimator of θ.

(b) (10 points) Characterize the limiting distribution of √n (θ̂_MLE − θ_0).
