Inference Quals 1992-2019
Qualifying Exam
The exam is 4 hours long (from 1:00 PM until 5:00 PM). It is closed book/notes, but you
are allowed to bring up to four pages of cheat-sheets. Using laptops, smart phones, etc. is
also prohibited during the exam. In case you feel that a problem is incorrect or that you need
more assumptions to solve it, please clearly explain your reasoning and the assumptions
you need, and then solve that problem.
1. Let $\theta_1, \dots, \theta_m$ be i.i.d. Bernoulli($p$), where $p \in (0,1)$ is assumed known, and suppose that our data $\{X_i\}_{i=1}^m$ follow the model
$$X_i \mid \theta_i \sim f_{\theta_i},$$
where $f_0$ and $f_1$ are known density functions. The $X_i$'s are assumed to be drawn independently. This model is widely used in multiple testing problems, where $\theta_i$ is zero (one) if the $i$-th null hypothesis is true (false), and $X_i$ can be thought of as the z-score (which can be assumed to have a normal distribution under the null, i.e., $f_0$ is $N(0,1)$). We are interested in inference about the unknown $\theta = (\theta_1, \dots, \theta_m) \in \{0,1\}^m$ based on $X = (X_1, \dots, X_m)$. This involves solving $m$ decision problems simultaneously and is called a compound decision problem. Let $\delta = (\delta_1, \dots, \delta_m) \in \{0,1\}^m$ be a general decision rule (i.e., $\delta_i = 0$ means we believe that $\theta_i = 0$). This naturally gives rise to a weighted classification problem with loss function
$$L_\lambda(\theta, \delta) := \frac{1}{m} \sum_{i=1}^m \big[ \lambda\, I(\theta_i = 0)\,\delta_i + I(\theta_i = 1)(1 - \delta_i) \big] \tag{1}$$
where $\lambda > 0$ is the relative weight for a false positive, and $I(\cdot)$ denotes the indicator function. The weighted classification problem is then to find the $\delta$ that minimizes the classification risk $E[L_\lambda(\theta, \delta)]$.
(d) (6 marks) Show that the minimum classification risk is
$$R_\lambda^* := \inf_\delta E[L_\lambda(\theta, \delta)] = p + \int_K \big[ \lambda (1-p) f_0(x) - p f_1(x) \big]\, dx,$$
where $K := \{x : p f_1(x) > \lambda (1-p) f_0(x)\}$.
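For intuition, here is a quick numerical sanity check of this formula (an illustration, not part of the exam), under the hypothetical choices $f_0 = N(0,1)$, $f_1 = N(2,1)$, $p = 0.2$, and $\lambda = 1$; it compares the integral expression with the Monte Carlo risk of the oracle rule $\delta_i = I\{p f_1(X_i) > \lambda (1-p) f_0(X_i)\}$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Hypothetical choices (not specified by the exam): f0 = N(0,1), f1 = N(2,1),
# signal probability p = 0.2, false-positive weight lambda = 1.
p, lam = 0.2, 1.0
f0, f1 = norm(0, 1).pdf, norm(2, 1).pdf

# K = {x : p f1(x) > lam (1-p) f0(x)} is exactly where the integrand is
# negative, so integrating the negative part integrates over K.
integrand = lambda x: min(lam * (1 - p) * f0(x) - p * f1(x), 0.0)
risk_formula = p + quad(integrand, -10.0, 10.0)[0]

# Monte Carlo risk of the oracle rule.
rng = np.random.default_rng(0)
m = 10**6
theta = rng.random(m) < p                  # theta_i ~ Bernoulli(p)
x = rng.normal(2.0 * theta, 1.0)           # X_i ~ f_{theta_i}
delta = p * f1(x) > lam * (1 - p) * f0(x)  # oracle decision
mc_risk = (lam * (~theta & delta) + (theta & ~delta)).mean()
print(risk_formula, mc_risk)               # the two numbers should agree
```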
2. Let $X_1, \dots, X_n$ be i.i.d. random variables taking values in $\{1, 2\}$. Under the null hypothesis,
$$P(X_1 = k) = \pi_k, \qquad k = 1, 2,$$
for some fixed $\pi_k \in (0,1)$. Under the alternative hypothesis, the probability mass function of $X_i$ is unrestricted.
(a) (7 marks) Calculate the likelihood ratio test statistic for this problem and call it LR.¹
(b) (8 marks) Define $N_j = \sum_{i=1}^n I(X_i = j)$, and prove that under the null hypothesis
$$\mathrm{LR} - \sum_{j=1}^{2} \frac{(N_j - n\pi_j)^2}{n\pi_j} = o_p(1).$$
(c) (4 marks) Characterize the limiting distribution of $\sum_{j=1}^{2} \frac{(N_j - n\pi_j)^2}{n\pi_j}$ under the null hypothesis.
¹I should remind you that for $Z \sim p_\theta(z)$ the likelihood ratio statistic is defined as
$$\mathrm{LR} = 2 \log \frac{\max_{\theta \in \Theta} p_\theta(z)}{\max_{\theta \in \Theta_0} p_\theta(z)}.$$
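As an aside, the equivalence in (b) and the limit in (c) are easy to see numerically; the sketch below (with assumed null probabilities $\pi = (0.3, 0.7)$, an illustration only) simulates both statistics under the null:

```python
import numpy as np
from scipy.stats import chi2

# Assumed setup for illustration: two categories with null pi = (0.3, 0.7).
rng = np.random.default_rng(1)
pi = np.array([0.3, 0.7])
n, reps = 2000, 5000

n1 = rng.binomial(n, pi[0], size=reps)          # N_1; then N_2 = n - N_1
N = np.stack([n1, n - n1], axis=1)

# LR = 2 [unrestricted log-lik - null log-lik]; the unrestricted MLE is N_j/n.
with np.errstate(divide="ignore", invalid="ignore"):
    lr = 2 * np.nansum(N * np.log(N / (n * pi)), axis=1)
pearson = ((N - n * pi) ** 2 / (n * pi)).sum(axis=1)

print(np.abs(lr - pearson).mean())              # small: LR - Pearson = o_p(1)
print((pearson > chi2.ppf(0.95, df=1)).mean())  # ~0.05, consistent with a chi^2_1 limit
```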
3. Let $\tilde\theta$ be an M-estimator based on i.i.d. observations $X_1, \dots, X_n$, i.e., a maximizer of $\theta \mapsto \sum_{i=1}^n \rho(X_i, \theta)$. Assuming sufficient smoothness and moment conditions (the consistency of $\tilde\theta$ can be assumed as well), we have the following representation:
$$\tilde\theta - \theta_0 = \frac{1}{n} \sum_{i=1}^n \eta(X_i) + o_p\!\Big(\frac{1}{\sqrt{n}}\Big), \tag{1}$$
where $\theta_0$ denotes the true parameter value and $\eta(x)$ is known as the influence function.
(a) (9 marks) Suppose that $\tilde\theta \stackrel{p}{\to} \theta_0$, and derive the influence function $\eta$ in terms of $\rho$ (and its derivatives).
(b) (10 marks) A special case of the M-estimator is the MLE ($\rho = \log f_\theta$). Let $\hat\theta$ denote the MLE and write, similarly to (1),
$$\hat\theta - \theta_0 = \frac{1}{n} \sum_{i=1}^n \xi(X_i) + o_p\!\Big(\frac{1}{\sqrt{n}}\Big). \tag{2}$$
You can make any reasonable assumption about the family of distributions. Show that
$$E_{\theta_0}\{[\eta(X_i) - \xi(X_i)]\, \xi(X_i)\} = 0.$$
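A concrete instance (an illustration, not part of the exam): for $X \sim N(\theta_0, 1)$ take the sample median, i.e. $\rho(x, \theta) = -|x - \theta|$, whose influence function is $\eta(x) = \mathrm{sign}(x - \theta_0)/(2 f(\theta_0))$, and compare with the MLE's $\xi(x) = x - \theta_0$; the orthogonality claimed in (b) can then be checked by Monte Carlo:

```python
import numpy as np

# Illustration with theta0 = 0 and X ~ N(0, 1):
#   median: eta(x) = sign(x - theta0) / (2 f(theta0)), f the N(0,1) density
#   MLE (sample mean): xi(x) = x - theta0 (score / Fisher information)
rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=10**7)

f_at_0 = 1.0 / np.sqrt(2.0 * np.pi)   # N(0,1) density at theta0 = 0
eta = np.sign(x) / (2.0 * f_at_0)
xi = x
print(np.mean((eta - xi) * xi))       # approximately 0, as claimed in (b)
```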
Let $L^c_\theta$ denote the likelihood for the $T^c_i$ and $c_i$, $i = 1, \dots, n$. Let $S^c_\theta$ denote the score for $\theta$ in this model: $S^c_\theta$ is the derivative with respect to $\theta$ of the logarithm of $L^c_\theta$, evaluated at the true value of $\theta$.
Inference Qual (60 points)
Date: August 27, 2018
1. 2+4+4=10 points
Suppose $(X_1, \cdots, X_n) \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$, and given $(X_1, \cdots, X_n)$ we have $(Y_1, \cdots, Y_n)$ mutually independent, with $Y_i \sim N(\beta X_i, 1)$, where $\beta \in \mathbb{R}$ is an unknown parameter. You observe both $(X_1, \cdots, X_n)$ and $(Y_1, \cdots, Y_n)$, and you want to test $H_0: \beta = 0$ versus $H_1: \beta > 0$ at level $\alpha$, where $\alpha \in (0, 1)$.
(a) Suppose $\mu = 0$ and $\sigma = 1$ are known. Show that there is no UMP level $\alpha$ test for this problem.
(b) Does there exist a UMPU level $\alpha$ test for the problem in part (a)?
(c) Now suppose $\mu \in \mathbb{R}$ and $\sigma > 0$ are both unknown. Does there exist a UMPU level $\alpha$ test for this problem?
2. 4+6=10 points
Let $X \sim N(\theta, 1)$ and $\theta \sim \pi(\theta)$. Our goal is to estimate the parameter $\theta$. Let $\pi \in \Lambda$, where $\Lambda$ is the class of priors on $\mathbb{R}$ with
$$\int \theta\, \pi(d\theta) = 0, \qquad \int \theta^2\, \pi(d\theta) = 1.$$
(b) Compute the value of $\sup_{\pi \in \Lambda} \inf_\delta R(\pi, \delta)$, where the infimum is taken over all estimators $\delta(\cdot)$.
3. 3+3+4=10 points
Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be i.i.d. bivariate Gaussian random vectors with $E(X_i) = E(Y_i) = 0$, $\mathrm{Var}(X_i) = \mathrm{Var}(Y_i) = 1$, and $\mathrm{Cov}(X_i, Y_i) = \rho$.
(a) Find, with proof, a minimal sufficient statistic for $\rho$.
(b) Is the statistic you suggested in part (a) a complete sufficient statistic? Prove
your answer.
(c) Is $\frac{1}{n} \sum_{i=1}^n X_i Y_i$ the UMVUE of $\rho$? Prove your answer.
4. 3+4+3=10 points
Suppose $(Y_1, \cdots, Y_n) \overset{\text{i.i.d.}}{\sim} N(\theta, 1)$ with $\theta \in \mathbb{R}$ unknown, and $Z \sim N(0, 1)$ independent of $(Y_1, \cdots, Y_n)$. Suppose you observe $(X_1, \cdots, X_n)$, where $X_i := Y_i + Z$. Note that $(X_1, \cdots, X_n)$ are not i.i.d.
(a) Show that $(X_1, \cdots, X_n)$ is multivariate Gaussian, by computing the moment generating function $E\, e^{\sum_{i=1}^n t_i X_i}$.
(b) Find a minimax estimator for $\theta$.
(c) Is your minimax estimator consistent for $\theta$?
Hints: A multivariate normal vector with mean vector $\mu$ and covariance matrix $\Sigma$ has joint density
$$\frac{1}{\sqrt{(2\pi)^n |\Sigma|}}\, e^{-(x - \mu)' \Sigma^{-1} (x - \mu)/2}.$$
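A quick simulation (an illustration, not part of the exam) corroborates the covariance structure that the MGF computation in (a) produces:

```python
import numpy as np

# With X_i = Y_i + Z: Var(X_i) = 2 and Cov(X_i, X_j) = Var(Z) = 1 for i != j,
# i.e. the covariance matrix is I_n + 1 1'.
rng = np.random.default_rng(3)
n, reps, theta = 5, 200_000, 1.0
y = rng.normal(theta, 1.0, size=(reps, n))
z = rng.normal(0.0, 1.0, size=(reps, 1))
x = y + z                                  # broadcast the shared Z
print(np.cov(x, rowvar=False).round(2))    # approximately I_5 + ones((5, 5))
```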
5. 3+2+5=10 points
Suppose $(X_1, \cdots, X_n) \overset{\text{i.i.d.}}{\sim} N(0, 1)$. Let $Y_0 \sim N(0, 1)$ be independent of $(X_1, \cdots, X_n)$, and define $Y_i = \rho Y_{i-1} + X_i$ for $i = 1, \dots, n$, where $\rho \in (-1, 1)$ is an unknown parameter. Suppose you only observe $(Y_0, \cdots, Y_n)$.
(a) Starting from the joint density of (Y0 , X1 , · · · , Xn ), find the joint density of
(Y0 , · · · , Yn ).
(b) Find the MLE for $\rho$.
(c) Find a non-degenerate asymptotic distribution of the MLE when the true parameter is $\rho = 0$.
Hints: Note that $(Y_0, \cdots, Y_n)$ are not i.i.d. for general $\rho \in (-1, 1)$, and so your class result for asymptotic normality of the MLE does not apply.
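A simulation sketch (an illustration, not a solution): at the true value $\rho = 0$, look at $\sqrt{n}$ times the least-squares/conditional-likelihood estimator $\hat\rho = \sum_i Y_{i-1} Y_i / \sum_i Y_{i-1}^2$, used here as a natural candidate:

```python
import numpy as np

# Simulate the AR(1) recursion at rho = 0 and inspect sqrt(n) * rho_hat.
rng = np.random.default_rng(4)
n, reps = 500, 2000
stats = []
for _ in range(reps):
    y = np.empty(n + 1)
    y[0] = rng.normal()                    # Y_0 ~ N(0, 1)
    x = rng.normal(size=n)
    for i in range(1, n + 1):
        y[i] = 0.0 * y[i - 1] + x[i - 1]   # Y_i = rho Y_{i-1} + X_i, rho = 0
    rho_hat = np.sum(y[:-1] * y[1:]) / np.sum(y[:-1] ** 2)
    stats.append(np.sqrt(n) * rho_hat)
print(np.mean(stats), np.std(stats))       # roughly mean 0, sd 1
```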
6. 4+6=10 points
Suppose for every $n \ge 1$ we have two probability measures $P_n$ and $Q_n$ on $\mathbb{R}$, with densities $p_n(\cdot)$ and $q_n(\cdot)$ respectively with respect to Lebesgue measure. Assume that $p_n(\cdot)$ and $q_n(\cdot)$ are strictly positive on the whole of $\mathbb{R}$.
(a) If the random variable $\log p_n(X_n) - \log q_n(X_n)$ is $O_p(1)$ under both of the probability measures $P_n$ and $Q_n$, show that $P_n$ and $Q_n$ are mutually contiguous.
(b) Conversely, if $P_n$ and $Q_n$ are mutually contiguous, show that the random variable $\log p_n(X_n) - \log q_n(X_n)$ is $O_p(1)$ under both of the probability measures $P_n$ and $Q_n$.
Statistical Inference Qualifying Exam
Time: 1:00 pm-5:00 pm, Date: August 21, 2017
1. Suppose $\{X_{ij}, 1 \le i \le n, 1 \le j \le p\} \overset{\text{i.i.d.}}{\sim} N(0, 1)$, and given $(X_{ij}, 1 \le i \le n, 1 \le j \le p)$, we have $Y_i \sim N(\sum_{j=1}^p \beta_j X_{ij}, 1)$ with $(Y_1, \cdots, Y_n)$ mutually independent, where $\beta \in \mathbb{R}^p$ is an unknown parameter of interest and $p$ is fixed. The problem is to estimate $\beta$. The proposal is to construct an estimate by minimizing the function
$$L(\beta) := \sum_{i=1}^n \Big( Y_i - \sum_{j=1}^p \beta_j X_{ij} \Big)^2 + \lambda_n \sum_{j=1}^p \beta_j^2,$$
where $\lambda_n \ge 0$.
2. Suppose $X \sim N(\theta, 1)$ with $\theta \in \mathbb{R}$ unknown, and for fixed $a \in \mathbb{R}$ consider the estimator $\delta_a(X) := X + a$.
(a) Show that no matter what $a \in \mathbb{R}$ is, the estimator $\delta_a(X)$ is not a Bayes estimator for any prior on $\theta$.
(b) Let $\pi_n$ be a sequence of (non-normal) prior densities with respect to Lebesgue measure, such that the corresponding sequence of Bayes estimators $\delta_{\pi_n}$ converges in $L_2$ to $\delta_a(X)$, i.e.,
$$E(\delta_{\pi_n}(X) - X - a)^2 = \int_{\mathbb{R}} \Big[ \int_{\mathbb{R}} \big( \delta_{\pi_n}(x) - x - a \big)^2 \frac{1}{\sqrt{2\pi}} e^{-(x - \theta)^2/2}\, dx \Big] \pi_n(\theta)\, d\theta \xrightarrow{n \to \infty} 0.$$
3. Suppose that $W_1, \dots, W_N$ are i.i.d. random elements having a common distribution $P$. We assume that $P$ is unknown and $\theta \equiv \theta(P)$ is a one-dimensional parameter of interest. Further suppose that a natural estimator $\hat\theta_N$ converges in distribution at a rate $r_N$, i.e.,
$$r_N(\hat\theta_N - \theta_0) \xrightarrow{d} G, \tag{1}$$
where $G$ is non-normal, with mean zero and variance 1.
Assume that $N$ is large and write $N = n \times m$, where $n$ is still large and $m$ is relatively smaller (e.g., $n = 10{,}000$, $m = 100$, so that $N = 10^6$).
With such a large sample size $N$ it might be difficult to compute $\hat\theta_N$ directly. We can define a new "averaged" estimator as follows:
(i) Divide the set of samples $W_1, \dots, W_N$ into $m$ disjoint subsets $S_1, \dots, S_m$, each of size $n$.
(ii) For each $j = 1, \dots, m$, compute the estimator $\hat\theta_n(j)$ based on the data points in $S_j$.
(iii) Average these estimators to obtain the final 'pooled' estimator:
$$\bar\theta_N = \frac{1}{m} \sum_{j=1}^m \hat\theta_n(j). \tag{2}$$
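To make the scheme concrete, here is a toy run of steps (i)-(iii) (with an assumed stand-in estimator, the subsample median of Exp(1) data; nothing here is prescribed by the exam):

```python
import numpy as np

# Toy run: N = n*m draws, m disjoint subsets of size n, theta_hat = subsample
# median (a stand-in choice), then average.
rng = np.random.default_rng(5)
n, m = 10_000, 100
w = rng.exponential(1.0, size=n * m)

theta_hats = [np.median(chunk) for chunk in w.reshape(m, n)]  # steps (i)-(ii)
theta_bar = np.mean(theta_hats)                               # step (iii)
print(theta_bar, np.log(2))    # pooled estimate vs. the true median log 2
```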
and show that its distribution does not depend on F .
(b) Suppose we want to test the null hypothesis that $X_1, \cdots, X_n \overset{\text{i.i.d.}}{\sim} F$, where $F = U(0, \theta)$ for some $\theta > 0$ which is unknown, versus the alternative that $F$ is not uniform. Letting $J := \arg\max_{1 \le i \le n} X_i$, show that under $H_0$ the statistic $(J, X_J)$ is sufficient for $\theta$. Show further that, given $\{J = j, X_j = t\}$, the random variables $(X_i : i \ne j)$ are independent and have the $U[0, t]$ distribution.
(c) How might you apply the Kolmogorov-Smirnov statistic to test the hypothesis $H_0$? Cut-offs for the statistic can be obtained numerically.
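One possible scheme for (c), sketched below under assumed toy values: condition on $(J, X_J)$ as in part (b), rescale the remaining observations by $X_J$, and apply a one-sample KS test against $U[0, 1]$:

```python
import numpy as np
from scipy.stats import kstest

# Sketch of one approach (toy values assumed): by part (b), under H0 the
# rescaled leave-one-out sample is i.i.d. U[0,1], whatever theta is.
rng = np.random.default_rng(6)
x = rng.uniform(0.0, 3.7, size=200)    # an H0 sample; theta = 3.7 is unknown
j = np.argmax(x)
rest = np.delete(x, j) / x[j]          # i.i.d. U[0,1] given (J, X_J) under H0
print(kstest(rest, "uniform"))         # large p-values are typical under H0
```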
5. Suppose $X_1, X_2, X_3$ are random variables which take values in $\{0, 1\}$. Under $H_0$ we have $X_1, X_2, X_3 \overset{\text{i.i.d.}}{\sim} \mathrm{Bin}(1, 0.5)$. Let $\theta \in (0, 0.5)$ be an unknown parameter. Find a UMP level $\alpha$ test against the following alternatives, as explicitly as possible.
Theoretical Statistics Qualifying Exam
Date: August 22, 2016
1. Suppose $X_1, \cdots, X_n \overset{\text{i.i.d.}}{\sim} \mathrm{Poi}(\lambda)$ for some $\lambda > 0$.
(a) If the loss is $L(\delta(X), \lambda) = [\delta(X) - \lambda]^2$, then find the minimax risk.
(b) If the loss is $L(\delta(X), \lambda) = \lambda^{-1} [\delta(X) - \lambda]^2$, then find the minimax risk.
2. Decide whether Pn is contiguous to Qn (i.e. Qn dominates Pn ) in the following examples.
(a) Suppose $\{X_i\}_{i \ge 1}$ and $\{Y_i\}_{i \ge 1}$ are mutually independent, with $X_i \sim \mathrm{Exp}(1)$ and $Y_i \sim \mathrm{Exp}(\theta_i)$. Let $P_n$ and $Q_n$ denote the laws of $(X_1, \cdots, X_n)$ and $(Y_1, \cdots, Y_n)$ respectively, and assume that $\sum_{i=1}^{\infty} [\theta_i - 1]^2 < \infty$.
(b) Let $P_n$ be the law of $\mathrm{Bin}(n, p_n)$, and $Q_n$ be the law of $\mathrm{Poi}(\lambda)$, where $n p_n$ converges to $\lambda > 0$.
(c) Let $P_n$ be the law of $\dfrac{2\,\mathrm{Bin}(n, 1/2) - n}{\sqrt{n}}$, and $Q_n$ be the law of $N(0, 1)$.
3. Suppose that we want to model the survival of twins with a common genetic effect, but with one of the two twins receiving some treatment. Let $X$ represent the survival time of the untreated twin and let $Y$ represent the survival time of the treated twin. One (overly simple) preliminary model might be to assume that $X$ and $Y$ are independent with $\mathrm{Exponential}(\lambda)$ and $\mathrm{Exponential}(\lambda\mu)$ distributions, i.e.,
$$f_{\lambda,\mu}(x, y) = \lambda e^{-\lambda x}\, \lambda \mu\, e^{-\lambda\mu y}\, 1_{(0,\infty)}(x)\, 1_{(0,\infty)}(y).$$
(a) One crude approach to estimation in this problem is to reduce the data to $W = X/Y$. Find the distribution of $W$. Hence compute the Cramér-Rao lower bound for unbiased estimators of $\mu$ based on $W$.
(b) Find the information bound for estimation of $\mu$ based on observation of the $(X, Y)$ pairs, when $\lambda$ is known.
(c) Find the information bound for estimation of $\mu$ based on observation of the $(X, Y)$ pairs, when $\lambda$ is unknown.
Hints: Recall that for a multivariate parameter, the CR lower bound is the corresponding diagonal element of the inverse information matrix.
(d) Compare the bounds you computed in parts (b) and (c) above, and discuss the pros and cons of reducing the data to $W$ for estimating $\mu$.
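For a numerical point of comparison (an illustration only, at the assumed point $(\lambda, \mu) = (1, 2)$), one can estimate the information matrix by Monte Carlo and read off both bounds:

```python
import numpy as np

# Monte Carlo information matrix at an assumed point (lambda, mu) = (1, 2).
# Scores of log f_{lambda,mu}(x, y) = 2 log(lambda) + log(mu) - lambda*x - lambda*mu*y:
#   d/d lambda : 2/lambda - x - mu*y
#   d/d mu     : 1/mu - lambda*y
rng = np.random.default_rng(7)
lam, mu, n = 1.0, 2.0, 10**6
x = rng.exponential(1.0 / lam, size=n)
y = rng.exponential(1.0 / (lam * mu), size=n)

s = np.stack([2.0 / lam - x - mu * y, 1.0 / mu - lam * y])
info = s @ s.T / n                  # 2x2 Fisher information estimate
print(1.0 / info[1, 1])             # bound for mu with lambda known: part (b)
print(np.linalg.inv(info)[1, 1])    # bound for mu with lambda unknown: part (c)
```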
(a) Express the MLEs $\hat\mu_1, \hat\mu_2$ as explicitly as possible in terms of $(X, Y)$.
(b) Find the distribution of the likelihood ratio test statistic for testing $H_0: \mu_1 = \mu_2 = 0$.
5. Suppose $X_1, \cdots, X_n \overset{\text{i.i.d.}}{\sim} f_\theta(\cdot)$, where $f_\theta(x) := \frac{1}{2}[\phi(x - \theta) + \phi(x + \theta)]$ with $\phi(\cdot)$ the standard normal density, and $\theta \ge 0$. Let $\hat\theta_n$ denote the MLE of $\theta$. Assume that the true $\theta_0 = 0$.
Statistical inference theory Monday, August 17, 2015
Qualification Exam
The exam is 4 hours long. It has 5 questions, but you have to answer ONLY FOUR of
them. If you answer more than 4, we will only grade problems 1 to 4. The exam is closed
book/notes. But you are allowed to bring up to three pages of formulas, theorems, etc.
Using laptops is prohibited during the exam.
$$f(h) = o(|h|) \quad \text{as } h \to 0,$$
where $E_i(\theta)$ is the expected value with respect to $\theta \sim \pi_i$. Can you derive $\delta^*$? Clarify the assumptions you use in deriving the optimal test.
(e) (6 points) Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with distribution $F$. Define $S$ as the set of all random variables of the form $\sum_{i=1}^n g_i(X_i)$, where the $g_i$'s are arbitrary measurable functions with $E g_i(X_i)^2 < \infty$. Given a random variable $T$ with $E(T) = 0$ and $E(T^2) < \infty$, we would like to find a random variable $Z \in S$ that minimizes $E(Z - T)^2$. Prove that the optimal $Z$ is given by
$$Z = \sum_{i=1}^n E(T \mid X_i).$$
2. (25 points) A statistician and a sports-caster are having an argument. The statistician claims that Larry Bird (a basketball player) hits half his shots and that different attempts are independent. The sports-caster insists that this is nonsense: when Bird hits a shot he gains momentum and his next shot will go in with a chance $\theta > 1/2$, but if he misses a shot, his next shot will fall with chance $1 - \theta < 1/2$. To be specific, let $X_i = 1$ if Bird makes his shot on attempt $i$ and $X_i = -1$ if he misses. The sports-caster believes that $P(X_1 = 1) = 1/2$ and that
$$P(X_{i+1} = x_{i+1} \mid X_1 = x_1, \dots, X_i = x_i) = \begin{cases} \theta, & \text{if } x_i = x_{i+1};\\ 1 - \theta, & \text{if } x_i \ne x_{i+1}, \end{cases}$$
for $i \ge 1$ and any choice of $x_1, \dots, x_i \in \{+1, -1\}$. The statistician's model is the same with $\theta = 1/2$.
(a) (4 points) Find the joint mass function of $X_1, \dots, X_n$ and show that $T = \sum_{i=1}^{n-1} X_i X_{i+1}$ is a sufficient statistic.
(b) (4 points) Determine the form of the uniformly most powerful level-$\alpha$ test of $H_0: \theta = 1/2$ versus $H_1: \theta > 1/2$.
(c) (4 points) Find the mean and variance of T under H0 .
(d) (5 points) Find the exact distribution of $T$ when $\theta = 1/2$.
(e) (4 points) Assuming that $T$ is approximately normal with the mean and variance you derived in part (c) (it is, provided $n$ is large), find a test with level approximately 5% when $n = 40$.
(f) (4 points) As empirical evidence supporting his claim, the sports-caster reveals the following data from a recent game (H = hit, M = miss):
HHHMHHHHMMMMMMHMMMHMMHHMHHHHHHMMMHMMMMHM
"The evidence is clear," claims the sports-caster. "Bird started off hitting 7 of his first 8 shots. Then he had a cold spell, sinking only 2 of his next 13 attempts. He found a groove and canned 8 of the next 9, but the momentum switched back and he only made 2 more shots in his last 10 attempts." What do you think? Would the uniformly most powerful test with level (approximately) 5% reject the null hypothesis that $\theta = 1/2$, i.e., that his shots are independent?
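For part (f), the arithmetic is mechanical once $T$ is computed from the 40-shot record; the sketch below does this, using the $H_0$ moments from part (c) ($E T = 0$, $\mathrm{Var}\, T = n - 1$) for the approximate 5% cutoff:

```python
import numpy as np

# Compute T = sum_i X_i X_{i+1} from the 40-shot record in part (f) and
# compare with the approximate one-sided 5% cutoff 1.645 * sqrt(n - 1).
shots = "HHHMHHHHMMMMMMHMMMHMMHHMHHHHHHMMMHMMMMHM"
x = np.array([1 if s == "H" else -1 for s in shots])

T = int(np.sum(x[:-1] * x[1:]))
cutoff = 1.645 * np.sqrt(len(x) - 1)
print(T, round(cutoff, 2), T > cutoff)   # T = 9 < 10.27: no rejection at ~5%
```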
(d) (9 points) Find a minimax estimator.
4. (25 points) In this problem, when required, you may assume that limits and integrals, and derivatives and integrals, can be interchanged. For a pdf $f(y)$ define
$$I(f) = \int \frac{f'(y)^2}{f(y)}\, dy.$$
$$E\big( (X - \theta) g(X) \big) = \sigma^2\, E\big( g'(X) \big).$$
(c) (7 points) Define the Bayes risk $B(\pi) = E\big(\theta - E(\theta \mid X)\big)^2$, where the expected value is taken with respect to both $X$ and $\theta$. Prove that
$$B(\pi) = \sigma^2 \big( 1 - \sigma^2 I(f) \big).$$
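As a sanity check on this identity, consider the assumed special case of a normal prior, where everything is explicit: with $\theta \sim N(0, \tau^2)$ and $X \mid \theta \sim N(\theta, \sigma^2)$, the marginal of $X$ is $N(0, \sigma^2 + \tau^2)$, so $I(f) = 1/(\sigma^2 + \tau^2)$, while the posterior-mean Bayes risk is the familiar $\sigma^2 \tau^2/(\sigma^2 + \tau^2)$:

```python
# Check of B(pi) = sigma^2 (1 - sigma^2 I(f)) in the normal-prior special
# case (an assumption of this sketch, chosen because both sides are explicit).
sigma2, tau2 = 2.0, 3.0
I_f = 1.0 / (sigma2 + tau2)                    # Fisher info of N(0, s2+t2) marginal
bayes_risk = sigma2 * tau2 / (sigma2 + tau2)   # risk of the posterior mean
identity_rhs = sigma2 * (1.0 - sigma2 * I_f)
print(bayes_risk, identity_rhs)                # both equal 1.2
```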
5. (25 points) Consider a model for the joint distribution of two variables $Y$ and $Z$ in which $Z$ has a Bernoulli distribution with success probability $\eta \in [0, 1]$ and the conditional distribution of $Y$ given $Z = z$ is exponential with failure rate $e^{\beta z}$. Then $Y$ and $Z$ have joint density
$$f_{\eta,\beta}(y, z) = \eta^z (1 - \eta)^{1 - z}\, e^{\beta z} \exp\big( -e^{\beta z} y \big), \qquad y > 0,\ z \in \{0, 1\}.$$
(a) (10 points) Determine the large sample distribution of $\hat\beta$, the maximum likelihood estimator (MLE) of $\beta$.
(b) (5 points) Define
$$\nu = \nu(\theta) = P_\theta(Y \ge y_0 \mid Z = 1),$$
where $y_0$ is a fixed positive number, and consider estimation of $\nu$ using
$$\hat\nu_1 = \frac{n^{-1} \sum_{i=1}^n 1\{Y_i \ge y_0,\, Z_i = 1\}}{n^{-1} \sum_{i=1}^n 1\{Z_i = 1\}},$$
where
$$\psi(y, z) = \frac{1}{EZ}\big[ 1\{y \ge y_0,\, z = 1\} - \nu\, 1\{z = 1\} \big].$$
Determine explicitly the limit distribution of $\sqrt{n}(\hat\nu_1 - \nu)$.
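A simulation sketch (at the assumed parameter point $\eta = 0.6$, $\beta = 0$, $y_0 = 1$; an illustration only), against which any derived limit law can be compared, e.g. via the empirical standard deviation of $\sqrt{n}(\hat\nu_1 - \nu)$ versus that of $\psi(Y, Z)$:

```python
import numpy as np

# Assumed point (eta, beta, y0) = (0.6, 0, 1); when beta = 0 the failure rate
# e^{beta z} = 1, so nu = P(Y >= y0 | Z = 1) = exp(-y0).
rng = np.random.default_rng(8)
n, reps, eta, y0 = 2000, 4000, 0.6, 1.0
nu = np.exp(-y0)

stats = []
for _ in range(reps):
    z = rng.random(n) < eta
    y = rng.exponential(1.0, size=n)
    nu1 = np.sum((y >= y0) & z) / np.sum(z)
    stats.append(np.sqrt(n) * (nu1 - nu))

# Empirical sd of sqrt(n)(nu1_hat - nu) vs. sd of psi(Y, Z); here EZ = eta.
z = rng.random(10**6) < eta
y = rng.exponential(1.0, size=10**6)
psi = (((y >= y0) & z).astype(float) - nu * z) / eta
print(np.std(stats), np.std(psi))      # the two should be close
```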
Statistical inference theory Monday, August 18, 2014
Qualification Exam
The exam is 5 hours long. It is closed book/notes, but you are allowed to bring up to
four pages of formulas, theorems, etc. Using laptops, smart phones, etc. is also prohibited
during the exam. There is NO optional question in the exam. Please try to answer as
many questions as you can.
where $p_\theta(x) = \frac{1}{\sqrt{2\pi}} e^{-(x - \theta)^2/2}$. Suppose that the above optimization problem has a solution $\phi^*$ that satisfies the $\alpha$-significance-level constraint for every $\theta \in [-1, 1]$. Prove that $\phi^*$ is UMPU for $H_0: |\theta| \le 1$ versus $H_1: |\theta| > 1$.
(c) (6 points) State the Lehmann-Scheffé theorem. Show that the estimator $\tilde\delta_n$ defined by
$$\tilde\delta_n = \begin{cases} 0, & \text{if } T_n < x,\\ \left( 1 - \dfrac{x}{T_n} \right)^{n-1}, & \text{if } T_n \ge x \end{cases}$$
(a) (5 points) Find the general form of the Bayes estimate $\theta^*$ and the Bayes loss $b^*$ for the loss function $\ell(\theta, \hat\theta) = (\theta - \hat\theta)^2/\theta$.
(b) (11 points) Suppose that $X \sim \mathrm{Bin}(n, \theta)$. The loss function is $\ell(\theta, a) = (\theta - a)^2/\{\theta(1 - \theta)\}$. Calculate the Bayes rule $d^*(X)$ for the prior $\theta \sim \mathrm{Unif}[0, 1]$. Find its risk function. Is it minimax? Is it admissible?
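A numerical way to conjecture and check the answers here (an illustration, not a derivation): with the Unif$[0,1]$ prior the posterior is Beta$(x+1, n-x+1)$, so one can minimize the posterior expected weighted loss directly, and then probe whether the resulting rule has constant risk:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import binom

# Posterior expected loss, up to a constant: the weight 1/(t(1-t)) simply
# lowers each Beta exponent by one, giving a smooth integrand.
n, x = 10, 3
ploss = lambda a: quad(
    lambda t: (t - a) ** 2 * t ** (x - 1) * (1 - t) ** (n - x - 1), 0, 1
)[0]
print(minimize_scalar(ploss, bounds=(0, 1), method="bounded").x)  # ~ x/n = 0.3

# Probe the (weighted) risk of the rule d(X) = X/n at a few theta values.
k = np.arange(n + 1)
for theta in (0.2, 0.5, 0.9):
    mse = np.sum(binom.pmf(k, n, theta) * (theta - k / n) ** 2)
    print(mse / (theta * (1.0 - theta)))       # each ~ 1/n = 0.1: constant risk
```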
(a) (10 points) Find the maximum likelihood estimates of $\alpha$ and $\beta$ and show that they are consistent.
(b) (8 points) Characterize the limiting distribution of $\hat\alpha_{\mathrm{MLE}}$ and $\hat\beta_{\mathrm{MLE}}$.
(c) (6 points) Construct an asymptotically valid 95% confidence interval for $\beta$ and show that the coverage probability will indeed converge to 0.95 as $n \to \infty$.
5. Consider $X_1, X_2, \dots, X_n \overset{\text{iid}}{\sim} N(\theta, I)$, where $X_i \in \mathbb{R}^k$. We know that $\theta$ belongs to the unit $\ell_\infty$ sphere, i.e.,
$$\theta \in \{ v \in \mathbb{R}^k : \sup_i |v_i| = 1 \}.$$
(a) (10 points) Suppose that $\theta = (1, 1/2, 1/2, \dots, 1/2)$ and prove that $\hat\theta_{\mathrm{MLE}}$ is a weakly consistent estimator of $\theta$.
(b) (10 points) Characterize the limiting distribution of $\sqrt{n}(\hat\theta_{\mathrm{MLE}} - \theta_0)$.