36-708 Statistical Machine Learning Homework #4 Solutions: DUE: April 19, 2019
Problem 1 [5 pts.]
Consider the directed graph with vertices V = {X1 , X2 , X3 , X4 , X5 } and edge set E = {(1, 3), (2, 3), (3, 4), (3, 5)}.
(a)[2 pts.] List all the independence statements implied by this graph.
(b)[1 pts.] Find the causal distribution p(x4 ∣set x3 = s).
(c)[2 pts.] Find the implied undirected graph for these random variables. Which independence
statements get lost in the undirected graph (if any)?
Solution.
The graph has edges X1 → X3, X2 → X3, X3 → X4, and X3 → X5: X1 and X2 are the parents of X3, which in turn is the parent of both X4 and X5.
(a) The graph implies the following independence statements:
– X1 ⊥ X2;
– X4 ⊥ {X1, X2} ∣ X3 and X5 ⊥ {X1, X2} ∣ X3;
– X5 ⊥ X4 ∣ X3.
(b) Given that we set X3 = s, by the independence statements above X1 and X2 can be dropped from the graph, and X4 ⊥ X5 ∣ X3. Hence, writing p∗ for the joint distribution of (X4, X5) after the intervention,
\[ p(x_4 \mid \text{set } x_3 = s) = \int p^*(x_4, x_5)\, dx_5 = \int p(x_4 \mid x_3 = s)\, p(x_5 \mid x_3 = s)\, dx_5 = p(x_4 \mid x_3 = s). \]
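To make the marginalization explicit (a standard derivation added for clarity, not part of the original write-up), the DAG factorization and the definition of the intervention distribution give
\[ p(x_1, \ldots, x_5) = p(x_1)\, p(x_2)\, p(x_3 \mid x_1, x_2)\, p(x_4 \mid x_3)\, p(x_5 \mid x_3), \quad p^*(x_1, x_2, x_4, x_5) = p(x_1)\, p(x_2)\, p(x_4 \mid x_3 = s)\, p(x_5 \mid x_3 = s), \]
so integrating out x_1, x_2, and x_5 leaves p(x_4 ∣ set x_3 = s) = p(x_4 ∣ x_3 = s).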
(c) The implied undirected graph is obtained by moralization: connect the parents X1 and X2 (they share the child X3) and drop the edge directions, giving edges {X1, X2}, {X1, X3}, {X2, X3}, {X3, X4}, {X3, X5}. We lose the (unconditional) independence between X1 and X2 during the moralization process, while all the other independence statements are retained.
Problem 2 [20 pts.]
Let d ≥ 2, and let X1 , . . . , Xn ∼ P where Xi = (Xi (1), . . . , Xi (d)) ∈ Rd . Assume that the coordinates
of Xi are independent. Further, assume that Xi (j) ∼ Bernoulli(pj ) where 0 < c ≤ pj ≤ C < 1. Let P
be all such distributions. Let
\[ R_n = \inf_{\hat p} \sup_{P \in \mathcal{P}} E_P \|\hat p - p\|_\infty. \]
Solution.
For the upper bound, consider the sample mean X̄ as an estimator of p. Each coordinate of X̄ − p is a centered average of n independent Bernoulli variables and is therefore sub-Gaussian with parameter σ² = 1/(4n). Hence, by the maximal inequality for sub-Gaussian random variables (E max_{1≤i≤d} ∣Z_i∣ ≤ σ√(2 log(2d)) for mean-zero sub-Gaussians with parameter σ²),
\[ E\|\bar X - p\|_\infty = E\Big[\max_{1 \le i \le d} |\bar X(i) - p_i|\Big] \le \frac{1}{2\sqrt{n}}\sqrt{2\log(2d)}. \]
And since d ≥ 2, we have 2 log(2d) ≤ 4 log d, so
\[ \inf_{\hat p} \sup_{P \in \mathcal{P}} E_P \|\hat p - p\|_\infty \le E\|\bar X - p\|_\infty \le \sqrt{\frac{\log d}{n}}. \]
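As an illustrative check (our own addition, not part of the graded solution), the following R sketch estimates E∥X̄ − p∥∞ by Monte Carlo and compares it with the bound √(log d / n); the particular n, d, and p below are arbitrary choices:

# Monte Carlo estimate of E||Xbar - p||_inf for independent Bernoulli coordinates
set.seed(1)
n <- 100; d <- 50; B <- 500
p <- runif(d, min = 0.2, max = 0.8)        # coordinate means bounded away from 0 and 1
sup_err <- replicate(B, {
  X <- matrix(rbinom(n * d, size = 1, prob = rep(p, each = n)), nrow = n, ncol = d)
  max(abs(colMeans(X) - p))                # ||Xbar - p||_inf for one sample
})
c(mc_mean = mean(sup_err), bound = sqrt(log(d) / n))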
For the lower bound, let α = √(log d / (16n)) and let p^(0) = (c, . . . , c) ∈ R^d. For 1 ≤ j ≤ d, construct the d-dimensional parameter p^(j) that agrees with p^(0) except in coordinate j, where p^(j)(j) = c + α. Then
\[ \min_{j \ne k} \|p^{(j)} - p^{(k)}\|_\infty = \alpha. \]
Let P^(j) be the multivariate Bernoulli distribution with parameter p^(j); it is a product of univariate Bernoullis, P^(j) = ∏_{i=1}^d P_i^(j), where P_i^(j) is a univariate Bernoulli with parameter p^(j)(i). Then
\[ KL(P^{(j)}, P^{(k)}) = \sum_{i=1}^d KL(P_i^{(j)}, P_i^{(k)}) = KL(P_j^{(j)}, P_j^{(k)}) + KL(P_k^{(j)}, P_k^{(k)}), \]
since P^(j) and P^(k) differ only in the coordinates i = j and i = k. Writing P_c for the univariate Bernoulli with parameter c,
\[
\begin{aligned}
KL(P^{(j)}, P^{(k)}) &= KL(P_j^{(j)}, P_j^{(k)}) + KL(P_k^{(j)}, P_k^{(k)}) \\
&= \sum_x P_{c+\alpha}(x) \log\Big(\frac{P_{c+\alpha}(x)}{P_c(x)}\Big) + \sum_x P_c(x) \log\Big(\frac{P_c(x)}{P_{c+\alpha}(x)}\Big) \\
&= \sum_x \big(P_{c+\alpha}(x) - P_c(x)\big) \log\Big(\frac{P_{c+\alpha}(x)}{P_c(x)}\Big) \\
&= \alpha \log\frac{(\alpha + c)(1 - c)}{(1 - \alpha - c)\, c} \\
&\le C' \alpha^2, \quad \text{by a Taylor expansion, where } C' \text{ is a constant depending only on } c.
\end{aligned}
\]
(Fano's method.) By the KL bound above,
\[ \max_{j \ne k} KL(P^{(j)}, P^{(k)}) \le C' \alpha^2 = \frac{C' \log d}{16 n} \le \frac{\log(d+1)}{4 n}, \]
where the last inequality holds after, if necessary, shrinking the constant in the definition of α (this changes only constants, not the rate). Hence by Corollary 13 in the minimax notes with N = d + 1,
\[ \inf_{\hat p} \sup_{P \in \mathcal{P}} E_P \|\hat p - p\|_\infty \ge \frac{\alpha}{4} = \sqrt{\frac{\log d}{256\, n}}. \]
Note: following the same construction of the p^(j) and the same KL bound, minimax lower bounds of the same rate in d and n can also be derived using Theorem 12 or Theorem 14.
Solution.
Assume that the densities in {p_θ : θ ∈ Θ} are differentiable in quadratic mean (QMD), as in equation (27) of the minimax notes, which says that √(p_{θ+h}) admits a first-order Taylor expansion in h. As a consequence, the Hellinger loss is locally equivalent to the squared loss on θ:
\[ H^2(p_{\theta+h}, p_\theta) = \frac{1}{8}\, \|h\|^2\, I(\theta) + o(\|h\|^2). \]
For a proof, see https://www.stat.berkeley.edu/~bartlett/courses/2013spring-stat210b/notes/25notes.pdf.
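For intuition (an added example, not part of the original write-up, and using the convention H²(P, Q) = 1 − ∫√(pq), which matches the 1/8 constant above), consider the Gaussian location family p_θ = N(θ, 1), for which I(θ) = 1:
\[ H^2(p_{\theta+h}, p_\theta) = 1 - \exp(-h^2/8) = \tfrac{1}{8} h^2 + o(h^2), \]
in agreement with the QMD expansion.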
So the minimax rate for the Hellinger loss is the same as for the squared-error risk R(θ̂, θ) = E∥θ̂ − θ∥².
Next, we prove that the minimax rate for the squared loss is achieved by the MLE. For a fixed true parameter θ, let θ̂_n^mle denote the MLE based on the sample X1, ⋯, Xn ∼ Pθ. By the asymptotic distribution of the MLE,
\[ \sqrt{n}\,\big(\hat\theta_n^{mle} - \theta\big) \to N\big(0,\, I^{-1}(\theta)\big) \text{ in distribution}, \]
so the squared risk of the MLE satisfies n R(θ̂_n^mle, θ) → I^{-1}(θ). For any other estimator θ̂, by Theorem 17 with ψ(x) = x and ℓ the squared loss, under the QMD condition the squared risk is asymptotically bounded below by that of the MLE,
\[ \liminf_{n \to \infty} n\, R(\hat\theta, \theta) \ge I^{-1}(\theta). \]
This lower bound holds for all θ ∈ Θ. So the minimax risk for the squared loss is achieved by the MLE and, for large n,
\[ \inf_{\hat\theta} \sup_{\theta \in \Theta} R(\hat\theta, \theta) \approx \sup_{\theta \in \Theta} R(\hat\theta_n^{mle}, \theta) \approx \frac{1}{n} \sup_{\theta \in \Theta} I^{-1}(\theta). \]
Therefore,
\[ \inf_{\hat\theta} \sup_{\theta \in \Theta} H^2(p_{\hat\theta}, p_\theta) = O\Big(\frac{1}{n}\Big), \]
or equivalently,
\[ \inf_{\hat\theta} \sup_{\theta \in \Theta} H(p_{\hat\theta}, p_\theta) = O(n^{-1/2}). \]
where we assume enough regularity of the density to allow swapping the derivative and the integral, so that the squared Hellinger distance is bounded by a constant times I(θ) times the squared estimation error; thus,
\[ \inf_{\hat\theta} \sup_{\theta \in \Theta} H(p_{\hat\theta}, p_\theta) \le H(p_{\hat\theta^{mle}_n}, p_\theta) \le \sqrt{C\, I(\theta)\,\big(\mathrm{Var}(\hat\theta_n^{mle}) + \mathrm{bias}^2\big)} = O(n^{-1/2}). \]
For the lower bound, consider two distributions p_θ and p_{θ+h}, where h = √(log 2 / (C I(θ) n)). Then by the QMD condition,
\[ H(p_{\theta+h}, p_\theta) \asymp h = O(n^{-1/2}), \]
and the KL divergence satisfies
\[ KL(p_{\theta+h}, p_\theta) \le C\, I(\theta)\, h^2 \le \frac{\log 2}{n}. \]
Hence by Corollary 5 in the minimax notes, the minimax Hellinger risk is bounded below by a constant multiple of H(p_{θ+h}, p_θ), which is of order n^{-1/2}, matching the upper bound.
Solution.
For the upper bound, we first use a high-probability bound for the maximum of Gaussians (from 36-705, Lecture 27, Fall 2018): if ε_1, . . . , ε_d are N(0, σ²), then with probability at least 1 − δ,
\[ \max_{1 \le i \le d} |\epsilon_i| \le \sigma\sqrt{2\log(2d/\delta)}. \]
Proof (of the risk bound for the hard-thresholding estimator θ̂_i = y_i I(∣y_i∣ ≥ t), where y_i = θ_i + ε_i). We condition on the event above, i.e. that
\[ \max_{1 \le i \le d} |\epsilon_i| \le \sigma\sqrt{2\log(2d/\delta)} \le \frac{t}{2}, \]
where the second inequality holds by the choice of the threshold t. Now, observe that
\[ \|\hat\theta - \theta\|_2^2 = \sum_{i=1}^d (\hat\theta_i - \theta_i)^2, \]
and we bound each coordinate's contribution by cases:
1. If for a coordinate ∣θ_i∣ ≤ t/2, our estimate is 0, so our risk for that coordinate is simply θ_i².
2. If ∣θ_i∣ ≥ 3t/2, our estimate is simply θ̂_i = y_i, so our risk is ε_i² ≤ t²/4.
3. If t/2 ≤ ∣θ_i∣ ≤ 3t/2, then our risk satisfies
\[ (\hat\theta_i - \theta_i)^2 = \big(y_i I(|y_i| \ge t) - \theta_i\big)^2 = \theta_i^2\, I(|y_i| < t) + \epsilon_i^2\, I(|y_i| \ge t) \le \max\{\epsilon_i^2, \theta_i^2\} \le \frac{9t^2}{4}. \]
We also have that, under the same assumptions as above, with probability at least 1 − δ,
\[ \|\hat\theta - \theta\|_2^2 = \sum_{i=1}^d \big(y_i\, I(|y_i| \ge t) - \theta_i\big)^2 \le d \max_{1 \le i \le d} (\hat\theta_i - \theta_i)^2 \le C_2\, d\, t^2, \]
since by the case analysis each coordinate contributes at most 9t²/4 (so one may take C₂ = 9/4).
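A brief numerical illustration of the hard-thresholding estimator analyzed above (our own sketch; the signal vector and the threshold choice t = 2σ√(2 log(2d/δ)), taken from the event used in the proof, are assumptions made for the demonstration):

# Hard thresholding in the normal means model y_i = theta_i + eps_i, eps_i ~ N(0, sigma^2)
set.seed(2)
d <- 1000; sigma <- 1; delta <- 0.05
theta <- c(rep(15, 20), rep(0, d - 20))                # a sparse mean vector
y <- theta + rnorm(d, sd = sigma)
t_thresh <- 2 * sigma * sqrt(2 * log(2 * d / delta))   # chosen so that max|eps_i| <= t/2 w.h.p.
theta_hat <- ifelse(abs(y) >= t_thresh, y, 0)          # theta_hat_i = y_i * I(|y_i| >= t)
c(squared_error = sum((theta_hat - theta)^2),
  bound = (9 / 4) * d * t_thresh^2)                    # the crude bound C2 * d * t^2 with C2 = 9/4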
For the lower bound, let P_j = N(θ_j, I), where θ_0 = (0, . . . , 0) and, for j = 1, . . . , d (with d ≥ 8), θ_j is the d-dimensional vector with
\[ \theta_j(k) = \begin{cases} \sqrt{\dfrac{\log d}{32}} & k = j, \\[4pt] 0 & k \ne j. \end{cases} \]
P_0 is absolutely continuous with respect to each other distribution, and for all j = 1, . . . , d we claim that
\[ KL(P_j, P_0) \le \frac{\log d}{16}. \]
The claim in fact holds for any pair P_j, P_k with j ≠ k. Writing P_j = N(µ_j, I),
\[ KL(P_j, P_k) = \int \frac{1}{2}\big(\|x - \mu_k\|_2^2 - \|x - \mu_j\|_2^2\big)\, P_j(x)\, dx = \frac{1}{2}\|\mu_j - \mu_k\|_2^2, \]
which implies
\[ \max_{j \ne k} KL(P_j, P_k) = \max_{j \ne k} \frac{1}{2}\|\mu_j - \mu_k\|_2^2 = \frac{1}{2}\cdot 2\cdot \frac{\log d}{32} = \frac{\log d}{32} \le \frac{\log d}{16}. \]
(b) (10 pts.) Simulate n = 10 data points from a N (0, 1). Try three values of α: namely, α = .1,
α = 1 and α = 10. Compute the 95 percent Bayesian confidence band and the 95 percent DKW
band. Plot the results for one example. Now repeat the simulation 1,000 times and report the
coverage probability for each confidence band.
Solution.
(a) Write W_j = V_j ∏_{i=1}^{j−1}(1 − V_i) for the stick-breaking weights. We show by induction that
\[ 1 - \sum_{j=1}^{n-1} W_j = \prod_{j=1}^{n-1} (1 - V_j). \]
Inductive step.
\[
\begin{aligned}
1 - \sum_{j=1}^{n} W_j &= 1 - \sum_{j=1}^{n-1} W_j - W_n \\
&= \prod_{j=1}^{n-1} (1 - V_j) - V_n \prod_{j=1}^{n-1} (1 - V_j) \\
&= (1 - V_n) \prod_{j=1}^{n-1} (1 - V_j) \\
&= \prod_{j=1}^{n} (1 - V_j). \qquad \checkmark \qquad (1)
\end{aligned}
\]
Lemma 5 Let v_1, v_2, . . . be a sequence such that 0 < v_j < 1 for all j. Then
\[ \prod_{j=1}^\infty (1 - v_j) > 0 \quad \text{if and only if} \quad \sum_{j=1}^\infty v_j < \infty. \]
Proof. Since ∏_{j=1}^∞ (1 − v_j) = exp(∑_{j=1}^∞ log(1 − v_j)), we have
\[ \prod_{j=1}^\infty (1 - v_j) > 0 \iff -\sum_{j=1}^\infty \log(1 - v_j) < \infty. \]
We now have −log(1 − v_j) > 0 and v_j > 0 for all j ∈ N, and by L'Hôpital's rule
\[ \lim_{v \to 0^+} \frac{-\log(1 - v)}{v} = \lim_{v \to 0^+} \frac{1}{1 - v} = 1. \qquad (2) \]
Hence, by the limit comparison test,
\[ -\sum_{j=1}^\infty \log(1 - v_j) < \infty \quad \text{if and only if} \quad \sum_{j=1}^\infty v_j < \infty, \]
and thus
\[ \prod_{j=1}^\infty (1 - v_j) > 0 \quad \text{if and only if} \quad \sum_{j=1}^\infty v_j < \infty. \]
Corollary 6 Let v_1, v_2, . . . be a sequence such that 0 < v_j < 1 for all j. Then
\[ \prod_{j=1}^\infty (1 - v_j) = 0 \quad \text{if and only if} \quad \sum_{j=1}^\infty v_j = \infty. \]
Lemma 7 (Borel–Cantelli) If V_1, V_2, . . . are independent and ∑_{j=1}^∞ P(V_j > ε) = ∞ for some ε > 0, then P(V_j > ε i.o.) = 1.
Now since
\[ V_j \sim \mathrm{Beta}(1, \alpha), \qquad j = 1, 2, \ldots, \]
we have
\[ P(V_j > \epsilon) > 0 \quad \text{for all } \epsilon \in (0, 1) \text{ and } j = 1, 2, \ldots \qquad (4) \]
since the beta distribution puts positive mass over its entire support (0, 1), and now (4) implies
\[ \sum_{j=1}^\infty P(V_j > \epsilon) = \infty \quad \text{for all } \epsilon \in (0, 1). \qquad (5) \]
So altogether,
\[ P\Big(\sum_{j=1}^\infty W_j = 1\Big) \overset{(1)}{=} P\Big(\prod_{j=1}^\infty (1 - V_j) = 0\Big) \overset{\text{Cor. 6}}{=} P\Big(\sum_{j=1}^\infty V_j = \infty\Big) = 1, \]
where the last equality holds by (5) and Lemma 7: the V_j exceed some fixed ε > 0 infinitely often with probability one, so ∑_j V_j = ∞ almost surely. Thus,
\[ P\Big(\sum_{j=1}^\infty W_j = 1\Big) = 1. \qquad \blacksquare \]
It follows that
\[ E\Big(\sum_{j=1}^\infty W_j\Big) = \int \sum_{j=1}^\infty W_j\, dP = P\Big(\sum_{j=1}^\infty W_j = 1\Big)\cdot 1 + 0 = 1, \]
since ∑_{j=1}^∞ W_j = 1 almost surely.
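As an optional numerical illustration (ours, not part of the original solution), the following R snippet draws stick-breaking weights W_j = V_j ∏_{i<j}(1 − V_i) with V_j ∼ Beta(1, α) and checks that the truncated sum is already close to 1:

# Stick-breaking weights: W_j = V_j * prod_{i<j} (1 - V_i), with V_j ~ Beta(1, alpha)
set.seed(3)
alpha <- 1
J <- 200                                  # truncation level
V <- rbeta(J, shape1 = 1, shape2 = alpha)
W <- V * cumprod(c(1, 1 - V[-J]))         # W_1 = V_1, W_j = V_j * prod_{i<j} (1 - V_i)
sum(W)                                    # close to 1; the leftover mass is prod(1 - V)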
(b) Two good resources for such a simulation are the distr package in R, which is showcased by this tutorial, and, for Python 3, this tutorial, which uses the PyMC3 module to achieve a similar goal. We include R code for a single simulation, with parts taken from the tutorial indicated above.
library(distr)
library(coda)
library(latex2exp)
# Setup
set.seed(7)
n <- 10
alpha_vec <- c(0.1,1,10)
x_grid <- seq(-3, 3, by=0.05)
signif_level <- 0.05
# Sample observations
x_pts <- rnorm(n)
## BAYESIAN BANDS
# Functions to generate Bayesian credible bands
sample_cdf <- function(F_hat, n){
  F_hat@r(n)  # F_hat is an S4 object from the distr package; @r is its random-sampling slot
}
  points(x_grid, list_out_bayes[[as.character(alpha_val)]]$'mean_post_sim',
         type="l", col="red")
  points(x_grid, list_out_bayes[[as.character(alpha_val)]]$'cred_int'[2,], type="l", col="blue")
  points(x_grid, list_out_bayes[[as.character(alpha_val)]]$'cred_int'[1,], type="l", col="blue")
  curve(pnorm, xlim=c(-3, 3), add=TRUE, col="black", lwd=2)  # true CDF in black, matching the legend
  legend("topleft", lty="solid",
         legend=c("True", "DKW", "Kolmogorov", "Posterior Mean"),
         col=c("black", "green", "blue", "red"))
}
Figure 1: 95% DKW and Bayesian credible bands at different alpha levels: 0.1, 1, 10 from upper left
to bottom center respectively
For the 1,000 simulation replicates, one replicates the above code and records, for each replicate, whether the true CDF Φ lies entirely between the bands, for both the Bayesian credible band and the DKW band; the proportion of replicates in which this happens is the reported coverage probability. A sketch of this coverage loop for the DKW band is given below.
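A minimal sketch of the coverage loop for the DKW band, assuming the true distribution N(0, 1) and n = 10 as in the problem (checking containment on a finite grid is an approximation, and the Bayesian-band loop would reuse the credible-band functions above in the same way):

# Coverage of the 95% DKW band over 1,000 replications; half-width eps = sqrt(log(2/alpha)/(2n))
set.seed(7)
n <- 10; B <- 1000
eps <- sqrt(log(2 / 0.05) / (2 * n))
x_grid <- seq(-3, 3, by = 0.05)
covered <- replicate(B, {
  x <- rnorm(n)
  F_hat <- ecdf(x)(x_grid)                              # empirical CDF evaluated on the grid
  lower <- pmax(F_hat - eps, 0); upper <- pmin(F_hat + eps, 1)
  all(pnorm(x_grid) >= lower & pnorm(x_grid) <= upper)  # does the true CDF stay inside the band?
})
mean(covered)                                           # empirical coverage probability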
\[ X_{ij} = \theta_j + \epsilon_{ij}, \]
where all the ε_ij's are independent N(0, 1). The parameter is θ = (θ_1, θ_2, . . .). Assume that ∑_j θ_j² < ∞.
Due to sufficiency, we can reduce the problem to the sample means. Thus let Y_j = n^{-1} ∑_{i=1}^n X_{ij}. So
the model is Yj ∼ N (θj , 1/n) for j = 1, 2, 3, . . . . We will put a prior π on θ as follows. We take each
θj to be independent and we take θj ∼ N (0, τj2 ).
(a) (5 pts.) Find the posterior for θ. Find the posterior mean θ̂.
(b) (7 pts.) Suppose that ∑_j τ_j² < ∞. Show that θ̂ is consistent, that is, ∥θ̂ − θ∥₂ → 0 in probability.
where p > 1/2. The minimax rate (for squared error loss) for this problem is R_n ≍ n^{−2p/(2p+1)}. Let τ_j² = (1/j)^{2r}. Find r so that the posterior mean achieves the minimax rate.
Solution.
(a) Each coordinate is a conjugate normal–normal pair: the prior θ_j ∼ N(0, τ_j²) combined with Y_j ∼ N(θ_j, 1/n) gives the posterior
\[ \theta_j \mid Y_j \sim N\Big( \frac{n Y_j \tau_j^2}{1 + n\tau_j^2},\; \frac{\tau_j^2}{1 + n\tau_j^2} \Big), \]
with the coordinates independent, so the posterior mean is
\[ \hat\theta_j = \frac{n Y_j \tau_j^2}{1 + n \tau_j^2}. \]
(b) For any ε > 0, by Markov's inequality applied to ∥θ̂ − θ∥²,
\[
\begin{aligned}
P(\|\hat\theta - \theta\| > \epsilon) &\le \frac{E[\|\hat\theta - \theta\|^2]}{\epsilon^2} \\
&= \frac{1}{\epsilon^2} \sum_{j=1}^\infty E[(\hat\theta_j - \theta_j)^2] \\
&= \frac{1}{\epsilon^2} \Bigg[ \sum_{j=1}^\infty \big(E[\hat\theta_j] - \theta_j\big)^2 + \sum_{j=1}^\infty \mathrm{Var}(\hat\theta_j) \Bigg] \\
&\le \frac{1}{\epsilon^2} \Bigg[ \sum_{j=1}^\infty \frac{\theta_j^2}{(1 + n\tau_j^2)^2} + \sum_{j=1}^\infty \frac{\tau_j^2}{1 + n\tau_j^2} \Bigg] \qquad \text{(Theorem 6)} \\
&\to 0,
\end{aligned}
\]
since each summand tends to 0 as n → ∞ and is dominated by θ_j² and τ_j², respectively, which are summable (∑_j θ_j² < ∞ and ∑_j τ_j² < ∞); hence both sums tend to 0 by dominated convergence.
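A small numerical illustration of this shrinkage estimator (our sketch; the particular choices θ_j = 1/j, τ_j² = 1/j², and the truncation level J are assumptions made for the demonstration):

# Posterior mean theta_hat_j = n * Y_j * tau_j^2 / (1 + n * tau_j^2) on a truncated sequence,
# illustrating that ||theta_hat - theta||^2 shrinks as n grows.
set.seed(4)
J <- 5000                                        # truncation of the infinite sequence
j <- 1:J
theta <- 1 / j                                   # true means, sum(theta^2) < infinity
tau2 <- 1 / j^2                                  # prior variances, sum(tau2) < infinity
sq_err <- sapply(c(10, 100, 1000, 10000), function(n) {
  Y <- theta + rnorm(J, sd = 1 / sqrt(n))        # Y_j ~ N(theta_j, 1/n)
  theta_hat <- n * Y * tau2 / (1 + n * tau2)
  sum((theta_hat - theta)^2)
})
sq_err                                           # decreases toward 0 as n increases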
\[ E[\mu \mid X_1, \ldots, X_n] = \frac{a\sigma^2 + n \bar X b^2}{\sigma^2 + n b^2}. \]
Proof.
\[ f_{X^n}(x^n \mid \mu) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big\{-\frac{(x_i - \mu)^2}{2\sigma^2}\Big\} = (2\pi\sigma^2)^{-n/2} \exp\Big\{-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2\Big\}. \]
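A sketch of how this computation concludes (added here, assuming the conjugate prior µ ∼ N(a, b²) implied by the posterior-mean formula above): multiplying the likelihood by the prior and completing the square in µ,
\[ \pi(\mu \mid x^n) \propto \exp\Big\{-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2 - \frac{(\mu - a)^2}{2 b^2}\Big\} \propto \exp\Big\{-\frac{1}{2}\Big(\frac{n}{\sigma^2} + \frac{1}{b^2}\Big)\Big(\mu - \frac{n \bar x / \sigma^2 + a / b^2}{n/\sigma^2 + 1/b^2}\Big)^2\Big\}, \]
so the posterior is normal with mean (n x̄/σ² + a/b²)/(n/σ² + 1/b²) = (aσ² + n x̄ b²)/(σ² + n b²), as stated.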