
ASYMPTOTIC CONSISTENCY OF THE JAMES-STEIN SHRINKAGE ESTIMATOR

Key words: Asymptotic, Consistency, Convergence, Efficiency, Mean Squared Error, Shrinkage.

Abstract: The study explores the asymptotic consistency of the James-Stein shrinkage estimator
obtained by shrinking a maximum likelihood estimator. We use Hansen’s approach to show that the
James-Stein shrinkage estimator converges asymptotically to some multivariate normal distribution
with shrinkage effect values. We establish that the rate of convergence is of order $\frac{1}{k\sqrt{n}}$ and rate $k\sqrt{n}$; hence the James-Stein shrinkage estimator is $k\sqrt{n}$-consistent. We then visualise its consistency by studying the asymptotic behaviour using simulation plots in R for the mean squared error of the maximum likelihood estimator and of the shrinkage estimator. The latter graphically shows lower mean squared error compared to that of the maximum likelihood estimator.

1. Introduction
A shrinkage estimator is an estimator that, either explicitly or implicitly, incorporates the effects of
shrinkage. In loose terms this means that a naive or raw estimate is improved by combining it with
other information. The term relates to the notion that the improved estimate is made closer to the
“true value” by the supplied information than the raw estimate. Shrinkage estimation is a technique
used in inferential statistics to reduce the mean squared error (MSE) of a given estimator. The idea
of shrinking an estimator came in 1956 when (Stein, 1956) established that we can reduce the MSE
of an estimator if we give up a little on bias. This means that given an estimator we can shrink it
to obtain another estimator with lower MSE and the efficiency of the new estimator is desirable in
the way it estimates the “true" parameter value. This works well when the number of parameters is
more than two (p ≥ 3), the so-called “James-Stein classical condition”. When we shrink a maximum
likelihood estimator (MLE) under the “James-Stein condition”, we obtain a new shrinkage estimator
which is closer to the assumed true value compared to the MLE. The magnitude of the improvement

depends on the distance between the “true” parameter value and the parametric restriction which yields a shrinkage target denoted by $\tilde{\theta}_n^o$. With all these modifications and restrictions to achieve
this estimator, we ask ourselves if this desirable shrinkage estimator is asymptotically consistent and
efficient.
The literature on shrinkage estimators is extensive, but we mention only a few of the contributions most relevant to our study. Stein, together with his student James (James and Stein, 1961), used shrinking techniques
to come up with an estimator called the James-Stein shrinkage estimator (JSSE) which has lower
squared risk loss compared to the MLE. (Baranchik, 1964) showed that the positive part James-
Stein shrinkage estimator has lower risk than an ordinary JSSE. (Berger, 1976) gives a discussion
on selecting a minimax estimator of a multivariate normal mean by considering different types of
the James-Stein type estimators. (Stein, 1981) used shrinking techniques to estimate the mean for a
multivariate normal distribution. (Carter and Ullah, 1984) constructed the sampling distribution and
F-ratios for a James-Stein shrinkage estimator obtained by shrinking an ordinary least squares (OLS) estimator
in regression models. (George, 1986) proposed a new minimax multiple shrinkage estimator that al-
lows multiple specifications for selection of a set of targets to shrink a given estimator. (Geyer, 1994)
looked at the asymptotics of constrained M-estimators which also fall in the class of shrinkage es-
timators. Then (Hansen, 2008) constructed a generalised James-Stein shrinkage estimator obtained
by shrinking a MLE, and (Hansen, 2016) derived its asymptotic distribution and showed that we can
shrink towards a sub-parameter space.

2. Statement of the Problem


The theory of shrinkage techniques plays an important role in developing efficient statistical esti-
mators which play a key role in statistical decision theory. Therefore, a clear understanding of the

asymptotic behaviour of the James-Stein shrinkage estimator β̂ n provides knowledge on the stability
and efficiency of the estimator when the sample size value n grows without bound.
This paper will investigate the asymptotic consistency and efficiency of the James-Stein shrink-
age estimator (JSSE) obtained by shrinking a maximum likelihood estimator (MLE) when we have
observed variables $X \sim N_p(\theta, \Sigma)$. Though the shrinkage estimator we are interested in is biased, its study is important because there is a realisation that efficiency (lower risk) dominates all other properties in estimation. (Efron, 1975) discusses how biased estimation can dominate unbiased estimation.
We proceed by considering the asymptotic distributions of all three estimators important to this study, using results in Hansen (2016). Using the asymptotic distribution derived by (Hansen, 2016), we employ Taylor's theorem and some limit theorems to show that $\hat{\beta}_n^* \longrightarrow_p \theta_o$, the "true" parameter value, as $n \longrightarrow \infty$. Then we evaluate the asymptotic distributional bias (ADB) for the estimators $\hat{\theta}_n$, $\tilde{\theta}_n^o$ and $\hat{\beta}_n^*$ and show that the variance of the latter achieves the Cramér-Rao bound
(CRB) as n −→ ∞. The analysis is done along the sequences θ n as n −→ ∞. Simulation plots are
produced in a statistical package R to compare the JSSE and MLE in terms of mean squared error
(MSE), consistency and convergence.
The paper is organised as follows. Section 2.1 presents the parametric set up. Section 2.2 gives
the form of the JSSE considered in the study while Section 2.3 discusses the asymptotic distribu-
tions of the estimators. In Section 3 we present the main results by first presenting a lemma on the

convergence in probability of the shrinking factor as Section 3.1. We then show the consistency of
the shrinkage estimator in Theorem 1. In Section 3.2 we evaluate the ADBs of the estimators in
play. Then we show that the James-Stein shrinkage estimator is asymptotically efficient in Section
3.3 and also establish the rate of convergence in Section 3.4. In Section 3.5 we present MSE plots
comparing the JSSE and MLE and in Section 4 we give a discussion and analysis of the whole study.
We then conclude our study by stating the main results.
The following definitions are used to establish the asymptotic consistency and efficiency of the

James-Stein shrinkage estimator β̂ n .

Definition 1
An estimator $T_n = h(X_1, \dots, X_n)$ is said to be consistent for $\theta_n$ if it converges in probability to $\theta_n$. That is, if for all $\varepsilon > 0$
$$\lim_{n \to \infty} \Pr\left(|T_n - \theta_n| < \varepsilon\right) = 1$$
or
$$\lim_{n \to \infty} \Pr\left(|T_n - \theta_n| > \varepsilon\right) = 0,$$
where $n$ is the sample size value.
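As an informal illustration of Definition 1 (not part of the formal development), the following R sketch approximates the probability in the definition for the sample mean of normal data; the true value, the tolerance and the sample sizes are arbitrary choices, not quantities from this paper.

# Illustrative sketch only: empirical check of Definition 1 for the sample mean.
# The true value, tolerance and sample sizes are arbitrary choices.
set.seed(1)
theta <- 2; eps <- 0.1
for (n in c(50, 500, 5000)) {
  # approximate Pr(|T_n - theta| < eps) over 2000 replications
  hit <- replicate(2000, abs(mean(rnorm(n, mean = theta)) - theta) < eps)
  cat("n =", n, " estimated Pr(|T_n - theta| < eps):", mean(hit), "\n")
}

The estimated probability approaches 1 as n grows, as the definition requires.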

Definition 2
Let $X_1, \dots, X_n$ be independent and identically distributed (iid) according to a probability density $f_\theta(X)$ satisfying suitable regularity conditions. Suppose that $T_n = h(X_1, \dots, X_n)$ is asymptotically normal, say that $\sqrt{n}\,(T_n - \theta_n) \longrightarrow_d N_p(0, \Sigma_n)$ for a positive definite matrix $\Sigma_n$, where $T_n$ is estimating $\theta_n$. Then a sequence of estimators $\{T_n\} = \{h(X_1, \dots, X_n)\}$ satisfying
$$\lim_{n \to \infty}\left[n\,\mathrm{Var}(T_n)\right] = J_n(\theta)^{-1}$$
for the Fisher information $J_n(\theta)$ is said to be asymptotically efficient.


We now consider a statistical model. We describe the set up of the parameter of interest and the
shrinking strategy used in the study.

2.1. Parametric structure


Consider an unbiased estimator $\hat{\theta}_n$ for $\theta \in \Omega$ such that $\hat{\theta}_n \sim N_p(\theta, \Sigma_n)$ is a $p$-multivariate normal, where elements of $\Omega$ are $p$-dimensional parameter vectors. Let $\tilde{\theta}_n^o$ (the shrinkage target) be a restricted maximum likelihood estimator (RMLE) for $\theta \in \Omega^o$, a sub-parameter space partitioned from the whole parameter space $\Omega$ by a parametric restriction $\Omega^o = \{\theta \in \Omega : a(\theta) = 0\}$, where $a(\theta) : \mathbb{R}^p \longrightarrow \mathbb{R}^m$. The sub-parameter space $\Omega^o$ provides a simple model of interest to shrink to. If $m = p$ then $\Omega^o = \{0\}$ ($\theta$ is the kernel of $\mathbb{R}^p$), a singleton zero vector, and if $m < p$ we create a sub-model of particular interest. Let $A(\theta) = \frac{\partial}{\partial \theta} a(\theta)^\top$, where $A$ is a shrinkage matrix of dimension $p \times m$. We introduce another matrix $G$ which harmonises the dimension of the RMLE from $m$ to $p$. Hence we have a mapping $g(\theta) : \mathbb{R}^m \longrightarrow \mathbb{R}^p$ such that $G(\theta) = \frac{\partial}{\partial \theta} g(\theta)^\top$. The matrix $G$ is an $m \times p$ matrix when we consider a sub-parameter space $\Omega^o$, and $G = I_p$ is a $p$-dimensional identity matrix when we have the whole parameter space $\Omega$. We note that the matrix $G$ is used to increase the dimension of the RMLE since it will be $m$-dimensional. Therefore we have a plug-in restricted maximum likelihood estimator $g(\tilde{\theta}_n^o) = \tilde{\beta}_n^o$. The matrix $G$ harmonises the dimension due to shrinkage with the actual dimension of the parameters of interest $p$. The plug-in unrestricted MLE $\hat{\theta}_n$ in the shrinkage sense is denoted by $\hat{\beta}_n$. With all parameters set, we present the generalised James-Stein shrinkage estimator $\hat{\beta}_n^*$ in the next section.

2.2. Positive part James-Stein shrinkage estimator


Let $\hat{\theta}_n$ be the MLE for $\theta \in \Omega$ and $\tilde{\theta}_n^o$ be a restricted maximum likelihood estimator for $\theta \in \Omega^o$, a sub-parameter space of the whole parameter space $\Omega$ such that the elements of the parameter space are in $\mathbb{R}^p$ as described before. Let $g(\tilde{\theta}_n^o) = \tilde{\beta}_n^o$ be the plug-in estimator of the RMLE of $p$-dimension. Then the James-Stein shrinkage estimator $\hat{\beta}_n^*$ obtained by shrinking the MLE towards the target $\tilde{\theta}_n^o$ is given by
$$\hat{\beta}_n^* = \hat{\beta}_n - \left(\frac{p-2}{n\big(\hat{\beta}_n - \tilde{\beta}_n^o\big)^\top \Sigma^{-1}\big(\hat{\beta}_n - \tilde{\beta}_n^o\big)}\right)_+ \big(\hat{\beta}_n - \tilde{\beta}_n^o\big) \qquad (1)$$
where $(a)_+$ is the positive trimming function and $p \geq 3$. The shrinkage estimator in (1) can be expressed as a weighted average by letting
$$D_n = n\big(\hat{\beta}_n - \tilde{\beta}_n^o\big)^\top \Sigma^{-1}\big(\hat{\beta}_n - \tilde{\beta}_n^o\big) \qquad (2)$$
a distance statistic, which is the same as the loss function $n\,\ell\big(\hat{\beta}_n, \tilde{\beta}_n^o\big)$, where $\Sigma$ is a covariance matrix, and
$$\hat{w} = \left(1 - \frac{\tau}{D_n}\right)_+ \qquad (3)$$
for $\tau = p - 2$. Then (1) becomes
$$\hat{\beta}_n^* = \hat{w}\,\hat{\beta}_n + (1 - \hat{w})\,\tilde{\beta}_n^o \qquad (4)$$
which is a weighted average of $\hat{\beta}_n$ and $\tilde{\beta}_n^o$. The James-Stein shrinkage estimator presented above has
lower risk compared to the MLE as shown by (Hansen, 2016) and (James and Stein, 1961). To check
whether the shrinkage estimator is asymptotically consistent we need its asymptotic distribution. We
therefore present the asymptotic distributions for the MLE, RMLE and JSSE in the next section.
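Before turning to the asymptotic distributions, a small computational sketch of (1)-(4) in R may help fix ideas. The function below evaluates the positive-part James-Stein shrinkage estimator from a given MLE, shrinkage target and covariance matrix; the numerical inputs are illustrative placeholders, not quantities used elsewhere in the paper.

# Sketch of the positive-part James-Stein shrinkage estimator in (1)/(4).
# The inputs beta_hat, beta_tilde, Sigma and n are illustrative placeholders.
js_shrink <- function(beta_hat, beta_tilde, Sigma, n) {
  p  <- length(beta_hat)
  d  <- beta_hat - beta_tilde
  Dn <- n * as.numeric(t(d) %*% solve(Sigma) %*% d)  # distance statistic (2)
  w  <- max(1 - (p - 2) / Dn, 0)                     # weight (3) with positive trimming
  w * beta_hat + (1 - w) * beta_tilde                # weighted average (4)
}

# toy example with p = 3 and shrinkage target zero
js_shrink(beta_hat = c(0.8, -0.4, 0.3), beta_tilde = rep(0, 3), Sigma = diag(3), n = 50)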

2.3. Asymptotic distribution


We assume that the maximum likelihood estimator satisfies the regularity conditions given in Hansen (2016) and Newey and McFadden (1994). With these assumptions in mind, the asymptotic distributions of $\tilde{\beta}_n^o$ and $\hat{\beta}_n^*$ are analysed along the sequences $\theta_n = \theta_o + n^{-\frac{1}{2}} h$, where $\theta_o$ is the assumed true parameter value and $h \in \mathbb{R}^p$ is a constant providing a neighbourhood for the true parameter value $\theta_o$. From the normality of the MLE we have
$$\sqrt{n}\big(\hat{\theta}_n - \theta_n\big) \longrightarrow_d Z \sim N_p(0, \Sigma) \qquad (5)$$
as $n \longrightarrow \infty$. Using (5), (Hansen, 2016) obtained the asymptotic distribution of the restricted maximum likelihood estimator as
$$\sqrt{n}\big(\tilde{\theta}_n^o - \theta_n\big) \longrightarrow_p Z - \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top (Z + h) \qquad (6)$$
which has some shrinkage value effect $k = \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top$. As a consequence of the convergence in (5) and (6) we have
$$\sqrt{n}\big(\hat{\beta}_n - \tilde{\beta}_n^o\big) \longrightarrow_d G^\top \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top (Z + h) \qquad (7)$$
which is the asymptotic distribution the MLE converges to when it is estimating the RMLE, where $\beta_n = g(\theta_n)$. The distance statistic $D_n$ in equation (2) converges to a non-central chi-squared distribution, as described by (Hansen, 2016),
$$D_n = n\,\ell\big(\hat{\beta}_n, \tilde{\beta}_n^o\big) \longrightarrow_d (Z + h)^\top B (Z + h) = \xi \sim \chi^2_p\big(h^\top B h\big) \qquad (8)$$
where the matrix $B = A\big(A^\top \Sigma A\big)^{-1} A^\top \Sigma\, G\, \Sigma^{-1}\, G^\top \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top$. Using (2), (Hansen, 2016) showed that
$$\hat{w} \longrightarrow_d w(Z) = \left(1 - \frac{p-2}{\xi}\right)_+ \qquad (9)$$
which is a positive asymptotic distribution of an inverse of a chi-squared distribution with constant $\tau = (p - 2)$ for $p \geq 3$. Therefore, using (9) as $n \longrightarrow \infty$, (Hansen, 2016) showed that
$$\sqrt{n}\big(\hat{\beta}_n^* - \beta_n\big) \longrightarrow_d w(Z)\, G^\top Z + (1 - w(Z))\left(G^\top Z - G^\top \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top (Z + h)\right) \qquad (10)$$

which is normally distributed with some shrinkage value effect. With the asymptotic distribution of
the shrinkage estimator in place we now present the main results.

3. Main Results
The main results are presented in three sections. In the first section we show that the James-Stein

shrinkage estimator β̂ n is asymptotically consistent. In the second section we evaluate the asymptotic
distributional bias of the three estimators in play. Then in the last section we show that the shrinkage estimator is asymptotically efficient by showing that its variance achieves the Cramér-Rao bound.

3.1. Consistency of the James-Stein shrinkage estimator


We present Lemma 1, which shows the convergence in probability of the weight (shrinkage factor) $\hat{w}$. The result is used when establishing the consistency of the James-Stein shrinkage estimator $\hat{\beta}_n^*$.

Lemma 1
From equation (8) we have
$$\hat{w} \longrightarrow_d w(Z) = \left(1 - \frac{\tau}{\xi}\right)_+ \qquad (11)$$
where $\xi = (Z + h)^\top B (Z + h) \sim \chi^2_p\big(h^\top B h\big)$ is a non-central chi-squared distribution with non-centrality parameter $h^\top B h$, $\tau = p - 2$ and $p \geq 3$. Along the sequences $\theta_n$, if $h \longrightarrow \infty$ then
$$w(Z) \longrightarrow_p 1 \qquad (12)$$
and if $h$ is fixed then
$$w(Z) \longrightarrow_p 0 \quad \text{otherwise} \quad w(Z) \longrightarrow_p r \qquad (13)$$
where $r$ is a constant such that $0 < r < 1$ and $(a)_+$ in (11) is a positive trimming function which keeps what is in the brackets greater than or equal to zero.

Proof.
We begin by considering the first case, when $h$ diverges to infinity. Suppose that $h \longrightarrow \infty$; then
$$(Z + h) \longrightarrow \infty \quad \text{as } n \longrightarrow \infty. \qquad (14)$$
Therefore from (14) we have
$$(Z + h)^\top B (Z + h) = \xi \longrightarrow \infty \quad \text{as } h \longrightarrow \infty \text{ and } n \longrightarrow \infty. \qquad (15)$$
Now, considering that
$$\hat{w} \longrightarrow_d w(Z) = \left(1 - \frac{\tau}{\xi}\right)_+$$
as $n \longrightarrow \infty$, and using (15), we have
$$w(Z) \longrightarrow_p 1 \quad \text{as } n \longrightarrow \infty. \qquad (16)$$
Hence we have established (12).

Secondly, suppose that $h$ is fixed; then we have the sequence
$$\theta_n = \theta_o + n^{-\frac{1}{2}} h$$
which becomes $\theta_n = \theta_o$ as $n \longrightarrow \infty$, implying that $\hat{\theta}_n \longrightarrow \theta_o$ as $n \longrightarrow \infty$. Suppose
$$\xi = \chi^2_p\big(h^\top B h\big) \longrightarrow_p D \quad \text{as } n \longrightarrow \infty \qquad (17)$$
where $D$ is a constant, $h$ is fixed and $B$ is not affected by an increase in $n$. Then
$$w(Z) \longrightarrow_p \left(1 - \frac{\tau}{D}\right)_+$$
where $p \geq 3$. If $\frac{\tau}{D} = 1$ as $n \longrightarrow \infty$, then
$$w(Z) \longrightarrow_p 0.$$
If $\frac{\tau}{D} > 1$ then $1 - \frac{\tau}{D}$ will be negative and, by definition of the positive trimming function, we end up with zero. This will vary as $p$ changes, but still considering $\xi \sim \chi^2_p\big(h^\top B h\big)$, the probability of $\xi$ depends on the degrees of freedom $p$ and will vary according to the chi-squared distribution, implying that the ratio $\frac{\tau}{D} = M \geq 1$ as $n \longrightarrow \infty$. Therefore we have
$$w(Z) \longrightarrow_p \left(1 - \frac{\tau}{D}\right)_+ \quad \text{as } n \longrightarrow \infty,$$
$$w(Z) \longrightarrow_p (1 - M)_+ \quad \text{as } n \longrightarrow \infty$$
for a constant $M > 1$. Proceeding in the same way we have
$$w(Z) \longrightarrow_p (F)_+ \quad \text{as } n \longrightarrow \infty, \quad \text{for } 1 - M = F < 0,$$
$$w(Z) \longrightarrow_p 0 \quad \text{as } n \longrightarrow \infty$$
by definition of $(x)_+$. Thus
$$w(Z) \longrightarrow_p 0 \quad \text{as } n \longrightarrow \infty. \qquad (18)$$
Otherwise, if the ratio $\frac{\tau}{D}$ is such that $0 < \frac{\tau}{D} < 1$ as $n \longrightarrow \infty$, we have
$$w(Z) \longrightarrow_p r$$
where $r \in (0, 1)$.
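The two limiting regimes in Lemma 1 can be inspected numerically. The R sketch below draws $\xi$ from a non-central chi-squared distribution and evaluates $w(Z) = (1 - \tau/\xi)_+$ for a small and a very large non-centrality parameter; the value of $p$ and the non-centralities are arbitrary illustrative choices, not quantities from the paper.

# Illustration of Lemma 1: behaviour of w(Z) = (1 - tau/xi)_+ for xi ~ chi^2_p(ncp).
# p and the non-centrality values are arbitrary illustrative choices.
set.seed(1)
p <- 5; tau <- p - 2
for (ncp in c(1, 1000)) {            # small versus very large h'Bh
  xi <- rchisq(1e5, df = p, ncp = ncp)
  w  <- pmax(1 - tau / xi, 0)        # positive trimming
  cat("ncp =", ncp, " average w(Z):", round(mean(w), 3), "\n")
}

For the large non-centrality the weight concentrates near 1, matching the case $h \longrightarrow \infty$, while for a small non-centrality the weight stays well below 1.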

Lemma 1 above establishes convergence of the weight $\hat{w}$ which determines shrinkage. In this case we realise that the same weight determines the convergence in distribution and probability of the shrinkage estimator. From the regularity conditions we know that the MLE $\hat{\theta}_n$ is asymptotically consistent and this consistency extends to the RMLE $\tilde{\theta}_n^o$. With this fact in mind, we now present the main result, which shows the consistency of the shrinkage estimator $\hat{\beta}_n^*$.

Theorem 1
Let $\theta \in \Omega$, where $\Omega$ is a parameter space with elements in $\mathbb{R}^p$. Suppose we have a James-Stein shrinkage estimator $\hat{\beta}_n^*$ which is obtained by shrinking the maximum likelihood estimator $\hat{\theta}_n$ of $\theta \in \Omega$, where the shrinkage target $\tilde{\theta}_n^o$ is the restricted maximum likelihood estimator of $\theta \in \Omega^o$, a sub-parameter space partitioned from $\Omega$ by the restriction described in Section 2.1. Then the JSSE is given by
$$\hat{\beta}_n^* = \hat{\beta}_n - \left(\frac{p-2}{n\big(\hat{\beta}_n - \tilde{\beta}_n^o\big)^\top \Sigma^{-1}\big(\hat{\beta}_n - \tilde{\beta}_n^o\big)}\right)_+ \big(\hat{\beta}_n - \tilde{\beta}_n^o\big)$$
where $\hat{\beta}_n = g(\hat{\theta}_n)$, $\tilde{\beta}_n^o = g(\tilde{\theta}_n^o)$, $p \geq 3$ and $(x)_+$ is a positive trimming function. If $\hat{\theta}_n$ is consistent for $\theta_n$ as $n \longrightarrow \infty$, then the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is also consistent for $\theta_n$ as $n \longrightarrow \infty$, where the sequence $\theta_n$ is as defined in Section 2.3.

Proof.
Let $\beta_n = \theta_n$. To show that $\hat{\beta}_n^*$ is consistent for $\theta_n$ as $n \longrightarrow \infty$ we consider the value of $h$ which determines the neighbourhood of the sequence $\theta_n$: when it diverges to infinity and when it is fixed. Suppose that $h$ diverges to infinity. To evaluate
$$\sqrt{n}\big(\hat{\beta}_n^* - \beta_n\big) \longrightarrow_d w(Z)\, G^\top Z + (1 - w(Z))\left(G^\top Z - \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top (Z + h)\right) \qquad (19)$$
as $n \longrightarrow \infty$, we first consider $w(Z)$ from (3). By Lemma 1,
$$w(Z) \longrightarrow_p 1 \quad \text{as } n \longrightarrow \infty. \qquad (20)$$
Hence, from (20) and substituting $w(Z)$ by $1$ in (19), we have
$$\sqrt{n}\big(\hat{\beta}_n^* - \beta_n\big) \longrightarrow_d G^\top Z \sim G^\top N_p(0, \Sigma)$$
which gives
$$\sqrt{n}\big(\hat{\beta}_n^* - \beta_n\big) \longrightarrow_d N_p(0, \Sigma_\beta) \qquad (21)$$
as $n \longrightarrow \infty$, where $\Sigma_\beta = G^\top \Sigma G$. Thus we have
$$\hat{\beta}_n^* \longrightarrow_p \beta_n$$
if
$$\lim_{n \to \infty} P\left(\big|\hat{\beta}_n^* - \beta_n\big| > \varepsilon\right) = 0$$
for any $\varepsilon > 0$. Hence $\hat{\beta}_n^*$ is consistent for $\beta_n = \theta_n$.
Secondly, suppose $h$ is fixed at a finite value; then the sequence
$$\theta_n = \theta_o + n^{-\frac{1}{2}} h$$
becomes $\theta_n = \theta_o$ as $n \longrightarrow \infty$. From this equality we have $\hat{\theta}_n = \hat{\theta}_o$ and two conditions arise. The first one is that the sequence $\theta_n$ will be within the restricted parameter space $\Omega^o$ with $\theta_o \in \Omega^o$. From the restriction $\{\theta \in \Omega : a(\theta) = 0\}$ of $\Omega^o$, this means that the shrinkage target is exactly at the true value and our consideration will be just on one parameter space. Therefore we have $\hat{\theta}_n = \tilde{\theta}_n^o$, but from (6)
$$\sqrt{n}\big(\tilde{\theta}_n^o - \theta_n\big) \longrightarrow_d \zeta = Z - \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top (Z + h)$$
which will be the same as the asymptotic distribution of $\hat{\theta}_n$, since we only consider the sub-parameter space $\Omega^o$ and the shrinkage value $\Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top (Z + h)$ affects it. Thus
$$\sqrt{n}\big(\hat{\theta}_n - \theta_n\big) \longrightarrow_d \zeta = Z - \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top (Z + h) \quad \text{as } n \longrightarrow \infty \qquad (22)$$
because we are estimating $\theta \in \Omega^o$ and, from Section 2.1, there will be no difference between the MLE and RMLE. Due to this equality of the two maximum likelihood estimators, from (4)
$$\hat{\beta}_n^* = \hat{w}\,\hat{\beta}_n + (1 - \hat{w})\,\tilde{\beta}_n^o$$
and substituting (22) in (19) we have
$$\sqrt{n}\big(\hat{\beta}_n^* - \beta_n\big) \longrightarrow_d w(Z)\, G^\top\left(Z - \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top (Z + h)\right) + (1 - w(Z))\, G^\top\left(Z - \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top (Z + h)\right)$$
as $n \longrightarrow \infty$, which becomes
$$\sqrt{n}\big(\hat{\beta}_n^* - \beta_n\big) \longrightarrow_d w(Z)\, G^\top\left(Z - \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top (Z + h)\right) - w(Z)\, G^\top\left(Z - \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top (Z + h)\right) + G^\top\left(Z - \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top (Z + h)\right)$$
as $n \longrightarrow \infty$, and then simplifies to
$$\sqrt{n}\big(\hat{\beta}_n^* - \beta_n\big) \longrightarrow_d G^\top\left(Z - \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top (Z + h)\right) \qquad (23)$$
as $n \longrightarrow \infty$, which is the same as the asymptotic distribution of $\tilde{\beta}_n^o = g(\tilde{\theta}_n^o)$. Therefore, using the consistency of the RMLE and (23), the consistency of the James-Stein shrinkage estimator $\hat{\beta}_n^*$ follows from the consistency of $\tilde{\theta}_n^o$.
Lastly, we consider the case when we have two well defined parameter spaces, $\Omega^o$ and $\Omega - \Omega^o$. Then we have $\hat{\theta}_n \neq \tilde{\theta}_n^o$. Analysing (19) further, we consider the shrinkage effect value $\Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top$, which is not affected by the sample size value $n$ but is affected by an increase or decrease in the number of parameters $p$. Since
$$Z \sim N_p(0, \Sigma)$$
then
$$Z + h \sim N_p(h, \Sigma) \qquad (24)$$
by the linearity property of the normal distribution. This also implies that
$$\eta(Z + h) \sim N_p\big(\eta h, \eta^\top \Sigma \eta\big) \qquad (25)$$
for some matrix $\eta$ of dimension $p \times p$. From (13) of Lemma 1 we have
$$w(Z) \longrightarrow_p 0 \quad \text{if } \frac{\tau}{D_n} \geq 1 \qquad (26)$$
as $n \longrightarrow \infty$. Therefore (19) becomes
$$\sqrt{n}\big(\hat{\beta}_n^* - \beta_n\big) \longrightarrow_d w(Z)\, G^\top Z + (1 - w(Z))\, G^\top\big(Z - \eta(Z + h)\big)$$
for the shrinkage value effect matrix $\eta = \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top$. Evaluating this asymptotic distribution as $n \longrightarrow \infty$ we have
$$w(Z)\, G^\top Z + (1 - w(Z))\, G^\top\big(Z - \eta(Z + h)\big) \longrightarrow_d G^\top\big(Z - \eta(Z + h)\big)$$
as $n \longrightarrow \infty$, since $w(Z) \longrightarrow_p 0$. Thus
$$\sqrt{n}\big(\hat{\beta}_n^* - \beta_n\big) \longrightarrow_d G^\top\left(Z - \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top (Z + h)\right) \quad \text{as } n \longrightarrow \infty.$$
Hence the consistency of $\hat{\beta}_n^*$ follows from the consistency of $\tilde{\beta}_n^o = g(\tilde{\theta}_n^o)$, which is consistent since $\tilde{\theta}_n^o$ is consistent. Similarly, if $w(Z) \longrightarrow_p r \in (0, 1)$, the consistency of the James-Stein shrinkage estimator $\hat{\beta}_n^*$ follows from the consistency of the restricted maximum likelihood estimator and also from the fact that $\hat{\theta}_n$ is consistent for $\theta_n$. Thus the shrinkage estimator $\hat{\beta}_n^*$ is asymptotically consistent for $\theta_n$.

In Theorem 1 we first consider the case when the sequence $\theta_n$ has a neighbourhood which is not restricted, by letting $h$ diverge to infinity. When this is the case, the entire parameter space becomes of interest, and for $h \longrightarrow \infty$ we obtain $\xi \longrightarrow_p \infty$ and $\hat{w} \longrightarrow_p 1$ as $n \longrightarrow \infty$. Hence there is no difference in how the parameters in $\Omega^o$ and $\Omega$ are asymptotically distributed. As a result, the asymptotic distribution of the James-Stein shrinkage estimator is the same as that of the initial maximum likelihood estimator under this condition. Therefore the consistency of the James-Stein shrinkage estimator follows from the consistency of the maximum likelihood estimator.
In the second case we take $h$ as a fixed finite value. In this case the two parameter spaces are well defined and distinct in terms of where the parameters of interest are located. When $n \longrightarrow \infty$ then $\theta_n = \theta_o$ since $n^{-\frac{1}{2}} h \longrightarrow 0$. Thus, when we are within the restricted sub-parameter space $\Omega^o$, the maximum likelihood estimator and the restricted maximum likelihood estimator are asymptotically distributed the same. The consequence of having the two maximum likelihood estimators (MLE and RMLE) distributed the same is that the James-Stein shrinkage estimator has the same asymptotic distribution as the MLE and RMLE. Furthermore, we have $\sqrt{n}\big(\hat{\beta}_n - \tilde{\beta}_n^o\big) \longrightarrow_p 0$ as $n \longrightarrow \infty$. Stone (1974) obtained similar results, though under invariant estimators; in this case the two maximum likelihood estimators do not have to be invariant. Therefore the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is asymptotically consistent for $\theta_n$.

In the next section we investigate the asymptotic distributional bias of $\hat{\beta}_n^*$. The results in this section are used in showing the asymptotic efficiency of the shrinkage estimator $\hat{\beta}_n^*$.

3.2. Asymptotic distributional bias


We study the asymptotic distributional bias (ADB) for the three estimators by analysing the asymptotic bias values. The ADB of an estimator $T_n$ is given by
$$\mathrm{ADB}(T_n) = \lim_{n \to \infty} E\left[\sqrt{n}\,(T_n - \theta_n)\right] \qquad (27)$$
where the estimator $T_n$ is estimating $\theta_n$. We present the asymptotic distributional bias for $\hat{\theta}_n$, $\tilde{\theta}_n^o$ and $\hat{\beta}_n^*$ in the theorem below.

Theorem 2
Suppose that the regularity assumptions for the MLE and RMLE hold. Then under $\{P_n\}$, a sequence of parameter dimension with the sample size value $n$ and $p \geq 3$, the ADBs of the estimators $\hat{\theta}_n$, $\tilde{\theta}_n^o$ and $\hat{\beta}_n^*$ are respectively

1. $\mathrm{ADB}(\hat{\theta}_n) = 0$
2. $\mathrm{ADB}(\tilde{\theta}_n^o) = -\Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h$
3. $\mathrm{ADB}(\hat{\beta}_n^*) = -\vartheta\, G^\top \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h$

where $\vartheta = E_\theta\left[\frac{p-2}{\xi}\right]$.

Proof.
1.
$$\begin{aligned}
\mathrm{ADB}(\hat{\theta}_n) &= \lim_{n \to \infty} E_\theta\left[\sqrt{n}\,(\hat{\theta}_n - \theta_n)\right] \\
&= \lim_{n \to \infty} 0 \\
&= 0 \qquad (28)
\end{aligned}$$
$\therefore \mathrm{ADB}(\hat{\theta}_n) = 0$, since $\sqrt{n}\,(\hat{\theta}_n - \theta_n) \longrightarrow_d Z \sim N_p(0, \Sigma)$ as $n \longrightarrow \infty$.

2.
$$\begin{aligned}
\mathrm{ADB}(\tilde{\theta}_n^o) &= \lim_{n \to \infty} E_\theta\left[\sqrt{n}\,(\tilde{\theta}_n^o - \theta_n)\right] \\
&= \lim_{n \to \infty} \left(-\Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h\right) \\
&= -\Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h \qquad (29)
\end{aligned}$$
$\therefore \mathrm{ADB}(\tilde{\theta}_n^o) = -\Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h$, from equation (6).

3. $\mathrm{ADB}(\hat{\beta}_n^*) = \lim_{n \to \infty} E_\theta\left[\sqrt{n}\,(\hat{\beta}_n^* - \beta_n)\right]$. From equation (19) of Theorem 1 we have
$$\sqrt{n}\big(\hat{\beta}_n^* - \beta_n\big) \longrightarrow_d w(Z)\, G^\top Z + (1 - w(Z))\, G^\top\left(Z - \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top (Z + h)\right)$$
where
$$w(Z) = \left(1 - \frac{p-2}{\xi}\right)_+.$$
Therefore,
$$\begin{aligned}
E_\theta[w(Z)] &= E_\theta\left[1 - \frac{p-2}{\xi}\right] \\
&= 1 - E_\theta\left[\frac{p-2}{\xi}\right] \\
&= 1 - \vartheta
\end{aligned}$$
where $\vartheta = E_\theta\left[\frac{p-2}{\xi}\right]$, $p \geq 3$, and $E_\theta(Z) = 0$ as $n \longrightarrow \infty$ since $Z \sim N_p(0, \Sigma)$. Then
$$\begin{aligned}
\mathrm{ADB}(\hat{\beta}_n^*) &= \lim_{n \to \infty}\left[(1 - \vartheta)\, 0 + (1 - 1 + \vartheta)\left(-G^\top \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h\right)\right] \\
&= \lim_{n \to \infty}\left[-\vartheta\, G^\top \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h\right] \\
&= -\vartheta\, G^\top \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h \qquad (30)
\end{aligned}$$
$\therefore \mathrm{ADB}(\hat{\beta}_n^*) = -\vartheta\, G^\top \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h$, where $\vartheta = E_\theta\left[\frac{p-2}{\xi}\right]$ for $p \geq 3$.
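Since $\vartheta = E_\theta\left[\frac{p-2}{\xi}\right]$ with $\xi \sim \chi^2_p\big(h^\top B h\big)$ has no simple closed form in general, it can be approximated by simulation. The R sketch below does this for hypothetical values of $p$ and of the non-centrality parameter, which are not quantities taken from the paper.

# Monte Carlo approximation of vartheta = E[(p - 2)/xi], xi ~ chi^2_p(ncp).
# The values of p and ncp are hypothetical and used only for illustration.
set.seed(1)
p <- 5; ncp <- 4
xi <- rchisq(1e6, df = p, ncp = ncp)
vartheta <- mean((p - 2) / xi)
vartheta   # this constant scales the asymptotic distributional bias in part 3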

Remark 1
When the fixed constant $h = 0$, the asymptotic distributional bias values of the three estimators are zero. Therefore $h \neq 0$.
From equation (28) of Theorem 2, the maximum likelihood estimator is asymptotically unbiased. Equations (29) and (30) show that the restricted maximum likelihood estimator and the James-Stein shrinkage estimator are both asymptotically biased. This means that both shrinking and partitioning of a parameter space bring bias to estimators.
Using the bias of the shrinkage estimator obtained above, we now analyse whether the James-

Stein estimator β̂ n is asymptotically efficient.

3.3. Asymptotic efficiency



To check whether the shrinkage estimator β̂ n is asymptotically efficient, we use the Cramér-Rao
bound for biased estimators. In the theorem below we show that the variance of the JSSE achieves
this bound as $n \longrightarrow \infty$. We use concepts from the study by Hodges and Lehmann (1975).

Theorem 3

Let $\hat{\beta}_n^*$ be a James-Stein shrinkage estimator obtained by shrinking a maximum likelihood estimator $\hat{\theta}_n$, where the two estimators are as defined in Section 2.2. Given the asymptotic bias $b_\theta(\hat{\beta}_n^*)$ of the JSSE $\hat{\beta}_n^*$, the Cramér-Rao bound for $\hat{\beta}_{nj}^*$ is given by
$$\mathrm{CRB} = \frac{\left[1 + b'_{\theta_j}(\hat{\beta}_n^*)\right]^2}{J_{jj}(\theta)} \quad \text{for } j = 1, 2, \dots, p \qquad (31)$$
where $J(\theta)$ is the Fisher information and $b'_{\theta_j}$ is the derivative of the $j$th element of the bias vector. Then
$$\frac{\mathrm{CRB}}{\Sigma_{jj}(\hat{\beta}_n^*)} = 1 \quad \text{as } n \longrightarrow \infty,$$
and thus the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is asymptotically efficient for all $j = 1, 2, \dots, p$.

Proof.
We analyse asymptotic efficiency by evaluating the Cramér-Rao bound as $n \longrightarrow \infty$. Consider the bias of the estimator $\hat{\beta}_n^*$ from part 3 of Theorem 2,
$$b_\theta(\hat{\beta}_n^*) = -\vartheta\, G^\top \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h \qquad (32)$$
where $\vartheta = E\left[\frac{p-2}{\xi}\right]$ for $p \geq 3$. The expectation $E\left[\frac{p-2}{\xi}\right]$ of the fraction $\frac{p-2}{\xi}$, which follows a distribution determined by the distribution $\xi \sim \chi^2_p\big(h^\top B h\big)$, has a value (constant) free of the parameter $\theta$. Therefore we regard it as a constant. Let $\alpha = -\vartheta$; then (32) becomes
$$b_\theta(\hat{\beta}_n^*) = \alpha\, G^\top \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h \qquad (33)$$
and $b'_\theta(\hat{\beta}_n^*)$ will be
$$b'_\theta(\hat{\beta}_n^*) = \frac{\partial}{\partial \theta}\, b_\theta(\hat{\beta}_n^*) = \alpha\, \frac{\partial}{\partial \theta}\left(G^\top \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h\right) \qquad (34)$$
a matrix of dimension $p$. Using the definition of the CRB and then combining (32) and (34) we obtain
$$\frac{\left[1 + b'_{\theta_j}(\hat{\beta}_n^*)\right]^2}{J_{jj}(\theta)} = \frac{\left[1 + \alpha\, \frac{\partial}{\partial \theta_j}\left(G^\top \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h\right)\right]^2}{J_{jj}(\theta)} \qquad (35)$$
for $j = 1, 2, \dots, p$, where $\frac{\partial}{\partial \theta_j}$ is the partial derivative with respect to the $j$th element, $\Sigma = \Sigma(\theta) = \left(-\sum_{i=1}^n \frac{\partial^2}{\partial \theta \partial \theta^\top} \log f_\theta(X_i)\right)^{-1}$, $A = A(\theta_o) = \frac{\partial}{\partial \theta}\, a(\theta)^\top$ and
$$J = J(\theta) = E\left[\sum_{i=1}^n -\frac{\partial^2}{\partial \theta \partial \theta^\top} \log f_\theta(X_i)\right].$$

We begin our analysis of the bound by considering the terms involved. Thus
$$A = \frac{\partial}{\partial \theta}\, a(\theta)^\top$$
remains the same as $n \longrightarrow \infty$. We have
$$\Sigma(\theta) = \left(-\sum_{i=1}^n \frac{\partial^2}{\partial \theta \partial \theta^\top} \log f_\theta(X_i)\right)^{-1} \longrightarrow \Sigma_p \qquad (36)$$
as $n \longrightarrow \infty$, where the elements of $\Sigma_p$ are zeros apart from the diagonal elements $\Sigma_{jj}(\theta)$, which are ones for $j = 1, 2, \dots, p$, since the observations are iid and follow a $p$-multivariate standard normal distribution. Thus from (36) we have
$$\frac{\partial}{\partial \theta_j}\, \Sigma_{jj}(\theta) \longrightarrow 0 \quad \text{and} \quad \frac{\partial}{\partial \theta}\, \Sigma \longrightarrow 0 \qquad (37)$$
as $n \longrightarrow \infty$ for $j = 1, 2, \dots, p$. This implies that
$$\frac{\partial}{\partial \theta_j}\, G^\top \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h \longrightarrow 0 \quad \text{and} \quad \frac{\partial}{\partial \theta}\, G^\top \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h \longrightarrow 0 \qquad (38)$$
for $j = 1, 2, \dots, p$ as $n \longrightarrow \infty$. Then from (38) we have
$$b'_{\theta_j}(\hat{\beta}_n^*) \longrightarrow 0 \quad \text{and} \quad b'_\theta(\hat{\beta}_n^*) \longrightarrow 0 \qquad (39)$$

for $j = 1, 2, \dots, p$ as $n \longrightarrow \infty$. Therefore from (38) and (39), and using (35), we have
$$\frac{\left[1 + b'_{\theta_j}(\hat{\beta}_n^*)\right]^2}{J_{jj}(\theta)} = \frac{\left[1 + \alpha\, \frac{\partial}{\partial \theta_j}\left(G^\top \Sigma A\big(A^\top \Sigma A\big)^{-1} A^\top h\right)\right]^2}{J_{jj}(\theta)} = \frac{[1 + 0]^2}{J_{jj}(\theta)} = J_{jj}(\theta)^{-1} \qquad (40)$$
as $n \longrightarrow \infty$ for $j = 1, 2, \dots, p$. Since for all $j = 1, 2, \dots, p$ we have $\Sigma_{jj}(\theta) = J_{jj}(\theta)^{-1}$, then $\Sigma = J^{-1}$ as $n \longrightarrow \infty$. Hence from (40) the variance of the James-Stein shrinkage estimator $\hat{\beta}_{nj}^*$, namely $\Sigma_{jj}(\hat{\beta}_n^*)$, converges to the CRB as $n \longrightarrow \infty$ for all $j = 1, 2, \dots, p$. This means that
$$\frac{\mathrm{CRB}}{\Sigma_{jj}(\hat{\beta}_n^*)} \longrightarrow \frac{\Sigma_{jj}(\hat{\beta}_n^*)}{\Sigma_{jj}(\hat{\beta}_n^*)} = 1 \quad \text{as } n \longrightarrow \infty$$
for all $j = 1, 2, \dots, p$. Thus the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is asymptotically efficient.

Theorem 3 above shows that the James-Stein shrinkage estimator obtained by shrinking the MLE achieves the CRB asymptotically. This means that the shrinkage estimator is asymptotically efficient and hence stable when we have large sample size values. Since the initial estimator (MLE) is known to be asymptotically efficient, we see that the shrinking process has no effect on the asymptotic efficiency of the estimator we are shrinking.

3.4. Rate of Convergence



We now investigate the rate of convergence of the shrinkage estimator $\hat{\beta}_n^*$ (JSSE) by using concepts applied to the MLE, as discussed in Hoeffding (1948). To proceed we consider the shrinkage estimator of the form in (1), using the plug-in maximum likelihood estimators $\hat{\beta}_n$ and $\tilde{\beta}_n^o$. Let $\hat{\beta}_n^*$ be a James-Stein shrinkage estimator obtained when we shrink the MLE $\hat{\beta}_n = g(\hat{\theta}_n)$ defined earlier, for $p \geq 3$. We proceed to find the rate of convergence of this estimator by using its relationship with the MLE. Since the shrinkage target value may have no effect on the convergence rate, for easier transformation of our sequence $\theta_n$ we set $\tilde{\theta}_n^o = 0$, implying $\tilde{\beta}_n^o = 0$. Thus we have
$$\hat{\beta}_n^* = \hat{\beta}_n - \left(\frac{p-2}{n\,\hat{\beta}_n^\top V^{-1} \hat{\beta}_n}\right)_+ \hat{\beta}_n$$
which becomes
$$\hat{\beta}_n^* = \left(1 - \frac{p-2}{\hat{\beta}_n^\top V^{-1} \hat{\beta}_n}\right)_+ \hat{\beta}_n$$
when we factor out $\hat{\beta}_n$ and drop the $n$ in the denominator to have a form with a lower MSE according to the James-Stein shrinkage strategy. Let
$$k = \left(1 - \frac{p-2}{\hat{\beta}_n^\top V^{-1} \hat{\beta}_n}\right)_+, \qquad (41)$$
then
$$\hat{\beta}_n^* = k\,\hat{\beta}_n. \qquad (42)$$

Now consider the sequence
$$\hat{\beta}_{nj} = \beta_{oj} + O_p\left(\frac{1}{\sqrt{n}}\right) \qquad (43)$$
for $j = 1, 2, \dots, p$, where $\beta_{oj}$ is the "true" $j$th parameter value. From the equality in (42) we have
$$\hat{\beta}_n = \frac{1}{k}\,\hat{\beta}_n^* \qquad (44)$$
for the shrinkage value $k$. Therefore, substituting the right hand side of (44) in (43), the sequence $\hat{\beta}_{nj}$ becomes
$$\frac{1}{k}\,\hat{\beta}_{nj}^* = \beta_{oj} + O_p\left(\frac{1}{\sqrt{n}}\right),$$
hence we have the sequence
$$\hat{\beta}_{nj}^* = k\,\beta_{oj} + O_p\left(\frac{1}{\sqrt{n}}\right) k \qquad (45)$$
which is in terms of the shrinkage estimator with the shrinking effect value $k$ such that $0 < k \leq 1$. Analysing this sequence further shows that it satisfies the smoothness regularity conditions for the MLE, therefore we can proceed.
Let $\beta_{oj}^* = k\,\beta_{oj}$ be the true value in the shrinkage sense, which is obtained when we scale the true value $\beta_{oj}$ with the shrinkage factor $k$. Then the sequence (45) becomes
$$\hat{\beta}_{nj}^* = \beta_{oj}^* + O_p\left(\frac{1}{\sqrt{n}}\right) k \qquad (46)$$
implying that
$$\left(\hat{\beta}_{nj}^* - \beta_{oj}^*\right) = O_p\left(\frac{1}{\sqrt{n}}\right) k \qquad (47)$$
for all $j = 1, 2, \dots, p$. This means that $\big(\hat{\beta}_{nj}^* - \beta_{oj}^*\big)$ is still within the neighbourhood of $\frac{1}{\sqrt{n}}$ since $0 < k \leq 1$. Therefore, using the second-order Taylor's theorem we have
$$\ln \frac{\prod_{i=1}^n f_{\hat{\beta}_{nj}^*}(x_i)}{\prod_{i=1}^n f_{\beta_{oj}^*}(x_i)} = \left(\hat{\beta}_{nj}^* - \beta_{oj}^*\right)\sqrt{n}\,\sqrt{I_{jj}(\beta_o^*)}\, Z_j - \frac{n}{2}\left(\hat{\beta}_{nj}^* - \beta_{oj}^*\right)^2 I_{jj}(\beta_o^*) + O_p(1) \qquad (48)$$
for $j = 1, 2, \dots, p$. Since
$$\frac{\partial}{\partial \beta} \ln L(\hat{\beta}_n) = 0$$
for the maximum likelihood estimator $\hat{\beta}_n$, then also
$$\frac{\partial}{\partial \beta} \ln L(\hat{\beta}_n^*) = 0,$$
implying that
$$\frac{\partial}{\partial \beta_j} \ln L(\hat{\beta}_{nj}^*) = 0 \qquad (49)$$
for all $j = 1, 2, \dots, p$. Assuming that the log-likelihood function is differentiable, from (48) and (49) we have
$$\frac{\partial}{\partial \hat{\beta}_{nj}^*} \ln\left(\frac{\prod_{i=1}^n f_{\hat{\beta}_{nj}^*}(x_i)}{\prod_{i=1}^n f_{\beta_{oj}^*}(x_i)}\right) = \left(\hat{\beta}_{nj}^* - \beta_{oj}^*\right)^{1-1}\sqrt{n}\,\sqrt{I_{jj}(\beta_o^*)}\, Z_j - \frac{2n}{2}\left(\hat{\beta}_{nj}^* - \beta_{oj}^*\right)^{2-1} I_{jj}(\beta_o^*) + O_p(1) \qquad (50)$$
and, simplifying, (50) becomes
$$\frac{\partial}{\partial \hat{\beta}_{nj}^*} \ln\left(\frac{\prod_{i=1}^n f_{\hat{\beta}_{nj}^*}(x_i)}{\prod_{i=1}^n f_{\beta_{oj}^*}(x_i)}\right) = \sqrt{n}\,\sqrt{I_{jj}(\beta_o^*)}\, Z_j - n\left(\hat{\beta}_{nj}^* - \beta_{oj}^*\right) I_{jj}(\beta_o^*) + O_p(1) = 0,$$
implying
$$\sqrt{n}\,\sqrt{I_{jj}(\beta_o^*)}\, Z_j - n\left(\hat{\beta}_{nj}^* - \beta_{oj}^*\right) I_{jj}(\beta_o^*) + O_p(1) = 0. \qquad (51)$$
Rearranging (51) we have
$$n\, I_{jj}(\beta_o^*)\left(\hat{\beta}_{nj}^* - \beta_{oj}^*\right) + O_p(1) = \sqrt{n}\,\sqrt{I_{jj}(\beta_o^*)}\, Z_j \qquad (52)$$
for $j = 1, 2, \dots, p$, where $Z_j \sim (0, V_{\beta_j})$, $V_{\beta_j}$ being the variance of the $j$th element of the covariance matrix $V_\beta$ of the distribution $G^\top N_p(0, V)$, and thus $Z_j$ is the standard normal distribution for the $j$th element of $\hat{\beta}_n$. Now dividing the left and right hand sides of (52) by $\sqrt{n}\,\sqrt{I_{jj}(\beta_o^*)}$ we obtain
$$\sqrt{n}\,\sqrt{I_{jj}(\beta_o^*)}\left(\hat{\beta}_{nj}^* - \beta_{oj}^*\right) + O_p(1) = Z_j \qquad (53)$$
where $Z_j$ is the distribution of the $j$th element of $\hat{\beta}_n$ and $Z \sim N_p(0, V_\beta)$. Using sequence (45), equation (53) becomes
$$k\,\sqrt{n}\,\sqrt{I_{jj}(\beta_o^*)}\left(\hat{\beta}_{nj} - \beta_{oj}\right) + O_p(1) = Z_j \qquad (54)$$
for some $\beta_{oj}^* \longrightarrow \beta_{oj}$ and $j = 1, 2, \dots, p$, where $Z \sim N_p(0, V_\beta)$ and $V_\beta = G^\top V G$. The distribution $Z_j$ is a normal distribution for each $j$th element of the plug-in estimator $\hat{\beta}_n$, as described in the analysis above.
Thus equation (54) establishes the condition which implies local asymptotic normality (LAN) and differentiability in quadratic mean (DQM) for the estimator $\hat{\beta}_n^*$, which in turn implies that the rate of convergence is of order $\frac{1}{k\sqrt{n}}$ and rate $k\sqrt{n}$. This can also be obtained by using the fact that the risk of the James-Stein shrinkage estimator is bounded by that of the MLE and the latter converges at the rate $\sqrt{n}$. Hence the James-Stein shrinkage estimator is $k\sqrt{n}$-consistent.

3.5. Simulation Plots


In this section the behaviour of the mean squared error (MSE) of the maximum likelihood estimator $\hat{\theta}_n$ is compared to that of the James-Stein shrinkage estimator $\hat{\beta}_n^*$ as the sample size $n$ increases. The statistical package R is used to simulate plots of the MSE for different sample size values $n$, using the R package MASS to simulate data which follow a multivariate normal distribution. The data are generated using a $3 \times 3$ correlation matrix $(\rho)$ to get the covariance matrix $\Sigma$ given by
$$\Sigma = \begin{pmatrix} 1 & 0.3 & 0.1 \\ 0.3 & 1 & 0.2 \\ 0.1 & 0.2 & 1 \end{pmatrix}$$
which is symmetric, with variance 1 on the main diagonal representing a standard normal variance. Thus we take $p = 3$, which meets the James-Stein classical condition of $p \geq 3$. Since $X \sim N_3(\theta, \Sigma)$, we have
$$\hat{\theta}_n = \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \quad \text{for } p = 3 \qquad (55)$$

making the MLE $\hat{\theta}_n$ a $3 \times 1$ vector, which implies that the dimension of the shrinkage estimator is also $3 \times 1$. Now, knowing that the maximum likelihood estimator $\hat{\theta}_n$ is unbiased and the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is biased, the following expressions were used to calculate the mean squared error (MSE) of the two estimators. Using (55), we have
$$\mathrm{MSE}_\theta(\hat{\theta}_n) = \Sigma_n(\hat{\theta}_n) = \Sigma_n(\bar{X}_n) = \mathrm{Var}(\bar{X}_n) \qquad (56)$$
as the mean squared error of $\hat{\theta}_n$ for $p = 3$, since $b_\theta(\hat{\theta}_n) = 0$. Similarly, we have
$$\mathrm{MSE}_\theta(\hat{\beta}_n^*) = \Sigma_n(\hat{\beta}_n^*) + \left[b_\theta(\hat{\beta}_n^*)\right]^2 = \mathrm{Var}(\hat{\beta}_n^*) + \left[b_\theta(\hat{\beta}_n^*)\right]^2 \qquad (57)$$
for the James-Stein shrinkage estimator, which becomes
$$\mathrm{MSE}_\theta(\hat{\beta}_n^*) = \Sigma_n(k\hat{\theta}_n) + \left[b_\theta(k\hat{\theta}_n)\right]^2 = \mathrm{Var}(k\hat{\theta}_n) + \left[b_\theta(k\hat{\theta}_n)\right]^2 \qquad (58)$$
where $k$ is a shrinkage value which shrinks the maximum likelihood estimator $\hat{\theta}_n$ to a James-Stein shrinkage estimator $\hat{\beta}_n^*$ for $p = 3$. Thus the mean squared error of the shrinkage estimator $\hat{\beta}_n^*$ in (58) is obtained by using (56). The shrinkage value $k$ is evaluated using the expression
$$k = \left(1 - \frac{1}{\bar{X}_n^\top\, \big[\Sigma_n(\bar{X}_n)\big]^{-1}\, \bar{X}_n}\right) = \left(1 - \frac{1}{\bar{X}_n^\top\, \big[\mathrm{Var}(\bar{X}_n)\big]^{-1}\, \bar{X}_n}\right) \qquad (59)$$

for p = 3. The commands for all expressions and plots produced in R are provided in the appendix.
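A minimal sketch of the kind of simulation described in this section is given below; it assumes the MASS package and the covariance matrix $\Sigma$ above, and it computes the empirical MSE of the MLE $\bar{X}_n$ and of the shrunken estimator $k\bar{X}_n$, with the weight computed as in (3) with shrinkage target zero. The true mean vector and the number of replications are illustrative assumptions; the exact commands used for the figures are those given in the appendix.

# Sketch (assumptions noted in the text): empirical MSE of the MLE and of the
# James-Stein shrinkage estimator k * Xbar for X ~ N_3(theta, Sigma).
library(MASS)
set.seed(1)
Sigma <- matrix(c(1.0, 0.3, 0.1,
                  0.3, 1.0, 0.2,
                  0.1, 0.2, 1.0), nrow = 3, byrow = TRUE)
theta <- c(0.5, 0.5, 0.5)   # illustrative "true" mean vector (an assumption)
p <- 3
mse_compare <- function(n, reps = 2000) {
  err_mle <- err_js <- numeric(reps)
  for (r in seq_len(reps)) {
    X    <- mvrnorm(n, mu = theta, Sigma = Sigma)
    xbar <- colMeans(X)
    Dn   <- as.numeric(t(xbar) %*% solve(Sigma / n) %*% xbar)  # distance statistic
    k    <- max(1 - (p - 2) / Dn, 0)                           # shrinkage weight
    err_mle[r] <- sum((xbar - theta)^2)
    err_js[r]  <- sum((k * xbar - theta)^2)
  }
  c(MLE = mean(err_mle), JSSE = mean(err_js))
}
mse_compare(30); mse_compare(2000)   # compare the empirical MSE at two sample sizes

Plotting such empirical MSE values against a grid of sample sizes produces curves of the kind shown in the figures below.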
We present MSE plots obtained by simulating the mean squared error using the sample size values $n = 30$, $2000$ and $100000$. The MSE plots for both the James-Stein shrinkage estimator and the maximum likelihood estimator for each sample size value considered are plotted on the same graph for easy comparison of the MSE trends. We begin by considering a small sample size value of $n = 30$ to compare the way the MSE line plots change from one particular point to the other. Since we are interested in asymptotic behaviour, we increase the sample size value to $2000$ and then $100000$ to analyse the MSE trends and the rate at which the line plots become smooth.

Figure 4.1: MSE plots for the MLE and JSSE for n = 30

Figure 4.2: MSE plots for the JSSE and MLE for n = 2000

Figure 4.3: MSE plots for the JSSE and MLE for n = 10000
Collectively the scaled plots show that there is some reduction in the mean squared error of the James-Stein shrinkage estimator compared to that of the initial estimator (MLE). The trend in mean squared error for both the maximum likelihood estimator and the James-Stein shrinkage estimator shows that, as the sample size value $n$ increases, the MSE values converge to some value. The MSE plots suggest that the James-Stein shrinkage estimator converges to a lower MSE value of about 0.9, compared to the maximum likelihood estimator, which converges to an MSE value of about 1.0. They also show that the James-Stein shrinkage estimator converges faster than the MLE, though the difference is minimal.

4. Conclusions and Suggestions



In the study we explored the asymptotic properties of the James-Stein shrinkage estimator β̂ n which
is obtained by shrinking a MLE θ̂ n . Asymptotic consistency and efficiency of the shrinkage estima-

tor β̂ n were investigated. From the regularity conditions, the MLE is known to be unbiased, consis-
tent and efficient as the sample size value n −→ ∞. Therefore, the study analysed these asymptotic

properties by checking whether the new (shrinkage) estimator obtained after shrinking possesses them.
Thus the study examined whether the shrinking process has an effect on these properties. The re-

sults show that the James-Stein shrinkage estimator β̂ n is asymptotically consistent and efficient.
The study also showed that the shrinkage estimator (JSSE) is asymptotically biased, a property it
possesses even for small values of the sample size $n$. The bias is introduced by the shrinking factor $k$
given in equation (59). We therefore see that the shrinking process introduces bias to the estimators
obtained but it preserves asymptotic consistency and efficiency, and more importantly it reduces the
MSE.
Thus the James-Stein shrinkage estimator obtained by shrinking techniques proves to be useful
though it is biased. This estimator is more effective than the maximum likelihood estimator as shown
in this study and by (Hansen, 2016). The study also showed that the JSSE is stable for large values
of the sample size value n, making it suitable in practical applications since large sample size values
are normally considered for effective estimation. Since error is always present in estimation, we regard shrinking (minimising error) as a very important technique for yielding effective estimators.
The study has investigated the asymptotic behaviour of the James-Stein shrinkage estimator.
Asymptotic properties analysed in the study include rate of convergence, consistency and efficiency.
The results show that the James-Stein shrinkage estimator has a lower mean squared error compared
to the maximum likelihood estimator though it is biased. The results further show that the JSSE is
asymptotically consistent and efficient.

References
Baranchik, A. J. (1964). Multiple regression and estimation of the mean of a multivariate normal distribution. Technical Report 51, Department of Statistics, Stanford University.
Berger, J. O. (1976). Minimax estimation of a multivariate normal mean with arbitrary quadratic loss. Journal of Multivariate Analysis, 6, 256–264.
Carter, R. L. and Ullah, A. (1984). The sampling distribution of estimators and their F-ratios in regression models. Journal of Econometrics, 25, 109–122.
Efron, B. (1975). Biased versus unbiased estimation. In Advances in Mathematics. Academic Press, New York.
George, E. I. (1986). Minimax multiple shrinkage estimation. Annals of Statistics, 14, 188–205.
Geyer, C. J. (1994). On the asymptotics of constrained M-estimation. Annals of Statistics, 22, 1993–2010.
Hansen, B. E. (2008). Generalised shrinkage estimators. www.ssc.wisc.edu/~bhansen.
Hansen, B. E. (2016). Efficient shrinkage in parametric models. Journal of Econometrics, 190, 188–205.
Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics, 19, 293–325.
Hodges, J. L. and Lehmann, E. L. (1975). Some applications of the Cramér-Rao inequality. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, pp. 13–22.
James, W. and Stein, C. (1961). Estimation with quadratic loss. In Berkeley Symposium on Mathematical Statistics and Probability. Stanford University.
Newey, W. K. and McFadden, D. L. (1994). Large sample estimation and hypothesis testing. In Handbook of Econometrics, Volume 4. Elsevier, pp. 2113–2245.
Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Third Berkeley Symposium on Mathematical Statistics and Probability, 1. Stanford University, pp. 197–206.
Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Annals of Statistics, 9(6), 1135–1151.
Stone, C. J. (1974). Asymptotic properties of estimators of a location parameter. The Annals of Statistics, 6(2), 1127–1137.
