Key words: Asymptotic, Consistency, Convergence, Efficiency, Mean Squared Error, Shrinkage.
Abstract: The study explores the asymptotic consistency of the James-Stein shrinkage estimator obtained by shrinking a maximum likelihood estimator. We use Hansen's approach to show that the James-Stein shrinkage estimator converges asymptotically to a multivariate normal distribution with shrinkage effect values. We establish that the rate of convergence is of order $\frac{1}{k\sqrt{n}}$ and the rate is $k\sqrt{n}$, hence the James-Stein shrinkage estimator is $k\sqrt{n}$-consistent. We then visualise its consistency by studying the asymptotic behaviour, using simulation plots in R of the mean squared error of the maximum likelihood estimator and of the shrinkage estimator. The latter graphically shows a lower mean squared error than that of the maximum likelihood estimator.
1. Introduction
A shrinkage estimator is an estimator that, either explicitly or implicitly, incorporates the effects of shrinkage. In loose terms this means that a naive or raw estimate is improved by combining it with other information. The term relates to the notion that the supplied information moves the improved estimate closer to the “true value” than the raw estimate. Shrinkage estimation is a technique used in inferential statistics to reduce the mean squared error (MSE) of a given estimator. The idea of shrinking an estimator originated in 1956, when Stein (1956) established that we can reduce the MSE of an estimator if we give up a little on bias. This means that, given an estimator, we can shrink it to obtain another estimator with lower MSE, and the efficiency of the new estimator is desirable in the way it estimates the “true” parameter value. This works well when the number of parameters is greater than two ($p \geq 3$), the classical “James-Stein condition”. When we shrink a maximum likelihood estimator (MLE) under the “James-Stein condition”, we obtain a new shrinkage estimator which is closer to the assumed true value than the MLE. The magnitude of the improvement
depends on the distance between the “true” parameter value and the parametric restriction which yields a shrinkage target, denoted by $\tilde{\theta}_n^o$. With all these modifications and restrictions needed to achieve this estimator, we ask ourselves whether this desirable shrinkage estimator is asymptotically consistent and efficient.
The literature on shrinkage estimators is vast, so we mention only a few of the contributions most relevant to our study. Stein, together with his student James (James and Stein, 1961), used shrinking techniques to construct an estimator, now called the James-Stein shrinkage estimator (JSSE), which has lower squared risk loss than the MLE. Baranchik (1964) showed that the positive-part James-Stein shrinkage estimator has lower risk than the ordinary JSSE. Berger (1976) discussed selecting a minimax estimator of a multivariate normal mean by considering different James-Stein type estimators. Stein (1981) used shrinking techniques to estimate the mean of a multivariate normal distribution. Carter and Ullah (1984) constructed the sampling distribution and F-ratios for a James-Stein shrinkage estimator obtained by shrinking an ordinary least squares (OLS) estimator in regression models. George (1986) proposed a new minimax multiple shrinkage estimator that allows multiple specifications for selecting a set of targets towards which to shrink a given estimator. Geyer (1994) studied the asymptotics of constrained M-estimators, which also fall in the class of shrinkage estimators. Then Hansen (2008) constructed a generalised James-Stein shrinkage estimator obtained by shrinking an MLE, and Hansen (2016) derived its asymptotic distribution and showed that we can shrink towards a sub-parameter space.
We establish the convergence in probability of the shrinking factor in Section 3.1. We then show the consistency of the shrinkage estimator in Theorem 1. In Section 3.2 we evaluate the ADBs of the estimators in play. We then show that the James-Stein shrinkage estimator is asymptotically efficient in Section 3.3 and establish the rate of convergence in Section 3.4. In Section 3.5 we present MSE plots comparing the JSSE and the MLE, and in Section 4 we give a discussion and analysis of the whole study. We then conclude the study by stating the main results.
The following definitions are used to establish the asymptotic consistency and efficiency of the James-Stein shrinkage estimator $\hat{\beta}_n^*$.
Definition 1
An estimator $T_n = h(X_1, \ldots, X_n)$ is said to be consistent for $\theta_n$ if it converges in probability to $\theta_n$. That is, if for all $\varepsilon > 0$
$$\lim_{n \to \infty} \Pr\left(|T_n - \theta_n| < \varepsilon\right) = 1$$
or
$$\lim_{n \to \infty} \Pr\left(|T_n - \theta_n| > \varepsilon\right) = 0.$$
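To make Definition 1 concrete, the following minimal R sketch (our own illustration, not from the paper) estimates $\Pr(|T_n - \theta_n| < \varepsilon)$ for the sample mean of iid $N(\theta, 1)$ data and shows it approaching 1 as $n$ grows; the values of $\theta$ and $\varepsilon$ are arbitrary choices.

```r
## Consistency of the sample mean, a minimal sketch of Definition 1.
set.seed(1)
theta <- 2; eps <- 0.1
for (n in c(10, 100, 1000, 10000)) {
  # 5000 replications of T_n = mean of n iid N(theta, 1) observations
  Tn <- replicate(5000, mean(rnorm(n, mean = theta, sd = 1)))
  cat("n =", n, " Pr(|T_n - theta| < eps) ~", mean(abs(Tn - theta) < eps), "\n")
}
```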
Definition 2
Let $X_1, \ldots, X_n$ be independent and identically distributed (iid) according to a probability density $f_\theta(X)$ satisfying suitable regularity conditions. Suppose that $T_n = h(X_1, \ldots, X_n)$ is asymptotically normal, say $\sqrt{n}(T_n - \theta_n) \to_d N_p(0, \Sigma_n)$ for a positive definite matrix $\Sigma_n$, where $T_n$ is estimating $\theta_n$. Then a sequence of estimators $\{T_n\} = \{h(X_1, \ldots, X_n)\}$ satisfying
the whole parameter space $\Omega$. We note that the matrix $G$ is used to increase the dimension of the RMLE, since the RMLE is $m$-dimensional. Therefore we have a plug-in restricted maximum likelihood estimator $g(\tilde{\theta}_n^o) = \tilde{\beta}_n^o$. The matrix $G$ harmonises the dimension due to shrinkage with the actual dimension $p$ of the parameters of interest. The plug-in unrestricted MLE $\hat{\theta}_n$ in the shrinkage sense is denoted by $\hat{\beta}_n$. With all parameters set, we present the generalised James-Stein shrinkage estimator $\hat{\beta}_n^*$ in the next section.
where $(a)_+$ is the positive trimming function and $p \geq 3$. The shrinkage estimator in (1) can be expressed as a weighted average by letting
$$D_n = n\left(\hat{\beta}_n - \tilde{\beta}_n^o\right)^\top \Sigma^{-1}\left(\hat{\beta}_n - \tilde{\beta}_n^o\right) \qquad (2)$$
be a distance statistic, which is the same as the loss function $n\ell\,(\hat{\beta}_n, \tilde{\beta}_n^o)$, where $\Sigma$ is a covariance matrix, and
$$\hat{w} = \left(1 - \frac{\tau}{D_n}\right)_+ \qquad (3)$$
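As a hedged illustration of (2) and (3), the R sketch below computes the distance statistic $D_n$ and the weight $\hat{w}$; the inputs beta_hat, beta_tilde, Sigma, n and tau are assumed to be available, with $\tau = p - 2$ in the James-Stein case.

```r
## Sketch of the distance statistic (2) and weight (3); inputs are assumed.
shrink_weight <- function(beta_hat, beta_tilde, Sigma, n, tau) {
  d  <- beta_hat - beta_tilde
  Dn <- n * drop(t(d) %*% solve(Sigma) %*% d)  # distance statistic (2)
  w  <- max(1 - tau / Dn, 0)                   # positive-part trimming (3)
  list(Dn = Dn, w = w)
}
```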
is the assumed true parameter value and $h \in \mathbb{R}^p$ is a constant providing a neighbourhood for the true parameter value $\theta_o$. From the normality of the MLE we have
$$\sqrt{n}\left(\hat{\theta}_n - \theta_n\right) \to_d Z \sim N_p(0, \Sigma) \qquad (5)$$
as $n \to \infty$. Using (5), Hansen (2016) obtained the asymptotic distribution of the restricted maximum likelihood estimator as
$$\sqrt{n}\left(\tilde{\theta}_n^o - \theta_n\right) \to_d Z - \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top (Z + h) \qquad (6)$$
which has shrinkage effect value $k = \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top$. As a consequence of the convergence in (5) and (6) we have
$$\sqrt{n}\left(\hat{\beta}_n - \tilde{\beta}_n^o\right) \to_d G^\top \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top (Z + h) \qquad (7)$$
which is the asymptotic distribution the MLE converges to when it is estimating the RMLE, where $\beta_n = g(\theta_n)$. The distance statistic $D_n$ in equation (2) converges to a non-central chi-squared distribution, as described by Hansen (2016):
$$D_n = n\ell\,(\hat{\beta}_n, \tilde{\beta}_n^o) \to_d (Z + h)^\top B (Z + h) = \xi \sim \chi_p^2\left(h^\top B h\right) \qquad (8)$$
where $B = A\left(A^\top \Sigma A\right)^{-1} A^\top \Sigma G \Sigma^{-1} G^\top \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top$. Using (2), Hansen (2016) showed that
$$\hat{w} \to_d w(Z) = \left(1 - \frac{p-2}{\xi}\right)_+ \qquad (9)$$
and
$$\sqrt{n}\left(\hat{\beta}_n^* - \beta_n\right) \to_d w(Z)\, G^\top Z + \left(1 - w(Z)\right)\left[G^\top Z - G^\top \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top (Z + h)\right] \qquad (10)$$
which is normally distributed with some shrinkage effect value. With the asymptotic distribution of the shrinkage estimator in place, we now present the main results.
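The following R sketch (our own construction) draws from the limiting law in (10) under simplifying assumptions not in the paper: $\Sigma = I$, $G = I$ and a single linear restriction, for which the matrix $B$ in (8) reduces to the projection $P = A(A^\top A)^{-1}A^\top$.

```r
## Draws from the limit law (10) under assumed Sigma = I, G = I.
set.seed(1)
p <- 4
A <- matrix(c(1, rep(0, p - 1)), p, 1)        # assumed restriction gradient
h <- c(2, rep(0, p - 1))                      # assumed local parameter
P <- A %*% solve(t(A) %*% A) %*% t(A)         # Sigma A (A' Sigma A)^{-1} A' with Sigma = I
draws <- replicate(50000, {
  Z  <- rnorm(p)                              # Z ~ N_p(0, I), as in (5)
  xi <- drop(t(Z + h) %*% P %*% (Z + h))      # quadratic form in (8); B = P here
  w  <- max(1 - (p - 2) / xi, 0)              # weight w(Z) from (9)
  drop(w * Z + (1 - w) * (Z - P %*% (Z + h))) # limit vector in (10) with G = I
})
rowMeans(draws)  # average draw shows the shrinkage pull; cf. the ADBs in Section 3.2
```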
3. Main Results
The main results are presented in three sections. In the first section we show that the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is asymptotically consistent. In the second section we evaluate the asymptotic distributional bias of the three estimators in play. Then in the last section we show that the shrinkage estimator is asymptotically efficient by showing that its variance achieves the Cramér-Rao bound.
Lemma 1
From equation (8) we have
$$\hat{w} \to_d w(Z) = \left(1 - \frac{\tau}{\xi}\right)_+ \qquad (11)$$
where $\xi = (Z + h)^\top B (Z + h) \sim \chi_p^2(h^\top B h)$ is a non-central chi-squared random variable with non-centrality parameter $h^\top B h$, $\tau = p - 2$ and $p \geq 3$. Along the sequences $\theta_n$, if $h \to \infty$ then
$$w(Z) \to_p 1 \qquad (12)$$
while if $h$ is fixed then
$$w(Z) \to_p 0 \quad \text{or} \quad w(Z) \to_p r \qquad (13)$$
where $r$ is a constant such that $0 < r < 1$ and $(a)_+$ in (11) is the positive trimming function, which keeps what is in the brackets greater than or equal to zero.
Proof.
We begin by considering the first case, when $h$ diverges to infinity. Suppose that $h \to \infty$; then
$$(Z + h) \to \infty \text{ as } n \to \infty, \qquad (14)$$
so that $\xi = (Z + h)^\top B (Z + h) \to_p \infty$ and $\tau/\xi \to_p 0$, giving
$$w(Z) \to_p 1 \text{ as } n \to \infty. \qquad (16)$$
Now suppose $h$ is fixed, so that $\xi$ remains bounded in probability; then either $w(Z) \to_p 0$ or $w(Z) \to_p r$. If $\tau/D_n > 1$ then $1 - \tau/D_n$ is negative, and by definition of the positive trimming function we end up with zero. This will vary as $p$ changes, but, still considering $\xi \sim \chi_p^2(h^\top B h)$, the probability of $\xi$ depends on the degrees of freedom $p$ and will vary according to the chi-squared distribution, implying that the ratio $\tau/D_n = M \geq 1$ as $n \to \infty$. Therefore we have
$$w(Z) \to_p \left(1 - \frac{\tau}{D_n}\right)_+ = (1 - M)_+ = 0 \text{ as } n \to \infty. \qquad (18)$$
Otherwise, if the ratio $\tau/D_n$ is such that $0 < \tau/D_n < 1$ as $n \to \infty$, we have
$$w(Z) \to_p r$$
where $0 < r < 1$.
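The behaviour established in Lemma 1 can be checked by a small Monte Carlo sketch (ours, not the paper's): with $\xi \sim \chi_p^2(h^\top B h)$, the weight $(1 - (p-2)/\xi)_+$ tends to 1 as the non-centrality, driven by $h$, grows, and stays strictly below 1 otherwise.

```r
## Monte Carlo check of Lemma 1: mean of w(Z) = (1 - (p-2)/xi)_+ versus h'Bh.
set.seed(1)
p <- 5
for (ncp in c(0, 10, 100, 10000)) {   # ncp plays the role of h' B h
  xi <- rchisq(1e5, df = p, ncp = ncp)
  w  <- pmax(1 - (p - 2) / xi, 0)     # positive-part trimming, eq. (11)
  cat("h'Bh =", ncp, " E[w(Z)] ~", round(mean(w), 4), "\n")
}
```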
Lemma 1 above establishes the convergence of the weight $\hat{w}$, which determines the shrinkage. In this case we realise that the same weight determines the convergence in distribution and in probability of the shrinkage estimator. From the regularity conditions we know that the MLE $\hat{\theta}_n$ is asymptotically consistent, and this consistency extends to the RMLE $\tilde{\theta}_n^o$. With this fact in mind, we now present the main result, which shows the consistency of the shrinkage estimator $\hat{\beta}_n^*$.
Theorem 1
Let $\theta \in \Omega$, where $\Omega$ is a parameter space with elements in $\mathbb{R}^p$. Suppose we have a James-Stein shrinkage estimator $\hat{\beta}_n^*$ obtained by shrinking the maximum likelihood estimator $\hat{\theta}_n$ of $\theta \in \Omega$, where the shrinkage target $\tilde{\theta}_n^o$ is the restricted maximum likelihood estimator of $\theta \in \Omega_o$, a sub-parameter space partitioned from $\Omega$ by the restriction described in Section 2.1. Then the JSSE is given by
$$\hat{\beta}_n^* = \hat{\beta}_n - \left(\frac{p-2}{n(\hat{\beta}_n - \tilde{\beta}_n^o)^\top \Sigma^{-1} (\hat{\beta}_n - \tilde{\beta}_n^o)}\right)_+ \left(\hat{\beta}_n - \tilde{\beta}_n^o\right)$$
where $\hat{\beta}_n = g(\hat{\theta}_n)$, $\tilde{\beta}_n^o = g(\tilde{\theta}_n^o)$, $p \geq 3$ and $(x)_+$ is the positive trimming function. If $\hat{\theta}_n$ is consistent for $\theta_n$ as $n \to \infty$, then the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is also consistent for $\theta_n$ as $n \to \infty$, where the sequence $\theta_n$ is as defined in Section 2.3.
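A hedged R sketch of the estimator in Theorem 1 is given below, written in the equivalent weighted-average form $\hat{w}\hat{\beta}_n + (1 - \hat{w})\tilde{\beta}_n^o$ with the weight (3); the plug-in MLE, the shrinkage target and the covariance matrix $\Sigma$ are assumed inputs.

```r
## Sketch of the JSSE of Theorem 1 via the weighted-average form.
jsse <- function(beta_hat, beta_tilde, Sigma, n) {
  p  <- length(beta_hat)
  stopifnot(p >= 3)                            # James-Stein condition
  d  <- beta_hat - beta_tilde
  Dn <- n * drop(t(d) %*% solve(Sigma) %*% d)  # distance statistic (2)
  w  <- max(1 - (p - 2) / Dn, 0)               # weight (3) with tau = p - 2
  w * beta_hat + (1 - w) * beta_tilde          # shrink towards the target
}
```

When $D_n$ is large the weight is close to 1 and the JSSE stays near the MLE; when $D_n$ is small the estimator is pulled towards the restricted target.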
Proof.
Let $\beta_n = \theta_n$. To show that $\hat{\beta}_n^*$ is consistent for $\theta_n$ as $n \to \infty$, we consider the value of $h$ which determines the neighbourhood of the sequence $\theta_n$: when it diverges to infinity and when it is fixed. Suppose first that $h$ diverges to infinity. To evaluate
$$\sqrt{n}\left(\hat{\beta}_n^* - \beta_n\right) \to_d w(Z)\, G^\top Z + \left(1 - w(Z)\right) G^\top\left[Z - \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top (Z + h)\right] \qquad (19)$$
we use Lemma 1, which gives
$$w(Z) \to_p 1 \text{ as } n \to \infty, \qquad (20)$$
so that
$$\sqrt{n}\left(\hat{\beta}_n^* - \beta_n\right) \to_d N_p(0, \Sigma_\beta). \qquad (21)$$
It follows that
$$\lim_{n \to \infty} P\left(\left|\hat{\beta}_n^* - \beta_n\right| > \varepsilon\right) = 0$$
for any $\varepsilon > 0$. Hence $\hat{\beta}_n^*$ is consistent for $\beta_n = \theta_n$.
Secondly, suppose $h$ is fixed at some finite value. Then the sequence
$$\theta_n = \theta_o + n^{-\frac{1}{2}} h$$
becomes $\theta_n = \theta_o$ as $n \to \infty$. From this equality we have $\hat{\theta}_n = \hat{\theta}_o$, and two conditions arise. The first is that the sequence $\theta_n$ lies within the restricted parameter space $\Omega_o$ with $\theta_o \in \Omega_o$. From the restriction $\{\theta \in \Omega : a(\theta) = 0\}$ defining $\Omega_o$, this means that the shrinkage target is exactly at the true value and our consideration is restricted to a single parameter space. Therefore we have $\hat{\theta}_n = \tilde{\theta}_n^o$, but from (6)
$$\sqrt{n}\left(\tilde{\theta}_n^o - \theta_n\right) \to_d \zeta = Z - \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top (Z + h)$$
which is the same as the asymptotic distribution of $\hat{\theta}_n$, since we only consider the sub-parameter space $\Omega_o$ and the shrinkage value $\Sigma A(A^\top \Sigma A)^{-1} A^\top (Z + h)$ affects it. Thus
$$\sqrt{n}\left(\hat{\theta}_n - \theta_n\right) \to_d \zeta = Z - \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top (Z + h) \text{ as } n \to \infty \qquad (22)$$
because we are estimating $\theta \in \Omega_o$ and, from Section 2.1, there is no difference between the MLE and the RMLE. Due to this equality of the two maximum likelihood estimators, from (4)
$$\hat{\beta}_n^* = \hat{w}\hat{\beta}_n + (1 - \hat{w})\tilde{\beta}_n^o$$
as $n \to \infty$, which becomes
$$\sqrt{n}\left(\hat{\beta}_n^* - \beta_n\right) \to_d w(Z)\, G^\top\left[Z - \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top (Z + h)\right] + \left(1 - w(Z)\right) G^\top\left[Z - \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top (Z + h)\right] = G^\top\left[Z - \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top (Z + h)\right] \qquad (23)$$
as $n \to \infty$, which is the same as the asymptotic distribution of $\tilde{\beta}_n^o = g(\tilde{\theta}_n^o)$. Therefore, using the consistency of the RMLE and (23), the consistency of the James-Stein shrinkage estimator $\hat{\beta}_n^*$ follows from the consistency of $\tilde{\theta}_n^o$.
Lastly, we consider the case when we have two well-defined parameter spaces, $\Omega_o$ and $\Omega - \Omega_o$. Then we have $\hat{\theta}_n \neq \tilde{\theta}_n^o$. Analysing (19) further, we consider the shrinkage effect value $\Sigma A(A^\top \Sigma A)^{-1} A^\top$, which is not affected by the sample size $n$ but is affected by an increase or decrease in the number of parameters $p$. Since
$$Z \sim N_p(0, \Sigma)$$
then
$$Z + h \sim N_p(h, \Sigma) \qquad (24)$$
for the shrinkage effect value matrix $\eta = \Sigma A(A^\top \Sigma A)^{-1} A^\top$. Evaluating this asymptotic distribution as $n \to \infty$, the consistency of $\hat{\beta}_n^*$ follows from the consistency of $\tilde{\beta}_n^o = g(\tilde{\theta}_n^o)$, which is consistent since $\tilde{\theta}_n^o$ is consistent. Similarly, if $w(Z) \to r \in (0, 1)$, the consistency of the James-Stein shrinkage estimator $\hat{\beta}_n^*$ follows from the consistency of the restricted maximum likelihood estimator and from the fact that $\hat{\theta}_n$ is consistent for $\theta_n$. Thus the shrinkage estimator $\hat{\beta}_n^*$ is asymptotically consistent for $\theta_n$.
In Theorem 1 we first consider the case when the sequence $\theta_n$ has a neighbourhood which is not restricted, by letting $h$ diverge to infinity. When this is the case, the entire parameter space becomes of interest, and for $h \to \infty$ we obtain $\xi \to_p \infty$ and $\hat{w} \to_p 1$ as $n \to \infty$. Hence there is no difference in how the parameters in $\Omega_o$ and $\Omega$ are asymptotically distributed. As a result, the asymptotic distribution of the James-Stein shrinkage estimator is the same as that of the initial maximum likelihood estimator under this condition. Therefore the consistency of the James-Stein shrinkage estimator follows from the consistency of the maximum likelihood estimator.
In the second case we take $h$ as a fixed finite value. In this case the two parameter spaces are well defined and distinct in terms of where the parameters of interest are located. When $n \to \infty$, then $\theta_n = \theta_o$ since $n^{-\frac{1}{2}} h \to 0$. Thus, when we are within the restricted sub-parameter space $\Omega_o$, the maximum likelihood estimator and the restricted maximum likelihood estimator have the same asymptotic distribution. The consequence of the two maximum likelihood estimators (MLE and RMLE) being distributed the same is that the James-Stein shrinkage estimator has the same asymptotic distribution as the MLE and the RMLE. Furthermore, we have $\sqrt{n}\,(\hat{\beta}_n - \tilde{\beta}_n^o) \to_p 0$ as $n \to \infty$. Stone (1974) obtained similar results, though for invariant estimators; in our case the two maximum likelihood estimators need not be invariant. Therefore the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is asymptotically consistent for $\theta_n$.
In the next section we investigate the asymptotic distributional bias of $\hat{\beta}_n^*$. The results in this section are used in showing the asymptotic efficiency of the shrinkage estimator $\hat{\beta}_n^*$.
where the estimator $T_n$ is estimating $\theta_n$. We present the asymptotic distributional bias for $\hat{\theta}_n$, $\tilde{\theta}_n^o$ and $\hat{\beta}_n^*$ in the theorem below.
Theorem 2
Suppose that the regularity assumptions for the MLE and the RMLE hold. Then under $\{P_n\}$, a sequence of parameter values indexed by the sample size $n$, and for $p \geq 3$, the ADBs of the estimators $\hat{\theta}_n$, $\tilde{\theta}_n^o$ and $\hat{\beta}_n^*$ are respectively
1. $\mathrm{ADB}(\hat{\theta}_n) = 0$
2. $\mathrm{ADB}(\tilde{\theta}_n^o) = -\Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h$
3. $\mathrm{ADB}(\hat{\beta}_n^*) = -\vartheta\, G^\top \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h$
where $\vartheta = E_\theta\left[\frac{p-2}{\xi}\right]$.
Proof.
1.
$$\mathrm{ADB}(\hat{\theta}_n) = \lim_{n \to \infty} E_\theta\left[\sqrt{n}\left(\hat{\theta}_n - \theta_n\right)\right] = \lim_{n \to \infty} 0 = 0 \qquad (28)$$
$$\therefore \mathrm{ADB}(\hat{\theta}_n) = 0$$
since $\sqrt{n}(\hat{\theta}_n - \theta_n) \to_d Z \sim N_p(0, \Sigma)$ as $n \to \infty$.
2.
$$\mathrm{ADB}(\tilde{\theta}_n^o) = \lim_{n \to \infty} E_\theta\left[\sqrt{n}\left(\tilde{\theta}_n^o - \theta_n\right)\right] = \lim_{n \to \infty} -\Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h = -\Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h \qquad (29)$$
$$\therefore \mathrm{ADB}(\tilde{\theta}_n^o) = -\Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h$$
from equation (6).
3. $\mathrm{ADB}(\hat{\beta}_n^*) = \lim_{n \to \infty} E_\theta\left[\sqrt{n}\left(\hat{\beta}_n^* - \beta_n\right)\right]$. From equation (19) of Theorem 1 we have
$$\sqrt{n}\left(\hat{\beta}_n^* - \beta_n\right) \to_d w(Z)\, G^\top Z + \left(1 - w(Z)\right) G^\top\left[Z - \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top (Z + h)\right]$$
where
$$w(Z) = \left(1 - \frac{p-2}{\xi}\right)_+.$$
Therefore,
$$E_\theta\left[w(Z)\right] = E_\theta\left[1 - \frac{p-2}{\xi}\right] = 1 - E_\theta\left[\frac{p-2}{\xi}\right] = 1 - \vartheta$$
where $\vartheta = E_\theta\left[\frac{p-2}{\xi}\right]$, $p \geq 3$, and $E_\theta(Z) = 0$ as $n \to \infty$ since $Z \sim N_p(0, \Sigma)$. Then
$$\mathrm{ADB}(\hat{\beta}_n^*) = \lim_{n \to \infty}\left[(1 - \vartheta)\, 0 + (1 - 1 + \vartheta)\left(-G^\top \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h\right)\right] = \lim_{n \to \infty} -\vartheta\, G^\top \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h = -\vartheta\, G^\top \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h \qquad (30)$$
$$\therefore \mathrm{ADB}(\hat{\beta}_n^*) = -\vartheta\, G^\top \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h$$
where $\vartheta = E_\theta\left[\frac{p-2}{\xi}\right]$ for $p \geq 3$.
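The constant $\vartheta = E_\theta\left[\frac{p-2}{\xi}\right]$ in part 3 has no simple closed form in general, but it is easy to approximate by Monte Carlo; the sketch below (ours) uses assumed illustrative values of $p$ and of the non-centrality $h^\top B h$.

```r
## Monte Carlo approximation of vartheta = E[(p-2)/xi], xi ~ chi^2_p(h'Bh).
set.seed(1)
p <- 5; ncp <- 4                               # assumed illustrative values
vartheta <- mean((p - 2) / rchisq(1e6, df = p, ncp = ncp))
round(vartheta, 4)                             # plugs into ADB(beta*_n) in (30)
```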
Remark 1
When the fixed constant $h = 0$, the asymptotic distributional bias values of the three estimators are all zero. Therefore we take $h \neq 0$.
From equation (28) of Theorem 2, the maximum likelihood estimator is asymptotically unbiased. Equations (29) and (30) show that the restricted maximum likelihood estimator and the James-Stein shrinkage estimator are both asymptotically biased. This means that both shrinking and partitioning of a parameter space introduce bias into the estimators.
Using the bias of the shrinkage estimator obtained above, we now analyse whether the James-Stein estimator $\hat{\beta}_n^*$ is asymptotically efficient.
Theorem 3
Let $\hat{\beta}_n^*$ be a James-Stein shrinkage estimator obtained by shrinking a maximum likelihood estimator $\hat{\theta}_n$, where the two estimators are as defined in Section 2.2. Given the asymptotic bias $b_\theta(\hat{\beta}_n^*)$ of the JSSE $\hat{\beta}_n^*$, the Cramér-Rao bound for $\hat{\beta}_{nj}^*$ is given by
$$\mathrm{CRB} = \frac{\left[1 + b_{\theta_j}'(\hat{\beta}_n^*)\right]^2}{J_{jj}(\theta)} \quad \text{for } j = 1, 2, \ldots, p \qquad (31)$$
where $J(\theta)$ is the Fisher information and $b_{\theta_j}'$ is the derivative of the $j$th element of the bias vector. Then
$$\frac{\mathrm{CRB}}{\Sigma_{jj}(\hat{\beta}_n^*)} = 1 \text{ as } n \to \infty,$$
and thus the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is asymptotically efficient for all $j = 1, 2, \ldots, p$.
Proof.
We analyse asymptotic efficiency by evaluating the Cramér-Rao bound as $n \to \infty$. Consider the bias of the estimator $\hat{\beta}_n^*$ from part 3 of Theorem 2,
$$b_\theta(\hat{\beta}_n^*) = -\vartheta\, G^\top \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h \qquad (32)$$
where $\vartheta = E\left[\frac{p-2}{\xi}\right]$ for $p \geq 3$. The expectation $E\left[\frac{p-2}{\xi}\right]$ of the fraction $\frac{p-2}{\xi}$, whose distribution is determined by $\xi \sim \chi_p^2(h^\top B h)$, has a value (constant) free of the parameter $\theta$. Therefore we regard it as a constant. Let $\alpha = -\vartheta$; then (32) becomes
$$b_\theta(\hat{\beta}_n^*) = \alpha\, G^\top \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h \qquad (33)$$
and $b_\theta'(\hat{\beta}_n^*)$ will be
$$b_\theta'(\hat{\beta}_n^*) = \frac{\partial}{\partial \theta}\, b_\theta(\hat{\beta}_n^*) = \alpha\, \frac{\partial}{\partial \theta}\left[G^\top \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h\right] \qquad (34)$$
a matrix of dimension $p$. Using the definition of the CRB and then combining (32) and (34), we obtain
$$\frac{\left[1 + b_{\theta_j}'(\hat{\beta}_n^*)\right]^2}{J_{jj}(\theta)} = \frac{\left[1 + \alpha\, \frac{\partial}{\partial \theta_j}\, G^\top \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h\right]^2}{J_{jj}(\theta)} \qquad (35)$$
for $j = 1, 2, \ldots, p$, where $\frac{\partial}{\partial \theta_j}$ is the partial derivative with respect to the $j$th element,
$$\Sigma = \Sigma(\theta) = \left[-\sum_{i=1}^{n} \frac{\partial^2}{\partial \theta\, \partial \theta^\top} \log f_\theta(X_i)\right]^{-1}, \quad A = A(\theta_o) = \frac{\partial}{\partial \theta}\, a(\theta)^\top \quad \text{and} \quad J = J(\theta) = E\left[\sum_{i=1}^{n} -\frac{\partial^2}{\partial \theta\, \partial \theta^\top} \log f_\theta(X_i)\right].$$
We begin our analysis of the bound by considering the terms involved. The matrix
$$A = \frac{\partial}{\partial \theta}\, a(\theta)^\top$$
remains the same as $n \to \infty$. We have
$$\Sigma(\theta) = \left[-\sum_{i=1}^{n} \frac{\partial^2}{\partial \theta\, \partial \theta^\top} \log f_\theta(X_i)\right]^{-1} \to \Sigma_p \qquad (36)$$
as $n \to \infty$, where the elements of $\Sigma_p$ are zeros apart from the diagonal elements $\Sigma_{jj}(\theta)$, which are ones for $j = 1, 2, \ldots, p$, since the observations are iid and follow a $p$-variate standard normal distribution. Thus from (36) we have
$$\frac{\partial}{\partial \theta_j}\, \Sigma_{jj}(\theta) \to 0 \quad \text{and} \quad \frac{\partial}{\partial \theta}\, \Sigma \to 0 \qquad (37)$$
$$\frac{\partial}{\partial \theta_j}\, G^\top \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h \to 0 \quad \text{and} \quad \frac{\partial}{\partial \theta}\, G^\top \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h \to 0 \qquad (38)$$
for $j = 1, 2, \ldots, p$ as $n \to \infty$. Therefore, from (38) and (39), and then using (35), we have
$$\frac{\left[1 + b_{\theta_j}'(\hat{\beta}_n^*)\right]^2}{J_{jj}(\theta)} = \frac{\left[1 + \alpha\, \frac{\partial}{\partial \theta_j}\, G^\top \Sigma A\left(A^\top \Sigma A\right)^{-1} A^\top h\right]^2}{J_{jj}(\theta)} = \frac{[1 + 0]^2}{J_{jj}(\theta)} = J_{jj}(\theta)^{-1} \qquad (40)$$
as $n \to \infty$ for $j = 1, 2, \ldots, p$. Since for all $j = 1, 2, \ldots, p$ we have $\Sigma_{jj}(\theta) = J_{jj}(\theta)^{-1}$, then $\Sigma = J^{-1}$ as $n \to \infty$. Hence from (40) the variance $\Sigma_{jj}(\hat{\beta}_n^*)$ of the James-Stein shrinkage estimator $\hat{\beta}_{nj}^*$ converges to the CRB as $n \to \infty$ for all $j = 1, 2, \ldots, p$. This means that
$$\frac{\mathrm{CRB}}{\Sigma_{jj}(\hat{\beta}_n^*)} \to \frac{\Sigma_{jj}(\hat{\beta}_n^*)}{\Sigma_{jj}(\hat{\beta}_n^*)} = 1 \text{ as } n \to \infty$$
for all $j = 1, 2, \ldots, p$. Thus the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is asymptotically efficient.
Theorem 3 above shows that the James-Stein shrinkage estimator obtained by shrinking the MLE achieves the CRB asymptotically. This means that the shrinkage estimator is asymptotically efficient, and hence stable for large sample sizes. Since the initial estimator (the MLE) is known to be asymptotically efficient, we see that the shrinking process has no effect on the asymptotic efficiency of the estimator being shrunk.
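As a numerical illustration of Theorem 3 (our own check, under an assumed Gaussian location model with unit Fisher information, so that $1/J_{jj} = 1$), the variance of $\sqrt{n}\,\hat{\beta}_{n1}^*$ settles at the Cramér-Rao bound:

```r
## Variance of sqrt(n) * beta*_n1 versus the CRB 1/J_11 = 1 (assumed model).
set.seed(1)
p <- 3; mu <- c(1, 2, 3)                        # assumed true mean vector
for (n in c(50, 500, 5000)) {
  bstar <- replicate(5000, {
    bh <- mu + rnorm(p) / sqrt(n)               # draw X-bar ~ N_3(mu, I/n) directly
    k  <- max(1 - (p - 2) / (n * sum(bh^2)), 0) # shrinkage factor, cf. (41)
    k * bh[1]                                   # first coordinate of the JSSE
  })
  cat("n =", n, " n * Var(beta*_n1) ~", round(n * var(bstar), 3), "\n")
}
```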
which becomes
$$\hat{\beta}_n^* = \left(1 - \frac{p-2}{\hat{\beta}_n^\top V^{-1} \hat{\beta}_n}\right)_+ \hat{\beta}_n$$
when we factor out $\hat{\beta}_n$ and drop the $n$ in the denominator, to obtain a form with a lower MSE according to the James-Stein shrinkage strategy. Let
$$k = \left(1 - \frac{p-2}{\hat{\beta}_n^\top V^{-1} \hat{\beta}_n}\right)_+, \qquad (41)$$
then
$$\hat{\beta}_n^* = k\hat{\beta}_n. \qquad (42)$$
for $j = 1, 2, \ldots, p$, where $\beta_{oj}$ is the “true” $j$th parameter value. From the equality in (42) we have
$$\hat{\beta}_n = \frac{1}{k}\, \hat{\beta}_n^* \qquad (44)$$
for the shrinkage value $k$. Therefore, substituting the right-hand side of (44) in (43), the sequence $\hat{\beta}_{nj}$ becomes
$$\frac{1}{k}\, \hat{\beta}_{nj}^* = \beta_{oj} + O_p\!\left(\frac{1}{\sqrt{n}}\right),$$
hence we have the sequence
$$\hat{\beta}_{nj}^* = k\beta_{oj} + O_p\!\left(\frac{1}{\sqrt{n}}\right) k \qquad (45)$$
which is in terms of the shrinkage estimator with the shrinking effect value $k$ such that $0 < k \leq 1$. Analysing this sequence further shows that it satisfies the smoothness regularity conditions for the MLE, therefore we can proceed.
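A minimal R sketch of the shrinkage factor $k$ in (41) and the scaling in (42) follows; beta_hat and its covariance V are assumed inputs.

```r
## Shrinkage factor (41); beta_hat and V are assumed available.
shrink_factor <- function(beta_hat, V) {
  p <- length(beta_hat)
  max(1 - (p - 2) / drop(t(beta_hat) %*% solve(V) %*% beta_hat), 0)
}
## Example use, giving the scaled form (42):
## beta_star <- shrink_factor(beta_hat, V) * beta_hat
```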
Let $\beta_{oj}^* = k\beta_{oj}$ be the true value in the shrinkage sense, obtained by scaling the true value $\beta_{oj}$ with the shrinkage factor $k$. Then the sequence (45) becomes
$$\hat{\beta}_{nj}^* = \beta_{oj}^* + O_p\!\left(\frac{1}{\sqrt{n}}\right) k \qquad (46)$$
implying that
$$\hat{\beta}_{nj}^* - \beta_{oj}^* = O_p\!\left(\frac{1}{\sqrt{n}}\right) k \qquad (47)$$
for all $j = 1, 2, \ldots, p$. This means that $\hat{\beta}_{nj}^* - \beta_{oj}^*$ is still within the neighbourhood of $\frac{1}{\sqrt{n}}$, since $0 < k \leq 1$. Therefore, using the second-order Taylor's theorem we have
$$\ln \frac{\prod_{i=1}^{n} f_{\hat{\beta}_{nj}^*}(x_i)}{\prod_{i=1}^{n} f_{\beta_{oj}^*}(x_i)} = \left(\hat{\beta}_{nj}^* - \beta_{oj}^*\right) \sqrt{n\, I_{jj}(\beta_o^*)}\, Z_j - \frac{n}{2}\left(\hat{\beta}_{nj}^* - \beta_{oj}^*\right)^2 I_{jj}(\beta_o^*) + O_p(1) \qquad (48)$$
and simplifying, (50) becomes
$$\frac{\partial}{\partial \hat{\beta}_{nj}^*} \ln \frac{\prod_{i=1}^{n} f_{\hat{\beta}_{nj}^*}(x_i)}{\prod_{i=1}^{n} f_{\beta_{oj}^*}(x_i)} = \sqrt{n\, I_{jj}(\beta_o^*)}\, Z_j - n\left(\hat{\beta}_{nj}^* - \beta_{oj}^*\right) I_{jj}(\beta_o^*) + O_p(1) = 0$$
implying
$$\sqrt{n\, I_{jj}(\beta_o^*)}\, Z_j - n\left(\hat{\beta}_{nj}^* - \beta_{oj}^*\right) I_{jj}(\beta_o^*) + O_p(1) = 0 \qquad (51)$$
for $j = 1, 2, \ldots, p$, where $Z_j \sim (0, V_{\beta_j})$, with $V_{\beta_j}$ the variance of the $j$th element of the covariance matrix $V_\beta$ of the distribution $G^\top N_p(0, V)$; thus $Z_j$ is the standard normal distribution for the $j$th element of $\hat{\beta}_n$. Now, dividing the left- and right-hand sides of (52) by $\sqrt{n\, I_{jj}(\beta_o^*)}$, we obtain
$$\sqrt{n\, I_{jj}(\beta_o^*)}\left(\hat{\beta}_{nj}^* - \beta_{oj}^*\right) + O_p(1) = Z_j \qquad (53)$$
where $Z_j$ is the distribution of the $j$th element of $\hat{\beta}_n$ and $Z \sim N_p(0, V_\beta)$. Using the sequence (45), equation (53) becomes
$$k \sqrt{n\, I_{jj}(\beta_o^*)}\left(\hat{\beta}_{nj} - \beta_{oj}\right) + O_p(1) = Z_j \qquad (54)$$
for some $\beta_{oj}^* \to \beta_{oj}$ and $j = 1, 2, \ldots, p$, where $Z \sim N_p(0, V_\beta)$ and $V_\beta = G^\top V G$. The distribution of $Z_j$ is normal for each $j$th element of the plug-in estimator $\hat{\beta}_n$, as described in the analysis above.
Thus equation (54) establishes the condition which implies local asymptotic normality (LAN) and differentiability in quadratic mean (DQM) for the estimator $\hat{\beta}_n^*$, which implies that the rate of convergence is of order $\frac{1}{k\sqrt{n}}$ and the rate is $k\sqrt{n}$. This can also be obtained from the fact that the risk of the James-Stein shrinkage estimator is bounded by that of the MLE, and the latter converges at the rate $\sqrt{n}$. Hence the James-Stein shrinkage estimator is $k\sqrt{n}$-consistent.
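The $k\sqrt{n}$ rate can be checked empirically. The sketch below (our construction, assuming iid $N_3(\mu, I)$ data so that $I_{jj} = 1$) shows that the standard deviation of $k\sqrt{n}\,(\hat{\beta}_{n1} - \beta_{o1})$, the left-hand side of (54), stays stable near 1 as $n$ grows rather than diverging.

```r
## Empirical check of the k*sqrt(n) rate, cf. (54); the DGP is assumed.
set.seed(1)
p <- 3; mu <- c(1, 2, 3)                        # assumed true mean vector
for (n in c(50, 500, 5000)) {
  scaled <- replicate(2000, {
    bh <- mu + rnorm(p) / sqrt(n)               # X-bar ~ N_3(mu, I/n)
    k  <- max(1 - (p - 2) / (n * sum(bh^2)), 0) # shrinkage factor (41)
    k * sqrt(n) * (bh[1] - mu[1])               # scaled error, left side of (54)
  })
  cat("n =", n, " sd of k*sqrt(n)*(beta_hat_1 - beta_o1) ~",
      round(sd(scaled), 3), "\n")
}
```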
making the MLE $\hat{\theta}_n$ a $3 \times 1$ vector, which implies that the dimension of the shrinkage estimator is also $3 \times 1$. Knowing that the maximum likelihood estimator $\hat{\theta}_n$ is unbiased and the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is biased, the following expressions were used to calculate the mean squared error (MSE) of the two estimators. Using (55), we have
$$\mathrm{MSE}_\theta(\hat{\beta}_n^*) = \Sigma_n(k\hat{\theta}_n) + \left[b_\theta(k\hat{\theta}_n)\right]^2 = \mathrm{Var}(k\hat{\theta}_n) + \left[b_\theta(k\hat{\theta}_n)\right]^2 \qquad (58)$$
where $k$ is the shrinkage value which shrinks the maximum likelihood estimator $\hat{\theta}_n$ to the James-Stein shrinkage estimator $\hat{\beta}_n^*$ for $p = 3$. Thus the mean squared error of the shrinkage estimator $\hat{\beta}_n^*$ in (58) is obtained by using (56). The shrinkage value $k$ is evaluated using the expression
$$k = 1 - \frac{1}{\bar{X}_n^\top\left[\Sigma_n(\bar{X}_n)\right]^{-1} \bar{X}_n} = 1 - \frac{1}{\bar{X}_n^\top\left[\mathrm{Var}(\bar{X}_n)\right]^{-1} \bar{X}_n} \qquad (59)$$
for $p = 3$. The commands for all expressions and plots produced in R are provided in the appendix.
We present MSE plots obtained by simulating the mean squared error using sample sizes of $n = 30$, $2000$ and $100000$. The MSE plots for the James-Stein shrinkage estimator and the maximum likelihood estimator for each sample size are plotted on the same graph for easy comparison of the MSE trends. We begin by considering a small sample size of $n = 30$ to compare the way the MSE line plots change from one point to another. Since we are interested in asymptotic behaviour, we increase the sample size to $2000$ and then $100000$ to analyse the MSE trends and the rate at which the line plots become smooth.
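The paper's own R commands are in its appendix; the sketch below is our hedged reconstruction of such a simulation, with an assumed data-generating process (iid $N_3(\mu, I)$ with $\mu = 0$, the regime of maximal shrinkage gain) and the per-coordinate scaled MSE $n\,E\|\cdot\|^2/p$, so the exact values will differ from the figures while reproducing their qualitative message.

```r
## MSE comparison of the MLE and the JSSE beta* = k * beta_hat, cf. (59).
set.seed(1)
p <- 3; mu <- rep(0, p); reps <- 20000        # true mean: an assumed choice
for (n in c(30, 2000, 100000)) {
  se_mle <- se_js <- numeric(reps)
  for (r in 1:reps) {
    bh <- mu + rnorm(p) / sqrt(n)             # X-bar ~ N_3(mu, I/n), drawn directly
    k  <- max(1 - 1 / (n * sum(bh^2)), 0)     # shrinkage value (59) with p - 2 = 1
    se_mle[r] <- n * sum((bh - mu)^2) / p     # scaled squared error of the MLE
    se_js[r]  <- n * sum((k * bh - mu)^2) / p # scaled squared error of the JSSE
  }
  cat("n =", n, ": MSE(MLE) ~", round(mean(se_mle), 3),
      " MSE(JSSE) ~", round(mean(se_js), 3), "\n")
}
```

At every sample size the scaled MSE of the MLE hovers around 1 while that of the JSSE stays below it, in line with the trends reported in the figures.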
Figure 4.1: MSE plots for the MLE and JSSE for n = 30
Figure 4.2: MSE plots for the JSSE and MLE for n = 2000
Figure 4.3: MSE plots for the JSSE and MLE for n = 100000
Collectively, the scaled plots show a reduction in the mean squared error of the James-Stein shrinkage estimator compared to that of the initial estimator (the MLE). The trend in mean squared error for both the maximum likelihood estimator and the James-Stein shrinkage estimator shows that, as the sample size $n$ increases, the MSE values converge. The MSE plots suggest that the James-Stein shrinkage estimator converges to a lower MSE value of about 0.9, compared to the maximum likelihood estimator, which converges to an MSE value of 1.0. They also show that the James-Stein shrinkage estimator converges faster than the MLE, though the difference is minimal.
properties by checking whether the new (shrinkage) estimator obtained after shrinking possesses them. Thus the study examined whether the shrinking process has an effect on these properties. The results show that the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is asymptotically consistent and efficient. The study also showed that the shrinkage estimator (JSSE) is asymptotically biased, a property it possesses even for small values of the sample size $n$. The bias is introduced by the shrinking factor $k$ given in equation (59). We therefore see that the shrinking process introduces bias into the estimators obtained, but it preserves asymptotic consistency and efficiency and, more importantly, reduces the MSE.
Thus the James-Stein shrinkage estimator obtained by shrinking techniques proves to be useful even though it is biased. This estimator is more effective than the maximum likelihood estimator, as shown in this study and by Hansen (2016). The study also showed that the JSSE is stable for large values of the sample size $n$, making it suitable for practical applications, since large samples are normally used for effective estimation. Since error is always present in estimation, we regard shrinking (minimising error) as a very important technique for yielding effective estimators.
The study has investigated the asymptotic behaviour of the James-Stein shrinkage estimator. The asymptotic properties analysed include the rate of convergence, consistency and efficiency. The results show that the James-Stein shrinkage estimator has a lower mean squared error than the maximum likelihood estimator, though it is biased. The results further show that the JSSE is asymptotically consistent and efficient.
References
Baranchik, A. J. (1964). Multiple regression and estimation of the mean of a multivariate normal distribution. Technical Report 51, Department of Statistics, Stanford University.
Berger, J. O. (1976). Minimax estimation of a multivariate normal mean with arbitrary quadratic loss. Journal of Multivariate Analysis, 6, 256–264.
Carter, R. L. and Ullah, A. (1984). The sampling distribution of shrinkage estimators and their F-ratios in the regression model. Journal of Econometrics, 25, 109–122.
Efron, B. (1975). Biased versus unbiased estimation. In Advances in Mathematics. Academic Press, New York.
George, E. I. (1986). Minimax multiple shrinkage estimation. Annals of Statistics, 14, 188–205.
Geyer, C. J. (1994). On the asymptotics of constrained M-estimation. Annals of Statistics, 22, 1993–2010.
Hansen, B. E. (2008). Generalized shrinkage estimators. www.ssc.wisc.edu/~bhansen.
Hansen, B. E. (2016). Efficient shrinkage in parametric models. Journal of Econometrics, 190, 115–132.