Bayes Empirical Bayes Estimation for Natural Exponential Families with Quadratic Variance Functions
Author(s): G. G. Walter and G. G. Hamedani
Source: The Annals of Statistics, Vol. 19, No. 3 (Sep., 1991), pp. 1191-1224
Published by: Institute of Mathematical Statistics
Stable URL: https://www.jstor.org/stable/2241946
estimate, that is, by getting a better approximation to the true prior distribu-
tion as the partial sum of a series of these orthogonal polynomials.
The orthogonal polynomials defined here are exactly the "classical" orthog-
onal polynomials considered by Tricomi (1955). In each of the six NEF-QVF
distributions, the polynomials are identified as particular types of classical
orthogonal polynomials. In some cases, however, only a finite number of them
can be used, since the conjugate prior distributions may not have moments of
all orders.
In Section 2 we shall review, for subsequent use, some of the properties of
NEF-QVF distributions given in Morris (1982). In Section 3 we define our
family of orthogonal polynomials and show their relation to those defined by
Morris (1982). Some basic properties are also discussed, including the differ-
ential equation and recurrence formulas satisfied by the polynomials. More
detailed properties are relegated to Appendix A. Section 4 introduces a
biorthogonal system related to the polynomials which is used to recover the
prior distribution from the marginal distribution. This is applied to the
empirical Bayes estimation problems in Section 5. Appendix B contains
the results for each of the six basic NEF-QVF distributions. These results are
summarized in Table 1.
In the standard Bayesian approach it is assumed that the parameter, say θ, is fixed but not precisely known. The prior probability law g(θ) has a different character than the probability law f(x|θ) of the random variable X. It is assumed to be a subjective measure of the investigator's prior knowledge of θ. The observations are of the function f(x|θ), and a sample X = (X_1, X_2, ..., X_N) therefore has the probability law

f(x|θ) = ∏_{i=1}^N f(x_i|θ).

In the empirical Bayes approach, each observation X_i is paired with its own parameter value θ_i, and the joint probability law is

∏_{i=1}^N f(x_i|θ_i)g(θ_i).

The X_1, X_2, ..., X_N are observable, but the θ_1, θ_2, ..., θ_N are not.
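A minimal simulation of this sampling scheme may help fix ideas: each θ_i is drawn from the prior g and each X_i from f(x|θ_i), and only the X_i are retained. The beta prior and binomial likelihood below are illustrative choices of ours, not the paper's; any NEF-QVF pair could be substituted.

```python
# Sketch of the empirical Bayes data-generating mechanism:
# theta_i ~ g, X_i | theta_i ~ f; only the x values are observed.
import numpy as np

rng = np.random.default_rng(0)
N = 100
theta = rng.beta(2.0, 3.0, size=N)   # theta_i ~ g (assumed beta prior)
x = rng.binomial(10, theta)          # X_i | theta_i ~ binomial(10, theta_i)
# (X_i, theta_i) are iid pairs with joint density f(x|theta) g(theta).
```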
The assumption here is that (X_1, θ_1), (X_2, θ_2), ..., (X_N, θ_N) is an independent sample from the distribution with density function f(x|θ)g(θ). The conditional probability law of X given θ, namely f(x|θ), is assumed known; g(·) is assumed unknown but a smooth density.
Most approaches to the problem of estimating g(θ) have been indirect in that estimators are obtained not for g(θ) itself, but for the moments of g(θ); see Maritz (1970). These approaches, while simple, often suffer from excess "jumpiness" [as was observed by Berger (1985)] and should be smoothed. The direct methods in which g(θ) itself is estimated have usually been based on step functions [see, e.g., Deely and Kruse (1968)] or Dirichlet processes [see, e.g., Berry and Christensen (1979)] or maximum likelihood [see, e.g., Laird (1978) or Leonard (1984)]. Laird pointed out that her method is equivalent to
the simultaneous estimation of several exchangeable parameters and leads to
an estimator with finite support. Since we shall assume that g is a smooth
density, such estimators suffer from the same difficulty as the empirical
distribution, viz. they are not mean-squared consistent.
At the other extreme lie the parametric methods in which g depends only
on a finite number of parameters. The simplest method depends on the
assumption that g belongs to a class of conjugate densities, for example, the
assumption that g is a beta density in the binomial case.
The method presented here is intermediate between the two, and may
involve a finite or infinite number of parameters. It is similar to that in Walter
and Hamedani (1987, 1989) and is based on orthogonal polynomials. It in-
volves a preliminary choice of a conjugate prior and of two parameters, the
prior mean and variance of which may be subjective (Bayesian) or estimated
from the data (parametric empirical Bayesian), followed by an improved
estimate of g based on the sample from the mixture. Then Bayes and
empirical Bayes methods are combined, but in a fashion somewhat different
than that of Deely and Lindley (1981).
(2.2)  μ = ψ′(θ)

and

(2.3)  V(μ) = ψ″(θ) = dμ/dθ.

A NEF has quadratic variance function if V has the form

(2.4)  V(μ) = v_0 + v_1μ + v_2μ^2.
Let

(3.1)  r_n(μ) = r_n(μ; m, μ₀) = ((−1)^n/g(μ)) (d^n/dμ^n)[V^n(μ)g(μ)],  n = 0, 1, 2, ...,

where g(μ) is the prior distribution given by (2.10). Then

r_0(μ) = 1,  r_1(μ) = m(μ − μ₀),

and, for n = 1, 2, ...,

r_n = (−1)^n (V^n g)^{(n)}/g.
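As a quick check of (3.1) in the simplest setting, the following sympy sketch applies the Rodrigues formula in the normal case, where V(μ) = σ^2 and the conjugate prior is proportional to exp(−m(μ − μ₀)^2/2σ^2); it is our own sanity check, verifying only r_0 = 1 and r_1 = m(μ − μ₀).

```python
# A minimal sympy check of the Rodrigues-type formula (3.1), normal case.
import sympy as sp

mu, mu0 = sp.symbols("mu mu0", real=True)
m, sigma = sp.symbols("m sigma", positive=True)

V = sigma**2                                   # variance function, normal case
g = sp.exp(-m*(mu - mu0)**2 / (2*sigma**2))    # conjugate prior, up to a constant

def r(n):
    """r_n(mu) = (-1)^n (V^n g)^(n) / g, as in (3.1)."""
    return sp.simplify((-1)**n * sp.diff(V**n * g, mu, n) / g)

print(r(0))             # 1
print(sp.expand(r(1)))  # m*mu - m*mu0, i.e., r_1 = m (mu - mu0)
```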
∫_a^b μ^k r_n(μ)g(μ) dμ = { 0, k < n;  n! ∫_a^b V^n(μ)g(μ) dμ, k = n },

and hence r_n and r_k are orthogonal. Let r_n have leading coefficient k_n. Then the normalizing factor is

γ_n = ∫_a^b r_n^2(μ)g(μ) dμ = k_n ∫_a^b μ^n r_n(μ)g(μ) dμ = k_n n! ν_n,  where ν_n = ∫_a^b V^n(μ)g(μ) dμ,

and the coefficients of the three-term recurrence μr_n = A_n r_{n+1} + B_n r_n + C_n r_{n−1} satisfy

∫_a^b μ r_n(μ)r_{n+1}(μ)g(μ) dμ = A_n γ_{n+1},  ∫_a^b μ r_n^2(μ)g(μ) dμ = B_n γ_n,  ∫_a^b μ r_n(μ)r_{n−1}(μ)g(μ) dμ = C_n γ_{n−1}.
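The recurrence can be checked numerically in a familiar special case. The sketch below (our construction, not the paper's) verifies μ He_n = He_{n+1} + n He_{n−1}, i.e., A_n = 1, B_n = 0, C_n = n, for the normal case with σ = m = 1 and μ₀ = 0, where r_n reduces to the probabilists' Hermite polynomial He_n.

```python
# Numerical sanity check of mu * r_n = A_n r_{n+1} + B_n r_n + C_n r_{n-1}
# in the normal case (r_n = He_n, A_n = 1, B_n = 0, C_n = n).
import numpy as np
from numpy.polynomial import hermite_e as He

n = 4
mu = np.linspace(-3.0, 3.0, 13)
lhs = mu * He.hermeval(mu, [0]*n + [1])                       # mu * He_n(mu)
rhs = He.hermeval(mu, [0]*(n+1) + [1]) + n * He.hermeval(mu, [0]*(n-1) + [1])
assert np.allclose(lhs, rhs)                                  # He_{n+1} + n He_{n-1}
```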
TABLE 1
Natural exponential families with quadratic variance functions, their conjugate prior distributions and associated orthogonal polynomials

                 Normal                              Poisson
Density          (2πσ^2)^{−1/2} e^{−(x−μ)^2/2σ^2}    μ^x e^{−μ}/x!
θ(μ)             μ/σ^2                               log μ
ψ(θ)             σ^2θ^2/2                            e^θ
V(μ)             σ^2                                 μ
(a, b)           (−∞, ∞)                             (0, ∞)
Zero of V(μ)     none                                0

†Up to a multiplicative constant and power of r.
TABLE 1
Continued

                 Gamma                                     Binomial
Density          (r/μ)^r x^{r−1} e^{−rx/μ}/Γ(r)            C(r, x)(μ/r)^x(1 − μ/r)^{r−x}
θ(μ)             −r/μ                                      log(μ/(r − μ))
ψ(θ)             −r log(−θ)                                r log(1 + e^θ)
V(μ)             μ^2/r                                     μ − μ^2/r
(a, b)           (0, ∞)                                    (0, r)
Zero of V(μ)     0                                         0, r
Std. g₀(μ)†      μ^α e^{−1/μ}                              μ^α(r − μ)^β
m                −(2 + α)/r                                (α + β + 2)/r
μ₀               −1/(2 + α)                                r(α + 1)/(α + β + 2)
r_n(μ)           (2 + α)^n y_n^{(2+α)}(−2μ/(2 + α))        n! r^n P_n^{(α,β)}(2μ/r − 1)
r_n(a)           r_n(0) = (2 + α)^n                        r_n(0) = (−1)^n r^n Γ(n + β + 1)/Γ(β + 1)
k_n              (−1)^n Γ(2n + α + 1)/Γ(n + α + 1)         Γ(2n + α + β + 1)/Γ(n + α + β + 1)
n₀               (−1 − α)/2                                ∞
PROPOSITION 3.1. Let ν_n = ∫_a^b V^n(μ)g(μ) dμ < ∞ for n ≤ n₀. Then the r_n(μ) given by (3.1) satisfy the differential equation

(3.9)  (d/dμ)[g(μ)V(μ)r_n′(μ)] = δ_n r_n(μ)g(μ),

where δ_n = n((n − 1)v_2 − m).
TABLE 1
Continued

                    Negative binomial                             NEF-GHS
μ₀                  …                                             …
m                   …                                             …
ν_n                 r^n B(n + β + 1, −2n − α − β − 1)             …
Standard polynomial Jacobi on (1, ∞)                              Jacobi on (−i∞, i∞)
Usual symbol        P_n^{(α,β)}(x)                                P_n^{(α,β)}(x)
r_n(μ)              n! r^n P_n^{(α,β)}(2μ/r + 1)                  (−i)^n n! 2^n P_n^{(α,β)}(−iμ/r)
r_n(a)              r_n(0) = r^n Γ(n + α + 1)/Γ(α + 1)            r_n(ir) = (−i)^n 2^n Γ(n + α + 1)/Γ(α + 1)
k_n                 (−1)^n Γ(2n + α + β + 1)/Γ(n + α + β + 1)     (−1)^n Γ(2n + α + β + 1)/(r^n Γ(n + α + β + 1))
n₀                  (−1 − α − β)/2                                −(α + β)/2

†Up to a multiplicative constant and power of r.
show that

∫_a^b g(μ)p(μ)μ^k dμ = 0,  k < n,

and hence that p(μ) is orthogonal to all polynomials of degree less than n. Thus p(μ) is a multiple of r_n(μ), say δ_n r_n(μ). To see this we integrate by parts twice,
We may also obtain an expression for the derivatives of r_n. Indeed r_n′V is a polynomial of degree less than or equal to n + 1; it satisfies the recurrence formula

V(μ)r_n′(μ) = a_n r_{n+1}(μ) + b_n r_n(μ) + c_n r_{n−1}(μ),

where

a_n = nv_2 k_n/k_{n+1},  c_n = (k_{n−1}/k_n)(γ_n/γ_{n−1})(m − (n − 1)v_2).

All the coefficients of these two recurrence formulas may be given in terms of k_n, ν_n and B_n. This last coefficient may be found in terms of the others as well if the polynomials are known at one point [usually a zero of V(μ)]. Then

a r_n(a) = A_n r_{n+1}(a) + B_n r_n(a) + C_n r_{n−1}(a)

or
they will be less appropriate in many others. Dawid (1973) investigated prior densities with thicker tails than normal and showed that it is unreasonable to expect the same results from analysis based upon a normal prior. Alternatively, g might possess more than one mode, in which case fairly complex analysis might be involved. In view of these observations, Leonard (1984) studied the empirical estimation of the general prior density g, that is, under no prior information about g. He pointed out that if some partial information about g were available, then it could be used for smoothing densities.

We are therefore interested in prior distributions which are not necessarily conjugate distributions but are more general. In this section we shall denote the conjugate density by g₀(μ) and shall allow g(μ) to be any density in the (topological) span of {r_n} in L^2(g₀). This is not a restriction if the {r_n} are complete, as they are for finite intervals.
In this case, if

(4.1)  g(μ) = Σ_n a_n r_n(μ)g₀(μ),

then the marginal density of X is

(4.2)  f(x) = Σ_n a_n l_n(x),

where

(4.3)  l_n(x) = ∫_a^b f(x|θ(μ))r_n(μ)g₀(μ) dμ,

by (3.8).

We shall be interested in turning the problem around and going from f(x) in (4.2) to g(μ), that is, in finding coefficients a_n such that
the form

(4.6)  r_n(μ) = Σ_{k=0}^n c_{nk}(μ − μ₀)^k,

given by

(4.7)  A_n(x) = Σ_{k=0}^n (c_{nk}/a_k) P_k(x, μ₀).

Then we have

(4.8)  ∫_a^b E_μ(A_k(X)) r_n(μ)g₀(μ) dμ = ∫_a^b r_k(μ)r_n(μ)g₀(μ) dμ = δ_{nk} γ_n,
1. g₀(μ) may not have moments of all orders, so that r_n(μ)g₀(μ) may not be integrable for large n.
2. g(μ) may not be identifiable. This may happen if the l_n are not linearly independent.
3. The topological span of {r_n} in L^2(g₀) may not include all the prior distributions of interest.
The variance of A_n(X) may also be calculated from the general formula in Morris (1983).
(4.11)  I_{nk} = ∫_a^b {r_n^{(k)}(μ)}^2 V^k(μ)g₀(μ) dμ.

The appropriate formula is the integration-by-parts identity (4.12). For k = 0, I_{n0} = γ_n. In the other cases we apply (4.12) repeatedly to obtain: for k = 1 we use (4.12) with p = r_n′ and q = r_n; then we find that

I_{n1} = −δ_n γ_n.

More generally,

I_{nk} = ∫_a^b r_n^{(k)}(μ)(r_n^{(k)}(μ)V^k(μ)g₀(μ)) dμ = −∫_a^b r_n^{(k−1)}(μ)(r_n^{(k)}(μ)V^k(μ)g₀(μ))′ dμ.
(4.17)  ∫_a^b (r_n^{(k)}(μ))^2 V^k(μ)g₀(μ) dμ.

This is not a contradiction, since in those cases in which v_2 > 0 the conjugate prior distribution has only a finite number of moments. If v_2 < 0, the binomial case only, then we must have 1 + jv_2 > 0 for j = 0, 1, ..., k − 1, that is, r ≥ k, where V(μ) = μ − μ^2/r.
We shall first estimate f(x) by using density estimators similar to those used with orthogonal functions. Then we estimate g(μ) by employing the procedure mentioned in the last section. Finally we obtain Bayes empirical Bayes estimates of the moments.

If g(μ) is a conjugate prior density, then the (Bayesian) posterior estimate of the mean is a weighted average of μ₀ and the sample mean X̄, as is well known. However, this assumption is excessively restrictive, since such conjugate priors are usually unimodal. This excludes the common assumption that mixtures consist of a linear combination of the f(x|θ_i), which in turn corresponds to prior distributions of the form Σ p_i δ(θ − θ_i). A "smeared" smooth
version of this would be Σ p_i δ_m(θ − θ_i), where {δ_m} is a smooth delta family [Walter and Blum (1979)]. Prior distributions of this form arise from MLE [Laird (1978)] and are the form considered in Leonard (1984). If g(μ) is not the conjugate prior density, this is no longer necessarily true, and the posterior mean is

where f̄ is the probability law of the sample mean, which is also a NEF-QVF. This can either be estimated directly or by first estimating g(μ) from the sample. We shall adopt the latter approach, which has the advantage of giving
estimates of other moments as well.
We shall assume that g₀(μ), an initial Bayesian conjugate prior distribution, has been found and has moments up to order 2n₀, which may be infinite. If our Bayesian is reluctant to specify μ₀ and m based on his subjective knowledge, other procedures may be used. One such is to assume a noninformative prior distribution as the initial guess for g₀. This only works if the interval (a, b) is bounded. Another procedure is to estimate μ₀ and m from a portion of the data by using MLE or other methods and then using the conjugate prior distribution
(5.4)  â_k = (1/N) Σ_{i=1}^N A_k(X_i) γ_k^{−1}

and

(5.5)  ĥ_p(μ) = Σ_{k=0}^p â_k r_k(μ).
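To make (5.4) concrete, here is a small simulation sketch in the normalized normal setting used in Example 2 below (σ = μ₀ = m = 1, so r_k(μ) = He_k(μ − 1), γ_k = k!, and A_k(x) = H_k((x − 1)/2)). The "true" prior N(0.6, 0.5^2) used to generate the μ_i is an arbitrary assumption of ours for illustration.

```python
# Estimating the coefficients a_k by (5.4) and forming the estimate (5.5)
# of the prior in the normal case; the true prior here is assumed, not the paper's.
import numpy as np
from numpy.polynomial import hermite as H, hermite_e as He
from math import factorial, sqrt, pi

rng = np.random.default_rng(0)
N, p = 2000, 4
mu_true = rng.normal(0.6, 0.5, size=N)   # mu_i ~ g (assumed)
x = rng.normal(mu_true, 1.0)             # X_i | mu_i ~ N(mu_i, 1)

# (5.4): a_k-hat = (1/N) sum_i A_k(X_i) / gamma_k, with A_k(x) = H_k((x-1)/2)
a_hat = [H.hermval((x - 1) / 2, [0]*k + [1]).mean() / factorial(k)
         for k in range(p + 1)]

# estimate of the prior: g_p(mu) = sum_k a_k-hat * He_k(mu - 1) * g0(mu)
def g_hat(mu):
    g0 = np.exp(-(mu - 1)**2 / 2) / sqrt(2*pi)
    h = sum(a * He.hermeval(mu - 1, [0]*k + [1]) for k, a in enumerate(a_hat))
    return h * g0

print(np.round(a_hat, 3))
```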
The variance of â_k may be obtained from that of A_k(X), which in turn may be based on (4.10). Indeed we have

∫_a^b E[f̂(x) − f(x)]^2 dF₀(x) = O(1/N).
THEOREM 5.3. Let g₀(μ) have moments of every order and let h(μ) be a bounded continuous function such that T^q h ∈ L^2(g₀) for some positive integer q, where T is the differential operator given by

Tφ = Vφ″ − r_1φ′.

Let ĥ_p and ĝ_p be given by (5.5) and (5.6), respectively. Then for some constants C_1 and C_2 independent of N and p and each ε > 0,

PROOF. We expand h as

h(μ) = Σ_{k=0}^∞ a_k r_k(μ),

where

a_k = ∫_a^b h(μ)r_k(μ)g₀(μ) dμ/γ_k

(5.8)      = ∫_a^b T^q h(μ)r_k(μ)g₀(μ) dμ/(δ_k^q γ_k)

and

(5.10)  ∫_a^b [Σ_{k=p+1}^∞ a_k r_k(μ)]^2 g₀(μ) dμ = Σ_{k=p+1}^∞ a_k^2 γ_k.

Since, by (3.9), δ_k = k((k − 1)v_2 − m), it follows that (5.10) is dominated by (p + 1)^{1+ε−2q} for each ε > 0.
In order to evaluate this expression, we use (4.18) to find that, for 1 + jv_2 > 0, j = 1, 2, ..., p,

(5.12)  Σ_{k=1}^p E(â_k − a_k)^2 γ_k ≤ (1/N) Σ_{k=1}^p (1 + m − (k − 1)v_2)^k ‖h‖^2 ≤ const. N^{−1}(m + (p − 1)|v_2| + 1)^{p+1}.

Hence by combining (5.12) and (5.10) we reach the first conclusion. The second follows from the first by Schwarz's inequality. □

for both ĝ_p and f̂_p, where p + 1 = O(log N/(2 log m)).
The estimates of the posterior mean and variance arising from the estimate of g(μ) = h(μ)g₀(μ) are

(5.13)  μ̂ = ∫_a^b μ f(x|θ(μ)) Σ_{k=0}^n â_k r_k(μ)g₀(μ) dμ / ∫_a^b f(x|θ(μ)) Σ_{k=0}^n â_k r_k(μ)g₀(μ) dμ
        = Σ_{k=0}^n â_k(A_k l_{k+1}(x) + B_k l_k(x) + C_k l_{k−1}(x)) / Σ_{k=0}^n â_k l_k(x),

where

l_k(x) = E(r_k(μ)|X = x),

and, using the moment calculations obtained from Theorem 5.2 of Morris (1983),

(m + 1)E(V(μ)|X = x)/m + x₀r_1(x₀) = V(x₀)(m + 1)^2/(m(m + 1 − v_2)) + x₀r_1(x₀).
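As an illustration of the ratio in (5.13), the following sketch evaluates it in the normalized normal case, where the recurrence coefficients are A_k = B_k = 1, C_k = k and l_k(x) = E(r_k(μ)|X = x) = 2^{−k/2} He_k(√2(x₀ − 1)) with x₀ = (x + 1)/2. The closed form for l_k is our own computation for this special case, and the coefficient vector a is a placeholder.

```python
# Numerical sketch of the posterior-mean ratio (5.13), normal case
# (sigma = mu0 = m = 1, r_k(mu) = He_k(mu - 1), A_k = B_k = 1, C_k = k).
import numpy as np
from numpy.polynomial import hermite_e as He

def l(k, x):
    """l_k(x) = E(r_k(mu) | X = x) under g0 (our closed form for this case)."""
    if k < 0:
        return 0.0
    x0 = (x + 1) / 2                       # posterior mean under g0
    return 2**(-k/2) * He.hermeval(np.sqrt(2)*(x0 - 1), [0]*k + [1])

def posterior_mean(x, a):
    num = sum(a[k] * (l(k+1, x) + l(k, x) + k*l(k-1, x)) for k in range(len(a)))
    den = sum(a[k] * l(k, x) for k in range(len(a)))
    return num / den

print(posterior_mean(2.0, [1.0]))   # with a = (1, 0, ...) this reduces to x0 = 1.5
```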
EXAMPLE 1. In the binomial case, since the interval (a, b) is bounded, the use of a noninformative prior is possible. If the interval is normalized to (0, 1) by using p = μ/r (see Table 1) as the parameter, the setting is exactly the same as in Walter and Hamedani (1987). The resulting polynomials are the Legendre polynomials. These were used to estimate p from the past data (5, 4, 5, 5, 0) and current value 5 from a binomial mixture with r = 5. The results were

g(p) = (3π/14)(2 sin(πp) + sin(3πp)),
and samples of size 15 were taken from the resulting marginal distribution.
The results were
in which the subscript denotes the number of terms in the estimate of g(p). In
this example a different sample was used in each of these cases as well as a
different current value. These were also generated randomly and were, respec-
tively, x = 3, 5, 5, 3 for the four cases. The true value of E(p) was of course 0.5.
The expression for ĝ(p) was compared to that of g(p). For approximation by a fourth degree polynomial (five terms in the series), the correct shape was observed even when samples as small as 5 were taken.
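Example 1 can be replayed with a short Monte Carlo sketch. The rejection-sampling envelope constant and the seed below are our own choices; the comparison is between the exact marginal f(x) = ∫₀¹ C(5, x)p^x(1 − p)^{5−x}g(p) dp and the empirical frequencies of a sample of size 15.

```python
# Monte Carlo sketch of the binomial mixture in Example 1:
# p ~ g(p) = (3*pi/14)(2 sin(pi p) + sin(3 pi p)), X | p ~ Binomial(5, p).
import numpy as np
from scipy import integrate
from scipy.stats import binom

rng = np.random.default_rng(1)

def g(p):
    return (3*np.pi/14) * (2*np.sin(np.pi*p) + np.sin(3*np.pi*p))

def sample_g(n):
    out = []
    while len(out) < n:
        p, u = rng.uniform(), rng.uniform()
        if u * 2.2 < g(p):          # 2.2 bounds the maximum of g on (0, 1)
            out.append(p)
    return np.array(out)

p = sample_g(15)                    # sample of size 15, as in the example
x = binom.rvs(5, p, random_state=rng)

# exact marginal f(x) versus empirical frequencies
f = [integrate.quad(lambda t, k=k: binom.pmf(k, 5, t) * g(t), 0, 1)[0] for k in range(6)]
print(np.round(f, 3), np.bincount(x, minlength=6) / len(x))
```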
Since the conjugate prior is also normal in this case, it has the form
g₀(μ) = K exp(m(θμ₀ − σ^2θ^2/2)),

where θ = μ/σ^2. In this case we cannot take the trial prior to be noninformative since the interval (a, b) is infinite. Accordingly, we take it to be as simple as possible, with μ₀ = m = 1. The polynomials r_n(μ) are, by (B.1.3),

r_n(μ) = 2^{−n/2} H_n((μ − 1)/√2)

for

g₀(μ) = (2π)^{−1/2} exp(−(μ − 1)^2/2).

The polynomials P_n(x, μ) similarly may be found to be

P_n(x, μ) = 2^{−n/2} H_n((x − μ)/√2).
The biorthogonal functions A_k(x) must satisfy E_μ(A_k(X)) = r_k(μ); they are

A_k(x) = H_k((x − 1)/2).
ĝ_p(μ) = Σ_{n=0}^p â_n 2^{−n/2} H_n((μ − 1)/√2)(2π)^{−1/2} exp(−(μ − 1)^2/2),

where

â_n = (1/N) Σ_{i=1}^N H_n((X_i − 1)/2)/n!.

For the sample generated, the estimate takes the form

ĝ(μ) = (2π)^{−1/2} exp(−(μ − 1)^2/2)[1 + â_1 2^{−1/2} H_1((μ − 1)/√2) + â_2 2^{−1} H_2((μ − 1)/√2)].
This estimate is very crude given the small sample size and the small number of terms used. The mean is just 1 plus the coefficient of H_1; in this case, μ̂ = 0.64. (This is not the posterior mean, but rather an estimate of the prior mean.)
For a sample of size 10 with the same seed we have

â_3 = 0.402,  â_4 = −0.0052,
for the prior density. The θ_i are chosen based on a sample x_1, x_2, ..., x_m by
L = ∏_{j=1}^m Σ_i exp(x_jθ_i − ψ(θ_i))
with respect to each θ_i. Since this estimator shares the shortcomings (and advantages) of the empirical distribution, it cannot be mean-squared consistent. However, a smoothed version should be. This can easily be obtained in terms of our orthogonal polynomials by approximating δ(μ − μ_i) by the partial sums of its polynomial expansion,

δ(μ − μ_i) ~ Σ_{n=0}^∞ r_n(μ_i)r_n(μ)g₀(μ)/γ_n.

This approach has not yet been explored but shows promise.
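The truncated expansion is easy to visualize. The sketch below (our illustration, in the same normalized normal setting as Example 2, where r_n = He_n(μ − 1), γ_n = n!, g₀ = N(1, 1)) builds the partial sum up to degree p and produces a smooth bump concentrating near μ_i.

```python
# Truncated expansion of delta(mu - mu_i) ~ sum_n r_n(mu_i) r_n(mu) g0(mu) / gamma_n,
# normal case r_n = He_n(mu - 1), gamma_n = n!, g0 = N(1,1). Purely illustrative.
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial

def delta_p(mu, mu_i, p=10):
    g0 = np.exp(-(mu - 1)**2 / 2) / np.sqrt(2*np.pi)
    s = sum(He.hermeval(mu_i - 1, [0]*n + [1]) * He.hermeval(mu - 1, [0]*n + [1]) / factorial(n)
            for n in range(p + 1))
    return s * g0

mu = np.linspace(-2, 4, 601)
peak = delta_p(mu, mu_i=1.5, p=10)   # a smooth bump concentrating near mu = 1.5
```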
APPENDIX A
(A.2)  γ_n = k_n n! ν_n

and may be found explicitly if k_n is calculated.
(A.4)  A_n = k_n/k_{n+1},  C_n = (k_{n−1}/k_n)(γ_n/γ_{n−1}) = n ν_n/ν_{n−1},

(A.5)  B_n = k_n′/k_n − k_{n+1}′/k_{n+1}.
By again using the differential equation, we can find k_n′ [Tricomi (1955), page 137]. The differential equation (3.9) may be written

V(μ) d^2y/dμ^2 − m(μ − μ₀) dy/dμ − δ_n y = 0,  δ_n/v_2 = n(n − 1 − m/v_2),

and, when v_2 ≠ 0, reduced to the hypergeometric
equation. In terms of the hypergeometric function F(a, b; c; x),

(A.9)  r_n(μ) = c_n F(−n, n − 1 − m/v_2; m(a − μ₀)/(v_2(b − a)); (μ − a)/(b − a)),

where c_n is a constant. Since F(a, b; c; 0) = 1,

c_n = r_n(a).
When v_2 = 0 and v_1 ≠ 0, the equation becomes

(A.10)  (v_0 + v_1μ) d^2y/dμ^2 − m(μ − μ₀) dy/dμ + mny = 0.

By letting μ = a + bt, a = −v_0/v_1, b = v_1/m, we obtain the confluent hypergeometric equation

t d^2y/dt^2 + (m(μ₀ − a)/v_1 − t) dy/dt + ny = 0.
Using the derivative formula

(d^v/dx^v) F(a, b; c; x) = ((a)_v (b)_v/(c)_v) F(a + v, b + v; c + v; x),

we find

(A.13)  (d/dμ) r_n(μ, μ₀) = r_n(a, μ₀) e_n F(−n + 1, n − m/v_2; c + 1; (μ − a)/(b − a)),
where

e_n = −n(n − 1 − m/v_2) v_2/(m(a − μ₀)).

But r_{n−1}(μ; m − 2v_2, μ₀′), with μ₀′ = (mμ₀ + v_1)/(m − 2v_2), is a constant multiple of

F(−n + 1, n − m/v_2; c + 1; (μ − a)/(b − a)),

since a and b depend only on V(μ) and not on the prior parameters m and μ₀. Thus we have

(A.14)  (d/dμ) r_n(μ; m, μ₀) = (r_n(a; m, μ₀) e_n / r_{n−1}(a; m − 2v_2, μ₀′)) r_{n−1}(μ; m − 2v_2, μ₀′).
APPENDIX B
B.1. Normal. In the normal case we have μ = θσ^2, ψ(θ) = θ^2σ^2/2 and V(μ) = σ^2. The conjugate prior distribution is a constant times

(B.1.1)  g₀(μ) = (1/σ^2) exp(m(θμ₀ − σ^2θ^2/2)) = (1/σ^2) exp((mμ/2σ^2)(2μ₀ − μ)).

The polynomials satisfy the Rodrigues formula
B.2. Poisson. In the Poisson case the parameters θ and μ are related by μ = e^θ = ψ′(θ) = V(μ), where μ may take values in Ω = (0, ∞). There is an immense literature in this case, most of which deals with estimation of the parameter θ [Hudson (1978)]. That their mean-squared error is often better than ours is not surprising given the generality of our method. The conjugate prior is a multiple of

(B.2.1)  g₀(μ) = (1/μ) exp(mμ₀ log μ − mμ) = μ^{mμ₀−1} e^{−mμ}.
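The Rodrigues formula (3.1) with V(μ) = μ and the gamma prior (B.2.1) produces Laguerre-type polynomials. The sympy sketch below checks this for one case; m = 1 and mμ₀ = 3 are arbitrary assumed values, and the identification r_n = (−1)^n n! L_n^{(mμ₀−1)} is our own reading of the classical Laguerre Rodrigues formula.

```python
# Rodrigues formula (3.1) in the Poisson case: V(mu) = mu, g0 from (B.2.1).
import sympy as sp

mu = sp.symbols("mu", positive=True)
c = 3                                   # c = m*mu0 (assumed), with m = 1
g0 = mu**(c - 1) * sp.exp(-mu)          # gamma conjugate prior, up to a constant

n = 2
rn = sp.expand((-1)**n * sp.diff(mu**n * g0, mu, n) / g0)
print(rn)                                            # mu**2 - 8*mu + 12
print(sp.expand((-1)**n * sp.factorial(n) * sp.assoc_laguerre(n, c - 1, mu)))
```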
(B.3.2)  r_n(μ) = (−1)^n μ^{2+mr} e^{mμ₀r/μ} (d^n/dμ^n)(μ^{2n−mr−2} e^{−mμ₀r/μ}).

These are related to the generalized Bessel polynomials, which satisfy the Rodrigues formula

(B.3.3)  y_n^{(a)}(x) = 2^{−n} x^{2−a} e^{2/x} (d^n/dx^n)(x^{2n+a−2} e^{−2/x}).
In the normalized case μ₀ = 1/(mr) = −1/(2 + α), the ν_n are found to be

ν_n = r^{−n} ∫_0^∞ x^{−2n−α−2} e^{−x} dx = r^{−n} Γ(−2n − α − 1).

This last expression can also be given in the form found in [Tricomi (1955), page 218].

B.4. Binomial. In the binomial case V(μ) = μ − μ^2/r, and the conjugate prior distribution is a constant times

g₀(μ) = μ^α(r − μ)^β r^{−(α+β+1)}.

With a change of scale (i.e., r = 1), this leads to the usual Rodrigues formula for the Jacobi polynomials on (0, 1),
The Bayes empirical Bayes problem has already been treated for this case in Walter and Hamedani (1987). They also considered the case of a noninformative initial prior, which led to the Legendre polynomials. The more general problem in which the indices n_i of the binomial distribution are allowed to vary was not considered, but may be attacked by the method of Leonard (1976). The recurrence formulas for r = 1 are well known [see Szegő (1967), pages 71-72], as is the differential equation. We observe merely that r_n(0) = Γ(n + α + 1)/Γ(α + 1), that the leading coefficient is

k_n = Γ(2n + α + β + 1)/Γ(n + α + β + 1)

and

The {r_n(μ)} are complete in L^2((0, r), μ^α(r − μ)^β), but the corresponding {l_n(x)} given by (4.3) are not linearly independent, since x has only r + 1 distinct values. Hence, to avoid problems with identifiability of g(μ) = h(μ)g₀(μ), we must restrict h(μ) to the span of {r_0, r_1, ..., r_r}.
B.5. Negative binomial. In the case of the negative binomial the mean is given by μ = r/(e^{−θ} − 1), ψ(θ) = −r log(1 − e^θ) and V(μ) = μ + μ^2/r. The conjugate prior distribution may be expressed as a constant times

g₀(μ) = μ^β(r + μ)^α r^{−(α+β+1)}.
(B.5.3)  r_n(r(x − 1)/2) = (−2)^n ((x − 1)/2)^{−β} ((x + 1)/2)^{−α} (d^n/dx^n)[((x − 1)/2)^{n+β} ((x + 1)/2)^{n+α}]
         = (−1)^n n! P_n^{(β,α)}(x) = n! P_n^{(α,β)}(−x).

However, since the interval in x is (1, ∞), many of the standard calculations do not hold. The moments ν_n are given by
B.6. Generalized hyperbolic secant. Here the Rodrigues formula involves

(d^n/dμ^n)[(r − iμ)^{(imμ₀/2)−(mr/2)−1+n} (r + iμ)^{−(imμ₀/2)−(mr/2)−1+n}].
(B.6.4)  (1 − x^2) d^2y/dx^2 + [β − α − (α + β + 2)x] dy/dx + n(n + α + β + 1)y = 0,

by the change of variable μ = ixr. But this is the equation of the Jacobi polynomials on (−1, 1), with the solution

y = P_n^{(α,β)}(x) = ((−1)^n/(2^n n!)) (1 − x)^{−α}(1 + x)^{−β} (d^n/dx^n)[(1 − x)^{n+α}(1 + x)^{n+β}].
(B.6.9)  (2/π) ∫_0^{π/2} (cos θ)^{ν−1} cos(yθ) dθ = 2^{1−ν} Γ(ν)/[Γ((ν + y + 1)/2) Γ((ν − y + 1)/2)].
REFERENCES
BERGER, J. O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer,
New York.
BERRY, D. A. and CHRISTENSEN, R. (1979). Empirical Bayes estimation of binomial parameter via
mixtures of Dirichlet processes. Ann. Statist. 7 558-568.
CHIHARA, T. S. (1978). An Introduction to Orthogonal Polynomials. Gordon and Breach,
New York.
DAWID, A. P. (1973). Posterior expectations for large observations. Biometrika 60 664-667.
DEELY, J. J. and KRUSE, R. L. (1968). Construction of sequences estimating the mixing distribu-
tion. Ann. Math. Statist. 39 268-288.
DEELY, J. J. and LINDLEY, D. V. (1981). Bayes Empirical Bayes. J. Amer. Statist. Assoc. 76
833-841.
ERDÉLYI, A. (ed.) (1954). Tables of Integral Transforms 1. McGraw-Hill, New York.
HUDSON, H. M. (1978). A natural identity for exponential families with applications in multipa-
rameter estimation. Ann. Statist. 6 473-484.