A Tutorial on MM Algorithms
David R. Hunter and Kenneth Lange

Most problems in frequentist statistics involve optimization of a function such as a likelihood or a sum of squares. EM algorithms are among the most effective algorithms for maximum likelihood estimation because they consistently drive the likelihood uphill by maximizing a simple surrogate function for the log-likelihood. Iterative optimization of a surrogate function, as exemplified by an EM algorithm, does not necessarily require missing data. Indeed, every EM algorithm is a special case of the more general class of MM optimization algorithms, which typically exploit convexity rather than missing data in majorizing or minorizing an objective function. In our opinion, MM algorithms deserve to be part of the standard toolkit of professional statisticians. This article explains the principle behind MM algorithms, suggests some methods for constructing them, and discusses some of their attractive features. We include numerous examples throughout the article to illustrate the concepts described. In addition to surveying previous work on MM algorithms, the article introduces some new material on constrained optimization and standard error estimation.

KEY WORDS: Constrained optimization; EM algorithm; Majorization; Minorization; Newton-Raphson.
1. INTRODUCTION

Maximum likelihood and least squares are the dominant forms of estimation in frequentist statistics. Toy optimization problems designed for classroom presentation can be solved analytically, but most practical maximum likelihood and least squares problems must be solved numerically. This article discusses an optimization method that typically relies on convexity arguments and is a generalization of the well-known EM algorithm (Dempster, Laird, and Rubin 1977; McLachlan and Krishnan 1997). We call any algorithm based on this iterative method an MM algorithm.

The MM principle reappears, among other places, in correspondence analysis (Heiser 1987), in the quadratic lower bound principle of Böhning and Lindsay (1988), in the psychometrics literature on least squares (Bijleveld and de Leeuw 1991; Kiers and Ten Berge 1992), and in medical imaging (De Pierro 1995; Lange and Fessler 1995). The recent survey articles of de Leeuw (1994), Heiser (1995), Becker, Yang, and Lange (1997), and Lange, Hunter, and Yang (2000) deal with the general principle, but it is not until the rejoinder of Hunter and Lange (2000a) that the acronym MM first appears. This acronym pays homage to the earlier names "majorization" and "iterative majorization" of the principle, while avoiding confusion with the distinct mathematical subject of majorization (Marshall and Olkin 1979).
One of the virtues of the MM acronym is that it does double duty. In minimization problems, the first M stands for majorize and the second M for minimize; in maximization problems, the first M stands for minorize and the second M for maximize. (We define the terms "majorize" and "minorize" in Section 2.)

In our view, MM algorithms are easier to understand, and sometimes easier to derive, than EM algorithms. An EM algorithm works by relating the observed data to a hypothetical complete data set. The surrogate function created by the E-step is, up to a constant, a minorizing function of the observed-data log-likelihood; in the M-step, this minorizing function is maximized with respect to the parameters. Thus, every EM algorithm is an example of an MM algorithm.
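To see why the E-step surrogate minorizes the observed-data log-likelihood up to a constant, it may help to restate the argument in standard missing-data notation (the notation in this aside is ours, introduced only for illustration). Write l(θ) = ln f(y | θ) for the observed-data log-likelihood, let x denote the complete data with conditional density k(x | y, θ) given the observed data y, and let Q(θ | θ^(m)) = E[ln f(x | θ) | y, θ^(m)] denote the E-step surrogate. Because f(x | θ) = f(y | θ) k(x | y, θ),

    l(θ) = Q(θ | θ^(m)) - E[ln k(x | y, θ) | y, θ^(m)].

By the information inequality, a consequence of Jensen's inequality (see Section 3.1), the conditional expectation E[ln k(x | y, θ) | y, θ^(m)] is maximized over θ at θ = θ^(m). Consequently

    l(θ) ≥ Q(θ | θ^(m)) - E[ln k(x | y, θ^(m)) | y, θ^(m)],

with equality at θ = θ^(m). The right-hand side is Q(θ | θ^(m)) plus a constant not depending on θ, which is exactly the minorization property.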
However, typical applications of MM revolve around careful inspection of the objective function itself, with particular attention paid to convexity and useful inequalities, rather than around the identification of a complete data space. We conclude this section with a note on nomenclature. Just as "EM algorithm" names a class of algorithms rather than a single algorithm, "MM algorithm" refers to any algorithm built on the majorization or minorization idea described below; for brevity we nonetheless speak of "an MM algorithm."

2. THE MM PHILOSOPHY

Let θ^(m) represent a fixed value of the parameter θ, and let g(θ | θ^(m)) denote a real-valued function of θ whose form depends on θ^(m). The function g(θ | θ^(m)) is said to majorize a real-valued function f(θ) at the point θ^(m) provided

    g(θ | θ^(m)) ≥ f(θ) for all θ,
    g(θ^(m) | θ^(m)) = f(θ^(m)).                                      (1)

In other words, the surface θ -> g(θ | θ^(m)) lies above the surface f(θ) and touches it at the point θ = θ^(m). A function g(θ | θ^(m)) is said to minorize f(θ) at θ^(m) if -g(θ | θ^(m)) majorizes -f(θ) there.

To minimize f(θ), an MM algorithm minimizes the majorizing surrogate g(θ | θ^(m)) instead of f(θ) itself. If θ^(m+1) denotes the minimizer of g(θ | θ^(m)), then the MM procedure drives f(θ) downhill, since the chain

    f(θ^(m+1)) ≤ g(θ^(m+1) | θ^(m)) ≤ g(θ^(m) | θ^(m)) = f(θ^(m))     (2)

follows directly from the fact that g(θ^(m+1) | θ^(m)) ≤ g(θ^(m) | θ^(m)) and from definition (1). The descent property (2) lends an MM algorithm remarkable numerical stability. With straightforward changes, the MM recipe also applies to maximization rather than minimization: to maximize a function f(θ), we minorize it by a surrogate function g(θ | θ^(m)) and maximize g(θ | θ^(m)) to produce the next iterate θ^(m+1).

2.1 Calculation of Sample Quantiles

As a one-dimensional example, consider the problem of computing a sample quantile from a sample x_1, ..., x_n of n real numbers. For q in (0, 1), a qth sample quantile of x_1, ..., x_n minimizes the function

    f(θ) = Σ_{i=1}^n ρ_q(x_i − θ),                                     (3)

where ρ_q(θ) is the "vee" function equal to qθ for θ ≥ 0 and to −(1 − q)θ for θ < 0. When q = 1/2, minimizing (3) yields a sample median.

[Figure 1 about here. For q = 0.8, panel (a) depicts the vee function ρ_q(θ) and its quadratic majorizing function; panel (b) shows the objective function f(θ) that is minimized by the 0.8 quantile of the sample 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, along with its quadratic majorizer, for θ^(m) = 2.5.]
As shown in Figure 1(a), ρ_q(θ) may be majorized at any nonzero point θ^(m) by the quadratic function

    ζ_q(θ | θ^(m)) = (1/4) { θ²/|θ^(m)| + (4q − 2)θ + c },             (4)

where c is a constant chosen so that ζ_q(θ^(m) | θ^(m)) = ρ_q(θ^(m)). Because majorization is closed under the formation of sums (see Section 3), majorizing each term of (3) yields the quadratic majorizer

    g(θ | θ^(m)) = Σ_{i=1}^n ζ_q(x_i − θ | x_i − θ^(m)).

The function f(θ) and its majorizer g(θ | θ^(m)) are shown in Figure 1(b) for a particular sample of size n = 12. Setting the first derivative of g(θ | θ^(m)) equal to zero gives the minimum point

    θ^(m+1) = [ n(2q − 1) + Σ_{i=1}^n w_i^(m) x_i ] / Σ_{i=1}^n w_i^(m),   (5)

where the weight w_i^(m) = |x_i − θ^(m)|^(−1) depends on θ^(m). A flaw of algorithm (5) is that the weight w_i^(m) is undefined whenever θ^(m) = x_i. In mending this flaw, Hunter and Lange (2000b) also discussed the broader technique of quantile regression introduced by Koenker and Bassett (1978). From a computational perspective, each update (5) requires only elementary arithmetic on the current iterate. For the case of the sample median, algorithm (5) is found in Schlossmacher (1973) and is shown to be an MM algorithm by Lange and Sinsheimer (1993) and Heiser (1995).

Because g(θ | θ^(m)) in Equation (4) is a quadratic function of θ, expression (5) coincides with the more general Newton-Raphson update

    θ^(m+1) = θ^(m) − [∇²g(θ^(m) | θ^(m))]^(−1) ∇g(θ^(m) | θ^(m)),        (6)

where ∇g(θ^(m) | θ^(m)) and ∇²g(θ^(m) | θ^(m)) denote the gradient vector and the Hessian matrix of g(θ | θ^(m)) evaluated at θ^(m). The descent property (2) continues to hold provided this update decreases the value of g(θ | θ^(m)). In the context of EM algorithms, Dempster et al. (1977) called an algorithm that reduces g(θ | θ^(m)) without actually minimizing it a generalized EM (GEM) algorithm. The specific case of Equation (6), which we call a gradient MM algorithm, was studied in the EM context by Lange (1995a), who pointed out that update (6) saves us from performing iterations within iterations and yet still displays the same local rate of convergence as a full MM algorithm that minimizes g(θ | θ^(m)) at each iteration.
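As a numerical illustration of iteration (5), the following short sketch is our own code, not taken from the original article; the small constant eps guards against the undefined weight that occurs when θ^(m) coincides with a data point, which is the flaw noted above.

    import numpy as np

    def mm_quantile(x, q, theta0=None, max_iter=200, tol=1e-10, eps=1e-12):
        """Approximate the q-th sample quantile of x by the MM iteration (5).

        Each step minimizes the quadratic majorizer of
        f(theta) = sum_i rho_q(x_i - theta).
        """
        x = np.asarray(x, dtype=float)
        n = len(x)
        theta = np.mean(x) if theta0 is None else float(theta0)
        for _ in range(max_iter):
            # weights w_i = 1 / |x_i - theta|, guarded against division by zero
            w = 1.0 / np.maximum(np.abs(x - theta), eps)
            theta_new = (n * (2 * q - 1) + np.sum(w * x)) / np.sum(w)
            if abs(theta_new - theta) < tol:
                return theta_new
            theta = theta_new
        return theta

    # Example: the 0.8 quantile of the sample used in Figure 1(b)
    sample = [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5]
    print(mm_quantile(sample, q=0.8))   # converges to a value very close to 3
    print(np.quantile(sample, 0.8))     # reference value for comparison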
3. TRICKS OF THE TRADE

In the quantile example of Section 2.1, the convex vee function admits the quadratic majorizer (4) depicted in Figure 1(a). In practice, majorizing or minorizing relationships can be derived from many different inequalities, most of them stemming from convexity or concavity. This section outlines some common inequalities used to construct majorizing or minorizing functions for various types of objective functions. It helps that the majorization relation between functions is closed under the formation of sums, nonnegative products, limits, and composition with an increasing function; these rules permit us to work piecemeal, term by term, when simplifying a complicated objective function.

3.1 Jensen's Inequality

Jensen's inequality states that κ(E[X]) ≤ E[κ(X)] for any convex function κ(t) and any random variable X. Since −ln(t) is a convex function, we conclude for probability densities a(x) and b(x) that

    −ln E[ a(X)/b(X) ] ≤ E[ −ln( a(X)/b(X) ) ].

If the expectation is taken with X distributed according to b(x), then E[a(X)/b(X)] = 1 and the left-hand side above vanishes, so we obtain

    E[ ln a(X) ] ≤ E[ ln b(X) ].

This information inequality underlies the minorization exploited by every EM algorithm: the E-step surrogate minorizes the observed-data log-likelihood up to an additive constant.

3.2 Minorization via Supporting Hyperplanes

Any linear function tangent to the graph of a convex function is a minorizer at the point of tangency. In other words, if κ(θ) is convex and differentiable, then

    κ(θ) ≥ κ(θ^(m)) + ∇κ(θ^(m))^t (θ − θ^(m)),                         (7)

with equality when θ = θ^(m).

3.3 Majorization via the Definition of Convexity

If we wish to majorize a convex function instead of minorizing it, then we can use the standard definition of convexity; namely, κ(t) is convex if and only if

    κ( Σ_i α_i t_i ) ≤ Σ_i α_i κ(t_i)                                   (8)

for any finite collection of points t_i and corresponding nonnegative multipliers α_i summing to one. Application of definition (8) is particularly effective when κ(t) is composed with a linear function x^t θ. For instance, suppose for vectors x, θ, and θ^(m) that we make the substitution t_i = x_i(θ_i − θ_i^(m))/α_i + x^t θ^(m). Then inequality (8) becomes

    κ(x^t θ) ≤ Σ_i α_i κ( x_i(θ_i − θ_i^(m))/α_i + x^t θ^(m) ).         (9)

Alternatively, if all components of x, θ, and θ^(m) are positive, then we may take t_i = x^t θ^(m) θ_i / θ_i^(m) and α_i = x_i θ_i^(m) / x^t θ^(m). Now inequality (8) becomes

    κ(x^t θ) ≤ Σ_i [ x_i θ_i^(m) / x^t θ^(m) ] κ( x^t θ^(m) θ_i / θ_i^(m) ).   (10)
3.4 Majorization via a Quadratic Upper Bound

If a convex function κ(θ) is twice differentiable and has bounded curvature, then we can majorize it by a quadratic function with sufficiently high curvature that is tangent to κ(θ) at θ^(m) (Böhning and Lindsay 1988). In algebraic terms, if we can find a positive definite matrix M such that M − ∇²κ(θ) is nonnegative definite for all θ, then

    κ(θ) ≤ κ(θ^(m)) + ∇κ(θ^(m))^t (θ − θ^(m)) + (1/2)(θ − θ^(m))^t M (θ − θ^(m))   (11)

is a quadratic majorizer of κ(θ) at θ^(m). In the one-dimensional case, for example, the bound

    1/θ ≤ 1/θ^(m) − (θ − θ^(m))/(θ^(m))² + (θ − θ^(m))²/c³

holds whenever θ and θ^(m) are bounded below by a constant c > 0 (Heiser 1995). We exploit the quadratic bound principle in the logistic regression example of Section 6.

3.5 The Arithmetic-Geometric Mean Inequality

The arithmetic-geometric mean inequality states that the geometric mean of nonnegative numbers is bounded above by their arithmetic mean. Applied to the two numbers θ_1² θ_2^(m)/θ_1^(m) and θ_2² θ_1^(m)/θ_2^(m), whose geometric mean is θ_1 θ_2, it yields the majorization

    θ_1 θ_2 ≤ (1/2) [ θ_1² θ_2^(m)/θ_1^(m) + θ_2² θ_1^(m)/θ_2^(m) ],    (12)

valid for positive arguments, with equality when θ = θ^(m). Inequality (12) separates the parameters θ_1 and θ_2 on its right-hand side, a feature we exploit in the sports model of Section 4.1.

3.6 The Cauchy-Schwartz Inequality

The Cauchy-Schwartz inequality for the Euclidean norm is a special case of inequality (7). The function κ(θ) = ||θ|| is convex because it satisfies the triangle inequality and the homogeneity condition ||αθ|| = |α| ||θ||. Since κ(θ) = sqrt( Σ_i θ_i² ), we see that ∇κ(θ) = θ/||θ|| for θ ≠ 0, and therefore inequality (7) gives

    ||θ|| ≥ ||θ^(m)|| + (θ − θ^(m))^t θ^(m)/||θ^(m)|| = θ^t θ^(m)/||θ^(m)||,   (13)

which is the Cauchy-Schwartz inequality. De Leeuw and Heiser (1977) and Groenen (1993) used inequality (13) to derive MM algorithms for multidimensional scaling.

4. SEPARATION OF PARAMETERS AND CYCLIC MM

One of the most useful features a surrogate function can possess is separation of parameters: the surrogate reduces to a sum of terms, each involving only a single component of θ, so that it can be optimized one component at a time. MM algorithms in high-dimensional parameter spaces often rely on surrogate functions of this form, because such surrogates are far easier to optimize at each iteration than the original objective function. When the parameters separate into blocks, one may also majorize or minorize with respect to one block at a time, substituting newly updated values of the other blocks as soon as they become available; we call this a cyclic MM algorithm.

4.1 Poisson Sports Model

Consider a simplified version of a model proposed by Maher (1982) for a sports contest between two teams, in which the number of points scored by team i against team j follows a Poisson process with intensity e^{o_i − d_j}, where o_i is an "offensive strength" parameter for team i and d_j is a "defensive strength" parameter for team j. If t_{ij} is the length of time that i plays j and p_{ij} is the number of points that i scores against j, then the corresponding Poisson log-likelihood function is

    l_{ij}(θ) = p_{ij}(o_i − d_j) − t_{ij} e^{o_i − d_j} + p_{ij} ln t_{ij} − ln p_{ij}!,   (14)

where θ = (o, d) is the parameter vector. Note that the parameters should satisfy a linear constraint, such as Σ_i o_i + Σ_j d_j = 0, in order for the model to be identifiable; otherwise, it is clearly possible to add the same constant to each o_i and d_j without altering the likelihood. We make two simplifying assumptions. First, different games are independent of each other. Second, each team's point total within a single game is independent of its opponent's point total. The second assumption is more suspect than the first, since it implies that a team's offensive and defensive performances are somehow unrelated to one another; nonetheless the model gives an interesting first approximation to reality. Under these assumptions, the full data log-likelihood is obtained by summing l_{ij}(θ) over all pairs (i, j). Setting the partial derivatives of the log-likelihood equal to zero leads to the equations

    Σ_j p_{ij} = Σ_j t_{ij} e^{o_i − d_j}   and   Σ_i p_{ij} = Σ_i t_{ij} e^{o_i − d_j}

satisfied by the maximum likelihood estimate. These equations do not admit a closed-form solution, so we turn to an MM algorithm. Because the task here is maximization, we need a minorizing function. Focusing on the −t_{ij} e^{o_i − d_j} term of (14) and applying inequality (12) with θ_1 = e^{o_i} and θ_2 = e^{−d_j}, we obtain

    −t_{ij} e^{o_i − d_j} ≥ −(t_{ij}/2) [ e^{2o_i − o_i^(m) − d_j^(m)} + e^{−2d_j + o_i^(m) + d_j^(m)} ].   (15)
Although the right side of inequality (15) may appear more complicated than the left side, it is actually simpler in one important respect: the parameter components o_i and d_j are separated on the right side but not on the left. Summing the log-likelihood (14) over all pairs (i, j) and invoking inequality (15) yields the function

    g(θ | θ^(m)) = Σ_i Σ_j [ p_{ij}(o_i − d_j) − (t_{ij}/2)( e^{2o_i − o_i^(m) − d_j^(m)} + e^{−2d_j + o_i^(m) + d_j^(m)} ) ],

which, up to an additive constant, minorizes the full log-likelihood at θ^(m). The separation of parameters substantially reduces computational costs. Setting the partial derivatives of g(θ | θ^(m)) equal to zero gives the updates

    o_i^(m+1) = (1/2) ln [ Σ_j p_{ij} / Σ_j t_{ij} e^{−o_i^(m) − d_j^(m)} ],
    d_j^(m+1) = −(1/2) ln [ Σ_i p_{ij} / Σ_i t_{ij} e^{o_i^(m) + d_j^(m)} ].   (16)

Each update involves only sums over the data, and all components can be updated in parallel. The updates can also be carried out in cyclic fashion, inserting newly computed values of one block of parameters as soon as they become available when updating the other block.

4.2 Application to National Basketball Association Results

We fit the model to the results of the 2002-2003 regular season of the National Basketball Association. Here t_{ij} is measured in minutes; a regular game lasts 48 minutes, and each overtime period, if necessary, adds extra minutes of play. Table 1 ranks all 29 teams on the basis of their estimated offensive plus defensive strengths, and it also lists each team's win total for comparison.

Table 1. Ranking of all 29 NBA Teams on the Basis of the 2002-2003 Regular Season According to Their Estimated Offensive Plus Defensive Strength. Each team played 82 games.

    Team           o_i + d_i   Wins      Team           o_i + d_i   Wins
    Cleveland       -0.0994     17       Phoenix          0.0166     44
    Denver          -0.0845     17       New Orleans      0.0169     47
    Toronto         -0.0647     24       Philadelphia     0.0187     48
    Miami           -0.0581     25       Houston          0.0205     43
    Chicago         -0.0544     30       Minnesota        0.0259     51
    Atlanta         -0.0402     35       LA Lakers        0.0277     50
    LA Clippers     -0.0355     27       Indiana          0.0296     48
    Memphis         -0.0255     28       Utah             0.0299     47
    New York        -0.0164     37       Portland         0.0320     50
    Washington      -0.0153     37       Detroit          0.0336     50
    Boston          -0.0077     44       New Jersey       0.0481     49
    Golden State    -0.0051     38       San Antonio      0.0611     60
    Orlando         -0.0039     42       Sacramento       0.0686     59
    Milwaukee       -0.0027     42       Dallas           0.0804     60
    Seattle          0.0039     40
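The following sketch is our own illustration of updates (16); it uses randomly generated data in place of the NBA results, and all variable names are ours.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical data standing in for the NBA results: n teams, each ordered
    # pair (i, j) plays once for t[i, j] = 48 minutes.
    n = 8
    o_true = rng.normal(scale=0.1, size=n)
    d_true = rng.normal(scale=0.1, size=n)
    t = np.full((n, n), 48.0)
    np.fill_diagonal(t, 0.0)                                  # no self-games
    p = rng.poisson(t * np.exp(o_true[:, None] - d_true[None, :]))

    def loglik(o, d):
        """Poisson log-likelihood (14), summed over pairs, dropping constants."""
        lam = t * np.exp(o[:, None] - d[None, :])
        mask = t > 0
        return np.sum(p[mask] * np.log(lam[mask]) - lam[mask])

    o, d = np.zeros(n), np.zeros(n)
    history = [loglik(o, d)]
    for _ in range(200):
        # MM updates (16): the minorizer separates parameters, so every
        # component has a closed-form update.
        o_new = 0.5 * np.log(p.sum(axis=1) /
                             (t * np.exp(-o[:, None] - d[None, :])).sum(axis=1))
        d_new = -0.5 * np.log(p.sum(axis=0) /
                              (t * np.exp(o[:, None] + d[None, :])).sum(axis=0))
        o, d = o_new, d_new
        # Re-center to satisfy the identifiability constraint sum(o) + sum(d) = 0;
        # the likelihood is invariant under this shift.
        shift = (o.sum() + d.sum()) / (2 * n)
        o, d = o - shift, d - shift
        history.append(loglik(o, d))

    # Ascent property of the MM iterates: the log-likelihood never decreases.
    print(np.all(np.diff(history) >= -1e-8))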
5. SPEED OF CONVERGENCE

MM algorithms and Newton-Raphson algorithms have complementary strengths. On one hand, Newton-Raphson algorithms boast a quadratic rate of convergence as they near a local optimum point θ*. In other words, under certain general conditions,

    lim_{m→∞} ||θ^(m+1) − θ*|| / ||θ^(m) − θ*||² = c                     (17)

for some constant c. This quadratic rate of convergence is much faster than the linear rate of convergence

    lim_{m→∞} ||θ^(m+1) − θ*|| / ||θ^(m) − θ*|| = c < 1                  (18)

displayed by typical MM algorithms. On the other hand, an iteration of a Newton-Raphson algorithm can be far more onerous computationally than an iteration of an MM algorithm. Examination of the Newton-Raphson update

    θ^(m+1) = θ^(m) − ∇²f(θ^(m))^(−1) ∇f(θ^(m))

reveals that it requires evaluation and inversion of the Hessian matrix ∇²f(θ^(m)). If θ has p components, then the number of calculations needed to invert the p × p Hessian is roughly proportional to p³. By contrast, well-designed MM algorithms often require only on the order of p or p² arithmetic operations per iteration. For example, the Poisson process scoring model for the NBA dataset of Section 4 has 57 free parameters (two for each of 29 teams minus one for the linear constraint), so a single Newton-Raphson iteration requires inverting a 57 × 57 matrix; according to MATLAB timings, this single inversion is considerably more expensive than an evaluation of the simpler MM updates (16). Thus, even though the MM algorithm needs more iterations to converge, its cheap iterations can win the balance in overall computation in this example. Numerical stability also enters the balance sheet: MM algorithms are guaranteed to appropriately increase or decrease the value of the objective function at every iteration, whereas Newton-Raphson iterations carry no such guarantee and can behave erratically when started far from the optimum point.

Newton-Raphson nonetheless has its own merits, and many optimization methods occupy a middle ground. In Fisher scoring, the expected information matrix, which is sometimes easier to evaluate than the observed information matrix, is used in place of the Hessian. Quasi-Newton methods mitigate or even eliminate the need for evaluating the Hessian, and gradient-free methods such as Nelder-Mead require no derivatives at all. In our opinion, none of these optimization methods is best overall; the most effective algorithm depends on the problem at hand.

6. STANDARD ERROR ESTIMATES

In most cases, a maximum likelihood estimator has an asymptotic covariance matrix equal to the inverse of the expected information matrix. In practice, the expected information matrix is often well-approximated by the observed information matrix, which equals minus the Hessian −∇²l(θ) of the log-likelihood evaluated at the maximum likelihood estimate. Because differentiating the log-likelihood twice can be tedious, we consider numerical approximations to −∇²l(θ) that use only quantities readily available from running an MM algorithm.

6.1 Numerical Differentiation via MM

Let g(θ | ϑ) denote a minorizing function of the log-likelihood l(θ) at the point ϑ, and define M(ϑ) to be the MM algorithm map taking θ^(m) to θ^(m+1). Two simple facts, exploited by Lange (1999), make numerical differentiation convenient. First, the tangency of l(θ) and its minorizer implies that their gradient vectors are equal at the point of minorization, so the gradient ∇l(ϑ) = ∇g(ϑ | ϑ) is easy to compute. Second, the gradient of g(θ | θ^(m)) vanishes at the maximizer M(θ^(m)) of the surrogate. The first fact suggests approximating the observed information by forward differences of the gradient,

    [−∇²l(θ)]_{ij} ≈ −[ ∂l(θ + δ_j e_j)/∂θ_i − ∂l(θ)/∂θ_i ] / δ_j,       (19)

where e_j is the jth standard basis vector and δ_j is a small increment. Combining the two facts yields the alternative identity

    −∇²l(θ) = −∇²g(θ | θ) [ I − ∇M(θ) ]                                 (20)

at the maximum likelihood estimate, where I denotes the identity matrix and ∇M(θ) is the Jacobian of the MM map. In practice ∇M(θ) is itself obtained numerically, its i, j entry being approximated by

    [ M_i(θ + δ_j e_j) − M_i(θ) ] / δ_j.                                 (21)

6.2 An MM Algorithm for Logistic Regression

To check the accuracy of the various numerical methods, we consider an example in which the Hessian of the log-likelihood is easy to compute. Böhning and Lindsay (1988) apply the quadratic bound principle of Section 3.4 to the case of logistic regression, in which we have an n × 1 vector y of binary responses and an n × p matrix X of predictors. The model stipulates that the probability π_i(θ) that y_i = 1 equals exp{θ^t x_i} / (1 + exp{θ^t x_i}). Straightforward differentiation of the resulting log-likelihood function shows that

    ∇²l(θ) = −Σ_{i=1}^n π_i(θ)[1 − π_i(θ)] x_i x_i^t.

Since π_i(θ)[1 − π_i(θ)] is bounded above by 1/4, we may define the negative definite matrix B = −(1/4) X^t X and conclude that ∇²l(θ) − B is nonnegative definite as desired. Therefore, the quadratic function

    g(θ | θ^(m)) = l(θ^(m)) + ∇l(θ^(m))^t (θ − θ^(m)) + (1/2)(θ − θ^(m))^t B (θ − θ^(m))

minorizes l(θ) at θ^(m).
Table 2. Estimated Coefficients and Standard Errors for the Logistic Regression Example

                               Standard errors based on:
    Variable    Coefficient    Exact ∇²l(θ)    Equation (19)    Equation (20)
    Constant      0.48062        1.1969           1.1984           1.1984
    AGE          -0.029549       0.037031         0.037081         0.037081
    LWT          -0.015424       0.0069194        0.0069336        0.0069336
    RACE2         1.2723         0.52736          0.52753          0.52753
    RACE3         0.8805         0.44079          0.44076          0.44076
    SMOKE         0.93885        0.40215          0.40219          0.40219
    PTL           0.54334        0.34541          0.34545          0.34545
    HT            1.8633         0.69754          0.69811          0.69811
    UI            0.76765        0.45932          0.45933          0.45933
    FTV           0.065302       0.17240          0.17251          0.17251
Setting the gradient of g(θ | θ^(m)) equal to zero gives the MM update

    θ^(m+1) = θ^(m) − B^(−1) ∇l(θ^(m)) = θ^(m) + 4 (X^t X)^(−1) X^t [ y − π(θ^(m)) ],   (22)

where π(θ^(m)) denotes the vector with ith entry π_i(θ^(m)). Since the MM algorithm of Equation (22) needs to invert the matrix X^t X only once, it enjoys an increasing computational advantage over Newton-Raphson as the number of predictors increases, a point of real importance for practical purposes.

6.3 Application to the Low Birth Weight Data

We now test the standard error approximations based on Equations (19) and (20) on the low birth weight dataset of Hosmer and Lemeshow (1989). This dataset involves 189 observations and eight maternal predictors. The response is 0 or 1 according to whether an infant is born underweight, defined as less than 2.5 kilograms. The predictors include mother's age in years (AGE), weight at last menstrual period (LWT), race (RACE2 and RACE3), smoking status during pregnancy (SMOKE), number of previous premature labors (PTL), presence of hypertension history (HT), presence of uterine irritability (UI), and number of physician visits during the first trimester (FTV). Each of these predictors is quantitative except for race, which is a three-level factor with level 1 for whites, level 2 for blacks, and level 3 for other races. Table 2 shows the maximum likelihood estimates and asymptotic standard errors for the 10 parameters. The differentiation increment δ_j was θ_j/1000 for each parameter θ_j. The standard error approximations in the two rightmost columns turn out to be the same in this example, but in other models they need not agree so closely.
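The procedure just described can be mimicked in a few lines of code. The sketch below is ours, it uses synthetic data rather than the low birth weight data, and all names are hypothetical; it runs the MM iteration (22) and then approximates standard errors by numerically differentiating the MM map with the increment θ_j/1000 used in the text.

    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic stand-in for a logistic regression dataset:
    # n observations, p predictors plus an intercept column.
    n, p = 189, 4
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
    beta_true = rng.normal(scale=0.5, size=p + 1)
    y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

    def pi(theta):
        return 1 / (1 + np.exp(-X @ theta))

    def mm_map(theta, XtX_inv):
        """One MM step (22): theta + 4 (X'X)^{-1} X'(y - pi(theta))."""
        return theta + 4 * XtX_inv @ X.T @ (y - pi(theta))

    XtX_inv = np.linalg.inv(X.T @ X)
    theta = np.zeros(p + 1)
    for _ in range(5000):
        new = mm_map(theta, XtX_inv)
        if np.max(np.abs(new - theta)) < 1e-12:
            theta = new
            break
        theta = new

    # Exact observed information for comparison.
    w = pi(theta) * (1 - pi(theta))
    obs_info_exact = X.T @ (w[:, None] * X)

    # Forward-difference Jacobian of the MM map at theta-hat.
    k = len(theta)
    dM = np.empty((k, k))
    base = mm_map(theta, XtX_inv)
    for j in range(k):
        delta = theta[j] / 1000 if theta[j] != 0 else 1e-6
        pert = theta.copy()
        pert[j] += delta
        dM[:, j] = (mm_map(pert, XtX_inv) - base) / delta

    # Observed information via the MM-map identity of Section 6.1:
    # (-Hessian of the minorizer) times (I - dM).
    obs_info_mm = (0.25 * X.T @ X) @ (np.eye(k) - dM)

    print(np.round(np.sqrt(np.diag(np.linalg.inv(obs_info_exact))), 5))
    print(np.round(np.sqrt(np.diag(np.linalg.inv(obs_info_mm))), 5))   # agree closely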
7. HANDLING CONSTRAINTS

Many optimization problems impose constraints on the parameters. For example, parameters are often required to be nonnegative. Here we discuss a majorization technique that in a sense eliminates inequality constraints (Censor and Zenios 1992; Lange 1994). Suppose we wish to minimize f(θ) subject to the constraints v_j(θ) > 0 for 1 ≤ j ≤ q, where each v_j(θ) is a concave, differentiable function. For a tuning parameter ω > 0, adding the barrier terms to f(θ) produces the function

    g(θ | θ^(m)) = f(θ) + ω Σ_{j=1}^q { v_j(θ^(m)) [ ln v_j(θ^(m)) − ln v_j(θ) ] + ∇v_j(θ^(m))^t (θ − θ^(m)) }.   (23)

Concavity of v_j(θ) and convexity of −ln t imply that each added summand is nonnegative and vanishes when θ = θ^(m), so (23) succeeds in majorizing f(θ) at θ^(m). The presence of the term ln v_j(θ) in the majorizer (23) prevents v_j(θ^(m+1)) ≤ 0 from occurring, yet the multiplier v_j(θ^(m)) of −ln v_j(θ) gradually adapts and allows v_j(θ^(m+1)) to tend to 0 if it is inclined to do so. When there are equality constraints Aθ = b in addition to the inequality constraints v_j(θ) > 0, these should be enforced during the minimization of g(θ | θ^(m)).
7.1 Multinomial Sampling

As a simple example, consider maximum likelihood estimation for multinomial sampling with counts n_1, ..., n_q over q categories and n = Σ_i n_i. The parameter vector θ has nonnegative components summing to one, and the maximum likelihood estimate has components n_i/n. To apply the technique just described, we minimize f(θ) = −Σ_i n_i ln θ_i subject to the inequality constraints v_i(θ) = θ_i > 0 and the equality constraint Σ_i θ_i = 1, enforcing the latter through the Lagrangian

    h(θ) = g(θ | θ^(m)) + λ ( Σ_i θ_i − 1 ).

Setting ∂h(θ)/∂θ_i equal to zero and multiplying by θ_i gives

    −n_i − ω θ_i^(m) + (ω + λ) θ_i = 0.

Summing over the components i and using Σ_i θ_i = Σ_i θ_i^(m) = 1 shows that λ = n. Hence, the update is

    θ_i^(m+1) = ( n_i + ω θ_i^(m) ) / ( n + ω ),

whose components remain positive whenever the components of θ^(m) are positive. The rearrangement

    θ_i^(m+1) − n_i/n = [ ω / (n + ω) ] ( θ_i^(m) − n_i/n )

demonstrates that θ^(m) approaches the estimate at the linear rate ω/(n + ω), regardless of whether the estimate occurs on the boundary of the parameter space where one or more of its components equal zero.
its components
convergence
is too mathematically
algorithms
properties
to present
demanding
0?-(2000b),
putational
here.
Fortunately,
without
change
to MM
are
there
Furthermore,
algorithms.
of a Class of
Kiers, H. A. L., and Ten Berge, J. M. F. (1992), "Minimization
Matrix Trace Functions by Means
of Refined Majorization,"
Psychometrika,
57,371-382.
Koenker, R., and Bassett, G. (1978), "Regression Quantiles," Econometrica,
46,
33-50.
Lange,
K.
(1994),
"An Adaptive
Barrier Method
there
known,
already
some
new
results
here.
Our MM
are unable
to cite
them
all. Readers
should
explained
tantly,
MM
we
hope
as MM
simply
this article
will
Even
algorithms.
stimulate
readers
be
on
that can be
more
to discover
impor
new
algorithms.
[Received
September
Programming,"
to the EM Algo
Equivalent
Ser. B, 57, 425-437.
of the EM Algorithm,"
Statis
Numerical
New York:
Analysis for Statisticians,
Verlag.
Lange, K., and Fessier, J.A. (1995), "Globally Convergent Algorithms
imum A Posteriori Transmission
IEEE Transactions
Tomography,"
1430-1438.
Processing A,
treat -(1999),
ment
ature
for Convex
2003.]
REFERENCES
Becker, M. P., Yang, I., and Lange, K. (1997), "EM Algorithms Without Missing
inMedical Research,
Data," Statistical Methods
6, 38-54.
Bijleveld, C. C. J. H., and de Leeuw, J. (1991), "Fitting Longitudinal Reduced
Rank Regression Models
56,
by Alternating Least Squares," Psychometrika,
433-447.
of Quadratic Approx
B?hning, D., and Lindsay, B. G. (1988), "Monotonicity
imation Algorithms,"
Annals of the Institute of Statistical Mathematics,
40,
641-663.
S. A.
"Proximal Minimization
With D
Censor, Y., and Zenios,
(1992),
73, 451-464.
Functions," Journal of Optimization
Theory and Applications,
de Leeuw, J. (1994), "Block Relaxation Algorithms
in Statistics, in Information
eds. H. H. Bock, W. Lenski, and M. M. Richter,
Systems and Data Analysis,
Berlin: Springer-Verlag,
pp. 308-325.
Springer
forMax
on Image
Transfer using
Lange, K., Hunter, D. R., and Yang, I. (2000), "Optimization
Journal of Computational
(with discussion),
Surrogate Objective Functions"
and Graphical
Statistics, 9, 1-20.
J. S. (1993), "Normal/Independent
Distributions
Lange, K., and Sinsheimer,
and Their Applications
in Robust Regression,"
and
Journal of Computational
Statistics, 2, 175-198.
Graphical
(2nd ed.), Read
Luenberger, D. G. (1984), Linear and Nonlinear
Programming
ing, MA: Addison-Wesley.
Football Scores," Statistica Neer
Maher, M. J. (1982), "Modelling Association
landica, 36, 109-118.
and
Marshall, A. W., and Olkin, I. (1979), Inequalities: Theory ofMajorization
its Applications,
San Diego: Academic.
G. J., and Krishnan, T. (1997), The EM Algorithm and Extensions,
McLachlan,
New York: Wiley.
and Rubin, D. B. (1991), "Using EM
X.-L.,
Meng,
Variance-Covariance
Matrices:
The SEM Algorithm,"
can Statistical Association,
86, 899-909.
to Obtain Asymptotic
Journal of the Ameri
A
"Maximum Likelihood
via the ECM Algorithm:
Estimation
Framework, Biometrika,
800, 267-278.
via the EM
of the Information Matrix
Oakes, D. (1999), "Direct Calculation
Journal of the Royal Statistical Society, Ser. B, 61, Part 2, 479
Algorithm,"
482.
?-(1993),
General
Statistician,
February
37