0% found this document useful (0 votes)

172 views

A Tutorial On MM Algorithms

1. MM algorithms are a general class of iterative optimization algorithms that encompass EM algorithms as a special case. They work by substituting a simple optimization problem via majorization or minorization of the objective function. 2. A tutorial provides examples of constructing surrogate majorizing functions for problems like quantile regression and least squares estimation. MM algorithms typically exploit convexity to simplify the original difficult optimization problem into an easier subproblem. 3. Iteration allows solving the simplified subproblem instead of the original problem directly. Each iterate serves to drive the objective function downhill until convergence is reached, providing numerical stability to MM algorithms.

Uploaded by

trananh1

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

172 views

A Tutorial On MM Algorithms

Uploaded by

trananh1

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 9

A Tutorial on MM Algorithms

Author(s): David R. Hunter and Kenneth Lange

Source: The American Statistician, Vol. 58, No. 1 (Feb., 2004), pp. 30-37
Published by: Taylor & Francis, Ltd. on behalf of the American Statistical Association
Stable URL: http://www.jstor.org/stable/27643496 .
Accessed: 25/10/2014 14:08
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at .
http://www.jstor.org/page/info/about/policies/terms.jsp

.
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of
content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms
of scholarship. For more information about JSTOR, please contact [email protected].

Taylor & Francis, Ltd. and American Statistical Association are collaborating with JSTOR to digitize, preserve
and extend access to The American Statistician.

http://www.jstor.org

This content downloaded from 128.235.29.171 on Sat, 25 Oct 2014 14:08:39 PM

All use subject to JSTOR Terms and Conditions

_General_
A Tutorial on MM Algorithms
R. Hunter

David

Most

in frequentist

problems

a function
are

rithms

such

Lange

and Kenneth

the most

among

involve optimization

statistics
or a sum

a likelihood

effective

squares.

algo

for maximum

algorithms

continued.

likelihood estimation because they consistently drive the likeli

a simple surrogate function for the
hood uphill by maximizing
Iterative

log-likelihood.

a surrogate

optimization

function

exemplified by an EM algorithm does not necessarily require

missing data. Indeed, every EM algorithm is a special case of
the more general class of MM optimization algorithms, which
typically exploit convexity rather than missing data inmajoriz
or

ing

minorizing

some

discusses

merous

for

and

some
error

standard

KEY WORDS:

to illustrate

the concepts

optimization;

algo

constrained

algorithm;

likelihood and least squares are the dominant forms

in frequentist statistics. Toy optimization problems
classroom

designed

but most

practical maximum
must

problems

presentation

likelihood

solved

cusses an optimization method

and

can

solved

analytically,

and least squares esti

numerically.

This

article

dis

that typically relies on convexity

is a generalization

the well-known

algo

rithm method (Dempster, Laird, and Rubin 1977; McLachlan

and Krishnan 1997). We call any algorithm based on this itera
tive method anMM algorithm.
the general principle behind MM algo
To our knowledge,
rithms was first enunciated by the numerical analysts Ortega
and Rheinboldt
(1970) in the context of line search methods.
De Leeuw and Heiser (1977) presented an MM algorithm for
multidimensional
scaling contemporary with the classic Demp

of Statistics, Penn State

Professor, Department
Park, PA 16802-2111
(E-mail: [email protected]).
University,
University
of Biomathematics
Kenneth Lange is Professor, Departments
and Human Genet
David

R. Hunter

is Assistant

ics, David Geffen School of Medicine

Research
supported in part by USPHS

The American

Statistician,

at UCLA,

Los Angeles, CA 90095-1766.

and MH59490.
grants GM53275

February

2004,

other

among

places,

Berge 1992), and inmedical imaging (De Pierro 1995 ;Lange and
Fessier 1995). The recent survey articles of de Leeuw (1994),
Heiser (1995), Becker, Yang, and Lange (1997), and Lange,
Hunter, and Yang (2000) deal with the general principle, but it
is not until the rejoinder of Hunter and Lange (2000a) that the
acronym MM first appears. This acronym pays homage to the
names

and

"majorization"

"iterative

majorization"

the

cluding quantile regression (Hunter and Lange 2000b), survival

analysis (Hunter and Lange 2002), paired and multiple compar
isons (Hunter 2004), variable selection (Hunter and Li 2002),
and DNA sequence analysis (Sabatti and Lange 2002).
One of the virtues of the MM acronym is that it does dou
ble duty. In minimization
problems, the first M of MM stands

INTRODUCTION

for

reappears,

principle

MM principle, emphasizes its crucial link to the better-known

EM principle, and diminishes the possibility of confusion with
the distinct subject inmathematics known as majorization (Mar
shall and Olkin 1979). Recent work has demonstrated the utility
of MM algorithms in a broad range of statistical contexts, in

Newton-Raphson.

1.
Maximum
of estimation

new material

The

in robust regression (Huber 1981), in correspondence

analy
sis (Heiser 1987), in the quadratic lower bound principle of
literature
B?hning and Lindsay (1988), in the psychometrics
on least squares (Bijleveld and de Leeuw 1991; Kiers and Ten

earlier

include

estimation.

Constrained

Minorization;

Majorization;

them,

constructing

features.

the article

introduces

article

optimization

arguments

opinion,

In addition to surveying previous work on MM

this

mation

methods

attractive

their

throughout

examples

described.
rithms,

some

suggests

algorithms,

and

In our

function.

objective

to be part of the standard toolkit of profes

This article explains the principle behind

algorithms deserve
sional statisticians.

ster et al. (1977) article on EM algorithms. Although the work

of de Leeuw and Heiser did not spark the same explosion of
interest from the statistical community set off by the Dempster
et al. (1977) article, steady development of MM algorithms has

Vol. 58, No.

for majorize

problems,

and

the

for maximize.

(We

second

the first M
define

the

terms

and the second M

"majorize"

"minorize"

and

in Section 2.) A successful MM algorithm substitutes a simple

optimization problem for a difficult optimization problem. Sim
plicity can be attained by (a) avoiding large matrix inversions,
(b) linearizing an optimization problem, (c) separating the pa
rameters of an optimization problem, (d) dealing with equality
and inequality constraints gracefully, or (e) turning a nondiffer
entiable problem into a smooth problem. Iteration is the price
we pay for simplifying the original problem.
In our

view,

are

algorithms

to understand

easier

and

sometimes easier to apply than EM algorithms. Although we

have no intention of detracting from EM algorithms, their domi
nance over MM algorithms is a historical accident. An EM algo
rithm operates by identifying a theoretical complete data space.
In the E-step of the algorithm,
data log-likelihood

complete
observed

data.

The

to a constant,

2004 American

function.

minorizing

Statistical

the conditional expectation

is calculated with respect
function

surrogate

norizing function ismaximized

In maximization

for minimize.

stands for minorize

created

of the
to the

the E-step

In the M-step,

is,

this mi

with respect to the parameters of

Association

This content downloaded from 128.235.29.171 on Sat, 25 Oct 2014 14:08:39 PM

All use subject to JSTOR Terms and Conditions

DOT

10.1198/0003130042836

o
o
-1

-2

1. For q = 0.8,
Figure
minimized
that
is
f(6)
by

(a)

(b)

the "vee" function

(a) depicts
of the sample
the 0.8 quantile

the underlying model; thus, every EM algorithm is an example

of an MM algorithm. Construction of an EM algorithm some
times demands creativity in identifying the complete data and
technical skill in calculating an often complicated conditional
it analytically.
expectation and then maximizing
In contrast,

typical

of MM

applications

revolve

around

for constructing MM algorithms and to illustrate various aspects

of these algorithms through the study of specific examples.
conclude

this

section

with

a note

nomenclature.

downhill.

its quadratic

function

the objective
(b) shows
for9(m) = 2.5.

majorizer,

Indeed, the inequality

care

ful inspection of a log-likelihood or other objective function to

be optimized, with particular attention paid to convexity and in
equalities. Thus, success with MM algorithms and success with
EM algorithms hinge on somewhat different mathematical ma
neuvers. However, the skills required by most MM algorithms
are no harder tomaster than the skills required by most EM algo
rithms. The purpose of this article is to present some strategies

for 6(m) = -0.75;

function
pq(9) and its quadratic majorizing
1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, along with

Just

as EM is more a prescription for creating algorithms than an

actual algorithm, MM refers not to a single algorithm but to a
class of algorithms. Thus, this article refers to specific EM and
MM algorithms but never to "the MM algorithm" or "the EM

= /
f^(m))(2)
<
|
directly from the fact #(0(m+1)
| 0(m))
g(9^m)
anMM
and
The
lends
definition
descent
(1).
property (2)
?(m)^
algorithm remarkable numerical stability. With straightforward
rather than
changes, theMM recipe also applies tomaximization
a function f(6), we minorize
it by
minimization: To maximize
to
a surrogate function g(6 \ 0^)
and maximize g(0 | 0^)
follows

the next iterate #(m+1).

produce
2.1

Calculation

of Sample Quantiles

algorithm."

THE MM

Let #(m) represent a fixed value of the

g(9 | 0(m)) denote a real-valued function
pends on 0^m). The function g(6 \ 9^)
real-valued function f(0) at the point 9^

a one-dimensional

PHILOSOPHY
parameter 0, and let
of 9 whose form de
is said to majorize a

puting

sample

a sample

from

quantile

of com

the problem
xn

xi,...,

numbers. One can readily prove (Hunter and Lange 2000b)

for

q G

(0,1),

a qth

sample

quantile

of X\,...,

real

that

minimizes

the function

provided

> f(0)
for all
g(0\eW)
Uj
=
/(<9(m)).
p(0(m) I0(m))

consider

example,

f(0)

Y,P?(xi-6)>

(3)

where pq(0) is the "vee" function

In other words, the surface 9 i? g (9 |
lies above the
9^)
surface f(9) and is tangent to it at the point 9 ? 9^m\ The
9-?
(M - iq0
pq^}
if -g{9
function g(9 \ 9^)
is said tominorize f(9) at 9^
0<0.
\
\-{l-q)0
at0<m).
9^) majorizes -f(9)
=
represents the current iterate in a search of When q
Ordinarily, 9^
1/2, this function is proportional to the absolute
MM algorithm, we
the surface f(9). In a majorize-minimize
value function; for q ^ 1/2, the "vee" is tilted to one side or the
rather than the
minimize
the majorizing function g(9 \ 9^)
other. As seen in Figure 1(a), it is possible tomajorize the "vee"
at any nonzero
function.
actual function f(9). If 0(m+1) denotes the minimizer of g(9 \ function
point
by a simple
quadratic
we can show that the MM procedure forces f{9)
a
at
then
for
is
0^
?#(m)
^ 0, pq(0)
Specifically,
given
majorized
#(m)),
The American

Statistician,

This content downloaded from 128.235.29.171 on Sat, 25 Oct 2014 14:08:39 PM

All use subject to JSTOR Terms and Conditions

February

2004, Vol. 58, No.

derived

Uo\e(m))
under

and

its,

with

composition

relation
of

the formation

sums,

between

functions

nonnegative

lim

These

rules

function.

increasing

products,

permit us to work piecemeal in simplifying complicated objec

tive functions. Thus, the function f{9) of Equation (3) is ma
jorized at the point 9^ by

3.1

stemming
some common

inequalities

section

outlines
or

majorizing

of objective

10^1}.

1{|^|

various

This

concavity.
to construct

+(49-2)0 +

the majorization

Fortunately,
closed

from

functions

minorizing

convexity

inequalities

used

for various

types

functions.

Jensen's

Inequality

Jensen's inequality states for a convex function k(x) and any

random variable X that k[E (X)} < E [k(X)}. Since ? \n(x) is
a convex function, we conclude for probability densities
a(x)
and b(x) that

g (e |e{mA = ]T ?q (Xi o |x% o(mA .

The function f{9) and its majorizer g(9 | 0(m)) are shown in
? 12.
Figure 1(b) for a particular sample of size n
Setting the first derivative of g(9 | 0(m)) equal to zero gives
the minimum point
w- (m)rx.

"(2<7-i)+Er=i

?
\x% 0(m)|_1 depends on 0(m). A
flaw of algorithm (5) is that the weight i*4
is undefined when
ever 0(m) = x%. Inmending this flaw, Hunter and Lange (2000b)
also discussed the broader technique of quantile regression in
troduced by Koenker and Bas sett (1978). From a computational
=

the weight w?

where

perspective, themost fascinating thing about the quantile-finding

algorithm is that it avoids sorting and relies entirely on arithmetic
and

iteration.

For

the case

the

sample

median

(1973)
gorithm (5) is found in Schlossmacher
anMM algorithm by Lange and Sinsheimer
(1995).
in Equation (4) is a
Because g(9 \ 9^)
of 9, expression (5) coincides with the more
Raphson
(m+l)

If h(x)U

(4)

?
1/2),

quadratic function
general Newton

update

left-hand

side

above

?(m)

V2#(0(m) |0(r

Vg ( 6>(m)| 6>(m)

b{X)
and we

vanishes

the

obtain

E [lnapO] < E [\nb(X)] ,

which is sometimes known as the information inequality. It is
this inequality that guarantees that aminorizing function is con
structed in the E-step of any EM algorithm (de Leeuw 1994;
Heiser 1995), making every EM algorithm anMM algorithm.
3.2

Minorization
Jensen's

perplane

via Supporting

inequality

of a convex

property

Hyperplanes

is easily derived from the supporting hy

function:

Any

linear

tan

function

gent to the graph of a convex function is aminorizer at the point

of tangency. Thus, if k{6) is convex and differentiate,
then

?(0) > k(0m)

(7)

+V/^0(m)V(0-0(m)),

with equality when 0 = ?(m\ This inequality is illustrated by

the example of Section 7 involving constrained optimization.
3.3

via the Definition

Majorization

Ifwe wish tomajorize

IfX has the density b(x), thenE [a(X)/b{X)\

and is shown to be
(1993) and Heiser

ln<E

it, then we

can

K,(t) is convex

use

the

of Convexity

a convex function instead of minorizing

standard

definition

of convexity;

namely,

if and only if

(6)
where V#(0(m)
dient vector and
at 0(m). Because

denote the gra

| 0(m)) and V2g{9^
\ 9^)
the Hessian matrix of g(9 | 0(m)) evaluated
the descent property (2) depends only on de
it, the update (6) can
creasing g[9 | 0(m)) and not on minimizing
serve in cases where g[9 \9^ ) lacks a closed form minimizer,

In the
provided this update decreases the value of g(9 \ 9^).
context of EM algorithms, Dempster et al. (1977) called an al
it
gorithm that reduces g(9 \9^) without actually minimizing
a generalized EM (GEM) algorithm. The specific case of Equa
tion (6), which we call a gradient MM algorithm, was studied in
the EM context by Lange (1995a), who pointed out that update
(6) saves us from performing iterations within iterations and yet
still displays the same local rate of convergence as a full MM
algorithm thatminimizes g(9 | 0(m)) at each iteration.
3.
In the quantile

OF THE TRADE

TRICKS
example

of Section

tion admits a quadratic majorizer

general,
32

many

majorizing

2.1,

as depicted

minorizing

"vee"

in Figure

relationships

(8)

^2q??k(U)

for any finite collection of points t%and corresponding multipli

? 1.
ers o?i with a% > 0 and
Application of definition
^\ a%
(8) is particularly effective when k(?) is composed with a linear
function

For

xt6.

instance,

for

suppose

vectors

the substitution U = x?(0? ? 0?

(8) then becomes

that we make
Inequality

/<x?0) < ^a?

Now

inequality

and

0^m)

)/a? + xt9^rn\

(9)

?im)) + z*0(m)

<9i

if all components
Alternatively,
=
we
then
itive,
may take ti
Xi9?m /xt9(rn\

the convex

k I
^^OLiU

of x, 0, and 9^
xt9^9i/9?n^
(8) becomes

are pos
?

and a?

func

1(a). In
may

Ktfe) < ?

xte^ej
ztQ{m)

General

This content downloaded from 128.235.29.171 on Sat, 25 Oct 2014 14:08:39 PM

All use subject to JSTOR Terms and Conditions

(10)

Inequalities (9) and (10)

gorithms in the contexts
Lange and Fessier 1995)
matrix inversion (Becker
3.4

have been used to construct MM al

of medical
imaging (De Pierro 1995;
and least-squares estimation without
et al. 1997).

via a Quadratic

Majorization

k(9)< k ?0(m))+ v? ?0(m)V(e 0(m))

+1 (o o^Y m (o o(mA
provides a quadratic upper bound. For example, Heiser
noted in the unidimensional case that

(1995)

0-0(m)

(0-0(m))2
?3
(0("O)2~+

3.5

of Section

example

Mean

2=1

If we

let xi =

eli,

then we

2=1

obtain

the

standard

One of the key criteria in judging minorizing or majorizing

functions is their ease of optimization. Successful MM algo
in high-dimensional

rithms

are

In other

separated.

2=1

the

form

Because

inequality.

function

surrogate

4.1

Poisson

to optimize

easier

sur

mapping

at each

iteration.

Sports Model

Consider a simplified version of amodel proposed by Maher

(1982) for a sports contest between two individuals or teams
in which the number of points scored by team i against team j
a Poisson

with

process

where

e0i~dj,

intensity

is an

strength" parameter for team i and dj is a "defensive

strength" parameter for team j. If tij is the length of time that
i plays j and pij is the number of points that i scores against j,
then the corresponding Poisson log-likelihood function is
=

Pij{oi -dj)

Uje0i~dj

the expo

nential function is strictly convex, equality holds if and only if

all of the xi are equal. Inequality (11) is helpful in constructing
the majorizer

InUj

+pij

lnp^-l,

(14)
where 0 = (o, d) is the parameter vector. Note that the parame
=
ters should satisfy a linear constraint, such as ]T\ o i+ V
dj
in
order
for
the
model
be
0,
identifiable; otherwise, it is clearly
possible to add the same constant to each o? and dj without
altering the likelihood. We make two simplifying assumptions.
First, different games are independent of each other. Second,
each team's point total within a single game is independent of
point

total.

The

are

performances

second

is more

assumption

sus

it implies that a team's offensive

pect than the first because

2=1
mean

the arithmetic-geometric

the

words,

function

surrogate

defensive
of

rely

the individual parameter components

0 G U C Rd ?> R reduces to the sum of d real-valued functions

taking the real-valued arguments 6\ through 0<?.Because the d
univariate functions may be optimized one by one, this makes

its opponent's
\

often

spaces

parameter

inwhich

rogate functions

?ij{9)

Inequality

The arithmetic-geometric mean inequality is a special case of

?
? and a? = 1/ra yields
inequality (8). Taking n(t)

SEPARATIONOF PARAMETERS
AND CYCLICMM

"offensive

The Arithmetic-Geometric

follows

for 0 < c < min{0,9^}.

The corresponding quadratic lower
bound principle for minorization
is the basis for the logistic
regression

is the Cauchy-Schwartz
inequality. De Leeuw and Heiser
(1977) and Groenen (1993) used inequality (13) to derive MM
algorithms for multidimensional
scaling.

Bound

Upper

If a convex function k(9) is twice differentiable

and has
bounded curvature, then we can majorize k(0) by a quadratic
function with sufficiently high curvature and tangent to n{9)
at 0^
(B?hning and Lindsay 1988). In algebraic terms, if we
can find a positive definite matrix M such thatM ? V2k(9)
is
all
for
then
definite
0,
nonnegative

1
1
0 W)~

which

somehow

to one

unrelated

and

another;

nonetheless
the model gives an interesting first approximation
to reality. Under these assumptions, the full data log-likelihood
is obtained by summing ?ij(9) over all pairs (i,j). Setting the
partial derivatives of the log-likelihood equal to zero leads to the
equations

^1^2

/
_<

i
2 x2
1
X1-t?r+X2?7?T
(m)

2 ^1
(m)

?cy

of the product of two positive numbers. This

in the sports contest model of Section 4.
3.6

The Cauchy-Schwartz

/io\
(12)

?-*

tions

=
w

and -

satisfied by themaximum

Inequality

llfljll

?'**

E,^

inequality is used

The Cauchy-Schwartz
inequality for the Euclidean norm is a
case
of
The function n(9) = ||0|| is convex
(7).
special
inequality
because it satisfies the triangle inequality and the homogeneity
condition ||a011= \a\ \\9\\. Since k(9) =
#2, we see that
^/^
=
and
therefore
(7)
inequality
gives
Vft(0)
0/||0||,

* ^(m)?+l
11*11

not

likelihood estimate (o, d). These equa

a closed

admit

j H3 ?e"

form

so we

solution,

turn

to an MM

algorithm.

Because

the task is to maximize

function.

need

minorizing

we may use inequality

-i-.e*-^
13

>
-

Focusing

the

(14), we
term,

?tije0i~dj

(12) to show that

the log-likelihood

, -hL&f]?\
2
eo(->+4"*>

ie?r,+*r\

(15)
The American

Statistician,

This content downloaded from 128.235.29.171 on Sat, 25 Oct 2014 14:08:39 PM

All use subject to JSTOR Terms and Conditions

February

2004, Vol. 58, No.

Table
Regular

According

Defensive

Strength.

the Basis

Offensive

team played

82 games.

Plus

Strength

in the right direction; indeed, every iteration of a cyclic MM

algorithm is simply an MM iteration on a reduced parameter
set.

0/ + Of/

Team

Wins

the 2002-2003

to Their Estimated
Each

o; + Of/

Team

Teams

of all 29 NBA

1. Ranking
Season

Wins

4.2

to National

Application
Results

Cleveland

-0.0994

Phoenix

0.0166

Denver

-0.0845

New Orleans

0.0169

Toronto

-0.0647

Philadelphia

0.0187

Miami

-0.0581

Houston

0.0205

Chicago
Atlanta

-0.0544

Minnesota

0.0259

-0.0402

LA Lakers

0.0277

in minutes.

LAClippers

-0.0355

0.0296

Memphis
New York

-0.0255

Utah

0.0299

period, if necessary,

-0.0164

Portland

0.0320

-0.0153

Detroit

0.0336

-0.0077

New

0.0481

-0.0051

San

0.0611

Orlando

-0.0039

Sacramento

0.0686

Milwaukee

-0.0027

Dallas

0.0804

Washington
Boston
State

Golden

Seattle

Indiana

Jersey
Antonio

0.0039

Although the right side of the above inequality may appear more
complicated than the left side, it is actually simpler in one impor
tant

o? and

components

parameter

respect?the

are

separated

on the right side but not on the left. Summing the log-likelihood
(14) over all pairs (?, j) and invoking inequality (15) yields the
function

=
fl(0|0(m)) ?E

tin

Pij(oi

-dj)

%
i
e o(m)+d(m)

?om;+d

the full log-likelihood up to an additive constant in

dependent of 0. The fact that the components of 0 are separated
permits us to update parameters one by one
by g (9 | 0^)

minorizing

and

reduces

substantially

tial derivatives of g(9

costs.

computational

Setting

| 0^m)) equal to zero yields

the par

the updates

Association

Table 1 summarizes our application of the Poisson sports

model to the results of the 2002-2003
regular season of the
National Basketball Association.
In these data, tij is measured

score

4:Se?l~dj

lasts

game

regular

48 minutes,

and

overtime

each

adds five minutes. Thus, team i is expected

points

team

against

the

j when

teams

two

meet and do not tie. Team i is ranked higher than team j if

?
?
ai
di, which is equivalent to ?iJrdi>
dj > dj
?j H-dj.
It isworth emphasizing some of the virtues of themodel. First,
the ranking of the 29 NBA teams on the basis of the estimated
sums o i + di for the 2002-2003
regular season is not perfectly
consistent with their cumulative wins; strength of schedule and
margins of victory are reflected in themodel. Second, themodel
gives the point-spread function for a particular game as the differ
ence

two

Poisson

independent

random

variables.

Third,

one

can easily amend the model to rank individual players rather

than teams by assigning to each player an offensive and de
fensive intensity parameter. If each game is divided into time
segments punctuated by substitutions, then theMM algorithm
can be adapted to estimate the assigned player intensities. This
might provide a rational basis for salary negotiations that takes
into account subtle differences between players not reflected in
traditional

Lij ?-2d,

Basketball

statistics.

sports

Finally, the NBA dataset sheds light on the comparative

speeds of the original MM algorithm (16) and its cyclic modi
fication (17). The cyclic MM algorithm converged in fewer it
erations (25 instead of 28). However, because of the additional
work required to recompute the denominators in Equation (17),
the cyclic version required slightly more floating-point opera
tions as counted by MATLAB
(301,157 instead of 289,998).

5. SPEED OF CONVERGENCE
j
E7-P'W

o (m+l)

(m)_Am)
2-^j

algorithms

tije
plementary

(m+l)

J2^P^J

(16)

o nL>+d\

rithms

boast

strengths.
a
quadratic

point

optimum

.Zik

soon

as they

become

available.

For

instance,

T,iPij

J+d\

if we

Newton-Raphson
as
they

convergence

words,

under

certain

update

?
||0( +i)_0*||
_
J{m)
0*||2

near

general

algo
a local
condi

(16),

for

some

faster

the

linear

rate

quadratic
of

lim
m^oo^?-??||0M

(17)

In practice, anMM algorithm often takes fewer iterations when

we cycle through the parameters updating one at a time than
when we update the whole vector at once as in algorithm (16).
We call such versions of MM algorithms cyclic MM algorithms;
they generalize the ECM algorithms of Meng and Rubin (1993).
A cyclic MM algorithm always drives the objective function

c. This

constant

than

lim

the

.Eik

hand,

rate of

tions,

o vector before the d vector in each iteration of algorithm

we could replace the formula for
above by
dj
?rn+l)

one

In other

0*.

The question now arises as to whether one should modify

algorithm (16) so that updated subsets of the parameters are used
as

algorithms have com

and Newton-Raphson

rate of

convergence

is much

convergence

c <

(18)

0*||

displayed by typical MM algorithms. Hence, Newton-Raphson

algorithms tend to require fewer iterations thanMM algorithms.
On the other hand, an iteration of a Newton-Raphson
algorithm
can

far more

onerous

computationally

algorithm. Examination

0(m+i) =

than

iteration

of the form

?(rn)-v2/(^(m))_

General

This content downloaded from 128.235.29.171 on Sat, 25 Oct 2014 14:08:39 PM

All use subject to JSTOR Terms and Conditions

vy(0(m))

of an

iteration

Newton-Raphson

that

reveals

it requires

the p x p matrix V2/(0)

an MM

trast,
on

well-designed
but

to p3. By con

arithmetic

parameters

iteration.

per

operations

tend to require more

algorithms

For

than Newton-Raphson.
an
sometimes
enjoy
advantage

algorithms

for

linear

the

constraint).

iterations

matrix

to MATLAB.

according
iteration

Raphson

requires

even

Thus,

inversion

Newton

single
in

computation

this

example

than the 300,000 floating point operations required for theMM

in 28 iterations. Numerical
algorithm to converge completely
enters

also

stability

the balance

point.

the value

to appropri

guaranteed
the objective

function

iteration.

every

Other types of deterministic

Fisher

scoring,

can match

until

convergence,

mation

matrix

them

each

scoring

information

iterations

The

expected

infor

is sometimes

easier

to eval

of Newton-Raphson.

matrix

Scoring does not automatically lead to an increase in the log

it can always
likelihood, but at least (unlike Newton-Raphson)
be made to do so if some form of backtracking is incorporated.
methods

Quasi-Newton

or even

mitigate

eliminate

the need

for

matrix inversion. The Nelder-Mead

approach is applicable in
situations where the objective function is nondifferentiable. Be
cause of the complexities of practical problems, it is impossible
to declare

any

best

algorithm

optimization

In our

overall.

perience, however, MM algorithms are often difficult to beat in

terms of stability and computational simplicity.

In most

cases,

covariance

formation
is often

matrix.

a maximum
matrix

likelihood
to the

equal

In practice,

the
by

well-approximated

estimator

inverse

the

has

expected

information

matrix

the observed

information

matrix

computed by differentiating the log-likelihood i(9)

?V2?(9)
twice. Thus, after theMLE 9 has been found, a standard error of
0 can be obtained by taking square roots of the diagonal entries
?
In some problems, however, direct
of the inverse of V2?(9).
calculation of V2?{9) is difficult. Here we propose two numeri
cal approximations to this matrix that exploit quantities readily
obtained by running anMM algorithm. Let g(9 \9^ ) denote a
minorizing function of the log-likelihood i{9) at the point 0(m\

M(??)
to be the MM

arg max #(0

algorithm map

taking 9^

| i?)

matrix.

identity

These

(20)
J t?=?
were

formulas

are

equal

at the
point

of minorization;

and

the gradient

second,

2, j entry equals

= fa " (' + ?>-(?>,

s^o
5

AM, w

(21)

the vector Cj is the jth standard basis vector having a one

= 0,
in its jth component and zeros elsewhere. Because M(9)
the jth column of VM(0) may be approximated using only out

where

put from the corresponding MM algorithm by (a) iterating until

0 is found, (b) altering the jth component of 0 by a small amount
?j, (c) applying theMM algorithm to this altered 0, (d) subtract
of
ing 0 from the result, and (e) dividing by ?j. Approximation
nu
it
involves
is
analogous except
V2?{9) using Equation (20)
=
merically approximating the Jacobian of /i(#)
Vg(9
|#). In
this case one may exploit the fact that h{9) is zero.
6.2

An MM Algorithm

for Logistic

Regression

To illustrate these ideas and facilitate comparison

ious

numerical

methods,

consider

of the var
the

in which

example

of the log-likelihood
is easy to compute. B?hning and
Lindsay (1988) apply the quadratic bound principle of Section
3.4 to the case of logistic regression, inwhich we have an n x 1

asymp

expected

and define

(19)

Hessian

6. STANDARDERROR ESTIMATES

totic

the

dOj

ground. Although
in required

V2g{9 |0) /I VM(0)

d
V2g(9 |0) +

of g{9 | 0(m)) at its maximizer M(9^)

is zero. Alternative
derivations of formulas (19) and (20) were given by Meng and
these
Rubin (1991) and Oakes (1999), respectively. Although
formulas have been applied to standard error estimation in the
EM algorithm literature?Meng
and Rubin (1991) base their
SEM idea on formula (19)?to our knowledge, neither has been
applied in the broader context of MM algorithms.
of V2?{9) using Equation (19) requires a nu
Approximation
whose
merical approximation of the Jacobian matrix VM(0),

methods

gradient-free

denotes

are based on

to ?V2?{9)

approximations

rived by Lange (1999) using two simple facts: First, the tan
gency of ?(9) and itsminorizer imply that their gradient vectors

such as

algorithms,

Newton-Raphson
has
its own merits.

in Fisher

used

the observed

than

occupy a kind of middle

like Nelder-Mead,
of

optimization
methods,

quasi-Newton

none

uate

are

algorithms

or decrease

increase

ately

contrast,

A Newton-Raphson

sheet.

if started too far from an optimum

gorithm can behave poorly

where

of a 57 x 57 matrix requires roughly 387,000 floating point op

erations

via MM

a>v^'*>

reason

this

in computational

single

Thus,

For example, the Poisson process scoring model for the NBA
dataset of Section 4 has 57 parameters (two for each of 29 teams
one

Differentiation

The two numerical

the formulas

V2?{9)

speed.

minus

Numerical

takes

usually

iterations

simpler

separates

to invert

needed

calculations

is roughly proportional
that

algorithm
of p or p2

the order

the number

then

components,

6.1

evalua

If 0 has p

tion and inversion of the Hessian matrix V2/(0^m^).

vector

of binary

and

responses

an n

x p matrix

of predic

= 1
stipulates that the probability 7r?(0) that Y%
equals ex.p{9txi}/
Straightforward differen
(1 + exp{0tx2}).
tiation of the resulting log-likelihood function shows that
tors. The model

V2?{9)

=
-^^(0)[l-^(0)]x,4
2=1

?
Since 7Ti(9) [1 tt1(9)} is bounded above by 1/4, we may define
the negative definite matrix B =
and conclude that
?\XlX
?
B is nonnegative definite as desired. Therefore, the
X/2l(9)
quadratic function

- 0(m)
= ?
+ V?
g ?9 I
0(m)) (9{rnA
(0(m)V(0
+-

to 0O+1).
The American

Statistician,

This content downloaded from 128.235.29.171 on Sat, 25 Oct 2014 14:08:39 PM

All use subject to JSTOR Terms and Conditions

(0-0(m)V?
February

2004, Vol. 58, No.

Table

and

Coefficients

Estimated

Low Birth Weight

Logistic

Exact V2l{6)

Constant

AGE
LWT
RACE2
RACE3
SMOKE
PTL
HT
Ul
FTV

mizing

on:

based

Equation (19) Equation (20)

0.48062

1.1969

1.1984

-0.029549

0.037031

0.037081

0.0069194

0.0069336

0.52736

0.52753

0.52753
0.44076

-0.015424
1.2723

minorizes

for the

Example

errors

Standard
Variable

Errors

Standard

Regression

0.8805

0.44079

0.44076

0.93885

0.40215

0.40219

0.54334

0.34541

0.34545

1.8633

0.69754

0.69811

0.76765

0.45932

0.45933

0.065302

0.17240

0.17251

?(9) at 0^m). The MM

this quadratic, giving
=

0(m+l)

?(m)

Vj(9im))\-lnVj(9)+lnVj(9{m)))
Adding

?-lyg

algorithm of Equation
an

it enjoys

once,

Newton-Raphson

(B?hning and Lindsay

6.3

Application

now

test

\y-tt

.(22)

(0(m))

increasing

computational

with equality when 0 =

of predictors

the number

negative.

optimization
For

example,
we
Here

>0,

tuning

0(m). Summing over j and multiplying

uj, we

parameter

construct

the

function

Vj (0(*0?

increases

error

Data
based

approximations

on Equa

except

for

race,

which

is a three-level

purposes.

7. HANDLING CONSTRAINTS
Many

positive

will differ. The close agreement of the approximations with the

"gold standard" based on the exact value V2?(9) is clearly good

eters.

v3{9{rn)) v3{9).

1988).

the standard

for practical

advantage

factor with level 1 for whites, level 2 for blacks, and level 3 for
likelihood estimates
other races. Table 2 shows the maximum
errors
for
the
10
standard
and asymptotic
parameters. The dif
ferentiation increment Sj was 0j/lOOO for each parameter 9j.
The standard error approximations in the two rightmost columns
turn out to be the same in this example, but in other models they

enough

(22) needs to invert XtX

to Low Birth Weight

is quantitative

?
s~1(t

+V^-(0(m))t(0-0(m))

tions (19) and (20) on the low birth weight dataset of Hosmer
and Lemeshow
(1989). This dataset involves 189 observations
and eight maternal predictors. The response is 0 or 1 accord
ing to whether an infant is born underweight, defined as less
than 2.5 kilograms. The predictors include mother's age in years
(AGE), weight at lastmenstrual period (LWT), race (RACE2 and
RACE3), smoking status during pregnancy (SMOKE), number
of previous premature labors (PTL), presence of hypertension
history (HT), presence of uterine irritability (UI), and number of
physician visits during the first trimester (FTV). Each of these
predictors

Ins ? hit >

the last two inequalities, we see that

?9{m))

only
over

v3{9) > VVj ?0(m>V (0<m> 0).

Vj(9{m)) [-ln^-(0)+lnt;J(0(m))

= 9{m)4(XtX)-1X?
Since theMM

?0(m))

Application of the similar inequality

implies that

by maxi

algorithm proceeds

interior of the parameter space but allows strict inequalities to

become equalities in the limit.
Consider the problem of minimizing f{9) subject to the con
straints Vj(6) > 0 for 1 < j < g, where each v3; (0) is a concave,
differentiable function. Since ?Vj(6)
is convex, we know from
that
(7)
inequality

problems
parameters
a
discuss

constraints
impose
are often
required
majorization

technique

9-9{rn)\tVvj

?0(m))

(23)

in
majorizing f{9) at 9^m\ The presence of the term lnvj(0)
<
The
from
0
(23)
prevents ^(0(m+1))
Equation
occurring.
of mi?j(0)
multiplier Vj(9^)
gradually adapts and allows
there
v.{Q(m+1)) to tend to 0 if it is inclined to do so. When
are equality constraints A9 = b in addition to the inequality
constraints v3; (0) > 0, these should be enforced during the min
imization of g (9 \9^).
7.1

Multinomial

Sampling

To gain a feel for how these ideas work in practice, consider

the problem of maximum
likelihood estimation given a random
sample of size n from a multinomial distribution. If there are q
categories and n? observations fall in category i, then the log
a constant. The compo
likelihood reduces to
^2i rii In 9i plus
= 1.
nents of the parameter vector 0 satisfy 0? > 0 and
Yli 9i
it is well known that the maximum
likelihood esti
Although
mates are given by 0? = rii/n, this example is instructive be
cause it is explicitly solvable and demonstrates the linear rate of
convergence of the proposed MM algorithm.
To minimize the negative log-likelihood f(9) = ? ]T^ ni ln 9%
=
9i > 0 and the
subject to the q inequality constraints 1^(0)
?
we
construct
themajorizing func
equality constraint J2i 9% 1,
tion

param

to be
that

non
in a

sense eliminates

inequality constraints. For this adaptive barrier

method (Censor and Zenios 1992; Lange 1994) towork, an initial
point 0(?) must be selected with all inequality constraints strictly
satisfied. The barrier method confines subsequent iterates to the
36

suggested in Equation (23), omitting irrelevant constants. We

= 1
minimize g (9 | 0(m)) while enforcing
by introducing
]^?0?
a Lagrange multiplier and looking for a stationary point of the

General

This content downloaded from 128.235.29.171 on Sat, 25 Oct 2014 14:08:39 PM

All use subject to JSTOR Terms and Conditions

de Leeuw,
of Correction Matrix
J., and Heiser, W. J. (1977), "Convergence
for Multidimensional
Algorithms
of
Scaling," in Geometric Representations
Relational Data, eds. J. C. Lingoes, E. Roskam, and I. Borg, Ann Arbor, MI:
Mathesis
735-752.
Press, pp.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), "Maximum Likelihood
from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical
Society, Ser. B, 39, 1-38.

Lagrangian

h(9) = 5(0|0^) +
A???-lJ.
equal to zero and multiplying

Setting dh(9)/d9l

-rii

cj02(m)

uj9% +

n?+o;0
1
n +

?(m+i)

all iterates have positive

The

components.

positive

\9%

final

if they start with

components

/^(m)
%
cj V

_ n?\ '
n )

the parameter
zero.
equal

one

where

or more

P. J. (1981), Robust Statistics, New York: Wiley.

for Generalized
Hunter, D. R. (2004), "MM Algorithms
Bradley-Terry Models,"
The Annals of Statistics, 32, 386-408.
to discussion of "Optimization
Hunter, D. R., and Lange, K. (2000a), Rejoinder
Transfer Using Surrogate Objective
Journal of Computational
Functions,"
and Graphical
Statistics, 9, 52-59.
Huber,

demonstrates that 9^
approaches the estimate 9 at the linear
rate uj/(n + uj), regardless of whether 9 occurs on the boundary
of

Theory and
"Convergent Computing
by Iterative Majorization:
inMultidimensional
inDe
Data Analysis,"
in Recent Advances
Applications
ed. W. J. Krzanowski,
Oxford: Clarendon
Analysis,
scriptive Multivariate
Press, pp. 157-189.
New
S. (1989), Applied Logistic Regression,
Hosmer, D. W., and Lemeshow,
York: Wiley.

-(1995),

rearrangement

n +

for
Algorithm
IEEE Transac

toMultidimensional
Scal
Groenen, P. J. F. (1993), The Memorization Approach
DSWO Press.
Leiden, the Netherlands:
ing: Some Problems and Extensions,
Heiser, W. J. (1987), "Correspondence
Analysis with Least Absolute Residuals,"
Statistical Data Analysis,
5, 337-356.
Computational

the update

_ ^i =
1
n

0(m+l)

De Pierro, A. R. (1995), "AModified

Expectation Maximization
Penalized Likelihood
in Emission Tomography,"
Estimation
tions on Medical
14, 132-137.
Imaging,

on i reveals that ? = n and yields

Summing

Hence,

by 9%gives

its components

via anMM Algorithm," Journal of Com

"Quantile Regression
and Graphical
Statistics, 9, 60-77.
in the Proportional Odds Model," Annals
-(2002),
"Computing Estimates
54, 155-168.
of the Institute of Statistical Mathematics,
8. DISCUSSION
Hunter, D. R., and Li, R. (2002), "A Connection Between Variable Selection and
technical report 0201, Dept. of Statistics, Pennsylvania
EM-type Algorithms,"
to whet
not
This
article
is meant
readers'
satiate
appetites,
State University.
much.
For instance,
there
is a great deal
them. We
have
omitted
of the
Acceleration
Jamshidian, M., and Jennrich, R. I. (1997), "Quasi-Newton
EM Algorithm,"
Journal of the Royal Statistical Society, Ser. B, 59, 569-587.
known about the
of MM
that
space

convergence

is too mathematically

algorithms

properties

to present

demanding

0?-(2000b),
putational

here.

Fortunately,

almost all results from the EM algorithm literature (Wu 1983;

Lange 1995a; McLachlan and Krishnan 1997; Lange 1999) carry
over

without

change

to MM

are

there

Furthermore,

algorithms.

of a Class of
Kiers, H. A. L., and Ten Berge, J. M. F. (1992), "Minimization
Matrix Trace Functions by Means
of Refined Majorization,"
Psychometrika,
57,371-382.
Koenker, R., and Bassett, G. (1978), "Regression Quantiles," Econometrica,
46,
33-50.
Lange,

(1994),

"An Adaptive

Barrier Method

several methods for accelerating EM algorithms that are also

Methods Applications
1, 392-402.
Analysis,
"A Gradient Algorithm
-(1995a),
to
MM
Locally
applicable
accelerating
algorithms (Heiser 1995; Lange
rithm," Journal of the Royal Statistical Society,
1995b; Jamshidian and Jennrich 1997; Lange et al. 2000).
"A Quasi-Newton
Acceleration
-(1995b),
tic a Sinica, 5, 1-18.
Although this survey article necessarily reports much that is
are

there

known,

already

some

new

results

here.

Our MM

of constrained optimization in Section 7 is more general

than previous versions in the literature (Censor and Zenios 1992;
Lange 1994). The application of Equation (20) to the estimation
of standard errors inMM algorithms is new, as is the extension
of the SEM idea of Meng and Rubin (1991) to theMM case.
There are so many examples of MM algorithms in the liter
that we

are unable

to cite

them

all. Readers

should

the lookout for these and for known EM algorithms

explained
tantly,

hope

as MM

simply
this article

will

Even

algorithms.
stimulate

readers

that can be

more
to discover

impor
new

algorithms.
[Received

June 2003. Revised

September

Programming,"

to the EM Algo
Equivalent
Ser. B, 57, 425-437.
of the EM Algorithm,"
Statis

Numerical
New York:
Analysis for Statisticians,
Verlag.
Lange, K., and Fessier, J.A. (1995), "Globally Convergent Algorithms
imum A Posteriori Transmission
IEEE Transactions
Tomography,"
1430-1438.
Processing A,

treat -(1999),

ment

ature

for Convex

2003.]

REFERENCES
Becker, M. P., Yang, I., and Lange, K. (1997), "EM Algorithms Without Missing
inMedical Research,
Data," Statistical Methods
6, 38-54.
Bijleveld, C. C. J. H., and de Leeuw, J. (1991), "Fitting Longitudinal Reduced
Rank Regression Models
56,
by Alternating Least Squares," Psychometrika,
433-447.
of Quadratic Approx
B?hning, D., and Lindsay, B. G. (1988), "Monotonicity
imation Algorithms,"
Annals of the Institute of Statistical Mathematics,
40,
641-663.
S. A.
"Proximal Minimization
With D
Censor, Y., and Zenios,
(1992),
73, 451-464.
Functions," Journal of Optimization
Theory and Applications,
de Leeuw, J. (1994), "Block Relaxation Algorithms
in Statistics, in Information
eds. H. H. Bock, W. Lenski, and M. M. Richter,
Systems and Data Analysis,
Berlin: Springer-Verlag,
pp. 308-325.

Springer
forMax
on Image

Transfer using
Lange, K., Hunter, D. R., and Yang, I. (2000), "Optimization
Journal of Computational
(with discussion),
Surrogate Objective Functions"
and Graphical
Statistics, 9, 1-20.
J. S. (1993), "Normal/Independent
Distributions
Lange, K., and Sinsheimer,
and Their Applications
in Robust Regression,"
and
Journal of Computational
Statistics, 2, 175-198.
Graphical
(2nd ed.), Read
Luenberger, D. G. (1984), Linear and Nonlinear
Programming
ing, MA: Addison-Wesley.
Football Scores," Statistica Neer
Maher, M. J. (1982), "Modelling Association
landica, 36, 109-118.
and
Marshall, A. W., and Olkin, I. (1979), Inequalities: Theory ofMajorization
its Applications,
San Diego: Academic.
G. J., and Krishnan, T. (1997), The EM Algorithm and Extensions,
McLachlan,
New York: Wiley.
and Rubin, D. B. (1991), "Using EM
X.-L.,
Meng,
Variance-Covariance
Matrices:
The SEM Algorithm,"
can Statistical Association,
86, 899-909.

to Obtain Asymptotic
Journal of the Ameri

A
"Maximum Likelihood
via the ECM Algorithm:
Estimation
Framework, Biometrika,
800, 267-278.
via the EM
of the Information Matrix
Oakes, D. (1999), "Direct Calculation
Journal of the Royal Statistical Society, Ser. B, 61, Part 2, 479
Algorithm,"
482.

?-(1993),
General

Ortega, J.M., and Rheinboldt, W. C. (1970), Iterative Solutions of Nonlinear

in Several Variables, New York: Academic,
Equations
pp. 253-255.
a
Identification Using
Sabatti, C, and Lange, K. (2002), "Genomewide Motif
Dictionary Model," Proceedings
of the IEEE, 90, 1803-1810.
E. J. (1973), "An Iterative Technique
Deviations
for Absolute
Schlossmacher,
Curve Fitting," Journal of the American Statistical Association,
68, 857-859.
The
Wu, C. F. J. (1983), "On the Convergence
Properties of the EM Algorithm,"
Annals of Statistics,
11, 95-103.
The American

Statistician,

This content downloaded from 128.235.29.171 on Sat, 25 Oct 2014 14:08:39 PM

All use subject to JSTOR Terms and Conditions

February

2004, Vol. 58, No.

A Tutorial on MM Algorithms
No ratings yet
A Tutorial on MM Algorithms
8 pages
MM Algorithm
No ratings yet
MM Algorithm
28 pages
A Tutorial On MM Algorithms
No ratings yet
A Tutorial On MM Algorithms
28 pages
De Leeuw
No ratings yet
De Leeuw
50 pages
Lange Talk
No ratings yet
Lange Talk
40 pages
Mathematics for Data Science: Linear Algebra with Matlab
From Everand
Mathematics for Data Science: Linear Algebra with Matlab
César Pérez López
No ratings yet
Generalized Majorization-Minimization
No ratings yet
Generalized Majorization-Minimization
10 pages
Elementary Theory and Application of Numerical Analysis: Revised Edition
From Everand
Elementary Theory and Application of Numerical Analysis: Revised Edition
David G. Moursund
No ratings yet
Summary SC Microeconometrics
No ratings yet
Summary SC Microeconometrics
20 pages
Finite Mathematics, Models, and Structure: Revised Edition
From Everand
Finite Mathematics, Models, and Structure: Revised Edition
William J. Adams
No ratings yet
lec13
No ratings yet
lec13
27 pages
Sparsity and Its Mathematics
No ratings yet
Sparsity and Its Mathematics
44 pages
ds11 2
No ratings yet
ds11 2
19 pages
EM Presentation 2013
No ratings yet
EM Presentation 2013
18 pages
Chapter 9.4 Allele Frequency Estimation
No ratings yet
Chapter 9.4 Allele Frequency Estimation
24 pages
2206.03069v1
No ratings yet
2206.03069v1
4 pages
ML-2-Expectation Maximization
No ratings yet
ML-2-Expectation Maximization
11 pages
The Expectation-Maximization Algorithm: IEEE Signal Processing Magazine December 1996
No ratings yet
The Expectation-Maximization Algorithm: IEEE Signal Processing Magazine December 1996
15 pages
S6, S7, S8 CS - U4 Getter Setter EM Algorithm
No ratings yet
S6, S7, S8 CS - U4 Getter Setter EM Algorithm
32 pages
16 Aos1435
No ratings yet
16 Aos1435
44 pages
2B Naive Bayes
No ratings yet
2B Naive Bayes
90 pages
Applied Iterative Methods
From Everand
Applied Iterative Methods
Louis A. Hageman
No ratings yet
Some Studies of Expectation Maximization Clustering Algorithm To Enhance Performance
No ratings yet
Some Studies of Expectation Maximization Clustering Algorithm To Enhance Performance
16 pages
Lec16 PDF
No ratings yet
Lec16 PDF
10 pages
Maximum Likelihood Estimation with Stata Fourth Edition William Gould instant download
100% (2)
Maximum Likelihood Estimation with Stata Fourth Edition William Gould instant download
74 pages
Random Optimization: Fundamentals and Applications
From Everand
Random Optimization: Fundamentals and Applications
Fouad Sabry
No ratings yet
Optimization Algorithms and Hierarchical Convergence
From Everand
Optimization Algorithms and Hierarchical Convergence
Pasquale De Marco
No ratings yet
The Comprehensive Guide to Machine Learning Algorithms and Techniques
From Everand
The Comprehensive Guide to Machine Learning Algorithms and Techniques
Mohammed Ahmed
5/5 (1)
The Expectation Maximization Algorithm
No ratings yet
The Expectation Maximization Algorithm
7 pages
Maximum Likelihood Estimation with Stata Fourth Edition William Gould instant download
100% (3)
Maximum Likelihood Estimation with Stata Fourth Edition William Gould instant download
83 pages
Likelihood EM HMM Kalman
No ratings yet
Likelihood EM HMM Kalman
46 pages
کتاب ششم بارگزاری شده
No ratings yet
کتاب ششم بارگزاری شده
49 pages
Unit 3 ML
No ratings yet
Unit 3 ML
45 pages
Unit 2
No ratings yet
Unit 2
7 pages
ML RUSA Module 6 Probablistic EM KNN SVM
No ratings yet
ML RUSA Module 6 Probablistic EM KNN SVM
51 pages
Symbolic Mathematics in Data Science. Algebra, Calculus, and Geometry with Matlab
From Everand
Symbolic Mathematics in Data Science. Algebra, Calculus, and Geometry with Matlab
César Pérez López
No ratings yet
Gaussian Distribution
No ratings yet
Gaussian Distribution
5 pages
Beamer
No ratings yet
Beamer
34 pages
The EM Algorithm and Extensions 2nd Edition Geoffrey J. Mclachlan - The ebook in PDF format is ready for download
100% (1)
The EM Algorithm and Extensions 2nd Edition Geoffrey J. Mclachlan - The ebook in PDF format is ready for download
36 pages
Lecture3 EM
No ratings yet
Lecture3 EM
36 pages
Em Algorithm Thesis
100% (3)
Em Algorithm Thesis
6 pages
Fisher Information
No ratings yet
Fisher Information
59 pages
Dsci303-19 GM - em
No ratings yet
Dsci303-19 GM - em
81 pages
ML UNIT III
No ratings yet
ML UNIT III
12 pages
Multivariable Optimization
No ratings yet
Multivariable Optimization
48 pages
The CMA Evolution Strategy A Comparing R
No ratings yet
The CMA Evolution Strategy A Comparing R
39 pages
Complete Download Maximum Likelihood Estimation with Stata Fourth Edition William Gould PDF All Chapters
100% (4)
Complete Download Maximum Likelihood Estimation with Stata Fourth Edition William Gould PDF All Chapters
81 pages
Instant Download Maximum Likelihood Estimation with Stata Fourth Edition William Gould PDF All Chapters
100% (1)
Instant Download Maximum Likelihood Estimation with Stata Fourth Edition William Gould PDF All Chapters
89 pages
TR 97 021
No ratings yet
TR 97 021
15 pages
GMM Said Crv10 Tutorial
No ratings yet
GMM Said Crv10 Tutorial
27 pages
Collaborative Filtering Matrix Factorization Approach: Jeff Howbert Introduction To Machine Learning Winter 2012 #
No ratings yet
Collaborative Filtering Matrix Factorization Approach: Jeff Howbert Introduction To Machine Learning Winter 2012 #
30 pages
Basic Matrix Theory
From Everand
Basic Matrix Theory
Leonard E. Fuller
No ratings yet
(Slides) The em Algorithm
No ratings yet
(Slides) The em Algorithm
14 pages
Lecture 3 ML_optimization
No ratings yet
Lecture 3 ML_optimization
32 pages
AI29
No ratings yet
AI29
3 pages
NO LINEALs
No ratings yet
NO LINEALs
61 pages
Maximum Likelihood Estimation with Stata Fourth Edition William Gould - The full ebook version is ready for instant download
No ratings yet
Maximum Likelihood Estimation with Stata Fourth Edition William Gould - The full ebook version is ready for instant download
78 pages
10 1 1 57 8993 PDF
No ratings yet
10 1 1 57 8993 PDF
19 pages
An Introduction to Probability and Stochastic Processes
From Everand
An Introduction to Probability and Stochastic Processes
James L. Melsa
4.5/5 (2)
Total Variation Norm Regularized Least Square: I.E. Minimize Kax BK + KDXK Where KDXK Is Not Proximable
No ratings yet
Total Variation Norm Regularized Least Square: I.E. Minimize Kax BK + KDXK Where KDXK Is Not Proximable
12 pages
6th Class Paper
No ratings yet
6th Class Paper
2 pages
Grade 11 Periodic Test FINAL ANswer Key
0% (1)
Grade 11 Periodic Test FINAL ANswer Key
5 pages
Stats Answers 1.3
No ratings yet
Stats Answers 1.3
3 pages
YCT Motion in One Dimensions NEET JEE Practice Questions
100% (1)
YCT Motion in One Dimensions NEET JEE Practice Questions
210 pages
L2 数量课件
No ratings yet
L2 数量课件
196 pages
Topic 3 Dealing With Uncertainty Slides
No ratings yet
Topic 3 Dealing With Uncertainty Slides
204 pages
Control System
No ratings yet
Control System
7 pages
Download full Proof and Falsity A Logical Investigation 1st Edition Nils Kürbis ebook all chapters
100% (1)
Download full Proof and Falsity A Logical Investigation 1st Edition Nils Kürbis ebook all chapters
21 pages
Class XII Applied Math Mock Test
No ratings yet
Class XII Applied Math Mock Test
4 pages
Fatigue in Ls-Dyna
No ratings yet
Fatigue in Ls-Dyna
9 pages
Project Report For Class 11
100% (1)
Project Report For Class 11
17 pages
Multivariable Calculus - MTH 243, Sections 1 & 3
No ratings yet
Multivariable Calculus - MTH 243, Sections 1 & 3
5 pages
Business Analytics Module 2 Summary
No ratings yet
Business Analytics Module 2 Summary
3 pages
Lecture 7
No ratings yet
Lecture 7
15 pages
3.4 Transformations of Power Functions
No ratings yet
3.4 Transformations of Power Functions
2 pages
Report of The Thirty Ninth Canadian Mathematical Olympiad 2007
No ratings yet
Report of The Thirty Ninth Canadian Mathematical Olympiad 2007
15 pages
Himat: H Control Design: Purpose
No ratings yet
Himat: H Control Design: Purpose
7 pages
Cad Manual 2016 17 - 2
100% (1)
Cad Manual 2016 17 - 2
52 pages
Flow Through An Orifice
No ratings yet
Flow Through An Orifice
5 pages
Chapter 6 _ Sampling Circuits
No ratings yet
Chapter 6 _ Sampling Circuits
36 pages
Math 210-Linear Algebra i
No ratings yet
Math 210-Linear Algebra i
2 pages
Motion Graphing Interactive Stations (Digital)
No ratings yet
Motion Graphing Interactive Stations (Digital)
21 pages
MATH#1
No ratings yet
MATH#1
2 pages
Shanks Transformation
No ratings yet
Shanks Transformation
42 pages
2018 Blind Radio Tomography
No ratings yet
2018 Blind Radio Tomography
15 pages
Ordinal Numbers in English Woodward English
No ratings yet
Ordinal Numbers in English Woodward English
6 pages
Vector Calculus
No ratings yet
Vector Calculus
18 pages
497717Reluctance Electric Machines Design And Control Boldea Ion Tutelea download
No ratings yet
497717Reluctance Electric Machines Design And Control Boldea Ion Tutelea download
76 pages
VTU CSE Updated Question Bank June&Makeup 2024
No ratings yet
VTU CSE Updated Question Bank June&Makeup 2024
5 pages
Friction Problem Analysis
No ratings yet
Friction Problem Analysis
9 pages