Appendix E. Lagrange Multipliers
by Christopher M. Bishop
Springer Verlag, New York 2006
Consider the problem of finding the maximum of a function f(x1, x2) subject to a constraint relating x1 and x2, which we write in the form

g(x1, x2) = 0.    (E.1)
One approach would be to solve the constraint equation (E.1) and thus express x2 as
a function of x1 in the form x2 = h(x1). This can then be substituted into f(x1, x2)
to give a function of x1 alone of the form f(x1, h(x1)). The maximum with respect
to x1 could then be found by differentiation in the usual way, to give the stationary
value x1*, with the corresponding value of x2 given by x2* = h(x1*).
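As a minimal sketch of this substitution approach (assuming SymPy is available, and borrowing the example f(x1, x2) = 1 − x1² − x2² with constraint x1 + x2 − 1 = 0 that is worked later in this appendix):

    import sympy as sp

    x1 = sp.symbols('x1')
    x2 = 1 - x1                               # solve the constraint: x2 = h(x1)
    f = 1 - x1**2 - x2**2                     # f(x1, h(x1)), a function of x1 alone
    x1_star = sp.solve(sp.diff(f, x1), x1)[0] # stationary point of the reduced problem
    print(x1_star, x2.subs(x1, x1_star))      # x1* = 1/2, x2* = h(x1*) = 1/2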
One problem with this approach is that it may be difficult to find an analytic
solution of the constraint equation that allows x 2 to be expressed as an explicit func-
tion of x 1 . Also, this approach treats x 1 and x 2 differently and so spoils the natural
symmetry between these variables.
A more elegant, and often simpler, approach is based on the introduction of a
parameter λ called a Lagrange multiplier. We shall motivate this technique from
a geometrical perspective. Consider a D-dimensional variable x with components
x1, ..., xD. The constraint equation g(x) = 0 then represents a (D − 1)-dimensional
surface in x-space, as indicated in Figure E.1.
We first note that at any point on the constraint surface the gradient ∇g(x) of
the constraint function will be orthogonal to the surface. To see this, consider a point
x that lies on the constraint surface, and consider a nearby point x + ε that also lies
on the surface. If we make a Taylor expansion around x, we have

g(x + ε) ≈ g(x) + εᵀ∇g(x).    (E.2)

Because both x and x + ε lie on the constraint surface, we have g(x) = g(x + ε) and
hence εᵀ∇g(x) ≈ 0. In the limit ‖ε‖ → 0 we have εᵀ∇g(x) = 0, and because ε is
then parallel to the constraint surface g(x) = 0, we see that the vector ∇g is normal
to the surface.
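A small numerical check of this orthogonality (a sketch only, assuming NumPy; the unit-circle constraint g(x) = x1² + x2² − 1 = 0 is an assumed example, not one from the text):

    import numpy as np

    g_grad = lambda x: 2 * x                              # ∇g for g(x) = x1^2 + x2^2 - 1
    theta = 0.7
    x = np.array([np.cos(theta), np.sin(theta)])          # a point on the constraint surface
    tangent = np.array([-np.sin(theta), np.cos(theta)])   # direction along the surface at x
    print(np.dot(g_grad(x), tangent))                     # ≈ 0: ∇g is normal to the surface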
Next we seek a point x* on the constraint surface such that f(x) is maximized.
Such a point must have the property that the vector ∇f(x) is also orthogonal to the
constraint surface, as illustrated in Figure E.1, because otherwise we could increase
the value of f(x) by moving a short distance along the constraint surface. Thus ∇f
and ∇g are parallel (or anti-parallel) vectors, and so there must exist a parameter λ
such that

∇f + λ∇g = 0    (E.3)

where λ ≠ 0 is known as a Lagrange multiplier. Note that λ can have either sign.
At this point, it is convenient to introduce the Lagrangian function defined by

L(x, λ) = f(x) + λg(x).    (E.4)
The constrained stationarity condition (E.3) is obtained by setting ∂L/∂x = 0. Furthermore, the condition ∂L/∂λ = 0 leads to the constraint equation g(x) = 0.

Thus to find the maximum of a function f(x) subject to the constraint g(x) = 0, we define the Lagrangian function given by (E.4), and we then find the stationary point of L(x, λ) with respect to both x and λ. For a D-dimensional vector x, this gives D + 1 equations that determine both the stationary point x* and the value of λ. If we are only interested in x*, then we can eliminate λ from the stationarity equations without needing to find its value (hence the term 'undetermined multiplier').

As a simple example, consider the problem of finding the stationary point of the function f(x1, x2) = 1 − x1² − x2² subject to the constraint g(x1, x2) = x1 + x2 − 1 = 0, as illustrated in Figure E.2. The corresponding Lagrangian function is given by

L(x, λ) = 1 − x1² − x2² + λ(x1 + x2 − 1).    (E.5)

The conditions for this Lagrangian to be stationary with respect to x1, x2, and λ give the following coupled equations:

−2x1 + λ = 0    (E.6)
−2x2 + λ = 0    (E.7)
x1 + x2 − 1 = 0.    (E.8)

Solution of these equations then gives the stationary point as (x1*, x2*) = (1/2, 1/2), and
the corresponding value for the Lagrange multiplier is λ = 1.
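A quick way to reproduce this result (a sketch only, assuming SymPy is available) is to form the Lagrangian (E.5) and solve the stationarity conditions (E.6)-(E.8) symbolically:

    import sympy as sp

    x1, x2, lam = sp.symbols('x1 x2 lam')
    L = 1 - x1**2 - x2**2 + lam*(x1 + x2 - 1)              # Lagrangian (E.5)
    stationarity = [sp.diff(L, v) for v in (x1, x2, lam)]  # equations (E.6)-(E.8)
    print(sp.solve(stationarity, (x1, x2, lam)))           # {x1: 1/2, x2: 1/2, lam: 1}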
So far, we have considered the problem of maximizing a function subject to an
equality constraint of the form g(x) = 0. We now consider the problem of maximizing
f(x) subject to an inequality constraint of the form g(x) ≥ 0, as illustrated
in Figure E.3.
There are now two kinds of solution possible, according to whether the constrained
stationary point lies in the region where g(x) > 0, in which case the constraint
is inactive, or whether it lies on the boundary g(x) = 0, in which case the
constraint is said to be active. In the former case, the function g(x) plays no role
and so the stationary condition is simply ∇f(x) = 0. This again corresponds to
a stationary point of the Lagrange function (E.4), but this time with λ = 0. The
latter case, where the solution lies on the boundary, is analogous to the equality constraint
discussed previously and corresponds to a stationary point of the Lagrange
function (E.4) with λ ≠ 0. Now, however, the sign of the Lagrange multiplier is
crucial, because the function f(x) will only be at a maximum if its gradient is oriented
away from the region g(x) > 0, as illustrated in Figure E.3. We therefore have
∇f(x) = −λ∇g(x) for some value of λ > 0.
Figure E.3 Illustration of the problem of maximizing f(x) subject to the inequality constraint g(x) ≥ 0.

For either of these two cases, the product λg(x) = 0. Thus the solution to the
problem of maximizing f(x) subject to g(x) ≥ 0 is obtained by optimizing the Lagrange
function (E.4) with respect to x and λ, subject to the conditions
g(x) ≥ 0    (E.9)
λ ≥ 0    (E.10)
λg(x) = 0.    (E.11)
These are known as the Karush-Kuhn-Tucker (KKT) conditions (Karush, 1939; Kuhn
and Tucker, 1951).
Note that if we wish to minimize (rather than maximize) the function f(x) subject
to an inequality constraint g(x) ≥ 0, then we minimize the Lagrangian function
L(x, λ) = f(x) − λg(x) with respect to x, again subject to λ ≥ 0.
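To see the two kinds of solution (active and inactive constraint) numerically, here is a small sketch (assuming SciPy; the particular objective and constraints are illustrative choices, not taken from the text). It maximizes f(x) = 1 − x1² − x2² by minimizing −f, once with a constraint that is active at the optimum and once with one that is inactive:

    import numpy as np
    from scipy.optimize import minimize

    f = lambda x: 1 - x[0]**2 - x[1]**2        # function to maximize
    neg_f = lambda x: -f(x)                    # minimize -f in order to maximize f

    # Active case: x1 + x2 - 1 >= 0 excludes the unconstrained maximum at the origin,
    # so the solution lies on the boundary (lambda > 0).
    active = {'type': 'ineq', 'fun': lambda x: x[0] + x[1] - 1}
    print(minimize(neg_f, x0=[1.0, 0.0], constraints=[active]).x)    # approx [0.5, 0.5]

    # Inactive case: 1 - x1 - x2 >= 0 already contains the origin,
    # so the constraint plays no role (lambda = 0).
    inactive = {'type': 'ineq', 'fun': lambda x: 1 - x[0] - x[1]}
    print(minimize(neg_f, x0=[0.2, 0.2], constraints=[inactive]).x)  # approx [0.0, 0.0]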
Finally, it is straightforward to extend the technique of Lagrange multipliers to
the case of multiple equality and inequality constraints. Suppose we wish to maximize
f(x) subject to gj(x) = 0 for j = 1, ..., J, and hk(x) ≥ 0 for k = 1, ..., K.
We then introduce Lagrange multipliers {λj} and {μk}, and then optimize the Lagrangian
function given by
L(x, {λj}, {μk}) = f(x) + Σ_{j=1}^{J} λj gj(x) + Σ_{k=1}^{K} μk hk(x)    (E.12)

subject to μk ≥ 0 and μk hk(x) = 0 for k = 1, ..., K. Extensions to constrained
functional derivatives (Appendix D) are similarly straightforward. For a more detailed discussion
of the technique of Lagrange multipliers, see Nocedal and Wright (1999).
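As a sketch of the multiple-constraint case (again assuming SciPy; the specific functions are illustrative assumptions), one equality and one inequality constraint can be passed together, mirroring the Lagrangian (E.12):

    import numpy as np
    from scipy.optimize import minimize

    neg_f = lambda x: np.sum(x**2)             # maximize f(x) = -(x1^2 + x2^2 + x3^2)
    constraints = [
        {'type': 'eq',   'fun': lambda x: x[0] + x[1] + x[2] - 1},  # g1(x) = 0
        {'type': 'ineq', 'fun': lambda x: x[0] - 0.5},              # h1(x) >= 0
    ]
    res = minimize(neg_f, x0=np.array([0.6, 0.2, 0.2]), constraints=constraints)
    print(res.x)   # approx [0.5, 0.25, 0.25]: h1 is active, g1 holds exactly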