Lagrange Mult

The document summarizes Lagrange multipliers, a method for finding the extremum (maximum or minimum) of a function subject to equality constraints. It begins by stating Lagrange's theorem, which introduces Lagrange multipliers as new parameters to solve a system of equations. It then provides examples of using Lagrange multipliers to solve constrained optimization problems. Finally, it gives a short proof of Lagrange's theorem showing that the gradients of the objective function and constraints must be parallel at an extremum.

Uploaded by

shambel Znabu
The Method of Lagrange Multipliers

S. Sawyer — July 23, 2004

1. Lagrange’s Theorem. Suppose that we want to maximize (or minimize) a function of n variables

$$f(x) = f(x_1, x_2, \ldots, x_n) \quad \text{for } x = (x_1, x_2, \ldots, x_n) \tag{1.1a}$$

subject to p constraints

$$g_1(x) = c_1, \quad g_2(x) = c_2, \quad \ldots, \quad \text{and } g_p(x) = c_p \tag{1.1b}$$
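As an example for p = 1, find

$$\min_{x_1,\ldots,x_n} \left\{ \sum_{i=1}^n x_i^2 \;:\; \sum_{i=1}^n x_i = 1 \right\} \tag{1.2a}$$

or for p = 2

$$\min_{x_1,\ldots,x_5} \sum_{i=1}^5 x_i^2 \quad \text{subject to} \quad \begin{cases} x_1 + 2x_2 + x_3 = 1 \quad \text{and} \\ x_3 - 2x_4 + x_5 = 6 \end{cases} \tag{1.2b}$$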
A first guess for (1.1) (with $f(x) = \sum_{i=1}^n x_i^2$ in (1.2)) might be to look for solutions of the n equations

$$\frac{\partial}{\partial x_i} f(x) = 0, \qquad 1 \le i \le n \tag{1.3}$$

However, this leads to $x_i = 0$ in (1.2), which does not satisfy any of the constraints.
Lagrange’s solution is to introduce p new parameters (called Lagrange Multipliers) and then solve a more complicated problem:
Theorem (Lagrange) Assuming appropriate smoothness conditions, a minimum or maximum of $f(x)$ subject to the constraints (1.1b) that is not on the boundary of the region where $f(x)$ and $g_j(x)$ are defined can be found by introducing p new parameters $\lambda_1, \lambda_2, \ldots, \lambda_p$ and solving the system

$$\frac{\partial}{\partial x_i} \left( f(x) + \sum_{j=1}^p \lambda_j g_j(x) \right) = 0, \qquad 1 \le i \le n \tag{1.4a}$$

$$g_j(x) = c_j, \qquad 1 \le j \le p \tag{1.4b}$$

This amounts to solving n+p equations for the n+p real variables in x and λ. In contrast, (1.3) has n equations for the n unknowns in x. Fortunately, the system (1.4) is often easy to solve, and is usually much easier than using the constraints to substitute for some of the $x_i$.

2. Examples. (1) There is p = 1 constraint in (1.2a), so that (1.4a) becomes

$$\frac{\partial}{\partial x_i} \left( \sum_{k=1}^n x_k^2 + \lambda \sum_{k=1}^n x_k \right) = 2x_i + \lambda = 0, \qquad 1 \le i \le n$$

with $\sum_{i=1}^n x_i = 1$. Thus $x_i = -\lambda/2$ for $1 \le i \le n$ and hence $\sum_{i=1}^n x_i = -n\lambda/2 = 1$. We conclude $\lambda = -2/n$, from which it follows that $x_i = 1/n$ for $1 \le i \le n$.
For $x_i = 1/n$, $f(x) = n/n^2 = 1/n$. One can check that this is a minimum as opposed to a maximum or saddle point by noting that $f(x) = 1$ if $x_1 = 1$, $x_i = 0$ for $2 \le i \le n$.
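As a quick numerical check of Example (1), note that the system (1.4) for (1.2a) is linear in the unknowns $(x_1, \ldots, x_n, \lambda)$, so it can be handed to a linear solver. The sketch below assumes NumPy is available. The function name `solve_example_1` and the choice n = 4 are illustrative and do not come from the notes.

```python
import numpy as np

def solve_example_1(n):
    """Solve 2*x_i + lam = 0 (i = 1..n) together with sum(x) = 1."""
    # The unknown vector is (x_1, ..., x_n, lam).
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = 2.0 * np.eye(n)   # coefficient of x_i in equation i
    A[:n, n] = 1.0                # coefficient of lam in equation i
    A[n, :n] = 1.0                # constraint row: x_1 + ... + x_n
    b = np.zeros(n + 1)
    b[n] = 1.0                    # right-hand side of the constraint
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n]

x, lam = solve_example_1(4)
f_min = float(x @ x)              # value of f at the stationary point
```

For n = 4 this should reproduce $x_i = 1/4$, $\lambda = -1/2$, and $f(x) = 1/4$, in agreement with the formulas above.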
(2) A System with Two Constraints: There are p = 2 constraints in (1.2b), which is to find

$$\min_{x_1,\ldots,x_5} \sum_{i=1}^5 x_i^2 \quad \text{subject to} \quad \begin{cases} x_1 + 2x_2 + x_3 = 1 \quad \text{and} \\ x_3 - 2x_4 + x_5 = 6 \end{cases} \tag{2.1}$$

The method of Lagrange multipliers says to look for solutions of

$$\frac{\partial}{\partial x_i} \left( \sum_{k=1}^5 x_k^2 + \lambda(x_1 + 2x_2 + x_3) + \mu(x_3 - 2x_4 + x_5) \right) = 0 \tag{2.2}$$

where we write $\lambda, \mu$ for the two Lagrange multipliers $\lambda_1, \lambda_2$.


The equations (2.2) imply $2x_1 + \lambda = 0$, $2x_2 + 2\lambda = 0$, $2x_3 + \lambda + \mu = 0$, $2x_4 - 2\mu = 0$, and $2x_5 + \mu = 0$. Combining the first three equations with the first constraint in (2.1) implies $2 + 6\lambda + \mu = 0$. Combining the last three equations in (2.2) with the second constraint in (2.1) implies $12 + \lambda + 6\mu = 0$. Thus

$$6\lambda + \mu = -2$$
$$\lambda + 6\mu = -12$$

Adding these two equations implies $7(\lambda + \mu) = -14$ or $\lambda + \mu = -2$. Subtracting the equations implies $5(\lambda - \mu) = 10$ or $\lambda - \mu = 2$. Thus $(\lambda + \mu) + (\lambda - \mu) = 2\lambda = 0$ and $\lambda = 0$, $\mu = -2$. This implies $x_1 = x_2 = 0$, $x_3 = x_5 = 1$, and $x_4 = -2$. The minimum value in (2.1) is 6.
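The arithmetic in Example (2) can be cross-checked by solving the seven linear equations (the five equations from (2.2) plus the two constraints in (2.1)) in one step. This is a minimal sketch assuming NumPy. The block-matrix layout and the variable names `G`, `c`, `A`, `b` are choices made here for illustration, not notation from the notes.

```python
import numpy as np

# Rows of G are the constraint coefficients in (2.1); c holds the right-hand sides.
G = np.array([[1.0, 2.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -2.0, 1.0]])
c = np.array([1.0, 6.0])
p, n = G.shape

# Equations (2.2) read 2*x + G^T (lam, mu) = 0; the constraints read G x = c.
A = np.block([[2.0 * np.eye(n), G.T],
              [G, np.zeros((p, p))]])
b = np.concatenate([np.zeros(n), c])

sol = np.linalg.solve(A, b)
x, lam, mu = sol[:n], sol[n], sol[n + 1]
min_value = float(x @ x)   # sum of squares at the solution
```

The solver should return $x = (0, 0, 1, -2, 1)$, $\lambda = 0$, $\mu = -2$, and a minimum value of 6, matching the hand computation.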
(3) A BLUE problem: Let $X_1, \ldots, X_n$ be independent random variables with $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma_i^2$. Find the coefficients $a_i$ that minimize

$$\mathrm{Var}\left( \sum_{i=1}^n a_i X_i \right) \quad \text{subject to} \quad E\left( \sum_{i=1}^n a_i X_i \right) = \mu \tag{2.3}$$
This asks us to find the Best Linear Unbiased Estimator $\sum_{i=1}^n a_i X_i$ (abbreviated BLUE) for $\mu$ for given values of $\sigma_i^2$.
Since $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$ and $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ for independent random variables X and Y, we have $\mathrm{Var}\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2\,\mathrm{Var}(X_i) = \sum_{i=1}^n a_i^2 \sigma_i^2$. Similarly, $E\left(\sum_{i=1}^n a_i X_i\right) = \mu \sum_{i=1}^n a_i$, so the constraint in (2.3) holds for every value of $\mu$ exactly when $\sum_{i=1}^n a_i = 1$. Thus (2.3) is equivalent to finding

$$\min \sum_{i=1}^n a_i^2 \sigma_i^2 \quad \text{subject to} \quad \sum_{i=1}^n a_i = 1$$

Using one Lagrange multiplier $\lambda$ for the constraint leads to the equations $2a_i\sigma_i^2 + \lambda = 0$ or $a_i = -\lambda/(2\sigma_i^2)$. The constraint $\sum_{i=1}^n a_i = 1$ then implies that the BLUE for $\mu$ is

$$\sum_{i=1}^n a_i X_i \quad \text{where } a_i = c/\sigma_i^2 \text{ for } c = 1 \Big/ \sum_{k=1}^n (1/\sigma_k^2) \tag{2.4}$$

If $\sigma_i^2 = \sigma^2$ for all i, then $a_i = 1/n$ and $\sum_{i=1}^n a_i X_i = (1/n)\sum_{i=1}^n X_i = \bar{X}$ is the BLUE for $\mu$.
Conversely, if $\mathrm{Var}(X_i) = \sigma_i^2$ is variable, then the BLUE $\sum_{i=1}^n a_i X_i$ for $\mu$ puts relatively less weight on the noisier (higher-variance) observations (that is, the weight $a_i$ is smaller), but still uses the information in the noisier observations. Formulas like (2.4) are often used in survey sampling.
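The inverse-variance weights in (2.4) are short to compute. The following sketch assumes NumPy. The helper names `blue_weights` and `estimator_variance` and the sample variances are hypothetical choices for illustration.

```python
import numpy as np

def blue_weights(variances):
    """Return a_i = c / sigma_i^2 with c = 1 / sum_k (1 / sigma_k^2), as in (2.4)."""
    inv_var = 1.0 / np.asarray(variances, dtype=float)
    return inv_var / inv_var.sum()

def estimator_variance(weights, variances):
    """Variance of sum_i a_i X_i for independent X_i: sum_i a_i^2 sigma_i^2."""
    w = np.asarray(weights, dtype=float)
    v = np.asarray(variances, dtype=float)
    return float(np.sum(w ** 2 * v))

variances = [1.0, 4.0]                    # second observation is noisier
a = blue_weights(variances)               # weights sum to 1
var_blue = estimator_variance(a, variances)
var_mean = estimator_variance([0.5, 0.5], variances)   # plain sample mean
equal_case = blue_weights([2.0, 2.0, 2.0, 2.0])        # equal variances
```

With variances 1 and 4 the weights come out as 0.8 and 0.2, and the resulting variance (0.8) is smaller than the variance of the unweighted mean (1.25). With equal variances the weights reduce to 1/n.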

3. A Short Proof of Lagrange’s Theorem. The extremal condition (1.3) (without any constraints) can be written in vector form as

$$\nabla f(x) = \left( \frac{\partial}{\partial x_1} f(x), \frac{\partial}{\partial x_2} f(x), \ldots, \frac{\partial}{\partial x_n} f(x) \right) = 0 \tag{3.1}$$

By Taylor’s Theorem

$$f(x + hy) = f(x) + h\,y \cdot \nabla f(x) + O(h^2) \tag{3.2}$$

where h is a scalar, $O(h^2)$ denotes terms that are bounded by a constant multiple of $h^2$, and $x \cdot y$ is the dot product. Thus the gradient $\nabla f(x)$ in (3.1) gives the vector direction in which $f(x)$ changes the most per unit change in x, where unit change is measured in terms of the length of the change in the vector x.
In particular, if $y = \nabla f(x_0) \ne 0$, then $y \cdot \nabla f(x_0) = y \cdot y > 0$ in (3.2), so that

$$f(x_0 - hy) < f(x_0) < f(x_0 + hy)$$

for sufficiently small values of $h > 0$, and the only way that $x_0$ can be a local minimum or maximum would be if $x_0$ were on the boundary of the set of points where $f(x)$ is defined. This implies that $\nabla f(x_0) = 0$ at non-boundary minimum and maximum values of $f(x)$.
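The inequality above is easy to observe numerically. The sketch below assumes NumPy and uses the sum-of-squares function from (1.2) as the test function. The point `x0` and the step `h` are arbitrary illustrative choices.

```python
import numpy as np

def f(x):
    """Sum of squares, as in (1.2)."""
    return float(np.sum(x ** 2))

def grad_f(x):
    """Gradient of the sum of squares."""
    return 2.0 * x

x0 = np.array([1.0, 2.0])      # a point where the gradient is nonzero
y = grad_f(x0)                 # step direction y = grad f(x0)
h = 1e-3                       # small positive step

below = f(x0 - h * y)
at = f(x0)
above = f(x0 + h * y)

# Remainder after the first-order Taylor term in (3.2); it should be O(h^2).
remainder = above - (at + h * float(y @ grad_f(x0)))
```

Here `below < at < above`, so `x0` cannot be an interior local minimum or maximum. For this quadratic f the remainder equals $h^2\, y \cdot y$ exactly, up to rounding.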
Now consider the problem of finding

$$\max f(x) \quad \text{subject to} \quad g(x) = c \tag{3.3}$$

for one constraint. If $x = x_1(t)$ is a path in the surface defined by $g(x) = c$, then $g(x_1(t)) = c$ for all t, and by the chain rule

$$\frac{d}{dt}\, g\big(x_1(t)\big)\Big|_{t=0} = \frac{d}{dt} x_1(0) \cdot \nabla g\big(x_1(0)\big) = 0 \tag{3.4}$$

This implies that $\nabla g\big(x_1(0)\big)$ is orthogonal to the tangent vector $(d/dt)x_1(0)$ for any path $x_1(t)$ in the surface defined by $g(x) = c$.
Conversely, if $x_0$ is any point in the surface $g(x) = c$ and y is any vector such that $y \cdot \nabla g(x_0) = 0$, then it follows from the Implicit Function Theorem that there exists a path $x_1(t)$ in the surface $g(x) = c$ such that $x_1(0) = x_0$ and $(d/dt)x_1(0) = y$. This result and (3.4) imply that the gradient vector $\nabla g(x_0)$ is always orthogonal to the surface defined by $g(x) = c$ at $x_0$.
Now let $x_0$ be a solution of (3.3). I claim that $\nabla f(x_0) = \lambda \nabla g(x_0)$ for some scalar $\lambda$. First, we can always write $\nabla f(x_0) = \lambda \nabla g(x_0) + y$ where $y \cdot \nabla g(x_0) = 0$. If $x(t)$ is a path in the surface with $x(0) = x_0$ and $(d/dt)x(0) \cdot \nabla f(x_0) \ne 0$, it follows from (3.2) with $y = (d/dt)x(0)$ that there are values of $f(x)$ for $x = x(t)$ in the surface that are both larger and smaller than $f(x_0)$.
Thus, if $x_0$ is a maximum or minimum of $f(x)$ in the surface and $\nabla f(x_0) = \lambda \nabla g(x_0) + y$ for $y \cdot \nabla g(x_0) = 0$, then a path in the surface with tangent vector y at $x_0$ shows that $y \cdot \nabla f(x_0) = 0$. Hence $y \cdot \nabla f(x_0) = \lambda\, y \cdot \nabla g(x_0) + y \cdot y = y \cdot y = 0$ and $y = 0$. This means that $\nabla f(x_0) = \lambda \nabla g(x_0)$, which completes the proof of Lagrange’s Theorem for one constraint (p = 1).
Next, suppose that we want to solve

$$\max f(x) \quad \text{subject to} \quad g_1(x) = c_1, \; \ldots, \; g_p(x) = c_p \tag{3.5}$$

for p constraints. Let $x_0$ be a solution of (3.5). Recall that each vector $\nabla g_j(x_0)$ is orthogonal to the surface $g_j(x) = c_j$ at $x_0$. Let L be the linear space

$$L = \mathrm{span}\{\, \nabla g_j(x_0) : 1 \le j \le p \,\}$$

I claim that $\nabla f(x_0) \in L$. This would imply

$$\nabla f(x_0) = \sum_{j=1}^p \lambda_j \nabla g_j(x_0)$$

for some choice of scalar values $\lambda_j$, which would prove Lagrange’s Theorem.
To prove that $\nabla f(x_0) \in L$, first note that, in general, we can write $\nabla f(x_0) = w + y$ where $w \in L$ and y is perpendicular to L, which means that $y \cdot z = 0$ for any $z \in L$. In particular, $y \cdot \nabla g_j(x_0) = 0$ for $1 \le j \le p$. Now find a path $x_1(t)$ through $x_0$ in the intersection of the surfaces $g_j(x) = c_j$ such that $x_1(0) = x_0$ and $(d/dt)x_1(0) = y$. (The existence of such a path for sufficiently small t follows from a stronger form of the Implicit Function Theorem.) It then follows from (3.2) and (3.5) that $y \cdot \nabla f(x_0) = 0$. Since $\nabla f(x_0) = w + y$ where $y \cdot w = 0$, it follows that $y \cdot \nabla f(x_0) = y \cdot w + y \cdot y = y \cdot y = 0$ and $y = 0$. This implies that $\nabla f(x_0) = w \in L$, which completes the proof of Lagrange’s Theorem.
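The decomposition $\nabla f(x_0) = w + y$ used in the proof can be carried out numerically for the two-constraint example (2.1), where the minimizer found earlier is $x_0 = (0, 0, 1, -2, 1)$. The sketch below assumes NumPy and uses a least-squares projection onto L to compute w. That projection is a convenient way to obtain the decomposition here, not a step taken from the proof.

```python
import numpy as np

# Gradients of the two constraints in (2.1); they are constant because
# the constraints are linear. Each row is one gradient.
G = np.array([[1.0, 2.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -2.0, 1.0]])

x0 = np.array([0.0, 0.0, 1.0, -2.0, 1.0])   # minimizer from Example (2)
grad_f = 2.0 * x0                            # gradient of the sum of squares

# Least-squares coefficients of grad_f in terms of the constraint gradients.
coeffs, *_ = np.linalg.lstsq(G.T, grad_f, rcond=None)
w = G.T @ coeffs        # component of grad_f lying in L
y = grad_f - w          # component perpendicular to L
```

The perpendicular part `y` should vanish, so $\nabla f(x_0) \in L$. The coefficients come out as (0, 2), which are $-\lambda$ and $-\mu$ for the values $\lambda = 0$, $\mu = -2$ found in Example (2), since (1.4a) places the multipliers on the same side as f.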

4. Warnings. The same warnings apply here as for most methods for
finding a maximum or minimum:
The system (1.4) does not look for a maximum (or minimum) of $f(x)$ subject to constraints $g_j(x) = c_j$, but only for a point x on the set of values determined by $g_j(x) = c_j$ at which the first-order changes in $f(x)$ along that set are zero. This is satisfied by a value $x = x_0$ that provides a minimum or maximum for $f(x)$ in a neighborhood of $x_0$, but this may only be a local minimum or maximum. There may be several local minima or maxima, each yielding a solution of (1.4). The criterion (1.4) also holds for “saddle points” of $f(x)$ that are local maxima in some directions or coordinates and local minima in others. In these cases, the different values $f(x)$ at the solutions of (1.4) have to be evaluated individually to find the global maximum (or minimum).
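A small illustration of this warning, using an example that is not in the notes: take $f(x, y) = xy$ subject to $g(x, y) = x^2 + y^2 = 2$. The system (1.4) then has four solutions, and only a comparison of the values of f shows which are maxima and which are minima. The candidate list below was worked out by hand and is verified, not derived, by the code.

```python
def f(x, y):
    """Hypothetical objective for this illustration."""
    return x * y

def residuals(x, y, lam):
    """Left-hand sides of (1.4) for f = x*y, g = x^2 + y^2, c = 2."""
    return (y + 2.0 * lam * x,      # d/dx (f + lam * g)
            x + 2.0 * lam * y,      # d/dy (f + lam * g)
            x * x + y * y - 2.0)    # g(x, y) - c

# Hand-derived solutions (x, y, lam) of the system.
candidates = [(1.0, 1.0, -0.5), (-1.0, -1.0, -0.5),
              (1.0, -1.0, 0.5), (-1.0, 1.0, 0.5)]

# Confirm that every candidate satisfies all three equations.
all_stationary = all(
    all(abs(r) < 1e-12 for r in residuals(*cand)) for cand in candidates
)

# Evaluate f at each solution and compare.
values = [f(x, y) for x, y, _ in candidates]
best_max = max(values)
best_min = min(values)
```

All four candidates solve the system, yet two of them give the constrained maximum f = 1 and the other two give the constrained minimum f = -1.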
A particular situation to avoid is to look for a maximum value of f (x)
by solving (1.4) or (1.3) when f (x) takes arbitrarily large values when any of
the components of x are large (as is the case for f (x) in (1.2) ) and (1.4) has a
unique solution x0 . In that case, x0 is probably the global minimum of f (x)
subject to the constraints, and not a maximum. In that case, rather than
find the best possible value of f (x), one may end up with the worst possible
value. After solving (1.3) or (1.4), one often has to look at the problem more
carefully to see if it is a global maximum, a global minimum, or neither.
Another situation to avoid is when the maximum or minimum is on the boundary of the values for which $f(x)$ is defined. In that case, the maximum or minimum is not an interior value, and the first-order changes in $f(x)$ (that is, the partial derivatives of $f(x)$) may not be zero at that point. An example is $f(x) = x$ on the unit interval $0 \le x \le 1$. The minimum of $f(x) = x$ on the interval is attained at $x = 0$ and the maximum at $x = 1$, but neither is a solution of $f'(x) = 0$.
