Optimization Problems With Constraints - The Method of Lagrange Multipliers
Lecture 13

The problem under consideration: Find the point P on the plane x + y − 2z = 6 that lies closest to the origin.

The plane x + y − 2z = 6 passes through the three points (6, 0, 0), (0, 6, 0) and (0, 0, −3) on,
respectively, the x, y and z-axes. As such, it forms a tetrahedron with the xy, yz and xz-planes that
lies below the xy-plane. We expect the desired point to lie on the part of this plane that forms a
triangular face of this tetrahedron.
The distance of a point P with coordinates (x, y, z) to the origin is given by D = √(x^2 + y^2 + z^2).
We are asked to minimize D. This is equivalent to minimizing the square of the distance, D^2 = x^2 + y^2 + z^2.
We now come to a fundamental point regarding problems with constraints: Each constraint in a
problem that can be expressed in the form
F (x, y, z) = 0 (2)
reduces the dimensionality of the problem, i.e., the number of independent variables, by one. In the
problem posed above, there are three variables, x, y and z needed to uniquely address a point P in
R3 . But the condition that the point must lie on the given plane can be written as the constraint
F (x, y, z) = x + y − 2z − 6 = 0. (3)
Therefore the number of independent variables in this problem is two. We shall rewrite this problem
as follows:
Minimize f(x, y, z) = x^2 + y^2 + z^2
subject to x + y − 2z − 6 = 0.
In this problem, as in the one considered in Lecture 11, we can use the constraint to express one
variable in terms of the other two. For example, let us consider z as a function of x and y, i.e., z(x, y):
z = (1/2)x + (1/2)y − 3. (4)
This produces a function of two variables h(x, y) that we must now minimize over R^2, i.e.,

D^2 = h(x, y) = x^2 + y^2 + ((1/2)x + (1/2)y − 3)^2, (x, y) ∈ R^2. (5)
Note: The following details were not presented in class, in order to save some time.
We now determine the critical points of h(x, y):
∂h/∂x = 2x + 2((1/2)x + (1/2)y − 3) · (1/2) (6)
∂h/∂y = 2y + 2((1/2)x + (1/2)y − 3) · (1/2)
The condition for a critical point, ∇h(x, y) = (0, 0), leads to the equations
(5/2)x + (1/2)y = 3 (7)
(1/2)x + (5/2)y = 3.
This system has the unique solution x = y = 1.
We now use the constraint to obtain z:
z = (1/2)x + (1/2)y − 3 = −2. (8)
Therefore the desired point P is (1, 1, −2). The distance between P and the origin is √(1 + 1 + 4) = √6.
We should really check that the above point represents the absolute minimum. Firstly, the point
(x, y) = (1, 1) was found to be the only critical point of h(x, y) defined above. A look at the second-
order derivatives at P gives

A = h_xx = 5/2, B = h_xy = 1/2, C = h_yy = 5/2,

so that B^2 − AC = −6 < 0. Since A > 0, we have that (1, 1) is a relative minimum. Since the function
h(x, y) cannot be negative and can assume arbitrarily large values, we can conclude that (1, 1) is an
absolute minimum.
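For readers who want to check this computation, here is a small verification using Python's SymPy library (a sketch, not part of the original lecture):

    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    # The reduced two-variable function h(x, y) from Eq. (5).
    h = x**2 + y**2 + (sp.Rational(1, 2)*x + sp.Rational(1, 2)*y - 3)**2

    # Critical points: solve grad h = (0, 0).
    crit = sp.solve([sp.diff(h, x), sp.diff(h, y)], [x, y], dict=True)
    print(crit)  # [{x: 1, y: 1}]

    # Second derivative test at (1, 1): A = h_xx, B = h_xy, C = h_yy.
    A = sp.diff(h, x, 2).subs(crit[0])
    B = sp.diff(h, x, y).subs(crit[0])
    C = sp.diff(h, y, 2).subs(crit[0])
    print(A, B, C, B**2 - A*C)  # 5/2 1/2 5/2 -6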
(We can add more constraints, e.g., G(x, y, z) = 0, and will do so later.)

Here’s the final result – we’ll provide a brief derivation later:

1. Construct the Lagrangian function L(x, y, z, λ) = f(x, y, z) + λF(x, y, z), where the new variable λ is called the Lagrange multiplier.

2. Now minimize or maximize the function L(x, y, z, λ) with respect to the four variables (x, y, z, λ) ∈ R^4.
You might be saying to yourself, “Wow! It doesn’t look like we’ve simplified the problem. We
now have four variables over which to optimize. We actually went down from three to two variables
when we used the constraint to remove a variable. What gives?”
The answer is that the method of Lagrange multipliers is a general method that is effective in
solving a wide variety of problems. It may not always be possible to express one variable in terms
of the others (recall our discussion on implicit functions). Furthermore, the method of Lagrangians
is very useful in more general or abstract problems involving an arbitrary number of independent
variables and/or constraints. For example, in a future course or courses in Physics (e.g., thermal
physics, statistical mechanics), you should see a derivation of the famous “Boltzmann distribution” of
the energies of atoms in an ideal gas using Lagrange multipliers.
Example: Let us return to the optimization problem with constraints discussed earlier: Find the
point P on the plane x + y − 2z = 6 that lies closest to the origin. Recall that we sought to minimize
the square of the distance:
Minimize f(x, y, z) = x^2 + y^2 + z^2
subject to x + y − 2z − 6 = 0.
The Lagrangian for this problem is

L(x, y, z, λ) = f(x, y, z) + λF(x, y, z) = x^2 + y^2 + z^2 + λ(x + y − 2z − 6).
We must find the critical points of L in terms of the four variables x, y, z and λ:
∂L/∂x = 2x + λ = 0 (12)
∂L/∂y = 2y + λ = 0
∂L/∂z = 2z − 2λ = 0
∂L/∂λ = x + y − 2z − 6 = 0.
Note that the final equation simply corresponds to the constraint applied to the problem. Clever, eh?
There are often several ways to solve problems involving Lagrangians and Lagrange multipliers. The
most important point to remember is that no single method works for all problems. In this
case, we can find the critical point rather easily as follows. We use the first three equations to express
x, y and z in terms of λ:
x = −λ/2, y = −λ/2, z = λ. (13)
We now substitute these results into the fourth equation:
−λ/2 − λ/2 − 2λ − 6 = 0 ⟹ 3λ = −6, (14)
which implies that λ = −2. From the above three equations, we have determined x, y and z:
x = 1, y = 1, z = −2. (15)
Therefore the desired point is (1, 1, −2), which is in agreement with the result obtained in the previous
lecture.
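The same answer can be obtained by handing the system (12) to a computer algebra system. A minimal SymPy sketch (our own check, not part of the lecture):

    import sympy as sp

    x, y, z, lam = sp.symbols('x y z lambda', real=True)
    # The Lagrangian L = f + lambda * F for this problem.
    L = x**2 + y**2 + z**2 + lam*(x + y - 2*z - 6)

    # Set all four partial derivatives to zero and solve.
    eqs = [sp.diff(L, v) for v in (x, y, z, lam)]
    print(sp.solve(eqs, [x, y, z, lam], dict=True))
    # [{x: 1, y: 1, z: -2, lambda: -2}]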
Example: Find the maximum and minimum values of f(x, y, z) = xyz on the ellipsoid

x^2 + 2y^2 + 3z^2 = 1. (16)

The ellipsoid represents the constraint in this problem. We first express this constraint in the form
F(x, y, z) = 0, i.e.,

F(x, y, z) = x^2 + 2y^2 + 3z^2 − 1 = 0. (17)
The Lagrangian is

L(x, y, z, λ) = xyz + λ(x^2 + 2y^2 + 3z^2 − 1). (18)

The critical points of the Lagrangian must satisfy the following equations:

∂L/∂x = yz + 2λx = 0 (a) (19)
∂L/∂y = xz + 4λy = 0 (b)
∂L/∂z = xy + 6λz = 0 (c)

The final condition, ∂L/∂λ = 0, yields the constraint.
Once again, we’re faced with the problem of solving this system of equations, which is now
nonlinear. Here is a “trick” that works because of the symmetry of the problem. (It won’t always
work!) Multiply the first equation by x, the second by y and the third by z:

xyz + 2λx^2 = 0 (d)
xyz + 4λy^2 = 0 (e)
xyz + 6λz^2 = 0 (f)
There are a number of possible paths to pursue, but we consider the following: Since there is an xyz
term in each equation, we can equate the other terms, i.e.,

−2λx^2 = −4λy^2 = −6λz^2,

and then remove the minus signs. Or we can subtract (e) from (d),
then (f) from (e), and finally (f) from (d). The final result is the same:

2λx^2 = 4λy^2 = 6λz^2,

or

λx^2 = 2λy^2 = 3λz^2. (24)
It is tempting to divide out the λ, but we must consider the possibility that λ = 0.
Case 1: λ = 0. It follows, from (a), (b) and (c), that at least two of x, y and z are zero, implying that
f(x, y, z) = 0. We can easily solve for these points, since they lie on the ellipsoid:

(±1, 0, 0), (0, ±1/√2, 0), (0, 0, ±1/√3). (25)

They lie at the extreme top, bottom and sides of the ellipsoid. At all of these six points, f(x, y, z) = 0.
Case 2: λ ≠ 0. Dividing (24) through by λ yields

x^2 = 2y^2 = 3z^2. (26)

The point (0, 0, 0) is inadmissible, since it does not lie on the ellipsoid. Since the desired point must lie on the
ellipsoid, we set

x^2 = 2y^2 = 3z^2 = t. (27)
Substitution into the equation for the ellipsoid yields

x^2 + 2y^2 + 3z^2 = 3t = 1, implying that t = 1/3. (28)

Therefore x^2 = 1/3, y^2 = 1/6 and z^2 = 1/9, giving the eight points (±1/√3, ±1/√6, ±1/3). At these
points, f(x, y, z) = xyz = ±1/(9√2): the four sign combinations with an even number of negative
coordinates yield the maximum value 1/(9√2), and the remaining four yield the minimum value −1/(9√2).
Here we present another method of solving the above problem. Admittedly, it is a longer method.
But it is based on another idea that may be useful in other situations. It’s always helpful to be able
to consider more than one method for the solution of a problem.
We start again with Eqs. (d), (e) and (f) above. Instead of removing the xyz terms, we add up
both sides of the three equations:

3xyz + 2λ(x^2 + 2y^2 + 3z^2) = 0.

Since x^2 + 2y^2 + 3z^2 = 1 on the ellipsoid, this reduces to 3xyz + 2λ = 0, i.e., xyz = −(2/3)λ.
Now multiply both sides of (a) by x:

xyz + 2λx^2 = 0 ⟹ −(2/3)λ + 2λx^2 = 0 ⟹ 2λ(x^2 − 1/3) = 0.

There are two possibilities:
1. λ = 0 or
2. x^2 = 1/3, which implies that x = ±1/√3.
Now multiply both sides of (b) by y:

xyz + 4λy^2 = 0 ⟹ −(2/3)λ + 4λy^2 = 0 ⟹ 4λ(y^2 − 1/6) = 0.

Once again, there are two possibilities:
1. λ = 0 or
2. y^2 = 1/6, which implies that y = ±1/√6.
Finally, multiply both sides of (c) by z:

xyz + 6λz^2 = 0 ⟹ −(2/3)λ + 6λz^2 = 0 ⟹ 6λ(z^2 − 1/9) = 0.

There are again two possibilities:
1. λ = 0 or
2. z^2 = 1/9, which implies that z = ±1/3.
We see that the case λ ≠ 0 yields the same points that were found by the previous method. And
the case λ = 0 can be treated in the same way as before. This illustrates that there may be more than
one way to solve a problem involving Lagrange multipliers.
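Since the system (a)-(c) is nonlinear, it is reassuring to confirm the solution by machine. The following SymPy sketch (assuming f(x, y, z) = xyz, as above) finds all the critical points and the extreme values of f:

    import sympy as sp

    x, y, z, lam = sp.symbols('x y z lambda', real=True)
    L = x*y*z + lam*(x**2 + 2*y**2 + 3*z**2 - 1)

    sols = sp.solve([sp.diff(L, v) for v in (x, y, z, lam)],
                    [x, y, z, lam], dict=True)
    print(len(sols))  # expect 14: six points with lambda = 0, eight with lambda != 0
    print({sp.simplify(s[x]*s[y]*s[z]) for s in sols})
    # {0, -sqrt(2)/18, sqrt(2)/18}, i.e., f ranges over +-1/(9*sqrt(2)) and 0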
Note that in both of the examples examined above, we did not bother with the task of determining
whether or not the critical points of the Lagrangian L, hence the function f , were local minima, maxima
or saddle points. (In fact, we really couldn’t do this because we haven’t covered the second derivative
test for functions of more than two variables!) Just as in the earlier discussion of finding absolute
maxima and minima of functions on restricted domains, it is sufficient to find critical points and
evaluate the functions at these critical points, from which we can determine the absolute maximum
and/or minimum values. That being said, it may be necessary, in some cases, to examine the behaviour
of a function for arbitrarily large values of the independent variables x, y, etc., to ensure that we have,
in fact, found an absolute maximum or minimum (or both).
Lecture 14
Consider the general problem:

maximize/minimize f(x, y)
subject to F(x, y) = 0.
The constraint F (x, y) = 0 defines a curve C in R2 as sketched in the figure below. The points
(x, y) that lie on C represent the only allowable values of (x, y) that may be considered in the evaluation
of f (x, y). (For example, recall the path of the ant on the hotplate.) Associated with each point P
(x, y) on this curve C is the point Q (x, y, f (x, y)) that lies on the graph of f (x, y), in other words, the
surface z = f (x, y). As the point P moves along the xy-plane, it traces out a curve D in R3 which
lies on the surface z = f (x, y).
[Figure: the constraint curve C in the xy-plane, defined by F(x, y) = 0, and the curve D it generates on the surface z = f(x, y). Each point P = (x, y) on C corresponds to the point Q = (x, y, f(x, y)) on D.]
We are interested in finding maximum or minimum values of f(x, y) evaluated over the curve C.
At the points where such extrema can occur, the rate of change of f(x, y) in the direction of the curve
C must be zero. But the direction of the curve at a point P on the curve is given by the tangent to
the curve at P. Therefore, it follows that the condition for a maximum or minimum of f(x, y) is that
the directional derivative of f(x, y) in the direction of the tangent vector to curve C is zero.
To see this a little more clearly, let us assume that we can parametrize the curve C as (x(t), y(t)).
As we move over the curve C, we consider the value of f (x(t), y(t)) evaluated at points on C. We now
look for points at which the rate of change of f(x(t), y(t)) is zero, in other words, the total derivative

df/dt = 0. (39)

By the Chain Rule,

(d/dt) f(x(t), y(t)) = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt) = 0. (40)

In terms of the gradient ∇f and the tangent vector v = (dx/dt, dy/dt) to the curve, this condition reads

∇f · v = 0. (41)
This again verifies that the directional derivative of f in the direction of v is zero. We shall return to
this result.
Now recall that we are moving along the curve C (x(t), y(t)) because of the constraint F (x, y) = 0.
This means that
F (x(t), y(t)) = 0, for all t. (42)
In other words, the value of F (x, y) is constant for all t: After all, curve C is a zero-level set for the
function F ! This means that
dF/dt = 0. (43)

But the total derivative of F with respect to t is given by

(d/dt) F(x(t), y(t)) = (∂F/∂x)(dx/dt) + (∂F/∂y)(dy/dt) = 0, (44)

which may be written as

∇F · v = 0, for all t. (45)
Eqs. (41) and (45) imply that the two vectors ∇f and ∇F are perpendicular to the tangent vector
v at a relative maximum or minimum of f, as measured over curve C. The only way that two vectors
a and b in the plane can both be perpendicular to a given vector u is that one is a multiple of the other,
i.e., b = Ka for some constant K. Therefore, at the critical point of f as measured over C, we have
the result that

∇f = K∇F. (46)
This condition may be rearranged as

∇[f(x, y) − KF(x, y)] = (0, 0). (47)

The expression in brackets is the Lagrangian function associated with this optimization problem with
constraint, with λ = −K:

L(x, y, λ) = f(x, y) + λF(x, y). (48)
In summary, the condition for a relative maximum or minimum of f (x, y) as evaluated over the
curve defined by the constraint F (x, y) = 0 is that the point be a critical point of the Lagrangian
function L(x, y, λ) associated with the optimization problem, i.e.
∂L/∂x = ∂L/∂y = 0. (49)

The remaining condition,

∂L/∂λ = F(x, y) = 0, (50)

simply recovers the constraint.
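To make the condition (46) concrete, here is a tiny two-variable illustration: our own toy example (not from the notes) of minimizing f(x, y) = x^2 + y^2 on the line x + y − 2 = 0, confirming that the gradients of f and F are parallel at the constrained minimum:

    import sympy as sp

    x, y, lam = sp.symbols('x y lambda', real=True)
    f = x**2 + y**2          # function to minimize
    F = x + y - 2            # constraint F(x, y) = 0
    L = f + lam*F            # Lagrangian

    sol = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)[0]
    grad_f = [sp.diff(f, v).subs(sol) for v in (x, y)]
    grad_F = [sp.diff(F, v).subs(sol) for v in (x, y)]
    print(sol)             # {x: 1, y: 1, lambda: -2}
    print(grad_f, grad_F)  # [2, 2] and [1, 1]: grad f = 2 grad F, so K = 2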
Note: You will probably see that Eq. (46) is how Stewart defines the method of Lagrange multipliers
in his textbook. This way of approaching the problem is perfectly fine – one is simply bypassing the
formal construction of the Lagrangian function L and then searching for critical points. We have
included the formal definition of L in this course because it represents the more “traditional” way of
discussing the subject, especially in Science and Engineering-oriented books. It may also be an easier
approach to remember.
The above discussion may still seem somewhat mysterious. A look at the level sets of the function
f(x, y) being optimized, and their relation to the constraint curve C, will help. (This examination
was actually presented first in the lecture, before the above analysis.)
Let’s consider the following scenario, as sketched graphically below. Some level sets of the function
f (x, y) are drawn, along with the curve C that is produced by the constraint F (x, y) = 0. As an
observer travels along C, the values of f measured by the observer will be given by the values of the
level sets that the observer intersects along the journey.
Note that there is one special level set of f that intersects the curve C at only one point P ,
namely, the level set corresponding to f (x, y) = 35. As one approaches this point of intersection from
one side of P , the observed value of f increases until it attains the value 35 at P . As we continue,
therefore moving away from P, the observed value of f then decreases. As a result, we have located
a local maximum point of f subject to the constraint: It is the point P.

[Figure: some level sets of f(x, y), with values 5, 10, 20, 30, 35 and 40, together with the constraint curve C. The curve C crosses the lower level sets but touches the level set f(x, y) = 35 at the single point P.]
There is something special about this point of intersection, P : You’ll see that the two curves have
a common tangent vector. If they didn’t, then the curve C would cross the level set of f at 35
and continue to encounter higher values of f , in the same way that it crosses the level set of f at 30
and then proceeds to move on, finding larger values of f .
If these two curves have a common tangent vector at point P, it follows that their normal vectors
are either parallel (point in the same direction) or antiparallel (point in opposite directions). But what
are these normal vectors?

1. For the function f: The level set f(x, y) = 35 is a level curve of f. As discussed earlier in the
course, the gradient ∇f (and its multiples) is normal to this level curve.

2. For the function F: We are looking for the normal vector to the curve C defined by F(x, y) = 0.
But, as mentioned earlier, this is the zero-level set of F. Therefore ±∇F (and multiples) are
normal vectors.
The parallelism of these normal vectors therefore implies that

∇f = K∇F (51)

at the critical point P, for some constant K ∈ R. But this is the same result as in Eq. (46)!
An important note: The gradient vector field of f, namely ∇f, is fixed in space, as is
the gradient vector field of F over the curve C. It is at those special points of intersection where Eq.
(51) is satisfied, however, that local maxima/minima of f subject to the constraint are located.
Extension to higher dimensions
The above discussion can be extended to treat optimization problems for functions of more variables,
e.g., f (x, y, z). In higher dimensions, however, the situation is a little more complicated – for example,
if two vectors are perpendicular to a given vector v, it doesn’t follow that one is a multiple of the
other. In R3 , they must lie on the plane that has v as its normal. In any case, the above discussion
will hopefully give some idea as to why the Lagrangian method works.
The method of Lagrange multipliers can also accommodate several constraints. Each constraint will
require its own Lagrange multiplier. For example, the Lagrangian associated with the optimization problem
with two constraints
maximize/minimize f (x, y, z)
subject to F (x, y, z) = 0 and G(x, y, z) = 0 ,
is given by
L(x, y, z, λ, µ) = f (x, y, z) + λF (x, y, z) + µG(x, y, z). (52)
The critical points of L satisfy

∂L/∂x = 0, ∂L/∂y = 0, ∂L/∂z = 0, ∂L/∂λ = 0, ∂L/∂µ = 0. (53)
Example: Find the minimum value of the function

f(x, y, z) = x^2 + 2y^2 + z^2 (54)

subject to the two constraints

x + 2y + 3z = 1 (55)
x − 2y + z = 5.
Before we outline the solution of this problem, let us step back for a moment and consider its
geometrical interpretation. The level sets of f (x, y, z) are concentric ellipsoids centered at the origin
(0, 0, 0). As we move outward, the values associated with these level sets increase.
The two constraints represent equations of planes. The fact that they must be satisfied simulta-
neously means that the planes are intersecting – in this case, their intersection produces a line in R3 .
As one moves along this line, closer and closer to the origin, the value of f on the line will decrease
as it intersects level sets associated with lower and lower values of f. If the line actually were to go
through the origin (which is not the case, since (0, 0, 0) does not satisfy either of the two equations),
the value of f evaluated on the line would go to zero. At some point, a minimal value of f will be
attained, and the values will begin to increase.
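This geometric picture can be made concrete: parametrize the line of intersection of the two planes and watch f decrease and then increase along it. A short SymPy sketch (the parametrization by z = t is our own choice):

    import sympy as sp

    t = sp.symbols('t', real=True)
    x, y, z = sp.symbols('x y z', real=True)
    # Solve the two plane equations for x and y in terms of z, then set z = t.
    line = sp.solve([x + 2*y + 3*z - 1, x - 2*y + z - 5], [x, y])
    xt, yt = line[x].subs(z, t), line[y].subs(z, t)

    # f restricted to the line is a quadratic in t.
    f_line = sp.expand(xt**2 + 2*yt**2 + t**2)
    t_min = sp.solve(sp.diff(f_line, t), t)[0]
    print(xt, yt)                        # 3 - 2*t and -1 - t/2
    print(t_min, f_line.subs(t, t_min))  # 10/11 and the minimum value 71/11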
We now construct the Lagrangian. First express the constraints in the form F(x, y, z) = 0 and
G(x, y, z) = 0:

F(x, y, z) = x + 2y + 3z − 1 = 0 (56)
G(x, y, z) = x − 2y + z − 5 = 0.

The Lagrangian is then

L(x, y, z, λ, µ) = x^2 + 2y^2 + z^2 + λ(x + 2y + 3z − 1) + µ(x − 2y + z − 5). (57)
Its critical points satisfy

∂L/∂x = 0 : 2x + λ + µ = 0 (58)
∂L/∂y = 0 : 4y + 2λ − 2µ = 0
∂L/∂z = 0 : 2z + 3λ + µ = 0
∂L/∂λ = 0 : x + 2y + 3z − 1 = 0
∂L/∂µ = 0 : x − 2y + z − 5 = 0.
Once again, we use the first three equations from above to express x, y and z in terms of λ and µ:
x = −(1/2)(λ + µ), y = −(1/4)(2λ − 2µ), z = −(1/2)(3λ + µ). (59)
Now substitute these results into the final two equations, which represent the constraints:
−6λ − µ = 1 (60)
−λ − 2µ = 5.
The solution of this system is

λ = 3/11, µ = −29/11. (61)
Substitution into (59) then gives

x = 13/11, y = −16/11, z = 10/11. (62)
This is the only critical point for this problem. At this point, f(x, y, z) = 71/11. This must correspond
to a global minimum, since f(x, y, z) can assume arbitrarily large values by letting x, y and z become
arbitrarily large while they satisfy the two constraints.
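As before, the linear system (58) is easy to check by machine. A SymPy sketch:

    import sympy as sp

    x, y, z, lam, mu = sp.symbols('x y z lambda mu', real=True)
    f = x**2 + 2*y**2 + z**2
    L = f + lam*(x + 2*y + 3*z - 1) + mu*(x - 2*y + z - 5)

    sol = sp.solve([sp.diff(L, v) for v in (x, y, z, lam, mu)],
                   [x, y, z, lam, mu], dict=True)[0]
    print(sol)          # {x: 13/11, y: -16/11, z: 10/11, lambda: 3/11, mu: -29/11}
    print(f.subs(sol))  # 71/11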
The discussion in this section is intended to be brief. We outline the application of the method of La-
grange multipliers to a fundamental problem in Statistical Mechanics: finding the most probable dis-
tribution of energies assumed by a system of atoms or molecules. From now on, we simply refer to
these particles as molecules.
Consider a system of N independent, identical and distinguishable atoms or molecules, for ex-
ample, a container of oxygen gas. By “distinguishable,” we mean that we can index each molecule
uniquely and keep track of it. We assume that each molecule can exist in one of n states, 1, 2, · · · , n,
with respective energies E1 , E2 , · · · , En . (Examples of these energies: the electronic energies that can
be assumed by an atom, the vibrational energies of a diatomic molecule.) We’ll also let Nk denote the
“occupation number” of the kth state, i.e. the number of molecules in that state.
State            1    2    ···   n
Energy           E1   E2   ···   En
Occupation No.   N1   N2   ···   Nn

The total number of molecules is fixed, i.e.,

N1 + N2 + · · · + Nn = N. (63)
We also impose the condition that the total energy of the system is constant, i.e.,
N1 E1 + N2 E2 + · · · + Nn En = E. (64)
How many different ways are there of arranging N different molecules among these different
states?
This is a combinatorial problem with which you may be familiar. The answer is
W(N1, N2, · · ·, Nn) = N! / (N1! N2! · · · Nn!). (65)
As an example, we consider the very simple case, N = 2, n = 2, i.e., two molecules and two energy
states. There are four ways to arrange the molecules: both molecules in state 1, both molecules in
state 2, and the two arrangements in which one molecule sits in each state.
This is in agreement with the above formula, since the numbers of ways W(N1, N2) corresponding
to all possible (N1, N2) values are

W(2, 0) = 2!/(2! 0!) = 1, W(0, 2) = 2!/(0! 2!) = 1, W(1, 1) = 2!/(1! 1!) = 2. (66)
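The multinomial coefficient (65) is easy to compute directly; a short Python sketch:

    from math import factorial

    def W(*occupation):
        """Number of ways to distribute N = sum(occupation) distinguishable
        molecules so that occupation[k] of them occupy state k."""
        ways = factorial(sum(occupation))
        for Nk in occupation:
            ways //= factorial(Nk)
        return ways

    print(W(2, 0), W(0, 2), W(1, 1))  # 1 1 2, for a total of 4 arrangements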
The claim of Statistical Mechanics is that the “equilibrium” distribution of energies corresponds
to the set of occupation numbers (N1 , N2 , · · · , Nn ) such that the function W (N1 , N2 , · · · , Nn ) is max-
imized, subject to the constraints in (63) and (64). This problem seems to be perfectly suited for the
method of Lagrange multipliers.
There is just one minor technicality, however. The quantities Nk are integers, whereas all of
our discussion of multivariable functions and their optimization has been based on the assumption that the
independent variables are continuous real variables.
For very large N, as is the case for a container of gas molecules (N is on the order of Avogadro’s
number, roughly 10^24), the variation of each Nk over consecutive integer values may be viewed as
an infinitesimal change in comparison to N. As such, we may consider the Nk as continuous real
variables, which then allows for differentiation.
Because the functional form of W involves products and quotients, it is convenient to consider
the logarithm of W, i.e.,

ln W = ln(N!) − Σ_{k=1}^{n} ln(Nk!). (67)

We now apply Stirling’s approximation,

ln(Nk!) ≈ Nk ln(Nk) − Nk, (68)

to obtain

ln W = ln(N!) − Σ_{k=1}^{n} [Nk ln(Nk) − Nk], (69)

where we have replaced the approximation sign with an equality. Note that the first term on the RHS
of (69) is a constant, which will disappear after partial differentiation with respect to the Nk.
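The quality of Stirling's approximation for large N is easy to check numerically, using the fact that ln(N!) = lgamma(N + 1) (a quick sketch):

    from math import lgamma, log

    for N in (10, 100, 10**4, 10**6):
        exact = lgamma(N + 1)          # ln(N!)
        stirling = N * log(N) - N      # Stirling's approximation (68)
        print(N, exact, stirling, (exact - stirling) / exact)
    # The relative error shrinks as N grows; it is already tiny at N = 10^6.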
The Lagrangian associated with this optimization problem with two constraints will require two La-
grange multipliers, λ and µ. It is given by

L(N1, N2, · · ·, Nn, λ, µ) = ln(N!) − Σ_{k=1}^{n} [Nk ln(Nk) − Nk] + λ(Σ_{k=1}^{n} Nk − N) + µ(Σ_{k=1}^{n} Nk Ek − E). (70)
The condition for a critical point,

∂L/∂Ni = 0, (71)

yields

−ln(Ni) + λ + µEi = 0, (72)

which may be solved for Ni:

Ni = e^λ e^{µEi}. (73)
If we now impose the first constraint (63),

Σ_{i=1}^{n} Ni = e^λ Σ_{i=1}^{n} e^{µEi} = N, (74)

we may eliminate e^λ and write the fraction of molecules in state i as

pi = Ni/N = e^{µEi} / Σ_{j=1}^{n} e^{µEj}.

A thermodynamic argument, omitted here, identifies the remaining multiplier as µ = −1/(kT), where
T is the temperature and k is Boltzmann’s constant, so that pi ∝ e^{−Ei/(kT)}. In whichever form we
choose to write it, these equations characterize what is known as the “Boltzmann distribution.”
In most applications, particularly those involving higher dimensions, e.g., R3 , one must take into
consideration the energy degeneracy of states, i.e., there are several ways that a state with a particular
energy can be formed. For example, if we consider the kinetic energy of a molecule in R3 , then there
are many velocity vectors v which have a given speed v – in fact, there is an entire sphere of such
vectors, with radius v.
As such, a weighting function must be introduced into the above formula. For simplicity, we
simply write the form associated with the above discrete case:
pi = Ni/N = gi e^{−Ei/kT} / Σ_{j=1}^{n} gj e^{−Ej/kT}, (81)
where gi is the degeneracy of the ith state.
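To illustrate Eq. (81), here is a small numerical sketch for a hypothetical three-level system; the energies, degeneracies and temperature below are made-up values for illustration only:

    import math

    k_B = 1.380649e-23            # Boltzmann's constant, J/K
    T = 300.0                     # temperature, K (assumed)
    E = [0.0, 2.0e-21, 5.0e-21]   # state energies in joules (hypothetical)
    g = [1, 3, 5]                 # degeneracies g_i (hypothetical)

    # Weights g_i * exp(-E_i / kT) and the normalizing sum in (81).
    w = [gi * math.exp(-Ei / (k_B * T)) for gi, Ei in zip(g, E)]
    Z = sum(w)
    p = [wi / Z for wi in w]
    print(p, sum(p))  # the probabilities p_i; they sum to 1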
With some work, one can then apply this idea to determine the (continuous) distribution of
speeds in an ideal gas. The complication is that we must now consider the components of the mo-
menta/velocities of the gas molecules in three directions and how they combine to yield a velocity,
and therefore a speed. We simply state the final result: the distribution of speeds, now considered as
a continuous variable v ∈ [0, ∞), is given by the so-called “Maxwell-Boltzmann distribution” that
you probably saw in your first-year Chemistry course. Some plots of this distribution for different
temperatures T are shown below. (The plot was taken from the wikipedia.org site.)

[Figure: Maxwell-Boltzmann speed distributions for several temperatures T, from wikipedia.org.]