
Lagrange Multipliers

Lagrange multipliers allow constrained optimization problems to be solved by transforming them into unconstrained problems. The key steps are: [1] Expressing the constraints to parameterize all possible solutions. [2] Defining an objective function over the free parameters, which has critical points corresponding to the original problem. [3] Noting that the critical points of the Lagrangian (objective plus constraints) satisfy the necessary conditions of both problems. This approach extends to non-linear equality constraints using the implicit function theorem.

Uploaded by

vip_thb_2007
Copyright
© © All Rights Reserved

Lagrange Multipliers

May 16, 2020

Abstract
We consider a special case of Lagrange multipliers for constrained
optimization. The class quickly sketched the “geometric” intuition for
Lagrange multipliers, and this note considers a short algebraic derivation.
In order to minimize or maximize a function with linear constraints, we consider
finding the critical points (which may be local maxima, local minima, or saddle
points) of
$$f(x) \quad \text{subject to} \quad Ax = b.$$
Here $f : \mathbb{R}^d \to \mathbb{R}$ is a convex (or concave) function, $x \in \mathbb{R}^d$, $A \in \mathbb{R}^{n \times d}$, and
$b \in \mathbb{R}^n$. To find the critical points, we cannot just set the derivative of the
objective equal to $0$ (see footnote 1). The technique we consider is to turn the problem from a
constrained problem into an unconstrained problem using the Lagrangian,

$$L(x, \mu) = f(x) + \mu^T (Ax - b), \quad \text{in which } \mu \in \mathbb{R}^n.$$

We'll show that every critical point $x$ of the constrained problem, paired with
a suitable $\mu$, is a critical point of $L(x, \mu)$.

Finding the Space of Solutions. Assume the constraints are satisfiable,
then let $x_0$ be such that $Ax_0 = b$. Let $\operatorname{rank}(A) = r$, then let $\{u_1, \dots, u_k\}$ be an
orthonormal basis for the null space of $A$, in which $k = d - r$. Note if $k = 0$, then
$x_0$ is uniquely defined (it is the only feasible point). So we consider $k > 0$.
We write this basis as a matrix:

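$$U = [u_1, \dots, u_k] \in \mathbb{R}^{d \times k}$$
Since the columns of $U$ form a basis for the null space of $A$, any feasible point
(any $x$ with $Ax = b$) can be written as $x = x_0 + Uy$ for some $y \in \mathbb{R}^k$. This
captures all the free parameters of the solution. Thus, we consider the function:

$$g(y) = f(x_0 + Uy), \quad \text{in which } g : \mathbb{R}^k \to \mathbb{R}.$$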

The critical points of $g$ correspond to the critical points of $f$ restricted to
the constraint set. Notice that $g$ is unconstrained, so we can use standard
calculus to find its critical points:

$$\nabla_y g(y) = 0, \quad \text{equivalently} \quad U^T \nabla f(x_0 + Uy) = 0.$$
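The parameterization above can be sketched numerically. What follows is a minimal sketch, assuming a hypothetical objective $f(x) = \tfrac{1}{2}\|x - c\|^2$ and hypothetical data `A`, `b`, `c` (none of which come from this note). It builds `x0` and `U` with NumPy and solves $U^T \nabla f(x_0 + Uy) = 0$ in closed form, which is possible only because this particular $f$ is quadratic.

```python
import numpy as np

# Hypothetical problem data (an assumption, not from the note).
A = np.array([[1.0, 1.0, 1.0]])   # n = 1 constraint, d = 3 variables
b = np.array([1.0])
c = np.array([1.0, 2.0, 3.0])


def grad_f(x):
    """Gradient of the assumed objective f(x) = 0.5 * ||x - c||^2."""
    return x - c


def null_space_basis(A, tol=1e-12):
    """Orthonormal basis for null(A), read off the SVD of A."""
    _, s, vt = np.linalg.svd(A)      # vt is d x d (full_matrices=True)
    r = int(np.sum(s > tol))         # numerical rank of A
    return vt[r:].T                  # shape (d, k) with k = d - r


# A particular solution of A x0 = b (the minimum-norm one).
x0 = np.linalg.lstsq(A, b, rcond=None)[0]
U = null_space_basis(A)

# grad g(y) = U^T (x0 + U y - c) = 0; since U^T U = I this gives y directly.
y = U.T @ (c - x0)
x_star = x0 + U @ y
```

For a general differentiable $f$ the last step would be replaced by any unconstrained solver applied to $g(y) = f(x_0 + Uy)$.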


Footnote 1: See the example at the end of this document.

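To make sure the types are clear: $\nabla_y g(y) \in \mathbb{R}^k$, $\nabla f(z) \in \mathbb{R}^d$, and $U \in \mathbb{R}^{d \times k}$.
In both cases, $0$ is the zero vector in $\mathbb{R}^k$.
The above condition says that if $y$ is a critical point for $g$, then $\nabla f(x)$, with
$x = x_0 + Uy$, must be orthogonal to every column of $U$. However, the columns of $U$
form a basis for the null space of $A$, and the row space is orthogonal to it. In
particular, any element of the row space can be written $z = A^T \mu \in \mathbb{R}^d$ for some
$\mu \in \mathbb{R}^n$. We verify that $z$ and any null-space vector $u = Uy$ are orthogonal since:

$$z^T u = \mu^T A u = \mu^T 0 = 0$$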

Since we can decompose $\mathbb{R}^d$ as a direct sum of $\operatorname{null}(A)$ and the row space of $A$, we
know that any vector orthogonal to the columns of $U$ must be in the row space. We
can rewrite this orthogonality condition as follows: there is some $\mu \in \mathbb{R}^n$
(depending on $x$) such that
$$\nabla f(x) + A^T \mu = 0$$
for a certain $x$ such that $Ax = A(x_0 + Uy) = Ax_0 = b$. (If $\nabla f(x) = A^T \nu$, take $\mu = -\nu$.)
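The row-space condition can be checked numerically. The sketch below assumes the hypothetical objective $f(x) = \tfrac{1}{2}\|x - c\|^2$ with hypothetical data `A`, `b`, `c` (not from this note). It fits $\mu$ by least squares and reports how far $\nabla f(x) + A^T \mu$ is from zero, once at the constrained critical point and once at another feasible point. The critical point `x_crit` is obtained from the projection formula for this specific quadratic, which is an assumption of the example rather than part of the derivation.

```python
import numpy as np

# Hypothetical problem data (an assumption, not from the note).
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0, 3.0])


def grad_f(x):
    """Gradient of the assumed objective f(x) = 0.5 * ||x - c||^2."""
    return x - c


def multiplier_and_residual(x):
    """Least-squares mu for grad_f(x) + A^T mu = 0, and the leftover norm."""
    mu = np.linalg.lstsq(A.T, -grad_f(x), rcond=None)[0]
    residual = np.linalg.norm(grad_f(x) + A.T @ mu)
    return mu, residual


# Constrained critical point for this quadratic: project c onto {x : Ax = b}.
x_crit = c - A.T @ np.linalg.solve(A @ A.T, A @ c - b)
# Another feasible point (A @ x_other = b) that is not critical.
x_other = np.array([1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0])

mu_crit, res_crit = multiplier_and_residual(x_crit)
mu_other, res_other = multiplier_and_residual(x_other)
```

A residual of zero means $\nabla f(x)$ lies in the row space of $A$, so a multiplier exists; a nonzero residual means no $\mu$ satisfies the condition at that point.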

The Clever Lagrangian. We now observe that the critical points of the
Lagrangian satisfy (by differentiating and setting to $0$)

$$\nabla_x L(x, \mu) = \nabla f(x) + A^T \mu = 0 \quad \text{and} \quad \nabla_\mu L(x, \mu) = Ax - b = 0.$$

The first condition is exactly the condition that $x$ be a critical point in the
way we derived it above, and the second condition says that the constraint be
satisfied. Thus, if $x$ is a critical point of the constrained problem, there
exists some $\mu$ as above, and $(x, \mu)$ is a critical point for $L$.
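When $f$ is quadratic, the two conditions form a single linear system in $(x, \mu)$. The sketch below assumes the hypothetical objective $f(x) = \tfrac{1}{2}\|x - c\|^2$, so that $\nabla f(x) = x - c$, together with hypothetical data `A`, `b`, `c` (none of which come from this note), and solves for both unknowns at once.

```python
import numpy as np

# Hypothetical problem data (an assumption, not from the note).
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0, 3.0])

d = c.size   # number of variables
n = b.size   # number of constraints

# Stationarity of L for f(x) = 0.5 * ||x - c||^2:
#   x + A^T mu = c   (gradient in x)
#   A x        = b   (gradient in mu)
K = np.block([[np.eye(d), A.T],
              [A, np.zeros((n, n))]])
rhs = np.concatenate([c, b])

sol = np.linalg.solve(K, rhs)
x, mu = sol[:d], sol[d:]
```

The resulting `x` should agree with what the null-space parameterization gives for the same data, which is the point of the derivation: both routes find the same critical point.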

Generalizing to Nonlinear Equality Constraints. Lagrange multipliers
are a much more general technique. If you want to handle non-linear equality
constraints, then you will need a little extra machinery: the implicit function
theorem. However, the key idea is that you find the space of solutions and you
optimize. In that case, finding the critical points of

$$f(x) \ \text{ s.t. } \ g(x) = c \quad \text{leads to} \quad L(x, \mu) = f(x) + \mu^T (g(x) - c).$$

The gradient condition here is $\nabla f(x) + J^T \mu = 0$, where $J$ is the Jacobian matrix
of $g$. For the case where we have a single constraint, the gradient condition
reduces to $\nabla f(x) = -\mu_1 \nabla g_1(x)$, which we can view as saying, “at a critical
point, the gradient of the surface must be parallel to the gradient of the function.”
This connects us back to the picture that we drew during lecture.
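For a nonlinear constraint the stationarity conditions are no longer linear, but they can still be solved numerically. The sketch below applies Newton's method (a technique chosen here for illustration, not one the note uses) to the circle example from the end of this note, $f(x) = x_1$ with $x_1^2 + x_2^2 = 1$. The starting point and iteration count are arbitrary assumptions.

```python
import numpy as np


def F(z):
    """Stationarity conditions of L(x, mu) = x1 + mu * (x1^2 + x2^2 - 1)."""
    x1, x2, mu = z
    return np.array([
        1.0 + 2.0 * mu * x1,        # dL/dx1
        2.0 * mu * x2,              # dL/dx2
        x1**2 + x2**2 - 1.0,        # dL/dmu (the constraint)
    ])


def JF(z):
    """Jacobian of F with respect to (x1, x2, mu)."""
    x1, x2, mu = z
    return np.array([
        [2.0 * mu, 0.0, 2.0 * x1],
        [0.0, 2.0 * mu, 2.0 * x2],
        [2.0 * x1, 2.0 * x2, 0.0],
    ])


# Hypothetical starting guess near the maximizer (1, 0).
z = np.array([0.9, 0.1, -0.5])
for _ in range(20):
    z = z - np.linalg.solve(JF(z), F(z))   # Newton step

x, mu = z[:2], z[2]
```

Newton's method only converges locally, so a different starting guess may land on the other critical point $(-1, 0)$ or fail to converge.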

Example: Need for constrained optimization. We give a simple example
to show that you cannot just set the derivatives to $0$. Consider $f(x_1, x_2) = x_1$
and $g(x_1, x_2) = x_1^2 + x_2^2$ and so:

$$\max_x f(x) \quad \text{subject to} \quad g(x) = 1.$$

This is just a linear functional over the circle, and the circle is compact, so the
function must achieve a maximum value. Intuitively, we can see that $(1, 0)$ is the
maximizer, with maximum value $1$ (and hence a critical point). Here, we have:
$$\nabla f(x) = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \nabla g(x) = 2 \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

Notice that $\nabla f(x)$ is not zero anywhere on the circle: it is constant! For
$x \in \{(1, 0), (-1, 0)\}$, $\nabla f(x) = \lambda \nabla g(x)$ (take $\lambda \in \{1/2, -1/2\}$, respectively). On the
other hand, at any other point on the circle $x_2 \neq 0$, and so the gradients of $f$
and $g$ are not parallel. Thus, such points are not critical points.
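The parallel-gradient claim can be checked on a few sample points of the circle. This is a minimal sketch: the eight sample angles, the tolerance, and the helper `cross2` are assumptions made for illustration.

```python
import numpy as np


def grad_f(x):
    """Gradient of f(x1, x2) = x1 (constant)."""
    return np.array([1.0, 0.0])


def grad_g(x):
    """Gradient of g(x1, x2) = x1^2 + x2^2."""
    return 2.0 * x


def cross2(u, v):
    """2-D cross product; it is zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


# Eight evenly spaced points on the unit circle (includes (1, 0) and (-1, 0)).
thetas = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
points = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)

# Keep the points where the two gradients are parallel.
parallel = np.array([abs(cross2(grad_f(p), grad_g(p))) < 1e-12 for p in points])
critical = points[parallel]

# Recover lambda in grad_f = lambda * grad_g at each critical point.
lams = np.array([grad_f(p) @ grad_g(p) / (grad_g(p) @ grad_g(p)) for p in critical])
```

Only the two points with $x_2 = 0$ pass the test, and the recovered values of $\lambda$ match the ones stated above.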

Extra Resources If you find resources you like, post them on Piazza!
