Lagrange Multipliers
Abstract
We consider a special case of Lagrange multipliers for constrained
optimization. The class quickly sketched the “geometric” intuition for
Lagrange multipliers, and this note considers a short algebraic derivation.
In order to minimize or maximize a function with linear constraints, we consider
finding the critical points (which may be local maxima, local minima, or saddle
points) of
$$f(x) \quad \text{subject to} \quad Ax = b$$
Here $f : \mathbb{R}^d \to \mathbb{R}$ is a convex (or concave) function, $x \in \mathbb{R}^d$, $A \in \mathbb{R}^{n \times d}$, and
$b \in \mathbb{R}^n$. To find the critical points, we cannot just set the derivative of the
objective equal to 0. The technique we consider is to turn the problem from a
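constrained problem into an unconstrained problem using the Lagrangian,
$$L(x, \mu) = f(x) + \mu^T (Ax - b),$$
in which $\mu \in \mathbb{R}^n$ is the vector of Lagrange multipliers.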
We’ll show that the critical points of the constrained function f are critical
points of L(x, µ).
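Let $x_0 \in \mathbb{R}^d$ be any point satisfying $A x_0 = b$, and let $u_1, \dots, u_k$ be a basis
for the null space of $A$, collected as the columns of
$$U = [u_1, \dots, u_k] \in \mathbb{R}^{d \times k}.$$
Since the columns of $U$ form a basis for the null space, any solution of $Ax = b$ can be
written as $x = x_0 + Uy$ for some $y \in \mathbb{R}^k$. This captures all the free parameters
of the solution. Thus, we consider the function:
$$g(y) = f(x_0 + Uy), \qquad g : \mathbb{R}^k \to \mathbb{R}.$$
Since $g$ is unconstrained, its critical points are found in the usual way:
$$\nabla_y g(y) = 0, \quad \text{equivalently} \quad U^T \nabla f(x_0 + Uy) = 0.$$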
To make sure the types are clear: $\nabla_y g(y) \in \mathbb{R}^k$, $\nabla f(z) \in \mathbb{R}^d$, and $U \in \mathbb{R}^{d \times k}$.
In both cases, 0 is the zero vector in $\mathbb{R}^k$.
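The reparametrization x = x0 + U y can be checked numerically. The following is a minimal sketch, not part of the original note: the quadratic objective f, the random A and b, and the helper names are all assumptions chosen for illustration. It builds a null-space basis from the SVD and compares the chain-rule gradient U^T ∇f(x0 + U y) with a finite-difference gradient of g.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# A particular solution x0 of A x = b (least squares gives the minimum-norm one).
x0 = np.linalg.lstsq(A, b, rcond=None)[0]

# Orthonormal basis of null(A): right singular vectors beyond the rank of A.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
U = Vt[rank:].T                      # shape (d, k) with k = d - rank
k = U.shape[1]

# Illustrative objective (an assumption): f(x) = 0.5 * ||x - c||^2.
c = rng.standard_normal(d)

def f(x):
    return 0.5 * np.sum((x - c) ** 2)

def grad_f(x):
    return x - c

def g(y):
    # The reduced, unconstrained function of the free parameters y.
    return f(x0 + U @ y)

y = rng.standard_normal(k)

# Every x0 + U y satisfies the constraint.
feasible = np.allclose(A @ (x0 + U @ y), b)

# Chain rule versus central finite differences.
analytic = U.T @ grad_f(x0 + U @ y)
eps = 1e-6
numeric = np.array([(g(y + eps * e) - g(y - eps * e)) / (2 * eps) for e in np.eye(k)])
grad_matches = np.allclose(analytic, numeric, atol=1e-6)

print(feasible, grad_matches)
```

Any other basis of the null space would serve equally well; an orthonormal one from the SVD is simply convenient.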
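The above condition says that if $y$ is a critical point of $g$, then $\nabla f(x)$, with
$x = x_0 + Uy$, must be orthogonal to every column of $U$. Since the columns of $U$ form a
basis for the null space of $A$, and the row space of $A$ is the orthogonal complement of
the null space, $\nabla f(x)$ must lie in the row space of $A$. Any element of the row space
can be written $z = A^T \mu \in \mathbb{R}^d$ for some $\mu \in \mathbb{R}^n$. We verify that $z$ and $u = Uy$
are orthogonal, since $Au = 0$:
$$z^T u = \mu^T A u = \mu^T 0 = 0$$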
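The orthogonality of the row space and the null space is easy to confirm on a random instance. This is only an illustrative sketch; the matrix sizes, the seed, and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 4))

# Null-space basis from the SVD (rows of Vt beyond the rank span null(A)).
_, s, Vt = np.linalg.svd(A)
U = Vt[int(np.sum(s > 1e-10)):].T

mu = rng.standard_normal(2)           # arbitrary multiplier vector
y = rng.standard_normal(U.shape[1])   # arbitrary null-space coordinates

z = A.T @ mu                          # an element of the row space of A
u = U @ y                             # an element of the null space of A

inner = float(z @ u)                  # equals mu^T (A u), zero up to rounding
print(abs(inner) < 1e-10)
```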
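The Clever Lagrangian We now observe that the critical points of the
Lagrangian are (by differentiating and setting to 0)
$$\nabla_x L(x, \mu) = \nabla f(x) + A^T \mu = 0 \quad \text{and} \quad \nabla_\mu L(x, \mu) = Ax - b = 0.$$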
The first condition is exactly the condition that x be a critical point in the
way we derived it above, and the second condition says that the constraint be
satisfied. Thus, if x is a critical point, there exists some µ as above, and (x, µ)
is a critical point for L.
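For a quadratic objective the two critical-point conditions form a single linear system, which makes the claim easy to test. The sketch below assumes the Lagrangian is written as L(x, µ) = f(x) + µ^T (Ax − b) and uses a hypothetical convex objective f(x) = ½ x^T Q x − c^T x; neither Q, c, nor the random data come from the note. It solves for (x, µ) and then checks that x also satisfies the reduced condition U^T ∇f(x) = 0.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 2, 4
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Assumed convex objective: f(x) = 0.5 x^T Q x - c^T x with Q positive definite.
M = rng.standard_normal((d, d))
Q = M @ M.T + d * np.eye(d)
c = rng.standard_normal(d)

# Stationarity of the Lagrangian:
#   Q x - c + A^T mu = 0   (gradient in x)
#   A x - b          = 0   (gradient in mu)
K = np.block([[Q, A.T], [A, np.zeros((n, n))]])
rhs = np.concatenate([c, b])
sol = np.linalg.solve(K, rhs)
x, mu = sol[:d], sol[d:]

grad_f = Q @ x - c
stationarity = np.linalg.norm(grad_f + A.T @ mu)
feasibility = np.linalg.norm(A @ x - b)

# Cross-check against the null-space derivation: U^T grad f(x) should vanish.
_, s, Vt = np.linalg.svd(A)
U = Vt[int(np.sum(s > 1e-10)):].T
reduced = np.linalg.norm(U.T @ grad_f)

print(stationarity < 1e-8, feasibility < 1e-8, reduced < 1e-8)
```

With the opposite sign convention for µ the same x is recovered and only the sign of µ flips.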
Consider maximizing $f(x) = x_1$ subject to the nonlinear constraint $g(x) = x_1^2 + x_2^2 = 1$.
This is just a linear functional over the circle, and the circle is compact, so the
function must achieve a maximum value. Intuitively, we can see that the maximum is
attained at $(1, 0)$ (which is hence a critical point). Here, we have:
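$$\nabla f(x) = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \nabla g(x) = 2 \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$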
Notice that $\nabla f(x)$ is not zero anywhere on the circle: it's constant! For $x \in
\{(1, 0), (-1, 0)\}$, $\nabla f(x) = \lambda \nabla g(x)$ (take $\lambda \in \{1/2, -1/2\}$, respectively). On the
other hand, at any other point on the circle $x_2 \neq 0$, and so the gradients of $f$
and $g$ are not parallel. Thus, such points are not critical points.
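The circle example can be checked by sampling points and testing whether the two gradients are parallel. This sketch assumes f(x) = x1 and g(x) = x1^2 + x2^2, matching the gradients above; the eight sample angles and the variable names are arbitrary choices.

```python
import numpy as np

# Sample the unit circle at eight angles, including 0 and pi.
theta = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)

grad_f = np.tile([1.0, 0.0], (len(theta), 1))   # constant gradient of f(x) = x1
grad_g = 2.0 * X                                # gradient of g(x) = x1^2 + x2^2

# In 2D, two vectors are parallel exactly when their cross product is zero.
cross = grad_f[:, 0] * grad_g[:, 1] - grad_f[:, 1] * grad_g[:, 0]
is_critical = np.isclose(cross, 0.0, atol=1e-12)
critical = X[is_critical]

# Multiplier at each critical point: project grad_f onto grad_g.
G = grad_g[is_critical]
lambdas = np.sum(grad_f[is_critical] * G, axis=1) / np.sum(G * G, axis=1)

# Among the sampled points, the objective x1 is largest at (1, 0).
best = X[np.argmax(X[:, 0])]

print(np.round(critical, 6), lambdas, best)
```

The cross product reduces to 2 x2, so it vanishes only where x2 = 0, in agreement with the argument above.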
Extra Resources If you find resources you like, post them on Piazza!