A Simple Explanation of Why Lagrange Multipliers Works: Andrew Chamberlain, Ph.D.
Most textbooks focus on mechanically cranking out formulas, leaving students mystified about why the method actually works in the first place.
In this post, I’ll explain a simple way of seeing why Lagrange multipliers actually do what they do —
that is, solve constrained optimization problems through the use of a semi-mysterious Lagrangian
function.
Some Background
Before you can see why the method works, you’ve got to know something about gradients.
A gradient is just a vector that collects all of a function’s first-order partial derivatives in one place; each element of the gradient is the partial derivative with respect to one of the function’s variables.
An easy way to think of the gradient is this: if we pick a point in the function’s domain, the gradient gives us the “direction” the function is heading — that is, the direction in which it increases most steeply from that point.
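To make the idea concrete, here is a minimal sketch of computing a gradient numerically with central finite differences. The function and sample point are made-up examples of mine, not from the post.

```python
# Approximate the gradient of f at `point` by central finite differences.
# Each element of the result is one partial first derivative of f.

def gradient(f, point, h=1e-6):
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h     # nudge the i-th coordinate up...
        down[i] -= h   # ...and down, holding the others fixed
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

# Example: f(x1, x2) = x1**2 + 3*x2, whose exact gradient is (2*x1, 3).
f = lambda p: p[0] ** 2 + 3 * p[1]
print(gradient(f, [2.0, 1.0]))  # approximately [4.0, 3.0]
```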
Toward the peak I’ve drawn two level curves, where we hold the height of f constant at some level. That’s the key idea here: level curves are where f(x1, x2) = a1 and f(x1, x2) = a2, for constant heights a1 and a2.
How the Method Works
In the drawing, the boundary where the constraint cuts the function is marked with a heavy line.
Along that line are the highest points we can reach without stepping over our constraint.
As we gain elevation, we walk through various level curves of f. I’ve marked two in the picture.
At each level curve, imagine checking its slope — that is, the slope of a tangent line to it — and
comparing that to the slope on the constraint where we’re standing.
If our slope is greater than the slope of the level curve, we can reach a higher point on the hill if we keep moving right.
If our slope is less than the slope of the level curve — say, toward the right where our constraint line is declining — we need to move back to the left to reach a higher point.
When we reach a point where the slope of the constraint line just equals the slope of the level curve,
we’ve moved as high as we can.
Any movement from that point will take us downhill. In the figure, this point is marked with a large
arrow pointing toward the peak.
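This walk along the constraint can be simulated directly. The example below is my own toy problem, not one from the post: climb the hill f(x, y) = x*y while staying on the constraint line x + y = 10, parameterized as (t, 10 - t).

```python
# Walk along the constraint x + y = 10 and track the height of the hill
# f(x, y) = x * y at each point. The height rises, peaks, then falls,
# exactly as the hill-climbing picture suggests.

def f(x, y):
    return x * y

best_t, best_val = None, float("-inf")
for i in range(1001):
    t = i * 0.01                 # step along the constraint line
    val = f(t, 10.0 - t)         # height of the hill at this point
    if val > best_val:
        best_t, best_val = t, val

print(best_t, best_val)  # the peak along the constraint is near t = 5, f = 25
```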
At that point, the level curve f = a2 and the constraint have the same slope.
Since a function’s gradient is always perpendicular to its level curves, if these two curves are parallel at that point, their gradients must also be parallel.
That means the gradients of f and g both point in the same direction, and differ at most by a scalar multiple.
Then we have

∇f = λ∇g

where λ (lambda) is some scalar. This is the condition that must hold when we’ve reached the maximum of f subject to the constraint g = c.
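We can verify this parallel-gradient condition on a made-up example of mine (not from the post): maximize f(x, y) = x*y subject to g(x, y) = x + y = 10, whose constrained maximum sits at (5, 5).

```python
# At the constrained maximum (5, 5), check that grad f = lam * grad g.

x, y = 5.0, 5.0

grad_f = (y, x)        # analytic gradient of f(x, y) = x * y
grad_g = (1.0, 1.0)    # analytic gradient of g(x, y) = x + y

# The scalar lam that scales grad_g onto grad_f:
lam = grad_f[0] / grad_g[0]

print(lam)                                # 5.0
print(grad_f[1] == lam * grad_g[1])       # True: both components agree
```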
Now, if we’re clever we can write a single equation that will capture this idea:

L = f - λ(g - c)

or more explicitly,

L(x1, x2, λ) = f(x1, x2) - λ(g(x1, x2) - c)
To see how this equation works, watch what happens when we follow the usual Lagrangian
procedure.
First, we find the gradient of L by taking its three partial first derivatives — ∂L/∂x1, ∂L/∂x2, and ∂L/∂λ — and set each of them equal to zero.
Setting these to zero enforces two rules. First, the gradients of f and g must point in the same direction, or ∇f = λ∇g.
The first and second elements of the gradient of L make sure the first rule is followed.
That is, setting them to zero forces ∂f/∂x1 = λ ∂g/∂x1 and ∂f/∂x2 = λ ∂g/∂x2, assuring that the gradients of f and g both point in the same direction.
The third element of the gradient of L is simply a trick to make sure g = c, which is our constraint.
In the Lagrangian function, when we take the partial derivative with respect to lambda and set it equal to zero, we simply recover our original constraint equation, g = c.
At this point, we have three equations in three unknowns.
So we can solve this for the optimal values of x1 and x2 that maximize f subject to our constraint.
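Carrying the full procedure through on the same made-up example as above (mine, not the post’s): maximize f(x1, x2) = x1*x2 subject to g(x1, x2) = x1 + x2 = 10. This system happens to solve by simple substitution.

```python
# The Lagrangian is L = x1*x2 - lam*(x1 + x2 - 10). Its three partials,
# set to zero, give three equations in three unknowns:
#   dL/dx1  = x2 - lam          = 0   ->  x2 = lam
#   dL/dx2  = x1 - lam          = 0   ->  x1 = lam
#   dL/dlam = -(x1 + x2 - 10)   = 0   ->  x1 + x2 = 10
#
# Substituting the first two into the constraint: 2*lam = 10.

lam = 10 / 2         # from the constraint equation
x1, x2 = lam, lam    # from the first two equations

print(x1, x2, lam)   # 5.0 5.0 5.0
print(x1 * x2)       # constrained maximum value: 25.0
```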
So the bottom line is that the method of Lagrange multipliers is really just an algorithm that finds where the gradient of a function points in the same direction as the gradients of its constraints, while also satisfying those constraints.
As with most areas of mathematics, things are a lot simpler than most economists and engineers make them out to be, once you see to the bottom of them.
In this case, constrained optimization is really just hill-climbing, which everyone understands.