Chapter 3: The Lagrange Method
Consider the problem
\[
\max_{(x_1,\ldots,x_n)\in S} u(x_1,\ldots,x_n) \quad \text{subject to} \quad g_k(x_1,\ldots,x_n)=0 \ \text{ for all } k=1,\ldots,m, \tag{1}
\]
where the domain S for the choice variable is assumed open in the sense that it contains no boundary points with respect to the space of all n-vectors. (For example, when n = 1, the entire set R of real numbers is open, whereas the set R+ of nonnegative real numbers is not, as the latter contains the boundary point zero.) Here the choice variable (x1, ..., xn) has n dimensions and is subject to m constraints, each in the form of an equation gk(x1, ..., xn) = 0. For example, the problem
\[
\min_{(x_1,x_2)\in S} w_1 x_1 + w_2 x_2 \quad \text{subject to} \quad f(x_1,x_2) = y \tag{2}
\]
is a cost-minimization problem; rewritten as a maximization with the constraint in the form g(x1, x2) = 0, it becomes
\[
\max_{(x_1,x_2)\in S} -w_1 x_1 - w_2 x_2 \quad \text{subject to} \quad f(x_1,x_2) - y = 0. \tag{3}
\]
2 The procedure
The Lagrange method to solve Problem (1) proceeds in three steps. First, write down the Lagrangian, a function defined by
\[
L(x_1,\ldots,x_n;\lambda_1,\ldots,\lambda_m) := u(x_1,\ldots,x_n) + \sum_{k=1}^{m} \lambda_k\, g_k(x_1,\ldots,x_n) \tag{4}
\]
for any n-vector (x1, ..., xn) and m-vector (λ1, ..., λm). Recall from basic math the summation notation \(\sum_{k=1}^{m}\); for example, \(\sum_{k=1}^{m} a_k\) is just shorthand for a1 + ... + am. Note from Eq. (4) that we form the Lagrangian by summing the objective u with all the constraint functions g1, ..., gm, multiplied respectively by the coefficients λ1, ..., λm. For each k, the coefficient λk for gk is called the Lagrange multiplier for the kth constraint.
Second, write down the first-order condition for the Lagrangian to attain its local maximum.
In other words, calculate all the partial derivatives of the Lagrangian and set each of them to zero:
\[
\frac{\partial}{\partial x_i} L = \frac{\partial}{\partial x_i} u(x_1,\ldots,x_n) + \sum_{k=1}^{m} \lambda_k \frac{\partial}{\partial x_i} g_k(x_1,\ldots,x_n) = 0 \quad \text{for all } i = 1,\ldots,n; \tag{5}
\]
\[
\frac{\partial}{\partial \lambda_k} L = g_k(x_1,\ldots,x_n) = 0 \quad \text{for all } k = 1,\ldots,m. \tag{6}
\]
Note that Eqs. (5)–(6) constitute an equation system of n + m equations and n + m unknowns.
Third, solve Eqs. (5)–(6) for (x1, ..., xn; λ1, ..., λm). The (x1, ..., xn) so obtained is a candidate for the solution of Problem (1), as long as the method described above is applicable.
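Before turning to an illustration, here is a minimal computational sketch of the three steps using Python's sympy library; this is not part of the original notes, and the toy objective u(x1, x2) = x1 x2 and constraint x1 + x2 − 10 = 0 are illustrative assumptions.
\begin{verbatim}
# A sketch of the three-step procedure in sympy; the objective and
# constraint are illustrative assumptions, not taken from the notes:
# maximize u(x1, x2) = x1*x2 subject to g(x1, x2) = x1 + x2 - 10 = 0.
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lam", real=True)
u = x1 * x2                        # objective
g = x1 + x2 - 10                   # constraint, written as g = 0

# Step 1: form the Lagrangian as in Eq. (4).
L = u + lam * g

# Step 2: first-order conditions, Eqs. (5)-(6): all partials set to zero.
foc = [sp.diff(L, v) for v in (x1, x2, lam)]

# Step 3: solve the n + m equations for the n + m unknowns.
print(sp.solve(foc, [x1, x2, lam], dict=True))   # [{lam: -5, x1: 5, x2: 5}]
\end{verbatim}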
For illustration, consider the cost-minimization problem (2) with nonzero parameters w1 and w2 and a differentiable production function f whose partial derivatives are nonzero. Rewrite the problem in the form of (1) to obtain Problem (3), based on which we construct the Lagrangian
\[
L(x_1, x_2; \lambda) := -w_1 x_1 - w_2 x_2 + \lambda\,\big(f(x_1, x_2) - y\big).
\]
Then write down the first-order conditions for this Lagrangian, as if we were seeking a local maximum of L without constraint:
\[
\frac{\partial}{\partial x_1} L = -w_1 + \lambda \frac{\partial}{\partial x_1} f(x_1, x_2) = 0,
\]
\[
\frac{\partial}{\partial x_2} L = -w_2 + \lambda \frac{\partial}{\partial x_2} f(x_1, x_2) = 0,
\]
\[
\frac{\partial}{\partial \lambda} L = f(x_1, x_2) - y = 0.
\]
Finally, solve the three equations for (x1, x2; λ). The three equations are equivalent to
\[
\lambda \frac{\partial}{\partial x_1} f(x_1, x_2) = w_1, \tag{7}
\]
\[
\lambda \frac{\partial}{\partial x_2} f(x_1, x_2) = w_2, \tag{8}
\]
\[
f(x_1, x_2) = y; \tag{9}
\]
divide the first equation by the second to cancel out λ (which can be done because the partial derivatives are nonzero by assumption, and λ ≠ 0, for otherwise Eq. (7) would say that zero is equal to the nonzero number w1) and obtain
\[
\frac{\partial f(x_1, x_2)/\partial x_1}{\partial f(x_1, x_2)/\partial x_2} = \frac{w_1}{w_2}, \tag{10}
\]
which coupled with Eq. (9) gives a solution for (x1 , x2 ); plug this solution into Eq. (7) or (8) to
obtain λ. Note that Eqs. (9)–(10) are exactly the equation system in Chapter 2 that determines
the cost-minimizing input bundle in the case where the production function is differentiable and
has diminishing TRS.
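The same computation can be checked symbolically. The sketch below is not from the notes and assumes, for concreteness, the Cobb-Douglas production function f(x1, x2) = (x1 x2)^{1/2}; it solves Eqs. (7)–(9) and confirms the tangency condition (10).
\begin{verbatim}
# Cost minimization with an assumed Cobb-Douglas technology
# f(x1, x2) = sqrt(x1*x2); w1, w2, y are positive parameters.
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lam", positive=True)
w1, w2, y = sp.symbols("w1 w2 y", positive=True)

f = sp.sqrt(x1 * x2)
L = -w1 * x1 - w2 * x2 + lam * (f - y)          # the Lagrangian

foc = [sp.diff(L, v) for v in (x1, x2, lam)]    # Eqs. (7)-(9)
sol = sp.solve(foc, [x1, x2, lam], dict=True)[0]
print(sol[x1], sol[x2])    # expected: y*sqrt(w2/w1), y*sqrt(w1/w2)

# Tangency condition (10): ratio of marginal products equals w1/w2.
ratio = sp.diff(f, x1) / sp.diff(f, x2)
print(sp.simplify(ratio.subs(sol)))             # w1/w2
\end{verbatim}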
3 An example with multiple constraints
Consider the problem
\[
\min_{(x_1,x_2,x_3)\in(0,\infty)^3} 3x_1 + 2x_2 + 4x_3 \quad \text{subject to} \quad \ln\!\big(x_1 x_2^5\big) = 100 \ \text{ and } \ x_1 = 2x_3.
\]
Here a firm chooses among three kinds of inputs to deliver 100 units of output, though according to the first constraint only inputs 1 and 2 contribute to production. The second constraint, x1 = 2x3, may be due to environmental protection legislation that requires hiring input 3 in a certain proportion to another input that the firm hires. Note that the domain (0, ∞)^3, the space
of 3-vectors whose coordinates are all positive, is an open set. To solve this problem, rewrite it in the form of (1):
\[
\max_{(x_1,x_2,x_3)\in(0,\infty)^3} -3x_1 - 2x_2 - 4x_3 \quad \text{subject to} \quad \ln\!\big(x_1 x_2^5\big) - 100 = 0 \ \text{ and } \ x_1 - 2x_3 = 0.
\]
The Lagrangian is
\[
L(x_1, x_2, x_3; \lambda_1, \lambda_2) := -3x_1 - 2x_2 - 4x_3 + \lambda_1 \big(\ln(x_1 x_2^5) - 100\big) + \lambda_2 (x_1 - 2x_3),
\]
and its first-order conditions, after rearrangement, are equivalent to
\begin{align*}
\lambda_1 / x_1 &= 3 - \lambda_2, \\
5\lambda_1 / x_2 &= 2, \\
\lambda_2 &= -2, \\
\ln(x_1 x_2^5) &= 100, \\
x_1 &= 2x_3.
\end{align*}
Plug the third equation into the first to get
\[
\lambda_1 / x_1 = 3 - (-2) = 5,
\]
which coupled with the second equation gives x1 = 2x2/25. Plug this into the fourth equation to get
\[
\frac{2}{25}\, x_2^6 = e^{100},
\]
i.e., \(x_2 = \big(25e^{100}/2\big)^{1/6}\). Hence \(x_1 = 2x_2/25 = \frac{2}{25}\big(25e^{100}/2\big)^{1/6}\), which, plugged into the last equation, gives
\[
\frac{2}{25}\big(25e^{100}/2\big)^{1/6} = 2x_3,
\]
i.e., \(x_3 = \frac{1}{25}\big(25e^{100}/2\big)^{1/6}\). Thus we obtain the only candidate for a solution of (x1, x2, x3):
\[
\left( \frac{2}{25}\big(25e^{100}/2\big)^{1/6},\ \big(25e^{100}/2\big)^{1/6},\ \frac{1}{25}\big(25e^{100}/2\big)^{1/6} \right).
\]
Compared to the above procedure, it would be more cumbersome to solve the problem with the two-dimensional methods in Chapters 1 and 2, as the choice variable here lives in 3-dimensional space, in which the two constraints are each a surface rather than a curve.
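Since e^100 is astronomically large, checking this candidate numerically in floating point would overflow; the sketch below (not part of the notes) instead verifies it with sympy's exact arithmetic by substituting the candidate into the five first-order conditions.
\begin{verbatim}
# Verify by substitution that the candidate satisfies all five equations.
import sympy as sp

c = (25 * sp.E**100 / 2) ** sp.Rational(1, 6)  # common factor (25e^100/2)^(1/6)
x1 = sp.Rational(2, 25) * c
x2 = c
x3 = sp.Rational(1, 25) * c
lam1, lam2 = 5 * x1, sp.Integer(-2)            # since lam1/x1 = 5 and lam2 = -2

checks = [
    sp.simplify(lam1 / x1 - (3 - lam2)),       # first-order condition for x1
    sp.simplify(5 * lam1 / x2 - 2),            # first-order condition for x2
    sp.simplify(lam2 - (-2)),                  # first-order condition for x3
    sp.simplify(sp.log(x1 * x2**5) - 100),     # output constraint
    sp.simplify(x1 - 2 * x3),                  # proportion constraint
]
print(checks)                                  # [0, 0, 0, 0, 0]
\end{verbatim}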
Consider the problem
\[
\max_{(x_1,x_2)\in(0,\infty)^2} p\,x_2 - w\,x_1 \quad \text{subject to} \quad f(x_1) = x_2, \tag{11}
\]
where p and w are positive parameters and f(x1) is a differentiable function of x1 such that, when x1 increases, f(x1) increases and the derivative \(\frac{d}{dx_1} f(x_1)\) decreases. A simple way to solve this problem is to plug the constraint x2 = f(x1) into the objective so that the problem becomes
\[
\max_{x_1\in(0,\infty)} p\,f(x_1) - w\,x_1.
\]
Then apply the technique in Chapter 1. Since the domain (0, ∞) of the choice variable x1 is an open set, any maximum is an interior solution and hence satisfies the first-order condition \(\frac{d}{dx_1} f(x_1) - \frac{w}{p} = 0\), i.e.,
\[
\frac{d}{dx_1} f(x_1) = \frac{w}{p}. \tag{12}
\]
Since by assumption \(\frac{d}{dx_1} f(x_1)\) is decreasing in x1, the second-order condition is always satisfied (verify that yourself). Thus a solution to Eq. (12) is the same as a solution to Problem (11).
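As a concrete instance (an illustrative assumption, not from the notes), take f(x1) = √x1, which is increasing with a decreasing derivative; the substitution method then reduces to solving Eq. (12):
\begin{verbatim}
# The substitution method for Problem (11) with the assumed f(x1) = sqrt(x1).
import sympy as sp

x1 = sp.symbols("x1", positive=True)
p, w = sp.symbols("p w", positive=True)

f = sp.sqrt(x1)
objective = p * f - w * x1               # p*x2 - w*x1 after substituting x2 = f(x1)

foc = sp.Eq(sp.diff(objective, x1), 0)   # equivalent to Eq. (12): f'(x1) = w/p
sol = sp.solve(foc, x1)[0]
print(sol)                               # x1* = p**2/(4*w**2)
print(f.subs(x1, sol))                   # x2* = p/(2*w)
\end{verbatim}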
Geometrically, Eq. (12) means: on the x1-x2 plane, draw the graph of f, which is upward sloping and whose slope is decreasing in x1; draw the straight line that has the slope w/p and is tangent to the graph of f; the tangency point is the solution. (Remember Exercise 4 of Chapter 2?)
The question is how to generalize this method to cases with multiple constraints and more
than two choice dimensions. In those cases, the graph of a constraint equation (e.g., f (x1 , x2 ) = x3 )
is no longer just a curve but rather a surface in a higher-dimensional space, and likewise for the tangent
“line” (e.g., the plane px3 − w1 x1 − w2 x2 = 300). While we can still imagine that the solution is
the tangent point between the two surfaces, it makes little sense to say that the two surfaces have
the same “slope,” as a surface may have different slopes along different directions.
Rather than slopes, a better way to look at surfaces tangent to each other is to compare their gradients (which we shall define in the next subsection): at any point of a surface, the gradient of the surface is a vector perpendicular to the surface at that point. In the above example, the gradient of the tangent line is the vector (−w, p), which, plotted on the x1-x2 diagram, is perpendicular to the tangent line, whose slope is constantly equal to w/p.^1 The gradient of the constraint f(x1) = x2 varies with the coordinates (x1, x2) on the curve, because the slope of the curve varies with (x1, x2); given x1, this gradient is the vector \(\big(\frac{d}{dx_1} f(x_1), -1\big)\). Note that this vector is perpendicular to the curve f(x1) = x2 at the point with horizontal coordinate x1.^2 Thus, the gradients of the tangent line and the constraint curve, one perpendicular to the tangent line and the other perpendicular to the constraint curve, must be aligned, i.e., belong to a single straight line. We have now arrived at a viewpoint elegantly suited to higher dimensions: two surfaces are tangent to each other iff their gradients are aligned. Thus, when the choice variable has higher dimensions, instead of thinking of a solution as a tangent point, think of the solution as the point on the constraint surface (the set of points that satisfy all constraints) such that at this point the gradient of this surface and the gradient of the objective are aligned.
Algebraically, two vectors are aligned iff you can turn one vector into the other by multiplying all coordinates of the former by some common number. That is, in our example, the vector (−w, p) being aligned with the vector \(\big(\frac{d}{dx_1} f(x_1), -1\big)\) means that there exists a real number λ for which
\[
\begin{bmatrix} -w \\ p \end{bmatrix} = -\lambda \begin{bmatrix} \frac{d}{dx_1} f(x_1) \\ -1 \end{bmatrix}. \tag{13}
\]
From the viewpoint described in the previous paragraph, we reach an alternative method to solve
Problem (11): instead of solving for (x1 , x2 ) by coupling the tangency equation (12) with the
constraint f (x1 ) = x2 , solve for (x1 , x2 , λ) by coupling Eq. (13) with the constraint f (x1 ) = x2 .
But are the two methods consistent with each other? To see that the answer is yes, rearrange Eq. (13) to obtain
\[
-w + \lambda \frac{d}{dx_1} f(x_1) = 0, \qquad p - \lambda = 0.
\]
These two equations are simply Eq. (5) applied to our example, with the Lagrangian
\[
L(x_1, x_2; \lambda) := p\,x_2 - w\,x_1 + \lambda\,\big(f(x_1) - x_2\big).
\]
Meanwhile, Eq. (13) is equivalent to
\[
-w = -\lambda \frac{d}{dx_1} f(x_1), \qquad p = \lambda.
\]
Divide the first equation by the second to cancel out λ and obtain
\[
-\frac{w}{p} = -\frac{d}{dx_1} f(x_1),
\]
which is exactly the tangency equation (12). Hence the two methods are consistent, except that the one with gradients works also in higher dimensions.
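To see the consistency in a concrete case, the sketch below (again assuming the illustrative f(x1) = √x1 from above, not a function from the notes) solves the gradient-alignment equation (13) together with the constraint and reproduces the substitution answer.
\begin{verbatim}
# Solving Eq. (13) plus the constraint f(x1) = x2, with f(x1) = sqrt(x1).
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lam", positive=True)
p, w = sp.symbols("p w", positive=True)

f = sp.sqrt(x1)
eqs = [
    sp.Eq(-w, -lam * sp.diff(f, x1)),    # first coordinate of Eq. (13)
    sp.Eq(p, lam),                       # second coordinate of Eq. (13)
    sp.Eq(f, x2),                        # the constraint f(x1) = x2
]
print(sp.solve(eqs, [x1, x2, lam], dict=True))
# [{x1: p**2/(4*w**2), x2: p/(2*w), lam: p}]
\end{verbatim}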
Figure 1: The f here is our objective u, and h1 and h2 here are our constraint functions g1 and g2.
Provided the conditions in Footnote 3 hold, if Problem (1) is known to have a solution and Eqs. (5)–(6) admit only one solution, then that unique candidate produced by the Lagrange method solves Problem (1).^3
While there are methods analogous to the procedure for equality constraints, the validity of such methods requires stronger conditions. At this point, just beware of a temptation to which many (alas, including many economists!) succumb: abusing such methods even when their validity is not warranted. When you see a "Lagrange method" solution of an optimization problem some of whose constraints are inequalities, be careful.^4

However, some problems with inequality constraints can be turned into ones with equality constraints, which we can then solve by the procedure introduced above. Let us illustrate
^3 For the curious mind only, here is a synopsis of the conditions for the Lagrange method. The Lagrange Multiplier Theorem says that a solution (x1*, ..., xn*) of Problem (1) is necessarily a solution of Eqs. (5)–(6) provided that two conditions are met at (x1*, ..., xn*): (a) the objective u and the constraint functions gk (for all k = 1, ..., m) are all continuously differentiable, and (b) the constraint functions are regular in the sense that their gradients span an m-dimensional vector space. The regularity condition (b), in turn, means that (i) n ≥ m and (ii) none of the gradients ∇g1, ..., ∇gm at (x1*, ..., xn*) is redundant, i.e., none is equal to a linear combination of the other m − 1 gradients (e.g., if ∇g1 = λ2∇g2 + ... + λm∇gm for some real numbers λ2, ..., λm, then ∇g1 is redundant). There would have been a third condition (c), requiring that (x1*, ..., xn*) not be a boundary point of its domain, but it is already guaranteed because we assume at the outset that the domain S is open.
We have explained previously why the procedure needs Condition (a) and part (i) of Condition (b). To have a
glimpse of what part (ii) of Condition (b) is up to, consider a case where it is violated: Suppose n = 3 and m = 2
such that the two constraints correspond to two surfaces that have exactly one common point and at that point
the two surfaces are tangent to each other. That immediately pins the choice variable down to this single point,
which is hence the solution of Problem (1). At this point, the gradients of the two constraints are aligned, as the
corresponding surfaces are tangent to each other. Thus, no matter how we scale up or down each of them, we cannot
alter the direction of the sum of the two. Consequently, if the gradient of the objective is not aligned with the two
constraint gradients at the outset, there is no way to scale the two constraint gradients thereby to align them with
the gradient of the objective, hence it is impossible to satisfy Eq. (14). This misalignment cannot be corrected by perturbation, because there is no wiggle room to perturb the choice variable along the direction of the gradient of the objective without violating one of the constraints (cf. Exercise 4d).
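To make the regularity condition (b) operational, one can stack the constraint gradients into a Jacobian matrix and check that its rank equals m. A sketch (not from the notes), using the two constraints of Section 3:
\begin{verbatim}
# Regularity check: the gradients of g1, g2 should be linearly
# independent, i.e., the Jacobian should have rank m = 2.
import sympy as sp

x1, x2, x3 = sp.symbols("x1 x2 x3", positive=True)
g1 = sp.log(x1 * x2**5) - 100        # output constraint of Section 3
g2 = x1 - 2 * x3                     # proportion constraint of Section 3

J = sp.Matrix([[sp.diff(g, v) for v in (x1, x2, x3)] for g in (g1, g2)])
print(J)           # Matrix([[1/x1, 5/x2, 0], [1, 0, -2]])
print(J.rank())    # 2: condition (b) holds on the domain (0, oo)^3
\end{verbatim}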
^4 Even the famous Kuhn and Tucker, who are credited with a main theorem handling this case (and who both appear as characters in the Oscar-winning blockbuster "A Beautiful Mind"), made a serious mistake in the initial version of their theorem. According to the late Leo Hurwicz, they did not know of the mistake until a seminar audience pointed it out, with a counterexample, during their seminar presenting the "theorem." They then haphazardly modified their theorem by adding a "constraint qualification" condition on the solution produced by the Lagrange method. When the late Hurwicz related the anecdote to this author, who was then a graduate student working as Hurwicz's teaching assistant, Hurwicz hastened to add a moral of the story: "One does not need to commit suicide even when his theorem is found wrong when he is presenting it."
with the following problem:
\[
\max_{(x,y)\in\mathbb{R}_+^2} p\,y - w\,x \quad \text{subject to} \quad y \le f(x), \tag{16}
\]
where p and w are positive parameters, and f is a production function that is increasing and differentiable, with a derivative that is decreasing in x and satisfies
\[
\lim_{x \to 0} \frac{d}{dx} f(x) = \infty, \tag{17}
\]
i.e., when x converges to zero, the graph of f steepens to vertical. Written in the form of (15), this problem is
\[
\max_{(x,y)\in\mathbb{R}_+^2} p\,y - w\,x \quad \text{subject to} \quad f(x) - y \ge 0. \tag{18}
\]
To apply the Lagrange method with equality constraints, first notice that there is no loss of generality in restricting the choice to those (x, y) such that f(x) − y = 0: if f(x) − y > 0, then the decision maker can increase the objective py − wx by increasing y slightly without changing x. This change is feasible because f(x) − y > 0, and it brings in more profit because p > 0. Thus, the constraint (18) can be replaced by the equation f(x) − y = 0. Hence the original problem is equivalent to
\[
\max_{(x,y)\in\mathbb{R}_+^2} p\,y - w\,x \quad \text{subject to} \quad f(x) - y = 0.
\]
Second, note that the domain \(\mathbb{R}_+^2\) of the choice variable is not open, as it contains boundary points such as (0, y) and (x, 0). But since the slope of the graph of f is decreasing, and because of Eq. (17), any supporting hyperplane of the graph of f touches the graph at a point where both coordinates are nonzero. It follows that at any solution (x, y) of the problem, x > 0 and y > 0. Thus, there is no loss of generality in replacing the domain \(\mathbb{R}_+^2\) by the open set (0, ∞)^2. Then we apply the Lagrange method. The Lagrangian by definition is
\[
L(x, y; \lambda) := p\,y - w\,x + \lambda\,\big(f(x) - y\big).
\]
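From here the computation parallels Section 2. As a sketch (assuming, for illustration only, f(x) = √x, which is increasing with a decreasing derivative and satisfies Eq. (17)), the first-order conditions can be solved as follows:
\begin{verbatim}
# Profit maximization with the assumed f(x) = sqrt(x).
import sympy as sp

x, y, lam = sp.symbols("x y lam", positive=True)
p, w = sp.symbols("p w", positive=True)

f = sp.sqrt(x)
L = p * y - w * x + lam * (f - y)                # the Lagrangian above

foc = [sp.diff(L, v) for v in (x, y, lam)]
print(sp.solve(foc, [x, y, lam], dict=True))
# [{x: p**2/(4*w**2), y: p/(2*w), lam: p}]
\end{verbatim}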
7 Exercises
1. Among the sets listed below, which sets are open?
a. (−∞, 0)
b. [0, 3) (i.e., the interval between 0 and 3, including 0 but excluding 3)
c. (0, 5)^2 (i.e., (0, 5) × (0, 5), with (0, 5) denoting the interval between 0 and 5, excluding 0 and 5)
d. {0, 1/2} (i.e., the set consisting of 0 and 1/2)
e. (0, 1) × (0, 1]
2. A firm uses three kinds of inputs to produce one kind of output. If the firm employs a quantity x1 of input 1, quantity x2 of input 2, and quantity x3 of input 3, with (x1, x2, x3) ∈ R^3_+ (R^3_+ denotes [0, ∞)^3), then the maximum quantity of the output is equal to
\[
A\, x_1^{\alpha} x_2^{\beta} x_3^{\gamma},
\]
where A, α, β and γ are each a positive parameter. For each k = 1, 2, 3, the market price of
input k is given to be a positive number, denoted by wk . Hence any input bundle (x1 , x2 , x3 )
would cost the firm w1 x1 + w2 x2 + w3 x3 . The firm has committed to supply a quantity y of
its output, with y a positive parameter.
b. Verify your observations in Step 3a algebraically, in the manner of Eq. (13) (for alignment) and Footnotes 1 and 2 (for perpendicularity).
c. Find a scalar (i.e., a real number) λ1 and a scalar λ2 such that the linear combination
\[
\lambda_1 \begin{bmatrix} -1/3 \\ -2/3 \end{bmatrix} + \lambda_2 \begin{bmatrix} 4 \\ -2 \end{bmatrix}
\]
of the vectors \(\begin{bmatrix} -1/3 \\ -2/3 \end{bmatrix}\) and \(\begin{bmatrix} 4 \\ -2 \end{bmatrix}\) is aligned with the vector \(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\).
d. Do there exist scalars λ1 and λ2 such that the linear combination
\[
\lambda_1 \begin{bmatrix} -1/3 \\ -2/3 \end{bmatrix} + \lambda_2 \begin{bmatrix} 1 \\ 2 \end{bmatrix}
\]
is aligned with the vector \(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\)?
4. Which of the following constrained optimization problems can the Lagrange method with
equality constraints be applied to, either directly or after the problem is rewritten into an
equivalent form?
a. Minimize 3x1 + 4x2 among (x1, x2) ∈ (0, ∞)^2 subject to the constraint min{x1, 2x2} = y.
b. Maximize 5y − 8x among (x, y) ∈ [0, ∞)^2 subject to the inequality constraint y ≤ ln(x + 1).
c. Maximize 5y − 8x among (x, y) ∈ (0, ∞)^2 subject to the constraints y = √x, y = x − 1 and y = x^2 + 1.
d. Maximize 5y − 8x among (x, y) ∈ (0, ∞)^2 subject to the constraints y = √x and y = x/2 + 1/2. (Hint: part (ii) of Condition (b), Footnote 3.)
5. For each function defined below, calculate the gradient at the point (1, 1) and the gradient at
(4, 2) and plot each gradient in a two-dimensional coordinate system.
a. On the x-y plane, graph the function f and the straight line whose slope is equal to w/p (with the specific numbers given above) and which passes through the point (4, 2). Note that the point belongs to the graph of f.
b. Draw an arrow to indicate the gradient of the objective function at the point (4, 2).
c. Analogously, draw an arrow to indicate the gradient of the graph of f at the point (4, 2).
d. Is Eq. (13) satisfied at the point (4, 2)? In other words, are the two gradients aligned?
e. Find the coordinates of the point on the graph of f at which Eq. (13) is satisfied. On the above diagram draw the gradients of the objective and of the graph of f at that point.
7. Which of the following optimization problems is equivalent to one where the inequality constraint is replaced by its corresponding equality, and the domain replaced by an open set?
\[
\max_{(x_1,x_2,x_3)\in\mathbb{R}_+^3} \ \ln x_1 + \beta \ln x_2 \quad \text{subject to} \quad p_1 x_1 + x_3 \le m \ \text{ and } \ r x_3 = p_2 x_2.
\]
a. Explain why the domain of the choice variable can be restricted to the open set (0, ∞)3
without loss of generality.
b. Explain why the weak inequality constraint can be restricted to an equality constraint
without loss of generality.
c. Rewrite the above problem in a form analogous to (1) of this chapter.
d. Define the Lagrangian for the problem obtained in the previous step.
e. Write down the first-order conditions.
f. Demonstrate that the solution of the first-order conditions is:
\[
(x_1^*, x_2^*, x_3^*, \lambda_1^*, \lambda_2^*) = \left( \frac{m}{p_1(1+\beta)},\ \frac{\beta r m}{p_2(1+\beta)},\ \frac{\beta m}{1+\beta},\ -\frac{\beta+1}{m},\ -\frac{\beta+1}{rm} \right).
\]
g. Verify that at the solution the constraint gradients and the objective's gradient are aligned:
\[
\lambda_1^* \nabla g_1(x_1^*, x_2^*, x_3^*) + \lambda_2^* \nabla g_2(x_1^*, x_2^*, x_3^*) = -\nabla u(x_1^*, x_2^*, x_3^*).
\]
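A verification sketch for parts f and g (not part of the exercise itself), substituting the claimed solution into the first-order conditions; the constraint functions are written as g1 = p1x1 + x3 − m and g2 = p2x2 − rx3, an assumed sign convention consistent with the stated negative multipliers:
\begin{verbatim}
# Verify the claimed solution of Exercise 7f by substitution.
import sympy as sp

x1, x2, x3, lam1, lam2 = sp.symbols("x1 x2 x3 lam1 lam2")
b, p1, p2, r, m = sp.symbols("beta p1 p2 r m", positive=True)

u = sp.log(x1) + b * sp.log(x2)
g1 = p1 * x1 + x3 - m                # assumed sign convention
g2 = p2 * x2 - r * x3                # assumed sign convention
L = u + lam1 * g1 + lam2 * g2

sol = {
    x1: m / (p1 * (1 + b)),
    x2: b * r * m / (p2 * (1 + b)),
    x3: b * m / (1 + b),
    lam1: -(b + 1) / m,
    lam2: -(b + 1) / (r * m),
}
focs = [sp.simplify(sp.diff(L, v).subs(sol)) for v in (x1, x2, x3, lam1, lam2)]
print(focs)                          # [0, 0, 0, 0, 0]
\end{verbatim}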