Constrained Optimization II
11/22/22
Envelope theorems describe the relation between an optimized objective and various
parameters of the optimization problem. I don’t know their origin, but they have long
been used in economics. They can also be applied well beyond their traditional setting.
E.g., see Paul Milgrom and Ilya Segal (2002), Envelope theorems for arbitrary choice sets,
Econometrica 70, 583–601. Paul Samuelson showed that Le Chatelier’s Principle
can be derived from an Envelope theorem. See Paul Samuelson (1947), Foundations
of Economic Analysis, Harvard Univ. Press, Cambridge, Mass.
19.1.1 Multiplier Interpretation
Let’s start with the multiplier question. We first examine it in a stripped
down model. Start with an objective function f(x, y) and a single con-
straint function h(x, y). That gives us the following maximization prob-
lem:
    M(c) = max_(x,y) f(x, y)  s.t.  h(x, y) = c.                  (19.1.1)

Here M(c) is the maximum value of f as a function of the right-hand side
of the constraint.
The optimality condition implicitly defines the multiplier µ:

    Df = µ Dh.
Even in the simplest cases where a multiplier is useful, we must have at
least two goods, and the first order condition is a covector equation.
Let’s focus on the first component:

    ∂f/∂x = µ ∂h/∂x   or   µ = (∂f/∂x)/(∂h/∂x).

This suggests that µ measures how the maximized value responds when we
relax the constraint, that is, that

    µ = M′(c) = d/dc [f(x*(c), y*(c))].
In fact, it does.
Theorem 19.1.1. Let f and h be C¹ functions on U ⊂ R², let (x*(c), y*(c))
solve problem (19.1.1) with multiplier µ*(c), and suppose x*(c), y*(c), and
µ*(c) are C¹ functions of c. Then

    µ*(c) = M′(c) = d/dc [f(x*(c), y*(c))].
Sketch of Proof. Since we will use the same method to prove several
variations on the basic Envelope Theorem, it’s worth summarizing our
method upfront.
Logically, the main part of the proof starts when we use the Chain
Rule to calculate the derivative of the optimal value f(x*(c), y*(c)). We
then rewrite the equation twice: Once using the first order conditions
(19.1.2), and a second time using the results (19.1.3) from differentiating
the constraint. At that point the result pops out.
The formal proof is arranged somewhat differently. Since we know we
will need them, we pre-calculate both the first order conditions and the
derivative of the constraint. Once that is done, we are ready for the main
calculation, computing the derivative of the optimal value f(x*(c), y*(c)).
The first order conditions are

    0 = ∂L/∂x = ∂f/∂x − µ* ∂h/∂x,    0 = ∂L/∂y = ∂f/∂y − µ* ∂h/∂y,

so

    ∂f/∂x = µ* ∂h/∂x   and   ∂f/∂y = µ* ∂h/∂y,                    (19.1.2)
with everything evaluated at (x, y, µ) = (x*(c), y*(c), µ*(c)).
Differentiating the constraint h(x*(c), y*(c)) = c with respect to c yields

    (∂h/∂x)(dx*/dc) + (∂h/∂y)(dy*/dc) = 1.                        (19.1.3)
Applying the Chain Rule and using equations (19.1.2) and (19.1.3) we
find
    M′(c) = d/dc [f(x*(c), y*(c))]
          = (∂f/∂x)(dx*/dc) + (∂f/∂y)(dy*/dc)
          = µ*(∂h/∂x)(dx*/dc) + µ*(∂h/∂y)(dy*/dc)        by (19.1.2)
          = µ* [(∂h/∂x)(dx*/dc) + (∂h/∂y)(dy*/dc)]
          = µ*(c).                                        by (19.1.3)
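The theorem is easy to check numerically. Below is a minimal sketch in
Python, assuming the illustrative problem max xy subject to x + y = c
(not from the text), whose known solution is x* = y* = c/2 with µ* = c/2;
it compares µ* from the first order condition with a finite-difference
estimate of M′(c).

```python
# Sketch: numerical check of Theorem 19.1.1 on the assumed problem
#   M(c) = max xy  s.t.  x + y = c.
import numpy as np
from scipy.optimize import minimize

def M(c):
    """Return (M(c), maximizer) for max xy subject to x + y = c."""
    res = minimize(lambda v: -(v[0] * v[1]),           # maximize xy
                   x0=[c / 2, c / 2],
                   constraints=[{'type': 'eq',
                                 'fun': lambda v: v[0] + v[1] - c}],
                   method='SLSQP')
    return -res.fun, res.x

c = 2.0
_, (x, y) = M(c)

# Multiplier from the first order condition Df = mu Dh:
# df/dx = y and dh/dx = 1, so mu = y at the optimum.
mu = y

# M'(c) by a central finite difference.
eps = 1e-4
M_prime = (M(c + eps)[0] - M(c - eps)[0]) / (2 * eps)

print(f"mu = {mu:.6f}, M'(c) ~ {M_prime:.6f}")   # both should be ~ c/2 = 1
```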
19.1.6 Marginal Utility of Income
Let’s apply Theorem 19.1.1 to the simple consumer’s problem we ex-
amined in the previous chapter.
    ∂v/∂m (p, m) = µ*(m).

That is, the multiplier is the marginal utility of income.
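Here is a numerical sketch of this claim, assuming for illustration the
utility u(x) = ln x₁ + ln x₂ (not from the text): it compares µ* recovered
from the first order condition with a finite-difference estimate of ∂v/∂m.

```python
# Sketch: check dv/dm = mu* for the assumed utility u = ln(x1) + ln(x2).
import numpy as np
from scipy.optimize import minimize

def solve(p1, p2, m):
    """Return (v(p, m), demand) for max ln(x1)+ln(x2) s.t. p.x = m."""
    res = minimize(lambda x: -(np.log(x[0]) + np.log(x[1])),
                   x0=[m / (2 * p1), m / (2 * p2)],
                   bounds=[(1e-6, None), (1e-6, None)],
                   constraints=[{'type': 'eq',
                                 'fun': lambda x: p1 * x[0] + p2 * x[1] - m}],
                   method='SLSQP')
    return -res.fun, res.x

p1, p2, m = 1.0, 2.0, 10.0
_, x = solve(p1, p2, m)

# mu* from the first order condition du/dx1 = mu p1: mu = 1/(p1 x1).
mu = 1 / (p1 * x[0])

eps = 1e-3
dv_dm = (solve(p1, p2, m + eps)[0] - solve(p1, p2, m - eps)[0]) / (2 * eps)
print(f"mu* = {mu:.4f}, dv/dm ~ {dv_dm:.4f}")    # both ~ 2/m = 0.2
```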
The same interpretation holds in the vector case, where the first order
conditions are

    Dx f = µ* Dx h.                                               (19.1.5)

Use the Chain Rule to differentiate f(x*(c)) and then substitute equation
(19.1.5) and the derivatives of the constraints, just as before.
It’s easy to see that if a constraint g_h(x*) ≤ b_h does not bind at x*, it
will still not bind if we change b_h slightly. It follows that the maximized
value f(x*(b, c)) does not change either, so its derivative with respect to
b_h must be zero. Since constraint h does not bind, the corresponding
multiplier is zero by complementary slackness. Once again the multiplier
and the derivative are the same.
19.1.10 Theorem on Multipliers and Mixed Constraints
The proof of the following theorem is similar to the previous results.
BEWARE! The requirement that the maximized value and the maximiz-
ers be C1 can be difficult to meet when there are inequality constraints.
The problem is that you might be switching from one type of solution to
another when a constraint starts or finishes binding. The transition might
lead to a discontinuous derivative.
Theorem 19.1.3. Let U ⊂ Rm and suppose f : U → R, g : U → Rk , and
h : U → Rℓ are C1 functions. Let x∗ (b, c) be the solution to the max-
imization problem (19.1.7) and µ∗ (b, c) and λ∗ (b, c) the corresponding
multipliers. Suppose x∗ (b, c), µ∗ (b, c) and λ∗ (b, c) are C1 in (b, c).
Let ĝ be the vector of the k̂ binding constraints at x∗ and define
    G(x) = [ ĝ(x) ]
           [ h(x) ].
We now employ the same procedure as before. The first order condi-
tions are
0 = Dx L = Dx f − µ∗ Dx h,
so
Dx f = µ∗ Dx h. (19.1.8)
Differentiating the constraint h(x*(a), a) = c with respect to a gives
(Dx h)(Da x*) + Da h = 0, so

    (Dx h)(Da x*) = −Da h.                                        (19.1.9)
Then we wrap it all together using the Chain Rule and substituting
equations (19.1.8) and (19.1.9):

    Da M(a) = Da [f(x*(a), a)]                     definition of M
            = (Dx f)(Da x*) + Da f                 Chain Rule
            = µ*(Dx h)(Da x*) + Da f               equation (19.1.8)
            = −µ* Da h + Da f                      equation (19.1.9)
            = Da f − µ* Da h                       rearrangement
            = Da L(x*(a), µ*(a), a).               definition of L
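A numerical sketch of the identity Da M(a) = Da L, assuming for
illustration the problem max xy subject to x + a·y = c with parameter a
(not from the text): the envelope formula gives dM/da = −µ*y*, which we
compare with a finite difference.

```python
# Sketch: check Da M(a) = Da L(x*(a), mu*(a), a) on the assumed problem
#   max xy  s.t.  x + a*y = c.
import numpy as np
from scipy.optimize import minimize

c = 2.0

def solve(a):
    """Return (M(a), maximizer) for max xy subject to x + a*y = c."""
    res = minimize(lambda v: -(v[0] * v[1]),
                   x0=[1.0, 1.0],
                   constraints=[{'type': 'eq',
                                 'fun': lambda v: v[0] + a * v[1] - c}],
                   method='SLSQP')
    return -res.fun, res.x

a = 1.5
_, (x, y) = solve(a)
mu = y                     # from df/dx = mu dh/dx with dh/dx = 1

# Left side: dM/da by a central finite difference.
eps = 1e-4
dM = (solve(a + eps)[0] - solve(a - eps)[0]) / (2 * eps)

# Right side: partial of L = xy - mu(x + a*y - c) in a at the optimum,
# holding (x, y, mu) fixed: Da L = -mu * y.
dL = -mu * y

print(f"dM/da ~ {dM:.6f}, Da L = {dL:.6f}")      # both ~ -c^2/(4 a^2)
```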
19.1.13 Statement of Envelope Theorem
We now state an Envelope Theorem based on the above calculations.
We just need to add appropriate hypotheses so that the preceding cal-
culations are valid. In particular, the hypotheses must guarantee that our
first order conditions are correct and that the maximizers are C1 .
Envelope Theorem. Consider the maximization problem

    M(a) = max_x f(x, a)  s.t.  h(x, a) = c,

and suppose the maximizer x*(a) and multiplier µ*(a) are C¹ functions
of a. Then

    Da M(a) = Da L(x*(a), µ*(a), a).

◮ Example: Roy’s Identity. Apply the Envelope Theorem to the consumer’s
problem v(p, m) = max u(x) s.t. p·x = m, taking a price p_i as the
parameter. Since u does not depend on p_i,

    ∂v/∂p_i = (Dp L)_i = ∂u/∂p_i − µ*(a) x_i(p, m) = −µ*(a) x_i(p, m).

Combining this with ∂v/∂m = µ* yields Roy’s identity:²

    x_i(p, m) = −(∂v/∂p_i)/(∂v/∂m). ◭
² Roy’s identity was established by the French economist and econometrician René
Roy (1894–1977). See René Roy (1947), La distribution du revenu entre les divers
biens, Econometrica 15, 205–225.
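Roy’s identity can also be verified numerically. The sketch below assumes
the illustrative utility u = ln x₁ + ln x₂ with p = (1, 2) and m = 10 (not
from the text), where the direct demand is x₁ = m/(2p₁) = 5.

```python
# Sketch: recover x1 from Roy's identity with finite differences.
import numpy as np
from scipy.optimize import minimize

def v(p1, p2, m):
    """Indirect utility for max ln(x1)+ln(x2) s.t. p.x = m."""
    res = minimize(lambda x: -(np.log(x[0]) + np.log(x[1])),
                   x0=[m / (2 * p1), m / (2 * p2)],
                   bounds=[(1e-6, None), (1e-6, None)],
                   constraints=[{'type': 'eq',
                                 'fun': lambda x: p1 * x[0] + p2 * x[1] - m}],
                   method='SLSQP')
    return -res.fun

p1, p2, m = 1.0, 2.0, 10.0
eps = 1e-3
dv_dp1 = (v(p1 + eps, p2, m) - v(p1 - eps, p2, m)) / (2 * eps)
dv_dm = (v(p1, p2, m + eps) - v(p1, p2, m - eps)) / (2 * eps)

x1_roy = -dv_dp1 / dv_dm
print(f"Roy: x1 = {x1_roy:.4f}, direct demand = {m / (2 * p1):.4f}")
```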
19.1.15 Cost and the Envelope Theorem
Another example involves the cost function.³
◮ Example 19.1.5: Shephard’s Lemma. The cost function c(w, q) is de-
fined by
    c(w, q) = min_z w·z  s.t.  f(z) = q.

By the Envelope Theorem, differentiating the cost function with respect to
the factor prices w gives

    Dw c(w, q) = Dw L = z(w, q)ᵀ.

The row vector z(w, q)ᵀ is the vector of conditional factor demands. This
result is known as Shephard’s Lemma. ◭
³ Shephard’s Lemma is due to Ronald W. Shephard (1953), Cost and Production Func-
tions, Princeton University Press, Princeton, NJ. Cost functions had been used prior to
Shephard, but he may have been the first to use the modern definition. He used a
duality method to prove Shephard’s Lemma, based on the distance function.
Ronald W. Shephard (1912–1982) was an American economist whose research con-
centrated on cost and production. He pioneered the use of duality based on the
distance function. In contrast, the usual economic duality is more directly related to
the conjugate function.
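Shephard’s Lemma is also easy to check numerically. The sketch below
assumes the illustrative technology f(z) = √(z₁z₂) (not from the text),
whose conditional demand is z₁ = q√(w₂/w₁).

```python
# Sketch: check dc/dw1 = z1(w, q) for the assumed technology sqrt(z1 z2).
import numpy as np
from scipy.optimize import minimize

def cost(w1, w2, q):
    """Return (c(w, q), z*) for min w.z subject to sqrt(z1 z2) = q."""
    res = minimize(lambda z: w1 * z[0] + w2 * z[1],
                   x0=[q, q],
                   bounds=[(1e-6, None), (1e-6, None)],
                   constraints=[{'type': 'eq',
                                 'fun': lambda z: np.sqrt(z[0] * z[1]) - q}],
                   method='SLSQP')
    return res.fun, res.x

w1, w2, q = 1.0, 4.0, 2.0
_, z = cost(w1, w2, q)

eps = 1e-4
dc_dw1 = (cost(w1 + eps, w2, q)[0] - cost(w1 - eps, w2, q)[0]) / (2 * eps)
print(f"dc/dw1 ~ {dc_dw1:.4f}, z1 = {z[0]:.4f}")  # both ~ q*sqrt(w2/w1) = 4
```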
19.1.16 Expenditure and the Envelope Theorem
We can also apply this to the very similar expenditure function.⁴
◮ Example 19.1.6: Shephard-McKenzie Lemma. The expenditure function
solves

    e(p, ū) = min_x p·x  s.t.  u(x) = ū.

By the Envelope Theorem, Dp e(p, ū) = x(p, ū)ᵀ, the vector of compensated
(Hicksian) demands. ◭
⁴ The expenditure function was introduced by Lionel McKenzie (1957), Demand
theory without a utility index, Rev. Econ. Studies 24, 185–189.
Lionel McKenzie (1919–2010) was an American economist known for proving the
existence of general equilibrium, for his work on demand theory, and for optimal growth
and turnpike theory.
    B = [ 0_(ℓ×ℓ)    Dh(x*) ]
        [ Dh(x*)ᵀ    D²_x L ].
The extra minus signs do not affect whether the matrix D2x L is constrained
positive definite or constrained negative definite. Compared to the usual
bordered Hessian, the principal minors are unchanged because for each
column that is multiplied by (−1), the corresponding row is also multi-
plied by (−1). Together, they multiply each principal minor by (−1)² = 1,
leaving every principal minor unchanged.
The bordered Hessian that appears in the proof of Theorem 19.2.1 is
the natural bordered Hessian, D2(µ,x)L, not our usual bordered Hessian.
Since they are equivalent in the sense that they both are negative definite
on the same subspaces and yield the same conditions on minors, it
doesn’t really affect the statement of the theorem.
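The sign argument above can be illustrated numerically: flipping the signs
of both the border row and the border column leaves every leading principal
minor unchanged. The matrices below are randomly generated stand-ins for
Dh(x*) and D²_xL, chosen only for illustration.

```python
# Sketch: leading principal minors of B and of the sign-flipped border
# version agree, since each border column flip pairs with a row flip.
import numpy as np

rng = np.random.default_rng(0)
ell, m = 1, 3                        # one constraint, three variables
Dh = rng.normal(size=(ell, m))       # stand-in for Dh(x*)
H = rng.normal(size=(m, m))
H = H + H.T                          # stand-in for the symmetric D2xL

B = np.block([[np.zeros((ell, ell)), Dh],
              [Dh.T,                 H]])
Bn = np.block([[np.zeros((ell, ell)), -Dh],
               [-Dh.T,                H]])   # borders multiplied by -1

for r in range(1, ell + m + 1):
    print(f"r={r}: det B_r = {np.linalg.det(B[:r, :r]):+.4f}, "
          f"det Bn_r = {np.linalg.det(Bn[:r, :r]):+.4f}")
```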
19.2.4 Proof of Theorem 19.2.1
Proof. We start by noting that (4) and (4’) are equivalent. It doesn’t
matter which we use.
The rest of the proof proceeds by contradiction. Suppose that x∗ is
not a strict local maximum. Then for n = 1, 2, . . . , we can find points
x_n ≠ x* obeying h(x_n) = c with ‖x_n − x*‖ < 1/n and f(x_n) ≥ f(x*).
Next define a sequence of unit vectors by

    u_j = (x_j − x*)/‖x_j − x*‖.
These all lie on the unit sphere S^(m−1) = {x ∈ R^m : ‖x‖ = 1}, which is
a compact set. It follows that there is a convergent subsequence u_(j_k).
We call its limit u*.
Since h is C², its first order Taylor polynomial with remainder about x*
can be written⁵

    0 = Dh(x*) (x_(j_k) − x*)/‖x_(j_k) − x*‖ + R₁(x_(j_k); x*)/‖x_(j_k) − x*‖.

Letting k → ∞, the remainder term vanishes in the limit, leaving

    0 = Dh(x*) u*.
⁵ See section 30.29.
19.2.5 Remainder of Proof of Theorem 19.2.1
Remainder of Proof. Take the second order Taylor polynomial of the
Lagrangian L about x* with remainder S₂. Since Dx L = 0 at (x*, µ*),

    L(x_(j_k), µ*) − L(x*, µ*) = ½ (x_(j_k) − x*)ᵀ D²_x L(x*, µ*) (x_(j_k) − x*)
                                 + S₂(x_(j_k); x*).
Since h(x_(j_k)) = c, L(x_(j_k), µ*) = f(x_(j_k)), so

    0 ≤ f(x_(j_k)) − f(x*)
      ≤ L(x_(j_k), µ*) − L(x*, µ*)
      = ½ (x_(j_k) − x*)ᵀ D²_x L(x*, µ*) (x_(j_k) − x*) + S₂(x_(j_k); x*).

Dividing by ‖x_(j_k) − x*‖² and letting k → ∞, the remainder term
vanishes, leaving

    0 ≤ ½ u*ᵀ D²_x L(x*, µ*) u*.

But u* is a unit vector with Dh(x*) u* = 0, so it lies in the subspace
{v ∈ R^m : Dh(x*) v = 0}, contradicting the assumption that D²_x L(x*, µ*)
is negative definite there. This contradiction shows x* is a strict local
constrained maximum of f.
Minimum. Assumptions (1) and (2) tell us that (x*, µ*) is a critical point
of the minimization problem analogous to (19.1.4). If instead D²_x L(x*, µ*)
is positive definite on {v ∈ R^m : Dh(x*) v = 0}, then x* is a strict local
constrained minimum of f on M.
Moreover, the conclusion remains true if (4) is replaced by (4′):
4′. The last (m − ℓ) leading principal minors of the bordered Hessian B
all have the same sign and the determinant of B obeys (−1)^ℓ det B > 0.
Proof. Mimic the proof of Theorem 19.2.1.
19.2.7 2nd Order Conditions: Mixed Constraint Maxima
We’ll jump directly to the full mixed constraint version of the theorem.
Theorem 19.2.3. Let U ⊂ Rm and suppose the functions f : U → R,
g : U → Rk , and h : U → Rℓ are C2 functions. Suppose further that there
are k̂ binding constraints at x∗ . Let ĝ be the vector of binding inequality
constraints at x∗ with corresponding constants b̂. Set
    G(x) = [ ĝ(x) ]
           [ h(x) ].
Now suppose
1. x* ∈ M.
2. There are λ* ∈ R^k and µ* ∈ R^ℓ with
       Dx L = 0 at (x*, λ*, µ*),
       λ* ≥ 0,
       λ*₁(g₁(x*) − b₁) = 0, . . . , λ*_k(g_k(x*) − b_k) = 0,
       h(x*) = c.
Equivalently, the minors det B̂_r of any given size r > m − (k̂ + ℓ) are
either all non-positive or all non-negative. The generalized sign alternates
with r, and (−1)^(k̂+ℓ) det B̂ > 0.
Simon and Blume punted on the determinant conditions. The best
thing to do is to look at the source. Our determinant condition is a
rewritten version of the one in Debreu (1952).⁶
Minimum. If x∗ is a minimum, we have to reverse the inequalities for g,
and adjust the Lagrangian accordingly. In that case, we can conclude
that the Hessian of the Lagrangian at (x∗ , λ∗ , µ∗ ) is positive semidefinite
on the tangent space Tx∗ .
Equivalently, the minors det B̂_r with r > m − (k̂ + ℓ) are either all
non-positive or all non-negative, and (−1)^(m−k̂−ℓ) det B̂ > 0.
⁶ Gerard Debreu (1952), Definite and semidefinite quadratic forms, Econometrica
20, 295–300.
To solve the first order conditions for the optimizer as a function of the
parameters via the Implicit Function Theorem, the Jacobian of those
conditions, here the Hessian of f, must be non-singular.
If the second order sufficient conditions for a strict local maximum
are satisfied, the Hessian D²_(x,y)f will be negative definite, and hence
non-singular.
So let’s write a theorem summing up what we proved. The Implicit
Function Theorem requires that our implicit function be defined by a
function that is itself C1. It also allows multiple exogenous variables, like
our variable a.
Differentiability is a local property, so the fact that the Implicit Function
Theorem only gives local results should not be a problem.
If we consider a similar minimization problem, the only real difference
is that the similar second order sufficient condition is that D2x f be positive
definite at x(a), which again means D2x f is non-singular.
Let’s add these considerations to our theorem.
19.3.3 Smoothness of Unconstrained Optima
Here’s a version of such a theorem, stated for x ∈ Rm .
Theorem 19.3.1. Let U ⊂ R^m and V ⊂ R. Suppose f : U × V → R is C²
in x ∈ U and C¹ in a ∈ V. Suppose also there is (x₀, a₀) ∈ U° × V°
with Dx f(x₀, a₀) = 0.
If the Hessian D²_x f(x₀, a₀) is either positive definite or negative
definite, then there is an ε > 0 and a C¹ function x* : B_ε(a₀) → R^m
with x*(a) obeying

    Dx f(x*(a), a) = 0.

In the constrained case, any constrained optimum x*(a) satisfies the first
order conditions

    Dx L(x, µ, a) = 0,    Dµ L(x, µ, a) = 0.
    D²_(µ,x)L = [ 0       −Dh    ]
                [ −Dhᵀ    D²_x L ].                               (19.3.13)
It is the version of the bordered Hessian we previously encountered
following Theorem 19.2.1. This matrix will be singular if NDCQ fails.
That means that if we require D2(µ,x)L be invertible, we are also requiring
that the NDCQ is satisfied, as shown by the following theorem.
Theorem 19.3.2. Suppose D2(µ,x)L(x∗ ) given by equation (19.3.13) is in-
vertible. Then rank Dh(x∗ ) = ℓ.
Proof. Consider

    D²_(µ,x)L(x*) (u, v) = ( −Dh(x*) v,  −Dh(x*)ᵀ u + D²_x L(x*) v ).

If rank Dh(x*) < ℓ, there is a u ≠ 0 with Dh(x*)ᵀ u = 0. Then (u, 0) is a
non-zero vector in the kernel of D²_(µ,x)L(x*), contradicting invertibility.
Hence rank Dh(x*) = ℓ.
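The contrapositive is easy to see numerically: when the constraint
gradients are linearly dependent, the natural bordered Hessian acquires a
kernel. The matrices below are made-up stand-ins, not taken from any
particular problem.

```python
# Sketch: rank Dh < l makes the natural bordered Hessian singular.
import numpy as np

ell, m = 2, 3
Dh = np.array([[1.0, 2.0, 3.0],      # two linearly dependent
               [2.0, 4.0, 6.0]])     # constraint gradients: rank 1 < 2
H = np.diag([1.0, 2.0, 3.0])         # stand-in for D2xL(x*)

D2L = np.block([[np.zeros((ell, ell)), -Dh],
                [-Dh.T,                H]])

print("rank Dh =", np.linalg.matrix_rank(Dh))   # 1
print("det D2L =", np.linalg.det(D2L))          # ~ 0: singular

u = np.array([2.0, -1.0])                       # Dh^T u = 0
print(D2L @ np.concatenate([u, np.zeros(m)]))   # the zero vector
```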
Consider again the equality-constrained problem

    max_x f(x)  s.t.  h(x) = c,

this time with a multiplier µ₀ on the objective. The first order conditions
are

    0 = Dx L = µ₀ Dx f − µ₁ Dx h,    0 = ∂L/∂µ₁ = c − h(x).
    max_(x,y) x  s.t.  x³ + y² = 0.

The Lagrangian with a multiplier on the objective is

    L = µ₀ x − µ₁(x³ + y²),

with first order conditions

    µ₀ = 3µ₁x²,    0 = −2µ₁y,    0 = x³ + y².

Suppose µ₀ = 0. Then the first equation becomes 0 = 3µ₁x², so either
x = 0 or µ₁ = 0.
19.4.3 Example 18.33 Revisited II
If x < 0, then µ₁ = 0, which together with µ₀ = 0 is forbidden by the
condition that at least one µ_i must be non-zero.
Finally, if x = 0, then y = 0, and any value of µ₁ will satisfy the first
order equations. This yields the correct solution (x, y) = (0, 0).
[Figure: the constraint curve x³ + y² = 0 with its cusp at x* = (0, 0).]
Figure 19.4.2: The solution is at the cusp, x* = (0, 0). There Dh(x*) = (0, 0),
forcing the multiplier µ₀ = 0 because Df = (1, 0) ≠ (0, 0).
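A minimal numerical companion to the example (a sketch, with the gradients
hard-coded from f(x, y) = x and h(x, y) = x³ + y²): at the cusp the
standard Lagrange condition Df = µ₁Dh is unsolvable, while the Fritz John
condition holds with µ₀ = 0.

```python
# Sketch: gradients at the cusp x* = (0, 0) of the example.
import numpy as np

def Df(x, y):
    return np.array([1.0, 0.0])            # f(x, y) = x

def Dh(x, y):
    return np.array([3 * x**2, 2 * y])     # h(x, y) = x^3 + y^2

x_star = (0.0, 0.0)
print("Df(x*) =", Df(*x_star))             # (1, 0): never vanishes
print("Dh(x*) =", Dh(*x_star))             # (0, 0): NDCQ fails

# Standard Lagrange: (1, 0) = mu1 * (0, 0) has no solution.
# Fritz John: mu0 * Df - mu1 * Dh = 0 holds with mu0 = 0, mu1 = 1.
mu0, mu1 = 0.0, 1.0
print("FJ residual:", mu0 * Df(*x_star) - mu1 * Dh(*x_star))   # [0. 0.]
```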
19.4.4 Fritz John Theorem
This version of Theorem 19.4.1, including a multiplier on the objective,
allows for inequality constraints.⁷
Theorem 19.4.3. Let U ⊂ Rm . Suppose f : U → R and g : U → Rk
are C1 functions. Suppose further that x∗ is a local max of f under the
constraints
g(x) ≤ b.
Form the Lagrangian

    L(x, λ₀, . . . , λ_k) = λ₀ f(x) − Σ_(j=1)^k λ_j (g_j(x) − b_j)
with a multiplier λ0 for the objective function. Then there exist λ∗0, . . . , λ∗k
such that:
1. Dx L(x*, λ*₀, . . . , λ*_k) = 0.
2. λ*₁[g₁(x*) − b₁] = 0, . . . , λ*_k[g_k(x*) − b_k] = 0.
3. λ*₁ ≥ 0, . . . , λ*_k ≥ 0.
4. g(x*) ≤ b.
5. λ*₀ = 0 or λ*₀ = 1, and
6. (λ*₀, . . . , λ*_k) ≠ 0.
⁷ The work of Karush, Kuhn, Tucker, and John founded the subject of nonlinear
programming, the study of nonlinear constrained optimization. Fritz John’s original
paper was Fritz John (1948), “Extremum problems with inequalities as subsidiary condi-
tions” in K.O. Friedrichs et al. (eds.), Studies and Essays, Courant Anniversary Volume,
Wiley/Interscience (Reprinted in: J. Moser (ed.): Fritz John Collected Papers vol. 2,
Birkhäuser, 1985, pp. 543–560).
19.4.5 More Constraint Qualification: Maxima
There are other constraint qualification conditions that can be used. The
following theorem collects some of them without proof.
Theorem 19.4.4. Let U ⊂ Rm and suppose f : U → R and g : U → Rk
are C1 functions. Suppose also that x∗ is a local maximum of f under the
constraints
g(x) ≤ b.