Solutions Chapter 3
SECTION 3.2
3.2.6 www
For the reverse assertion, assume that x∗ and λ∗ satisfy the second order sufficiency conditions of Prop. 3.2.1. Let ȳ ∈ ℝⁿ and z̄ ∈ ℝᵐ be vectors such that
J (ȳ, z̄) = 0.
Consequently
∇²xx L(x∗, λ∗)ȳ + ∇h(x∗)z̄ = 0, (1)
∇h(x∗)′ȳ = 0. (2)
Premultiplying Eq. (1) by ȳ′ and using Eq. (2), we obtain ȳ′∇²xx L(x∗, λ∗)ȳ = 0. In view of Eq. (2), it follows that ȳ = 0, for otherwise the second order sufficiency condition would be violated. Then Eq. (1) yields ∇h(x∗)z̄ = 0. Since x∗ is a regular point, we must have z̄ = 0. Hence, J is invertible.
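As a numerical illustration of this argument (not part of the original solution; the data Q and a below are made up), the following Python sketch builds the matrix J for a problem whose Lagrangian Hessian is indefinite but positive definite on the constraint nullspace, and checks that J is invertible:

# Numerical illustration (hypothetical data): the KKT matrix
#     J = [ Q   a ]
#         [ a'  0 ]
# is invertible when Q = the Hessian of the Lagrangian is positive
# definite on the nullspace of a' = ∇h(x*)', even though Q is indefinite.
import numpy as np

Q = np.diag([1.0, -1.0])            # indefinite Hessian of the Lagrangian
a = np.array([[0.0], [1.0]])        # ∇h(x*); nullspace of a' is {(d, 0)}
J = np.block([[Q, a], [a.T, np.zeros((1, 1))]])

d = np.array([1.0, 0.0])            # spans the nullspace of a'
print("d'Qd =", d @ Q @ d)          # > 0: second order sufficiency holds
print("det J =", np.linalg.det(J))  # nonzero: J is invertible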
3.2.7 www
We have
∇²p(u) = −∇λ(u).
The first order condition holds at x(u):
∇f(x(u)) + ∇h(x(u))λ(u) = 0.
Differentiating this relation with respect to u, we have
∇x(u)∇²xx L(x(u), λ(u)) + ∇λ(u)∇h(x(u))′ = 0.
We also have ∇x(u)∇h(x(u)) = I, from which we obtain for all c ∈ ℝ
c∇x(u)∇h(x(u))∇h(x(u))′ = c∇h(x(u))′.
By adding the last two relations,
∇x(u)(∇²xx L(x(u), λ(u)) + c∇h(x(u))∇h(x(u))′) + (∇λ(u) − cI)∇h(x(u))′ = 0.
From this, we obtain, for every c for which the inverse below exists,
∇x(u) + (∇λ(u) − cI)∇h(x(u))′(∇²xx L(x(u), λ(u)) + c∇h(x(u))∇h(x(u))′)⁻¹ = 0.
Multiplying with ∇h(x(u)) and using the equations ∇x(u)∇h(x(u)) = I and ∇²p(u) = −∇λ(u), we see that
∇²p(u) = (∇h(x(u))′(∇²xx L(x(u), λ(u)) + c∇h(x(u))∇h(x(u))′)⁻¹∇h(x(u)))⁻¹ − cI.
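The formula can be sanity-checked numerically. The sketch below (with made-up data Q, A) applies it to the equality-constrained quadratic program minimize (1/2)x′Qx subject to Ax = u, for which ∇h(x) = A′, ∇²xx L = Q, and ∇²p(u) = (AQ⁻¹A′)⁻¹ independently of u; the right-hand side of the formula should therefore come out the same for every admissible c:

# Check ∇²p(u) = (∇h'(∇²L + c∇h∇h')⁻¹∇h)⁻¹ − cI on a toy quadratic program
# (all data made up): minimize (1/2)x'Qx subject to Ax = u.
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)         # positive definite ∇²xx L
A = rng.standard_normal((m, n))     # full row rank (with probability 1)

exact = np.linalg.inv(A @ np.linalg.inv(Q) @ A.T)   # known Hessian of p
for c in [0.0, 1.0, 10.0]:
    H = Q + c * A.T @ A             # ∇²xx L + c ∇h ∇h'
    formula = np.linalg.inv(A @ np.linalg.inv(H) @ A.T) - c * np.eye(m)
    print(c, np.allclose(formula, exact))           # True for every such c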
SECTION 3.3
3.3.5 www
(a) Let d ∈ F̄(x∗) be arbitrary. Then there exists a sequence {dk} ⊆ F(x∗) such that dk → d. For each dk, we have
∇f(x∗)′dk = lim_{α→0} (f(x∗ + αdk) − f(x∗))/α.
Since x∗ is a constrained local minimum, we have (f(x∗ + αdk) − f(x∗))/α ≥ 0 for all sufficiently small α (for which x∗ + αdk is feasible), and thus ∇f(x∗)′dk ≥ 0. Hence
∇f(x∗)′d = lim_{k→∞} ∇f(x∗)′dk ≥ 0,
as desired.
(b) According to Farkas' lemma, this is true if and only if there exists µ∗ such that
−∇f(x∗) = Σ_{j∈A(x∗)} µ∗j ∇gj(x∗), µ∗j ≥ 0.
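As a hedged numerical aside, such a µ∗ can be computed by nonnegative least squares, with a zero residual certifying that −∇f(x∗) lies in the cone generated by the active gradients. The gradients below are made up for illustration:

# Sketch (made-up data): recover µ* ≥ 0 with -∇f(x*) = Σ µ*_j ∇g_j(x*)
# via nonnegative least squares; zero residual is a Farkas certificate.
import numpy as np
from scipy.optimize import nnls

G = np.array([[1.0, 0.0],           # columns are the gradients ∇g_j(x*)
              [0.0, 1.0]])
grad_f = np.array([-2.0, -3.0])     # ∇f(x*), chosen so multipliers exist

mu, residual = nnls(G, -grad_f)     # least squares with mu >= 0
print("mu =", mu, "residual =", residual)   # mu = [2. 3.], residual = 0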
(c) We want to show that F̄(x∗) = V(x∗), where V(x∗) is the cone of first order feasible variations, given by
V(x∗) = {d | ∇gj(x∗)′d ≤ 0, ∀ j ∈ A(x∗)}.
First, let's show that under any of the conditions (1)–(4), we have F̄(x∗) ⊆ V(x∗). By the Mean Value Theorem, for each j ∈ A(x∗) and for any d ∈ F(x∗) there is some θ ∈ [0, 1] such that
gj(x∗ + αd) = gj(x∗) + α∇gj(x∗ + θαd)′d.
Because gj(x∗ + αd) ≤ 0 for all α ∈ [0, ᾱ] and gj(x∗) = 0 for all j ∈ A(x∗), we obtain for all j ∈ A(x∗)
∇gj(x∗)′d = lim_{α→0} ∇gj(x∗ + θαd)′d ≤ 0,
so that d ∈ V(x∗). Therefore F(x∗) ⊆ V(x∗) and F̄(x∗) ⊆ V(x∗) [because V(x∗) is closed].
Now we need to show that V(x∗) ⊆ F̄(x∗) for each of the parts (1) through (4).
(1) Let gj(x) = bj′x + cj for all j, where the bj are vectors and the cj are scalars. Let d ∈ V(x∗). We have
gj(x∗ + αd) = bj′(x∗ + αd) + cj = gj(x∗) + αbj′d.
If j ∈ A(x∗), then by the definition of V(x∗) we have bj′d = ∇gj(x∗)′d ≤ 0, so that gj(x∗ + αd) ≤ gj(x∗) = 0 for all α > 0. If j ∉ A(x∗) and bj′d ≤ 0, then gj(x∗ + αd) ≤ gj(x∗) < 0 for any α > 0 [because this constraint is not tight at x∗]. If j ∉ A(x∗) and bj′d > 0, then gj(x∗ + αd) ≤ 0 for all α ≤ ᾱj, where ᾱj = −gj(x∗)/(bj′d) [here we use gj(x∗) < 0]. Therefore we have gj(x∗ + αd) ≤ 0 for all j and all α ≤ ᾱ, where
ᾱ = min{ᾱj | j ∉ A(x∗), bj′d > 0}.
Hence d ∈ F(x∗), and it follows that V(x∗) ⊆ F(x∗) ⊆ F̄(x∗).
(2) Let d̄ be a vector such that ∇gj(x∗)′d̄ < 0 for all j ∈ A(x∗), and let d ∈ V(x∗). For γ ∈ (0, 1], define dγ = γd̄ + (1 − γ)d. By using the Mean Value Theorem, for each j there is some θ ∈ [0, 1] such that
gj(x∗ + αdγ) = gj(x∗) + α∇gj(x∗ + θαdγ)′dγ.
Let γ be fixed. If j ∉ A(x∗), then by using the fact gj(x∗) < 0 it can be seen that for all sufficiently small α we have
gj(x∗ + αdγ) ≤ 0.
If j ∈ A(x∗), then ∇gj(x∗)′d̄ < 0; this combined with the fact d ∈ V(x∗) gives ∇gj(x∗)′dγ ≤ γ∇gj(x∗)′d̄ < 0, and hence implies that for all sufficiently small α
gj(x∗ + αdγ) = α∇gj(x∗ + θαdγ)′dγ ≤ 0.
Therefore, for a fixed γ, there exists a sufficiently small ᾱ such that gj(x∗ + αdγ) ≤ 0 for all j and α ∈ (0, ᾱ]. Thus dγ ∈ F(x∗) for all γ ∈ (0, 1], and
lim_{γ→0} dγ = d ∈ F̄(x∗).
(3) Let x be a vector such that gj(x) < 0 for all j ∈ A(x∗). By the convexity of gj, we have
gj(x∗) + ∇gj(x∗)′(x − x∗) ≤ gj(x) < 0.
By defining d̄ = x − x∗ and by using gj(x∗) = 0 for all j ∈ A(x∗), from the preceding relation we obtain
∇gj(x∗)′d̄ < 0, ∀ j ∈ A(x∗),
so that condition (2) is satisfied and the conclusion follows from part (2).
(4) Let B be the matrix with rows ∇gj(x∗)′, j ∈ A(x∗). Since these gradients are linearly independent, B has full row rank, so that the square matrix BB′ is invertible and the matrix Br = B′(BB′)⁻¹ is well-defined. Let
d = Br(−1, …, −1)′.
Then Bd = BB′(BB′)⁻¹(−1, …, −1)′ = (−1, …, −1)′, which is equivalent to
∇gj(x∗)′d = −1, ∀ j ∈ A(x∗).
Thus condition (2) is satisfied with d̄ = d, and the conclusion follows from part (2).
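The construction in part (4) is easy to check numerically; in the sketch below the matrix B is random, standing in for the matrix of active gradients:

# Sketch (made-up data): d = B'(BB')⁻¹(-1,...,-1)' satisfies
# ∇g_j(x*)'d = -1 for every active constraint j.
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 5))                 # 3 active constraints in R^5
d = B.T @ np.linalg.solve(B @ B.T, -np.ones(3))
print(B @ d)                                    # [-1. -1. -1.]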
(d) For this problem we can easily see that the point x∗ = (0, 0) is a constrained local minimum.
We have
∇g1(0, 0) = (0, 1)′ and ∇g2(0, 0) = (0, −1)′.
Note that both constraints are active at x∗ = (0, 0), i.e., A(x∗ ) = {1, 2}. Evidently g1 and g2 are
not linear, so the condition (c1) does not hold. Furthermore, there is no vector d = (d1, d2)′ such that
∇g1(0, 0)′d = d2 < 0 and ∇g2(0, 0)′d = −d2 < 0.
Hence, the condition (c2) is violated. If the condition (c3) held, then, as seen in the proof of part (c), the condition (c2) would also hold, which is a contradiction. Therefore, at x∗ = (0, 0) the
condition (c3) does not hold. The vectors ∇g1 (0, 0) and ∇g2 (0, 0) are linearly dependent since
∇g1 (0, 0) = −∇g2 (0, 0), so the condition (c4) is also violated.
or equivalently
(µ0, µ0)′ + (0, µ1)′ + (0, −µ2)′ = (0, 0)′.
It follows that µ0 = 0, i.e., there is no Lagrange multiplier.
(e) Note that {x | h(x) = 0} = {x | ||h(x)||² ≤ 0}, so that x∗ is also a local minimum for the modified problem. The modified problem has a single constraint g1(x) = ||h(x)||², which is active at x∗. Since g1 is not linear, the condition (c1) does not hold. Because ∇g1(x∗) = 2∇h(x∗)h(x∗) = 0, the conditions (c2) and (c4) are violated at x∗. If g1 is convex and the condition (c3) holds, then as seen in the proof of (c3), the condition (c2) also holds, which is a contradiction. Hence, at x∗ each of the conditions (1)–(4) of part (c) is violated. From the Fritz John condition
µ∗0∇f(x∗) + µ∗1∇g1(x∗) = 0
and ∇g1(x∗) = 0, it follows that µ∗0∇f(x∗) = 0, and since ∇f(x∗) ≠ 0, we must have µ∗0 = 0, i.e., there is no Lagrange multiplier.
3.3.6 www
Assume that there exist x ∈ ℝⁿ and µ ∈ ℝᵐ such that conditions (i) and (ii) hold, i.e.,
ai′x < 0, i = 1, …, m, (1)
and
Σ_{i=1}^m µi ai = 0, µ ≠ 0, µ ≥ 0, (2)
where the ai are the row vectors of the matrix A. Without loss of generality, we may assume that µ1 > 0. By pre-multiplying Eq. (1) with µi ≥ 0 and summing the obtained inequalities over i, we have
Σ_{i=1}^m µi ai′x ≤ µ1 a1′x < 0.
On the other hand, Eq. (2) yields
Σ_{i=1}^m µi ai′x = 0,
which is a contradiction. Hence, conditions (i) and (ii) cannot hold simultaneously.
The proof will be complete if we can show that conditions (i) and (ii) cannot fail to hold simultaneously. Indeed, if condition (i) fails to hold, the minimax problem
minimize max_{1≤i≤m} ai′x
subject to x ∈ ℝⁿ
has x = 0 as a solution. Hence by Prop. 3.3.10, there exists a µ ≥ 0 with Σ_{i=1}^m µi = 1 such that Σ_{i=1}^m µi ai = 0, or A′µ = 0. Thus condition (ii) holds, and it follows that conditions (i) and (ii) cannot fail to hold simultaneously.
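The two alternatives can also be tested by linear programming. The following sketch (made-up matrices) checks (i) via the feasibility of Ax ≤ −e, which by scaling is equivalent to solvability of Ax < 0, and (ii) via the feasibility of A′µ = 0, Σµi = 1, µ ≥ 0:

# Sketch (made-up matrices): for any A, exactly one of the following holds:
# (i) some x with Ax < 0, or (ii) some µ ≥ 0, µ ≠ 0, with A'µ = 0.
import numpy as np
from scipy.optimize import linprog

def gordan(A):
    m, n = A.shape
    # (i): Ax ≤ -1 is feasible iff Ax < 0 has a solution (by scaling).
    x_lp = linprog(c=np.zeros(n), A_ub=A, b_ub=-np.ones(m),
                   bounds=[(None, None)] * n)
    # (ii): µ ≥ 0 with A'µ = 0 and Σµ_i = 1 (normalization rules out µ = 0).
    mu_lp = linprog(c=np.zeros(m),
                    A_eq=np.vstack([A.T, np.ones((1, m))]),
                    b_eq=np.append(np.zeros(n), 1.0),
                    bounds=[(0, None)] * m)
    return x_lp.success, mu_lp.success

print(gordan(np.array([[1.0, 0.0], [0.0, 1.0]])))    # (True, False)
print(gordan(np.array([[1.0, 0.0], [-1.0, 0.0]])))   # (False, True)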
3.3.7 www
Assume, to obtain a contradiction, that the conclusion does not hold, so that there is a sequence {xk} such that xk → x∗, and for all k, xk ≠ x∗, h(xk) = 0, and f(xk) < f(x∗) + (1/k)||xk − x∗||².
Let us write xk = x∗ + δk yk, where
δk = ||xk − x∗||, yk = (xk − x∗)/||xk − x∗||.
The sequence {yk} is bounded and lies on the surface of the unit sphere, so it must have a subsequence converging to some y with ||y|| = 1. Without loss of generality, we assume that the whole sequence {yk} converges to y.
We have
∇f(x∗) + Σ_{i=1}^m λ∗i ∇hi(x∗) + Σ_{j∈A+(x∗)} µ∗j ∇gj(x∗) = 0,
where
A+(x∗) = {j | µ∗j > 0}.
By taking the inner product of this relation with y and by using the equation ∇hi(x∗)′y = 0, we obtain
∇f(x∗)′y + Σ_{j∈A+(x∗)} µ∗j ∇gj(x∗)′y = 0.
Since all the terms in the above equation have been shown to be nonpositive, they must all be
equal to 0, showing that Eq. (1) holds.
We will now show that y′∇²xx L(x∗, λ∗)y ≤ 0, thus coming to a contradiction [cf. Eq. (2)]. Since xk = x∗ + δk yk, by the mean value theorem [Prop. A.23(b) in Appendix A], we have
(1/k)||xk − x∗||² > f(xk) − f(x∗) = δk ∇f(x∗)′yk + ((δk)²/2) yk′∇²f(ξ̃k)yk, (3)
0 = hi(xk) − hi(x∗) = δk ∇hi(x∗)′yk + ((δk)²/2) yk′∇²hi(ξ̄ik)yk, i = 1, …, m, (4)
0 ≥ gj(xk) − gj(x∗) = δk ∇gj(x∗)′yk + ((δk)²/2) yk′∇²gj(ξ̂jk)yk, j ∈ A(x∗), (5)
where all the vectors ξ̃k, ξ̄ik, and ξ̂jk lie on the line segment joining x∗ and xk. Multiplying Eqs. (4) and (5) by λ∗i and µ∗j, respectively, adding them and adding Eq. (3) to them, we obtain
(1/k)||xk − x∗||² > δk (∇f(x∗) + Σ_{i=1}^m λ∗i ∇hi(x∗) + Σ_{j∈A(x∗)} µ∗j ∇gj(x∗))′ yk
+ ((δk)²/2) yk′ (∇²f(ξ̃k) + Σ_{i=1}^m λ∗i ∇²hi(ξ̄ik) + Σ_{j∈A(x∗)} µ∗j ∇²gj(ξ̂jk)) yk.
Since δk = ||xk − x∗|| and ∇f(x∗) + Σ_{i=1}^m λ∗i ∇hi(x∗) + Σ_{j∈A(x∗)} µ∗j ∇gj(x∗) = 0, we obtain
2/k > yk′ (∇²f(ξ̃k) + Σ_{i=1}^m λ∗i ∇²hi(ξ̄ik) + Σ_{j∈A(x∗)} µ∗j ∇²gj(ξ̂jk)) yk.
3.3.10 www
(a) Consider a problem where there are two identical equality constraints [h1 (x) = h2 (x) for all
x], and assume that x∗ is a local minimum such that ∇h1 (x∗ ) = 0. Then, ∇f (x∗ )+λ∇h1 (x∗ ) = 0
for some λ. Take a scalar γ > 0 such that λ + γ > 0 and let λ∗1 = λ + γ and λ∗2 = −γ. Then we
have
∇f (x∗ ) + λ∗1 ∇h1 (x∗ ) + λ∗2 ∇h2 (x∗ ) = 0,
but since λ∗1 and λ∗2 have different signs, there is no x such that simultaneously we have λ∗1 h1 (x) >
0 and λ∗2 h2 (x) > 0. Thus λ∗1 and λ∗2 violate the last Fritz John condition. As an alternative
example, consider the following inequality constrained problem
minimize x1 + x2
Then x∗ = (0, 0) is a local minimum with A(x∗) = {1, 2}, and µ∗0 = µ∗1 = µ∗2 = 1 satisfy the Karush-Kuhn-Tucker conditions.
However, there is no point (x1 , x2 ) such that g1 (x1 , x2 ) > 0 and g2 (x1 , x2 ) > 0, i.e., the Fritz
John condition (iv) does not hold.
(b) For simplicity, assume that all the constraints are inequalities (equality constraints can be handled by conversion to two inequalities). If ∇f(x∗) = 0, we can take µj = 0 for all j, and we are done. Assume that ∇f(x∗) ≠ 0 and consider the index subsets J ⊂ A(x∗) such that −∇f(x∗) is a positive combination of the gradients ∇gj(x∗), j ∈ J, and among all such subsets, let J∗ have a minimal number of elements. Without loss of generality, let J∗ = {1, …, s}, so we have
∇f(x∗) + µ1∇g1(x∗) + ··· + µs∇gs(x∗) = 0,
with µj > 0 for j = 1, …, s. We claim that ∇g1(x∗), …, ∇gs(x∗) are linearly independent. Indeed, if this were not so, we would have for some α1, …, αs, not all zero,
α1∇g1(x∗) + ··· + αs∇gs(x∗) = 0,
so that
∇f(x∗) + (µ1 + γα1)∇g1(x∗) + ··· + (µs + γαs)∇gs(x∗) = 0,
for all scalars γ. Thus, we can find γ such that µj + γαj ≥ 0 for all j and µj + γαj = 0 for at least one index j ∈ {1, …, s}. This contradicts the hypothesis that the index set J∗ has a minimal number of elements.
Thus ∇g1(x∗), …, ∇gs(x∗) are linearly independent, so that we can find a vector h such that
∇g1(x∗)′h = ··· = ∇gs(x∗)′h = 1.
Consider vectors of the form x∗ + γh, where γ is a positive scalar. By Taylor's theorem, for sufficiently small γ, we have gj(x∗ + γh) > 0 and hence also µj gj(x∗ + γh) > 0 for all j = 1, …, s. Thus, the scalars µj, j = 1, …, s, together with µj = 0 for j = s + 1, …, r, satisfy all the Fritz John conditions with µ0 = 1.
3.3.11 www
We have
Σ_{j∈A(x∗)} µ∗j ∇gj(x∗) = 0, (1)
where µ∗1, …, µ∗r are Lagrange multipliers satisfying the Fritz John conditions. Since the functions gj are convex over ℝⁿ, for any j ∈ A(x∗) and any feasible vector x we have
gj(x) ≥ gj(x∗) + ∇gj(x∗)′(x − x∗).
Therefore
µ∗j gj(x) ≥ µ∗j (gj(x∗) + ∇gj(x∗)′(x − x∗)), ∀ j ∈ A(x∗).
By summing over j ∈ A(x∗) and by using Eq. (1) together with gj(x∗) = 0 for all j ∈ A(x∗), we obtain
0 ≥ Σ_{j∈A(x∗), µ∗j>0} µ∗j gj(x) = Σ_{j∈A(x∗)} µ∗j gj(x) ≥ 0
for all feasible x [the left inequality holds because gj(x) ≤ 0 and µ∗j ≥ 0]. This is possible only if gj(x) = 0 for all feasible x and j ∈ A(x∗) with µ∗j > 0. Since not all µ∗j are equal to zero, there is at least one index j with µ∗j > 0.
3.3.12 www
It is straightforward that the given condition is implied by the condition (iv) of Prop. 3.3.5. To show the reverse, we replace each equality constraint hi(x) = 0 with the two constraints hi(x) ≤ 0 and −hi(x) ≤ 0, and we apply the version of the Fritz John conditions given in the exercise. Let λ+i and λ−i be the multipliers corresponding to the constraints hi(x) ≤ 0 and −hi(x) ≤ 0, respectively. Thus in any neighborhood N of x∗ there is a vector x such that λ+i hi(x) > 0 for all i with λ+i > 0, −λ−i hi(x) > 0 for all i with λ−i > 0, and µ∗j gj(x) > 0 for all j with µ∗j > 0. Since λ∗i = λ+i − λ−i, if λ∗i ≠ 0 then either λ+i > λ−i = 0 (corresponding to λ∗i > 0) or λ−i > λ+i = 0 (corresponding to λ∗i < 0) [λ+i and λ−i cannot both be positive, since hi(x) > 0 and −hi(x) > 0 cannot hold simultaneously]. In either case we have λ∗i hi(x) > 0 for all i with λ∗i ≠ 0. Hence the Fritz John condition (iv), as given in Prop. 3.3.5, holds.
3.3.13 www
First, let us point out some important properties of a convex function that will be used in the proof. Convexity of f over ℝⁿ implies that f is continuous over ℝⁿ and that the set ∂f(x) of subgradients of f at x is nonempty for all x ∈ ℝⁿ (see Prop. B.24 of Appendix B).
(a) Let x∗ be a local minimum of f and S = {x | ||x − x∗|| ≤ ε}, where ε > 0 is such that f(x) ≥ f(x∗) for all feasible x with x ∈ S. As in the proof of Prop. 3.1.1 (Sec. 3.1.1), for each k ≥ 1 we consider the penalized problem
minimize Fk(x) = f(x) + (k/2) Σ_{i=1}^m (hi(x))² + (k/2) Σ_{j=1}^r (gj+(x))² + (1/2)||x − x∗||²
subject to x ∈ S.
Similar to Sec. 3.1.1, we conclude that the solution xk of the above problem exists and (using the continuity of f, hi, gj+) that xk → x∗ as k → ∞. Therefore, there is an index k̄ such that xk is an interior point of S for all k ≥ k̄. For such k, we have 0 ∈ ∂Fk(xk), or equivalently
sk + Σ_{i=1}^m ξik ∇hi(xk) + Σ_{j=1}^r ζjk ∇gj(xk) + (xk − x∗) = 0,
for some sk ∈ ∂f(xk), where ξik = k hi(xk) and ζjk = k gj+(xk). Define
µ0k = 1/δk, λik = ξik/δk, i = 1, …, m, µjk = ζjk/δk, j = 1, …, r,
and
δk = (1 + Σ_{i=1}^m (ξik)² + Σ_{j=1}^r (ζjk)²)^{1/2}.
Since xk → x∗ with sk ∈ ∂f(xk) for all k, from Prop. B.24 and the boundedness of the sequence {(µ0k, λ1k, …, λmk, µ1k, …, µrk)} we see that there are a vector s∗ ∈ ∂f(x∗) and a limit point (µ∗0, λ∗1, …, λ∗m, µ∗1, …, µ∗r) such that
µ∗0 s∗ + Σ_{i=1}^m λ∗i ∇hi(x∗) + Σ_{j=1}^r µ∗j ∇gj(x∗) = 0. (1)
It may happen that µ∗0 is equal to zero. Otherwise, we can set µ∗0 = 1 in Eq. (1), which shows that the vector −Σ_{i=1}^m λ∗i ∇hi(x∗) − Σ_{j=1}^r µ∗j ∇gj(x∗) is a subgradient of f at x∗. Thus, condition (i) of the exercise is satisfied. The rest of the proof is the same as that of Prop. 3.3.5.
(b) Assume that the ∇hi(x∗) are linearly independent, and that there is a vector d such that
∇hi(x∗)′d = 0, ∀ i = 1, …, m, ∇gj(x∗)′d < 0, ∀ j ∈ A(x∗).
If µ∗0 = 0 in Eq. (1), then using the same argument as in the proof of Prop. 3.3.8 we arrive at a contradiction. Under the Slater condition, the proof that µ∗0 ≠ 0 is the same as in Prop. 3.3.9.
3.3.14 www
The problem is equivalent to
minimize r²
subject to ||yj − x||² ≤ r², j = 1, …, p, (x, r) ∈ ℝⁿ⁺¹.
The optimality conditions for this problem are
(i) Σ_{j=1}^p µ∗j (x∗ − yj) = 0, (ii) Σ_{j=1}^p µ∗j = 1,
together with µ∗j ≥ 0 and µ∗j = 0 for the inactive constraints, where x∗ is the optimal solution for the minimax problem and µ∗ is the corresponding Lagrange multiplier. Note that the cost function is continuous and coercive, so that an optimal solution always exists. Furthermore, the cost function is convex and the given conditions are also sufficient for optimality. By combining (i) and (ii) we have
x∗ = Σ_{j=1}^p µ∗j yj, Σ_{j=1}^p µ∗j = 1, µ∗j ≥ 0, ∀ j.
For the case of three points, there are two possibilities:
(1) All constraints are active, so x∗ is at equal distance from all three points. Then x∗ is the
center of the circle circumscribed around the triangle of the three points. In this case x∗ must lie
within the triangle and is a positive combination of the yj , the coefficients being the multipliers.
This corresponds to the case when the triangle is not obtuse.
(2) Only two of the constraints are active, in which case x∗ lies on the line segment connecting the corresponding two points. This occurs when the triangle formed by the given points is obtuse, and then x∗ is the midpoint of the longest side of the triangle. If yj is not an endpoint of the longest side, then µ∗j = 0, while the other two Lagrange multipliers are both positive.
Now consider the degenerate case when the three points lie on the same line. We can assume
that y3 lies between y1 and y2 . Then the optimal point x∗ is the midpoint of the segment joining
y1 and y2 . The Lagrange multipliers µ∗1 and µ∗2 are positive, while µ∗3 = 0.
3.3.15 www
(a) Let {yk} ⊆ T(x) be a sequence with yk → y, y ≠ 0, and for each k let {xk,i}i ⊆ X \ {x} be a sequence associated with yk by the definition of T(x). A standard diagonalization argument yields indices ik such that
lim_{k→∞} || (xk,ik − x)/||xk,ik − x|| − y/||y|| || = 0,
which by the definition of T(x) means that y ∈ T(x). Thus, T(x) is closed.
(b) Let F(x) and F̄(x) denote, respectively, the set of feasible directions at x and its closure. First, we will prove that F̄(x) ⊆ T(x) holds, regardless of whether X is convex. Let d ∈ F(x). Then there is an ᾱ > 0 such that x + αd ∈ X for all α ∈ [0, ᾱ]. Choose any sequence {αk} ⊆ (0, ᾱ] with αk → 0 as k → ∞. Define xk = x + αk d. Evidently xk ∈ X \ {x}, and
(xk − x)/||xk − x|| = d/||d||
converges to d/||d||. Hence d ∈ T(x). It follows that F(x) ⊆ T(x), and since T(x) is closed, we have F̄(x) ⊆ T(x).
Next, we prove that T(x) ⊆ F̄(x). Let y ∈ T(x) and {xk} ⊆ X \ {x} be such that
(xk − x)/||xk − x|| = y/||y|| + ξk,
where ξk → 0 as k → ∞. Since X is a convex set, the direction xk − x is feasible at x for all k. Therefore, the direction
dk = ||y|| (xk − x)/||xk − x|| = y + ξk ||y||
is feasible at x for all k, i.e., {dk} ⊆ F(x). Since
lim_{k→∞} dk = lim_{k→∞} (y + ξk ||y||) = y,
it follows that y ∈ F̄(x). Hence T(x) ⊆ F̄(x), and we conclude that T(x) = F̄(x).
3.3.16 www
Let x be any vector in X. We will show that T (x) = V (x). We have, in general T (x) ⊂ V (x)
(see e.g., the proof of Prop. 3.3.17), so we focus on showing that V (x) ⊂ T (x). Let y ∈ V (x), so
that we have
∇gj (x) y ≤ 0, ∀ j ∈ A(x).
xk = x + αk y.
For all j ∈ A(x) we have gj (x) = 0, and using the concavity of gj , we obtain
3.3.17 www
Let y be a vector such that ∇gj(x∗)′y < 0 for all j ∈ A(x∗). By the continuity of ∇gj(x) (as a function of x and j), there exist a neighborhood N of x∗ and a neighborhood A of A(x∗) (relative to J) such that
∇gj(x)′y < 0, ∀ x ∈ N, ∀ j ∈ A. (1)
By the Mean Value Theorem, for every j ∈ A and α > 0,
gj(x∗ + αy) = gj(x∗) + α∇gj(x∗ + θαy)′y ≤ α∇gj(x∗ + θαy)′y, (3)
for some θ ∈ (0, 1). Since x∗ + θαy ∈ N and j ∈ A, from Eqs. (1) and (3) we obtain
gj(x∗ + αy) < 0, ∀ j ∈ A,
whenever x∗ + αy ∈ N. For any α with 0 < α ≤ ᾱ the point x∗ + αy belongs to N, which together with Eq. (2) implies
gj(x∗ + αy) ≤ 0, ∀ j ∉ A.
The last two inequalities show that y is a feasible direction of X at x∗. In the solution to part (b) of Exercise 3.3.15, it is shown that the set of feasible directions at x∗ is a subset of the tangent cone at x∗, regardless of the structure of the set X.
3.3.18 www
Assume that we have shown the validity of the Mangasarian-Fromovitz constraint qualification for the problem without equality constraints, i.e., for a local minimum x∗, there exist Lagrange multipliers under the condition that there is a vector d such that
∇gj(x∗)′d < 0, ∀ j ∈ A(x∗).
Now, consider the problem with equality and inequality constraints. Assume that there is a vector d such that
∇hi(x∗)′d = 0, ∀ i = 1, …, m,
∇gj(x∗)′d < 0, ∀ j ∈ A(x∗). (2)
Since the vectors ∇h1 (x∗ ), . . . , ∇hm (x∗ ) are linearly independent, by reordering the coordinates
of x if necessary, we can partition the vector x as x = (xB , xR ) such that the submatrix ∇B h(x∗ )
(the gradient matrix of h with respect to xB ) is invertible. The equation
h(xB , xR ) = 0
has the solution (x∗B, x∗R), and the implicit function theorem (Prop. A.25 of Appendix A) can be used to express xB in terms of xR via a unique continuously differentiable function φ : S → ℝᵐ defined over a sphere S centered at x∗R. In particular, we have x∗B = φ(x∗R), h(φ(xR), xR) = 0 for all xR ∈ S, and
∇φ(xR) = −∇R h(φ(xR), xR) (∇B h(φ(xR), xR))⁻¹, ∀ xR ∈ S, (3)
where ∇R h is the gradient matrix of h with respect to xR . Observe that x∗R is a local minimum
of the problem
minimize F(xR)
subject to Gj(xR) ≤ 0, j = 1, …, r, (4)
where F (xR ) = f (φ(xR ), xR ), Gj (xR ) = gj (φ(xR ), xR ). Note that this problem has no equality
constraints. From (2) we have
∇B h(x∗)′dB + ∇R h(x∗)′dR = 0,
and
∇gj(x∗)′d = ∇B gj(x∗)′dB + ∇R gj(x∗)′dR < 0, (5)
for all j ∈ A(x∗). Since ∇B h(x∗) is invertible, from the first relation above we obtain
dB = −(∇B h(φ(x∗R), x∗R)′)⁻¹ ∇R h(φ(x∗R), x∗R)′ dR,
or equivalently [cf. Eq. (3)]
dB = ∇φ(x∗R)′dR.
Substituting this into Eq. (5), and using the chain rule relation ∇Gj(xR) = ∇φ(xR)∇B gj(φ(xR), xR) + ∇R gj(φ(xR), xR), we see that Eq. (5) is equivalent to
∇Gj(x∗R)′dR < 0, ∀ j ∈ A(x∗).
This means that the Mangasarian-Fromovitz constraint qualification is satisfied for problem (4), so there are Lagrange multipliers µ∗1, …, µ∗r such that
0 = ∇F(x∗R) + Σ_{j=1}^r µ∗j ∇Gj(x∗R)
= ∇φ(x∗R)∇B f(x∗) + ∇R f(x∗) + Σ_{j=1}^r µ∗j (∇φ(x∗R)∇B gj(x∗) + ∇R gj(x∗))
= ∇φ(x∗R)(∇B f(x∗) + Σ_{j=1}^r µ∗j ∇B gj(x∗)) + ∇R f(x∗) + Σ_{j=1}^r µ∗j ∇R gj(x∗). (6)
Define
B = ∇B h(φ(x∗R), x∗R), R = ∇R h(φ(x∗R), x∗R),
and
λ∗ = −B⁻¹ (∇B f(x∗) + Σ_{j=1}^r µ∗j ∇B gj(x∗)).
Then from Eq. (3) we see that ∇φ(x∗R) = −R B⁻¹, which combined with Eq. (6) implies
∇R f(x∗) + R λ∗ + Σ_{j=1}^r µ∗j ∇R gj(x∗) = 0.
By the definition of λ∗, we also have ∇B f(x∗) + B λ∗ + Σ_{j=1}^r µ∗j ∇B gj(x∗) = 0, so that λ∗ and µ∗1, …, µ∗r are Lagrange multipliers for the original problem.
The proof of the existence of the Lagrange multipliers under the Slater constraint qualifica-
tion is straightforward from the preceding analysis by noting that the vector d = x − x∗ satisfies
the Mangasarian-Fromovitz constraint qualification.
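Eq. (3) is easy to verify numerically on a small example. In the sketch below (a made-up scalar constraint h(xB, xR) = xB³ + xB + xR² − 1 = 0), φ is evaluated by Newton's method and its finite-difference derivative is compared with the implicit-function formula:

# Finite-difference check of Eq. (3) on a made-up scalar example:
# h(xB, xR) = xB^3 + xB + xR^2 - 1 = 0 defines xB = φ(xR), and
# φ'(xR) = -(∂h/∂xR) / (∂h/∂xB).
import numpy as np

def h(xB, xR):
    return xB**3 + xB + xR**2 - 1.0

def phi(xR, xB0=0.5, iters=50):
    xB = xB0
    for _ in range(iters):                      # Newton's method in xB
        xB -= h(xB, xR) / (3 * xB**2 + 1)
    return xB

xR, eps = 0.3, 1e-6
xB = phi(xR)
fd = (phi(xR + eps) - phi(xR - eps)) / (2 * eps)
formula = -(2 * xR) / (3 * xB**2 + 1)
print(fd, formula)                              # agree to high accuracy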
3.3.19 www
For simplicity we assume that there are no equality constraints; the subsequent proof is easily extended to the case where equality constraints are present. To show that the Mangasarian-Fromovitz constraint qualification implies boundedness of the set of Lagrange multipliers, follow the given hint.
Conversely, if the set of Lagrange multipliers is bounded, there cannot exist a µ ≠ 0 with µ ≥ 0 and Σ_{j∈A(x∗)} µj ∇gj(x∗) = 0, since adding γµ, for any γ > 0, to a Lagrange multiplier gives another Lagrange multiplier. Hence by the theorem of the alternative of Exercise 3.3.6, there must exist a d such that ∇gj(x∗)′d < 0 for all j ∈ A(x∗).
3.3.20 www
We have
∇h1(x) = (0, 1)′,
and
∇h2(x) = (4x1³ sin(1/x1) − x1² cos(1/x1), −1)′ if x1 ≠ 0, ∇h2(x) = (0, −1)′ if x1 = 0,
and it can be seen that ∇h1 and ∇h2 are everywhere continuous. Thus, for λ1 = 1, λ2 = 1, we have
λ1∇h1(0) + λ2∇h2(0) = 0.
On the other hand, it can be seen that arbitrarily close to x∗ = (0, 0) there exists an x such that h1(x) > 0 and h2(x) > 0. Thus x∗ is not quasinormal, although it is seen (most easily, by a graphical argument) that x∗ is quasiregular.
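For a concrete check, assume h1(x) = x2 and h2(x) = x1⁴ sin(1/x1) − x2 (with h2(0, x2) = −x2); these choices are consistent with the gradients displayed above, though the exercise statement itself is not reproduced here. The sketch exhibits points arbitrarily close to (0, 0) with h1 > 0 and h2 > 0:

# Sketch assuming h1(x) = x2 and h2(x) = x1^4 sin(1/x1) - x2 (and
# h2(0, x2) = -x2), consistent with the gradients computed above:
# arbitrarily close to (0,0) there are points with h1 > 0 and h2 > 0.
import numpy as np

def h1(x1, x2):
    return x2

def h2(x1, x2):
    return (x1**4 * np.sin(1.0 / x1) if x1 != 0.0 else 0.0) - x2

for k in [1, 10, 100, 1000]:
    x1 = 1.0 / (2 * np.pi * k + np.pi / 2)      # so that sin(1/x1) = 1
    x2 = 0.5 * x1**4                            # 0 < x2 < x1^4 sin(1/x1)
    print(x1, h1(x1, x2) > 0, h2(x1, x2) > 0)   # both True, x -> (0,0)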
3.3.21 www
(a) Without loss of generality, we assume that there are no equality constraints and that all inequality constraints are active at x∗. Based on the definition of quasinormality, it is easy to verify that x∗ is a quasinormal vector of X if it is a quasinormal vector of X̄. Conversely, suppose that x∗ is a quasinormal vector of X̄, but not a quasinormal vector of X. Then there exist Lagrange multipliers µ1, …, µr that satisfy the Fritz John conditions with µ0 = 0 and µj > 0 for some j ∈ J (for otherwise, x∗ would not be a quasinormal vector of X̄). From the
definition of the set J it follows that there is a vector y ∈ V(x∗) such that ∇gj̄(x∗)′y < 0 for some index j̄ ∈ J with µj̄ > 0. By multiplying the relation
Σ_{j=1}^r µj ∇gj(x∗) = 0
with y, we obtain
0 = Σ_{j=1}^r µj ∇gj(x∗)′y ≤ µj̄ ∇gj̄(x∗)′y < 0
[the inequality holds because y ∈ V(x∗) and all constraints are active, so every term in the sum is nonpositive], which is a contradiction.
(b) Let y ∈ Ṽ(x∗), y ≠ 0, and consider a sequence {xk} of the form xk = x∗ + αk y, where αk ↓ 0, so that
xk → x∗, (xk − x∗)/||xk − x∗|| → y/||y||,
and ∇gj(x∗)′y < 0 for all j. This implies gj(xk) < 0 for all j ∈ J and all sufficiently large k. Therefore xk ∈ X for all k sufficiently large, and consequently y is in the tangent cone of X at x∗. Hence Ṽ(x∗) ⊂ T(x∗), which is equivalent to quasiregularity of x∗ with respect to the set X.
(c) The given statement follows from parts (a) and (b).
3.3.22 www
Without loss of generality, we can assume that there are no equality constraints (every equality constraint hi(x) = 0 can be replaced by the two inequalities hi(x) ≤ 0 and −hi(x) ≤ 0, with hi(x) and −hi(x) being linear, and therefore concave). Since x∗ is a local minimum, there exist a scalar µ0 and Lagrange multipliers λ1, …, λm, µ1, …, µr satisfying the Fritz John conditions. Assume that µ0 = 0. Then
Σ_{j=1}^r µj ∇gj(x∗) = Σ_{j∈A(x∗)} µj ∇gj(x∗) = 0. (1)
Let d be a vector such that ∇gj(x∗)′d ≤ 0 for all j ∈ A(x∗), with strict inequality for j ∈ A(x∗) \ J. By multiplying Eq. (1) with d, we obtain
Σ_{j∈A(x∗)} µj ∇gj(x∗)′d = 0. (2)
If µj0 > 0 for some j0 ∈ A(x∗) \ J, then, since every term in the sum is nonpositive,
Σ_{j∈A(x∗)} µj ∇gj(x∗)′d ≤ µj0 ∇gj0(x∗)′d < 0,
which is a contradiction to Eq. (2). Therefore for all j0 ∈ A(x∗) \ J we must have µj0 = 0. Then from Eq. (1) we have
Σ_{j∈J} µj ∇gj(x∗) = 0. (3)
Now we use the same line of argument as in the proof of Prop. 3.3.6 in order to arrive at a contradiction. In particular, since gj is concave for every j ∈ J, we have for every x
gj(x) ≤ gj(x∗) + ∇gj(x∗)′(x − x∗),
and hence
Σ_{j∈J} µj gj(x) ≤ Σ_{j∈J} µj gj(x∗) + Σ_{j∈J} µj ∇gj(x∗)′(x − x∗) = 0, (4)
where the last equality follows from Eq. (3) and the fact that µj gj(x∗) = 0 for all j [by the Fritz John condition (iv)]. On the other hand, we know that there is some j ∈ J for which µj > 0 and an x satisfying gj(x) > 0 for all j with µj > 0. For this x, we have Σ_{j∈J} µj gj(x) > 0, which contradicts Eq. (4). Thus, we can take µ0 = 1, so that x∗ satisfies the necessary conditions of Prop. 3.3.7.
SECTION 3.4
3.4.3 www
Consider the primal problem (P) min_{A′x ≥ b} c′x. The dual function is
q(µ) = inf_x {c′x + µ′(b − A′x)} = b′µ + inf_x (c − Aµ)′x.
If cj − Σ_{i=1}^m µi aij ≠ 0 for some j, then q(µ) = −∞. Thus the dual problem is
maximize Σ_{i=1}^m µi bi
subject to Σ_{i=1}^m µi aij = cj, j = 1, …, n, µ ≥ 0,
which can be written as
min_{Aµ = c, µ ≥ 0} −b′µ.
To compute the dual of this problem, assign a multiplier vector x to the constraint Aµ = c; if ai′x − bi < 0 for any i, then p(x) = −∞. Thus the dual of (D) is
maximize −c′x
subject to A′x ≥ b,
which is equivalent to the primal problem (P). In particular, an optimal primal solution x∗ satisfies
A′x∗ − b ≥ 0.
Next, consider
(P) min_{A′x ≥ b, x ≥ 0} c′x ⟺ max_{Aµ ≤ c, µ ≥ 0} b′µ. (D)
Writing (D) as
min_{Aµ ≤ c, µ ≥ 0} −b′µ
and assigning a multiplier vector x ≥ 0 to the constraint Aµ ≤ c, we have: if ai′x − bi < 0 for any i, then p(x) = −∞. Thus the dual of (D) is
maximize −c′x
subject to A′x ≥ b, x ≥ 0,
which is equivalent to (P). Again, an optimal primal solution x∗ satisfies
A′x∗ − b ≥ 0.
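The primal-dual pairings above can be confirmed with an LP solver. The sketch below (random but feasible made-up data; G plays the role of A′) solves (P) min c′x subject to Gx ≥ b and (D) max b′µ subject to G′µ = c, µ ≥ 0, and compares the optimal values:

# Sketch (random data, constructed to be feasible and bounded): solve
#   (P) min c'x  s.t. Gx ≥ b     and    (D) max b'µ  s.t. G'µ = c, µ ≥ 0,
# and check that the optimal values agree.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
m, n = 6, 3
G = rng.standard_normal((m, n))
mu0 = rng.random(m)                 # a dual-feasible point ...
c = G.T @ mu0                       # ... guarantees (D) is feasible
b = G @ rng.standard_normal(n) - rng.random(m)   # (P) feasible by design

P = linprog(c, A_ub=-G, b_ub=-b, bounds=[(None, None)] * n)
D = linprog(-b, A_eq=G.T, b_eq=c, bounds=[(0, None)] * m)
print(P.fun, -D.fun)                # equal: strong duality for LPs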
3.4.4 www
(a) Let λj be a Lagrange multiplier associated with the constraint Σ_{i=1}^m xij = βj, and let νi be a Lagrange multiplier associated with the constraint Σ_{j=1}^n xij = αi. Define
X = {x | xij ≥ 0, ∀ i, j}.
The dual function is
q(ν, λ) = inf_{x∈X} { Σ_{i,j} aij xij + Σ_{i=1}^m νi (αi − Σ_{j=1}^n xij) + Σ_{j=1}^n λj (βj − Σ_{i=1}^m xij) }
= inf_{x∈X} Σ_{i,j} (aij − νi − λj) xij + Σ_{i=1}^m νi αi + Σ_{j=1}^n λj βj.
The infimum is −∞ unless aij − νi − λj ≥ 0 for all i, j, in which case it equals Σ_{i=1}^m νi αi + Σ_{j=1}^n λj βj. Maximizing over νi, for fixed λ, gives νi = min_{1≤j≤n} (aij − λj), so the dual problem takes the form
maximize q(λ) = Σ_{j=1}^n λj βj + Σ_{i=1}^m min_{1≤j≤n} (aij − λj) αi
subject to λ ∈ ℝⁿ.
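The reduced dual is itself an LP once the inner minimum is modeled with epigraph variables ti ≤ aij − λj. The sketch below (made-up costs, supplies, and demands) checks that its optimal value matches the primal transportation optimum:

# Sketch (made-up data): the transportation primal versus the reduced dual
#   maximize Σ_j λ_j β_j + Σ_i α_i min_j (a_ij - λ_j),
# solved as an LP with epigraph variables t_i ≤ a_ij - λ_j.
import numpy as np
from scipy.optimize import linprog

a = np.array([[4.0, 1.0, 3.0],
              [2.0, 5.0, 2.0]])                 # costs a_ij
alpha = np.array([10.0, 15.0])                  # supplies
beta = np.array([5.0, 12.0, 8.0])               # demands (Σα = Σβ)
m, n = a.shape

# primal: min Σ a_ij x_ij s.t. row sums = α, column sums = β, x ≥ 0
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0            # Σ_j x_ij = α_i
for j in range(n):
    A_eq[m + j, j::n] = 1.0                     # Σ_i x_ij = β_j
primal = linprog(a.ravel(), A_eq=A_eq, b_eq=np.append(alpha, beta),
                 bounds=[(0, None)] * (m * n))

# dual: variables (λ_1..λ_n, t_1..t_m); maximize β'λ + α't
# subject to λ_j + t_i ≤ a_ij (i.e. t_i ≤ a_ij - λ_j for all j).
A_ub = np.zeros((m * n, n + m))
for i in range(m):
    for j in range(n):
        A_ub[i * n + j, j] = 1.0                # coefficient of λ_j
        A_ub[i * n + j, n + i] = 1.0            # coefficient of t_i
dual = linprog(-np.concatenate([beta, alpha]), A_ub=A_ub, b_ub=a.ravel(),
               bounds=[(None, None)] * (n + m))
print(primal.fun, -dual.fun)                    # equal optimal values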
(b) & (c) The Lagrange multiplier λj can be interpreted as the price pj. So if the transportation problem has an optimal solution x∗, then its dual also has an optimal solution, say p∗, and
q(p∗) = Σ_{i,j} aij x∗ij,
i.e.,
Σ_{j=1}^n p∗j βj + Σ_{i=1}^m min_{1≤j≤n} (aij − p∗j) αi = Σ_{i,j} aij x∗ij. (1)
By the feasibility of x∗, we have Σ_{i=1}^m x∗ij = βj for all j, so that
Σ_{j=1}^n p∗j βj = Σ_{j=1}^n Σ_{i=1}^m p∗j x∗ij,
and Eq. (1) yields
Σ_{i=1}^m min_{1≤j≤n} {aij − p∗j} αi = Σ_{i,j} (aij − p∗j) x∗ij. (2)
By the feasibility of x∗, we also have Σ_{j=1}^n x∗ij = αi for all i, and from Eq. (2) it follows that
Σ_{i,j} (aij − p∗j − min_{1≤l≤n} {ail − p∗l}) x∗ij = 0.
Since all the terms in the summation above are nonnegative, we must have
(aij − p∗j − min_{1≤l≤n} {ail − p∗l}) x∗ij = 0, ∀ i, j.
Since p∗ is arbitrary, this property holds for every dual optimal solution p∗ .
whose optimal value is equal to min_{x∈X} max_{z∈Z} x′Az. Introduce dual variables z ∈ ℝᵐ and ξ ∈ ℝ, corresponding to the constraints A′x − ζe ≤ 0 and Σ_{i=1}^n xi = 1, respectively. The dual function is
q(z, ξ) = inf_{ζ∈ℝ, xi≥0, i=1,…,n} { ζ + z′(A′x − ζe) + ξ(1 − Σ_{i=1}^n xi) }
= inf_{ζ∈ℝ, xi≥0, i=1,…,n} { ζ(1 − Σ_{j=1}^m zj) + x′(Az − ξe) + ξ }
= ξ if Σ_{j=1}^m zj = 1 and ξe − Az ≤ 0, and −∞ otherwise.
Thus the dual problem, which is to maximize q(z, ξ) subject to z ≥ 0 and ξ ∈ ℝ, is equivalent to the linear program
max_{ξe ≤ Az, z ∈ Z} ξ.
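Both linear programs can be solved directly to confirm that they attain the same value (the value of the matrix game). The payoff matrix below is made up for illustration:

# Sketch (made-up payoff matrix): both players' LPs give the same value,
# illustrating min_{x∈X} max_{z∈Z} x'Az = max_{z∈Z} min_{x∈X} x'Az.
import numpy as np
from scipy.optimize import linprog

A = np.array([[3.0, -1.0],
              [0.0, 2.0]])                       # payoff matrix
n, m = A.shape

# minimizer: min ζ s.t. A'x ≤ ζe, Σx_i = 1, x ≥ 0; variables (x, ζ)
res_x = linprog(np.append(np.zeros(n), 1.0),
                A_ub=np.hstack([A.T, -np.ones((m, 1))]),
                b_ub=np.zeros(m),
                A_eq=np.append(np.ones(n), 0.0).reshape(1, -1),
                b_eq=[1.0],
                bounds=[(0, None)] * n + [(None, None)])

# maximizer: max ξ s.t. ξe ≤ Az, Σz_j = 1, z ≥ 0; variables (z, ξ)
res_z = linprog(np.append(np.zeros(m), -1.0),
                A_ub=np.hstack([-A, np.ones((n, 1))]),
                b_ub=np.zeros(n),
                A_eq=np.append(np.ones(m), 0.0).reshape(1, -1),
                b_eq=[1.0],
                bounds=[(0, None)] * m + [(None, None)])
print(res_x.fun, -res_z.fun)                     # equal: the game value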